Immediate and accurate analysis of financial time series data is crucial to the price discovery mechanism that is at the heart of capital markets.

We’ll show you how insights can be derived from financial time series data in real time using machine learning. In particular, we provide a Keras model implementing an LSTM neural network for anomaly detection.
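The Keras model itself is not reproduced here. As a minimal sketch of the steps around it, assuming a model that forecasts the next value of a series, the code below slices a series into fixed-length windows (a Keras LSTM layer expects input shaped (samples, timesteps, features)) and flags points whose forecast error exceeds the mean error plus k standard deviations. The function names, the window length, and the k-sigma threshold rule are illustrative assumptions rather than the article's choices, and a naive last-value forecast stands in for `model.predict`.

```python
from statistics import mean, pstdev


def make_windows(series, timesteps):
    """Split a 1-D series into (window, next_value) pairs.

    Each window holds `timesteps` consecutive values and its target is the value
    that follows. Wrapping each value in a list would give the
    (samples, timesteps, features=1) layout a Keras LSTM layer expects.
    """
    windows, targets = [], []
    for start in range(len(series) - timesteps):
        windows.append(series[start:start + timesteps])
        targets.append(series[start + timesteps])
    return windows, targets


def flag_anomalies(actual, predicted, k=3.0):
    """Return indices where |actual - predicted| exceeds mean + k * std of all errors.

    The k-sigma rule is an assumed, commonly used heuristic.
    """
    errors = [abs(a - p) for a, p in zip(actual, predicted)]
    threshold = mean(errors) + k * pstdev(errors)
    return [i for i, e in enumerate(errors) if e > threshold]


if __name__ == "__main__":
    # Toy price series with a jump at prices[7].
    prices = [100.0, 100.2, 100.1, 100.3, 100.2, 100.4, 100.3, 120.0, 120.1, 120.2]
    X, y = make_windows(prices, timesteps=3)
    # Stand-in for model.predict(X): predict the last value of each window.
    naive_forecast = [window[-1] for window in X]
    print(flag_anomalies(y, naive_forecast, k=2.0))  # [4], i.e. the jump at prices[7]
```

With only a handful of errors no point can sit many standard deviations from the mean, which is why the toy example uses k=2.0; on series of realistic length the default of 3.0 becomes usable.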

The data we’re using comes from the Band Protocol public dataset available in Google BigQuery. Band Protocol is a cross-chain data oracle platform that aggregates and connects real-world data and APIs to smart contracts.


In this article we’ll cover our Ethereum 2.0 ETL tools for exporting Ethereum 2.0 blockchain data, the public Medalla dataset in BigQuery, and a Nansen.ai dashboard we built with some interesting charts and tables.

The article is broken down into three parts:

  1. A quickstart guide for Ethereum 2.0 ETL. The tools allow you to export beacon blocks, attestations, deposits, slashings, voluntary exits, validators, and validator committees.
  2. Some sample queries for BigQuery.
  3. Visualisations in Nansen.
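Beacon blocks, attestations, and committees are keyed by slot, and queries often group them by epoch. As a small sketch, assuming the phase 0 mainnet constants of 32 slots per epoch and 12 seconds per slot, the helpers below convert a slot to its epoch and to a UTC timestamp. The genesis time is a parameter you would supply for the network you export; the value in the example is a placeholder, not Medalla's actual genesis.

```python
from datetime import datetime, timezone

# Phase 0 mainnet constants; assumed to match the network being exported.
SLOTS_PER_EPOCH = 32
SECONDS_PER_SLOT = 12


def slot_to_epoch(slot: int) -> int:
    """Epoch containing `slot` (epochs are consecutive runs of SLOTS_PER_EPOCH slots)."""
    return slot // SLOTS_PER_EPOCH


def epoch_start_slot(epoch: int) -> int:
    """First slot of `epoch`."""
    return epoch * SLOTS_PER_EPOCH


def slot_to_datetime(slot: int, genesis_time: int) -> datetime:
    """UTC start time of `slot`, given the network's genesis Unix timestamp."""
    return datetime.fromtimestamp(genesis_time + slot * SECONDS_PER_SLOT, tz=timezone.utc)


if __name__ == "__main__":
    GENESIS_TIME = 1_600_000_000  # hypothetical placeholder, not a real network's genesis
    slot = 1000
    print(slot_to_epoch(slot), epoch_start_slot(slot_to_epoch(slot)))  # 31 992
    print(slot_to_datetime(slot, GENESIS_TIME).isoformat())
```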

Now, let’s go through the details.

Ethereum 2.0 ETL Quickstart Guide

The easiest way to get started is to request access to a Medalla node on Infura. Scroll to the bottom…


In this article, you’ll learn how to package a JavaScript library for use in a BigQuery UDF. We’ll use ethers.js as the example: a complete Ethereum library and wallet implementation in JavaScript. It will allow us to decode raw transaction and log data in the bigquery-public-data.crypto_ethereum dataset in BigQuery.

The whole process can be broken down into three steps:

  1. Create a package.json file with the JavaScript library dependency.
  2. Create webpack.config.js and build a JS file using webpack.
  3. Upload the generated JS file to GCS and use it in a BigQuery UDF.
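Step 3 connects the bundle to a query through the UDF's `library` option. The sketch below is a hypothetical Python helper that renders a `CREATE TEMP FUNCTION ... LANGUAGE js` statement pointing at a bundle in GCS. The bucket path, function name, and JavaScript body are placeholders; in particular, the global name under which the bundle exposes ethers.js depends on the webpack `output.library` setting, which is assumed here to be `ethers`.

```python
def render_js_udf(name, params, return_type, js_body, library_uri):
    """Render a BigQuery temporary JavaScript UDF that loads an external JS bundle.

    params: list of (parameter_name, sql_type) tuples.
    library_uri: gs:// path of the webpack bundle uploaded to GCS.
    """
    args = ", ".join(f"{param} {sql_type}" for param, sql_type in params)
    return (
        f"CREATE TEMP FUNCTION {name}({args})\n"
        f"RETURNS {return_type}\n"
        "LANGUAGE js\n"
        f'OPTIONS (library="{library_uri}")\n'
        f'AS r"""\n{js_body}\n""";'
    )


if __name__ == "__main__":
    # Hypothetical UDF body and bucket path; assumes the bundle exposes a global `ethers`.
    body = """
    const iface = new ethers.utils.Interface(JSON.parse(abi));
    return iface.parseTransaction({data: input}).name;
    """
    print(render_js_udf(
        "decode_function_name",
        [("abi", "STRING"), ("input", "STRING")],
        "STRING",
        body.strip(),
        "gs://your-bucket/ethers.js",
    ))
```

The printed statement can be prepended to a query that calls `decode_function_name(abi, input)`.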

You can find the source code for bundling and…


This is a tutorial explaining how to replay time series data from a BigQuery table into a Pub/Sub topic. There are several use cases where you might need this:

  • Backtesting.
  • Demos / Visualizations.
  • Integration testing.

The go-to GCP service for moving data between different services is Dataflow. While there are many Google-provided Dataflow templates, there are none for moving data from BigQuery to Pub/Sub.

That’s why we developed our own tool to solve this task: https://github.com/blockchain-etl/bigquery-to-pubsub. It can be used to replay any BigQuery table with a TIMESTAMP field. It’s a Python program that sequentially pulls chunks of data…
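At the heart of any replay is pacing: each row is published when its original timestamp comes due, optionally sped up. The sketch below is a hypothetical, simplified illustration of that idea, not the bigquery-to-pubsub implementation. It assumes the rows have already been fetched and sorted by timestamp, and it takes `publish`, `sleep`, and `now` as parameters so it runs without GCP; in a real setup `publish` could wrap `PublisherClient.publish(topic_path, data)` from the google-cloud-pubsub library.

```python
import json
import time
from datetime import datetime, timedelta, timezone


def replay(rows, publish, speedup=1.0, sleep=time.sleep, now=time.monotonic):
    """Publish rows so that their spacing mirrors the original timestamps.

    rows: dicts sorted by a 'timestamp' key holding an aware datetime; all other
          values must be JSON-serialisable.
    publish: callable taking bytes (e.g. a wrapper around a Pub/Sub publisher).
    speedup: 2.0 replays twice as fast as real time.
    """
    first_ts = None
    start = now()
    for row in rows:
        ts = row["timestamp"]
        if first_ts is None:
            first_ts = ts
        # Wall-clock offset (seconds since start) at which this row is due.
        due = (ts - first_ts).total_seconds() / speedup
        delay = due - (now() - start)
        if delay > 0:
            sleep(delay)
        payload = dict(row, timestamp=ts.isoformat())
        publish(json.dumps(payload).encode("utf-8"))


if __name__ == "__main__":
    base = datetime(2020, 1, 1, tzinfo=timezone.utc)
    demo_rows = [{"timestamp": base + timedelta(seconds=s), "value": s} for s in (0, 1, 3)]
    replay(demo_rows, publish=print, speedup=10.0)  # takes about 0.3 s of wall-clock time
```

Injecting the clock keeps the pacing logic testable; a production tool would also page through the table in chunks rather than holding all rows in memory.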


  • You can now easily query parsed smart contract events for ENS, 0x, and many more contracts (see below) in Google BigQuery: 0x tables, ENS tables. These tables are updated in near real time.
  • You can add events for any Ethereum contract you are interested in to the public blockchain-etl datasets. Instructions are below.

Accessing Datasets in BigQuery

After you open the dataset links (0x tables, ENS tables) in your browser, you should see the blockchain-etl project and the datasets it contains on the left:

Select a table to view its schema and details and to preview its data on the bottom right.

Try pasting this…


The crypto_ethereum and crypto_bitcoin datasets in BigQuery are now updated via streaming. You can also subscribe to the public Pub/Sub topics that feed these tables.

The overall architecture is depicted below:

Blockchain ETL architecture

The following blockchains are covered:

  • Ethereum
  • Bitcoin
  • Zcash
  • Litecoin
  • Dogecoin
  • Dash

For each blockchain, we added a delay that prevents orphaned blocks resulting from chain reorganisations from being streamed. The number of blocks we lag behind the tip of the chain is set by the LAG_BLOCKS parameter in the configuration files in the GitHub repository https://github.com/blockchain-etl-streaming. These values were calculated based on the longest orphaned chains within the last year…
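The delay amounts to never streaming a block until it is at least LAG_BLOCKS deep. A minimal sketch of that range calculation, assuming a streamer that remembers the last block it published (the function name, arguments, and batch size are hypothetical, not taken from the repository's code):

```python
def next_safe_range(last_synced_block, chain_tip, lag_blocks, batch_size=100):
    """Return the inclusive (start, end) block range that is safe to stream, or None.

    Blocks within `lag_blocks` of the chain tip are held back because a
    reorganisation could still orphan them.
    """
    safe_tip = chain_tip - lag_blocks
    start = last_synced_block + 1
    if start > safe_tip:
        return None  # nothing is deep enough yet; poll again later
    end = min(safe_tip, start + batch_size - 1)
    return start, end


if __name__ == "__main__":
    print(next_safe_range(last_synced_block=100, chain_tip=120, lag_blocks=15))  # (101, 105)
```

After publishing a returned range, the streamer would advance `last_synced_block` to its end; a `None` result means waiting for the chain to grow.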


The Laws of Human Nature by Robert Greene

For every law, the author provides an example from history, interprets and explains it, and gives advice on how to use the law.

  • Pros: very interesting historical examples.
  • Cons: the author doesn’t provide any scientific evidence for the laws in the book, so they are purely the author’s opinion.

1. The Law of Irrationality

  • Law: People are often dominated by emotions and behave irrationally without realizing it. This is the source of bad decisions and negative patterns in life.
  • Example: Athens prospered in the 5th century BC, when it was led by Pericles, who is believed to…


With our recent release of Bitcoin-derived blockchain datasets, BigQuery now contains eight cryptocurrencies in total: Bitcoin, Bitcoin Cash, Zcash, Litecoin, Dogecoin, Dash, Ethereum, and Ethereum Classic. The graph below shows daily transaction counts for these blockchains:

Daily transaction counts. Scroll to the bottom for interactive versions of the graphs.

Ethereum clearly leads, with almost 600k transactions per day on average in Jan 2019. Its highest daily transaction counts were seen in Jan 2018, at almost 1.2M tx/day, or around 14 tx/sec on average.
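Turning a daily count into an average rate is a division by the 86,400 seconds in a day: 1,200,000 / 86,400 ≈ 13.9, hence roughly 14 tx/sec. The sketch below does that conversion and also shows how daily counts like the ones plotted could be derived from per-transaction UTC timestamps; the timestamps in the example are toy values, not real chain data.

```python
from collections import Counter
from datetime import datetime, timezone

SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def daily_counts(timestamps):
    """Count transactions per UTC calendar day from aware datetimes."""
    return Counter(ts.astimezone(timezone.utc).date() for ts in timestamps)


def average_tps(tx_per_day):
    """Average transactions per second implied by a daily transaction count."""
    return tx_per_day / SECONDS_PER_DAY


if __name__ == "__main__":
    print(round(average_tps(1_200_000), 1))  # 13.9
    toy = [
        datetime(2018, 1, 4, 10, 0, tzinfo=timezone.utc),
        datetime(2018, 1, 4, 23, 59, tzinfo=timezone.utc),
        datetime(2018, 1, 5, 0, 1, tzinfo=timezone.utc),
    ]
    print(daily_counts(toy))  # two transactions on 2018-01-04, one on 2018-01-05
```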

You can also see the Ethereum DAO and the Bitcoin Cash forks on the top graph. …


In this article, I will guide you through creating an ERC20 token recommendation system built with TensorFlow, Cloud Machine Learning Engine, Cloud Endpoints, and App Engine. The solution is based on a tutorial article by Google. The data used to train the recommendation system comes from our public Ethereum dataset in BigQuery.

The article is broken down into the following parts:

  1. Intro to collaborative filtering for recommendation systems.
  2. Creating and training the model for the token recommendation system.
  3. Tuning hyperparameters in Cloud ML Engine.
  4. Deploying the recommendation system to Cloud Endpoints and App Engine.
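The article's model follows Google's tutorial. As a much simpler stand-in that shows the collaborative-filtering idea itself, here is an item-based variant: two tokens are similar when the same addresses hold them (cosine similarity over binary holder sets), and an address is recommended unheld tokens that are most similar to the ones it already holds. This is a swapped-in technique for illustration only, and the addresses and holdings in the example are made up.

```python
from math import sqrt


def token_similarity(holdings):
    """Cosine similarity between tokens, based on which addresses hold them.

    holdings: dict mapping address -> set of token symbols it holds.
    """
    holders = {}
    for address, tokens in holdings.items():
        for token in tokens:
            holders.setdefault(token, set()).add(address)
    sims = {}
    for a in holders:
        for b in holders:
            if a != b:
                overlap = len(holders[a] & holders[b])
                sims[(a, b)] = overlap / sqrt(len(holders[a]) * len(holders[b]))
    return sims


def recommend(address, holdings, top_n=3):
    """Rank tokens the address doesn't hold by summed similarity to tokens it does."""
    sims = token_similarity(holdings)
    owned = holdings[address]
    all_tokens = {t for tokens in holdings.values() for t in tokens}
    scores = {
        candidate: sum(sims[(candidate, held)] for held in owned)
        for candidate in all_tokens - owned
    }
    # Highest score first; ties broken alphabetically for deterministic output.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [token for token, score in ranked[:top_n] if score > 0]


if __name__ == "__main__":
    holdings = {
        "0xaaa": {"DAI", "MKR"},
        "0xbbb": {"DAI", "MKR", "ZRX"},
        "0xccc": {"DAI", "USDC"},
        "0xddd": {"MKR"},
    }
    print(recommend("0xddd", holdings))  # ['DAI', 'ZRX']
```

Matrix factorization, as in the tutorial the article builds on, scales better and generalises across sparse data; the item-based version is shown only because its logic fits in a few lines.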

Intro to collaborative filtering for recommendation systems

The collaborative filtering


The Gini coefficient, also known as the Gini index, is a common econometric tool for measuring inequality of asset distribution.

Here is the query that outputs the Gini coefficient for each day, given daily non-zero (anonymous) account balances:

It uses the formula G = 1 - 2B from this Wikipedia page https://en.wikipedia.org/wiki/Gini_coefficient, where B is the area under the Lorenz curve:

  • balance * (rank - 1) is the area of the rectangular horizontal slice under the Lorenz curve.
  • balance / 2 is the area of the triangle on the left of the rectangular slice.
  • all slices are then summed: sum((balance * (rank -…
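The same G = 1 - 2B calculation can be sketched in Python for a single day's balances. Two details here are our reading of the formula rather than something stated above: for the rectangle-plus-triangle decomposition to yield B, rank must count from the largest balance (rank 1 = largest), and the summed slice areas are divided by n × total balance so that both axes of the Lorenz plot run from 0 to 1. As a check, equal balances give 0, and [1, 2, 3] gives 2/9, matching the mean-absolute-difference definition of the Gini coefficient.

```python
def gini(balances):
    """Gini coefficient via G = 1 - 2B, where B is the area under the Lorenz curve.

    Balances are ranked in descending order (rank 1 = largest). Each address
    contributes a horizontal slice: a rectangle of area balance * (rank - 1)
    plus a triangle of area balance / 2, in units where the plot is n wide and
    the total balance tall.
    """
    values = sorted((b for b in balances if b > 0), reverse=True)  # non-zero balances only
    if not values:
        raise ValueError("need at least one non-zero balance")
    n = len(values)
    total = sum(values)
    area = sum(b * (rank - 1) + b / 2 for rank, b in enumerate(values, start=1))
    b_area = area / (n * total)  # rescale to the unit square
    return 1 - 2 * b_area


if __name__ == "__main__":
    print(gini([5, 5, 5, 5]))  # 0.0
    print(gini([1, 2, 3]))     # about 0.2222 (2/9)
```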

Evgeny Medvedev

Creator of https://github.com/blockchain-etl, Co-founder of https://d5.ai and https://nansen.ai, Google Cloud GDE, AWS Certified Solutions Architect
