Exporting and Analyzing Ethereum Blockchain

  1. When did the first in history ERC20 transfer happen?
  2. How many ERC20 compatible contracts are there on the blockchain?
  3. What are the most popular tokens?
  4. What fraction of ERC20 transfers are sent to new addresses, i.e. addresses that had 0 balance before?
  5. In what order are transactions included in a block in relation to their gas price?
  6. What fraction of all transactions are calls to contracts?
  7. What was the highest transaction throughput?
  8. What is the total Ether volume?
  9. Token balance for any address on any date?
  10. What is the total gas used in all transactions?

Exporting Data to CSV

I exported the first 5 million blocks and the whole process took me:

> nohup geth --cache=1024 &
> geth attach 
> eth.syncing
currentBlock: 5583296,
highestBlock: 5583377,
knownStates: 65750401,
pulledStates: 65729512,
startingBlock: 5268399
  • Clone Ethereum ETL
> git clone https://github.com/medvedev1088/ethereum-etl
  • Install dependencies
> cd ethereum-etl
> sudo apt-get install python3-pip
> pip3 install -r requirements.txt
  • Run Ethereum ETL (you may want to comment out the parts that you don’t need in export_all.sh before running it):
> nohup bash export_all.sh -s 0 -e 4999999 -b 100000 -p file://$HOME/.ethereum/geth.ipc -o output &
> aws configure
AWS Access Key ID:
  • Upload the files to S3:
> cd output
> aws s3 sync . s3://<your_bucket>/

Running SQLs in AWS Athena

Amazon Athena is an interactive query service that makes it easy to analyze data in Amazon S3 using standard SQL. Athena is serverless, so there is no infrastructure to manage, and you pay only for the queries that you run. You can also connect to it with the JDBC driver.

CREATE DATABASE ethereumetl;
  • Run sanity check queries:
AND end_block >= $query_start_block
AND start_block <= $query_end_block;
  • Token balance for any address on any block height:
> SELECT token_address, sum(value) AS balance
(SELECT token_address,
to_address AS token_address
FROM token_transfers
SELECT token_address,
from_address AS token_addres
FROM token_transfers)
WHERE token_address = '0x2cd7e2e65bb6ef5f233f9dea83736cf8195fe70d'
AND block_number < 5600000
GROUP BY token_address
token_address balance
0x809826cceab68c387726af962713b64cb5cb3cca 0
0xd3ace836e47f7cf4948dffd8ca2937494c52580c 1500000000000000000000
0xf3e014fe81267870624132ef3a646b8e83853a96 7770000000000000000
0xa4e8c3ec456107ea67d3075bf9e3df3a75823db0 0 0x0947b0e6d821378805c9598291385ce7c791a6b2 0
  • The first ERC20 transfer:
> SELECT tx_hash
FROM token_transfers
WHERE start_block < 1000000
ORDER BY block_number limit 1
  • How many ERC20 compatible contracts are there on the blockchain?
> SELECT count(distinct(token_address))
FROM token_transfers
(SELECT count(1)
FROM transactions
WHERE gas != 21000) AS DECIMAL(38, 10))/
(SELECT count (1)
FROM transactions)
  • What fraction of ERC20 transfers are sent to new addresses, i.e. addresses that had 0 balance before the transfer? (48%)
(SELECT count(1)
(SELECT count(1) AS transfer_count
FROM token_transfers
GROUP BY token_address, to_address)) AS DECIMAL(38, 10)) /
(SELECT count(1)
FROM token_transfers)
  • What was the highest transaction throughput, assuming 15 seconds block time? (25 transactions per second)
> select max(transaction_count) / 15 from blocks25
  • Maximum number of ERC20 transfers per transaction (1000):
> SELECT count(1), tx_hash
FROM token_transfers
GROUP BY tx_hash
ORDER BY 1 DESC limit 1
  • Total Ether volume (5.8 billion Ether ~= $2.9 trillion for $500/Ether):
> select sum(value) from transactions5752307481068270840842025778
  • Total value of gas used (746 thousand Ether ~= $373 million ) (this should use receipts.gas_used instead of transactions.gas):
> select sum(CAST(gas AS DECIMAL(38, 0))  * gas_price) from transactions745703410653020991704491

Visualizing the Data

Amazon QuickSight is a cloud-powered business analytics service that makes it easy to build visualizations, perform ad-hoc analysis, and get business insights from your data. It allows querying data from Amazon Redshift, Amazon RDS, Amazon Athena, Amazon S3, and Amazon EMR (Presto and Apache Spark); connect to databases like SQL Server, MySQL, and PostgreSQL, in the cloud or on-premises.

  • Sign in to QuickSight console https://us-east-1.quicksight.aws.amazon.com/sn/start
  • Go to Manage QuickSight on the top right > Account Settings > Edit AWS Permissions. Check the box for Amazon Athena, Amazon S3 and select all S3 buckets
  • Return to the QuickSight dashboard and click on Manage Data
  • Click on New Data Set and select Athena
  • Enter the Data source name and click on Create Data Source
  • Select ethereumetl database, then blocks table
  • TODO
  • Transaction Count
  • ERC20 Transfer Count
  • ERC20 Tokens with Greatest Number of Transfers



Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store
Evgeny Medvedev

Evgeny Medvedev

Creator of https://github.com/blockchain-etl, Co-founder of https://d5.ai and https://nansen.ai, Google Cloud GDE, AWS Certified Solutions Architect