Unlocking Insights: Blob Data Analysis with Ethereum ETL Post-Duncan Upgrade

Evgeny Medvedev
5 min readMay 29, 2024

--

With the recent Duncan upgrade, the Ethereum ecosystem has seen substantial changes, particularly in the management of blob data. These changes demand updated ETL tools and methodologies. This article examines how the new enhancements to Ethereum ETL support advanced blob data analysis, showcasing practical uses and visualizations.

Enhancements to Ethereum ETL

To keep pace with the Duncan upgrade, important updates have been made to the Ethereum ETL project. You can review the details in the pull request here and the updated schema is available here.

The new fields added to each CSV file are as follows:

blocks.csv:

  • blob_gas_used (bigint): The total amount of blob gas consumed by transactions in the block.
  • excess_blob_gas (bigint): A running total of blob gas consumed in excess of the target, prior to the block. This is used to set blob gas pricing.

transactions.csv:

  • max_fee_per_blob_gas (bigint): The maximum fee a user is willing to pay per blob gas.
  • blob_versioned_hashes (string): A list of hashed outputs from kzg_to_versioned_hash.

receipts.csv:

  • blob_gas_price (bigint): The price of blob gas in a transaction.
  • blob_gas_used (bigint): The amount of blob gas used in a transaction.

These new fields enable users to extract detailed blob data and perform sophisticated analyses. The enhancements to the Postgres exporter now support these new blob fields, allowing efficient querying and analysis using the Postgres database. For more details, you can check the pull request here.

Additionally, new fields have been added to the BigQuery public datasets:

bigquery-public-data.crypto_ethereum.blocks:

  • blob_gas_used (INT64): The total amount of blob gas consumed by transactions in the block.
  • excess_blob_gas (INT64): A running total of blob gas consumed in excess of the target, prior to the block. This is used to set blob gas pricing.

bigquery-public-data.crypto_ethereum.transactions:

  • max_fee_per_blob_gas (INT64): The maximum fee a user is willing to pay per blob gas.
  • blob_versioned_hashes (STRING, REPEATED): A list of hashed outputs from kzg_to_versioned_hash.
  • receipt_blob_gas_price (INT64): Blob gas price in a transaction.
  • receipt_blob_gas_used (INT64): Blob gas used in a transaction.

Querying and Analysing the Data

To demonstrate the practical applications of these enhancements, let’s consider two key analyses: daily gas spend and blob submitter analysis. You can find the associated Google Colab notebook with interactive charts here.

Daily Gas Spend Analysis

We queried the Ethereum blockchain to calculate the daily gas spend. Using Plotly, we created an interactive plot to visualize this data.

import pandas as pd
import plotly.graph_objects as go
from google.colab import auth
from google.cloud import bigquery

# Authenticate to Google Cloud
auth.authenticate_user()

# Create a BigQuery client
client = bigquery.Client(project='<your_gcp_project_id>')

# Define the SQL query
query = """
SELECT
DATE(block_timestamp) AS date,
SUM((CAST(receipt_blob_gas_used AS NUMERIC) * CAST(receipt_blob_gas_price AS NUMERIC))) AS gas_spend
FROM bigquery-public-data.crypto_ethereum.transactions
WHERE DATE(block_timestamp) > '2024-04-13'
GROUP BY date
ORDER BY date DESC
"""

# Run the query
query_job = client.query(query)
results = query_job.result()
df = results.to_dataframe()

# Create the interactive plot
fig = go.Figure()

fig.add_trace(go.Scatter(
x=df['date'],
y=df['gas_spend'],
mode='lines+markers',
name='Gas Spend',
marker=dict(symbol='circle', size=6)
))

fig.update_layout(
title='Daily Ethereum Gas Spend',
xaxis_title='Date',
yaxis_title='Gas Spend',
xaxis=dict(
tickformat='%Y-%m-%d',
tickangle=45
),
height=600,
template='plotly_white'
)

# Show the plot
fig.show()
Interactive chart available here

Blob Submitters Analysis

We analyzed the blob submitters and visualized their activity with an interactive stacked bar chart.

# Define the SQL query
query = """
-- Leveraging the fantastic labeled data from https://dune.com/queries/3521610
WITH labeled_data AS (
WITH layer_twos AS (
SELECT address, entity, x_username
FROM UNNEST([
STRUCT('0x0d3250c3d5facb74ac15834096397a3ef790ec99' AS address, 'zkSync' AS entity, 'zksync' AS x_username),
STRUCT('0x2c169dfe5fbba12957bdd0ba47d9cedbfe260ca7' AS address, 'StarkNet' AS entity, 'Starknet' AS x_username),
STRUCT('0x6887246668a3b87f54deb3b94ba47a6f63f32985' AS address, 'OP Mainnet' AS entity, 'Optimism' AS x_username),
STRUCT('0x5050f69a9786f081509234f1a7f4684b5e5b76c9' AS address, 'Base' AS entity, 'base' AS x_username),
STRUCT('0xc1b634853cb333d3ad8663715b08f41a3aec47cc' AS address, 'Arbitrum' AS entity, 'arbitrum' AS x_username),
STRUCT('0x625726c858dbf78c0125436c943bf4b4be9d9033' AS address, 'Zora' AS entity, 'ourZORA' AS x_username),
STRUCT('0x889e21d7ba3d6dd62e75d4980a4ad1349c61599d' AS address, 'Aevo' AS entity, 'aevoxyz' AS x_username),
STRUCT('0x41b8cd6791de4d8f9e0eaf7861ac506822adce12' AS address, 'Kroma' AS entity, 'kroma_network' AS x_username),
STRUCT('0x14e4e97bdc195d399ad8e7fc14451c279fe04c8e' AS address, 'Lyra' AS entity, 'lyrafinance' AS x_username),
STRUCT('0x99199a22125034c808ff20f377d91187e8050f2e' AS address, 'Mode' AS entity, 'modenetwork' AS x_username),
STRUCT('0x415c8893d514f9bc5211d36eeda4183226b84aa7' AS address, 'Blast' AS entity, 'Blast_L2' AS x_username),
STRUCT('0x6017f75108f251a488b045a7ce2a7c15b179d1f2' AS address, 'Fraxtal' AS entity, 'fraxfinance' AS x_username),
STRUCT('0x99526b0e49a95833e734eb556a6abaffab0ee167' AS address, 'PGN' AS entity, 'pgn_eth' AS x_username),
STRUCT('0xc70ae19b5feaa5c19f576e621d2bad9771864fe2' AS address, 'Paradex' AS entity, 'tradeparadex' AS x_username),
STRUCT('0xa9268341831eFa4937537bc3e9EB36DbecE83C7e' AS address, 'Linea' AS entity, 'LineaBuild' AS x_username),
STRUCT('0xc94c243f8fb37223f3eb2f7961f7072602a51b8b' AS address, 'Metal' AS entity, 'Metal_L2' AS x_username),
STRUCT('0xe1b64045351b0b6e9821f19b39f81bc4711d2230' AS address, 'Boba Network' AS entity, 'bobanetwork' AS x_username),
STRUCT('0x08f9f14ff43e112b18c96f0986f28cb1878f1d11' AS address, 'Camp Network' AS entity, 'Camp_L2' AS x_username),
STRUCT('0x5ead389b57d533a94a0eacd570dc1cc59c25f2d4' AS address, 'Public Goods Network' AS entity, 'pgn_eth' AS x_username),
STRUCT('0xcf2898225ed05be911d3709d9417e86e0b4cfc8f' AS address, 'Scroll' AS entity, 'Scroll_ZKP' AS x_username),
STRUCT('0xa6ea2f3299b63c53143c993d2d5e60a69cd6fe24' AS address, 'Lisk' AS entity, 'LiskHQ' AS x_username),
STRUCT('0x3d0bf26e60a689a7da5ea3ddad7371f27f7671a5' AS address, 'Optopia' AS entity, 'Optopia_AI' AS x_username),
STRUCT('0x5c53f2ff1030c7fbc0616fd5b8fc6be97aa27e00' AS address, 'Lumio' AS entity, 'PontemNetwork' AS x_username),
STRUCT('0x1fd6a75cc72f39147756a663f3ef1fc95ef89495' AS address, 'opBNB' AS entity, 'BNBCHAIN' AS x_username),
STRUCT('0xa76e31d8471d569efdd3d95d1b11ce6710f4533f' AS address, 'Manta' AS entity, 'MantaNetwork' AS x_username),
STRUCT('0x84bdfb21ed7c8b332a42bfd595744a84f3101e4e' AS address, 'Karak' AS entity, 'Karak_Network' AS x_username),
STRUCT('0x994c288de8418c8d3c5a4d21a69f35bf9641781c' AS address, 'Hypr' AS entity, 'hypr_network' AS x_username),
STRUCT('0x6079e9c37b87fe06d0bde2431a0fa309826c9b67' AS address, 'Ancient8' AS entity, 'Ancient8_gg' AS x_username),
STRUCT('0x2f6afe2e3fea041b892a6e240fd1a0e5b51e8376' AS address, 'Mantle' AS entity, '0xMantle' AS x_username),
STRUCT('0xcdf02971871b7736874e20b8487c019d28090019' AS address, 'Metis' AS entity, 'MetisL2' AS x_username),
STRUCT('0xf8db8aba597ff36ccd16fecfbb1b816b3236e9b8' AS address, 'Orderly' AS entity, 'OrderlyNetwork' AS x_username),
STRUCT('0xdec273bf31ad79ad00d619c52662f724176a12fb' AS address, 'Lambda' AS entity, 'Lambdaim' AS x_username),
STRUCT('0x68bdfece01535090c8f3c27ec3b1ae97e83fa4aa' AS address, 'Mint' AS entity, 'Mint_Blockchain' AS x_username),
STRUCT('0x000000633b68f5d8d3a86593ebb815b4663bcbe0' AS address, 'Taiko' AS entity, 'taikoxyz')
]) AS layer_twos
)
SELECT address, entity, x_username
FROM layer_twos

UNION ALL

SELECT DISTINCT et.from_address AS address, 'Inscriptions' AS entity, '' AS x_username
FROM `bigquery-public-data.crypto_ethereum.transactions` et
LEFT JOIN layer_twos lt ON lt.address = et.from_address
WHERE lt.address IS NULL
AND et.block_number >= 19426589
AND et.block_timestamp >= '2024-03-13'
AND et.transaction_type = 3
AND LEFT(SAFE_CAST(FROM_HEX(SUBSTR(et.input, 3)) AS STRING), 5) = 'data:'
)
SELECT
DATE(block_timestamp) AS day,
COALESCE(ld.entity, 'other') AS blob_submitter_label,
SUM(ARRAY_LENGTH(blob_versioned_hashes)) AS blobs,
SUM(SUM(ARRAY_LENGTH(blob_versioned_hashes))) OVER (PARTITION BY COALESCE(ld.entity, 'other')) AS total_blobs
FROM `bigquery-public-data.crypto_ethereum.transactions` et
LEFT JOIN labeled_data ld ON et.from_address = ld.address
WHERE block_timestamp > TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 30 DAY)
GROUP BY day, ld.entity
ORDER BY total_blobs DESC;
"""

# Run the query
query_job = client.query(query)
results = query_job.result()
df = results.to_dataframe()
Interactive chart available here

Key insights from the “Blob Submitters by Day” chart include:

  1. Dominant Blob Submitters: Entities such as Base, Arbitrum, and OP Mainnet consistently contribute a significant number of blobs each day, indicating their active participation in the network.
  2. Fluctuations in Activity: Noticeable peaks and troughs in daily blob submissions suggest varying levels of activity, with significant peaks around certain dates.
  3. Diverse Participation: The variety of entities contributing to blob submissions highlights the decentralized nature of the network.

Conclusion

The enhancements to Ethereum ETL bring several key benefits to the Ethereum community:

  1. Extract Relevant Blob Data: The new fields added to Ethereum ETL enable everyone to extract detailed and relevant blob data from the blockchain. For more details, see the pull requests here and here.
  2. Query from Public Datasets: These enhancements also allow users to query the updated data directly from the public datasets in BigQuery.
  3. Easily Analyze and Visualize Data: Using the GCP SDK and Python notebooks, users can easily analyze and visualize the data. You can find the associated Google Colab notebook with interactive charts here.

By leveraging these improvements, researchers, developers, and analysts can gain deeper insights into Ethereum’s blob data, fostering innovation in the ecosystem.

--

--