Glossary: Data Warehouse

What is a Data Warehouse?

A data warehouse is a centralized system for storing structured data from multiple sources, such as smart contract events, product usage, and marketing data, so it can be queried and analyzed together.

Data Warehouse Explained

Imagine every source of data about your DeFi protocol (wallet transactions, app signups, marketing campaign clicks, support tickets) living in separate filing cabinets that don't talk to each other. Answering a simple question like "Did our Twitter campaign lead to deposits?" means cross-referencing files by hand.

A data warehouse puts everything in one place, structured into tables that can be joined together. Smart contract events get indexed alongside offchain data like UTM parameters or signup forms, so a team can query across both at once instead of stitching spreadsheets together.
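
As a minimal sketch of what querying across both at once can look like, the snippet below uses Python's built-in sqlite3 module as a stand-in for a warehouse. The table names (deposits, signups), their columns, and the sample rows are hypothetical and chosen only for illustration; a real warehouse would have its own schema and SQL dialect.

```python
import sqlite3

# In-memory SQLite stands in for a real warehouse (assumption for illustration).
conn = sqlite3.connect(":memory:")

# Hypothetical schema: one table of indexed onchain events, one of offchain signups.
conn.executescript("""
CREATE TABLE deposits (wallet TEXT, amount_usd REAL);
CREATE TABLE signups  (wallet TEXT, utm_source TEXT);
""")

conn.executemany(
    "INSERT INTO deposits VALUES (?, ?)",
    [("0xaaa", 500.0), ("0xbbb", 1200.0), ("0xaaa", 250.0), ("0xccc", 75.0)],
)
conn.executemany(
    "INSERT INTO signups VALUES (?, ?)",
    [("0xaaa", "twitter"), ("0xbbb", "newsletter"), ("0xccc", "twitter")],
)


def deposits_by_source(conn):
    """Join onchain deposits to offchain signup data on wallet address."""
    query = """
        SELECT s.utm_source,
               COUNT(DISTINCT d.wallet) AS depositors,
               SUM(d.amount_usd)        AS total_usd
        FROM deposits AS d
        JOIN signups  AS s ON s.wallet = d.wallet
        GROUP BY s.utm_source
        ORDER BY s.utm_source
    """
    return conn.execute(query).fetchall()


rows = deposits_by_source(conn)
for source, depositors, total_usd in rows:
    print(source, depositors, total_usd)
```

The join key here is the wallet address, which assumes the signup flow captured it. Without a shared key like this, the two datasets cannot be connected no matter where they are stored.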

What a Data Warehouse Means for Each Audience

Audience: Data and analytics teams
Use case: Store onchain and offchain data together so it can be joined and queried in one place instead of across disconnected tools

Audience: Growth and marketing teams
Use case: Connect marketing touchpoints to onchain outcomes, like deposits or swaps, by querying both datasets from the same warehouse

Audience: DeFi protocol teams
Use case: Build a single source of truth for reporting, instead of reconciling numbers from a block explorer, a dashboard tool, and a spreadsheet separately

Examples

  • A DeFi protocol loads wallet transaction history and UTM campaign data into the same warehouse to measure which channel drove the most onchain conversions.

  • A growth team queries the warehouse to build a retention cohort by joining wallet addresses with signup dates from their web app.

  • A data team pipes raw onchain events into a warehouse nightly, so dashboards reflect activity through the previous day.

  • An analyst joins entity labels with transaction tables in the warehouse to exclude bot wallets from a usage report.
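
To make the last example concrete, here is a hedged sketch of excluding labeled bot wallets from a usage count. It again uses Python's sqlite3 module in place of a real warehouse, and the transactions and entity_labels tables, their columns, and the label value 'bot' are all assumptions made for illustration.

```python
import sqlite3

# In-memory SQLite stands in for a real warehouse (assumption for illustration).
conn = sqlite3.connect(":memory:")

# Hypothetical tables: raw transactions plus a wallet-to-label lookup.
conn.executescript("""
CREATE TABLE transactions  (wallet TEXT, tx_hash TEXT);
CREATE TABLE entity_labels (wallet TEXT, label TEXT);
""")

conn.executemany(
    "INSERT INTO transactions VALUES (?, ?)",
    [("0xaaa", "0x01"), ("0xbot", "0x02"), ("0xbot", "0x03"),
     ("0xbbb", "0x04"), ("0xaaa", "0x05")],
)
conn.executemany(
    "INSERT INTO entity_labels VALUES (?, ?)",
    [("0xbot", "bot"), ("0xbbb", "exchange")],
)


def usage_excluding_bots(conn):
    """Count transactions per wallet, dropping wallets labeled as bots."""
    # Anti-join: keep only transactions with no matching 'bot' label row.
    query = """
        SELECT t.wallet, COUNT(*) AS tx_count
        FROM transactions AS t
        LEFT JOIN entity_labels AS l
               ON l.wallet = t.wallet AND l.label = 'bot'
        WHERE l.wallet IS NULL
        GROUP BY t.wallet
        ORDER BY t.wallet
    """
    return conn.execute(query).fetchall()


report = usage_excluding_bots(conn)
for wallet, tx_count in report:
    print(wallet, tx_count)
```

Filtering on the label inside the join condition, rather than removing every labeled wallet, keeps wallets that carry other labels (the hypothetical 'exchange' row here) in the report.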

FAQs

Is a data warehouse the same as a database? Not exactly. An operational database is typically optimized for fast reads and writes of live application data, while a data warehouse is optimized for storing large volumes of historical data for analysis and reporting.

Why do DeFi protocols need a data warehouse for onchain data? Raw blockchain data is enormous, distributed across chains, and hard to query directly. A data warehouse organizes it into structured tables that can be analyzed efficiently alongside other business data.

What is the relationship between a data warehouse and SQL? SQL is the language most commonly used to query the data stored in a data warehouse.

Can onchain and offchain data live in the same warehouse? Yes, and this is often the point. Combining wallet activity with marketing, product, or support data is what allows teams to answer cross-functional questions.

How does data get into a warehouse? Through data pipelines that extract data from sources like blockchains, apps, or marketing tools, transform it into a consistent structure, and load it into the warehouse.
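
The extract, transform, and load steps described above (often abbreviated ETL) can be sketched in a few lines. This is an illustrative outline only: the source records are hard-coded rather than pulled from a real blockchain node or marketing tool, sqlite3 stands in for the warehouse, and every field and table name is hypothetical.

```python
import sqlite3
from datetime import datetime, timezone


def extract():
    """Return raw records from two hypothetical sources (hard-coded here)."""
    onchain = [{"from": "0xAAA", "value_wei": 2 * 10**18, "block_time": 1700000000}]
    marketing = [{"Wallet": " 0xaaa ", "utm_source": "Twitter",
                  "clicked_at": "2023-11-14T20:00:00+00:00"}]
    return onchain, marketing


def transform(onchain, marketing):
    """Give both sources one shape: lowercase wallets, UTC ISO-8601 timestamps."""
    events = [
        (
            r["from"].strip().lower(),
            r["value_wei"] / 10**18,  # convert wei to ETH
            datetime.fromtimestamp(r["block_time"], tz=timezone.utc).isoformat(),
        )
        for r in onchain
    ]
    clicks = [
        (
            r["Wallet"].strip().lower(),
            r["utm_source"].strip().lower(),
            datetime.fromisoformat(r["clicked_at"]).astimezone(timezone.utc).isoformat(),
        )
        for r in marketing
    ]
    return events, clicks


def load(conn, events, clicks):
    """Create the target tables if needed and insert the cleaned rows."""
    conn.executescript("""
        CREATE TABLE IF NOT EXISTS onchain_events
            (wallet TEXT, amount_eth REAL, event_time TEXT);
        CREATE TABLE IF NOT EXISTS campaign_clicks
            (wallet TEXT, utm_source TEXT, clicked_at TEXT);
    """)
    conn.executemany("INSERT INTO onchain_events VALUES (?, ?, ?)", events)
    conn.executemany("INSERT INTO campaign_clicks VALUES (?, ?, ?)", clicks)
    conn.commit()


# In-memory SQLite stands in for a real warehouse (assumption for illustration).
conn = sqlite3.connect(":memory:")
load(conn, *transform(*extract()))
```

The transform step is where the "consistent structure" comes from: once wallet addresses share one casing and timestamps share one format and time zone, the two tables can be joined without per-query cleanup.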