

Key Takeaways
Growth experiments require a defined hypothesis, a single variable, and a baseline comparison, separating real learning from campaign-driven metric spikes that only simulate success
DeFi growth experiments isolate the activation, usage, and retention stages, ensuring each test targets one funnel layer with a clear primary metric and a controlled cohort
False positives dominate DeFi experiments when incentives, missing controls, or peak-based evaluation replace baseline comparison and post-experiment floor measurement
Why Most DeFi Teams Run Campaigns, Not Experiments
Most DeFi growth activity is not experimentation. It is repeated campaign execution without a hypothesis, a control, or a defined success condition. A team launches a liquidity incentive, watches TVL rise, calls it a success, and runs another one. Nothing was tested. Nothing was learned. The same cliff appears after every campaign because the underlying questions were never asked.
A growth experiment is something more specific: a planned change designed to test one hypothesis about user behaviour, with a defined metric, a defined time window, and a way to separate signal from noise. The difference matters because campaigns optimise for peaks and experiments build understanding. Peaks fade. Understanding compounds.
DeFi teams face a genuine challenge here. The onchain environment makes experimentation harder than in Web2: actions are irreversible, incentives distort behaviour, governance slows iteration, and the user base is pseudonymous. These constraints do not make experimentation impossible. They make disciplined experiment design more important, not less. This article covers how to do it. For the broader context of where experiments sit in post-launch growth, see the DeFi post-launch growth guide.
What a Growth Experiment Means in DeFi
A growth experiment in DeFi is a planned change to one variable in your protocol's growth system, designed to test a single hypothesis about user behaviour, with a defined primary metric and a defined baseline to compare against.
Every word in that definition is doing work:
Planned change: not a campaign run because the team needs activity, but a deliberate intervention chosen because it tests something specific.
One variable: changing multiple things simultaneously makes it impossible to know what caused any observed effect.
Single hypothesis: a statement of the form 'we believe that changing X will cause Y to increase, because Z.' If you cannot write this sentence, you are not running an experiment.
Defined primary metric: one metric that represents real behaviour change at the stage being tested. Not a dashboard of ten metrics.
Defined baseline: the pre-experiment level of the primary metric, which post-experiment results are compared against, not against the peak during the experiment.
Distinction: A campaign asks: how do we generate activity? An experiment asks: does this specific change cause more users to do this specific thing durably? Both have their place. Only one builds knowledge.
How to Form DeFi Growth Hypotheses
A testable hypothesis has three parts: the change, the expected behaviour, and the reason. The reason is the most important and most frequently omitted part. Without it, a confirmed result teaches you nothing transferable, and a failed result gives you no direction for the next test.
The hypothesis format to use:
Template: We believe that [specific change] will cause [specific user behaviour metric] to [increase/decrease], because [reasoning about why this change affects this behaviour].
Examples of well-formed DeFi growth hypotheses:
| Hypothesis | What It Tests | Why the Reasoning Matters |
| --- | --- | --- |
| Adding a portfolio performance summary to the dashboard will increase D30 wallet return rate, because users with visible accumulated returns have a reason to check back. | Whether portfolio visibility creates return motivation, distinct from yield or position management. | If confirmed, dashboards are a retention tool. If not, return motivation is driven by something else, narrowing what to test next. |
| Sending a position health alert when a lending health factor drops below 1.5 will increase the rate of users topping up positions, because it converts a passive position into one that requires attention. | Whether triggered communication creates the recurring engagement loop that a lending position implies but does not automatically produce. | If confirmed, alerts are a retention mechanic worth investing in at scale. If not, users who would act are already monitoring manually. |
Notice what these hypotheses do not contain: vague goals like 'improve engagement' or 'increase liquidity.' Each one names a specific change, a specific metric, and a specific reason. The reason is what lets you learn from the result regardless of whether it confirms or refutes the hypothesis.
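To make the format enforceable in an experiment backlog, some teams capture each hypothesis as a structured record rather than a sentence in a document. A minimal sketch in Python, assuming a simple in-house representation (the field names here are illustrative, not part of any specific tool):

```python
from dataclasses import dataclass

@dataclass
class GrowthHypothesis:
    """One testable hypothesis: the change, the expected behaviour, and the reason."""
    change: str              # the single variable being changed
    primary_metric: str      # the one metric that represents real behaviour change
    expected_direction: str  # "increase" or "decrease"
    reason: str              # why the change should affect the metric

    def statement(self) -> str:
        return (f"We believe that {self.change} will cause {self.primary_metric} "
                f"to {self.expected_direction}, because {self.reason}.")

# Example drawn from the table above
h = GrowthHypothesis(
    change="adding a portfolio performance summary to the dashboard",
    primary_metric="D30 wallet return rate",
    expected_direction="increase",
    reason="users with visible accumulated returns have a reason to check back",
)
print(h.statement())
```

If the reason field is empty, the record is not a hypothesis yet, which is exactly the discipline the template is meant to enforce.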
Designing Experiments Across Activation, Usage, and Retention
The right experiment design depends on which stage of the growth funnel is being tested. Each stage has different variables, different metrics, and different time windows. Mixing them produces results that are uninterpretable. For the full funnel context, see the DeFi growth funnel guide.
Activation Experiments
Activation experiments test changes to the connect-to-first-transaction journey. The primary metric is always the connect-to-transaction rate: the percentage of wallets that connect and complete a meaningful first onchain action within a defined window, typically seven days. Secondary metrics like time-to-first-transaction and drop-off by funnel stage help diagnose why the primary metric moved, but they do not replace it.
Activation experiments are among the most tractable in DeFi because they test changes to the product interface, not to the protocol itself. They can be run without governance, without contract changes, and without token spend. The variables to test include gas estimate placement, first-action defaults, trust signal positioning, and wallet confirmation guidance. See the full set of high-priority activation experiments in the onchain activation guide.
Design rule: Activation experiments need a clean cohort of new wallets, not all wallets. Mixing returning users into an activation experiment contaminates the sample and inflates the apparent conversion rate.
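As an illustration of the primary metric, here is a minimal sketch of computing the connect-to-first-transaction rate for a clean cohort of new wallets, assuming you already have connect and transaction events as (address, timestamp) pairs; the names and structure are hypothetical, not a specific tool's API:

```python
from datetime import timedelta

def connect_to_transaction_rate(connect_events, tx_events, window_days=7,
                                known_wallets=frozenset()):
    """Share of newly connecting wallets that complete a first onchain action
    within `window_days` of their first connect. `known_wallets` excludes
    returning users so the cohort contains only new wallets."""
    first_connect = {}
    for wallet, ts in connect_events:          # (address, datetime) pairs
        if wallet in known_wallets:
            continue                           # keep the activation cohort clean
        if wallet not in first_connect or ts < first_connect[wallet]:
            first_connect[wallet] = ts

    activated = set()
    for wallet, ts in tx_events:
        start = first_connect.get(wallet)
        if start and start <= ts <= start + timedelta(days=window_days):
            activated.add(wallet)

    return len(activated) / len(first_connect) if first_connect else 0.0
```

The `known_wallets` exclusion is the code equivalent of the design rule above: returning users never enter the denominator.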
Usage Experiments
Usage experiments test changes that affect how deeply activated users engage with the protocol, measured by transaction frequency, position size, or feature adoption. The primary metric is transactions per active wallet per period. Usage experiments are harder to isolate than activation experiments because they require users who have already activated, and usage behaviour is more influenced by market conditions than activation behaviour.
The variables worth testing at the usage stage include: feature discoverability (does surfacing a feature in the UI increase its adoption?), cross-feature pathways (does guiding a swapper toward LP provision increase their transaction frequency?), and communication triggers (does an alert about a portfolio opportunity produce a transaction?). Each of these has a clear hypothesis and a measurable primary metric.
Design rule: Usage experiments require longer time windows than activation experiments. A single week is insufficient to distinguish a genuine increase in transaction frequency from normal variance. Run usage experiments for at least three to four weeks before reading results.
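A sketch of the primary usage metric, transactions per active wallet per 30-day window, assuming transaction events as (address, timestamp) pairs; computing it for consecutive windows is what lets you see whether a lift persists beyond normal week-to-week variance:

```python
from collections import Counter
from datetime import datetime, timedelta

def tx_per_active_wallet(tx_events, window_start, window_days=30):
    """Average transactions per active wallet in one observation window."""
    window_end = window_start + timedelta(days=window_days)
    counts = Counter(
        wallet for wallet, ts in tx_events
        if window_start <= ts < window_end
    )
    return sum(counts.values()) / len(counts) if counts else 0.0

# Compare a pre-experiment window with a post-experiment window, not a single week:
# pre  = tx_per_active_wallet(events, datetime(2024, 1, 1))
# post = tx_per_active_wallet(events, datetime(2024, 3, 1))
```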
Retention Experiments
Retention experiments test changes that affect whether activated users return and transact again, measured by D30 wallet return rate. Retention is the hardest stage to experiment on because the time window is long, the signal is noisy, and many external factors (market movements, competitor incentives, macro conditions) affect whether users return independently of what the protocol does.
The most reliable retention experiments test specific communication interventions (alerts, portfolio summaries) and specific product features (position management tools, performance dashboards) on isolated cohorts. The control group is a matched cohort of similar wallets who do not receive the intervention. Without a control, it is almost impossible to separate the experiment's effect from background retention trends. The onchain retention guide covers which product design choices create the recurring value that retention experiments are typically trying to validate.
Design rule: Retention experiments must be measured against a baseline period without the intervention, or against a control cohort. Comparing D30 return rates during a campaign to D30 return rates during the retention experiment is not a valid comparison: campaign periods inflate baseline behaviour.
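A sketch of the D30 return-rate comparison between an experiment cohort and a matched control cohort, assuming each cohort is a mapping of wallet address to activation date and `tx_events` holds (address, timestamp) transactions; the structure is illustrative, not prescriptive:

```python
from datetime import timedelta

def d30_return_rate(cohort, tx_events):
    """Share of wallets in `cohort` (address -> activation date) that transact
    again between day 1 and day 30 after activation."""
    returned = set()
    for wallet, ts in tx_events:
        activated_at = cohort.get(wallet)
        if activated_at and activated_at + timedelta(days=1) <= ts <= activated_at + timedelta(days=30):
            returned.add(wallet)
    return len(returned) / len(cohort) if cohort else 0.0

# The experiment result is the gap between the two cohorts, not the raw rate:
# lift = d30_return_rate(experiment_cohort, txs) - d30_return_rate(control_cohort, txs)
```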
Experiment Constraints Unique to DeFi
Three structural constraints make growth experimentation in DeFi meaningfully different from Web2. Understanding them before designing experiments prevents the most common failure modes.
Irreversible Transactions
Onchain actions cannot be undone. A Web2 product team that ships a broken feature can roll it back within minutes. A DeFi team that introduces a flawed incentive mechanism, a misconfigured smart contract, or a poorly designed LP incentive cannot reverse the transactions that have already occurred. This asymmetry has a direct implication for experiment design: the downside of a failed DeFi experiment can be much larger than in Web2, and the failure is permanent.
The practical response is to stage experiments by risk level. UI and communication changes carry low risk and can be deployed broadly. Incentive structure changes carry medium risk and should be tested on smaller cohorts with defined caps. Smart contract changes carry high risk and require audit, staged deployment, and defined exit conditions before any experiment goes live.
Incentives Distort Behaviour
Token incentives are both a DeFi growth tool and a DeFi experimentation problem. Any experiment that includes a token reward will attract behaviour that is optimising for the reward, not for the product. This makes it extremely difficult to measure whether a product change caused behaviour to shift, or whether the incentive did. Incentive-contaminated experiments almost always produce false positives.
The discipline required is to separate incentive experiments from product experiments and never run both simultaneously on the same cohort. If you want to know whether a UI change improves activation, test it on a cohort with no active incentive running. If you want to know whether an incentive improves retention, test it on a cohort where the product is otherwise unchanged. Mixing the two makes both results uninterpretable. The analysis of how DeFi incentive programs shape growth covers the incentive distortion problem in detail.
Governance and Contracts Slow Iteration
Protocol-level changes in DeFi often require governance votes, timelock periods, or multisig approvals. This means the iteration speed available to a DeFi growth team is structurally slower than what a Web2 product team can achieve. A hypothesis that requires a smart contract change may take weeks or months to test, by which time market conditions have shifted and the result is harder to interpret.
The practical response is to distinguish between experiments that require protocol changes and those that do not, and to prioritise the latter heavily in the experiment backlog. UI changes, communication experiments, and frontend defaults can all be tested without touching contracts or governance. These should form the majority of any DeFi experiment programme, with protocol-level experiments reserved for hypotheses that cannot be tested any other way.
What to Measure in Each Experiment
The measurement framework for a DeFi growth experiment has three layers: the primary metric, the leading indicator, and the guardrail metric. All three should be defined before the experiment runs, not after. For a full reference on which metrics matter at each stage, see the DeFi KPIs guide.
| Experiment Stage | Primary Metric | Leading Indicator | Guardrail Metric |
| --- | --- | --- | --- |
| Activation | Connect-to-first-transaction rate within 7 days | Initiation rate: % of connected wallets that begin a transaction flow | Confirmation-to-completion rate: an activation experiment should not improve connect-to-transaction by pushing users through steps they are not ready for |
| Usage | Transactions per active wallet per 30 days | Feature adoption rate: % of active wallets using the feature being tested | Average transaction value: usage frequency should not increase at the cost of transaction size declining sharply |
| Retention | D30 wallet return rate for the experiment cohort vs control cohort | Re-engagement rate: % of dormant wallets that transact again within 7 days of the intervention | Churn rate in the period following re-engagement: re-engaged users who churn immediately are not retained, they are temporarily activated |
The guardrail metric is the most frequently omitted element. Its purpose is to catch experiments that appear to improve the primary metric while actually damaging something else. An activation experiment that increases connect-to-transaction rate by pushing confused users through confirmation steps they did not understand is not a win. The guardrail catches this by flagging if confirmation-to-completion drops.
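One way to make the three layers operational is to encode them as a single readout that refuses to call an experiment a win if the guardrail degrades. A minimal sketch, with illustrative threshold values rather than recommended ones:

```python
def read_experiment(primary_lift, guardrail_baseline, guardrail_observed,
                    min_primary_lift=0.05, max_guardrail_drop=0.02):
    """Return a verdict that respects the guardrail metric.
    primary_lift: absolute change in the primary metric vs its baseline.
    guardrail_*: the guardrail metric before and during the experiment."""
    guardrail_drop = guardrail_baseline - guardrail_observed
    if guardrail_drop > max_guardrail_drop:
        return "fail: guardrail degraded, primary lift not trustworthy"
    if primary_lift >= min_primary_lift:
        return "pass: primary metric improved without breaking the guardrail"
    return "inconclusive: no meaningful primary lift"
```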
How to Avoid False Positives in DeFi Experiments
False positives are the most common outcome of DeFi growth experiments run without discipline. The experiment appears to work. The metric rises. The team calls it a success and moves on. The baseline returns to its prior level within weeks. Nothing was actually learned and the team has slightly more confidence in an approach that does not work.
Five specific practices prevent false positives in DeFi experiments:
Compare Post-Experiment Baselines, Not Peak Numbers
The success condition of a DeFi growth experiment is whether the post-experiment baseline is higher than the pre-experiment baseline, not whether the metric peaked during the experiment. Any incentive, campaign, or novel product change will produce a short-term spike in almost any metric. The spike is not evidence that the change worked. The floor after the spike is the evidence.
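A sketch of the floor-versus-peak distinction, assuming a daily series of the primary metric that spans the pre-experiment period, the experiment itself, and several post-experiment weeks:

```python
from statistics import mean

def baseline_shift(daily_values, pre_days=28, post_days=28):
    """Compare the post-experiment floor to the pre-experiment baseline,
    ignoring whatever peak occurred in between."""
    pre_baseline = mean(daily_values[:pre_days])
    post_baseline = mean(daily_values[-post_days:])
    peak = max(daily_values)   # the highest single day, usually the campaign spike
    return {
        "pre_baseline": pre_baseline,
        "post_baseline": post_baseline,
        "peak": peak,                                   # reported, but not the success signal
        "durable_lift": post_baseline - pre_baseline,   # this is the evidence
    }
```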
Use Control Groups or Time-Based Comparisons
Without a comparison group, it is impossible to know whether the metric moved because of the experiment or because of an external factor. The ideal is a matched control cohort: wallets with similar characteristics who do not receive the intervention. Where cohort matching is not possible, a time-based comparison using the same metric from the prior period under identical market conditions is the next best option. No comparison group means no experiment.
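Where a control cohort is possible, matching should be done on behaviour before the experiment, not after. A simple sketch that pairs each experiment wallet with the unused candidate whose pre-period transaction count is closest; a single-dimension match is a crude proxy, and real matching would use more wallet characteristics:

```python
def match_controls(experiment_wallets, candidate_wallets, pre_period_tx_counts):
    """Greedy nearest-neighbour match on pre-experiment transaction count.
    pre_period_tx_counts: wallet address -> transactions before the experiment."""
    available = set(candidate_wallets)
    matches = {}
    for wallet in experiment_wallets:
        if not available:
            break
        target = pre_period_tx_counts.get(wallet, 0)
        best = min(available, key=lambda w: abs(pre_period_tx_counts.get(w, 0) - target))
        matches[wallet] = best
        available.remove(best)
    return matches
```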
Define the Success Condition
Define what a successful result looks like before the experiment runs, not after seeing the data. Post-hoc success conditions are one of the most reliable ways to turn a failed experiment into an apparent win. If the hypothesis is that connect-to-transaction rate will increase, decide in advance: by how much, over what time period, compared against what baseline. If the result does not meet that condition, the experiment did not confirm the hypothesis, even if some metric moved in the right direction.
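Writing the success condition down before the experiment can be as simple as a small, immutable record that is checked, unmodified, once the data arrives. A sketch with hypothetical values:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class SuccessCondition:
    """Pre-registered before the experiment runs; never edited afterwards."""
    metric: str
    baseline: float           # pre-experiment baseline level
    min_absolute_lift: float  # how much the post-experiment floor must rise
    observation_days: int     # how long after the experiment before reading the result

    def met(self, post_experiment_baseline: float) -> bool:
        return post_experiment_baseline - self.baseline >= self.min_absolute_lift

condition = SuccessCondition(
    metric="connect_to_transaction_rate_7d",
    baseline=0.12,
    min_absolute_lift=0.02,
    observation_days=28,
)
# After the observation window: condition.met(measured_post_baseline)
```

Making the record frozen is a small way of enforcing the rule: the condition cannot quietly shift after the data is in.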
Separate Incentive Effects from Product-Led Growth
Any metric that rises during an incentive period is measuring the incentive, not the product change. Run product experiments during periods with no active incentive programme on the same cohort, or explicitly account for the incentive in the experimental design by running matched incentive and non-incentive cohorts. Do not interpret a combined incentive-plus-product-change result as evidence for the product change.
Observe Over a Longer Period
DeFi user behaviour is particularly sensitive to novelty effects: users interact with new features at launch and then revert to prior patterns. An experiment that is read after one week may be measuring novelty, not genuine behaviour change. Most DeFi experiments need a minimum of three to four weeks of post-experiment observation before the result can be considered stable. Retention experiments need longer, because the D30 return window itself is 30 days.
👉 Running experiments without clean data? Formo connects frontend events to onchain transactions at the wallet level, so you can measure pre- and post-experiment baselines, cohort behaviour, and stage-level drop-off without building custom data infrastructure. See how it works.
How to Build a Growth Experiment Backlog in DeFi
An experiment backlog is a prioritised list of hypotheses waiting to be tested, organised by stage, ranked by expected impact and implementation effort, and maintained as a living document that the growth team works from.
Step 1: Run the Stage Diagnosis First
The experiment backlog should be built from data, not from opinions about what might work. Before generating hypotheses, measure where the biggest drop-off is in the current funnel: activation, engagement, or retention. The stage with the largest loss is where the first experiments should be concentrated. Running activation experiments when retention is the primary problem is a resource misallocation. See the post-launch growth diagnosis framework for the specific metrics and signals to use.
Step 2: Generate Hypotheses Against the Broken Stage
With the broken stage identified, generate five to ten hypotheses that could address it. Use the hypothesis format: change, expected behaviour, reason. Do not filter at this stage. The goal is to produce enough options that prioritisation is meaningful.
Step 3: Score Each Hypothesis
Score each hypothesis on three dimensions before deciding what to test first:
| Dimension | What to Assess | High Score | Low Score |
| --- | --- | --- | --- |
| Expected Impact | If the hypothesis is confirmed, how much does the primary metric move? | Closing a known, large drop-off in the funnel | Marginal optimisation of a step that is already converting well |
| Implementation Effort | What does running this experiment require? UI change, communication tool, contract change, governance? | Frontend or copy change only. No contract or governance involvement. | Requires contract deployment, audit, or governance vote |
| Learning Value | Regardless of result, how much does this experiment tell you about the underlying growth system? | Tests a core assumption about why users do or do not take a specific action | Tests a surface-level variation where either result is difficult to act on |
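One way to turn the three dimensions above into a run order is a simple composite score, for example impact times learning value divided by effort. The weighting and the example entries below are illustrative, not a standard:

```python
def backlog_score(impact, effort, learning_value):
    """Composite priority score: higher impact and learning value raise priority,
    higher effort lowers it. All inputs on a 1-5 scale."""
    return impact * learning_value / effort

# (name, impact, effort, learning_value) -- hypothetical backlog entries
hypotheses = [
    ("Gas estimate placement on the confirm screen", 4, 1, 3),
    ("Position health alerts below 1.5 health factor", 4, 3, 4),
    ("New LP incentive tier (contract change)", 5, 5, 2),
]
ranked = sorted(hypotheses, key=lambda h: backlog_score(*h[1:]), reverse=True)
for name, impact, effort, learning in ranked:
    print(f"{backlog_score(impact, effort, learning):5.1f}  {name}")
```

The frontend-only change tends to rise to the top not because it has the largest impact, but because it buys a result quickly without governance or contract risk.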
Step 4: Define Run Order and Sequencing Rules
Two experiments that touch the same cohort at the same time contaminate each other's results. The backlog should define which experiments run in parallel and which must be sequenced. As a default, run no more than one experiment per funnel stage on any given cohort at the same time. Activation, usage, and retention experiments can run in parallel if they are on different cohorts or at genuinely distinct stages.
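The default sequencing rule can be checked mechanically: a new experiment conflicts with a running one if it touches the same funnel stage and an overlapping cohort. A minimal sketch, assuming each experiment is represented as a stage name plus a set of wallet addresses:

```python
def conflicts(new_exp, running_experiments):
    """An experiment is (stage, set_of_wallet_addresses). Two experiments
    conflict if they share a stage and at least one wallet."""
    new_stage, new_cohort = new_exp
    return [
        (stage, cohort) for stage, cohort in running_experiments
        if stage == new_stage and new_cohort & cohort
    ]

# Example: an activation test and a retention test on cohort A can coexist,
# but a second activation test on overlapping wallets cannot.
```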
Step 5: Review and Reload After Each Result
After each experiment concludes, update the backlog based on what was learned. A confirmed hypothesis generates follow-on experiments that test scaling the effect or extending it to new cohorts. A refuted hypothesis generates experiments that test alternative reasons for the same problem. An experiment backlog that does not update after results is not a learning system. It is a list of things to try.
The Bottom Line
Most DeFi growth activity is campaign execution mistaken for experimentation. Campaigns are not inherently wrong, but they answer a different question. A campaign asks how to generate activity. An experiment asks whether a specific change causes a specific behaviour to shift durably. Only the second one builds the kind of knowledge that compounds into a growth system.
The discipline required is not complicated. Form a hypothesis with a reason. Define the primary metric and the baseline before running the experiment. Use a control or time comparison. Read the post-experiment baseline, not the peak. Update the experiment backlog based on what you learned. Applied consistently, this turns growth from a cycle of campaigns and cliffs into a process of progressive understanding of what actually drives your users. For the full picture of the growth system this experimentation process feeds into, see the onchain growth framework.
Run DeFi Growth Experiments with Formo
You can’t improve what you don’t measure. Formo makes analytics and attribution simple for DeFi apps, so you can focus on growth.
Growth experiments are only as good as the data they are measured against. Pre-experiment baselines, cohort-level behaviour, stage drop-off rates, and post-experiment floor comparisons all require connecting frontend events to onchain transactions at the wallet level. Formo is built to give DeFi teams exactly this, without custom data infrastructure.
For DeFi teams building a growth experiment programme, Formo provides:
Wallet-level funnel data from connect through to first transaction and repeat usage — via Formo's analytics, so pre- and post-experiment baselines are measurable at each funnel stage
Cohort segmentation so you can isolate experiment cohorts from control cohorts and from incentive-period wallets — powered by wallet profiles
Acquisition source attribution via onchain attribution, so you can verify whether wallets in your experiment cohort arrived through the channel being tested
D30 return rates and retention cohort comparisons via retention analytics, so retention experiments have the comparison data they need to separate signal from noise
Ask AI to surface the pre- and post-experiment metric comparison directly, without needing SQL or a dedicated data analyst
DeFi teams including Kyberswap and WalletConnect use Formo to drive growth onchain.
Explore the Onchain Growth Series
This article is part of Formo's onchain series, a collection of practical guides for DeFi founders and growth teams covering the full post-launch lifecycle. Each guide goes deep on a single growth challenge with frameworks you can apply directly to your protocol.
FAQs About DeFi Growth Experiments
What counts as a 'growth experiment' in DeFi?
A growth experiment in DeFi is a planned change meant to test one clear growth hypothesis. It is not a random campaign or one-off incentive push. Each experiment should target one stage like activation, usage, or retention. If you cannot say what you are testing, it is not an experiment.
Why do our growth experiments keep 'working' but nothing actually improves long term?
Your experiments look like they work because incentives or hype create short-term spikes. Those spikes disappear once the experiment ends. This creates false positives that hide real product issues. If baselines do not rise, the experiment did not work.
What should we measure in a DeFi growth experiment?
You should measure the single metric that represents real behaviour change for the stage you are testing. For activation, this is first onchain transaction. For usage or retention, this is repeat transactions or sustained positions. Secondary metrics are noise if the main behaviour does not change.
What makes experimenting in DeFi harder than in Web2?
Experimenting in DeFi is harder because onchain actions are irreversible and visible. Bad experiments can lock users into poor states or leak value through incentives. Governance and contracts slow down iteration. You cannot roll back mistakes the way you can in Web2.
How do I avoid lying to myself with DeFi experiment results?
You avoid lying to yourself by comparing post-experiment baselines, not peak numbers. If behaviour returns to pre-test levels, the change did not stick. You also need control groups or time-based comparisons. Without this, most DeFi wins are just noise.


