Formo: How to Unify Onchain and Offchain Data to Build Wallet-Level Analytics That Drive Retention
Formo’s guide shows how to unify onchain and offchain data to build wallet-level analytics, improve retention, and activate cross-data campaigns for growth.
Unifying transactional blockchain records with traditional product analytics is one of the most consequential tasks engineering and product teams face as Web3 usage scales. Formo and other analytics vendors have popularized the idea that connecting wallet activity to session- and campaign-level data produces clearer behavior signals, better segmentation, and measurably higher retention. This article lays out a practical blueprint for teams that want to unify onchain and offchain data, explaining what to collect, how to connect disparate sources, the engineering trade-offs between platform and custom builds, key metrics to monitor, and the governance and privacy guardrails you must put in place to use unified profiles responsibly.
Why combining onchain and offchain data changes how teams see users
Onchain telemetry—transactions, token movements, smart contract calls, NFT trades, governance votes—captures behavior that standard web analytics cannot see. Offchain systems record sessions, clicks, campaign attribution, and social engagement, but miss the underlying asset interactions that often determine user intent in crypto-native products. When these two streams are joined into wallet-level profiles, product teams can trace a user’s journey from first site visit to token interaction, quantify conversions in native terms (e.g., transaction completed, token staked), and personalize outreach based on financial signals rather than purely behavioral proxies. That combination makes retention and monetization strategies far more precise.
What onchain and offchain datasets actually look like
Onchain data is structured around immutable ledger events and addresses. Typical elements include wallet addresses, transaction hashes, smart contract events and function calls, token transfers and balances, gas spent, DeFi protocol actions (lending, staking), NFT sales, and governance participation. Offchain data comprises session identifiers, page views, button clicks, UTM and marketing attribution parameters, server-side events, community engagement (Discord, Twitter), KYC or demographic attributes when available, and third-party data such as exchange price feeds.
Bridging these two formats requires a schema that can represent both event-level blockchain activity and user-facing session events without losing fidelity. That means storing canonical onchain records alongside session timestamps and campaign metadata so downstream analytics can correlate an onchain transaction with the marketing touch or UX step that preceded it.
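As a rough illustration of such a schema, the sketch below stores onchain and offchain events in one record type and looks up the marketing touch that preceded a transaction. The `UnifiedEvent` class, its field names, and the `last_touch_before` helper are hypothetical, not part of any vendor's API, and lowercasing addresses for joins is an assumption that fits EVM-style hex addresses.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional


@dataclass
class UnifiedEvent:
    """One row in a hypothetical unified event table; every field name is illustrative."""
    event_type: str                         # e.g. "page_view", "token_transfer"
    source: str                             # "onchain" or "offchain"
    occurred_at: datetime                   # block or session timestamp, in UTC
    wallet_address: Optional[str] = None    # lowercased; None until a wallet is linked
    session_id: Optional[str] = None        # offchain session identifier
    tx_hash: Optional[str] = None           # canonical onchain reference
    utm_campaign: Optional[str] = None      # campaign metadata from the landing URL
    properties: dict = field(default_factory=dict)  # event-specific extras


def last_touch_before(tx: UnifiedEvent, events: list) -> Optional[UnifiedEvent]:
    """Return the latest offchain event for the same wallet at or before the transaction."""
    candidates = [
        e for e in events
        if e.source == "offchain"
        and e.wallet_address == tx.wallet_address
        and e.occurred_at <= tx.occurred_at
    ]
    return max(candidates, key=lambda e: e.occurred_at, default=None)


def utc(hour: int) -> datetime:
    """Build a UTC timestamp on an arbitrary sample day."""
    return datetime(2024, 1, 1, hour, tzinfo=timezone.utc)


sample_wallet = "0x" + "ab" * 20  # placeholder address, not a real account
sample_events = [
    UnifiedEvent("page_view", "offchain", utc(9), sample_wallet, "s1", utm_campaign="launch"),
    UnifiedEvent("page_view", "offchain", utc(12), sample_wallet, "s2", utm_campaign="retarget"),
]
sample_deposit = UnifiedEvent(
    "token_transfer", "onchain", utc(10), sample_wallet, tx_hash="0x" + "cd" * 32
)
sample_touch = last_touch_before(sample_deposit, sample_events)
print(sample_touch.utm_campaign if sample_touch else None)
```

Keeping the raw transaction hash next to the campaign fields means a downstream query can join back to the canonical onchain record without re-deriving it.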
Define the data strategy before you build
The first technical decision is strategic: what business questions must unified profiles answer? Do you need to attribute new wallet creation to specific campaigns? Are you trying to identify high-value wallets for retention programs? Which onchain conversions—first deposit, swap, mint—map to product success? Answering those questions drives your schema, ingestion cadence, and the integration approach.
Set measurable goals up front (e.g., increase 30-day retention by X% for wallets acquired via campaign Y) and enumerate the acquisition channels and onchain conversion events you’ll track. That clarity prevents over-collection and ensures engineering work targets actionable outcomes rather than vanity metrics.
Platform vs. custom vs. hybrid: trade-offs for data integration
There are three common approaches to unify onchain and offchain data:
- Platform-based integrations: Analytics platforms offer turnkey collectors, real-time onchain indexers, and visualization dashboards. They lower initial development cost and accelerate time-to-insight but may limit schema flexibility and require trust in a third party for sensitive processing.
- Custom infrastructure: Building your own ingestion stack—blockchain indexers, event processors, data warehouses, and attribution services—gives full control and can be optimized for unique product models. It demands more engineering effort, operational overhead, and expertise in blockchain tooling.
- Hybrid models: Many teams combine a commercial platform for real-time onchain decoding (e.g., indexed events and decoded logs) with custom ETL and business logic layered in their warehouse. Hybrid approaches let teams move fast while preserving long-term flexibility.
Analytics platforms such as Formo provide prebuilt pipelines that decode smart contract events and surface normalized onchain events. For teams without extensive blockchain engineering capacity, those platforms can be a pragmatic starting point, while larger teams often migrate to custom or hybrid architectures as requirements harden.
How to build the collection and event pipeline
A robust pipeline handles ingestion, normalization, identity resolution, storage, and streaming for downstream analytics and activation.
- Onchain ingestion: Use a blockchain indexer or node subscription to capture transactions and events in near real time. Parse logs into normalized events (transaction_sent, token_transfer, contract_call) and enrich with derived fields (e.g., token symbols, USD value at time of event).
- Offchain collection: Instrument web and mobile apps with SDKs and server-side tracking to capture page views, clicks, form submissions, and campaign UTM tags. Ensure server-side events are timestamped and include session IDs to facilitate joins.
- Normalization: Apply a flexible schema that can map both onchain and offchain events to standard event types. Enforce consistent naming conventions and data types to avoid downstream schema drift.
- Identity linking: Capture wallet connections during authentication flows and associate session IDs, anonymous cookies, or device fingerprints to wallet addresses at the earliest opportunity. For users who never connect a wallet, keep session-level analytics separately until a link is established.
- Storage and streaming: Route normalized events into a data warehouse for analytics and into event streaming systems for real-time use cases (fraud detection, personalization, airdrop eligibility).
- Instrumentation QA: Implement automated schema validation and event tests so that a broken or renamed event doesn’t invalidate conversion funnels.
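The normalization and instrumentation QA steps above could be enforced with a small validator that runs in tests or at ingestion time. This is a minimal sketch: the snake_case naming rule, the `REQUIRED_FIELDS` registry, and the event names in it are assumptions chosen for illustration.

```python
import re

# Assumed convention: event names are lowercase snake_case.
EVENT_NAME = re.compile(r"^[a-z]+(_[a-z]+)*$")

# Hypothetical registry of known events and the fields each one must carry.
REQUIRED_FIELDS = {
    "page_view": {"session_id", "occurred_at"},
    "wallet_connect": {"session_id", "wallet_address", "occurred_at"},
    "token_transfer": {"wallet_address", "tx_hash", "occurred_at"},
}


def validate_event(event: dict) -> list:
    """Return a list of human-readable problems; an empty list means the event passes."""
    errors = []
    name = event.get("event_type", "")
    if not EVENT_NAME.match(name):
        errors.append(f"event_type {name!r} is not snake_case")
    required = REQUIRED_FIELDS.get(name)
    if required is None:
        errors.append(f"event_type {name!r} is not in the registry")
    else:
        # Treat None and empty strings as missing values.
        missing = sorted(f for f in required if event.get(f) in (None, ""))
        errors.extend(f"missing field: {f}" for f in missing)
    return errors


good_event = {
    "event_type": "wallet_connect",
    "session_id": "s1",
    "wallet_address": "0x" + "ab" * 20,
    "occurred_at": "2024-01-01T09:00:00Z",
}
renamed_event = {"event_type": "TokenTransfer", "occurred_at": "2024-01-01T09:00:00Z"}
print(validate_event(good_event))
print(validate_event(renamed_event))
```

Rejecting unregistered names is a deliberate choice here: a renamed event fails loudly instead of silently dropping out of a funnel.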
How identity resolution and wallet clustering work in practice
A major challenge is that users often operate multiple wallets across chains. Wallet clustering groups addresses that likely belong to the same entity using heuristics such as shared IPs, transaction patterns, signature reuse, and onchain metadata. Clustering improves attribution and lifetime value calculations, but it must be applied cautiously: these heuristics are noisy, and a false merge (for example, unrelated users behind one shared IP address) inflates a profile's apparent value and distorts retention numbers.
Where possible, rely on explicit signals—wallet connection events, signed messages, and authenticated sessions—to create deterministic links between an offchain user identity and wallet addresses. Use clustering only to supplement gaps and label probabilistic links clearly in downstream analyses.
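One way to keep deterministic and probabilistic links apart is to record every link with its kind and build clusters from deterministic links by default. The sketch below uses a union-find structure for the grouping; the `build_clusters` function, the link tuple layout, and the `user:42` identifier format are all illustrative assumptions.

```python
from collections import defaultdict


def build_clusters(links, include_probabilistic=False):
    """Group identifiers connected by links.

    links: iterable of (a, b, kind), where kind is "deterministic"
    (wallet connect, signed message) or "probabilistic" (heuristic match).
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps lookups shallow
            x = parent[x]
        return x

    for a, b, kind in links:
        root_a, root_b = find(a), find(b)  # registers both nodes even if not merged
        if kind == "deterministic" or include_probabilistic:
            parent[root_a] = root_b

    groups = defaultdict(set)
    for node in list(parent):
        groups[find(node)].add(node)
    return sorted(sorted(g) for g in groups.values())


sample_links = [
    ("user:42", "0xaaa", "deterministic"),   # wallet connected in an authenticated session
    ("user:42", "0xbbb", "deterministic"),   # second wallet, signed message
    ("0xbbb", "0xccc", "probabilistic"),     # heuristic match only
]
print(build_clusters(sample_links))
print(build_clusters(sample_links, include_probabilistic=True))
```

Running both variants side by side makes it easy to report how much a metric changes when heuristic links are included, which is one way to label probabilistic results in downstream analyses.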
Creating unified wallet-level profiles
A unified profile should present a single view that includes:
- Real-time feed of onchain actions: deposits, swaps, NFT purchases, governance votes.
- Offchain engagement signals: recent site visits, campaign origin, community interactions.
- Derived financial metrics: approximate net worth, average transaction size, portfolio diversification.
- User labels and segments: automated tags such as “NFT Collector,” “DeFi Power User,” or “Early Adopter” based on behavioral rules.
- Cross-chain activity: flags or aggregated metrics representing activity across EVM-compatible networks.
These profiles power both analytics and activation: cohort analysis, funnel inspection, targeted campaigns, and token-gated experiences.
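A profile of this shape can be sketched as a fold over a wallet's events followed by rule-based labeling. Everything below is an assumption for illustration: the `WalletProfile` fields, the event dictionary keys, the `DEFI_TYPES` event names, and the label thresholds are hypothetical and would come from your own schema and product rules.

```python
from dataclasses import dataclass, field
from typing import Optional

DEFI_TYPES = {"swap", "stake", "lend"}  # assumed normalized event names
NFT_COLLECTOR_MIN = 3                   # hypothetical label thresholds
DEFI_POWER_USER_MIN = 10


@dataclass
class WalletProfile:
    wallet: str
    campaign: Optional[str] = None      # first campaign seen for this wallet
    tx_count: int = 0
    total_usd: float = 0.0
    nft_purchases: int = 0
    defi_actions: int = 0
    chains: set = field(default_factory=set)
    labels: list = field(default_factory=list)

    @property
    def avg_tx_usd(self) -> float:
        return self.total_usd / self.tx_count if self.tx_count else 0.0


def build_profile(wallet: str, events: list) -> WalletProfile:
    profile = WalletProfile(wallet)
    for e in sorted(events, key=lambda e: e["ts"]):
        if e["wallet"] != wallet:
            continue
        if e["source"] == "offchain":
            # First-touch attribution: keep the earliest non-empty campaign.
            if profile.campaign is None:
                profile.campaign = e.get("utm_campaign")
            continue
        profile.tx_count += 1
        profile.total_usd += e.get("usd_value", 0.0)
        profile.chains.add(e["chain"])
        if e["type"] == "nft_purchase":
            profile.nft_purchases += 1
        elif e["type"] in DEFI_TYPES:
            profile.defi_actions += 1
    if profile.nft_purchases >= NFT_COLLECTOR_MIN:
        profile.labels.append("NFT Collector")
    if profile.defi_actions >= DEFI_POWER_USER_MIN:
        profile.labels.append("DeFi Power User")
    return profile


w = "0xaaa"  # placeholder address
profile_events = [
    {"wallet": w, "source": "offchain", "ts": 1, "type": "page_view", "utm_campaign": "launch"},
    {"wallet": w, "source": "onchain", "ts": 2, "type": "nft_purchase", "chain": "ethereum", "usd_value": 100.0},
    {"wallet": w, "source": "onchain", "ts": 3, "type": "nft_purchase", "chain": "base", "usd_value": 100.0},
    {"wallet": w, "source": "onchain", "ts": 4, "type": "nft_purchase", "chain": "ethereum", "usd_value": 100.0},
    {"wallet": w, "source": "onchain", "ts": 5, "type": "swap", "chain": "ethereum", "usd_value": 200.0},
]
sample_profile = build_profile(w, profile_events)
print(sample_profile.labels, sample_profile.avg_tx_usd, sorted(sample_profile.chains))
```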
Dashboards and cross-data analytics that reveal causality
Once events are unified and profiles exist, build dashboards that combine Web2 and Web3 KPIs: daily active wallets (DAW), cost per wallet (CPW), activation rate, time-to-first-transaction, lifetime value (LTV), total value locked (TVL), and retention cohorts defined by first onchain action. Funnels that span both environments (e.g., campaign click → site visit → wallet connect → first deposit) are especially valuable because they show where users drop off and which channels deliver high-LTV wallets.
Real-time dashboards enable product and growth teams to react quickly to market shifts. They also serve as a single source of truth for finance, compliance, and engineering stakeholders.
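A cross-environment funnel like the one described above can be computed by walking each user's events in time order and advancing through the steps one at a time. The sketch assumes identity linking has already resolved sessions and wallets to a single `user` key; the event names and dictionary layout are illustrative.

```python
from collections import defaultdict

FUNNEL_STEPS = ["campaign_click", "site_visit", "wallet_connect", "first_deposit"]


def funnel_counts(events, steps=FUNNEL_STEPS):
    """Count users who reached each step, requiring steps to occur in order."""
    by_user = defaultdict(list)
    for e in events:
        by_user[e["user"]].append(e)

    reached = [0] * len(steps)
    for user_events in by_user.values():
        next_step = 0
        for e in sorted(user_events, key=lambda e: e["ts"]):
            if next_step < len(steps) and e["type"] == steps[next_step]:
                reached[next_step] += 1
                next_step += 1
    return dict(zip(steps, reached))


def make_events(user, *types):
    """Build sample events with increasing integer timestamps."""
    return [{"user": user, "type": t, "ts": i} for i, t in enumerate(types)]


funnel_events = (
    make_events("u1", *FUNNEL_STEPS)
    + make_events("u2", "campaign_click", "site_visit")
    + make_events("u3", "campaign_click", "site_visit", "wallet_connect")
)
funnel_result = funnel_counts(funnel_events)
print(funnel_result)
```

Dividing each count by the one before it gives step-to-step conversion, which is where drop-off between wallet connect and first deposit becomes visible.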
Activating unified data: how insights translate into growth
Unified profiles let teams move from analysis to action. Use wallet-level signals to:
- Trigger targeted airdrops and allowlists for wallets that meet behavioral criteria.
- Personalize in-app messaging and email/Discord campaigns based on recent onchain activity or portfolio characteristics.
- Prioritize support and retention efforts for high-value wallets.
- Allocate paid acquisition spend toward channels that historically produce quality wallets.
- Implement token-gated features for segments with specific holdings or history.
Activation closes the loop: data informs programs, programs change behavior, and new behavior feeds back into the analytics stack.
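As a small sketch of how behavioral criteria might turn into an allowlist, the function below filters profile records by activity, recency, and volume. The `build_allowlist` name, the record keys, and the default thresholds are hypothetical placeholders rather than recommended values.

```python
from datetime import datetime, timedelta, timezone


def build_allowlist(profiles, now, min_tx_count=5, active_within_days=30, min_usd=100.0):
    """Return sorted wallet addresses that meet all three behavioral criteria."""
    cutoff = now - timedelta(days=active_within_days)
    return sorted(
        p["wallet"]
        for p in profiles
        if p["tx_count"] >= min_tx_count
        and p["last_tx_at"] >= cutoff
        and p["total_usd"] >= min_usd
    )


run_time = datetime(2024, 3, 1, tzinfo=timezone.utc)
candidate_profiles = [
    {"wallet": "0xaaa", "tx_count": 12, "total_usd": 900.0,
     "last_tx_at": datetime(2024, 2, 20, tzinfo=timezone.utc)},
    {"wallet": "0xbbb", "tx_count": 2, "total_usd": 5000.0,      # too few transactions
     "last_tx_at": datetime(2024, 2, 25, tzinfo=timezone.utc)},
    {"wallet": "0xccc", "tx_count": 30, "total_usd": 400.0,      # inactive for too long
     "last_tx_at": datetime(2023, 11, 1, tzinfo=timezone.utc)},
]
allowlisted = build_allowlist(candidate_profiles, run_time)
print(allowlisted)
```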
Operational and technical pitfalls to watch for
Several recurring issues complicate unification projects:
- Fragmented journeys: Users move across social platforms, websites, wallets, and blockchains—tracking that flow requires end-to-end instrumentation.
- Pseudonymity: New wallets can be spun up easily, inflating acquisition counts and complicating retention metrics.
- Real-time pressure: Processing and joining high-volume events across multiple chains in real time can stretch infrastructure.
- Cross-chain mapping: Differences in token standards, event formats, and RPC performance add engineering complexity.
- Data quality: Misnamed events, schema drift, or missed events can distort funnel calculations and mislead teams.
Address these with thorough QA, automated validation, pragmatic use of clustering, and careful event naming and governance.
Privacy, compliance, and ethical considerations
Linking offchain identifiers to wallet addresses raises privacy and regulatory questions. In many jurisdictions, identifiers derived from public wallet addresses may become personal data once combined with offchain attributes. Best practices include:
- Minimizing the PII you store and pseudonymizing wallet-linked events when possible.
- Implementing explicit opt-in consent for cross-platform tracking and clear privacy notices.
- Storing sensitive personal data off the chain and protecting offchain systems with encryption and access controls.
- Using cryptographic techniques—salted hashes, encryption-at-rest, and commitments—where appropriate.
- Collaborating with legal and compliance teams to align with GDPR and other regional frameworks.
Privacy-preserving analytics patterns and secure data handling are not optional; they are critical for trust and long-term adoption.
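One possible pattern is to replace raw wallet addresses with a keyed pseudonym before events reach general-purpose analytics tables. Note that this sketch uses an HMAC rather than a plain salted hash: because wallet addresses are public, a hash whose salt is stored alongside the data can be reversed by hashing known addresses, whereas a keyed hash stays opaque as long as the key is protected. The function name and key handling are assumptions, and pseudonymized data may still count as personal data under frameworks such as GDPR.

```python
import hashlib
import hmac


def pseudonymize_wallet(address: str, secret_key: bytes) -> str:
    """Map a wallet address to a stable pseudonym using HMAC-SHA256."""
    # Normalize case so checksummed and lowercase forms map to the same token.
    normalized = address.strip().lower().encode("utf-8")
    return hmac.new(secret_key, normalized, hashlib.sha256).hexdigest()


# Placeholder key for the example; a real key would come from a secrets manager.
demo_key = b"demo-key-load-from-a-secret-manager"
demo_token = pseudonymize_wallet("0xABCDEF0123456789abcdef0123456789ABCDEF01", demo_key)
print(demo_token)
```

The same address always yields the same token under one key, so joins and cohort counts still work on the pseudonymized column, while only the restricted system that holds the key can recompute the token for a known address.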
Key metrics that matter for Web3 product teams
Measure both classic product KPIs and blockchain-native signals:
- Acquisition: wallet connections, CPW (cost per wallet), conversion rate from campaign to wallet connection.
- Activation: time-to-first-transaction, percentage of wallets that become active within X days.
- Retention: weekly/monthly active wallet counts, retention cohorts by acquisition source.
- Monetization: TVL, transaction revenue, average revenue per wallet, lifetime value (LTV).
- Engagement: transaction frequency, feature adoption rates, community engagement scores.
Design dashboards and alerts around these metrics to detect regressions and validate growth experiments.
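Two of these metrics, retention by acquisition source and time-to-first-transaction, can be sketched in a few lines. Retention definitions vary between teams; this example assumes a wallet counts as retained in week N if it transacts at least once in the seven-day window starting 7×N days after acquisition. The function names and input shapes are hypothetical.

```python
from collections import defaultdict
from datetime import date, timedelta


def weekly_retention(wallets, activity, week):
    """Share of wallets per source with a transaction in the given week after acquisition.

    wallets:  {address: (source, acquired_on)}
    activity: {address: [transaction dates]}
    """
    totals, retained = defaultdict(int), defaultdict(int)
    for address, (source, acquired_on) in wallets.items():
        start = acquired_on + timedelta(days=7 * week)
        end = start + timedelta(days=7)
        totals[source] += 1
        if any(start <= d < end for d in activity.get(address, [])):
            retained[source] += 1
    return {source: retained[source] / totals[source] for source in totals}


def days_to_first_tx(acquired_on, tx_dates):
    """Days from acquisition to first transaction, or None if there is none yet."""
    later = [d for d in tx_dates if d >= acquired_on]
    return (min(later) - acquired_on).days if later else None


cohort_wallets = {
    "0xaaa": ("ads", date(2024, 1, 1)),
    "0xbbb": ("ads", date(2024, 1, 1)),
    "0xccc": ("organic", date(2024, 1, 3)),
}
cohort_activity = {
    "0xaaa": [date(2024, 1, 9)],
    "0xbbb": [date(2024, 1, 2)],
    "0xccc": [date(2024, 1, 12)],
}
week1 = weekly_retention(cohort_wallets, cohort_activity, week=1)
print(week1)
print(days_to_first_tx(date(2024, 1, 1), cohort_activity["0xaaa"]))
```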
Developer and tooling implications
Unifying onchain and offchain data intersects with many tool categories: node infrastructure and indexers, event processing frameworks, ELT/ETL tools, data warehouses, real-time analytics platforms, and marketing automation systems. Developer tooling should simplify decoding smart contract events, mapping tokens to USD values, and exposing normalized events to BI teams. Security tooling—especially around key management and data access control—must be part of the stack.
Integrations with developer ecosystems (GitHub activity, CI pipelines) can also enrich profiles for developer-facing products, while CRM hooks and automation platforms enable marketing teams to operationalize insights.
How to start: a step-by-step rollout plan for teams
- Define outcomes: pick two or three business objectives (e.g., improve 30-day retention for wallets from X campaign).
- Instrument the surface: ensure web and mobile analytics capture UTM, session IDs, and wallet connect events.
- Ingest onchain events: deploy an indexer or subscribe to a platform that decodes contract logs into normalized events.
- Implement identity linking: capture wallet connections and persist the link to session records in your warehouse.
- Build unified profiles and a small set of dashboards for the selected KPIs.
- Run experiments: use cohorts to test targeted activations (airdrops, allowlists) and measure lift.
- Iterate: validate assumptions, expand tracked events, and harden privacy controls.
Starting with a narrow scope reduces time to impact and makes it easier to validate the approach before scaling.
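For the experiment step, relative lift is simply the treated cohort's conversion rate compared with the control cohort's. The helper below is a minimal sketch with a hypothetical name; it reports the point estimate only and does not test for statistical significance, which a real experiment readout would also need.

```python
def conversion_lift(treated_converted, treated_total, control_converted, control_total):
    """Relative lift of the treated cohort's conversion rate over the control cohort's."""
    if treated_total <= 0 or control_total <= 0:
        raise ValueError("both cohorts need at least one wallet")
    treated_rate = treated_converted / treated_total
    control_rate = control_converted / control_total
    if control_rate == 0:
        raise ValueError("lift is undefined when the control rate is zero")
    return (treated_rate - control_rate) / control_rate


# Example: 30 of 100 airdropped wallets retained vs 20 of 100 control wallets.
example_lift = conversion_lift(30, 100, 20, 100)
print(f"{example_lift:.0%}")
```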
Broader implications for the software industry and businesses
The rise of unified Web3 analytics signals a shift in how product teams approach user value: financial behavior and token ownership become as central as clicks and sessions. For businesses, that means new possibilities—token-gated loyalty programs, financially informed personalization, and transparent monetization channels—but also new responsibilities in privacy and compliance. Developers and data teams will need to acquire blockchain-native skills, and organizations should expect a growing demand for tools that translate onchain complexity into business-friendly metrics. Marketing, product, finance, and security teams must collaborate more closely than before, turning data unification into a cross-functional capability rather than an engineering silo.
Formo and similar analytics offerings are accelerating this transition by packaging complex blockchain decoding and attribution into developer-friendly APIs and dashboards, but the strategic value comes from how teams integrate those outputs into product decisions and growth playbooks.
Practical troubleshooting tips and governance practices
- Validate events end-to-end: set up automated tests that replay known flows and assert expected events arrive in the warehouse.
- Enforce naming conventions and schema checks: use CI to prevent accidental changes that break downstream reports.
- Monitor data integrity: track event delivery rates and set alerts for significant drops or anomalies.
- Document linkage rules: keep a clear record of how session IDs, wallet addresses, and clusters are associated.
- Apply least-privilege access: restrict who can join PII and wallet-level data to minimize exposure.
Good governance reduces technical debt and prevents analytical errors from cascading into business decisions.
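The data integrity check above can start as a simple comparison of today's event volume against a trailing average. In this sketch the `delivery_alerts` function, the seven-day window, and the 50% drop threshold are assumptions to tune against your own traffic patterns.

```python
from statistics import mean


def delivery_alerts(daily_counts, window=7, max_drop=0.5):
    """Flag event types whose latest daily count fell well below the trailing average.

    daily_counts: {event_type: [count per day, oldest first, today last]}
    Returns a list of (event_type, today_count, baseline) tuples.
    """
    alerts = []
    for event_type, counts in daily_counts.items():
        if len(counts) <= window:
            continue  # not enough history to form a baseline
        baseline = mean(counts[-window - 1:-1])  # the `window` days before today
        today = counts[-1]
        if baseline > 0 and today < baseline * (1 - max_drop):
            alerts.append((event_type, today, baseline))
    return alerts


sample_counts = {
    "wallet_connect": [100, 110, 90, 105, 95, 100, 100, 40],  # sudden drop today
    "page_view": [1000] * 8,                                  # stable
}
sample_alerts = delivery_alerts(sample_counts)
print(sample_alerts)
```

A check like this catches silent breakage, such as a renamed event or a failed indexer, before it shows up as a misleading dip in a conversion funnel.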
As decentralized ecosystems mature, expect tooling to better handle cross-chain normalization, privacy-preserving identity resolution, and more sophisticated activation primitives such as gasless airdrops and programmable entitlements. Teams that invest now in a principled approach to unifying onchain and offchain data—combining careful instrumentation, robust identity linking, privacy-aware practices, and activation pathways—will be best positioned to convert blockchain-native signals into predictable product growth and sustainable user engagement.




















