December 3, 2024

Vana: The First Open Protocol for AI Data Sovereignty

Personal data has become the foundation of the internet economy. For two decades, we've accepted a simple bargain: platforms provide free services in exchange for collecting and monetizing our data. This arrangement – "if you aren't paying, you're the product" – has shaped everything from targeted advertising to data brokers.

The rise of AI has dramatically raised the stakes. Platforms now sell user data for hundreds of millions of dollars to train AI models – transforming personal information from a resource for targeted ads into the fundamental building block of artificial intelligence. Yet the users who create this data remain cut out of its value.

This wasn't the original vision. The internet's architects imagined users – not platforms – maintaining direct control of their information. Tim Berners-Lee himself has spent years working to restore this data sovereignty. But the convenience of cloud infrastructure and free services prevailed, and platforms became the custodians of our digital lives.

Now two shifts have converged: AI has made personal data far more valuable, and advances in decentralized technologies have finally given individuals the tools to control it.

Vana is the first open protocol for data sovereignty. It enables users to export their data from platforms and join data collectives that negotiate directly with AI companies and developers. Through encrypted personal storage and client-side computation, users keep control of their data while achieving the network effects that were previously possible only through centralized platforms. The result is a self-sovereign internet where both parties win: developers can build transformative applications with their dream datasets, while users maintain complete control over their most valuable asset.

Today we're releasing the Vana Whitepaper ahead of our mainnet launch. In this post, we'll explore how Vana transforms personal data from a resource to be extracted into an asset class controlled by its creators.

Overcoming Data's Double-Spend Problem


The core challenge in financializing data is that, unlike other digital assets, data's economic value depends on controlled access - once data becomes public, it loses its market value. Traditional blockchains, with their emphasis on public verifiability, are not well suited for working with private data. Vana solves this through an architecture that combines private data custody with public ownership rights.

The Vana network maintains a global state consisting of:

  • Data ownership records: Cryptographic proofs of data possession
  • Access permissions: Who can access what data, under what conditions
  • Validation proofs: Attestations of data quality, authenticity, and metadata
  • Onchain data collective contracts and token balances: Economic rights and governance

While the data itself remains encrypted in personal servers or secure enclaves, the network enables programmatic control over who can access the data, under what conditions, and how value flows back to data creators.
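To make the split between private data and public state concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not the Vana implementation: the class and method names are hypothetical, and a SHA-256 hash of the encrypted file stands in for the protocol's cryptographic ownership proofs.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass(frozen=True)
class OwnershipRecord:
    owner: str            # hypothetical wallet address
    data_commitment: str  # hash of the encrypted file; the file itself stays offchain


@dataclass
class GlobalState:
    """Toy model of the public state: ownership records and access permissions."""
    ownership: dict[str, OwnershipRecord] = field(default_factory=dict)
    # (data_id, grantee) pairs that the data owner has approved
    permissions: set[tuple[str, str]] = field(default_factory=set)

    def register(self, owner: str, encrypted_blob: bytes) -> str:
        # Only the hash is recorded; the ciphertext never enters the state.
        data_id = hashlib.sha256(encrypted_blob).hexdigest()
        self.ownership[data_id] = OwnershipRecord(owner, data_id)
        return data_id

    def grant(self, caller: str, data_id: str, grantee: str) -> None:
        if self.ownership[data_id].owner != caller:
            raise PermissionError("only the data owner can grant access")
        self.permissions.add((data_id, grantee))

    def can_access(self, data_id: str, grantee: str) -> bool:
        return (data_id, grantee) in self.permissions
```

The point of the sketch is that the shared state holds only a commitment and a permission entry, so anyone can check who owns a record and who may access it without ever seeing the data.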

Here’s what this looks like in practice: a user can export their private data from any platform, host their data in a personal server secured by a private encryption key, and join a data collective on Vana that pools user data in similar categories. These collectives, known as DataDAOs, can negotiate with researchers training AI models or application developers to pay for usage of the pooled data. When data is bought by an external developer, contributors to the data pool are remunerated. 

DataDAOs and Data Tokens


Data Liquidity Pools (DLPs) serve as a coordination mechanism that turns personal data into a new asset class by mapping non-fungible data to a fungible data token. A DLP is the set of smart contracts that instantiates a DataDAO; the DataDAO, in turn, is the broader community of data contributors, developers, and researchers that forms around a particular type of data. When a user contributes data to a DataDAO, they are issued DLP-specific tokens according to the DataDAO’s unique Proof of Contribution.

Each DataDAO implements its own Proof of Contribution function tailored to the specific type of data it handles, as different forms of data have inherently different measures of quality and value. For instance, a DLP focused on financial data might prioritize factors like transaction accuracy, completeness of records, and consistency of reporting in its scoring mechanism. In contrast, a social media-focused DLP might weigh factors such as user engagement levels, account longevity, and content interaction metrics more heavily. For health data, a DLP might emphasize data freshness, frequency of measurements, and device accuracy ratings.
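One simple way to picture a Proof of Contribution function is as a weighted score over quality metrics. The sketch below assumes each metric is already normalized to the range 0 to 1; the function names, weights, and token rate are hypothetical and chosen only to mirror the examples above.

```python
def proof_of_contribution(metrics: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted average of quality metrics, each expected to lie in [0, 1]."""
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    # A metric missing from the submission counts as zero.
    score = sum(weight * metrics.get(name, 0.0) for name, weight in weights.items())
    return score / total_weight


# Hypothetical weightings for two kinds of DLP.
FINANCIAL_WEIGHTS = {"accuracy": 0.5, "completeness": 0.3, "consistency": 0.2}
HEALTH_WEIGHTS = {"freshness": 0.4, "measurement_frequency": 0.4, "device_accuracy": 0.2}


def tokens_for(score: float, tokens_per_full_score: float = 100.0) -> float:
    """Convert a score into a DLP token reward (the rate is an assumption)."""
    return score * tokens_per_full_score
```

Because each DLP supplies its own weights, the same contribution can score differently in different pools, which is what lets each DataDAO define quality on its own terms.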

The Vana protocol defines a standardized attestation schema for recording proofs and metadata onchain while keeping the data itself private. Data validation occurs through a network of Trusted Execution Environments (TEEs) called the Satya Network. These nodes provide verifiable attestations about data quality while preserving the privacy of the underlying data. Each DLP defines its own validation criteria, enabling a market-driven approach to data quality assessment. Some DLPs also use zero-knowledge proofs, including ZK Email and zkTLS.

DLPs serve as the fundamental coordination mechanism for collective data assets in the Vana network. Unlike traditional liquidity pools in DeFi that coordinate fungible token pairs, DLPs coordinate non-fungible personal data contributions while maintaining privacy and sovereignty.

The Vana Foundation runs an accelerator program with 12 leading DataDAOs and has received 300 applications for the next cohort. Current DataDAO teams consist of 2-5 people working full-time on building a DLP around a specific data source, such as Twitter data, synthetic data, genetic data, or browser data. Each DataDAO issues its own dataset-specific token. You can learn more about active DataDAOs here.

The power of DLPs lies in their permissionless nature - anyone can create one without seeking approval from the platforms where the data originates. This is possible because DLPs leverage existing data privacy regulations that guarantee individual users the right to export and control their personal data.

When AI researchers and model developers want to access this pooled data, they engage with the DataDAO's governance system rather than negotiating with thousands of individual users. This collective bargaining approach is transformative: data contributors receive governance tokens proportional to their contributions, giving them both economic rights and decision-making power over how their data is used. The result is a virtuous cycle where high-quality data contributions are rewarded, market forces determine fair access pricing, and ongoing data maintenance is incentivized.

For example, an AI researcher might propose a staged access plan to the DataDAO, starting with a quality control phase accessing 10% of the dataset, followed by full dataset access for model training - all while keeping the underlying data encrypted and secure. In exchange, they would burn a specified amount of DLP tokens, effectively distributing value to all data contributors. This simple but powerful mechanism ensures that as the dataset's value grows, the benefits flow directly back to those who contributed their data.
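The staged plan and the effect of burning can be sketched in a few lines of Python. This is a toy ledger under stated assumptions: the class, the phase names, and the burn amounts are hypothetical, and only the 10% quality-control phase comes from the example above.

```python
from fractions import Fraction


class DLPToken:
    """Toy ledger for a single DLP token."""

    def __init__(self, balances: dict[str, int]):
        self.balances = dict(balances)

    @property
    def total_supply(self) -> int:
        return sum(self.balances.values())

    def burn(self, holder: str, amount: int) -> None:
        if amount <= 0 or self.balances.get(holder, 0) < amount:
            raise ValueError("insufficient balance to burn")
        # Burned tokens leave circulation, shrinking the total supply.
        self.balances[holder] -= amount

    def share(self, holder: str) -> Fraction:
        return Fraction(self.balances.get(holder, 0), self.total_supply)


# (phase name, fraction of dataset unlocked, tokens burned); amounts are assumptions.
ACCESS_PLAN = [
    ("quality_control", Fraction(1, 10), 50),
    ("full_training", Fraction(1), 450),
]


def run_access_plan(token: DLPToken, researcher: str, plan) -> list:
    granted = []
    for phase, fraction, burn_amount in plan:
        token.burn(researcher, burn_amount)  # payment happens before access
        granted.append((phase, fraction))
    return granted
```

In this model a contributor's balance never changes, but each burn reduces the supply, so every remaining holder's share of the pool grows. That is the sense in which burning distributes value to all contributors.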


DataDAOs and the VANA Token


When Vana mainnet launches, it will establish the first large-scale alternative to Big Tech's data monopolies. Until now, AI companies seeking training data had only one real option: negotiate with centralized platforms like Meta and Google that control massive user datasets. Developers, meanwhile, have been forced to work within walled gardens that limit their access to the best datasets. This arrangement was perhaps even rational: coordinating data access with millions of individual users is both a technical and a social challenge.

Mainnet fundamentally changes this dynamic by creating the infrastructure for true data sovereignty at scale. For the first time, millions of users can pool their data into a liquid market that rivals Big Tech's data repositories in size and value, while maintaining cryptographic control of their information. Vana mainnet creates a unified data economy with real price discovery, where market forces rather than platform monopolies determine the value of data.

Along the way, we lay out a path where user data is truly sovereign: controlled by the user through a non-custodial wallet, and portable with them throughout the internet.

The VANA token enables this vision through several key functions:

  • Network security through validator staking
  • Transaction fee payments for network operations
  • DLP staking, which determines emission rewards for different DataDAOs
  • The required currency for purchasing access to data across all DLPs

When AI companies want to access data from a DLP, they must use VANA to purchase and burn the DLP's tokens. This creates a direct economic link between network usage and token value - as more AI companies seek to access user data, they drive demand for both VANA and DLP tokens. The burning mechanism ensures that value flows back to both the network and data contributors.

The top 16 DataDAOs earn a share of emissions, designed to reward early contributors for onboarding data to the network. The top 16 are selected every epoch (3 weeks) based on which DataDAOs have the most VANA staked. Rewards are then split among the top 16 according to a set of performance metrics governed by the Vana DAO. You can learn more about DataDAO rewards here: https://www.vana.org/posts/datdao-rewards.
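As a rough illustration of the epoch mechanics, the sketch below ranks DataDAOs by stake, keeps the top 16, and splits an epoch's emissions in proportion to a single performance score. The proportional split and the single score are assumptions made for the example; the actual metrics are set by Vana DAO governance.

```python
TOP_N = 16  # number of DataDAOs that earn emissions each epoch


def epoch_rewards(
    stakes: dict[str, float],
    performance: dict[str, float],
    emissions: float,
    top_n: int = TOP_N,
) -> dict[str, float]:
    """Pick the top_n DataDAOs by stake, then split emissions by performance score."""
    eligible = sorted(stakes, key=lambda dlp: stakes[dlp], reverse=True)[:top_n]
    total_score = sum(performance.get(dlp, 0.0) for dlp in eligible)
    if total_score <= 0:
        return {dlp: 0.0 for dlp in eligible}
    return {
        dlp: emissions * performance.get(dlp, 0.0) / total_score
        for dlp in eligible
    }
```

Note the two separate roles in this model: stake decides who is eligible, while performance decides how much each eligible DataDAO receives.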

In this way, VANA serves as both the economic foundation for data transactions and an efficient proxy for the aggregate value of data assets in the network. As more AI companies seek to access user data through DLPs, the mechanism of purchasing VANA to burn DLP tokens creates a sustainable economy that rewards both data contributors and network participants.

A New, Open Era for the Data Economy


The launch of Vana mainnet marks the beginning of a fundamental power shift in the AI economy. For the first time, users can collectively challenge Big Tech's data monopolies, transforming personal data from a resource to be extracted into an asset class we control. This isn't just about compensation - it's about reshaping who builds, controls, and benefits from AI.

The opportunity is immediate and massive. AI companies are hitting a data wall, desperately seeking fresh training data beyond what they can scrape from the public internet. Through Vana, users can now pool their data into datasets that rival and even outperform those of major platforms, while maintaining cryptographic control. The Vana network grows stronger with each additional user, enabling datasets that span platforms, combine different data types, and give users true self-sovereignty over their data.

We're building an AI economy that works for users and open source builders, not web2 giants. One where data flows freely but sovereignty remains absolute. One where the next generation of AI models is trained on user-owned data, with the benefits flowing back to contributors - and where the world’s best AI developers can access their dream datasets. Join us as the community rolls out the foundation of a new, open data economy.