Author: Sebastian Melendez Source: Artemis Translation: Shan Ouba, Golden Finance
Introduction
Stablecoins are the focus of the market right now, with major news almost every day. Last week, Stripe announced it would acquire the wallet service company Privy, and PayPal announced it would natively mint PYUSD on Stellar. The news is endless, almost overwhelming. As more companies enter this field, the need to track and obtain stablecoin data keeps growing. Yet in our conversations with customers, the same four questions come up over and over again:
What are stablecoins used for?
Who is using stablecoins?
What opportunities exist?
In which countries or regions are stablecoins used?
My job at Artemis is to collect, organize, and aggregate stablecoin data every day to answer these questions. Today, we're going to debunk some seemingly simple data myths and show just how hard these questions really are.
Myth 1: Stablecoin data is open, transparent, and readily available to everyone
Independent access to on-chain data is incredibly expensive and technically challenging. While the accessibility of raw blockchain data has improved over the past five years, many barriers remain. Mainstream data service providers such as Dune, Flipside, Allium and Goldsky each have their own advantages, but none of them can cover all key blockchains.
Reality:
Nowadays almost every company is launching its own blockchain, each with its own unique features, which makes data analysis extremely complicated.
If you want to fully understand your stablecoin's usage patterns and discover potential opportunities, you need to be able to analyze all relevant chains panoramically, not just the platforms where you are currently deployed. As multi-chain strategies develop and analytical needs deepen, the complexity of the data infrastructure grows with them.
Take PYUSD as an example:
Once you integrate LayerZero's OFT cross-chain protocol, really seeing the whole picture means mastering not just Ethereum and Solana but also LayerZero's messaging layer and every chain the token can be bridged to.
What's worse, users can bridge tokens to still more platforms, which compounds the data problem exponentially.
The problem is not just the chains you are live on today: the entire ecosystem keeps expanding, and new chains emerge one after another. This leads to the second problem: architectural fragmentation.
The data architecture and format of each chain are different
Think back to the early 2000s: sending someone a file didn't mean they could open it. PowerPoint decks wouldn't open, videos lacked codecs, systems were siloed, and nothing worked seamlessly. Even schoolchildren were tormented by these problems.
The blockchain world today is as chaotic as it was back then.
The most active chains today - Solana, Tron, Ethereum, TON, Stellar, Aptos - have very different data architectures.
A few examples:
Solana: you have to understand token accounts versus owner accounts
Ethereum: you have to understand smart contracts, EOAs, and the ERC-20 standard
Aptos, Sui: use an object-oriented model in which assets are programmable objects
Stellar, TON: completely different architectures again, yet stablecoin usage on them is remarkable
Understanding activity on these chains means untangling an increasingly complex web of technologies, as the sketch below illustrates.
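To make the fragmentation concrete, here is a minimal sketch of what "get a wallet's stablecoin balance" looks like on just two of these chains, using raw JSON-RPC. The endpoint URLs, token addresses, and wallet addresses are placeholders; the point is that even this basic question needs a chain-specific code path per chain.

```python
import requests

def eth_erc20_balance(rpc_url: str, token: str, holder: str) -> int:
    """Ethereum: call the ERC-20 balanceOf(address) view function.
    Calldata is the 0x70a08231 selector plus the holder address
    left-padded to 32 bytes."""
    data = "0x70a08231" + holder.lower().replace("0x", "").rjust(64, "0")
    resp = requests.post(rpc_url, json={
        "jsonrpc": "2.0", "id": 1, "method": "eth_call",
        "params": [{"to": token, "data": data}, "latest"],
    }).json()
    return int(resp["result"], 16)  # raw units; divide by 10**decimals

def sol_spl_balance(rpc_url: str, mint: str, owner: str) -> float:
    """Solana: an owner's balance lives in separate token accounts,
    so we enumerate and sum them rather than calling a contract."""
    resp = requests.post(rpc_url, json={
        "jsonrpc": "2.0", "id": 1, "method": "getTokenAccountsByOwner",
        "params": [owner, {"mint": mint}, {"encoding": "jsonParsed"}],
    }).json()
    return sum(
        acc["account"]["data"]["parsed"]["info"]["tokenAmount"]["uiAmount"] or 0
        for acc in resp["value"]
    )
```

Tron, TON, Stellar, and Aptos each need yet another code path, which is exactly the fragmentation problem described above.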
Look at PYUSD again:
Previously, you only needed to understand the architectures of Ethereum, Solana, and LayerZero. Now that PYUSD has landed on Stellar, you also have to understand:
Stellar's smart contract platform Soroban
Soroban's virtual machine model
Transfer and balance-management logic that is completely different from Ethereum's
In other words, you have to become an expert in each individual chain just to access and parse its data, let alone extract insights from it.
Myth 2: Insights will naturally emerge as long as you get the blockchain data
Many people assume that once the data-access problem is solved, user insights follow easily. Suppose you have that access and have captured complete balance and transfer datasets for a chain. What do you actually get?
The answer is: A bunch of noise.
On-chain addresses are just strings of letters and numbers, and wallet balances are often inaccurate or misleading. Raw blockchain data does not equal insight; it is a messy pile that requires extensive cleaning and processing before it becomes valuable.
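One small, concrete example of that cleaning work: a raw token transfer amount is an integer in the token's smallest unit, and the scale differs per token (USDC uses 6 decimals, DAI uses 18). A minimal normalization sketch; hard-coding the decimals table as done here is purely illustrative, since in practice it must be read from each token contract or the chain's metadata:

```python
# Illustrative per-(chain, token) decimals table; real pipelines read this
# from the token contract or chain metadata rather than hard-coding it.
DECIMALS = {
    ("ethereum", "USDC"): 6,
    ("ethereum", "DAI"): 18,
    ("solana", "USDC"): 6,
}

def normalize_amount(chain: str, symbol: str, raw_amount: int) -> float:
    """Convert a raw on-chain integer amount into human-readable units."""
    return raw_amount / 10 ** DECIMALS[(chain, symbol)]

assert normalize_amount("ethereum", "USDC", 2_500_000) == 2.5
```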
The reality is: to understand what’s happening on-chain, you need context and off-chain data
Even if you go to great lengths to collect on-chain data, you still can’t answer the key questions: Who is using your stablecoin? Where are they?
All you can say is: "My stablecoin is being used." That is not actionable, and it doesn't help you understand user behavior, market penetration, or growth opportunities. To get those insights, you must rely on off-chain context. The real question is: what off-chain data do you need, and how do you get it?
Application and protocol labels: There is no single, reliable source for labeling on-chain activity. Flipside, Dune, the Open Labels Initiative, block explorers, Arkham: they all provide some information, but each has its own format and limited coverage. To answer basic questions like "What application is this address using?" or "What kind of usage are we seeing?", you need to unify these scattered label sources and manually label important wallet addresses. If you don't, you are limited to raw transaction data, which tells you nothing about actual usage patterns.
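A minimal sketch of what unifying those label sources can look like, assuming each provider's labels have already been exported into a common address-to-label shape. The source names, precedence order, and addresses below are illustrative, not any provider's actual schema:

```python
# Illustrative label sources, highest-trust first. Real pipelines would load
# these from Dune, Flipside, explorers, manual review, etc.
SOURCES = [
    ("manual_review", {"0xabc...": "Binance hot wallet"}),
    ("explorer_tags", {"0xabc...": "Binance", "0xdef...": "Uniswap router"}),
    ("community_labels", {"0x123...": "unknown CEX"}),
]

def unify_labels(sources):
    """Merge label sets so higher-trust sources win on conflicts,
    keeping provenance for auditing."""
    unified = {}
    for source_name, labels in sources:
        for address, label in labels.items():
            # Only fill addresses not already labeled by a higher-trust source.
            unified.setdefault(address, {"label": label, "source": source_name})
    return unified

labels = unify_labels(SOURCES)
print(labels["0xabc..."])  # {'label': 'Binance hot wallet', 'source': 'manual_review'}
```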
Geolocation: This is the key, and perhaps the question I'm asked most often: where are my users? We use time-zone heuristics and more advanced techniques to infer geographic distribution. More importantly, we work with data partners to obtain proprietary off-chain geographic data that helps pinpoint which country a wallet is most likely from.
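A toy version of the time-zone heuristic mentioned above: bucket a wallet's transaction timestamps by hour of day, then find the UTC offset that best aligns its activity with typical waking hours. Artemis's production approach and its partner signals are not public, so treat this purely as an illustration of the idea; the waking-hours window is an assumption:

```python
from collections import Counter
from datetime import datetime, timezone

# Assumption: typical human activity peaks between 09:00 and 22:00 local time.
ACTIVE_HOURS = set(range(9, 23))

def likely_utc_offset(tx_timestamps: list[int]) -> int:
    """Given unix timestamps of a wallet's transactions, return the UTC
    offset (in hours) that places the most activity in waking hours."""
    hours = Counter(
        datetime.fromtimestamp(ts, tz=timezone.utc).hour for ts in tx_timestamps
    )
    def waking_activity(offset: int) -> int:
        return sum(n for h, n in hours.items() if (h + offset) % 24 in ACTIVE_HOURS)
    return max(range(-12, 15), key=waking_activity)
```

A wallet that mostly transacts between 02:00 and 14:00 UTC would score highest around UTC+7, hinting at Southeast Asia; real pipelines combine this weak signal with many others.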
The reality is that solving this labeling problem requires significant resources and industry relationships. You need partnerships with major L1s and protocols to build comprehensive labeled datasets. Most teams don't have the bandwidth or connections to handle this manually, which is why many analytics efforts hit a wall once they get the raw blockchain data. The contextual layer is where the real work begins.
Myth 3: Blockchain Data is Intuitive and Consistent
Blockchains are far more complex than they appear. While the industry has begun to standardize around specific design patterns for token transfers over the past few years, this wasn’t always the case. When bridging technology first became popular, there were no community standards for tracking cross-chain activity. This created confusion when trying to accurately track balances and transfers — especially for tokens that have been around long enough to predate these standards. You need to understand the specific history and idiosyncrasies of each chain to get accurate data.
Reality: Blockchain “database schemas” are always changing. You have to be an “on-chain historian” to get accurate data
It’s easy to forget that these ecosystems are constantly changing. Take Solana, for example, which has undergone major upgrades to both its architecture (how its blockchain works) and its token program (how tokens are created and transferred).
Architecture upgrades: When Solana first launched, the chain did not keep timestamps in long-term storage. This caused major problems for calculating historical balances over time. Solana fixed the issue in 2020, but the damage was done: how do you reconstruct accurate historical balances without timestamps?
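The source doesn't prescribe a fix, but one common workaround is interpolation: take slots whose block times are known (from later blocks or external records) and estimate the missing timestamps linearly between them, assuming roughly constant slot times. A minimal sketch with made-up anchor values:

```python
import bisect

# (slot, unix_timestamp) anchors with known-good times. These specific
# values are invented for illustration only.
ANCHORS = [(10_000_000, 1_590_000_000), (20_000_000, 1_594_500_000)]

def estimate_timestamp(slot: int) -> float:
    """Linearly interpolate (or extrapolate) a block's timestamp from the
    nearest anchors, assuming roughly constant slot times between them."""
    slots = [s for s, _ in ANCHORS]
    i = max(1, min(bisect.bisect_left(slots, slot), len(ANCHORS) - 1))
    (s0, t0), (s1, t1) = ANCHORS[i - 1], ANCHORS[i]
    return t0 + (slot - s0) * (t1 - t0) / (s1 - s0)
```

The estimates degrade wherever slot times actually varied, which is exactly why this history problem has no clean solution.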
Token program upgrade: Last year, Solana introduced the Token-2022 program (Token Extensions) to address fragmentation issues in the original design, which means you now need to understand the nuances of both the old and new token programs to track fungible tokens accurately.
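In practice this means a transfer pipeline must watch both token programs, since the same transfer semantics now live behind two program IDs. A minimal filtering sketch over pre-parsed instruction dicts (the dict shape here is an assumption); the program IDs below are the widely cited SPL Token and Token-2022 addresses, but verify them against official Solana docs before relying on them:

```python
# SPL Token (legacy) and Token-2022 program IDs; verify against official docs.
TOKEN_PROGRAMS = {
    "TokenkegQfeZyiNwAJbNbGKPFXCWuBvf9Ss623VQ5DA": "spl-token",
    "TokenzQdBNbLqP5VEhdkAS6EPFLC1PHnBqCXEpPxuEb": "token-2022",
}

def classify_transfers(instructions: list[dict]) -> list[dict]:
    """Keep only token-transfer instructions, tagging which program emitted
    them. A legacy-only filter would silently drop Token-2022 transfers."""
    transfers = []
    for ix in instructions:
        program = TOKEN_PROGRAMS.get(ix.get("programId"))
        if program and ix.get("type") in ("transfer", "transferChecked"):
            transfers.append({**ix, "token_program": program})
    return transfers
```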
Building on this, it's common to hear people say that blockchains are immutable, public, append-only databases. While that is generally true today, it wasn't always true in the early days. Optimism is a great example: it didn't just go through a genesis event and launch. In fact, the chain was completely relaunched (a "regenesis") a few months later.
The result? There is no complete dataset of all token transfers on the original Optimism chain.
Why does this matter? The missing data is critical to understanding the current and historical activity of major stablecoins on OP Mainnet, including USDC, USDT, and DAI. Without it, you can't assemble a complete dataset or compute accurate wallet balances.
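One way to see why the gap is fatal: wallet balances are typically reconstructed by netting out transfer deltas from genesis, so any missing prefix of history leaves every later "balance" off by an unknown starting amount. A minimal sketch, assuming transfers have already been decoded into simple records (the record shape is an assumption):

```python
from collections import defaultdict

def reconstruct_balances(transfers, history_complete: bool):
    """Net decoded transfer records into per-wallet balances. If the chain's
    early history is missing (as with pre-regenesis Optimism), the results
    are only deltas relative to an unknown starting state, not true balances."""
    balances = defaultdict(int)
    for t in transfers:  # each t: {"from": str, "to": str, "amount": int}
        balances[t["from"]] -= t["amount"]
        balances[t["to"]] += t["amount"]
    status = "exact" if history_complete else "deltas from unknown starting state"
    return dict(balances), status
```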
Building an accurate dataset requires becoming a blockchain historian. Understanding the subtle evolution of each chain and accounting for all these historical differences requires years of hard work.
Conclusion
Blockchain data poses challenges that simply don't exist in other industries. Even though it is nominally "open and transparent," extracting meaningful insight requires off-chain data, integrations with a dozen data service providers, contextual information scattered across crypto Twitter and official docs, and an engineering team of more than ten people. Otherwise, you're just blindly chasing a phantom market that changes at the speed of light.