Source: Bitcoin Magazine; Compiled by Wuzhu, Golden Finance
Everyone who has used Bitcoin has used a mempool. So, what is a mempool?
Technically, there is no single mempool. Each Bitcoin full node operates its own mempool, which is a cache of valid Bitcoin transactions that have been broadcast to the network but have not yet been confirmed in a block. Nodes exchange messages with each other to learn which transactions their peers have, and then request the ones they are missing.
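As a rough illustration of that exchange, the sketch below models nodes that announce transaction IDs and let peers fetch only what they lack. The Node class and its method names are hypothetical. The real protocol does this with inv, getdata, and tx messages over the network, along with many validation and rate-limiting rules that are left out here.

```python
class Node:
    """Toy full node: a mempool plus a list of connected peers."""

    def __init__(self, name):
        self.name = name
        self.mempool = {}   # txid -> transaction payload
        self.peers = []

    def connect(self, other):
        # Connections are bidirectional in this sketch.
        self.peers.append(other)
        other.peers.append(self)

    def submit(self, txid, tx):
        """Store a transaction we have not seen yet, then announce it."""
        if txid in self.mempool:
            return  # already known, nothing to relay
        self.mempool[txid] = tx
        for peer in self.peers:
            peer.on_announce(self, txid)

    def on_announce(self, sender, txid):
        """Roughly analogous to receiving an inv: fetch only unknown txs."""
        if txid not in self.mempool:
            tx = sender.on_request(txid)  # roughly analogous to getdata -> tx
            if tx is not None:
                self.submit(txid, tx)

    def on_request(self, txid):
        return self.mempool.get(txid)


# Three nodes in a line: a <-> b <-> c. Node a never talks to c directly.
a, b, c = Node("a"), Node("b"), Node("c")
a.connect(b)
b.connect(c)
a.submit("tx1", "payload")
print(sorted(n.name for n in (a, b, c) if "tx1" in n.mempool))
```

Even though a and c are not connected, the transaction reaches c through b, which is the property the rest of this article relies on.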
Each mempool is essentially an independent "island" with its own set of unconfirmed transactions and sometimes even its own configuration settings. The size of the mempool can be configured, and the default is 300 MB. In addition, there is a minimum fee rate, which adjusts dynamically and whose floor can be configured. It determines which transactions get kicked out when the mempool is full and more transactions keep pouring in. There are also some other configurable options, such as the datacarrier and datacarriersize options, which affect transactions containing OP_RETURN outputs.
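To make the eviction behavior concrete, here is a minimal sketch of a size-capped mempool that drops its lowest fee-rate transaction when it overflows and then raises its minimum fee rate. The Mempool class, its fields, and the numbers are all hypothetical, and the logic is deliberately simplified: a real node also accounts for dependencies between unconfirmed transactions and lets the raised minimum decay over time.

```python
class Mempool:
    """Toy size-capped mempool that evicts the lowest fee-rate transaction."""

    def __init__(self, max_bytes, min_fee_rate=1.0):
        self.max_bytes = max_bytes        # stands in for the size limit
        self.min_fee_rate = min_fee_rate  # sat/vB floor, rises under pressure
        self.txs = {}                     # txid -> (fee_sats, size_vbytes)

    def used_bytes(self):
        return sum(size for _, size in self.txs.values())

    def add(self, txid, fee_sats, size_vbytes):
        """Return True if the transaction is in the pool after the call."""
        if fee_sats / size_vbytes < self.min_fee_rate:
            return False  # below the current floor, rejected outright
        self.txs[txid] = (fee_sats, size_vbytes)
        while self.used_bytes() > self.max_bytes:
            # Evict whichever transaction pays the least per vbyte.
            worst = min(self.txs, key=lambda t: self.txs[t][0] / self.txs[t][1])
            fee, size = self.txs.pop(worst)
            # Assumption: new arrivals must at least match what was evicted.
            self.min_fee_rate = max(self.min_fee_rate, fee / size)
        return txid in self.txs


pool = Mempool(max_bytes=500)
pool.add("low", fee_sats=500, size_vbytes=250)     # 2 sat/vB
pool.add("high", fee_sats=2500, size_vbytes=250)   # 10 sat/vB
pool.add("mid", fee_sats=1250, size_vbytes=250)    # 5 sat/vB, forces an eviction
print(sorted(pool.txs), pool.min_fee_rate)
```

After the third transaction arrives, the 2 sat/vB transaction is gone and anything paying less than 2 sat/vB is turned away at the door.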
Different nodes have different reasons for running mempools, and therefore different requirements, but ultimately those requirements are met by all nodes running their own mempools in sync and interacting with each other.
Think of each mempool as a real pool, connected to the others via underlying channels. The larger the mempool, the deeper the pool. Miners, exchanges, and block explorers will all be the deepest pools. They all have their own incentives to know about every unconfirmed transaction waiting to be included in a block. Miners want to make sure they have the most profitable transactions in the next block. Exchanges want to make sure they know about all pending transactions. Block explorers want it because their entire service is to present as complete a dataset about the blockchain and mempools as possible. Your normal node really only needs to be deep enough to contain the highest fee-rate portion of the "mempool".
Now imagine each transaction as a drop of liquid: the higher the fee rate, the denser the liquid. These liquids flow through channels between pools, and upon reaching each pool, a drop of liquid is replicated and then sent through the channels to any other pool that has not yet received it. As the pools fill up, the liquid overflows, with the less dense liquid (lower fee rate) spilling over the edges of the pool first.
Eventually, some lucky miner will scoop a certain amount of liquid from the bottom of the pool and pour it into the newest section of a long, winding glass trough, where it stays forever (the blockchain).
This arrangement of interconnected pools serves different purposes for different users.
Traders
When users make transactions, the mempool serves two purposes. The first and most important is getting their transactions to miners. If a transaction does not enter a miner's mempool, it cannot be included in a block. Mempools are linked and share transactions with each other, ensuring that once a transaction is placed in one mempool, it will eventually enter the mempools of all miners. Having a strong and decentralized network that can ensure that transactions are eventually sent from users to all miners, regardless of network connectivity variations and fragmentation, is invaluable.
The second use is fee estimation, which is particularly important for Layer 2 users, who must be able to get a transaction confirmed promptly whenever they need to respond to an invalid state. Fees can be estimated to some extent by simply looking at the fee rates of transactions in recent blocks, but this does not provide any information about the state of the mempool after the latest block. It cannot account for sudden spikes, opportunists rushing into the mempool, or a surge of transactions that has not yet ended. Without access to the mempool, fee estimation cannot take into account the current state of pending transactions.
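One simple way to use the mempool for fee estimation is to rank pending transactions by fee rate and see where the next block would fill up. The sketch below does that over aggregated buckets of (total fee, total size). The function name, the bucket format, and the one-million-vbyte block size are assumptions chosen for illustration, and this is not how any particular wallet or node estimates fees.

```python
def estimate_next_block_fee_rate(pending, block_vbytes=1_000_000, floor=1.0):
    """Estimate the fee rate (sat/vB) at which the next block fills up.

    pending: iterable of (total_fee_sats, total_vbytes) buckets, each one
             standing for a group of pending transactions at one fee rate.
    floor:   value returned when everything pending fits in one block.
    """
    # Highest fee rate first, the order a fee-maximizing miner would prefer.
    ranked = sorted(((fee / size, size) for fee, size in pending), reverse=True)
    used = 0
    for rate, size in ranked:
        used += size
        if used >= block_vbytes:
            return rate  # the block fills up somewhere inside this bucket
    return floor


busy = [
    (20_000_000, 400_000),  # 50 sat/vB
    (8_000_000, 400_000),   # 20 sat/vB
    (2_000_000, 400_000),   # 5 sat/vB
]
quiet = [(20_000_000, 400_000)]

print(estimate_next_block_fee_rate(busy))
print(estimate_next_block_fee_rate(quiet))
```

A block-history estimator would give the same answer for both cases if the last few blocks looked alike, which is exactly the blind spot described above. This sketch still ignores transactions that arrive before the next block is found.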
Receiving
When you receive bitcoin, your node verifies the transaction and the entire block containing the transaction. In the simple picture, the transaction paying you is broadcast, ends up in a miner's mempool, the miner finds a block, the block is broadcast to the network, and your node downloads and verifies the block.
But that’s not how it actually works (unless you disable your node’s mempool and run in blocks-only mode). Your node validates each transaction as it first arrives in the mempool, and caches it as a valid Bitcoin transaction. When miners find a block, they actually only forward the block header and a small piece of compressed information (for lack of a better simple explanation) that can be used to determine which transactions are included in the block. Your node then grabs the pre-validated transactions, validates the block header, and if it all passes, forwards the “compact block”.
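The reconstruction step can be sketched as follows: the announcement carries short identifiers, and the receiving node matches them against transactions it already holds, asking its peer only for the ones it is missing. The function names are hypothetical, and the short ID here is a truncated SHA-256 used purely as a stand-in. The compact block specification (BIP 152) derives its 6-byte short IDs with a keyed SipHash instead.

```python
import hashlib


def short_id(txid, nonce):
    """Stand-in short ID: first 6 bytes of SHA-256(nonce || txid).

    Assumption: this is NOT the real derivation. It only mimics the idea of
    a short, per-block-salted identifier using the standard library.
    """
    return hashlib.sha256(nonce + txid.encode()).digest()[:6]


def reconstruct_block(short_ids, nonce, mempool):
    """Match announced short IDs against the local mempool.

    Returns (transactions, missing_positions). Missing positions are the
    transactions the node would still have to request from its peer.
    """
    index = {short_id(txid, nonce): tx for txid, tx in mempool.items()}
    txs, missing = [], []
    for pos, sid in enumerate(short_ids):
        if sid in index:
            txs.append(index[sid])   # already validated when first relayed
        else:
            txs.append(None)
            missing.append(pos)
    return txs, missing


nonce = b"\x01\x02"
mempool = {"tx_a": "raw_a", "tx_b": "raw_b"}
announced = [short_id(t, nonce) for t in ("tx_a", "tx_b", "tx_c")]
txs, missing = reconstruct_block(announced, nonce, mempool)
print(txs, missing)
```

The fuller a node's mempool, the fewer positions end up in the missing list and the faster the block can be validated and passed on.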
This optimization is actually why miners no longer rely on centralized and permissioned relay networks, such as FIBRE, formerly maintained by Matt Corallo, and the short-lived Falcon Network. Miners used to have to connect to the Falcon Network to guarantee low latency for block relay with other miners, due to the slow relay speeds of the peer-to-peer network.
Miners
Miners obviously want to see everything. They are profit-driven entities that want to filter the largest possible set of pending transactions down to those paying the highest fees. This is how they maximize profits and earn revenue to continue to expand their business and remain competitive.
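A minimal sketch of that filtering is a greedy selection by fee rate until the block is full. The function name and the sample transactions are hypothetical, and the sketch treats every transaction as independent, whereas real block construction also has to respect dependencies between unconfirmed transactions.

```python
def build_block_template(pending, max_vbytes=1_000_000):
    """Greedily pick transactions by fee rate until the block is full.

    pending: dict of txid -> (fee_sats, size_vbytes).
    Returns (selected_txids, total_fee_sats).
    """
    by_rate = sorted(
        pending.items(),
        key=lambda item: item[1][0] / item[1][1],
        reverse=True,
    )
    selected, used, total_fee = [], 0, 0
    for txid, (fee, size) in by_rate:
        if used + size > max_vbytes:
            continue  # does not fit, keep looking for smaller transactions
        selected.append(txid)
        used += size
        total_fee += fee
    return selected, total_fee


pending = {
    "a": (6_000, 300),  # 20 sat/vB
    "b": (1_000, 500),  # 2 sat/vB
    "c": (4_000, 400),  # 10 sat/vB
    "d": (900, 300),    # 3 sat/vB
}
print(build_block_template(pending, max_vbytes=1_000))
```

The larger the pending set a miner can see, the better the result of this selection can be, which is why miners want the deepest pool of all.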
They literally take money from the mempool. Their incentive to take any valid fee-paying transaction is so strong that they have built, are building, and almost certainly will keep building numerous systems, and even informal social arrangements, designed to let users submit transactions directly to miners rather than through the open peer-to-peer network.
Block explorers, on-chain analysis tools, etc.
Like miners, they want to see every pending transaction that has been created and broadcast to the world. The main difference between the two is that miners profit directly from these transactions by collecting fees, while block explorers and analysis companies profit from them indirectly, by displaying the data, analyzing it, and selling that analysis as a product.
I can't name any specific examples involving cached mempool data, but it is well known that on-chain analysis companies regularly purchase privately obtained metadata on on-chain transaction activity. They also operate Sybil Bitcoin nodes, which peer as widely as possible with nodes across the network in order to narrow down the range of nodes that initially broadcast transactions.
Block explorers also make money from visual displays of blockchain and mempool data, and their entire business model revolves around this. Having access to more data and showing it to users means more potentially profitable information if that information or information derived from it can be displayed in a useful or novel way.
Information Wants to Flow
All of these different types of users benefit from “one” public mempool for a simple reason: information flows freely between them. As long as a transaction pays enough in fees to pass the minimum relay filter, complies with consensus rules, and poses no legitimate risk of denial of service or resource exhaustion for individual nodes, propagating it to every individual mempool in the network provides value to each type of user.
Without a fully functional public mempool, the only viable alternatives for all of these different individual users with different purposes are either centralized solutions, or an unmanageable mess of sloppy and disorganized attempts to build fragmented public mempools that each user needs to keep track of individually.
This raises concerns about manipulated fee data, users being defrauded, and miners extracting value by privately relaying transactions. These are all issues Bitcoin would have to face without a healthy and open public mempool.
In subsequent posts, I will look at these issues, as well as different types of mempool filters and why they exist.