Author: Chao Wang Source: Future Mosaic Translation: Shan Ouba, Golden Finance
Throughout the history of technology, major breakthroughs have often occurred independently, each of which has sparked a revolution. However, when two powerful technologies come together, their synergy can catalyze extraordinary progress. Today, we stand at such a crossroads: AI and cryptocurrency, transformative in their own right, are joining forces.
We envision that crypto solutions will solve numerous AI challenges, AI agents will build autonomous economic networks to accelerate crypto adoption, and AI will drive the evolution of existing crypto technologies. Many eyes are on this intersection, a lot of money is pouring in, and the enthusiasm for these buzzwords is fueling this trend.
However, amid all this excitement, we know surprisingly little about the basics. How well does AI really understand crypto? Are LLM-powered agents actually able to use crypto tools? How well do different models perform on crypto tasks? The answers to these questions are critical to guiding the product and technology direction of this emerging field.
But we don’t know.
An Experiment
To address these fundamental questions, we conducted an experimental evaluation of 18 large language models, including mainstream commercial and open-source models, with parameter sizes ranging from 3.8B to 405B.
Closed-source models: GPT-4o, GPT-4o Mini, Claude 3.5 Sonnet, Gemini 1.5 Pro, Grok2 beta (currently closed-source)
Open-source models: Llama 3.1 8B/70B/405B, Mistral Nemo 12B, DeepSeek-coder-v2, Nous-hermes2, Phi3 3.8B/14B, Gemma2 9B/27B, Command-R, Qwen2-math-72B, MathΣtral
This study aims to assess the current state of AI applications in crypto and to evaluate the potential and challenges of integrating AI with crypto. Because this research is still in its early stages, this article focuses on key insights rather than detailed results.
The experiments showed that the AI models had a comprehensive understanding of cryptocurrency basics and exhibited a broad familiarity with the cryptocurrency ecosystem. The models also demonstrated proficiency in the knowledge required to perform a variety of basic wallet operations. Not only did their abilities improve significantly when properly prompted, but they also demonstrated the ability to perform complex analyses and operations as instructed. Together, these findings suggest that developing AI applications for a multitude of cryptocurrency-related fields is now a viable prospect.
However, the research also found several key limitations. There is a large gap between the theoretical knowledge and the practical application skills of these models, especially in crypto-related computations. While they are able to generate simple smart contracts, they have difficulty identifying subtle vulnerabilities in more complex protocols. In addition, these models are unable to address the fundamental challenge of securely managing private keys in cloud-based AI systems.
Deeper Exploration
Mathematical Gap: One of the most notable findings is that AI models generally struggle with crypto-related computations. It’s not just complicated cryptography; even basic operations like calculating AMM slippage or mining profitability are challenging. However, it’s important to note that large language models are not designed for mathematical computation. This limitation can be addressed by loading preset code, so that the LLM supplies the inputs and the code performs the calculation, which improves both efficiency and accuracy. This is similar to how humans typically handle complex computations, relying on specialized tools or preset formulas.
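As an illustration of the preset-code idea, here is a minimal sketch of a slippage calculator that a model could call instead of doing the arithmetic itself. It assumes a constant-product pool (x * y = k) with a 0.3% fee charged on the input; the article does not say which AMM design was tested, and the function name and parameters are hypothetical.

```python
from decimal import Decimal


def constant_product_swap(reserve_in, reserve_out, amount_in, fee=Decimal("0.003")):
    """Return (amount_out, slippage) for a swap against an x * y = k pool.

    Assumptions (not from the article): a constant-product pool with a
    proportional fee taken from the input amount. Slippage is defined here
    as the relative shortfall of the received amount versus what the
    pre-trade spot price would have paid with no fee and no price impact.
    """
    reserve_in, reserve_out, amount_in = (
        Decimal(reserve_in), Decimal(reserve_out), Decimal(amount_in)
    )
    if min(reserve_in, reserve_out, amount_in) <= 0:
        raise ValueError("reserves and amount_in must be positive")

    # Fee is deducted from the input before it reaches the pool.
    amount_in_after_fee = amount_in * (1 - Decimal(fee))
    # Keeping x * y constant gives: out = y * dx / (x + dx).
    amount_out = reserve_out * amount_in_after_fee / (reserve_in + amount_in_after_fee)

    # Output at the pre-trade spot price (y / x).
    spot_out = amount_in * reserve_out / reserve_in
    slippage = 1 - amount_out / spot_out
    return amount_out, slippage


if __name__ == "__main__":
    out, slip = constant_product_swap("1000", "1000", "100")
    print(f"received: {out:.6f}, slippage: {slip:.4%}")
```

Using `Decimal` rather than floats is a design choice for this sketch: token amounts are usually handled as exact quantities, and exact arithmetic avoids introducing rounding noise of its own.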
Security Dilemma: While AI models demonstrate a solid grasp of cryptographic security principles, implementing secure systems with AI remains problematic in practice. The need for cloud-based processing in many AI systems creates an inherent conflict with the decentralized, trustless nature of cryptocurrencies. Solving this problem will require third-party services such as trusted execution environments (TEEs) and hardware security modules (HSMs), or even more innovative new technologies.
Smart Contracts: Form over Function: AI models demonstrated an excellent ability to understand smart contracts and interpret their functionality. They can effectively modify contracts to address common vulnerabilities and optimization points, and can even autonomously create contracts for simple scenarios. However, when vulnerabilities were buried deep in complex business logic, every model failed to identify them. This suggests that the models’ understanding of smart contracts is still largely superficial, focusing on form rather than grasping the underlying business logic. While AI excels at contract interactions and basic creation, human expertise clearly remains critical to ensuring the security and efficiency of complex smart contract systems.
Open Source Challenges: The large performance gap between the top closed-source models and most open-source alternatives raises important questions about the future of AI in crypto. Given the crypto community’s emphasis on openness and decentralization, bridging this gap is critical for widespread adoption.
Solid Foundation and Potential: Despite the challenges, the models demonstrate a deep understanding of crypto fundamentals and show familiarity with the crypto ecosystem. Their mastery of concepts such as blockchain architecture, consensus mechanisms, and token economics is impressive, which suggests that AI in crypto has a solid foundation. Their capabilities also improve significantly with the right prompts. This improvement under guided prompting shows that current AI models, while not perfect, can already provide valuable insights and assistance in many crypto-related tasks, from market analysis to protocol design evaluation.
Looking Ahead: The Need for Crypto AI Benchmarks
As the experiments progressed, a pressing need became apparent: the crypto community needs standardized AI benchmarks. Just as ImageNet revolutionized computer vision AI, crypto-specific benchmarks can drive rapid progress in this convergence of technologies.
If one believes that the intersection of AI and crypto holds great potential, and that AI is expected to drive widespread adoption of crypto, then establishing dedicated benchmarks for the crypto community becomes a top priority. These benchmarks can serve as an important bridge connecting the AI and crypto communities, catalyzing innovation and providing clear guidance for future applications. This effort is more than just a technical exercise; it is a profound reflection on how to understand and shape this emerging digital frontier.
However, creating such benchmarks is not an easy task. It faces several major challenges. First, crypto is developing rapidly: its knowledge base is still changing, and there is no consensus on several core directions. Second, the field is interdisciplinary, spanning cryptography, distributed systems, economics, and more, so its complexity far exceeds that of any single discipline. Third, a benchmark must evaluate not only theoretical knowledge but also the practical ability of AI to use crypto technology, which requires new evaluation frameworks. Fourth, benchmark tasks must remain relevant to real-world applications in DeFi, NFTs, DAOs, and other emerging crypto fields, and the scarcity of relevant datasets makes this harder still. Given the scale and complexity of these challenges, this is not a task that any one party can solve alone. The multifaceted nature of the problem calls for diverse expertise and perspectives, and for the joint efforts of the crypto and AI communities. Only through this collective wisdom can we determine what is truly important in this emerging technological frontier and create benchmarks that accurately reflect the complexity and potential of AI in the crypto field.
Current Status and Next Steps
The current research framework consists of several key components:
An MVP dataset of about 700 multiple-choice questions, generated by AI and humans, and subsequently verified and refined by human experts. Despite its quality limitations, the dataset enables rapid, automatic testing of models, probes their conceptual understanding, and provides a basic scoring mechanism.
A growing set of about 100 complex tasks covering scenarios such as simulation, computation, code auditing, and tool usage. These tasks are contributed by multiple crypto domain experts, adding depth and realism to the evaluation.
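To make the automatic scoring of multiple-choice questions concrete, here is a minimal sketch of how such a scorer might work. The project's actual data format and harness have not been published, so the `Question` fields, the prompt wording, and the `ask_model` callable (any function that takes a prompt string and returns the model's reply) are all hypothetical.

```python
import re
from dataclasses import dataclass


@dataclass
class Question:
    prompt: str
    choices: dict  # label -> option text, e.g. {"A": "...", "B": "..."}
    answer: str    # label of the correct option


def format_prompt(q):
    """Render a question as plain text and ask for a letter-only reply."""
    lines = [q.prompt]
    lines += [f"{label}. {text}" for label, text in sorted(q.choices.items())]
    lines.append("Answer with the letter only.")
    return "\n".join(lines)


def extract_choice(reply, labels):
    """Return the label the reply starts with, or None.

    Crude by design: it accepts "B", "(b)", or "B) ..." but assumes the
    model was prompted to lead with the label. Anything else is unanswered.
    """
    match = re.match(r"\s*\(?([A-Za-z])\)?(?![A-Za-z])", reply)
    if not match:
        return None
    label = match.group(1).upper()
    return label if label in labels else None


def score(questions, ask_model):
    """Fraction of questions answered correctly; unparsable replies count as wrong."""
    if not questions:
        return 0.0
    correct = 0
    for q in questions:
        predicted = extract_choice(ask_model(format_prompt(q)), q.choices)
        correct += predicted == q.answer
    return correct / len(questions)
```

A scorer of this kind only checks the final letter, which is why it suits conceptual questions and not the complex tasks, where the reasoning and the resulting actions also need to be evaluated.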
In order to establish an effective benchmark, the dataset needs to be significantly expanded and more domain experts need to participate. Developing a suitable automatic evaluation framework for these complex tasks is also a key challenge that needs to be addressed.
In addition, for LLMs to cope with real-world tasks in the future, it is crucial to implement a basic agent framework. This framework will provide a more realistic testing environment and bridge the gap between theoretical knowledge and practical application.
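As a rough picture of what the smallest useful piece of such a framework could look like, the sketch below registers tools and executes a tool call emitted by a model as JSON. This is an assumption for illustration only: the JSON call format, the `tool` decorator, and the example `wei_to_ether` tool are hypothetical and do not describe the project's design.

```python
import json

TOOLS = {}  # tool name -> callable


def tool(fn):
    """Register a function so the model can call it by name."""
    TOOLS[fn.__name__] = fn
    return fn


@tool
def wei_to_ether(wei):
    """Convert an amount in wei to ether (1 ether = 10**18 wei)."""
    return int(wei) / 10**18


def run_tool_call(model_output):
    """Execute a call of the form {"tool": "<name>", "args": {...}}.

    Errors are returned as data rather than raised, so a test harness can
    record malformed or invalid calls as failures instead of crashing.
    """
    try:
        call = json.loads(model_output)
        fn = TOOLS[call["tool"]]
        return {"ok": True, "result": fn(**call.get("args", {}))}
    except (json.JSONDecodeError, KeyError, TypeError, ValueError) as exc:
        return {"ok": False, "error": f"{type(exc).__name__}: {exc}"}


if __name__ == "__main__":
    print(run_tool_call('{"tool": "wei_to_ether", "args": {"wei": 1500000000000000000}}'))
    print(run_tool_call('{"tool": "no_such_tool"}'))
```

In a benchmark setting, the returned dictionaries give the evaluator something it can check automatically: whether the model picked a valid tool, passed well-formed arguments, and obtained the expected result.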
The method is being continuously improved, with a focus on increasing the complexity of test cases and expanding the overall dataset. In the spirit of open collaboration, all relevant resources will soon be made public on GitHub, aiming to accelerate progress and invite participation from the broader community.
It is worth noting that this research is still in its early stages. The results should be viewed as preliminary observations and a starting point for further research, rather than definitive conclusions in the rapidly evolving field of AI and crypto. The project welcomes contributions from the broader crypto community to help build a more comprehensive and powerful evaluation framework.