DeepSeek is at a critical juncture. Since the second half of 2025, the following DeepSeek members have left or found new opportunities:

Wang Bingxuan, poached by Yao Shunyu of Tencent at the end of last year, was a core author of DeepSeek LLM (DeepSeek's first-generation large language model) and participated in the training of subsequent models.

Wei Haoran, who left around the Spring Festival, was a core author of the DeepSeek-OCR series and may join a major company.

Guo Daya, who recently officially resigned, is a core author of DeepSeek-R1 and may also join a major company.

Ruan Chong, who left in early 2025 to take a break, officially announced in January this year that he had joined the autonomous driving startup Yuanrong. A veteran member who joined during the Magic Square era, he is a core contributor to DeepSeek's multimodal work, including Janus-Pro.

DeepSeek has never raised outside funding and has no clear valuation. While other AI companies have seen their market capitalizations or valuations soar, Liang Wenfeng is trying to answer a question from his team: how much is the company really worth? The answer bears directly on the value of the stock option agreements employees have signed.

Starting in the fall of 2025, Liang Wenfeng began to place more emphasis on productization and commercialization. DeepSeek already had a product team of several dozen people, but it had not yet ventured into popular application areas such as AI programming and general agents, and its consumer-facing product was still a typical chatbot.

Liang Wenfeng's new challenge also includes managing scale. DeepSeek's headcount has surpassed that of Magic Square, making it the largest organization he has ever managed.

Overshadowing all these changes, DeepSeek V4 has not yet been officially released. Around January 2026, a smaller-parameter version of V4 was provided to some open-source framework communities for adaptation.
According to earlier, relatively optimistic expectations, the high-parameter version of V4 was originally expected to be released and open-sourced around mid-February, near the Spring Festival. It is now understood that DeepSeek V4 may be released in April.

Some people left, but more chose to stay. DeepSeek is adjusting, but many of its characteristics remain unchanged.

It is one of the few core AI labs globally that doesn't rely on heavy workloads. While core AI developers at companies like Google, OpenAI, xAI, and ByteDance in the US and China work 70-80 hours a week, most DeepSeek employees leave around 6-7 pm on weekdays, and there is no morning clock-in. Liang Wenfeng believes it is difficult for a person to produce high-quality output for more than 6-8 hours a day. DeepSeek has no explicit performance evaluations or deadlines. This lean yet highly talent-dense organization continues its "natural division of labor," allowing researchers to freely form teams or pursue new ideas independently.

"Besides the main research areas, some people at DeepSeek are also doing long-term research that might not yield results even after a year," said a source close to DeepSeek. "Domestically and globally, DeepSeek is the best place to find people who genuinely want to do research."

Of course, DeepSeek has another characteristic: secrecy. Especially after 2025, apart from publicly released technical reports, founder Liang Wenfeng and team members have collectively remained "silent," and it is difficult to hear their voices on social media or in communities where AI practitioners gather.

This report presents DeepSeek's characteristics, work focus, organizational structure, and the changes taking place in this organization of fewer than 200 people, as learned from various sources. All of this stems from the unique goals Liang Wenfeng has set for DeepSeek.

Liang Wenfeng: Doing a Few Things, and Doing Them to the Extreme
Liang Wenfeng's AI ambitions predate DeepSeek's founding in 2023. In 2016, Demis Hassabis, founder of DeepMind, a pioneer of the pursuit of AGI, assembled a quantitative trading team to generate revenue for the lab, which was then trying to gain independence from Google, but it failed to make money. In that same year, Liang Wenfeng, who holds bachelor's and master's degrees from Zhejiang University, had already been working in quantitative investment for eight years. He founded Magic Square in 2015, began running live deep-learning trading on GPUs in 2016, and by the end of 2017 had "AI-enabled almost all trading strategies." In 2019, he began building Magic Square's first computing cluster, "Firefly 1," with 1,100 GPUs. Also in 2019, Magic Square AI (Magic Square Artificial Intelligence Basic Research Co., Ltd.) was officially registered. Luo Fuli, now in charge of AI at Xiaomi, and Ruan Chong, who recently joined Yuanrong, both joined Magic Square after this point and later moved to DeepSeek in 2023.

Having achieved financial freedom before the age of 30, Liang Wenfeng lives simply and keeps a low profile. Those around him remember him wearing the same clothes for days on end. He used to stay in hotels in Hangzhou for extended periods while renting apartments in Beijing, where most of DeepSeek's R&D staff are located. He is lean, exercises regularly, and is known for outdoor hobbies such as hiking.

While Jensen Huang invites Nvidia employees to his home for drinks and casual conversation and happily shows off his sports cars, Liang Wenfeng doesn't participate in quarterly team-building activities, rarely dines with team members, and makes only a brief appearance at the year-end team-building event, never staying for the whole thing.

In 2022, a Magic Square employee known as "An Ordinary Little Pig" donated 138 million yuan to charity. Many later speculated that this "little pig" was Liang Wenfeng.
Magic Square staff responded, "Employee donations are all anonymous, and the company doesn't know the little pig's true identity."

Within his work, Liang Wenfeng does only a few things. He skips what most startup CEOs do, such as fundraising. In 2023, Liang Wenfeng met with a small group of investors. But according to our understanding, he made an unconventional request: similar to the investment agreement between OpenAI and Microsoft, he wanted investors to accept a cap on their returns. After this round of meetings, no institutional investor put money into DeepSeek. In the two years that followed, funding for large-model startups in China surged, with single rounds frequently reaching hundreds of millions of dollars. Yet Liang Wenfeng stopped meeting with investors and even stopped making new connections. Even outside a fundraising cycle, most founders wouldn't refuse to meet partners from top-tier institutions, but Liang Wenfeng declined most such requests.

Liang Wenfeng devoted almost all his time to the few things he believed deserved focus, doing them meticulously and to the extreme. One key to DeepSeek's earlier success was this "single-minded focus": prioritizing language models above all else and avoiding popular directions like multimodal generation. Within his chosen focus, Liang Wenfeng digs into the details hands-on. He learns about algorithms, architecture, infrastructure, and data from team members with diverse backgrounds, and actively joins detailed discussions about models and products. Many who have met Liang Wenfeng mention that he lacks the "aura" of a CEO or a so-called genius; he is more like a researcher, discussing specific technical issues with others.

Zhang Jinjian, founding partner of Oasis Capital, shared a story in his book "Those Who Live Out Their Lives": he asked Yan Junjie, founder of portfolio company MiniMax, "Is there anyone more focused than you?"
Yan Junjie recounted an experience: he had arranged dinner with someone he'd never met before. Arriving early, he saw a young man in a T-shirt and assumed he was an assistant. The man didn't introduce himself but instead asked Yan Junjie many technical questions. After half an hour, Yan Junjie asked, "When will Mr. Liang arrive?" The man replied, "I am Liang Wenfeng."

DeepSeek's Organization: Flat, Cross-functional, No Overtime

In line with Liang Wenfeng's style, DeepSeek's organization is extremely flat, with cross-functional division of labor, cautious expansion, and no overtime.

When Liang Wenfeng founded Magic Square, he had partners, but DeepSeek has no second-in-command. The research team in particular has only two levels: Liang Wenfeng and everyone else. Liang Wenfeng makes the major decisions and bears the most responsibility for the results. This research team now has about 100 people, resembling a large laboratory. DeepSeek researchers, mostly born around 2000, habitually call Liang Wenfeng, born in 1985, "Boss Liang." This "boss" is more like a mentor: organizing R&D, coordinating resources, conducting his own research, and appearing as the corresponding author on collaborative projects.

Liang Wenfeng is most involved with the model architecture team, where he engages in in-depth discussions to finalize the architecture of each generation of models. This team comprises several dozen people and is the main force behind pre-training. Closely tied to model architecture are the Infra and data teams, each also numbering several dozen. While the Infra team at some companies can resemble an "internal contractor" fulfilling the algorithm team's requirements, DeepSeek's Infra team participates in discussions and offers suggestions during the finalization stage before model training. The close collaboration between these modules blurs the lines between DeepSeek's teams, creating a cross-functional division of labor.
This is in fact the form of collaboration best suited to model training, because data selection and infrastructure implementation must be considered during model experimentation and finalization. Liang Wenfeng acts as the probe and the glue connecting these modules; he attends each team's individual meetings to understand overall progress and bottlenecks. Most of DeepSeek's team weekly meetings are also open to people from other teams, allowing cross-team participation.

This in-depth, detail-oriented leadership style and spontaneously formed close collaboration are difficult to achieve in large organizations. DeepSeek is therefore very cautious about expanding its core R&D team.

What is truly unique in the global AI community is that DeepSeek doesn't do overtime. There is no clock-in/out system and no explicit performance evaluation. Most members leave around 6 or 7 pm on weekdays. DeepSeek provides free after-work benefits, such as sports lessons and reimbursement for sports facilities. Liang Wenfeng believes it is difficult for an individual to work at high quality for more than 6-8 hours a day; poor judgment made under overtime fatigue wastes valuable computing resources, doing more harm than good.

On personnel, DeepSeek previously rarely recruited from outside, relying mainly on recent graduates and interns. In early 2025, LatePost compiled a list of 172 researchers (including interns) who had participated in DeepSeek's three generations of models (LLM, V2, V3 & R1) and found the resumes of 84 of them: over 70% were undergraduates or master's students, and over 70% were under 30 years old.

Before V3 and R1, DeepSeek, with about one-tenth the staff of a large company and about half the average working hours, achieved a level of focus and dedication that placed it among the world's top large-model developers.
However, as reaching cutting-edge AI capabilities came to require exploring more and more directions, maintaining this organizational size, communication style, and collaborative atmosphere became increasingly difficult.

Over the past 15 months, DeepSeek has continued to do its own thing while the outside world changed dramatically.

After the explosive success of V3 and R1 in early 2025, DeepSeek didn't capitalize on the momentum with a major overhaul. Instead, it continued its focused research and development, with publicly released results falling into three main categories.

First, efficiency optimization: maximizing GPU utilization to increase the intelligence output per unit of computing power. This includes the complete training and inference infrastructure DeepSeek released during its Open Source Week in early 2025, encompassing an inference kernel, a communication library, a matrix multiplication library, and a data processing framework. (Note: a kernel is code that performs the lowest-level computations on the GPU, used to implement core operations such as matrix multiplication.)

There have also been continuous improvements to the attention mechanism, such as NSA (Native Sparse Attention) in early 2025 and the subsequent DSA (DeepSeek Sparse Attention). Together with MLA (Multi-Head Latent Attention) in the earlier V2, their common goal is to process longer contexts without a significant increase in computing power.

With the DeepSeek-V3.2 update at the end of September 2025, DeepSeek even switched its underlying operator library from the mainstream CUDA and Triton languages to TileLang. CUDA is the lowest-level language provided by NVIDIA, Triton is open-sourced by OpenAI, and TileLang is an open-source project initiated by Yang Zhi's team at Peking University.
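The shared goal of these attention variants, processing longer contexts without a proportional rise in compute, can be illustrated with a toy example. The sketch below uses a simple sliding-window scheme, which is only a stand-in for the general sparse-attention idea, not DeepSeek's actual NSA or DSA designs: each query attends to a fixed number of recent keys, so cost grows roughly linearly with sequence length instead of quadratically.

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def sliding_window_attention(q, k, v, window=4):
    """Causal attention where each query attends only to its `window`
    most recent positions. A toy illustration of sparse attention,
    NOT the NSA/DSA algorithms themselves."""
    n, d = q.shape
    out = np.zeros_like(v)
    for i in range(n):
        lo = max(0, i - window + 1)                  # start of the local window
        scores = q[i] @ k[lo:i + 1].T / np.sqrt(d)   # scaled dot products
        out[i] = softmax(scores) @ v[lo:i + 1]       # weighted sum of values
    return out
```

With `window` at least as large as the sequence length, this reduces to full causal attention; real sparse-attention systems combine such locality with learned or compressed access to distant tokens so long-range information is not simply dropped.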
Second, improvements to the model architecture, such as mHC (Manifold-Constrained Hyper-Connections), released in early 2026, which aims to improve stability during large-scale training; and Engrams, which build long-term memory outside the model. It is widely believed that mHC will be used in the training of V4.

Third, some "non-mainstream" explorations, such as DeepSeek-OCR, which renders text into images before feeding it to the model. The approach aims to let the model grasp paragraphs and hierarchy in a way closer to how humans "read," improving its ability to understand complex documents. Inside DeepSeek there are many more such ongoing attempts, including continual learning and self-learning. In 2025, Liang Wenfeng also recruited consultants with backgrounds in neuroscience and brain science to explore learning mechanisms closer to the human brain.

Meanwhile, the external AI environment has changed dramatically since 2025, with two main competitive lines drawing attention.

One is Agentic models and applications built on coding capabilities. This is currently the fiercest battleground between Anthropic and OpenAI, producing the matchups of Opus 4.6 vs. GPT-5.4 and Claude Code vs. Codex. OpenClaw, which has become wildly popular since the beginning of the year, is the latest form of Agentic application.

The other is multimodal generation, a field that has repeatedly drawn attention for its "magical effects": OpenAI's GPT-4o image generation in the spring of 2025, Google's Nano Banana in the fall, and ByteDance's Seedance 2.0 before the 2026 Spring Festival. Video generation also connects to a more cutting-edge direction: "world models." DeepSeek initially didn't invest much in multimodal generation because Liang Wenfeng believed it wasn't central to intelligence.
On the Agent front, DeepSeek-V3.2 strengthened agent capabilities, but DeepSeek's overall iteration frequency has been lower than in the R1 era, in contrast to the visible urgency of smaller developers. Since the beginning of 2025, Zhipu, MiniMax, and Kimi have updated their models to versions 5, 4, and 3 respectively, enhancing Agent or coding capabilities. According to OpenRouter data for the past 30 days (February 24 to March 26), 6 of the 10 models consuming the most OpenClaw application tokens through OpenRouter came from China, while DeepSeek-V3.2 itself ranked 12th. (Note: OpenRouter better reflects usage by individuals and small-to-medium developers, and serves only as a rough reference for overall token consumption.)

Against this backdrop, DeepSeek's own investments run along two lines.

First, building large models on a domestic ecosystem. DeepSeek will invest in adapting to domestic GPUs to address the limited supply of high-performance chips. For example, after the V3.1 update last August, DeepSeek mentioned that it uses UE8M0 FP8, a low-precision floating-point data format, "designed for next-generation domestic chips." The aforementioned replacement of Triton with the domestically developed, open-source TileLang is the same kind of work, allowing greater control at the foundational layer. In discussions with AI practitioners, Liang Wenfeng has also posed the hypothetical question: "Can we use a fraction of existing computing power to achieve all current intelligence?"

Second, "original innovation": exploring directions that large companies or other startups couldn't or wouldn't attempt. For example, in the second half of 2024, DeepSeek began the Janus series, attempting to unify understanding and generation in multimodal models. DeepSeek has also worked on the Prover series, exploring formal proofs; DeepSeek-OCR (Optical Character Recognition) in 2025; and ongoing internal research into continual learning and brain-inspired models.
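A note on the UE8M0 FP8 format mentioned above: public details of DeepSeek's usage are thin, but the name matches the exponent-only E8M0 encoding in the OCP Microscaling (MX) specification, which has 8 exponent bits, no sign bit, and no mantissa bits (bias 127), and is typically used to store per-block scale factors as exact powers of two. The following is a minimal sketch under that assumption, not a description of DeepSeek's internals:

```python
import math

BIAS = 127  # E8M0 exponent bias per the OCP MX spec

def ue8m0_encode(scale: float) -> int:
    """Round a positive scale to the nearest power of two and store its
    biased exponent in one byte. Sketch of an exponent-only format;
    DeepSeek's exact usage is not public."""
    assert scale > 0
    e = round(math.log2(scale)) + BIAS
    return max(0, min(254, e))  # clamp; 255 is reserved for NaN in the spec

def ue8m0_decode(byte: int) -> float:
    """Recover the power-of-two scale from its biased exponent byte."""
    return 2.0 ** (byte - BIAS)
```

Because every representable value is an exact power of two, multiplying a tensor by such a scale only shifts binary exponents and leaves mantissas untouched, one reason exponent-only scale formats suit low-precision hardware.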
As the founder, Liang Wenfeng cares not only about model performance itself but also about the more fundamental, original discoveries made along the way. This, however, doesn't match some external expectations of DeepSeek: some hope every move DeepSeek makes will be as groundbreaking as R1, which is unrealistic and defies how technology actually progresses.

Liang Wenfeng can disregard external expectations, but he must confront internal ones. For many young researchers, doing more cutting-edge research means embracing greater uncertainty. A safer path is to keep contributing to the industry's strongest models, be named on high-profile technical reports, and have access to abundant GPU resources for experiments and exploration.

Beyond prestige and influence, high financial promises are the other big lure pulling at DeepSeek members. DeepSeek's absolute salaries are not low, but outside offers are higher still. Headhunters told us that competitors offered "irresistible figures": "two to three times the amount wouldn't be a problem," and "other companies offered eight-figure sums (including stock or options)." More recently, the IPOs of MiniMax and Zhipu sent their share prices soaring, with Jieyue and Kimi's IPOs on the way. This has led some DeepSeek members to question the value of the unpriced stock options they hold.

Faced with lucrative offers, more people chose to stay. They appreciate Liang Wenfeng's approach to AGI and are willing to pursue exploration outside the race; they have also grown accustomed to DeepSeek's relatively relaxed research environment. Recent rumors are inaccurate: there have been changes within the DeepSeek team, but no mass exodus. "Those who stayed still have ideals," said a person close to DeepSeek.
Liang Wenfeng felt that besides improving model efficiency and performance, it was necessary to explore directions with unclear immediate returns, because "companies abroad with more computing power, such as Google and OpenAI, are certainly trying various directions internally."

To this day, DeepSeek's relatively small team and the transparent, flat structure it has had since its inception allow a natural division of labor: sometimes a new direction starts simply because three to five people think an idea is good and begin working on it together. This echoes Liang Wenfeng's description in a 2024 interview with *Dark Surge*: "We generally don't pre-define tasks," and "Everyone has their own unique growth experience and comes with their own ideas; there's no need to push them... However, when an idea shows potential, we allocate resources from the top down."

"DeepSeek is a place where you can find people who genuinely want to do research, domestically and globally," said a person close to DeepSeek.

Changing the world, and being changed by the world. DeepSeek's unique understanding and decomposition of the AGI goal is its most valuable asset, and also the source of its current internal tensions.

This is because Liang Wenfeng's emphasis on ecosystem building and original exploration overlaps with, but is not entirely aligned with, the industry's prevailing priority of "staying the strongest."

Furthermore, as large models have evolved, the standards for "strong" and "original" have become increasingly blurred and subjective. Benchmark scores can no longer fully measure model performance. Especially in the competition over agentic models, product reach and the long-tail use cases and diverse data it brings have become more important, precisely the areas where DeepSeek, focused on model development, has previously invested little.
The upcoming V4 will likely remain the strongest open-source model, but it is unlikely to be overwhelmingly so, because different developers and users in different scenarios now hold increasingly diverse standards for and perceptions of "strong."

What counts as original and valuable new exploration has always been debatable, resting on the experience, judgment, and intuition of individual researchers, the so-called "technical taste." Taste is verified through experiments, but the number and scale of experiments are limited by GPU resources, and compared with its peers, DeepSeek doesn't have that much computing power.

Finally, whether it's the ecosystem foundation for large models or exploring directions other teams wouldn't attempt in pursuit of model performance, the returns on the work Liang Wenfeng values are highly uncertain. Cutting-edge research should bear such uncertainty, but that sits uneasily with limited computing resources and outside expectations that DeepSeek will keep amazing, or even "crushing," everyone else.

Liang Wenfeng has realized the need for change. Recently he began looking for ways to value the company and give team members more definite expectations. DeepSeek will also invest more in products. We compiled all the job postings published by a DeepSeek HR on social media from December 2024 to the present. In the latest posting, in mid-March this year, DeepSeek mentioned a specific product name for the first time, recruiting a "Model Strategy Product Manager" in the Agent field: "Continuously tracking industry trends, familiar with and deeply using well-known agents such as Claude Code, OpenClaw, and Manus…" We will surely see more moves from DeepSeek in Agent products.

In early 2025, DeepSeek, with its generous open-source spirit and remarkable achievements despite limited resources, shocked and changed China and the world.
It inspired a group of peers to focus more on model technology itself, spurred subsequent models such as Kimi K2 and K2-Thinking, and directly spawned new teams, such as MiroMind, funded by Chen Tianqiao.

A miracle is a miracle precisely because it is rare, a low-probability event. In China's competitive, results-oriented environment, the very existence of DeepSeek, which dares to pursue its own goals, is itself a surprising, low-probability event.

Those who have met Liang Wenfeng describe him as "remarkably resistant to noise." After the R1 craze in 2025, he was indifferent to the hype. Now he faces a different challenge: distinguishing noise from signal amid intensifying external competition, holding to what should be held, and changing what needs to change.

"Those who focus on their work may not necessarily win in a turbulent market, but only with more companies like DeepSeek can Chinese technology move from 'copying' to leading," said one industry insider.

This is the task now facing Liang Wenfeng and DeepSeek. For those who have been moved by this company, the best response is simple: strip away the overly optimistic narrative and view the company and its technological explorations with a more objective, realistic eye.