Source: Quantum Number
According to the New York Times, the 2024 Turing Award, announced in March 2025, went to two artificial intelligence pioneers, Andrew Barto and Richard Sutton, who developed reinforcement learning, a technique essential to chatbots such as ChatGPT.

The research of Andrew Barto (left), of the University of Massachusetts Amherst, and Richard Sutton plays a key role in today's artificial intelligence systems. (Photo source: Association for Computing Machinery)
In 1977, as a researcher at the University of Massachusetts Amherst, Andrew Barto began exploring a new theory that neurons behave like hedonists. The basic idea is that the human brain is driven by billions of nerve cells, each of which is working to maximize pleasure and minimize pain.
A year later, another young researcher, Richard Sutton, joined his team. Together, they used this simple concept to explain human intelligence and then applied it to artificial intelligence. The result was "reinforcement learning," a way for artificial intelligence systems to learn from the digital equivalent of pleasure and pain.
On Wednesday, the Association for Computing Machinery, the world's largest society of computing professionals, announced that Dr. Barto and Dr. Sutton had won this year's Turing Award for their work on reinforcement learning. The two will share the $1 million prize that comes with the award, which was established in 1966 and is often referred to as the Nobel Prize of computing.
Reinforcement learning has played a crucial role in the rise of artificial intelligence over the past decade, including breakthrough technologies such as Google's AlphaGo and OpenAI's ChatGPT. The technology for these systems is based on the work of Dr. Barto and Dr. Sutton.
“They are the undisputed pioneers of reinforcement learning,” said Oren Etzioni, a professor emeritus of computer science at the University of Washington and the founding chief executive of the Allen Institute for Artificial Intelligence. “They came up with the key ideas and wrote the book on the subject.”
Their book, “Reinforcement Learning: An Introduction,” published in 1998, remains the definitive exploration of an idea that many experts say is just beginning to fulfill its potential.
Psychologists have long studied the way humans and animals learn from experience. In the 1940s, Alan Turing, a pioneering British computer scientist, proposed that machines could learn in a similar way.
But it was Dr. Barto and Dr. Sutton who began exploring the mathematics of how this kind of learning might work, building on theories developed by A. Harry Klopf, a computer scientist working for the government. Dr. Barto went on to set up a lab at the University of Massachusetts Amherst to study the idea, while Dr. Sutton set up a similar lab at the University of Alberta in Canada.
“It’s an obvious idea when you’re talking about humans and animals,” said Dr. Sutton, who is also a research scientist at AI startup Keen Technologies and a researcher at the Alberta Machine Intelligence Institute, one of Canada’s three national AI labs. “When we revived it, it was about machines.”
Until AlphaGo came along in 2016, reinforcement learning remained an academic pursuit. Most experts thought it would be another 10 years before anyone built an AI system capable of beating the world’s top players at the game of Go.
But during a match in Seoul, South Korea, AlphaGo beat Lee Sedol, the best Go player of the past decade. The secret was that the system had played millions of games against itself, learning by trial and error which moves led to success (pleasure) and which led to failure (pain).
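To make the idea of learning from self-play concrete, here is a minimal sketch in Python. It is a toy, not AlphaGo's actual method, which combined deep neural networks with tree search. The game (a pile of stones, each player removes one or two, whoever takes the last stone wins), the function names, and every parameter are assumptions chosen for illustration. One shared table of move values plays both sides: a winning move is rewarded with +1, and any other move is scored as the negative of the best reply available to the opponent.

```python
import random
from collections import defaultdict

ACTIONS = (1, 2)  # a move removes one or two stones (toy rules, assumed)


def legal_moves(stones):
    """Moves that do not take more stones than remain."""
    return [a for a in ACTIONS if a <= stones]


def train(episodes=5000, max_stones=10, alpha=0.5, epsilon=0.2, seed=0):
    """Learn move values by self-play: one table plays both sides."""
    rng = random.Random(seed)
    q = defaultdict(float)  # (stones, move) -> estimated value for the mover
    for _ in range(episodes):
        stones = rng.randint(1, max_stones)  # random start covers every pile size
        while stones > 0:
            moves = legal_moves(stones)
            if rng.random() < epsilon:
                move = rng.choice(moves)  # trial: occasionally try a random move
            else:
                move = max(moves, key=lambda a: q[(stones, a)])  # best known move
            remaining = stones - move
            if remaining == 0:
                target = 1.0  # took the last stone: success ("pleasure")
            else:
                # The opponent moves next, so their best value is our loss ("pain").
                target = -max(q[(remaining, a)] for a in legal_moves(remaining))
            q[(stones, move)] += alpha * (target - q[(stones, move)])
            stones = remaining
    return q


def best_move(q, stones):
    """Greedy move according to the learned table."""
    return max(legal_moves(stones), key=lambda a: q[(stones, a)])


q_table = train()
print({s: best_move(q_table, s) for s in (4, 5, 7, 8)})
```

In this toy game the winning strategy is to leave the opponent a multiple of three stones. Nobody tells the program that rule. It emerges from the rewards alone, which is the point the AlphaGo story illustrates at a vastly larger scale.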
The Google team that built the system was led by David Silver, a researcher who had studied reinforcement learning with Dr. Sutton at the University of Alberta.
Many experts still question whether reinforcement learning can work beyond games. Game wins are determined by scores, which makes it easy for machines to distinguish success from failure.
But reinforcement learning has also played a major role in online chatbots.
Before ChatGPT was released in the fall of 2022, OpenAI hired hundreds of people to work with early versions and provide precise suggestions to hone the chatbot's skills. They showed the chatbot how to answer specific questions, scored its responses, and corrected its mistakes. By analyzing these suggestions, ChatGPT learned how to become a better chatbot.
The researchers call this "reinforcement learning from human feedback," or RLHF for short, and it's one of the key reasons why today's chatbots can respond in surprisingly lifelike ways.
(The New York Times has sued OpenAI and its partner Microsoft for copyright infringement of news content related to its AI systems. OpenAI and Microsoft have denied the allegations.)
More recently, companies such as OpenAI and DeepSeek have developed a form of reinforcement learning that allows chatbots to learn on their own—just like AlphaGo. By solving a variety of math problems, for example, a chatbot can learn which approaches lead to the right answer and which don’t.
If this process is repeated with a large number of questions, the chatbot can learn to mimic the way humans reason — at least in some ways. The result is a so-called reasoning system, such as OpenAI’s o1 or DeepSeek’s R1.
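The idea of learning from problems with checkable answers can be sketched in a few lines of Python. This is a deliberately tiny illustration, not how OpenAI or DeepSeek train their systems. The three candidate "approaches," the function name `learn_approach`, and all parameters are hypothetical. The learner is never told which approach is correct. It only receives a reward of 1 when its answer to a multiplication question matches the right one, and it keeps a running average of the reward each approach has earned.

```python
import random

# Hypothetical candidate "approaches" to a multiplication question.
APPROACHES = {
    "add": lambda a, b: a + b,
    "subtract": lambda a, b: a - b,
    "multiply": lambda a, b: a * b,
}


def learn_approach(rounds=500, epsilon=0.2, seed=0):
    """Estimate each approach's average reward by trial and error."""
    rng = random.Random(seed)
    value = {name: 0.0 for name in APPROACHES}  # running average reward
    tries = {name: 0 for name in APPROACHES}    # times each approach was used
    for _ in range(rounds):
        # Operands start at 3 so a + b never happens to equal a * b.
        a, b = rng.randint(3, 9), rng.randint(3, 9)
        if rng.random() < epsilon:
            name = rng.choice(list(APPROACHES))  # explore a random approach
        else:
            name = max(value, key=value.get)     # use the best approach so far
        # The reward comes only from checking the final answer.
        reward = 1.0 if APPROACHES[name](a, b) == a * b else 0.0
        tries[name] += 1
        value[name] += (reward - value[name]) / tries[name]
    return value


values = learn_approach()
print(max(values, key=values.get))
```

Because the answer can be checked automatically, no human has to score each attempt, which is what lets this style of training scale to a large number of questions.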
Dr. Barto and Dr. Sutton say these systems hint at how machines might learn in the future. Eventually, they say, robots equipped with artificial intelligence will learn through trial and error in the real world, just as humans and animals do.
“Learning to control your body through reinforcement learning, that’s a very natural thing to do,” Dr. Barto said.