Two AI pioneers win the Turing Award

2025/03/06 14:42

Source: Quantum Number

According to the New York Times, the 2024 Turing Award, announced this week, has gone to two artificial intelligence pioneers, Andrew Barto and Richard Sutton, who developed reinforcement learning, a technique essential to chatbots such as ChatGPT.

The research of Andrew Barto (left), of the University of Massachusetts Amherst, and Richard Sutton plays a key role in today's artificial intelligence systems. (Photo: Association for Computing Machinery)

In 1977, as a researcher at the University of Massachusetts Amherst, Andrew Barto began exploring a new theory that neurons behave like hedonists. The basic idea is that the human brain is driven by billions of nerve cells, each of which is working to maximize pleasure and minimize pain.

A year later, another young researcher, Richard Sutton, joined his team. Together, they used this simple concept to explain human intelligence and then applied it to artificial intelligence. The result was "reinforcement learning," a way for artificial intelligence systems to learn from the digital equivalent of pleasure and pain.

On Wednesday, the Association for Computing Machinery, the world's largest society of computing professionals, announced that Dr. Barto and Dr. Sutton had won the Turing Award for their work on reinforcement learning. The Turing Award, established in 1966, is often referred to as the Nobel Prize of computing. The two scientists will share the award's $1 million prize.

Reinforcement learning has played a crucial role in the rise of artificial intelligence over the past decade, including breakthrough technologies such as Google's AlphaGo and OpenAI's ChatGPT. The technology for these systems is based on the work of Dr. Barto and Dr. Sutton.

“They are the undisputed pioneers of reinforcement learning,” said Oren Etzioni, a professor emeritus of computer science at the University of Washington and the founding CEO of the Allen Institute for Artificial Intelligence. “They came up with the key ideas and wrote the book on the subject.”

Their book, “Reinforcement Learning: An Introduction,” published in 1998, remains the definitive exploration of an idea that many experts say is just beginning to fulfill its potential.

Psychologists have long studied the way humans and animals learn from experience. In the 1940s, Alan Turing, a pioneering British computer scientist, proposed that machines could learn in a similar way.

But it was Dr. Barto and Dr. Sutton who began exploring the mathematics of how such learning might work, building on a theory developed by A. Harry Klopf, a computer scientist working for the government. Dr. Barto went on to set up a lab at the University of Massachusetts Amherst to study the idea, while Dr. Sutton set up a similar lab at the University of Alberta in Canada.

“It’s an obvious idea when you’re talking about humans and animals,” said Dr. Sutton, who is also a research scientist at AI startup Keen Technologies and a researcher at the Alberta Machine Intelligence Institute, one of Canada’s three national AI labs. “When we revived it, it was about machines.”

Until AlphaGo came along in 2016, reinforcement learning remained largely an academic pursuit. Most experts thought it would be another 10 years before someone built an AI system capable of beating the world’s top players at the game of Go.

But in a match in Seoul, South Korea, AlphaGo beat Lee Sedol, the best Go player of the past decade. The secret was that the system had played millions of games against itself, learning by trial and error. It learned which moves led to success (pleasure) and which led to failure (pain).
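The trial-and-error idea can be illustrated with a toy sketch. This is not how AlphaGo works (that system combined deep neural networks with tree search); it is a minimal example that assumes a made-up game with three possible moves, each with a hidden chance of winning. The function name, the win probabilities and the reward values of +1 and -1 are all hypothetical choices for illustration.

```python
import random


def learn_by_trial_and_error(win_probs, episodes=5000, epsilon=0.1, seed=0):
    """Estimate the value of each move from wins (+1) and losses (-1)."""
    rng = random.Random(seed)
    values = [0.0] * len(win_probs)  # running average reward per move
    counts = [0] * len(win_probs)    # how often each move has been tried
    for _ in range(episodes):
        if rng.random() < epsilon:
            # Explore: occasionally try a random move.
            move = rng.randrange(len(win_probs))
        else:
            # Exploit: pick the move that currently looks best.
            move = max(range(len(win_probs)), key=lambda m: values[m])
        # The environment answers with "pleasure" (+1) or "pain" (-1).
        reward = 1.0 if rng.random() < win_probs[move] else -1.0
        counts[move] += 1
        # Incremental average: nudge the estimate toward the new reward.
        values[move] += (reward - values[move]) / counts[move]
    return values


# Hypothetical hidden win chances for three moves; the learner never sees them.
values = learn_by_trial_and_error([0.2, 0.5, 0.8])
best_move = max(range(len(values)), key=lambda m: values[m])
print("estimated values:", values)
print("best move:", best_move)
```

The learner is never told which move is best. It only sees wins and losses, and its estimates drift toward the move that wins most often.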

The Google team that built the system was led by David Silver, a researcher who had studied reinforcement learning with Dr. Sutton at the University of Alberta.

Many experts still question whether reinforcement learning can work beyond games. Game wins are determined by scores, which makes it easy for machines to distinguish success from failure.

But reinforcement learning has also played a major role in online chatbots.

Before ChatGPT was released in the fall of 2022, OpenAI hired hundreds of people to work with early versions and provide precise suggestions to hone the chatbot's skills. They showed the chatbot how to answer specific questions, scored its responses, and corrected its mistakes. By analyzing these suggestions, ChatGPT learned to be a better chatbot.

The researchers call this "reinforcement learning from human feedback," or RLHF for short, and it is one of the key reasons today's chatbots can respond in surprisingly lifelike ways.

(The New York Times has sued OpenAI and its partner Microsoft for copyright infringement of news content related to their AI systems. OpenAI and Microsoft have denied the allegations.)

More recently, companies such as OpenAI and DeepSeek have developed a form of reinforcement learning that allows chatbots to learn on their own—just like AlphaGo. By solving a variety of math problems, for example, a chatbot can learn which approaches lead to the right answer and which don’t.

If this process is repeated with a large number of questions, the chatbot can learn to mimic the way humans reason — at least in some ways. The result is a so-called reasoning system, such as OpenAI’s o1 or DeepSeek’s R1.
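The idea of learning from answers that can be checked can be sketched in a few lines. The example below is an illustration only and does not describe how OpenAI or DeepSeek train their systems. It assumes a toy "chatbot" that chooses among three hypothetical strategies for addition problems and uses a simple policy-gradient update (the gradient bandit method described in the Sutton and Barto textbook, here without a reward baseline) to raise its preference for whichever strategy produces verified answers.

```python
import math
import random


# Three hypothetical "approaches" to an addition problem; only one is right.
def strategy_subtract(a, b):
    return a - b


def strategy_off_by_one(a, b):
    return a + b + 1


def strategy_add(a, b):
    return a + b


STRATEGIES = [strategy_subtract, strategy_off_by_one, strategy_add]


def softmax(prefs):
    """Turn preference scores into probabilities that sum to 1."""
    top = max(prefs)
    exps = [math.exp(p - top) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def train(num_problems=2000, step_size=0.1, seed=0):
    rng = random.Random(seed)
    prefs = [0.0] * len(STRATEGIES)  # start with no preference
    for _ in range(num_problems):
        a, b = rng.randint(1, 99), rng.randint(1, 99)
        probs = softmax(prefs)
        choice = rng.choices(range(len(STRATEGIES)), weights=probs)[0]
        # Verifiable reward: 1 if the answer checks out, 0 otherwise.
        reward = 1.0 if STRATEGIES[choice](a, b) == a + b else 0.0
        # Policy-gradient step: rewarded choices become more likely.
        for i in range(len(prefs)):
            chosen = 1.0 if i == choice else 0.0
            prefs[i] += step_size * reward * (chosen - probs[i])
    return softmax(prefs)


probs = train()
print("probability of each strategy after training:", probs)
```

No one tells the program which strategy is correct. It only learns whether each final answer was right, which is the sense in which such systems learn on their own.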

Dr. Barto and Dr. Sutton say these systems hint at how machines might learn in the future. Eventually, they say, robots equipped with artificial intelligence will learn through trial and error in the real world, just as humans and animals do.

“Learning to control your body through reinforcement learning, that’s a very natural thing to do,” Dr. Barto said.
