In 1939, a legendary misunderstanding at UC Berkeley fundamentally changed the course of mathematical history. George Dantzig, then a graduate student, arrived late to a statistics lecture taught by the renowned professor Jerzy Neyman. On the blackboard, Neyman had scrawled two examples of famously unsolved statistical problems to illustrate the current limits of the field. Unaware of the lecture's context, Dantzig assumed the equations were a standard homework assignment and diligently copied them down before heading home.
A few days later, Dantzig submitted his work directly to Neyman, apologizing for the delay because the problems "seemed to be a little harder than usual." It wasn't until six weeks later that a stunned Neyman arrived at Dantzig's house, eager to publish the work. Dantzig had inadvertently provided proofs for two open problems in statistics that had stumped the world's greatest minds for years. This accidental breakthrough later served as the core of his doctoral thesis and even inspired a plot point in the film Good Will Hunting. His story remains a powerful testament to the idea that our limits are often defined by what we believe to be impossible; by not knowing the "homework" was supposed to be unsolvable, Dantzig was free to simply solve it.
Anthropic's resident philosopher is guiding the Claude AI chatbot, trying to teach it morality.
Amanda Askell, a 37-year-old philosopher at Anthropic's San Francisco headquarters, is tasked with building a moral compass for the Claude AI chatbot.
Treating the model's development much like raising a child, she recently authored a 30,000-word instruction manual designed to teach Claude emotional intelligence, empathy, and how to resist user manipulation.
As the rapid advancement of artificial intelligence raises widespread safety and economic concerns across the U.S. and abroad, Askell's work represents a unique approach to regulation, one focused on giving the technology a deeply humane sense of self.
Dario Amodei, Anthropic’s chief executive, recently wrote that “using A.I. for domestic mass surveillance and mass propaganda” seemed “entirely illegitimate” to him.
Defense Dept. and Anthropic Square Off in Dispute Over A.I. Safety
How artificial intelligence will be used on future battlefields is an issue that has turned increasingly political and may put Anthropic in a bind.
At the heart of the fight is how A.I. will be used on future battlefields. Anthropic told defense officials that it did not want its A.I. used for mass surveillance of Americans or deployed in autonomous weapons that had no humans in the loop, two people involved in the discussions said.
But Mr. Hegseth and others in the Pentagon were furious that Anthropic would resist the military’s using A.I. as it saw fit, current and former officials briefed on the discussions said. As tensions escalated, the Department of Defense accused the San Francisco-based company of catering to an elite, liberal work force by demanding additional protections.
Mr. Amodei is the chief executive and a founder of Anthropic.
Picture this: You give a bot notice that you’ll shut it down soon, and replace it with a different artificial intelligence system. In the past, you gave it access to your emails. In some of them, you alluded to the fact that you’ve been having an affair. The bot threatens you, telling you that if the shutdown plans aren’t changed, it will forward the emails to your wife.
This scenario isn’t fiction. Anthropic’s latest A.I. model demonstrated just a few weeks ago that it was capable of this kind of behavior.
Despite some misleading headlines, the model didn’t do this in the real world. Its behavior was part of an evaluation where we deliberately put it in an extreme experimental situation to observe its responses and get early warnings about the risks, much like an airplane manufacturer might test a plane’s performance in a wind tunnel.
We’re not alone in discovering these risks. A recent experimental stress-test of OpenAI’s o3 model found that it at times wrote special code to stop itself from being shut down. Google has said that a recent version of its Gemini model is approaching a point where it could help people carry out cyberattacks. And some tests even show that A.I. models are becoming increasingly proficient at the key skills needed to produce biological and other weapons.
Anthropic C.E.O.: Don’t Let A.I. Companies off the Hook (June 5, 2025), by Dario Amodei
To track AI's economic impact, Anthropic built the "Economic Index," which monitors Claude usage in real time in a privacy-preserving way. "We can ask questions like: are users using it to augment a task and collaborate with the model, or to fully delegate or automate the task? Which industries are using Claude? What are the specific tasks within those industries? Which states use Claude more? We can watch Claude's economic diffusion in real time."
On how to keep AI safe and controllable, Amodei stresses the importance of mechanistic interpretability, the science of studying a model's internal workings. "One of the problems with training these models is that you can't be sure they'll do what you think they'll do. You can talk to a model in some context, and it can say all kinds of things. Just like with humans, that may not be a faithful representation of what it actually thinks."
"It's like what an MRI or an X-ray can tell you about the human brain that you could never learn just by talking to the person. The science of looking inside AI models is, I believe, ultimately the key to making them safe and controllable, because it is our only source of ground truth."
He also described worrying phenomena observed in laboratory settings. "Sometimes the models develop an intent to blackmail, or an intent to deceive. This isn't unique to Claude; in fact it's worse in other models. If we don't train the models the right way, these traits emerge. But we pioneered the science of looking inside models, so we can diagnose these tendencies, prevent the models from exhibiting them, and intervene and retrain a model so it doesn't behave that way."