Author: Benj Edwards

AI

In a new paper published Thursday titled “Auditing language models for hidden objectives,” Anthropic researchers described how models trained to deliberately conceal certain motives from evaluators could still inadvertently reveal secrets, thanks to their ability to adopt different contextual roles or “personas.” The researchers were initially astonished by how effectively some of their interpretability methods seemed to uncover these hidden motives, although the methods are still under research. While the research involved models trained specifically to conceal motives from automated software evaluators called reward models (RMs), the broader purpose of studying hidden objectives is to prevent future scenarios where powerful…

Read More
AI

On Saturday, a developer using Cursor AI for a racing game project hit an unexpected roadblock when the programming assistant abruptly refused to continue generating code, instead offering some unsolicited career advice. According to a bug report on Cursor’s official forum, after producing approximately 750 to 800 lines of code (what the user calls “locs”), the AI assistant halted work and delivered a refusal message: “I cannot generate code for you, as that would be completing your work. The code appears to be handling skid mark fade effects in a racing game, but you should develop the logic yourself. This…

Read More
AI

Even when these AI search tools cited sources, they often directed users to syndicated versions of content on platforms like Yahoo News rather than original publisher sites. This occurred even in cases where publishers had formal licensing agreements with AI companies. URL fabrication emerged as another significant problem. More than half of citations from Google’s Gemini and Grok 3 led users to fabricated or broken URLs resulting in error pages. Of 200 citations tested from Grok 3, 154 resulted in broken links. These issues create significant tension for publishers, which face difficult choices. Blocking AI crawlers might lead to loss…

Read More
AI

Anthropic CEO Dario Amodei raised a few eyebrows on Monday after suggesting that advanced AI models might someday be provided with the ability to push a “button” to quit tasks they might find unpleasant. Amodei made the provocative remarks during an interview at the Council on Foreign Relations, acknowledging that the idea “sounds crazy.” “So this is—this is another one of those topics that’s going to make me sound completely insane,” Amodei said during the interview. “I think we should at least consider the question of, if we are building these systems and they do all kinds of things like…

Read More
AI

On the Frontier Math benchmark by EpochAI, o3 solved 25.2 percent of problems, while no other model has exceeded 2 percent—suggesting a leap in mathematical reasoning capabilities over the previous model. Benchmarks vs. real-world value Ideally, potential applications for a true PhD-level AI model would include analyzing medical research data, supporting climate modeling, and handling routine aspects of research work. The high price points reported by The Information, if accurate, suggest that OpenAI believes these systems could provide substantial value to businesses. The publication notes that SoftBank, an OpenAI investor, has committed to spending $3 billion on OpenAI’s agent products…

Read More
AI

Developers using the Responses API can access the same models that power ChatGPT Search: GPT-4o search and GPT-4o mini search. These models can browse the web to answer questions and cite sources in their responses. That’s notable because OpenAI says the added web search ability dramatically improves the factual accuracy of its AI models. On OpenAI’s SimpleQA benchmark, which aims to measure confabulation rate, GPT-4o search scored 90 percent, while GPT-4o mini search achieved 88 percent—both substantially outperforming the larger GPT-4.5 model without search, which scored 63 percent. Despite these improvements, the technology still has significant limitations. Aside from issues…

Read More