OpenAI CEO Sam Altman Warns AI Advancements Outpacing Preparedness

In a series of recent interviews and announcements, OpenAI CEO Sam Altman has sounded the alarm that the rapid progress of artificial intelligence is outpacing the world's ability to handle its implications. Altman believes artificial general intelligence (AGI) is "pretty close" and superintelligence "not that far off," warning that "the world is not prepared" for the capabilities OpenAI is already developing internally.

Altman revealed that the AI models OpenAI currently uses internally are accelerating the company's research and development faster than originally anticipated. He suggested these models go beyond what is publicly available, stating "we're going to have extremely capable models soon." This rapid advancement has left Altman feeling "stressed and anxious" about the future, as he believes the world is ill-equipped to handle the impending breakthroughs.

The financial realities of this AI arms race are also becoming clearer. OpenAI has revised its cash burn projections and now expects to spend a staggering $665 billion training and operating AI models through 2030, a $111 billion increase from previous estimates. While revenue is climbing, it cannot keep pace with the skyrocketing costs, pushing OpenAI's projected profitability out to 2030. The spending surge is driven by the compute-intensive nature of the latest AI systems.

OpenAI's new coding model, Codex 5.3, was co-developed by the model itself, illustrating the self-improving capabilities of these advanced systems. Altman noted that traditional software development skills are becoming "effectively completely irrelevant" as AI automates more tasks.

Meanwhile, competing AI labs such as DeepSeek are working to develop more transparent models that can explain their reasoning. DeepSeek-R1, for example, outperforms OpenAI's GPT-4 on several math and problem-solving benchmarks while displaying its step-by-step thought process, an approach intended to make models more accountable and less vulnerable to misuse. Additionally, OpenAI is expanding into crypto security, launching EVMbench to test whether AI can help identify and fix vulnerabilities in Ethereum smart contracts. As decentralized finance continues to grow, securing these self-executing contracts is crucial.

Overall, Altman's warnings underscore the breakneck pace of AI advancement and the urgent need to prepare for its profound societal, economic, and security implications. As OpenAI and other leading labs race to develop ever more capable AI systems, ensuring these technologies are used responsibly and for the benefit of humanity remains a critical priority.
Anthropic's Claude Code: A Highly Agentic Coding Assistant

Anthropic's Claude Code is making waves in the AI coding assistant space with its ability to autonomously generate, execute, and improve code with minimal human input. According to Jaana Dogan, a senior Google engineer, Claude Code was able to produce a working system in just one hour that matched what her team had been developing for over a year.

The key to Claude Code's capabilities lies in its "agentic" nature: it can plan, execute, and refine code, rather than just providing occasional assistance or code completion. This allows developers to run multiple instances of Claude Code in parallel, coordinating different parts of a codebase. However, Anthropic emphasizes that there are best practices for leveraging this effectively, which are covered in a new course on The Batch by Andrew Ng. The course delves into Claude Code's underlying architecture, including the tools it uses to navigate codebases and maintain memory across sessions. It also explores practical applications, such as enhancing a RAG chatbot, analyzing ecommerce data, and building a web app from a Figma mockup.

Beyond the core coding capabilities, Anthropic has recently added new desktop features to Claude Code. These include the ability to spin up development servers, display running web apps, spot and fix errors automatically, and even handle code reviews and merge pull requests on GitHub projects. Notably, Anthropic has also introduced Claude Code Security, a research preview aimed at security teams and open-source maintainers. This tool uses the latest Claude Opus 4.6 model to identify complex vulnerabilities that traditional static analysis tools often miss, and suggests targeted patches for human review.

While reactions to large language models like Claude Code have ranged from "fanboy" enthusiasm to "curmudgeon" skepticism, Ellie Pavlick of Brown University suggests that "it's OK to not know" the full extent of their capabilities or intelligence. Nonetheless, the rapid advancements in Claude Code's functionality are undeniable, and Anthropic is positioning it as a powerful tool to streamline and enhance the software development workflow.
Distilling the tool-using capabilities of large language models (LLMs) into smaller, more efficient small language models (SLMs) is a key challenge for their practical application. The predominant approach, supervised fine-tuning (SFT), suffers from poor generalization because it trains models to imitate a static set of teacher trajectories rather than learn a robust methodology. While reinforcement learning (RL) offers an alternative, standard RL with sparse rewards fails to guide SLMs effectively: they explore inefficiently and settle on suboptimal strategies. To address these distinct challenges, we propose MENTOR, a framework that synergistically combines RL with teacher-guided distillation. Instead of relying on simple imitation, MENTOR employs an RL-based process to learn a more generalizable policy through exploration. To overcome reward sparsity, it uses the teacher's reference trajectory to construct a dense, composite teacher-guided reward that provides fine-grained guidance. Extensive experiments demonstrate that MENTOR significantly improves the cross-domain generalization and strategic competence of SLMs compared to both SFT and standard sparse-reward RL baselines.
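The abstract does not spell out the reward formulation. Purely as an illustration, the Python sketch below shows one way a dense, composite teacher-guided reward could be assembled: a sparse task-outcome signal blended with a dense term measuring how closely the student's tool-call trajectory tracks the teacher's reference trajectory. The function names, the step-matching heuristic, and the weight alpha are hypothetical, not details from the paper.

```python
from typing import Dict, List


def step_match(student_step: Dict, teacher_step: Dict) -> float:
    """Crude per-step similarity: requires calling the same tool as the
    teacher, with partial credit for overlapping argument names."""
    if student_step["tool"] != teacher_step["tool"]:
        return 0.0
    s_args, t_args = set(student_step["args"]), set(teacher_step["args"])
    overlap = len(s_args & t_args) / max(len(t_args), 1)
    return 0.5 + 0.5 * overlap


def composite_reward(student_traj: List[Dict],
                     teacher_traj: List[Dict],
                     task_success: bool,
                     alpha: float = 0.5) -> float:
    """Blend the sparse outcome reward with dense teacher-guided shaping."""
    sparse = 1.0 if task_success else 0.0
    n = min(len(student_traj), len(teacher_traj))
    dense = 0.0
    if teacher_traj:
        dense = sum(step_match(student_traj[i], teacher_traj[i])
                    for i in range(n)) / len(teacher_traj)
    return alpha * sparse + (1.0 - alpha) * dense


# Example: the student reproduces the first teacher step but fails the task,
# so it still receives dense guidance credit instead of a zero reward.
teacher = [{"tool": "search", "args": {"query": "liver enzymes"}},
           {"tool": "calculator", "args": {"expr": "34*2"}}]
student = [{"tool": "search", "args": {"query": "alt levels"}}]
print(composite_reward(student, teacher, task_success=False))  # 0.25
```

The point of the dense term is that partial progress along the teacher's trajectory yields gradient signal even when the sparse outcome reward is zero, which is the failure mode the abstract attributes to standard sparse-reward RL.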
Gradient-based data attribution methods, such as influence functions, are critical for understanding the impact of individual training samples without requiring repeated model retraining. However, their scalability is often limited by the high computational and memory costs of per-sample gradient computation. In this work, we propose GraSS, a novel gradient compression algorithm, and its variant FactGraSS, specialized for linear layers, both of which explicitly leverage the inherent sparsity of per-sample gradients to achieve sub-linear space and time complexity. Extensive experiments demonstrate the effectiveness of our approach, achieving substantial speedups while preserving data influence fidelity. In particular, FactGraSS achieves up to 165% faster throughput on billion-scale models compared to previous state-of-the-art baselines. Our code is publicly available at https://github.com/TRAIS-Lab/GraSS.
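The exact GraSS and FactGraSS constructions are not reproduced here. As a minimal sketch of the general sparsify-then-project idea the abstract describes, the following assumes top-k sparsification of each per-sample gradient followed by a coordinate-wise pseudo-random projection; all function names and default sizes are illustrative.

```python
import numpy as np


def _proj_row(coord: int, proj_dim: int, seed: int = 0) -> np.ndarray:
    """Deterministic pseudo-random projection row for a coordinate, so every
    sample maps the same coordinate to the same direction without ever
    materializing a full (d x proj_dim) projection matrix."""
    rng = np.random.default_rng((seed, coord))
    return rng.standard_normal(proj_dim) / np.sqrt(proj_dim)


def compress_grad(grad: np.ndarray, k: int = 256, proj_dim: int = 64) -> np.ndarray:
    """Compress one per-sample gradient by (1) keeping its k largest-magnitude
    coordinates, exploiting per-sample sparsity, and (2) projecting the
    survivors into proj_dim dimensions, so inner products between compressed
    gradients approximate inner products between the originals."""
    flat = grad.ravel()
    k = min(k, flat.size)
    top_idx = np.argpartition(np.abs(flat), flat.size - k)[-k:]
    sketch = np.zeros(proj_dim)
    for i in top_idx:
        sketch += flat[i] * _proj_row(int(i), proj_dim)
    return sketch


# Influence-style similarity between two samples' gradients, in compressed space.
g1, g2 = np.random.randn(10_000), np.random.randn(10_000)
print(float(compress_grad(g1) @ compress_grad(g2)))
```

Because only the k retained coordinates are touched and the projection rows are regenerated on demand, per-sample cost scales with k and proj_dim rather than with the full parameter count, which is the sub-linear behavior the abstract claims for the real algorithm.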
Large language models (LLMs) have demonstrated promising performance in generating diagnostic conclusions from imaging findings, thereby supporting radiology reporting, trainee education, and quality control. However, systematic guidance on how to optimize prompt design across different clinical contexts is still lacking. Moreover, a comprehensive and standardized framework for assessing the trustworthiness of LLM-generated radiology reports has yet to be established. This study aims to enhance the trustworthiness of LLM-generated liver MRI reports by introducing a Multi-Dimensional Credibility Assessment (MDCA) framework and providing guidance on institution-specific prompt optimization. The proposed framework is applied to evaluate and compare the performance of several advanced LLMs, including Kimi-K2-Instruct-0905, Qwen3-235B-A22B-Instruct-2507, DeepSeek-V3, and ByteDance-Seed-OSS-36B-Instruct, using the SiliconFlow platform.
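The study's prompt templates and the MDCA scoring rubric are not given here. Purely as a sketch of the evaluation setup, the snippet below assumes SiliconFlow exposes an OpenAI-compatible chat endpoint; the base URL, model identifier string, prompt wording, and credibility dimensions are illustrative assumptions, not details taken from the study.

```python
from openai import OpenAI

# Assumed OpenAI-compatible endpoint; replace the key and model ID as needed.
client = OpenAI(base_url="https://api.siliconflow.cn/v1", api_key="YOUR_API_KEY")

# Hypothetical institution-specific prompt template (not the paper's).
INSTITUTION_PROMPT = (
    "You are a liver MRI reporting assistant for our institution. "
    "Given the imaging findings below, write a concise diagnostic impression "
    "that follows our reporting conventions.\n\nFindings:\n{findings}"
)


def generate_impression(findings: str, model: str = "deepseek-ai/DeepSeek-V3") -> str:
    """Query one candidate model with the institution-specific prompt."""
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "user",
                   "content": INSTITUTION_PROMPT.format(findings=findings)}],
        temperature=0.2,
    )
    return resp.choices[0].message.content


# Placeholder record for multi-dimensional scoring of one generated report;
# the dimension names are hypothetical stand-ins for the MDCA criteria.
mdca_scores = {
    "factual_consistency": None,
    "completeness": None,
    "hallucination_free": None,
    "guideline_adherence": None,
}
```

Swapping the model argument across the candidate models listed in the abstract, while holding the prompt fixed per institution, would reproduce the kind of controlled comparison the study describes.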