OpenAI's Next AI Model Will Have 'Extreme' Reasoning Capabilities

OpenAI, the renowned artificial intelligence research company, is poised to unveil its latest AI model with "extreme" reasoning abilities, according to industry experts and the company's own statements. The development comes as OpenAI's existing AI coding tool, Codex, has seen a surge in usage: more than 1 million weekly active users, a figure that has tripled since the release of the latest version.

Thibault Sottiaux, the head of OpenAI's Codex product, has outlined the company's ambitions to expand Codex beyond coding, positioning it as a "standard agent" that can be deployed across enterprise applications, including by non-technical workers. Sottiaux emphasized that the core of Codex is focused on "instruction following, understanding large amounts of data, finding its own context, and navigating the world in order to make decisions."

The push toward more advanced reasoning is echoed by OpenAI investor Vinod Khosla, who predicts that AI will be capable of performing up to 80% of all jobs, from physicians to accountants. Khosla believes this AI-driven labor transformation will be so significant that today's 5-year-olds may never need to find a job, because the "need to work will go away."

Nvidia, a leading provider of AI hardware and software, is making strides in the same direction. The company has released Cosmos Reason 2, a state-of-the-art open reasoning vision-language model (VLM) that enables robots and AI agents to see, understand, plan, and act in the physical world much as humans do. Cosmos Reason 2 offers improved spatio-temporal understanding, flexible deployment options, and expanded visual perception. Nvidia is also advancing autonomous networks in the telecommunications industry, unveiling an open-source large telco model (LTM) and AI agent blueprints to help operators build intelligent, self-managing networks.
This effort is part of GSMA's new Open Telco AI initiative, which aims to provide open resources for the mobile communications industry. As AI models continue to evolve, the industry is witnessing a shift towards more advanced reasoning and autonomy, with OpenAI, Nvidia, and other players leading the charge. These developments have the potential to transform various industries, from healthcare to telecommunications, and could significantly impact the future of work and the global economy.
OpenAI CEO Sam Altman Defends Pentagon Deal Amid Backlash

In a move that has sparked significant controversy, OpenAI has struck a deal with the U.S. Department of War (DoW) to allow the company's AI models, including ChatGPT, to be deployed on the Pentagon's classified networks. The agreement came just hours after the Trump administration ordered federal agencies to cut ties with OpenAI's rival, Anthropic, labeling the company a "supply-chain risk to National Security."

The OpenAI-Pentagon agreement was announced on Friday by CEO Sam Altman, who took to social media over the weekend to defend the decision. Altman acknowledged that the "optics don't look good" but insisted that AI safety and the wide distribution of benefits are core to OpenAI's mission. He said the deal prohibits the use of OpenAI's technology for domestic mass surveillance and for autonomous weapons that can make strike decisions without human oversight, principles the DoW has agreed to and incorporated into the contract.

Altman's comments came amid vocal backlash from critics, including a campaign urging ChatGPT users to switch to Anthropic's Claude chatbot. There was evidence the campaign was having an effect: Claude briefly surpassed ChatGPT as the most downloaded free app on Apple's App Store, and graffiti outside OpenAI's San Francisco offices attacked the company's decision to work with the Pentagon.

The deal has highlighted the diverging approaches of OpenAI and Anthropic to engaging with the U.S. government on AI development. Anthropic CEO Dario Amodei had previously refused Pentagon demands to allow the company's AI to be used for "all lawful purposes," citing concerns about mass surveillance and autonomous weapons.
In contrast, OpenAI appears to have been more willing to work with the DoW, with Altman saying the company felt comfortable with the lack of restrictions because "so many safeguards were already built into its models." At the same time, Altman described the Trump administration's move to designate Anthropic a "supply-chain risk" and order federal agencies to stop using its technology as an "extremely scary precedent," saying he disagrees with the designation. As the debate over the appropriate use of AI in military and national security applications continues, the OpenAI-Pentagon deal and the government's actions against Anthropic have thrust these issues into the spotlight. The outcome of this clash could have significant implications for the future of AI development and its relationship with the U.S. government.
Distilling the tool-using capabilities of large language models (LLMs) into smaller, more efficient small language models (SLMs) is a key challenge for their practical application. The predominant approach, supervised fine-tuning (SFT), suffers from poor generalization as it trains models to imitate a static set of teacher trajectories rather than learn a robust methodology. While reinforcement learning (RL) offers an alternative, the standard RL using sparse rewards fails to effectively guide SLMs, causing them to struggle with inefficient exploration and adopt suboptimal strategies. To address these distinct challenges, we propose MENTOR, a framework that synergistically combines RL with teacher-guided distillation. Instead of simple imitation, MENTOR employs an RL-based process to learn a more generalizable policy through exploration. In addition, to solve the problem of reward sparsity, it uses a teacher's reference trajectory to construct a dense, composite teacher-guided reward that provides fine-grained guidance. Extensive experiments demonstrate that MENTOR significantly improves the cross-domain generalization and strategic competence of SLMs compared to both SFT and standard sparse-reward RL baselines.
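The core idea of the composite teacher-guided reward can be illustrated with a minimal sketch. The function below is hypothetical (the names `teacher_guided_reward` and its weighting scheme are illustrative assumptions, not MENTOR's actual formulation): it blends the sparse task-outcome reward with a dense term measuring how much of the teacher's reference trajectory the student reproduced, so the student receives gradient signal even when the final task fails.

```python
def teacher_guided_reward(student_actions, teacher_actions, task_success, alpha=0.5):
    """Composite reward = sparse outcome term + dense teacher-overlap term.

    Hypothetical illustration of a teacher-guided dense reward; the real
    MENTOR reward is more elaborate (e.g., finer-grained trajectory matching).
    """
    # Dense component: fraction of the teacher's tool calls the student
    # also issued (order-insensitive in this toy version).
    overlap = len(set(student_actions) & set(teacher_actions)) / max(len(teacher_actions), 1)
    # Sparse component: did the episode succeed at the task?
    sparse = 1.0 if task_success else 0.0
    # alpha trades off outcome reward against teacher guidance.
    return alpha * sparse + (1.0 - alpha) * overlap
```

Note how a student that matches half the teacher's tool calls but fails the task still earns a nonzero reward, which is exactly the dense guidance that plain sparse-reward RL lacks.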
Gradient-based data attribution methods, such as influence functions, are critical for understanding the impact of individual training samples without requiring repeated model retraining. However, their scalability is often limited by the high computational and memory costs of per-sample gradient computation. In this work, we propose GraSS, a novel gradient compression algorithm, along with its variant FactGraSS for linear layers specifically, that explicitly leverages the inherent sparsity of per-sample gradients to achieve sub-linear space and time complexity. Extensive experiments demonstrate the effectiveness of our approach, achieving substantial speedups while preserving data influence fidelity. In particular, FactGraSS achieves up to 165% faster throughput on billion-scale models compared to previous state-of-the-art baselines. Our code is publicly available at https://github.com/TRAIS-Lab/GraSS.
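The abstract's key observation, that per-sample gradients are sparse enough to store and compare in compressed form, can be sketched as follows. This is a toy stand-in for GraSS-style sparsification (the function names and the simple top-k rule are illustrative assumptions; the actual algorithm in the repository is considerably more sophisticated): keep only the k largest-magnitude gradient entries as (index, value) pairs, then compute influence-style inner products directly on the compressed representation.

```python
import numpy as np

def compress_sparse_grad(grad, k):
    """Keep the k largest-magnitude entries of a per-sample gradient,
    stored as (indices, values). Toy stand-in for GraSS-style compression."""
    flat = grad.ravel()
    # argpartition finds the top-k indices in O(n) without a full sort.
    idx = np.argpartition(np.abs(flat), -k)[-k:]
    return idx, flat[idx]

def sparse_dot(idx_a, val_a, idx_b, val_b):
    """Inner product of two compressed gradients, the basic primitive
    behind influence-function similarity scores."""
    a = dict(zip(idx_a.tolist(), val_a.tolist()))
    return sum(v * a.get(i, 0.0) for i, v in zip(idx_b.tolist(), val_b.tolist()))
```

Since each compressed gradient occupies O(k) memory rather than O(n), pairwise influence scores over a large training set become tractable, which is the scalability argument the abstract makes.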
Large language models (LLMs) have demonstrated promising performance in generating diagnostic conclusions from imaging findings, thereby supporting radiology reporting, trainee education, and quality control. However, systematic guidance on optimizing prompt design across different clinical contexts remains limited, and a comprehensive, standardized framework for assessing the trustworthiness of LLM-generated radiology reports has yet to be established. This study aims to enhance the trustworthiness of LLM-generated liver MRI reports by introducing a Multi-Dimensional Credibility Assessment (MDCA) framework and providing guidance on institution-specific prompt optimization. The proposed framework is applied to evaluate and compare the performance of several advanced LLMs, including Kimi-K2-Instruct-0905, Qwen3-235B-A22B-Instruct-2507, DeepSeek-V3, and ByteDance-Seed-OSS-36B-Instruct, using the SiliconFlow platform.