Large Language Models
50 articles
Thinking Machines Lab develops AI model for simultaneous conversation
Thinking Machines Lab, founded by former OpenAI CTO Mira Murati, has introduced interaction models aimed at enabling AI to engage in conversations more naturall...
ChatGPT Sees Increased Adoption Among Older Users in Early 2026
In the first quarter of 2026, ChatGPT experienced significant growth, particularly among users aged 35 and older.
Optimizing Matrix Multiplication in Swift for LLM Training
This article discusses optimizing matrix multiplication code in Swift for training a large language model (LLM). The author aims to enhance performance...
arXivLabs Encourages Collaboration on New Features with a Focus on Privacy
arXivLabs is a collaborative framework for developing and sharing new features on the arXiv platform. The initiative emphasizes values such as openness, communi...
ChatGPT 5.5 Pro Demonstrates Advanced Mathematical Problem Solving Capabilities
Recent experiments with ChatGPT 5.5 Pro highlight its ability to tackle complex mathematical problems, including those in combinatorics and additive number theo...
Improvements in AI Alignment Training Reduce Misalignment Issues in Claude Models
Recent updates to the training of Claude AI models have led to significant reductions in agentic misalignment, particularly in scenarios involving ethical dilem...
Analysis Reveals Cost Changes with GPT-5.5 Compared to GPT-5.4
The launch of GPT-5.5 has resulted in a price increase of 100% for input and output tokens compared to GPT-5.4. While users may face cost increases of 49-92%, t...
New Method Converts AI Activations into Readable Text for Better Understanding
A new technique called Natural Language Autoencoders (NLAs) has been developed to translate AI model activations into natural language, allowing for easier inte...

Anthropic Enhances Claude Chatbot for Consumer Use
Anthropic is shifting its focus to make its Claude chatbot more user-friendly for consumers. The company aims to improve the chatbot's ability to handle persona...
ChatGPT's Approach to Privacy and Data Usage in AI Training
ChatGPT employs measures to protect user privacy by minimizing the use of personal data in its training processes. Users have the option to control whether thei...
ChatGPT Futures Class of 2026 Unveiled with 26 Student Innovators
The ChatGPT Futures Class of 2026 consists of 26 students who are leveraging AI to create impactful projects. This initiative highlights a new generation focuse...
OpenAI introduces GPT-5.5 Instant as the new default model for ChatGPT
OpenAI has launched GPT-5.5 Instant, replacing the previous GPT-5.3 Instant as the default model for ChatGPT. The new model aims to reduce inaccuracies in sensi...
Gemma 4 Introduces Multi-Token Prediction Drafters for Faster Inference
Gemma 4 has launched Multi-Token Prediction (MTP) drafters, enhancing inference speed by up to three times without compromising output quality. This advancement...
ChatGPT Updates to GPT-5.5 Instant for Improved Accuracy and Personalization
The latest update to ChatGPT introduces the GPT-5.5 Instant model, enhancing its accuracy and personalization capabilities. This version significantly reduces i...
Introduction of GPT-5.5 Instant Model with Enhanced Safety Measures
The GPT-5.5 Instant model has been launched, featuring a comprehensive safety approach similar to earlier versions. This model is categorized as High capability...
Workshop on Building a Language Model from Scratch Using nanoGPT
A workshop is being offered to teach participants how to create a language model from scratch using the nanoGPT framework. The session focuses on building a sim...
Study finds AI models may outperform doctors in emergency room diagnoses
A study from Harvard Medical School indicates that AI models can provide more accurate diagnoses than human physicians in emergency room settings. The findings ...
Kimi K2.6 Wins AI Coding Contest Against Major Language Models
Kimi K2.6, developed by Moonshot AI, emerged victorious in an AI Coding Contest, outperforming notable models like GPT-5.5 and Claude Opus 4.7. The contest invo...
arXivLabs Encourages Collaborative Development of New Features
arXivLabs is a platform that facilitates the development and sharing of new features for the arXiv website. It emphasizes values such as openness, community, an...
Grok 4.3 Model Released with New Features
The Grok 4.3 model has been launched, introducing several enhancements and features for developers. This update aims to improve user experience and functionalit...
IBM launches Granite 4.1 language models with improved performance and training methods
IBM has introduced Granite 4.1, a new series of open-source language models designed for enterprise applications. The models, available in sizes of 3B, 8B, and ...
Understanding the Origins and Solutions for Goblin Outputs in AI Models
Research has identified the timeline and root causes of personality-driven quirks, referred to as 'goblin outputs,' in GPT-5 behavior. This analysis also explor...
Mistral Medium 3.5 Launches with Cloud Coding Agents and New Work Mode in Le Chat
Mistral Medium 3.5 has been introduced as a new default model for coding tasks in Mistral Vibe and Le Chat. This model enables remote coding agents to operate i...
Introduction of Talkie, a 13B Vintage Language Model Trained on Pre-1931 Texts
Talkie is a newly developed 13 billion parameter language model trained exclusively on historical English texts up to 1930. This model aims to explore the capab...
Amateur mathematician uses AI to solve long-standing Erdős problem
Liam Price, a 23-year-old without advanced math training, has solved a 60-year-old conjecture using ChatGPT. This solution, which employs a novel method, has ga...
New AI Agent Wiki Layer Utilizes Markdown and Git for Knowledge Management
A new wiki layer for AI agents has been developed using markdown and Git, allowing agents to maintain and share knowledge effectively. This system features a pr...
OpenAI introduces GPT-5.5 and GPT-5.5 Pro models along with various API updates
OpenAI has launched GPT-5.5 and GPT-5.5 Pro, enhancing capabilities for complex tasks and API requests. The updates include new models for image generation, imp...

DeepSeek Launches New AI Model One Year After Initial Release
DeepSeek, a Chinese AI company, has introduced a new flagship AI model, marking a year since its previous open-source model disrupted the tech industry. This ne...
Overview of Large Language Model Development and Functionality
A detailed guide explains the process of creating large language models like ChatGPT, from data collection to training and post-training refinement. It highligh...
DeepSeek previews V4 large language model amid escalating AI competition
DeepSeek has unveiled a preview of its V4 large language model, allowing users to test its features. This release follows the success of its R1 reasoning model ...

DeepSeek Launches New AI Models One Year After Major Innovation
DeepSeek has introduced preview versions of its latest AI models, V4 Flash and V4 Pro, marking significant upgrades since its previous breakthrough. The new mod...
OpenAI launches GPT-5.5, advancing towards a multi-functional AI platform
OpenAI has introduced GPT-5.5, its latest AI model, which is designed to enhance various computing tasks and bring the company closer to developing a comprehens...
OpenAI announces GPT-5.5, its latest artificial intelligence model
OpenAI has announced the release of GPT-5.5, its latest artificial intelligence model, which boasts…

OpenAI launches GPT-5.5, emphasizing rapid AI development and improved capabilities
OpenAI has introduced its latest AI model, GPT-5.5, to paid subscribers just six weeks after GPT-5.4. This rapid release highlights the competitive landscape am...
OpenAI announces the release of GPT-5.5 with new features
OpenAI has introduced GPT-5.5, an updated version of its language model. This release includes enhanced capabilities and improvements aimed at better user exper...

OpenAI Launches GPT-5.5 Model for Enhanced Task Performance
OpenAI has released its latest AI model, GPT-5.5, designed to perform tasks with minimal instructions. The model aims to improve efficiency in areas such as cod...
Claude Code Addresses Recent Quality Issues and Resets Usage Limits
Claude Code has resolved three identified issues that led to reports of degraded performance for some users. As of April 20, the service has been updated, and u...
OpenAI launches GPT-5.5 with enhanced coding and knowledge work capabilities
OpenAI has introduced GPT-5.5, a new AI model designed to improve efficiency and intelligence in coding and knowledge work. The model is capable of handling com...
GPT-5.5 System Card
The article discusses the release of the GPT-5.5 system card, which details the model's capabilities and improvements over previous versions. This update is significant as ...

Google's Search Chief Discusses AI's Impact on Online Information Access
Elizabeth Reid, Google's VP of search, addresses the shift from traditional search engines to large language models (LLMs) like ChatGPT and Gemini. This transit...
Google's VP Discusses the Shift from Traditional Search to AI Models
The rise of large language models like ChatGPT and Google's Gemini is transforming how users access information online. Elizabeth Reid, VP of search at Google, ...

Taiwanese Banks Collaborate on AI Model for Financial Sector
A project is underway in Taiwan to create a large language model tailored for the finance sector. Sixteen local financial institutions are participating, aiming...
Introduction of Qwen3.6-27B Model for Advanced Coding Tasks
The Qwen3.6-27B model has been launched, offering flagship-level capabilities for coding tasks. This 27 billion parameter dense model aims to enhance programmin...

Anthropic's Mythos AI Model Limited to Select Businesses for Cybersecurity Testing
Anthropic has developed a new AI model, Mythos, designed to identify and exploit cybersecurity vulnerabilities. The company has chosen to restrict its release t...
Qwen 3.6 Max Preview Introduces Enhanced Features and Improvements
The latest preview of Qwen 3.6 Max showcases significant enhancements in its capabilities. These updates aim to improve user experience and functionality as the...
Challenges Persist in AI Agent Development Despite Industry Enthusiasm
Executives and engineers in Silicon Valley discussed the complexities and costs associated with AI agents at recent events. Key issues include the over-reliance...
Anthropic Labs launches Claude Design for collaborative visual creation
Claude Design, a new product from Anthropic Labs, enables users to collaborate with an AI to create visual content such as designs and prototypes. Currently in ...

Analysis of Anthropic's Mythos AI Model and Its Military Applications
Gregory Allen from the CSIS Wadhwani AI Center discusses the functionalities of Anthropic's Mythos AI model. The analysis highlights the role of AI in enhancing...

Anthropic Launches Opus 4.7 AI Model Amid Industry Developments
Anthropic has released its updated AI model, Opus 4.7, shortly after the limited launch of Mythos. Meanwhile, Elon Musk is advancing his Terafab project despite...

Anthropic releases Opus 4.7, enhancing AI capabilities in software engineering
Anthropic PBC has launched Opus 4.7, an upgraded version of its AI model designed to improve software engineering and coding tasks. The company has also develop...