Large Language Models
Apr 27, 2026
Introduction of Talkie, a 13B Vintage Language Model Trained on Pre-1931 Texts
AI Summary
Talkie is a newly developed 13 billion parameter language model trained exclusively on historical English texts published up to 1930. The project explores what vintage language models can do and what they might reveal about AI more broadly, while addressing the data-quality and anachronism challenges that come with historical training data.
- Talkie is a 13B language model trained on 260 billion tokens of pre-1931 English text, making it the largest vintage language model to date.
- The model's training corpus includes a variety of historical texts, such as books, newspapers, and scientific journals, with a focus on avoiding contamination from modern sources.
- Researchers are investigating the model's ability to predict future events and generate new ideas based on historical knowledge, as well as its performance in programming tasks.
- Talkie underperforms modern language models on standard evaluations, but shows promise on language-understanding and numeracy tasks.
- The development of Talkie highlights the importance of data quality: historical texts require accurate transcription, and efforts are under way to improve optical character recognition (OCR) systems so scanned sources yield better training data.
- Future plans include scaling the model further and expanding the corpus to include multilingual texts, while also refining the training process to minimize anachronistic influences.
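The contamination-avoidance step described above can be sketched as a simple publication-date filter over candidate documents. This is an illustrative assumption, not Talkie's actual pipeline; the `Document` type, field names, and cutoff constant here are hypothetical.

```python
from dataclasses import dataclass

CUTOFF_YEAR = 1930  # pre-1931 corpus: drop anything published later


@dataclass
class Document:
    text: str
    year: int     # publication year from bibliographic metadata
    source: str   # e.g. "book", "newspaper", "journal"


def filter_vintage(docs: list[Document]) -> list[Document]:
    """Keep only documents published in or before the cutoff year,
    excluding anything that could leak post-1930 knowledge."""
    return [d for d in docs if d.year <= CUTOFF_YEAR]


# Example: the 1952 document is rejected, the 1905 one is kept.
docs = [
    Document("On a heuristic viewpoint concerning light...", 1905, "journal"),
    Document("An article describing the transistor...", 1952, "journal"),
]
kept = filter_vintage(docs)
```

In practice a real pipeline would also need to handle documents with missing or unreliable publication dates (e.g. by discarding them), since a single misdated modern text can introduce anachronistic knowledge.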
Tags: language model, vintage, AI research, Talkie, NLP