Frequently Asked Questions
Latam-GPT is a comprehensive technological public good upon which various actors can build specific AI solutions, democratizing access to key tools for competitiveness and regional development. While one of its central outcomes is the development of an open large language model (LLM), the project also encompasses the generation of enabling and complementary capabilities: the training and coordination of regional talent, the creation of relevant and representative data corpora, the design of proprietary benchmarks and evaluations, as well as the development of shared infrastructure and technical knowledge. In this sense, Latam-GPT should be understood as a strategic and collaborative project that brings together multiple initiatives aimed at strengthening the Artificial Intelligence (AI) ecosystem in Latin America and the Caribbean.
At this stage, Latam-GPT 70B 1.0 is released as a codebase, datasets, and trained model weights for developers to adapt to specific uses. Latam-GPT is not yet available as an interactive conversational chatbot for mass use on regular computers or mobile phones.
Copuchat is an experimental application hosted on latamgpt.org, based on GPT-4.1 by OpenAI, designed to simulate conversations that real users might have with future versions of Latam-GPT. Its purpose is to collect real interactions from people in Latin America and the Caribbean to better understand how they use this type of technology and, in doing so, support the alignment and post-training processes of the model.
The development of Latam-GPT rests on three fundamental pillars that current commercial models do not fully address in the context of Latin America and the Caribbean, with the goal of ensuring the relevance, representativeness, and technological sovereignty of Artificial Intelligence in the region.
1. Development of local capabilities. For AI to truly serve people, it is essential to understand how it works internally and not just use tools developed by third parties. This project enables regional talent to acquire deep technical experience, with the goal of leading innovation processes rather than being limited to implementing external technologies.
2. Addressing the regional representation gap. Latam-GPT performs better on tasks related to topics from Latin America and the Caribbean. Currently, global models are trained primarily with data from the Global North, where Spanish represents only about 4% of the data, and Portuguese between 2% and 3%. Latam-GPT seeks to reduce this inequality by integrating data that reflects the culture, languages, and identity of Latin America and the Caribbean.
3. Technological sovereignty. Latam-GPT proposes an open alternative to the dominance of large technology companies, demonstrating that the region has the capacity and autonomy needed to develop advanced Artificial Intelligence projects.
A first advantage of Latam-GPT is that, unlike models of similar size, it shows better performance on tasks that require knowledge of the cultural context of Latin America and the Caribbean.
The second is its openness: any organization can take the model and fine-tune ("educate") it with its own manuals or regulations, which in turn gives strategic sectors greater control over information security.
The third is its transparency: unlike closed models that withhold key information about data and training, Latam-GPT champions openness and clarity, strengthening trust, technical scrutiny, and regional collaboration.
Latam-GPT was trained with a significantly higher proportion of data about the region than any model to date, using continued pre-training (CPT) to add regional knowledge to the base model Llama 3.1 70B.
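Conceptually, continued pre-training means a base model's parameters keep being updated on new, domain-specific text rather than training restarting from scratch. The toy sketch below illustrates the idea with a count-based bigram model in plain Python (this is a pedagogical analogy, not the project's actual Llama-based pipeline): a "base" corpus establishes initial statistics, and a regional corpus then shifts them without discarding what was already learned.

```python
from collections import defaultdict

class BigramLM:
    """Toy count-based bigram language model used to illustrate
    continued pre-training (CPT): training does not restart, it
    keeps accumulating statistics from new corpora."""

    def __init__(self):
        self.counts = defaultdict(lambda: defaultdict(int))

    def train(self, corpus):
        # Each call adds to the existing counts instead of resetting them.
        for sentence in corpus:
            tokens = sentence.lower().split()
            for prev, nxt in zip(tokens, tokens[1:]):
                self.counts[prev][nxt] += 1

    def prob(self, prev, nxt):
        total = sum(self.counts[prev].values())
        return self.counts[prev][nxt] / total if total else 0.0

# "Base" pre-training on generic text (invented example sentences).
lm = BigramLM()
lm.train(["the model answers general questions"] * 8)
before = lm.prob("the", "model")

# Continued pre-training on region-focused text shifts the statistics
# toward regional usage while keeping the earlier counts.
lm.train(["the andes cross several countries", "the amazon basin is vast"] * 8)
after = lm.prob("the", "model")
print(before, after)  # prints 1.0 0.3333333333333333
```

The same principle scales up to a 70B-parameter transformer: gradient updates on the regional corpus continue from the base checkpoint rather than replacing it.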
In its initial stage (version 1.0), Latam-GPT is not directly comparable to commercial models backed by large-scale investments. While this first version performs below others on some benchmarks, its comparative results are meaningful evidence: they demonstrate the technical and infrastructure-management capabilities built along the way, laying the groundwork for future versions to approach the most advanced models while maintaining the best performance in the Latin American and Caribbean context.
The analysis of instruction-tuned versions will be addressed in later stages of the project.
Strategic Collaborating Entities
- Amazon Web Services (AWS)
- Banco de Desarrollo de América Latina y el Caribe (CAF)
- Inter-American Development Bank (IDB)
- Ministry of Science, Innovation, Technology and Telecommunications of Costa Rica
- Ministry of Science, Technology, Knowledge and Innovation of Chile
- Organization of American States (OAS)
- Ministry of Science, Technology and Innovation of Brazil
- Government Office of Information and Communication Technologies (OGTIC) of the Dominican Republic
- Presidency of the Council of Ministers of Peru
- Secretariat of Science, Humanities, Technology and Innovation of Mexico
- National Secretariat of Science, Technology and Innovation (SENACYT) of Panama
Signatory Institutions
- Academia de la Lengua Chilena, Chile
- Agency for E-Government and Information and Knowledge Society (AGESIC), Uruguay
- ARTIFICYAN, Chile
- Mexican Association of the Information Technology Industry (AMITI), Mexico
- Bibliotecas UC, Chile
- Centro de Investigación en Ciencias de Información Geoespacial (CentroGEO), Mexico
- Centro de Investigación e Innovación en Tecnologías de la Información y Comunicación (INFOTEC), Mexico
- Economic Commission for Latin America and the Caribbean (CEPAL)
- Council of Rectors of Chilean Universities (CRUCH), Chile
- Corporación Universitaria Minuto de Dios, Colombia
- Data Observatory, Chile
- DatySoc, Uruguay
- Department of Computer Science (DCC), Chile
- Department of Philosophy, UChile, Chile
- Institute of Technology and Engineering, UNAHUR, Argentina
- Duoc UC, Chile
- Chilean Army, Chile
- National Polytechnic School of Artificial Intelligence, Ecuador
- Faculty of Mathematics, Astronomy, Physics and Computing (FAMAF) – Universidad Nacional de Córdoba, Argentina
- Fundación Vía Libre, Argentina
- FUNDAR, Argentina
- IAEN, Ecuador
- National Institute of Astrophysics, Optics and Electronics (INAOE), Mexico
- JhedAI, Chile
- LabEVA, Faculty of Information and Communication, Universidad de la República, Uruguay
- Open Laboratory of Artificial Intelligence (LAIA)
- Perú AiMaraLab, Peru
- Red Divulga Ciencia, Ecuador
- SOMOSNLP, Spain
- Tabuga, Dominican Republic
- Theodora, Chile
- Universidad Avellaneda, Argentina
- Universidad Central de Venezuela, Venezuela
- Universidad Continental, Peru
- Universidad de Costa Rica, Costa Rica
- Universidad de los Andes de Colombia, Colombia
- Universidad Espíritu Santo, Colombia
- Universidad Gabriela Mistral, Chile
- Universidad Javeriana, Universidad La Salle, Colombia
- Universidad Nacional de San Martín (UNSAM), Argentina
- Universidad Ricardo Palma, Peru
- Universidad Tecnológica de Panamá, Panama
- Wikimedia Chile, Chile
Latam-GPT is an unprecedented collaborative effort bringing together nearly 200 professionals and more than 65 institutions from 15 countries (13 from Latin America and the Caribbean and 2 external to the region), reflecting the magnitude and regional character of the project. This coordination demonstrates that the development of Artificial Intelligence in Latin America and the Caribbean is possible through collaborative work, and also shows that collaborations of this scale between academia, the public sector, and specialized organizations are achievable.
The project is coordinated by CENIA and is made possible thanks to the collaboration of multiple institutions in the region, listed above as strategic collaborating entities and signatory institutions.
The development uses Llama 3.1 (70 billion parameters) as its base architecture and also conducts experiments with more compact models (primarily 8 billion parameters). A key enabler has been AWS infrastructure, which simplified the management of critical compute resources and made faster iteration possible. These optimizations reduced training time by 64%, from 25 days to just 9, compared with the initial, less optimized runs.
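The reported speedup checks out arithmetically: going from 25 days to 9 days saves 64% of the training time, which is roughly a 2.8x throughput gain. A quick check:

```python
days_before, days_after = 25, 9  # figures quoted above

reduction = (days_before - days_after) / days_before  # fraction of time saved
speedup = days_before / days_after                    # how many times faster

print(f"{reduction:.0%} reduction, {speedup:.1f}x faster")
# prints: 64% reduction, 2.8x faster
```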
The project has consolidated a corpus of more than 300 billion tokens of plain text information with a regional focus, equivalent to approximately 230 billion words.
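The two corpus figures imply roughly 1.3 tokens per word, a plausible ratio for Spanish and Portuguese text under common subword tokenizers:

```python
tokens = 300e9  # corpus size in tokens, as stated above
words = 230e9   # approximate equivalent in words

tokens_per_word = tokens / words
print(f"{tokens_per_word:.2f} tokens per word")  # prints: 1.30 tokens per word
```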
The team completed the training of the first version of the base model from this corpus, which was announced at the launch on February 10, 2026. The model does not yet have a confirmed release date. In parallel, benchmarks will be published to evaluate the cultural and contextual knowledge of language models about Latin America and the Caribbean, along with a broad regional collaboration network that has strengthened technical and human capabilities in Artificial Intelligence.
The first version of the model is conceived as a solid foundation upon which to iterate through evaluation, feedback, and continuous improvement processes. Nevertheless, it corresponds to a base model at an early stage of development, so it may have limitations typical of this type of model. The project's goal is to progressively advance toward a robust model, especially in areas where knowledge of the Latin American and Caribbean context is decisive, thus contributing to the strengthening of regional capabilities in Artificial Intelligence.
A rigorous curation process is applied to the 300 billion tokens. This process ensures that the data used to train the model is anonymized and free of toxic content, such as hate speech or inappropriate language. These practices are complemented by ongoing work with the project's ethics team, aimed at progressively strengthening a human rights and responsible use approach. Likewise, the project's ethical principles are embodied in transparency, as the initiative seeks to promote openness in its processes and development criteria, strengthening public trust and technical and academic scrutiny.
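As an illustration of what such curation can involve (a generic sketch, not the project's actual pipeline), a minimal filter might redact common personally identifying patterns and drop documents containing blocklisted vocabulary:

```python
import re

# Hypothetical patterns; a production pipeline would use far more robust
# PII detection and trained toxicity classifiers, not simple regexes.
EMAIL = re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b")
PHONE = re.compile(r"\+?\d[\d\s-]{7,}\d")
BLOCKLIST = {"badword1", "badword2"}  # placeholder toxic terms

def curate(document):
    """Return an anonymized copy of the document, or None to discard it."""
    if any(term in document.lower() for term in BLOCKLIST):
        return None  # drop toxic documents entirely
    document = EMAIL.sub("[EMAIL]", document)  # redact email addresses
    document = PHONE.sub("[PHONE]", document)  # redact phone numbers
    return document

print(curate("Contact ana@example.com or +56 9 1234 5678"))
# prints: Contact [EMAIL] or [PHONE]
```

Real anonymization also covers names, addresses, and identification numbers, typically with dedicated named-entity recognition models rather than regexes alone.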
As a public good, it is designed to be used by:
- Universities and research centers.
- Startups and entrepreneurs to create solutions.
- Governments and social organizations to improve public management and citizen services.
The representativeness of Latam-GPT is ensured through concrete efforts to expand the regional coverage of the corpus, incorporating information from 20 countries in Latin America and the Caribbean, obtained in collaboration with relevant institutions and subjected to rigorous curation and balancing processes. The corpus is organized into 10 priority thematic areas (Sports and Recreation; Arts; Politics; Communication and Media; Medicine and Health; Economics and Finance; Humanities and Social Sciences; Hard Sciences; Education; and, in an emerging capacity, Indigenous Peoples), allowing it to capture a broad diversity of regional contexts and enabling future expansions.
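One common way to balance a multi-theme corpus is to reweight how much each thematic area contributes during training, so small areas are not drowned out by large ones. The sketch below shows this generic technique with invented theme sizes (not the project's real distribution or method):

```python
# Hypothetical token counts per thematic area (illustrative only).
theme_tokens = {
    "Education": 40e9,
    "Politics": 25e9,
    "Medicine and Health": 10e9,
    "Indigenous Peoples": 1e9,  # emerging, under-represented area
}

def balanced_weights(sizes, alpha=0.5):
    """Sampling weights proportional to size**alpha: alpha=1 keeps the
    natural distribution, alpha=0 samples all themes equally, and values
    in between up-weight small themes without ignoring large ones."""
    raw = {theme: count ** alpha for theme, count in sizes.items()}
    total = sum(raw.values())
    return {theme: value / total for theme, value in raw.items()}

weights = balanced_weights(theme_tokens)
# With alpha=0.5, the smallest theme gets a larger share of training
# samples than its raw share of tokens, while big themes still dominate.
for theme, weight in weights.items():
    print(f"{theme}: {weight:.3f}")
```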
Latam-GPT seeks to ensure that the countries of Latin America and the Caribbean move beyond being solely consumers of technologies developed in the Global North and advance toward a more prominent role in the development of Artificial Intelligence, incorporating the real problems and needs of the region. The project demonstrates that the region can build its own capabilities through a collaborative, ethical approach aligned with its linguistic, cultural, and institutional realities. In this sense, Latam-GPT represents a concrete step toward greater regional technological autonomy and an informed, situated contribution to the global debate on the future of Artificial Intelligence.