Frequently Asked Questions
Latam-GPT is a comprehensive technological public good upon which various actors can build specific AI solutions, democratizing access to key tools for competitiveness and regional development. While one of its central outcomes is the development of an open large language model (LLM), the project also encompasses the generation of enabling and complementary capabilities: the training and coordination of regional talent, the creation of relevant and representative data corpora, the design of proprietary benchmarks and evaluations, as well as the development of shared infrastructure and technical knowledge. In this sense, Latam-GPT should be understood as a strategic and collaborative project that brings together multiple initiatives aimed at strengthening the Artificial Intelligence (AI) ecosystem in Latin America and the Caribbean.
At this stage, Latam-GPT 70B 1.0 is released as a codebase, datasets, and trained model weights for developers to adapt to specific uses. Latam-GPT is not yet available as an interactive conversational chatbot for mass use on regular computers or mobile phones.
Copuchat is an experimental application hosted on latamgpt.org, based on GPT-4.1 by OpenAI, designed to simulate conversations that real users might have with future versions of Latam-GPT. Its purpose is to collect real interactions from people in Latin America and the Caribbean to better understand how they use this type of technology and, in doing so, support the alignment and post-training processes of the model.
The development of Latam-GPT rests on three fundamental pillars that current commercial models do not fully address in the context of Latin America and the Caribbean, with the goal of ensuring the relevance, representativeness, and technological sovereignty of Artificial Intelligence in the region.
1. Development of local capabilities. For AI to truly serve people, it is essential to understand how it works internally and not just use tools developed by third parties. This project enables regional talent to acquire deep technical experience, with the goal of leading innovation processes rather than being limited to implementing external technologies.
2. Addressing the regional representation gap. Latam-GPT performs better on tasks related to topics from Latin America and the Caribbean. Currently, global models are trained primarily with data from the Global North, where Spanish represents only about 4% of the data, and Portuguese between 2% and 3%. Latam-GPT seeks to reduce this inequality by integrating data that reflects the culture, languages, and identity of Latin America and the Caribbean.
3. Technological sovereignty. Latam-GPT proposes an open alternative to the dominance of large technology companies, demonstrating that the region has the capacity and autonomy needed to develop advanced Artificial Intelligence projects.
A first advantage of Latam-GPT is that, unlike models of similar size, it shows better performance on tasks that require knowledge of the cultural context of Latin America and the Caribbean.
The second is its openness: any organization can take the model and fine-tune ("educate") it with its own manuals or regulations, which in turn gives strategic sectors greater control over information security.
The third is its transparency: unlike closed models that withhold key information about data and training, Latam-GPT champions openness and clarity, strengthening trust, technical scrutiny, and regional collaboration.
Latam-GPT was trained with a significantly higher proportion of data about the region than any model to date, using continued pre-training (CPT) to add regional knowledge to the base model Llama 3.1 70B.
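Conceptually, continued pre-training means a base model's parameters keep being updated on new, domain-specific text rather than training restarting from scratch. The toy sketch below illustrates the idea with a count-based bigram model in plain Python (this is a pedagogical analogy, not the project's actual Llama-based pipeline): a "base" corpus establishes initial statistics, and a regional corpus then shifts them without discarding what was already learned.

```python
from collections import defaultdict

class BigramLM:
    """Toy count-based bigram language model used to illustrate
    continued pre-training (CPT): training does not restart, it
    keeps accumulating statistics from new corpora."""

    def __init__(self):
        self.counts = defaultdict(lambda: defaultdict(int))

    def train(self, corpus):
        # Each call adds to the existing counts instead of resetting them.
        for sentence in corpus:
            tokens = sentence.lower().split()
            for prev, nxt in zip(tokens, tokens[1:]):
                self.counts[prev][nxt] += 1

    def prob(self, prev, nxt):
        total = sum(self.counts[prev].values())
        return self.counts[prev][nxt] / total if total else 0.0

# "Base" pre-training on generic text (invented example sentences).
lm = BigramLM()
lm.train(["the model answers general questions"] * 8)
before = lm.prob("the", "model")

# Continued pre-training on region-focused text shifts the statistics
# toward regional usage while keeping the earlier counts.
lm.train(["the andes cross several countries", "the amazon basin is vast"] * 8)
after = lm.prob("the", "model")
print(before, after)  # prints 1.0 0.3333333333333333
```

The same principle scales up to a 70B-parameter transformer: gradient updates on the regional corpus continue from the base checkpoint rather than replacing it.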
In its initial stage (version 1.0), Latam-GPT is not directly comparable to commercial models backed by large-scale investments. While this first version performs below others on some benchmarks, its comparative results are meaningful evidence: they demonstrate the technical and infrastructure-management capabilities built along the way, laying the groundwork for future versions to approach the most advanced models while maintaining the best performance in the Latin American and Caribbean context.
The analysis of instruction-tuned versions will be addressed in later stages of the project.
Strategic Collaborating Entities
- Amazon Web Services (AWS)
- Banco de Desarrollo de América Latina y el Caribe (CAF)
- Inter-American Development Bank (IDB)
- Ministry of Science, Innovation, Technology and Telecommunications of Costa Rica
- Ministry of Science, Technology, Knowledge and Innovation of Chile
- Organization of American States (OAS)
- Ministry of Science, Technology and Innovation of Brazil
- Government Office of Information and Communication Technologies (OGTIC) of the Dominican Republic
- Presidency of the Council of Ministers of Peru
- Secretariat of Science, Humanities, Technology and Innovation of Mexico
- National Secretariat of Science, Technology and Innovation (SENACYT) of Panama
Signatory Institutions
- Academia de la Lengua Chilena, Chile
- Agency for E-Government and Information and Knowledge Society (AGESIC), Uruguay
- ARTIFICYAN, Chile
- Mexican Association of the Information Technology Industry (AMITI), Mexico
- Bibliotecas UC, Chile
- Centro de Investigación en Ciencias de Información Geoespacial (CentroGEO), Mexico
- Centro de Investigación e Innovación en Tecnologías de la Información y Comunicación (INFOTEC), Mexico
- Economic Commission for Latin America and the Caribbean (CEPAL)
- Council of Rectors of Chilean Universities (CRUCH), Chile
- Corporación Universitaria Minuto de Dios, Colombia
- Data Observatory, Chile
- DatySoc, Uruguay
- Department of Computer Science (DCC), Chile
- Department of Philosophy, UChile, Chile
- Institute of Technology and Engineering, UNAHUR, Argentina
- Duoc UC, Chile
- Chilean Army, Chile
- National Polytechnic School of Artificial Intelligence, Ecuador
- Faculty of Mathematics, Astronomy, Physics and Computing (FAMAF) – Universidad Nacional de Córdoba, Argentina
- Fundación Vía Libre, Argentina
- FUNDAR, Argentina
- IAEN, Ecuador
- National Institute of Astrophysics, Optics and Electronics (INAOE), Mexico
- JhedAI, Chile
- LabEVA, Faculty of Information and Communication, Universidad de la República, Uruguay
- Open Laboratory of Artificial Intelligence (LAIA)
- Perú AiMaraLab, Peru
- Red Divulga Ciencia, Ecuador
- SOMOSNLP, Spain
- Tabuga, Dominican Republic
- Theodora, Chile
- Universidad Avellaneda, Argentina
- Universidad Central de Venezuela, Venezuela
- Universidad Continental, Peru
- Universidad de Costa Rica, Costa Rica
- Universidad de los Andes de Colombia, Colombia
- Universidad Espíritu Santo, Colombia
- Universidad Gabriela Mistral, Chile
- Universidad Javeriana, Universidad La Salle, Colombia
- Universidad Nacional de San Martín (UNSAM), Argentina
- Universidad Ricardo Palma, Peru
- Universidad Tecnológica de Panamá, Panama
- Wikimedia Chile, Chile
Latam-GPT is an unprecedented collaborative effort bringing together nearly 200 professionals and more than 65 institutions from 15 countries (13 from Latin America and the Caribbean and 2 external to the region), reflecting the magnitude and regional character of the project. This coordination demonstrates that the development of Artificial Intelligence in Latin America and the Caribbean is possible through collaborative work, and also shows that collaborations of this scale between academia, the public sector, and specialized organizations are achievable.
The project is coordinated by CENIA and is made possible thanks to the collaboration of multiple institutions in the region, listed above as strategic collaborating entities and signatory institutions.
The development uses Llama 3.1 (70 billion parameters) as its base architecture and also conducts experiments with more compact models (primarily 8 billion parameters). A key enabler has been AWS infrastructure, which simplified the management of critical compute resources and made faster iteration possible. These optimizations reduced training time by 64%, from 25 days to just 9, compared with the initial, less optimized runs.
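The reported speedup checks out arithmetically: going from 25 days to 9 days saves 64% of the training time, which is roughly a 2.8x throughput gain. A quick check:

```python
days_before, days_after = 25, 9  # figures quoted above

reduction = (days_before - days_after) / days_before  # fraction of time saved
speedup = days_before / days_after                    # how many times faster

print(f"{reduction:.0%} reduction, {speedup:.1f}x faster")
# prints: 64% reduction, 2.8x faster
```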
The project has consolidated a corpus of more than 300 billion tokens of plain text information with a regional focus, equivalent to approximately 230 billion words.
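The two corpus figures imply roughly 1.3 tokens per word, a plausible ratio for Spanish and Portuguese text under common subword tokenizers:

```python
tokens = 300e9  # corpus size in tokens, as stated above
words = 230e9   # approximate equivalent in words

tokens_per_word = tokens / words
print(f"{tokens_per_word:.2f} tokens per word")  # prints: 1.30 tokens per word
```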
The team completed the training of the first version of the base model from this corpus, which was announced at the launch on February 10, 2026. The model does not yet have a confirmed release date. In parallel, benchmarks will be published to evaluate the cultural and contextual knowledge of language models about Latin America and the Caribbean, along with a broad regional collaboration network that has strengthened technical and human capabilities in Artificial Intelligence.
The first version of the model is conceived as a solid foundation upon which to iterate through evaluation, feedback, and continuous improvement processes. Nevertheless, it corresponds to a base model at an early stage of development, so it may have limitations typical of this type of model. The project's goal is to progressively advance toward a robust model, especially in areas where knowledge of the Latin American and Caribbean context is decisive, thus contributing to the strengthening of regional capabilities in Artificial Intelligence.
A rigorous curation process is applied to the 300 billion tokens. This process ensures that the data used to train the model is anonymized and free of toxic content, such as hate speech or inappropriate language. These practices are complemented by ongoing work with the project's ethics team, aimed at progressively strengthening a human rights and responsible use approach. Likewise, the project's ethical principles are embodied in transparency, as the initiative seeks to promote openness in its processes and development criteria, strengthening public trust and technical and academic scrutiny.
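As an illustration of what such curation can involve (a generic sketch, not the project's actual pipeline), a minimal filter might redact common personally identifying patterns and drop documents containing blocklisted vocabulary:

```python
import re

# Hypothetical patterns; a production pipeline would use far more robust
# PII detection and trained toxicity classifiers, not simple regexes.
EMAIL = re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b")
PHONE = re.compile(r"\+?\d[\d\s-]{7,}\d")
BLOCKLIST = {"badword1", "badword2"}  # placeholder toxic terms

def curate(document):
    """Return an anonymized copy of the document, or None to discard it."""
    if any(term in document.lower() for term in BLOCKLIST):
        return None  # drop toxic documents entirely
    document = EMAIL.sub("[EMAIL]", document)  # redact email addresses
    document = PHONE.sub("[PHONE]", document)  # redact phone numbers
    return document

print(curate("Contact ana@example.com or +56 9 1234 5678"))
# prints: Contact [EMAIL] or [PHONE]
```

Real anonymization also covers names, addresses, and identification numbers, typically with dedicated named-entity recognition models rather than regexes alone.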
As a public good, it is designed to be used by:
- Universities and research centers.
- Startups and entrepreneurs to create solutions.
- Governments and social organizations to improve public management and citizen services.
The representativeness of Latam-GPT is ensured through concrete efforts to expand the regional coverage of the corpus, incorporating information from 20 countries in Latin America and the Caribbean, obtained in collaboration with relevant institutions and subjected to rigorous curation and balancing processes. The corpus is organized into 10 priority thematic areas (Sports and Recreation; Arts; Politics; Communication and Media; Medicine and Health; Economics and Finance; Humanities and Social Sciences; Hard Sciences; Education; and, in an emerging capacity, Indigenous Peoples), allowing it to capture a broad diversity of regional contexts and enabling future expansions.
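One common way to balance a multi-theme corpus is to reweight how much each thematic area contributes during training, so small areas are not drowned out by large ones. The sketch below shows this generic technique with invented theme sizes (not the project's real distribution or method):

```python
# Hypothetical token counts per thematic area (illustrative only).
theme_tokens = {
    "Education": 40e9,
    "Politics": 25e9,
    "Medicine and Health": 10e9,
    "Indigenous Peoples": 1e9,  # emerging, under-represented area
}

def balanced_weights(sizes, alpha=0.5):
    """Sampling weights proportional to size**alpha: alpha=1 keeps the
    natural distribution, alpha=0 samples all themes equally, and values
    in between up-weight small themes without ignoring large ones."""
    raw = {theme: count ** alpha for theme, count in sizes.items()}
    total = sum(raw.values())
    return {theme: value / total for theme, value in raw.items()}

weights = balanced_weights(theme_tokens)
# With alpha=0.5, the smallest theme gets a larger share of training
# samples than its raw share of tokens, while big themes still dominate.
for theme, weight in weights.items():
    print(f"{theme}: {weight:.3f}")
```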
Latam-GPT seeks to ensure that the countries of Latin America and the Caribbean move beyond being solely consumers of technologies developed in the Global North and advance toward a more prominent role in the development of Artificial Intelligence, incorporating the real problems and needs of the region. The project demonstrates that the region can build its own capabilities through a collaborative, ethical approach aligned with its linguistic, cultural, and institutional realities. In this sense, Latam-GPT represents a concrete step toward greater regional technological autonomy and an informed, situated contribution to the global debate on the future of Artificial Intelligence.