Frequently Asked Questions

    Latam-GPT is a comprehensive technological public good upon which various actors can build specific AI solutions, democratizing access to key tools for competitiveness and regional development. While one of its central outcomes is the development of an open large language model (LLM), the project also encompasses the generation of enabling and complementary capabilities: the training and coordination of regional talent, the creation of relevant and representative data corpora, the design of proprietary benchmarks and evaluations, as well as the development of shared infrastructure and technical knowledge. In this sense, Latam-GPT should be understood as a strategic and collaborative project that brings together multiple initiatives aimed at strengthening the Artificial Intelligence (AI) ecosystem in Latin America and the Caribbean.

    At this stage, Latam-GPT 70B 1.0 is released as a codebase, datasets, and trained model weights for developers to adapt to specific uses. Latam-GPT is not yet available as an interactive conversational chatbot for mass use on regular computers or mobile phones.

    Copuchat is an experimental application hosted on latamgpt.org, based on GPT-4.1 by OpenAI, designed to simulate conversations that real users might have with future versions of Latam-GPT. Its purpose is to collect real interactions from people in Latin America and the Caribbean to better understand how they use this type of technology and, in doing so, support the alignment and post-training processes of the model.

    Latam-GPT rests on three fundamental pillars that current commercial models do not fully address in the context of Latin America and the Caribbean. The goal is to ensure the relevance, representativeness, and technological sovereignty of Artificial Intelligence in the region.

    1. Development of local capabilities. For AI to truly serve people, it is essential to understand how it works internally and not just use tools developed by third parties. This project enables regional talent to acquire deep technical experience, with the goal of leading innovation processes rather than being limited to implementing external technologies.

    2. Addressing the regional representation gap. Latam-GPT performs better on tasks related to topics from Latin America and the Caribbean. Currently, global models are trained primarily with data from the Global North, where Spanish represents only about 4% of the data, and Portuguese between 2% and 3%. Latam-GPT seeks to reduce this inequality by integrating data that reflects the culture, languages, and identity of Latin America and the Caribbean.

    3. Technological sovereignty. Latam-GPT proposes an open alternative to the dominance of large technology companies, demonstrating that the region has the capacity and autonomy needed to develop advanced Artificial Intelligence projects.

    A first advantage of Latam-GPT is that, unlike models of similar size, it shows better performance on tasks that require knowledge of the cultural context of Latin America and the Caribbean.

    The second advantage is its openness: any organization can take the model and "educate" it with its own manuals or regulations, which in turn gives strategic sectors greater control over their information security.

    And the third is its transparency: unlike closed models that withhold key information about data and training, Latam-GPT champions openness and clarity, strengthening trust, technical scrutiny, and regional collaboration.

    Latam-GPT was trained with a significantly higher proportion of data about the region than any model to date, using continued pre-training (CPT) to add regional knowledge to the Llama 3.1 70B base model.

    In its initial stage (version 1.0), Latam-GPT is not directly comparable to commercial models backed by large-scale investment. While this first version performs below others on some benchmarks, its comparative results are still relevant evidence: they demonstrate capabilities, both technical and in infrastructure management, that lay the groundwork for future versions to potentially reach parity with the most advanced models while maintaining the best performance in the Latin American and Caribbean context.

    The analysis of instruction-tuned versions will be addressed in later stages of the project.

    Latam-GPT is an unprecedented collaborative effort bringing together nearly 200 professionals and more than 65 institutions from 15 countries (13 from Latin America and the Caribbean and 2 external to the region), reflecting the magnitude and regional character of the project. This coordination demonstrates that the development of Artificial Intelligence in Latin America and the Caribbean is possible through collaborative work, and also shows that collaborations of this scale between academia, the public sector, and specialized organizations are achievable.

    The project is coordinated by CENIA and is made possible thanks to the collaboration of multiple institutions in the region, including:

    Strategic Collaborating Entities

    1. Amazon Web Services (AWS)
    2. Banco de Desarrollo de América Latina y el Caribe (CAF)
    3. Inter-American Development Bank (IDB)
    4. Ministry of Science, Innovation, Technology and Telecommunications of Costa Rica
    5. Ministry of Science, Technology, Knowledge and Innovation of Chile
    6. Organization of American States (OAS)
    7. Ministry of Science, Technology and Innovation of Brazil
    8. Government Office of Information and Communication Technologies (OGTIC) of the Dominican Republic
    9. Presidency of the Council of Ministers of Peru
    10. Secretariat of Science, Humanities, Technology and Innovation of Mexico
    11. National Secretariat of Science, Technology and Innovation (SENACYT) of Panama

    Signatory Institutions

    1. Academia de la Lengua Chilena, Chile
    2. Agency for E-Government and Information and Knowledge Society (AGESIC), Uruguay
    3. ARTIFICYAN, Chile
    4. Mexican Association of the Information Technology Industry (AMITI), Mexico
    5. Bibliotecas UC, Chile
    6. Centro de Investigación en Ciencias de Información Geoespacial (CentroGEO), Mexico
    7. Centro de Investigación e Innovación en Tecnologías de la Información y Comunicación (INFOTEC), Mexico
    8. Economic Commission for Latin America and the Caribbean (CEPAL)
    9. Council of Rectors of Chilean Universities (CRUCH), Chile
    10. Corporación Universitaria Minuto de Dios, Colombia
    11. Data Observatory, Chile
    12. DatySoc, Uruguay
    13. Department of Computer Science (DCC), Chile
    14. Department of Philosophy, UChile, Chile
    15. Institute of Technology and Engineering, UNAHUR, Argentina
    16. Duoc UC, Chile
    17. Chilean Army, Chile
    18. National Polytechnic School of Artificial Intelligence, Ecuador
    19. Faculty of Mathematics, Astronomy, Physics and Computing (FAMAF) – Universidad Nacional de Córdoba, Argentina
    20. Fundación Vía Libre, Argentina
    21. FUNDAR, Argentina
    22. IAEN, Ecuador
    23. National Institute of Astrophysics, Optics and Electronics (INAOE), Mexico
    24. JhedAI, Chile
    25. LabEVA, Faculty of Information and Communication, Universidad de la República, Uruguay
    26. Open Laboratory of Artificial Intelligence (LAIA)
    27. Perú AiMaraLab, Peru
    28. Red Divulga Ciencia, Ecuador
    29. SOMOSNLP, Spain
    30. Tabuga, Dominican Republic
    31. Theodora, Chile
    32. Universidad Avellaneda, Argentina
    33. Universidad Central de Venezuela, Venezuela
    34. Universidad Continental, Peru
    35. Universidad de Costa Rica, Costa Rica
    36. Universidad de los Andes de Colombia, Colombia
    37. Universidad Espíritu Santo, Colombia
    38. Universidad Gabriela Mistral, Chile
    39. Universidad Javeriana, Universidad La Salle, Colombia
    40. Universidad Nacional de San Martín (UNSAM), Argentina
    41. Universidad Ricardo Palma, Peru
    42. Universidad Tecnológica de Panamá, Panama
    43. Wikimedia Chile, Chile

    The project uses Llama 3.1 (70 billion parameters) as its base architecture and also experiments with more compact models (primarily 8 billion parameters). A vital component has been optimization on AWS infrastructure, which simplified the management of critical compute resources and made faster iteration possible. These optimizations reduced training time by 64%, from 25 days to just 9, compared to the initial, less optimized runs.
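    The reported speedup is easy to sanity-check: going from 25 training days to 9 is a 64% reduction. A minimal illustration of that arithmetic (the figures come from the paragraph above; this is not official project code):

```python
# Sanity-check the reported training-time reduction: 25 days -> 9 days.
baseline_days = 25
optimized_days = 9

reduction = (baseline_days - optimized_days) / baseline_days
print(f"Training time reduced by {reduction:.0%}")  # prints: Training time reduced by 64%
```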

    The project has consolidated a corpus of more than 300 billion tokens of plain text information with a regional focus, equivalent to approximately 230 billion words.
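    A back-of-the-envelope check relates the two corpus figures: 300 billion tokens over roughly 230 billion words implies about 1.3 tokens per word, a plausible rate for subword tokenization of Spanish and Portuguese text. The ratio is an inference from the stated numbers, not an official project statistic:

```python
# Relate the two corpus figures: ~300 billion tokens vs ~230 billion words.
tokens = 300e9
words = 230e9

tokens_per_word = tokens / words
print(f"About {tokens_per_word:.2f} tokens per word")  # about 1.30
```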

    The team has completed training the first version of the base model on this corpus, announced at the launch on February 10, 2026; the model does not yet have a confirmed release date. In parallel, the project will publish benchmarks to evaluate language models' cultural and contextual knowledge of Latin America and the Caribbean, and it has built a broad regional collaboration network that strengthens technical and human capabilities in Artificial Intelligence.

    The first version of the model is conceived as a solid foundation upon which to iterate through evaluation, feedback, and continuous improvement. Nevertheless, it is a base model at an early stage of development and may show the limitations typical of models of this kind. The project's goal is to progressively advance toward a robust model, especially in areas where knowledge of the Latin American and Caribbean context is decisive, thus contributing to the strengthening of regional capabilities in Artificial Intelligence.

    A rigorous curation process is applied to the 300 billion tokens. This process ensures that the data used to train the model is anonymized and free of toxic content, such as hate speech or inappropriate language. These practices are complemented by ongoing work with the project's ethics team, aimed at progressively strengthening a human rights and responsible use approach. Likewise, transparency is central to the project's ethical principles: the initiative promotes openness in its processes and development criteria, strengthening public trust and enabling technical and academic scrutiny.

    As a public good, it is designed to be used by:

    • Universities and research centers.
    • Startups and entrepreneurs to create solutions.
    • Governments and social organizations to improve public management and citizen services.

    The representativeness of Latam-GPT is ensured through concrete efforts to expand the regional coverage of the corpus, incorporating information from 20 countries in Latin America and the Caribbean, obtained in collaboration with relevant institutions and subjected to rigorous curation and balancing processes. The corpus is organized into 10 priority thematic areas (Sports and Recreation; Arts; Politics; Communication and Media; Medicine and Health; Economics and Finance; Humanities and Social Sciences; Hard Sciences; Education; and, in an emerging capacity, Indigenous Peoples), allowing it to capture a broad diversity of regional contexts and enable future expansions.

    Latam-GPT seeks to ensure that the countries of Latin America and the Caribbean move beyond being solely consumers of technologies developed in the Global North and advance toward a more prominent role in the development of Artificial Intelligence, incorporating the real problems and needs of the region. The project demonstrates that the region can build its own capabilities through a collaborative, ethical approach aligned with its linguistic, cultural, and institutional realities. In this sense, Latam-GPT represents a concrete step toward greater regional technological autonomy and an informed, situated contribution to the global debate on the future of Artificial Intelligence.