Foundation Models

We're excited to offer a diverse selection of powerful foundation models to fuel your AI development. This list covers a range of cutting-edge language models from leading AI research organizations, each with its own strengths and characteristics. Whether you're building chatbots, generating creative content, or tackling complex reasoning tasks, you'll find a model here to suit your needs.

The models are grouped by provider, with key details such as model size (number of parameters) and any specific optimizations (e.g., instruction following, chat optimization, math capabilities). The key points at the end of this page explain the naming conventions, the training techniques commonly involved (SFT, RLHF), and the context window size, which determines how much text a model can handle at once.

Qwen (Alibaba Cloud)
  • Qwen2.5: An improved version of the Qwen2 series.
    • Qwen2.5-0.5B-Instruct, Qwen2.5-1.5B-Instruct, Qwen2.5-3B-Instruct, Qwen2.5-7B-Instruct, Qwen2.5-14B-Instruct, Qwen2.5-32B-Instruct, Qwen2.5-72B-Instruct: Instruction-following models ranging from 0.5 billion to 72 billion parameters. Larger models generally have greater capacity for understanding and generating text, but also a larger memory footprint; see the sketch after this list.
  • Qwen2: A series of large language models.
    • Qwen2-0.5B-Instruct, Qwen2-1.5B-Instruct, Qwen2-7B-Instruct, Qwen2-72B-Instruct: Similar to the Qwen2.5 models, these are instruction-following versions with varying sizes.
    • Qwen2-Math-7B-Instruct, Qwen2-Math-72B-Instruct: Specialized versions of Qwen2 designed for mathematical reasoning and problem-solving.
  • Qwen1.5: An earlier generation of Qwen models.
    • Qwen1.5-110B-Chat: A large chat-optimized model with 110 billion parameters.
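
As a rough sense of how parameter count translates into hardware cost, the sketch below estimates the memory needed just to hold the weights of each Qwen2.5 size in 16-bit precision. This is an illustrative back-of-the-envelope calculation, not a platform requirement; real usage is higher once activations, the KV cache, and framework overhead are included.

```python
# Rough memory needed just to hold the weights of a dense model,
# assuming 2 bytes per parameter (fp16/bf16). Activations, KV cache,
# and framework overhead are not included.
def weight_memory_gb(params_billions: float, bytes_per_param: int = 2) -> float:
    return params_billions * 1e9 * bytes_per_param / 1024**3

for size in (0.5, 1.5, 3, 7, 14, 32, 72):
    print(f"Qwen2.5-{size}B: ~{weight_memory_gb(size):.1f} GB of weights")
```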

Llama (Meta)

  • Llama 3.3: The newest Llama release listed here.
    • Llama-3.3-70B-Instruct: An instruction-following model.
  • Llama 3.2: Lightweight models in Meta's Llama series.
    • Llama-3.2-1B-Instruct, Llama-3.2-3B-Instruct: Instruction-following models.
    • Llama-3.2-1B, Llama-3.2-3B: Base Llama 3.2 models without instruction tuning.
  • Meta-Llama 3.1: An earlier version in the Llama 3 series.
    • Meta-Llama-3.1-8B-Instruct, Meta-Llama-3.1-70B-Instruct: Instruction-following models.
    • Meta-Llama-3.1-8B, Meta-Llama-3.1-70B: Base models.
  • Meta-Llama 3: The original Llama 3 release.
    • Meta-Llama-3-8B-Instruct, Meta-Llama-3-70B-Instruct: Instruction-following models.
    • Meta-Llama-3-8B, Meta-Llama-3-70B: Base models.

Gemma (Google)

  • gemma-2-2b-it, gemma-2-9b-it, gemma-2-27b-it: Instruction-tuned versions of the Gemma 2 model with 2 billion, 9 billion, and 27 billion parameters respectively; the "it" suffix stands for instruction-tuned.

SeaLLMs (Alibaba DAMO Academy)

  • SeaLLMs-v3-1.5B-Chat, SeaLLMs-v3-7B-Chat: Chat-optimized versions of SeaLLMs, a model family focused on Southeast Asian languages.
  • SeaLLMs-v3-1.5B, SeaLLMs-v3-7B: Base SeaLLMs models.

Phi (Microsoft)

  • Phi-3.5-mini-instruct: A smaller instruction-following model.
  • Phi-3-medium-128k-instruct, Phi-3-medium-4k-instruct: Medium-sized instruction-following models; "128k" and "4k" refer to the context window size in tokens (the amount of text the model can consider at once); see the sketch after this list.
  • Phi-3-small-128k-instruct, Phi-3-small-4k-instruct: Small instruction-following models with different context window sizes.
  • Phi-3-mini-128k-instruct, Phi-3-mini-4k-instruct: Smaller instruction-following models with different context window sizes.
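
If you want to confirm a checkpoint's context window rather than infer it from the name, the declared limit is usually exposed in its configuration. The snippet below is a minimal sketch that assumes the models are available as the public Hugging Face checkpoints named here; adjust the IDs to however the models are exposed on your platform.

```python
# Read the declared context window (in tokens) from a model's config,
# assuming the public Hugging Face checkpoint IDs shown below.
from transformers import AutoConfig

for model_id in (
    "microsoft/Phi-3-mini-4k-instruct",
    "microsoft/Phi-3-mini-128k-instruct",
):
    config = AutoConfig.from_pretrained(model_id, trust_remote_code=True)
    print(f"{model_id}: max_position_embeddings = {config.max_position_embeddings}")
```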

Yi (01.AI)

  • Yi-1.5-6B-Chat, Yi-1.5-9B-Chat, Yi-1.5-34B-Chat: Chat-optimized models with varying sizes.
  • Yi-1.5-9B-Chat-16K, Yi-1.5-34B-Chat-16K: Chat-optimized models with an extended 16K-token context window.

Aya (CohereForAI)

  • aya-23-8B, aya-23-35B: Multilingual models from the Aya 23 series, covering 23 languages.

Baichuan (Baichuan Inc.)

  • Baichuan2-7B-Chat, Baichuan2-13B-Chat: Chat-optimized models from the Baichuan 2 series.

DeepSeek (DeepSeek AI)

  • DeepSeek-R1: A large reasoning model trained to produce an explicit chain of thought before answering.
  • DeepSeek-R1-Zero: A variant of DeepSeek-R1 trained with reinforcement learning without an initial supervised fine-tuning stage.
  • DeepSeek-R1-Distill-Llama-70B, DeepSeek-R1-Distill-Qwen-32B, DeepSeek-R1-Distill-Qwen-14B, DeepSeek-R1-Distill-Llama-8B, DeepSeek-R1-Distill-Qwen-7B, DeepSeek-R1-Distill-Qwen-1.5B: Smaller Llama- and Qwen-based models fine-tuned (distilled) on reasoning data generated by DeepSeek-R1, trading some capability for lower cost and faster inference.

Key Points:
  • -Instruct: Suffix indicating the model has been fine-tuned to follow instructions; see the usage sketch after this list.
  • -Chat: Suffix indicating the model is optimized for multi-turn conversational interactions.
  • -Math: Suffix indicating a model specialized for mathematical reasoning.
  • Size (e.g., 7B, 70B, 110B): The number of parameters in billions. More parameters generally mean greater capacity but also higher memory and compute cost.
  • -4k, -16K, -128k: The model's context window size in tokens. A larger context window allows the model to process more text at once.
  • SFT (Supervised Fine-Tuning): A training technique where the model is fine-tuned on a dataset of input-output pairs.
  • RLHF (Reinforcement Learning from Human Feedback): A training technique that uses human feedback to improve the model's responses.
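
To make the -Instruct and -Chat suffixes concrete, here is a minimal sketch of prompting an instruction-tuned checkpoint with the Hugging Face transformers library. The model ID, precision, and device placement are illustrative assumptions; substitute whichever checkpoint or endpoint you actually deploy.

```python
# Minimal sketch: load an instruction-tuned checkpoint and prompt it
# through its chat template. "Qwen/Qwen2.5-7B-Instruct" is only an
# example ID; running it requires a GPU with enough memory for the weights.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen2.5-7B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# -Instruct/-Chat models expect a structured conversation; the chat
# template converts it into the prompt format the model was tuned on.
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Explain what a context window is in one sentence."},
]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=128)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```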