We're excited to offer a diverse selection of powerful base models to fuel your AI development. This list represents a range of cutting-edge language models from leading AI research organizations, each with unique strengths and characteristics. Whether you're building chatbots, generating creative content, or tackling complex reasoning tasks, you'll find a model here to suit your needs.
The models are categorized by their origin and include key details like model size (number of parameters), architecture, and any specific optimizations (e.g., instruction following, chat optimization, mathematical reasoning). Where available, we also note the training techniques used, such as supervised fine-tuning (SFT) and reinforcement learning from human feedback (RLHF), and the model's context window size, which is crucial for handling longer texts.
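Most of these checkpoints are also published on Hugging Face under their providers' namespaces. As a minimal sketch of querying one of the instruction-tuned models with the transformers library, assuming you pull the weights from Hugging Face (the repo ID Qwen/Qwen2.5-0.5B-Instruct is just one example from the list below; substitute whichever model you select):

```python
# A minimal sketch, assuming checkpoints are pulled from Hugging Face
# (pip install transformers torch accelerate). The repo ID below is one
# example from the Qwen list; substitute the model you choose.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen2.5-0.5B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# Instruction-tuned models expect their chat template, not raw text.
messages = [{"role": "user", "content": "Explain what a context window is."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=128)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```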
Qwen (Alibaba)
- Qwen2.5-0.5B-Instruct, Qwen2.5-1.5B-Instruct, Qwen2.5-3B-Instruct, Qwen2.5-7B-Instruct, Qwen2.5-14B-Instruct, Qwen2.5-32B-Instruct, Qwen2.5-72B-Instruct: Instruction-following models ranging from 0.5 billion to 72 billion parameters. Larger models generally have greater capacity for understanding and generating text.
- Qwen2-0.5B-Instruct, Qwen2-1.5B-Instruct, Qwen2-7B-Instruct, Qwen2-72B-Instruct: The previous generation of Qwen instruction-following models, in a similar range of sizes.
- Qwen2-Math-7B-Instruct, Qwen2-Math-72B-Instruct: Specialized versions of Qwen2 designed for mathematical reasoning and problem-solving.
- Qwen1.5-110B-Chat: A large chat-optimized model with 110 billion parameters.
Llama (Meta)
- Llama-3.2-1B-Instruct, Llama-3.2-3B-Instruct: Instruction-following models.
- Llama-3.2-1B, Llama-3.2-3B: Base Llama 3.2 models without instruction tuning (the base/instruct difference is illustrated in the sketch after this list).
- Meta-Llama-3.1-8B-Instruct, Meta-Llama-3.1-70B-Instruct: Instruction-following models.
- Meta-Llama-3.1-8B, Meta-Llama-3.1-70B: Base models.
- Meta-Llama-3-8B-Instruct, Meta-Llama-3-70B-Instruct: Instruction-following models.
- Meta-Llama-3-8B, Meta-Llama-3-70B: Base models.
- Llama-3.3-70B-Instruct: An instruction-following model.
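The base/instruct split matters in practice: base checkpoints are plain next-token predictors you prompt with raw text, while the -Instruct variants expect their chat template. A minimal sketch of the difference, assuming the public Hugging Face repo IDs for the Llama 3.2 1B pair (these repos are gated behind Meta's license acceptance):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Base model: continue raw text, with no chat formatting.
base_tok = AutoTokenizer.from_pretrained("meta-llama/Llama-3.2-1B")
base = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.2-1B")
ids = base_tok("The capital of France is", return_tensors="pt").input_ids
print(base_tok.decode(base.generate(ids, max_new_tokens=10)[0]))

# Instruct model: wrap the request in the model's chat template instead.
inst_tok = AutoTokenizer.from_pretrained("meta-llama/Llama-3.2-1B-Instruct")
inst = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.2-1B-Instruct")
ids = inst_tok.apply_chat_template(
    [{"role": "user", "content": "What is the capital of France?"}],
    add_generation_prompt=True, return_tensors="pt",
)
print(inst_tok.decode(inst.generate(ids, max_new_tokens=32)[0]))
```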
Gemma (Google)
- gemma-2-2b-it, gemma-2-9b-it, gemma-2-27b-it: Instruction-tuned versions of Gemma 2 with 2 billion, 9 billion, and 27 billion parameters respectively; the "it" suffix stands for instruction-tuned.
SeaLLMs
- SeaLLMs-v3-1.5B-Chat, SeaLLMs-v3-7B-Chat: Chat-optimized versions of SeaLLMs v3, a family focused on Southeast Asian languages.
- SeaLLMs-v3-1.5B, SeaLLMs-v3-7B: Base SeaLLMs v3 models.
Phi (Microsoft)
- Phi-3.5-mini-instruct: A compact instruction-following model.
- Phi-3-medium-128k-instruct, Phi-3-medium-4k-instruct: Medium-sized instruction-following models; "128k" and "4k" refer to the context window size, i.e., roughly 128,000 or 4,000 tokens of text the model can consider at once (see the sketch after this list).
- Phi-3-small-128k-instruct, Phi-3-small-4k-instruct: Small instruction-following models with the same two context window options.
- Phi-3-mini-128k-instruct, Phi-3-mini-4k-instruct: Smaller instruction-following models, again in 128k and 4k context variants.
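The context window bounds the total tokens the model can attend to in one pass, prompt plus generated reply. A small sketch of checking a prompt against that budget with the model's own tokenizer; the 4,096 figure and the placeholder prompt are illustrative assumptions for a "4k" variant:

```python
from transformers import AutoTokenizer

CONTEXT_WINDOW = 4096        # assumed budget for a "4k" variant
RESERVED_FOR_OUTPUT = 512    # leave headroom for the generated reply

tokenizer = AutoTokenizer.from_pretrained("microsoft/Phi-3-mini-4k-instruct")
prompt = "..."  # your long input text here

token_ids = tokenizer.encode(prompt)
budget = CONTEXT_WINDOW - RESERVED_FOR_OUTPUT
if len(token_ids) > budget:
    # Truncate (or chunk/summarize) so prompt plus reply fit the window.
    prompt = tokenizer.decode(token_ids[:budget])
print(f"{len(token_ids)} prompt tokens against a {CONTEXT_WINDOW}-token window")
```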
Yi (01.AI)
- Yi-1.5-6B-Chat, Yi-1.5-9B-Chat, Yi-1.5-34B-Chat: Chat-optimized models in three sizes.
- Yi-1.5-9B-Chat-16K, Yi-1.5-34B-Chat-16K: Variants of the chat models with an extended 16K-token context window.
Aya (CohereForAI)
- aya-23-8B, aya-23-35B: Multilingual instruction-tuned models from Cohere For AI's Aya 23 series, covering 23 languages.
Baichuan (Baichuan Inc.)
- Baichuan2-7B-Chat, Baichuan2-13B-Chat: Chat-optimized models from the Baichuan 2 series.
DeepSeek (DeepSeek AI)
- DeepSeek-R1: A large reasoning model trained with reinforcement learning to produce an explicit chain of thought before answering.
- DeepSeek-R1-Zero: A variant trained with reinforcement learning alone, without the supervised fine-tuning cold start used for DeepSeek-R1.
- DeepSeek-R1-Distill-Llama-70B, DeepSeek-R1-Distill-Qwen-32B, DeepSeek-R1-Distill-Qwen-14B, DeepSeek-R1-Distill-Llama-8B, DeepSeek-R1-Distill-Qwen-7B, DeepSeek-R1-Distill-Qwen-1.5B: Distilled versions of DeepSeek-R1; these are smaller Llama and Qwen checkpoints fine-tuned on reasoning data generated by R1, trading some capability for much lower serving cost (their output format is illustrated in the sketch after this list).
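R1-style models, including these distills, typically emit their chain of thought between <think> and </think> tags before the final answer. A minimal sketch of separating the two, assuming that tag convention holds for the checkpoint you use:

```python
import re

def split_reasoning(text: str) -> tuple[str, str]:
    """Split R1-style output into (reasoning, final answer).

    Assumes the chain of thought is wrapped in <think>...</think>;
    if the tags are absent, the whole text is treated as the answer.
    """
    match = re.search(r"<think>(.*?)</think>", text, flags=re.DOTALL)
    if not match:
        return "", text.strip()
    reasoning = match.group(1).strip()
    answer = text[match.end():].strip()
    return reasoning, answer

reasoning, answer = split_reasoning(
    "<think>7 * 6 = 42, so the answer is 42.</think>The answer is 42."
)
print(answer)  # -> "The answer is 42."
```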