Mixtral 8x7B

8x7B v0.1 · Family: Mistral

Open-weights sparse mixture-of-experts model from Mistral AI: 46.7B total parameters (12.9B active per token), 32K context window, Apache 2.0 license.

⚠ Deprecated · ✓ Public access · ⚖ Open source · LLM · 📁 Mistral

Context window: 32K tokens

Parameters: 46.7B total / 12.9B active

Release date: 11 December 2023

Producer: 🏢 Mistral AI

Access: API, Download

Deployment: 💻 Local, ☁ Cloud

Overview

Mixtral 8x7B is a decoder-only Sparse Mixture-of-Experts (SMoE) language model released by Mistral AI on 11 December 2023 under the Apache 2.0 license. In every layer, the feed-forward block consists of 8 expert groups; for each token, a router network selects 2 of them and combines their outputs as a weighted sum. The model therefore has 46.7B parameters in total, but only about 12.9B are used per token, so per-token compute and latency are comparable to those of a 12.9B dense model, although all 46.7B parameters must still be held in memory.
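The routing step described above can be sketched in a few lines of plain Python. This is a toy illustration, not Mistral's implementation: the function names (`top2_moe`, `softmax`, `matvec`) and the scaling "experts" are hypothetical, and the sketch assumes the gate weights come from a softmax over the two selected router logits, which is how the Mixtral paper describes the gating. In the real model each expert is a full feed-forward network rather than a scalar multiply.

```python
import math

NUM_EXPERTS = 8  # experts per layer, as stated in the overview
TOP_K = 2        # experts selected per token


def softmax(xs):
    """Numerically stable softmax over a short list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(matrix, vec):
    """Multiply a row-major matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]


def top2_moe(token, router_weights, experts):
    """Route one token vector to its top-2 experts and sum the gated outputs.

    router_weights: one row per expert (assumed shape: NUM_EXPERTS x dim).
    experts: list of callables mapping a vector to a vector of the same size.
    """
    logits = matvec(router_weights, token)  # one router logit per expert
    # Indices of the TOP_K largest logits, best first.
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:TOP_K]
    # Assumption: gates are a softmax over the selected logits only.
    gates = softmax([logits[i] for i in top])
    out = [0.0] * len(token)
    for gate, idx in zip(gates, top):
        expert_out = experts[idx](token)  # only the chosen experts are evaluated
        out = [o + gate * e for o, e in zip(out, expert_out)]
    return out, top, gates


# Toy setup: expert i simply scales its input by i (hypothetical stand-in).
toy_experts = [(lambda v, s=i: [s * x for x in v]) for i in range(NUM_EXPERTS)]
toy_router = [[0.1 * i, 0.0] for i in range(NUM_EXPERTS)]
output, chosen, weights = top2_moe([1.0, 2.0], toy_router, toy_experts)
print(chosen, weights, output)
```

Because only the two selected experts are evaluated, the remaining six contribute no compute for that token, which is where the gap between total and active parameters comes from.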

The model supports a 32K-token context window and five languages: English, French, Italian, German and Spanish. Its Instruct variant, fine-tuned with SFT and DPO, scores 8.30 on MT-Bench. Mixtral 8x7B was distributed both as downloadable weights and via the Mistral API as open-mixtral-8x7b. It was marked deprecated on 30 November 2024 and retired from the Mistral API on 30 March 2025.

Classification

LLM

Family: Mistral

Access & deployment

Access: API, Download

Deployment: Local, Cloud

Weights: Open source

Key parameters

📏 Context: 32K

🧩 Parameters: 46.7B total / 12.9B active

✓ Fine-tuning

📥 Input: text

Technical specification

Context window: 32K tokens

Parameters: 46.7B total / 12.9B active

License: Apache 2.0

Features: ✓ Fine-tuning

Modalities

⬇ Input: text

⬆ Output: text, code
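The 46.7B total / 12.9B active figures listed above can be reproduced with simple arithmetic. The sketch below assumes the hyperparameters of the published Mixtral 8x7B configuration (model dimension 4096, 32 layers, expert hidden dimension 14336, 32 query heads and 8 key/value heads of size 128, vocabulary 32000, three weight matrices per expert, untied input and output embeddings). None of these values appear on this page, so treat them as assumptions; the function name `count_params` is hypothetical.

```python
# Assumed hyperparameters (from the published Mixtral 8x7B configuration,
# not stated on this page).
DIM = 4096          # model (residual stream) dimension
N_LAYERS = 32
HIDDEN_DIM = 14336  # feed-forward hidden size of each expert
N_HEADS = 32        # query heads
N_KV_HEADS = 8      # key/value heads (grouped-query attention)
HEAD_DIM = 128
VOCAB = 32000
N_EXPERTS = 8
TOP_K = 2


def count_params(experts_counted):
    """Parameter count when `experts_counted` experts per layer are included."""
    # Attention: query and output projections, plus smaller key/value projections.
    attn = 2 * DIM * (N_HEADS * HEAD_DIM) + 2 * DIM * (N_KV_HEADS * HEAD_DIM)
    expert = 3 * DIM * HIDDEN_DIM  # three weight matrices per expert
    router = DIM * N_EXPERTS       # one logit per expert
    norms = 2 * DIM                # two normalization layers per block
    per_layer = attn + experts_counted * expert + router + norms
    embeddings = 2 * VOCAB * DIM   # input embedding and output head
    final_norm = DIM
    return N_LAYERS * per_layer + embeddings + final_norm


total = count_params(N_EXPERTS)  # every expert stored in the weights
active = count_params(TOP_K)     # only the routed experts touched per token

print(f"total:  {total / 1e9:.1f}B")
print(f"active: {active / 1e9:.1f}B")
print(f"16-bit weights: {total * 2 / 1e9:.1f} GB")
```

Under these assumptions, nearly all of the difference between the two figures comes from the six experts per layer that a given token does not use. The attention weights, embeddings, and router are shared by every token, which is why the total is well below 8 × 7B.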

Capabilities and applications

Native model capabilities

Language modeling

Ability to predict subsequent tokens and generate coherent natural-language text based on the preceding context.

Category: language

Coding

Generating, analysing and modifying source code.

Category: coding

Multilingual

Understanding and generating text in many languages.

Category: language

Long context

Maintaining coherence and focus across very long input contexts.

Category: language

Reasoning

The model's ability to reason logically and solve complex problems.

Category: reasoning

Benchmark results

2 benchmarks

MT-Bench

8.30

📄 mistral.ai/news/mixtral-of-experts

Score for Mixtral 8x7B Instruct (SFT + DPO).

MMLU

accuracy

70.6%

📄 mistral.ai/news/mixtral-of-experts

Technical architecture

Core Architecture

MoE, Transformer

Model Form

LLM

Sources and related pages

3 sources

Web · Mixtral of experts, Mistral AI (announcement) · mistral.ai

Docs · Mistral docs, Mixtral 8x7B model card · docs.mistral.ai

Paper · Mixtral of Experts (arXiv:2401.04088) · arxiv.org
