DeepSeek-R1

R1 · Family: DeepSeek

Open reasoning model from DeepSeek (January 2025). 671B MoE with 37B active parameters, trained on top of DeepSeek-V3-Base with large-scale RL using verifiable rewards (GRPO) after a short SFT cold-start.

✓ Active · ✓ Public access · ⚖ Open weights · ★ Featured · Reasoning model · LLM · 📁 DeepSeek

Context window: 128K tokens

Parameters: 671B (37B active)

Max output: 32,768 tokens

Release date: 20 January 2025

🏢 Producer: DeepSeek AI

Access: API · Download · Hosted

Deployment: ☁ Cloud · 💻 Local

Overview

DeepSeek-R1 is an open reasoning model released by DeepSeek-AI on January 20, 2025. Architecturally it is a Mixture-of-Experts model with 671 billion total parameters, of which 37 billion are active per token, built on top of DeepSeek-V3-Base. R1 was produced via a Reasoning RL pipeline with verifiable rewards: GRPO (Group Relative Policy Optimization), where rewards for reasoning tasks come from rules (math correctness, code execution, format) rather than a learned reward model. Context: 128,000 tokens. Licence: MIT on the weights; the model is publicly available on Hugging Face.
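The rule-based reward and group-relative scoring described above can be sketched roughly as follows. This is a minimal illustration, not DeepSeek's training code: the helper names (`accuracy_reward`, `format_reward`, `group_relative_advantages`), the `\boxed{...}` answer convention, the equal weighting of the two rewards, and the use of the population standard deviation are all assumptions made for the example. The only part taken from the GRPO formulation is that each sampled completion is scored relative to the mean and standard deviation of its own group.

```python
import re
from statistics import mean, pstdev


def accuracy_reward(completion: str, reference: str) -> float:
    """Hypothetical rule: 1.0 if the last boxed answer equals the reference."""
    answers = re.findall(r"\\boxed\{([^{}]*)\}", completion)
    return 1.0 if answers and answers[-1].strip() == reference.strip() else 0.0


def format_reward(completion: str) -> float:
    """Hypothetical rule: 1.0 if the reasoning sits inside <think>...</think>."""
    return 1.0 if re.search(r"<think>.*?</think>", completion, re.DOTALL) else 0.0


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward by the mean and std of its own sample group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption of this sketch
    return [(r - mu) / (sigma + eps) for r in rewards]


# One prompt, four sampled completions (toy data).
reference = "4"
group = [
    "<think>2 + 2 = 4</think> \\boxed{4}",  # correct and well formatted
    "\\boxed{4}",                            # correct, no reasoning block
    "<think>guessing</think> \\boxed{5}",    # formatted, wrong answer
    "no idea",                               # neither
]

# Equal weighting of the two rule-based rewards is assumed here.
rewards = [accuracy_reward(c, reference) + format_reward(c) for c in group]
advantages = group_relative_advantages(rewards)

for r, a in zip(rewards, advantages):
    print(f"reward={r:.1f} advantage={a:+.3f}")
```

Because the advantages are centred on the group mean, no separate value network or learned reward model is needed in this sketch; completions simply compete against the other samples for the same prompt.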

Released alongside R1 was R1-Zero, a variant trained with pure RL and no SFT cold-start, showing that RL alone can elicit long chain-of-thought and self-correction. Production R1 added an SFT cold-start on thousands of long-CoT examples for more readable outputs. A series of distilled variants based on Qwen 2.5 (1.5B, 7B, 14B, 32B) and Llama (8B on Llama 3.1, 70B on Llama 3.3) was also published, porting much of R1's capability to single-GPU models.

Results: AIME 2024 79.8% pass@1, MATH-500 97.3%, Codeforces 96.3 percentile, MMLU 90.8%, GPQA Diamond 71.5%, LiveCodeBench 65.9%, SWE-bench Verified 49.2%. These scores are broadly on par with OpenAI o1 (o1-1217 in the paper's comparison) at a fraction of the inference cost. The model is available via the DeepSeek API, Hugging Face, Together AI, Fireworks, OpenRouter, Amazon Bedrock Marketplace and Vertex AI Model Garden. DeepSeek-R1, together with the GRPO algorithm publication, became a widely used reference point for open Reasoning RL and triggered a wave of reproductions (TinyZero, Open-R1, SimpleRL).

Classification

Reasoning model · LLM

Family: DeepSeek

Applications

Coding · Q&A / Question answering · Knowledge work · Research assistance · Brainstorming · Writing assistance · Document generation · Data analysis

Access & deployment

Access: API · Download · Hosted

Deployment: Cloud · Local

Weights: Open weights

Key parameters

📏 Context: 128K

🧩 Parameters: 671B (37B active)

✓ Tools · ✓ Fine-tuning

📥 Input: text

Platforms

Hugging Face Hub · Amazon Bedrock · Vertex AI

Technical specification

Context window: 128K tokens

Parameters: 671B (37B active)

Max output tokens: 32,768 tokens per response

Knowledge cutoff: 1 Jul 2024

License: MIT

Hardware requirements

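The full model requires a multi-GPU cluster. At one byte per parameter (FP8) the weights alone take roughly 671 GB, which already exceeds the 640 GB of a single 8×H100 80 GB node, so full-precision serving typically needs larger-memory GPUs, more than one node, or further quantization. Distilled variants (1.5B–70B) run on a single consumer/data-center GPU.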
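A back-of-the-envelope estimate helps put the hardware requirement in perspective. The sketch below counts weight memory only, using the parameter counts from this page; the bytes-per-parameter values (1 for FP8, 2 for BF16) and the decimal GB convention are assumptions of the example, and real deployments need additional memory for the KV cache, activations, and runtime overhead.

```python
def weight_memory_gb(params_billions: float, bytes_per_param: float) -> float:
    """Memory for the weights alone, in GB (1 GB = 1e9 bytes)."""
    # billions of parameters * bytes per parameter = gigabytes
    return params_billions * bytes_per_param


TOTAL_PARAMS_B = 671    # total parameters, from this page
ACTIVE_PARAMS_B = 37    # parameters active per token, from this page
NODE_MEMORY_GB = 8 * 80  # one node of eight 80 GB GPUs

fp8_gb = weight_memory_gb(TOTAL_PARAMS_B, 1)   # assumed 1 byte per parameter
bf16_gb = weight_memory_gb(TOTAL_PARAMS_B, 2)  # assumed 2 bytes per parameter
active_fraction = ACTIVE_PARAMS_B / TOTAL_PARAMS_B

print(f"FP8 weights:  {fp8_gb} GB")
print(f"BF16 weights: {bf16_gb} GB")
print(f"8 x 80 GB node: {NODE_MEMORY_GB} GB")
print(f"Active per token: {active_fraction:.1%} of all parameters")
```

Note that the MoE design reduces compute per token (only about 5.5% of parameters are active), not memory: every expert still has to be resident somewhere in the cluster.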

Features: ✓ Tool use · ✓ Fine-tuning

Modalities

⬇ Input: text

⬆ Output: text, code

Capabilities and applications

Native model capabilities

Reasoning

The model's ability to reason logically and solve complex problems.

Category: reasoning

Multi-step reasoning

Carrying out multi-step chains of reasoning across long, complex tasks.

Category: reasoning

Coding

Generating, analysing and modifying code in many programming languages. Covers writing functions, debugging, refactoring, code review, and creating tests. Measured by benchmarks such as HumanEval and SWE-bench.

Category: coding

Planning

Forming and executing action plans for complex tasks.

Category: planning

Long context

Support for large context windows, from tens to hundreds of thousands (or millions) of input tokens. Enables analysis of entire codebases and long documents without losing earlier information. DeepSeek-R1 supports 128K tokens.

Category: language

Language modeling

Ability to predict subsequent tokens and generate coherent natural-language text based on the preceding context.

Category: language

Agentic capability

The model's ability to autonomously plan and execute multi-step tasks by sequentially using tools, maintaining context, and adapting to intermediate results.

Category: planning

Structured output

Producing data in structured formats such as JSON.

Category: structured_generation

Diagram reasoning

Category: reasoning

Function Calling

Category: planning

Application domains

Coding · Q&A / Question answering · Knowledge work · Research assistance · Brainstorming · Writing assistance · Document generation · Data analysis

Benchmark results

8 benchmarks

AIME 2024 · pass@1 · 79.8% · 📄 DeepSeek-R1 paper (arXiv:2501.12948)

MATH-500 · pass@1 · 97.3% · 📄 DeepSeek-R1 paper

Codeforces · percentile (2,029 Elo rating) · 96.3 · 📄 DeepSeek-R1 paper

MMLU · accuracy (pass@1) · 90.8% · 📄 DeepSeek-R1 paper

GPQA Diamond · pass@1 · 71.5% · 📄 DeepSeek-R1 paper

LiveCodeBench · pass@1 with CoT · 65.9% · 📄 DeepSeek-R1 paper

SWE-bench Verified · resolved · 49.2% · 📄 DeepSeek-R1 paper

MMLU-Pro · exact match (EM) · 84.0% · 📄 DeepSeek-R1 paper
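For readers unfamiliar with the metric names above, the sketch below shows one common way to compute pass@1 as an average over several sampled responses, alongside a majority-vote consensus answer (often written cons@k). The function names and the toy samples are hypothetical, and the exact evaluation harness behind the reported numbers is not reproduced here.

```python
from collections import Counter


def pass_at_1(correct_flags):
    """Mean correctness over k sampled responses to one question."""
    return sum(correct_flags) / len(correct_flags)


def consensus_answer(answers):
    """Majority-vote answer over k samples; ties go to the first answer seen."""
    return Counter(answers).most_common(1)[0][0]


# Toy example: four sampled final answers to a single question.
reference = "42"
samples = ["42", "41", "42", "42"]

flags = [answer == reference for answer in samples]
question_pass_at_1 = pass_at_1(flags)
voted = consensus_answer(samples)
consensus_correct = voted == reference

print(f"pass@1 estimate: {question_pass_at_1:.2f}")
print(f"majority answer: {voted} (correct: {consensus_correct})")
```

A benchmark-level score is then the mean of these per-question values. Majority voting can score higher than pass@1 on the same samples, which is why the two should not be compared directly.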

Technical architecture

Core Architecture

MoE · Transformer · RoPE · MLA (Multi-head Latent Attention)

Model Form

LLM · Reasoning model

Training Techniques

Reasoning RL · GRPO · SFT · RLHF · Instruction Tuning · Pretraining · RFT · CoT

Deployment and security

☁ Available on platforms

☁ Hugging Face Hub · ☁ Amazon Bedrock · ☁ Vertex AI

Sources and related pages

5 sources

Paper: DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning (arxiv.org)

Repo: deepseek-ai/DeepSeek-R1 on GitHub (github.com)

Repo: DeepSeek-R1 on Hugging Face (huggingface.co)

Web: DeepSeek Chat / API (chat.deepseek.com)

Docs: DeepSeek API docs (api-docs.deepseek.com)
