
Open R1

A fully open reproduction of DeepSeek-R1. Let’s build it together!

Open R1 is a community-driven initiative to reproduce DeepSeek-R1’s reasoning capabilities through a fully transparent, open-source training pipeline.

How to install Open R1

To run the code in this project, first create a Python virtual environment, for example with Conda:

conda create -n openr1 python=3.11 && conda activate openr1

Next, install vLLM:

pip install vllm==0.6.6.post1

# For the Hugging Face cluster (which only has CUDA 12.1)
pip install vllm==0.6.6.post1 --extra-index-url https://download.pytorch.org/whl/cu121

This will also install PyTorch v2.5.1, and it is very important to use this version since the vLLM binaries are compiled for it. You can then install the remaining dependencies for your specific use case via pip install -e .[LIST OF MODES]. For most contributors, we recommend:

pip install -e ".[dev]"

Next, log into your Hugging Face and Weights & Biases accounts as follows:

huggingface-cli login
wandb login
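
If you prefer to authenticate from Python (for example, inside a notebook), the same logins can be done programmatically; the token values below are placeholders:

# Hypothetical programmatic alternative to the CLI logins above.
from huggingface_hub import login
import wandb

login(token="hf_xxx")   # placeholder Hugging Face token
wandb.login(key="xxx")  # placeholder Weights & Biases API key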

Finally, check that your system has Git LFS installed so that you can load and push models/datasets to the Hugging Face Hub:

git-lfs --version

If it isn’t installed, run:

sudo apt-get install git-lfs

Essential FAQs About Open R1: The Open-Source DeepSeek-R1 Implementation

Q1: What is Open R1 and how does it relate to DeepSeek-R1?

Open R1 is an open-source reproduction of DeepSeek-R1 that provides a full implementation of its reasoning-optimized training pipeline. The project includes the complete toolchain (GRPO training, SFT fine-tuning, synthetic data generation) under the MIT license, though the original training data remains proprietary.
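
For instance, the synthetic data generation stage relies on fast batched inference. A minimal vLLM sketch of that idea (the model name and prompt are placeholders, not the project’s actual generation script):

from vllm import LLM, SamplingParams

# Placeholder reasoning model; Open R1's own pipeline drives generation
# through its dedicated scripts rather than this standalone snippet.
llm = LLM(model="deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B")
params = SamplingParams(temperature=0.7, max_tokens=2048)

prompts = ["Solve step by step: what is 17 * 23?"]
for output in llm.generate(prompts, params):
    print(output.outputs[0].text)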

Q2: How can I contribute to the Open R1 project?

Contribution pathways include:
1. Code Development: improve the training scripts (grpo.py, sft.py)
2. Dataset Curation: submit high-quality math/code reasoning datasets (see the sketch after this list)
3. Documentation: create multilingual tutorials
4. Model Evaluation: submit benchmark results using evaluate.py
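
For dataset contributions, a typical workflow is to publish the data on the Hugging Face Hub so it can be referenced from the training scripts. A minimal sketch using the datasets library (the repository id and fields are placeholders):

from datasets import Dataset

# Hypothetical reasoning dataset with problem/solution pairs.
examples = {
    "problem": ["What is the derivative of x^2?"],
    "solution": ["d/dx x^2 = 2x"],
}
ds = Dataset.from_dict(examples)

# Placeholder repository id; requires a prior `huggingface-cli login`.
ds.push_to_hub("your-username/open-r1-math-reasoning")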

Q3: What are the technical advantages of GRPO training in Open R1?

GRPO (Group Relative Policy Optimization) features:
1. Group Reward Mechanism: optimizes the policy by comparing multiple sampled trajectories per prompt
2. Hybrid Training: combines 16-step gradient accumulation with BF16 precision
3. Dynamic Filtering: automatically retains the top 5% of reasoning paths

For example, a GRPO training run can be launched with:

accelerate launch --config_file configs/zero3.yaml src/open_r1/grpo.py \
    --per_device_train_batch_size 1 \
    --gradient_accumulation_steps 16
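
At the heart of the group reward mechanism is the idea that each completion’s reward is judged relative to the other completions sampled for the same prompt. A minimal, illustrative sketch of that normalization (not the project’s actual implementation):

import statistics

def group_relative_advantages(rewards):
    """Normalize rewards within one group of sampled completions.

    Each trajectory's advantage is its reward minus the group mean,
    scaled by the group standard deviation, so the policy update
    reflects how a completion compares with its siblings.
    """
    mean = statistics.mean(rewards)
    std = statistics.pstdev(rewards) or 1.0  # avoid division by zero
    return [(r - mean) / std for r in rewards]

# Example: rewards for 4 completions sampled from the same prompt.
print(group_relative_advantages([1.0, 0.0, 0.5, 1.0]))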