Open R1
A fully open reproduction of DeepSeek-R1. Let's build it together!
Open R1 represents a groundbreaking community-driven initiative to recreate DeepSeek-R1’s advanced AI capabilities through transparent, open-source methodologies.
How to install Open R1
To run the code in this project, first create a Python virtual environment using e.g. Conda:
conda create -n openr1 python=3.11 && conda activate openr1
Next, install vLLM:
pip install vllm==0.6.6.post1

# For the HF cluster (only has CUDA 12.1)
pip install vllm==0.6.6.post1 --extra-index-url https://download.pytorch.org/whl/cu121
This will also install PyTorch v2.5.1, and it is very important to use this version since the vLLM binaries are compiled for it. You can then install the remaining dependencies for your specific use case via pip install -e .[LIST OF MODES]. For most contributors, we recommend:
pip install -e ".[dev]"
Next, log in to your Hugging Face and Weights & Biases accounts as follows:
huggingface-cli login
wandb login
Finally, check your system has Git LFS installed so that you can load and push models/datasets to the Hugging Face Hub:
git-lfs --version
If it isn’t installed, run:
sudo apt-get install git-lfs
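Depending on your setup, you may also need to initialize Git LFS once for your user account:
git lfs install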
Essential FAQs About Open R1: The Open-Source DeepSeek-R1 Implementation
Q1: What is Open R1 and how does it relate to DeepSeek-R1?
Open R1 is an open-source reproduction of DeepSeek-R1, providing a full implementation of its reasoning-optimized training pipeline. The project includes the complete toolchain (GRPO training, SFT fine-tuning, synthetic data generation) under the MIT license, though the original training data remains proprietary.
Q2: How can I contribute to the Open R1 project?
Contribution pathways include:
1. Code Development: Improve the training scripts (grpo.py/sft.py); see the example command after this list
2. Dataset Curation: Submit high-quality math/code reasoning datasets
3. Documentation: Create multilingual tutorials
4. Model Evaluation: Submit benchmark results using evaluate.py
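For example, changes to the SFT script can be smoke-tested with a small run along these lines. This is an illustrative invocation modeled on the GRPO command in Q3: the config file is the same one used there, the model and dataset arguments are placeholders, and the exact flags may differ in the repository:
accelerate launch --config_file configs/zero3.yaml src/open_r1/sft.py \
    --model_name_or_path <base-model> \
    --dataset_name <reasoning-dataset> \
    --per_device_train_batch_size 1 \
    --gradient_accumulation_steps 16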
Q3: What are the technical advantages of GRPO training in Open R1?
GRPO (Group Relative Policy Optimization) features:
– Group Reward Mechanism: Optimizes the policy through multi-trajectory comparison
– Hybrid Training: Combines 16-step gradient accumulation with BF16 precision
– Dynamic Filtering: Automatically retains the top 5% of reasoning paths
Example launch command:
accelerate launch --config_file configs/zero3.yaml src/open_r1/grpo.py \
--per_device_train_batch_size 1 \
--gradient_accumulation_steps 16
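To make the group reward mechanism concrete, here is a minimal, self-contained Python sketch of the group-relative advantage computation that GRPO is built around. It is illustrative only and not taken from the Open R1 codebase; the function and variable names are hypothetical:

# Illustrative sketch of GRPO's group-relative advantages (not the Open R1 implementation).
import torch

def group_relative_advantages(rewards: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    # rewards: shape (num_prompts, group_size), one scalar reward per sampled completion.
    # Each completion is scored relative to the other completions for the same prompt,
    # so no separate value network is needed.
    mean = rewards.mean(dim=-1, keepdim=True)
    std = rewards.std(dim=-1, keepdim=True)
    return (rewards - mean) / (std + eps)

# Example: 2 prompts with 4 sampled completions each.
rewards = torch.tensor([[1.0, 0.0, 0.5, 1.0],
                        [0.0, 0.0, 1.0, 0.0]])
print(group_relative_advantages(rewards))

Completions with above-average reward within their group receive a positive advantage and are reinforced; below-average completions are penalized.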