# OpenR1-Qwen-7B
This is a fine-tune of Qwen2.5-Math-Instruct on OpenR1-220k-Math (default split).
## Quick start
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "open-r1/OpenR1-Qwen-7B"

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

prompt = "Find the value of $x$ that satisfies the equation $4x+5 = 6x+7$."
messages = [
    {"role": "system", "content": "Please reason step by step, and put your final answer within \\boxed{}."},
    {"role": "user", "content": prompt},
]

# Apply the chat template, generate the reasoning trace, and decode only the new tokens
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer([text], return_tensors="pt").to(model.device)
generated_ids = model.generate(**inputs, max_new_tokens=4096)
response = tokenizer.decode(generated_ids[0][inputs.input_ids.shape[1]:], skip_special_tokens=True)
print(response)
```
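Solving $4x+5=6x+7$ gives $x=-1$, so the generated solution should end with $\boxed{-1}$.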
## Training
We train the model on the default split of OpenR1-220k-Math for 3 epochs, using a learning rate of 5e-5 and extending the context length from 4k to 32k by increasing the RoPE frequency to 300k. The training follows a linear learning rate schedule with a 10% warmup phase. The table below compares the performance of OpenR1-Qwen-7B to DeepSeek-Distill-Qwen-7B and OpenThinker-7B, evaluated with lighteval.
You can find the training and evaluation code at: https://github.com/huggingface/open-r1/
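For concreteness, the hyperparameters above roughly translate to the sketch below. This is an assumption-laden illustration, not the verbatim recipe: the use of TRL's `SFTConfig`, the base checkpoint name `Qwen/Qwen2.5-Math-7B-Instruct`, and the `rope_theta` edit for context extension are our guesses; the authoritative scripts are in the linked repository.

```python
# Hedged sketch of the training setup described above. Assumptions: TRL's
# SFTConfig, the Qwen/Qwen2.5-Math-7B-Instruct base checkpoint, and context
# extension via rope_theta; see https://github.com/huggingface/open-r1/ for
# the actual scripts.
from transformers import AutoConfig, AutoModelForCausalLM
from trl import SFTConfig

base = "Qwen/Qwen2.5-Math-7B-Instruct"

# Extend the context window from 4k to 32k by raising the RoPE base frequency to 300k
config = AutoConfig.from_pretrained(base)
config.rope_theta = 300000.0
config.max_position_embeddings = 32768
model = AutoModelForCausalLM.from_pretrained(base, config=config, torch_dtype="auto")

# Hyperparameters quoted in the paragraph above; sequences are capped at 32768
# tokens (the parameter controlling this varies across TRL versions)
training_args = SFTConfig(
    output_dir="OpenR1-Qwen-7B",
    num_train_epochs=3,
    learning_rate=5e-5,
    lr_scheduler_type="linear",   # linear schedule ...
    warmup_ratio=0.1,             # ... with a 10% warmup phase
)
```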
| Model                    | MATH-500 | AIME24 | AIME25 |
|--------------------------|----------|--------|--------|
| DeepSeek-Distill-Qwen-7B | 91.6     | 43.3   | 40.0   |
| OpenR1-Qwen-7B           | 90.6     | 36.7   | 40.0   |
| OpenThinker-7B           | 89.6     | 30.0   | 33.3   |