SOKRATES: Qwen3-8B PrOntoQA OaK-DPO Iteration 2
Second DPO iteration achieving 98.1% accuracy on PrOntoQA.
Performance
| Stage | Accuracy |
|---|---|
| SFT | 93.3% |
| DPO Iter 1 | 96.8% |
| DPO Iter 2 | 98.1% |
Usage
from transformers import AutoModelForCausalLM
model = AutoModelForCausalLM.from_pretrained(
"Moonlight556/sokrates-qwen3-8b-prontoqa-oak-dpo-iter2",
torch_dtype="bfloat16"
)
- Downloads last month
- 3