---
license: apache-2.0
pipeline_tag: text-generation
datasets:
- codeparrot/github-code-clean
- bigcode/starcoderdata
- bigcode/the-stack-smol
tags:
- diffusion
- llm
- diffreaper
- dllm
- mercury
language:
- en
---
# DiffReaper 3

DiffReaper 3 is the third revision of DiffReaper, an experimental 1.5B-parameter Discrete Diffusion Language Model (dLLM) designed for high-throughput parallel token prediction.
Unlike traditional autoregressive models, which emit tokens strictly left to right, DiffReaper is optimized for non-sequential (any-order) sequence refinement across mixed Python code and natural-language corpora.
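
To make "parallel token prediction" and "sequence refinement" concrete, here is a toy sketch of the general idea: start from a fully masked sequence and reveal several positions per step, in confidence order rather than left to right. This is not DiffReaper's sampler. `MASK`, `toy_denoiser`, and `parallel_unmask` are hypothetical names, and the "model" simply reads the answer from a known target so the loop can run without any weights.

```python
import random

MASK = "<mask>"  # hypothetical mask token, not DiffReaper's actual vocabulary


def toy_denoiser(tokens, target):
    """Stand-in for the model: return a (token, confidence) pair per position.

    A real dLLM would run one forward pass over the whole sequence here.
    This toy version reads the token from `target` and attaches a random confidence.
    """
    return [(target[i], random.random()) for i in range(len(tokens))]


def parallel_unmask(target, steps=4, seed=0):
    """Start fully masked and reveal the most confident positions at each step."""
    random.seed(seed)
    tokens = [MASK] * len(target)
    history = [list(tokens)]
    for step in range(steps):
        masked = [i for i, tok in enumerate(tokens) if tok == MASK]
        if not masked:
            break
        proposals = toy_denoiser(tokens, target)
        # Reveal an even share of what is still masked; the last step reveals the rest.
        budget = -(-len(masked) // (steps - step))  # ceiling division
        chosen = sorted(masked, key=lambda i: proposals[i][1], reverse=True)[:budget]
        for i in chosen:
            tokens[i] = proposals[i][0]
        history.append(list(tokens))
    return tokens, history


target = "def add ( a , b ) :".split()
final, history = parallel_unmask(target, steps=4)
for state in history:
    print(" ".join(state))
```

Each printed line shows the sequence after one refinement step. Several positions are filled per step, and not necessarily in left-to-right order, which is the contrast with autoregressive decoding drawn above.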

## Model Details
- **Architecture:** 24-Layer Transformer Encoder
- **Hidden Dimension:** 2048
- **Attention Heads:** 16
- **Objective:** Discrete Masked Diffusion (Mercury-style)
- **Training Precision:** BF16
- **Context Window:** 1024 tokens
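
As a rough sanity check on these numbers, the sketch below derives the per-head dimension and a back-of-the-envelope count of the transformer-block weights. `DiffReaperConfig` is a hypothetical container, not a class shipped with the model, and the 4x feed-forward width is an assumption, since the card does not state it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DiffReaperConfig:  # hypothetical name; values copied from the list above
    num_layers: int = 24
    hidden_dim: int = 2048
    num_heads: int = 16
    context_window: int = 1024
    ffn_multiplier: int = 4  # ASSUMPTION: standard 4x feed-forward width, not stated in the card

    @property
    def head_dim(self) -> int:
        return self.hidden_dim // self.num_heads

    @property
    def block_params(self) -> int:
        """Weight-matrix parameters in the blocks (no biases, norms, or embeddings)."""
        d = self.hidden_dim
        attention = 4 * d * d                           # Q, K, V, and output projections
        feed_forward = 2 * d * self.ffn_multiplier * d  # up and down projections
        return self.num_layers * (attention + feed_forward)


cfg = DiffReaperConfig()
print(cfg.head_dim)      # 128
print(cfg.block_params)  # 1207959552
```

Under that assumption the blocks account for roughly 1.2B weights. The rest of the stated 1.5B would presumably sit in embeddings and other components, which depend on details such as vocabulary size that the card does not give.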