---
license: apache-2.0
language:
- en
base_model:
- darwinkernelpanic/DiffReaper-3
---
# DiffReaper-Talk

A 1.5B-parameter Discrete Diffusion Language Model (dLLM) optimized for parallel token prediction, trained on general text corpora during the foundational pre-training phase.

## Summary

DiffReaper-Talk uses a Transformer-based discrete diffusion architecture to predict multiple tokens in parallel. This approach avoids the sequential bottleneck of standard autoregressive generation.

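To make the contrast with autoregressive decoding concrete, here is a minimal toy sketch of iterative parallel unmasking, the decoding style typical of discrete diffusion models. All names (`toy_denoise`, `dummy_predict`, the `MASK` sentinel) are hypothetical illustrations, not DiffReaper-Talk's actual implementation.

```python
import random

MASK = -1  # hypothetical mask-token sentinel

def toy_denoise(seq_len, steps, predict, seed=0):
    """Illustrative discrete-diffusion decoding loop (a sketch, not the model's code).

    Starts from an all-MASK sequence. At each step, `predict` proposes a
    (token, confidence) pair for every masked position in parallel, and the
    most confident proposals are committed. An autoregressive decoder would
    instead commit exactly one token per forward pass.
    """
    rng = random.Random(seed)
    seq = [MASK] * seq_len
    per_step = max(1, seq_len // steps)  # positions to unmask per step
    for _ in range(steps):
        masked = [i for i, t in enumerate(seq) if t == MASK]
        if not masked:
            break
        # One "forward pass": proposals for all masked positions at once.
        preds = {i: predict(seq, i, rng) for i in masked}
        # Commit the highest-confidence proposals this step.
        for i in sorted(masked, key=lambda i: -preds[i][1])[:per_step]:
            seq[i] = preds[i][0]
    return seq

def dummy_predict(seq, pos, rng):
    # Stand-in "model": random token id with random confidence.
    return rng.randrange(100), rng.random()

out = toy_denoise(seq_len=16, steps=4, predict=dummy_predict)
```

With 16 positions and 4 steps, the whole sequence is produced in 4 parallel passes rather than 16 sequential ones.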
## Technical Details

- **Architecture:** 24-layer Transformer encoder
- **Embedding Dim:** 2048
- **Heads:** 16
- **Parameters:** ~1.5 billion
- **Hardware:** 1x NVIDIA A100 (80 GB VRAM)
- **Objective:** Markovian discrete denoising (continuous embedding space)
- **Precision:** Mixed precision (BF16)
- **Context Window:** 1024 tokens

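As a sanity check, the listed hyperparameters can be collected into a config object with a rough parameter-count estimate. The class name and `vocab_size` are assumptions (the card does not state a vocabulary size); the ~12·d² per-layer rule of thumb assumes a standard encoder block with a 4x-expansion MLP.

```python
from dataclasses import dataclass

@dataclass
class DiffReaperTalkConfig:
    """Hyperparameters as listed on the card; vocab_size is an assumption."""
    n_layers: int = 24
    d_model: int = 2048
    n_heads: int = 16
    context_window: int = 1024
    vocab_size: int = 50_000  # not stated on the card; illustrative only

    def approx_params(self) -> int:
        # Rough transformer-encoder count: ~12 * d^2 per layer
        # (4*d^2 for attention projections + 8*d^2 for a 4x MLP),
        # plus token and position embeddings. Ignores biases/layer norms.
        per_layer = 12 * self.d_model ** 2
        emb = (self.vocab_size + self.context_window) * self.d_model
        return self.n_layers * per_layer + emb

cfg = DiffReaperTalkConfig()
```

Under these assumptions the estimate lands around 1.3B, consistent with the card's "~1.5 billion" once biases, layer norms, and any output head are included.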
## Current Status

Phase 2 (Logic) is complete. Domain-specific training (code) will be applied post-convergence.