Update README.md
README.md
CHANGED
|
@@ -272,7 +272,9 @@ a:hover .link-arrow {
|
|
| 272 |
</div>
|
| 273 |
</div>
|
| 274 |
<div class="model-description">
|
| 275 |
-
<p>A sequel! The new Nanuq series is meant to be as a testing grounds for my GRPO experiments,
|
|
|
|
|
|
|
| 276 |
</div>
|
| 277 |
</div>
|
| 278 |
</div>
|
|
|
|
| 272 |
</div>
|
| 273 |
</div>
|
| 274 |
<div class="model-description">
|
| 275 |
+
<p>A sequel! The new Nanuq series is meant to be a testing ground for my GRPO experiments. This model is meant to have great instruction following and system prompt adherence in creative scenarios.</p>
|
| 276 |
+
<p>Built on top of Austral Xgen 9B. I made an RL env using PrimeIntellect-ai/verifiers and implemented InternLM/POLAR in that env, then finetuned the model on Pocketdoc's Systemmax dataset for 150 steps, and this was the result.</p>
|
| 277 |
+
<p>There are a lot of things I could do differently, as the reward almost falls flat as soon as you get out of warm-up, but this model was pretty decent so I decided to release it. Hope people enjoy it!</p>
|
| 278 |
</div>
|
| 279 |
</div>
|
| 280 |
</div>
|
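For readers unfamiliar with GRPO, the core idea can be sketched as follows: several completions are sampled for the same prompt, each is scored by a reward model, and every score is normalized against its own group. This is a minimal illustration of that normalization step only, not the training code behind this model; the function name `group_relative_advantages` and the `eps` value are assumptions made for the example, and the rewards are placeholder numbers rather than POLAR outputs.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize one prompt's rewards against their own group (hypothetical helper).

    rewards: list of scalar scores, one per sampled completion of the same prompt.
    eps: small constant (assumed value) that avoids division by zero.
    """
    mu = mean(rewards)        # group mean reward
    sigma = pstdev(rewards)   # population standard deviation of the group
    return [(r - mu) / (sigma + eps) for r in rewards]


# Placeholder scores for four completions of a single prompt.
advantages = group_relative_advantages([0.2, 0.5, 0.5, 0.8])

# If every completion in a group gets the same score, all advantages are zero
# and that group contributes no learning signal.
flat = group_relative_advantages([0.7, 0.7, 0.7, 0.7])
```

Completions scoring above the group mean get a positive advantage and those below get a negative one, so the update depends on relative rather than absolute reward.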