Delta-Vector committed on
Commit 59a2a8c · verified · 1 Parent(s): b286f7d

Update README.md

Files changed (1)
  1. README.md +3 -1
README.md CHANGED
@@ -272,7 +272,9 @@ a:hover .link-arrow {
272
  </div>
273
  </div>
274
  <div class="model-description">
275
- <p>A sequel! The new Nanuq series is meant to serve as a testing ground for my GRPO experiments. The model is built on top of Austral Xgen 9B: I made an RL environment using PrimeIntellect-ai/verifiers, implemented InternLM/POLAR in that environment, and then fine-tuned the model for 150 steps on Pocketdoc's Systemmax dataset. This was the result.</p>
 
 
276
  </div>
277
  </div>
278
  </div>
 
272
  </div>
273
  </div>
274
  <div class="model-description">
275
+ <p>A sequel! The new Nanuq series is meant to serve as a testing ground for my GRPO experiments. This model is intended to excel at instruction following and system-prompt adherence in creative scenarios.</p>
276
+ <p>It is built on top of Austral Xgen 9B. I made an RL environment using PrimeIntellect-ai/verifiers, implemented InternLM/POLAR in that environment, and then fine-tuned the model for 150 steps on Pocketdoc's Systemmax dataset. This was the result.</p>
277
+ <p>There are a lot of things I could do differently, since the reward almost flatlines as soon as training leaves warm-up, but this model turned out pretty decent, so I decided to release it. Hope people enjoy it!</p>
278
  </div>
279
  </div>
280
  </div>
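GRPO (Group Relative Policy Optimization) samples several completions per prompt, scores each one (here that scoring role would be filled by a reward model such as POLAR), and normalizes every reward against the rest of its group. The snippet below is a minimal, hypothetical sketch of that normalization step only, not the training code used for this model; the function name, the `eps` value, and the choice of population standard deviation are assumptions. It illustrates why a reward that stops discriminating between completions matters: if every completion in a group receives the same score, all advantages are zero and that group contributes no policy-gradient signal.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards: list[float], eps: float = 1e-4) -> list[float]:
    """Turn one prompt's group of rewards into GRPO-style advantages.

    Hypothetical helper: each advantage is the completion's reward minus the
    group mean, divided by the group's population standard deviation plus eps
    (eps avoids division by zero when all rewards are equal).
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


if __name__ == "__main__":
    # Hypothetical reward-model scores for four completions of the same prompt.
    spread = group_relative_advantages([0.2, 0.4, 0.6, 0.8])
    print(spread)  # higher-scored completions get positive advantages

    # Every completion scored identically: all advantages are exactly zero,
    # so this group pushes the policy in no direction.
    flat = group_relative_advantages([0.5, 0.5, 0.5, 0.5])
    print(flat)
```

Because of the division by the standard deviation, any non-zero spread in a group is rescaled to order-one advantages; only a group with identical scores is fully silent.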