---
base_model:
- snwy/frankenqwen3-8B-235B-dense-conversion-interleaved-untuned
library_name: transformers
tags:
- fine-tuning
- prose
- GRPO
- axolotl
- finetune
- roleplaying
- creative-writing
---
A sequel! The new Nanuq series is meant to be a testing ground for my GRPO experiments. This model is a full post-train heal of Snwy's frankenmerge between Qwen3 235B and Qwen3 8B.
It was pretrained for 2 epochs on 1B tokens of creative writing data, then SFT'd on a lot of my own data plus Pocketdoc's instruct dataset, and finally GRPO'd on the Claude-2.7K dataset with POLARS and Verifiers, in an attempt to align it to write more like Claude.
There are a lot of things I could do differently, as the reward almost flattens out as soon as training gets past warm-up, but the model turned out pretty decent so I decided to release it (especially considering its starting place). Hope people enjoy it!
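For anyone curious what the GRPO stage roughly looked like: below is a minimal, hypothetical sketch using TRL's GRPOTrainer as a compact stand-in (the actual run used Verifiers with POLARS rewards, and the toy reward function and prompt dataset here are placeholders, not what was used for this model).

```python
# Hypothetical GRPO sketch. The real training used Verifiers + POLARS;
# TRL's GRPOTrainer is shown only because its API makes a short illustration.
from datasets import load_dataset
from trl import GRPOConfig, GRPOTrainer

# Placeholder prompt dataset; the real run used the Claude-2.7K dataset.
dataset = load_dataset("trl-lib/tldr", split="train")

def style_reward(completions, **kwargs):
    """Toy reward: favors longer, multi-paragraph completions.
    The actual Claude-likeness reward (via POLARS) is far more involved."""
    return [min(len(c) / 2000.0, 1.0) + 0.5 * (c.count("\n\n") > 0)
            for c in completions]

training_args = GRPOConfig(
    output_dir="nanuq-grpo",
    per_device_train_batch_size=8,
    num_generations=8,         # completions sampled per prompt for the group baseline
    max_completion_length=512,
    learning_rate=1e-6,
    warmup_steps=50,           # most of the reward gains happened during warm-up
    max_steps=500,             # matches the 500 GRPO steps described here
)

trainer = GRPOTrainer(
    model="snwy/frankenqwen3-8B-235B-dense-conversion-interleaved-untuned",
    reward_funcs=style_reward,
    args=training_args,
    train_dataset=dataset,
)
trainer.train()
```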
The model has been tuned with ChatML formatting. A typical input looks like this:
"""<|im_start|>user
Hi there!<|im_end|>
<|im_start|>assistant
Nice to meet you!<|im_end|>
<|im_start|>user
Can I ask a question?<|im_end|>
<|im_start|>assistant
"""
In total, training consisted of 2 epochs of pretraining, 2 epochs of SFT, and finally 500 steps of GRPO with Verifiers, all run on 8× H200 GPUs.
Thank you to Intervitens, Cgato, Kubernetes Bad, Snwy, Auri, Will Brown, and most of all: Kalomaze