Nanbeige/Nanbeige4.1-3B
Text Generation • 4B • Updated • 444k • 954
datatrove for all things web-scale data preparation: https://github.com/huggingface/datatrove
nanotron for lightweight 4D parallelism LLM training: https://github.com/huggingface/nanotron
lighteval for in-training fast parallel LLM evaluations: https://github.com/huggingface/lighteval