AI & ML interests

Helping AI to become AGI

Recent Activity

KingNish updated a dataset 19 days ago
HelpingAI/Dhanishtha-2.0-SUPERTHINKER
Abhaykoul updated a dataset 22 days ago
HelpingAI/KS-WIKI
Abhaykoul published a dataset 22 days ago
HelpingAI/KS-WIKI

KingNish
posted an update 15 days ago
Muon vs MuonClip vs Muon+AdamW

Muon has gone from an experiment to a mainstream optimizer, but does it hold up for fine‑tuning? We ran head‑to‑head tests on Qwen3‑4B (10k+ high‑quality instruction rows) to find out.

Short story: Pure Muon converged fastest at the start, but its gradient-norm spikes made training unstable. MuonClip (Kimi K2’s clipping) stabilizes long pretraining runs, yet in our small-scale fine-tune it underperformed, with lower token accuracy and slower convergence. The winner was the hybrid: Muon for 2D layers + AdamW for 1D layers. It delivered the best balance of stability and final performance and even beat vanilla AdamW.
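For the curious, here is a minimal sketch of the hybrid setup, not the exact training code from the blog: route 2D weight matrices to Muon and everything 1D (biases, norm gains) to AdamW. It assumes some Muon implementation with the usual PyTorch Optimizer interface is importable; the import path, learning rates, and momentum values are placeholders.

```python
import torch
from torch.optim import AdamW
# Assumption: a Muon implementation with the standard torch Optimizer API is
# available locally (e.g. a muon.py); this is not a built-in torch import.
from muon import Muon


def build_hybrid_optimizers(model: torch.nn.Module):
    """Route 2D weight matrices to Muon and 1D params (biases, norm gains) to AdamW."""
    muon_params, adamw_params = [], []
    for p in model.parameters():
        if not p.requires_grad:
            continue
        (muon_params if p.ndim >= 2 else adamw_params).append(p)
    # Hyperparameters below are illustrative placeholders, not the blog's values.
    opt_muon = Muon(muon_params, lr=2e-2, momentum=0.95)
    opt_adamw = AdamW(adamw_params, lr=2e-4, weight_decay=0.01)
    return opt_muon, opt_adamw


# In the training loop, step and zero both optimizers every iteration:
# loss.backward()
# opt_muon.step(); opt_adamw.step()
# opt_muon.zero_grad(); opt_adamw.zero_grad()
```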

Takeaway: for small-scale fine-tuning, hybrid = practical and reliable.

Next Step: scale to larger models/datasets to see if Muon’s spikes become catastrophic or if clipping wins out.

Full Blog Link: https://huggingface.co/blog/KingNish/optimizer-part1
KingNish posted an update 17 days ago
KingNish updated a Space about 2 months ago
KingNish published a Space about 2 months ago
Abhaykoul
posted an update 4 months ago
🚀 Ever dreamed of training your own Large Language Model from scratch? What if I told you it doesn't require a supercomputer or a PhD in ML? 🤯

Introducing LLM Trainer - the educational framework that makes LLM training accessible to EVERYONE! Whether you're on a CPU-only laptop or scaling to distributed GPUs, we've got you covered. 💻➡️🖥️

Why LLM Trainer? Because existing tools are either too simplistic (hiding the magic) or too complex (requiring expert knowledge). We bridge the gap with:

🎓 Educational transparency - every component built from scratch with clear code
💻 CPU-first approach - start training immediately, no GPU needed
🔧 Full customization - modify anything you want
📈 Seamless scaling - from laptop to cluster without code changes
🤝 HuggingFace integration - works with existing models & tokenizers

Key highlights:
✅ Built-in tokenizers (BPE, WordPiece, HF wrappers)
✅ Complete Transformer implementation from scratch
✅ Optimized for CPU training
✅ Advanced features: mixed precision, gradient checkpointing, multiple generation strategies
✅ Comprehensive monitoring & metrics

Perfect for:
- Students learning transformers
- Researchers prototyping new ideas
- Developers building domain-specific models

Ready to train your first LLM? It's easier than you think!
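To give a concrete feel for what "training from scratch on CPU" looks like, here is a generic, self-contained PyTorch sketch of a tiny character-level transformer trained end to end. This is not the llm-trainer API (see the repo docs for that); every name, size, and hyperparameter below is illustrative only.

```python
import torch
import torch.nn as nn

# Toy corpus and character-level "tokenizer" (stand-ins for a real dataset and BPE).
text = "hello world. this is a tiny corpus for a tiny language model. " * 100
chars = sorted(set(text))
stoi = {c: i for i, c in enumerate(chars)}
data = torch.tensor([stoi[c] for c in text])

CTX, D_MODEL, BATCH = 64, 64, 16

class TinyLM(nn.Module):
    def __init__(self, vocab_size):
        super().__init__()
        self.tok = nn.Embedding(vocab_size, D_MODEL)
        self.pos = nn.Embedding(CTX, D_MODEL)
        layer = nn.TransformerEncoderLayer(D_MODEL, nhead=2,
                                           dim_feedforward=4 * D_MODEL,
                                           batch_first=True)
        self.blocks = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(D_MODEL, vocab_size)

    def forward(self, idx):
        T = idx.size(1)
        x = self.tok(idx) + self.pos(torch.arange(T))
        causal = nn.Transformer.generate_square_subsequent_mask(T)  # no peeking ahead
        return self.head(self.blocks(x, mask=causal))

model = TinyLM(len(chars))
opt = torch.optim.AdamW(model.parameters(), lr=3e-4)

for step in range(200):  # a few hundred steps run fine on a laptop CPU
    starts = torch.randint(0, len(data) - CTX - 1, (BATCH,))
    xb = torch.stack([data[s:s + CTX] for s in starts])        # input tokens
    yb = torch.stack([data[s + 1:s + CTX + 1] for s in starts])  # next-token targets
    logits = model(xb)
    loss = nn.functional.cross_entropy(logits.flatten(0, 1), yb.flatten())
    opt.zero_grad()
    loss.backward()
    opt.step()
    if step % 50 == 0:
        print(f"step {step}: loss {loss.item():.3f}")
```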

🔗 Check it out: https://github.com/HelpingAI/llm-trainer
📚 Docs: Getting Started Guide
💬 Join the community: GitHub Discussions

#AI #MachineLearning #LLM #DeepLearning #OpenSource #Python #HuggingFace #NLP

Special thanks to HuggingFace and PyTorch teams for the amazing ecosystem! 🙏