Blog-explorers

Activity Feed Request to join this org

AI & ML interests

None defined yet.

Recent Activity

ccocks-deca new activity 20 days ago

blog-explorers/README:Card forensics using VLM localized prompt approach

suayptalha authored a paper about 1 month ago

Superpositional Gradient Descent: Harnessing Quantum Principles for Model Training

ccocks-deca new activity about 2 months ago

blog-explorers/README:Pending Blog-Explorers Access Request

View all activity

ccocks-deca

in blog-explorers/README 20 days ago

Card forensics using VLM localized prompt approach

#11 opened 21 days ago by

vijayksagi

antonioloison

authored a paper about 1 month ago

Surfer 2: The Next Generation of Cross-Platform Computer Use Agents

Paper • 2510.19949 • Published Oct 22 • 38

ccocks-deca

in blog-explorers/README about 2 months ago

Pending Blog-Explorers Access Request

#10 opened about 2 months ago by

KarthikAvinash

nouamanetazi

posted an update about 2 months ago

Post

4029

After training 𝐒𝐦𝐨𝐥𝐋𝐌𝟑 on 𝟑𝟖𝟒 𝐇𝟏𝟎𝟎𝐬 for nearly a month, I've come to realize something most people overlook: 𝐢𝐧𝐟𝐫𝐚𝐬𝐭𝐫𝐮𝐜𝐭𝐮𝐫𝐞 𝐢𝐬 𝐭𝐡𝐞 𝐦𝐚𝐤𝐞-𝐨𝐫-𝐛𝐫𝐞𝐚𝐤 𝐟𝐚𝐜𝐭𝐨𝐫 𝐢𝐧 𝐋𝐋𝐌 𝐭𝐫𝐚𝐢𝐧𝐢𝐧𝐠. 🔥

Everyone talks about model architecture and data quality. And yes, those matter immensely. But here's what nobody tells you: when your training run fails at 2 AM because of mysterious 𝐍𝐂𝐂𝐋 𝐞𝐫𝐫𝐨𝐫𝐬, or when your expensive GPU cluster is running at 𝟔𝟎% 𝐞𝐟𝐟𝐢𝐜𝐢𝐞𝐧𝐜𝐲, the problem isn't your model. It's most probably a 𝐦𝐢𝐬𝐮𝐬𝐞 𝐨𝐟 𝐭𝐡𝐞 𝐡𝐚𝐫𝐝𝐰𝐚𝐫𝐞. 🛠️

Questions that seemed simple but had no clear answers: Why is 𝐌𝐨𝐄 𝐭𝐫𝐚𝐢𝐧𝐢𝐧𝐠 𝐬𝐥𝐨𝐰𝐞𝐫 𝐭𝐡𝐚𝐧 𝐝𝐞𝐧𝐬𝐞 𝐦𝐨𝐝𝐞𝐥𝐬? Which 𝐍𝐂𝐂𝐋 𝐟𝐥𝐚𝐠𝐬 should we actually set? How often should we checkpoint without killing throughput?

That's why we built 𝐓𝐡𝐞 𝐒𝐦𝐨𝐥 𝐓𝐫𝐚𝐢𝐧𝐢𝐧𝐠 𝐏𝐥𝐚𝐲𝐛𝐨𝐨𝐤 📖: a complete guide covering everything from model architecture and data curation to the SmolLM3 training marathon, post-training techniques, and crucially, the 𝐢𝐧𝐟𝐫𝐚𝐬𝐭𝐫𝐮𝐜𝐭𝐮𝐫𝐞 𝐥𝐚𝐲𝐞𝐫 that most teams get wrong.

We validated real vs theoretical bandwidth across the entire stack: 𝐇𝐁𝐌𝟑 𝐡𝐢𝐭𝐭𝐢𝐧𝐠 𝟑 𝐓𝐁/𝐬, 𝐍𝐕𝐋𝐢𝐧𝐤 𝟒.𝟎 𝐫𝐞𝐚𝐜𝐡𝐢𝐧𝐠 𝟕𝟖𝟔 𝐆𝐁/𝐬, 𝐏𝐂𝐈𝐞 𝐆𝐞𝐧𝟒 𝐚𝐭 𝟏𝟒.𝟐 𝐆𝐁/𝐬. Then we ran collective operations across 𝟏𝟐𝟖 𝐆𝐏𝐔𝐬 (16 nodes, 8xH100s each) and measured how performance degrades at scale: all-reduce drops from 𝟒𝟖𝟎 𝐆𝐁/𝐬 on a single node to 𝟑𝟐𝟎-𝟑𝟓𝟎 𝐆𝐁/𝐬 across 16 nodes.

If you've ever wondered why your training runs are slower than they should be, or you're planning to scale up and want to avoid expensive mistakes, this guide might save you weeks of debugging.

𝐓𝐡𝐞 𝐒𝐦𝐨𝐥 𝐓𝐫𝐚𝐢𝐧𝐢𝐧𝐠 𝐏𝐥𝐚𝐲𝐛𝐨𝐨𝐤: https://lnkd.in/e5MKXUHS

Shared with ❤️ by the HuggingFace team

huybery

authored a paper about 2 months ago

VideoAgentTrek: Computer Use Pretraining from Unlabeled Videos

Paper • 2510.19488 • Published Oct 22 • 19

KennyUTC

authored 3 papers about 2 months ago

A Survey of Scientific Large Language Models: From Data Foundations to Agent Frontiers

Paper • 2508.21148 • Published Aug 28 • 140

SPARK: Synergistic Policy And Reward Co-Evolving Framework

Paper • 2509.22624 • Published Sep 26 • 17

MM-HELIX: Boosting Multimodal Long-Chain Reflective Reasoning with Holistic Platform and Adaptive Hybrid Policy Optimization

Paper • 2510.08540 • Published Oct 9 • 109

lbourdois

posted an update 2 months ago

Post

1234

New blog post analyzing the top 50 entities with the most downloaded models on @huggingface 🤗!

https://huggingface.co/blog/lbourdois/huggingface-models-stats

The purpose here is to get an idea of the profile of the models with the greatest impact in open source (we are not interested in closed models here!).

32 figures + data

Enjoy 🤗

antonioloison

authored 2 papers 2 months ago

ViDoRe Benchmark V2: Raising the Bar for Visual Retrieval

Paper • 2505.17166 • Published May 22

ModernVBERT: Towards Smaller Visual Document Retrievers

Paper • 2510.01149 • Published Oct 1 • 30

mlabonne

posted an update 2 months ago

Post

7204

LiquidAI/LFM2-8B-A1B just dropped!

8.3B params with only 1.5B active/token 🚀

> Quality ≈ 3–4B dense, yet faster than Qwen3-1.7B
> MoE designed to run on phones/laptops (llama.cpp / vLLM)
> Pre-trained on 12T tokens → strong math/code/IF

1 reply

mlabonne

posted an update 3 months ago

Post

3700

⚛️ New drop of tiny task-specific models!

Want to do data extraction, translation, RAG, tool use, or math on a Raspberry Pi? We got you covered! ✅

These tiny models were fine-tuned to perform narrow tasks extremely well, making them competitive with much larger models.

You can deploy them today on-device or even on GPUs for big data operations!

LiquidAI/liquid-nanos-68b98d898414dd94d4d5f99a

1 reply

ehristoforu

posted an update 3 months ago

Post

2266

🚀Hello from the Project Fluently team!

✨ We are happy to share with you our new universal LLM models based on Qwen3 1.7B and 4B — powerful, multilingual and ready to solve a wide range of problems!

🛠️ We have conducted additional training and carefully merged them to achieve even better results and maximize the potential of the models.

🆓 And most importantly — the models are completely open and free under the Apache-2.0 license!

🔗 Links to repositories:
- FluentlyQwen3-4B: fluently/FluentlyQwen3-4B
- FluentlyQwen3-1.7B: fluently/FluentlyQwen3-1.7B

😍 We will be very glad to hear your feedback and impressions! Your opinion is very important to us!

vansin

posted an update 3 months ago

Post

2313

A cute Intern With Hugging Face

yjernite

posted an update 3 months ago

Post

2581

Tremendous quality of life upgrade on the Hugging Face Hub - we now have auto-complete emojis 🤗 🥳 👏 🙌 🎉

Get ready for lots more very serious analysis on a whole range of topics from yours truly now that we have unlocked this full range of expression 😄 🤔 🗣 🙊