Running • 3.58k • The Ultra-Scale Playbook • The ultimate guide to training LLM on large GPU Clusters
Running on CPU Upgrade • Featured • 2.61k • The Smol Training Playbook • The secrets to building world-class LLMs
aisingapore/Qwen-SEA-LION-v4-32B-IT-4BIT Text Generation • 6B • Updated 16 days ago • 708 • 3
SEA-VL: Multicultural VL Dataset for Southeast Asia Collection Crowdsource, Crawl, or Generate? Creating SEA-VL, a Multicultural Vision-Language Dataset for Southeast Asia • 3 items • Updated Apr 12 • 20
Crowdsource, Crawl, or Generate? Creating SEA-VL, a Multicultural Vision-Language Dataset for Southeast Asia Paper • 2503.07920 • Published Mar 10 • 101
You Only Cache Once: Decoder-Decoder Architectures for Language Models Paper • 2405.05254 • Published May 8, 2024 • 10
LLaMA-Berry: Pairwise Optimization for O1-like Olympiad-Level Mathematical Reasoning Paper • 2410.02884 • Published Oct 3, 2024 • 54