AI & ML interests
Efficient Training Recipes for Large Models (mostly LLMs)
Are you familiar with reverse residual connections or looping in language models?

Excited to share my Looped-GPT blog post and codebase: https://github.com/sanyalsunny111/Looped-GPT

TL;DR: looping during pre-training improves generalization.

[Plot: GPT-2 LMs pre-trained on 15.73B OpenWebText (OWT) tokens]

P.S. This is my first post here; I have ~4 followers and zero expectations for reach.
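For context on what "looping" means here: a looped LM reuses one set of transformer block weights across several passes through depth, rather than stacking independently parameterized layers. Below is a minimal PyTorch sketch of that idea, assuming a standard pre-norm GPT-2 style block; the names `Block`, `LoopedStack`, and `n_loops` are illustrative placeholders, not identifiers from the Looped-GPT codebase.

```python
import torch
import torch.nn as nn

class Block(nn.Module):
    """A standard pre-norm GPT-2 style transformer block (illustrative)."""
    def __init__(self, d_model: int, n_heads: int):
        super().__init__()
        self.ln1 = nn.LayerNorm(d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ln2 = nn.LayerNorm(d_model)
        self.mlp = nn.Sequential(
            nn.Linear(d_model, 4 * d_model),
            nn.GELU(),
            nn.Linear(4 * d_model, d_model),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Causal mask: True marks future positions that attention may not see.
        seq_len = x.size(1)
        mask = torch.triu(
            torch.ones(seq_len, seq_len, dtype=torch.bool, device=x.device),
            diagonal=1,
        )
        # Self-attention sub-layer with a residual connection.
        h = self.ln1(x)
        attn_out, _ = self.attn(h, h, h, attn_mask=mask)
        x = x + attn_out
        # MLP sub-layer with a residual connection.
        x = x + self.mlp(self.ln2(x))
        return x

class LoopedStack(nn.Module):
    """Applies the SAME block n_loops times (weight tying across depth),
    instead of stacking n_loops independently parameterized blocks."""
    def __init__(self, d_model: int, n_heads: int, n_loops: int):
        super().__init__()
        self.block = Block(d_model, n_heads)  # one shared set of weights
        self.n_loops = n_loops

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        for _ in range(self.n_loops):
            x = self.block(x)  # reuse the same weights each pass
        return x

# Usage: loop a single block 4 times over a toy batch.
x = torch.randn(2, 16, 64)  # (batch, seq, d_model)
y = LoopedStack(d_model=64, n_heads=4, n_loops=4)(x)
print(y.shape)  # torch.Size([2, 16, 64])
```

The appeal of this pattern is that it keeps the parameter count of a single block while spending compute like a deeper model; how Looped-GPT actually schedules looping during pre-training is detailed in the linked blog post and repo.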