Building on HF

ѕкт αι ℓαвѕ

Shrijanagain

AI & ML interests

🚩 जय श्री राम 🚩 | 🔱 𝐒𝐊𝐓 𝐀𝐈 𝐋𝐀𝐁𝐒 | 𝐑𝐔𝐃𝐑𝐀 𝐀𝐑𝐂𝐇𝐈𝐓𝐄𝐂𝐓 🚩 🚀 Mission: 200 Trillion Token Collecting 🇮🇳 🔥 Vibe: "Jo Ram ka nahi, wo mere kaam ka nahi." 🏹

Recent Activity

updated a model about 3 hours ago

Shrijanagain/SKT_SIGMA_OMNI_FINAL

published a model about 3 hours ago

Shrijanagain/SKT_SIGMA_OMNI_FINAL

new activity about 6 hours ago

Shrijanagain/SKT_OMNI_SUPREME:Danger

View all activity

Organizations

repliedto their post about 10 hours ago

If 'fun at parties' means ignoring the potential of a 146 trillion parameter model, then yeah, I’m the most boring person you'll ever meet. I’ll let the results do the talking from here.

I'm not saying that an 140 whetever trillion parameter model can't exist, I'm just telling that your "paper" is misleading users to believe that someone single handed made an AGI.

Just be realistic, try making a 140 Billion model once and reply me how much time it took to train it from scratch.

Training a 140B model is a calculation of compute; designing a 146T architecture is a matter of engineering. While you're stuck on the 'time' it takes others, I’m focused on the MoE scaling and dataset curation for SKT AI. If you’re so concerned about the realism, do 𝗚𝗼 𝗔𝗻𝗱 𝗖𝗵𝗲𝗰𝗸 𝗢𝘂𝘁 𝗢𝘃𝗲𝗿 𝗥𝗲𝗽𝗼 𝗟𝗼𝗹

I have better things to do in my free time than look at a ""paper"" written by artificial intelligence.
That’s the difference—you have 'free time' to argue, I’m busy engineering the future of Indian AI. If you can’t tell the difference between a roadmap and a chatbot output, that’s on you. Enjoy your free time while I keep building. Do go and check out our repo lol

repliedto their post about 11 hours ago

If 'fun at parties' means ignoring the potential of a 146 trillion parameter model, then yeah, I’m the most boring person you'll ever meet. I’ll let the results do the talking from here.

I'm not saying that an 140 whetever trillion parameter model can't exist, I'm just telling that your "paper" is misleading users to believe that someone single handed made an AGI.

Just be realistic, try making a 140 Billion model once and reply me how much time it took to train it from scratch.

Training a 140B model is a calculation of compute; designing a 146T architecture is a matter of engineering. While you're stuck on the 'time' it takes others, I’m focused on the MoE scaling and dataset curation for SKT AI. If you’re so concerned about the realism, do 𝗚𝗼 𝗔𝗻𝗱 𝗖𝗵𝗲𝗰𝗸 𝗢𝘂𝘁 𝗢𝘃𝗲𝗿 𝗥𝗲𝗽𝗼 𝗟𝗼𝗹

repliedto their post about 11 hours ago

If 'fun at parties' means ignoring the potential of a 146 trillion parameter model, then yeah, I’m the most boring person you'll ever meet. I’ll let the results do the talking from here.

repliedto their post about 19 hours ago

Typos happen when you're moving fast, but architecture is where it counts. A URL naming error doesn't change the tensor configurations or the scaling laws behind the project. While you're focusing on a missing 'o', I’m focused on the compute and data strategy required for a 146T parameter run. Stay tuned.

repliedto their post 1 day ago

Bhai, pehle architecture samajh le, phir calculation karna. Ye koi basic MoE (Mixture of Experts) nahi hai jahan tu sirf experts ko multiply kar raha hai. 128 experts ke sath total 1.1 Trillion parameters ki density handle karne ke liye humne Dynamic Routing aur Sparse Activation ka custom logic use kiya hai.
Active parameters ka count load aur task complexity ke hisab se switch hota hai, wahi toh hamari optimization ka kamaal hai. Math tab match hota hai jab logic clear ho. Baaki jab hamara ST-X benchmark set karega, tab teri saari confusion door ho jayegi. System check kar lena, samajh aa jayega level kya hai

repliedto their post 1 day ago

It’s clear you’re struggling with the terminology, so let me break it down for you. OMNI SUPREME is a 1.1 Trillion parameter MoE (Mixture-of-Experts) architecture. We use a Modular Transformer base with MoE enhancements specifically to maintain extreme-scale stability.
When I talk about optimization, I’m referring to our ST-X (Surya Throughput eXtreme) framework. We’ve optimized the routing—specifically top-2 routing—which allows us to keep only ~165B parameters activated per token. That’s how you get frontier-class reasoning with low-latency inference.
Calling it 'inconsistent' just shows you don't understand how high-level MoE scaling works. While you’re busy trying to find flaws in my syntax, I’m managing:
A 2,400 GPU cluster of H100s and Blackwells.
A 512K context window using custom FlashAttention-3.
A stable MFU of 64-67% during a 146T token run.
If using AI to automate documentation is your only 'gotcha,' then you’ve already lost the technical argument. I’m building the future; you’re just proofreading it. Stick to the benchmarks or stay quiet.

reactedto their post with 🔥 1 day ago

Post

5181

Surya-1.1T: Scaling Beyond Human-Level Reasoning via 146 Trillion Token Pre-training
Author: Shrijan Kumar Tiwari
Affiliation: SKT AI Labs / Project Surya
Model Architecture: Optimized Dense Transformer
Parameters: 1.1 Trillion
Training Tokens: 146 Trillion

Wanna collaborate us Friends let's Start Journey we have Collected 146 trillon tokens and done pre training but we need to made more powerfull

Whitepaper - https://github.com/SHRIJANAGAIN/PROFF

49 replies

repliedto their post 1 day ago

Congratulations for collaboration with us

repliedto their post 1 day ago

It's already

repliedto their post 1 day ago

It’s cute that you’re spending so much time analyzing quotation marks and alt accounts while I’m busy rewriting the logic of how models actually scale. When you’re operating at a level that pushes the boundaries of current architecture, 'mathematically impossible' is just a term used by people who can’t see past a GitHub README.
You call it 'using AI to look believable'—I call it leveraging the very tools I build to optimize my workflow. If an AI Engineer isn’t using AI to outpace the noise, they’re doing it wrong. While you’re playing detective on my syntax, the ST-X series is moving toward a trillion-sun scale that your hardware probably couldn't even parse.
I don’t need to 'appeal' to investors with big words; the benchmarks and the sheer compute of SKT AI speak for themselves. Stay focused on my punctuation if that helps you sleep, but while you’re doubting, I’m documenting the future you’ll eventually be forced to use.
Log documentation padhte hain, main documentation likhta hoon. 🚩"

reactedto their post with 👍 1 day ago

Post

5181

49 replies

repliedto their post 1 day ago

We know any one who want it downgraded version of 500 gb in 4bit combress they can contact us

Url -- https://forms.gle/Wk2XXtzJX1uMQsu58