Poll: Will 2026 be the year of subquadratic attention?
The transformer architecture is cursed by its computational complexity: attention cost grows quadratically with sequence length, which is why you run out of context and have to compact. But some would argue that this is a feature, not a bug, and that it is also why these models are so good. We've been doing a lot of research on making equally good models that are computationally cheaper, but so far none of the approaches have stood the test of time. Or so it seems.
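For anyone who wants a concrete picture of where the quadratic cost comes from, here is a minimal NumPy sketch of vanilla scaled dot-product attention (purely illustrative, not how any frontier model is actually implemented): the score matrix has one entry per pair of tokens, so compute and memory grow as O(n^2) in sequence length.

```python
import numpy as np

def attention(q, k, v):
    """Single-head scaled dot-product attention over a length-n sequence.

    q, k, v have shape (n, d). The score matrix is (n, n), so compute and
    memory scale quadratically with the sequence length n.
    """
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)                    # (n, n): the O(n^2) part
    scores -= scores.max(axis=-1, keepdims=True)     # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over each row
    return weights @ v                               # (n, d)

# Doubling the context quadruples the number of attention scores.
for n in (1_024, 2_048, 4_096):
    q = k = v = np.random.randn(n, 64).astype(np.float32)
    attention(q, k, v)
    print(f"{n} tokens -> {n * n:,} attention scores per head per layer")
```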
Please vote, don't be shy. Remember that the Dunning-Kruger effect is very real, so the person who knows less about transformers than you is going to vote anyway. We want everyone's opinion, no matter how confident you are.
👍 if you think at least one frontier model* will have no O(n^2) attention by the end of 2026, 🔥 if you disagree
* Frontier model: a model that matches or outperforms the flagship Claude, Gemini, or ChatGPT of the time on multiple popular benchmarks
I just pushed Claude Code Agent Swarm with 20 coding agents on my desktop GPU workstation.
With local AI, I don't have the /fast CC switch, but I have /absurdlyfast:
- 100,499 tokens/second read (yeah, 100k, not a typo) | 811 tok/sec generation
- KV cache: 707,200 tokens
- Hardware: 5+ year old GPUs, 4x RTX A6000 (gen 1)
It's not the car. It's the driver.
Qwen3 Coder Next (AWQ) with the KV cache at BF16. It scores 82.1% in C# on a codebase that has been in development for 29 years, vs. Opus 4.5 at only 57.5%. When your codebase predates Stack Overflow, you don't need the biggest model; you need the one that actually remembers Windows 95.
My current bottleneck is my 27" monitor. Can't fit all 20 Theos on screen without squinting.
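The post doesn't say which serving stack sits behind this, so purely as a hedged sketch: here is roughly how an AWQ-quantized coder model with a BF16 KV cache could be served across four GPUs with vLLM. The model id, context length, and memory settings are illustrative assumptions, not the author's exact configuration.

```python
from vllm import LLM, SamplingParams

# Hypothetical configuration: an AWQ-quantized Qwen3 coder checkpoint sharded
# over 4 GPUs, with a long context so the paged KV cache (kept in the model's
# BF16 dtype) can hold hundreds of thousands of tokens across 20 agent
# sessions. The repo id and all numbers below are placeholders.
llm = LLM(
    model="Qwen/Qwen3-Coder-AWQ",       # placeholder id, not the exact checkpoint
    quantization="awq",                  # 4-bit AWQ weights
    kv_cache_dtype="auto",               # "auto" keeps the cache in BF16 here
    tensor_parallel_size=4,              # e.g. 4x RTX A6000
    max_model_len=131072,                # per-request context cap (assumption)
    gpu_memory_utilization=0.92,         # leave a little headroom per GPU
)

params = SamplingParams(temperature=0.2, max_tokens=512)
outputs = llm.generate(
    ["Refactor this C# method to be thread-safe: ..."], params
)
print(outputs[0].outputs[0].text)
```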
The 2025 Chinese LLM Showdown: Western Models Still Dominate Top 4, but China Leads the Open-Source Arena.
- The Champions: Claude-Opus-4.5, Gemini-3-Pro, GPT-5.2, and Gemini-3-Flash sweep the top four spots.
- The Pursuers: Doubao and DeepSeek-V3.2 tie for first place among Chinese models; GLM-4.7, ERNIE-5.0, and Kimi secure their positions in the domestic top five.
- 🔥 The Biggest Highlight: The top three spots on the open-source leaderboard are all held by Team China (DeepSeek, GLM, Kimi), outperforming the best Western open-source models.
Reacted to MonsterMMORPG's post 14 days ago:
Info: LTX 2 is the newest state-of-the-art (SOTA) open-source video generation model, and this tutorial shows the best and most performant way to use it in ComfyUI and in SwarmUI. The Z Image Base model has also been published, and I show how to use Z Image Base with an amazing preset and workflow as well. Furthermore, the tutorial covers how to install, update, and set up ComfyUI and SwarmUI, and how to download the models, presets, and workflows, both on Windows and on RunPod, Massed Compute, and SimplePod. Linux users can use the Massed Compute scripts and installers directly. This is a masterpiece of a complete, lecture-level tutorial that will kickstart your AI journey, both locally on Windows and in the cloud.
45-second raw demo video
This video was made from text + image + audio, producing a lip-synced, animated video in one pass.
Just sharing a result of a homelab infrastructure experiment:
I've managed to set up a distributed inference infrastructure at home using a DGX Spark (128GB unified memory) and a Linux workstation with an RTX 6000 Pro (96GB GDDR7), connected via 100Gbps RoCEv2. The model I used (https://lnkd.in/gx6J7YuB) is about 140GB, so it could not fit on either GPU alone. Full setup and tutorial coming soon on devquasar.com.
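The post doesn't say which inference stack was used, so the following is only an illustrative sketch of one common way to split a model that is too big for either machine: two pipeline-parallel stages over a Ray cluster with vLLM. The model id and settings are placeholders, not the actual setup.

```python
from vllm import LLM, SamplingParams

# Hypothetical two-node split: neither box can hold the ~140GB of weights on
# its own (128GB unified memory on the DGX Spark, 96GB on the RTX 6000 Pro),
# but two pipeline stages connected over the 100Gbps RoCEv2 link can.
# Assumes a Ray cluster already spans both machines
# (ray start --head on one node, ray start --address=<head-ip> on the other).
llm = LLM(
    model="some-org/a-140GB-model",       # placeholder, not the linked model
    pipeline_parallel_size=2,              # one pipeline stage per machine
    tensor_parallel_size=1,                # no intra-stage sharding
    distributed_executor_backend="ray",    # let Ray place a worker on each node
    gpu_memory_utilization=0.90,
)

out = llm.generate(
    ["Hello from a two-box homelab cluster."],
    SamplingParams(max_tokens=64),
)
print(out[0].outputs[0].text)
```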