Asankhaya Sharma
Precisely. In fact, this quest started by trying to see whether attention can implement and track a simple hash table, which would be sufficient to solve HashHop. One reason we haven't heard from Magic labs may be that tool calling + reasoning has made long context less relevant.
Good question! Let me clarify - the MALM and Qwen models don't share tokens directly.
MALM has its own tokenizer where each function name becomes a single token. It encodes the query, searches its memory bank, and returns the top matching functions as plain text (function name, signature, docstring, code).
This retrieved text is then simply concatenated with the user query as a prompt to Qwen. Qwen tokenizes this combined prompt using its own tokenizer and generates code.
So the flow is:
- User query goes to MALM
- MALM retrieves relevant functions as text
- Text prompt = query + retrieved code
- Qwen tokenizes this prompt with its own tokenizer
- Qwen generates output
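The flow above can be sketched in a few lines. This is a hypothetical stand-in, not the real MALM or Qwen APIs: retrieval is faked with simple word-overlap scoring, and the function names and memory bank contents are made up for illustration.

```python
# Hypothetical sketch of the MALM -> Qwen text-passing flow: MALM-style
# retrieval returns matching functions as plain text, which is simply
# concatenated with the user query to form Qwen's prompt.

def retrieve(query, memory_bank, top_k=2):
    """Stand-in for MALM's retrieval: rank stored function texts by
    word overlap with the query and return the top matches as text."""
    def score(entry):
        return len(set(query.lower().split()) & set(entry.lower().split()))
    return sorted(memory_bank, key=score, reverse=True)[:top_k]

def build_prompt(query, retrieved):
    # Retrieved code is just concatenated with the query; the generator
    # (Qwen) would tokenize this combined text with its own tokenizer.
    return "\n\n".join(retrieved) + "\n\n# Task: " + query

# A toy memory bank of function texts (name, docstring, body elided).
memory_bank = [
    'def parse_json(s):\n    """Parse a JSON string."""\n    ...',
    'def read_file(path):\n    """Read a file into a string."""\n    ...',
]

query = "read a file and parse it as json"
prompt = build_prompt(query, retrieve(query, memory_bank))
print(prompt)
```

The prompt that comes out the other end is ordinary text, which is the whole point: the generator needs no special integration with the retriever.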
There's no token-level integration - just text passing between the two models. MALM acts as a retrieval layer that provides relevant context, and Qwen does the generation with that context in its prompt.
The single-token-per-key insight applies only inside MALM, where it enables perfect retrieval. Qwen just sees regular text.
I wrote a deep dive into how Magic AI's 100M token context window might work, starting from their HashHop benchmark and building up to MALM - a Memory-Augmented Language Model.
Key insight: treating each key as a single token enables perfect retrieval at unlimited context lengths.
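A minimal sketch of that insight, with made-up key generation and lookup helpers (not the article's actual solver): when every whole key is a single vocabulary token, a hop is an exact lookup with no subword ambiguity, so accuracy stays at 100% no matter how many pairs the "context" holds.

```python
# Hypothetical single-token-per-key sketch: each hash key maps to exactly
# one token id, so multi-hop retrieval reduces to exact dictionary lookups.
import random
import string

random.seed(0)

def random_key(n=8):
    """Generate a random lowercase key, standing in for a hash string."""
    return "".join(random.choices(string.ascii_lowercase, k=n))

# A chain of unique random keys: keys[i] -> keys[i + 1].
keys = list(dict.fromkeys(random_key() for _ in range(10_000)))
pairs = {keys[i]: keys[i + 1] for i in range(len(keys) - 1)}

# "Tokenize": every whole key becomes a single token id, so there is no
# subword splitting for the model to resolve.
vocab = {key: i for i, key in enumerate(keys)}

def hop(key, hops=1):
    """Follow the hash chain `hops` steps via exact single-token lookup."""
    for _ in range(hops):
        key = pairs[key]
    return key

print(hop(keys[0], hops=3) == keys[3])  # exact multi-hop retrieval
```

Because lookup is exact rather than approximate attention over subword pieces, the size of `pairs` is irrelevant to accuracy, which is why this framing scales to very long contexts.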
The article covers:
- How HashHop works and why its perfect accuracy is suspicious
- Building a tokenized solver that achieves 100% accuracy
- Scaling to MALM for real code search tasks
- Why this approach could handle 100M+ tokens
Read the full article: https://huggingface.co/blog/codelion/reverse-engineering-magic-hashhop
Try the model: codelion/malm-165m
Code: https://github.com/codelion/hash-hop