zero-gpu-explorers (ZeroGPU Explorers)

posted an update about 3 hours ago

Post

42

Announcing: OpenMed Multilingual PII Detection Models

Today I am releasing 105 open-source models for Personally Identifiable Information (PII) detection in French, German, and Italian.

All Apache 2.0 licensed. Free for commercial use. No restrictions.

Performance:

- French: 97.97% F1 (top model)
- German: 97.61% F1 (top model)
- Italian: 97.28% F1 (top model)

All top-10 models per language exceed 96% F1

Coverage:

55+ PII entity types per language
Native ID formats: NSS (French), Sozialversicherungsnummer (German), Codice Fiscale (Italian)
Language-specific address, phone, and name patterns

Training Data:

French: 49,580 samples
German: 42,250 samples
Italian: 40,944 samples

Why Multilingual?

European healthcare operates in European languages. Clinical notes, patient records, and medical documents are generated in French, German, Italian, and other languages.

Effective de-identification requires:

- Native language understanding — not translation
- Local ID format recognition — each country has unique patterns
- Cultural context awareness — names, addresses, and formats vary
These models deliver production-ready accuracy without requiring data to leave your infrastructure or language.
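As a rough illustration of the de-identification step, the sketch below replaces detected spans with their labels. It assumes each detection is a dict with `start`, `end`, and `entity_group` keys, which is the shape the Hugging Face `transformers` token-classification pipeline returns when spans are aggregated. The sample text, the labels, and the `redact` helper are hypothetical, and no OpenMed model is loaded here.

```python
def redact(text, entities):
    """Replace each detected span with a [LABEL] placeholder."""
    out = []
    cursor = 0
    for ent in sorted(entities, key=lambda e: e["start"]):
        if ent["start"] < cursor:
            continue  # skip spans that overlap one already redacted
        out.append(text[cursor:ent["start"]])
        out.append(f"[{ent['entity_group']}]")
        cursor = ent["end"]
    out.append(text[cursor:])
    return "".join(out)


def span(text, value, label):
    """Build a hypothetical detection for the first occurrence of value."""
    start = text.index(value)
    return {"entity_group": label, "start": start, "end": start + len(value)}


# Hypothetical detections standing in for real model output
text = "Patient Marie Dupont, tel. 01 23 45 67 89."
entities = [
    span(text, "Marie Dupont", "NAME"),
    span(text, "01 23 45 67 89", "PHONE"),
]
redacted = redact(text, entities)
print(redacted)
```

Because the replacement runs left to right over sorted offsets, earlier substitutions never shift the positions of later spans.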

HIPAA & GDPR Compliance
Built for US and European privacy regulations:

- On-premise deployment: Process data locally with zero external dependencies
- Data sovereignty: No API calls, no cloud services, no cross-border transfers
- Air-gapped capable: Deploy in fully isolated environments if required
- Regulatory-grade accuracy: Supporting Expert Determination standards
HIPAA and GDPR compliance across languages, without compliance gaps.

Use Cases
- Hospital EHR systems: Automated patient record de-identification
- Clinical research: Multilingual dataset preparation for studies
- Insurance companies: Claims processing across

https://huggingface.co/collections/OpenMed/multilingual-pii-and-de-identification

1 reply

·

mitkox

posted an update 2 days ago

Post

4389

I just pushed Claude Code Agent Swarm with 20 coding agents on my desktop GPU workstation.

With local AI, I don’t have /fast CC switch, but I have /absurdlyfast:
- 100’499 tokens/second read, yeah 100k, not a typo | 811 tok/sec generation
- KV cache: 707’200 tokens
- Hardware: 5+ year old GPUs 4xA6K gen1; It’s not the car. It’s the driver.

Qwen3 Coder Next AWQ with cache at BF16. Scores 82.1% in C# on 29-years-in-dev codebase vs Opus 4.5 at only 57.5%. When your codebase predates Stack Overflow, you don't need the biggest model; you need the one that actually remembers Windows 95.

My current bottleneck is my 27" monitor. Can't fit all 20 Theos on screen without squinting.

1 reply

·

MaziyarPanahi

posted an update 3 days ago

Post

1126

From Golden Gate Bridge to Broken JSON: Why Anthropic's SAE Steering Fails for Structured Output

I ran 6 experiments trying to use Anthropic's SAE steering for JSON generation.

- Base model: 86.8% valid JSON
- Steering only: 24.4%
- Fine-tuned: 96.6%
- FSM constrained: 100%

Steering is for semantics, not syntax.
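To make the "FSM constrained: 100%" result more concrete, here is a toy sketch of finite-state-machine constrained decoding. It is not the setup from the linked article: the vocabulary, the tiny grammar (`{"k": <digits>}`), and the random scoring function are all hypothetical stand-ins for a real model and a real JSON grammar. The point is only that masking out tokens the FSM disallows makes invalid output impossible, whatever the scores are.

```python
import json
import random

VOCAB = ["{", '"k"', ":", "1", "2", "}", "oops"]

# FSM: state -> {allowed token: next state}
FSM = {
    "start": {"{": "key"},
    "key": {'"k"': "colon"},
    "colon": {":": "first_digit"},
    "first_digit": {"1": "more", "2": "more"},
    "more": {"1": "more", "2": "more", "}": "done"},
}


def generate(score_fn, max_tokens=12):
    """Greedy decoding restricted to tokens the FSM allows in each state."""
    state, out = "start", []
    while state != "done":
        allowed = FSM[state]
        # Once the token budget is spent, force the closing brace
        if len(out) >= max_tokens and "}" in allowed:
            allowed = {"}": "done"}
        scores = score_fn(out)
        token = max(allowed, key=lambda t: scores[t])
        out.append(token)
        state = allowed[token]
    return "".join(out)


rng = random.Random(0)


def random_scores(prefix):
    """Stand-in for model logits: one random score per vocabulary token."""
    return {tok: rng.random() for tok in VOCAB}


sample = generate(random_scores)
print(sample, json.loads(sample))
```

Even with purely random scores, every sample parses, which matches the intuition that syntax is better enforced by the decoder than learned through steering.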

https://huggingface.co/blog/MaziyarPanahi/sae-steering-json

MaziyarPanahi

posted an update 4 days ago

Post

3864

🚨 Day 8/8: OpenMed Medical Reasoning Dataset Release - THE GRAND FINALE

Today I complete my 8-day release series with Medical-Reasoning-SFT-Mega.
The largest open medical reasoning dataset, combining 7 state-of-the-art AI models with fair distribution deduplication.

THE 7 SOURCE MODELS (Original Sample Counts):

1. Trinity-Mini: 810,284 samples
2. Qwen3-Next-80B: 604,249 samples
3. GPT-OSS-120B: 506,150 samples
4. Nemotron-Nano-30B: 444,544 samples
5. GLM-4.5-Air: 225,179 samples
6. MiniMax-M2.1: 204,773 samples
7. Baichuan-M3-235B: 124,520 samples

TOTAL BEFORE DEDUPLICATION: 2,919,699 samples

TOKEN COUNTS:
- Content tokens: 2.22 Billion
- Reasoning tokens: 1.56 Billion
- Total tokens: 3.78 Billion
- Samples with chain-of-thought: 100%
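
The totals above can be checked with a few lines of Python. The counts are copied from the list; the per-model share calculation is an added illustration, not part of the dataset's tooling.

```python
# Sample counts per source model, as listed in the post
counts = {
    "Trinity-Mini": 810_284,
    "Qwen3-Next-80B": 604_249,
    "GPT-OSS-120B": 506_150,
    "Nemotron-Nano-30B": 444_544,
    "GLM-4.5-Air": 225_179,
    "MiniMax-M2.1": 204_773,
    "Baichuan-M3-235B": 124_520,
}

total = sum(counts.values())
assert total == 2_919_699  # matches the stated pre-deduplication total

# Share of the pre-deduplication pool contributed by each model
shares = {name: n / total for name, n in counts.items()}
for name, share in shares.items():
    print(f"{name}: {share:.1%}")

# Token totals in billions: content + reasoning
total_tokens_b = round(2.22 + 1.56, 2)
print(total_tokens_b)
```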

Quick Start:

from datasets import load_dataset
ds = load_dataset("OpenMed/Medical-Reasoning-SFT-Mega")

6 replies

·

mitkox

posted an update 13 days ago

Post

253

▐▛██▜▌ Claude Code v2.1.23
▝████▘ Kimi-K2.5 · API Usage Billing
▘▘ ▝▝ ~/dev/vllm
/model to try Opus 4.5
❯ hey
● Hello! How can I help you today?
❯ what model are you?
● I'm Claude Kimi-K2.5, running in a local environment on Linux.

It took some time to download, plus some vLLM hybrid inferencing magic, to get it running on my desktop workstation.

codelion

posted an update 18 days ago

Post

3083

Reverse Engineering a $500M Mystery: From HashHop to Memory-Augmented Language Models

I wrote a deep dive into how Magic AI's 100M token context window might work, starting from their HashHop benchmark and building up to MALM - a Memory-Augmented Language Model.

Key insight: treating each key as a single token enables perfect retrieval at unlimited context lengths.

The article covers:

- How HashHop works and why its perfect accuracy is suspicious
- Building a tokenized solver that achieves 100% accuracy
- Scaling to MALM for real code search tasks
- Why this approach could handle 100M+ tokens
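
As a plain-Python analogy for the key insight, the sketch below builds a HashHop-style task (shuffled `hash -> hash` pairs forming chains) and solves it by treating each whole hash as one lookup key. This is not the article's tokenized solver or MALM; the helper names and sizes are hypothetical, and a dictionary stands in for what a model with one token per key would have to learn.

```python
import random
import string


def build_context(rng, n_chains, hops, length=8):
    """Create n_chains chains of hops+1 unique hashes, plus shuffled pairs."""
    hashes = set()
    while len(hashes) < n_chains * (hops + 1):
        hashes.add("".join(rng.choices(string.ascii_letters, k=length)))
    hashes = sorted(hashes)  # fix the order before shuffling, for determinism
    rng.shuffle(hashes)
    size = hops + 1
    chains = [hashes[i * size:(i + 1) * size] for i in range(n_chains)]
    pairs = [(c[i], c[i + 1]) for c in chains for i in range(hops)]
    rng.shuffle(pairs)  # the context presents the pairs in random order
    return chains, pairs


def solve(pairs, start, hops):
    """Follow the chain from start for the given number of hops."""
    table = dict(pairs)  # each whole hash acts as a single key
    current = start
    for _ in range(hops):
        current = table[current]
    return current


chains, pairs = build_context(random.Random(0), n_chains=50, hops=3)
correct = sum(solve(pairs, c[0], 3) == c[-1] for c in chains)
print(f"{correct}/{len(chains)} chains resolved")
```

Exact-match lookup on whole keys is independent of how many pairs the context holds, which is one way to read the post's point about why perfect accuracy on this benchmark is less surprising than it looks.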

Read the full article: https://huggingface.co/blog/codelion/reverse-engineering-magic-hashhop

Try the model: codelion/malm-165m

Code: https://github.com/codelion/hash-hop

1 reply

·

mitkox

posted an update 20 days ago

Post

1505

GLM-4.7-Flash is fast, good and cheap.
3,074 tokens/sec peak at 200k tokens context window on my desktop PC.
Works with Claude Code and opencode for hours. No errors, drop-in replacement of the Anthropic cloud AI.
MIT licensed, open weights, free for commercial use and modifications.
Supports speculative decoding using MTP, which is highly effective in mitigating latency.
Great for on device AI coding as AWQ 4bit at 18.5 GB. Hybrid inference on a single consumer GPU + CPU RAM.

3 replies

·

MaziyarPanahi

posted an update about 1 month ago

Post

3694

🎉 OpenMed 2025 Year in Review: 6 Months of Open Medical AI

I'm thrilled to share what the OpenMed community has accomplished since our July 2025 launch!

📊 The Numbers

29,700,000 downloads. Thank you! 🙏

- 481 total models (475 medical NER models + 6 fine-tuned LLMs)
- 475 medical NER models in the OpenMed organization
- 6 fine-tuned LLMs in openmed-community
- 551,800 PyPI downloads of the [openmed package](https://pypi.org/project/openmed/)
- 707 followers on HuggingFace (you!)
- 97 GitHub stars on the [toolkit repo](https://github.com/maziyarpanahi/openmed)

🏆 Top Models by Downloads

1. [OpenMed-NER-PharmaDetect-SuperClinical-434M](OpenMed/OpenMed-NER-PharmaDetect-SuperClinical-434M): 147,305 downloads
2. [OpenMed-NER-ChemicalDetect-ElectraMed-33M](OpenMed/OpenMed-NER-ChemicalDetect-ElectraMed-33M): 126,785 downloads
3. [OpenMed-NER-BloodCancerDetect-TinyMed-65M](OpenMed/OpenMed-NER-BloodCancerDetect-TinyMed-65M): 126,465 downloads

🔬 Model Categories

Our 481 models cover comprehensive medical domains:

- Disease Detection (~50 variants)
- Pharmaceutical Detection (~50 variants)
- Oncology Detection (~50 variants)
- Genomics/DNA Detection (~80 variants)
- Chemical Detection (~50 variants)
- Species/Organism Detection (~60 variants)
- Protein Detection (~50 variants)
- Pathology Detection (~50 variants)
- Blood Cancer Detection (~30 variants)
- Anatomy Detection (~40 variants)
- Zero-Shot NER (GLiNER-based)

OpenMed
OpenMed NER: Open-Source, Domain-Adapted State-of-the-Art Transformers for Biomedical NER Across 12 Public Datasets (2508.01630)
https://huggingface.co/collections/OpenMed/medical-and-clinical-ner
https://huggingface.co/collections/OpenMed/zeroshot-medical-and-clinical-ner
OpenMed/Medical-Reasoning-SFT-GPT-OSS-120B

1 reply

·

mitkox

posted an update about 1 month ago

Post

3303

I just stress-tested the Beast: MiniMax-M2.1 on Z8 Fury G5.
2101 tokens/sec. FORTY concurrent clients. That's 609 t/s out, 1492 t/s in. The model outputs fire faster than I can type, but feeds on data like a black hole on cheat day.
But wait, there's more! Threw it into Claude Code torture testing with 60+ tools, 8 agents (7 sub-agents because apparently one wasn't enough chaos). It didn't even flinch. Extremely fast, scary good at coding. The kind of performance that makes you wonder if the model's been secretly reading Stack Overflow in its spare time lol
3 months ago, these numbers lived in my "maybe in 2030" dreams. Today it's running on my desk AND heats my home office during the winter!

3 replies

·

codelion

posted an update about 2 months ago

Post

6085

Introducing Dhara-70M: A diffusion language model that achieves 3.8x higher throughput than autoregressive models!

Key findings from our research on optimal architectures for small language models:

→ Depth beats width: 32 layers outperforms 12 layers at the same parameter count
→ Best-in-class factuality: 47.5% on TruthfulQA
→ 10x training efficiency using WSD (Warmup-Stable-Decay) conversion
→ Canon layers add only 0.13% parameters but improve reasoning

We trained on 1B tokens using the optimal 50-30-20 dataset mix (PDFs + filtered web + educational content), then converted to diffusion with just 100M additional tokens.
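
For readers unfamiliar with WSD, here is a minimal sketch of a generic Warmup-Stable-Decay learning-rate schedule. The peak rate and the warmup and decay fractions are assumptions chosen for illustration, not the values used for Dhara-70M, and the post's "WSD conversion" may involve more than the schedule alone.

```python
def wsd_lr(step, total_steps, peak_lr=1e-3, warmup_frac=0.1,
           decay_frac=0.2, min_lr=0.0):
    """Learning rate at a given step: linear warmup, flat, linear decay."""
    warmup_steps = int(total_steps * warmup_frac)
    decay_steps = int(total_steps * decay_frac)
    stable_end = total_steps - decay_steps

    if step < warmup_steps:
        # Warmup: ramp linearly up to the peak rate
        return peak_lr * (step + 1) / warmup_steps
    if step < stable_end:
        # Stable: hold the peak rate
        return peak_lr
    # Decay: ramp linearly from the peak rate down to min_lr
    progress = min((step - stable_end) / max(decay_steps, 1), 1.0)
    return peak_lr + (min_lr - peak_lr) * progress


for step in (0, 99, 500, 900, 1000):
    print(step, wsd_lr(step, total_steps=1000))
```

A long flat phase means a checkpoint from the stable region can be reused and only the short decay phase rerun, which is one commonly cited reason WSD schedules are cheap to continue from.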

Blog: https://huggingface.co/blog/codelion/optimal-model-architecture
Model: codelion/dhara-70m

1 reply

·

codelion

posted an update about 2 months ago

Post

2405

Introducing PTS Visualizer - an interactive tool for exploring how language models reason!

Visualize pivotal tokens, thought anchors, and reasoning circuits. See which tokens and sentences significantly impact success probability, explore embedding clusters, and trace reasoning step-by-step.

Try it: codelion/pts-visualizer

Explore PTS datasets:
- Qwen3-0.6B: codelion/Qwen3-0.6B-pts
- DeepSeek-R1: codelion/DeepSeek-R1-Distill-Qwen-1.5B-pts

Or upload your own JSONL files!

GitHub: https://github.com/codelion/pts

mitkox

posted an update 2 months ago

Post

2387

Got to 1199.8 tokens/sec with Devstral Small 2 on my desktop GPU workstation, using vLLM nightly.
Works out of the box with Mistral Vibe. Next, it's time to test the big one.

3 replies

·

codelion

posted an update 2 months ago

Post

2598

Recently, Essential AI released a new 8B base model, EssentialAI/rnj-1. They highlighted the importance of the data mix for pretraining:

"In the long run, we expect our methods to automatically represent, transform, and blend data to optimize measurable abilities in pre-training. Our work on modeling data taxonomies led to new approaches for jointly clustering and mixing data distributions under data repetition penalties. Many improvements in our STEM abilities can be traced back to this."

This resonates with the recent work we did on optimal dataset mixing for pretraining, where we saw that having the right mix can increase the efficiency of training:
https://huggingface.co/blog/codelion/optimal-dataset-mixing

codelion

posted an update 2 months ago

Post

2677

NotebookLM's infographics feature is amazing: it generates poster-type images from any text. Here is one I tried for my new HF article on Ellora - https://huggingface.co/blog/codelion/ellora-lora-recipes

codelion

posted an update 2 months ago

Post

2321

Perplexity released a dataset (BrowseSafe) and benchmark to catch and prevent malicious prompt-injection instructions in real-time.

We trained a prompt injection classifier on BrowseSafe using adaptive-classifier with ModernBERT-base embeddings.

74.9% F1 on detecting prompt injection in web content.

Model -> adaptive-classifier/browsesafe
Dataset -> perplexity-ai/browsesafe-bench
Repo -> https://github.com/codelion/adaptive-classifier

1 reply

·

codelion

posted an update 2 months ago

Post

1613

I just published Ellora - 6 production-ready LoRA recipes for enhancing LLMs with specific capabilities. Each recipe costs under $100 to run and includes complete training code, data generation, and evaluation.

The 6 Recipes:
Recipe 1: Accuracy Recovery - Recover 75% of quantization losses with self-distillation
Recipe 2: Reasoning LoRA - Add structured thinking with GRPO (0% to 60% adoption, 75% quality boost)
Recipe 3: Tool Calling - Real execution on actual codebases
Recipe 4: Context Extension - Scale from 32K to 2M tokens (61x increase)
Recipe 5: Secure Code Generation - 97% vulnerability reduction using automated Semgrep analysis
Recipe 6: Execution-Aware World Models - Teaching models runtime behavior

Why Recipes?
Ellora provides methodologies, not frameworks. Use them with your existing tools (PEFT, LoRAX, vLLM, Unsloth, HuggingFace). Each recipe uses self-supervised data generation (Magpie approach) - no expensive human labeling required.

All recipes include Jupyter notebooks you can run immediately with clear success metrics.

GitHub: https://github.com/codelion/ellora
Full Article: https://huggingface.co/blog/codelion/ellora-lora-recipes

Built something with these recipes? I'd love to see what you create!

Paper99

authored 2 papers 2 months ago

Z-Image: An Efficient Image Generation Foundation Model with Single-Stream Diffusion Transformer

Paper • 2511.22699 • Published Nov 27, 2025 • 237

Decoupled DMD: CFG Augmentation as the Spear, Distribution Matching as the Shield

Paper • 2511.22677 • Published Nov 27, 2025 • 32

mitkox

posted an update 3 months ago

Post

3191

I run 20 AI coding agents locally on my desktop workstation at 400+ tokens/sec with MiniMax-M2. It's a drop-in Sonnet replacement in my Cursor, Claude Code, Droid, Kilo and Cline. It peaks at 11k tok/sec input and 433 tok/s output and can generate 1B+ tok/m, all with a 196k context window. I've been running it for 6 days now with this config.

Today, max performance was stable at 490.2 tokens/sec across 48 concurrent clients with MiniMax M2.

Z8 Fury G5, Xeon 3455, 4xA6K. Aibrix 0.5.0, vLLM 0.11.2.

5 replies

·

codelion

posted an update 3 months ago

Post

1997

Introducing OpenEvolve Prompt Optimizer - a Space that automatically evolves and optimizes your prompts using OpenEvolve!

This tool uses OpenEvolve to iteratively improve prompts by testing them on real datasets and evolving better versions. No more manual prompt engineering guesswork - let OpenEvolve find the optimal prompts for you.

How it works:
- Enter your initial prompt using {input} as a placeholder for dataset inputs
- Input any HuggingFace dataset name you want to use for optimization
- Specify the dataset split and field names for your use case
- Click Optimize Prompt and the system will validate everything first
- Compare your initial prompt vs the evolved best prompt side-by-side
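
The `{input}` placeholder step can be sketched as follows. This only shows how a template might be filled from dataset rows and scored, under the assumption of exact-match scoring; it is not OpenEvolve's code, and the `fill` and `accuracy` helpers, the field names, and the toy model are all hypothetical.

```python
def fill(template, row, input_field):
    """Substitute the {input} placeholder with one dataset field."""
    return template.replace("{input}", str(row[input_field]))


def accuracy(template, rows, input_field, target_field, model):
    """Fraction of rows where the model's answer matches the target exactly."""
    hits = 0
    for row in rows:
        answer = model(fill(template, row, input_field)).strip()
        hits += answer == str(row[target_field])
    return hits / len(rows)


def toy_model(prompt):
    """Stand-in for an LLM call: a keyword rule, for illustration only."""
    return "positive" if "great" in prompt else "negative"


rows = [
    {"text": "great film", "label": "positive"},
    {"text": "dull plot", "label": "negative"},
]
template = "Classify the sentiment: {input}"
score = accuracy(template, rows, "text", "label", toy_model)
print(score)
```

An optimizer would compare scores like this one across candidate templates and keep the better ones; how candidates are proposed is left out here.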

Try it here: algorithmicsuperintelligence/prompt-optimizer

OpenEvolve GitHub: https://github.com/algorithmicsuperintelligence/openevolve

ZeroGPU Explorers

AI & ML interests

Recent Activity

Team members 750

zero-gpu-explorers's activity