Model Card for Bleenk
Model Summary
Bleenk 123B is an agentic large language model developed by Robi Labs for advanced software engineering tasks. The model is optimized for tool-driven workflows, large-scale codebase exploration, coordinated multi-file editing, and powering autonomous and semi-autonomous software engineering agents.
Bleenk is designed for long-horizon reasoning and real-world engineering environments rather than single-turn code generation.
Model Details
Model Description
- Developed by: Robi Labs
- Created for: the Bleenk app (https://bleenk.app)
- Funded by: Robi Labs
- Shared by: Robi Labs
- Model type: Agentic Large Language Model (LLM)
- Language(s) (NLP): Primarily English; supports multilingual code and technical text
- License: To be released by Robi Labs
- Finetuned from model: Not publicly disclosed; trained with a proprietary pretraining and fine-tuning pipeline
Model Sources
- Demo: https://bleenk.app
Uses
Direct Use
- Software engineering agents
- AI-powered code assistants
- Codebase navigation and analysis
- Multi-file refactoring and maintenance
- Tool-augmented development workflows (see the sketch after this list)
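The sketch below shows one way such a tool-driven workflow can be wired against a local Ollama server. It is illustrative only: the `read_file` tool, the prompt, and the default port are assumptions, not part of the model release, and it requires an Ollama build with tool-calling support.

```python
# Illustrative only: a single-tool agent loop against a local Ollama server.
# Assumptions: Ollama is running on the default port 11434, the build supports
# tool calling, and the model was pulled as RobiLabs/bleenk:latest.
import requests

OLLAMA_CHAT = "http://localhost:11434/api/chat"
MODEL = "RobiLabs/bleenk:latest"

# One example tool the model may call to inspect the codebase.
TOOLS = [{
    "type": "function",
    "function": {
        "name": "read_file",
        "description": "Return the contents of a file in the repository",
        "parameters": {
            "type": "object",
            "properties": {"path": {"type": "string"}},
            "required": ["path"],
        },
    },
}]

def read_file(path: str) -> str:
    with open(path, "r", encoding="utf-8") as handle:
        return handle.read()

messages = [{"role": "user", "content": "Summarize what main.py does."}]

# Ask the model; it may answer directly or request the read_file tool.
reply = requests.post(OLLAMA_CHAT, json={
    "model": MODEL, "messages": messages, "tools": TOOLS, "stream": False,
}, timeout=300).json()["message"]
messages.append(reply)

# Execute any requested tool calls and feed the results back to the model.
for call in reply.get("tool_calls", []):
    result = read_file(**call["function"]["arguments"])
    messages.append({"role": "tool", "content": result})

final = requests.post(OLLAMA_CHAT, json={
    "model": MODEL, "messages": messages, "stream": False,
}, timeout=300).json()["message"]["content"]
print(final)
```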
Downstream Use
- Fine-tuning for organization-specific codebases
- Integration into internal developer platforms
- Agent frameworks for autonomous engineering
Out-of-Scope Use
- General-purpose chat or conversational agents
- High-risk decision-making without human oversight
- Tasks requiring domain-specific legal, medical, or financial guarantees
Bias, Risks, and Limitations
- The model may produce incorrect or incomplete code without verification
- Tool misuse may result in unintended system changes
- Performance depends on tool availability and prompt quality
- Trained primarily on publicly available and licensed data, which may encode historical biases
Recommendations
Users should employ strong sandboxing, testing, and human-in-the-loop review when deploying Bleenk in production environments.
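As a concrete illustration of that recommendation, the snippet below sketches one possible human-in-the-loop gate around model-proposed shell commands. The allowlist and helper name are hypothetical and not part of Bleenk or its tooling.

```python
# Hypothetical review gate for commands proposed by an agent; not part of Bleenk.
import subprocess

ALLOWED_PREFIXES = ("git diff", "git status", "pytest", "ls", "cat")  # example allowlist

def run_proposed_command(cmd: str) -> str:
    """Run a model-proposed command only if it is allowlisted and a human approves."""
    if not cmd.startswith(ALLOWED_PREFIXES):
        return f"Blocked: '{cmd}' is not on the allowlist."
    if input(f"Run `{cmd}`? [y/N] ").strip().lower() != "y":
        return "Skipped by reviewer."
    completed = subprocess.run(cmd, shell=True, capture_output=True, text=True, timeout=120)
    return completed.stdout + completed.stderr
```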
How to Get Started with the Model
Pull and run the model locally with Ollama:

```bash
ollama pull RobiLabs/bleenk:latest
ollama run RobiLabs/bleenk:latest
```
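Once the model is being served, it can also be queried programmatically. The snippet below is a minimal sketch using Ollama's local HTTP API on its default port (11434); the prompt is illustrative.

```python
# Minimal sketch: query the locally served model through Ollama's HTTP API.
# Assumes the Ollama server is active on the default port 11434.
import requests

response = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "RobiLabs/bleenk:latest",
        "prompt": "Write a Python function that reverses a linked list.",
        "stream": False,
    },
    timeout=300,
)
print(response.json()["response"])
```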
Training Details
Training Data
The model was trained on a mixture of:
- Publicly available code repositories
- Licensed datasets
- Synthetic data generated for software engineering tasks
Training Procedure
Preprocessing
Data was filtered for quality, deduplicated, and normalized for code and technical text.
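The exact pipeline is not public; purely as an illustration of what hash-based deduplication of the kind described above can look like, here is a short sketch (all names are hypothetical).

```python
# Illustration only: exact-duplicate removal via content hashing.
# The actual Robi Labs preprocessing pipeline is not publicly documented.
import hashlib

def dedupe(documents):
    """Keep the first occurrence of each document, keyed by a normalized content hash."""
    seen, kept = set(), []
    for doc in documents:
        normalized = " ".join(doc.split())  # collapse whitespace before hashing
        digest = hashlib.sha256(normalized.encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            kept.append(doc)
    return kept

print(len(dedupe(["def f():  pass", "def f(): pass", "def g(): pass"])))  # -> 2
```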
Training Hyperparameters
- Training regime: Mixed-precision training (bf16)
Evaluation
Testing Data, Factors & Metrics
Testing Data
- SWE-bench Verified
- SWE-bench Multilingual
- Terminal Bench
Metrics
- Task success rate
- Patch correctness
- Tool execution accuracy
Results
| Model | Size (B params) | SWE-bench Verified | SWE-bench Multilingual | Terminal Bench |
|---|---|---|---|---|
| Bleenk | 123 | 73.2% | 71.3% | 45.5% |
| Devstral 2 | 123 | 72.2% | 61.3% | 40.5% |
| Devstral Small 2 | 24 | 65.8% | 51.6% | 32.0% |
| DeepSeek v3.2 | 671 | 73.1% | 70.2% | 46.4% |
| Kimi K2 Thinking | 1000 | 71.3% | 61.1% | 35.7% |
| MiniMax M2 | 230 | 69.4% | 56.5% | 30.0% |
| GLM 4.6 | 455 | 68.0% | – | 40.5% |
| Qwen 3 Coder Plus | 480 | 69.6% | 54.7% | 37.5% |
| Gemini 3 Pro | – | 76.2% | – | 54.2% |
| Claude Sonnet 4.5 | – | 77.2% | 68.0% | 42.8% |
| GPT 5.1 Codex Max | – | 77.9% | – | 58.1% |
| GPT 5.1 Codex High | – | 73.7% | – | 52.8% |
Environmental Impact
Environmental impact details will be released as measurements are finalized.
Technical Specifications
Model Architecture and Objective
Transformer-based large language model optimized for agentic reasoning and tool usage.
Compute Infrastructure
Hardware
Large-scale GPU/accelerator clusters
Software
Custom training and inference stack developed by Robi Labs
Model Card Authors
Robi Labs Research Team
Model Card Contact