Model Card for Bleenk

Model Summary

Bleenk 123B is an agentic large language model developed by Robi Labs for advanced software engineering tasks. The model is optimized for tool-driven workflows, large-scale codebase exploration, coordinated multi-file editing, and powering autonomous and semi-autonomous software engineering agents.

Bleenk is designed for long-horizon reasoning and real-world engineering environments rather than single-turn code generation.

Model Details

Model Description

  • Developed by: Robi Labs
  • Created for: Bleenk
  • Funded by: Robi Labs
  • Shared by: Robi Labs
  • Model type: Agentic Large Language Model (LLM)
  • Language(s) (NLP): Primarily English; supports multilingual code and technical text
  • License: To be released by Robi Labs
  • Finetuned from model: Not disclosed; trained via a proprietary pretraining and fine-tuning pipeline

Model Sources

Uses

Direct Use

  • Software engineering agents
  • AI-powered code assistants
  • Codebase navigation and analysis
  • Multi-file refactoring and maintenance
  • Tool-augmented development workflows (see the sketch after this list)
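
As a concrete illustration of the tool-augmented item above, the sketch below exposes a single hypothetical read-only tool (read_file) to the model through the ollama Python client. The tool name and schema are invented for this example, and whether a given Bleenk build emits structured tool calls through Ollama is an assumption to verify locally with a recent client version.

# Illustrative tool-augmented loop: expose one hypothetical read-only tool
# ("read_file") via the ollama Python client and execute any tool call the
# model returns. Assumes a recent `ollama` package and that the local Bleenk
# build supports structured tool calling; verify before relying on it.
import ollama

def read_file(path: str) -> str:
    # Hypothetical tool implementation: return the contents of a workspace file.
    with open(path, "r", encoding="utf-8") as handle:
        return handle.read()

tools = [
    {
        "type": "function",
        "function": {
            "name": "read_file",
            "description": "Read a UTF-8 text file from the workspace",
            "parameters": {
                "type": "object",
                "properties": {"path": {"type": "string"}},
                "required": ["path"],
            },
        },
    }
]

response = ollama.chat(
    model="RobiLabs/bleenk:latest",
    messages=[{"role": "user", "content": "Summarize what src/app.py does."}],
    tools=tools,
)

# Run whichever tool calls the model requested (may be empty).
for call in response.message.tool_calls or []:
    if call.function.name == "read_file":
        print(read_file(**call.function.arguments))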

Downstream Use

  • Fine-tuning for organization-specific codebases
  • Integration into internal developer platforms
  • Agent frameworks for autonomous engineering

Out-of-Scope Use

  • General-purpose chat or conversational agents
  • High-risk decision-making without human oversight
  • Tasks requiring domain-specific legal, medical, or financial guarantees

Bias, Risks, and Limitations

  • The model may produce incorrect or incomplete code without verification
  • Tool misuse may result in unintended system changes
  • Performance depends on tool availability and prompt quality
  • Trained primarily on publicly available and licensed data, which may encode historical biases

Recommendations

Users should employ strong sandboxing, testing, and human-in-the-loop review when deploying Bleenk in production environments.
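
As a rough illustration of the human-in-the-loop part of that recommendation, agent-proposed commands can be gated behind an explicit approval step before they touch the workspace. The sketch below is generic rather than Bleenk-specific; run_sandboxed and the example pytest command are placeholders, and a real deployment should execute commands inside a container or VM rather than the host shell.

# Illustrative human-in-the-loop gate for agent-proposed shell commands.
# `run_sandboxed` is a placeholder: a real deployment should run commands
# inside a container, jail, or VM with restricted permissions.
import subprocess

def run_sandboxed(command: str) -> subprocess.CompletedProcess:
    # Placeholder sandbox: runs the command in a subprocess with a timeout.
    return subprocess.run(
        command, shell=True, capture_output=True, text=True, timeout=60
    )

def review_and_run(command: str) -> None:
    # Show the proposed command and require explicit approval before running it.
    print(f"Model proposes: {command}")
    if input("Approve? [y/N] ").strip().lower() != "y":
        print("Rejected; nothing was executed.")
        return
    result = run_sandboxed(command)
    print(result.stdout or result.stderr)

# Example: gate a command the agent suggested during a refactoring session.
review_and_run("pytest -q tests/test_refactor.py")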

How to Get Started with the Model

ollama pull RobiLabs/bleenk:latest
ollama run RobiLabs/bleenk:latest
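
Once pulled, the model can also be queried programmatically. The snippet below is a minimal sketch using the ollama Python client (pip install ollama); it assumes a local Ollama server is running and uses an illustrative prompt.

# Minimal sketch: query the local Ollama build of Bleenk from Python.
# Assumes `ollama pull RobiLabs/bleenk:latest` has already been run.
import ollama

response = ollama.chat(
    model="RobiLabs/bleenk:latest",
    messages=[
        {
            "role": "user",
            "content": "List the files you would inspect first in an unfamiliar Python repository, and why.",
        }
    ],
)

print(response["message"]["content"])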

Training Details

Training Data

The model was trained on a mixture of:

  • Publicly available code repositories
  • Licensed datasets
  • Synthetic data generated for software engineering tasks

Training Procedure

Preprocessing

Data was filtered for quality, deduplicated, and normalized for code and technical text.
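
As a generic illustration of the deduplication step (a sketch only, not Robi Labs' actual preprocessing code), exact duplicates in a code corpus can be dropped by hashing normalized file contents.

# Generic exact-deduplication sketch for a code corpus: hash each document's
# normalized contents and keep only the first occurrence.
import hashlib

def normalize(text: str) -> str:
    # Light normalization: unify line endings and strip trailing whitespace.
    return "\n".join(line.rstrip() for line in text.replace("\r\n", "\n").split("\n"))

def deduplicate(documents: list[str]) -> list[str]:
    seen: set[str] = set()
    unique: list[str] = []
    for doc in documents:
        digest = hashlib.sha256(normalize(doc).encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(doc)
    return unique

corpus = ["print('hi')\n", "print('hi')\r\n", "print('bye')\n"]
print(len(deduplicate(corpus)))  # -> 2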

Training Hyperparameters

  • Training regime: Mixed-precision training (bf16)

Evaluation

Testing Data, Factors & Metrics

Testing Data

  • SWE-bench Verified
  • SWE-bench Multilingual
  • Terminal Bench

Metrics

  • Task success rate
  • Patch correctness
  • Tool execution accuracy

Results

| Model | Size (B params) | SWE-bench Verified | SWE-bench Multilingual | Terminal Bench |
|---|---|---|---|---|
| Bleenk | 123 | 73.2% | 71.3% | 45.5% |
| Devstral 2 | 123 | 72.2% | 61.3% | 40.5% |
| Devstral Small 2 | 24 | 65.8% | 51.6% | 32.0% |
| DeepSeek v3.2 | 671 | 73.1% | 70.2% | 46.4% |
| Kimi K2 Thinking | 1000 | 71.3% | 61.1% | 35.7% |
| MiniMax M2 | 230 | 69.4% | 56.5% | 30.0% |
| GLM 4.6 | 455 | 68.0% | – | 40.5% |
| Qwen 3 Coder Plus | 480 | 69.6% | 54.7% | 37.5% |
| Gemini 3 Pro | – | 76.2% | – | 54.2% |
| Claude Sonnet 4.5 | – | 77.2% | 68.0% | 42.8% |
| GPT 5.1 Codex Max | – | 77.9% | – | 58.1% |
| GPT 5.1 Codex High | – | 73.7% | – | 52.8% |

Environmental Impact

Environmental impact details will be released as measurements are finalized.

Technical Specifications

Model Architecture and Objective

Transformer-based large language model optimized for agentic reasoning and tool usage.

Compute Infrastructure

Hardware

Large-scale GPU/accelerator clusters

Software

Custom training and inference stack developed by Robi Labs

Model Card Authors

Robi Labs Research Team

Model Card Contact

hello@robiai.com

GGUF

  • Model size: 125B params
  • Architecture: mistral3
  • Quantization: 4-bit