Model Card for Bleenk

Model Summary

Bleenk 123B is an agentic large language model developed by Robi Labs for advanced software engineering tasks. The model is optimized for tool-driven workflows, large-scale codebase exploration, coordinated multi-file editing, and powering autonomous and semi-autonomous software engineering agents.

Bleenk is designed for long-horizon reasoning and real-world engineering environments rather than single-turn code generation.

Model Details

Model Description

  • Developed by: Robi Labs
  • Created for: Bleenk
  • Funded by: Robi Labs
  • Shared by: Robi Labs
  • Model type: Agentic Large Language Model (LLM)
  • Language(s) (NLP): Primarily English; supports multilingual code and technical text
  • License: To be released by Robi Labs
  • Finetuned from model: Not disclosed; trained via a proprietary pretraining and fine-tuning pipeline

Model Sources

Uses

Direct Use

  • Software engineering agents
  • AI-powered code assistants
  • Codebase navigation and analysis
  • Multi-file refactoring and maintenance
  • Tool-augmented development workflows (see the sketch after this list)
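
As a concrete illustration of the tool-augmented item above, the sketch below exposes a single hypothetical read-only tool (read_file) to the model through the ollama Python client. The tool name and schema are invented for this example, and whether a given Bleenk build emits structured tool calls through Ollama is an assumption to verify locally with a recent client version.

# Illustrative tool-augmented loop: expose one hypothetical read-only tool
# ("read_file") via the ollama Python client and execute any tool call the
# model returns. Assumes a recent `ollama` package and that the local Bleenk
# build supports structured tool calling; verify before relying on it.
import ollama

def read_file(path: str) -> str:
    # Hypothetical tool implementation: return the contents of a workspace file.
    with open(path, "r", encoding="utf-8") as handle:
        return handle.read()

tools = [
    {
        "type": "function",
        "function": {
            "name": "read_file",
            "description": "Read a UTF-8 text file from the workspace",
            "parameters": {
                "type": "object",
                "properties": {"path": {"type": "string"}},
                "required": ["path"],
            },
        },
    }
]

response = ollama.chat(
    model="RobiLabs/bleenk:latest",
    messages=[{"role": "user", "content": "Summarize what src/app.py does."}],
    tools=tools,
)

# Run whichever tool calls the model requested (may be empty).
for call in response.message.tool_calls or []:
    if call.function.name == "read_file":
        print(read_file(**call.function.arguments))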

Downstream Use

  • Fine-tuning for organization-specific codebases
  • Integration into internal developer platforms
  • Agent frameworks for autonomous engineering

Out-of-Scope Use

  • General-purpose chat or conversational agents
  • High-risk decision-making without human oversight
  • Tasks requiring domain-specific legal, medical, or financial guarantees

Bias, Risks, and Limitations

  • The model may produce incorrect or incomplete code without verification
  • Tool misuse may result in unintended system changes
  • Performance depends on tool availability and prompt quality
  • Trained primarily on publicly available and licensed data, which may encode historical biases

Recommendations

Users should employ strong sandboxing, testing, and human-in-the-loop review when deploying Bleenk in production environments.
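
As a rough illustration of the human-in-the-loop part of that recommendation, agent-proposed commands can be gated behind an explicit approval step before they touch the workspace. The sketch below is generic rather than Bleenk-specific; run_sandboxed and the example pytest command are placeholders, and a real deployment should execute commands inside a container or VM rather than the host shell.

# Illustrative human-in-the-loop gate for agent-proposed shell commands.
# `run_sandboxed` is a placeholder: a real deployment should run commands
# inside a container, jail, or VM with restricted permissions.
import subprocess

def run_sandboxed(command: str) -> subprocess.CompletedProcess:
    # Placeholder sandbox: runs the command in a subprocess with a timeout.
    return subprocess.run(
        command, shell=True, capture_output=True, text=True, timeout=60
    )

def review_and_run(command: str) -> None:
    # Show the proposed command and require explicit approval before running it.
    print(f"Model proposes: {command}")
    if input("Approve? [y/N] ").strip().lower() != "y":
        print("Rejected; nothing was executed.")
        return
    result = run_sandboxed(command)
    print(result.stdout or result.stderr)

# Example: gate a command the agent suggested during a refactoring session.
review_and_run("pytest -q tests/test_refactor.py")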

How to Get Started with the Model

ollama pull RobiLabs/bleenk:latest
ollama run RobiLabs/bleenk:latest
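
Once pulled, the model can also be queried programmatically. The snippet below is a minimal sketch using the ollama Python client (pip install ollama); it assumes a local Ollama server is running and uses an illustrative prompt.

# Minimal sketch: query the local Ollama build of Bleenk from Python.
# Assumes `ollama pull RobiLabs/bleenk:latest` has already been run.
import ollama

response = ollama.chat(
    model="RobiLabs/bleenk:latest",
    messages=[
        {
            "role": "user",
            "content": "List the files you would inspect first in an unfamiliar Python repository, and why.",
        }
    ],
)

print(response["message"]["content"])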

Training Details

Training Data

The model was trained on a mixture of:

  • Publicly available code repositories
  • Licensed datasets
  • Synthetic data generated for software engineering tasks

Training Procedure

Preprocessing

Data was filtered for quality, deduplicated, and normalized for code and technical text.
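
As a generic illustration of the deduplication step (a sketch only, not Robi Labs' actual preprocessing code), exact duplicates in a code corpus can be dropped by hashing normalized file contents.

# Generic exact-deduplication sketch for a code corpus: hash each document's
# normalized contents and keep only the first occurrence.
import hashlib

def normalize(text: str) -> str:
    # Light normalization: unify line endings and strip trailing whitespace.
    return "\n".join(line.rstrip() for line in text.replace("\r\n", "\n").split("\n"))

def deduplicate(documents: list[str]) -> list[str]:
    seen: set[str] = set()
    unique: list[str] = []
    for doc in documents:
        digest = hashlib.sha256(normalize(doc).encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(doc)
    return unique

corpus = ["print('hi')\n", "print('hi')\r\n", "print('bye')\n"]
print(len(deduplicate(corpus)))  # -> 2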

Training Hyperparameters

  • Training regime: Mixed-precision training (bf16)

Evaluation

Testing Data, Factors & Metrics

Testing Data

  • SWE-bench Verified
  • SWE-bench Multilingual
  • Terminal Bench

Metrics

  • Task success rate
  • Patch correctness
  • Tool execution accuracy

Results

| Model | Size (B params) | SWE-bench Verified | SWE-bench Multilingual | Terminal Bench |
|---|---|---|---|---|
| Bleenk | 123 | 73.2% | 71.3% | 45.5% |
| Devstral 2 | 123 | 72.2% | 61.3% | 40.5% |
| Devstral Small 2 | 24 | 65.8% | 51.6% | 32.0% |
| DeepSeek v3.2 | 671 | 73.1% | 70.2% | 46.4% |
| Kimi K2 Thinking | 1000 | 71.3% | 61.1% | 35.7% |
| MiniMax M2 | 230 | 69.4% | 56.5% | 30.0% |
| GLM 4.6 | 455 | 68.0% | – | 40.5% |
| Qwen 3 Coder Plus | 480 | 69.6% | 54.7% | 37.5% |
| Gemini 3 Pro | – | 76.2% | – | 54.2% |
| Claude Sonnet 4.5 | – | 77.2% | 68.0% | 42.8% |
| GPT 5.1 Codex Max | – | 77.9% | – | 58.1% |
| GPT 5.1 Codex High | – | 73.7% | – | 52.8% |

Environmental Impact

Environmental impact details will be released as measurements are finalized.

Technical Specifications

Model Architecture and Objective

Transformer-based large language model optimized for agentic reasoning and tool usage.

Compute Infrastructure

Hardware

Large-scale GPU/accelerator clusters

Software

Custom training and inference stack developed by Robi Labs

Model Card Authors

Robi Labs Research Team

Model Card Contact

hello@robiai.com

GGUF

  • Model size: 125B params
  • Architecture: mistral3
  • Quantization: 4-bit