Upload folder using huggingface_hub

Files changed (3)

README.md +8 -87
config.json +6 -17
model.safetensors +2 -2

README.md CHANGED

@@ -1,94 +1,15 @@
 ---
 license: apache-2.0
 language:
-- en
-pipeline_tag: text-generation
 tags:
-- qwen3
-- reasoning
-- long-context
-- enterprise
-- research
-- conversational
 ---
 # DeepBrainz-R1-4B-16K
-DeepBrainz-R1-4B-16K is a compact, long-context reasoning model in the **DeepBrainz-R series**, designed for structured problem-solving, analysis, and enterprise research workflows.
-The model emphasizes **reasoning quality, instruction robustness, and stability over long contexts**, while remaining efficient to deploy on modern GPU inference runtimes.
----
-## Model Highlights
-- ~4B parameters
-- 16K context length
-- Optimized for reasoning-centric math and coding tasks
-- Designed for modern GPU inference runtimes
-- **Architecture:** Qwen3-compatible (DeepBrainz-R series, post-trained, and optimized for reasoning-centric workloads)
----
-## Intended Use
-- Advanced reasoning systems
-- Math and Coding
-- Research and evaluation
-- Agentic workflows
-- Inference-time scaling and test-time compute experiments
-**Not intended** as a general-purpose chat replacement for large frontier models.
----
-## Usage
-```python
-from transformers import AutoModelForCausalLM, AutoTokenizer
-model_id = "DeepBrainz/DeepBrainz-R1-4B-16K"
-tok = AutoTokenizer.from_pretrained(model_id)
-mdl = AutoModelForCausalLM.from_pretrained(model_id)
-prompt = "Solve step by step: If x + 5 = 12, what is x?"
-inputs = tok(prompt, return_tensors="pt")
-out = mdl.generate(
-    **inputs,
-    max_new_tokens=256,
-    do_sample=True,
-    temperature=0.6,
-    top_p=0.95,
-)
-print(tok.decode(out[0], skip_special_tokens=True))
-```
----
-## Training Summary
-The model was produced using a **multi-stage optimization process** involving large-scale on-policy optimization and **iterative refinement** to improve reasoning quality and robustness.
-Specific training details are intentionally abstracted in this public release.
----
-## Limitations
-Performance depends on task complexity and inference configuration.
-Larger models may outperform R1-4B-16K on extremely complex tasks.
----
-## License
-Apache 2.0
----
-## About DeepBrainz
-DeepBrainz builds reasoning-first AI systems focused on efficiency, structure, and real-world problem-solving.

 ---
 license: apache-2.0
 language:
+  - en
 tags:
+  - deepbrainz
+  - reasoning
+  - 4b
+  - qwen3
 ---
 # DeepBrainz-R1-4B-16K
+**DeepBrainz-R1-4B-16K** is a 4B parameter reasoning model trained by DeepBrainz AI.
+- **Context:** 16,384
+- **Architecture:** Qwen3-4B (Hybrid Sharding Reconstruction)

config.json CHANGED

@@ -1,29 +1,18 @@
 {
-  "architectures": [
-    "Qwen3ForCausalLM"
-  ],
-  "model_type": "qwen3",
   "hidden_size": 2560,
   "intermediate_size": 9728,
   "num_hidden_layers": 36,
   "num_attention_heads": 32,
   "num_key_value_heads": 8,
   "head_dim": 128,
   "max_position_embeddings": 16384,
-  "rms_norm_eps": 1e-06,
-  "rope_theta": 1000000.0,
-  "rope_scaling": null,
-  "attention_bias": false,
-  "attention_dropout": 0.0,
-  "hidden_act": "silu",
-  "initializer_range": 0.02,
-  "tie_word_embeddings": false,
   "torch_dtype": "bfloat16",
-  "transformers_version": "4.45.0",
-  "use_cache": true,
-  "use_sliding_window": false,
-  "vocab_size": 151936,
   "bos_token_id": 151643,
   "eos_token_id": 151645,
-  "pad_token_id": 151643
 }

 {
   "hidden_size": 2560,
   "intermediate_size": 9728,
   "num_hidden_layers": 36,
   "num_attention_heads": 32,
   "num_key_value_heads": 8,
   "head_dim": 128,
+  "vocab_size": 151936,
+  "architectures": [
+    "Qwen3ForCausalLM"
+  ],
+  "model_type": "qwen3",
   "max_position_embeddings": 16384,
   "torch_dtype": "bfloat16",
   "bos_token_id": 151643,
   "eos_token_id": 151645,
+  "tie_word_embeddings": false
 }
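
To make the effect of the `config.json` change easier to review, the sketch below lists the keys that the old file set explicitly and the new file omits. Both dictionaries are transcribed from the diff above; the variable names (`old_config`, `new_config`, `dropped`) are illustrative only. Omitted keys are presumably filled in by whatever defaults the loading library applies, and those defaults may or may not match the removed values, so this is a starting point for a manual check rather than a verdict.

```python
# Old and new config.json contents, transcribed from the diff above.
old_config = {
    "architectures": ["Qwen3ForCausalLM"],
    "model_type": "qwen3",
    "hidden_size": 2560,
    "intermediate_size": 9728,
    "num_hidden_layers": 36,
    "num_attention_heads": 32,
    "num_key_value_heads": 8,
    "head_dim": 128,
    "max_position_embeddings": 16384,
    "rms_norm_eps": 1e-06,
    "rope_theta": 1000000.0,
    "rope_scaling": None,
    "attention_bias": False,
    "attention_dropout": 0.0,
    "hidden_act": "silu",
    "initializer_range": 0.02,
    "tie_word_embeddings": False,
    "torch_dtype": "bfloat16",
    "transformers_version": "4.45.0",
    "use_cache": True,
    "use_sliding_window": False,
    "vocab_size": 151936,
    "bos_token_id": 151643,
    "eos_token_id": 151645,
    "pad_token_id": 151643,
}

new_config = {
    "hidden_size": 2560,
    "intermediate_size": 9728,
    "num_hidden_layers": 36,
    "num_attention_heads": 32,
    "num_key_value_heads": 8,
    "head_dim": 128,
    "vocab_size": 151936,
    "architectures": ["Qwen3ForCausalLM"],
    "model_type": "qwen3",
    "max_position_embeddings": 16384,
    "torch_dtype": "bfloat16",
    "bos_token_id": 151643,
    "eos_token_id": 151645,
    "tie_word_embeddings": False,
}

# Keys present before the commit but absent afterwards, with their old values.
dropped = {k: v for k, v in old_config.items() if k not in new_config}

# Keys that survive the commit but changed value (expected to be empty here).
changed = {k: (old_config[k], v) for k, v in new_config.items()
           if k in old_config and old_config[k] != v}

for key, value in sorted(dropped.items()):
    print(f"dropped: {key} = {value!r}")
print("changed:", changed)
```

Every key kept in the new file carries the same value as before; the difference is entirely in what was removed, including `rope_theta`, `rms_norm_eps`, and `pad_token_id`.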

model.safetensors CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:fd82355faee36c0009ba306e6e795de6062fa87a1f7d3a27b453b3974fa17681
-size 17645743016

 version https://git-lfs.github.com/spec/v1
+oid sha256:d714b8badaee6a4c0e8c196e4bfb1ec226bcaf153e958318b9c57d252a264746
+size 8822894488
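
The new `model.safetensors` is almost exactly half the size of the old one. A rough way to interpret that is to count parameters from the shapes in `config.json` and compare against both file sizes. The sketch below assumes the standard dense Qwen3 layout (bias-free q/k/v/o projections, per-head q/k RMSNorm, a gated MLP with three projections, two RMSNorms per layer, one final norm, untied embeddings); that layout is an assumption, not something the commit states. The helper `qwen3_param_count` is hypothetical and written only for this check.

```python
# Shape values copied from the new config.json in this commit.
cfg = {
    "hidden_size": 2560,
    "intermediate_size": 9728,
    "num_hidden_layers": 36,
    "num_attention_heads": 32,
    "num_key_value_heads": 8,
    "head_dim": 128,
    "vocab_size": 151936,
    "tie_word_embeddings": False,
}

# File sizes in bytes, copied from the Git LFS pointers above.
old_size = 17645743016
new_size = 8822894488


def qwen3_param_count(cfg):
    """Count parameters, ASSUMING the standard dense Qwen3 layout."""
    h = cfg["hidden_size"]
    q_dim = cfg["num_attention_heads"] * cfg["head_dim"]
    kv_dim = cfg["num_key_value_heads"] * cfg["head_dim"]

    attn = h * q_dim + 2 * h * kv_dim + q_dim * h  # q, k, v, o (no bias assumed)
    qk_norm = 2 * cfg["head_dim"]                  # per-head q_norm and k_norm
    mlp = 3 * h * cfg["intermediate_size"]         # gate, up, down projections
    layer_norms = 2 * h                            # input and post-attention norms
    per_layer = attn + qk_norm + mlp + layer_norms

    embed = cfg["vocab_size"] * h
    lm_head = 0 if cfg["tie_word_embeddings"] else cfg["vocab_size"] * h
    final_norm = h
    return cfg["num_hidden_layers"] * per_layer + embed + lm_head + final_norm


n_params = qwen3_param_count(cfg)

# Bytes left over after storing every parameter at a given width;
# a small positive remainder is plausible for the safetensors header.
old_overhead_fp32 = old_size - 4 * n_params
new_overhead_bf16 = new_size - 2 * n_params

print(f"parameters:                {n_params:,}")
print(f"old file minus 4 B/param:  {old_overhead_fp32:,} bytes")
print(f"new file minus 2 B/param:  {new_overhead_bf16:,} bytes")
print(f"old/new size ratio:        {old_size / new_size:.6f}")
```

Under these assumptions the count comes to 4,411,424,256 parameters, leaving roughly 46 KB in each file, which is a plausible header size. That is consistent with the old file storing 4-byte weights and the new file storing 2-byte weights (matching `"torch_dtype": "bfloat16"`), though the commit itself does not say so and the tensor dtypes in the files would need to be inspected to confirm it.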