ArunkumarVR committed
Commit a29defa · verified · 1 Parent(s): d65854b

Upload folder using huggingface_hub

Files changed (3)
  1. README.md +8 -87
  2. config.json +6 -17
  3. model.safetensors +2 -2
README.md CHANGED
@@ -1,94 +1,15 @@
  ---
  license: apache-2.0
  language:
- - en
- pipeline_tag: text-generation
+ - en
  tags:
- - qwen3
- - reasoning
- - long-context
- - enterprise
- - research
- - conversational
+ - deepbrainz
+ - reasoning
+ - 4b
+ - qwen3
  ---
-
  # DeepBrainz-R1-4B-16K

- DeepBrainz-R1-4B-16K is a compact, long-context reasoning model in the **DeepBrainz-R series**, designed for structured problem-solving, analysis, and enterprise research workflows.
-
- The model emphasizes **reasoning quality, instruction robustness, and stability over long contexts**, while remaining efficient to deploy on modern GPU inference runtimes.
-
- ---
-
- ## Model Highlights
-
- - ~4B parameters
- - 16K context length
- - Optimized for reasoning-centric math and coding tasks
- - Designed for modern GPU inference runtimes
- - **Architecture:** Qwen3-compatible (DeepBrainz-R series, post-trained, and optimized for reasoning-centric workloads)
-
- ---
-
- ## Intended Use
-
- - Advanced reasoning systems
- - Math and Coding
- - Research and evaluation
- - Agentic workflows
- - Inference-time scaling and test-time compute experiments
-
- **Not intended** as a general-purpose chat replacement for large frontier models.
-
- ---
-
- ## Usage
-
- ```python
- from transformers import AutoModelForCausalLM, AutoTokenizer
-
- model_id = "DeepBrainz/DeepBrainz-R1-4B-16K"
-
- tok = AutoTokenizer.from_pretrained(model_id)
- mdl = AutoModelForCausalLM.from_pretrained(model_id)
-
- prompt = "Solve step by step: If x + 5 = 12, what is x?"
- inputs = tok(prompt, return_tensors="pt")
-
- out = mdl.generate(
-     **inputs,
-     max_new_tokens=256,
-     do_sample=True,
-     temperature=0.6,
-     top_p=0.95,
- )
-
- print(tok.decode(out[0], skip_special_tokens=True))
- ```
-
- ---
-
- ## Training Summary
-
- The model was produced using a **multi-stage optimization process** involving large-scale on-policy optimization and **iterative refinement** to improve reasoning quality and robustness.
-
- Specific training details are intentionally abstracted in this public release.
-
- ---
-
- ## Limitations
-
- Performance depends on task complexity and inference configuration.
- Larger models may outperform R1-4B-16K on extremely complex tasks.
-
- ---
-
- ## License
-
- Apache 2.0
-
- ---
-
- ## About DeepBrainz
-
- DeepBrainz builds reasoning-first AI systems focused on efficiency, structure, and real-world problem-solving.
+ **DeepBrainz-R1-4B-16K** is a 4B parameter reasoning model trained by DeepBrainz AI.
+ - **Context:** 16,384
+ - **Architecture:** Qwen3-4B (Hybrid Sharding Reconstruction)
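The new README keeps only the headline facts (4B parameters, a 16,384-token context, Qwen3-4B architecture). A minimal sketch of how those claims can be sanity-checked, assuming the standard `transformers` API and the `DeepBrainz/DeepBrainz-R1-4B-16K` repo id from the usage snippet removed above:

```python
from transformers import AutoConfig

# Repo id taken from the usage example in the previous README revision.
model_id = "DeepBrainz/DeepBrainz-R1-4B-16K"

cfg = AutoConfig.from_pretrained(model_id)

# config.json (updated in this same commit, see below) advertises a 16,384-token window.
print(cfg.model_type)               # expected: "qwen3"
print(cfg.max_position_embeddings)  # expected: 16384
```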
config.json CHANGED
@@ -1,29 +1,18 @@
  {
- "architectures": [
-   "Qwen3ForCausalLM"
- ],
- "model_type": "qwen3",
  "hidden_size": 2560,
  "intermediate_size": 9728,
  "num_hidden_layers": 36,
  "num_attention_heads": 32,
  "num_key_value_heads": 8,
  "head_dim": 128,
+ "vocab_size": 151936,
+ "architectures": [
+   "Qwen3ForCausalLM"
+ ],
+ "model_type": "qwen3",
  "max_position_embeddings": 16384,
- "rms_norm_eps": 1e-06,
- "rope_theta": 1000000.0,
- "rope_scaling": null,
- "attention_bias": false,
- "attention_dropout": 0.0,
- "hidden_act": "silu",
- "initializer_range": 0.02,
- "tie_word_embeddings": false,
  "torch_dtype": "bfloat16",
- "transformers_version": "4.45.0",
- "use_cache": true,
- "use_sliding_window": false,
- "vocab_size": 151936,
  "bos_token_id": 151643,
  "eos_token_id": 151645,
- "pad_token_id": 151643
+ "tie_word_embeddings": false
  }
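As a rough cross-check of the "4B" in the model name, the shape fields retained in the trimmed config are enough for a back-of-envelope parameter count. A sketch assuming the standard Qwen3 projection layout (q/k/v/o attention projections, gate/up/down MLP projections, untied embeddings per `tie_word_embeddings: false`) and ignoring the small norm weights:

```python
# Shape fields from the updated config.json above.
hidden, inter, layers = 2560, 9728, 36
heads, kv_heads, head_dim = 32, 8, 128
vocab = 151_936

attn = 2 * hidden * heads * head_dim + 2 * hidden * kv_heads * head_dim  # q/o + k/v projections
mlp = 3 * hidden * inter                                                  # gate, up, down projections
embed = 2 * vocab * hidden                                                # input embeddings + untied lm_head

total = layers * (attn + mlp) + embed
print(f"~{total / 1e9:.2f}B parameters")  # ~4.41B (norm weights excluded)
```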
model.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:fd82355faee36c0009ba306e6e795de6062fa87a1f7d3a27b453b3974fa17681
- size 17645743016
+ oid sha256:d714b8badaee6a4c0e8c196e4bfb1ec226bcaf153e958318b9c57d252a264746
+ size 8822894488
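The safetensors pointer change lines up with that estimate and with the `torch_dtype: bfloat16` entry in config.json: at 2 bytes per bf16 weight the new file holds roughly 4.41B parameters, and the old file is almost exactly twice as large, consistent with the same weights previously stored at 4 bytes per value. A rough check that treats the whole file as weight data and ignores the safetensors header:

```python
new_size, old_size = 8_822_894_488, 17_645_743_016  # bytes, from the LFS pointers above

print(new_size / 2 / 1e9)   # ~4.41 (billion bf16 weights at 2 bytes each)
print(old_size / new_size)  # ~2.0  (consistent with a 4-byte -> 2-byte per-weight change)
```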