EllaNeiman committed · verified
Commit eb2568c · 1 parent: d40bdef

Update README.md

Files changed (1)
  1. README.md +3 -4
README.md CHANGED
````diff
@@ -64,10 +64,10 @@ Unlike most compact models, Jamba Reasoning 3B supports extremely long contexts.
 
 ### **Run the model with vLLM**
 
-For best results, we recommend using vLLM version 0.10.2 or higher and enabling `--mamba-ssm-cache-dtype=float32`
+For best results, we recommend using vLLM version 0.11.0 or higher and enabling `--mamba-ssm-cache-dtype=float32`
 
 ```bash
-pip install vllm>=0.10.2
+pip install vllm>=0.11.0
 ```
 
 Using vllm in online server mode:
@@ -83,10 +83,9 @@ from vllm import LLM, SamplingParams
 from transformers import AutoTokenizer
 
 model = "ai21labs/AI21-Jamba-Reasoning-3B"
-number_gpus = 1
 
 llm = LLM(model=model,
-          tensor_parallel_size=number_gpus,
+          tensor_parallel_size=1,
           mamba_ssm_cache_dtype="float32")
 
 tokenizer = AutoTokenizer.from_pretrained(model)
````
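For reference, the post-commit install and server-mode launch might look like the sketch below. Two caveats: the README's `pip install vllm>=0.11.0` should be quoted in most shells (an unquoted `>=` is parsed as output redirection), and the exact `vllm serve` invocation is an assumption on our part — the diff only shows the version bump and the `--mamba-ssm-cache-dtype=float32` recommendation, not the serve command itself.

```shell
# Install a compatible vLLM; the commit bumps the minimum version to 0.11.0.
# Quotes keep the shell from interpreting ">=" as a redirection.
pip install "vllm>=0.11.0"

# Hypothetical online-server launch (not part of the diff), passing the
# cache-dtype flag the README recommends.
vllm serve ai21labs/AI21-Jamba-Reasoning-3B --mamba-ssm-cache-dtype float32
```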