Rico committed
Commit 7bf5511 · Parent: 5fe5f2a

[UPDATE] update readme and files

Files changed (3)
  1. README.md +5 -15
  2. docs/deploy_guidance.md +210 -0
  3. figures/stepfun-logo.png +0 -0
README.md CHANGED
@@ -1,10 +1,6 @@
- ---
- license: apache-2.0
- library_name: transformers
- ---
  <div align="center">
  <picture>
- <img src="stepfun-logo.png" width="30%" alt="StepFun: Cost-Effective Multimodal Intelligence">
+ <img src="figures/stepfun-logo.png" width="30%" alt="StepFun: Cost-Effective Multimodal Intelligence">
  </picture>
  </div>

@@ -16,14 +12,14 @@ library_name: transformers
  </div>

  <div align="center" style="line-height: 1;">
- <a href="https://github.com/stepfun-ai/Step3" target="_blank"><img alt="Github" src="https://img.shields.io/badge/🤖Github-StepFun-ffc107?color=ffc107&logoColor=white"/></a>
+ <a href="https://github.com/stepfun-ai/Step3" target="_blank"><img alt="GitHub" src="https://img.shields.io/badge/GitHub-StepFun-white?logo=github&logoColor=white"/></a>
  <a href="https://www.modelscope.cn/models/stepfun-ai/step3" target="_blank"><img alt="ModelScope" src="https://img.shields.io/badge/🤖ModelScope-StepFun-ffc107?color=7963eb&logoColor=white"/></a>
  <a href="https://x.com/StepFun_ai" target="_blank"><img alt="Twitter Follow" src="https://img.shields.io/badge/Twitter-StepFun-white?logo=x&logoColor=white"/></a>
  </div>

  <div align="center" style="line-height: 1;">
  <a href="https://discord.com/invite/XHheP5Fn" target="_blank"><img alt="Discord" src="https://img.shields.io/badge/Discord-StepFun-white?logo=discord&logoColor=white"/></a>
- <a href="LICENSE"><img alt="License" src="https://img.shields.io/badge/License-Apache%202.0-blue?&color=blue"/></a>
+ <a href="https://huggingface.co/stepfun-ai/step3/blob/main/LICENSE"><img alt="License" src="https://img.shields.io/badge/License-Apache%202.0-blue?&color=blue"/></a>
  </div>

  <div align="center">
@@ -333,11 +329,6 @@ Note: Parts of the evaluation results are reproduced using the same settings.
  > [!Note]
  > Step3's API is accessible at https://platform.stepfun.com/, where we offer an OpenAI-compatible API for you.

-
- > You can access Step3's API on https://platform.stepfun.com/ , we provide OpenAI/Anthropic-compatible API for you.
- >
-
-
  ### Inference with Hugging Face Transformers

  We introduce how to use our model at the inference stage with the transformers library. It is recommended to use python=3.10, torch>=2.1.0, and transformers=4.54.0 as the development environment. We currently only support bf16 inference, and multi-patch is supported by default. This behavior is aligned with vLLM and SGLang.
@@ -387,7 +378,7 @@ print(decoded)
  ### Inference with vLLM and SGLang


- Our model checkpoints are stored in bf16 and block-fp8 format, you can find it on [Huggingface](https://huggingface.co/stepfun-ai/step3).
+ Our model checkpoints are stored in bf16 and block-fp8 format; you can find them on [Hugging Face](https://huggingface.co/collections/stepfun-ai/step3-688a3d652dbb45d868f9d42d).

  Currently, it is recommended to run Step3 on the following inference engines:

@@ -419,5 +410,4 @@ Both the code repository and the model weights are released under the [Apache Li
  author={StepFun Team},
  url={https://stepfun.ai/research/step3},
  }
- ```
-
+ ```
 
docs/deploy_guidance.md ADDED
# Step3 Model Deployment Guide

This document provides deployment guidance for the Step3 model.

Currently, our open-source deployment guide only covers TP and DP+TP deployment methods. The AFD (Attn-FFN Disaggregated) approach described in our [paper](https://arxiv.org/abs/2507.19427) is still under joint development with the open-source community to achieve optimal performance. Please stay tuned for updates on our open-source progress.

## Overview

Step3 is a 321B-parameter VLM with hardware-aware model-system co-design optimized for minimizing decoding costs.

For our fp8 version, about 326 GB of GPU memory is required.
The smallest deployment unit for this version is 8xH20 with either Tensor Parallelism (TP) or Data Parallelism + Tensor Parallelism (DP+TP).

For our bf16 version, about 642 GB of GPU memory is required.
The smallest deployment unit for this version is 16xH20 with either Tensor Parallelism (TP) or Data Parallelism + Tensor Parallelism (DP+TP).

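As a rough sanity check on these figures, weight storage alone scales with parameter count times bytes per parameter. The sketch below is illustrative only; runtime memory also needs KV cache, activations, and engine overhead, which is why the fp8 figure above is slightly larger than the raw weight size:

```python
# Back-of-the-envelope weight memory for a 321B-parameter model (illustration only;
# actual serving also requires KV cache, activations, and engine overhead).
params = 321e9
print(f"bf16 (2 bytes/param): ~{params * 2 / 1e9:.0f} GB")  # ~642 GB -> 16xH20 (96 GB each)
print(f"fp8  (1 byte/param):  ~{params * 1 / 1e9:.0f} GB")  # ~321 GB + block scales -> 8xH20
```
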
## Deployment Options

### vLLM Deployment

Please make sure to use the nightly version of vLLM. For details, refer to the [vLLM nightly installation doc](https://docs.vllm.ai/en/latest/getting_started/installation/gpu.html#pre-built-wheels).

```bash
uv pip install -U vllm \
    --torch-backend=auto \
    --extra-index-url https://wheels.vllm.ai/nightly
```

We recommend using the following commands to deploy the model:

**`max_num_batched_tokens` (vLLM's `--max-num-batched-tokens` flag) should be larger than 4096. If not set, the default value is 8192.**

#### BF16 Model
##### Tensor Parallelism (Serving on 16xH20)

```bash
# start ray on node 0 and node 1

# node 0:
vllm serve /path/to/step3 \
    --tensor-parallel-size 16 \
    --reasoning-parser step3 \
    --enable-auto-tool-choice \
    --tool-call-parser step3 \
    --trust-remote-code \
    --port $PORT_SERVING
```

##### Data Parallelism + Tensor Parallelism (Serving on 16xH20)
Step3 has only a single KV head, so attention data parallelism can be adopted to reduce KV cache memory usage.

```bash
# start ray on node 0 and node 1

# node 0:
vllm serve /path/to/step3 \
    --data-parallel-size 16 \
    --tensor-parallel-size 1 \
    --reasoning-parser step3 \
    --enable-auto-tool-choice \
    --tool-call-parser step3 \
    --trust-remote-code
```

#### FP8 Model
##### Tensor Parallelism (Serving on 8xH20)

```bash
vllm serve /path/to/step3-fp8 \
    --tensor-parallel-size 8 \
    --reasoning-parser step3 \
    --enable-auto-tool-choice \
    --tool-call-parser step3 \
    --gpu-memory-utilization 0.85 \
    --trust-remote-code
```

##### Data Parallelism + Tensor Parallelism (Serving on 8xH20)

```bash
vllm serve /path/to/step3-fp8 \
    --data-parallel-size 8 \
    --tensor-parallel-size 1 \
    --reasoning-parser step3 \
    --enable-auto-tool-choice \
    --tool-call-parser step3 \
    --trust-remote-code
```

##### Key parameter notes

* `reasoning-parser`: If enabled, reasoning content in the response will be parsed into a structured format.
* `tool-call-parser`: If enabled, tool call content in the response will be parsed into a structured format.

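Whichever configuration you choose, a quick way to confirm the server has finished loading is to query the OpenAI-compatible `/v1/models` endpoint. This is a minimal sketch, not part of the official guide; adjust the host and port to your deployment (the `vllm serve` default port is 8000):

```python
# Readiness check against the OpenAI-compatible server (sketch only).
import json
import urllib.request

with urllib.request.urlopen("http://localhost:8000/v1/models") as resp:
    models = json.load(resp)

print([m["id"] for m in models["data"]])  # should list the served model
```
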
### SGLang Deployment

SGLang 0.4.10 or later is required.

```bash
pip3 install "sglang[all]>=0.4.10"
```

#### BF16 Model
##### Tensor Parallelism (Serving on 16xH20)

```bash
# start ray on node 0 and node 1

# node 0:
python -m sglang.launch_server \
    --model-path /path/to/step3 \
    --trust-remote-code \
    --tool-call-parser step3 \
    --reasoning-parser step3 \
    --tp 16
```

#### FP8 Model
##### Tensor Parallelism (Serving on 8xH20)

```bash
python -m sglang.launch_server \
    --model-path /path/to/step3-fp8 \
    --trust-remote-code \
    --tool-call-parser step3 \
    --reasoning-parser step3-fp8 \
    --tp 8
```

### TensorRT-LLM Deployment

[Coming soon...]

## Client Request Examples

Once the server is up, you can use the chat API as below:

```python
from openai import OpenAI

# Set OpenAI's API key and API base to use vLLM's API server.
openai_api_key = "EMPTY"
openai_api_base = "http://localhost:8000/v1"

client = OpenAI(
    api_key=openai_api_key,
    base_url=openai_api_base,
)

chat_response = client.chat.completions.create(
    model="step3",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {
            "role": "user",
            "content": [
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "https://xxxxx.png"
                    },
                },
                {"type": "text", "text": "Please describe the image."},
            ],
        },
    ],
)
print("Chat response:", chat_response)
```
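
The full response object is verbose; in most cases you only need a few fields. The follow-up sketch below continues the example above (field names follow the OpenAI-compatible schema; `reasoning_content` and `tool_calls` are only populated when the server runs with the corresponding parsers enabled):

```python
# Extract the commonly used fields from chat_response above (sketch only).
message = chat_response.choices[0].message
print(message.content)                              # the assistant's reply text
print(getattr(message, "reasoning_content", None))  # parsed reasoning, if --reasoning-parser is set
print(message.tool_calls)                           # parsed tool calls, if any
```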

You can also upload base64-encoded local images:

```python
import base64
from openai import OpenAI

# Set OpenAI's API key and API base to use vLLM's API server.
openai_api_key = "EMPTY"
openai_api_base = "http://localhost:8000/v1"

client = OpenAI(
    api_key=openai_api_key,
    base_url=openai_api_base,
)

image_path = "/path/to/local/image.png"
with open(image_path, "rb") as f:
    encoded_image = base64.b64encode(f.read())
encoded_image_text = encoded_image.decode("utf-8")
base64_step = f"data:image;base64,{encoded_image_text}"

chat_response = client.chat.completions.create(
    model="step3",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {
            "role": "user",
            "content": [
                {
                    "type": "image_url",
                    "image_url": {
                        "url": base64_step
                    },
                },
                {"type": "text", "text": "Please describe the image."},
            ],
        },
    ],
)
print("Chat response:", chat_response)
```

Note: In our image preprocessing pipeline, we implement a multi-patch mechanism to handle large images. If the input image exceeds 728x728 pixels, the system will automatically apply cropping logic to split the image into patches.
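
If you want a particular request to stay within a single patch, one option is to downscale the image client-side so it fits within 728x728 before encoding it. The helper below is a hypothetical client-side sketch (it assumes Pillow is installed and does not change the server-side pipeline):

```python
# Optional client-side downscale so the image fits within 728x728 (hypothetical helper).
import base64
import io

from PIL import Image

def encode_within_limit(path: str, limit: int = 728) -> str:
    img = Image.open(path).convert("RGB")
    img.thumbnail((limit, limit))  # shrinks in place, preserving aspect ratio
    buf = io.BytesIO()
    img.save(buf, format="PNG")
    return "data:image;base64," + base64.b64encode(buf.getvalue()).decode("utf-8")
```
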
figures/stepfun-logo.png ADDED