LLaVA Deployment
2024-12-11

Deploying the multimodal large model LLaVA

This post uses the official LLaVA project for deployment. The repository already includes a Gradio-based web UI, so overall setup is fairly straightforward; below I record the bugs I ran into.

Official repository: https://github.com/haotian-liu/LLaVA

The overall LLaVA serving architecture is shown in the diagram below (taken from the official project):

```mermaid
flowchart BT
    %% Declare Nodes
    gws("Gradio (UI Server)")
    c("Controller (API Server):<br/>PORT: 10000")
    mw7b("Model Worker:<br/>llava-v1.5-7b<br/>PORT: 40000")
    mw13b("Model Worker:<br/>llava-v1.5-13b<br/>PORT: 40001")
    sglw13b("SGLang Backend:<br/>llava-v1.6-34b<br/>http://localhost:30000")
    lsglw13b("SGLang Worker:<br/>llava-v1.6-34b<br/>PORT: 40002")

    %% Declare Styles
    classDef data fill:#3af,stroke:#48a,stroke-width:2px,color:#444
    classDef success fill:#8f8,stroke:#0a0,stroke-width:2px,color:#444
    classDef failure fill:#f88,stroke:#f00,stroke-width:2px,color:#444

    %% Assign Styles
    class id,od data;
    class cimg,cs_s,scsim_s success;
    class ncimg,cs_f,scsim_f failure;

    subgraph Demo Connections
        direction BT
        c<-->gws

        mw7b<-->c
        mw13b<-->c
        lsglw13b<-->c
        sglw13b<-->lsglw13b
    end
```

This post covers three parts: the Gradio web server, the LLaVA controller, and the model workers.

git clone and install environment
```bash
git clone https://github.com/haotian-liu/LLaVA.git
cd LLaVA
```
```bash
conda create -n llava python=3.10 -y
conda activate llava
pip install --upgrade pip  # enable PEP 660 support
pip install -e .
```
If you also plan to train, additionally install the training dependencies:

```bash
pip install -e ".[train]"
pip install flash-attn --no-build-isolation
```
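Before starting any servers, a quick sanity check of the environment can save time. A minimal sketch; it only verifies that the `llava` package imports and that PyTorch can see a GPU:

```bash
# Should print the torch version and "True" on a working GPU machine.
python -c "import llava, torch; print(torch.__version__, torch.cuda.is_available())"
```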
LLaVA controller
```bash
python -m llava.serve.controller --host 0.0.0.0 --port 10000
```
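Once the controller is up, you can query it directly to see which workers have registered. A sketch, assuming the controller exposes the FastChat-style POST endpoints `/refresh_all_workers` and `/list_models`:

```bash
# Ask the controller to re-poll its workers, then list registered models.
curl -s -X POST http://localhost:10000/refresh_all_workers
curl -s -X POST http://localhost:10000/list_models
# Before any worker has started, the model list should be empty.
```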
Gradio web server
```bash
python -m llava.serve.gradio_web_server --host 0.0.0.0 --port 6006 \
    --controller http://localhost:10000 --model-list-mode reload
```
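The controller and the web UI both need to stay up while you launch the workers. A sketch for keeping them alive after the SSH session closes, using `nohup` (tmux or systemd would work just as well):

```bash
# Run controller and web UI in the background, logging to files.
nohup python -m llava.serve.controller --host 0.0.0.0 --port 10000 \
    > controller.log 2>&1 &
nohup python -m llava.serve.gradio_web_server --host 0.0.0.0 --port 6006 \
    --controller http://localhost:10000 --model-list-mode reload \
    > gradio.log 2>&1 &
```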
workers

Key parameter: CUDA_VISIBLE_DEVICES
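`CUDA_VISIBLE_DEVICES` controls which physical GPUs a process can see; the visible devices are renumbered starting from `cuda:0`. A quick way to check the effect:

```bash
# The process sees only physical GPU 1, exposed to PyTorch as cuda:0.
CUDA_VISIBLE_DEVICES=1 python -c "import torch; print(torch.cuda.device_count())"  # prints 1
```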

1. Single GPU, single model

```bash
CUDA_VISIBLE_DEVICES=0 python -m llava.serve.model_worker --host 0.0.0.0 \
    --controller http://localhost:10000 --port 40000 \
    --worker http://localhost:40000 \
    --model-path /home/zhijiao/MLM/llava/models/llava-v1.6-vicuna-7b
```
2. Multiple GPUs, multiple models (one worker per GPU, each on its own port; a combined launcher sketch follows this list)

```bash
# GPU 0: first model worker
CUDA_VISIBLE_DEVICES=0 python -m llava.serve.model_worker --host 0.0.0.0 \
    --controller http://localhost:10000 --port 40000 \
    --worker http://localhost:40000 \
    --model-path /root/autodl-tmp/models/llava-v1.6-vicuna-7b
```
```bash
# GPU 1: second model worker
CUDA_VISIBLE_DEVICES=1 python -m llava.serve.model_worker --host 0.0.0.0 \
    --controller http://localhost:10000 --port 40001 \
    --worker http://localhost:40001 \
    --model-path /root/autodl-tmp/models/llava-v1.5-7b
```
3. Multiple GPUs, single model

```bash
CUDA_VISIBLE_DEVICES=0,1 python -m llava.serve.model_worker --host 0.0.0.0 \
    --controller http://localhost:10000 --port 40000 \
    --worker http://localhost:40000 \
    --model-path /root/autodl-tmp/models/llava-v1.6-vicuna-7b
```
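For the multi-worker case, a small launcher sketch that starts one worker per GPU on consecutive ports (the model path is the one assumed above; for different models per GPU, swap in a per-GPU path):

```bash
MODEL=/root/autodl-tmp/models/llava-v1.6-vicuna-7b
for i in 0 1; do
  PORT=$((40000 + i))
  # One worker per GPU, each registering itself with the controller.
  CUDA_VISIBLE_DEVICES=$i nohup python -m llava.serve.model_worker --host 0.0.0.0 \
      --controller http://localhost:10000 --port $PORT \
      --worker http://localhost:$PORT \
      --model-path $MODEL > worker_$i.log 2>&1 &
done
```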
Bugs
  1. The first time a worker is launched, it automatically downloads the model checkpoints from Hugging Face, so a proxy is needed at this step.
  2. When running the worker command with a global proxy enabled, you may see an error like the following:

```
packages/fastchat/serve/base_model_worker.py", line 97, in register_to_controller
2024-06-04 16:34:51 | ERROR | stderr | assert r.status_code == 200
2024-06-04 16:34:51 | ERROR | stderr | ^^^^^^^^^^^^^^^^^^^^
2024-06-04 16:34:51 | ERROR | stderr | AssertionError
```

  Fix: disable the proxy, since the worker must reach the controller on localhost directly (see the sketch after this list for a per-command workaround).
  3. When opening the Gradio web UI, an error like the following may pop up in the top-right corner:

```
Error
Could not parse server response: SyntaxError:
Unexpected token 'l', "Internal s"... is not valid
JSON
```

  Fix: the installed fastapi version is too new; downgrade it:

```bash
pip install fastapi==0.111.0
```
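For bug 2, instead of disabling the proxy globally, the proxy variables can be cleared for the worker process alone (the variable names below are the common ones; adjust to your shell setup):

```bash
# Clear proxy variables for this process only, then launch the worker.
env -u http_proxy -u https_proxy -u HTTP_PROXY -u HTTPS_PROXY -u all_proxy \
    CUDA_VISIBLE_DEVICES=0 python -m llava.serve.model_worker --host 0.0.0.0 \
    --controller http://localhost:10000 --port 40000 \
    --worker http://localhost:40000 \
    --model-path /root/autodl-tmp/models/llava-v1.6-vicuna-7b
```

This keeps the proxy available in the rest of the session, e.g. for the checkpoint download mentioned in bug 1.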
Preview

(screenshot of the LLaVA Gradio web UI)