部署指南

阅读时间: 约 20 分钟 适用人群: 运维工程师、DevOps

概述

本文提供 UCM 在生产环境中的部署指南，包括单机部署、分布式部署和容器化部署。

1. 部署准备

1.1 系统要求

组件	最低要求	推荐配置
操作系统	Linux (Ubuntu 20.04+)	Ubuntu 22.04 LTS
CPU	8 核	32+ 核
内存	32 GB	128+ GB
GPU	1x 16GB 显存	多卡 A100/H100
存储	100 GB SSD	NVMe SSD 1TB+
网络	1 Gbps	25 Gbps+ (分布式)

1.2 软件依赖

Python >= 3.10
PyTorch >= 2.0
vLLM == 0.9.2

CUDA >= 11.8
cuDNN >= 8.6
# 可选依赖
prometheus_client  # 监控
grafana           # 可视化

2. 安装 UCM

2.1 pip 安装

# 基础安装
pip install ucm

pip install ucm[full]

2.2 源码安装

git clone https://github.com/your-org/unified-cache-management.git
cd unified-cache-management
# 设置平台
export PLATFORM=cuda  # 或 ascend, musa, maca

pip install -e .

2.3 Docker 安装

FROM nvidia/cuda:12.1-devel-ubuntu22.04

RUN apt-get update && apt-get install -y python3.10 python3-pip

COPY . /app
WORKDIR /app
RUN pip install -e .

EXPOSE 8000 9090

CMD ["python", "-m", "vllm.entrypoints.openai.api_server"]

3. 配置文件

3.1 基础配置

创建 ucm_config.yaml:


ucm_connectors:
  - ucm_connector_name: "UcmPipelineStore"
    ucm_connector_config:
      # Pipeline 类型
      store_pipeline: "Cache|Posix"

      # 存储路径
      storage_backends: "/data/ucm_cache"

      # Pinned Memory 缓冲区
      buffer_number: 2048

      # Block 大小
      block_size: 16

      # Direct I/O
      io_direct: true

ucm_sparse_config:
  ESA:
    sparse_ratio: 0.3
    local_window_sz: 2
    min_blocks: 4

metrics_config_path: "./metrics_config.yaml"

load_only_first_rank: false

3.2 高级配置

ucm_connectors:
  - ucm_connector_name: "UcmPipelineStore"
    ucm_connector_config:
      store_pipeline: "Cache|NFS"
      storage_backends:
        - "/mnt/nfs/ucm_cache"
        - "/mnt/nfs/ucm_cache_backup"
      buffer_number: 4096
      io_direct: true

      # NFS 特定配置
      nfs_config:
        retry_count: 3
        timeout_seconds: 30

multi_gpu:
  enabled: true
  strategy: "round_robin"

monitoring:
  prometheus:
    enabled: true
    port: 9090
  logging:
    level: INFO
    file: "/var/log/ucm/ucm.log"

4. 单机部署

4.1 启动脚本

创建 start_server.sh:

#!/bin/bash

export PLATFORM=cuda
export UNIFIED_CACHE_LOG_LEVEL=INFO
export CUDA_VISIBLE_DEVICES=0,1,2,3

MODEL_PATH="/models/llama-7b"
UCM_CONFIG="./ucm_config.yaml"
# 启动 vLLM + UCM
python -m vllm.entrypoints.openai.api_server \
    --model $MODEL_PATH \
    --kv-connector "UCMConnector" \
    --kv-connector-module-path "ucm.integration.vllm.ucm_connector" \
    --kv-role "kv_both" \
    --kv-connector-extra-config "{\"UCM_CONFIG_FILE\": \"$UCM_CONFIG\"}" \
    --tensor-parallel-size 4 \
    --max-model-len 8192 \
    --gpu-memory-utilization 0.8 \
    --port 8000

4.2 运行服务

chmod +x start_server.sh
./start_server.sh

4.3 验证部署

curl http://localhost:8000/health

curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama-7b",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'

5. 分布式部署

5.1 架构

graph TB subgraph clients["客户端"] C1["Client 1"] C2["Client 2"] C3["Client N"] end subgraph lb["负载均衡"] LB["Load Balancer
(Nginx/HAProxy)"] end subgraph workers["推理节点"] W1["Worker 1
GPU 0-3"] W2["Worker 2
GPU 0-3"] W3["Worker N
GPU 0-3"] end subgraph storage["共享存储"] NFS["NFS Server
KV Cache 存储"] end C1 --> LB C2 --> LB C3 --> LB LB --> W1 LB --> W2 LB --> W3 W1 --> NFS W2 --> NFS W3 --> NFS

5.2 NFS 服务器配置

# /etc/exports
/data/ucm_cache  *(rw,sync,no_subtree_check,no_root_squash)
# 启动 NFS
sudo systemctl start nfs-kernel-server

5.3 Worker 节点配置

ucm_connectors:
  - ucm_connector_name: "UcmPipelineStore"
    ucm_connector_config:
      store_pipeline: "Cache|NFS"
      storage_backends: "192.168.1.100:/data/ucm_cache"
      buffer_number: 4096

5.4 负载均衡配置

# nginx.conf
upstream ucm_backend {
    least_conn;
    server worker1:8000;
    server worker2:8000;
    server worker3:8000;
}
server {
    listen 80;
    location / {
        proxy_pass http://ucm_backend;
        proxy_http_version 1.1;
        proxy_set_header Connection "";
    }
}

6. Kubernetes 部署

6.1 Deployment

apiVersion: apps/v1
kind: Deployment
metadata:
  name: ucm-server
spec:
  replicas: 3
  selector:
    matchLabels:
      app: ucm
  template:
    metadata:
      labels:
        app: ucm
    spec:
      containers:
      - name: ucm
        image: ucm:latest
        ports:
        - containerPort: 8000
        - containerPort: 9090
        resources:
          limits:
            nvidia.com/gpu: 4
        volumeMounts:
        - name: cache-storage
          mountPath: /data/ucm_cache
        - name: config
          mountPath: /app/config
        env:
        - name: UNIFIED_CACHE_LOG_LEVEL
          value: "INFO"
      volumes:
      - name: cache-storage
        persistentVolumeClaim:
          claimName: ucm-cache-pvc
      - name: config
        configMap:
          name: ucm-config

6.2 Service

apiVersion: v1
kind: Service
metadata:
  name: ucm-service
spec:
  type: LoadBalancer
  ports:
  - name: api
    port: 8000
    targetPort: 8000
  - name: metrics
    port: 9090
    targetPort: 9090
  selector:
    app: ucm

6.3 ConfigMap

apiVersion: v1
kind: ConfigMap
metadata:
  name: ucm-config
data:
  ucm_config.yaml: |
    ucm_connectors:
      - ucm_connector_name: "UcmPipelineStore"
        ucm_connector_config:
          store_pipeline: "Cache|Posix"
          storage_backends: "/data/ucm_cache"
          buffer_number: 2048

7. 多平台部署

7.1 NVIDIA GPU

export PLATFORM=cuda
export CUDA_VISIBLE_DEVICES=0,1,2,3
pip install -e .

7.2 Huawei Ascend NPU

export PLATFORM=ascend
export ASCEND_VISIBLE_DEVICES=0,1,2,3
pip install -e .

7.3 平台特定配置

platform:
  type: ascend
  device_ids: [0, 1, 2, 3]

ucm_connectors:
  - ucm_connector_name: "UcmPipelineStore"
    ucm_connector_config:
      store_pipeline: "Cache|Posix"
      storage_backends: "/data/ucm_cache"
      # Ascend 特定配置
      ascend_config:
        enable_hccl: true

8. 生产环境清单

8.1 部署前检查

系统要求满足
GPU 驱动正确安装
CUDA/cuDNN 版本兼容
存储空间充足
网络配置正确

8.2 配置检查

配置文件语法正确
存储路径存在且可写
端口未被占用
权限设置正确

8.3 监控检查

Prometheus 指标可访问
日志正常输出
告警规则已配置
Grafana 仪表板可用

2026年2月24日

GitHub

基础设施调试指南

配置参考手册

Ucm

Inference Cookbook

Title here

部署指南

概述

1. 部署准备

1.1 系统要求

1.2 软件依赖

2. 安装 UCM

2.1 pip 安装

2.2 源码安装

2.3 Docker 安装

3. 配置文件

3.1 基础配置

3.2 高级配置

4. 单机部署

4.1 启动脚本

4.2 运行服务

4.3 验证部署

5. 分布式部署

5.1 架构

5.2 NFS 服务器配置

5.3 Worker 节点配置

5.4 负载均衡配置

6. Kubernetes 部署

6.1 Deployment

6.2 Service

6.3 ConfigMap

7. 多平台部署

7.1 NVIDIA GPU

7.2 Huawei Ascend NPU

7.3 平台特定配置

8. 生产环境清单

8.1 部署前检查

8.2 配置检查

8.3 监控检查

部署指南

概述#

1. 部署准备#

1.1 系统要求#

1.2 软件依赖#

2. 安装 UCM#

2.1 pip 安装#

2.2 源码安装#

2.3 Docker 安装#

3. 配置文件#

3.1 基础配置#

3.2 高级配置#

4. 单机部署#

4.1 启动脚本#

4.2 运行服务#

4.3 验证部署#

5. 分布式部署#

5.1 架构#

5.2 NFS 服务器配置#

5.3 Worker 节点配置#

5.4 负载均衡配置#

6. Kubernetes 部署#

6.1 Deployment#

6.2 Service#

6.3 ConfigMap#

7. 多平台部署#

7.1 NVIDIA GPU#

7.2 Huawei Ascend NPU#

7.3 平台特定配置#

8. 生产环境清单#

8.1 部署前检查#

8.2 配置检查#

8.3 监控检查#

概述

1. 部署准备

1.1 系统要求

1.2 软件依赖

2. 安装 UCM

2.1 pip 安装

2.2 源码安装

2.3 Docker 安装

3. 配置文件

3.1 基础配置

3.2 高级配置

4. 单机部署

4.1 启动脚本

4.2 运行服务

4.3 验证部署

5. 分布式部署

5.1 架构

5.2 NFS 服务器配置

5.3 Worker 节点配置

5.4 负载均衡配置

6. Kubernetes 部署

6.1 Deployment

6.2 Service

6.3 ConfigMap

7. 多平台部署

7.1 NVIDIA GPU

7.2 Huawei Ascend NPU

7.3 平台特定配置

8. 生产环境清单

8.1 部署前检查

8.2 配置检查

8.3 监控检查