配置参考手册

阅读时间: 约 15 分钟 前置要求: 部署指南

概述

本文档提供 UCM 所有配置项的完整参考，包括 YAML 配置、环境变量和运行时配置。

1. 配置文件结构

1.1 YAML 配置文件

# ucm_config.yaml - UCM 完整配置模板
# ========================================
# ========================================
ucm_connectors:
  - ucm_connector_name: "UcmPipelineStore"
    ucm_connector_config:
      # Pipeline 配置
      store_pipeline: "Cache|Posix"
      # 存储路径
      storage_backends: "/data/ucm_cache"

      # 缓冲池配置
      buffer_number: 2048
      # Block 配置
      block_size: 16
      shard_size: 2048
      tensor_size: 65536

# 稀疏注意力配置
# ========================================
ucm_sparse_method: "GSA"

ucm_sparse_config:
  GSA:
    sparse_ratio: 0.3
    local_window_sz: 2
    init_window_sz: 1
    min_blocks: 4
    prefetch_workers: 4

# 监控配置
# ========================================
metrics:
  enabled: true
  prometheus_port: 9090

# 日志配置
# ========================================
logging:
  level: "INFO"
  format: "[%(asctime)s] [%(name)s] [%(levelname)s] %(message)s"

1.2 配置优先级

graph TB subgraph priority["配置优先级（从高到低）"] P1["1. 运行时参数"] P2["2. 环境变量"] P3["3. 配置文件"] P4["4. 默认值"] P1 --> P2 --> P3 --> P4 end

2. 存储连接器配置

2.1 通用配置

参数	类型	默认值	说明
`ucm_connector_name`	string	-	连接器名称
`storage_backends`	string/list	-	存储路径
`store_pipeline`	string	“Posix”	Pipeline 组合
`block_size`	int	16	Block 大小（Token 数）
`shard_size`	int	2048	分片大小
`tensor_size`	int	65536	张量大小

2.2 Cache 层配置

ucm_connector_config:
  # 缓冲池大小（必须 >= 1024）
  buffer_number: 2048

  # LRU 缓存容量
  cache_capacity: 10000

  # 预取配置
  prefetch_enabled: true
  prefetch_depth: 4

参数	类型	默认值	说明
`buffer_number`	int	1024	Pinned Buffer 数量
`cache_capacity`	int	10000	LRU 缓存容量
`prefetch_enabled`	bool	true	启用预取
`prefetch_depth`	int	4	预取深度

2.3 Posix 后端配置

ucm_connector_config:
  storage_backends: "/data/ucm_cache"

  # 文件配置
  file_mode: 0644
  dir_mode: 0755

  # I/O 配置
  use_direct_io: true
  io_block_size: 4096

参数	类型	默认值	说明
`storage_backends`	string	-	存储路径
`use_direct_io`	bool	true	使用 Direct I/O
`io_block_size`	int	4096	I/O 块大小

2.4 NFS 后端配置

ucm_connector_config:
  storage_backends:
    - "/mnt/nfs1/ucm_cache"
    - "/mnt/nfs2/ucm_cache"

  # 路径选择策略
  path_strategy: "round_robin"  # round_robin, random, tiered

  # NFS 特定配置
  nfs_rsize: 1048576
  nfs_wsize: 1048576

2.5 DS3FS 后端配置

ucm_connector_config:
  # S3 配置
  s3_endpoint: "https://s3.amazonaws.com"
  storage_backends: "ucm-cache-bucket"
  s3_access_key: "${AWS_ACCESS_KEY_ID}"
  s3_secret_key: "${AWS_SECRET_ACCESS_KEY}"

  # 优化配置
  use_io_uring: true
  connection_pool_size: 32
  max_concurrent_requests: 64

  # 分块上传
  multipart_threshold: 8388608
  multipart_chunk_size: 8388608

2.6 Mooncake 后端配置

ucm_connector_config:
  cluster_endpoints:
    - "mooncake-1.local:9100"
    - "mooncake-2.local:9100"

  # RDMA 配置
  use_rdma: true
  rdma_device: "mlx5_0"

  # 复制因子
  replication_factor: 2

3. Pipeline 配置

3.1 Pipeline 语法

store_pipeline: "<Layer1>|<Layer2>|..."

3.2 支持的组合

Pipeline	说明	适用场景
`Posix`	仅本地存储	单机测试
`Cache`	仅内存缓存	临时缓存
`Cache\|Posix`	缓存 + 本地	单机生产
`Cache\|NFS`	缓存 + NFS	企业内网
`Cache\|Ds3fs`	缓存 + S3	云环境
`Cache\|Mooncake`	缓存 + 分布式	高性能集群

3.3 多级 Pipeline

# 三级 Pipeline
store_pipeline: "Cache|NFS|Ds3fs"

# 1. 首先查找内存缓存
# 2. 未命中则查找 NFS

4. 稀疏注意力配置

4.1 方法选择

# 可选值: GSA, ESA, Blend, KVStar, RERoPE
ucm_sparse_method: "GSA"

4.2 GSA 配置

ucm_sparse_config:
  GSA:
    # 稀疏比例
    sparse_ratio: 0.3
    # 窗口配置
    local_window_sz: 2
    init_window_sz: 1
    # 最小 Block 数
    min_blocks: 4

    # 预取配置
    prefetch_workers: 4
    max_pending_prefetch: 64
    prefetch_ahead: 2

4.3 ESA 配置

ucm_sparse_config:
  ESA:
    sparse_ratio: 0.3
    local_window_sz: 2
    init_window_sz: 1
    min_blocks: 4

    # 检索配置
    retrieval_stride: 5
    retrieval_workers: 4
    retrieval_cache_size: 10000

4.4 Blend 配置

ucm_sparse_config:
  Blend:
    # Chunk 配置
    blend_chunk_size: 64
    min_match_ratio: 0.8
    # 位置校正
    position_correction: true
    recompute_boundary: 2
    # RoPE 配置
    rope_theta: 10000.0

4.5 KVStar 配置

ucm_sparse_config:
  KVStar:
    checkpoint_interval: 8
    lock_duration: 16
    update_threshold: 2

    # 底层方法
    base_method: "GSA"

4.6 RERoPE 配置

ucm_sparse_config:
  RERoPE:
    max_position: 4096
    extend_factor: 2.0
    rope_theta: 10000.0
    # 底层方法
    base_method: "GSA"

5. 环境变量

5.1 构建环境变量

变量	默认值	说明
`PLATFORM`	cuda	硬件平台
`ENABLE_SPARSE`	false	启用稀疏模块

5.2 运行时环境变量

变量	默认值	说明
`UNIFIED_CACHE_LOG_LEVEL`	INFO	日志级别
`UCM_CONFIG_FILE`	-	配置文件路径
`UCM_DISABLE_PATCHES`	0	禁用补丁
`UCM_PATCH_VERBOSE`	0	补丁详细日志

5.3 监控环境变量

变量	默认值	说明
`UCM_METRICS_ENABLED`	true	启用监控
`UCM_METRICS_PORT`	9090	Prometheus 端口

5.4 传输层环境变量

变量	默认值	说明
`UCM_PINNED_POOL_SIZE`	1073741824	Pinned 池大小
`UCM_TRANSPORT_STREAMS`	2	传输流数量

6. vLLM 集成配置

6.1 KVTransferConfig

from vllm.config import KVTransferConfig

ktc = KVTransferConfig(
    # Connector 类名
    kv_connector="UCMConnector",

    # 模块路径
    kv_connector_module_path="ucm.integration.vllm.ucm_connector",
    # 角色
    kv_role="kv_both",  # kv_producer, kv_consumer, kv_both

    # 额外配置
    kv_connector_extra_config={
        "UCM_CONFIG_FILE": "./ucm_config.yaml"
    }
)

6.2 完整 vLLM 示例

from vllm import LLM
from vllm.config import KVTransferConfig
# UCM 配置
ktc = KVTransferConfig(
    kv_connector="UCMConnector",
    kv_connector_module_path="ucm.integration.vllm.ucm_connector",
    kv_role="kv_both",
    kv_connector_extra_config={
        "UCM_CONFIG_FILE": "./ucm_config.yaml",
        # 或直接配置
        "store_pipeline": "Cache|Posix",
        "storage_backends": "/data/ucm_cache",
        "buffer_number": 2048,
    }
)
# 创建 LLM
llm = LLM(
    model="meta-llama/Llama-2-7b-hf",
    tensor_parallel_size=1,
    kv_transfer_config=ktc,
)

7. 配置验证

7.1 验证脚本

#!/usr/bin/env python
"""验证 UCM 配置"""

import yaml
from pathlib import Path


def validate_config(config_path: str):
    """验证配置文件"""
    with open(config_path) as f:
        config = yaml.safe_load(f)

    errors = []
    warnings = []

    # 检查必需字段
    if 'ucm_connectors' not in config:
        errors.append("Missing 'ucm_connectors' field")
    else:
        for i, conn in enumerate(config['ucm_connectors']):
            if 'ucm_connector_name' not in conn:
                errors.append(f"Connector {i}: missing 'ucm_connector_name'")
            if 'ucm_connector_config' not in conn:
                errors.append(f"Connector {i}: missing 'ucm_connector_config'")
            else:
                cfg = conn['ucm_connector_config']
                if 'storage_backends' not in cfg:
                    errors.append(f"Connector {i}: missing 'storage_backends'")

                # 检查 buffer_number
                if cfg.get('buffer_number', 1024) < 1024:
                    warnings.append(f"Connector {i}: buffer_number < 1024")

    # 检查稀疏配置
    if 'ucm_sparse_method' in config:
        method = config['ucm_sparse_method']
        sparse_config = config.get('ucm_sparse_config', {})
        if method not in sparse_config:
            warnings.append(f"Sparse method '{method}' has no configuration")

    # 输出结果
    if errors:
        print("Errors:")
        for e in errors:
            print(f"  - {e}")
    if warnings:
        print("Warnings:")
        for w in warnings:
            print(f"  - {w}")
    if not errors and not warnings:
        print("Configuration is valid!")

    return len(errors) == 0


if __name__ == "__main__":
    import sys
    config_file = sys.argv[1] if len(sys.argv) > 1 else "ucm_config.yaml"
    validate_config(config_file)

7.2 运行验证

python validate_config.py ucm_config.yaml

8. 配置示例

8.1 开发环境

ucm_connectors:
  - ucm_connector_name: "UcmPipelineStore"
    ucm_connector_config:
      store_pipeline: "Cache|Posix"
      storage_backends: "/tmp/ucm_cache"
      buffer_number: 1024
ucm_sparse_method: "GSA"
ucm_sparse_config:
  GSA:
    sparse_ratio: 0.3

logging:
  level: "DEBUG"

8.2 生产环境

ucm_connectors:
  - ucm_connector_name: "UcmPipelineStore"
    ucm_connector_config:
      store_pipeline: "Cache|Posix"
      storage_backends: "/data/ucm_cache"
      buffer_number: 4096
      cache_capacity: 50000
ucm_sparse_method: "GSA"
ucm_sparse_config:
  GSA:
    sparse_ratio: 0.3
    prefetch_workers: 8
metrics:
  enabled: true
  prometheus_port: 9090
logging:
  level: "INFO"

8.3 云环境

ucm_connectors:
  - ucm_connector_name: "UcmPipelineStore"
    ucm_connector_config:
      store_pipeline: "Cache|Ds3fs"
      s3_endpoint: "https://s3.amazonaws.com"
      storage_backends: "my-ucm-bucket"
      s3_access_key: "${AWS_ACCESS_KEY_ID}"
      s3_secret_key: "${AWS_SECRET_ACCESS_KEY}"
      buffer_number: 2048
      use_io_uring: true
ucm_sparse_method: "GSA"

2026年2月24日

GitHub

部署指南

性能调优指南

配置参考手册

概述#

1. 配置文件结构#

1.1 YAML 配置文件#

1.2 配置优先级#

2. 存储连接器配置#

2.1 通用配置#

2.2 Cache 层配置#

2.3 Posix 后端配置#

2.4 NFS 后端配置#

2.5 DS3FS 后端配置#

2.6 Mooncake 后端配置#

3. Pipeline 配置#

3.1 Pipeline 语法#

3.2 支持的组合#

3.3 多级 Pipeline#

4. 稀疏注意力配置#

4.1 方法选择#

4.2 GSA 配置#

4.3 ESA 配置#

4.4 Blend 配置#

4.5 KVStar 配置#

4.6 RERoPE 配置#

5. 环境变量#

5.1 构建环境变量#

5.2 运行时环境变量#

5.3 监控环境变量#

5.4 传输层环境变量#

6. vLLM 集成配置#

6.1 KVTransferConfig#

6.2 完整 vLLM 示例#

7. 配置验证#

7.1 验证脚本#

7.2 运行验证#

8. 配置示例#

8.1 开发环境#

8.2 生产环境#

8.3 云环境#

概述

1. 配置文件结构

1.1 YAML 配置文件

1.2 配置优先级

2. 存储连接器配置

2.1 通用配置

2.2 Cache 层配置

2.3 Posix 后端配置

2.4 NFS 后端配置

2.5 DS3FS 后端配置

2.6 Mooncake 后端配置

3. Pipeline 配置

3.1 Pipeline 语法

3.2 支持的组合

3.3 多级 Pipeline

4. 稀疏注意力配置

4.1 方法选择

4.2 GSA 配置

4.3 ESA 配置

4.4 Blend 配置

4.5 KVStar 配置

4.6 RERoPE 配置

5. 环境变量

5.1 构建环境变量

5.2 运行时环境变量

5.3 监控环境变量

5.4 传输层环境变量

6. vLLM 集成配置

6.1 KVTransferConfig

6.2 完整 vLLM 示例

7. 配置验证

7.1 验证脚本

7.2 运行验证

8. 配置示例

8.1 开发环境

8.2 生产环境

8.3 云环境