Title here
Summary here
阅读时间: 约 15 分钟 前置要求: 部署指南
本文档提供 UCM 所有配置项的完整参考,包括 YAML 配置、环境变量和运行时配置。
# ucm_config.yaml - UCM 完整配置模板
# ========================================
# ========================================
ucm_connectors:
- ucm_connector_name: "UcmPipelineStore"
ucm_connector_config:
# Pipeline 配置
store_pipeline: "Cache|Posix"
# 存储路径
storage_backends: "/data/ucm_cache"
# 缓冲池配置
buffer_number: 2048
# Block 配置
block_size: 16
shard_size: 2048
tensor_size: 65536
# 稀疏注意力配置
# ========================================
ucm_sparse_method: "GSA"
ucm_sparse_config:
GSA:
sparse_ratio: 0.3
local_window_sz: 2
init_window_sz: 1
min_blocks: 4
prefetch_workers: 4
# 监控配置
# ========================================
metrics:
enabled: true
prometheus_port: 9090
# 日志配置
# ========================================
logging:
level: "INFO"
format: "[%(asctime)s] [%(name)s] [%(levelname)s] %(message)s"| 参数 | 类型 | 默认值 | 说明 |
|---|---|---|---|
ucm_connector_name |
string | - | 连接器名称 |
storage_backends |
string/list | - | 存储路径 |
store_pipeline |
string | “Posix” | Pipeline 组合 |
block_size |
int | 16 | Block 大小(Token 数) |
shard_size |
int | 2048 | 分片大小 |
tensor_size |
int | 65536 | 张量大小 |
ucm_connector_config:
# 缓冲池大小(必须 >= 1024)
buffer_number: 2048
# LRU 缓存容量
cache_capacity: 10000
# 预取配置
prefetch_enabled: true
prefetch_depth: 4| 参数 | 类型 | 默认值 | 说明 |
|---|---|---|---|
buffer_number |
int | 1024 | Pinned Buffer 数量 |
cache_capacity |
int | 10000 | LRU 缓存容量 |
prefetch_enabled |
bool | true | 启用预取 |
prefetch_depth |
int | 4 | 预取深度 |
ucm_connector_config:
storage_backends: "/data/ucm_cache"
# 文件配置
file_mode: 0644
dir_mode: 0755
# I/O 配置
use_direct_io: true
io_block_size: 4096| 参数 | 类型 | 默认值 | 说明 |
|---|---|---|---|
storage_backends |
string | - | 存储路径 |
use_direct_io |
bool | true | 使用 Direct I/O |
io_block_size |
int | 4096 | I/O 块大小 |
ucm_connector_config:
storage_backends:
- "/mnt/nfs1/ucm_cache"
- "/mnt/nfs2/ucm_cache"
# 路径选择策略
path_strategy: "round_robin" # round_robin, random, tiered
# NFS 特定配置
nfs_rsize: 1048576
nfs_wsize: 1048576ucm_connector_config:
# S3 配置
s3_endpoint: "https://s3.amazonaws.com"
storage_backends: "ucm-cache-bucket"
s3_access_key: "${AWS_ACCESS_KEY_ID}"
s3_secret_key: "${AWS_SECRET_ACCESS_KEY}"
# 优化配置
use_io_uring: true
connection_pool_size: 32
max_concurrent_requests: 64
# 分块上传
multipart_threshold: 8388608
multipart_chunk_size: 8388608ucm_connector_config:
cluster_endpoints:
- "mooncake-1.local:9100"
- "mooncake-2.local:9100"
# RDMA 配置
use_rdma: true
rdma_device: "mlx5_0"
# 复制因子
replication_factor: 2store_pipeline: "<Layer1>|<Layer2>|..."
| Pipeline | 说明 | 适用场景 |
|---|---|---|
Posix |
仅本地存储 | 单机测试 |
Cache |
仅内存缓存 | 临时缓存 |
Cache|Posix |
缓存 + 本地 | 单机生产 |
Cache|NFS |
缓存 + NFS | 企业内网 |
Cache|Ds3fs |
缓存 + S3 | 云环境 |
Cache|Mooncake |
缓存 + 分布式 | 高性能集群 |
# 三级 Pipeline
store_pipeline: "Cache|NFS|Ds3fs"
# 1. 首先查找内存缓存
# 2. 未命中则查找 NFS# 可选值: GSA, ESA, Blend, KVStar, RERoPE
ucm_sparse_method: "GSA"ucm_sparse_config:
GSA:
# 稀疏比例
sparse_ratio: 0.3
# 窗口配置
local_window_sz: 2
init_window_sz: 1
# 最小 Block 数
min_blocks: 4
# 预取配置
prefetch_workers: 4
max_pending_prefetch: 64
prefetch_ahead: 2ucm_sparse_config:
ESA:
sparse_ratio: 0.3
local_window_sz: 2
init_window_sz: 1
min_blocks: 4
# 检索配置
retrieval_stride: 5
retrieval_workers: 4
retrieval_cache_size: 10000ucm_sparse_config:
Blend:
# Chunk 配置
blend_chunk_size: 64
min_match_ratio: 0.8
# 位置校正
position_correction: true
recompute_boundary: 2
# RoPE 配置
rope_theta: 10000.0ucm_sparse_config:
KVStar:
checkpoint_interval: 8
lock_duration: 16
update_threshold: 2
# 底层方法
base_method: "GSA"ucm_sparse_config:
RERoPE:
max_position: 4096
extend_factor: 2.0
rope_theta: 10000.0
# 底层方法
base_method: "GSA"| 变量 | 默认值 | 说明 |
|---|---|---|
PLATFORM |
cuda | 硬件平台 |
ENABLE_SPARSE |
false | 启用稀疏模块 |
| 变量 | 默认值 | 说明 |
|---|---|---|
UNIFIED_CACHE_LOG_LEVEL |
INFO | 日志级别 |
UCM_CONFIG_FILE |
- | 配置文件路径 |
UCM_DISABLE_PATCHES |
0 | 禁用补丁 |
UCM_PATCH_VERBOSE |
0 | 补丁详细日志 |
| 变量 | 默认值 | 说明 |
|---|---|---|
UCM_METRICS_ENABLED |
true | 启用监控 |
UCM_METRICS_PORT |
9090 | Prometheus 端口 |
| 变量 | 默认值 | 说明 |
|---|---|---|
UCM_PINNED_POOL_SIZE |
1073741824 | Pinned 池大小 |
UCM_TRANSPORT_STREAMS |
2 | 传输流数量 |
from vllm.config import KVTransferConfig
ktc = KVTransferConfig(
# Connector 类名
kv_connector="UCMConnector",
# 模块路径
kv_connector_module_path="ucm.integration.vllm.ucm_connector",
# 角色
kv_role="kv_both", # kv_producer, kv_consumer, kv_both
# 额外配置
kv_connector_extra_config={
"UCM_CONFIG_FILE": "./ucm_config.yaml"
}
)from vllm import LLM
from vllm.config import KVTransferConfig
# UCM 配置
ktc = KVTransferConfig(
kv_connector="UCMConnector",
kv_connector_module_path="ucm.integration.vllm.ucm_connector",
kv_role="kv_both",
kv_connector_extra_config={
"UCM_CONFIG_FILE": "./ucm_config.yaml",
# 或直接配置
"store_pipeline": "Cache|Posix",
"storage_backends": "/data/ucm_cache",
"buffer_number": 2048,
}
)
# 创建 LLM
llm = LLM(
model="meta-llama/Llama-2-7b-hf",
tensor_parallel_size=1,
kv_transfer_config=ktc,
)#!/usr/bin/env python
"""验证 UCM 配置"""
import yaml
from pathlib import Path
def validate_config(config_path: str):
"""验证配置文件"""
with open(config_path) as f:
config = yaml.safe_load(f)
errors = []
warnings = []
# 检查必需字段
if 'ucm_connectors' not in config:
errors.append("Missing 'ucm_connectors' field")
else:
for i, conn in enumerate(config['ucm_connectors']):
if 'ucm_connector_name' not in conn:
errors.append(f"Connector {i}: missing 'ucm_connector_name'")
if 'ucm_connector_config' not in conn:
errors.append(f"Connector {i}: missing 'ucm_connector_config'")
else:
cfg = conn['ucm_connector_config']
if 'storage_backends' not in cfg:
errors.append(f"Connector {i}: missing 'storage_backends'")
# 检查 buffer_number
if cfg.get('buffer_number', 1024) < 1024:
warnings.append(f"Connector {i}: buffer_number < 1024")
# 检查稀疏配置
if 'ucm_sparse_method' in config:
method = config['ucm_sparse_method']
sparse_config = config.get('ucm_sparse_config', {})
if method not in sparse_config:
warnings.append(f"Sparse method '{method}' has no configuration")
# 输出结果
if errors:
print("Errors:")
for e in errors:
print(f" - {e}")
if warnings:
print("Warnings:")
for w in warnings:
print(f" - {w}")
if not errors and not warnings:
print("Configuration is valid!")
return len(errors) == 0
if __name__ == "__main__":
import sys
config_file = sys.argv[1] if len(sys.argv) > 1 else "ucm_config.yaml"
validate_config(config_file)python validate_config.py ucm_config.yamlucm_connectors:
- ucm_connector_name: "UcmPipelineStore"
ucm_connector_config:
store_pipeline: "Cache|Posix"
storage_backends: "/tmp/ucm_cache"
buffer_number: 1024
ucm_sparse_method: "GSA"
ucm_sparse_config:
GSA:
sparse_ratio: 0.3
logging:
level: "DEBUG"ucm_connectors:
- ucm_connector_name: "UcmPipelineStore"
ucm_connector_config:
store_pipeline: "Cache|Posix"
storage_backends: "/data/ucm_cache"
buffer_number: 4096
cache_capacity: 50000
ucm_sparse_method: "GSA"
ucm_sparse_config:
GSA:
sparse_ratio: 0.3
prefetch_workers: 8
metrics:
enabled: true
prometheus_port: 9090
logging:
level: "INFO"ucm_connectors:
- ucm_connector_name: "UcmPipelineStore"
ucm_connector_config:
store_pipeline: "Cache|Ds3fs"
s3_endpoint: "https://s3.amazonaws.com"
storage_backends: "my-ucm-bucket"
s3_access_key: "${AWS_ACCESS_KEY_ID}"
s3_secret_key: "${AWS_SECRET_ACCESS_KEY}"
buffer_number: 2048
use_io_uring: true
ucm_sparse_method: "GSA"