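Reading time: about 15 minutes. Prerequisite: Performance Tuning
This document describes how to diagnose and fix common UCM problems, including installation issues, runtime errors, performance problems, and integration issues.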
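Error message:
error: CUDA driver version is insufficient for CUDA runtime version
Solution:
# Check the CUDA versions: nvidia-smi shows the driver and the highest CUDA version it supports,
# nvcc shows the version of the installed CUDA toolkit
nvidia-smi
nvcc --version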
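The comparison above can also be scripted. This is a minimal sketch with hypothetical helpers (`parse_cuda_version`, `driver_supports_runtime`), not part of UCM. It assumes you pass in the CUDA version shown by `nvidia-smi` and the runtime version your PyTorch build targets (for example `torch.version.cuda`), and it ignores CUDA minor-version compatibility, so treat a `False` result as a prompt to check further rather than a final verdict.

```python
def parse_cuda_version(text: str) -> tuple[int, int]:
    """Parse a CUDA version string such as "12.4" into (major, minor)."""
    major, minor = text.strip().split(".")[:2]
    return int(major), int(minor)


def driver_supports_runtime(driver_cuda: str, runtime_cuda: str) -> bool:
    """Return True if the driver's highest CUDA version is >= the runtime version.

    Simplification: CUDA minor-version compatibility is not modeled.
    """
    return parse_cuda_version(driver_cuda) >= parse_cuda_version(runtime_cuda)


if __name__ == "__main__":
    # Example: nvidia-smi reports "CUDA Version: 12.2", PyTorch targets CUDA 12.4
    print(driver_supports_runtime("12.2", "12.4"))  # False: upgrade the driver
    print(driver_supports_runtime("12.6", "12.4"))  # True
```

Comparing `(major, minor)` tuples rather than raw strings matters here: as plain strings, "12.10" would sort before "12.4".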
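# Make sure the driver supports the CUDA runtime version; if it does not, upgrade the NVIDIA driver
Error message: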
fatal error: torch/extension.h: No such file or directory
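Solution:
# Make sure PyTorch is installed correctly
pip install torch --upgrade
# Print the CUDA toolkit path PyTorch uses to build extensions (None means it was not found)
python -c "from torch.utils.cpp_extension import CUDA_HOME; print(CUDA_HOME)"
Error message: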
ImportError: cannot import name 'KVConnectorBase' from 'vllm'
Solution:
pip install vllm==0.9.2
python -c "import vllm; print(vllm.__version__)"
Error message:
ModuleNotFoundError: No module named 'triton.language'
Solution:
pip install triton==2.0.0
Error message:
FileNotFoundError: [Errno 2] No such file or directory: '/data/ucm_cache'
Solution:
mkdir -p /data/ucm_cache
chmod 755 /data/ucm_cache
Error message:
OSError: [Errno 28] No space left on device
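Solution:
# Check free space on the cache volume
df -h /data
# Free space by deleting all cached KV blocks (this cannot be undone)
rm -rf /data/ucm_cache/*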
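Before clearing the cache, you can check whether the volume has room for the `max_cache_size` you plan to configure. This is a minimal sketch: `parse_size` and `has_room_for_cache` are hypothetical helpers, and reading suffixes such as `G` as binary units (1 G = 1024**3 bytes) is an assumption about how UCM interprets the value. `shutil.disk_usage` is the standard-library call that reports free space.

```python
import shutil

# Assumption: binary units (K = 1024 bytes, ...); UCM's own parsing may differ.
_UNITS = {"K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}


def parse_size(value: str) -> int:
    """Convert a size string such as "100G", "512M" or "1024" into bytes."""
    value = value.strip().upper()
    if value and value[-1] in _UNITS:
        return int(float(value[:-1]) * _UNITS[value[-1]])
    return int(float(value))  # no suffix: plain bytes


def has_room_for_cache(path: str, max_cache_size: str) -> bool:
    """Return True if the filesystem holding `path` has at least max_cache_size free."""
    free_bytes = shutil.disk_usage(path).free
    return free_bytes >= parse_size(max_cache_size)
```

For example, `has_room_for_cache("/data", "100G")` checks the volume behind `/data/ucm_cache`. Under the same binary-unit assumption, `parse_size("512M")` returns 536870912, the byte count used for `pinned_pool_size` in the pinned-memory fix.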
# ucm_config.yaml: enable automatic cleanup and cap the cache size
ucm_connector_config:
  auto_cleanup: true
  max_cache_size: 100G
Error message:
RuntimeError: CUDA out of memory while allocating pinned memory
Solution:
ucm_connector_config:
  buffer_number: 1024  # Use fewer buffers
  pinned_pool_size: 536870912  # Reduce to 512 MB (512 * 1024 * 1024 bytes)
Error message:
torch.cuda.OutOfMemoryError: CUDA out of memory
Solution:
ucm_sparse_method: "GSA"
ucm_sparse_config:
  GSA:
    sparse_ratio: 0.2  # More aggressive sparsity ratio
Error message:
mount.nfs: Connection timed out
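Solution:
# Check that the NFS server is reachable
ping nfs-server.local
# List the exports the server offers
showmount -e nfs-server.local
# Check that the firewall does not block the NFS port (2049)
sudo iptables -L -n | grep 2049
Error message: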
botocore.exceptions.NoCredentialsError: Unable to locate credentials
Solution:
export AWS_ACCESS_KEY_ID="your-access-key"
export AWS_SECRET_ACCESS_KEY="your-secret-key"
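If the credentials are referenced from the config file, it is worth confirming they are actually set in the environment of the process that starts UCM. This is a minimal sketch: `missing_aws_credentials` and `expand_placeholder` are hypothetical helpers, and the assumption that UCM expands `${VAR}` placeholders the way `os.path.expandvars` does is unverified.

```python
import os
from typing import Mapping

REQUIRED_VARS = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY")


def missing_aws_credentials(env: Mapping[str, str] = os.environ) -> list[str]:
    """Return the names of required credential variables that are unset or empty."""
    return [name for name in REQUIRED_VARS if not env.get(name)]


def expand_placeholder(value: str) -> str:
    """Expand ${VAR} placeholders from the current environment.

    os.path.expandvars leaves unknown variables untouched, so a leftover
    "${...}" in the result points to a missing variable.
    """
    return os.path.expandvars(value)


if __name__ == "__main__":
    missing = missing_aws_credentials()
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        # Print only whether the value expanded, never the secret itself
        print("Access key set:", bool(expand_placeholder("${AWS_ACCESS_KEY_ID}")))
```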
# ucm_config.yaml: reference the exported variables
ucm_connector_config:
  s3_access_key: "${AWS_ACCESS_KEY_ID}"
  s3_secret_key: "${AWS_SECRET_ACCESS_KEY}"
Symptom:
# Check whether the patches have been applied
import sys
print("'ucm' in modules:", 'ucm' in sys.modules)
print("'vllm' in modules:", 'vllm' in sys.modules)
from vllm.v1.core.sched.output import SchedulerOutput
print("UCM meta:", hasattr(SchedulerOutput, 'ucm_connector_meta'))
Solution:
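The import-order rule that follows (`ucm` before `vllm`) can also be enforced at the top of an entry script. This is a minimal sketch: `assert_vllm_not_loaded` is a hypothetical helper, not a UCM API; it only detects that vLLM was imported too early and does not confirm that UCM's patches were applied.

```python
import sys
from typing import Mapping


def assert_vllm_not_loaded(modules: Mapping[str, object] = sys.modules) -> None:
    """Raise if vllm or any vllm submodule has already been imported.

    This troubleshooting guide requires `ucm` to be imported before `vllm`.
    """
    loaded = sorted(n for n in modules if n == "vllm" or n.startswith("vllm."))
    if loaded:
        raise RuntimeError(
            f"vLLM was imported before UCM ({loaded[0]} already loaded); "
            "import ucm first"
        )


# Typical use at the very top of an entry script:
# assert_vllm_not_loaded()
# import ucm
# import vllm
```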
import ucm  # Must be imported first
import vllm  # Then import vLLM
# Or apply the patches explicitly
from ucm.integration.vllm.patch.apply_patch import apply_all_patches
apply_all_patches()
Error message:
ValueError: Unknown kv_connector: UCMConnector
Solution:
# Check the module path
from vllm.config import KVTransferConfig
ktc = KVTransferConfig(
    kv_connector="UCMConnector",
    # Make sure this module path is correct
    kv_connector_module_path="ucm.integration.vllm.ucm_connector",
    kv_role="kv_both",
)
Symptom:
Diagnosis:
import torch
from ucm.store.factory import UcmConnectorFactory

# `config` is your UCM store configuration (defined elsewhere)
store = UcmConnectorFactory.create_connector(config, 0)
test_ids = [b"test_block_1"]
results = store.lookup(test_ids)
print(f"Lookup results: {results}")
# Write a test block, wait for the transfer to finish, then commit it
tensor = torch.randn(1000)
task = store.dump(test_ids, 0, tensor)
store.wait(task)
store.commit(test_ids, [True])
# The committed block should now be found
results = store.lookup(test_ids)
print(f"After dump: {results}")
Solution:
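Cause: initialization overhead
Solution:
from vllm import LLM

# Send one short warm-up request so the first real request does not pay the initialization cost
llm = LLM(model="...", kv_transfer_config=ktc)
warmup_prompt = "Hello, how are you?"
_ = llm.generate([warmup_prompt])
Diagnosis: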
from ucm.shared.metrics.ucmmonitor import StatsMonitor
monitor = StatsMonitor.get_instance()
stats = monitor.get_stats()
# Speeds are in bytes per second; dividing by 1e9 gives GB/s
print(f"Load speed: {stats.get('load_speed', 0) / 1e9:.2f} GB/s")
print(f"Save speed: {stats.get('save_speed', 0) / 1e9:.2f} GB/s")
Solution:
ucm_connector_config:
  transport_streams: 4
  buffer_number: 2048
Diagnosis:
from ucm.shared.trans.pinned_pool import PinnedMemoryPool
pool = PinnedMemoryPool.get_instance()
stats = pool.get_stats()
# Fraction of the pinned pool in use (max() guards against an empty pool)
usage = stats['currently_used'] / max(stats['total_allocated'], 1)
print(f"Buffer usage: {usage * 100:.1f}%")
Solution:
ucm_connector_config:
  buffer_number: 4096
ucm_sparse_config:
  GSA:
    prefetch_workers: 8
    prefetch_ahead: 4
# Enable verbose logging
export UNIFIED_CACHE_LOG_LEVEL=DEBUG
export UCM_PATCH_VERBOSE=1
# Run your script
python your_script.py
from ucm.shared.metrics.ucmmonitor import StatsMonitor
import time

monitor = StatsMonitor.get_instance()
# Print the hit rate and load speed once per second; stop with Ctrl+C
while True:
    stats = monitor.get_stats()
    print(f"\rHit: {stats.get('hit_rate', 0)*100:.1f}% | "
          f"Load: {stats.get('load_speed', 0)/1e9:.1f} GB/s", end="", flush=True)
    time.sleep(1)
import torch
with torch.profiler.profile(
    activities=[
        torch.profiler.ProfilerActivity.CPU,
        torch.profiler.ProfilerActivity.CUDA,
    ],
    record_shapes=True,
) as prof:
    # Run inference (`llm` and `prompts` are defined earlier)
    output = llm.generate(prompts)
print(prof.key_averages().table(sort_by="cuda_time_total", row_limit=20))
A: vLLM v0.9.2 is currently fully supported; v0.9.1 is partially supported.
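To check an installed vLLM against this support statement, a small sketch like the following can help. It encodes only the two versions named in the answer; `vllm_support_level` and `installed_vllm_support` are hypothetical helpers, while `importlib.metadata.version` is the standard-library way to read an installed distribution's version.

```python
from importlib.metadata import PackageNotFoundError, version

# Support levels as stated in this FAQ; any other version counts as unsupported.
SUPPORTED_VLLM = {"0.9.2": "full", "0.9.1": "partial"}


def vllm_support_level(vllm_version: str) -> str:
    """Map a vLLM version string to 'full', 'partial', or 'unsupported'."""
    # Drop local build suffixes such as "+cu126" before the lookup
    base = vllm_version.split("+", 1)[0]
    return SUPPORTED_VLLM.get(base, "unsupported")


def installed_vllm_support() -> str:
    """Report the support level of the installed vLLM distribution, if any."""
    try:
        return vllm_support_level(version("vllm"))
    except PackageNotFoundError:
        return "not installed"


if __name__ == "__main__":
    print("vLLM support:", installed_vllm_support())
```

Reading the version through `importlib.metadata` avoids importing vLLM itself, so the check does not disturb the rule that UCM must be imported before vLLM.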
A: Set the environment variable:
export UCM_DISABLE_PATCHES=1
A:
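# Remove the cache files on disk
rm -rf /data/ucm_cache/*
# Or clear the cache through the store API
from ucm.store.factory import UcmConnectorFactory
store = UcmConnectorFactory.create_connector(config, 0)
store.clear()
A: UCM supports tensor parallelism (TP); each rank has its own Connector.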
A:
from ucm.config import get_current_config
config = get_current_config()
print(config)
When reporting an issue, please include the output of:
python -c "import ucm; print(ucm.__version__)"
python -c "import vllm; print(vllm.__version__)"
python -c "import torch; print(torch.__version__)"
nvidia-smi
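The same information can be gathered in one step. This is a minimal sketch using only the standard library; the distribution names `ucm`, `vllm` and `torch` are assumptions (the UCM package may be published under a different name), and the helper names are hypothetical.

```python
import platform
import shutil
import subprocess
from importlib.metadata import PackageNotFoundError, version

# Assumed distribution names; adjust if UCM is installed under another name.
PACKAGES = ("ucm", "vllm", "torch")


def package_versions(names=PACKAGES) -> dict[str, str]:
    """Return {name: version}, using 'not installed' for missing packages."""
    result = {}
    for name in names:
        try:
            result[name] = version(name)
        except PackageNotFoundError:
            result[name] = "not installed"
    return result


def nvidia_smi_output() -> str:
    """Return the output of nvidia-smi, or a note if the tool is unavailable."""
    if shutil.which("nvidia-smi") is None:
        return "nvidia-smi not found"
    proc = subprocess.run(["nvidia-smi"], capture_output=True, text=True)
    return proc.stdout or proc.stderr


def build_report() -> str:
    """Combine the Python version, package versions and GPU status."""
    lines = [f"python: {platform.python_version()}"]
    lines += [f"{name}: {ver}" for name, ver in package_versions().items()]
    lines.append(nvidia_smi_output())
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_report())
```

Reading versions from package metadata instead of importing the packages keeps the report from affecting the UCM/vLLM import order.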