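Integration Debugging Guide

Reading time: about 15 minutes
Audience: developers who need to debug vLLM integration issues

Overview

This guide covers the debugging entry points, verification methods, and troubleshooting techniques for the vLLM integration layer.

1. Debugging Entry Points

1.1 Core Entry Points

vLLM integration debugging entry points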
=================

1. Patch system
   ├── ucm/integration/vllm/patch/apply_patch.py:84
   │   apply_all_patches()
   └── ucm/integration/vllm/patch/apply_patch.py:20
       install_import_hook()

2. UCM Connector
   ├── ucm/integration/vllm/ucm_connector.py:85
   │   UCMDirectConnector.__init__()
   ├── ucm/integration/vllm/ucm_connector.py:150
   │   UCMDirectConnector.get_num_new_matched_tokens()
   ├── ucm/integration/vllm/ucm_connector.py:200
   │   UCMDirectConnector.start_load_kv()
   └── ucm/integration/vllm/ucm_connector.py:280
       UCMDirectConnector.wait_for_save()

3. Blend Connector
   └── ucm/integration/vllm/blend_connector.py:40
       BlendConnector.__init__()

4. Request hashing
   └── ucm/integration/vllm/ucm_connector.py:50
       RequestHasher.generate_block_hashes()

5. vLLM patch points
   ├── ucm/integration/vllm/patch/patch_funcs/v092/vllm_patch.py:60
   │   _patch_scheduler_output()
   ├── ucm/integration/vllm/patch/patch_funcs/v092/vllm_patch.py:100
   │   _patch_kv_cache_manager()
   └── ucm/integration/vllm/patch/patch_funcs/v092/vllm_patch.py:138
       _patch_attention_layer()
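
The line numbers in the listing above will drift as the code base changes. The following is a minimal sketch, using only the standard library, of resolving a callable's current file and line with `inspect`. The helper name `locate` is hypothetical, and `json.dumps` is only a stand-in for an entry point such as `apply_all_patches`.

```python
import inspect
import json


def locate(func):
    """Return (source file, first line number) for a Python callable."""
    target = inspect.unwrap(func)  # look through functools.wraps-style wrappers
    filename = inspect.getsourcefile(target)
    _, first_line = inspect.getsourcelines(target)
    return filename, first_line


# json.dumps is only a stand-in; pass e.g. apply_all_patches in a real session.
path, line = locate(json.dumps)
print(f"{path}:{line}")
```

The printed `file:line` pair can be pasted directly into a debugger breakpoint command.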

1.2 Debugging Workflow

flowchart TB
    subgraph init["Initialization"]
        I1["1. Verify patch installation"] --> I2["2. Check Connector creation"]
        I2 --> I3["3. Verify Store connection"]
    end
    subgraph scheduler["Scheduler side"]
        S1["4. Hash generation"] --> S2["5. lookup query"]
        S2 --> S3["6. Metadata construction"]
    end
    subgraph worker["Worker side"]
        W1["7. Metadata binding"] --> W2["8. KV load"]
        W2 --> W3["9. Model execution"]
        W3 --> W4["10. KV save"]
    end
    init --> scheduler
    scheduler --> worker
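
Because each stage in this workflow depends on the one before it, it can help to run the checks in order and stop at the first failure. The sketch below is one possible way to do that; `run_stages` is a hypothetical helper, and the lambdas are placeholders to be replaced with real probes from later sections.

```python
def run_stages(stages):
    """Run (name, check) pairs in order; stop at the first failing stage."""
    results = []
    for name, check in stages:
        try:
            ok = bool(check())
        except Exception as exc:  # a crashing check counts as a failure
            print(f"[FAIL] {name}: {exc!r}")
            ok = False
        else:
            print(f"[{'OK' if ok else 'FAIL'}] {name}")
        results.append((name, ok))
        if not ok:
            break  # later stages depend on earlier ones
    return results


# Placeholder checks; replace each lambda with a real probe from this guide.
demo = run_stages([
    ("1. Verify patch installation", lambda: True),
    ("2. Check Connector creation", lambda: True),
    ("3. Verify Store connection", lambda: False),
    ("4. Hash generation", lambda: True),
])
```

Stopping early keeps the output focused on the first broken stage instead of a cascade of follow-on failures.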

2. Verifying That the Patches Took Effect

2.1 Quick Verification Script

#!/usr/bin/env python
"""Verify that the UCM patches were applied correctly."""

import sys


def verify_ucm_patches():
    print("=== UCM Patch Verification ===\n")

    # 1. Check that the import hook is registered on sys.meta_path
    hook_installed = any(
        'UCMPatchFinder' in str(type(finder))
        for finder in sys.meta_path
    )
    print(f"1. Import Hook: {'INSTALLED' if hook_installed else 'NOT FOUND'}")

    # 2. Check the core patch: SchedulerOutput should expose ucm_connector_meta
    try:
        from vllm.v1.core.sched.output import SchedulerOutput
        has_ucm_meta = hasattr(SchedulerOutput, 'ucm_connector_meta')
        print(f"2. SchedulerOutput patch: {'APPLIED' if has_ucm_meta else 'MISSING'}")
    except ImportError as e:
        print(f"2. SchedulerOutput patch: ERROR - {e}")

    # 3. Check the Attention patch. This is a heuristic (a wrapped function or a
    #    'patched' marker in its repr), so a negative result is only UNKNOWN.
    try:
        from vllm.attention import Attention
        forward = Attention.forward
        is_patched = hasattr(forward, '__wrapped__') or 'patched' in str(forward)
        print(f"3. Attention patch: {'APPLIED' if is_patched else 'UNKNOWN'}")
    except ImportError as e:
        print(f"3. Attention patch: ERROR - {e}")

    # 4. Check that the UCM Connector can be imported
    try:
        from ucm.integration.vllm.ucm_connector import UCMDirectConnector  # noqa: F401
        print("4. UCMDirectConnector: AVAILABLE")
    except ImportError as e:
        print(f"4. UCMDirectConnector: ERROR - {e}")

    print("\n=== Verification Complete ===")


if __name__ == "__main__":
    # Import UCM first so that patch installation is triggered
    import ucm  # noqa: F401

    verify_ucm_patches()

2.2 Running the Verification

python verify_patches.py

Expected output:

=== UCM Patch Verification ===

1. Import Hook: INSTALLED
2. SchedulerOutput patch: APPLIED
3. Attention patch: APPLIED
4. UCMDirectConnector: AVAILABLE

=== Verification Complete ===
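
The import hook check in the script works by scanning `sys.meta_path` for a finder of a particular class. The sketch below illustrates that mechanism in isolation with a hypothetical `DemoPatchFinder`; it is not UCM's finder and patches nothing, it only shows how a registered finder becomes visible to such a check.

```python
import importlib.abc
import sys


class DemoPatchFinder(importlib.abc.MetaPathFinder):
    """Hypothetical finder that only records which imports it was asked about."""

    def __init__(self):
        self.seen = []

    def find_spec(self, fullname, path, target=None):
        self.seen.append(fullname)
        return None  # defer to the regular import machinery


def hook_installed(class_name):
    """True if a finder whose class has the given name is on sys.meta_path."""
    return any(type(f).__name__ == class_name for f in sys.meta_path)


finder = DemoPatchFinder()
sys.meta_path.insert(0, finder)
try:
    print(hook_installed("DemoPatchFinder"))
finally:
    sys.meta_path.remove(finder)  # leave the interpreter as we found it
print(hook_installed("DemoPatchFinder"))
```

If the real check reports NOT FOUND, the hook was either never installed in this process or was removed after installation.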

3. Debugging the Connector

3.1 Checking Connector Initialization

from vllm.config import KVTransferConfig

ktc = KVTransferConfig(
    kv_connector="UCMConnector",
    kv_connector_module_path="ucm.integration.vllm.ucm_connector",
    kv_role="kv_both",
    kv_connector_extra_config={
        "UCM_CONFIG_FILE": "./ucm_config.yaml"
    }
)

# Manually create the Connector for debugging
from ucm.integration.vllm.ucm_connector import UCMDirectConnector

connector = UCMDirectConnector(
    rank=0,
    local_rank=0,
    config=ktc,
)

print(f"Connector type: {type(connector)}")
print(f"Store: {connector.store}")
print(f"Hasher: {connector.request_hasher}")
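
The configuration above points at `./ucm_config.yaml`, a relative path that is resolved against the process's working directory. Before suspecting the Connector itself, it may be worth ruling out a missing or empty file. A minimal sketch, with `check_config_file` as a hypothetical helper:

```python
from pathlib import Path


def check_config_file(path_str):
    """Return a list of human-readable problems with a config file path."""
    path = Path(path_str).expanduser()
    if not path.exists():
        return [f"{path} does not exist (cwd: {Path.cwd()})"]
    if not path.is_file():
        return [f"{path} is not a regular file"]
    if path.stat().st_size == 0:
        return [f"{path} is empty"]
    return []


for problem in check_config_file("./ucm_config.yaml"):
    print("Config problem:", problem)
```

An empty list means the path is at least readable as a non-empty file; it says nothing about whether the YAML content is valid.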

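3.2 Debugging the lookup Flow

import torch


class MockRequest:
    """Minimal stand-in for a request: just an ID and its prompt tokens."""

    def __init__(self, request_id, prompt_token_ids):
        self.request_id = request_id
        self.prompt_token_ids = prompt_token_ids


# A 100-token prompt built from a repeating pattern
request = MockRequest(
    request_id="test_001",
    prompt_token_ids=[1, 2, 3, 4, 5, 6, 7, 8, 9, 10] * 10
)

# Ask the Connector how many prompt tokens are already cached.
# Note: vLLM's KVConnectorBase_V1 declares this method as
# get_num_new_matched_tokens(request, num_computed_tokens); pass the second
# argument if your build requires it.
matched = connector.get_num_new_matched_tokens(request)
print(f"Matched tokens: {matched}")
print(f"Total tokens: {len(request.prompt_token_ids)}")
print(f"Hit rate: {matched / len(request.prompt_token_ids) * 100:.1f}%")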
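
To interpret the matched-token count, it helps to recall how prefix caching by block hash generally works: each full block's hash depends on all preceding tokens, and matching stops at the first block that is not stored. The sketch below illustrates that idea only. It is not UCM's hashing scheme; the SHA-256 chaining, the block size of 16, and both function names are assumptions made for illustration.

```python
import hashlib


def block_hashes(token_ids, block_size):
    """Chain-hash full blocks so each hash also depends on the preceding prefix."""
    hashes, parent = [], b""
    full = len(token_ids) - len(token_ids) % block_size
    for start in range(0, full, block_size):
        block = token_ids[start:start + block_size]
        payload = parent + ",".join(map(str, block)).encode()
        parent = hashlib.sha256(payload).digest()
        hashes.append(parent)
    return hashes


def matched_tokens(token_ids, block_size, stored):
    """Count tokens covered by the longest run of leading blocks found in `stored`."""
    hits = 0
    for h in block_hashes(token_ids, block_size):
        if h not in stored:
            break
        hits += 1
    return hits * block_size


prompt = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10] * 10
stored = set(block_hashes(prompt[:48], 16))  # pretend 3 blocks were cached earlier
print(matched_tokens(prompt, 16, stored), "of", len(prompt))
```

Two consequences follow from this model: a trailing partial block never counts as a hit, and changing a single early token invalidates every later block.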

3.3 Debugging Load and Save

import torch

# Prepare test data
kv_tensor = torch.zeros(
    (32 * 2, 16, 32, 128),  # layers*2, block_size, heads, head_dim
    dtype=torch.float16,
    device='cuda'
)


class MockMeta:
    """Mock request metadata: each field pairs block hashes with block indices."""

    def __init__(self):
        self.load_block_ids = ([b"hash1", b"hash2"], [0, 1])
        self.dump_block_ids = ([b"hash3", b"hash4"], [2, 3])


# Register the mock metadata under the request ID
connector._request_metas["test_001"] = MockMeta()

# Test loading
connector.start_load_kv(["test_001"])
print("Load started")

# Test saving
connector.wait_for_save(["test_001"])
print("Save completed")

4. Troubleshooting Common Problems

4.1 Problem: Connector Not Recognized

Symptom:

ValueError: Unknown kv_connector: UCMConnector

Troubleshooting steps:

# 1. Check the module path
import importlib
try:
    module = importlib.import_module("ucm.integration.vllm.ucm_connector")
    print("Module found")
    print(f"Classes: {dir(module)}")
except ImportError as e:
    print(f"Import error: {e}")

# 2. Check that the Connector class can be imported
from ucm.integration.vllm.ucm_connector import UCMConnector
print(f"UCMConnector available: {UCMConnector is not None}")

# 3. Confirm the configuration
print(f"kv_connector: {ktc.kv_connector}")
print(f"module_path: {ktc.kv_connector_module_path}")
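
`dir(module)` in step 1 lists every name in the module, including imports, which makes it hard to spot the Connector classes. A minimal sketch that narrows the output to classes defined in the module itself; `classes_defined_in` is a hypothetical helper, and `json.decoder` is only a stand-in module so the example runs without UCM installed.

```python
import importlib
import inspect


def classes_defined_in(module_name):
    """Names of classes defined in the module itself, excluding re-exports."""
    module = importlib.import_module(module_name)
    return sorted(
        name
        for name, obj in inspect.getmembers(module, inspect.isclass)
        if obj.__module__ == module.__name__
    )


# json.decoder is a stand-in; use "ucm.integration.vllm.ucm_connector" in practice.
print(classes_defined_in("json.decoder"))
```

If the name given in `kv_connector` is absent from this list, the configured name and the module contents do not match.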

4.2 Problem: Patches Not Taking Effect

Symptoms:

UCM features do not work
No UCM-related output appears in the logs

Troubleshooting steps:

# 1. Check the import order
import sys
print("Import order check:")
print(f"  'ucm' in modules: {'ucm' in sys.modules}")
print(f"  'vllm' in modules: {'vllm' in sys.modules}")
# Correct order: ucm is imported before vllm

# 2. Apply the patches manually
from ucm.integration.vllm.patch.apply_patch import apply_all_patches
apply_all_patches()

# 3. Force a reload (only possible if the module was already imported)
import importlib
if 'vllm.attention' in sys.modules:
    importlib.reload(sys.modules['vllm.attention'])
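
The membership test in step 1 shows whether each package is loaded, but not which one came first. Since `sys.modules` is a regular dict and dicts preserve insertion order, the key order gives a rough view of import order. A minimal sketch, with `imported_before` as a hypothetical helper; the result is only indicative if modules were deleted from `sys.modules` and re-imported.

```python
import sys


def imported_before(first, second, modules=None):
    """True if `first` was registered in sys.modules before `second`.

    Returns None when either module has not been imported at all.
    """
    names = list(sys.modules if modules is None else modules)
    if first not in names or second not in names:
        return None
    return names.index(first) < names.index(second)


print(imported_before("ucm", "vllm"))
```

A result of `False` means `vllm` started importing before `ucm`.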

4.3 Problem: KV Hit Rate Is 0

Symptoms:

lookup always returns empty results
Every request is fully recomputed

Troubleshooting steps:

# 1. Check the Store status
print(f"Store: {connector.store}")
print(f"Store type: {type(connector.store)}")

# 2. Check the storage path
import os
storage_path = connector.store.storage_path
print(f"Storage path: {storage_path}")
print(f"Path exists: {os.path.exists(storage_path)}")
print(f"Files: {os.listdir(storage_path) if os.path.exists(storage_path) else 'N/A'}")

# 3. Test lookup directly
test_ids = connector.request_hasher.generate_block_hashes([1, 2, 3, 4], 4)
results = connector.store.lookup(test_ids)
print(f"Lookup results: {results}")

# 4. Dump one block, commit it, then look it up again
import torch
test_tensor = torch.zeros(1000, dtype=torch.float16, device='cuda')
task = connector.store.dump(test_ids[:1], 0, test_tensor)
connector.store.wait(task)
connector.store.commit(test_ids[:1], [True])
results = connector.store.lookup(test_ids[:1])
print(f"After dump lookup: {results}")
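
The dump, wait, commit, lookup sequence above can be rehearsed without a GPU or a real backend. The sketch below uses a hypothetical in-memory `FakeStore` that mirrors only the call shapes used in this section. Its behavior, in particular that a block becomes visible to `lookup` only after `commit`, is an assumption for illustration and not a statement about the real Store.

```python
class FakeStore:
    """Hypothetical in-memory stand-in mirroring the calls used in this section."""

    def __init__(self):
        self._pending = {}
        self._committed = {}

    def lookup(self, block_ids):
        return [bid in self._committed for bid in block_ids]

    def dump(self, block_ids, offset, data):
        for bid in block_ids:
            self._pending[bid] = (offset, bytes(data))
        return tuple(block_ids)  # the "task" handle is just the ids

    def wait(self, task):
        return 0  # nothing is asynchronous here

    def commit(self, block_ids, success):
        for bid, ok in zip(block_ids, success):
            entry = self._pending.pop(bid, None)
            if ok and entry is not None:
                self._committed[bid] = entry


store = FakeStore()
ids = [b"hash1"]
before = store.lookup(ids)
task = store.dump(ids, 0, b"\x00" * 16)
store.wait(task)
after_dump = store.lookup(ids)
store.commit(ids, [True])
after_commit = store.lookup(ids)
print(before, after_dump, after_commit)
```

Comparing the real Store against this sequence helps separate two failure modes: a dump that never lands, and a dump that lands but is never committed.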

4.4 Problem: Data Transfer Fails

Symptoms:

load/dump operations time out
CUDA errors

Troubleshooting steps:

# 1. Check the CUDA status
import torch
print(f"CUDA available: {torch.cuda.is_available()}")
print(f"Current device: {torch.cuda.current_device()}")
print(f"Device name: {torch.cuda.get_device_name()}")

# 2. Check the tensor's device and dtype
print(f"Tensor device: {kv_tensor.device}")
print(f"Tensor dtype: {kv_tensor.dtype}")

# 3. Test a small data transfer
small_tensor = torch.zeros(100, dtype=torch.float16, device='cuda')
task = connector.store.dump([b"test"], 0, small_tensor)
status = connector.store.wait(task)
print(f"Small dump status: {status}")

# 4. Enable debug logging to inspect the error logs
import logging
logging.basicConfig(level=logging.DEBUG)
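
When a transfer hangs, a blocking call such as the `wait` in step 3 can freeze the debugging session. One generic workaround is to run the call in a worker thread and give up after a deadline. A minimal sketch using the standard library; `call_with_timeout` is a hypothetical helper and `slow_wait` merely simulates a blocking call.

```python
import concurrent.futures
import time


def call_with_timeout(func, timeout_s, *args):
    """Run func(*args) in a worker thread; return (finished, result)."""
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    future = pool.submit(func, *args)
    try:
        return True, future.result(timeout=timeout_s)
    except concurrent.futures.TimeoutError:
        return False, None
    finally:
        pool.shutdown(wait=False)  # do not block on a stuck call


def slow_wait(seconds):
    """Simulated blocking call; stands in for something like store.wait(task)."""
    time.sleep(seconds)
    return 0


print(call_with_timeout(slow_wait, 1.0, 0.01))
print(call_with_timeout(slow_wait, 0.05, 0.5))
```

A timed-out call keeps running in its background thread. This only unblocks the session so that state can be inspected; it does not cancel the underlying operation.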

5. Performance Analysis

5.1 Measuring Connector Overhead

import time

# Measure lookup time (perf_counter is monotonic and high-resolution)
start = time.perf_counter()
for _ in range(100):
    connector.get_num_new_matched_tokens(request)
lookup_time = (time.perf_counter() - start) / 100
print(f"Average lookup: {lookup_time * 1000:.2f}ms")

# Measure the time to start a load
start = time.perf_counter()
for _ in range(10):
    connector.start_load_kv(["test_001"])
load_time = (time.perf_counter() - start) / 10
print(f"Average load: {load_time * 1000:.2f}ms")
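
An average can hide occasional slow calls, which matter for a latency target. The sketch below is one way to collect per-call samples and report median and tail latency as well; `measure` is a hypothetical helper, and the `sum(range(1000))` workload is a placeholder for a Connector call.

```python
import statistics
import time


def measure(func, repeats=100, warmup=5):
    """Time func() per call and return summary statistics in milliseconds."""
    for _ in range(warmup):
        func()
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        func()
        samples.append((time.perf_counter() - start) * 1000)
    samples.sort()
    return {
        "mean_ms": statistics.fmean(samples),
        "p50_ms": statistics.median(samples),
        "p95_ms": samples[min(len(samples) - 1, int(0.95 * len(samples)))],
        "max_ms": samples[-1],
    }


# Placeholder workload; pass a lambda wrapping the Connector call instead.
stats = measure(lambda: sum(range(1000)))
print({k: round(v, 4) for k, v in stats.items()})
```

For GPU work, CUDA kernels launch asynchronously, so the timed function should call `torch.cuda.synchronize()` before returning; otherwise only the launch cost is measured.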

5.2 Using the Profiler

import torch

with torch.profiler.profile(
    activities=[
        torch.profiler.ProfilerActivity.CPU,
        torch.profiler.ProfilerActivity.CUDA,
    ],
    record_shapes=True
) as prof:
    connector.start_load_kv(["test_001"])
    connector.wait_for_save(["test_001"])

print(prof.key_averages().table(sort_by="cuda_time_total", row_limit=10))

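6. Debugging Checklist

6.1 Initialization Checks

UCM module imports correctly
Import hook is installed
Patches are applied
Connector is created
Store is connected

6.2 Runtime Checks

Block hashes are generated consistently
lookup returns correct results
Data transfer works
Metadata is passed correctly

6.3 Performance Checks

lookup latency < 10 ms
load/dump throughput is normal
Hit rate matches expectations
No memory leaks

7. Quick Reference

7.1 Environment Variables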

UNIFIED_CACHE_LOG_LEVEL=DEBUG

UCM_DISABLE_PATCHES=1

UCM_PATCH_VERBOSE=1
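
These variables can be exported in the shell or set from Python. The sketch below shows the Python route. The variable names are taken from the list above; that they are read when `ucm` is imported, and therefore must be set first, is an assumption.

```python
import os

# Variable names come from the list above; that they are read at import time
# is an assumption.
DEBUG_ENV = {
    "UNIFIED_CACHE_LOG_LEVEL": "DEBUG",
    "UCM_PATCH_VERBOSE": "1",
}

for key, value in DEBUG_ENV.items():
    os.environ.setdefault(key, value)  # keep values already set by the shell

# import ucm  # import only after the environment is prepared
print({key: os.environ[key] for key in DEBUG_ENV})
```

`setdefault` leaves shell-provided values untouched, so a value exported on the command line still wins.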

7.2 Key Log Tags

[UCM.Connector]   # Connector logs
[UCM.Patch]       # Patch logs
[UCM.Hasher]      # Hash generation logs
[UCM.Load]        # Load logs
[UCM.Save]        # Save logs
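
Counting how often each tag appears in a captured log gives a quick view of which stages actually ran. A minimal sketch with a hypothetical `count_tags` helper; the sample lines are made up for illustration, and real log text will differ.

```python
import re
from collections import Counter

TAG_RE = re.compile(r"\[(UCM\.[A-Za-z]+)\]")


def count_tags(lines):
    """Count occurrences of each [UCM.*] tag in an iterable of log lines."""
    counts = Counter()
    for line in lines:
        counts.update(TAG_RE.findall(line))
    return counts


# Made-up sample lines for illustration; real log text will differ.
sample = [
    "DEBUG [UCM.Patch] patch applied",
    "DEBUG [UCM.Connector] connector created",
    "DEBUG [UCM.Load] load started",
    "INFO unrelated line",
    "DEBUG [UCM.Load] load finished",
]
counts = count_tags(sample)
print(dict(counts))
```

A tag with a count of zero, such as `UCM.Save` in this sample, points at a stage that never ran or never logged.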

February 24, 2026
