yolov26_3d/eval_tools/docs/EVALUATION_DESIGN.md

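# Model Output Evaluation Design

## 1. Overview

### 1.1 Goals
- **2D detection evaluation**: measure 2D bounding-box detection performance for all classes
- **3D detection evaluation**: measure spatial localization and heading estimation for the 3D classes

### 1.2 Class Groups
- **3D object classes** (0-3): vehicle, pedestrian, bike, rider
- **2D-only object classes** (4-13): roadblock, head, tsr, guideboard, plate, wheel, tl_border, tl_wick, tl_num, tricycle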

## 2. Data Formats

### 2.1 Ground-Truth Format

#### 2.1.1 Ground Truth for 3D Classes

**Full 3D annotation (vehicle class) - 50 values**:
```
[label, x, y, w, h,                          # 0-4: class and 2D box (normalized)
 x3d_ori, y3d_ori, z3d_ori,                  # 5-7: original 3D center
 l3d, h3d, w3d,                              # 8-10: 3D dimensions
 rot_y,                                      # 11: rotation angle
 xc_ori, yc_ori,                             # 12-13: 2D projection of the original center
 xc_ori_d, yc_ori_d,                         # 14-15: depth-related center point
 alpha_ori,                                  # 16: original alpha angle
 0,                                          # 17: placeholder
 # front face (18-25)
 x3d_front, y3d_front, z3d_front, alpha_front, xc_front, yc_front, score_front, is_occ_front,
 # back face (26-33)
 x3d_back, y3d_back, z3d_back, alpha_back, xc_back, yc_back, score_back, is_occ_back,
 # left face (34-41)
 x3d_left, y3d_left, z3d_left, alpha_left, xc_left, yc_left, score_left, is_occ_left,
 # right face (42-49)
 x3d_right, y3d_right, z3d_right, alpha_right, xc_right, yc_right, score_right, is_occ_right]
```

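**Full 3D annotation (non-vehicle classes) - 18 values**:
```
[label, x, y, w, h,                          # 0-4: class and 2D box (normalized)
 x3d_ori, y3d_ori, z3d_ori,                  # 5-7: 3D center
 l3d, h3d, w3d,                              # 8-10: 3D dimensions
 rot_y,                                      # 11: rotation angle
 xc_ori, yc_ori,                             # 12-13: 2D projection of the center
 xc_ori_d, yc_ori_d,                         # 14-15: depth-related center point
 alpha_ori,                                  # 16: alpha angle
 0]                                          # 17: placeholder
```

**2D-only annotation (3D class without 3D labels) - 6 values**:
```
[label, x, y, w, h, -1]                      # trailing -1 means no 3D annotation
```

#### 2.1.2 Ground Truth for 2D-Only Classes (6 values)
```
[label, x, y, w, h, -1]                      # label ∈ {4,5,6,7,8,9,10,11,12,13}
```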

### 2.2 Detection Format

#### 2.2.1 Detection Format for 3D Classes (15 values)

**Vehicle class**:
```
vehicle 0.95 368.08 574.17 437.89 617.20 cam -30.14 1.43 68.55 5.52 2.50 2.31 2.70 left
[label, conf, x1, y1, x2, y2, coord_sys, x3d, y3d, z3d, l3d, h3d, w3d, rot_y, face_type]
```
Note: face_type is one of front, back, left, right; rear and tail are also accepted as aliases for back.

**Non-vehicle classes**:
```
pedestrian 0.95 368.08 574.17 437.89 617.20 cam -30.14 1.43 68.55 5.52 2.50 2.31 2.70 whole
[label, conf, x1, y1, x2, y2, coord_sys, x3d, y3d, z3d, l3d, h3d, w3d, rot_y, whole]
```

#### 2.2.2 Detection Format for 2D-Only Classes (6 values)
```
plate 0.94246 532.12 203.26 558.73 214.86
[label, conf, x1, y1, x2, y2]
```

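## 3. Metrics

### 3.1 2D Detection Metrics

#### 3.1.1 Basic Metrics
- **Precision**: TP / (TP + FP)
- **Recall**: TP / (TP + FN)
- **AP (Average Precision)**: area under the precision-recall curve at an IoU threshold of 0.5
- **mAP (mean Average Precision)**: mean of the per-class AP values

#### 3.1.2 Matching Rules
- **IoU threshold**: 0.5
- **Matching strategy**:
  1. Compute the IoU between every predicted box and every ground-truth box
  2. Sort predictions by confidence, from high to low
  3. Each ground-truth box matches at most one prediction
  4. A pair with IoU >= 0.5 and the same class is a successful match (TP)
  5. Unmatched predictions (including duplicates on an already-matched ground-truth box) are FP; unmatched ground-truth boxes are FN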
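
The matching rules above can be sketched for a single class as follows. This is a minimal, hypothetical implementation (`iou` and `greedy_match` are illustrative names, not the project's `Matcher2D` API): each prediction, in descending confidence order, takes the still-unmatched ground-truth box with the highest IoU. The PASCAL VOC devkit differs slightly: it compares each detection with its highest-IoU ground truth even when that box is already taken and then counts the detection as FP, so the two variants can disagree in crowded scenes.

```python
def iou(box_a, box_b):
    """IoU of two [x1, y1, x2, y2] pixel boxes."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def greedy_match(gt_boxes, det_boxes, det_scores, iou_threshold=0.5):
    """Greedy single-class matching (hypothetical helper).

    Returns (matches, fp_indices, fn_indices); matches holds (det_index, gt_index) pairs.
    """
    # Process predictions from highest to lowest confidence
    order = sorted(range(len(det_boxes)), key=lambda i: -det_scores[i])
    gt_used = [False] * len(gt_boxes)
    matches, fps = [], []
    for d in order:
        best_iou, best_g = 0.0, -1
        for g, gt in enumerate(gt_boxes):
            if gt_used[g]:
                continue  # each GT box matches at most one prediction
            overlap = iou(det_boxes[d], gt)
            if overlap > best_iou:
                best_iou, best_g = overlap, g
        if best_g >= 0 and best_iou >= iou_threshold:
            gt_used[best_g] = True
            matches.append((d, best_g))  # TP
        else:
            fps.append(d)                # FP
    fns = [g for g, used in enumerate(gt_used) if not used]  # FN
    return matches, fps, fns
```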

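#### 3.1.3 Per-Class Evaluation
- Compute Precision, Recall, and AP separately for each class
- Classes: vehicle, pedestrian, bike, rider, roadblock, head, tsr, guideboard, plate, wheel, tl_border, tl_wick, tl_num, tricycle

#### 3.1.4 Overall Evaluation
- **Overall Precision**: total TP over all classes / (total TP + total FP)
- **Overall Recall**: total TP over all classes / (total TP + total FN)
- **mAP**: arithmetic mean of the per-class AP values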
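
To make the difference between the pooled overall numbers and mAP concrete, here is a small sketch, assuming per-class counts are kept in a dict shaped like the `per_class` block of the report in Section 6.1 (`overall_metrics` is a hypothetical helper). Overall Precision and Recall pool TP/FP/FN across classes (a micro average, so frequent classes dominate), whereas mAP weights every class equally (a macro average).

```python
def overall_metrics(per_class):
    """per_class: {class_name: {'tp': int, 'fp': int, 'fn': int, 'ap': float}} (assumed layout)."""
    # Micro average: pool the raw counts of all classes first
    tp = sum(c['tp'] for c in per_class.values())
    fp = sum(c['fp'] for c in per_class.values())
    fn = sum(c['fn'] for c in per_class.values())
    precision = tp / (tp + fp) if tp + fp > 0 else 0.0
    recall = tp / (tp + fn) if tp + fn > 0 else 0.0
    # Macro average: every class contributes equally to mAP
    mean_ap = sum(c['ap'] for c in per_class.values()) / len(per_class) if per_class else 0.0
    return {'precision': precision, 'recall': recall, 'map': mean_ap}
```

Whether classes with no ground truth in the evaluation set should be dropped from the mAP denominator is left open here; this sketch averages over every class passed in.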

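### 3.2 3D Detection Metrics

#### 3.2.1 Scope
Only the 3D classes are evaluated: vehicle, pedestrian, bike, rider

#### 3.2.2 Preconditions
3D metrics are computed only for pairs that were matched in 2D (IoU >= 0.5) and whose ground truth carries a full 3D annotation

#### 3.2.3 3D Metrics

**Range error for the vehicle class**:

For vehicles, the nearest-face information in the prediction (front/back/left/right) determines which ground-truth face center is used for comparison:

1. Read the prediction's `face_type` field (front/back/left/right) to get the predicted nearest face
2. From the four faces in the ground truth, take the center coordinates of the same face
3. Compute the error between the predicted nearest-face center and the corresponding ground-truth face center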

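```python
# Vehicle class (label 0): indices of each face's 3D center in the 50-value GT row
FACE_MAPPING = {
    'front': [18, 19, 20],  # x3d_front, y3d_front, z3d_front
    'back':  [26, 27, 28],  # x3d_back, y3d_back, z3d_back
    'left':  [34, 35, 36],  # x3d_left, y3d_left, z3d_left
    'right': [42, 43, 44],  # x3d_right, y3d_right, z3d_right
}
FACE_ALIASES = {'rear': 'back', 'tail': 'back'}  # accepted aliases for 'back'

def vehicle_range_errors(gt_values, det_result):
    # Select the GT face center that matches the predicted face_type
    face_type = det_result['3d_info']['face_type']  # 'front', 'back', 'left', 'right'
    face_type = FACE_ALIASES.get(face_type, face_type)
    x3d_gt, y3d_gt, z3d_gt = (gt_values[i] for i in FACE_MAPPING[face_type])

    # Predicted nearest-face center
    x3d_pred, y3d_pred, z3d_pred = det_result['3d_info']['center']

    # Lateral error along x, longitudinal error along z
    lateral_error = abs(x3d_pred - x3d_gt)
    longitudinal_error = abs(z3d_pred - z3d_gt)
    return lateral_error, longitudinal_error
```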

**Range error for non-vehicle classes**:

Non-vehicle classes (pedestrian, bike, rider) compute the error directly from the 3D box center:

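```python
# Non-vehicle classes (labels 1, 2, 3)
def non_vehicle_range_errors(gt_values, det_result):
    # Compare against the 3D box center (x3d_ori, y3d_ori, z3d_ori)
    x3d_gt, y3d_gt, z3d_gt = gt_values[5:8]
    x3d_pred, y3d_pred, z3d_pred = det_result['3d_info']['center']

    lateral_error = abs(x3d_pred - x3d_gt)
    longitudinal_error = abs(z3d_pred - z3d_gt)
    return lateral_error, longitudinal_error
```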

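**Heading Error**:

All 3D classes compute the heading error the same way:
```python
heading_error = abs(normalize_angle(rot_y_pred - rot_y_gt))
```
where `normalize_angle` wraps the angle difference into [-π, π] (Section 7.3), so the error lies in [0, π] radians.

#### 3.2.4 Summary Statistics
For each 3D class, report:
- **Lateral error**: mean, median, standard deviation, 90th percentile
- **Longitudinal error**: mean, median, standard deviation, 90th percentile
- **Heading error**: mean, median, standard deviation, 90th percentile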
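
A sketch of the per-class statistics, assuming the errors of one type (lateral, longitudinal, or heading) for one class have already been collected into a list. `error_statistics` is a hypothetical name, and the output keys mirror the 3D report in Section 6.2. The design does not specify the estimators; this sketch uses NumPy defaults, i.e. the population standard deviation (`ddof=0`) and linear interpolation for the percentile.

```python
import numpy as np


def error_statistics(errors):
    """Summary statistics for one error type of one 3D class (hypothetical helper)."""
    e = np.asarray(errors, dtype=float)
    if e.size == 0:
        # No matched samples with 3D ground truth for this class
        return {'mean': None, 'median': None, 'std': None, 'percentile_90': None}
    return {
        'mean': float(np.mean(e)),
        'median': float(np.median(e)),
        'std': float(np.std(e)),                       # population std (ddof=0)
        'percentile_90': float(np.percentile(e, 90)),  # linear interpolation (NumPy default)
    }
```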

## 4. Evaluation Pipeline

### 4.1 Preprocessing

#### 4.1.1 Parsing Ground Truth
```python
def parse_ground_truth(gt_line, img_width, img_height):
    """
    Parse one ground-truth annotation.
    Returns: {
        'label': int,
        'bbox_2d': [x1, y1, x2, y2],  # pixel coordinates
        'has_3d': bool,
        '3d_info': {
            'center': [x3d, y3d, z3d],  # original center (used for non-vehicle classes)
            'dimensions': [l3d, h3d, w3d],
            'rotation': rot_y,
            'faces': {  # present only for the vehicle class
                'front': [x3d, y3d, z3d, alpha, xc, yc, score, is_occ],
                'back':  [x3d, y3d, z3d, alpha, xc, yc, score, is_occ],
                'left':  [x3d, y3d, z3d, alpha, xc, yc, score, is_occ],
                'right': [x3d, y3d, z3d, alpha, xc, yc, score, is_occ]
            } if label == 0 else None
        } if has_3d else None
    }
    """
```
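
A minimal sketch of `parse_ground_truth` that fills in the contract above, assuming one object per line with whitespace-separated values and classifying each row by its length (50 = vehicle with faces, 18 = non-vehicle 3D, 6 with a trailing -1 = 2D only). The constants `VEHICLE_ID` and `FACE_NAMES` are illustrative, not part of the design.

```python
VEHICLE_ID = 0  # hypothetical constant: class ID of 'vehicle'
FACE_NAMES = ('front', 'back', 'left', 'right')  # order of the 8-value face blocks at 18-49


def parse_ground_truth(gt_line, img_width, img_height):
    values = [float(v) for v in gt_line.split()]
    label = int(values[0])
    # Normalized (x_center, y_center, w, h) -> pixel (x1, y1, x2, y2)
    xc, yc, w, h = values[1:5]
    bbox_2d = [(xc - w / 2) * img_width, (yc - h / 2) * img_height,
               (xc + w / 2) * img_width, (yc + h / 2) * img_height]
    # 50 values: vehicle with faces; 18: non-vehicle 3D; 6 ending in -1: 2D only
    has_3d = len(values) in (18, 50)
    info_3d = None
    if has_3d:
        info_3d = {
            'center': values[5:8],       # x3d_ori, y3d_ori, z3d_ori
            'dimensions': values[8:11],  # l3d, h3d, w3d
            'rotation': values[11],      # rot_y
            'faces': None,
        }
        if label == VEHICLE_ID and len(values) == 50:
            # Each face block: x3d, y3d, z3d, alpha, xc, yc, score, is_occ
            info_3d['faces'] = {
                name: values[18 + 8 * i: 26 + 8 * i] for i, name in enumerate(FACE_NAMES)
            }
    return {'label': label, 'bbox_2d': bbox_2d, 'has_3d': has_3d, '3d_info': info_3d}
```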

#### 4.1.2 Parsing Detections
```python
def parse_detection(det_line):
    """
    Parse one detection result.
    Returns: {
        'label': int,  # class-name string mapped to an integer ID
        'confidence': float,
        'bbox_2d': [x1, y1, x2, y2],  # pixel coordinates
        '3d_info': {
            'center': [x3d, y3d, z3d],
            'dimensions': [l3d, h3d, w3d],
            'rotation': rot_y,
            'face_type': str
        } if is_3d_class else None
    }
    """
```
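
Likewise, a sketch of `parse_detection`, assuming whitespace-separated tokens where a 15-token line is a 3D detection and a 6-token line is 2D only. The class list follows the ID order in Section 1.2, and `rear`/`tail` are normalized to `back` as described in Section 2.2.1. The `coord_sys` token (e.g. `cam`) is skipped because the contract above has no field for it; `CLASS_NAMES`, `NAME_TO_ID`, and `FACE_ALIASES` are illustrative names.

```python
CLASS_NAMES = ['vehicle', 'pedestrian', 'bike', 'rider', 'roadblock', 'head', 'tsr',
               'guideboard', 'plate', 'wheel', 'tl_border', 'tl_wick', 'tl_num', 'tricycle']
NAME_TO_ID = {name: i for i, name in enumerate(CLASS_NAMES)}
FACE_ALIASES = {'rear': 'back', 'tail': 'back'}  # aliases accepted for 'back'


def parse_detection(det_line):
    tokens = det_line.split()
    det = {
        'label': NAME_TO_ID[tokens[0]],
        'confidence': float(tokens[1]),
        'bbox_2d': [float(t) for t in tokens[2:6]],  # x1, y1, x2, y2 in pixels
        '3d_info': None,
    }
    if len(tokens) == 15:
        # tokens[6] is coord_sys; tokens[7:14] are x3d, y3d, z3d, l3d, h3d, w3d, rot_y
        x3d, y3d, z3d, l3d, h3d, w3d, rot_y = (float(t) for t in tokens[7:14])
        face_type = tokens[14]
        det['3d_info'] = {
            'center': [x3d, y3d, z3d],
            'dimensions': [l3d, h3d, w3d],
            'rotation': rot_y,
            'face_type': FACE_ALIASES.get(face_type, face_type),
        }
    return det
```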

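### 4.2 2D Evaluation Flow

```
For each image:
  1. Load ground truth and detections
  2. For each class:
     a. Select the GT and DET boxes of that class
     b. Compute the IoU matrix over all pairs
     c. Sort DET by confidence
     d. Match greedily in confidence order (Section 3.1.2); Hungarian matching is an alternative
     e. Count TP, FP, FN
     f. Record each detection's confidence and match status

For each class:
  3. Using the statistics pooled over all images:
     a. Sort all predictions by confidence
     b. Compute Precision-Recall at each threshold
     c. Compute AP (interpolated or integrated)

Overall:
  4. Compute overall Precision and Recall
  5. Compute mAP
```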
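
Step 3 can be sketched as follows, assuming each detection of a class was recorded as a (confidence, is_tp) pair during per-image matching. `pr_curve` is an illustrative name. Sorting by confidence and taking cumulative sums gives one precision/recall point per detection, and the resulting arrays can be passed to `compute_ap` in Section 7.2.

```python
import numpy as np


def pr_curve(scores, is_tp, num_gt):
    """Precision/recall points for one class, pooled over all images.

    scores: confidence of every detection of this class
    is_tp:  per-detection match result (True = TP, False = FP)
    num_gt: total number of ground-truth boxes of this class
    """
    scores = np.asarray(scores, dtype=float)
    is_tp = np.asarray(is_tp, dtype=bool)
    order = np.argsort(-scores, kind='stable')  # high -> low confidence
    tp_cum = np.cumsum(is_tp[order])
    fp_cum = np.cumsum(~is_tp[order])
    recalls = tp_cum / max(num_gt, 1)           # guard against classes with no GT
    precisions = tp_cum / np.maximum(tp_cum + fp_cum, 1)
    return precisions, recalls
```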

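### 4.3 3D Evaluation Flow

```
For each image:
  1. Start from the 2D matching results
  2. For each matched (GT, DET) pair:
     a. Check that the GT has a full 3D annotation
     b. Check that the DET belongs to a 3D class
     c. If both hold:
        - Vehicle class (label=0):
          * Use the DET's face_type to select the corresponding face center from the GT
          * Compute the lateral/longitudinal error between the predicted nearest face and that GT face
        - Non-vehicle classes (label=1,2,3):
          * Compute the lateral/longitudinal error directly from the 3D box centers
        - Compute the heading error (same for all classes)
        - Record the results per class

For each 3D class:
  3. Aggregate the errors over all images:
     - Lateral: mean, median, std, 90th percentile
     - Longitudinal: mean, median, std, 90th percentile
     - Heading: mean, median, std, 90th percentile
```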

## 5. Implementation Architecture

### 5.1 Modules

```
eval_tools/
├── evaluator/
│   ├── __init__.py
│   ├── parser.py              # data parsing
│   ├── matcher.py             # 2D matching
│   ├── metrics_2d.py          # 2D metric computation
│   ├── metrics_3d.py          # 3D metric computation
│   ├── evaluator.py           # main evaluator
│   └── visualizer.py          # result visualization
├── configs/
│   └── eval_config.yaml       # evaluation config
└── eval.py                    # evaluation entry script
```

### 5.2 Core Class Design

#### 5.2.1 Parsers
```python
class GroundTruthParser:
    def parse_line(self, line, img_shape): ...
    def is_3d_annotated(self, values): ...
    def get_class_name(self, label_id): ...

class DetectionParser:
    def parse_line(self, line): ...
    def map_class_name(self, name_str): ...
```

#### 5.2.2 Matcher
```python
class Matcher2D:
    def __init__(self, iou_threshold=0.5): ...
    def compute_iou(self, box1, box2): ...
    def match(self, gts, dets): ...  # returns a list of matched pairs
```

#### 5.2.3 Metric Calculators
```python
class Metrics2D:
    def __init__(self): ...
    def add_image_results(self, matches, gts, dets, class_id): ...
    def compute_ap(self, class_id): ...
    def compute_map(self): ...
    def get_summary(self): ...

class Metrics3D:
    def __init__(self): ...
    def add_sample(self, gt_3d, det_3d, class_id, face_type=None): ...
    def _get_vehicle_face_center(self, gt_faces, face_type): ...  # pick the GT face center matching face_type
    def compute_statistics(self, class_id): ...
    def get_summary(self): ...
```

#### 5.2.4 Main Evaluator
```python
class Evaluator:
    def __init__(self, config): ...
    def load_ground_truth(self, gt_file): ...
    def load_detections(self, det_file): ...
    def evaluate_2d(self): ...
    def evaluate_3d(self): ...
    def generate_report(self, output_path): ...
```

### 5.3 Example Configuration

```yaml
# eval_config.yaml
dataset:
  gt_path: "path/to/ground_truth"
  det_path: "path/to/detections"
  image_list: "path/to/image_list.txt"

classes:
  3d_classes: [0, 1, 2, 3]  # vehicle, pedestrian, bike, rider
  2d_classes: [4, 5, 6, 7, 8, 9, 10, 11, 12, 13]
  class_names:
    0: "vehicle"
    1: "pedestrian"
    2: "bike"
    3: "rider"
    4: "roadblock"
    5: "head"
    6: "tsr"
    7: "guideboard"
    8: "plate"
    9: "wheel"
    10: "tl_border"
    11: "tl_wick"
    12: "tl_num"
    13: "tricycle"

matching:
  iou_threshold: 0.5

metrics_2d:
  enabled: true
  confidence_threshold: [0.1, 0.3, 0.5, 0.7, 0.9]

metrics_3d:
  enabled: true
  distance_ranges:  # optional: report statistics per distance range
    - [0, 30]
    - [30, 60]
    - [60, 100]
    - [100, 999]

output:
  save_path: "eval_results"
  format: ["json", "csv", "txt"]
  visualize: true
```
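
The `distance_ranges` option above could be applied as below. This is a sketch under two assumptions the design leaves open: the distance used for binning is the ground-truth longitudinal distance (`z3d_gt`) of each matched pair, and each range is half-open `[lo, hi)` so that a boundary value falls into exactly one bin. `bin_by_distance` is a hypothetical helper; each bin's list can then be summarized with the per-class statistics of Section 3.2.4.

```python
import numpy as np


def bin_by_distance(errors, distances, ranges):
    """Group errors by distance range, e.g. ranges=[[0, 30], [30, 60], [60, 100], [100, 999]]."""
    errors = np.asarray(errors, dtype=float)
    distances = np.asarray(distances, dtype=float)  # assumed: z3d_gt of each matched pair
    bins = {}
    for lo, hi in ranges:
        mask = (distances >= lo) & (distances < hi)  # half-open interval [lo, hi)
        bins[f'{lo}-{hi}'] = errors[mask].tolist()
    return bins
```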

## 6. Output Report Format

### 6.1 2D Evaluation Report

```json
{
  "2d_evaluation": {
    "per_class": {
      "vehicle": {
        "precision": 0.91,
        "recall": 0.88,
        "ap": 0.90,
        "num_gt": 1500,
        "num_det": 1450,
        "tp": 1320,
        "fp": 130,
        "fn": 180
      },
      "pedestrian": {...},
      ...
    },
    "overall": {
      "precision": 0.87,
      "recall": 0.84,
      "map": 0.85,
      "num_classes": 14
    }
  }
}
```

### 6.2 3D Evaluation Report

```json
{
  "3d_evaluation": {
    "vehicle": {
      "lateral_error": {
        "mean": 0.25,
        "median": 0.18,
        "std": 0.15,
        "percentile_90": 0.45
      },
      "longitudinal_error": {
        "mean": 1.2,
        "median": 0.9,
        "std": 0.8,
        "percentile_90": 2.1
      },
      "heading_error": {
        "mean": 0.08,
        "median": 0.05,
        "std": 0.06,
        "percentile_90": 0.15
      },
      "num_samples": 1320
    },
    "pedestrian": {...},
    ...
  }
}
```

## 7. Key Implementation Details

### 7.1 IoU Computation
```python
def compute_iou(box1, box2):
    """
    box: [x1, y1, x2, y2]
    """
    x1 = max(box1[0], box2[0])
    y1 = max(box1[1], box2[1])
    x2 = min(box1[2], box2[2])
    y2 = min(box1[3], box2[3])

    if x2 < x1 or y2 < y1:
        return 0.0

    intersection = (x2 - x1) * (y2 - y1)
    area1 = (box1[2] - box1[0]) * (box1[3] - box1[1])
    area2 = (box2[2] - box2[0]) * (box2[3] - box2[1])
    union = area1 + area2 - intersection

    return intersection / union if union > 0 else 0.0
```
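
Step 2b of Section 4.2 needs the IoU of every GT/DET pair. As a sketch (`iou_matrix` is an illustrative name), a vectorized NumPy version of the function above broadcasts N boxes against M boxes instead of looping in Python; negative overlaps are clipped to zero intersection, and empty inputs yield an empty matrix.

```python
import numpy as np


def iou_matrix(boxes_a, boxes_b):
    """Pairwise IoU between N and M boxes in [x1, y1, x2, y2] form; returns an N x M array."""
    a = np.asarray(boxes_a, dtype=float).reshape(-1, 4)
    b = np.asarray(boxes_b, dtype=float).reshape(-1, 4)
    # Intersection corners, broadcast to shape (N, M)
    ix1 = np.maximum(a[:, None, 0], b[None, :, 0])
    iy1 = np.maximum(a[:, None, 1], b[None, :, 1])
    ix2 = np.minimum(a[:, None, 2], b[None, :, 2])
    iy2 = np.minimum(a[:, None, 3], b[None, :, 3])
    inter = np.clip(ix2 - ix1, 0, None) * np.clip(iy2 - iy1, 0, None)
    area_a = (a[:, 2] - a[:, 0]) * (a[:, 3] - a[:, 1])
    area_b = (b[:, 2] - b[:, 0]) * (b[:, 3] - b[:, 1])
    union = area_a[:, None] + area_b[None, :] - inter
    # Avoid division by zero for degenerate boxes
    return np.where(union > 0, inter / np.where(union > 0, union, 1), 0.0)
```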

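### 7.2 AP Computation (11-Point Interpolation)
```python
import numpy as np

def compute_ap(precisions, recalls):
    """
    Compute AP with the PASCAL VOC 11-point interpolation.
    """
    precisions = np.asarray(precisions, dtype=float)
    recalls = np.asarray(recalls, dtype=float)
    ap = 0.0
    for t in np.linspace(0, 1, 11):
        # Interpolated precision: max precision at any recall >= t (0 if none)
        if np.sum(recalls >= t) == 0:
            p = 0
        else:
            p = np.max(precisions[recalls >= t])
        ap += p / 11.0
    return ap
```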
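
Section 4.2 leaves the choice between interpolated and integrated AP open. For comparison, here is a sketch of the all-point interpolation used by PASCAL VOC from 2010 onward, which integrates the monotone precision envelope over every recall change instead of sampling 11 recall levels. The two methods generally give different values on the same curve, so a report should state which one it uses. `compute_ap_all_points` is a hypothetical name.

```python
import numpy as np


def compute_ap_all_points(precisions, recalls):
    """All-point interpolated AP: area under the non-increasing precision envelope."""
    mrec = np.concatenate(([0.0], np.asarray(recalls, dtype=float), [1.0]))
    mpre = np.concatenate(([0.0], np.asarray(precisions, dtype=float), [0.0]))
    # Make precision non-increasing when scanning from right to left
    for i in range(len(mpre) - 2, -1, -1):
        mpre[i] = max(mpre[i], mpre[i + 1])
    # Sum rectangle areas at every point where recall changes
    idx = np.where(mrec[1:] != mrec[:-1])[0]
    return float(np.sum((mrec[idx + 1] - mrec[idx]) * mpre[idx + 1]))
```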

### 7.3 Angle Normalization
```python
import numpy as np

def normalize_angle(angle):
    """
    Wrap an angle into [-π, π].
    """
    while angle > np.pi:
        angle -= 2 * np.pi
    while angle < -np.pi:
        angle += 2 * np.pi
    return angle
```
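
The loop above works for scalars; for arrays of angles, a closed-form sketch based on the modulo operator wraps all values in one step. It maps into the half-open interval [-π, π) (an input of exactly π becomes -π), which leaves `abs(...)` in the heading error unchanged. `normalize_angle_vec` is an illustrative name.

```python
import numpy as np


def normalize_angle_vec(angle):
    """Wrap angles (scalar or array-like) into [-pi, pi) without a loop."""
    # np.mod keeps the sign of the divisor, so the shifted value lands in [0, 2*pi)
    return (np.asarray(angle, dtype=float) + np.pi) % (2 * np.pi) - np.pi
```

For example, a predicted `rot_y` of 3.1 against a ground truth of -3.1 gives a raw difference of 6.2 rad, which wraps to about -0.083 rad, so the heading error is about 0.083 rad rather than 6.2.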

### 7.4 Coordinate Conversion
```python
def normalized_to_pixel(bbox_norm, img_width, img_height):
    """
    Convert normalized coordinates to pixel coordinates.
    bbox_norm: [x_center, y_center, w, h] (normalized)
    Returns: [x1, y1, x2, y2] (pixel)
    """
    x_center = bbox_norm[0] * img_width
    y_center = bbox_norm[1] * img_height
    w = bbox_norm[2] * img_width
    h = bbox_norm[3] * img_height

    x1 = x_center - w / 2
    y1 = y_center - h / 2
    x2 = x_center + w / 2
    y2 = y_center + h / 2

    return [x1, y1, x2, y2]
```

## 8. Usage Examples

### 8.1 Command Line
```bash
# Basic evaluation
python eval_tools/eval.py \
    --gt-path /path/to/labels \
    --det-path /path/to/predictions \
    --output-dir eval_results

# Use a config file
python eval_tools/eval.py \
    --config eval_tools/configs/eval_config.yaml

# 2D evaluation only
python eval_tools/eval.py \
    --config eval_config.yaml \
    --eval-2d-only

# 3D evaluation only
python eval_tools/eval.py \
    --config eval_config.yaml \
    --eval-3d-only
```

### 8.2 Python API
```python
from eval_tools.evaluator import Evaluator

# Create the evaluator
evaluator = Evaluator(config_path='eval_config.yaml')

# Load data
evaluator.load_data(
    gt_path='/path/to/labels',
    det_path='/path/to/predictions'
)

# Run the evaluation
results_2d = evaluator.evaluate_2d()
results_3d = evaluator.evaluate_3d()

# Generate reports
evaluator.generate_report(
    output_dir='eval_results',
    formats=['json', 'csv', 'html']
)
```

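## 9. Extensibility

### 9.1 Multiple IoU Thresholds
Can be extended to COCO-style multiple IoU thresholds (0.5:0.05:0.95)

### 9.2 Distance-Range Evaluation
Report the 3D metrics separately for different distance ranges (near, mid, far)

### 9.3 Per-Scenario Evaluation
Evaluate separately by scenario (day/night, clear/rain, etc.)

### 9.4 Temporal Consistency
For video sequences, evaluate tracking consistency and stability

## 10. Notes

1. **Consistent coordinates**: make sure GT and DET use the same coordinate system and image size
2. **Class mapping**: keep the mapping between string class names and numeric IDs consistent
3. **Edge cases**: handle empty detections, empty ground truth, mismatched image sizes, and similar cases
4. **Performance**: for large datasets, consider parallel processing and memory optimization
5. **Numerical precision**: watch for floating-point precision issues in the 3D metric computations
6. **Visualization**: provide visualizations of detections and error distributions to aid analysis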