yolov26_3d/eval_tools/model_comparison/README_per_case_comparison.md
2026-06-24 09:35:46 +08:00

Per-Case 2D Metrics Comparison Tool

This tool compares per_case_2d metrics between two model evaluation reports and identifies cases with significant metric differences.

Files

  • compare_per_case_2d.py - Main Python script for comparing per-case metrics
  • compare_per_case_2d.sh - Shell script with pre-configured paths for mono3d vs yolov5s-300w-newdata comparison

Usage

Quick Start (Using Shell Script)

cd /deeplearning_team/ydong/dongying/projects/yolov5-3d
./eval_tools/model_comparison/compare_per_case_2d.sh

This compares mono3d with yolov5s-300w-newdata using the paths pre-configured in the shell script and saves the results to evaluation_results/per_case_2d_comparison.json.

Custom Comparison (Using Python Script)

python eval_tools/model_comparison/compare_per_case_2d.py \
    --model1 path/to/model1/evaluation_report.json \
    --model2 path/to/model2/evaluation_report.json \
    --model1-name "Model-A" \
    --model2-name "Model-B" \
    --threshold 0.1 \
    --output comparison_results.json \
    --top-n 30

Arguments

  • --model1: Path to first model's evaluation_report.json (required)
  • --model2: Path to second model's evaluation_report.json (required)
  • --model1-name: Display name for model 1 (default: "Model-1")
  • --model2-name: Display name for model 2 (default: "Model-2")
  • --threshold: Minimum metric difference for a case to count as significant, e.g., 0.1 = a gap of 0.10 (10 percentage points) on a 0-1 metric (default: 0.1)
  • --output: Output JSON file path (default: "per_case_comparison.json")
  • --top-n: Number of top different cases to display (default: 20)
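
As a rough illustration of how these flags fit together, the sketch below rebuilds the argument list with argparse. It is a reconstruction from the list above, not the code in compare_per_case_2d.py: the function name build_parser, the help strings, and the use of argparse itself are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical reconstruction of the CLI from the documented flags;
    # the real script may declare them differently.
    parser = argparse.ArgumentParser(
        description="Compare per-case 2D metrics between two evaluation reports"
    )
    parser.add_argument("--model1", required=True,
                        help="Path to first model's evaluation_report.json")
    parser.add_argument("--model2", required=True,
                        help="Path to second model's evaluation_report.json")
    parser.add_argument("--model1-name", default="Model-1")
    parser.add_argument("--model2-name", default="Model-2")
    parser.add_argument("--threshold", type=float, default=0.1)
    parser.add_argument("--output", default="per_case_comparison.json")
    parser.add_argument("--top-n", type=int, default=20)
    return parser


# Parse an explicit list so the sketch runs without real command-line input.
args = build_parser().parse_args(["--model1", "a.json", "--model2", "b.json"])
print(args.model1_name, args.model2_name, args.threshold, args.output, args.top_n)
```

Only --model1 and --model2 have to be supplied; every other flag falls back to the default listed above.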

Output

The script generates:

  1. Console Output:

    • Summary of total cases and common cases
    • Top N cases with significant differences
    • Summary statistics (mean, std, median, range) for each class and metric

  2. JSON File: Contains detailed comparison data including:

    • summary: Overview statistics
    • significant_differences: List of cases exceeding the threshold
    • all_case_comparisons: Complete per-case comparison data
    • summary_statistics: Statistical analysis by class and metric

Example Output

Top 30 Cases with Significant Differences
================================================================================

1. Case: 20251118/seq-53
   Class: pedestrian, Metric: ap
   mono3d: 1.0000
   yolov5s-300w-newdata: 0.0000
   Difference: -1.0000 (abs: 1.0000)

2. Case: 20251121/seq-30
   Class: roadblock, Metric: ap
   mono3d: 1.0000
   yolov5s-300w-newdata: 0.0000
   Difference: -1.0000 (abs: 1.0000)
...

Summary Statistics
================================================================================

VEHICLE:
  ap        : mean=-0.0776, std=0.1439, median=-0.0243, range=[-0.7935, +0.0994]
  precision : mean=+0.1279, std=0.2248, median=+0.0934, range=[-0.9442, +0.6074]
  recall    : mean=-0.1210, std=0.1579, median=-0.0635, range=[-0.8975, +0.0000]
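
Each summary line above reduces the per-case differences for one class and metric to five numbers. A minimal sketch of that reduction follows, using Python's statistics module. It assumes a population standard deviation; the actual script may use the sample version or a different library, and the input values are made up.

```python
import statistics


def summarize(diffs: list) -> dict:
    # Assumption: population standard deviation (pstdev).
    return {
        "mean": statistics.fmean(diffs),
        "std": statistics.pstdev(diffs),
        "median": statistics.median(diffs),
        "min": min(diffs),
        "max": max(diffs),
    }


# Made-up per-case differences for one class/metric pair.
stats = summarize([-0.5, 0.0, 0.25, 0.25])

# Format the result the same way as the console summary lines.
line = (
    f"mean={stats['mean']:+.4f}, std={stats['std']:.4f}, "
    f"median={stats['median']:+.4f}, "
    f"range=[{stats['min']:+.4f}, {stats['max']:+.4f}]"
)
print(line)
```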

Interpretation

  • Positive difference: Model 2 performs better than Model 1 (the difference is the Model 2 value minus the Model 1 value)
  • Negative difference: Model 1 performs better than Model 2
  • Cases are sorted by absolute difference (largest differences first)
  • Summary statistics show overall trends across all cases
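
The sign convention and ordering described above can be sketched as follows. This is an illustrative reimplementation, not the script's code: the function rank_differences, the (case, class, metric) key layout, the inclusive threshold check, and the sample values are all assumptions.

```python
def rank_differences(model1: dict, model2: dict,
                     threshold: float = 0.1, top_n: int = 20) -> list:
    rows = []
    # Only keys present in both reports (the "common cases") are compared.
    for key in model1.keys() & model2.keys():
        diff = model2[key] - model1[key]  # positive: model 2 scored higher
        if abs(diff) >= threshold:        # assumption: threshold is inclusive
            rows.append((key, diff))
    # Largest absolute difference first, then keep the top N.
    rows.sort(key=lambda row: abs(row[1]), reverse=True)
    return rows[:top_n]


# Made-up metric values keyed by (case, class, metric).
m1 = {("case-a", "pedestrian", "ap"): 1.0,
      ("case-b", "vehicle", "recall"): 0.5,
      ("case-c", "vehicle", "ap"): 0.75}
m2 = {("case-a", "pedestrian", "ap"): 0.0,
      ("case-b", "vehicle", "recall"): 0.75,
      ("case-c", "vehicle", "ap"): 0.75,
      ("case-d", "vehicle", "ap"): 0.5}

ranked = rank_differences(m1, m2)
for (case, cls, metric), diff in ranked:
    print(f"{case} {cls} {metric}: {diff:+.4f} (abs: {abs(diff):.4f})")
```

Here case-c is dropped because its difference is below the threshold, and case-d is skipped because it appears in only one report.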