# Per-Case 2D Metrics Comparison Tool

This tool compares `per_case_2d` metrics between two model evaluation reports and identifies cases with significant metric differences.

## Files

- `compare_per_case_2d.py` - Main Python script for comparing per-case metrics
- `compare_per_case_2d.sh` - Shell script with pre-configured paths for the mono3d vs. yolov5s-300w-newdata comparison

## Usage

### Quick Start (Using Shell Script)

```bash
cd /deeplearning_team/ydong/dongying/projects/yolov5-3d
./eval_tools/model_comparison/compare_per_case_2d.sh
```

This compares the two models and saves the results to `evaluation_results/per_case_2d_comparison.json`.

### Custom Comparison (Using Python Script)

```bash
python eval_tools/model_comparison/compare_per_case_2d.py \
    --model1 path/to/model1/evaluation_report.json \
    --model2 path/to/model2/evaluation_report.json \
    --model1-name "Model-A" \
    --model2-name "Model-B" \
    --threshold 0.1 \
    --output comparison_results.json \
    --top-n 30
```

### Arguments

- `--model1`: Path to the first model's `evaluation_report.json` (required)
- `--model2`: Path to the second model's `evaluation_report.json` (required)
- `--model1-name`: Display name for model 1 (default: `"Model-1"`)
- `--model2-name`: Display name for model 2 (default: `"Model-2"`)
- `--threshold`: Threshold for a significant difference, e.g., `0.1` = 10% (default: `0.1`)
- `--output`: Output JSON file path (default: `"per_case_comparison.json"`)
- `--top-n`: Number of top differing cases to display (default: `20`)

## Output

The script generates:

1. **Console output**:
   - Summary of total cases and common cases
   - Top N cases with significant differences
   - Summary statistics (mean, std, median, range) for each class and metric
2. **JSON file** containing detailed comparison data, including:
   - `summary`: Overview statistics
   - `significant_differences`: List of cases exceeding the threshold
   - `all_case_comparisons`: Complete per-case comparison data
   - `summary_statistics`: Statistical analysis by class and metric

## Example Output

```
Top 30 Cases with Significant Differences
================================================================================

1. Case: 20251118/seq-53
   Class: pedestrian, Metric: ap
   mono3d: 1.0000
   yolov5s-300w-newdata: 0.0000
   Difference: -1.0000 (abs: 1.0000)

2. Case: 20251121/seq-30
   Class: roadblock, Metric: ap
   mono3d: 1.0000
   yolov5s-300w-newdata: 0.0000
   Difference: -1.0000 (abs: 1.0000)

...

Summary Statistics
================================================================================

VEHICLE:
  ap         : mean=-0.0776, std=0.1439, median=-0.0243, range=[-0.7935, +0.0994]
  precision  : mean=+0.1279, std=0.2248, median=+0.0934, range=[-0.9442, +0.6074]
  recall     : mean=-0.1210, std=0.1579, median=-0.0635, range=[-0.8975, +0.0000]
```

## Interpretation

Each difference is Model 2's value minus Model 1's value (in the example above, `0.0000 - 1.0000 = -1.0000`). For higher-is-better metrics such as AP, precision, and recall:

- **Positive difference**: Model 2 performs better than Model 1
- **Negative difference**: Model 1 performs better than Model 2
- Cases are sorted by absolute difference (largest differences first)
- Summary statistics show overall trends across all cases
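The comparison logic described above can be sketched in Python roughly as follows. This is a minimal illustration, not the code in `compare_per_case_2d.py`. The nested `case -> class -> metric -> value` layout, the top-level `"per_case_2d"` key read by the hypothetical `load_per_case_2d` helper, the reading of `--threshold` as an absolute-difference cutoff, and the use of the population standard deviation are all assumptions.

```python
import json
import statistics
from collections import defaultdict


def load_per_case_2d(path):
    """Hypothetical loader for one evaluation report.

    ASSUMPTION: the report has a top-level "per_case_2d" key mapping
    case -> class -> metric -> value. Adjust to the real schema.
    """
    with open(path, encoding="utf-8") as f:
        return json.load(f)["per_case_2d"]


def compare_per_case(model1, model2, threshold=0.1):
    """Return (all comparisons, significant ones sorted by |difference|)."""
    comparisons = []
    # Only cases present in both reports are compared.
    for case in sorted(set(model1) & set(model2)):
        for cls, metrics1 in model1[case].items():
            metrics2 = model2[case].get(cls, {})
            for metric, v1 in metrics1.items():
                if metric not in metrics2:
                    continue  # skip metrics missing from model 2
                v2 = metrics2[metric]
                diff = v2 - v1  # positive: model 2 scores higher
                comparisons.append({
                    "case": case, "class": cls, "metric": metric,
                    "model1": v1, "model2": v2,
                    "difference": diff, "abs_difference": abs(diff),
                })
    # ASSUMPTION: "significant" means an absolute difference above threshold.
    significant = sorted(
        (c for c in comparisons if c["abs_difference"] > threshold),
        key=lambda c: c["abs_difference"],
        reverse=True,
    )
    return comparisons, significant


def summary_statistics(comparisons):
    """Mean, population std, median and range of differences per class/metric."""
    grouped = defaultdict(list)
    for c in comparisons:
        grouped[(c["class"], c["metric"])].append(c["difference"])
    stats = defaultdict(dict)
    for (cls, metric), diffs in grouped.items():
        stats[cls][metric] = {
            "mean": statistics.mean(diffs),
            "std": statistics.pstdev(diffs),
            "median": statistics.median(diffs),
            "min": min(diffs),
            "max": max(diffs),
        }
    return dict(stats)


if __name__ == "__main__":
    # Toy data for illustration only; not real evaluation results.
    m1 = {"seq-a": {"pedestrian": {"ap": 1.0, "recall": 0.5}},
          "seq-b": {"pedestrian": {"ap": 0.6, "recall": 0.4}}}
    m2 = {"seq-a": {"pedestrian": {"ap": 0.0, "recall": 0.55}},
          "seq-b": {"pedestrian": {"ap": 0.65, "recall": 0.4}},
          "seq-c": {"pedestrian": {"ap": 0.9}}}
    all_cmp, significant = compare_per_case(m1, m2, threshold=0.1)
    for i, c in enumerate(significant, 1):
        print(f"{i}. Case: {c['case']}  Class: {c['class']}, "
              f"Metric: {c['metric']}  Difference: {c['difference']:+.4f}")
    for cls, per_metric in summary_statistics(all_cmp).items():
        for metric, s in per_metric.items():
            print(f"{cls} {metric}: mean={s['mean']:+.4f}, std={s['std']:.4f}")
```

With real reports, the two mappings would come from loading each model's `evaluation_report.json`. Because `significant` is already sorted largest-first, slicing `significant[:top_n]` would mirror the `--top-n` console display. `seq-c` is ignored in the toy run because it appears in only one report.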