# Per-Case 2D Metrics Comparison Tool

This tool compares `per_case_2d` metrics between two model evaluation reports and identifies cases with significant metric differences.

## Files

- `compare_per_case_2d.py` - Main Python script for comparing per-case metrics
- `compare_per_case_2d.sh` - Shell script with pre-configured paths for the mono3d vs yolov5s-300w-newdata comparison

## Usage

### Quick Start (Using Shell Script)

```bash
cd /deeplearning_team/ydong/dongying/projects/yolov5-3d
./eval_tools/model_comparison/compare_per_case_2d.sh
```

This compares the two pre-configured models (mono3d and yolov5s-300w-newdata) and saves the results to `evaluation_results/per_case_2d_comparison.json`.

### Custom Comparison (Using Python Script)

```bash
python eval_tools/model_comparison/compare_per_case_2d.py \
  --model1 path/to/model1/evaluation_report.json \
  --model2 path/to/model2/evaluation_report.json \
  --model1-name "Model-A" \
  --model2-name "Model-B" \
  --threshold 0.1 \
  --output comparison_results.json \
  --top-n 30
```

### Arguments

- `--model1`: Path to the first model's `evaluation_report.json` (required)
- `--model2`: Path to the second model's `evaluation_report.json` (required)
- `--model1-name`: Display name for model 1 (default: "Model-1")
- `--model2-name`: Display name for model 2 (default: "Model-2")
- `--threshold`: Threshold for significant difference, e.g., 0.1 = 10% (default: 0.1)
- `--output`: Output JSON file path (default: "per_case_comparison.json")
- `--top-n`: Number of top different cases to display (default: 20)
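
The argument list above maps naturally onto Python's `argparse`. The following is a minimal sketch, assuming the script parses its options with `argparse`; the function name `build_parser` and the help strings are hypothetical, and the real `compare_per_case_2d.py` may define its options differently.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the documented arguments (illustrative only)."""
    parser = argparse.ArgumentParser(
        description="Compare per_case_2d metrics between two evaluation reports."
    )
    # The two report paths are the only required options.
    parser.add_argument("--model1", required=True,
                        help="Path to the first model's evaluation_report.json")
    parser.add_argument("--model2", required=True,
                        help="Path to the second model's evaluation_report.json")
    # Display names and defaults follow the argument list in this README.
    parser.add_argument("--model1-name", default="Model-1",
                        help="Display name for model 1")
    parser.add_argument("--model2-name", default="Model-2",
                        help="Display name for model 2")
    parser.add_argument("--threshold", type=float, default=0.1,
                        help="Threshold for a significant difference")
    parser.add_argument("--output", default="per_case_comparison.json",
                        help="Output JSON file path")
    parser.add_argument("--top-n", type=int, default=20,
                        help="Number of top different cases to display")
    return parser


# Example: parse an explicit argument list instead of sys.argv.
example_args = build_parser().parse_args(
    ["--model1", "a/evaluation_report.json", "--model2", "b/evaluation_report.json"]
)
print(example_args.model1_name, example_args.threshold, example_args.top_n)
```

Note that `argparse` exposes hyphenated options with underscores, so `--model1-name` becomes `args.model1_name` and `--top-n` becomes `args.top_n`.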

## Output

The script generates:

1. **Console Output**:
   - Summary of total cases and common cases
   - Top N cases with significant differences
   - Summary statistics (mean, std, median, range) for each class and metric

2. **JSON File**: Contains detailed comparison data including:
   - `summary`: Overview statistics
   - `significant_differences`: List of cases exceeding the threshold
   - `all_case_comparisons`: Complete per-case comparison data
   - `summary_statistics`: Statistical analysis by class and metric
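
To check an output file quickly, one option is to load it and confirm that the four top-level keys listed above are present. The sketch below assumes only those key names; the helper names `load_comparison` and `describe_comparison` are hypothetical, and the layout inside each key is deliberately not assumed because it is not documented here.

```python
import json
from pathlib import Path

# Top-level keys named in this README; anything nested below them is unknown.
EXPECTED_KEYS = (
    "summary",
    "significant_differences",
    "all_case_comparisons",
    "summary_statistics",
)


def load_comparison(path):
    """Read a comparison JSON file written by the script into a dict."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


def describe_comparison(data):
    """Report presence, container type, and size of each documented key."""
    report = {}
    for key in EXPECTED_KEYS:
        value = data.get(key)
        # Only measure lists and dicts; other types have no meaningful size.
        size = len(value) if isinstance(value, (list, dict)) else None
        report[key] = {
            "present": key in data,
            "type": type(value).__name__,
            "size": size,
        }
    return report
```

For example, `describe_comparison(load_comparison("comparison_results.json"))` returns one entry per expected key, which makes a missing or empty section easy to spot before digging into the per-case data.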

## Example Output

```
Top 30 Cases with Significant Differences
================================================================================

1. Case: 20251118/seq-53
   Class: pedestrian, Metric: ap
   mono3d: 1.0000
   yolov5s-300w-newdata: 0.0000
   Difference: -1.0000 (abs: 1.0000)

2. Case: 20251121/seq-30
   Class: roadblock, Metric: ap
   mono3d: 1.0000
   yolov5s-300w-newdata: 0.0000
   Difference: -1.0000 (abs: 1.0000)
...

Summary Statistics
================================================================================

VEHICLE:
  ap        : mean=-0.0776, std=0.1439, median=-0.0243, range=[-0.7935, +0.0994]
  precision : mean=+0.1279, std=0.2248, median=+0.0934, range=[-0.9442, +0.6074]
  recall    : mean=-0.1210, std=0.1579, median=-0.0635, range=[-0.8975, +0.0000]
```

## Interpretation

- **Positive difference**: Model 2 scores higher than Model 1 (the difference is Model 2 minus Model 1)
- **Negative difference**: Model 1 scores higher than Model 2
- Cases are sorted by absolute difference (largest differences first)
- Summary statistics show overall trends in the differences across all cases, per class and metric
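
The sign convention, threshold filter, and ordering described above can be sketched for a single class and metric as follows. This is an illustration, not the script's actual code: the function name `compare_metric` and its input shape (a dict from case ID to metric value) are hypothetical, the strict `>` comparison against the threshold is an assumption, and the population standard deviation is used only because the README does not say which variant the script reports.

```python
import statistics


def compare_metric(values1, values2, threshold=0.1):
    """Compare one metric across cases.

    values1 / values2 map a case ID to the metric value for model 1 / model 2.
    Returns (diffs, significant_cases, stats).
    """
    # Only cases present in both reports can be compared.
    common = sorted(set(values1) & set(values2))

    # Positive difference means model 2 scored higher than model 1.
    diffs = {case: values2[case] - values1[case] for case in common}

    # Keep cases whose absolute difference exceeds the threshold,
    # ordered with the largest absolute difference first.
    significant = sorted(
        (case for case in common if abs(diffs[case]) > threshold),
        key=lambda case: abs(diffs[case]),
        reverse=True,
    )

    values = list(diffs.values())
    stats = {}
    if values:
        stats = {
            "mean": statistics.mean(values),
            "std": statistics.pstdev(values),  # assumption: population std
            "median": statistics.median(values),
            "range": (min(values), max(values)),
        }
    return diffs, significant, stats


# Hypothetical case IDs and values, for illustration only.
diffs, significant, stats = compare_metric(
    {"case-a": 1.0, "case-b": 0.5},
    {"case-a": 0.0, "case-b": 0.5},
)
print(significant, stats)
```

In this example only `case-a` is flagged, with a difference of -1.0, meaning model 1 scored higher on that case.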