# Example: Using 2-Level Path Structure for Model Evaluation This example demonstrates how to evaluate models with 2-level directory structures. ## Scenario You have organized your test data with an additional level of hierarchy: ``` /data/detections/ ├── G1M3_AFS1616/ # Level 1: Dataset/configuration name │ ├── case_001/ # Level 2: Individual test cases │ │ └── txt_results/ │ │ ├── frame001.txt │ │ └── frame002.txt │ └── case_002/ │ └── txt_results/ │ └── frame001.txt └── G1M3_AFS1920/ └── case_003/ └── txt_results/ └── frame001.txt /data/ground_truth/ ├── G1M3_AFS1616/ │ ├── case_001/ │ │ └── labels/ │ │ ├── frame001.txt │ │ └── frame002.txt │ └── case_002/ │ └── labels/ │ └── frame001.txt └── G1M3_AFS1920/ └── case_003/ └── labels/ └── frame001.txt ``` ## Step 1: Create Configuration File Create `eval_tools/configs/eval_config_2level_example.yaml`: ```yaml # Evaluation Configuration for 2-Level Path Structure dataset: det_path: "/data/detections" gt_path: "/data/ground_truth" path_depth: 2 # Enable 2-level structure image: width: 1920 height: 1080 model: input_size: 704 min_box_size_at_input_scale: 8 performance: num_workers: 32 roi_gt: enabled: true calib_root: "/data/ground_truth" roi_config: [1920, 960] roi: enabled: false region: [0, 120, 1920, 1080] input_size: [704, 352] classes: 3d_classes: [0, 1, 2, 3] 2d_classes: [4, 5, 6, 7, 8, 9, 10, 11, 12, 13] class_names: 0: "vehicle" 1: "pedestrian" 2: "bicycle" 3: "rider" 4: "roadblock" 5: "head" 6: "tsr" 7: "guideboard" 8: "plate" 9: "wheel" 10: "tl_border" 11: "tl_wick" 12: "tl_num" 13: "tricycle" matching: iou_threshold: 0.5 metrics_2d: enabled: true conf_threshold: 0.3 ap_method: "voc2010" metrics_3d: enabled: true heading_tolerance: "both" distance_ranges: - [0, 20] - [20, 40] - [40, 60] - [60, 80] - [80, 100] - [100, 999] lateral_distance_ranges: - [-50, -40] - [-40, -30] - [-30, -20] - [-20, -10] - [-10, 0] - [0, 10] - [10, 20] - [20, 30] - [30, 40] - [40, 50] output: save_path: "evaluation_results/2level_example/{timestamp}" formats: ["json", "txt"] print_details: true per_case_reports: true ``` ## Step 2: Run Evaluation ### Method 1: Using Config File ```bash python eval_tools/core/eval.py \ --config eval_tools/configs/eval_config_2level_example.yaml \ --save-detailed-matches ``` ### Method 2: Command Line Override ```bash python eval_tools/core/eval.py \ --config eval_tools/configs/eval_config_2level_example.yaml \ --path-depth 2 \ --det-path /data/detections \ --gt-path /data/ground_truth \ --output-dir evaluation_results/custom_output ``` ### Method 3: Without Config File ```bash python eval_tools/core/eval.py \ --det-path /data/detections \ --gt-path /data/ground_truth \ --path-depth 2 \ --img-width 1920 \ --img-height 1080 \ --iou-threshold 0.5 \ --conf-threshold 0.3 \ --heading-tolerance both \ --output-dir evaluation_results/2level_test ``` ## Step 3: Compare Two Models with 2-Level Paths ### Create Config for Model 1 `eval_tools/configs/eval_config_model1_2level.yaml`: ```yaml dataset: det_path: "/data/model1/detections" gt_path: "/data/ground_truth" path_depth: 2 # ... other settings ... ``` ### Create Config for Model 2 `eval_tools/configs/eval_config_model2_2level.yaml`: ```yaml dataset: det_path: "/data/model2/detections" gt_path: "/data/ground_truth" path_depth: 2 # ... other settings ... ``` ### Update Comparison Script Edit `eval_tools/model_comparison/compare_models_with_common_matches.sh`: ```bash #!/bin/bash set -e # Configuration MODEL1_CONFIG="eval_tools/configs/eval_config_model1_2level.yaml" MODEL2_CONFIG="eval_tools/configs/eval_config_model2_2level.yaml" OUTPUT_BASE="evaluation_results/2level_comparison" TIMESTAMP=$(date +%Y%m%d_%H%M%S) MODEL1_NAME="model1" MODEL2_NAME="model2" # Step 1: Evaluate Model 1 echo "Step 1: Evaluating Model 1..." MODEL1_OUTPUT="${OUTPUT_BASE}/${MODEL1_NAME}/${TIMESTAMP}" python eval_tools/core/eval.py \ --config ${MODEL1_CONFIG} \ --output-dir ${MODEL1_OUTPUT} \ --save-detailed-matches # Step 2: Evaluate Model 2 echo "Step 2: Evaluating Model 2..." MODEL2_OUTPUT="${OUTPUT_BASE}/${MODEL2_NAME}/${TIMESTAMP}" python eval_tools/core/eval.py \ --config ${MODEL2_CONFIG} \ --output-dir ${MODEL2_OUTPUT} \ --save-detailed-matches # Step 3: Find common matches echo "Step 3: Finding common matches..." COMMON_MATCHES_DIR="${OUTPUT_BASE}/common_matches_${TIMESTAMP}" mkdir -p ${COMMON_MATCHES_DIR} python eval_tools/model_comparison/find_common_matches.py \ --model1-matches ${MODEL1_OUTPUT}/detailed_3d_matches.json \ --model2-matches ${MODEL2_OUTPUT}/detailed_3d_matches.json \ --output ${COMMON_MATCHES_DIR}/common_matches.json \ --model1-name "${MODEL1_NAME}" \ --model2-name "${MODEL2_NAME}" # Step 4: Compare models echo "Step 4: Comparing models..." COMPARISON_DIR="${OUTPUT_BASE}/comparison_${TIMESTAMP}" python eval_tools/model_comparison/compare_models.py \ --model1 ${MODEL1_OUTPUT}/evaluation_report.json \ --model2 ${MODEL2_OUTPUT}/evaluation_report.json \ --model1-name "${MODEL1_NAME}" \ --model2-name "${MODEL2_NAME}" \ --common-matches ${COMMON_MATCHES_DIR}/common_matches.json \ --output-dir ${COMPARISON_DIR} echo "✓ Comparison complete!" echo "Results: ${COMPARISON_DIR}/comparison_report.txt" ``` ### Run Comparison ```bash bash eval_tools/model_comparison/compare_models_with_common_matches.sh ``` ## Expected Output ``` ================================================================================ YOLOv5-3D Model Evaluation ================================================================================ Detection path: /data/detections Ground truth path: /data/ground_truth Path depth: 2 Output directory: evaluation_results/2level_example/20260211_143022 Image size: 1920x1080 IoU threshold: 0.5 Confidence threshold: 0.3 AP method: voc2010 Heading tolerance: both Number of workers: 32 Evaluate 2D: True Evaluate 3D: True Save detailed matches: Yes ================================================================================ Loading data... Found 3 case(s) in detection root: /data/detections (path_depth=2) Loaded 5 image pairs for evaluation ================================================== Evaluating 2D Detection Metrics ================================================== Processing case [1/3]: case_001 (2 frames) case_001: 100%|████████████████████| 2/2 [00:00<00:00, 45.23it/s] Processing case [2/3]: case_002 (1 frames) case_002: 100%|████████████████████| 1/1 [00:00<00:00, 48.12it/s] Processing case [3/3]: case_003 (2 frames) case_003: 100%|████████████████████| 2/2 [00:00<00:00, 46.87it/s] ================================================== Evaluating 3D Detection Metrics ================================================== Processing case [1/3]: case_001 (2 frames) case_001: 100%|████████████████████| 2/2 [00:00<00:00, 42.15it/s] Processing case [2/3]: case_002 (1 frames) case_002: 100%|████████████████████| 1/1 [00:00<00:00, 43.89it/s] Processing case [3/3]: case_003 (2 frames) case_003: 100%|████████████████████| 2/2 [00:00<00:00, 41.76it/s] Detailed 3D matches saved to: evaluation_results/2level_example/20260211_143022/detailed_3d_matches.json JSON report saved to: evaluation_results/2level_example/20260211_143022/evaluation_report.json Text report saved to: evaluation_results/2level_example/20260211_143022/evaluation_report.txt Per-case reports saved to: evaluation_results/2level_example/20260211_143022/per_case_reports/ (3 cases) ================================================================================ EVALUATION SUMMARY - OVERALL ================================================================================ 2D Metrics: Precision: 0.8542 Recall: 0.8123 mAP: 0.8234 3D Metrics: vehicle [overall]: Lat=0.234m, Long=0.456m, Head=0.123rad (relaxed=0.098rad, rev=12) (n=145) pedestrian [overall]: Lat=0.189m, Long=0.312m, Head=0.234rad (relaxed=0.187rad, rev=8) (n=67) ================================================================================ ✓ Evaluation completed successfully! ``` ## Troubleshooting ### Issue: No cases found **Check:** 1. Verify `path_depth` is set correctly (1 or 2) 2. Ensure directory structure matches the expected format 3. Check that level1 directory names match between det_path and gt_path ### Issue: Some cases are skipped **Check:** 1. Each case must have `txt_results/` subdirectory (for detections) 2. Each case must have `labels/` subdirectory (for ground truth) 3. Frame names must match between detections and labels ### Issue: GT case directory not found **Check:** 1. Level1 directory names must be identical in both paths 2. Case names must be identical in both paths 3. Path structure must be consistent ## Benefits of 2-Level Structure 1. **Better Organization**: Group related test cases by dataset or configuration 2. **Flexible Comparison**: Compare models across different datasets easily 3. **Scalability**: Handle large numbers of test cases more efficiently 4. **Backward Compatible**: Existing 1-level structures continue to work ## Migration from 1-Level to 2-Level If you have existing 1-level structure and want to migrate: ```bash # Original structure /data/detections/case1/txt_results/ /data/detections/case2/txt_results/ # Create 2-level structure mkdir -p /data/detections_2level/dataset_A mv /data/detections/case1 /data/detections_2level/dataset_A/ mv /data/detections/case2 /data/detections_2level/dataset_A/ # Update config # path_depth: 1 → path_depth: 2 # det_path: /data/detections → /data/detections_2level ```