# Ground 2D Detection Training Guide

This guide explains how to train YOLO26 models on custom ground 2D detection datasets with difficulty-based loss weighting.

## Overview

The ground 2D detection implementation supports:

- Custom annotation format with string class names
- Difficulty scores for each bounding box
- Difficulty-based loss weighting (`loss_weight = 1.0 / (1.0 + difficulty)`)
- Minimum box size filtering
- Optional YUV444 color space support

## Dataset Structure

### Directory Layout

```
dataset/
├── images/
│   ├── train/
│   │   ├── img001.jpg
│   │   ├── img002.jpg
│   │   └── ...
│   └── val/
│       ├── img101.jpg
│       ├── img102.jpg
│       └── ...
├── labels/
│   ├── train/
│   │   ├── img001.txt
│   │   ├── img002.txt
│   │   └── ...
│   └── val/
│       ├── img101.txt
│       ├── img102.txt
│       └── ...
└── dataset.yaml
```

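Before training, it can help to confirm that every image has a label file with the same stem, following the layout above. This is a minimal sketch that assumes only that layout; `find_missing_labels` and the set of image extensions are hypothetical choices for illustration, not part of the library.

```python
from pathlib import Path

# Assumed set of image extensions; extend it to match your data.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".bmp"}


def find_missing_labels(root: str, split: str) -> list[str]:
    """Return names of images in images/<split> with no labels/<split>/<stem>.txt."""
    image_dir = Path(root) / "images" / split
    label_dir = Path(root) / "labels" / split
    missing = []
    for img in sorted(image_dir.iterdir()):
        if img.suffix.lower() not in IMAGE_SUFFIXES:
            continue  # skip files that are not images
        if not (label_dir / f"{img.stem}.txt").is_file():
            missing.append(img.name)
    return missing
```

For example, `find_missing_labels("/path/to/dataset", "val")` lists validation images that have no label file.
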
### Label Format

Each label file contains one line per object with 7 columns:

```
class_name x_center y_center width height difficulty1 difficulty2
```

Example:

```
car 0.5 0.5 0.3 0.2 0.0 0.0
pedestrian 0.3 0.4 0.1 0.15 0.5 0.5
bicycle 0.7 0.6 0.15 0.2 1.0 1.0
```

Where:

- `class_name`: string class name (e.g., `car`, `pedestrian`), mapped to a numeric ID via `class_map` in the dataset YAML
- `x_center, y_center, width, height`: box center and size, normalized to [0, 1] by the image width and height
- `difficulty1, difficulty2`: difficulty values that are summed into a single per-box difficulty (`difficulty = difficulty1 + difficulty2`)

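As a minimal sketch of how one label line maps to a class ID, a normalized box, and a combined difficulty, assuming whitespace-separated columns as in the example above. `parse_ground_label_line` is a hypothetical helper, not the parser used by `verify_image_label_ground()`.

```python
def parse_ground_label_line(line: str, class_map: dict[str, int]) -> tuple[int, list[float], float]:
    """Parse one label line into (class_id, [x, y, w, h], difficulty).

    Hypothetical helper for illustration only.
    """
    parts = line.split()
    if len(parts) != 7:
        raise ValueError(f"expected 7 columns, got {len(parts)}")
    name, *values = parts
    if name not in class_map:
        raise KeyError(f"class name {name!r} not in class_map")
    x, y, w, h, d1, d2 = map(float, values)
    difficulty = d1 + d2  # the two difficulty columns are summed
    return class_map[name], [x, y, w, h], difficulty
```

With the `class_map` from the YAML below, `parse_ground_label_line("pedestrian 0.3 0.4 0.1 0.15 0.5 0.5", {"car": 0, "pedestrian": 1, "bicycle": 2, "truck": 3})` returns `(1, [0.3, 0.4, 0.1, 0.15], 1.0)`.
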
### Dataset YAML Configuration

Create a `dataset.yaml` file:

```yaml
# Dataset paths
path: /path/to/dataset
train: images/train
val: images/val

# Class mapping: string names to numeric IDs
class_map:
  car: 0
  pedestrian: 1
  bicycle: 2
  truck: 3

# Optional parameters
min_wh: 2.0  # Keep boxes whose width or height is at least this many pixels
use_yuv444: false  # Use YUV444 color space (default: false)
```

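A quick sanity check of the YAML can catch configuration mistakes early. This sketch operates on the dict returned by `yaml.safe_load()`; `check_ground_config` is hypothetical, and the rule that class IDs run contiguously from 0 is an assumption based on the usual YOLO convention rather than a requirement stated in this guide.

```python
def check_ground_config(cfg: dict) -> list[str]:
    """Return a list of problems found in a parsed ground dataset config.

    `cfg` is the dict produced by yaml.safe_load() on dataset.yaml.
    These checks are illustrative assumptions, not the library's validation.
    """
    problems = []
    class_map = cfg.get("class_map")
    if not isinstance(class_map, dict) or not class_map:
        problems.append("class_map must be a non-empty mapping of name -> id")
        return problems
    # Assumed convention: IDs are 0..N-1 with no gaps (several names may share one ID).
    ids = sorted(set(class_map.values()))
    if ids != list(range(len(ids))):
        problems.append(f"class IDs should be contiguous from 0, got {ids}")
    min_wh = cfg.get("min_wh", 0.0)
    if not isinstance(min_wh, (int, float)) or min_wh < 0:
        problems.append("min_wh must be a non-negative number")
    if not isinstance(cfg.get("use_yuv444", False), bool):
        problems.append("use_yuv444 must be true or false")
    return problems
```

A config that still uses a `names` list instead of `class_map` is reported immediately, which matches the first troubleshooting entry at the end of this guide.
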
## Training

### Python API

```python
from ultralytics.models.yolo.detect import GroundDetectionTrainer

# Initialize trainer
trainer = GroundDetectionTrainer(
    overrides={
        "model": "yolo26n.pt",  # or yolo26s.pt, yolo26m.pt, etc.
        "data": "path/to/dataset.yaml",
        "epochs": 100,
        "imgsz": 640,
        "batch": 16,
        "device": 0,
    }
)

# Start training
trainer.train()
```

### Command Line

```bash
# Using GroundDetectionTrainer directly
python -c "from ultralytics.models.yolo.detect import GroundDetectionTrainer; \
trainer = GroundDetectionTrainer(overrides={'model': 'yolo26n.pt', 'data': 'dataset.yaml', 'epochs': 100}); \
trainer.train()"
```

## Difficulty-Based Loss Weighting

The implementation applies difficulty-based weighting to all loss components, using a per-box weight:

```python
loss_weight = 1.0 / (1.0 + difficulty)  # difficulty = difficulty1 + difficulty2
```

Examples:

- difficulty = 0.0 (easy): weight = 1.0
- difficulty = 1.0 (medium): weight = 0.5
- difficulty = 2.0 (hard): weight ≈ 0.33

For instance, the `pedestrian` label in the example above (`0.5 0.5`) has difficulty 1.0 and weight 0.5, and the `bicycle` label (`1.0 1.0`) has difficulty 2.0 and weight of about 0.33.

Harder boxes therefore contribute less to the loss, so the model focuses on easier, more reliably detectable objects while still learning from harder examples.

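The weighting can be sketched per box as follows, scaling each loss component (for example box and classification terms) by the same difficulty weight. The helper names are hypothetical, and how the weighted terms are reduced inside `v8DetectionLossGround` (sum, mean, or normalization by total weight) is not specified in this guide, so it is left out.

```python
def difficulty_weights(difficulty1: list[float], difficulty2: list[float]) -> list[float]:
    """Per-box loss weights: 1 / (1 + (difficulty1 + difficulty2))."""
    return [1.0 / (1.0 + d1 + d2) for d1, d2 in zip(difficulty1, difficulty2)]


def weight_loss_components(
    components: dict[str, list[float]], weights: list[float]
) -> dict[str, list[float]]:
    """Multiply every per-box term of every loss component by that box's weight.

    The final reduction of these weighted terms is intentionally not shown.
    """
    return {name: [loss * w for loss, w in zip(values, weights)] for name, values in components.items()}
```

For the three example labels, `difficulty_weights([0.0, 0.5, 1.0], [0.0, 0.5, 1.0])` returns `[1.0, 0.5, 0.3333333333333333]`.
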
## Class Mapping

The `class_map` in the YAML lets several string names share one numeric ID, which merges those classes during training:

```yaml
class_map:
  car: 0
  suv: 0  # Maps to same class as car
  van: 0  # Maps to same class as car
  bus: 1
  truck: 2
  pedestrian: 3
```

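Because `class_map` is many-to-one, a readable name per numeric ID has to be chosen somehow. This sketch keeps the first name listed for each ID (in file order, which `yaml.safe_load()` preserves in the resulting dict); that rule and the `class_names_from_map` helper are assumptions for illustration, not necessarily what `YOLOGroundDataset` does.

```python
def class_names_from_map(class_map: dict[str, int]) -> dict[int, str]:
    """Build an ID -> display-name table from a many-to-one class_map.

    When several names share an ID, the first one listed wins (assumed rule).
    """
    names: dict[int, str] = {}
    for name, class_id in class_map.items():
        names.setdefault(class_id, name)  # keep the first name seen for this ID
    return dict(sorted(names.items()))
```

With the mapping above, `class_names_from_map({"car": 0, "suv": 0, "van": 0, "bus": 1, "truck": 2, "pedestrian": 3})` returns `{0: 'car', 1: 'bus', 2: 'truck', 3: 'pedestrian'}`, so the model sees 4 classes.
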
## Minimum Box Size Filtering

A box is filtered out only when both its width and its height are smaller than `min_wh` pixels:

```yaml
min_wh: 2.0  # Filter boxes only if both sides are smaller than 2 px
```

This removes tiny boxes that are very hard to detect, while thin boxes with at least one side of `min_wh` pixels or more are kept.

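The filtering rule can be sketched as follows. It assumes normalized `[x, y, w, h]` boxes converted to pixels with the image size; whether the library measures pixels before or after resizing is not stated in this guide, and `filter_small_boxes` is a hypothetical helper.

```python
def filter_small_boxes(
    boxes: list[list[float]], img_w: int, img_h: int, min_wh: float = 2.0
) -> list[list[float]]:
    """Drop normalized [x, y, w, h] boxes whose pixel width AND height are both < min_wh."""
    kept = []
    for x, y, w, h in boxes:
        w_px, h_px = w * img_w, h * img_h  # normalized size -> pixels
        if w_px >= min_wh or h_px >= min_wh:  # keep if either side is large enough
            kept.append([x, y, w, h])
    return kept
```

On a 1000×1000 image, a 1×1 px box is dropped, while a 1×10 px box survives because its height reaches the threshold.
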
## YUV444 Color Space

If your images are in YUV444 format, enable conversion:

```yaml
use_yuv444: true
```

The dataset will automatically convert images from YUV444 to BGR during loading.

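For reference, a YUV444 to BGR conversion can be sketched with NumPy. This assumes full-range BT.601 (JPEG-style) coefficients with channels stored in Y, U, V order; the actual loader may use OpenCV or different coefficients, and `yuv444_to_bgr` is a hypothetical name.

```python
import numpy as np


def yuv444_to_bgr(yuv: np.ndarray) -> np.ndarray:
    """Convert an HxWx3 uint8 YUV444 image (Y, U, V order) to uint8 BGR.

    Assumes full-range BT.601 coefficients.
    """
    yuv = yuv.astype(np.float32)
    y = yuv[..., 0]
    u = yuv[..., 1] - 128.0  # chroma channels are centered at 128
    v = yuv[..., 2] - 128.0
    r = y + 1.402 * v
    g = y - 0.344136 * u - 0.714136 * v
    b = y + 1.772 * u
    bgr = np.stack([b, g, r], axis=-1)  # OpenCV-style channel order
    return np.clip(np.rint(bgr), 0, 255).astype(np.uint8)
```

A neutral pixel (Y=128, U=128, V=128) maps to mid-gray (128, 128, 128), which is a quick way to check the channel order of your data.
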
## Validation

The trained model can be validated using the standard YOLO validation API:

```python
from ultralytics import YOLO

# Best checkpoint from training; adjust the path to your run directory
model = YOLO("runs/detect/train/weights/best.pt")
results = model.val(data="dataset.yaml")
```

## Inference

Use the trained model for inference:

```python
from ultralytics import YOLO

model = YOLO("runs/detect/train/weights/best.pt")
results = model.predict("path/to/image.jpg")
```

## Key Implementation Files

- `ultralytics/utils/instance.py`: Extended `Instances` class with difficulty support
- `ultralytics/data/utils.py`: Added `verify_image_label_ground()` function
- `ultralytics/data/dataset.py`: Added `YOLOGroundDataset` class
- `ultralytics/utils/loss.py`: Added `v8DetectionLossGround` class
- `ultralytics/data/build.py`: Modified `build_yolo_dataset()` to detect ground datasets
- `ultralytics/models/yolo/detect/train.py`: Added `GroundDetectionTrainer` class
- `ultralytics/cfg/datasets/ground_template.yaml`: Dataset configuration template

## Troubleshooting

### Issue: "class_map not found in data"

Make sure your dataset YAML has a `class_map` dictionary instead of a `names` list.

### Issue: "labels require 6 columns"

Check that every line in your label files has exactly 7 columns: `class_name`, the 4 box coordinates, and the 2 difficulty values.

### Issue: "negative label values"

Ensure all coordinates and difficulty values are non-negative.

### Issue: "non-normalized coordinates"

Coordinates must be normalized to the [0, 1] range.

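A small pre-flight check can flag the issues above before training starts. `lint_ground_label_lines` is a hypothetical helper that mirrors these checks; it does not reproduce the library's exact messages, and the extra check for class names missing from `class_map` is an assumption added here.

```python
def lint_ground_label_lines(lines: list[str], class_map: dict[str, int]) -> list[str]:
    """Return human-readable problems found in one label file's lines (illustrative only)."""
    problems: list[str] = []
    for i, raw in enumerate(lines, start=1):
        parts = raw.split()
        if not parts:
            continue  # skip blank lines (assumed harmless)
        if len(parts) != 7:
            problems.append(f"line {i}: expected 7 columns, found {len(parts)}")
            continue
        name, values = parts[0], parts[1:]
        if name not in class_map:
            problems.append(f"line {i}: class {name!r} not in class_map")
        try:
            x, y, w, h, d1, d2 = (float(v) for v in values)
        except ValueError:
            problems.append(f"line {i}: non-numeric value")
            continue
        if min(x, y, w, h, d1, d2) < 0:
            problems.append(f"line {i}: negative label values")
        if max(x, y, w, h) > 1:
            problems.append(f"line {i}: non-normalized coordinates")
    return problems
```

To check a file, read it with `open(path).read().splitlines()` and pass the resulting lines together with the `class_map` from your dataset YAML.
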