HSAP/datasets/lane.embedded.bak/DATASETS_LAYOUT.md
Chengfang Lu e72bc061c5 feat: HSAP platform v2 — modular navigation, quality review, audit log, world model simulation
Major changes:
- New frontend (platform/web/): Vite + React 18 + TypeScript + Tailwind
- 4-module navigation: Data Labeling (数据送标) / Model Management (模型管理) / Fleet Management (车队管理) / System Management (系统管理)
- Data catalog with charts (DMS/ADAS/Lane 3-tab view)
- Quality review workflow (标注质检): Good/Fine/Bad scoring with auto-advance
- Audit enhancements: batch operations, rejection categories, Feishu notifications
- Operation audit log (操作日志)
- World model simulation studio (仿真工坊)
- Dataset version management with snapshots and diff
- ADAS 7-class dataset integration (138K images organized + compressed)
- User management with Feishu integration and pagination
- CRUD/search/filter on all pages, card layout redesign
- PIL-optimized image overlay rendering
- Auto-snapshot on build, in_review workflow stage
- Removed embedded algorithm code (now in workspace)
2026-06-03 11:40:21 +08:00

# Multi-Pack Dataset Directory Layout (DATASET + DATASET-AddBy-*)
## Directory conventions
```
lane0_copy/
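├── DATASET/ # baseline pack v1 (frozen, never overwritten)
│ ├── images/ ...
│ ├── annotations/segmentation_masks/ ...
│ ├── list/train_gt.txt # paths relative to this pack only: images/... mask/...
│ └── manifest.json
├── DATASET-AddBy-zhangsan-20260615/ # engineer's incremental pack (separate directory)
│ ├── images/ ...
│ ├── annotations/segmentation_masks/ ...
│ ├── list/train_gt.txt
│ └── manifest.json
├── lists_merged/ # training lists merged across packs (never written back into the packs)
│ └── train_all_v2.txt # each path carries a pack-name prefix, see below
└── datasets_registry.json # registry of all packs and merged-list versions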
```
**Naming rule:** `DATASET-AddBy-<engineer-name>-<date>`
- Date: `YYYYMMDD` is recommended, e.g. `20260615`
- Name: use English letters or pinyin and avoid spaces (use `_` instead)
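The naming rule can be checked mechanically before a pack is registered. The following is a minimal sketch, not part of the existing scripts: `parse_pack_name` and the regular expression are hypothetical, and it treats the recommended `YYYYMMDD` date as mandatory, which is stricter than the rule itself.

```python
import re
from datetime import datetime

# Hypothetical pattern mirroring the naming rule: engineer name without
# spaces (letters, digits, underscore), then an 8-digit date.
PACK_NAME = re.compile(r"^DATASET-AddBy-(?P<engineer>[A-Za-z0-9_]+)-(?P<date>\d{8})$")


def parse_pack_name(name):
    """Return (engineer, date) for a valid incremental pack name, else None."""
    match = PACK_NAME.match(name)
    if match is None:
        return None
    try:
        # Rejects impossible dates such as 20261340.
        date = datetime.strptime(match.group("date"), "%Y%m%d").date()
    except ValueError:
        return None
    return match.group("engineer"), date
```

A name containing a space or an impossible date yields `None`, so a caller can refuse the pack before any files are moved.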
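## List file format (merged training)
Set `data_root` to **`lane0_copy`** (the parent directory of all packs). Each line of a merged list has two columns, and both paths **carry the pack-name prefix**: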
```
DATASET/images/src_.../frame_000001.jpg DATASET/annotations/segmentation_masks/src_.../frame_000001.png
DATASET-AddBy-zhangsan-20260615/images/src_.../frame_000001.jpg DATASET-AddBy-zhangsan-20260615/annotations/...
```
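To make the relationship between a pack-local list and a merged list concrete, here is a small sketch. The helper names `prefix_line` and `split_pack` are hypothetical, and the code assumes paths contain no whitespace, so each line splits cleanly into two columns.

```python
def prefix_line(pack, line):
    """Turn a pack-local 'image mask' line into a merged-list line."""
    image, mask = line.split()  # assumes exactly two whitespace-separated columns
    return f"{pack}/{image} {pack}/{mask}"


def split_pack(path):
    """Split a merged-list path into (pack name, pack-relative path)."""
    pack, relative = path.split("/", 1)
    return pack, relative
```

Because the pack name is always the first path component, a loader can resolve each entry by joining it onto `data_root` and can still recover which pack a sample came from.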
UFLD config example (**recommended: set `train_packs` in the config**):
```python
# configs/mufld_lane_multi_pack.py
data_root = '/home/chengfanglu/DATA/lane0_copy'
train_packs = ['DATASET', 'DATASET-A'] # short names can be mapped via "aliases" in datasets_registry.json
pack_list_name = 'list/train_gt.txt'
merged_list_dir = 'lists_merged'
```
Running `python train.py configs/mufld_lane_multi_pack.py` merges the per-pack lists automatically and caches the result in `lists_merged/train__DATASET__....txt`.
Alias example (`datasets_registry.json`):
```json
"aliases": {
"DATASET-A": "DATASET-AddBy-zhangsan-20260615"
}
```
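Alias resolution amounts to a dictionary lookup with a fall-through for names that are already full pack names. The sketch below illustrates this; `resolve_packs` is a hypothetical helper, and the registry is assumed to be a JSON object with a top-level `aliases` key, as the fragment above suggests.

```python
import json


def resolve_packs(train_packs, registry):
    """Map short pack names to directory names; unknown names pass through."""
    aliases = registry.get("aliases", {})
    return [aliases.get(name, name) for name in train_packs]


# Assumed registry content: the fragment above wrapped in a JSON object.
registry = json.loads('{"aliases": {"DATASET-A": "DATASET-AddBy-zhangsan-20260615"}}')
resolved = resolve_packs(["DATASET", "DATASET-A"], registry)
print(resolved)  # ['DATASET', 'DATASET-AddBy-zhangsan-20260615']
```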
## Workflow
### 1. Create an incremental pack (the engineer submits an archive + `train_val_gt.txt`)
```bash
conda activate lane_light
python scripts/build_ufld_pack.py \
--src /path/to/new_archive \
--parent /home/chengfanglu/DATA/lane0_copy \
--engineer zhangsan \
--date 20260615
```
Output: `DATASET-AddBy-zhangsan-20260615/`
### 2. Merge multi-pack training lists (DATASET v1 is left untouched)
```bash
python scripts/merge_ufld_lists.py \
--data-root /home/chengfanglu/DATA/lane0_copy \
--out lists_merged/train_all_v2.txt \
--prefix-from-pack \
DATASET/list/train_gt.txt \
DATASET-AddBy-zhangsan-20260615/list/train_gt.txt
```
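The merge step can be pictured as follows. This is only a sketch of the idea, not the code of `merge_ufld_lists.py`: `merge_lists` is a hypothetical function, and it assumes that "same image path" means the same pack-relative path, since prefixed paths from different packs can never be equal.

```python
def merge_lists(packs):
    """Merge per-pack lists into prefixed lines; the first pack listed wins.

    packs: ordered list of (pack_name, iterable of 'image mask' lines).
    """
    seen = set()
    merged = []
    for pack, lines in packs:
        for line in lines:
            line = line.strip()
            if not line:
                continue
            image, mask = line.split()
            # Assumption: duplicates are detected on the pack-relative image path.
            if image in seen:
                continue
            seen.add(image)
            merged.append(f"{pack}/{image} {pack}/{mask}")
    return merged


merged = merge_lists([
    ("DATASET", ["images/a.jpg mask/a.png"]),
    ("DATASET-AddBy-zhangsan-20260615", ["images/a.jpg mask/a.png", "images/b.jpg mask/b.png"]),
])
```

Here the `images/a.jpg` entry from the incremental pack is dropped because the baseline pack was listed first, and the per-pack list files themselves are never modified.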
### 3. Train
```bash
cd /home/chengfanglu/DATA/BK2/UFLD
# in the config: data_root=lane0_copy, train_list=lists_merged/train_all_v2.txt
python train.py configs/mufld_lane_culane.py
```
### 4. Register the version
Passing `--update-registry` to the merge script records the merged list in `datasets_registry.json`.
## Principles
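| Item | Practice |
|------|----------|
| Baseline reproducibility | Always keep `DATASET/list/train_gt.txt` unchanged; train from the copies in `lists_merged/*.txt` |
| Incremental isolation | One `DATASET-AddBy-*` pack per engineer; never mix files into DATASET |
| Disk | Hard links by default; use `--copy` across disks |
| Deduplication | Deduplicate by **image path** when merging; the pack that appears first wins (`--base` sets the primary pack) |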
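
The disk policy in the table (hard link by default, copy on request) can be sketched as below. `place_file` is a hypothetical helper, not a function from the build script. Hard links only work within one filesystem, which is why a copy is needed across disks.

```python
import os
import shutil
import tempfile


def place_file(src, dst, copy=False):
    """Put src at dst as a hard link, or as a full copy when copy=True."""
    os.makedirs(os.path.dirname(dst) or ".", exist_ok=True)
    if copy:
        shutil.copy2(src, dst)  # independent file, works across filesystems
    else:
        os.link(src, dst)  # shares the inode; raises OSError across filesystems
    return dst


# Small demonstration inside a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "src.jpg")
    with open(src, "wb") as f:
        f.write(b"demo")
    linked = place_file(src, os.path.join(tmp, "pack", "images", "a.jpg"))
    copied = place_file(src, os.path.join(tmp, "pack", "images", "b.jpg"), copy=True)
    linked_shares_inode = os.path.samefile(src, linked)
    copied_shares_inode = os.path.samefile(src, copied)
```

A hard-linked pack costs almost no extra disk space, while a copied pack stays intact even if the source archive is later deleted or moved to another volume.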