Files
HSAP/datasets/lane.embedded.bak/DATASETS_LAYOUT.md
Chengfang Lu e72bc061c5 feat: HSAP platform v2 — modular navigation, quality review, audit log, world model simulation
Major changes:
- New frontend (platform/web/): Vite + React 18 + TypeScript + Tailwind
- 4-module navigation: 数据送标 / 模型管理 / 车队管理 / 系统管理
- Data catalog with charts (DMS/ADAS/Lane 3-tab view)
- Quality review workflow (标注质检): Good/Fine/Bad scoring with auto-advance
- Audit enhancements: batch operations, rejection categories, Feishu notifications
- Operation audit log (操作日志)
- World model simulation studio (仿真工坊)
- Dataset version management with snapshots and diff
- ADAS 7-class dataset integration (138K images organized + compressed)
- User management with Feishu integration and pagination
- CRUD/search/filter on all pages, card layout redesign
- PIL-optimized image overlay rendering
- Auto-snapshot on build, in_review workflow stage
- Removed embedded algorithm code (now in workspace)
2026-06-03 11:40:21 +08:00

3.3 KiB
Raw Blame History

多包数据集目录规范DATASET + DATASET-AddBy-*

目录约定

lane0_copy/
├── DATASET/                              # 基线包 v1冻结不覆盖
│   ├── images/ ...
│   ├── annotations/segmentation_masks/ ...
│   ├── list/train_gt.txt                 # 仅本包内相对路径: images/... mask/...
│   └── manifest.json
│
├── DATASET-AddBy-zhangsan-20260615/      # 工程师增量包(独立目录)
│   ├── images/ ...
│   ├── annotations/segmentation_masks/ ...
│   ├── list/train_gt.txt
│   └── manifest.json
│
├── lists_merged/                         # 跨包合并后的训练列表(不写回各包)
│   └── train_all_v2.txt                  # 行内带包名前缀,见下
│
└── datasets_registry.json                # 登记所有包与合并列表版本

命名规则: DATASET-AddBy-<工程师姓名>-<日期>

  • 日期建议 YYYYMMDD,例如 20260615
  • 姓名用英文/拼音,避免空格(可用 _

列表文件格式(合并训练)

data_root 设为 lane0_copy(各包的父目录),合并列表每行两列,路径带包名前缀

DATASET/images/src_.../frame_000001.jpg DATASET/annotations/segmentation_masks/src_.../frame_000001.png
DATASET-AddBy-zhangsan-20260615/images/src_.../frame_000001.jpg DATASET-AddBy-zhangsan-20260615/annotations/...

UFLD 配置示例(推荐:在 config 里写 train_packs

# configs/mufld_lane_multi_pack.py
data_root = '/home/chengfanglu/DATA/lane0_copy'
train_packs = ['DATASET', 'DATASET-A']   # 短名可在 datasets_registry.json 的 aliases 里映射
pack_list_name = 'list/train_gt.txt'
merged_list_dir = 'lists_merged'

python train.py configs/mufld_lane_multi_pack.py 会自动合并并缓存到 lists_merged/train__DATASET__....txt

别名示例 datasets_registry.json

"aliases": {
  "DATASET-A": "DATASET-AddBy-zhangsan-20260615"
}

工作流

1. 新建增量包(工程师提交 archive + train_val_gt.txt

conda activate lane_light
python scripts/build_ufld_pack.py \
  --src /path/to/new_archive \
  --parent /home/chengfanglu/DATA/lane0_copy \
  --engineer zhangsan \
  --date 20260615

生成:DATASET-AddBy-zhangsan-20260615/

2. 合并多包训练列表(不改动 DATASET v1

python scripts/merge_ufld_lists.py \
  --data-root /home/chengfanglu/DATA/lane0_copy \
  --out lists_merged/train_all_v2.txt \
  --prefix-from-pack \
  DATASET/list/train_gt.txt \
  DATASET-AddBy-zhangsan-20260615/list/train_gt.txt

3. 训练

cd /home/chengfanglu/DATA/BK2/UFLD
# configs 里 data_root=lane0_copy, train_list=lists_merged/train_all_v2.txt
python train.py configs/mufld_lane_culane.py

4. 登记版本

合并脚本加 --update-registry 会写入 datasets_registry.json

原则

做法
基线复现 永远保留 DATASET/list/train_gt.txt,训练用副本 lists_merged/*.txt
增量隔离 每个工程师一个 DATASET-AddBy-*,不往 DATASET 里混贴文件
磁盘 默认硬链接;跨盘用 --copy
去重 合并时按图像路径去重,先出现的包优先(--base 指定主包)