Chengfang Lu e72bc061c5 feat: HSAP platform v2 — modular navigation, quality review, audit log, world model simulation
Major changes:
- New frontend (platform/web/): Vite + React 18 + TypeScript + Tailwind
- 4-module navigation: 数据送标 (Labeling) / 模型管理 (Models) / 车队管理 (Fleet) / 系统管理 (System)
- Data catalog with charts (DMS/ADAS/Lane 3-tab view)
- Quality review workflow (标注质检): Good/Fine/Bad scoring with auto-advance
- Audit enhancements: batch operations, rejection categories, Feishu notifications
- Operation audit log (操作日志)
- World model simulation studio (仿真工坊)
- Dataset version management with snapshots and diff
- ADAS 7-class dataset integration (138K images organized + compressed)
- User management with Feishu integration and pagination
- CRUD/search/filter on all pages, card layout redesign
- PIL-optimized image overlay rendering
- Auto-snapshot on build, in_review workflow stage
- Removed embedded algorithm code (now in workspace)
2026-06-03 11:40:21 +08:00

# CLAUDE.md
This file provides guidance to Claude Code when working with code in this repository.
## Project Overview
**HSAP** (Huaxu Sentinel Active Safety Platform / 华胥 Sentinel 主动安全平台) is a truck active safety algorithm iteration platform covering DMS (Driver Monitoring), Lane detection, and ADAS perception tasks.
It is NOT a training framework — it is a **platform** that orchestrates collaboration workflows: data ingestion → labeling → audit → training → promotion, with full traceability and governance.
## Architecture: Three-Layer Decoupling
The top-level directory is strictly separated into three layers:
```
HSAP/
├── platform/ # Orchestration layer (API, auth, audit, jobs, web UI)
├── algorithms/ # Algorithm code layer (YOLO, UFLD adapters + registries)
├── datasets/ # Data layer (packs, inbox, sources, labeling configs)
├── scripts/ # Operational scripts (init, smoke tests, sync, worker)
├── docs/ # Documentation (20+ md files covering all aspects)
├── lake/ # Data lake staging area
├── reports/ # Reports, CSVs, figures
├── manifests/ # Runtime configs (feishu.env, DB, job logs, catalog cache)
├── as.py # CLI entry point for workflow commands
├── workflow.registry.yaml # Central registry: projects, packs, automation rules
├── docker-compose.yml # PostgreSQL + Redis + platform + worker + optional minio
├── Dockerfile # Python 3.11-slim, FastAPI on port 8787
└── Makefile # up/down/dev/logs/build/ps/health shortcuts
```
### Layer Responsibilities
| Layer | Purpose | Key Files |
|-------|---------|-----------|
| `platform/as_platform/` | FastAPI backend: API routes, auth (Feishu SSO + JWT), job queue, labeling service, data lake ingest, fleet map, DB models | `api/server.py`, `config.py`, `sdk.py` |
| `algorithms/` | Algorithm adapters: DMS YOLO and Lane UFLD, each with an `adapter.py` and metadata | `registry.yaml` for algorithm registration |
| `datasets/` | Dataset scaffolds: DMS packs (YAML registry), Lane packs (JSON registry), labeling configs | `dms/data_packs.yaml`, `lane/datasets_registry.json` |
## Key Design Decisions
### 1. Platform orchestrates, does NOT implement training
Training is routed through adapters (`algorithms/*/adapter.py`) and the job runner (`platform/as_platform/jobs/runner.py`). Adding a new task type only requires a new adapter + registry entry.
### 2. Audit-first governance
All write operations (build/train/promote/register) go through an audit queue by default. This ensures every model-affecting action is reviewable and traceable.
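A minimal sketch of the audit-first pattern, assuming a simple in-memory queue: a write action is recorded as a pending item, and its side effect runs only after a reviewer approves it. The class and method names (`AuditQueue`, `submit`, `approve`, `reject`) are hypothetical and are not the actual `audit/queue.py` API.

```python
import itertools
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Optional

@dataclass
class AuditItem:
    id: int
    action: str                  # e.g. "build", "train", "promote", "register"
    submitted_by: str
    run: Callable[[], Any]       # deferred write; executed only on approval
    status: str = "pending"      # pending -> approved | rejected
    reviewer: Optional[str] = None
    reason: Optional[str] = None
    result: Any = None

class AuditQueue:
    """Hypothetical in-memory audit queue (illustration only)."""

    def __init__(self) -> None:
        self._items: Dict[int, AuditItem] = {}
        self._ids = itertools.count(1)

    def submit(self, action: str, user: str, run: Callable[[], Any]) -> AuditItem:
        item = AuditItem(next(self._ids), action, user, run)
        self._items[item.id] = item
        return item

    def pending(self) -> List[AuditItem]:
        return [i for i in self._items.values() if i.status == "pending"]

    def approve(self, item_id: int, reviewer: str) -> AuditItem:
        item = self._require_pending(item_id)
        item.status, item.reviewer = "approved", reviewer
        item.result = item.run()  # the model-affecting action happens only here
        return item

    def reject(self, item_id: int, reviewer: str, reason: str) -> AuditItem:
        item = self._require_pending(item_id)
        item.status, item.reviewer, item.reason = "rejected", reviewer, reason
        return item

    def _require_pending(self, item_id: int) -> AuditItem:
        item = self._items[item_id]
        if item.status != "pending":
            raise ValueError(f"item {item_id} is already {item.status}")
        return item
```

Deferring the write as a callable is what makes the queue the single choke point: nothing model-affecting runs until `approve` is called.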
### 3. Dual execution modes
- **`thread`**: In-process thread execution (local dev, no Redis needed). Set via `AS_JOB_EXECUTOR=thread`.
- **`worker`**: Redis-backed async execution (Docker/production). Set via `AS_JOB_EXECUTOR=worker`. Worker runs `scripts/worker.py`.
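A minimal sketch of executor selection via `AS_JOB_EXECUTOR`, assuming a job is a named handler plus a JSON-serializable payload. In `thread` mode the handler runs on a background thread; in `worker` mode the job is only serialized and pushed onto a queue that a separate worker process drains. The function names are hypothetical, and a `collections.deque` stands in for the Redis list (in production the push and pop would be Redis list commands such as `LPUSH` and a blocking pop).

```python
import json
import os
import threading
from collections import deque
from typing import Callable, Dict, Optional

Handler = Callable[[Dict], None]

# Stand-in for the Redis list used in worker mode (assumption for this sketch).
FAKE_REDIS_LIST: deque = deque()

def dispatch_job(name: str, payload: Dict, handlers: Dict[str, Handler],
                 executor: str = "") -> Optional[threading.Thread]:
    """Run or enqueue a job depending on AS_JOB_EXECUTOR (default: thread)."""
    mode = executor or os.environ.get("AS_JOB_EXECUTOR", "thread")
    if mode == "thread":
        # In-process execution on a background thread; no Redis needed.
        t = threading.Thread(target=handlers[name], args=(payload,), daemon=True)
        t.start()
        return t
    if mode == "worker":
        # Serialize and enqueue; a separate worker process runs it later.
        FAKE_REDIS_LIST.appendleft(json.dumps({"name": name, "payload": payload}))
        return None
    raise ValueError(f"unknown AS_JOB_EXECUTOR: {mode!r}")

def worker_drain_once(handlers: Dict[str, Handler]) -> bool:
    """One iteration of a worker loop: pop the oldest job and run it."""
    if not FAKE_REDIS_LIST:
        return False
    job = json.loads(FAKE_REDIS_LIST.pop())  # push left, pop right = FIFO
    handlers[job["name"]](job["payload"])
    return True
```

Serializing the payload in worker mode is the key difference: the worker is a separate process, so only data (not Python objects) can cross the queue.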
### 4. Database flexibility
- Docker: PostgreSQL 16 (auto-configured)
- Local: Auto-falls back to SQLite (`manifests/platform.db`) when PostgreSQL is unavailable
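A minimal sketch of the PostgreSQL-or-SQLite fallback, assuming the `AS_DB_*` variables from the environment table and a simple TCP reachability probe. `resolve_db_url`, `tcp_reachable`, the `psycopg` driver name, and the default user/database names are assumptions; the real logic lives in the platform's SQLAlchemy engine setup under `db/` and may differ. The URL follows SQLAlchemy's `dialect+driver://user:password@host:port/dbname` form, with credentials escaped by `urllib.parse.quote_plus`.

```python
import os
import socket
from typing import Callable, Mapping
from urllib.parse import quote_plus

SQLITE_FALLBACK = "sqlite:///manifests/platform.db"  # relative to repo root

def tcp_reachable(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

def resolve_db_url(env: Mapping[str, str] = os.environ,
                   probe: Callable[[str, int], bool] = tcp_reachable) -> str:
    """Pick a PostgreSQL URL when the server is reachable, else SQLite."""
    host = env.get("AS_DB_HOST")
    if not host:
        return SQLITE_FALLBACK  # PostgreSQL not configured
    port = int(env.get("AS_DB_PORT", "5432"))
    if not probe(host, port):
        return SQLITE_FALLBACK  # configured but unreachable
    user = quote_plus(env.get("AS_DB_USER", "postgres"))  # placeholder default
    password = quote_plus(env.get("AS_DB_PASSWORD", ""))
    name = env.get("AS_DB_NAME", "hsap")                   # placeholder default
    return f"postgresql+psycopg://{user}:{password}@{host}:{port}/{name}"
```

Injecting `probe` keeps the decision testable without a live database.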
### 5. Authentication
- Feishu (飞书) OAuth2 SSO with JWT tokens
- Dev mode: `AS_DEV_AUTH=true` bypasses Feishu login
- RBAC roles: `admin`, `reviewer`, `engineer`, `labeler`, `viewer`
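A minimal sketch of a role-to-permission check, assuming permissions are strings such as `admin:users` and that `*` acts as a wildcard (both appear in the audit-log API plan further down). The role-to-permission mapping is invented for illustration; the real mapping lives in the RBAC dependencies under `auth/`.

```python
from typing import Dict, Iterable, Set

# Hypothetical mapping for illustration only; not the platform's real table.
ROLE_PERMISSIONS: Dict[str, Set[str]] = {
    "admin": {"*"},
    "reviewer": {"audit:review", "data:read", "models:read"},
    "engineer": {"data:read", "data:write", "models:read", "models:train"},
    "labeler": {"labeling:annotate", "data:read"},
    "viewer": {"data:read", "models:read"},
}

def has_permission(roles: Iterable[str], required: str) -> bool:
    """True if any role grants `required`, either exactly or via '*'."""
    for role in roles:
        granted = ROLE_PERMISSIONS.get(role, set())
        if "*" in granted or required in granted:
            return True
    return False

def require_any(roles: Iterable[str], *required: str) -> None:
    """Raise PermissionError unless the user holds at least one of `required`."""
    roles = list(roles)  # allow generators to be iterated more than once
    if not any(has_permission(roles, perm) for perm in required):
        raise PermissionError(f"missing one of: {', '.join(required)}")
```

In a FastAPI route this check would typically sit inside a dependency that converts `PermissionError` into an HTTP 403.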
## Platform Subsystems (`platform/as_platform/`)
```
as_platform/
├── api/ # FastAPI routes: auth, labeling, delivery, fleet, Feishu callbacks
├── auth/ # Feishu OAuth, JWT tokens, user management, RBAC deps
├── db/ # SQLAlchemy engine, models (User, Job, Campaign, etc.), init
├── data/ # Data lake core: ingest pipelines (DMS COCO/YOLO, Lane lines/mask), batch staging, catalog cache
│ └── ingest/ # Ingest adapters: dms_coco, dms_yolo, dms_inbox_raw, lane_lines, lane_mask
├── labeling/ # Labeling service: annotate, batch staging, vendor import, scope, progress, locks
├── audit/ # Audit queue and preview logic
├── jobs/ # Job queue (Redis List or in-memory), runner, Feishu Bitable sync
├── deliveries/ # Delivery/model handoff service
├── fleet/ # Fleet map: GPS tracking, T-Box ingest, mock data seeding
├── training/ # Training service orchestration
├── agents/ # Agent graphs: ingest_flow, labeling_flow, train_promote_flow
│ └── graphs/ # Workflow graph definitions
├── integrations/ # Third-party: Feishu Bitable, Feishu notify, delivery ingest
├── redis/ # Redis pub/sub bus
├── config.py # Central config from env vars (AS_* prefix)
└── sdk.py # Python SDK for platform operations
```
## CLI Usage (`as.py`)
```bash
python as.py status # Show workspace and active packs
python as.py pending # Show pending audit items
python as.py add dms dam --src ... # Register a DMS batch
python as.py build dms dam # Build dataset from active packs
python as.py train dms dam --track local # Train locally
python as.py train dms dam --track platform # Train via platform (requires audit)
```
## Workflow Registry (`workflow.registry.yaml`)
Central configuration for:
- **Projects**: `dms` (DMS YOLO) and `lane` (Lane UFLD), each with root paths, registries, active packs
- **Platform settings**: batch metadata schema, drop zones for inbox/sources, training tracks, agent graphs
- **Automation rules**: eval-before-promote requirement, minimum delta thresholds, baseline metrics
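A minimal sketch of an eval-before-promote gate, assuming each candidate carries a single metric (e.g. mAP) from a completed evaluation, and the registry supplies a baseline metric and a minimum delta. The names `PromotionRule` and `check_promotion` are hypothetical and do not mirror the actual `workflow.registry.yaml` keys.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass
class PromotionRule:
    baseline_metric: float   # e.g. baseline mAP from the registry
    min_delta: float         # required improvement over the baseline
    require_eval: bool = True

def check_promotion(rule: PromotionRule,
                    candidate_metric: Optional[float]) -> Tuple[bool, str]:
    """Return (allowed, reason) for promoting a candidate model."""
    if candidate_metric is None:
        if rule.require_eval:
            return False, "no evaluation result: eval-before-promote is required"
        return True, "evaluation not required"
    delta = candidate_metric - rule.baseline_metric
    if delta < rule.min_delta:
        return False, f"delta {delta:+.4f} below required {rule.min_delta:+.4f}"
    return True, f"delta {delta:+.4f} meets threshold"
```

Returning a reason string alongside the verdict makes the rejection explainable in the audit queue.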
## Docker Services
| Service | Port | Description |
|---------|------|-------------|
| platform | 8787 | FastAPI + static web UI |
| postgres | 5432 (host mapped to 5433) | PostgreSQL 16 |
| redis | 6379 (host mapped to 6380) | Redis 7 |
| worker | - | Async job executor (same image, different command) |
| minio | 9000/9001 | Optional S3-compatible staging (profile: minio) |
## Build & Run Commands
```bash
# Quick start (Docker)
bash scripts/init_after_clone.sh # Generate .env / feishu.env
bash scripts/dev_up.sh # Or: make up
# Local dev (no Docker for platform)
pip install -r requirements.txt
bash scripts/run_local.sh
# Infrastructure only (Docker) + local platform
docker compose up -d postgres redis
bash scripts/run_local.sh
# Utilities
make logs # platform + worker logs
make down # stop all
make dev # with Vite hot reload on :5173
make health # check API health
```
## Key Environment Variables
| Variable | Default | Purpose |
|----------|---------|---------|
| `AS_PLATFORM_PORT` | 8787 | Platform API port |
| `AS_DB_HOST/PORT/USER/PASSWORD/NAME` | - | PostgreSQL connection |
| `AS_REDIS_URL` | - | Redis connection URL |
| `AS_JOB_EXECUTOR` | `thread` | `thread` or `worker` |
| `AS_DEV_AUTH` | `false` | Bypass Feishu auth in dev |
| `AS_JWT_SECRET` | - | JWT signing secret |
| `AS_WORKSPACE_ROOT` | - | External workspace path for large files |
| `AS_FLEET_MAP_ENABLED` | `1` | Enable fleet map APIs |
| `AS_FLEET_MOCK_SEED` | `1` | Seed demo vehicles on first start |
## Important Conventions
1. **Never commit**: `.env`, `feishu.env`, `node_modules`, `*.pt` (model weights), images/videos
2. **Python path**: Platform code runs with `PYTHONPATH=platform` from repo root
3. **Package name**: The Python package is `as_platform` (historical name preserved)
4. **Web UI**: React/Vite frontend sources are external (in workspace), built via `scripts/build_hsap_ls_ui.sh`
5. **Large files**: Images, videos, model weights live in external workspace, mounted at `/data/workspace` in Docker
## Documentation Index
Key docs in `docs/`:
- `DEVELOPMENT_GUIDE.md` — Architecture, conventions, design decisions
- `DEVELOPMENT_ROADMAP.md` — Q2 2026 roadmap with phased milestones
- `DATA_LAKE_CHECKLIST.md` — Data lake operations checklist
- `LABELING_SOP.md` — Labeling standard operating procedure
- `FLEET_MAP.md` — Fleet map / T-Box GPS tracking
- `FEISHU_BITABLE_OPS.md` — Feishu Bitable integration operations
- `BATCH_DELIVERY_OPS.md` — Batch delivery operations
- `LANE_LABELING_PLAN.md` — Lane labeling plan
- `MINIO_STAGING.md` — S3-compatible staging setup
- `PILOT_BATCH.md` — Pilot batch procedures
- `GIT_PUSH.md` — Git push checklist
## Module Navigation (4-Module Architecture)
The frontend (`platform/web/`) is organized into 4 decoupled modules, each with its own tab sub-navigation:
| Module | Route Prefix | API Prefix | Sub-pages |
|--------|-------------|-----------|-----------|
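| **数据送标** (Labeling) | `/labeling/*` | `/api/v1/labeling/*` | Data upload, labeling workbench, labeling progress, export & ingest, batch ledger, data catalog |
| **模型管理** (Models) | `/models/*` | `/api/v1/models/*` | Model overview, training submission, training records, evaluation, model promotion |
| **车队管理** (Fleet) | `/fleet/*` | `/api/v1/fleet/*` | Fleet overview, vehicles, live map, trip records, T-Box config |
| **系统管理** (System) | `/system/*` | `/api/v1/system/*` | Audit queue, job monitor, execution logs, user management |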
**Key files:**
- `platform/web/src/app/Sidebar.tsx` — Collapsible accordion sidebar with permission-gated module groups
- `platform/web/src/app/MainShell.tsx` — Main layout with sidebar + content area + legacy redirects
- `platform/web/src/modules/{labeling,models,fleet,system}/*Shell.tsx` — Module shells with tab sub-navigation
- `platform/as_platform/api/models_routes.py` → `/api/v1/models/*` (model lifecycle APIs)
- `platform/as_platform/api/system_routes.py` → `/api/v1/system/*` (audit, jobs, traces, users APIs)
**Legacy route redirects** (old → new):
- `/deliveries` → `/labeling/deliveries`
- `/catalog` → `/labeling/catalog`
- `/audit` → `/system/audit`
- `/jobs` → `/system/jobs`
- `/training` → `/models/training/records`
- `/labeling/ml` → `/models/overview`
## Technology Stack
- **Backend**: Python 3.11, FastAPI, SQLAlchemy 2.0, Uvicorn
- **Frontend**: React 18 + TypeScript + Vite + Tailwind CSS (`platform/web/`)
- **Database**: PostgreSQL 16 (production), SQLite (local fallback)
- **Queue**: Redis (List-based job queue + Pub/Sub)
- **Auth**: Feishu OAuth2 + JWT (python-jose)
- **Container**: Docker Compose v2
- **Algorithms**: YOLOv6 (DMS), UFLD (Lane detection)
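## Development Plan: Missing Features (Todo)
### P0 · Home Dashboard (1 day)
- [ ] Add a dashboard page at the `/` route, replacing the current direct redirect to the batch ledger (Deliveries)
- [ ] Data pipeline KPI card: batch count per stage (awaiting labeling / being labeled / awaiting ingest / ingested)
- [ ] Model health card: latest mAP, number of training jobs in progress, production model version
- [ ] Audit to-do card: pending audit count, items processed today
- [ ] Fleet live card: online vehicle count, active trip count
- [ ] Recent activity timeline (latest registration / audit / training events)
### P0 · Feishu Audit Notifications (0.5 day)
- [ ] On audit submission → notify the reviewer group: `"{user} 提交了 {action},请审核"` ("{user} submitted {action}, please review")
- [ ] On approval/rejection → notify the submitter: `"你的 {action} 已{通过/驳回}"` ("Your {action} was {approved/rejected}")
- [ ] Reuse the existing `integrations/feishu_notify.py` and hook it into `audit/queue.py`
- [ ] The environment variable `FEISHU_LABELING_CHAT_ID` selects the target group chat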
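### P1 · Basic Ingest Quality Checks (1 day)
- [ ] Add basic quality checks to scan-based ingest:
  - Image readability (corrupt / all-black / all-white detection)
  - Resolution distribution (median / min / max)
  - Annotation file format correctness (YOLO format validation)
  - Out-of-bounds and zero-width/height bounding boxes
- [ ] Show QC results in the scan panel (pass / warn / reject)
- [ ] Rejected batches cannot be registered and require manual handling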
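A minimal sketch of the YOLO-format and bounding-box checks, assuming standard YOLO detection labels: one object per line as `class cx cy w h`, with the four box values normalized to [0, 1]. The function name, the `eps` tolerance, and the verdict policy (malformed lines reject the file, box-geometry problems only warn) are assumptions. The image-readability and resolution checks would need an image library such as Pillow and are not shown.

```python
from typing import List, Tuple

def check_yolo_label(text: str, num_classes: int,
                     eps: float = 1e-6) -> Tuple[str, List[str]]:
    """Validate one YOLO label file; returns (verdict, issues).

    verdict is "pass", "warn", or "reject" (policy assumed for this sketch:
    malformed lines reject the file, box-geometry problems only warn).
    """
    issues: List[str] = []
    verdict = "pass"

    def flag(level: str, msg: str) -> None:
        nonlocal verdict
        issues.append(msg)
        if level == "reject" or verdict == "pass":
            verdict = level  # never downgrade a reject to a warn

    for lineno, raw in enumerate(text.splitlines(), start=1):
        parts = raw.split()
        if not parts:
            continue  # tolerate blank lines
        if len(parts) != 5:
            flag("reject", f"line {lineno}: expected 5 fields, got {len(parts)}")
            continue
        try:
            cls = int(parts[0])
            cx, cy, w, h = (float(v) for v in parts[1:])
        except ValueError:
            flag("reject", f"line {lineno}: non-numeric field")
            continue
        if not 0 <= cls < num_classes:
            flag("reject", f"line {lineno}: class id {cls} out of range")
            continue
        if w <= 0 or h <= 0:
            flag("warn", f"line {lineno}: zero or negative width/height")
            continue
        # Box edges must stay inside the normalized image [0, 1].
        if (cx - w / 2 < -eps or cy - h / 2 < -eps
                or cx + w / 2 > 1 + eps or cy + h / 2 > 1 + eps):
            flag("warn", f"line {lineno}: box extends outside the image")
    return verdict, issues
```

The small `eps` tolerance avoids flagging boxes that touch the image edge only because of float rounding in exported labels.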
### P1 · Operation Audit Log (0.5 day): Detailed Design
#### DB Design
- [ ] A1. Add an `OperationLog` table (SQLAlchemy model)
  - `id` (int, PK, auto-increment)
  - `timestamp` (datetime, time of the operation)
  - `user_id` (int, FK → users.id, nullable, acting user)
  - `user_name` (str, denormalized copy of the acting user's name)
  - `category` (str, operation category: auth/data/labeling/audit/training/system)
  - `action` (str, specific operation: login/logout/register_batch/open_campaign/submit_approval/approve/reject/set_roles/create_snapshot/build_dms/train_dms, etc.)
  - `target_type` (str, target object type: user/batch/campaign/approval/job/snapshot/role)
  - `target_id` (str, target object ID)
  - `summary` (str, one-line summary, e.g. `"登记批次 ddaw/20260601_pilot → raw_pool"`, i.e. "registered batch ddaw/20260601_pilot → raw_pool")
  - `detail_json` (text, full context as JSON, optional)
  - `ip_address` (str, request source IP, optional)
- [ ] A2. Create the table automatically (`Base.metadata.create_all` + `_ensure_*_columns`)
- [ ] A3. Periodic cleanup: keep 90 days; older records are archived or deleted automatically
#### Backend Instrumentation
- [ ] B1. Add an `audit_log.py` utility module that provides a `log_op(db, **kwargs)` convenience function
- [ ] B2. Insert log records at the following key operations:

| Category | Action | Location |
|----------|--------|----------|
| auth | login (dev/feishu) | `auth_routes.py` |
| auth | logout | (triggered by the frontend, optional) |
| data | register_batch | `server.py` api_register_batch |
| data | scan_inbox | `server.py` (log only the registration action) |
| labeling | open_campaign | `labeling_routes.py` |
| labeling | submit_campaign | `labeling_routes.py` |
| labeling | labeling_export | `labeling_routes.py` |
| labeling | import_vendor | `labeling_routes.py` |
| audit | submit_approval | `queue.py` |
| audit | approve | `queue.py` |
| audit | reject | `queue.py` |
| training | create_training | `server.py` / `models_routes.py` |
| system | set_user_roles | `auth_routes.py` |
| system | sync_feishu_users | `system_routes.py` |
| data | create_snapshot | `models_routes.py` |
| delivery | create/submit/delete | `delivery_routes.py` |

- [ ] B3. All log records are written asynchronously with `threading.Thread(daemon=True)` so they never block the main flow
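A minimal sketch of B1 + B3, assuming `log_op` receives a `sink` callable rather than the request's `db` session: a SQLAlchemy `Session` is not thread-safe, so the background thread should open its own session instead of sharing the caller's. This signature therefore deliberately differs from the planned `log_op(db, **kwargs)`; `sink` is a hypothetical stand-in for "open a session, add an `OperationLog` row, commit". Daemon threads are stopped abruptly at interpreter exit, so records still in flight at shutdown can be lost; one long-lived thread draining a bounded queue is an alternative if that matters.

```python
import json
import logging
import threading
from datetime import datetime, timezone
from typing import Any, Callable, Dict, Optional

Record = Dict[str, Any]
log = logging.getLogger(__name__)

def log_op(sink: Callable[[Record], None], *, category: str, action: str,
           summary: str, user_id: Optional[int] = None, user_name: str = "",
           target_type: str = "", target_id: str = "",
           detail: Optional[Dict[str, Any]] = None,
           ip_address: Optional[str] = None) -> threading.Thread:
    """Build an OperationLog-shaped record and write it on a daemon thread.

    `sink` is assumed to open its own DB session, insert the row, and commit.
    """
    record: Record = {
        "timestamp": datetime.now(timezone.utc),
        "user_id": user_id,
        "user_name": user_name,
        "category": category,
        "action": action,
        "target_type": target_type,
        "target_id": target_id,
        "summary": summary,
        # detail_json is a text column, so serialize; keep Chinese readable.
        "detail_json": json.dumps(detail, ensure_ascii=False) if detail else None,
        "ip_address": ip_address,
    }

    def _write() -> None:
        try:
            sink(record)
        except Exception:
            # Logging must never break the caller; record the failure instead.
            log.exception("operation log write failed: %s", action)

    t = threading.Thread(target=_write, daemon=True)
    t.start()
    return t  # returned so callers or tests can join() when needed
```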
#### Backend API
- [ ] C1. `GET /api/v1/system/audit-log`: query the log list
  - Params: `user_id`, `category`, `action`, `target_type`, `search` (fuzzy match on `summary`), `offset`, `limit`
  - Returns: `{items: [...], total: N}`
  - Permission: `admin:users` or `*`
- [ ] C2. `GET /api/v1/system/audit-log/stats`: summary statistics
  - Returns: `{today_count, top_users, top_actions, by_category}`
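A minimal sketch of the C1 filtering/pagination and C2 statistics over in-memory records, assuming each record is a dict shaped like the `OperationLog` row with a timezone-aware `timestamp`. In the real endpoint these filters would become SQL `WHERE` / `ILIKE` / `OFFSET` / `LIMIT` clauses; the function names and the `top_n` parameter are hypothetical. "Today" is evaluated in the time zone of `now`, which matters for a UTC+8 team when timestamps are stored in UTC.

```python
from collections import Counter
from datetime import datetime, timezone
from typing import Any, Dict, List, Optional

Record = Dict[str, Any]

def query_audit_log(records: List[Record], *, user_id: Optional[int] = None,
                    category: Optional[str] = None, action: Optional[str] = None,
                    target_type: Optional[str] = None, search: Optional[str] = None,
                    offset: int = 0, limit: int = 50) -> Dict[str, Any]:
    """Filter, sort newest-first, and paginate; returns {items, total}."""
    def keep(r: Record) -> bool:
        return ((user_id is None or r.get("user_id") == user_id)
                and (category is None or r.get("category") == category)
                and (action is None or r.get("action") == action)
                and (target_type is None or r.get("target_type") == target_type)
                and (search is None
                     or search.lower() in r.get("summary", "").lower()))

    matched = sorted((r for r in records if keep(r)),
                     key=lambda r: r["timestamp"], reverse=True)
    # total counts all matches, not just the returned page
    return {"items": matched[offset:offset + limit], "total": len(matched)}

def audit_log_stats(records: List[Record], now: Optional[datetime] = None,
                    top_n: int = 5) -> Dict[str, Any]:
    """Compute {today_count, top_users, top_actions, by_category}."""
    now = now or datetime.now(timezone.utc)
    today = now.date()
    return {
        "today_count": sum(1 for r in records
                           if r["timestamp"].astimezone(now.tzinfo).date() == today),
        "top_users": Counter(r.get("user_name", "") for r in records).most_common(top_n),
        "top_actions": Counter(r["action"] for r in records).most_common(top_n),
        "by_category": dict(Counter(r["category"] for r in records)),
    }
```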
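#### Frontend Page
- [ ] D1. 系统管理 (System) → add an "操作日志" (Operation Log) tab
  - Timeline list view (newest first); each row shows time, user avatar + name, category badge, and operation summary
  - Filter bar: by user, category, action type, time range
  - Click a row to expand the detail JSON
  - Pagination
- [ ] D2. Add a "Recent Operations" card to the dashboard (optional, later)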
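### P2 · Annotation Quality Spot-Check (1.5 days)
- [ ] After annotations are submitted, randomly sample N images (configurable ratio) into a spot-check queue
- [ ] Spot-check page: image and annotation boxes side by side, with pass / fail
- [ ] Fail → returned to the labeler for relabeling
- [ ] Track the spot-check pass rate and per-labeler accuracy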
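A minimal sketch of spot-check sampling and per-labeler statistics, assuming a configurable sampling ratio, a minimum sample of one image, and spot-check outcomes recorded as `(labeler, passed)` pairs. All names are hypothetical. Passing a seed makes a sample reproducible, which helps when a reviewer needs to re-open the same set.

```python
import math
import random
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Sequence, Tuple

def sample_for_spot_check(image_ids: Sequence[str], ratio: float,
                          seed: Optional[int] = None) -> List[str]:
    """Randomly pick ceil(ratio * n) images (at least 1 if any exist)."""
    if not 0 < ratio <= 1:
        raise ValueError("ratio must be in (0, 1]")
    if not image_ids:
        return []
    # Round first so float noise in ratio * n does not bump k up by one.
    k = max(1, math.ceil(round(ratio * len(image_ids), 9)))
    return random.Random(seed).sample(list(image_ids), k)

def pass_rates(results: Iterable[Tuple[str, bool]]) -> Dict[str, float]:
    """Per-labeler pass rate from (labeler, passed) spot-check outcomes."""
    totals: Dict[str, List[int]] = defaultdict(lambda: [0, 0])  # [passed, checked]
    for labeler, passed in results:
        totals[labeler][0] += int(passed)
        totals[labeler][1] += 1
    return {name: p / n for name, (p, n) in totals.items()}
```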
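### P2 · Model Pre-labeling (requires a model) (2 days)
- [ ] Select an existing model → run inference on a new batch → generate pre-annotations
- [ ] Labelers correct the pre-annotations instead of starting from scratch
- [ ] Record the pre-labeling model version and link it to subsequent training runs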
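### P3 · Model Deployment Tracking (requires a production environment)
- [ ] Model version status: experiment → candidate → production → retired
- [ ] Deployment history: when a version went live, how long it ran, when it was taken offline
- [ ] Online inference monitoring (reported false-positive / missed-detection statistics)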
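A minimal sketch of the model-version lifecycle as a state machine, assuming versions move forward one step along experiment → candidate → production → retired, or retire directly from any earlier stage. The transition table, `ModelVersion`, and the in-object history list are assumptions for illustration; the history is what would answer "how long did it run".

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Dict, List, Optional, Set, Tuple

# Assumed transition table: forward one step, or retire from any active state.
ALLOWED: Dict[str, Set[str]] = {
    "experiment": {"candidate", "retired"},
    "candidate": {"production", "retired"},
    "production": {"retired"},
    "retired": set(),
}

@dataclass
class ModelVersion:
    name: str
    status: str = "experiment"
    history: List[Tuple[str, str, datetime]] = field(default_factory=list)

    def transition(self, new_status: str, at: datetime) -> None:
        if new_status not in ALLOWED.get(self.status, set()):
            raise ValueError(f"{self.name}: {self.status} -> {new_status} not allowed")
        self.history.append((self.status, new_status, at))
        self.status = new_status

    def time_in_production(self) -> Optional[float]:
        """Seconds between entering production and retiring, if both happened."""
        start = next((t for _, to, t in self.history if to == "production"), None)
        end = next((t for frm, to, t in self.history
                    if frm == "production" and to == "retired"), None)
        if start is None or end is None:
            return None
        return (end - start).total_seconds()
```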
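### P3 · Data Collection Task Management (requires fleet operations)
- [ ] Create collection tasks: specify scenario / route / time window
- [ ] Link fleet vehicles: assign vehicles to carry out the collection
- [ ] Track data return status: collected / transferred / ingested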
---
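## Current Completion Status
| Module | Page | Status |
|--------|------|--------|
| Labeling | Labeling workbench (scan & ingest) | ✅ |
| Labeling | Labeling progress (Campaigns) | ✅ |
| Labeling | Export & ingest (vendor return) | ✅ |
| Labeling | Batch ledger (Deliveries) | ✅ |
| Labeling | Data catalog (visualization) | ✅ |
| Models | Model overview (KPI cards) | ✅ |
| Models | Dataset versions (auto snapshot + diff) | ✅ |
| Models | Training submission (form validation) | ✅ |
| Models | Training records (pagination + expandable details) | ✅ |
| Models | Evaluation management (mAP comparison chart) | ✅ |
| Models | Model promotion (version selection + history) | ✅ |
| Fleet | Overview / Vehicles / Map / Trips / T-Box | ✅ |
| System | Audit queue (batch operations + rejection categories) | ✅ |
| System | Job monitor (auto refresh) | ✅ |
| System | Execution logs (trace viewer) | ✅ |
| System | User management (Feishu info + pagination) | ✅ |
| 🆕 P0 | Home dashboard | ❌ |
| 🆕 P0 | Feishu audit notifications | ❌ |
| 🆕 P1 | Basic ingest quality checks | ❌ |
| 🆕 P1 | Operation audit log | ❌ |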
---
## New Architecture: World-Model Simulation + Video Preprocessing Pipeline
### 1. End-to-End Data Loop (Full Architecture)
```
            ┌────────────────────────────────┐
            │ T-Box / collection vehicle GPS │
            │ multi-frame video stream       │
            │ (continuous frames)            │
            └───────────────┬────────────────┘
                            ▼
┌─────────────────────────────────────────────────────────────────┐
│                  Video Preprocessing Pipeline                   │
│                                                                 │
│  (1) Denoise         (2) Dedup           (3) Anomaly filter     │
│  optical flow /      SSIM /              all-black / blurry /   │
│  pixel-level         perceptual hash     overexposed            │
│       │                   │                   │                 │
│       └───────────────────┼───────────────────┘                 │
│                           ▼                                     │
│                    (4) Keyframe extraction                      │
│                    scene change / interval sampling             │
│                           ▼                                     │
│                    (5) Quality scoring                          │
│                    per-frame quality_score 0-100                │
└───────────────────────────┬─────────────────────────────────────┘
                            ▼
                  ┌───────────────────┐
                  │ Scan & register   │
                  │ stage: raw_pool   │
                  └─────────┬─────────┘
             ┌──────────────┼──────────────┐
             ▼              ▼              ▼
         Real data      QC review      Simulated
         labeling       passed         data generation
             └──────────────┼──────────────┘
                            ▼
                  ┌───────────────────┐
                  │ build (ingest)    │
                  │ auto snapshot     │
                  └─────────┬─────────┘
                            ▼
                  ┌───────────────────┐
                  │ Train / evaluate  │
                  └─────────┬─────────┘
                            ▼
                  ┌──────────────────────────────┐
                  │ Eval feedback → data gaps    │
                  │ low mAP → add simulated data │
                  └──────────────────────────────┘
```
### 2. Video Preprocessing Pipeline Design
#### 2.1 Data Model
```
Preprocess job:
  id, source_path, status, created_at
  params: {denoise_method, dedup_threshold, anomaly_filters, keyframe_interval}
  result: {
    input_frames, output_frames, removed_duplicates,
    removed_anomalies, quality_distribution, processing_time
  }
```
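#### 2.2 Denoise
| Method | Use case | Compute cost |
|--------|----------|--------------|
| `fast_nonlocal_means` | Night / low-light images | Medium |
| `bilateral_filter` | Edge-preserving smoothing | Low |
| `temporal_median` | Temporal denoising across consecutive frames | High |
| `none` | Well-lit daytime footage | None |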
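A minimal sketch of `temporal_median`, assuming equally sized grayscale frames given as nested lists and a centered window of `radius` frames on each side, clamped at the sequence ends. It also shows why the table lists this as the most expensive option: every output pixel takes a median over up to `2 * radius + 1` samples. A production version would more likely use NumPy (`np.median(stack, axis=0)`); for the other rows, OpenCV provides `cv2.fastNlMeansDenoising` and `cv2.bilateralFilter`.

```python
from statistics import median
from typing import List

Frame = List[List[int]]  # grayscale, frame[row][col] in 0..255

def temporal_median(frames: List[Frame], radius: int = 1) -> List[Frame]:
    """Denoise each frame with the per-pixel median of its temporal neighbors."""
    if radius < 0:
        raise ValueError("radius must be >= 0")
    n = len(frames)
    out: List[Frame] = []
    for i in range(n):
        lo, hi = max(0, i - radius), min(n, i + radius + 1)  # clamp the window
        window = frames[lo:hi]
        h, w = len(frames[i]), len(frames[i][0])
        out.append([[int(median(f[r][c] for f in window)) for c in range(w)]
                    for r in range(h)])
    return out
```

Because the median ignores a single outlier in a window of three, a one-frame noise spike (e.g. salt noise) disappears while static content is preserved; moving objects, however, can ghost, which is one reason this mode suits mostly static scenes.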
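#### 2.3 Dedup
| Method | Threshold | Notes |
|--------|-----------|-------|
| `ssim` | 0.95 | Structural similarity > 0.95 counts as a duplicate |
| `phash` | Hamming ≤ 5 | Perceptual-hash distance ≤ 5 counts as a duplicate |
| `histogram` | correlation > 0.98 | Histogram correlation > 0.98 counts as a duplicate |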
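A minimal sketch of threshold-based dedup for the `phash` row, assuming each frame already has a 64-bit perceptual hash stored as an int (with the `imagehash` package, `imagehash.phash(img)` produces one, and subtracting two hash objects yields their Hamming distance). Comparing each frame only with the most recently kept frame is an assumption that suits consecutive video frames: it is linear-time, but it can miss repeats that are not adjacent.

```python
from typing import List, Sequence

def hamming(a: int, b: int) -> int:
    """Number of differing bits between two integer hashes."""
    return bin(a ^ b).count("1")

def dedup_consecutive(hashes: Sequence[int], max_distance: int = 5) -> List[int]:
    """Return indices of frames to keep.

    A frame is a duplicate if its hash is within `max_distance` bits of the
    most recently kept frame (phash row: Hamming <= 5 counts as duplicate).
    """
    kept: List[int] = []
    for i, h in enumerate(hashes):
        if kept and hamming(h, hashes[kept[-1]]) <= max_distance:
            continue  # near-duplicate of the last kept frame
        kept.append(i)
    return kept
```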
#### 2.4 Anomaly Filtering
| Check | Threshold | Action |
|-------|-----------|--------|
| All-black | mean_pixel < 5 | Drop |
| All-white | mean_pixel > 250 | Drop |
| Blurry | Laplacian variance < 100 | Drop |
| Overexposed | overexposed_ratio > 0.4 | Drop |
| Occluded | black_border_ratio > 0.3 | Flag |
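A minimal sketch of the four "drop" checks on a grayscale frame given as nested lists of 0-255 values. The blur metric is the variance of the 4-neighbor Laplacian over interior pixels, the same kernel `cv2.Laplacian(gray, cv2.CV_64F)` applies by default (OpenCV also processes border pixels, so its variance differs slightly). The pixel level that counts as "overexposed" (≥ 250 here) is an assumption, since the table only gives the ratio threshold. The occlusion (`black_border_ratio`) check is omitted because its definition is not specified.

```python
from statistics import pvariance
from typing import List

Gray = List[List[int]]  # gray[row][col], values 0..255

def laplacian_variance(img: Gray) -> float:
    """Variance of the 4-neighbor Laplacian over interior pixels."""
    h, w = len(img), len(img[0])
    lap = [img[r - 1][c] + img[r + 1][c] + img[r][c - 1] + img[r][c + 1]
           - 4 * img[r][c]
           for r in range(1, h - 1) for c in range(1, w - 1)]
    return pvariance(lap) if len(lap) > 1 else 0.0

def anomaly_reasons(img: Gray, overexposed_level: int = 250) -> List[str]:
    """Return the names of failed checks; an empty list means keep the frame."""
    pixels = [p for row in img for p in row]
    mean = sum(pixels) / len(pixels)
    reasons: List[str] = []
    if mean < 5:
        reasons.append("all_black")
    if mean > 250:
        reasons.append("all_white")
    if laplacian_variance(img) < 100:
        reasons.append("blurry")  # few edges -> low Laplacian variance
    overexposed_ratio = sum(p >= overexposed_level for p in pixels) / len(pixels)
    if overexposed_ratio > 0.4:
        reasons.append("overexposed")
    return reasons
```

Returning every failed check, rather than stopping at the first, lets the scan panel report why a frame was dropped (a flat black frame, for example, fails both the black and blur checks).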
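#### 2.5 Keyframe Extraction Strategies
| Strategy | Description |
|----------|-------------|
| `fixed_interval` | Uniform sampling: keep 1 frame every N frames |
| `scene_change` | Extract a frame whenever a scene change is detected |
| `motion_peak` | Frames with the largest motion magnitude (most informative) |
| `quality_top` | Keep the top K% of frames by quality score |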
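A minimal sketch of the two simplest strategies, `fixed_interval` and `quality_top`, assuming frames are referenced by index and per-frame quality scores (0-100) are already computed. Rounding K% up to at least one frame and returning `quality_top` results in temporal order are assumptions. `scene_change` and `motion_peak` need frame-difference or optical-flow signals and are not shown.

```python
import math
from typing import List, Sequence

def fixed_interval(num_frames: int, every_n: int) -> List[int]:
    """Keep 1 frame every `every_n` frames, starting at frame 0."""
    if every_n < 1:
        raise ValueError("every_n must be >= 1")
    return list(range(0, num_frames, every_n))

def quality_top(scores: Sequence[float], top_percent: float) -> List[int]:
    """Keep the top K% of frames by quality score, returned in frame order."""
    if not 0 < top_percent <= 100:
        raise ValueError("top_percent must be in (0, 100]")
    if not scores:
        return []
    # Round first so float noise does not bump k up by one.
    k = max(1, math.ceil(round(len(scores) * top_percent / 100, 9)))
    best = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    return sorted(best)
```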
### 3. API Design
```
# Video preprocessing
POST /api/v1/preprocess/analyze              # Analyze a video (sampled statistics, no full decode)
POST /api/v1/preprocess/run                  # Run preprocessing (async job)
GET  /api/v1/preprocess/jobs                 # List preprocessing jobs
GET  /api/v1/preprocess/jobs/{id}            # Job details + statistics
GET  /api/v1/preprocess/jobs/{id}/frames     # Preview frames (sampled)
POST /api/v1/preprocess/jobs/{id}/ingest     # Ingest the processed data
# Simulation generation
POST /api/v1/simulate/generate               # Submit a generation job
GET  /api/v1/simulate/jobs                   # Generation history
GET  /api/v1/simulate/jobs/{id}/images       # Preview generated images
POST /api/v1/simulate/jobs/{id}/ingest       # Ingest generated data
```
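### 4. Frontend Page Design
#### 4.1 Video Preprocessing Page `/labeling/preprocess`
```
┌─ Configuration ──────────────────────────────┐
│ Source path: [________________] [Browse]     │
│                                              │
│ Denoise:  [none ▼]                           │
│ Dedup:    [ssim ▼]   Threshold: [0.95]       │
│ Anomaly:  ☑ all-black ☑ blurry ☑ overexposed │
│ Keyframe: [fixed_interval ▼]  Interval: [10] │
│                                              │
│ [Analyze video]  [Run preprocessing]         │
├─ Analysis results ───────────────────────────┤
│ Total frames: 12,340                         │
│ Est. after dedup: ~8,500 (31% removed)       │
│ Anomalous frames: 234 (1.9%)                 │
│ Quality distribution: ████████░░ 85% good    │
├─ Processed preview ──────────────────────────┤
│ [Thumbnail grid]  [Sampled-frame compare]    │
│ [Before/after dedup]                         │
└──────────────────────────────────────────────┘
```
#### 4.2 Simulation Studio Page (仿真工坊) `/labeling/simulate` (implemented; see source)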
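### 5. Backend Module Structure
```
platform/as_platform/data/
├── simulate.py        # World-model simulation API layer (implemented)
├── preprocess.py      # Video preprocessing pipeline (not yet implemented)
│   ├── denoise.py     # Denoising algorithms
│   ├── dedup.py       # Deduplication algorithms
│   └── anomaly.py     # Anomaly detection
└── quality.py         # Image quality scoring engine
```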
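### 6. Dependencies
```txt
# Video preprocessing
opencv-python-headless>=4.8.0   # Frame extraction, denoising, Laplacian
scikit-image>=0.21.0            # SSIM, histograms
Pillow>=10.0.0                  # (already a dependency) basic image operations
imagehash>=4.3.1                # Perceptual-hash dedup
```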
Status addendum for the completion-status table above: 🆕 P2 · Annotation quality spot-check (标注质量抽检): ❌