Initial code for monocular 3D

2026-06-24 09:35:46 +08:00
commit 04a5895b6b
1153 changed files with 340700 additions and 0 deletions
--- a/examples/YOLOv8-Segmentation-ONNXRuntime-Python/README.md
+++ b/examples/YOLOv8-Segmentation-ONNXRuntime-Python/README.md
@@ -0,0 +1,64 @@
+# YOLOv8-Segmentation-ONNXRuntime-Python Demo
+
+This repository provides a [Python](https://www.python.org/) demo for performing instance segmentation with [Ultralytics YOLOv8](https://docs.ultralytics.com/models/yolov8/) using [ONNX Runtime](https://onnxruntime.ai/). It highlights the interoperability of YOLOv8 models: the network's forward pass runs entirely in ONNX Runtime, while the `ultralytics` package (and therefore [PyTorch](https://pytorch.org/)) is used only for non-maximum suppression, mask post-processing, and drawing the results. Moving the forward pass to ONNX Runtime is a useful step toward deployment scenarios where minimal dependencies are preferred. Learn more about the [segmentation task](https://docs.ultralytics.com/tasks/segment/) in our documentation.
+
+## ✨ Features
+
+- **ONNX Runtime Inference**: Runs the model's forward pass on ONNX Runtime; PyTorch is imported only for the `ultralytics` post-processing utilities and the `Results` visualization.
+- **Efficient Inference**: Supports both FP32 and [half-precision](https://www.ultralytics.com/glossary/half-precision) (FP16) for [ONNX](https://onnx.ai/) models, catering to different computational needs and optimizing [inference latency](https://www.ultralytics.com/glossary/inference-latency).
+- **Ease of Use**: Utilizes simple command-line arguments for straightforward model execution.
+- **Broad Compatibility**: Leverages [NumPy](https://numpy.org/) and [OpenCV](https://opencv.org/) for image processing, ensuring wide compatibility across various environments.
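
Supporting both precisions comes down to feeding the input tensor in the element type the exported graph declares: an FP16 export expects `float16` input, an FP32 export expects `float32`. The sketch below assumes ONNX Runtime reports these types as the strings `tensor(float16)` and `tensor(float)` (the value of `session.get_inputs()[0].type`); `input_dtype` and `to_model_input` are illustrative helpers, not functions from `main.py`.

```python
import numpy as np

# Assumed ONNX Runtime type strings for FP32 and FP16 model inputs.
ORT_TYPE_TO_NUMPY = {
    "tensor(float)": np.float32,
    "tensor(float16)": np.float16,
}


def input_dtype(ort_type: str) -> type:
    """Map an ONNX Runtime input type string to the NumPy dtype the model expects."""
    if ort_type not in ORT_TYPE_TO_NUMPY:
        raise ValueError(f"Unsupported model input type: {ort_type}")
    return ORT_TYPE_TO_NUMPY[ort_type]


def to_model_input(img_chw: np.ndarray, ort_type: str) -> np.ndarray:
    """Scale a uint8 CHW image to [0, 1], add a batch axis, and cast to the model's dtype."""
    x = img_chw[None].astype(np.float32) / 255.0
    return x.astype(input_dtype(ort_type))


# With a real session: to_model_input(img, session.get_inputs()[0].type)
white = np.full((3, 640, 640), 255, dtype=np.uint8)
print(to_model_input(white, "tensor(float16)").dtype)  # float16
```

Normalizing in FP32 first and casting last keeps the division exact before the precision is reduced.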
+
+## 🛠️ Installation
+
+Install the required packages using pip. You will need [`ultralytics`](https://github.com/ultralytics/ultralytics) to export the YOLOv8-seg model to ONNX and for the post-processing utilities that `main.py` imports (installing it also pulls in PyTorch), [`onnxruntime-gpu`](https://pypi.org/project/onnxruntime-gpu/) for GPU-accelerated inference (or `onnxruntime` on CPU-only machines), and [`opencv-python`](https://pypi.org/project/opencv-python/) for image processing.
+
+```bash
+pip install ultralytics
+pip install onnxruntime-gpu # For GPU support
+# pip install onnxruntime # For CPU-only support (install only one of the two packages)
+pip install numpy opencv-python
+```
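
`main.py` requests the CUDA execution provider when the installed ONNX Runtime build offers it and falls back to the CPU provider otherwise. That selection can be sketched as a pure function; `select_providers` is an illustrative name, and in practice you would pass it `onnxruntime.get_available_providers()`. If `CUDAExecutionProvider` is absent from that list, the installed build cannot run on the GPU.

```python
from __future__ import annotations

# Preference order used by main.py: CUDA first, then CPU.
PREFERRED = ("CUDAExecutionProvider", "CPUExecutionProvider")


def select_providers(available: list[str]) -> list[str]:
    """Keep the preferred providers that are available; otherwise use whatever is available."""
    chosen = [p for p in PREFERRED if p in available]
    return chosen or list(available)


print(select_providers(["CPUExecutionProvider"]))
# ['CPUExecutionProvider']
print(select_providers(["TensorrtExecutionProvider", "CUDAExecutionProvider", "CPUExecutionProvider"]))
# ['CUDAExecutionProvider', 'CPUExecutionProvider']
```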
+
+## 🚀 Getting Started
+
+### 1. Export the YOLOv8 ONNX Model
+
+First, export your Ultralytics YOLOv8 segmentation model to the ONNX format using the `ultralytics` package. This step converts the PyTorch model into a standardized format suitable for ONNX Runtime. Check our [Export documentation](https://docs.ultralytics.com/modes/export/) for more details on export options and our [ONNX integration guide](https://docs.ultralytics.com/integrations/onnx/).
+
+```bash
+yolo export model=yolov8s-seg.pt imgsz=640 format=onnx opset=12 simplify
+```
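
The exported graph has two outputs, which `main.py` unpacks as `preds` and `protos`. Their shapes follow from the architecture, so you can sanity-check an export before running it. The helper below is a hypothetical sketch that assumes three detection strides (8, 16, 32), `nc` classes, `nm` mask coefficients per box, and prototype masks at 1/4 of the input resolution; compare its result with the shapes ONNX Runtime reports through `session.get_outputs()`.

```python
from __future__ import annotations


def seg_output_shapes(
    imgsz: int = 640, nc: int = 80, nm: int = 32, strides: tuple[int, ...] = (8, 16, 32)
) -> tuple[tuple[int, ...], tuple[int, ...]]:
    """Expected (preds, protos) shapes for a square-input YOLOv8-seg ONNX export with batch size 1."""
    anchors = sum((imgsz // s) ** 2 for s in strides)  # one candidate box per grid cell
    channels = 4 + nc + nm  # box (cx, cy, w, h) + class scores + mask coefficients
    preds = (1, channels, anchors)
    protos = (1, nm, imgsz // 4, imgsz // 4)
    return preds, protos


print(seg_output_shapes())  # ((1, 116, 8400), (1, 32, 160, 160))
```

For the 80-class COCO models at 640 pixels this gives 6400 + 1600 + 400 = 8400 candidate boxes, each described by 4 + 80 + 32 = 116 values.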
+
+### 2. Run Inference
+
+Run inference with the exported ONNX model on an image. Pass the model path with `--model` and the image path with `--source`; if `--source` is omitted, the script uses the `bus.jpg` sample bundled with `ultralytics`.
+
+```bash
+python main.py --model yolov8s-seg.onnx --source path/to/image.jpg
+```
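
Before the forward pass, `main.py` letterboxes the image: it scales it to fit inside `imgsz` without changing the aspect ratio and pads the rest with gray (114, 114, 114). The padding arithmetic in `letterbox()` can be checked without OpenCV; `letterbox_params` is an illustrative helper that reproduces those formulas.

```python
from __future__ import annotations


def letterbox_params(shape_hw: tuple[int, int], new_shape_hw: tuple[int, int] = (640, 640)):
    """Return the scale ratio, resized (h, w), and (top, bottom, left, right) padding used by letterbox()."""
    h, w = shape_hw
    r = min(new_shape_hw[0] / h, new_shape_hw[1] / w)  # largest scale that still fits
    new_w, new_h = round(w * r), round(h * r)
    dw = (new_shape_hw[1] - new_w) / 2  # horizontal padding per side
    dh = (new_shape_hw[0] - new_h) / 2  # vertical padding per side
    # The +/-0.1 offsets put the extra pixel on the bottom/right when the total padding is odd.
    pad = (round(dh - 0.1), round(dh + 0.1), round(dw - 0.1), round(dw + 0.1))
    return r, (new_h, new_w), pad


# A portrait image, 1080 pixels high and 810 wide, letterboxed to 640x640:
r, resized, pad = letterbox_params((1080, 810))
print(resized, pad)  # (640, 480) (0, 0, 80, 80)
```

The same ratio and padding are what `ops.scale_boxes` undoes when mapping the predicted boxes back to the original image.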
+
+### Example Output
+
+After running the command, the script processes the image, performs segmentation, and displays the result with bounding boxes and masks overlaid in an OpenCV window. Press any key to close it.
+
+<img src="https://user-images.githubusercontent.com/51357717/279988626-eb74823f-1563-4d58-a8e4-0494025b7c9a.jpg" alt="YOLOv8 Segmentation ONNX Demo Output" width="800">
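
The masks in this output come from the model's second output. Each detection carries `nm` coefficients (32 for YOLOv8) that weight the prototype masks; the weighted sum is thresholded and cropped to the detection's box. The NumPy sketch below shows that combination step under a simplifying assumption: the prototypes have already been resized to the image (in `main.py`, `ops.scale_masks` performs that resizing and `ops.crop_mask` the cropping). `assemble_masks` is an illustrative name.

```python
import numpy as np


def assemble_masks(protos: np.ndarray, coeffs: np.ndarray, boxes: np.ndarray) -> np.ndarray:
    """Build binary instance masks from prototypes, mask coefficients, and boxes.

    protos: (nm, H, W) prototype masks, assumed already resized to the image.
    coeffs: (N, nm) mask coefficients, one row per detection.
    boxes:  (N, 4) boxes as (x1, y1, x2, y2) in the same pixel grid as `protos`.
    """
    nm, h, w = protos.shape
    masks = (coeffs @ protos.reshape(nm, -1)).reshape(-1, h, w)  # weighted sum of prototypes
    ys = np.arange(h)[None, :, None]  # row index, broadcast over (N, H, W)
    xs = np.arange(w)[None, None, :]  # column index
    x1, y1, x2, y2 = (boxes[:, i, None, None] for i in range(4))
    inside = (xs >= x1) & (xs < x2) & (ys >= y1) & (ys < y2)  # True within each detection's box
    return (masks * inside) > 0  # logit > 0 is the same as sigmoid > 0.5


protos = np.stack([np.ones((4, 4)), -np.ones((4, 4))])  # prototype 0 positive, prototype 1 negative
coeffs = np.array([[1.0, 0.0], [0.0, 1.0]])  # detection 0 uses prototype 0, detection 1 prototype 1
boxes = np.array([[0, 0, 2, 2], [0, 0, 4, 4]])
m = assemble_masks(protos, coeffs, boxes)
print(m.shape, m[0].sum(), m[1].sum())  # (2, 4, 4) 4 0
```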
+
+## 💡 Advanced Usage
+
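+`main.py` accepts `--conf` (confidence threshold, default 0.25) and `--iou` (NMS IoU threshold, default 0.7) in addition to `--model` and `--source`. The input size is not a command-line option: set it through the `imgsz` argument of `YOLOv8Seg` (default 640) so that it matches the size used at export. The script processes a single image; handling a video stream would require reading frames in a loop (for example with `cv2.VideoCapture`) and calling the model on each one.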
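
`main.py` exposes two thresholds, `--conf` and `--iou`, and they act in sequence: candidates scoring at or below `conf` are discarded, then non-maximum suppression removes any box that overlaps a higher-scoring box by more than `iou`. The NumPy sketch below illustrates that logic for a single class; it is not the Ultralytics `non_max_suppression` implementation, which by default suppresses per class and carries each box's mask coefficients along. `conf_filter_nms` and its helpers are illustrative names.

```python
from __future__ import annotations

import numpy as np


def box_area(b: np.ndarray) -> np.ndarray:
    """Area of (x1, y1, x2, y2) boxes; works for one box or an (N, 4) array."""
    return (b[..., 2] - b[..., 0]) * (b[..., 3] - b[..., 1])


def iou_one_to_many(box: np.ndarray, boxes: np.ndarray) -> np.ndarray:
    """IoU between a single box and each box in an (N, 4) array."""
    ix1 = np.maximum(box[0], boxes[:, 0])
    iy1 = np.maximum(box[1], boxes[:, 1])
    ix2 = np.minimum(box[2], boxes[:, 2])
    iy2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(ix2 - ix1, 0, None) * np.clip(iy2 - iy1, 0, None)
    return inter / (box_area(box) + box_area(boxes) - inter)


def conf_filter_nms(boxes: np.ndarray, scores: np.ndarray, conf: float = 0.25, iou: float = 0.7) -> list[int]:
    """Class-agnostic sketch: drop scores <= conf, then greedy NMS at the given IoU threshold."""
    candidates = np.flatnonzero(scores > conf)
    order = candidates[np.argsort(-scores[candidates])]  # highest score first
    keep = []
    while order.size:
        best, rest = order[0], order[1:]
        keep.append(int(best))
        # Suppress remaining boxes that overlap the kept box by more than `iou`.
        order = rest[iou_one_to_many(boxes[best], boxes[rest]) <= iou]
    return keep


boxes = np.array([[0, 0, 10, 10], [1, 1, 10, 10], [20, 20, 30, 30], [0, 0, 5, 5]], dtype=float)
scores = np.array([0.9, 0.8, 0.6, 0.1])
print(conf_filter_nms(boxes, scores))  # [0, 2]
print(conf_filter_nms(boxes, scores, iou=0.9))  # [0, 1, 2]
```

Boxes 0 and 1 overlap with an IoU of 0.81, so the default `iou=0.7` suppresses box 1 while `iou=0.9` keeps it; box 3 never reaches NMS because its score is below `conf`.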
+
+## 🤝 Contributing
+
+We welcome contributions to improve this demo! If you encounter bugs, have feature requests, or want to submit enhancements (like a new algorithm or improved processing steps), please open an issue or pull request on the main [Ultralytics repository](https://github.com/ultralytics/ultralytics). See our [Contributing Guide](https://docs.ultralytics.com/help/contributing/) for more details on how to get involved.
+
+## 📄 License
+
+This project is licensed under the AGPL-3.0 License. For detailed information, please see the [LICENSE](https://github.com/ultralytics/ultralytics/blob/main/LICENSE) file or read the full [AGPL-3.0 license text](https://opensource.org/license/agpl-v3).
+
+## 🙏 Acknowledgments
+
+- This YOLOv8-Segmentation-ONNXRuntime-Python demo was contributed by GitHub user [jamjamjon](https://github.com/jamjamjon).
+- Thanks to the [ONNX Runtime community](https://github.com/microsoft/onnxruntime) for providing a robust and efficient inference engine.
+
+We hope you find this demo useful! Feel free to contribute and help make it even better.
--- a/examples/YOLOv8-Segmentation-ONNXRuntime-Python/main.py
+++ b/examples/YOLOv8-Segmentation-ONNXRuntime-Python/main.py
@@ -0,0 +1,179 @@
+# Ultralytics 🚀 AGPL-3.0 License - https://ultralytics.com/license
+
+from __future__ import annotations
+
+import argparse
+
+import cv2
+import numpy as np
+import onnxruntime as ort
+import torch
+
+from ultralytics.engine.results import Results
+from ultralytics.utils import ASSETS, YAML, nms, ops
+from ultralytics.utils.checks import check_yaml
+
+
+class YOLOv8Seg:
+    """YOLOv8 segmentation model for performing instance segmentation using ONNX Runtime.
+
+    This class implements a YOLOv8 instance segmentation model using ONNX Runtime for inference. It handles
+    preprocessing of input images, running inference with the ONNX model, and postprocessing the results to generate
+    bounding boxes and segmentation masks.
+
+    Attributes:
+        session (ort.InferenceSession): ONNX Runtime inference session for model execution.
+        imgsz (tuple[int, int]): Input image size as (height, width) for the model.
+        classes (dict): Mapping of class indices to class names, loaded from `coco8.yaml` (the 80 COCO classes).
+        conf (float): Confidence threshold for filtering detections.
+        iou (float): IoU threshold used by non-maximum suppression.
+
+    Methods:
+        letterbox: Resize and pad image while maintaining aspect ratio.
+        preprocess: Preprocess the input image before feeding it into the model.
+        postprocess: Apply NMS, rescale boxes to the original image, and build instance masks.
+        process_mask: Process prototype masks with predicted mask coefficients to generate instance segmentation masks.
+
+    Examples:
+        >>> model = YOLOv8Seg("yolov8n-seg.onnx", conf=0.25, iou=0.7)
+        >>> img = cv2.imread("image.jpg")
+        >>> results = model(img)
+        >>> cv2.imshow("Segmentation", results[0].plot())
+    """
+
+    def __init__(self, onnx_model: str, conf: float = 0.25, iou: float = 0.7, imgsz: int | tuple[int, int] = 640):
+        """Initialize the instance segmentation model using an ONNX model.
+
+        Args:
+            onnx_model (str): Path to the ONNX model file.
+            conf (float, optional): Confidence threshold for filtering detections.
+            iou (float, optional): IoU threshold for non-maximum suppression.
+            imgsz (int | tuple[int, int], optional): Input image size of the model. Can be an integer for square input
+                or a tuple for rectangular input.
+        """
+        available = ort.get_available_providers()
+        providers = [p for p in ("CUDAExecutionProvider", "CPUExecutionProvider") if p in available]
+        self.session = ort.InferenceSession(onnx_model, providers=providers or available)
+
+        self.imgsz = (imgsz, imgsz) if isinstance(imgsz, int) else imgsz
+        self.classes = YAML.load(check_yaml("coco8.yaml"))["names"]
+        self.conf = conf
+        self.iou = iou
+
+    def __call__(self, img: np.ndarray) -> list[Results]:
+        """Run inference on the input image using the ONNX model.
+
+        Args:
+            img (np.ndarray): The original input image in BGR format.
+
+        Returns:
+            (list[Results]): Processed detection results after post-processing, containing bounding boxes and
+                segmentation masks.
+        """
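+        prep_img = self.preprocess(img, self.imgsz)
+        inp = self.session.get_inputs()[0]
+        if inp.type == "tensor(float16)":  # FP16 export expects half-precision input
+            prep_img = prep_img.astype(np.float16)
+        outs = self.session.run(None, {inp.name: prep_img})
+        outs = [np.asarray(o, dtype=np.float32) for o in outs]  # post-process in FP32
+        return self.postprocess(img, prep_img, outs)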
+
+    def letterbox(self, img: np.ndarray, new_shape: tuple[int, int] = (640, 640)) -> np.ndarray:
+        """Resize and pad image while maintaining aspect ratio.
+
+        Args:
+            img (np.ndarray): Input image in BGR format.
+            new_shape (tuple[int, int], optional): Target shape as (height, width).
+
+        Returns:
+            (np.ndarray): Resized and padded image.
+        """
+        shape = img.shape[:2]  # current shape [height, width]
+
+        # Scale ratio (new / old)
+        r = min(new_shape[0] / shape[0], new_shape[1] / shape[1])
+
+        # Compute padding
+        new_unpad = round(shape[1] * r), round(shape[0] * r)
+        dw, dh = (new_shape[1] - new_unpad[0]) / 2, (new_shape[0] - new_unpad[1]) / 2  # wh padding
+
+        if shape[::-1] != new_unpad:  # resize
+            img = cv2.resize(img, new_unpad, interpolation=cv2.INTER_LINEAR)
+        top, bottom = round(dh - 0.1), round(dh + 0.1)
+        left, right = round(dw - 0.1), round(dw + 0.1)
+        img = cv2.copyMakeBorder(img, top, bottom, left, right, cv2.BORDER_CONSTANT, value=(114, 114, 114))
+
+        return img
+
+    def preprocess(self, img: np.ndarray, new_shape: tuple[int, int]) -> np.ndarray:
+        """Preprocess the input image before feeding it into the model.
+
+        Args:
+            img (np.ndarray): The input image in BGR format.
+            new_shape (tuple[int, int]): The target shape for resizing as (height, width).
+
+        Returns:
+            (np.ndarray): Preprocessed image ready for model inference, with shape (1, 3, height, width) and normalized
+                to [0, 1].
+        """
+        img = self.letterbox(img, new_shape)
+        img = img[..., ::-1].transpose([2, 0, 1])[None]  # BGR to RGB, HWC to CHW, add batch dimension
+        img = np.ascontiguousarray(img)
+        img = img.astype(np.float32) / 255  # Normalize to [0, 1]
+        return img
+
+    def postprocess(self, img: np.ndarray, prep_img: np.ndarray, outs: list) -> list[Results]:
+        """Apply NMS, rescale boxes to the original image, and build instance segmentation masks.
+
+        Args:
+            img (np.ndarray): The original input image.
+            prep_img (np.ndarray): The preprocessed image used for inference.
+            outs (list): Model outputs containing predictions and prototype masks.
+
+        Returns:
+            (list[Results]): Processed detection results containing bounding boxes and segmentation masks.
+        """
+        preds, protos = (torch.from_numpy(p) for p in outs)
+        preds = nms.non_max_suppression(preds, self.conf, self.iou, nc=len(self.classes))
+
+        results = []
+        for i, pred in enumerate(preds):
+            pred[:, :4] = ops.scale_boxes(prep_img.shape[2:], pred[:, :4], img.shape)
+            masks = self.process_mask(protos[i], pred[:, 6:], pred[:, :4], img.shape[:2])
+            results.append(Results(img, path="", names=self.classes, boxes=pred[:, :6], masks=masks))
+
+        return results
+
+    def process_mask(
+        self, protos: torch.Tensor, masks_in: torch.Tensor, bboxes: torch.Tensor, shape: tuple[int, int]
+    ) -> torch.Tensor:
+        """Process prototype masks with predicted mask coefficients to generate instance segmentation masks.
+
+        Args:
+            protos (torch.Tensor): Prototype masks with shape (mask_dim, mask_h, mask_w).
+            masks_in (torch.Tensor): Predicted mask coefficients with shape (N, mask_dim), where N is number of
+                detections.
+            bboxes (torch.Tensor): Bounding boxes with shape (N, 4), where N is number of detections.
+            shape (tuple[int, int]): The size of the input image as (height, width).
+
+        Returns:
+            (torch.Tensor): Binary segmentation masks with shape (N, height, width).
+        """
+        c, mh, mw = protos.shape  # CHW
+        masks = (masks_in @ protos.float().view(c, -1)).view(-1, mh, mw)  # Weighted sum of prototypes per detection
+        masks = ops.scale_masks(masks[None], shape)[0]  # Scale masks to original image size
+        masks = ops.crop_mask(masks, bboxes)  # Crop masks to bounding boxes
+        return masks.gt_(0.0)  # Convert to binary masks
+
+
+if __name__ == "__main__":
+    parser = argparse.ArgumentParser()
+    parser.add_argument("--model", type=str, required=True, help="Path to ONNX model")
+    parser.add_argument("--source", type=str, default=str(ASSETS / "bus.jpg"), help="Path to input image")
+    parser.add_argument("--conf", type=float, default=0.25, help="Confidence threshold")
+    parser.add_argument("--iou", type=float, default=0.7, help="NMS IoU threshold")
+    args = parser.parse_args()
+
+    model = YOLOv8Seg(args.model, args.conf, args.iou)
+    img = cv2.imread(args.source)
+    if img is None:
+        raise FileNotFoundError(f"Could not read image: {args.source}")
+    results = model(img)
+
+    cv2.imshow("Segmented Image", results[0].plot())
+    cv2.waitKey(0)
+    cv2.destroyAllWindows()