# Scripts Directory

This directory contains utility scripts for the python-rtsp-worker project.

## convert_pt_to_tensorrt.py

Converts PyTorch models (.pt, .pth) to TensorRT engines (.trt) for optimized GPU inference.

### Features

- **Multiple Precision Modes**: FP32, FP16, INT8
- **Dynamic Batch Size**: Support for variable batch sizes
- **Automatic Optimization**: Creates optimization profiles for best performance
- **ONNX Intermediate**: Uses ONNX as an intermediate format for compatibility
- **Easy to Use**: Simple command-line interface

### Requirements

Make sure you have the following dependencies installed:

```bash
pip install torch tensorrt onnx
```

### Quick Start

**Basic conversion (FP32)**:

```bash
python scripts/convert_pt_to_tensorrt.py \
    --model path/to/model.pt \
    --output models/model.trt
```

**FP16 precision** (recommended for most cases: roughly 2x faster with minimal accuracy loss):

```bash
python scripts/convert_pt_to_tensorrt.py \
    --model yolov8n.pt \
    --output models/yolov8n.trt \
    --fp16
```

**Custom input shape**:

```bash
python scripts/convert_pt_to_tensorrt.py \
    --model model.pt \
    --output model.trt \
    --input-shape 1,3,416,416
```

**Dynamic batch size** (for variable batch inference):

```bash
python scripts/convert_pt_to_tensorrt.py \
    --model model.pt \
    --output model.trt \
    --dynamic-batch \
    --max-batch 16
```

**Maximum optimization** (FP16 + INT8):

```bash
python scripts/convert_pt_to_tensorrt.py \
    --model model.pt \
    --output model.trt \
    --fp16 \
    --int8
```

### Command-Line Arguments

| Argument | Required | Default | Description |
|----------|----------|---------|-------------|
| `--model`, `-m` | Yes | - | Path to PyTorch model file (.pt or .pth) |
| `--output`, `-o` | Yes | - | Output path for TensorRT engine (.trt) |
| `--input-shape`, `-s` | No | 1,3,640,640 | Input tensor shape as B,C,H,W |
| `--fp16` | No | False | Enable FP16 precision (faster, ~same accuracy) |
| `--int8` | No | False | Enable INT8 precision (fastest, needs calibration) |
| `--dynamic-batch` | No | False | Enable dynamic batch size support |
| `--max-batch` | No | 16 | Maximum batch size for dynamic batching |
| `--workspace-size` | No | 4 | TensorRT workspace size in GB |
| `--gpu` | No | 0 | GPU device ID to use |
| `--input-names` | No | ["input"] | Custom input tensor names |
| `--output-names` | No | ["output"] | Custom output tensor names |
| `--keep-onnx` | No | False | Keep intermediate ONNX file for debugging |
| `--verbose`, `-v` | No | False | Enable verbose logging |

### Performance Tips

1. **Always use FP16** unless you need full FP32 precision:
   - Roughly 2x faster inference
   - About 50% less VRAM usage
   - Minimal accuracy loss for most models

2. **Use dynamic batching** for variable workloads:
   - Process 1-16 images with the same engine
   - Automatic optimization for common batch sizes

3. **Increase workspace size** for complex models:
   - The default 4 GB works for most models
   - Increase to 8 GB for very large models

4. **INT8 quantization** for maximum speed:
   - Requires calibration data (not included in the basic conversion; see the sketch after this list)
   - Up to 4x faster than FP32
   - Best for deployment scenarios
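If you enable `--int8` and want proper calibration, TensorRT expects a calibrator object that feeds it representative, preprocessed batches. Below is a minimal sketch of such a calibrator; the class name, cache file name, and the way batches are supplied are illustrative assumptions, and you would need to attach it to the builder configuration yourself (TensorRT exposes this as `config.int8_calibrator` on `IBuilderConfig`), since the script's internals are not reproduced here.

```python
import os

import numpy as np
import tensorrt as trt
import torch


class EntropyCalibrator(trt.IInt8EntropyCalibrator2):
    """Feeds preprocessed batches to TensorRT during INT8 calibration (illustrative sketch)."""

    def __init__(self, batches, batch_size=1, cache_file="calibration.cache"):
        trt.IInt8EntropyCalibrator2.__init__(self)  # the TensorRT base class must be initialized explicitly
        self.batches = iter(batches)                # iterable of float32 arrays, shape (B, C, H, W)
        self.batch_size = batch_size
        self.cache_file = cache_file
        self.current = None                         # keeps the active batch alive on the GPU

    def get_batch_size(self):
        return self.batch_size

    def get_batch(self, names):
        try:
            batch = next(self.batches)
        except StopIteration:
            return None                             # signals that calibration data is exhausted
        # Copy the batch to GPU memory and hand TensorRT the raw device pointer.
        self.current = torch.from_numpy(np.ascontiguousarray(batch)).cuda()
        return [int(self.current.data_ptr())]

    def read_calibration_cache(self):
        if os.path.exists(self.cache_file):
            with open(self.cache_file, "rb") as f:
                return f.read()
        return None                                 # no cache yet: run full calibration

    def write_calibration_cache(self, cache):
        with open(self.cache_file, "wb") as f:
            f.write(cache)
```

Calibration results are cached to disk, so rebuilding the engine with the same data skips the (slow) calibration pass.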
### Integration with Model Repository

Once converted, use the TensorRT engine with the model repository:

```python
import torch

from services.model_repository import TensorRTModelRepository

# Initialize repository
repo = TensorRTModelRepository(gpu_id=0, default_num_contexts=4)

# Load the converted model
repo.load_model(
    model_id="my_model",
    file_path="models/model.trt",
    num_contexts=4
)

# Run inference
input_tensor = torch.rand(1, 3, 640, 640, device='cuda:0')

outputs = repo.infer(
    model_id="my_model",
    inputs={"input": input_tensor}
)
```

### Troubleshooting

**Issue**: `Failed to parse ONNX model`
- Solution: Check whether your PyTorch model is compatible with ONNX export
- Try updating your PyTorch and ONNX versions

**Issue**: `FP16 not supported on this platform`
- Solution: Your GPU doesn't support FP16. Remove the `--fp16` flag

**Issue**: `Out of memory during conversion`
- Solution: Reduce `--workspace-size` or free up GPU memory

**Issue**: `Model contains only state_dict`
- Solution: Your checkpoint only has weights; the converter also needs the full model architecture
- Modify the script's `load_pytorch_model()` method to instantiate your model class (see the appendix at the end of this README)

### Examples for Common Models

**YOLOv8**:

```bash
# Download model first
# yolo export model=yolov8n.pt format=engine device=0

# Or use this script
python scripts/convert_pt_to_tensorrt.py \
    --model yolov8n.pt \
    --output models/yolov8n.trt \
    --input-shape 1,3,640,640 \
    --fp16
```

**ResNet**:

```bash
python scripts/convert_pt_to_tensorrt.py \
    --model resnet50.pt \
    --output models/resnet50.trt \
    --input-shape 1,3,224,224 \
    --fp16 \
    --dynamic-batch \
    --max-batch 32
```

**Custom Model**:

```bash
python scripts/convert_pt_to_tensorrt.py \
    --model custom_model.pt \
    --output models/custom.trt \
    --input-shape 1,3,512,512 \
    --input-names image \
    --output-names predictions \
    --fp16 \
    --verbose
```

### Notes

- The script uses ONNX as an intermediate format, which is the recommended conversion path
- TensorRT engines are hardware-specific; rebuild them for each target GPU
- Conversion time varies from about 30 seconds to 5 minutes depending on model size
- The first inference after loading is slower (warmup)

### Support

For issues or questions, please check:
- TensorRT documentation: https://docs.nvidia.com/deeplearning/tensorrt/
- PyTorch ONNX export guide: https://pytorch.org/docs/stable/onnx.html
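### Appendix: Rebuilding a Full Model from a state_dict

As a follow-up to the `Model contains only state_dict` troubleshooting entry: a checkpoint that holds only weights cannot be exported on its own, because the converter needs the model's Python class. A minimal sketch of rebuilding a full model before conversion; `MyModel`, its constructor arguments, and the file names are hypothetical placeholders for your own code:

```python
import torch

from my_project.models import MyModel  # hypothetical: import your own model class

# Instantiate the architecture yourself, with whatever arguments it expects.
model = MyModel(num_classes=80)

checkpoint = torch.load("checkpoint.pth", map_location="cpu")
# Some training frameworks nest the weights under a "state_dict" key.
state_dict = checkpoint.get("state_dict", checkpoint)
model.load_state_dict(state_dict)
model.eval()

# Save the full model (architecture + weights) so the converter can load it directly.
torch.save(model, "full_model.pt")
```

Then point `--model` at the saved full model, or adapt the script's `load_pytorch_model()` method to perform the same instantiation, as the troubleshooting entry suggests.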