# Scripts Directory
This directory contains utility scripts for the python-rtsp-worker project.
## convert_pt_to_tensorrt.py
Converts PyTorch models (.pt, .pth) to TensorRT engines (.trt) for optimized GPU inference.
### Features
- **Multiple Precision Modes**: FP32, FP16, INT8
- **Dynamic Batch Size**: Support for variable batch sizes
- **Automatic Optimization**: Creates optimization profiles for best performance
- **ONNX Intermediate**: Uses ONNX as an intermediate format for compatibility (see the sketch below)
- **Easy to Use**: Simple command-line interface
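For orientation, the two-stage flow the script automates (export to ONNX, then build a TensorRT engine) looks roughly like the sketch below. File names, the opset version, and the workspace size are illustrative, not the script's exact code:

```python
import tensorrt as trt
import torch

# 1. Export the PyTorch model to ONNX (paths and opset are illustrative).
model = torch.load("model.pt", map_location="cpu", weights_only=False).eval()
dummy = torch.randn(1, 3, 640, 640)
torch.onnx.export(model, dummy, "model.onnx",
                  input_names=["input"], output_names=["output"], opset_version=17)

# 2. Parse the ONNX file and build a serialized TensorRT engine.
logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
network = builder.create_network(1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
parser = trt.OnnxParser(network, logger)
with open("model.onnx", "rb") as f:
    if not parser.parse(f.read()):
        raise RuntimeError("ONNX parsing failed")

config = builder.create_builder_config()
config.set_memory_pool_limit(trt.MemoryPoolType.WORKSPACE, 4 << 30)  # 4 GB workspace
config.set_flag(trt.BuilderFlag.FP16)                                # only if FP16 is requested

with open("model.trt", "wb") as f:
    f.write(builder.build_serialized_network(network, config))
```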
### Requirements
Make sure you have the following dependencies installed:
```bash
pip install torch tensorrt onnx
```
### Quick Start
**Basic conversion (FP32)**:
```bash
python scripts/convert_pt_to_tensorrt.py \
    --model path/to/model.pt \
    --output models/model.trt
```
**FP16 precision** (recommended for most cases: roughly 2x faster with minimal accuracy loss):
```bash
python scripts/convert_pt_to_tensorrt.py \
    --model yolov8n.pt \
    --output models/yolov8n.trt \
    --fp16
```
**Custom input shape**:
```bash
python scripts/convert_pt_to_tensorrt.py \
    --model model.pt \
    --output model.trt \
    --input-shape 1,3,416,416
```
**Dynamic batch size** (for variable batch inference):
```bash
python scripts/convert_pt_to_tensorrt.py \
    --model model.pt \
    --output model.trt \
    --dynamic-batch \
    --max-batch 16
```
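Under the hood, dynamic batch support in TensorRT is usually expressed as an optimization profile with min/opt/max input shapes. A minimal sketch, with the tensor name and shapes assumed to match the defaults above:

```python
import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
config = builder.create_builder_config()

# One optimization profile covering batch sizes 1..16 for a 3x640x640 input.
# "input" matches the default --input-names; shapes mirror --input-shape / --max-batch.
profile = builder.create_optimization_profile()
profile.set_shape("input",
                  min=(1, 3, 640, 640),   # smallest accepted batch
                  opt=(8, 3, 640, 640),   # batch size TensorRT tunes kernels for
                  max=(16, 3, 640, 640))  # largest accepted batch (--max-batch)
config.add_optimization_profile(profile)
```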
**Maximum optimization** (FP16 + INT8):
```bash
python scripts/convert_pt_to_tensorrt.py \
    --model model.pt \
    --output model.trt \
    --fp16 \
    --int8
```
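For reference, these precision options typically correspond to TensorRT builder-config flags; a hedged fragment (the mapping is the standard TensorRT API, not a copy of the script's code):

```python
import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
config = builder.create_builder_config()

# Precision flags the --fp16 / --int8 options correspond to:
config.set_flag(trt.BuilderFlag.FP16)  # --fp16: half-precision kernels where supported
config.set_flag(trt.BuilderFlag.INT8)  # --int8: also requires a calibrator (see Performance Tips)
```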
### Command-Line Arguments
| Argument | Required | Default | Description |
|----------|----------|---------|-------------|
| `--model`, `-m` | Yes | - | Path to PyTorch model file (.pt or .pth) |
| `--output`, `-o` | Yes | - | Output path for TensorRT engine (.trt) |
| `--input-shape`, `-s` | No | 1,3,640,640 | Input tensor shape as B,C,H,W |
| `--fp16` | No | False | Enable FP16 precision (faster, ~same accuracy) |
| `--int8` | No | False | Enable INT8 precision (fastest, needs calibration) |
| `--dynamic-batch` | No | False | Enable dynamic batch size support |
| `--max-batch` | No | 16 | Maximum batch size for dynamic batching |
| `--workspace-size` | No | 4 | TensorRT workspace size in GB |
| `--gpu` | No | 0 | GPU device ID to use |
| `--input-names` | No | ["input"] | Custom input tensor names |
| `--output-names` | No | ["output"] | Custom output tensor names |
| `--keep-onnx` | No | False | Keep intermediate ONNX file for debugging |
| `--verbose`, `-v` | No | False | Enable verbose logging |
### Performance Tips
1. **Always use FP16** unless you need full FP32 precision:
   - Roughly 2x faster inference
   - About 50% less VRAM usage
   - Minimal accuracy loss for most models
2. **Use dynamic batching** for variable workloads:
   - Process 1-16 images with the same engine
   - Automatic optimization for common batch sizes
3. **Increase the workspace size** for complex models:
   - The default 4 GB works for most models
   - Increase to 8 GB for very large models
4. **INT8 quantization** for maximum speed:
   - Requires calibration data (not included in the basic conversion; see the calibrator sketch below)
   - Up to 4x faster than FP32
   - Best for deployment scenarios
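Since calibration is not part of the basic conversion, here is a rough sketch of what an INT8 calibrator can look like with the standard TensorRT Python API. The class name, batch handling, and use of a torch tensor as the device buffer are illustrative choices, not something the script ships:

```python
import numpy as np
import tensorrt as trt
import torch

class EntropyCalibrator(trt.IInt8EntropyCalibrator2):
    """Feeds batches of representative, preprocessed images to TensorRT during INT8 calibration."""

    def __init__(self, images: np.ndarray, batch_size: int = 8, cache_file: str = "calib.cache"):
        trt.IInt8EntropyCalibrator2.__init__(self)
        self.images = images              # (N, 3, H, W) float32, preprocessed like inference inputs
        self.batch_size = batch_size
        self.cache_file = cache_file
        self.index = 0
        # Device buffer reused for every batch; its pointer is handed to TensorRT.
        self.device_batch = torch.empty((batch_size, *images.shape[1:]),
                                        dtype=torch.float32, device="cuda")

    def get_batch_size(self):
        return self.batch_size

    def get_batch(self, names):
        if self.index + self.batch_size > len(self.images):
            return None  # no more calibration data
        batch = self.images[self.index:self.index + self.batch_size]
        self.device_batch.copy_(torch.from_numpy(batch))
        self.index += self.batch_size
        return [int(self.device_batch.data_ptr())]

    def read_calibration_cache(self):
        try:
            with open(self.cache_file, "rb") as f:
                return f.read()
        except FileNotFoundError:
            return None

    def write_calibration_cache(self, cache):
        with open(self.cache_file, "wb") as f:
            f.write(cache)

# Attach it to the builder config alongside the INT8 flag:
# config.set_flag(trt.BuilderFlag.INT8)
# config.int8_calibrator = EntropyCalibrator(calibration_images)
```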
### Integration with Model Repository
Once converted, use the TensorRT engine with the model repository:
```python
import torch

from services.model_repository import TensorRTModelRepository

# Initialize the repository
repo = TensorRTModelRepository(gpu_id=0, default_num_contexts=4)

# Load the converted model
repo.load_model(
    model_id="my_model",
    file_path="models/model.trt",
    num_contexts=4,
)

# Run inference
input_tensor = torch.rand(1, 3, 640, 640, device='cuda:0')
outputs = repo.infer(
    model_id="my_model",
    inputs={"input": input_tensor},
)
```
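If you only want to confirm that the generated engine deserializes on the target GPU before wiring it into the repository, a quick check along these lines works (the path is an example; the tensor-name APIs require TensorRT 8.5+):

```python
import tensorrt as trt

# Deserialize the engine and list its I/O tensor names as a sanity check.
logger = trt.Logger(trt.Logger.WARNING)
runtime = trt.Runtime(logger)
with open("models/model.trt", "rb") as f:
    engine = runtime.deserialize_cuda_engine(f.read())

names = [engine.get_tensor_name(i) for i in range(engine.num_io_tensors)]
print("I/O tensors:", names)
```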
### Troubleshooting
**Issue**: `Failed to parse ONNX model`
- Solution: Check that your PyTorch model is compatible with ONNX export
- Try updating your PyTorch and ONNX versions

**Issue**: `FP16 not supported on this platform`
- Solution: Your GPU doesn't support FP16; remove the `--fp16` flag

**Issue**: `Out of memory during conversion`
- Solution: Reduce `--workspace-size` or free up GPU memory

**Issue**: `Model contains only state_dict`
- Solution: Your checkpoint contains only weights, but the full model architecture is needed
- Modify the script's `load_pytorch_model()` method to instantiate your model class and load the weights into it (see the sketch below)
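For the state_dict case, the idea is to build the model object yourself and load the checkpoint's weights into it before export. `MyModel`, its constructor arguments, and the paths below are placeholders for your own architecture:

```python
import torch

from my_project.models import MyModel  # placeholder: your own model class

# Rebuild the architecture, then load the checkpoint's weights into it.
model = MyModel(num_classes=80)                      # constructor args are illustrative
state_dict = torch.load("custom_model.pt", map_location="cpu")
model.load_state_dict(state_dict)
model.eval()

# One possible workaround: save the full model so the converter can load it directly.
torch.save(model, "custom_model_full.pt")
```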
### Examples for Common Models
**YOLOv8**:
```bash
# Download yolov8n.pt first, then either use Ultralytics' built-in exporter:
# yolo export model=yolov8n.pt format=engine device=0
# or use this script:
python scripts/convert_pt_to_tensorrt.py \
    --model yolov8n.pt \
    --output models/yolov8n.trt \
    --input-shape 1,3,640,640 \
    --fp16
```
**ResNet**:
```bash
python scripts/convert_pt_to_tensorrt.py \
    --model resnet50.pt \
    --output models/resnet50.trt \
    --input-shape 1,3,224,224 \
    --fp16 \
    --dynamic-batch \
    --max-batch 32
```
**Custom Model**:
```bash
python scripts/convert_pt_to_tensorrt.py \
    --model custom_model.pt \
    --output models/custom.trt \
    --input-shape 1,3,512,512 \
    --input-names image \
    --output-names predictions \
    --fp16 \
    --verbose
```
### Notes
- The script uses ONNX as an intermediate format, which is the recommended approach
- TensorRT engines are hardware-specific; rebuild for different GPUs
- Conversion time varies (30 seconds to 5 minutes depending on model size)
- The first inference after loading is slower (warmup); see the sketch below
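If startup latency matters, one option is to issue a few throwaway inferences right after loading, using the repository API from the integration example above (shapes are illustrative):

```python
import torch

# Warm up the engine: the first few calls pay for CUDA context and kernel setup.
dummy = torch.rand(1, 3, 640, 640, device="cuda:0")
for _ in range(3):
    repo.infer(model_id="my_model", inputs={"input": dummy})
```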
### Support
For issues or questions, please check:
- TensorRT documentation: https://docs.nvidia.com/deeplearning/tensorrt/
- PyTorch ONNX export guide: https://pytorch.org/docs/stable/onnx.html