# Scripts Directory

This directory contains utility scripts for the python-rtsp-worker project.

## convert_pt_to_tensorrt.py

Converts PyTorch models (.pt, .pth) to TensorRT engines (.trt) for optimized GPU inference.

### Features

- **Multiple Precision Modes**: FP32, FP16, and INT8
- **Dynamic Batch Size**: Supports variable batch sizes
- **Automatic Optimization**: Creates optimization profiles for best performance
- **ONNX Intermediate**: Uses ONNX as an intermediate format for compatibility (see the sketch after this list)
- **Easy to Use**: Simple command-line interface

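
Under the hood, the conversion runs as a two-step pipeline: the PyTorch model is first exported to ONNX, and the ONNX graph is then parsed and compiled into a TensorRT engine. The sketch below illustrates that flow with the public `torch.onnx` and `tensorrt` APIs; it is a simplified approximation, not the script's exact implementation, and the function name and defaults are illustrative only.

```python
import tensorrt as trt
import torch


def pt_to_trt_sketch(model: torch.nn.Module, onnx_path: str, engine_path: str,
                     input_shape=(1, 3, 640, 640), fp16: bool = False) -> None:
    """Illustrative PyTorch -> ONNX -> TensorRT flow (not the script's exact code)."""
    # 1) Export to ONNX using a dummy input of the target shape.
    model.eval()
    dummy = torch.randn(*input_shape)
    torch.onnx.export(model, dummy, onnx_path,
                      input_names=["input"], output_names=["output"],
                      opset_version=17)

    # 2) Parse the ONNX graph and build a serialized TensorRT engine.
    logger = trt.Logger(trt.Logger.WARNING)
    builder = trt.Builder(logger)
    network = builder.create_network(
        1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
    parser = trt.OnnxParser(network, logger)
    with open(onnx_path, "rb") as f:
        if not parser.parse(f.read()):
            raise RuntimeError(str(parser.get_error(0)))

    config = builder.create_builder_config()
    # 4 GB workspace limit (older TensorRT versions use config.max_workspace_size instead).
    config.set_memory_pool_limit(trt.MemoryPoolType.WORKSPACE, 4 << 30)
    if fp16:
        config.set_flag(trt.BuilderFlag.FP16)

    engine = builder.build_serialized_network(network, config)
    with open(engine_path, "wb") as f:
        f.write(engine)
```

The command-line flags documented below map onto the same knobs: precision, workspace size, input/output tensor names, and (for dynamic batching) optimization profiles.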
### Requirements

Make sure you have the following dependencies installed:

```bash
pip install torch tensorrt onnx
```
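An optional sanity check that the toolchain imports correctly and that a CUDA device is visible:

```python
# Optional sanity check for the conversion toolchain.
import torch
import tensorrt as trt
import onnx

print("torch:", torch.__version__, "| tensorrt:", trt.__version__, "| onnx:", onnx.__version__)
print("CUDA available:", torch.cuda.is_available())
```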
### Quick Start

**Basic conversion (FP32)**:
```bash
python scripts/convert_pt_to_tensorrt.py \
    --model path/to/model.pt \
    --output models/model.trt
```

**FP16 precision** (recommended for most cases: roughly 2x faster with minimal accuracy loss):
```bash
python scripts/convert_pt_to_tensorrt.py \
    --model yolov8n.pt \
    --output models/yolov8n.trt \
    --fp16
```

**Custom input shape**:
```bash
python scripts/convert_pt_to_tensorrt.py \
    --model model.pt \
    --output model.trt \
    --input-shape 1,3,416,416
```

**Dynamic batch size** (for variable batch inference):
```bash
python scripts/convert_pt_to_tensorrt.py \
    --model model.pt \
    --output model.trt \
    --dynamic-batch \
    --max-batch 16
```

**Maximum optimization** (FP16 + INT8):
```bash
python scripts/convert_pt_to_tensorrt.py \
    --model model.pt \
    --output model.trt \
    --fp16 \
    --int8
```
### Command-Line Arguments

| Argument | Required | Default | Description |
|----------|----------|---------|-------------|
| `--model`, `-m` | Yes | - | Path to the PyTorch model file (.pt or .pth) |
| `--output`, `-o` | Yes | - | Output path for the TensorRT engine (.trt) |
| `--input-shape`, `-s` | No | 1,3,640,640 | Input tensor shape as B,C,H,W |
| `--fp16` | No | False | Enable FP16 precision (faster, ~same accuracy) |
| `--int8` | No | False | Enable INT8 precision (fastest, needs calibration) |
| `--dynamic-batch` | No | False | Enable dynamic batch size support |
| `--max-batch` | No | 16 | Maximum batch size for dynamic batching |
| `--workspace-size` | No | 4 | TensorRT workspace size in GB |
| `--gpu` | No | 0 | GPU device ID to use |
| `--input-names` | No | ["input"] | Custom input tensor names |
| `--output-names` | No | ["output"] | Custom output tensor names |
| `--keep-onnx` | No | False | Keep the intermediate ONNX file for debugging |
| `--verbose`, `-v` | No | False | Enable verbose logging |
### Performance Tips

1. **Always use FP16** unless you need full FP32 precision:
   - ~2x faster inference
   - ~50% less VRAM usage
   - Minimal accuracy loss for most models

2. **Use dynamic batching** for variable workloads (see the optimization-profile sketch after this list):
   - Process 1-16 images with the same engine
   - Automatic optimization for common batch sizes

3. **Increase workspace size** for complex models:
   - The default 4 GB works for most models
   - Increase to 8 GB for very large models

4. **INT8 quantization** for maximum speed:
   - Requires calibration data (not included in the basic conversion)
   - Up to 4x faster than FP32
   - Best for deployment scenarios

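
For dynamic batching (tip 2), the converter has to attach a TensorRT optimization profile that declares the minimum, optimum, and maximum input shapes the engine must accept. A minimal sketch of what that looks like with the `tensorrt` builder API, assuming the input tensor is named `"input"` as in the default conversion:

```python
import tensorrt as trt


def add_dynamic_batch_profile(builder: trt.Builder, config: trt.IBuilderConfig,
                              max_batch: int = 16) -> None:
    """Attach a min/opt/max batch profile (illustrative, not the script's exact code)."""
    # `builder` and `config` are the same objects used during engine building.
    profile = builder.create_optimization_profile()
    # Shapes are (B, C, H, W); only the batch dimension varies here.
    profile.set_shape("input",
                      min=(1, 3, 640, 640),
                      opt=(max_batch // 2, 3, 640, 640),
                      max=(max_batch, 3, 640, 640))
    config.add_optimization_profile(profile)
```

TensorRT tunes its kernels for the `opt` shape while still accepting any batch size between `min` and `max`.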
### Integration with Model Repository

Once converted, use the TensorRT engine with the model repository:

```python
import torch

from services.model_repository import TensorRTModelRepository

# Initialize the repository
repo = TensorRTModelRepository(gpu_id=0, default_num_contexts=4)

# Load the converted model
repo.load_model(
    model_id="my_model",
    file_path="models/model.trt",
    num_contexts=4
)

# Run inference
input_tensor = torch.rand(1, 3, 640, 640, device='cuda:0')
outputs = repo.infer(
    model_id="my_model",
    inputs={"input": input_tensor}
)
```
### Troubleshooting

**Issue**: `Failed to parse ONNX model`
- Solution: Check whether your PyTorch model is compatible with ONNX export
- Try updating your PyTorch and ONNX versions

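
A quick way to isolate this is to run the ONNX export and validator by hand. A minimal sketch, where `model` stands in for your loaded `torch.nn.Module` and the shape matches the default 1,3,640,640 input:

```python
import onnx
import torch

# `model` is your loaded torch.nn.Module; use your model's real input shape.
model.eval()
dummy = torch.randn(1, 3, 640, 640)
torch.onnx.export(model, dummy, "export_check.onnx", opset_version=17)
onnx.checker.check_model(onnx.load("export_check.onnx"))  # raises if the graph is invalid
print("ONNX export succeeded")
```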
**Issue**: `FP16 not supported on this platform`
- Solution: Your GPU doesn't support FP16. Remove the `--fp16` flag

**Issue**: `Out of memory during conversion`
- Solution: Reduce `--workspace-size` or free up GPU memory

**Issue**: `Model contains only state_dict`
- Solution: Your checkpoint contains only the weights; the full model architecture is also needed.
- Modify the script's `load_pytorch_model()` method to instantiate your model class, as sketched below

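
For example, a weights-only checkpoint has to be loaded back into the architecture it was trained with before it can be exported. A minimal sketch, where `MyModel` and its import path are placeholders for your own model class:

```python
import torch

from my_project.models import MyModel  # placeholder: your own model class

model = MyModel()  # instantiate the architecture yourself
checkpoint = torch.load("custom_model.pt", map_location="cpu")
state_dict = checkpoint.get("state_dict", checkpoint)  # some checkpoints nest the weights
model.load_state_dict(state_dict)
model.eval()
```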
### Examples for Common Models

**YOLOv8**:
```bash
# Download yolov8n.pt first. Ultralytics can also export it directly:
#   yolo export model=yolov8n.pt format=engine device=0

# Or use this script:
python scripts/convert_pt_to_tensorrt.py \
    --model yolov8n.pt \
    --output models/yolov8n.trt \
    --input-shape 1,3,640,640 \
    --fp16
```

**ResNet**:
```bash
python scripts/convert_pt_to_tensorrt.py \
    --model resnet50.pt \
    --output models/resnet50.trt \
    --input-shape 1,3,224,224 \
    --fp16 \
    --dynamic-batch \
    --max-batch 32
```

**Custom Model**:
```bash
python scripts/convert_pt_to_tensorrt.py \
    --model custom_model.pt \
    --output models/custom.trt \
    --input-shape 1,3,512,512 \
    --input-names image \
    --output-names predictions \
    --fp16 \
    --verbose
```
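When custom tensor names are used at conversion time, the same names are the ones to use at inference. A brief sketch reusing the repository API from the integration section; the structure of the returned outputs (a dict keyed by output name) is an assumption:

```python
import torch
from services.model_repository import TensorRTModelRepository

repo = TensorRTModelRepository(gpu_id=0, default_num_contexts=4)
repo.load_model(model_id="custom_model", file_path="models/custom.trt", num_contexts=4)

# Keys must match the --input-names / --output-names used at conversion time.
outputs = repo.infer(
    model_id="custom_model",
    inputs={"image": torch.rand(1, 3, 512, 512, device="cuda:0")},
)
predictions = outputs["predictions"]  # assumes outputs are keyed by output name
```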
### Notes

- The script uses ONNX as an intermediate format, which is the recommended conversion path
- TensorRT engines are hardware-specific; rebuild the engine for each different GPU model
- Conversion time varies from roughly 30 seconds to 5 minutes depending on model size
- The first inference after loading is slower because of warmup (see the sketch below)

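
Because of that warmup cost, it can help to push one throwaway inference through the engine right after loading, before serving real traffic. A minimal sketch reusing the repository API from the integration section:

```python
import torch
from services.model_repository import TensorRTModelRepository

repo = TensorRTModelRepository(gpu_id=0, default_num_contexts=4)
repo.load_model(model_id="my_model", file_path="models/model.trt", num_contexts=4)

# One throwaway inference so the slow first pass happens before real traffic.
warmup = torch.rand(1, 3, 640, 640, device="cuda:0")
repo.infer(model_id="my_model", inputs={"input": warmup})
```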
### Support

For issues or questions, please check:

- TensorRT documentation: https://docs.nvidia.com/deeplearning/tensorrt/
- PyTorch ONNX export guide: https://pytorch.org/docs/stable/onnx.html