# Scripts Directory
This directory contains utility scripts for the python-rtsp-worker project.
## convert_pt_to_tensorrt.py
Converts PyTorch models (.pt, .pth) to TensorRT engines (.trt) for optimized GPU inference.
### Features

- **Multiple Precision Modes**: FP32, FP16, and INT8
- **Dynamic Batch Size**: support for variable batch sizes
- **Automatic Optimization**: builds optimization profiles for best performance
- **ONNX Intermediate**: uses ONNX as an intermediate format for compatibility (see the sketch below)
- **Easy to Use**: simple command-line interface
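To make the ONNX-intermediate approach concrete, here is a minimal sketch of the flow the script automates (PyTorch → ONNX → TensorRT), written against the TensorRT 8.x Python API. It is illustrative only: the file paths, input shape, and tensor names are placeholders, and the actual script adds error handling, precision flags, and optimization profiles.

```python
import torch
import tensorrt as trt

# 1. Export the PyTorch model to ONNX (assumes a full model, not a bare state_dict)
model = torch.load("model.pt", map_location="cuda").eval()
dummy = torch.randn(1, 3, 640, 640, device="cuda")
torch.onnx.export(model, dummy, "model.onnx",
                  input_names=["input"], output_names=["output"])

# 2. Parse the ONNX file and build a TensorRT engine
logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
parser = trt.OnnxParser(network, logger)
with open("model.onnx", "rb") as f:
    if not parser.parse(f.read()):
        raise RuntimeError("ONNX parse failed")

config = builder.create_builder_config()
config.set_memory_pool_limit(trt.MemoryPoolType.WORKSPACE, 4 << 30)  # 4 GB workspace
engine_bytes = builder.build_serialized_network(network, config)

with open("model.trt", "wb") as f:
    f.write(engine_bytes)
```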
### Requirements

Make sure you have the following dependencies installed:

```bash
pip install torch tensorrt onnx
```
### Quick Start

Basic conversion (FP32):

```bash
python scripts/convert_pt_to_tensorrt.py \
    --model path/to/model.pt \
    --output models/model.trt
```

FP16 precision (recommended for most cases: ~2x faster with minimal accuracy loss):

```bash
python scripts/convert_pt_to_tensorrt.py \
    --model yolov8n.pt \
    --output models/yolov8n.trt \
    --fp16
```

Custom input shape:

```bash
python scripts/convert_pt_to_tensorrt.py \
    --model model.pt \
    --output model.trt \
    --input-shape 1,3,416,416
```

Dynamic batch size (for variable batch inference):

```bash
python scripts/convert_pt_to_tensorrt.py \
    --model model.pt \
    --output model.trt \
    --dynamic-batch \
    --max-batch 16
```
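Under the hood, dynamic batching in TensorRT is normally expressed as an optimization profile with a minimum, optimal, and maximum shape per input. A hedged sketch of roughly what `--dynamic-batch --max-batch 16` corresponds to; the tensor name `"input"`, the 640x640 resolution, and the "typical" batch of 8 are assumptions:

```python
import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
config = builder.create_builder_config()

# One profile covering batch sizes 1..16, tuned for a typical batch of 8
profile = builder.create_optimization_profile()
profile.set_shape("input",            # must match the ONNX input tensor name
                  (1, 3, 640, 640),   # min
                  (8, 3, 640, 640),   # opt
                  (16, 3, 640, 640))  # max
config.add_optimization_profile(profile)
```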
Maximum optimization (FP16 + INT8):

```bash
python scripts/convert_pt_to_tensorrt.py \
    --model model.pt \
    --output model.trt \
    --fp16 \
    --int8
```
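For reference, these precision flags map approximately onto TensorRT builder flags. A minimal sketch, assuming builder and config objects like those in the pipeline sketch above:

```python
import tensorrt as trt

builder = trt.Builder(trt.Logger(trt.Logger.WARNING))
config = builder.create_builder_config()

if builder.platform_has_fast_fp16:
    config.set_flag(trt.BuilderFlag.FP16)
if builder.platform_has_fast_int8:
    config.set_flag(trt.BuilderFlag.INT8)  # real INT8 accuracy also needs a calibrator
```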
### Command-Line Arguments

| Argument | Required | Default | Description |
|---|---|---|---|
| `--model`, `-m` | Yes | - | Path to PyTorch model file (`.pt` or `.pth`) |
| `--output`, `-o` | Yes | - | Output path for TensorRT engine (`.trt`) |
| `--input-shape`, `-s` | No | `1,3,640,640` | Input tensor shape as `B,C,H,W` |
| `--fp16` | No | False | Enable FP16 precision (faster, ~same accuracy) |
| `--int8` | No | False | Enable INT8 precision (fastest, needs calibration) |
| `--dynamic-batch` | No | False | Enable dynamic batch size support |
| `--max-batch` | No | 16 | Maximum batch size for dynamic batching |
| `--workspace-size` | No | 4 | TensorRT workspace size in GB |
| `--gpu` | No | 0 | GPU device ID to use |
| `--input-names` | No | `["input"]` | Custom input tensor names |
| `--output-names` | No | `["output"]` | Custom output tensor names |
| `--keep-onnx` | No | False | Keep intermediate ONNX file for debugging |
| `--verbose`, `-v` | No | False | Enable verbose logging |
### Performance Tips

1. **Always use FP16** unless you specifically need FP32 precision:
   - ~2x faster inference
   - ~50% less VRAM usage
   - Minimal accuracy loss for most models

2. **Use dynamic batching** for variable workloads:
   - Process 1-16 images with the same engine
   - Automatic optimization for common batch sizes

3. **Increase workspace size** for complex models:
   - The default 4 GB works for most models
   - Increase to 8 GB for very large models

4. **Use INT8 quantization** for maximum speed:
   - Requires calibration data (not included in the basic conversion; see the calibrator sketch after this list)
   - 4x faster than FP32
   - Best for deployment scenarios
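Because the basic conversion does not ship calibration data, accurate INT8 needs a calibrator. Below is a hedged sketch of the standard entropy-calibrator pattern from the TensorRT Python API; it uses `pycuda` for the device buffer (not listed in the requirements above), and the batch source, shapes, batch size, and cache path are all placeholders:

```python
import numpy as np
import pycuda.autoinit  # noqa: F401  (creates a CUDA context)
import pycuda.driver as cuda
import tensorrt as trt

class EntropyCalibrator(trt.IInt8EntropyCalibrator2):
    def __init__(self, batches, cache_file="calib.cache"):
        trt.IInt8EntropyCalibrator2.__init__(self)
        self.batches = iter(batches)  # iterable of (16, 3, 640, 640) float32 arrays
        self.cache_file = cache_file
        self.device_input = cuda.mem_alloc(16 * 3 * 640 * 640 * 4)

    def get_batch_size(self):
        return 16

    def get_batch(self, names):
        try:
            batch = next(self.batches)
        except StopIteration:
            return None  # signals the end of calibration data
        cuda.memcpy_htod(self.device_input, np.ascontiguousarray(batch))
        return [int(self.device_input)]

    def read_calibration_cache(self):
        try:
            with open(self.cache_file, "rb") as f:
                return f.read()
        except FileNotFoundError:
            return None

    def write_calibration_cache(self, cache):
        with open(self.cache_file, "wb") as f:
            f.write(cache)
```

The calibrator is passed to the builder config (`config.int8_calibrator = EntropyCalibrator(...)`) before building the engine.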
### Integration with Model Repository

Once converted, use the TensorRT engine with the model repository:

```python
import torch

from services.model_repository import TensorRTModelRepository

# Initialize the repository
repo = TensorRTModelRepository(gpu_id=0, default_num_contexts=4)

# Load the converted model
repo.load_model(
    model_id="my_model",
    file_path="models/model.trt",
    num_contexts=4
)

# Run inference
input_tensor = torch.rand(1, 3, 640, 640, device='cuda:0')
outputs = repo.infer(
    model_id="my_model",
    inputs={"input": input_tensor}
)
```
### Troubleshooting

**Issue: Failed to parse ONNX model**
- Solution: Check that your PyTorch model is compatible with ONNX export
- Try updating your PyTorch and ONNX versions
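One way to narrow this down is to validate the intermediate ONNX file itself (keep it with `--keep-onnx`). A small sketch; the file path is a placeholder:

```python
import onnx

model = onnx.load("model.onnx")
onnx.checker.check_model(model)  # raises if the graph is invalid
print(onnx.helper.printable_graph(model.graph))  # inspect inputs/outputs
```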
**Issue: FP16 not supported on this platform**
- Solution: Your GPU does not support FP16. Remove the `--fp16` flag

**Issue: Out of memory during conversion**
- Solution: Reduce `--workspace-size` or free up GPU memory
**Issue: Model contains only state_dict**
- Solution: Your checkpoint contains only the weights, so the full model architecture is required
- Modify the script's `load_pytorch_model()` method to instantiate your model class before loading the weights (see the sketch below)
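As a hedged illustration of that fix, the idea is to rebuild the full model around the weights and save it whole; `MyModel` and its import path are placeholders for your own architecture:

```python
import torch
from my_project.models import MyModel  # hypothetical import; use your own model class

model = MyModel()
state_dict = torch.load("checkpoint.pth", map_location="cpu")
model.load_state_dict(state_dict)
model.eval()

# Save the complete model so the conversion script can load it directly
torch.save(model, "full_model.pt")
```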
### Examples for Common Models

YOLOv8:

```bash
# Download the model first, then either export directly with Ultralytics:
#   yolo export model=yolov8n.pt format=engine device=0
# ...or use this script:
python scripts/convert_pt_to_tensorrt.py \
    --model yolov8n.pt \
    --output models/yolov8n.trt \
    --input-shape 1,3,640,640 \
    --fp16
```

ResNet:

```bash
python scripts/convert_pt_to_tensorrt.py \
    --model resnet50.pt \
    --output models/resnet50.trt \
    --input-shape 1,3,224,224 \
    --fp16 \
    --dynamic-batch \
    --max-batch 32
```

Custom model:

```bash
python scripts/convert_pt_to_tensorrt.py \
    --model custom_model.pt \
    --output models/custom.trt \
    --input-shape 1,3,512,512 \
    --input-names image \
    --output-names predictions \
    --fp16 \
    --verbose
```
### Notes

- The script uses ONNX as an intermediate format, which is the recommended conversion path
- TensorRT engines are hardware-specific; rebuild the engine for each different GPU model
- Conversion time varies from roughly 30 seconds to 5 minutes depending on model size
- The first inference after loading an engine is slower (warmup); see the sketch below
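If you benchmark, discard the first few runs. A small sketch, reusing the `repo` object from the integration example above:

```python
import torch

# A few throwaway inferences to absorb the warmup cost before timing anything
warmup_input = torch.rand(1, 3, 640, 640, device='cuda:0')
for _ in range(5):
    repo.infer(model_id="my_model", inputs={"input": warmup_input})
torch.cuda.synchronize()  # make sure the warmup runs have actually finished
```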
### Support

For issues or questions, please check:

- TensorRT documentation: https://docs.nvidia.com/deeplearning/tensorrt/
- PyTorch ONNX export guide: https://pytorch.org/docs/stable/onnx.html