
Scripts Directory

This directory contains utility scripts for the python-rtsp-worker project.

convert_pt_to_tensorrt.py

Converts PyTorch models (.pt, .pth) to TensorRT engines (.trt) for optimized GPU inference.

Features

  • Multiple Precision Modes: FP32, FP16, INT8
  • Dynamic Batch Size: Support for variable batch sizes
  • Automatic Optimization: Creates optimization profiles for best performance
  • ONNX Intermediate: Uses ONNX as an intermediate format for compatibility (see the sketch below)
  • Easy to Use: Simple command-line interface
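
Under the hood, the conversion is a two-stage pipeline: PyTorch → ONNX → TensorRT. The snippet below is a minimal, simplified sketch of that flow using the standard torch.onnx and tensorrt APIs; the actual script adds argument parsing, dynamic shapes, and validation on top of this, so treat it as an illustration rather than the script's exact code.

import tensorrt as trt
import torch

# Stage 1: export the loaded PyTorch model to ONNX
# (assumes model.pt holds a full model object, not just a state_dict)
model = torch.load("model.pt", map_location="cuda", weights_only=False).eval()
dummy = torch.rand(1, 3, 640, 640, device="cuda")
torch.onnx.export(model, dummy, "model.onnx",
                  input_names=["input"], output_names=["output"])

# Stage 2: parse the ONNX file and build a serialized TensorRT engine
logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
parser = trt.OnnxParser(network, logger)
with open("model.onnx", "rb") as f:
    parser.parse(f.read())

config = builder.create_builder_config()
config.set_memory_pool_limit(trt.MemoryPoolType.WORKSPACE, 4 << 30)  # 4 GB workspace
if builder.platform_has_fast_fp16:
    config.set_flag(trt.BuilderFlag.FP16)  # what --fp16 turns on

with open("model.trt", "wb") as f:
    f.write(builder.build_serialized_network(network, config))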

Requirements

Make sure you have the following dependencies installed:

pip install torch tensorrt onnx
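
A quick way to confirm that all three packages import (and to see which versions are installed):

import onnx
import tensorrt
import torch

# The conversion script needs all three importable on the machine doing the build
print("torch", torch.__version__)
print("tensorrt", tensorrt.__version__)
print("onnx", onnx.__version__)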

Quick Start

Basic conversion (FP32):

python scripts/convert_pt_to_tensorrt.py \
    --model path/to/model.pt \
    --output models/model.trt

FP16 precision (recommended for most cases - 2x faster, minimal accuracy loss):

python scripts/convert_pt_to_tensorrt.py \
    --model yolov8n.pt \
    --output models/yolov8n.trt \
    --fp16

Custom input shape:

python scripts/convert_pt_to_tensorrt.py \
    --model model.pt \
    --output model.trt \
    --input-shape 1,3,416,416

Dynamic batch size (for variable batch inference):

python scripts/convert_pt_to_tensorrt.py \
    --model model.pt \
    --output model.trt \
    --dynamic-batch \
    --max-batch 16

Maximum optimization (FP16 + INT8; note that INT8 additionally needs calibration data, see Performance Tips):

python scripts/convert_pt_to_tensorrt.py \
    --model model.pt \
    --output model.trt \
    --fp16 \
    --int8

Command-Line Arguments

| Argument | Required | Default | Description |
|---|---|---|---|
| --model, -m | Yes | - | Path to PyTorch model file (.pt or .pth) |
| --output, -o | Yes | - | Output path for TensorRT engine (.trt) |
| --input-shape, -s | No | 1,3,640,640 | Input tensor shape as B,C,H,W |
| --fp16 | No | False | Enable FP16 precision (faster, ~same accuracy) |
| --int8 | No | False | Enable INT8 precision (fastest, needs calibration) |
| --dynamic-batch | No | False | Enable dynamic batch size support |
| --max-batch | No | 16 | Maximum batch size for dynamic batching |
| --workspace-size | No | 4 | TensorRT workspace size in GB |
| --gpu | No | 0 | GPU device ID to use |
| --input-names | No | ["input"] | Custom input tensor names |
| --output-names | No | ["output"] | Custom output tensor names |
| --keep-onnx | No | False | Keep intermediate ONNX file for debugging |
| --verbose, -v | No | False | Enable verbose logging |

Performance Tips

  1. Always use FP16 unless you need FP32 precision:

    • 2x faster inference
    • 50% less VRAM usage
    • Minimal accuracy loss for most models
  2. Use dynamic batching for variable workloads:

    • Process 1-16 images with the same engine (see the profile sketch after this list)
    • Automatic optimization for common batch sizes
  3. Increase workspace size for complex models:

    • Default 4GB works for most models
    • Increase to 8GB for very large models
  4. INT8 quantization for maximum speed:

    • Requires calibration data (not included in basic conversion)
    • 4x faster than FP32
    • Best for deployment scenarios
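
Dynamic batching (tip 2) is expressed at build time as a TensorRT optimization profile. The sketch below shows the typical pattern; the names and shapes are illustrative, and the ONNX export must also mark the batch dimension as dynamic for the profile to have any effect.

import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
config = builder.create_builder_config()
# ... parse the ONNX model into `network` as in the earlier sketch;
# the export needs dynamic_axes={"input": {0: "batch"}} for a variable batch dimension

# One profile covering batch sizes 1 through --max-batch for the tensor named "input"
profile = builder.create_optimization_profile()
profile.set_shape("input",
                  min=(1, 3, 640, 640),   # smallest batch the engine accepts
                  opt=(8, 3, 640, 640),   # batch size the kernels are tuned for
                  max=(16, 3, 640, 640))  # --max-batch
config.add_optimization_profile(profile)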

Integration with Model Repository

Once converted, use the TensorRT engine with the model repository:

from services.model_repository import TensorRTModelRepository

# Initialize repository
repo = TensorRTModelRepository(gpu_id=0, default_num_contexts=4)

# Load the converted model
repo.load_model(
    model_id="my_model",
    file_path="models/model.trt",
    num_contexts=4
)

# Run inference
import torch
input_tensor = torch.rand(1, 3, 640, 640, device='cuda:0')
outputs = repo.infer(
    model_id="my_model",
    inputs={"input": input_tensor}
)

Troubleshooting

Issue: Failed to parse ONNX model

  • Solution: Check if your PyTorch model is compatible with ONNX export
  • Try updating PyTorch and ONNX versions

Issue: FP16 not supported on this platform

  • Solution: Your GPU doesn't support fast FP16; remove the --fp16 flag

Issue: Out of memory during conversion

  • Solution: Reduce --workspace-size or free up GPU memory

Issue: Model contains only state_dict

  • Solution: Your checkpoint only has weights. You need the full model architecture.
  • Modify the script's load_pytorch_model() method to instantiate your model class, as sketched below
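
For the state_dict case, the fix inside load_pytorch_model() looks roughly like this (MyModel and my_package are placeholders for your own architecture and module):

import torch
from my_package import MyModel  # hypothetical: replace with your model class

model = MyModel()                                    # build the architecture first
state = torch.load("custom_model.pt", map_location="cpu")
# Some checkpoints nest the weights under a key such as "state_dict" or "model"
if isinstance(state, dict) and "state_dict" in state:
    state = state["state_dict"]
model.load_state_dict(state)
model.eval()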

Examples for Common Models

YOLOv8:

# Option 1: export directly with Ultralytics
# yolo export model=yolov8n.pt format=engine device=0

# Option 2: use this script
python scripts/convert_pt_to_tensorrt.py \
    --model yolov8n.pt \
    --output models/yolov8n.trt \
    --input-shape 1,3,640,640 \
    --fp16

ResNet:

python scripts/convert_pt_to_tensorrt.py \
    --model resnet50.pt \
    --output models/resnet50.trt \
    --input-shape 1,3,224,224 \
    --fp16 \
    --dynamic-batch \
    --max-batch 32

Custom Model:

python scripts/convert_pt_to_tensorrt.py \
    --model custom_model.pt \
    --output models/custom.trt \
    --input-shape 1,3,512,512 \
    --input-names image \
    --output-names predictions \
    --fp16 \
    --verbose

Notes

  • The script uses ONNX as an intermediate format, which is the recommended approach
  • TensorRT engines are specific to the GPU and TensorRT version; rebuild them when either changes
  • Conversion time varies (30 seconds to 5 minutes depending on model size)
  • The first inference after loading is slower (warmup); see the sketch below
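
To take the warmup hit up front, run a few throwaway inferences right after loading, using the same repository object from the integration example above:

import torch

# Warm up: the first call(s) pay one-time CUDA/TensorRT initialization costs
dummy = torch.rand(1, 3, 640, 640, device="cuda:0")
for _ in range(3):
    repo.infer(model_id="my_model", inputs={"input": dummy})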

Support

For issues or questions, please check: