# Scripts Directory
This directory contains utility scripts for the python-rtsp-worker project.
## convert_pt_to_tensorrt.py
Converts PyTorch models (.pt, .pth) to TensorRT engines (.trt) for optimized GPU inference.
### Features

- **Multiple Precision Modes**: FP32, FP16, and INT8
- **Dynamic Batch Size**: support for variable batch sizes
- **Automatic Optimization**: builds optimization profiles for best performance
- **ONNX Intermediate**: uses ONNX as an intermediate format for compatibility (see the sketch below)
- **Easy to Use**: simple command-line interface
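To make the ONNX-intermediate approach concrete, here is a minimal sketch of the flow the script automates (PyTorch → ONNX → TensorRT), written against the TensorRT 8.x Python API. It is illustrative only: the file paths, input shape, and tensor names are placeholders, and the actual script adds error handling, precision flags, and optimization profiles.

```python
import torch
import tensorrt as trt

# 1. Export the PyTorch model to ONNX (assumes a full model, not a bare state_dict)
model = torch.load("model.pt", map_location="cuda").eval()
dummy = torch.randn(1, 3, 640, 640, device="cuda")
torch.onnx.export(model, dummy, "model.onnx",
                  input_names=["input"], output_names=["output"])

# 2. Parse the ONNX file and build a TensorRT engine
logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
parser = trt.OnnxParser(network, logger)
with open("model.onnx", "rb") as f:
    if not parser.parse(f.read()):
        raise RuntimeError("ONNX parse failed")

config = builder.create_builder_config()
config.set_memory_pool_limit(trt.MemoryPoolType.WORKSPACE, 4 << 30)  # 4 GB workspace
engine_bytes = builder.build_serialized_network(network, config)

with open("model.trt", "wb") as f:
    f.write(engine_bytes)
```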
### Requirements

Make sure you have the following dependencies installed:

```bash
pip install torch tensorrt onnx
```
### Quick Start

Basic conversion (FP32):

```bash
python scripts/convert_pt_to_tensorrt.py \
    --model path/to/model.pt \
    --output models/model.trt
```

FP16 precision (recommended for most cases: ~2x faster with minimal accuracy loss):

```bash
python scripts/convert_pt_to_tensorrt.py \
    --model yolov8n.pt \
    --output models/yolov8n.trt \
    --fp16
```

Custom input shape:

```bash
python scripts/convert_pt_to_tensorrt.py \
    --model model.pt \
    --output model.trt \
    --input-shape 1,3,416,416
```

Dynamic batch size (for variable batch inference):

```bash
python scripts/convert_pt_to_tensorrt.py \
    --model model.pt \
    --output model.trt \
    --dynamic-batch \
    --max-batch 16
```
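Under the hood, dynamic batching in TensorRT is normally expressed as an optimization profile with a minimum, optimal, and maximum shape per input. A hedged sketch of roughly what `--dynamic-batch --max-batch 16` corresponds to; the tensor name `"input"`, the 640x640 resolution, and the "typical" batch of 8 are assumptions:

```python
import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
config = builder.create_builder_config()

# One profile covering batch sizes 1..16, tuned for a typical batch of 8
profile = builder.create_optimization_profile()
profile.set_shape("input",            # must match the ONNX input tensor name
                  (1, 3, 640, 640),   # min
                  (8, 3, 640, 640),   # opt
                  (16, 3, 640, 640))  # max
config.add_optimization_profile(profile)
```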
Maximum optimization (FP16 + INT8):

```bash
python scripts/convert_pt_to_tensorrt.py \
    --model model.pt \
    --output model.trt \
    --fp16 \
    --int8
```
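For reference, these precision flags map approximately onto TensorRT builder flags. A minimal sketch, assuming builder and config objects like those in the pipeline sketch above:

```python
import tensorrt as trt

builder = trt.Builder(trt.Logger(trt.Logger.WARNING))
config = builder.create_builder_config()

if builder.platform_has_fast_fp16:
    config.set_flag(trt.BuilderFlag.FP16)
if builder.platform_has_fast_int8:
    config.set_flag(trt.BuilderFlag.INT8)  # real INT8 accuracy also needs a calibrator
```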
### Command-Line Arguments

| Argument | Required | Default | Description |
|---|---|---|---|
| `--model`, `-m` | Yes | - | Path to PyTorch model file (`.pt` or `.pth`) |
| `--output`, `-o` | Yes | - | Output path for TensorRT engine (`.trt`) |
| `--input-shape`, `-s` | No | `1,3,640,640` | Input tensor shape as `B,C,H,W` |
| `--fp16` | No | False | Enable FP16 precision (faster, ~same accuracy) |
| `--int8` | No | False | Enable INT8 precision (fastest, needs calibration) |
| `--dynamic-batch` | No | False | Enable dynamic batch size support |
| `--max-batch` | No | 16 | Maximum batch size for dynamic batching |
| `--workspace-size` | No | 4 | TensorRT workspace size in GB |
| `--gpu` | No | 0 | GPU device ID to use |
| `--input-names` | No | `["input"]` | Custom input tensor names |
| `--output-names` | No | `["output"]` | Custom output tensor names |
| `--keep-onnx` | No | False | Keep intermediate ONNX file for debugging |
| `--verbose`, `-v` | No | False | Enable verbose logging |
### Performance Tips

1. **Always use FP16** unless you specifically need FP32 precision:
   - ~2x faster inference
   - ~50% less VRAM usage
   - Minimal accuracy loss for most models

2. **Use dynamic batching** for variable workloads:
   - Process 1-16 images with the same engine
   - Automatic optimization for common batch sizes

3. **Increase workspace size** for complex models:
   - The default 4 GB works for most models
   - Increase to 8 GB for very large models

4. **Use INT8 quantization** for maximum speed:
   - Requires calibration data (not included in the basic conversion; see the calibrator sketch after this list)
   - 4x faster than FP32
   - Best for deployment scenarios
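Because the basic conversion does not ship calibration data, accurate INT8 needs a calibrator. Below is a hedged sketch of the standard entropy-calibrator pattern from the TensorRT Python API; it uses `pycuda` for the device buffer (not listed in the requirements above), and the batch source, shapes, batch size, and cache path are all placeholders:

```python
import numpy as np
import pycuda.autoinit  # noqa: F401  (creates a CUDA context)
import pycuda.driver as cuda
import tensorrt as trt

class EntropyCalibrator(trt.IInt8EntropyCalibrator2):
    def __init__(self, batches, cache_file="calib.cache"):
        trt.IInt8EntropyCalibrator2.__init__(self)
        self.batches = iter(batches)  # iterable of (16, 3, 640, 640) float32 arrays
        self.cache_file = cache_file
        self.device_input = cuda.mem_alloc(16 * 3 * 640 * 640 * 4)

    def get_batch_size(self):
        return 16

    def get_batch(self, names):
        try:
            batch = next(self.batches)
        except StopIteration:
            return None  # signals the end of calibration data
        cuda.memcpy_htod(self.device_input, np.ascontiguousarray(batch))
        return [int(self.device_input)]

    def read_calibration_cache(self):
        try:
            with open(self.cache_file, "rb") as f:
                return f.read()
        except FileNotFoundError:
            return None

    def write_calibration_cache(self, cache):
        with open(self.cache_file, "wb") as f:
            f.write(cache)
```

The calibrator is passed to the builder config (`config.int8_calibrator = EntropyCalibrator(...)`) before building the engine.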
### Integration with Model Repository

Once converted, use the TensorRT engine with the model repository:

```python
import torch

from services.model_repository import TensorRTModelRepository

# Initialize the repository
repo = TensorRTModelRepository(gpu_id=0, default_num_contexts=4)

# Load the converted model
repo.load_model(
    model_id="my_model",
    file_path="models/model.trt",
    num_contexts=4
)

# Run inference
input_tensor = torch.rand(1, 3, 640, 640, device='cuda:0')
outputs = repo.infer(
    model_id="my_model",
    inputs={"input": input_tensor}
)
```
### Troubleshooting

**Issue: Failed to parse ONNX model**
- Solution: Check that your PyTorch model is compatible with ONNX export
- Try updating your PyTorch and ONNX versions
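One way to narrow this down is to validate the intermediate ONNX file itself (keep it with `--keep-onnx`). A small sketch; the file path is a placeholder:

```python
import onnx

model = onnx.load("model.onnx")
onnx.checker.check_model(model)  # raises if the graph is invalid
print(onnx.helper.printable_graph(model.graph))  # inspect inputs/outputs
```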
**Issue: FP16 not supported on this platform**
- Solution: Your GPU does not support FP16. Remove the `--fp16` flag

**Issue: Out of memory during conversion**
- Solution: Reduce `--workspace-size` or free up GPU memory
**Issue: Model contains only state_dict**
- Solution: Your checkpoint contains only the weights, so the full model architecture is required
- Modify the script's `load_pytorch_model()` method to instantiate your model class before loading the weights (see the sketch below)
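As a hedged illustration of that fix, the idea is to rebuild the full model around the weights and save it whole; `MyModel` and its import path are placeholders for your own architecture:

```python
import torch
from my_project.models import MyModel  # hypothetical import; use your own model class

model = MyModel()
state_dict = torch.load("checkpoint.pth", map_location="cpu")
model.load_state_dict(state_dict)
model.eval()

# Save the complete model so the conversion script can load it directly
torch.save(model, "full_model.pt")
```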
### Examples for Common Models

YOLOv8:

```bash
# Download the model first, then either export directly with Ultralytics:
#   yolo export model=yolov8n.pt format=engine device=0
# ...or use this script:
python scripts/convert_pt_to_tensorrt.py \
    --model yolov8n.pt \
    --output models/yolov8n.trt \
    --input-shape 1,3,640,640 \
    --fp16
```

ResNet:

```bash
python scripts/convert_pt_to_tensorrt.py \
    --model resnet50.pt \
    --output models/resnet50.trt \
    --input-shape 1,3,224,224 \
    --fp16 \
    --dynamic-batch \
    --max-batch 32
```

Custom model:

```bash
python scripts/convert_pt_to_tensorrt.py \
    --model custom_model.pt \
    --output models/custom.trt \
    --input-shape 1,3,512,512 \
    --input-names image \
    --output-names predictions \
    --fp16 \
    --verbose
```
### Notes

- The script uses ONNX as an intermediate format, which is the recommended conversion path
- TensorRT engines are hardware-specific; rebuild the engine for each different GPU model
- Conversion time varies from roughly 30 seconds to 5 minutes depending on model size
- The first inference after loading an engine is slower (warmup); see the sketch below
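If you benchmark, discard the first few runs. A small sketch, reusing the `repo` object from the integration example above:

```python
import torch

# A few throwaway inferences to absorb the warmup cost before timing anything
warmup_input = torch.rand(1, 3, 640, 640, device='cuda:0')
for _ in range(5):
    repo.infer(model_id="my_model", inputs={"input": warmup_input})
torch.cuda.synchronize()  # make sure the warmup runs have actually finished
```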
### Support

For issues or questions, please check:

- TensorRT documentation: https://docs.nvidia.com/deeplearning/tensorrt/
- PyTorch ONNX export guide: https://pytorch.org/docs/stable/onnx.html