feat: inference subsystem and optimization to decoder
commit 3c83a57e44
19 changed files with 3897 additions and 0 deletions

scripts/README.md (new file, 197 lines)
# Scripts Directory

This directory contains utility scripts for the python-rtsp-worker project.

## convert_pt_to_tensorrt.py

Converts PyTorch models (.pt, .pth) to TensorRT engines (.trt) for optimized GPU inference.
### Features

- **Multiple Precision Modes**: FP32, FP16, and INT8
- **Dynamic Batch Size**: Support for variable batch sizes
- **Automatic Optimization**: Creates optimization profiles for best performance
- **ONNX Intermediate**: Uses ONNX as an intermediate format for compatibility (see the conversion sketch below)
- **Easy to Use**: Simple command-line interface
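For orientation, the conversion runs in two stages: export the PyTorch model to ONNX, then parse the ONNX file and build a TensorRT engine. The sketch below illustrates that flow with TensorRT's Python API; the `convert` function name, tensor names, and defaults are illustrative, not the script's actual code.

```python
# Minimal sketch of the PyTorch -> ONNX -> TensorRT flow (illustrative only)
import torch
import tensorrt as trt

def convert(model: torch.nn.Module, onnx_path: str, engine_path: str,
            input_shape=(1, 3, 640, 640), fp16=True):
    # Stage 1: export the PyTorch model to ONNX with a dummy input
    model.eval()
    dummy = torch.randn(*input_shape)
    torch.onnx.export(model, dummy, onnx_path,
                      input_names=["input"], output_names=["output"],
                      opset_version=17)

    # Stage 2: parse the ONNX file and build a serialized TensorRT engine
    logger = trt.Logger(trt.Logger.WARNING)
    builder = trt.Builder(logger)
    network = builder.create_network(
        1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
    parser = trt.OnnxParser(network, logger)
    with open(onnx_path, "rb") as f:
        if not parser.parse(f.read()):
            raise RuntimeError("Failed to parse ONNX model")

    config = builder.create_builder_config()
    if fp16 and builder.platform_has_fast_fp16:
        config.set_flag(trt.BuilderFlag.FP16)

    engine = builder.build_serialized_network(network, config)
    with open(engine_path, "wb") as f:
        f.write(engine)
```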
### Requirements

Make sure you have the following dependencies installed:

```bash
pip install torch tensorrt onnx
```
### Quick Start

**Basic conversion (FP32)**:
```bash
python scripts/convert_pt_to_tensorrt.py \
    --model path/to/model.pt \
    --output models/model.trt
```

**FP16 precision** (recommended for most cases: ~2x faster, minimal accuracy loss):
```bash
python scripts/convert_pt_to_tensorrt.py \
    --model yolov8n.pt \
    --output models/yolov8n.trt \
    --fp16
```

**Custom input shape**:
```bash
python scripts/convert_pt_to_tensorrt.py \
    --model model.pt \
    --output model.trt \
    --input-shape 1,3,416,416
```

**Dynamic batch size** (for variable batch inference):
```bash
python scripts/convert_pt_to_tensorrt.py \
    --model model.pt \
    --output model.trt \
    --dynamic-batch \
    --max-batch 16
```

**Maximum optimization** (FP16 + INT8):
```bash
python scripts/convert_pt_to_tensorrt.py \
    --model model.pt \
    --output model.trt \
    --fp16 \
    --int8
```
### Command-Line Arguments

| Argument | Required | Default | Description |
|----------|----------|---------|-------------|
| `--model`, `-m` | Yes | - | Path to PyTorch model file (.pt or .pth) |
| `--output`, `-o` | Yes | - | Output path for TensorRT engine (.trt) |
| `--input-shape`, `-s` | No | 1,3,640,640 | Input tensor shape as B,C,H,W |
| `--fp16` | No | False | Enable FP16 precision (faster, ~same accuracy) |
| `--int8` | No | False | Enable INT8 precision (fastest, needs calibration) |
| `--dynamic-batch` | No | False | Enable dynamic batch size support |
| `--max-batch` | No | 16 | Maximum batch size for dynamic batching |
| `--workspace-size` | No | 4 | TensorRT workspace size in GB |
| `--gpu` | No | 0 | GPU device ID to use |
| `--input-names` | No | ["input"] | Custom input tensor names |
| `--output-names` | No | ["output"] | Custom output tensor names |
| `--keep-onnx` | No | False | Keep intermediate ONNX file for debugging |
| `--verbose`, `-v` | No | False | Enable verbose logging |
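For reference, the table above maps onto a fairly standard argparse setup. The sketch below is a plausible reconstruction of that interface, not the script's actual code; the `parse_shape` helper and `nargs` choices are assumptions.

```python
# Plausible argparse layout mirroring the documented flags (illustrative only)
import argparse

def parse_shape(value: str) -> tuple:
    # "1,3,640,640" -> (1, 3, 640, 640), i.e. B,C,H,W
    return tuple(int(dim) for dim in value.split(","))

parser = argparse.ArgumentParser(description="Convert a PyTorch model to a TensorRT engine")
parser.add_argument("--model", "-m", required=True, help="Path to .pt/.pth model")
parser.add_argument("--output", "-o", required=True, help="Output .trt engine path")
parser.add_argument("--input-shape", "-s", type=parse_shape, default=(1, 3, 640, 640))
parser.add_argument("--fp16", action="store_true")
parser.add_argument("--int8", action="store_true")
parser.add_argument("--dynamic-batch", action="store_true")
parser.add_argument("--max-batch", type=int, default=16)
parser.add_argument("--workspace-size", type=int, default=4, help="Size in GB")
parser.add_argument("--gpu", type=int, default=0)
parser.add_argument("--input-names", nargs="+", default=["input"])
parser.add_argument("--output-names", nargs="+", default=["output"])
parser.add_argument("--keep-onnx", action="store_true")
parser.add_argument("--verbose", "-v", action="store_true")

args = parser.parse_args()
```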
### Performance Tips

1. **Always use FP16** unless you need FP32 precision:
   - 2x faster inference
   - 50% less VRAM usage
   - Minimal accuracy loss for most models

2. **Use dynamic batching** for variable workloads (see the builder-config sketch after this list):
   - Process 1-16 images with the same engine
   - Automatic optimization for common batch sizes

3. **Increase workspace size** for complex models:
   - Default 4 GB works for most models
   - Increase to 8 GB for very large models

4. **INT8 quantization** for maximum speed:
   - Requires calibration data (not included in basic conversion)
   - 4x faster than FP32
   - Best for deployment scenarios
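The tips above correspond to a handful of TensorRT builder-config calls. Below is a minimal sketch using TensorRT's Python API; the tensor name `"input"` and the batch sizes are examples, and the INT8 calibrator is omitted, as noted in tip 4.

```python
# Sketch: the TensorRT builder-config knobs behind the tips above (illustrative)
import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
config = builder.create_builder_config()

# Tip 1: enable FP16 when the GPU supports it
if builder.platform_has_fast_fp16:
    config.set_flag(trt.BuilderFlag.FP16)

# Tip 2: dynamic batching via an optimization profile on the "input" tensor
# (min / optimal / max shapes, here batch 1 / 8 / 16)
profile = builder.create_optimization_profile()
profile.set_shape("input", (1, 3, 640, 640), (8, 3, 640, 640), (16, 3, 640, 640))
config.add_optimization_profile(profile)

# Tip 3: workspace memory limit (4 GB here; raise it for very large models)
config.set_memory_pool_limit(trt.MemoryPoolType.WORKSPACE, 4 << 30)

# Tip 4: INT8 additionally needs a calibrator assigned to config.int8_calibrator
config.set_flag(trt.BuilderFlag.INT8)
```

Note that a dynamic-batch profile only helps if the ONNX export also marks the batch dimension as dynamic (the `dynamic_axes` argument of `torch.onnx.export`).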
### Integration with Model Repository

Once converted, use the TensorRT engine with the model repository:

```python
from services.model_repository import TensorRTModelRepository
import torch

# Initialize repository
repo = TensorRTModelRepository(gpu_id=0, default_num_contexts=4)

# Load the converted model
repo.load_model(
    model_id="my_model",
    file_path="models/model.trt",
    num_contexts=4
)

# Run inference
input_tensor = torch.rand(1, 3, 640, 640, device='cuda:0')
outputs = repo.infer(
    model_id="my_model",
    inputs={"input": input_tensor}
)
```
### Troubleshooting

**Issue**: `Failed to parse ONNX model`
- Solution: Check if your PyTorch model is compatible with ONNX export (see the check below)
- Try updating PyTorch and ONNX versions
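One quick way to narrow this down is to validate the intermediate ONNX file on its own (keep it with `--keep-onnx`). A minimal check using the `onnx` package from the requirements above; the file path is an example:

```python
# Validate the exported ONNX file independently of TensorRT
import onnx

model = onnx.load("models/model.onnx")            # path assumes --keep-onnx was used
onnx.checker.check_model(model)                   # raises if the graph is malformed
print(onnx.helper.printable_graph(model.graph))   # inspect inputs, outputs, and ops
```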
**Issue**: `FP16 not supported on this platform`
- Solution: Your GPU doesn't support FP16; remove the `--fp16` flag

**Issue**: `Out of memory during conversion`
- Solution: Reduce `--workspace-size` or free up GPU memory

**Issue**: `Model contains only state_dict`
- Solution: Your checkpoint only has the weights; the full model architecture is needed
- Modify the script's `load_pytorch_model()` method to instantiate your model class (see the sketch below)
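For example, if your checkpoint is a bare `state_dict`, something along these lines inside `load_pytorch_model()` would work; the `MyModel` class, its import path, and the checkpoint filename are placeholders for your own code:

```python
# Rebuild the full model when the checkpoint holds only a state_dict
import torch
from my_project.models import MyModel   # placeholder: your model class

model = MyModel()                                    # instantiate the architecture
state_dict = torch.load("custom_model.pt", map_location="cpu")
if isinstance(state_dict, dict) and "state_dict" in state_dict:
    state_dict = state_dict["state_dict"]            # unwrap common checkpoint wrappers
model.load_state_dict(state_dict)
model.eval()                                         # ready for ONNX export
```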
### Examples for Common Models

**YOLOv8**:
```bash
# Download/obtain yolov8n.pt first, then either use Ultralytics' own exporter:
# yolo export model=yolov8n.pt format=engine device=0

# or use this script:
python scripts/convert_pt_to_tensorrt.py \
    --model yolov8n.pt \
    --output models/yolov8n.trt \
    --input-shape 1,3,640,640 \
    --fp16
```

**ResNet**:
```bash
python scripts/convert_pt_to_tensorrt.py \
    --model resnet50.pt \
    --output models/resnet50.trt \
    --input-shape 1,3,224,224 \
    --fp16 \
    --dynamic-batch \
    --max-batch 32
```

**Custom Model**:
```bash
python scripts/convert_pt_to_tensorrt.py \
    --model custom_model.pt \
    --output models/custom.trt \
    --input-shape 1,3,512,512 \
    --input-names image \
    --output-names predictions \
    --fp16 \
    --verbose
```
### Notes

- The script uses ONNX as an intermediate format, which is the recommended approach
- TensorRT engines are hardware-specific; rebuild them for different GPUs
- Conversion time varies (30 seconds to 5 minutes depending on model size)
- The first inference after loading is slower (warmup); see the sketch below
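If you benchmark right after loading, run a few throwaway inferences first so the measurement excludes warmup. A small sketch reusing `repo` and the `"my_model"` engine from the integration example above; the batch shape and iteration counts are arbitrary:

```python
# Warm up the engine before timing it
import time
import torch

dummy = torch.rand(1, 3, 640, 640, device='cuda:0')

for _ in range(10):                       # throwaway warmup iterations
    repo.infer(model_id="my_model", inputs={"input": dummy})
torch.cuda.synchronize()

start = time.perf_counter()
for _ in range(100):                      # timed iterations
    repo.infer(model_id="my_model", inputs={"input": dummy})
torch.cuda.synchronize()
print(f"avg latency: {(time.perf_counter() - start) / 100 * 1000:.2f} ms")
```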
### Support

For issues or questions, please check:
- TensorRT documentation: https://docs.nvidia.com/deeplearning/tensorrt/
- PyTorch ONNX export guide: https://pytorch.org/docs/stable/onnx.html