Hardware Acceleration Setup

This detector worker now includes complete NVIDIA hardware acceleration with FFmpeg and OpenCV built from source.

What's Included

🔧 Complete Hardware Stack

  • FFmpeg 6.0 built from source with NVIDIA Video Codec SDK
  • OpenCV 4.8.1 built with CUDA and custom FFmpeg integration
  • GStreamer with NVDEC/VAAPI plugins
  • TurboJPEG for optimized JPEG encoding (3-5x faster)
  • CUDA support for YOLO model inference

🎯 Hardware Acceleration Methods (Automatic Detection)

  1. GStreamer NVDEC - Best for RTSP streaming, lowest latency
  2. OpenCV CUDA - Direct GPU memory access, best integration
  3. FFmpeg CUVID - Custom build with full NVIDIA acceleration
  4. VAAPI - Intel/AMD GPU support
  5. Software Fallback - CPU-only as last resort
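
The worker tries these methods in order and keeps the first one that opens the stream. Below is a minimal sketch of that selection logic; the open_capture helper, the exact pipeline string, and the omission of the OpenCV CUDA and VAAPI branches are illustrative assumptions — the actual logic lives in core/streaming/readers.py.

# Illustrative sketch of the fallback order; real logic is in core/streaming/readers.py.
import cv2

def open_capture(rtsp_url: str) -> cv2.VideoCapture:
    # 1. GStreamer NVDEC: decode on the GPU, hand raw frames to OpenCV.
    gst = (
        f"rtspsrc location={rtsp_url} latency=0 ! rtph264depay ! h264parse ! "
        "nvv4l2decoder ! nvvideoconvert ! video/x-raw,format=BGRx ! "
        "videoconvert ! appsink drop=true"
    )
    cap = cv2.VideoCapture(gst, cv2.CAP_GSTREAMER)
    if cap.isOpened():
        return cap

    # 2./3. OpenCV's FFmpeg backend (uses CUVID when the custom build provides it).
    cap = cv2.VideoCapture(rtsp_url, cv2.CAP_FFMPEG)
    if cap.isOpened():
        return cap

    # 5. Software fallback: let OpenCV pick any remaining backend on the CPU.
    return cv2.VideoCapture(rtsp_url)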

Build and Run

Single Build Script

./build-nvdec.sh

Build time: 45-90 minutes (compiles FFmpeg + OpenCV from source)

Run with GPU Support

docker run --gpus all -p 8000:8000 detector-worker:complete-hw-accel

Performance Improvements

Expected CPU Reduction

  • Video decoding: 70-90% reduction (moved to GPU)
  • JPEG encoding: 70-80% faster with TurboJPEG (see the sketch after this list)
  • Model inference: GPU accelerated with CUDA
  • Overall system: 50-80% less CPU usage
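
The JPEG gain comes from replacing cv2.imencode with libjpeg-turbo via the PyTurboJPEG bindings. The sketch below shows the swap; the encode_frame helper and its fallback behaviour are assumptions for illustration — the actual wrapper is core/utils/hardware_encoder.py.

import cv2

try:
    from turbojpeg import TurboJPEG    # PyTurboJPEG bindings for libjpeg-turbo
    _jpeg = TurboJPEG()                # load the library once, reuse for every frame
except Exception:
    _jpeg = None                       # library unavailable: fall back to OpenCV

def encode_frame(frame, quality: int = 85) -> bytes:
    if _jpeg is not None:
        # TurboJPEG takes a BGR numpy array, the same layout OpenCV produces.
        return _jpeg.encode(frame, quality=quality)
    ok, buf = cv2.imencode(".jpg", frame, [cv2.IMWRITE_JPEG_QUALITY, quality])
    return buf.tobytes() if ok else b""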

Profiling Results Comparison

Before (Software Only):

  • cv2.imencode: 6.5% of CPU time (1.95s out of 30s)
  • psutil.cpu_percent: 88% of CPU time (idle polling)
  • Video decoding: 100% CPU

After (Hardware Accelerated):

  • Video decoding: GPU (~5-10% CPU overhead)
  • JPEG encoding: 3-5x faster with TurboJPEG
  • Model inference: GPU accelerated

Verification

Check Hardware Acceleration Support

docker run --rm --gpus all detector-worker:complete-hw-accel \
  bash -c "ffmpeg -hwaccels && python3 -c 'import cv2; build=cv2.getBuildInformation(); print(\"CUDA:\", \"CUDA\" in build); print(\"CUVID:\", \"CUVID\" in build)'"

Runtime Logs

The application will automatically log which acceleration method is being used:

Camera cam1: Successfully using GStreamer with NVDEC hardware acceleration
Camera cam2: Using FFMPEG hardware acceleration (backend: FFMPEG)
Camera cam3: Using OpenCV CUDA hardware acceleration

Files Modified

Docker Configuration

  • Dockerfile.base - Complete hardware acceleration stack
  • build-nvdec.sh - Single build script for everything

Application Code

  • core/streaming/readers.py - Multi-method hardware acceleration
  • core/utils/hardware_encoder.py - TurboJPEG + NVENC encoding
  • core/utils/ffmpeg_detector.py - Runtime capability detection (see the sketch after this list)
  • requirements.base.txt - Added TurboJPEG, removed opencv-python
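
core/utils/ffmpeg_detector.py probes the installed FFmpeg at startup instead of assuming capabilities. A minimal sketch of that kind of probe follows; the detect_hwaccels name and parsing details are assumptions, not the module's actual API.

import subprocess

def detect_hwaccels() -> set[str]:
    """Return the hardware accelerators the installed ffmpeg reports."""
    try:
        out = subprocess.run(
            ["ffmpeg", "-hide_banner", "-hwaccels"],
            capture_output=True, text=True, timeout=5, check=True,
        ).stdout
    except (OSError, subprocess.SubprocessError):
        return set()
    # Output is a header line followed by one accelerator name per line.
    return {line.strip() for line in out.splitlines()[1:] if line.strip()}

# e.g. {"cuda", "vaapi", ...} on the custom build; an empty set means software only.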

Architecture

Input RTSP Stream
       ↓
1. GStreamer NVDEC Pipeline (NVIDIA GPU)
   rtspsrc → nvv4l2decoder → nvvideoconvert → OpenCV
       ↓
2. OpenCV CUDA Backend (NVIDIA GPU)
   OpenCV with CUDA acceleration
       ↓
3. FFmpeg CUVID (NVIDIA GPU)
   Custom FFmpeg with h264_cuvid decoder
       ↓
4. VAAPI (Intel/AMD GPU)
   Hardware acceleration for non-NVIDIA
       ↓
5. Software Fallback (CPU)
   Standard OpenCV software decoding
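
Stage selection is automatic, but stage 3 can be forced for testing through OpenCV's FFmpeg backend, which reads the OPENCV_FFMPEG_CAPTURE_OPTIONS environment variable. The override below is optional and not something the worker requires; whether readers.py uses this mechanism internally is an assumption.

import os

# Must be set before the first cv2.VideoCapture(...) is opened.
# Format: "key;value" pairs separated by "|".
os.environ["OPENCV_FFMPEG_CAPTURE_OPTIONS"] = "video_codec;h264_cuvid"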

Benefits

For Development

  • Single Dockerfile.base - Everything consolidated
  • Automatic detection - No manual configuration needed
  • Graceful fallback - Works without GPU for development

For Production

  • Maximum performance - Uses best available acceleration
  • GPU memory efficiency - Direct GPU-to-GPU pipeline
  • Lower latency - Hardware decoding + CUDA inference
  • Reduced CPU load - Frees CPU for other tasks

Troubleshooting

Build Issues

  • Ensure NVIDIA Docker runtime is installed
  • Check CUDA 12.6 compatibility with your GPU
  • Build takes 45-90 minutes - be patient

Runtime Issues

  • Verify nvidia-smi works in the container (see the command after this list)
  • Check logs for acceleration method being used
  • Fallback to software decoding is automatic
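
Quick GPU visibility check using the same image, without starting the application:

docker run --rm --gpus all detector-worker:complete-hw-accel nvidia-smi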

This setup provides production-ready hardware acceleration with automatic detection and graceful fallback for maximum compatibility.