Hardware Acceleration Setup

This detector worker now includes complete NVIDIA hardware acceleration with FFmpeg and OpenCV built from source.

What's Included

🔧 Complete Hardware Stack

  • FFmpeg 6.0 built from source with NVIDIA Video Codec SDK
  • OpenCV 4.8.1 built with CUDA and custom FFmpeg integration
  • GStreamer with NVDEC/VAAPI plugins
  • TurboJPEG for optimized JPEG encoding (3-5x faster)
  • CUDA support for YOLO model inference

🎯 Hardware Acceleration Methods (Automatic Detection)

  1. GStreamer NVDEC - Best for RTSP streaming, lowest latency
  2. OpenCV CUDA - Direct GPU memory access, best integration
  3. FFmpeg CUVID - Custom build with full NVIDIA acceleration
  4. VAAPI - Intel/AMD GPU support
  5. Software Fallback - CPU-only as last resort
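
The worker tries these methods in order and keeps the first one that opens the stream. Below is a minimal sketch of that selection logic; the open_capture helper, the exact pipeline string, and the omission of the OpenCV CUDA and VAAPI branches are illustrative assumptions — the actual logic lives in core/streaming/readers.py.

# Illustrative sketch of the fallback order; real logic is in core/streaming/readers.py.
import cv2

def open_capture(rtsp_url: str) -> cv2.VideoCapture:
    # 1. GStreamer NVDEC: decode on the GPU, hand raw frames to OpenCV.
    gst = (
        f"rtspsrc location={rtsp_url} latency=0 ! rtph264depay ! h264parse ! "
        "nvv4l2decoder ! nvvideoconvert ! video/x-raw,format=BGRx ! "
        "videoconvert ! appsink drop=true"
    )
    cap = cv2.VideoCapture(gst, cv2.CAP_GSTREAMER)
    if cap.isOpened():
        return cap

    # 2./3. OpenCV's FFmpeg backend (uses CUVID when the custom build provides it).
    cap = cv2.VideoCapture(rtsp_url, cv2.CAP_FFMPEG)
    if cap.isOpened():
        return cap

    # 5. Software fallback: let OpenCV pick any remaining backend on the CPU.
    return cv2.VideoCapture(rtsp_url)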

Build and Run

Single Build Script

./build-nvdec.sh

Build time: 45-90 minutes (compiles FFmpeg + OpenCV from source)

Run with GPU Support

docker run --gpus all -p 8000:8000 detector-worker:complete-hw-accel

Performance Improvements

Expected CPU Reduction

  • Video decoding: 70-90% reduction (moved to GPU)
  • JPEG encoding: 70-80% faster with TurboJPEG (see the sketch after this list)
  • Model inference: GPU accelerated with CUDA
  • Overall system: 50-80% less CPU usage
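
The JPEG gain comes from replacing cv2.imencode with libjpeg-turbo via the PyTurboJPEG bindings. The sketch below shows the swap; the encode_frame helper and its fallback behaviour are assumptions for illustration — the actual wrapper is core/utils/hardware_encoder.py.

import cv2

try:
    from turbojpeg import TurboJPEG    # PyTurboJPEG bindings for libjpeg-turbo
    _jpeg = TurboJPEG()                # load the library once, reuse for every frame
except Exception:
    _jpeg = None                       # library unavailable: fall back to OpenCV

def encode_frame(frame, quality: int = 85) -> bytes:
    if _jpeg is not None:
        # TurboJPEG takes a BGR numpy array, the same layout OpenCV produces.
        return _jpeg.encode(frame, quality=quality)
    ok, buf = cv2.imencode(".jpg", frame, [cv2.IMWRITE_JPEG_QUALITY, quality])
    return buf.tobytes() if ok else b""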

Profiling Results Comparison

Before (Software Only):

  • cv2.imencode: 6.5% of CPU time (1.95s out of 30s)
  • psutil.cpu_percent: 88% of CPU time (idle polling)
  • Video decoding: 100% CPU

After (Hardware Accelerated):

  • Video decoding: GPU (~5-10% CPU overhead)
  • JPEG encoding: 3-5x faster with TurboJPEG
  • Model inference: GPU accelerated

Verification

Check Hardware Acceleration Support

docker run --rm --gpus all detector-worker:complete-hw-accel \
  bash -c "ffmpeg -hwaccels && python3 -c 'import cv2; build=cv2.getBuildInformation(); print(\"CUDA:\", \"CUDA\" in build); print(\"CUVID:\", \"CUVID\" in build)'"

Runtime Logs

The application will automatically log which acceleration method is being used:

Camera cam1: Successfully using GStreamer with NVDEC hardware acceleration
Camera cam2: Using FFMPEG hardware acceleration (backend: FFMPEG)
Camera cam3: Using OpenCV CUDA hardware acceleration

Files Modified

Docker Configuration

  • Dockerfile.base - Complete hardware acceleration stack
  • build-nvdec.sh - Single build script for everything

Application Code

  • core/streaming/readers.py - Multi-method hardware acceleration
  • core/utils/hardware_encoder.py - TurboJPEG + NVENC encoding
  • core/utils/ffmpeg_detector.py - Runtime capability detection (see the sketch after this list)
  • requirements.base.txt - Added TurboJPEG, removed opencv-python
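
core/utils/ffmpeg_detector.py probes the installed FFmpeg at startup instead of assuming capabilities. A minimal sketch of that kind of probe follows; the detect_hwaccels name and parsing details are assumptions, not the module's actual API.

import subprocess

def detect_hwaccels() -> set[str]:
    """Return the hardware accelerators the installed ffmpeg reports."""
    try:
        out = subprocess.run(
            ["ffmpeg", "-hide_banner", "-hwaccels"],
            capture_output=True, text=True, timeout=5, check=True,
        ).stdout
    except (OSError, subprocess.SubprocessError):
        return set()
    # Output is a header line followed by one accelerator name per line.
    return {line.strip() for line in out.splitlines()[1:] if line.strip()}

# e.g. {"cuda", "vaapi", ...} on the custom build; an empty set means software only.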

Architecture

Input RTSP Stream
       ↓
1. GStreamer NVDEC Pipeline (NVIDIA GPU)
   rtspsrc → nvv4l2decoder → nvvideoconvert → OpenCV
       ↓
2. OpenCV CUDA Backend (NVIDIA GPU)
   OpenCV with CUDA acceleration
       ↓
3. FFmpeg CUVID (NVIDIA GPU)
   Custom FFmpeg with h264_cuvid decoder
       ↓
4. VAAPI (Intel/AMD GPU)
   Hardware acceleration for non-NVIDIA
       ↓
5. Software Fallback (CPU)
   Standard OpenCV software decoding
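
Stage selection is automatic, but stage 3 can be forced for testing through OpenCV's FFmpeg backend, which reads the OPENCV_FFMPEG_CAPTURE_OPTIONS environment variable. The override below is optional and not something the worker requires; whether readers.py uses this mechanism internally is an assumption.

import os

# Must be set before the first cv2.VideoCapture(...) is opened.
# Format: "key;value" pairs separated by "|".
os.environ["OPENCV_FFMPEG_CAPTURE_OPTIONS"] = "video_codec;h264_cuvid"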

Benefits

For Development

  • Single Dockerfile.base - Everything consolidated
  • Automatic detection - No manual configuration needed
  • Graceful fallback - Works without GPU for development

For Production

  • Maximum performance - Uses best available acceleration
  • GPU memory efficiency - Direct GPU-to-GPU pipeline
  • Lower latency - Hardware decoding + CUDA inference
  • Reduced CPU load - Frees CPU for other tasks

Troubleshooting

Build Issues

  • Ensure NVIDIA Docker runtime is installed
  • Check CUDA 12.6 compatibility with your GPU
  • Build takes 45-90 minutes - be patient

Runtime Issues

  • Verify nvidia-smi works in the container (see the command after this list)
  • Check logs for acceleration method being used
  • Fallback to software decoding is automatic
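
Quick GPU visibility check using the same image, without starting the application:

docker run --rm --gpus all detector-worker:complete-hw-accel nvidia-smi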

This setup provides production-ready hardware acceleration with automatic detection and graceful fallback for maximum compatibility.