# Hardware Acceleration Setup
This detector worker now includes **complete NVIDIA hardware acceleration** with FFmpeg and OpenCV built from source.
## What's Included
### 🔧 Complete Hardware Stack
- **FFmpeg 6.0** built from source with NVIDIA Video Codec SDK
- **OpenCV 4.8.1** built with CUDA and custom FFmpeg integration
- **GStreamer** with NVDEC/VAAPI plugins
- **TurboJPEG** for optimized JPEG encoding (3-5x faster)
- **CUDA** support for YOLO model inference
### 🎯 Hardware Acceleration Methods (Automatic Detection)
1. **GStreamer NVDEC** - Best for RTSP streaming, lowest latency
2. **OpenCV CUDA** - Direct GPU memory access, best integration
3. **FFmpeg CUVID** - Custom build with full NVIDIA acceleration
4. **VAAPI** - Intel/AMD GPU support
5. **Software Fallback** - CPU-only as last resort
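
A rough sketch of what this selection order looks like in code, assuming the OpenCV build from this image (the worker's real logic lives in `core/streaming/readers.py`; the pipeline string and function below are illustrative, not the actual API):
```python
import cv2

def open_stream(rtsp_url: str) -> cv2.VideoCapture:
    """Try hardware-accelerated decoding first, then fall back to software."""
    # 1. GStreamer NVDEC pipeline (element names depend on the installed NVIDIA plugins)
    gst = (
        f"rtspsrc location={rtsp_url} latency=0 ! rtph264depay ! h264parse ! "
        "nvv4l2decoder ! nvvideoconvert ! video/x-raw,format=BGRx ! "
        "videoconvert ! video/x-raw,format=BGR ! appsink drop=1"
    )
    cap = cv2.VideoCapture(gst, cv2.CAP_GSTREAMER)
    if cap.isOpened():
        return cap

    # 2. The OpenCV CUDA path (cv2.cudacodec) returns GPU-resident frames;
    #    see the Architecture section for a separate sketch of that route.

    # 3./4. FFmpeg backend with a hardware-acceleration hint (OpenCV >= 4.5.2);
    #       with the custom FFmpeg build this can pick CUVID, or VAAPI on Intel/AMD.
    cap = cv2.VideoCapture(
        rtsp_url,
        cv2.CAP_FFMPEG,
        [cv2.CAP_PROP_HW_ACCELERATION, cv2.VIDEO_ACCELERATION_ANY],
    )
    if cap.isOpened():
        return cap

    # 5. Software fallback: plain CPU decoding
    return cv2.VideoCapture(rtsp_url)
```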
## Build and Run
### Single Build Script
```bash
./build-nvdec.sh
```
**Build time**: 45-90 minutes (compiles FFmpeg + OpenCV from source)
### Run with GPU Support
```bash
docker run --gpus all -p 8000:8000 detector-worker:complete-hw-accel
```
## Performance Improvements
### Expected CPU Reduction
- **Video decoding**: 70-90% reduction (moved to GPU)
- **JPEG encoding**: 70-80% faster with TurboJPEG
- **Model inference**: GPU accelerated with CUDA
- **Overall system**: 50-80% less CPU usage
### Profiling Results Comparison
**Before (Software Only)**:
- `cv2.imencode`: 6.5% CPU time (1.95s out of 30s)
- `psutil.cpu_percent`: 88% of profiled CPU time (idle polling)
- Video decoding: 100% on the CPU

**After (Hardware Accelerated)**:
- Video decoding: GPU (~5-10% CPU overhead)
- JPEG encoding: 3-5x faster with TurboJPEG (see the benchmark sketch below)
- Model inference: GPU accelerated
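
The 3-5x JPEG figure can be spot-checked with a small benchmark. This sketch assumes the PyTurboJPEG binding (imported as `turbojpeg`); it is not the worker's `hardware_encoder` module:
```python
import time

import cv2
import numpy as np
from turbojpeg import TurboJPEG  # PyTurboJPEG binding over libturbojpeg

frame = np.random.randint(0, 255, (1080, 1920, 3), dtype=np.uint8)  # synthetic 1080p BGR frame
jpeg = TurboJPEG()

def avg_ms(encode, runs=100):
    """Average wall-clock time of one encode call, in milliseconds."""
    start = time.perf_counter()
    for _ in range(runs):
        encode()
    return (time.perf_counter() - start) / runs * 1000

cv2_ms = avg_ms(lambda: cv2.imencode(".jpg", frame, [cv2.IMWRITE_JPEG_QUALITY, 85]))
tj_ms = avg_ms(lambda: jpeg.encode(frame, quality=85))  # default pixel format is BGR
print(f"cv2.imencode: {cv2_ms:.2f} ms, TurboJPEG: {tj_ms:.2f} ms, speedup: {cv2_ms / tj_ms:.1f}x")
```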
## Verification
### Check Hardware Acceleration Support
```bash
docker run --rm --gpus all detector-worker:complete-hw-accel \
bash -c "ffmpeg -hwaccels && python3 -c 'import cv2; build=cv2.getBuildInformation(); print(\"CUDA:\", \"CUDA\" in build); print(\"CUVID:\", \"CUVID\" in build)'"
```
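The same checks can be run from Python inside the container (a quick sanity-check snippet, not part of the worker):
```python
import cv2

build = cv2.getBuildInformation()
print("CUDA in build:     ", "CUDA" in build)
print("CUVID in build:    ", "CUVID" in build)
print("GStreamer in build:", "GStreamer" in build)

# Needs the container started with --gpus all; returns 0 if no GPU is visible
print("CUDA devices visible to OpenCV:", cv2.cuda.getCudaEnabledDeviceCount())
```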
### Runtime Logs
The application will automatically log which acceleration method is being used:
```
Camera cam1: Successfully using GStreamer with NVDEC hardware acceleration
Camera cam2: Using FFMPEG hardware acceleration (backend: FFMPEG)
Camera cam3: Using OpenCV CUDA hardware acceleration
```
## Files Modified
### Docker Configuration
- **Dockerfile.base** - Complete hardware acceleration stack
- **build-nvdec.sh** - Single build script for everything
### Application Code
- **core/streaming/readers.py** - Multi-method hardware acceleration
- **core/utils/hardware_encoder.py** - TurboJPEG + NVENC encoding
- **core/utils/ffmpeg_detector.py** - Runtime capability detection (see the sketch after this list)
- **requirements.base.txt** - Added TurboJPEG, removed opencv-python
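
The runtime capability detection boils down to asking the `ffmpeg` binary what it was built with. A simplified sketch (the real implementation in `core/utils/ffmpeg_detector.py` may differ):
```python
import subprocess

def detect_ffmpeg_hwaccels() -> set[str]:
    """Return the hardware acceleration methods the installed ffmpeg reports."""
    out = subprocess.run(
        ["ffmpeg", "-hide_banner", "-hwaccels"],
        capture_output=True, text=True, check=True,
    ).stdout
    # Output is a header line followed by one method name per line (e.g. cuda, vaapi)
    return {line.strip() for line in out.splitlines()[1:] if line.strip()}

if __name__ == "__main__":
    accels = detect_ffmpeg_hwaccels()
    print("NVIDIA CUDA/CUVID available:", "cuda" in accels)
    print("VAAPI available:            ", "vaapi" in accels)
```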
## Architecture
```
Input RTSP Stream
   ↓  (methods tried in order; first available wins)
1. GStreamer NVDEC Pipeline (NVIDIA GPU)
   rtspsrc → nvv4l2decoder → nvvideoconvert → OpenCV
2. OpenCV CUDA Backend (NVIDIA GPU)
   OpenCV with CUDA acceleration
3. FFmpeg CUVID (NVIDIA GPU)
   Custom FFmpeg with h264_cuvid decoder
4. VAAPI (Intel/AMD GPU)
   Hardware acceleration for non-NVIDIA GPUs
5. Software Fallback (CPU)
   Standard OpenCV software decoding
```
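Path 2 above is the one that keeps decoded frames in GPU memory end to end. A minimal sketch using `cv2.cudacodec`, which is only present when OpenCV is built with NVCUVID as in this image (the URL and resize step are illustrative):
```python
import cv2

# Decode directly on the GPU; frames come back as cv2.cuda_GpuMat objects
reader = cv2.cudacodec.createVideoReader("rtsp://camera.example/stream")  # hypothetical URL

while True:
    ok, gpu_frame = reader.nextFrame()
    if not ok:
        break
    # GPU-side preprocessing before inference avoids a CPU round-trip
    resized = cv2.cuda.resize(gpu_frame, (640, 640))
    cpu_copy = resized.download()  # download only if a CPU copy is actually needed
```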
## Benefits
### For Development
- **Single Dockerfile.base** - Everything consolidated
- **Automatic detection** - No manual configuration needed
- **Graceful fallback** - Works without GPU for development
### For Production
- **Maximum performance** - Uses best available acceleration
- **GPU memory efficiency** - Direct GPU-to-GPU pipeline
- **Lower latency** - Hardware decoding + CUDA inference
- **Reduced CPU load** - Frees CPU for other tasks
## Troubleshooting
### Build Issues
- Ensure NVIDIA Docker runtime is installed
- Check CUDA 12.6 compatibility with your GPU
- Build takes 45-90 minutes - be patient
### Runtime Issues
- Verify `nvidia-smi` works in container
- Check logs for acceleration method being used
- Fallback to software decoding is automatic

This setup provides **production-ready hardware acceleration** with automatic detection and graceful fallback for maximum compatibility.