127 lines
No EOL
4 KiB
Markdown
127 lines
No EOL
4 KiB
Markdown
# Hardware Acceleration Setup
|
|
|
|
This detector worker now includes **complete NVIDIA hardware acceleration** with FFmpeg and OpenCV built from source.
|
|
|
|
## What's Included
|
|
|
|
### 🔧 Complete Hardware Stack
|
|
- **FFmpeg 6.0** built from source with NVIDIA Video Codec SDK
|
|
- **OpenCV 4.8.1** built with CUDA and custom FFmpeg integration
|
|
- **GStreamer** with NVDEC/VAAPI plugins
|
|
- **TurboJPEG** for optimized JPEG encoding (3-5x faster)
|
|
- **CUDA** support for YOLO model inference
|
|
|
|
### 🎯 Hardware Acceleration Methods (Automatic Detection)
|
|
1. **GStreamer NVDEC** - Best for RTSP streaming, lowest latency
|
|
2. **OpenCV CUDA** - Direct GPU memory access, best integration
|
|
3. **FFmpeg CUVID** - Custom build with full NVIDIA acceleration
|
|
4. **VAAPI** - Intel/AMD GPU support
|
|
5. **Software Fallback** - CPU-only as last resort
|
|
|
|
## Build and Run
|
|
|
|
### Single Build Script
|
|
```bash
|
|
./build-nvdec.sh
|
|
```
|
|
**Build time**: 45-90 minutes (compiles FFmpeg + OpenCV from source)
|
|
|
|
### Run with GPU Support
|
|
```bash
|
|
docker run --gpus all -p 8000:8000 detector-worker:complete-hw-accel
|
|
```
|
|
|
|
## Performance Improvements
|
|
|
|
### Expected CPU Reduction
|
|
- **Video decoding**: 70-90% reduction (moved to GPU)
|
|
- **JPEG encoding**: 70-80% faster with TurboJPEG
|
|
- **Model inference**: GPU accelerated with CUDA
|
|
- **Overall system**: 50-80% less CPU usage
|
|
|
|
### Profiling Results Comparison
|
|
**Before (Software Only)**:
|
|
- `cv2.imencode`: 6.5% CPU time (1.95s out of 30s)
|
|
- `psutil.cpu_percent`: 88% CPU time (idle polling)
|
|
- Video decoding: 100% CPU
|
|
|
|
**After (Hardware Accelerated)**:
|
|
- Video decoding: GPU (~5-10% CPU overhead)
|
|
- JPEG encoding: 3-5x faster with TurboJPEG
|
|
- Model inference: GPU accelerated
|
|
|
|
## Verification
|
|
|
|
### Check Hardware Acceleration Support
|
|
```bash
|
|
docker run --rm --gpus all detector-worker:complete-hw-accel \
|
|
bash -c "ffmpeg -hwaccels && python3 -c 'import cv2; build=cv2.getBuildInformation(); print(\"CUDA:\", \"CUDA\" in build); print(\"CUVID:\", \"CUVID\" in build)'"
|
|
```
|
|
|
|
### Runtime Logs
|
|
The application will automatically log which acceleration method is being used:
|
|
```
|
|
Camera cam1: Successfully using GStreamer with NVDEC hardware acceleration
|
|
Camera cam2: Using FFMPEG hardware acceleration (backend: FFMPEG)
|
|
Camera cam3: Using OpenCV CUDA hardware acceleration
|
|
```
|
|
|
|
## Files Modified
|
|
|
|
### Docker Configuration
|
|
- **Dockerfile.base** - Complete hardware acceleration stack
|
|
- **build-nvdec.sh** - Single build script for everything
|
|
|
|
### Application Code
|
|
- **core/streaming/readers.py** - Multi-method hardware acceleration
|
|
- **core/utils/hardware_encoder.py** - TurboJPEG + NVENC encoding
|
|
- **core/utils/ffmpeg_detector.py** - Runtime capability detection
|
|
- **requirements.base.txt** - Added TurboJPEG, removed opencv-python
|
|
|
|
## Architecture
|
|
|
|
```
|
|
Input RTSP Stream
|
|
↓
|
|
1. GStreamer NVDEC Pipeline (NVIDIA GPU)
|
|
rtspsrc → nvv4l2decoder → nvvideoconvert → OpenCV
|
|
↓
|
|
2. OpenCV CUDA Backend (NVIDIA GPU)
|
|
OpenCV with CUDA acceleration
|
|
↓
|
|
3. FFmpeg CUVID (NVIDIA GPU)
|
|
Custom FFmpeg with h264_cuvid decoder
|
|
↓
|
|
4. VAAPI (Intel/AMD GPU)
|
|
Hardware acceleration for non-NVIDIA
|
|
↓
|
|
5. Software Fallback (CPU)
|
|
Standard OpenCV software decoding
|
|
```
|
|
|
|
## Benefits
|
|
|
|
### For Development
|
|
- **Single Dockerfile.base** - Everything consolidated
|
|
- **Automatic detection** - No manual configuration needed
|
|
- **Graceful fallback** - Works without GPU for development
|
|
|
|
### For Production
|
|
- **Maximum performance** - Uses best available acceleration
|
|
- **GPU memory efficiency** - Direct GPU-to-GPU pipeline
|
|
- **Lower latency** - Hardware decoding + CUDA inference
|
|
- **Reduced CPU load** - Frees CPU for other tasks
|
|
|
|
## Troubleshooting
|
|
|
|
### Build Issues
|
|
- Ensure NVIDIA Docker runtime is installed
|
|
- Check CUDA 12.6 compatibility with your GPU
|
|
- Build takes 45-90 minutes - be patient
|
|
|
|
### Runtime Issues
|
|
- Verify `nvidia-smi` works in container
|
|
- Check logs for acceleration method being used
|
|
- Fallback to software decoding is automatic
|
|
|
|
This setup provides **production-ready hardware acceleration** with automatic detection and graceful fallback for maximum compatibility. |