# Hardware Acceleration Setup
This detector worker now includes **complete NVIDIA hardware acceleration** with FFmpeg and OpenCV built from source.
## What's Included
### 🔧 Complete Hardware Stack
- **FFmpeg 6.0** built from source with NVIDIA Video Codec SDK
- **OpenCV 4.8.1** built with CUDA and custom FFmpeg integration
- **GStreamer** with NVDEC/VAAPI plugins
- **TurboJPEG** for optimized JPEG encoding (3-5x faster)
- **CUDA** support for YOLO model inference
### 🎯 Hardware Acceleration Methods (Automatic Detection)
1. **GStreamer NVDEC** - Best for RTSP streaming, lowest latency
2. **OpenCV CUDA** - Direct GPU memory access, best integration
3. **FFmpeg CUVID** - Custom build with full NVIDIA acceleration
4. **VAAPI** - Intel/AMD GPU support
5. **Software Fallback** - CPU-only as last resort
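
A rough sketch of what this selection order looks like in code, assuming the OpenCV build from this image (the worker's real logic lives in `core/streaming/readers.py`; the pipeline string and function below are illustrative, not the actual API):
```python
import cv2

def open_stream(rtsp_url: str) -> cv2.VideoCapture:
    """Try hardware-accelerated decoding first, then fall back to software."""
    # 1. GStreamer NVDEC pipeline (element names depend on the installed NVIDIA plugins)
    gst = (
        f"rtspsrc location={rtsp_url} latency=0 ! rtph264depay ! h264parse ! "
        "nvv4l2decoder ! nvvideoconvert ! video/x-raw,format=BGRx ! "
        "videoconvert ! video/x-raw,format=BGR ! appsink drop=1"
    )
    cap = cv2.VideoCapture(gst, cv2.CAP_GSTREAMER)
    if cap.isOpened():
        return cap

    # 2. The OpenCV CUDA path (cv2.cudacodec) returns GPU-resident frames;
    #    see the Architecture section for a separate sketch of that route.

    # 3./4. FFmpeg backend with a hardware-acceleration hint (OpenCV >= 4.5.2);
    #       with the custom FFmpeg build this can pick CUVID, or VAAPI on Intel/AMD.
    cap = cv2.VideoCapture(
        rtsp_url,
        cv2.CAP_FFMPEG,
        [cv2.CAP_PROP_HW_ACCELERATION, cv2.VIDEO_ACCELERATION_ANY],
    )
    if cap.isOpened():
        return cap

    # 5. Software fallback: plain CPU decoding
    return cv2.VideoCapture(rtsp_url)
```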
## Build and Run
### Single Build Script
```bash
./build-nvdec.sh
```
**Build time**: 45-90 minutes (compiles FFmpeg + OpenCV from source)
### Run with GPU Support
```bash
docker run --gpus all -p 8000:8000 detector-worker:complete-hw-accel
```
## Performance Improvements
### Expected CPU Reduction
- **Video decoding**: 70-90% reduction (moved to GPU)
- **JPEG encoding**: 70-80% faster with TurboJPEG
- **Model inference**: GPU accelerated with CUDA
- **Overall system**: 50-80% less CPU usage
### Profiling Results Comparison
**Before (Software Only)**:
- `cv2.imencode`: 6.5% CPU time (1.95s out of 30s)
- `psutil.cpu_percent`: 88% of profiled CPU time (idle polling)
- Video decoding: 100% on the CPU

**After (Hardware Accelerated)**:
- Video decoding: GPU (~5-10% CPU overhead)
- JPEG encoding: 3-5x faster with TurboJPEG (see the benchmark sketch below)
- Model inference: GPU accelerated
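
The 3-5x JPEG figure can be spot-checked with a small benchmark. This sketch assumes the PyTurboJPEG binding (imported as `turbojpeg`); it is not the worker's `hardware_encoder` module:
```python
import time

import cv2
import numpy as np
from turbojpeg import TurboJPEG  # PyTurboJPEG binding over libturbojpeg

frame = np.random.randint(0, 255, (1080, 1920, 3), dtype=np.uint8)  # synthetic 1080p BGR frame
jpeg = TurboJPEG()

def avg_ms(encode, runs=100):
    """Average wall-clock time of one encode call, in milliseconds."""
    start = time.perf_counter()
    for _ in range(runs):
        encode()
    return (time.perf_counter() - start) / runs * 1000

cv2_ms = avg_ms(lambda: cv2.imencode(".jpg", frame, [cv2.IMWRITE_JPEG_QUALITY, 85]))
tj_ms = avg_ms(lambda: jpeg.encode(frame, quality=85))  # default pixel format is BGR
print(f"cv2.imencode: {cv2_ms:.2f} ms, TurboJPEG: {tj_ms:.2f} ms, speedup: {cv2_ms / tj_ms:.1f}x")
```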
## Verification
### Check Hardware Acceleration Support
```bash
docker run --rm --gpus all detector-worker:complete-hw-accel \
bash -c "ffmpeg -hwaccels && python3 -c 'import cv2; build=cv2.getBuildInformation(); print(\"CUDA:\", \"CUDA\" in build); print(\"CUVID:\", \"CUVID\" in build)'"
```
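The same checks can be run from Python inside the container (a quick sanity-check snippet, not part of the worker):
```python
import cv2

build = cv2.getBuildInformation()
print("CUDA in build:     ", "CUDA" in build)
print("CUVID in build:    ", "CUVID" in build)
print("GStreamer in build:", "GStreamer" in build)

# Needs the container started with --gpus all; returns 0 if no GPU is visible
print("CUDA devices visible to OpenCV:", cv2.cuda.getCudaEnabledDeviceCount())
```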
### Runtime Logs
The application will automatically log which acceleration method is being used:
```
Camera cam1: Successfully using GStreamer with NVDEC hardware acceleration
Camera cam2: Using FFMPEG hardware acceleration (backend: FFMPEG)
Camera cam3: Using OpenCV CUDA hardware acceleration
```
## Files Modified
### Docker Configuration
- **Dockerfile.base** - Complete hardware acceleration stack
- **build-nvdec.sh** - Single build script for everything
### Application Code
- **core/streaming/readers.py** - Multi-method hardware acceleration
- **core/utils/hardware_encoder.py** - TurboJPEG + NVENC encoding
- **core/utils/ffmpeg_detector.py** - Runtime capability detection (see the sketch after this list)
- **requirements.base.txt** - Added TurboJPEG, removed opencv-python
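
The runtime capability detection boils down to asking the `ffmpeg` binary what it was built with. A simplified sketch (the real implementation in `core/utils/ffmpeg_detector.py` may differ):
```python
import subprocess

def detect_ffmpeg_hwaccels() -> set[str]:
    """Return the hardware acceleration methods the installed ffmpeg reports."""
    out = subprocess.run(
        ["ffmpeg", "-hide_banner", "-hwaccels"],
        capture_output=True, text=True, check=True,
    ).stdout
    # Output is a header line followed by one method name per line (e.g. cuda, vaapi)
    return {line.strip() for line in out.splitlines()[1:] if line.strip()}

if __name__ == "__main__":
    accels = detect_ffmpeg_hwaccels()
    print("NVIDIA CUDA/CUVID available:", "cuda" in accels)
    print("VAAPI available:            ", "vaapi" in accels)
```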
## Architecture
```
Input RTSP Stream
   ↓  (methods tried in order; first available wins)
1. GStreamer NVDEC Pipeline (NVIDIA GPU)
   rtspsrc → nvv4l2decoder → nvvideoconvert → OpenCV
2. OpenCV CUDA Backend (NVIDIA GPU)
   OpenCV with CUDA acceleration
3. FFmpeg CUVID (NVIDIA GPU)
   Custom FFmpeg with h264_cuvid decoder
4. VAAPI (Intel/AMD GPU)
   Hardware acceleration for non-NVIDIA GPUs
5. Software Fallback (CPU)
   Standard OpenCV software decoding
```
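Path 2 above is the one that keeps decoded frames in GPU memory end to end. A minimal sketch using `cv2.cudacodec`, which is only present when OpenCV is built with NVCUVID as in this image (the URL and resize step are illustrative):
```python
import cv2

# Decode directly on the GPU; frames come back as cv2.cuda_GpuMat objects
reader = cv2.cudacodec.createVideoReader("rtsp://camera.example/stream")  # hypothetical URL

while True:
    ok, gpu_frame = reader.nextFrame()
    if not ok:
        break
    # GPU-side preprocessing before inference avoids a CPU round-trip
    resized = cv2.cuda.resize(gpu_frame, (640, 640))
    cpu_copy = resized.download()  # download only if a CPU copy is actually needed
```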
## Benefits
### For Development
- **Single Dockerfile.base** - Everything consolidated
- **Automatic detection** - No manual configuration needed
- **Graceful fallback** - Works without GPU for development
### For Production
- **Maximum performance** - Uses best available acceleration
- **GPU memory efficiency** - Direct GPU-to-GPU pipeline
- **Lower latency** - Hardware decoding + CUDA inference
- **Reduced CPU load** - Frees CPU for other tasks
## Troubleshooting
### Build Issues
- Ensure NVIDIA Docker runtime is installed
- Check CUDA 12.6 compatibility with your GPU
- Build takes 45-90 minutes - be patient
### Runtime Issues
- Verify `nvidia-smi` works in container
- Check logs for acceleration method being used
- Fallback to software decoding is automatic

This setup provides **production-ready hardware acceleration** with automatic detection and graceful fallback for maximum compatibility.