update docs

This commit is contained in:
Siwat Sirichai 2025-11-09 11:53:03 +07:00
parent 56a65a3377
commit e71316ef3d


## Performance Metrics
### VRAM Usage (at 720p)
| Streams | Total VRAM | Overhead | Per Stream | Marginal Cost |
|---------|-----------|----------|------------|---------------|
**Result:** VRAM scales linearly at ~60 MB per additional stream
### Capacity Estimates
With 60 MB per stream + 216 MB baseline:
- **16GB GPU**: ~269 cameras (conservative: ~250)
- **24GB GPU**: ~406 cameras (conservative: ~380)
- **48GB GPU**: ~815 cameras (conservative: ~780)
- **For 1000 streams**: ~60GB VRAM required
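The arithmetic behind these estimates can be sketched as follows (integer floor division, so results may differ by one from rounded figures):

```python
# Back-of-the-envelope VRAM capacity estimate:
# total = baseline + per_stream * streams  =>  streams = (total - baseline) // per_stream
BASELINE_MB = 216   # shared CUDA context + process overhead
PER_STREAM_MB = 60  # marginal VRAM cost per decoded 720p stream

def max_streams(gpu_gb: int) -> int:
    """Streams that fit on a GPU with gpu_gb GiB of VRAM."""
    return (gpu_gb * 1024 - BASELINE_MB) // PER_STREAM_MB

for gb in (16, 24, 48):
    print(f"{gb}GB GPU: ~{max_streams(gb)} streams")

# VRAM needed for the 1000-stream target
total_mb = BASELINE_MB + 1000 * PER_STREAM_MB
print(f"1000 streams: ~{total_mb} MB (~60 GB)")
```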
### Throughput
- **Frame Rate**: 7-7.5 FPS per stream @ 720p
- **JPEG Encoding**: 1-2ms per frame
- **Connection Time**: ~15s for stream stabilization
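A quick sanity check on the encoding budget at full scale (this assumes fully serialized encoding; actual nvJPEG throughput with batching will differ):

```python
# Rough encoder-budget check: can one serial JPEG encoder keep up at 1000 streams?
STREAMS = 1000
FPS_PER_STREAM = 7.5       # observed per-stream rate at 720p
ENCODE_MS = 2.0            # worst-case per-frame JPEG encode time

frames_per_sec = STREAMS * FPS_PER_STREAM     # aggregate frame rate
busy_ms_per_sec = frames_per_sec * ENCODE_MS  # encoder time demanded per wall second
encoders_needed = busy_ms_per_sec / 1000.0    # >1.0 means one serial encoder falls behind

print(frames_per_sec, encoders_needed)
```

In other words, the 1-2 ms encode time is negligible per stream but becomes the dominant budget item at 1000 streams, which is why encoding only on demand (rather than every decoded frame) matters.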
## Project Structure
```
python-rtsp-worker/
├── app.py # FastAPI application
├── services/
├── requirements.txt # Python dependencies
├── .env # Camera URLs (gitignored)
├── .env.example # Template for camera URLs
└── .gitignore
```
## Dependencies
```
fastapi # Web framework
uvicorn[standard] # ASGI server
torch # GPU tensor operations
PyNvVideoCodec # NVDEC hardware decoding
av # FFmpeg/RTSP client
cuda-python # CUDA driver bindings
nvidia-nvimgcodec-cu12 # nvJPEG encoding
python-dotenv # Environment variables
```
## Configuration
### Environment Variables (.env)
**Cause**: Likely CUDA context cleanup order issues
**Workaround**: Functionality works correctly; cleanup errors can be ignored
## Technical Decisions
### Why PyNvVideoCodec?
- Direct access to NVDEC hardware decoder
- Minimal overhead compared to FFmpeg/torchaudio
- Returns GPU tensors via DLPack
- Better control over decode sessions
### Why Shared CUDA Context?
- Reduces VRAM from ~200MB to ~60MB per stream (70% savings)
- Enables 1000-stream target on 60GB GPU
- Minimal complexity overhead with singleton pattern
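The singleton pattern mentioned above can be sketched as below. The class and attribute names are hypothetical, and the real worker would create the context via cuda-python where the stand-in object is; this only illustrates the sharing mechanism:

```python
import threading

class CudaContext:
    """Hypothetical process-wide CUDA context holder (singleton pattern)."""
    _instance = None
    _lock = threading.Lock()

    def __new__(cls):
        if cls._instance is None:               # fast path, no lock taken
            with cls._lock:
                if cls._instance is None:       # double-checked locking
                    inst = super().__new__(cls)
                    inst.handle = object()      # stand-in for the real CUcontext
                    cls._instance = inst
        return cls._instance

# Every stream decoder grabs the same context instead of creating its own:
a, b = CudaContext(), CudaContext()
print(a is b)
```

Because every decoder shares one context, the ~140 MB of per-context overhead is paid once per process instead of once per stream.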
### Why nvImageCodec?
- GPU-native JPEG encoding (nvJPEG)
- Zero-copy with PyTorch via `__cuda_array_interface__`
- 1-2ms encoding time per 720p frame
- Keeps data on GPU until final compression
### Why Thread-Safe Ring Buffer?
- Decouples decoding from inference pipeline
- Prevents frame drops during processing spikes
- Allows async frame access
- Configurable buffer size per stream
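A minimal sketch of such a buffer, using a lock plus a bounded deque (illustrative only, not the project's actual implementation):

```python
import threading
from collections import deque

class FrameRingBuffer:
    """Thread-safe ring buffer sketch. Oldest frames are dropped when the
    buffer is full, so a slow consumer never blocks the decoder thread."""

    def __init__(self, capacity: int = 30):
        self._frames = deque(maxlen=capacity)   # deque drops oldest on overflow
        self._lock = threading.Lock()

    def put(self, frame) -> None:
        """Called by the decode thread for every frame."""
        with self._lock:
            self._frames.append(frame)

    def latest(self):
        """Return the newest frame, or None if empty (non-blocking)."""
        with self._lock:
            return self._frames[-1] if self._frames else None

buf = FrameRingBuffer(capacity=3)
for i in range(5):          # decoder pushes 5 frames into a 3-slot buffer
    buf.put(i)
print(buf.latest())         # newest frame wins; the oldest two were dropped
```

The key design choice is drop-oldest semantics: inference always sees the freshest frame, and a processing spike costs stale frames rather than backpressure on the decoder.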
## Future Considerations
### Hardware Decode Session Limits
- NVIDIA GPUs typically support 5-30 concurrent decode sessions
- May need multiple GPUs for 1000 streams
- Test with actual hardware to verify limits
### Scaling Beyond 1000 Streams
- Multi-GPU support with context per GPU
- Load balancing across GPUs
- Network bandwidth considerations
### TensorRT Integration
- Next step: Integrate with TensorRT inference pipeline
- GPU frames → TensorRT → Results
- Keep entire pipeline on GPU
## References
- [PyNvVideoCodec Documentation](https://developer.nvidia.com/pynvvideocodec)