update docs

parent: 56a65a3377
commit: e71316ef3d
1 changed file with 3 additions and 71 deletions: claude.md
## Performance Metrics

### VRAM Usage (at 720p)
| Streams | Total VRAM | Overhead | Per Stream | Marginal Cost |
|---------|-----------|----------|------------|---------------|
**Result:** Perfect linear scaling at ~60 MB per stream
### Capacity Estimates

With 60 MB per stream + 216 MB baseline:
- **16GB GPU**: ~269 cameras (conservative: ~250)
- **24GB GPU**: ~406 cameras (conservative: ~380)
- **48GB GPU**: ~815 cameras (conservative: ~780)
- **For 1000 streams**: ~60GB VRAM required
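The estimates above follow directly from the two measured constants. A quick sketch of the arithmetic (the helper function name is illustrative, not part of the worker's API):

```python
# Capacity estimate from the measured VRAM figures above.
BASELINE_MB = 216    # fixed overhead (shared CUDA context + process)
PER_STREAM_MB = 60   # marginal cost per 720p stream

def max_streams(vram_gb: float) -> int:
    """How many streams fit in a GPU with `vram_gb` of VRAM."""
    return int((vram_gb * 1024 - BASELINE_MB) // PER_STREAM_MB)

print(max_streams(16))  # -> 269
print(max_streams(24))  # -> 406
print(max_streams(48))  # -> 815
```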
### Throughput
- **Frame Rate**: 7-7.5 FPS per stream @ 720p
- **JPEG Encoding**: 1-2 ms per frame
- **Connection Time**: ~15 s for stream stabilization
## Project Structure
```
python-rtsp-worker/
├── app.py            # FastAPI application
├── services/
├── requirements.txt  # Python dependencies
├── .env              # Camera URLs (gitignored)
├── .env.example      # Template for camera URLs
└── .gitignore
```
## Dependencies
```
fastapi                  # Web framework
uvicorn[standard]        # ASGI server
torch                    # GPU tensor operations
PyNvVideoCodec           # NVDEC hardware decoding
av                       # FFmpeg/RTSP client
cuda-python              # CUDA driver bindings
nvidia-nvimgcodec-cu12   # nvJPEG encoding
python-dotenv            # Environment variables
```
## Configuration
### Environment Variables (.env)
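As a sketch of how the worker might pick up camera URLs, the snippet below hand-rolls the `KEY=VALUE` parsing that python-dotenv's `load_dotenv()` performs. The variable name `CAMERA_URLS` and the comma-separated format are assumptions; check `.env.example` for the real keys.

```python
import os

def load_env(path: str = ".env") -> None:
    """Minimal stand-in for python-dotenv's load_dotenv(): read KEY=VALUE
    lines into os.environ, skipping blanks and comments."""
    try:
        with open(path) as f:
            lines = f.readlines()
    except FileNotFoundError:
        return
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        os.environ.setdefault(key.strip(), value.strip())

load_env()
# CAMERA_URLS is a hypothetical key; use whatever .env.example defines.
camera_urls = [u for u in os.getenv("CAMERA_URLS", "").split(",") if u]
```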
**Cause**: Likely CUDA context cleanup order issues

**Workaround**: Functionality works correctly; cleanup errors can be ignored
## Technical Decisions
### Why PyNvVideoCodec?

- Direct access to NVDEC hardware decoder
- Minimal overhead compared to FFmpeg/torchaudio
- Returns GPU tensors via DLPack
- Better control over decode sessions
### Why Shared CUDA Context?

- Reduces VRAM from ~200 MB to ~60 MB per stream (70% savings)
- Enables 1000-stream target on 60GB GPU
- Minimal complexity overhead with singleton pattern
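A minimal sketch of the singleton pattern in question (class and method names are illustrative; real code would create and retain the CUDA context inside `get()`):

```python
class CudaContextManager:
    """Illustrative singleton: every stream worker receives the same
    instance, so the ~200 MB per-context cost is paid once per process
    instead of once per stream."""
    _instance = None

    @classmethod
    def get(cls) -> "CudaContextManager":
        if cls._instance is None:
            cls._instance = cls()  # real code: create the CUDA context here
        return cls._instance

# All callers share one context object.
a = CudaContextManager.get()
b = CudaContextManager.get()
assert a is b
```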
### Why nvImageCodec?

- GPU-native JPEG encoding (nvJPEG)
- Zero-copy with PyTorch via `__cuda_array_interface__`
- 1-2 ms encoding time per 720p frame
- Keeps data on GPU until final compression
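The zero-copy handoff works because both libraries speak the `__cuda_array_interface__` protocol. The stub below shows the dictionary shape a CUDA `torch.Tensor` exposes; the class and pointer value are dummies for illustration only:

```python
class FakeGpuFrame:
    """Stand-in for a CUDA torch.Tensor: any object exposing this dict
    can be read by nvImageCodec without copying pixels off the GPU."""
    def __init__(self, device_ptr: int, shape: tuple):
        self.__cuda_array_interface__ = {
            "shape": shape,               # (H, W, C) for an RGB frame
            "typestr": "|u1",             # uint8 pixels
            "data": (device_ptr, False),  # (device pointer, read-only flag)
            "version": 3,
        }

frame = FakeGpuFrame(device_ptr=0x7F00DEAD, shape=(720, 1280, 3))
print(frame.__cuda_array_interface__["shape"])  # -> (720, 1280, 3)
```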
### Why Thread-Safe Ring Buffer?

- Decouples decoding from inference pipeline
- Prevents frame drops during processing spikes
- Allows async frame access
- Configurable buffer size per stream
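A minimal sketch of such a buffer (names are illustrative, not the worker's actual API): a bounded deque under a lock, where a full buffer silently drops the oldest frame rather than blocking the decoder.

```python
import threading
from collections import deque

class FrameRingBuffer:
    """Illustrative thread-safe ring buffer: decoder pushes, inference reads."""
    def __init__(self, size: int = 4):
        self._frames = deque(maxlen=size)  # oldest frame dropped when full
        self._lock = threading.Lock()

    def push(self, frame) -> None:
        with self._lock:
            self._frames.append(frame)

    def latest(self):
        with self._lock:
            return self._frames[-1] if self._frames else None

buf = FrameRingBuffer(size=2)
for i in range(5):       # decoder outruns the consumer...
    buf.push(i)
print(buf.latest())      # -> 4  (...only the newest frames remain)
```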
## Future Considerations
### Hardware Decode Session Limits

- NVIDIA GPUs typically support 5-30 concurrent decode sessions
- May need multiple GPUs for 1000 streams
- Test with actual hardware to verify limits
### Scaling Beyond 1000 Streams

- Multi-GPU support with one CUDA context per GPU
- Load balancing across GPUs
- Network bandwidth considerations
### TensorRT Integration

- Next step: integrate with TensorRT inference pipeline
- GPU frames → TensorRT → Results
- Keep entire pipeline on GPU
## References

- [PyNvVideoCodec Documentation](https://developer.nvidia.com/pynvvideocodec)
|
|
|||
Loading…
Add table
Add a link
Reference in a new issue