update docs

parent 56a65a3377
commit e71316ef3d

1 changed file with 3 additions and 71 deletions

claude.md
@@ -122,7 +122,7 @@ jpeg_data = encoder.encode(nv_image, "jpeg", encode_params)

## Performance Metrics

-### VRAM Usage (Python Process)
+### VRAM Usage (at 720p)

| Streams | Total VRAM | Overhead | Per Stream | Marginal Cost |
|---------|-----------|----------|------------|---------------|
@@ -134,24 +134,10 @@ jpeg_data = encoder.encode(nv_image, "jpeg", encode_params)

**Result:** Perfect linear scaling at ~60 MB per stream

-### Capacity Estimates
-
-With 60 MB per stream + 216 MB baseline:
-
-- **16GB GPU**: ~269 cameras (conservative: ~250)
-- **24GB GPU**: ~407 cameras (conservative: ~380)
-- **48GB GPU**: ~815 cameras (conservative: ~780)
-- **For 1000 streams**: ~60GB VRAM required
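The capacity figures above are straight arithmetic from the measured numbers (~216 MB baseline plus ~60 MB marginal cost per stream); a quick sketch of that calculation:

```python
# Rough capacity check from the measured VRAM numbers above:
# ~216 MB fixed baseline + ~60 MB marginal cost per stream.
BASELINE_MB = 216
PER_STREAM_MB = 60

def max_streams(gpu_gb: int) -> int:
    """Streams that fit in gpu_gb GiB of VRAM."""
    return (gpu_gb * 1024 - BASELINE_MB) // PER_STREAM_MB

def vram_needed_mb(streams: int) -> int:
    """Total VRAM (MB) needed for a given stream count."""
    return BASELINE_MB + PER_STREAM_MB * streams

print(max_streams(16))       # 269 -> the 16GB estimate above
print(max_streams(48))       # 815 -> the 48GB estimate above
print(vram_needed_mb(1000))  # 60216 MB, i.e. ~60GB for 1000 streams
```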
-
-### Throughput
-
-- **Frame Rate**: 7-7.5 FPS per stream @ 720p
-- **JPEG Encoding**: 1-2ms per frame
-- **Connection Time**: ~15s for stream stabilization

## Project Structure

```
python-rtsp-worker/
├── app.py                  # FastAPI application
├── services/
@@ -166,23 +152,11 @@ python-rtsp-worker/
├── requirements.txt        # Python dependencies
├── .env                    # Camera URLs (gitignored)
├── .env.example            # Template for camera URLs
└── .gitignore
```

-## Dependencies
-
-```
-fastapi                 # Web framework
-uvicorn[standard]       # ASGI server
-torch                   # GPU tensor operations
-PyNvVideoCodec          # NVDEC hardware decoding
-av                      # FFmpeg/RTSP client
-cuda-python             # CUDA driver bindings
-nvidia-nvimgcodec-cu12  # nvJPEG encoding
-python-dotenv           # Environment variables
-```

## Configuration

### Environment Variables (.env)
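A minimal `.env` sketch for illustration; the key names below are hypothetical, so check `.env.example` in the repo for the actual variable names:

```
# Hypothetical camera entries -- see .env.example for the real key names
CAMERA_URL_1=rtsp://user:pass@192.168.1.10:554/stream1
CAMERA_URL_2=rtsp://user:pass@192.168.1.11:554/stream1
```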
@@ -319,48 +293,6 @@ python test_jpeg_encode.py

**Cause**: Likely CUDA context cleanup order issues
**Workaround**: Functionality works correctly; cleanup errors can be ignored

-## Technical Decisions
-
-### Why PyNvVideoCodec?
-- Direct access to NVDEC hardware decoder
-- Minimal overhead compared to FFmpeg/torchaudio
-- Returns GPU tensors via DLPack
-- Better control over decode sessions
-
-### Why Shared CUDA Context?
-- Reduces VRAM from ~200MB to ~60MB per stream (70% savings)
-- Enables 1000-stream target on 60GB GPU
-- Minimal complexity overhead with singleton pattern
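The singleton pattern mentioned above can be sketched as follows. `CudaContextManager` and `_create_context` are illustrative names rather than the repo's actual API; the real service would create the context through cuda-python calls where the placeholder sits:

```python
import threading

class CudaContextManager:
    """Process-wide singleton holding one shared CUDA context (sketch only)."""
    _instance = None
    _lock = threading.Lock()

    def __new__(cls):
        # Double-checked locking so concurrent stream workers
        # cannot race to create two contexts.
        if cls._instance is None:
            with cls._lock:
                if cls._instance is None:
                    inst = super().__new__(cls)
                    inst.context = inst._create_context()
                    cls._instance = inst
        return cls._instance

    def _create_context(self):
        # Placeholder: real code would call cuda.cuInit / cuCtxCreate here.
        return object()

# Every decoder grabs the same context instead of creating its own,
# which is where the per-stream VRAM saving comes from.
a = CudaContextManager()
b = CudaContextManager()
assert a is b and a.context is b.context
```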
-
-### Why nvImageCodec?
-- GPU-native JPEG encoding (nvJPEG)
-- Zero-copy with PyTorch via `__cuda_array_interface__`
-- 1-2ms encoding time per 720p frame
-- Keeps data on GPU until final compression
-
-### Why Thread-Safe Ring Buffer?
-- Decouples decoding from inference pipeline
-- Prevents frame drops during processing spikes
-- Allows async frame access
-- Configurable buffer size per stream
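A thread-safe ring buffer along these lines is easy to sketch with a lock around a bounded deque; `FrameRingBuffer` is an illustrative name, not necessarily the class used in `services/`:

```python
import threading
from collections import deque

class FrameRingBuffer:
    """Bounded, thread-safe frame buffer: the oldest frames are dropped
    when the decode thread outruns the inference side."""

    def __init__(self, capacity: int = 8):
        self._frames = deque(maxlen=capacity)  # deque drops oldest on overflow
        self._lock = threading.Lock()

    def push(self, frame) -> None:
        # Called by the decode thread for every decoded frame.
        with self._lock:
            self._frames.append(frame)

    def latest(self):
        # Called by the inference side; returns the newest frame or None.
        with self._lock:
            return self._frames[-1] if self._frames else None

buf = FrameRingBuffer(capacity=4)
for i in range(10):
    buf.push(i)
print(buf.latest())  # 9 -- the buffer keeps only the last 4 frames
```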
-
-## Future Considerations
-
-### Hardware Decode Session Limits
-- NVIDIA GPUs typically support 5-30 concurrent decode sessions
-- May need multiple GPUs for 1000 streams
-- Test with actual hardware to verify limits
-
-### Scaling Beyond 1000 Streams
-- Multi-GPU support with context per GPU
-- Load balancing across GPUs
-- Network bandwidth considerations
-
-### TensorRT Integration
-- Next step: Integrate with TensorRT inference pipeline
-- GPU frames → TensorRT → Results
-- Keep entire pipeline on GPU

## References

- [PyNvVideoCodec Documentation](https://developer.nvidia.com/pynvvideocodec)