update docs

parent 56a65a3377
commit e71316ef3d

1 changed file with 3 additions and 71 deletions

claude.md
@@ -122,7 +122,7 @@ jpeg_data = encoder.encode(nv_image, "jpeg", encode_params)

## Performance Metrics

-### VRAM Usage (Python Process)
+### VRAM Usage (at 720p)

| Streams | Total VRAM | Overhead | Per Stream | Marginal Cost |
|---------|-----------|----------|------------|---------------|
@@ -134,24 +134,10 @@ jpeg_data = encoder.encode(nv_image, "jpeg", encode_params)

**Result:** Perfect linear scaling at ~60 MB per stream

-### Capacity Estimates
-
-With 60 MB per stream + 216 MB baseline:
-
-- **16GB GPU**: ~269 cameras (conservative: ~250)
-- **24GB GPU**: ~407 cameras (conservative: ~380)
-- **48GB GPU**: ~815 cameras (conservative: ~780)
-- **For 1000 streams**: ~60GB VRAM required
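The capacity figures above are straight arithmetic from the measured numbers (~216 MB baseline plus ~60 MB marginal cost per stream); a quick sketch of that calculation:

```python
# Rough capacity check from the measured VRAM numbers above:
# ~216 MB fixed baseline + ~60 MB marginal cost per stream.
BASELINE_MB = 216
PER_STREAM_MB = 60

def max_streams(gpu_gb: int) -> int:
    """Streams that fit in gpu_gb GiB of VRAM."""
    return (gpu_gb * 1024 - BASELINE_MB) // PER_STREAM_MB

def vram_needed_mb(streams: int) -> int:
    """Total VRAM (MB) needed for a given stream count."""
    return BASELINE_MB + PER_STREAM_MB * streams

print(max_streams(16))       # 269 -> the 16GB estimate above
print(max_streams(48))       # 815 -> the 48GB estimate above
print(vram_needed_mb(1000))  # 60216 MB, i.e. ~60GB for 1000 streams
```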
-
-### Throughput
-
-- **Frame Rate**: 7-7.5 FPS per stream @ 720p
-- **JPEG Encoding**: 1-2ms per frame
-- **Connection Time**: ~15s for stream stabilization

## Project Structure

```
python-rtsp-worker/
├── app.py                  # FastAPI application
├── services/
@@ -166,23 +152,11 @@ python-rtsp-worker/
├── requirements.txt        # Python dependencies
├── .env                    # Camera URLs (gitignored)
├── .env.example            # Template for camera URLs
└── .gitignore
```

-## Dependencies
-
-```
-fastapi                 # Web framework
-uvicorn[standard]       # ASGI server
-torch                   # GPU tensor operations
-PyNvVideoCodec          # NVDEC hardware decoding
-av                      # FFmpeg/RTSP client
-cuda-python             # CUDA driver bindings
-nvidia-nvimgcodec-cu12  # nvJPEG encoding
-python-dotenv           # Environment variables
-```

## Configuration

### Environment Variables (.env)
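A minimal `.env` sketch for illustration; the key names below are hypothetical, so check `.env.example` in the repo for the actual variable names:

```
# Hypothetical camera entries -- see .env.example for the real key names
CAMERA_URL_1=rtsp://user:pass@192.168.1.10:554/stream1
CAMERA_URL_2=rtsp://user:pass@192.168.1.11:554/stream1
```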
@@ -319,48 +293,6 @@ python test_jpeg_encode.py

**Cause**: Likely CUDA context cleanup order issues
**Workaround**: Functionality works correctly; cleanup errors can be ignored

-## Technical Decisions
-
-### Why PyNvVideoCodec?
-- Direct access to NVDEC hardware decoder
-- Minimal overhead compared to FFmpeg/torchaudio
-- Returns GPU tensors via DLPack
-- Better control over decode sessions
-
-### Why Shared CUDA Context?
-- Reduces VRAM from ~200MB to ~60MB per stream (70% savings)
-- Enables 1000-stream target on 60GB GPU
-- Minimal complexity overhead with singleton pattern
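The singleton pattern mentioned above can be sketched as follows. `CudaContextManager` and `_create_context` are illustrative names rather than the repo's actual API; the real service would create the context through cuda-python calls where the placeholder sits:

```python
import threading

class CudaContextManager:
    """Process-wide singleton holding one shared CUDA context (sketch only)."""
    _instance = None
    _lock = threading.Lock()

    def __new__(cls):
        # Double-checked locking so concurrent stream workers
        # cannot race to create two contexts.
        if cls._instance is None:
            with cls._lock:
                if cls._instance is None:
                    inst = super().__new__(cls)
                    inst.context = inst._create_context()
                    cls._instance = inst
        return cls._instance

    def _create_context(self):
        # Placeholder: real code would call cuda.cuInit / cuCtxCreate here.
        return object()

# Every decoder grabs the same context instead of creating its own,
# which is where the per-stream VRAM saving comes from.
a = CudaContextManager()
b = CudaContextManager()
assert a is b and a.context is b.context
```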
-
-### Why nvImageCodec?
-- GPU-native JPEG encoding (nvJPEG)
-- Zero-copy with PyTorch via `__cuda_array_interface__`
-- 1-2ms encoding time per 720p frame
-- Keeps data on GPU until final compression
-
-### Why Thread-Safe Ring Buffer?
-- Decouples decoding from inference pipeline
-- Prevents frame drops during processing spikes
-- Allows async frame access
-- Configurable buffer size per stream
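A thread-safe ring buffer along these lines is easy to sketch with a lock around a bounded deque; `FrameRingBuffer` is an illustrative name, not necessarily the class used in `services/`:

```python
import threading
from collections import deque

class FrameRingBuffer:
    """Bounded, thread-safe frame buffer: the oldest frames are dropped
    when the decode thread outruns the inference side."""

    def __init__(self, capacity: int = 8):
        self._frames = deque(maxlen=capacity)  # deque drops oldest on overflow
        self._lock = threading.Lock()

    def push(self, frame) -> None:
        # Called by the decode thread for every decoded frame.
        with self._lock:
            self._frames.append(frame)

    def latest(self):
        # Called by the inference side; returns the newest frame or None.
        with self._lock:
            return self._frames[-1] if self._frames else None

buf = FrameRingBuffer(capacity=4)
for i in range(10):
    buf.push(i)
print(buf.latest())  # 9 -- the buffer keeps only the last 4 frames
```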
-
-## Future Considerations
-
-### Hardware Decode Session Limits
-- NVIDIA GPUs typically support 5-30 concurrent decode sessions
-- May need multiple GPUs for 1000 streams
-- Test with actual hardware to verify limits
-
-### Scaling Beyond 1000 Streams
-- Multi-GPU support with context per GPU
-- Load balancing across GPUs
-- Network bandwidth considerations
-
-### TensorRT Integration
-- Next step: Integrate with TensorRT inference pipeline
-- GPU frames → TensorRT → Results
-- Keep entire pipeline on GPU

## References

- [PyNvVideoCodec Documentation](https://developer.nvidia.com/pynvvideocodec)