update docs

parent: 56a65a3377
commit: e71316ef3d
1 changed file with 3 additions and 71 deletions: claude.md
## Performance Metrics

### VRAM Usage (at 720p)
| Streams | Total VRAM | Overhead | Per Stream | Marginal Cost |
|---------|-----------|----------|------------|---------------|
**Result:** Perfect linear scaling at ~60 MB per stream
### Capacity Estimates

With 60 MB per stream + 216 MB baseline:
- **16GB GPU**: ~269 cameras (conservative: ~250)
- **24GB GPU**: ~406 cameras (conservative: ~380)
- **48GB GPU**: ~815 cameras (conservative: ~780)
- **For 1000 streams**: ~60GB VRAM required
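The estimates above follow directly from the two measured constants. A quick sketch of the arithmetic (the helper function name is illustrative, not part of the worker's API):

```python
# Capacity estimate from the measured VRAM figures above.
BASELINE_MB = 216    # fixed overhead (shared CUDA context + process)
PER_STREAM_MB = 60   # marginal cost per 720p stream

def max_streams(vram_gb: float) -> int:
    """How many streams fit in a GPU with `vram_gb` of VRAM."""
    return int((vram_gb * 1024 - BASELINE_MB) // PER_STREAM_MB)

print(max_streams(16))  # -> 269
print(max_streams(24))  # -> 406
print(max_streams(48))  # -> 815
```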
### Throughput
- **Frame Rate**: 7-7.5 FPS per stream @ 720p
- **JPEG Encoding**: 1-2 ms per frame
- **Connection Time**: ~15 s for stream stabilization
## Project Structure
```
python-rtsp-worker/
├── app.py            # FastAPI application
├── services/
├── requirements.txt  # Python dependencies
├── .env              # Camera URLs (gitignored)
├── .env.example      # Template for camera URLs
└── .gitignore
```
## Dependencies
```
fastapi                  # Web framework
uvicorn[standard]        # ASGI server
torch                    # GPU tensor operations
PyNvVideoCodec           # NVDEC hardware decoding
av                       # FFmpeg/RTSP client
cuda-python              # CUDA driver bindings
nvidia-nvimgcodec-cu12   # nvJPEG encoding
python-dotenv            # Environment variables
```
## Configuration
### Environment Variables (.env)
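As a sketch of how the worker might pick up camera URLs, the snippet below hand-rolls the `KEY=VALUE` parsing that python-dotenv's `load_dotenv()` performs. The variable name `CAMERA_URLS` and the comma-separated format are assumptions; check `.env.example` for the real keys.

```python
import os

def load_env(path: str = ".env") -> None:
    """Minimal stand-in for python-dotenv's load_dotenv(): read KEY=VALUE
    lines into os.environ, skipping blanks and comments."""
    try:
        with open(path) as f:
            lines = f.readlines()
    except FileNotFoundError:
        return
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        os.environ.setdefault(key.strip(), value.strip())

load_env()
# CAMERA_URLS is a hypothetical key; use whatever .env.example defines.
camera_urls = [u for u in os.getenv("CAMERA_URLS", "").split(",") if u]
```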
**Cause**: Likely CUDA context cleanup order issues

**Workaround**: Functionality works correctly; cleanup errors can be ignored
## Technical Decisions
### Why PyNvVideoCodec?

- Direct access to NVDEC hardware decoder
- Minimal overhead compared to FFmpeg/torchaudio
- Returns GPU tensors via DLPack
- Better control over decode sessions
### Why Shared CUDA Context?

- Reduces VRAM from ~200 MB to ~60 MB per stream (70% savings)
- Enables 1000-stream target on 60GB GPU
- Minimal complexity overhead with singleton pattern
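A minimal sketch of the singleton pattern in question (class and method names are illustrative; real code would create and retain the CUDA context inside `get()`):

```python
class CudaContextManager:
    """Illustrative singleton: every stream worker receives the same
    instance, so the ~200 MB per-context cost is paid once per process
    instead of once per stream."""
    _instance = None

    @classmethod
    def get(cls) -> "CudaContextManager":
        if cls._instance is None:
            cls._instance = cls()  # real code: create the CUDA context here
        return cls._instance

# All callers share one context object.
a = CudaContextManager.get()
b = CudaContextManager.get()
assert a is b
```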
### Why nvImageCodec?

- GPU-native JPEG encoding (nvJPEG)
- Zero-copy with PyTorch via `__cuda_array_interface__`
- 1-2 ms encoding time per 720p frame
- Keeps data on GPU until final compression
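The zero-copy handoff works because both libraries speak the `__cuda_array_interface__` protocol. The stub below shows the dictionary shape a CUDA `torch.Tensor` exposes; the class and pointer value are dummies for illustration only:

```python
class FakeGpuFrame:
    """Stand-in for a CUDA torch.Tensor: any object exposing this dict
    can be read by nvImageCodec without copying pixels off the GPU."""
    def __init__(self, device_ptr: int, shape: tuple):
        self.__cuda_array_interface__ = {
            "shape": shape,               # (H, W, C) for an RGB frame
            "typestr": "|u1",             # uint8 pixels
            "data": (device_ptr, False),  # (device pointer, read-only flag)
            "version": 3,
        }

frame = FakeGpuFrame(device_ptr=0x7F00DEAD, shape=(720, 1280, 3))
print(frame.__cuda_array_interface__["shape"])  # -> (720, 1280, 3)
```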
### Why Thread-Safe Ring Buffer?

- Decouples decoding from inference pipeline
- Prevents frame drops during processing spikes
- Allows async frame access
- Configurable buffer size per stream
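A minimal sketch of such a buffer (names are illustrative, not the worker's actual API): a bounded deque under a lock, where a full buffer silently drops the oldest frame rather than blocking the decoder.

```python
import threading
from collections import deque

class FrameRingBuffer:
    """Illustrative thread-safe ring buffer: decoder pushes, inference reads."""
    def __init__(self, size: int = 4):
        self._frames = deque(maxlen=size)  # oldest frame dropped when full
        self._lock = threading.Lock()

    def push(self, frame) -> None:
        with self._lock:
            self._frames.append(frame)

    def latest(self):
        with self._lock:
            return self._frames[-1] if self._frames else None

buf = FrameRingBuffer(size=2)
for i in range(5):       # decoder outruns the consumer...
    buf.push(i)
print(buf.latest())      # -> 4  (...only the newest frames remain)
```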
## Future Considerations
### Hardware Decode Session Limits

- NVIDIA GPUs typically support 5-30 concurrent decode sessions
- May need multiple GPUs for 1000 streams
- Test with actual hardware to verify limits
### Scaling Beyond 1000 Streams

- Multi-GPU support with one CUDA context per GPU
- Load balancing across GPUs
- Network bandwidth considerations
### TensorRT Integration

- Next step: integrate with TensorRT inference pipeline
- GPU frames → TensorRT → Results
- Keep entire pipeline on GPU
## References

- [PyNvVideoCodec Documentation](https://developer.nvidia.com/pynvvideocodec)
|
|
|||
Loading…
Add table
Add a link
Reference in a new issue