feat: update rtsp scaling plan
All checks were successful:
- Build Worker Base and Application Images / check-base-changes (push) Successful in 8s
- Build Worker Base and Application Images / build-base (push) Has been skipped
- Build Worker Base and Application Images / build-docker (push) Successful in 2m53s
- Build Worker Base and Application Images / deploy-stack (push) Successful in 8s

Commit e87ed4c056 (parent 9f29755e0f): 1 changed file, 382 additions and 0 deletions. RTSP_SCALING_SOLUTION.md (new file).

# RTSP Stream Scaling Solution Plan

## Problem Statement

The current implementation fails with 8+ concurrent RTSP streams (1280x720@6fps) due to:
- Python GIL bottleneck limiting true parallelism
- OpenCV/FFMPEG resource contention
- Thread starvation causing frame read failures
- Socket buffer exhaustion dropping UDP packets

## Selected Solution: Phased Approach

### Phase 1: Quick Fix - Multiprocessing (8-20 cameras)
**Timeline:** 1-2 days
**Goal:** Immediate fix for the current 8-camera deployment

### Phase 2: Long-term - go2rtc or GStreamer/FFmpeg Proxy (20+ cameras)
**Timeline:** 1-2 weeks
**Goal:** Scalable architecture for future growth

---

## Implementation Checklist

### Phase 1: Multiprocessing Solution

#### Core Architecture Changes
- [ ] Create `RTSPProcessManager` class to manage camera processes
- [ ] Implement shared memory for frame passing (using `multiprocessing.shared_memory`)
- [ ] Create `CameraProcess` worker class for individual camera handling
- [ ] Add process pool executor with configurable worker count
- [ ] Implement process health monitoring and auto-restart

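The manager/worker split above can be sketched as follows. `RTSPProcessManager` and `CameraProcess` are the checklist's names; the stubbed capture loop and the restart policy are illustrative assumptions:

```python
import multiprocessing
import time

class CameraProcess(multiprocessing.Process):
    """Reads one RTSP stream in its own process (capture loop stubbed out)."""
    def __init__(self, camera_id, rtsp_url, stop_event):
        super().__init__(daemon=True, name=f"camera-{camera_id}")
        self.camera_id = camera_id
        self.rtsp_url = rtsp_url
        self.stop_event = stop_event

    def run(self):
        while not self.stop_event.is_set():
            # Real worker: read a frame via cv2.VideoCapture and
            # publish it into shared memory; here we just idle
            time.sleep(0.1)

class RTSPProcessManager:
    """Owns one CameraProcess per camera and restarts dead ones."""
    def __init__(self):
        self.stop_event = multiprocessing.Event()
        self.processes = {}

    def add_camera(self, camera_id, rtsp_url):
        proc = CameraProcess(camera_id, rtsp_url, self.stop_event)
        self.processes[camera_id] = proc
        proc.start()

    def monitor(self):
        # Health check: restart any process that has died
        for camera_id, proc in list(self.processes.items()):
            if not proc.is_alive():
                self.add_camera(camera_id, proc.rtsp_url)

    def shutdown(self):
        self.stop_event.set()
        for proc in self.processes.values():
            proc.join(timeout=2.0)
```

In the real worker, `run()` would open the stream and push frames into the shared-memory buffers described below.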
#### Frame Pipeline
- [ ] Replace `threading.Thread` with `multiprocessing.Process` for readers
- [ ] Implement zero-copy frame transfer using shared memory buffers
- [ ] Add frame queue with backpressure handling
- [ ] Create frame skipping logic when processing falls behind
- [ ] Add timestamp-based frame dropping (keep only recent frames)

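One way the backpressure item could look: a drop-oldest put, so a stalled consumer sees the newest frame rather than a backlog. This is a sketch; it works with both `queue.Queue` and `multiprocessing.Queue`:

```python
import queue

def put_latest(frame_queue, frame):
    """Enqueue a frame; under backpressure drop the oldest frame, not the newest."""
    try:
        frame_queue.put_nowait(frame)
    except queue.Full:
        try:
            frame_queue.get_nowait()  # discard the stalest frame
        except queue.Empty:
            pass  # consumer drained the queue in the meantime
        try:
            frame_queue.put_nowait(frame)
        except queue.Full:
            pass  # another producer refilled it; drop the new frame
```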
#### Thread Safety & Synchronization (CRITICAL)
- [ ] Implement `multiprocessing.Lock()` for all shared memory write operations
- [ ] Use `multiprocessing.Queue()` instead of shared lists (process-safe by design)
- [ ] Replace counters with `multiprocessing.Value()` for atomic operations
- [ ] Implement lock-free ring buffer using `multiprocessing.Array()` for frames
- [ ] Use `multiprocessing.Manager()` for complex shared objects (dicts, lists)
- [ ] Add memory barriers for CPU cache coherency
- [ ] Create read-write locks for frame buffers (multiple readers, single writer)
- [ ] Implement semaphores for limiting concurrent RTSP connections
- [ ] Add process-safe logging with `QueueHandler` and `QueueListener`
- [ ] Use `multiprocessing.Condition()` for frame-ready notifications
- [ ] Implement deadlock detection and recovery mechanism
- [ ] Add timeout on all lock acquisitions to prevent hanging
- [ ] Create lock hierarchy documentation to prevent deadlocks
- [ ] Implement lock-free data structures where possible (SPSC queues)
- [ ] Add memory fencing for shared memory access patterns

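The process-safe logging item can be sketched with the stdlib `QueueHandler`/`QueueListener` pair; the function names here are placeholders:

```python
import logging
import logging.handlers

def setup_worker_logging(log_queue):
    """Called in each worker process: route all records to the shared queue."""
    handler = logging.handlers.QueueHandler(log_queue)
    root = logging.getLogger()
    root.handlers[:] = [handler]
    root.setLevel(logging.INFO)

def start_listener(log_queue):
    """Called once in the parent: drain the queue to a real handler."""
    console = logging.StreamHandler()
    console.setFormatter(logging.Formatter("%(processName)s %(levelname)s %(message)s"))
    listener = logging.handlers.QueueListener(log_queue, console)
    listener.start()
    return listener  # call .stop() on shutdown
```

The queue itself would be a `multiprocessing.Queue` shared with every worker at spawn time.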
#### Resource Management
- [ ] Set process CPU affinity for better cache utilization
- [ ] Implement memory pool for frame buffers (prevent allocation overhead)
- [ ] Add configurable process limits based on CPU cores
- [ ] Create graceful shutdown mechanism for all processes
- [ ] Add resource monitoring (CPU, memory per process)

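The CPU-affinity item might look like this on Linux. Note `os.sched_setaffinity` is Linux-specific, and the round-robin core selection is an assumption:

```python
import os

def pin_to_core(core_index):
    """Pin the calling process to one of its allowed CPU cores (Linux-only sketch)."""
    allowed = sorted(os.sched_getaffinity(0))
    os.sched_setaffinity(0, {allowed[core_index % len(allowed)]})

# e.g. inside each worker: pin_to_core(worker_index)
```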
#### Configuration Updates
- [ ] Add `max_processes` config parameter (default: CPU cores - 2)
- [ ] Add `frames_per_second_limit` for frame skipping
- [ ] Add `frame_queue_size` parameter
- [ ] Add `process_restart_threshold` for failure recovery
- [ ] Update Docker container to handle multiprocessing

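The new parameters could be grouped in a single config object; the field names mirror the checklist, and the default values are illustrative assumptions:

```python
import os
from dataclasses import dataclass

@dataclass
class StreamConfig:
    """Config knobs from the checklist; defaults are illustrative."""
    max_processes: int = max(1, (os.cpu_count() or 2) - 2)  # CPU cores - 2, floor 1
    frames_per_second_limit: float = 6.0
    frame_queue_size: int = 100
    process_restart_threshold: int = 3
```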
#### Error Handling
- [ ] Implement process crash detection and recovery
- [ ] Add exponential backoff for process restarts
- [ ] Create dead process cleanup mechanism
- [ ] Add logging aggregation from multiple processes
- [ ] Implement shared error counter with thresholds

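The exponential-backoff item reduces to a small helper; the 1 s base and 60 s cap are assumed defaults:

```python
def restart_delay(attempt, base=1.0, cap=60.0):
    """Exponential backoff for process restarts: base * 2^attempt seconds, capped."""
    return min(cap, base * (2 ** attempt))
```

The manager would sleep `restart_delay(n)` before the n-th restart of a crashing camera process.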
#### Testing
- [ ] Test with 8 cameras simultaneously
- [ ] Verify frame rate stability under load
- [ ] Test process crash recovery
- [ ] Measure CPU and memory usage
- [ ] Load test with 15-20 cameras

---

### Phase 2: go2rtc or GStreamer/FFmpeg Proxy Solution

#### Option A: go2rtc Integration (Recommended)
- [ ] Deploy go2rtc as separate service container
- [ ] Configure go2rtc `streams.yaml` for all cameras
- [ ] Implement Python client to consume go2rtc WebRTC/HLS streams
- [ ] Add automatic camera discovery and registration
- [ ] Create health monitoring for go2rtc service

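A minimal `streams.yaml` sketch for Option A. The camera names and URLs are placeholders, and the exact schema should be checked against the go2rtc documentation for the deployed version:

```yaml
# go2rtc stream config sketch; names and URLs are placeholders
streams:
  cam1: rtsp://user:pass@192.168.1.101:554/stream1
  cam2: rtsp://user:pass@192.168.1.102:554/stream1
```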
#### Option B: Custom Proxy Service
- [ ] Create standalone RTSP proxy service
- [ ] Implement GStreamer pipeline for multiple RTSP inputs
- [ ] Add hardware acceleration detection (NVDEC, VAAPI)
- [ ] Create shared memory or socket output for frames
- [ ] Implement dynamic stream addition/removal API

#### Integration Layer
- [ ] Create Python client for proxy service
- [ ] Implement frame receiver from proxy
- [ ] Add stream control commands (start/stop/restart)
- [ ] Create fallback to multiprocessing if proxy fails
- [ ] Add proxy health monitoring

#### Performance Optimization
- [ ] Implement hardware decoder auto-detection
- [ ] Add adaptive bitrate handling
- [ ] Create intelligent frame dropping at source
- [ ] Add network buffer tuning
- [ ] Implement zero-copy frame pipeline

#### Deployment
- [ ] Create Docker container for proxy service
- [ ] Add Kubernetes deployment configs
- [ ] Create service mesh for multi-instance scaling
- [ ] Add load balancer for camera distribution
- [ ] Implement monitoring and alerting

---

## Quick Wins (Implement Immediately)

### Network Optimizations
- [ ] Increase system socket buffer sizes:
  ```bash
  sysctl -w net.core.rmem_default=2097152
  sysctl -w net.core.rmem_max=8388608
  ```
- [ ] Increase file descriptor limits:
  ```bash
  ulimit -n 65535
  ```
- [ ] Add to Docker compose:
  ```yaml
  ulimits:
    nofile:
      soft: 65535
      hard: 65535
  ```

### Code Optimizations
- [ ] Fix RTSP TCP transport bug in `readers.py`
- [ ] Increase error threshold to 30 (already done)
- [ ] Add frame timestamp checking to skip old frames
- [ ] Implement connection pooling for RTSP streams
- [ ] Add configurable frame skip interval

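The timestamp check could be as small as the following; the 0.5 s threshold is an assumed default:

```python
import time

def is_fresh(frame_timestamp, max_age_s=0.5):
    """True if the frame is recent enough to be worth processing."""
    return (time.time() - frame_timestamp) <= max_age_s
```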
### Monitoring
- [ ] Add metrics for frames processed/dropped per camera
- [ ] Log queue sizes and processing delays
- [ ] Track FFMPEG/OpenCV resource usage
- [ ] Create dashboard for stream health monitoring

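The per-camera processed/dropped metric might start as a pair of process-safe counters; the class name is a placeholder:

```python
import multiprocessing

class StreamStats:
    """Process-safe processed/dropped counters for one camera."""
    def __init__(self):
        self.processed = multiprocessing.Value('L', 0)
        self.dropped = multiprocessing.Value('L', 0)

    def mark(self, dropped=False):
        counter = self.dropped if dropped else self.processed
        with counter.get_lock():  # increments are read-modify-write
            counter.value += 1

    def snapshot(self):
        return {"processed": self.processed.value, "dropped": self.dropped.value}
```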
---

## Performance Targets

### Phase 1 (Multiprocessing)
- Support: 15-20 cameras
- Frame rate: Stable 5-6 fps per camera
- CPU usage: < 80% on an 8-core system
- Memory: < 2GB total
- Latency: < 200ms frame-to-detection

### Phase 2 (go2rtc/GStreamer Proxy)
- Support: 50+ cameras (100+ with HW acceleration)
- Frame rate: Full 6 fps per camera
- CPU usage: < 50% on an 8-core system
- Memory: < 1GB for proxy + workers
- Latency: < 100ms frame-to-detection

---

## Risk Mitigation

### Known Risks
1. **Race Conditions** - Multiple processes writing to the same memory location
   - *Mitigation*: Strict locking protocol, atomic operations only
2. **Deadlocks** - Circular lock dependencies between processes
   - *Mitigation*: Lock ordering, timeouts, deadlock detection
3. **Frame Corruption** - Partial writes to shared memory during reads
   - *Mitigation*: Double buffering, memory barriers, atomic swaps
4. **Memory Coherency** - CPU cache inconsistencies between cores
   - *Mitigation*: Memory fencing, volatile markers, cache line padding
5. **Lock Contention** - Too many processes waiting for the same lock
   - *Mitigation*: Fine-grained locks, lock-free structures, sharding
6. **Multiprocessing overhead**
   - *Mitigation*: Monitor shared memory performance
7. **Memory leaks**
   - *Mitigation*: Implement proper cleanup and monitoring
8. **Network bandwidth**
   - *Mitigation*: Add bandwidth monitoring and alerts
9. **Hardware limitations**
   - *Mitigation*: Profile and set realistic limits

### Fallback Strategy
- Keep current threading implementation as fallback
- Implement feature flag to switch between implementations
- Add automatic fallback on repeated failures
- Maintain backwards compatibility with existing API

---

## Success Criteria

### Phase 1 Complete When:
- [x] All 8 cameras run simultaneously without frame read failures
- [ ] System stable for 24+ hours continuous operation
- [ ] CPU usage remains below 80%
- [ ] No memory leaks detected
- [ ] Frame processing latency < 200ms

### Phase 2 Complete When:
- [ ] Successfully handling 20+ cameras
- [ ] Hardware acceleration working (if available)
- [ ] Proxy service stable and monitored
- [ ] Automatic scaling implemented
- [ ] Full production deployment complete

---

## Thread Safety Implementation Details

### Critical Sections Requiring Synchronization

#### 1. Frame Buffer Access
```python
# UNSAFE - Race condition
shared_frames[camera_id] = new_frame  # Multiple writers

# SAFE - With proper locking
with frame_locks[camera_id]:
    # Double buffer swap to avoid corruption
    write_buffer = frame_buffers[camera_id]['write']
    write_buffer[:] = new_frame
    # Atomic swap of buffer pointers
    frame_buffers[camera_id]['write'], frame_buffers[camera_id]['read'] = \
        frame_buffers[camera_id]['read'], frame_buffers[camera_id]['write']
```

#### 2. Statistics/Counters
```python
# UNSAFE
frame_count += 1  # Plain int: not shared between processes, not atomic

# SAFE - multiprocessing.Value with its internal lock
frame_count = multiprocessing.Value('i', 0)
with frame_count.get_lock():
    frame_count.value += 1  # Read-modify-write must hold the lock
```

#### 3. Queue Operations
```python
import multiprocessing
import queue

# SAFE - multiprocessing.Queue is process-safe
frame_queue = multiprocessing.Queue(maxsize=100)
# Put with timeout to avoid blocking forever
try:
    frame_queue.put(frame, timeout=0.1)
except queue.Full:
    # Handle backpressure (skip or drop frames)
    pass
```

#### 4. Shared Memory Layout
```python
import multiprocessing

# Define the per-camera shared memory structure
class FrameBuffer:
    def __init__(self, camera_id, width=1280, height=720):
        self.camera_id = camera_id
        self.lock = multiprocessing.Lock()

        # Double buffering so readers never see a half-written frame
        buffer_size = width * height * 3  # RGB
        self.buffer_a = multiprocessing.Array('B', buffer_size)
        self.buffer_b = multiprocessing.Array('B', buffer_size)

        # Index of the current read buffer (0 or 1), swapped under self.lock
        self.read_buffer_idx = multiprocessing.Value('i', 0)

        # Metadata
        self.timestamp = multiprocessing.Value('d', 0.0)
        self.frame_number = multiprocessing.Value('L', 0)
```

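For the frame payload itself, the checklist's `multiprocessing.shared_memory` can replace `Array`: the writer creates a named segment, readers attach by name, and no bytes are copied through a queue. A minimal create/attach round trip (the helper name is illustrative):

```python
from multiprocessing import shared_memory

def roundtrip(payload):
    """Write bytes into a fresh shared-memory segment, re-attach, read back."""
    shm = shared_memory.SharedMemory(create=True, size=len(payload))
    try:
        shm.buf[:len(payload)] = payload  # writer side fills the buffer in place
        reader = shared_memory.SharedMemory(name=shm.name)  # reader attaches by name
        try:
            return bytes(reader.buf[:len(payload)])
        finally:
            reader.close()
    finally:
        shm.close()
        shm.unlink()  # creator removes the segment on shutdown
```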
### Lock-Free Patterns

#### Single Producer, Single Consumer (SPSC) Queue
```python
import multiprocessing

# Lock-free for exactly one writer process and one reader process
class SPSCQueue:
    def __init__(self, size):
        # lock=False: with a single writer per index the internal lock is unnecessary
        self.buffer = multiprocessing.Array('i', size, lock=False)
        self.head = multiprocessing.Value('L', 0, lock=False)  # Writer position
        self.tail = multiprocessing.Value('L', 0, lock=False)  # Reader position
        self.size = size

    def put(self, item):
        next_head = (self.head.value + 1) % self.size
        if next_head == self.tail.value:
            return False  # Queue full
        self.buffer[self.head.value] = item
        self.head.value = next_head  # Publish only after the item is written
        return True

    def get(self):
        if self.tail.value == self.head.value:
            return None  # Queue empty
        item = self.buffer[self.tail.value]
        self.tail.value = (self.tail.value + 1) % self.size
        return item
```

### Memory Barrier Considerations
```python
import ctypes

# Note: sched_yield() only yields the CPU; it is not a true memory fence.
# In CPython, acquiring any multiprocessing lock already implies a full
# barrier, so explicit fencing is rarely needed at the Python level.
def memory_fence():
    ctypes.CDLL(None).sched_yield()  # Linux/Unix
    # OR synchronize at known points with multiprocessing.Barrier
```

### Deadlock Prevention Strategy

#### Lock Ordering Protocol
```python
# Define strict lock acquisition order
LOCK_ORDER = {
    'frame_buffer': 1,
    'statistics': 2,
    'queue': 3,
    'config': 4,
}

# Always acquire locks in ascending order; locks are passed as
# (name, lock) pairs because multiprocessing locks carry no name attribute
def safe_multi_lock(named_locks):
    for name, lock in sorted(named_locks, key=lambda nl: LOCK_ORDER[nl[0]]):
        if not lock.acquire(timeout=5.0):  # Timeout prevents hanging
            raise TimeoutError(f"Could not acquire '{name}' lock")
```

#### Monitoring & Detection
```python
import logging
import sys
import threading

logger = logging.getLogger(__name__)

# Heuristic deadlock detector (inspects the threads of one process)
def detect_deadlocks():
    for thread in threading.enumerate():
        if thread.is_alive():
            frame = sys._current_frames().get(thread.ident)
            if frame and 'acquire' in str(frame):
                logger.warning(f"Potential deadlock: {thread.name}")
```

---

## Notes

### Current Bottlenecks (Must Address)
- Python GIL preventing parallel frame reading
- FFMPEG internal buffer management
- Thread context switching overhead
- Socket receive buffer too small for 8 streams
- **Thread safety in shared memory access** (CRITICAL)

### Key Insights
- We don't need every frame - intelligent dropping is acceptable
- Hardware acceleration is crucial for 50+ cameras
- Process isolation prevents cascade failures
- Shared memory is faster than queues for large frames

### Dependencies to Add
```txt
# requirements.txt additions
psutil>=5.9.0  # Process monitoring
py-cpuinfo>=9.0.0  # CPU detection
shared-memory-dict>=0.7.2  # Shared memory utils
multiprocess>=0.70.14  # Better multiprocessing with dill
atomicwrites>=1.4.0  # Atomic file operations
portalocker>=2.7.0  # Cross-platform file locking
```

---

**Last Updated:** 2025-09-25
**Priority:** CRITICAL - Production deployment blocked
**Owner:** Engineering Team