refactor: halfway to process-per-session
All checks were successful
Build Worker Base and Application Images / check-base-changes (push) Successful in 7s
Build Worker Base and Application Images / build-base (push) Has been skipped
Build Worker Base and Application Images / build-docker (push) Successful in 2m52s
Build Worker Base and Application Images / deploy-stack (push) Successful in 9s

ziesorx 2025-09-25 20:52:26 +07:00
parent 2e5316ca01
commit 34d1982e9e
12 changed files with 2771 additions and 92 deletions

IMPLEMENTATION_PLAN.md Normal file

@@ -0,0 +1,339 @@
# Session-Isolated Multiprocessing Architecture - Implementation Plan
## 🎯 Objective
Eliminate shared state issues causing identical results across different sessions by implementing **Process-Per-Session architecture** with **per-camera logging**.
## 🔍 Root Cause Analysis
### Current Shared State Issues:
1. **Shared Model Cache** (`core/models/inference.py:40`): All sessions share same cached YOLO model instances
2. **Single Pipeline Instance** (`core/detection/pipeline.py`): One pipeline handles all sessions with shared mappings
3. **Global Session Mappings**: `session_to_subscription` and `session_processing_results` dictionaries
4. **Shared Thread Pool**: Single `ThreadPoolExecutor` for all sessions
5. **Global Frame Cache** (`app.py:39`): `latest_frames` shared across endpoints
6. **Single Log File**: All cameras write to `detector_worker.log`
## 🏗️ New Architecture: Process-Per-Session
```
FastAPI Main Process (Port 8001)
├── WebSocket Handler (manages connections)
├── SessionProcessManager (spawns/manages session processes)
├── Main Process Logger → detector_worker_main.log
├──
├── Session Process 1 (Camera/Display 1)
│ ├── Dedicated Model Pipeline
│ ├── Own Model Cache & Memory
│ ├── Session Logger → detector_worker_camera_display-001_cam-001.log
│ └── Redis/DB connections
├──
├── Session Process 2 (Camera/Display 2)
│ ├── Dedicated Model Pipeline
│ ├── Own Model Cache & Memory
│ ├── Session Logger → detector_worker_camera_display-002_cam-001.log
│ └── Redis/DB connections
└──
└── Session Process N...
```
## 📋 Implementation Tasks
### Phase 1: Core Infrastructure ✅ **COMPLETED**
- [x] **Create SessionProcessManager class**
- Manages lifecycle of session processes
- Handles process spawning, monitoring, and cleanup
- Maintains process registry and health checks
- [x] **Implement SessionWorkerProcess**
- Individual process class that handles one session completely
- Loads own models, pipeline, and maintains state
- Communicates via queues with main process
- [x] **Design Inter-Process Communication**
- Command queue: Main → Session (frames, commands, config)
- Result queue: Session → Main (detections, status, errors)
- Use `multiprocessing.Queue` for thread-safe communication
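
A minimal sketch of this wiring, assuming one command/result queue pair per worker. The `spawn_session_worker` helper and the worker entry-point signature are illustrative; the real entry point in this commit is `session_worker_main` in `core/processes/session_worker.py`:

```python
# Illustrative sketch only: spawn one isolated worker with its own IPC queues.
import multiprocessing as mp


def spawn_session_worker(subscription_id: str, worker_main):
    command_queue = mp.Queue(maxsize=100)   # Main -> Session (frames, commands, config)
    result_queue = mp.Queue(maxsize=100)    # Session -> Main (detections, status, errors)
    process = mp.Process(
        target=worker_main,
        args=(subscription_id, command_queue, result_queue),
        name=f"SessionWorker-{subscription_id}",
        daemon=True,
    )
    process.start()
    return process, command_queue, result_queue
```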
**Phase 1 Testing Results:**
- ✅ Server starts successfully on port 8001
- ✅ WebSocket connections established (10.100.1.3:57488)
- ✅ SessionProcessManager initializes (max_sessions=20)
- ✅ Multiple session processes created (9 camera subscriptions)
- ✅ Individual session processes spawn with unique PIDs (e.g., PID: 16380)
- ✅ Session logging shows isolated process names (SessionWorker-session_xxx)
- ✅ IPC communication framework functioning
**What to Look For When Testing:**
- Check logs for "SessionProcessManager initialized"
- Verify individual session processes: "Session process created: session_xxx (PID: xxxx)"
- Monitor process isolation: Each session has unique process name "SessionWorker-session_xxx"
- Confirm WebSocket integration: "Session WebSocket integration started"
### Phase 2: Per-Session Logging ✅ **COMPLETED**
- [x] **Implement PerSessionLogger**
- Each session process creates own log file
- Format: `detector_worker_camera_{subscription_id}.log`
- Include session context in all log messages
- Implement log rotation (daily/size-based)
- [x] **Update Main Process Logging**
- Main process logs to `detector_worker_main.log`
- Log session process lifecycle events
- Track active sessions and resource usage
**Phase 2 Testing Results:**
- ✅ Main process logs to dedicated file: `logs/detector_worker_main.log`
- ✅ Session-specific logger initialization working
- ✅ Each camera spawns with unique session worker name: "SessionWorker-session_{unique_id}_{camera_name}"
- ✅ Per-session logger ready for file creation (will create files when sessions fully initialize)
- ✅ Structured logging with session context in format
- ✅ Log rotation capability implemented (100MB max, 5 backups)
**What to Look For When Testing:**
- Check for main process log: `logs/detector_worker_main.log`
- Monitor per-session process names in logs: "SessionWorker-session_xxx"
- Once sessions initialize fully, look for per-camera log files: `detector_worker_camera_{camera_name}.log`
- Verify session start/end events are logged with timestamps
- Check log rotation when files exceed 100MB
### Phase 3: Model & Pipeline Isolation ✅ **COMPLETED**
- [x] **Remove Shared Model Cache**
- Eliminated `YOLOWrapper._model_cache` class variable
- Each process loads models independently
- Memory isolation prevents cross-session contamination
- [x] **Create Per-Process Pipeline Instances**
- Each session process instantiates own `DetectionPipeline`
- Removed global pipeline singleton pattern
- Session-local `session_to_subscription` mapping
- [x] **Isolate Session State**
- Each process maintains own `session_processing_results`
- Session mappings are process-local
- Complete state isolation per session
**Phase 3 Testing Results:**
- ✅ **Zero Shared Cache**: Models log "(ISOLATED)" and "no shared cache!"
- ✅ **Individual Model Loading**: Each session loads complete model set independently
- `car_frontal_detection_v1.pt` per session
- `car_brand_cls_v1.pt` per session
- `car_bodytype_cls_v1.pt` per session
- ✅ **Pipeline Isolation**: Each session has unique pipeline instance ID
- ✅ **Memory Isolation**: Different sessions cannot share model instances
- ✅ **State Isolation**: Session mappings are process-local (ISOLATED comments added)
**What to Look For When Testing:**
- Check logs for "(ISOLATED)" on model loading
- Verify each session loads models independently: "Loading YOLO model ... (ISOLATED)"
- Monitor unique pipeline instance IDs per session
- Confirm no shared state between sessions
- Look for "Successfully loaded model ... in isolation - no shared cache!"
### Phase 4: Integrated Stream-Session Architecture 🚧 **IN PROGRESS**
**Problem Identified:** The frame processing pipeline produces no detections because two parallel stream systems leave a communication gap between frame capture and the session workers.
**Root Cause:**
- Old RTSP Process Manager capturing frames but not forwarding to session workers
- New Session Workers ready for processing but receiving no frames
- Architecture mismatch preventing detection despite successful initialization
**Solution:** Complete integration of stream reading INTO session worker processes.
- [ ] **Integrate RTSP Stream Reading into Session Workers**
- Move RTSP stream capture from separate processes into each session worker
- Each session worker handles: RTSP connection + frame processing + model inference
- Eliminate communication gap between stream capture and detection
- [ ] **Remove Duplicate Stream Management Systems**
- Delete old RTSP Process Manager (`core/streaming/process_manager.py`)
- Remove conflicting stream management from main process
- Consolidate to single session-worker-only architecture
- [ ] **Enhanced Session Worker with Stream Integration**
- Add RTSP stream reader to `SessionWorkerProcess`
- Implement frame buffer queue management per worker
- Add connection recovery and stream health monitoring per session
- [ ] **Complete End-to-End Isolation per Camera**
```
Session Worker Process N:
├── RTSP Stream Reader (rtsp://cameraN)
├── Frame Buffer Queue
├── YOLO Detection Pipeline
├── Model Cache (isolated)
├── Database/Redis connections
└── Per-camera Logger
```
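
A rough sketch of the integrated per-worker loop described above, assuming OpenCV (`cv2`) for RTSP capture and a per-process inference callable; `run_detection` is a placeholder, not the final pipeline API:

```python
# Sketch: frame capture and inference inside the same session worker process.
import time
import cv2


def stream_loop(rtsp_url: str, run_detection, logger, stop_event):
    cap = cv2.VideoCapture(rtsp_url)
    while not stop_event.is_set():
        ok, frame = cap.read()
        if not ok:
            # Per-camera recovery: drop and reopen the connection.
            logger.warning("Frame read failed, reconnecting to %s", rtsp_url)
            cap.release()
            time.sleep(1.0)
            cap = cv2.VideoCapture(rtsp_url)
            continue
        detections = run_detection(frame)  # isolated, per-process pipeline
        logger.debug("Processed frame with %d detections", len(detections))
    cap.release()
```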
**Benefits for 20+ Cameras:**
- **Python GIL Bypass**: True parallelism with multiprocessing
- **Resource Isolation**: Process crashes don't affect other cameras
- **Memory Distribution**: Each process has own memory space
- **Independent Recovery**: Per-camera reconnection logic
- **Scalable Architecture**: Linear scaling with available CPU cores
### Phase 5: Resource Management & Cleanup
- [ ] **Process Lifecycle Management**
- Automatic process cleanup on WebSocket disconnect
- Graceful shutdown handling
- Resource deallocation on process termination
- [ ] **Memory & GPU Management**
- Monitor per-process memory usage
- GPU memory isolation between sessions
- Prevent memory leaks in long-running processes
- [ ] **Health Monitoring**
- Process health checks and restart capability
- Performance metrics per session process
- Resource usage monitoring and alerting
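
One possible shape for the per-process health probe, assuming `psutil` is available; the field names mirror `HealthResponse` in `core/processes/communication.py`, but the sampling code itself is only an illustration:

```python
# Illustrative health probe for a single session process.
import time
import psutil


def sample_process_health(pid: int, started_at: float, processed_frames: int) -> dict:
    proc = psutil.Process(pid)
    memory_mb = proc.memory_info().rss / (1024 * 1024)
    cpu = proc.cpu_percent(interval=None)  # non-blocking, uses the previous sample window
    status = "healthy" if memory_mb < 2048 else "degraded"  # 2048 MB mirrors resource_limits
    return {
        "status": status,
        "memory_usage_mb": round(memory_mb, 1),
        "cpu_percent": cpu,
        "uptime_seconds": time.time() - started_at,
        "processed_frames": processed_frames,
    }
```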
## 🔄 What Will Be Replaced
### Files to Modify:
1. **`app.py`**
- Replace direct pipeline execution with process management
- Remove global `latest_frames` cache
- Add SessionProcessManager integration
2. **`core/models/inference.py`**
- Remove shared `_model_cache` class variable
- Make model loading process-specific
- Eliminate cross-session model sharing
3. **`core/detection/pipeline.py`**
- Remove global session mappings
- Make pipeline instance session-specific
- Isolate processing state per session
4. **`core/communication/websocket.py`**
- Replace direct pipeline calls with IPC
- Add process spawn/cleanup on subscribe/unsubscribe
- Implement queue-based communication
### New Files to Create:
1. **`core/processes/session_manager.py`**
- SessionProcessManager class
- Process lifecycle management
- Health monitoring and cleanup
2. **`core/processes/session_worker.py`**
- SessionWorkerProcess class
- Individual session process implementation
- Model loading and pipeline execution
3. **`core/processes/communication.py`**
- IPC message definitions and handlers
- Queue management utilities
- Protocol for main ↔ session communication
4. **`core/logging/session_logger.py`**
- Per-session logging configuration
- Log file management and rotation
- Structured logging with session context
## ❌ What Will Be Removed
### Code to Remove:
1. **Shared State Variables**
```python
# From core/models/inference.py
_model_cache: Dict[str, Any] = {}
# From core/detection/pipeline.py
self.session_to_subscription = {}
self.session_processing_results = {}
# From app.py
latest_frames = {}
```
2. **Global Singleton Patterns**
- Single pipeline instance handling all sessions
- Shared ThreadPoolExecutor across sessions
- Global model manager for all subscriptions
3. **Cross-Session Dependencies**
- Session mapping lookups across different subscriptions
- Shared processing state between unrelated sessions
- Global frame caching across all cameras
## 🔧 Configuration Changes
### New Configuration Options:
```json
{
"session_processes": {
"max_concurrent_sessions": 20,
"process_cleanup_timeout": 30,
"health_check_interval": 10,
"log_rotation": {
"max_size_mb": 100,
"backup_count": 5
}
},
"resource_limits": {
"memory_per_process_mb": 2048,
"gpu_memory_fraction": 0.3
}
}
```
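
A small sketch of how these options could be read with safe fallbacks; the `config.json` path and helper name are assumptions, not part of this commit:

```python
# Sketch: load session_processes options with the defaults listed above.
import json
from pathlib import Path

DEFAULTS = {
    "max_concurrent_sessions": 20,
    "process_cleanup_timeout": 30,
    "health_check_interval": 10,
    "log_rotation": {"max_size_mb": 100, "backup_count": 5},
}


def load_session_process_config(path: str = "config.json") -> dict:
    overrides = {}
    config_file = Path(path)
    if config_file.exists():
        overrides = json.loads(config_file.read_text()).get("session_processes", {})
    return {**DEFAULTS, **overrides}
```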
## 📊 Benefits of New Architecture
### 🛡️ Complete Isolation:
- **Memory Isolation**: Each session runs in separate process memory space
- **Model Isolation**: No shared model cache between sessions
- **State Isolation**: Session mappings and processing state are process-local
- **Error Isolation**: Process crashes don't affect other sessions
### 📈 Performance Improvements:
- **True Parallelism**: Bypass Python GIL limitations
- **Resource Optimization**: Each process uses only required resources
- **Scalability**: Linear scaling with available CPU cores
- **Memory Efficiency**: Automatic cleanup on session termination
### 🔍 Enhanced Monitoring:
- **Per-Camera Logs**: Dedicated log file for each session
- **Resource Tracking**: Monitor CPU/memory per session process
- **Debugging**: Isolated logs make issue diagnosis easier
- **Audit Trail**: Complete processing history per camera
### 🚀 Operational Benefits:
- **Zero Cross-Session Contamination**: Impossible for sessions to affect each other
- **Hot Restart**: Individual session restart without affecting others
- **Resource Control**: Fine-grained resource allocation per session
- **Development**: Easier testing and debugging of individual sessions
## 🎬 Implementation Order
1. **Phase 1**: Core infrastructure (SessionProcessManager, IPC)
2. **Phase 2**: Per-session logging system
3. **Phase 3**: Model and pipeline isolation
4. **Phase 4**: Integrated stream-session architecture (stream reading inside session workers)
5. **Phase 5**: Resource management and monitoring
## 🧪 Testing Strategy
1. **Unit Tests**: Test individual session processes in isolation
2. **Integration Tests**: Test main ↔ session process communication
3. **Load Tests**: Multiple concurrent sessions with different models
4. **Memory Tests**: Verify no cross-session memory leaks
5. **Logging Tests**: Verify correct log file creation and rotation
## 📝 Migration Checklist
- [ ] Backup current working version
- [ ] Implement Phase 1 (core infrastructure)
- [ ] Test with single session process
- [ ] Implement Phase 2 (logging)
- [ ] Test with multiple concurrent sessions
- [ ] Implement Phase 3 (isolation)
- [ ] Verify complete elimination of shared state
- [ ] Implement Phase 4 (resource management)
- [ ] Performance testing and optimization
- [ ] Documentation updates
---
**Expected Outcome**: Complete elimination of cross-session result contamination with enhanced monitoring capabilities and true session isolation.

app.py

@@ -22,15 +22,11 @@ if __name__ != "__main__":  # When imported by uvicorn
 from core.communication.websocket import websocket_endpoint
 from core.communication.state import worker_state
 
-# Configure logging
-logging.basicConfig(
-    level=logging.DEBUG,
-    format="%(asctime)s [%(levelname)s] %(name)s: %(message)s",
-    handlers=[
-        logging.FileHandler("detector_worker.log"),
-        logging.StreamHandler()
-    ]
-)
+# Import and setup main process logging
+from core.logging.session_logger import setup_main_process_logging
+
+# Configure main process logging
+setup_main_process_logging("logs")
 
 logger = logging.getLogger("detector_worker")
 logger.setLevel(logging.DEBUG)

core/communication/session_integration.py Normal file

@@ -0,0 +1,319 @@
"""
Integration layer between WebSocket handler and Session Process Manager.
Bridges the existing WebSocket protocol with the new session-based architecture.
"""
import asyncio
import logging
from typing import Dict, Any, Optional
import numpy as np
from ..processes.session_manager import SessionProcessManager
from ..processes.communication import DetectionResultResponse, ErrorResponse
from .state import worker_state
from .messages import serialize_outgoing_message
# Streaming is now handled directly by session workers - no shared stream manager needed
logger = logging.getLogger(__name__)
class SessionWebSocketIntegration:
"""
Integration layer that connects WebSocket protocol with Session Process Manager.
Maintains compatibility with existing WebSocket message handling.
"""
def __init__(self, websocket_handler=None):
"""
Initialize session WebSocket integration.
Args:
websocket_handler: Reference to WebSocket handler for sending messages
"""
self.websocket_handler = websocket_handler
self.session_manager = SessionProcessManager()
# Track active subscriptions for compatibility
self.active_subscriptions: Dict[str, Dict[str, Any]] = {}
# Set up callbacks
self.session_manager.set_detection_result_callback(self._on_detection_result)
self.session_manager.set_error_callback(self._on_session_error)
async def start(self):
"""Start the session integration."""
await self.session_manager.start()
logger.info("Session WebSocket integration started")
async def stop(self):
"""Stop the session integration."""
await self.session_manager.stop()
logger.info("Session WebSocket integration stopped")
async def handle_set_subscription_list(self, message) -> bool:
"""
Handle setSubscriptionList message by managing session processes.
Args:
message: SetSubscriptionListMessage
Returns:
True if successful
"""
try:
logger.info(f"Processing subscription list with {len(message.subscriptions)} subscriptions")
new_subscription_ids = set()
for subscription in message.subscriptions:
subscription_id = subscription.subscriptionIdentifier
new_subscription_ids.add(subscription_id)
# Check if this is a new subscription
if subscription_id not in self.active_subscriptions:
logger.info(f"Creating new session for subscription: {subscription_id}")
# Convert subscription to configuration dict
subscription_config = {
'subscriptionIdentifier': subscription.subscriptionIdentifier,
'rtspUrl': getattr(subscription, 'rtspUrl', None),
'snapshotUrl': getattr(subscription, 'snapshotUrl', None),
'snapshotInterval': getattr(subscription, 'snapshotInterval', 5000),
'modelUrl': subscription.modelUrl,
'modelId': subscription.modelId,
'modelName': subscription.modelName,
'cropX1': subscription.cropX1,
'cropY1': subscription.cropY1,
'cropX2': subscription.cropX2,
'cropY2': subscription.cropY2
}
# Create session process
success = await self.session_manager.create_session(
subscription_id, subscription_config
)
if success:
self.active_subscriptions[subscription_id] = subscription_config
logger.info(f"Session created successfully for {subscription_id}")
# Stream handling is now integrated into session worker process
else:
logger.error(f"Failed to create session for {subscription_id}")
return False
else:
# Update existing subscription configuration if needed
self.active_subscriptions[subscription_id].update({
'modelUrl': subscription.modelUrl,
'modelId': subscription.modelId,
'modelName': subscription.modelName,
'cropX1': subscription.cropX1,
'cropY1': subscription.cropY1,
'cropX2': subscription.cropX2,
'cropY2': subscription.cropY2
})
# Remove sessions for subscriptions that are no longer active
current_subscription_ids = set(self.active_subscriptions.keys())
removed_subscriptions = current_subscription_ids - new_subscription_ids
for subscription_id in removed_subscriptions:
logger.info(f"Removing session for subscription: {subscription_id}")
await self.session_manager.remove_session(subscription_id)
del self.active_subscriptions[subscription_id]
# Update worker state for compatibility
worker_state.set_subscriptions(message.subscriptions)
logger.info(f"Subscription list processed: {len(new_subscription_ids)} active sessions")
return True
except Exception as e:
logger.error(f"Error handling subscription list: {e}", exc_info=True)
return False
async def handle_set_session_id(self, message) -> bool:
"""
Handle setSessionId message by forwarding to appropriate session process.
Args:
message: SetSessionIdMessage
Returns:
True if successful
"""
try:
display_id = message.payload.displayIdentifier
session_id = message.payload.sessionId
logger.info(f"Setting session ID {session_id} for display {display_id}")
# Find subscription identifier for this display
subscription_id = None
for sub_id in self.active_subscriptions.keys():
# Extract display identifier from subscription identifier
if display_id in sub_id:
subscription_id = sub_id
break
if not subscription_id:
logger.error(f"No active subscription found for display {display_id}")
return False
# Forward to session process
success = await self.session_manager.set_session_id(
subscription_id, str(session_id), display_id
)
if success:
# Update worker state for compatibility
worker_state.set_session_id(display_id, session_id)
logger.info(f"Session ID {session_id} set successfully for {display_id}")
else:
logger.error(f"Failed to set session ID {session_id} for {display_id}")
return success
except Exception as e:
logger.error(f"Error setting session ID: {e}", exc_info=True)
return False
async def process_frame(self, subscription_id: str, frame: np.ndarray, display_id: str, timestamp: float = None) -> bool:
"""
Process frame through appropriate session process.
Args:
subscription_id: Subscription identifier
frame: Frame to process
display_id: Display identifier
timestamp: Frame timestamp
Returns:
True if frame was processed successfully
"""
try:
if timestamp is None:
timestamp = asyncio.get_event_loop().time()
# Forward frame to session process
success = await self.session_manager.process_frame(
subscription_id, frame, display_id, timestamp
)
if not success:
logger.warning(f"Failed to process frame for subscription {subscription_id}")
return success
except Exception as e:
logger.error(f"Error processing frame for {subscription_id}: {e}", exc_info=True)
return False
async def _on_detection_result(self, subscription_id: str, response: DetectionResultResponse):
"""
Handle detection result from session process.
Args:
subscription_id: Subscription identifier
response: Detection result response
"""
try:
logger.debug(f"Received detection result from {subscription_id}: phase={response.phase}")
# Send imageDetection message via WebSocket (if needed)
if self.websocket_handler and hasattr(self.websocket_handler, 'send_message'):
from .models import ImageDetectionMessage, DetectionData
# Convert response detections to the expected format
# The DetectionData expects modelId and modelName, and detection dict
detection_data = DetectionData(
detection=response.detections,
modelId=getattr(response, 'model_id', 0), # Get from response if available
modelName=getattr(response, 'model_name', 'unknown') # Get from response if available
)
# Convert timestamp to string format if it exists
timestamp_str = None
if hasattr(response, 'timestamp') and response.timestamp:
from datetime import datetime
if isinstance(response.timestamp, (int, float)):
# Convert Unix timestamp to ISO format string
timestamp_str = datetime.fromtimestamp(response.timestamp).strftime("%Y-%m-%dT%H:%M:%S.%fZ")
else:
timestamp_str = str(response.timestamp)
detection_message = ImageDetectionMessage(
subscriptionIdentifier=subscription_id,
data=detection_data,
timestamp=timestamp_str
)
serialized = serialize_outgoing_message(detection_message)
await self.websocket_handler.send_message(serialized)
except Exception as e:
logger.error(f"Error handling detection result from {subscription_id}: {e}", exc_info=True)
async def _on_session_error(self, subscription_id: str, error_response: ErrorResponse):
"""
Handle error from session process.
Args:
subscription_id: Subscription identifier
error_response: Error response
"""
logger.error(f"Session error from {subscription_id}: {error_response.error_type} - {error_response.error_message}")
# Send error message via WebSocket if needed
if self.websocket_handler and hasattr(self.websocket_handler, 'send_message'):
error_message = {
'type': 'sessionError',
'payload': {
'subscriptionIdentifier': subscription_id,
'errorType': error_response.error_type,
'errorMessage': error_response.error_message,
'timestamp': error_response.timestamp
}
}
try:
serialized = serialize_outgoing_message(error_message)
await self.websocket_handler.send_message(serialized)
except Exception as e:
logger.error(f"Failed to send error message: {e}")
def get_session_stats(self) -> Dict[str, Any]:
"""
Get statistics about active sessions.
Returns:
Dictionary with session statistics
"""
return {
'active_sessions': self.session_manager.get_session_count(),
'max_sessions': self.session_manager.max_concurrent_sessions,
'subscriptions': list(self.active_subscriptions.keys())
}
async def handle_progression_stage(self, message) -> bool:
"""
Handle setProgressionStage message.
Args:
message: SetProgressionStageMessage
Returns:
True if successful
"""
try:
# For now, just update worker state for compatibility
# In future phases, this could be forwarded to session processes
worker_state.set_progression_stage(
message.payload.displayIdentifier,
message.payload.progressionStage
)
return True
except Exception as e:
logger.error(f"Error handling progression stage: {e}", exc_info=True)
return False

core/communication/websocket.py

@@ -24,6 +24,7 @@ from .state import worker_state, SystemMetrics
 from ..models import ModelManager
 from ..streaming.manager import shared_stream_manager
 from ..tracking.integration import TrackingPipelineIntegration
+from .session_integration import SessionWebSocketIntegration
 
 logger = logging.getLogger(__name__)
 
@@ -48,6 +49,9 @@ class WebSocketHandler:
         self._heartbeat_count = 0
         self._last_processed_models: set = set()  # Cache of last processed model IDs
 
+        # Initialize session integration
+        self.session_integration = SessionWebSocketIntegration(self)
+
     async def handle_connection(self) -> None:
         """
         Main connection handler that manages the WebSocket lifecycle.
 
@@ -66,14 +70,16 @@ class WebSocketHandler:
             # Send immediate heartbeat to show connection is alive
             await self._send_immediate_heartbeat()
 
-            # Start background tasks (matching original architecture)
-            stream_task = asyncio.create_task(self._process_streams())
+            # Start session integration
+            await self.session_integration.start()
+
+            # Start background tasks - stream processing now handled by session workers
             heartbeat_task = asyncio.create_task(self._send_heartbeat())
             message_task = asyncio.create_task(self._handle_messages())
 
-            logger.info(f"WebSocket background tasks started for {client_info} (stream + heartbeat + message handler)")
+            logger.info(f"WebSocket background tasks started for {client_info} (heartbeat + message handler)")
 
-            # Wait for heartbeat and message tasks (stream runs independently)
+            # Wait for heartbeat and message tasks
             await asyncio.gather(heartbeat_task, message_task)
 
         except Exception as e:
 
@@ -87,6 +93,11 @@ class WebSocketHandler:
                 await stream_task
             except asyncio.CancelledError:
                 logger.debug(f"Stream task cancelled for {client_info}")
 
+            # Stop session integration
+            if hasattr(self, 'session_integration'):
+                await self.session_integration.stop()
+
             await self._cleanup()
 
     async def _send_immediate_heartbeat(self) -> None:
 
@@ -180,11 +191,11 @@ class WebSocketHandler:
         try:
             if message_type == MessageTypes.SET_SUBSCRIPTION_LIST:
-                await self._handle_set_subscription_list(message)
+                await self.session_integration.handle_set_subscription_list(message)
             elif message_type == MessageTypes.SET_SESSION_ID:
-                await self._handle_set_session_id(message)
+                await self.session_integration.handle_set_session_id(message)
             elif message_type == MessageTypes.SET_PROGRESSION_STAGE:
-                await self._handle_set_progression_stage(message)
+                await self.session_integration.handle_progression_stage(message)
             elif message_type == MessageTypes.REQUEST_STATE:
                 await self._handle_request_state(message)
             elif message_type == MessageTypes.PATCH_SESSION_RESULT:
@@ -619,31 +630,108 @@ class WebSocketHandler:
             logger.error(f"Failed to send WebSocket message: {e}")
             raise
 
+    async def send_message(self, message) -> None:
+        """Public method to send messages (used by session integration)."""
+        await self._send_message(message)
+
+    # DEPRECATED: Stream processing is now handled directly by session worker processes
     async def _process_streams(self) -> None:
         """
-        Stream processing task that handles frame processing and detection.
-        This is a placeholder for Phase 2 - currently just logs that it's running.
+        DEPRECATED: Stream processing task that handles frame processing and detection.
+        Stream processing is now integrated directly into session worker processes.
         """
+        logger.info("DEPRECATED: Stream processing task - now handled by session workers")
+        return  # Exit immediately - no longer needed
+
+        # OLD CODE (disabled):
         logger.info("Stream processing task started")
         try:
             while self.connected:
                 # Get current subscriptions
                 subscriptions = worker_state.get_all_subscriptions()
 
-                # TODO: Phase 2 - Add actual frame processing logic here
-                # This will include:
-                # - Frame reading from RTSP/HTTP streams
-                # - Model inference using loaded pipelines
-                # - Detection result sending via WebSocket
+                if not subscriptions:
+                    await asyncio.sleep(0.5)
+                    continue
+
+                # Process frames for each subscription
+                for subscription in subscriptions:
+                    await self._process_subscription_frames(subscription)
 
                 # Sleep to prevent excessive CPU usage (similar to old poll_interval)
-                await asyncio.sleep(0.1)  # 100ms polling interval
+                await asyncio.sleep(0.25)  # 250ms polling interval
 
         except asyncio.CancelledError:
             logger.info("Stream processing task cancelled")
         except Exception as e:
             logger.error(f"Error in stream processing: {e}", exc_info=True)
async def _process_subscription_frames(self, subscription) -> None:
"""
Process frames for a single subscription by getting frames from stream manager
and forwarding them to the appropriate session worker.
"""
try:
subscription_id = subscription.subscriptionIdentifier
# Get the latest frame from the stream manager
frame_data = await self._get_frame_from_stream_manager(subscription)
if frame_data and frame_data['frame'] is not None:
# Extract display identifier (format: "test1;Dispenser Camera 1")
display_id = subscription_id.split(';')[-1] if ';' in subscription_id else subscription_id
# Forward frame to session worker via session integration
success = await self.session_integration.process_frame(
subscription_id=subscription_id,
frame=frame_data['frame'],
display_id=display_id,
timestamp=frame_data.get('timestamp', asyncio.get_event_loop().time())
)
if success:
logger.debug(f"[Frame Processing] Sent frame to session worker for {subscription_id}")
else:
logger.warning(f"[Frame Processing] Failed to send frame to session worker for {subscription_id}")
except Exception as e:
logger.error(f"Error processing frames for {subscription.subscriptionIdentifier}: {e}")
async def _get_frame_from_stream_manager(self, subscription) -> dict:
"""
Get the latest frame from the stream manager for a subscription using existing API.
"""
try:
subscription_id = subscription.subscriptionIdentifier
# Use existing stream manager API to check if frame is available
if not shared_stream_manager.has_frame(subscription_id):
# Stream should already be started by session integration
return {'frame': None, 'timestamp': None}
# Get frame using existing API with crop coordinates if available
crop_coords = None
if hasattr(subscription, 'cropX1') and subscription.cropX1 is not None:
crop_coords = (
subscription.cropX1, subscription.cropY1,
subscription.cropX2, subscription.cropY2
)
# Use existing get_frame method
frame = shared_stream_manager.get_frame(subscription_id, crop_coords)
if frame is not None:
return {
'frame': frame,
'timestamp': asyncio.get_event_loop().time()
}
return {'frame': None, 'timestamp': None}
except Exception as e:
logger.error(f"Error getting frame from stream manager for {subscription.subscriptionIdentifier}: {e}")
return {'frame': None, 'timestamp': None}
    async def _cleanup(self) -> None:
        """Clean up resources when connection closes."""
        logger.info("Cleaning up WebSocket connection")

core/detection/pipeline.py

@@ -58,10 +58,10 @@ class DetectionPipeline:
         # Pipeline configuration
         self.pipeline_config = pipeline_parser.pipeline_config
 
-        # SessionId to subscriptionIdentifier mapping
+        # SessionId to subscriptionIdentifier mapping (ISOLATED per session process)
         self.session_to_subscription = {}
 
-        # SessionId to processing results mapping (for combining with license plate results)
+        # SessionId to processing results mapping (ISOLATED per session process)
         self.session_processing_results = {}
 
         # Statistics
@@ -72,7 +72,8 @@ class DetectionPipeline:
             'total_processing_time': 0.0
         }
 
-        logger.info("DetectionPipeline initialized")
+        logger.info(f"DetectionPipeline initialized for model {model_id} with ISOLATED state (no shared mappings or cache)")
+        logger.info(f"Pipeline instance ID: {id(self)} - unique per session process")
 
     async def initialize(self) -> bool:
         """

core/logging/__init__.py Normal file

@@ -0,0 +1,3 @@
"""
Per-Session Logging Module
"""

core/logging/session_logger.py Normal file

@@ -0,0 +1,356 @@
"""
Per-Session Logging Configuration and Management.
Each session process gets its own dedicated log file with rotation support.
"""
import logging
import logging.handlers
import os
import sys
from pathlib import Path
from typing import Optional
from datetime import datetime
import re
class PerSessionLogger:
"""
Per-session logging configuration that creates dedicated log files for each session.
Supports log rotation and structured logging with session context.
"""
def __init__(
self,
session_id: str,
subscription_identifier: str,
log_dir: str = "logs",
max_size_mb: int = 100,
backup_count: int = 5,
log_level: int = logging.INFO,
detection_mode: bool = True
):
"""
Initialize per-session logger.
Args:
session_id: Unique session identifier
subscription_identifier: Subscription identifier (contains camera info)
log_dir: Directory to store log files
max_size_mb: Maximum size of each log file in MB
backup_count: Number of backup files to keep
log_level: Logging level
detection_mode: If True, uses reduced verbosity for detection processes
"""
self.session_id = session_id
self.subscription_identifier = subscription_identifier
self.log_dir = Path(log_dir)
self.max_size_mb = max_size_mb
self.backup_count = backup_count
self.log_level = log_level
self.detection_mode = detection_mode
# Ensure log directory exists
self.log_dir.mkdir(parents=True, exist_ok=True)
# Generate clean filename from subscription identifier
self.log_filename = self._generate_log_filename()
self.log_filepath = self.log_dir / self.log_filename
# Create logger
self.logger = self._setup_logger()
def _generate_log_filename(self) -> str:
"""
Generate a clean filename from subscription identifier.
Format: detector_worker_camera_{clean_subscription_id}.log
Returns:
Clean filename for the log file
"""
# Clean subscription identifier for filename
# Replace problematic characters with underscores
clean_sub_id = re.sub(r'[^\w\-_.]', '_', self.subscription_identifier)
# Remove consecutive underscores
clean_sub_id = re.sub(r'_+', '_', clean_sub_id)
# Remove leading/trailing underscores
clean_sub_id = clean_sub_id.strip('_')
# Generate filename
filename = f"detector_worker_camera_{clean_sub_id}.log"
return filename
def _setup_logger(self) -> logging.Logger:
"""
Setup logger with file handler and rotation.
Returns:
Configured logger instance
"""
# Create logger with unique name
logger_name = f"session_worker_{self.session_id}"
logger = logging.getLogger(logger_name)
# Clear any existing handlers to avoid duplicates
logger.handlers.clear()
# Set logging level
logger.setLevel(self.log_level)
# Create formatter with session context
formatter = logging.Formatter(
fmt='%(asctime)s [%(levelname)s] %(name)s [Session: {session_id}] [Camera: {camera}]: %(message)s'.format(
session_id=self.session_id,
camera=self.subscription_identifier
),
datefmt='%Y-%m-%d %H:%M:%S'
)
# Create rotating file handler
max_bytes = self.max_size_mb * 1024 * 1024 # Convert MB to bytes
file_handler = logging.handlers.RotatingFileHandler(
filename=self.log_filepath,
maxBytes=max_bytes,
backupCount=self.backup_count,
encoding='utf-8'
)
file_handler.setLevel(self.log_level)
file_handler.setFormatter(formatter)
# Create console handler for debugging (optional)
console_handler = logging.StreamHandler(sys.stdout)
console_handler.setLevel(logging.WARNING) # Only warnings and errors to console
console_formatter = logging.Formatter(
fmt='[{session_id}] [%(levelname)s]: %(message)s'.format(
session_id=self.session_id
)
)
console_handler.setFormatter(console_formatter)
# Add handlers to logger
logger.addHandler(file_handler)
logger.addHandler(console_handler)
# Prevent propagation to root logger
logger.propagate = False
# Log initialization (reduced verbosity in detection mode)
if self.detection_mode:
logger.info(f"Session logger ready for {self.subscription_identifier}")
else:
logger.info(f"Per-session logger initialized")
logger.info(f"Log file: {self.log_filepath}")
logger.info(f"Session ID: {self.session_id}")
logger.info(f"Camera: {self.subscription_identifier}")
logger.info(f"Max size: {self.max_size_mb}MB, Backup count: {self.backup_count}")
return logger
def get_logger(self) -> logging.Logger:
"""
Get the configured logger instance.
Returns:
Logger instance for this session
"""
return self.logger
def log_session_start(self, process_id: int):
"""
Log session start with process information.
Args:
process_id: Process ID of the session worker
"""
if self.detection_mode:
self.logger.info(f"Session started - PID {process_id}")
else:
self.logger.info("=" * 60)
self.logger.info(f"SESSION STARTED")
self.logger.info(f"Process ID: {process_id}")
self.logger.info(f"Session ID: {self.session_id}")
self.logger.info(f"Camera: {self.subscription_identifier}")
self.logger.info(f"Timestamp: {datetime.now().isoformat()}")
self.logger.info("=" * 60)
def log_session_end(self):
"""Log session end."""
self.logger.info("=" * 60)
self.logger.info(f"SESSION ENDED")
self.logger.info(f"Timestamp: {datetime.now().isoformat()}")
self.logger.info("=" * 60)
def log_model_loading(self, model_id: int, model_name: str, model_path: str):
"""
Log model loading information.
Args:
model_id: Model ID
model_name: Model name
model_path: Path to the model
"""
if self.detection_mode:
self.logger.info(f"Loading model {model_id}: {model_name}")
else:
self.logger.info("-" * 40)
self.logger.info(f"MODEL LOADING")
self.logger.info(f"Model ID: {model_id}")
self.logger.info(f"Model Name: {model_name}")
self.logger.info(f"Model Path: {model_path}")
self.logger.info("-" * 40)
def log_frame_processing(self, frame_count: int, processing_time: float, detections: int):
"""
Log frame processing information.
Args:
frame_count: Current frame count
processing_time: Processing time in seconds
detections: Number of detections found
"""
self.logger.debug(f"FRAME #{frame_count}: Processing time: {processing_time:.3f}s, Detections: {detections}")
def log_detection_result(self, detection_type: str, confidence: float, bbox: list):
"""
Log detection result.
Args:
detection_type: Type of detection (e.g., "Car", "Frontal")
confidence: Detection confidence
bbox: Bounding box coordinates
"""
self.logger.info(f"DETECTION: {detection_type} (conf: {confidence:.3f}) at {bbox}")
def log_database_operation(self, operation: str, session_id: str, success: bool):
"""
Log database operation.
Args:
operation: Type of operation
session_id: Session ID used in database
success: Whether operation succeeded
"""
status = "SUCCESS" if success else "FAILED"
self.logger.info(f"DATABASE {operation}: {status} (session: {session_id})")
def log_error(self, error_type: str, error_message: str, traceback_str: Optional[str] = None):
"""
Log error with context.
Args:
error_type: Type of error
error_message: Error message
traceback_str: Optional traceback string
"""
self.logger.error(f"ERROR [{error_type}]: {error_message}")
if traceback_str:
self.logger.error(f"Traceback:\n{traceback_str}")
def get_log_stats(self) -> dict:
"""
Get logging statistics.
Returns:
Dictionary with logging statistics
"""
try:
if self.log_filepath.exists():
stat = self.log_filepath.stat()
return {
'log_file': str(self.log_filepath),
'file_size_mb': round(stat.st_size / (1024 * 1024), 2),
'created': datetime.fromtimestamp(stat.st_ctime).isoformat(),
'modified': datetime.fromtimestamp(stat.st_mtime).isoformat(),
}
else:
return {'log_file': str(self.log_filepath), 'status': 'not_created'}
except Exception as e:
return {'log_file': str(self.log_filepath), 'error': str(e)}
def cleanup(self):
"""Cleanup logger handlers."""
if hasattr(self, 'logger') and self.logger:
for handler in self.logger.handlers[:]:
handler.close()
self.logger.removeHandler(handler)
class MainProcessLogger:
"""
Logger configuration for the main FastAPI process.
Separate from session logs to avoid confusion.
"""
def __init__(self, log_dir: str = "logs", max_size_mb: int = 50, backup_count: int = 3):
"""
Initialize main process logger.
Args:
log_dir: Directory to store log files
max_size_mb: Maximum size of each log file in MB
backup_count: Number of backup files to keep
"""
self.log_dir = Path(log_dir)
self.max_size_mb = max_size_mb
self.backup_count = backup_count
# Ensure log directory exists
self.log_dir.mkdir(parents=True, exist_ok=True)
# Setup main process logger
self._setup_main_logger()
def _setup_main_logger(self):
"""Setup main process logger."""
# Configure root logger
root_logger = logging.getLogger("detector_worker")
# Clear existing handlers
for handler in root_logger.handlers[:]:
root_logger.removeHandler(handler)
# Set level
root_logger.setLevel(logging.INFO)
# Create formatter
formatter = logging.Formatter(
fmt='%(asctime)s [%(levelname)s] %(name)s [MAIN]: %(message)s',
datefmt='%Y-%m-%d %H:%M:%S'
)
# Create rotating file handler for main process
max_bytes = self.max_size_mb * 1024 * 1024
main_log_path = self.log_dir / "detector_worker_main.log"
file_handler = logging.handlers.RotatingFileHandler(
filename=main_log_path,
maxBytes=max_bytes,
backupCount=self.backup_count,
encoding='utf-8'
)
file_handler.setLevel(logging.INFO)
file_handler.setFormatter(formatter)
# Create console handler
console_handler = logging.StreamHandler()
console_handler.setLevel(logging.INFO)
console_handler.setFormatter(formatter)
# Add handlers
root_logger.addHandler(file_handler)
root_logger.addHandler(console_handler)
# Log initialization
root_logger.info("Main process logger initialized")
root_logger.info(f"Main log file: {main_log_path}")
def setup_main_process_logging(log_dir: str = "logs"):
"""
Setup logging for the main FastAPI process.
Args:
log_dir: Directory to store log files
"""
MainProcessLogger(log_dir=log_dir)
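
For illustration, the intended usage of the two loggers above might look like this (identifiers are made up; assumes the module is imported as `core.logging.session_logger`):

# Main process, called once at startup (as app.py now does):
setup_main_process_logging("logs")  # writes logs/detector_worker_main.log

# Inside a session worker process:
session_logger = PerSessionLogger(
    session_id="session_abc123",
    subscription_identifier="display-001;cam-001",
    log_dir="logs",
).get_logger()
session_logger.info("Worker ready")  # -> logs/detector_worker_camera_display-001_cam-001.log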

core/models/inference.py

@@ -34,11 +34,7 @@ class InferenceResult:
 
 class YOLOWrapper:
-    """Wrapper for YOLO models with caching and optimization"""
-
-    # Class-level model cache shared across all instances
-    _model_cache: Dict[str, Any] = {}
-    _cache_lock = Lock()
+    """Wrapper for YOLO models with per-instance isolation (no shared cache)"""
 
     def __init__(self, model_path: Path, model_id: str, device: Optional[str] = None):
         """
@@ -65,61 +61,48 @@ class YOLOWrapper:
         logger.info(f"Initialized YOLO wrapper for {model_id} on {self.device}")
 
     def _load_model(self) -> None:
-        """Load the YOLO model with caching"""
-        cache_key = str(self.model_path)
-
-        with self._cache_lock:
-            # Check if model is already cached
-            if cache_key in self._model_cache:
-                logger.info(f"Loading model {self.model_id} from cache")
-                self.model = self._model_cache[cache_key]
-                self._extract_class_names()
-                return
-
-            # Load model
-            try:
-                from ultralytics import YOLO
-                logger.info(f"Loading YOLO model from {self.model_path}")
-
-                # Load model normally first
-                self.model = YOLO(str(self.model_path))
-
-                # Determine if this is a classification model based on filename or model structure
-                # Classification models typically have 'cls' in filename
-                is_classification = 'cls' in str(self.model_path).lower()
-
-                # For classification models, create a separate instance with task parameter
-                if is_classification:
-                    try:
-                        # Reload with classification task (like ML engineer's approach)
-                        self.model = YOLO(str(self.model_path), task="classify")
-                        logger.info(f"Loaded classification model {self.model_id} with task='classify'")
-                    except Exception as e:
-                        logger.warning(f"Failed to load with task='classify', using default: {e}")
-                        # Fall back to regular loading
-                        self.model = YOLO(str(self.model_path))
-                        logger.info(f"Loaded model {self.model_id} with default task")
-                else:
-                    logger.info(f"Loaded detection model {self.model_id}")
-
-                # Move model to device
-                if self.device == 'cuda' and torch.cuda.is_available():
-                    self.model.to('cuda')
-                    logger.info(f"Model {self.model_id} moved to GPU")
-
-                # Cache the model
-                self._model_cache[cache_key] = self.model
-
-                self._extract_class_names()
-
-                logger.info(f"Successfully loaded model {self.model_id}")
-
-            except ImportError:
-                logger.error("Ultralytics YOLO not installed. Install with: pip install ultralytics")
-                raise
-            except Exception as e:
-                logger.error(f"Failed to load YOLO model {self.model_id}: {str(e)}", exc_info=True)
-                raise
+        """Load the YOLO model in isolation (no shared cache)"""
+        try:
+            from ultralytics import YOLO
+            logger.debug(f"Loading YOLO model {self.model_id} from {self.model_path} (ISOLATED)")
+
+            # Load model directly without any caching
+            self.model = YOLO(str(self.model_path))
+
+            # Determine if this is a classification model based on filename or model structure
+            # Classification models typically have 'cls' in filename
+            is_classification = 'cls' in str(self.model_path).lower()
+
+            # For classification models, create a separate instance with task parameter
+            if is_classification:
+                try:
+                    # Reload with classification task (like ML engineer's approach)
+                    self.model = YOLO(str(self.model_path), task="classify")
+                    logger.info(f"Loaded classification model {self.model_id} with task='classify' (ISOLATED)")
+                except Exception as e:
+                    logger.warning(f"Failed to load with task='classify', using default: {e}")
+                    # Fall back to regular loading
+                    self.model = YOLO(str(self.model_path))
+                    logger.info(f"Loaded model {self.model_id} with default task (ISOLATED)")
+            else:
+                logger.info(f"Loaded detection model {self.model_id} (ISOLATED)")
+
+            # Move model to device
+            if self.device == 'cuda' and torch.cuda.is_available():
+                self.model.to('cuda')
+                logger.info(f"Model {self.model_id} moved to GPU (ISOLATED)")
+
+            self._extract_class_names()
+
+            logger.debug(f"Successfully loaded model {self.model_id} in isolation - no shared cache!")
+
+        except ImportError:
+            logger.error("Ultralytics YOLO not installed. Install with: pip install ultralytics")
+            raise
+        except Exception as e:
+            logger.error(f"Failed to load YOLO model {self.model_id}: {str(e)}", exc_info=True)
+            raise
 
     def _extract_class_names(self) -> None:
         """Extract class names from the model"""
@@ -375,19 +358,15 @@ class YOLOWrapper:
         return 'cls' in str(self.model_path).lower() or 'classify' in str(self.model_path).lower()
 
     def clear_cache(self) -> None:
-        """Clear the model cache"""
-        with self._cache_lock:
-            cache_key = str(self.model_path)
-            if cache_key in self._model_cache:
-                del self._model_cache[cache_key]
-                logger.info(f"Cleared cache for model {self.model_id}")
+        """Clear model resources (no cache in isolated mode)"""
+        if self.model:
+            # Clear any model resources if needed
+            logger.info(f"Cleared resources for model {self.model_id} (no shared cache)")
 
     @classmethod
     def clear_all_cache(cls) -> None:
-        """Clear all cached models"""
-        with cls._cache_lock:
-            cls._model_cache.clear()
-            logger.info("Cleared all model cache")
+        """No-op in isolated mode (no shared cache to clear)"""
+        logger.info("No shared cache to clear in isolated mode")
 
     def warmup(self, image_size: Tuple[int, int] = (640, 640)) -> None:
         """
@@ -438,16 +417,17 @@ class ModelInferenceManager:
             YOLOWrapper instance
         """
         with self._lock:
-            # Check if already loaded
+            # Check if already loaded for this specific manager instance
             if model_id in self.models:
-                logger.debug(f"Model {model_id} already loaded")
+                logger.debug(f"Model {model_id} already loaded in this manager instance")
                 return self.models[model_id]
 
-            # Load the model
+            # Load the model (each instance loads independently)
             model_path = self.model_dir / model_file
             if not model_path.exists():
                 raise FileNotFoundError(f"Model file not found: {model_path}")
 
+            logger.info(f"Loading model {model_id} in isolation for this manager instance")
             wrapper = YOLOWrapper(model_path, model_id, device)
             self.models[model_id] = wrapper

core/processes/__init__.py Normal file

@@ -0,0 +1,3 @@
"""
Session Process Management Module
"""

core/processes/communication.py Normal file

@@ -0,0 +1,317 @@
"""
Inter-Process Communication (IPC) system for session processes.
Defines message types and protocols for main session communication.
"""
import time
from enum import Enum
from typing import Dict, Any, Optional, Union
from dataclasses import dataclass, field
import numpy as np
class MessageType(Enum):
"""Message types for IPC communication."""
# Commands: Main → Session
INITIALIZE = "initialize"
PROCESS_FRAME = "process_frame"
SET_SESSION_ID = "set_session_id"
SHUTDOWN = "shutdown"
HEALTH_CHECK = "health_check"
# Responses: Session → Main
INITIALIZED = "initialized"
DETECTION_RESULT = "detection_result"
SESSION_SET = "session_set"
SHUTDOWN_COMPLETE = "shutdown_complete"
HEALTH_RESPONSE = "health_response"
ERROR = "error"
@dataclass
class IPCMessage:
"""Base class for all IPC messages."""
type: MessageType
session_id: str
timestamp: float = field(default_factory=time.time)
message_id: str = field(default_factory=lambda: str(int(time.time() * 1000000)))
@dataclass
class InitializeCommand(IPCMessage):
"""Initialize session process with configuration."""
subscription_config: Dict[str, Any] = field(default_factory=dict)
model_config: Dict[str, Any] = field(default_factory=dict)
@dataclass
class ProcessFrameCommand(IPCMessage):
"""Process a frame through the detection pipeline."""
frame: Optional[np.ndarray] = None
display_id: str = ""
subscription_identifier: str = ""
frame_timestamp: float = 0.0
@dataclass
class SetSessionIdCommand(IPCMessage):
"""Set the session ID for the current session."""
backend_session_id: str = ""
display_id: str = ""
@dataclass
class ShutdownCommand(IPCMessage):
"""Shutdown the session process gracefully."""
@dataclass
class HealthCheckCommand(IPCMessage):
"""Check health status of session process."""
@dataclass
class InitializedResponse(IPCMessage):
"""Response indicating successful initialization."""
success: bool = False
error_message: Optional[str] = None
@dataclass
class DetectionResultResponse(IPCMessage):
"""Detection results from session process."""
detections: Dict[str, Any] = field(default_factory=dict)
processing_time: float = 0.0
phase: str = "" # "detection" or "processing"
@dataclass
class SessionSetResponse(IPCMessage):
"""Response confirming session ID was set."""
success: bool = False
backend_session_id: str = ""
@dataclass
class ShutdownCompleteResponse(IPCMessage):
"""Response confirming graceful shutdown."""
@dataclass
class HealthResponse(IPCMessage):
"""Health status response."""
status: str = "unknown" # "healthy", "degraded", "unhealthy"
memory_usage_mb: float = 0.0
cpu_percent: float = 0.0
gpu_memory_mb: Optional[float] = None
uptime_seconds: float = 0.0
processed_frames: int = 0
@dataclass
class ErrorResponse(IPCMessage):
"""Error message from session process."""
error_type: str = ""
error_message: str = ""
traceback: Optional[str] = None
# Type aliases for message unions
CommandMessage = Union[
InitializeCommand,
ProcessFrameCommand,
SetSessionIdCommand,
ShutdownCommand,
HealthCheckCommand
]
ResponseMessage = Union[
InitializedResponse,
DetectionResultResponse,
SessionSetResponse,
ShutdownCompleteResponse,
HealthResponse,
ErrorResponse
]
IPCMessageUnion = Union[CommandMessage, ResponseMessage]
class MessageSerializer:
"""Handles serialization/deserialization of IPC messages."""
@staticmethod
def serialize_message(message: IPCMessageUnion) -> Dict[str, Any]:
"""
Serialize message to dictionary for queue transport.
Args:
message: Message to serialize
Returns:
Dictionary representation of message
"""
result = {
'type': message.type.value,
'session_id': message.session_id,
'timestamp': message.timestamp,
'message_id': message.message_id,
}
# Add specific fields based on message type
if isinstance(message, InitializeCommand):
result.update({
'subscription_config': message.subscription_config,
'model_config': message.model_config
})
elif isinstance(message, ProcessFrameCommand):
result.update({
'frame': message.frame,
'display_id': message.display_id,
'subscription_identifier': message.subscription_identifier,
'frame_timestamp': message.frame_timestamp
})
elif isinstance(message, SetSessionIdCommand):
result.update({
'backend_session_id': message.backend_session_id,
'display_id': message.display_id
})
elif isinstance(message, InitializedResponse):
result.update({
'success': message.success,
'error_message': message.error_message
})
elif isinstance(message, DetectionResultResponse):
result.update({
'detections': message.detections,
'processing_time': message.processing_time,
'phase': message.phase
})
elif isinstance(message, SessionSetResponse):
result.update({
'success': message.success,
'backend_session_id': message.backend_session_id
})
elif isinstance(message, HealthResponse):
result.update({
'status': message.status,
'memory_usage_mb': message.memory_usage_mb,
'cpu_percent': message.cpu_percent,
'gpu_memory_mb': message.gpu_memory_mb,
'uptime_seconds': message.uptime_seconds,
'processed_frames': message.processed_frames
})
elif isinstance(message, ErrorResponse):
result.update({
'error_type': message.error_type,
'error_message': message.error_message,
'traceback': message.traceback
})
return result
@staticmethod
def deserialize_message(data: Dict[str, Any]) -> IPCMessageUnion:
"""
Deserialize dictionary back to message object.
Args:
data: Dictionary representation
Returns:
Deserialized message object
"""
msg_type = MessageType(data['type'])
session_id = data['session_id']
timestamp = data['timestamp']
message_id = data['message_id']
base_kwargs = {
'session_id': session_id,
'timestamp': timestamp,
'message_id': message_id
}
if msg_type == MessageType.INITIALIZE:
return InitializeCommand(
type=msg_type,
subscription_config=data['subscription_config'],
model_config=data['model_config'],
**base_kwargs
)
elif msg_type == MessageType.PROCESS_FRAME:
return ProcessFrameCommand(
type=msg_type,
frame=data['frame'],
display_id=data['display_id'],
subscription_identifier=data['subscription_identifier'],
frame_timestamp=data['frame_timestamp'],
**base_kwargs
)
elif msg_type == MessageType.SET_SESSION_ID:
# type has no default on IPCMessage, so pass it explicitly (as the other branches do)
return SetSessionIdCommand(
type=msg_type,
backend_session_id=data['backend_session_id'],
display_id=data['display_id'],
**base_kwargs
)
elif msg_type == MessageType.SHUTDOWN:
return ShutdownCommand(type=msg_type, **base_kwargs)
elif msg_type == MessageType.HEALTH_CHECK:
return HealthCheckCommand(type=msg_type, **base_kwargs)
elif msg_type == MessageType.INITIALIZED:
return InitializedResponse(
type=msg_type,
success=data['success'],
error_message=data.get('error_message'),
**base_kwargs
)
elif msg_type == MessageType.DETECTION_RESULT:
return DetectionResultResponse(
type=msg_type,
detections=data['detections'],
processing_time=data['processing_time'],
phase=data['phase'],
**base_kwargs
)
elif msg_type == MessageType.SESSION_SET:
return SessionSetResponse(
type=msg_type,
success=data['success'],
backend_session_id=data['backend_session_id'],
**base_kwargs
)
elif msg_type == MessageType.SHUTDOWN_COMPLETE:
return ShutdownCompleteResponse(type=msg_type, **base_kwargs)
elif msg_type == MessageType.HEALTH_RESPONSE:
return HealthResponse(
type=msg_type,
status=data['status'],
memory_usage_mb=data['memory_usage_mb'],
cpu_percent=data['cpu_percent'],
gpu_memory_mb=data.get('gpu_memory_mb'),
uptime_seconds=data.get('uptime_seconds', 0.0),
processed_frames=data.get('processed_frames', 0),
**base_kwargs
)
elif msg_type == MessageType.ERROR:
return ErrorResponse(
type=msg_type,
error_type=data['error_type'],
error_message=data['error_message'],
traceback=data.get('traceback'),
**base_kwargs
)
else:
raise ValueError(f"Unknown message type: {msg_type}")
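# Illustrative sketch (not part of the commit): round-tripping a command through the
# serializer, the same way the manager does before putting it on a multiprocessing.Queue
# (the queue pickles the resulting dict, so numpy frames inside ProcessFrameCommand are
# copied between processes). Defaults for `type`, `timestamp` and `message_id` are
# assumed from how the commands are constructed elsewhere in this commit.
demo_cmd = HealthCheckCommand(session_id="session_demo")
demo_payload = MessageSerializer.serialize_message(demo_cmd)      # plain dict, queue-safe
demo_restored = MessageSerializer.deserialize_message(demo_payload)
assert isinstance(demo_restored, HealthCheckCommand)
assert demo_restored.session_id == "session_demo"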

View file

@ -0,0 +1,464 @@
"""
Session Process Manager - Manages lifecycle of session processes.
Handles process spawning, monitoring, cleanup, and health checks.
"""
import time
import logging
import asyncio
import multiprocessing as mp
from typing import Dict, Optional, Any, Callable
from dataclasses import dataclass
from concurrent.futures import ThreadPoolExecutor
import threading
from .communication import (
MessageSerializer, MessageType,
InitializeCommand, ProcessFrameCommand, SetSessionIdCommand,
ShutdownCommand, HealthCheckCommand,
InitializedResponse, DetectionResultResponse, SessionSetResponse,
ShutdownCompleteResponse, HealthResponse, ErrorResponse
)
from .session_worker import session_worker_main
logger = logging.getLogger(__name__)
@dataclass
class SessionProcessInfo:
"""Information about a running session process."""
session_id: str
subscription_identifier: str
process: mp.Process
command_queue: mp.Queue
response_queue: mp.Queue
created_at: float
last_health_check: float = 0.0
is_initialized: bool = False
processed_frames: int = 0
class SessionProcessManager:
"""
Manages lifecycle of session processes.
Each session gets its own dedicated process for complete isolation.
"""
def __init__(self, max_concurrent_sessions: int = 20, health_check_interval: int = 30):
"""
Initialize session process manager.
Args:
max_concurrent_sessions: Maximum number of concurrent session processes
health_check_interval: Interval in seconds between health checks
"""
self.max_concurrent_sessions = max_concurrent_sessions
self.health_check_interval = health_check_interval
# Active session processes
self.sessions: Dict[str, SessionProcessInfo] = {}
self.subscription_to_session: Dict[str, str] = {}
# Thread pool for response processing
self.response_executor = ThreadPoolExecutor(max_workers=4, thread_name_prefix="ResponseProcessor")
# Health check task
self.health_check_task = None
self.is_running = False
# Message callbacks
self.detection_result_callback: Optional[Callable] = None
self.error_callback: Optional[Callable] = None
# Store main event loop for async operations from threads
self.main_event_loop = None
logger.info(f"SessionProcessManager initialized (max_sessions={max_concurrent_sessions})")
async def start(self):
"""Start the session process manager."""
if self.is_running:
return
self.is_running = True
# Store the main event loop for use in threads
self.main_event_loop = asyncio.get_running_loop()
logger.info("Starting session process manager")
# Start health check task
self.health_check_task = asyncio.create_task(self._health_check_loop())
# Start response processing for existing sessions
for session_info in self.sessions.values():
self._start_response_processing(session_info)
async def stop(self):
"""Stop the session process manager and cleanup all sessions."""
if not self.is_running:
return
logger.info("Stopping session process manager")
self.is_running = False
# Cancel health check task
if self.health_check_task:
self.health_check_task.cancel()
try:
await self.health_check_task
except asyncio.CancelledError:
pass
# Shutdown all sessions
shutdown_tasks = []
for subscription_identifier in list(self.subscription_to_session.keys()):
# remove_session() expects the subscription identifier, not the internal session_id
task = asyncio.create_task(self.remove_session(subscription_identifier))
shutdown_tasks.append(task)
if shutdown_tasks:
await asyncio.gather(*shutdown_tasks, return_exceptions=True)
# Cleanup thread pool
self.response_executor.shutdown(wait=True)
logger.info("Session process manager stopped")
async def create_session(self, subscription_identifier: str, subscription_config: Dict[str, Any]) -> bool:
"""
Create a new session process for a subscription.
Args:
subscription_identifier: Unique subscription identifier
subscription_config: Subscription configuration
Returns:
True if session was created successfully
"""
session_id = None
try:
# Check if we're at capacity
if len(self.sessions) >= self.max_concurrent_sessions:
logger.warning(f"Cannot create session: at max capacity ({self.max_concurrent_sessions})")
return False
# Check if subscription already has a session
if subscription_identifier in self.subscription_to_session:
existing_session_id = self.subscription_to_session[subscription_identifier]
logger.info(f"Subscription {subscription_identifier} already has session {existing_session_id}")
return True
# Generate unique session ID
session_id = f"session_{int(time.time() * 1000)}_{subscription_identifier.replace(';', '_')}"
logger.info(f"Creating session process for subscription {subscription_identifier}")
logger.info(f"Session ID: {session_id}")
# Create communication queues
command_queue = mp.Queue()
response_queue = mp.Queue()
# Create and start process
process = mp.Process(
target=session_worker_main,
args=(session_id, command_queue, response_queue),
name=f"SessionWorker-{session_id}"
)
process.start()
# Store session information
session_info = SessionProcessInfo(
session_id=session_id,
subscription_identifier=subscription_identifier,
process=process,
command_queue=command_queue,
response_queue=response_queue,
created_at=time.time()
)
self.sessions[session_id] = session_info
self.subscription_to_session[subscription_identifier] = session_id
# Start response processing for this session
self._start_response_processing(session_info)
logger.info(f"Session process created: {session_id} (PID: {process.pid})")
# Initialize the session with configuration
model_config = {
'modelId': subscription_config.get('modelId'),
'modelUrl': subscription_config.get('modelUrl'),
'modelName': subscription_config.get('modelName')
}
init_command = InitializeCommand(
type=MessageType.INITIALIZE,
session_id=session_id,
subscription_config=subscription_config,
model_config=model_config
)
await self._send_command(session_id, init_command)
return True
except Exception as e:
logger.error(f"Failed to create session for {subscription_identifier}: {e}", exc_info=True)
# Cleanup on failure (session_id may still be None if creation failed early)
if session_id and session_id in self.sessions:
await self._cleanup_session(session_id)
return False
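# Illustrative sketch (not part of the commit): the subscription_config shape this
# manager and the worker expect, inferred from the keys read throughout this commit.
# All values below are hypothetical placeholders.
example_subscription_config = {
    'subscriptionIdentifier': 'display-001;cam-001',    # display id is parsed after ';'
    'rtspUrl': 'rtsp://camera.local/stream1',            # preferred source
    'snapshotUrl': 'http://camera.local/snapshot.jpg',   # HTTP fallback
    'snapshotInterval': 2000,                             # milliseconds
    'cropX1': 0, 'cropY1': 0, 'cropX2': 1280, 'cropY2': 720,  # optional crop
    'modelId': 42, 'modelUrl': 'https://example.com/model.mpta', 'modelName': 'Model-42',
}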
async def remove_session(self, subscription_identifier: str) -> bool:
"""
Remove a session process for a subscription.
Args:
subscription_identifier: Subscription identifier to remove
Returns:
True if session was removed successfully
"""
try:
session_id = self.subscription_to_session.get(subscription_identifier)
if not session_id:
logger.warning(f"No session found for subscription {subscription_identifier}")
return False
logger.info(f"Removing session {session_id} for subscription {subscription_identifier}")
session_info = self.sessions.get(session_id)
if session_info:
# Send shutdown command
shutdown_command = ShutdownCommand(session_id=session_id)
await self._send_command(session_id, shutdown_command)
# Wait for graceful shutdown (with timeout)
try:
await asyncio.wait_for(self._wait_for_shutdown(session_info), timeout=10.0)
except asyncio.TimeoutError:
logger.warning(f"Session {session_id} did not shutdown gracefully, terminating")
# Cleanup session
await self._cleanup_session(session_id)
return True
except Exception as e:
logger.error(f"Failed to remove session for {subscription_identifier}: {e}", exc_info=True)
return False
async def process_frame(self, subscription_identifier: str, frame: Any, display_id: str, frame_timestamp: float) -> bool:
"""
Send a frame to the session process for processing.
Args:
subscription_identifier: Subscription identifier
frame: Frame to process
display_id: Display identifier
frame_timestamp: Timestamp of the frame
Returns:
True if frame was sent successfully
"""
try:
session_id = self.subscription_to_session.get(subscription_identifier)
if not session_id:
logger.warning(f"No session found for subscription {subscription_identifier}")
return False
session_info = self.sessions.get(session_id)
if not session_info or not session_info.is_initialized:
logger.warning(f"Session {session_id} not initialized")
return False
# Create process frame command
process_command = ProcessFrameCommand(
session_id=session_id,
frame=frame,
display_id=display_id,
subscription_identifier=subscription_identifier,
frame_timestamp=frame_timestamp
)
await self._send_command(session_id, process_command)
return True
except Exception as e:
logger.error(f"Failed to process frame for {subscription_identifier}: {e}", exc_info=True)
return False
async def set_session_id(self, subscription_identifier: str, backend_session_id: str, display_id: str) -> bool:
"""
Set the backend session ID for a session.
Args:
subscription_identifier: Subscription identifier
backend_session_id: Backend session ID
display_id: Display identifier
Returns:
True if session ID was set successfully
"""
try:
session_id = self.subscription_to_session.get(subscription_identifier)
if not session_id:
logger.warning(f"No session found for subscription {subscription_identifier}")
return False
# Create set session ID command
set_command = SetSessionIdCommand(
session_id=session_id,
backend_session_id=backend_session_id,
display_id=display_id
)
await self._send_command(session_id, set_command)
return True
except Exception as e:
logger.error(f"Failed to set session ID for {subscription_identifier}: {e}", exc_info=True)
return False
def set_detection_result_callback(self, callback: Callable):
"""Set callback for handling detection results."""
self.detection_result_callback = callback
def set_error_callback(self, callback: Callable):
"""Set callback for handling errors."""
self.error_callback = callback
def get_session_count(self) -> int:
"""Get the number of active sessions."""
return len(self.sessions)
def get_session_info(self, subscription_identifier: str) -> Optional[Dict[str, Any]]:
"""Get information about a session."""
session_id = self.subscription_to_session.get(subscription_identifier)
if not session_id:
return None
session_info = self.sessions.get(session_id)
if not session_info:
return None
return {
'session_id': session_id,
'subscription_identifier': subscription_identifier,
'created_at': session_info.created_at,
'is_initialized': session_info.is_initialized,
'processed_frames': session_info.processed_frames,
'process_pid': session_info.process.pid if session_info.process.is_alive() else None,
'is_alive': session_info.process.is_alive()
}
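# Illustrative sketch (not part of the commit): shape of the dict returned by
# get_session_info(); the concrete values below are hypothetical.
#
# {
#     'session_id': 'session_1732500000000_display-001_cam-001',
#     'subscription_identifier': 'display-001;cam-001',
#     'created_at': 1732500000.0,
#     'is_initialized': True,
#     'processed_frames': 1234,
#     'process_pid': 16380,
#     'is_alive': True,
# }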
async def _send_command(self, session_id: str, command):
"""Send command to session process."""
session_info = self.sessions.get(session_id)
if not session_info:
raise ValueError(f"Session {session_id} not found")
serialized = MessageSerializer.serialize_message(command)
session_info.command_queue.put(serialized)
def _start_response_processing(self, session_info: SessionProcessInfo):
"""Start processing responses from a session process."""
def process_responses():
while session_info.session_id in self.sessions and session_info.process.is_alive():
try:
if not session_info.response_queue.empty():
response_data = session_info.response_queue.get(timeout=1.0)
response = MessageSerializer.deserialize_message(response_data)
if self.main_event_loop:
asyncio.run_coroutine_threadsafe(
self._handle_response(session_info.session_id, response),
self.main_event_loop
)
else:
time.sleep(0.01)
except Exception as e:
logger.error(f"Error processing response from {session_info.session_id}: {e}")
self.response_executor.submit(process_responses)
async def _handle_response(self, session_id: str, response):
"""Handle response from session process."""
try:
session_info = self.sessions.get(session_id)
if not session_info:
return
if response.type == MessageType.INITIALIZED:
session_info.is_initialized = response.success
if response.success:
logger.info(f"Session {session_id} initialized successfully")
else:
logger.error(f"Session {session_id} initialization failed: {response.error_message}")
elif response.type == MessageType.DETECTION_RESULT:
session_info.processed_frames += 1
if self.detection_result_callback:
await self.detection_result_callback(session_info.subscription_identifier, response)
elif response.type == MessageType.SESSION_SET:
logger.info(f"Session ID set for {session_id}: {response.backend_session_id}")
elif response.type == MessageType.HEALTH_RESPONSE:
session_info.last_health_check = time.time()
logger.debug(f"Health check for {session_id}: {response.status}")
elif response.type == MessageType.ERROR:
logger.error(f"Error from session {session_id}: {response.error_message}")
if self.error_callback:
await self.error_callback(session_info.subscription_identifier, response)
except Exception as e:
logger.error(f"Error handling response from {session_id}: {e}", exc_info=True)
async def _wait_for_shutdown(self, session_info: SessionProcessInfo):
"""Wait for session process to shutdown gracefully."""
while session_info.process.is_alive():
await asyncio.sleep(0.1)
async def _cleanup_session(self, session_id: str):
"""Cleanup session process and resources."""
try:
session_info = self.sessions.get(session_id)
if not session_info:
return
# Terminate process if still alive
if session_info.process.is_alive():
session_info.process.terminate()
# Wait a bit for graceful termination
await asyncio.sleep(1.0)
if session_info.process.is_alive():
session_info.process.kill()
# Remove from tracking
del self.sessions[session_id]
if session_info.subscription_identifier in self.subscription_to_session:
del self.subscription_to_session[session_info.subscription_identifier]
logger.info(f"Session {session_id} cleaned up")
except Exception as e:
logger.error(f"Error cleaning up session {session_id}: {e}", exc_info=True)
async def _health_check_loop(self):
"""Periodic health check of all session processes."""
while self.is_running:
try:
for session_id in list(self.sessions.keys()):
session_info = self.sessions.get(session_id)
if session_info and session_info.is_initialized:
# Send health check
health_command = HealthCheckCommand(session_id=session_id)
await self._send_command(session_id, health_command)
await asyncio.sleep(self.health_check_interval)
except asyncio.CancelledError:
break
except Exception as e:
logger.error(f"Error in health check loop: {e}", exc_info=True)
await asyncio.sleep(5.0) # Brief pause before retrying
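# Illustrative wiring sketch (not part of the commit): how the FastAPI side is expected
# to drive this manager, based only on the public methods above. The callback signature
# mirrors how _handle_response invokes it; the subscription values are hypothetical.
async def _example_manager_usage() -> None:
    manager = SessionProcessManager(max_concurrent_sessions=20)

    async def on_detection(subscription_identifier: str, response: DetectionResultResponse):
        # In the real application, forward detections to the WebSocket layer here
        logger.info(f"{subscription_identifier}: phase={response.phase}")

    manager.set_detection_result_callback(on_detection)
    await manager.start()
    await manager.create_session(
        'display-001;cam-001',
        {'modelId': 42, 'modelUrl': 'https://example.com/model.mpta',
         'rtspUrl': 'rtsp://camera.local/stream1'}
    )
    # ... frames are read inside the worker process; later:
    await manager.remove_session('display-001;cam-001')
    await manager.stop()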

View file

@ -0,0 +1,813 @@
"""
Session Worker Process - Individual process that handles one session completely.
Each camera/session gets its own dedicated worker process for complete isolation.
"""
import asyncio
import multiprocessing as mp
import time
import logging
import sys
import os
import traceback
import psutil
import threading
import cv2
import requests
from typing import Dict, Any, Optional, Tuple
from pathlib import Path
import numpy as np
from queue import Queue, Empty
# Import core modules
from ..models.manager import ModelManager
from ..detection.pipeline import DetectionPipeline
from ..models.pipeline import PipelineParser
from ..logging.session_logger import PerSessionLogger
from .communication import (
MessageSerializer, MessageType, IPCMessageUnion,
InitializeCommand, ProcessFrameCommand, SetSessionIdCommand,
ShutdownCommand, HealthCheckCommand,
InitializedResponse, DetectionResultResponse, SessionSetResponse,
ShutdownCompleteResponse, HealthResponse, ErrorResponse
)
class IntegratedStreamReader:
"""
Integrated RTSP/HTTP stream reader for session worker processes.
Handles both RTSP streams and HTTP snapshots with automatic failover.
"""
def __init__(self, session_id: str, subscription_config: Dict[str, Any], logger: logging.Logger):
self.session_id = session_id
self.subscription_config = subscription_config
self.logger = logger
# Stream configuration
self.rtsp_url = subscription_config.get('rtspUrl')
self.snapshot_url = subscription_config.get('snapshotUrl')
self.snapshot_interval = subscription_config.get('snapshotInterval', 2000) / 1000.0 # Convert to seconds
# Stream state
self.is_running = False
self.rtsp_cap = None
self.stream_thread = None
self.stop_event = threading.Event()
# Frame buffer - single latest frame only
self.frame_queue = Queue(maxsize=1)
self.last_frame_time = 0
# Stream health monitoring
self.consecutive_errors = 0
self.max_consecutive_errors = 30
self.reconnect_delay = 5.0
self.frame_timeout = 10.0 # Seconds without frame before considered dead
# Crop coordinates if present
self.crop_coords = None
if subscription_config.get('cropX1') is not None:
self.crop_coords = (
subscription_config['cropX1'],
subscription_config['cropY1'],
subscription_config['cropX2'],
subscription_config['cropY2']
)
def start(self) -> bool:
"""Start the stream reading in background thread."""
if self.is_running:
return True
try:
self.is_running = True
self.stop_event.clear()
# Start background thread for stream reading
self.stream_thread = threading.Thread(
target=self._stream_loop,
name=f"StreamReader-{self.session_id}",
daemon=True
)
self.stream_thread.start()
self.logger.info(f"Stream reader started for {self.session_id}")
return True
except Exception as e:
self.logger.error(f"Failed to start stream reader: {e}")
self.is_running = False
return False
def stop(self):
"""Stop the stream reading."""
if not self.is_running:
return
self.logger.info(f"Stopping stream reader for {self.session_id}")
self.is_running = False
self.stop_event.set()
# Close RTSP connection
if self.rtsp_cap:
try:
self.rtsp_cap.release()
except Exception:
pass
self.rtsp_cap = None
# Wait for thread to finish (skip the join when stop() is called from the stream thread itself)
if self.stream_thread and self.stream_thread.is_alive() and self.stream_thread is not threading.current_thread():
self.stream_thread.join(timeout=3.0)
def get_latest_frame(self) -> Optional[Tuple[np.ndarray, str, float]]:
"""Get the latest frame if available. Returns (frame, display_id, timestamp) or None."""
try:
# Non-blocking get - return None if no frame available
frame_data = self.frame_queue.get_nowait()
return frame_data
except Empty:
return None
def _stream_loop(self):
"""Main stream reading loop - runs in background thread."""
self.logger.info(f"Stream loop started for {self.session_id}")
while self.is_running and not self.stop_event.is_set():
try:
if self.rtsp_url:
# Try RTSP first
self._read_rtsp_stream()
elif self.snapshot_url:
# Fallback to HTTP snapshots
self._read_http_snapshots()
else:
self.logger.error("No stream URL configured")
break
except Exception as e:
self.logger.error(f"Error in stream loop: {e}")
self._handle_stream_error()
self.logger.info(f"Stream loop ended for {self.session_id}")
def _read_rtsp_stream(self):
"""Read frames from RTSP stream."""
if not self.rtsp_cap:
self._connect_rtsp()
if not self.rtsp_cap:
return
try:
ret, frame = self.rtsp_cap.read()
if ret and frame is not None:
# Process the frame
processed_frame = self._process_frame(frame)
if processed_frame is not None:
# Extract display ID from subscription identifier
display_id = self.subscription_config['subscriptionIdentifier'].split(';')[-1]
timestamp = time.time()
# Put frame in queue (replace if full)
try:
# Clear queue and put new frame
try:
self.frame_queue.get_nowait()
except Empty:
pass
self.frame_queue.put((processed_frame, display_id, timestamp), timeout=0.1)
self.last_frame_time = timestamp
self.consecutive_errors = 0
except Exception:
pass  # Queue full, skip frame
else:
self._handle_stream_error()
except Exception as e:
self.logger.error(f"Error reading RTSP frame: {e}")
self._handle_stream_error()
def _read_http_snapshots(self):
"""Read frames from HTTP snapshot URL."""
try:
response = requests.get(self.snapshot_url, timeout=10)
response.raise_for_status()
# Convert response to numpy array
img_array = np.asarray(bytearray(response.content), dtype=np.uint8)
frame = cv2.imdecode(img_array, cv2.IMREAD_COLOR)
if frame is not None:
# Process the frame
processed_frame = self._process_frame(frame)
if processed_frame is not None:
# Extract display ID from subscription identifier
display_id = self.subscription_config['subscriptionIdentifier'].split(';')[-1]
timestamp = time.time()
# Put frame in queue (replace if full)
try:
# Clear queue and put new frame
try:
self.frame_queue.get_nowait()
except Empty:
pass
self.frame_queue.put((processed_frame, display_id, timestamp), timeout=0.1)
self.last_frame_time = timestamp
self.consecutive_errors = 0
except Exception:
pass  # Queue full, skip frame
# Wait for next snapshot interval
time.sleep(self.snapshot_interval)
except Exception as e:
self.logger.error(f"Error reading HTTP snapshot: {e}")
self._handle_stream_error()
def _connect_rtsp(self):
"""Connect to RTSP stream."""
try:
self.logger.info(f"Connecting to RTSP: {self.rtsp_url}")
# Create VideoCapture with optimized settings
self.rtsp_cap = cv2.VideoCapture(self.rtsp_url)
# Set buffer size to 1 to reduce latency
self.rtsp_cap.set(cv2.CAP_PROP_BUFFERSIZE, 1)
# Check if connection successful
if self.rtsp_cap.isOpened():
# Test read a frame
ret, frame = self.rtsp_cap.read()
if ret and frame is not None:
self.logger.info(f"RTSP connection successful for {self.session_id}")
self.consecutive_errors = 0
return True
# Connection failed
if self.rtsp_cap:
self.rtsp_cap.release()
self.rtsp_cap = None
except Exception as e:
self.logger.error(f"Failed to connect RTSP: {e}")
return False
def _process_frame(self, frame: np.ndarray) -> Optional[np.ndarray]:
"""Process frame - apply cropping if configured."""
if frame is None:
return None
try:
# Apply crop if configured
if self.crop_coords:
x1, y1, x2, y2 = self.crop_coords
if x1 < x2 and y1 < y2:
frame = frame[y1:y2, x1:x2]
return frame
except Exception as e:
self.logger.error(f"Error processing frame: {e}")
return None
def _handle_stream_error(self):
"""Handle stream errors with reconnection logic."""
self.consecutive_errors += 1
if self.consecutive_errors >= self.max_consecutive_errors:
self.logger.error(f"Too many consecutive errors ({self.consecutive_errors}), stopping stream")
self.stop()
return
# Close current connection
if self.rtsp_cap:
try:
self.rtsp_cap.release()
except Exception:
pass
self.rtsp_cap = None
# Wait before reconnecting
self.logger.warning(f"Stream error #{self.consecutive_errors}, reconnecting in {self.reconnect_delay}s")
time.sleep(self.reconnect_delay)
def is_healthy(self) -> bool:
"""Check if stream is healthy (receiving frames)."""
if not self.is_running:
return False
# Check if we've received a frame recently
if self.last_frame_time > 0:
time_since_frame = time.time() - self.last_frame_time
return time_since_frame < self.frame_timeout
return False
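# Illustrative sketch (not part of the commit): how a worker drives the stream reader.
# The background thread keeps only the newest frame, so callers poll get_latest_frame()
# and treat None as "no new frame yet". The configuration values are hypothetical.
example_reader = IntegratedStreamReader(
    session_id="session_demo",
    subscription_config={'subscriptionIdentifier': 'display-001;cam-001',
                         'rtspUrl': 'rtsp://camera.local/stream1'},
    logger=logging.getLogger("session_worker_demo"),
)
if example_reader.start():
    frame_data = example_reader.get_latest_frame()
    if frame_data is not None:
        frame, display_id, ts = frame_data   # np.ndarray, str, float
    example_reader.stop()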
class SessionWorkerProcess:
"""
Individual session worker process that handles one camera/session completely.
Runs in its own process with isolated memory, models, and state.
"""
def __init__(self, session_id: str, command_queue: mp.Queue, response_queue: mp.Queue):
"""
Initialize session worker process.
Args:
session_id: Unique session identifier
command_queue: Queue to receive commands from main process
response_queue: Queue to send responses back to main process
"""
self.session_id = session_id
self.command_queue = command_queue
self.response_queue = response_queue
# Process information
self.process = None
self.start_time = time.time()
self.processed_frames = 0
# Session components (will be initialized in process)
self.model_manager = None
self.detection_pipeline = None
self.pipeline_parser = None
self.logger = None
self.session_logger = None
self.stream_reader = None
# Session state
self.subscription_config = None
self.model_config = None
self.backend_session_id = None
self.display_id = None
self.is_initialized = False
self.should_shutdown = False
# Frame processing
self.frame_processing_enabled = False
async def run(self):
"""
Main entry point for the worker process.
This method runs in the separate process.
"""
try:
# Set process name for debugging
mp.current_process().name = f"SessionWorker-{self.session_id}"
# Setup basic logging first (enhanced after we get subscription config)
self._setup_basic_logging()
self.logger.info(f"Session worker process started for session {self.session_id}")
self.logger.info(f"Process ID: {os.getpid()}")
# Main message processing loop with integrated frame processing
while not self.should_shutdown:
try:
# Process pending messages
await self._process_pending_messages()
# Process frames if enabled and initialized
if self.frame_processing_enabled and self.is_initialized and self.stream_reader:
await self._process_stream_frames()
# Brief sleep to prevent busy waiting
await asyncio.sleep(0.01)
except Exception as e:
self.logger.error(f"Error in main processing loop: {e}", exc_info=True)
self._send_error_response("main_loop_error", str(e), traceback.format_exc())
except Exception as e:
# Critical error in main run loop
if self.logger:
self.logger.error(f"Critical error in session worker: {e}", exc_info=True)
else:
print(f"Critical error in session worker {self.session_id}: {e}")
finally:
# Cleanup stream reader and per-session logging
if self.stream_reader:
self.stream_reader.stop()
if self.session_logger:
self.session_logger.log_session_end()
self.session_logger.cleanup()
if self.logger:
self.logger.info(f"Session worker process {self.session_id} shutting down")
async def _handle_message(self, message: IPCMessageUnion):
"""
Handle incoming messages from main process.
Args:
message: Deserialized message object
"""
try:
if message.type == MessageType.INITIALIZE:
await self._handle_initialize(message)
elif message.type == MessageType.PROCESS_FRAME:
await self._handle_process_frame(message)
elif message.type == MessageType.SET_SESSION_ID:
await self._handle_set_session_id(message)
elif message.type == MessageType.SHUTDOWN:
await self._handle_shutdown(message)
elif message.type == MessageType.HEALTH_CHECK:
await self._handle_health_check(message)
else:
self.logger.warning(f"Unknown message type: {message.type}")
except Exception as e:
self.logger.error(f"Error handling message {message.type}: {e}", exc_info=True)
self._send_error_response(f"handle_{message.type.value}_error", str(e), traceback.format_exc())
async def _handle_initialize(self, message: InitializeCommand):
"""
Initialize the session with models and pipeline.
Args:
message: Initialize command message
"""
try:
self.logger.info(f"Initializing session {self.session_id}")
self.logger.info(f"Subscription config: {message.subscription_config}")
self.logger.info(f"Model config: {message.model_config}")
# Store configuration
self.subscription_config = message.subscription_config
self.model_config = message.model_config
# Setup enhanced logging now that we have subscription config
self._setup_enhanced_logging()
# Initialize model manager (isolated for this process)
self.model_manager = ModelManager("models")
self.logger.info("Model manager initialized")
# Download and prepare model if needed
model_id = self.model_config.get('modelId')
model_url = self.model_config.get('modelUrl')
model_name = self.model_config.get('modelName', f'Model-{model_id}')
if model_id and model_url:
model_path = self.model_manager.ensure_model(model_id, model_url, model_name)
if not model_path:
raise RuntimeError(f"Failed to download/prepare model {model_id}")
self.logger.info(f"Model {model_id} prepared at {model_path}")
# Log model loading
if self.session_logger:
self.session_logger.log_model_loading(model_id, model_name, str(model_path))
# Load pipeline configuration
self.pipeline_parser = self.model_manager.get_pipeline_config(model_id)
if not self.pipeline_parser:
raise RuntimeError(f"Failed to load pipeline config for model {model_id}")
self.logger.info(f"Pipeline configuration loaded for model {model_id}")
# Initialize detection pipeline (isolated for this session)
self.detection_pipeline = DetectionPipeline(
pipeline_parser=self.pipeline_parser,
model_manager=self.model_manager,
model_id=model_id,
message_sender=None # Will be set to send via IPC
)
# Initialize pipeline components
if not await self.detection_pipeline.initialize():
raise RuntimeError("Failed to initialize detection pipeline")
self.logger.info("Detection pipeline initialized successfully")
# Initialize integrated stream reader
self.logger.info("Initializing integrated stream reader")
self.stream_reader = IntegratedStreamReader(
self.session_id,
self.subscription_config,
self.logger
)
# Start stream reading
if self.stream_reader.start():
self.logger.info("Stream reader started successfully")
self.frame_processing_enabled = True
else:
self.logger.error("Failed to start stream reader")
self.is_initialized = True
# Send success response
response = InitializedResponse(
type=MessageType.INITIALIZED,
session_id=self.session_id,
success=True
)
self._send_response(response)
else:
raise ValueError("Missing required model configuration (modelId, modelUrl)")
except Exception as e:
self.logger.error(f"Failed to initialize session: {e}", exc_info=True)
response = InitializedResponse(
type=MessageType.INITIALIZED,
session_id=self.session_id,
success=False,
error_message=str(e)
)
self._send_response(response)
async def _handle_process_frame(self, message: ProcessFrameCommand):
"""
Process a frame through the detection pipeline.
Args:
message: Process frame command message
"""
if not self.is_initialized:
self._send_error_response("not_initialized", "Session not initialized", None)
return
try:
self.logger.debug(f"Processing frame for display {message.display_id}")
# Process frame through detection pipeline
if self.backend_session_id:
# Processing phase (after session ID is set)
result = await self.detection_pipeline.execute_processing_phase(
frame=message.frame,
display_id=message.display_id,
session_id=self.backend_session_id,
subscription_id=message.subscription_identifier
)
phase = "processing"
else:
# Detection phase (before session ID is set)
result = await self.detection_pipeline.execute_detection_phase(
frame=message.frame,
display_id=message.display_id,
subscription_id=message.subscription_identifier
)
phase = "detection"
self.processed_frames += 1
# Send result back to main process
response = DetectionResultResponse(
session_id=self.session_id,
detections=result,
processing_time=result.get('processing_time', 0.0),
phase=phase
)
self._send_response(response)
except Exception as e:
self.logger.error(f"Error processing frame: {e}", exc_info=True)
self._send_error_response("frame_processing_error", str(e), traceback.format_exc())
async def _handle_set_session_id(self, message: SetSessionIdCommand):
"""
Set the backend session ID for this session.
Args:
message: Set session ID command message
"""
try:
self.logger.info(f"Setting backend session ID: {message.backend_session_id}")
self.backend_session_id = message.backend_session_id
self.display_id = message.display_id
response = SessionSetResponse(
session_id=self.session_id,
success=True,
backend_session_id=message.backend_session_id
)
self._send_response(response)
except Exception as e:
self.logger.error(f"Error setting session ID: {e}", exc_info=True)
self._send_error_response("set_session_id_error", str(e), traceback.format_exc())
async def _handle_shutdown(self, message: ShutdownCommand):
"""
Handle graceful shutdown request.
Args:
message: Shutdown command message
"""
try:
self.logger.info("Received shutdown request")
self.should_shutdown = True
# Cleanup resources
if self.detection_pipeline:
# Add cleanup method to pipeline if needed
pass
response = ShutdownCompleteResponse(session_id=self.session_id)
self._send_response(response)
except Exception as e:
self.logger.error(f"Error during shutdown: {e}", exc_info=True)
async def _handle_health_check(self, message: HealthCheckCommand):
"""
Handle health check request.
Args:
message: Health check command message
"""
try:
# Get process metrics
process = psutil.Process()
memory_info = process.memory_info()
memory_mb = memory_info.rss / (1024 * 1024) # Convert to MB
cpu_percent = process.cpu_percent()
# GPU memory (if available)
gpu_memory_mb = None
try:
import torch
if torch.cuda.is_available():
gpu_memory_mb = torch.cuda.memory_allocated() / (1024 * 1024)
except ImportError:
pass
# Determine health status
status = "healthy"
if memory_mb > 2048: # More than 2GB
status = "degraded"
if memory_mb > 4096: # More than 4GB
status = "unhealthy"
response = HealthResponse(
session_id=self.session_id,
status=status,
memory_usage_mb=memory_mb,
cpu_percent=cpu_percent,
gpu_memory_mb=gpu_memory_mb,
uptime_seconds=time.time() - self.start_time,
processed_frames=self.processed_frames
)
self._send_response(response)
except Exception as e:
self.logger.error(f"Error checking health: {e}", exc_info=True)
self._send_error_response("health_check_error", str(e), traceback.format_exc())
def _send_response(self, response: IPCMessageUnion):
"""
Send response message to main process.
Args:
response: Response message to send
"""
try:
serialized = MessageSerializer.serialize_message(response)
self.response_queue.put(serialized)
except Exception as e:
if self.logger:
self.logger.error(f"Failed to send response: {e}")
def _send_error_response(self, error_type: str, error_message: str, traceback_str: Optional[str]):
"""
Send error response to main process.
Args:
error_type: Type of error
error_message: Error message
traceback_str: Optional traceback string
"""
error_response = ErrorResponse(
type=MessageType.ERROR,
session_id=self.session_id,
error_type=error_type,
error_message=error_message,
traceback=traceback_str
)
self._send_response(error_response)
def _setup_basic_logging(self):
"""
Setup basic logging for this process before we have subscription config.
"""
logging.basicConfig(
level=logging.INFO,
format=f"%(asctime)s [%(levelname)s] SessionWorker-{self.session_id}: %(message)s",
handlers=[
logging.StreamHandler(sys.stdout)
]
)
self.logger = logging.getLogger(f"session_worker_{self.session_id}")
def _setup_enhanced_logging(self):
"""
Setup per-session logging with dedicated log file after we have subscription config.
Phase 2: Enhanced logging with file rotation and session context.
"""
if not self.subscription_config:
return
# Initialize per-session logger
subscription_id = self.subscription_config.get('subscriptionIdentifier', self.session_id)
self.session_logger = PerSessionLogger(
session_id=self.session_id,
subscription_identifier=subscription_id,
log_dir="logs",
max_size_mb=100,
backup_count=5
)
# Get the configured logger (replaces basic logger)
self.logger = self.session_logger.get_logger()
# Log session start
self.session_logger.log_session_start(os.getpid())
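# Note (added commentary, not in the original commit): across the worker lifetime the
# per-session logger is used as PerSessionLogger(...) -> get_logger() ->
# log_session_start(pid) here, log_model_loading(...) during initialization, and
# log_session_end() / cleanup() in the run() finally block.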
async def _process_pending_messages(self):
"""Process pending IPC messages from main process."""
try:
# Process all pending messages
while not self.command_queue.empty():
message_data = self.command_queue.get_nowait()
message = MessageSerializer.deserialize_message(message_data)
await self._handle_message(message)
except Exception as e:
if not self.command_queue.empty():
# Only log error if there was actually a message to process
self.logger.error(f"Error processing messages: {e}", exc_info=True)
async def _process_stream_frames(self):
"""Process frames from the integrated stream reader."""
try:
if not self.stream_reader or not self.stream_reader.is_running:
return
# Get latest frame from stream
frame_data = self.stream_reader.get_latest_frame()
if frame_data is None:
return
frame, display_id, timestamp = frame_data
# Process frame through detection pipeline
subscription_identifier = self.subscription_config['subscriptionIdentifier']
if self.backend_session_id:
# Processing phase (after session ID is set)
result = await self.detection_pipeline.execute_processing_phase(
frame=frame,
display_id=display_id,
session_id=self.backend_session_id,
subscription_id=subscription_identifier
)
phase = "processing"
else:
# Detection phase (before session ID is set)
result = await self.detection_pipeline.execute_detection_phase(
frame=frame,
display_id=display_id,
subscription_id=subscription_identifier
)
phase = "detection"
self.processed_frames += 1
# Send result back to main process
response = DetectionResultResponse(
type=MessageType.DETECTION_RESULT,
session_id=self.session_id,
detections=result,
processing_time=result.get('processing_time', 0.0),
phase=phase
)
self._send_response(response)
# Log frame processing (debug level to avoid spam)
self.logger.debug(f"Processed frame #{self.processed_frames} from {display_id} (phase: {phase})")
except Exception as e:
self.logger.error(f"Error processing stream frame: {e}", exc_info=True)
def session_worker_main(session_id: str, command_queue: mp.Queue, response_queue: mp.Queue):
"""
Main entry point for session worker process.
This function is called when the process is spawned.
"""
# Create worker instance
worker = SessionWorkerProcess(session_id, command_queue, response_queue)
# Run the worker
asyncio.run(worker.run())
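# Illustrative test harness sketch (not part of the commit): exercising a single worker
# process without the SessionProcessManager. Queue payloads use the same serializer the
# manager uses; the subscription/model values are hypothetical placeholders.
if __name__ == "__main__":
    cmd_q: mp.Queue = mp.Queue()
    resp_q: mp.Queue = mp.Queue()
    proc = mp.Process(target=session_worker_main, args=("session_demo", cmd_q, resp_q))
    proc.start()
    init = InitializeCommand(
        type=MessageType.INITIALIZE,
        session_id="session_demo",
        subscription_config={'subscriptionIdentifier': 'display-001;cam-001',
                             'rtspUrl': 'rtsp://camera.local/stream1',
                             'modelId': 42, 'modelUrl': 'https://example.com/model.mpta'},
        model_config={'modelId': 42, 'modelUrl': 'https://example.com/model.mpta',
                      'modelName': 'Model-42'},
    )
    cmd_q.put(MessageSerializer.serialize_message(init))
    # First response should be InitializedResponse (or ErrorResponse on failure)
    print(MessageSerializer.deserialize_message(resp_q.get(timeout=120)))
    cmd_q.put(MessageSerializer.serialize_message(ShutdownCommand(session_id="session_demo")))
    proc.join(timeout=10)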