refactor: halfway to process-per-session
All checks were successful
Build Worker Base and Application Images / check-base-changes (push) Successful in 7s
Build Worker Base and Application Images / build-base (push) Has been skipped
Build Worker Base and Application Images / build-docker (push) Successful in 2m52s
Build Worker Base and Application Images / deploy-stack (push) Successful in 9s
This commit is contained in:
parent 2e5316ca01
commit 34d1982e9e
12 changed files with 2771 additions and 92 deletions
339
IMPLEMENTATION_PLAN.md
Normal file
|
@ -0,0 +1,339 @@
|
|||
# Session-Isolated Multiprocessing Architecture - Implementation Plan
|
||||
|
||||
## 🎯 Objective
|
||||
Eliminate the shared-state issues that cause identical results across different sessions by implementing a **Process-Per-Session architecture** with **per-camera logging**.
|
||||
|
||||
## 🔍 Root Cause Analysis
|
||||
|
||||
### Current Shared State Issues:
|
||||
1. **Shared Model Cache** (`core/models/inference.py:40`): All sessions share the same cached YOLO model instances (see the snippet after this list)
|
||||
2. **Single Pipeline Instance** (`core/detection/pipeline.py`): One pipeline handles all sessions with shared mappings
|
||||
3. **Global Session Mappings**: `session_to_subscription` and `session_processing_results` dictionaries
|
||||
4. **Shared Thread Pool**: Single `ThreadPoolExecutor` for all sessions
|
||||
5. **Global Frame Cache** (`app.py:39`): `latest_frames` shared across endpoints
|
||||
6. **Single Log File**: All cameras write to `detector_worker.log`
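
To make issue 1 concrete, the snippet below shows why a class-level cache leaks across sessions: every instance reads and writes the same dictionary, so two sessions asking for the same weights file get the exact same in-memory model object. The class is a simplified stand-in, not the real `YOLOWrapper`.

```python
# Simplified stand-in for the old YOLOWrapper caching behaviour.
class SharedCacheExample:
    _model_cache = {}  # class attribute: one dict shared by every instance in the process

    def load(self, path: str):
        if path not in self._model_cache:
            self._model_cache[path] = object()  # stand-in for an expensive YOLO model
        return self._model_cache[path]


session_a = SharedCacheExample()
session_b = SharedCacheExample()
# Both "sessions" receive the identical object, which is how results bleed across sessions.
assert session_a.load("car_frontal_detection_v1.pt") is session_b.load("car_frontal_detection_v1.pt")
```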
|
||||
|
||||
## 🏗️ New Architecture: Process-Per-Session
|
||||
|
||||
```
|
||||
FastAPI Main Process (Port 8001)
|
||||
├── WebSocket Handler (manages connections)
|
||||
├── SessionProcessManager (spawns/manages session processes)
|
||||
├── Main Process Logger → detector_worker_main.log
|
||||
├──
|
||||
├── Session Process 1 (Camera/Display 1)
|
||||
│ ├── Dedicated Model Pipeline
|
||||
│ ├── Own Model Cache & Memory
|
||||
│ ├── Session Logger → detector_worker_camera_display-001_cam-001.log
|
||||
│ └── Redis/DB connections
|
||||
├──
|
||||
├── Session Process 2 (Camera/Display 2)
|
||||
│ ├── Dedicated Model Pipeline
|
||||
│ ├── Own Model Cache & Memory
|
||||
│ ├── Session Logger → detector_worker_camera_display-002_cam-001.log
|
||||
│ └── Redis/DB connections
|
||||
└──
|
||||
└── Session Process N...
|
||||
```
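
The diagram above boils down to a small amount of `multiprocessing` plumbing: the manager keeps a registry of processes plus one command/result queue pair per subscription. The sketch below is a deliberately minimal illustration under those assumptions; the real classes live in `core/processes/session_manager.py` and `core/processes/session_worker.py` and add model loading, logging and health checks on top.

```python
# Minimal sketch of the process-per-session skeleton (illustrative names, not the real API).
import multiprocessing as mp
from typing import Dict


def _session_worker(subscription_id: str, command_q: mp.Queue, result_q: mp.Queue) -> None:
    """Entry point of one session process; owns its own models, pipeline and logger."""
    while True:
        command = command_q.get()
        if command is None:  # poison pill -> graceful shutdown
            break
        # The real worker runs the detection pipeline on the frame carried by `command`.
        result_q.put({"subscription": subscription_id, "status": "processed"})


class SessionProcessManagerSketch:
    """Spawns one worker process per subscription and keeps a registry of them."""

    def __init__(self) -> None:
        self.processes: Dict[str, mp.Process] = {}
        self.command_queues: Dict[str, mp.Queue] = {}
        self.result_queues: Dict[str, mp.Queue] = {}

    def create_session(self, subscription_id: str) -> None:
        cmd_q, res_q = mp.Queue(), mp.Queue()
        proc = mp.Process(
            target=_session_worker,
            args=(subscription_id, cmd_q, res_q),
            name=f"SessionWorker-{subscription_id}",
            daemon=True,
        )
        proc.start()
        self.processes[subscription_id] = proc
        self.command_queues[subscription_id] = cmd_q
        self.result_queues[subscription_id] = res_q

    def remove_session(self, subscription_id: str) -> None:
        self.command_queues[subscription_id].put(None)  # ask the worker to exit
        self.processes[subscription_id].join(timeout=5)
```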
|
||||
|
||||
## 📋 Implementation Tasks
|
||||
|
||||
### Phase 1: Core Infrastructure ✅ **COMPLETED**
|
||||
- [x] **Create SessionProcessManager class** ✅
|
||||
- Manages lifecycle of session processes
|
||||
- Handles process spawning, monitoring, and cleanup
|
||||
- Maintains process registry and health checks
|
||||
|
||||
- [x] **Implement SessionWorkerProcess** ✅
|
||||
- Individual process class that handles one session completely
|
||||
- Loads own models, pipeline, and maintains state
|
||||
- Communicates via queues with main process
|
||||
|
||||
- [x] **Design Inter-Process Communication** ✅
|
||||
- Command queue: Main → Session (frames, commands, config)
|
||||
- Result queue: Session → Main (detections, status, errors)
|
||||
- Use `multiprocessing.Queue` for thread-safe communication
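
As a hedged illustration of that queue protocol: the real message classes (`ProcessFrameCommand`, `DetectionResultResponse`, ...) are defined in `core/processes/communication.py`; the stand-ins below are stripped down so the round trip over the two queues can be run on its own.

```python
# Stripped-down round trip over the command/result queues.
import multiprocessing as mp
import time
from dataclasses import dataclass, field

import numpy as np


@dataclass
class ProcessFrame:  # stand-in for ProcessFrameCommand
    subscription_id: str
    frame: np.ndarray
    timestamp: float = field(default_factory=time.time)


@dataclass
class DetectionResult:  # stand-in for DetectionResultResponse
    subscription_id: str
    detections: dict
    processing_time: float


def worker(command_q: mp.Queue, result_q: mp.Queue) -> None:
    while (cmd := command_q.get()) is not None:
        start = time.time()
        detections = {"count": 0}  # the real worker runs YOLO inference on cmd.frame here
        result_q.put(DetectionResult(cmd.subscription_id, detections, time.time() - start))


if __name__ == "__main__":
    command_q, result_q = mp.Queue(), mp.Queue()
    proc = mp.Process(target=worker, args=(command_q, result_q), daemon=True)
    proc.start()

    command_q.put(ProcessFrame("display-001;cam-001", np.zeros((480, 640, 3), dtype=np.uint8)))
    print(result_q.get(timeout=5))  # DetectionResult(subscription_id='display-001;cam-001', ...)
    command_q.put(None)             # shut the worker down
    proc.join()
```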
|
||||
|
||||
**Phase 1 Testing Results:**
|
||||
- ✅ Server starts successfully on port 8001
|
||||
- ✅ WebSocket connections established (10.100.1.3:57488)
|
||||
- ✅ SessionProcessManager initializes (max_sessions=20)
|
||||
- ✅ Multiple session processes created (9 camera subscriptions)
|
||||
- ✅ Individual session processes spawn with unique PIDs (e.g., PID: 16380)
|
||||
- ✅ Session logging shows isolated process names (SessionWorker-session_xxx)
|
||||
- ✅ IPC communication framework functioning
|
||||
|
||||
**What to Look For When Testing:**
|
||||
- Check logs for "SessionProcessManager initialized"
|
||||
- Verify individual session processes: "Session process created: session_xxx (PID: xxxx)"
|
||||
- Monitor process isolation: Each session has unique process name "SessionWorker-session_xxx"
|
||||
- Confirm WebSocket integration: "Session WebSocket integration started"
|
||||
|
||||
### Phase 2: Per-Session Logging ✅ **COMPLETED**
|
||||
- [x] **Implement PerSessionLogger** ✅ (usage sketch after this checklist)
|
||||
- Each session process creates own log file
|
||||
- Format: `detector_worker_camera_{subscription_id}.log`
|
||||
- Include session context in all log messages
|
||||
- Implement log rotation (daily/size-based)
|
||||
|
||||
- [x] **Update Main Process Logging** ✅
|
||||
- Main process logs to `detector_worker_main.log`
|
||||
- Log session process lifecycle events
|
||||
- Track active sessions and resource usage
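
For reference, this is roughly how a session worker is expected to use the `PerSessionLogger` added by this commit in `core/logging/session_logger.py`; the subscription identifier shown is only an example.

```python
# Condensed usage of PerSessionLogger from core/logging/session_logger.py.
import os

from core.logging.session_logger import PerSessionLogger

session_logger = PerSessionLogger(
    session_id="session_001",
    subscription_identifier="display-001;cam-001",  # becomes detector_worker_camera_display-001_cam-001.log
    log_dir="logs",
    max_size_mb=100,   # rotation settings from this plan
    backup_count=5,
)
logger = session_logger.get_logger()

session_logger.log_session_start(process_id=os.getpid())
logger.info("Frame pipeline initialized")
session_logger.log_session_end()
session_logger.cleanup()
```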
|
||||
|
||||
**Phase 2 Testing Results:**
|
||||
- ✅ Main process logs to dedicated file: `logs/detector_worker_main.log`
|
||||
- ✅ Session-specific logger initialization working
|
||||
- ✅ Each camera spawns with unique session worker name: "SessionWorker-session_{unique_id}_{camera_name}"
|
||||
- ✅ Per-session logger ready for file creation (will create files when sessions fully initialize)
|
||||
- ✅ Structured logging with session context in format
|
||||
- ✅ Log rotation capability implemented (100MB max, 5 backups)
|
||||
|
||||
**What to Look For When Testing:**
|
||||
- Check for main process log: `logs/detector_worker_main.log`
|
||||
- Monitor per-session process names in logs: "SessionWorker-session_xxx"
|
||||
- Once sessions initialize fully, look for per-camera log files: `detector_worker_camera_{subscription_id}.log`
|
||||
- Verify session start/end events are logged with timestamps
|
||||
- Check log rotation when files exceed 100MB
|
||||
|
||||
### Phase 3: Model & Pipeline Isolation ✅ **COMPLETED**
|
||||
- [x] **Remove Shared Model Cache** ✅
|
||||
- Eliminated `YOLOWrapper._model_cache` class variable
|
||||
- Each process loads models independently
|
||||
- Memory isolation prevents cross-session contamination
|
||||
|
||||
- [x] **Create Per-Process Pipeline Instances** ✅
|
||||
- Each session process instantiates own `DetectionPipeline`
|
||||
- Removed global pipeline singleton pattern
|
||||
- Session-local `session_to_subscription` mapping
|
||||
|
||||
- [x] **Isolate Session State** ✅
|
||||
- Each process maintains own `session_processing_results`
|
||||
- Session mappings are process-local
|
||||
- Complete state isolation per session
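
A quick way to convince yourself the state really is process-local: each spawned process mutates its own copy of a module-level dict, and neither the other workers nor the parent ever see that change. The dict name below simply mirrors the mapping mentioned above.

```python
# State-isolation check: module-level dicts are per-process, not shared.
import multiprocessing as mp

session_processing_results = {}  # stand-in for the pipeline's per-session mapping


def worker(name: str, out_q: mp.Queue) -> None:
    session_processing_results[name] = "result"   # visible only inside this process
    out_q.put(dict(session_processing_results))


if __name__ == "__main__":
    out_q: mp.Queue = mp.Queue()
    procs = [mp.Process(target=worker, args=(f"session_{i}", out_q)) for i in range(2)]
    for p in procs:
        p.start()
    print([out_q.get() for _ in procs])   # each child reports only its own entry
    for p in procs:
        p.join()
    print(session_processing_results)     # parent's dict is still empty
```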
|
||||
|
||||
**Phase 3 Testing Results:**
|
||||
- ✅ **Zero Shared Cache**: Models log "(ISOLATED)" and "no shared cache!"
|
||||
- ✅ **Individual Model Loading**: Each session loads complete model set independently
|
||||
- `car_frontal_detection_v1.pt` per session
|
||||
- `car_brand_cls_v1.pt` per session
|
||||
- `car_bodytype_cls_v1.pt` per session
|
||||
- ✅ **Pipeline Isolation**: Each session has unique pipeline instance ID
|
||||
- ✅ **Memory Isolation**: Different sessions cannot share model instances
|
||||
- ✅ **State Isolation**: Session mappings are process-local (ISOLATED comments added)
|
||||
|
||||
**What to Look For When Testing:**
|
||||
- Check logs for "(ISOLATED)" on model loading
|
||||
- Verify each session loads models independently: "Loading YOLO model ... (ISOLATED)"
|
||||
- Monitor unique pipeline instance IDs per session
|
||||
- Confirm no shared state between sessions
|
||||
- Look for "Successfully loaded model ... in isolation - no shared cache!"
|
||||
|
||||
### Phase 4: Integrated Stream-Session Architecture 🚧 **IN PROGRESS**
|
||||
|
||||
**Problem Identified:** The frame processing pipeline was not working because two parallel stream systems left a communication gap between frame capture and detection.
|
||||
|
||||
**Root Cause:**
|
||||
- Old RTSP Process Manager capturing frames but not forwarding to session workers
|
||||
- New Session Workers ready for processing but receiving no frames
|
||||
- Architecture mismatch preventing detection despite successful initialization
|
||||
|
||||
**Solution:** Integrate stream reading directly into the session worker processes.
|
||||
|
||||
- [ ] **Integrate RTSP Stream Reading into Session Workers**
|
||||
- Move RTSP stream capture from separate processes into each session worker
|
||||
- Each session worker handles: RTSP connection + frame processing + model inference
|
||||
- Eliminate communication gap between stream capture and detection
|
||||
|
||||
- [ ] **Remove Duplicate Stream Management Systems**
|
||||
- Delete old RTSP Process Manager (`core/streaming/process_manager.py`)
|
||||
- Remove conflicting stream management from main process
|
||||
- Consolidate to single session-worker-only architecture
|
||||
|
||||
- [ ] **Enhanced Session Worker with Stream Integration**
|
||||
- Add RTSP stream reader to `SessionWorkerProcess`
|
||||
- Implement frame buffer queue management per worker
|
||||
- Add connection recovery and stream health monitoring per session
|
||||
|
||||
- [ ] **Complete End-to-End Isolation per Camera**
|
||||
```
|
||||
Session Worker Process N:
|
||||
├── RTSP Stream Reader (rtsp://cameraN)
|
||||
├── Frame Buffer Queue
|
||||
├── YOLO Detection Pipeline
|
||||
├── Model Cache (isolated)
|
||||
├── Database/Redis connections
|
||||
└── Per-camera Logger
|
||||
```
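
One possible shape of the integrated stream loop inside a session worker is sketched below; the function name, frame-buffer handling and reconnect policy are illustrative, not the final Phase 4 API.

```python
# Illustrative RTSP read loop for a session worker: capture, keep the newest frame, reconnect on drop.
import queue
import time

import cv2


def run_stream_loop(rtsp_url: str, frame_buffer: queue.Queue, reconnect_delay: float = 5.0) -> None:
    while True:
        cap = cv2.VideoCapture(rtsp_url)
        if not cap.isOpened():
            time.sleep(reconnect_delay)    # per-camera reconnection logic
            continue
        while True:
            ok, frame = cap.read()
            if not ok:
                break                      # stream dropped -> release and reconnect
            if frame_buffer.full():
                frame_buffer.get_nowait()  # drop the stale frame, keep only the newest
            frame_buffer.put(frame)
        cap.release()
```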
|
||||
|
||||
**Benefits for 20+ Cameras:**
|
||||
- **Python GIL Bypass**: True parallelism with multiprocessing
|
||||
- **Resource Isolation**: Process crashes don't affect other cameras
|
||||
- **Memory Distribution**: Each process has own memory space
|
||||
- **Independent Recovery**: Per-camera reconnection logic
|
||||
- **Scalable Architecture**: Linear scaling with available CPU cores
|
||||
|
||||
### Phase 5: Resource Management & Cleanup
|
||||
- [ ] **Process Lifecycle Management**
|
||||
- Automatic process cleanup on WebSocket disconnect
|
||||
- Graceful shutdown handling
|
||||
- Resource deallocation on process termination
|
||||
|
||||
- [ ] **Memory & GPU Management**
|
||||
- Monitor per-process memory usage
|
||||
- GPU memory isolation between sessions
|
||||
- Prevent memory leaks in long-running processes
|
||||
|
||||
- [ ] **Health Monitoring**
|
||||
- Process health checks and restart capability
|
||||
- Performance metrics per session process
|
||||
- Resource usage monitoring and alerting
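
A hypothetical shape for that health-check loop, assuming the manager exposes the registry from the Phase 1 sketch (`processes`, `create_session`, `remove_session`); the restart policy here is deliberately naive.

```python
# Hypothetical health monitor: restart any session process that has died.
import asyncio


async def monitor_sessions(manager, interval_s: float = 10.0) -> None:
    while True:
        for subscription_id, proc in list(manager.processes.items()):
            if not proc.is_alive():
                manager.remove_session(subscription_id)  # drop the dead entry
                manager.create_session(subscription_id)  # spawn a fresh worker
        await asyncio.sleep(interval_s)
```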
|
||||
|
||||
## 🔄 What Will Be Replaced
|
||||
|
||||
### Files to Modify:
|
||||
1. **`app.py`**
|
||||
- Replace direct pipeline execution with process management
|
||||
- Remove global `latest_frames` cache
|
||||
- Add SessionProcessManager integration
|
||||
|
||||
2. **`core/models/inference.py`**
|
||||
- Remove shared `_model_cache` class variable
|
||||
- Make model loading process-specific
|
||||
- Eliminate cross-session model sharing
|
||||
|
||||
3. **`core/detection/pipeline.py`**
|
||||
- Remove global session mappings
|
||||
- Make pipeline instance session-specific
|
||||
- Isolate processing state per session
|
||||
|
||||
4. **`core/communication/websocket.py`**
|
||||
- Replace direct pipeline calls with IPC
|
||||
- Add process spawn/cleanup on subscribe/unsubscribe
|
||||
- Implement queue-based communication
|
||||
|
||||
### New Files to Create:
|
||||
1. **`core/processes/session_manager.py`**
|
||||
- SessionProcessManager class
|
||||
- Process lifecycle management
|
||||
- Health monitoring and cleanup
|
||||
|
||||
2. **`core/processes/session_worker.py`**
|
||||
- SessionWorkerProcess class
|
||||
- Individual session process implementation
|
||||
- Model loading and pipeline execution
|
||||
|
||||
3. **`core/processes/communication.py`**
|
||||
- IPC message definitions and handlers
|
||||
- Queue management utilities
|
||||
- Protocol for main ↔ session communication
|
||||
|
||||
4. **`core/logging/session_logger.py`**
|
||||
- Per-session logging configuration
|
||||
- Log file management and rotation
|
||||
- Structured logging with session context
|
||||
|
||||
## ❌ What Will Be Removed
|
||||
|
||||
### Code to Remove:
|
||||
1. **Shared State Variables**
|
||||
```python
|
||||
# From core/models/inference.py
|
||||
_model_cache: Dict[str, Any] = {}
|
||||
|
||||
# From core/detection/pipeline.py
|
||||
self.session_to_subscription = {}
|
||||
self.session_processing_results = {}
|
||||
|
||||
# From app.py
|
||||
latest_frames = {}
|
||||
```
|
||||
|
||||
2. **Global Singleton Patterns**
|
||||
- Single pipeline instance handling all sessions
|
||||
- Shared ThreadPoolExecutor across sessions
|
||||
- Global model manager for all subscriptions
|
||||
|
||||
3. **Cross-Session Dependencies**
|
||||
- Session mapping lookups across different subscriptions
|
||||
- Shared processing state between unrelated sessions
|
||||
- Global frame caching across all cameras
|
||||
|
||||
## 🔧 Configuration Changes
|
||||
|
||||
### New Configuration Options:
|
||||
```json
|
||||
{
|
||||
"session_processes": {
|
||||
"max_concurrent_sessions": 20,
|
||||
"process_cleanup_timeout": 30,
|
||||
"health_check_interval": 10,
|
||||
"log_rotation": {
|
||||
"max_size_mb": 100,
|
||||
"backup_count": 5
|
||||
}
|
||||
},
|
||||
"resource_limits": {
|
||||
"memory_per_process_mb": 2048,
|
||||
"gpu_memory_fraction": 0.3
|
||||
}
|
||||
}
|
||||
```
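
How a session worker might consume these keys is sketched below; the file name `config.json` and the use of `torch.cuda.set_per_process_memory_fraction` for the GPU cap are assumptions, not part of this commit.

```python
# Sketch: load the new config section and apply the per-process GPU memory cap.
import json

import torch


def apply_resource_limits(config_path: str = "config.json") -> dict:
    with open(config_path) as f:
        config = json.load(f)

    limits = config.get("resource_limits", {})
    if torch.cuda.is_available() and "gpu_memory_fraction" in limits:
        # Cap this worker's share of GPU memory (e.g. 0.3 of device 0).
        torch.cuda.set_per_process_memory_fraction(limits["gpu_memory_fraction"], device=0)

    return config.get("session_processes", {})
```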
|
||||
|
||||
## 📊 Benefits of New Architecture
|
||||
|
||||
### 🛡️ Complete Isolation:
|
||||
- **Memory Isolation**: Each session runs in separate process memory space
|
||||
- **Model Isolation**: No shared model cache between sessions
|
||||
- **State Isolation**: Session mappings and processing state are process-local
|
||||
- **Error Isolation**: Process crashes don't affect other sessions
|
||||
|
||||
### 📈 Performance Improvements:
|
||||
- **True Parallelism**: Bypass Python GIL limitations
|
||||
- **Resource Optimization**: Each process uses only required resources
|
||||
- **Scalability**: Linear scaling with available CPU cores
|
||||
- **Memory Efficiency**: Automatic cleanup on session termination
|
||||
|
||||
### 🔍 Enhanced Monitoring:
|
||||
- **Per-Camera Logs**: Dedicated log file for each session
|
||||
- **Resource Tracking**: Monitor CPU/memory per session process
|
||||
- **Debugging**: Isolated logs make issue diagnosis easier
|
||||
- **Audit Trail**: Complete processing history per camera
|
||||
|
||||
### 🚀 Operational Benefits:
|
||||
- **Zero Cross-Session Contamination**: Impossible for sessions to affect each other
|
||||
- **Hot Restart**: Individual session restart without affecting others
|
||||
- **Resource Control**: Fine-grained resource allocation per session
|
||||
- **Development**: Easier testing and debugging of individual sessions
|
||||
|
||||
## 🎬 Implementation Order
|
||||
|
||||
1. **Phase 1**: Core infrastructure (SessionProcessManager, IPC)
|
||||
2. **Phase 2**: Per-session logging system
|
||||
3. **Phase 3**: Model and pipeline isolation
|
||||
4. **Phase 4**: Integrated stream-session architecture (stream reading inside session workers)
5. **Phase 5**: Resource management and monitoring
|
||||
|
||||
## 🧪 Testing Strategy
|
||||
|
||||
1. **Unit Tests**: Test individual session processes in isolation (see the sketch after this list)
|
||||
2. **Integration Tests**: Test main ↔ session process communication
|
||||
3. **Load Tests**: Multiple concurrent sessions with different models
|
||||
4. **Memory Tests**: Verify no cross-session memory leaks
|
||||
5. **Logging Tests**: Verify correct log file creation and rotation
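
For item 1, a hypothetical pytest sketch that drives a worker through its queues without the FastAPI process; it uses a trivial echo worker, and the real `SessionWorkerProcess` would be exercised the same way.

```python
# Hypothetical unit test: round trip through a worker's command/result queues.
import multiprocessing as mp


def echo_worker(command_q: mp.Queue, result_q: mp.Queue) -> None:
    while (cmd := command_q.get()) is not None:
        result_q.put({"echo": cmd})


def test_worker_round_trip():
    command_q, result_q = mp.Queue(), mp.Queue()
    proc = mp.Process(target=echo_worker, args=(command_q, result_q), daemon=True)
    proc.start()
    try:
        command_q.put({"frame_id": 1})
        assert result_q.get(timeout=5) == {"echo": {"frame_id": 1}}
    finally:
        command_q.put(None)
        proc.join(timeout=5)
        assert not proc.is_alive()
```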
|
||||
|
||||
## 📝 Migration Checklist
|
||||
|
||||
- [ ] Backup current working version
|
||||
- [ ] Implement Phase 1 (core infrastructure)
|
||||
- [ ] Test with single session process
|
||||
- [ ] Implement Phase 2 (logging)
|
||||
- [ ] Test with multiple concurrent sessions
|
||||
- [ ] Implement Phase 3 (isolation)
|
||||
- [ ] Verify complete elimination of shared state
|
||||
- [ ] Implement Phase 4 (resource management)
|
||||
- [ ] Performance testing and optimization
|
||||
- [ ] Documentation updates
|
||||
|
||||
---
|
||||
|
||||
**Expected Outcome**: Complete elimination of cross-session result contamination with enhanced monitoring capabilities and true session isolation.
|
14
app.py
|
@ -22,15 +22,11 @@ if __name__ != "__main__": # When imported by uvicorn
|
|||
from core.communication.websocket import websocket_endpoint
|
||||
from core.communication.state import worker_state
|
||||
|
||||
# Configure logging
|
||||
logging.basicConfig(
|
||||
level=logging.DEBUG,
|
||||
format="%(asctime)s [%(levelname)s] %(name)s: %(message)s",
|
||||
handlers=[
|
||||
logging.FileHandler("detector_worker.log"),
|
||||
logging.StreamHandler()
|
||||
]
|
||||
)
|
||||
# Import and setup main process logging
|
||||
from core.logging.session_logger import setup_main_process_logging
|
||||
|
||||
# Configure main process logging
|
||||
setup_main_process_logging("logs")
|
||||
|
||||
logger = logging.getLogger("detector_worker")
|
||||
logger.setLevel(logging.DEBUG)
|
||||
|
|
319
core/communication/session_integration.py
Normal file
|
@ -0,0 +1,319 @@
|
|||
"""
|
||||
Integration layer between WebSocket handler and Session Process Manager.
|
||||
Bridges the existing WebSocket protocol with the new session-based architecture.
|
||||
"""
|
||||
|
||||
import asyncio
|
||||
import logging
|
||||
from typing import Dict, Any, Optional
|
||||
import numpy as np
|
||||
|
||||
from ..processes.session_manager import SessionProcessManager
|
||||
from ..processes.communication import DetectionResultResponse, ErrorResponse
|
||||
from .state import worker_state
|
||||
from .messages import serialize_outgoing_message
|
||||
# Streaming is now handled directly by session workers - no shared stream manager needed
|
||||
|
||||
logger = logging.getLogger(__name__)
|
||||
|
||||
|
||||
class SessionWebSocketIntegration:
|
||||
"""
|
||||
Integration layer that connects WebSocket protocol with Session Process Manager.
|
||||
Maintains compatibility with existing WebSocket message handling.
|
||||
"""
|
||||
|
||||
def __init__(self, websocket_handler=None):
|
||||
"""
|
||||
Initialize session WebSocket integration.
|
||||
|
||||
Args:
|
||||
websocket_handler: Reference to WebSocket handler for sending messages
|
||||
"""
|
||||
self.websocket_handler = websocket_handler
|
||||
self.session_manager = SessionProcessManager()
|
||||
|
||||
# Track active subscriptions for compatibility
|
||||
self.active_subscriptions: Dict[str, Dict[str, Any]] = {}
|
||||
|
||||
# Set up callbacks
|
||||
self.session_manager.set_detection_result_callback(self._on_detection_result)
|
||||
self.session_manager.set_error_callback(self._on_session_error)
|
||||
|
||||
async def start(self):
|
||||
"""Start the session integration."""
|
||||
await self.session_manager.start()
|
||||
logger.info("Session WebSocket integration started")
|
||||
|
||||
async def stop(self):
|
||||
"""Stop the session integration."""
|
||||
await self.session_manager.stop()
|
||||
logger.info("Session WebSocket integration stopped")
|
||||
|
||||
async def handle_set_subscription_list(self, message) -> bool:
|
||||
"""
|
||||
Handle setSubscriptionList message by managing session processes.
|
||||
|
||||
Args:
|
||||
message: SetSubscriptionListMessage
|
||||
|
||||
Returns:
|
||||
True if successful
|
||||
"""
|
||||
try:
|
||||
logger.info(f"Processing subscription list with {len(message.subscriptions)} subscriptions")
|
||||
|
||||
new_subscription_ids = set()
|
||||
for subscription in message.subscriptions:
|
||||
subscription_id = subscription.subscriptionIdentifier
|
||||
new_subscription_ids.add(subscription_id)
|
||||
|
||||
# Check if this is a new subscription
|
||||
if subscription_id not in self.active_subscriptions:
|
||||
logger.info(f"Creating new session for subscription: {subscription_id}")
|
||||
|
||||
# Convert subscription to configuration dict
|
||||
subscription_config = {
|
||||
'subscriptionIdentifier': subscription.subscriptionIdentifier,
|
||||
'rtspUrl': getattr(subscription, 'rtspUrl', None),
|
||||
'snapshotUrl': getattr(subscription, 'snapshotUrl', None),
|
||||
'snapshotInterval': getattr(subscription, 'snapshotInterval', 5000),
|
||||
'modelUrl': subscription.modelUrl,
|
||||
'modelId': subscription.modelId,
|
||||
'modelName': subscription.modelName,
|
||||
'cropX1': subscription.cropX1,
|
||||
'cropY1': subscription.cropY1,
|
||||
'cropX2': subscription.cropX2,
|
||||
'cropY2': subscription.cropY2
|
||||
}
|
||||
|
||||
# Create session process
|
||||
success = await self.session_manager.create_session(
|
||||
subscription_id, subscription_config
|
||||
)
|
||||
|
||||
if success:
|
||||
self.active_subscriptions[subscription_id] = subscription_config
|
||||
logger.info(f"Session created successfully for {subscription_id}")
|
||||
|
||||
# Stream handling is now integrated into session worker process
|
||||
else:
|
||||
logger.error(f"Failed to create session for {subscription_id}")
|
||||
return False
|
||||
|
||||
else:
|
||||
# Update existing subscription configuration if needed
|
||||
self.active_subscriptions[subscription_id].update({
|
||||
'modelUrl': subscription.modelUrl,
|
||||
'modelId': subscription.modelId,
|
||||
'modelName': subscription.modelName,
|
||||
'cropX1': subscription.cropX1,
|
||||
'cropY1': subscription.cropY1,
|
||||
'cropX2': subscription.cropX2,
|
||||
'cropY2': subscription.cropY2
|
||||
})
|
||||
|
||||
# Remove sessions for subscriptions that are no longer active
|
||||
current_subscription_ids = set(self.active_subscriptions.keys())
|
||||
removed_subscriptions = current_subscription_ids - new_subscription_ids
|
||||
|
||||
for subscription_id in removed_subscriptions:
|
||||
logger.info(f"Removing session for subscription: {subscription_id}")
|
||||
await self.session_manager.remove_session(subscription_id)
|
||||
del self.active_subscriptions[subscription_id]
|
||||
|
||||
# Update worker state for compatibility
|
||||
worker_state.set_subscriptions(message.subscriptions)
|
||||
|
||||
logger.info(f"Subscription list processed: {len(new_subscription_ids)} active sessions")
|
||||
return True
|
||||
|
||||
except Exception as e:
|
||||
logger.error(f"Error handling subscription list: {e}", exc_info=True)
|
||||
return False
|
||||
|
||||
async def handle_set_session_id(self, message) -> bool:
|
||||
"""
|
||||
Handle setSessionId message by forwarding to appropriate session process.
|
||||
|
||||
Args:
|
||||
message: SetSessionIdMessage
|
||||
|
||||
Returns:
|
||||
True if successful
|
||||
"""
|
||||
try:
|
||||
display_id = message.payload.displayIdentifier
|
||||
session_id = message.payload.sessionId
|
||||
|
||||
logger.info(f"Setting session ID {session_id} for display {display_id}")
|
||||
|
||||
# Find subscription identifier for this display
|
||||
subscription_id = None
|
||||
for sub_id in self.active_subscriptions.keys():
|
||||
# Extract display identifier from subscription identifier
|
||||
if display_id in sub_id:
|
||||
subscription_id = sub_id
|
||||
break
|
||||
|
||||
if not subscription_id:
|
||||
logger.error(f"No active subscription found for display {display_id}")
|
||||
return False
|
||||
|
||||
# Forward to session process
|
||||
success = await self.session_manager.set_session_id(
|
||||
subscription_id, str(session_id), display_id
|
||||
)
|
||||
|
||||
if success:
|
||||
# Update worker state for compatibility
|
||||
worker_state.set_session_id(display_id, session_id)
|
||||
logger.info(f"Session ID {session_id} set successfully for {display_id}")
|
||||
else:
|
||||
logger.error(f"Failed to set session ID {session_id} for {display_id}")
|
||||
|
||||
return success
|
||||
|
||||
except Exception as e:
|
||||
logger.error(f"Error setting session ID: {e}", exc_info=True)
|
||||
return False
|
||||
|
||||
async def process_frame(self, subscription_id: str, frame: np.ndarray, display_id: str, timestamp: float = None) -> bool:
|
||||
"""
|
||||
Process frame through appropriate session process.
|
||||
|
||||
Args:
|
||||
subscription_id: Subscription identifier
|
||||
frame: Frame to process
|
||||
display_id: Display identifier
|
||||
timestamp: Frame timestamp
|
||||
|
||||
Returns:
|
||||
True if frame was processed successfully
|
||||
"""
|
||||
try:
|
||||
if timestamp is None:
|
||||
timestamp = asyncio.get_event_loop().time()
|
||||
|
||||
# Forward frame to session process
|
||||
success = await self.session_manager.process_frame(
|
||||
subscription_id, frame, display_id, timestamp
|
||||
)
|
||||
|
||||
if not success:
|
||||
logger.warning(f"Failed to process frame for subscription {subscription_id}")
|
||||
|
||||
return success
|
||||
|
||||
except Exception as e:
|
||||
logger.error(f"Error processing frame for {subscription_id}: {e}", exc_info=True)
|
||||
return False
|
||||
|
||||
async def _on_detection_result(self, subscription_id: str, response: DetectionResultResponse):
|
||||
"""
|
||||
Handle detection result from session process.
|
||||
|
||||
Args:
|
||||
subscription_id: Subscription identifier
|
||||
response: Detection result response
|
||||
"""
|
||||
try:
|
||||
logger.debug(f"Received detection result from {subscription_id}: phase={response.phase}")
|
||||
|
||||
# Send imageDetection message via WebSocket (if needed)
|
||||
if self.websocket_handler and hasattr(self.websocket_handler, 'send_message'):
|
||||
from .models import ImageDetectionMessage, DetectionData
|
||||
|
||||
# Convert response detections to the expected format
|
||||
# The DetectionData expects modelId and modelName, and detection dict
|
||||
detection_data = DetectionData(
|
||||
detection=response.detections,
|
||||
modelId=getattr(response, 'model_id', 0), # Get from response if available
|
||||
modelName=getattr(response, 'model_name', 'unknown') # Get from response if available
|
||||
)
|
||||
|
||||
# Convert timestamp to string format if it exists
|
||||
timestamp_str = None
|
||||
if hasattr(response, 'timestamp') and response.timestamp:
|
||||
from datetime import datetime
|
||||
if isinstance(response.timestamp, (int, float)):
|
||||
# Convert Unix timestamp to ISO format string
|
||||
timestamp_str = datetime.fromtimestamp(response.timestamp).strftime("%Y-%m-%dT%H:%M:%S.%fZ")
|
||||
else:
|
||||
timestamp_str = str(response.timestamp)
|
||||
|
||||
detection_message = ImageDetectionMessage(
|
||||
subscriptionIdentifier=subscription_id,
|
||||
data=detection_data,
|
||||
timestamp=timestamp_str
|
||||
)
|
||||
|
||||
serialized = serialize_outgoing_message(detection_message)
|
||||
await self.websocket_handler.send_message(serialized)
|
||||
|
||||
except Exception as e:
|
||||
logger.error(f"Error handling detection result from {subscription_id}: {e}", exc_info=True)
|
||||
|
||||
async def _on_session_error(self, subscription_id: str, error_response: ErrorResponse):
|
||||
"""
|
||||
Handle error from session process.
|
||||
|
||||
Args:
|
||||
subscription_id: Subscription identifier
|
||||
error_response: Error response
|
||||
"""
|
||||
logger.error(f"Session error from {subscription_id}: {error_response.error_type} - {error_response.error_message}")
|
||||
|
||||
# Send error message via WebSocket if needed
|
||||
if self.websocket_handler and hasattr(self.websocket_handler, 'send_message'):
|
||||
error_message = {
|
||||
'type': 'sessionError',
|
||||
'payload': {
|
||||
'subscriptionIdentifier': subscription_id,
|
||||
'errorType': error_response.error_type,
|
||||
'errorMessage': error_response.error_message,
|
||||
'timestamp': error_response.timestamp
|
||||
}
|
||||
}
|
||||
|
||||
try:
|
||||
serialized = serialize_outgoing_message(error_message)
|
||||
await self.websocket_handler.send_message(serialized)
|
||||
except Exception as e:
|
||||
logger.error(f"Failed to send error message: {e}")
|
||||
|
||||
def get_session_stats(self) -> Dict[str, Any]:
|
||||
"""
|
||||
Get statistics about active sessions.
|
||||
|
||||
Returns:
|
||||
Dictionary with session statistics
|
||||
"""
|
||||
return {
|
||||
'active_sessions': self.session_manager.get_session_count(),
|
||||
'max_sessions': self.session_manager.max_concurrent_sessions,
|
||||
'subscriptions': list(self.active_subscriptions.keys())
|
||||
}
|
||||
|
||||
async def handle_progression_stage(self, message) -> bool:
|
||||
"""
|
||||
Handle setProgressionStage message.
|
||||
|
||||
Args:
|
||||
message: SetProgressionStageMessage
|
||||
|
||||
Returns:
|
||||
True if successful
|
||||
"""
|
||||
try:
|
||||
# For now, just update worker state for compatibility
|
||||
# In future phases, this could be forwarded to session processes
|
||||
worker_state.set_progression_stage(
|
||||
message.payload.displayIdentifier,
|
||||
message.payload.progressionStage
|
||||
)
|
||||
return True
|
||||
except Exception as e:
|
||||
logger.error(f"Error handling progression stage: {e}", exc_info=True)
|
||||
return False
|
||||
|
|
@ -24,6 +24,7 @@ from .state import worker_state, SystemMetrics
|
|||
from ..models import ModelManager
|
||||
from ..streaming.manager import shared_stream_manager
|
||||
from ..tracking.integration import TrackingPipelineIntegration
|
||||
from .session_integration import SessionWebSocketIntegration
|
||||
|
||||
logger = logging.getLogger(__name__)
|
||||
|
||||
|
@ -48,6 +49,9 @@ class WebSocketHandler:
|
|||
self._heartbeat_count = 0
|
||||
self._last_processed_models: set = set() # Cache of last processed model IDs
|
||||
|
||||
# Initialize session integration
|
||||
self.session_integration = SessionWebSocketIntegration(self)
|
||||
|
||||
async def handle_connection(self) -> None:
|
||||
"""
|
||||
Main connection handler that manages the WebSocket lifecycle.
|
||||
|
@ -66,14 +70,16 @@ class WebSocketHandler:
|
|||
# Send immediate heartbeat to show connection is alive
|
||||
await self._send_immediate_heartbeat()
|
||||
|
||||
# Start background tasks (matching original architecture)
|
||||
stream_task = asyncio.create_task(self._process_streams())
|
||||
# Start session integration
|
||||
await self.session_integration.start()
|
||||
|
||||
# Start background tasks - stream processing now handled by session workers
|
||||
heartbeat_task = asyncio.create_task(self._send_heartbeat())
|
||||
message_task = asyncio.create_task(self._handle_messages())
|
||||
|
||||
logger.info(f"WebSocket background tasks started for {client_info} (stream + heartbeat + message handler)")
|
||||
logger.info(f"WebSocket background tasks started for {client_info} (heartbeat + message handler)")
|
||||
|
||||
# Wait for heartbeat and message tasks (stream runs independently)
|
||||
# Wait for heartbeat and message tasks
|
||||
await asyncio.gather(heartbeat_task, message_task)
|
||||
|
||||
except Exception as e:
|
||||
|
@ -87,6 +93,11 @@ class WebSocketHandler:
|
|||
await stream_task
|
||||
except asyncio.CancelledError:
|
||||
logger.debug(f"Stream task cancelled for {client_info}")
|
||||
|
||||
# Stop session integration
|
||||
if hasattr(self, 'session_integration'):
|
||||
await self.session_integration.stop()
|
||||
|
||||
await self._cleanup()
|
||||
|
||||
async def _send_immediate_heartbeat(self) -> None:
|
||||
|
@ -180,11 +191,11 @@ class WebSocketHandler:
|
|||
|
||||
try:
|
||||
if message_type == MessageTypes.SET_SUBSCRIPTION_LIST:
|
||||
await self._handle_set_subscription_list(message)
|
||||
await self.session_integration.handle_set_subscription_list(message)
|
||||
elif message_type == MessageTypes.SET_SESSION_ID:
|
||||
await self._handle_set_session_id(message)
|
||||
await self.session_integration.handle_set_session_id(message)
|
||||
elif message_type == MessageTypes.SET_PROGRESSION_STAGE:
|
||||
await self._handle_set_progression_stage(message)
|
||||
await self.session_integration.handle_progression_stage(message)
|
||||
elif message_type == MessageTypes.REQUEST_STATE:
|
||||
await self._handle_request_state(message)
|
||||
elif message_type == MessageTypes.PATCH_SESSION_RESULT:
|
||||
|
@ -619,31 +630,108 @@ class WebSocketHandler:
|
|||
logger.error(f"Failed to send WebSocket message: {e}")
|
||||
raise
|
||||
|
||||
async def send_message(self, message) -> None:
|
||||
"""Public method to send messages (used by session integration)."""
|
||||
await self._send_message(message)
|
||||
|
||||
# DEPRECATED: Stream processing is now handled directly by session worker processes
|
||||
async def _process_streams(self) -> None:
|
||||
"""
|
||||
Stream processing task that handles frame processing and detection.
|
||||
This is a placeholder for Phase 2 - currently just logs that it's running.
|
||||
DEPRECATED: Stream processing task that handles frame processing and detection.
|
||||
Stream processing is now integrated directly into session worker processes.
|
||||
"""
|
||||
logger.info("DEPRECATED: Stream processing task - now handled by session workers")
|
||||
return # Exit immediately - no longer needed
|
||||
|
||||
# OLD CODE (disabled):
|
||||
logger.info("Stream processing task started")
|
||||
try:
|
||||
while self.connected:
|
||||
# Get current subscriptions
|
||||
subscriptions = worker_state.get_all_subscriptions()
|
||||
|
||||
# TODO: Phase 2 - Add actual frame processing logic here
|
||||
# This will include:
|
||||
# - Frame reading from RTSP/HTTP streams
|
||||
# - Model inference using loaded pipelines
|
||||
# - Detection result sending via WebSocket
|
||||
if not subscriptions:
|
||||
await asyncio.sleep(0.5)
|
||||
continue
|
||||
|
||||
# Process frames for each subscription
|
||||
for subscription in subscriptions:
|
||||
await self._process_subscription_frames(subscription)
|
||||
|
||||
# Sleep to prevent excessive CPU usage (similar to old poll_interval)
|
||||
await asyncio.sleep(0.1) # 100ms polling interval
|
||||
await asyncio.sleep(0.25) # 250ms polling interval
|
||||
|
||||
except asyncio.CancelledError:
|
||||
logger.info("Stream processing task cancelled")
|
||||
except Exception as e:
|
||||
logger.error(f"Error in stream processing: {e}", exc_info=True)
|
||||
|
||||
async def _process_subscription_frames(self, subscription) -> None:
|
||||
"""
|
||||
Process frames for a single subscription by getting frames from stream manager
|
||||
and forwarding them to the appropriate session worker.
|
||||
"""
|
||||
try:
|
||||
subscription_id = subscription.subscriptionIdentifier
|
||||
|
||||
# Get the latest frame from the stream manager
|
||||
frame_data = await self._get_frame_from_stream_manager(subscription)
|
||||
|
||||
if frame_data and frame_data['frame'] is not None:
|
||||
# Extract display identifier (format: "test1;Dispenser Camera 1")
|
||||
display_id = subscription_id.split(';')[-1] if ';' in subscription_id else subscription_id
|
||||
|
||||
# Forward frame to session worker via session integration
|
||||
success = await self.session_integration.process_frame(
|
||||
subscription_id=subscription_id,
|
||||
frame=frame_data['frame'],
|
||||
display_id=display_id,
|
||||
timestamp=frame_data.get('timestamp', asyncio.get_event_loop().time())
|
||||
)
|
||||
|
||||
if success:
|
||||
logger.debug(f"[Frame Processing] Sent frame to session worker for {subscription_id}")
|
||||
else:
|
||||
logger.warning(f"[Frame Processing] Failed to send frame to session worker for {subscription_id}")
|
||||
|
||||
except Exception as e:
|
||||
logger.error(f"Error processing frames for {subscription.subscriptionIdentifier}: {e}")
|
||||
|
||||
async def _get_frame_from_stream_manager(self, subscription) -> dict:
|
||||
"""
|
||||
Get the latest frame from the stream manager for a subscription using existing API.
|
||||
"""
|
||||
try:
|
||||
subscription_id = subscription.subscriptionIdentifier
|
||||
|
||||
# Use existing stream manager API to check if frame is available
|
||||
if not shared_stream_manager.has_frame(subscription_id):
|
||||
# Stream should already be started by session integration
|
||||
return {'frame': None, 'timestamp': None}
|
||||
|
||||
# Get frame using existing API with crop coordinates if available
|
||||
crop_coords = None
|
||||
if hasattr(subscription, 'cropX1') and subscription.cropX1 is not None:
|
||||
crop_coords = (
|
||||
subscription.cropX1, subscription.cropY1,
|
||||
subscription.cropX2, subscription.cropY2
|
||||
)
|
||||
|
||||
# Use existing get_frame method
|
||||
frame = shared_stream_manager.get_frame(subscription_id, crop_coords)
|
||||
if frame is not None:
|
||||
return {
|
||||
'frame': frame,
|
||||
'timestamp': asyncio.get_event_loop().time()
|
||||
}
|
||||
|
||||
return {'frame': None, 'timestamp': None}
|
||||
|
||||
except Exception as e:
|
||||
logger.error(f"Error getting frame from stream manager for {subscription.subscriptionIdentifier}: {e}")
|
||||
return {'frame': None, 'timestamp': None}
|
||||
|
||||
|
||||
async def _cleanup(self) -> None:
|
||||
"""Clean up resources when connection closes."""
|
||||
logger.info("Cleaning up WebSocket connection")
|
||||
|
|
|
@ -58,10 +58,10 @@ class DetectionPipeline:
|
|||
# Pipeline configuration
|
||||
self.pipeline_config = pipeline_parser.pipeline_config
|
||||
|
||||
# SessionId to subscriptionIdentifier mapping
|
||||
# SessionId to subscriptionIdentifier mapping (ISOLATED per session process)
|
||||
self.session_to_subscription = {}
|
||||
|
||||
# SessionId to processing results mapping (for combining with license plate results)
|
||||
# SessionId to processing results mapping (ISOLATED per session process)
|
||||
self.session_processing_results = {}
|
||||
|
||||
# Statistics
|
||||
|
@ -72,7 +72,8 @@ class DetectionPipeline:
|
|||
'total_processing_time': 0.0
|
||||
}
|
||||
|
||||
logger.info("DetectionPipeline initialized")
|
||||
logger.info(f"DetectionPipeline initialized for model {model_id} with ISOLATED state (no shared mappings or cache)")
|
||||
logger.info(f"Pipeline instance ID: {id(self)} - unique per session process")
|
||||
|
||||
async def initialize(self) -> bool:
|
||||
"""
|
||||
|
|
3
core/logging/__init__.py
Normal file
|
@ -0,0 +1,3 @@
|
|||
"""
|
||||
Per-Session Logging Module
|
||||
"""
|
356
core/logging/session_logger.py
Normal file
|
@ -0,0 +1,356 @@
|
|||
"""
|
||||
Per-Session Logging Configuration and Management.
|
||||
Each session process gets its own dedicated log file with rotation support.
|
||||
"""
|
||||
|
||||
import logging
|
||||
import logging.handlers
|
||||
import os
|
||||
import sys
|
||||
from pathlib import Path
|
||||
from typing import Optional
|
||||
from datetime import datetime
|
||||
import re
|
||||
|
||||
|
||||
class PerSessionLogger:
|
||||
"""
|
||||
Per-session logging configuration that creates dedicated log files for each session.
|
||||
Supports log rotation and structured logging with session context.
|
||||
"""
|
||||
|
||||
def __init__(
|
||||
self,
|
||||
session_id: str,
|
||||
subscription_identifier: str,
|
||||
log_dir: str = "logs",
|
||||
max_size_mb: int = 100,
|
||||
backup_count: int = 5,
|
||||
log_level: int = logging.INFO,
|
||||
detection_mode: bool = True
|
||||
):
|
||||
"""
|
||||
Initialize per-session logger.
|
||||
|
||||
Args:
|
||||
session_id: Unique session identifier
|
||||
subscription_identifier: Subscription identifier (contains camera info)
|
||||
log_dir: Directory to store log files
|
||||
max_size_mb: Maximum size of each log file in MB
|
||||
backup_count: Number of backup files to keep
|
||||
log_level: Logging level
|
||||
detection_mode: If True, uses reduced verbosity for detection processes
|
||||
"""
|
||||
self.session_id = session_id
|
||||
self.subscription_identifier = subscription_identifier
|
||||
self.log_dir = Path(log_dir)
|
||||
self.max_size_mb = max_size_mb
|
||||
self.backup_count = backup_count
|
||||
self.log_level = log_level
|
||||
self.detection_mode = detection_mode
|
||||
|
||||
# Ensure log directory exists
|
||||
self.log_dir.mkdir(parents=True, exist_ok=True)
|
||||
|
||||
# Generate clean filename from subscription identifier
|
||||
self.log_filename = self._generate_log_filename()
|
||||
self.log_filepath = self.log_dir / self.log_filename
|
||||
|
||||
# Create logger
|
||||
self.logger = self._setup_logger()
|
||||
|
||||
def _generate_log_filename(self) -> str:
|
||||
"""
|
||||
Generate a clean filename from subscription identifier.
|
||||
Format: detector_worker_camera_{clean_subscription_id}.log
|
||||
|
||||
Returns:
|
||||
Clean filename for the log file
|
||||
"""
|
||||
# Clean subscription identifier for filename
|
||||
# Replace problematic characters with underscores
|
||||
clean_sub_id = re.sub(r'[^\w\-_.]', '_', self.subscription_identifier)
|
||||
|
||||
# Remove consecutive underscores
|
||||
clean_sub_id = re.sub(r'_+', '_', clean_sub_id)
|
||||
|
||||
# Remove leading/trailing underscores
|
||||
clean_sub_id = clean_sub_id.strip('_')
|
||||
|
||||
# Generate filename
|
||||
filename = f"detector_worker_camera_{clean_sub_id}.log"
|
||||
|
||||
return filename
|
||||
|
||||
def _setup_logger(self) -> logging.Logger:
|
||||
"""
|
||||
Setup logger with file handler and rotation.
|
||||
|
||||
Returns:
|
||||
Configured logger instance
|
||||
"""
|
||||
# Create logger with unique name
|
||||
logger_name = f"session_worker_{self.session_id}"
|
||||
logger = logging.getLogger(logger_name)
|
||||
|
||||
# Clear any existing handlers to avoid duplicates
|
||||
logger.handlers.clear()
|
||||
|
||||
# Set logging level
|
||||
logger.setLevel(self.log_level)
|
||||
|
||||
# Create formatter with session context
|
||||
formatter = logging.Formatter(
|
||||
fmt='%(asctime)s [%(levelname)s] %(name)s [Session: {session_id}] [Camera: {camera}]: %(message)s'.format(
|
||||
session_id=self.session_id,
|
||||
camera=self.subscription_identifier
|
||||
),
|
||||
datefmt='%Y-%m-%d %H:%M:%S'
|
||||
)
|
||||
|
||||
# Create rotating file handler
|
||||
max_bytes = self.max_size_mb * 1024 * 1024 # Convert MB to bytes
|
||||
file_handler = logging.handlers.RotatingFileHandler(
|
||||
filename=self.log_filepath,
|
||||
maxBytes=max_bytes,
|
||||
backupCount=self.backup_count,
|
||||
encoding='utf-8'
|
||||
)
|
||||
file_handler.setLevel(self.log_level)
|
||||
file_handler.setFormatter(formatter)
|
||||
|
||||
# Create console handler for debugging (optional)
|
||||
console_handler = logging.StreamHandler(sys.stdout)
|
||||
console_handler.setLevel(logging.WARNING) # Only warnings and errors to console
|
||||
console_formatter = logging.Formatter(
|
||||
fmt='[{session_id}] [%(levelname)s]: %(message)s'.format(
|
||||
session_id=self.session_id
|
||||
)
|
||||
)
|
||||
console_handler.setFormatter(console_formatter)
|
||||
|
||||
# Add handlers to logger
|
||||
logger.addHandler(file_handler)
|
||||
logger.addHandler(console_handler)
|
||||
|
||||
# Prevent propagation to root logger
|
||||
logger.propagate = False
|
||||
|
||||
# Log initialization (reduced verbosity in detection mode)
|
||||
if self.detection_mode:
|
||||
logger.info(f"Session logger ready for {self.subscription_identifier}")
|
||||
else:
|
||||
logger.info(f"Per-session logger initialized")
|
||||
logger.info(f"Log file: {self.log_filepath}")
|
||||
logger.info(f"Session ID: {self.session_id}")
|
||||
logger.info(f"Camera: {self.subscription_identifier}")
|
||||
logger.info(f"Max size: {self.max_size_mb}MB, Backup count: {self.backup_count}")
|
||||
|
||||
return logger
|
||||
|
||||
def get_logger(self) -> logging.Logger:
|
||||
"""
|
||||
Get the configured logger instance.
|
||||
|
||||
Returns:
|
||||
Logger instance for this session
|
||||
"""
|
||||
return self.logger
|
||||
|
||||
def log_session_start(self, process_id: int):
|
||||
"""
|
||||
Log session start with process information.
|
||||
|
||||
Args:
|
||||
process_id: Process ID of the session worker
|
||||
"""
|
||||
if self.detection_mode:
|
||||
self.logger.info(f"Session started - PID {process_id}")
|
||||
else:
|
||||
self.logger.info("=" * 60)
|
||||
self.logger.info(f"SESSION STARTED")
|
||||
self.logger.info(f"Process ID: {process_id}")
|
||||
self.logger.info(f"Session ID: {self.session_id}")
|
||||
self.logger.info(f"Camera: {self.subscription_identifier}")
|
||||
self.logger.info(f"Timestamp: {datetime.now().isoformat()}")
|
||||
self.logger.info("=" * 60)
|
||||
|
||||
def log_session_end(self):
|
||||
"""Log session end."""
|
||||
self.logger.info("=" * 60)
|
||||
self.logger.info(f"SESSION ENDED")
|
||||
self.logger.info(f"Timestamp: {datetime.now().isoformat()}")
|
||||
self.logger.info("=" * 60)
|
||||
|
||||
def log_model_loading(self, model_id: int, model_name: str, model_path: str):
|
||||
"""
|
||||
Log model loading information.
|
||||
|
||||
Args:
|
||||
model_id: Model ID
|
||||
model_name: Model name
|
||||
model_path: Path to the model
|
||||
"""
|
||||
if self.detection_mode:
|
||||
self.logger.info(f"Loading model {model_id}: {model_name}")
|
||||
else:
|
||||
self.logger.info("-" * 40)
|
||||
self.logger.info(f"MODEL LOADING")
|
||||
self.logger.info(f"Model ID: {model_id}")
|
||||
self.logger.info(f"Model Name: {model_name}")
|
||||
self.logger.info(f"Model Path: {model_path}")
|
||||
self.logger.info("-" * 40)
|
||||
|
||||
def log_frame_processing(self, frame_count: int, processing_time: float, detections: int):
|
||||
"""
|
||||
Log frame processing information.
|
||||
|
||||
Args:
|
||||
frame_count: Current frame count
|
||||
processing_time: Processing time in seconds
|
||||
detections: Number of detections found
|
||||
"""
|
||||
self.logger.debug(f"FRAME #{frame_count}: Processing time: {processing_time:.3f}s, Detections: {detections}")
|
||||
|
||||
def log_detection_result(self, detection_type: str, confidence: float, bbox: list):
|
||||
"""
|
||||
Log detection result.
|
||||
|
||||
Args:
|
||||
detection_type: Type of detection (e.g., "Car", "Frontal")
|
||||
confidence: Detection confidence
|
||||
bbox: Bounding box coordinates
|
||||
"""
|
||||
self.logger.info(f"DETECTION: {detection_type} (conf: {confidence:.3f}) at {bbox}")
|
||||
|
||||
def log_database_operation(self, operation: str, session_id: str, success: bool):
|
||||
"""
|
||||
Log database operation.
|
||||
|
||||
Args:
|
||||
operation: Type of operation
|
||||
session_id: Session ID used in database
|
||||
success: Whether operation succeeded
|
||||
"""
|
||||
status = "SUCCESS" if success else "FAILED"
|
||||
self.logger.info(f"DATABASE {operation}: {status} (session: {session_id})")
|
||||
|
||||
def log_error(self, error_type: str, error_message: str, traceback_str: Optional[str] = None):
|
||||
"""
|
||||
Log error with context.
|
||||
|
||||
Args:
|
||||
error_type: Type of error
|
||||
error_message: Error message
|
||||
traceback_str: Optional traceback string
|
||||
"""
|
||||
self.logger.error(f"ERROR [{error_type}]: {error_message}")
|
||||
if traceback_str:
|
||||
self.logger.error(f"Traceback:\n{traceback_str}")
|
||||
|
||||
def get_log_stats(self) -> dict:
|
||||
"""
|
||||
Get logging statistics.
|
||||
|
||||
Returns:
|
||||
Dictionary with logging statistics
|
||||
"""
|
||||
try:
|
||||
if self.log_filepath.exists():
|
||||
stat = self.log_filepath.stat()
|
||||
return {
|
||||
'log_file': str(self.log_filepath),
|
||||
'file_size_mb': round(stat.st_size / (1024 * 1024), 2),
|
||||
'created': datetime.fromtimestamp(stat.st_ctime).isoformat(),
|
||||
'modified': datetime.fromtimestamp(stat.st_mtime).isoformat(),
|
||||
}
|
||||
else:
|
||||
return {'log_file': str(self.log_filepath), 'status': 'not_created'}
|
||||
except Exception as e:
|
||||
return {'log_file': str(self.log_filepath), 'error': str(e)}
|
||||
|
||||
def cleanup(self):
|
||||
"""Cleanup logger handlers."""
|
||||
if hasattr(self, 'logger') and self.logger:
|
||||
for handler in self.logger.handlers[:]:
|
||||
handler.close()
|
||||
self.logger.removeHandler(handler)
|
||||
|
||||
|
||||
class MainProcessLogger:
|
||||
"""
|
||||
Logger configuration for the main FastAPI process.
|
||||
Separate from session logs to avoid confusion.
|
||||
"""
|
||||
|
||||
def __init__(self, log_dir: str = "logs", max_size_mb: int = 50, backup_count: int = 3):
|
||||
"""
|
||||
Initialize main process logger.
|
||||
|
||||
Args:
|
||||
log_dir: Directory to store log files
|
||||
max_size_mb: Maximum size of each log file in MB
|
||||
backup_count: Number of backup files to keep
|
||||
"""
|
||||
self.log_dir = Path(log_dir)
|
||||
self.max_size_mb = max_size_mb
|
||||
self.backup_count = backup_count
|
||||
|
||||
# Ensure log directory exists
|
||||
self.log_dir.mkdir(parents=True, exist_ok=True)
|
||||
|
||||
# Setup main process logger
|
||||
self._setup_main_logger()
|
||||
|
||||
def _setup_main_logger(self):
|
||||
"""Setup main process logger."""
|
||||
# Configure root logger
|
||||
root_logger = logging.getLogger("detector_worker")
|
||||
|
||||
# Clear existing handlers
|
||||
for handler in root_logger.handlers[:]:
|
||||
root_logger.removeHandler(handler)
|
||||
|
||||
# Set level
|
||||
root_logger.setLevel(logging.INFO)
|
||||
|
||||
# Create formatter
|
||||
formatter = logging.Formatter(
|
||||
fmt='%(asctime)s [%(levelname)s] %(name)s [MAIN]: %(message)s',
|
||||
datefmt='%Y-%m-%d %H:%M:%S'
|
||||
)
|
||||
|
||||
# Create rotating file handler for main process
|
||||
max_bytes = self.max_size_mb * 1024 * 1024
|
||||
main_log_path = self.log_dir / "detector_worker_main.log"
|
||||
file_handler = logging.handlers.RotatingFileHandler(
|
||||
filename=main_log_path,
|
||||
maxBytes=max_bytes,
|
||||
backupCount=self.backup_count,
|
||||
encoding='utf-8'
|
||||
)
|
||||
file_handler.setLevel(logging.INFO)
|
||||
file_handler.setFormatter(formatter)
|
||||
|
||||
# Create console handler
|
||||
console_handler = logging.StreamHandler()
|
||||
console_handler.setLevel(logging.INFO)
|
||||
console_handler.setFormatter(formatter)
|
||||
|
||||
# Add handlers
|
||||
root_logger.addHandler(file_handler)
|
||||
root_logger.addHandler(console_handler)
|
||||
|
||||
# Log initialization
|
||||
root_logger.info("Main process logger initialized")
|
||||
root_logger.info(f"Main log file: {main_log_path}")
|
||||
|
||||
|
||||
def setup_main_process_logging(log_dir: str = "logs"):
|
||||
"""
|
||||
Setup logging for the main FastAPI process.
|
||||
|
||||
Args:
|
||||
log_dir: Directory to store log files
|
||||
"""
|
||||
MainProcessLogger(log_dir=log_dir)
|
|
@ -34,11 +34,7 @@ class InferenceResult:
|
|||
|
||||
|
||||
class YOLOWrapper:
|
||||
"""Wrapper for YOLO models with caching and optimization"""
|
||||
|
||||
# Class-level model cache shared across all instances
|
||||
_model_cache: Dict[str, Any] = {}
|
||||
_cache_lock = Lock()
|
||||
"""Wrapper for YOLO models with per-instance isolation (no shared cache)"""
|
||||
|
||||
def __init__(self, model_path: Path, model_id: str, device: Optional[str] = None):
|
||||
"""
|
||||
|
@ -65,61 +61,48 @@ class YOLOWrapper:
|
|||
logger.info(f"Initialized YOLO wrapper for {model_id} on {self.device}")
|
||||
|
||||
def _load_model(self) -> None:
|
||||
"""Load the YOLO model with caching"""
|
||||
cache_key = str(self.model_path)
|
||||
"""Load the YOLO model in isolation (no shared cache)"""
|
||||
try:
|
||||
from ultralytics import YOLO
|
||||
|
||||
with self._cache_lock:
|
||||
# Check if model is already cached
|
||||
if cache_key in self._model_cache:
|
||||
logger.info(f"Loading model {self.model_id} from cache")
|
||||
self.model = self._model_cache[cache_key]
|
||||
self._extract_class_names()
|
||||
return
|
||||
logger.debug(f"Loading YOLO model {self.model_id} from {self.model_path} (ISOLATED)")
|
||||
|
||||
# Load model
|
||||
try:
|
||||
from ultralytics import YOLO
|
||||
# Load model directly without any caching
|
||||
self.model = YOLO(str(self.model_path))
|
||||
|
||||
logger.info(f"Loading YOLO model from {self.model_path}")
|
||||
# Determine if this is a classification model based on filename or model structure
|
||||
# Classification models typically have 'cls' in filename
|
||||
is_classification = 'cls' in str(self.model_path).lower()
|
||||
|
||||
# Load model normally first
|
||||
self.model = YOLO(str(self.model_path))
|
||||
# For classification models, create a separate instance with task parameter
|
||||
if is_classification:
|
||||
try:
|
||||
# Reload with classification task (like ML engineer's approach)
|
||||
self.model = YOLO(str(self.model_path), task="classify")
|
||||
logger.info(f"Loaded classification model {self.model_id} with task='classify' (ISOLATED)")
|
||||
except Exception as e:
|
||||
logger.warning(f"Failed to load with task='classify', using default: {e}")
|
||||
# Fall back to regular loading
|
||||
self.model = YOLO(str(self.model_path))
|
||||
logger.info(f"Loaded model {self.model_id} with default task (ISOLATED)")
|
||||
else:
|
||||
logger.info(f"Loaded detection model {self.model_id} (ISOLATED)")
|
||||
|
||||
# Determine if this is a classification model based on filename or model structure
|
||||
# Classification models typically have 'cls' in filename
|
||||
is_classification = 'cls' in str(self.model_path).lower()
|
||||
# Move model to device
|
||||
if self.device == 'cuda' and torch.cuda.is_available():
|
||||
self.model.to('cuda')
|
||||
logger.info(f"Model {self.model_id} moved to GPU (ISOLATED)")
|
||||
|
||||
# For classification models, create a separate instance with task parameter
|
||||
if is_classification:
|
||||
try:
|
||||
# Reload with classification task (like ML engineer's approach)
|
||||
self.model = YOLO(str(self.model_path), task="classify")
|
||||
logger.info(f"Loaded classification model {self.model_id} with task='classify'")
|
||||
except Exception as e:
|
||||
logger.warning(f"Failed to load with task='classify', using default: {e}")
|
||||
# Fall back to regular loading
|
||||
self.model = YOLO(str(self.model_path))
|
||||
logger.info(f"Loaded model {self.model_id} with default task")
|
||||
else:
|
||||
logger.info(f"Loaded detection model {self.model_id}")
|
||||
self._extract_class_names()
|
||||
|
||||
# Move model to device
|
||||
if self.device == 'cuda' and torch.cuda.is_available():
|
||||
self.model.to('cuda')
|
||||
logger.info(f"Model {self.model_id} moved to GPU")
|
||||
logger.debug(f"Successfully loaded model {self.model_id} in isolation - no shared cache!")
|
||||
|
||||
# Cache the model
|
||||
self._model_cache[cache_key] = self.model
|
||||
self._extract_class_names()
|
||||
|
||||
logger.info(f"Successfully loaded model {self.model_id}")
|
||||
|
||||
except ImportError:
|
||||
logger.error("Ultralytics YOLO not installed. Install with: pip install ultralytics")
|
||||
raise
|
||||
except Exception as e:
|
||||
logger.error(f"Failed to load YOLO model {self.model_id}: {str(e)}", exc_info=True)
|
||||
raise
|
||||
except ImportError:
|
||||
logger.error("Ultralytics YOLO not installed. Install with: pip install ultralytics")
|
||||
raise
|
||||
except Exception as e:
|
||||
logger.error(f"Failed to load YOLO model {self.model_id}: {str(e)}", exc_info=True)
|
||||
raise
|
||||
|
||||
def _extract_class_names(self) -> None:
|
||||
"""Extract class names from the model"""
|
||||
|
@ -375,19 +358,15 @@ class YOLOWrapper:
|
|||
return 'cls' in str(self.model_path).lower() or 'classify' in str(self.model_path).lower()
|
||||
|
||||
def clear_cache(self) -> None:
|
||||
"""Clear the model cache"""
|
||||
with self._cache_lock:
|
||||
cache_key = str(self.model_path)
|
||||
if cache_key in self._model_cache:
|
||||
del self._model_cache[cache_key]
|
||||
logger.info(f"Cleared cache for model {self.model_id}")
|
||||
"""Clear model resources (no cache in isolated mode)"""
|
||||
if self.model:
|
||||
# Clear any model resources if needed
|
||||
logger.info(f"Cleared resources for model {self.model_id} (no shared cache)")
|
||||
|
||||
@classmethod
|
||||
def clear_all_cache(cls) -> None:
|
||||
"""Clear all cached models"""
|
||||
with cls._cache_lock:
|
||||
cls._model_cache.clear()
|
||||
logger.info("Cleared all model cache")
|
||||
"""No-op in isolated mode (no shared cache to clear)"""
|
||||
logger.info("No shared cache to clear in isolated mode")
|
||||
|
||||
def warmup(self, image_size: Tuple[int, int] = (640, 640)) -> None:
|
||||
"""
|
||||
|
@ -438,16 +417,17 @@ class ModelInferenceManager:
|
|||
YOLOWrapper instance
|
||||
"""
|
||||
with self._lock:
|
||||
# Check if already loaded
|
||||
# Check if already loaded for this specific manager instance
|
||||
if model_id in self.models:
|
||||
logger.debug(f"Model {model_id} already loaded")
|
||||
logger.debug(f"Model {model_id} already loaded in this manager instance")
|
||||
return self.models[model_id]
|
||||
|
||||
# Load the model
|
||||
# Load the model (each instance loads independently)
|
||||
model_path = self.model_dir / model_file
|
||||
if not model_path.exists():
|
||||
raise FileNotFoundError(f"Model file not found: {model_path}")
|
||||
|
||||
logger.info(f"Loading model {model_id} in isolation for this manager instance")
|
||||
wrapper = YOLOWrapper(model_path, model_id, device)
|
||||
self.models[model_id] = wrapper
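Because each ModelInferenceManager keeps loaded wrappers in its own `self.models` dict, two managers (one per session process) never hand out the same wrapper object. A minimal sketch of the isolation this gives; the constructor and `load_model` signature are inferred from the hunk above (the real method name is not visible here) and the model id/file values are placeholders:

```python
# Sketch only: manager method name/signature assumed, model values are placeholders.
from core.models.inference import ModelInferenceManager

manager_a = ModelInferenceManager(model_dir="models")   # e.g. one per session process
manager_b = ModelInferenceManager(model_dir="models")

wrapper_a = manager_a.load_model("demo-model", "demo.pt", device="cuda")
wrapper_b = manager_b.load_model("demo-model", "demo.pt", device="cuda")

assert wrapper_a is not wrapper_b   # no cache shared between manager instances
```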
|
||||
|
||||
|
|
3
core/processes/__init__.py
Normal file
|
@@ -0,0 +1,3 @@
|
|||
"""
|
||||
Session Process Management Module
|
||||
"""
|
317
core/processes/communication.py
Normal file
|
@@ -0,0 +1,317 @@
|
|||
"""
|
||||
Inter-Process Communication (IPC) system for session processes.
|
||||
Defines message types and protocols for main ↔ session communication.
|
||||
"""
|
||||
|
||||
import time
|
||||
from enum import Enum
|
||||
from typing import Dict, Any, Optional, Union
|
||||
from dataclasses import dataclass, field
|
||||
import numpy as np
|
||||
|
||||
|
||||
class MessageType(Enum):
|
||||
"""Message types for IPC communication."""
|
||||
|
||||
# Commands: Main → Session
|
||||
INITIALIZE = "initialize"
|
||||
PROCESS_FRAME = "process_frame"
|
||||
SET_SESSION_ID = "set_session_id"
|
||||
SHUTDOWN = "shutdown"
|
||||
HEALTH_CHECK = "health_check"
|
||||
|
||||
# Responses: Session → Main
|
||||
INITIALIZED = "initialized"
|
||||
DETECTION_RESULT = "detection_result"
|
||||
SESSION_SET = "session_set"
|
||||
SHUTDOWN_COMPLETE = "shutdown_complete"
|
||||
HEALTH_RESPONSE = "health_response"
|
||||
ERROR = "error"
|
||||
|
||||
|
||||
@dataclass
|
||||
class IPCMessage:
|
||||
"""Base class for all IPC messages."""
|
||||
type: MessageType
|
||||
session_id: str
|
||||
timestamp: float = field(default_factory=time.time)
|
||||
message_id: str = field(default_factory=lambda: str(int(time.time() * 1000000)))
|
||||
|
||||
|
||||
@dataclass
|
||||
class InitializeCommand(IPCMessage):
|
||||
"""Initialize session process with configuration."""
|
||||
subscription_config: Dict[str, Any] = field(default_factory=dict)
|
||||
model_config: Dict[str, Any] = field(default_factory=dict)
|
||||
|
||||
|
||||
|
||||
@dataclass
|
||||
class ProcessFrameCommand(IPCMessage):
|
||||
"""Process a frame through the detection pipeline."""
|
||||
frame: Optional[np.ndarray] = None
|
||||
display_id: str = ""
|
||||
subscription_identifier: str = ""
|
||||
frame_timestamp: float = 0.0
|
||||
|
||||
|
||||
|
||||
@dataclass
|
||||
class SetSessionIdCommand(IPCMessage):
|
||||
"""Set the session ID for the current session."""
|
||||
backend_session_id: str = ""
|
||||
display_id: str = ""
|
||||
|
||||
|
||||
|
||||
@dataclass
|
||||
class ShutdownCommand(IPCMessage):
|
||||
"""Shutdown the session process gracefully."""
|
||||
|
||||
|
||||
|
||||
@dataclass
|
||||
class HealthCheckCommand(IPCMessage):
|
||||
"""Check health status of session process."""
|
||||
|
||||
|
||||
|
||||
@dataclass
|
||||
class InitializedResponse(IPCMessage):
|
||||
"""Response indicating successful initialization."""
|
||||
success: bool = False
|
||||
error_message: Optional[str] = None
|
||||
|
||||
|
||||
|
||||
@dataclass
|
||||
class DetectionResultResponse(IPCMessage):
|
||||
"""Detection results from session process."""
|
||||
detections: Dict[str, Any] = field(default_factory=dict)
|
||||
processing_time: float = 0.0
|
||||
phase: str = "" # "detection" or "processing"
|
||||
|
||||
|
||||
|
||||
@dataclass
|
||||
class SessionSetResponse(IPCMessage):
|
||||
"""Response confirming session ID was set."""
|
||||
success: bool = False
|
||||
backend_session_id: str = ""
|
||||
|
||||
|
||||
|
||||
@dataclass
|
||||
class ShutdownCompleteResponse(IPCMessage):
|
||||
"""Response confirming graceful shutdown."""
|
||||
|
||||
|
||||
|
||||
@dataclass
|
||||
class HealthResponse(IPCMessage):
|
||||
"""Health status response."""
|
||||
status: str = "unknown" # "healthy", "degraded", "unhealthy"
|
||||
memory_usage_mb: float = 0.0
|
||||
cpu_percent: float = 0.0
|
||||
gpu_memory_mb: Optional[float] = None
|
||||
uptime_seconds: float = 0.0
|
||||
processed_frames: int = 0
|
||||
|
||||
|
||||
|
||||
@dataclass
|
||||
class ErrorResponse(IPCMessage):
|
||||
"""Error message from session process."""
|
||||
error_type: str = ""
|
||||
error_message: str = ""
|
||||
traceback: Optional[str] = None
|
||||
|
||||
|
||||
|
||||
# Type aliases for message unions
|
||||
CommandMessage = Union[
|
||||
InitializeCommand,
|
||||
ProcessFrameCommand,
|
||||
SetSessionIdCommand,
|
||||
ShutdownCommand,
|
||||
HealthCheckCommand
|
||||
]
|
||||
|
||||
ResponseMessage = Union[
|
||||
InitializedResponse,
|
||||
DetectionResultResponse,
|
||||
SessionSetResponse,
|
||||
ShutdownCompleteResponse,
|
||||
HealthResponse,
|
||||
ErrorResponse
|
||||
]
|
||||
|
||||
IPCMessageUnion = Union[CommandMessage, ResponseMessage]
|
||||
|
||||
|
||||
class MessageSerializer:
|
||||
"""Handles serialization/deserialization of IPC messages."""
|
||||
|
||||
@staticmethod
|
||||
def serialize_message(message: IPCMessageUnion) -> Dict[str, Any]:
|
||||
"""
|
||||
Serialize message to dictionary for queue transport.
|
||||
|
||||
Args:
|
||||
message: Message to serialize
|
||||
|
||||
Returns:
|
||||
Dictionary representation of message
|
||||
"""
|
||||
result = {
|
||||
'type': message.type.value,
|
||||
'session_id': message.session_id,
|
||||
'timestamp': message.timestamp,
|
||||
'message_id': message.message_id,
|
||||
}
|
||||
|
||||
# Add specific fields based on message type
|
||||
if isinstance(message, InitializeCommand):
|
||||
result.update({
|
||||
'subscription_config': message.subscription_config,
|
||||
'model_config': message.model_config
|
||||
})
|
||||
elif isinstance(message, ProcessFrameCommand):
|
||||
result.update({
|
||||
'frame': message.frame,
|
||||
'display_id': message.display_id,
|
||||
'subscription_identifier': message.subscription_identifier,
|
||||
'frame_timestamp': message.frame_timestamp
|
||||
})
|
||||
elif isinstance(message, SetSessionIdCommand):
|
||||
result.update({
|
||||
'backend_session_id': message.backend_session_id,
|
||||
'display_id': message.display_id
|
||||
})
|
||||
elif isinstance(message, InitializedResponse):
|
||||
result.update({
|
||||
'success': message.success,
|
||||
'error_message': message.error_message
|
||||
})
|
||||
elif isinstance(message, DetectionResultResponse):
|
||||
result.update({
|
||||
'detections': message.detections,
|
||||
'processing_time': message.processing_time,
|
||||
'phase': message.phase
|
||||
})
|
||||
elif isinstance(message, SessionSetResponse):
|
||||
result.update({
|
||||
'success': message.success,
|
||||
'backend_session_id': message.backend_session_id
|
||||
})
|
||||
elif isinstance(message, HealthResponse):
|
||||
result.update({
|
||||
'status': message.status,
|
||||
'memory_usage_mb': message.memory_usage_mb,
|
||||
'cpu_percent': message.cpu_percent,
|
||||
'gpu_memory_mb': message.gpu_memory_mb,
|
||||
'uptime_seconds': message.uptime_seconds,
|
||||
'processed_frames': message.processed_frames
|
||||
})
|
||||
elif isinstance(message, ErrorResponse):
|
||||
result.update({
|
||||
'error_type': message.error_type,
|
||||
'error_message': message.error_message,
|
||||
'traceback': message.traceback
|
||||
})
|
||||
|
||||
return result
|
||||
|
||||
@staticmethod
|
||||
def deserialize_message(data: Dict[str, Any]) -> IPCMessageUnion:
|
||||
"""
|
||||
Deserialize dictionary back to message object.
|
||||
|
||||
Args:
|
||||
data: Dictionary representation
|
||||
|
||||
Returns:
|
||||
Deserialized message object
|
||||
"""
|
||||
msg_type = MessageType(data['type'])
|
||||
session_id = data['session_id']
|
||||
timestamp = data['timestamp']
|
||||
message_id = data['message_id']
|
||||
|
||||
base_kwargs = {
|
||||
'session_id': session_id,
|
||||
'timestamp': timestamp,
|
||||
'message_id': message_id
|
||||
}
|
||||
|
||||
if msg_type == MessageType.INITIALIZE:
|
||||
return InitializeCommand(
|
||||
type=msg_type,
|
||||
subscription_config=data['subscription_config'],
|
||||
model_config=data['model_config'],
|
||||
**base_kwargs
|
||||
)
|
||||
elif msg_type == MessageType.PROCESS_FRAME:
|
||||
return ProcessFrameCommand(
|
||||
type=msg_type,
|
||||
frame=data['frame'],
|
||||
display_id=data['display_id'],
|
||||
subscription_identifier=data['subscription_identifier'],
|
||||
frame_timestamp=data['frame_timestamp'],
|
||||
**base_kwargs
|
||||
)
|
||||
elif msg_type == MessageType.SET_SESSION_ID:
|
||||
return SetSessionIdCommand(
type=msg_type,
|
||||
backend_session_id=data['backend_session_id'],
|
||||
display_id=data['display_id'],
|
||||
**base_kwargs
|
||||
)
|
||||
elif msg_type == MessageType.SHUTDOWN:
|
||||
return ShutdownCommand(type=msg_type, **base_kwargs)
|
||||
elif msg_type == MessageType.HEALTH_CHECK:
|
||||
return HealthCheckCommand(type=msg_type, **base_kwargs)
|
||||
elif msg_type == MessageType.INITIALIZED:
|
||||
return InitializedResponse(
|
||||
type=msg_type,
|
||||
success=data['success'],
|
||||
error_message=data.get('error_message'),
|
||||
**base_kwargs
|
||||
)
|
||||
elif msg_type == MessageType.DETECTION_RESULT:
|
||||
return DetectionResultResponse(
|
||||
type=msg_type,
|
||||
detections=data['detections'],
|
||||
processing_time=data['processing_time'],
|
||||
phase=data['phase'],
|
||||
**base_kwargs
|
||||
)
|
||||
elif msg_type == MessageType.SESSION_SET:
|
||||
return SessionSetResponse(
|
||||
type=msg_type,
|
||||
success=data['success'],
|
||||
backend_session_id=data['backend_session_id'],
|
||||
**base_kwargs
|
||||
)
|
||||
elif msg_type == MessageType.SHUTDOWN_COMPLETE:
|
||||
return ShutdownCompleteResponse(type=msg_type, **base_kwargs)
|
||||
elif msg_type == MessageType.HEALTH_RESPONSE:
|
||||
return HealthResponse(
|
||||
type=msg_type,
|
||||
status=data['status'],
|
||||
memory_usage_mb=data['memory_usage_mb'],
|
||||
cpu_percent=data['cpu_percent'],
|
||||
gpu_memory_mb=data.get('gpu_memory_mb'),
|
||||
uptime_seconds=data.get('uptime_seconds', 0.0),
|
||||
processed_frames=data.get('processed_frames', 0),
|
||||
**base_kwargs
|
||||
)
|
||||
elif msg_type == MessageType.ERROR:
|
||||
return ErrorResponse(
|
||||
type=msg_type,
|
||||
error_type=data['error_type'],
|
||||
error_message=data['error_message'],
|
||||
traceback=data.get('traceback'),
|
||||
**base_kwargs
|
||||
)
|
||||
else:
|
||||
raise ValueError(f"Unknown message type: {msg_type}")
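A quick round-trip sketch of how these helpers are meant to be used on either side of a multiprocessing.Queue; the session/display identifiers, frame contents, and timestamp are placeholders:

```python
import numpy as np
from core.processes.communication import (
    MessageSerializer, MessageType, ProcessFrameCommand
)

# Build a typed command (placeholder identifiers and frame).
cmd = ProcessFrameCommand(
    type=MessageType.PROCESS_FRAME,
    session_id="session_123",
    frame=np.zeros((480, 640, 3), dtype=np.uint8),
    display_id="display-001",
    subscription_identifier="display-001;cam-001",
    frame_timestamp=1234567890.0,
)

# Serialize for queue transport, then rebuild the dataclass on the far side.
payload = MessageSerializer.serialize_message(cmd)
restored = MessageSerializer.deserialize_message(payload)

assert restored.type == MessageType.PROCESS_FRAME
assert restored.display_id == "display-001"
```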
|
464
core/processes/session_manager.py
Normal file
|
@@ -0,0 +1,464 @@
|
|||
"""
|
||||
Session Process Manager - Manages lifecycle of session processes.
|
||||
Handles process spawning, monitoring, cleanup, and health checks.
|
||||
"""
|
||||
|
||||
import time
|
||||
import logging
|
||||
import asyncio
|
||||
import multiprocessing as mp
|
||||
from typing import Dict, Optional, Any, Callable
|
||||
from dataclasses import dataclass
|
||||
from concurrent.futures import ThreadPoolExecutor
|
||||
import threading
|
||||
|
||||
from .communication import (
|
||||
MessageSerializer, MessageType,
|
||||
InitializeCommand, ProcessFrameCommand, SetSessionIdCommand,
|
||||
ShutdownCommand, HealthCheckCommand,
|
||||
InitializedResponse, DetectionResultResponse, SessionSetResponse,
|
||||
ShutdownCompleteResponse, HealthResponse, ErrorResponse
|
||||
)
|
||||
from .session_worker import session_worker_main
|
||||
|
||||
logger = logging.getLogger(__name__)
|
||||
|
||||
|
||||
@dataclass
|
||||
class SessionProcessInfo:
|
||||
"""Information about a running session process."""
|
||||
session_id: str
|
||||
subscription_identifier: str
|
||||
process: mp.Process
|
||||
command_queue: mp.Queue
|
||||
response_queue: mp.Queue
|
||||
created_at: float
|
||||
last_health_check: float = 0.0
|
||||
is_initialized: bool = False
|
||||
processed_frames: int = 0
|
||||
|
||||
|
||||
class SessionProcessManager:
|
||||
"""
|
||||
Manages lifecycle of session processes.
|
||||
Each session gets its own dedicated process for complete isolation.
|
||||
"""
|
||||
|
||||
def __init__(self, max_concurrent_sessions: int = 20, health_check_interval: int = 30):
|
||||
"""
|
||||
Initialize session process manager.
|
||||
|
||||
Args:
|
||||
max_concurrent_sessions: Maximum number of concurrent session processes
|
||||
health_check_interval: Interval in seconds between health checks
|
||||
"""
|
||||
self.max_concurrent_sessions = max_concurrent_sessions
|
||||
self.health_check_interval = health_check_interval
|
||||
|
||||
# Active session processes
|
||||
self.sessions: Dict[str, SessionProcessInfo] = {}
|
||||
self.subscription_to_session: Dict[str, str] = {}
|
||||
|
||||
# Thread pool for response processing
|
||||
self.response_executor = ThreadPoolExecutor(max_workers=4, thread_name_prefix="ResponseProcessor")
|
||||
|
||||
# Health check task
|
||||
self.health_check_task = None
|
||||
self.is_running = False
|
||||
|
||||
# Message callbacks
|
||||
self.detection_result_callback: Optional[Callable] = None
|
||||
self.error_callback: Optional[Callable] = None
|
||||
|
||||
# Store main event loop for async operations from threads
|
||||
self.main_event_loop = None
|
||||
|
||||
logger.info(f"SessionProcessManager initialized (max_sessions={max_concurrent_sessions})")
|
||||
|
||||
async def start(self):
|
||||
"""Start the session process manager."""
|
||||
if self.is_running:
|
||||
return
|
||||
|
||||
self.is_running = True
|
||||
|
||||
# Store the main event loop for use in threads
|
||||
self.main_event_loop = asyncio.get_running_loop()
|
||||
|
||||
logger.info("Starting session process manager")
|
||||
|
||||
# Start health check task
|
||||
self.health_check_task = asyncio.create_task(self._health_check_loop())
|
||||
|
||||
# Start response processing for existing sessions
|
||||
for session_info in self.sessions.values():
|
||||
self._start_response_processing(session_info)
|
||||
|
||||
async def stop(self):
|
||||
"""Stop the session process manager and cleanup all sessions."""
|
||||
if not self.is_running:
|
||||
return
|
||||
|
||||
logger.info("Stopping session process manager")
|
||||
self.is_running = False
|
||||
|
||||
# Cancel health check task
|
||||
if self.health_check_task:
|
||||
self.health_check_task.cancel()
|
||||
try:
|
||||
await self.health_check_task
|
||||
except asyncio.CancelledError:
|
||||
pass
|
||||
|
||||
# Shutdown all sessions
|
||||
shutdown_tasks = []
|
||||
for session_info in list(self.sessions.values()):
task = asyncio.create_task(self.remove_session(session_info.subscription_identifier))
|
||||
shutdown_tasks.append(task)
|
||||
|
||||
if shutdown_tasks:
|
||||
await asyncio.gather(*shutdown_tasks, return_exceptions=True)
|
||||
|
||||
# Cleanup thread pool
|
||||
self.response_executor.shutdown(wait=True)
|
||||
|
||||
logger.info("Session process manager stopped")
|
||||
|
||||
async def create_session(self, subscription_identifier: str, subscription_config: Dict[str, Any]) -> bool:
|
||||
"""
|
||||
Create a new session process for a subscription.
|
||||
|
||||
Args:
|
||||
subscription_identifier: Unique subscription identifier
|
||||
subscription_config: Subscription configuration
|
||||
|
||||
Returns:
|
||||
True if session was created successfully
|
||||
"""
|
||||
try:
|
||||
# Check if we're at capacity
|
||||
if len(self.sessions) >= self.max_concurrent_sessions:
|
||||
logger.warning(f"Cannot create session: at max capacity ({self.max_concurrent_sessions})")
|
||||
return False
|
||||
|
||||
# Check if subscription already has a session
|
||||
if subscription_identifier in self.subscription_to_session:
|
||||
existing_session_id = self.subscription_to_session[subscription_identifier]
|
||||
logger.info(f"Subscription {subscription_identifier} already has session {existing_session_id}")
|
||||
return True
|
||||
|
||||
# Generate unique session ID
|
||||
session_id = f"session_{int(time.time() * 1000)}_{subscription_identifier.replace(';', '_')}"
|
||||
|
||||
logger.info(f"Creating session process for subscription {subscription_identifier}")
|
||||
logger.info(f"Session ID: {session_id}")
|
||||
|
||||
# Create communication queues
|
||||
command_queue = mp.Queue()
|
||||
response_queue = mp.Queue()
|
||||
|
||||
# Create and start process
|
||||
process = mp.Process(
|
||||
target=session_worker_main,
|
||||
args=(session_id, command_queue, response_queue),
|
||||
name=f"SessionWorker-{session_id}"
|
||||
)
|
||||
process.start()
|
||||
|
||||
# Store session information
|
||||
session_info = SessionProcessInfo(
|
||||
session_id=session_id,
|
||||
subscription_identifier=subscription_identifier,
|
||||
process=process,
|
||||
command_queue=command_queue,
|
||||
response_queue=response_queue,
|
||||
created_at=time.time()
|
||||
)
|
||||
|
||||
self.sessions[session_id] = session_info
|
||||
self.subscription_to_session[subscription_identifier] = session_id
|
||||
|
||||
# Start response processing for this session
|
||||
self._start_response_processing(session_info)
|
||||
|
||||
logger.info(f"Session process created: {session_id} (PID: {process.pid})")
|
||||
|
||||
# Initialize the session with configuration
|
||||
model_config = {
|
||||
'modelId': subscription_config.get('modelId'),
|
||||
'modelUrl': subscription_config.get('modelUrl'),
|
||||
'modelName': subscription_config.get('modelName')
|
||||
}
|
||||
|
||||
init_command = InitializeCommand(
|
||||
type=MessageType.INITIALIZE,
|
||||
session_id=session_id,
|
||||
subscription_config=subscription_config,
|
||||
model_config=model_config
|
||||
)
|
||||
|
||||
await self._send_command(session_id, init_command)
|
||||
|
||||
return True
|
||||
|
||||
except Exception as e:
|
||||
logger.error(f"Failed to create session for {subscription_identifier}: {e}", exc_info=True)
|
||||
# Cleanup on failure
|
||||
if session_id in self.sessions:
|
||||
await self._cleanup_session(session_id)
|
||||
return False
|
||||
|
||||
async def remove_session(self, subscription_identifier: str) -> bool:
|
||||
"""
|
||||
Remove a session process for a subscription.
|
||||
|
||||
Args:
|
||||
subscription_identifier: Subscription identifier to remove
|
||||
|
||||
Returns:
|
||||
True if session was removed successfully
|
||||
"""
|
||||
try:
|
||||
session_id = self.subscription_to_session.get(subscription_identifier)
|
||||
if not session_id:
|
||||
logger.warning(f"No session found for subscription {subscription_identifier}")
|
||||
return False
|
||||
|
||||
logger.info(f"Removing session {session_id} for subscription {subscription_identifier}")
|
||||
|
||||
session_info = self.sessions.get(session_id)
|
||||
if session_info:
|
||||
# Send shutdown command
|
||||
shutdown_command = ShutdownCommand(type=MessageType.SHUTDOWN, session_id=session_id)
|
||||
await self._send_command(session_id, shutdown_command)
|
||||
|
||||
# Wait for graceful shutdown (with timeout)
|
||||
try:
|
||||
await asyncio.wait_for(self._wait_for_shutdown(session_info), timeout=10.0)
|
||||
except asyncio.TimeoutError:
|
||||
logger.warning(f"Session {session_id} did not shutdown gracefully, terminating")
|
||||
|
||||
# Cleanup session
|
||||
await self._cleanup_session(session_id)
|
||||
|
||||
return True
|
||||
|
||||
except Exception as e:
|
||||
logger.error(f"Failed to remove session for {subscription_identifier}: {e}", exc_info=True)
|
||||
return False
|
||||
|
||||
async def process_frame(self, subscription_identifier: str, frame: Any, display_id: str, frame_timestamp: float) -> bool:
|
||||
"""
|
||||
Send a frame to the session process for processing.
|
||||
|
||||
Args:
|
||||
subscription_identifier: Subscription identifier
|
||||
frame: Frame to process
|
||||
display_id: Display identifier
|
||||
frame_timestamp: Timestamp of the frame
|
||||
|
||||
Returns:
|
||||
True if frame was sent successfully
|
||||
"""
|
||||
try:
|
||||
session_id = self.subscription_to_session.get(subscription_identifier)
|
||||
if not session_id:
|
||||
logger.warning(f"No session found for subscription {subscription_identifier}")
|
||||
return False
|
||||
|
||||
session_info = self.sessions.get(session_id)
|
||||
if not session_info or not session_info.is_initialized:
|
||||
logger.warning(f"Session {session_id} not initialized")
|
||||
return False
|
||||
|
||||
# Create process frame command
|
||||
process_command = ProcessFrameCommand(
type=MessageType.PROCESS_FRAME,
|
||||
session_id=session_id,
|
||||
frame=frame,
|
||||
display_id=display_id,
|
||||
subscription_identifier=subscription_identifier,
|
||||
frame_timestamp=frame_timestamp
|
||||
)
|
||||
|
||||
await self._send_command(session_id, process_command)
|
||||
return True
|
||||
|
||||
except Exception as e:
|
||||
logger.error(f"Failed to process frame for {subscription_identifier}: {e}", exc_info=True)
|
||||
return False
|
||||
|
||||
async def set_session_id(self, subscription_identifier: str, backend_session_id: str, display_id: str) -> bool:
|
||||
"""
|
||||
Set the backend session ID for a session.
|
||||
|
||||
Args:
|
||||
subscription_identifier: Subscription identifier
|
||||
backend_session_id: Backend session ID
|
||||
display_id: Display identifier
|
||||
|
||||
Returns:
|
||||
True if session ID was set successfully
|
||||
"""
|
||||
try:
|
||||
session_id = self.subscription_to_session.get(subscription_identifier)
|
||||
if not session_id:
|
||||
logger.warning(f"No session found for subscription {subscription_identifier}")
|
||||
return False
|
||||
|
||||
# Create set session ID command
|
||||
set_command = SetSessionIdCommand(
type=MessageType.SET_SESSION_ID,
|
||||
session_id=session_id,
|
||||
backend_session_id=backend_session_id,
|
||||
display_id=display_id
|
||||
)
|
||||
|
||||
await self._send_command(session_id, set_command)
|
||||
return True
|
||||
|
||||
except Exception as e:
|
||||
logger.error(f"Failed to set session ID for {subscription_identifier}: {e}", exc_info=True)
|
||||
return False
|
||||
|
||||
def set_detection_result_callback(self, callback: Callable):
|
||||
"""Set callback for handling detection results."""
|
||||
self.detection_result_callback = callback
|
||||
|
||||
def set_error_callback(self, callback: Callable):
|
||||
"""Set callback for handling errors."""
|
||||
self.error_callback = callback
|
||||
|
||||
def get_session_count(self) -> int:
|
||||
"""Get the number of active sessions."""
|
||||
return len(self.sessions)
|
||||
|
||||
def get_session_info(self, subscription_identifier: str) -> Optional[Dict[str, Any]]:
|
||||
"""Get information about a session."""
|
||||
session_id = self.subscription_to_session.get(subscription_identifier)
|
||||
if not session_id:
|
||||
return None
|
||||
|
||||
session_info = self.sessions.get(session_id)
|
||||
if not session_info:
|
||||
return None
|
||||
|
||||
return {
|
||||
'session_id': session_id,
|
||||
'subscription_identifier': subscription_identifier,
|
||||
'created_at': session_info.created_at,
|
||||
'is_initialized': session_info.is_initialized,
|
||||
'processed_frames': session_info.processed_frames,
|
||||
'process_pid': session_info.process.pid if session_info.process.is_alive() else None,
|
||||
'is_alive': session_info.process.is_alive()
|
||||
}
|
||||
|
||||
async def _send_command(self, session_id: str, command):
|
||||
"""Send command to session process."""
|
||||
session_info = self.sessions.get(session_id)
|
||||
if not session_info:
|
||||
raise ValueError(f"Session {session_id} not found")
|
||||
|
||||
serialized = MessageSerializer.serialize_message(command)
|
||||
session_info.command_queue.put(serialized)
|
||||
|
||||
def _start_response_processing(self, session_info: SessionProcessInfo):
|
||||
"""Start processing responses from a session process."""
|
||||
def process_responses():
|
||||
while session_info.session_id in self.sessions and session_info.process.is_alive():
|
||||
try:
|
||||
if not session_info.response_queue.empty():
|
||||
response_data = session_info.response_queue.get(timeout=1.0)
|
||||
response = MessageSerializer.deserialize_message(response_data)
|
||||
if self.main_event_loop:
|
||||
asyncio.run_coroutine_threadsafe(
|
||||
self._handle_response(session_info.session_id, response),
|
||||
self.main_event_loop
|
||||
)
|
||||
else:
|
||||
time.sleep(0.01)
|
||||
except Exception as e:
|
||||
logger.error(f"Error processing response from {session_info.session_id}: {e}")
|
||||
|
||||
self.response_executor.submit(process_responses)
|
||||
|
||||
async def _handle_response(self, session_id: str, response):
|
||||
"""Handle response from session process."""
|
||||
try:
|
||||
session_info = self.sessions.get(session_id)
|
||||
if not session_info:
|
||||
return
|
||||
|
||||
if response.type == MessageType.INITIALIZED:
|
||||
session_info.is_initialized = response.success
|
||||
if response.success:
|
||||
logger.info(f"Session {session_id} initialized successfully")
|
||||
else:
|
||||
logger.error(f"Session {session_id} initialization failed: {response.error_message}")
|
||||
|
||||
elif response.type == MessageType.DETECTION_RESULT:
|
||||
session_info.processed_frames += 1
|
||||
if self.detection_result_callback:
|
||||
await self.detection_result_callback(session_info.subscription_identifier, response)
|
||||
|
||||
elif response.type == MessageType.SESSION_SET:
|
||||
logger.info(f"Session ID set for {session_id}: {response.backend_session_id}")
|
||||
|
||||
elif response.type == MessageType.HEALTH_RESPONSE:
|
||||
session_info.last_health_check = time.time()
|
||||
logger.debug(f"Health check for {session_id}: {response.status}")
|
||||
|
||||
elif response.type == MessageType.ERROR:
|
||||
logger.error(f"Error from session {session_id}: {response.error_message}")
|
||||
if self.error_callback:
|
||||
await self.error_callback(session_info.subscription_identifier, response)
|
||||
|
||||
except Exception as e:
|
||||
logger.error(f"Error handling response from {session_id}: {e}", exc_info=True)
|
||||
|
||||
async def _wait_for_shutdown(self, session_info: SessionProcessInfo):
|
||||
"""Wait for session process to shutdown gracefully."""
|
||||
while session_info.process.is_alive():
|
||||
await asyncio.sleep(0.1)
|
||||
|
||||
async def _cleanup_session(self, session_id: str):
|
||||
"""Cleanup session process and resources."""
|
||||
try:
|
||||
session_info = self.sessions.get(session_id)
|
||||
if not session_info:
|
||||
return
|
||||
|
||||
# Terminate process if still alive
|
||||
if session_info.process.is_alive():
|
||||
session_info.process.terminate()
|
||||
# Wait a bit for graceful termination
|
||||
await asyncio.sleep(1.0)
|
||||
if session_info.process.is_alive():
|
||||
session_info.process.kill()
|
||||
|
||||
# Remove from tracking
|
||||
del self.sessions[session_id]
|
||||
if session_info.subscription_identifier in self.subscription_to_session:
|
||||
del self.subscription_to_session[session_info.subscription_identifier]
|
||||
|
||||
logger.info(f"Session {session_id} cleaned up")
|
||||
|
||||
except Exception as e:
|
||||
logger.error(f"Error cleaning up session {session_id}: {e}", exc_info=True)
|
||||
|
||||
async def _health_check_loop(self):
|
||||
"""Periodic health check of all session processes."""
|
||||
while self.is_running:
|
||||
try:
|
||||
for session_id in list(self.sessions.keys()):
|
||||
session_info = self.sessions.get(session_id)
|
||||
if session_info and session_info.is_initialized:
|
||||
# Send health check
|
||||
health_command = HealthCheckCommand(type=MessageType.HEALTH_CHECK, session_id=session_id)
|
||||
await self._send_command(session_id, health_command)
|
||||
|
||||
await asyncio.sleep(self.health_check_interval)
|
||||
|
||||
except asyncio.CancelledError:
|
||||
break
|
||||
except Exception as e:
|
||||
logger.error(f"Error in health check loop: {e}", exc_info=True)
|
||||
await asyncio.sleep(5.0) # Brief pause before retrying
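For orientation, here is roughly how the main process would drive this manager; the callback bodies and the subscription payload values (stream URL, model id, and so on) are illustrative placeholders, not part of the commit:

```python
import asyncio
from core.processes.session_manager import SessionProcessManager

async def main():
    manager = SessionProcessManager(max_concurrent_sessions=20)

    async def on_result(subscription_id, response):
        # response is a DetectionResultResponse delivered over IPC
        print(subscription_id, response.phase, response.processing_time)

    async def on_error(subscription_id, response):
        print(f"{subscription_id} failed: {response.error_message}")

    manager.set_detection_result_callback(on_result)
    manager.set_error_callback(on_error)

    await manager.start()

    # Placeholder subscription config; the keys follow the worker's usage.
    await manager.create_session(
        "display-001;cam-001",
        {
            "subscriptionIdentifier": "display-001;cam-001",
            "rtspUrl": "rtsp://example/stream",
            "modelId": 52,
            "modelUrl": "https://example/model.mpta",
            "modelName": "Example Model",
        },
    )

    await asyncio.sleep(60)   # let the worker run for a while
    await manager.stop()      # graceful shutdown of all session processes

asyncio.run(main())
```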
|
813
core/processes/session_worker.py
Normal file
|
@@ -0,0 +1,813 @@
|
|||
"""
|
||||
Session Worker Process - Individual process that handles one session completely.
|
||||
Each camera/session gets its own dedicated worker process for complete isolation.
|
||||
"""
|
||||
|
||||
import asyncio
|
||||
import multiprocessing as mp
|
||||
import time
|
||||
import logging
|
||||
import sys
|
||||
import os
|
||||
import traceback
|
||||
import psutil
|
||||
import threading
|
||||
import cv2
|
||||
import requests
|
||||
from typing import Dict, Any, Optional, Tuple
|
||||
from pathlib import Path
|
||||
import numpy as np
|
||||
from queue import Queue, Empty
|
||||
|
||||
# Import core modules
|
||||
from ..models.manager import ModelManager
|
||||
from ..detection.pipeline import DetectionPipeline
|
||||
from ..models.pipeline import PipelineParser
|
||||
from ..logging.session_logger import PerSessionLogger
|
||||
from .communication import (
|
||||
MessageSerializer, MessageType, IPCMessageUnion,
|
||||
InitializeCommand, ProcessFrameCommand, SetSessionIdCommand,
|
||||
ShutdownCommand, HealthCheckCommand,
|
||||
InitializedResponse, DetectionResultResponse, SessionSetResponse,
|
||||
ShutdownCompleteResponse, HealthResponse, ErrorResponse
|
||||
)
|
||||
|
||||
|
||||
class IntegratedStreamReader:
|
||||
"""
|
||||
Integrated RTSP/HTTP stream reader for session worker processes.
|
||||
Handles both RTSP streams and HTTP snapshots with automatic failover.
|
||||
"""
|
||||
|
||||
def __init__(self, session_id: str, subscription_config: Dict[str, Any], logger: logging.Logger):
|
||||
self.session_id = session_id
|
||||
self.subscription_config = subscription_config
|
||||
self.logger = logger
|
||||
|
||||
# Stream configuration
|
||||
self.rtsp_url = subscription_config.get('rtspUrl')
|
||||
self.snapshot_url = subscription_config.get('snapshotUrl')
|
||||
self.snapshot_interval = subscription_config.get('snapshotInterval', 2000) / 1000.0 # Convert to seconds
|
||||
|
||||
# Stream state
|
||||
self.is_running = False
|
||||
self.rtsp_cap = None
|
||||
self.stream_thread = None
|
||||
self.stop_event = threading.Event()
|
||||
|
||||
# Frame buffer - single latest frame only
|
||||
self.frame_queue = Queue(maxsize=1)
|
||||
self.last_frame_time = 0
|
||||
|
||||
# Stream health monitoring
|
||||
self.consecutive_errors = 0
|
||||
self.max_consecutive_errors = 30
|
||||
self.reconnect_delay = 5.0
|
||||
self.frame_timeout = 10.0 # Seconds without frame before considered dead
|
||||
|
||||
# Crop coordinates if present
|
||||
self.crop_coords = None
|
||||
if subscription_config.get('cropX1') is not None:
|
||||
self.crop_coords = (
|
||||
subscription_config['cropX1'],
|
||||
subscription_config['cropY1'],
|
||||
subscription_config['cropX2'],
|
||||
subscription_config['cropY2']
|
||||
)
|
||||
|
||||
def start(self) -> bool:
|
||||
"""Start the stream reading in background thread."""
|
||||
if self.is_running:
|
||||
return True
|
||||
|
||||
try:
|
||||
self.is_running = True
|
||||
self.stop_event.clear()
|
||||
|
||||
# Start background thread for stream reading
|
||||
self.stream_thread = threading.Thread(
|
||||
target=self._stream_loop,
|
||||
name=f"StreamReader-{self.session_id}",
|
||||
daemon=True
|
||||
)
|
||||
self.stream_thread.start()
|
||||
|
||||
self.logger.info(f"Stream reader started for {self.session_id}")
|
||||
return True
|
||||
|
||||
except Exception as e:
|
||||
self.logger.error(f"Failed to start stream reader: {e}")
|
||||
self.is_running = False
|
||||
return False
|
||||
|
||||
def stop(self):
|
||||
"""Stop the stream reading."""
|
||||
if not self.is_running:
|
||||
return
|
||||
|
||||
self.logger.info(f"Stopping stream reader for {self.session_id}")
|
||||
self.is_running = False
|
||||
self.stop_event.set()
|
||||
|
||||
# Close RTSP connection
|
||||
if self.rtsp_cap:
|
||||
try:
|
||||
self.rtsp_cap.release()
|
||||
except:
|
||||
pass
|
||||
self.rtsp_cap = None
|
||||
|
||||
# Wait for thread to finish
|
||||
if self.stream_thread and self.stream_thread.is_alive():
|
||||
self.stream_thread.join(timeout=3.0)
|
||||
|
||||
def get_latest_frame(self) -> Optional[Tuple[np.ndarray, str, float]]:
|
||||
"""Get the latest frame if available. Returns (frame, display_id, timestamp) or None."""
|
||||
try:
|
||||
# Non-blocking get - return None if no frame available
|
||||
frame_data = self.frame_queue.get_nowait()
|
||||
return frame_data
|
||||
except Empty:
|
||||
return None
|
||||
|
||||
def _stream_loop(self):
|
||||
"""Main stream reading loop - runs in background thread."""
|
||||
self.logger.info(f"Stream loop started for {self.session_id}")
|
||||
|
||||
while self.is_running and not self.stop_event.is_set():
|
||||
try:
|
||||
if self.rtsp_url:
|
||||
# Try RTSP first
|
||||
self._read_rtsp_stream()
|
||||
elif self.snapshot_url:
|
||||
# Fallback to HTTP snapshots
|
||||
self._read_http_snapshots()
|
||||
else:
|
||||
self.logger.error("No stream URL configured")
|
||||
break
|
||||
|
||||
except Exception as e:
|
||||
self.logger.error(f"Error in stream loop: {e}")
|
||||
self._handle_stream_error()
|
||||
|
||||
self.logger.info(f"Stream loop ended for {self.session_id}")
|
||||
|
||||
def _read_rtsp_stream(self):
|
||||
"""Read frames from RTSP stream."""
|
||||
if not self.rtsp_cap:
|
||||
self._connect_rtsp()
|
||||
|
||||
if not self.rtsp_cap:
|
||||
return
|
||||
|
||||
try:
|
||||
ret, frame = self.rtsp_cap.read()
|
||||
|
||||
if ret and frame is not None:
|
||||
# Process the frame
|
||||
processed_frame = self._process_frame(frame)
|
||||
if processed_frame is not None:
|
||||
# Extract display ID from subscription identifier
|
||||
display_id = self.subscription_config['subscriptionIdentifier'].split(';')[-1]
|
||||
timestamp = time.time()
|
||||
|
||||
# Put frame in queue (replace if full)
|
||||
try:
|
||||
# Clear queue and put new frame
|
||||
try:
|
||||
self.frame_queue.get_nowait()
|
||||
except Empty:
|
||||
pass
|
||||
self.frame_queue.put((processed_frame, display_id, timestamp), timeout=0.1)
|
||||
self.last_frame_time = timestamp
|
||||
self.consecutive_errors = 0
|
||||
except:
|
||||
pass # Queue full, skip frame
|
||||
else:
|
||||
self._handle_stream_error()
|
||||
|
||||
except Exception as e:
|
||||
self.logger.error(f"Error reading RTSP frame: {e}")
|
||||
self._handle_stream_error()
|
||||
|
||||
def _read_http_snapshots(self):
|
||||
"""Read frames from HTTP snapshot URL."""
|
||||
try:
|
||||
response = requests.get(self.snapshot_url, timeout=10)
|
||||
response.raise_for_status()
|
||||
|
||||
# Convert response to numpy array
|
||||
img_array = np.asarray(bytearray(response.content), dtype=np.uint8)
|
||||
frame = cv2.imdecode(img_array, cv2.IMREAD_COLOR)
|
||||
|
||||
if frame is not None:
|
||||
# Process the frame
|
||||
processed_frame = self._process_frame(frame)
|
||||
if processed_frame is not None:
|
||||
# Extract display ID from subscription identifier
|
||||
display_id = self.subscription_config['subscriptionIdentifier'].split(';')[-1]
|
||||
timestamp = time.time()
|
||||
|
||||
# Put frame in queue (replace if full)
|
||||
try:
|
||||
# Clear queue and put new frame
|
||||
try:
|
||||
self.frame_queue.get_nowait()
|
||||
except Empty:
|
||||
pass
|
||||
self.frame_queue.put((processed_frame, display_id, timestamp), timeout=0.1)
|
||||
self.last_frame_time = timestamp
|
||||
self.consecutive_errors = 0
|
||||
except:
|
||||
pass # Queue full, skip frame
|
||||
|
||||
# Wait for next snapshot interval
|
||||
time.sleep(self.snapshot_interval)
|
||||
|
||||
except Exception as e:
|
||||
self.logger.error(f"Error reading HTTP snapshot: {e}")
|
||||
self._handle_stream_error()
|
||||
|
||||
def _connect_rtsp(self):
|
||||
"""Connect to RTSP stream."""
|
||||
try:
|
||||
self.logger.info(f"Connecting to RTSP: {self.rtsp_url}")
|
||||
|
||||
# Create VideoCapture with optimized settings
|
||||
self.rtsp_cap = cv2.VideoCapture(self.rtsp_url)
|
||||
|
||||
# Set buffer size to 1 to reduce latency
|
||||
self.rtsp_cap.set(cv2.CAP_PROP_BUFFERSIZE, 1)
|
||||
|
||||
# Check if connection successful
|
||||
if self.rtsp_cap.isOpened():
|
||||
# Test read a frame
|
||||
ret, frame = self.rtsp_cap.read()
|
||||
if ret and frame is not None:
|
||||
self.logger.info(f"RTSP connection successful for {self.session_id}")
|
||||
self.consecutive_errors = 0
|
||||
return True
|
||||
|
||||
# Connection failed
|
||||
if self.rtsp_cap:
|
||||
self.rtsp_cap.release()
|
||||
self.rtsp_cap = None
|
||||
|
||||
except Exception as e:
|
||||
self.logger.error(f"Failed to connect RTSP: {e}")
|
||||
|
||||
return False
|
||||
|
||||
def _process_frame(self, frame: np.ndarray) -> Optional[np.ndarray]:
|
||||
"""Process frame - apply cropping if configured."""
|
||||
if frame is None:
|
||||
return None
|
||||
|
||||
try:
|
||||
# Apply crop if configured
|
||||
if self.crop_coords:
|
||||
x1, y1, x2, y2 = self.crop_coords
|
||||
if x1 < x2 and y1 < y2:
|
||||
frame = frame[y1:y2, x1:x2]
|
||||
|
||||
return frame
|
||||
|
||||
except Exception as e:
|
||||
self.logger.error(f"Error processing frame: {e}")
|
||||
return None
|
||||
|
||||
def _handle_stream_error(self):
|
||||
"""Handle stream errors with reconnection logic."""
|
||||
self.consecutive_errors += 1
|
||||
|
||||
if self.consecutive_errors >= self.max_consecutive_errors:
|
||||
self.logger.error(f"Too many consecutive errors ({self.consecutive_errors}), stopping stream")
|
||||
self.stop()
|
||||
return
|
||||
|
||||
# Close current connection
|
||||
if self.rtsp_cap:
|
||||
try:
|
||||
self.rtsp_cap.release()
|
||||
except:
|
||||
pass
|
||||
self.rtsp_cap = None
|
||||
|
||||
# Wait before reconnecting
|
||||
self.logger.warning(f"Stream error #{self.consecutive_errors}, reconnecting in {self.reconnect_delay}s")
|
||||
time.sleep(self.reconnect_delay)
|
||||
|
||||
def is_healthy(self) -> bool:
|
||||
"""Check if stream is healthy (receiving frames)."""
|
||||
if not self.is_running:
|
||||
return False
|
||||
|
||||
# Check if we've received a frame recently
|
||||
if self.last_frame_time > 0:
|
||||
time_since_frame = time.time() - self.last_frame_time
|
||||
return time_since_frame < self.frame_timeout
|
||||
|
||||
return False
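The reader deliberately keeps only the newest frame: the queue is created with maxsize=1, and any stale frame is discarded before the fresh one goes in, so the worker never processes a backlog. The same drop-oldest pattern in isolation:

```python
from queue import Queue, Empty

latest = Queue(maxsize=1)

def publish(frame):
    """Replace whatever is in the single-slot queue with the newest frame."""
    try:
        latest.get_nowait()   # drop the stale frame, if any
    except Empty:
        pass
    try:
        latest.put_nowait(frame)
    except Exception:
        pass                  # another producer won the race; skip this frame

def consume():
    """Non-blocking read used by the worker loop; None means no new frame."""
    try:
        return latest.get_nowait()
    except Empty:
        return None
```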
|
||||
|
||||
|
||||
class SessionWorkerProcess:
|
||||
"""
|
||||
Individual session worker process that handles one camera/session completely.
|
||||
Runs in its own process with isolated memory, models, and state.
|
||||
"""
|
||||
|
||||
def __init__(self, session_id: str, command_queue: mp.Queue, response_queue: mp.Queue):
|
||||
"""
|
||||
Initialize session worker process.
|
||||
|
||||
Args:
|
||||
session_id: Unique session identifier
|
||||
command_queue: Queue to receive commands from main process
|
||||
response_queue: Queue to send responses back to main process
|
||||
"""
|
||||
self.session_id = session_id
|
||||
self.command_queue = command_queue
|
||||
self.response_queue = response_queue
|
||||
|
||||
# Process information
|
||||
self.process = None
|
||||
self.start_time = time.time()
|
||||
self.processed_frames = 0
|
||||
|
||||
# Session components (will be initialized in process)
|
||||
self.model_manager = None
|
||||
self.detection_pipeline = None
|
||||
self.pipeline_parser = None
|
||||
self.logger = None
|
||||
self.session_logger = None
|
||||
self.stream_reader = None
|
||||
|
||||
# Session state
|
||||
self.subscription_config = None
|
||||
self.model_config = None
|
||||
self.backend_session_id = None
|
||||
self.display_id = None
|
||||
self.is_initialized = False
|
||||
self.should_shutdown = False
|
||||
|
||||
# Frame processing
|
||||
self.frame_processing_enabled = False
|
||||
|
||||
async def run(self):
|
||||
"""
|
||||
Main entry point for the worker process.
|
||||
This method runs in the separate process.
|
||||
"""
|
||||
try:
|
||||
# Set process name for debugging
|
||||
mp.current_process().name = f"SessionWorker-{self.session_id}"
|
||||
|
||||
# Setup basic logging first (enhanced after we get subscription config)
|
||||
self._setup_basic_logging()
|
||||
|
||||
self.logger.info(f"Session worker process started for session {self.session_id}")
|
||||
self.logger.info(f"Process ID: {os.getpid()}")
|
||||
|
||||
# Main message processing loop with integrated frame processing
|
||||
while not self.should_shutdown:
|
||||
try:
|
||||
# Process pending messages
|
||||
await self._process_pending_messages()
|
||||
|
||||
# Process frames if enabled and initialized
|
||||
if self.frame_processing_enabled and self.is_initialized and self.stream_reader:
|
||||
await self._process_stream_frames()
|
||||
|
||||
# Brief sleep to prevent busy waiting
|
||||
await asyncio.sleep(0.01)
|
||||
|
||||
except Exception as e:
|
||||
self.logger.error(f"Error in main processing loop: {e}", exc_info=True)
|
||||
self._send_error_response("main_loop_error", str(e), traceback.format_exc())
|
||||
|
||||
except Exception as e:
|
||||
# Critical error in main run loop
|
||||
if self.logger:
|
||||
self.logger.error(f"Critical error in session worker: {e}", exc_info=True)
|
||||
else:
|
||||
print(f"Critical error in session worker {self.session_id}: {e}")
|
||||
|
||||
finally:
|
||||
# Cleanup stream reader
|
||||
if self.stream_reader:
|
||||
self.stream_reader.stop()
|
||||
|
||||
if self.session_logger:
self.session_logger.log_session_end()
self.session_logger.cleanup()
|
||||
if self.logger:
|
||||
self.logger.info(f"Session worker process {self.session_id} shutting down")
|
||||
|
||||
async def _handle_message(self, message: IPCMessageUnion):
|
||||
"""
|
||||
Handle incoming messages from main process.
|
||||
|
||||
Args:
|
||||
message: Deserialized message object
|
||||
"""
|
||||
try:
|
||||
if message.type == MessageType.INITIALIZE:
|
||||
await self._handle_initialize(message)
|
||||
elif message.type == MessageType.PROCESS_FRAME:
|
||||
await self._handle_process_frame(message)
|
||||
elif message.type == MessageType.SET_SESSION_ID:
|
||||
await self._handle_set_session_id(message)
|
||||
elif message.type == MessageType.SHUTDOWN:
|
||||
await self._handle_shutdown(message)
|
||||
elif message.type == MessageType.HEALTH_CHECK:
|
||||
await self._handle_health_check(message)
|
||||
else:
|
||||
self.logger.warning(f"Unknown message type: {message.type}")
|
||||
|
||||
except Exception as e:
|
||||
self.logger.error(f"Error handling message {message.type}: {e}", exc_info=True)
|
||||
self._send_error_response(f"handle_{message.type.value}_error", str(e), traceback.format_exc())
|
||||
|
||||
async def _handle_initialize(self, message: InitializeCommand):
|
||||
"""
|
||||
Initialize the session with models and pipeline.
|
||||
|
||||
Args:
|
||||
message: Initialize command message
|
||||
"""
|
||||
try:
|
||||
self.logger.info(f"Initializing session {self.session_id}")
|
||||
self.logger.info(f"Subscription config: {message.subscription_config}")
|
||||
self.logger.info(f"Model config: {message.model_config}")
|
||||
|
||||
# Store configuration
|
||||
self.subscription_config = message.subscription_config
|
||||
self.model_config = message.model_config
|
||||
|
||||
# Setup enhanced logging now that we have subscription config
|
||||
self._setup_enhanced_logging()
|
||||
|
||||
# Initialize model manager (isolated for this process)
|
||||
self.model_manager = ModelManager("models")
|
||||
self.logger.info("Model manager initialized")
|
||||
|
||||
# Download and prepare model if needed
|
||||
model_id = self.model_config.get('modelId')
|
||||
model_url = self.model_config.get('modelUrl')
|
||||
model_name = self.model_config.get('modelName', f'Model-{model_id}')
|
||||
|
||||
if model_id and model_url:
|
||||
model_path = self.model_manager.ensure_model(model_id, model_url, model_name)
|
||||
if not model_path:
|
||||
raise RuntimeError(f"Failed to download/prepare model {model_id}")
|
||||
|
||||
self.logger.info(f"Model {model_id} prepared at {model_path}")
|
||||
|
||||
# Log model loading
|
||||
if self.session_logger:
|
||||
self.session_logger.log_model_loading(model_id, model_name, str(model_path))
|
||||
|
||||
# Load pipeline configuration
|
||||
self.pipeline_parser = self.model_manager.get_pipeline_config(model_id)
|
||||
if not self.pipeline_parser:
|
||||
raise RuntimeError(f"Failed to load pipeline config for model {model_id}")
|
||||
|
||||
self.logger.info(f"Pipeline configuration loaded for model {model_id}")
|
||||
|
||||
# Initialize detection pipeline (isolated for this session)
|
||||
self.detection_pipeline = DetectionPipeline(
|
||||
pipeline_parser=self.pipeline_parser,
|
||||
model_manager=self.model_manager,
|
||||
model_id=model_id,
|
||||
message_sender=None # Will be set to send via IPC
|
||||
)
|
||||
|
||||
# Initialize pipeline components
|
||||
if not await self.detection_pipeline.initialize():
|
||||
raise RuntimeError("Failed to initialize detection pipeline")
|
||||
|
||||
self.logger.info("Detection pipeline initialized successfully")
|
||||
|
||||
# Initialize integrated stream reader
|
||||
self.logger.info("Initializing integrated stream reader")
|
||||
self.stream_reader = IntegratedStreamReader(
|
||||
self.session_id,
|
||||
self.subscription_config,
|
||||
self.logger
|
||||
)
|
||||
|
||||
# Start stream reading
|
||||
if self.stream_reader.start():
|
||||
self.logger.info("Stream reader started successfully")
|
||||
self.frame_processing_enabled = True
|
||||
else:
|
||||
self.logger.error("Failed to start stream reader")
|
||||
|
||||
self.is_initialized = True
|
||||
|
||||
# Send success response
|
||||
response = InitializedResponse(
|
||||
type=MessageType.INITIALIZED,
|
||||
session_id=self.session_id,
|
||||
success=True
|
||||
)
|
||||
self._send_response(response)
|
||||
|
||||
else:
|
||||
raise ValueError("Missing required model configuration (modelId, modelUrl)")
|
||||
|
||||
except Exception as e:
|
||||
self.logger.error(f"Failed to initialize session: {e}", exc_info=True)
|
||||
response = InitializedResponse(
|
||||
type=MessageType.INITIALIZED,
|
||||
session_id=self.session_id,
|
||||
success=False,
|
||||
error_message=str(e)
|
||||
)
|
||||
self._send_response(response)
|
||||
|
||||
async def _handle_process_frame(self, message: ProcessFrameCommand):
|
||||
"""
|
||||
Process a frame through the detection pipeline.
|
||||
|
||||
Args:
|
||||
message: Process frame command message
|
||||
"""
|
||||
if not self.is_initialized:
|
||||
self._send_error_response("not_initialized", "Session not initialized", None)
|
||||
return
|
||||
|
||||
try:
|
||||
self.logger.debug(f"Processing frame for display {message.display_id}")
|
||||
|
||||
# Process frame through detection pipeline
|
||||
if self.backend_session_id:
|
||||
# Processing phase (after session ID is set)
|
||||
result = await self.detection_pipeline.execute_processing_phase(
|
||||
frame=message.frame,
|
||||
display_id=message.display_id,
|
||||
session_id=self.backend_session_id,
|
||||
subscription_id=message.subscription_identifier
|
||||
)
|
||||
phase = "processing"
|
||||
else:
|
||||
# Detection phase (before session ID is set)
|
||||
result = await self.detection_pipeline.execute_detection_phase(
|
||||
frame=message.frame,
|
||||
display_id=message.display_id,
|
||||
subscription_id=message.subscription_identifier
|
||||
)
|
||||
phase = "detection"
|
||||
|
||||
self.processed_frames += 1
|
||||
|
||||
# Send result back to main process
|
||||
response = DetectionResultResponse(
type=MessageType.DETECTION_RESULT,
|
||||
session_id=self.session_id,
|
||||
detections=result,
|
||||
processing_time=result.get('processing_time', 0.0),
|
||||
phase=phase
|
||||
)
|
||||
self._send_response(response)
|
||||
|
||||
except Exception as e:
|
||||
self.logger.error(f"Error processing frame: {e}", exc_info=True)
|
||||
self._send_error_response("frame_processing_error", str(e), traceback.format_exc())
|
||||
|
||||
async def _handle_set_session_id(self, message: SetSessionIdCommand):
|
||||
"""
|
||||
Set the backend session ID for this session.
|
||||
|
||||
Args:
|
||||
message: Set session ID command message
|
||||
"""
|
||||
try:
|
||||
self.logger.info(f"Setting backend session ID: {message.backend_session_id}")
|
||||
self.backend_session_id = message.backend_session_id
|
||||
self.display_id = message.display_id
|
||||
|
||||
response = SessionSetResponse(
type=MessageType.SESSION_SET,
|
||||
session_id=self.session_id,
|
||||
success=True,
|
||||
backend_session_id=message.backend_session_id
|
||||
)
|
||||
self._send_response(response)
|
||||
|
||||
except Exception as e:
|
||||
self.logger.error(f"Error setting session ID: {e}", exc_info=True)
|
||||
self._send_error_response("set_session_id_error", str(e), traceback.format_exc())
|
||||
|
||||
async def _handle_shutdown(self, message: ShutdownCommand):
|
||||
"""
|
||||
Handle graceful shutdown request.
|
||||
|
||||
Args:
|
||||
message: Shutdown command message
|
||||
"""
|
||||
try:
|
||||
self.logger.info("Received shutdown request")
|
||||
self.should_shutdown = True
|
||||
|
||||
# Cleanup resources
|
||||
if self.detection_pipeline:
|
||||
# Add cleanup method to pipeline if needed
|
||||
pass
|
||||
|
||||
response = ShutdownCompleteResponse(type=MessageType.SHUTDOWN_COMPLETE, session_id=self.session_id)
|
||||
self._send_response(response)
|
||||
|
||||
except Exception as e:
|
||||
self.logger.error(f"Error during shutdown: {e}", exc_info=True)
|
||||
|
||||
async def _handle_health_check(self, message: HealthCheckCommand):
|
||||
"""
|
||||
Handle health check request.
|
||||
|
||||
Args:
|
||||
message: Health check command message
|
||||
"""
|
||||
try:
|
||||
# Get process metrics
|
||||
process = psutil.Process()
|
||||
memory_info = process.memory_info()
|
||||
memory_mb = memory_info.rss / (1024 * 1024) # Convert to MB
|
||||
cpu_percent = process.cpu_percent()
|
||||
|
||||
# GPU memory (if available)
|
||||
gpu_memory_mb = None
|
||||
try:
|
||||
import torch
|
||||
if torch.cuda.is_available():
|
||||
gpu_memory_mb = torch.cuda.memory_allocated() / (1024 * 1024)
|
||||
except ImportError:
|
||||
pass
|
||||
|
||||
# Determine health status
|
||||
status = "healthy"
|
||||
if memory_mb > 2048: # More than 2GB
|
||||
status = "degraded"
|
||||
if memory_mb > 4096: # More than 4GB
|
||||
status = "unhealthy"
|
||||
|
||||
response = HealthResponse(
type=MessageType.HEALTH_RESPONSE,
|
||||
session_id=self.session_id,
|
||||
status=status,
|
||||
memory_usage_mb=memory_mb,
|
||||
cpu_percent=cpu_percent,
|
||||
gpu_memory_mb=gpu_memory_mb,
|
||||
uptime_seconds=time.time() - self.start_time,
|
||||
processed_frames=self.processed_frames
|
||||
)
|
||||
self._send_response(response)
|
||||
|
||||
except Exception as e:
|
||||
self.logger.error(f"Error checking health: {e}", exc_info=True)
|
||||
self._send_error_response("health_check_error", str(e), traceback.format_exc())
|
||||
|
||||
def _send_response(self, response: IPCMessageUnion):
|
||||
"""
|
||||
Send response message to main process.
|
||||
|
||||
Args:
|
||||
response: Response message to send
|
||||
"""
|
||||
try:
|
||||
serialized = MessageSerializer.serialize_message(response)
|
||||
self.response_queue.put(serialized)
|
||||
except Exception as e:
|
||||
if self.logger:
|
||||
self.logger.error(f"Failed to send response: {e}")
|
||||
|
||||
def _send_error_response(self, error_type: str, error_message: str, traceback_str: Optional[str]):
|
||||
"""
|
||||
Send error response to main process.
|
||||
|
||||
Args:
|
||||
error_type: Type of error
|
||||
error_message: Error message
|
||||
traceback_str: Optional traceback string
|
||||
"""
|
||||
error_response = ErrorResponse(
|
||||
type=MessageType.ERROR,
|
||||
session_id=self.session_id,
|
||||
error_type=error_type,
|
||||
error_message=error_message,
|
||||
traceback=traceback_str
|
||||
)
|
||||
self._send_response(error_response)
|
||||
|
||||
def _setup_basic_logging(self):
|
||||
"""
|
||||
Setup basic logging for this process before we have subscription config.
|
||||
"""
|
||||
logging.basicConfig(
|
||||
level=logging.INFO,
|
||||
format=f"%(asctime)s [%(levelname)s] SessionWorker-{self.session_id}: %(message)s",
|
||||
handlers=[
|
||||
logging.StreamHandler(sys.stdout)
|
||||
]
|
||||
)
|
||||
self.logger = logging.getLogger(f"session_worker_{self.session_id}")
|
||||
|
||||
def _setup_enhanced_logging(self):
|
||||
"""
|
||||
Setup per-session logging with dedicated log file after we have subscription config.
|
||||
Phase 2: Enhanced logging with file rotation and session context.
|
||||
"""
|
||||
if not self.subscription_config:
|
||||
return
|
||||
|
||||
# Initialize per-session logger
|
||||
subscription_id = self.subscription_config.get('subscriptionIdentifier', self.session_id)
|
||||
|
||||
self.session_logger = PerSessionLogger(
|
||||
session_id=self.session_id,
|
||||
subscription_identifier=subscription_id,
|
||||
log_dir="logs",
|
||||
max_size_mb=100,
|
||||
backup_count=5
|
||||
)
|
||||
|
||||
# Get the configured logger (replaces basic logger)
|
||||
self.logger = self.session_logger.get_logger()
|
||||
|
||||
# Log session start
|
||||
self.session_logger.log_session_start(os.getpid())
|
||||
|
||||
async def _process_pending_messages(self):
|
||||
"""Process pending IPC messages from main process."""
|
||||
try:
|
||||
# Process all pending messages
|
||||
while not self.command_queue.empty():
|
||||
message_data = self.command_queue.get_nowait()
|
||||
message = MessageSerializer.deserialize_message(message_data)
|
||||
await self._handle_message(message)
|
||||
except Exception as e:
|
||||
if not self.command_queue.empty():
|
||||
# Only log error if there was actually a message to process
|
||||
self.logger.error(f"Error processing messages: {e}", exc_info=True)
|
||||
|
||||
async def _process_stream_frames(self):
|
||||
"""Process frames from the integrated stream reader."""
|
||||
try:
|
||||
if not self.stream_reader or not self.stream_reader.is_running:
|
||||
return
|
||||
|
||||
# Get latest frame from stream
|
||||
frame_data = self.stream_reader.get_latest_frame()
|
||||
if frame_data is None:
|
||||
return
|
||||
|
||||
frame, display_id, timestamp = frame_data
|
||||
|
||||
# Process frame through detection pipeline
|
||||
subscription_identifier = self.subscription_config['subscriptionIdentifier']
|
||||
|
||||
if self.backend_session_id:
|
||||
# Processing phase (after session ID is set)
|
||||
result = await self.detection_pipeline.execute_processing_phase(
|
||||
frame=frame,
|
||||
display_id=display_id,
|
||||
session_id=self.backend_session_id,
|
||||
subscription_id=subscription_identifier
|
||||
)
|
||||
phase = "processing"
|
||||
else:
|
||||
# Detection phase (before session ID is set)
|
||||
result = await self.detection_pipeline.execute_detection_phase(
|
||||
frame=frame,
|
||||
display_id=display_id,
|
||||
subscription_id=subscription_identifier
|
||||
)
|
||||
phase = "detection"
|
||||
|
||||
self.processed_frames += 1
|
||||
|
||||
# Send result back to main process
|
||||
response = DetectionResultResponse(
|
||||
type=MessageType.DETECTION_RESULT,
|
||||
session_id=self.session_id,
|
||||
detections=result,
|
||||
processing_time=result.get('processing_time', 0.0),
|
||||
phase=phase
|
||||
)
|
||||
self._send_response(response)
|
||||
|
||||
# Log frame processing (debug level to avoid spam)
|
||||
self.logger.debug(f"Processed frame #{self.processed_frames} from {display_id} (phase: {phase})")
|
||||
|
||||
except Exception as e:
|
||||
self.logger.error(f"Error processing stream frame: {e}", exc_info=True)
|
||||
|
||||
|
||||
def session_worker_main(session_id: str, command_queue: mp.Queue, response_queue: mp.Queue):
|
||||
"""
|
||||
Main entry point for session worker process.
|
||||
This function is called when the process is spawned.
|
||||
"""
|
||||
# Create worker instance
|
||||
worker = SessionWorkerProcess(session_id, command_queue, response_queue)
|
||||
|
||||
# Run the worker
|
||||
asyncio.run(worker.run())
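`session_worker_main` is the spawn target the manager hands to `multiprocessing.Process`, together with the two queues. A stripped-down sketch of the handshake from the parent side; the session id, subscription config, and model URL are placeholders, and a real model download may take a while (hence the generous timeout):

```python
import multiprocessing as mp
from core.processes.communication import (
    MessageSerializer, MessageType, InitializeCommand
)
from core.processes.session_worker import session_worker_main

if __name__ == "__main__":
    command_q, response_q = mp.Queue(), mp.Queue()

    proc = mp.Process(
        target=session_worker_main,
        args=("session_demo", command_q, response_q),
        name="SessionWorker-session_demo",
    )
    proc.start()

    # The first message must be INITIALIZE so the worker loads its own models.
    init = InitializeCommand(
        type=MessageType.INITIALIZE,
        session_id="session_demo",
        subscription_config={"subscriptionIdentifier": "display-001;cam-001"},
        model_config={"modelId": 52, "modelUrl": "https://example/model.mpta"},
    )
    command_q.put(MessageSerializer.serialize_message(init))

    # The worker answers with an INITIALIZED response (or an ERROR).
    reply = MessageSerializer.deserialize_message(response_q.get(timeout=120))
    print(reply.type, getattr(reply, "success", None))

    proc.terminate()
    proc.join()
```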
|