refactor: halfway to process per session
All checks were successful
Build Worker Base and Application Images / check-base-changes (push) Successful in 7s
Build Worker Base and Application Images / build-base (push) Has been skipped
Build Worker Base and Application Images / build-docker (push) Successful in 2m52s
Build Worker Base and Application Images / deploy-stack (push) Successful in 9s
parent 2e5316ca01
commit 34d1982e9e
12 changed files with 2771 additions and 92 deletions
339  IMPLEMENTATION_PLAN.md  Normal file
@@ -0,0 +1,339 @@
# Session-Isolated Multiprocessing Architecture - Implementation Plan

## 🎯 Objective

Eliminate the shared-state issues causing identical results across different sessions by implementing a **Process-Per-Session architecture** with **per-camera logging**.

## 🔍 Root Cause Analysis

### Current Shared State Issues:
1. **Shared Model Cache** (`core/models/inference.py:40`): All sessions share the same cached YOLO model instances (see the sketch after this list)
2. **Single Pipeline Instance** (`core/detection/pipeline.py`): One pipeline handles all sessions with shared mappings
3. **Global Session Mappings**: `session_to_subscription` and `session_processing_results` dictionaries
4. **Shared Thread Pool**: A single `ThreadPoolExecutor` serves all sessions
5. **Global Frame Cache** (`app.py:39`): `latest_frames` is shared across endpoints
6. **Single Log File**: All cameras write to `detector_worker.log`
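To see why issue 1 bites, recall that a Python class-level dictionary is one object shared by every instance of the class, so a wrapper created for one session reads models cached by another. A minimal illustration (toy code, not the project's `YOLOWrapper`):

```python
# A class-level dict belongs to the class, not to any one instance.
class Wrapper:
    _model_cache = {}  # shared by every instance, mirroring the old YOLOWrapper._model_cache

    def load(self, path: str):
        if path not in self._model_cache:
            self._model_cache[path] = object()  # stand-in for a loaded YOLO model
        return self._model_cache[path]

a, b = Wrapper(), Wrapper()
assert a.load("model.pt") is b.load("model.pt")  # two "sessions", one shared model object
```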

## 🏗️ New Architecture: Process-Per-Session

```
FastAPI Main Process (Port 8001)
├── WebSocket Handler (manages connections)
├── SessionProcessManager (spawns/manages session processes)
├── Main Process Logger → detector_worker_main.log
│
├── Session Process 1 (Camera/Display 1)
│   ├── Dedicated Model Pipeline
│   ├── Own Model Cache & Memory
│   ├── Session Logger → detector_worker_camera_display-001_cam-001.log
│   └── Redis/DB connections
│
├── Session Process 2 (Camera/Display 2)
│   ├── Dedicated Model Pipeline
│   ├── Own Model Cache & Memory
│   ├── Session Logger → detector_worker_camera_display-002_cam-001.log
│   └── Redis/DB connections
│
└── Session Process N...
```

## 📋 Implementation Tasks

### Phase 1: Core Infrastructure ✅ **COMPLETED**
- [x] **Create SessionProcessManager class** ✅
  - Manages the lifecycle of session processes
  - Handles process spawning, monitoring, and cleanup
  - Maintains a process registry and health checks

- [x] **Implement SessionWorkerProcess** ✅
  - Individual process class that handles one session completely
  - Loads its own models and pipeline, and maintains its own state
  - Communicates with the main process via queues

- [x] **Design Inter-Process Communication** ✅ (a minimal sketch follows this list)
  - Command queue: Main → Session (frames, commands, config)
  - Result queue: Session → Main (detections, status, errors)
  - Use `multiprocessing.Queue` for process-safe communication
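A minimal, self-contained sketch of this queue pattern; the dict-based messages stand in for the real message classes (`DetectionResultResponse` and friends) defined in `core/processes/communication.py`:

```python
import multiprocessing as mp

def session_worker(command_queue: mp.Queue, result_queue: mp.Queue) -> None:
    """Child process: owns its own models and state, talks only via queues."""
    while True:
        command = command_queue.get()  # blocks until the main process sends work
        if command["type"] == "shutdown":
            break
        if command["type"] == "process_frame":
            # ...run this process's isolated detection pipeline on command["frame"]...
            result_queue.put({"type": "detection_result", "detections": []})

if __name__ == "__main__":
    cmd_q, res_q = mp.Queue(), mp.Queue()
    worker = mp.Process(target=session_worker, args=(cmd_q, res_q),
                        name="SessionWorker-session_xxx")
    worker.start()
    cmd_q.put({"type": "process_frame", "frame": None})  # frame payload elided
    print(res_q.get())  # -> {'type': 'detection_result', 'detections': []}
    cmd_q.put({"type": "shutdown"})
    worker.join()
```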

**Phase 1 Testing Results:**
- ✅ Server starts successfully on port 8001
- ✅ WebSocket connections established (10.100.1.3:57488)
- ✅ SessionProcessManager initializes (max_sessions=20)
- ✅ Multiple session processes created (9 camera subscriptions)
- ✅ Individual session processes spawn with unique PIDs (e.g., PID: 16380)
- ✅ Session logging shows isolated process names (SessionWorker-session_xxx)
- ✅ IPC communication framework functioning

**What to Look For When Testing:**
- Check logs for "SessionProcessManager initialized"
- Verify individual session processes: "Session process created: session_xxx (PID: xxxx)"
- Monitor process isolation: each session has a unique process name "SessionWorker-session_xxx"
- Confirm WebSocket integration: "Session WebSocket integration started"

### Phase 2: Per-Session Logging ✅ **COMPLETED**
- [x] **Implement PerSessionLogger** ✅ (a usage sketch follows these tasks)
  - Each session process creates its own log file
  - Format: `detector_worker_camera_{subscription_id}.log`
  - Include session context in all log messages
  - Implement log rotation (daily/size-based)

- [x] **Update Main Process Logging** ✅
  - Main process logs to `detector_worker_main.log`
  - Log session process lifecycle events
  - Track active sessions and resource usage
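A usage sketch for the `PerSessionLogger` this commit adds (full source in `core/logging/session_logger.py` later in this diff); the identifiers are illustrative:

```python
from core.logging.session_logger import PerSessionLogger

session_logger = PerSessionLogger(
    session_id="session_abc123",                   # illustrative
    subscription_identifier="display-001;cam-001",  # illustrative
    log_dir="logs",
    max_size_mb=100,
    backup_count=5,
)
log = session_logger.get_logger()
session_logger.log_session_start(process_id=16380)
# The ';' is sanitized to '_', so lines land in
# logs/detector_worker_camera_display-001_cam-001.log
log.info("Frame pipeline ready")
```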

**Phase 2 Testing Results:**
- ✅ Main process logs to dedicated file: `logs/detector_worker_main.log`
- ✅ Session-specific logger initialization working
- ✅ Each camera spawns with a unique session worker name: "SessionWorker-session_{unique_id}_{camera_name}"
- ✅ Per-session logger ready for file creation (files are created once sessions fully initialize)
- ✅ Structured logging with session context in the log format
- ✅ Log rotation capability implemented (100MB max, 5 backups)

**What to Look For When Testing:**
- Check for the main process log: `logs/detector_worker_main.log`
- Monitor per-session process names in logs: "SessionWorker-session_xxx"
- Once sessions initialize fully, look for per-camera log files: `detector_worker_camera_{camera_name}.log`
- Verify session start/end events are logged with timestamps
- Check log rotation when files exceed 100MB

### Phase 3: Model & Pipeline Isolation ✅ **COMPLETED**
- [x] **Remove Shared Model Cache** ✅
  - Eliminated the `YOLOWrapper._model_cache` class variable
  - Each process loads models independently
  - Memory isolation prevents cross-session contamination

- [x] **Create Per-Process Pipeline Instances** ✅
  - Each session process instantiates its own `DetectionPipeline`
  - Removed the global pipeline singleton pattern
  - Session-local `session_to_subscription` mapping

- [x] **Isolate Session State** ✅
  - Each process maintains its own `session_processing_results`
  - Session mappings are process-local
  - Complete state isolation per session

**Phase 3 Testing Results:**
- ✅ **Zero Shared Cache**: Models log "(ISOLATED)" and "no shared cache!"
- ✅ **Individual Model Loading**: Each session loads the complete model set independently
  - `car_frontal_detection_v1.pt` per session
  - `car_brand_cls_v1.pt` per session
  - `car_bodytype_cls_v1.pt` per session
- ✅ **Pipeline Isolation**: Each session has a unique pipeline instance ID
- ✅ **Memory Isolation**: Different sessions cannot share model instances
- ✅ **State Isolation**: Session mappings are process-local (ISOLATED comments added)

**What to Look For When Testing:**
- Check logs for "(ISOLATED)" on model loading
- Verify each session loads models independently: "Loading YOLO model ... (ISOLATED)"
- Monitor unique pipeline instance IDs per session
- Confirm no shared state between sessions
- Look for "Successfully loaded model ... in isolation - no shared cache!"

### Phase 4: Integrated Stream-Session Architecture 🚧 **IN PROGRESS**

**Problem Identified:** The frame processing pipeline is not working: two parallel stream systems leave a communication gap between capture and detection.

**Root Cause:**
- The old RTSP Process Manager captures frames but does not forward them to session workers
- The new session workers are ready for processing but receive no frames
- This architecture mismatch prevents detection despite successful initialization

**Solution:** Complete integration of stream reading INTO session worker processes.

- [ ] **Integrate RTSP Stream Reading into Session Workers** (a sketch follows the diagram below)
  - Move RTSP stream capture from separate processes into each session worker
  - Each session worker handles: RTSP connection + frame processing + model inference
  - Eliminate the communication gap between stream capture and detection

- [ ] **Remove Duplicate Stream Management Systems**
  - Delete the old RTSP Process Manager (`core/streaming/process_manager.py`)
  - Remove conflicting stream management from the main process
  - Consolidate to a single session-worker-only architecture

- [ ] **Enhanced Session Worker with Stream Integration**
  - Add an RTSP stream reader to `SessionWorkerProcess`
  - Implement frame buffer queue management per worker
  - Add connection recovery and stream health monitoring per session

- [ ] **Complete End-to-End Isolation per Camera**

  ```
  Session Worker Process N:
  ├── RTSP Stream Reader (rtsp://cameraN)
  ├── Frame Buffer Queue
  ├── YOLO Detection Pipeline
  ├── Model Cache (isolated)
  ├── Database/Redis connections
  └── Per-camera Logger
  ```
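As flagged in the first task above, here is a minimal sketch of what stream reading inside the worker could look like, assuming OpenCV (`cv2`) for RTSP capture; the detection step is elided and the real home for this loop is `SessionWorkerProcess`:

```python
import multiprocessing as mp
import cv2

def session_worker(rtsp_url: str, result_queue: mp.Queue, stop_event) -> None:
    """One process per camera: capture, infer, and report without leaving the process."""
    cap = cv2.VideoCapture(rtsp_url)
    try:
        while not stop_event.is_set():
            ok, frame = cap.read()
            if not ok:
                cap.release()
                cap = cv2.VideoCapture(rtsp_url)  # per-camera reconnect; others unaffected
                continue
            # ...run this process's isolated YOLO pipeline on `frame`...
            result_queue.put({"camera": rtsp_url, "detections": []})
    finally:
        cap.release()
```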

**Benefits for 20+ Cameras:**
- **Python GIL Bypass**: True parallelism with multiprocessing
- **Resource Isolation**: Process crashes don't affect other cameras
- **Memory Distribution**: Each process has its own memory space
- **Independent Recovery**: Per-camera reconnection logic
- **Scalable Architecture**: Linear scaling with available CPU cores

### Phase 5: Resource Management & Cleanup
- [ ] **Process Lifecycle Management**
  - Automatic process cleanup on WebSocket disconnect
  - Graceful shutdown handling
  - Resource deallocation on process termination

- [ ] **Memory & GPU Management**
  - Monitor per-process memory usage
  - GPU memory isolation between sessions
  - Prevent memory leaks in long-running processes

- [ ] **Health Monitoring**
  - Process health checks and restart capability
  - Performance metrics per session process
  - Resource usage monitoring and alerting

## 🔄 What Will Be Replaced

### Files to Modify:
1. **`app.py`**
   - Replace direct pipeline execution with process management
   - Remove the global `latest_frames` cache
   - Add SessionProcessManager integration

2. **`core/models/inference.py`**
   - Remove the shared `_model_cache` class variable
   - Make model loading process-specific
   - Eliminate cross-session model sharing

3. **`core/detection/pipeline.py`**
   - Remove global session mappings
   - Make the pipeline instance session-specific
   - Isolate processing state per session

4. **`core/communication/websocket.py`**
   - Replace direct pipeline calls with IPC
   - Add process spawn/cleanup on subscribe/unsubscribe
   - Implement queue-based communication

### New Files to Create:
1. **`core/processes/session_manager.py`**
   - SessionProcessManager class
   - Process lifecycle management
   - Health monitoring and cleanup

2. **`core/processes/session_worker.py`**
   - SessionWorkerProcess class
   - Individual session process implementation
   - Model loading and pipeline execution

3. **`core/processes/communication.py`**
   - IPC message definitions and handlers
   - Queue management utilities
   - Protocol for main ↔ session communication

4. **`core/logging/session_logger.py`**
   - Per-session logging configuration
   - Log file management and rotation
   - Structured logging with session context

## ❌ What Will Be Removed

### Code to Remove:
1. **Shared State Variables**

   ```python
   # From core/models/inference.py
   _model_cache: Dict[str, Any] = {}

   # From core/detection/pipeline.py
   self.session_to_subscription = {}
   self.session_processing_results = {}

   # From app.py
   latest_frames = {}
   ```

2. **Global Singleton Patterns**
   - Single pipeline instance handling all sessions
   - Shared ThreadPoolExecutor across sessions
   - Global model manager for all subscriptions

3. **Cross-Session Dependencies**
   - Session mapping lookups across different subscriptions
   - Shared processing state between unrelated sessions
   - Global frame caching across all cameras

## 🔧 Configuration Changes

### New Configuration Options:

```json
{
  "session_processes": {
    "max_concurrent_sessions": 20,
    "process_cleanup_timeout": 30,
    "health_check_interval": 10,
    "log_rotation": {
      "max_size_mb": 100,
      "backup_count": 5
    }
  },
  "resource_limits": {
    "memory_per_process_mb": 2048,
    "gpu_memory_fraction": 0.3
  }
}
```
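One plausible way a session worker could enforce these limits, assuming PyTorch is the GPU runtime; `apply_resource_limits` is a hypothetical helper, and only `gpu_memory_fraction` maps to a direct API call here:

```python
import torch

def apply_resource_limits(config: dict) -> None:
    """Hypothetical per-worker hook: cap this process's share of GPU memory."""
    fraction = config["resource_limits"]["gpu_memory_fraction"]
    if torch.cuda.is_available():
        # Limits this process's CUDA caching allocator to a fraction of device 0
        torch.cuda.set_per_process_memory_fraction(fraction, device=0)
```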

## 📊 Benefits of New Architecture

### 🛡️ Complete Isolation:
- **Memory Isolation**: Each session runs in a separate process memory space
- **Model Isolation**: No shared model cache between sessions
- **State Isolation**: Session mappings and processing state are process-local
- **Error Isolation**: Process crashes don't affect other sessions

### 📈 Performance Improvements:
- **True Parallelism**: Bypass Python GIL limitations
- **Resource Optimization**: Each process uses only required resources
- **Scalability**: Linear scaling with available CPU cores
- **Memory Efficiency**: Automatic cleanup on session termination

### 🔍 Enhanced Monitoring:
- **Per-Camera Logs**: Dedicated log file for each session
- **Resource Tracking**: Monitor CPU/memory per session process
- **Debugging**: Isolated logs make issue diagnosis easier
- **Audit Trail**: Complete processing history per camera

### 🚀 Operational Benefits:
- **Zero Cross-Session Contamination**: Impossible for sessions to affect each other
- **Hot Restart**: Individual session restart without affecting others
- **Resource Control**: Fine-grained resource allocation per session
- **Development**: Easier testing and debugging of individual sessions

## 🎬 Implementation Order

1. **Phase 1**: Core infrastructure (SessionProcessManager, IPC)
2. **Phase 2**: Per-session logging system
3. **Phase 3**: Model and pipeline isolation
4. **Phase 4**: Integrated stream-session architecture
5. **Phase 5**: Resource management and monitoring

## 🧪 Testing Strategy

1. **Unit Tests**: Test individual session processes in isolation
2. **Integration Tests**: Test main ↔ session process communication
3. **Load Tests**: Multiple concurrent sessions with different models
4. **Memory Tests**: Verify no cross-session memory leaks (a sketch follows this list)
5. **Logging Tests**: Verify correct log file creation and rotation
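A sketch of what the memory test (item 4) could check; `load_model_stub` is a hypothetical stand-in for real model loading:

```python
import multiprocessing as mp

def load_model_stub(q: mp.Queue) -> None:
    model = object()  # stand-in for YOLO(str(model_path))
    q.put((mp.current_process().pid, id(model)))

def test_sessions_do_not_share_model_memory() -> None:
    q = mp.Queue()
    procs = [mp.Process(target=load_model_stub, args=(q,)) for _ in range(2)]
    for p in procs:
        p.start()
    (pid_a, _), (pid_b, _) = (q.get(), q.get())
    for p in procs:
        p.join()
    # Distinct PIDs prove separate address spaces: nothing to leak between them.
    assert pid_a != pid_b
```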

## 📝 Migration Checklist

- [ ] Backup current working version
- [ ] Implement Phase 1 (core infrastructure)
- [ ] Test with single session process
- [ ] Implement Phase 2 (logging)
- [ ] Test with multiple concurrent sessions
- [ ] Implement Phase 3 (isolation)
- [ ] Verify complete elimination of shared state
- [ ] Implement Phase 4 (stream integration)
- [ ] Implement Phase 5 (resource management)
- [ ] Performance testing and optimization
- [ ] Documentation updates

---

**Expected Outcome**: Complete elimination of cross-session result contamination, with enhanced monitoring capabilities and true session isolation.
14  app.py

```diff
@@ -22,15 +22,11 @@ if __name__ != "__main__":  # When imported by uvicorn
 from core.communication.websocket import websocket_endpoint
 from core.communication.state import worker_state

-# Configure logging
-logging.basicConfig(
-    level=logging.DEBUG,
-    format="%(asctime)s [%(levelname)s] %(name)s: %(message)s",
-    handlers=[
-        logging.FileHandler("detector_worker.log"),
-        logging.StreamHandler()
-    ]
-)
+# Import and setup main process logging
+from core.logging.session_logger import setup_main_process_logging
+
+# Configure main process logging
+setup_main_process_logging("logs")

 logger = logging.getLogger("detector_worker")
 logger.setLevel(logging.DEBUG)
```
319  core/communication/session_integration.py  Normal file
@@ -0,0 +1,319 @@

```python
"""
Integration layer between WebSocket handler and Session Process Manager.
Bridges the existing WebSocket protocol with the new session-based architecture.
"""

import asyncio
import logging
from typing import Dict, Any, Optional
import numpy as np

from ..processes.session_manager import SessionProcessManager
from ..processes.communication import DetectionResultResponse, ErrorResponse
from .state import worker_state
from .messages import serialize_outgoing_message
# Streaming is now handled directly by session workers - no shared stream manager needed

logger = logging.getLogger(__name__)


class SessionWebSocketIntegration:
    """
    Integration layer that connects WebSocket protocol with Session Process Manager.
    Maintains compatibility with existing WebSocket message handling.
    """

    def __init__(self, websocket_handler=None):
        """
        Initialize session WebSocket integration.

        Args:
            websocket_handler: Reference to WebSocket handler for sending messages
        """
        self.websocket_handler = websocket_handler
        self.session_manager = SessionProcessManager()

        # Track active subscriptions for compatibility
        self.active_subscriptions: Dict[str, Dict[str, Any]] = {}

        # Set up callbacks
        self.session_manager.set_detection_result_callback(self._on_detection_result)
        self.session_manager.set_error_callback(self._on_session_error)

    async def start(self):
        """Start the session integration."""
        await self.session_manager.start()
        logger.info("Session WebSocket integration started")

    async def stop(self):
        """Stop the session integration."""
        await self.session_manager.stop()
        logger.info("Session WebSocket integration stopped")

    async def handle_set_subscription_list(self, message) -> bool:
        """
        Handle setSubscriptionList message by managing session processes.

        Args:
            message: SetSubscriptionListMessage

        Returns:
            True if successful
        """
        try:
            logger.info(f"Processing subscription list with {len(message.subscriptions)} subscriptions")

            new_subscription_ids = set()
            for subscription in message.subscriptions:
                subscription_id = subscription.subscriptionIdentifier
                new_subscription_ids.add(subscription_id)

                # Check if this is a new subscription
                if subscription_id not in self.active_subscriptions:
                    logger.info(f"Creating new session for subscription: {subscription_id}")

                    # Convert subscription to configuration dict
                    subscription_config = {
                        'subscriptionIdentifier': subscription.subscriptionIdentifier,
                        'rtspUrl': getattr(subscription, 'rtspUrl', None),
                        'snapshotUrl': getattr(subscription, 'snapshotUrl', None),
                        'snapshotInterval': getattr(subscription, 'snapshotInterval', 5000),
                        'modelUrl': subscription.modelUrl,
                        'modelId': subscription.modelId,
                        'modelName': subscription.modelName,
                        'cropX1': subscription.cropX1,
                        'cropY1': subscription.cropY1,
                        'cropX2': subscription.cropX2,
                        'cropY2': subscription.cropY2
                    }

                    # Create session process
                    success = await self.session_manager.create_session(
                        subscription_id, subscription_config
                    )

                    if success:
                        self.active_subscriptions[subscription_id] = subscription_config
                        logger.info(f"Session created successfully for {subscription_id}")

                        # Stream handling is now integrated into session worker process
                    else:
                        logger.error(f"Failed to create session for {subscription_id}")
                        return False

                else:
                    # Update existing subscription configuration if needed
                    self.active_subscriptions[subscription_id].update({
                        'modelUrl': subscription.modelUrl,
                        'modelId': subscription.modelId,
                        'modelName': subscription.modelName,
                        'cropX1': subscription.cropX1,
                        'cropY1': subscription.cropY1,
                        'cropX2': subscription.cropX2,
                        'cropY2': subscription.cropY2
                    })

            # Remove sessions for subscriptions that are no longer active
            current_subscription_ids = set(self.active_subscriptions.keys())
            removed_subscriptions = current_subscription_ids - new_subscription_ids

            for subscription_id in removed_subscriptions:
                logger.info(f"Removing session for subscription: {subscription_id}")
                await self.session_manager.remove_session(subscription_id)
                del self.active_subscriptions[subscription_id]

            # Update worker state for compatibility
            worker_state.set_subscriptions(message.subscriptions)

            logger.info(f"Subscription list processed: {len(new_subscription_ids)} active sessions")
            return True

        except Exception as e:
            logger.error(f"Error handling subscription list: {e}", exc_info=True)
            return False

    async def handle_set_session_id(self, message) -> bool:
        """
        Handle setSessionId message by forwarding to appropriate session process.

        Args:
            message: SetSessionIdMessage

        Returns:
            True if successful
        """
        try:
            display_id = message.payload.displayIdentifier
            session_id = message.payload.sessionId

            logger.info(f"Setting session ID {session_id} for display {display_id}")

            # Find subscription identifier for this display
            subscription_id = None
            for sub_id in self.active_subscriptions.keys():
                # Extract display identifier from subscription identifier
                if display_id in sub_id:
                    subscription_id = sub_id
                    break

            if not subscription_id:
                logger.error(f"No active subscription found for display {display_id}")
                return False

            # Forward to session process
            success = await self.session_manager.set_session_id(
                subscription_id, str(session_id), display_id
            )

            if success:
                # Update worker state for compatibility
                worker_state.set_session_id(display_id, session_id)
                logger.info(f"Session ID {session_id} set successfully for {display_id}")
            else:
                logger.error(f"Failed to set session ID {session_id} for {display_id}")

            return success

        except Exception as e:
            logger.error(f"Error setting session ID: {e}", exc_info=True)
            return False

    async def process_frame(self, subscription_id: str, frame: np.ndarray, display_id: str,
                            timestamp: Optional[float] = None) -> bool:
        """
        Process frame through appropriate session process.

        Args:
            subscription_id: Subscription identifier
            frame: Frame to process
            display_id: Display identifier
            timestamp: Frame timestamp

        Returns:
            True if frame was processed successfully
        """
        try:
            if timestamp is None:
                timestamp = asyncio.get_event_loop().time()

            # Forward frame to session process
            success = await self.session_manager.process_frame(
                subscription_id, frame, display_id, timestamp
            )

            if not success:
                logger.warning(f"Failed to process frame for subscription {subscription_id}")

            return success

        except Exception as e:
            logger.error(f"Error processing frame for {subscription_id}: {e}", exc_info=True)
            return False

    async def _on_detection_result(self, subscription_id: str, response: DetectionResultResponse):
        """
        Handle detection result from session process.

        Args:
            subscription_id: Subscription identifier
            response: Detection result response
        """
        try:
            logger.debug(f"Received detection result from {subscription_id}: phase={response.phase}")

            # Send imageDetection message via WebSocket (if needed)
            if self.websocket_handler and hasattr(self.websocket_handler, 'send_message'):
                from .models import ImageDetectionMessage, DetectionData

                # Convert response detections to the expected format
                # The DetectionData expects modelId and modelName, and detection dict
                detection_data = DetectionData(
                    detection=response.detections,
                    modelId=getattr(response, 'model_id', 0),  # Get from response if available
                    modelName=getattr(response, 'model_name', 'unknown')  # Get from response if available
                )

                # Convert timestamp to string format if it exists
                timestamp_str = None
                if hasattr(response, 'timestamp') and response.timestamp:
                    from datetime import datetime
                    if isinstance(response.timestamp, (int, float)):
                        # Convert Unix timestamp to ISO format string
                        timestamp_str = datetime.fromtimestamp(response.timestamp).strftime("%Y-%m-%dT%H:%M:%S.%fZ")
                    else:
                        timestamp_str = str(response.timestamp)

                detection_message = ImageDetectionMessage(
                    subscriptionIdentifier=subscription_id,
                    data=detection_data,
                    timestamp=timestamp_str
                )

                serialized = serialize_outgoing_message(detection_message)
                await self.websocket_handler.send_message(serialized)

        except Exception as e:
            logger.error(f"Error handling detection result from {subscription_id}: {e}", exc_info=True)

    async def _on_session_error(self, subscription_id: str, error_response: ErrorResponse):
        """
        Handle error from session process.

        Args:
            subscription_id: Subscription identifier
            error_response: Error response
        """
        logger.error(f"Session error from {subscription_id}: {error_response.error_type} - {error_response.error_message}")

        # Send error message via WebSocket if needed
        if self.websocket_handler and hasattr(self.websocket_handler, 'send_message'):
            error_message = {
                'type': 'sessionError',
                'payload': {
                    'subscriptionIdentifier': subscription_id,
                    'errorType': error_response.error_type,
                    'errorMessage': error_response.error_message,
                    'timestamp': error_response.timestamp
                }
            }

            try:
                serialized = serialize_outgoing_message(error_message)
                await self.websocket_handler.send_message(serialized)
            except Exception as e:
                logger.error(f"Failed to send error message: {e}")

    def get_session_stats(self) -> Dict[str, Any]:
        """
        Get statistics about active sessions.

        Returns:
            Dictionary with session statistics
        """
        return {
            'active_sessions': self.session_manager.get_session_count(),
            'max_sessions': self.session_manager.max_concurrent_sessions,
            'subscriptions': list(self.active_subscriptions.keys())
        }

    async def handle_progression_stage(self, message) -> bool:
        """
        Handle setProgressionStage message.

        Args:
            message: SetProgressionStageMessage

        Returns:
            True if successful
        """
        try:
            # For now, just update worker state for compatibility
            # In future phases, this could be forwarded to session processes
            worker_state.set_progression_stage(
                message.payload.displayIdentifier,
                message.payload.progressionStage
            )
            return True
        except Exception as e:
            logger.error(f"Error handling progression stage: {e}", exc_info=True)
            return False
```
core/communication/websocket.py

```diff
@@ -24,6 +24,7 @@ from .state import worker_state, SystemMetrics
 from ..models import ModelManager
 from ..streaming.manager import shared_stream_manager
 from ..tracking.integration import TrackingPipelineIntegration
+from .session_integration import SessionWebSocketIntegration

 logger = logging.getLogger(__name__)

@@ -48,6 +49,9 @@ class WebSocketHandler:
         self._heartbeat_count = 0
         self._last_processed_models: set = set()  # Cache of last processed model IDs

+        # Initialize session integration
+        self.session_integration = SessionWebSocketIntegration(self)
+
     async def handle_connection(self) -> None:
         """
         Main connection handler that manages the WebSocket lifecycle.

@@ -66,14 +70,16 @@ class WebSocketHandler:
             # Send immediate heartbeat to show connection is alive
             await self._send_immediate_heartbeat()

-            # Start background tasks (matching original architecture)
-            stream_task = asyncio.create_task(self._process_streams())
+            # Start session integration
+            await self.session_integration.start()
+
+            # Start background tasks - stream processing now handled by session workers
             heartbeat_task = asyncio.create_task(self._send_heartbeat())
             message_task = asyncio.create_task(self._handle_messages())

-            logger.info(f"WebSocket background tasks started for {client_info} (stream + heartbeat + message handler)")
+            logger.info(f"WebSocket background tasks started for {client_info} (heartbeat + message handler)")

-            # Wait for heartbeat and message tasks (stream runs independently)
+            # Wait for heartbeat and message tasks
             await asyncio.gather(heartbeat_task, message_task)

         except Exception as e:

@@ -87,6 +93,11 @@ class WebSocketHandler:
                 await stream_task
             except asyncio.CancelledError:
                 logger.debug(f"Stream task cancelled for {client_info}")

+        # Stop session integration
+        if hasattr(self, 'session_integration'):
+            await self.session_integration.stop()
+
         await self._cleanup()

     async def _send_immediate_heartbeat(self) -> None:

@@ -180,11 +191,11 @@ class WebSocketHandler:
         try:
             if message_type == MessageTypes.SET_SUBSCRIPTION_LIST:
-                await self._handle_set_subscription_list(message)
+                await self.session_integration.handle_set_subscription_list(message)
             elif message_type == MessageTypes.SET_SESSION_ID:
-                await self._handle_set_session_id(message)
+                await self.session_integration.handle_set_session_id(message)
             elif message_type == MessageTypes.SET_PROGRESSION_STAGE:
-                await self._handle_set_progression_stage(message)
+                await self.session_integration.handle_progression_stage(message)
             elif message_type == MessageTypes.REQUEST_STATE:
                 await self._handle_request_state(message)
             elif message_type == MessageTypes.PATCH_SESSION_RESULT:

@@ -619,31 +630,108 @@ class WebSocketHandler:
             logger.error(f"Failed to send WebSocket message: {e}")
             raise

+    async def send_message(self, message) -> None:
+        """Public method to send messages (used by session integration)."""
+        await self._send_message(message)
+
+    # DEPRECATED: Stream processing is now handled directly by session worker processes
     async def _process_streams(self) -> None:
         """
-        Stream processing task that handles frame processing and detection.
-        This is a placeholder for Phase 2 - currently just logs that it's running.
+        DEPRECATED: Stream processing task that handles frame processing and detection.
+        Stream processing is now integrated directly into session worker processes.
         """
+        logger.info("DEPRECATED: Stream processing task - now handled by session workers")
+        return  # Exit immediately - no longer needed
+
+        # OLD CODE (disabled):
         logger.info("Stream processing task started")
         try:
             while self.connected:
                 # Get current subscriptions
                 subscriptions = worker_state.get_all_subscriptions()

-                # TODO: Phase 2 - Add actual frame processing logic here
-                # This will include:
-                # - Frame reading from RTSP/HTTP streams
-                # - Model inference using loaded pipelines
-                # - Detection result sending via WebSocket
+                if not subscriptions:
+                    await asyncio.sleep(0.5)
+                    continue
+
+                # Process frames for each subscription
+                for subscription in subscriptions:
+                    await self._process_subscription_frames(subscription)

                 # Sleep to prevent excessive CPU usage (similar to old poll_interval)
-                await asyncio.sleep(0.1)  # 100ms polling interval
+                await asyncio.sleep(0.25)  # 250ms polling interval

         except asyncio.CancelledError:
             logger.info("Stream processing task cancelled")
         except Exception as e:
             logger.error(f"Error in stream processing: {e}", exc_info=True)

+    async def _process_subscription_frames(self, subscription) -> None:
+        """
+        Process frames for a single subscription by getting frames from stream manager
+        and forwarding them to the appropriate session worker.
+        """
+        try:
+            subscription_id = subscription.subscriptionIdentifier
+
+            # Get the latest frame from the stream manager
+            frame_data = await self._get_frame_from_stream_manager(subscription)
+
+            if frame_data and frame_data['frame'] is not None:
+                # Extract display identifier (format: "test1;Dispenser Camera 1")
+                display_id = subscription_id.split(';')[-1] if ';' in subscription_id else subscription_id
+
+                # Forward frame to session worker via session integration
+                success = await self.session_integration.process_frame(
+                    subscription_id=subscription_id,
+                    frame=frame_data['frame'],
+                    display_id=display_id,
+                    timestamp=frame_data.get('timestamp', asyncio.get_event_loop().time())
+                )
+
+                if success:
+                    logger.debug(f"[Frame Processing] Sent frame to session worker for {subscription_id}")
+                else:
+                    logger.warning(f"[Frame Processing] Failed to send frame to session worker for {subscription_id}")
+
+        except Exception as e:
+            logger.error(f"Error processing frames for {subscription.subscriptionIdentifier}: {e}")
+
+    async def _get_frame_from_stream_manager(self, subscription) -> dict:
+        """
+        Get the latest frame from the stream manager for a subscription using existing API.
+        """
+        try:
+            subscription_id = subscription.subscriptionIdentifier
+
+            # Use existing stream manager API to check if frame is available
+            if not shared_stream_manager.has_frame(subscription_id):
+                # Stream should already be started by session integration
+                return {'frame': None, 'timestamp': None}
+
+            # Get frame using existing API with crop coordinates if available
+            crop_coords = None
+            if hasattr(subscription, 'cropX1') and subscription.cropX1 is not None:
+                crop_coords = (
+                    subscription.cropX1, subscription.cropY1,
+                    subscription.cropX2, subscription.cropY2
+                )
+
+            # Use existing get_frame method
+            frame = shared_stream_manager.get_frame(subscription_id, crop_coords)
+            if frame is not None:
+                return {
+                    'frame': frame,
+                    'timestamp': asyncio.get_event_loop().time()
+                }
+
+            return {'frame': None, 'timestamp': None}
+
+        except Exception as e:
+            logger.error(f"Error getting frame from stream manager for {subscription.subscriptionIdentifier}: {e}")
+            return {'frame': None, 'timestamp': None}
+
     async def _cleanup(self) -> None:
         """Clean up resources when connection closes."""
         logger.info("Cleaning up WebSocket connection")
```
core/detection/pipeline.py

```diff
@@ -58,10 +58,10 @@ class DetectionPipeline:
         # Pipeline configuration
         self.pipeline_config = pipeline_parser.pipeline_config

-        # SessionId to subscriptionIdentifier mapping
+        # SessionId to subscriptionIdentifier mapping (ISOLATED per session process)
         self.session_to_subscription = {}

-        # SessionId to processing results mapping (for combining with license plate results)
+        # SessionId to processing results mapping (ISOLATED per session process)
         self.session_processing_results = {}

         # Statistics

@@ -72,7 +72,8 @@ class DetectionPipeline:
             'total_processing_time': 0.0
         }

-        logger.info("DetectionPipeline initialized")
+        logger.info(f"DetectionPipeline initialized for model {model_id} with ISOLATED state (no shared mappings or cache)")
+        logger.info(f"Pipeline instance ID: {id(self)} - unique per session process")

     async def initialize(self) -> bool:
         """
```
3  core/logging/__init__.py  Normal file
@@ -0,0 +1,3 @@

```python
"""
Per-Session Logging Module
"""
```
356
core/logging/session_logger.py
Normal file
356
core/logging/session_logger.py
Normal file
|
@ -0,0 +1,356 @@
|
||||||
|
"""
|
||||||
|
Per-Session Logging Configuration and Management.
|
||||||
|
Each session process gets its own dedicated log file with rotation support.
|
||||||
|
"""
|
||||||
|
|
||||||
|
import logging
|
||||||
|
import logging.handlers
|
||||||
|
import os
|
||||||
|
import sys
|
||||||
|
from pathlib import Path
|
||||||
|
from typing import Optional
|
||||||
|
from datetime import datetime
|
||||||
|
import re
|
||||||
|
|
||||||
|
|
||||||
|
class PerSessionLogger:
|
||||||
|
"""
|
||||||
|
Per-session logging configuration that creates dedicated log files for each session.
|
||||||
|
Supports log rotation and structured logging with session context.
|
||||||
|
"""
|
||||||
|
|
||||||
|
def __init__(
|
||||||
|
self,
|
||||||
|
session_id: str,
|
||||||
|
subscription_identifier: str,
|
||||||
|
log_dir: str = "logs",
|
||||||
|
max_size_mb: int = 100,
|
||||||
|
backup_count: int = 5,
|
||||||
|
log_level: int = logging.INFO,
|
||||||
|
detection_mode: bool = True
|
||||||
|
):
|
||||||
|
"""
|
||||||
|
Initialize per-session logger.
|
||||||
|
|
||||||
|
Args:
|
||||||
|
session_id: Unique session identifier
|
||||||
|
subscription_identifier: Subscription identifier (contains camera info)
|
||||||
|
log_dir: Directory to store log files
|
||||||
|
max_size_mb: Maximum size of each log file in MB
|
||||||
|
backup_count: Number of backup files to keep
|
||||||
|
log_level: Logging level
|
||||||
|
detection_mode: If True, uses reduced verbosity for detection processes
|
||||||
|
"""
|
||||||
|
self.session_id = session_id
|
||||||
|
self.subscription_identifier = subscription_identifier
|
||||||
|
self.log_dir = Path(log_dir)
|
||||||
|
self.max_size_mb = max_size_mb
|
||||||
|
self.backup_count = backup_count
|
||||||
|
self.log_level = log_level
|
||||||
|
self.detection_mode = detection_mode
|
||||||
|
|
||||||
|
# Ensure log directory exists
|
||||||
|
self.log_dir.mkdir(parents=True, exist_ok=True)
|
||||||
|
|
||||||
|
# Generate clean filename from subscription identifier
|
||||||
|
self.log_filename = self._generate_log_filename()
|
||||||
|
self.log_filepath = self.log_dir / self.log_filename
|
||||||
|
|
||||||
|
# Create logger
|
||||||
|
self.logger = self._setup_logger()
|
||||||
|
|
||||||
|
def _generate_log_filename(self) -> str:
|
||||||
|
"""
|
||||||
|
Generate a clean filename from subscription identifier.
|
||||||
|
Format: detector_worker_camera_{clean_subscription_id}.log
|
||||||
|
|
||||||
|
Returns:
|
||||||
|
Clean filename for the log file
|
||||||
|
"""
|
||||||
|
# Clean subscription identifier for filename
|
||||||
|
# Replace problematic characters with underscores
|
||||||
|
clean_sub_id = re.sub(r'[^\w\-_.]', '_', self.subscription_identifier)
|
||||||
|
|
||||||
|
# Remove consecutive underscores
|
||||||
|
clean_sub_id = re.sub(r'_+', '_', clean_sub_id)
|
||||||
|
|
||||||
|
# Remove leading/trailing underscores
|
||||||
|
clean_sub_id = clean_sub_id.strip('_')
|
||||||
|
|
||||||
|
# Generate filename
|
||||||
|
filename = f"detector_worker_camera_{clean_sub_id}.log"
|
||||||
|
|
||||||
|
return filename
|
||||||
|
|
||||||
|
def _setup_logger(self) -> logging.Logger:
|
||||||
|
"""
|
||||||
|
Setup logger with file handler and rotation.
|
||||||
|
|
||||||
|
Returns:
|
||||||
|
Configured logger instance
|
||||||
|
"""
|
||||||
|
# Create logger with unique name
|
||||||
|
logger_name = f"session_worker_{self.session_id}"
|
||||||
|
logger = logging.getLogger(logger_name)
|
||||||
|
|
||||||
|
# Clear any existing handlers to avoid duplicates
|
||||||
|
logger.handlers.clear()
|
||||||
|
|
||||||
|
# Set logging level
|
||||||
|
logger.setLevel(self.log_level)
|
||||||
|
|
||||||
|
# Create formatter with session context
|
||||||
|
formatter = logging.Formatter(
|
||||||
|
fmt='%(asctime)s [%(levelname)s] %(name)s [Session: {session_id}] [Camera: {camera}]: %(message)s'.format(
|
||||||
|
session_id=self.session_id,
|
||||||
|
camera=self.subscription_identifier
|
||||||
|
),
|
||||||
|
datefmt='%Y-%m-%d %H:%M:%S'
|
||||||
|
)
|
||||||
|
|
||||||
|
# Create rotating file handler
|
||||||
|
max_bytes = self.max_size_mb * 1024 * 1024 # Convert MB to bytes
|
||||||
|
file_handler = logging.handlers.RotatingFileHandler(
|
||||||
|
filename=self.log_filepath,
|
||||||
|
maxBytes=max_bytes,
|
||||||
|
backupCount=self.backup_count,
|
||||||
|
encoding='utf-8'
|
||||||
|
)
|
||||||
|
file_handler.setLevel(self.log_level)
|
||||||
|
file_handler.setFormatter(formatter)
|
||||||
|
|
||||||
|
# Create console handler for debugging (optional)
|
||||||
|
console_handler = logging.StreamHandler(sys.stdout)
|
||||||
|
console_handler.setLevel(logging.WARNING) # Only warnings and errors to console
|
||||||
|
console_formatter = logging.Formatter(
|
||||||
|
fmt='[{session_id}] [%(levelname)s]: %(message)s'.format(
|
||||||
|
session_id=self.session_id
|
||||||
|
)
|
||||||
|
)
|
||||||
|
console_handler.setFormatter(console_formatter)
|
||||||
|
|
||||||
|
# Add handlers to logger
|
||||||
|
logger.addHandler(file_handler)
|
||||||
|
logger.addHandler(console_handler)
|
||||||
|
|
||||||
|
# Prevent propagation to root logger
|
||||||
|
logger.propagate = False
|
||||||
|
|
||||||
|
# Log initialization (reduced verbosity in detection mode)
|
||||||
|
if self.detection_mode:
|
||||||
|
logger.info(f"Session logger ready for {self.subscription_identifier}")
|
||||||
|
else:
|
||||||
|
logger.info(f"Per-session logger initialized")
|
||||||
|
logger.info(f"Log file: {self.log_filepath}")
|
||||||
|
logger.info(f"Session ID: {self.session_id}")
|
||||||
|
logger.info(f"Camera: {self.subscription_identifier}")
|
||||||
|
logger.info(f"Max size: {self.max_size_mb}MB, Backup count: {self.backup_count}")
|
||||||
|
|
||||||
|
return logger
|
||||||
|
|
||||||
|
def get_logger(self) -> logging.Logger:
|
||||||
|
"""
|
||||||
|
Get the configured logger instance.
|
||||||
|
|
||||||
|
Returns:
|
||||||
|
Logger instance for this session
|
||||||
|
"""
|
||||||
|
return self.logger
|
||||||
|
|
||||||
|
def log_session_start(self, process_id: int):
|
||||||
|
"""
|
||||||
|
Log session start with process information.
|
||||||
|
|
||||||
|
Args:
|
||||||
|
process_id: Process ID of the session worker
|
||||||
|
"""
|
||||||
|
if self.detection_mode:
|
||||||
|
self.logger.info(f"Session started - PID {process_id}")
|
||||||
|
else:
|
||||||
|
self.logger.info("=" * 60)
|
||||||
|
self.logger.info(f"SESSION STARTED")
|
||||||
|
self.logger.info(f"Process ID: {process_id}")
|
||||||
|
self.logger.info(f"Session ID: {self.session_id}")
|
||||||
|
self.logger.info(f"Camera: {self.subscription_identifier}")
|
||||||
|
self.logger.info(f"Timestamp: {datetime.now().isoformat()}")
|
||||||
|
self.logger.info("=" * 60)
|
||||||
|
|
||||||
|
def log_session_end(self):
|
||||||
|
"""Log session end."""
|
||||||
|
self.logger.info("=" * 60)
|
||||||
|
self.logger.info(f"SESSION ENDED")
|
||||||
|
self.logger.info(f"Timestamp: {datetime.now().isoformat()}")
|
||||||
|
self.logger.info("=" * 60)
|
||||||
|
|
||||||
|
def log_model_loading(self, model_id: int, model_name: str, model_path: str):
|
||||||
|
"""
|
||||||
|
Log model loading information.
|
||||||
|
|
||||||
|
Args:
|
||||||
|
model_id: Model ID
|
||||||
|
model_name: Model name
|
||||||
|
model_path: Path to the model
|
||||||
|
"""
|
||||||
|
if self.detection_mode:
|
||||||
|
self.logger.info(f"Loading model {model_id}: {model_name}")
|
||||||
|
else:
|
||||||
|
self.logger.info("-" * 40)
|
||||||
|
self.logger.info(f"MODEL LOADING")
|
||||||
|
self.logger.info(f"Model ID: {model_id}")
|
||||||
|
self.logger.info(f"Model Name: {model_name}")
|
||||||
|
self.logger.info(f"Model Path: {model_path}")
|
||||||
|
self.logger.info("-" * 40)
|
||||||
|
|
||||||
|
def log_frame_processing(self, frame_count: int, processing_time: float, detections: int):
|
||||||
|
"""
|
||||||
|
Log frame processing information.
|
||||||
|
|
||||||
|
Args:
|
||||||
|
frame_count: Current frame count
|
||||||
|
processing_time: Processing time in seconds
|
||||||
|
detections: Number of detections found
|
||||||
|
"""
|
||||||
|
self.logger.debug(f"FRAME #{frame_count}: Processing time: {processing_time:.3f}s, Detections: {detections}")
|
||||||
|
|
||||||
|
def log_detection_result(self, detection_type: str, confidence: float, bbox: list):
|
||||||
|
"""
|
||||||
|
Log detection result.
|
||||||
|
|
||||||
|
Args:
|
||||||
|
detection_type: Type of detection (e.g., "Car", "Frontal")
|
||||||
|
confidence: Detection confidence
|
||||||
|
bbox: Bounding box coordinates
|
||||||
|
"""
|
||||||
|
self.logger.info(f"DETECTION: {detection_type} (conf: {confidence:.3f}) at {bbox}")
|
||||||
|
|
||||||
|
def log_database_operation(self, operation: str, session_id: str, success: bool):
|
||||||
|
"""
|
||||||
|
Log database operation.
|
||||||
|
|
||||||
|
Args:
|
||||||
|
operation: Type of operation
|
||||||
|
session_id: Session ID used in database
|
||||||
|
success: Whether operation succeeded
|
||||||
|
"""
|
||||||
|
status = "SUCCESS" if success else "FAILED"
|
||||||
|
self.logger.info(f"DATABASE {operation}: {status} (session: {session_id})")
|
||||||
|
|
||||||
|
def log_error(self, error_type: str, error_message: str, traceback_str: Optional[str] = None):
|
||||||
|
"""
|
||||||
|
Log error with context.
|
||||||
|
|
||||||
|
Args:
|
||||||
|
error_type: Type of error
|
||||||
|
error_message: Error message
|
||||||
|
traceback_str: Optional traceback string
|
||||||
|
"""
|
||||||
|
self.logger.error(f"ERROR [{error_type}]: {error_message}")
|
||||||
|
if traceback_str:
|
||||||
|
self.logger.error(f"Traceback:\n{traceback_str}")
|
||||||
|
|
||||||
|
def get_log_stats(self) -> dict:
|
||||||
|
"""
|
||||||
|
Get logging statistics.
|
||||||
|
|
||||||
|
Returns:
|
||||||
|
Dictionary with logging statistics
|
||||||
|
"""
|
||||||
|
try:
|
||||||
|
if self.log_filepath.exists():
|
||||||
|
stat = self.log_filepath.stat()
|
||||||
|
return {
|
||||||
|
'log_file': str(self.log_filepath),
|
||||||
|
'file_size_mb': round(stat.st_size / (1024 * 1024), 2),
|
||||||
|
'created': datetime.fromtimestamp(stat.st_ctime).isoformat(),
|
||||||
|
'modified': datetime.fromtimestamp(stat.st_mtime).isoformat(),
|
||||||
|
}
|
||||||
|
else:
|
||||||
|
return {'log_file': str(self.log_filepath), 'status': 'not_created'}
|
||||||
|
except Exception as e:
|
||||||
|
return {'log_file': str(self.log_filepath), 'error': str(e)}
|
||||||
|
|
||||||
|
def cleanup(self):
|
||||||
|
"""Cleanup logger handlers."""
|
||||||
|
if hasattr(self, 'logger') and self.logger:
|
||||||
|
for handler in self.logger.handlers[:]:
|
||||||
|
handler.close()
|
||||||
|
self.logger.removeHandler(handler)
|
||||||
|
|
||||||
|
|
||||||
|
class MainProcessLogger:
|
||||||
|
"""
|
||||||
|
Logger configuration for the main FastAPI process.
|
||||||
|
Separate from session logs to avoid confusion.
|
||||||
|
"""
|
||||||
|
|
||||||
|
def __init__(self, log_dir: str = "logs", max_size_mb: int = 50, backup_count: int = 3):
|
||||||
|
"""
|
||||||
|
Initialize main process logger.
|
||||||
|
|
||||||
|
Args:
|
||||||
|
log_dir: Directory to store log files
|
||||||
|
max_size_mb: Maximum size of each log file in MB
|
||||||
|
backup_count: Number of backup files to keep
|
||||||
|
"""
|
||||||
|
self.log_dir = Path(log_dir)
|
||||||
|
self.max_size_mb = max_size_mb
|
||||||
|
self.backup_count = backup_count
|
||||||
|
|
||||||
|
# Ensure log directory exists
|
||||||
|
self.log_dir.mkdir(parents=True, exist_ok=True)
|
||||||
|
|
||||||
|
# Setup main process logger
|
||||||
|
self._setup_main_logger()
|
||||||
|
|
||||||
|
def _setup_main_logger(self):
|
||||||
|
"""Setup main process logger."""
|
||||||
|
# Configure root logger
|
||||||
|
root_logger = logging.getLogger("detector_worker")
|
||||||
|
|
||||||
|
# Clear existing handlers
|
||||||
|
for handler in root_logger.handlers[:]:
|
||||||
|
root_logger.removeHandler(handler)
|
||||||
|
|
||||||
|
# Set level
|
||||||
|
root_logger.setLevel(logging.INFO)
|
||||||
|
|
||||||
|
# Create formatter
|
||||||
|
formatter = logging.Formatter(
|
||||||
|
fmt='%(asctime)s [%(levelname)s] %(name)s [MAIN]: %(message)s',
|
||||||
|
datefmt='%Y-%m-%d %H:%M:%S'
|
||||||
|
)
|
||||||
|
|
||||||
|
# Create rotating file handler for main process
|
||||||
|
max_bytes = self.max_size_mb * 1024 * 1024
|
||||||
|
main_log_path = self.log_dir / "detector_worker_main.log"
|
||||||
|
file_handler = logging.handlers.RotatingFileHandler(
|
||||||
|
filename=main_log_path,
|
||||||
|
maxBytes=max_bytes,
|
||||||
|
backupCount=self.backup_count,
|
||||||
|
encoding='utf-8'
|
||||||
|
)
|
||||||
|
file_handler.setLevel(logging.INFO)
|
||||||
|
file_handler.setFormatter(formatter)
|
||||||
|
|
||||||
|
# Create console handler
|
||||||
|
console_handler = logging.StreamHandler()
|
||||||
|
console_handler.setLevel(logging.INFO)
|
||||||
|
console_handler.setFormatter(formatter)
|
||||||
|
|
||||||
|
# Add handlers
|
||||||
|
root_logger.addHandler(file_handler)
|
||||||
|
root_logger.addHandler(console_handler)
|
||||||
|
|
||||||
|
# Log initialization
|
||||||
|
root_logger.info("Main process logger initialized")
|
||||||
|
root_logger.info(f"Main log file: {main_log_path}")
|
||||||
|
|
||||||
|
|
||||||
|
def setup_main_process_logging(log_dir: str = "logs"):
|
||||||
|
"""
|
||||||
|
Setup logging for the main FastAPI process.
|
||||||
|
|
||||||
|
Args:
|
||||||
|
log_dir: Directory to store log files
|
||||||
|
"""
|
||||||
|
MainProcessLogger(log_dir=log_dir)
|
|
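A minimal sketch of wiring this up from the main process (the module path follows the import used in `session_worker.py` below; the logger name `detector_worker.app` is illustrative, and any child of `detector_worker` inherits the handlers configured above):

```python
import logging

from core.logging.session_logger import setup_main_process_logging

# One-time setup in the FastAPI entrypoint: attaches the rotating file
# handler (logs/detector_worker_main.log) plus a console handler to the
# "detector_worker" root logger.
setup_main_process_logging(log_dir="logs")

# Any child logger now writes through those handlers.
logging.getLogger("detector_worker.app").info("main process online")
```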
core/models/inference.py

```diff
@@ -34,11 +34,7 @@ class InferenceResult:
 
 
 class YOLOWrapper:
-    """Wrapper for YOLO models with caching and optimization"""
-
-    # Class-level model cache shared across all instances
-    _model_cache: Dict[str, Any] = {}
-    _cache_lock = Lock()
+    """Wrapper for YOLO models with per-instance isolation (no shared cache)"""
 
     def __init__(self, model_path: Path, model_id: str, device: Optional[str] = None):
         """
@@ -65,61 +61,48 @@ class YOLOWrapper:
         logger.info(f"Initialized YOLO wrapper for {model_id} on {self.device}")
 
     def _load_model(self) -> None:
-        """Load the YOLO model with caching"""
-        cache_key = str(self.model_path)
-
-        with self._cache_lock:
-            # Check if model is already cached
-            if cache_key in self._model_cache:
-                logger.info(f"Loading model {self.model_id} from cache")
-                self.model = self._model_cache[cache_key]
-                self._extract_class_names()
-                return
-
-            # Load model
-            try:
-                from ultralytics import YOLO
-
-                logger.info(f"Loading YOLO model from {self.model_path}")
-
-                # Determine if this is a classification model based on filename or model structure
-                # Classification models typically have 'cls' in filename
-                is_classification = 'cls' in str(self.model_path).lower()
-
-                # Load model normally first
-                self.model = YOLO(str(self.model_path))
-
-                # For classification models, create a separate instance with task parameter
-                if is_classification:
-                    try:
-                        # Reload with classification task (like ML engineer's approach)
-                        self.model = YOLO(str(self.model_path), task="classify")
-                        logger.info(f"Loaded classification model {self.model_id} with task='classify'")
-                    except Exception as e:
-                        logger.warning(f"Failed to load with task='classify', using default: {e}")
-                        # Fall back to regular loading
-                        self.model = YOLO(str(self.model_path))
-                        logger.info(f"Loaded model {self.model_id} with default task")
-                else:
-                    logger.info(f"Loaded detection model {self.model_id}")
-
-                # Move model to device
-                if self.device == 'cuda' and torch.cuda.is_available():
-                    self.model.to('cuda')
-                    logger.info(f"Model {self.model_id} moved to GPU")
-
-                # Cache the model
-                self._model_cache[cache_key] = self.model
-                self._extract_class_names()
-
-                logger.info(f"Successfully loaded model {self.model_id}")
-
-            except ImportError:
-                logger.error("Ultralytics YOLO not installed. Install with: pip install ultralytics")
-                raise
-            except Exception as e:
-                logger.error(f"Failed to load YOLO model {self.model_id}: {str(e)}", exc_info=True)
-                raise
+        """Load the YOLO model in isolation (no shared cache)"""
+        try:
+            from ultralytics import YOLO
+
+            logger.debug(f"Loading YOLO model {self.model_id} from {self.model_path} (ISOLATED)")
+
+            # Load model directly without any caching
+            self.model = YOLO(str(self.model_path))
+
+            # Determine if this is a classification model based on filename or model structure
+            # Classification models typically have 'cls' in filename
+            is_classification = 'cls' in str(self.model_path).lower()
+
+            # For classification models, create a separate instance with task parameter
+            if is_classification:
+                try:
+                    # Reload with classification task (like ML engineer's approach)
+                    self.model = YOLO(str(self.model_path), task="classify")
+                    logger.info(f"Loaded classification model {self.model_id} with task='classify' (ISOLATED)")
+                except Exception as e:
+                    logger.warning(f"Failed to load with task='classify', using default: {e}")
+                    # Fall back to regular loading
+                    self.model = YOLO(str(self.model_path))
+                    logger.info(f"Loaded model {self.model_id} with default task (ISOLATED)")
+            else:
+                logger.info(f"Loaded detection model {self.model_id} (ISOLATED)")
+
+            # Move model to device
+            if self.device == 'cuda' and torch.cuda.is_available():
+                self.model.to('cuda')
+                logger.info(f"Model {self.model_id} moved to GPU (ISOLATED)")
+
+            self._extract_class_names()
+
+            logger.debug(f"Successfully loaded model {self.model_id} in isolation - no shared cache!")
+
+        except ImportError:
+            logger.error("Ultralytics YOLO not installed. Install with: pip install ultralytics")
+            raise
+        except Exception as e:
+            logger.error(f"Failed to load YOLO model {self.model_id}: {str(e)}", exc_info=True)
+            raise
 
     def _extract_class_names(self) -> None:
         """Extract class names from the model"""
@@ -375,19 +358,15 @@ class YOLOWrapper:
         return 'cls' in str(self.model_path).lower() or 'classify' in str(self.model_path).lower()
 
     def clear_cache(self) -> None:
-        """Clear the model cache"""
-        with self._cache_lock:
-            cache_key = str(self.model_path)
-            if cache_key in self._model_cache:
-                del self._model_cache[cache_key]
-                logger.info(f"Cleared cache for model {self.model_id}")
+        """Clear model resources (no cache in isolated mode)"""
+        if self.model:
+            # Clear any model resources if needed
+            logger.info(f"Cleared resources for model {self.model_id} (no shared cache)")
 
     @classmethod
    def clear_all_cache(cls) -> None:
-        """Clear all cached models"""
-        with cls._cache_lock:
-            cls._model_cache.clear()
-            logger.info("Cleared all model cache")
+        """No-op in isolated mode (no shared cache to clear)"""
+        logger.info("No shared cache to clear in isolated mode")
 
     def warmup(self, image_size: Tuple[int, int] = (640, 640)) -> None:
         """
@@ -438,16 +417,17 @@ class ModelInferenceManager:
             YOLOWrapper instance
         """
         with self._lock:
-            # Check if already loaded
+            # Check if already loaded for this specific manager instance
             if model_id in self.models:
-                logger.debug(f"Model {model_id} already loaded")
+                logger.debug(f"Model {model_id} already loaded in this manager instance")
                 return self.models[model_id]
 
-            # Load the model
+            # Load the model (each instance loads independently)
            model_path = self.model_dir / model_file
             if not model_path.exists():
                 raise FileNotFoundError(f"Model file not found: {model_path}")
 
+            logger.info(f"Loading model {model_id} in isolation for this manager instance")
             wrapper = YOLOWrapper(model_path, model_id, device)
             self.models[model_id] = wrapper
 
```
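The practical consequence of dropping the class-level cache: two wrappers over the same weights now hold distinct model objects, so per-session mutation cannot leak between cameras. A minimal sketch of that property (paths and IDs are illustrative; assumes `ultralytics` is installed and that `__init__` loads eagerly, which the old cache-hit fast path implies):

```python
from pathlib import Path

from core.models.inference import YOLOWrapper  # module path as in this repo

weights = Path("models/example/detector.pt")   # any YOLO .pt file would do

a = YOLOWrapper(weights, model_id="cam-a", device="cpu")
b = YOLOWrapper(weights, model_id="cam-b", device="cpu")

# Pre-refactor both wrappers aliased one cached YOLO instance; now each
# loads its own copy, trading memory for isolation.
assert a.model is not b.model
```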
3 core/processes/__init__.py Normal file

@@ -0,0 +1,3 @@
```python
"""
Session Process Management Module
"""
```
317 core/processes/communication.py Normal file

@@ -0,0 +1,317 @@
```python
"""
Inter-Process Communication (IPC) system for session processes.
Defines message types and protocols for main ↔ session communication.
"""

import time
from enum import Enum
from typing import Dict, Any, Optional, Union
from dataclasses import dataclass, field
import numpy as np


class MessageType(Enum):
    """Message types for IPC communication."""

    # Commands: Main → Session
    INITIALIZE = "initialize"
    PROCESS_FRAME = "process_frame"
    SET_SESSION_ID = "set_session_id"
    SHUTDOWN = "shutdown"
    HEALTH_CHECK = "health_check"

    # Responses: Session → Main
    INITIALIZED = "initialized"
    DETECTION_RESULT = "detection_result"
    SESSION_SET = "session_set"
    SHUTDOWN_COMPLETE = "shutdown_complete"
    HEALTH_RESPONSE = "health_response"
    ERROR = "error"


@dataclass
class IPCMessage:
    """Base class for all IPC messages."""
    type: MessageType
    session_id: str
    timestamp: float = field(default_factory=time.time)
    message_id: str = field(default_factory=lambda: str(int(time.time() * 1000000)))


@dataclass
class InitializeCommand(IPCMessage):
    """Initialize session process with configuration."""
    subscription_config: Dict[str, Any] = field(default_factory=dict)
    model_config: Dict[str, Any] = field(default_factory=dict)


@dataclass
class ProcessFrameCommand(IPCMessage):
    """Process a frame through the detection pipeline."""
    frame: Optional[np.ndarray] = None
    display_id: str = ""
    subscription_identifier: str = ""
    frame_timestamp: float = 0.0


@dataclass
class SetSessionIdCommand(IPCMessage):
    """Set the session ID for the current session."""
    backend_session_id: str = ""
    display_id: str = ""


@dataclass
class ShutdownCommand(IPCMessage):
    """Shutdown the session process gracefully."""


@dataclass
class HealthCheckCommand(IPCMessage):
    """Check health status of session process."""


@dataclass
class InitializedResponse(IPCMessage):
    """Response indicating successful initialization."""
    success: bool = False
    error_message: Optional[str] = None


@dataclass
class DetectionResultResponse(IPCMessage):
    """Detection results from session process."""
    detections: Dict[str, Any] = field(default_factory=dict)
    processing_time: float = 0.0
    phase: str = ""  # "detection" or "processing"


@dataclass
class SessionSetResponse(IPCMessage):
    """Response confirming session ID was set."""
    success: bool = False
    backend_session_id: str = ""


@dataclass
class ShutdownCompleteResponse(IPCMessage):
    """Response confirming graceful shutdown."""


@dataclass
class HealthResponse(IPCMessage):
    """Health status response."""
    status: str = "unknown"  # "healthy", "degraded", "unhealthy"
    memory_usage_mb: float = 0.0
    cpu_percent: float = 0.0
    gpu_memory_mb: Optional[float] = None
    uptime_seconds: float = 0.0
    processed_frames: int = 0


@dataclass
class ErrorResponse(IPCMessage):
    """Error message from session process."""
    error_type: str = ""
    error_message: str = ""
    traceback: Optional[str] = None


# Type aliases for message unions
CommandMessage = Union[
    InitializeCommand,
    ProcessFrameCommand,
    SetSessionIdCommand,
    ShutdownCommand,
    HealthCheckCommand
]

ResponseMessage = Union[
    InitializedResponse,
    DetectionResultResponse,
    SessionSetResponse,
    ShutdownCompleteResponse,
    HealthResponse,
    ErrorResponse
]

IPCMessageUnion = Union[CommandMessage, ResponseMessage]


class MessageSerializer:
    """Handles serialization/deserialization of IPC messages."""

    @staticmethod
    def serialize_message(message: IPCMessageUnion) -> Dict[str, Any]:
        """
        Serialize message to dictionary for queue transport.

        Args:
            message: Message to serialize

        Returns:
            Dictionary representation of message
        """
        result = {
            'type': message.type.value,
            'session_id': message.session_id,
            'timestamp': message.timestamp,
            'message_id': message.message_id,
        }

        # Add specific fields based on message type
        if isinstance(message, InitializeCommand):
            result.update({
                'subscription_config': message.subscription_config,
                'model_config': message.model_config
            })
        elif isinstance(message, ProcessFrameCommand):
            result.update({
                'frame': message.frame,
                'display_id': message.display_id,
                'subscription_identifier': message.subscription_identifier,
                'frame_timestamp': message.frame_timestamp
            })
        elif isinstance(message, SetSessionIdCommand):
            result.update({
                'backend_session_id': message.backend_session_id,
                'display_id': message.display_id
            })
        elif isinstance(message, InitializedResponse):
            result.update({
                'success': message.success,
                'error_message': message.error_message
            })
        elif isinstance(message, DetectionResultResponse):
            result.update({
                'detections': message.detections,
                'processing_time': message.processing_time,
                'phase': message.phase
            })
        elif isinstance(message, SessionSetResponse):
            result.update({
                'success': message.success,
                'backend_session_id': message.backend_session_id
            })
        elif isinstance(message, HealthResponse):
            result.update({
                'status': message.status,
                'memory_usage_mb': message.memory_usage_mb,
                'cpu_percent': message.cpu_percent,
                'gpu_memory_mb': message.gpu_memory_mb,
                'uptime_seconds': message.uptime_seconds,
                'processed_frames': message.processed_frames
            })
        elif isinstance(message, ErrorResponse):
            result.update({
                'error_type': message.error_type,
                'error_message': message.error_message,
                'traceback': message.traceback
            })

        return result

    @staticmethod
    def deserialize_message(data: Dict[str, Any]) -> IPCMessageUnion:
        """
        Deserialize dictionary back to message object.

        Args:
            data: Dictionary representation

        Returns:
            Deserialized message object
        """
        msg_type = MessageType(data['type'])
        session_id = data['session_id']
        timestamp = data['timestamp']
        message_id = data['message_id']

        base_kwargs = {
            'session_id': session_id,
            'timestamp': timestamp,
            'message_id': message_id
        }

        if msg_type == MessageType.INITIALIZE:
            return InitializeCommand(
                type=msg_type,
                subscription_config=data['subscription_config'],
                model_config=data['model_config'],
                **base_kwargs
            )
        elif msg_type == MessageType.PROCESS_FRAME:
            return ProcessFrameCommand(
                type=msg_type,
                frame=data['frame'],
                display_id=data['display_id'],
                subscription_identifier=data['subscription_identifier'],
                frame_timestamp=data['frame_timestamp'],
                **base_kwargs
            )
        elif msg_type == MessageType.SET_SESSION_ID:
            return SetSessionIdCommand(
                type=msg_type,
                backend_session_id=data['backend_session_id'],
                display_id=data['display_id'],
                **base_kwargs
            )
        elif msg_type == MessageType.SHUTDOWN:
            return ShutdownCommand(type=msg_type, **base_kwargs)
        elif msg_type == MessageType.HEALTH_CHECK:
            return HealthCheckCommand(type=msg_type, **base_kwargs)
        elif msg_type == MessageType.INITIALIZED:
            return InitializedResponse(
                type=msg_type,
                success=data['success'],
                error_message=data.get('error_message'),
                **base_kwargs
            )
        elif msg_type == MessageType.DETECTION_RESULT:
            return DetectionResultResponse(
                type=msg_type,
                detections=data['detections'],
                processing_time=data['processing_time'],
                phase=data['phase'],
                **base_kwargs
            )
        elif msg_type == MessageType.SESSION_SET:
            return SessionSetResponse(
                type=msg_type,
                success=data['success'],
                backend_session_id=data['backend_session_id'],
                **base_kwargs
            )
        elif msg_type == MessageType.SHUTDOWN_COMPLETE:
            return ShutdownCompleteResponse(type=msg_type, **base_kwargs)
        elif msg_type == MessageType.HEALTH_RESPONSE:
            return HealthResponse(
                type=msg_type,
                status=data['status'],
                memory_usage_mb=data['memory_usage_mb'],
                cpu_percent=data['cpu_percent'],
                gpu_memory_mb=data.get('gpu_memory_mb'),
                uptime_seconds=data.get('uptime_seconds', 0.0),
                processed_frames=data.get('processed_frames', 0),
                **base_kwargs
            )
        elif msg_type == MessageType.ERROR:
            return ErrorResponse(
                type=msg_type,
                error_type=data['error_type'],
                error_message=data['error_message'],
                traceback=data.get('traceback'),
                **base_kwargs
            )
        else:
            raise ValueError(f"Unknown message type: {msg_type}")
```
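Since `multiprocessing.Queue` pickles whatever it carries, the serializer flattens each dataclass into a plain dict on one side and rebuilds the typed message on the other. A quick round-trip sketch (all values are illustrative):

```python
import numpy as np

from core.processes.communication import (
    MessageSerializer, MessageType, ProcessFrameCommand
)

cmd = ProcessFrameCommand(
    type=MessageType.PROCESS_FRAME,
    session_id="session_1700000000000_display-001_cam-001",  # illustrative ID
    frame=np.zeros((480, 640, 3), dtype=np.uint8),
    display_id="cam-001",
    subscription_identifier="display-001;cam-001",
    frame_timestamp=1700000000.0,
)

wire = MessageSerializer.serialize_message(cmd)     # plain dict for the queue
back = MessageSerializer.deserialize_message(wire)  # typed dataclass again
assert isinstance(back, ProcessFrameCommand)
assert back.frame.shape == (480, 640, 3)
```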
464 core/processes/session_manager.py Normal file

@@ -0,0 +1,464 @@
```python
"""
Session Process Manager - Manages lifecycle of session processes.
Handles process spawning, monitoring, cleanup, and health checks.
"""

import time
import logging
import asyncio
import multiprocessing as mp
from typing import Dict, Optional, Any, Callable
from dataclasses import dataclass
from concurrent.futures import ThreadPoolExecutor
import threading

from .communication import (
    MessageSerializer, MessageType,
    InitializeCommand, ProcessFrameCommand, SetSessionIdCommand,
    ShutdownCommand, HealthCheckCommand,
    InitializedResponse, DetectionResultResponse, SessionSetResponse,
    ShutdownCompleteResponse, HealthResponse, ErrorResponse
)
from .session_worker import session_worker_main

logger = logging.getLogger(__name__)


@dataclass
class SessionProcessInfo:
    """Information about a running session process."""
    session_id: str
    subscription_identifier: str
    process: mp.Process
    command_queue: mp.Queue
    response_queue: mp.Queue
    created_at: float
    last_health_check: float = 0.0
    is_initialized: bool = False
    processed_frames: int = 0


class SessionProcessManager:
    """
    Manages lifecycle of session processes.
    Each session gets its own dedicated process for complete isolation.
    """

    def __init__(self, max_concurrent_sessions: int = 20, health_check_interval: int = 30):
        """
        Initialize session process manager.

        Args:
            max_concurrent_sessions: Maximum number of concurrent session processes
            health_check_interval: Interval in seconds between health checks
        """
        self.max_concurrent_sessions = max_concurrent_sessions
        self.health_check_interval = health_check_interval

        # Active session processes
        self.sessions: Dict[str, SessionProcessInfo] = {}
        self.subscription_to_session: Dict[str, str] = {}

        # Thread pool for response processing
        self.response_executor = ThreadPoolExecutor(max_workers=4, thread_name_prefix="ResponseProcessor")

        # Health check task
        self.health_check_task = None
        self.is_running = False

        # Message callbacks
        self.detection_result_callback: Optional[Callable] = None
        self.error_callback: Optional[Callable] = None

        # Store main event loop for async operations from threads
        self.main_event_loop = None

        logger.info(f"SessionProcessManager initialized (max_sessions={max_concurrent_sessions})")

    async def start(self):
        """Start the session process manager."""
        if self.is_running:
            return

        self.is_running = True

        # Store the main event loop for use in threads
        self.main_event_loop = asyncio.get_running_loop()

        logger.info("Starting session process manager")

        # Start health check task
        self.health_check_task = asyncio.create_task(self._health_check_loop())

        # Start response processing for existing sessions
        for session_info in self.sessions.values():
            self._start_response_processing(session_info)

    async def stop(self):
        """Stop the session process manager and cleanup all sessions."""
        if not self.is_running:
            return

        logger.info("Stopping session process manager")
        self.is_running = False

        # Cancel health check task
        if self.health_check_task:
            self.health_check_task.cancel()
            try:
                await self.health_check_task
            except asyncio.CancelledError:
                pass

        # Shutdown all sessions (remove_session is keyed by subscription identifier)
        shutdown_tasks = []
        for subscription_identifier in list(self.subscription_to_session.keys()):
            task = asyncio.create_task(self.remove_session(subscription_identifier))
            shutdown_tasks.append(task)

        if shutdown_tasks:
            await asyncio.gather(*shutdown_tasks, return_exceptions=True)

        # Cleanup thread pool
        self.response_executor.shutdown(wait=True)

        logger.info("Session process manager stopped")

    async def create_session(self, subscription_identifier: str, subscription_config: Dict[str, Any]) -> bool:
        """
        Create a new session process for a subscription.

        Args:
            subscription_identifier: Unique subscription identifier
            subscription_config: Subscription configuration

        Returns:
            True if session was created successfully
        """
        session_id = None
        try:
            # Check if we're at capacity
            if len(self.sessions) >= self.max_concurrent_sessions:
                logger.warning(f"Cannot create session: at max capacity ({self.max_concurrent_sessions})")
                return False

            # Check if subscription already has a session
            if subscription_identifier in self.subscription_to_session:
                existing_session_id = self.subscription_to_session[subscription_identifier]
                logger.info(f"Subscription {subscription_identifier} already has session {existing_session_id}")
                return True

            # Generate unique session ID
            session_id = f"session_{int(time.time() * 1000)}_{subscription_identifier.replace(';', '_')}"

            logger.info(f"Creating session process for subscription {subscription_identifier}")
            logger.info(f"Session ID: {session_id}")

            # Create communication queues
            command_queue = mp.Queue()
            response_queue = mp.Queue()

            # Create and start process
            process = mp.Process(
                target=session_worker_main,
                args=(session_id, command_queue, response_queue),
                name=f"SessionWorker-{session_id}"
            )
            process.start()

            # Store session information
            session_info = SessionProcessInfo(
                session_id=session_id,
                subscription_identifier=subscription_identifier,
                process=process,
                command_queue=command_queue,
                response_queue=response_queue,
                created_at=time.time()
            )

            self.sessions[session_id] = session_info
            self.subscription_to_session[subscription_identifier] = session_id

            # Start response processing for this session
            self._start_response_processing(session_info)

            logger.info(f"Session process created: {session_id} (PID: {process.pid})")

            # Initialize the session with configuration
            model_config = {
                'modelId': subscription_config.get('modelId'),
                'modelUrl': subscription_config.get('modelUrl'),
                'modelName': subscription_config.get('modelName')
            }

            init_command = InitializeCommand(
                type=MessageType.INITIALIZE,
                session_id=session_id,
                subscription_config=subscription_config,
                model_config=model_config
            )

            await self._send_command(session_id, init_command)

            return True

        except Exception as e:
            logger.error(f"Failed to create session for {subscription_identifier}: {e}", exc_info=True)
            # Cleanup on failure
            if session_id and session_id in self.sessions:
                await self._cleanup_session(session_id)
            return False

    async def remove_session(self, subscription_identifier: str) -> bool:
        """
        Remove a session process for a subscription.

        Args:
            subscription_identifier: Subscription identifier to remove

        Returns:
            True if session was removed successfully
        """
        try:
            session_id = self.subscription_to_session.get(subscription_identifier)
            if not session_id:
                logger.warning(f"No session found for subscription {subscription_identifier}")
                return False

            logger.info(f"Removing session {session_id} for subscription {subscription_identifier}")

            session_info = self.sessions.get(session_id)
            if session_info:
                # Send shutdown command
                shutdown_command = ShutdownCommand(type=MessageType.SHUTDOWN, session_id=session_id)
                await self._send_command(session_id, shutdown_command)

                # Wait for graceful shutdown (with timeout)
                try:
                    await asyncio.wait_for(self._wait_for_shutdown(session_info), timeout=10.0)
                except asyncio.TimeoutError:
                    logger.warning(f"Session {session_id} did not shutdown gracefully, terminating")

            # Cleanup session
            await self._cleanup_session(session_id)

            return True

        except Exception as e:
            logger.error(f"Failed to remove session for {subscription_identifier}: {e}", exc_info=True)
            return False

    async def process_frame(self, subscription_identifier: str, frame: Any, display_id: str, frame_timestamp: float) -> bool:
        """
        Send a frame to the session process for processing.

        Args:
            subscription_identifier: Subscription identifier
            frame: Frame to process
            display_id: Display identifier
            frame_timestamp: Timestamp of the frame

        Returns:
            True if frame was sent successfully
        """
        try:
            session_id = self.subscription_to_session.get(subscription_identifier)
            if not session_id:
                logger.warning(f"No session found for subscription {subscription_identifier}")
                return False

            session_info = self.sessions.get(session_id)
            if not session_info or not session_info.is_initialized:
                logger.warning(f"Session {session_id} not initialized")
                return False

            # Create process frame command
            process_command = ProcessFrameCommand(
                type=MessageType.PROCESS_FRAME,
                session_id=session_id,
                frame=frame,
                display_id=display_id,
                subscription_identifier=subscription_identifier,
                frame_timestamp=frame_timestamp
            )

            await self._send_command(session_id, process_command)
            return True

        except Exception as e:
            logger.error(f"Failed to process frame for {subscription_identifier}: {e}", exc_info=True)
            return False

    async def set_session_id(self, subscription_identifier: str, backend_session_id: str, display_id: str) -> bool:
        """
        Set the backend session ID for a session.

        Args:
            subscription_identifier: Subscription identifier
            backend_session_id: Backend session ID
            display_id: Display identifier

        Returns:
            True if session ID was set successfully
        """
        try:
            session_id = self.subscription_to_session.get(subscription_identifier)
            if not session_id:
                logger.warning(f"No session found for subscription {subscription_identifier}")
                return False

            # Create set session ID command
            set_command = SetSessionIdCommand(
                type=MessageType.SET_SESSION_ID,
                session_id=session_id,
                backend_session_id=backend_session_id,
                display_id=display_id
            )

            await self._send_command(session_id, set_command)
            return True

        except Exception as e:
            logger.error(f"Failed to set session ID for {subscription_identifier}: {e}", exc_info=True)
            return False

    def set_detection_result_callback(self, callback: Callable):
        """Set callback for handling detection results."""
        self.detection_result_callback = callback

    def set_error_callback(self, callback: Callable):
        """Set callback for handling errors."""
        self.error_callback = callback

    def get_session_count(self) -> int:
        """Get the number of active sessions."""
        return len(self.sessions)

    def get_session_info(self, subscription_identifier: str) -> Optional[Dict[str, Any]]:
        """Get information about a session."""
        session_id = self.subscription_to_session.get(subscription_identifier)
        if not session_id:
            return None

        session_info = self.sessions.get(session_id)
        if not session_info:
            return None

        return {
            'session_id': session_id,
            'subscription_identifier': subscription_identifier,
            'created_at': session_info.created_at,
            'is_initialized': session_info.is_initialized,
            'processed_frames': session_info.processed_frames,
            'process_pid': session_info.process.pid if session_info.process.is_alive() else None,
            'is_alive': session_info.process.is_alive()
        }

    async def _send_command(self, session_id: str, command):
        """Send command to session process."""
        session_info = self.sessions.get(session_id)
        if not session_info:
            raise ValueError(f"Session {session_id} not found")

        serialized = MessageSerializer.serialize_message(command)
        session_info.command_queue.put(serialized)

    def _start_response_processing(self, session_info: SessionProcessInfo):
        """Start processing responses from a session process."""
        def process_responses():
            while session_info.session_id in self.sessions and session_info.process.is_alive():
                try:
                    if not session_info.response_queue.empty():
                        response_data = session_info.response_queue.get(timeout=1.0)
                        response = MessageSerializer.deserialize_message(response_data)
                        if self.main_event_loop:
                            asyncio.run_coroutine_threadsafe(
                                self._handle_response(session_info.session_id, response),
                                self.main_event_loop
                            )
                    else:
                        time.sleep(0.01)
                except Exception as e:
                    logger.error(f"Error processing response from {session_info.session_id}: {e}")

        self.response_executor.submit(process_responses)

    async def _handle_response(self, session_id: str, response):
        """Handle response from session process."""
        try:
            session_info = self.sessions.get(session_id)
            if not session_info:
                return

            if response.type == MessageType.INITIALIZED:
                session_info.is_initialized = response.success
                if response.success:
                    logger.info(f"Session {session_id} initialized successfully")
                else:
                    logger.error(f"Session {session_id} initialization failed: {response.error_message}")

            elif response.type == MessageType.DETECTION_RESULT:
                session_info.processed_frames += 1
                if self.detection_result_callback:
                    await self.detection_result_callback(session_info.subscription_identifier, response)

            elif response.type == MessageType.SESSION_SET:
                logger.info(f"Session ID set for {session_id}: {response.backend_session_id}")

            elif response.type == MessageType.HEALTH_RESPONSE:
                session_info.last_health_check = time.time()
                logger.debug(f"Health check for {session_id}: {response.status}")

            elif response.type == MessageType.ERROR:
                logger.error(f"Error from session {session_id}: {response.error_message}")
                if self.error_callback:
                    await self.error_callback(session_info.subscription_identifier, response)

        except Exception as e:
            logger.error(f"Error handling response from {session_id}: {e}", exc_info=True)

    async def _wait_for_shutdown(self, session_info: SessionProcessInfo):
        """Wait for session process to shutdown gracefully."""
        while session_info.process.is_alive():
            await asyncio.sleep(0.1)

    async def _cleanup_session(self, session_id: str):
        """Cleanup session process and resources."""
        try:
            session_info = self.sessions.get(session_id)
            if not session_info:
                return

            # Terminate process if still alive
            if session_info.process.is_alive():
                session_info.process.terminate()
                # Wait a bit for graceful termination
                await asyncio.sleep(1.0)
                if session_info.process.is_alive():
                    session_info.process.kill()

            # Remove from tracking
            del self.sessions[session_id]
            if session_info.subscription_identifier in self.subscription_to_session:
                del self.subscription_to_session[session_info.subscription_identifier]

            logger.info(f"Session {session_id} cleaned up")

        except Exception as e:
            logger.error(f"Error cleaning up session {session_id}: {e}", exc_info=True)

    async def _health_check_loop(self):
        """Periodic health check of all session processes."""
        while self.is_running:
            try:
                for session_id in list(self.sessions.keys()):
                    session_info = self.sessions.get(session_id)
                    if session_info and session_info.is_initialized:
                        # Send health check
                        health_command = HealthCheckCommand(type=MessageType.HEALTH_CHECK, session_id=session_id)
                        await self._send_command(session_id, health_command)

                await asyncio.sleep(self.health_check_interval)

            except asyncio.CancelledError:
                break
            except Exception as e:
                logger.error(f"Error in health check loop: {e}", exc_info=True)
                await asyncio.sleep(5.0)  # Brief pause before retrying
```
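A sketch of how the main process might drive this manager (the subscription keys follow what `create_session` reads above; the identifier, URLs, and model values are illustrative, not part of this commit):

```python
import asyncio

from core.processes.session_manager import SessionProcessManager


async def main() -> None:
    manager = SessionProcessManager(max_concurrent_sessions=20)
    await manager.start()

    async def on_detection(subscription_identifier: str, response) -> None:
        # response is a DetectionResultResponse relayed from the worker process
        print(subscription_identifier, response.phase, response.detections)

    manager.set_detection_result_callback(on_detection)

    await manager.create_session(
        "display-001;cam-001",  # illustrative subscription identifier
        {
            "subscriptionIdentifier": "display-001;cam-001",
            "rtspUrl": "rtsp://example.invalid/stream",       # illustrative
            "modelId": 52,                                    # illustrative
            "modelUrl": "https://example.invalid/models/52",  # illustrative
            "modelName": "Detection-52",
        },
    )

    await asyncio.sleep(60)  # let the worker run for a while
    await manager.stop()


if __name__ == "__main__":
    asyncio.run(main())
```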
813 core/processes/session_worker.py Normal file

@@ -0,0 +1,813 @@
```python
"""
Session Worker Process - Individual process that handles one session completely.
Each camera/session gets its own dedicated worker process for complete isolation.
"""

import asyncio
import multiprocessing as mp
import time
import logging
import sys
import os
import traceback
import psutil
import threading
import cv2
import requests
from typing import Dict, Any, Optional, Tuple
from pathlib import Path
import numpy as np
from queue import Queue, Empty

# Import core modules
from ..models.manager import ModelManager
from ..detection.pipeline import DetectionPipeline
from ..models.pipeline import PipelineParser
from ..logging.session_logger import PerSessionLogger
from .communication import (
    MessageSerializer, MessageType, IPCMessageUnion,
    InitializeCommand, ProcessFrameCommand, SetSessionIdCommand,
    ShutdownCommand, HealthCheckCommand,
    InitializedResponse, DetectionResultResponse, SessionSetResponse,
    ShutdownCompleteResponse, HealthResponse, ErrorResponse
)


class IntegratedStreamReader:
    """
    Integrated RTSP/HTTP stream reader for session worker processes.
    Handles both RTSP streams and HTTP snapshots with automatic failover.
    """

    def __init__(self, session_id: str, subscription_config: Dict[str, Any], logger: logging.Logger):
        self.session_id = session_id
        self.subscription_config = subscription_config
        self.logger = logger

        # Stream configuration
        self.rtsp_url = subscription_config.get('rtspUrl')
        self.snapshot_url = subscription_config.get('snapshotUrl')
        self.snapshot_interval = subscription_config.get('snapshotInterval', 2000) / 1000.0  # Convert to seconds

        # Stream state
        self.is_running = False
        self.rtsp_cap = None
        self.stream_thread = None
        self.stop_event = threading.Event()

        # Frame buffer - single latest frame only
        self.frame_queue = Queue(maxsize=1)
        self.last_frame_time = 0

        # Stream health monitoring
        self.consecutive_errors = 0
        self.max_consecutive_errors = 30
        self.reconnect_delay = 5.0
        self.frame_timeout = 10.0  # Seconds without frame before considered dead

        # Crop coordinates if present
        self.crop_coords = None
        if subscription_config.get('cropX1') is not None:
            self.crop_coords = (
                subscription_config['cropX1'],
                subscription_config['cropY1'],
                subscription_config['cropX2'],
                subscription_config['cropY2']
            )

    def start(self) -> bool:
        """Start the stream reading in background thread."""
        if self.is_running:
            return True

        try:
            self.is_running = True
            self.stop_event.clear()

            # Start background thread for stream reading
            self.stream_thread = threading.Thread(
                target=self._stream_loop,
                name=f"StreamReader-{self.session_id}",
                daemon=True
            )
            self.stream_thread.start()

            self.logger.info(f"Stream reader started for {self.session_id}")
            return True

        except Exception as e:
            self.logger.error(f"Failed to start stream reader: {e}")
            self.is_running = False
            return False

    def stop(self):
        """Stop the stream reading."""
        if not self.is_running:
            return

        self.logger.info(f"Stopping stream reader for {self.session_id}")
        self.is_running = False
        self.stop_event.set()

        # Close RTSP connection
        if self.rtsp_cap:
            try:
                self.rtsp_cap.release()
            except Exception:
                pass
            self.rtsp_cap = None

        # Wait for thread to finish
        if self.stream_thread and self.stream_thread.is_alive():
            self.stream_thread.join(timeout=3.0)

    def get_latest_frame(self) -> Optional[Tuple[np.ndarray, str, float]]:
        """Get the latest frame if available. Returns (frame, display_id, timestamp) or None."""
        try:
            # Non-blocking get - return None if no frame available
            frame_data = self.frame_queue.get_nowait()
            return frame_data
        except Empty:
            return None

    def _stream_loop(self):
        """Main stream reading loop - runs in background thread."""
        self.logger.info(f"Stream loop started for {self.session_id}")

        while self.is_running and not self.stop_event.is_set():
            try:
                if self.rtsp_url:
                    # Try RTSP first
                    self._read_rtsp_stream()
                elif self.snapshot_url:
                    # Fallback to HTTP snapshots
                    self._read_http_snapshots()
                else:
                    self.logger.error("No stream URL configured")
                    break

            except Exception as e:
                self.logger.error(f"Error in stream loop: {e}")
                self._handle_stream_error()

        self.logger.info(f"Stream loop ended for {self.session_id}")

    def _read_rtsp_stream(self):
        """Read frames from RTSP stream."""
        if not self.rtsp_cap:
            self._connect_rtsp()

        if not self.rtsp_cap:
            return

        try:
            ret, frame = self.rtsp_cap.read()

            if ret and frame is not None:
                # Process the frame
                processed_frame = self._process_frame(frame)
                if processed_frame is not None:
                    # Extract display ID from subscription identifier
                    display_id = self.subscription_config['subscriptionIdentifier'].split(';')[-1]
                    timestamp = time.time()

                    # Put frame in queue (replace if full)
                    try:
                        # Clear queue and put new frame
                        try:
                            self.frame_queue.get_nowait()
                        except Empty:
                            pass
                        self.frame_queue.put((processed_frame, display_id, timestamp), timeout=0.1)
                        self.last_frame_time = timestamp
                        self.consecutive_errors = 0
                    except Exception:
                        pass  # Queue full, skip frame
            else:
                self._handle_stream_error()

        except Exception as e:
            self.logger.error(f"Error reading RTSP frame: {e}")
            self._handle_stream_error()

    def _read_http_snapshots(self):
        """Read frames from HTTP snapshot URL."""
        try:
            response = requests.get(self.snapshot_url, timeout=10)
            response.raise_for_status()

            # Convert response to numpy array
            img_array = np.asarray(bytearray(response.content), dtype=np.uint8)
            frame = cv2.imdecode(img_array, cv2.IMREAD_COLOR)

            if frame is not None:
                # Process the frame
                processed_frame = self._process_frame(frame)
                if processed_frame is not None:
                    # Extract display ID from subscription identifier
                    display_id = self.subscription_config['subscriptionIdentifier'].split(';')[-1]
                    timestamp = time.time()

                    # Put frame in queue (replace if full)
                    try:
                        # Clear queue and put new frame
                        try:
                            self.frame_queue.get_nowait()
                        except Empty:
                            pass
                        self.frame_queue.put((processed_frame, display_id, timestamp), timeout=0.1)
                        self.last_frame_time = timestamp
                        self.consecutive_errors = 0
                    except Exception:
                        pass  # Queue full, skip frame

            # Wait for next snapshot interval
            time.sleep(self.snapshot_interval)

        except Exception as e:
            self.logger.error(f"Error reading HTTP snapshot: {e}")
            self._handle_stream_error()

    def _connect_rtsp(self):
        """Connect to RTSP stream."""
        try:
            self.logger.info(f"Connecting to RTSP: {self.rtsp_url}")

            # Create VideoCapture with optimized settings
            self.rtsp_cap = cv2.VideoCapture(self.rtsp_url)

            # Set buffer size to 1 to reduce latency
            self.rtsp_cap.set(cv2.CAP_PROP_BUFFERSIZE, 1)

            # Check if connection successful
            if self.rtsp_cap.isOpened():
                # Test read a frame
                ret, frame = self.rtsp_cap.read()
                if ret and frame is not None:
                    self.logger.info(f"RTSP connection successful for {self.session_id}")
                    self.consecutive_errors = 0
                    return True

            # Connection failed
            if self.rtsp_cap:
                self.rtsp_cap.release()
                self.rtsp_cap = None

        except Exception as e:
            self.logger.error(f"Failed to connect RTSP: {e}")

        return False

    def _process_frame(self, frame: np.ndarray) -> Optional[np.ndarray]:
        """Process frame - apply cropping if configured."""
        if frame is None:
            return None

        try:
            # Apply crop if configured
            if self.crop_coords:
                x1, y1, x2, y2 = self.crop_coords
                if x1 < x2 and y1 < y2:
                    frame = frame[y1:y2, x1:x2]

            return frame

        except Exception as e:
            self.logger.error(f"Error processing frame: {e}")
            return None

    def _handle_stream_error(self):
        """Handle stream errors with reconnection logic."""
        self.consecutive_errors += 1

        if self.consecutive_errors >= self.max_consecutive_errors:
            self.logger.error(f"Too many consecutive errors ({self.consecutive_errors}), stopping stream")
            self.stop()
            return

        # Close current connection
        if self.rtsp_cap:
            try:
                self.rtsp_cap.release()
            except Exception:
                pass
            self.rtsp_cap = None

        # Wait before reconnecting
        self.logger.warning(f"Stream error #{self.consecutive_errors}, reconnecting in {self.reconnect_delay}s")
        time.sleep(self.reconnect_delay)

    def is_healthy(self) -> bool:
        """Check if stream is healthy (receiving frames)."""
        if not self.is_running:
            return False

        # Check if we've received a frame recently
        if self.last_frame_time > 0:
            time_since_frame = time.time() - self.last_frame_time
            return time_since_frame < self.frame_timeout

        return False
```
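One design choice in the reader above deserves a note before the worker class that consumes it: the one-slot `Queue(maxsize=1)` gives latest-frame-wins semantics, since the reader thread drops the stale frame before publishing a new one, so a slow detector never builds a backlog. The same drop-then-put pattern in isolation (a minimal standalone sketch, independent of the classes in this file):

```python
from queue import Queue, Empty

slot: Queue = Queue(maxsize=1)

def publish(item) -> None:
    """Replace whatever is in the slot with the newest item."""
    try:
        slot.get_nowait()  # discard the stale item, if any
    except Empty:
        pass
    slot.put(item, timeout=0.1)

publish("frame-1")
publish("frame-2")  # frame-1 is dropped, never processed
assert slot.get_nowait() == "frame-2"
```

Dropping stale frames keeps end-to-end latency bounded at roughly one detection cycle, at the cost of never processing every frame. The listing continues with `SessionWorkerProcess` below.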
|
class SessionWorkerProcess:
|
||||||
|
"""
|
||||||
|
Individual session worker process that handles one camera/session completely.
|
||||||
|
Runs in its own process with isolated memory, models, and state.
|
||||||
|
"""
|
||||||
|
|
||||||
|
def __init__(self, session_id: str, command_queue: mp.Queue, response_queue: mp.Queue):
|
||||||
|
"""
|
||||||
|
Initialize session worker process.
|
||||||
|
|
||||||
|
Args:
|
||||||
|
session_id: Unique session identifier
|
||||||
|
command_queue: Queue to receive commands from main process
|
||||||
|
response_queue: Queue to send responses back to main process
|
||||||
|
"""
|
||||||
|
self.session_id = session_id
|
||||||
|
self.command_queue = command_queue
|
||||||
|
self.response_queue = response_queue
|
||||||
|
|
||||||
|
# Process information
|
||||||
|
self.process = None
|
||||||
|
self.start_time = time.time()
|
||||||
|
self.processed_frames = 0
|
||||||
|
|
||||||
|
# Session components (will be initialized in process)
|
||||||
|
self.model_manager = None
|
||||||
|
self.detection_pipeline = None
|
||||||
|
self.pipeline_parser = None
|
||||||
|
self.logger = None
|
||||||
|
self.session_logger = None
|
||||||
|
self.stream_reader = None
|
||||||
|
|
||||||
|
# Session state
|
||||||
|
self.subscription_config = None
|
||||||
|
self.model_config = None
|
||||||
|
self.backend_session_id = None
|
||||||
|
self.display_id = None
|
||||||
|
self.is_initialized = False
|
||||||
|
self.should_shutdown = False
|
||||||
|
|
||||||
|
# Frame processing
|
||||||
|
self.frame_processing_enabled = False
|
||||||
|
|
||||||
|
async def run(self):
|
||||||
|
"""
|
||||||
|
Main entry point for the worker process.
|
||||||
|
This method runs in the separate process.
|
||||||
|
"""
|
||||||
|
try:
|
||||||
|
# Set process name for debugging
|
||||||
|
mp.current_process().name = f"SessionWorker-{self.session_id}"
|
||||||
|
|
||||||
|
# Setup basic logging first (enhanced after we get subscription config)
|
||||||
|
self._setup_basic_logging()
|
||||||
|
|
||||||
|
self.logger.info(f"Session worker process started for session {self.session_id}")
|
||||||
|
self.logger.info(f"Process ID: {os.getpid()}")
|
||||||
|
|
||||||
|
# Main message processing loop with integrated frame processing
|
||||||
|
while not self.should_shutdown:
|
||||||
|
try:
|
||||||
|
# Process pending messages
|
||||||
|
await self._process_pending_messages()
|
||||||
|
|
||||||
|
# Process frames if enabled and initialized
|
||||||
|
if self.frame_processing_enabled and self.is_initialized and self.stream_reader:
|
||||||
|
await self._process_stream_frames()
|
||||||
|
|
||||||
|
# Brief sleep to prevent busy waiting
|
||||||
|
await asyncio.sleep(0.01)
|
||||||
|
|
||||||
|
except Exception as e:
|
||||||
|
self.logger.error(f"Error in main processing loop: {e}", exc_info=True)
|
||||||
|
self._send_error_response("main_loop_error", str(e), traceback.format_exc())
|
||||||
|
|
||||||
|
except Exception as e:
|
||||||
|
# Critical error in main run loop
|
||||||
|
if self.logger:
|
||||||
|
self.logger.error(f"Critical error in session worker: {e}", exc_info=True)
|
||||||
|
else:
|
||||||
|
print(f"Critical error in session worker {self.session_id}: {e}")
|
||||||
|
|
||||||
|
finally:
|
||||||
|
# Cleanup stream reader
|
||||||
|
if self.stream_reader:
|
||||||
|
self.stream_reader.stop()
|
||||||
|
|
||||||
|
if self.session_logger:
|
||||||
|
self.session_logger.log_session_end()
|
||||||
|
if self.session_logger:
|
||||||
|
self.session_logger.cleanup()
|
||||||
|
if self.logger:
|
||||||
|
self.logger.info(f"Session worker process {self.session_id} shutting down")
|
||||||
|
|
||||||
|
async def _handle_message(self, message: IPCMessageUnion):
|
||||||
|
"""
|
||||||
|
Handle incoming messages from main process.
|
||||||
|
|
||||||
|
Args:
|
||||||
|
message: Deserialized message object
|
||||||
|
"""
|
||||||
|
try:
|
||||||
|
if message.type == MessageType.INITIALIZE:
|
||||||
|
await self._handle_initialize(message)
|
||||||
|
elif message.type == MessageType.PROCESS_FRAME:
|
||||||
|
await self._handle_process_frame(message)
|
||||||
|
elif message.type == MessageType.SET_SESSION_ID:
|
||||||
|
await self._handle_set_session_id(message)
|
||||||
|
elif message.type == MessageType.SHUTDOWN:
|
||||||
|
await self._handle_shutdown(message)
|
||||||
|
elif message.type == MessageType.HEALTH_CHECK:
|
||||||
|
await self._handle_health_check(message)
|
||||||
|
else:
|
||||||
|
self.logger.warning(f"Unknown message type: {message.type}")
|
||||||
|
|
||||||
|
except Exception as e:
|
||||||
|
self.logger.error(f"Error handling message {message.type}: {e}", exc_info=True)
|
||||||
|
self._send_error_response(f"handle_{message.type.value}_error", str(e), traceback.format_exc())
|
||||||
|
|
||||||
|
    async def _handle_initialize(self, message: InitializeCommand):
        """
        Initialize the session with models and pipeline.

        Args:
            message: Initialize command message
        """
        try:
            self.logger.info(f"Initializing session {self.session_id}")
            self.logger.info(f"Subscription config: {message.subscription_config}")
            self.logger.info(f"Model config: {message.model_config}")

            # Store configuration
            self.subscription_config = message.subscription_config
            self.model_config = message.model_config

            # Setup enhanced logging now that we have subscription config
            self._setup_enhanced_logging()

            # Initialize model manager (isolated for this process)
            self.model_manager = ModelManager("models")
            self.logger.info("Model manager initialized")

            # Download and prepare model if needed
            model_id = self.model_config.get('modelId')
            model_url = self.model_config.get('modelUrl')
            model_name = self.model_config.get('modelName', f'Model-{model_id}')

            if model_id and model_url:
                model_path = self.model_manager.ensure_model(model_id, model_url, model_name)
                if not model_path:
                    raise RuntimeError(f"Failed to download/prepare model {model_id}")

                self.logger.info(f"Model {model_id} prepared at {model_path}")

                # Log model loading
                if self.session_logger:
                    self.session_logger.log_model_loading(model_id, model_name, str(model_path))

                # Load pipeline configuration
                self.pipeline_parser = self.model_manager.get_pipeline_config(model_id)
                if not self.pipeline_parser:
                    raise RuntimeError(f"Failed to load pipeline config for model {model_id}")

                self.logger.info(f"Pipeline configuration loaded for model {model_id}")

                # Initialize detection pipeline (isolated for this session)
                self.detection_pipeline = DetectionPipeline(
                    pipeline_parser=self.pipeline_parser,
                    model_manager=self.model_manager,
                    model_id=model_id,
                    message_sender=None  # Will be set to send via IPC
                )

                # Initialize pipeline components
                if not await self.detection_pipeline.initialize():
                    raise RuntimeError("Failed to initialize detection pipeline")

                self.logger.info("Detection pipeline initialized successfully")

                # Initialize integrated stream reader
                self.logger.info("Initializing integrated stream reader")
                self.stream_reader = IntegratedStreamReader(
                    self.session_id,
                    self.subscription_config,
                    self.logger
                )

                # Start stream reading
                if self.stream_reader.start():
                    self.logger.info("Stream reader started successfully")
                    self.frame_processing_enabled = True
                else:
                    self.logger.error("Failed to start stream reader")

                self.is_initialized = True

                # Send success response
                response = InitializedResponse(
                    type=MessageType.INITIALIZED,
                    session_id=self.session_id,
                    success=True
                )
                self._send_response(response)

            else:
                raise ValueError("Missing required model configuration (modelId, modelUrl)")

        except Exception as e:
            self.logger.error(f"Failed to initialize session: {e}", exc_info=True)
            response = InitializedResponse(
                type=MessageType.INITIALIZED,
                session_id=self.session_id,
                success=False,
                error_message=str(e)
            )
            self._send_response(response)

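The keys this initializer consumes are worth spelling out. A plausible pair of input configs, where the values are illustrative placeholders and only the key names come from the code above:

```python
# Example inputs for _handle_initialize (placeholder values; key names only
# are taken from the handler above).
model_config = {
    "modelId": 52,
    "modelUrl": "https://example.com/models/detection_v1.mpta",
    "modelName": "detection_v1",
}
subscription_config = {
    "subscriptionIdentifier": "display-001;cam-001",
    # stream-source fields consumed by IntegratedStreamReader are omitted here
}
```
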
    async def _handle_process_frame(self, message: ProcessFrameCommand):
        """
        Process a frame through the detection pipeline.

        Args:
            message: Process frame command message
        """
        if not self.is_initialized:
            self._send_error_response("not_initialized", "Session not initialized", None)
            return

        try:
            self.logger.debug(f"Processing frame for display {message.display_id}")

            # Process frame through detection pipeline
            if self.backend_session_id:
                # Processing phase (after session ID is set)
                result = await self.detection_pipeline.execute_processing_phase(
                    frame=message.frame,
                    display_id=message.display_id,
                    session_id=self.backend_session_id,
                    subscription_id=message.subscription_identifier
                )
                phase = "processing"
            else:
                # Detection phase (before session ID is set)
                result = await self.detection_pipeline.execute_detection_phase(
                    frame=message.frame,
                    display_id=message.display_id,
                    subscription_id=message.subscription_identifier
                )
                phase = "detection"

            self.processed_frames += 1

            # Send result back to main process
            response = DetectionResultResponse(
                type=MessageType.DETECTION_RESULT,
                session_id=self.session_id,
                detections=result,
                processing_time=result.get('processing_time', 0.0),
                phase=phase
            )
            self._send_response(response)

        except Exception as e:
            self.logger.error(f"Error processing frame: {e}", exc_info=True)
            self._send_error_response("frame_processing_error", str(e), traceback.format_exc())

    async def _handle_set_session_id(self, message: SetSessionIdCommand):
        """
        Set the backend session ID for this session.

        Args:
            message: Set session ID command message
        """
        try:
            self.logger.info(f"Setting backend session ID: {message.backend_session_id}")
            self.backend_session_id = message.backend_session_id
            self.display_id = message.display_id

            response = SessionSetResponse(
                session_id=self.session_id,
                success=True,
                backend_session_id=message.backend_session_id
            )
            self._send_response(response)

        except Exception as e:
            self.logger.error(f"Error setting session ID: {e}", exc_info=True)
            self._send_error_response("set_session_id_error", str(e), traceback.format_exc())

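Together with `_handle_process_frame` above, this message defines the worker's two-phase lifecycle: frames go through the detection phase until the backend assigns a session ID, and through the processing phase afterwards. The rule, stated on its own:

```python
def phase_for(backend_session_id) -> str:
    """Detection until the backend assigns a session ID; processing afterwards."""
    return "processing" if backend_session_id else "detection"
```
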
    async def _handle_shutdown(self, message: ShutdownCommand):
        """
        Handle graceful shutdown request.

        Args:
            message: Shutdown command message
        """
        try:
            self.logger.info("Received shutdown request")
            self.should_shutdown = True

            # Cleanup resources
            if self.detection_pipeline:
                # Add cleanup method to pipeline if needed
                pass

            response = ShutdownCompleteResponse(session_id=self.session_id)
            self._send_response(response)

        except Exception as e:
            self.logger.error(f"Error during shutdown: {e}", exc_info=True)

    async def _handle_health_check(self, message: HealthCheckCommand):
        """
        Handle health check request.

        Args:
            message: Health check command message
        """
        try:
            # Get process metrics
            process = psutil.Process()
            memory_info = process.memory_info()
            memory_mb = memory_info.rss / (1024 * 1024)  # Convert to MB
            cpu_percent = process.cpu_percent()

            # GPU memory (if available)
            gpu_memory_mb = None
            try:
                import torch
                if torch.cuda.is_available():
                    gpu_memory_mb = torch.cuda.memory_allocated() / (1024 * 1024)
            except ImportError:
                pass

            # Determine health status
            status = "healthy"
            if memory_mb > 2048:  # More than 2GB
                status = "degraded"
            if memory_mb > 4096:  # More than 4GB
                status = "unhealthy"

            response = HealthResponse(
                session_id=self.session_id,
                status=status,
                memory_usage_mb=memory_mb,
                cpu_percent=cpu_percent,
                gpu_memory_mb=gpu_memory_mb,
                uptime_seconds=time.time() - self.start_time,
                processed_frames=self.processed_frames
            )
            self._send_response(response)

        except Exception as e:
            self.logger.error(f"Error checking health: {e}", exc_info=True)
            self._send_error_response("health_check_error", str(e), traceback.format_exc())

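The status thresholds above are easy to misread because the second check overrides the first; factored out, the rule is: up to 2 GB healthy, 2–4 GB degraded, above 4 GB unhealthy. A standalone restatement (thresholds copied from the handler, not tunables documented anywhere else):

```python
def classify_memory(memory_mb: float) -> str:
    """Map worker RSS to a health status (thresholds from _handle_health_check)."""
    if memory_mb > 4096:   # > 4 GB
        return "unhealthy"
    if memory_mb > 2048:   # > 2 GB
        return "degraded"
    return "healthy"
```
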
    def _send_response(self, response: IPCMessageUnion):
        """
        Send response message to main process.

        Args:
            response: Response message to send
        """
        try:
            serialized = MessageSerializer.serialize_message(response)
            self.response_queue.put(serialized)
        except Exception as e:
            if self.logger:
                self.logger.error(f"Failed to send response: {e}")

    def _send_error_response(self, error_type: str, error_message: str, traceback_str: Optional[str]):
        """
        Send error response to main process.

        Args:
            error_type: Type of error
            error_message: Error message
            traceback_str: Optional traceback string
        """
        error_response = ErrorResponse(
            type=MessageType.ERROR,
            session_id=self.session_id,
            error_type=error_type,
            error_message=error_message,
            traceback=traceback_str
        )
        self._send_response(error_response)

    def _setup_basic_logging(self):
        """
        Setup basic logging for this process before we have subscription config.
        """
        logging.basicConfig(
            level=logging.INFO,
            format=f"%(asctime)s [%(levelname)s] SessionWorker-{self.session_id}: %(message)s",
            handlers=[
                logging.StreamHandler(sys.stdout)
            ]
        )
        self.logger = logging.getLogger(f"session_worker_{self.session_id}")

    def _setup_enhanced_logging(self):
        """
        Setup per-session logging with dedicated log file after we have subscription config.
        Phase 2: Enhanced logging with file rotation and session context.
        """
        if not self.subscription_config:
            return

        # Initialize per-session logger
        subscription_id = self.subscription_config.get('subscriptionIdentifier', self.session_id)

        self.session_logger = PerSessionLogger(
            session_id=self.session_id,
            subscription_identifier=subscription_id,
            log_dir="logs",
            max_size_mb=100,
            backup_count=5
        )

        # Get the configured logger (replaces basic logger)
        self.logger = self.session_logger.get_logger()

        # Log session start
        self.session_logger.log_session_start(os.getpid())

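`PerSessionLogger` is defined elsewhere in the codebase; judging only by the arguments passed here (a log directory, a 100 MB size cap, five backups), it plausibly wraps a `RotatingFileHandler` keyed to the per-camera file naming used in this plan. A minimal sketch under those assumptions:

```python
# Sketch only: what PerSessionLogger might do internally. The real class,
# its file-name scheme, and its formatter are assumptions.
import logging
from logging.handlers import RotatingFileHandler
from pathlib import Path

def make_session_logger(session_id: str, subscription_identifier: str,
                        log_dir: str = "logs", max_size_mb: int = 100,
                        backup_count: int = 5) -> logging.Logger:
    Path(log_dir).mkdir(parents=True, exist_ok=True)
    safe_name = subscription_identifier.replace(";", "_").replace("/", "_")
    handler = RotatingFileHandler(
        str(Path(log_dir) / f"detector_worker_camera_{safe_name}.log"),
        maxBytes=max_size_mb * 1024 * 1024,
        backupCount=backup_count,
    )
    handler.setFormatter(logging.Formatter(
        "%(asctime)s [%(levelname)s] %(name)s: %(message)s"))
    logger = logging.getLogger(f"session_worker_{session_id}")
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    return logger
```
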
    async def _process_pending_messages(self):
        """Process pending IPC messages from main process."""
        try:
            # Process all pending messages
            while not self.command_queue.empty():
                message_data = self.command_queue.get_nowait()
                message = MessageSerializer.deserialize_message(message_data)
                await self._handle_message(message)
        except Exception as e:
            if not self.command_queue.empty():
                # Only log error if there was actually a message to process
                self.logger.error(f"Error processing messages: {e}", exc_info=True)

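One caveat on the loop above: `Queue.empty()` on a `multiprocessing.Queue` is documented as unreliable, so the `empty()`-then-`get_nowait()` sequence can still raise `queue.Empty` under a race, which is what the except branch appears to be absorbing. The conventional drain pattern avoids the emptiness check entirely:

```python
# Conventional drain loop for a multiprocessing.Queue (a sketch, not the
# module's code): rely on the queue.Empty exception instead of empty().
import queue

def drain_commands(command_queue) -> list:
    pending = []
    while True:
        try:
            pending.append(command_queue.get_nowait())
        except queue.Empty:
            break
    return pending
```
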
    async def _process_stream_frames(self):
        """Process frames from the integrated stream reader."""
        try:
            if not self.stream_reader or not self.stream_reader.is_running:
                return

            # Get latest frame from stream
            frame_data = self.stream_reader.get_latest_frame()
            if frame_data is None:
                return

            frame, display_id, timestamp = frame_data

            # Process frame through detection pipeline
            subscription_identifier = self.subscription_config['subscriptionIdentifier']

            if self.backend_session_id:
                # Processing phase (after session ID is set)
                result = await self.detection_pipeline.execute_processing_phase(
                    frame=frame,
                    display_id=display_id,
                    session_id=self.backend_session_id,
                    subscription_id=subscription_identifier
                )
                phase = "processing"
            else:
                # Detection phase (before session ID is set)
                result = await self.detection_pipeline.execute_detection_phase(
                    frame=frame,
                    display_id=display_id,
                    subscription_id=subscription_identifier
                )
                phase = "detection"

            self.processed_frames += 1

            # Send result back to main process
            response = DetectionResultResponse(
                type=MessageType.DETECTION_RESULT,
                session_id=self.session_id,
                detections=result,
                processing_time=result.get('processing_time', 0.0),
                phase=phase
            )
            self._send_response(response)

            # Log frame processing (debug level to avoid spam)
            self.logger.debug(f"Processed frame #{self.processed_frames} from {display_id} (phase: {phase})")

        except Exception as e:
            self.logger.error(f"Error processing stream frame: {e}", exc_info=True)

def session_worker_main(session_id: str, command_queue: mp.Queue, response_queue: mp.Queue):
    """
    Main entry point for session worker process.
    This function is called when the process is spawned.
    """
    # Create worker instance
    worker = SessionWorkerProcess(session_id, command_queue, response_queue)

    # Run the worker
    asyncio.run(worker.run())
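For context, this is roughly how a parent process would launch one of these workers. It is a sketch only; in this architecture the real spawning is owned by the SessionProcessManager, and the session identifier is a placeholder:

```python
# Sketch: spawning a session worker from the main process.
import multiprocessing as mp

if __name__ == "__main__":
    ctx = mp.get_context("spawn")  # spawn, not fork: CUDA/model state must not be inherited
    command_queue = ctx.Queue()
    response_queue = ctx.Queue()

    worker = ctx.Process(
        target=session_worker_main,
        args=("display-001;cam-001", command_queue, response_queue),
        daemon=True,
    )
    worker.start()
    # ... put serialized commands on command_queue, read responses ...
    worker.join()
```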