# Event-Driven Stream Processing Architecture with Batching

## Overview

This document describes the AsyncIO-based event-driven architecture for connecting stream decoders to models and tracking, with support for batched inference using ping-pong circular buffers.

## Architecture Diagram

```
┌─────────────────────────────────────────────────────────────────┐
│                     StreamConnectionManager                      │
│  - Manages multiple stream connections                           │
│  - Routes events to user callbacks/generators                    │
│  - Coordinates ModelController and TrackingController            │
└─────────────────────────────────────────────────────────────────┘
         │
         ├───────────────────────┬───────────────────────┐
         ▼                       ▼                       ▼
┌─────────────────┐     ┌─────────────────┐     ┌─────────────────┐
│ StreamConnection│     │ StreamConnection│     │ StreamConnection│
│   (Stream 1)    │     │   (Stream 2)    │     │   (Stream N)    │
│                 │     │                 │     │                 │
│ - StreamDecoder │     │ - StreamDecoder │     │ - StreamDecoder │
│ - Frame Poller  │     │ - Frame Poller  │     │ - Frame Poller  │
│ - Event Emitter │     │ - Event Emitter │     │ - Event Emitter │
└─────────────────┘     └─────────────────┘     └─────────────────┘
         │                       │                       │
         └───────────────────────┴───────────────────────┘
                                 │
                                 ▼
              ┌─────────────────────────────────────┐
              │           ModelController           │
              │  ┌────────────┐    ┌────────────┐   │
              │  │  Buffer A  │    │  Buffer B  │   │
              │  │  (Active)  │    │(Processing)│   │
              │  │  [frame1]  │    │  [frame9]  │   │
              │  │  [frame2]  │    │  [frame10] │   │
              │  │  [frame3]  │    │  [...]     │   │
              │  └────────────┘    └────────────┘   │
              │                                     │
              │  - Batch accumulation               │
              │  - Force timeout monitor            │
              │  - Ping-pong switching              │
              └─────────────────────────────────────┘
                                 │
                    ┌────────────┴────────────┐
                    ▼                         ▼
         ┌─────────────────────┐   ┌─────────────────────┐
         │  TensorRTModelRepo  │   │ TrackingController  │
         │ - Batched inference │   │ - Track association │
         │ - Context pooling   │   │ - Track management  │
         └─────────────────────┘   └─────────────────────┘
                                              │
                                              ▼
                                 ┌─────────────────────────┐
                                 │  User Callbacks/Queues  │
                                 │  - on_tracking_result   │
                                 │  - on_detections        │
                                 │  - on_error             │
                                 └─────────────────────────┘
```

## Component Details

### 1. ModelController (Async Batching Layer)

**Responsibilities:**
- Accumulate frames from multiple streams into batches
- Manage ping-pong buffers (BufferA/BufferB)
- Monitor force-switch timeout
- Execute batched inference
- Route results back to streams

**Ping-Pong Buffer Logic:**
- **BufferA (Active)**: Accumulates incoming frames
- **BufferB (Processing)**: Being processed through inference
- **Switch Triggers** (illustrated in the sketch below):
  1. Active buffer reaches `batch_size` → immediate swap
  2. `force_timeout` expires AND processing buffer is idle → force swap
  3. Never switch while the processing buffer is busy
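To make the trigger rules concrete, the sketch below restates the swap decision as a small standalone predicate. It is illustrative only: the names (`BufferState`, `batch_size`, `force_timeout`, `last_submit_time`) mirror the `ModelController` pseudocode later in this document, not an existing API.

```python
# Illustrative sketch of the ping-pong swap decision (not an existing API;
# names mirror the ModelController pseudocode below).
import time
from enum import Enum


class BufferState(Enum):
    IDLE = "idle"
    FILLING = "filling"
    PROCESSING = "processing"


def should_swap(active_len: int, inactive_state: BufferState,
                last_submit_time: float, batch_size: int,
                force_timeout: float) -> bool:
    """Decide whether the active buffer should be handed off for processing."""
    if inactive_state == BufferState.PROCESSING:
        return False  # trigger 3: never swap while the other buffer is busy
    if active_len >= batch_size:
        return True   # trigger 1: batch full, swap immediately
    waited = time.time() - last_submit_time
    return active_len > 0 and waited >= force_timeout  # trigger 2: force swap on timeout
```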
### 2. StreamConnectionManager

**Responsibilities:**
- Create and manage stream connections
- Coordinate ModelController and TrackingController
- Route tracking results to user callbacks/generators
- Handle stream lifecycle (connect, disconnect, errors)

### 3. StreamConnection

**Responsibilities:**
- Wrap a single StreamDecoder
- Poll frames from threaded decoder (bridge to async)
- Submit frames to ModelController
- Emit events to user code

---

## Pseudo Code Implementation

### 1. ModelController with Ping-Pong Buffers

```python
import asyncio
import time
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List, Tuple, Optional, Callable

import torch


@dataclass
class BatchFrame:
    """Represents a frame in the batch buffer"""
    stream_id: str
    frame: torch.Tensor  # GPU tensor (3, H, W)
    timestamp: float
    metadata: Dict = None


class BufferState(Enum):
    IDLE = "idle"
    FILLING = "filling"
    PROCESSING = "processing"


class ModelController:
    """
    Manages batched inference with ping-pong buffers and force-switch timeout.
    """

    def __init__(
        self,
        model_repository,
        model_id: str,
        batch_size: int = 16,
        force_timeout: float = 0.05,  # 50ms
        preprocess_fn: Callable = None,
        postprocess_fn: Callable = None,
    ):
        self.model_repository = model_repository
        self.model_id = model_id
        self.batch_size = batch_size
        self.force_timeout = force_timeout
        self.preprocess_fn = preprocess_fn
        self.postprocess_fn = postprocess_fn

        # Ping-pong buffers
        self.buffer_a: List[BatchFrame] = []
        self.buffer_b: List[BatchFrame] = []

        # Buffer states
        self.active_buffer = "A"  # Which buffer is currently active (filling)
        self.buffer_a_state = BufferState.IDLE
        self.buffer_b_state = BufferState.IDLE

        # Async coordination
        self.buffer_lock = asyncio.Lock()
        self.last_submit_time = time.time()

        # Tasks
        self.timeout_task: Optional[asyncio.Task] = None
        self.processor_task: Optional[asyncio.Task] = None
        self.running = False

        # Result callbacks (stream_id -> callback)
        self.result_callbacks: Dict[str, Callable] = {}

    async def start(self):
        """Start the controller background tasks"""
        self.running = True
        self.timeout_task = asyncio.create_task(self._timeout_monitor())
        self.processor_task = asyncio.create_task(self._batch_processor())

    async def stop(self):
        """Stop the controller and cleanup"""
        self.running = False
        if self.timeout_task:
            self.timeout_task.cancel()
        if self.processor_task:
            self.processor_task.cancel()

        # Process any remaining frames
        await self._process_remaining_buffers()

    def register_callback(self, stream_id: str, callback: Callable):
        """Register a callback for inference results from a stream"""
        self.result_callbacks[stream_id] = callback

    def unregister_callback(self, stream_id: str):
        """Unregister a stream callback"""
        self.result_callbacks.pop(stream_id, None)
    async def submit_frame(self, stream_id: str, frame: torch.Tensor, metadata: Dict = None):
        """
        Submit a frame for batched inference.

        Args:
            stream_id: Unique stream identifier
            frame: GPU tensor (3, H, W)
            metadata: Optional metadata to attach
        """
        async with self.buffer_lock:
            batch_frame = BatchFrame(
                stream_id=stream_id,
                frame=frame,
                timestamp=time.time(),
                metadata=metadata or {}
            )

            # Add to active buffer
            if self.active_buffer == "A":
                self.buffer_a.append(batch_frame)
                self.buffer_a_state = BufferState.FILLING
                buffer_size = len(self.buffer_a)
            else:
                self.buffer_b.append(batch_frame)
                self.buffer_b_state = BufferState.FILLING
                buffer_size = len(self.buffer_b)

            self.last_submit_time = time.time()

            # Check if we should immediately swap (batch full)
            if buffer_size >= self.batch_size:
                await self._try_swap_buffers()

    async def _timeout_monitor(self):
        """Monitor force-switch timeout"""
        while self.running:
            await asyncio.sleep(0.01)  # Check every 10ms

            async with self.buffer_lock:
                time_since_submit = time.time() - self.last_submit_time

                # Check if timeout expired and we have frames waiting
                if time_since_submit >= self.force_timeout:
                    active_buffer = self.buffer_a if self.active_buffer == "A" else self.buffer_b
                    if len(active_buffer) > 0:
                        await self._try_swap_buffers()

    async def _try_swap_buffers(self):
        """
        Attempt to swap ping-pong buffers.

        Only swaps if the inactive buffer is not currently processing.
        This method should be called with buffer_lock held.
        """
        # Check if inactive buffer is available
        inactive_state = self.buffer_b_state if self.active_buffer == "A" else self.buffer_a_state

        if inactive_state != BufferState.PROCESSING:
            # Swap active buffer
            old_active = self.active_buffer
            self.active_buffer = "B" if old_active == "A" else "A"

            # Mark old active buffer as ready for processing
            if old_active == "A":
                self.buffer_a_state = BufferState.PROCESSING
            else:
                self.buffer_b_state = BufferState.PROCESSING

            # Signal processor that there's work to do
            # (The processor task is already running and will pick it up)

    async def _batch_processor(self):
        """Background task that processes batches when available"""
        while self.running:
            await asyncio.sleep(0.001)  # Check every 1ms

            # Check if buffer A needs processing
            if self.buffer_a_state == BufferState.PROCESSING:
                await self._process_buffer("A")

            # Check if buffer B needs processing
            if self.buffer_b_state == BufferState.PROCESSING:
                await self._process_buffer("B")

    async def _process_buffer(self, buffer_name: str):
        """
        Process a buffer through inference.

        Args:
            buffer_name: "A" or "B"
        """
        async with self.buffer_lock:
            # Get buffer to process
            if buffer_name == "A":
                batch = self.buffer_a.copy()
                self.buffer_a.clear()
            else:
                batch = self.buffer_b.copy()
                self.buffer_b.clear()

        if len(batch) == 0:
            # Mark as idle and return
            async with self.buffer_lock:
                if buffer_name == "A":
                    self.buffer_a_state = BufferState.IDLE
                else:
                    self.buffer_b_state = BufferState.IDLE
            return

        # Process batch (outside lock to allow concurrent submissions)
        try:
            results = await self._run_batch_inference(batch)

            # Emit results to callbacks
            for batch_frame, result in zip(batch, results):
                callback = self.result_callbacks.get(batch_frame.stream_id)
                if callback:
                    # Schedule callback asynchronously
                    if asyncio.iscoroutinefunction(callback):
                        asyncio.create_task(callback(result))
                    else:
                        callback(result)

        except Exception as e:
            print(f"Error processing batch: {e}")
            # TODO: Emit error events

        finally:
            # Mark buffer as idle
            async with self.buffer_lock:
                if buffer_name == "A":
                    self.buffer_a_state = BufferState.IDLE
                else:
                    self.buffer_b_state = BufferState.IDLE
    async def _run_batch_inference(self, batch: List[BatchFrame]) -> List[Dict]:
        """
        Run inference on a batch of frames.

        Args:
            batch: List of BatchFrame objects

        Returns:
            List of detection results (one per frame)
        """
        # Preprocess frames (on GPU)
        preprocessed = []
        for batch_frame in batch:
            if self.preprocess_fn:
                processed = self.preprocess_fn(batch_frame.frame)
            else:
                processed = batch_frame.frame
            preprocessed.append(processed)

        # Stack into batch tensor: (N, C, H, W)
        batch_tensor = torch.stack(preprocessed, dim=0)

        # Run inference (TensorRT model repository is sync, so run in executor)
        loop = asyncio.get_event_loop()
        outputs = await loop.run_in_executor(
            None,
            lambda: self.model_repository.infer(
                self.model_id,
                {"images": batch_tensor},
                synchronize=True
            )
        )

        # Postprocess results (split batch back to individual results)
        results = []
        for i, batch_frame in enumerate(batch):
            # Extract single frame output from batch
            frame_output = {k: v[i:i+1] for k, v in outputs.items()}

            if self.postprocess_fn:
                detections = self.postprocess_fn(frame_output)
            else:
                detections = frame_output

            result = {
                "stream_id": batch_frame.stream_id,
                "timestamp": batch_frame.timestamp,
                "detections": detections,
                "metadata": batch_frame.metadata,
            }
            results.append(result)

        return results

    async def _process_remaining_buffers(self):
        """Process any remaining frames in buffers during shutdown"""
        if len(self.buffer_a) > 0:
            await self._process_buffer("A")
        if len(self.buffer_b) > 0:
            await self._process_buffer("B")

    def get_stats(self) -> Dict:
        """Get current buffer statistics"""
        return {
            "active_buffer": self.active_buffer,
            "buffer_a_size": len(self.buffer_a),
            "buffer_b_size": len(self.buffer_b),
            "buffer_a_state": self.buffer_a_state.value,
            "buffer_b_state": self.buffer_b_state.value,
            "registered_streams": len(self.result_callbacks),
        }
```
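For orientation, here is a minimal usage sketch of the controller on its own, assuming the pseudocode interface above. `repo` stands in for a loaded `TensorRTModelRepository`, and the zero tensor stands in for a decoded GPU frame; in the full system, frames are submitted by `StreamConnection`, not by hand.

```python
# Hedged usage sketch for the ModelController pseudocode above.
# `repo` is a placeholder model repository; the callback just prints the stream id.
import asyncio

import torch


async def demo(repo):
    controller = ModelController(repo, model_id="detector", batch_size=8, force_timeout=0.05)
    await controller.start()

    # Plain (sync) callback; async callbacks are scheduled with create_task by the controller.
    controller.register_callback("cam0", lambda result: print(result["stream_id"]))

    # Submit one dummy GPU frame (3, H, W).
    frame = torch.zeros(3, 640, 640, device="cuda")
    await controller.submit_frame("cam0", frame)

    await asyncio.sleep(0.2)   # let the force-timeout path flush the partial batch
    await controller.stop()
```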
### 2. StreamConnectionManager

```python
import asyncio
import time
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List, Tuple, Optional, Callable, AsyncIterator

# ModelController lives in services/model_controller.py (see Next Steps)
from services.model_controller import ModelController


class ConnectionStatus(Enum):
    CONNECTING = "connecting"
    CONNECTED = "connected"
    DISCONNECTED = "disconnected"
    ERROR = "error"


@dataclass
class TrackingResult:
    """Result emitted to user callbacks"""
    stream_id: str
    timestamp: float
    tracked_objects: List  # List of TrackedObject from TrackingController
    detections: List       # Raw detections
    frame_shape: Tuple[int, int, int]
    metadata: Dict


class StreamConnection:
    """Represents a single stream connection with event emission"""

    def __init__(
        self,
        stream_id: str,
        decoder,
        model_controller: ModelController,
        tracking_controller,
        poll_interval: float = 0.01,
    ):
        self.stream_id = stream_id
        self.decoder = decoder
        self.model_controller = model_controller
        self.tracking_controller = tracking_controller
        self.poll_interval = poll_interval

        self.status = ConnectionStatus.CONNECTING
        self.frame_count = 0
        self.last_frame_time = 0.0

        # Event emission
        self.result_queue: asyncio.Queue[TrackingResult] = asyncio.Queue()
        self.error_queue: asyncio.Queue[Exception] = asyncio.Queue()

        # Tasks
        self.poller_task: Optional[asyncio.Task] = None
        self.running = False

    async def start(self):
        """Start the connection (decoder and frame polling)"""
        # Start decoder (runs in background thread)
        self.decoder.start()

        # Wait for initial connection
        await asyncio.sleep(2.0)

        if self.decoder.is_connected():
            self.status = ConnectionStatus.CONNECTED
        else:
            self.status = ConnectionStatus.ERROR
            raise ConnectionError(f"Failed to connect to stream {self.stream_id}")

        # Start frame polling task
        self.running = True
        self.poller_task = asyncio.create_task(self._frame_poller())

    async def stop(self):
        """Stop the connection and cleanup"""
        self.running = False
        if self.poller_task:
            self.poller_task.cancel()
            try:
                await self.poller_task
            except asyncio.CancelledError:
                pass

        # Stop decoder
        self.decoder.stop()

        # Unregister from model controller
        self.model_controller.unregister_callback(self.stream_id)

        self.status = ConnectionStatus.DISCONNECTED

    async def _frame_poller(self):
        """Poll frames from threaded decoder and submit to model controller"""
        last_frame_ptr = None

        while self.running:
            try:
                # Poll frame from decoder (runs in thread)
                frame = self.decoder.get_latest_frame(rgb=True)

                # Check if we got a new frame (avoid reprocessing same frame)
                if frame is not None and frame.data_ptr() != last_frame_ptr:
                    last_frame_ptr = frame.data_ptr()
                    self.last_frame_time = time.time()
                    self.frame_count += 1

                    # Submit to model controller for batched inference
                    await self.model_controller.submit_frame(
                        stream_id=self.stream_id,
                        frame=frame,
                        metadata={
                            "frame_number": self.frame_count,
                            "shape": frame.shape,
                        }
                    )

                # Check decoder status
                if not self.decoder.is_connected():
                    self.status = ConnectionStatus.DISCONNECTED
                    # Decoder will auto-reconnect, just update status
                    await asyncio.sleep(1.0)
                    if self.decoder.is_connected():
                        self.status = ConnectionStatus.CONNECTED

            except Exception as e:
                await self.error_queue.put(e)
                self.status = ConnectionStatus.ERROR

            # Sleep until next poll
            await asyncio.sleep(self.poll_interval)

    async def _handle_inference_result(self, result: Dict):
        """
        Callback invoked by ModelController when inference is done.
        Runs tracking and emits the final result.
        """
        try:
            # Extract detections
            detections = result["detections"]

            # Run tracking (this is sync, so run in executor)
            loop = asyncio.get_event_loop()
            tracked_objects = await loop.run_in_executor(
                None,
                lambda: self.tracking_controller.update_tracks(detections)
            )

            # Create tracking result
            tracking_result = TrackingResult(
                stream_id=self.stream_id,
                timestamp=result["timestamp"],
                tracked_objects=tracked_objects,
                detections=detections,
                frame_shape=result["metadata"].get("shape"),
                metadata=result["metadata"],
            )

            # Emit to result queue
            await self.result_queue.put(tracking_result)

        except Exception as e:
            await self.error_queue.put(e)

    async def tracking_results(self) -> AsyncIterator[TrackingResult]:
        """
        Async generator for tracking results.

        Usage:
            async for result in connection.tracking_results():
                print(result.tracked_objects)
        """
        while self.running or not self.result_queue.empty():
            try:
                result = await asyncio.wait_for(self.result_queue.get(), timeout=1.0)
                yield result
            except asyncio.TimeoutError:
                continue

    async def errors(self) -> AsyncIterator[Exception]:
        """Async generator for errors"""
        while self.running or not self.error_queue.empty():
            try:
                error = await asyncio.wait_for(self.error_queue.get(), timeout=1.0)
                yield error
            except asyncio.TimeoutError:
                continue

    def get_stats(self) -> Dict:
        """Get connection statistics"""
        return {
            "stream_id": self.stream_id,
            "status": self.status.value,
            "frame_count": self.frame_count,
            "last_frame_time": self.last_frame_time,
            "decoder_connected": self.decoder.is_connected(),
            "decoder_buffer_size": self.decoder.get_buffer_size(),
        }
""" def __init__( self, gpu_id: int = 0, batch_size: int = 16, force_timeout: float = 0.05, poll_interval: float = 0.01, ): self.gpu_id = gpu_id self.batch_size = batch_size self.force_timeout = force_timeout self.poll_interval = poll_interval # Factories from services import StreamDecoderFactory, TrackingFactory from services.model_repository import TensorRTModelRepository self.decoder_factory = StreamDecoderFactory(gpu_id=gpu_id) self.tracking_factory = TrackingFactory(gpu_id=gpu_id) self.model_repository = TensorRTModelRepository(gpu_id=gpu_id) # Controllers self.model_controller: Optional[ModelController] = None self.tracking_controller = None # Connections self.connections: Dict[str, StreamConnection] = {} # State self.initialized = False async def initialize( self, model_path: str, model_id: str = "detector", preprocess_fn: Callable = None, postprocess_fn: Callable = None, ): """ Initialize the manager with a model. Args: model_path: Path to TensorRT model file model_id: Model identifier preprocess_fn: Preprocessing function (e.g., YOLOv8Utils.preprocess) postprocess_fn: Postprocessing function (e.g., YOLOv8Utils.postprocess) """ # Load model loop = asyncio.get_event_loop() await loop.run_in_executor( None, lambda: self.model_repository.load_model(model_id, model_path, num_contexts=4) ) # Create model controller self.model_controller = ModelController( model_repository=self.model_repository, model_id=model_id, batch_size=self.batch_size, force_timeout=self.force_timeout, preprocess_fn=preprocess_fn, postprocess_fn=postprocess_fn, ) await self.model_controller.start() # Create tracking controller self.tracking_controller = self.tracking_factory.create_controller( model_repository=self.model_repository, model_id=model_id, tracker_type="iou", ) self.initialized = True async def connect_stream( self, rtsp_url: str, stream_id: Optional[str] = None, on_tracking_result: Optional[Callable] = None, on_error: Optional[Callable] = None, ) -> StreamConnection: """ Connect to a stream and start processing. Args: rtsp_url: RTSP stream URL stream_id: Optional stream identifier (auto-generated if not provided) on_tracking_result: Optional callback for tracking results on_error: Optional callback for errors Returns: StreamConnection object for this stream """ if not self.initialized: raise RuntimeError("Manager not initialized. 
    async def connect_stream(
        self,
        rtsp_url: str,
        stream_id: Optional[str] = None,
        on_tracking_result: Optional[Callable] = None,
        on_error: Optional[Callable] = None,
    ) -> StreamConnection:
        """
        Connect to a stream and start processing.

        Args:
            rtsp_url: RTSP stream URL
            stream_id: Optional stream identifier (auto-generated if not provided)
            on_tracking_result: Optional callback for tracking results
            on_error: Optional callback for errors

        Returns:
            StreamConnection object for this stream
        """
        if not self.initialized:
            raise RuntimeError("Manager not initialized. Call initialize() first.")

        # Generate stream ID if not provided
        if stream_id is None:
            stream_id = f"stream_{len(self.connections)}"

        # Create decoder
        decoder = self.decoder_factory.create_decoder(rtsp_url, buffer_size=30)

        # Create connection
        connection = StreamConnection(
            stream_id=stream_id,
            decoder=decoder,
            model_controller=self.model_controller,
            tracking_controller=self.tracking_controller,
            poll_interval=self.poll_interval,
        )

        # Register callback with model controller
        self.model_controller.register_callback(
            stream_id,
            connection._handle_inference_result
        )

        # Start connection
        await connection.start()

        # Store connection
        self.connections[stream_id] = connection

        # Set up user callbacks if provided
        if on_tracking_result:
            asyncio.create_task(self._forward_results(connection, on_tracking_result))
        if on_error:
            asyncio.create_task(self._forward_errors(connection, on_error))

        return connection

    async def disconnect_stream(self, stream_id: str):
        """Disconnect and cleanup a stream"""
        connection = self.connections.get(stream_id)
        if connection:
            await connection.stop()
            del self.connections[stream_id]

    async def disconnect_all(self):
        """Disconnect all streams"""
        stream_ids = list(self.connections.keys())
        for stream_id in stream_ids:
            await self.disconnect_stream(stream_id)

    async def shutdown(self):
        """Shutdown the manager and cleanup all resources"""
        # Disconnect all streams
        await self.disconnect_all()

        # Stop model controller
        if self.model_controller:
            await self.model_controller.stop()

        # Cleanup (model repository cleanup is sync)
        # Note: May need to handle cleanup carefully to avoid segfaults

    async def _forward_results(self, connection: StreamConnection, callback: Callable):
        """Forward results from connection to user callback"""
        async for result in connection.tracking_results():
            if asyncio.iscoroutinefunction(callback):
                await callback(result)
            else:
                callback(result)

    async def _forward_errors(self, connection: StreamConnection, callback: Callable):
        """Forward errors from connection to user callback"""
        async for error in connection.errors():
            if asyncio.iscoroutinefunction(callback):
                await callback(error)
            else:
                callback(error)

    def get_stats(self) -> Dict:
        """Get statistics for all connections"""
        return {
            "manager": {
                "initialized": self.initialized,
                "num_connections": len(self.connections),
                "batch_size": self.batch_size,
                "force_timeout": self.force_timeout,
            },
            "model_controller": self.model_controller.get_stats() if self.model_controller else {},
            "connections": {
                stream_id: conn.get_stats()
                for stream_id, conn in self.connections.items()
            },
        }
```
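The examples in the next section call `manager.shutdown()` inline. In a long-running service, shutdown would more likely be driven by process signals; a minimal sketch under that assumption (Unix-only, since it relies on `loop.add_signal_handler`):

```python
# Hedged sketch: drive manager.shutdown() from SIGINT/SIGTERM (Unix only).
import asyncio
import signal


async def run_until_signalled(manager):
    stop = asyncio.Event()
    loop = asyncio.get_running_loop()
    for sig in (signal.SIGINT, signal.SIGTERM):
        loop.add_signal_handler(sig, stop.set)  # request shutdown on Ctrl+C / kill
    await stop.wait()
    await manager.shutdown()
```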
### 3. User API Examples

#### Example 1: Simple Callback Pattern

```python
import asyncio

from services import StreamConnectionManager
from services.yolo import YOLOv8Utils


async def main():
    # Create manager
    manager = StreamConnectionManager(
        gpu_id=0,
        batch_size=16,
        force_timeout=0.05,  # 50ms
    )

    # Initialize with model
    await manager.initialize(
        model_path="models/yolov8n.trt",
        model_id="yolo",
        preprocess_fn=YOLOv8Utils.preprocess,
        postprocess_fn=YOLOv8Utils.postprocess,
    )

    # Define callback for tracking results
    def on_tracking_result(result):
        print(f"Stream: {result.stream_id}")
        print(f"Timestamp: {result.timestamp}")
        print(f"Tracked objects: {len(result.tracked_objects)}")
        for obj in result.tracked_objects:
            print(f"  - Track ID {obj.track_id}: class={obj.class_id}, conf={obj.confidence:.2f}")

    def on_error(error):
        print(f"Error: {error}")

    # Connect to stream
    connection = await manager.connect_stream(
        rtsp_url="rtsp://camera1.example.com/stream",
        stream_id="camera1",
        on_tracking_result=on_tracking_result,
        on_error=on_error,
    )

    # Let it run for 60 seconds
    await asyncio.sleep(60)

    # Get statistics
    stats = manager.get_stats()
    print(f"Stats: {stats}")

    # Cleanup
    await manager.shutdown()


if __name__ == "__main__":
    asyncio.run(main())
```

#### Example 2: Async Generator Pattern (Multiple Streams)

```python
import asyncio

from services import StreamConnectionManager
from services.yolo import YOLOv8Utils


async def process_stream(connection, stream_name):
    """Process results from a single stream"""
    async for result in connection.tracking_results():
        print(f"[{stream_name}] Frame {result.metadata['frame_number']}: {len(result.tracked_objects)} objects")

        # Do something with tracked objects
        for obj in result.tracked_objects:
            if obj.class_id == 0:  # Person class
                print(f"  Person detected! Track ID: {obj.track_id}, Conf: {obj.confidence:.2f}")


async def main():
    manager = StreamConnectionManager(
        gpu_id=0,
        batch_size=32,  # Larger batch for multiple streams
        force_timeout=0.05,
    )

    await manager.initialize(
        model_path="models/yolov8n.trt",
        preprocess_fn=YOLOv8Utils.preprocess,
        postprocess_fn=YOLOv8Utils.postprocess,
    )

    # Connect to multiple streams
    camera_urls = [
        ("rtsp://camera1.example.com/stream", "Front Door"),
        ("rtsp://camera2.example.com/stream", "Parking Lot"),
        ("rtsp://camera3.example.com/stream", "Warehouse"),
        ("rtsp://camera4.example.com/stream", "Loading Bay"),
    ]

    tasks = []
    for url, name in camera_urls:
        connection = await manager.connect_stream(
            rtsp_url=url,
            stream_id=name.lower().replace(" ", "_"),
        )

        # Create task to process this stream
        task = asyncio.create_task(process_stream(connection, name))
        tasks.append(task)

    # Run all streams concurrently
    try:
        await asyncio.gather(*tasks)
    except KeyboardInterrupt:
        print("Shutting down...")
        await manager.shutdown()


if __name__ == "__main__":
    asyncio.run(main())
```
#### Example 3: Queue-Based Pattern (for integration with other systems)

```python
import asyncio

from services import StreamConnectionManager
from services.yolo import YOLOv8Utils


async def main():
    manager = StreamConnectionManager(gpu_id=0, batch_size=16)

    await manager.initialize(
        model_path="models/yolov8n.trt",
        preprocess_fn=YOLOv8Utils.preprocess,
        postprocess_fn=YOLOv8Utils.postprocess,
    )

    # Connect to stream (no callback)
    connection = await manager.connect_stream(
        rtsp_url="rtsp://camera.example.com/stream",
        stream_id="main_camera",
    )

    # Use the built-in queue
    result_queue = connection.result_queue

    # Process results from queue
    while True:
        result = await result_queue.get()

        # Send to external system (e.g., message queue, database, API)
        await send_to_kafka(result)
        await save_to_database(result)

        # Or do real-time processing
        if has_person_alert(result.tracked_objects):
            await send_alert("Person detected in restricted area!")


async def send_to_kafka(result):
    # Your Kafka producer code
    pass


async def save_to_database(result):
    # Your database code
    pass


def has_person_alert(tracked_objects):
    # Your alert logic
    return any(obj.class_id == 0 for obj in tracked_objects)


async def send_alert(message):
    print(f"ALERT: {message}")


if __name__ == "__main__":
    asyncio.run(main())
```

#### Example 4: Async Callback with Error Handling

```python
import asyncio

from services import StreamConnectionManager
from services.yolo import YOLOv8Utils


async def main():
    manager = StreamConnectionManager(gpu_id=0, batch_size=16)

    await manager.initialize(
        model_path="models/yolov8n.trt",
        preprocess_fn=YOLOv8Utils.preprocess,
        postprocess_fn=YOLOv8Utils.postprocess,
    )

    # Async callback (can do I/O operations)
    async def on_tracking_result(result):
        # Can use async operations in callback
        await save_to_database(result)

        # Check for alerts
        for obj in result.tracked_objects:
            if obj.class_id == 0 and obj.confidence > 0.8:
                await send_notification(f"High confidence person detection: {obj.track_id}")

    async def on_error(error):
        await log_error_to_monitoring_system(error)

    # Connect with async callbacks
    connection = await manager.connect_stream(
        rtsp_url="rtsp://camera.example.com/stream",
        on_tracking_result=on_tracking_result,
        on_error=on_error,
    )

    # Monitor stats periodically
    while True:
        await asyncio.sleep(10)
        stats = manager.get_stats()
        print(f"Buffer stats: {stats['model_controller']}")
        print(f"Connection stats: {stats['connections']}")


async def save_to_database(result):
    # Simulate async database operation
    await asyncio.sleep(0.01)


async def send_notification(message):
    print(f"NOTIFICATION: {message}")


async def log_error_to_monitoring_system(error):
    print(f"ERROR: {error}")


if __name__ == "__main__":
    asyncio.run(main())
```

---

## Configuration Examples

### Performance Tuning

```python
# Low latency (small batches, quick timeout)
manager = StreamConnectionManager(
    gpu_id=0,
    batch_size=4,
    force_timeout=0.02,   # 20ms
    poll_interval=0.005,  # 200 FPS
)

# High throughput (large batches, longer timeout)
manager = StreamConnectionManager(
    gpu_id=0,
    batch_size=32,
    force_timeout=0.1,    # 100ms
    poll_interval=0.02,   # 50 FPS
)

# Balanced (default)
manager = StreamConnectionManager(
    gpu_id=0,
    batch_size=16,
    force_timeout=0.05,   # 50ms
    poll_interval=0.01,   # 100 FPS
)
```

### Multiple GPUs

```python
# Create manager per GPU
manager_gpu0 = StreamConnectionManager(gpu_id=0, batch_size=16)
manager_gpu1 = StreamConnectionManager(gpu_id=1, batch_size=16)

# Initialize both
await manager_gpu0.initialize(model_path="models/yolov8n.trt", ...)
await manager_gpu1.initialize(model_path="models/yolov8n.trt", ...)

# Distribute streams across GPUs
await manager_gpu0.connect_stream(url1, ...)
await manager_gpu0.connect_stream(url2, ...)
await manager_gpu1.connect_stream(url3, ...)
await manager_gpu1.connect_stream(url4, ...)
```

---

## Key Features Summary

1. **Ping-Pong Buffers**: Efficient batching with minimal latency
2. **Force Timeout**: Prevents starvation of small batches
3. **AsyncIO**: Clean event-driven architecture
4. **Multiple Patterns**: Callbacks, generators, queues
5. **Thread-Async Bridge**: Integrates with existing threaded decoders
6. **Zero-Copy**: All processing stays on GPU
7. **Auto-Reconnection**: Inherits from StreamDecoder
8. **Statistics**: Real-time monitoring of buffers and connections

---

## Performance Characteristics

- **Latency**: `force_timeout + inference_time` (see the worked example below)
- **Throughput**: Maximized by batching
- **VRAM**: 60MB per stream + batch buffer overhead
- **CPU**: Minimal (async event loop + thread polling)
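As a hedged illustration of the latency formula: with the default `force_timeout` of 50 ms and an assumed batched inference pass of about 15 ms (an illustrative figure, not a measurement), a frame that just misses a buffer swap waits at most roughly 50 ms + 15 ms ≈ 65 ms before its detections are available, with tracking and callback dispatch adding a small amount on top. Frames that land in an almost-full buffer are processed sooner, since the batch-size trigger swaps immediately.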
---

## Next Steps

To implement this design:

1. Create `services/model_controller.py` with the `ModelController` class
2. Create `services/stream_connection_manager.py` with the `StreamConnectionManager` and `StreamConnection` classes
3. Update `services/__init__.py` to export the new classes
4. Create `test_event_driven.py` to test the system
5. Add monitoring/logging throughout
6. Handle edge cases (reconnection, cleanup, errors)