
Scaling toward large fleets (design target: 600k+ devices, e.g. Teltonika FMC920 class)

Platform-wide capacity goals (including ~40M users on the application plane) are summarized in ../PLATFORM-MASTER.md. This document focuses on the telematics ingest path (TCP gateway, parsers, storage, observability).

TCP gateway horizontal scaling

  • Run multiple telematics-gateway instances behind a TCP load balancer.
  • Use session affinity (sticky sessions) so the same IMEI lands on the same node for as long as its TCP connection stays open. If a node dies, devices simply reconnect; design parsers to be stateless except for the per-socket buffer.
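
The IMEI-sticky routing above can be sketched with rendezvous (highest-random-weight) hashing, so every balancer instance agrees on the mapping without shared state and a node failure only remaps that node's devices. The node names are hypothetical; a real deployment would use the balancer's own affinity mechanism.

```python
import hashlib

def node_for_imei(imei: str, nodes: list[str]) -> str:
    """Pick a gateway node for an IMEI via rendezvous hashing:
    deterministic across balancer instances, and removing one node
    only remaps the devices that were pinned to it."""
    def score(node: str) -> int:
        h = hashlib.sha256(f"{imei}|{node}".encode()).digest()
        return int.from_bytes(h[:8], "big")
    return max(nodes, key=score)
```

Because the hash is computed per (IMEI, node) pair, shrinking the node list leaves every surviving assignment unchanged, which keeps reconnect storms small after a node loss.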

Database

  • Partition device_positions by time, either with native PostgreSQL range partitions or with TimescaleDB hypertables.
  • Archive cold data to object storage if compliance allows.
  • Use read replicas for reporting APIs; writes stay on primary (or sharded by tenant in multi-tenant designs).
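
As a minimal sketch of time-based partitioning, the helper below builds the DDL for one monthly partition of device_positions using native PostgreSQL declarative partitioning. The naming scheme is an assumption; with TimescaleDB this whole step is replaced by a single create_hypertable() call.

```python
from datetime import date

def monthly_partition_ddl(year: int, month: int,
                          parent: str = "device_positions") -> str:
    """Build CREATE TABLE ... PARTITION OF DDL for one calendar month.
    Assumes the parent table was created with PARTITION BY RANGE on
    the position timestamp column."""
    start = date(year, month, 1)
    end = date(year + 1, 1, 1) if month == 12 else date(year, month + 1, 1)
    name = f"{parent}_{start:%Y_%m}"
    return (
        f"CREATE TABLE IF NOT EXISTS {name} PARTITION OF {parent}\n"
        f"    FOR VALUES FROM ('{start}') TO ('{end}');"
    )
```

A cron job (or pg_partman) would typically run this a month or two ahead, so inserts never land on a missing partition; old partitions can then be detached and archived to object storage.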

Metrics and SLOs

The gateway exposes Prometheus-style metrics on METRICS_PORT (default 9092):

  • telematics_tcp_connections_total
  • telematics_avl_records_total
  • telematics_parse_errors_total
  • telematics_imei_rejected_total
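
A minimal sketch of what the /metrics endpoint on METRICS_PORT serves for these counters, in the Prometheus text exposition format. The counter names come from the list above; the values and the rendering helper itself are illustrative (a real gateway would use a Prometheus client library).

```python
def render_prometheus(counters: dict[str, int]) -> str:
    """Render monotonically increasing counters in the Prometheus
    text exposition format (one # TYPE line plus one sample line
    per metric, names sorted for stable output)."""
    lines = []
    for name, value in sorted(counters.items()):
        lines.append(f"# TYPE {name} counter")
        lines.append(f"{name} {value}")
    return "\n".join(lines) + "\n"
```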

Define SLOs such as: p99 ingest latency < 2s from TCP receive to DB commit in steady state.
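
Checking that SLO amounts to computing a percentile over measured ingest latencies. A nearest-rank p99 sketch (the quantile method is a choice; histograms in Prometheus would normally do this for you):

```python
import math

def p99(latencies_ms: list[float]) -> float:
    """Nearest-rank p99: the smallest observed latency that at least
    99% of samples fall at or below. Compare the result against the
    2000 ms (TCP receive to DB commit) SLO threshold."""
    ordered = sorted(latencies_ms)
    rank = math.ceil(0.99 * len(ordered))  # 1-based nearest rank
    return ordered[rank - 1]
```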

Backpressure

  • Enforce a per-socket maximum buffer size; the gateway drops data or resets the connection on abuse.
  • Rate limits at network edge (firewall, cloud security group).
  • Queue depth alerts on device_commands if commands back up.
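
The per-socket buffer cap can be sketched as below; the 16 KiB limit is an assumed value, not the gateway's actual configuration.

```python
class SocketBuffer:
    """Per-socket receive buffer with a hard cap. When a peer
    accumulates more unparsed bytes than MAX_BUFFER, the caller
    should reset the connection instead of growing memory unboundedly."""
    MAX_BUFFER = 16 * 1024  # assumed cap; tune per protocol frame size

    def __init__(self) -> None:
        self.data = bytearray()

    def feed(self, chunk: bytes) -> bool:
        """Append received bytes; return False when the cap would be
        exceeded, signalling the caller to drop the socket."""
        if len(self.data) + len(chunk) > self.MAX_BUFFER:
            return False
        self.data += chunk
        return True
```

Keeping the check ahead of the append means a single oversized frame never occupies memory before being rejected.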

Multi-tenant (future)

  • Add tenant_id to devices and enforce row-level security in PostgreSQL.
  • Separate API scopes per tenant; never mix data in dashboards.
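
A hedged sketch of the row-level-security setup: the table and column names (devices, tenant_id) follow the bullets above, while the policy name and the app.tenant_id session setting are assumptions for illustration.

```python
# Hypothetical PostgreSQL RLS DDL for a future multi-tenant schema.
# The API layer would SET app.tenant_id per request before querying.
RLS_DDL = """
ALTER TABLE devices ENABLE ROW LEVEL SECURITY;

CREATE POLICY tenant_isolation ON devices
    USING (tenant_id = current_setting('app.tenant_id')::uuid);
"""
```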

What not to optimize prematurely

  • Full Codec parity with the vendor's cloud before you have staging device volume and packet captures to validate against.