May 20, 2025 · 1 min read

ML Inference on a Budget: Batching, Caching & Autoscaling

Keep latency tight and cost low with batching windows, feature caches, and predictive scaling.


Batching

Aggregate incoming requests into small micro-batches using short collection windows (10–30 ms): the model then serves many requests per forward pass, which raises GPU utilization far more than it adds latency. A sketch follows.
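A minimal sketch of a micro-batcher, assuming an asyncio-based service where callers enqueue (payload, future) pairs and a batched predict_batch callable; both names are illustrative, not a specific library's API:

```python
import asyncio
import time

BATCH_WINDOW_S = 0.02   # 20 ms collection window
MAX_BATCH_SIZE = 32     # cap batch size to bound memory and tail latency

async def micro_batcher(queue: asyncio.Queue, predict_batch):
    """Collect requests for up to BATCH_WINDOW_S, then run one batched forward pass."""
    while True:
        # Block until at least one request arrives.
        first = await queue.get()
        batch = [first]
        deadline = time.monotonic() + BATCH_WINDOW_S

        # Keep collecting until the window closes or the batch is full.
        while len(batch) < MAX_BATCH_SIZE:
            remaining = deadline - time.monotonic()
            if remaining <= 0:
                break
            try:
                batch.append(await asyncio.wait_for(queue.get(), timeout=remaining))
            except asyncio.TimeoutError:
                break

        payloads = [payload for payload, _ in batch]
        outputs = predict_batch(payloads)        # single batched inference call

        # Resolve each caller's future with its own result.
        for (_, future), output in zip(batch, outputs):
            future.set_result(output)
```

Callers create an asyncio.Future, put (payload, future) on the queue, and await the future; the batcher resolves it once the batched call completes.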

Caching

Cache computed features and model outputs with a TTL when staleness is acceptable; include the model name and version in the cache key so a redeployed model never serves stale results. See the sketch below.
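A minimal sketch of TTL caching with model-aware keys; the in-process TTLCache, the "ranker"/"v3" identifiers, and the model.predict interface are assumptions for illustration (a shared store like Redis would replace the dict in a multi-replica setup):

```python
import hashlib
import json
import time

class TTLCache:
    """In-process TTL cache; entries expire after ttl_s seconds."""
    def __init__(self, ttl_s: float = 60.0):
        self.ttl_s = ttl_s
        self._store: dict[str, tuple[float, object]] = {}

    def get(self, key: str):
        entry = self._store.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if time.monotonic() > expires_at:
            del self._store[key]
            return None
        return value

    def set(self, key: str, value) -> None:
        self._store[key] = (time.monotonic() + self.ttl_s, value)

def cache_key(model_name: str, model_version: str, features: dict) -> str:
    """Key on model name + version so a new deployment never serves stale outputs."""
    payload = json.dumps(features, sort_keys=True)
    digest = hashlib.sha256(payload.encode()).hexdigest()
    return f"{model_name}:{model_version}:{digest}"

cache = TTLCache(ttl_s=300)  # 5-minute TTL; tune to your feature freshness needs

def predict_with_cache(model, features: dict):
    key = cache_key("ranker", "v3", features)  # hypothetical model identifiers
    cached = cache.get(key)
    if cached is not None:
        return cached
    result = model.predict(features)           # assumed predict() interface
    cache.set(key, result)
    return result
```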

Autoscaling

Use queue depth and tail latency as scaling signals; scale to zero after a sustained idle period so unused replicas stop costing money. A sketch of the scaling rule follows.
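A sketch of turning those two signals into a target replica count; the targets, cooldown, and desired_replicas helper are hypothetical, and the output would feed whatever scaler you run (e.g. KEDA or an HPA on custom metrics):

```python
import math

# Targets are assumptions; tune against your SLO and per-replica throughput.
TARGET_QUEUE_PER_REPLICA = 20    # in-flight requests one replica can absorb
TARGET_P95_LATENCY_MS = 200      # latency SLO
MAX_REPLICAS = 10
IDLE_COOLDOWN_S = 300            # queue must stay empty this long before scaling to zero

def desired_replicas(queue_depth: int, p95_latency_ms: float,
                     current: int, idle_seconds: float) -> int:
    """Combine queue depth and p95 latency into a target replica count."""
    # Scale to zero only after a sustained idle period to avoid flapping.
    if queue_depth == 0 and idle_seconds >= IDLE_COOLDOWN_S:
        return 0

    # Replicas needed to keep per-replica queue depth at the target.
    by_queue = math.ceil(queue_depth / TARGET_QUEUE_PER_REPLICA)

    # If p95 latency exceeds the SLO, scale current capacity by the overshoot ratio.
    by_latency = current
    if current > 0 and p95_latency_ms > TARGET_P95_LATENCY_MS:
        by_latency = math.ceil(current * p95_latency_ms / TARGET_P95_LATENCY_MS)

    return max(1, min(MAX_REPLICAS, max(by_queue, by_latency)))
```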