EmotionSpectrum DistilBERT inference API
FastAPI · micro-batched · observable · runs offline

Read the feeling behind any sentence

A fine-tuned DistilBERT classifier, wrapped in a typed, batched, monitored API. Type a line and watch the live POST /predict response paint the full six-emotion spectrum — served right now by the offline stub.

Sent to POST /predict and classified into six emotions with full probabilities.

The widget is just one HTTP call

Everything above is POST /predict. Call it from anything — single sentence or a batch.
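As a rough illustration of that one HTTP call, here is a minimal standard-library client sketch. The base URL and the JSON field names (`text` for a single sentence, `texts` for a batch) are assumptions made for this example; the real request schema is whatever the service's /docs page reports.

```python
import json
import urllib.request
from typing import Any, Sequence, Union

BASE_URL = "http://localhost:8000"  # assumption: address of a locally running server


def build_predict_request(
    texts: Union[str, Sequence[str]], base_url: str = BASE_URL
) -> urllib.request.Request:
    """Build a POST /predict request for one sentence or a batch."""
    # Assumption: {"text": "..."} for a single sentence, {"texts": [...]} for a batch.
    if isinstance(texts, str):
        payload = {"text": texts}
    else:
        payload = {"texts": list(texts)}
    return urllib.request.Request(
        url=f"{base_url}/predict",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict(
    texts: Union[str, Sequence[str]], base_url: str = BASE_URL, timeout: float = 5.0
) -> Any:
    """Send the request and return the decoded JSON response."""
    request = build_predict_request(texts, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the service running, `predict("I can't wait for tomorrow")` sends a single sentence and `predict(["first line", "second line"])` sends a batch through the same endpoint.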

Built for throughput, measured under load

A dynamic micro-batcher coalesces concurrent requests into a single forward pass. The numbers below come from the included scripts/loadtest.py, run against the offline stub with a single worker on an Apple-silicon laptop, and are reproducible from a clean checkout.

Peak throughput: 604 req/s at 16 concurrent clients
p50 latency: 8.3 ms (serial round-trip, full stack)
Throughput scaling: ~5× from serial to 8 concurrent clients (batcher)
Errors: 0 across all benchmarked runs

Concurrency   Throughput (req/s)   p50 (ms)   p95 (ms)   p99 (ms)
1             118                  8.27       8.87       13.12
8             595                  13.35      16.49      19.97
16            604                  19.03      67.49      107.39

These figures reflect the stub plus the full HTTP, validation, and batching overhead. Reproduce with make bench.
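For readers who want to see how throughput and latency percentiles like these can be derived from raw samples, here is a small sketch. It is not the code in scripts/loadtest.py: the function names are hypothetical and the nearest-rank percentile definition is an assumption, so the real script may interpolate differently.

```python
import math
from typing import Dict, Sequence


def percentile(sorted_values: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile of an already sorted, non-empty sequence."""
    if not sorted_values:
        raise ValueError("need at least one sample")
    # Multiply before dividing so whole-number ranks stay exact.
    rank = max(1, math.ceil(pct * len(sorted_values) / 100))
    return sorted_values[rank - 1]


def summarize(latencies_ms: Sequence[float], wall_seconds: float) -> Dict[str, float]:
    """Summarize one benchmark run: requests per second plus p50/p95/p99."""
    ordered = sorted(latencies_ms)
    return {
        "throughput_rps": len(ordered) / wall_seconds,
        "p50_ms": percentile(ordered, 50),
        "p95_ms": percentile(ordered, 95),
        "p99_ms": percentile(ordered, 99),
    }
```

The scaling figure follows from the table directly: 595 req/s at 8 concurrent clients divided by 118 req/s serial is about 5.04.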

The production layer, not just a model

Everything you need to actually ship inference — typed contracts, health probes, metrics a dashboard can read, and a zero-download offline mode for CI and demos.

Health probes

GET /healthz — readiness + liveness; 503 until the model is loaded and the batcher runs. Wired to a Docker HEALTHCHECK.
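The readiness rule described here can be sketched as a pure function that a route handler could call. The function name and the response field names are hypothetical, chosen only for illustration; the service's actual /healthz body may look different.

```python
from typing import Dict, Tuple


def health_response(
    model_loaded: bool, batcher_running: bool
) -> Tuple[int, Dict[str, object]]:
    """Return (HTTP status, JSON body) for a combined readiness/liveness probe."""
    # Ready only when both the model and the batcher are up; otherwise 503.
    ready = model_loaded and batcher_running
    status_code = 200 if ready else 503
    body = {
        "status": "ok" if ready else "unavailable",  # hypothetical field names
        "model_loaded": model_loaded,
        "batcher_running": batcher_running,
    }
    return status_code, body
```

Keeping the decision in one function makes the 503-until-ready behavior easy to unit test without starting the server.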

Open /healthz

Prometheus metrics

GET /metrics — request rate, latency histogram, in-flight gauge, errors, plus model inference latency and batch-size histograms.

Open /metrics

Typed OpenAPI contract

pydantic v2 validation on every request; single + batch share one endpoint. Interactive Swagger UI at /docs.

Open /docs

Dynamic micro-batching

Concurrent single requests are coalesced into one forward pass with a latency cap you control, giving about 5× the serial throughput at 8 concurrent clients in the benchmark above.
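One common way to build this kind of batcher is an asyncio queue drained by a single worker. The sketch below illustrates the technique under stated assumptions and is not the project's implementation: `MicroBatcher`, `max_batch_size`, `max_wait_ms`, and `fake_model` are all hypothetical names.

```python
import asyncio
from typing import Callable, List, Sequence, Tuple


class MicroBatcher:
    """Coalesce concurrent single-item requests into batched model calls."""

    def __init__(
        self,
        run_batch: Callable[[Sequence[str]], List[object]],
        max_batch_size: int = 16,
        max_wait_ms: float = 5.0,
    ) -> None:
        self._run_batch = run_batch
        self._max_batch_size = max_batch_size
        self._max_wait = max_wait_ms / 1000.0  # the latency cap, in seconds
        self._queue: asyncio.Queue = asyncio.Queue()

    async def submit(self, text: str) -> object:
        """Enqueue one item and wait for its individual result."""
        future = asyncio.get_running_loop().create_future()
        await self._queue.put((text, future))
        return await future

    async def run(self) -> None:
        """Worker loop: gather items until the batch is full or the cap expires."""
        loop = asyncio.get_running_loop()
        while True:
            batch = [await self._queue.get()]  # block until the first item arrives
            deadline = loop.time() + self._max_wait
            while len(batch) < self._max_batch_size:
                remaining = deadline - loop.time()
                if remaining <= 0:
                    break
                try:
                    batch.append(await asyncio.wait_for(self._queue.get(), remaining))
                except asyncio.TimeoutError:
                    break
            try:
                results = self._run_batch([text for text, _ in batch])
                for (_, future), result in zip(batch, results):
                    if not future.done():
                        future.set_result(result)
            except Exception as exc:  # propagate a failed pass to every waiter
                for _, future in batch:
                    if not future.done():
                        future.set_exception(exc)


async def demo() -> Tuple[List[object], List[int]]:
    calls: List[int] = []

    def fake_model(texts: Sequence[str]) -> List[object]:
        calls.append(len(texts))  # record how many items shared this pass
        return [len(t) for t in texts]

    batcher = MicroBatcher(fake_model, max_batch_size=8, max_wait_ms=20.0)
    worker = asyncio.create_task(batcher.run())
    results = await asyncio.gather(*(batcher.submit("x" * n) for n in range(1, 6)))
    worker.cancel()
    return list(results), calls
```

Running `asyncio.run(demo())` submits five requests at once and returns their results along with the batch sizes the fake model saw. The trade-off is the usual one: a longer `max_wait_ms` yields fuller batches at the cost of added latency for the first request in each batch.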

Container + monitoring

Multi-stage non-root Docker image; docker compose up brings up the API with Prometheus and a provisioned Grafana dashboard.

Offline-first by default

A deterministic lexicon stub backs the API when OFFLINE=1 — the whole service, demo, tests, and load test run with zero downloads. Flip to OFFLINE=0 for the real model.
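To make the idea of a deterministic lexicon stub concrete, here is a minimal sketch. The label set is an assumption (the six labels of the widely used Hugging Face `emotion` dataset), and the keyword lexicon, the smoothing, and both function names are hypothetical; the project's actual stub may score text quite differently.

```python
import os
from typing import Dict

# Assumption: label names taken from the common "emotion" dataset.
LABELS = ("sadness", "joy", "love", "anger", "fear", "surprise")

# Hypothetical keyword lexicon, kept tiny for illustration.
LEXICON = {
    "sad": "sadness", "cry": "sadness",
    "happy": "joy", "glad": "joy",
    "adore": "love", "love": "love",
    "angry": "anger", "furious": "anger",
    "afraid": "fear", "scared": "fear",
    "wow": "surprise", "unexpected": "surprise",
}


def stub_predict(text: str) -> Dict[str, float]:
    """Return a deterministic probability for each of the six labels."""
    counts = {label: 1.0 for label in LABELS}  # add-one smoothing: no label is zero
    for token in text.lower().split():
        label = LEXICON.get(token.strip(".,!?;:\"'"))
        if label is not None:
            counts[label] += 1.0
    total = sum(counts.values())
    return {label: counts[label] / total for label in LABELS}


def use_offline_stub() -> bool:
    """Assumption: OFFLINE=1 (the default here) selects the stub."""
    return os.environ.get("OFFLINE", "1") == "1"
```

Because the output depends only on the input text, the same sentence always produces the same spectrum, which is what lets tests and load tests run without downloading model weights.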