Read the feeling behind any sentence
A fine-tuned DistilBERT classifier, wrapped in a typed, batched, monitored API. Type a line and watch the live POST /predict response paint the full six-emotion spectrum, served right now by the offline stub.
Sent to POST /predict and classified into six emotions with full probabilities.
The widget is just one HTTP call
Everything above is POST /predict. Call it from anything — single sentence or a batch.
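As a rough illustration of "call it from anything", here is a minimal standard-library client sketch. The base URL (`http://localhost:8000`) and the `"text"` field name are assumptions, not taken from this page; check the Swagger UI at /docs for the actual request schema.

```python
import json
import urllib.request

# Assumptions (not confirmed by this page): the service listens on
# localhost:8000 and accepts a JSON body whose "text" field holds either
# one string or a list of strings.
BASE_URL = "http://localhost:8000"


def build_predict_request(text, base_url=BASE_URL):
    """Build a POST /predict request for one sentence or a batch."""
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/predict",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict(text, base_url=BASE_URL, timeout=5.0):
    """Send the request and return the decoded JSON response."""
    request = build_predict_request(text, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

With the service running, `predict("I can't stop smiling")` would send a single sentence and `predict(["so happy", "so tired"])` a batch, assuming the schema above matches.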
Built for throughput, measured under load
A dynamic micro-batcher coalesces concurrent requests into a single forward pass. Numbers from the included scripts/loadtest.py against the offline stub, single worker, Apple-silicon laptop; reproducible from a clean checkout.
| Concurrency | Throughput (req/s) | p50 (ms) | p95 (ms) | p99 (ms) |
|---|---|---|---|---|
| 1 | 118 | 8.27 | 8.87 | 13.12 |
| 8 | 595 | 13.35 | 16.49 | 19.97 |
| 16 | 604 | 19.03 | 67.49 | 107.39 |
Reflects the stub plus full HTTP / validation / batching overhead. Reproduce with make bench.
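The micro-batching idea measured above can be sketched with asyncio. This is a minimal illustration, not the project's actual batcher: the class name `MicroBatcher` and the parameters `max_batch_size` and `max_wait_ms` are hypothetical, and the model call is any function that maps a list of texts to a list of results.

```python
import asyncio


class MicroBatcher:
    """Coalesce concurrent single requests into one batched model call."""

    def __init__(self, predict_batch, max_batch_size=16, max_wait_ms=5.0):
        # predict_batch: synchronous callable, list[str] -> list[result].
        # A real service would likely offload it to a thread or executor
        # so the event loop is not blocked during inference.
        self._predict_batch = predict_batch
        self._max_batch_size = max_batch_size
        self._max_wait = max_wait_ms / 1000.0
        self._queue = asyncio.Queue()

    async def submit(self, text):
        """Enqueue one text and wait for its individual result."""
        future = asyncio.get_running_loop().create_future()
        await self._queue.put((text, future))
        return await future

    async def run(self):
        """Worker loop: collect a batch, run one pass, fan results out."""
        loop = asyncio.get_running_loop()
        while True:
            batch = [await self._queue.get()]  # block until the first request
            deadline = loop.time() + self._max_wait  # latency cap for this batch
            while len(batch) < self._max_batch_size:
                remaining = deadline - loop.time()
                if remaining <= 0:
                    break
                try:
                    item = await asyncio.wait_for(self._queue.get(), remaining)
                except asyncio.TimeoutError:
                    break
                batch.append(item)
            texts = [text for text, _ in batch]
            try:
                results = self._predict_batch(texts)
            except Exception as exc:
                # Propagate the failure to every caller in this batch.
                for _, future in batch:
                    if not future.done():
                        future.set_exception(exc)
                continue
            for (_, future), result in zip(batch, results):
                if not future.done():
                    future.set_result(result)
```

The trade-off sits in the two knobs: a larger `max_batch_size` raises throughput per forward pass, while `max_wait_ms` bounds how long the first request in a batch can be held back waiting for company.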
The production layer, not just a model
Everything you need to actually ship inference — typed contracts, health probes, metrics a dashboard can read, and a zero-download offline mode for CI and demos.
Health probes
GET /healthz — readiness + liveness; 503 until the model is loaded and the batcher runs. Wired to a Docker HEALTHCHECK.
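A sketch of the probe logic described here, kept framework-free. `ServiceState` and its two flags are hypothetical names used for illustration; the response body shape is likewise an assumption.

```python
from dataclasses import dataclass


@dataclass
class ServiceState:
    """Hypothetical readiness flags, set during startup."""
    model_loaded: bool = False
    batcher_running: bool = False


def healthz(state: ServiceState) -> tuple[int, dict]:
    """Return (HTTP status, JSON body) for GET /healthz."""
    ready = state.model_loaded and state.batcher_running
    body = {
        "status": "ok" if ready else "starting",
        "model_loaded": state.model_loaded,
        "batcher_running": state.batcher_running,
    }
    # 503 until both conditions hold, so probes keep traffic away
    # from a half-started process.
    return (200 if ready else 503), body
```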
Prometheus metrics
GET /metrics — request rate, latency histogram, in-flight gauge, errors, plus model inference latency and batch-size histograms.
Typed OpenAPI contract
pydantic v2 validation on every request; single + batch share one endpoint. Interactive Swagger UI at /docs.
Dynamic micro-batching
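Concurrent single requests are coalesced into one forward pass with a latency cap you control: about 5× throughput under load in the stub benchmark above (118 req/s at concurrency 1 vs. roughly 600 req/s at concurrency 8 to 16).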
Container + monitoring
Multi-stage non-root Docker image; docker compose up brings up the API with Prometheus and a provisioned Grafana dashboard.
Offline-first by default
A deterministic lexicon stub backs the API when OFFLINE=1 — the whole service, demo, tests, and load test run with zero downloads. Flip to OFFLINE=0 for the real model.
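To make the offline mode concrete, here is a minimal sketch of what a deterministic lexicon stub could look like. The label names, the tiny lexicon, and the function names are all assumptions for illustration (the page only says "six emotions"); the project's real stub may score text quite differently.

```python
import math
import os

# Assumed label set; the page does not name the six emotions.
EMOTIONS = ("sadness", "joy", "love", "anger", "fear", "surprise")

# Tiny hypothetical lexicon mapping keywords to labels.
LEXICON = {
    "sad": "sadness", "cry": "sadness",
    "happy": "joy", "glad": "joy",
    "love": "love", "adore": "love",
    "angry": "anger", "furious": "anger",
    "scared": "fear", "afraid": "fear",
    "wow": "surprise", "unexpected": "surprise",
}


def stub_predict(text: str) -> dict:
    """Deterministic keyword scorer: count lexicon hits, then softmax."""
    scores = {label: 0.0 for label in EMOTIONS}
    for token in text.lower().split():
        label = LEXICON.get(token.strip(".,!?;:'\""))
        if label:
            scores[label] += 2.0
    exps = {label: math.exp(score) for label, score in scores.items()}
    total = sum(exps.values())
    return {label: exps[label] / total for label in EMOTIONS}


def get_predictor():
    """Pick the backend from the OFFLINE environment variable (default: stub)."""
    if os.environ.get("OFFLINE", "1") == "1":
        return stub_predict
    # Placeholder: loading the real model is out of scope for this sketch.
    raise NotImplementedError("load the fine-tuned model here")
```

Because the stub has no randomness and no downloads, the same input always yields the same probabilities, which is what makes it usable in CI and load tests.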