Monitoring & Security (Reliability, Scaling, Guardrails)
Ensure the Vespa + FastAPI + OpenAI stack runs safely, predictably, and transparently once deployed beyond localhost. This page summarizes how the live system is observed, protected, and governed.
GKE, Grafana / Prometheus, Cloud Logging, Cloud Armor, JWT + OIDC, Secret Manager
Infrastructure Overview
Where everything runs
Kubernetes (GKE)
Autoscaled serving + labeling
- Vespa content + container nodes
- FastAPI pods (search, labeling, demos)
- Cloud SQL sidecars for Postgres
Networking
Ingress + Firewall
- HTTPS Ingress (Managed Cert)
- Cloud Armor WAF / rules
- Internal Services for Vespa/DB
Storage & Logs
Durable + auditable
- GCS for datasets & artifacts
- Cloud Logging (Stackdriver)
- Object versioning + backups
Monitoring / Dashboards
Real-time visibility
- Prometheus scrape → Grafana
- Alert Policies → Slack/PagerDuty
- Cloud Logging → error analytics
Identity / Access
Zero-trust model
- Service Accounts (least privilege)
- API tokens per role
- Admin via OIDC + 2FA
Pod Health & Performance
Latency, throughput, and autoscaling
Alerting
- Cluster Summary: CPU/Memory per nodepool, pod status.
- Vespa Query Latency: p50/p95/p99 by rank profile.
- Feed Throughput: docs/sec + retries per content node.
- Autoscaler Events: node adds/removes vs load.
- FastAPI: requests/sec + error rate by endpoint.
Alerts: p95 > 500 ms, restarts > 3/10 min, CPU > 85% sustained.
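As a rough illustration of the three alert thresholds above, the sketch below evaluates them against one window of metrics. The `WindowStats` container and the alert names are hypothetical, the percentile uses a simple nearest-rank method, and in the live system these checks would more plausibly live in Prometheus or Cloud Monitoring alert rules than in application code.

```python
import math
from dataclasses import dataclass


@dataclass
class WindowStats:
    """Hypothetical snapshot of one evaluation window (names are illustrative)."""
    latencies_ms: list[float]   # query latencies observed in the window
    restarts_10m: int           # pod restarts in the last 10 minutes
    cpu_utilization: float      # sustained CPU utilization, 0.0 to 1.0


def percentile(samples: list[float], pct: int) -> float:
    """Nearest-rank percentile; pct is an integer in 1..100."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[rank - 1]


def firing_alerts(stats: WindowStats) -> list[str]:
    """Return the names of the thresholds that are currently exceeded."""
    alerts = []
    if percentile(stats.latencies_ms, 95) > 500:   # p95 > 500 ms
        alerts.append("p95_latency")
    if stats.restarts_10m > 3:                     # restarts > 3 per 10 min
        alerts.append("pod_restarts")
    if stats.cpu_utilization > 0.85:               # CPU > 85% sustained
        alerts.append("cpu_sustained")
    return alerts
```

What counts as "sustained" CPU is not defined in the list above; here it is assumed the caller already averaged utilization over a suitable window.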
Grafana (live)
Add NEXT_PUBLIC_GRAFANA_URL to embed
Provide a public or auth-proxied Grafana URL via NEXT_PUBLIC_GRAFANA_URL to render an embed here.
Security Controls
Perimeter → AuthZ → Data privacy
1) Perimeter
- Cloud Armor WAF (IP reputation, geo)
- Rate limit (e.g., 3 req/min per token)
- hCaptcha/ReCAPTCHA on public forms
- HTTPS-only via Managed Cert
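The per-token rate limit could be enforced in the application layer with a sliding window, sketched below under some assumptions: the class name, the in-memory storage, and the injectable clock are all illustrative, and the defaults mirror the "3 req/min per token" example. With several FastAPI pods, the counters would need a shared store, or the limit would be applied at Cloud Armor instead.

```python
import time
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow at most max_requests per token within any window_seconds span."""

    def __init__(self, max_requests: int = 3, window_seconds: float = 60.0,
                 clock=time.monotonic):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._clock = clock                 # injectable so tests can fake time
        self._hits = defaultdict(deque)     # token -> timestamps of accepted requests

    def allow(self, token: str) -> bool:
        now = self._clock()
        hits = self._hits[token]
        # Drop timestamps that have aged out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False                    # caller should answer HTTP 429
        hits.append(now)
        return True
```

Rejected requests are not recorded, so a client that keeps hammering the endpoint is not penalized beyond the window itself; a stricter policy would be a separate design choice.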
2) AuthN/AuthZ
- JWT bearer tokens for raters/admins
- Anonymous read endpoints w/ caps
- Admin via GCP OIDC + 2FA
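To make the bearer-token check concrete, here is a minimal, standard-library-only sketch of verifying a JWT and its role claim. It assumes HS256 with a shared secret and a custom `role` claim, neither of which is stated above; OIDC-issued admin tokens are normally signed with asymmetric keys, and a production service would use a vetted JWT library rather than hand-rolled parsing.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional


def _b64url_decode(part: str) -> bytes:
    return base64.urlsafe_b64decode(part + "=" * (-len(part) % 4))


def _b64url_encode(raw: bytes) -> str:
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


def sign_hs256(claims: dict, secret: bytes) -> str:
    """Issue a token (included only so the example is self-contained)."""
    header = _b64url_encode(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    payload = _b64url_encode(json.dumps(claims).encode())
    sig = hmac.new(secret, f"{header}.{payload}".encode(), hashlib.sha256).digest()
    return f"{header}.{payload}.{_b64url_encode(sig)}"


def verify_hs256(token: str, secret: bytes, required_role: str,
                 now: Optional[float] = None) -> dict:
    """Return the claims if signature, expiry, and role all check out."""
    try:
        header_b64, payload_b64, sig_b64 = token.split(".")
        header = json.loads(_b64url_decode(header_b64))
        claims = json.loads(_b64url_decode(payload_b64))
        signature = _b64url_decode(sig_b64)
    except ValueError as exc:   # wrong part count, bad base64, or bad JSON
        raise PermissionError("malformed token") from exc
    if not isinstance(header, dict) or not isinstance(claims, dict):
        raise PermissionError("malformed token")
    if header.get("alg") != "HS256":        # never trust the token's own alg choice
        raise PermissionError("unexpected algorithm")
    expected = hmac.new(secret, f"{header_b64}.{payload_b64}".encode(),
                        hashlib.sha256).digest()
    if not hmac.compare_digest(signature, expected):
        raise PermissionError("bad signature")
    current = time.time() if now is None else now
    if claims.get("exp", 0) <= current:     # a missing exp counts as expired
        raise PermissionError("token expired")
    if claims.get("role") != required_role:
        raise PermissionError("insufficient role")
    return claims
```

In a FastAPI route this would typically sit behind a dependency that reads the `Authorization: Bearer ...` header and maps `PermissionError` to a 401 or 403 response.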
3) Data Integrity & Privacy
- PII-stripped logs, hashed rater IDs
- Nightly GCS backups + versioning
- Secrets in Secret Manager
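One way to get "PII-stripped logs, hashed rater IDs" is sketched below. The keyed hash (HMAC-SHA256, truncated to 16 hex characters) and the two regular expressions are assumptions for illustration; the page does not say which hashing scheme or which PII patterns are used, and the key would be expected to come from Secret Manager.

```python
import hashlib
import hmac
import re

# Deliberately simple patterns; real PII scrubbing usually needs a broader set.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
IPV4_RE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def pseudonymize_rater(rater_id: str, key: bytes) -> str:
    """Stable pseudonym for a rater; the key prevents simple dictionary reversal."""
    digest = hmac.new(key, rater_id.encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:16]


def scrub(message: str) -> str:
    """Replace e-mail addresses and IPv4 addresses before a line is logged."""
    message = EMAIL_RE.sub("[email]", message)
    return IPV4_RE.sub("[ip]", message)
```

A keyed hash is chosen here over a bare SHA-256 because rater IDs tend to come from a small, guessable space.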
AI Use Guardrails
Transparency, reversibility, bias checks
Transparency & Opt-Out
GPT models assist in auto-labeling & embeddings; no personal data is processed or stored. Users may pause auto-labeling and continue labeling manually.
Ethical Guidelines
- Explainability ("Show relevance math" tooltips)
- Reversibility (versioned, revertible adjustments)
- Non-manipulative UX (clear consent)
- Bias checks (weekly category imbalance scan)
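The weekly category imbalance scan could be as simple as the sketch below. The function name, the optional `categories` list, and the 3:1 ratio threshold are all assumptions; the page does not define what counts as an imbalance.

```python
from collections import Counter
from typing import Iterable, Optional


def imbalance_report(labels: Iterable[str],
                     categories: Optional[Iterable[str]] = None,
                     max_ratio: float = 3.0) -> dict:
    """Compare the most and least frequent label categories.

    Passing `categories` makes categories with zero labels visible,
    which a plain count of the observed labels would miss.
    """
    counts = Counter(labels)
    for name in categories or ():
        counts.setdefault(name, 0)
    if not counts:
        return {"counts": {}, "ratio": None, "flagged": False}
    most, least = max(counts.values()), min(counts.values())
    ratio = float("inf") if least == 0 else most / least
    return {"counts": dict(counts), "ratio": ratio, "flagged": ratio > max_ratio}
```

A flagged report would then feed whatever review step the team uses; the scan itself only measures skew and says nothing about its cause.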
Scaling Reliability & Failover
How we grow and recover
| Component | Scaling | Recovery |
|---|---|---|
| Vespa content nodes | HPA (1–5) + replica sync | Warm replica failover |
| FastAPI containers | Autoscale (GKE/Cloud Run) | 0→N cold start |
| CloudSQL (Postgres) | Managed HA | Point-in-time restore |
| GCS buckets | Multi-region | Immutable history |
| Grafana/Prometheus | StatefulSet + PVC | Snapshot restore job |
Load test (Locust) sustained ~3,000 req/min under default limits; bursts absorbed by queue buffering.
Bot Detection & Abuse Mitigation
Keep signals human and trustworthy
- Behavior heuristics (keypress timing entropy)
- Consensus API: ≤1 vote/sec per session
- Honeytoken queries as controls
- Violations → token suspension + Cloud Armor quarantine
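Two of the signals above can be sketched briefly: keypress timing entropy and the one-vote-per-second cap. The 20 ms bucket width, the function and class names, and the in-memory session map are illustrative assumptions. The intuition is that scripted input tends to produce near-identical gaps between keys (entropy near zero), while human typing spreads across many buckets.

```python
import math
from collections import Counter


def interval_entropy(key_times_ms: list[float], bucket_ms: float = 20.0) -> float:
    """Shannon entropy (bits) of bucketed gaps between consecutive keypresses."""
    intervals = [b - a for a, b in zip(key_times_ms, key_times_ms[1:])]
    if not intervals:
        return 0.0
    buckets = Counter(int(gap // bucket_ms) for gap in intervals)
    total = len(intervals)
    return sum((n / total) * math.log2(total / n) for n in buckets.values())


class VoteThrottle:
    """Accept at most one vote per session every min_gap_s seconds."""

    def __init__(self, min_gap_s: float = 1.0):
        self.min_gap_s = min_gap_s
        self._last_accepted = {}   # session_id -> time of last accepted vote

    def accept(self, session_id: str, now_s: float) -> bool:
        last = self._last_accepted.get(session_id)
        if last is not None and now_s - last < self.min_gap_s:
            return False
        self._last_accepted[session_id] = now_s
        return True
```

Low entropy alone is weak evidence (short inputs and autofill also score low), so it would reasonably be combined with the other signals before suspending a token.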
Compliance & Logging
Retention, export, verification
- GDPR/CCPA disclosure + data export endpoint
- 30-day user-interaction logs; aggregate metrics kept indefinitely
- Access logs SHA-256 signed and verified hourly
- Periodic vulnerability scans via Web Security Scanner (formerly Cloud Security Scanner)
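The list above says access logs are SHA-256 signed and verified hourly but does not name the scheme. The sketch below assumes one plausible reading: an HMAC-SHA256 tag per log line, chained to the previous tag so that edits, deletions, and reordering are all detectable. Function names and the chaining design are illustrative, not the deployed implementation.

```python
import hashlib
import hmac
from typing import Optional


def sign_log(lines: list[str], key: bytes) -> list[str]:
    """Return one hex tag per line; each tag also covers the previous tag."""
    tags, prev = [], b""
    for line in lines:
        tag = hmac.new(key, prev + line.encode("utf-8"), hashlib.sha256).hexdigest()
        tags.append(tag)
        prev = tag.encode("ascii")
    return tags


def first_tampered_index(lines: list[str], tags: list[str],
                         key: bytes) -> Optional[int]:
    """Return the index of the first line that fails verification, else None.

    If lines were dropped from the end, the index just past the last
    surviving line is returned.
    """
    prev = b""
    for i, line in enumerate(lines):
        expected = hmac.new(key, prev + line.encode("utf-8"),
                            hashlib.sha256).hexdigest()
        if i >= len(tags) or not hmac.compare_digest(expected, tags[i]):
            return i
        prev = tags[i].encode("ascii")
    return None if len(tags) == len(lines) else len(lines)
```

An hourly job could run `first_tampered_index` over the last hour's lines and raise an alert on any non-`None` result.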
Grafana Embed
Pod CPU + Query latency
Add NEXT_PUBLIC_GRAFANA_URL to render a live panel here.
Firewall & Request Path
Browser → Ingress → Cloud Armor → FastAPI → Vespa

Bot Detection Flow

Ethical AI Statement
"Our system amplifies human expertise rather than replacing it." We keep users in the loop, show how scores are computed, and allow automated decisions to be reverted.