Single Search Evaluation Playground
Search anything in the embedded app, toggle relevance on results, and watch Precision, Recall, MAP, and nDCG update live. Powered by Vespa (ann_summary_2), FastAPI, and PostgreSQL!
Vespa + ANN · Live metrics · User A/B Tests · Search History
How to use this page
- Enter a search. Use Popular or Recent queries in the app to explore.
- Decide what’s relevant. Check the Relevant box for results that truly answer your query.
- Watch metrics update. Precision, Recall, MAP, and nDCG all recalculate live at k = 3, 5, and 10.
- Repeat with new queries to see how consistent the model is.
Goal: make the top results both accurate and complete.
What the metrics mean
Precision@k
(# relevant in top k) / k
Higher ⇒ fewer junk results up top.
Recall@k
(# relevant in top k) / (total relevant)
Checks coverage; never decreases as k grows.
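As a rough sketch of the two definitions above, the snippet below computes Precision@k and Recall@k from a list of Relevant checkbox states at the page's cutoffs k = 3, 5, 10. The function names and the sample flags are hypothetical, and it assumes "total relevant" means the number of results checked in the list; the playground itself may count it differently.

```python
from typing import Sequence


def precision_at_k(relevant: Sequence[bool], k: int) -> float:
    """Fraction of the top-k slots that hold a relevant result."""
    if k <= 0:
        raise ValueError("k must be positive")
    return sum(relevant[:k]) / k


def recall_at_k(relevant: Sequence[bool], k: int, total_relevant: int) -> float:
    """Fraction of all relevant results that appear in the top k."""
    if total_relevant == 0:
        return 0.0  # no relevant results to find; treated as 0 by convention here
    return sum(relevant[:k]) / total_relevant


# Hypothetical checkbox states for a ranked list of 10 results.
flags = [True, False, True, True, False, False, False, True, False, False]
# Assumption: every relevant result is somewhere in this list.
total = sum(flags)

for k in (3, 5, 10):
    p = precision_at_k(flags, k)
    r = recall_at_k(flags, k, total)
    print(f"k={k}: precision={p:.3f} recall={r:.3f}")
```

With these sample flags, precision falls as k grows (more non-relevant results are included) while recall climbs toward 1, which is the trade-off the two cards describe.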
MAP
mean over queries of AP, where AP = average of Precision@i at each rank i that holds a relevant hit
Rewards early, consistent hits.
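One way to read the MAP card is sketched below: average precision (AP) is computed for each query, then averaged across queries. The function names and sample lists are hypothetical. Note one assumption: AP here divides by the number of relevant results found in the list, whereas some definitions divide by the total number of relevant documents in the corpus.

```python
from typing import Sequence


def average_precision(relevant: Sequence[bool]) -> float:
    """Mean of Precision@i over each rank i that holds a relevant result."""
    hits = 0
    precisions = []
    for rank, is_relevant in enumerate(relevant, start=1):
        if is_relevant:
            hits += 1
            precisions.append(hits / rank)  # Precision@rank at this hit
    # Assumption: normalize by relevant results seen in the list.
    return sum(precisions) / len(precisions) if precisions else 0.0


def mean_average_precision(queries: Sequence[Sequence[bool]]) -> float:
    """Average of AP over a set of queries."""
    if not queries:
        return 0.0
    return sum(average_precision(q) for q in queries) / len(queries)


# Two hypothetical queries with the same number of hits at different ranks.
early_hits = [True, True, False, False]
late_hits = [False, False, True, True]

print(average_precision(early_hits))
print(average_precision(late_hits))
print(mean_average_precision([early_hits, late_hits]))
```

Both sample queries contain two relevant results, but the one with hits at ranks 1 and 2 scores higher than the one with hits at ranks 3 and 4, which is the "early hits" reward the card mentions.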
nDCG
DCG@k / IDCG@k, where DCG@k = Σ_{i=1..k} rel_i / log₂(i+1) and IDCG@k is the DCG@k of the ideal ordering
Rewards good ordering by rank.
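The nDCG formula can be sketched as follows, assuming binary gains (1 for a checked Relevant box, 0 otherwise) and an ideal ordering built only from the results in the list. The function names and sample gains are hypothetical, not taken from the playground's code.

```python
import math
from typing import Sequence


def dcg_at_k(gains: Sequence[float], k: int) -> float:
    """Discounted cumulative gain: sum of rel_i / log2(i + 1) for ranks 1..k."""
    return sum(g / math.log2(i + 1) for i, g in enumerate(gains[:k], start=1))


def ndcg_at_k(gains: Sequence[float], k: int) -> float:
    """DCG@k divided by the DCG@k of the best possible ordering of the same gains."""
    # Assumption: the ideal ranking is the judged list sorted by gain.
    ideal = dcg_at_k(sorted(gains, reverse=True), k)
    return dcg_at_k(gains, k) / ideal if ideal > 0 else 0.0


# Same two relevant results, placed at different ranks.
print(ndcg_at_k([1, 1, 0], 3))  # relevant results at ranks 1 and 2
print(ndcg_at_k([1, 0, 1], 3))  # one relevant result pushed to rank 3
print(ndcg_at_k([0, 1, 1], 3))  # both relevant results pushed down
```

The score drops as relevant results move down the list even though the set of results is unchanged, so nDCG isolates ordering quality in a way Precision@k and Recall@k do not.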
Another Demo! Correcting Results with Keywords!
Are important searches returning the wrong results? Embed a relevance increase or decrease to correct them!