GroundCite

How GroundCite works

GroundCite is a Retrieval-Augmented Generation (RAG) engine. Unlike a generic chatbot, it answers only from a fixed library of scientific papers and traces every claim back to the exact source passage. When the library does not cover a question, it says so rather than inventing an answer.

The pipeline

1 · You ask

A question in plain language about the indexed research corpus.

2 · Hybrid retrieval

Dense vector search (pgvector) and keyword search (BM25) run in parallel, then fuse with Reciprocal Rank Fusion to surface the most relevant passages.
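As an illustration of the fusion step, here is a minimal sketch of Reciprocal Rank Fusion. The function name, the passage ids, and the constant k=60 (the value commonly used for RRF) are assumptions for this example, not details taken from GroundCite.

```python
from collections import defaultdict

def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of passage ids into one ranked list.

    Each passage scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. k=60 is an assumed, commonly used constant.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, passage_id in enumerate(ranking, start=1):
            scores[passage_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by id so the order is deterministic.
    return sorted(scores, key=lambda pid: (-scores[pid], pid))

dense_hits = ["p3", "p1", "p7"]    # hypothetical dense (vector) results
keyword_hits = ["p1", "p9", "p3"]  # hypothetical BM25 results
fused = reciprocal_rank_fusion([dense_hits, keyword_hits])
print(fused)
```

Passages that rank well in both lists rise to the top, while a passage found by only one retriever still survives with a lower score.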

3 · Rerank

A language model re-orders the candidates by true relevance to the question, keeping only the best passages.
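The reranking step can be sketched as scoring each candidate against the question and keeping the top few. In this sketch the scorer is pluggable: `word_overlap` is only a toy stand-in so the example runs without a model, and `rerank`, `score_fn`, and `top_n` are hypothetical names, not GroundCite's actual interface.

```python
from typing import Callable, List, Tuple

def rerank(question: str, passages: List[str],
           score_fn: Callable[[str, str], float],
           top_n: int = 3) -> List[Tuple[float, str]]:
    """Score every candidate against the question and keep the top_n."""
    scored = [(score_fn(question, passage), passage) for passage in passages]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_n]

def word_overlap(question: str, passage: str) -> float:
    """Toy scorer (assumption): fraction of question words found in the passage.

    A real system would ask a language model for the relevance score instead.
    """
    q_words = set(question.lower().split())
    p_words = set(passage.lower().split())
    return len(q_words & p_words) / len(q_words) if q_words else 0.0

candidates = [
    "Bananas are yellow",
    "Attention is all you need",
    "Sparse attention reduces cost",
]
best = rerank("sparse attention cost", candidates, word_overlap, top_n=2)
```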

4 · Grounded answer

The model answers using only those passages, citing each claim with [n]. If the evidence is insufficient, it refuses instead of guessing.
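One way to picture the grounding step is a prompt that numbers the passages, plus a check that every [n] in the answer points at a passage that was actually supplied. The sketch below makes assumptions throughout: the prompt wording, the `REFUSAL` text, and both function names are illustrative and not GroundCite's real prompt or code.

```python
import re

# Hypothetical refusal text; the real wording is not given in this page.
REFUSAL = "The indexed papers do not contain enough evidence to answer this."

def build_prompt(question, passages):
    """Number the passages as [1], [2], ... and instruct the model to cite them."""
    numbered = "\n".join(
        f"[{i}] {text}" for i, text in enumerate(passages, start=1)
    )
    return (
        "Answer using only the passages below. Cite each claim as [n]. "
        f"If they are insufficient, reply exactly: {REFUSAL}\n\n"
        f"{numbered}\n\nQuestion: {question}"
    )

def citations_are_valid(answer, num_passages):
    """Accept a refusal, or an answer whose every [n] refers to a supplied passage."""
    if answer.strip() == REFUSAL:
        return True
    cited = [int(n) for n in re.findall(r"\[(\d+)\]", answer)]
    # An answer with no citations at all is rejected as ungrounded.
    return bool(cited) and all(1 <= n <= num_passages for n in cited)

prompt = build_prompt("What is RRF?", ["RRF fuses ranked lists.", "BM25 is lexical."])
```

A check like this only confirms that the markers are well-formed; whether a cited passage really supports the claim is what the faithfulness metric is meant to capture.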

Example output

A real question, answered from the corpus. The markers link each statement to a source below.

Question

What problem does sparse attention address for long-context LLM inference?

Answer

Sparse attention tackles the high computational and memory-bandwidth cost of full attention during long-context inference, where the KV cache grows with sequence length and the selection step itself can stay quadratic [1]. Approaches reduce this by decoupling selection and overlapping CPU-to-GPU prefetch with computation [1], or by distilling document collections into reusable key-value caches so static context is not re-encoded on every query [2].

Sources

How it is evaluated

Measured, not assumed. RAGAS-style metrics over a hand-curated question set:

Faithfulness: 0.95
Answer relevancy: 0.68
Retrieval hit@k: 0.67

Faithfulness measures how much of the answer is supported by the retrieved passages; answer relevancy, how well the answer addresses the question; retrieval hit@k, whether the right paper appears among the top k retrieved results. The retrieval score is optimistic by design (each question targets a known paper) and is disclosed as such.
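For concreteness, retrieval hit@k can be computed as the share of questions whose target paper shows up in the first k retrieved results. This is a minimal sketch under that definition; the function names and the sample data are made up for illustration and are not the project's evaluation code or results.

```python
def hit_at_k(retrieved_paper_ids, target_paper_id, k):
    """True if the target paper is among the first k retrieved results."""
    return target_paper_id in retrieved_paper_ids[:k]

def mean_hit_at_k(examples, k):
    """Average hit@k over (retrieved_ids, target_id) pairs."""
    hits = [hit_at_k(retrieved, target, k) for retrieved, target in examples]
    return sum(hits) / len(hits)

# Hypothetical question set: each question targets one known paper.
examples = [
    (["a", "b", "c"], "b"),  # hit at rank 2
    (["d", "e", "f"], "z"),  # target never retrieved
    (["g", "h"], "h"),       # hit at rank 2
]
score = mean_hit_at_k(examples, k=2)
```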

Running live queries

This page is public, but running a live query needs an access code, which keeps the public demo's cost bounded. Contact me to get an access code, then enter it on the home page and ask away.