More APIs
Reranking
Rerank a list of documents by relevance to a query.
Reranker models take a query and a list of candidate documents and return a relevance score for each document. They're typically used as a second-pass filter after an initial vector search to improve retrieval quality in RAG pipelines.
Endpoint

```bash
POST https://api.devupai.com/v1/inference/{model_name}
```

Example
```python
import requests

DEVUP_API_KEY = "$DEVUP_API_KEY"
MODEL = "cross-encoder/ms-marco-MiniLM-L-12-v2"

response = requests.post(
    f"https://api.devupai.com/v1/inference/{MODEL}",
    headers={
        "Authorization": f"Bearer {DEVUP_API_KEY}",
        "Content-Type": "application/json",
    },
    json={
        "query": "What is the capital of France?",
        "documents": [
            "Paris is the capital and most populous city of France.",
            "Berlin is the capital of Germany.",
            "The Eiffel Tower is located in Paris.",
            "France is a country in Western Europe.",
        ],
    },
)

result = response.json()
for item in result["scores"]:
    print(item)
```

Response
```json
{
  "scores": [0.98, 0.02, 0.45, 0.31]
}
```

Scores are relevance probabilities in the range [0, 1], returned in the same order as the input documents. Sort by score descending to get the most relevant documents first.
Usage in a RAG pipeline
A typical pattern:
- Retrieve — run a vector similarity search to fetch the top-N candidate chunks (e.g. top 50)
- Rerank — pass the query + candidates to a reranker to get relevance scores
- Select — keep only the top-K highest-scoring chunks (e.g. top 5) for the LLM context
This two-stage approach improves precision significantly compared to embedding similarity alone.
```python
import requests

# 1. Get initial candidates from your vector DB
# (assumes `vector_db`, `query`, and DEVUP_API_KEY are defined elsewhere)
candidates = vector_db.search(query, top_k=50)

# 2. Rerank
response = requests.post(
    "https://api.devupai.com/v1/inference/cross-encoder/ms-marco-MiniLM-L-12-v2",
    headers={"Authorization": f"Bearer {DEVUP_API_KEY}", "Content-Type": "application/json"},
    json={"query": query, "documents": [c["text"] for c in candidates]},
)
scores = response.json()["scores"]

# 3. Select top-K. Sort on the score alone: a bare
# sorted(zip(scores, candidates), reverse=True) would compare the candidate
# dicts on score ties and raise a TypeError.
ranked = sorted(zip(scores, candidates), key=lambda pair: pair[0], reverse=True)
top_chunks = [doc for _, doc in ranked[:5]]
```
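As an alternative to a fixed top-K, you can drop everything below a minimum relevance score, so the LLM context only contains chunks the reranker is confident about. This is a sketch with stand-in data; the 0.5 threshold is illustrative, not an API default, and in practice `scores` and `candidates` would come from the rerank call above:

```python
# Stand-in rerank output (same shape as the pipeline above).
scores = [0.98, 0.02, 0.45, 0.31]
candidates = [
    {"text": "Paris is the capital and most populous city of France."},
    {"text": "Berlin is the capital of Germany."},
    {"text": "The Eiffel Tower is located in Paris."},
    {"text": "France is a country in Western Europe."},
]

# Keep only chunks that clear the threshold, preserving input order.
MIN_SCORE = 0.5  # illustrative cutoff; tune per model and corpus
top_chunks = [c["text"] for s, c in zip(scores, candidates) if s >= MIN_SCORE]
```

A threshold avoids padding the context with weak matches when few candidates are truly relevant, at the cost of a corpus-specific parameter to tune.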