Architecture & methodology
OpenClaw Tracker detects duplicate pull requests by standardizing PR intent with Gemini, embedding it into vectors with Voyage AI, and ranking candidates with MongoDB Atlas Vector Search.
System Architecture
Data flows from GitHub through standardization and embedding to the web UI.
Pipeline stages (each stage writes its output to MongoDB Atlas, the central store):
- GitHub API: fetch PRs
- Gemini: standardize → standardised_pr
- Voyage AI: embed → embedding[]
- UMAP: project → umap_x, umap_y
- Similarity: compute → similar_prs[]
- Web UI: visualize (reads from MongoDB)
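To illustrate what the Similarity stage computes, here is a minimal sketch of cosine-similarity ranking in plain Python. It is a stand-in for Atlas Vector Search, not the tracker's own code; the function names, the toy 3-dimensional vectors, and the PR numbers are all hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as dissimilar to everything
    return dot / (norm_a * norm_b)


def rank_similar(query_id, embeddings, top_k=10):
    """Return up to top_k (pr_id, score) pairs, most similar first."""
    query = embeddings[query_id]
    scored = [
        (pr_id, cosine_similarity(query, vec))
        for pr_id, vec in embeddings.items()
        if pr_id != query_id  # a PR is never its own duplicate
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy embeddings keyed by hypothetical PR numbers.
toy_embeddings = {
    101: [1.0, 0.0, 0.0],
    102: [0.9, 0.1, 0.0],
    103: [0.0, 1.0, 0.0],
}
ranked = rank_similar(101, toy_embeddings)
```

In this toy data, PR 102 points in almost the same direction as PR 101 and ranks first, while PR 103 is orthogonal and scores zero.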
Detection pipeline
From raw GitHub data to clustered duplicates.
Scoring & thresholds
Scores are normalized to the 0-1 range, displayed as percentages, and bucketed for quick review.
Critical ≥ 90%, High ≥ 80%, Medium ≥ 70%, Low < 70%
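The buckets above can be expressed as a small lookup. This is a sketch under the assumption that scores arrive in the 0-1 range; the function names `severity_bucket` and `as_percent` are illustrative, not taken from the project.

```python
def severity_bucket(score):
    """Map a similarity score in [0, 1] to a review bucket."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be between 0 and 1")
    if score >= 0.90:
        return "Critical"
    if score >= 0.80:
        return "High"
    if score >= 0.70:
        return "Medium"
    return "Low"


def as_percent(score):
    """Format a 0-1 score as a whole-number percentage for display."""
    return f"{round(score * 100)}%"
```

Each boundary is inclusive on the lower edge, so a score of exactly 0.80 lands in High rather than Medium.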
- Cosine similarity scores from Atlas Vector Search, normalized to 0-100% for display.
- PR Checker default threshold: 80%. Cluster default: 85% (configurable).
- Query performance: ~50-100ms for top-10 similar PRs using Atlas Vector Search with scalar quantization.
- Each PR stores 15-50 similar pairs via progressive thresholds (0.85 → 0.80 → 0.75 …) to guarantee minimum coverage.
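The sketch below shows one way the top-10 query and the progressive thresholds could fit together. It makes several assumptions: the index name `pr_embedding_index` and the `numCandidates` value are hypothetical, the fallback when no threshold reaches the minimum is a guess, and only the three threshold steps listed above are used because the remaining steps are elided. The `$vectorSearch` stage and the `vectorSearchScore` metadata field are standard Atlas Vector Search syntax.

```python
def build_vector_search_pipeline(query_vector, limit=10, num_candidates=200):
    """Build an aggregation pipeline for the nearest PRs to query_vector."""
    return [
        {
            "$vectorSearch": {
                "index": "pr_embedding_index",  # hypothetical index name
                "path": "embedding",            # field written by the embed stage
                "queryVector": query_vector,
                "numCandidates": num_candidates,  # assumed value
                "limit": limit,
            }
        },
        {
            "$project": {
                "_id": 0,
                "standardised_pr": 1,
                "score": {"$meta": "vectorSearchScore"},
            }
        },
    ]


def select_similar_pairs(candidates, thresholds=(0.85, 0.80, 0.75),
                         min_pairs=15, max_pairs=50):
    """Relax the threshold step by step until min_pairs candidates qualify.

    candidates: list of (pr_id, score) tuples with score in [0, 1].
    If no threshold yields min_pairs, the result of the loosest one is kept
    (an assumption; the source does not say what happens in that case).
    """
    ranked = sorted(candidates, key=lambda pair: pair[1], reverse=True)
    selected = []
    for threshold in thresholds:
        selected = [pair for pair in ranked if pair[1] >= threshold]
        if len(selected) >= min_pairs:
            break
    return selected[:max_pairs]
```

With pymongo, the pipeline would typically be passed to `collection.aggregate(...)` on the `pull_requests` collection; that call is left out so the sketch runs without a database.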
Security model
Public demo: no authentication required.
- All endpoints are public; no authentication is required (public demo deployment)
- /api/check is disabled and returns 503 to prevent cost from public usage
- Rate limiting via MongoDB-backed sliding window
- CSP headers, HSTS, X-Frame-Options configured
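As an illustration of the sliding-window idea behind the rate limiter, here is an in-memory sketch. It keeps timestamps in a Python deque instead of MongoDB, so it is a simplified stand-in rather than the deployed implementation; the class name, the limits, and the explicit `now` argument are all assumptions made for the example.

```python
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """In-memory sliding-window limiter (stand-in for a MongoDB-backed store)."""

    def __init__(self, max_requests, window_seconds):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._hits = defaultdict(deque)  # client key -> request timestamps

    def allow(self, client_key, now):
        """Record a request at time `now` (seconds); return True if allowed."""
        hits = self._hits[client_key]
        # Drop timestamps that have slid out of the window.
        while hits and hits[0] <= now - self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False  # rejected requests are not recorded
        hits.append(now)
        return True
```

Passing `now` explicitly keeps the sketch deterministic; a real handler would more likely read the clock itself and key requests by client IP.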
Data model
MongoDB collections: pull_requests, sync_metadata, and usage_logs.
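A document in pull_requests might look roughly like the sketch below. The field names standardised_pr, embedding, umap_x, umap_y, and similar_prs come from the pipeline outputs described earlier; the Python types and the shape of each similar_prs entry (`number`, `score`) are assumptions for illustration only.

```python
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class SimilarPR:
    number: int   # hypothetical: identifier of the similar PR
    score: float  # hypothetical: similarity in [0, 1]


@dataclass
class PullRequestDoc:
    standardised_pr: str    # standardized intent text (from Gemini)
    embedding: List[float]  # vector (from Voyage AI)
    umap_x: float           # 2D projection, x coordinate
    umap_y: float           # 2D projection, y coordinate
    similar_prs: List[SimilarPR] = field(default_factory=list)


example = PullRequestDoc(
    standardised_pr="Fix retry logic in sync worker",  # made-up text
    embedding=[0.12, -0.03, 0.44],                     # toy 3-dim vector
    umap_x=1.5,
    umap_y=-0.7,
    similar_prs=[SimilarPR(number=102, score=0.91)],
)
as_document = asdict(example)  # plain dict, ready to insert into MongoDB
```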
API endpoints
REST API surface with example requests and responses.
Tech stack
Technologies powering OpenClaw Tracker.