Architecture & methodology

OpenClaw Tracker detects duplicate pull requests by standardizing PR intent with Gemini, embedding it into vectors with Voyage AI, and ranking candidates with MongoDB Atlas Vector Search.

System Architecture

Data flows from GitHub through standardization and embedding to the web UI.

๐Ÿ™GitHub APIFetch PRs
๐ŸƒMongoDB AtlasCentral Store
โœจGeminiStandardize โ†’ MongoDB
๐Ÿš€Voyage AIEmbed โ†’ MongoDB
๐Ÿ“UMAPProject โ†’ MongoDB
๐Ÿ”—Similaritysimilar_prs โ†’ MongoDB
๐ŸŒWeb UIReads from MongoDB

Detection pipeline

From raw GitHub data to clustered duplicates.
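
One common way to turn pairwise matches into clusters is to treat every pair at or above the cluster threshold as an edge and take the connected components. The page does not say which clustering method is used, so the union-find approach below is an assumption. The 0.85 default mirrors the 85% cluster default mentioned on this page, and the PR numbers are made up.

```python
from collections import defaultdict


def cluster_duplicates(pairs, threshold=0.85):
    """Group PR numbers into clusters of connected pairs scoring >= threshold.

    `pairs` is an iterable of (pr_a, pr_b, score) tuples.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    for a, b, score in pairs:
        if score >= threshold:
            union(a, b)

    groups = defaultdict(list)
    for node in parent:
        groups[find(node)].append(node)
    # Return clusters of two or more PRs in a stable, sorted order.
    return sorted(sorted(g) for g in groups.values() if len(g) > 1)


clusters = cluster_duplicates([
    (101, 102, 0.93),
    (102, 103, 0.88),
    (104, 105, 0.72),  # below threshold, so not clustered
    (106, 107, 0.86),
])
# clusters == [[101, 102, 103], [106, 107]]
```

Connected components are transitive: 101 and 103 land in the same cluster even though they were never compared directly, which may or may not be the desired behavior.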

Scoring & thresholds

Scores are normalized to 0-1 and bucketed for quick review.

Critical ≥ 90% · High ≥ 80% · Medium ≥ 70% · Low < 70%
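
The bucket boundaries above can be sketched as a small lookup. The function names are hypothetical, and the sketch assumes scores arrive already normalized to the 0-1 range.

```python
def severity_bucket(score: float) -> str:
    """Map a 0-1 similarity score to its review bucket."""
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"score must be in [0, 1], got {score}")
    if score >= 0.90:
        return "Critical"
    if score >= 0.80:
        return "High"
    if score >= 0.70:
        return "Medium"
    return "Low"


def as_percent(score: float) -> str:
    """Format a 0-1 score as a whole-number percentage for display."""
    return f"{round(score * 100)}%"
```

For example, `severity_bucket(0.85)` returns `"High"` and `as_percent(0.85)` returns `"85%"`.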
  • Cosine similarity scores from Atlas Vector Search, normalized to 0-100% for display.
  • PR Checker default threshold: 80%. Cluster default: 85% (configurable).
  • Query performance: ~50-100ms for top-10 similar PRs using Atlas Vector Search with scalar quantization.
  • Each PR stores 15-50 similar pairs via progressive thresholds (0.85→0.80→0.75…) to guarantee minimum coverage.
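
The progressive-threshold idea can be sketched as follows: start strict, and relax the cutoff only until enough pairs qualify. This is an illustration under assumptions, not the project's code. The function name and input shape are hypothetical, and only the three thresholds given on this page are used; the steps after 0.75 are not specified here, so they are left to the caller.

```python
def select_similar_pairs(candidates, thresholds=(0.85, 0.80, 0.75),
                         min_pairs=15, max_pairs=50):
    """Pick similar PRs by relaxing the score threshold step by step.

    `candidates` is a list of (pr_number, score) tuples.
    """
    ranked = sorted(candidates, key=lambda c: c[1], reverse=True)
    selected = []
    for threshold in thresholds:
        selected = [c for c in ranked if c[1] >= threshold]
        if len(selected) >= min_pairs:
            break  # enough coverage at this strictness level
    # If no threshold reaches min_pairs, the loosest threshold's result is used.
    return selected[:max_pairs]


# Made-up candidates: 10 strong matches, 10 decent ones, 40 weaker ones.
candidates = (
    [(i, 0.90) for i in range(10)]
    + [(100 + i, 0.82) for i in range(10)]
    + [(200 + i, 0.76) for i in range(40)]
)
selected = select_similar_pairs(candidates)
# 0.85 yields only 10 pairs, so the cutoff relaxes to 0.80 and yields 20.
```

Stopping at the first threshold that reaches the minimum keeps the stored list as strict as possible while still giving sparse PRs some neighbors.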

Security model

Public demo: no authentication required.

  • All endpoints are public: no auth required (public demo deployment)
  • /api/check disabled: returns 503 to prevent cost from public usage
  • Rate limiting via MongoDB-backed sliding window
  • CSP headers, HSTS, X-Frame-Options configured
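
A sliding-window rate limiter can be sketched as below. To keep the example self-contained it stores timestamps in memory rather than in MongoDB, so it shows the technique only and not the deployed implementation. The class name, the limit of 3 requests, and the 60-second window are all assumptions.

```python
import time
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow at most `max_requests` per client within `window_seconds`."""

    def __init__(self, max_requests, window_seconds):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._hits = defaultdict(deque)  # client id -> request timestamps

    def allow(self, client_id, now=None):
        now = time.monotonic() if now is None else now
        hits = self._hits[client_id]
        # Drop timestamps that have slid out of the window.
        while hits and hits[0] <= now - self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False
        hits.append(now)
        return True


limiter = SlidingWindowLimiter(max_requests=3, window_seconds=60)
results = [limiter.allow("203.0.113.7", now=t) for t in (0, 10, 20, 30, 61)]
# results == [True, True, True, False, True]
```

A database-backed variant would keep the same logic but store one timestamped record per request, so that multiple server instances share the same counts.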

Data model

MongoDB collections: pull_requests, sync_metadata, and usage_logs.
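
As a rough illustration, a document in `pull_requests` might carry the output of each pipeline stage. Every field name below except `similar_prs` is a guess made for this sketch, as is the 2-D shape of the UMAP projection. The actual schema is not given on this page.

```python
from dataclasses import asdict, dataclass, field
from typing import List, Optional, Tuple


@dataclass
class SimilarPR:
    number: int   # hypothetical: PR number of the similar PR
    score: float  # 0-1 similarity score


@dataclass
class PullRequestDoc:
    number: int                                      # hypothetical field
    title: str                                       # hypothetical field
    standardized_intent: str = ""                    # hypothetical: standardization output
    embedding: List[float] = field(default_factory=list)  # hypothetical: embedding vector
    umap_xy: Optional[Tuple[float, float]] = None    # hypothetical: 2-D projection
    similar_prs: List[SimilarPR] = field(default_factory=list)


doc = PullRequestDoc(
    number=1,
    title="Example PR",
    similar_prs=[SimilarPR(number=2, score=0.91)],
)
record = asdict(doc)  # plain dict, ready to be stored as a document
```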

API endpoints

REST API surface with example requests and responses.