Sitemap Similarity Engine
Compares the sitemaps of 50–100 websites by extracting each page's HTML text and embedding it with SentenceTransformers, in order to measure content overlap and structural similarity between sites.
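The pipeline above can be sketched roughly as follows. This is a minimal sketch that assumes standard `urlset` sitemaps (namespace `http://www.sitemaps.org/schemas/sitemap/0.9`). Content similarity mean-pools normalized page embeddings into one vector per site and compares sites by cosine similarity. Structural similarity is approximated as the Jaccard overlap of URL path prefixes. Both scoring choices, the `depth=2` prefix cutoff, and every function name here are illustrative assumptions, not the project's actual method.

```python
import math
import xml.etree.ElementTree as ET
from urllib.parse import urlparse

# Standard sitemap namespace; assumes plain <urlset> files (not sitemap indexes).
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def parse_sitemap(xml_text: str) -> list[str]:
    """Return every <loc> URL listed in a urlset sitemap."""
    root = ET.fromstring(xml_text)
    return [loc.text.strip() for loc in root.iter(f"{SITEMAP_NS}loc") if loc.text]


def path_prefixes(urls: list[str], depth: int = 2) -> set[str]:
    """Reduce URLs to their first `depth` path segments, e.g. /blog/2023.

    The depth cutoff is a hypothetical choice for this sketch.
    """
    prefixes = set()
    for url in urls:
        segments = [s for s in urlparse(url).path.split("/") if s]
        prefixes.add("/" + "/".join(segments[:depth]))
    return prefixes


def structural_similarity(urls_a: list[str], urls_b: list[str], depth: int = 2) -> float:
    """Assumed structural score: Jaccard overlap of path prefixes (hosts ignored)."""
    a, b = path_prefixes(urls_a, depth), path_prefixes(urls_b, depth)
    if not a and not b:
        return 1.0  # two empty sitemaps are treated as identical
    return len(a & b) / len(a | b)


def extract_text(html: str) -> str:
    """Visible page text with script/style content removed."""
    from bs4 import BeautifulSoup  # imported here so the scoring helpers run without bs4

    soup = BeautifulSoup(html, "html.parser")
    for tag in soup(["script", "style", "noscript"]):
        tag.decompose()
    return soup.get_text(separator=" ", strip=True)


def site_embedding(page_texts: list[str], model) -> list[float]:
    """Assumed content vector: mean of L2-normalized page embeddings.

    `model` is expected to behave like a SentenceTransformer (has .encode()).
    """
    vectors = model.encode(page_texts, normalize_embeddings=True)
    dim = len(vectors[0])
    return [sum(float(v[i]) for v in vectors) / len(vectors) for i in range(dim)]


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity of two site vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0
```

On real sites, a model would be loaded with `SentenceTransformer("all-MiniLM-L6-v2")` (the model name is an assumption), each page's HTML passed through `extract_text`, and the resulting texts through `site_embedding`. Computing `cosine` and `structural_similarity` for every pair of the 50–100 sites then yields two similarity matrices. BeautifulSoup is imported inside `extract_text` so the scoring helpers can be used and tested without it.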
Technologies
- Python
- SentenceTransformers
- AWS EC2
- BigQuery
- BeautifulSoup