Firecrawl can recursively search through a URL's subpages and gather their content.
Firecrawl crawls websites thoroughly, aiming for comprehensive data extraction while working around common web-blocking mechanisms. Here’s how it works:
URL Analysis:
Begins with a specified URL and identifies links from the site's sitemap, then crawls the website. If no sitemap is found, it discovers pages by following the links on each page.
Recursive Traversal:
Recursively follows each link to uncover all subpages.
Content Scraping:
Gathers content from every visited page while handling complexities such as JavaScript rendering and rate limits.
Result Compilation:
Converts the collected data into clean markdown or structured output, ready for LLM processing or other downstream tasks.
This approach aims for an exhaustive crawl and data collection from any starting URL.
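The steps above can be sketched as a small breadth-first crawler. This is a hypothetical illustration, not Firecrawl's implementation: it assumes the sitemap lives at `/sitemap.xml`, stays on the starting host, and takes a caller-supplied `fetch` function (returning page text or `None`) in place of real HTTP, JavaScript rendering, and rate limiting. The names `crawl`, `parse_sitemap`, `LinkCollector`, and `Fetch` are illustrative.

```python
import xml.etree.ElementTree as ET
from collections import deque
from html.parser import HTMLParser
from typing import Callable, Optional
from urllib.parse import urldefrag, urljoin, urlparse

SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"

# Hypothetical fetcher type: returns the page body, or None if the URL is missing.
Fetch = Callable[[str], Optional[str]]


def parse_sitemap(xml_text: str) -> list[str]:
    """Return every <loc> URL listed in a sitemap.xml document."""
    root = ET.fromstring(xml_text)
    return [loc.text.strip() for loc in root.iter(f"{SITEMAP_NS}loc") if loc.text]


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag in an HTML page."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.hrefs.extend(value for name, value in attrs if name == "href" and value)


def crawl(start_url: str, fetch: Fetch, max_pages: int = 100) -> dict[str, str]:
    host = urlparse(start_url).netloc
    # URL analysis: seed the queue from the sitemap when one exists.
    sitemap = fetch(urljoin(start_url, "/sitemap.xml"))
    seeds = parse_sitemap(sitemap) if sitemap else []
    queue = deque([start_url, *seeds])
    seen: set[str] = set()
    pages: dict[str, str] = {}
    # Recursive traversal, done iteratively: breadth-first over discovered links.
    while queue and len(pages) < max_pages:
        url = urldefrag(queue.popleft()).url  # drop #fragments so /b and /b#top are one page
        if url in seen or urlparse(url).netloc != host:
            continue  # already visited, or off-site
        seen.add(url)
        body = fetch(url)
        if body is None:
            continue
        pages[url] = body  # content scraping: keep the raw page body
        collector = LinkCollector()
        collector.feed(body)
        queue.extend(urljoin(url, href) for href in collector.hrefs)
    return pages
```

Because `fetch` is injected, the sketch can be tried against an in-memory dict, e.g. `crawl("https://example.com/", site.get)`. The `seen` set ensures each page is fetched at most once even when pages link to each other in cycles, and pages listed only in the sitemap are still found when nothing links to them.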
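```python
from firecrawl import FirecrawlApp

# Authenticate with your Firecrawl API key
app = FirecrawlApp(api_key="YOUR_API_KEY")

# Crawl mendable.ai, skipping pages that match blog/*
crawl_result = app.crawl_url('mendable.ai', {'crawlerOptions': {'excludes': ['blog/*']}})

# Get the markdown of each crawled page
for result in crawl_result:
    print(result['markdown'])
```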
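To make the effect of `'excludes': ['blog/*']` concrete, here is a hypothetical filter that treats each pattern as a shell-style glob matched against the URL path without its leading slash. This is an assumption for illustration only; Firecrawl's actual matching rules may differ (for example, it may interpret patterns as regular expressions), and `is_excluded` is not part of the Firecrawl SDK.

```python
from fnmatch import fnmatchcase
from urllib.parse import urlparse


def is_excluded(url: str, patterns: list[str]) -> bool:
    """Hypothetical filter: glob-match each pattern against the URL path, minus its leading slash."""
    path = urlparse(url).path.lstrip("/")
    return any(fnmatchcase(path, pattern) for pattern in patterns)


urls = [
    "https://mendable.ai/",
    "https://mendable.ai/pricing",
    "https://mendable.ai/blog/launch",
]
kept = [u for u in urls if not is_excluded(u, ["blog/*"])]
print(kept)  # ['https://mendable.ai/', 'https://mendable.ai/pricing']
```

Under this interpretation `*` also matches `/`, so nested posts such as `blog/2024/launch` are skipped too, while the bare `blog` index page is not matched by `blog/*`.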