- URL Analysis: Begins with the specified URL and identifies links via the sitemap when one exists. If no sitemap is found, it crawls the website by following links.
- Recursive Traversal: Recursively follows each discovered link to uncover all subpages (see the sketch after this list).
- Content Scraping: Gathers content from every visited page while handling any complexities like JavaScript rendering or rate limits.
- Result Compilation: Converts collected data into clean markdown or structured output, perfect for LLM processing or any other task.
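To make the steps above concrete, here is a minimal, illustrative sketch of a breadth-first link traversal with naive content extraction. It is not Firecrawl's actual implementation: the `requests` and `beautifulsoup4` dependencies, the `limit` parameter, and the plain-text extraction are all assumptions for illustration.

```python
from urllib.parse import urljoin, urlparse

import requests
from bs4 import BeautifulSoup


def crawl(start_url: str, limit: int = 50) -> dict[str, str]:
    """Breadth-first traversal: visit each page, collect its links, repeat."""
    domain = urlparse(start_url).netloc
    queue, seen, pages = [start_url], {start_url}, {}
    while queue and len(pages) < limit:
        url = queue.pop(0)
        resp = requests.get(url, timeout=10)
        if "text/html" not in resp.headers.get("Content-Type", ""):
            continue
        soup = BeautifulSoup(resp.text, "html.parser")
        pages[url] = soup.get_text()  # stand-in for markdown conversion
        for a in soup.find_all("a", href=True):
            link = urljoin(url, a["href"]).split("#")[0]
            # Recursive traversal: only follow links on the same site.
            if urlparse(link).netloc == domain and link not in seen:
                seen.add(link)
                queue.append(link)
    return pages
```

A real crawler would also need the complexities the list mentions: JavaScript rendering, rate limiting, and robots.txt handling.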
## Crawling

### /crawl endpoint
Used to crawl a URL and all of its accessible subpages. This submits a crawl job and returns a job ID that you can use to check the status of the crawl.

#### Installation
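Assuming the Python SDK is distributed as the `firecrawl-py` package, installation follows the usual pip pattern:

```bash
pip install firecrawl-py
```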
#### Usage
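A minimal sketch of a blocking crawl with the Python SDK. The `FirecrawlApp` client, the `crawl_url` signature, and the `crawlerOptions` parameter reflect the v0-era `firecrawl-py` interface and should be checked against your installed version:

```python
from firecrawl import FirecrawlApp

# API key is a placeholder -- substitute your own.
app = FirecrawlApp(api_key="fc-YOUR_API_KEY")

# Blocks until the crawl finishes because wait_until_done=True,
# then returns the scraped pages as a list of documents.
crawl_result = app.crawl_url(
    "https://example.com",
    params={"crawlerOptions": {"limit": 50}},  # assumed option: cap pages crawled
    wait_until_done=True,
)

for page in crawl_result:
    print(page["metadata"]["sourceURL"])
```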
#### Job ID Response
If you are not using the SDK, or you prefer a webhook or a different polling method, you can set `wait_until_done` to `false`. This returns a `jobId` instead of the crawl results.
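For example, a sketch of the non-blocking SDK path, again assuming the v0-era interface where `crawl_url` returns a `jobId` and `check_crawl_status` polls it:

```python
import time

from firecrawl import FirecrawlApp

app = FirecrawlApp(api_key="fc-YOUR_API_KEY")

# Returns immediately with a job identifier instead of the results.
job = app.crawl_url("https://example.com", wait_until_done=False)
job_id = job["jobId"]

# Poll until the job completes; a webhook would replace this loop.
while True:
    status = app.check_crawl_status(job_id)
    if status["status"] == "completed":
        break
    time.sleep(5)  # assumed polling interval
```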
With cURL, `/crawl` always returns a `jobId` that you can use to check the status of the crawl.
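A sketch of that raw HTTP flow. The `https://api.firecrawl.dev/v0` base URL, the status route, and the response shape are assumptions based on the hosted v0 API and may differ for your deployment:

```bash
# Submit the crawl job; the response contains a jobId.
curl -X POST https://api.firecrawl.dev/v0/crawl \
  -H 'Content-Type: application/json' \
  -H 'Authorization: Bearer fc-YOUR_API_KEY' \
  -d '{"url": "https://example.com"}'
# => {"jobId": "..."}

# Check the status of the crawl with that jobId.
curl https://api.firecrawl.dev/v0/crawl/status/<jobId> \
  -H 'Authorization: Bearer fc-YOUR_API_KEY'
```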