# Firecrawl Crawl

Blocks for crawling multiple pages of a website using Firecrawl.

## Firecrawl Crawl

### What it is

A block that crawls multiple pages of a website with Firecrawl and returns the content of each page, working around anti-bot blockers along the way.

### How it works

This block uses Firecrawl's API to crawl multiple pages of a website starting from a given URL. It navigates through links, handling JavaScript rendering and bypassing anti-bot measures to extract clean content from each page.

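Cap the number of pages crawled with the limit parameter, choose one or more output formats (for example markdown, HTML, raw HTML, links, or screenshots), and optionally restrict results to each page's main content. The block also supports cached results with a configurable maximum age, and a configurable wait time so that dynamic content can load before a page is fetched.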
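
The way these inputs fit together can be sketched as a small function that turns them into a request body. This is a minimal illustration, not the block's actual implementation: the helper name `build_crawl_request`, the default values for limit, only\_main\_content, and formats, and the camelCase field names (`scrapeOptions`, `onlyMainContent`, `maxAge`, `waitFor`) are all assumptions made for the example. Only the 1 hour default for max\_age comes from the inputs table.

```python
from typing import Any, Dict, List, Optional

# 1 hour in milliseconds, the default max_age stated in the inputs table.
ONE_HOUR_MS = 60 * 60 * 1000


def build_crawl_request(
    url: str,
    limit: int = 10,                 # assumed default, for illustration only
    only_main_content: bool = True,  # assumed default, for illustration only
    max_age: int = ONE_HOUR_MS,
    wait_for: int = 0,               # assumed default: no extra delay
    formats: Optional[List[str]] = None,
) -> Dict[str, Any]:
    """Hypothetical helper: map the block inputs onto a crawl request body."""
    if limit < 1:
        raise ValueError("limit must be at least 1")
    return {
        "url": url,
        # limit caps how many pages are crawled; it is not a link depth.
        "limit": limit,
        # Field names below are assumed, not taken from this page.
        "scrapeOptions": {
            "formats": list(formats) if formats else ["markdown"],
            "onlyMainContent": only_main_content,
            "maxAge": max_age,    # milliseconds
            "waitFor": wait_for,  # milliseconds
        },
    }


request = build_crawl_request(
    "https://example.com/docs",
    limit=5,
    formats=["markdown", "links"],
    wait_for=200,
)
```

Both max\_age and wait\_for are expressed in milliseconds, so a 200 ms wait and a 1 hour cache age differ by several orders of magnitude in the same request.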

### Inputs

| Input               | Description                                                                                             | Type                                                                                                                        | Required |
| ------------------- | ------------------------------------------------------------------------------------------------------- | --------------------------------------------------------------------------------------------------------------------------- | -------- |
| url                 | The URL to crawl                                                                                        | str                                                                                                                         | Yes      |
| limit               | The maximum number of pages to crawl                                                                    | int                                                                                                                         | No       |
| only\_main\_content | Only return the main content of the page excluding headers, navs, footers, etc.                         | bool                                                                                                                        | No       |
| max\_age            | Maximum age of a cached page, in milliseconds; defaults to 3600000 (1 hour)                             | int                                                                                                                         | No       |
| wait\_for           | Specify a delay in milliseconds before fetching the content, allowing the page sufficient time to load. | int                                                                                                                         | No       |
| formats             | Output formats to return for each crawled page                                                          | List\["markdown" \| "html" \| "rawHtml" \| "links" \| "screenshot" \| "screenshot\@fullPage" \| "json" \| "changeTracking"] | No       |

### Outputs

| Output                 | Description                           | Type                   |
| ---------------------- | ------------------------------------- | ---------------------- |
| error                  | Error message if the crawl failed     | str                    |
| data                   | Raw results for all crawled pages     | List\[Dict\[str, Any]] |
| markdown               | Markdown content of a crawled page    | str                    |
| html                   | HTML content of a crawled page        | str                    |
| raw\_html              | Raw HTML of a crawled page            | str                    |
| links                  | Links found on a crawled page         | List\[str]             |
| screenshot             | Screenshot of a crawled page          | str                    |
| screenshot\_full\_page | Full-page screenshot of a page        | str                    |
| json\_data             | JSON data extracted from a page       | Dict\[str, Any]        |
| change\_tracking       | Change-tracking data for a page       | Dict\[str, Any]        |
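
The relationship between the requested formats and the outputs above can be illustrated with a short sketch. It assumes that each crawled page is a dictionary keyed by format name and that one output is emitted per page and per requested format; both the `fan_out` helper and that page shape are hypothetical and are not confirmed by this page. The name mapping itself follows the two tables: format names are camelCase, output names are snake\_case.

```python
from typing import Any, Dict, Iterator, List, Tuple

# Format name (inputs table) -> output name (outputs table).
FORMAT_TO_OUTPUT: Dict[str, str] = {
    "markdown": "markdown",
    "html": "html",
    "rawHtml": "raw_html",
    "links": "links",
    "screenshot": "screenshot",
    "screenshot@fullPage": "screenshot_full_page",
    "json": "json_data",
    "changeTracking": "change_tracking",
}


def fan_out(
    pages: List[Dict[str, Any]], formats: List[str]
) -> Iterator[Tuple[str, Any]]:
    """Hypothetical helper: yield (output_name, value) pairs for a crawl."""
    # The full result list is emitted once on the data output.
    yield "data", pages
    for page in pages:
        for fmt in formats:
            # Skip formats that a page did not return (assumed behavior).
            if fmt in page:
                yield FORMAT_TO_OUTPUT[fmt], page[fmt]


# Made-up sample pages, purely for illustration.
pages = [
    {"markdown": "# Home", "links": ["https://example.com/a"]},
    {"markdown": "# A", "links": []},
]
outputs = list(fan_out(pages, ["markdown", "links", "rawHtml"]))
```

Under these assumptions, an output such as markdown carries one value per crawled page, while data carries the whole list at once.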

### Possible use case

**Documentation Indexing**: Crawl entire documentation sites to build searchable knowledge bases or training data.

**Competitor Research**: Extract content from competitor websites for market analysis and comparison.

**Content Archival**: Systematically archive website content for backup or compliance purposes.

