Firecrawl Scrape
Blocks for scraping individual web pages and extracting content using Firecrawl.
Firecrawl Scrape
What it is
Firecrawl scrapes a single web page and extracts its content, working around blockers such as anti-bot measures.
How it works
This block uses Firecrawl's scraping API to extract content from a single URL. It handles JavaScript rendering, bypasses anti-bot measures, and can return content in multiple formats including markdown, HTML, and screenshots.
Configure output formats, filter to main content only, and set wait times for dynamic pages. The block returns comprehensive results including extracted content, links found on the page, and optional change tracking data.
Inputs
url (str, required): The URL to scrape.
limit (int, optional): The number of pages to crawl.
only_main_content (bool, optional): Return only the main content of the page, excluding headers, navs, footers, etc.
max_age (int, optional): Maximum age, in milliseconds, of a cached copy of the page that may be returned. Default is 1 hour (3,600,000 ms).
wait_for (int, optional): Delay in milliseconds before fetching the content, giving the page time to load.
formats (List["markdown" | "html" | "rawHtml" | "links" | "screenshot" | "screenshot@fullPage" | "json" | "changeTracking"], optional): The output formats to return.
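To make the inputs concrete, here is a minimal sketch of how they might be validated and assembled into a request body. The helper name `build_scrape_payload`, the camelCase field names, and every default other than the 1 hour `max_age` are assumptions for illustration, not the block's confirmed implementation. `limit` is left out because its role in a single-page scrape is not described here.

```python
from typing import Any, Dict, List, Optional

# Format names taken from the `formats` input above.
ALLOWED_FORMATS = {
    "markdown", "html", "rawHtml", "links",
    "screenshot", "screenshot@fullPage", "json", "changeTracking",
}


def build_scrape_payload(
    url: str,
    formats: Optional[List[str]] = None,
    only_main_content: bool = True,   # assumed default
    max_age: int = 3_600_000,         # 1 hour in milliseconds
    wait_for: int = 0,                # assumed default: no extra delay
) -> Dict[str, Any]:
    """Hypothetical helper: turn block inputs into a scrape request body."""
    formats = formats or ["markdown"]  # assumed default format
    unknown = [f for f in formats if f not in ALLOWED_FORMATS]
    if unknown:
        raise ValueError(f"Unsupported formats: {unknown}")
    if max_age < 0 or wait_for < 0:
        raise ValueError("max_age and wait_for must be non-negative milliseconds")
    return {
        "url": url,
        "formats": list(formats),
        # Field names below are assumed camelCase equivalents of the inputs.
        "onlyMainContent": only_main_content,
        "maxAge": max_age,
        "waitFor": wait_for,
    }
```

For example, `build_scrape_payload("https://example.com", formats=["markdown", "links"], wait_for=200)` produces a body that requests markdown and links after a 200 ms delay, while an unrecognized format such as `"pdf"` raises `ValueError` before any request is made.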
Outputs
error (str): Error message if the scrape failed.
data (Dict[str, Any]): The full result of the scrape.
markdown (str): The page content as markdown.
html (str): The page content as HTML.
raw_html (str): The raw HTML of the page.
links (List[str]): The links found on the page.
screenshot (str): A screenshot of the page.
screenshot_full_page (str): A full-page screenshot of the page.
json_data (Dict[str, Any]): The JSON data extracted from the page.
change_tracking (Dict[str, Any]): The change tracking data for the page.
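The relationship between the requested formats and the outputs can be sketched as follows. This assumes the scrape result is a plain dictionary keyed by format name, and that a full-page screenshot arrives under the same `screenshot` key as a regular one. Both points, and the helper name `iter_outputs`, are assumptions for illustration rather than the block's confirmed behavior.

```python
from typing import Any, Dict, Iterator, List, Tuple

# Requested format name -> block output name (pairing inferred from the lists above).
FORMAT_TO_OUTPUT = {
    "markdown": "markdown",
    "html": "html",
    "rawHtml": "raw_html",
    "links": "links",
    "screenshot": "screenshot",
    "screenshot@fullPage": "screenshot_full_page",
    "json": "json_data",
    "changeTracking": "change_tracking",
}


def iter_outputs(
    data: Dict[str, Any], formats: List[str]
) -> Iterator[Tuple[str, Any]]:
    """Hypothetical helper: yield (output_name, value) pairs for a scrape result."""
    # The full result is always available on the `data` output.
    yield "data", data
    for fmt in formats:
        # Assumption: the full-page screenshot shares the "screenshot" result key.
        key = "screenshot" if fmt == "screenshot@fullPage" else fmt
        if key in data:
            yield FORMAT_TO_OUTPUT[fmt], data[key]
```

Under these assumptions, only the formats that were both requested and present in the result produce an output, so a downstream block connected to `json_data` receives nothing when `json` was not returned.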
Possible use cases
Article Extraction: Scrape news articles or blog posts to extract clean, readable content.
Price Monitoring: Regularly scrape product pages to track price changes over time.
Content Backup: Create markdown backups of important web pages for offline reference.
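For the price monitoring and backup cases, one simple way to detect that a page has changed between two scrapes is to compare fingerprints of the `markdown` output. The sketch below uses a SHA-256 hash over lightly normalized text. It is an illustrative alternative to the `changeTracking` format, not a description of how that format works, and the function names are hypothetical.

```python
import hashlib


def content_fingerprint(markdown: str) -> str:
    """Return a stable SHA-256 fingerprint of scraped markdown."""
    # Strip surrounding whitespace on each line so trivial layout
    # differences do not register as a content change.
    normalized = "\n".join(line.strip() for line in markdown.strip().splitlines())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def has_changed(previous_markdown: str, current_markdown: str) -> bool:
    """Compare two scrapes of the same page by fingerprint."""
    return content_fingerprint(previous_markdown) != content_fingerprint(current_markdown)
```

Storing only the fingerprint from the previous run is enough to decide whether to save a new backup or trigger a price alert. Pages with rotating ads or timestamps will change on every scrape, so setting `only_main_content` is likely to reduce false positives.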