
Crawlee
Crawlee is an open-source web scraping and browser automation library built for Node.js and Python. It helps developers create reliable crawlers with minimal effort. The library handles the complex parts of web scraping like proxy rotation, request queuing, and data storage. Crawlee supports both simple HTTP requests and headless browsers, making it versatile for different scraping needs. It's built by people who scrape for a living and used daily to crawl millions of pages.
Key Features
Smart Proxy Management
Rotates proxies intelligently with human-like fingerprints to reduce blocking. Automatically discards problematic proxies.
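A minimal sketch of how that looks in code, assuming two placeholder proxy URLs (swap in real ones): ProxyConfiguration rotates through the list, and the session pool lets Crawlee retire sessions whose proxies keep failing.

import { CheerioCrawler, ProxyConfiguration } from 'crawlee';

// Hypothetical proxy endpoints; replace with your own.
const proxyConfiguration = new ProxyConfiguration({
    proxyUrls: [
        'http://proxy-1.example.com:8000',
        'http://proxy-2.example.com:8000',
    ],
});

const crawler = new CheerioCrawler({
    proxyConfiguration,
    useSessionPool: true, // sessions that repeatedly fail are retired together with their proxy
    async requestHandler({ request, $ }) {
        console.log(`${request.url}: ${$('title').text()}`);
    },
});

await crawler.run(['https://example.com']);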
Helper Utilities
Includes tools for extracting social handles, phone numbers, infinite scrolling, and blocking unwanted assets.
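A brief sketch of two of those helpers, assuming the asset patterns shown are the ones you want to block: blockRequests is applied in a pre-navigation hook so the assets never load, and infiniteScroll keeps scrolling until no new content appears (option names follow the puppeteerUtils helpers).

import { PuppeteerCrawler, puppeteerUtils } from 'crawlee';

const crawler = new PuppeteerCrawler({
    // Block heavy assets before each navigation; patterns are illustrative.
    preNavigationHooks: [
        async ({ page }) => {
            await puppeteerUtils.blockRequests(page, {
                urlPatterns: ['.jpg', '.png', '.gif', '.woff2'],
            });
        },
    ],
    async requestHandler({ page, request }) {
        // Load lazy content by scrolling, then extract.
        await puppeteerUtils.infiniteScroll(page, { timeoutSecs: 30 });
        console.log(`${request.url}: ${await page.title()}`);
    },
});

await crawler.run(['https://example.com']);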
Multiple Crawler Types
Choose between HTTP crawling with Cheerio/JSDOM parsers or browser automation with Puppeteer/Playwright for JavaScript-heavy sites.
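A rough sketch of the choice, assuming a mostly static target page: CheerioCrawler downloads raw HTML over HTTP and hands the handler a Cheerio object, while the same structure works with PlaywrightCrawler (a live page object instead of $) when JavaScript rendering is required.

import { CheerioCrawler } from 'crawlee';
// For JavaScript-heavy sites, swap in PlaywrightCrawler from the same package.

const crawler = new CheerioCrawler({
    async requestHandler({ request, $ }) {
        // $ is a Cheerio selector bound to the downloaded HTML.
        console.log(`${request.url}: ${$('title').text()}`);
    },
});

await crawler.run(['https://crawlee.dev']);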
Queue and Storage
Built-in request queue ensures URL uniqueness and preserves progress. Includes dataset storage for saving structured results.
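A small sketch, assuming the glob below matches the pages you want to follow: enqueueLinks feeds discovered URLs into the request queue (already-seen URLs are skipped), and pushData appends one record per page to the default dataset on disk.

import { CheerioCrawler } from 'crawlee';

const crawler = new CheerioCrawler({
    async requestHandler({ request, $, enqueueLinks, pushData }) {
        // Save a structured record to the default dataset (./storage/datasets/default).
        await pushData({
            url: request.url,
            title: $('title').text(),
        });

        // Queue further links; the queue deduplicates URLs and survives restarts.
        await enqueueLinks({ globs: ['https://crawlee.dev/**'] });
    },
});

await crawler.run(['https://crawlee.dev']);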
Anti-Blocking Features
Mimics browser headers and TLS fingerprints with automatic rotation based on real-world traffic patterns.
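Header and fingerprint rotation happens without any configuration; the sketch below only makes the browser-side toggle explicit, assuming the browserPoolOptions.useFingerprints flag (enabled by default in recent versions).

import { PlaywrightCrawler } from 'crawlee';

const crawler = new PlaywrightCrawler({
    browserPoolOptions: {
        // Generate and inject realistic browser fingerprints per browser context.
        // Already the default; shown here only to make the behavior visible.
        useFingerprints: true,
    },
    async requestHandler({ page, request }) {
        console.log(`${request.url}: ${await page.title()}`);
    },
});

await crawler.run(['https://example.com']);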
Automatic Scaling
Manages concurrency based on available system resources to optimize performance without overloading your machine.
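Concurrency scales on its own from system load, but it can be bounded; a short sketch, assuming the example limits below suit the target site.

import { CheerioCrawler } from 'crawlee';

const crawler = new CheerioCrawler({
    // The autoscaled pool stays within these bounds while watching CPU and memory.
    minConcurrency: 5,
    maxConcurrency: 50,
    // A politeness cap that applies regardless of available resources.
    maxRequestsPerMinute: 120,
    async requestHandler({ request, $ }) {
        console.log(`${request.url}: ${$('title').text()}`);
    },
});

await crawler.run(['https://example.com']);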
Use Cases
Web Data Extraction
Collect structured data from websites for analysis, research, or integration with other systems.
Automated Testing
Use browser automation capabilities to test web applications across different scenarios.
Content Monitoring
Track changes on websites and collect updates automatically for monitoring competitors or market changes.
Market Research
Gather pricing, product information, and other competitive data from multiple sources automatically.
Lead Generation
Extract contact information and business details from websites for sales and marketing purposes.
Pricing
Free and open-source. Cloud deployment on the Apify platform has separate pricing tiers.
Setup Steps
- Install Node.js 16 or higher
- Run "npx crawlee create my-crawler" or install manually with "npm install crawlee"
- Choose your crawler type (Cheerio, Puppeteer, or Playwright)
- Implement the request handler to process page content
- Add starting URLs and run the crawler (a minimal end-to-end sketch follows these steps)
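Putting the steps together, a minimal end-to-end sketch, assuming the manual install route ("npm install crawlee"), a CheerioCrawler, and https://crawlee.dev as the starting URL:

// main.mjs: run with "node main.mjs" after "npm install crawlee"
import { CheerioCrawler } from 'crawlee';

const crawler = new CheerioCrawler({
    // The request handler runs once per page (step 4).
    async requestHandler({ request, $, enqueueLinks, pushData }) {
        await pushData({ url: request.url, title: $('title').text() });
        await enqueueLinks(); // follow links on the same hostname
    },
    maxRequestsPerCrawl: 20, // keep the demo small
});

// Starting URLs and run (step 5).
await crawler.run(['https://crawlee.dev']);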