Web Scraping with n8n and Playwright for Dynamic Content Extraction

n8n.coach

18 May 2025 — 2 min read

In today's data-driven world, extracting information from websites is a crucial task for businesses, researchers, and developers. However, many modern websites rely on dynamic content loaded via JavaScript, making traditional scraping tools ineffective. Enter n8n and Playwright—a powerful combination for scraping dynamic content efficiently.

In this blog post, we’ll explore how to use n8n, a low-code workflow automation tool, alongside Playwright, a robust browser automation library, to extract dynamic web content seamlessly.

Why Use n8n and Playwright for Web Scraping?

The Challenge of Dynamic Content

Traditional scraping tools (like BeautifulSoup or Scrapy) struggle with websites that load content dynamically using JavaScript. They work on the HTML the server returns and do not execute the page's scripts, so they miss any data rendered after page load.
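To make the problem concrete, here is a small sketch. Both HTML strings and the `countProductItems` helper are made up for illustration: the first string stands for what a server might send, the second for what the DOM could look like after the page's script has run.

```javascript
// Hypothetical HTML as a server might return it: the results container is
// empty and a script fills it in later, inside the browser.
const staticHtml = `
<html>
  <body>
    <div class="results"></div>
    <script>
      fetch('/api/products').then(/* render the items */);
    </script>
  </body>
</html>`;

// Hypothetical DOM after the script has run in a browser.
const renderedHtml = `
<div class="results">
  <div class="product-item">A</div>
  <div class="product-item featured">B</div>
</div>`;

// Count elements carrying the class "product-item" in a raw HTML string.
// A regex is enough for this illustration; real parsing needs an HTML parser.
function countProductItems(html) {
  const matches = html.match(/class="[^"]*\bproduct-item\b[^"]*"/g);
  return matches ? matches.length : 0;
}

console.log(countProductItems(staticHtml));   // 0
console.log(countProductItems(renderedHtml)); // 2
```

A tool that only sees the first string has nothing to extract, no matter how good its selectors are.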

The Solution: Browser Automation

Playwright is a browser automation library, used here through its Node.js API, that drives Chromium, Firefox, and WebKit and lets you interact with pages much as a real user would. Because the page's JavaScript actually runs, content rendered after load becomes available to scrape.

n8n, on the other hand, is a workflow automation tool. It can run Playwright through a community node or custom code, enabling you to build scraping workflows without writing extensive code.
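Outside of n8n, the core idea can be sketched in a few lines of plain Playwright. This is a minimal sketch, not the way any particular n8n node is implemented; the function name `getRenderedHtml` and the injectable `browserType` parameter are assumptions made for the example.

```javascript
// Load a page in a real browser engine and return the HTML after scripts ran.
// `browserType` defaults to Playwright's Chromium launcher; it is a parameter
// only so the function can be exercised with a stand-in object.
async function getRenderedHtml(url, browserType = require('playwright').chromium) {
  const browser = await browserType.launch({ headless: true });
  try {
    const page = await browser.newPage();
    // Resolves on the page's "load" event by default.
    await page.goto(url);
    // Serialized DOM, including nodes added by JavaScript so far.
    return await page.content();
  } finally {
    // Always release the browser, even if navigation fails.
    await browser.close();
  }
}

// Possible usage (requires the playwright package and its browsers):
//   getRenderedHtml('https://example.com').then((html) => console.log(html.length));
```

Closing the browser in `finally` matters in long-running workflows, where leaked browser processes add up quickly.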

Setting Up n8n with Playwright

Prerequisites

- A running n8n instance (self-hosted or cloud-based)
- Basic familiarity with n8n workflows

Step 1: Install Playwright in n8n

If you're self-hosting n8n, ensure Playwright is installed in your environment:

```bash
npm install playwright
```

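On n8n Cloud you cannot install npm packages into the environment yourself, so check whether a Playwright community node is available for your plan before relying on this setup.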
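On a self-hosted instance, one quick way to confirm that the package is visible to Node.js is a small resolution check like the sketch below. The helper name `isModuleAvailable` is made up for this example, and the check only covers the npm package: the browser binaries are a separate download, usually fetched with `npx playwright install`.

```javascript
// Report whether a package can be resolved from the current Node.js process.
function isModuleAvailable(name) {
  try {
    require.resolve(name);
    return true;
  } catch (err) {
    return false;
  }
}

console.log('playwright available:', isModuleAvailable('playwright'));
```

Run it from the same directory and user account as your n8n process, otherwise the result may not reflect what n8n can actually load.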

Step 2: Create a New Workflow

Open your n8n dashboard and create a new workflow.
Optionally, add an HTTP Request node to fetch the initial page if your workflow needs the raw response before the browser step.

Step 3: Add a Playwright Node

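Search for the Playwright node in the node panel and add it to your workflow. n8n does not ship one by default, so it has to come from a community node package installed beforehand.
Configure the node (exact field names depend on the node package and version you installed):
- Operation: Choose "Extract Data from Page"
- URL: Enter the target website URL
- Wait For Selector: Specify a CSS selector that only matches once the dynamic content has loaded (e.g., div.results)
- Extraction Method: Use JavaScript to query DOM elements (example below).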
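In plain Playwright terms, the URL, wait-for-selector, and extraction settings correspond roughly to the calls below. This is a sketch under assumptions: the helper names `waitForContent` and `loadAndExtract` are invented here, the 10-second default is arbitrary, and the code assumes Playwright's timeout error carries the name `TimeoutError`.

```javascript
// Wait until `selector` matches an element, or give up after `timeoutMs`.
// Returns true when the element appeared, false on timeout.
async function waitForContent(page, selector, timeoutMs = 10000) {
  try {
    await page.waitForSelector(selector, { timeout: timeoutMs });
    return true;
  } catch (err) {
    // Assumption: an expired wait surfaces as an error named "TimeoutError".
    if (err.name === 'TimeoutError') return false;
    throw err; // anything else is unexpected, so surface it
  }
}

// Navigate, wait for the dynamic content, then read the matching text.
async function loadAndExtract(page, url, selector) {
  await page.goto(url);
  if (!(await waitForContent(page, selector))) {
    return []; // content never rendered; let the workflow decide what to do
  }
  return page.$$eval(selector, (els) => els.map((el) => el.textContent.trim()));
}
```

Returning an empty list on timeout, rather than throwing, lets a later node branch on "no results" without stopping the whole workflow. Whether that is the right choice depends on how you want failures reported.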

Example: Extracting Dynamic Product Data

Here’s a sample Playwright node configuration to scrape product names and prices from an e-commerce site:

```javascript
// `page` is assumed to be the Playwright Page for the target URL.
const products = [];
// Collect a handle for every product card on the page.
const items = await page.$$('.product-item');

for (const item of items) {
  products.push({
    name: await item.$eval('.product-name', el => el.textContent.trim()),
    price: await item.$eval('.price', el => el.textContent.trim())
  });
}

return products;
```
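One caveat with the snippet above: `$eval` throws when its selector matches nothing, so a single product card without a price would abort the whole run. A possible variant, sketched here with the hypothetical function name `extractProducts`, does the extraction in one `$$eval` call and returns `null` for missing fields instead.

```javascript
// Extract name and price for every product card in one browser round trip.
// Missing sub-elements yield null instead of throwing.
async function extractProducts(page) {
  return page.$$eval('.product-item', (items) =>
    items.map((item) => {
      // Helper defined inside the callback, because the callback is
      // serialized and executed in the browser context.
      const text = (selector) => {
        const el = item.querySelector(selector);
        return el ? el.textContent.trim() : null;
      };
      return { name: text('.product-name'), price: text('.price') };
    })
  );
}
```

Using `$$eval` also avoids one round trip per field between Node.js and the browser, which can add up on pages with many items.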

Step 4: Process and Store Data

After extraction, use n8n nodes like:
- Set or Code nodes (Function nodes in older n8n versions) to transform data.
- Spreadsheet File or Database nodes to save results.
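As an illustration of the transform step, the sketch below turns scraped price strings into numbers and wraps each product in the `{ json: ... }` shape n8n items use. The helper names `parsePrice` and `toItems` and the `scrapedAt` field are assumptions for this example, and the price parsing assumes "." as the decimal separator.

```javascript
// Convert a display price such as "$1,299.99" into a number.
// Assumes "." is the decimal separator; returns null when no usable
// number can be found.
function parsePrice(text) {
  const cleaned = String(text).replace(/[^0-9.]/g, '');
  if (cleaned === '') return null;
  const value = Number(cleaned);
  return Number.isNaN(value) ? null : value;
}

// Shape scraped products into n8n items: one object with a `json` key each.
function toItems(products, scrapedAt = new Date().toISOString()) {
  return products.map((p) => ({
    json: { name: p.name, price: parsePrice(p.price), scrapedAt },
  }));
}

// Inside a Code node (mode "Run Once for All Items") this could be used as:
//   return toItems($input.all().map((item) => item.json));
```

Storing the price as a number and adding a timestamp makes later steps, such as sorting in a spreadsheet or tracking price changes in a database, much simpler.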

Best Practices for Reliable Scraping

- Respect robots.txt: Check the site's robots.txt and terms of service before scraping.
- Use Delays: Add pauses between requests so you don't overload the server.
- Handle Errors: Retry failed requests a limited number of times, ideally with increasing delays.
- Rotate User Agents: Vary the User-Agent header across sessions to reduce the chance of being blocked.
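The delay, retry, and user-agent points can be sketched in plain JavaScript as follows. Everything here is illustrative: `withRetry` and `pickUserAgent` are hypothetical helper names, the default retry count and delay are arbitrary, and the user-agent strings are placeholders to replace with your own.

```javascript
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Run `task` until it succeeds or `retries` attempts are used up,
// doubling the pause after each failure (exponential backoff).
async function withRetry(task, { retries = 3, baseDelayMs = 1000 } = {}) {
  let lastError;
  for (let attempt = 0; attempt < retries; attempt++) {
    try {
      return await task(attempt);
    } catch (err) {
      lastError = err;
      // No pause after the final attempt; the error is rethrown below.
      if (attempt < retries - 1) await sleep(baseDelayMs * 2 ** attempt);
    }
  }
  throw lastError;
}

// Placeholder user-agent strings; replace with values suited to your use.
const USER_AGENTS = [
  'Mozilla/5.0 (Windows NT 10.0; Win64; x64) ExampleBrowser/1.0',
  'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) ExampleBrowser/1.0',
  'Mozilla/5.0 (X11; Linux x86_64) ExampleBrowser/1.0',
];

// Pick user agents in round-robin order.
function pickUserAgent(index) {
  return USER_AGENTS[index % USER_AGENTS.length];
}

// With Playwright, the chosen value can be passed when creating a context:
//   const context = await browser.newContext({ userAgent: pickUserAgent(i) });
```

The task receives the attempt number, so it can, for example, pick a different user agent on each retry.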

Conclusion

Combining n8n and Playwright provides a flexible, low-code solution for scraping dynamic web content. With Playwright’s browser automation and n8n’s workflow capabilities, you can extract data efficiently while minimizing manual coding.

Whether you're gathering market research, monitoring competitors, or aggregating data for analysis, this approach offers a scalable and maintainable solution.

Ready to automate your web scraping? Try n8n and Playwright today!

Would you like a deeper dive into any specific part of this workflow? Let me know in the comments!