Extraction Rules

Customize your response by adding extraction rules.

WebScrapingAPI lets you extract specific sections of a webpage through the extract_rules parameter.

This parameter's value can be a string (the CSS selector or XPath) or a stringified object. In the second case, the parameter accepts the following options:

A full example of how this parameter would look in production is:

extract_rules='{"title": {"selector": "h1", "output": "html"}, "subtitle": {"selector": ".font-light.max-w-6xl", "output": "text"}}'

or:

extract_rules='{"title": "h1"}'
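As a minimal sketch of how the stringified object might be produced in practice, the Python snippet below builds the rules as a dictionary and serializes it with json.dumps. The variable names are illustrative assumptions; only the "selector" and "output" keys and their values come from the examples above.

```python
import json

# Rules taken from the example above: each key names a field in the response,
# and each value holds the CSS selector plus the desired output format.
rules = {
    "title": {"selector": "h1", "output": "html"},
    "subtitle": {"selector": ".font-light.max-w-6xl", "output": "text"},
}

# The short form maps a field name directly to a selector string.
short_rules = {"title": "h1"}

# json.dumps turns each object into the string that extract_rules expects.
extract_rules = json.dumps(rules)
short_extract_rules = json.dumps(short_rules)

print(extract_rules)
print(short_extract_rules)
```

Building the object first and serializing it last avoids hand-escaping quotes inside the string.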

Extraction Rules Integration Examples

Extract Content Based on CSS Rules

GET https://api.webscrapingapi.com/v2

The following example shows how the extract_rules parameter is used to extract specific elements from the targeted website.

Query Parameters

{
    "title": [
        "<h1 class=\"max-w-2xl mb-4 font-extrabold tracking-tight leading-tight dark:text-white text-4xl md:text-5xl xl:text-6xl\">Transform Websites <br>Into Data</h1>"
    ],
    "subtitle": [
        "Navigate the web data landscape effortlessly with our proxy networks, cutting-edge web scrapers, and dedicated data extraction experts. Choose your path - DIY or fully managed by us.",
        "Data collection reinvented: with our advanced scraper APIs, infrastructure building and maintenance become a thing of the past.",
        "Empower your business decision-making with our reliable managed data extraction service. We handle the intricate extraction process, legal compliance, and quality assurance so you can focus on deriving insights.",
        "Unleash the power of insight with tailored data solutions to push your business forward.",
        "Elevate your competitive game in your Industry",
        "From startups to Fortune 500s, WebScrapingAPI® stands as the premier choice for progressive businesses seeking superior data gathering solutions worldwide.",
        "Effortless and innovative solutions tailored to your unique use case, just a click away."
    ]
}

The full GET request with the extract_rules parameter is:

https://api.webscrapingapi.com/v2?api_key=<YOUR_API_KEY>&url=https%3A%2F%2Fwebscrapingapi.com&extract_rules=%7B%22title%22%3A%20%7B%22selector%22%3A%20%22h1%22%2C%20%22output%22%3A%20%22text%22%7D%2C%20%22subtitle%22%3A%20%7B%22selector%22%3A%20%22.font-light.max-w-6xl%22%2C%20%22output%22%3A%20%22text%22%7D%7D
curl "https://api.webscrapingapi.com/v2?api_key=<YOUR_API_KEY>&url=https%3A%2F%2Fwebscrapingapi.com&extract_rules=%7B%22title%22%3A%20%7B%22selector%22%3A%20%22h1%22%2C%20%22output%22%3A%20%22text%22%7D%2C%20%22subtitle%22%3A%20%7B%22selector%22%3A%20%22.font-light.max-w-6xl%22%2C%20%22output%22%3A%20%22text%22%7D%7D"

Important! The url and extract_rules parameters must be URL-encoded (e.g. &url=https%3A%2F%2Fwww.webscrapingapi.com%2F&extract_rules=%7B%22title%22%3A%20%7B%22selector... ).
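The encoding requirement can be handled programmatically rather than by hand. The sketch below uses only the Python standard library to assemble the request URL; the build_request_url helper is a hypothetical name introduced for illustration, and the API key is a placeholder that you would replace with your own.

```python
import json
from urllib.parse import quote, urlencode

API_ENDPOINT = "https://api.webscrapingapi.com/v2"


def build_request_url(api_key, target_url, rules):
    """Hypothetical helper: return the GET URL with every value percent-encoded."""
    params = {
        "api_key": api_key,
        "url": target_url,
        # extract_rules is sent as a stringified object.
        "extract_rules": json.dumps(rules),
    }
    # quote_via=quote encodes spaces as %20 (the default quote_plus would use '+').
    return f"{API_ENDPOINT}?{urlencode(params, quote_via=quote)}"


rules = {
    "title": {"selector": "h1", "output": "text"},
    "subtitle": {"selector": ".font-light.max-w-6xl", "output": "text"},
}

# "YOUR_API_KEY" is a placeholder, not a working key.
request_url = build_request_url("YOUR_API_KEY", "https://webscrapingapi.com", rules)
print(request_url)
```

The resulting string can be passed to any HTTP client; when using it in a shell, wrap it in quotes so that the & characters are not interpreted by the shell.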

Response Example
{
    "title": [
        "Transform Websites Into Data"
    ],
    "subtitle": [
        "Navigate the web data landscape effortlessly with our proxy networks, cutting-edge web scrapers, and dedicated data extraction experts. Choose your path - DIY or fully managed by us.",
        "Data collection reinvented: with our advanced scraper APIs, infrastructure building and maintenance become a thing of the past.",
        "Empower your business decision-making with our reliable managed data extraction service. We handle the intricate extraction process, legal compliance, and quality assurance so you can focus on deriving insights.",
        "Unleash the power of insight with tailored data solutions to push your business forward.",
        "Elevate your competitive game in your Industry",
        "From startups to Fortune 500s, WebScrapingAPI® stands as the premier choice for progressive businesses seeking superior data gathering solutions worldwide.",
        "Effortless and innovative solutions tailored to your unique use case, just a click away."
    ]
}
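In the response, each rule name maps to a list containing one entry per matched element. A minimal sketch of reading such a response in Python follows; the response body is a shortened copy of the example above, and first_match is a hypothetical helper, not part of the API.

```python
import json

# Shortened copy of the response example above.
response_body = """
{
    "title": ["Transform Websites Into Data"],
    "subtitle": [
        "Elevate your competitive game in your Industry",
        "Effortless and innovative solutions tailored to your unique use case, just a click away."
    ]
}
"""


def first_match(results, key):
    """Hypothetical helper: return the first extracted value for a rule, or None."""
    matches = results.get(key, [])
    return matches[0] if matches else None


results = json.loads(response_body)

title = first_match(results, "title")
subtitle_count = len(results["subtitle"])

print(title)
print(subtitle_count)
```

Because a selector can match several elements, treating every value as a list keeps the handling uniform even when only one match is expected.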

More extract_rules object examples

Here are more examples that should help you better understand what the object passed to the extract_rules parameter should look like:
