# Extraction Rules

{% hint style="success" %}
Extraction rules can be applied with both [JavaScript rendering](https://docs.webscrapingapi.com/browser-api/advanced-api-features/rendering-javascript) enabled and disabled.
{% endhint %}

BrowserAPI lets you extract specific sections of a webpage through the `extract_rules` parameter.

This parameter's value can be a `string` (the CSS selector or XPath) or a `stringified object`. In the second case, the parameter accepts the following options:

<table><thead><tr><th width="200">Parameter</th><th width="113" align="center">Type</th><th>Description</th></tr></thead><tbody><tr><td><code>selector</code><br><mark style="color:red;background-color:red;">Required</mark></td><td align="center"><code>string</code></td><td>The CSS selector or the XPath.</td></tr><tr><td><code>selector_type</code></td><td align="center"><code>string</code></td><td>The type of the <code>selector</code> option. Accepted values are <code>css</code> and <code>xpath</code>. The default value is <code>xpath</code> if the <code>selector</code> option starts with <code>/</code>, and <code>css</code> otherwise.</td></tr><tr><td><code>output</code></td><td align="center"><code>string</code></td><td>The output format of the selected element. Accepted values are:<br>- <code>html</code> - returns HTML format<br>- <code>text</code> - (default) returns text format<br>- <code>@[attr]</code> - returns the attribute of the element<br>- <code>table_json</code> - returns the JSON format of a table<br>- <code>table_array</code> - returns the array format of a table<br>- another <code>extract_rules</code> object - used to parse nested elements.</td></tr><tr><td><code>all</code></td><td align="center"><code>int</code></td><td>When set to <code>1</code>, returns all matching elements; when set to <code>0</code>, returns only the first match. The default value for this parameter is <code>"1"</code>.</td></tr><tr><td><code>clean</code></td><td align="center"><code>int</code></td><td>Removes leading and trailing white spaces, line terminator characters, and newlines from the result. The default value for this parameter is <code>"1"</code>.</td></tr></tbody></table>
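
The default for `selector_type` described in the table can be pictured as a one-line check. The helper below is only an illustrative sketch; `infer_selector_type` is a hypothetical name and not part of the API.

```python
def infer_selector_type(selector: str) -> str:
    """Mirror the documented default: XPath if the selector starts with '/', CSS otherwise."""
    return "xpath" if selector.startswith("/") else "css"


print(infer_selector_type("//h1"))    # xpath
print(infer_selector_type(".title"))  # css
```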

A full example of how this parameter would look in production is:

{% code overflow="wrap" %}

```javascript
extract_rules='{"title": {"selector": "h1", "output": "html"}, "subtitle": {"selector": ".font-light.max-w-6xl", "output": "text"}}'
```

{% endcode %}

or:

```javascript
extract_rules='{"title": "h1"}'
```
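
Writing the stringified object by hand is error-prone, so one option is to build the rules as a native structure and serialize it. The snippet below is a minimal sketch using Python's standard `json` module; the variable names are illustrative.

```python
import json

# Rules expressed as a regular dictionary (same content as the example above).
rules = {
    "title": {"selector": "h1", "output": "html"},
    "subtitle": {"selector": ".font-light.max-w-6xl", "output": "text"},
}

# json.dumps produces the stringified object expected by `extract_rules`.
extract_rules = json.dumps(rules)
print(extract_rules)
```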

### Extraction Rules Integration Examples

## Extract Content Based on CSS Rules

<mark style="color:blue;">`GET`</mark> `https://api.webscrapingapi.com/v1`

The following example shows how the `extract_rules` parameter is used to extract specific elements from the targeted website.

#### Query Parameters

<table><thead><tr><th width="185">Name</th><th width="133">Type</th><th>Description</th></tr></thead><tbody><tr><td>api_key<mark style="color:red;">*</mark></td><td>String</td><td><code>&#x3C;YOUR_API_KEY></code></td></tr><tr><td>url<mark style="color:red;">*</mark></td><td>String</td><td><code>https://webscrapingapi.com</code></td></tr><tr><td>extract_rules</td><td>Object</td><td><p><code>{</code></p><p><code>"title": {</code></p><p><code>"selector": "h1",</code></p><p><code>"output": "html"</code></p><p><code>},</code></p><p><code>"subtitle": {</code></p><p><code>"selector": ".font-light.max-w-6xl",</code></p><p><code>"output": "text"</code></p><p><code>}</code></p><p><code>}</code></p></td></tr></tbody></table>

{% tabs %}
{% tab title="200: OK Successfully extracted data" %}
{% code overflow="wrap" %}

```json
{
    "title": [
        "<h1 class=\"max-w-2xl mb-4 font-extrabold tracking-tight leading-tight dark:text-white text-4xl md:text-5xl xl:text-6xl\">Transform Websites <br>Into Data</h1>"
    ],
    "subtitle": [
        "Navigate the web data landscape effortlessly with our proxy networks, cutting-edge web scrapers, and dedicated data extraction experts. Choose your path - DIY or fully managed by us.",
        "Data collection reinvented: with our advanced scraper APIs, infrastructure building and maintenance become a thing of the past.",
        "Empower your business decision-making with our reliable managed data extraction service. We handle the intricate extraction process, legal compliance, and quality assurance so you can focus on deriving insights.",
        "Unleash the power of insight with tailored data solutions to push your business forward.",
        "Elevate your competitive game in your Industry",
        "From startups to Fortune 500s, WebScrapingAPI® stands as the premier choice for progressive businesses seeking superior data gathering solutions worldwide.",
        "Effortless and innovative solutions tailored to your unique use case, just a click away."
    ]
}
```

{% endcode %}
{% endtab %}

{% tab title="400: Bad Request Incorrect `extract_rules` format" %}

```json
{
    "status": "Failure",
    "status_code": 400,
    "created_at": "2022-09-12T12:09:55.157Z",
    "processed_at": null,
    "time_taken": {
        "total": 0.034,
        "scraping": 0
    },
    "error": "Key `extract_rules` must be a stringified object."
}
```

{% endtab %}
{% endtabs %}
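
A client can tell the two responses above apart by checking the HTTP status code and, on failure, reading the `error` field. The following is a minimal sketch under that assumption; `parse_extraction_response` is a hypothetical helper, and it assumes the body is always a JSON object.

```python
import json


def parse_extraction_response(status_code: int, body: str) -> dict:
    """Return the extracted data, or raise ValueError carrying the API's error message."""
    payload = json.loads(body)
    if status_code != 200:
        raise ValueError(payload.get("error", "unknown error"))
    return payload


# Shortened copy of the 400 response shown above.
error_body = '{"status": "Failure", "status_code": 400, "error": "Key `extract_rules` must be a stringified object."}'

try:
    parse_extraction_response(400, error_body)
except ValueError as exc:
    print(f"Extraction failed: {exc}")
```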

The full **GET** request using `extract_rules` looks like this:

{% code overflow="wrap" %}

```
https://api.webscrapingapi.com/v1?api_key=<YOUR_API_KEY>&url=https%3A%2F%2Fwebscrapingapi.com&extract_rules=%7B%22title%22%3A%20%7B%22selector%22%3A%20%22h1%22%2C%20%22output%22%3A%20%22text%22%7D%2C%20%22subtitle%22%3A%20%7B%22selector%22%3A%20%22.font-light.max-w-6xl%22%2C%20%22output%22%3A%20%22text%22%7D%7D
```

{% endcode %}

{% tabs %}
{% tab title="cURL" %}
{% code overflow="wrap" %}

```bash
curl "https://api.webscrapingapi.com/v1?api_key=<YOUR_API_KEY>&url=https%3A%2F%2Fwebscrapingapi.com&extract_rules=%7B%22title%22%3A%20%7B%22selector%22%3A%20%22h1%22%2C%20%22output%22%3A%20%22text%22%7D%2C%20%22subtitle%22%3A%20%7B%22selector%22%3A%20%22.font-light.max-w-6xl%22%2C%20%22output%22%3A%20%22text%22%7D%7D"
```

{% endcode %}
{% endtab %}

{% tab title="NodeJS" %}
{% code overflow="wrap" %}

```javascript
const http = require("https");

const options = {
  "method": "GET",
  "hostname": "api.webscrapingapi.com",
  "port": null,
  "path": "/v1?api_key=<YOUR_API_KEY>&url=https%3A%2F%2Fwebscrapingapi.com&extract_rules=%7B%22title%22%3A%20%7B%22selector%22%3A%20%22h1%22%2C%20%22output%22%3A%20%22text%22%7D%2C%20%22subtitle%22%3A%20%7B%22selector%22%3A%20%22.font-light.max-w-6xl%22%2C%20%22output%22%3A%20%22text%22%7D%7D",
  "headers": {}
};

const req = http.request(options, function (res) {
  const chunks = [];

  res.on("data", function (chunk) {
    chunks.push(chunk);
  });

  res.on("end", function () {
    const body = Buffer.concat(chunks);
    console.log(body.toString());
  });
});

req.end();
```

{% endcode %}
{% endtab %}

{% tab title="Python" %}
{% code overflow="wrap" %}

```python
import requests

API_KEY = '<YOUR_API_KEY>'
SCRAPER_URL = 'https://api.webscrapingapi.com/v1'

TARGET_URL = 'https://webscrapingapi.com/'

PARAMS = {
    "api_key": API_KEY,
    "url": TARGET_URL,
    "extract_rules": '{"title": {"selector": "h1", "output": "html"}, "subtitle": {"selector": ".font-light.max-w-6xl", "output": "text"}}'
}

response = requests.get(SCRAPER_URL, params=PARAMS)

print(response.text)
```

{% endcode %}
{% endtab %}

{% tab title="PHP" %}
{% code overflow="wrap" %}

```php
<?php

$curl = curl_init();

curl_setopt_array($curl, [
  CURLOPT_URL => "https://api.webscrapingapi.com/v1?api_key=<YOUR_API_KEY>&url=https%3A%2F%2Fwebscrapingapi.com&extract_rules=%7B%22title%22%3A%20%7B%22selector%22%3A%20%22h1%22%2C%20%22output%22%3A%20%22text%22%7D%2C%20%22subtitle%22%3A%20%7B%22selector%22%3A%20%22.font-light.max-w-6xl%22%2C%20%22output%22%3A%20%22text%22%7D%7D",
  CURLOPT_RETURNTRANSFER => true,
  CURLOPT_ENCODING => "",
  CURLOPT_MAXREDIRS => 10,
  CURLOPT_TIMEOUT => 30,
  CURLOPT_HTTP_VERSION => CURL_HTTP_VERSION_1_1,
  CURLOPT_CUSTOMREQUEST => "GET",
]);

$response = curl_exec($curl);
$err = curl_error($curl);

curl_close($curl);

if ($err) {
  echo "cURL Error #:" . $err;
} else {
  echo $response;
}
```

{% endcode %}
{% endtab %}

{% tab title="Go" %}
{% code overflow="wrap" %}

```go
package main

import (
	"fmt"
	"net/http"
	"io/ioutil"
)

func main() {

	url := "https://api.webscrapingapi.com/v1?api_key=<YOUR_API_KEY>&url=https%3A%2F%2Fwebscrapingapi.com&extract_rules=%7B%22title%22%3A%20%7B%22selector%22%3A%20%22h1%22%2C%20%22output%22%3A%20%22text%22%7D%2C%20%22subtitle%22%3A%20%7B%22selector%22%3A%20%22.font-light.max-w-6xl%22%2C%20%22output%22%3A%20%22text%22%7D%7D"

	req, _ := http.NewRequest("GET", url, nil)

	res, _ := http.DefaultClient.Do(req)

	defer res.Body.Close()
	body, _ := ioutil.ReadAll(res.Body)

	fmt.Println(res)
	fmt.Println(string(body))

}
```

{% endcode %}
{% endtab %}

{% tab title="Java" %}
{% code overflow="wrap" %}

```java
HttpResponse<String> response = Unirest.get("https://api.webscrapingapi.com/v1?api_key=<YOUR_API_KEY>&url=https%3A%2F%2Fwebscrapingapi.com&extract_rules=%7B%22title%22%3A%20%7B%22selector%22%3A%20%22h1%22%2C%20%22output%22%3A%20%22text%22%7D%2C%20%22subtitle%22%3A%20%7B%22selector%22%3A%20%22.font-light.max-w-6xl%22%2C%20%22output%22%3A%20%22text%22%7D%7D")
  .asString();
```

{% endcode %}
{% endtab %}

{% tab title=".NET" %}
{% code overflow="wrap" %}

```csharp
var client = new RestClient("https://api.webscrapingapi.com/v1?api_key=<YOUR_API_KEY>&url=https%3A%2F%2Fwebscrapingapi.com&extract_rules=%7B%22title%22%3A%20%7B%22selector%22%3A%20%22h1%22%2C%20%22output%22%3A%20%22text%22%7D%2C%20%22subtitle%22%3A%20%7B%22selector%22%3A%20%22.font-light.max-w-6xl%22%2C%20%22output%22%3A%20%22text%22%7D%7D");
var request = new RestRequest(Method.GET);
IRestResponse response = client.Execute(request);
```

{% endcode %}
{% endtab %}

{% tab title="Ruby" %}
{% code overflow="wrap" %}

```ruby
require 'uri'
require 'net/http'
require 'openssl'

url = URI("https://api.webscrapingapi.com/v1?api_key=<YOUR_API_KEY>&url=https%3A%2F%2Fwebscrapingapi.com&extract_rules=%7B%22title%22%3A%20%7B%22selector%22%3A%20%22h1%22%2C%20%22output%22%3A%20%22text%22%7D%2C%20%22subtitle%22%3A%20%7B%22selector%22%3A%20%22.font-light.max-w-6xl%22%2C%20%22output%22%3A%20%22text%22%7D%7D")

http = Net::HTTP.new(url.host, url.port)
http.use_ssl = true
http.verify_mode = OpenSSL::SSL::VERIFY_PEER

request = Net::HTTP::Get.new(url)

response = http.request(request)
puts response.read_body
```

{% endcode %}
{% endtab %}
{% endtabs %}

{% hint style="danger" %} <mark style="color:red;">**Important!**</mark> The `url` and `extract_rules` parameters must be URL-encoded.

*( e.g. **\&url=https%3A%2F%2Fwww\.webscrapingapi.com%2F\&extract\_rules=%7B%22title%22%3A%20%7B%22selector...** )*
{% endhint %}
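
When the request URL is assembled by hand rather than through an HTTP client's parameter handling, the encoding can be delegated to the standard library. The sketch below uses Python's `urllib.parse`; `build_request_url` is a hypothetical helper name and the API key is a placeholder.

```python
import json
from urllib.parse import quote, urlencode


def build_request_url(api_key: str, target_url: str, rules: dict) -> str:
    """Assemble the GET URL with `url` and `extract_rules` percent-encoded."""
    params = {
        "api_key": api_key,
        "url": target_url,
        "extract_rules": json.dumps(rules),
    }
    # quote_via=quote encodes spaces as %20 (the default, quote_plus, would emit '+').
    return "https://api.webscrapingapi.com/v1?" + urlencode(params, quote_via=quote)


print(build_request_url(
    "YOUR_API_KEY",
    "https://webscrapingapi.com",
    {"title": {"selector": "h1", "output": "text"}},
))
```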

<details>

<summary>Response Example</summary>

```json
{
    "title": [
        "Transform Websites Into Data"
    ],
    "subtitle": [
        "Navigate the web data landscape effortlessly with our proxy networks, cutting-edge web scrapers, and dedicated data extraction experts. Choose your path - DIY or fully managed by us.",
        "Data collection reinvented: with our advanced scraper APIs, infrastructure building and maintenance become a thing of the past.",
        "Empower your business decision-making with our reliable managed data extraction service. We handle the intricate extraction process, legal compliance, and quality assurance so you can focus on deriving insights.",
        "Unleash the power of insight with tailored data solutions to push your business forward.",
        "Elevate your competitive game in your Industry",
        "From startups to Fortune 500s, WebScrapingAPI® stands as the premier choice for progressive businesses seeking superior data gathering solutions worldwide.",
        "Effortless and innovative solutions tailored to your unique use case, just a click away."
    ]
}
```

</details>
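
In the response above, each rule name maps to a list of matches, even when only one element matched. A minimal sketch of reading such a response follows; `first_match` is a hypothetical helper, and the body is a shortened copy of the example.

```python
import json

# Shortened copy of the response example above.
body = """
{
    "title": ["Transform Websites Into Data"],
    "subtitle": [
        "Unleash the power of insight with tailored data solutions to push your business forward.",
        "Elevate your competitive game in your Industry"
    ]
}
"""


def first_match(data: dict, key: str):
    """Return the first extracted value for a rule, or None if nothing matched."""
    values = data.get(key) or []
    return values[0] if values else None


data = json.loads(body)
print(first_match(data, "title"))
for text in data["subtitle"]:
    print("-", text)
```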

### More `extract_rules` object examples

Here are more examples to help you better understand what the object passed to the `extract_rules` parameter should look like:

<table><thead><tr><th width="306">HTML Sample</th><th width="311">Extraction Rule</th><th width="236">Rule Description</th><th width="284">JSON Output</th></tr></thead><tbody><tr><td><p><code>&#x3C;div class="title"></code></p><p><code>This is my title</code></p><p><code>&#x3C;/div></code></p></td><td><code>{"title": ".title"}</code></td><td>Return the text content of the elements having the CSS class <code>.title</code></td><td><p><code>{</code></p><p><code>"title": [</code></p><p><code>"This is my title"</code></p><p><code>]</code></p><p><code>}</code></p></td></tr><tr><td><p><code>&#x3C;div></code></p><p><code>&#x3C;a href="https://www.webscrapingapi.com/product/"></code></p><p><code>Product</code></p><p><code>&#x3C;/a></code></p><p><code>&#x3C;a href="https://www.webscrapingapi.com/pricing/"></code></p><p><code>Pricing</code></p><p><code>&#x3C;/a></code></p><p><code>&#x3C;/div></code></p></td><td><p><code>{</code></p><p><code>"links": {</code></p><p><code>"selector": "a",</code></p><p><code>"output": "@href",</code></p><p><code>"all": "1"</code></p><p><code>}</code></p><p><code>}</code></p></td><td>Return the <code>href</code> attribute of all links on the page</td><td><p><code>{</code></p><p><code>"links": [</code></p><p><code>"https://www.webscrapingapi.com/product/", "https://www.webscrapingapi.com/pricing/"</code></p><p><code>]</code></p><p><code>}</code></p></td></tr><tr><td><p><code>&#x3C;div></code></p><p><code>&#x3C;img src="https://www.webscrapingapi.com/assets/images/icons/full.svg?v=41d081a6f0"</code></p><p><code>></code></p><p><code>&#x3C;/div></code></p></td><td><p><code>{</code></p><p><code>"image": {</code></p><p><code>"selector": "img",</code></p><p><code>"output": "@src",</code></p><p><code>"all": 0</code></p><p><code>}</code></p><p><code>}</code></p></td><td>Return the <code>src</code> attribute of the first image available on the page</td><td><p><code>{</code></p><p><code>"image": [</code></p><p><code>"https://www.webscrapingapi.com/assets/images/icons/full.svg?v=41d081a6f0"</code></p><p><code>]</code></p><p><code>}</code></p></td></tr><tr><td><p><code>&#x3C;table class="ants"></code><br><code>&#x3C;thead></code></p><p><code>&#x3C;tr></code></p><p><code>&#x3C;th>Region&#x3C;/th></code></p><p><code>&#x3C;th>No. species&#x3C;/th></code></p><p><code>&#x3C;/tr></code></p><p><code>&#x3C;/thead></code></p><p><code>&#x3C;tbody></code></p><p><code>&#x3C;tr></code></p><p><code>&#x3C;td>Europe&#x3C;/td></code></p><p><code>&#x3C;td>180&#x3C;/td></code></p><p><code>&#x3C;/tr></code></p><p><code>&#x3C;/tbody></code><br><code>&#x3C;/table></code></p></td><td><p><code>{</code></p><p><code>"table": {</code></p><p><code>"selector": ".ants",</code></p><p><code>"output": "table_json",</code></p><p><code>"all": 0</code></p><p><code>}</code></p><p><code>}</code></p></td><td>Return the JSON format of the first table having the CSS class <code>.ants</code></td><td><p><code>{</code></p><p><code>"table": [</code></p><p><code>{</code></p><p><code>"Region": "Europe",</code></p><p><code>"No. species": "180"</code></p><p><code>}</code></p><p><code>]</code></p><p><code>}</code></p></td></tr><tr><td><p><code>&#x3C;table class="ants"></code><br><code>&#x3C;thead></code></p><p><code>&#x3C;tr></code></p><p><code>&#x3C;th>Region&#x3C;/th></code></p><p><code>&#x3C;th>No. species&#x3C;/th></code></p><p><code>&#x3C;/tr></code></p><p><code>&#x3C;/thead></code></p><p><code>&#x3C;tbody></code></p><p><code>&#x3C;tr></code></p><p><code>&#x3C;td>Europe&#x3C;/td></code></p><p><code>&#x3C;td>180&#x3C;/td></code></p><p><code>&#x3C;/tr></code></p><p><code>&#x3C;/tbody></code><br><code>&#x3C;/table></code></p></td><td><p><code>{</code></p><p><code>"table": {</code></p><p><code>"selector": ".ants",</code></p><p><code>"output": "table_array",</code></p><p><code>"all": 0</code></p><p><code>}</code></p><p><code>}</code></p></td><td>Return the array format of the first table having the CSS class <code>.ants</code></td><td><p><code>{</code></p><p><code>"table": [</code></p><p><code>["Europe", "180"]</code></p><p><code>]</code></p><p><code>}</code></p></td></tr><tr><td><p><code>&#x3C;ul></code></p><p><code>&#x3C;li></code></p><p><code>&#x3C;p class="name">Item1&#x3C;/p></code></p><p><code>&#x3C;p class="price">100&#x3C;/p></code></p><p><code>&#x3C;/li></code></p><p><code>&#x3C;li></code></p><p><code>&#x3C;p class="name">Item2&#x3C;/p></code></p><p><code>&#x3C;p class="price">1000&#x3C;/p></code></p><p><code>&#x3C;/li></code><br><code>&#x3C;/ul></code></p></td><td><p><code>{</code></p><p><code>"items": {</code></p><p><code>"selector": "li",</code></p><p><code>"output": {</code></p><p><code>"name": {</code></p><p><code>"selector": ".name",</code></p><p><code>"all": 0</code></p><p><code>},</code></p><p><code>"price": {</code></p><p><code>"selector": ".price",</code></p><p><code>"all": 0</code></p><p><code>}</code></p><p><code>},</code></p><p><code>"all": 1</code></p><p><code>}</code></p><p><code>}</code></p></td><td>Return the name and the price of each list item.</td><td><p><code>{</code></p><p><code>"items": [</code></p><p><code>{</code></p><p><code>"name": "Item1",</code></p><p><code>"price": "100"</code></p><p><code>},</code></p><p><code>{</code></p><p><code>"name": "Item2",</code></p><p><code>"price": "1000"</code></p><p><code>}</code></p><p><code>]</code></p><p><code>}</code></p></td></tr></tbody></table>

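
To make the difference between the `table_json` and `table_array` shapes concrete, the sketch below reproduces both locally from the sample `.ants` table using Python's standard `html.parser`. It only illustrates the output shapes shown in the table above; it is not how the API itself processes tables, and `TableReader` is a hypothetical class.

```python
from html.parser import HTMLParser

SAMPLE_HTML = """
<table class="ants">
  <thead><tr><th>Region</th><th>No. species</th></tr></thead>
  <tbody><tr><td>Europe</td><td>180</td></tr></tbody>
</table>
"""


class TableReader(HTMLParser):
    """Collect header cells (<th>) and body rows (<td>) from a single table."""

    def __init__(self):
        super().__init__()
        self.headers = []   # text of each <th>
        self.rows = []      # one list of <td> texts per body row
        self._row = []      # cells of the row currently being read
        self._cell = None   # "th" or "td" while inside a cell, else None
        self._text = ""

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("th", "td"):
            self._cell, self._text = tag, ""

    def handle_data(self, data):
        if self._cell:
            self._text += data

    def handle_endtag(self, tag):
        if tag == self._cell:
            target = self.headers if tag == "th" else self._row
            target.append(self._text.strip())
            self._cell = None
        elif tag == "tr" and self._row:
            self.rows.append(self._row)


reader = TableReader()
reader.feed(SAMPLE_HTML)

# Array shape: one list of cell values per row.
table_array = reader.rows
# JSON shape: one object per row, keyed by the header cells.
table_json = [dict(zip(reader.headers, row)) for row in reader.rows]

print(table_array)
print(table_json)
```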