Extract Content from a Web Page API

Extract the main content from a web page: its main text, title, and images. This is useful for summarizing a page or for displaying its main content in a different format.

The response includes the HTML of the main content area as well as its plain text.

GET

https://api.apileague.com/extract-content
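A minimal sketch of calling this endpoint from Python with only the standard library. The query parameter names `url` and `api-key` are assumptions that this page does not confirm, and `build_request_url` and `fetch_extracted_content` are hypothetical helper names; check the API reference for the actual parameters and authentication scheme.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

ENDPOINT = "https://api.apileague.com/extract-content"


def build_request_url(page_url: str, api_key: str) -> str:
    """Return the GET URL for extracting content from page_url.

    ASSUMPTION: the parameter names "url" and "api-key" are illustrative
    and are not confirmed by this page.
    """
    query = urlencode({"url": page_url, "api-key": api_key})
    return f"{ENDPOINT}?{query}"


def fetch_extracted_content(page_url: str, api_key: str, timeout: float = 10.0) -> dict:
    """Send the GET request and decode the JSON response body."""
    with urlopen(build_request_url(page_url, api_key), timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

`urlencode` percent-encodes the target page's URL, so a page address that contains its own query string does not break the request.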

Example Request and Response

GET

https://api.apileague.com/{{ examples.getExtractContent }}

{
    "title": "Happy-Go-Lucky Australia Is Feeling Neither Happy, Nor Lucky",
    "main_text": "For nearly three decades, Australia seemed to have a sort of get-out-of-jail card that allowed it to glide through [...]",
    "main_html": "<article>[...]</article>",
    "images": [
        "https://static01.nyt.com/images/2024/03/19/multimedia/00oz-misery-kbjt/00oz-misery-kbjt-superJumbo.jpg?quality=75&auto=webp"
    ]
}
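The response above could be read into a small typed structure like this. It is a sketch that assumes only the four fields shown in the example (`title`, `main_text`, `main_html`, `images`); the `ExtractedContent` class and `parse_extracted_content` function are hypothetical names, not part of the API.

```python
import json
from dataclasses import dataclass, field
from typing import List


@dataclass
class ExtractedContent:
    title: str
    main_text: str
    main_html: str
    images: List[str] = field(default_factory=list)


def parse_extracted_content(body: str) -> ExtractedContent:
    """Decode a JSON response body into an ExtractedContent.

    Missing fields fall back to empty values, so a page without
    images (or with a null "images" value) still parses.
    """
    data = json.loads(body)
    return ExtractedContent(
        title=data.get("title", ""),
        main_text=data.get("main_text", ""),
        main_html=data.get("main_html", ""),
        images=list(data.get("images") or []),
    )
```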

Frequently Asked Questions

What does the API extract?

The API extracts the main text content, the title, and the main images from a web page, filtering out navigation, ads, and other clutter.

Does it return the full HTML?

It returns the HTML of the main content area, not the full raw HTML of the entire page. This makes it easier to display or process the relevant content.
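Because `main_html` is a fragment rather than a full document, it can be processed directly. As one illustration, the sketch below collects the link targets inside the fragment using Python's standard `html.parser`; the `LinkCollector` class and `links_in_main_html` function are hypothetical names, and the input in the usage line is made up.

```python
from html.parser import HTMLParser
from typing import List


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag seen while parsing."""

    def __init__(self) -> None:
        super().__init__()
        self.links: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:  # skip anchors without an href
                self.links.append(href)


def links_in_main_html(main_html: str) -> List[str]:
    """Return all link targets found in the main_html fragment."""
    collector = LinkCollector()
    collector.feed(main_html)
    collector.close()
    return collector.links


fragment = '<article><p>See <a href="https://example.com/a">this</a>.</p></article>'
print(links_in_main_html(fragment))
```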

Can it extract images?

Yes, the API identifies and returns the main images associated with the content, which is useful for creating previews or summaries.

How is the Extract Web Content API different from the Extract News API?

The Extract Web Content API is a general-purpose tool for extracting the main content from any web page, while the Extract News API is optimized for news articles and extracts specific metadata like authors and publish dates.