Extract the main content from a web page. This API returns the main text, title, and images of a page, which makes it useful for summarizing a page or for displaying its main content in a different format.
The response includes not only the main text but also the full HTML of the main content.
{
    "title": "Happy-Go-Lucky Australia Is Feeling Neither Happy, Nor Lucky",
    "main_text": "For nearly three decades, Australia seemed to have a sort of get-out-of-jail card that allowed it to glide through [...]",
    "main_html": "<article>[...]</article>",
    "images": [
        "https://static01.nyt.com/images/2024/03/19/multimedia/00oz-misery-kbjt/00oz-misery-kbjt-superJumbo.jpg?quality=75&auto=webp"
    ]
}
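As a rough illustration of consuming this response, the sketch below parses the JSON shape shown above into a small Python structure. The field names (`title`, `main_text`, `main_html`, `images`) come from the sample response; the `ExtractedContent` class and `parse_extract_response` function are hypothetical names chosen for this example, and treating missing fields as empty is an assumption rather than documented behavior. The image URL in `sample` is a placeholder.

```python
import json
from dataclasses import dataclass, field


@dataclass
class ExtractedContent:
    """Hypothetical container mirroring the fields in the sample response."""
    title: str = ""
    main_text: str = ""
    main_html: str = ""
    images: list[str] = field(default_factory=list)


def parse_extract_response(body: str) -> ExtractedContent:
    """Parse a JSON response body into an ExtractedContent.

    Assumption: absent or null fields are treated as empty, not as errors.
    """
    data = json.loads(body)
    return ExtractedContent(
        title=data.get("title") or "",
        main_text=data.get("main_text") or "",
        main_html=data.get("main_html") or "",
        images=list(data.get("images") or []),
    )


# Abbreviated copy of the sample response; the image URL is a placeholder.
sample = """{
  "title": "Happy-Go-Lucky Australia Is Feeling Neither Happy, Nor Lucky",
  "main_text": "For nearly three decades, Australia seemed to have [...]",
  "main_html": "<article>[...]</article>",
  "images": ["https://example.com/image.jpg"]
}"""

content = parse_extract_response(sample)
print(content.title)
print(len(content.images), "image(s)")
```

Keeping the parsing step separate from the HTTP call means the same function can be exercised against saved responses without network access.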
The API extracts the main text content, the title, and the main images from a web page, filtering out navigation, ads, and other clutter.
It returns the HTML of the main content area, not the full raw HTML of the entire page. This makes it easier to display or process the relevant content.
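Because `main_html` covers only the main content area, it can be handed to a small parser without first stripping navigation or ads. The following is a minimal sketch that uses Python's standard-library `html.parser` to collect link targets from such a fragment; `LinkCollector` and `collect_links` are hypothetical helper names, and the HTML fragment is invented for illustration.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag it encounters."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            # Skip anchors without an href (for example, named anchors).
            if name == "href" and value:
                self.links.append(value)


def collect_links(main_html: str) -> list[str]:
    """Return all link targets found in a main_html fragment."""
    parser = LinkCollector()
    parser.feed(main_html)
    parser.close()
    return parser.links


# Invented fragment standing in for a real main_html value.
fragment = (
    '<article><p>See <a href="https://example.com/a">this</a> '
    'and <a name="note">that</a>.</p></article>'
)
print(collect_links(fragment))
```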
The API also identifies and returns the main images associated with the content, which is useful for creating previews or summaries.
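One way to picture the preview use case is to combine the title, a shortened slice of the main text, and the first returned image. The sketch below is an assumption about how a client might do this, not part of the API itself; `build_preview` and its `max_chars` parameter are hypothetical, and choosing the first image as the lead image is a simplification.

```python
import textwrap


def build_preview(title, main_text, images, max_chars=160):
    """Build a small preview record from extracted fields.

    textwrap.shorten collapses whitespace and cuts at a word boundary,
    so the snippet never exceeds max_chars.
    """
    snippet = textwrap.shorten(main_text, width=max_chars, placeholder="...")
    return {
        "title": title,
        "snippet": snippet,
        # Assumption: the first image is the most representative one.
        "image": images[0] if images else None,
    }


preview = build_preview(
    "Example title",
    "A long body of extracted text " * 20,
    ["https://example.com/lead.jpg"],
)
print(preview["snippet"])
```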
The Extract Web Content API is a general-purpose tool for extracting the main content from any web page, while the Extract News API is optimized for news articles and extracts specific metadata like authors and publish dates.