html

Returns the HTML content of the page, or of the element matching a selector when one is specified. This API can also "clean" the returned HTML markup when you pass a "clean" argument, which accepts several options: removal of non-text nodes, removal of DOM attributes, and removal of excessive whitespace and newlines. Using "clean" can reduce the payload size by a factor of nearly 1,000, which is useful for LLMs and other scenarios where markup size matters.

Example:

mutation GetHTML {
  goto(url: "https://example.com") {
    status
  }
  html(selector: "h1") {
    html
  }
}

Remove non-text DOM nodes and all node attributes, while preserving the DOM tree:

mutation GetHTML {
  goto(url: "https://example.com") {
    status
  }

  html(clean: {
    removeAttributes: true
    removeNonTextNodes: true
  }) {
    html
  }
}

html(
  selector: String
  timeout: Float
  visible: Boolean = false
  clean: CleanInput
): HTMLResponse

Arguments

html.selector ● String scalar

The DOM selector of the element whose HTML you want returned.

html.timeout ● Float scalar

The maximum amount of time, in milliseconds, to wait for the selector to appear, overriding any defaults. The default timeout is 30 seconds (30000 milliseconds).
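
As a minimal sketch of overriding the timeout, the mutation below waits at most 5 seconds for an h1 element before giving up. The mutation name, URL, and the 5000 value are illustrative choices, not recommendations from the API itself.

```graphql
mutation GetHeadingWithTimeout {
  goto(url: "https://example.com") {
    status
  }
  # Wait up to 5000 ms for the h1 to appear instead of the 30000 ms default
  html(selector: "h1", timeout: 5000) {
    html
  }
}
```

A shorter timeout may suit pages where the element is expected to be present soon after navigation.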

html.visible ● Boolean scalar

Whether or not to return the HTML content of the element only if it's visible. Defaults to false.
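
The following sketch shows one way the visible argument might be combined with a selector, so that HTML is returned only once the matched element is visible. The mutation name and URL are illustrative assumptions.

```graphql
mutation GetVisibleHeading {
  goto(url: "https://example.com") {
    status
  }
  # Only return the h1 markup if the element is visible on the page
  html(selector: "h1", visible: true) {
    html
  }
}
```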

html.clean ● CleanInput input

Specifies options for "cleaning" the HTML, which minimizes the amount of markup returned for use cases such as LLMs. See the nested options for the available parameters.

Type

HTMLResponse object

HTML content of a page