html
Returns the HTML content of the page, or of a single element when a selector is specified. This API can also "clean" the returned markup through a "clean" argument with numerous options, including removal of non-text nodes, removal of DOM attributes, and removal of excessive whitespace and newlines. Using "clean" can reduce the payload size by a factor of nearly 1,000, which is useful for LLMs and other scenarios where payload size matters.
Example:
mutation GetHTML {
  goto(url: "https://example.com") {
    status
  }
  html(selector: "h1") {
    html
  }
}
Remove non-text DOM nodes and all node attributes while preserving the DOM tree:
mutation GetHTML {
  goto(url: "https://example.com") {
    status
  }
  html(clean: {
    removeAttributes: true
    removeNonTextNodes: true
  }) {
    html
  }
}
html(
  selector: String
  timeout: Float
  visible: Boolean = false
  clean: CleanInput
): HTMLResponse
Arguments
html.selector
● String
scalar
The DOM selector of the element whose HTML you want returned. When omitted, the HTML of the whole page is returned.
html.timeout
● Float
scalar
The maximum amount of time, in milliseconds, to wait for the selector to appear, overriding any defaults. The default timeout is 30 seconds (30000 milliseconds).
html.visible
● Boolean
scalar
Whether or not to return the HTML content of the element only if it's visible
html.clean
● CleanInput
input
Specifies conditions for "cleaning" HTML, useful for minimizing the amount of markup returned in cases such as LLM input and more. See nested options for parameters.
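The arguments above can be combined in a single call. The following is a minimal sketch rather than an official example: the mutation name GetVisibleHeading and the 10000 ms timeout are illustrative choices, and the behavior is assumed from the argument descriptions above. It waits up to 10 seconds for the h1 element, returns its HTML only if the element is visible, and strips attributes from the result.

```graphql
mutation GetVisibleHeading {
  goto(url: "https://example.com") {
    status
  }
  # Illustrative values: wait at most 10000 ms for "h1" to appear,
  # require it to be visible, and remove DOM attributes from the output.
  html(
    selector: "h1"
    timeout: 10000
    visible: true
    clean: { removeAttributes: true }
  ) {
    html
  }
}
```

Lowering the timeout below the 30000 ms default is a reasonable choice when the element is expected to be present shortly after navigation, so that a missing selector fails sooner.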
Type
HTMLResponse
object
HTML content of a page