
CleanInput

Options for cleaning up the DOM before its content is exported. These options can destructively remove non-text DOM nodes, DOM attributes, and gratuitous whitespace characters. Because these operations are destructive by nature, it's recommended to run them at the very end of your query to preserve page functionality.

input CleanInput {
  removeNonTextNodes: Boolean
  removeAttributes: Boolean
  removeRegex: Boolean
  selectors: [String!]
  attributes: [String]
  regexes: [String!]
}

Fields

CleanInput.removeNonTextNodes ● Boolean scalar

When true (the default), removes non-textual nodes from the DOM, such as scripts, links, video, and canvas elements. You may override which nodes are removed by passing DOM selectors in the selectors argument.

CleanInput.removeAttributes ● Boolean scalar

When true (default is false), removes all attributes from all DOM nodes. Useful for cleaning up HTML markup while preserving its overall structure. To remove only specific attributes, list them in the attributes argument.

CleanInput.removeRegex ● Boolean scalar

Removes any characters in the HTML that match a regex pattern; patterns are run in order. By default this is true and removes newlines, returns, tabs, multi-spaces, and HTML comments, in that order. You may supply your own patterns with the regexes argument.

CleanInput.selectors ● [String!] list scalar

A list of selectors whose matching nodes are removed from the page when removeNonTextNodes is true (the default).

CleanInput.attributes ● [String] list scalar

A list of attributes to remove from all DOM nodes. When this isn't specified and removeAttributes is true, all attributes on all DOM nodes are removed. removeNonTextNodes must be set to true for this to take effect.

CleanInput.regexes ● [String!] list scalar

When removeRegex is true, every match of each regex in this list is removed from the page. Write each pattern without the leading and trailing /. The patterns are run in order, and each match is replaced with a single space character, which preserves some of the separation the match provided.
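
The ordered, replace-with-a-space behavior described above can be sketched locally in Python. This is only an illustration of the semantics, not the service's implementation: the function name apply_regexes and the patterns in ASSUMED_DEFAULT_REGEXES are assumptions, since the documentation names only the default categories and their order, not the actual expressions.

```python
import re
from typing import Sequence

# Hypothetical stand-ins for the defaults. The documentation lists only the
# categories (newlines, returns, tabs, multi-spaces, HTML comments) and
# their order; the real patterns may differ.
ASSUMED_DEFAULT_REGEXES = (
    r"\n",          # newlines
    r"\r",          # carriage returns
    r"\t",          # tabs
    r" {2,}",       # runs of two or more spaces
    r"<!--.*?-->",  # HTML comments
)


def apply_regexes(html: str, regexes: Sequence[str] = ASSUMED_DEFAULT_REGEXES) -> str:
    """Run each pattern in order, replacing every match with one space."""
    for pattern in regexes:
        html = re.sub(pattern, " ", html)
    return html


cleaned = apply_regexes("<p>a\n\tb</p>")
```

Because the patterns run in sequence, their order matters. In this sketch the comment pattern runs after the multi-space pattern, so the space left behind by a removed comment is not collapsed with the spaces around it.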

Member Of

html mutation ● text mutation
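
As a rough sketch of how a CleanInput value might be sent to the html mutation, the Python below builds a GraphQL request body with the input passed as a variable. Several names are assumptions not confirmed by this page: the argument name clean, the html field selected on the response, and the helper build_payload. Only the six CleanInput field names come from the schema above.

```python
import json

# Field names taken from the CleanInput schema.
CLEAN_INPUT_FIELDS = {
    "removeNonTextNodes",
    "removeAttributes",
    "removeRegex",
    "selectors",
    "attributes",
    "regexes",
}

# Assumption: the html mutation takes the input through an argument named
# `clean` and exposes the result in a field named `html`.
QUERY = """
mutation CleanedHtml($clean: CleanInput) {
  # Earlier steps of the query would go here; cleaning runs last.
  html(clean: $clean) { html }
}
"""


def build_payload(clean: dict) -> dict:
    """Return a GraphQL request body, rejecting keys CleanInput lacks."""
    unknown = set(clean) - CLEAN_INPUT_FIELDS
    if unknown:
        raise ValueError(f"Unknown CleanInput fields: {sorted(unknown)}")
    return {"query": QUERY, "variables": {"clean": clean}}


payload = build_payload({"removeAttributes": True, "attributes": ["style", "class"]})
body = json.dumps(payload)  # JSON string ready to POST to a GraphQL endpoint
```

Checking the keys locally catches typos such as removeAttribute before the request is sent, when a GraphQL server would otherwise reject the variable.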