CleanInput
Options for cleaning up the DOM before exporting its content. These options can destructively remove non-text DOM nodes, DOM attributes, and gratuitous whitespace characters. Because these operations are destructive, it's recommended to run them at the very end of your query to preserve page functionality.
input CleanInput {
  removeNonTextNodes: Boolean
  removeAttributes: Boolean
  removeRegex: Boolean
  selectors: [String!]
  attributes: [String]
  regexes: [String!]
}
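To illustrate the shape of this input, here is a minimal Python sketch that mirrors CleanInput as a dataclass and turns it into a dictionary suitable for GraphQL variables. The class and its to_variables method are hypothetical client-side helpers, not part of the API; only the field names and types come from the schema above. Unset fields are omitted on the assumption that the server then applies its documented defaults.

```python
from dataclasses import asdict, dataclass
from typing import List, Optional


@dataclass
class CleanInput:
    """Hypothetical client-side mirror of the CleanInput GraphQL input type.

    Field names stay camelCase so they match the schema exactly.
    """
    removeNonTextNodes: Optional[bool] = None
    removeAttributes: Optional[bool] = None
    removeRegex: Optional[bool] = None
    selectors: Optional[List[str]] = None
    attributes: Optional[List[str]] = None
    regexes: Optional[List[str]] = None

    def to_variables(self) -> dict:
        # Drop unset fields so they are not sent (assumed: server defaults then apply).
        return {k: v for k, v in asdict(self).items() if v is not None}


# Keep the default node and regex cleanup, but strip only class and style attributes.
clean = CleanInput(removeAttributes=True, attributes=["class", "style"])
print(clean.to_variables())  # {'removeAttributes': True, 'attributes': ['class', 'style']}
```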
Fields
CleanInput.removeNonTextNodes
● Boolean
scalar
When true (the default), removes non-textual nodes from the DOM, such as scripts, links, video, and canvas elements. You can override which nodes are removed by supplying DOM selectors in the selectors argument.
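The removal described above can be sketched with Python's standard html.parser module. This is an illustration under simplifying assumptions, not the service's implementation: DEFAULT_NON_TEXT_TAGS is a hypothetical stand-in for the real defaults, "selectors" are limited to plain tag names rather than full CSS selectors, and passing them replaces the defaults (one reading of "override").

```python
from html.parser import HTMLParser
from typing import Iterable, Optional

# Hypothetical default set; the service's actual list of non-text nodes is not given here.
DEFAULT_NON_TEXT_TAGS = {"script", "style", "link", "video", "canvas", "iframe", "noscript"}
# Void elements never get a closing tag, so they must not open a skipped region.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class NonTextRemover(HTMLParser):
    """Re-emits HTML, dropping the given tags together with everything inside them."""

    def __init__(self, tags: Iterable[str]):
        super().__init__(convert_charrefs=False)  # keep entities as written
        self.tags = {t.lower() for t in tags}
        self.skip_depth = 0  # > 0 while inside a removed element
        self.out = []

    def handle_starttag(self, tag, attrs):
        if tag in self.tags:
            if tag not in VOID_TAGS:
                self.skip_depth += 1
            return
        if not self.skip_depth:
            self.out.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing syntax (e.g. <img/>) has no content to skip.
        if not self.skip_depth and tag not in self.tags:
            self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag in self.tags:
            if self.skip_depth:
                self.skip_depth -= 1
            return
        if not self.skip_depth:
            self.out.append(f"</{tag}>")  # end tags are re-emitted in lowercase

    def handle_data(self, data):
        if not self.skip_depth:
            self.out.append(data)

    def handle_entityref(self, name):
        if not self.skip_depth:
            self.out.append(f"&{name};")

    def handle_charref(self, name):
        if not self.skip_depth:
            self.out.append(f"&#{name};")

    def handle_comment(self, data):
        if not self.skip_depth:
            self.out.append(f"<!--{data}-->")

    def handle_decl(self, decl):
        if not self.skip_depth:
            self.out.append(f"<!{decl}>")


def remove_non_text_nodes(html: str, selectors: Optional[Iterable[str]] = None) -> str:
    # Passing selectors (plain tag names only in this sketch) replaces the defaults.
    parser = NonTextRemover(DEFAULT_NON_TEXT_TAGS if selectors is None else selectors)
    parser.feed(html)
    parser.close()
    return "".join(parser.out)


print(remove_non_text_nodes('<p>Hi<script>x()</script> there</p><video><source src="a.mp4"></video>'))
# prints <p>Hi there</p>
```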
CleanInput.removeAttributes
● Boolean
scalar
When true (default is false), removes all attributes from all DOM nodes. This is useful for cleaning up HTML markup while preserving its overall structure. To remove only specific attributes, list them in the attributes argument.
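As a rough illustration of the two modes (strip everything, or strip only listed attributes), here is a Python sketch built on the standard html.parser module. AttributeStripper and strip_attributes are hypothetical names; the sketch only models the behavior described above and is not the service's code. Attribute values that are kept are re-escaped because html.parser unescapes them while parsing.

```python
from html import escape
from html.parser import HTMLParser
from typing import Iterable, Optional


class AttributeStripper(HTMLParser):
    """Re-emits HTML, dropping every attribute or only the named ones."""

    def __init__(self, attributes: Optional[Iterable[str]] = None):
        super().__init__(convert_charrefs=False)  # keep entities in text as written
        # None means "remove every attribute", like removeAttributes with no list.
        self.only = None if attributes is None else {a.lower() for a in attributes}
        self.out = []

    def _render(self, tag, attrs, self_closing=False):
        kept = [] if self.only is None else [(k, v) for k, v in attrs if k not in self.only]
        parts = [tag]
        for name, value in kept:
            # Boolean attributes (e.g. disabled) arrive with value None.
            parts.append(name if value is None else f'{name}="{escape(value, quote=True)}"')
        return "<" + " ".join(parts) + (" />" if self_closing else ">")

    def handle_starttag(self, tag, attrs):
        self.out.append(self._render(tag, attrs))

    def handle_startendtag(self, tag, attrs):
        self.out.append(self._render(tag, attrs, self_closing=True))

    def handle_endtag(self, tag):
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append(f"&{name};")

    def handle_charref(self, name):
        self.out.append(f"&#{name};")

    def handle_comment(self, data):
        self.out.append(f"<!--{data}-->")

    def handle_decl(self, decl):
        self.out.append(f"<!{decl}>")


def strip_attributes(html: str, attributes: Optional[Iterable[str]] = None) -> str:
    parser = AttributeStripper(attributes)
    parser.feed(html)
    parser.close()
    return "".join(parser.out)


print(strip_attributes('<a href="/x" class="btn">Go</a>'))             # <a>Go</a>
print(strip_attributes('<a href="/x" class="btn">Go</a>', ["class"]))  # <a href="/x">Go</a>
```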
CleanInput.removeRegex
● Boolean
scalar
When true (the default), removes characters in the HTML that match a set of regex patterns, which are run in order. The default patterns remove newlines, carriage returns, tabs, runs of multiple spaces, and HTML comments, in that order. You can supply your own patterns with the regexes argument.
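The ordered replacement can be sketched with Python's re module. The patterns in DEFAULT_REGEXES are assumptions written to match the description (newlines, carriage returns, tabs, runs of spaces, HTML comments); the service's actual default patterns are not listed here. Each match is replaced with a single space, as described for the regexes field.

```python
import re
from typing import Iterable, Optional

# Hypothetical defaults, applied in the documented order.
DEFAULT_REGEXES = [
    r"\n",               # newlines
    r"\r",               # carriage returns
    r"\t",               # tabs
    r" {2,}",            # runs of two or more spaces
    r"<!--[\s\S]*?-->",  # HTML comments (non-greedy, may span lines)
]


def clean_with_regexes(html: str, regexes: Optional[Iterable[str]] = None) -> str:
    """Apply each pattern in order, replacing every match with a single space."""
    for pattern in (DEFAULT_REGEXES if regexes is None else regexes):
        html = re.sub(pattern, " ", html)
    return html


print(repr(clean_with_regexes("<p>\n\tHello  <!-- note -->world</p>")))
# '<p> Hello  world</p>'
```

The order matters: because comments are removed after runs of spaces are collapsed, removing a comment that sits between two spaces leaves a double space behind.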
CleanInput.selectors
● [String!]
list scalar
A list of selectors to remove from the page when removeNonTextNodes is set to true (it is true by default).
CleanInput.attributes
● [String]
list scalar
A list of attributes to remove from all DOM nodes. When this isn't specified and removeAttributes is true, all attributes on all DOM nodes are removed. removeAttributes must be set to true for this to take effect.
CleanInput.regexes
● [String!]
list scalar
When removeRegex is set to true, each pattern in this list, written without the beginning and ending /, is removed from the page. The patterns are run in order, and each match is replaced with a single space character so that text on either side of a match is not joined together.
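Because regexes takes bare pattern sources, a pattern written as a JavaScript-style literal such as /\s{2,}/ needs its delimiters dropped, and its backslashes are escaped again when the list is serialized as JSON for GraphQL variables. The to_pattern_source helper below is hypothetical; it strips the slashes and, as a simplifying assumption, rejects trailing flags such as /i, since the field description says nothing about flag support.

```python
import json


def to_pattern_source(literal: str) -> str:
    """Turn a '/pattern/' literal into the bare 'pattern' form that regexes expects."""
    if len(literal) >= 2 and literal.startswith("/") and literal.endswith("/"):
        return literal[1:-1]
    if literal.startswith("/"):
        # e.g. '/abc/i': flags cannot be expressed in a bare pattern string.
        raise ValueError(f"unsupported regex literal: {literal!r}")
    return literal  # already a bare pattern


literals = [r"/\s{2,}/", r"/<!--[\s\S]*?-->/", r"/&nbsp;/"]
clean = {"removeRegex": True, "regexes": [to_pattern_source(lit) for lit in literals]}
print(json.dumps(clean))
# {"removeRegex": true, "regexes": ["\\s{2,}", "<!--[\\s\\S]*?-->", "&nbsp;"]}
```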