Mapping
The mapping feature provides a flexible way to extract structured data from web pages by specifying DOM selectors or JavaScript snippets. This guide covers advanced tips, best practices, and lesser-known techniques to help you get the most out of it.
The mapping feature allows you to:
- Select multiple DOM nodes simultaneously.
- Retrieve structured data (text, HTML, or attributes).
- Map nested selectors to capture hierarchical DOM structures.
- Assign aliases to make your returned JSON more meaningful.
Use specific CSS selectors so each map returns only the nodes you need.
Creating JSON with mapSelector
The mapSelector function offers an intuitive alternative to typical parsing. It works much like the map function in functional programming applied to a NodeList, such as the result of document.querySelectorAll.
You can extract DOM attributes with the attribute(name: "data-custom-attribute") field, which returns an object with name and value properties.
The query below demonstrates how to:
- Navigate to https://news.ycombinator.com.
- Create a map named posts to extract all .submission .titleline > a elements.
- Return an array of objects, each containing the href attribute as structured JSON.
Mutation:
mutation scraping_example {
  goto(
    url: "https://news.ycombinator.com",
    waitUntil: firstContentfulPaint
  ) {
    status
  }
  posts: mapSelector(selector: ".submission .titleline > a", wait: true) {
    link: attribute(name: "href") {
      value
    }
  }
}
Example response:
{
  "data": {
    "goto": {
      "status": 200
    },
    "posts": [
      {
        "link": {
          "value": "https://churchofturing.github.io/landscapeoflisp.html"
        }
      },
      {
        "link": {
          "value": "https://www.jjj.de/fxt/fxtbook.pdf"
        }
      },
      ...
      {
        "link": {
          "value": "https://ereader-swedish.fly.dev/"
        }
      }
    ]
  }
}
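Once the response arrives, the nested link objects can be flattened into a plain list of URLs. The following is a minimal Python sketch, assuming the response has already been parsed into a dictionary with the shape shown above. The helper name extract_links and the sample data are hypothetical and not part of the API.

```python
from typing import Any


def extract_links(response: dict[str, Any]) -> list[str]:
    """Collect every link.value from the posts array of a parsed response."""
    posts = response.get("data", {}).get("posts") or []
    links = []
    for post in posts:
        link = post.get("link") or {}
        value = link.get("value")
        if value:  # skip entries whose href is missing or empty
            links.append(value)
    return links


# Hypothetical sample that mirrors the example response shape.
sample = {
    "data": {
        "goto": {"status": 200},
        "posts": [
            {"link": {"value": "https://example.com/a"}},
            {"link": {"value": None}},
            {"link": {"value": "https://example.com/b"}},
        ],
    }
}

urls = extract_links(sample)
```

Skipping empty values keeps the list usable even when an anchor has no href, which is a defensive choice of this sketch rather than documented behavior.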
Smart Use of Aliases
Aliases rename fields in the returned JSON, so each key describes the data it holds instead of repeating the generic field name.
Example:
mutation ProductDetails {
  goto(url: "https://example.com/products") {
    status
  }
  products: mapSelector(selector: ".product-item") {
    title: mapSelector(selector: ".product-title") { innerText }
    price: mapSelector(selector: ".product-price") { innerText }
  }
}
Handling Arbitrary DOM Attributes
Retrieve any custom or data-* attribute by passing its name to attribute:
Example:
mutation CustomAttributes {
  goto(url: "https://example.com") {
    status
  }
  items: mapSelector(selector: "[data-item-id]") {
    id: attribute(name: "data-item-id") {
      name
      value
    }
  }
}
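Because attribute returns an object with name and value, each mapped item can be collapsed into a plain value on the client side. Here is a minimal Python sketch, assuming the query above yields an items array whose entries each hold an id object. The function name item_ids and the sample values are hypothetical.

```python
def item_ids(items):
    """Return the data-item-id value of each mapped item, in response order."""
    ids = []
    for item in items:
        attr = item.get("id") or {}
        # Keep only entries that carry the expected attribute and a value.
        if attr.get("name") == "data-item-id" and attr.get("value") is not None:
            ids.append(attr["value"])
    return ids


# Hypothetical sample; real values depend on the page being scraped.
sample_items = [
    {"id": {"name": "data-item-id", "value": "101"}},
    {"id": {"name": "data-item-id", "value": "102"}},
    {"id": None},
]

ids = item_ids(sample_items)
```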
Nested Mapping for Hierarchical Data
Nest mapSelector calls to extract deeply structured data. The returned JSON preserves the hierarchy of the query, which makes it easier to handle:
Example:
mutation NestedMappingExample {
  goto(url: "https://example.com/categories") {
    status
  }
  categories: mapSelector(selector: ".category") {
    categoryName: innerText
    subcategories: mapSelector(selector: ".subcategory") {
      subcategoryName: innerText
    }
  }
}
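Hierarchical results are often easier to store or export as flat rows. The sketch below is one way to do that in Python, assuming each nested mapSelector alias comes back as a list of objects under its parent. That shape is an assumption based on the query above, and flatten_categories and the sample data are hypothetical.

```python
def flatten_categories(categories):
    """Return one (category, subcategory) tuple per nested subcategory."""
    rows = []
    for category in categories:
        name = (category.get("categoryName") or "").strip()
        for sub in category.get("subcategories") or []:
            sub_name = (sub.get("subcategoryName") or "").strip()
            rows.append((name, sub_name))
    return rows


# Hypothetical sample in the assumed nested shape.
sample_categories = [
    {
        "categoryName": "Books",
        "subcategories": [
            {"subcategoryName": "Fiction"},
            {"subcategoryName": " History "},
        ],
    },
    {"categoryName": "Music", "subcategories": []},
]

rows = flatten_categories(sample_categories)
```

A category with no subcategories produces no rows in this sketch; emit a row with an empty subcategory instead if such categories need to be kept.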
Advanced Nested Example
Further illustrating nested mapping, this example retrieves metadata such as author and score:
mutation map_selector_example_with_metadata {
  goto(url: "https://news.ycombinator.com") {
    status
  }
  posts: mapSelector(selector: ".subtext .subline") {
    author: mapSelector(selector: ".hnuser") {
      authorName: innerText
    }
    score: mapSelector(selector: ".score") {
      score: innerText
    }
  }
}
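The metadata returned by this query usually needs light post-processing, for example turning score text into a number. The Python sketch below assumes that each nested alias (author, score) is returned as a list that may be empty when the selector matches nothing, and that the score text contains a number such as "128 points". Both are assumptions, and summarize_post is a hypothetical helper.

```python
import re


def summarize_post(post):
    """Reduce one mapped post to its first author and a numeric score."""
    authors = post.get("author") or []
    scores = post.get("score") or []
    author = authors[0].get("authorName") if authors else None
    score_text = scores[0].get("score") if scores else None
    # Pull the first run of digits out of text such as "128 points".
    match = re.search(r"\d+", score_text or "")
    points = int(match.group()) if match else None
    return {"author": author, "points": points}


# Hypothetical samples: one complete post and one without metadata.
complete = {"author": [{"authorName": "alice"}], "score": [{"score": "128 points"}]}
empty = {"author": [], "score": []}

summary = summarize_post(complete)
fallback = summarize_post(empty)
```

Returning None for missing fields, rather than raising, lets a batch of posts be processed even when some rows lack an author or a score.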
Conditional Wait and Timeout Adjustments
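For dynamically loaded content, set wait: true so mapSelector waits for the selector to appear, and use timeout to set how long it waits, in milliseconds: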
Example:
mutation TimeoutExample {
  goto(url: "https://example.com") {
    status
  }
  delayedItems: mapSelector(selector: ".async-loaded-item", timeout: 60000, wait: true) {
    content: innerText
  }
}