ScraperSensei's selector engine builds on CSS selectors and adds text matching, layout-based matching, XPath, and React and Vue component selectors. This page demonstrates common selector patterns, each with HTML examples showing what does and does not match, to help you extract data from web pages.

Text Matching

Basic Text Matching

article:has-text("ScraperSensei")

Matches any <article> element that contains the text “ScraperSensei”, including text nested inside child elements.

<!-- Will match -->
<article>
  <div>Welcome to ScraperSensei</div>
</article>

<!-- Will match -->
<article>ScraperSensei Documentation</article>

<!-- Will not match -->
<div>ScraperSensei</div>
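
The containment check described above can be illustrated outside ScraperSensei. The following is a minimal Python sketch, not the engine's implementation: it assumes a case-sensitive substring match (this page does not state how case is handled), and the helper names has_text and select_has_text are hypothetical.

```python
import xml.etree.ElementTree as ET


def has_text(element: ET.Element, needle: str) -> bool:
    """True if the element's text, including all descendants, contains needle."""
    return needle in "".join(element.itertext())


def select_has_text(html: str, tag: str, needle: str) -> list:
    """Emulate tag:has-text("needle") over a well-formed HTML fragment."""
    root = ET.fromstring(f"<root>{html}</root>")
    return [el for el in root.iter(tag) if has_text(el, needle)]


SAMPLE = """
<article><div>Welcome to ScraperSensei</div></article>
<article>ScraperSensei Documentation</article>
<div>ScraperSensei</div>
"""

matches = select_has_text(SAMPLE, "article", "ScraperSensei")
print(len(matches))  # 2
```

Both <article> elements are selected, while the standalone <div> is skipped because only <article> tags are considered.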

Exact Text Matching

button:text-is("Log")

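Matches elements whose text is exactly the given string. The comparison is case-sensitive and ignores leading and trailing whitespace. As the first example shows, text inside a child element (the <span>) does not count toward the comparison.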

<!-- Will match -->
<button> Log <span>in</span></button>

<!-- Will not match -->
<button>Log in</button>
<button>log</button>
<button>Login</button>
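
One way to read the examples above is that the comparison is made against the element's own text nodes, trimmed of surrounding whitespace. The sketch below encodes that reading in Python; it is an assumption inferred from the examples, not the engine's documented algorithm, and own_text_nodes and text_is are hypothetical names.

```python
import xml.etree.ElementTree as ET


def own_text_nodes(element):
    """Yield text nodes that are direct children of element (not of its descendants)."""
    if element.text:
        yield element.text
    for child in element:
        if child.tail:
            yield child.tail


def text_is(element, expected: str) -> bool:
    """True if any direct text node equals expected after trimming whitespace."""
    return any(node.strip() == expected for node in own_text_nodes(element))


BUTTONS = [
    "<button> Log <span>in</span></button>",
    "<button>Log in</button>",
    "<button>log</button>",
    "<button>Login</button>",
]

results = [text_is(ET.fromstring(markup), "Log") for markup in BUTTONS]
print(results)  # [True, False, False, False]
```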

Text Pattern Matching

button:text-matches("Log\\s*in", "i")

Matches text using regex patterns. The example matches “Login”, “Log in”, “log IN”, etc.

<!-- All of these will match -->
<button>Login</button>
<button>Log in</button>
<button>LOG IN</button>
<button>log in</button>
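
The same pattern can be checked with Python's re module. This sketch assumes the doubled backslash in the selector is string-literal escaping for the regex token \s, and that the "i" flag means case-insensitive matching; text_matches is a hypothetical helper.

```python
import re

# Equivalent of the selector's pattern "Log\\s*in" with the "i" flag.
PATTERN = re.compile(r"Log\s*in", re.IGNORECASE)


def text_matches(text: str) -> bool:
    """True if the pattern occurs anywhere in the text."""
    return PATTERN.search(text) is not None


labels = ["Login", "Log in", "LOG IN", "log in", "Register"]
outcomes = [text_matches(label) for label in labels]
print(outcomes)  # [True, True, True, True, False]
```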

Layout-Based Selectors

Position-Based

input:right-of(:text("Username"))

Matches input fields that are to the right of the text “Username”.

<!-- Will match -->
<div>
  <label>Username</label>
  <input type="text" />
</div>

<!-- Will not match -->
<div>
  <input type="text" />
  <label>Username</label>
</div>

button:near(.promo-card)

Matches buttons within 50 pixels of elements with class “promo-card”. No distance is given here, so 50 pixels is the default.

<!-- Will match if button is within 50px of the div -->
<div class="promo-card">Special Offer!</div>
<button>Buy Now</button>

Distance Specification

button:near(:text("Username"), 120)

Matches buttons within 120 pixels of the text “Username”. The second argument sets the maximum distance.

<!-- Will match if button is within 120px -->
<div>Username</div>
<button>Click me</button>
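
To make the distance threshold concrete, here is a small Python sketch of a proximity test on bounding boxes. It assumes distance is measured as the shortest gap between the two boxes' edges; this page does not specify the exact metric the engine uses, and Box, gap and near are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    left: float
    top: float
    width: float
    height: float

    @property
    def right(self) -> float:
        return self.left + self.width

    @property
    def bottom(self) -> float:
        return self.top + self.height


def gap(a: Box, b: Box) -> float:
    """Shortest distance between the edges of two boxes (0 if they touch or overlap)."""
    dx = max(b.left - a.right, a.left - b.right, 0.0)
    dy = max(b.top - a.bottom, a.top - b.bottom, 0.0)
    return (dx * dx + dy * dy) ** 0.5


def near(a: Box, b: Box, max_distance: float = 50.0) -> bool:
    """Emulate :near(target, max_distance) with the default of 50 pixels."""
    return gap(a, b) <= max_distance


label = Box(left=0, top=0, width=80, height=20)
button = Box(left=180, top=0, width=100, height=20)  # 100px to the right of the label

print(near(button, label))       # False: 100px exceeds the default 50px
print(near(button, label, 120))  # True: within the explicit 120px limit
```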

Element State and Visibility

button:visible

Matches only visible button elements.

<!-- Will match -->
<button>Visible Button</button>

<!-- Will not match -->
<button style="display: none">Hidden Button</button>
<button hidden>Hidden Button</button>
<button style="visibility: hidden">Hidden Button</button>
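
The three hidden cases above can be expressed as a simple attribute check. The Python sketch below is a deliberate simplification: it only inspects the hidden attribute and inline style, whereas a real engine would need computed styles and layout. The names is_visible and ButtonCollector are hypothetical.

```python
from html.parser import HTMLParser


def is_visible(attrs: dict) -> bool:
    """Approximate visibility from the hidden attribute and inline style only."""
    if "hidden" in attrs:
        return False
    declarations = {}
    for declaration in (attrs.get("style") or "").split(";"):
        if ":" in declaration:
            name, value = declaration.split(":", 1)
            declarations[name.strip().lower()] = value.strip().lower()
    return (
        declarations.get("display") != "none"
        and declarations.get("visibility") != "hidden"
    )


class ButtonCollector(HTMLParser):
    """Record, in document order, whether each <button> looks visible."""

    def __init__(self):
        super().__init__()
        self.visibility = []

    def handle_starttag(self, tag, attrs):
        if tag == "button":
            self.visibility.append(is_visible(dict(attrs)))


collector = ButtonCollector()
collector.feed(
    '<button>Visible Button</button>'
    '<button style="display: none">Hidden Button</button>'
    '<button hidden>Hidden Button</button>'
    '<button style="visibility: hidden">Hidden Button</button>'
)
print(collector.visibility)  # [True, False, False, False]
```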

Nested Elements

Has-Text with Specific Elements

article:has-text("ScraperSensei")

Matches article elements containing “ScraperSensei” text anywhere inside.

<!-- Will match -->
<article>
  <h2>Getting Started with ScraperSensei</h2>
  <p>Some content...</p>
</article>

<!-- Will match -->
<article>ScraperSensei Guide</article>

<!-- Will not match -->
<section>ScraperSensei Guide</section>

Parent-Child Relationships

#nav-bar :text("Home")

Matches elements with the text “Home” that are descendants of the #nav-bar element.

<!-- Will match -->
<nav id="nav-bar">
  <a>Home</a>
  <a>About</a>
</nav>

<!-- Will not match -->
<div id="content">
  <a>Home</a>
</div>

Multiple Conditions

button:has-text("Log in"), button:has-text("Sign in")

Matches buttons containing either “Log in” or “Sign in” text.

<!-- Both will match -->
<button>Log in</button>
<button>Sign in</button>

<!-- Will not match -->
<button>Register</button>

XPath Selectors

xpath=//button

Matches any button element anywhere in the document.

<!-- All will match -->
<button>Click me</button>
<div><button>Nested button</button></div>
<form><button>Submit</button></form>

xpath=//div[@id='main']//button

Matches button elements at any depth inside the div with id=“main”.

<!-- Will match -->
<div id="main">
  <button>Inside main</button>
  <div>
    <button>Nested inside main</button>
  </div>
</div>

<!-- Will not match -->
<div id="sidebar">
  <button>Outside main</button>
</div>
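
Both XPath expressions can be tried with Python's standard library. This is an illustrative sketch only: xml.etree.ElementTree supports a limited XPath subset and requires relative paths, so the leading // is written as .// here.

```python
import xml.etree.ElementTree as ET

DOC = ET.fromstring("""
<body>
  <div id="main">
    <button>Inside main</button>
    <div>
      <button>Nested inside main</button>
    </div>
  </div>
  <div id="sidebar">
    <button>Outside main</button>
  </div>
</body>
""")

# Counterpart of xpath=//button: every button in the document.
all_buttons = [b.text for b in DOC.findall(".//button")]

# Counterpart of xpath=//div[@id='main']//button: buttons at any depth under #main.
main_buttons = [b.text for b in DOC.findall(".//div[@id='main']//button")]

print(all_buttons)   # ['Inside main', 'Nested inside main', 'Outside main']
print(main_buttons)  # ['Inside main', 'Nested inside main']
```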

Label and Form Controls

label:has-text("Password")

Matches a label element containing the text “Password”. The match can be used to target the label's associated input.

<!-- Will match -->
<label for="pwd">Password:</label>
<input id="pwd" type="password">

<!-- Will match -->
<label>
  Password:
  <input type="password">
</label>

Framework-Specific Selectors

React Components

_react=BookItem

Matches React components named BookItem.

// Will match
<BookItem title="React Guide" />

// Will match
<BookItem author="John Doe" year={2023} />

_react=BookItem[author = "Steven King"]

Matches BookItem components whose author prop equals “Steven King”.

// Will match
<BookItem author="Steven King" />

// Will not match
<BookItem author="John Doe" />

Vue Components

_vue=book-item

Matches Vue components named book-item.

<!-- Will match -->
<book-item></book-item>
<book-item title="Vue Guide"></book-item>

_vue=book-item[author = "Steven King"]

Matches book-item components whose author prop equals “Steven King”.

<!-- Will match -->
<book-item author="Steven King"></book-item>

<!-- Will not match -->
<book-item author="John Doe"></book-item>

Testing-Specific Attributes

data-testid=submit

Matches elements with data-testid=“submit”.

<!-- Will match -->
<button data-testid="submit">Submit</button>
<input data-testid="submit" type="submit">

<!-- Will not match -->
<button data-testid="cancel">Cancel</button>

id=login-form

Matches elements with id=“login-form”.

<!-- Will match -->
<form id="login-form">
  <input type="text">
</form>

<!-- Will not match -->
<form id="signup-form">
  <input type="text">
</form>

CSS Selectors

Basic CSS

css=button

Matches any button element using a standard CSS selector.

<!-- All will match -->
<button>Click me</button>
<button class="primary">Submit</button>
<button id="cancel">Cancel</button>

CSS with Text Matching

css=#nav-bar :text("Home")

Matches the smallest element containing the text “Home” inside #nav-bar.

<!-- Will match the <a> element; the inner <div> also contains "Home", so it satisfies the description above as well -->
<div id="nav-bar">
  <a>Home</a>
  <div>Welcome Home</div>
</div>

<!-- Will not match -->
<div id="content">
  <a>Home</a>
</div>

CSS with Has Selector

article:has(div.promo)

Matches article elements that contain a div with class “promo”.

<!-- Will match -->
<article>
  <div class="promo">Special offer!</div>
</article>

<!-- Will not match -->
<article>
  <div class="content">No promo here</div>
</article>

Nth Match Selectors

:nth-match(:text("Buy"), 3)

Matches the third element containing the text “Buy”, counted in document order regardless of nesting.

<!-- Third button will match -->
<section> <button>Buy</button> </section>
<article><div> <button>Buy</button> </div></article>
<div><div> <button>Buy</button> </div></div>
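
The counting behaviour can be sketched in Python as "collect all matches in document order, then take the n-th, 1-based". This is an illustration under that assumption; nth_match and is_buy are hypothetical helpers, and is_buy only compares an element's own leading text.

```python
import xml.etree.ElementTree as ET


def nth_match(root, predicate, n: int):
    """Return the n-th element (1-based, document order) satisfying predicate, or None."""
    if n < 1:
        raise ValueError("n is 1-based and must be at least 1")
    matches = [el for el in root.iter() if predicate(el)]
    return matches[n - 1] if n <= len(matches) else None


def is_buy(element) -> bool:
    """True for elements whose own text is "Buy" after trimming whitespace."""
    return (element.text or "").strip() == "Buy"


PAGE = ET.fromstring(
    "<body>"
    "<section> <button>Buy</button> </section>"
    "<article><div> <button>Buy</button> </div></article>"
    "<div><div> <button>Buy</button> </div></div>"
    "</body>"
)

third = nth_match(PAGE, is_buy, 3)
print(third.tag)                    # button
print(nth_match(PAGE, is_buy, 4))   # None: only three matches exist
```

The three buttons live in different containers, yet the third one is still found because matches are counted across the whole document rather than per parent.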

Chaining Selectors

Basic Chaining

article >> .bar > .baz >> span[attr=value]

Chains multiple selectors, each queried relative to the previous match.

<!-- Will match the span -->
<article>
  <div class="bar">
    <div class="baz">
      <span attr="value">Target</span>
    </div>
  </div>
</article>

<!-- Will not match -->
<article>
  <span attr="value">Wrong place</span>
</article>
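
The "each step is queried relative to the previous match" rule can be sketched as a loop that narrows a list of scopes. The Python below is an illustration, not the engine's code: query_chain is a hypothetical helper, and each step of the chain is rewritten by hand into the XPath subset that xml.etree.ElementTree understands.

```python
import xml.etree.ElementTree as ET


def query_chain(root, steps):
    """Run each step relative to the elements matched by the previous step."""
    current = [root]
    for step in steps:
        found = []
        for scope in current:
            for element in scope.findall(step):
                if element not in found:  # keep document order, drop duplicates
                    found.append(element)
        current = found
    return current


# article >> .bar > .baz >> span[attr=value], one ElementTree path per chained step.
STEPS = [
    ".//article",
    ".//*[@class='bar']/*[@class='baz']",
    ".//span[@attr='value']",
]

MATCHING = ET.fromstring(
    '<body><article><div class="bar"><div class="baz">'
    '<span attr="value">Target</span>'
    "</div></div></article></body>"
)
NOT_MATCHING = ET.fromstring(
    '<body><article><span attr="value">Wrong place</span></article></body>'
)

hits = [el.text for el in query_chain(MATCHING, STEPS)]
misses = query_chain(NOT_MATCHING, STEPS)
print(hits)    # ['Target']
print(misses)  # []
```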

Intermediate Matches

*css=article >> text=Hello

The * prefix captures the article element instead of the text element.

<!-- Will match the article element -->
<article>
  <div>Hello</div>
  <p>World</p>
</article>

<!-- Will not match -->
<section>
  <div>Hello</div>
</section>
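
Capturing an intermediate match can be thought of as "return the elements from the starred step, but only those for which the rest of the chain still finds something". The Python sketch below follows that reading; run and query_with_capture are hypothetical helpers, and text=Hello is approximated with ElementTree's [.='Hello'] predicate (Python 3.7+).

```python
import xml.etree.ElementTree as ET


def run(scopes, steps):
    """Apply each step relative to the elements matched by the previous one."""
    for step in steps:
        scopes = [el for scope in scopes for el in scope.findall(step)]
    return scopes


def query_with_capture(root, steps, capture_index: int):
    """Return matches of steps[capture_index] for which the remaining steps match."""
    captured = run([root], steps[: capture_index + 1])
    remaining = steps[capture_index + 1:]
    return [el for el in captured if run([el], remaining)]


# *css=article >> text=Hello: capture step 0 (the article), require step 1 to match.
STEPS = [".//article", ".//*[.='Hello']"]

WITH_ARTICLE = ET.fromstring(
    "<body><article><div>Hello</div><p>World</p></article></body>"
)
WITHOUT_ARTICLE = ET.fromstring(
    "<body><section><div>Hello</div></section></body>"
)

captured_tags = [el.tag for el in query_with_capture(WITH_ARTICLE, STEPS, 0)]
print(captured_tags)                                   # ['article']
print(query_with_capture(WITHOUT_ARTICLE, STEPS, 0))   # []
```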

Layout Combinations

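[type=radio]:left-of(:text("Label 3")):near(.form-group)

Combines a position condition (:left-of) with a proximity condition (:near): matches radio inputs that are to the left of the text “Label 3” and near an element with class “form-group”.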

<!-- Will match if within proximity -->
<div class="form-group">
  <input type="radio">
  <label>Label 3</label>
</div>

<!-- Will not match if too far -->
<div class="form-group">
  <input type="radio">
</div>
<div>
  <label>Label 3</label>
</div>

Role-Based Selectors

[role="button"][aria-label="Submit"]

Matches elements that have both the role=“button” attribute and the aria-label=“Submit” attribute.

<!-- Will match -->
<div role="button" aria-label="Submit">Click me</div>

<!-- Will not match -->
<div role="button">Submit</div>
<button aria-label="Submit">Click me</button>

Union Selectors

xpath=//span[contains(@class, 'spinner__loading')]|//div[@id='confirmation']

Matches elements that satisfy either of the two XPath expressions joined by |.

<!-- Both will match -->
<span class="spinner__loading"></span>
<div id="confirmation">Confirmed!</div>

<!-- Will not match -->
<span class="spinner">Loading...</span>
<div id="other">Not confirmation</div>

CSS Pseudo-Classes

Visibility Matching

button:visible

Matches only visible buttons, which is useful for distinguishing between otherwise similar elements when some of them are hidden.

<!-- Will match -->
<button>Visible button</button>

<!-- Will not match any of these -->
<button style="display: none">Invisible</button>
<button style="visibility: hidden">Hidden</button>
<button hidden>Hidden</button>

Text Content Matching

article:has-text("ScraperSensei")

Matches elements containing specified text somewhere inside.

<!-- Will match -->
<article>
  <div>Testing with ScraperSensei</div>
  <p>Some other content</p>
</article>

<!-- Will not match -->
<div>ScraperSensei</div>

Multiple Text Conditions

button:has-text("Log in"), button:has-text("Sign in")

Matches elements that satisfy any of the text conditions.

<!-- Both will match -->
<button>Log in</button>
<button>Sign in</button>

<!-- Will not match -->
<button>Register</button>
<a>Log in</a>

Element Containment

article:has(div.promo)

Matches article elements that contain a matching descendant, here a div with class “promo”.

<!-- Will match -->
<article>
  <div class="promo">Special offer!</div>
  <p>Content</p>
</article>

<!-- Will not match -->
<article>
  <div>No promo here</div>
</article>

Nth Element Selection

:nth-match(:text("Buy"), 3)

Matches the nth occurrence of an element (1-based index).

<!-- Third "Buy" button will match -->
<div>
  <button>Buy</button>
  <button>Buy</button>
  <button>Buy</button>  <!-- This one matches -->
  <button>Buy</button>
</div>

Advanced Layout Selectors

Above/Below

button:above(.footer)

Matches buttons that are above the footer element.

<!-- Will match if positioned above -->
<button>Click me</button>
<div class="content">Some content</div>
<footer class="footer">Footer content</footer>

input:below(.header)

Matches inputs that are below the header element.

<!-- Will match if positioned below -->
<header class="header">Header content</header>
<div class="content">
  <input type="text" />  <!-- This will match -->
</div>

Left/Right Positioning

button:right-of(.sidebar)

Matches buttons positioned to the right of the sidebar.

<!-- Will match if positioned to the right -->
<div class="layout">
  <div class="sidebar">Menu</div>
  <button>Action</button>  <!-- This will match -->
</div>
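
The four directional selectors can be pictured as comparisons between the edges of two bounding boxes. The Python sketch below assumes each test looks at one axis only and ignores alignment on the other axis; this page does not spell out the exact rule, and Box and the four helper functions are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    left: float
    top: float
    right: float
    bottom: float


def right_of(a: Box, b: Box) -> bool:
    """a starts at or beyond b's right edge."""
    return a.left >= b.right


def left_of(a: Box, b: Box) -> bool:
    """a ends at or before b's left edge."""
    return a.right <= b.left


def above(a: Box, b: Box) -> bool:
    """a ends at or before b's top edge (y grows downward)."""
    return a.bottom <= b.top


def below(a: Box, b: Box) -> bool:
    """a starts at or beyond b's bottom edge."""
    return a.top >= b.bottom


# Illustrative coordinates in pixels for the layouts shown above.
sidebar = Box(left=0, top=0, right=200, bottom=600)
action_button = Box(left=220, top=10, right=320, bottom=50)
header = Box(left=0, top=0, right=800, bottom=60)
text_input = Box(left=20, top=100, right=220, bottom=130)
footer = Box(left=0, top=500, right=800, bottom=560)

print(right_of(action_button, sidebar))  # True
print(below(text_input, header))         # True
print(above(text_input, footer))         # True
print(left_of(action_button, sidebar))   # False
```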

Near Elements

button:near(.card, 100)

Matches buttons within 100 pixels of a card element.

<!-- Will match if within 100px -->
<div class="card">
  Product details
  <button>Add to cart</button>
</div>

<!-- Will not match if beyond 100px -->
<div class="card">Product details</div>
<div class="spacer"></div>
<button>Add to cart</button>

React Component Properties

Multiple Property Matching

_react=BookItem[author *= "king" i][year = 1990]

Matches React components that satisfy every property condition: here, author contains “king” (case-insensitive) and year equals 1990.

// Will match
<BookItem author="Stephen King" year={1990} />

// Will not match
<BookItem author="Stephen King" year={1991} />
<BookItem author="John Doe" year={1990} />

Nested Property Values

_react=[some.nested.value = 12]

Matches components with specific nested property values.

// Will match
<Component some={{ nested: { value: 12 }}} />

// Will not match
<Component some={{ nested: { value: 13 }}} />

Property Pattern Matching

_react=BookItem[author = /Steven(\\s+King)?/i]

Matches components where properties match a regex pattern.

// All of these will match
<BookItem author="Steven" />
<BookItem author="Steven King" />
<BookItem author="steven king" />

// Will not match
<BookItem author="Stephen King" />
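
The property conditions in this section (equality, case-insensitive substring, nested paths, regex) can be sketched as small predicates over a props dictionary. The Python below is an illustration of the matching logic only, not how ScraperSensei inspects React components; every function name is hypothetical, and the regex is applied with an unanchored search.

```python
import re
from functools import reduce


def get_path(props: dict, path: str):
    """Resolve a dotted path such as 'some.nested.value'; None if a key is missing."""
    def step(value, key):
        return value.get(key) if isinstance(value, dict) else None
    return reduce(step, path.split("."), props)


def equals(path, expected):
    """Counterpart of [path = expected]."""
    return lambda props: get_path(props, path) == expected


def contains_ci(path, needle):
    """Counterpart of [path *= "needle" i]: case-insensitive substring."""
    def check(props):
        value = get_path(props, path)
        return isinstance(value, str) and needle.lower() in value.lower()
    return check


def matches(path, pattern, flags=0):
    """Counterpart of [path = /pattern/flags]."""
    regex = re.compile(pattern, flags)
    def check(props):
        value = get_path(props, path)
        return isinstance(value, str) and regex.search(value) is not None
    return check


def all_of(*conditions):
    """Several bracketed conditions must all hold."""
    return lambda props: all(condition(props) for condition in conditions)


king_1990 = all_of(contains_ci("author", "king"), equals("year", 1990))
nested_12 = equals("some.nested.value", 12)
steven = matches("author", r"Steven(\s+King)?", re.IGNORECASE)

print(king_1990({"author": "Stephen King", "year": 1990}))  # True
print(king_1990({"author": "Stephen King", "year": 1991}))  # False
print(nested_12({"some": {"nested": {"value": 12}}}))       # True
print(steven({"author": "steven king"}))                    # True
print(steven({"author": "Stephen King"}))                   # False
```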