Recipes
Web Scraping at Scale
Scrape data from protected websites using SessionKit's stealth sessions, residential proxies, and automatic retry logic.
Basic Scraping Pattern
import { SessionKit } from '@sessionkit/sdk'
import { chromium } from 'playwright'

const sk = new SessionKit({ apiKey: process.env.SESSIONKIT_API_KEY })

async function scrape(url: string) {
  const session = await sk.sessions.create({
    proxy: { type: 'residential', country: 'US', sticky: true },
    stealth: 'max',
    fingerprint: 'auto',
    timeout: 120,
  })
  const browser = await chromium.connectOverCDP(session.cdpUrl)
  const page = await browser.newPage()
  try {
    await page.goto(url, { waitUntil: 'networkidle' })
    // Extract data
    const data = await page.evaluate(() => {
      const items = document.querySelectorAll('.product-card')
      return Array.from(items).map(item => ({
        title: item.querySelector('h2')?.textContent?.trim(),
        price: item.querySelector('.price')?.textContent?.trim(),
        url: item.querySelector('a')?.href,
      }))
    })
    return data
  } finally {
    await browser.close()
    await sk.sessions.destroy(session.id)
  }
}
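Any field in the extracted records can come back `undefined` when a selector misses, so it may be worth validating the result before storing it. The following is a minimal sketch, assuming the record shape from the example above; `RawProduct`, `Product`, and `cleanProducts` are hypothetical names and not part of the SessionKit SDK.

```typescript
// Hypothetical types mirroring the shape returned by page.evaluate() above.
interface RawProduct {
  title?: string | null
  price?: string | null
  url?: string | null
}

interface Product {
  title: string
  price: string
  url: string
}

// Drop records with missing or empty fields and de-duplicate by URL.
function cleanProducts(raw: readonly RawProduct[]): Product[] {
  const seen = new Set<string>()
  const out: Product[] = []
  for (const { title, price, url } of raw) {
    if (!title || !price || !url) continue // a selector missed; skip the record
    if (seen.has(url)) continue // same product card listed twice
    seen.add(url)
    out.push({ title, price, url })
  }
  return out
}
```

Under these assumptions, `cleanProducts(await scrape(url))` would yield only complete, unique records. Whether to drop or log incomplete records depends on the target site.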
Parallel Scraping with Fleets
For high-throughput scraping, use a fleet to avoid cold-start latency:
// Create a fleet for scraping
const fleet = await sk.fleets.create({
  name: 'product-scraper',
  size: 20,
  maxSize: 50,
  proxy: { type: 'datacenter', region: 'us' },
  stealth: 'basic',
  sessionTimeout: 60,
})

// Scrape multiple URLs in parallel
const urls = ['https://example.com/page/1', 'https://example.com/page/2' /* ... */]
const results = await Promise.all(
  urls.map(async (url) => {
    // Each task acquires a session from the fleet and releases it when done
    const session = await sk.fleets.acquire(fleet.id)
    const browser = await chromium.connectOverCDP(session.cdpUrl)
    const page = await browser.newPage()
    try {
      await page.goto(url)
      const data = await page.evaluate(() => document.title)
      return { url, data }
    } finally {
      await browser.close()
      await sk.fleets.release(fleet.id, session.id)
    }
  })
)
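`Promise.all` over `urls.map(...)` starts one `acquire` per URL at the same time, so a long URL list may ask for more sessions than the fleet holds. How `sk.fleets.acquire` behaves when the fleet is exhausted is not covered here, so a cautious option is to cap concurrency on the client. The sketch below is one way to do that; `mapWithConcurrency` is a hypothetical helper, not part of the SessionKit SDK.

```typescript
// Run `worker` over `items` with at most `limit` calls in flight at once.
// Results come back in the same order as `items`.
async function mapWithConcurrency<T, R>(
  items: readonly T[],
  limit: number,
  worker: (item: T, index: number) => Promise<R>,
): Promise<R[]> {
  if (!Number.isInteger(limit) || limit < 1) {
    throw new RangeError('limit must be a positive integer')
  }
  const results = new Array<R>(items.length)
  let next = 0 // index of the next unclaimed item

  // Each runner claims the next unclaimed index until none remain.
  async function runner(): Promise<void> {
    while (next < items.length) {
      const i = next++
      results[i] = await worker(items[i] as T, i)
    }
  }

  const runnerCount = Math.min(limit, items.length)
  await Promise.all(Array.from({ length: runnerCount }, () => runner()))
  return results
}
```

In the fleet example, the callback passed to `urls.map` would become the `worker`, and `limit` would likely be set no higher than the fleet's `size`. If any worker rejects, the returned promise rejects with that error, matching `Promise.all`.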
Handling Anti-Bot Challenges
async function scrapeProtectedSite(url: string) {
  const session = await sk.sessions.create({
    proxy: { type: 'residential', country: 'US' },
    stealth: 'max',
    fingerprint: 'auto',
  })
  const browser = await chromium.connectOverCDP(session.cdpUrl)
  const page = await browser.newPage()
  try {
    await page.goto(url)
    // Wait for Cloudflare challenge to resolve.
    // Playwright's waitForFunction takes (pageFunction, arg, options),
    // so the timeout must be passed as the third argument.
    await page.waitForFunction(
      () => !document.querySelector('#cf-challenge-running'),
      undefined,
      { timeout: 15000 }
    ).catch(() => {
      console.log('Challenge did not resolve; may need to retry')
    })
    // Check if we passed
    const title = await page.title()
    if (title.includes('Just a moment')) {
      throw new Error('Anti-bot challenge not bypassed')
    }
    // Continue with scraping...
    const content = await page.content()
    return content
  } finally {
    // Release the browser and session even when the challenge check throws
    await browser.close()
    await sk.sessions.destroy(session.id)
  }
}
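When `scrapeProtectedSite` throws because the challenge persisted, one option is to try again after a growing delay. The sketch below shows a generic exponential backoff wrapper; `retryWithBackoff`, `backoffDelay`, and the option names are hypothetical, and SessionKit's own retry features, which this section does not document, may make a wrapper like this unnecessary.

```typescript
interface BackoffOptions {
  retries: number // retries allowed after the first attempt
  baseDelayMs: number // delay before the first retry
  maxDelayMs: number // upper bound on any single delay
  sleep?: (ms: number) => Promise<void> // injectable, mainly for tests
}

// Delay before retry number `attempt` (0-based): base * 2^attempt, capped.
function backoffDelay(attempt: number, baseDelayMs: number, maxDelayMs: number): number {
  return Math.min(maxDelayMs, baseDelayMs * 2 ** attempt)
}

// Call `task` until it resolves or the retry budget is spent.
async function retryWithBackoff<T>(task: () => Promise<T>, opts: BackoffOptions): Promise<T> {
  const sleep =
    opts.sleep ?? ((ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms)))
  let lastError: unknown
  for (let attempt = 0; attempt <= opts.retries; attempt++) {
    try {
      return await task()
    } catch (err) {
      lastError = err
      if (attempt === opts.retries) break // out of retries; rethrow below
      await sleep(backoffDelay(attempt, opts.baseDelayMs, opts.maxDelayMs))
    }
  }
  throw lastError
}
```

A call such as `retryWithBackoff(() => scrapeProtectedSite(url), { retries: 3, baseDelayMs: 1000, maxDelayMs: 15000 })` would give each attempt a fresh session, since `scrapeProtectedSite` creates its own. The delay values here are illustrative, not tuned recommendations.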
Best Practices
- Use residential proxies for anti-bot protected sites
- Enable sticky sessions for multi-page crawls requiring login
- Add random delays between actions to mimic human behavior
- Rotate fingerprints between sessions to avoid tracking
- Respect robots.txt and rate-limit your requests
- Handle errors gracefully with exponential backoff
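The random-delay practice from the list above can be sketched as a small helper. `randomDelayMs` and `humanPause` are hypothetical names, and the default bounds are illustrative rather than tuned values.

```typescript
// Pick a random whole-millisecond delay in [minMs, maxMs] (integer bounds assumed).
function randomDelayMs(
  minMs: number,
  maxMs: number,
  random: () => number = Math.random, // injectable for deterministic tests
): number {
  if (minMs < 0 || maxMs < minMs) {
    throw new RangeError('expected 0 <= minMs <= maxMs')
  }
  return Math.floor(minMs + random() * (maxMs - minMs + 1))
}

// Pause for a random interval, for example between two page interactions.
function humanPause(minMs = 500, maxMs = 2000): Promise<void> {
  return new Promise((resolve) => setTimeout(resolve, randomDelayMs(minMs, maxMs)))
}
```

Calling `await humanPause()` between navigation or click steps spaces actions out irregularly, which also lowers the request rate against the target site.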
Important: Always comply with the target website's Terms of Service and applicable laws when scraping.