Performance

Smippo's vacuum architecture is designed for speed. Learn how to optimize performance for different scenarios, from aggressive captures to polite crawling.

Parallel Workers

Smippo uses multiple browser tabs (workers) to capture pages in parallel.

Default Configuration

smippo https://example.com  # 8 workers (default)

Each worker operates independently, capturing pages and resources simultaneously.

Adjusting Workers

# Conservative: 1 worker
smippo https://example.com --workers 1

# Moderate: 4 workers
smippo https://example.com --workers 4

# Aggressive: 16 workers
smippo https://example.com --workers 16

Worker Guidelines

Workers   Use Case
1         Strict rate-limited sites, debugging
2-4       API-heavy sites, slow servers
8         Default, works for most sites
16        Fast servers you control, large captures

Rate Limiting

Be polite to servers by adding delays between requests.

Request Delay

smippo https://example.com --rate-limit 1000  # 1 second delay
smippo https://example.com --rate-limit 500   # 500ms delay

The rate limit applies per worker, so with 8 workers and a 1000 ms rate limit, you'll make at most ~8 requests per second.

Combining Rate Limit with Workers

For very polite crawling:

smippo https://example.com --workers 2 --rate-limit 2000

With 2 workers each waiting 2000 ms between requests, this makes ~1 request per second in total.
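The per-worker arithmetic can be sanity-checked with a small shell helper. This is only a sketch: max_requests_per_second is a hypothetical function name, not part of Smippo, and it assumes each worker sends at most one request per delay interval.

```shell
# Hypothetical helper (not a Smippo command): estimate the aggregate
# request ceiling as workers * (1000 / delay_ms).
max_requests_per_second() {
  local workers=$1 delay_ms=$2
  awk -v w="$workers" -v d="$delay_ms" 'BEGIN { printf "%.2f\n", w * 1000 / d }'
}

max_requests_per_second 8 1000   # prints 8.00
max_requests_per_second 2 2000   # prints 1.00
```

Real throughput is usually lower than this ceiling, since page load time also counts toward each worker's cycle.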

Robots.txt Crawl Delay

Smippo automatically respects Crawl-delay directives in robots.txt. If a site specifies:

User-agent: *
Crawl-delay: 5

Smippo will wait 5 seconds between requests, regardless of your --rate-limit setting.
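To see in advance whether a site declares a crawl delay, a short awk filter can pull the value out of robots.txt. This is a rough sketch, not how Smippo parses the file: crawl_delay is a hypothetical helper, and it simply takes the first Crawl-delay line without checking which User-agent group it belongs to.

```shell
# Hypothetical helper: print the first Crawl-delay value (in seconds)
# found in a robots.txt read from stdin. Prints nothing if none is set.
crawl_delay() {
  awk -F':' 'tolower($1) ~ /^[ \t]*crawl-delay[ \t]*$/ {
    gsub(/[ \t\r]/, "", $2)   # strip whitespace around the value
    print $2
    exit
  }'
}

# Example with inline input:
printf 'User-agent: *\nCrawl-delay: 5\n' | crawl_delay   # prints 5

# Against a live site (requires curl and network access):
#   curl -s https://example.com/robots.txt | crawl_delay
```

If the printed value is larger than your --rate-limit delay, expect the capture to run at the slower, site-specified pace.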

Resource Limits

Maximum Pages

smippo https://example.com --depth 10 --max-pages 500

Stop after capturing 500 pages, regardless of how many more exist.

Maximum Time

smippo https://example.com --max-time 300  # 5 minutes

Stop after 5 minutes, saving whatever has been captured.

Combining Limits

smippo https://large-site.com \
  --depth 10 \
  --max-pages 1000 \
  --max-time 1800 \
  --workers 8

This stops at whichever comes first: 1000 pages or 30 minutes.
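The "whichever comes first" rule can be pictured as a simple OR of the two limits. The function below is a conceptual model only, not Smippo's actual code; should_stop and its argument order are made up for illustration.

```shell
# Conceptual model (hypothetical, not Smippo internals):
# stop as soon as either the page limit or the time limit is reached.
# Arguments: pages_captured elapsed_seconds max_pages max_time_seconds
should_stop() {
  local pages=$1 elapsed=$2 max_pages=$3 max_time=$4
  [ "$pages" -ge "$max_pages" ] || [ "$elapsed" -ge "$max_time" ]
}

# 1000 pages captured after 10 minutes: the page limit triggers first.
if should_stop 1000 600 1000 1800; then
  echo "stop: a limit was reached"
fi
```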

Filtering for Speed

Reduce capture time by filtering out unnecessary content.

Skip Large Files

smippo https://example.com --max-size 5MB

Skips any file larger than 5 MB, which typically covers videos, large images, and downloads.

Skip Specific Content

smippo https://example.com \
  --exclude "*/downloads/*" \
  --exclude "*.pdf" \
  --mime-exclude "video/*"

Focus on HTML and CSS Only

smippo https://example.com \
  --mime-include "text/html" \
  --mime-include "text/css"

Wait Strategy Optimization

The wait strategy affects how long each page takes.

Fastest (May Miss Content)

smippo https://example.com --wait domcontentloaded

Captures as soon as the DOM is ready. May miss lazy-loaded content.

Balanced (Default)

smippo https://example.com --wait networkidle

Waits until network activity stops. A good balance between speed and completeness.

Slowest (Most Complete)

smippo https://example.com --wait networkidle --wait-time 3000

Waits for network idle plus 3 extra seconds. Best for SPAs with delayed loading.
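If you switch between these strategies often, a small wrapper can map a profile name to the flags shown above. This is a sketch: the profile names (fast, balanced, complete) and the wait_flags function are hypothetical conveniences, while the flags themselves are the ones documented in this section.

```shell
# Hypothetical helper: translate a profile name into the wait flags
# documented above. Returns non-zero for an unknown profile.
wait_flags() {
  case "$1" in
    fast)     echo "--wait domcontentloaded" ;;
    balanced) echo "--wait networkidle" ;;
    complete) echo "--wait networkidle --wait-time 3000" ;;
    *)        echo "unknown profile: $1" >&2; return 1 ;;
  esac
}

# Print the command that would be run for the most complete capture.
echo "smippo https://example.com $(wait_flags complete)"
```

To run the capture instead of printing it, drop the echo and leave the substitution unquoted so the flags split into separate arguments.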

Memory Management

Reduce Memory Usage

For large captures on limited memory:

smippo https://example.com \
  --workers 4 \
  --max-size 2MB \
  --no-har

HAR File Impact

HAR files store all network requests and can get large. Disable them for memory-constrained captures:

smippo https://example.com --no-har

Benchmarks

Single Page

time smippo https://example.com
# Typical: 2-5 seconds

Small Site (100 pages)

time smippo https://docs.example.com --depth 3
# Typical: 30-60 seconds with 8 workers

Large Site (1000 pages)

time smippo https://large-site.com --depth 5 --max-pages 1000
# Typical: 5-15 minutes with 8 workers
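The typical timings above translate into a rough throughput figure, which helps when choosing --max-pages and --max-time together. The helper below is a hypothetical sketch that just divides pages by seconds; actual rates depend on the site and your network.

```shell
# Hypothetical helper: average pages per second for a finished capture.
# Arguments: pages_captured elapsed_seconds
pages_per_second() {
  awk -v p="$1" -v s="$2" 'BEGIN { printf "%.1f\n", p / s }'
}

pages_per_second 100 30     # small site, fast end:  prints 3.3
pages_per_second 100 60     # small site, slow end:  prints 1.7
pages_per_second 1000 300   # large site, fast end:  prints 3.3
pages_per_second 1000 900   # large site, slow end:  prints 1.1
```

By these figures, 8 workers deliver roughly 1 to 3 pages per second, so a 30-minute --max-time leaves room for about 2000 to 6000 pages.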

Optimization Recipes

Fast Documentation Capture

smippo https://docs.example.com \
  --depth 10 \
  --workers 12 \
  --wait domcontentloaded \
  --max-size 5MB \
  --no-har

Polite Blog Archive

smippo https://blog.example.com \
  --depth 5 \
  --workers 2 \
  --rate-limit 2000 \
  --max-pages 500

Complete Site Mirror

smippo https://example.com \
  --depth 10 \
  --workers 8 \
  --external-assets \
  --wait-time 1000 \
  --max-time 3600

Quick Preview

smippo https://example.com \
  --depth 1 \
  --workers 16 \
  --wait domcontentloaded \
  --no-har

Monitoring Progress

Verbose Output

smippo https://example.com --verbose

Shows each page as it's captured, useful for monitoring large captures.

Log to File

smippo https://example.com --log-file capture.log --verbose

Write detailed logs for later analysis.

Debug Mode

smippo https://example.com --debug

Opens a visible browser window. Slower but useful for troubleshooting.

