Performance
Smippo's vacuum architecture is designed for speed. This page shows how to tune performance for different scenarios, from aggressive captures to polite crawling.
Parallel Workers
Smippo uses multiple browser tabs (workers) to capture pages in parallel.
Default Configuration
smippo https://example.com # 8 workers (default)
Each worker operates independently, capturing pages and resources simultaneously.
Adjusting Workers
# Conservative: 1 worker
smippo https://example.com --workers 1
# Moderate: 4 workers
smippo https://example.com --workers 4
# Aggressive: 16 workers
smippo https://example.com --workers 16
Worker Guidelines
| Workers | Use Case |
|---|---|
| 1 | Strict rate-limited sites, debugging |
| 2-4 | API-heavy sites, slow servers |
| 8 | Default, works for most sites |
| 16 | Fast servers you control, large captures |
More workers means more memory usage. Each worker uses roughly 50-100MB of RAM, so with 16 workers expect about 0.8-1.6GB in total (~1-2GB once browser overhead is included).
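The arithmetic behind that estimate can be sketched in a few lines of shell. The function name `estimate_worker_ram` is hypothetical and not part of Smippo; the 50 and 100 MB figures are simply the per-worker range quoted above, and the result ignores any baseline overhead from the browser itself.

```shell
# Rough RAM estimate for N workers, using the ~50-100 MB per-worker range.
# estimate_worker_ram is a hypothetical helper, not a Smippo command.
estimate_worker_ram() {
  workers=$1
  low=$((workers * 50))    # MB, lower bound
  high=$((workers * 100))  # MB, upper bound
  echo "${low}-${high} MB"
}

estimate_worker_ram 16   # prints "800-1600 MB"
estimate_worker_ram 8    # prints "400-800 MB"
```

If the upper bound is close to the machine's free memory, lowering `--workers` is the simplest fix.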
Rate Limiting
Be polite to servers by adding delays between requests.
Request Delay
smippo https://example.com --rate-limit 1000 # 1 second delay
smippo https://example.com --rate-limit 500 # 500ms delay
The rate limit applies per worker, not globally. With 8 workers and a 1000ms rate limit, each worker makes at most 1 request per second, so the site sees up to ~8 requests per second in total.
Combining Rate Limit with Workers
For very polite crawling:
smippo https://example.com --workers 2 --rate-limit 2000
Each of the 2 workers waits 2000ms between requests, so this makes ~1 request per second in total.
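Because the delay is applied per worker, the combined request rate is roughly `workers * 1000 / delay_ms`. The sketch below works this out for a few settings. `max_request_rate` is a hypothetical helper, not a Smippo command, and it gives an upper bound that assumes every request completes instantly.

```shell
# Approximate upper bound on total requests per second when the delay
# applies per worker. max_request_rate is a hypothetical helper.
max_request_rate() {
  workers=$1
  delay_ms=$2
  # awk handles fractional results that shell integer math would truncate
  awk -v w="$workers" -v d="$delay_ms" 'BEGIN { printf "%.1f\n", w * 1000 / d }'
}

max_request_rate 8 1000   # prints 8.0
max_request_rate 2 2000   # prints 1.0
max_request_rate 4 500    # prints 8.0
```

Note that halving the workers and halving the delay leaves the total rate unchanged, as the first and last calls show.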
Robots.txt Crawl Delay
Smippo automatically respects Crawl-delay directives in robots.txt. If a site specifies:
User-agent: *
Crawl-delay: 5
Smippo will wait 5 seconds between requests, regardless of your --rate-limit setting.
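To check in advance what delay a site asks for, the directive can be read out of robots.txt with a short awk filter. This is a simplified sketch and not Smippo's own parser: it only looks at the `User-agent: *` group, and `crawl_delay_for_all` is a hypothetical helper name.

```shell
# Print the Crawl-delay (in seconds) from the "User-agent: *" group of a
# robots.txt read on stdin. Simplified: ignores agent-specific groups.
# crawl_delay_for_all is a hypothetical helper, not Smippo's parser.
crawl_delay_for_all() {
  awk -F': *' '
    tolower($1) == "user-agent"  { in_group = ($2 == "*") }
    tolower($1) == "crawl-delay" && in_group { print $2; exit }
  '
}

printf 'User-agent: *\nCrawl-delay: 5\n' | crawl_delay_for_all   # prints 5
```

In practice the input would come from something like `curl -s https://example.com/robots.txt`. An empty result means no `Crawl-delay` was found for the wildcard group.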
Resource Limits
Maximum Pages
smippo https://example.com --depth 10 --max-pages 500
Stop after capturing 500 pages, regardless of how many more exist.
Maximum Time
smippo https://example.com --max-time 300 # 5 minutes
Stop after 5 minutes, saving whatever has been captured.
Combining Limits
smippo https://large-site.com \
--depth 10 \
--max-pages 1000 \
--max-time 1800 \
--workers 8
This stops at whichever comes first: 1000 pages or 30 minutes.
Filtering for Speed
Reduce capture time by filtering out unnecessary content.
Skip Large Files
smippo https://example.com --max-size 5MB
Skips any file larger than 5MB, which typically covers videos, large images, and downloads.
Skip Specific Content
smippo https://example.com \
--exclude "*/downloads/*" \
--exclude "*.pdf" \
--mime-exclude "video/*"
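To get a feel for which URLs a pattern would catch before starting a long capture, the shell's own glob matching is a reasonable stand-in. This is only an approximation: Smippo's matcher may differ in details such as how `*` treats `/`, and `matches_glob` is a hypothetical helper written for sh/bash.

```shell
# Show whether a URL matches a glob pattern, using the shell's glob rules.
# Smippo's matcher may differ in details; matches_glob is a hypothetical helper.
matches_glob() {
  url=$1
  pattern=$2
  case $url in
    $pattern) echo "excluded" ;;   # unquoted so it is treated as a pattern
    *)        echo "kept" ;;
  esac
}

matches_glob "https://example.com/downloads/app.zip" "*/downloads/*"  # excluded
matches_glob "https://example.com/manual.pdf"        "*.pdf"          # excluded
matches_glob "https://example.com/docs/index.html"   "*.pdf"          # kept
```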
Focus on HTML and CSS Only
smippo https://example.com \
--mime-include "text/html" \
--mime-include "text/css"
Wait Strategy Optimization
The wait strategy affects how long each page takes.
Fastest (May Miss Content)
smippo https://example.com --wait domcontentloaded
Captures immediately when DOM is ready. May miss lazy-loaded content.
Balanced (Default)
smippo https://example.com --wait networkidle
Waits until network activity stops. A good balance between speed and completeness.
Slowest (Most Complete)
smippo https://example.com --wait networkidle --wait-time 3000
Waits for network idle plus 3 extra seconds. Best for SPAs with delayed loading.
Memory Management
Reduce Memory Usage
For large captures on limited memory:
smippo https://example.com \
--workers 4 \
--max-size 2MB \
--no-har
HAR File Impact
HAR files store all network requests and can get large. Disable for memory-constrained captures:
smippo https://example.com --no-har
Benchmarks
Single Page
time smippo https://example.com
# Typical: 2-5 seconds
Small Site (100 pages)
time smippo https://docs.example.com --depth 3
# Typical: 30-60 seconds with 8 workers
Large Site (1000 pages)
time smippo https://large-site.com --depth 5 --max-pages 1000
# Typical: 5-15 minutes with 8 workers
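These figures line up with a simple back-of-envelope model: pages are processed in batches of one per worker, and each page takes about as long as the single-page benchmark (2-5 seconds). The sketch below applies that model. `estimate_capture_seconds` is a hypothetical helper, and the model ignores rate limits, crawl delays, and network variability.

```shell
# Back-of-envelope capture time: pages are handled in batches of one page
# per worker, each batch taking secs_per_page seconds.
# estimate_capture_seconds is a hypothetical helper, not a Smippo command.
estimate_capture_seconds() {
  pages=$1
  workers=$2
  secs_per_page=$3
  batches=$(( (pages + workers - 1) / workers ))  # integer division, rounded up
  echo $((batches * secs_per_page))
}

estimate_capture_seconds 100 8 2    # prints 26
estimate_capture_seconds 100 8 5    # prints 65
estimate_capture_seconds 1000 8 5   # prints 625
```

The estimate is most useful for picking a sensible `--max-time` before starting a large capture.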
Optimization Recipes
Fast Documentation Capture
smippo https://docs.example.com \
--depth 10 \
--workers 12 \
--wait domcontentloaded \
--max-size 5MB \
--no-har
Polite Blog Archive
smippo https://blog.example.com \
--depth 5 \
--workers 2 \
--rate-limit 2000 \
--max-pages 500
Complete Site Mirror
smippo https://example.com \
--depth 10 \
--workers 8 \
--external-assets \
--wait-time 1000 \
--max-time 3600
Quick Preview
smippo https://example.com \
--depth 1 \
--workers 16 \
--wait domcontentloaded \
--no-har
Monitoring Progress
Verbose Output
smippo https://example.com --verbose
Shows each page as it's captured, useful for monitoring large captures.
Log to File
smippo https://example.com --log-file capture.log --verbose
Write detailed logs for later analysis.
Debug Mode
smippo https://example.com --debug
Opens a visible browser window. Slower but useful for troubleshooting.
Next Steps
- Options Reference — All options explained
- Filtering — Fine-tune captures
- Troubleshooting — Common issues