Capture Command

The main smippo command captures websites by rendering them in a browser and saving all resources for offline viewing.

Basic Usage

smippo <url> [options]

Single Page Capture

Capture a single page with all its assets:

smippo https://example.com

This creates a ./site directory with:

  • index.html - The fully rendered page
  • All assets (images, CSS, JS, fonts)
  • network.har - HTTP Archive file
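
The exact contents depend on what the page loads, but with the default original structure (see Output Structure below) the result looks roughly like this — the asset names here are illustrative, not fixed:

./site/
├── index.html        # the fully rendered page
├── network.har       # HTTP Archive of every request
├── css/
│   └── main.css
├── js/
│   └── app.js
└── images/
    └── logo.png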

Recursive Website Mirroring

Crawl and capture multiple pages:

smippo https://example.com --depth 3

The --depth option controls how many levels deep to follow links:

  • --depth 0 or --no-crawl - Single page only (default)
  • --depth 1 - Current page + linked pages
  • --depth 3 - Current page + 3 levels deep
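
Depth composes with the limits described under Performance below. For example, a bounded two-level crawl using only flags documented on this page:

# Follow links two levels deep, but stop after 50 pages total
smippo https://example.com --depth 2 --max-pages 50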

Common Options

Output Directory

Save to a custom directory:

smippo https://example.com --output ./my-mirror

Scope Control

Control which links to follow:

# Stay on same domain (default)
smippo https://www.example.com --scope domain

# Stay on same subdomain
smippo https://www.example.com --scope subdomain

# Follow all links (use with caution!)
smippo https://example.com --scope all --depth 2
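
By the usual meaning of these terms (the hostnames here are only illustrations), --scope domain starting from https://www.example.com would also follow links to sibling hosts such as blog.example.com, while --scope subdomain stays on www.example.com itself.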

Filtering

Include or exclude specific URLs:

# Include only HTML and CSS
smippo https://example.com --include "*.html" --include "*.css"

# Exclude tracking and ads
smippo https://example.com --exclude "*tracking*" --exclude "*ads*"

# Filter by MIME type
smippo https://example.com --mime-include "image/*" --mime-exclude "video/*"

# Filter by file size
smippo https://example.com --max-size 5MB --min-size 1KB
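
These filters combine in a single invocation, as the complete example at the end of this page also shows. A sketch using only flags documented above:

# Keep pages and stylesheets, skip ad networks, cap file size
smippo https://example.com \
  --include "*.html" --include "*.css" \
  --exclude "*ads*" \
  --max-size 5MB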

Performance

Control parallel workers and limits:

# Use 4 workers (default: 8)
smippo https://example.com --workers 4

# Limit total pages
smippo https://example.com --max-pages 100

# Limit total time to 300 seconds (5 minutes)
smippo https://example.com --max-time 300

# Rate limiting: wait 1000 ms (1 second) between requests
smippo https://example.com --rate-limit 1000
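
These limits combine. For example, a deliberately gentle crawl (all flags as documented above):

# 2 workers, 2 seconds between requests, stop after 10 minutes
smippo https://example.com --workers 2 --rate-limit 2000 --max-time 600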

Browser Options

Wait Strategy

Control when to consider a page "loaded":

# Wait for network idle (default)
smippo https://example.com --wait networkidle

# Wait for DOM content loaded
smippo https://example.com --wait domcontentloaded

# Wait for full page load
smippo https://example.com --wait load

# Add 5000 ms (5 seconds) of extra wait time for slow sites
smippo https://example.com --wait-time 5000
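
The wait strategy and extra wait time compose; the complete example at the end of this page pairs the default networkidle with --wait-time. A sketch for a JavaScript-heavy site (the hostname is a placeholder):

# Wait for network idle, then 5 more seconds for late-loading content
smippo https://spa.example.com --wait networkidle --wait-time 5000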

Viewport & Device

# Custom viewport size
smippo https://example.com --viewport 1280x720

# Emulate device
smippo https://example.com --device "iPhone 13"

# Custom user agent
smippo https://example.com --user-agent "Mozilla/5.0..."
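
These pair naturally with the screenshot option described below. For instance, to capture how a site renders on a phone (flags as documented on this page):

# Capture the mobile rendering and screenshot each page
smippo https://example.com --device "iPhone 13" --screenshot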

Output Options

Screenshots & PDFs

# Take screenshot of each page
smippo https://example.com --screenshot

# Generate PDF of each page
smippo https://example.com --pdf

Static Mode

Strip JavaScript for true offline viewing:

smippo https://example.com --static

This removes all <script> tags while preserving the rendered content.
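
Static mode combines with the other output options. For example, a fully self-contained snapshot (a sketch using only flags documented on this page):

# No JavaScript, no HAR file, custom output directory
smippo https://example.com --static --no-har --output ./offline-copy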

Output Structure

# Original URL structure (default)
smippo https://example.com --structure original

# Flat structure (all files in one directory)
smippo https://example.com --structure flat

# Organized by domain
smippo https://example.com --structure domain
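
As a rough illustration (the exact file-naming scheme may differ between versions), a page at https://example.com/docs/intro might be saved as:

# --structure original
./site/docs/intro/index.html

# --structure flat
./site/docs-intro.html    (flattened name; exact scheme may vary)

# --structure domain
./site/example.com/docs/intro/index.html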

HAR Files

# Generate HAR file (default)
smippo https://example.com --har

# Skip HAR file generation
smippo https://example.com --no-har

Authentication

Basic Auth

smippo https://user:pass@example.com

Note that credentials passed on the command line may be recorded in your shell history.

Cookie-based Auth

smippo https://example.com --cookies cookies.json

The cookies file should be in JSON format:

[
  {
    "name": "session",
    "value": "abc123",
    "domain": ".example.com",
    "path": "/"
  }
]
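
Putting it together end to end (a sketch; the cookie value is a placeholder):

# Write the cookie file, then run the capture with it
cat > cookies.json <<'EOF'
[
  {
    "name": "session",
    "value": "abc123",
    "domain": ".example.com",
    "path": "/"
  }
]
EOF

smippo https://example.com --cookies cookies.json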

Interactive Auth

Open a browser window for manual login:

smippo https://example.com --capture-auth

This opens a browser where you can log in manually. After login, Smippo captures the session and continues.
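
Interactive auth composes with crawling. For example, log in once and then mirror the authenticated area (the hostname is a placeholder):

# Log in manually, then crawl two levels of the authenticated site
smippo https://app.example.com --capture-auth --depth 2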

Complete Example

Here's a comprehensive example capturing a documentation site:

smippo https://docs.example.com \
  --depth 5 \
  --scope subdomain \
  --output ./docs-mirror \
  --exclude "*api*" \
  --exclude "*search*" \
  --max-size 10MB \
  --workers 4 \
  --screenshot \
  --wait-time 2000

This will:

  • Crawl 5 levels deep
  • Stay on the same subdomain
  • Save to ./docs-mirror
  • Exclude API and search pages
  • Skip files larger than 10MB
  • Use 4 parallel workers
  • Take screenshots of each page
  • Wait 2 seconds after network idle
