Capture Command
The main smippo command captures websites by rendering them in a browser and saving all resources for offline viewing.
Basic Usage
smippo <url> [options]
Single Page Capture
Capture a single page with all its assets:
smippo https://example.com
This creates a ./site directory with:
- index.html - The fully rendered page
- All assets (images, CSS, JS, fonts)
- network.har - HTTP Archive file
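After a capture finishes, it can be useful to confirm that the expected files are present. The sketch below is a hypothetical helper (`check_capture` is not part of Smippo) and assumes that index.html and network.har sit at the top level of the output directory, as the list above suggests. That layout may differ with other --structure settings or when --no-har is used.

```shell
# Hypothetical helper, not part of Smippo.
# ASSUMPTION: index.html and network.har are written to the top level
# of the output directory (default: ./site).
check_capture() {
  dir=${1:-./site}
  rc=0
  for f in index.html network.har; do
    if [ -f "$dir/$f" ]; then
      echo "ok       $dir/$f"
    else
      echo "missing  $dir/$f"
      rc=1
    fi
  done
  # Non-zero exit status if any expected file is absent.
  return "$rc"
}
```

A typical use would be `smippo https://example.com && check_capture ./site`, so that a script can stop early if the capture did not produce the files it expects.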
Recursive Website Mirroring
Crawl and capture multiple pages:
smippo https://example.com --depth 3
The --depth option controls how many levels deep to follow links:
- --depth 0 or --no-crawl - Single page only (default)
- --depth 1 - Current page + linked pages
- --depth 3 - Current page + 3 levels deep
Common Options
Output Directory
Save to a custom directory:
smippo https://example.com --output ./my-mirror
Scope Control
Control which links to follow:
# Stay on same domain (default)
smippo https://www.example.com --scope domain
# Stay on same subdomain
smippo https://www.example.com --scope subdomain
# Follow all links (use with caution!)
smippo https://example.com --scope all --depth 2
Filtering
Include or exclude specific URLs:
# Include only HTML and CSS
smippo https://example.com --include "*.html" --include "*.css"
# Exclude tracking and ads
smippo https://example.com --exclude "*tracking*" --exclude "*ads*"
# Filter by MIME type
smippo https://example.com --mime-include "image/*" --mime-exclude "video/*"
# Filter by file size
smippo https://example.com --max-size 5MB --min-size 1KB
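To get a feel for how broad a wildcard pattern is before running a capture, the patterns can be tried against sample URLs in the shell. This is only a sketch: it assumes that Smippo's --include and --exclude patterns behave like shell-style globs matched against the full URL, which this page does not confirm. The `is_excluded` function and the sample URLs are made up for illustration.

```shell
# Hypothetical helper, not part of Smippo.
# ASSUMPTION: patterns act like shell globs matched against the whole URL.
is_excluded() {
  url=$1
  shift
  for pattern in "$@"; do
    case $url in
      # Unquoted on purpose so that * is treated as a wildcard.
      $pattern) return 0 ;;
    esac
  done
  return 1
}

for u in \
  "https://example.com/index.html" \
  "https://example.com/js/tracking.js" \
  "https://example.com/downloads/guide.pdf"; do
  if is_excluded "$u" "*tracking*" "*ads*"; then
    echo "skip  $u"
  else
    echo "keep  $u"
  fi
done
```

Under this assumption the third URL is skipped as well, because "downloads" contains "ads". A narrower pattern such as "*/ads/*" may be safer when a short substring is likely to appear inside unrelated paths.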
Performance
Control parallel workers and limits:
# Use 4 workers (default: 8)
smippo https://example.com --workers 4
# Limit total pages
smippo https://example.com --max-pages 100
# Limit total time (5 minutes)
smippo https://example.com --max-time 300
# Rate limiting (1 second between requests)
smippo https://example.com --rate-limit 1000
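Judging by the comments above, --max-time takes seconds (300 for 5 minutes) while --rate-limit takes milliseconds (1000 for 1 second). A small sketch that derives both values from human-friendly numbers can help avoid mixing the units up. The variable names are illustrative only, and the command is printed rather than executed.

```shell
# Illustrative variable names; units inferred from the examples above.
minutes=5          # desired total time budget
delay_seconds=1    # desired pause between requests

max_time=$((minutes * 60))            # --max-time appears to take seconds
rate_limit=$((delay_seconds * 1000))  # --rate-limit appears to take milliseconds

# Print the resulting command instead of running it.
echo "smippo https://example.com --max-time $max_time --rate-limit $rate_limit"
```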
Browser Options
Wait Strategy
Control when to consider a page "loaded":
# Wait for network idle (default)
smippo https://example.com --wait networkidle
# Wait for DOM content loaded
smippo https://example.com --wait domcontentloaded
# Wait for full page load
smippo https://example.com --wait load
# Add extra wait time for slow sites
smippo https://example.com --wait-time 5000
Viewport & Device
# Custom viewport size
smippo https://example.com --viewport 1280x720
# Emulate device
smippo https://example.com --device "iPhone 13"
# Custom user agent
smippo https://example.com --user-agent "Mozilla/5.0..."
Output Options
Screenshots & PDFs
# Take screenshot of each page
smippo https://example.com --screenshot
# Generate PDF of each page
smippo https://example.com --pdf
Static Mode
Strip JavaScript for true offline viewing:
smippo https://example.com --static
This removes all <script> tags while preserving the rendered content.
Output Structure
# Original URL structure (default)
smippo https://example.com --structure original
# Flat structure (all files in one directory)
smippo https://example.com --structure flat
# Organized by domain
smippo https://example.com --structure domain
HAR Files
# Generate HAR file (default)
smippo https://example.com --har
# Skip HAR file generation
smippo https://example.com --no-har
Authentication
Basic Auth
smippo https://user:pass@example.com
Cookie-based Auth
smippo https://example.com --cookies cookies.json
The cookies file should be in JSON format:
[
{
"name": "session",
"value": "abc123",
"domain": ".example.com",
"path": "/"
}
]
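A cookies file in the format above can be generated from a shell script, for instance when the session value comes from another tool. This is a minimal sketch: `session_value` and `COOKIE_FILE` are placeholder names, and it assumes the value contains no characters that would need JSON escaping (such as double quotes or backslashes).

```shell
# Placeholder names; replace the value with a real session cookie.
# ASSUMPTION: the value needs no JSON escaping.
session_value="abc123"
cookie_file=${COOKIE_FILE:-cookies.json}

cat > "$cookie_file" <<EOF
[
  {
    "name": "session",
    "value": "${session_value}",
    "domain": ".example.com",
    "path": "/"
  }
]
EOF

# The file holds a credential, so restrict it to the current user.
chmod 600 "$cookie_file"

# Then: smippo https://example.com --cookies cookies.json
```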
Interactive Auth
Open a browser window for manual login:
smippo https://example.com --capture-auth
This opens a browser where you can log in manually. After login, Smippo captures the session and continues.
Complete Example
Here's a comprehensive example capturing a documentation site:
smippo https://docs.example.com \
--depth 5 \
--scope subdomain \
--output ./docs-mirror \
--exclude "*api*" \
--exclude "*search*" \
--max-size 10MB \
--workers 4 \
--screenshot \
--wait-time 2000
This will:
- Crawl 5 levels deep
- Stay on the same subdomain
- Save to ./docs-mirror
- Exclude API and search pages
- Skip files larger than 10MB
- Use 4 parallel workers
- Take screenshots of each page
- Wait 2 seconds after network idle
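If the same set of options is reused for several sites, the command above could be wrapped in a small shell function. The sketch below is one possible way to do that: `mirror_docs` and the `DRY_RUN` variable are hypothetical names, not Smippo features, and by default the function only prints the command so it can be reviewed first.

```shell
# Hypothetical wrapper, not part of Smippo.
# Usage: mirror_docs <url> <output-dir>
# Prints the command by default; set DRY_RUN=0 to actually run smippo.
mirror_docs() {
  url=$1
  out=$2
  # Rebuild the positional parameters as the full option list.
  set -- "$url" \
    --depth 5 \
    --scope subdomain \
    --output "$out" \
    --exclude "*api*" \
    --exclude "*search*" \
    --max-size 10MB \
    --workers 4 \
    --screenshot \
    --wait-time 2000
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "smippo $*"
  else
    smippo "$@"
  fi
}

mirror_docs https://docs.example.com ./docs-mirror
```

Running `DRY_RUN=0 mirror_docs https://docs.example.com ./docs-mirror` would execute the capture with the same options as the example above.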
Next Steps
- Options Reference — Complete list of all options
- Filtering Guide — Advanced filtering techniques
- Examples — Real-world use cases