Filtering
Smippo's filtering options control exactly what gets captured. You can filter by URL pattern, MIME type, and file size.
URL Pattern Filtering
Use glob patterns to include or exclude specific URLs.
Include Patterns
Only capture URLs matching the pattern:
# Only HTML and CSS files
smippo https://example.com --include "*.html" --include "*.css"
# Only files in /docs directory
smippo https://example.com --include "*/docs/*"
# Multiple extensions
smippo https://example.com --include "*.{jpg,png,gif}"
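A brace pattern such as `*.{jpg,png,gif}` can be read as shorthand for several plain patterns. The Python sketch below illustrates that reading only; it is not Smippo's implementation, and `expand_braces` and `matches_glob` are hypothetical names. It assumes non-nested brace groups and uses the standard `fnmatch` module, where `*` also matches `/`.

```python
import re
from fnmatch import fnmatchcase


def expand_braces(pattern: str) -> list[str]:
    """Expand non-nested {a,b,c} groups into a list of plain glob patterns."""
    match = re.search(r"\{([^{}]*)\}", pattern)
    if match is None:
        return [pattern]
    head, tail = pattern[:match.start()], pattern[match.end():]
    expanded = []
    for option in match.group(1).split(","):
        # Recurse so that any later brace group in the tail is expanded too.
        expanded.extend(expand_braces(head + option + tail))
    return expanded


def matches_glob(url: str, pattern: str) -> bool:
    """True if the URL matches any expansion of the pattern."""
    return any(fnmatchcase(url, p) for p in expand_braces(pattern))


print(expand_braces("*.{jpg,png,gif}"))
print(matches_glob("https://example.com/img/logo.png", "*.{jpg,png,gif}"))
```

Under this reading, `--include "*.{jpg,png,gif}"` behaves like three separate `--include` options, one per extension.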
Exclude Patterns
Skip URLs matching the pattern:
# Exclude tracking and ads
smippo https://example.com --exclude "*tracking*" --exclude "*ads*"
# Exclude API endpoints
smippo https://example.com --exclude "*/api/*"
# Exclude specific file types
smippo https://example.com --exclude "*.pdf" --exclude "*.zip"
Combining Include and Exclude
Include and exclude patterns work together:
smippo https://example.com \
--include "*.html" \
--include "*.css" \
--exclude "*admin*" \
--exclude "*private*"
This captures HTML and CSS files, but excludes any with "admin" or "private" in the URL.
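The way the two pattern lists interact can be sketched in a few lines of Python. This is an illustration of the rule described here, not Smippo's actual code: `url_passes` is a hypothetical helper, and treating an empty include list as "include everything" is an assumption.

```python
from fnmatch import fnmatchcase


def url_passes(url: str, includes: list[str], excludes: list[str]) -> bool:
    """Keep a URL if it matches some include pattern and no exclude pattern."""
    # Assumption: with no include patterns, every URL counts as included.
    included = not includes or any(fnmatchcase(url, p) for p in includes)
    excluded = any(fnmatchcase(url, p) for p in excludes)
    return included and not excluded


includes = ["*.html", "*.css"]
excludes = ["*admin*", "*private*"]

for url in (
    "https://example.com/index.html",
    "https://example.com/admin/login.html",
    "https://example.com/logo.png",
):
    print(url, url_passes(url, includes, excludes))
```

An exclude match always wins in this sketch, so `admin/login.html` is dropped even though it matches `*.html`.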
MIME Type Filtering
Filter resources by their MIME type.
Include MIME Types
# Only images
smippo https://example.com --mime-include "image/*"
# Only HTML and CSS
smippo https://example.com --mime-include "text/html" --mime-include "text/css"
# Only JSON
smippo https://example.com --mime-include "application/json"
Exclude MIME Types
# Exclude videos
smippo https://example.com --mime-exclude "video/*"
# Exclude binary files
smippo https://example.com --mime-exclude "application/octet-stream"
# Exclude fonts (if you have them locally)
smippo https://example.com --mime-exclude "font/*"
Wildcards
MIME type filters support wildcards:
- image/*: All image types (image/png, image/jpeg, etc.)
- text/*: All text types
- application/*: All application types
- video/*: All video types
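As a rough illustration of wildcard matching on MIME types, the Python sketch below compares the type and subtype separately. `mime_matches` is a hypothetical helper, not part of Smippo. It also strips parameters such as `; charset=utf-8`, which commonly appear in `Content-Type` headers; whether Smippo does the same is an assumption.

```python
def mime_matches(pattern: str, content_type: str) -> bool:
    """Match a MIME type against a pattern like "image/*" or "text/html"."""
    # Drop parameters (e.g. "; charset=utf-8") and normalise case.
    mime = content_type.split(";", 1)[0].strip().lower()
    pat_type, _, pat_sub = pattern.strip().lower().partition("/")
    mime_type, _, mime_sub = mime.partition("/")
    if pat_type != "*" and pat_type != mime_type:
        return False
    return pat_sub == "*" or pat_sub == mime_sub


print(mime_matches("image/*", "image/png"))
print(mime_matches("text/html", "text/html; charset=utf-8"))
print(mime_matches("image/*", "text/html"))
```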
File Size Filtering
Limit downloads by file size to avoid large files.
Maximum Size
# Skip files larger than 10MB
smippo https://example.com --max-size 10MB
# Skip files larger than 500KB
smippo https://example.com --max-size 500KB
# Skip files larger than 1GB
smippo https://example.com --max-size 1GB
Minimum Size
# Skip files smaller than 1KB (often tracking pixels)
smippo https://example.com --min-size 1KB
# Skip files smaller than 100 bytes
smippo https://example.com --min-size 100B
Size Format
Size values support these units:
- B or bytes (e.g., 100B)
- KB (e.g., 10KB)
- MB (e.g., 5MB)
- GB (e.g., 1GB)
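To make the unit syntax concrete, here is a minimal Python sketch that converts a size string into a byte count. `parse_size` and `UNITS` are hypothetical names, and the sketch assumes binary multiples (1KB = 1024 bytes); this page does not say whether Smippo uses 1024 or 1000, so treat the multipliers as an assumption.

```python
import re

# Assumption: binary multiples. Swap in powers of 1000 if the tool uses decimal units.
UNITS = {"B": 1, "BYTES": 1, "KB": 1024, "MB": 1024**2, "GB": 1024**3}


def parse_size(text: str) -> int:
    """Convert a string such as "10MB" or "100B" into a number of bytes."""
    match = re.fullmatch(r"\s*(\d+(?:\.\d+)?)\s*([A-Za-z]+)\s*", text)
    if match is None:
        raise ValueError(f"unrecognised size: {text!r}")
    number, unit = match.groups()
    unit = unit.upper()
    if unit not in UNITS:
        raise ValueError(f"unknown unit in size: {text!r}")
    return int(float(number) * UNITS[unit])


print(parse_size("100B"))
print(parse_size("500KB"))
print(parse_size("10MB"))
```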
Common Filtering Patterns
Documentation Site
smippo https://docs.example.com \
--include "*.html" \
--include "*.css" \
--include "*.{jpg,png,svg}" \
--exclude "*/api/*" \
--exclude "*search*" \
--mime-exclude "video/*" \
--max-size 5MB
Blog Archive
smippo https://blog.example.com \
--include "*/posts/*" \
--include "*.html" \
--exclude "*/comments/*" \
--exclude "*tracking*" \
--mime-exclude "video/*"
Image Gallery
smippo https://gallery.example.com \
--mime-include "image/*" \
--max-size 20MB \
--min-size 10KB
API Documentation
smippo https://api-docs.example.com \
--include "*.html" \
--include "*.css" \
--exclude "*/examples/*" \
--exclude "*.pdf"
Filtering Tips
Use Dry Run
Test your filters before capturing:
smippo https://example.com \
--include "*.html" \
--exclude "*api*" \
--dry-run
This shows what would be captured without downloading.
Combine Filters
Filters work together logically:
- Include patterns: URL must match at least one
- Exclude patterns: URL must not match any
- MIME filters: Applied after URL filters
- Size filters: Applied to all resources
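The order described in this list can be sketched as a single decision function. This is a hedged Python illustration, not Smippo's implementation: `Filters` and `should_capture` are hypothetical names, sizes are plain byte counts, and `fnmatchcase` stands in for whatever glob matcher the tool actually uses.

```python
from dataclasses import dataclass, field
from fnmatch import fnmatchcase


@dataclass
class Filters:
    include: list = field(default_factory=list)
    exclude: list = field(default_factory=list)
    mime_include: list = field(default_factory=list)
    mime_exclude: list = field(default_factory=list)
    min_size: int = 0
    max_size: float = float("inf")


def should_capture(url: str, mime: str, size: int, f: Filters) -> bool:
    """Apply URL filters first, then MIME filters, then size limits."""
    # Include patterns: the URL must match at least one (if any are given).
    if f.include and not any(fnmatchcase(url, p) for p in f.include):
        return False
    # Exclude patterns: the URL must not match any.
    if any(fnmatchcase(url, p) for p in f.exclude):
        return False
    # MIME filters run after the URL filters; "image/*" works as a glob here.
    if f.mime_include and not any(fnmatchcase(mime, p) for p in f.mime_include):
        return False
    if any(fnmatchcase(mime, p) for p in f.mime_exclude):
        return False
    # Size limits apply to every resource that got this far.
    return f.min_size <= size <= f.max_size


gallery = Filters(mime_include=["image/*"], min_size=10 * 1024, max_size=20 * 1024**2)
print(should_capture("https://gallery.example.com/a.jpg", "image/jpeg", 500_000, gallery))
print(should_capture("https://gallery.example.com/pixel.gif", "image/gif", 43, gallery))
```

Checking the cheap URL patterns first means a resource that fails them never needs its MIME type or size looked up.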
Performance
Filters are applied before a resource is downloaded, so they don't slow down captures. Excluding large files can even make a capture faster.
Next Steps
- Options Reference: Complete options list
- Scope Control: Control which links to follow
- Examples: See real-world filtering examples