Filtering

Smippo's filters control exactly which resources a capture includes. You can filter by URL pattern, MIME type, and file size.

URL Pattern Filtering

Use glob patterns to include or exclude specific URLs.

Include Patterns

Capture only URLs that match at least one include pattern:

# Only HTML and CSS files
smippo https://example.com --include "*.html" --include "*.css"

# Only files in /docs directory
smippo https://example.com --include "*/docs/*"

# Multiple extensions
smippo https://example.com --include "*.{jpg,png,gif}"

Exclude Patterns

Skip URLs that match any exclude pattern:

# Exclude tracking and ads
smippo https://example.com --exclude "*tracking*" --exclude "*ads*"

# Exclude API endpoints
smippo https://example.com --exclude "*/api/*"

# Exclude specific file types
smippo https://example.com --exclude "*.pdf" --exclude "*.zip"

Combining Include and Exclude

Include and exclude patterns work together:

smippo https://example.com \
  --include "*.html" \
  --include "*.css" \
  --exclude "*admin*" \
  --exclude "*private*"

This captures HTML and CSS files but skips any whose URL contains "admin" or "private".
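
The way the two pattern lists interact can be sketched in plain shell. This is only an illustration, not Smippo's implementation: `url_allowed` is a hypothetical helper, and it assumes Smippo's globs behave like shell `case` patterns, where `*` also matches `/`. Brace patterns such as `*.{jpg,png,gif}` are not handled here.

```shell
# Sketch only: url_allowed is a hypothetical helper, not part of Smippo.
# The lists mirror the --include and --exclude flags from the example above.
INCLUDES='*.html *.css'
EXCLUDES='*admin* *private*'

url_allowed() {
  url=$1
  matched=no
  set -f                      # keep the patterns literal while splitting the lists
  for pattern in $INCLUDES; do
    case $url in $pattern) matched=yes ;; esac   # one include match is enough
  done
  for pattern in $EXCLUDES; do
    case $url in $pattern) matched=no ;; esac    # any exclude match rejects the URL
  done
  set +f
  [ "$matched" = yes ]
}

for url in \
  https://example.com/index.html \
  https://example.com/admin/index.html \
  https://example.com/logo.png
do
  if url_allowed "$url"; then echo "capture $url"; else echo "skip    $url"; fi
done
```

Under these assumptions only the first URL is captured: the second matches an exclude pattern, and the third matches no include pattern.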

MIME Type Filtering

Filter resources by their MIME type.

Include MIME Types

# Only images
smippo https://example.com --mime-include "image/*"

# Only HTML and CSS
smippo https://example.com --mime-include "text/html" --mime-include "text/css"

# Only JSON
smippo https://example.com --mime-include "application/json"

Exclude MIME Types

# Exclude videos
smippo https://example.com --mime-exclude "video/*"

# Exclude binary files
smippo https://example.com --mime-exclude "application/octet-stream"

# Exclude fonts (if you have them locally)
smippo https://example.com --mime-exclude "font/*"

Wildcards

MIME type filters support wildcards:

  • image/* - All image types (image/png, image/jpeg, etc.)
  • text/* - All text types
  • application/* - All application types
  • video/* - All video types
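
To see how a wildcard such as image/* relates to a concrete type, here is a minimal shell sketch. `mime_matches` is a hypothetical helper, not a Smippo command. Stripping parameters like `; charset=utf-8` before comparing is an assumption about how a filter of this kind would normally treat a Content-Type value, not documented Smippo behavior.

```shell
# Sketch only: mime_matches is a hypothetical helper, not part of Smippo.
# Usage: mime_matches TYPE PATTERN
mime_matches() {
  # Assumption: drop parameters such as "; charset=utf-8" before comparing.
  mime=${1%%;*}
  case $mime in
    $2) return 0 ;;   # the unquoted pattern is matched as a glob
    *)  return 1 ;;
  esac
}

mime_matches "image/png" "image/*" && echo "image/png matches image/*"
mime_matches "video/mp4" "image/*" || echo "video/mp4 does not match image/*"
```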

File Size Filtering

Limit downloads by file size to skip files that are too large or too small.

Maximum Size

# Skip files larger than 10MB
smippo https://example.com --max-size 10MB

# Skip files larger than 500KB
smippo https://example.com --max-size 500KB

# Skip files larger than 1GB
smippo https://example.com --max-size 1GB

Minimum Size

# Skip files smaller than 1KB (often tracking pixels)
smippo https://example.com --min-size 1KB

# Skip files smaller than 100 bytes
smippo https://example.com --min-size 100B

Size Format

Size values support these units:

  • B or bytes (e.g., 100B)
  • KB (e.g., 10KB)
  • MB (e.g., 5MB)
  • GB (e.g., 1GB)
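
For a sense of the byte counts behind these units, the sketch below converts a size value to bytes. `size_to_bytes` is a hypothetical helper, and it assumes binary multiples (1KB = 1024 bytes). This page does not say whether Smippo uses 1024 or 1000, so treat the numbers as approximate.

```shell
# Sketch only: size_to_bytes is a hypothetical helper, not part of Smippo.
# Assumes 1KB = 1024 bytes; Smippo may use 1000 instead.
size_to_bytes() {
  case $1 in
    # Two-letter units are checked before the bare "B" suffix.
    *GB)    echo $(( ${1%GB} * 1024 * 1024 * 1024 )) ;;
    *MB)    echo $(( ${1%MB} * 1024 * 1024 )) ;;
    *KB)    echo $(( ${1%KB} * 1024 )) ;;
    *bytes) echo "${1%bytes}" ;;
    *B)     echo "${1%B}" ;;
    *)      echo "unsupported size: $1" >&2; return 1 ;;
  esac
}

size_to_bytes 10MB    # prints 10485760
size_to_bytes 500KB   # prints 512000
size_to_bytes 100B    # prints 100
```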

Common Filtering Patterns

Documentation Site

smippo https://docs.example.com \
  --include "*.html" \
  --include "*.css" \
  --include "*.{jpg,png,svg}" \
  --exclude "*/api/*" \
  --exclude "*search*" \
  --mime-exclude "video/*" \
  --max-size 5MB

Blog Archive

smippo https://blog.example.com \
  --include "*/posts/*" \
  --include "*.html" \
  --exclude "*/comments/*" \
  --exclude "*tracking*" \
  --mime-exclude "video/*"

Image Gallery

smippo https://gallery.example.com \
  --mime-include "image/*" \
  --max-size 20MB \
  --min-size 10KB

API Documentation

smippo https://api-docs.example.com \
  --include "*.html" \
  --include "*.css" \
  --exclude "*/examples/*" \
  --exclude "*.pdf"

Filtering Tips

Use Dry Run

Test your filters before capturing:

smippo https://example.com \
  --include "*.html" \
  --exclude "*api*" \
  --dry-run

This lists what would be captured without downloading anything.

Combine Filters

A resource is captured only if it passes every filter you set:

  • Include patterns: URL must match at least one
  • Exclude patterns: URL must not match any
  • MIME filters: Applied after URL filters
  • Size filters: Applied to all resources
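
The order of these checks can be sketched as a single decision function. `filter_decision` is a hypothetical helper, not part of Smippo. It hard-codes the filters `--include "*.html" --exclude "*api*" --mime-exclude "video/*" --max-size 5MB`. Running the size check last and treating 1MB as 1024 * 1024 bytes are both assumptions made for the sketch.

```shell
# Sketch only: filter_decision is a hypothetical helper, not part of Smippo.
# Usage: filter_decision URL MIME_TYPE SIZE_IN_BYTES
filter_decision() {
  max_bytes=$(( 5 * 1024 * 1024 ))   # assumes 1MB = 1024 * 1024 bytes

  # 1. Include patterns: the URL must match at least one.
  case $1 in
    *.html) ;;
    *) echo "skip: no include pattern matched"; return 1 ;;
  esac

  # 2. Exclude patterns: the URL must not match any.
  case $1 in
    *api*) echo "skip: exclude pattern matched"; return 1 ;;
  esac

  # 3. MIME filters run after the URL filters.
  case $2 in
    video/*) echo "skip: MIME type excluded"; return 1 ;;
  esac

  # 4. Size check (its position in the order is an assumption).
  if [ "$3" -gt "$max_bytes" ]; then
    echo "skip: larger than --max-size"; return 1
  fi

  echo "capture"
}

filter_decision https://example.com/index.html text/html 2048
filter_decision https://example.com/api/index.html text/html 2048
```

The first call prints "capture"; the second stops at step 2 and prints "skip: exclude pattern matched".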

Performance

Filtering happens before download, so it does not slow captures down. Excluding large files can make a capture noticeably faster.
