Scope Control

Scope options determine which links Smippo follows when crawling. Proper scope configuration prevents runaway crawls while ensuring you capture what you need.

Understanding Scope

When Smippo encounters a link, it must decide: should I follow this link?

The --scope option defines the boundary:

smippo https://www.example.com --scope <type>

Scope Types

Subdomain Scope (Strictest)

smippo https://www.example.com --scope subdomain

Only follows links on the exact same subdomain:

URL                                  Followed?
https://www.example.com/page         ✅ Yes
https://www.example.com/docs/intro   ✅ Yes
https://docs.example.com/            ❌ No
https://example.com/                 ❌ No
https://other.com/                   ❌ No

Use when: You want a specific subdomain only (e.g., just www or just docs).
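In code, the subdomain rule comes down to an exact hostname comparison. What follows is a minimal Python sketch, not Smippo's implementation. `same_host` is a hypothetical name, and ignoring the port and letter case is an assumption of the sketch.

```python
from urllib.parse import urlsplit


def same_host(start_url: str, link_url: str) -> bool:
    """Hypothetical subdomain-scope check: follow only when the hostnames match exactly."""
    # .hostname is lowercased and excludes the port, so WWW.Example.com:443 equals www.example.com
    return urlsplit(start_url).hostname == urlsplit(link_url).hostname


start = "https://www.example.com"
for link in [
    "https://www.example.com/page",
    "https://www.example.com/docs/intro",
    "https://docs.example.com/",
    "https://example.com/",
    "https://other.com/",
]:
    print(f"{link:40} {same_host(start, link)}")
```

Run as-is, it prints True for the two www.example.com URLs and False for the other three, which matches the table.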

Domain Scope (Default)

smippo https://www.example.com --scope domain

Follows links on the same domain and any subdomain:

URL                            Followed?
https://www.example.com/page   ✅ Yes
https://docs.example.com/      ✅ Yes
https://api.example.com/       ✅ Yes
https://example.com/           ✅ Yes
https://other.com/             ❌ No

Use when: You want the entire site including all subdomains.
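Domain scope needs some notion of the "base" domain that all subdomains share. Below is a minimal Python sketch. `base_domain` and `same_domain` are hypothetical names. Keeping only the last two hostname labels is a simplifying assumption that breaks for multi-part suffixes such as example.co.uk; a robust version would consult the Public Suffix List.

```python
from urllib.parse import urlsplit


def base_domain(url: str) -> str:
    """Hypothetical helper: the last two labels of the hostname.

    Naive on purpose: correct for example.com, wrong for suffixes like co.uk.
    """
    host = urlsplit(url).hostname or ""
    return ".".join(host.split(".")[-2:])


def same_domain(start_url: str, link_url: str) -> bool:
    """Hypothetical domain-scope check: same base domain, any subdomain."""
    return base_domain(start_url) == base_domain(link_url)


start = "https://www.example.com"
for link in [
    "https://docs.example.com/",
    "https://example.com/",
    "https://notexample.com/",
    "https://other.com/",
]:
    print(f"{link:30} {same_domain(start, link)}")
```

The sketch compares whole labels rather than running a string test such as `host.endswith("example.com")`. That choice keeps notexample.com out of scope.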

TLD Scope

smippo https://www.example.com --scope tld

Follows links on any domain that shares the start URL's top-level domain (for https://www.example.com, that means any .com site). Not recommended, because it can pull in many unrelated sites.

All Scope (Most Permissive)

smippo https://example.com --scope all

Follows ALL links, regardless of domain:

URL                         Followed?
https://example.com/page    ✅ Yes
https://docs.example.com/   ✅ Yes
https://other.com/          ✅ Yes
https://anything.com/       ✅ Yes

Directory Restriction

Stay in Directory

smippo https://example.com/docs/ --stay-in-dir

Only follows links within the same directory path:

URL                                    Followed?
https://example.com/docs/intro         ✅ Yes
https://example.com/docs/guide/start   ✅ Yes
https://example.com/blog/              ❌ No
https://example.com/                   ❌ No

Use when: Capturing a specific section of a site (documentation, blog category).
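The directory restriction is essentially a path-prefix test. Here is a minimal Python sketch built on two assumptions: the start URL's directory is everything up to its last `/`, and only the path is checked, with the host left to --scope. `start_directory` and `in_directory` are hypothetical names.

```python
from urllib.parse import urlsplit


def start_directory(start_url: str) -> str:
    """Hypothetical helper: the start URL's path up to and including its last '/'."""
    path = urlsplit(start_url).path or "/"
    # "/docs/" stays "/docs/"; "/docs/index.html" becomes "/docs/"
    return path[: path.rfind("/") + 1]


def in_directory(start_url: str, link_url: str) -> bool:
    """Hypothetical stay-in-dir check: the link's path must start with the start directory."""
    link_path = urlsplit(link_url).path or "/"
    return link_path.startswith(start_directory(start_url))


start = "https://example.com/docs/"
for link in [
    "https://example.com/docs/intro",
    "https://example.com/docs/guide/start",
    "https://example.com/docs-old/page",
    "https://example.com/blog/",
    "https://example.com/",
]:
    print(f"{link:40} {in_directory(start, link)}")
```

The trailing slash stays in the prefix on purpose. That is what stops /docs-old/ from matching /docs.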

Combining with Scope

smippo https://docs.example.com/v2/ --scope subdomain --stay-in-dir

This captures only:

  • Same subdomain (docs.example.com)
  • Same directory tree (/v2/*)

External Assets

By default, Smippo captures only pages within scope. Assets (images, CSS, JS) are often served from other hosts such as CDNs and font services, and those are skipped unless you enable external assets.

Enable External Assets

smippo https://example.com --external-assets

This captures assets from CDNs and external domains:

Resource                            Without Flag   With Flag
https://cdn.example.com/style.css   ❌ Skip         ✅ Capture
https://fonts.googleapis.com/       ❌ Skip         ✅ Capture
https://example.com/logo.png        ✅ Capture      ✅ Capture

Use when: You want a complete offline copy with all fonts, images, and styles.
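The table can be read as a simple asset filter. Below is a minimal Python sketch of that reading. `keep_asset` is a hypothetical name. The rule it encodes (without the flag, keep only assets served from the start URL's own host) is an assumption inferred from the table, not Smippo's documented logic.

```python
from urllib.parse import urlsplit


def keep_asset(start_url: str, asset_url: str, external_assets: bool) -> bool:
    """Hypothetical asset filter mirroring the table.

    Assumption: without --external-assets, only assets on the start URL's
    own host are kept; with it, assets from any host are kept.
    """
    if external_assets:
        return True
    return urlsplit(asset_url).hostname == urlsplit(start_url).hostname


start = "https://example.com"
for asset in [
    "https://cdn.example.com/style.css",
    "https://fonts.googleapis.com/",
    "https://example.com/logo.png",
]:
    print(f"{asset:36} without={keep_asset(start, asset, False)} with={keep_asset(start, asset, True)}")
```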

Practical Example

For a fully offline documentation site:

smippo https://docs.example.com \
  --depth 5 \
  --scope subdomain \
  --external-assets \
  --static

This captures:

  • Pages on docs.example.com, within the --depth 5 limit
  • External CSS, fonts, and images
  • Static HTML (no JavaScript needed)

Scope Decision Tree

New link discovered: https://target.com/page
    │
    ├─ Is it the same subdomain (exact host)?
    │   └─ Yes → Follow (any scope)
    │
    ├─ Is it the same domain (different subdomain)?
    │   ├─ scope = subdomain → Don't follow
    │   └─ scope = domain/tld/all → Follow
    │
    └─ Is it a different domain?
        ├─ scope = subdomain/domain → Don't follow
        ├─ scope = tld → Follow only if the top-level domain matches
        └─ scope = all → Follow
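The same decision can be written as a single function covering all four --scope values. This is a minimal Python sketch, not Smippo's code. `should_follow` and its helpers are hypothetical. Taking the base domain as the last two hostname labels is a simplification, and so is treating "same top-level domain" as "same last label".

```python
from urllib.parse import urlsplit

SCOPES = ("subdomain", "domain", "tld", "all")


def _host(url: str) -> str:
    return urlsplit(url).hostname or ""


def _base_domain(host: str) -> str:
    # Naive: last two labels; wrong for multi-part suffixes such as co.uk.
    return ".".join(host.split(".")[-2:])


def _tld(host: str) -> str:
    # Last label only, e.g. "com" for www.example.com.
    return host.rsplit(".", 1)[-1]


def should_follow(start_url: str, link_url: str, scope: str = "domain") -> bool:
    """Hypothetical sketch of the scope decision tree."""
    if scope not in SCOPES:
        raise ValueError(f"unknown scope: {scope!r}")
    start, link = _host(start_url), _host(link_url)
    if start == link:
        return True  # same subdomain: followed under every scope
    if scope == "subdomain":
        return False
    if scope == "domain":
        return _base_domain(start) == _base_domain(link)
    if scope == "tld":
        return _tld(start) == _tld(link)
    return True  # scope == "all"


start = "https://www.example.com"
for scope in SCOPES:
    print(
        scope,
        should_follow(start, "https://docs.example.com/", scope),
        should_follow(start, "https://other.com/", scope),
    )
```

Validating `scope` first ensures that a typo raises an error instead of silently falling through to the permissive `all` branch.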

Common Configurations

Documentation Site

smippo https://docs.framework.com \
  --scope subdomain \
  --depth 10 \
  --external-assets

Company Website

smippo https://www.company.com \
  --scope domain \
  --depth 5 \
  --exclude "*/careers/*" \
  --exclude "*/press/*"

Blog Archive

smippo https://blog.example.com/posts/ \
  --scope subdomain \
  --stay-in-dir \
  --depth 3

Multi-Site Crawl (Careful!)

smippo https://hub.example.com \
  --scope all \
  --depth 2 \
  --max-pages 500 \
  --max-time 600

Troubleshooting

"Capture takes forever"

Your scope is probably too broad. Narrow it, and cap the number of pages and the total run time:

smippo https://example.com \
  --scope subdomain \
  --max-pages 200 \
  --max-time 300
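These limits help because a crawler stops at whichever one it reaches first, however broad the scope is. Below is a minimal breadth-first sketch in Python, not Smippo's crawler. `crawl`, its parameters, and the toy link graph are hypothetical, and measuring the time budget in seconds is an assumption of the sketch.

```python
import time
from collections import deque
from typing import Callable, Iterable


def crawl(
    start_url: str,
    get_links: Callable[[str], Iterable[str]],
    in_scope: Callable[[str], bool],
    max_depth: int,
    max_pages: int,
    max_seconds: float,
) -> list[str]:
    """Hypothetical BFS crawl that stops at the depth, page, or time limit."""
    deadline = time.monotonic() + max_seconds
    seen = {start_url}
    queue = deque([(start_url, 0)])
    visited: list[str] = []
    while queue and len(visited) < max_pages and time.monotonic() < deadline:
        url, depth = queue.popleft()
        visited.append(url)  # "capture" the page
        if depth >= max_depth:
            continue  # do not expand links past the depth limit
        for link in get_links(url):
            if link not in seen and in_scope(link):
                seen.add(link)
                queue.append((link, depth + 1))
    return visited


# Toy site: each page maps to the links found on it.
site = {"/": ["/a", "/b"], "/a": ["/c"], "/b": ["/c", "/d"], "/c": [], "/d": []}
print(crawl("/", lambda u: site.get(u, []), lambda u: True,
            max_depth=5, max_pages=3, max_seconds=60))  # prints ['/', '/a', '/b']
```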

"Missing pages"

Your scope may be too narrow. Widen it to the whole domain, including subdomains:

smippo https://www.example.com --scope domain

"Missing images/fonts"

They are probably hosted on another domain, such as a CDN or font service. Enable external assets:

smippo https://example.com --external-assets
