Update Command

The smippo update command refreshes an existing mirror. It re-crawls the site, fetching only changed content while preserving previously captured pages.

Basic Usage

smippo update [options]

Update Default Directory

Update the mirror in ./site:

smippo update

Update Custom Directory

Update a mirror in a specific location:

smippo update -o ./docs-mirror
smippo update --output ~/archives/blog

How It Works

The update command:

  1. Reads the manifest: gets the original URL and options
  2. Re-crawls: starts a fresh crawl from the root URL
  3. Checks cache: uses ETags and Last-Modified headers to skip unchanged files
  4. Updates changed files: downloads and saves only modified content
  5. Adds new pages: captures newly discovered pages
  6. Updates manifest: records the update timestamp

Before Update:
├── example.com/
│   ├── index.html (2024-01-01)
│   └── about.html (2024-01-01)

After Update:
├── example.com/
│   ├── index.html (2024-01-15, updated)
│   ├── about.html (2024-01-01, unchanged)
│   └── news.html  (2024-01-15, new)
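
The before/after listing above can be read as a small classification step. The following is a minimal sketch, not smippo's actual implementation: the `classifyPages` function and the snapshot shape (a plain object mapping each page path to a validator such as an ETag) are hypothetical and serve only to illustrate how updated, unchanged, and new pages are told apart.

```javascript
// Hypothetical sketch: each snapshot maps a page path to a validator
// (for example an ETag). This is not smippo's internal data model.
function classifyPages(before, after) {
  const result = { updated: [], unchanged: [], added: [] };
  for (const [path, validator] of Object.entries(after)) {
    if (!Object.hasOwn(before, path)) {
      result.added.push(path);      // page discovered during this update
    } else if (before[path] !== validator) {
      result.updated.push(path);    // validator changed, so re-download
    } else {
      result.unchanged.push(path);  // same validator, keep the saved copy
    }
  }
  return result;
}

const beforeUpdate = { '/index.html': 'v1', '/about.html': 'v1' };
const afterUpdate = { '/index.html': 'v2', '/about.html': 'v1', '/news.html': 'v1' };
console.log(classifyPages(beforeUpdate, afterUpdate));
```

Paths that exist only in the earlier snapshot are deliberately left alone in this sketch, which matches the stated behavior of preserving previously captured pages.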

When to Use Update

Regular Refreshes

Keep documentation mirrors up to date:

# Weekly update of docs
smippo update -o ./react-docs

Check for Changes

See what has changed on a site:

smippo update -o ./competitor-site --verbose

Archive Maintenance

Update archived blogs or news sites:

smippo update -o ./blog-archive

Difference from Continue

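| Feature        | Continue                   | Update                  |
|----------------|----------------------------|-------------------------|
| Purpose        | Resume interrupted capture | Refresh existing mirror |
| Starting point | Last captured page         | Root URL                |
| Scope          | Queued pages only          | Full re-crawl           |
| Cache usage    | Resume state               | Check for changes       |
| Manifest       | Resume state               | Fresh crawl metadata    |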

Use continue when a capture was interrupted. Use update when you want to refresh content.

Cache Behavior

The update command uses HTTP caching headers:

ETags

If the server reports that a file's ETag still matches the cached value, the file is skipped:

GET /style.css
If-None-Match: "abc123"

304 Not Modified → Skip download

Last-Modified

If a file hasn't changed since the last capture, the server responds with 304 and the download is skipped:

GET /page.html
If-Modified-Since: Mon, 01 Jan 2024 00:00:00 GMT

304 Not Modified → Skip download
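
The two conditional requests above can be sketched as a pair of helpers. This is an illustration under stated assumptions rather than smippo's code: the cache entry shape (`etag`, `lastModified`) and the function names `conditionalHeaders` and `isUnchanged` are hypothetical, since the layout of the cache file is not described here.

```javascript
// Hypothetical cache entry shape: { etag?: string, lastModified?: string }.
// Builds the request headers that ask the server to reply 304 when the
// cached copy is still current.
function conditionalHeaders(entry) {
  const headers = {};
  if (entry && entry.etag) headers['If-None-Match'] = entry.etag;
  if (entry && entry.lastModified) headers['If-Modified-Since'] = entry.lastModified;
  return headers;
}

// A 304 status means the server confirmed nothing changed, so the saved
// file can be kept and the download skipped.
function isUnchanged(status) {
  return status === 304;
}

console.log(conditionalHeaders({ etag: '"abc123"' }));
console.log(isUnchanged(304));
```

A caller would pass the returned object as the request headers of whatever HTTP client it uses and check the response status with `isUnchanged` before writing anything to disk.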

Forced Refresh

To ignore cache and re-download everything:

smippo update --no-cache

Verbose Output

See what's being updated:

smippo update --verbose

Output shows:

  • Skipped: /page.html (not modified) — Unchanged files
  • Updated: /news.html — Modified files
  • New: /blog/post-2024.html — Newly discovered pages
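
To get a quick count of what changed, the verbose lines can be tallied with a short helper. This is a sketch that assumes each relevant line begins with `Skipped:`, `Updated:`, or `New:` as in the examples above (optionally preceded by a bullet or whitespace); the real output may carry extra decoration, and `summarizeVerboseOutput` is a hypothetical name, not part of smippo.

```javascript
// Counts verbose-output lines by their leading label. Lines that do not
// start with one of the three labels are ignored.
function summarizeVerboseOutput(text) {
  const counts = { skipped: 0, updated: 0, new: 0 };
  for (const line of text.split('\n')) {
    // Allow leading bullets or whitespace before the label.
    const match = line.match(/^\W*(Skipped|Updated|New):/);
    if (match) counts[match[1].toLowerCase()] += 1;
  }
  return counts;
}

const exampleOutput = [
  'Skipped: /page.html (not modified)',
  'Updated: /news.html',
  'New: /blog/post-2024.html',
].join('\n');
console.log(summarizeVerboseOutput(exampleOutput));
```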

Update Scheduling

For automated updates, use cron:

# Update docs every Sunday at 3am
0 3 * * 0 cd /path/to/mirror && smippo update -o ./docs

Or use a Node.js script:

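import { capture, readManifest } from 'smippo';

// Load the saved manifest to recover the original URL and capture options.
const manifest = await readManifest('./docs');

// Spread the saved options first so the explicit settings below take precedence.
await capture(manifest.rootUrl, {
  ...manifest.options,
  output: './docs',
  useCache: true
});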

Practical Example

Maintaining a documentation archive:

# Initial capture
smippo https://docs.example.com \
  --depth 5 \
  --scope subdomain \
  --output ./docs-archive

# Weekly updates
smippo update -o ./docs-archive

# Check what changed
smippo update -o ./docs-archive --verbose

Requirements

For update to work, the directory must contain:

  • .smippo/manifest.json — Capture metadata
  • .smippo/cache.json — Cache data (ETags, etc.)

If these files are missing or corrupted, you'll get an error:

✗ Error: No capture found in the specified directory. Start a new capture first.
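
A scheduled job may want to verify these files before invoking the update. The sketch below only checks that the two paths listed above are present; it does not detect corruption. The helper name `missingCaptureFiles` is hypothetical, and the existence check is passed in as a function (for example `existsSync` from Node's `fs` module) so the logic stays independent of the file system.

```javascript
// The two files named above, relative to the mirror directory.
const REQUIRED_FILES = ['.smippo/manifest.json', '.smippo/cache.json'];

// Returns the required paths that are absent. `exists` is any function
// that reports whether a path exists, e.g. existsSync from node:fs.
function missingCaptureFiles(dir, exists) {
  const base = dir.replace(/\/+$/, ''); // drop trailing slashes
  return REQUIRED_FILES
    .map((file) => `${base}/${file}`)
    .filter((file) => !exists(file));
}

// Example with a stand-in check that knows about the manifest only.
const known = new Set(['./docs/.smippo/manifest.json']);
console.log(missingCaptureFiles('./docs', (p) => known.has(p)));
```

An empty result means both files are present; anything else lists what is missing.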

All Options

| Option             | Description                     | Default |
|--------------------|---------------------------------|---------|
| -o, --output <dir> | Directory with existing capture | ./site  |
| -v, --verbose      | Verbose output                  | false   |
