Update Command
The smippo update command refreshes an existing mirror. It re-crawls the site, fetching only changed content while preserving previously captured pages.
Basic Usage
smippo update [options]
Update Default Directory
Update the mirror in ./site:
smippo update
Update Custom Directory
Update a mirror in a specific location:
smippo update -o ./docs-mirror
smippo update --output ~/archives/blog
How It Works
The update command:
- Reads the manifest — Gets original URL and options
- Re-crawls — Starts a fresh crawl from the root URL
- Checks cache — Uses ETags and Last-Modified headers to skip unchanged files
- Updates changed files — Downloads and saves only modified content
- Adds new pages — Captures newly discovered pages
- Updates manifest — Records the update timestamp
Before Update:
├── example.com/
│ ├── index.html (2024-01-01)
│ └── about.html (2024-01-01)
After Update:
├── example.com/
│ ├── index.html (2024-01-15, updated)
│ ├── about.html (2024-01-01, unchanged)
│ └── news.html (2024-01-15, new)
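The before/after listing can be modeled as a small classification step. The sketch below is not smippo's implementation: `classifyPages` and the shape of its inputs (a `Map` of path to stored validator, plus a list of crawl results) are hypothetical, chosen only to illustrate how a re-crawl could separate unchanged, updated, and new pages.

```javascript
// Hypothetical sketch: classify re-crawled pages against a previous capture.
// `previous` is a Map from path to the validator (e.g. an ETag) stored last time.
// `crawled` is a list of { path, validator } objects from the fresh crawl.
function classifyPages(previous, crawled) {
  const result = { unchanged: [], updated: [], added: [] };
  for (const { path, validator } of crawled) {
    if (!previous.has(path)) {
      result.added.push(path); // never captured before
    } else if (previous.get(path) === validator) {
      result.unchanged.push(path); // same validator, keep the saved copy
    } else {
      result.updated.push(path); // validator differs, re-download
    }
  }
  return result;
}

const previous = new Map([
  ['/index.html', 'v1'],
  ['/about.html', 'v1'],
]);
const crawled = [
  { path: '/index.html', validator: 'v2' },
  { path: '/about.html', validator: 'v1' },
  { path: '/news.html', validator: 'v1' },
];
console.log(classifyPages(previous, crawled));
```

With this sample input, `/index.html` lands in `updated`, `/about.html` in `unchanged`, and `/news.html` in `added`, mirroring the listing above.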
When to Use Update
Regular Refreshes
Keep documentation mirrors up to date:
# Weekly update of docs
smippo update -o ./react-docs
Check for Changes
See what has changed on a site:
smippo update -o ./competitor-site --verbose
Archive Maintenance
Update archived blogs or news sites:
smippo update -o ./blog-archive
Difference from Continue
| Feature | Continue | Update |
|---|---|---|
| Purpose | Resume interrupted capture | Refresh existing mirror |
| Starting point | Last captured page | Root URL |
| Scope | Queued pages only | Full re-crawl |
| Cache usage | Resume state | Check for changes |
| Manifest | Resume state | Fresh crawl metadata |
Use continue when a capture was interrupted. Use update when you want to refresh content.
Cache Behavior
The update command uses HTTP caching headers:
ETags
If a file returns the same ETag, it's skipped:
GET /style.css
If-None-Match: "abc123"
304 Not Modified → Skip download
Last-Modified
If a file hasn't changed since last capture:
GET /page.html
If-Modified-Since: Mon, 01 Jan 2024 00:00:00 GMT
304 Not Modified → Skip download
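As a rough illustration of these conditional requests, the sketch below builds the request headers from a cached entry and interprets the response status. The entry shape (`{ etag, lastModified }`) and both function names are assumptions made for this example; the actual layout of smippo's cache file is not documented here.

```javascript
// Hypothetical sketch: build conditional request headers from a cached entry.
// The entry shape ({ etag, lastModified }) is an assumption for illustration.
function conditionalHeaders(entry) {
  const headers = {};
  if (entry && entry.etag) headers['If-None-Match'] = entry.etag;
  if (entry && entry.lastModified) headers['If-Modified-Since'] = entry.lastModified;
  return headers;
}

// A 304 status means the cached copy is still current, so the download can be skipped.
function shouldSkipDownload(status) {
  return status === 304;
}

console.log(conditionalHeaders({ etag: '"abc123"' }));
console.log(shouldSkipDownload(304));
```

The returned object can be passed as the `headers` option of an HTTP client such as the built-in `fetch` in recent Node.js versions. An entry with neither field produces no conditional headers, so the server returns the full response.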
Forced Refresh
To ignore cache and re-download everything:
smippo update --no-cache
Verbose Output
See what's being updated:
smippo update --verbose
Output shows:
- Skipped: /page.html (not modified) - Unchanged files
- Updated: /news.html - Modified files
- New: /blog/post-2024.html - Newly discovered pages
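If the verbose output is saved to a file or piped into a script, it can be summarized with a few lines of code. The sketch below assumes each relevant line starts with `Skipped:`, `Updated:`, or `New:` as in the examples above; the exact output format may differ between versions, and `summarizeVerboseOutput` is a hypothetical helper, not part of smippo.

```javascript
// Hypothetical sketch: count verbose output lines by their prefix.
// Assumes each relevant line starts with "Skipped:", "Updated:", or "New:".
function summarizeVerboseOutput(text) {
  const counts = { skipped: 0, updated: 0, added: 0 };
  for (const line of text.split('\n')) {
    const trimmed = line.trim();
    if (trimmed.startsWith('Skipped:')) counts.skipped += 1;
    else if (trimmed.startsWith('Updated:')) counts.updated += 1;
    else if (trimmed.startsWith('New:')) counts.added += 1;
    // Any other line (progress messages, blank lines) is ignored.
  }
  return counts;
}

const sample = [
  'Skipped: /page.html (not modified)',
  'Updated: /news.html',
  'New: /blog/post-2024.html',
].join('\n');
console.log(summarizeVerboseOutput(sample));
```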
Update Scheduling
For automated updates, use cron:
# Update docs every Sunday at 3am
0 3 * * 0 cd /path/to/mirror && smippo update -o ./docs
Or use a Node.js script:
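import { capture, readManifest } from 'smippo';

// Load the root URL and options recorded by the original capture
const manifest = await readManifest('./docs');

// Re-run the capture against the same directory with caching enabled
await capture(manifest.rootUrl, {
  output: './docs',
  useCache: true,
  ...manifest.options
});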
Practical Example
Maintaining a documentation archive:
# Initial capture
smippo https://docs.example.com \
--depth 5 \
--scope subdomain \
--output ./docs-archive
# Weekly updates
smippo update -o ./docs-archive
# Check what changed
smippo update -o ./docs-archive --verbose
Requirements
For update to work, the directory must contain:
- .smippo/manifest.json - Capture metadata
- .smippo/cache.json - Cache data (ETags, etc.)
If these files are missing or corrupted, you'll get an error:
✗ Error: No capture found in the specified directory. Start a new capture first.
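A script that schedules updates may want to check for these files before invoking the command. The sketch below is one way to do that; `missingCaptureFiles` is a hypothetical helper, and the `exists` callback is injected so the function stays easy to test. In a real script it could be `existsSync` from Node's `node:fs` module. It only detects missing files, not corrupted ones.

```javascript
// Hypothetical sketch: list required capture files that are missing from a mirror.
// `exists` is any function that reports whether a path exists,
// for example `existsSync` from Node's `node:fs` module.
function missingCaptureFiles(dir, exists) {
  const required = ['.smippo/manifest.json', '.smippo/cache.json'];
  const base = dir.replace(/\/+$/, ''); // drop trailing slashes
  return required.filter((file) => !exists(`${base}/${file}`));
}

// Example with a stand-in `exists` that only knows about the manifest.
const known = new Set(['./docs/.smippo/manifest.json']);
console.log(missingCaptureFiles('./docs', (p) => known.has(p)));
```

An empty result means both files are present and the update can proceed; otherwise the script can report which file is missing before starting a new capture.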
All Options
| Option | Description | Default |
|---|---|---|
| -o, --output <dir> | Directory with existing capture | ./site |
| -v, --verbose | Verbose output | false |
Related Commands
- Continue Command — Resume interrupted captures
- Capture Command — Start fresh captures