Utilities
Smippo exports utility functions for URL handling, filtering, manifest and cache management, link extraction and rewriting, and serving captured sites.
URL Utilities
normalizeUrl
Normalize a URL for consistent comparison.
import { normalizeUrl } from 'smippo';
normalizeUrl('https://Example.com/Page/');
// → 'https://example.com/page'
normalizeUrl('https://example.com:443/path');
// → 'https://example.com/path'
normalizeUrl('http://example.com/path#anchor');
// → 'http://example.com/path'
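To make the normalization rules concrete, here is a minimal sketch built on the standard `URL` class. It is not Smippo's implementation: `normalizeUrlSketch` is a hypothetical name, and the path lowercasing and trailing-slash removal are assumptions inferred from the examples above.

```javascript
// Hypothetical helper for illustration; not part of the Smippo API.
function normalizeUrlSketch(input) {
  // The URL parser already lowercases the host and drops default ports (80, 443).
  const url = new URL(input);
  // Assumption taken from the examples above: lowercase the path
  // and strip any trailing slash.
  const path = url.pathname.toLowerCase().replace(/\/+$/, '');
  // The fragment is left out by construction; the query string is kept.
  return url.origin + path + url.search;
}

console.log(normalizeUrlSketch('https://Example.com/Page/'));
// → 'https://example.com/page'
console.log(normalizeUrlSketch('https://example.com:443/path'));
// → 'https://example.com/path'
```

Lowercasing the path is lossy on case-sensitive servers, so a sketch like this suits deduplication keys better than URLs that will be requested again.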
resolveUrl
Resolve a relative URL against a base URL.
import { resolveUrl } from 'smippo';
resolveUrl('../about', 'https://example.com/docs/intro');
// → 'https://example.com/about'
resolveUrl('/contact', 'https://example.com/about/');
// → 'https://example.com/contact'
resolveUrl('page.html', 'https://example.com/docs/');
// → 'https://example.com/docs/page.html'
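The same resolution rules are available through the standard two-argument `URL` constructor. The sketch below shows that behavior, assuming (without confirmation from the source) that invalid input should yield `null`; `resolveUrlSketch` is a hypothetical name.

```javascript
// Hypothetical helper for illustration; not part of the Smippo API.
function resolveUrlSketch(relative, base) {
  try {
    // new URL(relative, base) applies standard relative-reference resolution.
    return new URL(relative, base).href;
  } catch {
    // Assumption: report unparsable input as null instead of throwing.
    return null;
  }
}

console.log(resolveUrlSketch('../about', 'https://example.com/docs/intro'));
// → 'https://example.com/about'
console.log(resolveUrlSketch('page.html', 'https://example.com/docs/'));
// → 'https://example.com/docs/page.html'
```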
urlToPath
Convert a URL to a local file path.
import { urlToPath } from 'smippo';
urlToPath('https://example.com/about');
// → 'example.com/about/index.html'
urlToPath('https://example.com/style.css');
// → 'example.com/style.css'
urlToPath('https://cdn.example.com/font.woff2');
// → 'cdn.example.com/font.woff2'
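One way to read the mapping in these examples: the hostname becomes the top-level directory, and a path whose last segment has no file extension is treated as a directory containing `index.html`. The sketch below encodes that reading. It is an assumption drawn from the examples, not Smippo's actual logic, and `urlToPathSketch` is a hypothetical name.

```javascript
// Hypothetical helper for illustration; not part of the Smippo API.
function urlToPathSketch(input) {
  const url = new URL(input);
  let path = url.pathname;
  const lastSegment = path.split('/').pop();
  // Assumption: no extension in the last segment means "page", stored as index.html.
  if (!lastSegment.includes('.')) {
    path = path.replace(/\/$/, '') + '/index.html';
  }
  return url.hostname + path;
}

console.log(urlToPathSketch('https://example.com/about'));
// → 'example.com/about/index.html'
console.log(urlToPathSketch('https://example.com/style.css'));
// → 'example.com/style.css'
```

This sketch ignores query strings, which a real mirroring tool would need to encode into the file name to avoid collisions.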
isInScope
Check if a URL is within the specified scope.
import { isInScope } from 'smippo';
isInScope('https://www.example.com/page', {
  baseUrl: 'https://www.example.com',
  scope: 'subdomain'
});
// → true
isInScope('https://docs.example.com/page', {
  baseUrl: 'https://www.example.com',
  scope: 'subdomain'
});
// → false
isInScope('https://docs.example.com/page', {
  baseUrl: 'https://www.example.com',
  scope: 'domain'
});
// → true
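The three results above are consistent with `'subdomain'` meaning "same hostname" and `'domain'` meaning "same registrable domain". The sketch below implements that interpretation. Both the interpretation and the names `isInScopeSketch` and `naiveRegistrableDomain` are assumptions, and the domain check simply keeps the last two hostname labels, which is wrong for suffixes such as `co.uk`.

```javascript
// Hypothetical helpers for illustration; not part of the Smippo API.
function naiveRegistrableDomain(hostname) {
  // Naive: keeps the last two labels. A real implementation would
  // consult the Public Suffix List.
  return hostname.split('.').slice(-2).join('.');
}

function isInScopeSketch(url, { baseUrl, scope }) {
  const target = new URL(url).hostname;
  const base = new URL(baseUrl).hostname;
  if (scope === 'subdomain') return target === base;
  if (scope === 'domain') {
    return naiveRegistrableDomain(target) === naiveRegistrableDomain(base);
  }
  // Only the two scope values shown above are handled in this sketch.
  throw new Error(`Unhandled scope in sketch: ${scope}`);
}

console.log(isInScopeSketch('https://docs.example.com/page', {
  baseUrl: 'https://www.example.com',
  scope: 'domain',
}));
// → true
```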
isSameOrigin
Check if two URLs have the same origin.
import { isSameOrigin } from 'smippo';
isSameOrigin('https://example.com/a', 'https://example.com/b');
// → true
isSameOrigin('https://example.com', 'http://example.com');
// → false
isSameOrigin('https://example.com', 'https://www.example.com');
// → false
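An origin is the combination of scheme, host, and port, which the standard `URL` class exposes as `origin`. A minimal sketch of the comparison, with `isSameOriginSketch` as a hypothetical name, could look like this:

```javascript
// Hypothetical helper for illustration; not part of the Smippo API.
function isSameOriginSketch(a, b) {
  // URL.origin serializes scheme + host + port, omitting default ports.
  return new URL(a).origin === new URL(b).origin;
}

console.log(isSameOriginSketch('https://example.com:443/a', 'https://example.com/b'));
// → true (443 is the default port for https)
console.log(isSameOriginSketch('https://example.com', 'http://example.com'));
// → false (different scheme)
```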
isSameDomain
Check if two URLs are on the same domain (ignoring subdomain).
import { isSameDomain } from 'smippo';
isSameDomain('https://www.example.com', 'https://docs.example.com');
// → true
isSameDomain('https://example.com', 'https://example.org');
// → false
isLikelyPage
Determine if a URL is likely an HTML page (vs an asset).
import { isLikelyPage } from 'smippo';
isLikelyPage('https://example.com/about');
// → true
isLikelyPage('https://example.com/page.html');
// → true
isLikelyPage('https://example.com/style.css');
// → false
isLikelyPage('https://example.com/image.png');
// → false
isAsset
Check if a URL is likely a static asset.
import { isAsset } from 'smippo';
isAsset('https://example.com/style.css');
// → true
isAsset('https://example.com/script.js');
// → true
isAsset('https://example.com/about');
// → false
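Both `isLikelyPage` and `isAsset` are described as heuristics. A common way to implement such a heuristic is to look at the file extension of the URL path. The sketch below does that; the extension list and the function names are assumptions for illustration, not Smippo's actual rules.

```javascript
// Hypothetical helpers for illustration; not part of the Smippo API.
// Assumed extension list; the real one may differ.
const ASSET_EXTENSIONS = new Set([
  'css', 'js', 'png', 'jpg', 'jpeg', 'gif', 'svg', 'webp', 'ico',
  'woff', 'woff2', 'ttf', 'mp4', 'pdf',
]);

function extensionOf(url) {
  const lastSegment = new URL(url).pathname.split('/').pop();
  const dot = lastSegment.lastIndexOf('.');
  return dot > 0 ? lastSegment.slice(dot + 1).toLowerCase() : '';
}

function isAssetSketch(url) {
  return ASSET_EXTENSIONS.has(extensionOf(url));
}

function isLikelyPageSketch(url) {
  const ext = extensionOf(url);
  // No extension, or an HTML extension, is treated as a page.
  return ext === '' || ext === 'html' || ext === 'htm';
}

console.log(isLikelyPageSketch('https://example.com/about')); // → true
console.log(isAssetSketch('https://example.com/style.css'));  // → true
```

Extension checks are only a first guess; the response `Content-Type` is the authoritative signal once a resource has been fetched.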
Filter Utilities
createFilter
Create a filter that decides which URLs to follow and which resources to download.
import { createFilter } from 'smippo';
const filter = createFilter({
  baseUrl: 'https://example.com',
  scope: 'domain',
  include: ['*.html', '*.css'],
  exclude: ['*admin*', '*tracking*'],
  mimeInclude: ['text/*', 'image/*'],
  mimeExclude: ['video/*'],
  maxSize: 10 * 1024 * 1024, // 10MB
  minSize: 100,
});
// Check if URL should be followed
filter.shouldFollow('https://example.com/page');
// → true
filter.shouldFollow('https://example.com/admin/');
// → false
// Check if resource should be downloaded
filter.shouldDownload('text/html', 5000);
// → true
filter.shouldDownload('video/mp4', 100000000);
// → false
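Patterns such as `*admin*` and `text/*` can be evaluated by translating each glob into an anchored regular expression. The sketch below shows that technique together with a MIME and size check that mirrors the option names above. The precedence (exclude first, then include, then size) is an assumption, and `globToRegExp`, `matchesAny`, and `shouldDownloadSketch` are hypothetical names rather than Smippo internals.

```javascript
// Hypothetical helpers for illustration; not part of the Smippo API.
function globToRegExp(glob) {
  // Escape regex metacharacters except '*', then let '*' match any run of characters.
  const escaped = glob.replace(/[.+?^${}()|[\]\\]/g, '\\$&');
  return new RegExp('^' + escaped.replace(/\*/g, '.*') + '$');
}

function matchesAny(globs, value) {
  return globs.some((glob) => globToRegExp(glob).test(value));
}

function shouldDownloadSketch(options, mimeType, size) {
  const { mimeInclude = [], mimeExclude = [], minSize = 0, maxSize = Infinity } = options;
  // Assumed precedence: exclusions win over inclusions.
  if (matchesAny(mimeExclude, mimeType)) return false;
  if (mimeInclude.length > 0 && !matchesAny(mimeInclude, mimeType)) return false;
  return size >= minSize && size <= maxSize;
}

const options = {
  mimeInclude: ['text/*', 'image/*'],
  mimeExclude: ['video/*'],
  maxSize: 10 * 1024 * 1024,
  minSize: 100,
};
console.log(shouldDownloadSketch(options, 'text/html', 5000));      // → true
console.log(shouldDownloadSketch(options, 'video/mp4', 100000000)); // → false
console.log(matchesAny(['*admin*', '*tracking*'], 'https://example.com/admin/')); // → true
```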
Filter Class
Direct access to the Filter class for advanced usage.
import { Filter } from 'smippo';
const filter = new Filter({
  baseUrl: 'https://example.com',
  scope: 'subdomain',
  stayInDir: true,
  include: ['*/docs/*'],
  exclude: ['*private*'],
});
filter.shouldFollow('https://example.com/docs/intro');
// → true
Manifest Utilities
createManifest
Create a new manifest object.
import { createManifest } from 'smippo';
const manifest = createManifest('https://example.com', {
  depth: 3,
  scope: 'domain',
});
console.log(manifest.version);
console.log(manifest.rootUrl);
console.log(manifest.created);
readManifest
Read a manifest from disk.
import { readManifest } from 'smippo';
const manifest = await readManifest('./site');
console.log(manifest.rootUrl);
console.log(manifest.stats.pagesCapt);
console.log(manifest.pages.length);
writeManifest
Write a manifest to disk.
import { writeManifest } from 'smippo';
await writeManifest('./site', manifest);
manifestExists
Check if a manifest exists in a directory.
import { manifestExists } from 'smippo';
if (manifestExists('./site')) {
  console.log('Capture exists, can continue or update');
} else {
  console.log('No existing capture');
}
readCache
Read cache data from disk.
import { readCache } from 'smippo';
const cache = await readCache('./site');
console.log(cache.etags);
console.log(cache.lastModified);
writeCache
Write cache data to disk.
import { writeCache } from 'smippo';
await writeCache('./site', {
  etags: { 'https://example.com/style.css': '"abc123"' },
  lastModified: {},
  contentTypes: {}
});
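Cache data of this shape is typically used to build conditional HTTP requests, so that unchanged resources come back as `304 Not Modified` instead of being downloaded again. The sketch below shows how the `etags` and `lastModified` maps could feed the standard `If-None-Match` and `If-Modified-Since` headers. Whether Smippo uses the cache this way is an assumption, as is storing `lastModified` values as HTTP date strings; `conditionalHeadersSketch` is a hypothetical name.

```javascript
// Hypothetical helper for illustration; not part of the Smippo API.
function conditionalHeadersSketch(cache, url) {
  const headers = {};
  const etag = cache.etags?.[url];
  // Assumption: lastModified values are stored as HTTP date strings.
  const lastModified = cache.lastModified?.[url];
  if (etag) headers['If-None-Match'] = etag;
  if (lastModified) headers['If-Modified-Since'] = lastModified;
  return headers;
}

const exampleCache = {
  etags: { 'https://example.com/style.css': '"abc123"' },
  lastModified: {},
  contentTypes: {},
};
console.log(conditionalHeadersSketch(exampleCache, 'https://example.com/style.css'));
// → { 'If-None-Match': '"abc123"' }
```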
Link Utilities
extractLinks
Extract links from HTML content.
import { extractLinks } from 'smippo';
const html = `
  <a href="/about">About</a>
  <link href="/style.css" rel="stylesheet">
  <img src="/logo.png">
`;
const links = extractLinks(html, 'https://example.com');
// → {
//   pages: ['https://example.com/about'],
//   assets: ['https://example.com/style.css', 'https://example.com/logo.png']
// }
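To illustrate the pages/assets split, here is a deliberately naive sketch that scans for `href` and `src` attributes with a regular expression and resolves them against the base URL. It assumes that `<a>` targets count as pages and everything else as assets, which matches the example above but is not confirmed as Smippo's rule. A regex scan is fragile on real-world HTML, so treat this as a sketch only; `extractLinksSketch` is a hypothetical name.

```javascript
// Hypothetical helper for illustration; not part of the Smippo API.
function extractLinksSketch(html, baseUrl) {
  const pages = [];
  const assets = [];
  // Matches a small set of tags and captures the first href/src value.
  const tagPattern = /<(a|link|img|script)\b[^>]*?\b(?:href|src)\s*=\s*["']([^"']+)["'][^>]*>/gi;
  for (const match of html.matchAll(tagPattern)) {
    const tag = match[1].toLowerCase();
    const absolute = new URL(match[2], baseUrl).href;
    // Assumption: anchors lead to pages, all other tags reference assets.
    (tag === 'a' ? pages : assets).push(absolute);
  }
  return { pages, assets };
}

const sampleHtml = `
  <a href="/about">About</a>
  <link href="/style.css" rel="stylesheet">
  <img src="/logo.png">
`;
console.log(extractLinksSketch(sampleHtml, 'https://example.com'));
// → { pages: ['https://example.com/about'],
//     assets: ['https://example.com/style.css', 'https://example.com/logo.png'] }
```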
extractCssUrls
Extract URLs from CSS content.
import { extractCssUrls } from 'smippo';
const css = `
  body { background: url('/bg.png'); }
  @font-face { src: url('/font.woff2'); }
  @import url('https://fonts.example.com/style.css');
`;
const urls = extractCssUrls(css, 'https://example.com');
// → [
//   'https://example.com/bg.png',
//   'https://example.com/font.woff2',
//   'https://fonts.example.com/style.css'
// ]
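CSS references are easier to find than HTML links because they all share the `url(...)` form. The sketch below scans for that form, handles quoted and unquoted values, skips `data:` URIs, and resolves the rest against the base URL. Skipping `data:` URIs is an assumption, and the sketch does not cover `@import "file.css"` without `url()`. `extractCssUrlsSketch` is a hypothetical name.

```javascript
// Hypothetical helper for illustration; not part of the Smippo API.
function extractCssUrlsSketch(css, baseUrl) {
  // Group 1 is the optional opening quote; \1 requires the same closing quote.
  const urlPattern = /url\(\s*(['"]?)([^'")]+)\1\s*\)/gi;
  const found = [];
  for (const match of css.matchAll(urlPattern)) {
    const raw = match[2].trim();
    // Assumption: inline data URIs need no download, so they are skipped.
    if (raw.startsWith('data:')) continue;
    found.push(new URL(raw, baseUrl).href);
  }
  return found;
}

const sampleCss = `
  body { background: url('/bg.png'); }
  @font-face { src: url('/font.woff2'); }
  @import url('https://fonts.example.com/style.css');
`;
console.log(extractCssUrlsSketch(sampleCss, 'https://example.com'));
// → ['https://example.com/bg.png',
//    'https://example.com/font.woff2',
//    'https://fonts.example.com/style.css']
```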
rewriteLinks
Rewrite links in HTML for offline viewing.
import { rewriteLinks } from 'smippo';
const html = `<a href="https://example.com/about">About</a>`;
const urlMap = new Map([
  ['https://example.com/about', 'example.com/about/index.html']
]);
const rewritten = rewriteLinks(html, 'https://example.com', urlMap);
// → '<a href="./about/index.html">About</a>'
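The rewritten `href` is a path relative to the file that contains the link, which is what lets the capture open from any location on disk. The sketch below shows one way to compute such a path from two local file paths. It assumes the referring page `https://example.com` is stored as `example.com/index.html`; that assumption and the name `relativePathSketch` are illustrative, not Smippo internals.

```javascript
// Hypothetical helper for illustration; not part of the Smippo API.
function relativePathSketch(fromFile, toFile) {
  const fromDir = fromFile.split('/').slice(0, -1); // directory of the referring file
  const toParts = toFile.split('/');
  // Count the leading directories both paths share.
  let shared = 0;
  while (
    shared < fromDir.length &&
    shared < toParts.length - 1 &&
    fromDir[shared] === toParts[shared]
  ) {
    shared++;
  }
  const up = fromDir.slice(shared).map(() => '..'); // climb out of unshared directories
  const down = toParts.slice(shared);               // descend to the target
  const joined = [...up, ...down].join('/');
  return up.length === 0 ? './' + joined : joined;
}

console.log(relativePathSketch('example.com/index.html', 'example.com/about/index.html'));
// → './about/index.html'
console.log(relativePathSketch('example.com/docs/intro/index.html', 'example.com/style.css'));
// → '../../style.css'
```

The same computation applies to URLs inside stylesheets, where the referring file is the CSS file rather than the page.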
rewriteCssUrls
Rewrite URLs in CSS for offline viewing.
import { rewriteCssUrls } from 'smippo';
const css = `body { background: url('https://example.com/bg.png'); }`;
const urlMap = new Map([
  ['https://example.com/bg.png', 'example.com/bg.png']
]);
const rewritten = rewriteCssUrls(css, 'https://example.com/style.css', urlMap);
// → "body { background: url('./bg.png'); }"
Server Utilities
createServer
Create and start a web server.
import { createServer } from 'smippo';
const server = await createServer({
  directory: './site',
  port: 8080,
  host: '127.0.0.1',
  open: true,
  cors: true,
});
console.log(server.url);
// Stop server
await server.close();
serve
CLI-style serve function (blocks until Ctrl+C).
import { serve } from 'smippo';
await serve({
  directory: './site',
  port: 8080,
  open: true,
});
Type Exports
For TypeScript users, types are exported:
import type {
  CrawlerOptions,
  CaptureOptions,
  CaptureResult,
  ServerOptions,
  FilterOptions,
} from 'smippo';
Next Steps
- Programmatic API — Getting started
- Crawler Class — Advanced crawler usage
- Examples — Real-world usage