Utilities

Smippo exports utility functions for URL handling, filtering, manifest and cache management, link extraction and rewriting, and serving captured sites.

URL Utilities

normalizeUrl

Normalize a URL for consistent comparison.

import { normalizeUrl } from 'smippo';

normalizeUrl('https://Example.com/Page/');
// → 'https://example.com/page'

normalizeUrl('https://example.com:443/path');
// → 'https://example.com/path'

normalizeUrl('http://example.com/path#anchor');
// → 'http://example.com/path'
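The transformations shown in these examples (lowercasing, dropping the default port, stripping the fragment and the trailing slash) can be approximated with the standard `URL` class. This is a minimal sketch, not Smippo's implementation: `normalizeUrlSketch` is a hypothetical name, and folding the path to lowercase is assumed only because the first example shows it.

```javascript
// Hypothetical helper; it approximates the documented examples only.
function normalizeUrlSketch(input) {
  // Parsing already lowercases the host and drops default ports (80, 443).
  const url = new URL(input);
  // Assumption: path case is folded, as the first documented example suggests.
  let path = url.pathname.toLowerCase();
  // Drop a single trailing slash; the root path '/' becomes ''.
  if (path.endsWith('/')) path = path.slice(0, -1);
  // The fragment (url.hash) is deliberately left out of the result.
  return url.origin + path + url.search;
}

console.log(normalizeUrlSketch('https://Example.com/Page/'));
// → 'https://example.com/page'
console.log(normalizeUrlSketch('https://example.com:443/path'));
// → 'https://example.com/path'
```

Normalizing before comparison is what lets a crawler treat these variants as one entry in a visited set.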

resolveUrl

Resolve a relative URL against a base URL.

import { resolveUrl } from 'smippo';

resolveUrl('../about', 'https://example.com/docs/intro');
// → 'https://example.com/about'

resolveUrl('/contact', 'https://example.com/about/');
// → 'https://example.com/contact'

resolveUrl('page.html', 'https://example.com/docs/');
// → 'https://example.com/docs/page.html'
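The same resolution rules are available in the standard `URL` constructor, which takes a base as its second argument. The sketch below assumes `resolveUrl` behaves like that constructor, which matches the three examples but is not confirmed by this page; `resolveUrlSketch` is a hypothetical name.

```javascript
// Hypothetical stand-in built on the WHATWG URL constructor.
function resolveUrlSketch(relative, base) {
  // new URL(relative, base) applies standard relative-reference resolution.
  return new URL(relative, base).href;
}

console.log(resolveUrlSketch('../about', 'https://example.com/docs/intro'));
// → 'https://example.com/about'
console.log(resolveUrlSketch('page.html', 'https://example.com/docs/'));
// → 'https://example.com/docs/page.html'
```

Note that the trailing slash on the base matters: `page.html` resolved against `https://example.com/docs` (no slash) would land at `https://example.com/page.html`.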

urlToPath

Convert a URL to a local file path.

import { urlToPath } from 'smippo';

urlToPath('https://example.com/about');
// → 'example.com/about/index.html'

urlToPath('https://example.com/style.css');
// → 'example.com/style.css'

urlToPath('https://cdn.example.com/font.woff2');
// → 'cdn.example.com/font.woff2'
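The mapping in these examples looks like "hostname plus path, with `index.html` appended when the last segment has no extension". A rough sketch of that rule follows; `urlToPathSketch` is a hypothetical name, and query strings and character escaping, which a real implementation would need to handle, are ignored here.

```javascript
// Hypothetical helper; mirrors only the pattern visible in the examples.
function urlToPathSketch(input) {
  const { hostname, pathname } = new URL(input);
  const lastSegment = pathname.split('/').pop();
  // A dot in the last segment is treated as a file extension (an assumption).
  if (lastSegment.includes('.')) return hostname + pathname;
  // Extensionless paths are treated as directories holding an index.html.
  const dir = pathname.endsWith('/') ? pathname : pathname + '/';
  return hostname + dir + 'index.html';
}

console.log(urlToPathSketch('https://example.com/about'));
// → 'example.com/about/index.html'
console.log(urlToPathSketch('https://example.com/style.css'));
// → 'example.com/style.css'
```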

isInScope

Check if a URL is within the specified scope.

import { isInScope } from 'smippo';

isInScope('https://www.example.com/page', {
  baseUrl: 'https://www.example.com',
  scope: 'subdomain'
});
// → true

isInScope('https://docs.example.com/page', {
  baseUrl: 'https://www.example.com',
  scope: 'subdomain'
});
// → false

isInScope('https://docs.example.com/page', {
  baseUrl: 'https://www.example.com',
  scope: 'domain'
});
// → true
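To make the difference between the two scopes concrete, here is a simplified sketch. It assumes `'subdomain'` means an exact hostname match and `'domain'` means a shared registrable domain, which is consistent with the examples above but not confirmed by them. The last-two-labels rule is deliberately naive and would misjudge suffixes such as `co.uk`; all names are hypothetical, and any other scope values Smippo may accept are not covered.

```javascript
// Naive registrable-domain guess: the last two labels of the hostname.
function registrableDomainSketch(hostname) {
  return hostname.split('.').slice(-2).join('.');
}

// Hypothetical stand-in covering only the two scopes shown above.
function isInScopeSketch(url, { baseUrl, scope }) {
  const target = new URL(url).hostname;
  const base = new URL(baseUrl).hostname;
  if (scope === 'subdomain') return target === base;
  if (scope === 'domain') {
    return registrableDomainSketch(target) === registrableDomainSketch(base);
  }
  return false; // unknown scope values are rejected in this sketch
}

console.log(isInScopeSketch('https://docs.example.com/page', {
  baseUrl: 'https://www.example.com',
  scope: 'domain',
}));
// → true
```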

isSameOrigin

Check if two URLs have the same origin.

import { isSameOrigin } from 'smippo';

isSameOrigin('https://example.com/a', 'https://example.com/b');
// → true

isSameOrigin('https://example.com', 'http://example.com');
// → false

isSameOrigin('https://example.com', 'https://www.example.com');
// → false
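An origin is the combination of scheme, host, and port, which is why the second and third examples return `false`. The standard `URL` class exposes it directly; the sketch below is an equivalent check under the assumption that Smippo uses the usual definition (`isSameOriginSketch` is a hypothetical name).

```javascript
// Hypothetical equivalent using the standard origin definition.
function isSameOriginSketch(a, b) {
  // url.origin is 'scheme://host[:port]' with default ports omitted.
  return new URL(a).origin === new URL(b).origin;
}

console.log(isSameOriginSketch('https://example.com/a', 'https://example.com:443/b'));
// → true (443 is the default port for https)
console.log(isSameOriginSketch('https://example.com', 'http://example.com'));
// → false (different scheme)
```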

isSameDomain

Check if two URLs are on the same domain (ignoring subdomain).

import { isSameDomain } from 'smippo';

isSameDomain('https://www.example.com', 'https://docs.example.com');
// → true

isSameDomain('https://example.com', 'https://example.org');
// → false

isLikelyPage

Determine if a URL is likely an HTML page (as opposed to a static asset).

import { isLikelyPage } from 'smippo';

isLikelyPage('https://example.com/about');
// → true

isLikelyPage('https://example.com/page.html');
// → true

isLikelyPage('https://example.com/style.css');
// → false

isLikelyPage('https://example.com/image.png');
// → false

isAsset

Check if a URL is likely a static asset.

import { isAsset } from 'smippo';

isAsset('https://example.com/style.css');
// → true

isAsset('https://example.com/script.js');
// → true

isAsset('https://example.com/about');
// → false
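Both `isLikelyPage` and `isAsset` are heuristics ("likely"), and the examples suggest they key off the file extension. The sketch below shows one way such a heuristic could work. The extension list is an assumption, as is treating `isAsset` as the exact complement of `isLikelyPage`; all names are hypothetical.

```javascript
// Assumed set of page-like extensions; '' stands for "no extension".
const PAGE_EXTENSIONS = new Set(['', 'html', 'htm']);

// Return the lowercase extension of the last path segment, or ''.
function extensionOf(input) {
  const lastSegment = new URL(input).pathname.split('/').pop();
  const dot = lastSegment.lastIndexOf('.');
  return dot === -1 ? '' : lastSegment.slice(dot + 1).toLowerCase();
}

function isLikelyPageSketch(url) {
  return PAGE_EXTENSIONS.has(extensionOf(url));
}

// Assumption: anything that is not page-like counts as an asset.
function isAssetSketch(url) {
  return !isLikelyPageSketch(url);
}

console.log(isLikelyPageSketch('https://example.com/about'));  // → true
console.log(isAssetSketch('https://example.com/script.js'));   // → true
```

An extension check cannot be certain: `/about` might serve a PDF. The response's `Content-Type` is the more reliable signal once it is available.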

Filter Utilities

createFilter

Create a filter for URL and resource filtering.

import { createFilter } from 'smippo';

const filter = createFilter({
  baseUrl: 'https://example.com',
  scope: 'domain',
  include: ['*.html', '*.css'],
  exclude: ['*admin*', '*tracking*'],
  mimeInclude: ['text/*', 'image/*'],
  mimeExclude: ['video/*'],
  maxSize: 10 * 1024 * 1024,  // 10MB
  minSize: 100,
});

// Check if URL should be followed
filter.shouldFollow('https://example.com/page');
// → true

filter.shouldFollow('https://example.com/admin/');
// → false

// Check if resource should be downloaded
filter.shouldDownload('text/html', 5000);
// → true

filter.shouldDownload('video/mp4', 100000000);
// → false
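The download decision combines MIME patterns with size bounds. The following sketch shows one plausible way to evaluate them, with exclusions checked first; the precedence order, the wildcard handling, and every function name here are assumptions rather than Smippo's documented behavior.

```javascript
// Convert a '*' wildcard pattern into an anchored regular expression.
function wildcardToRegExp(pattern) {
  // Escape regex metacharacters except '*', then expand '*' to '.*'.
  const escaped = pattern.replace(/[.+?^${}()|[\]\\]/g, '\\$&');
  return new RegExp('^' + escaped.replace(/\*/g, '.*') + '$');
}

function matchesAny(value, patterns = []) {
  return patterns.some((pattern) => wildcardToRegExp(pattern).test(value));
}

// Hypothetical stand-in for filter.shouldDownload(mimeType, size).
function shouldDownloadSketch(mimeType, size, options) {
  const { mimeInclude = [], mimeExclude = [], minSize = 0, maxSize = Infinity } = options;
  // Assumed precedence: an exclude match always wins.
  if (matchesAny(mimeType, mimeExclude)) return false;
  // If an include list is given, the type must match at least one entry.
  if (mimeInclude.length > 0 && !matchesAny(mimeType, mimeInclude)) return false;
  return size >= minSize && size <= maxSize;
}

const options = {
  mimeInclude: ['text/*', 'image/*'],
  mimeExclude: ['video/*'],
  maxSize: 10 * 1024 * 1024,
  minSize: 100,
};
console.log(shouldDownloadSketch('text/html', 5000, options));      // → true
console.log(shouldDownloadSketch('video/mp4', 100000000, options)); // → false
```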

Filter Class

Direct access to the Filter class for advanced usage.

import { Filter } from 'smippo';

const filter = new Filter({
  baseUrl: 'https://example.com',
  scope: 'subdomain',
  stayInDir: true,
  include: ['*/docs/*'],
  exclude: ['*private*'],
});

filter.shouldFollow('https://example.com/docs/intro');
// → true

Manifest Utilities

createManifest

Create a new manifest object.

import { createManifest } from 'smippo';

const manifest = createManifest('https://example.com', {
  depth: 3,
  scope: 'domain',
});

console.log(manifest.version);
console.log(manifest.rootUrl);
console.log(manifest.created);

readManifest

Read a manifest from disk.

import { readManifest } from 'smippo';

const manifest = await readManifest('./site');

console.log(manifest.rootUrl);
console.log(manifest.stats.pagesCaptured);
console.log(manifest.pages.length);

writeManifest

Write a manifest to disk.

import { writeManifest } from 'smippo';

await writeManifest('./site', manifest);

manifestExists

Check if a manifest exists in a directory.

import { manifestExists } from 'smippo';

if (manifestExists('./site')) {
  console.log('Capture exists, can continue or update');
} else {
  console.log('No existing capture');
}

readCache

Read cache data from disk.

import { readCache } from 'smippo';

const cache = await readCache('./site');

console.log(cache.etags);
console.log(cache.lastModified);

writeCache

Write cache data to disk.

import { writeCache } from 'smippo';

await writeCache('./site', {
  etags: { 'https://example.com/style.css': '"abc123"' },
  lastModified: {},
  contentTypes: {}
});
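ETag and Last-Modified values like these are typically replayed as HTTP conditional request headers so that unchanged resources come back as `304 Not Modified`. How Smippo consumes the cache is not stated here, so treat the following as an illustrative sketch: `conditionalHeadersSketch` is a hypothetical helper, and it assumes both maps are keyed by URL with the header values stored verbatim.

```javascript
// Build conditional request headers for a URL from a cache object
// shaped like the one passed to writeCache above.
function conditionalHeadersSketch(cache, url) {
  const headers = {};
  const etag = cache.etags?.[url];
  const lastModified = cache.lastModified?.[url];
  if (etag) headers['If-None-Match'] = etag;
  if (lastModified) headers['If-Modified-Since'] = lastModified;
  return headers;
}

const cache = {
  etags: { 'https://example.com/style.css': '"abc123"' },
  lastModified: {},
  contentTypes: {},
};
console.log(conditionalHeadersSketch(cache, 'https://example.com/style.css'));
// → { 'If-None-Match': '"abc123"' }
```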

extractLinks

Extract links from HTML content.

import { extractLinks } from 'smippo';

const html = `
  <a href="/about">About</a>
  <link href="/style.css" rel="stylesheet">
  <img src="/logo.png">
`;

const links = extractLinks(html, 'https://example.com');
// → {
//     pages: ['https://example.com/about'],
//     assets: ['https://example.com/style.css', 'https://example.com/logo.png']
//   }

extractCssUrls

Extract URLs from CSS content.

import { extractCssUrls } from 'smippo';

const css = `
  body { background: url('/bg.png'); }
  @font-face { src: url('/font.woff2'); }
  @import url('https://fonts.example.com/style.css');
`;

const urls = extractCssUrls(css, 'https://example.com');
// → [
//     'https://example.com/bg.png',
//     'https://example.com/font.woff2',
//     'https://fonts.example.com/style.css'
//   ]
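For intuition, the result above can be reproduced by scanning for `url(...)` tokens and resolving each one against the base. This is a simplified sketch, not Smippo's parser: `extractCssUrlsSketch` is a hypothetical name, the regular expression ignores comments and escaped characters, and `@import "file.css"` without `url()` is not handled.

```javascript
// Hypothetical regex-based extractor for url(...) references in CSS.
function extractCssUrlsSketch(css, baseUrl) {
  const urls = [];
  // Matches url(x), url('x') and url("x"); group 2 is the raw reference.
  const pattern = /url\(\s*(['"]?)(.*?)\1\s*\)/g;
  for (const match of css.matchAll(pattern)) {
    const raw = match[2].trim();
    // Skip empty references and inline data: URIs, which need no download.
    if (raw === '' || raw.startsWith('data:')) continue;
    urls.push(new URL(raw, baseUrl).href);
  }
  return urls;
}

const sampleCss = `
  body { background: url('/bg.png'); }
  @font-face { src: url('/font.woff2'); }
  @import url('https://fonts.example.com/style.css');
`;
console.log(extractCssUrlsSketch(sampleCss, 'https://example.com'));
```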

rewriteLinks

Rewrite links in HTML for offline viewing.

import { rewriteLinks } from 'smippo';

const html = `<a href="https://example.com/about">About</a>`;
const urlMap = new Map([
  ['https://example.com/about', 'example.com/about/index.html']
]);

const rewritten = rewriteLinks(html, 'https://example.com', urlMap);
// → '<a href="./about/index.html">About</a>'
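The core of link rewriting is computing a relative path from the saved page to the saved target. The sketch below illustrates that step under some assumptions: it takes the current page's local file path rather than its URL (so its signature differs from `rewriteLinks`), it only rewrites `href` and `src` attributes whose value is an exact key in the map, and both function names are hypothetical.

```javascript
// Relative path from one saved file to another, using '/' separators.
function relativePathSketch(fromFile, toFile) {
  const fromDir = fromFile.split('/').slice(0, -1);
  const to = toFile.split('/');
  // Count the leading directories the two paths share.
  let shared = 0;
  while (shared < fromDir.length && shared < to.length - 1 &&
         fromDir[shared] === to[shared]) {
    shared++;
  }
  const up = fromDir.slice(shared).map(() => '..');
  const rel = [...up, ...to.slice(shared)].join('/');
  return up.length > 0 ? rel : './' + rel;
}

// Replace mapped href/src values with paths relative to currentFile.
function rewriteLinksSketch(html, currentFile, urlMap) {
  return html.replace(/(href|src)=(["'])(.*?)\2/g, (match, attr, quote, value) => {
    const local = urlMap.get(value);
    if (!local) return match; // leave unmapped links untouched
    return `${attr}=${quote}${relativePathSketch(currentFile, local)}${quote}`;
  });
}

const sampleMap = new Map([
  ['https://example.com/about', 'example.com/about/index.html'],
]);
console.log(rewriteLinksSketch(
  '<a href="https://example.com/about">About</a>',
  'example.com/index.html',
  sampleMap
));
// → '<a href="./about/index.html">About</a>'
```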

rewriteCssUrls

Rewrite URLs in CSS for offline viewing.

import { rewriteCssUrls } from 'smippo';

const css = `body { background: url('https://example.com/bg.png'); }`;
const urlMap = new Map([
  ['https://example.com/bg.png', 'example.com/bg.png']
]);

const rewritten = rewriteCssUrls(css, 'https://example.com/style.css', urlMap);
// → "body { background: url('./bg.png'); }"

Server Utilities

createServer

Create and start a web server.

import { createServer } from 'smippo';

const server = await createServer({
  directory: './site',
  port: 8080,
  host: '127.0.0.1',
  open: true,
  cors: true,
});

console.log(server.url);

// Stop server
await server.close();

serve

CLI-style serve function: starts the server and blocks until interrupted with Ctrl+C.

import { serve } from 'smippo';

await serve({
  directory: './site',
  port: 8080,
  open: true,
});

Type Exports

For TypeScript users, types are exported:

import type {
  CrawlerOptions,
  CaptureOptions,
  CaptureResult,
  ServerOptions,
  FilterOptions,
} from 'smippo';
