Sealed Directories — Internals, API & Limits
Technical reference for Sealed Directories: the manifest format, the seal/unseal phases and crash recovery, the auto_seal policy engine, the shell and Admin API, metrics, and limits to plan around.
Sealing packs a cold directory’s child entries into zstd-compressed segment chunks stored in ordinary volumes, and replaces the individual child records in the filer store with a small manifest on the directory entry. Reads are served from the manifest; writes under the directory are fenced until it is unsealed.
What can it do?
- Shrink the filer metadata store: removes per-entry records for cold children, keeping only a sorted segment index on the parent. On hash-fanout trees this makes the filer store roughly 18–58× smaller (example savings).
- Keep directories usable: listing and lookup inside a sealed directory are served from the manifest (decompressed segments are cached), so readers see a normal directory.
- Reuse the storage stack: segments are ordinary volume needles, so they inherit erasure coding, cloud tiering, and replication. Sibling directories sealed in one pass share needles.
- Automate by policy: the `auto_seal` worker seals cold directories on ordered path-pattern rules, keyed on child-file idleness.
- Reverse cleanly: `fs.unseal` materializes the children back into the store exactly as they were.
Why do you need it?
The filer stores one metadata record per file and directory, and cold data costs the same as hot. Once a namespace reaches hundreds of millions of entries, the metadata store — not the raw bytes — limits filer memory, disk, open/compaction time, and backup size.
| Scenario | Problem | Sealed Directories |
|---|---|---|
| Deep hash-fanout tree, mostly cold | Millions of leaf entries in the hot store | Seal cold prefixes; ~18× fewer records there |
| Aged logs / archives | Written once, never changed, still billed as metadata | Seal on an idle-time rule |
| Model checkpoints / finished experiments | Immutable, read occasionally | Seal; reads still served from the manifest |
| Actively written directory | Must stay mutable | Idle gate + exclude rules never seal it |
How does it work?
Seal: pack a cold directory's children into volume-stored segment chunks
filer store (before)                filer store (after)
┌───────────────────────┐           ┌───────────────────────┐
│ /cold (dir)           │           │ /cold (dir + manifest)│──┐
│ /cold/aaaa… (entry)   │   seal    │                       │  │ segment
│ /cold/aaab… (entry)   │ ───────►  └───────────────────────┘  │ index only
│ … (millions)          │                                      │
└───────────────────────┘           volumes ◄──────────────────┘
                                    ┌──────────────────────────┐
                                    │ zstd segment chunks      │
                                    │ (EC-able, cloud-tierable)│
                                    └──────────────────────────┘
The seal is transactional and crash-safe:
- Fence: the directory is marked seal-pending (an evented marker that replicates to peer filers); new mutations under it are rejected from this point.
- Build: children are scanned in name order, packed into zstd-compressed segments, and uploaded as volume needles. Sibling directories in one recursive run share needles.
- Commit: the manifest (sorted `first_name`/`last_name`/`chunk` per segment) is written onto the directory entry and a metadata event is emitted; then the original child records are purged from the store. A row that raced the seal is kept as residue, served alongside the manifest, never lost.
- Recovery: a build journal and pending/event markers let `RepairSeal` finish or roll back an interrupted seal after a crash, and let peers converge.
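To make the manifest idea concrete, here is a minimal sketch of the Build and Commit data flow and of a lookup served from the segment index. It is an illustration under stated assumptions, not the filer's code: `build_segments`, `lookup`, and `per_segment` are hypothetical names, segments are kept in a Python list instead of volume needles, the payload is JSON, and `zlib` stands in for zstd so the example runs on the standard library alone.

```python
import bisect
import json
import zlib  # stand-in for zstd, which the real segments use


def build_segments(children, per_segment=2):
    """Pack children in name order into compressed segments plus a sorted index."""
    names = sorted(children)
    chunks, index = [], []
    for i in range(0, len(names), per_segment):
        batch = names[i:i + per_segment]
        payload = json.dumps({n: children[n] for n in batch}).encode()
        chunks.append(zlib.compress(payload))
        # One index row per segment: first name, last name, chunk reference.
        index.append({"first_name": batch[0], "last_name": batch[-1],
                      "chunk": len(chunks) - 1})
    return index, chunks


def lookup(index, chunks, name):
    """Binary-search the index, then decompress only the segment that can hold `name`."""
    firsts = [seg["first_name"] for seg in index]
    pos = bisect.bisect_right(firsts, name) - 1
    if pos < 0 or name > index[pos]["last_name"]:
        return None  # name falls outside every segment's range
    segment = json.loads(zlib.decompress(chunks[index[pos]["chunk"]]))
    return segment.get(name)


children = {"aaab": {"size": 2}, "aaaa": {"size": 1}, "aaad": {"size": 4},
            "aaac": {"size": 3}, "aaae": {"size": 5}}
index, chunks = build_segments(children)
```

Because the index is sorted by name, a lookup touches one segment at most, which is why only the small index has to live on the directory entry.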
Unseal reverses it: children are materialized back into the store from the segments (idempotent — existing residue wins), the manifest is cleared, and the segment needles are reclaimed by deferred GC after a grace period so peers replaying the unseal can still read them.
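The "idempotent, existing residue wins" rule can be sketched as a merge that never overwrites a row already in the store. This is a simplified model with hypothetical names (`unseal`, a dict as the store, already-decompressed segments as dicts); it leaves out clearing the manifest and the deferred GC of segment needles.

```python
def unseal(store, segments):
    """Materialize sealed children into `store`; rows already present (residue) win."""
    restored = 0
    for segment in segments:
        for name, meta in segment.items():
            if name not in store:  # never overwrite an existing row
                store[name] = meta
                restored += 1
    return restored


store = {"b": {"size": 99}}  # a row that raced the seal and was kept as residue
segments = [{"a": {"size": 1}, "b": {"size": 2}}, {"c": {"size": 3}}]
first = unseal(store, segments)   # restores "a" and "c"
second = unseal(store, segments)  # replay after a crash or on a peer: no change
```

Running the merge twice leaves the store unchanged the second time, which is the property that lets a peer replay the unseal safely.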
Policy: /etc/seaweedfs/seal.conf
The auto_seal worker reads an ordered rule list (a SealConfig, protojson) stored on the filer:
{ "rules": [
{ "pattern": "/data/**", "idleSeconds": 2592000, "minEntries": 64 },
{ "pattern": "/data/hot/**", "exclude": true },
{ "pattern": "/data/logs/**", "idleSeconds": 7776000 }
] }
| Field | Meaning |
|---|---|
| `pattern` | Doublestar glob matched against a directory’s full path (`**` spans separators, `*` one segment). |
| `exclude` | Matching directories are never sealed (carve-out). |
| `idleSeconds` | Seal only after the directory has been idle this long. 0 = server default (30 days). Floored at 1 hour; a value long enough to overflow is capped (never seals active data). |
| `minEntries` | Skip directories with fewer than this many entries. 0 = server default (64). |
Last matching rule wins (gitignore / rsync-filter semantics). Idleness is measured from the child files’ modification times (SeaweedFS does not bump a directory’s own mtime on child writes), so a directory under active write is never sealed. Sealing is always recursive: a rule seals the whole matching subtree bottom-up.
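The rule semantics described here (last match wins, exclude carve-outs, defaults, the one-hour floor) can be sketched as follows. This is an approximation, not the worker's implementation: `glob_to_regex` and `should_seal` are hypothetical helpers, the glob translation handles only `*` and `**`, idleness is modelled as the age of the newest child mtime, and the overflow cap is omitted because Python integers do not overflow.

```python
import re

DEFAULT_IDLE_SECONDS = 30 * 24 * 3600  # documented server default: 30 days
DEFAULT_MIN_ENTRIES = 64               # documented server default
IDLE_FLOOR_SECONDS = 3600              # documented floor: 1 hour

RULES = [
    {"pattern": "/data/**", "idleSeconds": 2592000, "minEntries": 64},
    {"pattern": "/data/hot/**", "exclude": True},
    {"pattern": "/data/logs/**", "idleSeconds": 7776000},
]


def glob_to_regex(pattern):
    """Simplified doublestar translation: `**` spans separators, `*` one segment."""
    out, i = [], 0
    while i < len(pattern):
        if pattern.startswith("**", i):
            out.append(".*")
            i += 2
        elif pattern[i] == "*":
            out.append("[^/]*")
            i += 1
        else:
            out.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(out))


def should_seal(rules, path, entry_count, newest_child_mtime, now):
    """Apply the ordered rule list; the last matching rule decides."""
    matched = None
    for rule in rules:
        if glob_to_regex(rule["pattern"]).fullmatch(path):
            matched = rule  # later rules override earlier ones
    if matched is None or matched.get("exclude"):
        return False
    idle = matched.get("idleSeconds", 0) or DEFAULT_IDLE_SECONDS
    idle = max(idle, IDLE_FLOOR_SECONDS)
    min_entries = matched.get("minEntries", 0) or DEFAULT_MIN_ENTRIES
    return entry_count >= min_entries and now - newest_child_mtime >= idle


DAY = 86400
NOW = 1_000_000_000
```

With the example rules, `/data/hot/a` is never sealed because the exclude rule matches after `/data/**`, while `/data/logs/2023` falls under the 90-day rule and picks up the default entry threshold.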
Shell & Admin API
weed shell> fs.seal -dryRun /data/archive/2023 # preview: entry & segment counts, manifest size
weed shell> fs.seal /data/archive/2023 # seal the directory and its subtree
weed shell> fs.unseal /data/archive/2023 # materialize children back into the store
weed shell> fs.seal.status /data/archive/2023 # inspect seal state
The Admin UI Sealed Directories page manages the rule list and offers ad-hoc Seal (with dry-run preview) and Unseal. The file browser badges sealed directories and warns when you are inside one.
Deleting a sealed directory
- Delete the whole directory: a server-side recursive delete (`fs.rm -r`, the Admin file browser, or an S3 prefix delete) removes the directory, its manifest, and its segment chunks. A FUSE `rm -rf` does not work: POSIX turns it into per-child unlinks that each hit the seal fence; use a server-side delete.
- Delete or modify specific entries: unseal first, change what you need, then re-seal.
Metrics
Per-operation counters track mutations rejected on sealed directories (create_rejected, update_rejected, delete_rejected, rename_rejected) alongside committed seals, unseals, repairs, and residue-purge outcomes, plus sealed-segment read cache hit/fetch rates.
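As a rough illustration of how the rejection counters relate to the seal fence, the sketch below bumps a per-operation counter whenever a mutation targets a path under a sealed directory. The counter names come from the list above; `is_fenced`, `try_mutate`, and the in-memory set of sealed directories are hypothetical and only model the behaviour.

```python
from collections import Counter

counters = Counter()


def is_fenced(path, sealed_dirs):
    """True if `path` lies strictly under one of the sealed directories."""
    return any(path.startswith(d.rstrip("/") + "/") for d in sealed_dirs)


def try_mutate(op, path, sealed_dirs):
    """Reject a mutation under a sealed directory and count it by operation."""
    if is_fenced(path, sealed_dirs):
        counters[f"{op}_rejected"] += 1
        return False
    return True  # the caller would go on to apply the mutation


sealed = {"/data/archive/2023"}
r1 = try_mutate("create", "/data/archive/2023/new.txt", sealed)   # fenced
r2 = try_mutate("delete", "/data/archive/2024/old.txt", sealed)   # sibling, allowed
r3 = try_mutate("rename", "/data/archive/20230/file", sealed)     # prefix look-alike, allowed
```

A steadily rising `*_rejected` counter for a prefix is a hint that a rule sealed data that was not really cold.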
Limits to plan around
- Sealed directories are read-only until unsealed — seal only data that has gone cold.
- The idle gate keys on child-file mtimes. SeaweedFS does not advance a directory’s own mtime on child deletes, so a delete-only-active directory has no coldness signal; do not point a rule at an active cleanup prefix (the conservative 30-day default limits exposure, and unseal recovers).
- A directory sealed while a subdirectory was still active leaves that subdirectory as live residue and does not re-seal it later on its own; re-run a seal once it is cold.
- System paths are never sealed (`/`, `/etc`, `/buckets`, the message-queue `/topics` subtree).