
Overview

The .stamp/ directory is the persistent storage backend for the stamp package’s versioning system. This vignette explains:

  1. How and when .stamp/ is created
  2. Its internal directory structure
  3. How versioning and metadata work under the hood
  4. Troubleshooting common issues

Audience: This vignette serves both users (understanding what .stamp does) and developers (understanding internal implementation details).

Architecture note

Recent changes introduced alias-aware paths and a dedicated storage directory, .st_data/, under the project root. In this model:

  • .stamp/ continues to store the catalog, locks, and temporary files; immutable version snapshots live in per-artifact versions/ directories (see section 2.4).
  • .st_data/ can hold artifact files when using storage-managed paths, while your code still refers to logical paths.
  • Paths passed to APIs must resolve under the current alias root (the directory you called st_init() on). Prefer saving within that root (e.g., fs::path(demo_dir, ...)).

The examples below use in-project paths to keep the focus on .stamp/ internals.
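The alias-root rule in the architecture note amounts to a path-prefix check. Below is a minimal base-R sketch, assuming that "resolves under the root" means the normalized path starts with the normalized root directory. `is_under_root()` is a hypothetical helper for illustration, not a stamp function.

```r
# Hypothetical helper (not part of stamp): does `path` resolve under `root`?
is_under_root <- function(path, root) {
  root <- normalizePath(root, winslash = "/", mustWork = TRUE)
  # normalizePath() cannot resolve symlinks in paths that don't exist yet and
  # returns them largely as given, so build new paths from a normalized root.
  path <- normalizePath(path, winslash = "/", mustWork = FALSE)
  # Compare against "<root>/" so a sibling such as "<root>-other" is rejected
  path == root || startsWith(path, paste0(root, "/"))
}

root_dir <- file.path(tempdir(), "alias-root-demo")
dir.create(root_dir, showWarnings = FALSE)
root_dir <- normalizePath(root_dir, winslash = "/")

is_under_root(file.path(root_dir, "data", "test.qs2"), root_dir)  # TRUE
is_under_root(paste0(root_dir, "-other/test.qs2"), root_dir)      # FALSE
```

The trailing-separator comparison is the important design choice. A plain `startsWith(path, root)` would wrongly accept sibling directories whose names merely begin with the root's name.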


1. Creation and Initialization

1.1 Creating .stamp/ with st_init()

The .stamp/ directory is created when you call st_init():

# Create a temporary project directory for demonstration
demo_dir <- fs::path_temp("stamp-demo")
fs::dir_create(demo_dir)

# Initialize stamp
st_init(demo_dir)
#>  stamp initialized
#>   alias: default
#>   root: /tmp/RtmpdAe4jD/stamp-demo
#>   state: /tmp/RtmpdAe4jD/stamp-demo/.stamp

# Inspect what was created
fs::dir_tree(fs::path(demo_dir, ".stamp"), recurse = TRUE, all = TRUE)
#> /tmp/RtmpdAe4jD/stamp-demo/.stamp
#> ├── logs
#> └── temp

What happens during initialization:

  1. Creates directory structure (if it doesn’t exist):

    • .stamp/ - Root state directory
    • .stamp/temp/ - Temporary files during atomic writes
    • .stamp/logs/ - Reserved for future logging (currently unused)
  2. Records project root in package state (in-memory reference)

  3. Does NOT create or initialize the catalog yet - that happens on first save

1.2 Re-running st_init(): Safe and Non-Destructive

Important: Running st_init() multiple times is safe and will NOT delete or overwrite existing version history.

# Save an artifact to create version history
test_data <- data.frame(x = 1:5, y = letters[1:5])
test_path <- fs::path(demo_dir, "data", "test.qs2")
st_save(test_data, test_path)
#>  Saved [qs2] → /tmp/RtmpdAe4jD/stamp-demo/data/test.qs2 @ version
#>   19b403e9e4b75444

# Check versions exist
versions_before <- st_versions(test_path)
nrow(versions_before)
#> [1] 1

# Re-initialize (this is safe!)
st_init(demo_dir)
#>  stamp initialized
#>   alias: default
#>   root: /tmp/RtmpdAe4jD/stamp-demo
#>   state: /tmp/RtmpdAe4jD/stamp-demo/.stamp

# Version history is preserved
versions_after <- st_versions(test_path)
identical(versions_before, versions_after)
#> [1] TRUE

Why is this safe?

  • st_init() only creates directories that don’t exist
  • The catalog file (catalog.qs2) is read if present, never deleted
  • Version snapshots in versions/ are never touched by initialization
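These safety properties follow from initialization only ever creating what is missing. Below is a minimal base-R sketch of that idempotent pattern. `init_state_dir()` is hypothetical and is not stamp's implementation. A placeholder text file stands in for the real catalog.

```r
# Hypothetical sketch (not stamp's code): create missing state directories,
# never delete or rewrite anything that already exists.
init_state_dir <- function(root) {
  state <- file.path(root, ".stamp")
  for (d in file.path(state, c("temp", "logs"))) {
    # recursive = TRUE also creates root/ and .stamp/; existing dirs are left as-is
    dir.create(d, recursive = TRUE, showWarnings = FALSE)
  }
  invisible(state)
}

root <- file.path(tempdir(), "init-demo")
state <- init_state_dir(root)
catalog_file <- file.path(state, "catalog.qs2")
writeLines("placeholder catalog", catalog_file)  # stands in for real history
init_state_dir(root)                             # re-run: nothing is removed
readLines(catalog_file)                          # "placeholder catalog"
```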

2. Directory Structure Explained

2.1 High-Level Layout

<project-root>/
├── .stamp/                      # State directory (catalog, locks, temp)
│   ├── catalog.qs2              # Central version registry
│   ├── catalog.lock             # Lock file for concurrent access
│   ├── temp/                    # Temporary files during atomic writes
│   └── logs/                    # Reserved for future logging
│
├── results/                     # Your project structure (example)
│   └── model.rds/               # Artifact directory
│       ├── model.rds            # The actual artifact file
│       ├── stmeta/              # Sidecar metadata directory
│       │   ├── sidecar.json     # JSON metadata
│       │   └── sidecar.qs2      # (optional) Binary metadata
│       └── versions/            # Version history (per-artifact)
│           ├── <version_id>/                 # Version snapshot (one per saved version)
│           │   ├── artifact                  # Snapshot of the file
│           │   ├── sidecar.json              # Metadata at save time
│           │   ├── sidecar.qs2               # (optional) Binary metadata
│           │   └── parents.json              # Lineage information (only if parents were specified)
│           └── <version_id>/                 # Another version
│               └── ...
│
└── data.qs2/                    # Bare filename (no subdirectory path)
    ├── data.qs2                 # The actual artifact file
    ├── stmeta/                  # Sidecar metadata
    └── versions/                # Version history for this artifact

Let’s explore each component:

2.2 State Directory: .stamp/

The .stamp/ directory contains only state - no artifact data:

  • catalog.qs2 - Central registry of all artifacts and versions
  • catalog.lock - Concurrency control for catalog updates
  • temp/ - Temporary files during atomic writes
  • logs/ - Reserved for future use

Key change (v0.0.9): Version snapshots are no longer under .stamp/versions/. They now live alongside each artifact in per-artifact versions/ directories.

2.3 The Catalog: catalog.qs2

The catalog is a central registry tracking all artifacts and their versions. It’s a QS2-serialized list containing two data.table objects:

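# Illustrative empty schema (column types for `versions` match the
# st_versions() output below; types for `artifacts` are assumed)
catalog <- list(
  artifacts = data.table::data.table(
    artifact_id       = character(),  # Stable hash of normalized path
    path              = character(),  # Current canonical path
    format            = character(),  # File format (rds, qs2, csv, etc.)
    latest_version_id = character(),  # Most recent version identifier
    n_versions        = integer()     # Total number of saved versions
  ),
  versions = data.table::data.table(
    version_id     = character(),     # Unique version identifier
    artifact_id    = character(),     # Links to artifacts table
    content_hash   = character(),     # Hash of file contents
    code_hash      = character(),     # Hash of generating code (if tracked)
    size_bytes     = numeric(),       # File size in bytes
    created_at     = character(),     # ISO 8601 UTC timestamp
    sidecar_format = character()      # "json", "qs2", "both", or "none"
  )
)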

Key functions that read the catalog:

# List all versions of an artifact
versions <- st_versions(test_path)
str(versions)
#> Classes 'data.table' and 'data.frame':   1 obs. of  7 variables:
#>  $ version_id    : chr "19b403e9e4b75444"
#>  $ artifact_id   : chr "ab29f8e09c362001"
#>  $ content_hash  : chr "2d26c6e5d9121bfd"
#>  $ code_hash     : chr NA
#>  $ size_bytes    : num 262
#>  $ created_at    : chr "2026-03-04T21:30:15.688979Z"
#>  $ sidecar_format: chr "json"
#>  - attr(*, ".internal.selfref")=<externalptr>

# Get latest version ID
latest_id <- st_latest(test_path)
latest_id
#> [1] "19b403e9e4b75444"

# Get comprehensive info (catalog + sidecar + snapshot location)
info <- st_info(test_path)
str(info, max.level = 1)
#> List of 4
#>  $ sidecar     :List of 10
#>  $ catalog     :List of 2
#>  $ snapshot_dir: 'fs_path' chr "/tmp/RtmpdAe4jD/stamp-demo/data/test.qs2/versions/19b403e9e4b75444"
#>  $ parents     : list()
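Recording a save touches both catalog tables: it adds a new row to `versions` and updates the matching row in `artifacts`. Below is a hedged base-R sketch of that bookkeeping. `record_version()` and `empty_catalog()` are hypothetical, and the internal .st_catalog_record_version() may differ. Plain data.frames stand in for data.table so the sketch needs no extra packages.

```r
# Hypothetical sketch of catalog bookkeeping; not stamp's implementation.
empty_catalog <- function() {
  list(
    artifacts = data.frame(artifact_id = character(), path = character(),
                           format = character(), latest_version_id = character(),
                           n_versions = integer(), stringsAsFactors = FALSE),
    versions = data.frame(version_id = character(), artifact_id = character(),
                          content_hash = character(), code_hash = character(),
                          size_bytes = numeric(), created_at = character(),
                          sidecar_format = character(), stringsAsFactors = FALSE)
  )
}

record_version <- function(catalog, artifact_id, path, format, version_id,
                           content_hash, size_bytes, created_at,
                           code_hash = NA_character_, sidecar_format = "json") {
  # 1. Append one row to the versions table
  catalog$versions <- rbind(catalog$versions, data.frame(
    version_id = version_id, artifact_id = artifact_id,
    content_hash = content_hash, code_hash = code_hash,
    size_bytes = size_bytes, created_at = created_at,
    sidecar_format = sidecar_format, stringsAsFactors = FALSE))
  # 2. Insert or update the artifact row: newest version + running count
  idx <- match(artifact_id, catalog$artifacts$artifact_id)
  if (is.na(idx)) {
    catalog$artifacts <- rbind(catalog$artifacts, data.frame(
      artifact_id = artifact_id, path = path, format = format,
      latest_version_id = version_id, n_versions = 1L,
      stringsAsFactors = FALSE))
  } else {
    catalog$artifacts$latest_version_id[idx] <- version_id
    catalog$artifacts$n_versions[idx] <- catalog$artifacts$n_versions[idx] + 1L
  }
  catalog
}

cat_demo <- empty_catalog()
cat_demo <- record_version(cat_demo, "art1", "data/test.qs2", "qs2",
                           "v1", "hash-a", 262, "2026-03-04T21:30:15Z")
cat_demo <- record_version(cat_demo, "art1", "data/test.qs2", "qs2",
                           "v2", "hash-b", 245, "2026-03-04T21:30:16Z")
cat_demo$artifacts[, c("latest_version_id", "n_versions")]  # "v2", 2
```

In stamp, per section 4.2, this read-modify-write cycle happens under the catalog lock, so two sessions cannot both append against a stale copy.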

2.4 Version Snapshots: Per-Artifact versions/

Each time a save creates a new version (which depends on the versioning mode; see section 3.3), an immutable snapshot is written to a versions/ directory next to the artifact itself:

# Save the same artifact multiple times with changes
st_opts(versioning = "timestamp")  # ensure snapshots are created
#>  stamp options updated
#>   versioning = "timestamp"
v1 <- data.frame(x = 1:3)
st_save(v1, test_path, code_label = "initial")
#>  Saved [qs2] → /tmp/RtmpdAe4jD/stamp-demo/data/test.qs2 @ version
#>   a61c4b267562a44e
Sys.sleep(1.1)

v2 <- data.frame(x = 1:5)
st_save(v2, test_path, code_label = "added rows")
#>  Saved [qs2] → /tmp/RtmpdAe4jD/stamp-demo/data/test.qs2 @ version
#>   eed74be794b07791
Sys.sleep(1.1)

v3 <- data.frame(x = 1:5, y = 10:14)
st_save(v3, test_path, code_label = "added column")
#>  Saved [qs2] → /tmp/RtmpdAe4jD/stamp-demo/data/test.qs2 @ version
#>   03da91223c251e31

# Version history is stored NEXT TO the artifact, not in .stamp/
# snapshot_dir is <artifact-dir>/versions/<version_id>, so its parent
# is the artifact's versions/ directory
info <- st_info(test_path, alias = NULL)
versions_dir <- if (!is.na(info$snapshot_dir)) fs::path_dir(info$snapshot_dir) else NA

if (!is.na(versions_dir) && fs::dir_exists(versions_dir)) {
  cat("Version snapshots location:", versions_dir, "\n")
  fs::dir_tree(versions_dir, recurse = 1)
} else {
  cat("No versions directory found (versioning may be off).\n")
}

# reset to default behavior for the rest of the vignette
st_opts(versioning = "content")
#>  stamp options updated
#>   versioning = "content"

What’s inside a version snapshot directory?

Each version snapshot (e.g., data/test.qs2/versions/03da91223c251e31/ in this demo) contains:

  1. artifact - A complete copy of the file at that point in time
  2. sidecar.json / sidecar.qs2 - Metadata including:
    • Content hash
    • Code hash (if tracked)
    • File size
    • Timestamp
    • Custom metadata tags
    • Parent references (quick view)
  3. parents.json - Immutable provenance chain showing which artifacts this version depends on (only present if parents were specified)

# Get the latest version directory path
latest_info <- st_info(test_path)
latest_vdir <- latest_info$snapshot_dir

if (!is.na(latest_vdir) && fs::dir_exists(latest_vdir)) {
  cat("Latest snapshot directory:", latest_vdir, "\n\n")
  
  # List contents - note: parents.json only exists if parents were specified
  print(fs::dir_ls(latest_vdir))  # print() is needed to show output inside an if block
  
  # Read the sidecar from the snapshot
  sidecar_path <- fs::path(latest_vdir, "sidecar.json")
  if (fs::file_exists(sidecar_path)) {
    sidecar <- jsonlite::read_json(sidecar_path)
    str(sidecar[c("path", "format", "created_at", "size_bytes", "code_label")])
  }
} else {
  cat("No snapshot directory recorded for test_path; ensure versioning created snapshots.\n")
}
#> Latest snapshot directory: /tmp/RtmpdAe4jD/stamp-demo/data/test.qs2/versions/03da91223c251e31 
#> 
#> List of 5
#>  $ path      : chr "/tmp/RtmpdAe4jD/stamp-demo/data/test.qs2"
#>  $ format    : chr "qs2"
#>  $ created_at: chr "2026-03-04T21:30:18.532118Z"
#>  $ size_bytes: int 265
#>  $ code_label: chr "added column"

The snapshot listed above contains no parents.json file, because that file is only created when parents are specified. The following example specifies parents during save:

# Ensure snapshots are recorded for this demo
st_opts(versioning = "timestamp")
#>  stamp options updated
#>   versioning = "timestamp"

# First, create an upstream artifact
upstream_path <- fs::path(demo_dir, "data", "upstream.qs2")
upstream_data <- data.frame(id = 1:10, value = rnorm(10))
st_save(upstream_data, upstream_path, code_label = "upstream data")
#>  Saved [qs2] → /tmp/RtmpdAe4jD/stamp-demo/data/upstream.qs2 @ version
#>   8d43671bdf7ce4a6
upstream_version <- st_latest(upstream_path)

# Now create a derived artifact with parent reference
derived_path <- fs::path(demo_dir, "data", "derived.qs2")
derived_data <- data.frame(id = 1:10, transformed = upstream_data$value * 2)
st_save(
  derived_data, 
  derived_path,
  parents = list(list(path = upstream_path, version_id = upstream_version)),
  code_label = "derived from upstream"
)
#>  Saved [qs2] → /tmp/RtmpdAe4jD/stamp-demo/data/derived.qs2 @ version
#>   5ac5f5b4b9219fe3

# Now check the derived artifact's snapshot - parents.json will be present
derived_info <- st_info(derived_path)
derived_vdir <- derived_info$snapshot_dir
if (!is.na(derived_vdir) && fs::dir_exists(derived_vdir)) {
  print(fs::dir_ls(derived_vdir))  # print() is needed to show output inside an if block
  
  # Read parents.json
  parents_file <- fs::path(derived_vdir, "parents.json")
  if (fs::file_exists(parents_file)) {
    parents <- jsonlite::read_json(parents_file)
    str(parents)
  } else {
    cat("parents.json not found; ensure parents were recorded.\n")
  }
} else {
  cat("No snapshot directory recorded for derived_path; ensure versioning created snapshots.\n")
}
#> List of 1
#>  $ :List of 2
#>   ..$ path      : chr "/tmp/RtmpdAe4jD/stamp-demo/data/upstream.qs2"
#>   ..$ version_id: chr "8d43671bdf7ce4a6"

# Reset to default versioning for the remainder
st_opts(versioning = "content")
#>  stamp options updated
#>   versioning = "content"

2.5 Artifact Storage: Direct-Path Model

Key concept (v0.0.9): Each artifact gets its own directory, named after the path you specify, and the file itself is stored inside that directory under the same name:

st_save(data, "results/model.rds")  →  <root>/results/model.rds/model.rds
st_save(data, "data.qs2")           →  <root>/data.qs2/data.qs2

This creates a predictable structure:

<root>/
├── results/
│   └── model.rds/              # Artifact directory
│       ├── model.rds           # The actual file
│       ├── stmeta/             # Sidecar metadata
│       └── versions/           # Version history for THIS artifact
│           ├── <version_id>/
│           └── <version_id>/
│
└── data.qs2/                   # Bare filename
    ├── data.qs2                # The actual file
    ├── stmeta/                 # Sidecar metadata
    └── versions/               # Version history for THIS artifact
        └── <version_id>/

Why This Structure?

  1. Transparency: Artifacts live where you expect them - at the path you specified
  2. Co-location: Version history, metadata, and the artifact itself are grouped together
  3. Simplicity: No need to search .stamp/versions/ - everything is in one place
  4. Distributed: Each artifact manages its own version history independently
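The direct-path rule can be expressed as a pure function of the logical path. Below is a minimal sketch. `artifact_layout()` is a hypothetical helper, not part of stamp's API, and it ignores alias roots and .st_data/ storage.

```r
# Hypothetical helper: map a logical artifact path to its on-disk pieces
artifact_layout <- function(path) {
  dir <- path  # the logical path itself becomes the artifact directory
  list(
    file     = file.path(dir, basename(path)),  # the file, same name, inside it
    stmeta   = file.path(dir, "stmeta"),        # sidecar metadata
    versions = file.path(dir, "versions")       # per-artifact history
  )
}

artifact_layout("results/model.rds")$file  # "results/model.rds/model.rds"
artifact_layout("data.qs2")$versions       # "data.qs2/versions"
```

Because the mapping is deterministic, the same logical path always leads back to the same file. This is why loading can use the path you saved with.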

Real-World Example

# Save to a subdirectory path
project_file <- fs::path(demo_dir, "outputs", "results.qs2")
st_save(data.frame(x = 1:5), project_file)

# Creates:
#   <demo_dir>/outputs/results.qs2/results.qs2  (the file)
#   <demo_dir>/outputs/results.qs2/stmeta/      (metadata)
#   <demo_dir>/outputs/results.qs2/versions/    (history)

# Save a bare filename
bare_file <- fs::path(demo_dir, "summary.qs2")
st_save(data.frame(y = 6:10), bare_file)

# Creates:
#   <demo_dir>/summary.qs2/summary.qs2  (the file)
#   <demo_dir>/summary.qs2/stmeta/      (metadata)
#   <demo_dir>/summary.qs2/versions/    (history)

Loading Artifacts

When you load, use the same path you saved with:

# Load using the same path you saved with
results <- st_load(project_file)  # stamp finds outputs/results.qs2/results.qs2
summary_df <- st_load(bare_file)  # stamp finds summary.qs2/summary.qs2

stamp automatically resolves the path to the actual file inside its directory wrapper.


3. How Versioning Works

3.1 The Version Creation Process

When you call st_save(), here’s what happens internally:

┌─────────────────────────────────────────────────────────────────┐
│                       st_save(data, path)                       │
└─────────────────────────┬───────────────────────────────────────┘
                          │
                          ▼
         ┌────────────────────────────────────────┐
         │  1. Decide if save is needed           │
         │     (st_should_save)                   │
         ├────────────────────────────────────────┤
         │  • Check if file exists                │
         │  • Compare content hash                │
         │  • Compare code hash                   │
         │  • Check versioning policy             │
         └────────────────┬───────────────────────┘
                          │
                          ▼
         ┌────────────────────────────────────────┐
         │  2. Write artifact atomically          │
         ├────────────────────────────────────────┤
         │  • Write to temp file                  │
         │  • Move to destination (atomic)        │
         └────────────────┬───────────────────────┘
                          │
                          ▼
         ┌────────────────────────────────────────┐
         │  3. Write sidecar metadata             │
         ├────────────────────────────────────────┤
         │  • Compute hashes                      │
         │  • Record timestamp                    │
         │  • Store in stmeta/ directory          │
         └────────────────┬───────────────────────┘
                          │
                          ▼
         ┌────────────────────────────────────────┐
         │  4. Update catalog                     │
         ├────────────────────────────────────────┤
         │  • Add version row to catalog          │
         │  • Update artifact's latest_version_id │
         │  • Increment n_versions counter        │
         └────────────────┬───────────────────────┘
                          │
                          ▼
         ┌────────────────────────────────────────┐
         │  5. Create version snapshot            │
         ├────────────────────────────────────────┤
         │  • Copy artifact to versions/          │
         │  • Copy sidecars                       │
         │  • Write parents.json                  │
         └────────────────┬───────────────────────┘
                          │
                          ▼
         ┌────────────────────────────────────────┐
         │  6. Apply retention policy             │
         │     (if configured)                    │
         ├────────────────────────────────────────┤
         │  • Prune old versions based on policy  │
         └────────────────────────────────────────┘

Each step is designed to be atomic and crash-safe, ensuring that partial writes never corrupt your version history.

3.2 Version Identifiers

Version IDs are deterministic hashes combining:

  • Timestamp (microsecond precision)
  • Content hash
  • Code hash (if available)
  • Artifact ID

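In this vignette's output, version IDs appear as 16-character hexadecimal strings (e.g., 19b403e9e4b75444). Because an ID is a hash, it does not sort chronologically. Order versions by created_at instead (st_versions() lists the newest version first).

This ensures:

  • Uniqueness - Because the timestamp is part of the input, saving identical content twice still yields two distinct IDs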
  • Traceability - Hash links to specific content state
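The sketch below shows why identical content saved at two different times gets two IDs. It hashes the four inputs listed above and keeps 16 hex characters. Several choices are assumptions for illustration only: MD5 (via tools::md5sum() on a temporary file), the `|` separator, and the hypothetical `make_version_id()` helper itself. The vignette does not document stamp's actual hash function.

```r
# Hypothetical sketch, not stamp's algorithm: hash the ID inputs together
make_version_id <- function(timestamp, content_hash, code_hash, artifact_id) {
  if (is.na(code_hash)) code_hash <- ""           # code hash is optional
  key <- paste(timestamp, content_hash, code_hash, artifact_id, sep = "|")
  tmp <- tempfile()
  on.exit(unlink(tmp))
  writeBin(charToRaw(key), tmp)                   # exact bytes, no newline
  substr(unname(tools::md5sum(tmp)), 1, 16)       # 16 hex characters
}

t1 <- "2025-01-08T12:15:07.123456Z"
t2 <- "2025-01-08T12:15:07.345678Z"
id_a     <- make_version_id(t1, "abc123", NA, "art1")
id_b     <- make_version_id(t1, "abc123", NA, "art1")
id_later <- make_version_id(t2, "abc123", NA, "art1")
identical(id_a, id_b)      # TRUE: same inputs, same ID
identical(id_a, id_later)  # FALSE: same content, later timestamp, new ID
```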

3.3 Versioning Modes

Control versioning behavior with st_opts():

# Show current versioning mode
st_opts("versioning", .get = TRUE)
#> [1] "content"

# Available modes:
# - "content" (default): Save version only when content or code changes
# - "timestamp": Save version on every st_save() call
# - "off": Disable versioning (only update current file + sidecar)

# Example: Force version on every save
st_opts(versioning = "timestamp")
#>  stamp options updated
#>   versioning = "timestamp"
v_same <- data.frame(x = 1:3)
st_save(v_same, test_path, code_label = "first")
#>  Saved [qs2] → /tmp/RtmpdAe4jD/stamp-demo/data/test.qs2 @ version
#>   e949e412e857f680
Sys.sleep(0.2)
st_save(v_same, test_path, code_label = "second identical")  # Still creates version!
#>  Saved [qs2] → /tmp/RtmpdAe4jD/stamp-demo/data/test.qs2 @ version
#>   c8d78458ca1f0afb

# Check: the two newest versions (st_versions() lists newest first)
# have identical content hashes
recent_versions <- st_versions(test_path)
head(recent_versions[, .(version_id, created_at, content_hash)], 2)

# Reset to default
st_opts(versioning = "content")
#>  stamp options updated
#>   versioning = "content"
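The three modes reduce to a small decision about whether a save should add a version. Below is a hedged sketch. `needs_new_version()` is hypothetical and only models the rules in the comments above; the real st_should_save() may consider more inputs, such as whether the file exists. Treating "off" as never versioning, even on a first save, is also an assumption.

```r
# Hypothetical decision rule for the three modes (not stamp's st_should_save())
needs_new_version <- function(mode, old, new) {
  # `old` is NULL on a first save, else list(content_hash = , code_hash = )
  switch(mode,
    off       = FALSE,
    timestamp = TRUE,
    content   = is.null(old) ||
      !identical(old$content_hash, new$content_hash) ||
      !identical(old$code_hash, new$code_hash),
    stop("unknown versioning mode: ", mode)
  )
}

prev    <- list(content_hash = "h1", code_hash = NA)
same    <- list(content_hash = "h1", code_hash = NA)
changed <- list(content_hash = "h2", code_hash = NA)
needs_new_version("content", prev, same)     # FALSE: nothing changed
needs_new_version("content", prev, changed)  # TRUE: content changed
needs_new_version("timestamp", prev, same)   # TRUE: every save is versioned
needs_new_version("content", NULL, same)     # TRUE: first save
```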

4. Developer Details

4.1 Key Internal Functions

These functions power the .stamp/ infrastructure (from R/version_store.R):

Path and ID Management:

.st_norm_path(path)           # Normalize path to absolute canonical form
.st_artifact_id(path)         # Compute stable hash identifier from path
.st_root_dir()                # Get project root from st_init()
.st_state_dir_abs()           # Get absolute .stamp/ path

Catalog Operations:

.st_catalog_path()            # Path to catalog.qs2
.st_catalog_read()            # Read catalog (or create empty if missing)
.st_catalog_write(cat)        # Atomic catalog write with locking
.st_catalog_record_version()  # Add new version row to catalog

Version Management:

.st_version_dir(rel_path, vid, alias)    # Compute specific version snapshot path
.st_version_commit_files()    # Copy artifact + sidecars to snapshot
.st_version_read_parents()    # Read parents.json from snapshot
.st_version_write_parents()   # Write parents.json to snapshot
# Note: .st_versions_root() is deprecated (centralized storage removed in v0.0.9)
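Pairing .st_norm_path() with .st_artifact_id() suggests a normalize-then-hash pattern, which gives different spellings of one path the same ID. Below is a hedged sketch. `artifact_id()` is hypothetical. MD5 and 16-character truncation are assumptions, chosen only to match the length of the IDs shown earlier.

```r
# Hypothetical sketch of normalize-then-hash (not stamp's .st_artifact_id())
artifact_id <- function(path) {
  norm <- normalizePath(path, winslash = "/", mustWork = FALSE)
  tmp <- tempfile()
  on.exit(unlink(tmp))
  writeBin(charToRaw(enc2utf8(norm)), tmp)
  substr(unname(tools::md5sum(tmp)), 1, 16)
}

# normalizePath() resolves ".." reliably for paths that exist, so create one
root <- file.path(tempdir(), "id-demo")
dir.create(file.path(root, "data"), recursive = TRUE, showWarnings = FALSE)
invisible(file.create(file.path(root, "data", "x.qs2")))

id1 <- artifact_id(file.path(root, "data", "x.qs2"))
id2 <- artifact_id(file.path(root, "data", "..", "data", "x.qs2"))
identical(id1, id2)  # TRUE: both spellings normalize to the same path
```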

4.2 Concurrency and Safety

File-Based Locking Explained

The Problem: Imagine two R sessions running simultaneously, both trying to save artifacts:

Session A: Reads catalog → Modifies → Writes back
Session B: Reads catalog → Modifies → Writes back

Without coordination, Session B might overwrite Session A’s changes, losing version records.

The Solution: stamp uses file-based locking to ensure only one process modifies the catalog at a time:

Session A: Acquires lock → Reads → Modifies → Writes → Releases lock
Session B: Waits for lock → Acquires lock → Reads → Modifies → Writes → Releases lock

This is implemented via a lock file (.stamp/catalog.lock):

# Internal locking mechanism (simplified)
.st_with_lock(path, {
  catalog <- .st_catalog_read()   # Read catalog safely
  # ... modify catalog ...        # Make changes
  .st_catalog_write(catalog)      # Write back safely
})

# Lock file: .stamp/catalog.lock
# - Created automatically during catalog updates
# - Uses filelock package if available (recommended)
# - 5-second timeout prevents permanent deadlocks
# - Automatically cleaned up after operation completes
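The sketch below shows how a lock with a timeout can work, using a different primitive from the one named above. Instead of the filelock package, it relies on dir.create(), which returns FALSE when the directory already exists and is a common atomic test-and-set on local filesystems. `with_dir_lock()` is hypothetical and is not stamp's .st_with_lock(). The 5-second default mirrors the timeout described above.

```r
# Hypothetical lock helper, not stamp's .st_with_lock()
with_dir_lock <- function(lock_path, expr, timeout = 5) {
  deadline <- Sys.time() + timeout
  # dir.create() returns FALSE if the directory already exists (lock held)
  while (!dir.create(lock_path, showWarnings = FALSE)) {
    if (Sys.time() > deadline) stop("timed out waiting for lock: ", lock_path)
    Sys.sleep(0.05)
  }
  on.exit(unlink(lock_path, recursive = TRUE), add = TRUE)  # release on exit/error
  force(expr)  # `expr` is evaluated lazily, only once the lock is held
}

lock <- file.path(tempdir(), "catalog.lock")
result <- with_dir_lock(lock, 1 + 1)
result            # 2
dir.exists(lock)  # FALSE: released after the block
```

The release is registered only after the lock is acquired. A caller that times out therefore never deletes a lock held by someone else.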

Real-world scenarios where locking matters:

  1. Parallel processing: Running future::plan(multisession) with multiple workers saving artifacts
  2. Shared filesystems: Multiple analysts on a server saving to the same project
  3. Background jobs: RStudio background jobs running st_save() while you work interactively

What happens without the filelock package?

  • stamp falls back to advisory locking (no enforcement, relies on cooperation)
  • Risk of corruption increases in high-concurrency scenarios
  • Install filelock for production use: install.packages("filelock")

Atomic Operations

Beyond locking, stamp uses atomic operations to prevent partial writes that could corrupt your data.

What “atomic” means: An operation that either completes entirely or fails entirely, with no in-between state visible to other processes.

Why this matters: Consider what could go wrong without atomicity:

# Bad scenario (non-atomic):
1. Start writing new data to file
2. ❌ CRASH (power outage, R session killed, etc.)
3. File now contains partial/corrupt data
4. Version history references a broken file

How stamp ensures atomicity:

  • File writes: Always write to temp file → move to final location (a rename, which is atomic when both paths are on the same filesystem)
  • Catalog updates: Read-modify-write under lock ensures serialized updates
  • Version snapshots: Immutable once created (copy operations, never modified)
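The temp-file-then-rename pattern can be sketched in a few lines of base R. The text says stamp stages writes in .stamp/temp/. This sketch instead places the temp file next to the destination, which guarantees both are on the same filesystem: the condition under which a POSIX rename replaces the target in one step. `atomic_save_rds()` is hypothetical, not stamp's internal writer.

```r
# Hypothetical sketch of temp-file-then-rename (not stamp's internal writer)
atomic_save_rds <- function(object, dest) {
  dir.create(dirname(dest), recursive = TRUE, showWarnings = FALSE)
  # Temp file in the destination directory => same filesystem as `dest`
  tmp <- tempfile(pattern = ".tmp-", tmpdir = dirname(dest))
  on.exit(if (file.exists(tmp)) unlink(tmp), add = TRUE)  # cleanup on failure
  saveRDS(object, tmp)
  if (!file.rename(tmp, dest)) stop("rename failed: ", tmp, " -> ", dest)
  invisible(dest)
}

dest <- file.path(tempdir(), "atomic-demo", "model.rds")
atomic_save_rds(data.frame(x = 1:3), dest)
atomic_save_rds(data.frame(x = 1:5), dest)  # replaces the previous file
nrow(readRDS(dest))  # 5
```

If saveRDS() fails partway, only the hidden temp file is affected, and on.exit() removes it. The destination keeps its previous contents.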

5. Inspecting .stamp/ Programmatically

5.1 User-Level Inspection

# Ensure at least two versions exist for demonstration
# Use explicit alias = NULL to auto-detect current alias
versions <- st_versions(test_path, alias = NULL)
if (nrow(versions) < 2) {
  st_opts(versioning = "timestamp")
  tmp <- data.frame(x = seq_len(3L))
  st_save(tmp, test_path, code_label = "autocreate v1 (user-inspection)")
  Sys.sleep(1.1)
  tmp2 <- transform(tmp, y = x * 10L)
  st_save(tmp2, test_path, code_label = "autocreate v2 (user-inspection)")
  versions <- st_versions(test_path, alias = NULL)
  st_opts(versioning = "content")
}

# List all versions of an artifact (compact view)
versions[, .(version_id, created_at, size_bytes)]
#>          version_id                  created_at size_bytes
#>              <char>                      <char>      <num>
#> 1: c8d78458ca1f0afb 2026-03-04T21:30:19.348221Z        245
#> 2: e949e412e857f680 2026-03-04T21:30:19.106981Z        245
#> 3: 03da91223c251e31 2026-03-04T21:30:18.532118Z        265
#> 4: eed74be794b07791 2026-03-04T21:30:17.390402Z        246
#> 5: a61c4b267562a44e 2026-03-04T21:30:16.162689Z        245
#> 6: 19b403e9e4b75444 2026-03-04T21:30:15.688979Z        262

# Get comprehensive info
info <- st_info(test_path)
info$catalog      # Latest version and count
#> $latest_version_id
#> [1] "c8d78458ca1f0afb"
#> 
#> $n_versions
#> [1] 6
info$snapshot_dir # Path to latest snapshot
#> /tmp/RtmpdAe4jD/stamp-demo/data/test.qs2/versions/c8d78458ca1f0afb

# Load a specific historical version (previous), safely
if (nrow(versions) > 1) {
  old_version_try <- try(st_load(test_path, version = -1, alias = NULL), silent = TRUE)
  if (!inherits(old_version_try, "try-error")) {
    str(old_version_try)
  } else {
    cat("Previous version not available; skipping load.\n")
  }
}
#>  Loaded ← data/test.qs2 @ e949e412e857f680
#> [qs2]
#> 'data.frame':    3 obs. of  1 variable:
#>  $ x: int  1 2 3

5.2 Direct Catalog Access (Advanced)

# NOT recommended for users, but useful for debugging
catalog_path <- fs::path(demo_dir, ".stamp", "catalog.qs2")
if (fs::file_exists(catalog_path)) {
  catalog <- qs2::qs_read(catalog_path)
  
  # View all artifacts
  print(catalog$artifacts)
  
  # View all versions
  print(catalog$versions)
}

5.3 Exploring Snapshots

# Get all version directories for an artifact
# Version history is stored NEXT TO the artifact, not in .stamp/
info <- st_info(test_path, alias = NULL)
latest <- info$snapshot_dir  # <artifact-dir>/versions/<version_id>
versions_dir <- if (!is.na(latest)) fs::path_dir(latest) else NA

if (!is.na(versions_dir) && fs::dir_exists(versions_dir)) {
  cat("Versions directory:", versions_dir, "\n\n")
  
  # List all version snapshots
  snapshot_dirs <- fs::dir_ls(versions_dir, type = "directory")
  cat("Found", length(snapshot_dirs), "version snapshots\n\n")
  
  # Inspect contents of the latest snapshot. Version IDs are hashes, so the
  # alphabetically last directory is not necessarily the newest; use the
  # snapshot_dir reported by st_info() instead.
  cat("Latest snapshot:", latest, "\n")
  fs::dir_tree(latest)
  
  # Read parents.json if present
  parents_file <- fs::path(latest, "parents.json")
  if (fs::file_exists(parents_file)) {
    parents <- jsonlite::read_json(parents_file)
    str(parents)
  }
} else {
  cat("No versions directory found for this artifact\n")
}

6. Troubleshooting

6.1 Missing or Corrupt .stamp/ Directory

Symptom: Functions like st_versions() return empty results or error.

Diagnosis:

# Check if .stamp exists
stamp_dir <- fs::path(demo_dir, ".stamp")
fs::dir_exists(stamp_dir)

# Check if catalog exists
catalog_path <- fs::path(stamp_dir, "catalog.qs2")
fs::file_exists(catalog_path)

Solution:

# Re-initialize (safe, won't delete existing data)
st_init(demo_dir)

# If catalog is corrupt, it will be recreated empty on first st_save()
# Note: This means version history is lost - restore from backup if available

6.2 Catalog Schema Mismatch

Symptom: Error: “Catalog schema mismatch in versions table.”

Cause: Package upgrade changed catalog structure, or manual corruption.

Solution:

# Back up existing catalog
backup_path <- fs::path(stamp_dir, "catalog_backup.qs2")
fs::file_copy(catalog_path, backup_path, overwrite = TRUE)

# Option 1: Delete catalog and rebuild (LOSES VERSION HISTORY)
fs::file_delete(catalog_path)
# Next st_save() will create fresh catalog

# Option 2: Manual migration (advanced - contact package maintainer)

6.3 Disk Space Issues

Symptom: Version history grows very large across many artifacts.

Diagnosis:

# Check total .stamp/ state size
stamp_dir <- fs::path(demo_dir, ".stamp")
info <- fs::dir_info(stamp_dir, recurse = TRUE)
data.table::data.table(stamp_state_mb = sum(info$size, na.rm = TRUE) / 1024^2)

# Find artifacts with large version histories
# Version directories are now distributed: <artifact-dir>/versions/

all_files <- fs::dir_info(demo_dir, recurse = TRUE, all = TRUE)
version_files <- all_files[grepl("/versions/", all_files$path) & all_files$type == "file", ]
if (nrow(version_files) > 0) {
  # Group by artifact and sum sizes of version files
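  # Strip "/versions/<version_id>/<file>" to get the artifact directory
  version_files$artifact <- sub("/versions/.*$", "", version_files$path)
  version_files$size_num <- as.numeric(version_files$size)
  agg <- aggregate(size_num ~ artifact, version_files, sum)
  agg[order(-agg$size_num), ]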
}

Solution:

# Configure retention policy to auto-prune old versions
st_opts(retention_policy = list(n = 10, days = 90))

# Manually prune versions for specific artifact
st_prune_versions(test_path, policy = list(n = 5), dry_run = FALSE)

# Prune all artifacts (use with caution!)
catalog <- stamp:::.st_catalog_read()
for (aid in unique(catalog$versions$artifact_id)) {
  artifact_path <- catalog$artifacts[artifact_id == aid]$path[1]
  st_prune_versions(artifact_path, policy = list(n = 5), dry_run = FALSE)
}
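The retention policies above take `n` and `days`, but the vignette does not say how the two rules combine. The sketch below assumes a version survives if it is among the `n` newest or is younger than `days`. It also assumes that leaving both rules unset keeps everything. `versions_to_prune()` is hypothetical and is not st_prune_versions(). It only selects IDs and deletes nothing.

```r
# Hypothetical sketch, not st_prune_versions(): which version_ids to drop?
versions_to_prune <- function(versions, n = NULL, days = NULL, now = Sys.time()) {
  created <- as.POSIXct(versions$created_at,
                        format = "%Y-%m-%dT%H:%M:%OSZ", tz = "UTC")
  pos <- integer(length(created))
  pos[order(created, decreasing = TRUE)] <- seq_along(created)  # 1 = newest
  age_days <- as.numeric(difftime(now, created, units = "days"))
  keep <- rep(is.null(n) && is.null(days), length(created))  # no policy: keep all
  if (!is.null(n))    keep <- keep | pos <= n          # assumed: either rule keeps
  if (!is.null(days)) keep <- keep | age_days <= days
  versions$version_id[!keep]
}

vers <- data.frame(
  version_id = c("v1", "v2", "v3", "v4"),
  created_at = c("2025-01-01T00:00:00Z", "2025-03-01T00:00:00Z",
                 "2025-05-01T00:00:00Z", "2025-05-02T00:00:00Z"),
  stringsAsFactors = FALSE)
now <- as.POSIXct("2025-05-03", tz = "UTC")
versions_to_prune(vers, n = 2, days = 30, now = now)  # "v1" "v2"
```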

6.4 Version Timestamp Issues

Symptom: Timestamps appear corrupted or versions won’t load.

Diagnosis:

# Check for invalid timestamps
vers <- st_versions(test_path)
bad_timestamps <- vers[is.na(created_at) | created_at == ""]
nrow(bad_timestamps)
#> [1] 0

Solution: The package automatically drops corrupt version rows when reading. If this happens frequently, check for:

  • System clock issues
  • Concurrent access without proper locking
  • Manual modification of catalog files
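The diagnosis above only catches empty or missing values. A slightly stricter base-R check also flags two more cases: values that do not parse in the ISO 8601 UTC format seen in st_versions() output, and values that lie in the future, a symptom of clock problems. `bad_created_at()` is a hypothetical helper. Its format string is an assumption based on the timestamps shown in this vignette.

```r
# Hypothetical helper: TRUE for created_at values that look corrupt
bad_created_at <- function(created_at, now = Sys.time()) {
  parsed  <- as.POSIXct(created_at, format = "%Y-%m-%dT%H:%M:%OSZ", tz = "UTC")
  missing <- is.na(created_at) | created_at == ""
  future  <- !is.na(parsed) & parsed > now      # possible clock skew
  missing | is.na(parsed) | future
}

x <- c("2026-03-04T21:30:15.688979Z", "", NA, "04/03/2026 21:30",
       "2999-01-01T00:00:00Z")
bad_created_at(x, now = as.POSIXct("2026-03-05", tz = "UTC"))
# FALSE TRUE TRUE TRUE TRUE
```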

6.5 Lock File Issues

Symptom: catalog.lock file persists after crash.

Solution:

# Safe to delete if no stamp operations are running
lock_file <- fs::path(stamp_dir, "catalog.lock")
if (fs::file_exists(lock_file)) {
  fs::file_delete(lock_file)
}

7. Best Practices

7.1 Version Control Integration

Include in .gitignore:

.stamp/temp/
.stamp/logs/
.stamp/catalog.lock

Consider including:

  • .stamp/catalog.qs2 - Enables version history tracking across the team
  • The per-artifact versions/ directories - Version snapshots for all artifacts (useful for small datasets)

For large projects:

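# .gitignore
# (comments must be on their own line; a trailing "# ..." becomes part of the pattern)
# Keep the catalog local
.stamp/catalog.qs2
# Exclude all version history at any depth (too large for git)
**/versions/
# Optionally exclude large binary artifacts
# (these patterns also match artifact directories such as data.qs2/)
**/*.qs2
**/*.rds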

Note: With the distributed structure, version directories are scattered throughout your project (e.g., data/model.rds/versions/, results/output.qs2/versions/). Use a pattern such as **/versions/ to exclude them at any depth; */versions/ matches only one directory level.

7.2 Backup Strategy

# Periodic catalog backup
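# Keep backups outside the project so the full-project copy below
# does not copy the backup folder into itself
backup_dir <- fs::path_temp("stamp-backups")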
fs::dir_create(backup_dir)

# Backup the catalog (small, fast)
catalog_src <- fs::path(demo_dir, ".stamp", "catalog.qs2")
catalog_dst <- fs::path(backup_dir, sprintf("catalog_%s.qs2", Sys.Date()))
fs::file_copy(catalog_src, catalog_dst, overwrite = TRUE)

# For critical projects, backup specific artifacts with version history
# Version directories are now distributed: <artifact-dir>/versions/
artifact_dir <- fs::path(demo_dir, "results", "model.rds")
if (fs::dir_exists(artifact_dir)) {
  artifact_backup <- fs::path(backup_dir, sprintf("model_artifact_%s", Sys.Date()))
  fs::dir_copy(artifact_dir, artifact_backup, overwrite = TRUE)
}

# Or backup the entire project directory (includes all artifacts + versions)
project_backup <- fs::path(backup_dir, sprintf("project_%s", Sys.Date()))
fs::dir_copy(demo_dir, project_backup, overwrite = TRUE)

8. Summary

The stamp package uses a simplified, distributed storage architecture:

  • Transparent storage - Artifacts live at <dir>/<filename>/<filename>, matching your mental model
  • Distributed versioning - Each artifact has its own versions/ directory, not centralized
  • State separation - .stamp/ contains only catalog, locks, and temp files
  • Complete version history - Every save that records a version (per the versioning mode) gets an immutable snapshot next to the artifact
  • Lineage tracking - Parent relationships preserved in parents.json
  • Concurrent access - File-based locking prevents corruption
  • Retention policies - Auto-prune old versions per artifact to manage disk space

Key takeaways for users:

  • Artifacts are stored as <root>/<dir>/<filename>/<filename> (direct-path model)
  • Version history lives in <artifact-dir>/versions/, not .stamp/versions/
  • .stamp/ contains only state (catalog, temp, logs) - no artifact data
  • Use st_versions(), st_info(), and st_load(version=...) to explore history
  • Safe to re-initialize without losing history

Key takeaways for developers:

  • Catalog is the source of truth (two tables: artifacts, versions)
  • Version snapshots are per-artifact and immutable
  • All writes use atomic operations + locking for safety
  • .st_versions_root() is deprecated (removed centralized version storage in v0.0.9)
  • Internal functions prefixed with .st_ provide infrastructure

Further Reading