Hashing, Change Detection and Versions
Source:vignettes/hashing-and-versions.Rmd
# Use development build when interactive *and* explicitly enabled via env var.
dev_mode <- (Sys.getenv("DEV_VIGNETTES", "false") == "true")
if (dev_mode && requireNamespace("pkgload", quietly = TRUE)) {
pkgload::load_all(
export_all = FALSE,
helpers = FALSE,
attach_testthat = FALSE
)
} else {
# fall back to the installed package (the path CRAN, CI, and pkgdown take)
library(stamp)
}
This article explains how stamp detects changes and records versions. We’ll cover:
- Stable object hashing (st_hash_obj()),
- Code hashing / provenance (st_hash_code()),
- Optional file hashing (st_hash_file()),
- Change-detection helpers (st_changed(), st_changed_reason(), st_should_save()), and
- The lightweight versions catalog (st_versions(), st_latest(), st_load_version()).
The core idea: stamp computes stable, reproducible hashes for objects
and (optionally) the user-supplied code. When you call
st_save(), stamp compares the new hashes with stored
metadata (sidecars or committed snapshots) and decides whether to write
a new version. This allows cheap “skip-on-equal” behavior for expensive
workflows.
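The skip-on-equal idea can be sketched in base R, independent of stamp. This is only an illustration under stated assumptions: hash_obj() and save_if_changed() are hypothetical helpers, MD5 over serialize() output stands in for whatever stable hash stamp actually uses, and the stored hash lives in a plain .hash text file rather than a real sidecar.

```r
# Hypothetical helper: hash an R object by serializing it to a temp file
# and taking the MD5 of the bytes (a stand-in for stamp's own hashing).
hash_obj <- function(x) {
  tmp <- tempfile()
  on.exit(unlink(tmp))
  writeBin(serialize(x, NULL), tmp)
  unname(tools::md5sum(tmp))
}

# Hypothetical helper: write `x` to `path` only when its hash differs
# from the one stored next to the file. Returns TRUE if a write happened.
save_if_changed <- function(x, path) {
  hash_path <- paste0(path, ".hash")
  new_hash <- hash_obj(x)
  if (file.exists(path) && file.exists(hash_path)) {
    old_hash <- readLines(hash_path, warn = FALSE)
    if (identical(old_hash, new_hash)) {
      return(FALSE) # unchanged: skip the write
    }
  }
  saveRDS(x, path)
  writeLines(new_hash, hash_path)
  TRUE
}

demo_path <- tempfile(fileext = ".rds")
first  <- save_if_changed(data.frame(a = 1:3), demo_path) # first write
second <- save_if_changed(data.frame(a = 1:3), demo_path) # same content
third  <- save_if_changed(data.frame(a = 4:6), demo_path) # new content
```

In this sketch only the first and third calls write to disk; the second call finds a matching hash and returns early.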
Recommended options
st_opts_reset()
st_opts(
versioning = "content", # skip write when content unchanged
code_hash = TRUE, # store code hash when 'code=' is provided to st_save()
store_file_hash = TRUE, # compute & store file hash after write
verify_on_load = TRUE, # verify content on load (warn on mismatch)
meta_format = "both" # write JSON + QS2 sidecars
)
## ✔ stamp options updated
## versioning = "content", code_hash = "TRUE", store_file_hash = "TRUE",
## verify_on_load = "TRUE", meta_format = "both"
Save with hashes (and skip if content identical)
## ✔ stamp initialized
## root: /tmp/RtmpF4xZDt
## state: /tmp/RtmpF4xZDt/.stamp
p <- fs::path(root, "demo.qs")
x <- data.frame(a = 1:3)
# First write: creates artifact + sidecars + catalog entry
st_save(x, p, code = function(z) z)
## ✔ Saved [qs] → /tmp/RtmpF4xZDt/demo.qs @ version
## 17789658836fb198
# Second write, same content & same code: skipped (no new version)
st_save(x, p, code = function(z) z)
## ✔ Skip save (reason: no_change_policy) for
## /tmp/RtmpF4xZDt/demo.qs
nrow(st_versions(p)) # should be 1
## [1] 1
In the snippet above stamp serializes x in a
deterministic way and computes a content hash. If both the content hash
and (when provided) the code hash match the stored metadata, no new
version is created when versioning = "content".
Note: the first write will always create the artifact and its
sidecar(s). If you see the first write skipped, check that the path you
passed to st_save() exactly matches subsequent calls.
If you change content (or change the code), a new version is recorded:
## ✔ Saved [qs] → /tmp/RtmpF4xZDt/demo.qs @ version
## 1946b8e377d9c8d0
nrow(st_versions(p)) # now 2
## [1] 2
st_latest(p) # latest version id (string)
## [1] "1946b8e377d9c8d0"
Policy: By design, changing the code= you pass to st_save() creates a new version even if x is identical. This makes code provenance explicit.
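To see why a code change can be detected separately from a content change, here is a minimal base R sketch of code hashing. It is not stamp's implementation of st_hash_code(): hash_code() is a hypothetical helper, and hashing the deparsed function text with MD5 is an assumption made for illustration.

```r
# Hypothetical helper: hash the deparsed text of a function, so that two
# function literals with the same source get the same hash.
hash_code <- function(f) {
  tmp <- tempfile()
  on.exit(unlink(tmp))
  writeLines(deparse(f), tmp)
  unname(tools::md5sum(tmp))
}

h_same_1  <- hash_code(function(z) z)
h_same_2  <- hash_code(function(z) z)     # identical literal
h_changed <- hash_code(function(z) z + 1) # different body

same_code    <- identical(h_same_1, h_same_2)
changed_code <- !identical(h_same_1, h_changed)
```

Under this scheme the object being saved plays no part in the code hash, so editing the function alone is enough to produce a different hash.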
A short practical pattern:
- Pass your transformation code to st_save(..., code = <function or expression>) so stamp can record the code hash.
- Use st_changed() or st_should_save() to cheaply decide whether to run expensive computations before calling st_save().
Inspect sidecars & metadata
meta <- st_read_sidecar(p)
meta[c(
"format",
"created_at",
"size_bytes",
"content_hash",
"code_hash",
"file_hash"
)]
## $format
## [1] "qs"
##
## $created_at
## [1] "2025-12-22T11:00:57.492275Z"
##
## $size_bytes
## [1] 137
##
## $content_hash
## [1] "7e25cdd35cd37239"
##
## $code_hash
## [1] "488e8fa49c740261"
##
## $file_hash
## [1] "28d069cce24a86f3"
Explanation:
- content_hash is the stable hash of the R object written, computed by st_hash_obj(x).
- code_hash is computed by st_hash_code(code) and is recorded only when you supply code= to st_save().
- file_hash is optional: it is computed after the file is written (if store_file_hash = TRUE), can be used to detect external file tampering, and is used to verify on load.
Change detection (without writing)
Use these before doing expensive work, to decide whether to recompute.
x_same <- x2
x_new <- transform(x2, a = a + 10L)
st_changed(p, x = x_same, code = function(z) z)
## $changed
## [1] FALSE
##
## $reason
## [1] "no_change"
##
## $details
## $details$content_changed
## [1] FALSE
##
## $details$code_changed
## [1] FALSE
##
## $details$file_changed
## [1] FALSE
st_changed_reason(p, x = x_same, code = function(z) z) # "no_change"
## [1] "no_change"
st_changed(p, x = x_new, code = function(z) z)
## $changed
## [1] TRUE
##
## $reason
## [1] "content"
##
## $details
## $details$content_changed
## [1] TRUE
##
## $details$code_changed
## [1] FALSE
##
## $details$file_changed
## [1] FALSE
st_changed_reason(p, x = x_new, code = function(z) z) # "content"
## [1] "content"
st_should_save(p, x = x_same, code = function(z) z) # recommends skip
## $save
## [1] FALSE
##
## $reason
## [1] "no_change_policy"
st_should_save(p, x = x_new, code = function(z) z) # recommends save
## $save
## [1] TRUE
##
## $reason
## [1] "content"
st_changed() and st_changed_reason() perform no file writes, which makes
these helpers suitable as guards inside functions that should compute
expensive results only when necessary. An example pattern inside your
pipeline function:
if (st_should_save(p, x = out, code = my_transform)$save) {
st_save(out, p, code = my_transform)
} else {
message("Skipping write; content and code unchanged")
}
Loading specific versions
vids <- st_versions(p)
head(vids)
## version_id artifact_id content_hash code_hash
## <char> <char> <char> <char>
## 1: 1946b8e377d9c8d0 fed38ae263aeddbc 7e25cdd35cd37239 488e8fa49c740261
## 2: 17789658836fb198 fed38ae263aeddbc 1811ba4b2bd2a26a 488e8fa49c740261
## size_bytes created_at sidecar_format
## <num> <char> <char>
## 1: 137 2025-12-22T11:00:57.492275Z both
## 2: 137 2025-12-22T11:00:57.325872Z both
vid_latest <- st_latest(p)
obj_latest <- st_load_version(p, vid_latest)
## ✔ Loaded ← /tmp/RtmpF4xZDt/demo.qs @
## 1946b8e377d9c8d0 [qs]
# Load an older version by id
if (nrow(vids) > 1L) {
vid_old <- vids$version_id[[nrow(vids)]]
obj_old <- st_load_version(p, vid_old)
}
## ✔ Loaded ← /tmp/RtmpF4xZDt/demo.qs @
## 17789658836fb198 [qs]
st_versions() returns a table of version metadata. Each
row includes the version_id, created_at, and a
snapshot of sidecar fields available at commit time. Use
st_load_version() to restore the artifact as it was at that
version.
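The example above takes the last row of the table as the oldest version. A small base R sketch that selects by created_at instead, so it does not depend on row order, might look like this. The two rows are copied from the st_versions() output shown earlier; the plain data frame is a stand-in for the real return value, and the variable names are illustrative.

```r
# Version rows copied from the st_versions() output shown above.
vids_demo <- data.frame(
  version_id = c("1946b8e377d9c8d0", "17789658836fb198"),
  created_at = c("2025-12-22T11:00:57.492275Z", "2025-12-22T11:00:57.325872Z"),
  stringsAsFactors = FALSE
)

# UTC timestamps in one fixed ISO 8601 format sort correctly as strings;
# method = "radix" sorts in the C locale, independent of the session locale.
ordered <- vids_demo[order(vids_demo$created_at, method = "radix"), ]
oldest_id <- ordered$version_id[1]
latest_id <- ordered$version_id[nrow(ordered)]
```

Here latest_id agrees with the value st_latest(p) printed earlier in the article.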
Where are versions stored?
Snapshots live under
.stamp/versions/<relative-path>/<version_id>/.
p <- fs::path(root, "demo.qs")
x <- data.frame(a = 1:5)
# Write once to create a version snapshot
st_save(x, p, code = function(z) z)
## ✔ Saved [qs] → /tmp/RtmpF4xZDt/demo.qs @ version
## ff201adba0a15f08
# Now list the versions tree
vroot <- stamp:::.st_versions_root()
fs::dir_tree(vroot, recurse = TRUE, all = TRUE)
## /tmp/RtmpF4xZDt/.stamp/versions
## └── demo.qs
## ├── 17789658836fb198
## │ ├── artifact
## │ ├── sidecar.json
## │ └── sidecar.qs2
## ├── 1946b8e377d9c8d0
## │ ├── artifact
## │ ├── sidecar.json
## │ └── sidecar.qs2
## └── ff201adba0a15f08
## ├── artifact
## ├── sidecar.json
## └── sidecar.qs2
Each snapshot dir contains:
- artifact: a copy of the saved file
- sidecar.json and/or sidecar.qs2: depending on meta_format
Additionally each snapshot may include a parents.json
file capturing committed lineage between artifacts; this is created when
stamp records explicit parents during a commit. Sidecar metadata (in
stmeta/) is the primary local source used to decide whether
to write, while snapshots are the long-term committed record.
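Given the layout described above, the directory for one snapshot can be assembled by hand. The helper below is a hypothetical base R sketch, not part of stamp; it simply joins the path components in the documented order, using the root and version id from the example output.

```r
# Hypothetical helper: build the snapshot directory for one version,
# following the .stamp/versions/<relative-path>/<version_id>/ layout.
snapshot_dir <- function(root, rel_path, version_id) {
  file.path(root, ".stamp", "versions", rel_path, version_id)
}

snap <- snapshot_dir("/tmp/RtmpF4xZDt", "demo.qs", "ff201adba0a15f08")

# List whatever files exist there (character(0) if the directory is absent).
snap_files <- list.files(snap, all.files = TRUE, no.. = TRUE)
```

Reading snapshots this way is for inspection only; st_load_version() remains the supported way to restore a version.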
Integrity checks on load (optional)
If verify_on_load = TRUE and a content_hash
exists in the sidecar, st_load() recomputes the object’s
hash and warns if it differs (indicating the file changed outside
stamp).
## Warning: Loaded object hash mismatch for /tmp/RtmpF4xZDt/demo.qs (content hash differs
## from sidecar).
## Warning: No primary key recorded for /tmp/RtmpF4xZDt/demo.qs.
## ℹ You can add one with `st_add_pk()`.
## ✔ Loaded [qs] ← /tmp/RtmpF4xZDt/demo.qs
The comparison uses the value from st_hash_obj() and the
content_hash recorded in the sidecar or snapshot. A
mismatch usually means the file was modified outside of stamp;
re-saving is recommended.
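The verify-on-load check can be imitated in base R to show its shape. This is a sketch under assumptions, not stamp's code: hash_obj() and load_verified() are hypothetical helpers, MD5 over serialize() output stands in for st_hash_obj(), and the recorded hash is held in a variable instead of a sidecar.

```r
# Hypothetical helper: MD5 of the serialized object (stand-in for st_hash_obj()).
hash_obj <- function(x) {
  tmp <- tempfile()
  on.exit(unlink(tmp))
  writeBin(serialize(x, NULL), tmp)
  unname(tools::md5sum(tmp))
}

# Hypothetical helper: load an object and warn if its hash differs
# from the hash recorded at save time.
load_verified <- function(path, expected_hash) {
  obj <- readRDS(path)
  if (!identical(hash_obj(obj), expected_hash)) {
    warning("Loaded object hash mismatch for ", path)
  }
  obj
}

rds_path <- tempfile(fileext = ".rds")
saveRDS(data.frame(a = c(1.5, 2.5, 3.5)), rds_path)
# Record the hash of the object as it reads back, so the later
# comparison is like-for-like.
recorded <- hash_obj(readRDS(rds_path))

# Untouched file: no warning is raised.
clean_warned <- tryCatch(
  { load_verified(rds_path, recorded); FALSE },
  warning = function(w) TRUE
)

# Simulate a modification outside the tool: overwrite the file directly.
saveRDS(data.frame(a = c(9.5, 8.5, 7.5)), rds_path)
tampered_warned <- tryCatch(
  { load_verified(rds_path, recorded); FALSE },
  warning = function(w) TRUE
)
```

The object is still returned when the hashes differ; the warning only flags that the stored metadata no longer describes the file.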
Troubleshooting
Q: The first st_save() was skipped and st_versions(p) has 0 rows.
A: The first write should never be skipped. Ensure you’re using the
current st_should_save(), which returns save = TRUE
when the artifact or the sidecar is missing.
Q: st_changed_reason() says "missing_meta".
A: The artifact exists but the sidecar was removed or is unreadable. Call
st_save(x, p, code = ...) once; it will re-materialize
metadata and record a version.
Q: Changing only code= didn’t create a new version.
A: By design, a code change does create a new version. Confirm that
st_opts("code_hash", .get = TRUE) is TRUE and that
you passed code= consistently (e.g., as a function literal rather than,
in rare cases, different object pointers to identical functions).
Q: CSV round-trips aren’t byte-identical.
A: data.table::fread/fwrite may coerce types (e.g., integers
vs doubles). Compare with relaxed checks or coerce types before
comparison.
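The type coercion is easy to reproduce. The sketch below uses base R's write.csv() and read.csv() as a stand-in for data.table::fwrite/fread (an assumption made to keep it dependency-free) and compares one column three ways.

```r
df <- data.frame(a = c(1, 2, 3)) # column stored as double
csv_path <- tempfile(fileext = ".csv")
write.csv(df, csv_path, row.names = FALSE)
back <- read.csv(csv_path) # whole numbers are read back as integer

type_before <- typeof(df$a)
type_after  <- typeof(back$a)

strict  <- identical(df$a, back$a)             # type-sensitive comparison
relaxed <- isTRUE(all.equal(df$a, back$a))     # tolerant numeric comparison
coerced <- identical(df$a, as.numeric(back$a)) # coerce first, then compare
```

The strict check fails only because of the storage type, while the relaxed and coerced checks both succeed.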
Q: I see a warning on load about hash mismatch.
A: With verify_on_load = TRUE, stamp
recomputes the object hash and warns if it differs from the sidecar’s
content_hash. This indicates the file was modified outside
stamp or the sidecar is stale. Re-save to repair.
Q: qs2 isn’t installed.
A: qs2 is preferred. If unavailable, stamp
falls back to qs for read/write under the
"qs2" handler. Install qs2 for best
performance.
Q: Sidecars are not appearing.
A: Check st_opts("meta_format", .get = TRUE); it should be set to
"json", "qs2", or "both".
Sidecars are written to stmeta/ next to the artifact.
Q: Versions aren’t where I expect.
A: Version snapshots live under
.stamp/versions/<relative-path>/<version_id>/.
Use the code snippet above to explore the tree.
Tips & conventions
- Keep versioning = "content" for reproducible artifacts; use "timestamp" if you want a new version on every save; "off" to skip versioning entirely.
- Use st_changed() / st_should_save() to gate expensive computation inside your own functions.
- Sidecars: prefer meta_format = "json" for readability, "qs2" for compactness, or "both" for redundancy.
Further reading / next steps:
- See the lineage-rebuilds vignette for how committed parents and sidecar parents interact during st_rebuild().
- Consider recording code= for critical data transformations so provenance is preserved even when object content is identical.