This short vignette focuses on the core concepts that enable
reproducible, incremental rebuilding with stamp: builders,
plans, and rebuilds. It complements vignettes/stamp.Rmd by
isolating patterns and best practices you should follow when registering
programmatic targets.
if (requireNamespace("pkgload", quietly = TRUE)) {
pkgload::load_all(".")
} else {
library(stamp)
}
library(data.table)
set.seed(123)Key concepts
-
Builder: a function registered with
st_register_builder(path, builder_fn). Signature:function(path, parents). It must return a list with at least:-
x: the object to save -
code: the producing function(s) used to computex(used to compute code_hash) - optional
code_label: a short tag used in sidecars
-
Plan: the result of
st_plan_rebuild()which inspects recorded parents,code_hashvalues and artifact presence. It describes actions required to bring targets up-to-date.Rebuild: executing the plan with
st_rebuild(plan).st_rebuild()calls the registered builders and thenst_save()for each produced artifact, recording new versions and sidecars.
Minimal example (single-target)
# Suppose we have a summary target path
summary_path <- "outputs/welfare_summary.qs2"
# Register a simple builder that loads parents and returns x+code
st_register_builder(summary_path, function(path, parents) {
# Load welfare parents (illustrative — adapt to your parents layout)
welfare_paths <- vapply(
parents,
function(p) grepl("welfare", p$path),
logical(1)
)
welfare_list <- lapply(parents[welfare_paths], function(p) st_load(p$path))
# Load macros (shared parents)
cpi_tbl <- st_load("data/macro/cpi.qs2")
gdp_tbl <- st_load("data/macro/gdp.qs2")
pop_tbl <- st_load("data/macro/population.qs2")
# call a pure worker that accepts pre-loaded inputs
out <- bar(welfare_list, cpi_tbl, gdp_tbl, pop_tbl)
list(
x = out,
code = list(bar = bar),
code_label = "aggregate_welfare"
)
})
# Inspect plan (strict mode is conservative)
plan <- st_plan_rebuild(
targets = summary_path,
include_targets = TRUE,
mode = "strict"
)
plan
# Run only if you're ready
# st_rebuild(plan)Multi-function code metadata
If your target depends on multiple functions (e.g. bar()
calls helpers h1() and h2()), include them
explicitly in code so st_save() computes a
combined hash that changes if any helper changes:
# Good: include all contributing functions
st_save(
x,
summary_path,
pk = c("country", "year"),
parents = parents,
code = list(bar = bar, h1 = h1, h2 = h2)
)This ensures st_is_stale() and plan computation detect
code changes across the whole chain.
Partitioned targets and partial rebuilds
Builders can produce partitioned outputs or you can register many builders — one per partition — to enable targeted re-computation. Two patterns:
Parent-driven builders: the builder receives
parentsand loads only the parents it needs. This is simple when parents are explicit and named in the sidecar.Directory-driven builders: the builder identifies a partition key from the
pathargument and computes output for that key (seevignettes/stamp.Rmdfor an end-to-end example).
Example: register a partitioned builder for per-key summaries (illustrative):
summary_parts_dir <- "outputs/summary_parts"
st_register_builder(
fs::path(
summary_parts_dir,
"country=COL/year=2012/reporting_level=urban/out.qs2"
),
function(path, parents) {
# parse key from path or use parents to find the matching welfare partition
key <- list(country = "COL", year = 2012L, reporting_level = "urban")
# load only matching partition + macros
w <- st_load(st_part_path("data/welfare_parts", key))
cpi <- st_load("data/macro/cpi.qs2")[
country == key$country &
year == key$year &
reporting_level == key$reporting_level
]
gdp <- st_load("data/macro/gdp.qs2")[
country == key$country & year == key$year
]
out <- bar(
list(w),
cpi,
gdp,
st_load("data/macro/population.qs2")[
country == key$country &
year == key$year &
reporting_level == key$reporting_level
]
)
list(x = out, code = list(bar = bar), code_label = "partition_summary")
}
)When a macro like CPI changes for COL 2012,
st_plan_rebuild() will mark only those partitioned targets
as stale and st_rebuild() will execute the few registered
builders required.
Modes: strict vs relaxed
-
mode = "strict"treats missing parents or mismatched code hashes conservatively — prefer this for correctness. -
mode = "relaxed"may skip some checks to avoid unnecessary recomputation in noisy environments; use with caution.
Tips & best practices
- Prefer pure worker functions that accept inputs as arguments. It makes testing and registration easier.
- Register builders that use the
parentsargument; avoid global state when possible. - Include helper functions in
codemetadata so any relevant change invalidates dependent artifacts. - Keep builders deterministic and side-effect free (don’t write inside
builder; instead return
xand letst_rebuild()callst_save()).
Try it locally
To try the examples interactively, set eval = TRUE in
the setup chunk and run examples inside an interactive R session. Use a
short root_dir to avoid Windows PATH length issues.
Appendix: example plan interpretation
A plan is a list-of-lists; each entry includes target
(path), a reason (parent_changed,
code_changed, or missing) and builder info.
Inspect the plan and filter or reorder actions before calling
st_rebuild().
Example (pseudo-inspection):
plan <- st_plan_rebuild(targets = summary_path, include_targets = TRUE)
# Show targets that are stale
Filter(function(x) x$reason != "uptodate", plan)