
Splits a data.frame or data.table by one or more partition columns and saves each partition to its own file in a Hive-style directory structure (nested key=value directories, e.g. country=USA/year=2020). This removes the need for manual splitting and looping.

Usage

st_write_parts(
  x,
  base,
  partitioning,
  code = NULL,
  parents = NULL,
  code_label = NULL,
  format = NULL,
  pk = NULL,
  domain = NULL,
  unique = TRUE,
  .progress = NULL,
  ...
)

Arguments

x

Data.frame or data.table to partition and save

base

Base directory for partitions (e.g., "data/welfare_parts")

partitioning

Character vector of column names to partition by (e.g., c("country", "year", "reporting_level"))

code, parents, code_label, format, ...

Passed to st_save() for each partition

pk

Optional primary key columns (passed to st_save())

domain

Optional domain label(s) (passed to st_save())

unique

Logical; enforce PK uniqueness at save time (default TRUE)

.progress

Logical or NULL; whether to show a progress bar while partitions are written. With the default NULL, a progress bar is shown when there are more than 10 partitions.

Value

Invisibly, a data.frame with columns:

  • partition_key: list-column of key values

  • path: file path

  • version_id: version identifier

  • n_rows: number of rows in partition

Performance

For large datasets with many partitions, this function uses data.table's split method for efficiency when data.table is available. Progress reporting can be disabled with .progress = FALSE.
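
To make the splitting step concrete, here is a minimal base R sketch of dividing a data.frame by several partition columns. It is an illustration only and not the package's internal code: the toy `welfare` data and the `parts` and `n_rows` names are assumptions made for this example.

```r
# Illustrative only: a base R analogue of the split step, not the package internals.
welfare <- data.frame(
  country = rep(c("USA", "CAN"), each = 4),
  year    = rep(c(2020, 2021), times = 4),
  value   = seq_len(8)
)
partitioning <- c("country", "year")

# Splitting on a list of columns yields one data.frame per key combination;
# drop = TRUE omits combinations that have no rows.
parts <- split(welfare, welfare[partitioning], drop = TRUE)

# Rows per partition, analogous to the n_rows column in the return value
n_rows <- vapply(parts, nrow, integer(1))
```

Each element of `parts` holds the rows for one combination of partition values, which is the unit that would then be written to its own file.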

Examples

if (FALSE) { # \dontrun{
# Create sample data
welfare <- data.frame(
  country = rep(c("USA", "CAN"), each = 100),
  year = rep(2020:2021, each = 50, times = 2),
  reporting_level = sample(c("national", "urban"), 200, replace = TRUE),
  value = rnorm(200)
)

# Auto-partition and save
st_write_parts(
  welfare,
  base = "data/welfare_parts",
  partitioning = c("country", "year", "reporting_level"),
  code_label = "welfare_partition"
)

# Result: files saved to:
#   data/welfare_parts/country=USA/year=2020/reporting_level=national/part.qs2
#   data/welfare_parts/country=USA/year=2020/reporting_level=urban/part.qs2
#   ... etc
} # }
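
The Hive-style layout listed in the example comments can be reproduced with a small path-building sketch. `hive_path()` is a hypothetical helper written for this illustration and is not part of the package API; the file name `part.qs2` is taken from the example output above and may differ for other values of `format`.

```r
# Hypothetical helper, not part of the package API.
hive_path <- function(base, keys, file = "part.qs2") {
  # keys: a named list (or one-row data.frame) of partition values
  values <- vapply(keys, as.character, character(1))
  segments <- paste0(names(keys), "=", values)
  # file.path() joins the components with "/" on all platforms
  do.call(file.path, c(list(base), as.list(segments), list(file)))
}

hive_path(
  "data/welfare_parts",
  list(country = "USA", year = 2020, reporting_level = "national")
)
# [1] "data/welfare_parts/country=USA/year=2020/reporting_level=national/part.qs2"
```

The order of the key=value segments follows the order of the names in `keys`, mirroring how the order of `partitioning` determines the directory nesting in the example.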