library(joyn)
#> 
#> Attaching package: 'joyn'
#> The following object is masked from 'package:base':
#> 
#>     merge
library(data.table)
x <- data.table(id = c(1, 4, 2, 3, NA),
                t  = c(1L, 2L, 1L, 2L, NA),
                country = c(16, 12, 3, NA, 15))
  
y <- data.table(id  = c(1, 2, 5, 6, 3),
                gdp = c(11L, 15L, 20L, 13L, 10L),
                country = 16:20)
Advanced use
This vignette lets you explore some additional features available in joyn through an example use case.
Suppose you want to join tables x and y, where the variable country is available in both. You could do one of five things:
1. Use variable country as one of the key variables
If you don’t specify the argument by, joyn will use country and id as key variables by default, given that they are common to both x and y.
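As a rough illustration of that default, the variables shared by both tables can be found with base R's intersect() on the column names. This is a minimal sketch of the idea, not joyn's internal code; the name common_vars is hypothetical.

```r
library(data.table)

# Example tables from the top of this vignette
x <- data.table(id = c(1, 4, 2, 3, NA),
                t  = c(1L, 2L, 1L, 2L, NA),
                country = c(16, 12, 3, NA, 15))
y <- data.table(id  = c(1, 2, 5, 6, 3),
                gdp = c(11L, 15L, 20L, 13L, 10L),
                country = 16:20)

# Hypothetical step: variables with the same name in x and y,
# i.e. the candidates for keys when `by` is not supplied
common_vars <- intersect(names(x), names(y))
print(common_vars)
```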
# The variables with the same name, `id` and `country`, are used as key
# variables.
joyn(x = x, 
     y = y)
#> 
#> ── JOYn Report ──
#> 
#>   .joyn n percent
#> 1     x 4   44.4%
#> 2     y 4   44.4%
#> 3 x & y 1   11.1%
#> 4 total 9    100%
#> ────────────────────────────────────────────────────────── End of JOYn report ──
#> ℹ Note: Joyn's report available in variable .joyn
#> ℹ Note: Removing key variables id and country from id, gdp, and country
#>       id     t country   gdp  .joyn
#>    <num> <int>   <num> <int> <fctr>
#> 1:     1     1      16    11  x & y
#> 2:     4     2      12    NA      x
#> 3:     2     1       3    NA      x
#> 4:     3     2      NA    NA      x
#> 5:    NA    NA      15    NA      x
#> 6:     2    NA      17    15      y
#> 7:     5    NA      18    20      y
#> 8:     6    NA      19    13      y
#> 9:     3    NA      20    10      y
Alternatively, you can join by country alone by setting by = "country":
# Joining by country
joyn(x = x, 
     y = y, 
     by = "country")
#> 
#> ── JOYn Report ──
#> 
#>   .joyn n percent
#> 1     x 4   44.4%
#> 2     y 4   44.4%
#> 3 x & y 1   11.1%
#> 4 total 9    100%
#> ────────────────────────────────────────────────────────── End of JOYn report ──
#> ℹ Note: Joyn's report available in variable .joyn
#> ℹ Note: Removing key variables country from id, gdp, and country
#>       id     t country   gdp  .joyn
#>    <num> <int>   <num> <int> <fctr>
#> 1:     1     1      16    11  x & y
#> 2:     4     2      12    NA      x
#> 3:     2     1       3    NA      x
#> 4:     3     2      NA    NA      x
#> 5:    NA    NA      15    NA      x
#> 6:    NA    NA      17    15      y
#> 7:    NA    NA      18    20      y
#> 8:    NA    NA      19    13      y
#> 9:    NA    NA      20    10      y
2. Ignore the values of country from y and don’t bring them into the resulting table
This is the default if you do not include country among the key variables in argument by.
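Conceptually, this resembles dropping country from y before a full join on id. The sketch below uses plain data.table merge() with all = TRUE as a stand-in for joyn's join; it is an analogy only and does not create joyn's .joyn reporting variable. The names y_sub and res are hypothetical.

```r
library(data.table)

x <- data.table(id = c(1, 4, 2, 3, NA),
                t  = c(1L, 2L, 1L, 2L, NA),
                country = c(16, 12, 3, NA, 15))
y <- data.table(id  = c(1, 2, 5, 6, 3),
                gdp = c(11L, 15L, 20L, 13L, 10L),
                country = 16:20)

# Keep every column of y except the common non-key variable `country`
y_sub <- y[, setdiff(names(y), "country"), with = FALSE]

# Full (outer) join on id: `country` now comes only from x
res <- merge(x, y_sub, by = "id", all = TRUE)
print(res)
```

Rows that exist only in y (ids 5 and 6) get NA for country, just as in the joyn output.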
joyn(x = x, 
     y = y, 
     by = "id")
#> 
#> ── JOYn Report ──
#> 
#>   .joyn n percent
#> 1     x 2   28.6%
#> 2     y 2   28.6%
#> 3 x & y 3   42.9%
#> 4 total 7    100%
#> ────────────────────────────────────────────────────────── End of JOYn report ──
#> ℹ Note: Joyn's report available in variable .joyn
#> ℹ Note: Removing key variables id from id, gdp, and country
#>       id     t country   gdp  .joyn
#>    <num> <int>   <num> <int> <fctr>
#> 1:     1     1      16    11  x & y
#> 2:     4     2      12    NA      x
#> 3:     2     1       3    15  x & y
#> 4:     3     2      NA    10  x & y
#> 5:    NA    NA      15    NA      x
#> 6:     5    NA      NA    20      y
#> 7:     6    NA      NA    13      y
3. Update only NAs in table x
Another possibility is to use the update_NAs argument of joyn(). This allows you to update the NA values of variable country in table x with the actual values of the matching observations in country from table y. In this case, the actual (non-missing) values of country in table x remain unchanged.
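To make the update_NAs idea concrete, here is a minimal sketch with plain data.table: a full join on id that keeps both versions of country, followed by fcoalesce(), which fills only the missing values of x's column from y's. This mirrors the logic only, not joyn's implementation, and it does not compute the reporting statuses. Because country is integer in y but double in x, it is coerced first, since fcoalesce() expects matching types.

```r
library(data.table)

x <- data.table(id = c(1, 4, 2, 3, NA),
                t  = c(1L, 2L, 1L, 2L, NA),
                country = c(16, 12, 3, NA, 15))
y <- data.table(id  = c(1, 2, 5, 6, 3),
                gdp = c(11L, 15L, 20L, 13L, 10L),
                country = 16:20)

# Full join on id; the common variable becomes country.x and country.y
m <- merge(x, y, by = "id", all = TRUE, suffixes = c(".x", ".y"))

# Fill only the NAs of x's country with y's value (coerced to double)
m[, country := fcoalesce(country.x, as.numeric(country.y))]
m[, c("country.x", "country.y") := NULL]
print(m)
```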
joyn(x = x,
     y = y, 
     by = "id", 
     update_NAs = TRUE)
#> 
#> ── JOYn Report ──
#> 
#>        .joyn n percent
#> 1          x 2   28.6%
#> 2      x & y 2   28.6%
#> 3 NA updated 3   42.9%
#> 4      total 7    100%
#> ────────────────────────────────────────────────────────── End of JOYn report ──
#> ℹ Note: Joyn's report available in variable .joyn
#> ℹ Note: Removing key variables id from id, gdp, and country
#>       id     t country   gdp      .joyn
#>    <num> <int>   <num> <int>     <fctr>
#> 1:     1     1      16    11      x & y
#> 2:     4     2      12    NA          x
#> 3:     2     1       3    15      x & y
#> 4:     3     2      20    10 NA updated
#> 5:    NA    NA      15    NA          x
#> 6:     5    NA      18    20 NA updated
#> 7:     6    NA      19    13 NA updated
4. Update actual values in table x
You can also update all the values of variable country in table x, both NAs and actual values, with the actual values of the matching observations in country from y. This is done by setting update_values = TRUE.
Notice that the reportvar allows you to keep track of how the update worked. In this case, value updated means that only the values that differ between country from x and country from y are updated.
However, let’s consider other possible cases:
- If, for the same matching observations, the values of the two country variables were the same, the reporting variable would report x & y instead (so you know that there is no update to make).
- If there are NAs in country from y, the actual values in x remain unchanged, and you would see a not updated status in the reporting variable. Notice, however, that there is another way to bring country from y into x: the argument keep_common_vars (see 5. below ⬇️).
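The update_values behavior can likewise be approximated, assuming that any non-missing value from y takes precedence: data.table's fifelse() keeps x's value only where y's country is NA. As before, this is a sketch of the logic rather than joyn's code, and it does not reproduce the value updated / not updated statuses.

```r
library(data.table)

x <- data.table(id = c(1, 4, 2, 3, NA),
                t  = c(1L, 2L, 1L, 2L, NA),
                country = c(16, 12, 3, NA, 15))
y <- data.table(id  = c(1, 2, 5, 6, 3),
                gdp = c(11L, 15L, 20L, 13L, 10L),
                country = 16:20)

m <- merge(x, y, by = "id", all = TRUE, suffixes = c(".x", ".y"))

# Take y's value wherever it exists; otherwise keep x's value.
# Both branches must be double, so y's integer column is coerced.
m[, country := fifelse(is.na(country.y), country.x, as.numeric(country.y))]
m[, c("country.x", "country.y") := NULL]
print(m)
```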
# Update both NAs and actual values of country in x with the values from y
joyn(x = x, 
     y = y, 
     by = "id", 
     update_values = TRUE)
#> 
#> ── JOYn Report ──
#> 
#>           .joyn n percent
#> 1    NA updated 3   42.9%
#> 2 value updated 2   28.6%
#> 3   not updated 2   28.6%
#> 4         total 7    100%
#> ────────────────────────────────────────────────────────── End of JOYn report ──
#> ℹ Note: Joyn's report available in variable .joyn
#> ℹ Note: Removing key variables id from id, gdp, and country
#>       id     t country   gdp         .joyn
#>    <num> <int>   <num> <int>        <fctr>
#> 1:     1     1      16    11 value updated
#> 2:     4     2      12    NA   not updated
#> 3:     2     1      17    15 value updated
#> 4:     3     2      20    10    NA updated
#> 5:    NA    NA      15    NA   not updated
#> 6:     5    NA      18    20    NA updated
#> 7:     6    NA      19    13 NA updated
5. Keep the original country variable from y in the returning table
(Keep the matching-name variable from y in x, without updating the values in x)
Another available option is to bring the original variable country from y into the resulting table, without using it to update the values in x. To distinguish country from x and country from y, joyn assigns a suffix to each variable’s name, so that you get country.x and country.y. All of this can be done by specifying keep_common_vars = TRUE.
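A plain data.table merge illustrates the same suffix idea: when a non-key variable exists in both tables, merge() keeps both copies and appends the strings given in its suffixes argument. This is only an analogy; joyn's own suffix handling may differ in its details.

```r
library(data.table)

x <- data.table(id = c(1, 4, 2, 3, NA),
                t  = c(1L, 2L, 1L, 2L, NA),
                country = c(16, 12, 3, NA, 15))
y <- data.table(id  = c(1, 2, 5, 6, 3),
                gdp = c(11L, 15L, 20L, 13L, 10L),
                country = 16:20)

# Both versions of `country` survive the join, distinguished by suffix
m <- merge(x, y, by = "id", all = TRUE, suffixes = c(".x", ".y"))
print(m[, .(id, country.x, country.y)])
```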
joyn(x = x, 
     y = y, 
     by = "id", 
     keep_common_vars = TRUE)
#> 
#> ── JOYn Report ──
#> 
#>   .joyn n percent
#> 1     x 2   28.6%
#> 2     y 2   28.6%
#> 3 x & y 3   42.9%
#> 4 total 7    100%
#> ────────────────────────────────────────────────────────── End of JOYn report ──
#> ℹ Note: Joyn's report available in variable .joyn
#> ℹ Note: Removing key variables id from id, gdp, and country
#>       id     t country.x   gdp country.y  .joyn
#>    <num> <int>     <num> <int>     <int> <fctr>
#> 1:     1     1        16    11        16  x & y
#> 2:     4     2        12    NA        NA      x
#> 3:     2     1         3    15        17  x & y
#> 4:     3     2        NA    10        20  x & y
#> 5:    NA    NA        15    NA        NA      x
#> 6:     5    NA        NA    20        18      y
#> 7:     6    NA        NA    13        19      yBring other variables from y into returning table
In joyn , you can also bring non common variables from
y into the resulting table. In fact you can specify them in
y_vars_to_keep, as shown in the example below:
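The effect of y_vars_to_keep can be sketched by subsetting y to its key plus the chosen columns before joining. Here y_vars_to_keep is just an ordinary character vector named after the joyn argument for readability; this is an assumption of the sketch, not how joyn processes the argument internally.

```r
library(data.table)

x <- data.table(id = c(1, 4, 2, 3, NA),
                t  = c(1L, 2L, 1L, 2L, NA),
                country = c(16, 12, 3, NA, 15))
y <- data.table(id  = c(1, 2, 5, 6, 3),
                gdp = c(11L, 15L, 20L, 13L, 10L),
                country = 16:20)

# Hypothetical selection of y columns to bring into the result
y_vars_to_keep <- "gdp"

# Keep only the key plus the selected columns of y, then join
y_sub <- y[, c("id", y_vars_to_keep), with = FALSE]
m <- merge(x, y_sub, by = "id", all = TRUE)
print(m)
```

Setting y_vars_to_keep <- character(0) in this sketch keeps only id from y, so the result contains just the columns of x.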
# Keeping variable gdp 
joyn(x = x, 
     y = y, 
     by = "id", 
     y_vars_to_keep = "gdp")
#> 
#> ── JOYn Report ──
#> 
#>   .joyn n percent
#> 1     x 2   28.6%
#> 2     y 2   28.6%
#> 3 x & y 3   42.9%
#> 4 total 7    100%
#> ────────────────────────────────────────────────────────── End of JOYn report ──
#> ℹ Note: Joyn's report available in variable .joyn
#>       id     t country   gdp  .joyn
#>    <num> <int>   <num> <int> <fctr>
#> 1:     1     1      16    11  x & y
#> 2:     4     2      12    NA      x
#> 3:     2     1       3    15  x & y
#> 4:     3     2      NA    10  x & y
#> 5:    NA    NA      15    NA      x
#> 6:     5    NA      NA    20      y
#> 7:     6    NA      NA    13      y
Notice that if you set y_vars_to_keep = FALSE or y_vars_to_keep = NULL, joyn won’t bring any variable from y into the returning table.