Package 'MatchThem' reference manual

Title:	Matching and Weighting Multiply Imputed Datasets
Description:	Provides essential tools for the pre-processing techniques of matching and weighting multiply imputed datasets. The package includes functions for matching within and across multiply imputed datasets using various methods, estimating weights for units in the imputed datasets using multiple weighting methods, calculating causal effect estimates in each matched or weighted dataset using parametric or non-parametric statistical models, and pooling the resulting estimates according to Rubin's rules (please see <https://journal.r-project.org/archive/2021/RJ-2021-073/> for more details).
Authors:	Farhad Pishgar [aut, cre], Noah Greifer [aut], Clémence Leyrat [ctb], Elizabeth Stuart [ctb]
Maintainer:	Farhad Pishgar <[email protected]>
License:	GPL (>= 2)
Version:	1.2.2
Built:	2025-03-23 06:11:16 UTC
Source:	https://github.com/farhadpishgar/matchthem

Create a `mimids` object

Description

Creates a mimids object from a list of matchit objects and an imputed dataset.

Usage

as.mimids(x, ...)

## Default S3 method:
as.mimids(x, datasets, ...)

Arguments

`x`	A list of `matchit` objects, each the output of a call to `MatchIt::matchit()` on an imputed dataset.
`...`	Ignored.
`datasets`	The datasets containing the exposure and the potential confounders named in the `formula`. This argument must be an object of the `mids` or `amelia` class, typically produced by a previous call to `mice()` from the mice package or to `amelia()` from the Amelia package. The Amelia package can impute missing data in a single cross-sectional dataset or in a time-series dataset; currently, the MatchThem package supports only the former.

Details

The matched datasets are stored as though matchthem() was called with approach = "within".

Value

A mimids object.

Examples


#Loading the dataset
data(osteoarthritis)

#Multiply imputing the missing values
imputed.datasets <- mice::mice(osteoarthritis, m = 5,
                               printFlag = FALSE)

#Matching the multiply imputed datasets manually
match.list <- lapply(1:5, function(i) {
  MatchIt::matchit(OSP ~ AGE + SEX + BMI + RAC + SMK,
                   mice::complete(imputed.datasets, i),
                   method = 'nearest')
})

#Creating mimids object
matched.datasets <- as.mimids(match.list,
                              imputed.datasets)

Create a `wimids` object

Description

Creates a wimids object from a list of weightit objects and an imputed dataset.

Usage

as.wimids(x, ...)

## Default S3 method:
as.wimids(x, datasets, ...)

Arguments

`x`	A list of `weightit` objects, each the output of a call to `WeightIt::weightit()` on an imputed dataset.
`...`	Ignored.
`datasets`	The datasets containing the exposure and covariates named in the `formula`. This argument must be an object of the `mids` or `amelia` class, typically produced by a previous call to `mice()` from the mice package or to `amelia()` from the Amelia package. The Amelia package can impute missing data in a single cross-sectional dataset or in a time-series dataset; currently, the MatchThem package supports only the former.

Details

The weighted datasets are stored as though weightthem() was called with approach = "within".

Value

A wimids object.

Examples


#Loading the dataset
data(osteoarthritis)

#Multiply imputing the missing values
imputed.datasets <- mice::mice(osteoarthritis, m = 5,
                               printFlag = FALSE)

#Weighting the multiply imputed datasets manually
weight.list <- lapply(1:5, function(i) {
  WeightIt::weightit(OSP ~ AGE + SEX + BMI + RAC + SMK,
                     mice::complete(imputed.datasets, i),
                     method = 'glm',
                     estimand = 'ATT')
})

#Creating wimids object
weighted.datasets <- as.wimids(weight.list,
                               imputed.datasets)

Combine `mimids` and `wimids` Objects by Columns

Description

This function combines a mimids or wimids object columnwise with additional datasets or variables. Typically these would be variables not included in the original multiple imputation and therefore absent in the mimids or wimids object. with() can then be used on the output to run models with the added variables.

Usage

cbind(..., deparse.level = 1)

## S3 method for class 'mimids'
cbind(..., deparse.level = 1)

## S3 method for class 'wimids'
cbind(..., deparse.level = 1)

Arguments

`...`	Objects to combine columnwise. The first argument should be a `mimids` or `wimids` object. Additional `data.frame`s, `matrix`es, `factor`s, or `vector`s can be supplied. These can be given as named arguments.
`deparse.level`	Ignored.

Value

An object with the same class as the first input object with the additional datasets or variables added to the components.

Author(s)

Farhad Pishgar and Noah Greifer

Examples

#Loading libraries
library(survey)

#Loading the dataset
data(osteoarthritis)

#Multiply imputing the missing values
imputed.datasets <- mice::mice(osteoarthritis, m = 5)

#Weighting the multiply imputed datasets
weighted.datasets <- weightthem(OSP ~ AGE + SEX + BMI + RAC + SMK,
                                imputed.datasets,
                                approach = 'within')

#Adding additional variables
weighted.datasets <- cbind(weighted.datasets,
                           logAGE = log(osteoarthritis$AGE))

#Using the additional variables in an analysis
models <- with(weighted.datasets,
               svyglm(KOA ~ OSP + logAGE, family = quasibinomial))

#Pooling results obtained from analyzing the datasets
results <- pool(models)
summary(results)

Extracts Multiply Imputed Datasets

Description

complete() extracts data from an object of the mimids or wimids class.

Usage

## S3 method for class 'mimids'
complete(data, action = 1, include = FALSE, mild = FALSE, all = TRUE, ...)

## S3 method for class 'wimids'
complete(data, action = 1, include = FALSE, mild = FALSE, all = TRUE, ...)

Arguments

`data`	A `mimids` or `wimids` object; the output of a call to `matchthem()` or `weightthem()`.
`action`	The number of the imputed dataset to extract, or an action keyword. The input must be a positive integer or a keyword. The keywords include `"all"` (produces a `mild` object of the multiply imputed datasets), `"long"` (produces a dataset with multiply imputed datasets stacked vertically), and `"broad"` (produces a dataset with multiply imputed datasets stacked horizontally). The default is `1`.
`include`	Whether the original data with the missing values should be included. The input must be a logical value. The default is `FALSE`.
`mild`	Whether the return value should be an object of `mild` class. Please note that setting `mild = TRUE` overrides `action` keywords of `"long"`, `"broad"`, and `"repeated"`. The default is `FALSE`.
`all`	Whether to include observations with a zero estimated weight. The default is `TRUE`.
`...`	Ignored.

Details

complete() works by running mice::complete() on the mids object stored within the mimids or wimids object and appending the outputs of the matching or weighting procedure. For mimids objects, the appended outputs include the matching weights, the propensity score (if included), pair membership (if included), and whether each unit was discarded. For wimids objects, the appended output is the estimated weights.
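
As a rough illustration of what vertical ("long") and horizontal ("broad") stacking mean, the base R sketch below stacks a small list of completed datasets both ways. The toy data frames, the `.imp` and `.id` column names, and the numeric column suffixes are assumptions chosen for illustration; this is not the package's implementation, and the matching or weighting outputs that complete() appends are not shown.

```r
# Toy stand-ins for three completed (imputed) datasets; values are made up
completed <- list(
  data.frame(AGE = c(50, 61), BMI = c(24.1, 30.2)),
  data.frame(AGE = c(50, 61), BMI = c(24.1, 29.8)),
  data.frame(AGE = c(50, 61), BMI = c(24.1, 31.0))
)

# "long"-style: stack vertically, recording the imputation number and row id
long <- do.call(rbind, lapply(seq_along(completed), function(i) {
  cbind(.imp = i, .id = seq_len(nrow(completed[[i]])), completed[[i]])
}))

# "broad"-style: stack horizontally, suffixing names with the imputation number
broad <- do.call(cbind, lapply(seq_along(completed), function(i) {
  d <- completed[[i]]
  names(d) <- paste(names(d), i, sep = ".")
  d
}))

dim(long)   # 6 rows, 4 columns
dim(broad)  # 2 rows, 6 columns
```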

Value

This function returns the imputed dataset within the supplied mimids or wimids objects.

References

Stef van Buuren and Karin Groothuis-Oudshoorn (2011). mice: Multivariate Imputation by Chained Equations in R. Journal of Statistical Software, 45(3): 1-67. doi:10.18637/jss.v045.i03

Examples


#Loading the dataset
data(osteoarthritis)

#Multiply imputing the missing values
imputed.datasets <- mice::mice(osteoarthritis, m = 5)

#Matching the multiply imputed datasets
matched.datasets <- matchthem(OSP ~ AGE + SEX + BMI + RAC + SMK,
                              imputed.datasets,
                              approach = 'within',
                              method = 'nearest')

#Extracting the first imputed dataset
matched.dataset.1 <- complete(matched.datasets, action = 1)

Checks for the `mimids` Class

Description

is.mimids() checks whether an object is of the mimids class.

Usage

is.mimids(object)

Arguments

`object`	The object to be checked for membership in the `mimids` class.

Details

The class of the object is checked against the mimids class.

Value

This function returns a logical value indicating whether object is of the mimids class.
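
A class check of this kind can be sketched in base R with inherits(). The function name `is_mimids_sketch` and the mock object below are hypothetical and exist only for illustration; the package's own implementation may differ.

```r
# Hypothetical stand-in for the class check; not the package's source code
is_mimids_sketch <- function(object) inherits(object, "mimids")

# A mock object carrying the class attribute, built only for illustration
mock <- structure(list(), class = "mimids")

is_mimids_sketch(mock)          # TRUE
is_mimids_sketch(data.frame())  # FALSE
```

The same pattern would apply to the mimipo, mimira, and wimids classes by changing the class name.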

Author(s)

Farhad Pishgar

Examples

#Loading the dataset
data(osteoarthritis)

#Multiply imputing the missing values
imputed.datasets <- mice::mice(osteoarthritis, m = 5)

#Matching the multiply imputed datasets
matched.datasets <- matchthem(OSP ~ AGE + SEX + BMI + RAC + SMK,
                              imputed.datasets,
                              approach = 'within',
                              method = 'nearest')

#Checking the 'matched.datasets' object
is.mimids(matched.datasets)

Checks for the `mimipo` Class

Description

is.mimipo() checks whether an object is of the mimipo class.

Usage

is.mimipo(object)

Arguments

`object`	The object to be checked for membership in the `mimipo` class.

Details

The class of the object is checked against the mimipo class.

Value

This function returns a logical value indicating whether object is of the mimipo class.

Author(s)

Farhad Pishgar

Examples

#Loading libraries
library(survey)

#Loading the dataset
data(osteoarthritis)

#Multiply imputing the missing values
imputed.datasets <- mice::mice(osteoarthritis, m = 5)

#Estimating weights of observations in the multiply imputed datasets
weighted.datasets <- weightthem(OSP ~ AGE + SEX + BMI + RAC + SMK,
                                imputed.datasets,
                                approach = 'within',
                                method = 'glm',
                                estimand = "ATT")

#Analyzing the weighted datasets
models <- with(data = weighted.datasets,
               exp = svyglm(KOA ~ OSP, family = binomial))

#Pooling results obtained from analyzing the datasets
results <- pool(models)

#Checking the 'results' object
is.mimipo(results)

Checks for the `mimira` Class

Description

is.mimira() checks whether an object is of the mimira class.

Usage

is.mimira(object)

Arguments

`object`	The object to be checked for membership in the `mimira` class.

Details

The class of the object is checked against the mimira class.

Value

This function returns a logical value indicating whether object is of the mimira class.

Author(s)

Farhad Pishgar

Examples

#Loading libraries
library(survey)

#Loading the dataset
data(osteoarthritis)

#Multiply imputing the missing values
imputed.datasets <- mice::mice(osteoarthritis, m = 5)

#Estimating weights of observations in the multiply imputed datasets
weighted.datasets <- weightthem(OSP ~ AGE + SEX + BMI + RAC + SMK,
                                imputed.datasets,
                                approach = 'within',
                                method = 'glm',
                                estimand = "ATT")

#Analyzing the weighted datasets
models <- with(weighted.datasets,
               svyglm(KOA ~ OSP, family = binomial))

#Checking the 'models' object
is.mimira(models)

Checks for the `wimids` Class

Description

is.wimids() checks whether an object is of the wimids class.

Usage

is.wimids(object)

Arguments

`object`	The object to be checked for membership in the `wimids` class.

Details

The class of the object is checked against the wimids class.

Value

This function returns a logical value indicating whether object is of the wimids class.

Author(s)

Farhad Pishgar

Examples

#Loading the dataset
data(osteoarthritis)

#Multiply imputing the missing values
imputed.datasets <- mice::mice(osteoarthritis, m = 5)

#Estimating weights of observations in the multiply imputed datasets
weighted.datasets <- weightthem(OSP ~ AGE + SEX + BMI + RAC + SMK,
                                imputed.datasets,
                                approach = 'within',
                                method = 'glm',
                                estimand = "ATT")

#Checking the 'weighted.datasets' object
is.wimids(weighted.datasets)

Matches Multiply Imputed Datasets

Description

matchthem() performs matching in the supplied multiply imputed datasets, given as mids or amelia objects, by running MatchIt::matchit() on each of the multiply imputed datasets with the supplied arguments.

Usage

matchthem(
  formula,
  datasets,
  approach = "within",
  method = "nearest",
  distance = "glm",
  link = "logit",
  distance.options = list(),
  discard = "none",
  reestimate = FALSE,
  ...
)

Arguments

`formula`	A `formula` of the form `z ~ x1 + x2`, where `z` is the exposure and `x1` and `x2` are the covariates to be balanced, which is passed directly to `MatchIt::matchit()` to specify the propensity score model or treatment and covariates to be used in matching. See `MatchIt::matchit()` for details.
`datasets`	The datasets containing the exposure and the potential confounders named in the `formula`. This argument must be an object of the `mids` or `amelia` class, typically produced by a previous call to `mice()` from the mice package or to `amelia()` from the Amelia package. The Amelia package can impute missing data in a single cross-sectional dataset or in a time-series dataset; currently, the MatchThem package supports only the former.
`approach`	The approach that should be used to combine information in multiply imputed datasets. Currently, `"within"` (performing matching within each dataset) and `"across"` (estimating propensity scores within each dataset, averaging them across datasets, and performing matching using the averaged propensity scores in each dataset) approaches are available. The default is `"within"`, which has been shown to have superior performance in most cases.
`method`	The matching method. Currently, `"nearest"` (nearest neighbor matching), `"exact"` (exact matching), `"full"` (optimal full matching), `"genetic"` (genetic matching), `"subclass"` (subclassification), `"cem"` (coarsened exact matching), `"optimal"` (optimal pair matching), `"quick"` (generalized full matching), and `"cardinality"` (cardinality and profile matching) methods are available. Only methods that produce a propensity score (`"nearest"`, `"full"`, `"genetic"`, `"subclass"`, `"optimal"`, and `"quick"`) are compatible with the `"across"` approach. The default is `"nearest"` for nearest neighbor matching. See `MatchIt::matchit()` for details.
`distance`	The method used to estimate the distance measure (e.g., propensity scores) used in matching, if any. Only options that specify a method of estimating propensity scores (i.e., not `"mahalanobis"`) are compatible with the `"across"` approach. The default is `"glm"` for estimating propensity scores using logistic regression. See `MatchIt::matchit()` and `MatchIt::distance` for details and allowable options.
`link`, `distance.options`, `discard`, `reestimate`	Arguments passed to `MatchIt::matchit()` to control estimation of the distance measure (e.g., propensity scores).
`...`	Additional arguments passed to `MatchIt::matchit()`.

Details

If an amelia object is supplied to datasets, it will be transformed into a mids object for further use. matchthem() works by calling mice::complete() on the mids object to extract a complete dataset, and then calls MatchIt::matchit() on each one, storing the output of each matchit() call and the mids in the output. All arguments supplied to matchthem() except datasets and approach are passed directly to matchit(). With the "across" approach, the estimated propensity scores are averaged across multiply imputed datasets and re-supplied to another set of calls to matchit().
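
The averaging step of the "across" approach can be sketched in base R as follows. The simulated data, the variable names (`x1`, `x2`, `z`), and the use of a plain logistic regression are assumptions for illustration only; the sketch stops before the matching step and is not the package's implementation.

```r
set.seed(1)

# Simulated stand-ins for m = 3 completed datasets (not the osteoarthritis data)
n <- 200
base <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
base$z <- rbinom(n, 1, plogis(0.5 * base$x1 - 0.3 * base$x2))
completed <- lapply(1:3, function(i) {
  d <- base
  d$x2 <- d$x2 + rnorm(n, sd = 0.1)  # mimic between-imputation variability
  d
})

# Step 1: estimate propensity scores within each dataset (logistic regression)
ps <- sapply(completed, function(d) {
  fitted(glm(z ~ x1 + x2, data = d, family = binomial))
})

# Step 2: average the scores across datasets, giving one value per unit
ps.across <- rowMeans(ps)

# Step 3 (not shown): match within each dataset using ps.across as the distance
length(ps.across)
```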

Value

An object of the mimids (matched multiply imputed datasets) class, which includes the supplied mids object (or an amelia object transformed into a mids object if supplied) and the output of the calls to matchit() on each multiply imputed dataset.

Author(s)

Farhad Pishgar and Noah Greifer

References

Daniel Ho, Kosuke Imai, Gary King, and Elizabeth Stuart (2007). Matching as Nonparametric Preprocessing for Reducing Model Dependence in Parametric Causal Inference. Political Analysis, 15(3): 199-236. https://gking.harvard.edu/files/abs/matchp-abs.shtml

Stef van Buuren and Karin Groothuis-Oudshoorn (2011). mice: Multivariate Imputation by Chained Equations in R. Journal of Statistical Software, 45(3): 1-67. doi:10.18637/jss.v045.i03

Gary King, James Honaker, Anne Joseph, and Kenneth Scheve (2001). Analyzing Incomplete Political Science Data: An Alternative Algorithm for Multiple Imputation. American Political Science Review, 95: 49–69. https://gking.harvard.edu/files/abs/evil-abs.shtml

Examples

#1

#Loading the dataset
data(osteoarthritis)

#Multiply imputing the missing values
imputed.datasets <- mice::mice(osteoarthritis, m = 5)

#Matching the multiply imputed datasets
matched.datasets <- matchthem(OSP ~ AGE + SEX + BMI + RAC + SMK,
                              imputed.datasets,
                              approach = 'within',
                              method = 'nearest')

#2

#Loading the dataset
data(osteoarthritis)

#Multiply imputing the missing values
imputed.datasets <- Amelia::amelia(osteoarthritis, m = 5,
                                   noms = c("SEX", "RAC", "SMK", "OSP", "KOA"))

#Matching the multiply imputed datasets
matched.datasets <- matchthem(OSP ~ AGE + SEX + BMI + RAC + SMK,
                              imputed.datasets,
                              approach = 'across',
                              method = 'nearest')

Matched Multiply Imputed Datasets

Description

A mimids object contains the data of matched multiply imputed datasets. mimids objects are generated by calls to matchthem().

Details

mimids objects have methods for print(), summary(), plot(), and cbind().

Note

The MatchThem package does not use the S4 class definitions and instead relies on the S3 list equivalents.

Author(s)

Farhad Pishgar

References

Stef van Buuren and Karin Groothuis-Oudshoorn (2011). mice: Multivariate Imputation by Chained Equations in R. Journal of Statistical Software, 45(3): 1-67. doi:10.18637/jss.v045.i03

Multiply Imputed Pooled Outcome

Description

A mimipo object contains the data of a multiply imputed pooled outcome. mimipo objects are generated by calls to pool().

Details

mimipo objects have methods for print() and summary() (see the mice package reference manual for details).

Note

The MatchThem package does not use the S4 class definitions and instead relies on the S3 list equivalents.

Author(s)

Farhad Pishgar

References

Stef van Buuren and Karin Groothuis-Oudshoorn (2011). mice: Multivariate Imputation by Chained Equations in R. Journal of Statistical Software, 45(3): 1-67. doi:10.18637/jss.v045.i03

Multiply Imputed Repeated Analyses

Description

A mimira object contains the data of multiply imputed repeated analyses. mimira objects are generated by calls to with().

Details

mimira objects have methods for print() and summary() (see the mice package reference manual for details).

Note

The MatchThem package does not use the S4 class definitions and instead relies on the S3 list equivalents.

Author(s)

Farhad Pishgar

References

Stef van Buuren and Karin Groothuis-Oudshoorn (2011). mice: Multivariate Imputation by Chained Equations in R. Journal of Statistical Software, 45(3): 1-67. doi:10.18637/jss.v045.i03

Data of 2,585 Participants in the Osteoarthritis Initiative (OAI) Project

Description

osteoarthritis includes demographic data of 2,585 units (individuals) with or at risk of knee osteoarthritis. The recorded data has missing values in body mass index (BMI, a quantitative variable), race (RAC, a categorical qualitative variable), smoking status (SMK, a binary qualitative variable), and knee osteoarthritis status at follow-up (KOA, a binary qualitative variable).

Usage

osteoarthritis

Format

This dataset contains 2,585 rows and 7 columns. Each row presents data of a unit (individual) and each column presents data of a characteristic of that unit. The columns are:

AGE: Age of each unit (individual);
SEX: Gender of each unit (individual), coded as 0 (female) and 1 (male);
BMI: Estimated body mass index of each unit (individual);
RAC: Race of each unit (individual), coded as 0 (other), 1 (Caucasian), 2 (African American), and 3 (Asian);
SMK: The smoking status of each unit (individual), coded as 0 (non-smoker) and 1 (smoker);
OSP: Osteoporosis status of each unit (individual) at baseline, coded as 0 (negative) and 1 (positive); and
KOA: Knee osteoarthritis status of each unit (individual) in the follow-up, coded as 0 (at risk) and 1 (diagnosed).

Source

The information presented in the osteoarthritis dataset is based on the publicly available data of the Osteoarthritis Initiative (OAI) project (see https://nda.nih.gov/oai/ for details), with changes.

Pools Estimates by Rubin's Rules

Description

pool() pools estimates from the analyses done within each multiply imputed dataset. The typical sequence of steps for a matching or weighting procedure on multiply imputed datasets is:

Multiply impute the missing values using the mice() function (from the mice package) or the amelia() function (from the Amelia package), resulting in a multiply imputed dataset (an object of the mids or amelia class);
Match or weight each multiply imputed dataset using matchthem() or weightthem(), resulting in an object of the mimids or wimids class;
Check the extent of balance of covariates in the datasets (using functions from the cobalt package);
Fit the statistical model of interest on each dataset by the with() function, resulting in an object of the mimira class; and
Pool the estimates from each model into a single set of estimates and standard errors, resulting in an object of the mimipo class.

Usage

pool(object, dfcom = NULL)

Arguments

`object`	An object of the `mimira` class (produced by a previous call to `with()`).
`dfcom`	A positive number representing the degrees of freedom in the data analysis. The default is `NULL`, which means to extract this information from the fitted model with the lowest number of observations or the first fitted model (when that fails the parameter is set to `999999`).

Details

pool() function averages the estimates of the model and computes the total variance over the repeated analyses by Rubin’s rules. It calls mice::pool() after computing the model degrees of freedom.
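
For intuition, the core arithmetic of Rubin's rules for a single coefficient can be written in a few lines of base R. The estimates and variances below are made-up numbers, and the sketch omits the degrees-of-freedom calculation that pool() performs before calling mice::pool(); it is an illustration, not the package's code.

```r
# Made-up estimates and squared standard errors from m = 5 analyses
q <- c(0.52, 0.47, 0.55, 0.50, 0.49)       # point estimates
u <- c(0.010, 0.012, 0.011, 0.009, 0.010)  # within-imputation variances

m <- length(q)
q.bar <- mean(q)              # pooled point estimate
w <- mean(u)                  # average within-imputation variance
b <- var(q)                   # between-imputation variance
t.var <- w + (1 + 1 / m) * b  # total variance

c(estimate = q.bar, std.error = sqrt(t.var))
```

The total variance exceeds the average within-imputation variance whenever the estimates differ across datasets, which is how the uncertainty due to the missing data enters the pooled standard error.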

Value

This function returns an object from the mimipo class. Methods for mimipo objects (e.g., print(), summary(), etc.) are imported from the mice package.

References

Stef van Buuren and Karin Groothuis-Oudshoorn (2011). mice: Multivariate Imputation by Chained Equations in R. Journal of Statistical Software, 45(3): 1-67. doi:10.18637/jss.v045.i03

Examples

#Loading the dataset
data(osteoarthritis)

#Multiply imputing the missing values
imputed.datasets <- mice::mice(osteoarthritis, m = 5)

#Weighting the multiply imputed datasets
weighted.datasets <- weightthem(OSP ~ AGE + SEX + BMI + RAC + SMK,
                                imputed.datasets,
                                approach = 'within',
                                method = 'glm')

#Analyzing the weighted datasets
models <- with(weighted.datasets,
               WeightIt::glm_weightit(KOA ~ OSP,
                                      family = binomial))

#Pooling results obtained from analyzing the datasets
results <- pool(models)
summary(results)

Trim Weights

Description

Trims (i.e., truncates) large weights by setting all weights higher than that at a given quantile to the weight at the quantile. This can be useful in controlling extreme weights, which can reduce effective sample size by enlarging the variability of the weights.

Usage

## S3 method for class 'wimids'
trim(x, at = 0, lower = FALSE, ...)

Arguments

`x`	A `wimids` object; the output of a call to `weightthem()`.
`at`	`numeric`; either the quantile of the weights above which weights are to be trimmed (a single number between .5 and 1) or the number of weights to be trimmed (e.g., `at = 3` for the top 3 weights to be set to the 4th largest weight).
`lower`	`logical`; whether also to trim at the lower quantile (e.g., for `at = .9`, trimming at both .1 and .9, or for `at = 3`, trimming the top and bottom 3 weights). Default is `FALSE` to only trim the higher weights.
`...`	Ignored.

Details

trim.wimids() works by calling WeightIt::trim() on each weightit object stored in the models component of the wimids object. Because trim() itself is not exported from MatchThem, it must be called using WeightIt::trim() or by attaching WeightIt (i.e., running library(WeightIt)) before use.
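
To make the trimming rule concrete, the following base R sketch mimics, for a single vector of weights, what the Description and the `at` and `lower` arguments describe. It is an illustration only: `trim_sketch()` and `ess()` are hypothetical helpers, not functions from MatchThem or WeightIt; the weights are made-up numbers; and only the quantile form of `at` is covered. The effective sample size is computed with the common approximation (sum of weights)^2 / (sum of squared weights).

```r
# Hypothetical helper (not part of MatchThem or WeightIt): set weights above
# the `at` quantile to the weight at that quantile; with lower = TRUE, also
# raise weights below the (1 - at) quantile to the weight at that quantile.
trim_sketch <- function(w, at = 0.9, lower = FALSE) {
  stopifnot(is.numeric(w), at > 0.5, at < 1)
  cuts <- quantile(w, probs = c(1 - at, at), names = FALSE)
  trimmed <- pmin(w, cuts[2])
  if (lower) trimmed <- pmax(trimmed, cuts[1])
  trimmed
}

# Hypothetical helper: approximate effective sample size of a set of weights
ess <- function(w) sum(w)^2 / sum(w^2)

# Made-up weights with one extreme value
w <- c(0.5, 1, 1.2, 1.5, 2, 2.5, 3, 4, 8, 40)

trimmed <- trim_sketch(w, at = 0.9)
trimmed.both <- trim_sketch(w, at = 0.9, lower = TRUE)

# Trimming the extreme weight reduces weight variability, so the
# approximate effective sample size goes up
ess(w)
ess(trimmed)
```

In practice, the same rule is applied to every imputed dataset at once by calling `WeightIt::trim()` on the `wimids` object rather than by using a helper like this.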

Value

An object from the wimids class, identical to the original object except with trim() applied to each of the weightit objects in the models component.

Author(s)

Noah Greifer

Examples

#Loading the dataset
data(osteoarthritis)

#Multiply imputing the missing values
imputed.datasets <- mice::mice(osteoarthritis, m = 5)

#Estimating weights of observations in the multiply imputed datasets
weighted.datasets <- weightthem(OSP ~ AGE + SEX + BMI + RAC + SMK,
                                imputed.datasets,
                                approach = 'within',
                                method = 'glm',
                                estimand = "ATE")

#Trimming the top 10% of weights in each dataset
#to the 90th percentile
trimmed.datasets <- WeightIt::trim(weighted.datasets, at = 0.9)

Weights Multiply Imputed Datasets

Description

weightthem() performs weighting in the supplied multiply imputed datasets, given as mids or amelia objects, by running WeightIt::weightit() on each of the multiply imputed datasets with the supplied arguments.

Usage

weightthem(formula, datasets, approach = "within", method = "glm", ...)

Arguments

`formula`	A `formula` of the form `z ~ x1 + x2`, where `z` is the exposure and `x1` and `x2` are the covariates to be balanced, which is passed directly to `WeightIt::weightit()` to specify the propensity score model or treatment and covariates to be used to estimate the weights. See `WeightIt::weightit()` for details.
`datasets`	The datasets containing the exposure and covariates mentioned in the `formula`. This argument must be an object of the `mids` or `amelia` class, which is typically produced by a previous call to `mice()` from the mice package or to `amelia()` from the Amelia package. The Amelia package is designed to impute missing data in a single cross-sectional dataset or in a time-series dataset; currently, the MatchThem package only supports the former.
`approach`	The approach used to combine information in multiply imputed datasets. Currently, three approaches are available: `"within"` (estimating weights within each dataset), `"across"` (estimating propensity scores within each dataset, averaging them across datasets, and computing a single set of weights based on that to be applied to all datasets), and `"apw"` (averaging the probability weights, i.e., estimating weights within each dataset and averaging them across datasets). The default is `"within"`, which has been shown to have superior performance in most cases.
`method`	The method used to estimate weights. See `WeightIt::weightit()` for allowable options. Only methods that produce a propensity score (`"glm"`, `"gbm"`, `"ipt"`, `"cbps"`, `"super"`, and `"bart"`) are compatible with the `"across"` approach. The default is `"glm"`, propensity score weighting using logistic regression propensity scores.
`...`	Additional arguments to be passed to `weightit()`. See `WeightIt::weightit()` for more details.

Details

If an amelia object is supplied to datasets, it will be transformed into a mids object for further use. weightthem() works by calling mice::complete() on the mids object to extract a complete dataset, and then calls WeightIt::weightit() on each dataset, storing the output of each weightit() call and the mids in the output. All arguments supplied to weightthem() except datasets and approach are passed directly to weightit(). With the "across" approach, the estimated propensity scores are averaged across imputations and re-supplied to another set of calls to weightit().
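
The difference between the three approaches can be sketched in base R on simulated data. This is a simplified illustration under stated assumptions, not the package's internal code: the three "imputed" datasets are simulated stand-ins rather than `mice` output, the propensity scores come from a plain logistic regression, the weights target the ATE, and `ate_weights()` is a hypothetical helper.

```r
set.seed(1)
n <- 200
x.true <- rnorm(n)
z <- rbinom(n, 1, plogis(0.5 * x.true))

# Three stand-in "imputed" datasets: same exposure, slightly different
# covariate values (simulated for illustration, not real imputations)
imputed <- lapply(1:3, function(i) {
  data.frame(z = z, x = x.true + rnorm(n, sd = 0.1))
})

# Propensity scores estimated separately in each dataset (n x 3 matrix)
ps <- sapply(imputed, function(d) {
  fitted(glm(z ~ x, family = binomial, data = d))
})

# Hypothetical helper: ATE inverse probability weights from propensity scores
ate_weights <- function(ps, z) ifelse(z == 1, 1 / ps, 1 / (1 - ps))

# "within": a separate set of weights for each dataset
w.within <- apply(ps, 2, ate_weights, z = z)

# "across": average the propensity scores, then compute one set of weights
w.across <- ate_weights(rowMeans(ps), z)

# "apw": compute weights within each dataset, then average the weights
w.apw <- rowMeans(w.within)
```

With `"within"`, each dataset keeps its own column of `w.within`; with `"across"` and `"apw"`, the single vector (`w.across` or `w.apw`) is applied to every dataset.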

Value

An object of the wimids (weighted multiply imputed datasets) class, which includes the supplied mids object (or an amelia object transformed into a mids object if supplied) and the output of the calls to weightit() on each multiply imputed dataset.

Author(s)

Farhad Pishgar and Noah Greifer

References

Stef van Buuren and Karin Groothuis-Oudshoorn (2011). mice: Multivariate Imputation by Chained Equations in R. Journal of Statistical Software, 45(3): 1-67. doi:10.18637/jss.v045.i03

Examples

#1

#Loading the dataset
data(osteoarthritis)

#Multiply imputing the missing values
imputed.datasets <- mice::mice(osteoarthritis, m = 5)

#Estimating weights of observations in the multiply imputed datasets
weighted.datasets <- weightthem(OSP ~ AGE + SEX + BMI + RAC + SMK,
                                imputed.datasets,
                                approach = 'within',
                                method = 'glm',
                                estimand = 'ATT')

#2

#Loading the dataset
data(osteoarthritis)

#Multiply imputing the missing values
imputed.datasets <- Amelia::amelia(osteoarthritis, m = 5,
                                   noms = c("SEX", "RAC", "SMK", "OSP", "KOA"))

#Estimating weights of observations in the multiply imputed datasets
weighted.datasets <- weightthem(OSP ~ AGE + SEX + BMI + RAC + SMK,
                                imputed.datasets,
                                approach = 'within',
                                method = 'glm',
                                estimand = 'ATT')

Weighted Multiply Imputed Datasets

Description

A wimids object contains the data of weighted multiply imputed datasets. It is generated by a call to weightthem().

Details

wimids objects have methods for print(), summary(), and cbind().

Note

The MatchThem package does not use the S4 class definitions and instead relies on the S3 list equivalents.

Author(s)

Farhad Pishgar

References

Stef van Buuren and Karin Groothuis-Oudshoorn (2011). mice: Multivariate Imputation by Chained Equations in R. Journal of Statistical Software, 45(3): 1-67. doi:10.18637/jss.v045.i03

Evaluates an Expression in Matched or Weighted Multiply Imputed Datasets

Description

with() runs a model on the n multiply imputed datasets of the supplied mimids or wimids object. The typical sequence of steps for a matching or weighting procedure on multiply imputed datasets is:

1. Multiply impute the missing values using the mice() function (from the mice package) or the amelia() function (from the Amelia package), resulting in a multiply imputed dataset (an object of the mids or amelia class);
2. Match or weight each multiply imputed dataset using matchthem() or weightthem(), resulting in an object of the mimids or wimids class;
3. Check the extent of balance of covariates in the datasets (using functions from the cobalt package);
4. Fit the statistical model of interest on each dataset by the with() function, resulting in an object of the mimira class; and
5. Pool the estimates from each model into a single set of estimates and standard errors, resulting in an object of the mimipo class.
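
The pooling in the final step follows Rubin's rules, as noted in the package description. A minimal base R sketch of the point estimate and standard error is shown here for orientation only: `rubin_pool()` is a hypothetical helper, not a MatchThem function; the per-dataset estimates and standard errors are made-up numbers; and the degrees-of-freedom calculation used for tests and confidence intervals is left out.

```r
# Hypothetical helper: pool m estimates and their standard errors
# using Rubin's rules (degrees of freedom omitted)
rubin_pool <- function(est, se) {
  m <- length(est)
  q.bar <- mean(est)                   # pooled point estimate
  w.bar <- mean(se^2)                  # average within-imputation variance
  b <- var(est)                        # between-imputation variance
  total <- w.bar + (1 + 1 / m) * b     # total variance
  c(estimate = q.bar, std.error = sqrt(total))
}

# Made-up coefficient estimates and standard errors from m = 5 datasets
est <- c(0.10, 0.12, 0.08, 0.11, 0.09)
se <- c(0.05, 0.05, 0.05, 0.05, 0.05)

pooled <- rubin_pool(est, se)
pooled
```

The pooled standard error is larger than the per-dataset standard errors because the between-imputation variance reflects the extra uncertainty due to the missing data.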

Usage

## S3 method for class 'mimids'
with(data, expr, cluster, ...)

## S3 method for class 'wimids'
with(data, expr, ...)

Arguments

`data`	A `mimids` or `wimids` object, typically produced by a previous call to `matchthem()` or `weightthem()`.
`expr`	An expression (usually a call to a modeling function like `glm()`, `coxph()`, `svyglm()`, etc.) to evaluate in each (matched or weighted) multiply imputed dataset. See Details.
`cluster`	When a function from survey (e.g., `survey::svyglm()`) is supplied in `expr`, whether the standard errors should incorporate clustering due to dependence between matched pairs. This is done by supplying the variable containing pair membership to the `ids` argument of `survey::svydesign()`. If unspecified, it will be set to `TRUE` if subclasses (i.e., pairs) are present in the output and there are 20 or more unique subclasses. It will be ignored for matching methods that don't return subclasses (e.g., matching with replacement).
`...`	Additional arguments to be passed to `expr`.

Details

with() applies the supplied model in expr to the (matched or weighted) multiply imputed datasets, automatically incorporating the (matching) weights when possible. The argument to expr should be of the form glm(y ~ z, family = quasibinomial), for example, excluding the data or weights argument, which are automatically supplied.
Functions from the survey package, such as svyglm(), are treated a bit differently. No svydesign object needs to be supplied because with() automatically constructs and supplies it with the imputed dataset and estimated weights. When cluster = TRUE (or with() detects that pairs should be clustered; see the cluster argument above), pair membership is supplied to the ids argument of svydesign().
After weighting using weightthem(), glm_weightit() should be used as the modeling function to fit generalized linear models. It correctly produces robust standard errors that account for estimation of the weights, if possible. See WeightIt::glm_weightit() for details. Otherwise, svyglm() should be used rather than glm() in order to correctly compute standard errors. For Cox models, coxph() will produce approximately correct standard errors when used with weighting, but svycoxph() will produce more accurate standard errors when matching is used.

Value

An object from the mimira class containing the output of the analyses.

Author(s)

Farhad Pishgar and Noah Greifer

References

Stef van Buuren and Karin Groothuis-Oudshoorn (2011). mice: Multivariate Imputation by Chained Equations in R. Journal of Statistical Software, 45(3): 1-67. doi:10.18637/jss.v045.i03

Examples

#Loading libraries
library(MatchThem)
library(survey)

#Loading the dataset
data(osteoarthritis)

#Multiply imputing the missing values
imputed.datasets <- mice::mice(osteoarthritis, m = 5)

#Matching in the multiply imputed datasets
matched.datasets <- matchthem(OSP ~ AGE + SEX + BMI + RAC + SMK,
                              imputed.datasets,
                              approach = 'within',
                              method = 'nearest')

#Analyzing the matched datasets
models <- with(matched.datasets,
               svyglm(KOA ~ OSP, family = binomial),
               cluster = TRUE)

#Weighting in the multiply imputed datasets
weighted.datasets <- weightthem(OSP ~ AGE + SEX + BMI + RAC + SMK,
                               imputed.datasets,
                               approach = 'within',
                               method = 'glm')

#Analyzing the weighted datasets
models <- with(weighted.datasets,
               WeightIt::glm_weightit(KOA ~ OSP,
                                      family = binomial))

