Upcoming RMME/STAT Colloquium (3/24): Joseph L. Schafer, “Modeling Coarsened Categorical Variables: Techniques and Software”

RMME/STAT Joint Colloquium

Modeling Coarsened Categorical Variables: Techniques and Software

Dr. Joseph L. Schafer
U.S. Census Bureau

Friday, March 24, at 11AM ET


Coarsened data can express intermediate states of knowledge between fully observed and fully missing. For example, when classifying survey respondents by cigarette smoking behavior as 1=never smoked, 2=former smoker, or 3=current smoker, we may encounter some who reported having smoked in the past but whose current activity is unknown (either 2 or 3, but not 1). Software for categorical data modeling typically provides codes for missing values but lacks convenient ways to convey states of partial knowledge. A new R package, cvam (Coarsened Variable Modeling), extends R’s implementation of categorical variables (factors) and fits log-linear and latent-class models to incomplete datasets containing coarsened and missing values. Methods include maximum likelihood estimation using an expectation-maximization algorithm, approximate Bayesian inference, and Bayesian inference via Markov chain Monte Carlo. Functions are also provided for comparing models, predicting missing values, creating multiple imputations, and generating partially or fully synthetic data. In the first major application of this software, data from the U.S. Decennial Census and administrative records were combined to predict citizenship status for 309 million residents of the United States.
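To make the smoking example concrete, here is a minimal sketch of how a coarsened observation might be represented and how an expectation-maximization algorithm could estimate category proportions from such data. It is written in Python purely for illustration and is not the cvam package's API or the speaker's implementation. The function name em_coarsened, the set-based encoding, and the counts are all hypothetical, and the model is a single multinomial rather than the log-linear or latent-class models mentioned in the abstract.

```python
from collections import Counter

# Category codes from the abstract's example.
CATEGORIES = (1, 2, 3)  # 1=never smoked, 2=former smoker, 3=current smoker


def em_coarsened(observations, n_iter=200, tol=1e-10):
    """Estimate multinomial proportions from coarsened observations by EM.

    Each observation is a set of categories the respondent could belong to:
    {2} is fully observed, {2, 3} is coarsened, {1, 2, 3} is fully missing.
    (Hypothetical encoding chosen for this sketch.)
    """
    groups = Counter(frozenset(obs) for obs in observations)
    for allowed in groups:
        if not allowed or not allowed <= set(CATEGORIES):
            raise ValueError(f"invalid observation: {set(allowed)}")
    n = sum(groups.values())

    # Start from uniform proportions.
    probs = {c: 1.0 / len(CATEGORIES) for c in CATEGORIES}
    for _ in range(n_iter):
        # E-step: split each observation across its allowed categories
        # in proportion to the current estimates.
        expected = {c: 0.0 for c in CATEGORIES}
        for allowed, count in groups.items():
            total = sum(probs[c] for c in allowed)
            for c in allowed:
                expected[c] += count * probs[c] / total
        # M-step: re-estimate proportions from the expected counts.
        new_probs = {c: expected[c] / n for c in CATEGORIES}
        change = max(abs(new_probs[c] - probs[c]) for c in CATEGORIES)
        probs = new_probs
        if change < tol:
            break
    return probs


# Hypothetical survey: 50 never, 20 former, 20 current, and 10 respondents
# known only to have smoked in the past (either 2 or 3, but not 1).
survey = [{1}] * 50 + [{2}] * 20 + [{3}] * 20 + [{2, 3}] * 10
estimates = em_coarsened(survey)
```

In this sketch the ten coarsened respondents contribute to the estimates for categories 2 and 3 but never to category 1, which is the information a plain missing-value code would discard.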

