Multilevel and Longitudinal Modeling Using Stata, Volumes I and II, 4th Edition

Length: 1047 pages
Edition: 4
Language: English
Publisher: Stata Press
Publication Date: 2021-08-19
ASIN: B09CW4JB3W
ISBN-13: 9781597181372
This book is a complete resource for learning to model data in which observations are grouped—whether nested data such as children nested in schools or repeated observations on the same individuals. Rabe-Hesketh and Skrondal introduce a variety of multilevel models for continuous, binary, count, and other outcomes. They also explain when each model is useful, how to fit and evaluate the model using Stata, and how to interpret the results. With this comprehensive coverage, researchers who need to apply multilevel models will find this book to be the perfect companion. It is also an excellent text for courses in multilevel modeling because it provides examples from a variety of disciplines as well as end-of-chapter exercises that allow students to practice newly learned material.
Displays
Preface
Acknowledgments
I Preliminaries
1 Review of linear regression
    1.1 Introduction
    1.2 Is there gender discrimination in faculty salaries?
    1.3 Independent-samples t test
    1.4 One-way analysis of variance
    1.5 Simple linear regression
    1.6 Dummy variables
    1.7 Multiple linear regression
    1.8 Interactions
    1.9 Dummy variables for more than two groups
    1.10 Other types of interactions
        1.10.1 Interaction between dummy variables
        1.10.2 Interaction between continuous covariates
    1.11 Nonlinear effects
    1.12 Residual diagnostics
    1.13 ❖ Causal and noncausal interpretations of regression coefficients
        1.13.1 Regression as conditional expectation
        1.13.2 Regression as structural model
    1.14 Summary and further reading
    1.15 Exercises
II Two-level models
2 Variance-components models
    2.1 Introduction
    2.2 How reliable are peak-expiratory-flow measurements?
    2.3 Inspecting within-subject dependence
    2.4 The variance-components model
        2.4.1 Model specification
        2.4.2 Path diagram
        2.4.3 Between-subject heterogeneity
        2.4.4 Within-subject dependence
            Intraclass correlation
            Intraclass correlation versus Pearson correlation
    2.5 Estimation using Stata
        2.5.1 Data preparation: Reshaping from wide form to long form
        2.5.2 Using xtreg
        2.5.3 Using mixed
    2.6 Hypothesis tests and confidence intervals
        2.6.1 Hypothesis test and confidence interval for the population mean
        2.6.2 Hypothesis test and confidence interval for the between-cluster variance
            Likelihood-ratio test
            ❖ Score test
            F test
            Confidence intervals
    2.7 ❖ Model as data-generating mechanism
    2.8 Fixed versus random effects
    2.9 Crossed versus nested effects
    2.10 Parameter estimation
        2.10.1 Model assumptions
            Mean structure and covariance structure
            Distributional assumptions
        2.10.2 Different estimation methods
        2.10.3 Inference for β
            Estimate and standard error: Balanced case
            Estimate: Unbalanced case
    2.11 Assigning values to the random intercepts
        2.11.1 Maximum “likelihood” estimation
            Implementation via OLS
            Implementation via the mean total residual
        2.11.2 Empirical Bayes prediction
        2.11.3 Empirical Bayes standard errors
            Posterior and comparative standard errors
            Diagnostic standard errors
            Accounting for uncertainty in β
        2.11.4 ❖ Bayesian interpretation of REML estimation and prediction
    2.12 Summary and further reading
    2.13 Exercises
3 Random-intercept models with covariates
    3.1 Introduction
    3.2 Does smoking during pregnancy affect birthweight?
        3.2.1 Data structure and descriptive statistics
    3.3 The linear random-intercept model with covariates
        3.3.1 Model specification
        3.3.2 Model assumptions
        3.3.3 Mean structure
        3.3.4 Residual covariance structure
        3.3.5 Graphical illustration of random-intercept model
    3.4 Estimation using Stata
        3.4.1 Using xtreg
        3.4.2 Using mixed
    3.5 Coefficients of determination or variance explained
    3.6 Hypothesis tests and confidence intervals
        3.6.1 Hypothesis tests for individual regression coefficients
        3.6.2 Joint hypothesis tests for several regression coefficients
        3.6.3 Predicted means and confidence intervals
        3.6.4 Hypothesis test for random-intercept variance
    3.7 Between and within effects of level-1 covariates
        3.7.1 Between-mother effects
        3.7.2 Within-mother effects
        3.7.3 ❖ Relations among within estimator, between estimator, and estimator for random-intercept model
        3.7.4 Level-2 endogeneity and cluster-level confounding
        3.7.5 Conventional Hausman test
        3.7.6 Allowing for different within and between             effects
        3.7.7 Robust Hausman test
    3.8 Fixed versus random effects revisited
    3.9 Assigning values to random effects: Residual           diagnostics
    3.10 More on statistical inference
        3.10.1 ❖ Overview of estimation methods
            Pooled OLS
            Feasible generalized least squares (FGLS)
            ML by iterative GLS (IGLS)
            ML by Newton–Raphson and Fisher scoring
            ML by the expectation-maximization (EM) algorithm
            REML
        3.10.2 Consequences of using standard regression modeling for clustered data
            Purely between-cluster covariate
            Purely within-cluster covariate
        3.10.3 ❖ Power and sample-size determination
            Purely between-cluster covariate
            Purely within-cluster covariate
    3.11 Summary and further reading
    3.12 Exercises
4 Random-coefficient models
    4.1 Introduction
    4.2 How effective are different schools?
    4.3 Separate linear regressions for each school
    4.4 Specification and interpretation of a random-coefficient model
        4.4.1 Specification of a random-coefficient model
        4.4.2 Interpretation of the random-effects variances and covariances
    4.5 Estimation using mixed
        4.5.1 Random-intercept model
        4.5.2 Random-coefficient model
    4.6 Testing the slope variance
    4.7 Interpretation of estimates
    4.8 Assigning values to the random intercepts and slopes
        4.8.1 Maximum “likelihood” estimation
        4.8.2 Empirical Bayes prediction
        4.8.3 Model visualization
        4.8.4 Residual diagnostics
        4.8.5 Inferences for individual schools
    4.9 Two-stage model formulation
    4.10 Some warnings about random-coefficient models
        4.10.1 Meaningful specification
        4.10.2 Many random coefficients
        4.10.3 Convergence problems
        4.10.4 Lack of identification
    4.11 Summary and further reading
    4.12 Exercises
III Models for longitudinal and panel data
5 Subject-specific effects, endogeneity, and unobserved confounding
    5.1 Introduction
    5.2 Random-effects approach: No endogeneity
    5.3 Fixed-effects approach: Level-2 endogeneity
        5.3.1 De-meaning and subject dummies
            De-meaning
            Subject dummies
        5.3.2 Hausman test
        5.3.3 Mundlak approach and robust Hausman test
        5.3.4 First-differencing
    5.4 Difference-in-differences and repeated-measures ANOVA
        5.4.1 Does raising the minimum wage reduce employment?
        5.4.2 ❖ Repeated-measures ANOVA
    5.5 Subject-specific coefficients
        5.5.1 Random-coefficient model: No endogeneity
        5.5.2 Fixed-coefficient model: Level-2 endogeneity
    5.6 Hausman–Taylor: Level-2 endogeneity for level-1 and level-2 covariates
    5.7 Instrumental-variable methods: Level-1 (and level-2) endogeneity
        5.7.1 Do deterrents decrease crime rates?
        5.7.2 Conventional fixed-effects approach
        5.7.3 Fixed-effects IV estimator
        5.7.4 Random-effects IV estimator
        5.7.5 More Hausman tests
    5.8 Dynamic models
        5.8.1 Dynamic model without subject-specific intercepts
        5.8.2 Dynamic model with subject-specific intercepts
    5.9 Missing data and dropout
        5.9.1 ❖ Maximum likelihood estimation under MAR: A simulation
    5.10 Summary and further reading
    5.11 Exercises
6 Marginal models
    6.1 Introduction
    6.2 Mean structure
    6.3 Covariance structures
        6.3.1 Unstructured covariance matrix
        6.3.2 Random-intercept or compound symmetric/exchangeable structure
        6.3.3 Random-coefficient structure
        6.3.4 Autoregressive and exponential structures
        6.3.5 Moving-average residual structure
        6.3.6 Banded and Toeplitz structures
    6.4 Hybrid and complex marginal models
        6.4.1 Random effects and correlated level-1 residuals
        6.4.2 Heteroskedastic level-1 residuals over occasions
        6.4.3 Heteroskedastic level-1 residuals over groups
        6.4.4 Different covariance matrices over groups
    6.5 Comparing the fit of marginal models
    6.6 Generalized estimating equations (GEE)
    6.7 Marginal modeling with few units and many occasions
        6.7.1 Is a highly organized labor market beneficial for economic growth?
        6.7.2 Marginal modeling for long panels
        6.7.3 Fitting marginal models for long panels in Stata
    6.8 Summary and further reading
    6.9 Exercises
7 Growth-curve models
    7.1 Introduction
    7.2 How do children grow?
        7.2.1 Observed growth trajectories
    7.3 Models for nonlinear growth
        7.3.1 Polynomial models
            Estimation using mixed
            Predicting the mean trajectory
            Predicting trajectories for individual children
        7.3.2 Piecewise linear models
            Estimation using mixed
            Predicting the mean trajectory
    7.4 Two-stage model formulation and cross-level interaction
    7.5 Heteroskedasticity
        7.5.1 Heteroskedasticity at level 1
        7.5.2 Heteroskedasticity at level 2
    7.6 How does reading improve from kindergarten through third grade?
    7.7 Growth-curve model as a structural equation model
        7.7.1 Estimation using sem
        7.7.2 Estimation using mixed
    7.8 Summary and further reading
    7.9 Exercises
IV Models with nested and crossed random effects
8 Higher-level models with nested random effects
    8.1 Introduction
    8.2 Do peak-expiratory-flow measurements vary between methods within subjects?
    8.3 Inspecting sources of variability
    8.4 Three-level variance-components models
    8.5 Different types of intraclass correlation
    8.6 Estimation using mixed
    8.7 Empirical Bayes prediction
    8.8 Testing variance components
    8.9 Crossed versus nested random effects revisited
    8.10 Does nutrition affect cognitive development of Kenyan children?
    8.11 Describing and plotting three-level data
        8.11.1 Data structure and missing data
        8.11.2 Level-1 variables
        8.11.3 Level-2 variables
        8.11.4 Level-3 variables
        8.11.5 Plotting growth trajectories
    8.12 Three-level random-intercept model
        8.12.1 Model specification: Reduced form
        8.12.2 Model specification: Three-stage formulation
        8.12.3 Estimation using mixed
    8.13 Three-level random-coefficient models
        8.13.1 Random coefficient at the child level
            Estimation using mixed
        8.13.2 Random coefficient at the child and school levels
            Estimation using mixed
    8.14 Residual diagnostics and predictions
    8.15 Summary and further reading
    8.16 Exercises
9 Crossed random effects
    9.1 Introduction
    9.2 How does investment depend on expected profit and capital stock?
    9.3 A two-way error-components model
        9.3.1 Model specification
        9.3.2 Residual variances, covariances, and intraclass correlations
            Longitudinal correlations
            Cross-sectional correlations
        9.3.3 Estimation using mixed
        9.3.4 Prediction
    9.4 How much do primary and secondary schools affect attainment at age 16?
    9.5 Data structure
    9.6 Additive crossed random-effects model
        9.6.1 Specification
        9.6.2 Intraclass correlations
        9.6.3 Estimation using mixed
    9.7 Crossed random-effects model with random interaction
        9.7.1 Model specification
        9.7.2 Intraclass correlations
        9.7.3 Estimation using mixed
        9.7.4 Testing variance components
        9.7.5 Some diagnostics
    9.8 ❖ A trick requiring fewer random effects
    9.9 Summary and further reading
    9.10 Exercises
A Useful Stata commands
V Models for categorical responses
10 Dichotomous or binary responses
    10.1 Introduction
    10.2 Single-level logit and probit regression models for dichotomous responses
        10.2.1 Generalized linear model formulation
            Labor-participation data
            Estimation using logit
            Estimation using glm
        10.2.2 Latent-response formulation
            Logistic regression
            Probit regression
            Estimation using probit
    10.3 Which treatment is best for toenail infection?
    10.4 Longitudinal data structure
    10.5 Proportions and fitted population-averaged or marginal probabilities
        Estimation using logit
    10.6 Random-intercept logistic regression
        10.6.1 Model specification
            Reduced-form specification
            Two-stage formulation
        10.6.2 Model assumptions
        10.6.3 Estimation
            Using xtlogit
            Using melogit
            Using gllamm
    10.7 Subject-specific or conditional versus population-averaged or marginal relationships
    10.8 Measures of dependence and heterogeneity
        10.8.1 Conditional or residual intraclass correlation of the latent responses
        10.8.2 Median odds ratio
        10.8.3 ❖ Measures of association for observed responses at median fixed part of the model
    10.9 Inference for random-intercept logistic models
        10.9.1 Tests and confidence intervals for odds ratios
        10.9.2 Tests of variance components
    10.10 Maximum likelihood estimation
        10.10.1 ❖ Adaptive quadrature
        10.10.2 Some speed and accuracy considerations
            Integration methods and number of quadrature points
            Starting values
            Using melogit and gllamm for collapsible data
            Spherical quadrature in gllamm
    10.11 Assigning values to random effects
        10.11.1 Maximum “likelihood” estimation
        10.11.2 Empirical Bayes prediction
        10.11.3 Empirical Bayes modal prediction
    10.12 Different kinds of predicted probabilities
        10.12.1 Predicted population-averaged or marginal probabilities
        10.12.2 Predicted subject-specific probabilities
            Predictions for hypothetical subjects: Conditional probabilities
            Predictions for the subjects in the sample: Posterior mean probabilities
    10.13 Other approaches to clustered dichotomous data
        10.13.1 Conditional logistic regression
            Estimation using clogit
        10.13.2 Generalized estimating equations (GEE)
            Estimation using xtgee
    10.14 Summary and further reading
    10.15 Exercises
11 Ordinal responses
    11.1 Introduction
    11.2 Single-level cumulative models for ordinal responses
        11.2.1 Generalized linear model formulation
        11.2.2 Latent-response formulation
        11.2.3 Proportional odds
        11.2.4 ❖ Identification
    11.3 Longitudinal data structure and graphs
        11.3.1 Longitudinal data structure
        11.3.2 Plotting cumulative proportions
        11.3.3 Plotting cumulative sample logits and transforming the time scale
    11.4 Single-level proportional-odds model
        11.4.1 Model specification
            Estimation using ologit
    11.5 Random-intercept proportional-odds model
        11.5.1 Model specification
            Estimation using meologit
            Estimation using gllamm
        11.5.2 Measures of dependence and heterogeneity
            Residual intraclass correlation of latent responses
            Median odds ratio
    11.6 Random-coefficient proportional-odds model
        11.6.1 Model specification
            Estimation using meologit
            Estimation using gllamm
    11.7 Different kinds of predicted probabilities
        11.7.1 Predicted population-averaged or marginal probabilities
        11.7.2 Predicted subject-specific probabilities: Posterior mean
    11.8 Do experts differ in their grading of student essays?
    11.9 A random-intercept probit model with grader bias
        11.9.1 Model specification
            Estimation using gllamm
    11.10 ❖ Including grader-specific measurement-error variances
        11.10.1 Model specification
            Estimation using gllamm
    11.11 ❖ Including grader-specific thresholds
        11.11.1 Model specification
            Estimation using gllamm
    11.12 ❖ Other link functions
        Cumulative complementary log–log model
        Continuation-ratio logit model
        Adjacent-category logit model
        Baseline-category logit and stereotype models
    11.13 Summary and further reading
    11.14 Exercises
12 Nominal responses and discrete choice
    12.1 Introduction
    12.2 Single-level models for nominal responses
        12.2.1 Multinomial logit models
            Transport data version 1
            Estimation using mlogit
        12.2.2 Conditional logit models with alternative-specific covariates
            Transport data version 2: Expanded form
            Estimation using clogit
            Estimation using cmclogit
        12.2.3 Conditional logit models with alternative- and unit-specific covariates
            Estimation using clogit
            Estimation using cmclogit
    12.3 Independence from irrelevant alternatives
    12.4 Utility-maximization formulation
    12.5 Does marketing affect choice of yogurt?
    12.6 Single-level conditional logit models
        12.6.1 Conditional logit models with alternative-specific intercepts
            Estimation using clogit
            Estimation using cmclogit
    12.7 Multilevel conditional logit models
        12.7.1 Preference heterogeneity: Brand-specific random intercepts
            Estimation using cmxtmixlogit
            Estimation using gllamm
        12.7.2 Response heterogeneity: Marketing variables with random coefficients
            Estimation using cmxtmixlogit
            Estimation using gllamm
        12.7.3 ❖ Preference and response heterogeneity
            Estimation using cmxtmixlogit
            Estimation using gllamm
    12.8 Prediction of marginal choice probabilities
    12.9 Prediction of random effects and household-specific choice probabilities
    12.10 Summary and further reading
    12.11 Exercises
VI Models for counts
13 Counts
    13.1 Introduction
    13.2 What are counts?
        13.2.1 Counts versus proportions
        13.2.2 Counts as aggregated event-history data
    13.3 Single-level Poisson models for counts
    13.4 Did the German healthcare reform reduce the number of doctor visits?
    13.5 Longitudinal data structure
    13.6 Single-level Poisson regression
        13.6.1 Model specification
            Estimation using poisson
            Estimation using glm
    13.7 Random-intercept Poisson regression
        13.7.1 Model specification
        13.7.2 Measures of dependence and heterogeneity
        13.7.3 Estimation
            Using xtpoisson
            Using mepoisson
            Using gllamm
    13.8 Random-coefficient Poisson regression
        13.8.1 Model specification
            Estimation using mepoisson
            Estimation using gllamm
    13.9 Overdispersion in single-level models
        13.9.1 Normally distributed random intercept
            Estimation using xtpoisson
        13.9.2 Negative binomial models
            Mean dispersion or NB2
            Constant dispersion or NB1
        13.9.3 Quasilikelihood
            Estimation using glm
    13.10 Level-1 overdispersion in two-level models
        13.10.1 Random-intercept Poisson model with robust standard errors
            Estimation using mepoisson
        13.10.2 Three-level random-intercept model
        13.10.3 Negative binomial models with random intercepts
            Estimation using menbreg
        13.10.4 The HHG model
    13.11 Other approaches to two-level count data
        13.11.1 Conditional Poisson regression
            Estimation using xtpoisson, fe
            Estimation using Poisson regression with dummy variables for clusters
        13.11.2 Conditional negative binomial regression
        13.11.3 Generalized estimating equations
            Estimation using xtgee
    13.12 Estimating marginal and conditional effects when responses are missing at random
        ❖ Simulation
    13.13 Which Scottish counties have a high risk of lip cancer?
    13.14 Standardized mortality ratios
    13.15 Random-intercept Poisson regression
        13.15.1 Model specification
            Estimation using gllamm
        13.15.2 Prediction of standardized mortality ratios
    13.16 ❖ Nonparametric maximum likelihood estimation
        13.16.1 Specification
            Estimation using gllamm
        13.16.2 Prediction
    13.17 Summary and further reading
    13.18 Exercises
VII Models for survival or duration data
14 Discrete-time survival
    14.1 Introduction
    14.2 Single-level models for discrete-time survival data
        14.2.1 Discrete-time hazard and discrete-time survival
            Promotions data
        14.2.2 Data expansion for discrete-time survival analysis
        14.2.3 Estimation via regression models for dichotomous responses
            Estimation using logit
        14.2.4 Including time-constant covariates
            Estimation using logit
        14.2.5 Including time-varying covariates
            Estimation using logit
        14.2.6 Multiple absorbing events and competing risks
            Estimation using mlogit
        14.2.7 Handling left-truncated data
    14.3 How does mother’s birth history affect child mortality?
    14.4 Data expansion
    14.5 ❖ Proportional hazards and interval-censoring
    14.6 Complementary log–log models
        14.6.1 Marginal baseline hazard
            Estimation using cloglog
        14.6.2 Including covariates
            Estimation using cloglog
    14.7 Random-intercept complementary log–log model
        14.7.1 Model specification
            Estimation using mecloglog
    14.8 ❖ Population-averaged or marginal vs. cluster-specific or conditional survival probabilities
    14.9 Summary and further reading
    14.10 Exercises
15 Continuous-time survival
    15.1 Introduction
    15.2 What makes marriages fail?
    15.3 Hazards and survival
    15.4 Proportional hazards models
        15.4.1 Piecewise exponential model
            Estimation using streg
            Estimation using poisson
        15.4.2 Cox regression model
            Estimation using stcox
        15.4.3 Cox regression via Poisson regression for expanded data
            Estimation using xtpoisson, fe
        15.4.4 Approximate Cox regression: Poisson regression with smooth baseline hazard
            Estimation using poisson
    15.5 Accelerated failure-time models
        15.5.1 Log-normal model
            Estimation using streg
            Estimation using stintreg
    15.6 Time-varying covariates
        Estimation using streg
    15.7 Does nitrate reduce the risk of angina pectoris?
    15.8 Marginal modeling
        15.8.1 Cox regression with occasion-specific dummy variables
            Estimation using stcox
        15.8.2 Cox regression with occasion-specific baseline hazards
            Estimation using stcox, strata
        15.8.3 Approximate Cox regression
            Estimation using poisson
    15.9 Multilevel proportional hazards models
        15.9.1 Cox regression with gamma shared frailty
            Estimation using stcox, shared
        15.9.2 Approximate Cox regression with log-normal shared frailty
            Estimation using mepoisson
        15.9.3 Approximate Cox regression with normal random intercept and random coefficient
            Estimation using mepoisson
    15.10 Multilevel accelerated failure-time models
        15.10.1 Log-normal model with gamma shared frailty
            Estimation using streg
        15.10.2 Log-normal model with log-normal shared frailty
            Estimation using mestreg
        15.10.3 Log-normal model with normal random intercept and random coefficient
            Estimation using mestreg
    15.11 Fixed-effects approach
        15.11.1 Stratified Cox regression with subject-specific baseline hazards
            Estimation using stcox, strata
    15.12 ❖ Different approaches to recurrent-event data
        15.12.1 Total-time risk interval
        15.12.2 Counting-process risk interval
        15.12.3 Gap-time risk interval
    15.13 Summary and further reading
    15.14 Exercises
VIII Models with nested and crossed random effects
16 Models with nested and crossed random effects
    16.1 Introduction
    16.2 Did the Guatemalan-immunization campaign work?
    16.3 A three-level random-intercept logistic regression model
        16.3.1 Model specification
        16.3.2 Measures of dependence and heterogeneity
            Types of residual intraclass correlations of the latent responses
            Types of median odds ratios
        16.3.3 Three-stage formulation
        16.3.4 Estimation
            Using melogit
            Using gllamm
    16.4 A three-level random-coefficient logistic regression model
        16.4.1 Estimation
            Using melogit
            Using gllamm
    16.5 Prediction of random effects
        16.5.1 Empirical Bayes prediction
        16.5.2 Empirical Bayes modal prediction
    16.6 Different kinds of predicted probabilities
        16.6.1 Predicted population-averaged or marginal probabilities: New clusters
        16.6.2 Predicted median or conditional probabilities
        16.6.3 Predicted posterior mean probabilities: Existing clusters
    16.7 Do salamanders from different populations mate successfully?
    16.8 Crossed random-effects logistic regression
        16.8.1 Setup for estimating crossed random-effects model using melogit
        16.8.2 Approximate maximum likelihood estimation
            Estimation using melogit
        16.8.3 Bayesian estimation
            Brief introduction to Bayesian inference
            Priors for the salamander data
            Estimation using bayes: melogit
        16.8.4 Estimates compared
        16.8.5 Fully Bayesian versus empirical Bayesian inference for random effects
    16.9 Summary and further reading
    16.10 Exercises
B Syntax for gllamm, eq, and gllapred: The bare essentials
C Syntax for gllamm
D Syntax for gllapred
E Syntax for gllasim
References
Author index
Subject index