+ - 0:00:00
Notes for current slide
Notes for next slide

Structured Effects with Generalized Nonlinear Models

Heather Turner
Research Software Engineering Fellow
University of Warwick

@HeathrTurnr

24 November 2021

heatherturner.net/talks/uRos2021

1 / 51

heatherturner.net/talks/uRos2021   @HeathrTurnr

Overview

  • Introduction to Generalized Nonlinear Models and the gnm package

  • Case study of a structured main effect:

    Modelling the Consequences of Social Mobility with the Diagonal Reference Model

  • Case study of a structured interaction:

    Modelling Mortality Trends with the Lee-Carter Model

2 / 51

heatherturner.net/talks/uRos2021   @HeathrTurnr

Introduction to GNMs and the gnm package

3 / 51

heatherturner.net/talks/uRos2021   @HeathrTurnr

Linear models

yi^=β0+β1xi+β2zi+ϵi

4 / 51

heatherturner.net/talks/uRos2021   @HeathrTurnr

Linear models

yi^=β0+β1xi+β2zi+ϵi

E(yi)=β0+exp(θ1)xi+β2xi2

4 / 51

heatherturner.net/talks/uRos2021   @HeathrTurnr

Linear models

yi^=β0+β1xi+β2zi+ϵi

E(yi)=β0+exp(θ1)xi+β2xi2

In general

E(yi)=ηi(β)=linear function of unknown parameters

Also assumes variance essentially constant: Var(yi)=ϕai with ai known (often ai1).

4 / 51

heatherturner.net/talks/uRos2021   @HeathrTurnr

Generalized linear models

Problems with linear models in many applications:

  • range of y is restricted (e.g., y is a count, or is binary, or is a duration)
  • effects are not additive
  • variance depends on mean (e.g., large mean large variance)

Generalized linear models specify a non-linear link function and variance function to allow for such things, while maintaining the simple interpretation of linear models.

5 / 51

heatherturner.net/talks/uRos2021   @HeathrTurnr

GLM structure

A GLM is made up of a linear predictor

η=β0+β1x1+...+βpxp

a link function that describes how E(Y)=μ depends on η

g(μ)=η

a variance function that describes how Var(Y) depends on μ

Var(Y)=ϕaiV(μ)

where the dispersion parameter ϕ is a constant and ai are known values.

6 / 51

heatherturner.net/talks/uRos2021   @HeathrTurnr

Logistic regression

Suppose proportions Yi/ni with YiBinomial(ni,pi). Then

E(Yi/ni)=piVar(Yi/ni)=1nipi(1pi)

The variance function is

V(μi)=μi(1μi) Using a logit link maps the mean from (0,1) to (,) g(μi)=logit(μi)=log(μi1μi)

7 / 51

heatherturner.net/talks/uRos2021   @HeathrTurnr

Log-linear model

Suppose counts YiPoisson(λi). Then

E(Yi)=λiVar(Yi)=λi

The variance function is

V(μi)=μi Using a log link maps the mean from (0,) to (,)

g(μi)=log(μi)

8 / 51

heatherturner.net/talks/uRos2021   @HeathrTurnr

Generalized Nonlinear Models

A generalized nonlinear model (GNM) is the same as a GLM except that we have

g(μ)=η(x;β)

where η(x;β) is nonlinear in the parameters β.

Equivalently an extension of nonlinear least squares model, where the variance of Y is allowed to depend on the mean.

Using a nonlinear predictor can produce a more parsimonious and interpretable model.

9 / 51

heatherturner.net/talks/uRos2021   @HeathrTurnr

Example: Mental Health Status

A study of 1660 children from Manhattan recorded their mental impairment and parents' socioeconomic status (Agresti, 2002)

library(vcd)
mosaic(mentalHealth)

10 / 51

heatherturner.net/talks/uRos2021   @HeathrTurnr

Independence

A simple analysis of these data might be to test for independence of MHS and SES using a chi-squared test.

Equivalent to testing goodness-of-fit of the independence model log(μrc)=αr+βc

Such a test compares the independence model to the saturated model

log(μrc)=αr+βc+γrc which may be over-complex.

11 / 51

heatherturner.net/talks/uRos2021   @HeathrTurnr

Row-column Association

One intermediate model is the Row-Column association model:

log(μrc)=αr+βc+ϕrψc (Goodman, 1979), an example of a multiplicative interaction model.

12 / 51

heatherturner.net/talks/uRos2021   @HeathrTurnr

Row-column Association

One intermediate model is the Row-Column association model:

log(μrc)=αr+βc+ϕrψc (Goodman, 1979), an example of a multiplicative interaction model.

For the mental health data:

Analysis of Deviance Table
Model 1: Freq ~ SES + MHS
Model 2: Freq ~ SES + MHS + Mult(SES, MHS)
Model 3: Freq ~ SES + MHS + SES:MHS
Resid. Df Resid. Dev Df Deviance Pr(>Chi)
1 15 47.4
2 8 3.6 7 43.8 2.3e-07
3 0 0.0 8 3.6 0.89
12 / 51

heatherturner.net/talks/uRos2021   @HeathrTurnr

gnm R package

  • Provides a framework for specifying, fitting and criticizing GNMs in R
  • Designed to be glm-like
    • common arguments, returned objects, methods, etc
    • equivalent results for linear predictors
  • For family = gaussian equivalent to nls fit.
13 / 51

heatherturner.net/talks/uRos2021   @HeathrTurnr

Model Specification

  • Linear terms in the model may be specified in the usual way, e.g.
y ∼ a + b + a:b
  • Nonlinear terms must be specified using functions of class "nonlin"
    • specify structure of term, possible also labels & starting values
    • provided functions: Exp, Inv, Mult, MultHomog, Dref
    • custom functions
14 / 51

heatherturner.net/talks/uRos2021   @HeathrTurnr

RC model in gnm

RCmod <- gnm(Freq ~ SES + MHS + Mult(SES, MHS), family = poisson,
data = mentalHealth, verbose = FALSE, ofInterest = "Mult")
coef(RCmod)
Coefficients of interest:
Mult(., MHS).SESA Mult(., MHS).SESB Mult(., MHS).SESC
0.37314 0.37649 0.10062
Mult(., MHS).SESD Mult(., MHS).SESE Mult(., MHS).SESF
-0.04575 -0.40728 -0.70431
Mult(SES, .).MHSwell Mult(SES, .).MHSmild Mult(SES, .).MHSmoderate
0.76609 0.07004 -0.05557
Mult(SES, .).MHSimpaired
-0.63370
15 / 51

heatherturner.net/talks/uRos2021   @HeathrTurnr

Parameterisation in (G)LMs

The independence model was defined in an over-parameterised form:

log(μrc)=αr+βc=(αr+1)+(βc1)=αr+βc

glm will impose identifiability constraints, by default α1=β1=0

16 / 51

heatherturner.net/talks/uRos2021   @HeathrTurnr

Parameterisation in (G)NMs

In the Row-Column Association model

αr+βr+ϕrψc=αr+(βrψc)+(ϕr+1)ψc=αr+βr+(2ϕr)(ψc/2)

17 / 51

heatherturner.net/talks/uRos2021   @HeathrTurnr

Parameterisation in (G)NMs

In the Row-Column Association model

αr+βr+ϕrψc=αr+(βrψc)+(ϕr+1)ψc=αr+βr+(2ϕr)(ψc/2)

We need to constrain both location and scale for identifiability.

17 / 51

heatherturner.net/talks/uRos2021   @HeathrTurnr

Parameterisation in (G)NMs

In the Row-Column Association model

αr+βr+ϕrψc=αr+(βrψc)+(ϕr+1)ψc=αr+βr+(2ϕr)(ψc/2)

We need to constrain both location and scale for identifiability.

gnm works with overparameterised models

  • will return one of infinite parameterisations at random
  • by default, nonlinear parameters are not identifiable: standard error NA
  • constrain parameters of interest, as required
17 / 51

heatherturner.net/talks/uRos2021   @HeathrTurnr

Case study: Diagonal Reference Model

18 / 51

heatherturner.net/talks/uRos2021   @HeathrTurnr

Background

Data from the ONS Longitudinal Survey, which links census and vital event data for 1% of the population of England and Wales.

Objective is to study the effect of social mobility on long-term limiting illness (LLTI)

a long-standing illness, health problem or handicap that limits a person's activities or the work they can do

Data include

  • National Statistics socio-economic classification (7 classes, high man/prof to routine) in 1991 (origin) and 2001 (destination).
  • Age, gender
19 / 51

heatherturner.net/talks/uRos2021   @HeathrTurnr

Bartley & Plewis Models

Bartley & Plewis (2007) modelled trajectories by combining social mobility effects with origin or destination effects.

For example, the origin + mobility model includes the term θij={αi+δ0i>jαi+δ1i=jαi+δ2i<j

where αi is the effect for origin i and i>j denotes moving to a more favourable destination class.

20 / 51

heatherturner.net/talks/uRos2021   @HeathrTurnr

GLM mobility models

The full models are logistic regression models for probability of LLTI in 2001.

Since the mobility models are GLMs, they can be fitted with glm, e.g.

dest_m <- glm(llti ~ age1991 + origin + mobility,
family = binomial, data = LS_m)

Age in 1991 was included as a covariate and men and women were modelled separately.

21 / 51

heatherturner.net/talks/uRos2021   @HeathrTurnr

Social Mobility Effects

The exponential of the mobility parameters gives the odds ratios of LLTI, e.g. for men

Origin + Mobility Destination + Mobility
To more favourable 1.00 1.00
Stable 1.21 0.71
To less favourable 1.45 0.52
  • given origin, odds of LLTI increased by downward mobility
  • given destination, odds of LLTI decreased by downward mobility
22 / 51

heatherturner.net/talks/uRos2021   @HeathrTurnr

Weighted Residuals

The working residuals from the GLM fit can be averaged over each origin-destination combination to provide an indicator of fit for each trajectory

These average residuals can be standardized to be approximately N(0, 1) assuming the model is correct.

Passing a GLM fit to the mosaic function will produce a plot of the weighted residuals, with the p-value for a χ2 goodness-of-fit test.

23 / 51

heatherturner.net/talks/uRos2021   @HeathrTurnr

Origin + Mobility

Destination + Mobility

Although two sides of the same coin, the two models capture different features of the data.

24 / 51

heatherturner.net/talks/uRos2021   @HeathrTurnr

Diagonal Reference Model

Sobel (1991) proposed the diagonal reference model, which combines origin, destination and social mobility effects w1γi+(1w1)γj The effect of moving from class i to class j is a weighted sum of the diagonal effects γi, where γi is the effect for stable individuals in that class.

This is a alternative to adding main effects of origin and destination.

25 / 51

strutured main effect

heatherturner.net/talks/uRos2021   @HeathrTurnr

Dref in gnm

A diagonal reference term may be specified via Dref(fac1, fac2, ...)

The weights are constrained to be non-negative and to sum to one by defining them as wf=eδfieδi and estimating unconstrained δf parameters.

26 / 51

heatherturner.net/talks/uRos2021   @HeathrTurnr


Following the previous analysis we fit the diagonal reference model with age as a covariate and fit models for men and women separately.

dref_m <- gnm(llti ~ age1991 + Dref(origin, dest),
family = binomial, data = LS_m)
coef(summary(classMobility))
Estimate Std. Error z value Pr(>|z|)
(Intercept) -1.30795 NA NA NA
age1991 0.06357 0.001221 52.08 < 2e-16
Dref(origin, dest)delta1 -0.05275 NA NA NA
Dref(origin, dest)delta2 -0.53378 NA NA NA
Dref(., .).origin|desthigh man/pro -3.74444 NA NA NA
Dref(., .).origin|destlow man/pro -3.29767 NA NA NA
Dref(., .).origin|destintermed -2.74320 NA NA NA
Dref(., .).origin|destsmall empl/own ac -2.99956 NA NA NA
Dref(., .).origin|destlow sup/tech -2.79887 NA NA NA
Dref(., .).origin|destsemi-routine -2.47277 NA NA NA
Dref(., .).origin|destroutine -2.27206 NA NA NA
27 / 51

heatherturner.net/talks/uRos2021   @HeathrTurnr

Diagonal Effects

The model for the stable individuals is an ordinary logistic regression

logit(piik)=β0+β1agek+γi Setting γ1=0, the diagonal effects are log odds ratios of LLTI in class i against the first class for a given age

log(piik/(1piik)p11k/(1p11k))=(β0+β1agek+γi)(β0+β1agek)=γi

28 / 51

heatherturner.net/talks/uRos2021   @HeathrTurnr


dref_m <- gnm(llti ~ age1991 + Dref(origin, dest),
family = binomial, data = LS_m, constrain = "(high man)")
coef(summary(classMobility))
Estimate Std. Error z value Pr(>|z|)
(Intercept) -5.05239 0.063708 -79.305 < 2e-16
age1991 0.06357 0.001221 52.084 < 2e-16
Dref(origin, dest)delta1 NA NA NA NA
Dref(origin, dest)delta2 NA NA NA NA
Dref(., .).origin|desthigh man/pro 0.00000 NA NA NA
Dref(., .).origin|destlow man/pro 0.44677 0.053225 8.394 < 2e-16
Dref(., .).origin|destintermed 1.00124 0.063136 15.858 < 2e-16
Dref(., .).origin|destsmall empl/own ac 0.74488 0.052400 14.227 < 2e-16
Dref(., .).origin|destlow sup/tech 0.94557 0.053376 17.715 < 2e-16
Dref(., .).origin|destsemi-routine 1.27167 0.053763 23.653 < 2e-16
Dref(., .).origin|destroutine 1.47237 0.049871 29.524 < 2e-16
29 / 51

heatherturner.net/talks/uRos2021   @HeathrTurnr

Health Inequality

30 / 51

heatherturner.net/talks/uRos2021   @HeathrTurnr

DrefWeights

Setting δ1=0 is enough to constrain the weights, so that

w1=11+eδ2w2=eδ21+eδ2 DrefWeights() computes these weights and their standard errors, e.g.

DrefWeights(dref_m)
$origin
weight se
0.61799 0.02649
$dest
weight se
0.38201 0.02649
31 / 51

heatherturner.net/talks/uRos2021   @HeathrTurnr

Diluting Effect of Social Mobility

The ratio of origin weight to destination weight quantifies the diluting effect of social mobility on health inequality

  • 1:0 social mobility has no effect on individual
  • 0:1 social mobility has no effect on inequality
  • otherwise social mobility increases P(LLTI) in the upper classes and decreases P(LLTI) in the lower classes.

The larger the origin weight, the greater the diluting effect.

32 / 51

mobile assume probability of stable in destination class

heatherturner.net/talks/uRos2021   @HeathrTurnr

Diagonal Weights

The diagonal weights for the LLTI models are

Men Women
Origin 0.62 (0.03) 0.41 (0.03)
Destination 0.38 (0.03) 0.59 (0.03)

Since their destination class is given more weight, social mobility has a greater impact for women.

33 / 51

heatherturner.net/talks/uRos2021   @HeathrTurnr

Model Comparison

The models can be compared by the difference in deviance from the null model:

Men
Women
Deviance Df Deviance DF
Origin + mobility 4050 9 3194 9
Destination + mobility 4026 9 3273 9
Diagonal reference 4121 8 3312 8

The diagonal reference model reduces the deviance the most despite requiring fewer degrees of freedom.

34 / 51

heatherturner.net/talks/uRos2021   @HeathrTurnr

Case study: Lee-Carter Model

35 / 51

heatherturner.net/talks/uRos2021   @HeathrTurnr

Background

Data from the Human Mortality Database, which provides open, international access to population and mortality data from 41 countries.

Here we consider data on deaths for men aged 20 to 99 in Canada from 1921 to 2003.

The data comprise the variables

  • Year: year death recorded
  • Age: age at time of death
  • Deaths: number of deaths
  • Exposure: number at risk model
36 / 51

heatherturner.net/talks/uRos2021   @HeathrTurnr

Lee-Carter model

Lee & Carter (1992) proposed a model for age-specific population mortality rates that has been the basis of many subsequent analyses.

Suppose that death count Day for individuals of age a in year y has mean μay and quasi-Poisson variance ϕμay.

Lee-Carter model

log(μay/eay)=αa+βaγy,log(μay)=log(eay)+αa+βaγy

where eay is the exposure (number of lives at risk).

37 / 51

heatherturner.net/talks/uRos2021   @HeathrTurnr

Lee-Carter in gnm

The Lee-Carter model can be fitted with gnm as follows

LCmodel <- gnm(Deaths ~ Mult(Exp(Age), Year),
eliminate = Age, offset = log(Exposure),
family = "quasipoisson")
  • Exp(Age) is used to constrain the parameters representing the sensitivity of age group a to have the same sign,
  • Age is eliminated since it will typically have many levels,
  • offset is used to add the log exposure with a coefficient of 1
  • the quasipoisson family is used so that the dispersion parameter ϕ is estimated rather than fixed to 1.
38 / 51

The eliminate argument of gnm is used to specify that the Age parameters replace the intercept in the model. This has two benefits:

  • gnm exploits the structure of these parameters to estimate them more efficiently
  • these nuisance parameters are excluded from summaries of the model object

heatherturner.net/talks/uRos2021   @HeathrTurnr


Residuals are more dispersed for ages 25-35 and 50-65, as well as pre-1930

39 / 51

heatherturner.net/talks/uRos2021   @HeathrTurnr

Residuals by birth cohort (calendar year - age) show a clear trend

40 / 51

heatherturner.net/talks/uRos2021   @HeathrTurnr

Renshaw & Haberman extension

Renshaw & Haberman (2006) considered a cohort-based extension

log(μay/eay)=αa+βa(0)γy+βa(1)λya

This is easy to specify in gnm

Deaths ~ Mult(Exp(Age), Year) + Mult(Exp(Age), Cohort)

But not so easy to fit!

  • Year, Age and Cohort are directly related
  • now have 405 multiplicative parameters, data for some cohorts is sparse
41 / 51

Haberman and Renshaw rejected the model on practical grounds

heatherturner.net/talks/uRos2021   @HeathrTurnr

Fitting the Renshaw & Haberman model

Step 1. Initially fit the model with age multipliers fixed to 1

log(μay/eay)=αa+γy+λya

Step 2. Fit the full model with starting values from step 1, without constraining the age multipliers.

  • Converges in 54 iterations
  • Find the multiplier for age 99 in the first multiplicative term is negative

Step 3. Re-fit using Exp(Age) to constrain the age multipliers to be positive, setting β99(0)=exp(1000)=0

42 / 51

heatherturner.net/talks/uRos2021   @HeathrTurnr

Residuals from Renshaw & Haberman model

43 / 51

heatherturner.net/talks/uRos2021   @HeathrTurnr

Age multipliers

An advantage of re-parameterising the model such that βa(0)=exp(δa(0))βa(1)=exp(δa(1)) is that simple contrasts (differences) of δa(n) are identifiable.

We can obtain the contrasts and standard errors with getContrasts()

AgeMult2 <- pickCoef(APCmodel_GNM, "Mult(Exp(.), Cohort)", fixed = TRUE)
getContrasts(APCmodel_GNM, AgeMult2, check = FALSE)$qvframe[1:3,]
estimate SE quasiSE quasiVar
Mult(Exp(.), Cohort).Age20 0.0000 0.0000 0.0810 0.006561
Mult(Exp(.), Cohort).Age21 -0.2361 0.1294 0.1027 0.010543
Mult(Exp(.), Cohort).Age22 -0.1666 0.1290 0.1021 0.010420
45 / 51

delta are log-multipliers

the relative approximation errors for the quasi-variances of these contrasts range from −24% to 70.1%, so the quasi-variance approximation performs poorly in this example

Figure shows a clear pattern in the log-multipliers for age in both terms, with the sensitivity of age initially rising until about 35 years of age, then decreasing with increasing age. However there is a lot of local variability with the estimates rising and falling in consecutive years and it is clear that the precision of the estimates decreases with age. The large uncertainty in the log-multipliers of the cohort effects for ages 85 and above is due to the early cohorts, for which we only have data at the end of the age range.

heatherturner.net/talks/uRos2021   @HeathrTurnr

Spline model

Define a piecewise linear spline over age:

Age_num <- as.numeric(as.character(Canada$Age))
Age_bs <- bs(Age_num, knots = c(seq(25, 70, by = 5), 80),
degree = 1)

Use instead of age in the Renshaw & Haberman model:

Deaths ~ Mult(Exp(Age_bs), Year) + Mult(Exp(Age_bs), Cohort)
47 / 51

heatherturner.net/talks/uRos2021   @HeathrTurnr

Model comparison

Model Deviance Df Percent
Lee-Carter 32423 6399 NA
Age-Year-Cohort models
Main effects 24730 6318 0.00
Spline 12486 6293 83.71
Renshaw-Haberman 10104 6159 100.00

The spline model preserves the year and cohort multipliers which could be modelled using time series methods for forecasting as in Renshaw & Haberman (2006) - only depends on the differences between multipliers.

48 / 51

so it is not necessary to impose constraints on these parameters

heatherturner.net/talks/uRos2021   @HeathrTurnr

Finding out more

Case studies with reproducibility materials: github.com/hturner/gnm-case-studies

  • So far Lee-Carter and UNIDIFF
  • Diagonal Reference and Rasch models to be added
  • Working paper with D. Firth and I. Kosmidis in progress

See the gnm vignette for further examples

Also github.com/hturner/gnm-half-day-course and github.com/hturner/gnm-day-course

50 / 51

heatherturner.net/talks/uRos2021   @HeathrTurnr

References

Agresti, A. 2002. Categorical Data Analysis. 2nd ed. New York: Wiley.

Bartley, M, and I Plewis. 2007. 10.1111/j.1467-985X.2006.00464.x.

Goodman, L A. 1979. J. Amer. Statist. Assoc. 10.1080/01621459.1979.10481650.

Lee, R D, and L Carter. 1992. J. Amer. Statist. Assoc. 10.1080/01621459.1992.10475265.

Renshaw, A, and S Haberman. 2006. Insur Math Econ 10.1016/j.insmatheco.2005.12.001.

Sobel, M. E. 1981. Amer. Soc. Rev. 10.2307/2095086.

51 / 51

heatherturner.net/talks/uRos2021   @HeathrTurnr

Overview

  • Introduction to Generalized Nonlinear Models and the gnm package

  • Case study of a structured main effect:

    Modelling the Consequences of Social Mobility with the Diagonal Reference Model

  • Case study of a structured interaction:

    Modelling Mortality Trends with the Lee-Carter Model

2 / 51
Paused

Help

Keyboard shortcuts

, , Pg Up, k Go to previous slide
, , Pg Dn, Space, j Go to next slide
Home Go to first slide
End Go to last slide
Number + Return Go to specific slide
b / m / f Toggle blackout / mirrored / fullscreen mode
c Clone slideshow
p Toggle presenter mode
t Restart the presentation timer
?, h Toggle this help
Esc Back to slideshow