heatherturner.net/talks/uRos2021 @HeathrTurnr
Introduction to Generalized Nonlinear Models and the gnm
package
Case study of a structured main effect:
Modelling the Consequences of Social Mobility with the Diagonal Reference Model
Case study of a structured interaction:
Modelling Mortality Trends with the Lee-Carter Model
heatherturner.net/talks/uRos2021 @HeathrTurnr
^yi=β0+β1xi+β2zi+ϵi
E(yi)=β0+exp(θ1)xi+β2x2i
heatherturner.net/talks/uRos2021 @HeathrTurnr
^yi=β0+β1xi+β2zi+ϵi
E(yi)=β0+exp(θ1)xi+β2x2i
In general
E(yi)=ηi(β)=linear function of unknown parameters
Also assumes variance essentially constant: Var(yi)=ϕai with ai known (often ai≡1).
heatherturner.net/talks/uRos2021 @HeathrTurnr
Problems with linear models in many applications:
Generalized linear models specify a non-linear link function and variance function to allow for such things, while maintaining the simple interpretation of linear models.
heatherturner.net/talks/uRos2021 @HeathrTurnr
A GLM is made up of a linear predictor
η=β0+β1x1+...+βpxp
a link function that describes how E(Y)=μ depends on η
g(μ)=η
a variance function that describes how Var(Y) depends on μ
Var(Y)=ϕaiV(μ)
where the dispersion parameter ϕ is a constant and ai are known values.
heatherturner.net/talks/uRos2021 @HeathrTurnr
Suppose proportions Yi/ni with Yi∼Binomial(ni,pi). Then
E(Yi/ni)=piVar(Yi/ni)=1nipi(1−pi)
The variance function is
V(μi)=μi(1−μi) Using a logit link maps the mean from (0,1) to (−∞,∞) g(μi)=logit(μi)=log(μi1−μi)
heatherturner.net/talks/uRos2021 @HeathrTurnr
Suppose counts Yi∼Poisson(λi). Then
E(Yi)=λiVar(Yi)=λi
The variance function is
V(μi)=μi Using a log link maps the mean from (0,∞) to (−∞,∞)
g(μi)=log(μi)
heatherturner.net/talks/uRos2021 @HeathrTurnr
A generalized nonlinear model (GNM) is the same as a GLM except that we have
g(μ)=η(x;β)
where η(x;β) is nonlinear in the parameters β.
Equivalently an extension of nonlinear least squares model, where the variance of Y is allowed to depend on the mean.
Using a nonlinear predictor can produce a more parsimonious and interpretable model.
heatherturner.net/talks/uRos2021 @HeathrTurnr
A study of 1660 children from Manhattan recorded their mental impairment and parents' socioeconomic status (Agresti, 2002)
library(vcd)mosaic(mentalHealth)
heatherturner.net/talks/uRos2021 @HeathrTurnr
A simple analysis of these data might be to test for independence of MHS and SES using a chi-squared test.
Equivalent to testing goodness-of-fit of the independence model log(μrc)=αr+βc
Such a test compares the independence model to the saturated model
log(μrc)=αr+βc+γrc which may be over-complex.
heatherturner.net/talks/uRos2021 @HeathrTurnr
One intermediate model is the Row-Column association model:
log(μrc)=αr+βc+ϕrψc (Goodman, 1979), an example of a multiplicative interaction model.
heatherturner.net/talks/uRos2021 @HeathrTurnr
One intermediate model is the Row-Column association model:
log(μrc)=αr+βc+ϕrψc (Goodman, 1979), an example of a multiplicative interaction model.
For the mental health data:
Analysis of Deviance TableModel 1: Freq ~ SES + MHSModel 2: Freq ~ SES + MHS + Mult(SES, MHS)Model 3: Freq ~ SES + MHS + SES:MHS Resid. Df Resid. Dev Df Deviance Pr(>Chi)1 15 47.4 2 8 3.6 7 43.8 2.3e-073 0 0.0 8 3.6 0.89
heatherturner.net/talks/uRos2021 @HeathrTurnr
glm
-likefamily = gaussian
equivalent to nls
fit.heatherturner.net/talks/uRos2021 @HeathrTurnr
y ∼ a + b + a:b
"nonlin"
Exp
, Inv
, Mult
, MultHomog
, Dref
heatherturner.net/talks/uRos2021 @HeathrTurnr
RCmod <- gnm(Freq ~ SES + MHS + Mult(SES, MHS), family = poisson, data = mentalHealth, verbose = FALSE, ofInterest = "Mult")coef(RCmod)
Coefficients of interest: Mult(., MHS).SESA Mult(., MHS).SESB Mult(., MHS).SESC 0.37314 0.37649 0.10062 Mult(., MHS).SESD Mult(., MHS).SESE Mult(., MHS).SESF -0.04575 -0.40728 -0.70431 Mult(SES, .).MHSwell Mult(SES, .).MHSmild Mult(SES, .).MHSmoderate 0.76609 0.07004 -0.05557 Mult(SES, .).MHSimpaired -0.63370
heatherturner.net/talks/uRos2021 @HeathrTurnr
The independence model was defined in an over-parameterised form:
log(μrc)=αr+βc=(αr+1)+(βc−1)=α∗r+β∗c
glm
will impose identifiability constraints, by default α1=β1=0
heatherturner.net/talks/uRos2021 @HeathrTurnr
In the Row-Column Association model
αr+βr+ϕrψc=αr+(βr−ψc)+(ϕr+1)ψc=αr+βr+(2ϕr)(ψc/2)
heatherturner.net/talks/uRos2021 @HeathrTurnr
In the Row-Column Association model
αr+βr+ϕrψc=αr+(βr−ψc)+(ϕr+1)ψc=αr+βr+(2ϕr)(ψc/2)
We need to constrain both location and scale for identifiability.
heatherturner.net/talks/uRos2021 @HeathrTurnr
In the Row-Column Association model
αr+βr+ϕrψc=αr+(βr−ψc)+(ϕr+1)ψc=αr+βr+(2ϕr)(ψc/2)
We need to constrain both location and scale for identifiability.
gnm
works with overparameterised models
NA
heatherturner.net/talks/uRos2021 @HeathrTurnr
Data from the ONS Longitudinal Survey, which links census and vital event data for 1% of the population of England and Wales.
Objective is to study the effect of social mobility on long-term limiting illness (LLTI)
a long-standing illness, health problem or handicap that limits a person's activities or the work they can do
Data include
heatherturner.net/talks/uRos2021 @HeathrTurnr
Bartley & Plewis (2007) modelled trajectories by combining social mobility effects with origin or destination effects.
For example, the origin + mobility model includes the term θij=⎧⎨⎩αi+δ0i>jαi+δ1i=jαi+δ2i<j
where αi is the effect for origin i and i>j denotes moving to a more favourable destination class.
heatherturner.net/talks/uRos2021 @HeathrTurnr
The full models are logistic regression models for probability of LLTI in 2001.
Since the mobility models are GLMs, they can be fitted with glm
, e.g.
dest_m <- glm(llti ~ age1991 + origin + mobility, family = binomial, data = LS_m)
Age in 1991 was included as a covariate and men and women were modelled separately.
heatherturner.net/talks/uRos2021 @HeathrTurnr
The exponential of the mobility parameters gives the odds ratios of LLTI, e.g. for men
Origin + Mobility | Destination + Mobility | |
---|---|---|
To more favourable | 1.00 | 1.00 |
Stable | 1.21 | 0.71 |
To less favourable | 1.45 | 0.52 |
heatherturner.net/talks/uRos2021 @HeathrTurnr
The working residuals from the GLM fit can be averaged over each origin-destination combination to provide an indicator of fit for each trajectory
These average residuals can be standardized to be approximately N(0, 1) assuming the model is correct.
Passing a GLM fit to the mosaic
function will produce a plot of the weighted residuals, with the p-value for a χ2 goodness-of-fit test.
heatherturner.net/talks/uRos2021 @HeathrTurnr
Origin + Mobility
Destination + Mobility
Although two sides of the same coin, the two models capture different features of the data.
heatherturner.net/talks/uRos2021 @HeathrTurnr
Sobel (1991) proposed the diagonal reference model, which combines origin, destination and social mobility effects w1γi+(1−w1)γj The effect of moving from class i to class j is a weighted sum of the diagonal effects γi, where γi is the effect for stable individuals in that class.
This is a alternative to adding main effects of origin and destination.
strutured main effect
heatherturner.net/talks/uRos2021 @HeathrTurnr
A diagonal reference term may be specified via Dref(fac1, fac2, ...)
The weights are constrained to be non-negative and to sum to one by defining them as wf=eδf∑ieδi and estimating unconstrained δf parameters.
heatherturner.net/talks/uRos2021 @HeathrTurnr
Following the previous analysis we fit the diagonal reference model with age as a covariate and fit models for men and women separately.
dref_m <- gnm(llti ~ age1991 + Dref(origin, dest), family = binomial, data = LS_m)coef(summary(classMobility))
Estimate Std. Error z value Pr(>|z|)(Intercept) -1.30795 NA NA NAage1991 0.06357 0.001221 52.08 < 2e-16Dref(origin, dest)delta1 -0.05275 NA NA NADref(origin, dest)delta2 -0.53378 NA NA NADref(., .).origin|desthigh man/pro -3.74444 NA NA NADref(., .).origin|destlow man/pro -3.29767 NA NA NADref(., .).origin|destintermed -2.74320 NA NA NADref(., .).origin|destsmall empl/own ac -2.99956 NA NA NADref(., .).origin|destlow sup/tech -2.79887 NA NA NADref(., .).origin|destsemi-routine -2.47277 NA NA NADref(., .).origin|destroutine -2.27206 NA NA NA
heatherturner.net/talks/uRos2021 @HeathrTurnr
The model for the stable individuals is an ordinary logistic regression
logit(piik)=β0+β1agek+γi Setting γ1=0, the diagonal effects are log odds ratios of LLTI in class i against the first class for a given age
log(piik/(1−piik)p11k/(1−p11k))=(β0+β1agek+γi)−(β0+β1agek)=γi
heatherturner.net/talks/uRos2021 @HeathrTurnr
dref_m <- gnm(llti ~ age1991 + Dref(origin, dest), family = binomial, data = LS_m, constrain = "(high man)")coef(summary(classMobility))
Estimate Std. Error z value Pr(>|z|)(Intercept) -5.05239 0.063708 -79.305 < 2e-16age1991 0.06357 0.001221 52.084 < 2e-16Dref(origin, dest)delta1 NA NA NA NADref(origin, dest)delta2 NA NA NA NADref(., .).origin|desthigh man/pro 0.00000 NA NA NADref(., .).origin|destlow man/pro 0.44677 0.053225 8.394 < 2e-16Dref(., .).origin|destintermed 1.00124 0.063136 15.858 < 2e-16Dref(., .).origin|destsmall empl/own ac 0.74488 0.052400 14.227 < 2e-16Dref(., .).origin|destlow sup/tech 0.94557 0.053376 17.715 < 2e-16Dref(., .).origin|destsemi-routine 1.27167 0.053763 23.653 < 2e-16Dref(., .).origin|destroutine 1.47237 0.049871 29.524 < 2e-16
heatherturner.net/talks/uRos2021 @HeathrTurnr
Setting δ1=0 is enough to constrain the weights, so that
w1=11+eδ2w2=eδ21+eδ2
DrefWeights()
computes these weights and their standard errors, e.g.
DrefWeights(dref_m)
$origin weight se0.61799 0.02649$dest weight se0.38201 0.02649
heatherturner.net/talks/uRos2021 @HeathrTurnr
The ratio of origin weight to destination weight quantifies the diluting effect of social mobility on health inequality
The larger the origin weight, the greater the diluting effect.
mobile assume probability of stable in destination class
heatherturner.net/talks/uRos2021 @HeathrTurnr
The diagonal weights for the LLTI models are
Men | Women | |
---|---|---|
Origin | 0.62 (0.03) | 0.41 (0.03) |
Destination | 0.38 (0.03) | 0.59 (0.03) |
Since their destination class is given more weight, social mobility has a greater impact for women.
heatherturner.net/talks/uRos2021 @HeathrTurnr
The models can be compared by the difference in deviance from the null model:
Men |
Women |
|||
---|---|---|---|---|
Deviance | Df | Deviance | DF | |
Origin + mobility | 4050 | 9 | 3194 | 9 |
Destination + mobility | 4026 | 9 | 3273 | 9 |
Diagonal reference | 4121 | 8 | 3312 | 8 |
The diagonal reference model reduces the deviance the most despite requiring fewer degrees of freedom.
heatherturner.net/talks/uRos2021 @HeathrTurnr
Data from the Human Mortality Database, which provides open, international access to population and mortality data from 41 countries.
Here we consider data on deaths for men aged 20 to 99 in Canada from 1921 to 2003.
The data comprise the variables
Year
: year death recordedAge
: age at time of deathDeaths
: number of deathsExposure
: number at risk
modelheatherturner.net/talks/uRos2021 @HeathrTurnr
Lee & Carter (1992) proposed a model for age-specific population mortality rates that has been the basis of many subsequent analyses.
Suppose that death count Day for individuals of age a in year y has mean μay and quasi-Poisson variance ϕμay.
Lee-Carter model
log(μay/eay)=αa+βaγy,⇒log(μay)=log(eay)+αa+βaγy
where eay is the exposure (number of lives at risk).
heatherturner.net/talks/uRos2021 @HeathrTurnr
The Lee-Carter model can be fitted with gnm
as follows
LCmodel <- gnm(Deaths ~ Mult(Exp(Age), Year), eliminate = Age, offset = log(Exposure), family = "quasipoisson")
Exp(Age)
is used to constrain the parameters representing the
sensitivity of age group a to have the same sign,Age
is eliminated since it will typically have many levels,offset
is used to add the log exposure with a coefficient of 1quasipoisson
family is used so that the dispersion parameter ϕ is estimated rather than fixed to 1.The eliminate argument of gnm is used to specify that the Age parameters replace the intercept in the model. This has two benefits:
heatherturner.net/talks/uRos2021 @HeathrTurnr
Residuals are more dispersed for ages 25-35 and 50-65, as well as pre-1930
heatherturner.net/talks/uRos2021 @HeathrTurnr
Residuals by birth cohort (calendar year - age) show a clear trend
heatherturner.net/talks/uRos2021 @HeathrTurnr
Renshaw & Haberman (2006) considered a cohort-based extension
log(μay/eay)=αa+β(0)aγy+β(1)aλy−a
This is easy to specify in gnm
Deaths ~ Mult(Exp(Age), Year) + Mult(Exp(Age), Cohort)
But not so easy to fit!
Year
, Age
and Cohort
are directly relatedHaberman and Renshaw rejected the model on practical grounds
heatherturner.net/talks/uRos2021 @HeathrTurnr
Step 1. Initially fit the model with age multipliers fixed to 1
log(μay/eay)=αa+γy+λy−a
Step 2. Fit the full model with starting values from step 1, without constraining the age multipliers.
Step 3. Re-fit using Exp(Age)
to constrain the age multipliers to be positive, setting
β(0)99=exp(−1000)=0
heatherturner.net/talks/uRos2021 @HeathrTurnr
An advantage of re-parameterising the model such that β(0)a=exp(δ(0)a)β(1)a=exp(δ(1)a) is that simple contrasts (differences) of δ(n)a are identifiable.
We can obtain the contrasts and standard errors with getContrasts()
AgeMult2 <- pickCoef(APCmodel_GNM, "Mult(Exp(.), Cohort)", fixed = TRUE)getContrasts(APCmodel_GNM, AgeMult2, check = FALSE)$qvframe[1:3,]
estimate SE quasiSE quasiVarMult(Exp(.), Cohort).Age20 0.0000 0.0000 0.0810 0.006561Mult(Exp(.), Cohort).Age21 -0.2361 0.1294 0.1027 0.010543Mult(Exp(.), Cohort).Age22 -0.1666 0.1290 0.1021 0.010420
delta are log-multipliers
the relative approximation errors for the quasi-variances of these contrasts range from −24% to 70.1%, so the quasi-variance approximation performs poorly in this example
Figure shows a clear pattern in the log-multipliers for age in both terms, with the sensitivity of age initially rising until about 35 years of age, then decreasing with increasing age. However there is a lot of local variability with the estimates rising and falling in consecutive years and it is clear that the precision of the estimates decreases with age. The large uncertainty in the log-multipliers of the cohort effects for ages 85 and above is due to the early cohorts, for which we only have data at the end of the age range.
heatherturner.net/talks/uRos2021 @HeathrTurnr
Define a piecewise linear spline over age:
Age_num <- as.numeric(as.character(Canada$Age))Age_bs <- bs(Age_num, knots = c(seq(25, 70, by = 5), 80), degree = 1)
Use instead of age in the Renshaw & Haberman model:
Deaths ~ Mult(Exp(Age_bs), Year) + Mult(Exp(Age_bs), Cohort)
heatherturner.net/talks/uRos2021 @HeathrTurnr
Model | Deviance | Df | Percent |
---|---|---|---|
Lee-Carter | 32423 | 6399 | NA |
Age-Year-Cohort models | |||
Main effects | 24730 | 6318 | 0.00 |
Spline | 12486 | 6293 | 83.71 |
Renshaw-Haberman | 10104 | 6159 | 100.00 |
The spline model preserves the year and cohort multipliers which could be modelled using time series methods for forecasting as in Renshaw & Haberman (2006) - only depends on the differences between multipliers.
so it is not necessary to impose constraints on these parameters
heatherturner.net/talks/uRos2021 @HeathrTurnr
Case studies with reproducibility materials: github.com/hturner/gnm-case-studies
See the gnm
vignette for further examples
Also github.com/hturner/gnm-half-day-course and github.com/hturner/gnm-day-course
heatherturner.net/talks/uRos2021 @HeathrTurnr
Agresti, A. 2002. Categorical Data Analysis. 2nd ed. New York: Wiley.
Bartley, M, and I Plewis. 2007. 10.1111/j.1467-985X.2006.00464.x.
Goodman, L A. 1979. J. Amer. Statist. Assoc. 10.1080/01621459.1979.10481650.
Lee, R D, and L Carter. 1992. J. Amer. Statist. Assoc. 10.1080/01621459.1992.10475265.
Renshaw, A, and S Haberman. 2006. Insur Math Econ 10.1016/j.insmatheco.2005.12.001.
Sobel, M. E. 1981. Amer. Soc. Rev. 10.2307/2095086.
heatherturner.net/talks/uRos2021 @HeathrTurnr
Introduction to Generalized Nonlinear Models and the gnm
package
Case study of a structured main effect:
Modelling the Consequences of Social Mobility with the Diagonal Reference Model
Case study of a structured interaction:
Modelling Mortality Trends with the Lee-Carter Model
Keyboard shortcuts
↑, ←, Pg Up, k | Go to previous slide |
↓, →, Pg Dn, Space, j | Go to next slide |
Home | Go to first slide |
End | Go to last slide |
Number + Return | Go to specific slide |
b / m / f | Toggle blackout / mirrored / fullscreen mode |
c | Clone slideshow |
p | Toggle presenter mode |
t | Restart the presentation timer |
?, h | Toggle this help |
Esc | Back to slideshow |