2.1 The Treatment of Lead-Exposed Children (TLC) trial was a placebo-controlled, randomized study of succimer (a chelating agent) in children with blood lead levels of 20 to 44 micrograms/dL. Recall that the data consist of four repeated measurements of blood lead levels obtained at baseline ( or week 0), week 1, week 4, and week 6 on 100 children who were randomly assigned to chelation treatment with succimer or to placebo. For this problem set we focus only on the 50 children assigned to chelation treatment with succimer. The raw data are stored in an external file: lead.dat Each row of the data set contains the following 5 variables:
ID Y1 Y2 Y3 Y4
library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.2 ──
## ✔ ggplot2 3.4.0 ✔ purrr 0.3.5
## ✔ tibble 3.1.8 ✔ dplyr 1.0.10
## ✔ tidyr 1.2.1 ✔ stringr 1.4.1
## ✔ readr 2.1.3 ✔ forcats 0.5.2
## Warning: package 'ggplot2' was built under R version 4.2.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
library(readr)
TLC_50 <- read_table("E:/University(3)/Longitudinal/EXE/TLC.50.txt",
col_names = FALSE)
##
## ── Column specification ────────────────────────────────────────────────────────
## cols(
## X1 = col_double(),
## X2 = col_double(),
## X3 = col_double(),
## X4 = col_double(),
## X5 = col_double()
## )
names(TLC_50)=c("id","w0","w1","w4","w6")
head(TLC_50)
## # A tibble: 6 × 5
## id w0 w1 w4 w6
## <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 1 26.5 14.8 19.5 21
## 2 2 25.8 23 19.1 23.2
## 3 3 20.4 2.8 3.2 9.4
## 4 4 20.4 5.4 4.5 11.9
## 5 5 24.8 23.1 24.6 30.9
## 6 6 27.9 6.3 18.5 16.3
2.1.1 Read the data from the external file and calculate the sample means, standard deviations, and variances of the blood lead levels at each occasion.
succimer=round((TLC_50 %>%
select(w0,w1,w4,w6) %>%
summarise_all(list(~mean(.),~sd(.),~var(.))))
,2)
succimer
## # A tibble: 1 × 12
## w0_mean w1_mean w4_mean w6_mean w0_sd w1_sd w4_sd w6_sd w0_var w1_var w4_var
## <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 26.5 13.5 15.5 20.8 5.02 7.67 7.85 9.25 25.2 58.9 61.7
## # … with 1 more variable: w6_var <dbl>
2.1.2 Construct a time plot of the mean blood lead levels versus time (in weeks). Describe the general characteristics of the time trend.
succ.mean.t=ts(as.numeric(succimer[-(5:12)]),0,frequency = 0.5)
plot(succ.mean.t,xlab="Time(week)",ylab="Mean blood lead level(mcg/dL)"
,xlim=c(0,6),ylim=c(10,30),lwd=2, col= "pink4")
2.1.3 Calculate the 4 x 4 covariance and correlation matrices for the four repeated measures of blood lead levels.
cor(TLC_50[-(1)])
## w0 w1 w4 w6
## w0 1.0000000 0.4014589 0.3839654 0.4951063
## w1 0.4014589 1.0000000 0.7308221 0.5069743
## w4 0.3839654 0.7308221 1.0000000 0.4548224
## w6 0.4951063 0.5069743 0.4548224 1.0000000
cov(TLC_50[-(1)])
## w0 w1 w4 w6
## w0 25.20980 15.46543 15.13800 22.98543
## w1 15.46543 58.86706 44.02907 35.96596
## w4 15.13800 44.02907 61.65715 33.02197
## w6 22.98543 35.96596 33.02197 85.49465
2.1.4 Verify that the diagonal elements of the covariance matrix are the variances by comparing to the descriptive statistics obtained in Problem 2.1.1.
diag(round(cov(TLC_50[-(1)]),2))==succimer[9:12] #YES
## w0_var w1_var w4_var w6_var
## [1,] TRUE TRUE TRUE TRUE
2.1.5 Verify that the correlation between blood lead levels at baseline (week 0) and week 1 is equal to the covariance between blood lead levels at baseline and week 1, divided by the product of the standard deviations of the blood lead levels at baseline and week 1.
round((cov(TLC_50[-(1)])[1,2])/(succimer[,5]*succimer[,6]),2)==
round(cor(TLC_50[-(1)])[1,2],2) #YES
## w0_sd
## [1,] TRUE
5.1 In the National Cooperative Gallstone Study (NCGS), one of the major interestswas to study the safety of the drug chenodiol for the treatment of cholesterol gallstones (Schoenfield et al., 1981; Wei and Lachin, 1984). In this study, patients were randomly assigned to high-dose (750 mg per day), low-dose (375 mg per day), or placebo. We focus on a subset of data on patients who had floating gallstones and who were assigned to the high-dose and placebo groups. In the NCGS it was suggested that chenodiol would dissolve gallstones but, in doing so, might increase levels of serum cholesterol. As a result serum cholesterol (mg/dL) was measured at baseline and at 6, 12, 20, and 24 months of follow-up. Many cholesterol measurements are missing because of missed visits, laboratory specimens were lost or inadequate, or patient follow-up was terminated. The NCGS serum cholesterol data are stored in an external file: cholesterol.dat Each row of the data set contains the following seven variables:
Group ID Y1 Y2 Y3 Y4 Y5
Note: The categorical variable Group is coded 1 = High-Dose, 2 = Placebo.
library (tidyverse)
library (nlme)
##
## Attaching package: 'nlme'
## The following object is masked from 'package:dplyr':
##
## collapse
library(foreign)
cholesterol=read.table("E:/University(3)/Longitudinal/EXE/cholesterol.txt",
quote="\"", comment.char="", na.strings=".")
names(cholesterol) = c("trt","ID","y1","y2","y3","y4","y5")
head(cholesterol)
## trt ID y1 y2 y3 y4 y5
## 1 1 1 178 246 295 228 274
## 2 1 2 254 260 278 245 340
## 3 1 3 185 232 215 220 292
## 4 1 4 219 268 241 260 320
## 5 1 5 205 232 265 242 230
## 6 1 6 182 213 173 200 193
5.1.1 Read the data from the external file and keep it in a “multivariate” or “wide” format.
str( cholesterol )
## 'data.frame': 103 obs. of 7 variables:
## $ trt: int 1 1 1 1 1 1 1 1 1 1 ...
## $ ID : int 1 2 3 4 5 6 7 8 9 10 ...
## $ y1 : int 178 254 185 219 205 182 310 191 245 229 ...
## $ y2 : int 246 260 232 268 232 213 334 204 270 200 ...
## $ y3 : int 295 278 215 241 265 173 290 227 209 238 ...
## $ y4 : int 228 245 220 260 242 200 286 228 255 259 ...
## $ y5 : int 274 340 292 320 230 193 248 196 213 221 ...
5.1.2 Calculate the sample means, standard deviations, and variances of the serum cholesterol levels at each occasion for each treatment group.
High_Dose = cholesterol %>%
filter (trt==1) %>%
select (y1,y2,y3,y4,y5)
Placebo = cholesterol %>%
filter(trt==2) %>%
select (y1,y2,y3,y4,y5)
(mean.High_Dose= apply(High_Dose,2,mean,na.rm=T))
## y1 y2 y3 y4 y5
## 226.0161 245.5323 252.0182 256.7955 254.5526
(mean.Placebo= apply(Placebo,2,mean,na.rm=T))
## y1 y2 y3 y4 y5
## 235.9268 243.1707 244.7632 257.6000 257.4839
(sd.High_Dose= apply(High_Dose,2,sd,na.rm=T))
## y1 y2 y3 y4 y5
## 39.66437 39.45228 38.32922 34.48935 49.96198
(sd.Placebo= apply(Placebo,2,sd,na.rm=T))
## y1 y2 y3 y4 y5
## 55.87459 49.23967 46.11058 51.14179 49.38817
(var.High_Dose= apply(High_Dose,2,var,na.rm=T))
## y1 y2 y3 y4 y5
## 1573.262 1556.483 1469.129 1189.515 2496.200
(var.Placebo= apply(Placebo,2,var,na.rm=T))
## y1 y2 y3 y4 y5
## 3121.970 2424.545 2126.186 2615.482 2439.191
5.1.3 On a single graph, construct a time plot that displays the mean serum cholesterol versus time (in months) for the two treatment group. Describe the general characteristics of the time trends for the two groups.
H_dose.mean.t = ts (mean.High_Dose ,0 ,frequency = 0.5)
plac.mean.t = ts (mean.Placebo ,0 ,frequency = 0.5)
plot (H_dose.mean.t,xlab="Time(Month)" ,ylab="Mean serum cholesterol"
,lwd=3, col="yellow3")
lines(plac.mean.t,xlab="Time(Month)",ylab="Mean serum cholesterol"
,lwd=2.5,lty=2, col="Purple")
legend (x=5 ,y=233 ,c("High_Dose","Placebo"), lty = c(1,2), lwd = 3,
col= c("yellow3","Purple"))
5.1.4 Next read the data from the external file and put the data in a “univariate” or “long” format, with five “records” per subject.
data2 = reshape(cholesterol, varying=c("y1","y2","y3","y4","y5"),
v.names="y", timevar="time",idvar="id",
time=1:5, direction="long")
head(data2)
## trt ID time y id
## 1.1 1 1 1 178 1
## 2.1 1 2 1 254 2
## 3.1 1 3 1 185 3
## 4.1 1 4 1 219 4
## 5.1 1 5 1 205 5
## 6.1 1 6 1 182 6
5.1.5 Assuming an unstructured covariance matrix, conduct an analysis of response profiles. Determine whether the patterns of change over time differ in the two treatment groups.
data2 = subset(data2, time > 1)
month = data2 $time
month[data2 $time==2] <- 6
month[data2 $time==3] <- 12
month[data2 $time==4] <- 20
month[data2 $time==5] <- 24
data2 $time= data2 $time -1
month.2 = factor(month, c(6,12,20,24))
attach(data2)
Analysis of Response Profiles assuming Equal Mean Response at Baseline
model = gls(y ~ I(month.2==6) + I(month.2==12) +
I(month.2==20) + I(month.2==24)+
I(month.2==6 & trt==1) + I(month.2==12 & trt==1) +
I(month.2==20 & trt==1)+ I(month.2==24 & trt==1),
correlation = corSymm (form= ~ time | id) ,
weights = varIdent(form = ~ 1 | month.2),
control = list(singular.ok=T), na.action = na.omit)
summary(model)
## Generalized least squares fit by REML
## Model: y ~ I(month.2 == 6) + I(month.2 == 12) + I(month.2 == 20) + I(month.2 == 24) + I(month.2 == 6 & trt == 1) + I(month.2 == 12 & trt == 1) + I(month.2 == 20 & trt == 1) + I(month.2 == 24 & trt == 1)
## Data: NULL
## AIC BIC logLik
## 3557.579 3630.047 -1759.789
##
## Correlation Structure: General
## Formula: ~time | id
## Parameter estimate(s):
## Correlation:
## 1 2 3
## 2 0
## 3 0 0
## 4 0 0 0
## Variance function:
## Structure: Different standard deviations per stratum
## Formula: ~1 | month.2
## Parameter estimates:
## 6 12 20 24
## 1 1 1 1
##
## Coefficients:
## Value Std.Error t-value p-value
## (Intercept) 257.48387 7.945043 32.40811 0.0000
## I(month.2 == 6)TRUE -14.31314 10.528599 -1.35945 0.1749
## I(month.2 == 12)TRUE -12.72071 10.706044 -1.18818 0.2356
## I(month.2 == 20)TRUE 0.11613 10.910238 0.01064 0.9915
## I(month.2 == 6 & trt == 1)TRUE 2.36153 8.904468 0.26521 0.7910
## I(month.2 == 12 & trt == 1)TRUE 7.25502 9.331371 0.77749 0.4374
## I(month.2 == 20 & trt == 1)TRUE -0.80455 10.019137 -0.08030 0.9360
## I(month.2 == 24 & trt == 1)TRUE -2.93124 10.706044 -0.27379 0.7844
##
## Correlation:
## (Intr) I(.2=6 I(.2=1 I(.2=2 I=6&t=1 I=1&t=1
## I(month.2 == 6)TRUE -0.755
## I(month.2 == 12)TRUE -0.742 0.560
## I(month.2 == 20)TRUE -0.728 0.550 0.540
## I(month.2 == 6 & trt == 1)TRUE 0.000 -0.509 0.000 0.000
## I(month.2 == 12 & trt == 1)TRUE 0.000 0.000 -0.515 0.000 0.000
## I(month.2 == 20 & trt == 1)TRUE 0.000 0.000 0.000 -0.511 0.000 0.000
## I(month.2 == 24 & trt == 1)TRUE -0.742 0.560 0.551 0.540 0.000 0.000
## I(.2==20&t=1
## I(month.2 == 6)TRUE
## I(month.2 == 12)TRUE
## I(month.2 == 20)TRUE
## I(month.2 == 6 & trt == 1)TRUE
## I(month.2 == 12 & trt == 1)TRUE
## I(month.2 == 20 & trt == 1)TRUE
## I(month.2 == 24 & trt == 1)TRUE 0.000
##
## Standardized residuals:
## Min Q1 Med Q3 Max
## -2.28706110 -0.68279992 -0.02646551 0.60736339 3.22015893
##
## Residual standard error: 44.23613
## Degrees of freedom: 344 total; 335 residual
model.11 = gls(y ~ I(month.2==6) + I(month.2==12) +
I(month.2==20) + I(month.2==24),
correlation = corSymm (form= ~ time | id) ,
weights = varIdent(form = ~ 1 | month.2),
control = list(singular.ok=T), na.action = na.omit )
summary(model.11)
## Generalized least squares fit by REML
## Model: y ~ I(month.2 == 6) + I(month.2 == 12) + I(month.2 == 20) + I(month.2 == 24)
## Data: NULL
## AIC BIC logLik
## 3575.861 3633.251 -1772.931
##
## Correlation Structure: General
## Formula: ~time | id
## Parameter estimate(s):
## Correlation:
## 1 2 3
## 2 0
## 3 0 0
## 4 0 0 0
## Variance function:
## Structure: Different standard deviations per stratum
## Formula: ~1 | month.2
## Parameter estimates:
## 6 12 20 24
## 1 1 1 1
##
## Coefficients:
## Value Std.Error t-value p-value
## (Intercept) 255.86957 5.299865 48.27851 0.0000
## I(month.2 == 6)TRUE -11.27733 6.848735 -1.64663 0.1006
## I(month.2 == 12)TRUE -6.81580 6.994889 -0.97440 0.3306
## I(month.2 == 20)TRUE 1.28233 7.254076 0.17677 0.8598
##
## Correlation:
## (Intr) I(.2=6 I(.2=1
## I(month.2 == 6)TRUE -0.774
## I(month.2 == 12)TRUE -0.758 0.586
## I(month.2 == 20)TRUE -0.731 0.565 0.554
##
## Standardized residuals:
## Min Q1 Med Q3 Max
## -2.33037153 -0.69489928 -0.02505069 0.62256448 3.20576264
##
## Residual standard error: 44.02398
## Degrees of freedom: 344 total; 339 residual
model1 = gls(y ~ I(month.2==6) + I(month.2==12) +
I(month.2==20) + I(month.2==24)+
I(month.2==6 & trt==1) + I(month.2==12 & trt==1) +
I(month.2==20 & trt==1)+ I(month.2==24 & trt==1),
correlation = corSymm (form= ~ time | id) ,
weights = varIdent(form = ~ 1 | month.2),
control = list(singular.ok=T), method = "ML"
,na.action = na.omit)
model2 = gls(y ~ I(month.2==6) + I(month.2==12) +
I(month.2==20) + I(month.2==24),
corr=corSymm(, form= ~ time | id),
weights = varIdent(form = ~ 1 | month.2),
control = list(singular.ok=T), method="ML",
na.action = na.omit)
Goodness Of Fit
anova(model1,model2)
## Model df AIC BIC logLik Test L.Ratio p-value
## model1 1 19 3612.315 3685.287 -1787.157
## model2 2 15 3605.090 3662.700 -1787.545 1 vs 2 0.7756738 0.9417
5.1.6 Display the estimated 5 x 5 covariance and correlation matrices for the five repeated measurements of serum cholesterol.
cor(High_Dose )
## y1 y2 y3 y4 y5
## y1 1.000000 0.720338 NA NA NA
## y2 0.720338 1.000000 NA NA NA
## y3 NA NA 1 NA NA
## y4 NA NA NA 1 NA
## y5 NA NA NA NA 1
cov(High_Dose )
## y1 y2 y3 y4 y5
## y1 1573.262 1127.221 NA NA NA
## y2 1127.221 1556.483 NA NA NA
## y3 NA NA NA NA NA
## y4 NA NA NA NA NA
## y5 NA NA NA NA NA
cor(Placebo)
## y1 y2 y3 y4 y5
## y1 1.0000000 0.8161257 NA NA NA
## y2 0.8161257 1.0000000 NA NA NA
## y3 NA NA 1 NA NA
## y4 NA NA NA 1 NA
## y5 NA NA NA NA 1
cov(Placebo)
## y1 y2 y3 y4 y5
## y1 3121.970 2245.363 NA NA NA
## y2 2245.363 2424.545 NA NA NA
## y3 NA NA NA NA NA
## y4 NA NA NA NA NA
## y5 NA NA NA NA NA
6.1 In a study of weight gain (Box, 1950) investigators randomly assigned 30 rats to three treatment groups: treatment 1 was a control (no additive); treatments 2 and 3 consisted of two different additives (thiouracil and thyroxin respectively) to the rats drinking water. Weight, in grams, was measured at baseline (week 0) and at weeks 1, 2, 3, and 4. Due to an accident at the beginning of the study, data on 3 rats from the thyroxin group are unavailable. The raw data are stored in an external file: rat.dat Each row of the data set contains the following seven variables:
ID Group Y1 Y2 Y3 Y4 Y5
Note: The variable Group is coded 1 = control, 2 = thiouracil, and 3 = thyroxin.
library(readr)
library(tidyverse)
library(nlme)
library(foreign)
rat <- read_table("E:/University(3)/Longitudinal/EXE/rat.txt",
col_names = FALSE)
##
## ── Column specification ────────────────────────────────────────────────────────
## cols(
## X1 = col_double(),
## X2 = col_double(),
## X3 = col_double(),
## X4 = col_double(),
## X5 = col_double(),
## X6 = col_double(),
## X7 = col_double()
## )
names(rat)=c("ID","group","y1","y2","y3","y4","y5")
head(rat)
## # A tibble: 6 × 7
## ID group y1 y2 y3 y4 y5
## <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 1 1 57 86 114 139 172
## 2 2 1 60 93 123 146 177
## 3 3 1 52 77 111 144 185
## 4 4 1 49 67 100 129 164
## 5 5 1 56 81 104 121 151
## 6 6 1 46 70 102 131 153
6.1.1 On a single graph, construct a time plot that displays the mean weight versus time (in weeks) for the three groups. Describe the general characteristics of the time trends for the three groups.
control = rat %>%
filter (group==1) %>%
select (y1,y2,y3,y4,y5) %>%
summarise_all (list (~mean(.)))
thiouracil = rat %>%
filter (group==2) %>%
select (y1,y2,y3,y4,y5) %>%
summarise_all (list (~mean(.)))
thyroxin = rat %>%
filter (group==3) %>%
select (y1,y2,y3,y4,y5) %>%
summarise_all (list (~mean(.)))
c.mean.t = ts (as.numeric(control) ,0 ,frequency = 0.5)
thi.mean.t = ts (as.numeric(thiouracil) ,0 ,frequency = 0.5)
thy.mean.t = ts (as.numeric(thyroxin) ,0 ,frequency = 0.5)
plot (c.mean.t,xlab="Time(week)" ,ylab="mean weight",lwd=2,col="red2")
lines(thi.mean.t,xlab="Time(week)",ylab="mean weight",lwd=2.5,lty=4,col="blue")
lines(thy.mean.t,xlab="Time(week)",ylab="mean weight",lwd=2.5,lty=5)
legend("topleft",c("control","thiouracil","thyroxin"),
col = c("red2","blue","black"), lty = c(1,4,5), lwd=2.5)
6.1.2 Read the data from the external file and put the data in a “univariate” or “long” format, with five “records” per subject.
str(rat) ### univariate
## spc_tbl_ [27 × 7] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
## $ ID : num [1:27] 1 2 3 4 5 6 7 8 9 10 ...
## $ group: num [1:27] 1 1 1 1 1 1 1 1 1 1 ...
## $ y1 : num [1:27] 57 60 52 49 56 46 51 63 49 57 ...
## $ y2 : num [1:27] 86 93 77 67 81 70 71 91 67 82 ...
## $ y3 : num [1:27] 114 123 111 100 104 102 94 112 90 110 ...
## $ y4 : num [1:27] 139 146 144 129 121 131 110 130 112 139 ...
## $ y5 : num [1:27] 172 177 185 164 151 153 141 154 140 169 ...
## - attr(*, "spec")=
## .. cols(
## .. X1 = col_double(),
## .. X2 = col_double(),
## .. X3 = col_double(),
## .. X4 = col_double(),
## .. X5 = col_double(),
## .. X6 = col_double(),
## .. X7 = col_double()
## .. )
rat2 = reshape(data.frame(rat), idvar="ID",
varying=c("y1","y2","y3","y4","y5"),
v.names="y", timevar="time", time=1:5,
direction="long")
head(rat2) ### Long
## ID group time y
## 1.1 1 1 1 57
## 2.1 2 1 1 60
## 3.1 3 1 1 52
## 4.1 4 1 1 49
## 5.1 5 1 1 56
## 6.1 6 1 1 46
6.1.3 Assume that the rate of increase in each group is approximately constant throughout the duration of the study. Assuming an unstructured covariance matrix, construct a test of whether the rate of increase differs in the groups.
attach(rat2)
## The following objects are masked from data2:
##
## ID, time, y
week <- time
week[time==0] <- 1
week[time==1] <- 2
week[time==2] <- 3
week[time==3] <- 4
week[time==4] <- 5
model <- gls(y ~ group*time, corr=corSymm( form= ~ time | ID),
weights = varIdent(form = ~ 1 | time),
control = list(singular.ok=T))
summary(model)
## Generalized least squares fit by REML
## Model: y ~ group * time
## Data: NULL
## AIC BIC logLik
## 870.3813 925.0101 -416.1907
##
## Correlation Structure: General
## Formula: ~time | ID
## Parameter estimate(s):
## Correlation:
## 1 2 3 4
## 2 0.861
## 3 0.616 0.771
## 4 0.368 0.473 0.872
## 5 0.219 0.311 0.785 0.955
## Variance function:
## Structure: Different standard deviations per stratum
## Formula: ~1 | time
## Parameter estimates:
## 1 2 3 4 5
## 1.000000 1.790909 2.317159 3.746176 5.232001
##
## Coefficients:
## Value Std.Error t-value p-value
## (Intercept) 27.475521 2.297648 11.958106 0.0000
## group 2.577627 1.123120 2.295059 0.0233
## time 25.860771 2.047309 12.631593 0.0000
## group:time -1.442876 1.000751 -1.441793 0.1517
##
## Correlation:
## (Intr) group time
## group -0.923
## time -0.525 0.484
## group:time 0.484 -0.525 -0.923
##
## Standardized residuals:
## Min Q1 Med Q3 Max
## -2.2049350 -0.8759787 -0.1456332 0.6748131 1.9839658
##
## Residual standard error: 4.610488
## Degrees of freedom: 135 total; 131 residual
6.1.4 On a single graph, construct a time plot that displays the estimated mean weight versus time (in weeks) for the three treatment groups from the results generated from Problem 6.1.3.
with(rat2, interaction.plot(time, group, predict(model)))
Linear Trend Model (ML Estimation)
model1 <- gls(y ~ group*time, corr=corSymm( form= ~ time | ID),
weights = varIdent(form = ~ 1 | time),method="ML",
control = list(singular.ok=T))
summary(model1)
## Generalized least squares fit by maximum likelihood
## Model: y ~ group * time
## Data: NULL
## AIC BIC logLik
## 876.4359 931.6361 -419.2179
##
## Correlation Structure: General
## Formula: ~time | ID
## Parameter estimate(s):
## Correlation:
## 1 2 3 4
## 2 0.862
## 3 0.610 0.758
## 4 0.353 0.446 0.864
## 5 0.199 0.280 0.775 0.953
## Variance function:
## Structure: Different standard deviations per stratum
## Formula: ~1 | time
## Parameter estimates:
## 1 2 3 4 5
## 1.000000 1.800697 2.304840 3.767945 5.286570
##
## Coefficients:
## Value Std.Error t-value p-value
## (Intercept) 27.475539 2.2444221 12.241698 0.0000
## group 2.577616 1.0971025 2.349476 0.0203
## time 25.860756 1.9998822 12.931140 0.0000
## group:time -1.442867 0.9775682 -1.475976 0.1423
##
## Correlation:
## (Intr) group time
## group -0.923
## time -0.525 0.484
## group:time 0.484 -0.525 -0.923
##
## Standardized residuals:
## Min Q1 Med Q3 Max
## -2.2899581 -0.9048518 -0.1512489 0.6924827 2.0604666
##
## Residual standard error: 4.463033
## Degrees of freedom: 135 total; 131 residual
7.1 In a study of dental growth, measurements of the distance (mm) from the center of the pituitary gland to the pteryomaxillary fissure were obtained on 11 girls and 16 boys at ages 8, 10, 12, and 14 (Potthoff and Roy, 1964). The raw data are stored in an external file: dental.dat Each row of the data set contains the following six variables:
ID Gender Y1 Y2 Y3 Y4
Note: The categorical (character) variable Gender is coded F = Female, M = Male. The third measure (at age 12) on subject ID= 20 is a potential outlier.
library(readr)
library(nlme)
dental_dat <- read_table("C:/Users/mcf/Desktop/dental.dat.txt",
col_names = FALSE)
##
## ── Column specification ────────────────────────────────────────────────────────
## cols(
## X1 = col_double(),
## X2 = col_character(),
## X3 = col_double(),
## X4 = col_double(),
## X5 = col_double(),
## X6 = col_double()
## )
names(dental_dat)= c("ID" , "group" , "y1" , "y2" , "y3" , "y4")
head(dental_dat)
## # A tibble: 6 × 6
## ID group y1 y2 y3 y4
## <dbl> <chr> <dbl> <dbl> <dbl> <dbl>
## 1 1 F 21 20 21.5 23
## 2 2 F 21 21.5 24 25.5
## 3 3 F 20.5 24 24.5 26
## 4 4 F 23.5 24.5 25 26.5
## 5 5 F 21.5 23 22.5 23.5
## 6 6 F 20 21 21 22.5
dim(dental_dat)
## [1] 27 6
boxplot( dental_dat $y3 , col = "yellow2" , pch=19)
outlier=which(dental_dat $y3 == boxplot.stats(dental_dat $y3) $ out[1])
outlier
## [1] 20 21
7.1.1 On a single graph, construct a time plot that displays the mean distance (mm) versus age (in years) for boys and girls. Describe the time trends for boys and girls.
library(tidyverse)
Female = dental_dat %>%
filter (group=="F") %>%
select (y1,y2,y3,y4) %>%
summarise_all (list (~mean(.))); Female= as.numeric(Female) ;Female
## [1] 21.18182 22.22727 23.09091 24.09091
Male = dental_dat %>%
filter (group=="M") %>%
select (y1,y2,y3,y4) %>%
summarise_all (list (~mean(.))); Male= as.numeric(Male) ;Male
## [1] 22.87500 23.81250 25.71875 27.46875
F.mean.t = ts (Female ,0 ,frequency = 0.5)
M.mean.t = ts (Male ,0 ,frequency = 0.5)
plot (F.mean.t,xlab=" age (in years)" ,ylab="Mean distance (mm) ",lwd=2)
par(new=T)
plot(M.mean.t,xlab=" age (in years)",ylab="Mean distance (mm)"
,col="red3",lwd=2,lty=2)
legend ("topleft",c("Female","Male"), lty = c(1,2), lwd = 2)
7.1.2 Read the data from the external file and put the data in a “univariate” or “long” format, with four “records” per subject.
dental_dat2 = reshape(data.frame(dental_dat), idvar="id",
varying=c("y1","y2","y3","y4"),
v.names="y", timevar="time", time=1:4,direction="long")
head(dental_dat2)
## ID group time y id
## 1.1 1 F 1 21.0 1
## 2.1 2 F 1 21.0 2
## 3.1 3 F 1 20.5 3
## 4.1 4 F 1 23.5 4
## 5.1 5 F 1 21.5 5
## 6.1 6 F 1 20.0 6
Unstructured Covariance (REML Estimation)
attach(dental_dat2)
## The following objects are masked from rat2:
##
## group, ID, time, y
## The following objects are masked from data2:
##
## id, ID, time, y
year = time*2
year.f = factor(year, c(2,4,6,8) )
group.f = factor(group, c("F","M"))
newtime = time
newtime[time==1] <- 8
newtime[time==2] <- 10
newtime[time==3] <- 12
newtime[time==4] <- 14
newtime.f = factor(newtime,c(8,10,12,14))
model1=gls(y ~ group.f+year.f+group.f:year.f, corr=corSymm( form= ~1 | id),
weights = varIdent(form = ~ 1 | newtime),
control = list(singular.ok=T))
summary(model1)
## Generalized least squares fit by REML
## Model: y ~ group.f + year.f + group.f:year.f
## Data: NULL
## AIC BIC logLik
## 450.0348 496.9279 -207.0174
##
## Correlation Structure: General
## Formula: ~1 | id
## Parameter estimate(s):
## Correlation:
## 1 2 3
## 2 0.571
## 3 0.661 0.563
## 4 0.522 0.726 0.728
## Variance function:
## Structure: Different standard deviations per stratum
## Formula: ~1 | newtime
## Parameter estimates:
## 8 10 12 14
## 1.0000000 0.8790601 1.0918305 0.9595050
##
## Coefficients:
## Value Std.Error t-value p-value
## (Intercept) 21.181818 0.7016511 30.188534 0.0000
## group.fM 1.693182 0.9114715 1.857635 0.0662
## year.f4 1.045455 0.6154516 1.698679 0.0925
## year.f6 1.909091 0.6068358 3.145976 0.0022
## year.f8 2.909091 0.6728998 4.323216 0.0000
## group.fM:year.f4 -0.107955 0.7994951 -0.135028 0.8929
## group.fM:year.f6 0.934659 0.7883028 1.185660 0.2386
## group.fM:year.f8 1.684659 0.8741224 1.927258 0.0568
##
## Correlation:
## (Intr) grp.fM yer.f4 yer.f6 yer.f8 g.M:.4 g.M:.6
## group.fM -0.770
## year.f4 -0.568 0.437
## year.f6 -0.321 0.247 0.418
## year.f8 -0.521 0.401 0.726 0.651
## group.fM:year.f4 0.437 -0.568 -0.770 -0.321 -0.559
## group.fM:year.f6 0.247 -0.321 -0.321 -0.770 -0.501 0.418
## group.fM:year.f8 0.401 -0.521 -0.559 -0.501 -0.770 0.726 0.651
##
## Standardized residuals:
## Min Q1 Med Q3 Max
## -2.52458687 -0.64159832 -0.07813035 0.59313659 2.07856661
##
## Residual standard error: 2.327113
## Degrees of freedom: 108 total; 100 residual
model1.1 = gls(y ~ group.f+year.f, corr=corSymm( form= ~1 | id),
weights = varIdent(form = ~ 1 | newtime),
control = list(singular.ok=T))
summary(model1.1)
## Generalized least squares fit by REML
## Model: y ~ group.f + year.f
## Data: NULL
## AIC BIC logLik
## 454.8186 494.3395 -212.4093
##
## Correlation Structure: General
## Formula: ~1 | id
## Parameter estimate(s):
## Correlation:
## 1 2 3
## 2 0.586
## 3 0.654 0.554
## 4 0.488 0.667 0.732
## Variance function:
## Structure: Different standard deviations per stratum
## Formula: ~1 | newtime
## Parameter estimates:
## 8 10 12 14
## 1.0000000 0.8853779 1.0937024 1.0000943
##
## Coefficients:
## Value Std.Error t-value p-value
## (Intercept) 20.973234 0.6239732 33.61240 0.0000
## group.fM 2.045167 0.7361456 2.77821 0.0065
## year.f4 0.981481 0.3853455 2.54702 0.0123
## year.f6 2.462963 0.3903431 6.30974 0.0000
## year.f8 3.907407 0.4513638 8.65689 0.0000
##
## Correlation:
## (Intr) grp.fM yer.f4 yer.f6
## group.fM -0.699
## year.f4 -0.398 0.000
## year.f6 -0.233 0.000 0.400
## year.f8 -0.362 0.000 0.667 0.674
##
## Standardized residuals:
## Min Q1 Med Q3 Max
## -2.596132450 -0.600375896 -0.007937714 0.531696177 2.176598239
##
## Residual standard error: 2.318218
## Degrees of freedom: 108 total; 103 residual
Autoregressive Covariance (REML Estimation)
model2 = gls(y ~ group.f*year.f, corr=corAR1(form= ~ newtime | id))
summary(model2)
## Generalized least squares fit by REML
## Model: y ~ group.f * year.f
## Data: NULL
## AIC BIC logLik
## 490.4908 516.5425 -235.2454
##
## Correlation Structure: ARMA(1,0)
## Formula: ~newtime | id
## Parameter estimate(s):
## Phi1
## 0
##
## Coefficients:
## Value Std.Error t-value p-value
## (Intercept) 21.181818 0.6915349 30.630149 0.0000
## group.fM 1.693182 0.8983302 1.884810 0.0624
## year.f4 1.045455 0.9779781 1.068996 0.2876
## year.f6 1.909091 0.9779781 1.952079 0.0537
## year.f8 2.909091 0.9779781 2.974597 0.0037
## group.fM:year.f4 -0.107955 1.2704308 -0.084975 0.9325
## group.fM:year.f6 0.934659 1.2704308 0.735702 0.4636
## group.fM:year.f8 1.684659 1.2704308 1.326053 0.1878
##
## Correlation:
## (Intr) grp.fM yer.f4 yer.f6 yer.f8 g.M:.4 g.M:.6
## group.fM -0.770
## year.f4 -0.707 0.544
## year.f6 -0.707 0.544 0.500
## year.f8 -0.707 0.544 0.500 0.500
## group.fM:year.f4 0.544 -0.707 -0.770 -0.385 -0.385
## group.fM:year.f6 0.544 -0.707 -0.385 -0.770 -0.385 0.500
## group.fM:year.f8 0.544 -0.707 -0.385 -0.385 -0.770 0.500 0.500
##
## Standardized residuals:
## Min Q1 Med Q3 Max
## -2.56151797 -0.60972303 -0.07927328 0.61436795 2.30264116
##
## Residual standard error: 2.293562
## Degrees of freedom: 108 total; 100 residual
Exponential Covariance (REML Estimation)
model3 = gls(y ~ group.f*year.f, corr=corExp(form= ~ year | id))
summary(model3)
## Generalized least squares fit by REML
## Model: y ~ group.f * year.f
## Data: NULL
## AIC BIC logLik
## 454.5472 480.5989 -217.2736
##
## Correlation Structure: Exponential spatial correlation
## Formula: ~year | id
## Parameter estimate(s):
## range
## 4.117766
##
## Coefficients:
## Value Std.Error t-value p-value
## (Intercept) 21.181818 0.6906162 30.670898 0.0000
## group.fM 1.693182 0.8971367 1.887317 0.0620
## year.f4 1.045455 0.6058037 1.725732 0.0875
## year.f6 1.909091 0.7699348 2.479549 0.0148
## year.f8 2.909091 0.8554108 3.400811 0.0010
## group.fM:year.f4 -0.107955 0.7869621 -0.137179 0.8912
## group.fM:year.f6 0.934659 1.0001747 0.934496 0.3523
## group.fM:year.f8 1.684659 1.1112113 1.516057 0.1327
##
## Correlation:
## (Intr) grp.fM yer.f4 yer.f6 yer.f8 g.M:.4 g.M:.6
## group.fM -0.770
## year.f4 -0.439 0.338
## year.f6 -0.557 0.429 0.635
## year.f8 -0.619 0.477 0.488 0.727
## group.fM:year.f4 0.338 -0.439 -0.770 -0.489 -0.376
## group.fM:year.f6 0.429 -0.557 -0.489 -0.770 -0.560 0.635
## group.fM:year.f8 0.477 -0.619 -0.376 -0.560 -0.770 0.488 0.727
##
## Standardized residuals:
## Min Q1 Med Q3 Max
## -2.56492566 -0.61053417 -0.07937874 0.61518526 2.30570445
##
## Residual standard error: 2.290515
## Degrees of freedom: 108 total; 100 residual
Goodness Of Fit
anova(model1,model2)
## Model df AIC BIC logLik Test L.Ratio p-value
## model1 1 18 450.0348 496.9279 -207.0174
## model2 2 10 490.4908 516.5425 -235.2454 1 vs 2 56.45605 <.0001
anova(model1,model3)
## Model df AIC BIC logLik Test L.Ratio p-value
## model1 1 18 450.0348 496.9279 -207.0174
## model3 2 10 454.5472 480.5989 -217.2736 1 vs 2 20.51236 0.0086
anova(model2,model3)
## Model df AIC BIC logLik
## model2 1 10 490.4908 516.5425 -235.2454
## model3 2 10 454.5472 480.5989 -217.2736
7.1.7 On a single graph, construct a time plot that displays the estimated mean distance (mm) versus age (in years) for boys and girls from the results generated from Problem 7 .1.6.
new.y = predict(model1)
interaction.plot(year.f , group.f ,new.y , lwd=2 , legend = F,
xlab = " age (in years) " , main="time plot" ,
ylab = " the estimated mean distance (mm)")
legend("topleft", c("Female","Male"), lty = c(2,1), lwd = 2)
Remove outliers
detach(dental_dat2 )
dental_dat [c(20,21) ,5] <- NA
dental_dat2_new = reshape(data.frame(dental_dat), idvar="id",
varying=c("y1","y2","y3","y4"),
v.names="y2", timevar="time", time=1:4,direction="long")
head(dental_dat2_new)
## ID group time y2 id
## 1.1 1 F 1 21.0 1
## 2.1 2 F 1 21.0 2
## 3.1 3 F 1 20.5 3
## 4.1 4 F 1 23.5 4
## 5.1 5 F 1 21.5 5
## 6.1 6 F 1 20.0 6
Unstructured Covariance (REML Estimation) 2
attach(dental_dat2_new)
## The following objects are masked from rat2:
##
## group, ID, time
## The following objects are masked from data2:
##
## id, ID, time
yearr = time*2
year.ff = factor(year, c(2,4,6,8) )
group.ff = factor(group, c("F","M"))
new_time = time
new_time[time==1] <- 8
new_time[time==2] <- 10
new_time[time==3] <- 12
new_time[time==4] <- 14
newtime.ff = factor(new_time,c(8,10,12,14))
model11 = gls(y2 ~ group.ff*year.ff, corr=corSymm( form= ~1 | id),
weights = varIdent(form = ~ 1 | newtime), na.action =na.omit,
control = list(singular.ok=T))
summary(model11)
## Generalized least squares fit by REML
## Model: y2 ~ group.ff * year.ff
## Data: NULL
## AIC BIC logLik
## 407.9755 454.505 -185.9878
##
## Correlation Structure: General
## Formula: ~1 | id
## Parameter estimate(s):
## Correlation:
## 1 2 3
## 2 0.583
## 3 0.717 0.818
## 4 0.514 0.713 0.872
## Variance function:
## Structure: Different standard deviations per stratum
## Formula: ~1 | newtime
## Parameter estimates:
## 8 10 12 14
## 1.0000000 0.8742800 0.9308864 0.9162081
##
## Coefficients:
## Value Std.Error t-value p-value
## (Intercept) 21.181818 0.7125597 29.726376 0.0000
## group.ffM 1.666235 0.9225771 1.806065 0.0740
## year.ff4 1.045455 0.6147694 1.700564 0.0922
## year.ff6 1.909091 0.5198495 3.672391 0.0004
## year.ff8 2.909091 0.6753461 4.307555 0.0000
## group.ffM:year.ff4 -0.093180 0.7975420 -0.116834 0.9072
## group.ffM:year.ff6 0.380969 0.6710294 0.567738 0.5715
## group.ffM:year.ff8 1.670359 0.8588684 1.944837 0.0547
##
## Correlation:
## (Intr) grp.fM yr.ff4 yr.ff6 yr.ff8 g.M:.4 g.M:.6
## group.ffM -0.772
## year.ff4 -0.568 0.439
## year.ff6 -0.456 0.352 0.776
## year.ff8 -0.559 0.431 0.722 0.876
## group.ffM:year.ff4 0.438 -0.566 -0.771 -0.598 -0.557
## group.ffM:year.ff6 0.353 -0.445 -0.601 -0.775 -0.679 0.772
## group.ffM:year.ff8 0.439 -0.551 -0.568 -0.689 -0.786 0.725 0.869
##
## Standardized residuals:
## Min Q1 Med Q3 Max
## -2.47453539 -0.62049943 -0.07693425 0.63514408 2.23144808
##
## Residual standard error: 2.363293
## Degrees of freedom: 106 total; 98 residual
8.1 In a study of exercise therapies, 37 patients were assigned to one of two weightlifting programs (Freund at al., 1986). In the first program (treatment 1), the number of repetitions was increased as subjects became stronger. In the second program (treatment 2), the number of repetitions was fixed but the weight was increased as subjects became stronger. Measures of strength were taken at baseline (day 0), and on days 2, 4, 6, 8, 10, and 12. The raw data are stored in an external file: exercise.dat Each row of the data set contains the following nine variables:
ID Treatment Y1 Y2 Y3 Y4 Y5 Y6 Y7
Note: The categorical variable Treatment is coded 1 = Program 1 (increase number of repetitions), 2 = Program 2 (increase amount of weight).
library(tidyverse)
library(nlme)
library(lme4)
## Warning: package 'lme4' was built under R version 4.2.2
## Loading required package: Matrix
## Warning: package 'Matrix' was built under R version 4.2.2
##
## Attaching package: 'Matrix'
## The following objects are masked from 'package:tidyr':
##
## expand, pack, unpack
##
## Attaching package: 'lme4'
## The following object is masked from 'package:nlme':
##
## lmList
library(insight)
## Warning: package 'insight' was built under R version 4.2.2
exercise.dat <- read.table("E:/University(3)/Longitudinal/EXE/exercise.dat.txt"
, quote="\"", comment.char="", na.strings=".")
names(exercise.dat) = c("ID","trt","y1","y2","y3","y4","y5","y6","y7")
head(exercise.dat)
## ID trt y1 y2 y3 y4 y5 y6 y7
## 1 1 1 79 NA 79 80 80 78 80
## 2 2 1 83 83 85 85 86 87 87
## 3 3 1 81 83 82 82 83 83 82
## 4 4 1 81 81 81 82 82 83 81
## 5 5 1 80 81 82 82 82 NA 86
## 6 6 1 76 76 76 76 76 76 75
8.1.1 On a single graph, construct a time plot that displays the mean strength versus time (in days) for the two treatment groups. Describe the general characteristics of the time trends for the two exercise programs.
program.1 = exercise.dat %>%
filter(trt == 1) %>%
select(y1, y2, y3, y4, y5, y6, y7)
program.2 = exercise.dat %>%
filter(trt == 2) %>%
select(y1, y2, y3, y4, y5, y6, y7)
mean.program1= apply(program.1,2,mean,na.rm=T) ; mean.program1
## y1 y2 y3 y4 y5 y6 y7
## 79.68750 80.66667 80.81250 81.33333 80.80000 81.07692 81.06667
mean.program2= apply(program.2,2,mean,na.rm=T) ; mean.program2
## y1 y2 y3 y4 y5 y6 y7
## 81.04762 81.66667 81.90000 82.61905 82.73684 82.35294 82.53333
p1.mean.t = ts (mean.program1 ,0 ,frequency = 0.5)
p2.mean.t = ts (mean.program2 ,0 ,frequency = 0.5)
plot (p1.mean.t ,xlab="time (in days)" ,ylab="mean strength",lwd=3,
col="orange2")
par(new=T)
plot(p2.mean.t,xlab="time (in days)" ,ylab="mean strength" ,lwd=3,lty=2,
col="blue")
legend (x=8 ,y=81.5 ,c("program1","program2"), lty = c(1,2), lwd = 3,
col=c("orange2","blue"))
8.1.2 Read the data from the external file and put the data in a “univariate” or “long” format, with 7 “records” per patient.
str( exercise.dat )
## 'data.frame': 37 obs. of 9 variables:
## $ ID : int 1 2 3 4 5 6 7 8 9 10 ...
## $ trt: int 1 1 1 1 1 1 1 1 1 1 ...
## $ y1 : int 79 83 81 81 80 76 81 77 84 74 ...
## $ y2 : int NA 83 83 81 81 76 84 78 85 75 ...
## $ y3 : int 79 85 82 81 82 76 83 79 87 78 ...
## $ y4 : int 80 85 82 82 82 76 83 79 89 78 ...
## $ y5 : int 80 86 83 82 82 76 85 81 NA 79 ...
## $ y6 : int 78 87 83 83 NA 76 85 82 NA 78 ...
## $ y7 : int 80 87 82 81 86 75 85 81 86 78 ...
exercise.dat2= reshape(exercise.dat, varying=c("y1","y2","y3","y4","y5","y6"
,"y7"),v.names="y", timevar="time",idvar="ID",
time=1:7, direction="long")
head(exercise.dat2)
## ID trt time y
## 1.1 1 1 1 79
## 2.1 2 1 1 83
## 3.1 3 1 1 81
## 4.1 4 1 1 81
## 5.1 5 1 1 80
## 6.1 6 1 1 76
8.1.3 Fit a model with randomly varying intercepts and slopes, and allow the mean values of the intercept and slope to depend on treatment group (i.e., include main effect of treatment, a linear time trend, and a treatment by linear time trend interaction as fixed effects).
model.1 <- lme(y ~ time + trt ,
random= ~ time | ID , na.action=na.omit ,data= exercise.dat2)
summary(model.1)
## Linear mixed-effects model fit by REML
## Data: exercise.dat2
## AIC BIC logLik
## 828.1876 852.4344 -407.0938
##
## Random effects:
## Formula: ~time | ID
## Structure: General positive-definite, Log-Cholesky parametrization
## StdDev Corr
## (Intercept) 3.1861438 (Intr)
## time 0.3669096 -0.142
## Residual 0.8156985
##
## Fixed effects: y ~ time + trt
## Value Std.Error DF t-value p-value
## (Intercept) 78.59975 1.7459386 201 45.01862 0.0000
## time 0.29216 0.0664072 201 4.39946 0.0000
## trt 1.20172 1.0599179 35 1.13379 0.2646
## Correlation:
## (Intr) time
## time -0.063
## trt -0.952 -0.001
##
## Standardized Within-Group Residuals:
## Min Q1 Med Q3 Max
## -1.92411003 -0.61602464 -0.04444255 0.52480134 3.24177987
##
## Number of Observations: 239
## Number of Groups: 37
(a) What is the estimated variance of the random intercepts?
(v.i=get_variance_intercept(model.1))
## var.intercept.ID
## 10.15151
(b) What is the estimated variance of the random slopes?
(v.s=get_variance_slope(model.1))
## var.slope.ID.time
## 0.1346226
(c) What is the estimated correlation between the random intercepts and slopes?
(corr.s.i=get_correlation_slope_intercept(model.1))
## cor.slope_intercept.ID
## -0.142
(d) Give an interpretation to the magnitude of the estimated variance of the random intercepts.
v.r=get_variance_residual(model.1)
(v.i/(v.i+v.s+v.r))*100
## var.intercept.ID
## 92.69519
(e) Give an interpretation to the magnitude of the estimated variance of the random slopes.
(v.s/(v.i+v.s+v.r))*100
## var.slope.ID.time
## 1.229262
8.1.4 Is a model with only randomly varying intercepts defensible? Explain?
model.1.1 <- lme(y ~ time + trt ,
random= ~ 1 | ID , na.action=na.omit ,data= exercise.dat2)
anova(model.1,model.1.1)
## Model df AIC BIC logLik Test L.Ratio p-value
## model.1 1 7 828.1876 852.4344 -407.0938
## model.1.1 2 5 885.7476 903.0668 -437.8738 1 vs 2 61.56007 <.0001
8.1.5 What are the mean intercept and slope in the two exercise programs.
program.1.1= reshape(program.1, varying=c("y1","y2","y3","y4","y5","y6","y7"),
v.names="y", timevar="time",idvar="ID",
time=1:7, direction="long")
program.2.2= reshape(program.2, varying=c("y1","y2","y3","y4","y5","y6","y7"),
v.names="y", timevar="time",idvar="ID",
time=1:7, direction="long")
model.2 <- lme(y ~ time ,
random= ~ time | ID , na.action=na.omit ,data=program.1.1)
model.3 <- lme(y ~ time ,
random= ~ time | ID , na.action=na.omit ,data=program.2.2)
m.intercept =
mean(c(get_variance_intercept(model.2),get_variance_intercept(model.3)))
m.intercept
## [1] 10.08788
m.slope =
mean(c(get_variance_slope(model.2) ,get_variance_slope(model.3)) )
m.slope
## [1] 0.1408707
8.1.7 What is the estimate ofVar(Yij | bi)? What is the estimate ofVar(Yij)? Explain the difference.
get_variance_fixed(model.1)
## var.fixed
## 0.6696537
get_variance_random(model.1)
## var.random
## 11.39212
8.2 AIDS Clinical Trial Group (ACTG) study 193A was a randomized, doubleblind, study of AIDS patients with advanced immune suppression (CD4 counts of <= 50 cells/mm^3 ) (Henry et al., 1998). Patients were randomized to one of four daily regimens containing 600 mg of zidovudine:
In the analyses reported in Section 8.8, the first three treatment groups were combined and compared to the fourth. Measurements of CD4 counts were scheduled to be collected at baseline and at 8week intervals during follow-up. However, the CD4 count data are unbalanced due to mistimed measurements and missing data that resulted from skipped visits and dropout. The number of measurements of CD4 counts during the first 48 weeks of follow-up varied from 1 to 9, with a median of 4. CD4 count refers to the number of T-lymphocyte cells in the body; these cells are directly affected by the HIV virus. A normal CD4 count is approximately 800 to 1000; a CD4 count below 200 is one of the diagnostic criteria for AIDS established by the Centers for Disease Control and Prevention (CDC). The raw data are stored in an external file: cd4.dat Each row of the data set contains the following six variables:
ID Group Age Gender Week Log(CD4 + 1)
Note: The categorical variable Group is coded 1 = zidovudine alternating monthly with 400 mg didanosine, 2 = zidovudine plus 2.25 mg of zalcitabine, 3 = zidovudine plus 400 mg of didanosine, and 4 = zidovudine plus 400 mg of didanosine plus 400 mg of nevirapine. The variable Week represents time since baseline (in weeks).
cd4.dat= read.table("E:/University(3)/Longitudinal/EXE/cd4.dat.txt"
, quote="\"", comment.char="")
names(cd4.dat)= c("ID", "Group", "Age", "Gender", "Week", "logcd4.1")
head(cd4.dat)
## ID Group Age Gender Week logcd4.1
## 1 1 2 36.4271 1 0.0000 3.135494
## 2 1 2 36.4271 1 7.5714 3.044522
## 3 1 2 36.4271 1 15.5714 2.772589
## 4 1 2 36.4271 1 23.5714 2.833213
## 5 1 2 36.4271 1 32.5714 3.218876
## 6 1 2 36.4271 1 40.0000 3.044522
8.2.1 On a single graph, construct a smoothed time plot that displays the mean log CD4 counts versus time (in weeks) for the four treatment groups. Describe the general characteristics of the time trends for the four groups.
with(cd4.dat, interaction.plot( Week , Group, logcd4.1 ,
col = c("black","blue","red","green"),
lwd =rep(2,4), lty=rep(1,4)))
8.2.2 Fit a model where each patient’s response trajectory is represented by a randomly varying piecewise linear spline with a knot at week 16. That is, fit a model with random intercepts and two randomly varying slopes, one slope for the changes in log CD4 counts before week 16, another slope for the changes in response after week 16. Allow the average slopes for changes in response before and after week 16 to vary by group, but assume the mean response at baseline is the same in the four groups.
attach(cd4.dat)
## The following object is masked from dental_dat2_new:
##
## ID
## The following object is masked from rat2:
##
## ID
## The following object is masked from data2:
##
## ID
trt = Group
trt[Group == 4] <- 1
trt[Group < 4] <- 0
Week16 = (Week-16)*I(Week>=16)
trt.Week = I(trt==1)*Week
trt.Week16 = I(trt==1)*Week16
fit1 = lme(logcd4.1 ~ Week + Week16 + trt.Week + trt.Week16,
random= ~1+ Week + Week16 | ID)
summary(fit1)
## Linear mixed-effects model fit by REML
## Data: NULL
## AIC BIC logLik
## 11958.97 12037.25 -5967.483
##
## Random effects:
## Formula: ~1 + Week + Week16 | ID
## Structure: General positive-definite, Log-Cholesky parametrization
## StdDev Corr
## (Intercept) 0.76534018 (Intr) Week
## Week 0.03038394 0.312
## Week16 0.03521895 -0.458 -0.858
## Residual 0.55331916
##
## Fixed effects: logcd4.1 ~ Week + Week16 + trt.Week + trt.Week16
## Value Std.Error DF t-value p-value
## (Intercept) 2.9414587 0.025620786 3723 114.80751 0e+00
## Week -0.0073438 0.001986836 3723 -3.69623 2e-04
## Week16 -0.0120392 0.003174434 3723 -3.79256 2e-04
## trt.Week 0.0268521 0.003847190 3723 6.97966 0e+00
## trt.Week16 -0.0277377 0.006198368 3723 -4.47499 0e+00
## Correlation:
## (Intr) Week Week16 trt.Wk
## Week -0.193
## Week16 0.102 -0.867
## trt.Week 0.001 -0.497 0.438
## trt.Week16 -0.001 0.434 -0.507 -0.870
##
## Standardized Within-Group Residuals:
## Min Q1 Med Q3 Max
## -4.36206263 -0.41946338 0.02941786 0.47734190 3.83987831
##
## Number of Observations: 5036
## Number of Groups: 1309
8.2.3 Is a model with only randomly varying intercepts defensible? Explain?
fit2 <- lme(logcd4.1 ~ Week + Week16 + trt.Week + trt.Week16,
random= ~1 | ID)
summary(fit2)
## Linear mixed-effects model fit by REML
## Data: NULL
## AIC BIC logLik
## 12176.5 12222.17 -6081.251
##
## Random effects:
## Formula: ~1 | ID
## (Intercept) Residual
## StdDev: 0.8716567 0.6143669
##
## Fixed effects: logcd4.1 ~ Week + Week16 + trt.Week + trt.Week16
## Value Std.Error DF t-value p-value
## (Intercept) 2.9409606 0.028931915 3723 101.65109 0e+00
## Week -0.0068451 0.001830086 3723 -3.74029 2e-04
## Week16 -0.0119270 0.003143212 3723 -3.79454 2e-04
## trt.Week 0.0265260 0.003380130 3723 7.84763 0e+00
## trt.Week16 -0.0284370 0.005989088 3723 -4.74813 0e+00
## Correlation:
## (Intr) Week Week16 trt.Wk
## Week -0.354
## Week16 0.238 -0.890
## trt.Week 0.001 -0.474 0.436
## trt.Week16 -0.001 0.423 -0.495 -0.889
##
## Standardized Within-Group Residuals:
## Min Q1 Med Q3 Max
## -4.29891028 -0.46446357 0.03593495 0.52390309 3.69754180
##
## Number of Observations: 5036
## Number of Groups: 1309
8.2.7 Using the estimates of the fixed effects from the previous analysis, construct a time plot that displays the estimated mean log CD4 counts versus time (in weeks) for the four treatment groups. Does the plot suggest that one treatment regimen is superior to the others in terms of short-term (40 weeks) changes in CD4 counts?
with(cd4.dat, interaction.plot(Week, Group, predict(fit1)))
9.1 In a U.S. Centers for Disease Control (CDC) study, conducted in the state of Georgia from 1980 to 1992, linked data on live births to the same mother were obtained (Adams et al., 1997). We focus on a subset of data restricted to 878 mothers for whom five births were identified; these data are from Neuhaus and Kalbfleisch (1998) and are reported in Pan (2002) and Rabe-Hesketh and Skrondal (2008). The main objective of the analysis is to determine the effect of maternal age (the mother’s age at each birth) on infant birth weight. Note that because each mother had five infant births, but at different maternal ages, maternal age is a within-subject or time-varying covariate. The raw data are stored in an external file: birthwt.dat Each row of the data set contains the following five variables:
MID Order Wt Age CID
Note: The outcome variable Wt records the infant birth weight measured in grams. The variable Order denotes the infant birth order (coded 1-5). The variable Age is the mother’s age (in years) at each of the five recorded births. Finally, the variables MID and CID denote the mother and child identifiers (IDs).
library(nlme)
library(lme4)
library(plm)
## Warning: package 'plm' was built under R version 4.2.2
##
## Attaching package: 'plm'
## The following objects are masked from 'package:dplyr':
##
## between, lag, lead
library(readr)
brith_dat= read_table("E:/University(3)/Longitudinal/EXE/brith.dat.txt",
col_names = FALSE)
##
## ── Column specification ────────────────────────────────────────────────────────
## cols(
## X1 = col_double(),
## X2 = col_double(),
## X3 = col_double(),
## X4 = col_double(),
## X5 = col_double()
## )
names(brith_dat)= c("MID", "Order", "Wt", "Age", "CID" )
head(brith_dat)
## # A tibble: 6 × 5
## MID Order Wt Age CID
## <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 80 1 3175 18 1
## 2 80 2 3572 21 2
## 3 80 3 3317 24 3
## 4 80 4 4281 26 4
## 5 80 5 3827 28 5
## 6 84 1 2892 14 6
9.1.1 Let Yij denote the birth weight (in grams) of the jth infant from the ith mother and Ageij denote the ith mother’s age at the time of the birth of her jth infant (for i = 1, … , 878; j = 1, … , 5). Fit the following standard mixed effects model with linear trend for maternal age and randomly varying intercepts,
Yij= B1 + B2 Ageij + bi + eij
where bi~ N(0, sigma2b) and eij ~ N(0,sigma2e).
attach(brith_dat)
## The following object is masked from cd4.dat:
##
## Age
y= Wt
model1= lme(y ~ Age ,
random= ~ 1 | MID , na.action=na.omit)
summary(model1)
## Linear mixed-effects model fit by REML
## Data: NULL
## AIC BIC logLik
## 67069.07 67094.62 -33530.53
##
## Random effects:
## Formula: ~1 | MID
## (Intercept) Residual
## StdDev: 354.9605 434.2334
##
## Fixed effects: y ~ Age
## Value Std.Error DF t-value p-value
## (Intercept) 2785.2542 44.99276 3511 61.90449 0
## Age 17.1383 1.98014 3511 8.65511 0
## Correlation:
## (Intr)
## Age -0.953
##
## Standardized Within-Group Residuals:
## Min Q1 Med Q3 Max
## -6.04660152 -0.46606378 0.05847938 0.56450997 3.12988230
##
## Number of Observations: 4390
## Number of Groups: 878
9.1.4
model2= lm (y ~ Age , na.action=na.omit)
summary(model2)
##
## Call:
## lm(formula = y ~ Age, na.action = na.omit)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3013.37 -296.45 30.04 357.94 2080.22
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2645.316 40.409 65.46 <2e-16 ***
## Age 23.602 1.825 12.93 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 559.9 on 4388 degrees of freedom
## Multiple R-squared: 0.03671, Adjusted R-squared: 0.03649
## F-statistic: 167.2 on 1 and 4388 DF, p-value: < 2.2e-16
9.1.8
mage= rep(tapply(Age,MID,mean),table(MID))
cage= Age - mage
model3= lme(y ~ mage + cage, random= ~ 1 | MID)
summary(model3)
## Linear mixed-effects model fit by REML
## Data: NULL
## AIC BIC logLik
## 67048.3 67080.24 -33519.15
##
## Random effects:
## Formula: ~1 | MID
## (Intercept) Residual
## StdDev: 351.8482 433.9342
##
## Fixed effects: y ~ mage + cage
## Value Std.Error DF t-value p-value
## (Intercept) 2499.108 80.69220 3511 30.970879 0
## mage 30.355 3.67406 876 8.261983 0
## cage 11.832 2.34256 3511 5.050863 0
## Correlation:
## (Intr) mage
## mage -0.986
## cage 0.000 0.000
##
## Standardized Within-Group Residuals:
## Min Q1 Med Q3 Max
## -6.00869342 -0.46679435 0.05271805 0.56461728 3.15143540
##
## Number of Observations: 4390
## Number of Groups: 878
9.1.11
phtest(plm(y~Age,data = brith_dat,index="MID",model = "within")
,plm(y~Age,data = brith_dat,index="MID",model = "random"))
##
## Hausman Test
##
## data: y ~ Age
## chisq = 18.246, df = 1, p-value = 1.942e-05
## alternative hypothesis: one model is inconsistent
10.1 In Section 8.8 we presented the results of analyses of a subset of the pulmonary function data collected in the Six Cities Study of Air Pollution and Health (Dockery et al., 1983). The data consist of measurements of FEV 1, height, and age obtained from a randomly selected subset of the female participants living in Topeka, Kansas. Specifically, we considered the following model for log(FEV 1):
E(Yij | bi)= B1 + B2 Ageij +B3 log(Ht)ij + B4 Agei1 + B5 log(Ht)i1 + b1i + b2i Ageij
where Yij is the log(FEV 1) for the ith child at the jth visit, Agei1 and log(Ht)i1 are the initial or baseline age and log(height) for the ith child. In this exercise we examine the residuals from the fitted model to assess the overall adequacy of the model for the mean response. The raw data are stored in an external file: fev1.dat Each row of the data set contains the following six variables:
ID Height Age Initial Height Initial Age log(FEV 1)
library(foreign)
library(nlme)
library(readr)
fev1.dat <- read.table("E:/University(3)/Longitudinal/EXE/fev1.dat.txt",
quote="\"", comment.char="")
names(fev1.dat)= c("id" , "height" , "age" , "baseheight" , "baseage"
,"logfev1_dat")
head(fev1.dat)
## id height age baseheight baseage logfev1_dat
## 1 1 1.20 9.3415 1.2 9.3415 0.21511
## 2 1 1.28 10.3929 1.2 9.3415 0.37156
## 3 1 1.33 11.4524 1.2 9.3415 0.48858
## 4 1 1.42 12.4600 1.2 9.3415 0.75142
## 5 1 1.48 13.4182 1.2 9.3415 0.83291
## 6 1 1.50 15.4743 1.2 9.3415 0.89200
attach(fev1.dat)
## The following object is masked from dental_dat2_new:
##
## id
## The following object is masked from data2:
##
## id
loght= log(height)
logbht= log(baseheight)
10.1.1 linear.mixed.effegt.model
model1= lme(logfev1_dat ~ age +loght +baseage +logbht ,random= ~age | id)
summary(model1)
## Linear mixed-effects model fit by REML
## Data: NULL
## AIC BIC logLik
## -4484.09 -4433.732 2251.045
##
## Random effects:
## Formula: ~age | id
## Structure: General positive-definite, Log-Cholesky parametrization
## StdDev Corr
## (Intercept) 0.123425936 (Intr)
## age 0.007067898 -0.545
## Residual 0.060428739
##
## Fixed effects: logfev1_dat ~ age + loght + baseage + logbht
## Value Std.Error DF t-value p-value
## (Intercept) -0.2692104 0.04240063 1692 -6.34921 0.0000
## age 0.0234924 0.00139949 1692 16.78642 0.0000
## loght 2.2406382 0.04370642 1692 51.26566 0.0000
## baseage -0.0237636 0.00812466 297 -2.92487 0.0037
## logbht 0.3679824 0.15822225 297 2.32573 0.0207
## Correlation:
## (Intr) age loght baseag
## age 0.017
## loght -0.074 -0.876
## baseage -0.823 -0.165 0.166
## logbht 0.369 0.216 -0.249 -0.815
##
## Standardized Within-Group Residuals:
## Min Q1 Med Q3 Max
## -6.43861002 -0.50366195 0.06392053 0.58604687 2.81673446
##
## Number of Observations: 1994
## Number of Groups: 300
10.1.2 Construct a histogram of the residuals. Comment on the shape of the distribution of the residuals.
fit= fitted(model1)
res= resid(model1)
hist(res, col = "green3" , lwd=2)
10.1.3 Construct a normal quantile plot (Q-Q plot) of the residuals. Does the plot display any systematic departures from a straight line? Does the plot suggest any potential outlying observations?
qqnorm(res,pch=1,frame=FALSE, lwd=2)
qqline(res,col="blue2",lwd=2)
10.1.4 On a single graph, construct a scatterplot of the residuals versus the predicted values and superimpose a lowess smoothed curve on the scatterplot. Does the plot display any systematic pattern?
plot(fit,res, col="red3")
lines(lowess(fit,res), lwd=3)
10.1.5 On a single graph, construct a scatterplot of the residuals versus age and superimpose a lowess smoothed curve on the scatterplot. Does the plot display any systematic pattern?
plot(age,res , col= "green4")
lines(lowess(age,res), lwd=3)
10.1.6 On a single graph, construct a scatterplot of the residuals versus log(height) and superimpose a lowess smoothed curve on the scatterplot. Does the plot display any systematic pattern?
plot(loght,res, col= "orange3")
lines(lowess(loght,res), lwd=3)
transform
library(mgcv)
## This is mgcv 1.8-40. For overview type 'help("mgcv-package")'.
est.cov= extract.lme.cov(model1,fev1.dat)
Li= t(chol(est.cov))
cholesky.residuals= solve(Li) %*% res
hist(cholesky.residuals, col = "pink2" , lwd=2)
11.1 In an experimental study of patients with bladder cancer conducted by the Veterans Administration Cooperative Urological Research Group (Byar and Blackard, 1978; Wei et al., 1989), patients underwent surgery to remove tumors. Following surgery, patients were randomized to either placebo or treatment with thiotepa. Subsequently patients were examined at 18, 24, 30, and 36 months. For this problem set we focus only on the data for month 18. The response variable is binary, indicating whether or not there is anew tumor (Y = 1, if new tumor; Y = 0, ifno new tumor) at the 18-month visit. The objective of the analysis is to determine the effect of treatment on tumor recurrence by month 18. The raw data are stored in an external file: tumor.dat Each row of the data set contains the following three variables:
ID Treatment Y Note: The response variable Y is coded 1 = new tumor, 0 = no new tumor. The categorical variable Treatment is coded 1 = thiotepa, 0 = placebo.
library(foreign)
library(nlme)
library(lme4)
library(readr)
tumor_dat= read_table("E:/University(3)/Longitudinal/EXE/tumor.dat.txt",
col_names = FALSE)
##
## ── Column specification ────────────────────────────────────────────────────────
## cols(
## X1 = col_double(),
## X2 = col_double(),
## X3 = col_double()
## )
names(tumor_dat)= c("ID" , "Treatment" , "y")
head(tumor_dat)
## # A tibble: 6 × 3
## ID Treatment y
## <dbl> <dbl> <dbl>
## 1 1 0 1
## 2 2 0 1
## 3 3 0 1
## 4 4 0 0
## 5 5 0 1
## 6 6 0 1
attach(tumor_dat)
## The following object is masked _by_ .GlobalEnv:
##
## y
## The following object is masked from cd4.dat:
##
## ID
## The following object is masked from dental_dat2_new:
##
## ID
## The following objects are masked from rat2:
##
## ID, y
## The following objects are masked from data2:
##
## ID, y
11.1.1 Assuming a Bernoulli distribution for the recurrence of tumor at month 18, fit the following logistic regression model relating the mean or probability of recurrence (µi) to Treatment:
logit(ui)= B1 + B2 Treatment i
model1= glm(tumor_dat$y ~
tumor_dat$Treatment,family=binomial(link="logit"))
summary(model1)
##
## Call:
## glm(formula = tumor_dat$y ~ tumor_dat$Treatment, family = binomial(link = "logit"))
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -0.9005 -0.9005 -0.5083 -0.5083 2.0544
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -0.6931 0.3273 -2.118 0.0342 *
## tumor_dat$Treatment -1.2879 0.6258 -2.058 0.0396 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 82.662 on 74 degrees of freedom
## Residual deviance: 77.843 on 73 degrees of freedom
## AIC: 81.843
##
## Number of Fisher Scoring iterations: 4
11.1.2 What are the interpretations of B1 and B2 ?
exp(summary(model1) $ coefficients[1,1])
## [1] 0.5
exp(summary(model1) $ coefficients[2,1])
## [1] 0.2758621
11.1.3 From the results obtained in Problem 11.1.1, what can you conclude about the effect of treatment on tumor recurrence at month 18?
placebo=( (summary(model1) $ coefficients[1,1])+
(summary(model1) $ coefficients[2,1]*0));placebo
## [1] -0.6931472
thiotepa= ((summary(model1) $ coefficients[1,1]) +
(summary(model1) $ coefficients[2,1]*1));thiotepa
## [1] -1.981001
(odds_placebo= exp(placebo))
## [1] 0.5
(odds_thiotepa= exp(thiotepa))
## [1] 0.137931
(OR= odds_thiotepa/odds_placebo)
## [1] 0.2758621
11.1.4 What is the estimated probability of recurrence of a new tumor among those who received placebo?
(P_placebo= odds_placebo/(1+odds_placebo))
## [1] 0.3333333
11.1.5 What is the estimated probability of recurrence of a new tumor among those who received thiotepa?
(P_thiotepa= odds_thiotepa/(1+odds_thiotepa))
## [1] 0.1212121
11.1.6 Construct a 95% confidence interval for the log odds ratio, comparing thiotepa to placebo.
confint(model1)
## Waiting for profiling to be done...
## 2.5 % 97.5 %
## (Intercept) -1.363522 -0.06898443
## tumor_dat$Treatment -2.639958 -0.13218201
11.1.7 Construct a 95% confidence interval for the odds ratio, comparing thiotepa to placebo.
exp(cbind("Odds ratio" = coef(model1),
confint.default(model1, level = 0.95)))
## Odds ratio 2.5 % 97.5 %
## (Intercept) 0.5000000 0.26323822 0.9497101
## tumor_dat$Treatment 0.2758621 0.08091196 0.9405270
11.2 In a clinical trial of patients suffering from epileptic seizures (Thall and Vail, 1990), patients were randomized to receive either a placebo or the drug progabide, in addition to standard therapy. A baseline count of the number of epileptic seizures in an 8-week period prior to randomization was obtained. In addition, counts of the number of epileptic seizures in each of four successive 2-week (post-baseline) treatment periods were obtained. For this problem set, we focus only on the data from the last 2-week treatment period. The goal of the analysis is to make a comparison between the two treatment groups in terms of the counts of the number of seizures in the final 2-week period of the study. The question we want to address is whether treatment with progabide is effective in reducing epileptic seizures. The raw data are stored in an external file: seizure.dat Each row of the data set contains the following four variables:
ID Treatment Age Y
Note: The response variable Y is a count of the number of epileptic seizures in a 2-week interval. The categorical variable Treatment is coded 1 = progabide, 0 = placebo. The variable Age is the age of each patient (in years) at baseline.
library(readr)
seizure_dat= read_table("E:/University(3)/Longitudinal/EXE/seizure.dat.txt",
col_names = FALSE)
##
## ── Column specification ────────────────────────────────────────────────────────
## cols(
## X1 = col_double(),
## X2 = col_double(),
## X3 = col_double(),
## X4 = col_double()
## )
names(seizure_dat)= c("ID", "Treatment", "Age", "Y")
attach(seizure_dat)
## The following objects are masked from tumor_dat:
##
## ID, Treatment
## The following object is masked from brith_dat:
##
## Age
## The following objects are masked from cd4.dat:
##
## Age, ID
## The following object is masked from dental_dat2_new:
##
## ID
## The following object is masked from rat2:
##
## ID
## The following object is masked from data2:
##
## ID
11.2.1 Assuming a Poisson distribution for the counts, fit the following model relating the mean number of seizures (µi) to Treatment:
ln(ui)= B1 + B2 Treatment i
model2 <- glm(Y ~ Treatment, family=poisson(link="log"))
summary(model2)
##
## Call:
## glm(formula = Y ~ Treatment, family = poisson(link = "log"))
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -4.0000 -2.0286 -1.1402 0.4085 13.0026
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 2.07944 0.06682 31.122 <2e-16 ***
## Treatment -0.17109 0.09617 -1.779 0.0752 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for poisson family taken to be 1)
##
## Null deviance: 473.79 on 58 degrees of freedom
## Residual deviance: 470.62 on 57 degrees of freedom
## AIC: 662.88
##
## Number of Fisher Scoring iterations: 6
11.2.4 Construct a 95% confidence interval for the rate ratio, comparing progabide to placebo.
exp(cbind("rate ratio" = coef(model2),
confint.default(model2, level = 0.95)))
## rate ratio 2.5 % 97.5 %
## (Intercept) 8.0000000 7.0180538 9.119337
## Treatment 0.8427419 0.6979643 1.017551
11.2.5 Redo the analysis in Problem 11.2.1, adjusting for the effect of baseline age of the patient:
ln(ui) = B1 +B2 Treatment i + B3 Age i
model3 <- glm(Y ~ Treatment + Age,
family=poisson(link="log"))
summary(model3)
##
## Call:
## glm(formula = Y ~ Treatment + Age, family = poisson(link = "log"))
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -4.0222 -1.9263 -1.0147 0.2053 12.6042
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 2.504596 0.210549 11.896 <2e-16 ***
## Treatment -0.179725 0.096372 -1.865 0.0622 .
## Age -0.014788 0.006997 -2.113 0.0346 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for poisson family taken to be 1)
##
## Null deviance: 473.79 on 58 degrees of freedom
## Residual deviance: 466.00 on 56 degrees of freedom
## AIC: 660.25
##
## Number of Fisher Scoring iterations: 6
11.2.6 Based on the results of the analysis for Problem 11.2.5, construct a 95% confidence interval for the age-adjusted rate ratio, comparing progabide to placebo.
exp(cbind("rate ratio" = coef(model3),
confint.default(model3, level = 0.95)))
## rate ratio 2.5 % 97.5 %
## (Intercept) 12.2386186 8.1005097 18.4906619
## Treatment 0.8354995 0.6916945 1.0092019
## Age 0.9853204 0.9718993 0.9989267
11.3 In a study of mental health conducted on a random sample of 40 adult residents of Alachua County, Florida, mental impairment was measured on a four-level ordinal scale with four categories (well, mild symptom formation, moderate symptom formation, impaired); these data are from Chapter 3 (Table 3.3) of Agresti (2010). The goal of the study was to relate mental impairment to several covariates, including an index of life events. The life events (LE) index is a composite measure of the number and severity of important life events that occurred within the past three years (e.g., birth of a child, new job, divorce, death of a family member). The main objective of the analyses is to assess whether the odds of a more favorable mental impairment response is related to the index of life events. Because socioeconomic status (SES) is considered to be a potential confounding variable, it is also of interest to assess the relationship between mental impairment and life events adjusted for SES. The raw data are stored in an external file: impairment.dat Each row of the data set contains the following four variables:
ID LE SES Y
Note: The ordinal response variable Y, denoting subjects’ reported mental impairment, has four categories coded l=well, 2=mild symptom formation, 3=moderate symptom formation, and 4=impaired. The categorical variable SES is coded 1 = high SES, 0 = low SES. The variable LE is a quantitative measures of the number and severity of important life events.
ID= 1:40
LE=c(1,9,4,3,2,0,1,3,3,7,1,2,5,
6,3,1,8,2,5,5,9,3,3,1,0,4,3,9,6,4,3,8,2,7,5,4,4,
8,8,9)
SES= c(1,1,1,1,0,1,0,1,1,1,0,0,1,0,1,0,1
,1,0,1,1,0,1,1,0,1,0,0,1,0,0,1,1,1,0,0,0,1,0,0)
y=c(rep(1,12),rep(2,12),rep(3,7),rep(4,9))
impairment.dat= data.frame(ID, LE, SES,y)
head(impairment.dat)
## ID LE SES y
## 1 1 1 1 1
## 2 2 9 1 1
## 3 3 4 1 1
## 4 4 3 1 1
## 5 5 2 0 1
## 6 6 0 1 1
attach(impairment.dat)
## The following objects are masked _by_ .GlobalEnv:
##
## ID, LE, SES, y
## The following object is masked from seizure_dat:
##
## ID
## The following objects are masked from tumor_dat:
##
## ID, y
## The following object is masked from cd4.dat:
##
## ID
## The following object is masked from dental_dat2_new:
##
## ID
## The following objects are masked from rat2:
##
## ID, y
## The following objects are masked from data2:
##
## ID, y
library (MASS)
##
## Attaching package: 'MASS'
## The following object is masked from 'package:dplyr':
##
## select
11.3.1
model3= polr(factor(y)~LE)
summary(model3)
##
## Re-fitting to get Hessian
## Call:
## polr(formula = factor(y) ~ LE)
##
## Coefficients:
## Value Std. Error t value
## LE 0.2879 0.1175 2.451
##
## Intercepts:
## Value Std. Error t value
## 1|2 0.2614 0.5639 0.4636
## 2|3 1.6563 0.6171 2.6838
## 3|4 2.5876 0.6938 3.7296
##
## Residual Deviance: 102.5271
## AIC: 110.5271
(ln_y=(summary(model3)$coefficients[2,1]+
summary(model3)$coefficients[3,1]+
summary(model3)$coefficients[4,1]+
(summary(model3)$coefficients[1,1]*LE)))
##
## Re-fitting to get Hessian
##
##
## Re-fitting to get Hessian
##
##
## Re-fitting to get Hessian
##
##
## Re-fitting to get Hessian
## [1] 4.793265 7.096705 5.657055 5.369125 5.081195 4.505335 4.793265 5.369125
## [9] 5.369125 6.520845 4.793265 5.081195 5.944985 6.232915 5.369125 4.793265
## [17] 6.808775 5.081195 5.944985 5.944985 7.096705 5.369125 5.369125 4.793265
## [25] 4.505335 5.657055 5.369125 7.096705 6.232915 5.657055 5.369125 6.808775
## [33] 5.081195 6.520845 5.944985 5.657055 5.657055 6.808775 6.808775 7.096705
11.3.2 What is the interpretation of the estimate of B1?
(exp_b= exp(summary(model3)$coefficients[1,1]))
##
## Re-fitting to get Hessian
## [1] 1.333664
11.3.4 Based on the results from Problem 11.3.1, estimate the odds ratio of a more favorable response for subjects with no life events (LE=O) relative to subjects with 6 life events (LE=6).
11.3.5 The proportional odds model in Problem 11.3.1 makes the assumption of a common effect oflife events ((31) across the different cumulative logits. Provide a formal or informal assessment of the “proportionality assumption.” What do you conclude?
(no_life_event= (summary(model3)$coefficients[2,1]+
summary(model3)$coefficients[3,1]+
summary(model3)$coefficients[4,1]+
(summary(model3)$coefficients[1,1]*0)))
##
## Re-fitting to get Hessian
##
##
## Re-fitting to get Hessian
##
##
## Re-fitting to get Hessian
##
##
## Re-fitting to get Hessian
## [1] 4.505335
(six_life_event= (summary(model3)$coefficients[2,1]
+summary(model3)$coefficients[3,1]
+summary(model3)$coefficients[4,1]
+(summary(model3)$coefficients[1,1]*6)))
##
## Re-fitting to get Hessian
##
##
## Re-fitting to get Hessian
##
##
## Re-fitting to get Hessian
##
##
## Re-fitting to get Hessian
## [1] 6.232915
(odds_no_life_event= exp(no_life_event))
## [1] 90.49866
(odds_six_life_event= exp(six_life_event))
## [1] 509.2378
(OR_no= (odds_no_life_event/odds_six_life_event))
## [1] 0.1777139
(OR_six= (odds_six_life_event/odds_no_life_event))
## [1] 5.627021
11.3.7 Based on the results from Problem 11.3.6, what are the interpretations of the estimates of B1 and B2?
11.3.8 Construct a test of the null hypothesis of no effect oflife events on the cumulative log odds of response, after adjusting for SES. What conclusions do you draw about the adjusted effect of life events on mental impairment?
model3.1= polr(factor(y) ~ LE + SES)
summary(model3.1)
##
## Re-fitting to get Hessian
## Call:
## polr(formula = factor(y) ~ LE + SES)
##
## Coefficients:
## Value Std. Error t value
## LE 0.3189 0.1210 2.635
## SES -1.1112 0.6109 -1.819
##
## Intercepts:
## Value Std. Error t value
## 1|2 -0.2819 0.6423 -0.4389
## 2|3 1.2128 0.6607 1.8357
## 3|4 2.2094 0.7210 3.0644
##
## Residual Deviance: 99.0979
## AIC: 109.0979
(ln_y=summary(model3.1)$coefficients[3,1]+
summary(model3.1)$coefficients[4,1]+
summary(model3.1)$coefficients[5,1]+
(summary(model3.1)$coefficients[1,1]*LE)+
((summary(model3.1)$coefficients[2,1])*SES))
##
## Re-fitting to get Hessian
##
##
## Re-fitting to get Hessian
##
##
## Re-fitting to get Hessian
##
##
## Re-fitting to get Hessian
##
##
## Re-fitting to get Hessian
## [1] 2.347892 4.898782 3.304476 2.985615 3.777984 2.029031 3.459123 2.985615
## [9] 2.985615 4.261060 3.459123 3.777984 3.623337 5.053429 2.985615 3.459123
## [17] 4.579921 2.666753 4.734568 3.623337 4.898782 4.096846 2.985615 2.347892
## [25] 3.140262 3.304476 4.096846 6.010013 3.942198 4.415707 4.096846 4.579921
## [33] 2.666753 4.261060 4.734568 4.415707 4.415707 4.579921 5.691152 6.010013
(exp_b1= exp(summary(model3.1)$coefficients[1,1]))
##
## Re-fitting to get Hessian
## [1] 1.375561
(exp_b2= exp(summary(model3.1)$coefficients[2,1]))
##
## Re-fitting to get Hessian
## [1] 0.3291535
11.3.9 Combine the two adjacent categories, mild and moderate symptom formation, to form a three category ordinal response. With the three category ordinal response, redo the analysis in Problem 11.3.6:
ak + B1 LEi +B2 SESi
11.3.10 Compare and contrast the estimate of B1 obtained from Problem 11.3.9 with the corresponding estimate of B1 obtained from Problem 11.3.6. Does B1 have the same interpretation in the model from Problem 11.3.9 as it does in the model from Problem 11.3.6?
y_new<-y
y_new[y==1]<-1
y_new[y==2]<-2
y_new[y==3]<-2
y_new[y==4]<-30
model3.2= polr(factor(y_new) ~ LE + SES)
summary(model3.2)
##
## Re-fitting to get Hessian
## Call:
## polr(formula = factor(y_new) ~ LE + SES)
##
## Coefficients:
## Value Std. Error t value
## LE 0.3546 0.1313 2.702
## SES -0.9327 0.6364 -1.466
##
## Intercepts:
## Value Std. Error t value
## 1|2 -0.0468 0.6423 -0.0728
## 2|30 2.4813 0.7857 3.1580
##
## Residual Deviance: 74.5481
## AIC: 82.5481
(ln_y_new= summary(model3.2)$coefficients[3,1]+
summary(model3.2)$coefficients[4,1]+
(summary(model3.2)$coefficients[1,1]*LE)+
((summary(model3.2)$coefficients[2,1]*SES)))
##
## Re-fitting to get Hessian
##
##
## Re-fitting to get Hessian
##
##
## Re-fitting to get Hessian
##
##
## Re-fitting to get Hessian
## [1] 1.856442 4.693441 2.920317 2.565692 3.143761 1.501817 2.789136 2.565692
## [9] 2.565692 3.984191 2.789136 3.143761 3.274942 4.562260 2.565692 2.789136
## [17] 4.338816 2.211067 4.207636 3.274942 4.693441 3.498386 2.565692 1.856442
## [25] 2.434511 2.920317 3.498386 5.626135 3.629567 3.853011 3.498386 4.338816
## [33] 2.211067 3.984191 4.207636 3.853011 3.853011 4.338816 5.271510 5.626135