13 Standardization and the Parametric G-Formula

Chapter 13 에서는 평균 인과효과를 추정하기 위해 어떻게 표준화 기법을 사용할 것인가를 다룬다. 표준화 기법은 공변량의 수가 많고 비이산형 처치가 존재할 때 나타날 수 있는 고차원성의 문제(high-dimensional problems)에 대한 처방 중 하나이다. 이전에 다루었던 IP 가중치와 같은 식별가능성의 조건(identifiability conditions)을 가지지만, 서로 다른 모델링 가정을 가지고 있다.

13.1 Standardization as an alternative to IP weighting

이전 IP 가중치와 동일한 데이터셋을 사용한다:

nhefs <- read_csv("https://www.hsph.harvard.edu/miguel-hernan/wp-content/uploads/sites/1268/2019/03/nhefs.csv")

\(E[Y^{a, c = 0}]\): 모든 개인이 \(a\) 수준의 처치를 받았을 때, 그리고 누구도 검열되지 않았을 때(no censored) 관측된 체중 증가의 평균
\(E[Y^{a = 1, c = 0}] - E[Y^{a = 0, c = 0}]\): 금연의 평균 인과효과
성별, 연령, 인종, 교육 수준, 흡연 정도와 기간, 일일 활동량 등과 관련된 공변량 \(L\)은 중첩과 선택 편향을 조정하기에 충분한 일련의 변수들이라고 가정한다.

\(L\)에 조건적인 교환가능성과 양의 확률 조건 하에서 검열되지 않은 처치군에서의 표준화된 평균 결과는 모든 이들이 처치를 받고 검열되지 않았다는 조건 \(E[Y^{a = 1, c = 0}]\) 하에서는 평균 결과에 대한 일관된 추정량을 제공한다.

검열되지 않고 처치받지 않은 집단(통제군)에서의 표준화된 평균 결과값은 모든 이들이 처치를 받지 않으면서 검열을 받지 않은 \(E[Y^{a = 0, c = 0}]\)에 대한 평균 결과값의 일관된 추정량을 제공한다.

검열되지 않은 처치군에서의 표준화된 평균 결과값을 컴퓨팅하기 위해서는 먼저 중첩변수 \(L\)의 각 값 \(l\)에서의 검열되지 않은 처치군의 평균 결과값을 컴퓨팅해야 한다. 즉, 각각의 \(l\)에서의 조건부 평균 \(E[Y|A = 1, C = 0, L = l]\)을 구하는 것이다:

\[ \sum_l E[Y|A = a, C= 0, L = l]\times\Pr[L=l]. \] \(L\)의 변수 중에는 연속형인 것들도 존재하는데, 이 경우에는 \(\Pr[L=l]\)을 확률밀도함수인 \(f_L[l]\)로 대체한다.

13.2 Estimating the mean outcome via modeling

\(L\)로 정의된 수많은 계층 각각에서 \(E[Y|A = a,C = 0,L = l]\)의 모수 추정치를 얻기 위해, 처치 \(A\)와 \(L\)의 9개 교란 변수를 모두 공변량으로 포함하여 평균 체중 증가에 대한 선형 회귀 모델을 적합한다.

여기서 모델은 연속형 공변량과 평균 결과 사이의 조건부 관계 \(E[Y |A = a,C = 0,L = l]\)의 가능한 값을 제한한다.

nhefs |> mutate(cens = if_else(is.na(wt82), 1, 0)) -> nhefs

fit <-
  glm(
    wt82_71 ~ qsmk + sex + race + age + I(age * age) + as.factor(education)
    + smokeintensity + I(smokeintensity * smokeintensity) + smokeyrs
    + I(smokeyrs * smokeyrs) + as.factor(exercise) + as.factor(active)
    + wt71 + I(wt71 * wt71) + qsmk * smokeintensity,
    data = nhefs
  )
texreg::screenreg(fit, single.row = T)


====================================================
                                 Model 1            
----------------------------------------------------
(Intercept)                         -1.59 (4.31)    
qsmk                                 2.56 (0.81) ** 
sex                                 -1.43 (0.47) ** 
race                                 0.56 (0.58)    
age                                  0.36 (0.16) *  
age * age                           -0.01 (0.00) ***
as.factor(education)2                0.79 (0.61)    
as.factor(education)3                0.56 (0.56)    
as.factor(education)4                1.49 (0.83)    
as.factor(education)5               -0.19 (0.74)    
smokeintensity                       0.05 (0.05)    
smokeintensity * smokeintensity     -0.00 (0.00)    
smokeyrs                             0.13 (0.09)    
smokeyrs * smokeyrs                 -0.00 (0.00)    
as.factor(exercise)1                 0.30 (0.54)    
as.factor(exercise)2                 0.35 (0.56)    
as.factor(active)1                  -0.95 (0.41) *  
as.factor(active)2                  -0.26 (0.68)    
wt71                                 0.05 (0.08)    
wt71 * wt71                         -0.00 (0.00)    
qsmk:smokeintensity                  0.05 (0.04)    
----------------------------------------------------
AIC                              10701.15           
BIC                              10818.99           
Log Likelihood                   -5328.58           
Deviance                         82763.03           
Num. obs.                         1566              
====================================================
*** p < 0.001; ** p < 0.01; * p < 0.05

이러한 모수적 제약(모델의 가정에 따른) 하에서 우리는 \(A\)와 \(L\)의 값의 조합에 따른 \(\hat{E}[Y|A=a, C= 0, L = l]\)에 대한 추정값을 얻게 된다. 예제에 따르면, 즉, 검열되지 않은 처치군 \((A = 1, C= 0)\) 인 403명과 검열되지 않은 통제군 \((A=0, C=0)\)인 1,163명에 대한 추정량을 얻게 되는 것이다.

우리의 목적은 처치군 \((A = 1)\)과 통제군 \((A = 0)\)일 때, 표준화된 평균 \(\sum_l E[Y|A = a, C= 0, L = l]\times\Pr[L=l]\)을 추정하는 것이다.

13.3 Standardizing the mean outcome to the confounder distribution

개인 \(i\)에 대해 \(l\)의 값에 따라 \(E[Y|A=a, C=0, L= l]\)를 추정하고 나서 평균인 \(\frac{1}{n}\sum_{i=1}^n \hat{E}[Y|A=a, C= 0, L_i]\)를 구하는 것이다. 이때, \(n\)은 연구 대상인 개인의 수이다. \(\sum_l E[Y|A = a, C= 0, L = l]\Pr[L=l]\)는 이중 기대값인 \(E[E[Y|A=a, C=0, L]]\)로 나타낼 수 있다.

표준화 평균을 구하는 단계는 네 가지로 구성되어 있다: 데이터셋의 확장, 결과의 모델링, 예측, 그리고 평균화를 통한 표준화이다.

데이터셋의 확장: 주어진 처치 \(A\)와 교란변수 \(L\)의 값에 따른 평균 결과값에 대해 회귀모델을 적합하기 위하여 가능한 값의 조합에 따라 데이터셋을 확장한다.
결과의 모델링: \(L\)과 \(A\)의 값과 회귀계수 추정값을 결합하여 결과 \(Y\)에 대한 결측값을 채워넣는다.
예측된 결과값들은 \(L\)의 값과 \(A=0\) 또는 \(A = 1\)인 경우에 대한 평균 추정값이다.
마지막으로 처치 수준에 따른 모든 예측값의 평균을 구한다.

id <- c(
  "Rheia", "Kronos", "Demeter", "Hades", "Hestia",
  "Poseidon", "Hera", "Zeus", "Artemis", "Apollo",
  "Leto", "Ares", "Athena", "Hephaestus", "Aphrodite",
  "Cyclope", "Persephone", "Hermes", "Hebe", "Dionysus")
N <- length(id)
L <- c(0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1)
A <- c(0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1)
Y <- c(0, 1, 0, 0, 0, 0, 0, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 0, 0, 0)
interv <- rep(-1, N)
observed <- cbind(L, A, Y, interv)
untreated <- cbind(L, rep(0, N), rep(NA, N), rep(0, N))
treated <- cbind(L, rep(1, N), rep(NA, N), rep(1, N))
table22 <- as.data.frame(rbind(observed, untreated, treated))
table22$id <- rep(id, 3)

glm.obj <- glm(Y ~ A * L, data = table22)
texreg::screenreg(glm.obj, 
                  single.row = T,
                  caption = "Standardizing the mean outcome to the baseline confounders")


=============================
                Model 1      
-----------------------------
(Intercept)       0.25 (0.26)
A                 0.00 (0.36)
L                 0.42 (0.39)
A:L              -0.00 (0.50)
-----------------------------
AIC              35.39       
BIC              40.36       
Log Likelihood  -12.69       
Deviance          4.17       
Num. obs.        20          
=============================
*** p < 0.001; ** p < 0.01; * p < 0.05

데이터셋에 두 번째 및 세 번째 블록을 추가하고, 이전 섹션에서 설명한 대로 \(E[Y |A = a,C = 0,L = l]\)에 대한 회귀 모델을 적합하여 예측값을 생성한다. 두 번째 블록의 평균 예측값(통제군의 표준화 평균)은 1.66이고, 세 번째 블록의 평균 예측값(처치군의 표준화 평균)은 5.18이다. 따라서 인과 효과 \(E[Y a=1,c=0]-E[Y a=0,c=0]\) 의 추정치는 \(5.18-1.66 = 3.5\mathrm{kg}\)이 된다.

# create a dataset with 3 copies of each subject
nhefs$interv <- -1 # 1st copy: equal to original one

interv0 <- nhefs # 2nd copy: treatment set to 0, outcome to missing
interv0$interv <- 0
interv0$qsmk <- 0
interv0$wt82_71 <- NA

interv1 <- nhefs # 3rd copy: treatment set to 1, outcome to missing
interv1$interv <- 1
interv1$qsmk <- 1
interv1$wt82_71 <- NA

onesample <- rbind(nhefs, interv0, interv1) # combining datasets

# linear model to estimate mean outcome conditional on treatment and confounders
# parameters are estimated using original observations only (nhefs)
# parameter estimates are used to predict mean outcome for observations with
# treatment set to 0 (interv=0) and to 1 (interv=1)

std <- glm(
  wt82_71 ~ qsmk + sex + race + age + I(age * age)
  + as.factor(education) + smokeintensity
  + I(smokeintensity * smokeintensity) + smokeyrs
  + I(smokeyrs * smokeyrs) + as.factor(exercise)
  + as.factor(active) + wt71 + I(wt71 * wt71) + I(qsmk * smokeintensity),
  data = onesample
)
texreg::screenreg(std, single.row = T)


====================================================
                                 Model 1            
----------------------------------------------------
(Intercept)                         -1.59 (4.31)    
qsmk                                 2.56 (0.81) ** 
sex                                 -1.43 (0.47) ** 
race                                 0.56 (0.58)    
age                                  0.36 (0.16) *  
age * age                           -0.01 (0.00) ***
as.factor(education)2                0.79 (0.61)    
as.factor(education)3                0.56 (0.56)    
as.factor(education)4                1.49 (0.83)    
as.factor(education)5               -0.19 (0.74)    
smokeintensity                       0.05 (0.05)    
smokeintensity * smokeintensity     -0.00 (0.00)    
smokeyrs                             0.13 (0.09)    
smokeyrs * smokeyrs                 -0.00 (0.00)    
as.factor(exercise)1                 0.30 (0.54)    
as.factor(exercise)2                 0.35 (0.56)    
as.factor(active)1                  -0.95 (0.41) *  
as.factor(active)2                  -0.26 (0.68)    
wt71                                 0.05 (0.08)    
wt71 * wt71                         -0.00 (0.00)    
qsmk * smokeintensity                0.05 (0.04)    
----------------------------------------------------
AIC                              10701.15           
BIC                              10818.99           
Log Likelihood                   -5328.58           
Deviance                         82763.03           
Num. obs.                         1566              
====================================================
*** p < 0.001; ** p < 0.01; * p < 0.05

onesample$predicted_meanY <- predict(std, onesample)

# estimate mean outcome in each of the groups interv=0, and interv=1
# this mean outcome is a weighted average of the mean outcomes in each combination
# of values of treatment and confounders, that is, the standardized outcome
mean(onesample[which(onesample$interv == -1), ]$predicted_meanY)

[1] 2.563

mean(onesample[which(onesample$interv == 0), ]$predicted_meanY)

[1] 1.66

mean(onesample[which(onesample$interv == 1), ]$predicted_meanY)

[1] 5.179

이 추정치에 대한 95% 신뢰 구간을 얻기 위해 부트스트래핑이라는 통계 기법을 사용한 결과, 금연하면 체중이 3.5kg 증가하는 것으로 추정하였다(95% 신뢰 구간: 2.6, 4.5).

#install.packages("boot") # install package if required
library(boot)

# function to calculate difference in means
standardization <- function(data, indices) {
  # create a dataset with 3 copies of each subject
  d <- data[indices, ] # 1st copy: equal to original one`
  d$interv <- -1
  d0 <- d # 2nd copy: treatment set to 0, outcome to missing
  d0$interv <- 0
  d0$qsmk <- 0
  d0$wt82_71 <- NA
  d1 <- d # 3rd copy: treatment set to 1, outcome to missing
  d1$interv <- 1
  d1$qsmk <- 1
  d1$wt82_71 <- NA
  d.onesample <- rbind(d, d0, d1) # combining datasets
  
  # linear model to estimate mean outcome conditional on treatment and confounders
  # parameters are estimated using original observations only (interv= -1)
  # parameter estimates are used to predict mean outcome for observations with set
  # treatment (interv=0 and interv=1)
  fit <- glm(
    wt82_71 ~ qsmk + sex + race + age + I(age * age) +
      as.factor(education) + smokeintensity +
      I(smokeintensity * smokeintensity) + smokeyrs + I(smokeyrs *
                                                          smokeyrs) +
      as.factor(exercise) + as.factor(active) + wt71 + I(wt71 *
                                                           wt71),
    data = d.onesample
  )
  
  d.onesample$predicted_meanY <- predict(fit, d.onesample)
  
  # estimate mean outcome in each of the groups interv=-1, interv=0, and interv=1
  return(c(
    mean(d.onesample$predicted_meanY[d.onesample$interv == -1]),
    mean(d.onesample$predicted_meanY[d.onesample$interv == 0]),
    mean(d.onesample$predicted_meanY[d.onesample$interv == 1]),
    mean(d.onesample$predicted_meanY[d.onesample$interv == 1]) -
      mean(d.onesample$predicted_meanY[d.onesample$interv == 0])
  ))
}

# bootstrap
results <- boot(data = nhefs,
                statistic = standardization,
                R = 5)

# generating confidence intervals
se <- c(sd(results$t[, 1]),
        sd(results$t[, 2]),
        sd(results$t[, 3]),
        sd(results$t[, 4]))
mean <- results$t0
ll <- mean - qnorm(0.975) * se
ul <- mean + qnorm(0.975) * se

bootstrap <-
  tibble(Treatment = 
    c(
      "Observed",
      "No Treatment",
      "Treatment",
      "Treatment - No Treatment"
    ),
    mean,
    se,
    ll,
    ul
  )
bootstrap |> 
  mutate_if(is.numeric, round, 3) |> 
  gt::gt()

Treatment	mean	se	ll	ul
Observed	2.562	0.166	2.236	2.888
No Treatment	1.652	0.056	1.542	1.762
Treatment	5.115	0.359	4.411	5.819
Treatment - No Treatment	3.463	0.305	2.865	4.060

13.4 IP weighting or standardization

실제로 모든 모델에서 어느 정도의 잘못된 모델 특정으로 인한 문제는 피할 수 없으며, 이러한 잘못된 특정은 약간의 편향으로 이어질 수 있다. 그러나 처치 모델(IP 가중치)과 결과 모델(표준화)의 잘못된 특정은 일반적으로 효과 추정치의 편향의 크기와 방향이 다르게 나타난다. 따라서 피할 수 없는 모델을 잘못 특정할 가능성이 점추정치에 다른 영향을 미치기 때문에 일반적으로 IP 가중치 추정치는 표준화 추정치와 다른 결과를 나타낼 수 잇다. IP 가중치 추정치와 표준화 추정치 간의 차이가 크면 추정치 중 하나 이상에서 심각한 모델 오특정의 문제가 있음을 알 수 있다.

표준화는 단순히 g-formula의 조건부 평균 결과를 추정치로 대체하기 때문에 plug-in g-formula 추정량이라고 한다. 여기서는 평균 인과효과에만 관심이 있으므로 조건부 평균 결과를 모수적으로 추정한다.

IP 가중치와 모수적 g-formula 중 하나를 선택할 필요가 없는 경우가 많다. 인과 관계를 추정하는 데 두 가지 방법을 모두 사용할 수 있는 경우 두 가지 방법을 모두 사용하면 된다. 또한 가능하면 처치와 결과에 대한 모델을 동일한 추정량에 결합하는 이중 강건성 방법(doubly robust methods)을 사용할 것을 권한다. 주어진 \(L\)에서 교환 가능성과 양의 확률 조건을 충족할 경우에, 이중 강건성 추정량이 두 모델 중 어느 것이 올바른 모델인지 알 수 없어도 처치 모델이나 결과 모델 중 하나가 정확할 경우 평균 인과효과를 일관되게 추정한다. 단, 이제까지 우리가 사용한 모수적 g-fomula는 전체 모집단에서의 평균 인과효과를 추정하기 위해 사용되었다는 것을 주지할 필요가 있다. 만약 특정한 모집단의 서브셋에 대해 인과효과를 추정하고자 한다면, 추정을 위한 계산을 좀 더 제한할 필요가 있다.

13.5 How seriously do we take out estimates?

관찰 효과 추정치는 항상 심각한 비판에 노출되어 있다. 효과 추정치를 다른 모집단으로 옮기고 싶지 않고, 개인 간 간섭이 없다고 하더라도 대상 모집단에 대한 추정치의 유효성을 확보하려면 여러 가지 조건이 필요하다.

관찰 연구가 대상 임상시험과 유사하려면 교환 가능성, 양의 확률 조건, 일관성의 식별 가능성 조건 (Chapter 3) 이 충족되어야 한다.
분석에 사용된 모든 변수를 정확하게 측정해야 한다. 처치 \(A\), 결과 \(Y\) 또는 교란변수 \(L\)의 측정 오류는 일반적으로 편향을 초래한다 (Chapter 9). 실제로 대부분의 변수는 어느 정도의 오측정을 피할 수 없다.
모든 분석에 사용되는 모델은 정확하게 특정될 필요가 있다 (Chapter 11).

만약 이러한 조건들이 충분히 보장된다면, 데이터 분석 그 자체는 큰 문제가 아니다.