Chapter 11 Survival analysis with competing risks: Oral cancer patients

11.1 Description of the data

File oralca2.txt, that you may access from a url address to be given in the practical, contains data from 338 patients having an oral squamous cell carcinoma diagnosed and treated in one tertiary level oncological clinic in Finland since 1985, followed-up for mortality until 31 December 2008.

The dataset contains the following variables:

Variable	Description
`sex`	sex, a factor with categories; `1 = "Female", 2 = "Male"`
`age`	age (years) at the date of diagnosing the cancer
`stage`	TNM stage of the tumour (factor): `1 = "I", ..., 4 = "IV", 5 = "unkn"`
`time`	follow-up time (in years) since diagnosis until death or censoring
`event`	event ending the follow-up (numeric): `0 = censoring alive, 1 = death from oral cancer, 2 = death from other causes.`

11.2 Loading the packages and the data

Load the R packages Epi, and survival needed in this exercise.

library(Epi)
library(survival)
cB8  <- c("#000000", "#E69F00", "#56B4E9", "#009E73", 
          "#F0E442", "#0072B2", "#D55E00", "#CC79A7") #colors chosen
options(digits=3)

Read the datafile oralca2.txt from a website, whose precise address will be given in the practical, into an R data frame named orca. Look at the head, structure and the summary of the data frame. Using function table() count the numbers of censorings as well as deaths from oral cancer and other causes, respectively, from the event variable.

orca <-  read.table(file = 'https://raw.githubusercontent.com/SPE-R/SPE/master/pracs/data/oralca2.txt', header = TRUE, sep = " ",row.names = 1 )
head(orca)
orca$stage<-as.factor(orca$stage)
orca$sex<-factor(orca$sex,levels=c("Male","Female"))
str(orca)
summary(orca)

11.3 Total mortality: Kaplan–Meier analyses

We start our analysis of total mortality pooling the two causes of death into a single outcome. First, construct a survival object orca$suob from the event variable and the follow-up time using function Surv(). Look at the structure and summary of suob .

suob <- Surv(orca$time, 1 * (orca$event > 0))
str(suob)
summary(suob)

Create a survfit object s.all, which does the default calculations for a Kaplan–Meier analysis of the overall (marginal) survival curve.

s.all <- survfit(suob ~ 1, data = orca)

See the structure of this object and apply print() method on it, too. Look at the results; what do you find?

s.all
str(s.all)

The summary method for a survfit object would return a lengthy life table. However, the plot method with default arguments offers the Kaplan–Meier curve for a conventional illustration of the survival experience in the whole patient group. Alternatively, instead of graphing survival proportions, one can draw a curve describing their complements: the cumulative mortality proportions. This curve is drawn together with the survival curve as the result of the second command line below.

plot(s.all,main="KM estimate of the survival
     and cum. mortality proportions",
     xlab="years", ylab="Survival")

plot(s.all)
lines(s.all, fun = "event", mark.time = F, conf.int = FALSE)

The effect of option mark.time=F is to omit marking the times when censorings occurred.

11.4 Total mortality by stage

Tumour stage is an important prognostic factor in cancer survival.

stage I: cancer hasn’t spread to lymph nodes or other tissue (local),
stage II: cancer has grown but not spread
stage III:cancer has grown larger but not spread
stage IV cancer has spread to other organs or areas
Plot separate cumulative mortality curves for the different stage groups marking them with different colors, the order which you may define yourself. Also find the median survival time for each stage.

s.stg <- survfit(suob ~ stage, data = orca)
col5 <- cB8[1:5]
plot(s.stg, col = col5, fun = "event", main="Cum. mortality by stage",mark.time = FALSE)
legend(15, 0.6, title="stage",legend=levels(orca$stage),col = col5,lty=rep(1,5))

Create now two parallel plots of which the first one describes the cumulative hazards and the second one graphs the log-cumulative hazards against log-time for the different stages. Compare the two presentations with each other and with the one in the previous item.

par(mfrow = c(1, 2))
plot(s.stg, col = col5, fun = "cumhaz", main = "Cumulative hazards")
legend(1, 3.5, title="stage",legend=levels(orca$stage),col = col5,lty=rep(1,5), cex=0.8)
plot(s.stg, col = col5, fun = "cloglog",main = "cloglog: log cum.haz")
legend(3, -2, title="stage",legend=levels(orca$stage),col = col5,lty=rep(1,5), cex=0.7)

If the survival times were exponentially distributed in a given (sub)population the corresponding cloglog-curve should follow an approximately linear pattern. Could this be the case here in the different stages?
Also, if the survival distributions of the different subpopulations would obey the proportional hazards model, the vertical distance between the cloglog-curves should be approximately constant over the time axis. Do these curves indicate serious deviation from the proportional hazards assumption?
In the lecture handouts it was observed that the crude contrast between males and females in total mortality appears unclear, but the age-adjustment in the Cox model provided a more expected hazard ratio estimate. We shall examine the confounding by age somewhat closer. First categorize the continuous age variable into, say, three categories by function cut() using suitable breakpoints, like 55 and 75 years, and cross-tabulate sex and age group:

orca$agegr <- cut(orca$age, br = c(0, 55, 75, 95))
stat.table(list(sex, agegr), list(count(), percent(agegr)),
  margins = TRUE, 
  data = orca
)

Male patients are clearly younger than females in these data.

Now, plot Kaplan–Meier curves jointly classified by sex and age.

s.agrx <- survfit(suob ~ agegr + sex, data=orca)
par(mfrow=c(1,1))
plot(s.agrx, fun="event", main="Cumulative mortality (KM) by age and sex",xlab="Time since oral cancer diagnosis (years)",ylab="Cum. mortality",mark.time=F, xlim = c(0,15), lwd=2,
             col=rep(c(cB8[7], cB8[6]),3), lty=c(2,2, 1,1, 5,5),
     xaxs="i",yaxs="i")
legend(12,0.35, legend=c("(0,55] Female "," (0,55] Male",
                       "(55,75] Female "," (55,75] Male",
                       "(75,95] Female "," (75,95] Male" ),
       col=rep(c(cB8[7], cB8[6]),3), lty=c(2,2, 1,1, 5,5),cex=0.65)

In each age band the mortality curve for males is on a higher level than that for females.

11.5 Event-specific cumulative mortality curves

We move on to analysing cumulative mortalities for the two causes of death separately, first overall and then by prognostic factors.

Use the survfit-function in survival package with option type="mstate".

library(survival)
cif1 <- survfit(Surv(time, event, type = "mstate") ~ 1,
  data = orca
)
str(cif1)

One could apply here the plot method of the survfit object to plot the cumulative incidences for each cause. However, we suggest that you use instead a simple function plotCIF() found in the Epi package. The main arguments are

`data`	data frame created by function }`survfit()`
`event`	indicator for the event: values 1 or 2.

Other arguments are like in the ordinary plot() function.

Draw two parallel plots describing the overall cumulative incidence curves for both causes of death

par(mfrow = c(1, 2))
plotCIF(cif1, 1, main = "Cancer death",xlab="Time since oral cancer diagnosis (years)")
plotCIF(cif1, 2, main = "Other deaths",xlab="Time since oral cancer diagnosis (years)")

Compute the estimated cumulative incidences by stage for both causes of death. Now you have to add variable stage to survfit-function.

See the structure of the resulting object, in which you should observe strata variable containing the stage grouping variable. Plot the pertinent curves in two parallel graphs. Cut the $y$-axis for a more efficient graphical presentation

col5 <- col5
cif2 <- survfit(Surv(time, event, type = "mstate") ~ stage,
  data = orca
)
str(cif2)

par(mfrow = c(1, 2))
plotCIF(cif2, 1,
  main = "Cancer death by stage",
  col = col5, ylim = c(0, 0.7)
)
plotCIF(cif2, 2,
  main = "Other deaths by stage",
  col = col5, ylim = c(0, 0.7)
)

legend(3, 0.7, title="stage",legend=levels(orca$stage),col = col5,lty=rep(1,5), cex=0.7)

Compare the two plots. What would you conclude about the effect of stage on the two causes of death?

Using another function stackedCIF() in Epi you can put the two cumulative incidence curves in one graph but stacked upon one another such that the lower curve is for the cancer deaths and the upper curve is for total mortality, and the vertical difference between the two curves describes the cumulative mortality from other causes. You can also add some colours for the different zones:

par(mfrow=c(1,1),xaxs="i", yaxs="i") # make plot start 0,0
stackedCIF(cif1,xlim=c(0,20),
           col = c("black"),
           fill=c(cB8[6],cB8[8],cB8[2])) #choosing some colors 
text( 10, 0.10, "Oral ca death ", pos = 4)
text( 10, 0.5, " Other death ", pos = 4)
text( 10, 0.80, " Alive ", pos = 4)

11.6 Regression modelling of overall mortality.

Fit the semiparametric proportional hazards regression model, a.k.a. the Cox model, on all deaths including sex, age and stage as covariates. Use function coxph() in package survival. It is often useful to center and scale continuous covariates like age here. The estimated rate ratios and their confidence intervals can also here be displayed by applying ci.lin() on the fitted model object.

options(show.signif.stars = FALSE)
m1 <- coxph(suob ~ sex + I((age - 65) / 10) + stage, data = orca)
summary(m1)
round(ci.exp(m1),3)

Look at the results. What are the main findings?

Check whether the data are sufficiently consistent with the assumption of proportional hazards with respect to each of the variables separately as well as globally, using the cox.zph() function.

cox.zph(m1)

No evidence against proportionality assumption could apparently be found. Moreover, no difference can be observed between stages I and II in the estimates. On the other hand, the group with stage unknown is a complex mixture of patients from various true stages. Therefore, it may be prudent to exclude these subjects from the data and to pool the first two stage groups into one. After that fit a model in the reduced data with the new stage variable.

orca2 <- subset(orca, stage != "unkn")
orca2$st3 <- Relevel(orca2$stage, list(1:2, 3, 4:5))
levels(orca2$st3) <- c("I-II", "III", "IV")
m2 <- coxph(Surv(orca2$time, 1 * (orca2$event > 0)) ~ sex + I((age - 65) / 10) + st3, data = orca2)
summary(m2)
#m2 <- update(m1, . ~ . - stage + st3, data = orca2) #do not work
round(ci.exp(m2), 3)

Plot the predicted cumulative mortality curves by stage, jointly stratified by sex and age, focusing only on 40 and 80 year old patients, respectively, based on the fitted model m2. You need to create a new artificial data frame containing the desired values for the covariates.

newd <- data.frame(
  sex = c(rep("Male", 6), rep("Female", 6)),
  age = rep(c(rep(40, 3), rep(80, 3)), 2),
  st3 = rep(levels(orca2$st3), 4)
)
newd
col3 <- cB8[1:3] #pre-setting color palette
leg<-levels(interaction(levels(factor(orca2$sex)),levels(orca2$st3))) #legend labels by sex and stage

## Warning in ans * length(l) + if1: longer object length is not a
## multiple of shorter object length

par(mfrow = c(1, 2))
plot(
  survfit(m2, newdata = subset(newd, sex == "Male" & age == 40)),
  col = col3, 
  lty= 1,
  fun = "event", mark.time = FALSE,
  main="Cum. mortality for M and F \n age 40"
  )
lines(
  survfit(m2, newdata = subset(newd, sex == "Female" & age == 40)),
  col = col3, lty=c(2,2,2),fun = "event", mark.time = FALSE
)
legend(0, 0.95, title="stage",legend=leg,
       col = c(col3[1],col3[1],col3[2],col3[2],col3[3],col3[3]),
        ,lty=c(2,1,2,1,2,1), cex=0.6)
plot(
  survfit(m2, newdata = subset(newd, sex == "Male" & age == 80)),
  ylim = c(0, 1), col = col3, fun = "event", mark.time = FALSE,
   main="Cum. mortality for M and F \n age 80"
)
lines(
  survfit(m2, newdata = subset(newd, sex == "Female" & age == 80)  ),
  col = col3, fun = "event", lty = 2, mark.time = FALSE
)
legend(10, 0.5, title="stage",legend=leg,
       col = c(col3[1],col3[1],col3[2],col3[2],col3[3],col3[3]),
        ,lty=c(2,1,2,1,2,1), cex=0.7)

11.7 Modelling event-specific hazards

Fit the Cox model for the cause-specific hazard of cancer deaths with the same covariates as above. In this case only cancer deaths are counted as events and deaths from other causes are included into censorings.

m2haz1 <- 
  coxph(
    Surv(time, event == 1) ~ sex + I((age - 65) / 10) + st3, 
    data = orca2
  )
round(ci.exp(m2haz1), 4)
cox.zph(m2haz1)

Compare the results with those of model m2. What are the major differences?

Fit a similar model for deaths from other causes and compare the results.

m2haz2 <- 
  coxph(
    Surv(time, event == 2) ~ sex + I((age - 65) / 10) + st3, 
    data = orca2
  )
round(ci.exp(m2haz2), 4)
cox.zph(m2haz2)

11.8 Lexis object with multi-state set-up

Before entering to multi-state analyses, it might be instructive to apply some Lexis tools to illustrate the competing-risks set-up. More detailed explanation of these tools will be given by Bendix later.

Form a Lexis object from the data frame and print a summary of it. We shall name the main (and only) time axis in this object as stime.

orca.lex <- Lexis(
  exit = list(stime = time),
  exit.status = factor(event,
    labels = c("Alive", "Oral ca. death", "Other death")
  ),
  data = orca
)
summary(orca.lex)

Draw a box diagram of the two-state set-up of competing transitions. Run first th e following command line

boxes(orca.lex,boxpos = TRUE)

Now, move the cursor to the point in the graphics window, at which you wish to put the box for Alive, and click. Next, move the cursor to the point at which you wish to have the box for Oral ca. death, and click. Finally, do the same with the box for Other death. If you are not happy with the outcome, run the command line again and repeat the necessary mouse moves and clicks.