Chapter 11 Survival analysis with competing risks: Oral cancer patients
11.1 Description of the data
File oralca2.txt
, that you may
access from a url address to be given in the practical, contains data from 338
patients having an oral squamous cell carcinoma diagnosed and treated
in one tertiary level oncological clinic in Finland since 1985, followed-up
for mortality until 31 December 2008.
The dataset contains the following variables:
Variable | Description |
---|---|
sex |
sex, a factor with categories; 1 = "Female", 2 = "Male" |
age |
age (years) at the date of diagnosing the cancer |
stage |
TNM stage of the tumour (factor): 1 = "I", ..., 4 = "IV", 5 = "unkn" |
time |
follow-up time (in years) since diagnosis until death or censoring |
event |
event ending the follow-up (numeric): 0 = censoring alive, 1 = death from oral cancer, 2 = death from other causes. |
11.2 Loading the packages and the data
- Load the R packages
Epi
, andsurvival
needed in this exercise.
library(Epi)
library(survival)
cB8 <- c("#000000", "#E69F00", "#56B4E9", "#009E73",
"#F0E442", "#0072B2", "#D55E00", "#CC79A7") #colors chosen
options(digits=3)
- Read the datafile
oralca2.txt
from a website, whose precise address will be given in the practical, into an R data frame namedorca
. Look at the head, structure and the summary of the data frame. Using functiontable()
count the numbers of censorings as well as deaths from oral cancer and other causes, respectively, from theevent
variable.
11.3 Total mortality: Kaplan–Meier analyses
- We start our analysis of total mortality pooling the two causes of death into
a single outcome.
First, construct a survival object
orca$suob
from the event variable and the follow-up time using functionSurv()
. Look at the structure and summary ofsuob
.
- Create a
survfit
objects.all
, which does the default calculations for a Kaplan–Meier analysis of the overall (marginal) survival curve.
See the structure of this object and apply print()
method on it, too.
Look at the results; what do you find?
- The
summary
method for asurvfit
object would return a lengthy life table. However, theplot
method with default arguments offers the Kaplan–Meier curve for a conventional illustration of the survival experience in the whole patient group. Alternatively, instead of graphing survival proportions, one can draw a curve describing their complements: the cumulative mortality proportions. This curve is drawn together with the survival curve as the result of the second command line below.
plot(s.all,main="KM estimate of the survival
and cum. mortality proportions",
xlab="years", ylab="Survival")
The effect of option mark.time=F
is to omit
marking the times when censorings occurred.
11.4 Total mortality by stage
Tumour stage is an important prognostic factor in cancer survival.
stage I: cancer hasn’t spread to lymph nodes or other tissue (local),
stage II: cancer has grown but not spread
stage III:cancer has grown larger but not spread
stage IV cancer has spread to other organs or areasPlot separate cumulative mortality curves for the different stage groups marking them with different colors, the order which you may define yourself. Also find the median survival time for each stage.
s.stg <- survfit(suob ~ stage, data = orca)
col5 <- cB8[1:5]
plot(s.stg, col = col5, fun = "event", main="Cum. mortality by stage",mark.time = FALSE)
legend(15, 0.6, title="stage",legend=levels(orca$stage),col = col5,lty=rep(1,5))
- Create now two parallel plots of which the first one describes the cumulative hazards and the second one graphs the log-cumulative hazards against log-time for the different stages. Compare the two presentations with each other and with the one in the previous item.
par(mfrow = c(1, 2))
plot(s.stg, col = col5, fun = "cumhaz", main = "Cumulative hazards")
legend(1, 3.5, title="stage",legend=levels(orca$stage),col = col5,lty=rep(1,5), cex=0.8)
plot(s.stg, col = col5, fun = "cloglog",main = "cloglog: log cum.haz")
legend(3, -2, title="stage",legend=levels(orca$stage),col = col5,lty=rep(1,5), cex=0.7)
If the survival times were exponentially distributed in a given (sub)population the corresponding cloglog-curve should follow an approximately linear pattern. Could this be the case here in the different stages?
Also, if the survival distributions of the different subpopulations would obey the proportional hazards model, the vertical distance between the cloglog-curves should be approximately constant over the time axis. Do these curves indicate serious deviation from the proportional hazards assumption?
In the lecture handouts it was observed that the crude contrast between males and females in total mortality appears unclear, but the age-adjustment in the Cox model provided a more expected hazard ratio estimate. We shall examine the confounding by age somewhat closer. First categorize the continuous age variable into, say, three categories by function
cut()
using suitable breakpoints, like 55 and 75 years, and cross-tabulate sex and age group:
orca$agegr <- cut(orca$age, br = c(0, 55, 75, 95))
stat.table(list(sex, agegr), list(count(), percent(agegr)),
margins = TRUE,
data = orca
)
Male patients are clearly younger than females in these data.
Now, plot Kaplan–Meier curves jointly classified by sex and age.
s.agrx <- survfit(suob ~ agegr + sex, data=orca)
par(mfrow=c(1,1))
plot(s.agrx, fun="event", main="Cumulative mortality (KM) by age and sex",xlab="Time since oral cancer diagnosis (years)",ylab="Cum. mortality",mark.time=F, xlim = c(0,15), lwd=2,
col=rep(c(cB8[7], cB8[6]),3), lty=c(2,2, 1,1, 5,5),
xaxs="i",yaxs="i")
legend(12,0.35, legend=c("(0,55] Female "," (0,55] Male",
"(55,75] Female "," (55,75] Male",
"(75,95] Female "," (75,95] Male" ),
col=rep(c(cB8[7], cB8[6]),3), lty=c(2,2, 1,1, 5,5),cex=0.65)
In each age band the mortality curve for males is on a higher level than that for females.
11.5 Event-specific cumulative mortality curves
We move on to analysing cumulative mortalities for the two causes of death separately, first overall and then by prognostic factors.
- Use the
survfit
-function insurvival
package with optiontype="mstate"
.
- One could apply here the plot method of the survfit object to plot the
cumulative incidences for each cause. However, we suggest that you use
instead a simple function
plotCIF()
found in theEpi
package. The main arguments are
data |
data frame created by function }survfit() |
event |
indicator for the event: values 1 or 2. |
Other arguments are like in the ordinary plot()
function.
- Draw two parallel plots describing the overall cumulative incidence curves for both causes of death
par(mfrow = c(1, 2))
plotCIF(cif1, 1, main = "Cancer death",xlab="Time since oral cancer diagnosis (years)")
plotCIF(cif1, 2, main = "Other deaths",xlab="Time since oral cancer diagnosis (years)")
- Compute the estimated
cumulative incidences by stage for both causes of death.
Now you have to add variable
stage
to survfit-function.
See the structure of the resulting object, in which you should observe strata variable containing the stage grouping variable. Plot the pertinent curves in two parallel graphs. Cut the \(y\)-axis for a more efficient graphical presentation
col5 <- col5
cif2 <- survfit(Surv(time, event, type = "mstate") ~ stage,
data = orca
)
str(cif2)
par(mfrow = c(1, 2))
plotCIF(cif2, 1,
main = "Cancer death by stage",
col = col5, ylim = c(0, 0.7)
)
plotCIF(cif2, 2,
main = "Other deaths by stage",
col = col5, ylim = c(0, 0.7)
)
legend(3, 0.7, title="stage",legend=levels(orca$stage),col = col5,lty=rep(1,5), cex=0.7)
Compare the two plots. What would you conclude about the effect of stage on the two causes of death?
- Using another function
stackedCIF()
inEpi
you can put the two cumulative incidence curves in one graph but stacked upon one another such that the lower curve is for the cancer deaths and the upper curve is for total mortality, and the vertical difference between the two curves describes the cumulative mortality from other causes. You can also add some colours for the different zones:
11.6 Regression modelling of overall mortality.
- Fit the semiparametric proportional hazards
regression model, a.k.a. the Cox model, on all deaths including
sex, age and stage as covariates. Use function
coxph()
in packagesurvival
. It is often useful to center and scale continuous covariates likeage
here. The estimated rate ratios and their confidence intervals can also here be displayed by applyingci.lin()
on the fitted model object.
options(show.signif.stars = FALSE)
m1 <- coxph(suob ~ sex + I((age - 65) / 10) + stage, data = orca)
summary(m1)
round(ci.exp(m1),3)
Look at the results. What are the main findings?
- Check whether the data are sufficiently consistent with the
assumption of proportional hazards with respect to each of
the variables separately
as well as globally, using the
cox.zph()
function.
- No evidence against proportionality assumption could apparently be found. Moreover, no difference can be observed between stages I and II in the estimates. On the other hand, the group with stage unknown is a complex mixture of patients from various true stages. Therefore, it may be prudent to exclude these subjects from the data and to pool the first two stage groups into one. After that fit a model in the reduced data with the new stage variable.
orca2 <- subset(orca, stage != "unkn")
orca2$st3 <- Relevel(orca2$stage, list(1:2, 3, 4:5))
levels(orca2$st3) <- c("I-II", "III", "IV")
m2 <- coxph(Surv(orca2$time, 1 * (orca2$event > 0)) ~ sex + I((age - 65) / 10) + st3, data = orca2)
summary(m2)
#m2 <- update(m1, . ~ . - stage + st3, data = orca2) #do not work
round(ci.exp(m2), 3)
- Plot the predicted cumulative mortality curves by stage,
jointly stratified by sex and age, focusing
only on 40 and 80 year old patients, respectively,
based on the fitted model
m2
. You need to create a new artificial data frame containing the desired values for the covariates.
newd <- data.frame(
sex = c(rep("Male", 6), rep("Female", 6)),
age = rep(c(rep(40, 3), rep(80, 3)), 2),
st3 = rep(levels(orca2$st3), 4)
)
newd
col3 <- cB8[1:3] #pre-setting color palette
leg<-levels(interaction(levels(factor(orca2$sex)),levels(orca2$st3))) #legend labels by sex and stage
## Warning in ans * length(l) + if1: longer object length is not a
## multiple of shorter object length
par(mfrow = c(1, 2))
plot(
survfit(m2, newdata = subset(newd, sex == "Male" & age == 40)),
col = col3,
lty= 1,
fun = "event", mark.time = FALSE,
main="Cum. mortality for M and F \n age 40"
)
lines(
survfit(m2, newdata = subset(newd, sex == "Female" & age == 40)),
col = col3, lty=c(2,2,2),fun = "event", mark.time = FALSE
)
legend(0, 0.95, title="stage",legend=leg,
col = c(col3[1],col3[1],col3[2],col3[2],col3[3],col3[3]),
,lty=c(2,1,2,1,2,1), cex=0.6)
plot(
survfit(m2, newdata = subset(newd, sex == "Male" & age == 80)),
ylim = c(0, 1), col = col3, fun = "event", mark.time = FALSE,
main="Cum. mortality for M and F \n age 80"
)
lines(
survfit(m2, newdata = subset(newd, sex == "Female" & age == 80) ),
col = col3, fun = "event", lty = 2, mark.time = FALSE
)
legend(10, 0.5, title="stage",legend=leg,
col = c(col3[1],col3[1],col3[2],col3[2],col3[3],col3[3]),
,lty=c(2,1,2,1,2,1), cex=0.7)
11.7 Modelling event-specific hazards
- Fit the Cox model for the cause-specific hazard of cancer deaths with the same covariates as above. In this case only cancer deaths are counted as events and deaths from other causes are included into censorings.
m2haz1 <-
coxph(
Surv(time, event == 1) ~ sex + I((age - 65) / 10) + st3,
data = orca2
)
round(ci.exp(m2haz1), 4)
cox.zph(m2haz1)
Compare the results with those of model m2
. What are the major differences?
- Fit a similar model for deaths from other causes and compare the results.
11.8 Lexis object with multi-state set-up
Before entering to multi-state analyses, it might be instructive to apply some Lexis tools to illustrate the competing-risks set-up. More detailed explanation of these tools will be given by Bendix later.
- Form a
Lexis
object from the data frame and print a summary of it. We shall name the main (and only) time axis in this object asstime
.
orca.lex <- Lexis(
exit = list(stime = time),
exit.status = factor(event,
labels = c("Alive", "Oral ca. death", "Other death")
),
data = orca
)
summary(orca.lex)
- Draw a box diagram of the two-state set-up of competing transitions. Run first th e following command line
Now, move the cursor to the point in the graphics window, at which you wish to put the box for Alive, and click. Next, move the cursor to the point at which you wish to have the box for Oral ca. death, and click. Finally, do the same with the box for Other death. If you are not happy with the outcome, run the command line again and repeat the necessary mouse moves and clicks.