Chapter 15 Time-dependent variables and multiple states

The following practical exercise is based on data from the paper:

P Hovind, L Tarnow, P Rossing, B Carstensen, and HH Parving: Improved survival in patients obtaining remission of nephrotic range albuminuria in diabetic nephropathy. Kidney Int, 66(3):1180–1186, Sept 2004.

You can find a .pdf-version of the paper here: http://BendixCarstensen.com/AdvCoh/papers/Hovind.2004.pdf

15.1 The renal failure dataset

The dataset renal.dta contains follow-up data on 125 patients from Steno Diabetes Center. They enter the study when they are diagnosed with nephrotic range albuminuria (NRA). This is a condition where the level of albumin in the urine exceeds a certain threshold, as a sign of kidney disease. The level may however drop as a consequence of treatment; this is called remission. Patients exit the study at death or kidney failure (dialysis or transplant).

| Variable | Description |
|----------|-------------|
| id | Patient id |
| sex | 1=male, 2=female |
| dob | Date of birth |
| doe | Date of entry into the study (2.5 years after NRA) |
| dor | Date of remission. Missing if no remission has occurred |
| dox | Date of exit from study |
| event | Exit status: 1,2,3=event (death, ESRD), 0=censored |

  1. The dataset is in Stata format, so you must read it using read.dta from the foreign package (which is part of the standard R distribution). At the same time, convert sex to a proper factor. Choose which location to read the dataset from.

    library(Epi)
    library(survival)
    library(mgcv)
    library(foreign)
    # renal <- read.dta(
    #  "https://raw.githubusercontent.com/SPE-R/SPE/master/pracs/data/renal.dta")
    renal <- read.dta("http://BendixCarstensen.com/SPE/data/renal.dta")
    renal$sex <- factor(renal$sex, labels = c("M", "F"))
    head(renal)
  2. Use the Lexis function to declare the data as survival data with age, calendar time and time since entry into the study as timescales. Label any event \(>0\) as ESRD, i.e. renal death (death of the kidney, meaning transplant or dialysis, or death of the person). Note that you must make sure that the alive state (here NRA) is the first level, as Lexis assumes that everyone starts in this state (unless of course entry.status is specified):

    Lr <- Lexis(entry = list(per = doe,
                             age = doe - dob,
                             tfi = 0),
                 exit = list(per = dox),
          exit.status = factor(event > 0, labels = c("NRA", "ESRD")),
                 data = renal)
    NOTE: entry.status has been set to "NRA" for all.
    str(Lr)
    Classes 'Lexis' and 'data.frame':   125 obs. of  14 variables:
     $ per    : num  1996 1990 1988 1995 1988 ...
     $ age    : num  28.1 30.2 25.8 44.5 26.6 ...
     $ tfi    : num  0 0 0 0 0 0 0 0 0 0 ...
     $ lex.dur: num  1.08 6.6 5.39 8.75 16.07 ...
     $ lex.Cst: Factor w/ 2 levels "NRA","ESRD": 1 1 1 1 1 1 1 1 1 1 ...
     $ lex.Xst: Factor w/ 2 levels "NRA","ESRD": 2 2 2 1 1 2 2 1 2 1 ...
     $ lex.id : int  1 2 3 4 5 6 7 8 9 10 ...
     $ id     : num  17 26 27 33 42 46 47 55 62 64 ...
     $ sex    : Factor w/ 2 levels "M","F": 1 2 2 1 2 2 1 1 2 1 ...
     $ dob    : num  1968 1959 1962 1951 1961 ...
     $ doe    : num  1996 1990 1988 1995 1988 ...
     $ dor    : num  NA 1990 NA 1996 1997 ...
     $ dox    : num  1997 1996 1993 2004 2004 ...
     $ event  : num  2 1 3 0 0 2 1 0 2 0 ...
     - attr(*, "time.scales")= chr [1:3] "per" "age" "tfi"
     - attr(*, "time.since")= chr [1:3] "" "" ""
     - attr(*, "breaks")=List of 3
      ..$ per: NULL
      ..$ age: NULL
      ..$ tfi: NULL
    summary(Lr)
    
    Transitions:
         To
    From  NRA ESRD  Records:  Events: Risk time:  Persons:
      NRA  48   77       125       77       1085       125

    Make sure you know what the variables in Lr stand for.

  3. Visualize the follow-up in a Lexis-diagram, by using the plot method for Lexis objects.

    plot(Lr, col = "black", lwd = 3)
    subset(Lr, age < 0)
     lex.id  per   age tfi lex.dur lex.Cst lex.Xst  id sex  dob  doe dor  dox event
         88 1989 -38.8   0     3.5     NRA    ESRD 586   M 2028 1989  NA 1993     1

    What is wrong here? List the data for the person with negative entry age.

  4. Correct the data and make a new plot, for example by:

    Lr <- transform(Lr, age = ifelse(dob > 2000, age + 100, age),
                        dob = ifelse(dob > 2000, dob - 100, dob))
    subset(Lr, id == 586)
     lex.id  per  age tfi lex.dur lex.Cst lex.Xst  id sex  dob  doe dor  dox event
         88 1989 61.2   0     3.5     NRA    ESRD 586   M 1928 1989  NA 1993     1
    plot(Lr, col = "black", lwd = 3)
  5. Now make a Cox-regression analysis of ESRD occurrence with the variables sex and age at entry into the study, using time since entry to the study as time scale.

    mc <- coxph(Surv(lex.dur, lex.Xst == "ESRD") 
                ~ I(age / 10) + sex, data = Lr)
    summary(mc)
    Call:
    coxph(formula = Surv(lex.dur, lex.Xst == "ESRD") ~ I(age/10) + 
        sex, data = Lr)
    
      n= 125, number of events= 77 
    
                coef exp(coef) se(coef)     z Pr(>|z|)
    I(age/10)  0.551     1.736    0.140  3.93  8.4e-05
    sexF      -0.182     0.834    0.273 -0.67     0.51
    
              exp(coef) exp(-coef) lower .95 upper .95
    I(age/10)     1.736      0.576     1.319      2.28
    sexF          0.834      1.199     0.489      1.42
    
    Concordance= 0.612  (se = 0.036 )
    Likelihood ratio test= 16.1  on 2 df,   p=3e-04
    Wald test            = 16.4  on 2 df,   p=3e-04
    Score (logrank) test = 16.8  on 2 df,   p=2e-04

    What is the hazard ratio between males and females? Between two persons who differ 10 years in age at entry?
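
As a rough check of how the hazard ratios relate to the coefficients, the sketch below recomputes exp(coef) and a Wald-type 95% interval from the rounded values printed above. The object names (b_age, se_age and so on) are made up for this illustration, and the results only agree with the summary output up to rounding.

```r
# Rounded estimates copied from the summary(mc) output above
b_age  <- 0.551                     # log hazard ratio per 10 years of age at entry
se_age <- 0.140                     # its standard error
b_sexF <- -0.182                    # log hazard ratio, women relative to men

# Hazard ratio for a 10-year difference in age at entry, with Wald 95% CI
hr_age <- exp(b_age)
ci_age <- exp(b_age + c(-1, 1) * qnorm(0.975) * se_age)

# Hazard ratio women vs. men, and its inverse (men vs. women)
hr_F_vs_M <- exp(b_sexF)
hr_M_vs_F <- exp(-b_sexF)

round(c(hr_age, ci_age, hr_F_vs_M, hr_M_vs_F), 3)
```

The inverse ratio corresponds to the exp(-coef) column of the coxph summary.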

  6. The main focus of the paper was to assess whether the occurrence of remission (return to a lower level of albumin excretion, an indication of kidney recovery) influences mortality. Remission is a time-dependent variable which is initially 0, but takes the value 1 when remission occurs. In order to handle this, each person who sees a remission must have two records:

    • One record for the time before remission, where entry is doe, exit is dor, remission is 0, and event is 0.
    • One record for the time after remission, where entry is dor, exit is dox, remission is 1, and event is 0 or 1 according to whether the person had an event at dox.

    This is accomplished using the cutLexis function on the Lexis object, where we introduce a remission state Rem. Also use split.state=TRUE to have different ESRD states according to whether a person had remission or not prior to ESRD. The statement to do this is:

    Lc <- cutLexis(Lr, cut = Lr$dor, # where to cut follow up
                 timescale = "per",  # what timescale are we referring to
                 new.state = "Rem",  # name of the new state
               split.state = TRUE)   # different states depending on previous
    summary(Lc)
    
    Transitions:
         To
    From  NRA Rem ESRD ESRD(Rem)  Records:  Events: Risk time:  Persons:
      NRA  24  29   69         0       122       98        825       122
      Rem   0  24    0         8        32        8        260        32
      Sum  24  53   69         8       154      106       1085       125

    List the records from a few select persons (choose values for lex.id, using for example subset(Lc, lex.id %in% c(5,7,9))).
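
To make the two-record layout concrete, here is a minimal base R sketch on invented toy data (two hypothetical persons, one with and one without remission). It only illustrates the bookkeeping described above; it ignores the timescales that cutLexis keeps track of, and it assumes that any remission date lies between entry and exit.

```r
# Hypothetical toy data: person 1 has a remission in 1993, person 2 has none
toy <- data.frame(id    = c(1, 2),
                  doe   = c(1990, 1991),
                  dor   = c(1993, NA),
                  dox   = c(1998, 1995),
                  event = c(1, 0))

has_rem <- !is.na(toy$dor)

# Record before remission (or the whole follow-up if there is no remission)
before <- with(toy, data.frame(id    = id,
                               entry = doe,
                               exit  = ifelse(has_rem, dor, dox),
                               rem   = 0,
                               event = ifelse(has_rem, 0, event)))

# Record after remission, only for persons with a remission date
after <- with(toy[has_rem, ], data.frame(id    = id,
                                         entry = dor,
                                         exit  = dox,
                                         rem   = 1,
                                         event = event))

long <- rbind(before, after)
long <- long[order(long$id, long$entry), ]
long
```

The total follow-up time and the number of events are unchanged by the split; only the number of records grows.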

  7. Now show how the states are connected and the number of transitions between them by using boxes. This is an interactive command that requires you to click in the graph window:

    boxes(Lc)

    It has a couple of fancy arguments, try:

    boxes(Lc, boxpos = TRUE, scale.R = 100, show.BE = TRUE, hm = 1.5, wm = 1.5)

    You may even be tempted to read the help page for boxes.Lexis

  8. Plot a Lexis diagram where different coloring is used for different segments of the follow-up. The plot.Lexis function draws a line for each record in the dataset, so you can index the coloring by lex.Cst and lex.Xst as appropriate. Indexing by a factor corresponds to indexing by the index number of the factor levels, so you must know which order the factor levels are in:

    levels(Lc) # names and order of states in lex.Cst and lex.Xst
    [1] "NRA"       "Rem"       "ESRD"      "ESRD(Rem)"
    par(mai = c(3, 3, 1, 1) / 4, mgp = c(3, 1, 0) / 1.6)
    plot(Lc, col = c("red", "limegreen")[Lc$lex.Cst],
            xlab = "Calendar time", ylab = "Age",
             lwd = 3, grid = 0:20 * 5, las = 1,
            xlim = c(1970, 2010), ylim = c(20, 70), 
            xaxs = "i", yaxs = "i")
    points(Lc, pch = c(NA, NA, 16, 16)[Lc$lex.Xst],
               col = c("red", "limegreen", "transparent", "transparent")[Lc$lex.Cst])
    points(Lc, pch = c(NA, NA, 1, 1)[Lc$lex.Xst],
               col = "black", lwd = 2)
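
The factor-indexing behaviour used for the colors can be checked on a small made-up example in base R. The vector names st and clr are hypothetical; the point is that the integer codes of the factor, not its labels, select the elements.

```r
# A factor whose level order differs from the order of its values
st  <- factor(c("Rem", "NRA", "ESRD"), levels = c("NRA", "Rem", "ESRD"))
clr <- c("red", "limegreen", "black")   # one color per level, in level order

as.integer(st)      # integer codes used for indexing
by_code <- clr[st]  # indexing by the factor uses these codes

# Equivalent, and less dependent on level order: index a named vector by label
clr_named <- c(NRA = "red", Rem = "limegreen", ESRD = "black")
by_name   <- unname(clr_named[as.character(st)])

by_code
```
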
  9. Make a Cox-regression of mortality rates (i.e. endpoint ESRD or ESRD(Rem)) with sex, age at entry and remission as explanatory variables, using time since entry as timescale. Include lex.Cst as a time-dependent variable, and indicate that each record represents follow-up from tfi to tfi+lex.dur. Make sure you understand what goes where, and why, in the call to coxph.

    (EP <- levels(Lc)[3:4])           # define EndPoint states
    [1] "ESRD"      "ESRD(Rem)"
    m1 <- coxph(Surv(tfi,             # entry time
                     tfi + lex.dur,   # exit time
                     lex.Xst %in% EP) # event
                ~ sex + I((doe - dob - 50) / 10) + # fixed covariates
                  (lex.Cst == "Rem"),              # time-dependent variable
                data = Lc)
    summary(m1)
    Call:
    coxph(formula = Surv(tfi, tfi + lex.dur, lex.Xst %in% EP) ~ sex + 
        I((doe - dob - 50)/10) + (lex.Cst == "Rem"), data = Lc)
    
      n= 154, number of events= 77 
    
                              coef exp(coef) se(coef)     z Pr(>|z|)
    sexF                   -0.0553    0.9462   0.2750 -0.20  0.84052
    I((doe - dob - 50)/10)  0.5219    1.6852   0.1366  3.82  0.00013
    lex.Cst == "Rem"TRUE   -1.2624    0.2830   0.3848 -3.28  0.00104
    
                           exp(coef) exp(-coef) lower .95 upper .95
    sexF                       0.946      1.057     0.552     1.622
    I((doe - dob - 50)/10)     1.685      0.593     1.290     2.202
    lex.Cst == "Rem"TRUE       0.283      3.534     0.133     0.602
    
    Concordance= 0.664  (se = 0.033 )
    Likelihood ratio test= 30.3  on 3 df,   p=1e-06
    Wald test            = 27.1  on 3 df,   p=6e-06
    Score (logrank) test = 29.4  on 3 df,   p=2e-06

    What is the effect of remission on the rate of ESRD?
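
In symbols, the model fitted as m1 can be written as follows. The notation is introduced here for illustration and is not part of the original exercise: \(\lambda_0\) is the unspecified baseline hazard, \(a\) is age at entry (doe - dob) and \(R(t)\) is the remission indicator.

```latex
\[
\lambda\bigl(t \mid \text{sex}, a, R(t)\bigr)
  = \lambda_0(t)\,
    \exp\!\left( \beta_1\,\mathbf{1}\{\text{sex} = \text{F}\}
               + \beta_2\,\frac{a - 50}{10}
               + \beta_3\,R(t) \right),
\qquad
R(t) =
\begin{cases}
  1 & \text{if remission has occurred before time } t,\\
  0 & \text{otherwise.}
\end{cases}
\]
```

Under this model \(\exp(\beta_3)\) is the ESRD rate ratio between persons in remission and persons not in remission at the same time since entry, for the same sex and age at entry.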

    Splitting the follow-up time

    In order to explore the effect of remission on the rate of ESRD, we split the data further into small pieces of follow-up. To this end we use the function splitLexis. The rates can then be modeled using a Poisson-model, and the shape of the underlying rates can be explored. Furthermore, we can allow effects of both time since NRA and current age. To this end we will use splines, so we need the splines and also the mgcv packages.

  10. Now split the follow-up time every month after entry, and verify that the number of events and the risk time are the same before and after the split:

    sLc <- splitLexis(Lc, "tfi", breaks = seq(0, 30, 1/12))
    summary( Lc)
    
    Transitions:
         To
    From  NRA Rem ESRD ESRD(Rem)  Records:  Events: Risk time:  Persons:
      NRA  24  29   69         0       122       98        825       122
      Rem   0  24    0         8        32        8        260        32
      Sum  24  53   69         8       154      106       1085       125
    summary(sLc)
    
    Transitions:
         To
    From   NRA  Rem ESRD ESRD(Rem)  Records:  Events: Risk time:  Persons:
      NRA 9854   29   69         0      9952       98        825       122
      Rem    0 3139    0         8      3147        8        260        32
      Sum 9854 3168   69         8     13099      106       1085       125
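
Why events and risk time are preserved can be seen from a hand-rolled sketch of splitting a single follow-up interval in base R. The helper split_interval is hypothetical and far simpler than splitLexis (one record, one timescale), but it follows the same principle: the interval is cut at the break points and the event is attached to the last piece only.

```r
# Split one follow-up interval [entry, exit] at the given break points.
# The event indicator is carried by the last piece only.
split_interval <- function(entry, exit, event, breaks) {
  inner <- breaks[breaks > entry & breaks < exit]
  cuts  <- sort(unique(c(entry, inner, exit)))
  n     <- length(cuts) - 1
  data.frame(start = cuts[-length(cuts)],
             stop  = cuts[-1],
             event = c(rep(0, n - 1), event))
}

# Follow-up from 0.5 to 3.2 years ending in an event, split at whole years
pieces <- split_interval(entry = 0.5, exit = 3.2, event = 1, breaks = 0:5)
pieces

# Risk time and number of events are unchanged by the split
c(risk_time = sum(pieces$stop - pieces$start), events = sum(pieces$event))
```
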
  11. Now fit the Poisson-model corresponding to the Cox-model we fitted previously. The function Ns() produces a model matrix corresponding to a piece-wise cubic function (a natural spline), modeling the baseline hazard explicitly (think of the Ns terms as the baseline hazard that is not visible in the Cox-model). You can use the wrapper function glm.Lexis:

    mp <- glm.Lexis(sLc, 
                    ~ Ns(tfi, knots = c(0, 2, 5, 10)) +
                      sex + I((doe - dob - 40) / 10) + 
                      I(lex.Cst == "Rem"))
    stats::glm Poisson analysis of Lexis object sLc with log link:
    Rates for transitions:
    NRA->ESRD
    Rem->ESRD(Rem)
    ci.exp(mp)
                                     exp(Est.)    2.5%   97.5%
    (Intercept)                         0.0166 0.00396   0.070
    Ns(tfi, knots = c(0, 2, 5, 10))1    5.1892 1.94920  13.815
    Ns(tfi, knots = c(0, 2, 5, 10))2   34.2000 1.76482 662.755
    Ns(tfi, knots = c(0, 2, 5, 10))3    4.4332 2.17998   9.015
    sexF                                0.9175 0.53626   1.570
    I((doe - dob - 40)/10)              1.7008 1.30081   2.224
    I(lex.Cst == "Rem")TRUE             0.2793 0.13140   0.594

    How does the effect of sex differ from that in the Cox-model?
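
The link between split follow-up records and a Poisson model can be illustrated with a toy example in base R. The data are invented, and fitting with glm, family = poisson and an offset of log person-years is one common way to set up such a model; it is not necessarily how glm.Lexis does it internally.

```r
# Invented toy records: d = event indicator, y = person-years in each piece
toy <- data.frame(d = c(0, 1, 0, 0, 1),
                  y = c(1, 0.5, 1, 1, 0.5))

# Constant-rate Poisson model: log(E[d]) = log(y) + intercept
m0 <- glm(d ~ 1 + offset(log(y)), family = poisson, data = toy)

rate_glm   <- unname(exp(coef(m0)))     # estimated rate per person-year
rate_crude <- sum(toy$d) / sum(toy$y)   # events divided by person-years

c(glm = rate_glm, crude = rate_crude)
```

With only an intercept the fitted rate equals events divided by person-years; adding spline terms for tfi lets the rate vary smoothly over time instead.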

  12. Try instead using the gam function from the mgcv package. There is a convenience wrapper for this for Lexis objects as well:

    mx <- gam.Lexis(sLc,
                    ~ s(tfi, k = 10) + 
                      sex + I((doe - dob - 40) / 10) + 
                      I(lex.Cst == "Rem"))
    mgcv::gam Poisson analysis of Lexis object sLc with log link:
    Rates for transitions:
    NRA->ESRD
    Rem->ESRD(Rem)
    ci.exp(mp, subset = c("Cst", "doe", "sex"))
                            exp(Est.)  2.5% 97.5%
    I(lex.Cst == "Rem")TRUE     0.279 0.131 0.594
    I((doe - dob - 40)/10)      1.701 1.301 2.224
    sexF                        0.918 0.536 1.570
    ci.exp(mx, subset = c("Cst", "doe", "sex"))
                            exp(Est.)  2.5% 97.5%
    I(lex.Cst == "Rem")TRUE     0.278 0.131 0.592
    I((doe - dob - 40)/10)      1.699 1.300 2.222
    sexF                        0.931 0.544 1.595

    We see that there is virtually no difference between the two approaches in terms of the regression parameters.

  13. Extract the regression parameters from the models using ci.exp and compare with the estimates from the Cox-model:

    ci.exp(mx, subset = c("sex", "dob", "Cst"), pval = TRUE)
                            exp(Est.)  2.5% 97.5%        P
    sexF                        0.931 0.544 1.595 0.794539
    I((doe - dob - 40)/10)      1.699 1.300 2.222 0.000107
    I(lex.Cst == "Rem")TRUE     0.278 0.131 0.592 0.000897
    ci.exp(m1)
                           exp(Est.)  2.5% 97.5%
    sexF                       0.946 0.552 1.622
    I((doe - dob - 50)/10)     1.685 1.290 2.202
    lex.Cst == "Rem"TRUE       0.283 0.133 0.602
    round(ci.exp(mp, subset = c("sex", "dob", "Cst")) / ci.exp(m1), 2)
                            exp(Est.) 2.5% 97.5%
    sexF                         0.97 0.97  0.97
    I((doe - dob - 40)/10)       1.01 1.01  1.01
    I(lex.Cst == "Rem")TRUE      0.99 0.99  0.99

    How large are the differences in estimated regression parameters?

  14. The model has the same assumptions as the Cox-model about proportionality of rates, but there is an additional assumption that the hazard is a smooth function of time since entry. It seems to be a sensible assumption (well, restriction) to put on the rates that they vary smoothly by time. No such restriction is made in the Cox model. The gam model optimizes the degree of smoothing by generalized cross-validation (or, for a Poisson model with known scale as here, the closely related UBRE criterion reported by summary). Try to look at the shape of the estimated effect of tfi:

    plot(mx)

    Is this a useful plot?

  15. However, plot (well, plot.gam) does not give you the absolute level of the underlying rates because it bypasses the intercept. So in order to predict the rates as a function of tfi and the covariates, we set up a prediction data frame. Note that age in the model specification is entered as doe-dob, hence the prediction data frame must have these two variables and not the age, but it is only the difference that matters for the prediction:

    nd <- data.frame(tfi = seq(0, 20, 0.1),
                     sex = "M",
                     doe = 1990,
                     dob = 1940,
                 lex.Cst = "NRA")
    str(nd)
    'data.frame':   201 obs. of  5 variables:
     $ tfi    : num  0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 ...
     $ sex    : chr  "M" "M" "M" "M" ...
     $ doe    : num  1990 1990 1990 1990 1990 1990 1990 1990 1990 1990 ...
     $ dob    : num  1940 1940 1940 1940 1940 1940 1940 1940 1940 1940 ...
     $ lex.Cst: chr  "NRA" "NRA" "NRA" "NRA" ...
    matshade(nd$tfi, cbind(ci.pred(mp, newdata = nd),
                           ci.pred(mx, newdata = nd)) * 100,
             plot = TRUE,
             type = "l", lwd = 3:4, col = c("black", "forestgreen"),
             log = "y", xlab = "Time since entry (years)",
             ylab = "ESRD rate (per 100 PY) for 50 year old men")

    Try to overlay with the corresponding prediction from the glm model using Ns.

## Prediction from the multistate model

If we want to make proper statements about the survival and disease
probabilities we must know not only how the occurrence of remission
influences the rate of death/ESRD, but we must also model the
occurrence rate of remission itself.
  1. The rates of ESRD were modelled by a Poisson model with effects of age and time since NRA, in the models mp and mx. But if we want to model the whole process we must also model the remission rates, i.e. the transition from NRA to Rem. The number of remission events is rather small, so we restrict the covariates in this model to time since NRA and sex. Note that only the records that represent follow-up in the NRA state should be used; this is most easily done using the gam.Lexis function:

    mr <- gam.Lexis(sLc, ~ s(tfi, k = 10) + sex,
                         from = "NRA",
                           to = "Rem")
    mgcv::gam Poisson analysis of Lexis object sLc with log link:
    Rates for the transition:
    NRA->Rem
    summary(mr)
    
    Family: poisson 
    Link function: log 
    
    Formula:
    cbind(trt(Lx$lex.Cst, Lx$lex.Xst) %in% trnam, Lx$lex.dur) ~ s(tfi, 
        k = 10) + sex
    
    Parametric coefficients:
                Estimate Std. Error z value Pr(>|z|)
    (Intercept)   -3.703      0.258  -14.34   <2e-16
    sexF           0.958      0.373    2.57     0.01
    
    Approximate significance of smooth terms:
            edf Ref.df Chi.sq p-value
    s(tfi) 1.01   1.03   0.07    0.81
    
    R-sq.(adj) =  -5.65e-06   Deviance explained = 1.65%
    UBRE = -0.96024  Scale est. = 1         n = 9952
    ci.exp(mr, pval = TRUE)
                exp(Est.)   2.5%  97.5%        P
    (Intercept)    0.0247 0.0149 0.0409 1.25e-46
    sexF           2.6062 1.2550 5.4120 1.02e-02
    s(tfi).1       1.0050 0.8913 1.1332 9.35e-01
    s(tfi).2       0.9962 0.8078 1.2287 9.72e-01
    s(tfi).3       0.9982 0.9191 1.0841 9.66e-01
    s(tfi).4       1.0019 0.8901 1.1278 9.75e-01
    s(tfi).5       0.9984 0.9228 1.0802 9.69e-01
    s(tfi).6       0.9982 0.9014 1.1053 9.72e-01
    s(tfi).7       1.0017 0.9262 1.0834 9.66e-01
    s(tfi).8       0.9945 0.6845 1.4449 9.77e-01
    s(tfi).9       0.9479 0.6335 1.4183 7.95e-01

    What is the remission rate-ratio between men and women?

  2. If we want to predict the probability of being in each of the four states (NRA, Rem, ESRD and ESRD(Rem)) using these estimated rates, we may resort to analytical calculations of the probabilities from the estimated rates, which is actually doable in this case, but which will be largely intractable for more complicated models. Alternatively we can simulate the life course for a large group of (identical) individuals through a model using the estimated rates. That will give a simulated cohort (in the form of a Lexis object), and we can then just count the number of persons in each state at each of a set of time points. This is accomplished using the function simLexis. The input to this is the initial status of the persons whose life-course we shall simulate, and the transition rates in suitable form:

    • Suppose we want predictions for men and women aged 50 at NRA. The input is in the form of a Lexis object (where lex.dur and lex.Xst will be ignored). Note that in order to carry over the time.scales and the time.since attributes, we construct the input object using subset to select columns, and NULL to select rows (see the example in the help file for simLexis):
    inL <- subset(sLc, select = 1:11)[NULL, ]
    str(inL)
    Classes 'Lexis' and 'data.frame':   0 obs. of  11 variables:
     $ lex.id : int 
     $ per    : num 
     $ age    : num 
     $ tfi    : num 
     $ lex.dur: num 
     $ lex.Cst: Factor w/ 4 levels "NRA","Rem","ESRD",..: 
     $ lex.Xst: Factor w/ 4 levels "NRA","Rem","ESRD",..: 
     $ id     : num 
     $ sex    : Factor w/ 2 levels "M","F": 
     $ dob    : num 
     $ doe    : num 
     - attr(*, "time.scales")= chr [1:3] "per" "age" "tfi"
     - attr(*, "time.since")= chr [1:3] "" "" ""
     - attr(*, "breaks")=List of 3
      ..$ per: NULL
      ..$ age: NULL
      ..$ tfi: num [1:361] 0 0.0833 0.1667 0.25 0.3333 ...
    timeScales(inL)
    [1] "per" "age" "tfi"
    inL[1, "lex.id"] <- 1
    inL[1, "per"] <- 2000
    inL[1, "age"] <- 50
    inL[1, "tfi"] <- 0
    inL[1, "lex.Cst"] <- "NRA"
    inL[1, "lex.Xst"] <- NA
    inL[1, "lex.dur"] <- NA
    inL[1, "sex"] <- "M"
    inL[1, "doe"] <- 2000
    inL[1, "dob"] <- 1950
    inL <- rbind(inL, inL)
    inL[2, "sex"] <- "F"
    inL
     lex.id  per age tfi lex.dur lex.Cst lex.Xst id sex  dob  doe
          1 2000  50   0      NA     NRA    <NA> NA   M 1950 2000
          1 2000  50   0      NA     NRA    <NA> NA   F 1950 2000
    str(inL)
    Classes 'Lexis' and 'data.frame':   2 obs. of  11 variables:
     $ lex.id : num  1 1
     $ per    : num  2000 2000
     $ age    : num  50 50
     $ tfi    : num  0 0
     $ lex.dur: num  NA NA
     $ lex.Cst: Factor w/ 4 levels "NRA","Rem","ESRD",..: 1 1
     $ lex.Xst: Factor w/ 4 levels "NRA","Rem","ESRD",..: NA NA
     $ id     : num  NA NA
     $ sex    : Factor w/ 2 levels "M","F": 1 2
     $ dob    : num  1950 1950
     $ doe    : num  2000 2000
     - attr(*, "breaks")=List of 3
      ..$ per: NULL
      ..$ age: NULL
      ..$ tfi: num [1:361] 0 0.0833 0.1667 0.25 0.3333 ...
     - attr(*, "time.scales")= chr [1:3] "per" "age" "tfi"
     - attr(*, "time.since")= chr [1:3] "" "" ""

    The other input for the simulation is the models for the transitions. This is given as a list with an element for each transient state (that is NRA and Rem), each of which is again a list with names equal to the states that can be reached from the transient state. The content of the list will be glm objects (here gam objects, which inherit from glm), in this case the models we just fitted, describing the transition rates:

    Tr <- list("NRA" = list("Rem"  = mr,
                            "ESRD" = mx),
               "Rem" = list("ESRD(Rem)" = mx))

    With this as input we can now generate a cohort, using N=10 to simulate the life course of 20 persons (10 for each set of starting values in inL):

    (iL <- simLexis(Tr, inL, N = 10))
     lex.id  per  age  tfi lex.dur lex.Cst   lex.Xst id sex  dob  doe cens
          1 2000 50.0 0.00    2.97     NRA      ESRD NA   M 1950 2000 2020
          2 2000 50.0 0.00    0.44     NRA       Rem NA   M 1950 2000 2020
          2 2000 50.4 0.44   12.00     Rem ESRD(Rem) NA   M 1950 2000 2020
          3 2000 50.0 0.00   10.63     NRA      ESRD NA   M 1950 2000 2020
          4 2000 50.0 0.00   10.60     NRA      ESRD NA   M 1950 2000 2020
          5 2000 50.0 0.00   14.29     NRA      ESRD NA   M 1950 2000 2020
          6 2000 50.0 0.00    3.00     NRA      ESRD NA   M 1950 2000 2020
          7 2000 50.0 0.00   20.00     NRA       NRA NA   M 1950 2000 2020
          8 2000 50.0 0.00   11.21     NRA      ESRD NA   M 1950 2000 2020
          9 2000 50.0 0.00   15.42     NRA      ESRD NA   M 1950 2000 2020
         10 2000 50.0 0.00    9.74     NRA      ESRD NA   M 1950 2000 2020
         11 2000 50.0 0.00   12.24     NRA      ESRD NA   F 1950 2000 2020
         12 2000 50.0 0.00    8.18     NRA      ESRD NA   F 1950 2000 2020
         13 2000 50.0 0.00    4.78     NRA      ESRD NA   F 1950 2000 2020
         14 2000 50.0 0.00    1.23     NRA       Rem NA   F 1950 2000 2020
         14 2001 51.2 1.23    6.32     Rem ESRD(Rem) NA   F 1950 2000 2020
         15 2000 50.0 0.00   14.64     NRA      ESRD NA   F 1950 2000 2020
         16 2000 50.0 0.00    2.51     NRA       Rem NA   F 1950 2000 2020
         16 2003 52.5 2.51    4.66     Rem ESRD(Rem) NA   F 1950 2000 2020
         17 2000 50.0 0.00    0.38     NRA       Rem NA   F 1950 2000 2020
         17 2000 50.4 0.38   10.51     Rem ESRD(Rem) NA   F 1950 2000 2020
         18 2000 50.0 0.00    6.73     NRA      ESRD NA   F 1950 2000 2020
         19 2000 50.0 0.00    0.09     NRA       Rem NA   F 1950 2000 2020
         19 2000 50.1 0.09    6.74     Rem ESRD(Rem) NA   F 1950 2000 2020
         20 2000 50.0 0.00    4.24     NRA      ESRD NA   F 1950 2000 2020
    summary(iL, by = "sex")
    $M
    
    Transitions:
         To
    From  NRA Rem ESRD ESRD(Rem)  Records:  Events: Risk time:  Persons:
      NRA   1   1    8         0        10        9       98.3        10
      Rem   0   0    0         1         1        1       12.0         1
      Sum   1   1    8         1        11       10      110.3        10
    
    $F
    
    Transitions:
         To
    From  NRA Rem ESRD ESRD(Rem)  Records:  Events: Risk time:  Persons:
      NRA   0   4    6         0        10       10       55.0        10
      Rem   0   0    0         4         4        4       28.2         4
      Sum   0   4    6         4        14       14       83.2        10

    What type of object have you got as iL?
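
The principle behind the simulation can be sketched in base R for the special case of constant rates. The three rates below are hypothetical values chosen for illustration, not estimates from the renal data, and simLexis itself handles rates that vary with time and covariates. With constant rates the waiting times are exponential, and the first transition out of NRA is the earlier of two competing waiting times.

```r
set.seed(2024)
n <- 10000

# Hypothetical constant rates per person-year (NOT estimates from renal data)
r_rem      <- 0.03   # NRA -> Rem
r_esrd     <- 0.10   # NRA -> ESRD
r_esrd_rem <- 0.03   # Rem -> ESRD(Rem)

# Competing waiting times out of NRA; the smaller one determines the transition
t_rem  <- rexp(n, rate = r_rem)
t_esrd <- rexp(n, rate = r_esrd)
t_out  <- pmin(t_rem, t_esrd)
to_rem <- t_rem < t_esrd

# Persons who enter Rem get an additional waiting time to ESRD(Rem)
t_end <- ifelse(to_rem, t_out + rexp(n, rate = r_esrd_rem), NA)

# State occupied by each simulated person at time t since entry
state_at <- function(t) {
  st <- ifelse(t < t_out, "NRA",
        ifelse(!to_rem, "ESRD",
        ifelse(t < t_end, "Rem", "ESRD(Rem)")))
  factor(st, levels = c("NRA", "Rem", "ESRD", "ESRD(Rem)"))
}

# Simulated state probabilities at 10 years, and the exact P(still in NRA)
p10         <- prop.table(table(state_at(10)))
p_nra_exact <- exp(-(r_rem + r_esrd) * 10)
round(p10, 3)
```

In this simple setting the simulated fraction still in NRA can be compared with the analytical value p_nra_exact; for models with smooth time-varying rates such closed forms quickly become unwieldy, which is the motivation for simulating.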

  3. Now generate the life course of, say, 5,000 persons, and look at the summary. The system.time command is just to tell you how long it took; you may want to start with 500 (as in the code below) just to see how long that takes.

    system.time(sM <- simLexis(Tr, inL, N = 500, t.range = 12))
       user  system elapsed 
       2.57    3.06    1.97 
    summary(sM, by = "sex")
    $M
    
    Transitions:
         To
    From  NRA Rem ESRD ESRD(Rem)  Records:  Events: Risk time:  Persons:
      NRA  30  72  398         0       500      470       2721       500
      Rem   0  35    0        37        72       37        415        72
      Sum  30 107  398        37       572      507       3136       500
    
    $F
    
    Transitions:
         To
    From  NRA Rem ESRD ESRD(Rem)  Records:  Events: Risk time:  Persons:
      NRA  32 143  325         0       500      468       2358       500
      Rem   0  89    0        54       143       54       1023       143
      Sum  32 232  325        54       643      522       3380       500

    Why are there so many ESRD-events in the resulting data set?

  4. Now count how many persons are present in each state at each time for the first 10 years after entry (which is at age 50). This can be done by using nState. Try:

    nStm <- nState(subset(sM, sex == "M"), time.scale = "age", 
                   at = seq(0, 10, 0.1), 
                 from = 50)
    nStf <- nState(subset(sM, sex == "F"), time.scale = "age", 
                   at = seq(0, 10, 0.1), 
                 from = 50)
    head(nStf, 15)
          State
    when   NRA Rem ESRD ESRD(Rem)
      50   500   0    0         0
      50.1 493   3    4         0
      50.2 488   6    6         0
      50.3 483   8    9         0
      50.4 471  15   14         0
      50.5 466  16   18         0
      50.6 458  20   22         0
      50.7 452  22   26         0
      50.8 448  25   27         0
      50.9 443  29   28         0
      51   439  32   29         0
      51.1 429  35   36         0
      51.2 420  37   43         0
      51.3 417  39   43         1
      51.4 413  41   45         1

    What is in the object nStf?

  5. With the counts of persons in each state at the designated time points (in nStm and nStf), compute the cumulative fraction over the states, arranged in order given by perm:

    ppm <- pState(nStm, perm = c(2, 1, 3, 4))
    ppf <- pState(nStf, perm = c(2, 1, 3, 4))
    head(ppf)
          State
    when     Rem   NRA ESRD ESRD(Rem)
      50   0.000 1.000    1         1
      50.1 0.006 0.992    1         1
      50.2 0.012 0.988    1         1
      50.3 0.016 0.982    1         1
      50.4 0.030 0.972    1         1
      50.5 0.032 0.964    1         1
    tail(ppf)
          State
    when     Rem   NRA  ESRD ESRD(Rem)
      59.5 0.200 0.312 0.926         1
      59.6 0.198 0.308 0.924         1
      59.7 0.198 0.306 0.924         1
      59.8 0.194 0.302 0.920         1
      59.9 0.196 0.300 0.918         1
      60   0.196 0.298 0.918         1

    What do the entries in ppf represent?
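
The arithmetic behind one row of ppf can be reproduced by hand. The sketch below takes the counts at age 50.1 from the nStf output shown earlier, reorders them by perm and computes cumulative fractions; the object names are made up for this illustration.

```r
# Counts of women in each state at age 50.1, copied from head(nStf, 15)
n_row <- c(NRA = 493, Rem = 3, ESRD = 4, "ESRD(Rem)" = 0)

# Same ordering of states as in the call to pState
perm <- c(2, 1, 3, 4)

# Cumulative fraction over the states in the permuted order
cum_frac <- cumsum(n_row[perm]) / sum(n_row)
cum_frac
```

The result should match the row for 50.1 in head(ppf); the differences between neighbouring columns are the fractions in each state.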

  6. Try to plot the cumulative probabilities using the plot method for pState objects:

    plot(ppf)

    Is this useful?

  7. Now try to improve the plot so that it is easier to read, and easier to compare between men and women, for example:

    par(mfrow = c(1, 2))
    # Men
    plot(ppm, col = c("limegreen", "red", "#991111", "forestgreen"))
    lines(as.numeric(rownames(ppm)), ppm[, "NRA"], lwd = 2)
    text(59.5, 0.95, "Men", adj = 1, col = "white", font = 2, cex = 1.2)
    axis(side = 4, at = 0:10 / 10)
    axis(side = 4, at = 1:99 / 100, labels = NA, tck = -0.01)
    # Women 
    plot(ppf, col = c("limegreen", "red", "#991111", "forestgreen"),
              xlim = c(60, 50)) # inverted x-axis
    lines(as.numeric(rownames(ppf)), ppf[, "NRA"], lwd = 2)
    text(59.5, 0.95, "Women", adj = 0, col = "white", font = 2, cex = 1.2)
    axis(side = 2, at = 0:10 / 10)
    axis(side = 2, at = 1:99 / 100, labels = NA, tck = -0.01)

    What is the 10-year risk of remission for men and women respectively?
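
One way to read the 10-year risk of remission off the cumulative fractions is to add the fraction currently in Rem to the fraction that reached ESRD via remission. The sketch below does this for the last row of ppf printed earlier (women, age 60). The numbers are copied from that output and depend on the particular simulation run, so they are only indicative.

```r
# Cumulative fractions for women at age 60, copied from tail(ppf)
pp60 <- c(Rem = 0.196, NRA = 0.298, ESRD = 0.918, "ESRD(Rem)" = 1)

# Differences between neighbouring columns give the state probabilities
p60 <- diff(c(0, pp60))
p60

# Ever in remission within 10 years: still in Rem, or ESRD after remission
risk_rem <- unname(p60["Rem"] + p60["ESRD(Rem)"])
risk_rem
```

The same calculation applied to the last row of ppm gives the corresponding figure for men.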