Chapter 4 Tabulation
4.1 Introduction
R and its add-on packages provide several different
tabulation functions with different capabilities. The appropriate function
to use depends on your goal. There are at least three different uses for
tables.
The first use is to create simple summary statistics that will be used
for further calculations in R. For example, a two-by-two table created by
the table function can be passed to fisher.test, which will calculate
odds ratios and confidence intervals. The appearance of these tables may,
however, be quite basic, as their principal goal is to create new objects
for future calculations.
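As a minimal sketch of this first use, the code below builds a two-by-two table with table and passes it to fisher.test. The exposure and outcome vectors are hypothetical data invented purely for illustration; only base R functions are used.

```r
# Hypothetical data: 20 exposed and 20 unexposed subjects (illustration only)
exposure <- rep(c("exposed", "unexposed"), times = c(20, 20))
outcome  <- c(rep(c("case", "control"), times = c(12, 8)),
              rep(c("case", "control"), times = c(5, 15)))

# table() cross-classifies the two vectors into a 2 x 2 table of counts
tab <- table(exposure, outcome)
print(tab)

# fisher.test() accepts the table directly and returns a list-like object
ft <- fisher.test(tab)
ft$estimate   # estimated odds ratio
ft$conf.int   # confidence interval for the odds ratio
```

The printed table is plain, but the object ft can be reused in later calculations, which is the point of this kind of tabulation.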
A quite different use of tabulation is to make production-quality
tables for publication. You may want to generate reports for
publication in paper form or on the World Wide Web. The package
xtable provides this capability, but it is not covered by this course.
An intermediate use of tabulation functions is to create human-readable
tables for discussion within your work-group, but not for publication. The
Epi package provides a function stat.table for this purpose,
and this practical is designed to introduce this function.
4.2 The births data
We shall use the births data, which concern 500 mothers who had singleton births in a large London hospital. The outcome of interest is the birth weight of the baby, also dichotomised as normal or low birth weight. These data are available in the Epi package and can be loaded with data(births) once the package has been attached with library(Epi).
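A minimal sketch of loading and inspecting the data, assuming the Epi package is already installed. The inspection calls (names, head, nrow) are ordinary base R functions added here for illustration.

```r
library(Epi)    # assumes the Epi package is installed
data(births)    # loads the births data frame into the workspace

names(births)   # variable names
head(births)    # first six rows
nrow(births)    # number of records
```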
In order to work with this data set we need to transform some of the variables into factors. This is done with the following commands:
# Binary variables become factors with informative labels
births$hyp <- factor(births$hyp, labels = c("normal", "hyper"))
births$sex <- factor(births$sex, labels = c("M", "F"))
# Continuous variables are grouped into left-closed intervals
births$agegrp <- cut(
  births$matage,
  breaks = c(20, 25, 30, 35, 40, 45),
  right = FALSE
)
births$gest4 <- cut(
  births$gestwks,
  breaks = c(20, 35, 37, 39, 45),
  right = FALSE
)
Now use str(births) to examine the modified data frame. We have
transformed the binary variables hyp and sex into factors
with informative labels. This will help when displaying the tables. We
have also created grouped variables agegrp and gest4 from
the continuous variables matage and gestwks so that they
can be tabulated.
4.3 One-way tables
The simplest one-way table is created by calling stat.table with sex as
the index argument and births as the data argument.
This creates a count of individuals, classified by levels of the
factor sex. Compare this table with the equivalent one produced
by the table function. Note that stat.table has a data
argument that allows you to use variables in a data frame without
specifying the frame.
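The call described above might look like the following sketch; the original call is not shown in the text, so the exact form is an assumption. The setup lines repeat the loading and factor conversion from section 4.2 so that the block runs on its own.

```r
library(Epi)
data(births)
births$sex <- factor(births$sex, labels = c("M", "F"))

# One-way table of counts; sex is looked up in the births data frame
tab_sex <- stat.table(index = sex, data = births)
print(tab_sex)

# Base R equivalent, where the data frame has to be named explicitly
table(births$sex)
```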
You can display several summary statistics in the same table by
giving a list of expressions to the contents argument, for example
count() together with percent(sex).
Only a limited set of expressions is allowed: see the help page
for stat.table for details.
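A sketch of a table with two summary statistics, assuming count() and percent(sex) are the statistics wanted; both are among the expressions that stat.table accepts.

```r
library(Epi)
data(births)
births$sex <- factor(births$sex, labels = c("M", "F"))

# contents is a list, so each expression gets its own column
tab_sex2 <- stat.table(
  index = sex,
  contents = list(count(), percent(sex)),
  data = births
)
print(tab_sex2)
```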
You can also calculate marginal tables by specifying margin=TRUE
in your call to stat.table. Do this for the above table. Check
that the percentages add up to 100 and the total for count() is the
same as the number of rows of the data frame births.
To see how the mean birth weight changes with sex, try giving
mean(bweight) as the contents argument, keeping sex as the index.
Add the count to this table. Add also the margin with margin=TRUE.
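The starting point for this exercise might look like the sketch below; the exact call from the original practical is not shown in the text, so this form is an assumption. Adding the count and the margin is left as described above.

```r
library(Epi)
data(births)
births$sex <- factor(births$sex, labels = c("M", "F"))

# Mean of bweight within each level of sex
tab_mean <- stat.table(index = sex, contents = mean(bweight), data = births)
print(tab_mean)
```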
As an alternative to bweight we can look at lowbw, with
percent(lowbw) as the contents argument.
All the percentages are 100! To use the percent function, the
variable lowbw must also be in the index, which then becomes a list
of two variables.
The final column is the percentage of babies with low birth weight by different categories of gestation.
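The two calls described above might look like the following sketch. The original calls are not shown in the text, so the classifying variable is an assumption: sex is carried over from the earlier tables, and gest4 can be put in its place to classify by gestation.

```r
library(Epi)
data(births)
births$sex <- factor(births$sex, labels = c("M", "F"))

# lowbw is not in the index, so every percentage comes out as 100
tab_wrong <- stat.table(index = sex, contents = percent(lowbw), data = births)
print(tab_wrong)

# With lowbw in the index, percentages are taken across its levels
tab_pct <- stat.table(
  index = list(sex, lowbw),
  contents = percent(lowbw),
  data = births
)
print(tab_pct)
```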
- Obtain a table showing the frequency distribution of gest4.
- Show how the mean birth weight changes with gest4.
- Show how the percentage of low birth weight babies changes with gest4.
Another way of obtaining the percentage of low birth weight babies by gestation is to use the ratio function.
This only works because lowbw is coded 0/1, with 1 for low birth
weight.
Tables of odds can be produced in the same way by using
ratio(lowbw, 1-lowbw). The ratio function is also very useful
for making tables of rates with (say) ratio(D,Y,1000) where
D is the number of failures, and Y is the follow-up time. We
shall return to rates in a later practical.
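A sketch of both uses of ratio on the births data. The percentage call, ratio(lowbw, 1, 100), is an assumption modelled on the ratio(D,Y,1000) form above, with the third argument acting as a scale factor: the sum of the 0/1 values of lowbw divided by the number of babies, times 100.

```r
library(Epi)
data(births)
births$gest4 <- cut(births$gestwks, breaks = c(20, 35, 37, 39, 45),
                    right = FALSE)

# Percentage low birth weight: sum(lowbw) / sum(1), scaled by 100
tab_ratio <- stat.table(gest4, ratio(lowbw, 1, 100), data = births)
print(tab_ratio)

# Odds of low birth weight: sum(lowbw) / sum(1 - lowbw)
tab_odds <- stat.table(gest4, ratio(lowbw, 1 - lowbw), data = births)
print(tab_odds)
```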
4.4 Improving the Presentation of Tables
The stat.table function provides default column headings based
on the contents argument, but these are not always very informative.
Supply your own column headings using tagged lists as the
value of the contents argument within a stat.table call.
This improves the readability of the table. It remains to give an
informative title to the index variable. You can do this in the same way:
instead of giving gest4 as the index argument to stat.table,
use a named list.
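A sketch of both kinds of tagging. The headings "N", "(%)" and "Gestation time" are hypothetical labels chosen for illustration; any strings can be used.

```r
library(Epi)
data(births)
births$gest4 <- cut(births$gestwks, breaks = c(20, 35, 37, 39, 45),
                    right = FALSE)

# Tagged contents list: the tags become the column headings
tab_cols <- stat.table(
  gest4,
  contents = list(N = count(), "(%)" = percent(gest4)),
  data = births
)
print(tab_cols)

# Named index list: the name becomes the title of the index variable
tab_title <- stat.table(
  index = list("Gestation time" = gest4),
  contents = list(N = count(), "(%)" = percent(gest4)),
  data = births
)
print(tab_title)
```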
4.5 Two-way Tables
A call to stat.table with list(sex, hyp) as the index and mean(bweight)
as the contents gives a \(2\times 2\) table showing the mean birth weight
cross-classified by sex and hyp.
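A sketch of such a two-way call; the original call is not shown in the text, so the exact form is an assumption.

```r
library(Epi)
data(births)
births$hyp <- factor(births$hyp, labels = c("normal", "hyper"))
births$sex <- factor(births$sex, labels = c("M", "F"))

# Two index variables in a list give a cross-classified table
tab_2way <- stat.table(
  index = list(sex, hyp),
  contents = mean(bweight),
  data = births
)
print(tab_2way)
```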
Add the count to this table and repeat the function call using margin = TRUE
to calculate the marginal tables.
Use stat.table with the ratio function to obtain a \(2\times 2\) table of
percent low birth weight by sex and hyp.
You can have fine-grained control over which margins to calculate by
giving a logical vector to the margin argument. Use margin=c(FALSE, TRUE)
to calculate margins over sex but not hyp. This might not be what you
expect, but the margin argument indicates which of the index variables
are to be marginalized out, not which index variables are to remain.
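A sketch comparing a scalar and a vector value for the margin argument. It assumes the index is list(sex, hyp), so the logical vector has one entry per index variable in that order. Comparing the two printed tables shows which margin the vector form keeps.

```r
library(Epi)
data(births)
births$hyp <- factor(births$hyp, labels = c("normal", "hyper"))
births$sex <- factor(births$sex, labels = c("M", "F"))

# All margins
tab_all <- stat.table(
  index = list(sex, hyp),
  contents = list(count(), mean(bweight)),
  data = births,
  margin = TRUE
)
print(tab_all)

# One logical value per index variable, in the order given in the index
tab_one <- stat.table(
  index = list(sex, hyp),
  contents = list(count(), mean(bweight)),
  data = births,
  margin = c(FALSE, TRUE)
)
print(tab_one)
```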
4.6 Printing
Just like every other R function, stat.table produces
an object that can be saved and printed later, or used for further
calculation. You can control the appearance of a table with an explicit
call to print(). There are two arguments to the print method for
stat.table: the width argument, which specifies the minimum column
width, and the digits argument, which controls the number of digits
printed after the decimal point. This table
odds.tab <- stat.table(
  gest4,
  list("odds of low bw" = ratio(lowbw, 1 - lowbw)),
  data = births
)
print(odds.tab)
shows a table of odds that the baby has low birth weight. Use
width=15 and digits=3 and see the difference.
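A sketch of the explicit print call with both arguments, repeating the setup so that the block runs on its own.

```r
library(Epi)
data(births)
births$gest4 <- cut(births$gestwks, breaks = c(20, 35, 37, 39, 45),
                    right = FALSE)

odds.tab <- stat.table(
  gest4,
  list("odds of low bw" = ratio(lowbw, 1 - lowbw)),
  data = births
)

print(odds.tab)                          # default appearance
print(odds.tab, width = 15, digits = 3)  # wider columns, 3 decimal places
```

Because odds.tab is stored, it can be printed again with other settings without recomputing the table.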