Chapter 9 The basics: t-tests, ANOVA, and linear regression

The most basic statistical analyses a GRA will use are t-tests, analysis of variance (ANOVA), and linear regression. Here are some tips and references for getting started and creating well-formatted output:

9.1 t-tests & ANOVA

The best R package I have used for reporting t-tests and ANOVA is gtsummary, a package that makes beautiful tables for summarizing data and can add the test's p-value to the table.

Here is an example of a t-test with gtsummary:

library(gtsummary)
# let's use the ToothGrowth data
str(ToothGrowth)
## 'data.frame':    60 obs. of  3 variables:
##  $ len : num  4.2 11.5 7.3 5.8 6.4 10 11.2 11.2 5.2 7 ...
##  $ supp: Factor w/ 2 levels "OJ","VC": 2 2 2 2 2 2 2 2 2 2 ...
##  $ dose: num  0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 ...
# Note: I am assuming here that you have checked the appropriate diagnostics
#   (e.g., check residuals for normality, check for outliers, check variances in both groups, etc.)

tbl_summary(ToothGrowth, # data set 
            by = "supp", # factor/grouping variable
            include = "len", # numeric outcome 
            label = list("len" ~ "Length"), # make a readable label 
            # t-test is a test of means, so let's summarize data with mean (sd) form
            statistic = list(all_continuous() ~ "{mean} ({sd})"), 
            # choose reasonable number of digits to print 
            digits = list(everything() ~ 2)) |> 
  # indicate that you want a t.test 
  add_p(test = list(all_continuous() ~ "t.test")) |> 
  # format labels 
  bold_labels() |> 
  # remove default 'Characteristic' header 
  modify_header(label = "")
          OJ, N = 30¹     VC, N = 30¹     p-value²
Length    20.66 (6.61)    16.96 (8.27)    0.061
¹ Mean (SD)
² Welch Two Sample t-test
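
To see where the p-value in the table comes from, here is a minimal sketch in base R of the kind of checks mentioned in the code comments, followed by the Welch t-test itself. The choice of checks (a Shapiro-Wilk test within each group and an F test of the two variances) is my assumption about what "appropriate diagnostics" could look like, not a prescribed recipe.

```r
# Shapiro-Wilk normality test of len within each supp group
shapiro_p <- sapply(
  split(ToothGrowth$len, ToothGrowth$supp),
  function(x) shapiro.test(x)$p.value
)
shapiro_p

# F test comparing the variances of the two groups
var_check <- var.test(len ~ supp, data = ToothGrowth)
var_check

# The test itself: t.test() uses Welch's version by default (var.equal = FALSE)
welch <- t.test(len ~ supp, data = ToothGrowth)
welch
```

The p-value stored in welch$p.value should agree with the one in the table above, since the table footnote reports a Welch Two Sample t-test.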

For ANOVA, we could do this:

tbl_summary(ToothGrowth, 
            by = dose, # factor/grouping variable is now 'dose' 
            include = "len", 
            label = list("len" ~ "Length"), 
            statistic = list(all_continuous() ~ "{mean} ({sd})"), 
            digits = list(everything() ~ 2)) |> 
  # indicate that you want a one-way ANOVA test here, not assuming equal variances 
  add_p(test = list(all_continuous() ~ "oneway.test")) |> 
  bold_labels() |> 
  modify_header(label = "")
          0.5, N = 20¹    1, N = 20¹      2, N = 20¹      p-value²
Length    10.61 (4.50)    19.74 (4.42)    26.10 (3.77)    <0.001
¹ Mean (SD)
² One-way analysis of means (not assuming equal variances)
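
The "oneway.test" option refers to the base R function oneway.test(), so the same test can be run directly. The sketch below also fits the classical equal-variance version for comparison. Wrapping dose in factor() is my own choice to make the grouping explicit, and the object names are only illustrative.

```r
# Welch's one-way test: does not assume equal variances across dose groups
welch_aov <- oneway.test(len ~ factor(dose), data = ToothGrowth)
welch_aov

# Classical one-way ANOVA, which does assume equal variances
classic_aov <- oneway.test(len ~ factor(dose), data = ToothGrowth,
                           var.equal = TRUE)

# The classical F statistic is the same one reported by aov()
aov_fit <- aov(len ~ factor(dose), data = ToothGrowth)
summary(aov_fit)
```

If the group variances look similar, the two versions usually lead to the same conclusion. The Welch version is the safer default when you are unsure.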

9.2 Linear regression

gtsummary will also create summary tables for linear regression, like so:

# use the 'iris' data this time 
str(iris)
## 'data.frame':    150 obs. of  5 variables:
##  $ Sepal.Length: num  5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
##  $ Sepal.Width : num  3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
##  $ Petal.Length: num  1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
##  $ Petal.Width : num  0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
##  $ Species     : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...
my_model <- lm(Petal.Length ~ Species + Sepal.Length, data = iris)

# again, I am assuming you have checked the appropriate linear model diagnostics
tbl_regression(my_model) |> 
  modify_header(label = "**Feature**", estimate = "**Estimate**") |> 
  bold_labels()
Feature           Estimate    95% CI¹       p-value
Species
    setosa
    versicolor    2.2         2.1, 2.3      <0.001
    virginica     3.1         2.9, 3.3      <0.001
Sepal.Length      0.63        0.54, 0.72    <0.001
¹ CI = Confidence Interval
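
To check the table against the fitted model, the estimates and confidence intervals can be pulled out with base R. This is a small sketch that refits the same model. Note that setosa has no estimate because it is the first factor level, which R uses as the reference category by default.

```r
# Refit the model from above so this block runs on its own
my_model <- lm(Petal.Length ~ Species + Sepal.Length, data = iris)

# Coefficients next to their 95% confidence intervals
est <- cbind(estimate = coef(my_model), confint(my_model, level = 0.95))
round(est, 2)
```

The Species rows are differences in mean petal length relative to setosa, holding Sepal.Length fixed.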

For logistic regression, you will need to do some further customization of the output to get the odds ratio estimates to appear in the table; take a look at this help documentation for examples.
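
As one possible starting point, tbl_regression() has an exponentiate argument that reports exponentiated coefficients, which are odds ratios for a logistic model. The sketch below is only an illustration: the model (transmission type against mpg in the built-in mtcars data) and the object names are my own assumptions, not an example from the documentation referred to above.

```r
# Illustrative model only: am (0 = automatic, 1 = manual) against mpg
logit_model <- glm(am ~ mpg, data = mtcars, family = binomial)

# Odds ratios by hand: exponentiate the log-odds coefficients
odds_ratios <- exp(coef(logit_model))
odds_ratios

# The same idea as a gtsummary table, if the package is installed
if (requireNamespace("gtsummary", quietly = TRUE)) {
  or_table <- gtsummary::tbl_regression(logit_model, exponentiate = TRUE) |>
    gtsummary::bold_labels()
  # type or_table at the console to display it
}
```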

For more on diagnostics and checking assumptions, this article on STHDA gives some examples of checking assumptions for linear regression in R.
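
For a quick first look before reading further, base R already provides the standard diagnostic plots for an lm fit. This is a minimal sketch using the model from section 9.2. Which checks are enough depends on your analysis, so treat the selection here as an assumption rather than a complete checklist.

```r
# Refit the model from section 9.2 so this block runs on its own
my_model <- lm(Petal.Length ~ Species + Sepal.Length, data = iris)

# Residuals vs fitted (linearity, constant variance) and normal Q-Q plot
par(mfrow = c(1, 2))
plot(my_model, which = 1:2)

# Shapiro-Wilk test on the residuals as a rough normality check
resid_normality <- shapiro.test(residuals(my_model))
resid_normality

# Largest Cook's distances flag potentially influential observations
cooks_d <- cooks.distance(my_model)
head(sort(cooks_d, decreasing = TRUE))
```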