Brian McKay’s Flu Analysis Data: Fitting

Let’s Begin with some Fitting

But first let’s load some packages…

library(gtsummary) #To create tables will include
library(tidyr) #Helps with data wrangling
Warning: package 'tidyr' was built under R version 4.2.3
library(performance) #Model evaluation
library(here) #Setting paths
here() starts at C:/GitHub/MADA/kimberlyperez-MADA-portfolio
library(broom.mixed) #Converts Bayesian to tibbles 
library(tidymodels) #Great package for models and other resources used for fitting
── Attaching packages ────────────────────────────────────── tidymodels 1.0.0 ──
✔ broom        1.0.2      ✔ recipes      1.0.4 
✔ dials        1.1.0      ✔ rsample      1.1.1 
✔ dplyr        1.0.10     ✔ tibble       3.1.8 
✔ ggplot2      3.4.0      ✔ tune         1.0.1 
✔ infer        1.0.4      ✔ workflows    1.1.2 
✔ modeldata    1.1.0      ✔ workflowsets 1.0.0 
✔ parsnip      1.0.3      ✔ yardstick    1.1.0 
✔ purrr        1.0.1      
── Conflicts ───────────────────────────────────────── tidymodels_conflicts() ──
✖ recipes::all_double()  masks gtsummary::all_double()
✖ recipes::all_factor()  masks gtsummary::all_factor()
✖ recipes::all_integer() masks gtsummary::all_integer()
✖ recipes::all_logical() masks gtsummary::all_logical()
✖ recipes::all_numeric() masks gtsummary::all_numeric()
✖ purrr::discard()       masks scales::discard()
✖ dplyr::filter()        masks stats::filter()
✖ dplyr::lag()           masks stats::lag()
✖ yardstick::mae()       masks performance::mae()
✖ yardstick::rmse()      masks performance::rmse()
✖ recipes::step()        masks stats::step()
• Learn how to get started at https://www.tidymodels.org/start/
library(ggplot2) #Great for data vis.
library(dotwhisker) #Visualization for regression outputs
Warning: package 'dotwhisker' was built under R version 4.2.3

Data Loading

Now that the packages we need are loaded, let’s read in and view our cleaned data

data_flu<- readRDS(here("fluanalysis", "processed_data", "SympAct_cleaned.rds")) #Reading in cleaned data
glimpse(data_flu) #Let's reacquaint ourselves with the cleaned data
Rows: 730
Columns: 32
$ SwollenLymphNodes <fct> Yes, Yes, Yes, Yes, Yes, No, No, No, Yes, No, Yes, Y…
$ ChestCongestion   <fct> No, Yes, Yes, Yes, No, No, No, Yes, Yes, Yes, Yes, Y…
$ ChillsSweats      <fct> No, No, Yes, Yes, Yes, Yes, Yes, Yes, Yes, No, Yes, …
$ NasalCongestion   <fct> No, Yes, Yes, Yes, No, No, No, Yes, Yes, Yes, Yes, Y…
$ CoughYN           <fct> Yes, Yes, No, Yes, No, Yes, Yes, Yes, Yes, Yes, No, …
$ Sneeze            <fct> No, No, Yes, Yes, No, Yes, No, Yes, No, No, No, No, …
$ Fatigue           <fct> Yes, Yes, Yes, Yes, Yes, Yes, Yes, Yes, Yes, Yes, Ye…
$ SubjectiveFever   <fct> Yes, Yes, Yes, Yes, Yes, Yes, Yes, Yes, Yes, No, Yes…
$ Headache          <fct> Yes, Yes, Yes, Yes, Yes, Yes, No, Yes, Yes, Yes, Yes…
$ Weakness          <fct> Mild, Severe, Severe, Severe, Moderate, Moderate, Mi…
$ WeaknessYN        <fct> Yes, Yes, Yes, Yes, Yes, Yes, Yes, Yes, Yes, Yes, Ye…
$ CoughIntensity    <fct> Severe, Severe, Mild, Moderate, None, Moderate, Seve…
$ CoughYN2          <fct> Yes, Yes, Yes, Yes, No, Yes, Yes, Yes, Yes, Yes, Yes…
$ Myalgia           <fct> Mild, Severe, Severe, Severe, Mild, Moderate, Mild, …
$ MyalgiaYN         <fct> Yes, Yes, Yes, Yes, Yes, Yes, Yes, Yes, Yes, Yes, Ye…
$ RunnyNose         <fct> No, No, Yes, Yes, No, No, Yes, Yes, Yes, Yes, No, No…
$ AbPain            <fct> No, No, Yes, No, No, No, No, No, No, No, Yes, Yes, N…
$ ChestPain         <fct> No, No, Yes, No, No, Yes, Yes, No, No, No, No, Yes, …
$ Diarrhea          <fct> No, No, No, No, No, Yes, No, No, No, No, No, No, No,…
$ EyePn             <fct> No, No, No, No, Yes, No, No, No, No, No, Yes, No, Ye…
$ Insomnia          <fct> No, No, Yes, Yes, Yes, No, No, Yes, Yes, Yes, Yes, Y…
$ ItchyEye          <fct> No, No, No, No, No, No, No, No, No, No, No, No, Yes,…
$ Nausea            <fct> No, No, Yes, Yes, Yes, Yes, No, No, Yes, Yes, Yes, Y…
$ EarPn             <fct> No, Yes, No, Yes, No, No, No, No, No, No, No, Yes, Y…
$ Hearing           <fct> No, Yes, No, No, No, No, No, No, No, No, No, No, No,…
$ Pharyngitis       <fct> Yes, Yes, Yes, Yes, Yes, Yes, Yes, No, No, No, Yes, …
$ Breathless        <fct> No, No, Yes, No, No, Yes, No, No, No, Yes, No, Yes, …
$ ToothPn           <fct> No, No, Yes, No, No, No, No, No, Yes, No, No, Yes, N…
$ Vision            <fct> No, No, No, No, No, No, No, No, No, No, No, No, No, …
$ Vomit             <fct> No, No, No, No, No, No, Yes, No, No, No, Yes, Yes, N…
$ Wheeze            <fct> No, No, No, Yes, No, Yes, No, No, No, No, No, Yes, N…
$ BodyTemp          <dbl> 98.3, 100.4, 100.8, 98.8, 100.5, 98.4, 102.5, 98.4, …

Fitting a Linear Model: Continuous Outcome and Main Predictor of Interest

To begin fitting, we will first need to define the linear regression

linear_reg() %>%
  set_engine("lm") #Setting 
Linear Regression Model Specification (regression)

Computational engine: lm 
model_lm<- linear_reg()

Now that we have defined our LR, we will have to train the model to our given data

fit_lm <- 
  model_lm %>%
  fit(BodyTemp~SwollenLymphNodes, data_flu)

fit_lm #Let's view what we just did
parsnip model object


Call:
stats::lm(formula = BodyTemp ~ SwollenLymphNodes, data = data)

Coefficients:
         (Intercept)  SwollenLymphNodesYes  
            98.96842              -0.07804  
#We can use the tidy() function to produce the output/ summary statistics 
tidy(fit_lm)
# A tibble: 2 × 5
  term                 estimate std.error statistic p.value
  <chr>                   <dbl>     <dbl>     <dbl>   <dbl>
1 (Intercept)           99.0       0.0585  1691.      0    
2 SwollenLymphNodesYes  -0.0780    0.0895    -0.872   0.384

Plotting Linear Regression Model

tidy(fit_lm) %>%
    dwplot(dot_args = list(size = 2, color = "coral1"),
         whisker_args = list(color = "coral1"),
         vline = geom_vline(xintercept = 0, color = "darkred", linetype = 4))

Fitting More Linear Models

Wonderful! Now, let’s do the same thing with all predictors of interest.

all_lm<- linear_reg() #Defining the model

fit_all<- #Since we defined the model, we can now train the model to the data!
  all_lm %>%
  fit(BodyTemp~., data= data_flu)

fit_all
parsnip model object


Call:
stats::lm(formula = BodyTemp ~ ., data = data)

Coefficients:
           (Intercept)    SwollenLymphNodesYes      ChestCongestionYes  
             97.925243               -0.165302                0.087326  
       ChillsSweatsYes      NasalCongestionYes              CoughYNYes  
              0.201266               -0.215771                0.313893  
             SneezeYes              FatigueYes      SubjectiveFeverYes  
             -0.361924                0.264762                0.436837  
           HeadacheYes            WeaknessMild        WeaknessModerate  
              0.011453                0.018229                0.098944  
        WeaknessSevere           WeaknessYNYes      CoughIntensityMild  
              0.373435                      NA                0.084881  
CoughIntensityModerate    CoughIntensitySevere             CoughYN2Yes  
             -0.061384               -0.037272                      NA  
           MyalgiaMild         MyalgiaModerate           MyalgiaSevere  
              0.164242               -0.024064               -0.129263  
          MyalgiaYNYes            RunnyNoseYes               AbPainYes  
                    NA               -0.080485                0.031574  
          ChestPainYes             DiarrheaYes                EyePnYes  
              0.105071               -0.156806                0.131544  
           InsomniaYes             ItchyEyeYes               NauseaYes  
             -0.006824               -0.008016               -0.034066  
              EarPnYes              HearingYes          PharyngitisYes  
              0.093790                0.232203                0.317581  
         BreathlessYes              ToothPnYes               VisionYes  
              0.090526               -0.022876               -0.274625  
              VomitYes               WheezeYes  
              0.165272               -0.046665  
#We can use the tidy() function to produce the output/ summary statistics 
tidy(fit_all)
# A tibble: 38 × 5
   term                 estimate std.error statistic   p.value
   <chr>                   <dbl>     <dbl>     <dbl>     <dbl>
 1 (Intercept)           97.9       0.304   322.     0        
 2 SwollenLymphNodesYes  -0.165     0.0920   -1.80   0.0727   
 3 ChestCongestionYes     0.0873    0.0975    0.895  0.371    
 4 ChillsSweatsYes        0.201     0.127     1.58   0.114    
 5 NasalCongestionYes    -0.216     0.114    -1.90   0.0584   
 6 CoughYNYes             0.314     0.241     1.30   0.193    
 7 SneezeYes             -0.362     0.0983   -3.68   0.000249 
 8 FatigueYes             0.265     0.161     1.65   0.0996   
 9 SubjectiveFeverYes     0.437     0.103     4.22   0.0000271
10 HeadacheYes            0.0115    0.125     0.0913 0.927    
# … with 28 more rows

Plotting Linear Regression Model Using All Predictors of Interest

tidy(fit_all) %>%
    dwplot(dot_args = list(size = 2, color = "coral1"),
         whisker_args = list(color = "coral1"),
         vline = geom_vline(xintercept = 0, color = "darkred", linetype = 4))

Now, let’s check the model performance for a single predictor of interest (Swollen Lymph Nodes

check_model(fit_lm$fit) #single predictor of interest swollen lymph nodes

We will now do the same for all predictors of interest

check_model(fit_all$fit)
Variable `Component` is not in your data frame :/

glm_mod<- logistic_reg() %>%
  set_engine("glm")

fit_glm<- 
  glm_mod %>%
  fit(Nausea~BodyTemp, data= data_flu)

fit_glm
parsnip model object


Call:  stats::glm(formula = Nausea ~ BodyTemp, family = stats::binomial, 
    data = data)

Coefficients:
(Intercept)     BodyTemp  
   -5.87053      0.05304  

Degrees of Freedom: 729 Total (i.e. Null);  728 Residual
Null Deviance:      944.7 
Residual Deviance: 944  AIC: 948
tidy(fit_glm) #Like we did above, let's utilize the same process and examine the model using tidy()
# A tibble: 2 × 5
  term        estimate std.error statistic p.value
  <chr>          <dbl>     <dbl>     <dbl>   <dbl>
1 (Intercept)  -5.87      6.34      -0.926   0.355
2 BodyTemp      0.0530    0.0641     0.828   0.408

We will now plot the output

tidy(fit_glm) %>%
    dwplot(dot_args = list(size = 2, color = "coral1"),
         whisker_args = list(color = "coral1"),
         vline = geom_vline(xintercept = 0, color = "darkred", linetype = 4))

Plotting Logistic Model Using All Predictors of Interest

glm_all<- logistic_reg() %>% #Setting 
  set_engine ("glm")

all_glm_fit<- glm_all %>% #Training Model 
  fit(Nausea~., data= data_flu)

all_glm_fit
parsnip model object


Call:  stats::glm(formula = Nausea ~ ., family = stats::binomial, data = data)

Coefficients:
           (Intercept)    SwollenLymphNodesYes      ChestCongestionYes  
              0.222870               -0.251083                0.275554  
       ChillsSweatsYes      NasalCongestionYes              CoughYNYes  
              0.274097                0.425817               -0.140423  
             SneezeYes              FatigueYes      SubjectiveFeverYes  
              0.176724                0.229062                0.277741  
           HeadacheYes            WeaknessMild        WeaknessModerate  
              0.331259               -0.121606                0.310849  
        WeaknessSevere           WeaknessYNYes      CoughIntensityMild  
              0.823187                      NA               -0.220794  
CoughIntensityModerate    CoughIntensitySevere             CoughYN2Yes  
             -0.362678               -0.950544                      NA  
           MyalgiaMild         MyalgiaModerate           MyalgiaSevere  
             -0.004146                0.204743                0.120758  
          MyalgiaYNYes            RunnyNoseYes               AbPainYes  
                    NA                0.045324                0.939304  
          ChestPainYes             DiarrheaYes                EyePnYes  
              0.070777                1.063934               -0.341991  
           InsomniaYes             ItchyEyeYes                EarPnYes  
              0.084175               -0.063364               -0.181719  
            HearingYes          PharyngitisYes           BreathlessYes  
              0.323052                0.275364                0.526801  
            ToothPnYes               VisionYes                VomitYes  
              0.480649                0.125498                2.458466  
             WheezeYes                BodyTemp  
             -0.304435               -0.031246  

Degrees of Freedom: 729 Total (i.e. Null);  695 Residual
Null Deviance:      944.7 
Residual Deviance: 751.5    AIC: 821.5
tidy(all_glm_fit)
# A tibble: 38 × 5
   term                 estimate std.error statistic p.value
   <chr>                   <dbl>     <dbl>     <dbl>   <dbl>
 1 (Intercept)             0.223     7.83     0.0285  0.977 
 2 SwollenLymphNodesYes   -0.251     0.196   -1.28    0.200 
 3 ChestCongestionYes      0.276     0.213    1.30    0.195 
 4 ChillsSweatsYes         0.274     0.288    0.952   0.341 
 5 NasalCongestionYes      0.426     0.255    1.67    0.0944
 6 CoughYNYes             -0.140     0.519   -0.271   0.787 
 7 SneezeYes               0.177     0.210    0.840   0.401 
 8 FatigueYes              0.229     0.372    0.616   0.538 
 9 SubjectiveFeverYes      0.278     0.225    1.23    0.218 
10 HeadacheYes             0.331     0.285    1.16    0.245 
# … with 28 more rows
tidy(all_glm_fit) %>%
    dwplot(dot_args = list(size = 2, color = "coral1"),
         whisker_args = list(color = "coral1"),
         vline = geom_vline(xintercept = 0, color = "darkred", linetype = 4))

Comparing Performance (Body Temp & Nausea)

check_model(fit_glm$fit)

Comparing Performance (Nausea & All Predictors of Interest)

check_model(all_glm_fit$fit)
Variable `Component` is not in your data frame :/

Exercise Wrap Up

Let us finally compare the performance of these models using compare_performance

compare_performance(fit_glm,all_glm_fit)
Warning in predict.lm(object, newdata, se.fit, scale = 1, type = if (type == :
prediction from a rank-deficient fit may be misleading

Warning in predict.lm(object, newdata, se.fit, scale = 1, type = if (type == :
prediction from a rank-deficient fit may be misleading

Warning in predict.lm(object, newdata, se.fit, scale = 1, type = if (type == :
prediction from a rank-deficient fit may be misleading

Warning in predict.lm(object, newdata, se.fit, scale = 1, type = if (type == :
prediction from a rank-deficient fit may be misleading
# Comparison of Model Performance Indices

Name        | Model | AIC (weights) | AICc (weights) | BIC (weights) | Tjur's R2 |  RMSE | Sigma | Log_loss | Score_log | Score_spherical |   PCP
-------------------------------------------------------------------------------------------------------------------------------------------------
fit_glm     |  _glm | 948.0 (<.001) |  948.0 (<.001) | 957.2 (>.999) | 9.297e-04 | 0.477 | 1.139 |    0.647 |  -107.928 |           0.007 | 0.546
all_glm_fit |  _glm | 821.5 (>.999) |  825.1 (>.999) | 982.2 (<.001) |     0.247 | 0.414 | 1.040 |    0.515 |      -Inf |           0.002 | 0.658

And wrap this up by exploring another model…specifically ANOVA!

#r makes this fairly seamless to do utilizing the anova() function
anova(fit_glm$fit, all_glm_fit$fit)
Analysis of Deviance Table

Model 1: Nausea ~ BodyTemp
Model 2: Nausea ~ SwollenLymphNodes + ChestCongestion + ChillsSweats + 
    NasalCongestion + CoughYN + Sneeze + Fatigue + SubjectiveFever + 
    Headache + Weakness + WeaknessYN + CoughIntensity + CoughYN2 + 
    Myalgia + MyalgiaYN + RunnyNose + AbPain + ChestPain + Diarrhea + 
    EyePn + Insomnia + ItchyEye + EarPn + Hearing + Pharyngitis + 
    Breathless + ToothPn + Vision + Vomit + Wheeze + BodyTemp
  Resid. Df Resid. Dev Df Deviance
1       728     943.97            
2       695     751.47 33    192.5