lme models

Linear mixed effects model - an introduction

Why do we need linear mixed effect models?
The conditional independence assumption

Two events A and B are conditionally independent given a third event Y precisely if the occurrence or non-occurrence of A and B are not dependent on events in their conditional probability distribution given Y. In other words, the performance ratings after drinking stimulants and depressants are conditionally independent if they not depend on the chosen stimuli, i.e. beer or wine and coffee or Red Bull). Failing to meet conditional independence assumptions is one of the worst statistical errors you can make but it is also one of the most poorly understood aspects of statistical analysis. Aggregating up to the highest sampling unit without making use of the variability in the data at lower levels permits inferences on the highest level only. On the contrary, estimating variability for sample units (participants and drinks) permits full generalisation and individual differences can be explored. This is where linear mixed-effect models come into play

What's the purpose of this app?

This app was designed as a help to all those students who have been struggling with understanding linear mixed-effect models. You can discover how fixed and random effects influence a model's predicted data.

Our totally real abstract on the right describes the definitely not simulated experiment we conducted for this app. In the Introduction tab, you can learn more about the mixed-effect model we used. In the 'Try it' tab, you can play around with the data and see how changes in variance components and fixed effect parts can change the model's predictions.

Tip: If you want to remove a random effect completely, just set the corresponding slider to zero

We hope you enjoy our app! Have fun!

Our totally real paper ;-)

Linear mixed-effects models derive from the usual regression formula you can see here:

$$ \Large Y_{si} = \beta_0 + \beta_1 * x_i + e_{si}$$

β₀ designates the intercept, β₁ the slope and therefore the influence of x_i on Y_si and e_si is the residual error.

$$ \Large \beta_0 = \gamma_0 + S_{0s} + I_{0i}$$

In a mixed-effects model, the intercept actually comprises a fixed-effect part γ₀, a random-effect intercept by subject S_0s and a random-effect intercept by item I_0i

$$ \Large \beta_1 = \gamma_1 + S_{1s} $$

The slope is comprised of a fixed-effect part γ₁ and a random-effect slope by subject S_1s. It can also include a random-effect slope by item I_1i, although we did not include this in our model

Put together, this yields the complete formula for our mixed-effects model:

$$ \Large Y_{si} = \gamma_0 + S_{0s} + I_{0i} + (\gamma_1 + S_{1s}) * x_i + e_{si}$$

$$ \LARGE Y_{si} = \gamma_0 + S_{0s} + I_{0i} + (\gamma_1 + S_{0s}) * x_i + e_{si}$$

# App for visualizing the influence of random effects in linear mixed-effect models # Claudia Glemser, University Tuebingen, Germany # Maria-Bianca Leonte, Jacobs University Bremen, Germany # Malte Dewies, KU Leuven, Belgium # Dmitri Rozgonjuk, University of Tartu, Estonia library(shiny) library(lme4) library(Matrix) library(ggplot2) library(simgen) ui <- fluidPage( includeScript("../../../Matomo-tquant.js"), titlePanel("Linear mixed effects model - an introduction", "lme models"), tabsetPanel( tabPanel("Welcome", fluidRow( column(6, align = "center", br(), h4("Why do we need linear mixed effect models?", br(), "The conditional independence assumption"), br(), align ="left", "Two events A and B are conditionally independent given a third event Y precisely if the occurrence or non-occurrence of A and B are not dependent on events in their conditional probability distribution given Y. In other words, the performance ratings after drinking stimulants and depressants are conditionally independent if they not depend on the chosen stimuli, i.e. beer or wine and coffee or Red Bull). Failing to meet conditional independence assumptions is one of the worst statistical errors you can make but it is also one of the most poorly understood aspects of statistical analysis. Aggregating up to the highest sampling unit without making use of the variability in the data at lower levels permits inferences on the highest level only. On the contrary, estimating variability for sample units (participants and drinks) permits full generalisation and individual differences can be explored. This is where linear mixed-effect models come into play", br(), br(), br(), h4("What's the purpose of this app?"), br(), "This app was designed as a help to all those students who have been struggling with understanding linear mixed-effect models. You can discover how fixed and random effects influence a model's predicted data.", br(), br(), "Our totally real abstract on the right describes the definitely not simulated experiment we conducted for this app. In the Introduction tab, you can learn more about the mixed-effect model we used. In the 'Try it' tab, you can play around with the data and see how changes in variance components and fixed effect parts can change the model's predictions.", br(), br(), "Tip: If you want to remove a random effect completely, just set the corresponding slider to zero", br(), br(), "We hope you enjoy our app! Have fun!", br(), br() ), column(6, align = "center", br(), h4("Our totally real paper ;-)"), br(), img(src = "Our_Paper.JPG")) )), tabPanel("Introduction to our mixed-effect model", fluidRow( column(12, align = "center", br(), HTML("<div>Linear mixed-effects models derive from the usual regression formula you can see here: </div>"), withMathJax(), '$$ \\Large Y_{si} = \\beta_0 + \\beta_1 * x_i + e_{si}$$', HTML("<div> β<sub>0</sub> designates the intercept, β<sub>1</sub> the slope and therefore the influence of x<sub>i</sub> on Y<sub>si</sub> and e<sub>si</sub> is the residual error.</div><br>") ) ), fluidRow( column(12, align = "center", withMathJax(), '$$ \\Large \\beta_0 = \\gamma_0 + S_{0s} + I_{0i}$$', HTML("<div> In a mixed-effects model, the intercept actually comprises a fixed-effect part γ<sub>0</sub>, a random-effect intercept by subject S<sub>0s</sub> and a random-effect intercept by item I<sub>0i</sub> </div> <br>") ) ), fluidRow( column(12, align = "center", withMathJax(), '$$ \\Large \\beta_1 = \\gamma_1 + S_{1s} $$', HTML("<div> The slope is comprised of a fixed-effect part γ<sub>1</sub> and a random-effect slope by subject S<sub>1s</sub>. It can also include a random-effect slope by item I<sub>1i</sub>, although we did not include this in our model </div> <br><br> <div> Put together, this yields the complete formula for our mixed-effects model: </div>") ) ), fluidRow( column(12, align = "center", withMathJax(), '$$ \\Large Y_{si} = \\gamma_0 + S_{0s} + I_{0i} + (\\gamma_1 + S_{1s}) * x_i + e_{si}$$' ) ) ), tabPanel("Try it!", sidebarLayout( sidebarPanel( actionButton("go.gen", "Generate new dataset"), withMathJax(), sliderInput("gm_int", label = HTML("Grand mean for intercept γ<sub>0</sub>"), min = 0, max = 3, value = 0, step=0.05), sliderInput("gm_slope", label = HTML("Grand mean for slope γ<sub>1</sub>"), min = 0, max = 3, value = 1, step = 0.05), sliderInput("bs_int", label = HTML("By subject intercept S<sub>0s</sub>"), min = 0, max = 3, value = 1, step=0.05), sliderInput("bs_slope", label = HTML("By subject random slope S<sub>1s</sub>"), min = 0, max = 3, value = 1, step=0.05), sliderInput("bi_int", label = HTML("By item random intercept I<sub>0i</sub>"), min = 0, max = 3, value = 1, step=0.05), sliderInput("resid_err", label = HTML("Residual error variance e<sub>si</sub>"), min = 0, max = 6, value = 1, step=0.1), actionButton("pop.up", "Danger! Don't click here!") ), mainPanel( fluidRow( column(12, align = "center", withMathJax(), '$$ \\LARGE Y_{si} = \\gamma_0 + S_{0s} + I_{0i} + (\\gamma_1 + S_{0s}) * x_i + e_{si}$$' ) ), fluidRow( column(12, tabsetPanel( tabPanel("full data", plotOutput("plot.all") ), tabPanel("subject random effects", plotOutput("plot.preds") ), tabPanel("beverage random effects", plotOutput("plot.preds2") ) ) ) ) ) ) ) ) ) server <- function(input, output){ ## (1) define the functions we need # call this function after you have re-scaled variance components # (i.e., subj_re, item_re, or err) # This will recreate the individual trials with the new values resimulate <- function(olddat, # result from mkDf(verbose = TRUE)$dat vc, # list of variance components params) { # the generative parameters dat2 <- olddat[, c("SubjID", "ItemID", "Cond")] dat2$ord <- seq_len(nrow(dat2)) # remember the original order dat3 <- merge(dat2, vc$subj_re, "SubjID") dat4 <- merge(dat3, vc$item_re, "ItemID") dat5 <- dat4[order(dat4$ord), ] # re-sort into original order rownames(dat5) <- NULL dat5$Resp <- with(dat5, params[1] + ri.subj + ri.item + # fe-intercept (params[2] + rs.subj + rs.item) * Cond + # fe-slope vc$err) dat5[, c("SubjID", "ItemID", "Cond", "Resp")] } rescale_variance_components <- function(x, # the data, result from call to mkDf() sri = 1, # scaling factor for by-subject random intercept srs = 1, # by-subject random slope iri = 1, # by-item random intercept irs = 1, # by-item random slope err = 1) { # residual error x$subj_re$ri.subj <- x$subj_re$ri.subj * sri x$subj_re$rs.subj <- x$subj_re$rs.subj * srs x$item_re$ri.item <- x$item_re$ri.item * iri x$item_re$ri.item <- x$item_re$ri.item * irs x$err <- x$err * err x[-1] } ## calculate model predictions for each unit/condition combination unit_predict <- function(x, params) { ## this is the design matrix of the study design_mx <- cbind(rs.int = 1, rs.slope = c(-.5, .5)) rownames(design_mx) <- paste("condition", c("A", "B"), sep = ".") ## matrix crossproduct to calculate predictions for each condition ## for each unit dat <- tcrossprod(as.matrix(x[, -1]), design_mx) + tcrossprod(cbind(rep(params["int"], nrow(x)), rep(params["eff"], nrow(x))), design_mx) ## add in the fixed effects ## recombine in long format res <- data.frame(rep(x[, 1], 2), cond = rep(c("A", "B"), each = nrow(dat)), Y = c(dat)) names(res)[1] <- names(x)[1] res } # save your input data v <- reactive({ c(int = input$gm_int, eff = input$gm_slope, # intercept + slope (fixed effects) err = input$resid_err, # residual error miss = 0, pMin = 0, pMax = 0, # no missing data pmin: minimum probability # and maximum probability of a data point # going missing t00 = input$bs_int, # by-subj random intercept (unit) t11 = input$bs_slope, # by-subj random slope (unit) rsub = 0, w00 = input$bi_int, # by-item random intercept (unit) w11 = 0, ritm = 0) # w11 = by-item random slope (unit) }) gen.newdata <- function(v){ data <- mkDf(nsubj = 8, nitem = 4, mcr.params = v, wsbi = TRUE, verbose = TRUE) } values <- reactiveValues(data = gen.newdata( setNames(c(0,0,1,0,0,0,1,1,0,1,0,0), c("int", "eff", "err", "miss", "pMin", "pMax", "t00", "t11", "rsub", "w00", "w11", "ritm") ) ) ) observeEvent(input$go.gen, { values$data <- gen.newdata(v()) }) new_v <- reactive({ c(input$gm_int, input$gm_slope) }) dat.rest <- reactive({ rescale_variance_components(values$data, sri = input$bs_int, srs = input$bs_slope, iri = input$bi_int, irs = 1, err = input$resid_err) }) # returns all but data frame dat.fram <- reactive({ resimulate(values$data$dat, dat.rest(), new_v()) # returns data frame }) plot.fun <- function(data){ data$SubjID <- factor(data$SubjID, levels = c(1:8), labels = c("Sergio", "Juergen", "Florian", "Andrea", "Cord", "Francis", "Kathi", "Luca")) ggplot(data, aes(x = as.numeric(cond), y = Y)) + # fixed effects on the background geom_point(aes(col = SubjID), size = 3, shape = 16) + geom_line(aes(col = SubjID, group = SubjID), size = 1) + labs(title = "\nPerformance rating per subject\n", x = "\nCondition", y = "Performance\n") + theme_classic() + theme(text = element_text(size=20), plot.title = element_text(hjust = 0.5), axis.text = element_text(colour = "black")) + scale_colour_brewer(type = "seq", palette = "Set1", name ="Subjects\n") + scale_x_continuous(limits=c(NA,NA),breaks=c(1,2), labels=c("stimulants", "depressants"), expand=c(0.1,0.1)) } data.plot <- reactive({ unit_predict(dat.rest()$subj_re, v()) }) output$plot.preds <- renderPlot({ plot.fun(data.plot()) }) plot.fun2 <- function(data){ data$ItemID <- factor(data$ItemID, levels = c(1:4), labels = c("coffee", "Red Bull", "beer", "wine")) ggplot(data, aes(x = as.numeric(cond), y = Y)) + # fixed effects on the background geom_point(aes(col = ItemID), size = 3, shape = 16) + geom_line(aes(col = ItemID, group = ItemID), size = 1) + labs(title = "\nPerformance rating by beverage\n", x = "\nCondition", y = "Performance\n") + theme_classic() + theme(text = element_text(size=20), plot.title = element_text(hjust = 0.5), axis.text = element_text(colour = "black")) + scale_colour_brewer(type = "seq", palette = "Set1", name ="Beverage\n") + scale_x_continuous(limits=c(NA,NA),breaks=c(1,2), labels=c("stimulants", "depressants"), expand=c(0.1,0.1)) } data.plot2 <- reactive({ unit_predict(dat.rest()$item_re, v()) }) output$plot.preds2 <- renderPlot({ plot.fun2(data.plot2()) }) plot.all <- function(data){ data$Cond <- factor(data$Cond, labels = c("stimulants", "depressants")) ggplot(data, aes(Cond, Resp)) + scale_colour_brewer(type = "seq", palette = "Set1") + geom_jitter(aes(color = Cond)) + geom_violin(aes(fill = Cond, alpha = .2, col = NA)) + # scale_y_continuous(limits = c(-4, 4), breaks = seq(-4, 4, 0.50)) + labs(title = "\nAll data points per condition\n", x = "\nCondition", y = "Performance\n") + theme_classic() + theme(text = element_text(size=20), plot.title = element_text(hjust = 0.5), axis.text = element_text(colour = "black"), legend.position="none") } output$plot.all <- renderPlot({ plot.all(dat.fram()) }) # Easter egg observeEvent(input$pop.up, { showModal(modalDialog( title = "Some Dale love from the developer team - Cuteness overload!", img(src = "Dale_Love.gif"), footer = modalButton("Dismiss"), size = "m", easyClose = FALSE, fade = TRUE, align = "center" ) ) }) } shinyApp(ui = ui, server = server)

Linear mixed effects model - an introduction

Why do we need linear mixed effect models? The conditional independence assumption

What's the purpose of this app?

Our totally real paper ;-)

Why do we need linear mixed effect models?
The conditional independence assumption