Welcome to this App


The reproducability crisis has been a hot topic of debate in all sciences, but particular in psychological science.
The failure to reproduce assumingly established findings has led to an agitation in both the media
and the scientific community. The current situation has more than once been labelled as a replication crisis.
This educational app can be used as a visualisational aid to understand the statistical fundamentals of null
hypothesis testing and how these fundamentals relate to the current replication crisis.

In the Power Plot tab, there is a visualization and explaination of the relationship between alpha, effect size, sample size
and power. In the Hit/False Alarm Ratio tab, the relationship between prevalence, alpha and power is visualized.
Finally, in the Replication tab, the prevalence of true hypotheses given a specific replication rate is calculated and visualized.

ENJOY!

Explanation

In nullhypothesis testing (NHT) we search for the probability to find our data or more extreme, given that our nullhypothesis is true. We compare this probability, known as the p-value, to an arbitrary cut-off within the probability distribution (alpha). When we find that the p-value is smaller than this cut-off value, we assume that it is so unlikely for our data to come from our NH-distribution, that we choose to reject the NH.

When the Null hypothesis is true:
- Alpha is the arbitrary cut-off level to which we compare our p-value. This value is also the probability of a Type 1 error. This error refers to the situation that we reject the null hypothesis while in reality it is true.
- The true negative refers to the probability of not rejecting the null hypothesis when in reality the hypothesis is true. The probability of a true negative is equal to 1 - alpha.

When the null hypothesis is not true:
- Power refers to the probability to find an effect when in reality it really exists, i.e. the probability to reject the null hypothesis when the alternative hypothesis is true. Power is dependent on sample size and effect size and is equal to 1 - beta.
- Beta is the probability of a Type 2 error. This error occurs when in reality the null hypothesis is not true, but we fail to reject it.

Prevalence of Experiments with true effect

After running experiments: true and false significant effects

How many of the significant results are really true?

Explanation


The null hypothesis testing procedure is probabilistic. That is: there is always a (small) probability that an effect is not found (beta) or that the data seemed to show an effect by chance (alpha). Because of this, there will always be wrong decisions in null hypothesis testing.

Different situations can occur:
- True positive: In reality the alternative hypothesis is TRUE and through NHT the nullhypothesis is rejected.
- True negative: In reality the alternative hypothesis is FALSE and through NHT the nullhypothesis is NOT rejected.
- False positive: In reality the alternative hypothesis is FALSE but through NHT the nullhypothsis is rejected.
- False negative: In reality the alternative hypothesis is TRUE but through NHT the nullhypothesis is NOT rejected.

The amount of False/True Positives/Negatives is dependent on the chosen alpha and beta levels and the prevalence of true alternative hypothesis in the scientific field. That is: how many alternative hypothesis of all (interesting) tested hypothesis are in reality true.

Corresponding prevalence


In this plot the relationship between replication rate and prevalence is shown. Some information that can be obtained from this plot:
- The prevalence of true alternative hypothesis in the scientific field can be obtained when a replication rate is known (e.g. in a recent study a replication rate of 0.36 is found; The Reproducibility Project: Psychology (Open Science Collaboration, 2015).
- Given a certain prevalence, the replication rate will differ if other values for alpha and beta are chosen.



This app was made by Annika Nieper, Felix Wolff, Marius Kroesche, Sofieke Kevenaar & Zenab Tamimy
show with app
library(shinydashboard)
library(waffle)
library(ggplot2)
require("RColorBrewer")

ui <- dashboardPage(
  dashboardHeader(title = "The Power App"),
  
  dashboardSidebar(
    menuItem("Welcome", icon = icon("th"), tabName = "Welcome"),
    menuItem('Power_Plot',  icon = icon("th"),tabName = 'Power_Plot'),
    menuItem("Hit_False_Alarm_Ratio",  icon = icon("th"),tabName = "Hit_False_Alarm_Ratio"),
    menuItem("Replication",  icon = icon("th"),tabName = "Replication"),
    menuItem("About",  icon = icon("th"),tabName = "About")
    
    
  ),
  
  
  dashboardBody(
    includeScript("../../../Matomo-tquant.js"),
    
    tabItems(
      
      tabItem( tabName = 'Welcome',
               
               titlePanel("Welcome to this App"),
               br(),
               "The reproducability crisis has been a hot topic of debate in all sciences, but particular in psychological science.", 
               br(),
               "The failure to reproduce assumingly established findings has led to an agitation in both the media",
               br(),
               "and the scientific community. The current situation has more than once been labelled as a replication crisis.",
               br(),
               "This educational app can be used as a visualisational aid to understand the statistical fundamentals of null",
               br(),
               "hypothesis testing and how these fundamentals relate to the current replication crisis.",
               br(), 
               br(), 
               "In the Power Plot tab, there is a visualization and explaination of the relationship between alpha, effect size, sample size", 
               br(), 
               "and power. In the Hit/False Alarm Ratio tab, the relationship between prevalence, alpha and power is visualized.",
               br(), 
               "Finally, in the Replication tab, the prevalence of true hypotheses given a specific replication rate is calculated and visualized.",
               br(),
               h3("ENJOY!")
               
      ),
      
      tabItem(tabName =  'Power_Plot', 
              
              fluidRow(
                column(4,
                       sliderInput(inputId = "n", label = "Number of Observations",
                                   min = 50, max = 200, value = 100),
                       sliderInput(inputId = "k", label = "Alpha",
                                   min = 0.01, max = 0.9, step = 0.01, value = 0.05),
                       sliderInput(inputId = "r", label = "Effect Size",
                                   min = 0.1, max = 2, step = 0.1, value = 0.8)
                       ),
                
                column(8,
                       plotOutput (outputId="plot")
                       )
              ),
              
              fluidRow(
                box(width = 12,
                    title = "Explanation",
                    # background = "blue",
                    solidHeader = F,
                    
                    
                    "In nullhypothesis testing (NHT) we search for the probability to find our data or more extreme, given that our nullhypothesis is true. We compare this probability, known as the p-value, to an arbitrary cut-off within the probability distribution (alpha). When we find that the p-value is smaller than this cut-off value, we assume that it is so unlikely for our data to come from our NH-distribution, that we choose to reject the NH.",
                    br(),
                    br(),
                    "When the Null hypothesis is true:",
                    br(),
                    "- Alpha is the arbitrary cut-off level to which we compare our p-value. This value is also the probability of a Type 1 error. This error refers to the situation that we reject the null hypothesis while in reality it is true.",
                    br(),
                    "- The true negative refers to the probability of not rejecting the null hypothesis when in reality the hypothesis is true. The probability of a true negative is equal to 1 - alpha.",
                    br(),
                    br(),
                    "When the null hypothesis is not true:",
                    br(),
                    "- Power refers to the probability to find an effect when in reality it really exists, i.e. the probability to reject the null hypothesis when the alternative hypothesis is true. Power is dependent on sample size and effect size and is equal to 1 - beta.",
                    br(),
                    "- Beta is the probability of a Type 2 error. This error occurs when in reality the null hypothesis is not true, but we fail to reject it.",
                    br(),
                    br()
                )
              )
      ),
      
      
      
      tabItem(tabName =  'Hit_False_Alarm_Ratio', 
              
              # prevalence
              fluidRow(
                box(width=12,
                    title = "Prevalence of Experiments with true effect",
                    # background = "blue",
                    solidHeader = F,
                    
                    sliderInput(inputId = 'percTrueH1',label = 'proportion of experiments with true effects',
                                min = 0,max = 1 ,step = 0.01,value = 0.1),
                    
                    plotOutput('waffle_population',height = "140px")
                )
              ),
              
              # after testing
              fluidRow(
                box(width=12,
                    title = "After running experiments: true and false significant effects",
                    solidHeader = F,
                    
                    fluidRow(
                      column(6,
                             sliderInput(inputId = 'alpha',label = 'alpha',
                                         min = 0,max = 1 ,step = 0.01,value = 0.05)
                      ),
                      column(6,
                             sliderInput(inputId = 'power',label = 'power (1-beta)',
                                         min = 0,max = 1 ,step = 0.1,value = 0.8)
                      )
                    ),
                    
                    fluidRow(
                      # Info Boxes
                      valueBoxOutput("BoxTN",width = 3),
                      valueBoxOutput("BoxFP",width = 3),
                      valueBoxOutput("BoxFN",width = 3),
                      valueBoxOutput("BoxTP",width = 3)
                    ),
                    
                    plotOutput('waffle_testing',height = "140px")
                )
              ),
              
              # PPV
              fluidRow(
                box(width=12,
                    title = "How many of the significant results are really true?",
                    # background = "blue",
                    solidHeader = F,
                    
                    column(4,
                           valueBoxOutput("BoxPPV",width = 12)
                           ),
                    column(8,
                           plotOutput('waffle_ppv',height = "140px")
                           )
                    
                    
                )
              ),
              
              fluidRow(
                box(width=12,
                    
                    title = "Explanation",
                    # background = "blue",
                    solidHeader = F,
                    
                    br(), 
                    "The null hypothesis testing procedure is probabilistic. That is: there is always a (small) probability that an effect is not found (beta) or that the data seemed to show an effect by chance (alpha). Because of this, there will always be wrong decisions in null hypothesis testing.",
                    br(), 
                    br(), 
                    "Different situations can occur:",
                    br(), 
                    "- True positive: In reality the alternative hypothesis is TRUE and through NHT the nullhypothesis is rejected.", 
                    br(), 
                    "- True negative: In reality the alternative hypothesis is FALSE and through NHT the nullhypothesis is NOT rejected.",
                    br(), 
                    "- False positive: In reality the alternative hypothesis is FALSE but through NHT the nullhypothsis is rejected.",
                    br(), 
                    "- False negative: In reality the alternative hypothesis is TRUE but through NHT the nullhypothesis is NOT rejected.",
                    br(), 
                    br(), 
                    "The amount of False/True Positives/Negatives is dependent on the chosen alpha and beta levels and the prevalence of true alternative hypothesis in the scientific field. That is: how many alternative hypothesis of all (interesting) tested hypothesis are in reality true.",
                    br(),
                    br()
                    )
              )
      ),
      
      tabItem(tabName =  'Replication', 
              
              fluidRow(
                column(4,
                       sliderInput("bet", "Power", 0.2, 1, 0.1, value = 0.8),
                       sliderInput("alph", "Alpha", 0.01, 0.1, 0.001, value = 0.05),
                       sliderInput("yas", "Replication Rate", 0, 1, 0.01, value = 0.36)
                       ),
                column(8,
                       plotOutput("lines")
                       )
              ),
              
              fluidRow(
                box(width = 12,
                    title = "Corresponding prevalence",
                    # background = "blue",
                    solidHeader = F,
                    
                    textOutput("reptext"),
                    br(),
                    "In this plot the relationship between replication rate and prevalence is shown. Some information that can be obtained from this plot:",
                    br(), 
                    "- The prevalence of true alternative hypothesis in the scientific field can be obtained when a replication rate is known (e.g. in a recent study a replication rate of 0.36 is found; The Reproducibility Project: Psychology (Open Science Collaboration, 2015).",
                    br(), 
                    "- Given a certain prevalence, the replication rate will differ if other values for alpha and beta are chosen.", 
                    br(), 
                    br()
                    )
                )
      ),
              
            
      
      tabItem(tabName =  'About', 
              br(),
              img(src = "photo_TquanT1.png", height=300, allign="left"),
              br(),
              "This app was made by Annika Nieper, Felix Wolff, Marius Kroesche, Sofieke Kevenaar & Zenab Tamimy"
              )
      
    )
  )
)





server <- function(input, output) {
  
  
  #  Info Boxes ---------------------------
  output$BoxTN <- renderValueBox({
    valueBox(
      paste(round(memory()$numTN/10,0),'%',sep = ''), "True Negative", color = 'olive'#,'red'
    )
  })
  output$BoxFP <- renderValueBox({
    valueBox(
      paste(round(memory()$numFP/10,0),'%',sep = ''), "False Positive", color = 'lime'#'fuchsia'
    )
  })
  output$BoxFN <- renderValueBox({
    valueBox(
      paste(round(memory()$numFN/10,0),'%',sep = ''), "False Negative", color = 'blue'
    )
  })
  output$BoxTP <- renderValueBox({
    valueBox(
      paste(round(memory()$numTP/10,0),'%',sep = ''), "True Positive", color = 'aqua'
    )
  })
  
  output$BoxPPV <- renderValueBox({
    valueBox(
      paste(round(memory()$numTP/(memory()$numTP+memory()$numFP)*100,0),'%',sep = ''), 
      # "P(true effects | significant results) is the Positive Predictive Value", 
      "PPV", 
      
      color = 'aqua'#'fuchsia'
    )
  })
  
  
  
  
  memory<- reactive({
    numHypotheses <- 1000 # total amount of tested hypotheses
    percTrueH1 <- input$percTrueH1# 0.1 # fraction of true H1 of all hypothesis
    alpha <- input$alpha#0.05 # type I error rate: probability to pick up false effect if there really is no effect on nature
    power <- input$power#0.5 # probability to pick up effect with test if there is true effect in nature
    
    percTrueH0 <- 1 - percTrueH1 # fraction of true H0
    numTrueH1 <- percTrueH1 * numHypotheses # number of true H1
    numTrueH0 <- numHypotheses - numTrueH1 # number of true H0
    
    beta <- 1-power # probability to miss effect if effect is there in nature
    
    numTP <- power*numTrueH1 #power*numTrueH1 # number of true positive effects
    numFN <- numTrueH1 - numTP #beta*numTrueH1 # number false negative results missing out on true effects on nature
    numFP <- alpha*numTrueH0#alpha*numTrueH0 # number of false positive results produced by chance
    numTN <- numTrueH0 - numFP #(1-alpha)*numTrueH0 # number correct negative results
    
    PPV <- numTP/(numTP + numFP) # positive predictive value: proportion of true effects of all positice results
    
    result <- list(numHypotheses=numHypotheses,
                   percTrueH1=percTrueH1,
                   alpha=alpha,
                   power=power,
                   percTrueH0=percTrueH0,
                   numTrueH1=numTrueH1,
                   numTrueH0=numTrueH0,
                   beta=beta,
                   numTP=numTP,
                   numFN=numFN,
                   numFP=numFP,
                   numTN=numTN,
                   PPV=PPV
    )
  })
  
  ### WAFFLE PLOTS --------------------------------------------------
  
  
  output$waffle_population <- renderPlot({
    
    mycolors <- c('springgreen4','springgreen3','royalblue3','mediumturquoise')
    myrows = 12
    
    
    # prevalence of true effects
    wdat1 <- c('null effects'=memory()$numTrueH0,'true effects'=memory()$numTrueH1)
    p1<-waffle(wdat1,rows = myrows,size=.3,colors =c('green','blue'),
               #title = 'prevalence of true effects in nature',
               legend_pos = c(-1,-1))
    p1
  })
  
  output$waffle_testing <- renderPlot({
    
    mycolors <- c('springgreen4','springgreen3','royalblue3','mediumturquoise')
    myrows = 12
    
    # frequencies of TR, FP, FN, CN
    wdat2 <- c('False'=memory()$numTN,
               'False Positives'=memory()$numFP,
               'False Negatives'=memory()$numFN,
               'True'=memory()$numTP)
    p2<-waffle(wdat2,rows = myrows,size=.3,colors =mycolors[1:4],
               #title = 'prevalence of true effects in nature',
               legend_pos = c(1,1)) +theme(legend.position=c(0.5,3.5),legend.direction="horizontal")
    p2
  })
  
  output$waffle_ppv <- renderPlot({
    
    mycolors <- c('springgreen4','springgreen3','royalblue3','mediumturquoise')
    myrows = 12
    
    # frequencies of TR, FP, FN, CN
    wdat3 <- c('False'=memory()$numTN,
               'False Negatives'=memory()$numFN,
               'seperater' = myrows*3,
               'False Positives'=memory()$numFP,
               'True Positives'=memory()$numTP)
    p3<-waffle(wdat3,rows = myrows,size=.3,colors =c(mycolors[c(1,3)],'white',mycolors[c(2,4)]),
               #title = 'prevalence of true effects in nature',
               legend_pos = c(-1,-1))
    p3
  })
  
  output$plot <- renderPlot({
    m0 <- 0
    N <- input$n
    effectsize <- input$r
    sd <- 100/N
    m1 <- effectsize + m0
    alpha <- input$k
    cval <- qnorm(1-alpha,m0,sd)
    power  <-  pnorm(cval,m1,sd,lower=F)
    x0seq  <-  seq(m0-3*sd,m0+3*sd,length=500)
    x1seq  <-  seq(m1-3*sd,m1+3*sd,length=500)
    
    plot(x1seq,dnorm(x1seq,m1,sd)+0.04,xlim=c(-4, 4),type="l",
         lwd=2,xlab="", ylab="", yaxt = "n", col = "white",
         cex.axis=1.25,cex.lab=1.5, bty = "n")
    lines(x1seq, dnorm(x1seq,m1,sd),lwd=2, col = "blue")
    
    px <- c(rep(cval,2),seq(cval-0.01,m1-3*sd,length=50),m1-3*sd,cval)
    py <- c(0,dnorm(cval,m1,sd),dnorm(seq(cval-0.01,m1-3*sd,length=50),m1,sd),
            rep(dnorm(m1-3*sd,m1,sd),2))
    polygon(px,py,col="lightblue",border=NA)
    
    x3 <- c(rep(cval,2),seq(cval+0.01,m1+3*sd,length=50),m1,sd)
    y3 <- c(0,dnorm(cval,m1,sd),dnorm(seq(cval+0.01,m1+3*sd,length=50),m1,sd),
            rep(dnorm(m0+3*sd,m0,sd),2))
    polygon(x3,y3,col="gray",border=NA)
    
    px <- c(rep(cval,2),seq(cval+0.1,m0+3*sd,length=50),m0+3*sd,cval)
    py <- c(0,dnorm(cval,m0,sd),dnorm(seq(cval+0.1,m0+3*sd,length=50),m0,sd),
            rep(dnorm(m0+3*sd,m0,sd),2))
    polygon(px,py,col="darkblue",border=NA)
    
    lines(x0seq,dnorm(x1seq,m1,sd),lwd=2, col = "red")
    
    # abline(v=m0,col=1,lty=3)
    segments(x0 = m0, y0 = 0, x1 = m0, y1 = dnorm(m1,m1,sd), lwd = "1", lty = 3)
    # abline(v=m1,col=1,lty=3)
    segments(x0 = m1, y0 = 0, x1 = m1, y1 = dnorm(m1,m1,sd), lwd = "1", lty = 3)
    
    legend("topright", inset=.05,
           c("H0 is true", "H1 is true", "Type II error", "Type I error", "Power"), fill=c("red", "blue", "lightblue", "darkblue", "gray"), horiz=FALSE)
    segments(x0 = m0, y0 = 1.02*dnorm(m0,m0,sd), x1 = m1, y1 = 1.02*dnorm(m1,m1,sd), lwd = "3")
    text((m0 + m1)/2, 1.06*dnorm(m0,m0,sd), "Effect Size")
  })
  
  memory<- reactive({
    numHypotheses <- 1000 # total amount of tested hypotheses
    percTrueH1 <- input$percTrueH1# 0.1 # fraction of true H1 of all hypothesis
    alpha <- input$alpha#0.05 # type I error rate: probability to pick up false effect if there really is no effect on nature
    power <- input$power#0.5 # probability to pick up effect with test if there is true effect in nature
    
    percTrueH0 <- 1 - percTrueH1 # fraction of true H0
    numTrueH1 <- percTrueH1 * numHypotheses # number of true H1
    numTrueH0 <- numHypotheses - numTrueH1 # number of true H0
    
    beta <- 1-power # probability to miss effect if effect is there in nature
    
    numTP <- power*numTrueH1 #power*numTrueH1 # number of true positive effects
    numFN <- numTrueH1 - numTP #beta*numTrueH1 # number false negative results missing out on true effects on nature
    numFP <- alpha*numTrueH0#alpha*numTrueH0 # number of false positive results produced by chance
    numTN <- numTrueH0 - numFP #(1-alpha)*numTrueH0 # number correct negative results
    
    PPV <- numTP/(numTP + numFP) # positive predictive value: proportion of true effects of all positice results
    
    result <- list(numHypotheses=numHypotheses,
                   percTrueH1=percTrueH1,
                   alpha=alpha,
                   power=power,
                   percTrueH0=percTrueH0,
                   numTrueH1=numTrueH1,
                   numTrueH0=numTrueH0,
                   beta=beta,
                   numTP=numTP,
                   numFN=numFN,
                   numFP=numFP,
                   numTN=numTN,
                   PPV=PPV
    )
  })
  
  output$lines <- renderPlot({
    pi= seq(0, 1,length.out=1000)
    kans1 = (pi*(input$bet * input$bet) + (1-pi)*(input$alph * input$alph)) / (pi*(input$bet) + (1-pi)*(input$alph))
    
    plot(pi, kans1,ylim=c(0,1), type="l", ylab="Replication Rate", xlab="Prevalence of True H1", col="deepskyblue", bty="n", las=1, xlim=c(0,1), lwd = 2)
    abline(h = input$yas, lty=3)
    abline(v = ((input$alph-input$yas)*input$alph)/(input$alph*input$alph-input$yas*input$alph-input$bet*input$bet+input$yas*input$bet), lty=3)
    points(((input$alph-input$yas)*input$alph)/(input$alph*input$alph-input$yas*input$alph-input$bet*input$bet+input$yas*input$bet), input$yas, pch = 19, col="deepskyblue4", cex = 1.25)
  })
  output$reptext <- renderText (
    if (input$yas > input$bet){
      paste("replication rate cannot be bigger than power")
    } else {
      ((input$alph-input$yas)*input$alph)/(input$alph*input$alph-input$yas*input$alph-input$bet*input$bet+input$yas*input$bet)})
  
}

shinyApp(ui, server)