The Power App

Welcome to this App

The reproducability crisis has been a hot topic of debate in all sciences, but particular in psychological science.
The failure to reproduce assumingly established findings has led to an agitation in both the media
and the scientific community. The current situation has more than once been labelled as a replication crisis.
This educational app can be used as a visualisational aid to understand the statistical fundamentals of null
hypothesis testing and how these fundamentals relate to the current replication crisis.

In the Power Plot tab, there is a visualization and explaination of the relationship between alpha, effect size, sample size
and power. In the Hit/False Alarm Ratio tab, the relationship between prevalence, alpha and power is visualized.
Finally, in the Replication tab, the prevalence of true hypotheses given a specific replication rate is calculated and visualized.

ENJOY!

Number of Observations

Alpha

Effect Size

Explanation

In nullhypothesis testing (NHT) we search for the probability to find our data or more extreme, given that our nullhypothesis is true. We compare this probability, known as the p-value, to an arbitrary cut-off within the probability distribution (alpha). When we find that the p-value is smaller than this cut-off value, we assume that it is so unlikely for our data to come from our NH-distribution, that we choose to reject the NH.

When the Null hypothesis is true:
- Alpha is the arbitrary cut-off level to which we compare our p-value. This value is also the probability of a Type 1 error. This error refers to the situation that we reject the null hypothesis while in reality it is true.
- The true negative refers to the probability of not rejecting the null hypothesis when in reality the hypothesis is true. The probability of a true negative is equal to 1 - alpha.

When the null hypothesis is not true:
- Power refers to the probability to find an effect when in reality it really exists, i.e. the probability to reject the null hypothesis when the alternative hypothesis is true. Power is dependent on sample size and effect size and is equal to 1 - beta.
- Beta is the probability of a Type 2 error. This error occurs when in reality the null hypothesis is not true, but we fail to reject it.

Prevalence of Experiments with true effect

proportion of experiments with true effects

After running experiments: true and false significant effects

alpha

power (1-beta)

How many of the significant results are really true?

Explanation

The null hypothesis testing procedure is probabilistic. That is: there is always a (small) probability that an effect is not found (beta) or that the data seemed to show an effect by chance (alpha). Because of this, there will always be wrong decisions in null hypothesis testing.

Different situations can occur:
- True positive: In reality the alternative hypothesis is TRUE and through NHT the nullhypothesis is rejected.
- True negative: In reality the alternative hypothesis is FALSE and through NHT the nullhypothesis is NOT rejected.
- False positive: In reality the alternative hypothesis is FALSE but through NHT the nullhypothsis is rejected.
- False negative: In reality the alternative hypothesis is TRUE but through NHT the nullhypothesis is NOT rejected.

The amount of False/True Positives/Negatives is dependent on the chosen alpha and beta levels and the prevalence of true alternative hypothesis in the scientific field. That is: how many alternative hypothesis of all (interesting) tested hypothesis are in reality true.

Power

Alpha

Replication Rate

Corresponding prevalence

In this plot the relationship between replication rate and prevalence is shown. Some information that can be obtained from this plot:
- The prevalence of true alternative hypothesis in the scientific field can be obtained when a replication rate is known (e.g. in a recent study a replication rate of 0.36 is found; The Reproducibility Project: Psychology (Open Science Collaboration, 2015).
- Given a certain prevalence, the replication rate will differ if other values for alpha and beta are chosen.

This app was made by Annika Nieper, Felix Wolff, Marius Kroesche, Sofieke Kevenaar & Zenab Tamimy

library(shinydashboard) library(waffle) library(ggplot2) require("RColorBrewer") ui <- dashboardPage( dashboardHeader(title = "The Power App"), dashboardSidebar( menuItem("Welcome", icon = icon("th"), tabName = "Welcome"), menuItem('Power_Plot', icon = icon("th"),tabName = 'Power_Plot'), menuItem("Hit_False_Alarm_Ratio", icon = icon("th"),tabName = "Hit_False_Alarm_Ratio"), menuItem("Replication", icon = icon("th"),tabName = "Replication"), menuItem("About", icon = icon("th"),tabName = "About") ), dashboardBody( includeScript("../../../Matomo-tquant.js"), tabItems( tabItem( tabName = 'Welcome', titlePanel("Welcome to this App"), br(), "The reproducability crisis has been a hot topic of debate in all sciences, but particular in psychological science.", br(), "The failure to reproduce assumingly established findings has led to an agitation in both the media", br(), "and the scientific community. The current situation has more than once been labelled as a replication crisis.", br(), "This educational app can be used as a visualisational aid to understand the statistical fundamentals of null", br(), "hypothesis testing and how these fundamentals relate to the current replication crisis.", br(), br(), "In the Power Plot tab, there is a visualization and explaination of the relationship between alpha, effect size, sample size", br(), "and power. In the Hit/False Alarm Ratio tab, the relationship between prevalence, alpha and power is visualized.", br(), "Finally, in the Replication tab, the prevalence of true hypotheses given a specific replication rate is calculated and visualized.", br(), h3("ENJOY!") ), tabItem(tabName = 'Power_Plot', fluidRow( column(4, sliderInput(inputId = "n", label = "Number of Observations", min = 50, max = 200, value = 100), sliderInput(inputId = "k", label = "Alpha", min = 0.01, max = 0.9, step = 0.01, value = 0.05), sliderInput(inputId = "r", label = "Effect Size", min = 0.1, max = 2, step = 0.1, value = 0.8) ), column(8, plotOutput (outputId="plot") ) ), fluidRow( box(width = 12, title = "Explanation", # background = "blue", solidHeader = F, "In nullhypothesis testing (NHT) we search for the probability to find our data or more extreme, given that our nullhypothesis is true. We compare this probability, known as the p-value, to an arbitrary cut-off within the probability distribution (alpha). When we find that the p-value is smaller than this cut-off value, we assume that it is so unlikely for our data to come from our NH-distribution, that we choose to reject the NH.", br(), br(), "When the Null hypothesis is true:", br(), "- Alpha is the arbitrary cut-off level to which we compare our p-value. This value is also the probability of a Type 1 error. This error refers to the situation that we reject the null hypothesis while in reality it is true.", br(), "- The true negative refers to the probability of not rejecting the null hypothesis when in reality the hypothesis is true. The probability of a true negative is equal to 1 - alpha.", br(), br(), "When the null hypothesis is not true:", br(), "- Power refers to the probability to find an effect when in reality it really exists, i.e. the probability to reject the null hypothesis when the alternative hypothesis is true. Power is dependent on sample size and effect size and is equal to 1 - beta.", br(), "- Beta is the probability of a Type 2 error. This error occurs when in reality the null hypothesis is not true, but we fail to reject it.", br(), br() ) ) ), tabItem(tabName = 'Hit_False_Alarm_Ratio', # prevalence fluidRow( box(width=12, title = "Prevalence of Experiments with true effect", # background = "blue", solidHeader = F, sliderInput(inputId = 'percTrueH1',label = 'proportion of experiments with true effects', min = 0,max = 1 ,step = 0.01,value = 0.1), plotOutput('waffle_population',height = "140px") ) ), # after testing fluidRow( box(width=12, title = "After running experiments: true and false significant effects", solidHeader = F, fluidRow( column(6, sliderInput(inputId = 'alpha',label = 'alpha', min = 0,max = 1 ,step = 0.01,value = 0.05) ), column(6, sliderInput(inputId = 'power',label = 'power (1-beta)', min = 0,max = 1 ,step = 0.1,value = 0.8) ) ), fluidRow( # Info Boxes valueBoxOutput("BoxTN",width = 3), valueBoxOutput("BoxFP",width = 3), valueBoxOutput("BoxFN",width = 3), valueBoxOutput("BoxTP",width = 3) ), plotOutput('waffle_testing',height = "140px") ) ), # PPV fluidRow( box(width=12, title = "How many of the significant results are really true?", # background = "blue", solidHeader = F, column(4, valueBoxOutput("BoxPPV",width = 12) ), column(8, plotOutput('waffle_ppv',height = "140px") ) ) ), fluidRow( box(width=12, title = "Explanation", # background = "blue", solidHeader = F, br(), "The null hypothesis testing procedure is probabilistic. That is: there is always a (small) probability that an effect is not found (beta) or that the data seemed to show an effect by chance (alpha). Because of this, there will always be wrong decisions in null hypothesis testing.", br(), br(), "Different situations can occur:", br(), "- True positive: In reality the alternative hypothesis is TRUE and through NHT the nullhypothesis is rejected.", br(), "- True negative: In reality the alternative hypothesis is FALSE and through NHT the nullhypothesis is NOT rejected.", br(), "- False positive: In reality the alternative hypothesis is FALSE but through NHT the nullhypothsis is rejected.", br(), "- False negative: In reality the alternative hypothesis is TRUE but through NHT the nullhypothesis is NOT rejected.", br(), br(), "The amount of False/True Positives/Negatives is dependent on the chosen alpha and beta levels and the prevalence of true alternative hypothesis in the scientific field. That is: how many alternative hypothesis of all (interesting) tested hypothesis are in reality true.", br(), br() ) ) ), tabItem(tabName = 'Replication', fluidRow( column(4, sliderInput("bet", "Power", 0.2, 1, 0.1, value = 0.8), sliderInput("alph", "Alpha", 0.01, 0.1, 0.001, value = 0.05), sliderInput("yas", "Replication Rate", 0, 1, 0.01, value = 0.36) ), column(8, plotOutput("lines") ) ), fluidRow( box(width = 12, title = "Corresponding prevalence", # background = "blue", solidHeader = F, textOutput("reptext"), br(), "In this plot the relationship between replication rate and prevalence is shown. Some information that can be obtained from this plot:", br(), "- The prevalence of true alternative hypothesis in the scientific field can be obtained when a replication rate is known (e.g. in a recent study a replication rate of 0.36 is found; The Reproducibility Project: Psychology (Open Science Collaboration, 2015).", br(), "- Given a certain prevalence, the replication rate will differ if other values for alpha and beta are chosen.", br(), br() ) ) ), tabItem(tabName = 'About', br(), img(src = "photo_TquanT1.png", height=300, allign="left"), br(), "This app was made by Annika Nieper, Felix Wolff, Marius Kroesche, Sofieke Kevenaar & Zenab Tamimy" ) ) ) ) server <- function(input, output) { # Info Boxes --------------------------- output$BoxTN <- renderValueBox({ valueBox( paste(round(memory()$numTN/10,0),'%',sep = ''), "True Negative", color = 'olive'#,'red' ) }) output$BoxFP <- renderValueBox({ valueBox( paste(round(memory()$numFP/10,0),'%',sep = ''), "False Positive", color = 'lime'#'fuchsia' ) }) output$BoxFN <- renderValueBox({ valueBox( paste(round(memory()$numFN/10,0),'%',sep = ''), "False Negative", color = 'blue' ) }) output$BoxTP <- renderValueBox({ valueBox( paste(round(memory()$numTP/10,0),'%',sep = ''), "True Positive", color = 'aqua' ) }) output$BoxPPV <- renderValueBox({ valueBox( paste(round(memory()$numTP/(memory()$numTP+memory()$numFP)*100,0),'%',sep = ''), # "P(true effects | significant results) is the Positive Predictive Value", "PPV", color = 'aqua'#'fuchsia' ) }) memory<- reactive({ numHypotheses <- 1000 # total amount of tested hypotheses percTrueH1 <- input$percTrueH1# 0.1 # fraction of true H1 of all hypothesis alpha <- input$alpha#0.05 # type I error rate: probability to pick up false effect if there really is no effect on nature power <- input$power#0.5 # probability to pick up effect with test if there is true effect in nature percTrueH0 <- 1 - percTrueH1 # fraction of true H0 numTrueH1 <- percTrueH1 * numHypotheses # number of true H1 numTrueH0 <- numHypotheses - numTrueH1 # number of true H0 beta <- 1-power # probability to miss effect if effect is there in nature numTP <- power*numTrueH1 #power*numTrueH1 # number of true positive effects numFN <- numTrueH1 - numTP #beta*numTrueH1 # number false negative results missing out on true effects on nature numFP <- alpha*numTrueH0#alpha*numTrueH0 # number of false positive results produced by chance numTN <- numTrueH0 - numFP #(1-alpha)*numTrueH0 # number correct negative results PPV <- numTP/(numTP + numFP) # positive predictive value: proportion of true effects of all positice results result <- list(numHypotheses=numHypotheses, percTrueH1=percTrueH1, alpha=alpha, power=power, percTrueH0=percTrueH0, numTrueH1=numTrueH1, numTrueH0=numTrueH0, beta=beta, numTP=numTP, numFN=numFN, numFP=numFP, numTN=numTN, PPV=PPV ) }) ### WAFFLE PLOTS -------------------------------------------------- output$waffle_population <- renderPlot({ mycolors <- c('springgreen4','springgreen3','royalblue3','mediumturquoise') myrows = 12 # prevalence of true effects wdat1 <- c('null effects'=memory()$numTrueH0,'true effects'=memory()$numTrueH1) p1<-waffle(wdat1,rows = myrows,size=.3,colors =c('green','blue'), #title = 'prevalence of true effects in nature', legend_pos = c(-1,-1)) p1 }) output$waffle_testing <- renderPlot({ mycolors <- c('springgreen4','springgreen3','royalblue3','mediumturquoise') myrows = 12 # frequencies of TR, FP, FN, CN wdat2 <- c('False'=memory()$numTN, 'False Positives'=memory()$numFP, 'False Negatives'=memory()$numFN, 'True'=memory()$numTP) p2<-waffle(wdat2,rows = myrows,size=.3,colors =mycolors[1:4], #title = 'prevalence of true effects in nature', legend_pos = c(1,1)) +theme(legend.position=c(0.5,3.5),legend.direction="horizontal") p2 }) output$waffle_ppv <- renderPlot({ mycolors <- c('springgreen4','springgreen3','royalblue3','mediumturquoise') myrows = 12 # frequencies of TR, FP, FN, CN wdat3 <- c('False'=memory()$numTN, 'False Negatives'=memory()$numFN, 'seperater' = myrows*3, 'False Positives'=memory()$numFP, 'True Positives'=memory()$numTP) p3<-waffle(wdat3,rows = myrows,size=.3,colors =c(mycolors[c(1,3)],'white',mycolors[c(2,4)]), #title = 'prevalence of true effects in nature', legend_pos = c(-1,-1)) p3 }) output$plot <- renderPlot({ m0 <- 0 N <- input$n effectsize <- input$r sd <- 100/N m1 <- effectsize + m0 alpha <- input$k cval <- qnorm(1-alpha,m0,sd) power <- pnorm(cval,m1,sd,lower=F) x0seq <- seq(m0-3*sd,m0+3*sd,length=500) x1seq <- seq(m1-3*sd,m1+3*sd,length=500) plot(x1seq,dnorm(x1seq,m1,sd)+0.04,xlim=c(-4, 4),type="l", lwd=2,xlab="", ylab="", yaxt = "n", col = "white", cex.axis=1.25,cex.lab=1.5, bty = "n") lines(x1seq, dnorm(x1seq,m1,sd),lwd=2, col = "blue") px <- c(rep(cval,2),seq(cval-0.01,m1-3*sd,length=50),m1-3*sd,cval) py <- c(0,dnorm(cval,m1,sd),dnorm(seq(cval-0.01,m1-3*sd,length=50),m1,sd), rep(dnorm(m1-3*sd,m1,sd),2)) polygon(px,py,col="lightblue",border=NA) x3 <- c(rep(cval,2),seq(cval+0.01,m1+3*sd,length=50),m1,sd) y3 <- c(0,dnorm(cval,m1,sd),dnorm(seq(cval+0.01,m1+3*sd,length=50),m1,sd), rep(dnorm(m0+3*sd,m0,sd),2)) polygon(x3,y3,col="gray",border=NA) px <- c(rep(cval,2),seq(cval+0.1,m0+3*sd,length=50),m0+3*sd,cval) py <- c(0,dnorm(cval,m0,sd),dnorm(seq(cval+0.1,m0+3*sd,length=50),m0,sd), rep(dnorm(m0+3*sd,m0,sd),2)) polygon(px,py,col="darkblue",border=NA) lines(x0seq,dnorm(x1seq,m1,sd),lwd=2, col = "red") # abline(v=m0,col=1,lty=3) segments(x0 = m0, y0 = 0, x1 = m0, y1 = dnorm(m1,m1,sd), lwd = "1", lty = 3) # abline(v=m1,col=1,lty=3) segments(x0 = m1, y0 = 0, x1 = m1, y1 = dnorm(m1,m1,sd), lwd = "1", lty = 3) legend("topright", inset=.05, c("H0 is true", "H1 is true", "Type II error", "Type I error", "Power"), fill=c("red", "blue", "lightblue", "darkblue", "gray"), horiz=FALSE) segments(x0 = m0, y0 = 1.02*dnorm(m0,m0,sd), x1 = m1, y1 = 1.02*dnorm(m1,m1,sd), lwd = "3") text((m0 + m1)/2, 1.06*dnorm(m0,m0,sd), "Effect Size") }) memory<- reactive({ numHypotheses <- 1000 # total amount of tested hypotheses percTrueH1 <- input$percTrueH1# 0.1 # fraction of true H1 of all hypothesis alpha <- input$alpha#0.05 # type I error rate: probability to pick up false effect if there really is no effect on nature power <- input$power#0.5 # probability to pick up effect with test if there is true effect in nature percTrueH0 <- 1 - percTrueH1 # fraction of true H0 numTrueH1 <- percTrueH1 * numHypotheses # number of true H1 numTrueH0 <- numHypotheses - numTrueH1 # number of true H0 beta <- 1-power # probability to miss effect if effect is there in nature numTP <- power*numTrueH1 #power*numTrueH1 # number of true positive effects numFN <- numTrueH1 - numTP #beta*numTrueH1 # number false negative results missing out on true effects on nature numFP <- alpha*numTrueH0#alpha*numTrueH0 # number of false positive results produced by chance numTN <- numTrueH0 - numFP #(1-alpha)*numTrueH0 # number correct negative results PPV <- numTP/(numTP + numFP) # positive predictive value: proportion of true effects of all positice results result <- list(numHypotheses=numHypotheses, percTrueH1=percTrueH1, alpha=alpha, power=power, percTrueH0=percTrueH0, numTrueH1=numTrueH1, numTrueH0=numTrueH0, beta=beta, numTP=numTP, numFN=numFN, numFP=numFP, numTN=numTN, PPV=PPV ) }) output$lines <- renderPlot({ pi= seq(0, 1,length.out=1000) kans1 = (pi*(input$bet * input$bet) + (1-pi)*(input$alph * input$alph)) / (pi*(input$bet) + (1-pi)*(input$alph)) plot(pi, kans1,ylim=c(0,1), type="l", ylab="Replication Rate", xlab="Prevalence of True H1", col="deepskyblue", bty="n", las=1, xlim=c(0,1), lwd = 2) abline(h = input$yas, lty=3) abline(v = ((input$alph-input$yas)*input$alph)/(input$alph*input$alph-input$yas*input$alph-input$bet*input$bet+input$yas*input$bet), lty=3) points(((input$alph-input$yas)*input$alph)/(input$alph*input$alph-input$yas*input$alph-input$bet*input$bet+input$yas*input$bet), input$yas, pch = 19, col="deepskyblue4", cex = 1.25) }) output$reptext <- renderText ( if (input$yas > input$bet){ paste("replication rate cannot be bigger than power") } else { ((input$alph-input$yas)*input$alph)/(input$alph*input$alph-input$yas*input$alph-input$bet*input$bet+input$yas*input$bet)}) } shinyApp(ui, server)