Validating Knowledge Structures

How well does the knowledge structure fit to the data?



Choose a data set:


Choose a coefficient:





Patterns of responses not included in the diagram:

 

Distance Distribution

Please keep in mind that the maximal possible distance is
   dmax = ⌊|Q|/2⌋ = 2.
Please note that the surmise relation for a structure is always the surmise relation of the including quasi-ordinal knowledge space, i.e. the closure of the structure under union and intersection.
show with app
### packages
library(shiny)
library(kst)
library(pks)
library(shinyBS)
library(Rgraphviz)
library(plotrix)
library(markdown)
data(DoignonFalmagne7)
data(Taagepera)




### draw it
ui <- fluidPage(
  includeScript("../../../Matomo-tquant.js"),
  titlePanel(h1("Validating Knowledge Structures", style = "color: #db2561"),
             "Validating Knowledge Structures"),
  h2("How well does the knowledge structure fit to the data?", style="color: #660d2a"),
  
  fluidRow(
    ## first column 
    
    column(width = 4,
           # choose data set 
           
           br(),
           br(),
           h4("Choose a data set:"),
           radioButtons(
             "data_set", "", width = 300, 
             c("Doignon & Falmagne", "Density 97 (Taagepera)", "Matter 97 (Taagepera)"), 
             inline = FALSE
           ),
           
           checkboxInput("use_qosp", "Use quasi-ordinal knowledge spaces?"),
           br(),
           
           # choose coefficient
           h4("Choose a coefficient:"),
           coeff_choice <- selectInput("coeff_choice", "", c("Gamma Index" = "gamma", "Discrepancy Indey" = "DI", "Distance Agreement Coefficient" = "DA", "Violational Coefficient" = "VC")),
           
           
           htmlOutput("coeff"),
           br()
           
           
    ),
    ## second column
    column(width = 8,
           # hasse plot
           plotOutput(outputId = "hasseplot"),
           column(width = 8, offset = 3,
                  textOutput("n_in"),
                  textOutput("n_out")),
           br(),
           br(),
           br()
    )),
  
  fluidRow(
    column(width = 4,
           
           actionButton("infoT", label = "", icon = icon("info-circle"),
                        style="color: #fff; background-color: #db2561; border-color: #db2561"),
           
           bsModal("infoM", "Glossary of Coefficients", "infoT",
                   size = "large", includeHTML("www/methods.html")),
           
           actionButton("infoData", label = "", icon = icon("table"),
                        style="color: #fff; background-color: #a91945; border-color: #db2561"),
           
           bsModal("infoD", "About the Data", "infoData",
                   size = "large", includeHTML("www/data.html")),
           
           actionButton("infoUs", label = "", icon = icon("child"),
                        style="color: #fff; background-color: #660d2a; border-color: #660d2a"),
           
           bsModal("infoAboutUs", " ", "infoUs",
                   size = "large", includeHTML("www/about.html")),
           top = 600, bottom = 300),
    
    column(width = 8,
           h4("Patterns of responses not included in the diagram:"),
           textOutput("patt"), # top = 600, bottom = 600, left = 600, right = 100)
           HTML("<p>&nbsp;</p>")
    )),
  
  conditionalPanel(condition = '(input.coeff_choice == "DA") || (input.coeff_choice == "DI")',
                   fluidRow(
                     column(width = 4,
                            h4("Distance Distribution"),
                            tableOutput("distances"),
                            HTML("Please keep in mind that the maximal possible distance
                               is<br>&nbsp;&nbsp; d<sub>max</sub> = &lfloor;|Q|/2&rfloor; = 2.")
                     ),
                     column(width = 4, offset = 2,
                            plotOutput("distanceplot")
                     )
                   )    
  ),
  
  conditionalPanel(condition = '(input.coeff_choice == "gamma") || (input.coeff_choice == "VC")',
                   fluidRow(
                     column(width = 4,
                            HTML("Please note that the surmise relation for a structure is always the
                                 surmise relation of the including quasi-ordinal knowledge space, i.e.
                                 the closure of the structure under union and intersection.")),
                     column(width = 6, offset = 1,
                            plotOutput("srplot")
                     )
                   )    
  )
)



### server function
server <- function(input, output){
  
  all_data <- reactive({
    if (!input$use_qosp) {
      if (input$data_set == "Doignon & Falmagne") {
        strcdf <- kstructure(as.pattern(DoignonFalmagne7$K, as.set = TRUE))
        respdf <- as.binmat(DoignonFalmagne7$N.R, uniq = FALSE)
        mat <- 1 * t(apply(respdf, 1, function(r) {
          apply(DoignonFalmagne7$K, 1, function(s) {
            all(r == s) 
          })
        }))
        ladf <- as.logical(rowSums(mat))
        # n of people involved in structure
        n_indf <- dim(as.data.frame(respdf)[ladf, ])[1]
        # n of people not involved
        n_outdf <- dim(as.data.frame(respdf))[1] - dim(as.data.frame(respdf)[ladf, ])[1]

        ad <- list(
          strc = strcdf,
          resp = respdf,
          la = ladf,
          n_in = n_indf,
          n_out = n_outdf,
          plottitle = "Knowledge Structure of Doignon & Falmagne"
        )
        
      } else if (input$data_set == "Density 97 (Taagepera)") {
        strcd <- kstructure(as.pattern(density97$K, as.set = TRUE))
        respd <- as.binmat(density97$N.R, uniq = FALSE)
        mat <- 1 * t(apply(respd, 1, function(r) {
          apply(density97$K, 1, function(s) {
            all(r == s) 
          })
        }))
        lad <- as.logical(rowSums(mat))

        # n of people involved in structure
        n_ind <- dim(as.data.frame(respd)[lad, ])[1]
        # n of people not involved
        n_outd <- dim(as.data.frame(respd))[1] - dim(as.data.frame(respd)[lad, ])[1]
        ad <- list(
          strc = strcd,
          resp = respd,
          la = lad,
          n_in = n_ind,
          n_out = n_outd,
          plottitle = "Knowledge Structure of Density 97"
        )
        
      } else { # Matter 97
        strcm <- kstructure(as.pattern(matter97$K, as.set = TRUE))
        respm <- as.binmat(matter97$N.R, uniq = FALSE)
        mat <- 1 * t(apply(respm, 1, function(r) {
          apply(matter97$K, 1, function(s) {
            all(r == s) 
          })
        }))
        lam <- as.logical(rowSums(mat))
        # n of people involved in structure
        n_inm <- dim(as.data.frame(respm)[lam, ])[1]
        # n of people not involved
        n_outm <- dim(as.data.frame(respm))[1] - dim(as.data.frame(respm)[lam, ])[1]
        ad <- list(
          strc = strcm,
          resp = respm,
          la = lam,
          n_in = n_inm,
          n_out = n_outm,
          plottitle = "Knowledge Structure of Matter 97"
        )
      }
      
    } else {
      if (input$data_set == "Doignon & Falmagne") {
        # The DF7 structure is already a quasi-ordinal knowledge space
        qoksdf <- kstructure(as.pattern(DoignonFalmagne7$K, as.set = TRUE))
        respdf <- as.binmat(DoignonFalmagne7$N.R, uniq = FALSE)
        mat <- 1 * t(apply(respdf, 1, function(r) {
          apply(as.binmat(qoksdf), 1, function(s) {
            all(r == s) 
          })
        }))
        laqodf <- as.logical(rowSums(mat))
        # n of people involved in structure
        n_inqodf <- dim(as.data.frame(respdf)[laqodf, ])[1]
        # n of people not involved
        n_outqodf <- dim(as.data.frame(respdf))[1] - dim(as.data.frame(respdf)[laqodf, ])[1]
        
        ad <- list(
          strc = qoksdf,
          resp = respdf,
          la = laqodf,
          n_in = n_inqodf,
          n_out = n_outqodf,
          plottitle = "Quasi-ordinal Knowledge Space of Doignon & Falmagne"
        )
        
      } else if (input$data_set == "Density 97 (Taagepera)") {
        strcd <- kstructure(as.pattern(density97$K, as.set = TRUE))
        qoksd <- closure(kspace(strcd), operation = "intersection")
        respd <- as.binmat(density97$N.R, uniq = FALSE)
        mat <- 1 * t(apply(respd, 1, function(r) {
          apply(as.binmat(qoksd), 1, function(s) {
            all(r == s) 
          })
        }))
        laqod <- as.logical(rowSums(mat))
        
        
        # n of people involved in structure
        n_inqod <- dim(as.data.frame(respd)[laqod, ])[1]
        # n of people not involved
        n_outqod <- dim(as.data.frame(respd))[1] - dim(as.data.frame(respd)[laqod, ])[1]
        ad <- list(
          strc = qoksd,
          resp = respd,
          la = laqod,
          n_in = n_inqod,
          n_out = n_outqod,
          plottitle = "Quasi-ordinal Knowledge Space of Density 97"
        )
      } else { # Matter 97
        strcm <- kstructure(as.pattern(matter97$K, as.set = TRUE))
        qoksm <- closure(kspace(strcm), operation = "intersection")
        respm <- as.binmat(matter97$N.R, uniq = FALSE)
        mat <- 1 * t(apply(respm, 1, function(r) {
          apply(matter97$K, 1, function(s) {
            all(r == s) 
          })
        }))
        laqom <- as.logical(rowSums(mat))
        
        # n of people involved in structure
        n_inqom <- dim(as.data.frame(respm)[laqom, ])[1]
        # n of people not involved
        n_outqom <- dim(as.data.frame(respm))[1] - dim(as.data.frame(respm)[laqom, ])[1]
        ad <- list(
          strc = qoksm,
          resp = respm,
          la = laqom,
          n_in = n_inqom,
          n_out = n_outqom,
          plottitle = "Quasi-ordinal Knowledge Space of Matter 97"
        )
      }
    }
    ad
  })
  
  distances <- reactive({kvalidate(all_data()$strc, 
                                   all_data()$resp, 
                                   method = "DI"
  )$di_dist
  })
  
  
  
  # coefficients
  output$coeff <- renderText({
    kval <- kvalidate(all_data()$strc, all_data()$resp, method = input$coeff_choice)
    if (input$coeff_choice == "DA") {
      c("d<sub>dat</sub> =", round(kval$ddat, 3),
        "<br>d<sub>pot</sub> =", round(kval$dpot, 3),
        "<br>DA =", round(kval$DA, 3)
      )
    } else if (input$coeff_choice == "DI") {
      c("DI =", round(kval$di, 3))
    } else if (input$coeff_choice == "gamma") {
      c("&gamma; =", round(kval$gamma, 3), 
        "<br>N<sub>c</sub> =", kval$nc,
        "<br>N<sub>d</sub> =", kval$nd
      )
    } else { # input$coeff_choice == "VC"
      c("VC =", round(kval$vc, 3),
        "<br>N<sub>d</sub> =", kval$nd
      ) 
    }
  })
  
  # hasse plots
  output$hasseplot <- renderPlot({
    output$n_in <- renderText(c("Number of participants fitting to a knowledge state: ", all_data()$n_in))
    output$n_out <- renderText(c("Number of participants not fitting: ", all_data()$n_out))
    output$patt <- renderPrint(print(as.pattern(as.data.frame(all_data()$resp)[!all_data()$la, ], as.letters = TRUE, 
                                                as.set =  TRUE ), quote = FALSE))
    plot(kstructure(all_data()$strc), 
         main = all_data()$plottitle)
  })
  
  output$srplot <- renderPlot(plot(as.relation(all_data()$strc),
                                   main = "Surmise Relation of the Structure above"))
  
  
  output$distances <- renderTable(distances())
  output$distanceplot <- renderPlot(barplot(distances(), col = c("darkgreen", "darkblue", "red"), xlab = "Distance", ylab = "Frequency", main =  "Distance distribution"))
  
}

shinyApp(ui, server)
<center><b>This app was built by </b>
    &nbsp
<p>
  <p> &nbsp
    Arne John
    <p>
    Alice Maurer
    <p>
    Benjamin Heikkinen
    <p>
    Mª Leonor Neto
    <p>
    Cord Hockemeyer
    </center>
<h2>Example Spaces</h2>
As example data, knowledge spaces provided by the R package pks (Heller &amp; Wickelmaier, 2013;
Wickelmaier et al., 2016) are used. Concretely, the following spaces are used:
<dl>
<dt>Density 97</dt>
<dd>Taagepera et al. (1997) applied knowledge space theory to specific science problems. The 
density test was administered to 2060 students, a sub structure of five items is included here. </dd>
<dt>Matter 97</dt>
<dd>Taagepera et al. (1997) applied knowledge space theory to specific science problems. The conservation 
of matter test was administered to 1620 students, a sub structure of five items is included here.</dd>
<dt>Doignon &amp; Falmagne</dt>
<dd>Fictitious data set from Doignon and Falmagne (1999, chap. 7). </dd>
</dl>
Please note that for "Matter 97" and "Density 97" the structures are not (quasi-ordinal) knowledge spaces. The VC
and &gamma; coefficients work on the underlying surmise relation, i.e. effectively on the closure of the
knowledge structure under union and intersection.

<h4>References</h4>
Doignon, J.-P., &amp; Falmagne, J.-C. (1999). <i>Knowledge spaces.</i> Berlin: Springer.
<p />
Heller, J. &amp; Wickelmaier, F. (2013). Minimum discrepancy estimation in probabilistic knowledge structures. 
<i>Electronic Notes in Discrete Mathematics, 42,</i> 49-56.
<p />
Schrepp, M., Held, T., &amp; Albert, D. (1999). Component-based construction of surmise relations for chess problems. 
 In D. Albert &amp; J. Lukas (Eds.), Knowledge spaces: Theories, empirical research, and applications (pp. 41--66). 
 Mahwah, NJ: Erlbaum. 
 <p />
Taagepera, M., Potter, F., Miller, G.E., &amp; Lakshminarayan, K. (1997). Mapping students' thinking patterns by 
the use of knowledge space theory. <i>International Journal of Science Education, 19,</i> 283--302.
<p />
Wickelmaier, F., Heller, J., &amp; Anselmi, P. (2016). <i>pks: Probabilistic Knowledge Structures.</i> R package
version 0.4-0. <a href="https://CRAN.R-project.org/package=kst">https://CRAN.R-project.org/package=kst</a>
<p />
<b> Gamma Index: </b> It was proposed by Goodman & Kruskal (1972) as a way of validating the prerequisite relations and assumes that not 
    every response pattern is represented by a prerequisite relation. It compares the number of response patterns
    that are represented by a prerequisite relation with the number of response patterns that are not. 
    <p>
    <center>&gamma; = (N<sub>c</sub> - N<sub>d</sub>) / (N<sub>c</sub> + N<sub>d</sub>)</center>
    <p>
    where N<sub>c</sub> is the number of concordant pairs and N<sub>d</sub> is the number of discordant pairs.
    <p>
Generally, a positive Gamma Index supports the validity of the <b>prerequisite relations</b> and a negative Gamma Index
violates the its validity.
  <p> 
&nbsp
&nbsp
    <p>
    <b>Violational Coefficient: </b>It validades <b>prerequisite relations</b>. For this purpose, the number of violations against the prerequisite relations are calculated.
    A low VC supports the validity of prerequisite relations.
    <p>
    <p>
    <center>VC = &sum; <i>v<sub>xy</sub></i> / <i>n</i>(|S| - <i>m</i>)</center>
    &nbsp
&nbsp
    &nbsp
    <p>
    <b> Discrepancy Index: </b> It was proposef by Kambouri et al. (1994) as a way of validating the knowledge 
    space and allows for the evaluation of its quality. <i>DI</i> calculates the average minimal distance
    between each response pattern and the nearest state in the knowledge space.
    <p>
    <p>
    <center><i>DI</i> = &sum;<sub>r</sub> min<sub>K</sub>{ <i>d</i> (<i>r, K</i>) | <i>r</i> &epsi; <i>R</i> } / <i>n</i></center>
    <p>
    &nbsp
&nbsp
&nbsp
    <p>
    <b> Distance Agreement Coefficient: </b> It was proposef by Schrepp (1999) as a way of validating the knowledge space and allows for the evaluation
    of its quality. It compares the average symmetric distance between the knowledge states and participant's response patterns (<i>d<sub>dat</sub></i>) with the
    average distance between the resulting knowledge states and the power set of items (<i>d<sub>pot</sub></i>).
    <p>
    <b><i>d<sub>dat</sub></i></b> calculates the average minimal distance between each response pattern and the nearest state in the knowledge space.
    <p>
    <p>
    <center><i>d<sub>dat</sub></i> = &sum;<sub>r</sub> min<sub>K</sub>{ <i>d</i> (<i>r, K</i>) | <i>r</i> &epsi; <i>R</i> } / <i>n</i></center>
    <p>
    <p>
    <b><i>d<sub>pot</sub></i></b> calculates the average minimal distance between the power set of all potencial response patterns and the nearest state in the knowledge
    space.
    <p>
    <p>
    <center><i>d<sub>pot</sub></i> = &sum;<sub>P</sub> min<sub>K</sub>{ <i>d</i> (<i>P, K</i>) | <i>P</i> &subseteq; <i>Q</i> } / 2<sup>|Q|</sup></center>
    <p>&nbsp;<p>
    Note that <i>d<sub>dat</sub></i> is equal to the <i>DI</i>.
    <p>
    
    Generally, a low Distance Agreement Coefficient supports the validity of the <b>knowledge space</b> and a high Distance Agreement Coefficient violates its validity.
<p>
<p>
<center>DA = <i>d<sub>dat</sub></i> / <i>d<sub>pot</sub></i></center><p>
&nbsp<p>
Please note that for the relation-oriented measures, quasi-ordinal knowledge spaces should be used.