Labeled Data

As of the last data dump, we've collected over 35,550 label "votes" across 20 label categories for 3,459 texts. The numbers assigned to each label are based on votes from players of the Learned Hands online game. We ask players whether a label is or isn't present in a text, and we use their answers to calculate a Wilson score confidence interval for the share of players who would say the label is present.

The best guess values are the center of this interval, as of the files' creation, for all of the texts with at least one label. The 95% confidence values are 1 if the lower bound of the interval exceeded 50% and 0 if the upper bound fell below 50%. If the interval for a text straddles 50%, no value is included for that label. In other words, the best guess values are our best estimates of the percentage of people who will say an issue is present, and the 95% confidence values mark the cases where we're 95% confident that more than half of people would agree on whether the issue is present. We intend to test these values against expert labels in the near future.
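To make the calculation concrete, here is a minimal sketch of how a Wilson score interval can be turned into the two kinds of values described above. It is an illustration, not the code used to build the files. The function names (`wilson_interval`, `confident_label`), the use of raw yes/total vote counts as input, and the choice of z = 1.96 for a 95% interval are all assumptions.

```python
import math


def wilson_interval(yes_votes, total_votes, z=1.96):
    """Return (lower, center, upper) of the Wilson score interval.

    Hypothetical helper: z=1.96 is the usual value for a 95% interval
    and is assumed here, not taken from the dataset's own code.
    """
    if total_votes <= 0:
        raise ValueError("total_votes must be positive")
    p = yes_votes / total_votes          # observed share of "yes" votes
    z2 = z * z
    denom = 1 + z2 / total_votes
    center = (p + z2 / (2 * total_votes)) / denom
    half_width = (
        z * math.sqrt(p * (1 - p) / total_votes + z2 / (4 * total_votes ** 2))
    ) / denom
    return center - half_width, center, center + half_width


def confident_label(lower, upper, threshold=0.5):
    """Map an interval to 1, 0, or None (no value) around the 50% mark."""
    if lower > threshold:
        return 1      # confident that most players say the label is present
    if upper < threshold:
        return 0      # confident that most players say it is absent
    return None       # interval straddles 50%, so no value is recorded


# Example: 9 of 10 players said the label is present.
lower, best_guess, upper = wilson_interval(9, 10)
print(best_guess, confident_label(lower, upper))
```

Under these assumptions, a text with an even split of votes (say 5 of 10) has a best guess of exactly 50% and no 95% confidence value, because its interval straddles the threshold.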

95% Confidence

Best Guess