The LearnLab DataShop is a data repository and web application for learning science researchers. It provides secure data storage as well as an array of analysis and visualization tools available through a web-based interface. DataShop was funded by National Science Foundation grants (SBE-0836012, SBE-0354420) to LearnLab.



DataStage is provided by the Vice Provost Office for Online Learning (VPOL) at Stanford, which facilitates the teaching of online classes. The instruction delivery platforms are instrumented to collect a variety of data about participants' interactions with the study material. Examples include participants manipulating video players as they view portions of a class, solution submissions to problem sets, uses of the online forum available for some classes, peer grading activities, and some demographic data. VPOL makes some of this data available through DataStage for research on learning processes and for explorations into improving instruction.


MITx course data, in a variety of forms, is made available for research purposes by the Institutional Research section of the Office of the Provost at MIT. The release of data from MITx courses is subject to compliance with student privacy regulations. Researchers may request access to Learner Data for research to improve teaching and curriculum or contribute to scholarship on teaching and learning.

The ASSISTments data repository contains datasets from secondary school students' interactions with an online tutoring system, in many cases collected as part of online experiments on what works best for learning. You can also submit studies at www.assistmentstestbed.org, where you will also find information on how to interpret your data.


The Databrary project aims to promote data sharing, archiving, and reuse among researchers who study the development of humans and other animals. The project focuses on creating tools for scientists to store, manage, preserve, analyze, and share video and other temporally dense streams of data. The project is based at New York University and at Penn State. The U.S. National Science Foundation (NSF BCS-1238599) and the U.S. National Institutes of Health (NIH U01-HD-076595) have provided the funding for this project.


TalkBank is an interdisciplinary research project to promote the study of human and animal communication. The subfields of study include first language acquisition, second language acquisition, conversation analysis, classroom discourse and aphasic language. TalkBank has been funded by grants from the National Science Foundation (including BCS-998009, 0324883) as well as the National Institutes of Health.


The Child Language Data Exchange System (CHILDES) is the part of TalkBank focused on child language, or first language acquisition. CHILDES provides tools for studying conversational interactions, including a transcripts database, programs for analyzing transcripts, methods for linguistic coding and systems for linking audio and video. CHILDES is supported by grants from the National Institutes of Health (R01-HD23998, R01-HD051698).

The MITx and HarvardX Dataverse contains deidentified student-level data from the first year of HarvardX and MITx courses.

The Computer Science Education workshop was held in Pittsburgh on June 5-6, 2017. This document outlines the discussion around Data, Analytics and Tool sharing.



The MOOCdb project aims to bring together education researchers, computer science researchers, machine learning researchers, technologists, and database and big data experts to advance MOOC data science. The project, founded at MIT, includes a platform-agnostic functional data model for data exhaust from MOOCs, a collaborative, open-source, open-access data visualization framework, a crowd-sourced knowledge discovery framework, and a privacy-preserving software framework. The team is currently working to release a number of these tools and frameworks as open source.


DiscourseDB is a data infrastructure project in the space of collaborative and discussion-based learning. It aims to provide a common data model that accommodates diverse sources, including but not limited to chat, threaded discussions, blogs, Twitter, wikis, and text messaging. In the future, the project will make available analytics to support research on questions such as the mediating and moderating effects of role taking, help exchange, and collaborative knowledge construction.

Free tools submitted by developers in the educational data mining and intelligent tutoring systems communities.


The Simon DataLab is an emerging intellectual data commons to drive continuous improvement in student learning outcomes with a particular focus on supporting instructors and course developers in using data to improve their courses.

The Educational Data Mining Workbench will support learning scientists in performing a number of analytic tasks, including 1) defining and modifying behavior categories of interest (e.g., gaming, unresponsiveness, off-task conversation, help avoidance), 2) labeling previously collected educational log data with the categories of interest, 3) validating inter-rater reliability between multiple labelers of the same educational log data corpus, and 4) running the labeled data through a machine-learning tool, such as WEKA or RapidMiner.

