ABOUT ERuDIte

ERuDIte is the educational resource discovery index that powers the BD2K Training Coordinating Center (TCC) Web Portal. ERuDIte serves not only as a resource collector and aggregator but also as a system, powered by Machine Learning, Information Retrieval, and Natural Language Processing, that intelligently organizes resources to provide a dynamic and personalized curriculum for biomedical researchers interested in learning about Data Science.

Biomedical researchers are the intended audience of ERuDIte and the TCC Web Portal; throughout this document, they are referred to as users or learners.

As a research initiative itself, ERuDIte aims to:

  1. Identify, store, and synthesize large volumes of relevant educational resources in a scalable fashion
  2. Maintain a schema that aligns with other resource collection initiatives to promote data sharing
  3. Serve high-quality, up-to-date educational content to the biomedical community (and the research community at large) that not only teaches Data Science concepts but also supports the practical application of those concepts to specific analysis tasks
  4. Aid learners in navigating the vast number of resources pertaining to Data Science through semi-automatic tagging and prerequisite identification
  5. Provide an individualized learning path through recommendations tailored to learners’ interests, experience, and progress over time
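Aims 4 and 5 connect: once prerequisite relations between resources are identified, a learning path can be ordered so that every resource appears after its prerequisites. The minimal sketch below shows one way to do this with a topological sort from Python's standard `graphlib` module (Python 3.9+). The resource IDs, the `prerequisites` mapping, and the `learning_path` helper are hypothetical; this document does not describe how ERuDIte actually builds learning paths.

```python
from collections.abc import Iterable
from graphlib import TopologicalSorter


def learning_path(
    prerequisites: dict[str, set[str]],
    goal: str,
    completed: Iterable[str] = (),
) -> list[str]:
    """Return the resources a learner still needs for `goal`, prerequisites first.

    `prerequisites` maps a resource ID to the IDs it depends on (a hypothetical
    structure, not ERuDIte's schema). Resources in `completed` are skipped, and
    so are their own prerequisites. Raises graphlib.CycleError on a cycle.
    """
    done = set(completed)

    # Walk backwards from the goal, collecting resources not yet completed.
    needed: set[str] = set()
    stack = [goal]
    while stack:
        node = stack.pop()
        if node in needed or node in done:
            continue
        needed.add(node)
        stack.extend(prerequisites.get(node, ()))

    # Build the subgraph (node -> predecessors) and order it topologically.
    graph = {node: set(prerequisites.get(node, ())) - done for node in needed}
    return list(TopologicalSorter(graph).static_order())


if __name__ == "__main__":
    prereqs = {
        "intro-ml": {"linear-algebra", "python-basics"},
        "deep-learning": {"intro-ml"},
    }
    print(learning_path(prereqs, "deep-learning"))
    print(learning_path(prereqs, "deep-learning", completed={"intro-ml"}))
```

Passing the learner's completed resources is what makes the path individualized: a learner who has finished `intro-ml` is sent straight to `deep-learning`. The relative order of independent prerequisites (here `linear-algebra` and `python-basics`) is not fixed by the sort.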

To accomplish these objectives, ERuDIte comprises multiple components that together form the pipeline from resources, through ERuDIte, to the TCC Web Portal. The components of this pipeline are described below:

  • Resource Identification Component: collects links to relevant resources and gathers any available data about them
  • Resource Integration Component: unifies data from heterogeneous resources and conforms it to a standard schema
  • Resource Database: stores resource data, making it available for the Resource Organization Engine, Curation Interface, TCC Web Portal, and Resource Personalization Engine
  • Resource Organization Engine: automatically assigns tags, identifies prerequisites, and evaluates resource depth; it uses curator data from the Curation Interface and user data from the TCC User Database to improve its algorithms
  • Curation Interface: a tool for curators to validate organization data and assess resource quality
  • TCC Web Portal: presents ERuDIte data and collects learner activity and progress
  • TCC User Database: stores user data, including (but not limited to) learner profile data and usage activity, and informs the Resource Personalization Engine and the Resource Organization Engine
  • Resource Personalization Engine: synthesizes user activity, resource tags and prerequisites, and resource similarity measurements to recommend resources to learners
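The flow through the first few components can be sketched in a few lines. In this minimal sketch, the integration step maps differently named fields from two hypothetical sources onto one schema, a naive keyword matcher stands in for the Resource Organization Engine's tagging (the real engine is described as using Machine Learning, which this does not attempt), and a plain dictionary plays the role of the Resource Database. The `Resource` fields, source names, field mappings, and tag rules are all assumptions for illustration, not ERuDIte's actual schema.

```python
from dataclasses import dataclass, field


@dataclass
class Resource:
    """Hypothetical standard schema; field names are illustrative only."""
    url: str
    title: str
    description: str = ""
    tags: set[str] = field(default_factory=set)


# Hypothetical per-source field mappings: each source names the same data differently.
FIELD_MAPS = {
    "source_a": {"url": "link", "title": "name", "description": "summary"},
    "source_b": {"url": "href", "title": "heading", "description": "abstract"},
}

# Hypothetical keyword rules standing in for automatic tag assignment.
TAG_KEYWORDS = {
    "machine-learning": ("machine learning", "classifier", "neural"),
    "statistics": ("regression", "hypothesis", "statistic"),
    "genomics": ("genome", "sequencing", "transcriptom"),
}


def integrate(source: str, record: dict) -> Resource:
    """Integration step: map a source-specific record onto the shared schema."""
    mapping = FIELD_MAPS[source]
    return Resource(
        url=record[mapping["url"]],
        title=record[mapping["title"]],
        description=record.get(mapping["description"], ""),
    )


def assign_tags(resource: Resource) -> Resource:
    """Organization step: tag a resource by case-insensitive substring matching."""
    text = f"{resource.title} {resource.description}".lower()
    resource.tags = {
        tag for tag, words in TAG_KEYWORDS.items()
        if any(word in text for word in words)
    }
    return resource


def run_pipeline(raw: list[tuple[str, dict]]) -> dict[str, Resource]:
    """Identified records -> integration -> tagging -> 'database' keyed by URL."""
    database: dict[str, Resource] = {}
    for source, record in raw:
        resource = assign_tags(integrate(source, record))
        database[resource.url] = resource  # a later record with the same URL overwrites an earlier one
    return database


if __name__ == "__main__":
    raw = [
        ("source_a", {"link": "https://example.org/ml", "name": "Intro to Machine Learning",
                      "summary": "Build a classifier in Python."}),
        ("source_b", {"href": "https://example.org/gwas", "heading": "Genome-wide association studies",
                      "abstract": "Regression models for sequencing data."}),
    ]
    for url, res in run_pipeline(raw).items():
        print(url, sorted(res.tags))
```

Keeping each source's quirks in a mapping table, rather than in code, is one way to add new sources without touching the downstream components, which only ever see the shared schema.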

ERuDIte provides the resource data that learners can discover on the TCC Web Portal. Over time, learners’ interactions with the TCC Web Portal will influence ERuDIte, forming a feedback cycle in which the two components improve each other and adapt to learners’ needs and demands.
The combination of Artificial Intelligence techniques with highly targeted curation for resource discovery, retrieval, personalization, and organization distinguishes ERuDIte from MOOC aggregators and other resource collection initiatives.
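One direction of that feedback cycle, learner activity shaping which resources are surfaced, can be illustrated with simple content-based recommendation. In this minimal sketch, the learner profile is the union of tags on completed resources, and each unseen resource is scored by Jaccard similarity to that profile. The `recommend` and `jaccard` helpers, the tag data, and the choice of Jaccard similarity are assumptions for illustration; the document does not specify which similarity measures the Resource Personalization Engine uses.

```python
def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity |a & b| / |a | b|; defined as 0.0 when both sets are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def recommend(
    resource_tags: dict[str, set[str]],
    completed: list[str],
    k: int = 3,
) -> list[str]:
    """Return up to k unseen resource IDs, most similar to the learner's history first."""
    # Learner profile: every tag seen on a completed resource.
    profile: set[str] = set()
    for rid in completed:
        profile |= resource_tags.get(rid, set())

    seen = set(completed)
    candidates = [rid for rid in resource_tags if rid not in seen]
    # Highest score first; ties broken by ID so the output is deterministic.
    candidates.sort(key=lambda rid: (-jaccard(profile, resource_tags[rid]), rid))
    return candidates[:k]


if __name__ == "__main__":
    tags = {
        "r1": {"python", "statistics"},
        "r2": {"python", "machine-learning"},
        "r3": {"statistics", "genomics"},
        "r4": {"cloud"},
    }
    print(recommend(tags, ["r1"]))        # ['r2', 'r3', 'r4']
    print(recommend(tags, ["r1", "r2"]))  # ['r3', 'r4']
```

Each completion logged by the portal changes the profile, so the next call reflects the learner's progress. A fuller system would also weigh prerequisites and curator quality assessments, which this sketch leaves out.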