A week-long intensive Bootcamp focused on methods, techniques, tools, services and resources for big healthcare and biomedical data analytics using the open-source statistical computing software R. Morning sessions (3 hrs) will be dedicated to methods, technologies and applications. Afternoon sessions (3 hrs) will be for group-based hands-on practice and teamwork. Commitment to attend the full week of instruction (morning sessions) and self-guided work (afternoon sessions) is required. Certificates of completion will be issued only to trainees with perfect attendance who complete all work.


College-level (undergraduate) mathematical modeling, statistical analysis, or programming courses, or permission of the instructor. An introductory graduate-level class in statistical methods, probability theory, computational analysis, or mathematical modeling. Some MOOCs may be taken as prerequisites, e.g., Coursera, EdX1, EdX2.


A collection of exemplary datasets for testing and practicing with different tools.

SOCR Data » Course Data » Case-Studies »


Place: SNB 1250
Time: Monday through Friday (May 08-12, 2017), 8:00 AM - 4:00 PM
University of Michigan affiliates can directly register for the course using their UMich credentials and the Enrollment link below. Non-affiliated learners and students outside the University of Michigan need to first obtain a UMich friend account (using an outside email) that can then be used to register for the course.

Enrollment » LiveStream » Archived Videos »

Course Description

This hands-on intensive graduate course (Bootcamp) will provide a general overview of the principles, concepts, techniques, tools and services for managing, harmonizing, aggregating, preprocessing, modeling, analyzing and interpreting large, multi-source, incomplete, incongruent, and heterogeneous data (Big Data). The focus will be to expose students to common challenges related to handling Big Data and present the enormous opportunities and power associated with our ability to interrogate such complex datasets, extract useful information, derive knowledge, and provide actionable forecasting. Biomedical, healthcare, and social datasets will provide context for addressing specific driving challenges. Students will learn about modern data analytic techniques and develop skills for importing and exporting, cleaning and fusing, modeling and visualizing, analyzing and synthesizing complex datasets. The collaborative design, implementation, sharing and community validation of high-throughput analytic workflows will be emphasized throughout the course.

Course Objectives

Trainees successfully completing the course will:
(1) Gain understanding of the computational foundations in Big Data Science
(2) Develop critical inferential thinking
(3) Gather a tool chest of R libraries for managing and interrogating raw and derived, observed, experimental, and simulated big healthcare datasets
(4) Possess practical skills for handling complex datasets.

Target Audience

This course will be appropriate for graduate MIDAS, T32 Neuroscience, and health science trainees who have a significant interest in learning data science and predictive analytic methods and who can commit a substantial amount of time and undivided attention to studying, practicing, and interacting with other trainees in the course.


Class notes and learning materials will be provided. This Bootcamp will cover topics such as managing data with R, various learning classifiers, model-based and model-free forecasting and predictive analytics, evaluation of classification performance, and ensemble methods. Learning Modules »


All students are expected to complete the DSPA Bootcamp Assignments by the corresponding due dates.

Course Management System

The Bootcamp Canvas CMS website provides additional course materials.