The Data Scientist’s Toolbox

This course is offered through Coursera. You can add it to your Accredible profile to organize your learning, find others learning the same thing, and showcase evidence of your learning on your CV with Accredible's export features.

Course Date: 04 August 2014 to 01 September 2014 (4 weeks)

Price: free

Course Summary

Get an overview of the data, questions, and tools that data analysts and data scientists work with. This is the first course in the Johns Hopkins Data Science Specialization.

Estimated Workload: 3-4 hours/week

Course Instructors

Jeff Leek

Jeff Leek is an Assistant Professor of Biostatistics at the Johns Hopkins Bloomberg School of Public Health and co-editor of the Simply Statistics Blog. He received his Ph.D. in Biostatistics from the University of Washington and is recognized for his contributions to genomic data analysis and statistical methods for personalized medicine. His data analyses have helped us understand the molecular mechanisms behind brain development, stem cell self-renewal, and the immune response to major blunt force trauma. His work has appeared in the top scientific and medical journals Nature, Proceedings of the National Academy of Sciences, Genome Biology, and PLoS Medicine. He created Data Analysis as a component of the year-long statistical methods core sequence for Biostatistics students at Johns Hopkins. The course has won a teaching excellence award, voted on by the students at Johns Hopkins, every year Dr. Leek has taught the course.

Roger Peng

Roger D. Peng is an Associate Professor of Biostatistics at the Johns Hopkins Bloomberg School of Public Health and a Co-Editor of the Simply Statistics blog. He received his Ph.D. in Statistics from the University of California, Los Angeles and is a prominent researcher in the areas of air pollution and health risk assessment and statistical methods for environmental data. He created the course Statistical Programming at Johns Hopkins as a way to introduce students to the computational tools for data analysis. Dr. Peng is also a national leader in the area of methods and standards for reproducible research and is the Reproducible Research editor for the journal Biostatistics. His research is highly interdisciplinary and his work has been published in major substantive and statistical journals, including the Journal of the American Medical Association and the Journal of the Royal Statistical Society. Dr. Peng is the author of more than a dozen software packages implementing statistical methods for environmental studies, methods for reproducible research, and data distribution tools. He has also given workshops, tutorials, and short courses in statistical computing and data analysis.

Brian Caffo

Brian Caffo, PhD, is a professor in the Department of Biostatistics at the Johns Hopkins University Bloomberg School of Public Health. He graduated from the Department of Statistics at the University of Florida in 2001. He works in the fields of computational statistics and neuroinformatics and co-created the SMART working group. He has received the Presidential Early Career Award for Scientists and Engineers (PECASE) as well as the Bloomberg School of Public Health Golden Apple and AMTRA teaching awards.

Course Description

In this course you will get an introduction to the main tools and ideas in the data scientist's toolbox. The course gives an overview of the data, questions, and tools that data analysts and data scientists work with. There are two components to this course. The first is a conceptual introduction to the ideas behind turning data into actionable knowledge. The second is a practical introduction to the tools that will be used in the program, such as version control, Markdown, Git, GitHub, R, and RStudio.


How do the courses in the Data Science Specialization depend on each other?

We have created a handy course dependency chart to help you see how the nine courses in the specialization depend on each other.

Will I get a Statement of Accomplishment after completing this class?

Yes. Students who successfully complete the class will receive a Statement of Accomplishment signed by the instructor.

What resources will I need for this class?

For this course, all you need is an Internet connection and access to GitHub.

How does this course fit into the Data Science Specialization?

This is the first course in the sequence. We recommend that you take this course before moving on to R Programming or any of the other courses in the specialization.


Upon completion of this course you will be able to identify and classify data science problems. You will also have created your GitHub account, created your first repository, and pushed your first Markdown file to your account.


This course consists of weekly video lectures, weekly quizzes, and a final peer-assessed project.
