Introductory Resources for Data Science
Getting started in data science can be a daunting task. The field is broad, with little guidance on where one topic starts and the next ends. Because of that, I've pulled together this list of resources that particularly helped me when I was starting out.
I've broken the links into the following four categories, each reflecting a different skill set required of data scientists:
- Data Analytics - Involves feature engineering and statistical tests.
- Machine Learning - Involves model development and prediction.
- Data Engineering - Involves database management and design.
- Product Analytics - Involves product design and user engagement.
Below are the texts I found most useful, and I hope they help others who are starting out.
For the Data Analyst
- Probability and Statistics for Engineering and the Sciences, by Jay L. Devore. This college textbook is my go-to resource for all things statistics.
- The Cartoon Guide to Statistics, by Larry Gonick. This is a fun and surprisingly useful introduction to statistics.
- Brian Steel's Introduction to Probability & Statistics has great course notes.
- Data Analysis Using Regression and Multilevel/Hierarchical Models, by Andrew Gelman and Jennifer Hill. With a leaning towards R and Bayesian statistics, this text is the bible of hierarchical modelling.
- Bayesian Data Analysis, by Andrew Gelman et al., is another of my favourite books, offering a more fundamental introduction to Bayesian statistics.
- Google's A/B Testing course on Udacity is a nice introductory series on analytics. It really helps to cement the core concepts and is especially good for those starting out.
- Khan Academy's Probability and Statistics course provides some nice videos for brushing up on the basics.
- A First Course in Design and Analysis of Experiments is another free book, covering experimental design and analysis.
- Python for Data Analysis, by Wes McKinney. Pandas has become a core tool in the last few years, and this book introduces its key concepts.
- Wes McKinney's videos on Pandas are also a nice way to learn the basics.
- Tom Augspurger's seven-part introduction to Pandas is, of course, superb!
For the Machine Learning Coder
- Python Machine Learning, by Sebastian Raschka, provides an excellent introduction to the core concepts, with detailed scikit-learn code examples.
- An Introduction to Statistical Learning: with Applications in R has to be one of the core texts on machine learning models.
- The Elements of Statistical Learning, by Trevor Hastie, Robert Tibshirani, and Jerome Friedman, is the other core text in ML and statistics.
- Probabilistic Programming and Bayesian Methods for Hackers is just amazing. I get lost in this text every time, and the code is all available to play with.
- Problem Solving with Algorithms and Data Structures is worth reading to better understand core computer science concepts. Code the problems yourself, then practise on LeetCode and HackerRank.
- MIT's Introduction to Algorithms (via Coursera) is a great video introduction to the fundamentals of data types, algorithms, and data structures.
- Composing Programs is a great online introduction to programming and computer science.
For the Data Engineer
- The SQL School by Mode Analytics is perfect for learning the basics.
- SQLZoo is great for practising queries.
- SQL Joins Visualizer is a nice way to see the differences between the join types.
- W3Schools is my go-to reference for looking up SQL keywords.
- Bill Howe's Introduction to Data Science on Coursera has a few nice videos discussing the differences between database designs. The course also introduces MapReduce and the history of different schemas. It's honestly a nice way to spend a weekend!
For the Product Analyst
- Lean Analytics, by Alistair Croll and Benjamin Yoskovitz, is the most concise introduction to web analytics I've come across.
- This IPython notebook discusses customer churn and walks through a coded example.
- Zero to One, by Peter Thiel, is a leisurely read that introduces the different factors to consider when building a product.
- Hacker News is how I catch up on tech news each evening! If you find yourself waiting around with nothing to do, open it up and explore the latest from the world of tech.
Finally, Hilary Parker and Brian Coffey (both from Stitch Fix) have their own excellent list of recommended data science books.