Cheatsheets for data science
Data science is ever-evolving, and with so much going on it can be difficult to keep track of all the new libraries and algorithms. This is when a reference guide can be really useful. In this post I share some cheatsheets for data science that any learner should find helpful.
Python data science cheatsheets
Python for data science: Python basics
This great cheatsheet from DataCamp is going to be extremely useful for anyone learning Python for data science. All the basic commands, from list manipulation to numpy arrays, are there.
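To give a flavour of the list manipulation such a cheatsheet covers, here is a minimal sketch using only built-in Python. The variable names and the height values are made up for illustration and are not taken from the cheatsheet itself.

```python
# Basic list manipulation of the kind a Python basics cheatsheet covers.
# The values below are made-up illustration data.
heights = [1.73, 1.68, 1.71, 1.89]

first = heights[0]        # indexing starts at 0
last_two = heights[-2:]   # slicing with a negative start keeps the last two items

heights.append(1.80)      # add one element to the end of the list
tallest = max(heights)    # largest value in the list

# A list comprehension: convert every height from metres to whole centimetres.
in_cm = [round(h * 100) for h in heights]

# sorted() returns a new list and leaves the original order untouched.
ordered = sorted(heights)

print(first, last_two, tallest)
print(in_cm)
print(ordered)
```

The same ideas (indexing, slicing, element-wise operations) carry over to numpy arrays, where the arithmetic can be applied to the whole array at once instead of through a comprehension.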
Keras
Keras is a great and easy-to-use deep learning library for Python. It is easier to get started with deep neural networks in Keras than with TensorFlow directly. This cheatsheet contains some quick recipes for creating the most basic neural network types.
Data visualisation in Python
A great data scientist should also be a great communicator, and quite often there is no better tool for that than a visualisation. This cheatsheet covers some of the basics of visualisation in Python using matplotlib and seaborn.
The scikit-learn flowchart
I don’t think that any data science cheatsheet article is complete without a reference to the famous scikit-learn flowchart. This amazing cheatsheet shows you how to choose the right machine learning model depending on your task (classification, regression, clustering or dimensionality reduction), whether your data is labelled, and how many samples you have.
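The decision logic of a flowchart like this can be sketched as a small function. The version below is a loose, simplified paraphrase of only the first branches. The function name, its parameters and the exact thresholds are my own reading of the chart, so treat them as assumptions and check the chart itself before relying on them.

```python
def suggest_estimator(task, n_samples, few_important_features=False):
    """Return a first estimator to try, loosely following the flowchart.

    Hypothetical helper for illustration only: it covers just the first
    suggestion on each branch and skips dimensionality reduction.
    """
    if n_samples < 50:
        return "get more data"

    if task == "classification":
        # Labelled data, predicting a category.
        return "LinearSVC" if n_samples < 100_000 else "SGDClassifier"

    if task == "regression":
        # Predicting a quantity.
        if n_samples >= 100_000:
            return "SGDRegressor"
        if few_important_features:
            return "Lasso or ElasticNet"
        return "Ridge or SVR(kernel='linear')"

    if task == "clustering":
        # Unlabelled data; assumes the number of clusters is known.
        return "KMeans" if n_samples < 10_000 else "MiniBatchKMeans"

    raise ValueError(f"unsupported task: {task!r}")


print(suggest_estimator("classification", 5_000))
print(suggest_estimator("regression", 2_000, few_important_features=True))
print(suggest_estimator("clustering", 50_000))
```

The strings are the names of scikit-learn estimators, returned as plain text so that the sketch runs without scikit-learn installed. The real chart goes on to suggest fallbacks when the first choice does not work, which this sketch leaves out.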
Text cleaning in Python
Every good data scientist should know how to do natural language processing. This cheatsheet presents some very good tips and tricks for cleaning up text.
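As an illustration of what text cleaning usually involves, here is a minimal sketch using only the standard library. The function name and the particular steps (lowercasing, stripping HTML tags, URLs and punctuation, collapsing whitespace) are my own assumptions about a typical pipeline, not the contents of the cheatsheet.

```python
import re
import string


def clean_text(text):
    """Apply a few common clean-up steps to a raw string (illustrative only)."""
    text = text.lower()
    text = re.sub(r"<[^>]+>", " ", text)       # drop HTML tags
    text = re.sub(r"https?://\S+", " ", text)  # drop URLs
    # Remove ASCII punctuation characters.
    text = text.translate(str.maketrans("", "", string.punctuation))
    # Collapse runs of whitespace into single spaces and trim the ends.
    return re.sub(r"\s+", " ", text).strip()


raw = "<p>Hello,   World! Visit https://example.com now.</p>"
print(clean_text(raw))
```

The order of the steps matters: tags and URLs are removed before punctuation, because stripping punctuation first would break the patterns that recognise them. Which steps you actually want depends on the task. For sentiment analysis, for example, punctuation can carry signal.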
R data science cheatsheets
The R reference card
This is the go-to cheatsheet for all basic R commands. It provides good coverage of the native R commands, from plotting, to installing packages, to manipulating vectors. It is good for beginners, but even some experienced R users might find it useful.
Data transformation with dplyr
Visualisation with ggplot2
ggplot2 is the best way to produce visually pleasing plots in R. While the traditional plotting capabilities of R are good, the plots they produce do not look that great and are not very flexible. ggplot2 improves on both counts, but it can be a bit daunting for the uninitiated. This cheatsheet provides a great overview of ggplot2 commands and syntax.
The caret package
The caret package provides an easy way to do machine learning in R. It provides a wrapper over many other machine learning R libraries and has utility functions for running cross-validation and cleaning up data. This cheatsheet is a good way to get started using caret.
R reference card for data mining
This very useful cheatsheet contains a high-level overview of functions and associated packages in R for data mining. From data manipulation to big data and parallel computing, it covers a variety of use cases.