Four steps to master machine learning with Python (including free books & resources)
To understand and apply machine learning techniques you have to learn Python or R. Both are programming languages, like C, Java or PHP. However, since Python and R are higher-level languages that sit “further away” from the CPU, they are easier to learn. The advantage of Python is that it can be adapted to many more problems than R, which is mainly used for handling data, analysing it with e.g. machine learning and statistical algorithms, and plotting it in nice graphs. Because Python is used more broadly (hosting websites with Django, natural language processing, accessing the APIs of websites such as Twitter, LinkedIn etc.) and more closely resembles classical programming languages like C, it is the more popular of the two.
The four steps of learning machine learning in Python
First you have to learn the basics of Python using books, courses and videos.
Then you have to master the different modules such as Pandas, NumPy and Matplotlib, as well as natural language processing (NLP) tools, in order to handle, clean, plot and understand data.
Afterwards you have to be able to scrape data from the web, which is done either by using the APIs of websites or with a web-scraping module such as Beautiful Soup. Web scraping allows you to collect the data which you feed into your machine learning algorithms.
In the last step you have to learn machine learning (ML) tools like scikit-learn or implement ML algorithms from scratch.
1. Getting started with Python:
An easy and fast way to learn Python is to register at codecademy.com and immediately start to code and learn the basics of Python. A classic is the website Learn Python the Hard Way, which is referenced by a lot of Python programmers. A good PDF is A Byte of Python. A list of Python resources for beginners is also provided by the Python community. A book from O’Reilly is Think Python, which can be downloaded for free. A last resource is Introduction to Python for Econometrics, Statistics and Data Analysis, which also covers the basics of Python.
2. Important Modules for machine learning
The most important modules for machine learning are NumPy, Pandas, Matplotlib and IPython. A book covering a couple of these modules is Data Analysis with Open Source Tools. The free book Introduction to Python for Econometrics, Statistics and Data Analysis from step 1 also covers NumPy, Pandas, Matplotlib and IPython. Another resource is Python for Data Analysis: Data Wrangling with Pandas, NumPy, and IPython, which also covers the most important modules. Here are other free NumPy (Numerical Python, NumPy User Guide, Guide to NumPy), Pandas (Pandas, Powerful Python Data Analysis Toolkit, Practical Business Python, Intros to Pandas Data Structure) and Matplotlib books.
3. Mining and scraping the data from websites and through APIs
Once you have understood the basics of Python and the most important modules, you have to learn how to collect data from different sources. This technique is also called web scraping. Classic sources are text from websites and textual data obtained through the APIs of websites such as Twitter or LinkedIn. Good books on web scraping are Mining the Social Web (free book!), Web Scraping with Python and Web Scraping with Python: Collecting Data from the Modern Web.
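To give an idea of what scraping involves, here is a minimal sketch. It uses only Python's standard library (html.parser) instead of Beautiful Soup, and it parses an inline HTML string rather than a live website, so it runs without extra installs or a network connection. The HTML snippet and the LinkCollector class are made up for illustration.

```python
from html.parser import HTMLParser

# Hypothetical page content; a real scraper would download this from a website.
SAMPLE_HTML = """
<html><body>
  <h1>Job postings</h1>
  <a href="/jobs/1">Data Scientist</a>
  <a href="/jobs/2">Machine Learning Engineer</a>
  <p>No link here</p>
</body></html>
"""


class LinkCollector(HTMLParser):
    """Collects (href, link text) pairs from all <a> tags that have an href."""

    def __init__(self):
        super().__init__()
        self.links = []             # finished (href, text) pairs
        self._current_href = None   # href of the <a> tag we are inside, if any
        self._text_parts = []       # text seen inside the current <a> tag

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._current_href = dict(attrs).get("href")
            self._text_parts = []

    def handle_data(self, data):
        # Only keep text that appears between <a href=...> and </a>.
        if self._current_href is not None:
            self._text_parts.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._current_href is not None:
            text = "".join(self._text_parts).strip()
            self.links.append((self._current_href, text))
            self._current_href = None


parser = LinkCollector()
parser.feed(SAMPLE_HTML)
for href, text in parser.links:
    print(href, "->", text)
```

With Beautiful Soup the same extraction takes fewer lines, which is why the books above rely on it for real projects.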
Lastly, this textual data has to be transformed into numerical data, which is done with natural language processing techniques covered by Natural Language Processing with Python and Natural Language Annotation for Machine Learning. Other data are images and videos, which can be analysed using computer vision techniques: Programming Computer Vision with Python, Programming Computer Vision with Python: Tools and algorithms for analyzing images and Practical Python and OpenCV are typical resources for analysing images.
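One simple way to turn text into numbers is a bag-of-words count, sketched below with the standard library only. This is just one of many NLP techniques and is not taken from the books above; the example sentences and the helper names tokenize and to_vector are made up for illustration.

```python
from collections import Counter
import re

# Made-up example documents, standing in for scraped reviews or tweets.
documents = [
    "I love this movie, I love the cast",
    "I hate this movie",
]


def tokenize(text):
    """Lowercase the text and split it into words."""
    return re.findall(r"[a-z']+", text.lower())


# Vocabulary: every distinct word across all documents, in sorted order.
vocabulary = sorted({word for doc in documents for word in tokenize(doc)})


def to_vector(text):
    """Count how often each vocabulary word occurs in the text."""
    counts = Counter(tokenize(text))
    return [counts[word] for word in vocabulary]


vectors = [to_vector(doc) for doc in documents]
print(vocabulary)
for vector in vectors:
    print(vector)
```

Each document becomes a list of word counts of the same length, which is the kind of numerical input a machine learning algorithm can work with.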
Educational and interesting examples of what you can already do with basic Python commands and web scraping techniques:
Mini-Tutorial: Saving Tweets to a Database with Python
Web Scraping Indeed for Key Data Science Job Skills
Case Study: Sentiment Analysis On Movie Reviews
Basic Sentiment Analysis with Python
Twitter sentiment analysis using Python and NLTK
Second Try: Sentiment Analysis in Python
Natural Language Processing in a Kaggle Competition for Movie Reviews
4. Machine learning with Python
Machine learning can be divided into four groups: classification, clustering, regression and dimensionality reduction.
Classification is a form of supervised learning. It helps you to classify an image in order to identify a symbol or face in it, or to classify a user based on their profile, for example to assign a credit score. Clustering is a form of unsupervised learning and allows you to identify groups/clusters within your data. Regression, which is also supervised, estimates a value from a parameter set and can be used to predict the best price for a house, apartment or car. Dimensionality reduction reduces the number of variables in a data set while keeping as much of its information as possible.
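To illustrate regression on the house price example, here is a minimal sketch of a least-squares line fit in plain Python. The areas and prices are invented numbers and the function names fit_line and predict are made up; a real project would use far more data and more than one parameter.

```python
# Made-up training data: living area in square metres, price in thousands.
areas = [50.0, 70.0, 90.0, 110.0]
prices = [150.0, 200.0, 250.0, 300.0]


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations (x with y) and sum of squared x deviations.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


slope, intercept = fit_line(areas, prices)


def predict(area):
    """Estimate the price for a given area using the fitted line."""
    return slope * area + intercept


print(predict(100.0))
```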
There is also a collection of all important modules, packages and techniques for learning machine learning in Python, C, Scala, Java, Julia, MATLAB, Go, R and Ruby. Books about machine learning in Python:
I especially recommend the book Machine Learning in Action. Although a bit short, Programming Collective Intelligence is probably a classic in machine learning due to its age. These two books let you build machine learning algorithms from scratch.
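To give a flavour of what building an algorithm from scratch means, here is a minimal sketch of a k-nearest-neighbours classifier in plain Python. It is not taken from either book; the data points, labels and the function name classify are made up for illustration, and math.dist assumes Python 3.8 or newer.

```python
from collections import Counter
import math

# Made-up labelled points: (feature vector, label).
training_data = [
    ((1.0, 1.0), "red"),
    ((1.5, 2.0), "red"),
    ((2.0, 1.0), "red"),
    ((6.0, 6.0), "blue"),
    ((7.0, 5.5), "blue"),
    ((6.5, 7.0), "blue"),
]


def classify(point, data, k=3):
    """Return the most common label among the k nearest training points."""
    # Sort the training points by Euclidean distance to the query point.
    by_distance = sorted(data, key=lambda item: math.dist(point, item[0]))
    nearest_labels = [label for _, label in by_distance[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


print(classify((1.2, 1.4), training_data))
print(classify((6.4, 6.1), training_data))
```

The whole "model" here is just the stored training data plus a distance function, which makes k-nearest neighbours a common first algorithm to write by hand.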
Most recent publications about machine learning are based on the Python module scikit-learn. It makes machine learning very easy since all the algorithms are already implemented. The only thing you do is to tell Python which ML technique should be used to analyse the data.
A free scikit-learn tutorial can be found on the official scikit-learn website. Other posts can be found here:
Introduction to Machine Learning with Python and Scikit-Learn
Machine Learning for Predicting Bad Loans
A Generic Architecture for Text Classification with Machine Learning
Using Python and AI to predict types of wine
Advice for applying Machine Learning
Predicting customer churn with scikit-learn
Case Study: Sentiment Analysis on Movie Reviews
Document Clustering with Python
Five most popular similarity measures implementation in python
Text Processing in Machine Learning
Hacking an epic NHL goal celebration with a hue light show and real-time machine learning
Exploring and Predicting University Faculty Salaries
Books about machine learning and the module scikit-learn in Python are:
Building Machine Learning Systems with Python
Building Machine Learning Systems with Python, 2nd Edition
Learning scikit-learn: Machine Learning in Python
Machine Learning: An Algorithmic Perspective
Data Science from Scratch – First Principles with Python
Books which will be published in the coming months are:
Introduction to Machine Learning with Python
Thoughtful Machine Learning with Python: A Test-Driven Approach
Courses and blogs about Machine learning
Do you want to earn a degree, take an online course or attend a real workshop, camp or university course? Here are some links: a collection of links to online education in analytics, Big Data, Data Mining, and Data Science. The Coursera course in machine learning and the Data Analyst Nanodegree from Udacity are other recommended online courses. There is also a list of frequently updated blogs about machine learning.
A great YouTube video is the class by Jake Vanderplas and Olivier Grisel, Exploring Machine Learning with Scikit-learn.
Theory of Machine Learning
Want to learn the theory of machine learning? The Elements of Statistical Learning and Introduction to Statistical Learning are often cited classics. Other books are Introduction to Machine Learning and A Course in Machine Learning. The links contain free PDFs, so you don’t have to pay for them! Don’t want to read? Watch 15 hours of machine learning theory!