The author gives a very gentle introduction to key issues in statistics. Even simple things like the difference between mean and median are explained.
But the book is also a crash course on R. Alongside my reading I could experiment with the data and the R environment.
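To illustrate the kind of basic point mentioned above, here is a minimal sketch of how mean and median differ when a data set contains an outlier. It is my own illustration, written in Python rather than the book's R, and the income values are made up.

```python
from statistics import mean, median

# Hypothetical monthly incomes; the last value is a deliberate outlier.
incomes = [2100, 2300, 2500, 2700, 2900, 50000]

income_mean = mean(incomes)      # pulled strongly upward by the single outlier
income_median = median(incomes)  # average of the two middle values, barely affected

print(f"mean:   {income_mean:.2f}")
print(f"median: {income_median:.2f}")
```

The median stays close to what most of the values look like, while the mean is dragged towards the outlier, which is why both are worth checking when exploring data.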
Learning machine learning with real data
Especially intriguing for me was that one could follow the data analysis hands-on with real data sets! (I didn’t know previously that there are real data sets freely available on the internet – for instance at the UCI machine learning repository.) And all this could be done without previous knowledge of R.
I have to confess that I didn’t completely understand some of the statistical details in the later chapters on my first reading. But I didn’t expect that my first dive into the domain of machine learning would turn me into a professional data scientist. I gained some understanding of the main concepts and now know where to go for further practice and to build up my skills for analysing big data.
Excellent teaching approach to machine learning
The book is also (almost) perfect from an educational point of view. After two introductory chapters (one about general features of machine learning and one about the first steps and general syntax of R) the next seven chapters follow the same outline:
- Providing a general understanding of the algorithms with their strengths and weaknesses: the most important formulas are explained and their effects demonstrated with some illustrative sample data. This provides you with a qualitative understanding of the method.
- The chapter continues with a practical demonstration in the following order:
  - Collecting data: where to get the data set, references, and an explanation of the structure of the data.
  - Exploring and preparing the data: every R command to load the data, to transform it etc. is explained and written down as code. The data and even these commands are provided in a .zip archive on GitHub.
  - Training the model on the data.
  - Evaluating the model performance: looking for and discussing the false positives and false negatives, including their effects in the real world.
  - Improving the performance of the model.
- And finally a summary with lessons learned from this chapter.
Like the first two chapters, the last three chapters are structured differently: they are dedicated to strategies for evaluating and improving model performance and to some other specialised topics in machine learning.
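The evaluation step in the outline above, counting false positives and false negatives, can be sketched in a few lines. This is my own minimal illustration in Python rather than the book's R code; the labels are made up and `confusion_counts` is a hypothetical helper, not a function from the book.

```python
from collections import Counter

# Hypothetical labels: 1 = positive class, 0 = negative class.
actual    = [1, 0, 1, 1, 0, 0, 1, 0]
predicted = [1, 0, 0, 1, 1, 0, 1, 0]

def confusion_counts(actual, predicted, positive=1):
    """Count true/false positives and negatives for a binary classifier."""
    counts = Counter(TP=0, FP=0, TN=0, FN=0)
    for a, p in zip(actual, predicted):
        if p == positive:
            # Predicted positive: correct only if the actual label is positive.
            counts["TP" if a == positive else "FP"] += 1
        else:
            # Predicted negative: a miss if the actual label was positive.
            counts["FN" if a == positive else "TN"] += 1
    return counts

counts = confusion_counts(actual, predicted)
accuracy = (counts["TP"] + counts["TN"]) / len(actual)

print(dict(counts))
print(f"accuracy: {accuracy:.2f}")
```

Looking at the four counts separately, instead of only at accuracy, is what makes it possible to discuss which kind of error matters more in the real world.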
Some suggestions for the third edition of Machine Learning with R
- Please provide a section with exercises and solutions for the next edition! This would be very important for the transfer from understanding to applicable skills.
- I would like to see one application in learning analytics with a real data set from the educational domain.
- And last but not least – there should be a final chapter “Where to go from here”.
But all in all: One of the best tutorial books I have read!