Saturday, January 27, 2018

Some good introductory machine learning resources in R

I didn't want to clog up a Twitter thread with a bunch of machine learning blogs/books/vignettes/software, but also thought an email to Scott wouldn't be useful to anyone else. So here are a few relatively accessible resources that someone with a bit of math should be able to get through with ease.

(Regularized) generalized linear models

This is an excellent worked vignette for regularized (generalized) linear models using the fantastic glmnet package in R:

What you'll find is that for prediction, regularized glms with some feature engineering (interactions, bucketing, splines, combinations of all three) will typically give you similar predictive performance to random forests while maintaining interpretability and the possibility of estimating uncertainty (see below). That's why they're so popular. 
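
To make that concrete, here is a minimal sketch of a lasso-regularized linear model with a hand-built interaction and a spline term. The data are simulated and the variable names (x1, x2, x3) are made up for illustration; only the glmnet calls themselves (cv.glmnet, coef, predict) come from the package.

```r
library(glmnet)

set.seed(1)
n <- 500
d <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
# Simulated outcome with an interaction and a non-linear term (illustrative only)
d$y <- 1 + 2 * d$x1 - d$x2 + 1.5 * d$x1 * d$x2 + sin(2 * d$x3) + rnorm(n)

# Feature engineering: x1, x2, their interaction, and a natural spline in x3.
# The [, -1] drops the intercept column, since glmnet adds its own.
X <- model.matrix(~ x1 * x2 + splines::ns(x3, df = 4), data = d)[, -1]

# Lasso (alpha = 1), with the penalty chosen by cross-validation
cv_fit <- cv.glmnet(X, d$y, alpha = 1)

# Coefficients at the CV-chosen penalty; exact zeros are features the lasso dropped
coef(cv_fit, s = "lambda.min")

# Predictions at the same penalty (in-sample here, just to show the call)
pred <- predict(cv_fit, newx = X, s = "lambda.min")
```

Setting alpha = 0 would give ridge instead, and values in between give the elastic net. The coefficients stay readable in the same way as an ordinary glm, which is the interpretability argument above.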

When you have high-dimensional categorical predictors or natural groupings, it often doesn't make sense to one-hot encode them (i.e., take fixed effects) in a regularized glm. Doing so applies the same degree of regularization across grouping variables, which might be undesirable. In such cases you can often see huge improvements by simply using varying intercepts (and even varying slopes) in a Bayesian random effects model. The nice thing here is that because it's Bayesian, you get uncertainty for free. Well, not free--you pay for it in the extra coal and time you'll burn fitting your model. But these models are really pretty great. rstanarm implements them very nicely.

In the above two methods, if you want to capture non-linearities, you have to cook up your own non-linear features. But there are methods that discover them quite well on their own, while retaining the interpretability of linear models. The fantastic mgcv and rstanarm packages will fit generalized additive models (GAMs) using penalized likelihood and MCMC-based techniques respectively. is a fun introduction


is a full vignette on implementation using various GAM packages. 
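
As a rough sketch of what fitting a GAM looks like in mgcv: the data below are simulated, the variable names are invented for illustration, and REML is just one reasonable choice of smoothing-parameter method.

```r
library(mgcv)

set.seed(2)
n <- 400
d <- data.frame(x1 = runif(n), x2 = runif(n))
# Simulated outcome: a sine wave in x1 plus a linear effect of x2
d$y <- sin(2 * pi * d$x1) + 2 * d$x2 + rnorm(n, sd = 0.3)

# s(x1) asks for a penalized smooth of x1; x2 enters linearly as in a glm
gam_fit <- gam(y ~ s(x1) + x2, data = d, method = "REML")

# Effective degrees of freedom of the smooth: near 1 means close to linear,
# larger values mean the fit found curvature
edf_x1 <- summary(gam_fit)$s.table["s(x1)", "edf"]

# Predictions at two new points
pred <- predict(gam_fit, newdata = data.frame(x1 = c(0.25, 0.75), x2 = 0.5))
```

Calling plot(gam_fit) draws the estimated smooth with a confidence band, which is usually the quickest way to see what non-linearity the model picked up without any hand-built features.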

Tree-based methods

The obvious alternatives to regularized glms are tree-based methods and neural networks. A lot of industry folks, especially those who started life using proprietary packages, use SVMs too. Pedants who enjoy O(n^3) operations seem to get a weird kick out of Gaussian Processes. The point of all these methods is the same: to relax linearity assumptions (or really, to automatically discover non-linear relationships between features and outcomes). Tree-based methods and neural networks also do well at discovering interactions. Neural networks go a step further and uncover representations of your data which might be useful in themselves.

To get a good understanding of tree-based methods, it makes sense to start at the beginning--with a simple classification and regression tree. I found this introduction pretty clear:
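
If you'd like to see a tree before reading about one, a minimal sketch with the rpart package (which ships with standard R installs) and the built-in iris data looks like this. The choice of dataset is mine, purely for illustration.

```r
library(rpart)

# iris ships with R; Species is a three-class outcome
tree_fit <- rpart(Species ~ ., data = iris, method = "class")

# The fitted splits, readable top-down as nested if/else rules
print(tree_fit)

# Predicted classes and in-sample accuracy (optimistic, since it reuses the training data)
pred <- predict(tree_fit, newdata = iris, type = "class")
accuracy <- mean(pred == iris$Species)
```

A single tree like this is easy to read but tends to be unstable, which is the motivation for averaging many of them.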

Once you understand CART, then Random Forests are probably the next step. The original Breiman piece is as good a place to start as any:

Next you should learn about tree-based additive models. These come in many varieties, but something close to the current state-of-the-art is implemented using xgboost. These techniques combined with smart feature engineering will work extremely well for a wide range of predictive problems. I incorporate them into my work to serve as a baseline that simpler models (for which we can get more sensible notions of uncertainty) should be able to get close to with enough work.
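
Here is a hedged sketch of that baseline idea: fit boosted trees with xgboost, then compare against a plain linear model on held-out data. The data are simulated, and the tuning values (eta, max_depth, nrounds) are arbitrary illustrative choices, not recommendations.

```r
library(xgboost)

set.seed(3)
n <- 1000
X <- matrix(rnorm(n * 3), ncol = 3, dimnames = list(NULL, c("x1", "x2", "x3")))
# Simulated outcome with an interaction and a non-linearity, neither given as a feature
y <- X[, 1] * X[, 2] + sin(2 * X[, 3]) + rnorm(n, sd = 0.3)

train <- 1:800
test <- 801:n
dtrain <- xgb.DMatrix(data = X[train, ], label = y[train])
dtest <- xgb.DMatrix(data = X[test, ], label = y[test])

# Gradient-boosted regression trees with squared-error loss
bst <- xgb.train(
  params = list(objective = "reg:squarederror", eta = 0.1, max_depth = 3),
  data = dtrain,
  nrounds = 200,
  verbose = 0
)
rmse_xgb <- sqrt(mean((predict(bst, dtest) - y[test])^2))

# A linear model on the raw features, with no feature engineering
lm_fit <- lm(y ~ ., data = data.frame(y = y[train], X[train, ]))
rmse_lm <- sqrt(mean((predict(lm_fit, data.frame(X[test, ])) - y[test])^2))
```

The gap between rmse_lm and rmse_xgb is a rough measure of how much structure the simpler model is still missing, and so how much feature engineering is left to do.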

Net-based methods

Neural networks are of course all the rage, yet it's helpful to remember that they're really just tools for high-dimensional function approximation. I found them hard to get into coming from an econometrics background (where notions like "maybe we should have more observations than unknowns in the model" are fairly common). But there are really just a few concepts to understand in order to get something working.

I found David MacKay's chapters on them to be extremely easy to grasp. His whole, brilliant book is available for free here, with the relevant chapters starting at page 467:

Given you have some understanding now of what a neural network is and how they're fit, you can get down to fitting some. There are a few great high-level approaches, like Keras and, which are extremely easy to dive in with:


Note that these two approaches are great for fairly simple prediction tasks. If you want to make any real investment in deep learning for image/voice/NLP then you will find yourself working at a lower level (the analogy for statisticians would be going from rstanarm/brms to Stan proper), like Torch or TensorFlow. At this point you would probably be wise to ask yourself what you're doing in R--almost the entire AI community uses Python.

Even so, there is a reasonable API for TensorFlow available within R. I've not done a huge amount of playing outside of the tutorials, which seem well written.


If you know of any other great resources for someone--especially an economist--wanting to build their machine-learning chops, please drop them in the comments! 


  1. Alex S writes:

    That's supervised learning, not ML. So you need things like

    and maybe

    For smoother transition

    may help. And then I'd strongly recommend
