I didn't want to clog up a Twitter thread with a bunch of machine learning blogs/books/vignettes/software, but also thought an email to Scott wouldn't be useful to anyone else. So here are a few relatively accessible resources that someone with a bit of math should be able to get through with ease.

**(Regularized) generalized linear models**

This is an excellent worked vignette for regularized (generalized) linear models using the fantastic glmnet package in R:

What you'll find is that for prediction, regularized glms with some feature engineering (interactions, bucketing, splines, combinations of all three) will typically give you similar predictive performance to random forests while maintaining interpretability and the possibility of estimating uncertainty (see below). That's why they're so popular.
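To make the idea of regularization concrete, here is a minimal sketch in plain Python of the lasso fitted by cyclical coordinate descent, the same family of algorithm that glmnet uses. It is not glmnet itself. The toy data, the function names and the penalty value are all made up for illustration, and the sketch assumes centred predictors and no intercept.

```python
def soft_threshold(z, gamma):
    """Shrink z toward zero by gamma; anything inside [-gamma, gamma] becomes 0."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_iter=200):
    """Minimise (1/2n) * sum((y - X b)^2) + lam * sum(|b|), with no intercept.

    X is a list of rows; columns are assumed to be centred (an assumption of
    this sketch, not something the code checks).
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            rho = 0.0   # correlation of feature j with the partial residual
            norm = 0.0  # sum of squares of feature j
            for i in range(n):
                pred_without_j = sum(X[i][k] * beta[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - pred_without_j)
                norm += X[i][j] ** 2
            beta[j] = soft_threshold(rho / n, lam) / (norm / n)
    return beta


# Hypothetical toy data: y depends on the first column only, plus a little noise.
X = [[-2.0, 1.0], [-1.0, -1.0], [0.0, 0.0], [1.0, -1.0], [2.0, 1.0]]
y = [-6.1, -2.8, 0.1, 2.9, 5.9]

unpenalised = lasso_coordinate_descent(X, y, lam=0.0)
penalised = lasso_coordinate_descent(X, y, lam=0.5)
print("no penalty:  ", unpenalised)
print("with penalty:", penalised)
```

With the penalty switched on, the coefficient on the irrelevant second column is set exactly to zero and the coefficient on the first is shrunk a little toward zero. That is the trade the lasso makes: a bit of bias in exchange for a simpler, more stable model.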

When you have high-dimensional categorical predictors or natural groupings, it often doesn't make sense to one-hot encode them (i.e. take fixed effects) in a regularized glm. Doing so will result in the same degree of regularization across grouping variables, which might be undesirable. In such a case you can often see *huge* improvements by simply using varying intercepts (and even varying slopes) in a Bayesian random effects model. The nice thing here is that because it's Bayesian, you get uncertainty for free. Well, not free--you pay for it in the extra coal and time you'll burn fitting your model. But they're really pretty great. rstanarm implements these very nicely.
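The reason varying intercepts help is that each group gets shrunk toward the overall mean by an amount that depends on how much data it has. Here is a rough sketch of that partial pooling in plain Python, using the textbook normal-normal shrinkage formula. It assumes the within-group variance `sigma2` and the between-group variance `tau2` are known, and the values below are hypothetical; a real random effects model would estimate them from the data and give you full posterior uncertainty as well. The group names and observations are made up.

```python
def partial_pool(groups, sigma2, tau2):
    """Shrink each group's mean toward the grand mean.

    groups: dict mapping group name -> list of observations
    sigma2: assumed within-group variance (hypothetical, taken as known)
    tau2:   assumed between-group variance (hypothetical, taken as known)
    """
    all_obs = [v for obs in groups.values() for v in obs]
    grand_mean = sum(all_obs) / len(all_obs)
    pooled = {}
    for name, obs in groups.items():
        n = len(obs)
        group_mean = sum(obs) / n
        # Weight on the group's own mean: close to 1 for big groups, small for tiny ones.
        weight = tau2 / (tau2 + sigma2 / n)
        pooled[name] = weight * group_mean + (1.0 - weight) * grand_mean
    return grand_mean, pooled


# Made-up data: "large" and "small" have the same raw mean (12) but different sizes.
groups = {
    "large": [11.0, 13.0, 12.0, 12.0, 11.0, 13.0, 12.0, 12.0],
    "small": [11.0, 13.0],
    "rest": [4.0, 6.0, 5.0, 5.0, 4.0, 6.0, 5.0, 5.0, 5.0, 5.0],
}

grand_mean, pooled = partial_pool(groups, sigma2=4.0, tau2=1.0)
for name, estimate in pooled.items():
    print(name, round(estimate, 3))
```

The two groups with the same raw mean end up with different estimates: the one with only two observations is pulled much harder toward the grand mean. A single penalty on one-hot dummies cannot adapt like this.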

In the above two methods, if you want to discover non-linearities by yourself, you have to cook your own non-linear features. But there are methods that do this quite well, while retaining the interpretability of linear models. The fantastic mgcv and rstanarm packages will fit Generalised Additive Models using maximum likelihood and MCMC-based techniques respectively.

https://github.com/noamross/2017-11-14-noamross-gams-nyhackr/blob/master/2017-11-14-noamross-gams-nyhackr.pdf is a fun introduction

and

https://m-clark.github.io/docs/GAM.html

is a full vignette on implementation using various GAM packages.

**Tree-based methods**

The obvious alternatives to regularized glms are tree-based methods and neural networks. A lot of industry folks, especially those who started life using proprietary packages, use SVMs too. Pedants who enjoy O(n^3) operations seem to get a weird kick out of Gaussian Processes. The point of all these methods is the same: to relax the linearity assumption (or really, to automatically discover non-linear relationships between features and outcomes). Tree-based methods and neural networks will also do well at discovering interactions. Neural networks go a step further and uncover representations of your data which might be useful in themselves.

To get a good understanding of tree-based methods, it makes sense to start at the beginning--with a simple classification and regression tree. I found this introduction pretty clear:

https://cran.r-project.org/web/packages/rpart/vignettes/longintro.pdf

Once you understand CART, then Random Forests are probably the next step. The original Breiman piece is as good a place to start as any:

https://www.stat.berkeley.edu/~breiman/randomforest2001.pdf
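The core step of a regression tree is small enough to sketch in a few lines of plain Python: try every split point on a feature and keep the one that leaves the least squared error on the two sides. This is only an illustration of that single step, with made-up data and function names. A real CART implementation recurses on each side, handles many features and prunes, and a random forest averages many such trees grown on bootstrap samples.

```python
def sse(values):
    """Sum of squared errors around the mean (0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(x, y):
    """Return (threshold, error) for the single split on x that minimises squared error."""
    pairs = sorted(zip(x, y))
    best_threshold, best_error = None, sse(y)  # start from "no split at all"
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # cannot split between two identical x values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [target for _, target in pairs[:i]]
        right = [target for _, target in pairs[i:]]
        error = sse(left) + sse(right)
        if error < best_error:
            best_threshold, best_error = threshold, error
    return best_threshold, best_error


# Made-up data with an obvious jump between x = 3 and x = 4.
x = [1, 2, 3, 4, 5, 6]
y = [1.0, 1.2, 0.8, 5.0, 5.2, 4.8]

threshold, error = best_split(x, y)
print("split at", threshold, "with squared error", round(error, 3))
```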

Next you should learn about tree-based additive models. These come in many varieties, but something close to the current state-of-the-art is implemented using xgboost. These techniques combined with smart feature engineering will work extremely well for a wide range of predictive problems. I incorporate them into my work to serve as a baseline that simpler models (for which we can get more sensible notions of uncertainty) should be able to get close to with enough work.

https://xgboost.readthedocs.io/en/latest/model.html
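Here is a stripped-down sketch in plain Python of what "tree-based additive model" means in practice: start from the mean, then repeatedly fit a tiny tree (a one-split stump) to the current residuals and add a damped version of it to the model. This is plain gradient boosting for squared error on one feature, not xgboost's actual algorithm, which adds second-order information, regularization and much more. The data, function names, number of rounds and learning rate are all made up for illustration.

```python
def fit_stump(x, residuals):
    """Fit a one-split regression tree to the residuals; return (threshold, left_mean, right_mean)."""
    best = None
    # Skip the largest x value: splitting there would leave the right side empty.
    for threshold in sorted(set(x))[:-1]:
        left = [r for xi, r in zip(x, residuals) if xi <= threshold]
        right = [r for xi, r in zip(x, residuals) if xi > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = sum((r - left_mean) ** 2 for r in left) + sum(
            (r - right_mean) ** 2 for r in right
        )
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    return best[1:]


def boost(x, y, n_rounds=50, learning_rate=0.3):
    """Return (base, learning_rate, stumps): an additive model of damped stumps."""
    base = sum(y) / len(y)
    prediction = [base] * len(y)
    stumps = []
    for _ in range(n_rounds):
        # For squared error, the negative gradient is just the residual.
        residuals = [yi - pi for yi, pi in zip(y, prediction)]
        threshold, left_mean, right_mean = fit_stump(x, residuals)
        stumps.append((threshold, left_mean, right_mean))
        prediction = [
            pi + learning_rate * (left_mean if xi <= threshold else right_mean)
            for xi, pi in zip(x, prediction)
        ]
    return base, learning_rate, stumps


def predict_one(model, xi):
    base, learning_rate, stumps = model
    return base + learning_rate * sum(
        left if xi <= threshold else right for threshold, left, right in stumps
    )


def mse(actual, predicted):
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


# Made-up non-linear data: y = x^2.
x = list(range(8))
y = [float(v * v) for v in x]

model = boost(x, y)
baseline_mse = mse(y, [sum(y) / len(y)] * len(y))
boosted_mse = mse(y, [predict_one(model, xi) for xi in x])
print("baseline MSE:", baseline_mse)
print("boosted MSE: ", boosted_mse)
```

No single stump can fit a curve, but the sum of many small ones can, which is why these models pick up non-linearities without hand-made features.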

**Net-based methods**

Neural networks are of course all the rage, yet it's helpful to remember that they're really just tools for high-dimensional function approximation. I found them hard to get into coming from an econometrics background (where notions like "maybe we should have more observations than unknowns in the model" are fairly common). But there are really just a few concepts to understand in order to get something working.

I found David MacKay's chapters on them to be extremely easy to grasp. His whole, brilliant book is available for free here, with the relevant chapters starting at page 467:

http://www.inference.org.uk/itprnn/book.pdf
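To show how few moving parts there are, here is a toy network written out by hand in plain Python: one hidden layer of tanh units, a forward pass, and gradient descent with the derivatives worked out manually. It is a sketch for building intuition, not how you would fit a network in practice, and it is not taken from MacKay's book. The layer size, learning rate, step count, seed and data are all arbitrary choices for illustration.

```python
import math
import random


def forward(params, x):
    """Return (hidden activations, output) for one scalar input."""
    w1, b1, w2, b2 = params
    hidden = [math.tanh(w * x + b) for w, b in zip(w1, b1)]
    output = sum(w * h for w, h in zip(w2, hidden)) + b2
    return hidden, output


def mean_squared_error(params, xs, ys):
    return sum((forward(params, x)[1] - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def gradient_step(params, xs, ys, learning_rate):
    """One full-batch gradient descent step on mean squared error."""
    w1, b1, w2, b2 = params
    n_hidden, n = len(w1), len(xs)
    grad_w1 = [0.0] * n_hidden
    grad_b1 = [0.0] * n_hidden
    grad_w2 = [0.0] * n_hidden
    grad_b2 = 0.0
    for x, y in zip(xs, ys):
        hidden, output = forward(params, x)
        d_output = 2.0 * (output - y) / n  # derivative of the loss w.r.t. the output
        grad_b2 += d_output
        for j in range(n_hidden):
            grad_w2[j] += d_output * hidden[j]
            # Chain rule through tanh: tanh'(z) = 1 - tanh(z)^2
            d_pre = d_output * w2[j] * (1.0 - hidden[j] ** 2)
            grad_w1[j] += d_pre * x
            grad_b1[j] += d_pre
    return (
        [w - learning_rate * g for w, g in zip(w1, grad_w1)],
        [b - learning_rate * g for b, g in zip(b1, grad_b1)],
        [w - learning_rate * g for w, g in zip(w2, grad_w2)],
        b2 - learning_rate * grad_b2,
    )


# Made-up task: approximate y = x^2 on a handful of points.
xs = [-1.0, -0.5, 0.0, 0.5, 1.0]
ys = [x * x for x in xs]

rng = random.Random(0)  # fixed seed so the run is repeatable
n_hidden = 3
params = (
    [rng.uniform(-1, 1) for _ in range(n_hidden)],  # input-to-hidden weights
    [rng.uniform(-1, 1) for _ in range(n_hidden)],  # hidden biases
    [rng.uniform(-1, 1) for _ in range(n_hidden)],  # hidden-to-output weights
    0.0,                                            # output bias
)

initial_loss = mean_squared_error(params, xs, ys)
for _ in range(5000):
    params = gradient_step(params, xs, ys, learning_rate=0.05)
final_loss = mean_squared_error(params, xs, ys)

print("loss before training:", initial_loss)
print("loss after training: ", final_loss)
```

Everything the big frameworks do is a scaled-up, automated version of these three pieces: a parameterised function, a loss, and gradients used to nudge the parameters.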

Once you have some understanding of what a neural network is and how one is fit, you can get down to fitting some. There are a few great high-level approaches, like Keras and H2O.ai, which are extremely easy to dive in with:

https://keras.rstudio.com

and

http://h2o-release.s3.amazonaws.com/h2o/rel-lambert/5/docs-website/Ruser/Rinstall.html

Note that these two approaches are great for fairly simple prediction tasks. If you want to make any real investment in deep learning for image/voice/NLP then you will find yourself working at a lower level, with something like Torch or TensorFlow (the analogy for statisticians would be going from rstanarm/brms to Stan proper). At this point you would probably be wise to ask yourself what you're doing in R--almost the entire AI community uses Python.

Even so, there is a reasonable API for TensorFlow available within R. I've not done a huge amount of playing outside of the tutorials, which seem well written.

https://tensorflow.rstudio.com/tensorflow/

**Others?**

**If you know of any other great resources for someone--especially an economist--wanting to build their machine-learning chops, please drop them in the comments!**

Alex S writes:

That's supervised learning, not ML. So you need things like

https://sites.google.com/site/igorcarron2/matrixfactorizations

and maybe

https://arxiv.org/abs/1801.01586

For smoother transition

http://mlg.eng.cam.ac.uk/zoubin/papers/lds.pdf

may help. And then I'd strongly recommend

http://castlelab.princeton.edu/html/Papers/Powell-UnifiedFrameworkStochasticOptimization_July222017.pdf
