Saturday, January 27, 2018

Some good introductory machine learning resources in R

I didn't want to clog up a Twitter thread with a bunch of machine learning blogs/books/vignettes/software, but also thought an email to Scott wouldn't be useful to anyone else. So here are a few relatively accessible resources that someone with a bit of math should be able to get through with ease.


(Regularized) generalized linear models

This is an excellent worked vignette for regularized (generalized) linear models using the fantastic glmnet package in R: 

What you'll find is that for prediction, regularized glms with some feature engineering (interactions, bucketing, splines, combinations of all three) will typically give you similar predictive performance to random forests while maintaining interpretability and the possibility of estimating uncertainty (see below). That's why they're so popular. 
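
To make the "regularized glm plus feature engineering" idea concrete, here is a minimal sketch on simulated data. The data-generating process, the variable names (x1, x2) and the settings df = 8 and nfolds = 10 are assumptions chosen for illustration, not recommendations; treat it as a starting point rather than a recipe.

```r
# Sketch: lasso-regularized linear model with hand-built spline features.
# All data are simulated and all variable names are hypothetical.
library(glmnet)   # regularized GLMs
library(splines)  # ships with R; bs() builds a B-spline basis

set.seed(1)
n  <- 500
x1 <- runif(n, -2, 2)
x2 <- rnorm(n)
y  <- sin(2 * x1) + 0.5 * x1 * x2 + rnorm(n, sd = 0.3)

# Feature engineering: a spline basis for x1, interacted with x2.
# glmnet fits its own intercept, so drop the intercept column.
X <- model.matrix(~ bs(x1, df = 8) * x2)[, -1]

# alpha = 1 is the lasso penalty; cv.glmnet chooses lambda by cross-validation
cv_fit <- cv.glmnet(X, y, alpha = 1, nfolds = 10)

# Coefficients at the lambda with the lowest cross-validated error
beta_hat <- coef(cv_fit, s = "lambda.min")

# In-sample predictions at that same lambda
y_hat <- as.numeric(predict(cv_fit, newx = X, s = "lambda.min"))
rmse  <- sqrt(mean((y - y_hat)^2))
```

Because the fitted model is still linear in the engineered features, beta_hat can be read directly, which is where the interpretability comes from.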

When you have high-dimensional categorical predictors or natural groupings, it often doesn't make sense to one-hot encode them (i.e. take fixed effects) in a regularized glm. Doing so applies the same degree of regularization to every grouping variable, which might be undesirable. In such a case you can often see huge improvements by simply using varying intercepts (and even varying slopes) in a Bayesian random effects model. The nice thing here is that because it's Bayesian, you get uncertainty for free. Well, not free--you pay for it in the extra coal and time you'll burn fitting your model. But these models are really pretty great. rstanarm implements them very nicely.

In the above two methods, if you want to capture non-linearities, you have to cook your own non-linear features. But there are methods that discover them quite well, while retaining the interpretability of linear models. The fantastic mgcv and rstanarm packages will fit Generalized Additive Models using penalized maximum likelihood and MCMC-based techniques respectively.

https://github.com/noamross/2017-11-14-noamross-gams-nyhackr/blob/master/2017-11-14-noamross-gams-nyhackr.pdf is a fun introduction

and 

https://m-clark.github.io/docs/GAM.html

is a full vignette on implementation using various GAM packages. 
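
For a feel of what the mgcv side looks like, here is a small sketch on simulated data. The data frame d, its columns and the choice of method = "REML" are assumptions for illustration; the links above cover the options properly.

```r
# Sketch: a GAM with one smooth term and one linear term.
# Simulated data; column names are hypothetical.
library(mgcv)  # a recommended package, included with standard R installs

set.seed(2)
n <- 400
d <- data.frame(x = runif(n, 0, 1), z = rnorm(n))
d$y <- sin(2 * pi * d$x) + 0.5 * d$z + rnorm(n, sd = 0.3)

# s(x) requests a penalized smooth of x; z enters linearly.
# method = "REML" picks the amount of smoothing by restricted maximum likelihood.
fit <- gam(y ~ s(x) + z, data = d, method = "REML")

# Effective degrees of freedom of the smooth: close to 1 would mean
# "basically linear", larger values mean more wiggle was needed
edf_x  <- summary(fit)$edf
coef_z <- unname(coef(fit)["z"])
```

Calling plot(fit) draws the estimated smooth with an uncertainty band, which is usually the quickest way to see what non-linearity the model found.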

Tree-based methods

The obvious alternatives to regularized glms are tree-based methods and neural networks. A lot of industry folks, especially those who started life using proprietary packages, use SVMs too. Pedants who enjoy O(n^3) operations seem to get a weird kick out of Gaussian Processes. The point of all these methods is the same: to relax the linearity assumption (or really, to automatically discover non-linear relationships between features and outcomes). Tree-based methods and neural networks also do well at discovering interactions. Neural networks go a step further and uncover representations of your data which might be useful in themselves.

To get a good understanding of tree-based methods, it makes sense to start at the beginning--with a simple classification and regression tree. I found this introduction pretty clear: 

https://cran.r-project.org/web/packages/rpart/vignettes/longintro.pdf
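
If you want to poke at a tree before reading the whole vignette, a minimal sketch follows. It uses the kyphosis data that ships with rpart; pruning at the lowest cross-validated error is one common convention, assumed here rather than taken from the vignette.

```r
# Sketch: fit, inspect and prune a single classification tree.
library(rpart)  # a recommended package, included with standard R installs

set.seed(3)  # rpart's internal cross-validation is random

# kyphosis: presence of a spinal deformity after corrective surgery
fit <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis, method = "class")

print(fit)    # the binary splits, one node per line
printcp(fit)  # cross-validated error for each candidate subtree

# Prune back to the subtree with the lowest cross-validated error
best_cp <- fit$cptable[which.min(fit$cptable[, "xerror"]), "CP"]
pruned  <- prune(fit, cp = best_cp)

# In-sample accuracy of the pruned tree
pred     <- predict(pruned, newdata = kyphosis, type = "class")
accuracy <- mean(pred == kyphosis$Kyphosis)
```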

Once you understand CART, then Random Forests are probably the next step. The original Breiman piece is as good a place to start as any: 

https://www.stat.berkeley.edu/~breiman/randomforest2001.pdf

Next you should learn about tree-based additive models (gradient-boosted trees). These come in many varieties, but something close to the current state-of-the-art is implemented in xgboost. These techniques combined with smart feature engineering will work extremely well for a wide range of predictive problems. I incorporate them into my work to serve as a baseline that simpler models (for which we can get more sensible notions of uncertainty) should be able to get close to with enough work.

https://xgboost.readthedocs.io/en/latest/model.html
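
The "additive" part is easier to see in a toy version. The sketch below is not xgboost's algorithm (no regularized objective, no second-order terms, no clever split finding); it is plain gradient boosting for squared error, built from small rpart trees on simulated data. The names n_rounds and eta and all the settings are assumptions for illustration.

```r
# Sketch: boosting as a sum of small trees, each fit to the current residuals.
# Simulated data; this illustrates the idea only, it is not xgboost.
library(rpart)

set.seed(4)
n <- 500
d <- data.frame(x1 = runif(n, -2, 2), x2 = runif(n, -2, 2))
d$y <- sin(2 * d$x1) + d$x1 * d$x2 + rnorm(n, sd = 0.3)

n_rounds <- 200  # number of trees in the sum
eta      <- 0.1  # learning rate: how much of each tree gets added

pred  <- rep(mean(d$y), n)  # start from the constant model
trees <- vector("list", n_rounds)

for (m in seq_len(n_rounds)) {
  # For squared-error loss the negative gradient is just the residual
  d$resid <- d$y - pred
  # Depth-2 trees so that each tree can pick up a two-way interaction
  trees[[m]] <- rpart(resid ~ x1 + x2, data = d,
                      control = rpart.control(maxdepth = 2, cp = 0, xval = 0))
  pred <- pred + eta * predict(trees[[m]], newdata = d)
}

rmse_const   <- sqrt(mean((d$y - mean(d$y))^2))
rmse_boosted <- sqrt(mean((d$y - pred)^2))
```

Both RMSEs here are in-sample, so the comparison only shows that the sum of trees is fitting the training data; a real baseline would be judged on held-out data.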


Net-based methods

Neural networks are of course all the rage, yet it's helpful to remember that they're really just tools for high-dimensional function approximation. I found them hard to get into coming from an econometrics background (where notions like "maybe we should have more observations than unknowns in the model" are fairly common). But there are really just a few concepts to understand in order to get something working. 

I found David MacKay's chapters on them to be extremely easy to grasp. His whole, brilliant book is available for free here, with the relevant chapters starting at page 467: 

http://www.inference.org.uk/itprnn/book.pdf
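
To see the function-approximation point in a few lines, here is a sketch using the nnet package. nnet is my substitution for illustration (it ships with R and needs no Python backend); it is not one of the tools recommended in this post, and the simulated data and settings (size = 8, decay = 1e-4) are assumptions.

```r
# Sketch: a single-hidden-layer network approximating a sine curve.
# nnet is swapped in for convenience; data are simulated.
library(nnet)  # a recommended package, included with standard R installs

set.seed(5)
n <- 300
x <- matrix(runif(n, -3, 3), ncol = 1)
y <- sin(x[, 1]) + rnorm(n, sd = 0.1)

# 8 hidden units; linout = TRUE gives a linear output unit (regression);
# decay is an L2 penalty on the weights
fit <- nnet(x, y, size = 8, linout = TRUE, decay = 1e-4,
            maxit = 1000, trace = FALSE)

# Unknowns in the model: (1 input + bias) * 8 + (8 hidden + bias) * 1 = 25
n_weights <- length(fit$wts)

# Compare with a straight-line fit on the same data
y_hat   <- as.numeric(predict(fit, x))
mse_net <- mean((y - y_hat)^2)
mse_lin <- mean(residuals(lm(y ~ x[, 1]))^2)
```

Even this toy has 25 unknowns for a single input, which gives a sense of how quickly the parameter count outruns what an econometrician is used to.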

Once you have some understanding of what a neural network is and how one is fit, you can get down to fitting some. There are a few great high-level approaches, like Keras and H2O.ai, which are extremely easy to dive in with:

https://keras.rstudio.com

and

http://h2o-release.s3.amazonaws.com/h2o/rel-lambert/5/docs-website/Ruser/Rinstall.html

Note that these two approaches are great for fairly simple prediction tasks. If you want to make any real investment in deep learning for image/voice/NLP then you will find yourself working at a lower level (the analogy for statisticians would be going from rstanarm/brms to Stan proper), with something like Torch or TensorFlow. At this point you would probably be wise to ask yourself what you're doing in R--almost the entire AI community uses Python.

Even so, there is a reasonable API for TensorFlow available within R. I've not done a huge amount of playing outside of the tutorials, which seem well written.

https://tensorflow.rstudio.com/tensorflow/

Others? 

If you know of any other great resources for someone--especially an economist--wanting to build their machine-learning chops, please drop them in the comments! 

103 comments:

  1. Alex S writes:

    That's supervised learning, not ML. So you need things like

    https://sites.google.com/site/igorcarron2/matrixfactorizations

    and maybe

    https://arxiv.org/abs/1801.01586

    For smoother transition

    http://mlg.eng.cam.ac.uk/zoubin/papers/lds.pdf

    may help. And then I'd strongly recommend

    http://castlelab.princeton.edu/html/Papers/Powell-UnifiedFrameworkStochasticOptimization_July222017.pdf
