I didn't want to clog up a Twitter thread with a bunch of machine learning blogs/books/vignettes/software, but also thought an email to Scott wouldn't be useful to anyone else. So here are a few relatively-accessible resources that someone with a bit of math should be able to get through with ease.
(Regularized) generalized linear models
This is an excellent worked vignette for regularized (generalized) linear models using the fantastic glmnet package in R:
What you'll find is that for prediction, regularized glms with some feature engineering (interactions, bucketing, splines, combinations of all three) will typically give you similar predictive performance to random forests while maintaining interpretability and the possibility of estimating uncertainty (see below). That's why they're so popular.
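A minimal sketch of the idea, assuming the built-in mtcars data and an arbitrary set of engineered features (interactions plus a spline); neither the data nor the feature choices come from the vignette:

```r
library(glmnet)

set.seed(1)

# Hand-made features: all two-way interactions among three predictors,
# plus a natural cubic spline in horsepower (choices are illustrative only).
x <- model.matrix(mpg ~ (wt + disp + qsec)^2 + splines::ns(hp, df = 3),
                  data = mtcars)[, -1]  # drop the intercept column
y <- mtcars$mpg

# alpha = 1 is the lasso, alpha = 0 is ridge; values in between give the elastic net.
cv_fit <- cv.glmnet(x, y, family = "gaussian", alpha = 0.5, nfolds = 5)

# Coefficients at the cross-validated penalty; zeros are features the penalty dropped.
coef(cv_fit, s = "lambda.min")

# In-sample predictions at the more conservative "one standard error" penalty.
head(predict(cv_fit, newx = x, s = "lambda.1se"))
```

The surviving coefficients are still read like ordinary regression coefficients, which is the interpretability being traded against a random forest here.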
When you have high-dimensional categorical predictors or natural groupings, it often doesn't make sense to one-hot encode them (i.e. take fixed effects) in a regularized glm. Doing so will result in the same degree of regularization across grouping variables, which might be undesirable. In such a case you can often see huge improvements by simply using varying intercepts (and even varying slopes) in a Bayesian random effects model. The nice thing here is that because it's Bayesian, you get uncertainty for free. Well not free--you pay for it in the extra coal and time you'll burn fitting your model. But these models are really pretty great. rstanarm implements them very nicely.
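As a rough illustration, here is a varying-intercept, varying-slope model in rstanarm. The ChickWeight data (which ships with base R) and the small number of chains and iterations are assumptions made to keep the sketch quick, not recommendations:

```r
library(rstanarm)

# ChickWeight: repeated weight measurements on 50 chicks.
# (1 + Time | Chick) gives every chick its own intercept and growth slope,
# partially pooled toward the population average.
fit <- stan_glmer(
  weight ~ Time + Diet + (1 + Time | Chick),
  data    = ChickWeight,
  family  = gaussian(),
  chains  = 2, iter = 1000, seed = 123,
  refresh = 0  # suppress the sampler's progress output
)

# 90% posterior intervals for two population-level coefficients.
posterior_interval(fit, prob = 0.9, pars = c("(Intercept)", "Time"))

# Chick-specific deviations from the population intercept and slope.
head(ranef(fit)$Chick)
```

Dropping `Time` from inside the parentheses, i.e. `(1 | Chick)`, gives the simpler varying-intercepts-only version.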
In the above two methods, if you want to capture non-linearities, you have to cook your own non-linear features. But there are methods that discover them for you quite well, while retaining the interpretability of linear models. The fantastic mgcv and rstanarm packages will fit Generalised Additive Models using (penalized) maximum likelihood and MCMC-based techniques respectively.
https://github.com/noamross/2017-11-14-noamross-gams-nyhackr/blob/master/2017-11-14-noamross-gams-nyhackr.pdf is a fun introduction
and
https://m-clark.github.io/docs/GAM.html
is a full vignette on implementation using various GAM packages.
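For a feel of the syntax, a minimal mgcv sketch follows. The mtcars data, the choice of predictors and the basis size `k = 5` are all illustrative assumptions (rstanarm's `stan_gamm4()` would be the Bayesian counterpart):

```r
library(mgcv)

# Smooth terms s() let the data choose the shape of each relationship;
# k caps the basis dimension, which matters for a dataset this small.
fit <- gam(mpg ~ s(hp, k = 5) + s(wt, k = 5), data = mtcars, method = "REML")

# Effective degrees of freedom near 1 indicate an essentially linear term.
summary(fit)

# One partial-effect plot per smooth, which is where the interpretability comes from.
plot(fit, pages = 1, shade = TRUE)
```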
Tree-based methods
The obvious alternatives to regularized glms are tree-based methods and neural networks. A lot of industry folks, especially those who started life using proprietary packages, use SVMs too. Pedants who enjoy O(n^3) operations seem to get a weird kick out of Gaussian Processes. The point of all these methods is the same: to relax the linearity assumption (or really, to automatically discover non-linear relationships between features and outcomes). Tree-based methods and neural networks also do well at discovering interactions. Neural networks go a step further and uncover representations of your data which might be useful in themselves.
To get a good understanding of tree-based methods, it makes sense to start at the beginning--with a simple classification and regression tree. I found this introduction pretty clear:
https://cran.r-project.org/web/packages/rpart/vignettes/longintro.pdf
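A small sketch of a single classification tree with rpart, assuming the built-in iris data and an arbitrary complexity parameter:

```r
library(rpart)

# Grow a classification tree; cp is the complexity penalty used for pruning.
fit <- rpart(Species ~ ., data = iris, method = "class",
             control = rpart.control(cp = 0.01))

print(fit)    # the splits, read top to bottom
printcp(fit)  # cross-validated error for each candidate tree size

# Confusion matrix of in-sample predictions.
pred <- predict(fit, newdata = iris, type = "class")
table(predicted = pred, actual = iris$Species)
```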
Once you understand CART, then Random Forests are probably the next step. The original Breiman piece is as good a place to start as any:
https://www.stat.berkeley.edu/~breiman/randomforest2001.pdf
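The post does not name an implementation, so the following is only a sketch that assumes the randomForest package and, again, the iris data:

```r
library(randomForest)

set.seed(42)

# Each of the 500 trees is grown on a bootstrap sample, trying mtry randomly
# chosen predictors at each split.
fit <- randomForest(Species ~ ., data = iris, ntree = 500, mtry = 2,
                    importance = TRUE)

print(fit)       # includes the out-of-bag (OOB) error estimate
importance(fit)  # permutation- and impurity-based variable importance
```

The out-of-bag error is computed on the observations each tree did not see, so it serves as a built-in estimate of test error without a separate hold-out set.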
Next you should learn about tree-based additive models. These come in many varieties, but something close to the current state-of-the-art is implemented using xgboost. These techniques combined with smart feature engineering will work extremely well for a wide range of predictive problems. I incorporate them into my work to serve as a baseline that simpler models (for which we can get more sensible notions of uncertainty) should be able to get close to with enough work.
https://xgboost.readthedocs.io/en/latest/model.html
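One way the "baseline" workflow might look, as a sketch only: the mtcars data, the train/test split, the tuning values and the two-variable linear model standing in for the "simpler model" are all assumptions made for illustration.

```r
library(xgboost)

set.seed(7)
test_idx <- sample(nrow(mtcars), 8)  # hold out 8 of the 32 cars
x <- as.matrix(mtcars[, -1])         # every column except mpg
y <- mtcars$mpg

dtrain <- xgb.DMatrix(data = x[-test_idx, ], label = y[-test_idx])
dtest  <- xgb.DMatrix(data = x[test_idx, ],  label = y[test_idx])

# Shallow trees and a small learning rate; each round adds a tree fitted
# to the gradient of the loss at the current predictions.
bst <- xgb.train(
  params  = list(objective = "reg:squarederror", max_depth = 2, eta = 0.1),
  data    = dtrain,
  nrounds = 100
)

rmse <- function(pred, truth) sqrt(mean((pred - truth)^2))

# Boosted-tree baseline versus a plain linear model on the same split.
lm_fit <- lm(mpg ~ wt + hp, data = mtcars[-test_idx, ])
c(xgboost = rmse(predict(bst, dtest), y[test_idx]),
  lm      = rmse(predict(lm_fit, mtcars[test_idx, ]), y[test_idx]))
```

If the simpler model's held-out error is far above the boosted baseline, that is a hint that it is missing non-linearities or interactions worth engineering in.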
Net-based methods
Neural networks are of course all the rage, yet it's helpful to remember that they're really just tools for high-dimensional function approximation. I found them hard to get into coming from an econometrics background (where notions like "maybe we should have more observations than unknowns in the model" are fairly common). But there are really just a few concepts to understand in order to get something working.
I found David MacKay's chapters on them to be extremely easy to grasp. His whole, brilliant book is available for free here, with the relevant chapters starting at page 467:
http://www.inference.org.uk/itprnn/book.pdf
Given you now have some understanding of what neural networks are and how they're fit, you can get down to fitting some. There are a few great high-level approaches, like Keras and H2O.ai, which are extremely easy to dive in with:
https://keras.rstudio.com
and
http://h2o-release.s3.amazonaws.com/h2o/rel-lambert/5/docs-website/Ruser/Rinstall.html
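To give a sense of how little code the high-level route needs, here is a sketch of a tiny regression network. It assumes the `keras` R package described at the first link (with a working TensorFlow backend installed), the mtcars data, and arbitrary layer sizes and training settings; newer Keras interfaces for R may differ.

```r
library(keras)

set.seed(1)
x <- scale(as.matrix(mtcars[, -1]))  # standardise the 10 predictors
y <- mtcars$mpg

# One hidden layer: a small feed-forward network for regression.
model <- keras_model_sequential() %>%
  layer_dense(units = 16, activation = "relu", input_shape = ncol(x)) %>%
  layer_dense(units = 1)

model %>% compile(optimizer = "adam", loss = "mse")

# fit() updates the model in place and returns the training history.
history <- model %>% fit(x, y, epochs = 50, batch_size = 8, verbose = 0)

head(predict(model, x))
```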
Note that these two approaches are great for fairly simple prediction tasks. If you want to make any real investment in deep learning for image/voice/NLP then you will find yourself working at a lower level, with something like Torch or TensorFlow (the analogy for statisticians would be going from rstanarm/brms to Stan proper). At this point you would probably be wise to ask yourself what you're doing in R--almost the entire AI community uses Python.
Even so, there is a reasonable API for TensorFlow available within R. I've not done a huge amount of playing outside of the tutorials, which seem well written.
https://tensorflow.rstudio.com/tensorflow/
Others?
If you know of any other great resources for someone--especially an economist--wanting to build their machine-learning chops, please drop them in the comments!
Alex S writes:
That's supervised learning, not ML. So you need things like
https://sites.google.com/site/igorcarron2/matrixfactorizations
and maybe
https://arxiv.org/abs/1801.01586
For smoother transition
http://mlg.eng.cam.ac.uk/zoubin/papers/lds.pdf
may help. And then I'd strongly recommend
http://castlelab.princeton.edu/html/Papers/Powell-UnifiedFrameworkStochasticOptimization_July222017.pdf