One of the exciting parts of the new sparse data work in tidymodels is that {textrecipes} can now be used as a reproducible way to generate document-term matrices (DTMs), tf-idf, and more.
#rstats #tidymodels
This summer, join the tidymodels team as an intern and help expand the possibilities of feature selection!
Over the years, our eight summer interns have added incredible contributions, including packages like agua, applicable, bundle, butcher, shinymodels, spatialsample, and stacks. Now, it’s your turn to shape the future of #tidymodels #RStats tools!
Learn more and apply: https://www.tidyverse.org/blog/2025/01/tidymodels-2025-internship/
Combining two of my favorite things: #RStats and oysters. My latest blog post is a project to predict New York Harbor water quality using data from Billionoysterproject.org and #tidymodels
https://outsiderdata.netlify.app/posts/2024-12-05-predicting-water-quality-in-new-york-harbor/oyster
Introducing **tidyAML**: the new R package for automated machine learning!
Quickly generate multiple regression models with just a few lines of code, all while leveraging the powerful **tidymodels** ecosystem.
No Java setup needed! Perfect for beginners & pros alike.
Check it out! #rstats #AutoML #DataScience #tidymodels #parsnip
Introducing support for postprocessing in tidymodels!
Postprocessors refine the predictions output by machine learning models to improve predictive performance or better satisfy distributional limitations.
The tidymodels team has been working on a set of changes across many #tidymodels packages to introduce support for postprocessing. They would love to hear your thoughts on their progress so far!
Learn more in the blog post: https://www.tidyverse.org/blog/2024/10/postprocessing-preview/
The healthyverse meta package:
healthyR: Streamline hospital data workflows
healthyR.ts: Master time series analysis
healthyR.ai: Implement AI modeling seamlessly
healthyR.data: Access curated healthcare datasets
TidyDensity: Simplify probability distributions
tidyAML: Automate machine learning with tidymodels
RandomWalker: Explore random walk analysis
install.packages("healthyverse")
library(healthyverse)
some more #TimeSeries testing in #r #RStats using the #tidymodels #parsnip extension #modeltime
I think I may move tidyAML to something like the following. It takes away from the current dynamic creation, but maybe it helps isolate issues, idk.
recipes 1.1.0 is on CRAN! recipes lets you create a pipeable sequence of feature engineering steps.
This release improves column type checking, allows more data types to be passed to recipes, supports long formulas, and gives better errors for misspelled argument names.
Check out the blog post for more details (and a delicious treat at the end): https://www.tidyverse.org/blog/2024/07/recipes-1-1-0/
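For anyone new to recipes, here is a minimal sketch of what a "pipeable sequence of feature engineering steps" can look like. The dataset (mtcars) and the choice of steps are assumptions picked for illustration, not taken from the release notes, and the native pipe needs R 4.1 or later.

```r
library(recipes)

# Declare the outcome and predictors, then chain preprocessing steps.
rec <- recipe(mpg ~ ., data = mtcars) |>
  step_normalize(all_numeric_predictors()) |>        # center and scale predictors
  step_pca(all_numeric_predictors(), num_comp = 3)   # keep three principal components

# prep() estimates the steps from the training data;
# bake(new_data = NULL) returns the processed training set.
prepped <- prep(rec)
baked <- bake(prepped, new_data = NULL)

names(baked)
```

The same prepped recipe can be applied to new rows with bake(prepped, new_data = ...), which is what makes the preprocessing reproducible.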
ok how did I not know until now that you can add se.fit = TRUE to the predict() function to get standard errors?
and of course, I now see there is a std_error option and several others in the #tidymodels version
what do these do for nonparametric models, I wonder?
No matter how much I think I know, there is always so much more to learn...
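A quick sketch of the base R side of this, assuming a plain lm() fit on mtcars (the model and the new data are made up for illustration):

```r
# Fit a simple linear model on a built-in dataset.
fit <- lm(mpg ~ wt + hp, data = mtcars)

# Hypothetical new observations to predict on.
new_cars <- data.frame(wt = c(2.5, 3.5), hp = c(100, 150))

# With se.fit = TRUE, predict() returns a list instead of a bare vector.
pred <- predict(fit, newdata = new_cars, se.fit = TRUE)

pred$fit     # predicted values
pred$se.fit  # standard errors of the predicted means
```

For lm objects the returned list also carries df and residual.scale. Other model classes have their own predict() methods, so se.fit may not be available everywhere.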
I'll be running an "Introduction to machine learning with {tidymodels}" workshop at RSS Conference in September!
Session details: Wednesday 4 September, 2024
11:30am - 12:50pm
Brighton, UK
More info: https://virtual.oxfordabstracts.com/#/event/6693/program?session=92723&s=2600
Register: https://rss.org.uk/training-events/conference-2024/
We have five posit::conf(2024) workshops for #RStats and #Python modeling and ML enthusiasts!
• Causal Inference in R, led by @malcolmbarrett and @travisgerke
• Introduction to machine learning in Python with Scikit-learn, led by @TiffanyTimbers and Trevor Campbell
• Intro to MLOps with vetiver, led by @isabelizimm
• Introduction to tidymodels, led by @hfrick and @simonpcouch
• Advanced Tidymodels, led by @topepo
What an incredible lineup of talks in our last session! 5-min talks aren’t an easy feat!
We learned about #TidyModels, #CRAN, Posit Public Package Manager, writing #QuartoPub blogs, #Shiny portfolio analysis, #RStats package development for epi dashboards, and the GenTwoArmsTrialSize R stats package!
Huge round of applause for:
This week's blog post is on deploying #MLOps with #tidymodels using #vetiver! Dive in to learn how to streamline your machine learning workflows:
#DataScience #RStats #MachineLearning
https://www.jumpingrivers.com/blog/vetiver-mlops-tidymodels-deployment/
Preprint from Simon Wood on the new cross-validation smoothness estimation in #mgcv: https://arxiv.org/abs/2404.16490. It's a neat performant + data-efficient way to estimate GAMs based on complex CV splits (like spatial/temporal/phylo ones).
See ?NCV in latest {mgcv} for examples (https://cran.r-universe.dev/mgcv/doc/manual.html#NCV)
I might write a helper to convert {rsample}/{spatialsample} objects into mgcv's funny CV indexing structure.
#rstats #ml #tidymodels #mgcvchat @MikeMahoney218 @gavinsimpson @ericJpedersen @millerdl
tidymodels has long supported parallelizing model fits across CPU cores. A couple of the modeling engines that #rstats #tidymodels supports for gradient boosting—#XGBoost and #LightGBM—have their own tools to parallelize model fits. A new blog post explores whether tidymodels users should use tidymodels' implementation, the engines', or both.