Doing some research on dplyr-like data packages for Python. Several options have been developed over the years with different emphases and back-ends. It speaks volumes about the contribution of @Posit to the philosophy of data wrangling.
Packages I've found so far:
* siuba (maintained by a Posit engineer)
* pyplyr (new project by @_wurli)
* tidypolars (@markfairbanks)
* ibis
* plydata
* dplython (possibly abandoned?)
Any that I missed?
#RStats #Python #DataScience
@josi
oh thanks! I think I saw that in my search results and assumed it was an R package, oops
@ataustin @Posit @_wurli @markfairbanks
I tried plydata because it's from the same person who made plotnine (the ggplot2 #Python port) and plotnine is amazing. However, unlike with the grammar of graphics, I don't have an intuition for plydata. Maybe it's the lack of autocomplete, maybe it's something about the library, or maybe a "grammar for data" just isn't that intuitive. I've never tried R, though.
The big advantage of pandas over any alternatives is the crazy amount of tutorials and SO answers for any problem you may face.
@ataustin @Posit @_wurli @markfairbanks
What I liked about plydata is that it integrates natively with plotnine. Most of my data analysis includes visualization, and that's something I miss a lot in polars, the new fast alternative to pandas. Most of my polars pipelines end up converting the result to pandas anyway and then visualizing that dataframe with plotnine or seaborn.
@orsinium
great point, I haven't even begun thinking about visualization and how the packages play together in Python land
@ataustin @Posit @_wurli @markfairbanks Oh interesting, many I have not come across! And I have been searching for a dplyr-like experience in Python for a while (pandas almost made me give up the whole snake biz).
I wonder if polars itself should count? In terms of "chain SQL-inspired verbs together to operate on data frames", it feels quite close sometimes. Or do you have another criterion for "dplyr similarity"?
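To make the "chain SQL-inspired verbs together" idea concrete, here is a rough, library-neutral sketch in plain Python. The `Frame` class, its method names, and the sample rows are all made up for illustration; this is not the API of polars, dplyr, or any package mentioned in this thread. It only shows the pattern where each verb returns a new frame so that calls can be chained.

```python
from collections import defaultdict
from statistics import mean


class Frame:
    """Hypothetical toy data frame: a list of row dicts with chainable verbs."""

    def __init__(self, rows):
        # Copy each row so verbs never mutate the frame they were called on.
        self.rows = [dict(r) for r in rows]

    def filter(self, predicate):
        # Keep rows for which predicate(row) is true (SQL WHERE).
        return Frame(r for r in self.rows if predicate(r))

    def mutate(self, **new_columns):
        # Add or overwrite columns; each value is a function of the row.
        return Frame(
            {**r, **{name: fn(r) for name, fn in new_columns.items()}}
            for r in self.rows
        )

    def summarise_mean(self, by, column):
        # Group by one column and average another (SQL GROUP BY + AVG).
        groups = defaultdict(list)
        for r in self.rows:
            groups[r[by]].append(r[column])
        return Frame(
            {by: key, f"mean_{column}": mean(values)}
            for key, values in groups.items()
        )

    def arrange(self, column):
        # Sort rows by one column (SQL ORDER BY).
        return Frame(sorted(self.rows, key=lambda r: r[column]))


# Made-up sample data, purely for illustration.
animals = Frame([
    {"species": "Adelie", "mass_g": 3700},
    {"species": "Adelie", "mass_g": 3900},
    {"species": "Gentoo", "mass_g": 5000},
    {"species": "Gentoo", "mass_g": 2000},
])

# Each verb returns a new Frame, so the steps read top to bottom like a pipe.
result = (
    animals
    .filter(lambda r: r["mass_g"] > 2500)
    .mutate(mass_kg=lambda r: r["mass_g"] / 1000)
    .summarise_mean(by="species", column="mass_kg")
    .arrange("species")
)

for row in result.rows:
    print(row)
```

By that yardstick, any library whose verbs return a new frame and read in order of execution would feel "dplyr-like", whatever the verbs happen to be called.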
@mario_angst_sci
no real criteria, I'm just a lowly R user lost in a Python world. I've found though that when working in Python I lose the clarity of reasoning that comes when I use R, so somehow the grammar of data work for me is bound up in the verbs and flow of tidy principles. If I can get close to mimicking those in Python I think I'll be more effective as a data scientist.
@ataustin Oh I feel you, that feels similar. Good luck on your journey and keep us posted! I am nowhere near as effective in Python as in R for a lot of stuff, but recently, polars has really helped me. As has investing some time in learning more about how to build APIs in both languages (w/ plumber and FastAPI). That might sound a bit unrelated, but it has helped me to know that when one part of a project works better in the other language, I can compartmentalize and build an interface.
@mario_angst_sci @ataustin @Posit @markfairbanks I'd count polars! The dplyr influence is pretty clear I think.
@_wurli @ataustin @Posit @markfairbanks Yeah, I'd do so too - I've only been using polars as my main EDA workhorse in Python for a couple of weeks now but I really like it. I have started recommending it to everyone I know who is coming from R and has to do some stuff in Python.
The big thing I still miss in Python EDA is something as intuitive yet powerful as ggplot to pipe all these chains of data transformations into at the end for plotting. That is really where R shines bright.