Some interesting Data Science stuff found between 2018-02-01 and 2018-02-28.
https://www.youtube.com/watch?v=atiYXm7JZv0 - (by J.J. Allaire from RStudio) - Machine Learning with R and TensorFlow - a video introduction to Deep Learning in R.
#deeplearning #rstats #tensorflow #datascience https://t.co/W4SjSTBYQq
https://towardsdatascience.com/a-tour-of-the-top-10-algorithms-for-machine-learning-newbies-dde4edffae11 - (by James Le) - a collection of some basic ML algorithms for newbies. Pictures in the article are pretty good.
#datascience #ml https://t.co/QGOliYhXgt
https://tensorflow.rstudio.com/blog/keras-customer-churn.html - (by Matt Dancho) - predicting customer churn using deep learning in R.
Some time ago I had to move from sparklyr to Scala for better integration with Spark and easier collaboration with other developers on the team. Interestingly, the switch was much easier than I expected, because Spark’s DataFrame API is somewhat similar to dplyr: there’s a groupBy function, agg instead of summarise, and so on. You can also use plain old SQL to operate on data frames. Anyway, in this post I’ll show how to fit a very simple LDA (Latent Dirichlet Allocation) model and then extract information about each topic’s words.
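To make the dplyr/Spark comparison a bit more concrete, here is a minimal sketch of the dplyr side, with rough Spark counterparts noted in comments. The mtcars data, the `df` DataFrame and the `cars` view name are only illustrative assumptions, not taken from the post.

```r
library(dplyr)

# dplyr: group the rows, then summarise each group.
by_cyl <- mtcars %>%
  group_by(cyl) %>%
  summarise(mean_mpg = mean(mpg), n = n())

# Rough Spark (Scala) counterpart, assuming a DataFrame `df` with the same
# columns and `import org.apache.spark.sql.functions._`:
#   df.groupBy("cyl").agg(avg("mpg").as("mean_mpg"), count("*").as("n"))
#
# Or with SQL, after df.createOrReplaceTempView("cars"):
#   spark.sql("SELECT cyl, AVG(mpg) AS mean_mpg, COUNT(*) AS n FROM cars GROUP BY cyl")

print(by_cyl)
```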
Some interesting Data Science stuff found between 2018-01-16 and 2018-01-31.
https://simplystatistics.org/2018/01/22/the-dslabs-package-provides-datasets-for-teaching-data-science/ - (by Rafael Irizarry) - the dslabs package, containing datasets for teaching data science.
install.packages("dslabs") #CopyAndInstall
#applyrds #rstats #datascience https://t.co/db0LUvCBx8
https://github.com/facebookresearch/StarSpace - a general-purpose #NLP library from @fb_research. For now, it works only from the command line; however, it’s easy to build and use that way.
#FacebookResearch #applyrds https://t.co/hcyjVdLdIZ
https://research.fb.com/facebook-open-sources-detectron/ - Facebook open sources Detectron, a platform for object detection running on top of Caffe2.
Some interesting Data Science stuff found between 2018-01-16 and 2018-01-16.
https://www.tidyverse.org/articles/2018/01/dbplyr-1-2/ - a new version of the database backend for dplyr. It allows using stringr functions in mutate statements, and the operations are evaluated directly in the database. #applyrds #db #rstats https://t.co/76sX7KjIxR
https://github.com/welovedatascience/stranger - a new package for anomaly detection in R. #rstats #pkg #applyrds https://t.co/O1itP9YXML
https://hughjonesd.github.io/huxtable/ - an alternative to xtable? I hope so :) Conditional formatting (e.g., make the background red if the value is larger than 3) seems to be very easily achievable.
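Going back to the dbplyr item above, here is a small sketch of what stringr functions inside mutate look like in practice. It assumes the RSQLite package is installed, and the `people` table with its `name` column is made up for illustration.

```r
library(dplyr)
library(dbplyr)
library(stringr)

# memdb_frame() copies a small data frame into an in-memory SQLite database
# (this needs the RSQLite package).
people <- memdb_frame(name = c("ann", "bob"))

# The stringr calls are translated to SQL rather than run in R.
query <- people %>%
  mutate(name_upper = str_to_upper(name), name_len = str_length(name))

# Show the SQL that dbplyr generates for this pipeline.
show_query(query)

# collect() runs the query in the database and returns a local tibble.
result <- collect(query)
```

Calling show_query() before collect() is a cheap way to check that a given function really is translated for the backend in use.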
R has various packages to call other languages, like Rcpp, rJava or sparklyr. Those tools significantly expand R’s capabilities, because the user doesn’t need to learn a lot of different stuff. Everything is nicely integrated into R.
However, sometimes the problem is the other way round - there’s an existing system written in some other language, and R can be used to extend its capabilities. In that scenario, R must be called from the outside.
In this post, I’ll describe how R can be integrated with a C# program using Microsoft.
The best way to organize a Shiny app is to use modules. They are also an excellent choice when you need to replicate some functionality a few times. For example, when you want to compare plots with different parameters, modules are the way to go. In basic usage, a module requires a direct call to callModule in the server and a placeholder in the UI. However, sometimes you don’t know in advance how many instances of a given module are needed.
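As a rough illustration of the idea (not necessarily the approach the full post takes), the module calls can be generated in a loop over ids instead of being written by hand. The `histUI` and `histServer` names and the `n_modules` variable are hypothetical.

```r
library(shiny)

# Hypothetical module: a slider plus a histogram of random values.
histUI <- function(id) {
  ns <- NS(id)
  tagList(
    sliderInput(ns("n"), "Sample size", min = 10, max = 500, value = 100),
    plotOutput(ns("plot"))
  )
}

histServer <- function(input, output, session) {
  output$plot <- renderPlot(hist(rnorm(input$n), main = NULL))
}

# The number of instances is just a variable here; in a real app it could
# come from data or configuration.
n_modules <- 3
ids <- paste0("hist", seq_len(n_modules))

# One UI placeholder per id.
ui <- fluidPage(lapply(ids, histUI))

server <- function(input, output, session) {
  # One callModule() per id instead of hand-written calls.
  lapply(ids, function(id) callModule(histServer, id))
}

app <- shinyApp(ui, server)
```

Running `runApp(app)` in an interactive session should show three independent sliders and plots. This sketch fixes the count before the app starts; changing it while the app is running would need something like renderUI or insertUI on top.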
This is the last (planned) RNews in 2017.
This week there’s something about reproducible research (everyone should do this!) and deep learning - one tutorial and one general article about image classification in radiology.
Books: http://www.britishecologicalsociety.org/wp-content/uploads/2017/12/guide-to-reproducible-code.pdf - this book is a guide to writing reproducible code using R. The title includes “in Ecology and Evolution”, but the scope of the guide is not limited to those areas :) Articles: https://tensorflow.
This list is quite small because I hit the history limit on Twitter:( Nevertheless, there are some exciting things - e.g. videos from H2O World 2017.
Videos: https://www.youtube.com/playlist?list=PLNtMya54qvOHQs2ZmV-pPSW_etMUykE0_ - list of all videos from H2O World 2017. At least a few of them seem to be worth watching. Unfortunately, I haven’t had time to do so :( Articles: https://rviews.rstudio.com/2017/12/11/r-and-tensorflow/ - this article claims that installing keras (for deep learning) is as simple as calling keras::install_keras().
Another pack of news from R’s world.
Aggregators: https://rweekly.org/ - this is a great aggregator - even better than mine ;) Articles: http://smarterpoland.pl/index.php/2017/12/explain-explain-explain/ - a quick summary of the packages which can be used to explain the results of various models (lm, glm, xgboost, etc.). Unfortunately, there’s nothing about LIME, which is a more general-purpose package for explaining models (https://cran.r-project.org/web/packages/lime/index.html).
https://christophm.github.io/interpretable-ml-book/ - an online book about explaining model predictions.
The next portion of exciting information from R’s world (in fact, not only R’s, but it’s the main thing here).
Aggregators: https://trello.com/b/rbpEfMld/data-science - a huge collection of resources related to Data Science (R, Python, Big Data, and so on). Articles: https://drsimonj.svbtle.com/visualising-residuals - a long post about visualizing residuals. The “Multiple Regression” section is worth checking - the analysis is quite impressive.
http://www.tandfonline.com/doi/full/10.1080/01621459.2017.1311264 - the old saga about why p-values are evil continues.