R-bloggers

R news and tutorials contributed by hundreds of R bloggers

Notes on Becoming an RStudio Certified Trainer

Tue, 20/08/2019 - 19:04

[This article was first published on R – AriLamstein.com, and kindly contributed to R-bloggers].

I recently became an RStudio Certified Trainer, and thought that it might interest the broader R Community to learn about this new program.

For those who don’t know, RStudio has recently put together a process to independently verify that R trainers (a) are proficient with the Tidyverse and (b) know modern teaching pedagogy. Certified trainers get listed on RStudio’s website and also get referrals for training requests. Apparently there are a lot of people who want to learn the Tidyverse, and RStudio cannot keep up with the demand themselves!

The author with Garrett Grolemund at RStudio’s 2018 conference

I was actually one of the first people involved in this program, having taken Garrett Grolemund’s Tidyverse Train-the-Trainer workshop at RStudio’s 2018 Conference. Garrett, a Data Scientist and Master Instructor at RStudio, had recently created a popular workshop for introducing people to the Tidyverse.

The idea behind Tidyverse Train-the-Trainer was for people to learn two things. The first, of course, was the ins and outs of Garrett’s workshop on the Tidyverse. The second, and perhaps more important, thing to learn was how Garrett had come to create this workshop. This involved learning about a lot of important research that’s been done on adult education. The workshop also left plenty of time for us to practice what we were learning.

At the end of the workshop we got the slides Garrett uses for his own workshop on the Tidyverse, and were told that we could use and modify them however we wanted. Perhaps it’s not surprising, but Garrett’s slides on ggplot2 and dplyr were fantastic, and I now use them when I teach!

I should also mention that the requirements for becoming Certified have recently increased. I believe that when I first took Garrett’s workshop, everyone who attended received a certificate. But recently, RStudio has started listing their “Certified Training Partners” on their website. In order to be listed in this directory, I had to take two additional exams: one on the Tidyverse and one on Teaching. The exams were given online and were proctored by an RStudio employee.

Overall, I would recommend this program to anyone who wants to improve their ability to teach R. If you are a professional trainer, then the program can only help you in your career. But many people in the workshop were not professional trainers. They worked in academia and the corporate sector, and simply wanted help in bringing R to their organizations.

You can see the full list of RStudio Certified Trainers here. If you are interested in becoming certified yourself, you can learn more about the application process here.




Modern reporting for R with Dash

Tue, 20/08/2019 - 07:30

[This article was first published on R – Modern Data, and kindly contributed to R-bloggers].

Creating an effective, informative, and aesthetically appealing report to showcase your data can be tedious: it’s often difficult to display your data and your plots together in an uncluttered manner, and even harder to implement interactivity between the individual elements. Dash for R facilitates this task, providing an intuitive way to make interactive and customizable reports directly from the R environment, without the need to create your own JavaScript components. If you’re already using R for data wrangling, visualization, and analysis, it’s convenient to stay within the R ecosystem to create your report as well.

Dash for R allows users to present interactive plots and tabular data side-by-side to monitor, highlight, or explore key aspects of their data. The library includes a rich set of GUI components that make it easy to interact with your data out of the box, and allows for customizing all aspects of your dashboard. As a result, it’s surprisingly easy to create a modern report with an intuitive user interface to better communicate your data. 

Displaying and editing your data

Displaying tabular data can give the reader a good sense of the data you are working with, but when it is shown as a static table, it can be hard to digest and intimidating. Instead, it’s nice to display an interactive, formattable spreadsheet, providing a familiar and flexible tool within the report itself. The Dash DataTable component creates tables that can be sorted, filtered, and conditionally formatted, providing extensive support for customized views. 

These tables can also be linked to your plots, so when you modify or filter your data, the changes to your data tables are reflected graphically on the fly. As data are added or modified in the table, the changes are immediately reflected in the linked plot. Data tables that are created or modified in your report can be downloaded locally, so they can be used in another program as well. 
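As a rough, illustrative sketch (not from the original post), a minimal Dash for R app exposing a sortable, filterable table might look like the code below. It assumes the dash, dashHtmlComponents and dashTable packages, and the sort_action / filter_action arguments reflect the DataTable version I assume here, so check them against your installed version.

library(dash)
library(dashHtmlComponents)
library(dashTable)

# a small example data frame
df <- head(mtcars, 10)
df$model <- rownames(df)

app <- Dash$new()

app$layout(
  htmlDiv(list(
    dashDataTable(
      id = "cars-table",
      # one column definition per data frame column
      columns = lapply(names(df), function(nm) list(name = nm, id = nm)),
      # rows supplied as a list of named lists
      data = lapply(seq_len(nrow(df)), function(i) as.list(df[i, ])),
      sort_action = "native",     # assumed argument name: sorting in the browser
      filter_action = "native"    # assumed argument name: per-column filtering
    )
  ))
)

# app$run_server()  # uncomment to serve the app locally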

Tabbed applications

With complex analyses, it’s common to end up with more data than can reasonably be displayed at once. It’s better to organize the layout of your app so that the different aspects of your analyses are grouped together. This is where separate tabs and pages are useful. Dash takes the hassle out of creating multi-page apps, allowing you to compartmentalize the data and charts that you display into tabs, using the dccTab component.

For example, if your data has a geographical component, you can display an interactive map in one tab, summary plots in another, and a data table in a third. This allows for an uncluttered display of your data, and separates different views or controls for an easily understandable visualization.
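A minimal sketch of that tabbed layout, assuming the dashCoreComponents package and using placeholder htmlDiv content where the map, plots, and table would go, could look roughly like this:

library(dash)
library(dashHtmlComponents)
library(dashCoreComponents)

app <- Dash$new()

app$layout(
  htmlDiv(list(
    dccTabs(id = "report-tabs", children = list(
      dccTab(label = "Map",           children = list(htmlDiv("interactive map goes here"))),
      dccTab(label = "Summary plots", children = list(htmlDiv("summary plots go here"))),
      dccTab(label = "Data table",    children = list(htmlDiv("data table goes here")))
    ))
  ))
)

# app$run_server()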

You can also use dccLocation and dccLink to create a multi-page app that can be navigated through links instead of tabs. In fact, our interactive online Dash for R documentation is a multi-page Dash app in itself.

Styling and customization

Whether you have a specific vision for your app or need to incorporate your company’s branding, reports made with Dash are completely customizable. Components can be styled inline with the style property, using local CSS in your app’s assets directory or via an external CSS stylesheet. This means you can quickly modify the look of an individual component directly in R, or reference a CSS file that will apply styles to your components given their className or id. The ability to style an app using an external stylesheet means you can create generalizable styles to be applied to multiple components and have deeper control over the styling of the components, like sliders or radio buttons.  
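As a small illustration of those two styling routes (not from the original post), the fragments below show inline styling via the style property and class-based styling via className; the assets/custom.css file name is hypothetical.

# inline styling: a named list of CSS properties (camelCase names)
htmlDiv(
  "Quarterly revenue",
  style = list(color = "#2c3e50", fontSize = "24px", fontWeight = "bold")
)

# class-based styling: the class would be defined in a local CSS file,
# e.g. assets/custom.css (hypothetical), or an external stylesheet
htmlDiv("Quarterly revenue", className = "report-header")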

Interested in learning more?

You can explore full working examples of apps and reports, along with the code to generate them, in the Dash app gallery. Many of these examples show modern takes on traditional dashboards, while others, such as the financial report example pictured below, are structured more like interactive PDFs, allowing researchers and analysts to deliver beautiful and informative reports to their collaborators or clients.



How to get an AUC confidence interval

Tue, 20/08/2019 - 05:45

[This article was first published on R – Open Source Automation, and kindly contributed to R-bloggers].



Background

AUC is an important metric in machine learning for classification. It is often used as a measure of a model’s performance. In effect, AUC is a number between 0 and 1 that measures how well a model rank-orders its predictions. For a detailed explanation of AUC, see this link.

Since AUC is widely used, being able to get a confidence interval around this metric is valuable, both to better demonstrate a model’s performance and to better compare two or more models. For example, if model A has a higher AUC than model B, but the 95% confidence intervals around the two AUC values overlap, then the models may not be statistically different in performance. We can get a confidence interval around AUC using R’s pROC package, which can compute the interval analytically (DeLong’s method, the default) or by bootstrapping.
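As a side note (not in the original post), pROC also provides roc.test() for a formal comparison of two AUCs computed on the same test set. The sketch below assumes hypothetical objects y (observed 0/1 labels) and pred_a / pred_b (predicted probabilities from two models):

# hypothetical objects: y = observed labels, pred_a / pred_b = predicted probabilities
library(pROC)
roc_a <- roc(y, pred_a)
roc_b <- roc(y, pred_b)
# paired DeLong test for a difference in AUC between the two models
roc.test(roc_a, roc_b)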

Building a simple model to test

To demonstrate how to get an AUC confidence interval, let’s build a model using a movies dataset from Kaggle (you can get the data here).

Reading in the data

# load packages
library(pROC)
library(dplyr)
library(randomForest)

# read in dataset
movies <- read.csv("movie_metadata.csv")

# remove records with missing budget / gross data
movies <- movies %>% filter(!is.na(budget) & !is.na(gross))

Split into train / test

Next, let’s randomly select 70% of the records to be in the training set and leave the rest for testing.

# get random sample of rows
set.seed(0)
train_rows <- sample(1:nrow(movies), .7 * nrow(movies))

# split data into train / test
train_data <- movies[train_rows,]
test_data <- movies[-train_rows,]

# select only fields we need
train_need <- train_data %>% select(gross, duration, director_facebook_likes, budget,
                                    imdb_score, content_rating, movie_title)
test_need <- test_data %>% select(gross, duration, director_facebook_likes, budget,
                                  imdb_score, content_rating, movie_title)

Create the label

Lastly, we need to create our label, i.e. what we’re trying to predict. Here, we’re going to predict whether a movie’s gross beats its budget (1 if so, 0 if not).

train_need$beat_budget <- as.factor(ifelse(train_need$gross > train_need$budget, 1, 0))
test_need$beat_budget <- as.factor(ifelse(test_need$gross > test_need$budget, 1, 0))

Train a random forest

Now, let’s train a simple random forest model with just 50 trees.

# train a random forest (na.action = na.omit drops records with missing predictors)
forest <- randomForest(beat_budget ~ duration + director_facebook_likes + budget +
                         imdb_score + content_rating,
                       train_need, ntree = 50, na.action = na.omit)

Getting an AUC confidence interval

Next, let’s use our model to get predictions on the test set.

test_pred <- predict(forest, test_need, type = "prob")[,2]

And now, we’re reading to get our confidence interval! We can do that in just one line of code using the ci.auc function from pROC. By default, this function uses 2000 bootstraps to calculate a 95% confidence interval. This means our 95% confidence interval for the AUC on the test set is between 0.6198 and 0.6822, as can be seen below.

ci.auc(test_need$beat_budget, test_pred) # 95% CI: 0.6198-0.6822 (DeLong)

We can adjust the confidence interval using the conf.level parameter:

ci.auc(test_need$beat_budget, test_pred, conf.level = 0.9) # 90% CI: 0.6248-0.6772 (DeLong)

That’s it for this post! Please click here to follow this blog on Twitter!

See here to learn more about the pROC package.




RcppQuantuccia 0.0.3

Tue, 20/08/2019 - 02:45

[This article was first published on Thinking inside the box, and kindly contributed to R-bloggers].

A maintenance release of RcppQuantuccia arrived on CRAN earlier today.

RcppQuantuccia brings the Quantuccia header-only subset / variant of QuantLib to R. At the current stage, it mostly offers date and calendaring functions.
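As a quick, illustrative sketch (not from the post), the calendar functionality can be exercised roughly as below; setCalendar() and getHolidays() are the helpers I believe the package exports, so treat the exact names and arguments as an assumption and check the package documentation.

library(RcppQuantuccia)

fromD <- as.Date("2019-01-01")
toD   <- as.Date("2019-12-31")

# assumed API: switch to the United States calendar, then list its holidays for 2019
setCalendar("UnitedStates")
getHolidays(fromD, toD)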

This release was triggered by some work CRAN is doing on updating C++ standards for code in the repository. Notably, under C++11 some constructs such as ptr_fun, bind1st, bind2nd, … are now deprecated, and CRAN prefers the code base to not issue such warnings (as e.g. now seen under clang++-9). So we updated the corresponding code in a good dozen or so places to the (more current and compliant) code from QuantLib itself.

We also took this opportunity to significantly reduce the footprint of the sources and the installed shared library of RcppQuantuccia. One (unexported) feature was pricing models via Brownian Bridges based on quasi-random Sobol sequences. But the main source file for these sequences comes in at several megabytes in size, and allocates a large number of constants. So in this version the file is excluded, making the current build of RcppQuantuccia lighter in size and more suitable for the (simpler, popular and trusted) calendar functions. We also added a new holiday to the US calendar.

The complete list of changes follows.

Changes in version 0.0.3 (2019-08-19)
  • Updated Travis CI test file (#8).

  • Updated US holiday calendar data with G H Bush funeral date (#9).

  • Updated C++ use to not trigger warnings [CRAN request] (#9).

  • Comment-out pragmas to suppress warnings [CRAN Policy] (#9).

  • Change build to exclude Sobol sequence reducing file size for source and shared library, at the cost of excluding market models (#10).

Courtesy of CRANberries, there is also a diffstat report relative to the previous release. More information is on the RcppQuantuccia page. Issues and bugreports should go to the GitHub issue tracker.

This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. Please report excessive re-aggregation in third-party for-profit settings.



Fitting ‘complex’ mixed models with ‘nlme’. Example #1

Tue, 20/08/2019 - 02:00

[This article was first published on R on The broken bridge between biologists and statisticians, and kindly contributed to R-bloggers].

The environmental variance model

Fitting mixed models has become very common in biology, and recent developments involve the manipulation of the variance-covariance matrix for random effects and residuals. To the best of my knowledge, within the frame of frequentist methods, the only freeware solution in R should be based on the ‘nlme’ package, as the ‘lme4’ package (via its ‘lmer()’ function) does not easily permit such manipulations. The ‘nlme’ package is fully described in Pinheiro and Bates (2000). Of course, the ‘asreml’ package can be used, but, unfortunately, it is not freeware.

Coding mixed models in ‘nlme’ is not always easy, especially when we have crossed random effects, which are very common in agricultural experiments. I have been struggling with this issue very often over the last few years and I thought it might be useful to publish a few examples on this blog, to save colleagues from a few headaches. Please, note that I have already published other posts dealing with the use of the ‘lme()’ function in the ‘nlme’ package, for example this post here about the correlation in designed experiments and this other post here, about heteroscedastic multienvironment experiments.

The first example in this series relates to a randomised complete block design with three replicates, comparing winter wheat genotypes. The experiment was repeated for seven years in the same location. The dataset (‘WinterWheat’) is available in the ‘aomisc’ package, which is the companion package for this blog and is available on GitHub. Information on how to download and install the ‘aomisc’ package is given on this page. Please, note that this dataset shows the data for eight genotypes, but the model that we want to fit requires that the number of environments is higher than the number of genotypes. Therefore, we have to make a subset at the beginning, removing a couple of genotypes.

The first code snippet loads the ‘aomisc’ package and other necessary packages. Afterwards, it loads the ‘WinterWheat’ dataset, subsets it and turns the ‘Genotype’, ‘Year’ and ‘Block’ variables into factors.

library(plyr)
library(nlme)
library(aomisc)

data(WinterWheat)
WinterWheat <- WinterWheat[WinterWheat$Genotype != "SIMETO" & WinterWheat$Genotype != "SOLEX",]
WinterWheat$Genotype <- factor(WinterWheat$Genotype)
WinterWheat$Year <- factor(WinterWheat$Year)
WinterWheat$Block <- factor(WinterWheat$Block)
head(WinterWheat, 10)
##    Plot Block Genotype Yield Year
## 1     2     1 COLOSSEO  6.73 1996
## 2     1     1    CRESO  6.02 1996
## 3    50     1   DUILIO  6.06 1996
## 4    49     1   GRAZIA  6.24 1996
## 5    63     1    IRIDE  6.23 1996
## 6    32     1 SANCARLO  5.45 1996
## 9   110     2 COLOSSEO  6.96 1996
## 10  137     2    CRESO  5.34 1996
## 11   91     2   DUILIO  5.57 1996
## 12  138     2   GRAZIA  6.09 1996

Dealing with the above dataset, a good candidate model for data analyses is the so-called ‘environmental variance model’. This model is often used in stability analyses for multi-environment experiments and I will closely follow the coding proposed in Piepho (1999):

\[y_{ijk} = \mu + g_i + r_{jk} + h_{ij} + \varepsilon_{ijk}\]

where \(y_{ijk}\) is yield (or other trait) for the \(k\)-th block, \(i\)-th genotype and \(j\)-th environment, \(\mu\) is the intercept, \(g_i\) is the effect for the i-th genotype, \(r_{jk}\) is the effect for the \(k\)-th block in the \(j\)-th environment, \(h_{ij}\) is a random deviation from the expected yield for the \(i\)-th genotype in the \(j\)-th environment and \(\varepsilon_{ijk}\) is the residual variability of yield between plots, within each environment and block.

We usually assume that \(r_{jk}\) and \(\varepsilon_{ijk}\) are independent and normally distributed, with variances equal to \(\sigma^2_r\) and \(\sigma^2_e\), respectively. Such an assumption may be questioned, but we will not do it now, for the sake of simplicity.

Let’s concentrate on \(h_{ij}\), which we will assume as normally distributed with variance-covariance matrix equal to \(\Omega\). In particular, it is reasonable to expect that the genotypes will have different variances across environments (heteroscedasticity), which can be interpreted as static stability measures (‘environmental variances’; hence the name ‘environmental variance model’). Furthermore, it is reasonable that, if an environment is good for one genotype, it may also be good for other genotypes, so that yields in each environment are correlated, although the correlations can be different for each couple of genotypes. To reflect our expectations, the \(\Omega\) matrix needs to be totally unstructured, with the only constraint that it is positive definite.

Piepho (1999) has shown how the above model can be coded by using SAS and I translated his code into R.

EnvVarMod <- lme(Yield ~ Genotype,
                 random = list(Year = pdSymm(~ Genotype - 1),
                               Year = pdIdent(~ Block - 1)),
                 control = list(opt = "optim", maxIter = 100),
                 data = WinterWheat)
VarCorr(EnvVarMod)
##                  Variance     StdDev    Corr
## Year =           pdSymm(Genotype - 1)
## GenotypeCOLOSSEO 0.48876512   0.6991174 GCOLOS GCRESO GDUILI GGRAZI GIRIDE
## GenotypeCRESO    0.70949309   0.8423141 0.969
## GenotypeDUILIO   2.37438440   1.5409038 0.840  0.840
## GenotypeGRAZIA   1.18078525   1.0866394 0.844  0.763  0.942
## GenotypeIRIDE    1.23555204   1.1115539 0.857  0.885  0.970  0.896
## GenotypeSANCARLO 0.93335518   0.9661031 0.928  0.941  0.962  0.884  0.942
## Year =           pdIdent(Block - 1)
## Block1           0.02748257   0.1657787
## Block2           0.02748257   0.1657787
## Block3           0.02748257   0.1657787
## Residual         0.12990355   0.3604214

I coded the random effects as a list, using ‘Year’ as the nesting factor (Galecki and Burzykowski, 2013). In order to specify a totally unstructured variance-covariance matrix for the genotypes within years, I used the ‘pdMat’ construct ‘pdSymm()’. This model is rather complex and may take a long time to converge.

The environmental variances are retrieved by the following code:

envVar <- as.numeric( VarCorr(EnvVarMod)[2:7, 1] )
envVar
## [1] 0.4887651 0.7094931 2.3743844 1.1807853 1.2355520 0.9333552

while the correlations are given by:

VarCorr(EnvVarMod)[2:7, 3:7]
##                  Corr
## GenotypeCOLOSSEO "GCOLOS" "GCRESO" "GDUILI" "GGRAZI" "GIRIDE"
## GenotypeCRESO    "0.969"  ""       ""       ""       ""
## GenotypeDUILIO   "0.840"  "0.840"  ""       ""       ""
## GenotypeGRAZIA   "0.844"  "0.763"  "0.942"  ""       ""
## GenotypeIRIDE    "0.857"  "0.885"  "0.970"  "0.896"  ""
## GenotypeSANCARLO "0.928"  "0.941"  "0.962"  "0.884"  "0.942"

Unweighted two-stage fitting

In his original paper, Piepho (1999) also gave SAS code to analyse the means of the ‘genotype x environment’ combinations. Indeed, agronomists and plant breeders often adopt a two-step fitting procedure: in the first step, the means across blocks are calculated for all genotypes in all environments; in the second step, these means are used to fit an environmental variance model. This two-step process is less demanding in terms of computer resources and it is correct whenever the experiments are equireplicated, with no missing ‘genotype x environment’ combinations. Furthermore, we need to be able to assume similar variances within all experiments.

I would also like to give an example of this two-step analysis method. In the first step, we can use the ‘ddply()’ function in the package ‘plyr’:

# First step
WinterWheatM <- ddply(WinterWheat, c("Genotype", "Year"),
                      function(df) c(Yield = mean(df$Yield)))

Once we have retrieved the means for genotypes in all years, we can fit the following model:

\[y_{ij} = \mu + g_i + a_{ij}\]

where \(y_{ij}\) is the mean yield for the \(i\)-th genotype in the \(j\)-th environment and \(a_{ij}\) is the residual term, which includes the genotype x environment random interaction, the block x environment random interaction and the residual error term.

In this model we have only one random effect (\(a_{ij}\)) and, therefore, this is a fixed-effects linear model. However, we need to model the variance-covariance matrix of the residuals (\(R\)), adopting a totally unstructured form. Please, note that, when working with the raw data, we modelled \(\Omega\), i.e. the variance-covariance matrix for the random effects. I have used the ‘gls()’ function, together with the ‘weights’ and ‘correlation’ arguments. See the code below.

# Second step
envVarModM <- gls(Yield ~ Genotype, data = WinterWheatM,
                  weights = varIdent(form = ~ 1 | Genotype),
                  correlation = corSymm(form = ~ 1 | Year))
summary(envVarModM)
## Generalized least squares fit by REML
##   Model: Yield ~ Genotype
##   Data: WinterWheatM
##       AIC      BIC   logLik
##   80.6022 123.3572 -13.3011
##
## Correlation Structure: General
##  Formula: ~1 | Year
##  Parameter estimate(s):
##  Correlation:
##   1     2     3     4     5
## 2 0.947
## 3 0.809 0.815
## 4 0.816 0.736 0.921
## 5 0.817 0.866 0.952 0.869
## 6 0.888 0.925 0.949 0.856 0.907
## Variance function:
##  Structure: Different standard deviations per stratum
##  Formula: ~1 | Genotype
##  Parameter estimates:
## COLOSSEO    CRESO   DUILIO   GRAZIA    IRIDE SANCARLO
## 1.000000 1.189653 2.143713 1.528848 1.560620 1.356423
##
## Coefficients:
##                      Value Std.Error   t-value p-value
## (Intercept)       6.413333 0.2742314 23.386574  0.0000
## GenotypeCRESO    -0.439524 0.1107463 -3.968746  0.0003
## GenotypeDUILIO    0.178571 0.3999797  0.446451  0.6579
## GenotypeGRAZIA   -0.330952 0.2518270 -1.314205  0.1971
## GenotypeIRIDE     0.281905 0.2580726  1.092347  0.2819
## GenotypeSANCARLO -0.192857 0.1802547 -1.069915  0.2918
##
##  Correlation:
##                  (Intr) GCRESO GDUILI GGRAZI GIRIDE
## GenotypeCRESO     0.312
## GenotypeDUILIO    0.503  0.371
## GenotypeGRAZIA    0.269 -0.095  0.774
## GenotypeIRIDE     0.292  0.545  0.857  0.638
## GenotypeSANCARLO  0.310  0.612  0.856  0.537  0.713
##
## Standardized residuals:
##        Min         Q1        Med         Q3        Max
## -2.0949678 -0.5680656  0.1735444  0.7599596  1.3395000
##
## Residual standard error: 0.7255481
## Degrees of freedom: 42 total; 36 residual

The variance-covariance matrix for the residuals can be obtained using the ‘getVarCov()’ function in the ‘nlme’ package, although I discovered that there is a small buglet there, which causes problems in some instances (such as here). Please, see this link; I have included the corrected code in the ‘getVarCov.gls()’ function in the ‘aomisc’ package, which is the companion package for this blog.

R <- getVarCov.gls(envVarModM)
R
## Marginal variance covariance matrix
##         [,1]    [,2]    [,3]    [,4]    [,5]    [,6]
## [1,] 0.52642 0.59280 0.91285 0.65647 0.67116 0.63376
## [2,] 0.59280 0.74503 1.09440 0.70422 0.84652 0.78560
## [3,] 0.91285 1.09440 2.41920 1.58850 1.67700 1.45230
## [4,] 0.65647 0.70422 1.58850 1.23040 1.09160 0.93442
## [5,] 0.67116 0.84652 1.67700 1.09160 1.28210 1.01070
## [6,] 0.63376 0.78560 1.45230 0.93442 1.01070 0.96855
##   Standard Deviations: 0.72555 0.86315 1.5554 1.1093 1.1323 0.98415

As the design is perfectly balanced, the diagonal elements of the above matrix correspond to the variances of genotypes across environments:

tapply(WinterWheatM$Yield, WinterWheatM$Genotype, var)
##  COLOSSEO     CRESO    DUILIO    GRAZIA     IRIDE  SANCARLO
## 0.5264185 0.7450275 2.4191624 1.2304397 1.2821143 0.9685497

which can also be retrieved with the ‘stability’ package:

library(stability)
## Registered S3 methods overwritten by 'lme4':
##   method                          from
##   cooks.distance.influence.merMod car
##   influence.merMod                car
##   dfbeta.influence.merMod         car
##   dfbetas.influence.merMod        car
envVarStab <- stab_measures(.data = WinterWheatM,
                            .y = Yield,
                            .gen = Genotype,
                            .env = Year)
envVarStab$StabMeasures
## # A tibble: 6 x 7
##   Genotype  Mean GenSS   Var    CV  Ecov ShuklaVar
##
## 1 COLOSSEO  6.41  3.16 0.526  11.3 1.25     0.258
## 2 CRESO     5.97  4.47 0.745  14.4 1.01     0.198
## 3 DUILIO    6.59 14.5  2.42   23.6 2.31     0.522
## 4 GRAZIA    6.08  7.38 1.23   18.2 1.05     0.208
## 5 IRIDE     6.70  7.69 1.28   16.9 0.614    0.0989
## 6 SANCARLO  6.22  5.81 0.969  15.8 0.320    0.0254

Strictly speaking, those variances are not the environmental variances, as they also contain the within-experiment and within block random variability, which needs to be separately estimated during the first step.

Thanks for reading!

Andrea Onofri
Department of Agricultural, Food and Environmental Sciences
University of Perugia (Italy)

References

  • Gałecki, A., Burzykowski, T., 2013. Linear mixed-effects models using R: a step-by-step approach. Springer, Berlin.
  • Muhammad Yaseen, Kent M. Eskridge and Ghulam Murtaza (2018). stability: Stability Analysis of Genotype by Environment Interaction (GEI). R package version 0.5.0. https://CRAN.R-project.org/package=stability
  • Piepho, H.-P., 1999. Stability Analysis Using the SAS System. Agronomy Journal 91, 154–160.
  • Pinheiro, J.C., Bates, D.M., 2000. Mixed-Effects Models in S and S-Plus, Springer-Verlag Inc. ed. Springer-Verlag Inc., New York.


Referring to POTUS on Twitter: a stance-based perspective on variation in the 116th House

Tue, 20/08/2019 - 02:00

[This article was first published on Jason Timm, and kindly contributed to R-bloggers].


In this post, we investigate how (& how often) members of the 116th House of Representatives refer to the 45th president of the United States on Twitter. TRUMP, POTUS, PRESIDENT TRUMP, @realDonaldTrump — options abound. Here, we consider how a House Rep’s stance towards (or opinion of) 45 influences the choice of referring expression, as well as how this stance aligns with the popularity of 45 in a House Rep’s congressional district.

A fully reproducible, R-based code-through.

A very brief introduction

Most linguistic variation is riddled with nuanced meaning, the source of which is often some type of socio-cultural value (Du Bois 2007). In the case of variation in reference, one dimension of this socio-cultural value is status. While “President Donald Trump” and “Donald Trump” point to the same referent, the former emphasizes the status of 45 as POTUS, while the latter downplays this status (Berg et al. 2019).

Similarly, “Mr. Trump” is a more deferential referring expression than “Trump”. We know this as speakers of English because of social convention: we refer to folks higher up in the food chain in different (ie, more formal) ways. A simple formality cline is presented below:

  1. First name only < Last name only < Full name < Title and last name < Title and full name (Berg et al. 2019)

As a speaker, I can abide by this convention when referring to an elder/boss/POTUS/etc (by using forms towards the right of the cline), or I can flout it (by using forms to the left). In either case, I (theoretically) communicate my stance towards the referent to my audience.

In the case of a tweeting House Rep, this audience is their Twitter following (ie, ~their constituency). And if a House Rep is stancetaking when referring to 45 on Twitter, presumably how this audience feels about 45 mediates the polarity of the House Rep’s stance. Another, presumably safer, option would be to not refer to 45 at all. This is what we investigate here.

Some open source data sets

library(tidyverse)

Legislators & vote margins

We first grab some data/details about the 116th House of Representatives from a few online sources. For House Rep names, congressional districts, and twitter handles, we use a data set made available by the @unitedstates project. The project is a fantastic resource, maintained by folks from GovTrack, ProPublica, MapLight & FiveThirtyEight.

leg_dets <- 'https://theunitedstates.io/congress-legislators/legislators-current.csv'

house_meta <- read.csv((url(leg_dets)), stringsAsFactors = FALSE) %>%
  filter(type == 'rep' & twitter != '') %>%
  # select(type, bioguide_id, icpsr_id, last_name, state, district, party, twitter) %>%
  mutate(district = ifelse(district == 0, 'AL', district),
         CD = paste0(state, '-', stringr::str_pad(district, 2, pad = '0')),
         twitter = toupper(twitter))

For Trump vote margins by congressional district, we utilize a data set made available by the DailyKos.

url <- 'https://docs.google.com/spreadsheets/d/1zLNAuRqPauss00HDz4XbTH2HqsCzMe0pR8QmD1K8jk8/edit#gid=0'

margins_by_cd <- read.csv(text = gsheet::gsheet2text(url, format = 'csv'),
                          skip = 1, stringsAsFactors = FALSE) %>%
  mutate(trump_margin = Trump - Clinton) %>%
  select(CD, trump_margin)

Tweets: 116th House of Representatives

Next we gather tweets for members of the 116th House of Representatives using the rtweet package. Members took office on January 3, 2019, so we filter tweets to post-January 2. We also exclude retweets. (Last tweets collected on 8-19-19).

congress_tweets <- rtweet::get_timeline(house_meta$twitter, n = 2000, check = FALSE) %>%
  mutate(created_at = as.Date(gsub(' .*$', '', created_at))) %>%
  filter(is_quote == 'FALSE' & is_retweet == 'FALSE' &
           created_at > '2019-01-02' & display_text_width > 0)

setwd("/home/jtimm/jt_work/GitHub/x_politico")
#saveRDS(congress_tweets_tif, 'congress_tweets_tif.rds')
saveRDS(congress_tweets, 'congress_tweets_tif.rds')

Then we join the Twitter and House lawmaker detail data sets:

congress_tweets <- congress_tweets %>%
  mutate(twitter = toupper(screen_name)) %>%
  select(status_id, created_at, twitter, text) %>%
  inner_join(house_meta %>% filter(type == 'rep'))

For a high-level summary of how often members of the 116th House have been tweeting since taking office, we summarize total tweets by House Rep. The density plot below summarizes the distribution of House Reps’ tweeting habits by party affiliation. So, Democrats (in blue) are a bit more active on Twitter.

total_tweets <- congress_tweets %>%
  group_by(party, twitter) %>%
  summarize(all_tweets = n())

total_tweets %>%
  ggplot(aes(all_tweets, fill = party)) +
  ggthemes::scale_fill_stata() +
  theme_minimal() +
  geom_density(alpha = 0.8, color = 'gray') +
  labs(title = "116th House Rep tweet counts by party affiliation") +
  theme(legend.position = "none")

Some additional summary statistics about the tweeting habits of House Reps by party affiliation:

x <- list(
  'REP' = summary(total_tweets$all_tweets[total_tweets$party == 'Republican']),
  'DEM' = summary(total_tweets$all_tweets[total_tweets$party == 'Democrat']))

cbind(party = names(x$DEM), x %>% bind_rows()) %>%
  mutate(REP = round(REP), DEM = round(DEM)) %>%
  t(.) %>%
  knitr::kable()

party   Min.   1st Qu.   Median   Mean   3rd Qu.   Max.
REP     5      143       230      272    360       1466
DEM     23     278       404      463    591       1624

Extracting referring expressions to 45

With tweets and some legislator details in tow, we can now get a beat on how members of the 116th House refer to POTUS 45 on Twitter. Here we present a quick-simple approach to extracting Twitter-references to 45.

The code below defines the set of referring expressions for 45 (in regex terms) that will be our focus here. It is not exhaustive. The list is ultimately a product of some trial/error, with less frequent forms being culled in the process (eg, #45). We have included “Trump Administration” in this set; while not exactly a direct reference to 45, it is super frequent and (as we will see) an interesting example.

s1 <- "Trump Admin(istration)?"
s2 <- '@realDonaldTrump'
s3 <- '(@)?POTUS'
s4 <- 'Mr(\\.)? President'
s5 <- "the president"
s6 <- '(Pres(\\.)? |President )?(Donald )?\\bTrump'

searches <- c(s1, s2, s3, s4, s5, s6)
potus <- paste(searches, collapse = '|')

The procedure below extracts instantiations of the regex terms/patterns above from each tweet in our corpus.

potus_sum <- lapply(1:nrow(congress_tweets), function(x) {
  spots <- gregexpr(pattern = potus, congress_tweets$text[x], ignore.case = TRUE)
  prez_gram <- regmatches(congress_tweets$text[x], spots)[[1]]
  if (-1 %in% spots) {} else {
    data.frame(doc_id = congress_tweets$status_id[x],
               twitter = congress_tweets$twitter[x],
               prez_gram = toupper(prez_gram),
               stringsAsFactors = FALSE)}
}) %>%
  data.table:::rbindlist() %>%
  mutate(prez_gram = trimws(prez_gram),
         prez_gram = gsub('\\.', '', prez_gram),
         prez_gram = gsub('ADMIN$', 'ADMINISTRATION', prez_gram),
         prez_gram = gsub('PRES ', 'PRESIDENT ', prez_gram),
         prez_gram = gsub('@', '', prez_gram)) %>%
  left_join(house_meta)

A sample of the output is presented below.

set.seed(149)
potus_sum %>%
  select(doc_id:prez_gram) %>%
  sample_n(5) %>%
  knitr::kable()

doc_id                twitter         prez_gram
1126141623803568128   REPMATTGAETZ    REALDONALDTRUMP
1110977855561838593   REPJOEKENNEDY   TRUMP ADMINISTRATION
1110615484335034373   REPFRANKLUCAS   THE PRESIDENT
1131334181815017472   REPTEDLIEU      POTUS
1136024493049163777   REPWILSON       REALDONALDTRUMP

Based on the above output, the table below summarizes the frequency of expressions used to reference 45 by party affiliation.

data.frame(table(potus_sum$party, potus_sum$prez_gram)) %>%
  spread(Var1, Freq) %>%
  rename(prez_gram = Var2) %>%
  rowwise() %>%
  mutate(Total = sum(Democrat, Republican)) %>%
  arrange(desc(Total)) %>%
  janitor::adorn_totals(c('row')) %>% #Cool.
  knitr::kable()

prez_gram                Democrat   Republican   Total
TRUMP                    6818       375          7193
THE PRESIDENT            3816       890          4706
REALDONALDTRUMP          2015       1989         4004
POTUS                    1023       1802         2825
TRUMP ADMINISTRATION     2497       133          2630
PRESIDENT TRUMP          1649       776          2425
DONALD TRUMP             224        23           247
MR PRESIDENT             195        42           237
PRESIDENT DONALD TRUMP   15         12           27
Total                    18252      6042         24294

Party-level stance towards 45

Based on the counts above, we next investigate potential evidence of stancetaking at the party level. Here, we assume that Republicans are supportive of 45 and that Democrats are less supportive. If House Reps are stancetaking on Twitter, we would expect that Democrats use less formal terms to downplay the status of 45 & that Republicans use more formal terms to highlight the status of 45.

To get a sense of which terms are more prevalent among each party, we consider the probability of each party using a particular expression to refer to 45. Then we calculate the degree of formality for a given expression as the simple ratio of the two usage rates – where the higher rate is treated as the numerator. Terms prevalent among Democrats are transformed to negative values.

The table below summarizes these ratios, which can be interpreted as follows: Reps are ~5.3 times more likely than Dem colleagues to refer to 45 on Twitter as POTUS; Dems are ~6 times more likely to refer to 45 as Trump.

ratios <- potus_sum %>%
  group_by(party, prez_gram) %>%
  summarize(n = n()) %>%
  group_by(party) %>%
  mutate(per = round(n/sum(n), 3)) %>%
  group_by(prez_gram) %>%
  mutate(n = sum(n)) %>%
  spread(party, per) %>%
  mutate(ratio = ifelse(Republican > Democrat,
                        Republican/Democrat,
                        -Democrat/Republican),
         ratio = round(ratio, 2)) %>%
  filter(n > 60) %>%
  select(-n) %>%
  arrange(desc(ratio))

ratios %>% knitr::kable()

prez_gram              Democrat   Republican   ratio
POTUS                  0.056      0.298        5.32
REALDONALDTRUMP        0.110      0.329        2.99
PRESIDENT TRUMP        0.090      0.128        1.42
THE PRESIDENT          0.209      0.147        -1.42
MR PRESIDENT           0.011      0.007        -1.57
DONALD TRUMP           0.012      0.004        -3.00
TRUMP                  0.374      0.062        -6.03
TRUMP ADMINISTRATION   0.137      0.022        -6.23

The visualization below summarizes formality ratios for 45 referring expressions as a simple cline. Less formal terms (prevalent among Democrats) are in blue; More formal terms (prevalent among Republicans) are in red.

#cut <- 1.45
ratios %>%
  mutate(col1 = ifelse(ratio > 0, 'red', 'blue')) %>%
  ggplot(aes(x = reorder(prez_gram, ratio), y = ratio, label = prez_gram, color = col1)) +
  # geom_hline(yintercept = cut, linetype = 2, color = 'gray') +
  # geom_hline(yintercept = -cut, linetype = 2, color = 'gray') +
  geom_point(size = 1.5, color = 'darkgray') +
  geom_text(size = 4, hjust = 0, nudge_y = 0.15) +
  annotate('text', y = -5, x = 7, label = 'Democrat') +
  annotate('text', y = 5, x = 3, label = 'Republican') +
  ggthemes::scale_color_stata() +
  theme_minimal() +
  labs(title = "Twitter-based formality cline") + ##?
  theme(legend.position = "none",
        axis.text.y = element_blank(),
        axis.ticks.y = element_blank()) +
  xlab('') + ylab('Polarity') +
  ylim(-7, 7) +
  coord_flip()

So, some real nice variation. Recall our initial (& very generic) formality cline presented in the introduction:

  1. First name only < Last name only < Full name < Title and last name < Title and full name

Compared to our House Rep, Twitter-based, 45-specific cline:

  1. Trump Administration < Trump < Donald Trump < Mr. President < The President < President Trump < realDonaldTrump < POTUS

While alignment between (1) & (2) is not perfect, the two are certainly conceptually comparable, indeed suggesting that House Reps are choosing expressions to refer to 45 based on stance. Terms prevalent among House Dems downplay the status of 45 by excluding titles & explicit reference to the office (eg, TRUMP, DONALD TRUMP). In contrast, terms prevalent among Republicans highlight the status of 45 via direct reference to the office (eg, PRESIDENT TRUMP, POTUS). More neutral terms (eg, MR PRESIDENT, THE PRESIDENT) reference the office but not the individual.

While the Twitter handle @realDonaldTrump does not highlight the status of the presidency per se, it would seem to carry with it some Twitter-based deference. (I imagine the “real-” prefix is also at play here.) The prevalence of the acronym POTUS among Reps is interesting as well. On one hand, it is super economical; on the other hand, the acronym unpacked is arguably the most deferential. The prevalence of Trump Administration among Dems is also curious – it would seem to be a way to reference 45 without actually referencing (or conjuring images of) either the individual or the office.

House Rep stance & 2016 presidential vote margins

The next, and more interesting, piece is how stancetaking plays out at the House Rep level. While the formality cline presented above illustrates some clear divisions between how Dems and Reps refer to the president, its gradient nature speaks to individual variation.

In this section, we (1) present a simple method for quantifying House Rep-level variation in formality when referring to 45, and (2) investigate the extent to which district-level support for 45 in the 2016 presidential election can account for this variation.

ratios <- ratios %>%
  mutate(polarity = case_when(
    ratio > 1.4 ~ 'Formal',
    ratio < -2.5 ~ 'LessFormal',
    ratio > -2.5 & ratio < 1.4 ~ 'Neutral'))

To get started, we first categorize each reference to 45 in our data set as either Formal (POTUS, REALDONALDTRUMP, PRESIDENT TRUMP), Less Formal (TRUMP ADMINISTRATION, TRUMP, DONALD TRUMP), or Neutral (MR PRESIDENT, THE PRESIDENT). Reference to 45 (per legislator) is then represented as a (count-based) distribution across these three formality categories.

wide <- potus_sum %>%
  filter(prez_gram %in% unique(ratios$prez_gram)) %>%
  left_join(ratios %>% select(prez_gram, polarity)) %>%
  group_by(CD) %>%
  mutate(prez_tweets = length(unique(doc_id))) %>%
  group_by(CD, twitter, last_name, party, polarity, prez_tweets) %>%
  summarize(n = n())

Formality distributions for a random set of House Reps are summarized in the plot below. So, lots of variation – and presumably 435 House Reps that refer to 45 with varying degrees of formality.

set.seed(171)
samp <- sample(margins_by_cd$CD, 10)

pal <- c('#395f81', 'gray', '#9e5055')
names(pal) <- c('LessFormal', 'Neutral', 'Formal')

wide %>%
  filter(CD %in% samp) %>%
  group_by(CD) %>%
  mutate(per = n/sum(n)) %>%
  select(-n) %>%
  spread(polarity, per) %>%
  ungroup() %>%
  mutate(rank = rank(Formal),
         lab = paste0(last_name, ' (', CD, '-', substr(party, 1, 1), ')')) %>%
  gather(key = polarity, value = per, LessFormal, Formal, Neutral) %>%
  mutate(polarity = factor(polarity, levels = c('LessFormal', 'Neutral', 'Formal'))) %>%
  ggplot(aes(x = reorder(lab, rank), y = per, fill = polarity)) +
  geom_bar(position = "fill", stat = "identity") +
  coord_flip() +
  theme_minimal() +
  theme(legend.position = 'bottom',
        axis.title.y = element_blank()) +
  scale_fill_manual(values = pal) +
  ggtitle('Example degrees of formality in the 116th House')

Based on these distributions, we define a given House Rep’s degree of formality as the ratio of the number of formal terms used to refer to 45 to the number of less formal terms used to refer to 45 (log-transformed for plotting). Neutral terms are ignored.

Values greater than one indicate a prevalence for referring expressions that highlight the status of 45; values less than one indicate a prevalence for referring expressions that downplay the status of 45. The former reflecting a positive/supportive stance; the latter a negative/less supportive stance. A relative & rough approximation.

wide1 <- wide %>%
  group_by(CD) %>%
  mutate(prez_refs = sum(n)) %>%
  spread(polarity, n) %>%
  ungroup() %>%
  replace(., is.na(.), 1) %>%
  mutate(ratio = round(Formal/LessFormal, 3)) %>%
  inner_join(margins_by_cd) %>%
  left_join(total_tweets)

So, to what extent does a congressional district’s collective support for 45 (per 2016 Trump margins) influence the degree of formality with which their House Rep refers to 45? Do House Reps representing districts that supported HRC in 2016, for example, use less formal terms to convey a negative stance towards 45, and mirror the sentiment of their constituents (ie, their ~Twitter followers & ~audience)?

The plot below illustrates the relationship between House Reps’ degrees of formality on Twitter & 2016 presidential vote margins for their respective congressional districts. As can be noted, there is a fairly strong, positive relationship between the two variables.

wide1 %>%
  filter(prez_tweets > 10) %>%
  ggplot(aes(x = trump_margin, y = log(jitter(ratio)), color = party)) +
  geom_point() +
  geom_smooth(method = "lm", se = T, color = 'steelblue') +
  geom_text(aes(label = last_name), size = 3, check_overlap = TRUE, color = 'black') +
  ggthemes::scale_color_stata() +
  theme_minimal() +
  theme(legend.position = "none",
        axis.title = element_text()) +
  xlab('2016 Trump Vote Margin') + ylab('Degree of Formality') +
  ggtitle('2016 Trump Margins vs. Degree of Formality on Twitter')

So, not only are there systematic differences in how Dems & Reps reference 45 on Twitter, these differences are gradient within/across party affiliation: formality in reference to 45 increases as 2016 Trump margins increase. House Reps are not only hip to how their constituents (the audience) feel about 45 (the referent), but they choose referring expressions (and mediate stance) accordingly.

Prevalence of 45 reference

Next we consider how often members of the 116th House reference 45 on Twitter, which we operationalize here as the percentage of a House Rep’s total tweets that include reference to 45.

wide2 <- wide1 %>%
  mutate(party = gsub('[a-z]', '', party),
         trump_margin = round(trump_margin, 1),
         per_prez = round(prez_tweets/all_tweets, 2)) %>%
  select(CD, last_name, party, per_prez, all_tweets, trump_margin) %>%
  arrange(desc(per_prez))

The density plot below summarizes the distribution of these percentages by party affiliation. A curious plot indeed. The bimodal nature of the House Dem distribution sheds light on two distinct approaches to Twitter & 45 among House Dems. One group that takes a bit of a “no comment” approach and another in which reference to 45 is quite prevalent.

wide2 %>%
  ggplot(aes(per_prez, fill = party)) +
  ggthemes::scale_fill_stata() +
  theme_minimal() +
  geom_density(alpha = 0.8, color = 'gray') +
  labs(title = "Rates of reference to 45 on Twitter") +
  theme(legend.position = "none")

The table below summarizes 45 tweet reference rates for members of the 116th House, along with total tweets & 2016 Trump vote margins for some context. Lots going on for sure. Curious to note that Maxine Waters (CA-43) and Adam Schiff (CA-28) reference 45 on Twitter at the highest rates, despite being fairly infrequent tweeters in general. Almost as if they use Twitter for the express purpose of commenting on the president and/or defending themselves from the president’s Twitter-ire.

out <- wide2 %>%
  DT::datatable(extensions = 'FixedColumns',
                options = list(scrollX = TRUE,
                               fixedColumns = list(leftColumns = 1:3)),
                rownames = FALSE, width = "450") %>%
  DT::formatStyle('per_prez',
                  background = DT::styleColorBar(wide2$per_prez, "lightblue"),
                  backgroundSize = '80% 70%',
                  backgroundRepeat = 'no-repeat',
                  backgroundPosition = 'right')

Rates of 45-reference, total tweets & 2016 Trump margins for members of the 116th House:

{"x":{"filter":"none","extensions":["FixedColumns"],"data":[["CA-43","CA-28","AZ-03","LA-03","MD-04","NC-11","OR-04","TN-09","CA-19","OH-04","CA-27","VA-08","CA-14","CA-02","TX-09","FL-21","NJ-09","AZ-05","NY-17","TX-04","NC-04","RI-01","TX-20","TX-35","AZ-07","CA-13","FL-01","FL-10","LA-01","MD-05","PA-03","CA-12","CA-47","CO-01","IL-01","MN-04","NJ-06","OH-09","WI-02","CA-08","MA-09","MI-04","IL-09","NY-08","TX-15","CA-31","CA-51","FL-02","FL-23","MO-05","NY-13","TX-36","VA-11","WI-04","CA-11","CA-15","CA-29","CA-33","CA-35","CA-37","CA-50","CA-53","IL-02","ME-01","MI-09","NY-18","SC-07","TX-27","TX-34","CA-30","CA-32","FL-20","IL-11","MA-05","NJ-10","NJ-12","NY-05","NY-07","OH-13","OK-01","TX-30","WA-09","CA-40","CA-41","CA-46","CO-07","IL-04","IL-05","KY-03","MI-14","NM-03","NY-16","TN-04","TN-07","TX-12","TX-16","CA-09","CA-34","CT-03","FL-11","FL-22","MA-02","MI-05","NY-10","NY-20","PA-02","TX-14","TX-18","TX-19","AL-04","CA-18","CA-39","CO-05","IN-07","MA-04","MI-13","PA-18","SC-03","TX-05","VT-AL","GA-08","MI-01","OH-03","PA-16","TX-26","WV-02","CA-06","CA-24","CA-48","CA-49","GA-04","GA-10","IN-08","LA-04","MD-02","MD-07","MD-08","NY-15","WA-07","CA-03","CA-17","CA-38","FL-04","FL-14","FL-19","GA-01","GA-09","LA-05","MO-08","NV-01","OK-02","TX-25","AR-01","CA-04","IA-04","KS-04","NC-01","NJ-01","NY-01","NY-06","OH-07","TX-11","WI-07","AL-05","AZ-02","CA-23","CA-52","FL-03","FL-09","IN-03","MD-01","MD-03","MD-06","MO-06","NH-02","OH-08","OR-01","PA-14","SC-05","TN-03","TX-06","TX-08","TX-33","WI-05","AZ-04","CT-04","FL-26","GA-11","GA-12","MO-03","NY-04","NY-12","NY-23","RI-02","TX-02","WA-04","WA-10","AL-01","AL-07","CO-02","FL-25","FL-27","IL-10","NY-02","NY-09","NY-25","OK-04","PA-04","PA-05","PA-13","TN-01","WA-01","AR-04","CA-21","CA-45","FL-16","FL-24","GA-02","GA-07","IL-18","MN-05","NC-06","NC-07","NJ-08","SC-06","TN-06","TX-01","VA-10","WY-AL","CA-01","CA-44","FL-17","IL-06","IL-12","IL-17","IN-02","MS-03","MS-04","NC-08","NM-02","TX-28","UT-02","WV-03","CA-07","IL-16","KY-05","MA-08","MO-04","MS-02","NC-10","NY-26","PA-11","SD-AL","TN-05","TN-08","TX-21","VA-01","VA-03","VA-04","WA-02","AK-AL","CA-16","CA-22","CT-02","GA-14","HI-02","IL-08","IN-06","ME-02","MN-01","NE-03","NH-01","OH-06","SC-02","TX-10","TX-29","VA-06","AL-02","AL-06","AZ-08","AZ-09","CA-25","CO-04","FL-06","IL-03","KY-04","LA-02","MA-01","MA-06","MI-08","NC-05","NC-12","NC-13","ND-AL","NM-01","NV-04","OR-02","PA-09","TX-31","AL-03","AR-02","CA-20","CA-42","CO-06","DE-AL","FL-07","FL-13","GA-05","IA-02","IL-13","KS-02","KY-02","KY-06","MA-03","MI-10","MN-03","NE-01","NJ-07","NY-27","OH-10","OH-11","OH-14","OK-03","PA-06","PA-07","PA-08","PA-15","SC-04","TX-13","TX-17","TX-32","WA-05","WI-08","AR-03","CA-05","CA-36","CT-01","FL-15","FL-18","GA-03","IA-03","ID-02","IL-07","LA-06","MI-12","MN-08","NE-02","NY-14","NY-21","OH-15","OR-05","PA-10","UT-03","WV-01","CO-03","FL-05","FL-12","GA-06","IA-01","ID-01","IL-14","IN-05","MA-07","MI-06","MI-07","MI-11","MO-02","MO-07","NY-03","NY-24","TN-02","TX-22","CT-05","HI-01","KS-03","MI-02","MN-06","MS-01","NC-02","NJ-02","NJ-11","NV-03","NY-11","NY-22","OH-02","OH-05","OH-16","SC-01","TX-03","TX-07","TX-23","UT-04","VA-02","VA-05","VA-07","WA-03","WA-08","WI-03","WI-06","AZ-01","AZ-06","CA-10","GA-13","IN-04","MN-02","NJ-03","NJ-05","NY-19","OH-12","OK-05","PA-01","PA-17","VA-09","WA-06","WI-01"],["Waters","Schiff","Grijalva","Higgins","Brown","Meadows","DeFazio","Cohen","Lofgren","Jordan","Chu","Beyer","Speier","Huffman","Green","Frankel","Pascrell","Biggs","Lowey",
"Ratcliffe","Price","Cicilline","Castro","Doggett","Gallego","Lee","Gaetz","Demings","Scalise","Hoyer","Evans","Pelosi","Lowenthal","DeGette","Rush","McCollum","Pallone","Kaptur","Pocan","Cook","Keating","Moolenaar","Schakowsky","Jeffries","Gonzalez","Aguilar","Vargas","Dunn","Wasserman Schultz","Cleaver","Espaillat","Babin","Connolly","Moore","DeSaulnier","Swalwell","Cárdenas","Lieu","Torres","Bass","Hunter","Davis","Kelly","Pingree","Levin","Maloney","Rice","Cloud","Vela","Sherman","Napolitano","Hastings","Foster","Clark","Payne","Watson Coleman","Meeks","Velázquez","Ryan","Hern","Johnson","Smith","Roybal-Allard","Takano","Correa","Perlmutter","García","Quigley","Yarmuth","Lawrence","Luján","Engel","DesJarlais","Green","Granger","Escobar","McNerney","Gomez","DeLauro","Webster","Deutch","McGovern","Kildee","Nadler","Tonko","Boyle","Weber","Jackson Lee","Arrington","Aderholt","Eshoo","Cisneros","Lamborn","Carson","Kennedy","Tlaib","Doyle","Duncan","Gooden","Welch","Scott","Bergman","Beatty","Kelly","Burgess","Mooney","Matsui","Carbajal","Rouda","Levin","Johnson","Hice","Bucshon","Johnson","Ruppersberger","Cummings","Raskin","Serrano","Jayapal","Garamendi","Khanna","Sánchez","Rutherford","Castor","Rooney","Carter","Collins","Abraham","Smith","Titus","Mullin","Williams","Crawford","McClintock","King","Estes","Butterfield","Norcross","Zeldin","Meng","Gibbs","Conaway","Duffy","Brooks","Kirkpatrick","McCarthy","Peters","Yoho","Soto","Banks","Harris","Sarbanes","Trone","Graves","Kuster","Davidson","Bonamici","Reschenthaler","Norman","Fleischmann","Wright","Brady","Veasey","Sensenbrenner","Gosar","Himes","Mucarsel-Powell","Loudermilk","Allen","Luetkemeyer","Rice","Maloney","Reed","Langevin","Crenshaw","Newhouse","Heck","Byrne","Sewell","Neguse","Diaz-Balart","Shalala","Schneider","King","Clarke","Morelle","Cole","Dean","Scanlon","Joyce","Roe","DelBene","Westerman","Cox","Porter","Buchanan","Wilson","Bishop","Woodall","LaHood","Omar","Walker","Rouzer","Sires","Clyburn","Rose","Gohmert","Wexton","Cheney","LaMalfa","Barragán","Steube","Casten","Bost","Bustos","Walorski","Guest","Palazzo","Hudson","Torres Small","Cuellar","Stewart","Miller","Bera","Kinzinger","Rogers","Lynch","Hartzler","Thompson","McHenry","Higgins","Smucker","Johnson","Cooper","Kustoff","Roy","Wittman","Scott","McEachin","Larsen","Young","Costa","Nunes","Courtney","Graves","Gabbard","Krishnamoorthi","Pence","Golden","Hagedorn","Smith","Pappas","Johnson","Wilson","McCaul","Garcia","Cline","Roby","Palmer","Lesko","Stanton","Hill","Buck","Waltz","Lipinski","Massie","Richmond","Neal","Moulton","Slotkin","Foxx","Adams","Budd","Armstrong","Haaland","Horsford","Walden","Meuser","Carter","Rogers","Hill","Panetta","Calvert","Crow","Blunt Rochester","Murphy","Crist","Lewis","Loebsack","Davis","Watkins","Guthrie","Barr","Trahan","Mitchell","Phillips","Fortenberry","Malinowski","Collins","Turner","Fudge","Joyce","Lucas","Houlahan","Wild","Cartwright","Thompson","Timmons","Thornberry","Flores","Allred","McMorris Rodgers","Gallagher","Womack","Thompson","Ruiz","Larson","Spano","Mast","Ferguson","Axne","Simpson","Davis","Graves","Dingell","Stauber","Bacon","Ocasio-Cortez","Stefanik","Stivers","Schrader","Perry","Curtis","McKinley","Tipton","Lawson","Bilirakis","McBath","Finkenauer","Fulcher","Underwood","Brooks","Pressley","Upton","Walberg","Stevens","Wagner","Long","Suozzi","Katko","Burchett","Olson","Hayes","Case","Davids","Huizenga","Emmer","Kelly","Holding","Van 
Drew","Sherrill","Lee","Rose","Brindisi","Wenstrup","Latta","Gonzalez","Cunningham","Taylor","Fletcher","Hurd","McAdams","Luria","Riggleman","Spanberger","Herrera Beutler","Schrier","Kind","Grothman","O’Halleran","Schweikert","Harder","Scott","Baird","Craig","Kim","Gottheimer","Delgado","Balderson","Horn","Fitzpatrick","Lamb","Griffith","Kilmer","Steil"],["D","D","D","R","D","R","D","D","D","R","D","D","D","D","D","D","D","R","D","R","D","D","D","D","D","D","R","D","R","D","D","D","D","D","D","D","D","D","D","R","D","R","D","D","D","D","D","R","D","D","D","R","D","D","D","D","D","D","D","D","R","D","D","D","D","D","R","R","D","D","D","D","D","D","D","D","D","D","D","R","D","D","D","D","D","D","D","D","D","D","D","D","R","R","R","D","D","D","D","R","D","D","D","D","D","D","R","D","R","R","D","D","R","D","D","D","D","R","R","D","R","R","D","R","R","R","D","D","D","D","D","R","R","R","D","D","D","D","D","D","D","D","R","D","R","R","R","R","R","D","R","R","R","R","R","R","D","D","R","D","R","R","R","R","D","R","D","R","D","R","R","D","D","R","D","R","D","R","R","R","R","R","D","R","R","D","D","R","R","R","D","D","R","D","R","R","D","R","D","D","R","D","D","R","D","D","R","D","D","R","R","D","R","D","D","R","D","D","R","R","D","R","R","D","D","R","R","D","R","R","D","R","D","R","D","R","R","R","R","D","D","R","R","D","R","R","D","R","D","R","D","R","R","D","R","R","R","D","D","D","R","D","R","D","R","D","D","R","D","R","R","D","R","R","R","D","R","R","R","R","D","D","R","R","D","R","D","D","D","D","R","D","R","R","D","D","R","R","R","R","R","D","R","D","D","D","D","D","D","R","R","R","R","D","R","D","R","D","R","R","D","R","R","D","D","D","R","R","R","R","D","R","R","R","D","D","D","R","R","R","D","R","D","R","D","R","R","D","R","R","D","R","R","R","R","D","R","D","D","R","D","R","D","R","R","D","R","R","D","R","R","R","D","D","D","R","R","R","R","D","D","D","D","D","R","R","R","D","R","D","R","D","D","R","D","R","D","D","R","D","R","D","D","R","D","D","D","D","R","D","R","D","R","D","R"],[0.6,0.55,0.42,0.38,0.38,0.36,0.36,0.36,0.34,0.34,0.33,0.33,0.32,0.31,0.31,0.3,0.3,0.29,0.29,0.29,0.28,0.28,0.28,0.28,0.27,0.27,0.27,0.27,0.27,0.27,0.27,0.26,0.26,0.26,0.26,0.26,0.26,0.26,0.26,0.25,0.25,0.25,0.24,0.24,0.24,0.23,0.23,0.23,0.23,0.23,0.23,0.23,0.23,0.23,0.22,0.22,0.22,0.22,0.22,0.22,0.22,0.22,0.22,0.22,0.22,0.22,0.22,0.22,0.22,0.21,0.21,0.21,0.21,0.21,0.21,0.21,0.21,0.21,0.21,0.21,0.21,0.21,0.2,0.2,0.2,0.2,0.2,0.2,0.2,0.2,0.2,0.2,0.2,0.2,0.2,0.2,0.19,0.19,0.19,0.19,0.19,0.19,0.19,0.19,0.19,0.19,0.19,0.19,0.19,0.18,0.18,0.18,0.18,0.18,0.18,0.18,0.18,0.18,0.18,0.18,0.17,0.17,0.17,0.17,0.17,0.17,0.16,0.16,0.16,0.16,0.16,0.16,0.16,0.16,0.16,0.16,0.16,0.16,0.16,0.15,0.15,0.15,0.15,0.15,0.15,0.15,0.15,0.15,0.15,0.15,0.15,0.15,0.14,0.14,0.14,0.14,0.14,0.14,0.14,0.14,0.14,0.14,0.14,0.13,0.13,0.13,0.13,0.13,0.13,0.13,0.13,0.13,0.13,0.13,0.13,0.13,0.13,0.13,0.13,0.13,0.13,0.13,0.13,0.13,0.12,0.12,0.12,0.12,0.12,0.12,0.12,0.12,0.12,0.12,0.12,0.12,0.12,0.11,0.11,0.11,0.11,0.11,0.11,0.11,0.11,0.11,0.11,0.11,0.11,0.11,0.11,0.11,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.09,0.09,0.09,0.09,0.09,0.09,0.09,0.09,0.09,0.09,0.09,0.09,0.09,0.09,0.08,0.08,0.08,0.08,0.08,0.08,0.08,0.08,0.08,0.08,0.08,0.08,0.08,0.08,0.08,0.08,0.08,0.07,0.07,0.07,0.07,0.07,0.07,0.07,0.07,0.07,0.07,0.07,0.07,0.07,0.07,0.07,0.07,0.07,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.05,0.05,0.05,0.05,0.05,0.05,0.05,0.05,0.05,0.05,0.05,0.05,0.05,0.05,0.05
,0.05,0.05,0.05,0.05,0.05,0.05,0.05,0.05,0.05,0.05,0.05,0.05,0.05,0.05,0.05,0.05,0.05,0.05,0.05,0.04,0.04,0.04,0.04,0.04,0.04,0.04,0.04,0.04,0.04,0.04,0.04,0.04,0.04,0.04,0.04,0.04,0.04,0.04,0.04,0.04,0.03,0.03,0.03,0.03,0.03,0.03,0.03,0.03,0.03,0.03,0.03,0.03,0.03,0.03,0.03,0.03,0.03,0.03,0.02,0.02,0.02,0.02,0.02,0.02,0.02,0.02,0.02,0.02,0.02,0.02,0.02,0.02,0.02,0.02,0.02,0.02,0.02,0.02,0.02,0.02,0.02,0.02,0.02,0.02,0.02,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01],[114,185,615,121,432,226,109,781,87,312,330,387,586,102,213,502,505,1033,279,112,405,284,412,1045,267,1036,882,820,457,1269,617,745,678,449,309,708,1201,295,411,16,71,20,545,310,215,134,235,147,585,420,1235,221,309,389,835,370,241,297,508,350,201,171,554,430,498,537,215,40,23,332,224,326,262,327,381,497,308,1026,444,156,119,275,564,409,116,527,454,732,152,352,803,289,183,503,20,675,145,589,330,57,783,574,323,332,402,570,132,884,278,38,129,542,197,307,180,317,294,352,363,133,142,92,879,88,329,204,300,583,401,431,304,279,241,282,305,348,288,229,1237,521,1092,184,236,563,65,142,382,148,306,744,247,314,195,74,143,257,260,259,346,339,116,36,413,238,549,505,933,135,367,692,137,355,411,419,548,408,911,155,220,370,213,526,391,85,344,507,1624,240,172,140,278,1160,208,608,265,111,212,859,481,730,579,768,372,237,347,646,140,451,831,437,178,576,256,405,467,129,763,126,325,358,628,569,151,858,429,216,378,866,130,362,551,146,705,230,738,254,266,402,285,225,254,218,242,316,126,59,491,414,396,131,421,275,276,266,383,759,360,203,549,646,118,215,14,551,185,204,207,416,351,321,249,617,321,385,174,462,215,322,105,428,221,592,174,380,540,843,49,108,366,303,94,879,272,327,995,806,296,221,262,421,318,352,420,644,231,307,364,38,140,495,332,38,184,1110,417,479,223,195,225,128,291,239,172,704,529,111,337,73,41,402,436,407,418,545,964,301,357,203,498,179,388,89,114,141,623,261,1466,27,383,141,131,182,437,169,144,228,188,204,275,193,354,459,539,407,213,796,228,298,259,164,194,775,638,281,394,144,215,286,64,284,508,821,439,716,241,351,251,476,162,571,197,365,780,588,1095,171,210,169,551,692,253,515,80,170,592,343,963,815,230,251,367,185,100,542,544],[-61.7,-49.8,-29.9,38.1,-57.5,29.2,-0.1,-57.7,-51.4,33.6,-37.6,-52.6,-58.7,-45.7,-61.3,-19.5,-31.2,21.1,-20.2,53.6,-40,-25.6,-26.7,-33.6,-49.2,-80.6,39.3,-26.9,42,-32,-83.9,-77.5,-31.6,-45.8,-54,-30.9,-15.6,-22.2,-36.8,15.1,-10.7,24.8,-45.2,-71.1,-16.7,-21.1,-49,35.6,-26.1,-13.5,-86.9,46.8,-39.4,-52.2,-48.8,-45.7,-60.9,-41.3,-40.8,-76.1,15,-34.9,-58.9,-14.8,-7.8,1.9,18.9,23.6,-21.5,-43.4,-38.9,-62.1,-23.5,-43.6,-72.4,-33.2,-73,-76.5,-6.5,28.7,-60.8,-47.2,-69.4,-27.9,-38.4,-12,-68.9,-46.6,-15,-60.9,-15.1,-52.6,41.2,39.3,30.2,-40.7,-18.6,-72.9,-15.5,32.3,-15.8,-19.4,-4.2,-59.5,-13.5,-48,19.8,-56.5,49,63,-53.2,-8.6,24,-22.8,-24.2,-60.7,-27.5,38,28.4,-26.4,28.9,21.3,-38.4,20,26.5,36.4,-44.8,-20.2,-1.7,-7.5,-53.1,25.5,33.7,24,-24.4,-55.6,-34.4,-88.9,-69.9,-12.6,-53.4,-39.6,28,-18.2,22.1,15.5,58.5,29.4,54.4,-29,50.1,14.9,34.8,14.7,27.4,27.2,-37,-24.5,12.3,-33,29.7,58.7,20.4,33.4,-4.9,22,-22.5,16,-12.9,35,28.6,-30.5,-15.1,31.4,-2.4,34.5,-22.8,29,18.5,35.2,12.3,48.8,-49.2,20.1,40.2,-23,-16.3,25,16.2,39,-9.6,-69.8,14.8,-7.1,9.3,22.8,-11.4,29.4,-41.2,-21.3,1.8,-19.6,-29.4,9.1,-69.1,-16.4,37.4,-19.3,-28.2,45.7,57,-16.3,32.9,-15.5,-5.4,10.7,-67.5,-11.7,6.3,27.3,-55.2,14.7,17.7,-54.2,-36.5,48.9,46.9,-10,47.6,19.7,-70.7,27.2,-7,14.8,0.7,23.2,24.5,41.2,15,10.2,-19.8,14,49.2,-11.4,17.2,62.1,-26,36,-28.5,24.6,-19.6,25.8,29.8,-18.3,35.6,10,12.4,-31.7,-21.6,-22.1,15.2,-21.6,9.5,-2.9,52.9,-31.8,-
21.7,40.3,10.3,14.9,54.9,1.6,42.6,17.7,9.1,-45.7,24.8,31.9,44.7,21.1,-16.3,-6.7,23.1,17,-15.3,35.9,-52.4,-20.7,-17.9,6.7,17.6,-40,9.4,36.4,-16.5,-4.9,20.1,34,12.7,33,10.7,-47.2,12,-8.9,-11.5,-7.3,-3.2,-73.1,4.1,5.5,18.4,39.9,15.3,-22.8,32.2,-9.4,21.3,-1.1,24.5,7.3,-63.5,11.5,52.7,-9.3,-1.1,9.6,43.3,25.7,63,17.5,-1.9,13.1,17.6,31.4,-44.9,-8.8,-23.1,10,9.2,31.5,3.5,24.7,-78.2,33.8,-26.3,15.6,2.2,-57.9,13.9,15.4,-4.2,8.9,23.9,41.6,12,-25.4,18.6,1.5,3.5,38.3,3.9,11.8,-72.2,8.4,17,4.4,10.3,45.7,-6.1,-3.6,35.4,7.9,-4.1,-32.6,-1.2,17.6,25.7,33,9.6,4.6,0.9,1,9.8,15.5,16.1,25.1,16.7,13.1,14.2,-1.4,-3.4,6.7,3.4,11.1,6.5,7.4,-3,4.5,16.9,1.1,10,-3,-44.4,34.1,1.2,6.2,1.1,6.8,11.3,13.4,-2,2.6,41.5,-12.3,10.3]],"container":"

\n

\n

\n

CD<\/th>\n

last_name<\/th>\n

party<\/th>\n

per_prez<\/th>\n

all_tweets<\/th>\n

trump_margin<\/th>\n <\/tr>\n <\/thead>\n<\/table>","options":{"scrollX":true,"fixedColumns":{"leftColumns":[1,2,3]},"columnDefs":[{"className":"dt-right","targets":[3,4,5]}],"order":[],"autoWidth":false,"orderClasses":false,"rowCallback":"function(row, data) {\nvar value=data[3]; $(this.api().cell(row, 3).node()).css({'background':isNaN(parseFloat(value)) || value <= 0.010000 ? '' : 'linear-gradient(90.000000deg, transparent ' + (0.600000 - value)/0.590000 * 100 + '%, lightblue ' + (0.600000 - value)/0.590000 * 100 + '%)','background-size':'80% 70%','background-repeat':'no-repeat','background-position':'right'});\n}"}},"evals":["options.rowCallback"],"jsHooks":[]}

Last question, then: to what extent does a congressional district’s collective support for 45 (per 2016 Trump margins) influence the rate at which House Reps reference 45 on Twitter?

The much talked about freshmen class of House Dems, for example, is largely comprised of folks from districts that supported Trump in 2016. As such, freshmen Dems are generally more centrist ideologically, representing districts with mixed feeling towards 45. Do they tend to play it safe on Twitter (and with their constituents), and keep the president’s name out of their Twitter mouths?

Per the plot below, this would seem to be the case (although freshmen Dems are not explicitly identified). Circle size reflects total tweet count. House members on both sides of the aisle representing districts with slimmer 2016 Trump margins reference 45 on Twitter at lower rates.

wide2 %>%
  ggplot(aes(x = trump_margin, y = per_prez, color = as.factor(party), size = all_tweets)) +
  geom_point() +
  geom_smooth(method = "lm", se = T) +
  geom_text(aes(label = last_name), size = 3, check_overlap = TRUE, color = 'black') +
  ggthemes::scale_color_stata() +
  theme_minimal() +
  theme(legend.position = "none", axis.title = element_text()) +
  scale_y_continuous(limits = c(0, .4)) +
  xlab('2016 Trump Margin') +
  ylab('Reference-to-Trump Rate') +
  ggtitle('2016 Trump Margins vs. Reference-to-Trump Rates')

Seemingly a no-brainer if you don’t want to ruffle any feathers within an ideologically heterogeneous constituency, and if you want to fly under 45’s Twitter radar. On the other hand, House Reps in safer (i.e., ideologically more uniform) districts (especially Dems) are more likely to comment (or sound off) on the doings of 45.

Summary

So, a couple of novel metrics for investigating variation in the how & how often of 45-reference on Twitter in the 116th House. Simple methods (that could certainly be tightened up some) & intuitive results that align quite well with linguistic/stance theory. Also some super interesting & robust relationships based on two very disparately-sourced data sets: 2016 Trump margins and Twitter text data (ca. present day).

The predictive utility of 2016 presidential voting margins seems (roughly) limitless. As does the cache of socio-political treasure hidden in the tweets of US lawmakers – for better or worse. A fully reproducible post. Cheers.

References

Berg, Esther van den, Katharina Korfhage, Josef Ruppenhofer, Michael Wiegand, and Katja Markert. 2019. “Not My President: How Names and Titles Frame Political Figures.” In Proceedings of the Third Workshop on Natural Language Processing and Computational Social Science, 1–6.

Du Bois, John W. 2007. “The Stance Triangle.” Stancetaking in Discourse: Subjectivity, Evaluation, Interaction 164 (3): 139–82.


To leave a comment for the author, please follow the link and comment on their blog: Jason Timm. R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job. Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Correspondence Analysis visualization using ggplot

Tue, 20/08/2019 - 02:00

[This article was first published on Rcrastinate, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here) Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

What we want to do

Recently, I used a correspondence analysis from the ca package in a paper. All of the figures in the paper were done with ggplot. So, I wanted the visualization for the correspondence analysis to match the style of the other figures. The standard plot method plot.ca(), however, produces base graphics plots. So, I had to create the ggplot visualization myself. Actually, I don’t know if there are any packages that take a ca object (created by the ca package) and produce ggplots from it. I found this website, but it uses the FactoMineR/factoextra package to run and visualize the correspondence analysis.

So, off we go… let’s build our own ggplot-based visualization for ca objects.

Getting the data

I’m going to demonstrate this using data from a linguistic experiment. You could also use, for example, the HairEyeColor dataset that comes with R. In this case, you’ll have to select a specific sub-table, e.g. HairEyeColor[,,"Female"], to get a 2-dimensional table.
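As a quick illustration (not used further in this post), the built-in data could be prepared and analyzed like this; the object name hec.female is of course arbitrary:

library(ca)
hec.female <- HairEyeColor[,,"Female"]  # 2-dimensional Hair x Eye sub-table
ca(hec.female)                          # correspondence analysis of the sub-table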

Let’s start by loading the data. You can get it from my Dropbox. It’s a 2-dimensional table with 3 rows and 7 columns. This was an association experiment in German: the task of the participants was to associate several items of three different linguistic constructions (rows) with different media or text types (columns). I will not deal with the conceptual difference between media and text types here.

struc.assoc <- readRDS("LangStrucAssoc.Rds")

This is the table.

          Text mess. Voice mess. Newspaper E-mail Soc.Netw. Letter Other
V-final          157         125       114    190       112    147    23
V2               175         210        14     80       128     39    15
Ellipsis         293         128         6     43       152     12    12

I’ll briefly explain what the rows and columns mean. In the rows, there are three different constructions.

  • V-final: As you might know, in Standard German, the finite verb is put at the end of dependent subclauses. We presented “because”-clauses, and this is how such a sentence would look in Standard German: “Er mag sein Auto, weil es sparsam ist.” (He likes his car, because it economical is.).
  • V2: If you are an English speaker, you might be more familiar with this construction. It is not considered written Standard German but it is OK to use it in spoken language. V2 means that the finite verb goes at the second position in the dependent subclause: “Er mag sein Auto, weil es ist sparsam.” (He likes his car, because it is economic.)
  • Ellipsis: This sounds very colloquial but most people would understand what you mean. In the ellipsis construction we used, we simply dropped the verb altogether: “Er mag sein Auto, weil sparsam.” (He likes his car, because economic.)

Now, each participant was presented with nine such sentences (three per construction) and had to check which of the media/text types they think it could appear in. We included some media that are clearly more prone to written Standard German than others (like the newspaper or a letter). “Soc.Netw.” (social networks) was maybe a bit underspecified from our side. There are a lot of different social networks and each community has its own “writing style” (at least one!). But we’ll see where the correspondence analysis puts this item.

Correspondence analysis

I’ll do a simple ca() and will plot the result while I’m also saving the plot object in the variable ca.plot.

library(ca)
ca.fit <- ca(struc.assoc)
ca.plot <- plot(ca.fit)

As you can see, (almost) all the information we need is in the plot object.

str(ca.plot)
## List of 2
##  $ rows: num [1:3, 1:2] -0.51 0.202 0.478 0.05 -0.235 ...
##   ..- attr(*, "dimnames")=List of 2
##   .. ..$ : chr [1:3] "V-final" "V2" "Ellipsis"
##   .. ..$ : chr [1:2] "Dim1" "Dim2"
##  $ cols: num [1:7, 1:2] 0.356 0.201 -0.912 -0.448 0.247 ...
##   ..- attr(*, "dimnames")=List of 2
##   .. ..$ : chr [1:7] "Text mess." "Voice mess." "Newspaper" "E-mail" ...
##   .. ..$ : chr [1:2] "Dim1" "Dim2"

Only the variance contributions for the dimensions are missing. I will get them from the original ca.fit object later.

Converting the plot object

For ggplot, we will need a dataframe with the labels, the coordinates for the two dimensions and the name of the variable which is stored in rows and columns. The following function make.ca.plot.df() converts the plot object (parameter ca.plot.obj) into such a dataframe. If you want, you can put the variable names for rows and columns as arguments row.lab and col.lab. These are used in the legend later.

make.ca.plot.df <- function (ca.plot.obj, row.lab = "Rows", col.lab = "Columns") {
  df <- data.frame(Label = c(rownames(ca.plot.obj$rows), rownames(ca.plot.obj$cols)),
                   Dim1 = c(ca.plot.obj$rows[,1], ca.plot.obj$cols[,1]),
                   Dim2 = c(ca.plot.obj$rows[,2], ca.plot.obj$cols[,2]),
                   Variable = c(rep(row.lab, nrow(ca.plot.obj$rows)),
                                rep(col.lab, nrow(ca.plot.obj$cols))))
  rownames(df) <- 1:nrow(df)
  df
}

ca.plot.df <- make.ca.plot.df(ca.plot, row.lab = "Construction", col.lab = "Medium")
ca.plot.df$Size <- ifelse(ca.plot.df$Variable == "Construction", 2, 1)

I also want the points for the three constructions to be bigger than the points for the different media/text types. This is why I included the last line in the code chunk above. Please note that the numbers we supplied for sizes (2 and 1) are not the actual sizes of the points in the plot. These are simply two values that are mapped on the size scale later.

ca.plot.df looks like this now.

       Label       Dim1       Dim2     Variable Size
     V-final -0.5095947  0.0499651 Construction    2
          V2  0.2019318 -0.2346586 Construction    2
    Ellipsis  0.4780980  0.1729715 Construction    2
  Text mess.  0.3559765  0.1712304       Medium    1
 Voice mess.  0.2009605 -0.2765821       Medium    1
   Newspaper -0.9117981  0.1577468       Medium    1
      E-mail -0.4478077 -0.0360625       Medium    1
   Soc.Netw.  0.2465235  0.0289500       Medium    1
      Letter -0.7218847  0.0083225       Medium    1
       Other -0.1377860 -0.0361663       Medium    1

Getting variances

ca.plot.df is already fine for plotting. Only the variance contributions of the two dimensions are missing. We can get them from the summary() of the ca.fit object. If you want, you can do str(ca.sum) to see what is held in this object and how to access the contribution values.

ca.sum <- summary(ca.fit)
dim.var.percs <- ca.sum$scree[,"values2"]
dim.var.percs
## [1] 87.35737 12.64263

That worked. These values are the ones plotted next to the dimension labels in the base graphics plot above.

Plotting

Now for plotting. I’ll start by declaring the aesthetic mappings, the dashed lines for x = 0 and y = 0, and putting in the points.

library(ggplot2)
library(ggrepel)

p <- ggplot(ca.plot.df,
            aes(x = Dim1, y = Dim2,
                col = Variable, shape = Variable,
                label = Label, size = Size)) +
  geom_vline(xintercept = 0, lty = "dashed", alpha = .5) +
  geom_hline(yintercept = 0, lty = "dashed", alpha = .5) +
  geom_point()

Now, this is going to be a little complicated. With the limits argument of scale_[x/y]_continuous, I want to make the plot region a little bigger than the range of the points. I’m doing this by getting the ranges of the dimensions (Dim1 for x, and Dim2 for y). To these I am adding and subtracting a fraction (here: 0.2) of the distance between the minimum and the maximum value.
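The same calculation can be wrapped in a tiny helper; this is just an illustration of the idea, and the helper (and its name widen_limits) is not part of the original code:

# Hypothetical helper: widen a numeric range by a fraction of its width
widen_limits <- function(x, frac = 0.2) {
  range(x) + c(-1, 1) * diff(range(x)) * frac
}
widen_limits(ca.plot.df$Dim1)  # could be passed to scale_x_continuous(limits = ...)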

With the scale_size() component, I am controlling how small the smallest label and how large the largest label will be. People helped me with this in this stackoverflow question. Cheers!

Then, I am adding the labels that are automatically being repelled from each other and the data points. I played around with the parameters here to achieve a nice result. With the guides() component, I am overriding the size scale for the legend because I want the points to have different sizes in the plot but not in the legend.

p <- p +
  scale_x_continuous(limits = range(ca.plot.df$Dim1) +
                       c(diff(range(ca.plot.df$Dim1)) * -0.2,
                         diff(range(ca.plot.df$Dim1)) * 0.2)) +
  scale_y_continuous(limits = range(ca.plot.df$Dim2) +
                       c(diff(range(ca.plot.df$Dim2)) * -0.2,
                         diff(range(ca.plot.df$Dim2)) * 0.2)) +
  scale_size(range = c(4, 7), guide = F) +
  geom_label_repel(show.legend = F, segment.alpha = .5,
                   point.padding = unit(5, "points")) +
  guides(colour = guide_legend(override.aes = list(size = 4)))

OK, almost there. The last thing to do is to define all the labels and set a theme (I like theme_minimal()). Please note that for the labels of the axes, I am using the object dim.var.percs we constructed from the summary of the fit above.

p <- p +
  labs(x = paste0("Dimension 1 (", signif(dim.var.percs[1], 3), "%)"),
       y = paste0("Dimension 2 (", signif(dim.var.percs[2], 3), "%)"),
       col = "", shape = "") +
  theme_minimal()

plot(p)

That’s basically it. Interpreting the results is not within the scope of this post. In short: you can see how text messages are in proximity to the ellipsis construction (presumably because text messages are strongly associated with shorter texts). Also, newspapers, letters, and e-mails are associated with the written Standard German construction. The only medium that is associated with V2 (the “spoken” construction) is indeed the only spoken medium (voice message).


To leave a comment for the author, please follow the link and comment on their blog: Rcrastinate. R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job. Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

simstudy updated to version 0.1.14: implementing Markov chains

Tue, 20/08/2019 - 02:00

[This article was first published on ouR data generation, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here) Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

I’m developing study simulations that require me to generate a sequence of health status for a collection of individuals. In these simulations, individuals gradually grow sicker over time, though sometimes they recover slightly. To facilitate this, I am using a stochastic Markov process, where the probability of a health status at a particular time depends only on the previous health status (in the immediate past). While there are packages to do this sort of thing (see for example the markovchain package), I hadn’t yet stumbled upon them while I was tackling my problem. So, I wrote my own functions, which I’ve now incorporated into the latest version of simstudy that is now available on CRAN. As a way of announcing the new release, here is a brief overview of Markov chains and the new functions. (See here for a more complete list of changes.)

Markov processes

The key “parameter” of a stochastic Markov process is the transition matrix, which defines the probability of moving from one state to another (or remaining in the same state). Each row of the matrix is indexed by the current state, while the columns are indexed by the target state. The values of the matrix represent the probabilities of transitioning from the current state to the target state. The sum of the probabilities across each row must equal one.

In the transition matrix below, there are three states \((1, 2, 3)\). The probability of moving from state 1 to state 3 is represented by \(p_{13}\). Likewise the probability of moving from state 3 to state 2 is \(p_{32}\). And \(\sum_{j=1}^3 p_{ij} = 1\) for all \(i \in (1,2,3)\).

\[
\left(
\begin{matrix}
p_{11} & p_{12} & p_{13} \\
p_{21} & p_{22} & p_{23} \\
p_{31} & p_{32} & p_{33}
\end{matrix}
\right )
\]

Here’s a possible \(3 \times 3\) transition matrix:

\[
\left(
\begin{matrix}
0.5 & 0.4 & 0.1 \\
0.2 & 0.5 & 0.3 \\
0.0 & 0.0 & 1.0
\end{matrix}
\right )
\]

In this case, the probability of moving from state 1 to state 2 is \(40\%\), whereas there is no possibility that you can move from 3 to 1 or 2. (State 3 is considered to be an “absorbing” state since it is not possible to leave; if we are talking about health status, state 3 could be death.)
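Before turning to simstudy, here is a minimal base-R sketch (my own illustration, not simstudy code) of how a single transition from this matrix could be simulated; it also checks that each row sums to one:

tmat <- matrix(c(0.5, 0.4, 0.1,
                 0.2, 0.5, 0.3,
                 0.0, 0.0, 1.0), nrow = 3, byrow = TRUE)

stopifnot(all(abs(rowSums(tmat) - 1) < 1e-8))  # rows of a transition matrix must sum to 1

current.state <- 1
next.state <- sample(1:3, size = 1, prob = tmat[current.state, ])  # draw the next state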

function genMarkov

The new function genMarkov generates a random sequence for the specified number of individuals. (The sister function addMarkov is quite similar, though it allows users to add a Markov chain to an existing data set.) In addition to defining the transition matrix, you need to indicate the length of the chain to be generated for each simulated unit or person. The data can be returned either in long or wide form, depending on how you’d ultimately like to use the data. In the first case, I am generating wide format data for sequences of length of 6 for 12 individuals:

library(simstudy)
set.seed(3928398)

tmatrix <- matrix(c(0.5, 0.4, 0.1,
                    0.2, 0.5, 0.3,
                    0.0, 0.0, 1.0), 3, 3, byrow = T)

dd <- genMarkov(n = 12, transMat = tmatrix, chainLen = 6, wide = TRUE)
dd

##     id S1 S2 S3 S4 S5 S6
##  1:  1  1  2  2  1  2  2
##  2:  2  1  1  2  2  2  3
##  3:  3  1  1  2  3  3  3
##  4:  4  1  2  2  1  1  2
##  5:  5  1  1  2  2  2  3
##  6:  6  1  1  1  1  1  1
##  7:  7  1  1  1  1  2  2
##  8:  8  1  1  1  1  1  1
##  9:  9  1  1  2  3  3  3
## 10: 10  1  1  2  3  3  3
## 11: 11  1  2  2  2  2  1
## 12: 12  1  2  1  1  2  1

In the long format, the output is multiple records per id. This could be useful if you are going to be estimating longitudinal models, or as in this case, creating longitudinal plots:

set.seed(3928398)
dd <- genMarkov(n = 12, transMat = tmatrix, chainLen = 6, wide = FALSE)

Here are the resulting data (for the first two individuals):

dd[id %in% c(1,2)]

##     id period state
##  1:  1      1     1
##  2:  1      2     2
##  3:  1      3     2
##  4:  1      4     1
##  5:  1      5     2
##  6:  1      6     2
##  7:  2      1     1
##  8:  2      2     1
##  9:  2      3     2
## 10:  2      4     2
## 11:  2      5     2
## 12:  2      6     3

And here’s a plot for each individual, showing their health status progressions over time:
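The original plot isn’t reproduced here, but a possible way to recreate something similar from the long-format data (a sketch, not necessarily the code used for the post) is:

library(ggplot2)

ggplot(dd, aes(x = period, y = state, group = id)) +
  geom_step() +        # step lines suit a discrete health status
  facet_wrap(~ id) +   # one small panel per individual
  theme_minimal()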

I do plan on sharing the details of the simulation that inspired the creation of these new functions, though I am still working out a few things. In the meantime, as always, if anyone has any suggestions or questions about simstudy, definitely let me know.


To leave a comment for the author, please follow the link and comment on their blog: ouR data generation. R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job. Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Introducing Open Forensic Science in R

Tue, 20/08/2019 - 02:00

[This article was first published on rOpenSci - open tools for open science, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here) Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

The free online book Open Forensic Science in R was created to foster open science practices in the forensic science community. It comprises eight chapters: an introduction and seven chapters covering different areas of forensic science (the validation of DNA interpretation systems, firearms analysis of bullets and casings, latent fingerprints, shoe outsole impressions, trace glass evidence, and decision-making in forensic identification tasks). Each chapter has the same five sections: Introduction, Data, R Package(s), Drawing Conclusions, and Case Study. There is R code throughout each chapter to guide the reader along in an analysis, and the case study walks the reader through solving a forensic science problem in R, from reading the data to answering a specific question such as, “Were these two bullets fired by the same gun?”

The Center for Statistics and Applications in Forensic Evidence (CSAFE)

To help more scholars access forensic science research, Open Forensic Science in R brings together many open resources created and/or used by the Center for Statistics and Applications in Forensic Evidence (CSAFE) and the National Institute of Standards and Technology (NIST). CSAFE was founded in 2015 with the mission of building up the statistical foundations in forensic science. CSAFE is an interdisciplinary NIST Center of Excellence comprised of four institutions: Iowa State University (ISU), Carnegie Mellon University (CMU), University of California Irvine (UCI), and University of Virginia (UVA). The statistics faculty members, postdoctoral researchers, and graduate students working with CSAFE have written many R packages to complete a variety of forensic science tasks, from analyzing shoeprint impressions to comparing markings on bullets. Open Forensic Science in R brings many of these forensic science R packages together in one place.

CSAFE is committed to open science, and many CSAFE researchers contributed to Open Forensic Science in R. As most CSAFE researchers are statisticians, much of CSAFE’s research uses R in some capacity. Dedication to open source is vital for advancing the field of forensic science because the current barrier to entry into forensic science research is very high: equipment is very expensive and most software in the field is proprietary, so no one outside of the company selling the software knows exactly how it works. These barriers limit the number of researchers who have access to the field. There are also limited data available to researchers. In many cases, data are limited for practical reasons because they are from real, identifiable people and investigations. CSAFE is committed to releasing as much data as possible to help advance forensic science research, and hosts a large Forensic Science Data Portal to make the data widely available. Open Forensic Science in R brings many of the open resources available to current and future forensic science researchers together in one place to encourage openness in the field.

Example: Comparing Bullets

Figure 1: Two images of partial bullet scans. Were these two bullets fired by the same gun?

The chapter “Firearms: bullets” begins by introducing the reader to the terminology of firearms and bullets, and describes the methods used in forensic science to compare bullets. Then, the chapter discusses the open source work by current and former CSAFE researchers. Three-dimensional bullet scans, the data of interest, are stored in the x3p standard format, and the R package x3ptools is used to read the bullet scans in R [1]. Then, the bullet data are analyzed with the bulletxtrctr package [2]. Each bullet is comprised of many surface scans corresponding to the number of lands on the bullet which came in contact with the gun barrel when fired. The number of surface scans varies by type and manufacturer of the gun. For a comparison of two bullets, a representative cross-section of each 3D surface scan is selected, the curve is removed, some noise is removed, and only the smoothed bullet signature remains. (See Figure 2.) This procedure is repeated for all lands of the bullets of interest, and the signatures are compared to each other using a trained random forest available in the package, resulting in scores from zero to one indicating how similar the two signatures are [3]. In this case, bullets 1 and 2 were fired from the same gun, which can be seen by comparing the bullet 1 lands 2, 3, 4, 5, 6, 1 to the bullet 2 lands 3, 4, 5, 6, 1, 2, respectively. For complete details, see the Case Study section of the chapter.
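As a rough illustration of the first step of that pipeline, the snippet below reads a single land scan with x3ptools; the file name is hypothetical, and the bulletxtrctr comparison steps are only described in comments since their exact interfaces are not reproduced here:

# Minimal sketch: reading one x3p land scan (file name is hypothetical)
library(x3ptools)

land <- read_x3p("bullet1_land1.x3p")  # 3D surface scan in the x3p standard format
land_df <- x3p_to_df(land)             # long data frame of (x, y, value) surface heights

# bulletxtrctr (not shown here) then extracts a representative cross-section,
# removes the curvature and noise to get the smoothed signature, and scores
# pairwise land-to-land comparisons with its trained random forest.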

Figure 2: At left, the representative cross-sections from two bullets with 6 lands each. At right, the resulting smoothed bullet signatures (dark gray) and the raw signatures (light gray).

How to Contribute & Acknowledgements

The book will continue to grow, and contributions are welcome via issues or pull requests on the Github repo. If you would like to contribute, please follow our Contributor Code of Conduct. Thank you to the many CSAFE and NIST researchers who contributed to this project: Dr. Heike Hofmann (ISU), Dr. Soyoung Park (ISU), Xiao Hui Tai (CMU), Dr. Eric Hare (Omni Analytics, formerly ISU), Dr. Karen Kafadar (UVA), Karen Pan (UVA), Dr. Amanda Luby (CMU), and Dr. Sarah Riman (NIST). Finally, a big thank you goes to the rOpenSci Fellowship program for funding this project.

  1. Heike Hofmann, Susan Vanderplas, Ganesh Krishnan and Eric Hare (2019). x3ptools: Tools for Working with 3D Surface Measurements. R package version 0.0.2.9000. https://github.com/heike/x3ptools
  2. Heike Hofmann, Susan Vanderplas and Ganesh Krishnan (2018). bulletxtrctr: Automatic Matching of Bullet Striae. R package version 0.2.0. https://heike.github.io/bulletxtrctr/
  3. Hare, Eric, Heike Hofmann, and Alicia Carriquiry. 2017. “Automatic Matching of Bullet Land Impressions.” The Annals of Applied Statistics 11 (4): 2332–56.

To leave a comment for the author, please follow the link and comment on their blog: rOpenSci - open tools for open science. R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job. Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

What does a modern analytics platform need to offer companies real added value?

Mon, 19/08/2019 - 16:10

[This article was first published on R-Bloggers – eoda GmbH, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here) Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.


Currently, new, innovative platforms are sprouting up on the market again and again – implemented with technical competence and ideally suited to the respective analytical approaches. But the question arises: Is that enough? Is it enough to develop software that allows reliable analysis and delivers clean results? Or do other factors exist that are just as important for companies to be even more successful with them?

The masses have unbelievable potential for corporate success

While thinking of analysis or data projects, one cannot get around thinking of a certain user group: data scientists. Their task is to develop the actual analysis scripts and algorithms. But why should a company limit itself to that? Often, departments such as sales and marketing, production/manufacturing or human resources also have good ideas for realizing use cases with the corresponding data. Or they ask exactly the right questions, which can then be answered with data support. Now you can ask yourself: should the company want that? We are sure: YES!

Only through the interaction of different groups can companies achieve the best results in the long term. This is the big challenge that modern analytics platforms must face today and in the future: the possibility for different groups to participate in the development, planning and implementation of data science projects. Each group contributes its own views and expertise. The modern analytics platform makes it possible to create a holistic data project that answers critical questions or uncovers previously undiscovered potentials and correlations. This point is becoming increasingly important.

Conclusion: Ideas for data projects can come from all user groups – if you give them access to the platform, you will generate more projects that contribute to the success of the company.

If the previous point showed the “what”, then this point shows the “how”. If analytics platforms are to be used as profitably as possible in companies, they must be relatively simple to operate. Think of operating systems: UNIX users love its shell, but only a graphical, intuitive interface lets the masses enjoy the advantages of operating systems. Analytics platforms have to be designed in the same way. If you want to reach more than just hard-core coders, the focus must be on clear menu navigation and ease of use. Only in this way can the user groups mentioned above contribute their own projects or questions and increase the success of the company.

A corresponding user interface is therefore necessary and must be intuitive to understand. Companies must take this factor into account when selecting an analytics platform if data science is to be used comprehensively.

Conclusion: Only with an intuitive user interface can company-wide use of an analytics platform be achieved.

Outlook

In the next part: How important an intelligent role and rights concept is when scaling data science projects.


To leave a comment for the author, please follow the link and comment on their blog: R-Bloggers – eoda GmbH. R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job. Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Regular Sequences

Mon, 19/08/2019 - 15:43

[This article was first published on R-exercises, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here) Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

So far in this series, we used vectors from built-in datasets (rivers, women and nhtemp), or created them by stringing together several numbers with the c function (e.g. c(1, 2, 3, 4)). R offers an extremely useful shortcut to create vectors of the latter kind, which is the colon : operator. Instead of having to type:

x <- c(1, 2, 3, 4)

we can simply type

x <- 1:4

to create exactly the same vector. Obviously this is especially useful for longer sequences.

In fact, you will use sequences like this a lot in real-world applications of R, e.g. to select subsets of data points, records, or variables. The exercises in this set might come across as a little abstract, but trust me, these sequences are really the basic building blocks for your future R scripts. So let’s go ahead!

Before starting the exercises, please note this is the fourth set in a series of five: In the first three sets, we practised creating vectors, vector arithmetics, and various functions. You can find all sets in our ebook Start Here To Learn R – vol. 1: Vectors, arithmetic, and regular sequences. The book also includes all solutions (carefully explained), and the fifth and final set of the series. This final set focuses on the application of the concepts you learned in the first four sets, to real-world data.

One more thing: I would really appreciate your feedback on these exercises: Which ones did you like? Which ones were too easy or too difficult? Please let me know what you think here!

Exercise 1

Try to shorten the notation of the following vectors as much as possible, using : notation:

  1. x <- c(157, 158, 159, 160, 161, 162, 163, 164)
  2. x <- c(15, 16, 17, 18, 20, 21, 22, 23, 24)
  3. x <- c(10, 9, 8, 7, 6, 5, 4, 3, 2, 1)
  4. x <- c(-1071, -1072, -1073, -1074, -1075, -1074, -1073, -1072, -1071)
  5. x <- c(1.5, 2.5, 3.5, 4.5, 5.5)

(Solution)

Exercise 2

The : operator can be used in more complex operations along with arithmetic operators, and variable names. Have a look at the following expressions, and write down what sequence you think they will generate. Then check with R.

  1. (10:20) * 2
  2. 105:(30 * 3)
  3. 10:20*2
  4. 1 + 1:10/10
  5. 2^(0:5)

(Solution)

Exercise 3

Use the : operator and arithmetic operators/functions from the previous chapter to create the following vectors:

  1. x <- c(5, 10, 15, 20, 25, 30)
  2. x <- c(0, 1, 2, 3, 0, 1, 2, 3, 0, 1, 2, 3, 0, 1, 2, 3)
  3. x <- c(1/5, 2/6, 3/7, 4/8, 5/9, 6/10, 7/11, 8/12)
  4. x <- c(1, 4, 3, 8, 5, 12, 7, 16, 9, 20) (Hint: you have to use the recycle rule)

(Solution)

Exercise 4

Another way to generate a sequence is the seq function. Its first two arguments are from and to, followed by a third, which is by. seq(from=5, to=30, by=5) replicates part (a) of the previous exercise.
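For reference, here is the output of that call when run in R:

seq(from = 5, to = 30, by = 5)
## [1]  5 10 15 20 25 30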

Note that you can omit the argument names from, to, and by, if you stick to their positions, i.e., seq(5, 30, 5). Have a look at the following expressions, and write down what sequence you think they will generate. Then check with R.

  1. seq(from=20, to=80, by=20)
  2. seq(from=-10, to=5, by=0.5)
  3. seq(from=10, to=-3, by=-2)
  4. seq(from=0.01, to=0.09, by=0.02)

(Solution)

Exercise 5

Compare the regular sequence of exercises 2(a) and 3(a) (both using the : operator) with the same sequences using the seq function with appropriate by argument. Can you think of a more general rule how to convert any seq(from, to, by) statement to a sequence generated with the : operator?

In other words, rewrite seq(from=x, to=y, by=z) to a statement using the : operator. Hint: if this appears difficult, try to do this first by choosing some values for x, y, and z, and see which pattern emerges.

(Solution)

Exercise 6

The previous exercises in this set were aimed at generating sets of increasing or decreasing numbers. However, sometimes you just want a set of equal numbers. You can accomplish this with the rep function (from “replicate”). Its first argument is the number or vector that will be replicated, and its second argument times, … well I guess you can guess that one already. Now, let’s shorten the following statements, using rep:

  1. x <- c(5, 5, 5, 5, 5, 5, 5)
  2. x <- c(5, 6, 7); y <- c(x, x, x, x, x)
  3. x <- c(10, 16, 71, 10, 16, 71, 10, 16, 71)

(Solution)

Exercise 7

rep has a third very useful argument: each. As we saw in the previous exercise (part b), vectors are replicated in their entirety by rep.

However, you can also replicate “each” individual element. Consider for example:

rep(c(1, 2, 3), times=2, each=3).

This says: “replicate each element of the input vector c(1, 2, 3) 3 times, and then replicate the resulting vector 2 times.” Now, let’s shorten the following statements, using rep:

  1. x <- c(5, 5, 5, 5, 8, 8, 8, 8, -3, -3, -3, -3, 0.34, 0.34, 0.34, 0.34)
  2. x <- c(-0.1, -0.1, -0.9, -0.9, -0.6, -0.6)
  3. x <- c(1, 1, 2, 2, 3, 3, 1, 1, 2, 2, 3, 3, 1, 1, 2, 2, 3, 3)

(Solution)

Exercise 8

We can actually write part c of the previous exercise even more compactly by using rep in combination with the : operator. Do you see how?

In this exercise we’re using combinations of rep, : and seq to create the following sequences:

  1. x <- c(97, 98, 99, 100, 101, 102, 97, 98, 99, 100, 101, 102, 97, 98, 99, 100, 101, 102)
  2. x <- c(-5, -5, -5, -5, -6, -6, -6, -6, -7, -7, -7, -7, -8, -8, -8, -8)
  3. x <- c(13, 13, 17, 17, 21, 21, 25, 25, 29, 29, 13, 13, 17, 17, 21, 21, 25, 25, 29, 29)
  4. x <- c(1, 2, 3, 2, 1, 0, 1, 2, 3, 2, 1, 0, 1, 2, 3, 2, 1, 0)

(Solution)

Exercise 9

Suppose there would be no each argument for rep. Rewrite the following statement, without using the each argument: x <- rep(c(27, 31, 19, 14), each=v, times=w)

(Solution)

Exercise 10

Let’s finish this set off with an application. Let’s create a series of vectors for later use in a timeseries dataset. The idea is that each observation in this dataset can be identified by a timestamp, which is defined by four vectors:

  • s (for seconds)
  • m (minutes)
  • h (hours)
  • d (days)

For this exercise, we’ll limit the series to a full week of 7 days.

This is a somewhat more complicated problem than the previous ones in this exercise. Don’t worry however! Whenever you’re faced with a somewhat more complicated problem than you are used to, the best strategy is to break it down into smaller problems. So, we’ll simply start with the s vector.

  1. Since s counts the number of seconds, we know it has to start at 1, run to 60, restart at 1, etc. As it should cover a full week, we also know we have to replicate this series many times. Can you calculate exactly how many times it has to replicate this series? Use the outcome of your calculation to create the full s vector.
  2. Now, let’s create the vector m. Think about how this vector differs from s. What does this mean for the times and each arguments?
  3. Now, let’s create vector h and d using the same logic. Check that s, m, h, and d have equal length.

(Solution)

Related exercise sets:
  1. Vectors and Functions
  2. Working With Vectors
  3. Descriptive Analytics-Part 6: Interactive dashboard ( 2/2)
  4. Become a Top R Programmer Fast with our Individual Coaching Program
  5. Explore all our (>4000) R exercises
  6. Find an R course using our R Course Finder directory

To leave a comment for the author, please follow the link and comment on their blog: R-exercises. R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job. Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Why Machine Learning is more Practical than Econometrics in the Real World

Mon, 19/08/2019 - 05:54

[This article was first published on R – Remix Institute, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here) Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Motivation

I’ve read several studies and articles that claim Econometric models are still superior to machine learning when it comes to forecasting. In the article, “Statistical and Machine Learning forecasting methods: Concerns and ways forward”, the author mentions that,

“After comparing the post-sample accuracy of popular ML methods with that of eight traditional statistical ones, we found that the former are dominated across both accuracy measures used and for all forecasting horizons examined.”

In many business environments a data scientist is responsible for generating hundreds or thousands (possibly more) of forecasts for an entire company, as opposed to a single series forecast. While it appears that Econometric methods are better at forecasting a single series (which I generally agree with), how do they compare at forecasting multiple series, which is likely a more common requirement in the real world? Some other things to consider when digesting the takeaways from that study:

  • Do the ML models benefit from building a single model to forecast all series at once, which most time series models cannot do?
  • What are the run-time differences with both approaches?
  • The author in the linked article above states that the Econometrics models outperform machine learning models across all forecast horizons but is that really the case?
Approach

In this article, I am going to show you an experiment I ran that compares machine learning models and Econometrics models for time series forecasting on an entire company’s set of stores and departments.

Before I kick this off, I have to mention that I’ve come across several articles that describe how one can utilize ML for forecasting (typically with deep learning models) but I haven’t seen any that truly give ML the best chance at outperforming traditional Econometric models. On top of that, I also haven’t seen too many legitimate attempts to showcase the best that Econometric models can do either. That’s where this article and evaluation differ. The suite of functions I tested are near-fully optimized versions of both ML models and Econometric models (list of models and tuning details are below). The functions come from the R open source package RemixAutoML, which is a suite of functions for automated machine learning (AutoML), automated forecasting, automated anomaly detection, automated recommender systems, automated feature engineering, and more. I provided the R script at the bottom of this article so you can replicate this experiment. You can also utilize the functions in Python via the rpy2 package and Julia via the RCall package.

The Data

The data I’m utilizing comes from Kaggle — weekly Walmart sales by store and department. I’m only using the store and department combinations that have complete data to minimize the noise added to the experiment, which leaves me with a total of 2,660 individual store and department time series. Each store & dept combo has 143 records of weekly sales. I also removed the “IsHoliday” column that was provided.

Preview of Walmart Store Sales Kaggle Data Set

The Experiment

Given the comments from the article linked above, I wanted to test out several forecast horizons. The performance for all models are compared on n-step ahead forecasts, for n = {1,5,10,20,30}, with distinct model builds used for each n-step forecast test. For each run, I have 2,660 evaluation time series for comparison, represented by each store and department combination. In the Results section you can find the individual results for each of those runs.

The Models

In the experiment I used the AutoTS() function for testing out Econometric models and I used the RemixAutoML CARMA suite (Calendar-Auto-Regressive-Moving-Average) for testing out Machine Learning. The AutoTS() function tests out every model from the list below in several ways (similar to grid tuning in ML). The ML suite contains 4 different tree-based algorithms. As a side note, the Econometric models all come from the forecast package in R. You can see a detailed breakdown of how each model is optimized below the Results section in this article.

Econometrics Models used in AutoTS()
  1. DSHW — Double-Seasonal Holt-Winters
  2. ARIMA — Autoregressive, integrated, moving average
  3. ARFIMA — Fractionally differenced ARIMA
  4. ETS — Exponential Smoothing State-Space Model
  5. NN — Feed-forward neural network with a single hidden layer and lagged inputs. I’m counting this towards Econometrics because it came from the Forecast package in R which is an Econometrics package along with the fact that it’s not as customizable as a TensorFlow or PyTorch model. (Besides, I’ve seen authors state that linear regression is machine learning which would imply that all the Econometrics methods are Machine Learning but I don’t want to debate that here). If you want to consider the NN as a Machine Learning model, just factor that into the results data below.
  6. TBATS (Exponential smoothing state-space model with Box-Cox transformation, ARMA errors, Trend and Seasonal components)
  7. TSLM — time series linear model with trend and seasonal components
Example Plot from AutoTS(): Single Store Forecast with 80% & 95% Prediction Intervals

Machine Learning Algorithms
  1. AutoCatBoostCARMA() — CatBoost
  2. AutoXGBoostCARMA() — XGBoost
  3. AutoH2oGBMCARMA() — H2O Gradient Boosting Machine
  4. AutoH2oDRFCARMA() — H2O Distributed Random Forest
Example Plot from AutoCatBoostCARMA(): Aggregated Forecast Results

The table outputs below show the ranks of 11 models (7 Econometric and 4 Machine Learning) when it comes to lowest mean absolute error (MAE) for every single store and department combination (2,660 individual time series) across five different forecast horizons.

For example, in the 1-step ahead forecast table below, NN was the most accurate model on 666 of the 2,660 time series. TBATS was the most accurate 414 times out of the 2,660.

Still looking at the 1-step ahead forecast table below, the NN was the second most accurate on 397 out of 2,660 time series. TBATS was the second most accurate on 406 out of the 2,660 time series. TBATS ranked last place (11th) 14 times.

  • Winner in Single Model Accuracy — TBATS is the winner of the competition (Econometrics model) with a mean rank of 1.6 across all five forecast horizons.
  • Runner-Up in Single Model Accuracy — Catboost is the runner up of the competition (Machine Learning model) with a mean rank of 3.6 across all five forecast horizons.
  • Winner in Run time — ML is winner: For a single run (there were 5 total, 1 for each forecast horizon) the Econometrics automated forecasting took an average of 33 hours! to run while the automated ML models took an average of 3.5 hours, where each run included a grid tune of 6 comparisons, (1 hour for CatBoost, 1 hour for XGBoost, 30 minutes for H2O GBM and 1 hour for H2O Distributed Random Forest).
  • Winner in Shorter-Horizon Accuracy — The Econometrics models dominate on shorter forecast horizons.
  • Winner in Longer Horizon Accuracy — The Machine Learning models dominate on longer forecast horizons.
Aggregate Summaries

Mean Rank by Model
Ranks: Long Term = {20,30} Period-Ahead & Short Term = {1,5,10} Period-Ahead

The histograms below were derived from selecting the best Econometrics models for each individual store and department time series (essentially the ensemble results) and the best Machine Learning models for each individual store and department time series (ensemble). You can see that as the forecast horizon grows, the Machine Learning models catch up and overcome (slightly) the Econometrics models. With the shorter forecast horizon, the Econometrics models outperform the Machine Learning models by a larger amount than the converse.

Forecast MAE by Econometrics (Blue) and Machine Learning (Gold): 1-Period, 5-Period, 20-Period, 30-Period

Individual Forecast Horizon Summaries by Model

1-Period Ahead Forecast Model Counts by Rank (based on lowest MAE)
5-Period Ahead Forecast Model Counts by Rank (based on lowest MAE)
10-Period Ahead Forecast Model Counts by Rank (based on lowest MAE)
20-Period Ahead Forecast Model Counts by Rank (based on lowest MAE)
30-Period Ahead Forecast Model Counts by Rank (based on lowest MAE)

Conclusion

While the short-term horizon forecasts are more accurate via the Econometrics models, I tend to have a greater need for longer-term forecasts for planning purposes, and the Machine Learning models exceed the Econometrics models in that category. On top of that, the run-time is a pretty significant factor for me.

If your business needs are the opposite, the Econometrics models are probably your best bet, assuming the run times are not a concern.

If I had enough resources available I’d run both functions and utilize the individual models that performed best for each series, which means I’d be utilizing all 11 models.

Algorithm Tuning Details Econometrics Model Details:

Each of the individual Econometrics models in AutoTS() are optimized based on the following treatments.

Global Optimizations (applies to all models):

A) Optimal Box-Cox Transformations are used in every run where data is strictly positive. The optimal transformation could be no transformation (artifact of Box-Cox).

B) Four different treatments are tested for each model:

  • user-specified time frequency + no outlier smoothing & no imputation
  • model-based time frequency + no outlier smoothing & no imputation
  • user-specified time frequency + outlier smoothing & imputation
  • model-based time frequency + outlier smoothing & imputation

The treatment of outlier smoothing and imputation sometimes has a beneficial effect on forecasts; sometimes it doesn’t. You really need to test out both to see what generates more accurate predictions out-of-sample. The same goes for manually defining the frequency of the data. If you have daily data, you specify “day” in the AutoTS arguments. Alternatively, spectral analysis can be used to find the frequency of the data based on the dominant trend and seasonality. Sometimes this approach works better, sometimes it doesn’t. That’s why I test all the combinations for each model.

Individual Model Optimizations:

C) For the ARIMA and ARFIMA models, I used up to 25 lags and moving averages; the number of differences and seasonal differences (up to a single difference and a single seasonal difference) is determined algorithmically in the stepwise procedure (all combinations can be tested and run in parallel, but that’s too time consuming for my patience).

D) For the Double Seasonal Holt-Winters model, alpha, beta, gamma, omega, and phi are determined using least-squares and the forecasts are adjusted using an AR(1) model for the errors.

E) The Exponential Smoothing State-Space model runs through an automatic selection of the error type, trend type, and season type, with the options being “none”, “additive”, and “multiplicative”, along with testing of damped vs. non-damped trend (either additive or multiplicative). Alpha, beta, and phi are estimated.

F) The Neural Network is set up to test out every combination of lags and seasonal lags (25 lags, 1 seasonal lag) and the version with the best holdout score is selected.

G) The TBATS model utilizes 25 lags and moving averages for the errors, damped trend vs. non-damped trend are tested, trend vs. non-trend are also tested, and the model utilizes parallel processing.

H) The TSLM model utilizes simple time trend and season depending on the frequency of the data.

Machine Learning Model Details:

The CARMA suite utilizes several features to ensure proper models are built to generate the best possible out-of-sample forecasts.

A) Feature engineering: I use a time trend, calendar variables, holiday counts, and 25 lags and moving averages along with 51, 52, and 53-week lags and moving averages (all specified as arguments in the CARMA function suite). Internally, the CARMA functions utilize several RemixAutoML functions, all written using data.table for fast and memory efficient processing:

  • DT_GDL_Feature_Engineering() — creates lags and moving average features by grouping variables (also creates lags and moving averages off of time between records)
  • Scoring_GDL_Feature_Engineering() — creates lags and moving average features for a single record by grouping variables (along with the time between features)
  • CreateCalendarVariables() — creates numeric features identifying various time units based on date columns
  • CreateHolidayFeatures() — creates count features based on the specified holiday groups you want to track and the date columns you supply

B) Optimal transformations: the target variable along with the associated lags and moving average features were transformed. This is really useful for regression models with categorical features that have associated target values that significantly differ from each other. The transformation options that are tested (using a Pearson test for normality) include the following (a small arcsinh illustration appears after the list):

  • YeoJohnson, Box-Cox, Arcsinh, Identity,
  • arcsin(sqrt(x)), logit(x) — for proportion data, not used in experiment

The functions used to create the transformations throughout the process, and then to back-transform the forecasts once they have been generated, come from RemixAutoML:

  • AutoTransformationCreate()
  • AutoTransformationScore()
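
Conceptually, the selection step amounts to applying each candidate transformation and keeping the one whose result looks most Gaussian. Below is a toy sketch of that idea, using the base Shapiro-Wilk test as a stand-in for the package's Pearson normality test; it is not the AutoTransformationCreate() implementation.

# toy sketch of "pick the transformation that looks most normal";
# shapiro.test() is used here as a stand-in for the Pearson normality test
x <- rlnorm(500)
candidates <- list(identity  = x,
                   log       = log(x),
                   arcsinh   = asinh(x),
                   boxcox_.5 = (sqrt(x) - 1) / 0.5)  # Box-Cox with lambda = 0.5, for illustration
stats <- sapply(candidates, function(v) unname(shapiro.test(v)$statistic))
names(which.max(stats))  # winner: highest W statistic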

C) Models: there are four CARMA functions and each uses a different algorithm for the model fitting. The models used to fit the time series data come from RemixAutoML and include:

  • AutoCatBoostRegression()
  • AutoXGBoostRegression()
  • AutoH2oDRFRegression()
  • AutoH2oGBMRegression()

You can view all of the 21 process steps in those functions on my GitHub page README under the section titled, “Supervised Learning Models” in the “Regression” sub-section (you can also view the source code directly of course).

D) GPU: With the CatBoost and XGBoost functions, you can build the models utilizing a GPU (I ran them with a GeForce 1080ti), which results in an average 10x speedup in model training time compared to running on a CPU with 8 threads. I should also note that the lags and moving average features by store and department are pretty intensive to compute; they are built exclusively with data.table, which means that if you have a CPU with a lot of threads those calculations will be faster, as data.table is parallelized.

E) One model for all series: I built the forecasts for all the store and department combinations with a single model by simply specifying c("Store","Dept") in the GroupVariables argument, which provides superior results compared to building a separate model for each series. The group variables are used as categorical features and do not require one-hot encoding beforehand, as CatBoost and H2O handle that internally. The AutoXGBoostCARMA() version utilizes the DummifyDT() function from RemixAutoML to handle the categorical features.

F) The max number of trees used for each model was (early stopping is used internally):

  • AutoCatBoostCARMA() = 20,000
  • AutoXGBoostCARMA() = 5,000
  • AutoH2oDRFCARMA() = 2,000
  • AutoH2oGBMCARMA() = 2,000

G) Grid tuning: I ran a 6-model random hyper-parameter grid tune for each algorithm. Essentially, a baseline model is built, then 5 other models are built and compared, with the lowest-MAE model being selected. This is all done internally in the CARMA function suite.

H) Data partitioning: to create the training, validation, and test data, the CARMA functions utilize the RemixAutoML::AutoDataPartition() function with the “timeseries” option for the PartitionType argument, which ensures that the training data contains the furthest data points back in time, followed by the validation data, and then the test data, which contains the most recent points in time. For the experiment, I used 10/143 as the percent holdout for the validation data. The test data varied by which n-step-ahead holdout was being tested, and the remaining data went to the training set.

I) Forecasting: Once the regression model is built, the forecast process replicates an ARIMA process. First, a single step-ahead forecast is made. Next, the lags and moving average features are updated, making use of the predicted value from the previous step. Then the other features are updated (trend, calendar, holiday), the next forecast step is made, and the process repeats for the remaining forecast steps. This process utilizes the RemixAutoML functions:

  • AutoCatBoostScoring()
  • AutoXGBoostScoring()
  • AutoH2oMLScoring()
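
In pseudocode, the recursive scoring loop looks roughly like this. Note that build_features() and fit are hypothetical placeholders, not RemixAutoML functions.

# hedged pseudocode of the recursive scoring loop described above;
# build_features() and fit are hypothetical placeholders, not RemixAutoML functions
for (h in seq_len(FC_Periods)) {
  new_row <- build_features(history)      # refresh lags, MAs, trend, calendar, holiday features
  pred    <- predict(fit, newdata = new_row)                      # one-step-ahead forecast
  history <- rbind(history, cbind(new_row, Weekly_Sales = pred))  # feed the prediction back in
}
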
Contact

If anyone is interested in testing out other models, utilizing different data sets, or just needs to set up automated forecasts for their company, contact me on LinkedIn.

If you’d like to learn how to utilize the RemixAutoML package check out the free course on Remyx Courses.

P.S.

I have plans to continue enhancing and adding capabilities to the automated time series functions discussed above. For example, I plan to:

  • Test Fourier features for both AutoTS() and the CARMA suite
  • Test other ML algorithms
  • Add ensemble methods for combining forecasts
  • Add Croston Econometric model for intermittent demand forecasting
  • Create machine learning based intermittent demand forecasting models (in a similar fashion to the Croston method) utilizing RemixAutoML Generalized Hurdle Models
R Script

Code to reproduce: https://gist.github.com/AdrianAntico/8e1bbf63f26835756348d7c67a930227

library(RemixAutoML) library(data.table) ########################################### # Prepare data for AutoTS()---- ########################################### # Load Walmart Data ---- # link to manually download file: https://remixinstitute.app.box.com/v/walmart-store-sales-data/ data <- data.table::fread("https://remixinstitute.box.com/shared/static/9kzyttje3kd7l41y1e14to0akwl9vuje.csv", header = T, stringsAsFactors = FALSE) # Subset for Stores / Departments with Full Series Available: (143 time points each)---- data <- data[, Counts := .N, by = c("Store","Dept")][Counts == 143][, Counts := NULL] # Subset Columns (remove IsHoliday column)---- keep <- c("Store","Dept","Date","Weekly_Sales") data <- data[, ..keep] # Group Concatenation---- data[, GroupVar := do.call(paste, c(.SD, sep = " ")), .SDcols = c("Store","Dept")] data[, c("Store","Dept") := NULL] # Grab Unique List of GroupVar---- StoreDept <- unique(data[["GroupVar"]]) ########################################### # AutoTS() Builds---- ########################################### for(z in c(1,5,10,20,30)) { TimerList <- list() OutputList <- list() l <- 0 for(i in StoreDept) { l <- l + 1 temp <- data[GroupVar == eval(i)] temp[, GroupVar := NULL] TimerList[[i]] <- system.time( OutputList[[i]] <- tryCatch({ RemixAutoML::AutoTS( temp, TargetName = "Weekly_Sales", DateName = "Date", FCPeriods = 1, HoldOutPeriods = z, EvaluationMetric = "MAPE", TimeUnit = "week", Lags = 25, SLags = 1, NumCores = 4, SkipModels = NULL, StepWise = TRUE, TSClean = TRUE, ModelFreq = TRUE, PrintUpdates = FALSE)}, error = function(x) "Error in AutoTS run")) print(l) } # Save Results When Done and Pull Them in After AutoCatBoostCARMA() Run---- save(TimerList, file = paste0(getwd(),"/TimerList_FC_",z,"_.R")) save(OutputList, file = paste0(getwd(),"/OutputList_FC_",z,".R")) rm(OutputList, TimerList) } ########################################### # Prepare data for AutoCatBoostCARMA()---- ########################################### # Load Walmart Data---- # link to manually download file: https://remixinstitute.app.box.com/v/walmart-store-sales-data/ data <- data.table::fread("https://remixinstitute.box.com/shared/static/9kzyttje3kd7l41y1e14to0akwl9vuje.csv", header = T, stringsAsFactors = FALSE) # Subset for Stores / Departments With Full Series (143 time points each)---- data <- data[, Counts := .N, by = c("Store","Dept")][Counts == 143][, Counts := NULL] # Subset Columns (remove IsHoliday column)---- keep <- c("Store","Dept","Date","Weekly_Sales") data <- data[, ..keep] # Build AutoCatBoostCARMA Models---- for(z in c(1,5,10,20,30)) { CatBoostResults <- RemixAutoML::AutoCatBoostCARMA( data, TargetColumnName = "Weekly_Sales", DateColumnName = "Date", GroupVariables = c("Store","Dept"), FC_Periods = 10, TimeUnit = "week", TargetTransformation = TRUE, Lags = c(1:25,51,52,53), MA_Periods = c(1:25,51,52,53), CalendarVariables = TRUE, TimeTrendVariable = TRUE, HolidayVariable = TRUE, DataTruncate = FALSE, SplitRatios = c(1 - 60/143, 30/143, 30/143), TaskType = "GPU", EvalMetric = "RMSE", GridTune = FALSE, GridEvalMetric = "r2", ModelCount = 2, NTrees = 1500, PartitionType = "timeseries", Timer = TRUE) # Output---- CatBoostResults$TimeSeriesPlot CatBoost_Results <- CatBoostResults$ModelInformation$EvaluationMetricsByGroup data.table::fwrite(CatBoost_Results, paste0(getwd(),"/CatBoost_Results_",30,".csv")) rm(CatBoost_Results,CatBoostResults) } ########################################### # Prepare data for AutoXGBoostCARMA()---- ########################################### 
# Load Walmart Data ---- # link to manually download file: https://remixinstitute.app.box.com/v/walmart-store-sales-data/ data <- data.table::fread("https://remixinstitute.box.com/shared/static/9kzyttje3kd7l41y1e14to0akwl9vuje.csv", header = T, stringsAsFactors = FALSE) # Subset for Stores / Departments With Full Series (143 time points each)---- data <- data[, Counts := .N, by = c("Store","Dept")][Counts == 143][, Counts := NULL] # Subset Columns (remove IsHoliday column)---- keep <- c("Store","Dept","Date","Weekly_Sales") data <- data[, ..keep] for(z in c(1,5,10,20,30)) { XGBoostResults <- RemixAutoML::AutoXGBoostCARMA( data, TargetColumnName = "Weekly_Sales", DateColumnName = "Date", GroupVariables = c("Store","Dept"), FC_Periods = 2, TimeUnit = "week", TargetTransformation = TRUE, Lags = c(1:25, 51, 52, 53), MA_Periods = c(1:25, 51, 52, 53), CalendarVariables = TRUE, HolidayVariable = TRUE, TimeTrendVariable = TRUE, DataTruncate = FALSE, SplitRatios = c(1 - (30+z)/143, 30/143, z/143), TreeMethod = "hist", EvalMetric = "MAE", GridTune = FALSE, GridEvalMetric = "mae", ModelCount = 1, NTrees = 5000, PartitionType = "timeseries", Timer = TRUE) XGBoostResults$TimeSeriesPlot XGBoost_Results <- XGBoostResults$ModelInformation$EvaluationMetricsByGroup data.table::fwrite(XGBoost_Results, paste0(getwd(),"/XGBoost_Results",z,".csv")) rm(XGBoost_Results) } ########################################### # Prepare data for AutoH2oDRFCARMA()---- ########################################### # Load Walmart Data ---- # link to manually download file: https://remixinstitute.app.box.com/v/walmart-store-sales-data/ data <- data.table::fread("https://remixinstitute.box.com/shared/static/9kzyttje3kd7l41y1e14to0akwl9vuje.csv", header = T, stringsAsFactors = FALSE) # Subset for Stores / Departments With Full Series (143 time points each)---- data <- data[, Counts := .N, by = c("Store","Dept")][Counts == 143][, Counts := NULL] # Subset Columns (remove IsHoliday column)---- keep <- c("Store","Dept","Date","Weekly_Sales") data <- data[, ..keep] for(z in c(1,5,10,20,30)) { H2oDRFResults <- AutoH2oDRFCARMA( data, TargetColumnName = "Weekly_Sales", DateColumnName = "Date", GroupVariables = c("Store","Dept"), FC_Periods = 2, TimeUnit = "week", TargetTransformation = TRUE, Lags = c(1:5, 51,52,53), MA_Periods = c(1:5, 51,52,53), CalendarVariables = TRUE, HolidayVariable = TRUE, TimeTrendVariable = TRUE, DataTruncate = FALSE, SplitRatios = c(1 - (30+z)/143, 30/143, z/143), EvalMetric = "MAE", GridTune = FALSE, ModelCount = 1, NTrees = 2000, PartitionType = "timeseries", MaxMem = "28G", NThreads = 8, Timer = TRUE) # Plot aggregate sales forecast (Stores and Departments rolled up into Total)---- H2oDRFResults$TimeSeriesPlot H2oDRF_Results <- H2oDRFResults$ModelInformation$EvaluationMetricsByGroup data.table::fwrite(H2oDRF_Results, paste0(getwd(),"/H2oDRF_Results",z,".csv")) rm(H2oDRF_Results) } ########################################### # Prepare data for AutoH2OGBMCARMA()---- ########################################### # Load Walmart Data ---- # link to manually download file: https://remixinstitute.app.box.com/v/walmart-store-sales-data/ data <- data.table::fread("https://remixinstitute.box.com/shared/static/9kzyttje3kd7l41y1e14to0akwl9vuje.csv", header = T, stringsAsFactors = FALSE) # Subset for Stores / Departments With Full Series (143 time points each)---- data <- data[, Counts := .N, by = c("Store","Dept")][Counts == 143][, Counts := NULL] # Subset Columns (remove IsHoliday column)---- keep <- 
c("Store","Dept","Date","Weekly_Sales") data <- data[, ..keep] for(z in c(1,5,10,20,30)) { H2oGBMResults <- AutoH2oGBMCARMA( data, TargetColumnName = "Weekly_Sales", DateColumnName = "Date", GroupVariables = c("Store","Dept"), FC_Periods = 2, TimeUnit = "week", TargetTransformation = TRUE, Lags = c(1:5, 51,52,53), MA_Periods = c(1:5, 51,52,53), CalendarVariables = TRUE, HolidayVariable = TRUE, TimeTrendVariable = TRUE, DataTruncate = FALSE, SplitRatios = c(1 - (30+z)/143, 30/143, z/143), EvalMetric = "MAE", GridTune = FALSE, ModelCount = 1, NTrees = 2000, PartitionType = "timeseries", MaxMem = "28G", NThreads = 8, Timer = TRUE) # Plot aggregate sales forecast (Stores and Departments rolled up into Total)---- H2oGBMResults$TimeSeriesPlot H2oGBM_Results <- H2oGBMResults$ModelInformation$EvaluationMetricsByGroup data.table::fwrite(H2oGBM_Results, paste0(getwd(),"/H2oGBM_Results",z,".csv")) rm(H2oGBM_Results) } ################################################## # AutoTS() and AutoCatBoostCARMA() Comparison---- ################################################## # Gather results---- for(i in c(1,5,10,20,30)) { load(paste0("C:/Users/aantico/Desktop/Work/Remix/RemixAutoML/TimerList_",i,"_.R")) load(paste0("C:/Users/aantico/Desktop/Work/Remix/RemixAutoML/OutputList_",i,"_.R")) # Assemble TS Data TimeList <- names(TimerList) results <- list() for(j in 1:2660) { results[[j]] <- cbind( StoreDept = TimeList[j], tryCatch({OutputList[[j]]$EvaluationMetrics[, .(ModelName,MAE)][ , ModelName := gsub("_.*","",ModelName) ][ , ID := 1:.N, by = "ModelName" ][ ID == 1 ][ , ID := NULL ]}, error = function(x) return( data.table::data.table( ModelName = "NONE", MAE = NA)))) } # AutoTS() Results---- Results <- data.table::rbindlist(results) # Remove ModelName == NONE Results <- Results[ModelName != "NONE"] # Average out values: one per store and dept so straight avg works---- Results <- Results[, .(MAE = mean(MAE, na.rm = TRUE)), by = c("StoreDept","ModelName")] # Group Concatenation---- Results[, c("Store","Dept") := data.table::tstrsplit(StoreDept, " ")][, StoreDept := NULL] data.table::setcolorder(Results, c(3,4,1,2)) ################################## # Machine Learning Results---- ################################## # Load up CatBoost Results---- CatBoost_Results <- data.table::fread(paste0(getwd(),"/CatBoost_Results_",i,".csv")) CatBoost_Results[, ':=' (MAPE_Metric = NULL, MSE_Metric = NULL, R2_Metric = NULL)] data.table::setnames(CatBoost_Results, "MAE_Metric", "MAE") CatBoost_Results[, ModelName := "CatBoost"] data.table::setcolorder(CatBoost_Results, c(1,2,4,3)) # Load up XGBoost Results---- XGBoost_Results <- data.table::fread(paste0(getwd(),"/XGBoost_Results",i,".csv")) XGBoost_Results[, ':=' (MAPE_Metric = NULL, MSE_Metric = NULL, R2_Metric = NULL)] data.table::setnames(XGBoost_Results, "MAE_Metric", "MAE") XGBoost_Results[, ModelName := "XGBoost"] data.table::setcolorder(XGBoost_Results, c(1,2,4,3)) # Load up H2oDRF Results---- H2oDRF_Results <- data.table::fread(paste0(getwd(),"/H2oDRF_Results",i,".csv")) H2oDRF_Results[, ':=' (MAPE_Metric = NULL, MSE_Metric = NULL, R2_Metric = NULL)] data.table::setnames(H2oDRF_Results, "MAE_Metric", "MAE") H2oDRF_Results[, ModelName := "H2oDRF"] data.table::setcolorder(H2oDRF_Results, c(1,2,4,3)) # Load up H2oGBM Results---- H2oGBM_Results <- data.table::fread(paste0(getwd(),"/H2oGBM_Results",i,".csv")) H2oGBM_Results[, ':=' (MAPE_Metric = NULL, MSE_Metric = NULL, R2_Metric = NULL)] data.table::setnames(H2oGBM_Results, "MAE_Metric", "MAE") H2oGBM_Results[, ModelName 
:= "H2oGBM"] data.table::setcolorder(H2oGBM_Results, c(1,2,4,3)) ################################## # Combine Data---- ################################## # Stack Files---- ModelDataEval <- data.table::rbindlist( list(Results, CatBoost_Results, XGBoost_Results, H2oGBM_Results, H2oDRF_Results)) data.table::setorderv(ModelDataEval, cols = c("Store","Dept","MAE")) # Add rank---- ModelDataEval[, Rank := 1:.N, by = c("Store","Dept")] # Get Frequencies---- RankResults <- ModelDataEval[, .(Counts = .N), by = c("ModelName","Rank")] data.table::setorderv(RankResults, c("Rank", "Counts"), order = c(1,-1)) # Final table---- FinalResultsTable <- data.table::dcast(RankResults, formula = ModelName ~ Rank, value.var = "Counts") data.table::setorderv(FinalResultsTable, "1", -1, na.last = TRUE) # Rename Columns---- for(k in 2:ncol(FinalResultsTable)) { data.table::setnames(FinalResultsTable, old = names(FinalResultsTable)[k], new = paste0("Rank_",names(FinalResultsTable)[k])) } # Print print(i) print(knitr::kable(FinalResultsTable)) } var vglnk = { key: '949efb41171ac6ec1bf7f206d57e90b8' }; (function(d, t) { var s = d.createElement(t); s.type = 'text/javascript'; s.async = true; s.src = '//cdn.viglink.com/api/vglnk.js'; var r = d.getElementsByTagName(t)[0]; r.parentNode.insertBefore(s, r); }(document, 'script'));

To leave a comment for the author, please follow the link and comment on their blog: R – Remix Institute. R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job. Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

No visible binding for global variable

Mon, 19/08/2019 - 04:13

[This article was first published on Random R Ramblings, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here) Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Recently I have been working on a very large legacy project which utilises the excellent data.table package throughout. What this has resulted in is an R CMD check containing literally thousands of NOTEs similar to the following:

❯ checking R code for possible problems ... NOTE my_fn: no visible binding for global variable ‘mpg’

There are several reasons why you might see these NOTEs and, for our code base, some of them were potentially more damaging than others. This was a problem because the NOTEs were hidden: firstly by an outright suppression due to a manipulation of the _R_CHECK_CODETOOLS_PROFILE_ option in the .Renviron file, and secondly, once that was removed, because the more damaging NOTEs were buried within the sheer number of NOTEs we had in the R CMD check.

Non-standard Evaluation

If we have a function where we are using data.table’s modification-by-reference features, i.e. we are using a variable in an unquoted fashion (also known as non-standard evaluation, or NSE), then this issue will occur. Take the following function as an example.

my_fn <- function() {
  mtcars <- data.table::data.table(mtcars)
  mtcars[, mpg_div_hp := mpg / hp]
  mtcars[]
}

Here, we would find the following NOTEs:

❯ checking R code for possible problems ... NOTE
  my_fn: no visible binding for global variable ‘mpg_div_hp’
  my_fn: no visible binding for global variable ‘mpg’
  my_fn: no visible binding for global variable ‘hp’
  Undefined global functions or variables:
    hp mpg mpg_div_hp

Sometimes you may also see these NOTEs for syntactic sugar such as !! or := if you haven’t correctly imported the package they come from.

This is a well discussed issue on the internet which only became an issue after a change introduced to the core R code in version 2.15.1. There are two solutions to this problem.

Option One

Include all variable names within a globalVariables() call in the package documentation file.

globalVariables(c("mpg", "hp", "mpg_div_hp"))

For our package, as there are literally thousands of variables to list in this file, this makes the list very difficult to maintain and the file very long. If, however, the variables belong to data which are stored within your package, then this can be greatly simplified to

globalVariables(names(my_data))

You may wish to import any syntactic sugar functionality here as well. For example

globalVariables(c(":=", "!!"))

Option Two

The second option involves binding the variable locally to the function. At the top of your function you can define the variable as a NULL value.

my_fn <- function() {
  mpg <- hp <- mpg_div_hp <- NULL
  mtcars <- data.table::data.table(mtcars)
  mtcars[, mpg_div_hp := mpg / hp]
  mtcars[]
}

Therefore your variable(s) are now bound to object(s), and the R CMD check has nothing to complain about. This is the method the data.table team recommend and, to me, it feels like a much neater and, more importantly, more maintainable solution than the first option.

A Note on the Tidyverse

You may also come across this problem whilst programming using the tidyverse for which there is a very neat solution. You simply need to be more explicit within your function by using the .data pronoun.

#' @importFrom rlang .data
my_fn <- function() {
  mtcars %>%
    mutate(mpg_div_hp = .data$mpg / .data$hp)
}

Note the import!

Selecting Variables with the data.table .. Prefix

NOTEs can occur when we are using the .. syntax of data.table, for example

double_dot <- function() {
  mtcars <- data.table::data.table(mtcars)
  select_cols <- c("cyl", "wt")
  mtcars[, ..select_cols]
}

This will yield

❯ checking R code for possible problems ... NOTE Undefined global functions or variables: ..select_cols

In this instance, this can be solved by avoiding the .. syntax and using the alternative with = FALSE notation.

double_dot <- function() {
  mtcars <- data.table::data.table(mtcars)
  select_cols <- c("cyl", "wt")
  mtcars[, select_cols, with = FALSE]
}

Even though the .. prefix is syntactic sugar, we cannot use globalVariables(c("..")) since the actual variable in this case is ..select_cols; we would therefore need to use globalVariables(c("..select_cols")) if we wanted to use the globalVariables() approach.

Missing Imports

In our code base, I also found NOTEs for functions or datasets which were not correctly imported. For example, consider the following simple function.

Rversion <- function() {
  info <- sessionInfo()
  info$R.version
}

This gives the following NOTEs:

❯ checking R code for possible problems ... NOTE
  Rversion: no visible global function definition for ‘sessionInfo’
  Consider adding
    importFrom("utils", "sessionInfo")
  to your NAMESPACE file.

Here the R CMD check is rather helpful and tells us the solution; we need to ensure that we explicitly import the function from the utils package in the documentation. This can easily be done with the roxygen2 package by including an @importFrom utils sessionInfo tag.
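
For completeness, here is a minimal sketch of what that looks like in the function's roxygen2 block (the title line is illustrative):

#' Return the R version information of the current session
#' @importFrom utils sessionInfo
Rversion <- function() {
  info <- sessionInfo()
  info$R.version
}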

Trying to Call Removed Functionality

If you have a function which has been removed from your package but attempt to call it from another function, R will only give you a NOTE about this.

use_non_existent_function <- function() {
  this_function_doesnt_exist()
}

This will give the NOTE

❯ checking R code for possible problems ... NOTE use_non_existent_function: no visible global function definition for ‘this_function_doesnt_exist’

Of course it goes without saying that you should make sure to remove any calls to functions which have been removed from your package. As a side note, when I first started working on the project, I was initially unaware that within our package we had the option _R_CHECK_CODETOOLS_PROFILE_="suppressUndefined=TRUE" set within our .Renviron file, which suppresses all unbound global variable NOTEs from appearing in the R CMD check. However, this setting can mask deeper issues within your package, such as not recognising when a function calls functionality which has been removed from the package, which can leave the end user facing nasty and confusing error messages. Therefore I would not recommend using this setting and would suggest tackling each of your package’s NOTEs individually to remove them all.

I actually discovered all of our package NOTEs when introducing the lintr package to our CI pipeline. lintr will pick up on some – but not all – of these unbound global variable problems (lintr of course does not take the _R_CHECK_CODETOOLS_PROFILE_ setting into account). Take our original function as an example.

my_fn <- function() {
  mtcars <- data.table::data.table(mtcars)
  mtcars[, mpg_div_hp := mpg / hp]
  mtcars[]
}

Here, lintr will highlight the variables mpg and hp as problems but it currently won’t highlight the variables on the LHS of :=, i.e. mpg_div_hp.

Conclusion

When developing your package, if you are experiencing these unbound global variables NOTEs you should

  1. Strive to define any unbound variables locally within a function.
  2. Ensure that any functions or data from external packages (including utils, stats, etc.) have the correct @importFrom tag.
  3. Do not suppress this check in the .Renviron file; the solutions proposed here should remove the need to do so.
  4. Define any package-wide unbound variables, which are typically syntactic sugar (e.g. :=), within the package documentation file inside a globalVariables() call, which should remain a very short and maintainable list.

To leave a comment for the author, please follow the link and comment on their blog: Random R Ramblings. R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job. Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Mueller Report Volume 1: Network Analysis

Mon, 19/08/2019 - 02:46

[This article was first published on sweissblaug, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here) Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

settle down and have another cup of coffee code

TLDR

There are a lot of Russians talking to a lot of Trump campaign members in the Mueller report. There are so many it’s tough to get your head around it all. In this post I attempted some network analysis on the relations between campaign officials and Russians. I found that one can ‘compress’ Russian involvement into 9 (mostly) distinct groups. I then summarize these points of contact.

Introduction to Mueller Report

Volume 1 of the Mueller report starts with Russian interference in the 2016 US Presidential Elections. Russia did so in two ways.

The first was a campaign by the IRA that used social media tools like facebook and twitter with the goal of changing public opinion. While there were some retweets by Trump and his campaign officials from these accounts there wasn’t much direct communication.

The second form was to use Russian intelligence to hack Hillary Clinton’s emails. These hacked emails were released with the help of WikiLeaks and Guccifer 2.0. Trump’s campaign deliberately tried to find other hacked emails and encouraged Russia to do so publicly. However, the campaign could not find additional information on these emails.

The rest of Volume 1 discusses the numerous relationships between Trump campaign officials and Russians. It’s this part that will be the basis for most of the results below.

The data

Volume 1 consists of 199 pages including footnotes and appendices. I found a machine readable version here. I split the text into sentences and looked at whether a person’s name was included in each sentence. This left me with a sentence-by-name matrix that is the starting point of my analysis. There are some drawbacks to this in that OCR does not cleanly distinguish sentences. In addition, it often groups footnotes with the last line of sentences on a page. But it seemed like a good starting point so I went ahead.

Below are the top 20 most commonly occurring names.

Papadopoulos, Manafort, Kushner, Cohen, Trump Jr., and Flynn are all in the top. Considering they all, to varying degrees, worked in the Trump campaign, this makes sense. We also see some Russian names such as Dmitriev, Kilimnik, and Kislyak. I’ll explain their contacts below. I then created a person-by-person matrix that counted the number of times a name co-occurs with another. I’m treating this as a weighted, undirected graph. I transformed this to a Laplacian matrix and performed an eigen decomposition. This is known as a spectral analysis of a network. Basically this tries to find locations that minimize the squared error of the relations. Below is the resulting image of the 2nd-to-last and 3rd-to-last eigenvectors.
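
For readers who want to try this themselves, here is a hedged sketch of that pipeline with igraph; the author's exact code may differ, and adj is a hypothetical person-by-person co-occurrence matrix.

# hedged sketch of the spectral embedding described above;
# `adj` is a hypothetical person-by-person co-occurrence matrix
library(igraph)
g   <- graph_from_adjacency_matrix(adj, mode = "undirected", weighted = TRUE, diag = FALSE)
L   <- laplacian_matrix(g, sparse = FALSE)
eig <- eigen(L)
n   <- ncol(L)
# eigen() sorts eigenvalues in decreasing order, so the 2nd- and 3rd-smallest
# eigenvalues correspond to the 2nd-to-last and 3rd-to-last eigenvectors
plot(eig$vectors[, n - 1], eig$vectors[, n - 2], pch = 19,
     xlab = "2nd-to-last eigenvector", ylab = "3rd-to-last eigenvector")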

WHOA … I’m getting a headache looking at this.

But it definitely looks like there is structure in the graph. There appear to be some clusters forming, and these do correspond to particular events described in the report. In the lower left you can see Papadopoulos-related characters, in the upper right some Cohen acquaintances, and around (0, 0.1) there’s the Trump Tower meeting. Not bad, but still messy. I’m looking for distinct clusters.

What if we look at only the Russians in the graph?

Ok! Now we’re talking. There are 6 distinct clusters of Russians here. That means there are no relations between these clusters, and each corresponds to a unique set of relations with Trump campaign officials. I played around with this some more, but the text data was too messy for robust analysis. Co-occurring names do not pick up everything, and due to sentence parsing errors some things led to erroneous relations.

Finally, I gave up on trying to only use text analysis, read volume one, and manually created a network found here. With that I created groupings using the above chart as a starting point. I found 9 fairly distinct clusters of Russians. Below you can see the relationships between those groups and various members of the Trump campaign.

I then further grouped them into 4 broad categories which I’ve named: Trump Business, The Opportunists, The Professionals, and Russian Officials and Lackeys. I also noted whether a Trump campaign official’s interaction was first degree (they were in the meeting or talked explicitly with the Russian group in question) or second degree (they were aware of the meeting). Below are my summaries for each.

Trump Business
  • Group 1
    • agalarov, aras, goldstone, samochornov, veselnitskaya, kaveladze, akhmetshin
  • Group 2
    • klokov, erchova
  • Group 3
    • rtskhiladze, rozov
  • Group 5
    • peskov
Aras Agalarov (he has a son, Emin; I did not disambiguate between them) is a billionaire Russian property developer that worked with Trump to create the Miss Universe Pageant in 2013. They discussed creating a Trump Tower in Moscow in late 2013, and discussed it with Donald Trump Jr. (DTJ) and Ivanka Trump, but it did not progress.

In the summer of 2015, Group 3 signed a letter of intent to build the Trump Tower in Moscow and met with Ivanka and DTJ.

While this was happening, Group 2 contacted Cohen to discuss a Trump Tower in Moscow and a meeting with Trump. Cohen thought this person was a pro-wrestler, but that did not seem to bother him and he agreed to talk about business. They wanted to set up a meeting between Trump and Putin, but Cohen wanted to keep clear of politics and it went nowhere.

Finally, due to the slowness of progress on the Trump Tower Moscow deal via Group 2, Cohen reached out to Peskov, Press Secretary for Putin, to try and get in touch with Putin directly and begin building. Cohen worked on the Moscow deal through the summer of 2016, but it went nowhere. During the campaign, Emin Agalarov, at the behest of his father, set up a meeting with DTJ to discuss hacked emails. This led to the infamous Trump Tower meeting that involved DTJ, Kushner, Manafort, and the other Russians in Group 1. DTJ discussed this meeting with others in the campaign as well, including Gates. Kushner showed up late to the meeting, texted Manafort during it that this was a ‘waste of time’, texted others to call him so he could get out, and subsequently left early. The meeting did not provide any information to the Trump campaign.

The Opportunists
  • Group 4
    • mifsud, polonskaya, timofeev, millian
  • Group 5
    • klimentov, poliakova, peskov, dvorkovich
Papadopoulos and Page had similar experiences with the Trump campaign, and they both seemed to be in it for the opportunity it presented them. Both padded their resumes to look more important than they were to get the job, and both got foreign policy advisory roles.

Papadopoulos got the job of foreign policy advisor in March 2016. He met Mifsud, a Maltese professor, in Rome at a meeting of the London Centre of International Law Practice shortly after. Upon learning that Papadopoulos was employed by the campaign, Mifsud took interest and spoke of his Russian connections. Papadopoulos, thinking that having more Russian connections could help his stature in the Trump campaign, pursued this relationship. They met the following week in London, where Mifsud introduced him to Polonskaya. Papadopoulos relayed his new contacts to Clovis and received an approving response. This relationship continued, and Mifsud said Russia had ‘dirt’ on Clinton during a meeting in late April. Ten days later, Papadopoulos told a foreign official about his contacts and knowledge of dirt on Clinton. He then discussed a Trump meeting with Putin with Lewandowski, Miller, and Manafort. Manafort made clear that Trump should not meet with Putin directly.

Page also joined the campaign in March 2016 as a foreign policy advisor. He had previously lived and worked in Russia and had several Russian contacts. He was invited to talk at the New Economic School in Russia in July and asked for permission. Clovis responded that if he went he could not speak for the Trump campaign. His talk was critical of US policy towards Russia and was welcomed by the Russian Deputy Prime Minister and others. Afterwards, he met Kislyak in July in Cleveland. These activities drew the attention of the media and he was removed from the campaign in late September.

After the election, Page went to Russia in an unofficial role in late 2016. He again met with Russians in Group 5.

The Professionals
  • Group 6
    • oknyansky, rasin
  • Group 7
    • kilimnik, deripaska, boyarkin, oganov
Paul Manafort and Roger Stone are political consultants and previously worked together. Roger Stone worked alongside the campaign to help but was never officially a part of the campaign. Manafort joined in March 2016 and was the chairman between June and August.

Caputo set up a meeting between Stone and Group 6, Oknyansky and Rasin, to get dirt on Clinton in May 2016. Rasin claimed to have information on money laundering activities by Clinton. Stone refused the offer because they asked for too much money.

Also, Stone had some contact with the twitter account Guccifer 2.0 (not shown above). This was the front used by the GRU to release stolen documents. Curiously, his name was redacted on page 45 in the Mueller report because of ‘Harm to ongoing matter’. Seems a little weird to redact something when it’s public information.

From March 2016 until his departure, Manafort gave, and ordered Gates to give, campaign updates to Kilimnik. Kilimnik is thought to be a Russian spy and has connections with Deripaska, a Russian billionaire to whom Manafort owed money. Manafort gave polling data on the Trump campaign and met with Kilimnik twice in person, once in May and then again in August. It’s not clear why Manafort gave this data to Kilimnik, although Gates thought it was to ingratiate himself with Deripaska. Deripaska and his deputy Boyarkin were subsequently sanctioned by the US Treasury.

Russian Officials and Lackeys
  • Group 8
    • kislyak, gorkov
  • Group 9
    • aven, dmitriev
The final groups deal with Russian officials and Putin’s billionaires. Sessions and Kushner met with Kislyak, the Russian Ambassador to the US, first in April at a Trump foreign policy conference. These were brief handshake affairs that lasted a couple of minutes. Sessions does not recall seeing Kislyak.

Sessions, Gordon, and Page met with Kislyak at the Republican National Convention in July. He was one of approximately 80 foreign ambassadors to the US that were invited. Gordon and Sessions met with Kislyak for a few minutes after their speeches. Gordon, Page, and Kislyak later sat at the same table and discussed improving US Russian Relations for a few minutes. Gordon received an email in August to meet with Kislyak but declined due to ‘constant stream of false media stories’ and offered to rain check the meeting.

In August, the Russian Embassy set up a meeting between Sessions and Kislyak, and the two met in September at Sessions’s Senate office. The meeting lasted 30 minutes, and Kislyak tried to set up another meeting but Sessions didn’t follow up. Sessions got into trouble by not disclosing his meetings with Kislyak, which was part of the reason he recused himself from what became known as the Mueller investigation.

Following the election in November, Kislyak reached out to Kushner, but Kushner did not think Kislyak had a direct line to Putin and was therefore not important enough to talk to. Nevertheless, Kushner met with Kislyak in November at Trump Tower, invited Flynn, and spoke for about 30 minutes about repairing US-Russian relations. Kislyak suggested using a secure line to talk to Russian generals about the Syrian war. Kushner said he had no secure lines to use and asked if they could use Russian facilities, but Kislyak rejected that idea.

Kislyak tried to get another meeting with Kushner, but Kushner sent his assistant instead. Kislyak then proposed a meeting with Gorkov, the head of a Russian-owned bank. Kushner agreed and they met in December. Kushner said that meeting was about restoring US-Russian relations; Gorkov said it was about Kushner’s personal business. They did not have any follow-up meetings.

In December, Flynn talked with Kislyak about two separate topics. The first was to convince Russia to veto an anti-Israel resolution on settlements in the UN, where it was thought the Obama administration would abstain. Russia did not vote against it. The second was to convince Russia not to retaliate against new sanctions for meddling in US elections. McFarland and Bannon were aware of Flynn’s discussions about the sanctions. Russia did not apply retaliatory sanctions. Finally, there were two billionaires that Putin ‘deputized’ to create contacts with the Trump campaign after the election: Aven and Dmitriev. Aven recalled that Putin did not know who to contact to get in touch with President-elect Trump. Aven did not make direct contact with the campaign, but Dmitriev did through two avenues. One was to try and convince Kushner’s friend to set up a meeting. Kushner circulated this opportunity internally but it went nowhere. The other was meeting with Erik Prince, a supporter of Trump but not officially in the campaign, in the Seychelles Islands. Prince discussed his meeting with Bannon, but Bannon does not have a recollection of it.

Some notable connections

In general these Russian groupings were distinct in the people they talked to and had little obvious contact with one another. Some notable exceptions are:
  • Peskov talked to Cohen and Page independently
  • Dmitriev and Peskov might have talked to each other (p. 149), but there were some ‘investigative technique’ redactions so I’m not sure
  • Kilimnik was aware of Page’s December visit to Russia and discussed it with Manafort, saying “Carter Page is in Moscow today, sending messages he is authorized to talk to Russia on behalf of DT on a range of issues of mutual interest, including Ukraine” (p. 166). Leads me to ask: who would know the whereabouts and discussions of other people? Spies. That’s who.
Conclusions on Volume 1

Overall, I get the impression that the Trump campaign did not have the ‘best people’. Cohen tried to make a deal but couldn’t find the right people to talk to. Papadopoulos and DTJ tried to get dirt on Clinton but couldn’t find anything. Page seemed to use the campaign as a platform to create more connections with Russians. A few ‘friends’ (Stone and Prince) lent a hand but probably hurt Trump’s credibility by dealing with Russians more than they helped him. Manafort, a seasoned campaigner, wasn’t obviously working for Trump… he worked for free after all. It seemed like a group that was willing to do shady things, for their own personal gain, but without the ability to follow through. SAD!

All Together Graph

Conclusions on Analysis

Running text analysis before reading the report was very helpful to understanding it. There are just so many connections going on it’s hard to keep track. Running some basic clustering techniques as described above helped me zone in on what to look for while reading the report.

To leave a comment for the author, please follow the link and comment on their blog: sweissblaug. R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job. Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Dash with golem: The beginning

Mon, 19/08/2019 - 00:16

[This article was first published on Rtask, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here) Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

{golem} has been developed to help build big Shiny applications to put in production. What if {golem} could be used to build another popular kind of interactive web application, recently made available to R programmers: Dash?

Dash, a newcomer in interactive web applications

A few days ago, Plotly announced that Dash is now available for R. After reading this announcement, I thought this

The article Dash with golem: The beginning appeared first on Rtask.


To leave a comment for the author, please follow the link and comment on their blog: Rtask. R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job. Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

How To Select Multiple Columns Using Grep & R

Sun, 18/08/2019 - 22:31

[This article was first published on Data Science Using R – FinderDing, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here) Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Why you need to be using Grep when programming with R.

There’s a reason that grep is included in most, if not all, programming languages to this day, 44 years after its creation. It’s useful and simple to use. Below is an example of using grep to make selecting multiple columns in R simple and easy to read.

The dataset below has the following column names.

names(data)
# Column Names
 [1] "fips"                "state"                "county"           "metro_area"
 [5] "population"          "med_hh_income"        "poverty_rate"     "population_lowaccess"
 [9] "lowincome_lowaccess" "no_vehicle_lowaccess" "s_grocery"        "s_supermarket"
[13] "s_convenience"       "s_specialty"          "s_farmers_market" "r_fastfood"
[17] "r_full_service"

How can we select only the columns we need to work with?

  • metro_area
  • med_hh_income
  • poverty_rate
  • population_lowaccess
  • lowincome_lowaccess
  • no_vehicle_lowaccess
  • s_grocery
  • s_supermarket
  • s_convenience
  • s_specialty
  • s_farmers_market
  • r_fastfood
  • r_full_service

We can tell R exactly by listing each column as below

data[c("metro_area", "med_hh_income", "poverty_rate", "population_lowaccess",
       "lowincome_lowaccess", "no_vehicle_lowaccess", "s_grocery", "s_supermarket",
       "s_convenience", "s_specialty", "s_farmers_market", "r_fastfood",
       "r_full_service")]

OR

We can tell R where each column we want is.

data[c(4,6,7:17)]

First, writing out each individual column is time consuming, and chances are you’re going to make a typo (I did when writing it). For the second option, we have to first figure out where the columns are located and then tell R. Looking at the columns we are trying to access versus the others, there’s a specific difference: all of these columns have a “_” in their name, and we can use regular expressions (grep) to select them.

data[grep("_", names(data))]

FYI… to get the column locations you can actually use…

grep("_", names(data)) [1] 4 6 7 8 9 10 11 12 13 14 15 16 17
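
The same idea extends to more specific patterns. For instance, to grab only the store-related columns above (names beginning with "s_"), shown purely as an illustration:

# columns whose names start with "s_"
data[grep("^s_", names(data))]

# the equivalent logical-mask version with grepl()
data[grepl("^s_", names(data))]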

You will rarely have a regular expression as easy as “_” to select multiple columns; a very useful resource to learn and practice is https://regexr.com

Data was obtained from https://www.ers.usda.gov/data-products/food-access-research-atlas/download-the-data/

The post How To Select Multiple Columns Using Grep & R appeared first on FinderDing.


To leave a comment for the author, please follow the link and comment on their blog: Data Science Using R – FinderDing. R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job. Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

noaastorms R package now supports NOAA IBTrACS v4

Sun, 18/08/2019 - 02:00

[This article was first published on Blog - BS, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here) Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Earlier this year, I released a simple R package (available at basilesimon/noaastorms) that downloads, cleans and parses NOAA IBTrACS data for you.

As NOAA has updated its datasets, noaastorms now uses these!

How to install

library(devtools)
install_github("basilesimon/noaastorms")

Available functions

getStorms: Fetch NOAA historical best track storms data

> df <- getStorms(c('EP'))
> head(df[1:5])
     Serial_Num Season Num Basin Sub_basin    Name
2 1902276N14266   1902  01    EP        MM UNNAMED
3 1902276N14266   1902  01    EP        MM UNNAMED
4 1902276N14266   1902  01    EP        MM UNNAMED
5 1902276N14266   1902  01    EP        MM UNNAMED
6 1902276N14266   1902  01    EP        MM UNNAMED

The first argument is a vector of basin codes from this list:

  • NA: North Atlantic
  • SA: South Atlantic
  • NI: North Indian
  • SI: South Indian
  • EP: East Pacific
  • SP: South Pacific
  • WP: West Pacific

To get storms that took place in the Atlantic for example, run getStorms(c('NA', 'SA')).

The second (optional) argument is a date range to filter data with. For example:

dateRange <- c(as.Date('2010-01-01'), as.Date('2012-12-31'))
getStorms(c('NA', 'SA'), dateRange = dateRange)

This will query storms that took place in the Atlantic between 2010 and 2012.

Usage

# load a map of the world and
# use `clipPolys` to avoid issues
# when zooming in with `coord_map`
wm <- map_data("world")

library("PBSmapping")
data.table::setnames(wm, c("X", "Y", "PID", "POS", "region", "subregion"))
worldmap <- clipPolys(wm, xlim = c(20, 110), ylim = c(0, 45), keepExtra = TRUE)

# load storms for the Atlantic ocean
spStorms <- getStorms(c('NA', 'SA'))

ggplot(spStorms,
       aes(x = Longitude, y = Latitude, group = Serial_Num)) +
  geom_polygon(data = worldmap,
               aes(x = X, y = Y, group = PID),
               fill = "whitesmoke", colour = "gray10", size = 0.2) +
  geom_path(alpha = 0.1, size = 0.8, color = "red") +
  coord_map(xlim = c(20, 110), ylim = c(0, 45))

Official changelog (retrieved Aug 16, 2019)

https://www.ncdc.noaa.gov/ibtracs/index.php?name=status

This is the first release of IBTrACS version 04. It is updated weekly.
Release date: March 2019

New features (improvements from v03):
* Best track data updated daily and contain provisional tracks of recent storms.
* Reduced formats – Version 4 is available in 3 formats (netCDF, CSV, shapefiles)
* Consistent formats – The data presented in each format is completely interconsistent (identical).
* More parameters – More parameters provided by the agencies are provided in IBTrACS
* Basin assignment – Any system occuring in a basin is included in that basin file (in version 3, the storm was only included in the basin in which it had its genesis)
* New derived parameters – We provide storm translation speed and direction and other variables requested by users.


To leave a comment for the author, please follow the link and comment on their blog: Blog - BS. R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job. Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Missing Values In Dataframes With Inspectdf

Sun, 18/08/2019 - 02:00

[This article was first published on Alastair Rushworth, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here) Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Summarising NA by column in dataframes

Exploring the number of records containing missing values in a new set
of data is an important and well known exploratory check. However, NAs
can be introduced into your data for a multitude of other reasons, often
as a side effect of data manipulations like transforming columns or
performing joins. In most cases, the behaviour is expected, but
sometimes when things go wrong, tracing missing values back through a
sequence of steps can be a helpful diagnostic.

All of that is to say that it’s vital to have simple tools for
interrogating dataframes for missing values… enter inspectdf!

Missingness by column: inspectdf::inspect_na()

The inspect_na() function from the inspectdf package is a simple
tool designed to quickly summarise the frequency of missingness by
columns in a dataframe. Firstly, install the inspectdf package by
running

install.packages("inspectdf")

Then load both the inspectdf and dplyr packages – the latter we’ll
just use for its built-in starwars dataset.

# load packages
library(inspectdf)
library(dplyr)

# quick peek at starwars data that comes with dplyr
head(starwars)
## # A tibble: 6 x 13
##   name  height  mass hair_color skin_color eye_color birth_year gender
##
## 1 Luke…    172    77 blond      fair       blue            19   male
## 2 C-3PO    167    75            gold       yellow         112
## 3 R2-D2     96    32            white, bl… red             33
## 4 Dart…    202   136 none       white      yellow          41.9 male
## 5 Leia…    150    49 brown      light      brown           19   female
## 6 Owen…    178   120 brown, gr… light      blue            52   male
## # … with 5 more variables: homeworld , species , films ,
## #   vehicles , starships

So how many missing values are there in starwars? Even looking at the
output of the head() function reveals that there are at least a few
NAs in there. The use of the inspect_na() function is very
straightforward:

starwars %>% inspect_na
## # A tibble: 13 x 3
##    col_name     cnt  pcnt
##
##  1 birth_year    44 50.6
##  2 mass          28 32.2
##  3 homeworld     10 11.5
##  4 height         6  6.90
##  5 hair_color     5  5.75
##  6 species        5  5.75
##  7 gender         3  3.45
##  8 name           0  0
##  9 skin_color     0  0
## 10 eye_color      0  0
## 11 films          0  0
## 12 vehicles       0  0
## 13 starships      0  0

The output is a simple tibble with columns showing the count (cnt)
and percentage (pcnt) of NAs corresponding to each column
(col_name) in the starwars data. For example, we can see that the
birth_year column has the highest number of NAs with over half
missing. Note that the tibble is sorted in descending order of the
frequency of NA occurrence.
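
For comparison, here is a rough base-R equivalent of the counts and percentages that inspect_na() reports (ordering and formatting differ):

# rough base-R equivalent of the summary above
na_counts <- colSums(is.na(starwars))
na_pcnt   <- 100 * na_counts / nrow(starwars)
sort(na_pcnt, decreasing = TRUE)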

By adding the show_plot command, the tibble can also be displayed
graphically:

starwars %>% inspect_na %>% show_plot

Although this is a simple summary, and you’ll find many other ways to do
this in R, I use this all of the time and find it very convenient to
have a one-liner to call on. Code efficiency matters!

More on the inspectdf package and exploratory data analysis

inspectdf can be used to produce a number of common summaries with
minimal effort. See previous posts to learn how to explore and
visualise categorical
data

and to calculate and display correlation
coefficients
.
For a more general overview, have a look at the package
website
.

For a recent overview of R packages for exploratory analysis, you might
also be interested in the recent paper The Landscape of R Packages for
Automated Exploratory Data Analysis by Mateusz Staniak and Przemysław
Biecek
.

Comments? Suggestions? Issues?

Any feedback is welcome! Find me on twitter at
rushworth_a or write a github
issue
.


To leave a comment for the author, please follow the link and comment on their blog: Alastair Rushworth. R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job. Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Tech Dividends, Part 2

Sat, 17/08/2019 - 02:00

[This article was first published on R Views, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here) Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.




In a previous post, we explored the dividend history of stocks included in the SP500, and we followed that with exploring the dividend history of some NASDAQ tickers. Today’s post is a short continuation of that tech dividend theme, with the aim of demonstrating how we can take our previous work and use it to quickly visualize research from the real world. In this case, the inspiration is the July 27th edition of Barron’s, which has an article called 8 Tech Stocks That Yield Steady Payouts. (As of this writing, a subscription is required to view that article, unfortunately. If you do peruse that issue, I also recommend the interview with GMO’s Jeff Montier, as well, as he offers an interesting viewpoint on modern monetary theory.)

The article breaks out eight tech stocks with attractive dividends: IBM, HPQ, TXN, CSCO, INTC, ORCL, AAPL and MSFT. It also mentions QCOM as an interesting stock to watch. We’ll piggyback on the substance of the article and visualize the dividend history of those nine tickers.

First, let’s load up our packages and create a vector of tickers called barrons_tickers. We will pass that vector to tq_get(get = "dividends") just as we did last time. Indeed, we’re not going to do much differently today, but hopefully it’s a nice way to see how previous work can be applied to other situations. Ah, the joys of code that can be reused!

library(tidyverse)
library(tidyquant)
library(janitor)
library(plotly)

barrons_tickers <- c("IBM", "HPQ", "TXN", "CSCO", "INTC", "ORCL", "AAPL", "MSFT", "QCOM")

barrons_dividends <- barrons_tickers %>%
  tq_get(get = "dividends")

We can reuse our code from the previous post to quickly visualize these tickers’ dividend histories, along with a detailed tooltip setting in plotly.

ggplotly(
  barrons_dividends %>%
    group_by(symbol) %>%
    mutate(info = paste(date, '
symbol:', symbol, '
div: $', dividends)) %>%
    ggplot(aes(x = date, y = dividends, color = symbol, label_tooltip = info)) +
    geom_point() +
    scale_y_continuous(labels = scales::dollar) +
    scale_x_date(breaks = scales::pretty_breaks(n = 10)) +
    labs(x = "", y = "div/share", title = "Nasdaq dividends") +
    theme(plot.title = element_text(hjust = 0.5)),
  tooltip = "label_tooltip"
)

[Interactive plotly chart: "Nasdaq dividends" — quarterly dividend per share (div/share) by date for AAPL, CSCO, HPQ, IBM, INTC, MSFT, ORCL, QCOM and TXN, with a hover tooltip showing the date, symbol and dividend for each point.]

With a handful of stocks, our visualization really tells a nice story. We can more clearly see the four annual payments made by each company, and it pops off the chart that IBM has been raising its dividend consistently. Not bad for a company that also owns Red Hat.
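
If we want to put a number on that impression instead of eyeballing the chart, a quick summary of each ticker’s first and most recent payment in our sample does the trick. This is just a sketch layered on the barrons_dividends tibble we already have; the first_div, latest_div and growth column names are made up for illustration, not something from the original flow, and it leans on the same assumption as the slice(n()) trick below, namely that the rows are ordered by date within each symbol.

barrons_dividends %>%
  group_by(symbol) %>%
  summarise(
    first_div  = first(dividends),                   # earliest payment in the sample
    latest_div = last(dividends),                    # most recent payment in the sample
    growth     = last(dividends) / first(dividends)  # rough growth multiple over the period
  ) %>%
  arrange(desc(growth))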

Let’s move beyond the dividend history and compare the dividend yields for each of these tickers. We’ll grab recent closing prices by calling tq_get(get = "stock.prices", from = "2019-08-05").

barrons_price <- barrons_tickers %>%
  tq_get(get = "stock.prices", from = "2019-08-05")
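
Since that call returns every trading day from August 5th onward, a variation worth sketching (not something the original flow does) is to keep only the most recent close per ticker before joining, so the yield table ends up with one row per symbol rather than one row per symbol-day. The barrons_latest_close name is just an example.

barrons_latest_close <- barrons_price %>%
  group_by(symbol) %>%
  filter(date == max(date)) %>%   # keep only the latest trading day per ticker
  ungroup() %>%
  select(symbol, close)

Joining on barrons_latest_close instead of barrons_price would sidestep the duplicated date columns we run into below, and it would also keep geom_col() from stacking multiple same-symbol rows later on.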

Now, we estimate the annual dividend payment by taking the most recent quarterly dividend via slice(n()) and multiplying by four.

barrons_dividends %>%
  group_by(symbol) %>%
  slice(n()) %>%
  mutate(total_div = dividends * 4)

# A tibble: 9 x 4
# Groups:   symbol [9]
  symbol date       dividends total_div
1 AAPL   2019-08-09     0.77       3.08
2 CSCO   2019-07-03     0.35       1.4
3 HPQ    2019-06-11     0.16       0.64
4 IBM    2019-08-08     1.62       6.48
5 INTC   2019-08-06     0.315      1.26
6 MSFT   2019-08-14     0.46       1.84
7 ORCL   2019-07-16     0.24       0.96
8 QCOM   2019-06-05     0.62       2.48
9 TXN    2019-07-30     0.77       3.08
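
Multiplying the most recent quarterly payment by four is a quick approximation, and it can overstate or understate the annual figure when a company pays a special dividend or raises the payout mid-year. As an alternative sketch, not part of the original flow, we could sum the trailing four payments per ticker instead (total_div_trailing is a made-up column name):

barrons_dividends %>%
  group_by(symbol) %>%
  slice((n() - 3):n()) %>%    # last four payments per ticker
  summarise(total_div_trailing = sum(dividends))

The two conventions will differ slightly for any ticker that changed its payout during the past year, so it’s worth being explicit about which one we’re reporting.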

Next, we use left_join(barrons_price, by = "symbol") to add the closing prices.

barrons_dividends %>%
  group_by(symbol) %>%
  slice(n()) %>%
  mutate(total_div = dividends * 4) %>%
  left_join(barrons_price, by = "symbol") %>%
  head()

# A tibble: 6 x 11
# Groups:   symbol [1]
  symbol date.x     dividends total_div date.y      open  high   low close
1 AAPL   2019-08-09      0.77      3.08 2019-08-05  198.  199.  193.  193.
2 AAPL   2019-08-09      0.77      3.08 2019-08-06  196.  198.  194.  197
3 AAPL   2019-08-09      0.77      3.08 2019-08-07  195.  200.  194.  199.
4 AAPL   2019-08-09      0.77      3.08 2019-08-08  200.  204.  199.  203.
5 AAPL   2019-08-09      0.77      3.08 2019-08-09  201.  203.  199.  201.
6 AAPL   2019-08-09      0.77      3.08 2019-08-12  200.  202.  199.  200.
# … with 2 more variables: volume, adjusted

That worked, but note how we now have two date columns, called date.x and date.y, since both of our tibbles had a date column before we joined them. In the past, we have dealt with that by deleting the duplicate column afterwards, but this time let’s use a select() inside left_join() to remove the duplicate before joining. The full call is left_join(select(barrons_price, -date), by = "symbol").

barrons_dividends %>%
  group_by(symbol) %>%
  slice(n()) %>%
  mutate(total_div = dividends * 4) %>%
  left_join(select(barrons_price, -date), by = "symbol") %>%
  head()

# A tibble: 6 x 10
# Groups:   symbol [1]
  symbol date       dividends total_div  open  high   low close volume
1 AAPL   2019-08-09      0.77      3.08  198.  199.  193.  193. 5.24e7
2 AAPL   2019-08-09      0.77      3.08  196.  198.  194.  197  3.58e7
3 AAPL   2019-08-09      0.77      3.08  195.  200.  194.  199. 3.34e7
4 AAPL   2019-08-09      0.77      3.08  200.  204.  199.  203. 2.70e7
5 AAPL   2019-08-09      0.77      3.08  201.  203.  199.  201. 2.46e7
6 AAPL   2019-08-09      0.77      3.08  200.  202.  199.  200. 2.25e7
# … with 1 more variable: adjusted

Now, we calculate the yield with mutate(yield = total_div/close).

barrons_dividends %>%
  group_by(symbol) %>%
  slice(n()) %>%
  mutate(total_div = dividends * 4) %>%
  left_join(barrons_price, by = "symbol") %>%
  select(symbol, total_div, close) %>%
  mutate(yield = total_div/close)

# A tibble: 81 x 4
# Groups:   symbol [9]
   symbol total_div close  yield
 1 AAPL        3.08 193.  0.0159
 2 AAPL        3.08 197   0.0156
 3 AAPL        3.08 199.  0.0155
 4 AAPL        3.08 203.  0.0151
 5 AAPL        3.08 201.  0.0153
 6 AAPL        3.08 200.  0.0154
 7 AAPL        3.08 209.  0.0147
 8 AAPL        3.08 203.  0.0152
 9 AAPL        3.08 202.  0.0153
10 CSCO        1.4   51.4 0.0273
# … with 71 more rows

We can plot the dividend yields as bar heights using geom_col().

barrons_dividends %>%
  group_by(symbol) %>%
  slice(n()) %>%
  mutate(total_div = dividends * 4) %>%
  left_join(barrons_price, by = "symbol") %>%
  select(symbol, total_div, close) %>%
  mutate(yield = total_div/close) %>%
  ggplot(aes(x = reorder(symbol, yield), y = yield, fill = symbol)) +
  geom_col(width = .5) +
  labs(x = "") +
  scale_y_continuous(labels = scales::percent)

We could wrap this up with a call to plotly, but let’s totally change direction and add some animation. Animate a chart? That sounds really hard; I guess we’ll need to loop through the dates and add dots as we go. A lot of work, and who has the time… wait… boom… gganimate to the rescue!

The gganimate package makes this so painless it’s a shame. We add transition_reveal(date) to the end of the code flow, and that’s it! Well, not quite; on my machine, I needed to load the gifski and png packages before any of this works, but then we’re good to go.

library(gganimate)
library(gifski)
library(png)

barrons_dividends %>%
  group_by(symbol) %>%
  ggplot(aes(x = date, y = dividends, color = symbol)) +
  geom_point() +
  scale_y_continuous(labels = scales::dollar) +
  scale_x_date(breaks = scales::pretty_breaks(n = 10)) +
  labs(x = "", y = "div/share", title = "Nasdaq dividends") +
  theme(plot.title = element_text(hjust = 0.5)) +
  transition_reveal(date)

Nice!

What about animating our chart that shows the dividend yield as bar heights? Well, we can’t reveal by date here, so we use transition_states(symbol).

barrons_dividends %>%
  group_by(symbol) %>%
  slice(n()) %>%
  mutate(total_div = dividends * 4) %>%
  left_join(select(barrons_price, -date), by = "symbol") %>%
  select(symbol, total_div, close) %>%
  mutate(yield = total_div/close) %>%
  ggplot(aes(x = reorder(symbol, yield), y = yield, fill = symbol)) +
  geom_col(width = .5) +
  labs(x = "") +
  scale_y_continuous(labels = scales::percent) +
  transition_states(symbol)

Ah, not quite perfect: notice the chart doesn’t respect the reorder in our aes(), so the bars appear in alphabetical order, and each column disappears as the next one appears. Let’s use shadow_mark() to keep the previous bars on screen and attempt to reorder them with arrange().

barrons_dividends %>%
  group_by(symbol) %>%
  slice(n()) %>%
  mutate(total_div = dividends * 4) %>%
  left_join(select(barrons_price, -date), by = "symbol") %>%
  select(symbol, total_div, close) %>%
  mutate(yield = total_div/close) %>%
  arrange(yield) %>%
  ggplot(aes(x = reorder(symbol, yield), y = yield, fill = symbol)) +
  geom_col(width = .5) +
  labs(x = "") +
  scale_y_continuous(labels = scales::percent) +
  transition_states(symbol, wrap = FALSE) +
  shadow_mark()

It is still not respecting the new order and is defaulting to alphabetical. Let’s hard-code that reordering by converting symbol to a factor, ordered by yield. That means a foray into the forcats package and fct_reorder(). Note that we need to ungroup() first, since symbol is our grouping column, and then we can call symbol_fct = forcats::as_factor(symbol) %>% fct_reorder(yield). I also think it would be a little more dramatic to remove the x-axis labels and have the ticker names appear on the chart.

barrons_dividends %>%
  group_by(symbol) %>%
  slice(n()) %>%
  mutate(total_div = dividends * 4) %>%
  left_join(select(barrons_price, -date), by = "symbol") %>%
  select(symbol, total_div, close) %>%
  mutate(yield = total_div/close) %>%
  ungroup() %>%
  mutate(symbol_fct = forcats::as_factor(symbol) %>% fct_reorder(yield)) %>%
  ggplot(aes(x = symbol_fct, y = yield, fill = symbol_fct)) +
  geom_col(width = .5) +
  geom_label(aes(label = symbol, y = yield), nudge_y = .03) +
  labs(x = "") +
  scale_y_continuous(labels = scales::percent) +
  theme(axis.text.x = element_blank(), axis.ticks.x = element_blank()) +
  transition_states(symbol_fct, wrap = FALSE) +
  shadow_mark()

Creating and loading those animated gifs takes some time, about 10-30 seconds each on my RStudio Server Pro instance. Plus, it’s totally fair to quibble that these animations haven’t added any new substance to the charts; they just look cool (R plots can be cool, right?). But if you’ve read this far (thanks!), I might as well subject you to my rant about visualization and communication being just as important as, if not more important than, analytical or statistical findings. Most of the consumers of our work are really busy, and we’re lucky if they spend two minutes glancing at whatever findings we put in front of them. We don’t have long to grab their attention and communicate our message. If an animation helps us, it’s worth spending the extra time on it, even though we were actually ‘done’ with this job many lines of code ago.
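
If render time becomes a nuisance, gganimate also lets us trade smoothness for speed by rendering fewer frames, and anim_save() writes the result to disk so it only has to be rendered once. Here is a minimal sketch; the frame count, fps and file name are arbitrary choices for illustration, not anything from the flow above.

library(gganimate)

anim <- barrons_dividends %>%
  ggplot(aes(x = date, y = dividends, color = symbol)) +
  geom_point() +
  transition_reveal(date)

# fewer frames and a lower fps render noticeably faster than the defaults
gif <- animate(anim, nframes = 50, fps = 5, renderer = gifski_renderer())
anim_save("nasdaq_dividends.gif", animation = gif)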

Alright, so with that:

If you like this sort of code-through, check out my book, Reproducible Finance with R.

Not specific to finance, but several of the stringr and ggplot tricks in this post came from this awesome Business Science University course.

I’m also going to be posting weekly code snippets on LinkedIn; connect with me there if you’re keen for some R finance stuff.

Thanks for reading and see you next time!



To leave a comment for the author, please follow the link and comment on their blog: R Views. R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job. Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Modern R with the tidyverse is available on Leanpub

Sat, 17/08/2019 - 02:00

[This article was first published on Econometrics and Free Software, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here) Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.


Yesterday I released an ebook on Leanpub,
called Modern R with the tidyverse, which you can also
read for free here.

In this blog post, I want to give some context.

Modern R with the tidyverse is the second ebook I have released on Leanpub. I released the first one, called Functional programming and unit testing for data munging with R, around Christmas 2016 (I’ve since retired it on Leanpub, but you can still read it for free here). I had just moved back to my home country of Luxembourg and started a new job as a research assistant at the national statistical institute. Since then, lots of things have happened: I changed jobs and joined PwC Luxembourg as a data scientist, was promoted to manager, finished my PhD and, most importantly of all, became a father.

Through all this, I continued blogging and working on a new ebook, called Modern R with the tidyverse. At first, this was supposed to be a separate book from the first one, but as I continued writing, I realized that updating and finishing the first one would take a lot of effort, and also that it wouldn’t make much sense to keep the two separate. So I decided to merge the content from the first ebook with the second, and update everything in one go.

My very first notes were around 50 pages, if memory serves, and I used them to teach R at the University of Strasbourg while I was employed there as a research and teaching assistant working on my PhD. These notes were the basis of Functional programming and unit testing for data munging with R, and now of Modern R. Chapter 2 of Modern R is almost a simple copy and paste from these notes (with more sections added). These notes were first written around 2012-2013.

Modern R is the kind of text I would like to have had when I first started playing around with R, sometime around 2009-2010. It starts from the beginning, but it also goes into quite a lot of detail in the later chapters. For instance, the section on modeling with functional programming is quite advanced, but I believe that readers who read through the whole book and reach that part will be armed with all the knowledge needed to follow. At least, that is my hope.

Now, the book is still not finished. Two chapters are missing, but it should not take me long to finish them, as I already have drafts lying around. However, some exercises might still be in the wrong places, and more are required. Also, generally, more polishing is needed.

As written in the first paragraph of this section, the book is available on Leanpub. Unlike my previous ebook, this one costs money: a minimum price of $4.99 and a recommended price of $14.99, but as mentioned, you can read it for free online. I hesitated to give it a minimum price of $0, but I figured that since the book can be read for free online, and since Leanpub has a 45-day return policy where readers can get 100% reimbursed, no questions asked (and keep the downloaded ebook), readers are not taking much of a risk by buying it for 5 bucks. I sure hope, however, that readers will find that this ebook is worth at least 5 bucks!

Now, why should you read it? There are already a lot of books on learning how to use R. Well, I don’t really want to convince you to read it. But some people do seem to like my style of writing and my blog posts, so I guess these same people, or similar people, might like the ebook. Also, I think that this ebook covers a lot of different topics, enough of them to make you an efficient R user. But as I’ve written in the introduction of Modern R:

So what you can expect from this book is that this book is not the only one you should read.

Anyway, I hope you’ll enjoy Modern R! Suggestions, criticisms and reviews are welcome!

By the way, the cover of the book is a painting by John William Waterhouse, depicting Diogenes of Sinope, an ancient Greek philosopher and absolute mad lad. Read his Wikipedia page; it’s worth it.

Hope you enjoyed this post! If you found it useful, you might want to follow me on Twitter for blog post updates, buy me an espresso or donate via paypal.me, or buy my ebook on Leanpub.



To leave a comment for the author, please follow the link and comment on their blog: Econometrics and Free Software. R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job. Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
