# R-bloggers

R news and tutorials contributed by hundreds of R bloggers

### Notes on Becoming an RStudio Certified Trainer

Tue, 20/08/2019 - 19:04

I recently became an RStudio Certified Trainer, and thought that it might interest the broader R Community to learn about this new program.

For those who don’t know, RStudio has recently put together a process to independently verify that R trainers (a) are proficient with the Tidyverse and (b) know modern teaching pedagogy. Certified trainers get listed on RStudio’s website and also get referrals for training requests. Apparently there are a lot of people who want to learn the Tidyverse, and RStudio cannot keep up with the demand themselves!

The author with Garrett Grolemund at RStudio’s 2018 conference

I was actually one of the first people involved in this program, having taken Garrett Grolemund’s Tidyverse Train-the-Trainer workshop at RStudio’s 2018 Conference. Garrett, a Data Scientist and Master Instructor at RStudio, had recently created a popular workshop for introducing people to the Tidyverse.

The idea behind Tidyverse Train-the-Trainer was for people to learn two things. The first, of course, was to learn the ins and outs of Garrett’s workshop on the Tidyverse. The second, and perhaps more important, thing to learn was how Garrett had come to create this workshop. This involved learning a lot of important research that’s been done on adult education. The workshop also had lots of time for us to practice what we were learning.

At the end of the workshop we got the slides Garrett uses for his own workshop on the Tidyverse, and were told that we could use and modify them however we wanted. Perhaps it’s not surprising, but Garrett’s slides on ggplot2 and dplyr were fantastic, and I now use them when I teach!

I should also mention that the requirements for becoming Certified have recently increased. I believe that when I first took Garrett’s workshop, everyone who attended received a certificate. But recently, RStudio has started listing their “Certified Training Partners” on their website. In order to be listed in this directory I had to take two additional exams: one on the Tidyverse and one on teaching. The exams were given online, and were proctored by an RStudio employee.

Overall, I would recommend this program to anyone who wants to improve their ability to teach R. If you are a professional trainer, then the program can only help you in your career. But many people in the workshop were not professional trainers. They worked in academia and the corporate sector, and simply wanted help in bringing R to their organizations.

You can see the full list of RStudio Certified Trainers here. If you are interested in becoming certified yourself, you can learn more about the application process here.

The post Notes on Becoming an RStudio Certified Trainer appeared first on AriLamstein.com.


### Modern reporting for R with Dash

Tue, 20/08/2019 - 07:30

[This article was first published on R – Modern Data, and kindly contributed to R-bloggers.]

Creating an effective, informative, and aesthetically appealing report to showcase your data can be tedious: it’s often difficult to display your data and your plots together in an uncluttered manner, and even harder to implement interactivity between the individual elements. Dash for R facilitates this task, providing an intuitive way to make interactive and customizable reports directly from the R environment, without the need to create your own JavaScript components. If you’re already using R for data wrangling, visualization, and analysis, it’s convenient to stay within the R ecosystem to create your report as well.

Dash for R allows users to present interactive plots and tabular data side-by-side to monitor, highlight, or explore key aspects of their data. The library includes a rich set of GUI components that make it easy to interact with your data out of the box, and allows for customizing all aspects of your dashboard. As a result, it’s surprisingly easy to create a modern report with an intuitive user interface to better communicate your data.

Displaying tabular data can give the reader a good sense of the data you are working with, but when it is shown as a static table, it can be hard to digest and intimidating. Instead, it’s nice to display an interactive, formattable spreadsheet, providing a familiar and flexible tool within the report itself. The Dash DataTable component creates tables that can be sorted, filtered, and conditionally formatted, providing extensive support for customized views.

These tables can also be linked to your plots, so when you modify or filter your data, the changes to your data tables are reflected graphically on the fly. As data are added or modified in the table, the changes are immediately reflected in the linked plot. Data tables that are created or modified in your report can be downloaded locally, so they can be used in another program as well.

Tabbed applications

With complex analyses, it’s common to end up with more data than can reasonably be displayed at once. It’s better to organize the layout of your app so that the different aspects of your analyses are grouped together. This is where separate tabs and pages are useful. Dash takes the hassle out of creating multi-page apps, allowing you to compartmentalize the data and charts that you display into tabs, using the dccTab component.
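As a minimal sketch of what a tabbed layout looks like (assuming the dash, dashCoreComponents and dashHtmlComponents packages are installed; the ids, labels and placeholder content here are invented for illustration):

```r
library(dash)
library(dashCoreComponents)
library(dashHtmlComponents)

app <- Dash$new()

# One tab per view: map, summary plots, data table
app$layout(htmlDiv(list(
  dccTabs(id = "report-tabs", children = list(
    dccTab(label = "Map",        children = list(htmlDiv("interactive map here"))),
    dccTab(label = "Summary",    children = list(htmlDiv("summary plots here"))),
    dccTab(label = "Data table", children = list(htmlDiv("data table here")))
  ))
)))

# app$run_server()  # uncomment to serve the app locally
```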

For example, if your data has a geographical component, you can display an interactive map in one tab, summary plots in another, and a data table in a third. This allows for an uncluttered display of your data, and separates different views or controls for an easily understandable visualization.

You can also use dccLocation and dccLink to create a multi-page app that can be navigated through links instead of tabs. In fact, our interactive online Dash for R documentation is a multi-page Dash app in itself.

Styling and customization

Whether you have a specific vision for your app or need to incorporate your company’s branding, reports made with Dash are completely customizable. Components can be styled inline with the style property, using local CSS in your app’s assets directory or via an external CSS stylesheet. This means you can quickly modify the look of an individual component directly in R, or reference a CSS file that will apply styles to your components given their className or id. The ability to style an app using an external stylesheet means you can create generalizable styles to be applied to multiple components and have deeper control over the styling of the components, like sliders or radio buttons.
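For instance, an individual component might be styled inline as a sketch like the one below (the style property takes a named list of camelCased CSS properties; the className and values here are invented for illustration):

```r
library(dashHtmlComponents)

# Inline style: a named list of camelCased CSS properties.
# The className ties this component to rules in a local or external stylesheet.
htmlDiv("Quarterly summary",
        className = "report-header",   # styled via the assets directory or external CSS
        style = list(color = "#2c3e50", fontSize = "22px", fontWeight = "bold"))
```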

Interested in learning more?

You can explore full working examples of apps and reports, along with the code to generate them, in the Dash app gallery. Many of these examples show modern takes on traditional dashboards, while others, such as the financial report example pictured below, are structured more like interactive PDFs, allowing researchers and analysts to deliver beautiful and informative reports to their collaborators or clients.


### How to get an AUC confidence interval

Tue, 20/08/2019 - 05:45

[This article was first published on R – Open Source Automation, and kindly contributed to R-bloggers.]

Background

AUC is an important metric in machine learning for classification. It is often used as a measure of a model’s performance. In effect, AUC is a measure between 0 and 1 of a model’s performance that rank-orders predictions from a model. For a detailed explanation of AUC, see this link.
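For intuition on the rank-ordering interpretation, AUC can be computed directly from ranks (the Mann–Whitney formulation): it equals the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative one. A base-R sketch (the helper `auc_rank` is our own, for illustration only, and is not part of pROC):

```r
# Rank-based AUC: probability that a random positive outranks a random
# negative (ties ignored for simplicity). `auc_rank` is a hypothetical
# helper for illustration, not a pROC function.
auc_rank <- function(labels, scores) {
  r <- rank(scores)
  n_pos <- sum(labels == 1)
  n_neg <- sum(labels == 0)
  (sum(r[labels == 1]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

# A model that ranks every positive above every negative has AUC = 1;
# reversing the scores gives AUC = 0, and random scoring hovers near 0.5.
auc_rank(c(0, 0, 1, 1), c(0.1, 0.4, 0.6, 0.9))  # 1
auc_rank(c(0, 0, 1, 1), c(0.9, 0.6, 0.4, 0.1))  # 0
```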

Since AUC is widely used, being able to get a confidence interval around this metric is valuable to both better demonstrate a model’s performance, as well as to better compare two or more models. For example, if model A has an AUC higher than model B, but the 95% confidence interval around each AUC value overlaps, then the models may not be statistically different in performance. We can get a confidence interval around AUC using R’s pROC package, which uses bootstrapping to calculate the interval.

Building a simple model to test

To demonstrate how to get an AUC confidence interval, let’s build a model using a movies dataset from Kaggle (you can get the data here).

Reading in the data

```r
# load packages
library(pROC)
library(dplyr)
library(randomForest)

# read in dataset
movies <- read.csv("movie_metadata.csv")

# remove records with missing budget / gross data
movies <- movies %>%
  filter(!is.na(budget) & !is.na(gross))
```

Split into train / test

Next, let’s randomly select 70% of the records to be in the training set and leave the rest for testing.

```r
# get random sample of rows
set.seed(0)
train_rows <- sample(1:nrow(movies), .7 * nrow(movies))

# split data into train / test
train_data <- movies[train_rows, ]
test_data <- movies[-train_rows, ]

# select only fields we need
train_need <- train_data %>%
  select(gross, duration, director_facebook_likes, budget,
         imdb_score, content_rating, movie_title)
test_need <- test_data %>%
  select(gross, duration, director_facebook_likes, budget,
         imdb_score, content_rating, movie_title)
```

Create the label

Lastly, we need to create our label i.e. what we’re trying to predict. Here, we’re going to predict if a movie’s gross beats its budget (1 if so, 0 if not).

```r
train_need$beat_budget <- as.factor(ifelse(train_need$gross > train_need$budget, 1, 0))
test_need$beat_budget <- as.factor(ifelse(test_need$gross > test_need$budget, 1, 0))
```

Train a random forest

Now, let’s train a simple random forest model with just 50 trees.

```r
# train a random forest (note: missing values are handled via na.action,
# not a `na.omit = TRUE` argument)
forest <- randomForest(beat_budget ~ duration + director_facebook_likes +
                         budget + imdb_score + content_rating,
                       train_need, ntree = 50, na.action = na.omit)
```

Getting an AUC confidence interval

Next, let’s use our model to get predictions on the test set.

```r
test_pred <- predict(forest, test_need, type = "prob")[, 2]
```

And now, we’re ready to get our confidence interval! We can do that in just one line of code using the ci.auc function from pROC. By default, this function computes a 95% confidence interval with DeLong’s method (a stratified bootstrap with 2000 replicates is also available via the method argument). This means our 95% confidence interval for the AUC on the test set is between 0.6198 and 0.6822, as can be seen below.

```r
ci.auc(test_need$beat_budget, test_pred)
# 95% CI: 0.6198-0.6822 (DeLong)
```

We can adjust the confidence level using the conf.level parameter:

```r
ci.auc(test_need$beat_budget, test_pred, conf.level = 0.9)
# 90% CI: 0.6248-0.6772 (DeLong)
```

The post How to get an AUC confidence interval appeared first on Open Source Automation.


### RcppQuantuccia 0.0.3

Tue, 20/08/2019 - 02:45

[This article was first published on Thinking inside the box, and kindly contributed to R-bloggers.]

A maintenance release of RcppQuantuccia arrived on CRAN earlier today.

RcppQuantuccia brings the Quantuccia header-only subset / variant of QuantLib to R. At the current stage, it mostly offers date and calendaring functions.

This release was triggered by some work CRAN is doing on updating C++ standards for code in the repository. Notably, under C++11 some constructs such as ptr_fun, bind1st, bind2nd, … are now deprecated, and CRAN prefers that the code base not issue such warnings (as e.g. now seen under clang++-9). So we updated the corresponding code in a good dozen or so places to the (more current and compliant) code from QuantLib itself.

We also took this opportunity to significantly reduce the footprint of the sources and the installed shared library of RcppQuantuccia. One (unexported) feature was pricing models via Brownian Bridges based on quasi-random Sobol sequences. But the main source file for these sequences comes in at several megabytes in size, and allocates a large number of constants. So in this version the file is excluded, making the current build of RcppQuantuccia lighter in size and more suitable for the (simpler, popular and trusted) calendar functions. We also added a new holiday to the US calendar.

The complete list of changes follows.

Changes in version 0.0.3 (2019-08-19)
• Updated Travis CI test file (#8).

• Updated US holiday calendar data with G H Bush funeral date (#9).

• Updated C++ use to not trigger warnings [CRAN request] (#9).

• Comment-out pragmas to suppress warnings [CRAN Policy] (#9).

• Change build to exclude Sobol sequence reducing file size for source and shared library, at the cost of excluding market models (#10).

Courtesy of CRANberries, there is also a diffstat report relative to the previous release. More information is on the RcppQuantuccia page. Issues and bug reports should go to the GitHub issue tracker.

This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. Please report excessive re-aggregation in third-party for-profit settings.


### Fitting ‘complex’ mixed models with ‘nlme’. Example #1

Tue, 20/08/2019 - 02:00

[This article was first published on R on The broken bridge between biologists and statisticians, and kindly contributed to R-bloggers.]

The environmental variance model

Fitting mixed models has become very common in biology and recent developments involve the manipulation of the variance-covariance matrix for random effects and residuals. To the best of my knowledge, within the frame of frequentist methods, the only freeware solution in R should be based on the ‘nlme’ package, as the ‘lmer’ package does not easily permit such manipulations. The ‘nlme’ package is fully described in Pinheiro and Bates (2000). Of course, the ‘asreml’ package can be used, but, unfortunately, this is not freeware.

Coding mixed models in ‘nlme’ is not always easy, especially when we have crossed random effects, which is very common with agricultural experiments. I have been struggling with this issue very often over the last few years and I thought it might be useful to publish a few examples in this blog, to save colleagues from a few headaches. Please, note that I have already published other posts dealing with the use of the ‘lme()’ function in the ‘nlme’ package, for example this post here about the correlation in designed experiments and this other post here, about heteroscedastic multienvironment experiments.

The first example in this series relates to a randomised complete block design with three replicates, comparing winter wheat genotypes. The experiment was repeated in seven years in the same location. The dataset (‘WinterWheat’) is available in the ‘aomisc’ package, which is the companion package for this blog and is available on GitHub. Information on how to download and install the ‘aomisc’ package is given on this page. Please, note that this dataset shows the data for eight genotypes, but the model that we want to fit requires that the number of environments is higher than the number of genotypes. Therefore, we have to make a subset, at the beginning, removing a couple of genotypes.

The first code snippet loads the ‘aomisc’ package and other necessary packages. Afterwards, it loads the ‘WinterWheat’ dataset, subsets it and turns the ‘Genotype’, ‘Year’ and ‘Block’ variables into factors.

```r
library(plyr)
library(nlme)
library(aomisc)

data(WinterWheat)
WinterWheat <- WinterWheat[WinterWheat$Genotype != "SIMETO" &
                             WinterWheat$Genotype != "SOLEX", ]
WinterWheat$Genotype <- factor(WinterWheat$Genotype)
WinterWheat$Year <- factor(WinterWheat$Year)
WinterWheat$Block <- factor(WinterWheat$Block)
head(WinterWheat, 10)
##    Plot Block Genotype Yield Year
## 1     2     1 COLOSSEO  6.73 1996
## 2     1     1    CRESO  6.02 1996
## 3    50     1   DUILIO  6.06 1996
## 4    49     1   GRAZIA  6.24 1996
## 5    63     1    IRIDE  6.23 1996
## 6    32     1 SANCARLO  5.45 1996
## 9   110     2 COLOSSEO  6.96 1996
## 10  137     2    CRESO  5.34 1996
## 11   91     2   DUILIO  5.57 1996
## 12  138     2   GRAZIA  6.09 1996
```

Dealing with the above dataset, a good candidate model for data analyses is the so-called ‘environmental variance model’. This model is often used in stability analyses for multi-environment experiments and I will closely follow the coding proposed in Piepho (1999):

$y_{ijk} = \mu + g_i + r_{jk} + h_{ij} + \varepsilon_{ijk}$

where $$y_{ijk}$$ is yield (or other trait) for the $$k$$-th block, $$i$$-th genotype and $$j$$-th environment, $$\mu$$ is the intercept, $$g_i$$ is the effect for the i-th genotype, $$r_{jk}$$ is the effect for the $$k$$-th block in the $$j$$-th environment, $$h_{ij}$$ is a random deviation from the expected yield for the $$i$$-th genotype in the $$j$$-th environment and $$\varepsilon_{ijk}$$ is the residual variability of yield between plots, within each environment and block.

We usually assume that $$r_{jk}$$ and $$\varepsilon_{ijk}$$ are independent and normally distributed, with variances equal to $$\sigma^2_r$$ and $$\sigma^2_e$$, respectively. Such an assumption may be questioned, but we will not do it now, for the sake of simplicity.

Let’s concentrate on $$h_{ij}$$, which we will assume as normally distributed with variance-covariance matrix equal to $$\Omega$$. In particular, it is reasonable to expect that the genotypes will have different variances across environments (heteroscedasticity), which can be interpreted as static stability measures (‘environmental variances’; hence the name ‘environmental variance model’). Furthermore, it is reasonable that, if an environment is good for one genotype, it may also be good for other genotypes, so that yields in each environment are correlated, although the correlations can be different for each couple of genotypes. To reflect our expectations, the $$\Omega$$ matrix needs to be totally unstructured, with the only constraint that it is positive definite.
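To see what “totally unstructured, but positive definite” amounts to: a p × p unstructured matrix has p(p+1)/2 free parameters, and positive definiteness can be guaranteed by building the matrix from an unconstrained triangular factor. The base-R sketch below is purely illustrative (it is a common parameterisation device, not nlme’s internal code):

```r
# Build an unstructured, positive-definite 3 x 3 matrix from an
# unconstrained lower-triangular factor: Omega = L %*% t(L).
set.seed(42)
p <- 3
L <- matrix(0, p, p)
L[lower.tri(L, diag = TRUE)] <- rnorm(p * (p + 1) / 2)  # p(p+1)/2 = 6 free parameters
Omega <- L %*% t(L)

isSymmetric(Omega)                              # TRUE
all(eigen(Omega, symmetric = TRUE)$values > 0)  # TRUE: positive definite
```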

Piepho (1999) has shown how the above model can be coded by using SAS and I translated his code into R.

```r
EnvVarMod <- lme(Yield ~ Genotype,
                 random = list(Year = pdSymm(~ Genotype - 1),
                               Year = pdIdent(~ Block - 1)),
                 control = list(opt = "optim", maxIter = 100),
                 data = WinterWheat)
VarCorr(EnvVarMod)
##                  Variance   StdDev    Corr
## Year =           pdSymm(Genotype - 1)
## GenotypeCOLOSSEO 0.48876512 0.6991174 GCOLOS GCRESO GDUILI GGRAZI GIRIDE
## GenotypeCRESO    0.70949309 0.8423141 0.969
## GenotypeDUILIO   2.37438440 1.5409038 0.840  0.840
## GenotypeGRAZIA   1.18078525 1.0866394 0.844  0.763  0.942
## GenotypeIRIDE    1.23555204 1.1115539 0.857  0.885  0.970  0.896
## GenotypeSANCARLO 0.93335518 0.9661031 0.928  0.941  0.962  0.884  0.942
## Year =           pdIdent(Block - 1)
## Block1           0.02748257 0.1657787
## Block2           0.02748257 0.1657787
## Block3           0.02748257 0.1657787
## Residual         0.12990355 0.3604214
```

I coded the random effects as a list, by using the ‘Year’ as the nesting factor (Gałecki and Burzykowski, 2013). In order to specify a totally unstructured variance-covariance matrix for the genotypes within years, I used the ‘pdMat’ construct ‘pdSymm()’. This model is rather complex and may take a long time to converge.

The environmental variances are retrieved by the following code:

```r
envVar <- as.numeric( VarCorr(EnvVarMod)[2:7, 1] )
envVar
## [1] 0.4887651 0.7094931 2.3743844 1.1807853 1.2355520 0.9333552
```

while the correlations are given by:

```r
VarCorr(EnvVarMod)[2:7, 3:7]
##                  Corr
## GenotypeCOLOSSEO "GCOLOS" "GCRESO" "GDUILI" "GGRAZI" "GIRIDE"
## GenotypeCRESO    "0.969"  ""       ""       ""       ""
## GenotypeDUILIO   "0.840"  "0.840"  ""       ""       ""
## GenotypeGRAZIA   "0.844"  "0.763"  "0.942"  ""       ""
## GenotypeIRIDE    "0.857"  "0.885"  "0.970"  "0.896"  ""
## GenotypeSANCARLO "0.928"  "0.941"  "0.962"  "0.884"  "0.942"
```

Unweighted two-stage fitting

In his original paper, Piepho (1999) also gave SAS code to analyse the means of the ‘genotype x environment’ combinations. Indeed, agronomists and plant breeders often adopt a two-step fitting procedure: in the first step, the means across blocks are calculated for all genotypes in all environments. In the second step, these means are used to fit an environmental variance model. This two-step process is less demanding in terms of computer resources and it is correct whenever the experiments are equireplicated, with no missing ‘genotype x environment’ combinations. Furthermore, we need to be able to assume similar variances within all experiments.

I would also like to give an example of this two-step analysis method. In the first step, we can use the ‘ddply()’ function in the package ‘plyr’:

```r
# First step
WinterWheatM <- ddply(WinterWheat, c("Genotype", "Year"),
                      function(df) c(Yield = mean(df$Yield)))
```

Once we have retrieved the means for genotypes in all years, we can fit the following model:

$y_{ij} = \mu + g_i + a_{ij}$

where $$y_{ij}$$ is the mean yield for the $$i$$-th genotype in the $$j$$-th environment and $$a_{ij}$$ is the residual term, which includes the genotype x environment random interaction, the block x environment random interaction and the residual error term. In this model we have only one random effect ($$a_{ij}$$) and, therefore, this is a fixed linear model. However, we need to model the variance-covariance matrix of residuals ($$R$$), by adopting a totally unstructured form. Please, note that, when working with raw data, we have modelled $$\Omega$$, i.e. the variance-covariance matrix for the random effects. I have used the ‘gls()’ function, together with the ‘weights’ and ‘correlation’ arguments. See the code below.

```r
# Second step
envVarModM <- gls(Yield ~ Genotype, data = WinterWheatM,
                  weights = varIdent(form = ~ 1 | Genotype),
                  correlation = corSymm(form = ~ 1 | Year))
summary(envVarModM)
## Generalized least squares fit by REML
##   Model: Yield ~ Genotype
##   Data: WinterWheatM
##       AIC      BIC   logLik
##   80.6022 123.3572 -13.3011
##
## Correlation Structure: General
##  Formula: ~1 | Year
##  Parameter estimate(s):
##  Correlation:
##   1     2     3     4     5
## 2 0.947
## 3 0.809 0.815
## 4 0.816 0.736 0.921
## 5 0.817 0.866 0.952 0.869
## 6 0.888 0.925 0.949 0.856 0.907
## Variance function:
##  Structure: Different standard deviations per stratum
##  Formula: ~1 | Genotype
##  Parameter estimates:
## COLOSSEO    CRESO   DUILIO   GRAZIA    IRIDE SANCARLO
## 1.000000 1.189653 2.143713 1.528848 1.560620 1.356423
##
## Coefficients:
##                      Value Std.Error   t-value p-value
## (Intercept)       6.413333 0.2742314 23.386574  0.0000
## GenotypeCRESO    -0.439524 0.1107463 -3.968746  0.0003
## GenotypeDUILIO    0.178571 0.3999797  0.446451  0.6579
## GenotypeGRAZIA   -0.330952 0.2518270 -1.314205  0.1971
## GenotypeIRIDE     0.281905 0.2580726  1.092347  0.2819
## GenotypeSANCARLO -0.192857 0.1802547 -1.069915  0.2918
##
##  Correlation:
##                  (Intr) GCRESO GDUILI GGRAZI GIRIDE
## GenotypeCRESO     0.312
## GenotypeDUILIO    0.503  0.371
## GenotypeGRAZIA    0.269 -0.095  0.774
## GenotypeIRIDE     0.292  0.545  0.857  0.638
## GenotypeSANCARLO  0.310  0.612  0.856  0.537  0.713
##
## Standardized residuals:
##        Min         Q1        Med         Q3        Max
## -2.0949678 -0.5680656  0.1735444  0.7599596  1.3395000
##
## Residual standard error: 0.7255481
## Degrees of freedom: 42 total; 36 residual
```

The variance-covariance matrix for residuals can be obtained using the ‘getVarCov()’ function in the ‘nlme’ package, although I had to discover that there is a small buglet there, which causes problems in some instances (such as here). Please, see this link; I have included the correct code in the ‘getVarCov.gls()’ function in the ‘aomisc’ package, that is the companion package for this blog.

```r
R <- getVarCov.gls(envVarModM)
R
## Marginal variance covariance matrix
##         [,1]    [,2]    [,3]    [,4]    [,5]    [,6]
## [1,] 0.52642 0.59280 0.91285 0.65647 0.67116 0.63376
## [2,] 0.59280 0.74503 1.09440 0.70422 0.84652 0.78560
## [3,] 0.91285 1.09440 2.41920 1.58850 1.67700 1.45230
## [4,] 0.65647 0.70422 1.58850 1.23040 1.09160 0.93442
## [5,] 0.67116 0.84652 1.67700 1.09160 1.28210 1.01070
## [6,] 0.63376 0.78560 1.45230 0.93442 1.01070 0.96855
##   Standard Deviations: 0.72555 0.86315 1.5554 1.1093 1.1323 0.98415
```

As the design is perfectly balanced, the diagonal elements of the above matrix correspond to the variances of genotypes across environments:

```r
tapply(WinterWheatM$Yield, WinterWheatM$Genotype, var)
## COLOSSEO     CRESO    DUILIO    GRAZIA     IRIDE  SANCARLO
## 0.5264185 0.7450275 2.4191624 1.2304397 1.2821143 0.9685497
```

which can also be retrieved by the ‘stability’ package:

```r
library(stability)
## Registered S3 methods overwritten by 'lme4':
##   method                          from
##   cooks.distance.influence.merMod car
##   influence.merMod                car
##   dfbeta.influence.merMod         car
##   dfbetas.influence.merMod        car
envVarStab <- stab_measures(.data = WinterWheatM,
                            .y = Yield,
                            .gen = Genotype,
                            .env = Year)
envVarStab$StabMeasures
## # A tibble: 6 x 7
##   Genotype  Mean GenSS   Var    CV  Ecov ShuklaVar
##
## 1 COLOSSEO  6.41  3.16 0.526  11.3 1.25     0.258
## 2 CRESO     5.97  4.47 0.745  14.4 1.01     0.198
## 3 DUILIO    6.59 14.5  2.42   23.6 2.31     0.522
## 4 GRAZIA    6.08  7.38 1.23   18.2 1.05     0.208
## 5 IRIDE     6.70  7.69 1.28   16.9 0.614    0.0989
## 6 SANCARLO  6.22  5.81 0.969  15.8 0.320    0.0254
```

Strictly speaking, those variances are not the environmental variances, as they also contain the within-experiment and within block random variability, which needs to be separately estimated during the first step.

Andrea Onofri
Department of Agricultural, Food and Environmental Sciences
University of Perugia (Italy)

References

• Gałecki, A., Burzykowski, T., 2013. Linear mixed-effects models using R: a step-by-step approach. Springer, Berlin.
• Muhammad Yaseen, Kent M. Eskridge and Ghulam Murtaza (2018). stability: Stability Analysis of Genotype by Environment Interaction (GEI). R package version 0.5.0. https://CRAN.R-project.org/package=stability
• Piepho, H.-P., 1999. Stability Analysis Using the SAS System. Agronomy Journal 91, 154–160.
• Pinheiro, J.C., Bates, D.M., 2000. Mixed-Effects Models in S and S-PLUS. Springer-Verlag, New York.

### Referring to POTUS on Twitter: a stance-based perspective on variation in the 116th House

Tue, 20/08/2019 - 02:00

In this post, we investigate how (& how often) members of the 116th House of Representatives refer to the 45th president of the United States on Twitter. TRUMP, POTUS, PRESIDENT TRUMP, @realDonaldTrump — options abound. Here, we consider how a House Rep’s stance towards (or opinion of) 45 influences the choice of referring expression, as well as how this stance aligns with the popularity of 45 in a House Rep’s congressional district.

A fully reproducible, R-based code-through.

A very brief introduction

Most linguistic variation is riddled with nuanced meaning, the source of which is often some type of socio-cultural value (Du Bois 2007). In the case of variation in reference, one dimension of this socio-cultural value is status. While “President Donald Trump” and “Donald Trump” point to the same referent, the former emphasizes the status of 45 as POTUS, while the latter downplays this status (Berg et al. 2019).

Similarly, “Mr. Trump” is a more deferential referring expression than “Trump”. We know this as speakers of English because of social convention: we refer to folks higher up in the food chain in different (ie, more formal) ways. A simple formality cline is presented below:

1. First name only < Last name only < Full name < Title and last name < Title and full name (Berg et al. 2019)

As a speaker, I can abide by this convention when referring to an elder/boss/POTUS/etc (by using forms towards the right of the cline), or I can flout it (by using forms to the left). In either case, I (theoretically) communicate my stance towards the referent to my audience.

In the case of a tweeting House Rep, this audience is their Twitter following (ie, ~their constituency). And if a House Rep is stancetaking when referring to 45 on Twitter, presumably how this audience feels about 45 mediates the polarity of the House Rep’s stance. Another, presumably safer, option would be to not refer to 45 at all. This is what we investigate here.

Some open source data sets

```r
library(tidyverse)
```

Legislators & vote margins

We first grab some data/details about the 116th House of Representatives from a few online sources. For House Rep names, congressional districts, and twitter handles, we use a data set made available by the @unitedstates project. The project is a fantastic resource, maintained by folks from GovTrack, ProPublica, MapLight & FiveThirtyEight.

```r
leg_dets <- 'https://theunitedstates.io/congress-legislators/legislators-current.csv'

house_meta <- read.csv(url(leg_dets), stringsAsFactors = FALSE) %>%
  filter(type == 'rep' & twitter != '') %>%
  select(type, bioguide_id, icpsr_id, last_name, state, district, party, twitter) %>%
  mutate(district = ifelse(district == 0, 'AL', district),
         CD = paste0(state, '-', stringr::str_pad(district, 2, pad = '0')),
         twitter = toupper(twitter))
```

For Trump vote margins by congressional district, we utilize a data set made available by the DailyKos.

```r
url <- 'https://docs.google.com/spreadsheets/d/1zLNAuRqPauss00HDz4XbTH2HqsCzMe0pR8QmD1K8jk8/edit#gid=0'

margins_by_cd <- read.csv(text = gsheet::gsheet2text(url, format = 'csv'),
                          skip = 1, stringsAsFactors = FALSE) %>%
  mutate(trump_margin = Trump - Clinton) %>%
  select(CD, trump_margin)
```

Tweets: 116th House of Representatives

Next we gather tweets for members of the 116th House of Representatives using the rtweet package. Members took office on January 3, 2019, so we filter tweets to post-January 2. We also exclude retweets. (Last tweets collected on 8-19-19).

```r
congress_tweets <- rtweet::get_timeline(house_meta$twitter, n = 2000, check = FALSE) %>%
  mutate(created_at = as.Date(gsub(' .*$', '', created_at))) %>%
  filter(is_quote == 'FALSE' & is_retweet == 'FALSE' &
           created_at > '2019-01-02' & display_text_width > 0)

setwd("/home/jtimm/jt_work/GitHub/x_politico")
#saveRDS(congress_tweets_tif, 'congress_tweets_tif.rds')
saveRDS(congress_tweets, 'congress_tweets_tif.rds')
```

Then we join the Twitter and House lawmaker detail data sets:

```r
congress_tweets <- congress_tweets %>%
  mutate(twitter = toupper(screen_name)) %>%
  select(status_id, created_at, twitter, text) %>%
  inner_join(house_meta %>% filter(type == 'rep'))
```

For a high-level summary of how often members of the 116th House have been tweeting since taking office, we summarize total tweets by House Rep. The density plot below summarizes the distribution of House Reps’ tweeting habits by party affiliation. So, Democrats (in blue) are a bit more active on Twitter.

```r
total_tweets <- congress_tweets %>%
  group_by(party, twitter) %>%
  summarize(all_tweets = n())

total_tweets %>%
  ggplot(aes(all_tweets, fill = party)) +
  ggthemes::scale_fill_stata() +
  theme_minimal() +
  geom_density(alpha = 0.8, color = 'gray') +
  labs(title = "116th House Rep tweet counts by party affiliation") +
  theme(legend.position = "none")
```

Some additional summary statistics about the tweeting habits of House Reps by party affiliation:

```r
x <- list(
  'REP' = summary(total_tweets$all_tweets[total_tweets$party == 'Republican']),
  'DEM' = summary(total_tweets$all_tweets[total_tweets$party == 'Democrat']))

cbind(party = names(x$DEM), x %>% bind_rows()) %>%
  mutate(REP = round(REP), DEM = round(DEM)) %>%
  t(.) %>%
  knitr::kable()
```

| party | Min. | 1st Qu. | Median | Mean | 3rd Qu. | Max. |
|-------|-----:|--------:|-------:|-----:|--------:|-----:|
| REP   | 5    | 143     | 230    | 272  | 360     | 1466 |
| DEM   | 23   | 278     | 404    | 463  | 591     | 1624 |

Extracting referring expressions to 45

With tweets and some legislator details in tow, we can now get a beat on how members of the 116th House refer to POTUS 45 on Twitter. Here we present a quick & simple approach to extracting Twitter references to 45. The code below summarizes the set of 45 referring expressions (in regex terms) that will be our focus here. It is not exhaustive; the list is ultimately a product of some trial & error, with less frequent forms (eg, #45) culled in the process. We have included “Trump Administration” in this set; while not exactly a direct reference to 45, it is super frequent and (as we will see) an interesting example.

```r
s1 <- "Trump Admin(istration)?"
s2 <- '@realDonaldTrump'
s3 <- '(@)?POTUS'
s4 <- 'Mr(\\.)? President'
s5 <- "the president"
s6 <- '(Pres(\\.)? |President )?(Donald )?\\bTrump'

searches <- c(s1, s2, s3, s4, s5, s6)
potus <- paste(searches, collapse = '|')
```

The procedure below extracts instantiations of the regex patterns above from each tweet in our corpus.
```r
potus_sum <- lapply(1:nrow(congress_tweets), function(x) {
  spots <- gregexpr(pattern = potus, congress_tweets$text[x], ignore.case = TRUE)
  prez_gram <- regmatches(congress_tweets$text[x], spots)[[1]]

  if (-1 %in% spots) {} else {
    data.frame(doc_id = congress_tweets$status_id[x],
               twitter = congress_tweets$twitter[x],
               prez_gram = toupper(prez_gram),
               stringsAsFactors = FALSE)
  }
}) %>%
  data.table::rbindlist() %>%
  mutate(prez_gram = trimws(prez_gram),
         prez_gram = gsub('\\.', '', prez_gram),
         prez_gram = gsub('ADMIN$', 'ADMINISTRATION', prez_gram),
         prez_gram = gsub('PRES ', 'PRESIDENT ', prez_gram),
         prez_gram = gsub('@', '', prez_gram)) %>%
  left_join(house_meta)
```
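To see the matching mechanics in isolation, here is a self-contained sketch that applies the same `gregexpr`/`regmatches` pattern to a single made-up tweet (the tweet text is invented for illustration):

```r
# Rebuild the combined pattern from the six search terms above
potus <- paste(c("Trump Admin(istration)?", '@realDonaldTrump', '(@)?POTUS',
                 'Mr(\\.)? President', "the president",
                 '(Pres(\\.)? |President )?(Donald )?\\bTrump'),
               collapse = '|')

tweet <- "Today @realDonaldTrump met with the President of France."

# gregexpr finds all (non-overlapping) match positions; regmatches
# extracts the matched substrings themselves.
spots <- gregexpr(pattern = potus, tweet, ignore.case = TRUE)
regmatches(tweet, spots)[[1]]
# "@realDonaldTrump" "the President"
```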

A sample of the output is presented below.

```r
set.seed(149)
potus_sum %>%
  select(doc_id:prez_gram) %>%
  sample_n(5) %>%
  knitr::kable()
```

| doc_id | twitter | prez_gram |
|--------|---------|-----------|
| 1126141623803568128 | REPMATTGAETZ | REALDONALDTRUMP |
| 1110977855561838593 | REPJOEKENNEDY | TRUMP ADMINISTRATION |
| 1110615484335034373 | REPFRANKLUCAS | THE PRESIDENT |
| 1131334181815017472 | REPTEDLIEU | POTUS |
| 1136024493049163777 | REPWILSON | REALDONALDTRUMP |

Based on the above output, the table below summarizes the frequency of expressions used to reference 45 by party affiliation.

```r
data.frame(table(potus_sum$party, potus_sum$prez_gram)) %>%
  spread(Var1, Freq) %>%
  rename(prez_gram = Var2) %>%
  rowwise() %>%
  mutate(Total = sum(Democrat, Republican)) %>%
  arrange(desc(Total)) %>%
  janitor::adorn_totals(c('row')) %>% #Cool.
  knitr::kable()
```

| prez_gram | Democrat | Republican | Total |
|-----------|---------:|-----------:|------:|
| TRUMP | 6818 | 375 | 7193 |
| THE PRESIDENT | 3816 | 890 | 4706 |
| REALDONALDTRUMP | 2015 | 1989 | 4004 |
| POTUS | 1023 | 1802 | 2825 |
| TRUMP ADMINISTRATION | 2497 | 133 | 2630 |
| PRESIDENT TRUMP | 1649 | 776 | 2425 |
| DONALD TRUMP | 224 | 23 | 247 |
| MR PRESIDENT | 195 | 42 | 237 |
| PRESIDENT DONALD TRUMP | 15 | 12 | 27 |
| Total | 18252 | 6042 | 24294 |

Party-level stance towards 45

Based on the counts above, we next investigate potential evidence of stancetaking at the party level. Here, we assume that Republicans are supportive of 45 and that Democrats are less supportive. If House Reps are stancetaking on Twitter, we would expect Democrats to use less formal terms that downplay the status of 45, and Republicans to use more formal terms that highlight it.

To get a sense of which terms are more prevalent among each party, we consider the probability of each party using a particular expression to refer to 45. Then we calculate the degree of formality for a given expression as the simple ratio of the two usage rates – where the higher rate is treated as the numerator. Terms prevalent among Democrats are transformed to negative values.

The table below summarizes these ratios, which can be interpreted as follows: Reps are ~5.3 times more likely than their Dem colleagues to refer to 45 on Twitter as POTUS; Dems are ~6 times more likely to refer to 45 as Trump.

```r
ratios <- potus_sum %>%
  group_by(party, prez_gram) %>%
  summarize(n = n()) %>%
  group_by(party) %>%
  mutate(per = round(n / sum(n), 3)) %>%
  group_by(prez_gram) %>%
  mutate(n = sum(n)) %>%
  spread(party, per) %>%
  mutate(ratio = ifelse(Republican > Democrat,
                        Republican / Democrat,
                        -Democrat / Republican),
         ratio = round(ratio, 2)) %>%
  filter(n > 60) %>%
  select(-n) %>%
  arrange(desc(ratio))

ratios %>% knitr::kable()
```

| prez_gram | Democrat | Republican | ratio |
|-----------|---------:|-----------:|------:|
| POTUS | 0.056 | 0.298 | 5.32 |
| REALDONALDTRUMP | 0.110 | 0.329 | 2.99 |
| PRESIDENT TRUMP | 0.090 | 0.128 | 1.42 |
| THE PRESIDENT | 0.209 | 0.147 | -1.42 |
| MR PRESIDENT | 0.011 | 0.007 | -1.57 |
| DONALD TRUMP | 0.012 | 0.004 | -3.00 |
| TRUMP | 0.374 | 0.062 | -6.03 |
| TRUMP ADMINISTRATION | 0.137 | 0.022 | -6.23 |
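As a quick arithmetic check (not part of the original pipeline), the first ratio in the table can be recomputed directly from the two party-level usage rates:

```r
# P(expression | party) for POTUS, from the ratios table
dem <- 0.056
rep <- 0.298

round(rep / dem, 2)   # 5.32 -> Reps ~5.3x more likely to use POTUS
```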

The visualization below summarizes formality ratios for 45 referring expressions as a simple cline. Less formal terms (prevalent among Democrats) are shown in blue; more formal terms (prevalent among Republicans) in red.

```r
#cut <- 1.45
ratios %>%
  mutate(col1 = ifelse(ratio > 0, 'red', 'blue')) %>%
  ggplot(aes(x = reorder(prez_gram, ratio), y = ratio,
             label = prez_gram, color = col1)) +
  # geom_hline(yintercept = cut, linetype = 2, color = 'gray') +
  # geom_hline(yintercept = -cut, linetype = 2, color = 'gray') +
  geom_point(size = 1.5, color = 'darkgray') +
  geom_text(size = 4, hjust = 0, nudge_y = 0.15) +
  annotate('text', y = -5, x = 7, label = 'Democrat') +
  annotate('text', y = 5, x = 3, label = 'Republican') +
  ggthemes::scale_color_stata() +
  theme_minimal() +
  labs(title = "Twitter-based formality cline") +
  theme(legend.position = "none",
        axis.text.y = element_blank(),
        axis.ticks.y = element_blank()) +
  xlab('') + ylab('Polarity') +
  ylim(-7, 7) +
  coord_flip()
```

So, some real nice variation. Recall our initial (& very generic) formality cline presented in the introduction:

1. First name only < Last name only < Full name < Title and last name < Title and full name

Compared to our House Rep, Twitter-based, 45-specific cline:

2. Trump Administration < Trump < Donald Trump < Mr. President < The President < President Trump < realDonaldTrump < POTUS

While alignment between (1) & (2) is not perfect, the two are certainly conceptually comparable, indeed suggesting that House Reps are choosing expressions to refer to 45 based on stance. Terms prevalent among House Dems downplay the status of 45 by excluding titles & explicit reference to the office (eg, TRUMP, DONALD TRUMP). In contrast, terms prevalent among Republicans highlight the status of 45 via direct reference to the office (eg, PRESIDENT TRUMP, POTUS). More neutral terms (eg, MR PRESIDENT, THE PRESIDENT) reference the office but not the individual.

While the Twitter handle @realDonaldTrump does not highlight the status of the presidency per se, it would seem to carry with it some Twitter-based deference. (I imagine the “real-” prefix is also at play here.) The prevalence of the acronym POTUS among Reps is interesting as well. On one hand, it is super economical; on the other hand, the acronym unpacked is arguably the most deferential. The prevalence of Trump Administration among Dems is also curious – it would seem to be a way to reference 45 without actually referencing (or conjuring images of) either the individual or the office.

House Rep stance & 2016 presidential vote margins

The next, and more interesting, piece is how stancetaking plays out at the House Rep level. While the formality cline presented above illustrates some clear divisions between how Dems and Reps refer to the president, its gradient nature speaks to individual variation.

In this section, we (1) present a simple method for quantifying House Rep-level variation in formality when referring to 45, and (2) investigate the extent to which district-level support for 45 in the 2016 presidential election can account for this variation.

```r
ratios <- ratios %>%
  mutate(polarity = case_when(
    ratio > 1.4 ~ 'Formal',
    ratio < -2.5 ~ 'LessFormal',
    ratio > -2.5 & ratio < 1.4 ~ 'Neutral'))
```

To get started, we first categorize each reference to 45 in our data set as either Formal (POTUS, REALDONALDTRUMP, PRESIDENT TRUMP), Less Formal (TRUMP ADMINISTRATION, TRUMP, DONALD TRUMP), or Neutral (MR PRESIDENT, THE PRESIDENT). Reference to 45 (per legislator) is then represented as a (count-based) distribution across these three formality categories.

```r
wide <- potus_sum %>%
  filter(prez_gram %in% unique(ratios$prez_gram)) %>%
  left_join(ratios %>% select(prez_gram, polarity)) %>%
  group_by(CD) %>%
  mutate(prez_tweets = length(unique(doc_id))) %>%
  group_by(CD, twitter, last_name, party, polarity, prez_tweets) %>%
  summarize(n = n())
```

Formality distributions for a random set of House Reps are summarized in the plot below. So, lots of variation – and presumably 435 House Reps that refer to 45 with varying degrees of formality.

```r
set.seed(171)
samp <- sample(margins_by_cd$CD, 10)

pal <- c('#395f81', 'gray', '#9e5055')
names(pal) <- c('LessFormal', 'Neutral', 'Formal')

wide %>%
  filter(CD %in% samp) %>%
  group_by(CD) %>%
  mutate(per = n / sum(n)) %>%
  select(-n) %>%
  spread(polarity, per) %>%
  ungroup() %>%
  mutate(rank = rank(Formal),
         lab = paste0(last_name, ' (', CD, '-', substr(party, 1, 1), ')')) %>%
  gather(key = polarity, value = per, LessFormal, Formal, Neutral) %>%
  mutate(polarity = factor(polarity, levels = c('LessFormal', 'Neutral', 'Formal'))) %>%
  ggplot(aes(x = reorder(lab, rank), y = per, fill = polarity)) +
  geom_bar(position = "fill", stat = "identity") +
  coord_flip() +
  theme_minimal() +
  theme(legend.position = 'bottom', axis.title.y = element_blank()) +
  scale_fill_manual(values = pal) +
  ggtitle('Example degrees of formality in the 116th House')
```

Based on these distributions, we define a given House Rep’s degree of formality as the (log) ratio of the number of formal terms to the number of less formal terms used to refer to 45. Neutral terms are ignored.

Values greater than one indicate a preference for referring expressions that highlight the status of 45; values less than one indicate a preference for expressions that downplay it. The former reflects a positive/supportive stance; the latter a negative/less supportive one. A relative & rough approximation.

```r
wide1 <- wide %>%
  group_by(CD) %>%
  mutate(prez_refs = sum(n)) %>%
  spread(polarity, n) %>%
  ungroup() %>%
  replace(., is.na(.), 1) %>%
  mutate(ratio = round(Formal / LessFormal, 3)) %>%
  inner_join(margins_by_cd) %>%
  left_join(total_tweets)
```
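To make the metric concrete, a toy computation with hypothetical counts (not drawn from the data):

```r
# A hypothetical House Rep's counts of references to 45
formal      <- 30   # e.g., POTUS, REALDONALDTRUMP, PRESIDENT TRUMP
less_formal <- 10   # e.g., TRUMP ADMINISTRATION, TRUMP, DONALD TRUMP

ratio <- formal / less_formal   # 3: formal terms three times as frequent
log(ratio)                      # positive on the log scale -> formal-leaning
```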

So, to what extent does a congressional district’s collective support for 45 (per 2016 Trump margins) influence the degree of formality with which their House Rep refers to 45? Do House Reps representing districts that supported HRC in 2016, for example, use less formal terms to convey a negative stance towards 45, and mirror the sentiment of their constituents (ie, their ~Twitter followers & ~audience)?

The plot below illustrates the relationship between House Reps’ degrees of formality on Twitter & 2016 presidential vote margins for their respective congressional districts. As can be noted, there is a fairly strong, positive relationship between the two variables.

```r
wide1 %>%
  filter(prez_tweets > 10) %>%
  ggplot(aes(x = trump_margin, y = log(jitter(ratio)), color = party)) +
  geom_point() +
  geom_smooth(method = "lm", se = T, color = 'steelblue') +
  geom_text(aes(label = last_name), size = 3, check_overlap = TRUE, color = 'black') +
  ggthemes::scale_color_stata() +
  theme_minimal() +
  theme(legend.position = "none", axis.title = element_text()) +
  xlab('2016 Trump Vote Margin') + ylab('Degree of Formality') +
  ggtitle('2016 Trump Margins vs. Degree of Formality on Twitter')
```

So, not only are there systematic differences in how Dems & Reps reference 45 on Twitter, these differences are gradient within/across party affiliation: formality in reference to 45 increases as 2016 Trump margins increase. House Reps are not only hip to how their constituents (the audience) feel about 45 (the referent), but they choose referring expressions (and mediate stance) accordingly.

Prevalence of 45 reference

Next we consider how often members of the 116th House reference 45 on Twitter, which we operationalize here as the percentage of a House Rep’s total tweets that include reference to 45.

```r
wide2 <- wide1 %>%
  mutate(party = gsub('[a-z]', '', party),
         trump_margin = round(trump_margin, 1),
         per_prez = round(prez_tweets / all_tweets, 2)) %>%
  select(CD, last_name, party, per_prez, all_tweets, trump_margin) %>%
  arrange(desc(per_prez))
```

The density plot below summarizes the distribution of these percentages by party affiliation. A curious plot indeed. The bimodal nature of the House Dem distribution sheds light on two distinct approaches to Twitter & 45 among House Dems: one group takes a bit of a “no comment” approach, while in the other, reference to 45 is quite prevalent.

```r
wide2 %>%
  ggplot(aes(per_prez, fill = party)) +
  ggthemes::scale_fill_stata() +
  theme_minimal() +
  geom_density(alpha = 0.8, color = 'gray') +
  labs(title = "Rates of reference to 45 on Twitter") +
  theme(legend.position = "none")
```

The table below summarizes 45 tweet reference rates for members of the 116th House, along with total tweets & 2016 Trump vote margins for some context. Lots going on for sure. Curious to note that Maxine Waters (CA-43) and Adam Schiff (CA-28) reference 45 on Twitter at the highest rates, despite being fairly infrequent tweeters in general. Almost as if they use Twitter for the express purpose of commenting on the president and/or defending themselves from the president’s Twitter-ire.

```r
out <- wide2 %>%
  DT::datatable(extensions = 'FixedColumns',
                options = list(scrollX = TRUE,
                               fixedColumns = list(leftColumns = 1:3)),
                rownames = FALSE, width = "450") %>%
  DT::formatStyle('per_prez',
                  background = DT::styleColorBar(wide2$per_prez, "lightblue"),
                  backgroundSize = '80% 70%',
                  backgroundRepeat = 'no-repeat',
                  backgroundPosition = 'right')
```

Rates of 45-reference, total tweets & 2016 Trump margins for members of the 116th House:

(Interactive table: an htmlwidget listing CD, last_name, party, per_prez, all_tweets & trump_margin for all 435 members is rendered here in the original post.)

Last question, then: to what extent does a congressional district’s collective support for 45 (per 2016 Trump margins) influence the rate at which House Reps reference 45 on Twitter?

The much talked-about freshman class of House Dems, for example, is largely composed of folks from districts that supported Trump in 2016. As such, freshman Dems are generally more centrist ideologically, representing districts with mixed feelings towards 45. Do they tend to play it safe on Twitter (and with their constituents), and keep the president’s name out of their Twitter mouths?

Per the plot below, this would seem to be the case (although freshman Dems are not explicitly identified). Circle size reflects total tweet count. House members on both sides of the aisle representing districts with slimmer 2016 Trump margins reference 45 on Twitter at lower rates.

```r
wide2 %>%
  ggplot(aes(x = trump_margin, y = per_prez,
             color = as.factor(party), size = all_tweets)) +
  geom_point() +
  geom_smooth(method = "lm", se = T) +
  geom_text(aes(label = last_name), size = 3, check_overlap = TRUE, color = 'black') +
  ggthemes::scale_color_stata() +
  theme_minimal() +
  theme(legend.position = "none", axis.title = element_text()) +
  scale_y_continuous(limits = c(0, .4)) +
  xlab('2016 Trump Margin') + ylab('Reference-to-Trump Rate') +
  ggtitle('2016 Trump Margins vs. Reference-to-Trump Rates')
```

Seemingly a no-brainer if you don’t want to ruffle any feathers within an ideologically heterogeneous constituency, and if you want to fly under 45’s Twitter-radar. On the other hand, House Reps in safer (ie, ideologically more uniform) districts (especially Dems) are more likely to comment (or sound-off) on the doings of 45.

Summary

So, a couple of novel metrics for investigating variation with respect to the how & how often of 45-reference on Twitter in the 116th House. Simple methods (that could certainly be tightened up some) & intuitive results that align quite well with linguistic/stance theory. Also some super interesting & robust relationships based in two very disparately sourced data sets: 2016 Trump margins and Twitter text data (ca. present day).

The predictive utility of 2016 presidential voting margins seems (roughly) limitless. As does the cache of socio-political treasure hidden in the tweets of US lawmakers – for better or worse. A fully reproducible post. Cheers.

References

Berg, Esther van den, Katharina Korfhage, Josef Ruppenhofer, Michael Wiegand, and Katja Markert. 2019. “Not My President: How Names and Titles Frame Political Figures.” In Proceedings of the Third Workshop on Natural Language Processing and Computational Social Science, 1–6.

Du Bois, John W. 2007. “The Stance Triangle.” Stancetaking in Discourse: Subjectivity, Evaluation, Interaction 164 (3): 139–82.


### Correspondence Analysis visualization using ggplot

Tue, 20/08/2019 - 02:00

What we want to do

Recently, I used a correspondence analysis from the ca package in a paper. All of the figures in the paper were done with ggplot, so I wanted the visualization for the correspondence analysis to match the style of the other figures. The standard plot method plot.ca(), however, produces base graphics plots. So, I had to create the ggplot visualization myself. Actually, I don’t know whether there are any packages that take a ca object (created by the ca package) and produce ggplots from it. I found this website, but it uses the FactoMineR/factoextra packages to do and visualize the correspondence analysis.

So, off we go… let’s build our own ggplot-based visualization for ca objects.

Getting the data

I’m going to demonstrate this using data from a linguistic experiment. You could also use, for example, the HairEyeColor dataset that comes with R. In this case, you’ll have to select a specific sub-table, e.g. HairEyeColor[,,"Female"], to get a 2-dimensional table.
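As a minimal base-R sketch of that sub-table selection (my own illustration, not part of the original post; HairEyeColor ships with R):

```r
# HairEyeColor is a 3-dimensional table (Hair x Eye x Sex);
# selecting one level of Sex leaves the 2-dimensional table that ca() expects.
hec <- HairEyeColor[, , "Female"]
dim(hec)    # four hair colours by four eye colours

# If needed, coerce to a plain matrix before handing it to ca():
hec.mat <- as.matrix(hec)
```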

Let’s start by loading the data. You can get it from my Dropbox. It’s a 2-dimensional table with 3 rows and 7 columns. This was an association experiment in German, and the task of the participants was to associate several items of three different linguistic constructions (rows) with different media or text types (columns). I will not deal with the conceptual difference between media and text types here.

This is the table.

          Text mess. Voice mess. Newspaper E-mail Soc.Netw. Letter Other
V-final          157         125       114    190       112    147    23
V2               175         210        14     80       128     39    15
Ellipsis         293         128         6     43       152     12    12

I’ll briefly explain what the rows and columns mean. In the rows, there are three different constructions.

• V-final: As you might know, in Standard German, the finite verb is put at the end of dependent subclauses. We presented “because”-clauses, and this is what such a sentence would look like in Standard German: “Er mag sein Auto, weil es sparsam ist.” (He likes his car, because it economical is.)
• V2: If you are an English speaker, you might be more familiar with this construction. It is not considered written Standard German, but it is OK to use it in spoken language. V2 means that the finite verb goes in the second position in the dependent subclause: “Er mag sein Auto, weil es ist sparsam.” (He likes his car, because it is economical.)
• Ellipsis: This sounds very colloquial, but most people would understand what you mean. In the ellipsis construction we used, we simply dropped the verb altogether: “Er mag sein Auto, weil sparsam.” (He likes his car, because economical.)

Now, each participant was presented nine such sentences (three per construction) and had to check which of the media/text types they thought each could appear in. We included some media that are clearly more prone to written Standard German than others (like the newspaper or a letter). “Soc.Netw.” (social networks) was maybe a bit underspecified on our side. There are a lot of different social networks, and each community has its own “writing style” (at least one!). But we’ll see where the correspondence analysis puts this item.

Correspondence analysis

I’ll do a simple ca() and will plot the result while I’m also saving the plot object in the variable ca.plot.

library(ca)
ca.fit <- ca(struc.assoc)
ca.plot <- plot(ca.fit)

As you can see, (almost) all the information we need is in the plot object.

str(ca.plot)
## List of 2
##  $ rows: num [1:3, 1:2] -0.51 0.202 0.478 0.05 -0.235 ...
##   ..- attr(*, "dimnames")=List of 2
##   .. ..$ : chr [1:3] "V-final" "V2" "Ellipsis"
##   .. ..$ : chr [1:2] "Dim1" "Dim2"
##  $ cols: num [1:7, 1:2] 0.356 0.201 -0.912 -0.448 0.247 ...
##   ..- attr(*, "dimnames")=List of 2
##   .. ..$ : chr [1:7] "Text mess." "Voice mess." "Newspaper" "E-mail" ...
##   .. ..$ : chr [1:2] "Dim1" "Dim2"

Only the variance contributions for the dimensions are missing. I will get them from the original ca.fit object later.

Converting the plot object

For ggplot, we will need a dataframe with the labels, the coordinates for the two dimensions and the name of the variable which is stored in rows and columns. The following function make.ca.plot.df() converts the plot object (parameter ca.plot.obj) into such a dataframe. If you want, you can put the variable names for rows and columns as arguments row.lab and col.lab. These are used in the legend later.

make.ca.plot.df <- function (ca.plot.obj,
                             row.lab = "Rows",
                             col.lab = "Columns") {
  df <- data.frame(Label = c(rownames(ca.plot.obj$rows),
                             rownames(ca.plot.obj$cols)),
                   Dim1 = c(ca.plot.obj$rows[,1], ca.plot.obj$cols[,1]),
                   Dim2 = c(ca.plot.obj$rows[,2], ca.plot.obj$cols[,2]),
                   Variable = c(rep(row.lab, nrow(ca.plot.obj$rows)),
                                rep(col.lab, nrow(ca.plot.obj$cols))))
  rownames(df) <- 1:nrow(df)
  df
}

ca.plot.df <- make.ca.plot.df(ca.plot,
                              row.lab = "Construction",
                              col.lab = "Medium")
ca.plot.df$Size <- ifelse(ca.plot.df$Variable == "Construction", 2, 1)

I also want the points for the three constructions to be bigger than the points for the different media/text types. This is why I included the last line in the code chunk above. Please note that the numbers we supplied for sizes (2 and 1) are not the actual sizes of the points in the plot. These are simply two values that are mapped on the size scale later.

ca.plot.df looks like this now.

      Label       Dim1       Dim2     Variable Size
    V-final -0.5095947  0.0499651 Construction    2
         V2  0.2019318 -0.2346586 Construction    2
   Ellipsis  0.4780980  0.1729715 Construction    2
 Text mess.  0.3559765  0.1712304       Medium    1
Voice mess.  0.2009605 -0.2765821       Medium    1
  Newspaper -0.9117981  0.1577468       Medium    1
     E-mail -0.4478077 -0.0360625       Medium    1
  Soc.Netw.  0.2465235  0.0289500       Medium    1
     Letter -0.7218847  0.0083225       Medium    1
      Other -0.1377860 -0.0361663       Medium    1

Getting variances

ca.plot.df is already fine for plotting. Only the variance contributions of the two dimensions are missing. We can get them from the summary() of the ca.fit object. If you want, you can do str(ca.sum) to see what is held in this object and how to access the contribution values.
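As a sketch of where those numbers can come from (my own approach, not necessarily the author's): besides digging through str(ca.sum), the percentages can be computed directly from the singular values that the fitted object stores in ca.fit$sv, since the principal inertias are the squared singular values.

```r
# Variance (inertia) explained per dimension of the ca fit from above.
ca.sum <- summary(ca.fit)   # str(ca.sum) shows the same contribution values
ca.variances <- ca.fit$sv^2 / sum(ca.fit$sv^2) * 100

# Handy for axis titles later, e.g. "Dim1 (NN.N%)":
dim.labs <- sprintf("Dim%d (%.1f%%)", seq_along(ca.variances), ca.variances)
```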

### No visible binding for global variable

Mon, 19/08/2019 - 04:13

Recently I have been working on a very large legacy project which utilises the excellent data.table package throughout. What this has resulted in is an R CMD check containing literally thousands of NOTEs similar to the following:

❯ checking R code for possible problems ... NOTE
  my_fn: no visible binding for global variable ‘mpg’

There are several reasons why you might see these NOTEs and, for our code base, some of the NOTEs were potentially more damaging than others. This was a problem because the damaging NOTEs were hidden: firstly by an outright suppression, caused by a manipulation of the _R_CHECK_CODETOOLS_PROFILE_ option in the .Renviron file, and then, once this was removed, within the sheer number of NOTEs in the R CMD check.

Non-standard Evaluation

If we have a function where we are using data.table’s modification by reference features, i.e. we are using a variable in an unquoted fashion (also known as non-standard evaluation (NSE)) then this issue will occur. Take the following function as an example.

my_fn <- function() {
  mtcars <- data.table::data.table(mtcars)
  mtcars[, mpg_div_hp := mpg / hp]
  mtcars[]
}

Here, we would find the following NOTEs:

❯ checking R code for possible problems ... NOTE
  my_fn: no visible binding for global variable ‘mpg_div_hp’
  my_fn: no visible binding for global variable ‘mpg’
  my_fn: no visible binding for global variable ‘hp’
  Undefined global functions or variables:
    hp mpg mpg_div_hp

Sometimes you may also see these NOTEs for syntactic sugar such as !! or := if you haven’t correctly imported the package they come from.

This is a well-discussed issue on the internet, and it only became an issue after a change introduced to the core R code in version 2.15.1. There are two solutions to this problem.

Option One

Include all variable names within a globalVariables() call in the package documentation file.

globalVariables(c("mpg", "hp", "mpg_div_hp"))

For our package, as there are literally thousands of variables to list in this file, this makes the list very difficult to maintain and the file very long. If, however, the variables belong to data which are stored within your package, then this can be greatly simplified to

globalVariables(names(my_data))

You may wish to import any syntactic sugar functionality here as well. For example

globalVariables(c(":=", "!!"))

Option Two

The second option involves binding the variable locally to the function. At the top of your function you can define the variable as a NULL value.

my_fn <- function() {
  mpg <- hp <- mpg_div_hp <- NULL
  mtcars <- data.table::data.table(mtcars)
  mtcars[, mpg_div_hp := mpg / hp]
  mtcars[]
}

Therefore your variable(s) are now bound to object(s) and so the R CMD check has nothing to complain about. This is the method that the data.table team recommend and, to me, it feels like a much neater and, more importantly, more maintainable solution than the first option.

A Note on the Tidyverse

You may also come across this problem whilst programming using the tidyverse for which there is a very neat solution. You simply need to be more explicit within your function by using the .data pronoun.

#' @importFrom rlang .data
my_fn <- function() {
  mtcars %>%
    mutate(mpg_div_hp = .data$mpg / .data$hp)
}

Note the import!

Selecting Variables with the data.table .. Prefix

NOTEs can occur when we are using the .. syntax of data.table, for example

double_dot <- function() {
  mtcars <- data.table::data.table(mtcars)
  select_cols <- c("cyl", "wt")
  mtcars[, ..select_cols]
}

This will yield

❯ checking R code for possible problems ... NOTE
  Undefined global functions or variables:
    ..select_cols

In this instance, this can be solved by avoiding the .. syntax and using the alternative with = FALSE notation.

double_dot <- function() {
  mtcars <- data.table::data.table(mtcars)
  select_cols <- c("cyl", "wt")
  mtcars[, select_cols, with = FALSE]
}

Even though the .. prefix is syntactic sugar, we cannot use globalVariables(c("..")) since the actual variable in this case is ..select_cols; we would therefore need to use globalVariables(c("..select_cols")) if we wanted to use the globalVariables() approach.

Missing Imports

In our code base, I also found NOTEs for functions or datasets which were not correctly imported. For example, consider the following simple function.

Rversion <- function() {
  info <- sessionInfo()
  info$R.version
}

This gives the following NOTE:

❯ checking R code for possible problems ... NOTE
  Rversion: no visible global function definition for ‘sessionInfo’
  Consider adding
    importFrom("utils", "sessionInfo")
  to your NAMESPACE file.

Here the R CMD check is rather helpful and tells us the solution: we need to ensure that we explicitly import the function from the utils package in the documentation. This can easily be done with the roxygen2 package by including an @importFrom utils sessionInfo tag.

Trying to Call Removed Functionality

If you have a function which has been removed from your package but attempt to call it from another function, R will only give you a NOTE about this.

use_non_existent_function <- function() {
  this_function_doesnt_exist()
}

This will give the NOTE

❯ checking R code for possible problems ... NOTE
  use_non_existent_function: no visible global function definition for
    ‘this_function_doesnt_exist’

Of course it goes without saying that you should make sure to remove any calls to functions which have been removed from your package.

As a side note, when I first started working on the project, I was initially unaware that within our package we had the option _R_CHECK_CODETOOLS_PROFILE_="suppressUndefined=TRUE" set within our .Renviron file, which suppresses all unbound global variable NOTEs from appearing in the R CMD check. This can mask deeper issues within your package, such as a function calling functionality which has been removed, which in turn can leave the end user facing nasty and confusing error messages. Therefore I would not recommend using this setting and would suggest tackling each of your package's NOTEs individually to remove them all. I actually discovered all of our package NOTEs when introducing the lintr package to our CI pipeline.
lintr will pick up on some – but not all – of these unbound global variable problems (lintr of course does not take the _R_CHECK_CODETOOLS_PROFILE_ into account). Take our original function as an example:

my_fn <- function() {
  mtcars <- data.table::data.table(mtcars)
  mtcars[, mpg_div_hp := mpg / hp]
  mtcars[]
}

Here, lintr will highlight the variables mpg and hp as problems, but it currently won’t highlight the variables on the LHS of :=, i.e. mpg_div_hp.

Conclusion

When developing your package, if you are experiencing these unbound global variable NOTEs you should:

1. Strive to define any unbound variables locally within a function.
2. Ensure that any functions or data from external packages (including utils, stats, etc.) have the correct @importFrom tag.
3. Not suppress this check in the .Renviron file; the solutions proposed here should remove the current need to do so.
4. Define any package-wide unbound variables, which are typically syntactic sugar (e.g. :=), within the package documentation file inside a globalVariables() call, which should be a very short and maintainable list.

To leave a comment for the author, please follow the link and comment on their blog: Random R Ramblings.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job. Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

### Mueller Report Volume 1: Network Analysis

Mon, 19/08/2019 - 02:46

[This article was first published on sweissblaug, and kindly contributed to R-bloggers].
(You can report issue about the content on this page here) Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

settle down and have another cup of coffee

code

TLDR

There are a lot of Russians talking to a lot of Trump campaign members in the Mueller report. There are so many it’s tough to get your head around it all. In this post I attempted some network analysis on the relations between campaign officials and Russians. I found that one can ‘compress’ Russian involvement into 9 (mostly) distinct groups. I then summarize these points of contact.

Introduction to Mueller Report

Volume 1 of the Mueller Report starts with Russian interference in the 2016 US Presidential Elections. Russia did so in two ways. The first was a campaign by the IRA that used social media tools like Facebook and Twitter with the goal of changing public opinion. While there were some retweets by Trump and his campaign officials from these accounts, there wasn’t much direct communication. The second form was to use Russian intelligence to hack Hillary Clinton’s emails. These hacked emails were released with the help of WikiLeaks and Guccifer 2.0. Trump’s campaign deliberately tried to find other hacked emails and encouraged Russia to do so publicly. However, the campaign could not find additional information on these emails.

The rest of Volume 1 discusses the numerous relationships between Trump campaign officials and Russians. It’s this part that will be the basis for most of the results below.

The data

Volume 1 consists of 199 pages including footnotes and appendices. I found a machine-readable version here. I split the text into sentences and looked at whether a person’s name was included in that sentence. This left me with a sentence-by-name matrix that is the starting point of my analysis. There are some drawbacks to this in that OCR does not immediately distinguish sentences. In addition it often groups footnotes with the last line of sentences on a page.
But it seemed like a good starting point so I went ahead. Below are the top 20 most commonly occurring names. Papadopoulos, Manafort, Kushner, Cohen, Trump Jr, and Flynn are all in the top. Considering they all, to varying degrees, worked in the Trump campaign, this makes sense. We also see some Russian names such as Dmitriev, Kilimnik, and Kislyak. I’ll explain their contacts below.

I then created a person-by-person matrix that counted the number of times a name co-occurs with another. I’m treating this as a weighted, undirected graph. I transformed this to a Laplacian matrix and performed an eigendecomposition. This is known as a spectral analysis of a network. Basically this tries to find locations that minimize the squared error of the relations. Below is the resulting image of the 2nd-to-last and 3rd-to-last eigenvectors.

WHOA … I’m getting a headache looking at this. But it definitely looks like there is structure in the graph. There appear to be some clusters forming, and these do correspond to particular events described in the report. In the lower left you can see Papadopoulos-related characters, in the upper right some Cohen acquaintances, and around (0, .1) there’s the Trump Tower meeting. Not bad, but still messy. I’m looking for distinct clusters. What if we look at only the Russians in the graph?

Ok! Now we’re talking. There are 6 distinct clusters of Russians here. That means there are no relations between these clusters, and each corresponds to a unique set of relations with Trump campaign officials.

I played around with this some more, but the text data was too messy for robust analysis. Co-occurring names do not pick up everything and, due to sentence parsing errors, some things lead to erroneous relations. Finally, I gave up on trying to only use text analysis, read Volume 1, and manually created a network found here. With that I created groupings using the above chart as a starting point. I found 9 fairly distinct clusters of Russians.
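The spectral pipeline described above (co-occurrence matrix, Laplacian, eigendecomposition, plot of the trailing eigenvectors) can be sketched in a few lines of base R. This is my own minimal reconstruction on a toy matrix, not the author's code; the three names and their co-occurrence counts are made up purely for illustration.

```r
# Toy person-by-person co-occurrence matrix (a weighted, undirected graph).
A <- matrix(c(0, 3, 1,
              3, 0, 0,
              1, 0, 0),
            nrow = 3, byrow = TRUE,
            dimnames = list(c("manafort", "kilimnik", "kislyak"),
                            c("manafort", "kilimnik", "kislyak")))

L <- diag(rowSums(A)) - A   # unnormalised graph Laplacian: degrees minus adjacency
e <- eigen(L)               # eigen() returns eigenvalues in decreasing order
n <- ncol(L)

# The last eigenvector (eigenvalue 0) is constant, so use the
# 2nd-to-last and 3rd-to-last eigenvectors as plotting coordinates.
coords <- e$vectors[, c(n - 1, n - 2)]
plot(coords, type = "n", xlab = "Dim 1", ylab = "Dim 2")
text(coords, labels = rownames(A))
```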
Below you can see the relationships between those groups and various members of the Trump campaign. I then further grouped them into 4 broad categories, which I’ve named: Trump Business, The Opportunists, The Professionals, and Russian Officials and Lackeys. I also included whether a Trump campaign official’s interaction was of first degree (they were in a meeting or talked explicitly with the Russian group in question) or second degree (they were aware of a meeting). Below are my summaries for each.

Trump Business

• Group 1: agalarov, aras, goldstone, samochornov, veselnitskaya, kaveladze, akhmetshin
• Group 2: klokov, erchova
• Group 3: rtskhiladze, rozov
• Group 5: peskov

Aras Agalarov (he has a son, Emin; I did not disambiguate between them) is a billionaire Russian property developer who worked with Trump to create the Miss Universe pageant in 2013. They discussed creating a Trump Tower in Moscow in late 2013, and discussed it with Donald Trump Jr (DTJ) and Ivanka Trump, but it did not progress. In the summer of 2015, Group 3 signed a letter of intent to build the Trump Tower in Moscow and met with Ivanka and DTJ. While this was happening, Group 2 contacted Cohen to discuss the Trump Tower in Moscow and a meeting with Trump. Cohen thought this person was a pro-wrestler, but that did not seem to bother him and he agreed to talk about business. They wanted to set up a meeting with Trump and Putin, but Cohen wanted to keep clear of politics and it went nowhere. Finally, due to the slowness of progress in the Trump Tower Moscow deal with Group 2, Cohen reached out to Peskov, Press Secretary for Putin, to try and get in touch with Putin directly and begin building. Cohen worked on the Moscow deal through the summer of 2016, but it went nowhere.

During the campaign, Emin Agalarov, at the behest of his father, set up a meeting with DTJ to discuss hacked emails. This led to the infamous Trump Tower meeting that involved DTJ, Kushner, Manafort, and other Russians in Group 1.
DTJ discussed this meeting with others in the campaign as well, including Gates. Kushner showed up late to the meeting, texted Manafort during it that this was a ‘waste of time’, and texted others to call him so he could get out; he subsequently left early. The meeting did not provide any information to the Trump campaign.

The Opportunists

• Group 4: mifsud, polonskaya, timofeev, millian
• Group 5: klimentov, poliakova, peskov, dvorkovich

Papadopoulos and Page had similar experiences with the Trump campaign, and both seemed to be in it for the opportunity it presented them. Both padded their resumes to look more important than they were to get the job, and both held foreign policy advisory roles.

Papadopoulos got the job of foreign policy advisor in March 2016. He met Mifsud, a Maltese professor, in Rome at a meeting of the London Centre of International Law Practice shortly after. Upon learning that Papadopoulos was employed by the campaign, Mifsud took interest and spoke of his Russian connections. Papadopoulos, thinking that having more Russian connections could help his stature in the Trump campaign, pursued this relationship. They met the following week in London, where Mifsud introduced him to Polonskaya. Papadopoulos relayed his new contacts to Clovis and received an approving response. This relationship continued, and Mifsud said Russia had ‘dirt’ on Clinton during a meeting in late April. Ten days later, Papadopoulos told a foreign official about his contacts and knowledge of dirt on Clinton. He then discussed a Trump meeting with Putin with Lewandowski, Miller, and Manafort. Manafort made clear that Trump should not meet with Putin directly.

Page also joined the campaign in March 2016 as a foreign policy advisor. He had previously lived and worked in Russia and had several Russian contacts. He was invited to give a talk at the New Economic School in Russia in July and asked for permission. Clovis responded that if he went he could not speak for the Trump campaign.
His talk was critical of US policy towards Russia and was received warmly by the Russian Deputy Prime Minister and others. Afterwards, he met Kislyak in July in Cleveland. These activities drew the attention of the media, and he was removed from the campaign in late September. After the election, Page went to Russia in an unofficial role in late 2016. He again met with Russians in Group 5.

The Professionals

• Group 6: oknyansky, rasin
• Group 7: kilimnik, deripaska, boyarkin, oganov

Paul Manafort and Roger Stone are political consultants and previously worked together. Roger Stone worked alongside the campaign to help but was never officially a part of the campaign. Manafort joined in March 2016 and was the chairman between June and August.

Caputo set up a meeting between Stone and Group 6, Oknyansky and Rasin, to get dirt on Clinton in May 2016. Rasin claimed to have information on money laundering activities by Clinton. Stone refused the offer because they asked for too much money. Also, Stone had some contact with the Twitter account Guccifer 2.0 (not shown above). This was the front used by the GRU to release stolen documents. Curiously, his name was redacted on page 45 of the Mueller report because of ‘Harm to ongoing matter’. Seems a little weird to redact something when it’s public information.

From March 2016 until his departure, Manafort gave, and ordered Gates to give, campaign updates to Kilimnik. Kilimnik is thought to be a Russian spy and has connections with Deripaska, a Russian billionaire to whom Manafort owed money. Manafort gave polling data on the Trump campaign and met with Kilimnik twice in person; once in May and then again in August. It’s not clear why Manafort gave this data to Kilimnik, although Gates thought it was to ingratiate himself with Deripaska. Deripaska and his deputy Boyarkin were subsequently sanctioned by the US Treasury.
Russian Officials and Lackeys

• Group 8: kislyak, gorkov
• Group 9: aven, dmitriev

The final groups deal with Russian officials and Putin’s billionaires. Sessions and Kushner met with Kislyak, the Russian Ambassador to the US, first in April at a Trump foreign policy conference. These were brief handshake affairs that lasted a couple of minutes; Sessions does not recall seeing Kislyak. Sessions, Gordon, and Page met with Kislyak at the Republican National Convention in July. He was one of approximately 80 foreign ambassadors to the US that were invited. Gordon and Sessions met with Kislyak for a few minutes after their speeches. Gordon, Page, and Kislyak later sat at the same table and discussed improving US-Russian relations for a few minutes. Gordon received an email in August to meet with Kislyak but declined due to a ‘constant stream of false media stories’ and offered to rain-check the meeting. In August the Russian Embassy set up a meeting between Sessions and Kislyak, and the two met in September at Sessions’s Senate office. The meeting lasted 30 minutes, and Kislyak tried to set up another meeting but Sessions didn’t follow up. Sessions got into trouble by not disclosing his meetings with Kislyak, which was part of the reason he recused himself from the investigation that became known as the Mueller report.

Following the election in November, Kislyak reached out to Kushner, but Kushner did not think Kislyak had a direct line to Putin and was therefore not important enough to talk to. Nevertheless, Kushner met with Kislyak in November at Trump Tower, invited Flynn, and spoke for about 30 minutes about repairing US-Russian relations. Kislyak suggested using a secure line to talk to Russian generals about the Syrian war. Kushner said he had no secure lines to use and asked if they could use Russian facilities, but Kislyak rejected that idea. Kislyak tried to get another meeting with Kushner, but Kushner sent his assistant instead.
Kislyak proposed a meeting with Gorkov, the head of a Russian-owned bank, instead. Kushner agreed and they met in December. Kushner said that the meeting was about restoring US-Russian relations; Gorkov said it was about Kushner’s personal business. They did not have any follow-up meetings.

In December, Flynn talked with Kislyak about two separate topics. The first was to convince Russia to veto an anti-Israel resolution on settlements in the UN, where it was thought the Obama administration would abstain. Russia did not vote against it. The second was to convince Russia not to retaliate against new sanctions for meddling in US elections. McFarland and Bannon were aware of Flynn’s discussions about the sanctions. Russia did not apply retaliatory sanctions.

Finally, there were two billionaires that Putin ‘deputized’ to create contacts with the Trump campaign after the election: Aven and Dmitriev. Aven recalled that Putin did not know who to contact to get in touch with President-elect Trump. Aven did not make direct contact with the campaign, but Dmitriev did, through two avenues. One was to try and convince Kushner’s friend to set up a meeting. Kushner circulated this opportunity internally, but it went nowhere. The other was meeting with Erik Prince, a supporter of Trump but not officially in the campaign, in the Seychelles. Prince discussed his meeting with Bannon, but Bannon has no recollection of it.

Some notable connections

In general these Russian groupings were distinct in the people they talked to and had little obvious contact with one another. Some notable exceptions are:

• Peskov talked to Cohen and Page independently
• Dmitriev and Peskov might have talked to each other (p. 149), but there were some ‘investigative technique’ redactions so I’m not sure
• Kilimnik was aware of Page’s December visit to Russia and discussed it with Manafort, saying “Carter Page is in Moscow today, sending messages he is authorized to talk to Russia on behalf of DT on a range of issues of mutual interest, including Ukraine” (p. 166). Leads me to ask: who would know the whereabouts and discussions of other people? Spies. That’s who.

Conclusions on Volume 1

Overall, I get the impression that the Trump campaign did not have the ‘best people’. Cohen tried to make a deal but couldn’t find the right people to talk to. Papadopoulos and DTJ tried to get dirt on Clinton but couldn’t find anything. Page seemed to use the campaign as a platform to create more connections with Russians. A few ‘friends’ (Stone and Prince) lent a hand but probably hurt Trump’s credibility by dealing with Russians more than they helped him. Manafort, a seasoned campaigner, wasn’t obviously working for Trump… he worked for free after all. It seemed like a group that was willing to do shady things for their own personal gain, but without the ability to follow through. SAD!

All Together Graph

Conclusions on Analysis

Running text analysis before reading the report was very helpful to understanding it. There are just so many connections going on that it’s hard to keep track. Running some basic clustering techniques as described above helped me zone in on what to look for while reading the report.

To leave a comment for the author, please follow the link and comment on their blog: sweissblaug.
### Dash with golem: The beginning

Mon, 19/08/2019 - 00:16

[This article was first published on Rtask, and kindly contributed to R-bloggers].

{golem} has been developed to help build big Shiny applications to put in production. What if {golem} could be used to build another popular kind of interactive web application, recently made available to R programmers: Dash?

Dash, a newcomer in interactive web applications

A few days ago, Plotly announced Dash is now available for R. After reading this announcement, I thought this

The article Dash with golem: The beginning appeared first on Rtask.

To leave a comment for the author, please follow the link and comment on their blog: Rtask.

### How To Select Multiple Columns Using Grep & R

Sun, 18/08/2019 - 22:31

[This article was first published on Data Science Using R – FinderDing, and kindly contributed to R-bloggers].

Why you need to be using grep when programming with R.
There’s a reason that grep is included in most if not all programming language to this day 44 years later from creation. It’s useful and simple to use. Below is an example of using grep to make selecting multiple columns in R simple and easy to read. The dataset below has the following column names. names(data) # Column Names [1] "fips" "state" "county" "metro_area" [5] "population" "med_hh_income" "poverty_rate" "population_lowaccess" [9] "lowincome_lowaccess" "no_vehicle_lowaccess" "s_grocery" "s_supermarket" [13] "s_convenience" "s_specialty" "s_farmers_market" "r_fastfood" [17] "r_full_service" How can we select only the columns we need to work with? • metro_area • med_hh_income • poverty_rate • population_lowaccess • lowincome_lowaccess • no_vehicle_lowaccess • s_grocery • s_supermarket • s_convenience • s_specialty • s_farmers_market • r_fastfood • r_full_service We can tell R exactly by listing each column as below data[c("metro_area","med_hh_income", "poverty_rate", "population_lowaccess", "lowincome_lowaccess", "no_vehicle_lowaccess","s_grocery","s_supermarket","s_convenience","s_specialty","s_farmers_market", "r_fastfood", "r_full_service")] OR We can tell R where each column we want is. data[c(4,6,7:17)] First, writing out each individual column is time consuming and chances are you’re going to make a typo (I did when writing it). Second option we have to first figure out where the columns are located to then tell R. Well looking at the columns we are trying to access vs the others theirs a specific difference. All these columns have a “_” located in there name, and we can use regular expressions (grep) to select these. 
```r
data[grep("_", names(data))]
```

FYI… to get the column locations, you can use:

```r
grep("_", names(data))
 [1]  4  6  7  8  9 10 11 12 13 14 15 16 17
```

You will rarely have a regular expression as easy as "_" to select multiple columns; a very useful resource for learning and practicing is https://regexr.com

Data was obtained from https://www.ers.usda.gov/data-products/food-access-research-atlas/download-the-data/

The post How To Select Multiple Columns Using Grep & R appeared first on FinderDing.

To leave a comment for the author, please follow the link and comment on their blog: Data Science Using R – FinderDing.

### noaastorms R package now supports NOAA IBTrACS v4

Sun, 18/08/2019 - 02:00

[This article was first published on Blog - BS, and kindly contributed to R-bloggers].

Earlier this year, I released a simple R package (available at basilesimon/noaastorms) that downloads, cleans and parses NOAA IBTrACS data for you. Now that the NOAA has updated its datasets, noaastorms uses these!
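Looping back to the grep post above: the underscore selection is easy to reproduce on a self-contained toy data frame. A minimal sketch (column names borrowed from that post, values invented):

```r
# Toy data frame mimicking a few of the post's columns (values made up)
data <- data.frame(
  fips = 13121, state = "GA", county = "Fulton",
  metro_area = 1, med_hh_income = 52000, poverty_rate = 0.18,
  s_grocery = 35, r_fastfood = 410
)

# Keep only the columns whose names contain an underscore
underscored <- data[grep("_", names(data))]
names(underscored)
# [1] "metro_area"    "med_hh_income" "poverty_rate"  "s_grocery"     "r_fastfood"
```

The same selection pattern works unchanged on the full food-access dataset.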
How to install

```r
library(devtools)
install_github("basilesimon/noaastorms")
```

Available functions

getStorms: fetch NOAA historical best track storms data.

```r
> df <- getStorms(c('EP'))
> head(df[1:5])
     Serial_Num Season Num Basin Sub_basin    Name
2 1902276N14266   1902  01    EP        MM UNNAMED
3 1902276N14266   1902  01    EP        MM UNNAMED
4 1902276N14266   1902  01    EP        MM UNNAMED
5 1902276N14266   1902  01    EP        MM UNNAMED
6 1902276N14266   1902  01    EP        MM UNNAMED
```

The first argument is a vector of basin codes from this list:

• NA: North Atlantic
• SA: South Atlantic
• NI: North Indian
• SI: South Indian
• EP: East Pacific
• SP: South Pacific
• WP: West Pacific

To get storms that took place in the Atlantic, for example, run getStorms(c('NA', 'SA')).

The second (optional) argument is a date range to filter the data with. For example:

```r
dateRange <- c(as.Date('2010-01-01'), as.Date('2012-12-31'))
getStorms(c('NA', 'SA'), dateRange = dateRange)
```

will query storms that took place in the Atlantic between 2010 and 2012.

Usage

```r
# load a map of the world and
# use clipPolys to avoid issues
# when zooming in with coord_map
wm <- map_data("world")
library("PBSmapping")
data.table::setnames(wm, c("X", "Y", "PID", "POS", "region", "subregion"))
worldmap <- clipPolys(wm, xlim = c(20, 110), ylim = c(0, 45), keepExtra = TRUE)

# load storms for the Atlantic ocean
spStorms <- getStorms(c('NA', 'SA'))

ggplot(spStorms,
       aes(x = Longitude, y = Latitude, group = Serial_Num)) +
  geom_polygon(data = worldmap,
               aes(x = X, y = Y, group = PID),
               fill = "whitesmoke", colour = "gray10", size = 0.2) +
  geom_path(alpha = 0.1, size = 0.8, color = "red") +
  coord_map(xlim = c(20, 110), ylim = c(0, 45))
```

Official changelog (retrieved Aug 16, 2019): https://www.ncdc.noaa.gov/ibtracs/index.php?name=status

This is the first release of IBTrACS version 04. It is updated weekly. Release date: March 2019. New features (improvements from v03):

* Best track data updated daily and contain provisional tracks of recent storms.
* Reduced formats – Version 4 is available in 3 formats (netCDF, CSV, shapefiles).
* Consistent formats – The data presented in each format is completely interconsistent (identical).
* More parameters – More of the parameters provided by the agencies are included in IBTrACS.
* Basin assignment – Any system occurring in a basin is included in that basin's file (in version 3, a storm was only included in the basin in which it had its genesis).
* New derived parameters – We provide storm translation speed and direction and other variables requested by users.

To leave a comment for the author, please follow the link and comment on their blog: Blog - BS.

### Missing Values In Dataframes With Inspectdf

Sun, 18/08/2019 - 02:00

[This article was first published on Alastair Rushworth, and kindly contributed to R-bloggers].

Summarising NA by column in dataframes

Exploring the number of records containing missing values in a new set of data is an important and well-known exploratory check. However, NAs can be introduced into your data for a multitude of other reasons, often as a side effect of data manipulations like transforming columns or performing joins.
In most cases the behaviour is expected, but sometimes, when things go wrong, tracing missing values back through a sequence of steps can be a helpful diagnostic. All of that is to say that it's vital to have simple tools for interrogating dataframes for missing values… enter inspectdf!

Missingness by column: inspectdf::inspect_na()

The inspect_na() function from the inspectdf package is a simple tool designed to quickly summarise the frequency of missingness by column in a dataframe. Firstly, install the inspectdf package by running

```r
install.packages("inspectdf")
```

Then load both the inspectdf and dplyr packages – the latter we'll just use for its built-in starwars dataset.

```r
# load packages
library(inspectdf)
library(dplyr)

# quick peek at starwars data that comes with dplyr
head(starwars)
## # A tibble: 6 x 13
##   name  height  mass hair_color skin_color eye_color birth_year gender
## 1 Luke…    172    77 blond      fair       blue            19   male  
## 2 C-3PO    167    75 <NA>       gold       yellow         112   <NA>  
## 3 R2-D2     96    32 <NA>       white, bl… red             33   <NA>  
## 4 Dart…    202   136 none       white      yellow          41.9 male  
## 5 Leia…    150    49 brown      light      brown           19   female
## 6 Owen…    178   120 brown, gr… light      blue            52   male  
## # … with 5 more variables: homeworld <chr>, species <chr>, films <list>,
## #   vehicles <list>, starships <list>
```

So how many missing values are there in starwars? Even looking at the output of the head() function reveals that there are at least a few NAs in there. The use of the inspect_na() function is very straightforward:

```r
starwars %>% inspect_na
## # A tibble: 13 x 3
##    col_name     cnt  pcnt
##  1 birth_year    44 50.6 
##  2 mass          28 32.2 
##  3 homeworld     10 11.5 
##  4 height         6  6.90
##  5 hair_color     5  5.75
##  6 species        5  5.75
##  7 gender         3  3.45
##  8 name           0  0   
##  9 skin_color     0  0   
## 10 eye_color      0  0   
## 11 films          0  0   
## 12 vehicles       0  0   
## 13 starships      0  0
```

The output is a simple tibble with columns showing the count (cnt) and percentage (pcnt) of NAs corresponding to each column (col_name) in the starwars data.
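As a quick cross-check on summaries like this (my aside, not from the post), the raw counts behind such a table can be computed in base R with colSums(is.na(...)). A minimal sketch on a tiny invented data frame:

```r
# Tiny toy data frame (invented) with some missing values
df <- data.frame(
  birth_year = c(19, NA, 33, NA),
  mass       = c(77, 75, NA, 136),
  name       = c("Luke", "C-3PO", "R2-D2", "Vader")
)

# Count and percentage of NAs per column, sorted like inspect_na()
cnt  <- sort(colSums(is.na(df)), decreasing = TRUE)
pcnt <- 100 * cnt / nrow(df)
cnt
# birth_year       mass       name
#          2          1          0
```

inspect_na() adds the sorting, tidy tibble output, and plotting for free, which is exactly the convenience the post highlights.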
For example, we can see that the birth_year column has the highest number of NAs, with over half missing. Note that the tibble is sorted in descending order of the frequency of NA occurrence. By adding the show_plot command, the tibble can also be displayed graphically:

```r
starwars %>% inspect_na %>% show_plot
```

Although this is a simple summary, and you'll find many other ways to do this in R, I use it all the time and find it very convenient to have a one-liner to call on. Code efficiency matters!

More on the inspectdf package and exploratory data analysis

inspectdf can be used to produce a number of common summaries with minimal effort. See previous posts to learn how to explore and visualise categorical data and to calculate and display correlation coefficients. For a more general overview, have a look at the package website. For a recent overview of R packages for exploratory analysis, you might also be interested in the paper The Landscape of R Packages for Automated Exploratory Data Analysis by Mateusz Staniak and Przemysław Biecek.

Comments? Suggestions? Issues? Any feedback is welcome! Find me on twitter at rushworth_a or write a github issue.

To leave a comment for the author, please follow the link and comment on their blog: Alastair Rushworth.

### Tech Dividends, Part 2

Sat, 17/08/2019 - 02:00

[This article was first published on R Views, and kindly contributed to R-bloggers].
In a previous post, we explored the dividend history of stocks included in the S&P 500, and we followed that by exploring the dividend history of some NASDAQ tickers. Today's post is a short continuation of that tech dividend theme, with the aim of demonstrating how we can take our previous work and use it to quickly visualize research from the real world. In this case, the inspiration is the July 27th edition of Barron's, which has an article called 8 Tech Stocks That Yield Steady Payouts. (As of this writing, a subscription is required to view that article, unfortunately. If you do peruse that issue, I also recommend the interview with GMO's James Montier, as he offers an interesting viewpoint on modern monetary theory.)

The article breaks out eight tech stocks with attractive dividends: IBM, HPQ, TXN, CSCO, INTC, ORCL, AAPL and MSFT. It also mentions QCOM as an interesting stock to watch. We'll piggyback on the substance of the article and visualize the dividend history of those nine tickers.

First, let's load up our packages and create a vector of tickers called barrons_tickers. We will pass that vector to tq_get(get = "dividends") just as we did last time. Indeed, we're not going to do much differently today, but hopefully it's a nice way to see how previous work can be applied to other situations. Ah, the joys of code that can be reused!

```r
library(tidyverse)
library(tidyquant)
library(janitor)
library(plotly)

barrons_tickers <-
  c("IBM", "HPQ", "TXN", "CSCO", "INTC", "ORCL", "AAPL", "MSFT", "QCOM")

barrons_dividends <-
  barrons_tickers %>%
  tq_get(get = "dividends")
```

We can reuse our code from the previous post to quickly visualize these tickers' dividend histories, along with a detailed tooltip setting in plotly.
```r
ggplotly(
  barrons_dividends %>%
    group_by(symbol) %>%
    mutate(info = paste(date, ' symbol:', symbol, ' div:$', dividends)) %>%
    ggplot(aes(x = date, y = dividends, color = symbol, label_tooltip = info)) +
    geom_point() +
    scale_y_continuous(labels = scales::dollar) +
    scale_x_date(breaks = scales::pretty_breaks(n = 10)) +
    labs(x = "", y = "div/share", title = "Nasdaq dividends") +
    theme(plot.title = element_text(hjust = 0.5)),
  tooltip = "label_tooltip"
)
```

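As one example of reusing barrons_dividends further (my sketch, not from the post), we could pull each ticker's most recent payout and annualize it assuming quarterly payments. Here it runs on an invented two-ticker slice shaped like the tq_get(get = "dividends") result (symbol, date, dividends); the dollar amounts match the AAPL and IBM tooltips in the chart above:

```r
# Invented slice shaped like the tq_get(get = "dividends") output
div_sample <- data.frame(
  symbol    = c("AAPL", "AAPL", "IBM", "IBM"),
  date      = as.Date(c("2019-05-10", "2019-08-09", "2019-05-09", "2019-08-08")),
  dividends = c(0.77, 0.77, 1.57, 1.62)
)

# Most recent dividend per symbol, annualized assuming four payouts a year
latest <- do.call(rbind, lapply(split(div_sample, div_sample$symbol),
                                function(d) d[which.max(d$date), ]))
latest$annualized <- latest$dividends * 4
latest[, c("symbol", "dividends", "annualized")]
```

The same base-R pattern works unchanged on the full barrons_dividends tibble, since it has the same three columns.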
[Interactive plotly chart omitted: scatter plot of quarterly dividends per share by date for each of the nine tickers, 2009–2019, titled "Nasdaq dividends".]
symbol: TXN
div: $0.11","info: 2009-04-28 symbol: TXN div:$ 0.11","info: 2009-07-29
symbol: TXN
div: $0.11","info: 2009-10-28 symbol: TXN div:$ 0.12","info: 2010-01-28
symbol: TXN
div: $0.12","info: 2010-04-28 symbol: TXN div:$ 0.12","info: 2010-07-28
symbol: TXN
div: $0.12","info: 2010-10-28 symbol: TXN div:$ 0.13","info: 2011-01-27
symbol: TXN
div: $0.13","info: 2011-04-28 symbol: TXN div:$ 0.13","info: 2011-07-28
symbol: TXN
div: $0.13","info: 2011-10-27 symbol: TXN div:$ 0.17","info: 2012-01-27
symbol: TXN
div: $0.17","info: 2012-04-26 symbol: TXN div:$ 0.17","info: 2012-07-27
symbol: TXN
div: $0.17","info: 2012-10-29 symbol: TXN div:$ 0.21","info: 2012-10-31
symbol: TXN
div: $0.21","info: 2013-01-29 symbol: TXN div:$ 0.21","info: 2013-04-26
symbol: TXN
div: $0.28","info: 2013-07-29 symbol: TXN div:$ 0.28","info: 2013-10-29
symbol: TXN
div: $0.3","info: 2014-01-29 symbol: TXN div:$ 0.3","info: 2014-04-28
symbol: TXN
div: $0.3","info: 2014-07-29 symbol: TXN div:$ 0.3","info: 2014-10-29
symbol: TXN
div: $0.34","info: 2015-01-28 symbol: TXN div:$ 0.34","info: 2015-04-28
symbol: TXN
div: $0.34","info: 2015-07-29 symbol: TXN div:$ 0.34","info: 2015-10-28
symbol: TXN
div: $0.38","info: 2016-01-28 symbol: TXN div:$ 0.38","info: 2016-04-28
symbol: TXN
div: $0.38","info: 2016-07-28 symbol: TXN div:$ 0.38","info: 2016-11-03
symbol: TXN
div: $0.5","info: 2017-01-27 symbol: TXN div:$ 0.5","info: 2017-04-27
symbol: TXN
div: $0.5","info: 2017-07-27 symbol: TXN div:$ 0.5","info: 2017-10-30
symbol: TXN
div: $0.62","info: 2018-01-30 symbol: TXN div:$ 0.62","info: 2018-05-04
symbol: TXN
div: $0.62","info: 2018-07-30 symbol: TXN div:$ 0.62","info: 2018-10-30
symbol: TXN
div: $0.77","info: 2019-01-30 symbol: TXN div:$ 0.77","info: 2019-05-03
symbol: TXN
div: $0.77","info: 2019-07-30 symbol: TXN div:$ 0.77"],"type":"scatter","mode":"markers","marker":{"autocolorscale":false,"color":"rgba(255,97,195,1)","opacity":1,"size":5.66929133858268,"symbol":"circle","line":{"width":1.88976377952756,"color":"rgba(255,97,195,1)"}},"hoveron":"points","name":"TXN","legendgroup":"TXN","showlegend":true,"xaxis":"x","yaxis":"y","hoverinfo":"text","frame":null}],"layout":{"margin":{"t":43.7625570776256,"r":7.30593607305936,"b":25.5707762557078,"l":54.7945205479452},"plot_bgcolor":"rgba(235,235,235,1)","paper_bgcolor":"rgba(255,255,255,1)","font":{"color":"rgba(0,0,0,1)","family":"","size":14.6118721461187},"title":{"text":"Nasdaq dividends","font":{"color":"rgba(0,0,0,1)","family":"","size":17.5342465753425},"x":0.5,"xref":"paper"},"xaxis":{"domain":[0,1],"automargin":true,"type":"linear","autorange":false,"range":[14079.5,18314.5],"tickmode":"array","ticktext":["2009","2010","2011","2012","2013","2014","2015","2016","2017","2018","2019","2020"],"tickvals":[14245,14610,14975,15340,15706,16071,16436,16801,17167,17532,17897,18262],"categoryorder":"array","categoryarray":["2009","2010","2011","2012","2013","2014","2015","2016","2017","2018","2019","2020"],"nticks":null,"ticks":"outside","tickcolor":"rgba(51,51,51,1)","ticklen":3.65296803652968,"tickwidth":0.66417600664176,"showticklabels":true,"tickfont":{"color":"rgba(77,77,77,1)","family":"","size":11.689497716895},"tickangle":-0,"showline":false,"linecolor":null,"linewidth":0,"showgrid":true,"gridcolor":"rgba(255,255,255,1)","gridwidth":0.66417600664176,"zeroline":false,"anchor":"y","title":{"text":"","font":{"color":"rgba(0,0,0,1)","family":"","size":14.6118721461187}},"hoverformat":".2f"},"yaxis":{"domain":[0,1],"automargin":true,"type":"linear","autorange":false,"range":[-0.0636764305177112,1.70017506811989],"tickmode":"array","ticktext":["$0.00","$0.50","$1.00","$1.50"],"tickvals":[0,0.5,1,1.5],"categoryorder":"array","categoryarray":["$0.00","$0.50","$1.00","$1.50"],"nticks":null
,"ticks":"outside","tickcolor":"rgba(51,51,51,1)","ticklen":3.65296803652968,"tickwidth":0.66417600664176,"showticklabels":true,"tickfont":{"color":"rgba(77,77,77,1)","family":"","size":11.689497716895},"tickangle":-0,"showline":false,"linecolor":null,"linewidth":0,"showgrid":true,"gridcolor":"rgba(255,255,255,1)","gridwidth":0.66417600664176,"zeroline":false,"anchor":"x","title":{"text":"div/share","font":{"color":"rgba(0,0,0,1)","family":"","size":14.6118721461187}},"hoverformat":".2f"},"shapes":[{"type":"rect","fillcolor":null,"line":{"color":null,"width":0,"linetype":[]},"yref":"paper","xref":"paper","x0":0,"x1":1,"y0":0,"y1":1}],"showlegend":true,"legend":{"bgcolor":"rgba(255,255,255,1)","bordercolor":"transparent","borderwidth":1.88976377952756,"font":{"color":"rgba(0,0,0,1)","family":"","size":11.689497716895},"y":0.913385826771654},"annotations":[{"text":"symbol","x":1.02,"y":1,"showarrow":false,"ax":0,"ay":0,"font":{"color":"rgba(0,0,0,1)","family":"","size":14.6118721461187},"xref":"paper","yref":"paper","textangle":-0,"xanchor":"left","yanchor":"bottom","legendTitle":true}],"hovermode":"closest","barmode":"relative"},"config":{"doubleClick":"reset","showSendToCloud":false},"source":"A","attrs":{"e04b61379be2":{"x":{},"y":{},"colour":{},"label_tooltip":{},"type":"scatter"}},"cur_data":"e04b61379be2","visdat":{"e04b61379be2":["function (y) ","x"]},"highlight":{"on":"plotly_click","persistent":false,"dynamic":false,"selectize":false,"opacityDim":0.2,"selected":{"opacity":1},"debounce":0},"shinyEvents":["plotly_hover","plotly_click","plotly_selected","plotly_relayout","plotly_brushed","plotly_brushing","plotly_clickannotation","plotly_doubleclick","plotly_deselect","plotly_afterplot"],"base_url":"https://plot.ly"},"evals":[],"jsHooks":[]}

With a handful of stocks, our visualization really tells a nice story. We can more clearly see the four annual payments by each company, and it pops off the chart that IBM has been raising its dividend consistently. Not bad for a company that also owns Red Hat.

Let’s move beyond the dividend history and compare the dividend yields for each of these tickers. We’ll grab recent closing prices by calling tq_get(get = "stock.prices", from = "2019-08-05").

```r
barrons_price <- barrons_tickers %>%
  tq_get(get = "stock.prices", from = "2019-08-05")
```

Now, we estimate the annual dividend payment by taking the most recent quarterly dividend via slice(n()) and multiplying by four.

```r
barrons_dividends %>%
  group_by(symbol) %>%
  slice(n()) %>%
  mutate(total_div = dividends * 4)
# A tibble: 9 x 4
# Groups:   symbol [9]
  symbol date       dividends total_div
1 AAPL   2019-08-09     0.77       3.08
2 CSCO   2019-07-03     0.35       1.4
3 HPQ    2019-06-11     0.16       0.64
4 IBM    2019-08-08     1.62       6.48
5 INTC   2019-08-06     0.315      1.26
6 MSFT   2019-08-14     0.46       1.84
7 ORCL   2019-07-16     0.24       0.96
8 QCOM   2019-06-05     0.62       2.48
9 TXN    2019-07-30     0.77       3.08
```

Next, we use left_join(barrons_price, by = "symbol") to add the most recent closing price.

```r
barrons_dividends %>%
  group_by(symbol) %>%
  slice(n()) %>%
  mutate(total_div = dividends * 4) %>%
  left_join(barrons_price, by = "symbol") %>%
  head()
# A tibble: 6 x 11
# Groups:   symbol [1]
  symbol date.x     dividends total_div date.y      open  high   low close
1 AAPL   2019-08-09      0.77      3.08 2019-08-05  198.  199.  193.  193.
2 AAPL   2019-08-09      0.77      3.08 2019-08-06  196.  198.  194.  197
3 AAPL   2019-08-09      0.77      3.08 2019-08-07  195.  200.  194.  199.
4 AAPL   2019-08-09      0.77      3.08 2019-08-08  200.  204.  199.  203.
5 AAPL   2019-08-09      0.77      3.08 2019-08-09  201.  203.  199.  201.
6 AAPL   2019-08-09      0.77      3.08 2019-08-12  200.  202.  199.  200.
# … with 2 more variables: volume, adjusted
```

That worked, but note that we now have two date columns, called date.x and date.y, since both of our tibbles had a date column before we joined them. In the past we have dealt with that by deleting the duplicate afterwards, but this time let’s use select() inside left_join() to remove the duplicate before joining. The full call is left_join(select(barrons_price, -date), by = "symbol").

```r
barrons_dividends %>%
  group_by(symbol) %>%
  slice(n()) %>%
  mutate(total_div = dividends * 4) %>%
  left_join(select(barrons_price, -date), by = "symbol") %>%
  head()
# A tibble: 6 x 10
# Groups:   symbol [1]
  symbol date       dividends total_div  open  high   low close volume
1 AAPL   2019-08-09      0.77      3.08  198.  199.  193.  193. 5.24e7
2 AAPL   2019-08-09      0.77      3.08  196.  198.  194.  197  3.58e7
3 AAPL   2019-08-09      0.77      3.08  195.  200.  194.  199. 3.34e7
4 AAPL   2019-08-09      0.77      3.08  200.  204.  199.  203. 2.70e7
5 AAPL   2019-08-09      0.77      3.08  201.  203.  199.  201. 2.46e7
6 AAPL   2019-08-09      0.77      3.08  200.  202.  199.  200. 2.25e7
# … with 1 more variable: adjusted
```

Now, we calculate the yield with mutate(yield = total_div/close).

```r
barrons_dividends %>%
  group_by(symbol) %>%
  slice(n()) %>%
  mutate(total_div = dividends * 4) %>%
  left_join(barrons_price, by = "symbol") %>%
  select(symbol, total_div, close) %>%
  mutate(yield = total_div/close)
# A tibble: 81 x 4
# Groups:   symbol [9]
   symbol total_div close  yield
 1 AAPL        3.08 193.  0.0159
 2 AAPL        3.08 197   0.0156
 3 AAPL        3.08 199.  0.0155
 4 AAPL        3.08 203.  0.0151
 5 AAPL        3.08 201.  0.0153
 6 AAPL        3.08 200.  0.0154
 7 AAPL        3.08 209.  0.0147
 8 AAPL        3.08 203.  0.0152
 9 AAPL        3.08 202.  0.0153
10 CSCO        1.4   51.4 0.0273
# … with 71 more rows
```

We can plot the dividend yields as bar heights using geom_col().

```r
barrons_dividends %>%
  group_by(symbol) %>%
  slice(n()) %>%
  mutate(total_div = dividends * 4) %>%
  left_join(barrons_price, by = "symbol") %>%
  select(symbol, total_div, close) %>%
  mutate(yield = total_div/close) %>%
  ggplot(aes(x = reorder(symbol, yield), y = yield, fill = symbol)) +
  geom_col(width = .5) +
  labs(x = "") +
  scale_y_continuous(labels = scales::percent)
```

We could wrap this up with a call to plotly, but let’s totally change direction and add some animation. Animate a chart? That sounds really hard; I guess we’ll need to loop through the dates and add dots as we go. A lot of work, and who has the time…wait…boom…gganimate to the rescue!

The gganimate package makes this so painless it’s almost a shame. We add transition_reveal(date) to the end of the pipeline, and that’s it! Well, not quite: on my machine, I needed to load the gifski and png packages before any of this worked, but then we’re good to go.

```r
library(gganimate)
library(gifski)
library(png)

barrons_dividends %>%
  group_by(symbol) %>%
  ggplot(aes(x = date, y = dividends, color = symbol)) +
  geom_point() +
  scale_y_continuous(labels = scales::dollar) +
  scale_x_date(breaks = scales::pretty_breaks(n = 10)) +
  labs(x = "", y = "div/share", title = "Nasdaq dividends") +
  theme(plot.title = element_text(hjust = 0.5)) +
  transition_reveal(date)
```

Nice!

What about animating our chart that shows the dividend yield as bar heights? Well, we can’t reveal by date here, so we use transition_states(symbol).

```r
barrons_dividends %>%
  group_by(symbol) %>%
  slice(n()) %>%
  mutate(total_div = dividends * 4) %>%
  left_join(select(barrons_price, -date), by = "symbol") %>%
  select(symbol, total_div, close) %>%
  mutate(yield = total_div/close) %>%
  ggplot(aes(x = reorder(symbol, yield), y = yield, fill = symbol)) +
  geom_col(width = .5) +
  labs(x = "") +
  scale_y_continuous(labels = scales::percent) +
  transition_states(symbol)
```

Ah, not quite perfect: notice the chart doesn’t respect the reorder() in our aes(), so the bars appear in alphabetical order, and each column disappears as the next one appears. Let’s use shadow_mark() to keep each previous bar and attempt to reorder the frames with arrange().

```r
barrons_dividends %>%
  group_by(symbol) %>%
  slice(n()) %>%
  mutate(total_div = dividends * 4) %>%
  left_join(select(barrons_price, -date), by = "symbol") %>%
  select(symbol, total_div, close) %>%
  mutate(yield = total_div/close) %>%
  arrange(yield) %>%
  ggplot(aes(x = reorder(symbol, yield), y = yield, fill = symbol)) +
  geom_col(width = .5) +
  labs(x = "") +
  scale_y_continuous(labels = scales::percent) +
  transition_states(symbol, wrap = FALSE) +
  shadow_mark()
```

It is still not respecting the new order and is defaulting to alphabetical. Let’s hard-code the ordering by converting symbol to a factor ordered by yield, which means a foray into the forcats package and fct_reorder(). Note that we need to ungroup() first, since symbol is our grouping column, and then we can call mutate(symbol_fct = forcats::as_factor(symbol) %>% fct_reorder(yield)). I also think it would be a little more dramatic to remove the x-axis labels and have the ticker names appear on the chart itself.

```r
barrons_dividends %>%
  group_by(symbol) %>%
  slice(n()) %>%
  mutate(total_div = dividends * 4) %>%
  left_join(select(barrons_price, -date), by = "symbol") %>%
  select(symbol, total_div, close) %>%
  mutate(yield = total_div/close) %>%
  ungroup() %>%
  mutate(symbol_fct = forcats::as_factor(symbol) %>% fct_reorder(yield)) %>%
  ggplot(aes(x = symbol_fct, y = yield, fill = symbol_fct)) +
  geom_col(width = .5) +
  geom_label(aes(label = symbol, y = yield), nudge_y = .03) +
  labs(x = "") +
  scale_y_continuous(labels = scales::percent) +
  theme(axis.text.x = element_blank(), axis.ticks.x = element_blank()) +
  transition_states(symbol_fct, wrap = FALSE) +
  shadow_mark()
```

Creating and loading those animated GIFs takes some time, about 10-30 seconds each on my RStudio Server Pro instance. Plus, it’s totally fair to quibble that these animations haven’t added any new substance to the charts; they just look cool (R plots can be cool, right?). But if you’ve read this far (thanks!), I might as well subject you to my rant about visualization and communication being just as important, if not more so, than analytical or statistical findings. Most of the consumers of our work are really busy, and we’re lucky if they spend two minutes glancing at whatever findings we put in front of them. We don’t have long to grab their attention and communicate our message. If an animation helps us, it’s worth spending the extra time on it, even though we were technically ‘done’ with this job many lines of code ago.
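If the render time becomes a bother, gganimate also lets us control rendering explicitly rather than relying on the defaults that kick in when the plot is printed. A minimal sketch, reusing the dividend scatter from above (the nframes and fps values here are illustrative, not recommendations):

```r
library(ggplot2)
library(gganimate)
library(gifski)

# Build the animated plot object without rendering it yet
p <- ggplot(barrons_dividends,
            aes(x = date, y = dividends, color = symbol)) +
  geom_point() +
  transition_reveal(date)

# Render once, trading smoothness for speed with fewer frames
anim <- animate(p, nframes = 50, fps = 10, renderer = gifski_renderer())

# Save to disk so we don't pay the rendering cost again
anim_save("nasdaq_dividends.gif", animation = anim)
```

Rendering once and saving the GIF means the expensive step happens a single time, instead of on every preview or knit.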

Alright, so with that:

If you like this sort of code-through, check out my book, Reproducible Finance with R.

Not specific to finance, but several of the stringr and ggplot tricks in this post came from this awesome Business Science University course.

I’m also going to be posting weekly code snippets on LinkedIn; connect with me there if you’re keen for some R finance stuff.

Thanks for reading and see you next time!


### Modern R with the tidyverse is available on Leanpub

Sat, 17/08/2019 - 02:00

[This article was first published on Econometrics and Free Software, and kindly contributed to R-bloggers].

Yesterday I released an ebook on Leanpub, called Modern R with the tidyverse, which you can also read for free online.

In this blog post, I want to give some context.

Modern R with the tidyverse is the second ebook I have released on Leanpub. I released the first one, called Functional programming and unit testing for data munging with R, some time ago, when I had just moved back to my home country of Luxembourg and started a new job as a research assistant at the national statistical institute. Since then, lots of things have happened; I’ve changed jobs and joined PwC Luxembourg as a data scientist, was promoted to manager, finished my PhD, and most importantly of all, became a father.

Through all this, I continued blogging and working on a new ebook, Modern R with the tidyverse. At first this was supposed to be a separate book from the first one, but as I continued writing, I realized that updating and finishing the first one would take a lot of effort, and also that it wouldn’t make much sense to keep the two separate. So I decided to merge the content from the first ebook into the second and update everything in one go.

My very first notes were around 50 pages, if memory serves, and I used them to teach R at the University of Strasbourg while I was employed there as a research and teaching assistant working on my PhD. These notes were the basis of Functional programming and unit testing for data munging with R, and now of Modern R; Chapter 2 of Modern R is almost a straight copy-and-paste from them (with more sections added). They were first written around 2012-2013.

Modern R is the kind of text I would have liked to have had when I first started playing around with R, sometime around 2009-2010. It starts from the beginning, but also goes into quite some detail in the later chapters. For instance, the section on modeling with functional programming is quite advanced, but I believe that readers who have worked through the whole book will be armed with all the knowledge needed to follow it. At least, that is my hope.

Now, the book is still not finished. Two chapters are missing, but it should not take me long to finish them, as I already have drafts lying around. However, some exercises may still be in the wrong places, and more are needed. Generally, more polishing is needed too.

As written in the first paragraph of this section, the book is available on Leanpub. Unlike my previous ebook, this one costs money: a minimum price of $4.99 and a recommended price of $14.99, but as mentioned, you can read it for free online. I hesitated to give it a minimum price of $0, but I figured that since the book can be read for free online, and since Leanpub has a 45-day refund policy, readers are not taking much of a risk by buying it for 5 bucks. I sure hope, however, that readers will find this ebook worth at least 5 bucks!

Now, why should you read it? There are already a lot of books on learning how to use R, so I don’t really want to convince you to read this one. But some people do seem to like my style of writing and my blog posts, so I guess those same people, or similar people, might like the ebook. Also, I think this ebook covers a lot of different topics, enough of them to make you an efficient R user. But as I’ve written in the introduction of Modern R:

So what you can expect from this book is that this book is not the only one you should read.

Anyways, hope you’ll enjoy Modern R, suggestions, criticisms and reviews welcome!

By the way, the cover of the book is a painting by John William Waterhouse, depicting Diogenes of Sinope,