Social media’s rise has changed the election process. Subtle social cues on Facebook increase voter turnout, supporters of politicians buy fake followers, and regular people have the opportunity to weigh in on political debates like never before. In an attempt to quantify a part of our collective political conversation, we parsed Twitter’s Gardenhose API for tweets mentioning the debate hashtags #DEMdebate and #GOPdebate.1 We also analyzed words spoken by the candidates themselves during the debates, using the Washington Post’s annotated transcripts.
Our methodology allows a rank ordering of the words which contribute the most to the differences in sentiment scores between candidates. Below, we use tf-idf statistics on the emotionally charged subset of the text, namely words whose labMT scores fall outside of the neutral range. (Details can be found in our previous blog post on the topic.)
Figures in the left column show words that appear in tweets mentioning the candidate while those in the right column show words spoken by the candidate during the debate. Words are colored by sentiment (lighter teal for happier, darker purple for sadder) and sized by their weighted tf-idf scores, a combination of raw frequency and the relative ‘surprise’ factor across all tweets. We order the candidates in reverse alphabetical order.
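To make the word-cloud weighting concrete, here is a minimal sketch of tf-idf restricted to emotionally charged words. It is an illustration under stated assumptions, not the exact pipeline behind the figures: the happiness scores in `TOY_SCORES` are made up (they are not real labMT values), the neutral band of 4 to 6 on a 1 to 9 scale is assumed, each candidate's text is treated as one document, and the plain `tf * log(N / df)` weighting may differ from the weighted variant used for the figures.

```python
import math
from collections import Counter

# Hypothetical happiness scores on a 1-9 scale; NOT real labMT values.
TOY_SCORES = {"win": 7.5, "love": 8.4, "hate": 2.3, "tax": 2.9,
              "the": 5.0, "debate": 5.2}

# Assumed neutral band; the post does not state the cutoffs it uses.
NEUTRAL_LOW, NEUTRAL_HIGH = 4.0, 6.0


def charged_words(tokens, scores, low=NEUTRAL_LOW, high=NEUTRAL_HIGH):
    """Keep only scored tokens whose happiness lies outside the neutral band."""
    return [t for t in tokens if t in scores and not (low <= scores[t] <= high)]


def tfidf(docs):
    """docs maps a name to a token list; returns name -> {word: tf-idf}."""
    n_docs = len(docs)
    counts = {name: Counter(tokens) for name, tokens in docs.items()}
    # Document frequency: in how many documents does each word appear?
    doc_freq = Counter()
    for word_counts in counts.values():
        doc_freq.update(word_counts.keys())
    result = {}
    for name, word_counts in counts.items():
        total = sum(word_counts.values())
        result[name] = {
            word: (n / total) * math.log(n_docs / doc_freq[word])
            for word, n in word_counts.items()
        }
    return result


# Made-up text standing in for tweets about, or words spoken by, two candidates.
texts = {
    "candidate_a": "the win the love win tax".split(),
    "candidate_b": "the hate tax tax debate".split(),
}
charged = {name: charged_words(toks, TOY_SCORES) for name, toks in texts.items()}
scores = tfidf(charged)
```

In this toy example "tax" appears for both candidates, so its idf is zero and it distinguishes neither, while "win" and "hate" each stand out for one candidate. That is the ‘surprise’ component: a frequent word only ranks highly if it is not frequent everywhere.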
On Twitter, the positive words distinguishing Trump from the other candidates include “tower”, “muscle”, and “gaming” and the negative words include “misunderstood”, “delay”, and “protestors”. The texture of Trump’s own debate language includes the positive words “domain”, “tremendous”, and “totally”, and the negative words “beat”, “stupid”, and “hell”.
Tweets about Sanders mention the positive words “safety”, “contribution”, and “revolutionary”, and the negative words “bailout”, “penalty”, and “arrested”. The words standing out from his debate performances include “college”, “handful”, and “progressive” on the positive side along with “opposition”, “unemployment”, and “lobbying” on the negative.
Tweets mentioning Clinton often include the words “influences”, “contribution”, and “bank”, which are positive when taken out of context, and the negative words “bailout”, “indicted”, and “penalty”. Her debate language is far more positive, including the words “progressive”, “universal”, and “coverage”.
Time series of the tweets containing #DemDebate and #GOPdebate, categorized by candidate mention, give a sense of the turnout on Twitter and serve as an informative comparison of social media attention.
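As a rough sketch of how such a time series could be assembled, the snippet below keeps only tweets carrying a debate hashtag and counts candidate mentions per hour. The tweets, timestamps, hourly binning, and the keyword lists in `CANDIDATE_KEYWORDS` are all hypothetical; this post does not list the matching rules actually used.

```python
import re
from collections import Counter
from datetime import datetime

DEBATE_TAGS = {"#demdebate", "#gopdebate"}

# Hypothetical mention keywords, for illustration only.
CANDIDATE_KEYWORDS = {
    "sanders": {"sanders", "bernie"},
    "clinton": {"clinton", "hillary"},
}


def tokenize(text):
    """Lowercase the text and return its set of words and hashtags."""
    return set(re.findall(r"#?\w+", text.lower()))


def hourly_mentions(tweets):
    """Count (hour, candidate) pairs over tweets that carry a debate hashtag."""
    counts = Counter()
    for timestamp, text in tweets:
        toks = tokenize(text)
        if not toks & DEBATE_TAGS:
            continue  # not part of the debate conversation
        hour = timestamp.replace(minute=0, second=0, microsecond=0)
        for candidate, keywords in CANDIDATE_KEYWORDS.items():
            if toks & keywords:
                counts[(hour, candidate)] += 1
    return counts


# Made-up tweets with made-up timestamps.
tweets = [
    (datetime(2016, 1, 1, 21, 5), "Bernie on college costs #DemDebate"),
    (datetime(2016, 1, 1, 21, 40), "Clinton and Sanders spar #demdebate"),
    (datetime(2016, 1, 1, 22, 10), "Hillary closing statement #DemDebate"),
    (datetime(2016, 1, 1, 22, 15), "Bernie Mac was great"),  # no hashtag, dropped
]
counts = hourly_mentions(tweets)
```

Requiring the hashtag first is what drops the last tweet, which would otherwise count as a mention of Sanders.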
Below, we provide interactive wordshift graphs which reveal how changes in word usage raise or lower the overall positivity of a candidate relative to all other candidates. Words on the right contribute positively to the happiness difference, and words on the left contribute negatively. (More detail on how to read each of these graphs is available here.)
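For readers who want to see the arithmetic behind a word shift, here is a minimal sketch of one common formulation, assumed here for illustration: each word's contribution is its happiness relative to the reference average, multiplied by the change in its relative frequency. The scores and texts are made up (they are not labMT values or real debate text), and the interactive graphs may apply additional normalization.

```python
from collections import Counter

# Hypothetical happiness scores on a 1-9 scale; NOT real labMT values.
TOY_SCORES = {"win": 7.5, "love": 8.4, "hate": 2.3, "tax": 2.9}


def average_happiness(freqs, scores):
    """Frequency-weighted mean happiness of a word-count table."""
    total = sum(freqs.values())
    return sum(scores[w] * n for w, n in freqs.items()) / total


def word_shift(ref_tokens, comp_tokens, scores):
    """Per-word contributions to (comparison average - reference average)."""
    ref = Counter(t for t in ref_tokens if t in scores)
    comp = Counter(t for t in comp_tokens if t in scores)
    ref_total, comp_total = sum(ref.values()), sum(comp.values())
    h_ref = average_happiness(ref, scores)
    contributions = {}
    for word in set(ref) | set(comp):
        # Change in relative frequency from reference to comparison text.
        delta_p = comp[word] / comp_total - ref[word] / ref_total
        # Positive when a happy word is used more or a sad word is used less.
        contributions[word] = (scores[word] - h_ref) * delta_p
    return sorted(contributions.items(), key=lambda kv: abs(kv[1]), reverse=True)


reference = "win hate tax tax".split()   # stands in for all other candidates
comparison = "win love love tax".split()  # stands in for one candidate
shift = word_shift(reference, comparison, TOY_SCORES)
```

The contributions sum to the difference between the two average happiness scores, so the ranked list accounts for the whole gap. In this toy example "tax" contributes positively even though it is a negative word, because the comparison text uses it less often than the reference does.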
The word shifts reveal many patterns which we largely leave for the reader to explore, giving a few examples here:
- For Trump, tweets about him have about the same average positivity as tweets about all candidates combined. The three most influential positive words in tweets mentioning him are “win”, “great”, and “love”, while the three most influential negative ones are “hate”, “hell”, and “politicians”.
- For words used by candidates themselves, Clinton has the highest average positivity score, largely driven by using negative words relatively less than the other candidates (e.g., Clinton says “tax”, “not”, “don’t”, and “problem” less than her counterparts).
- For Cruz, the most influential negative word he has used in debates is “tax”, followed by “fight” and “terrorism”. His average positivity score is the lowest of all candidates, though we stress that the differences are relatively small.
- Sanders, who is above average in positivity, uses relatively more of “kids”, “united” (states), and “believe”.
- Rubio, now to be discussed in the past tense, made heavy use of negative words in debates, with “not” standing out.
- Kasich, like Clinton, uses negative words less than most of the other candidates, and the same can be said for tweets about him. Tweets referencing Kasich are the most positive of all the candidates.
That’s enough for this post in what we intend to be a series of analyses throughout the 2016 Presidential Election season. We will work towards extracting frames around candidates in different kinds of ways, and see how far we can go towards discerning the competing collective stories driving the successes and failures of candidates, and the choices of the electorate.
1. Note that in our previous blog post, we analyzed very specific keywords related to each candidate, including the words ‘Bernie’, ‘Sanders’, ‘Hillary’, ‘Clinton’. As a result, some false positives were introduced (e.g. Bernie Mac or Bill Clinton). Here, we analyze tweets that reference the hashtags #GOPdebate, #DEMdebate, leading to a smaller rate of false positives. For additional details on the methodology, please see the previous blog post.