# The Happiest Character on Seinfeld was …

… Kramer! Ohhh YEAHHHHH!

Whether he was analyzing underwear performance (“My boys need a house!”), advising a friend on marriage (“Is it alright if I use the bathroom now?”), or unabashedly offering his honest opinion (“You just need a nose job!”), Jerry Seinfeld’s awkwardly outspoken neighbor brought the funny. Inspired by writer Larry David’s real-life neighbor Kenny Kramer, and played by actor Michael Richards, Kramer made millions of people happy simply by entering the room.

In one particularly memorable scene, Kramer described his frantic efforts to save the severed pinky toe of a friend.

Crossing the cultural cavern between Science and Seinfeld, we downloaded scripts for each episode from http://seinology.com and used our Hedonometer algorithm to calculate the happiness expressed by each character throughout the 9 seasons. Below you’ll see part of the script for “The Fire”, the episode containing Kramer’s bus story.

The script for an episode of Seinfeld. Happy words are colored shades of red and sad words are colored shades of blue. Words that are neutral (grey) or for which we have no rating (black) are not included in the calculation. The text from Kramer’s amazing story is highlighted.

Gathering all of the lines read by each character, and combining the happiness ratings of their words, we find Kramer’s character to have the happiest average, followed by George, Elaine, and then Jerry. As a frame of reference for the y-axis, our instrument found tweets authored in Vermont in 2013 had an average happiness of roughly 5.95, ranking 4th in the US. Tweets from Alabama scored roughly 5.83, ranking 48th in the US.

Expressed happiness for words spoken by the four main characters in Seinfeld. No surprise at the top.

Among the entire sitcom’s 600,000 total spoken words, Jerry offered roughly 150,000 to Kramer’s 70,000 (George ~ 110,000 and Elaine ~ 80,000). Compared with Jerry, Kramer used the happy words “buddy”, “delicious”, and whoa “mama” more often, and the negative words “not”, “don’t”, and “stupid” less often. Below you’ll find an interactive graphic illustrating the difference.

Looking at the full script of each episode and averaging over each season, the fifth season rated happiest, offering classic episodes like “The Puffy Shirt”, “The Fire”, and team Storylab all-time favorite, “The Marine Biologist”. The sixth season was close behind, including “The Big Salad”, “The Race”, and “The Soup” (we didn’t include the highlights episodes 100 and 101).

The show’s saddest season, the ninth and final (5.95), scores happier than the show’s happiest lead character Kramer (5.90), suggesting that the supporting cast of regulars like US Postal Service worker Newman, George’s father, and Elaine’s off-and-on love interest David Puddy gave a measurable emotional lift.

In homage to the show’s star, who presently enjoys riding in fancy cars with friends and getting coffee, we are quite happy to have spent quality time quantifying something that means absolutely nothing.

Next up on the computational culture binge: comparing Seinfeld with other TV series, TV to movies and books, and more.

# Hedonometer 2.0: Measuring happiness and using word shifts

With our Hedonometer, we’re measuring how a (very capable) individual might feel when reading a large text—a day’s worth of tweets from New York City, the first chapter of Moby Dick, or the music lyrics from all UK pop songs released in 1983.

We’ll describe two fundamental pieces of the Hedonometer in this post:

1. How our simple measure works;
2. How to understand changes in happiness scores through our interactive word shifts.

More depth on everything below can be found in our foundational papers. Off we go:

### Measuring happiness:

We measure the happiness of large-scale texts using what we call a lexical meter. (We’ll be introducing two other kinds of meters in the near future: ground truth meters and bootstrap meters.)

Lexical meters assess a given quality of a text (e.g., expressed happiness) by averaging over the contributions of that same, already measured quality for the text’s individual words (and potentially phrases).

For happiness, we have lists of average happiness scores for around 10,000 commonly used words in 10 languages, each deriving from the personal evaluations of 50 people. Scores were made on a 1 to 9 integer scale with 1 meaning “I feel extremely sad”, 5 meaning “I feel no emotion”, and 9 meaning “I feel extremely happy.” (See Likert scales for more.) The happiness scores of a few example words are $$h_{\textrm{avg}}(\textrm{‘war’}) = 1.80$$, $$h_{\textrm{avg}}(\textrm{‘the’}) = 4.98$$, and $$h_{\textrm{avg}}(\textrm{‘laughter’}) = 8.50$$.

Now, not all words convey emotion, and some words are ambiguous or difficult to rate. We’ve found that we can apply a simple “word lens” to improve and tune the Hedonometer by analyzing texts using only words with sufficiently clear emotional content. A good default lens excludes all words for which $$4 \lt h_{\textrm{avg}} \lt 6$$ (this is one principled way of finding stop words). We also allow users to choose any lens they like, as we explain here. Again, see our foundational papers for more. We’ll write $$L$$ as the set of all words allowed by a particular lens choice.

You can explore all of our English words here and download the entire data set using our API.

So, given a word lens $$L$$, we now score a text $$T$$’s happiness as the frequency-weighted average of its component word scores: $h_{\textrm{avg}}^{(T)} = \frac{ \sum_{w \in L} h_{\textrm{avg}} {(w)} \cdot f_{w} } { \sum_{w \in L} f_{w} } = \sum_{w \in L} h_{\textrm{avg}} {(w)} \cdot p_{w}$ where $$h_{\textrm{avg}} {(w)}$$ is the average perceived happiness of word $$w$$, $$f_{w}$$ is the frequency with which word $$w$$ appears in $$T$$, and $$p_{w} = \frac{f_{w}}{\sum_{w' \in L} f_{w'}}$$ is the normalized frequency of $$w$$.
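To make the formula concrete, here is a minimal sketch of a lexical meter with the default lens. It assumes the text is already tokenized into lowercase words and uses only the three scores quoted above as a toy lexicon; the function names `in_lens` and `h_avg_text` are hypothetical, not part of the Hedonometer API.

```python
from collections import Counter
from typing import Dict, Iterable, Optional

# Toy lexicon: only the three scores quoted in the text. A real run would
# load the full word list instead.
LEXICON: Dict[str, float] = {"war": 1.80, "the": 4.98, "laughter": 8.50}


def in_lens(h: float, lo: float = 4.0, hi: float = 6.0) -> bool:
    """Default lens: keep a word only if its score lies outside the open band (lo, hi)."""
    return not (lo < h < hi)


def h_avg_text(tokens: Iterable[str], lexicon: Dict[str, float] = LEXICON,
               lo: float = 4.0, hi: float = 6.0) -> Optional[float]:
    """Frequency-weighted average happiness over the words of `tokens` that are in L."""
    freqs = Counter(tokens)
    # f_w for words that are rated and pass the lens; unrated words are ignored.
    kept = {w: f for w, f in freqs.items() if w in lexicon and in_lens(lexicon[w], lo, hi)}
    total = sum(kept.values())
    if total == 0:
        return None  # no scorable words, so the average is undefined
    return sum(lexicon[w] * f for w, f in kept.items()) / total
```

For the text "war laughter laughter the the zebra", the lens drops ‘the’ and the unrated ‘zebra’ is ignored, leaving $$(1.80 + 2 \times 8.50)/3 \approx 6.27$$. Passing `lo=hi=5` effectively switches the lens off, so ‘the’ is counted too.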

### Interactive word shifts:

Okay, now that we can measure happiness (or any other quantity for which we have a lexical, ground-truth, or bootstrap meter), we need to understand why scores go up and down. There are many sentiment measures around, but most are opaque in their workings. Because our measure is linear in word frequencies, we can show in great detail why one text is happier than another through what we call word shifts.

Word shifts give us reason to trust our measure, show us how to improve it, and let us explore happiness changes for which we have no immediate intuition. They require some concentration to grasp, but once you understand them, you’ll be very pleased with the rich information they provide. You can think of them as sophisticated word clouds.

We’ll present a little math first, then a few pictures, and tie everything up with a video. We’ll reprise and simplify the explanation of word shifts we gave in our 2011 PLoS ONE paper on Twitter happiness (p. 10).

Let’s say we have two texts which we call ‘reference’, $${T}^\textrm{(ref)}$$, and ‘comparison’, $${T}^\textrm{(comp)}$$. We want to know why the happiness of the comparison text, $$h_{\textrm{avg}}^\textrm{(comp)}$$, is higher or lower than that of the reference text, $$h_{\textrm{avg}}^\textrm{(ref)}$$.

We take the difference of their average happiness scores, rearrange a few things, and arrive at $h^{\textrm{(comp)}}_{\textrm{avg}} - h^{\textrm{(ref)}}_{\textrm{avg}} = \sum_{w \in L} \underbrace{ \left[ h_{\textrm{avg}} {(w)} - h^{\textrm{(ref)}}_{\textrm{avg}} \right] }_{+/-} \underbrace{ \left[ p_w^{\textrm{(comp)}} - p_w^{\textrm{(ref)}} \right] }_{\uparrow/\downarrow}.$

Each word contributes to the word shift according to its happiness relative to the reference text ($${+/-}$$ = happier/sadder), and its change in frequency of usage ($$\uparrow/\downarrow$$ = more/less). We normalize the word shift contributions of each word so that they sum to $$\pm$$100 (the sign depends on whether happiness goes up or down), and order them by absolute value for the default word shift view.
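The word shift decomposition can be sketched in a few lines. This is a minimal illustration assuming tokenized input, the default $$4 < h_{\textrm{avg}} < 6$$ lens, and a toy lexicon; `lens_probs` and `word_shift` are hypothetical names, not the Hedonometer’s actual code. It computes each term $$[h_{\textrm{avg}}(w) - h^{\textrm{(ref)}}_{\textrm{avg}}][p_w^{\textrm{(comp)}} - p_w^{\textrm{(ref)}}]$$, rescales so the terms sum to $$\pm 100$$, and sorts by absolute value.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def lens_probs(tokens: Iterable[str], lexicon: Dict[str, float],
               lo: float = 4.0, hi: float = 6.0) -> Dict[str, float]:
    """p_w for rated words whose score lies outside the open band (lo, hi)."""
    freqs = Counter(t for t in tokens if t in lexicon and not (lo < lexicon[t] < hi))
    total = sum(freqs.values())
    if total == 0:
        raise ValueError("text contains no words that pass the lens")
    return {w: f / total for w, f in freqs.items()}


def word_shift(ref_tokens: Iterable[str], comp_tokens: Iterable[str],
               lexicon: Dict[str, float], lo: float = 4.0,
               hi: float = 6.0) -> List[Tuple[str, float]]:
    """Per-word contributions to h_comp - h_ref, scaled to sum to +/-100, largest first."""
    p_ref = lens_probs(ref_tokens, lexicon, lo, hi)
    p_comp = lens_probs(comp_tokens, lexicon, lo, hi)
    h_ref = sum(lexicon[w] * p for w, p in p_ref.items())
    h_comp = sum(lexicon[w] * p for w, p in p_comp.items())
    delta = h_comp - h_ref
    if delta == 0:
        raise ValueError("texts have identical average happiness; nothing to normalize")
    contrib = {
        # [h_avg(w) - h_ref] * [p_w(comp) - p_w(ref)]: one term of the sum above
        w: (lexicon[w] - h_ref) * (p_comp.get(w, 0.0) - p_ref.get(w, 0.0))
        for w in set(p_ref) | set(p_comp)
    }
    # The terms sum to delta, so dividing by |delta| makes them sum to +100 or -100.
    scale = 100.0 / abs(delta)
    shift = [(w, c * scale) for w, c in contrib.items()]
    shift.sort(key=lambda wc: abs(wc[1]), reverse=True)
    return shift
```

With the lexicon {‘war’: 1.80, ‘the’: 4.98, ‘laughter’: 8.50}, moving from the reference text “war laughter” to the comparison text “war laughter laughter” raises happiness. Both words then contribute equally, +50 each: more ‘laughter’ ($$+\uparrow$$) and relatively less ‘war’ ($$-\downarrow$$).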

Below is a word shift for Robin Williams’s tragic death where the reference text is 10% of all tweets from the previous seven days, and the comparison text is 10% of all tweets from the day he died. We will explain everything below, but immediately we see a preponderance of negativity, with the words ‘RIP’, ‘sad’, ‘suicide’, ‘dead’, and ‘depression’ increasing the perceived sadness of the day, while there is also an increase in ‘thank’ and ‘laughter’. The lengths of the bars at the top of the shift give the combined contributions of the four ways individual words can change the happiness score, and as demonstrated in the video at the end of this post, these bars can be clicked on to focus on only words of their kind.

Word shift showing how happiness dropped on Twitter for the day of Robin Williams’s death compared to the previous seven days. Click for interactive version.

Let’s get to these four types of words. When combined, a word’s relative happiness ($${+/-}$$) and change in usage frequency ($$\uparrow/\downarrow$$) give how the word contributes to the change in happiness between the two texts, which can occur in one of four ways. We’ll move through viewing modes for the word shift above to demonstrate.

• $$+\uparrow$$, strong yellow: Increased usage of relatively positive words. If a word is happier than the average of the reference text $$T^{\textrm{(ref)}}$$ ($$+$$) and is used more often in the comparison text $$T^{\textrm{(comp)}}$$ ($$\uparrow$$), then it makes the comparison text happier.

Positive words being used more frequently on the day of Robin Williams’s death. Click for interactive version.

• $$-\downarrow$$, pale blue: Decreased usage of relatively negative words. If a word is less happy than the average of $$T^{\textrm{(ref)}}$$ ($$-$$) and appears relatively less often in $$T^{\textrm{(comp)}}$$ ($$\downarrow$$), then it also makes the comparison text happier.

Negative words being used less frequently on the day of Robin Williams’s death. Click for interactive version.

• $$+\downarrow$$, pale yellow: Decreased usage of relatively positive words. If a word is happier than the average of $$T^{\textrm{(ref)}}$$ ($$+$$) and appears relatively less often in $$T^{\textrm{(comp)}}$$ ($$\downarrow$$), then it makes the comparison text sadder.

Positive words being used less frequently on the day of Robin Williams’s death. Click for interactive version.

• $$-\uparrow$$, strong blue: Increased usage of relatively negative words. If a word is less happy than the average of $$T^{\textrm{(ref)}}$$ ($$-$$) and appears relatively more often in $$T^{\textrm{(comp)}}$$ ($$\uparrow$$), then it also makes the comparison text sadder.

Negative words being used more frequently on the day of Robin Williams’s death. Click for interactive version.
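The four bars at the top of a word shift can be reproduced by summing contributions by type. The sketch below is a minimal illustration assuming you already have the reference average and both normalized frequency tables. The function `categorize_shift` and the keys `"+up"`, `"-down"`, `"+down"`, `"-up"` are hypothetical labels for the four types listed above.

```python
from typing import Dict


def categorize_shift(lexicon: Dict[str, float], h_ref: float,
                     p_ref: Dict[str, float],
                     p_comp: Dict[str, float]) -> Dict[str, float]:
    """Sum word-shift contributions by (relative happiness, usage change) type."""
    # "+up" (strong yellow) and "-down" (pale blue) push happiness up;
    # "+down" (pale yellow) and "-up" (strong blue) push it down.
    totals = {"+up": 0.0, "-down": 0.0, "+down": 0.0, "-up": 0.0}
    for w in set(p_ref) | set(p_comp):
        dh = lexicon[w] - h_ref                       # relative happiness, +/-
        dp = p_comp.get(w, 0.0) - p_ref.get(w, 0.0)   # usage change, up/down
        if dh == 0 or dp == 0:
            continue  # such a word contributes nothing to the shift
        key = ("+" if dh > 0 else "-") + ("up" if dp > 0 else "down")
        totals[key] += dh * dp
    return totals
```

Because every term is a product of the two brackets in the word shift equation, the four totals always add up to $$h^{\textrm{(comp)}}_{\textrm{avg}} - h^{\textrm{(ref)}}_{\textrm{avg}}$$, and the signs of each total follow directly from the signs of its two factors.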

So that’s the end of our description. The short video below shows how our interactive word shift works for our global Twitter time series, and will help reinforce what we’ve laid out above. Please spend some time exploring the shifts, returning to this explanation as you need to.

# Hedonometer 2.0

Geography of Happiness for the US

Over the summer of 2014, we worked very hard to bring many new pieces to our Hedonometer, and we’re pleased to tell you about what we’ve done and where we’re going next.

Snapshot of Hedonometer 2.0’s happiness time series.

All along, one of the central goals for the Hedonometer has been to provide a new instrument for society’s dashboard, one that measures population-level happiness in real time from any streaming text source. Like flying a plane, where we would never want just one dial with the limits “all good” and “uh-oh”, we need a sophisticated dashboard to quantify how well a population is faring. We want to see unconventional measures like ours added to traditional, easier-to-gauge quantities often concerned with economic activity. Money doesn’t equal happiness. We hope the Hedonometer will enable individuals, journalists, policy makers, corporations, and other research teams in their various pursuits.

Because our Hedonometer works for any large text, we’re able to explore other areas for basic science purposes, particularly the vast realm of sociotechnical systems and the digital humanities. And some of our work will be simply for fun (hopefully yours and ours).

Harry Potter and the Prisoner of Azkaban.

As you’ll see below, we have many plans for the future. So far, we’ve received crucial support from the NSF and the MITRE Corporation, and we’re always looking for more ways to continue to lift our enterprise. If you’re interested in or have suggestions about funding our work, please contact us.

Okay—here’s what we’ve put together. We now have four main interactive views of emotion up and running:

1. A completely rebuilt global Twitter happiness time series in English, updated daily and with powerful new word shifts;
2. An interactive map of happiness for the 50 US States plus DC, also based on Twitter;
3. A ranked list of cities by happiness for the US (Twitter again);
4. An explorable visualization of the emotional plot trajectories of 10,000 books in 10 major languages, including Harry Potter along with classic and obscure works.

We’ll go into more depth about how to use and share these visualizations in our following blog posts. As always, Hedonometer is a team effort, but we have to acknowledge and praise Andy Reagan (@andyreagan) for his incredible efforts in leading the charge to Hedonometer 2.0. Building things is fun.

Some of the many new elements we’re looking to add in the next year are:

1. Other kinds of real-time, population-scale meters based on word usage including sleep, food consumption, exercise, binge-drinking, and boredom. We’ll apply these meters to Twitter but they could in principle be used on any text.
2. Global Twitter happiness time series and maps for all 10 languages: English, Spanish, French, German, Brazilian Portuguese, Indonesian, Korean, Simplified Chinese, Arabic, and Russian;
3. Real-time Twitter happiness at 1-minute time scales;
4. An interactive world map with the ability to explore at scales of country, state, city, and district (or equivalents);
5. Simple ways to embed our interactive visualizations into webpages;
6. A simple interface for uploading and comparing two texts, and for generating shareable visualizations;
7. Phrase-based rather than word-based analysis in English;
8. More stand-alone projects such as interactive visualizations of music lyrics over the last 60 years;
9. Measures based on other major emotions such as fear, disgust, anger, and surprise.

We also have two longer-term, major projects in development:

1. A fast search facility in the Hedonometer for users to find the emotional spectrum around specific words or phrases. This is a computationally burdensome problem. We’ll be able to show the emotional texture of how people are talking about an event or a product.
2. Storybreaker: a real-time extractor of stories and narratives emerging around major events. Our algorithm will include emotion but our goal is to measure frames around issues and ultimately meaningful stories.

One last thing: we’ve moved our blog from onehappybird to compstorylab.org. All old links will still work.

# How does movement influence your daily happiness?

Imagine commuting an hour to work, one way, grinding through miles of traffic to get from your suburban home to a desk job in the big city. Excited yet?

Ok, now imagine that you lead a life of leisure traveling the world. You fly coast-to-coast to see a concert, soak in some culture, and drink fine wine. Does this lifestyle seem more appealing?

Let’s try to quantify the influence of these travel patterns on individual happiness. We do this using geolocated tweets, which we have previously used to reveal the happiness of cities and to quantify patterns of movement.

Each point corresponds to a geo-located tweet from 2011.
(A) USA (B) Washington, D.C. (C) Los Angeles (D) Earth

First, we find the average location of each individual’s tweets. We call this their expected location. Then we draw circles emanating from this spot, like rings on a dart board. Some messages are written close to home, others from very far away.

Then we collect all of the words written at each distance, roughly 500,000 tweets per ring. Averaging the happiness of words found at each distance, remarkably we find that happiness increases logarithmically with distance from expected location. Tweets authored far from home contain a smaller number of negative words.

Tweets are grouped into ten equally populated bins by the distance from their author’s average location, and the average happiness of words written at each distance is plotted. Expressed happiness grows logarithmically with distance from home.
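The binning procedure can be sketched as follows. This is an illustration under stated assumptions, not the analysis code from the paper. Each tweet is assumed to be a hypothetical `(user_id, lat, lon, tokens)` tuple. A user’s expected location is assumed to be the plain mean of their latitudes and longitudes, which ignores wrap-around at the antimeridian. Distances use the haversine great-circle formula with a mean Earth radius of 6371 km. Each bin’s happiness pools all lens-passing words in that bin.

```python
import math
from collections import defaultdict
from typing import Dict, List, Optional, Sequence, Tuple

# Hypothetical record layout: (user_id, latitude, longitude, tokens).
Tweet = Tuple[str, float, float, List[str]]
EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def expected_locations(tweets: Sequence[Tweet]) -> Dict[str, Tuple[float, float]]:
    """Assumption: expected location = mean latitude and mean longitude of a user's tweets."""
    acc: Dict[str, List[float]] = defaultdict(lambda: [0.0, 0.0, 0.0])
    for user, lat, lon, _ in tweets:
        acc[user][0] += lat
        acc[user][1] += lon
        acc[user][2] += 1
    return {u: (s[0] / s[2], s[1] / s[2]) for u, s in acc.items()}


def happiness_by_distance(tweets: Sequence[Tweet], lexicon: Dict[str, float],
                          n_bins: int = 10, lo: float = 4.0,
                          hi: float = 6.0) -> List[Tuple[float, Optional[float]]]:
    """(mean distance in km, pooled word happiness) for each equally populated distance bin."""
    home = expected_locations(tweets)
    rows = sorted(((haversine_km(lat, lon, *home[u]), toks) for u, lat, lon, toks in tweets),
                  key=lambda r: r[0])
    n = len(rows)
    result = []
    for i in range(n_bins):
        chunk = rows[i * n // n_bins:(i + 1) * n // n_bins]
        if not chunk:
            continue
        # Pooling every lens-passing word equals the frequency-weighted average.
        words = [w for _, toks in chunk for w in toks
                 if w in lexicon and not (lo < lexicon[w] < hi)]
        h = sum(lexicon[w] for w in words) / len(words) if words else None
        result.append((sum(d for d, _ in chunk) / len(chunk), h))
    return result
```

Splitting the sorted list by index, rather than by fixed distance cutoffs, is what makes the bins equally populated, mirroring the “roughly 500,000 tweets per ring” description above.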

Home is where the hate is? What? No.

Below we look at the difference between the happiest and saddest distances from home. Words appearing on the right increase the happiness of the 2500km distance relative to the 1km distance. For example, tweets authored far from an individual’s expected location are more likely to contain the positive words ‘beach’, ‘new’, ‘great’, ‘park’, ‘restaurant’, ‘dinner’, ‘resort’, ‘coffee’, ‘lunch’, ‘cafe’, and ‘food’, and less likely to contain the negative words ‘no’, ‘don’t’, ‘not’, ‘hate’, ‘can’t’, ‘damn’, and ‘never’ than tweets posted close to home. Words going against the trend appear on the left, decreasing the happiness of the 2500km distance group relative to the 1km group.

Word shift graph comparing the lowest average word happiness distance group to the words authored farthest from home.

Tweets written close to home are more likely to contain the positive words ‘me’, ‘lol’, ‘love’, ‘like’, ‘haha’, ‘my’, ‘you’, and ‘good’. Moving clockwise, the three insets show that the two text sizes are comparable, the biggest contributor to the happiness difference is the decrease in negative words authored by individuals very far from their average location, and the 50 words listed make up roughly 50% of the total difference between the two bags of words. For you visual learning folks, here is a short video explaining how these word shifts work.

Take-home story: people tweeting far from home talk about food more, and they swear less than people tweeting close to home. These people are probably enjoying awesome vacations, and tweeting about it!

In summary, if you are a fellow with a daily commute that makes you feel a little bit sad, you are not alone! Try swearing less. Or ride your bike.

If you are lucky enough to travel often, then keep smiling…maybe send the rest of us some pictures to cheer us up!

For more details on our analysis, check our paper “Happiness and the Patterns of Life: A Study of Geolocated Tweets” recently published in Nature Scientific Reports.

# Now Published: The Geography of Happiness

Today we’re pleased to announce that our article “The Geography of Happiness: Connecting Twitter sentiment and expression, demographics, and objective characteristics of place” has been officially published by PLoS ONE.  We wanted to tell you about one key piece we’ve added to the paper and an unusual new Twitter account we’ve created.

After our three blog posts (which coincided with the release of the preprint), we received plenty of media attention, as well as some fantastic feedback from readers (thanks!). One very important question kept coming up: “How well does happiness agree with other measures of well-being?”, or more simply: “Why should we believe you?”

Well, we’re glad you asked.  For the final paper, we’ve added a US state-level comparison between our happiness measure and five other kinds of well-being indices:

• the Behavioral Risk Factor Surveillance System (BRFSS), for which people were asked to rate their life satisfaction on a scale of 1 to 4 (the BRFSS was explored in this Science paper on well-being from a few years back);
• Gallup’s health survey-based well-being index;
• the Peace Index, which aggregates various crime data;
• America’s Health Rankings, which aggregates health data; and
• gun violence, specifically the number of shootings per 100,000 people.

In the figure below, we show a series of scatter plots comparing all pairs of well-being metrics (happiness runs along the top row). Each dot represents a US state, and the colors represent strength of correlation or agreement between measures, with blue meaning strong agreement, and red representing no (statistically significant) agreement. (We include the exact Spearman correlation coefficient and p-value in each scatter plot.)

Scatter matrix showing comparison between different well-being metrics for all US states. The top row shows comparisons with happiness. Colors indicate the strength of correlation between pairs of metrics; shades of blue indicate increasingly significant correlation.
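For readers curious about the statistic in each panel, Spearman’s rho is the Pearson correlation of the two variables’ ranks, so it rewards any monotonic agreement between two state rankings, not just a linear one. Below is a minimal pure-Python sketch that gives tied values their average rank. It computes rho only. SciPy’s `scipy.stats.spearmanr` also returns the p-value, but we have not assumed that this was the tool used for the paper.

```python
import math
from typing import List, Sequence


def average_ranks(xs: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    ranks = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with xs[order[i]].
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation; raises ZeroDivisionError if either input is constant."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman rank correlation between two equal-length samples (e.g. one value per state)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two samples of equal length >= 2")
    return pearson(average_ranks(x), average_ranks(y))
```

Any strictly increasing relationship, such as happiness rising steadily with a well-being index across states, yields rho = 1 even when the relationship is far from linear. A strictly decreasing one yields -1.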

Looking at the top row, we can immediately see that happiness agrees with all measures except for the BRFSS. However, the BRFSS itself doesn’t agree with any other measure except for the Gallup well-being index.  The most striking departure was the BRFSS ranking Louisiana as the happiest state whereas our happiness measure placed it last.  There are a number of possible explanations for these disagreements: one is that the BRFSS data was taken between 2005 and 2008, while all other data is from 2011 only; another is that unlike the other measures, happiness is self-reported in the BRFSS. How would you answer if asked how happy you are? Do you expect that your answer is representative of the population you live in at large? There are certainly many different ways to define “happiness”, as a number of different readers have pointed out.

Of course, this is not to criticize the BRFSS (it remains a significant data source, and Oswald & Wu did fine work analyzing it in their Science paper), but merely to suggest that our word happiness score is measuring something different but perhaps complementary to traditional survey-based techniques. There certainly appears to be plenty of value to observing people “in the wild” via social network data, e.g. with the real-time instrument hedonometer.org.

Finally, to celebrate the publication of our article we created a Twitter feed, @geographyofhapp, dedicated to tweeting the happiest and saddest city every day, and we invite you to follow.  We’re hoping that this is the first research article with its own Twitter account, but perhaps not hoping that it represents the future of scientific publishing…

# Now online: the Dow Jones Index of Happiness

Total excitement people: our website hedonometer.org has gone live.  We’re measuring Twitter’s happiness in real time.  Please check it out!

If you’re still here, here’s the blurb from the site’s about page:

Happiness: It’s what most people say they want. So how do we know how happy people are? You can’t improve or understand what you can’t measure. In a blow to happiness, we’re very good at measuring economic indices and this means we tend to focus on them. With hedonometer.org we’ve created an instrument that measures the happiness of large populations in real time.

Our hedonometer is based on people’s online expressions, capitalizing on data-rich social media, and we’re measuring how people present themselves to the outside world. For our first version of hedonometer.org, we’re using Twitter as a source but in principle we can expand to any data source in any language. We’ll also be adding an API soon.

So this is just a start – we invite you to explore the Twitter time series, let us know what you think, and follow the daily updates through the hedonometer Twitter feed.

How does food (or talking about food online) relate to how happy you are? This is part 3 of our series on the Geography of Happiness. Previously we’ve looked at how happiness varies across the United States (as measured from word frequencies in geotagged tweets), and then at how different socioeconomic factors relate to variations in happiness. Now we focus on one particularly important health factor that might influence happiness: obesity.

We looked at how happiness varied with obesity across the 190 largest metropolitan statistical areas in the United States, giving us the following scatter plot:

Each point represents one city; for example the city with both(!) lowest obesity and greatest happiness in this set is Boulder, CO, located at the top left. The red line is a linear trend through the data (a line of best fit). Again, for the mathematically minded onehappybird watchers, we show the Spearman correlation coefficient and its corresponding p-value at the lower left. We do this to convince you that there is, in fact, a statistically significant downward trend in the blob of points in the picture! The big story here is of course that as obesity goes up, happiness goes down.

The natural next question to ask is: are there any words which could be indicators of obesity? What foods are people in obese cities eating, or talking about? To answer this question we correlated word frequencies with obesity, and searched for the most strongly-correlating food-related words. Below are two examples: on the left, “mcdonalds”, and on the right, “cafe”.

As obesity goes up, so does talk (at least on Twitter) about McDonalds, but talk about cafes follows the opposite trend! Does that mean that in order to lose weight we should spend more time sipping lattes in cafes? I wish.

Looking through the list of words, the top 5 food-related words that increased in frequency as obesity went up were:

1. mcdonalds
2. eat
3. wings
4. hungry
5. heartburn

We were surprised by ‘hungry’! On the other hand, the top food-related words which were used more as obesity went down were:

1. cafe
2. sushi
3. brewery
4. restaurant
5. bar

Perhaps unsurprisingly, these are words typically used by the high-socioeconomic group described in our previous post on city happiness, suggesting that better health correlates with higher socioeconomic status. You can find the complete list of how all words correlate with happiness here (page best viewed using Google Chrome). One surprising result was the observation that far more food-related words appeared in the low-obesity group than in the high-obesity group; in other words, food was being talked about more in the less-obese cities!

Summarizing: based on word usage, the Twitter diet consists of: breakfast at your favorite cafe, a delicious sushi lunch, dinner out at a fancy restaurant, with a nightcap at the best local bar or brewery. Thank you Twitter, don’t mind if I do.

All jokes aside, this sort of technique has great potential. Imagine being able to predict whether obesity was going to rise or fall in a city, or estimate changes in other demographics, just by analyzing the words people use online. Perhaps New York City Mayor Michael Bloomberg would find some early indicators of the success or failure of his war on soda!

And that’s all for this series of posts on the geography of happiness. More information on all of the results in this series can be found in our recently submitted arXiv paper. Please take a look at it and the accompanying online appendices, where you can look through all of the data yourself. As a special bonus feature, you can check out this video of me talking about this work at our recent TEDxUVM conference. Thanks for reading!

# What makes a city happy?

Welcome back, onehappybird watchers! Wow, what a crazy week of coverage of our post about how happiness varies by city and state across the United States. Many, many people read, shared, and commented on the post, for which we are grateful. For the detailed explanation of the results, check out the full paper we recently submitted to PLoS ONE.

A number of readers wondered how variations in happiness relate to different underlying social and economic factors. To try to answer this question, we took data from the 2011 census (all helpfully available online on the Census Bureau’s American FactFinder website) and correlated it with our measure of happiness. Surprisingly, happiness generally decreases with the number of tweets per capita in a city (this doesn’t mean that tweeting more will make you less happy, it’s only a correlation):

Next, we grouped covarying demographic characteristics obtained from the census, and looked at how these clusters varied with happiness. For example, it might not surprise you that cities with a larger percentage of married couples also contain a larger percentage of children – this is what we mean by covarying demographics.  And you might or might not be surprised that more marriage is positively correlated with happiness.  There’s plenty of scatter but the connection is there:

Scatter plot of happiness vs. percentage of population married. Each dot represents one city; the reported rho and p-value are for the Spearman correlation.

We used an automated algorithm to sort the census characteristics into eight groups and then compared the happiness of those groups, leading to the following figure:

Each point represents a characteristic from the census (for example, the % married/happiness plot above is now represented by one point in this figure), with the horizontal groupings representing covarying demographic characteristics. A point’s position on the vertical axis shows how that characteristic varies with happiness across all cities. A positive value means that happiness is higher in cities where that characteristic is higher, while a negative value means that happiness is lower in cities where that characteristic is higher. For example, the figure shows that as the percentage of married couples in a city increases, so does the average happiness of that city (no causality is implied).

Only two groupings (the colored dots on the far left and right) showed strong correlation (either positive or negative) with happiness. Looking at which characteristics make up these groups, it appears that the general story here is a socioeconomic one, and one that holds only at the extremes. With our peculiar Twitter-based lens, we see money statistically correlates with happiness, which is not quite as catchy as “money buys happiness” (see the debate over the Easterlin Paradox for more). You can delve into the data yourself – the correlations of all 432 characteristics of cities recorded by the census with happiness can be found here (page best viewed using Google Chrome).

A more interesting question might be how word usage varies with different demographics – to do this we correlated each word with each demographic characteristic across all 373 cities in our dataset, leading to a lot of data to sift through! (And you can too, by following the link in the above paragraph.) As an example, take a look at how the word “cafe” varies with the percentage of population with a college degree:

Each point in the figure represents one city, and broadly the trend is that the more “college-y” the city is, the more people talk about cafes online. (You can decide for yourself whether that’s surprising or not.) The top 10 emotive words whose usage increased with the percentage of the population holding a college degree turned out to be:

1. cafe
2. pub
3. software
4. yoga
5. grill
6. development
7. emails
8. wine
9. art
10. library

And the emotive words which went up as college degrees went down?

1. me
2. love
3. my
4. like
5. hate
6. tired
7. sleep
8. stupid
9. bored
10. you

We saw similar patterns of word use across many socioeconomic characteristics – emotive words and words about interpersonal relationships (‘me’ and ‘you’) at one end of the spectrum, and words about more complex social or intellectual themes at the other. Interestingly, we find more food-related words in this group as well.

Of course, all of this is open to interpretation. As many commenters last week pointed out, Twitter users (indeed, specifically those users who geotag their tweets using a mobile device) are a small, non-representative sample of the global population. Furthermore, our method is undeniably crude, and by breaking texts up into their constituent words ignores the context in which those words were used. That said, many of these results agree with our intuition (for example, many of the cities with low happiness scores also appeared on a list of America’s “most miserable cities” published late last week by Forbes), while some surprise us. There is certainly a lot to be learned by looking at what the data can tell us, and we encourage you to do so by exploring our website of supplementary data. Again, you can read the full technical details in our research paper here.

We’ll pick up on the theme of food again in our next post, which will focus on one important health factor relating to happiness – obesity.

# Where is the happiest city in the USA?

(Update: this work is now published at PLoS ONE)

Is Disneyland really the happiest place on Earth?* How happy is the city you live in? We have already seen how the hedonometer can be used to find the happiest street corner in New York City; now it’s time to let it loose on the entire United States.

We plotted over 10 million geotagged tweets from 2011 (all our results are in this paper, also on the arXiv), coloring each point by the average happiness of nearby words (details on how we calculate happiness can be found in this article published in PLoS ONE):

As well as cities and the roads between them, we can make out many regions of higher and lower happiness, even within individual cities. As an example, check out this tweet-generated map of the city of Chicago:

Tweet-generated map of Chicago.

Notice the striking contrast between the relatively happy Central/North Side of the city, and the sadder South Side. You can also find a few airports in this map, and if you look very closely you might even be able to pick out happy and sad terminals!
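Each point's color on these maps comes from averaging the happiness scores of nearby words. That averaging step can be sketched as a frequency-weighted mean of per-word scores. Following the convention described for the Seinfeld scripts, unrated words and neutral words are left out; here "neutral" is assumed to mean a score within 1 point of the midpoint 5 on a 1 to 9 scale, and the values in `SCORES` are invented for illustration, not taken from our word list. How "nearby" tweets are gathered for each map point is not modeled.

```python
from collections import Counter
from typing import Dict, Iterable, Optional

# Invented scores on a 1-9 scale (NOT values from our actual word list).
SCORES: Dict[str, float] = {
    "cheers": 7.5,
    "wine": 7.0,
    "restaurant": 7.0,
    "the": 5.0,
    "tired": 3.5,
    "hate": 2.5,
}


def average_happiness(words: Iterable[str],
                      scores: Dict[str, float],
                      lens: float = 1.0,
                      neutral: float = 5.0) -> Optional[float]:
    """Frequency-weighted mean score of the rated, non-neutral words.

    Unrated words are skipped, as are words scored strictly within `lens`
    of `neutral` (an assumed lens width). Returns None if no words remain.
    """
    counts = Counter(w.lower() for w in words)
    weighted_sum = 0.0
    n_used = 0
    for word, f in counts.items():
        h = scores.get(word)
        if h is None or abs(h - neutral) < lens:
            continue
        weighted_sum += h * f
        n_used += f
    return weighted_sum / n_used if n_used else None


print(average_happiness("so tired i hate the commute".split(), SCORES))  # 3.0
print(average_happiness("cheers to wine at the restaurant".split(), SCORES))
```

Because the mean is weighted by counts, a word used twice pulls the average twice as hard as a word used once; "the" and "commute" drop out because one is neutral and the other is unrated.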

To quantify this variation in happiness a bit better, let’s look at the average happiness of each state:

Southern states tend to produce sadder words than those in northern New England or out west. Hawaii emerges as the happiest state and Louisiana as the saddest, due to relative differences in the frequencies of happy and sad words used in each state. Here at onehappybird, we characterize such differences using “word shifts”, which are basically word clouds for grown-ups. You can find examples of these, as well as the full list of the average happiness of each state, here (page best viewed using Google Chrome).
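A word shift ranks words by how much each one contributes to the difference in average happiness between a comparison text and a reference text. The sketch below is modeled on the per-word contribution as we understand it from the hedonometer papers, 100 * (h_i - h_ref) * (p_i,comp - p_i,ref) / (h_comp - h_ref), where h_i is a word's score, p its normalized frequency in each text, and h_ref and h_comp the two texts' averages; treat normalization details as assumptions. The scores and counts are hypothetical and are assumed to be already restricted to rated, non-neutral words.

```python
from typing import Dict, List, Tuple


def normalize(counts: Dict[str, int]) -> Dict[str, float]:
    """Turn raw counts into relative frequencies that sum to 1."""
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}


def average(p: Dict[str, float], scores: Dict[str, float]) -> float:
    """Frequency-weighted average score."""
    return sum(p_w * scores[w] for w, p_w in p.items())


def word_shift(ref_counts: Dict[str, int],
               comp_counts: Dict[str, int],
               scores: Dict[str, float]) -> List[Tuple[str, float]]:
    """Per-word contributions (percent of h_comp - h_ref), largest magnitude first.

    Assumes the two averages differ and every counted word has a score.
    """
    p_ref, p_comp = normalize(ref_counts), normalize(comp_counts)
    h_ref, h_comp = average(p_ref, scores), average(p_comp, scores)
    diff = h_comp - h_ref
    shifts = {
        w: 100 * (scores[w] - h_ref) * (p_comp.get(w, 0.0) - p_ref.get(w, 0.0)) / diff
        for w in set(p_ref) | set(p_comp)
    }
    return sorted(shifts.items(), key=lambda kv: abs(kv[1]), reverse=True)


# Hypothetical scores and counts for two made-up regions.
scores = {"love": 8.0, "beach": 7.0, "tired": 3.0, "hate": 2.0}
reference = {"love": 2, "beach": 1, "tired": 2, "hate": 3}
comparison = {"love": 3, "beach": 3, "tired": 1, "hate": 1}

shift = word_shift(reference, comparison, scores)
for word, pct in shift:
    print(f"{word:>6}: {pct:+6.2f}%")
```

A positive contribution means the word makes the comparison text happier than the reference: either a happy word used more often, or a sad word (like "hate" here) used less often. The contributions sum to 100%.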

Zooming in further to the level of cities, we produced a similar list for 373 cities in the lower 48 states (you can find the full list, as well as maps and word shifts for each city, here). With a score of 6.25, we found the happiest city to be Napa, CA, due to a relative abundance of such happy words as “restaurant”, “wine”, and even “cheers”, along with a lack of profanity.

At the other end of the spectrum, we found the saddest city to be Beaumont, TX, with a score of 5.82. In general, cities in the south tended to be less happy than those in the north, with a major contributing factor being the relative abundance of profanity used in those cities.

We can go even further than this, and group cities by similarities in word usage. Each square in the heatmap below represents the similarity (Spearman correlation for you mathematically minded onehappybird watchers) between word distributions for the largest cities in the US. Red squares mean that the corresponding cities use words in a similar fashion, while blue means that those cities tend to use different types of words with respect to each other. The colors in the tree diagram at the top signify clusters of cities exhibiting similar word usage (below a certain threshold).

As we might expect for two cities that are geographically nearby, New Orleans and Baton Rouge are clumped together at the bottom right of the figure. On the other hand, New York and Seattle get clumped together as well, suggesting that similarities in language depend on more than just geographical proximity.
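The heatmap and its clusters can be approximated as follows: compute Spearman's rho between every pair of cities' word-frequency vectors (assumed here to be aligned over a shared vocabulary), then group cities linked by a chain of pairs whose rho clears a threshold. That grouping rule is single-linkage clustering cut at a fixed level; which linkage and threshold the actual tree diagram used isn't stated, so both are assumptions, as are the four hypothetical cities and their numbers.

```python
from itertools import combinations
from typing import Dict, List, Sequence


def ranks(values: Sequence[float]) -> List[float]:
    """Return 1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            out[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return out


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho as the Pearson correlation of ranks (inputs must not be constant)."""
    rx, ry = ranks(x), ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / (vx * vy) ** 0.5


def clusters(freqs: Dict[str, Sequence[float]], min_rho: float) -> List[List[str]]:
    """Single-linkage clusters: cities end up together if a chain of pairs,
    each with Spearman rho >= min_rho, connects them."""
    parent = {city: city for city in freqs}

    def find(city: str) -> str:
        # Follow parent links to the cluster root, halving the path as we go.
        while parent[city] != city:
            parent[city] = parent[parent[city]]
            city = parent[city]
        return city

    for a, b in combinations(freqs, 2):
        if spearman(freqs[a], freqs[b]) >= min_rho:
            parent[find(a)] = find(b)

    groups: Dict[str, List[str]] = {}
    for city in freqs:
        groups.setdefault(find(city), []).append(city)
    return sorted(groups.values())


# Four hypothetical cities; each list holds frequencies of the same five words.
cities = {
    "City A": [5, 4, 3, 2, 1],
    "City B": [4, 5, 3, 2, 1],
    "City C": [1, 2, 3, 5, 4],
    "City D": [1, 2, 3, 4, 5],
}

for a, b in combinations(cities, 2):
    print(f"{a} vs {b}: rho = {spearman(cities[a], cities[b]):+.2f}")
print(clusters(cities, min_rho=0.5))  # [['City A', 'City B'], ['City C', 'City D']]
```

Raising the threshold splits clusters apart; with `min_rho=0.95` every hypothetical city here stands alone.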

You can find more information about happiness and cities, as well as details on the methods used to produce these results, in our arXiv research article. In our next post, we’ll look at how these results are related to various underlying socioeconomic characteristics of cities. What makes a city happy or sad? Can we use Big Data to predict future changes in the demographics, health, or happiness of a city? How does happiness relate to the food you eat?

*By the way, to answer the question at the start of this post: According to this analysis Disneyland is not the happiest place on Earth; it isn’t even the happiest place in Southern California! See if you can find it in this tweet-generated map of LA! Or find your city here.

# The Daily Unraveling of the Human Mind

Each morning we find ourselves in the wide-flung arms of drowsy possibilities. Cradled by the warm embrace of our beds, we begin our day, rebooted and rejuvenated. Having not eaten for a full eight hours, we can enjoy a guilt-free breakfast, setting a blissful tone for the day.

Hourly frequency of meal references on Twitter.
See Figure 1 on page 3 of our paper for details.

Last night’s dreams of victory and triumph bolster our delusions of adequacy, preparing us to surmount any of life’s challenges. But the moment we step outside, reality commences its slow and insidious descent. Its weight, compressing our spine and crushing our dreams, alters the course of the day completely. The soul-crushing litany of work, interacting with people, and generally living our lives takes its toll. As our sanity unravels, apathy takes root. The profane becomes our standard of expression. In the throes of despair, we swear just to feel something. We swear increasingly as we realize the inevitability of repeating this all again tomorrow.

F***, that’s a terrifying thought.

This ephemeral pattern is reflected in our tweets, our spontaneous bursts of being. Below, we see that our happiness peaks during the early hours of the day and degrades as the hours progress (yellow circles). The proportion of profanity in our tweets, however, follows the reverse cycle: profanity appears in a smaller percentage of tweets at the start of each day and increases gradually as time wears on.

Daily Unraveling
See Figure 10 on page 15 of our paper for details.
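The profanity curve above amounts to bucketing tweets by hour and counting the share that contain at least one profane token. In this sketch the hour is assumed to be already converted to local time, tokenization is a naive whitespace split, and the mild words in `PROFANE` are placeholder stand-ins for a real profanity list; all the tweets are invented.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

# Placeholder stand-ins for a real profanity list.
PROFANE: Set[str] = {"darn", "heck"}


def hourly_profanity_share(tweets: Iterable[Tuple[int, str]],
                           profane: Set[str]) -> Dict[int, float]:
    """Fraction of tweets in each local hour (0-23) containing a profane token."""
    totals: Dict[int, int] = defaultdict(int)
    hits: Dict[int, int] = defaultdict(int)
    for hour, text in tweets:
        totals[hour] += 1
        # Naive whitespace tokenization; punctuation is not stripped.
        if any(token in profane for token in text.lower().split()):
            hits[hour] += 1
    return {hour: hits[hour] / totals[hour] for hour in sorted(totals)}


# Invented (hour, text) pairs.
tweets = [
    (7, "good morning coffee time"),
    (7, "darn alarm"),
    (7, "breakfast is great"),
    (7, "sunny walk to work"),
    (22, "heck of a day"),
    (22, "darn traffic again"),
    (22, "so tired"),
    (22, "darn it"),
]

print(hourly_profanity_share(tweets, PROFANE))  # {7: 0.25, 22: 0.75}
```

Swapping the profanity test for a per-tweet happiness average would give an analogue of the other curve in the same hourly bins.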

Remarkably, the relative frequencies of these five expressions of frustration (a******, b****, s***, f***, m***********) are quite similar.

Well done, humans.

To avoid experiencing the daily unraveling, we recommend eating organic, local dark chocolate all day long.