Hedonometer 2.0: Measuring happiness and using word shifts

With our Hedonometer, we’re measuring how a (very capable) individual might feel when reading a large text—a day’s worth of tweets from New York City, the first chapter of Moby Dick, or the music lyrics from all UK pop songs released in 1983.

We’ll describe two fundamental pieces of the Hedonometer in this post:

  1. How our simple measure works;
  2. How to understand changes in happiness scores through our interactive word shifts.

More depth on everything below can be found in our foundational papers. Off we go:

Measuring happiness:

We measure the happiness of large-scale texts using what we call a lexical meter. (We’ll be introducing two other kinds of meters in the near future: ground truth meters and bootstrap meters.)

Lexical meters assess a given quality of a text (e.g., expressed happiness) by averaging over the contributions of that same, already measured quality for the text’s individual words (and potentially phrases).

For happiness, we have lists of average happiness scores for around 10,000 commonly used words in 10 languages, each deriving from the personal evaluations of 50 people. Scores were made on a 1 to 9 integer scale with 1 meaning “I feel extremely sad”, 5 meaning “I feel no emotion”, and 9 meaning “I feel extremely happy.” (See Likert scales for more.) The happiness scores of a few example words are \( h_{\textrm{avg}}(\textrm{‘war’}) = 1.80\), \( h_{\textrm{avg}}(\textrm{‘the’}) = 4.98\), and \( h_{\textrm{avg}}(\textrm{‘laughter’}) = 8.50\).

Now, not all words convey emotion and some words are ambiguous or difficult to rate. We’ve found that we can apply a simple “word lens” to improve and tune the Hedonometer by analysing texts using words with enough clear emotional content. We’ve found a good default lens excludes all words for which \( 4 \lt h_{\textrm{avg}} \lt 6 \) (this is one principled way of finding stop words). We also allow users to choose any lens they like, as we explain here. Again, see our foundational papers for more. We’ll write \( L \) as the set of all words allowed by a particular lens choice.

You can explore all of our English words here and download the entire data set using our API.

So, given a word lens \(L\), we now score a text \(T\)’s happiness as the average of its component word scores: \[ h_{\textrm{avg}}^{(T)} = \frac{ \sum_{w \in L} h_{\textrm{avg}} {(w)} \cdot f_{w} } { \sum_{w \in L} f_{w} } = \sum_{w \in L} h_{\textrm{avg}} {(w)} \cdot p_{w} \] where \(h_{\textrm{avg}} {(w)}\) is the average perceived happiness of word \(w\), \(f_{w}\) is the frequency with which word \(w\) appears in \(T\), and \( p_{w} = \frac{f_{w}}{\sum_{w' \in L} f_{w'}} \) is the normalized version of \(f_{w}\).
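As a rough illustration of the formula above, here is a minimal Python sketch. The function name `text_happiness`, the way the lens is passed as an excluded band, and the toy token list are our own assumptions for this example and are not taken from the Hedonometer code base; the three word scores are the ones quoted earlier in this post.

```python
from collections import Counter


def text_happiness(words, word_scores, lens=(4.0, 6.0)):
    """Average happiness of a list of words.

    Words with no score, or with a score strictly inside the excluded
    band `lens`, are ignored (this mirrors the default lens 4 < h < 6).
    Returns None if no word survives the lens.
    """
    low, high = lens
    counts = Counter(words)          # f_w for every word in the text
    weighted_sum = 0.0               # sum of h(w) * f_w over the lens
    total = 0                        # sum of f_w over the lens
    for word, freq in counts.items():
        score = word_scores.get(word)
        if score is None or low < score < high:
            continue
        weighted_sum += score * freq
        total += freq
    if total == 0:
        return None
    return weighted_sum / total


# Scores quoted in the text; the token list is a made-up toy example.
scores = {"war": 1.80, "the": 4.98, "laughter": 8.50}
tokens = ["the", "laughter", "the", "war", "laughter", "laughter"]
h = text_happiness(tokens, scores)
```

In this toy case ‘the’ falls inside the excluded band, so only three uses of ‘laughter’ and one of ‘war’ contribute to the average.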

Interactive Word shifts:

Okay, now that we can measure happiness (or any other quantity for which we have a lexical, ground-truth, or bootstrap meter), we need to understand why scores go up and down. There are many sentiment measures around but for the most part they are opaque in their workings. The linearity of our measure allows us to show in great detail why one text is happier than another through what we call word shifts.

Word shifts give us reason to trust our measure, to discover how to improve it, and the ability to explore happiness changes for which we do not have immediate intuition. They require some concentration to grasp but once you understand them, you’ll be very pleased with the rich information they provide. You can think of them as sophisticated word clouds.

We’ll present a little math first, then a few pictures, and tie everything up with a video. We’ll reprise and simplify the explanation for word shifts we gave in our 2009 PLoS ONE paper on Twitter happiness (p. 10).

Let’s say we have two texts which we call ‘reference’, \({T}^\textrm{(ref)}\), and ‘comparison’, \({T}^\textrm{(comp)}\). We want to know why the happiness of the comparison text, \(h_{\textrm{avg}}^\textrm{(comp)}\), is higher or lower than that of the reference text, \(h_{\textrm{avg}}^\textrm{(ref)}\).

We take the difference of their average happiness scores, rearrange a few things, and arrive at \[ h^{\textrm{(comp)}}_{\textrm{avg}} - h^{\textrm{(ref)}}_{\textrm{avg}} = \sum_{w \in L} \underbrace{ \left[ h_{\textrm{avg}} {(w)} - h^{\textrm{(ref)}}_{\textrm{avg}} \right] }_{+/-} \underbrace{ \left[ p_w^{\textrm{(comp)}} - p_w^{\textrm{(ref)}} \right] }_{\uparrow/\downarrow}. \]
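For readers who want the “rearrange a few things” step spelled out, here is one way to get there. It assumes, as the formula above does, that both texts are scored with the same lens \(L\), so that the normalized frequencies of each text sum to 1. Writing out the two averages and subtracting gives \[ h^{\textrm{(comp)}}_{\textrm{avg}} - h^{\textrm{(ref)}}_{\textrm{avg}} = \sum_{w \in L} h_{\textrm{avg}} {(w)} \left[ p_w^{\textrm{(comp)}} - p_w^{\textrm{(ref)}} \right]. \] Because \(h^{\textrm{(ref)}}_{\textrm{avg}}\) does not depend on \(w\), we also have \[ \sum_{w \in L} h^{\textrm{(ref)}}_{\textrm{avg}} \left[ p_w^{\textrm{(comp)}} - p_w^{\textrm{(ref)}} \right] = h^{\textrm{(ref)}}_{\textrm{avg}} \left[ 1 - 1 \right] = 0, \] and subtracting this zero term from the right-hand side of the first line yields the expression with \( \left[ h_{\textrm{avg}} {(w)} - h^{\textrm{(ref)}}_{\textrm{avg}} \right] \) shown above.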

Each word contributes to the word shift according to its happiness relative to the reference text (\({+/-}\) = happier/sadder), and its change in frequency of usage (\(\uparrow/\downarrow\) = more/less). We normalize the word shift contributions of each word so that they sum to \(\pm\)100 (the sign depends on whether happiness goes up or down), and order them by absolute value for the default word shift view.
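The per-word contributions and their normalization can be sketched in a few lines of Python. This is only an illustration under our own assumptions: the function name `word_shift`, the dictionary-based inputs, and the toy frequencies are hypothetical, and the two word scores are the ones quoted earlier in this post.

```python
def word_shift(h, p_ref, p_comp):
    """Return (word, percent contribution) pairs, largest magnitude first.

    h      : dict mapping word -> average happiness score
    p_ref  : dict mapping word -> normalized frequency in the reference text
    p_comp : dict mapping word -> normalized frequency in the comparison text
    """
    words = set(p_ref) | set(p_comp)
    h_ref = sum(h[w] * p for w, p in p_ref.items())
    # Raw contribution: [h(w) - h_ref] * [p_comp(w) - p_ref(w)]
    raw = {
        w: (h[w] - h_ref) * (p_comp.get(w, 0.0) - p_ref.get(w, 0.0))
        for w in words
    }
    total = sum(raw.values())        # equals h_comp - h_ref
    if total == 0:
        return []
    scale = 100.0 / abs(total)       # contributions then sum to +100 or -100
    shifts = [(w, raw[w] * scale) for w in words]
    shifts.sort(key=lambda item: abs(item[1]), reverse=True)
    return shifts


# Toy frequencies (made up); scores are those quoted in the text.
h = {"laughter": 8.50, "war": 1.80}
p_ref = {"laughter": 0.5, "war": 0.5}
p_comp = {"laughter": 0.25, "war": 0.75}
shifts = word_shift(h, p_ref, p_comp)
```

Here the comparison text is sadder than the reference, so the normalized contributions sum to -100.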

Below is a word shift for Robin Williams’s tragic death where the reference text is 10% of all tweets from the previous seven days, and the comparison text is 10% of all tweets from the day he died. We will explain everything in the following section, but immediately we see a preponderance of negativity with the words ‘RIP’, ‘sad’, ‘suicide’, ‘dead’, and ‘depression’ increasing the perceived sadness of the day, while there is also an increase in ‘thank’ and ‘laughter’. The lengths of the bars at the top of the shift give the combined contributions of the four ways individual words can change the happiness score, and, as demonstrated in the video at the end of this post, these bars can be clicked on to focus on only words of their kind.

Word shift showing how happiness dropped on Twitter for the day of Robin Williams’s death compared to the previous seven days. Click for interactive version.

Let’s get to these four types of words. When combined, a word’s relative happiness (\({+/-}\)) and change in frequency usage (\(\uparrow/\downarrow\)) give how the word contributes to the change in happiness between the two texts, which can occur in one of four ways. We’ll move through viewing modes for the word shift above to demonstrate.

  • \(+\)\(\uparrow\), strong yellow: Increased usage of relatively positive words—If a word is happier than text \(T_{\rm ref}\) (\(+\)) and is being used more in text \(T_{\rm comp}\) (\(\uparrow\)), then it makes the comparison text happier.
    Positive words being used more frequently on the day of Robin Williams’s death. Click for interactive version.

  • \(-\)\(\downarrow\), pale blue: Decreased usage of relatively negative words—If a word is less happy than text \(T_{\rm ref}\) (\(-\)) and appears relatively less often in text \(T_{\rm comp}\) (\(\downarrow\)), then it also makes the comparison text happier.
    Negative words being used less frequently on the day of Robin Williams’s death. Click for interactive version.

  • \(+\)\(\downarrow\), pale yellow: Decreased usage of relatively positive words—If a word is happier than text \(T_{\rm ref}\) (\(+\)) and appears relatively less often in text \(T_{\rm comp}\) (\(\downarrow\)), then it makes the comparison text sadder.
    Positive words being used less frequently on the day of Robin Williams’s death. Click for interactive version.

  • \(-\)\(\uparrow\), strong blue: Increased usage of relatively negative words—If a word is less happy than text \(T_{\rm ref}\) (\(-\)) and appears relatively more often in text \(T_{\rm comp}\) (\(\uparrow\)), then it also makes the comparison text sadder.
    Negative words being used more frequently on the day of Robin Williams’s death. Click for interactive version.
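The four cases in the list above boil down to two sign checks per word. The following Python sketch shows one way to label a word; the function names and the choice to return `None` for words that sit exactly at the reference happiness or whose frequency does not change are our own assumptions.

```python
def shift_type(h_word, h_ref, p_ref, p_comp):
    """Label a word as '+↑', '-↓', '+↓', or '-↑'.

    Returns None when the word contributes nothing, i.e. when it is exactly
    as happy as the reference text or its frequency does not change.
    """
    if h_word == h_ref or p_comp == p_ref:
        return None
    relative = "+" if h_word > h_ref else "-"
    change = "↑" if p_comp > p_ref else "↓"
    return relative + change


def makes_comparison_happier(label):
    """'+↑' and '-↓' raise the comparison text's score; '+↓' and '-↑' lower it."""
    return label in ("+↑", "-↓")
```

For example, a word scored 8.50 that becomes more frequent relative to a reference text averaging 6.0 is labelled `'+↑'` and pushes the comparison text's happiness up.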

So that’s the end of our description. The short video below shows how our interactive word shift works for our global Twitter time series, and will help reinforce what we’ve laid out above. Please spend some time exploring the shifts, returning to this explanation as you need to.

Exploring Hedonometer 2.0’s global Twitter time series

In this post, we’ll run through the basic features of our new interactive happiness time series for Twitter. We’ll first use words and pictures to orient your experience, and then finish with a video explanation.

Our method for measuring happiness, which we describe in a companion post and more fully in our foundational papers, relies on perceived happiness scores for individual words. The scale we use is 1 to 9, with 1 meaning extremely negative, 5 neutral, and 9 extremely positive. Our general experience with our measure is that scores for texts range between 5 and 7.

When you first visit hedonometer.org, you’ll see the daily happiness time series for the most recent 18 months of Twitter. For this overall visualization, we’ve analysed around 10% of all tweets going back to the end of 2008 using our English language Hedonometer (we’ll be adding time series for more languages soon). Here’s an example 18 month view starting around January, 2012:

Hedonometer time series with the slider active. Click for interactive version.

The main elements of the time series are:

  1. Circles whose heights represent daily average happiness scores. These circles are color-coded for day of the week, and the menu in the top right corner allows the colors to be toggled on and off.
  2. A slider at the bottom to allow movement back and forth along the time series, as well as the ability to zoom in or out by grabbing its edges:
    Time series selector.

  3. A curated set of “Major event” dates where our happiness measure jumps or drops unusually. For clarity, more events will appear as you zoom in, and fewer will show as you zoom out.
  4. Pop-up “Word shifts”—which we describe in detail in the companion post mentioned above—that show how word usage changes contribute to a specific day’s happiness difference relative to the previous seven days. Rolling over any date’s circle will show a small pop up word shift, and clicking on that word shift will bring up an interactive, richer version. For major events, a link to the relevant Wikipedia entry will be included in the full word shift. Here’s an example of the view you’ll see for the death of Robin Williams:
    Word shift for Robin Williams’s death. Click for interactive version.

  5. The very important ability to reliably share or bookmark any view you generate with the details stored in the full URL. You can either use the share buttons we provide or simply copy and paste the URL. Here’s an example that will link directly to a view of 2012: http://hedonometer.org/index.html?from=2012-01-01&to=2012-12-31.
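As a small illustration of the shareable URLs described in the last item, the Python sketch below rebuilds the example link from two dates. The `from` and `to` parameter names and the `YYYY-MM-DD` format are read off the single example URL above; the helper name `view_url` is hypothetical, and the site may accept other parameters that are not shown here.

```python
from datetime import date
from urllib.parse import urlencode

# Base address taken from the example link in the text.
BASE_URL = "http://hedonometer.org/index.html"


def view_url(start, end):
    """Build a link to the time series view between two dates (inclusive)."""
    query = urlencode({"from": start.isoformat(), "to": end.isoformat()})
    return f"{BASE_URL}?{query}"


url_2012 = view_url(date(2012, 1, 1), date(2012, 12, 31))
```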

So, that’s the basic story for our new time series, and now here’s a short video explanation which will help as well:

Hedonometer 2.0

Geography of Happiness for the US

Over the summer of 2014, we have worked very hard to bring many new pieces to our Hedonometer, and we’re pleased to tell you about what we’ve done, and where we’re going next.

Snapshot of Hedonometer 2.0’s happiness time series.

All along, one of the central goals for the Hedonometer has been to provide a new instrument for society’s dashboard, one that measures population-level happiness in real time from any streaming text source. Just as a pilot would never want a single dial with the limits “all good” and “uh-oh”, we need a sophisticated dashboard to quantify how well a population is faring. We want to see unconventional measures like ours added to traditional, easier-to-gauge quantities often concerned with economic activity. Money doesn’t equal happiness. We hope the Hedonometer will enable individuals, journalists, policy makers, corporations, and other research teams in their various pursuits.

Because our Hedonometer works for any large text, we’re able to explore other areas for basic science purposes, particularly the vast realm of sociotechnical systems and the digital humanities. And some of our work will be simply for fun (hopefully yours and ours).

Harry Potter and the Prisoner of Azkaban.

As you’ll see below, we have many plans for the future. So far, we’ve received crucial support from the NSF and the MITRE Corporation, and we’re always looking for more ways to continue to lift our enterprise. If you’re interested in or have suggestions about funding our work, please contact us.

Okay—here’s what we’ve put together. We now have four main interactive views of emotion up and running:

  1. A completely rebuilt global Twitter happiness time series in English, updated daily and with powerful new word shifts;
  2. An interactive map of happiness for the 50 US States plus DC, also based on Twitter;
  3. A ranked list of cities by happiness for the US (Twitter again);
  4. and an explorable visualization of the emotional plot trajectories of 10,000 books in 10 major languages including Harry Potter along with classic and obscure works.

We’ll go into more depth about how to use and share these visualizations in our following blog posts. As always, Hedonometer stands on a team effort but we have to acknowledge and praise Andy Reagan (@andyreagan) for his incredible efforts in leading the charge to Hedonometer 2.0. Building things is fun.

Some of the many new elements we’re looking to add in the next year are:

  1. Other kinds of real-time, population-scale meters based on word usage including sleep, food consumption, exercise, binge-drinking, and boredom. We’ll apply these meters to Twitter but they could in principle be used on any text.
  2. Global Twitter happiness time series and maps for all 10 languages: English, Spanish, French, German, Brazilian Portuguese, Indonesian, Korean, Simplified Chinese, Arabic, and Russian;
  3. Real-time Twitter happiness at 1 minute time scales.
  4. An interactive world map with the ability to explore at scales of country, state, city, and district (or equivalents).
  5. Simple ways to embed our interactive visualizations into webpages;
  6. A simple interface for uploading and comparing two texts, and for generating shareable visualizations;
  7. Phrase-based rather than word-based analysis in English;
  8. More stand-alone projects such as interactive visualizations of music lyrics over the last 60 years;
  9. Measures based on other major emotions such as fear, disgust, anger, and surprise.

We also have two longer-term, major projects in development:

  1. A fast search facility in the Hedonometer for users to find the emotional spectrum around specific words or phrases. This is a computationally burdensome problem. We’ll be able to show the emotional texture of how people are talking about an event or a product.
  2. Storybreaker: a real-time extractor of stories and narratives emerging around major events. Our algorithm will include emotion but our goal is to measure frames around issues and ultimately meaningful stories.

One last thing: we’ve moved our blog from onehappybird to compstorylab.org. All old links will still work.

The Ferguson protests: Quantifying state-level sentiment on Twitter

Reporting on the August 9, 2014 shooting death of Michael Brown, David Carr concluded his August 17 piece for the New York Times by observing that “nothing much good was happening in Ferguson until it became a hashtag”. Following the story’s rise and spread on Twitter, the protests in Missouri swiftly captured the news cycle in the U.S., and brought into focus the consequences of militarization and racial inequality in police forces throughout the country.

Using our new and improved instruments at hedonometer.org, we can quantify and visualize the texture of sentiment surrounding the protests on Twitter. Over the course of a week starting Wednesday, August 13, our all-of-Twitter time series dipped several times, and we saw a large increase in negative words related to events in Ferguson, as shown in the word shift below. Because we currently break tweets into individual words, “tear” is separated from “tear gas” but is still a negative term that rises to the top (we will move to phrases in a major future update of the Hedonometer).

Click on the graphic below for an interactive version of the word shift or here.

Click on the graphic for an interactive version.

In comparing the week of August 13 to 20 with the last 90 days, we see Missouri’s happiness ranking dropped from 18th to 32nd. The geography of happiness for the U.S. is remarkably stable, and this is the first time we’ve observed such a large, rapid change for a state ranking.

Click on the graphic for an interactive version.

 

Over the 7 days of August 15 to 21, the positive words “lol”, “hahaha”, and “laughing” have been used relatively less frequently in Missouri than in the entire U.S., and the negative words “racist”, “violence”, and “protest” have been used relatively more frequently. Click here or on the image above to explore our sentiment map.

We’re in the process of building interactive sentiment maps for other languages and at scales of cities, countries, and regions. Our hope is that through the Hedonometer, anyone will be able to make and share geographically localized observations of crowd-sourced public opinion, and generate a defensible quantification of the collective conversation on Twitter and elsewhere.

 

Moose on the Loose!

Note: a version of this post was given by the author for Invocation at the UVM College of Engineering & Mathematical Sciences graduation ceremony, Flynn Theatre, May 18, 2014.

A few weeks ago—on one of those beautiful spring mornings that makes the long winter seem like it happened elsewhere—something quite remarkable took place here at the University of Vermont.

At the time, I was sitting outside on a small wooden picnic bench near my office on Trinity campus. The sun was shining, the birds were chirping their layered periodic rhythms, and the Green Mountains were finally living up to their name after months of radiating white.

Vermont Mountains Panorama at Sunrise Mt Ascutney, Vermont, New England, USA

I’d love to say that I was meditating within our glorious landscape. I’d really like to say that I was deeply appreciating the sacred gifts offered by Mother Nature. The truth is that I was totally geeking out.

I was reading storylabber Dilan Kiley’s undergraduate Honors thesis. It was awesome! He quantified the spread of information on Twitter in response to sudden, unanticipated events. These system-scale shocks can briefly synchronize our society’s chaotic collective attention. And while reading about them in Dilan’s thesis, I experienced one myself.

I heard a strange noise nearby, and looked up to find a moose staring at me from 10 feet away. Well played, Dilan.

The moose was out of breath, having been chased up the hill from the lake by an excited mob of followers. In a moment that seemed to last several seconds, we looked at each other. I was in awe. The moose looked unsure, confused, and lost.

Our time together was quickly interrupted by animal control officers. They were sprinting after the moose, trying to steer it safely into the woods. A small flock of undergraduates followed, looking at each other in disbelief at what they were seeing.

Professor Chris Danforth caught this photo today, which he shared on Twitter: "@ChrisDanforth: Sitting outside Farrell reading at the picnic table, then this happened. #mooseontheloose." #uvm #instauvm

Not surprisingly, a seven-foot-tall, thousand-pound wild animal jogging through campus caused quite a stir! Pictures of the moose received thousands of likes on social media, and #mooseontheloose started trending, at least here in Vermont. After a few hours in the spotlight, Vermont Fish & Wildlife happily reported that the moose found its way back into the woods north of campus.

I tell you this story today because I think the moose’s adventure offers some lessons for us as you wander off campus to find your way home.

In the past few weeks, I’ve spoken to many of you, asking about your plans for the future. This is a time of great transition in your life. Most of you don’t have a grand plan, or even a muddy pond to call home.

Like the moose, you too may feel a bit lost. You too will have many people taking your picture, and making a big fuss over you. 

Over the coming months, you too will have well-intentioned loved ones trying to steer you to a safe path in life, advising you where to go, and what to do. You too will have to find your way through a noisy, often confusing set of uncertain options.

As people, we imitate role models whom we admire, using their past choices to inform our own. As scientists, we use mathematical models to make predictions, which are helpful, because unfortunately, observations of the future are not available at this time [1].

Seemingly inconsequential decisions that you make may change your life in the biggest ways. But which decisions are most important? On which decisions will your life be sensitively dependent?

I reached out to the hero of our story via his parody Twitter account @BTVMoose. Really. Talk about geeking out. I asked for words of wisdom for the class of 2014. Overcoming the great modern difficulty of finding a wireless internet signal in the dense forest, he was able to tweet this advice:

To paraphrase this bit of spiritual guidance: you may need to wander around a bit, before you find your way.

[1] Original quote from Knutson and Tuleya, Journal of Climate, 2005.

 

How our storytelling nature means we deeply misunderstand the mechanics of fame (and much else…)

Should the Mona Lisa be our most famous painting?

Was Harry Potter destined to (repeatedly) sweep the globe?

What would happen to everyone and everything famous if we ran the experiment that is our world over again?

Find out why fame is truly unpredictable, how it lives and dies entirely in our social stories, and why “… there is no such thing as fate, only the story of fate” in a current Nautilus Magazine piece by the Computational Story Lab’s co-team leader Peter Dodds:

“Homo Narrativus and the Trouble with Fame: We think that fame is deserved. We are wrong.”  

Nautilus is a new, design-driven publication on science published both online (free) and in print (unfree).  The Nautilus team is creating a beautiful showcase for scientific knowledge, and we encourage you to explore everything they have on offer.

How does movement influence your daily happiness?

Imagine commuting an hour to work, one way, grinding through miles of traffic to get from your suburban home to a desk job in the big city. Excited yet?

Ok, now imagine that you lead a life of leisure traveling the world. You fly coast-to-coast to see a concert, soak in some culture, and drink fine wine. Does this lifestyle seem more appealing?

Let’s try to quantify the influence of these travel patterns on individual happiness. We do this using geolocated tweets, which we have previously used to reveal the happiness of cities, and to quantify patterns of movement.

Each point corresponds to a geo-located tweet from 2011.
(A) USA (B) Washington, D.C. (C) Los Angeles (D) Earth

First, we find the average location of each individual’s tweets. We call this their expected location. Then we draw circles emanating from this spot, like rings on a dart board. Some messages are written close to home, others from very far away.

Then we collect all of the words written at each distance, roughly 500,000 tweets per ring. Averaging the happiness of words found at each distance, remarkably we find that happiness increases logarithmically with distance from expected location. Tweets authored far from home contain a smaller number of negative words.

Tweets are grouped into ten equally populated bins by the distance from their author’s average location, and the average happiness of words written at each distance is plotted. Expressed happiness grows logarithmically with distance from home.
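The binning procedure described above can be sketched as follows. This is a simplified illustration, not the analysis code from the paper: the function names are hypothetical, the expected location is taken as a plain average of latitudes and longitudes (which misbehaves near the poles and the 180° meridian), and distances use the haversine formula with an assumed Earth radius of 6371 km.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius


def expected_location(points):
    """Naive average of (lat, lon) pairs, in degrees."""
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    dlat, dlon = lat2 - lat1, lon2 - lon1
    s = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(s))


def equal_population_bins(distances, n_bins=10):
    """Split distances into n_bins groups holding (nearly) equal numbers of tweets."""
    ordered = sorted(distances)
    size = len(ordered)
    return [ordered[i * size // n_bins:(i + 1) * size // n_bins] for i in range(n_bins)]


# Toy example: one person's tweets, mostly near home plus one trip.
tweets = [(44.48, -73.21), (44.47, -73.20), (44.49, -73.22), (40.71, -74.01)]
home = expected_location(tweets)
distances = [haversine_km(home, t) for t in tweets]
```

Each bin's words would then be scored with the same happiness average used throughout, giving one happiness value per distance group.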

Home is where the hate is? What? No.

Below we look at the difference between the happiest and saddest distances from home. Words appearing on the right increase the happiness of the 2500km distance relative to the 1km distance. For example, tweets authored far from an individual’s expected location are more likely to contain the positive words ‘beach’, ‘new’, ‘great’, ‘park’, ‘restaurant’, ‘dinner’, ‘resort’, ‘coffee’, ‘lunch’, ‘cafe’, and ‘food’, and less likely to contain the negative words ‘no’, ‘don’t’, ‘not’, ‘hate’, ‘can’t’, ‘damn’, and ‘never’ than tweets posted close to home. Words going against the trend appear on the left, decreasing the happiness of the 2500km distance group relative to the 1km group.

Word shift graph comparing the lowest average word happiness distance group to the words authored farthest from home.

Tweets written close to home are more likely to contain the positive words ‘me’, ‘lol’, ‘love’, ‘like’, ‘haha’, ‘my’, ‘you’, and ‘good’. Moving clockwise, the three insets show that the two text sizes are comparable, the biggest contributor to the happiness difference is the decrease in negative words authored by individuals very far from their average location, and the 50 words listed make up roughly 50% of the total difference between the two bags of words. For you visual learning folks, here is a short video explaining how these word shifts work.

Take home story: people tweeting far from home talk about food more, and they swear less than people tweeting close to home. These people are probably enjoying awesome vacations, and tweeting about it!

In summary, if you are a fellow with a daily commute that makes you feel a little bit sad, you are not alone! Try swearing less. Or ride your bike.

If you are lucky enough to travel often, then keep smiling…maybe send the rest of us some pictures to cheer us up!

For more details on our analysis, check our paper “Happiness and the Patterns of Life: A Study of Geolocated Tweets” recently published in Nature Scientific Reports.

Now Published: The Geography of Happiness

Today we’re pleased to announce that our article “The Geography of Happiness: Connecting Twitter sentiment and expression, demographics, and objective characteristics of place” has been officially published by PLoS ONE.  We wanted to tell you about one key piece we’ve added to the paper and an unusual new Twitter account we’ve created.

After our three blog posts (which coincided with the release of the preprint), we received plenty of media attention, as well as some fantastic feedback from readers (thanks!). One very important question kept coming up: “How well does happiness agree with other measures of well-being?”, or more simply: “Why should we believe you?”

Well, we’re glad you asked.  For the final paper, we’ve added a US state-level comparison between our happiness measure and five other kinds of well-being indices:

  • the Behavioral Risk Factor Surveillance System (BRFSS), for which people were asked to rate their life satisfaction on a scale of 1 to 4 (the BRFSS was explored in this Science paper on well-being from a few years back);
  • Gallup’s health survey-based well-being index;
  • the Peace Index, which aggregates various crime data;
  • the America’s Health Ranking, which aggregates health data; and
  • gun violence, specifically the number of shootings per 100,000 people.

In the figure below, we show a series of scatter plots comparing all pairs of well-being metrics (happiness runs along the top row). Each dot represents a US state, and the colors represent strength of correlation or agreement between measures, with blue meaning strong agreement, and red representing no (statistically significant) agreement. (We include the exact Spearman correlation coefficient and p-value in each scatter plot.)

Scatter matrix showing comparison between different well-being metrics for all US states. The top row shows comparisons with happiness. Colors indicate the strength of correlation between pairs of metrics; shades of blue indicate increasingly significant correlation.
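For readers unfamiliar with the Spearman correlation coefficient reported in each panel, here is a minimal, standard-library Python sketch of how it can be computed: rank both variables (ties share their average rank), then take the ordinary Pearson correlation of the ranks. The function names are our own, the sketch does not compute p-values, and it is not the code used for the paper.

```python
import math


def ranks(values):
    """1-based ranks; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation of two equal-length sequences with nonzero variance."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))
```

Because only ranks matter, two measures that order the states identically score 1 regardless of their units or scales.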

Looking at the top row, we can immediately see that happiness agrees with all measures except for the BRFSS. However, the BRFSS itself doesn’t agree with any other measure except for the Gallup well-being index.  The most striking departure was the BRFSS ranking Louisiana as the happiest state whereas our happiness measure placed it last.  There are a number of possible explanations for these disagreements: one is that the BRFSS data was taken between 2005 and 2008, while all other data is from 2011 only; another is that unlike the other measures, happiness is self-reported in the BRFSS. How would you answer if asked how happy you are? Do you expect that your answer is representative of the population you live in at large? There are certainly many different ways to define “happiness”, as a number of different readers have pointed out.

Of course, this is not to criticize the BRFSS (it remains a significant data source, and Oswald & Wu did fine work analyzing it in their Science paper), but merely to suggest that our word happiness score is measuring something different but perhaps complementary to traditional survey-based techniques. There certainly appears to be plenty of value to observing people “in the wild” via social network data, e.g. with the real-time instrument hedonometer.org.

Finally, to celebrate the publication of our article we created a Twitter feed, @geographyofhapp, dedicated to tweeting the happiest and saddest city every day, and we invite you to follow.  We’re hoping that this is the first research article with its own Twitter account, but perhaps not hoping that it represents the future of scientific publishing…

Now online: the Dow Jones Index of Happiness

Total excitement people: our website hedonometer.org has gone live.  We’re measuring Twitter’s happiness in real time.  Please check it out!

If you’re still here, here’s the blurb from the site’s about page:

Happiness: It’s what most people say they want. So how do we know how happy people are? You can’t improve or understand what you can’t measure. In a blow to happiness, we’re very good at measuring economic indices and this means we tend to focus on them. With hedonometer.org we’ve created an instrument that measures the happiness of large populations in real time.

Our hedonometer is based on people’s online expressions, capitalizing on data-rich social media, and we’re measuring how people present themselves to the outside world. For our first version of hedonometer.org, we’re using Twitter as a source but in principle we can expand to any data source in any language. We’ll also be adding an API soon.

So this is just a start – we invite you to explore the Twitter time series, let us know what you think, and follow the daily updates through the hedonometer twitter feed: .

A data-driven study of the patterns of life for 180,000 people

Here at the Computational Story Lab, some of us commute by foot, some by car, and a few deliver themselves by bike, even in the middle of our cold, snowful Vermont winter.  Occasionally, we transport ourselves over very long distances in magic flying tubes with wings to attend conferences, to see family, or for travel.  So what do our movement patterns look like over time?  Are there distinct kinds of movement patterns as we look across populations, or are they variations on a single theme?

Inspired by an analysis of mobile phone data by Marta Gonzalez at MIT, James Bagrow at Northwestern, and colleagues, we used 37 million geotagged tweets to characterize the movement patterns of 180,000 people during their 2011 travels. We used the standard deviation in their position, a.k.a. radius of gyration, as a reflection of their movement. As an example, below we plot a dot for each geotagged tweet we found posted in the San Francisco Bay area, colored by the author’s radius of gyration.

The Bay Area is shown with a dot for each tweet, colored by the radius of gyration of its author. The color scale is logarithmic, so we can compare people with very different habits.

You can see from the picture that there are many people with a radius near 100km tweeting from downtown San Francisco. This pattern could reflect a concentration of tourists visiting the area, or individuals who live downtown and travel for work or pleasure. Images for New York City, Chicago, and Los Angeles are also quite beautiful.
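For readers who want to try this on their own data, here is a minimal sketch of the radius of gyration as described above: the root-mean-square distance of a person's tweet locations from their average location. It assumes the positions have already been projected to flat x/y coordinates in kilometres; the function name and the sample data are ours for illustration, not taken from the paper.

```python
import math


def radius_of_gyration(points):
    """Root-mean-square distance of points from their centroid.

    `points` is a list of (x_km, y_km) pairs, assumed to be already
    projected onto a flat plane (an assumption of this sketch).
    """
    n = len(points)
    if n == 0:
        raise ValueError("need at least one point")
    # Average location of the individual.
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n
    # Mean squared distance from the average location.
    mean_sq = sum((x - cx) ** 2 + (y - cy) ** 2 for x, y in points) / n
    return math.sqrt(mean_sq)


# Hypothetical commuter who tweets equally often from two spots 10 km apart.
commuter = [(0.0, 0.0), (10.0, 0.0), (0.0, 0.0), (10.0, 0.0)]
r_g = radius_of_gyration(commuter)
print(r_g)  # 5.0
```

Someone who tweets from a single spot gets a radius of zero, while a few long trips quickly inflate it, which is why a logarithmic color scale helps when comparing people.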

In the image below, we rotated every individual’s movement pattern so that the origin represents their average location, and the horizontal line heading to the left represents their principal axis (most likely the path from home to work). We also stretched or shrunk the vertical and horizontal axes for each individual, so that everyone could fit on the same picture. Basically, we have a heatmap of collective movement, with each individual in their own intrinsic reference frame.  The immediate good news for these kinds of data-driven studies is that we see a very similar form to those found for mobile phone data sets.  Apart from being a different social signal, geotagged tweets also have much better spatial resolution than mobile phone calls, which are referenced by the nearest cellphone tower.
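The change of reference frame can be sketched in a few lines. This is one plausible reading of the procedure, not necessarily the exact one used for the figure: we centre each person's points on their average location, rotate so the direction of greatest variance lies along the horizontal axis, and divide each axis by its standard deviation. The rescaling choice and the function name are assumptions, and the sketch does not decide which end of the axis points toward work.

```python
import math


def intrinsic_frame(points):
    """Centre, rotate onto the axis of greatest variance, and rescale.

    `points` is a list of (x, y) pairs in flat coordinates. Dividing each
    axis by its standard deviation is an assumption of this sketch.
    """
    n = len(points)
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n
    centred = [(x - cx, y - cy) for x, y in points]

    # Entries of the 2x2 covariance matrix.
    sxx = sum(x * x for x, _ in centred) / n
    syy = sum(y * y for _, y in centred) / n
    sxy = sum(x * y for x, y in centred) / n

    # Angle of the axis of greatest variance.
    theta = 0.5 * math.atan2(2.0 * sxy, sxx - syy)
    c, s = math.cos(theta), math.sin(theta)

    # Rotate by -theta so that axis becomes horizontal.
    rotated = [(c * x + s * y, -s * x + c * y) for x, y in centred]

    # Stretch or shrink each axis to unit standard deviation.
    sd_u = math.sqrt(sum(u * u for u, _ in rotated) / n)
    sd_v = math.sqrt(sum(v * v for _, v in rotated) / n)
    return [
        (u / sd_u if sd_u > 0 else 0.0, v / sd_v if sd_v > 0 else 0.0)
        for u, v in rotated
    ]


# Hypothetical individual who mostly moves along a diagonal street.
sample = [(0.0, 0.0), (4.0, 4.0), (1.0, 3.0), (3.0, 1.0)]
frame = intrinsic_frame(sample)
```

Once everyone is in their own frame, the points from all individuals can be pooled into a single two-dimensional histogram to produce a heatmap like the one shown.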

Movement pattern exhibited by 180,000 individuals in 2011, as inferred from 37 million geolocated tweets. Colormap shows the probability density in log10. Note that despite the resemblance, this image is neither a nested rainbow horseshoe crab, nor the Mandelbrot set.

Several features of the map reveal interesting patterns. First, the teardrop shape of the contours demonstrates that people travel predominantly along their principal axis, with deviations becoming shorter and less frequent as they move farther away. Second, the appearance of two spatially distinct yellow regions suggests that people spend the vast majority of their time near two locations. We refer to these locations as the work and home locales, where the home locale is centered on the dark red region right of the origin, and the work locale is centered just left of the origin.

Finally, we see a clear horizontal asymmetry indicating the increasingly isotropic variation in movement surrounding the home locale, as compared to the work locale. We suspect this to be a reflection of the tendency to be more familiar with the surroundings of one’s home, and to explore these surroundings in a more social context. The up-down symmetry demonstrates the remarkable consistency of the movement patterns revealed by the data.

We see a clear separation between the most likely and second most likely position.

Looking just at the messages posted along the work-home corridor, the distribution is skewed left, with movement from home in a heading opposite work seen to be highly unlikely.

The isotropy ratio shows the change in the probability density's shape as a function of radius.

Above we see that individuals who move around a lot have a much larger variation in their positions along their principal axis, exhibiting a less circular pattern of life than people who stay close to home. Remarkably, the isotropy ratio decays logarithmically with radius.

Finally, we grabbed messages from the most prolific tweople, those 300 champions who had posted more than 10,000 geotagged messages in 2011. We received 10% of these messages through our gardenhose feed from Twitter. Below, we plot the times during the week that they post from their most frequently visited location. These folks most likely have the geotag switch on for all messages, and exhibit a very regular routine.

A robust diurnal cycle is observed in the hourly time of day at which statuses are updated, with those from the mode location (black curve) occurring more often than other locations (red curve) in the morning and evening.

Peaks in activity are seen in the morning (8-10am) and evening (10pm-midnight), separated by lulls in the afternoon (2-4pm) and overnight (2-4am) hours.  As we and our friend Captain Obvious would expect, people tend to tweet more from their home locale than any other locale (red curve) in the morning and evening.
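The split between the most frequently visited location and everywhere else can be sketched as follows. We assume each post has already been reduced to an hour of day and a discrete location label (how positions are binned into labels is left open here), and the function name and sample data are hypothetical.

```python
from collections import Counter


def hourly_counts_by_mode(posts):
    """Count posts per hour at the mode location and elsewhere.

    `posts` is a list of (hour_of_day, location_label) pairs with
    hour_of_day in 0..23. Location labels are an assumption of this
    sketch; any hashable value works.
    """
    # The mode location is the label that appears most often.
    mode_location, _ = Counter(loc for _, loc in posts).most_common(1)[0]

    at_mode = [0] * 24
    elsewhere = [0] * 24
    for hour, loc in posts:
        if loc == mode_location:
            at_mode[hour] += 1
        else:
            elsewhere[hour] += 1
    return mode_location, at_mode, elsewhere


# Hypothetical posts from one prolific individual.
posts = [(8, "home"), (9, "home"), (14, "office"), (23, "home"), (15, "cafe")]
mode_location, at_mode, elsewhere = hourly_counts_by_mode(posts)
```

Plotting `at_mode` and `elsewhere` against the hour gives two curves of the kind shown in the figure above.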

Bottom line: Despite our seemingly different patterns of life, we are remarkably similar in the way we move around. Our walks are a far cry from random.

Next up: We’ll examine the emotional content of tweets as a function of distance.  Is home where the heart is?

For more details on these results, see our paper Happiness and the Patterns of Life: A Study of Geolocated Tweets.