Storylab

Complete data set:

The following directory has one file per word. Each file is a zipped 2,628 x 10,222 matrix giving, for each day in our dataset, the frequency with which each labMT word appears in tweets containing the given word: full matrices.

The following directory has one file per word, each a vector of length 2,628 giving the ambient happiness for that word on each day in our dataset: happiness timeseries.

The words corresponding to the columns of the full matrices are ordered by happiness (see http://hedonometer.org/words.html), and the word list can be downloaded here: labMT-english.csv.
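
To illustrate how one row of a full matrix relates to a happiness value, here is a minimal sketch in Python. It assumes that ambient happiness is the frequency-weighted average of the labMT happiness scores, which this page does not state explicitly; the published time series may also exclude some words (for example, near-neutral ones), so results need not match exactly. The function name `ambient_happiness` and the toy numbers are hypothetical.

```python
def ambient_happiness(counts, scores):
    """Frequency-weighted mean happiness for one day (one matrix row).

    counts: word frequencies for one day, one entry per labMT word
    scores: labMT happiness scores in the same column order
    Assumption: a plain weighted average with no words excluded.
    """
    if len(counts) != len(scores):
        raise ValueError("counts and scores must have the same length")
    total = sum(counts)
    if total == 0:
        # No labMT words were observed on this day.
        return None
    return sum(c * s for c, s in zip(counts, scores)) / total


# Toy example with three made-up words, not real labMT values.
toy_counts = [3, 0, 1]
toy_scores = [8.0, 5.0, 4.0]
print(ambient_happiness(toy_counts, toy_scores))
```

In practice `counts` would be one of the 2,628 rows of a word's matrix and `scores` the happiness column of labMT-english.csv, read in the same order as the matrix columns.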

The 2,628 days for each file span September 13, 2008 to November 23, 2015.
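
A row index can be converted to a calendar date with a few lines of Python. This sketch assumes that rows are in chronological order, one per consecutive day, with index 0 corresponding to September 13, 2008; the page gives only the date range, so check this against the data. The helper names are hypothetical.

```python
from datetime import date, timedelta

START = date(2008, 9, 13)  # first day of the dataset
N_DAYS = 2628              # number of rows in each file


def index_to_date(i):
    """Return the calendar date for 0-based row index i."""
    if not 0 <= i < N_DAYS:
        raise IndexError("row index out of range")
    return START + timedelta(days=i)


def date_to_index(d):
    """Return the 0-based row index for calendar date d."""
    i = (d - START).days
    if not 0 <= i < N_DAYS:
        raise ValueError("date outside the dataset range")
    return i


print(index_to_date(N_DAYS - 1))  # last day covered by the dataset
```

Under these assumptions the last index, 2,627, falls on November 23, 2015, which agrees with the stated span.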

If you find this data useful, please cite our paper:

@article{cody2016public,
  title={Public opinion polling with Twitter},
  author={Emily Cody and Andy Reagan and Peter Sheridan Dodds and Christopher M. Danforth},
  arXiv={https://arxiv.org/abs/1608.02024},
  onlineappendices={http://compstorylab.org/share/papers/cody2016a/index.html},
  year={2016},
  citations={0}
}

Public opinion polling with Twitter [arxiv]

Emily M. Cody, Andrew J. Reagan, Peter Sheridan Dodds, Christopher M. Danforth
