The Lexicocalorimeter: Gauging public health through caloric input and output on social media [arXiv] [PLoS ONE]

Sharon E. Alajajian, Jake R. Williams, Andrew J. Reagan, Stephen C. Alajajian, Morgan R. Frank, Lewis Mitchell, Jacob Lahne, Christopher M. Danforth, and Peter Sheridan Dodds

Explore the paper interactively at panometer.org

Abstract

We propose and develop a Lexicocalorimeter, an online, interactive instrument for measuring the "caloric content" of social media and other large-scale texts. We do so by constructing extensive yet improvable tables of food- and activity-related phrases, and assigning them estimates of caloric intake and expenditure, respectively. We show that for Twitter, our naive measures of "caloric output", "caloric input", and the ratio of these measures, "caloric balance", all correlate strongly with health and well-being demographics for the contiguous United States. Our caloric balance measure, which outperforms both its constituent quantities, provides a real-time signal reflecting a population's health and has the potential to be used alongside traditional survey data in the development of public policy and collective self-awareness. Because our Lexicocalorimeter is a linear superposition of principled phrase scores, we also show that we can move beyond correlations to explore what people talk about in collective detail, and assist in the understanding and explanation of how population-scale conditions vary, a capacity unavailable to black-box methods.
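
As a rough illustration of the idea of a linear superposition of phrase scores, the following minimal Python sketch computes frequency-weighted averages of phrase-level caloric estimates and takes their ratio. It is not the authors' implementation. The phrase tables, their numeric values, and the function names are hypothetical placeholders, and the direction of the ratio (output over input) is an assumption, since the abstract does not state it.

```python
from collections import Counter

# Hypothetical lookup tables: phrase -> caloric estimate.
# The phrases and numbers are illustrative placeholders, not values from the paper.
FOOD_CALORIES = {"pizza": 300.0, "apple": 50.0}
ACTIVITY_CALORIES = {"running": 10.0, "watching tv": 2.0}


def phrase_contributions(counts, table):
    """Return each matched phrase's additive share of the weighted average."""
    matched = {p: n for p, n in counts.items() if p in table}
    total = sum(matched.values())
    if total == 0:
        raise ValueError("no phrases from the table appear in the counts")
    # Each term is (relative frequency) * (caloric estimate); terms sum to the average.
    return {p: table[p] * n / total for p, n in matched.items()}


def weighted_average(counts, table):
    """Frequency-weighted average caloric estimate over matched phrases."""
    return sum(phrase_contributions(counts, table).values())


def caloric_balance(counts):
    """Return (caloric input, caloric output, output / input).

    Assumption: the balance is taken as output divided by input.
    """
    c_in = weighted_average(counts, FOOD_CALORIES)
    c_out = weighted_average(counts, ACTIVITY_CALORIES)
    return c_in, c_out, c_out / c_in


# Hypothetical phrase counts for one region's text.
counts = Counter({"pizza": 2, "apple": 2, "running": 1, "watching tv": 3})
c_in, c_out, balance = caloric_balance(counts)
print(c_in, c_out, balance)
print(phrase_contributions(counts, FOOD_CALORIES))
```

Because each average is a plain sum of per-phrase terms, the output of `phrase_contributions` shows which phrases drive a region's score. This is the kind of inspection the abstract contrasts with black-box methods.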