As the Ebola epidemic expanded overseas and reached U.S. shores in late September, concern – and misinformation – flew across social media networks.

Twitter lit up with references to “virus” and “death” and descriptions of “scary” and “terrifying.” Some less-worried tweets mentioned “cure” and even “jokes.”

These words became building blocks for Eric Clark’s latest “social contagion” study. A mathematician and doctoral candidate working in the University of Vermont Department of Surgery, Clark is finding ways to use data-processing power and social media to take the public’s temperature on health issues and current events. He works under the mentorship of Assistant Professor of Surgery Christopher Jones, Ph.D., director of the Global Health Economics Unit in the Center for Clinical and Translational Science.

Clark’s primary tool for this “sentiment analysis” is the language-saturated world of Twitter. UVM receives a feed of about 10 percent of all tweets from the network, or about 50 million a day. Clark searches that content for “emotionally charged” words related to the topic at hand and collects these “bags of words” to scrutinize.
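As a rough illustration of the “bag of words” idea, the sketch below keeps only tweets that mention a topic keyword and tallies the emotionally charged words they contain. The word list, the sample tweets and the function name `bag_of_words` are hypothetical; the article does not describe Clark’s actual pipeline, which runs on a stream of about 50 million tweets a day rather than a short Python list.

```python
import re
from collections import Counter

# Hypothetical word list built from examples mentioned in the article;
# not the lab's actual vocabulary.
EMOTION_WORDS = {"scary", "terrifying", "death", "cure", "jokes"}

def bag_of_words(tweets, topic="ebola"):
    """Count emotionally charged words in tweets that mention `topic`."""
    counts = Counter()
    for text in tweets:
        # Lowercase and split into simple word tokens.
        tokens = re.findall(r"[a-z']+", text.lower())
        if topic in tokens:
            counts.update(t for t in tokens if t in EMOTION_WORDS)
    return counts

sample = [
    "Ebola is scary, so scary",
    "Hoping for an Ebola cure soon",
    "Nice weather today",
]
print(bag_of_words(sample))  # Counter({'scary': 2, 'cure': 1})
```

The third tweet never mentions the topic, so it contributes nothing; tracking how these counts shift over time or by location is the kind of comparison the study describes.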

From the time he entered his master’s program in 2012, Clark has worked with Jones and Professors of Mathematics Chris Danforth, Ph.D., and Peter Dodds, Ph.D., in UVM’s Computational Story Lab. He helped the lab’s team develop a hedonometer – a scoring system that rates words such as “love” and “laughter” to measure levels of happiness.
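A hedonometer-style score can be sketched as a frequency-weighted average: each scored word carries a happiness rating, and a text’s score is the mean of those ratings weighted by how often each word appears. The ratings below and the function name `average_happiness` are made up for illustration; the lab’s published word list was rated by volunteers on a 1-to-9 scale, and this sketch is an assumption about the general approach, not the lab’s code.

```python
from collections import Counter

# Hypothetical happiness ratings on a 1 (sad) to 9 (happy) scale,
# invented for this example; not values from the lab's word list.
HAPPINESS = {"love": 8.5, "laughter": 8.5, "death": 1.5, "virus": 2.5, "today": 6.0}

def average_happiness(words):
    """Frequency-weighted mean happiness of the scored words in `words`."""
    freqs = Counter(w for w in words if w in HAPPINESS)
    total = sum(freqs.values())
    if total == 0:
        return None  # no scored words, so no estimate
    return sum(HAPPINESS[w] * f for w, f in freqs.items()) / total

text = "love and laughter today despite the virus".split()
print(average_happiness(text))  # 6.375
```

Unscored words such as “and” or “despite” are simply skipped. Weighting by frequency means a word that appears often pulls the average more than a rare one; the lab’s published version also reportedly drops near-neutral words before averaging, which this sketch omits.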

The Ebola outbreak emerged as an ideal subject for Clark because of both its menace and its prominence in conversation. He’s studying changes in keyword frequency in places where the virus has encroached – such as Dallas, Texas, where the first U.S. case was diagnosed – and is searching for falsehoods about how the disease is transmitted. He also plans to make comparisons across languages.

“It’s looking at the social contagion of fear, ignorance and maybe the idea that people can be educated through social media,” he says.

Clark has become UVM’s go-to guy for such analysis. He has assisted with projects exploring opinions about electronic cigarettes, in vitro fertilization, federal health care reform and cancer.

“Everyone who thinks they can substantiate their clinical research with Twitter data, they come to us,” says Clark, who enjoys the interdisciplinary nature of his research.

With each project, he must dig out the anomalies that can skew the data. With e-cigarettes, for example, he found that 80 to 90 percent of tweets were advertising rather than consumer discussion.

“You can have rampant robots, like spamming; other robots pretend to be people,” says Clark, who noticed some that tweeted, “I quit smoking with e-cigarettes. You should try it!” In the case of a current event such as Ebola, many tweets are news reports that he has to filter out.

With fellow Ph.D. candidate Jake Williams, Clark co-developed a bot-detection algorithm that identifies the robots, and he is applying it as he compiles the Ebola data. Publication will come down the road, he says, and he’s considering building the Ebola project into his doctoral dissertation.

“With big data, you have to be careful that there aren’t other underlying features that you haven’t accounted for,” he says. “We want to tell a story through the data, and we want to tell the right story.”

PUBLISHED

12-03-2014
Carolyn Shapiro