University of Vermont

Diving into Big Data

Faculty go deep in Honors College seminar

Donna Rizzo shows fellow members of the faculty how neural networks and multi-objective optimization algorithms can protect drinking water. It’s Big Data — and the UVM Honors College — at work. (Photo: Joshua Brown)

UVM professor of engineering Donna Rizzo knows how messy problems of groundwater pollution can be brought into a tidy framework. She’s developed mathematical techniques to cleverly probe and connect bucket loads of data collected from, say, scattered well holes dug around a leaking landfill. Her result: detailed 3-D pictures of contamination invisible from the surface — and reliable estimates of where and how fast underground pollutants will flow. This kind of mapping helps to reconcile the competing interests of polluters, residents, and regulators — and it’s one example of the growing power of Big Data.

Of course, big is always a relative term. And one man’s data is another man’s poison. Put the two together — with a capital B and capital D — and you have not just a zeitgeist-y term of this decade, but also a meaty topic for the 11th annual UVM Honors College faculty seminar, "Big Data: Engaging and Critiquing the Production of Knowledge in the Digital Age," where Rizzo was one of the invited speakers.

Data ecosystem

“‘Big data’ signifies massive sets of digital information of unprecedented volume, variety, and velocity, whose potential research applications are felt across disciplines,” notes professor of geography Meghan Cope, who helped to organize and lead the three-day gathering of nearly 30 UVM faculty from throughout the university, including mathematics, anthropology, geology, medicine, history, engineering, libraries, art, computer science and others.

The faculty members take turns around the circle describing some of the large data sets they are studying or have gathered. Some have thousands of data points — others tens of billions. All want to know how this welter of information can be wrestled into meaningful patterns and stories: what can maps of single-nucleotide polymorphisms in human genes tell us about disease risk? What do the locations of consular offices around the world in 1895 tell us about U.S. foreign policy?

This Big Data, it seems, is a lot more than just big data. It is being hailed as a revolution in how we think, “a computational turn in thought” as one of the reading assignments for the attending faculty put it — perhaps creating “destabilizing amounts of knowledge,” or even an overthrow of traditional philosophy in which — deep breath — “computationality might be understood as an ontotheology.” In the beginning, God computed the heavens?

A bit closer to the ground, the Big Data revolution, most observers agree, is about more than exponentially exploding amounts of data. Instead, it’s about what can be done with this data: searching, aggregating, and cross-referencing it from many sources, in many ways. The U.S. Census gathered huge amounts of data long before Big Data or even computers existed. What is different now is that data is entering a highly interconnected “data ecosystem” that, like a real ecosystem, surrounds people, shaping lives and reshaping itself continuously, with a vast cloud of “genetic sequences, social media interactions, health records, phone logs, government records and other digital traces left by people.”

Danish with your data?

Many of the faculty in the seminar want to probe this interconnected information to benefit humanity. As one example, Thomas Ahern, an epidemiologist in the UVM College of Medicine, describes how he’s using Big Data to seek a better understanding of whether phthalates — a family of chemicals contained in many plastics — are an environmental cause of breast cancer. He’s particularly concerned about extended-release pill coatings, since people taking medicines this way have seventy times more phthalate exposure than the rest of the population.

“If you were to study this in a traditional epidemiological sense you would have to enroll a massive cohort of women in a study,” he says. Then characterize their phthalate exposure and monitor them closely for years. “That’s very expensive in time and money,” he says.

But he discovered a much cheaper Big Data solution in public records in Denmark. Using computers, he’s linking information from four sources: the Danish national prescription registry, which identifies every prescription written for any Dane since 1995; a national drug products database that lists every product in every drug available in the country; the national civil registry, which tracks who is alive, dead, or has left the country; and the national cancer registry, which he uses to ascertain cases of breast cancer. “You can form a longitudinal cohort of women who are exposed to drugs that have phthalates,” he says, and know which phthalates they were exposed to, in what quantities, for how many years, and which of these women developed breast cancer. By linking these disparate databases, the Big Data approach allows you to do “environmental epidemiology even if you’re locked into a pharmacy,” Ahern says, laughing.
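For readers curious what this kind of linkage looks like in practice, here is a minimal sketch in Python. It is not Ahern's code, and it makes no claim about how the Danish registries are actually structured: the field names (`person_id`, `product_id`), the toy records, and the `build_cohort` helper are all hypothetical, invented only to illustrate joining several sources on shared identifiers.

```python
from collections import defaultdict

# Hypothetical toy records. Real registries have different schemas and
# far more rows; every value here is made up for illustration.
prescriptions = [
    {"person_id": 1, "product_id": "A", "year": 1996},
    {"person_id": 1, "product_id": "A", "year": 1997},
    {"person_id": 2, "product_id": "B", "year": 1998},
    {"person_id": 3, "product_id": "A", "year": 2001},
]
# product_id -> phthalate contained in the product (None if it has none)
drug_products = {"A": {"phthalate": "DBP"}, "B": {"phthalate": None}}
# person_id -> residency status
civil_registry = {1: "resident", 2: "resident", 3: "emigrated"}
# person_id -> year of breast cancer diagnosis
cancer_registry = {1: 2005}


def build_cohort(prescriptions, drug_products, civil_registry, cancer_registry):
    """Link the four sources into one record per phthalate-exposed person."""
    exposure_years = defaultdict(set)
    for rx in prescriptions:
        phthalate = drug_products[rx["product_id"]]["phthalate"]
        if phthalate is not None:
            # Count each calendar year of exposure once per person.
            exposure_years[rx["person_id"]].add(rx["year"])

    cohort = {}
    for person_id, years in exposure_years.items():
        cohort[person_id] = {
            "years_exposed": len(years),
            "status": civil_registry.get(person_id, "unknown"),
            "diagnosis_year": cancer_registry.get(person_id),  # None if no case
        }
    return cohort


cohort = build_cohort(prescriptions, drug_products, civil_registry, cancer_registry)
for person_id, record in sorted(cohort.items()):
    print(person_id, record)
```

In this toy version, person 2 drops out of the cohort because product B contains no phthalate, while person 3 stays in but is flagged as having left the country. A real study would also need to handle dosage, timing of exposure relative to diagnosis, and privacy safeguards, none of which this sketch attempts.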

Know data

Other faculty involved in the Big Data seminar are looking for new opportunities to collaborate across disciplines, or to gather and sharpen research tools, like the programming language Python or the supercomputer at the Vermont Advanced Computing Core. For yet others, the seminar raises basic questions about epistemology and consciousness. Joseph Acquisto, UVM professor of French, wonders, “How is Big Data changing the ways we think we know what we know?”

And, for some of the attendees, Big Data raises questions about how fundamental dynamics of societies may be changing. “When it comes to information on a network,” wrote UVM mathematics professor Jim Bagrow on his application for the Honors College seminar, “we’re all in this together.” And that togetherness may be changing our senses of self.

“For better or for worse,” Mara Saule, UVM’s chief information officer and dean of the libraries, says in her keynote address to the gathered scholars, “we’re all someone else’s data point.”