Imagine, for a second, that the 120 people gathered in Davis Auditorium at the TEDxUVM lectures last Friday, plus the 544 people watching the live-stream video, were set to work calculating with paper and pencil what the world’s computers can calculate in one second.

How long do you suppose it would take them to do the same calculations? A thousand years? A million years? Not even close. If they had started calculating at the start of the Big Bang, with nary a snack or bathroom break, they wouldn’t be halfway done, if recent estimates in Science are correct.
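The comparison is easy to sanity-check with back-of-envelope arithmetic. In the sketch below, every rate is an illustrative assumption (a global compute rate of 10^20 operations per second, one pencil-and-paper calculation per person every ten seconds, roughly 4.35 × 10^17 seconds since the Big Bang); only the head count of 120 + 544 comes from the event itself.

```python
# Back-of-envelope check of the "since the Big Bang" claim.
# All rates below are illustrative assumptions, not measured figures.
people = 120 + 544                 # audience in the room plus live-stream viewers
human_rate = 0.1                   # assumed: one hand calculation every 10 seconds
world_ops_per_sec = 1e20           # assumed global compute rate, operations/second
seconds_since_big_bang = 4.35e17   # ~13.8 billion years

calculations_done = people * human_rate * seconds_since_big_bang
# Under these assumptions the group is indeed not yet halfway:
not_halfway = calculations_done < world_ops_per_sec / 2
```

Under these (generous) assumptions the group completes about 3 × 10^19 calculations, well short of half of one second's worth of global computing.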

And the corollary to this vast computing power is the vast amount of information now being produced and stored by our computers, satellites, cell phones and social media. A recent IBM estimate notes that at least ninety percent of all the information created by humanity has been produced in the last two years.

This is “big data,” according to the organizers of “Big Data, Big Stories,” the TEDx event organized by UVM’s Complex Systems Center. And the effects, insights, problems and potential of all this information — the “big stories” — were the topic of the packed event’s eleven 10-minute micro-lectures held at Fletcher Allen Health Care near campus.

Pattern recognition

In these “really data-rich worlds,” said UVM mathematician Peter Dodds — who organized the TEDx event along with his fellow mathematician Chris Danforth, roboticist Josh Bongard, and staffers Andi Elledge and Keri Toksu — something fundamental can change about how scientists do what they do.

When the data set gets big enough and the computers fast enough, Dodds said, instead of starting with a question, “there is a new way to approach things: which is simply that you have to go look for patterns, look for the shades, in these massive data sets.”

For example, in billions of Twitter tweets and blog posts, Dodds and his colleagues have found patterns of language that point to the rise and fall of the world’s mood. This discovery has allowed the scientists to create a near-real-time “hedonometer” — a happiness-measuring tool that can take the emotional pulse of places and groups of people around the world. Wednesday, not Monday, is the nadir of the workweek, it seems. And the overall global “happiness signal” dropped, said Isabel Kloumann ’11 (one of Dodds’s former students) during her TEDx talk, “corresponding to the time of the London riots breaking out.”
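The hedonometer’s core computation is simple to sketch: average crowd-rated happiness scores for the scored words in a text (the real instrument uses ratings on a 1-to-9 scale for roughly 10,000 common English words, applied to enormous streams of text). The four-word lexicon and its scores below are hypothetical, purely for illustration.

```python
# Hypothetical mini-lexicon of word-happiness ratings, 1 (sad) to 9 (happy).
# The real hedonometer uses crowd-sourced ratings for ~10,000 common words.
HAPPINESS = {"laughter": 8.5, "happy": 8.3, "riot": 2.4, "crash": 2.6}

def hedonometer(text):
    """Average happiness of the scored words in `text`; None if no word is scored."""
    scores = [HAPPINESS[w] for w in text.lower().split() if w in HAPPINESS]
    return sum(scores) / len(scores) if scores else None
```

A text dominated by words like “riot” and “crash” scores low, which is exactly the kind of dip the group saw in the global signal during the London riots.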

Call a robot

Who the scientist is may be changing too, under the storm of big data. In his talk, Mike Schmidt, a Cornell researcher, threw a clean curve of data points up on the screen and asked the audience to describe the equation that produced that pattern. The mathematically gifted in the group got it easily. “X squared,” someone shouted.

But as subsequent slides of data points got more complex, the audience was stumped. Into this kind of mess — or collections of data millions of times messier — Schmidt would have researchers deploy a “new kind of artificial intelligence,” he says, “a robotic scientist” to fish out patterns from the seeming chaos.

As one example of this kind of silicon assistant, Schmidt has created a free software tool, Eureqa, that crunches raw experimental results and “distills out the fundamental mathematical properties of your data,” he says, “so that you come away with the model and deeper understanding of that data to help you ask the right questions.”
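Eureqa evolves candidate equations with genetic programming, which is well beyond a few lines of code. But its core idea (score candidate formulas against the data and keep the best fit) can be sketched with a small, fixed candidate list; the candidates and the squared-error score below are illustrative, not Eureqa’s actual method.

```python
import math

# Toy model search in the spirit of Eureqa: try each candidate formula
# on the data and keep the one with the smallest squared error.
# (The real tool *evolves* expressions; this fixed list is illustrative.)
CANDIDATES = {
    "x^2":    lambda x: x ** 2,
    "2x":     lambda x: 2 * x,
    "sin(x)": lambda x: math.sin(x),
}

def best_model(xs, ys):
    def sq_error(f):
        return sum((f(x) - y) ** 2 for x, y in zip(xs, ys))
    return min(CANDIDATES, key=lambda name: sq_error(CANDIDATES[name]))
```

Given the clean parabola Schmidt opened with, this search recovers “x squared” just as the audience did; the interesting case is data too messy for any human to eyeball.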

Ceci n’est pas une fish

If big data and fast computers are creating opportunity for new insights, the higher resolution and detail of the vast ocean of big data is also creating new challenges. Even headaches.

“With all this increased computing we are absolutely drowning in data,” said Austin Troy, director of UVM’s Spatial Analysis Laboratory, in his TEDx talk, “and one of the types we are most drowning in is remote sensing data,” like that collected by satellites. Geographic features that once could be resolved only as a 30-meter-wide box labeled “forest” can now be discerned as a specific maple tree with a broken branch.

In this case, the intelligence of computers can be the limiting factor rather than the breakthrough assistant. To explain, Troy put up side-by-side images of two bearded men. “I can tell within two seconds that that is George Carlin and that is Sigmund Freud,” he said, as the audience laughed. But for Troy to train a computer to recognize the difference would take “unbelievable amounts of time,” he said. Computers may be good at distilling mathematical approximations of data, but they don’t yet hold an old-fashioned candle to the human capacity for finding the gestalt.

In the same way, high-resolution images from space are providing an incredibly rich portrait of the planet — Troy showed a gorgeous image of the huge shadows of camels flowing across the African desert and a tacky swimming pool in Nevada shaped like a tropical fish — but computers have been largely unable to pierce this raw data and find the camel or recognize the fish. Teaching computers to not work pixel by pixel, but, instead, to start to recognize the complex interplay of “shape, size, tone, pattern, texture, site, and association,” Troy said, is one of the biggest challenges of big data.

Our satellites may be able to view every bumper-sticker on the planet and our computers may be able to complete calculations that would foil all of humanity, and yet, looked at another way, our computers are puny. All the computer storage in the world contains less information than is contained in your DNA. In one second, the 120 people gathered at the TEDxUVM lectures fire off as many neural impulses as all the computers in the world can perform operations.

“I need to teach a computer to see objects,” Troy said, “and to think like me.” That could be a while.

No hurricane in a water molecule

But part of the promise of the big data revolution is looking for patterns and interactions that are beyond or alien to the human mind. And in these patterns may be hidden “a way to solve incredibly hard problems that we need to solve,” Peter Dodds said. Looking for the master variable that controls a hurricane's track, a sudden economic collapse, or an ecosystem breakdown is bound to fail, he thinks.

That’s because many of the most important problems we want to understand are driven by complex systems, “where there is no powerful central control,” Dodds said. Instead there are “lots of localized interactions giving rise to macroscopic behavior, and often the macroscopic behavior is disastrous, like crashes in the stock market or ecosystem crashes.”

“There is no hurricane in a water molecule,” he said, “there is no financial collapse in a dollar bill. It’s all in how these things arise.” And how they arise may yield to the brute and tireless power of a computer in ways that the far more powerful and elegant human brain doesn’t take in.

“There are compelling reasons for understanding systems, and the reason we haven’t been able to do so for things like social systems and economic systems,” Dodds said, “is because we haven’t been able to describe them.”

But work like that of Rob Axtell, from George Mason University, is getting closer. His TEDx talk described a model of the U.S. economy with 150 million independent “agents,” each with complex — sometimes irrational (i.e., real world!) — rules of behavior representing individual people. By letting these agents all interact in computer simulations, he is seeking a view of the larger macro-economy that emerges from millions of micro decisions.
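Axtell’s 150-million-agent model is far richer than anything that fits here, but the emergence idea can be sketched with a classic threshold cascade (the rule and the numbers below are illustrative, not his): each agent sells only once enough other agents have already sold, and whether the whole market crashes lives in the interactions, not in any single agent.

```python
# Toy agent-based cascade: agent i sells once at least thresholds[i]
# other agents have sold. Returns how many agents end up selling.
# (Illustrative rule in the spirit of agent-based economics, not Axtell's model.)
def run_cascade(thresholds):
    sold = set()
    changed = True
    while changed:
        changed = False
        for i, t in enumerate(thresholds):
            if i not in sold and len(sold) >= t:
                sold.add(i)
                changed = True
    return len(sold)
```

Raising a single agent’s threshold from 0 to 1 can flip the outcome from a total crash to no crash at all, a small-scale echo of “there is no financial collapse in a dollar bill.”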

“What we are finding in the last few years is that some things that we thought were beyond measure — like willpower,” or the economic value of nature, or the timing of “random” terrorist attacks, said UVM robotics expert Josh Bongard, “— are not.”

Getting hotter

Still, the fundamental unpredictability of some aspects of the future, like the weather beyond a few weeks, may be intractable, UVM mathematician and climate modeler Chris Danforth reminded the audience, invoking the great chaos theorist Edward Lorenz.

And yet Danforth — who served as the moderator of the TEDx event — says “the most important big data story” is the one coming out of the huge pools of information going into long-term climate forecasts. (Remember: it’s very hard to know if it will rain next Tuesday, but very easy to know it will be colder in January than June.)
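Lorenz’s point is easy to demonstrate numerically: start two copies of his famous three-variable system a millionth of a unit apart and watch the gap explode, even though both trajectories stay on the same bounded attractor. The sketch below uses simple forward-Euler steps with the classic parameters; the step size, step counts, and starting states are illustrative choices, not anything from the talk.

```python
# One forward-Euler step of the Lorenz equations (classic parameters).
def lorenz_step(s, dt=0.01, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    x, y, z = s
    return (x + dt * sigma * (y - x),
            y + dt * (x * (rho - z) - y),
            z + dt * (x * y - beta * z))

# Track the distance between two trajectories that start eps apart.
def separations(steps, eps=1e-6):
    a, b = (1.0, 1.0, 1.0), (1.0 + eps, 1.0, 1.0)
    gaps = []
    for _ in range(steps):
        a, b = lorenz_step(a), lorenz_step(b)
        gaps.append(sum((p - q) ** 2 for p, q in zip(a, b)) ** 0.5)
    return gaps
```

After about one model-time unit the two states still agree to several decimal places; run long enough, the gap grows by many orders of magnitude. That is why exact weather forecasts fail beyond a few weeks while the climate’s long-run statistics, like January being colder than June, remain predictable.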

If “we get it wrong — or we don’t pay attention to what the models are telling us,” Danforth said, “we could end up in big trouble.”