University tackles big data
Newly founded institute to focus efforts on research, ethics, education, support
In an effort to address large, complicated and quickly expanding data sets that are now the norm in any sort of research, the University is preparing to launch a Big Data Institute in the coming weeks.
The institute will operate virtually, pulling faculty, facilities and resources from schools all across the University. It will attempt to keep the University competitive with other research universities moving toward big data research, said Rick Horwitz, the associate vice president for research.
“In the end, I think all universities are going to have to move [in] this direction,” Horwitz said. “All curricula in the long run are going to have to adapt to big data. It’s just like once the Internet got started, everyone eventually had to adapt — the question was whether you’re late.”
Big data, a term often used ambiguously, is actually a catchall to describe as many as four phenomena, Computer Science Department Chair Kevin Skadron said.
“Just sheerly having too much data is one phenomenon,” Skadron said. “Another is that there are many different data sources, where the volume might not actually be that big but because there are so many sources there is a challenging integration problem.”
Skadron also noted that researchers face substantial challenges when processing data coming in at very high rates and when dealing with models that are highly complex. Major uses of big data include weather forecasting, government surveillance and even election forecasting.
“One of the best examples of big data that I’ve seen recently were the columns by Nate Silver in The New York Times,” Skadron said. “He pulled together a lot of different data sources and had an incredibly accurate prediction of the presidential election outcome. He wasn’t just pulling together lots of surveys asking the same questions, he was actually pulling together lots of surveys asking different questions, and used historical data, etc.”
The University’s Big Data Institute was established to accomplish four major goals in the field of big data: facilitating research, sponsoring education, providing support and consulting on ethical issues. Each of these areas is critical to using big data effectively.
“If you look across research areas a lot of the faculty are working in today — and I’m not restricting that to science — people are getting interested in what can be learned from larger and larger data sets,” Provost John Simon said. “Secondly, there are more and more jobs out there looking for people with the knowledge and experience on how to use data. One of the distinctive features of this institute are the education programs that will prepare people for the next generation of careers.”
What makes the institute distinct from similar programs around the country is that it draws on many different disciplines and schools across the University. Research and course offerings are expected to come from traditional data science areas like statistics and computer science, but also from often overlooked but equally important areas like philosophy and law.
“While other schools have programs that are very computer oriented, or very statistics oriented, the virtue of our curriculum is that it is very integrated,” Horwitz said. “The presence of all the different schools on one campus allows synergies that other places can’t have.”
Skadron, along with Statistics Prof. Jeff Holt and Engineering Prof. Don Brown, has worked to create a curriculum for a Master's in Data Science and is also in the process of creating a curriculum for an undergraduate minor. Required classes would draw from multiple departments in multiple schools. Skadron said he believed the curriculum would be unlike anything offered at other universities.
“There are some other data science programs out there that we have used as guidance, but I think we’re doing something different,” he said. “A lot of other degrees out there simply take a few courses from different departments so that students get some computational background and some statistical background and so on, and they call it data science. We’re trying to really integrate the students’ work across all the disciplines.”
Another major component of the institute is a new Center for Data Ethics. Skadron believes that ethics in big data is an area where the University could have a great deal to contribute. The center would consult with faculty doing research at the University, but it would also be able to contribute to the broader national conversation about data ethics.
“What if you have doubts about a certain type of data you are collecting?” Skadron said. “Where do you go for help? A lot of these issues aren’t cut and dry. You can’t go to a lawyer, because a lawyer doesn’t know the answer yet because there isn’t a law yet.”
The institute was born out of a Big Data Summit held at the University in May 2012. The summit pulled together faculty from around the school to discuss how they were experiencing and handling the challenges that come with big data. That led to a consensus that the institute should be pursued.
“[The summit] was astonishing,” Horwitz said. “We tried to get together all the people we thought would be interested and they gave five minute talks — what do you do, what are your challenges and who else does this stuff that you know of. 170 people came. The synergies were spectacular.”
After receiving a large amount of faculty support, the Big Data Institute made its way into the University’s strategic plan, an effort that aims to develop a set of goals and ideas to steer the University into the foreseeable future.
The institute will be officially launched in the next few weeks.