Data Science Institute hosts third-annual Datapalooza

Event shares research, best practices and revolutionary strategies for big data analytics


The University Data Science Institute hosted the third-annual Datapalooza this past Friday in the Newcomb Ballroom.

Courtesy University of Virginia

The Data Science Institute hosted its annual Datapalooza on Friday in the Newcomb Ballroom. This is the third year that students, professors and researchers gathered for the all-day event to explore the cutting edge of data science. Representatives from the University and keynote lecturers from companies such as The New York Times and Google discussed topics from investment banking and credit assessment to healthcare and physiology. In both presentations and panels, science experts and academics demonstrated the diversity and immense power of data analysis technologies and methods.

After a welcome address by DSI Director Phil Bourne and Tom Katsouleas, the University’s executive vice president and provost, various speakers discussed novel techniques and applications for delving into large datasets: first examining them, then using technology to generate results quickly and accurately.

“Data science is really powerful and proven, no hype necessary,” said John Elder, a keynote lecturer and founder and chair of Elder Research. “Companies in vastly different fields are producing huge gains by extracting useful information from their data.”

During the “Research Highlights” portion of the day, professors and students from numerous departments discussed the widespread applicability of data science and machine learning. DSI Data Scientist Daniel Mietchen emphasized the need to share data and to effectively use technology in data analysis.

“[We] don’t have to do this alone, but we have to leverage the crowd, and open data actually helps with that,” Mietchen said. “We need tools to break down barriers between individual silos.”

Faculty and students went on to describe projects in engineering, commerce, biology and even the humanities. Jason Papin, a professor in the Department of Biomedical Engineering, described the creation of metabolic network models to better predict cell behavior. Other research included improvements on building design to reduce energy consumption, the effects of media on financial decision-making and the localized repercussions of the Holocaust.

The audience likewise reflected a widespread interest in incorporating data science methods across a variety of disciplines. Lydia Fetcho, a technical business analyst in the Dean’s Office of the School of Medicine, expressed enthusiasm for employing statistics to enable researchers to quickly execute their projects and to ensure the protection of participants’ privacy. 

“[The] data will help us make certain things go faster,” Fetcho said. “And hopefully if we can get the process as efficient as possible, it will help us earn grant money, because people will go, ‘Wow, U.Va., they do this very well. They get their projects pushed through quickly.’ ... It’s just exciting. I love coming to these things, all these people with these great ideas, and it makes you happy.”

As the day progressed, panelists touched on modeling social media trends, visualizing data, increasing interpretability and using analytics in healthcare and business. In addition, keynote lecturer Mary Jo Madda, creative strategy manager for Google’s Code Next team, reiterated the importance of data in the American education system.

“Coding is not something fundamentally taught in schools yet,” Madda said. “[Code Next] is focusing on black and Hispanic tech leaders because there is such a dearth of them. The lack of diversity [in technology fields] is extremely concerning.”

Datapalooza 2017 concluded with a lecture by Chris Wiggins, The New York Times’s chief data scientist and an associate professor of applied physics and applied mathematics at Columbia University. In his address, Wiggins charted the rise and fall of newspaper dissemination over time and the challenges the internet has introduced to journalism. He said that data science can and must be leveraged if the written word is to survive, and he described efforts to capitalize on massive amounts of data to predict subscription rates, to efficiently distribute newspapers and to offer online article recommendations.

“Our data science group develops and deploys machine learning algorithms to assist in newsroom and journalism challenges,” Wiggins said.

According to Wiggins, the purpose of Datapalooza, and of promoting data science strategies, is not solely to increase efficiency and improve production. Rather, it is to consider the people behind the technology and to integrate methods that help society as a whole.
