Pictured above: UChicago students Tarun Arora, Richard Zhang and Manuel Martinez
For Department of Energy national laboratories, the era of machine learning and artificial intelligence (AI) offers new opportunities to solve old problems.
Those problems can range from the operational — finding and updating policies and procedures within documents — to the astronomical: determining neutrino detection events amid the noise of the universe.
Projects like these require months or years of data science work, both to pre-process the data and to build and train the right models and algorithms to solve the problems.
The national labs partnered with the University of Chicago to focus their combined expertise and strengths on these problems through the university’s Data Science Institute Clinic course. Undergraduate and graduate students work in teams as data scientists on problems from industry, academia, and social impact organizations. The UChicago students gained real-world experience, and the labs benefited from the students’ knowledge and expertise in areas of need.
This academic year, two teams worked with Argonne National Laboratory and Fermi National Accelerator Laboratory to mine big data sets and solve specific scientific and organizational problems.
“We provide a vehicle where partners can come together with students and access their expertise and talents,” said Nick Ross, director of the clinic. “And the national labs have been great partners. Their projects are challenging, with interesting features, and they are great mentors for our students.”
The projects were funded by UChicago’s Joint Task Force Initiative, which helps Argonne and Fermilab achieve mission success by opening channels of frequent communication and collaboration across institutions.
“The core interest of many of our students is on direct research,” said David Uminsky, executive director of the Data Science Institute. “There’s no better partner serving some of the best problems from science to operations than the national labs. Long term, this past year has really proven out a successful model of engagement that really does accelerate the bridges between our two institutions.”
An easier way to find information in documents
Argonne came to the DSI clinic with a huge challenge: All the lab’s policies and procedures are contained in thousands of documents, most of them PDFs. This organizational system isn’t ideal, especially when the DOE or US Congress decides to change a policy or procedure. Staff then struggle to identify which documents are affected and need to be edited to reflect the changes.
“To make changes, you need to understand all of the content in the documents and how they are related to one another,” said Matthew Dearing, the technical lead for Argonne’s Artificial Intelligence for Operations (AIOps) initiative. “And that’s the problem we brought to students.”
The students were tasked with creating a mathematical model called a knowledge graph that would map the policies and procedures and how each document connects with the others. Ultimately, Argonne wants to use this graph to create a chatbot that staff can query for answers to policy and procedure questions.
The students extracted the text from the documents, pre-processed the data, and then used that processed data to train models to make the graph. Because they worked with text, they used natural language processing tools — a useful set of machine learning techniques that allows computers to comprehend human language.
“It is a really good project to give students intuition on how to work with any natural language processing tools,” said Rahim Rasool, the students’ mentor and a data scientist with the Data Science Institute. “It’s a great project because it’s a small contribution to a big vision of using AI to model all these documents.”
For Soren Dunn, an undergraduate student majoring in statistics, data science, and chemistry, the project provided a good foundation for a career in data science. The work involved not only understanding data but also creating a framework and communicating results. “I now understand the importance of having clean data, of developing a reproducible pipeline,” he said. “And we presented to the COO of Argonne, which was fun.”
Dearing said working with the students has been “fantastic” and he has been impressed by the progress they have made each quarter. He and his team at Argonne hope to eventually start testing how they can integrate the knowledge graph with a chatbot interface.
“While Argonne has a ton of collaboration on the science side, we on the operations side don’t have as much direct collaboration with the University of Chicago,” Dearing said. “This has been a great opportunity for us, and we hope we can eventually share this project throughout the DOE complex.”
Determining neutrino events
At Fermilab, the MicroBooNE experiment is working to detect neutrinos — subatomic particles that interact weakly with matter, making them extremely hard to detect.
The lab’s sensitive liquid argon neutrino detectors also record many non-neutrino events, such as cosmic ray interactions. Scientists at the lab wondered if there was a way to use AI to determine which events were neutrino interactions in real time, so they wouldn’t need to store the data from the non-neutrino events.
The UChicago team worked with Fermilab scientists to implement a machine learning method that could take in raw experiment data and identify neutrino events. Much of the work involved processing the data (including nearly 9,000 detection events) and figuring out the best algorithms to use to ultimately train a convolutional neural network to find the correct events.
“It’s a tricky problem, because we understand the physics and can simulate these interactions well, but it’s hard to write an algorithm that can identify whether a pattern is a neutrino event or not,” said Peter Lu, the team’s mentor and an Eric and Wendy Schmidt AI in Science Postdoctoral Fellow.
Tarun Arora, a master’s student in computer science, said the best part of the project was working on a real-world problem. Each week he would learn theory in his machine learning course and then immediately apply it to this project. “It was fascinating to see how data science can be applied to scientific data from a national laboratory,” he said.
Manuel Martinez, a master’s student in computational analysis and public policy, said his background in economics research hadn’t given him this level of experience in scientific computing. “Just learning how the big labs do their work has been really interesting from a computing point of view,” he said.
And because the project involves students from across disciplines, with different groups of students taking part from quarter to quarter, “the most important part is communicating with the group and the project mentors,” said Mingyan Wang, an undergraduate student in data science and computer science. “It’s really complicated, and we all have to understand the data set, so community is key.”
Fermilab scientists said the students were key in every step of the process. “Even before we started deploying the algorithm, we had to pre-process the data, and the students had a lot of expertise in that,” said Meghna Bhattacharya, a postdoctoral associate at Fermilab.
Often, the students brought a new perspective that the scientists hadn’t considered, said Michael Kirby, a Fermilab scientist. “As particle physicists, we have kind of worked ourselves into a default mode of operation, so it was great having really smart students who are energized about the problem and have their own wonderful, different perspective.”
Students in future DSI clinic courses will continue to work on both laboratory projects, and the lab project leaders are happy with the progress so far. “We still have a ways to go to make the network fast enough and performative enough to be used in real time, but we feel good about the path forward,” Kirby said.