In fields ranging from physics to genetics, researchers rely on computers for everything from data analysis to modeling. One area of scholarship that has remained largely untouched by this shift is the humanities, where researchers today are far more likely to be hunched over stacks of books than scanning graphs and charts on a screen.
That, however, is changing thanks to people like David Mimno, Information Science. “I’m exploring how to use new statistical methods and data mining methods to give people better ability to explore culture and literature,” says Mimno, who double-majored in classics and computer science as an undergraduate student.
How Can Computational Tools Change the Humanities?
According to Mimno, computational tools are more about providing support than answers. He compares computer assistance in the humanities to an archeological field survey. Just as archeologists conduct a shallow survey of a large area of land to decide where to excavate, statistical models and text mining can identify broad themes and home in on interesting topics where detailed scholarship could be most useful.
Computers are particularly good at analyzing vast amounts of data in a short time. That ability could be especially useful to humanities scholars who want to examine more books than any one person could read.
“There are opportunities for dealing with more material than we could ever possibly have read,” Mimno says. “If I can represent one percent of the information in 1,000 books, that's not the same as reading 10 books, but it provides a different kind of perspective.”
MALLET, a Tool for Identifying Patterns in Text
Mimno developed a program called MALLET, which can comb through thousands of volumes of text and identify patterns that might not be obvious to a reader, or that would be too time-consuming to find by hand. He has established collaborations with Cornell scholars such as Tom P. McEnaney, Comparative Literature. McEnaney is interested in understanding the references that South American poets make to one another and in determining why they reference some people but not others. Tools like MALLET are well suited to such research because the software can pinpoint and then visualize references much faster than a single reader, or even a team of readers.
Mimno stresses, however, that the statistical models he’s developing are not built to replace humanities research. “Computation can provide insights; it can provide clues; and it can double check whether a theory is plausible,” he says. “But it’s not going to give us any conclusions. Scholarship still needs to be done.”
The Right Tool for the Research Problem
Mimno wants to give scholars tools that are both appropriate and accurate. In another area of his research, he checks whether specific models work for a given problem or whether they are misleading. Depending on the question, a researcher might be able to work with a very simple model. At other times, a more complex, more specialized model may be needed. The challenge for computer scientists is figuring out which model best fits a specific research need.
“People in my field love to propose new statistical methods, and it can be overwhelming for people looking at them, wondering which of the five different solutions that very smart, capable people have proposed is the one that makes sense,” Mimno says. “What we’re trying to do is give people a little bit of guidance.”
Text Mining
Mimno takes statistical models developed for text mining and applies them to problems in population genetics to test whether each model is appropriate. In some cases, such as those involving diverse ancestral populations, the text-mining models work very well. In others, where a genetic population is more homogeneous, they are not useful.
Although this project takes Mimno outside the world of text, its purpose is to develop a standard methodology for testing whether a model succeeds in any scenario or field. He and his colleagues have found that the best testing methods visualize what the data should look like under a model alongside the actual observations. Mimno's overarching goal has always been to develop methods that scholars can use when they have particular types of questions, much as computational tools are used in other fields.
Useful Tools, Challenging Questions
"Nobody talks about digital physics. You know that when you have a large collection of data, you analyze it with certain methods," says Mimno. "I hope that the methods we're developing for the humanities will become an organic part of scholarship." Research in both computer science and the humanities could benefit from this interdisciplinary collaboration, says Mimno. It not only gives scholars useful tools but also gives computer scientists good, challenging questions that can drive computational advances.
Mimno sees a parallel between computer assistance in the humanities today and earlier advances in mathematics. Advances like calculus, he notes, were driven by practical problems in physics, chemistry, and astronomy.
"We have the same potential in computation now. There are real problems in computational analysis of text, from historical texts to contemporary texts, such as news articles and social media, that we don't know how to solve," he says. These problems are especially complicated because they are embedded in the messiness of language and meaning, he adds.
The complexity of the problems, however, is a driving force for the research. “It’s an incredibly good motivation for doing new and exciting work,” Mimno says. “We know we don’t know how to do it yet, but we know what we want to do.”
With such goals in mind, researchers like Mimno are paving the way for computational advances that could change how we approach the humanities and text in the future.