NUANCES OF LANGUAGE: Say the words “puppy,” “cocker spaniel,” and “dog” in conversation, and many people can identify the similarities. Type those words into a computer, however, and the machine is unable to make those instant connections. Ellie Pavlick, a fourth-year Ph.D. student studying natural language processing in the Department of Computer and Information Science in the School of Engineering and Applied Science, says this presents a significant challenge to researchers like herself, who are trying to make computers understand language. “A computer can’t tell the difference between the word ‘dog’ and the word ‘puppy’ any better than it can tell the difference between ‘four’ and ‘27.’ They’re just completely different values in terms of how they’re stored,” Pavlick says. “That makes it really hard for it to understand that you’re saying ‘I just got a new puppy’ to mean ‘I just got a new dog’ because as far as it’s concerned, those are totally different things.”
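Pavlick's point about storage can be illustrated with a minimal sketch. The toy vocabulary and the `same_symbol` helper below are hypothetical, chosen only for illustration, and are not taken from her research. The sketch assumes each word is stored as an arbitrary integer ID, so an exact comparison is all-or-nothing and treats "dog" and "puppy" as being just as unrelated as "four" and "27."

```python
# Hypothetical toy vocabulary: each word is mapped to an arbitrary integer ID.
words = ["dog", "puppy", "cocker spaniel", "four", "27"]
word_to_id = {word: index for index, word in enumerate(words)}


def same_symbol(a: str, b: str) -> bool:
    """Return True only when the two words are stored as the same ID."""
    return word_to_id[a] == word_to_id[b]


# Exact matching carries no notion of "close in meaning":
# a related pair and an unrelated pair produce the same answer.
dog_vs_puppy = same_symbol("dog", "puppy")
four_vs_27 = same_symbol("four", "27")
print(dog_vs_puppy, four_vs_27)  # False False
```

Nothing in the stored IDs says that a puppy is a kind of dog; that knowledge has to be supplied from somewhere else.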
‘COCKER SPANIEL’ VS ‘PUPPY’: Pavlick is working on getting computers to understand when words may mean the same thing and when they differ. Staying with the examples of “dog,” “puppy,” and “cocker spaniel,” Pavlick notes that the three have roughly the same definition but, individually, carry different meanings.
STATUS UPDATE: Pavlick was recently named one of a dozen 2016-17 Facebook Fellows. The fellowship program identifies promising computer science and engineering Ph.D. students and awards funding to support their research. This summer, she’ll be interning at Google, where she’ll be working on a project to analyze noun compounds.
WHAT COMPUTERS CAN’T DO: Pavlick says it’s tempting to look at the language computers are able to process and think researchers are close to solving all language identification problems. “You realize the more [a computer] can do, the more people expect it to be able to do, and the more surprised people are when it can’t do something.”
ASK THE CROWD: With her adviser, Assistant Professor Chris Callison-Burch, Pavlick teaches the class “Crowdsourcing & Human Computation.” As an undergraduate at Johns Hopkins, Pavlick majored in economics, so she has a particular interest in social science research. In one project, students are using crowdsourcing to build a database of gun violence in the United States. “You want computers to be able to understand language chiefly to give you an answer to a question, but they’re so far from that,” Pavlick says. “That doesn’t mean we should have social scientists sit on their hands and wait until we get the systems up to speed. ... Crowdsourcing is a really nice way to bridge that gap.”
NEXT STEPS: Pavlick, who is now based in New York, says Penn has a long history in natural language processing. “There’s a lot more overlap between the Linguistics Department and the cognitive science people in the Computer Science Department and that makes for nice collaborations.”