Penn computer scientist Daniel Preotiuc-Pietro spends countless hours checking social media, not to share pictures or anecdotes about his life, but to uncover the concealed meaning behind the actual words people use. His latest research, a collaboration with colleagues at Johns Hopkins University and University College London, sets out to show just how closely that language tracks a person’s income.
In a study of more than 5,000 Twitter users and more than 10 million tweets, Preotiuc-Pietro, a postdoctoral fellow at the Positive Psychology Center in the School of Arts & Sciences, learned that the language and emotion a person puts forth in those 140 characters correlate closely with his or her income level.
Some results were expected, such as the gender pay gap and more affluence among the elderly. Other results were unforeseen.
For instance, the researchers found that higher earners have more followers, whereas lower earners’ tweets include more URLs. High-income Twitter users more often discuss politics and corporations in their tweets; those in lower income brackets more often use profanity. The researchers report that “perceived Christians,” those who explicitly mention Christianity-related cues, tend to earn less.
The notion that you can learn something from how someone talks or writes isn’t new, says Preotiuc-Pietro. What is new, however, is how he and his colleagues conducted their research.
“No one has studied this at [such a] large scale, and especially with Twitter,” he says. “We’re the first to actually have access to this data.”
They began by looking at the stated occupation of users with public profiles. A classification system in the United Kingdom breaks down jobs into nine categories, ranging from those that do not require formal education to management roles, then filters each category further. The researchers initially pulled 200 users per grouping, but after culling the list to remove uncertainties, ended up with 50 to 150 from each.
From there, they evaluated what these people wrote and their overall place in the social media hierarchy.
“We are all natural language processing researchers. Our specialization is analyzing written text. But we didn’t limit ourselves to text,” Preotiuc-Pietro says. “We also looked at other social media features like number of friends, number of followers, that kind of basic stuff.”
The amount of information to be gleaned from a user’s profile and tweets is nearly endless, Preotiuc-Pietro says, including everything from gender and education level to personality and perceived religion. Svitlana Volkova, a research assistant at Johns Hopkins who collaborated with Preotiuc-Pietro, points to a previous study in which scientists asked people whether they understood that their information could be used for research; 80 percent did. The current work, published recently in the journal PLOS ONE, builds on this concept.
“Even if they are aware that their tweets are used by researchers, they don’t understand that we can tell a lot about them,” Volkova says. “We can predict tons.”
For advertisers, health care insurers, or employers looking for their next hire, the researchers say this information could prove priceless. It could also produce a positive change for Twitter users, paving the way for a level of personalization far beyond what the site offers today.