Twitter can predict rates of heart disease

Twitter has helped to launch or torpedo careers, bring cute animal pictures to millions, and document social movements from the ground up.

Now, the social media site can be considered an accurate predictor of rates of atherosclerotic heart disease in communities across the United States.

In a paper published in Psychological Science, researchers at Penn have demonstrated that Twitter can capture more information about heart disease risk than many traditional factors combined, including levels of income and education, and rates of obesity, smoking, and hypertension.

Language in tweets expressing negative social relationships, disengagement, and negative emotions indicated a higher risk for heart disease, while language expressing positive emotions and engagement was protective against the disease.

“What’s not at all obvious is that this should work, because the people who are tweeting are not the people dying of heart disease,” says Johannes Eichstaedt, a Ph.D. student in the Department of Psychology in the School of Arts & Sciences, and lead author on the paper. “We’re not capturing individuals who are dying from heart disease at all. We’re capturing people in the communities where people are dying of heart disease—which is every community. Twitter is a canary for a community.”

Researchers have previously thought that the well-being of communities is important for physical health, but this has proven difficult, and expensive, to study. Twitter solves that conundrum, Eichstaedt says, since tweets are public and samples are large. The researchers’ data set consisted of public tweets posted between 2009 and 2010 from 1,347 U.S. counties, which are home to more than 88 percent of the country’s population.

While Eichstaedt says that there are biases built into Twitter—namely, that the median age on the social media site is lower than the median age of the United States—they do not prove to be insurmountable problems. Because the data sets were large enough, the researchers were able to easily account for the biases and still have a sample that was representative of communities.

There is no simple way to measure people’s inner emotional lives, so the researchers drew on traditions in psychological research that glean this information from words people use when speaking or writing.

“Imagine you go through a town that you don’t know and you get a recording of all the traffic signs, the road signs, the menus, and then you start getting conversations between people, even though you might not know anything about the town. It would tell you a lot,” says Eichstaedt. “Any single tweet probably has close to zero information, but that times a billion is a lot.”

The team used three different approaches to analyze the language from the tweets, including an automatic process to measure the frequency of words and phrases, and compared these findings to data on heart disease deaths available from public health sources around the country.
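To make the word-frequency idea concrete, here is a minimal sketch in Python. It is not the researchers’ actual pipeline: the function names, the toy tweets, and the mortality figures are all invented for illustration, and it covers only the simplest step described above, measuring how often a word appears in each county’s tweets and correlating that frequency with a county-level death rate.

```python
import math
import re
from collections import Counter


def word_frequencies(tweets):
    """Return the relative frequency of each lowercase word in a list of tweets."""
    counts = Counter()
    for tweet in tweets:
        counts.update(re.findall(r"[a-z']+", tweet.lower()))
    total = sum(counts.values())
    return {word: n / total for word, n in counts.items()} if total else {}


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (spread_x * spread_y)


def word_mortality_correlation(word, county_tweets, county_mortality):
    """Correlate one word's per-county frequency with per-county mortality."""
    counties = sorted(county_tweets)
    freqs = [word_frequencies(county_tweets[c]).get(word, 0.0) for c in counties]
    rates = [county_mortality[c] for c in counties]
    return pearson(freqs, rates)


# Invented toy data, for illustration only (not from the study).
toy_tweets = {
    "A": ["i hate mondays", "so tired of this"],
    "B": ["great conference today", "learning a lot"],
    "C": ["hate hate hate this traffic"],
}
toy_mortality = {"A": 50.0, "B": 40.0, "C": 60.0}  # hypothetical death rates

r = word_mortality_correlation("hate", toy_tweets, toy_mortality)
```

With three made-up counties the result means nothing statistically. The sketch shows only the shape of the calculation, which in the study was applied to a very large number of tweets across 1,347 counties.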

The team included H. Andrew Schwartz, a visiting assistant professor in the School of Engineering and Applied Science; Margaret Kern, an assistant professor at the University of Melbourne, Australia; Gregory Park, a postdoctoral fellow in the School of Arts & Sciences’ Department of Psychology, and director Martin Seligman, both of the Positive Psychology Center; and Lyle Ungar, a professor of computer and information science.

“The most basic idea is to have the words come out of the data, to have the data tell its story, not come up with any theories of what any words mean,” says Eichstaedt. “The name of this whole game is signal-to-noise, so it’s not that there’s no noise, you just hope that you get more signal than noise.”

Negative emotional language, which includes words like “hate,” expletives, and words that indicate boredom or fatigue, is strongly correlated with heart disease risk, even when variables like income and education are taken into consideration. Conversely, optimism, positive experiences, and words associated with skilled occupations such as “management,” “learning,” and “conference” may be protective against heart disease.

“Having a reason to get up in the morning is protective from heart disease. It’s the hostility that kills you,” says Eichstaedt.

The findings fit into existing sociological research that suggests that the combined characteristics of communities can be more predictive of physical health than the report of a single person.

“What Twitter is picking up on is that level of community context—how do people behave given stressors, how much anger there is in the community,” says Eichstaedt.

Currently, Eichstaedt and the team are working on a similar study for the 15 leading causes of death in the United States. Eichstaedt himself is examining depression at the individual level using Facebook statuses.

He expects this study will receive much attention and scrutiny because, unlike other papers about social media and disease, it attempts to interpret the meaning behind tweets.

“We’re saying there are psychological statements to be made here, and this is a tool for psychology and sociology and for the human sciences, not just for the data sciences,” Eichstaedt says. “That’s an interesting claim that hasn’t been made often before.”
