Data from Twitter Can Predict a Crypto Coin’s Ascent

Cryptocurrencies are notoriously volatile. But listening carefully to social media chatter can help identify winning short-term investments in crypto, according to a new Yale study carried out as the crypto bubble expanded and finally popped. The methodology in the study, co-authored by Prof. Tauhid Zaman and PhD student Khizar Qureshi, could also be used to translate online buzz into predictions in other domains.

As cryptocurrency soared (and, eventually, collapsed) in the late 2010s and early 2020s, Tauhid Zaman watched countless crypto coins pop into existence and then disappear. There might be a few mentions on social media as they got started, perhaps a brief flash in the public eye, a handful of people getting rich. And then—poof, gone.

“As a scientist, I started to wonder if there was a pattern in all that noise,” says Zaman, an associate professor of operations at Yale SOM. “And I wondered if it was a financially predictable pattern.”

He and PhD student Khizar Qureshi took to Twitter to monitor how people talked about emerging coins. They found that if you measure the conversation properly, it’s possible to identify the coins with the best prospects over the next month.

“The key with these coins is that long-term investment is not the goal,” Zaman says. “You catch the wave and then get out. This is the lesson for trading crypto in general.”

Zaman and Qureshi achieved their results by devising a novel method for distilling hype. People have previously tried to use the raw volume of tweets on a certain topic to predict outcomes, guessing that lots of tweets imply strong future performance. But Twitter restricts the amount of information that can be scraped from its site, which means the raw volume is sometimes too large for anybody to get a meaningful sample; it’s not possible to track millions of tweets a month.

Researchers have alternatively looked into the predictive power of sentiment analysis: is discussion around a given topic favorable? But insider shorthand like #buythedip or #hodl, which both express positive sentiment in the crypto world, tends to elude machine learning analysis, as do memes expressing sentiment one way or another.

What Zaman did instead was develop an “engagement coefficient” based on the number of followers of the accounts posting tweets mentioning a cryptocurrency as well as the number of times that each tweet is liked and retweeted. These two measures were combined to provide a single number between zero and one that indicated how much people were both talking and hearing about that cryptocurrency over the course of a month. Zaman and Qureshi used this indicator to track a sample of mentions of 48 cryptocurrencies that came on the market between 2019 and 2021, and to make hypothetical month-long investments. Those investments earned a (hypothetical) return of nearly 200%.
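
The article does not give the formula behind the engagement coefficient, so the following is only an illustrative sketch of how follower counts and interactions might be folded into one number between zero and one. The `Tweet` class, the `engagement_coefficient` function, and the specific ratio used here are assumptions made for this example, not the researchers' method.

```python
from dataclasses import dataclass


@dataclass
class Tweet:
    """One sampled tweet mentioning a coin (hypothetical record layout)."""
    followers: int  # followers of the posting account
    likes: int      # times the tweet was liked
    retweets: int   # times the tweet was retweeted


def engagement_coefficient(tweets):
    """Return an illustrative score in [0, 1] for a month of sampled tweets.

    Assumption: the score is interactions / (interactions + reach), where
    reach is the summed follower count and interactions is likes plus
    retweets. The study's actual formula is not stated in the article.
    """
    reach = sum(t.followers for t in tweets)
    interactions = sum(t.likes + t.retweets for t in tweets)
    total = reach + interactions
    if total == 0:
        return 0.0  # no audience and no interactions: treat as no engagement
    return interactions / total


sample = [Tweet(followers=1000, likes=50, retweets=50),
          Tweet(followers=0, likes=0, retweets=0)]
score = engagement_coefficient(sample)
```

Any ratio that stays bounded would serve the same illustrative purpose; the point is that a small sample of tweets can be reduced to a single comparable number per coin.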

“One of the really cool things was that this signal wasn’t monotonic,” Zaman says. The researchers found, unsurprisingly, that if the engagement coefficient for a given coin stays below a certain threshold, then it isn’t worth purchasing that coin. But too much buzz is also a bad sign, he adds: “If the engagement coefficient got really huge, you also wanted to avoid buying the coin.” Very high coefficients seemed to suggest lots of bots engaging with the coin, and a potential pump-and-dump scam in which people were artificially inflating interest among buyers before the coin crashed. There was, he says, a Goldilocks spot where investment made sense.
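
The non-monotonic rule described above amounts to a band filter: skip coins whose coefficient is too low or too high. Here is a minimal sketch; the cutoffs `low` and `high`, the function name `select_coins`, and the coin names are placeholders chosen for illustration, since the article does not report the study's actual thresholds.

```python
def select_coins(coefficients, low=0.2, high=0.8):
    """Return the coins whose engagement coefficient falls inside the band.

    `coefficients` maps a coin name to a score in [0, 1]. The default
    cutoffs are hypothetical placeholders, not values from the study.
    """
    return [name for name, value in coefficients.items()
            if low <= value <= high]


scores = {"AAA": 0.05, "BBB": 0.50, "CCC": 0.95}  # made-up example scores
picks = select_coins(scores)
```

In this made-up example only "BBB" passes: "AAA" has too little buzz, and "CCC" has enough to look suspicious.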

This insight, Zaman notes, could be useful for regulatory agencies trying to limit fraud. If shortly after a coin is listed on a crypto exchange it begins to create an unusually large amount of buzz, that could be a red flag suggesting the coin is being manipulated.

Zaman says that the engagement coefficient has applications beyond the world of cryptocurrency; in fact, he recently tested it in another domain that is notoriously hard to predict. In a class on social media, he asked his students to test whether the new method could forecast movie performance. They gathered up historical buzz about a handful of movies, then tried to determine which would succeed. The animated Super Mario Bros. Movie was far and away the most talked about and, true to form, is now the highest-grossing film of 2023.

“This gives you a way to take the overall temperature of a topic by sampling a relatively small amount of data, and it seems to predict success really well,” he says. “With just a couple thousand tweets you can look at a crypto coin, or a movie, perhaps a new brand or product or politician.”

Department: Research