Twitter’s Leaky API: Who Does it Impact, Why Does it Matter

Dec 14, 2022

Data about Twitter account holders surfaced again last month on a prominent underground forum for stolen data. The data, which comprised 5.4 million records, was offered for free, and this wasn’t the first time it had been available. In this post, Intel 471 explores the history behind this data, how it came to leak and who is most at risk.

In July 2022, a threat actor, *hail_s, posted an advertisement on the aforementioned forum offering around 5.4 million active Twitter account records. Later, another batch of data comprising 1.4 million suspended accounts was also released, making for a total of around 6.8 million accounts. The 1.4 million-record data set was shared privately among a few individuals.


*Threat actors’ names have been substituted.

The initial July 2022 advertisement for the Twitter data. (Source: Intel 471)

Most of the account data was public and had simply been scraped. The exception was one of two key pieces of information attached to each record: an email address or a phone number. Some Twitter users choose to put that information in their bios, but most don’t. Where did those two identifiers come from?

The forum’s administrator claimed the data came as the result of exploitation of an application programming interface (API) vulnerability that had been reported to Twitter in January 2022. Twitter confirmed the administrator’s version of events. Twitter said it received a vulnerability report through its bug bounty program in January 2022 and patched the issue. The vulnerability was introduced during a code update in June 2021 and undermined a key privacy setting.

Twitter users could adjust their privacy settings to prevent other people from searching for their Twitter ID by an email address or a phone number. Twitter IDs are unique and can be matched to someone’s Twitter handle. But the bug meant that people who had enabled that privacy setting could still be found with those identifiers. For everyone else, it meant that an email address or phone number could be matched to their Twitter account, an association they may not have wanted made public. Once matched, the rest of the data from a person’s profile was scraped to complete the records released in the data sets. Twitter said it notified affected account holders.

After it was initially advertised for sale, the data was repeatedly reposted on the original forum and elsewhere. Intel 471 found that it was shared at least 13 times on underground forums and various Telegram channels.

Who’s at Risk?

Those most at risk are people who sought to have an anonymous Twitter account. They may have wanted to distance a parody account from their real identity, protect their personal safety or even shield themselves from government attention.

Anonymous Twitter account holders would be at risk if their email addresses or phone numbers were linked to their real identities elsewhere and that link was available, perhaps through other data breaches. Twitter acknowledged that it was “particularly mindful of people with pseudonymous accounts who can be targeted by state or other actors.”

For people with Twitter accounts under their real names, the vulnerability could have resulted in a violation of their privacy. For example, Intel 471 noticed one person with a large number of followers whose record in the data leak had a phone number. The phone number wasn’t in the person’s bio and may not have been widely public. For high-profile people that could prove a nuisance and necessitate changing their phone numbers. The same goes for email addresses. At worst, email addresses and phone numbers can be used for a variety of social engineering schemes, ranging from password theft to fraud and more.

While some account holders may have been fine with allowing others to search for them on Twitter using information their searchers already possessed, they may not have wanted an email address or phone number explicitly associated with their Twitter account in a data leak.

By the Numbers

The combined data sets cover around 6.8 million Twitter accounts, which represents around 2.9% of Twitter’s reported 238 million monetizable daily active users. A look at the records shows that most have an email address rather than a phone number. Nearly 5.3 million records have an email address, while just 187,922 records have a phone number.
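The share quoted above is simple division, and a short Python sketch makes the arithmetic easy to check. The two constants are the figures reported in this post; the function and variable names are illustrative only.

```python
# Figures as reported in the article (rounded by the source).
LEAKED_ACCOUNTS = 6_800_000
REPORTED_DAILY_USERS = 238_000_000


def share_percent(part: int, whole: int) -> float:
    """Return `part` expressed as a percentage of `whole`."""
    return part / whole * 100


leak_share = share_percent(LEAKED_ACCOUNTS, REPORTED_DAILY_USERS)
print(f"Leaked accounts as a share of reported daily users: {leak_share:.2f}%")
```

The result is roughly 2.86%, so only a small fraction of the user base appears in the leak.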

It’s not clear why most of the records were matched to email addresses. A brute force approach using phone numbers, which would involve testing every number within a country’s addressable phone number space, may have been too intensive and triggered rate limits on Twitter’s API. Being selective about which email addresses to run through Twitter’s leaky API may have resulted in more positive matches. One way to potentially reveal anonymous Twitter accounts would have been to leverage other leaked data sets where an email or phone number is already definitively linked to an identified social media account, such as Facebook.

In 2021, profiles on 533 million Facebook users were released, including names, Facebook ID numbers, email addresses and phone numbers. The data was scraped between 2018 and 2019 due to a vulnerability in Facebook’s contacts import feature that allowed someone to upload large batches of phone numbers and match them to Facebook profiles, a bug ironically along the same lines as Twitter’s. Meta, Facebook’s owner, was fined in November 2022 over the incident for infringing the European Union’s General Data Protection Regulation.

Cybercriminals and fraudsters often collate data from multiple sources. A breach at one provider can be a stepping stone to more effective exploitation of issues at other providers. Facebook’s breach is far from the only one that cybercriminals could have tapped to take better advantage of Twitter’s vulnerability. But it would be a solid anchor point to start with since Facebook forbids anonymous accounts. If an anonymous Twitter account matches an email address linked to a named Facebook profile, the account holder is identified.
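To see why a single reused identifier is enough, consider a minimal Python sketch of the matching step. It is not a reconstruction of any actor’s tooling: the record types, function names and sample entries are all invented for illustration, and the addresses use the reserved `example.com` domain.

```python
from typing import Dict, Iterable, NamedTuple


class NamedProfile(NamedTuple):
    """Hypothetical record from a data set that ties an email to a real name."""
    email: str
    real_name: str


class PseudonymousAccount(NamedTuple):
    """Hypothetical record that ties an email to an otherwise anonymous handle."""
    email: str
    handle: str


def normalize(email: str) -> str:
    """Lower-case and trim an address so trivial variations still match."""
    return email.strip().lower()


def link_accounts(
    named: Iterable[NamedProfile],
    pseudonymous: Iterable[PseudonymousAccount],
) -> Dict[str, str]:
    """Map each handle to a real name wherever the two sets share an email."""
    names_by_email = {normalize(p.email): p.real_name for p in named}
    return {
        a.handle: names_by_email[normalize(a.email)]
        for a in pseudonymous
        if normalize(a.email) in names_by_email
    }


# Invented sample data: one address is reused across both sets.
named_set = [NamedProfile("alex@example.com", "Alex Example")]
leaked_set = [
    PseudonymousAccount("Alex@Example.com ", "@parody_acct"),
    PseudonymousAccount("unique-alias@example.com", "@unlinked_acct"),
]

print(link_accounts(named_set, leaked_set))
```

Only the account that reused an address is linked to a name. The account registered with an address that appears nowhere else stays unmatched.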

Risk of Reusing Identifying Data

Reusing identifying data such as email addresses and phone numbers across multiple services is a risk since those identifiers may be exposed in a breach. But there are ways to mitigate those risks.

One way is to use a different email address for every online service provider, and there are services that allow people to generate unique email addresses for this purpose. In the Twitter incident, using a unique email address just for Twitter would have meant that an attacker couldn’t source the address from somewhere else and thus couldn’t make a match.
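As a rough illustration of the idea, the Python sketch below generates one random address per service. It assumes the user controls a mail domain or uses an alias service that accepts arbitrary local parts; the domain `alias.example` and the function names are placeholders, not any real provider’s API.

```python
import secrets
from typing import Dict, Iterable

# Placeholder domain; assumes a catch-all mailbox or alias service behind it.
ALIAS_DOMAIN = "alias.example"


def new_alias(domain: str = ALIAS_DOMAIN, nbytes: int = 8) -> str:
    """Return an address with a random local part that reveals nothing."""
    return f"{secrets.token_hex(nbytes)}@{domain}"


def assign_aliases(
    services: Iterable[str], domain: str = ALIAS_DOMAIN
) -> Dict[str, str]:
    """Give every service its own address; the mapping stays with the user."""
    return {service: new_alias(domain) for service in services}


aliases = assign_aliases(["twitter", "facebook", "forum"])
for service, address in aliases.items():
    print(f"{service}: {address}")
```

Because each local part is random and used once, an address exposed by one provider cannot be looked up in another provider’s leaked data.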

Phone numbers are different. It’s not practical for most people to use a different phone number for every service, although there is a healthy market for SIM cards and burner phones. However, Twitter doesn’t require a phone number, and it now offers multifactor authentication alternatives that do not involve sending a code over SMS. There’s no compelling reason to give Twitter a phone number.

Providing as little information as possible to a service provider is best since breaches are an ever-present problem. This incident shows that reusing an identifier even once, somewhere it is linked to a real identity, could lead to the unmasking of a Twitter account.