Scraped data of 2.6 million Duolingo users released

An unknown party has released the scraped data of 2.6 million Duolingo users on a hacking forum. The data set was offered for sale in January for $1,500, but it has now been released on a new version of the Breached hacking forum for 8 site credits, worth only $2.13.

Duolingo is an educational platform best known for its language-learning programs. According to a May 2023 press release, Duolingo has 72.6 million monthly active users.

Among other things, the scraped data contain email addresses, usernames, languages, and which languages the users are learning.

screenshot courtesy of FalconFeedsio

The data were scraped from public profile information using an exposed application programming interface (API). On March 2, a researcher called Ivano Somaini tweeted about how one could take advantage of Duolingo’s API to check if an email address is associated with a Duolingo account.

The API allows anyone to submit a username or an email address and confirm whether it is associated with a valid Duolingo account. Bleeping Computer has confirmed that this API is still openly available to anyone on the web, even after its abuse was reported to Duolingo in January.

A query by email address returns JSON-formatted data, revealing:

  • Streak – A user’s streak is a measure of how consistently they use Duolingo. A streak starts at zero and increases by one for each day the user completes a lesson.
  • Profile picture – For this field, Duolingo’s API yields a URL with this structure: //simg-ssl.duolingo.com/avatar/*******/*******. If you get //simg-ssl.duolingo.com/avatar/default_2, it means there’s no profile picture associated with the email address you’ve queried.
  • Learning languages, XP points and crowns – Duolingo’s API shows which courses the account has enrolled in. XP points and crowns give an idea of the progression on those courses. When you learn on Duolingo, you earn experience points, or XP for short.
  • hasFacebookId – Shows if the profile is associated with a Facebook account (true or false)
  • hasGoogleId – Shows if the profile is associated with a Google account (true or false)
  • id – Probably Duolingo’s user ID.
  • username – Username associated with the Duolingo account
  • hasPhoneNumber – Shows if the profile is associated with a phone number (true or false)
  • creationDate – This is a Unix timestamp (epoch time) that appears to show when the account was created.
  • name – The real name associated with the account.
  • Location – User location (unknown if it’s vetted by Duolingo)
  • emailVerified – Shows if the email address associated with the account was checked by Duolingo (true or false).
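
To make the fields above more concrete, here is a minimal sketch of how a single record of this shape could be read. The sample record is entirely made up, the "picture" key name is an assumption (the list above only describes the URL structure), and the summarize helper is hypothetical. It is not Duolingo’s actual response format, just an illustration of how much one record gives away.

```python
import json
from datetime import datetime, timezone

# Hypothetical sample record. Field names follow the list above where the
# article gives them; the "picture" key and all values are invented.
SAMPLE = """
{
  "id": 123456789,
  "username": "example_user",
  "name": "Example User",
  "picture": "//simg-ssl.duolingo.com/avatar/default_2",
  "hasFacebookId": false,
  "hasGoogleId": true,
  "hasPhoneNumber": false,
  "emailVerified": true,
  "creationDate": 1577836800
}
"""

# Flags that indicate a linked third-party account or phone number.
LINK_FLAGS = (
    ("hasFacebookId", "Facebook"),
    ("hasGoogleId", "Google"),
    ("hasPhoneNumber", "phone number"),
)


def summarize(record: dict) -> dict:
    """Turn one raw record into a short, human-readable summary."""
    # creationDate is described as a Unix timestamp (seconds since the epoch).
    created = datetime.fromtimestamp(record["creationDate"], tz=timezone.utc)

    # A URL ending in /avatar/default_2 means no profile picture was set.
    picture = record.get("picture")
    has_custom_picture = bool(picture) and not picture.endswith("/avatar/default_2")

    linked = [label for key, label in LINK_FLAGS if record.get(key)]

    return {
        "username": record.get("username"),
        "created": created.date().isoformat(),
        "has_custom_picture": has_custom_picture,
        "linked_accounts": linked,
        "email_verified": bool(record.get("emailVerified")),
    }


if __name__ == "__main__":
    print(summarize(json.loads(SAMPLE)))
```

Even this small summary ties an email address to an account age, a real name and linked services, which is exactly the kind of detail a phishing email can use.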

Troy Hunt of Have I Been Pwned (HIBP) explained why practically every one of the email addresses in the Duolingo data could already be found in the HIBP database: the scraper drew its candidate addresses from the large pool of previously breached data, which is now being used to compromise even more of our personal information. By trying millions and millions of addresses, the scraper found 2.6 million matches on Duolingo.

Troy Hunt added:

“I’m a Duolingo user but because I have a unique email address on every service, I’m not in there”

Even though most of the scraped data is publicly available, it gives cybercriminals yet another chance to correlate more information with a specific email address or name. Affected users should be wary of phishing emails making use of this information. For example, since you are interested in a certain language you might be more likely to fall for an email inviting you to visit a country where that language is spoken.

Protecting yourself from a data breach

There are some actions you can take if you are, or suspect you may have been, the victim of a data breach.

  • Check the vendor’s advice. Every breach is different, so check with the vendor to find out what’s happened, and follow any specific advice they offer.
  • Change your password. You can make a stolen password useless to thieves by changing it. Choose a strong password that you don’t use for anything else. Better yet, let a password manager choose one for you.
  • Enable two-factor authentication (2FA). If you can, use a FIDO2-compliant hardware key, laptop or phone as your second factor. Some forms of 2FA can be phished just as easily as a password, but 2FA that relies on a FIDO2 device can’t be phished.
  • Watch out for fake vendors. The thieves may contact you posing as the vendor. Check the vendor website to see if they are contacting victims, and verify any contacts using a different communication channel.
  • Take your time. Phishing attacks often impersonate people or brands you know, and use themes that require urgent attention, such as missed deliveries, account suspensions, and security alerts.
  • Set up identity monitoring. Identity monitoring alerts you if your personal information is found being traded illegally online, and helps you recover after.
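
As an illustration of the “let a password manager choose one for you” advice above, this is roughly what a generator does under the hood. It is a minimal sketch, not a replacement for a password manager: the generate_password name, the 20-character default and the 12-character minimum are assumptions chosen for the example.

```python
import secrets
import string

# Characters to draw from: letters, digits and punctuation.
ALPHABET = string.ascii_letters + string.digits + string.punctuation


def generate_password(length: int = 20) -> str:
    """Return a random password built with a cryptographically secure RNG."""
    # The minimum of 12 is an assumption for this sketch, not an official rule.
    if length < 12:
        raise ValueError("length should be at least 12 characters")
    # secrets.choice uses the operating system's secure randomness source,
    # unlike the random module, which is not meant for secrets.
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


if __name__ == "__main__":
    print(generate_password())
```

Generate a fresh password for every service and store it in the password manager rather than memorizing or reusing it.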