Unsecured Databases Leak 60 Million Records of Scraped LinkedIn Data
Eight unsecured databases were found leaking approximately 60 million records of LinkedIn user information. While most of the information is publicly available, the databases also contain the users' email addresses.
Approximately two weeks ago, I was contacted by security researcher Sanyam Jain of the GDI Foundation about something strange that he was seeing. Jain told BleepingComputer that he kept seeing unsecured databases containing the same LinkedIn data appearing and disappearing from the Internet under different IP addresses.
“According to my analysis the data has been removed every day and loaded on another IP. After some time the database becomes either inaccessible or I can no longer connect to the particular IP, which makes me think it was secured. It is very strange.”
Across all eight databases, there were approximately 60 million records in total, containing what appeared to be scraped public information of LinkedIn users. The eight databases total 229 GB, with each database ranging from 25 GB to 32 GB.
As a test, Jain pulled my record from one of the databases and sent it to me for review. The record contained my LinkedIn profile information, including IDs, profile URLs, work history, education history, location, listed skills, other social profiles, and the last time the profile was updated.
The profile also included the email address that I used when registering my LinkedIn account. It is not known how whoever compiled the data gained access to this information, as I have always had the LinkedIn privacy setting configured to not publicly display my email address.
After reviewing the data that was sent to me, I found all of the information to be accurate.
In addition to the above public information, each profile also contains what appear to be internal values that describe the type of LinkedIn subscription the user has and whether they use a particular email provider. These values are labeled “isProfessional”, “isPersonal”, “isGmail”, “isHotmail”, and “isOutlook”.
While we were not able to determine who the databases belonged to, we were able to contact Amazon, which was hosting the databases, for assistance in getting them secured. As of Monday, the databases were secured and are no longer accessible via the Internet.
LinkedIn states it’s not their database
After seeing that the database contained users’ email addresses and what appeared to be internal values, BleepingComputer contacted LinkedIn to see if the database belonged to them.
After they reviewed my sample record, Paul Rockwell, head of Trust & Safety at LinkedIn, told us that this database does not belong to them, but they are aware of third-party databases containing scraped LinkedIn data.
“We are aware of claims of a scraped LinkedIn database. Our investigation indicates that a third-party company exposed a set of data aggregated from LinkedIn public profiles as well as other, non-LinkedIn sources. We have no indication that LinkedIn has been breached.”
When we followed up with questions as to why the databases would contain my email, we were told that in some cases an email address could be public and were provided a link to a privacy page that allows you to configure who can see a profile’s email address.
My settings allow only 1st-degree connections to see my email address, so unless the scraper is posing as this type of connection, it is still not known how my email address was included in the database.