Vinny Troia has become something of a hero. This deep web researcher made a name for himself by exposing security gaps in systems all around the world. A few weeks ago, he discovered something that should never have seen the light of day.

An unprotected, fully accessible server containing the personal information of 1.2 billion people. The data included social network accounts (Facebook, Twitter, LinkedIn, GitHub…), email addresses, workplaces, and even personal phone numbers.

In total, around 4 terabytes of private information were exposed, enough for a cybercriminal to attempt to steal the identity of any of the 1.2 billion people affected. In short, this is one of the largest data leaks ever from a single source.

Of unknown origin_

As Wired explained, there are many unknowns surrounding the origins of these databases, as well as how they were made public. At first, Vinny Troia thought the data could belong to People Data Labs, since some of the folders were tagged with the initials PDL. He also pointed to Oxydata as another possible source, due to the OXY tag on other folders.

In any case, both companies have stated that they are unaware of having suffered any kind of intrusion like this. They do, however, recognize that there is another possibility: that the databases belong to them, but were stolen from one of their many clients all around the world, many of which are in the United States.

The database has been uploaded to HaveIBeenPwned, the platform run by the researcher Troy Hunt, where users can check whether their personal data has been involved in a data breach. The leak was so far-reaching that Hunt himself claims to have found his own information on one of the servers.

According to Wired, much about the data remains to be discovered, and one vital question is still open: how did this database get out there? The most likely explanation offered so far is that the information was obtained legally from data enrichment companies. The illicit steps, in this case, would be taking the database from the systems of whoever paid for it legally, and then sharing it.

Whatever tactic was used to obtain all this information, which was subsequently shared on the deep web, one thing is clear: the company that owns the servers not only failed to secure them, but also failed to show due diligence in storing the personal data of 1.2 billion people.

How to avoid these leaks_

This was perhaps one of the most striking cases of the past few years, but the fact remains that any company that handles the data and private information of its customers, users, providers or employees runs the risk of suffering a cyberattack that leads to a smaller, yet similar, data breach.


To avoid leaks of this kind, companies should take a set of measures to protect the data they hold:


  1. Different servers. When a company stores this much user information, it should be kept in a disaggregated format on different servers. This way, if attackers manage to gain access to the company, they will only be able to get their hands on part of the information.
  2. Offline servers on differentiated instances. If the information is effectively distributed over several servers, the next level of isolation is to separate their instances and place them on different networks. If a further level of isolation is needed, intermittent Internet disconnection protocols can be established to hinder any attempted intrusions.
  3. Limited access. Mid- or low-ranking employees must never be able to access the personal information of thousands or even millions of users. These kinds of employees are often the weakest link in a company’s cybersecurity chain.
  4. Access and traffic control. There will be cases where some of the measures outlined above cannot be applied. In these cases, every server should be subjected to exhaustive, automated, real-time monitoring that constantly analyzes its activity, so that any problem is detected before it can cause damage.
  5. Activity control. If we see and record everything that happens on both workstations and servers, we can follow the entire course of a possible attack. This record must include the profiling and classification of all behaviors of files, users, and machines in order to keep track of all normal and suspicious activity.
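As a rough illustration of the first and third measures, the sketch below splits each record into field groups held in separate stores and only reassembles the groups a caller is allowed to read. It is a minimal sketch under stated assumptions, not a description of any particular product: the group names, the `split_record` and `load_record` helpers, and the in-memory dictionaries standing in for separate servers are all hypothetical.

```python
import secrets

# Hypothetical field groups; in practice each group would live on its own server.
FIELD_GROUPS = {
    "identity": ("name", "email"),
    "contact": ("phone",),
    "social": ("linkedin", "github"),
}


def split_record(record, stores):
    """Write each field group to its own store under a shared random ID."""
    record_id = secrets.token_hex(16)
    for group, fields in FIELD_GROUPS.items():
        stores[group][record_id] = {f: record[f] for f in fields if f in record}
    return record_id


def load_record(record_id, stores, allowed_groups):
    """Reassemble only the field groups the caller is allowed to read."""
    merged = {}
    for group in allowed_groups:
        merged.update(stores[group].get(record_id, {}))
    return merged


# In-memory dictionaries stand in for separate servers (an assumption).
stores = {group: {} for group in FIELD_GROUPS}
rid = split_record(
    {
        "name": "Ada Example",
        "email": "ada@example.com",
        "phone": "555-0100",
        "linkedin": "ada-example",
        "github": "ada-ex",
    },
    stores,
)

# A caller limited to the "contact" group sees only the phone number.
partial = load_record(rid, stores, ["contact"])
```

Someone who compromises only the `contact` store obtains phone numbers without names or email addresses. Because this sketch reuses one ID across all stores, an attacker holding two stores could still join them, so a real design would also need to protect that mapping.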

In fact, this is the same approach used by the managed Zero-Trust Application Service. This service is exclusive to Cytomic, and is included as a feature in all our advanced security solutions. It ensures that only trusted applications and binaries can run on the endpoint. It therefore guarantees that attacks using malware of any kind are simply halted before they can do any damage.
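The general idea of letting only trusted binaries run can be sketched with a hash-based allowlist. This is a generic illustration of default-deny allowlisting, not a description of how the service above is implemented; the `Allowlist` class and `may_execute` function are hypothetical names, and matching on SHA-256 digests is an assumption chosen for simplicity.

```python
import hashlib


def sha256_of(data: bytes) -> str:
    """Return the hex SHA-256 digest of a binary's contents."""
    return hashlib.sha256(data).hexdigest()


class Allowlist:
    """A set of digests for binaries that have been classified as trusted."""

    def __init__(self, trusted_hashes):
        self._trusted = set(trusted_hashes)

    def is_trusted(self, binary: bytes) -> bool:
        return sha256_of(binary) in self._trusted


def may_execute(allowlist: Allowlist, binary: bytes) -> bool:
    # Default deny: anything not explicitly trusted is blocked.
    return allowlist.is_trusted(binary)


known_good = b"trusted program bytes"
allowlist = Allowlist([sha256_of(known_good)])

allowed = may_execute(allowlist, known_good)
blocked = may_execute(allowlist, b"unknown program bytes")
```

The design choice worth noting is the default: an unknown binary is blocked until it is classified, rather than allowed until it is found to be malicious.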

Our complete EDR capabilities monitor endpoint activity, and offer continuous, comprehensive and detailed visibility of the behavior of programs, endpoints, and users. This monitoring is the basis of the Zero-Trust Application Service.
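To make the idea of activity monitoring more concrete, here is a minimal sketch that records every read of personal data and flags a user who reads unusually many records within a short time window. It is an illustration only and makes no claim about how any EDR product works; the `AccessMonitor` class, the threshold, and the window length are all hypothetical.

```python
from collections import defaultdict, deque


class AccessMonitor:
    """Log every read and flag users who exceed a read threshold per window."""

    def __init__(self, max_reads, window_seconds):
        self.max_reads = max_reads
        self.window = window_seconds
        self._events = defaultdict(deque)  # user -> recent read timestamps
        self.log = []                      # full record of all activity

    def record_read(self, user, timestamp):
        """Record one read; return True if the user's activity looks suspicious."""
        self.log.append((timestamp, user))
        events = self._events[user]
        events.append(timestamp)
        # Drop reads that have fallen out of the sliding window.
        while events and timestamp - events[0] >= self.window:
            events.popleft()
        return len(events) > self.max_reads


# Hypothetical policy: more than 3 reads within 60 seconds is suspicious.
monitor = AccessMonitor(max_reads=3, window_seconds=60)
flags = [monitor.record_read("alice", t) for t in (0, 1, 2, 3)]
later = monitor.record_read("alice", 100)
```

The fourth read inside the window is flagged, while a single read much later is not. A fixed threshold is the simplest possible rule; profiling normal behavior per user or per machine, as described above, would replace it in practice.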

Data breaches are still one of the most pressing concerns for organizations, whether for reputational or financial reasons, or because of worries about being accountable to customers and end users. Dealing with cyberattacks is not an easy task: they have grown considerably in complexity, and most of them now change continually to get around the obstacles they come across. This is why it is vital that companies improve their cybersecurity posture, implementing advanced tools and services that get ahead of and mitigate attacks, as well as analyzing them to discover the details and patterns they conceal.