The British royal family safeguards its crown jewels, a collection of more than 100 objects held by the monarchy and estimated to be worth billions, with world-class security, including two-ton steel doors and bomb-proof glass. When the crown jewels are moved, such as for a royal wedding or the recent coronation ceremony, they are guarded closely by elite, ex-military personnel at all times, with access restricted to only a few security-cleared individuals.
These anecdotes are not an expression of affection for the royal family; they illustrate that securing the crown jewels and controlling access to them are immediate, constant concerns. The same needs to be true for your organization’s “crown jewels”: the production data that powers your mission-critical, day-to-day business processes and the core applications your business relies on every day.
Organizations typically create staging environments so that developer teams can write code and run tests, ensuring the applications they ship are high quality before they reach customers. However, these test environments can present real risks when organizations populate them with their “crown jewel” production data.
Real-world data containing customer information—such as Social Security numbers, financial information, addresses, and more—or other proprietary data is core to how a business operates.
Using this production data in test environments needlessly puts it at risk of being mishandled in transit, stolen, or leaked.
To return to the royal treasure analogy, this would be like letting a mall security guard move the crown jewels from one location to another.
So, what is the most effective way to avoid exposing your critical data while still building high-quality applications? By using dummy data, dummy.
A company should rarely need real production data to develop applications. It simply needs information that looks, acts, and is processed like production data. Dummy data is safer because there is no real customer information in the environment to be inappropriately accessed in the first place.
One of the most common ways companies obtain dummy data for their test environments is through test data generator services or API mocking tools, which automatically create records you can use in place of production data. While that sounds easy, these services usually come at a cost, and you will still need to create and supply a sample data set for the generator to work from. If you would rather create dummy data in-house, you can also seed empty databases in your test system with fabricated records that mirror the structure and statistical shape of your production data, as in the sketch below.
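As a minimal sketch of the in-house approach, assuming the open-source Python library Faker is available (the column names and row count below are illustrative, not prescribed by any tool mentioned above):

```python
# Minimal sketch: generating dummy customer records with the open-source
# Faker library (pip install faker). Column names and row count are
# illustrative assumptions for this example.
from faker import Faker

fake = Faker("en_US")
Faker.seed(42)  # seed the generator so the test data set is reproducible


def generate_customers(count: int = 100) -> list[dict]:
    """Return fabricated customer rows shaped like a production table."""
    return [
        {
            "name": fake.name(),
            "ssn": fake.ssn(),  # fabricated value in SSN format, not a real SSN
            "address": fake.address().replace("\n", ", "),
            "email": fake.email(),
        }
        for _ in range(count)
    ]


if __name__ == "__main__":
    for row in generate_customers(3):
        print(row)
```

Seeding the generator makes the data set reproducible, so a failing test can be replayed against exactly the same fake records.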
Unfortunately, using production data in staging environments is still common because it is easier and less expensive than creating dummy data. Given that production data is often an organization’s most important asset, it can be shocking that even security-forward firms still choose to use it in test environments. Despite all the warnings and enterprise security risks, your organization may still decide it needs production data to build an application at the lowest cost and in the shortest timeframe.
Should you absolutely need to transfer and use production data in your test environments, there are a handful of best practices to consider. The first is industry-recommended data encryption. Many organizations turn to NIST-approved format-preserving encryption, such as the FF1 and FF3-1 modes defined in NIST SP 800-38G and its revision, to de-identify production data while preserving its length and character set.
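As a hedged illustration, the sketch below uses the open-source ff3 Python package, one implementation of the FF3-1 mode; the key and tweak are throwaway example values, and the exact constructor arguments may differ between package versions, so treat this as a sketch rather than a drop-in solution.

```python
# Hedged sketch: format-preserving encryption of a numeric identifier using
# the open-source `ff3` package (pip install ff3), an FF3-1 implementation.
# The key and tweak are throwaway example values; the constructor signature
# and accepted tweak length (56-bit vs. 64-bit) may vary by package version.
from ff3 import FF3Cipher

key = "EF4359D8D580AA4F7F036D6F04FC6A94"  # 128-bit AES key (hex), example only
tweak = "D8E7920AFA330A"                  # 56-bit tweak (hex), example only

cipher = FF3Cipher(key, tweak)             # default radix of 10: digits only

ssn = "123456789"
encrypted = cipher.encrypt(ssn)            # same length, still all digits
decrypted = cipher.decrypt(encrypted)

print(encrypted)
assert decrypted == ssn
```

Because the ciphertext keeps the same length and character set as the input, downstream code that validates or parses the field should continue to work in the test environment.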
Alternatively, obfuscating or masking data is another technique organizations should consider. Masking produces data that is structurally similar to, but not the same as, the original. The goal is a version of the original information that cannot be deciphered or reverse-engineered, although a slim risk of re-identification still exists.
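For illustration only, here is one simple deterministic masking approach in Python; the field names and masking rules are assumptions for this example, not a standard. Deterministic masking keeps joins between tables consistent, but hashing low-entropy fields such as Social Security numbers can be brute-forced, which is exactly the slim residual risk noted above.

```python
# Illustrative sketch of deterministic data masking: values keep their shape
# (same length and character classes) but no longer reveal the originals.
# Field names and rules are assumptions for this example.
import hashlib


def _digest_digits(value: str, n: int) -> str:
    """Derive n digits deterministically from the input via SHA-256."""
    h = hashlib.sha256(value.encode()).hexdigest()
    return str(int(h, 16))[:n].rjust(n, "0")


def mask_ssn(ssn: str) -> str:
    # Preserve the NNN-NN-NNNN shape, but derive new digits from a hash.
    digits = _digest_digits(ssn, 9)
    return f"{digits[:3]}-{digits[3:5]}-{digits[5:]}"


def mask_email(email: str) -> str:
    # Keep the domain (often needed for routing tests), replace the local part.
    local, _, domain = email.partition("@")
    return f"user{_digest_digits(local, 6)}@{domain}"


record = {"ssn": "123-45-6789", "email": "jane.doe@example.com"}
masked = {"ssn": mask_ssn(record["ssn"]), "email": mask_email(record["email"])}
print(masked)  # deterministic: the same input always yields the same masked value
```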
It’s critical to reduce the risk of exposing production data whenever possible. The potential consequences of not doing so are too great.