Learn about some of the DevOps best practices for data with Mark Semenenko, Solutions Engineer at Privitar.

DevOps Best Practices for Data

[0:10] Why do organizations need data for DevOps? 
When organizations are implementing new systems, building new software applications, or developing new data products, all of them need to be tested. And to do that you need data. Most organizations today would use production or raw data for this, but that’s really not appropriate for test and development.

[0:40] What are the risks of using raw data for DevOps?
Much of your production or raw data is going to be of a sensitive nature. This can lead to delays in obtaining that data in the first place, because you have to go through compliance. It can lead to an increased emphasis on security, which means no more using “password” with a zero for everything. And finally, due to data minimization under the GDPR, if you don’t have the consent of the data subject, it might preclude you from using that data at all.

[1:20] Why isn’t synthetic data appropriate for DevOps?
Synthetic data doesn’t solve the problem either. The real challenge is in creating the scale of data needed to test your systems, and in generating the complexity that you would find within raw production data. It is also hard to test things like error conditions, which you might find in raw production data but don’t manage to accurately synthesize in your test data.

[1:48] How can technology reduce the risk of using sensitive data for DevOps?
Simple data masking techniques don’t sufficiently address privacy concerns, which makes it very difficult to implement adequate protection in house. Instead, by implementing privacy-enhancing technologies and combining data masking, generalization and Protected Data Domains, you can produce high-privacy, high-utility data sets that maintain a realistic data profile.
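
As a rough illustration of what combining masking and generalization can look like, here is a minimal Python sketch. It is not Privitar's implementation and does not model Protected Data Domains; the field names, the key, and the helper functions (`mask_token`, `generalize_age`, `generalize_postcode`, `protect`) are all hypothetical and chosen only for this example.

```python
import hashlib
import hmac

# Hypothetical key for illustration only; a real key would come from a secrets store.
SECRET_KEY = b"example-key-not-for-production"


def mask_token(value: str, key: bytes = SECRET_KEY) -> str:
    """Mask a direct identifier with a keyed, consistent token (HMAC-SHA256).

    The same input always yields the same token, so joins across tables
    still work in a test environment.
    """
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return "tok_" + digest[:12]


def generalize_age(age: int, band: int = 10) -> str:
    """Generalize an exact age into a band such as '30-39'."""
    low = (age // band) * band
    return f"{low}-{low + band - 1}"


def generalize_postcode(postcode: str) -> str:
    """Keep only the part of the postcode before the first space."""
    return postcode.split(" ")[0]


def protect(record: dict) -> dict:
    """Return a copy of the record with identifiers masked and quasi-identifiers generalized."""
    return {
        "customer_id": mask_token(record["customer_id"]),
        "age_band": generalize_age(record["age"]),
        "postcode_area": generalize_postcode(record["postcode"]),
        "balance": record["balance"],  # left as-is to keep the data profile realistic
    }


raw = {"customer_id": "C-1001", "age": 37, "postcode": "SW1A 1AA", "balance": 1250.75}
protected = protect(raw)
print(protected)
```

The design choice here is that the identifier is replaced consistently, so relationships between records survive, while age and postcode lose precision but keep a realistic shape. On its own this is still close to the simple masking described above, so treat it as a starting point for discussion rather than a sufficient privacy control.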