
Episode 8: The Path to Modern Data Provisioning

Jason du Preez, CEO of Privitar, joins the show to discuss modern data provisioning and how it fits into the larger picture of data privacy.


Speakers

Nick Curcuru

VP of Advisory Services at Privitar

Jason du Preez

Chief Executive Officer at Privitar

Transcript

00:01

Jason: We now know how powerful data can be in terms of driving insight and prediction. But as you say, we mustn’t forget that those data points that we feed into these algorithms relate to people.

00:12

Nick: Welcome to InConfidence, the podcast for data ops leaders. In each episode, we ask thought leaders and futurists to break down the topics and trends concerning IT and data professionals today, and to give us their take on what the data landscape will look like tomorrow. Let’s join the data conversation. I’m Nick Curcuru, and this is the InConfidence podcast, sponsored by Privitar. InConfidence is a community of data practitioners, and they encourage us to have conversations that will enlighten, educate, and inform the data leaders of today and tomorrow. Thank you for taking the time for InConfidence to be a part of your day. Today, joining our community is Jason du Preez. Jason is the CEO of Privitar, and for over six years he has been spreading the good word on the importance of data privacy. Today, we get to hear straight from one of our privacy pioneers, who has been, and still is, shaping the market. In this episode, we will hear how his commitment to privacy motivated him to start Privitar together with co-founder John Taysom, how the world’s perspective on privacy has evolved from those early years, and the trends shaping its future. Jason, welcome to the InConfidence community.

Jason: Great, Nick, wonderful to be here. 

Nick: Okay, Jason, let’s get right into it. Let’s talk about, you know, that commitment that you started to make to privacy, how you got there, and how you actually started to think about creating prototypes, and why. So, just a little bit about that founding story for us?

Jason: Yeah, sure. Well, I’ve always worked in software and technology. And in 2013, I was working at Thomson Reuters, a company that was effectively monetizing data and data platforms. And I was cognizant of the fact that data platforms were rapidly evolving; there was a massive disruption and transformation happening in the market. And in particular, at the time, we were starting to see the realization of this promise of machine intelligence, of machine learning. We were starting to realize that promise, and really, at the time, machine learning theory was a decade and a half old, and we had seen this massive reduction in the cost of computing. So what was the big change at the time? Well, it was really this explosion of data, both human-generated and collected. And that was fueling the machine and bringing it all to life. And that’s what made this whole thing so timely: the sudden, massive explosion of data and, in particular, what we call big data, which is actually for the most part data about people. And with the emergence of big data, the importance of data-driven decision making became increasingly more obvious. We were suddenly seeing the power of big data in terms of making predictions. And then alongside this data imperative, you know, which was this massive force, we were also cognizant of, I guess, the slippery slope that we were on, with many firms, particularly technology firms, abusing and exploiting that data collection. My co-founder, John Taysom, had actually published a paper in 2012 outlining, you know, what he perceived as the many perils associated with this power imbalance between technology firms, governments, and data subjects. And he called, in that paper, for better ways to leverage data, but to do so in a responsible way, with respect for data subjects. And so, serendipitously, we were brought together by a mutual friend, and it started to come together. We saw these two major tectonic forces: digital transformation and, of course, the burgeoning appreciation of the importance of responsible data use. And, you know, we put our heads together and incorporated Privitar. And that was the beginning of the story, back in 2014.

Nick: I really like the way that, when you talk about this, you make the connection between zeros and ones, which is data, and people. And I think that’s an important connection, and even today, when I hear you talk, whether it’s to an executive or to a data engineer, it’s that perspective, making it personal as you work through this.

Nick: So when you think of that start, you know, what were some of those early conversations like, talking to people about that thought process that zeros and ones are not just zeros and ones anymore, they are actually people? What were those early conversations that you had, and what were some of the things that you had to overcome to get people into that mindset?

05:12

Jason: I think this is such an important point, Nick. And, you know, it speaks to the whole reason behind the importance of privacy. Now, I mean, so much has happened over the past half a decade or so. But we now know how powerful data can be in terms of driving insight and prediction. But as you say, we mustn’t forget that those data points that we feed into these algorithms relate to people. And people, those are real human beings, with lives, with a right to self-determination. And so privacy is really important here. I think the important point is that when we are observed, you know, when we are watched, our freedom of expression, our ability to explore ideas, these things are actually quite significantly inhibited, and that affects our ability to reach conclusions. You know, often, if you’re in a conversation, you might explore ideas, you might move from one perspective or, you know, one opinion to another; that’s part of the debate, part of the discussion process. And, personally, I believe that we all need space for that to happen. We need the space for freedom of thought and expression, without observation. And that includes observation from big technology companies, or governments, or any of the other brands that, you know, we engage with in our daily lives. And so building mechanisms into the fabric of the technology infrastructure that is used to deliver these services feels like an essential piece in a free society. And I guess, if we go back to when we started, you know, one of the biggest challenges that we had was just the general lack of awareness of privacy as a concept. I think that has changed very quickly over the years. But at the time, these were very nascent concepts, these were very early ideas. And a lot of privacy concepts and ideas originated in Europe; across the pond in the US, this was really quite an alien concept. But that changed pretty quickly. And we’ve seen the temperature rise very significantly around privacy and privacy topics over the last six, seven years, as I’m sure you can appreciate.

Nick: Yeah, I mean, the big thing is, when you talk about those early conversations, I think part of the challenge that we’ve seen in the last five or six years is that discussion around, well, there’s data security, you know, locking it down, or maybe cybersecurity was the term five or six years ago. But then privacy came in.

Nick: Can you just talk a little bit about how, even though they’re similar, there are nuances that make privacy different from, let’s say, data security or cybersecurity?

Jason: Yeah, I think in simple terms, you know, there is quite an important distinction, and we can make that distinction, although I think the two do work very intrinsically together. And particularly, as we start to look at the world of data and data use, really the two go hand in glove; data security and data privacy are intrinsic concepts that need to work very tightly, closely coupled together, in what we would call today the modern data stack. But at the highest level, I think we can think of data security controls as controls that prevent or protect information, sensitive data, and so on, from the outside. So it’s all about preventing unauthorized access. And privacy, on the other hand, is really mostly concerned with the risks that occur when you have authorized access. So, you know, what is the sensitive information that can be revealed to the authorized user? And so we need to combine these two concepts. We need to make sure that, first of all, we’ve appropriately authorized access to data, and we only have the right people or the right groups with that access. And then, secondarily, we need to look at the privacy components and say, you know, even though this person has access to this information, really, what do they need from this data? What is the insight that they’re looking for? And ensure that we expose no more than is required for that particular operation. So, if we’re building a predictive model over a large consumer data set, we don’t need to expose the identifying characteristics of the individuals in that data set. And this is what we mean by data minimization principles: how do we expose only the least amount of information that is required in a particular context? And that’s really what privacy is most concerned about. If we can differentiate between the data that is useful in the context and the data that we want to expressly prevent from being visible in that context, then we’re really talking about a privacy problem. Obviously, if the data that I need in a particular context or use case is also the data that I’m concerned about, then we’re looking at more of a data security problem, if that makes sense.

Nick: It does. And what I like about that is, you know, most people, or most organizations, and hopefully you have an opinion on this, take it from a data security side, where they’re just locking down data and information, you know, stopping that access, which really limits the ability for someone to use that data to create insights, which is what you’re describing, right? What is the context? What is the purpose? How do we utilize the data and information that we have in order to drive better insights, or to better service a customer, or to do the things that we want to be able to do? And I think, whenever I’ve heard you speak in the past, there’s that distinction: we’ve got to start to unlock some of that data in a safe manner. And I think that’s been a big part of the evolution I’ve heard from you: hey, you know, stop the locking down, only 35% of the data being used is the stat I’ve heard, and let’s start to utilize it.

Jason: And I think that’s an important point. I think privacy, as a concept, I mean, if you look at it from a range of perspectives, and as I said, when we started it was an alien concept, a burgeoning concept, to many. I think what’s happened since then is we’ve seen a lot more visibility around data breaches, like Cambridge Analytica, which is the largest privacy breach, but we’ve also seen privacy champions, people like Tim Cook, the CEO at Apple, stand up and say that we can do great things with data without infringing on privacy. And the regulations that have come in, the GDPR obviously being the sort of gold standard here: what these regulations aim to achieve is not to stop or hinder the use of data in a digital society. Really, they’re there to enshrine best practice and principles, with the express purpose, actually, of reducing friction in data usage and, as you say, enabling us to actually unlock deeper insights and create less friction in the exchange of information in this ecosystem. And so, yes, absolutely, in simplistic terms, a security control can be pretty binary: you either see the data or you don’t. And so, historically, that drives a tendency, when it comes to particularly sensitive or personal datasets, to want to lock those down, to play it safe, to lean towards a conservative perspective. And so access control, in that sense, can be a little coarse-grained. Whereas if we actually say, you know, every context is different, each use case we need to evaluate, then this is a risk management exercise. So, what is the benefit? And what is the risk? And how do we balance these two things out? And then what privacy controls can we use to mitigate those risks? And we might say, in a particular context, that we can only work with aggregate data, for example, or, in a different context, that we have to send our code to the data, we can’t get the data at all, we have to actually do the analysis in a fully federated way. Each of these controls, each of these mechanisms, allows us to open up or expose additional data to our analytics, to our machine learning, and really broaden the landscape of information that we have available to these data-driven decision making processes. So it’s very powerful in that sense, right?
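
To make the data minimization idea above concrete, here is a minimal sketch in Python of exposing only what a particular purpose needs. It assumes hypothetical field names, purposes, and masking rules; it illustrates the principle, not Privitar’s product or any particular implementation.

```python
import hashlib

# Hypothetical customer records; field names are illustrative only.
RECORDS = [
    {"name": "Alice Smith", "email": "alice@example.com", "age": 34, "spend": 1200.50},
    {"name": "Bob Jones",   "email": "bob@example.com",   "age": 45, "spend":  310.00},
]

# What each purpose is allowed to see. Identifiers are excluded from the
# modelling context entirely: the "least information required" idea.
PURPOSE_POLICIES = {
    "predictive_modelling": {"allowed": {"age", "spend"}, "pseudonymise": set()},
    "customer_support":     {"allowed": {"name", "email", "spend"}, "pseudonymise": {"email"}},
}

def pseudonymise(value: str) -> str:
    """Replace a direct identifier with a stable token (illustrative, not reversible here)."""
    return hashlib.sha256(value.encode()).hexdigest()[:12]

def minimise(records, purpose):
    """Return only the fields this purpose needs, masking where the policy requires it."""
    policy = PURPOSE_POLICIES[purpose]
    out = []
    for rec in records:
        view = {k: v for k, v in rec.items() if k in policy["allowed"]}
        for field in policy["pseudonymise"]:
            if field in view:
                view[field] = pseudonymise(view[field])
        out.append(view)
    return out

if __name__ == "__main__":
    # The modelling context never sees names or emails at all.
    print(minimise(RECORDS, "predictive_modelling"))
    # Support sees names, but emails only as stable pseudonyms.
    print(minimise(RECORDS, "customer_support"))
```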

Nick: Really, you know, our goal is to make all data available in some way. That doesn’t mean, you know, in its raw, identifiable form; it just means available to a process, in order to leverage that information to create value. If I may, in building that and doing that…

15:00

Nick: So, what are some of the best practices, or even maybe some of the pitfalls, that you have seen organizations go through? Because I know you’ve worked with a lot of organizations trying to build that type of culture, that type of data-driven decision making, with privacy as part of their design, privacy by design. What are some of those tips or tricks, or whatever you want to call them, best practices, that you could share?

Jason: Yeah. It’s been really interesting. I mean, we’ve certainly seen the landscape evolve very rapidly in recent years. And, you know, one can’t have this discussion without referencing the incredible impact of the pandemic. So, at the very highest level, there’s been a huge amount of transformation and change, and I think some very valuable lessons learned over the past couple of years. Certainly, it’s no secret that digital transformation has accelerated through the pandemic. But I think one of the most important lessons that we’ve learned is really a reinforcement of the importance of data. And, you know, it kind of goes without saying, data privacy is of no interest if data is of no interest, right? So, first and foremost, we’ve got to look at this landscape and agree that data is important. And the moment data becomes important, the moment we make a commitment to leveraging data to derive insight, then data privacy becomes really important. And if we use the pandemic as an example, data has been one of the most important things, right? I mean, understanding the impact of what was happening, planning the response, the development of therapeutics, the development of the vaccines, continued planning exercises: all of these things have been fueled and driven by data. And I think, for many countries, for many health services, at the early onset of the crisis there was this realization that data wasn’t available in the way that it needed to be, right? There was a lot of sand in the gears, and there were some quite awkward starts to this process. But, by and large, the need drove very rapid transformation and change, and we’re in much better shape. But what it taught us is that, really, this needs to be the new normal. I mean, we need data available in this way to make decisions. And, in parallel, we also learned that the respect for the data subject, and people’s expectations in terms of privacy, are also super important. You know, we learned that lesson the hard way with contact tracing. But I think the lesson that we need to carry through, as we exit the pandemic, is really that it’s not good enough to just make that data available. We need to ensure that the way we’ve done that, the methods we’ve deployed, the way we’ve protected the data, people’s private and sensitive information, that is going to come under a ton of scrutiny. And if we haven’t done the right thing, then, again, we see backlash, we see people withdrawing consent, withdrawing their data from the ecosystem, which just sets us back years and years in terms of progress. So I think the big lesson that we’ve learned through the pandemic is really that data is so important, it needs to be there, we need to have that friction-free, efficient access to information. But we also need to ensure that the way we protect privacy, and the responsible use of data, is just intrinsic in these systems. It’s no longer a luxury; it’s a must-have, a need-to-have, and it just needs to be built in as part of the infrastructure going forward.

Nick: Is that what Privitar has done with health organizations through the pandemic, what you just described: making that data safe and available to share? And I think the last part, when you talk about that trust: if you can prove that you are handling data safely and doing the right thing with it in order to, you know, help the larger population, the trust starts to come, because then people are able to see some of those things. And building that trust is an extremely important thing when it comes to privacy. Would you agree with that?

19:47

Jason: Absolutely. I mean, the bottom line is, and I think we can say that it’s universally recognized now, that we need to maintain trust with our customers, with consumers, with patients, whatever it might be, and that’s critical to long-term success. You know, the respect for data subjects and individual privacy just needs to be intrinsic, and has become intrinsic, to digital transformation initiatives, and the long-term winners have made and will continue to make strong commitments and investments in privacy and security, and will ensure that those are embedded systematically into infrastructure. You know, it’s what we’re starting to call modern data provisioning.

Nick: So, you talk about modern data provisioning, and this is the evolution of the trend that you’re starting to see. Can you let the audience know a little bit about what modern data provisioning is, and what it’s going to take to get an organization there, so they are achieving this next evolution within privacy as we look at it?

Jason: So, yeah, I think, first of all, for context, I think we need to accept that every business today is, in some way, shape, or form, a data business, regardless of industry or vertical, whether it’s retail, hospitality, or healthcare. Data is key to organizations and is really the lifeblood of digital transformation, which I would argue, in turn, is really critical for an organization’s survival in today’s world. So we accept that organizations are increasingly data-driven. In parallel, data protection, or data privacy, has risen in prominence, as we’ve discussed, both to ensure the free flow of data in the digital ecosystem and also in defense of civil liberties. So we’ve got these two major tectonic forces, digital transformation and data privacy and data protection, colliding. And the modern enterprise data stack has emerged out of that collision, against that landscape, to enable organizations to effectively use data and leverage data as an asset. So it’s a function of those things. And that stack is broader than just data privacy. It means ensuring data comes direct from source systems, to ensure consistency at the business layer, avoiding the need for the complex web of data warehouses and unsupportable derived data and calculated fields that is the legacy in many organizations. It means ensuring that data is easy to discover and gain access to for the data users in the organization, and indeed even outside the organization; I think we’re going to see data sharing as a key feature of this landscape as well. And the idea here is that these consumers of data are the ones that create value, right? So if we want to turn data from a liability into an asset, because historically we’ve seen data as a liability, we’ve locked it away in underlying systems, we’re concerned about breach, we’re concerned about fines. If we want to turn that data into an asset, that means making it more broadly available, and the more consumers that get access to that information, the more value we can create. And this means ensuring that as data is used, as new data is created, as machine learning models are launched to drive decision making, the complex web of rules and regulations that we need to abide by is enforced, and done so in a very efficient, transparent way.
And importantly, not just enforced, but also evidenced, so that we can show that these controls have been applied appropriately, for later scrutiny by the regulator and so on. So, at a high level, we accept that data protection means nothing if it’s not effectively operationalized, and it needs to be intrinsic to the modern enterprise data stack. So how do we bring this ecosystem together? If you imagine these workflows: we have the data consumer, getting access to data, creating value with data, creating a return on data as an asset. We’ve got the data guardians, who are responsible for the application of policies, for interpreting laws and regulations and ensuring that they’re appropriately interpreted and applied within an organization. And of course, we’ve got the data owners, responsible for curation, often carrying the risk of those data sets. And then there’s this really quite complex interplay between all of these groups, and modern data provisioning is really the combination of people, process, and technology required to enable responsible data use across that ecosystem.

Nick: I like that approach, because when you think about it, each one of those, the guardians, the owners, and the consumers, in many cases in the last two or three years have still been working kind of in silos. So this whole modern data provisioning trend is bringing those groups together. But also, what I’m hearing you say is making sure that what’s important to each one of those groups is actually addressed. And it’s the stack that helps address that, whether it’s “I want to find the data” if I’m a consumer, or how we are protecting that data, and being able to prove we’re protecting that data, for the guardian, right? And then there’s the data owner who’s saying, yeah, I’m responsible for this, I want to make sure people are using this data appropriately.

Nick: I really, really like the way you put it, in that all three of those roles are coming together, and doing so very efficiently.

Jason: Yeah, I mean, it’s like, how do we do this quickly, right? Because we can’t afford for our consumers to be waiting around for data, and we can’t afford for there to be a lack of clarity around the controls that need to be enforced around that usage. And so, as we bring those groups together, and we really leverage technology, we can drive a lot of efficiency into that process, we can drive a lot of automation into that process, which means that we create this environment where our consumers get, effectively, self-service access to data, instant access to data, with all those controls and policies seamlessly, transparently enforced at the point of consumption. And that’s where it all comes together. That’s the magic.
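
As one hedged sketch of what “controls enforced, and evidenced, at the point of consumption” could look like in code, the snippet below grants or denies a self-service data request from a policy table and records every decision in an audit trail. The dataset names, roles, and decision values are assumptions made for illustration, not a description of any particular product.

```python
import datetime
import json

# Illustrative policy table: which role may consume which data set, and in
# what form. In a real stack this would be maintained by the data guardians
# in a policy layer, not hard-coded alongside the application.
POLICIES = {
    ("claims_2023", "data_scientist"): "de_identified",
    ("claims_2023", "care_team"): "full",
}

AUDIT_LOG = []  # the evidence trail kept for later scrutiny

def request_data(dataset: str, role: str) -> str:
    """Self-service request: the decision is made and evidenced at consumption time."""
    decision = POLICIES.get((dataset, role), "deny")
    AUDIT_LOG.append({
        "when": datetime.datetime.now(datetime.timezone.utc).isoformat(),
        "dataset": dataset,
        "role": role,
        "decision": decision,
    })
    if decision == "deny":
        raise PermissionError(f"{role} is not authorized to consume {dataset}")
    return decision  # in a real system, data would be served in this form

if __name__ == "__main__":
    print(request_data("claims_2023", "data_scientist"))  # -> de_identified
    print(json.dumps(AUDIT_LOG, indent=2))                # what an auditor would review
```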

26:51

Nick: You know, as you take a look at that magic that’s happening, I know that you’re working really closely with everyone from financial institutions to healthcare institutions to enable that data. When you take a look at it, are there any things that an organization should be looking for when they want to try to make this evolution, this movement to modern data provisioning? Are there certain considerations, a common theme you’re seeing as you’ve been working with these executives across these industries, that you could share?

Jason: Yeah, I mean, I think the consumer, the data consumer, is key. We need to first and foremost serve the needs of that data user, that data consumer. That means we need to accommodate a broad range of consumption patterns or use cases. You know, if you’re a data user, you want to use data in the environment that you’re comfortable in; you don’t want to have to move between loads of different technology stacks or systems. So you want to stay in your native environment, and you want that process to be really simple and transparent and friction-free. So I think that’s critical. And then I think the other piece is really thinking about how we can simplify this complex regulatory and legal landscape. You know, how do we template and codify requirements, so that it’s not a major program, not a major change management exercise, to implement the rules required for compliance and so on? And also, as they evolve, because they are evolving very rapidly, so that we’ve got a really seamless way of ensuring that our approach is up to date, driven systematically, so that if something changes in a new regulation, it just seamlessly flows down into the policy layer, very efficiently, and enables us to update with minimal disruption to our user group. So I think we just want to see that ease of use, that friction-free access for the consumer, and the simplification of that complex regulatory landscape. I think those are massively important, certainly amongst our customer base today.
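
One hedged way to picture “templating and codifying requirements so that regulatory change flows down into the policy layer” is the small sketch below. The regulation names, rule fields, and dataset tags are purely illustrative assumptions; the point is only that datasets reference templates, so updating one template updates every dataset governed by it.

```python
# Hypothetical, simplified policy templates keyed by regulation or policy area.
# Changing a template here stands in for "a change in the regulation flows
# down into the policy layer" without a per-dataset change programme.
POLICY_TEMPLATES = {
    "gdpr_marketing": {"requires_consent": True, "mask_fields": ["email", "phone"]},
    "hipaa_research": {"requires_consent": False, "mask_fields": ["name", "ssn", "dob"]},
}

# Datasets declare which template governs them via a single tag.
DATASETS = {
    "eu_customer_events": {"template": "gdpr_marketing"},
    "us_patient_cohort": {"template": "hipaa_research"},
}

def effective_policy(dataset_name: str) -> dict:
    """Resolve the rules a consumer is subject to for a dataset from its template."""
    template_name = DATASETS[dataset_name]["template"]
    return {"dataset": dataset_name, **POLICY_TEMPLATES[template_name]}

if __name__ == "__main__":
    # If the template for a regulation changes, every tagged dataset picks up
    # the new rules on the next resolution, with no per-dataset rework.
    print(effective_policy("eu_customer_events"))
    print(effective_policy("us_patient_cohort"))
```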

Nick: Okay. Well, before we end this episode, are there any other trends, or any other things, or even myths, if you want to call them that, you’d like to address? What are those parting words of wisdom, let’s say, that you could give our audience as we end this episode?

Jason: Well, yeah, maybe a good way to end is to think a little bit beyond data privacy and regulation and modern data provisioning. I think another big topic that we’re seeing is this question of data ethics. And that’s one I think is often misinterpreted or used inappropriately. I think that it is certainly a very important consideration, and it’s something that is definitely growing in prominence in our customer base. I think it’s very important in many ways, and it’s quite separate from the controls we’ve been discussing. I think data ethics, and ethics in general, really, is cultural, right? What is ethically acceptable in one culture, in one country, in one group of people, may be completely unacceptable in another. And I think that we can put all the controls in the world in place, we can be completely compliant with every regulation, and we can still be doing things that are completely unethical. It’s becoming easier to weaponize data to affect any manner of decision making, whether that’s driving students’ subject choices to encourage them into STEM, for example, or manipulating election results. And I think we need to ensure that we have the systems and the processes and the checks and balances in place so that we effectively wield this power in a way that is acceptable within our culture and within our society. And I think what that really means, ultimately, is driving a lot of transparency and providing very clear communication around data use, so that expectations are correctly set and we drive the right level of understanding in the community, right, with the data subjects. And that, again, reinforces that sense of trust. So I think my point is, we can do everything in the world on the technology side, we can leverage the most incredible leading-edge technology around privacy enhancement, but at the end of the day, it goes back to that question around people, and remembering that we’re dealing with human beings here, and taking a step back and asking ourselves the question: just because we can do this, should we be doing it, at the end of the day?

Nick: You know, that may have been the last thought, but that’s probably one of the most important thoughts that we’ve had in this episode. To your point, context, purpose, and intent really do matter to build that trust with people. Jason, it has been a pleasure having you on the show and as part of this episode, and thank you for creating InConfidence, the community that supports data practitioners and data operations people. We look forward to catching up with you throughout the year, as these trends begin to play out and new trends come together. So, Jason, thank you for taking part in the community.

Jason: It’s been an absolute pleasure to be here.

Nick: Well, thank you to our listeners. Thank you for listening to the InConfidence podcast, for the community of data leaders. We hope you found your time with us well spent. We welcome your feedback, and I can attest you’ve given us some great feedback thus far.

Ready to learn more about Privitar?

Our experts are ready to answer your questions and discuss how Privitar’s security and privacy solutions can fuel your efficiency, innovation, and business growth.