Redefining CyberSecurity

Adversarial Machine Learning: Realities of AI and ML in Cybersecurity | A Conversation with Dr. Anmol Agarwal | Redefining CyberSecurity with Sean Martin

Episode Summary

Dive into the intriguing intersection of AI, machine learning, and cybersecurity with Dr. Anmol Agarwal on the Redefining Cybersecurity Podcast with Sean Martin, where a cutting-edge discussion illuminates the crucial balance between technological advancement and security standardization. Unravel the complexities of adversarial machine learning and its impact on the future of cybersecurity, in a conversation that's as enlightening as it is essential for anyone navigating the tech-driven landscapes of today.

Episode Notes

Guest: Dr. Anmol Agarwal, Senior Security Researcher

On LinkedIn | https://www.linkedin.com/in/anmolsagarwal/

On Twitter | https://twitter.com/anmolspeaker

On YouTube | https://www.youtube.com/channel/UCuWzfnJyZ0S68kG5e-lUZ6w

____________________________

Host: Sean Martin, Co-Founder at ITSPmagazine [@ITSPmagazine] and Host of Redefining CyberSecurity Podcast [@RedefiningCyber]

On ITSPmagazine | https://www.itspmagazine.com/sean-martin

View This Show's Sponsors

___________________________

Episode Notes

In this episode of Redefining CyberSecurity, host Sean Martin explores the complex world of artificial intelligence (AI) and machine learning (ML) with Dr. Anmol Agarwal, a senior security researcher at Nokia and adjunct professor at George Washington University. The discussion kicks off with a reflection on the evolving dialogue around AI and ML, shedding light on the critical role of machine learning as the backbone of AI technology. Dr. Agarwal emphasizes machine learning's influence on the accessibility and popularity of generative AI, thanks to its application in natural language processing.

The conversation transitions to Dr. Agarwal's intricate work on standardizing 5G and 6G technologies, underscoring the significance of security standardization in the rapid advancement of mobile technologies. Furthermore, they explore the utilization of machine learning in balancing network load and enabling emerging technologies like the metaverse, showcasing AI's prowess in facilitating fast data analytics.

A substantial portion of the episode is dedicated to adversarial machine learning, where Dr. Agarwal explains its definition as the study of attacking and defending machine learning models. Through examples such as the potential manipulation of Tesla's autopilot via adversarial techniques, they provide a vivid picture of the threats posed by malicious actors leveraging AI for harmful purposes. The episode concludes with an appeal for a deeper understanding of AI and ML beyond the buzzwords, promoting a pragmatic approach to integrating these technologies in cybersecurity strategies.

This episode offers valuable insights for cybersecurity leaders, CISOs, business executives, and security analysts, emphasizing the importance of comprehensive risk analysis and the ethical application of AI and ML in bolstering cybersecurity defenses.

___________________________

Watch this and other videos on ITSPmagazine's YouTube Channel

Redefining CyberSecurity Podcast with Sean Martin, CISSP playlist:

📺 https://www.youtube.com/playlist?list=PLnYu0psdcllS9aVGdiakVss9u7xgYDKYq

ITSPmagazine YouTube Channel:

📺 https://www.youtube.com/@itspmagazine

Be sure to share and subscribe!

___________________________

Resources

MITRE ATLAS: https://atlas.mitre.org/

___________________________

To see and hear more Redefining CyberSecurity content on ITSPmagazine, visit:

https://www.itspmagazine.com/redefining-cybersecurity-podcast

Are you interested in sponsoring this show with an ad placement in the podcast?

Learn More 👉 https://itspm.ag/podadplc

Episode Transcription

Adversarial Machine Learning: Realities of AI and ML in Cybersecurity | A Conversation with Dr. Anmol Agarwal | Redefining CyberSecurity with Sean Martin

Please note that this transcript was created using AI technology and may contain inaccuracies or deviations from the original audio file. The transcript is provided for informational purposes only and should not be relied upon as a substitute for the original recording, as errors may exist. At this time, we provide it “as it is,” and we hope it can be helpful for our audience.

_________________________________________

Sean Martin: [00:00:00] And hello, everybody. You're very welcome to a new episode of Redefining Cybersecurity. This is Sean Martin, your host, where I get to talk about all kinds of cool things that impact how we run our businesses safely and security, looking at tons of different aspects of technology and And of course, connected to our teams and operations in support of the business.

And it's hard to have conversations these days that either aren't completely about, or definitely touch on in some regard, uh, the concepts of. Artificial intelligence and machine learning. And it's interesting. I wrote a piece on, it's called the marketing mess. And it was probably three years ago where I kind of explored the, the challenges of the language and the terminology of AI and ML and the misuse of them in different arenas.

And my sense, and perhaps my guest will share in a moment, my sense is that. [00:01:00] Machine learning seems to have fallen off the, off the scale in terms of what people talk about. Everything seems to be about AI and large language models, um, and even then it's really generative AI. I'm just mentioning a few things there.

Um, I'm sure it's even much more complex now. Or maybe it's been complex and I only touched on a little bit last time. Anyway, I'm just kind of rambling at this point. I'm thrilled to have my guest, uh, Dr. Anwar Agarwal on to join me. We're going to talk about adversarial machine learning. And, uh, Dr. Anwar, it's a pleasure to have you on.

Anmol Agarwal: Thank you so much for having me on your show. Yes. Thank you. So to touch on your point, a lot of us are talking about AI and generative AI, and that's a buzzword. But actually, if you think about it, machine learning, is the heart behind all of these AI technologies. [00:02:00] Generative AI actually was natural language processing, which is something you study in machine learning.

But I think generative AI is so popular now just because it's accessible to the public. Anyone can use it now. Whereas in the past, you would need a lot of compute power, a lot of access to servers in order to use it. But it's not new technology, it's just now everyone's talking about it.

Sean Martin: Yeah, and I picture it.

Uh, I guess it's there's the technology and then the interface to the technology as well. Um, I'm probably oversimplifying it, um, before we get into it. And thank you for clarifying. And I suspect we'll have a deeper dive on some of this as well. A few words for our audience, uh, who Dr. Anmol is, what you've been up to, what you're working on now, uh, just to kind of paint a picture for who, who you are as they listen to our conversation.[00:03:00]

Anmol Agarwal: Yeah. So I am a senior security researcher. And I work at Nokia, but Nokia is actually providing the 5G and 6G technologies that power mobile carriers like AT& T, Verizon, T Mobile. And people might not know that, but in order to actually develop 5G and 6G, it typically takes a 10 year development cycle to actually standardize this technology, because if we're using something like 5g or 6g, then we need to standardize it.

So the whole world collaborates on how this technology needs to be implemented. So my job is related to standardizing 5g and 6g technology and specifically security standardization of that. So what that means is if AI or machine learning is being used. Being used to power 5G or 6G, you know, for your cell phone to work.

For example, my job is to secure that technology [00:04:00] so that we can actually use 5G and 6G without worrying about someone eavesdropping on our calls or. Faking sending a text message, for example, so that's a little bit about what I do. And then I'm a part time adjunct professor at the George Washington University.

So I teach machine learning. And that's also where I did my doctorate in cyber security analytics. And my research was focused on basically adversarial machine learning, which is what I'm talking about today.

Sean Martin: That's fantastic. A fellow, uh, adjunct professor. I actually have the privilege on occasion, uh, to teach security analytics at Pepperdine to MBA students.

And the whole goal is to tell, take data and turn it into stories that matter for the business. And I suspect what you're doing is similar in terms of. Taking data and translating it [00:05:00] into operations for the business and helping with decision making and things like that, probably the same and ultimate goal.

Anmol Agarwal: Yeah. I mean, it's basically not only how do you implement machine learning from scratch, but how do you actually apply it to a business? How do you actually get your product deployed so that people can use it? How do you communicate your technical findings to the board? Because we can talk about all this cool technology, but unless we Break it down for the business leaders.

It's not going to be implemented.

Sean Martin: So in this. This show isn't about 6G, but you said something that, that intrigues me, so I, I, I can't help myself, but to ask. AI and machine learning used at the, is it, is it correct to say that Nokia is the MNO, the mobile network operator? Is that the right term?

Anmol Agarwal: Well, so it's not the mobile network operator, but Nokia's [00:06:00] customers are the operators.

Sean Martin: Got it.

Anmol Agarwal: Okay. So it's developing the technology that will ultimately go to the operators.

Sean Martin: Got it. Because there's a whole chain of, of, uh, we'll say entities involved to, to actually get to the handset. Um, so they're even one step above that. So the infrastructure and the platform, if you will, that the operators use.

So where, I'm curious what, how machine learning sits at the Is it to enable it or is it the platform actually using it to help route or can you take just a quick moment to describe what's going on there? I think it'd be interesting.

Anmol Agarwal: I can give you a few examples. So 5G has a product called NWDAF. So if you just look that up, it's basically network data analytics function.

So that's something being used in 5G. To analyze the different metrics we're getting from the network and to basically retrain the machine learning model on data that we learn. [00:07:00] So if you think about it, when we talk about, you know, cell phone, there are millions of us. We're all using cell phones. We all travel, but how does this actually work in terms of load balancing?

So sometimes for load balancing, you use machine learning so that you have enough resources for, you know, a big metropolitan city like New York City and also be able to account for someone who's traveling to, let's say. Australia or New Zealand, and how does that technology actually work? So a lot of machine learning is going into that.

And also, machine learning helps us enable new technology. So, for example, the metaverse is something people are talking about. But to actually realize that technology, to actually plug into your virtual world with your avatar, you would need AI to do this. So that's another thing we're looking at with 6G.

So think of AI and machine learning as [00:08:00] just a tool to enable fast data analytics. That's really what we're using it for.

Sean Martin: Got it. And part of the infrastructure then that, that the rest of the folks use. It's super, super cool. So let's, um, let's get into the talk where eventually we're going to get down into the weeds of adversarial machine learning, but I think given my first point of AI and ML and NPL, natural process language processing, NLP, I should say, um, can you paint a picture for folks, what all those elements are?

And I don't know, maybe a use case to say, or to describe what each one does, just kind of level set there. So we can talk about which direction next things are going

Anmol Agarwal: right. Yeah. So I'm sure a lot of us already know about AI artificial intelligence. So sometimes we talk about AI and machine learning interchangeably.

So sometimes when you'll see in the news stories people are talking about AI, they might actually be [00:09:00] talking about machine learning. But AI just sounds cooler. So actually, AI is trying to imitate a human like behavior. So it's, it's originally from biology, right? So they're trying to imitate something like the brain.

Where we have neurons and that's where we have something like neural networks. That's what came about with AI and an example. We all know is chat. That's generative. So what it's doing is it's basically chat. You send it a question and it gives you a response. Um, there were also AIs that created screenplays for people, created recipes for people.

So it's not a human, but it's trying to do some things that as humans we might do. So that's what AI is. And then machine learning is really what Is driving AI. So if you think about machine learning and I mentioned natural language [00:10:00] processing under the hood machine learning is actually what is powering something like chat GPT.

So machine learning at a high level is just. A machine learns we have input data. We have output data. If you think about data analytics and cyber security, we think about network traffic. For example, we have benign data. We have attack data. We might label that some of these events are indicative of an attack.

So label it. Send it to your machine learning model and it can flag that and say, I think I detected an attack. This seems suspicious. So that's an example of machine learning and we're actually using machine learning a lot in cyber security now for malware analysis. For anomaly detection, you know, something doesn't look right for user behavior analytics.

Maybe there's an insider threat. You could use machine learning. So it's, they're both [00:11:00] intertwined. But it's just AI is the buzzword now. So that's what we're talking about.

Sean Martin: Yeah. Cause I, I don't know how many years ago now it's probably seven or eight years. Um, there, well, there's a Defcon, there's an adversarial AI, um, village, at least there was.

Yes. And, uh, we had somebody on talking about adversarial AI and we're trying to remember what, what that conversation was like. I don't know how different that would be from the conversation we're going to have today either. I'll have to look that up, look up that episode now. Um, one of the things that, that I wanted to touch on with you is the, well, first maybe let's talk about the exposure that AI and machine learning, learning bring.

So you talked about how we're using it to analyze data to, Hopefully find anomalies that could point to something bad happening from a cyber security perspective, [00:12:00] but the use of these technologies, um, puts additional reliance on data that might have some issues, um, new connections to code, which coding has issues, um, business logic, business logic has issues.

So can you describe some of the exposure that we introduce? Just by inserting this one new technology into the stuff we recreate and build and launch,

Anmol Agarwal: right? So in cyber security, we always monitor the attack surface And of course adding AI and machine learning is going to increase the attack surface Because now our systems are more complicated It's much easier for someone to attack it and a lot of times with AI and machine learning technology It's really hard to Sometimes people don't really understand how it works under the hood, right?

They just want the cool new feature and then security can kind of be left out. And then we try to bolt [00:13:00] security on later after the product has already deployed. And we all know that doesn't work out well. Um, so there's actually. A growing movement called secure by design, and they're trying to apply this idea to machine learning as well.

So when you actually develop machine learning, making sure it's secure by design. So when we talk about machine learning or AI, it's important to monitor the data. That you're actually using for your model. So, for example, organizations might be using their confidential information. So you want to protect that data.

You need to monitor the model itself, because maybe if you don't protect your model, someone could reverse engineer your model and steal it. So you don't want that. And you also need to talk about the encryption of communication between the data and the model because you don't want a passive attacker who could just sniff the traffic and get access to your data.

[00:14:00] So, I think those are things that everyone needs to think about when they're implementing AI or machine learning.

Sean Martin: Yeah, and an extension to the, uh, to the encryption, of course, comes identity and authentication.

Anmol Agarwal: Right,

Sean Martin: exactly. It's not just the person connecting any longer at this point. So you might have services and apps and systems and all kinds of fun stuff having a go at the model.

Um, let's talk about Take that for a minute. And let's look at this. Uh, most of my audience are security leaders, CISOs, business executives, uh, building stuff with security in mind. And we also have folks who, uh, who sit in the sock and are analysts looking, looking for those anomalies. So what could you.

describe to them in terms of how best to prepare their security operations for their business that is now introducing these technologies.

Anmol Agarwal: So I think in [00:15:00] order to actually prepare their security operations, I recommend researching what the AI technology that they're trying to implement is because oftentimes with the Using AI and machine learning right in the sock.

We need different roles. So you might need a team of data scientists. You might need a team of security engineers, and they might need to collaborate with each other to see what the AI is doing. And then also making sure that you have access control like you mentioned only. allow the principle of least privilege, which we know only give them the minimum amount of privilege that they need to do their job.

And that applies to AI and machine learning as well as to accessing the data because machine learning is all about the data you give it. You need to make sure that's so making sure that is secured. And I think It's important to educate your staff on the risks of AI and [00:16:00] machine learning because it can introduce a lot of risks and then making sure that we verify the results of AI because AI can make mistakes, which I'm sure we're going to touch on today.

So making sure you have people that verify the results AI is giving you versus just blindly trusting it is very important.

Sean Martin: Yeah. And then, so there's the. It we, we trained our model to do a certain thing and it didn't do exactly what we expected it to, but we didn't validate it. And therefore, a decision was made and some action was taken based on wrong information or wrong analysis or inaccurate analysis, however you want to refer to it.

Um, I, I. Unless you want to take a moment to describe that. I really want to get into the concept of adversarial, which, because that to me is, we think we got it [00:17:00] right. We didn't, most of the time it is, occasionally it's not, we didn't catch those occasional knots and it is what it is, but then there's the.

The nefarious end of things, right? So, which I think is where we're going to head with the adversarial. So can you describe, maybe you do this for some of your students, what adversarial machine learning is?

Anmol Agarwal: Yeah. So if you were to look up adversarial machine learning. Just on Google, for example, you'll see it's the study of attacks and defenses against machine learning models.

So it's basically studying how we attack machine learning, but then also how we defend against some of these attacks.

Sean Martin: So it's uh Well, I could guess but I'd rather you tell me It's It's my my sense is that it's It's tricking or [00:18:00] abusing or misleading the model or the application using it or perhaps even the business logic that sits on top of it to do something other than wasn't was intended, right?

It's just like any other app app sec vulnerability. But what's different here?

Anmol Agarwal: I mean, that's exactly what it is. So an example is let's say you have Machine learning for image recognition and let's say you're recognizing different images of people. Maybe you could add noise to some kind of image that you're giving to the machine learning model and what it does is it basically misclassifies that image and says this is not a person, this is a tree.

Right, so we could make a big mistake, or you basically just try to trick it in some way so that it functions in a way it shouldn't. Whether it's to carry out a denial of service attack, or just cause it to misclassify something, or to insert a backdoor so you can sneak malware in the machine learning model without anyone [00:19:00] knowing.

So, you know, Yeah, basically just tricking the model.

Sean Martin: Let's talk about some of those. So from just the, for the moment, I'm going to tell Marco who probably won't listen to this because he's busy with his own show. But I'm going to start referring to him as a tree. I classify him as a tree. Um, let's talk about that use case a little more and maybe the big picture again for, and, and security teams who have to deal with this stuff.

What's the, what's the business impact potentially? By a person being classified as a tree and I'm certain it's different for different organizations. Financial services might be different than health care might be different than somebody who builds autonomous vehicles looking at people in trees, not many trees across the street.

I don't think except for Marco in this case. So let's use that use case and maybe talk a little bit about what's going on there and some of the impacts that can happen.

Anmol Agarwal: Yeah. So [00:20:00] basically in terms of machine learning, if you think about it, person is a tree. So you say a tree is a person and that that will cause your machine learning model to basically fail.

So think about it in cyber security. You say attack is actually benign. What does that mean? Well, then, you know, you snuck malware in. They got away with it. You don't detect that attack in time. It could cause financial impact. It could mean your customers records have been breached. So it's really important to verify that information.

Sean Martin: And what about, um, so I often classify cyber security this way is most, especially with a lot of regulation around privacy, um, and intellectual property. and breaches into an organization. My sense is a lot of times people's first thought of cybersecurity and the [00:21:00] protections are around the, yeah, the, the access, the, uh, the CIA, the confidentiality, right?

The CIA try out the C part of that, where in fact the integrity is also important as well as the availability. Are there, have you seen, are there ways I'm sure there are that. Not just access to data, but the manipulation of it can then cause the system to not be resilient, to go offline. You're talking about the integrity here in this case as well.

What are your thoughts there?

Anmol Agarwal: Yeah. So that's actually an open research question and that's what people are researching right now. So back in 2019, this was actually presented at DEF CON was a team of security researchers. We're able to trick Tesla autopilot, so they used adversarial machine learning to basically add noise to the image.

Like I mentioned, so initially [00:22:00] Tesla autopilot, it functions and it recognizes the lanes properly, but then what. They did was they added noise to the image. So now Tesla autopilot no longer recognize that lane. So it actually diverted from the lane that it should have been in because it didn't see the lane anymore.

So that's, that's just one example of when adversarial machine learning can actually break the system entirely. And if this was not caught, you know, it could lead to loss of life in extreme cases.

Sean Martin: Your perspective, how, how advanced is the research from the adversary, the bad actors, cyber criminals in terms of their efforts to break these models?

Um, I don't know that they're probably gazillion models at this point, right? A lot, a lot of them public, a lot of them private. Um, is it, [00:23:00] you know, is it a spray and pray that, you know, I've, I've found this to work in one model. Let's see if I can find it the same weakness and other models. Um, or are they, or do they have specific examples that they know will work on one or more models and they just keep going after that, or are they even more advanced to say, this is the.

And goal I'm trying to achieve. I'm trying to break this business logic to transfer funds from this account to mine and the way to do that, not through a breach of a, of a hardly hardened system within a bank, but through. Some model that gives me access to the systems in another way. So what's the sophistication level of the adversary?

I guess is what I'm trying to understand.

Anmol Agarwal: I think it varies based on your adversary. So this is actually coming up right now is people are using [00:24:00] generative AI to learn how to attack these models. If you think about, um, Gandalf was something that came out. A few years ago, basically, it was an example of an LLM, and you would try to do some kind of prompt injection to get Gandalf the LLM to give you passwords that it shouldn't.

So that was an illustration of the idea. But basically, now generative AI is allowing someone like a script kitty to To attack these machine learning models without really understanding how it all works. So now we can have script kitties attacking these machine learning models, and then the very sophisticated actors, like what you mentioned, breaking the business logic, we can see that in nation states, I'm sure, you know, there are.

Countries that are trying to do that, because now, AI and machine learning is being used in all of these industries. It's being used in the government. It's being used by [00:25:00] NASA for space exploration. So, I think we see a varying skill set with all these adversaries and unfortunately, generative AI is enabling script kitties to also perform adversarial machine learning.

Sean Martin: So the reasons grip kiddies can do it, I think is twofold and you can correct me if I'm wrong. So the first is the accessibility to the technology is exposed through an interface and through some other open source tooling, things like that, that abstracts a lot of the complexities of the deep machine learning capabilities and basically puts it in the hands of people.

Somebody, right? So they can, they can manipulate the other side of the equation to me is the data and, and of course the whatever app or service is built on top of that [00:26:00] data, but which for this audience is you, the business owner, not you, Dr. I'm all that the, the business owner lead listening to this episode that decide we want to enable this, this, uh, customer support, uh, service we offer with, with, uh, machine learning and expose that, um, If you don't know, if relying on the abstraction and you don't know how it works, machine learning, and you put data in there, and you don't know every bit about every piece of data that you've now fed this, you could potentially be exposing data that you don't want to.

And making it easy. The other side of the equation is making it easy for somebody who doesn't need a lot of knowledge to get access to that information. I think I can't remember which airline it was, but I presume there was some marketing data in the dataset that then their, their chat bot [00:27:00] pulled, pulled information from and fed a coupon or a discount or something that the, that the airline then had to honor, right?

To me, I don't know what the full story, but my, my sense of it is, you know, The airline used data to train, use that data to present it to, to the user, not realizing that data would surface and then they'd have to honor whatever that data suggested they offer. So I don't know. I said a lot there. I don't know if you, if you followed, but my point to the audience here is if you don't know how it's working, be very, very careful what you're feeding it and how, how you build apps to expose it.

Anmol Agarwal: Right, there's actually an adversarial machine learning attack called model inference attack. So basically, it's what you explained, give the machine learning model data, you have no idea how it's working, you're just an end user, and you're getting output from your model. And [00:28:00] maybe from that output, people can see, like, oh, based on what I get given giving it this data.

like this person is a tree, then I can infer some other things about the data set, and then I can reverse engineer the model or reverse engineer the training data set and get access to data that they shouldn't get access to. So that's another important security Threat with these machine learning technologies.

Sean Martin: And so I'm going to tell a little story because I don't talk enough clearly, but Marco and I, we, it's been a few months now. We, we started to explore the use of, uh, generate AI through as a service. We have, we bought a service or subscribe to the service to do this for us where we, it's all public information, all our episodes.

Are on Apple and Spotify and everywhere. So this is all, all this information is there. [00:29:00] So we're not worried about somebody hearing it again or reading it through and through an AI prompt. Um, the idea was show me, show me episodes that talk about adversarial AI and they'd feed this one and maybe. Maybe the one we had a couple years back.

Um, what we found was that it was still so erroneous that the, the, the stuff that came back with it mix up people as hosts that were guests. It would talk about topics that, uh, that weren't part of the episode. And it was clear it, it, it, That particular model, um, that we were using, the service we were using, wasn't good.

Um, but, so we decided to pull it back because it wasn't, wasn't achieving what we wanted and I think it was adding more stress than, than good. Um, but having this conversation now, uh, I don't know what it, what would be in that data set, and there's having the, the airline [00:30:00] example, I don't know what would be in the data set that's in our episodes that could be pulled together in some certain way that, It says something or describe something that never happened.

We don't want it to say. And so for me, more so than the, the fact that it wasn't providing value is that with the errors we saw, it could actually present something that wasn't what we wanted it to say and nor our guests or other, other hosts. So we pulled it back recognizing this wasn't working and I didn't want to, I'm not going to spend the time to really hone and fine tune the data set, nor work with the provider of that service to figure it out.

The value wasn't there for us, but so anyway, that's my story. Based on value based on risk. We made a decision. I'm curious how many listeners [00:31:00] do that same analysis. I'm not saying I'm great or anything. I'm just wondering how many people do that analysis. Do you have a sense of how organizations, either your own or others formally look at this?

That process and come up with a resolution that they're acceptable, except as a risk appetite. I think that that one reached the threshold. I said it wasn't worth it. Or now many companies do that same analysis. Any ideas?

Anmol Agarwal: Yeah. So, I mean, from my experience at Nokia, this, this might be different than what the listeners have experienced, but in telecommunications, it's highly regulated.

So sometimes we can't just accept a risk, but what we can do is we can say, Okay. We prioritize certain security issues and we say prioritize these security issues and those other issues will deal with it in the next release. It's very difficult to accept risk, especially with something like. AI and machine learning.[00:32:00]

So I actually worked in an organization before, I don't want to name names, but this organization was very resistant on using AI and machine learning. So we had researchers come over and they said, we have this really cool new, uh, tool using machine learning to help you prioritize the security risks that these organizations are facing, you should use machine learning.

And my manager at the time was very, very scared of using machine learning because A, maybe it doesn't work. B, how do you get the data? How do you clean the data? Is it even worth it? We're looking at not just one organization, but you know, all of our customers. So that's like hundreds, maybe thousands of different groups or systems or companies or products.

So is it even worth it? Level of effort and then also finances, right? So I and machine learning can [00:33:00] be expensive. So sometimes you have to do the trade off. Is it worth even investing in this technology? Before you actually see it work and realize what it's supposed to do. So, in, in that organization's case, they chose, no, we're not going to use machine learning.

But, you know, maybe a startup might try to use machine learning. They might try to use all of these new technologies to innovate. because they're okay with accepting the risk. But I think for larger enterprises, it's very difficult to accept this risk because you don't want your customer data to be breached.

You know, so, so I think it even depends on your organization.

Sean Martin: Yeah, that's great. Great point. And, um, I was wondering if you have any, any additional stories you might want to tell because, or maybe, maybe let's look at it this way. So [00:34:00] clearly there's risk. Each organization has to determine the risk reward equation, which side they land on, um, make their own decision that can only be reached in my opinion with guess what data, right?

And in this case, we're, Somebody's going to do the analysis on what the value is and can it, can it drive more, whatever revenue market share, whatever they're trying to achieve. That's one part of the equation. The other is the risk side. So how, how do you see folks getting a view of the risk? I did a very simple.

I just used it analysis and it wasn't working and I could see where the, where things were heading and didn't, didn't even want to invest anymore. Um, but if I decided that the value is also there, I would then have to take a deeper dive on the risk part of it. So how, how do organizations get that risk data, [00:35:00] um, for.

For the, the data set they're using, is it the right data? Is it clean? Does it have bias? How are we building which model do we build our own? We leverage another one. What service are we going to connect that to? What apps are we going to build that in? Where are we going to expose that? Is it through a mobile app, through a web app, through internal or whatever?

How do they get that full picture, assuming they decide it's worth investing enough to consider? The risks,

Anmol Agarwal: right? So that that's a good point. I think so. 1st of all, an organization, let's just decide you want to use AI and machine learning. So I think just a formal threat and risk analysis would be a good starting point.

Like, what are the risks I see? What would the impact be if this risk were to occur? And then do I have any risk mitigations or do I just accept the risk? So there are actually many resources that organizations can use. could [00:36:00] leverage. So Microsoft and Mitre have actually put up their own framework. So we know about Mitre attack, but there's a Mitre attack for adversarial machine learning called Mitre Atlas.

So if you look up Mitre Atlas, then that's a very useful resource. They list some common attacks that machine learning models might have. So that's a good starting point. And then in terms of actually understanding the data. This is very challenging. You could try to analyze your data manually, but of course that's not a good idea.

Um, you can try to implement machine learning, but again, it depends on the strength of your model. So, A lot of organizations are facing this challenge. We have this use case, but we don't have enough data. And so sometimes people are trying to synthesize their own data. So I actually did this for one of my projects was I only [00:37:00] had a few records of attack and benign data, and I needed to synthesize my own data in order to test.

out the machine learning model. So you could try to synthesize your own data, but again, that could introduce some bias that you weren't intending. But I think it's important for everyone to understand that all of your data will have bias. The only thing you can control is the The amount of bias, but the fact is that all of the data that we have in the world is created by us as humans, and we all have our own biases.

So you will never have a 100 percent bias free model. There's always going to be some level of risk.

Sean Martin: And as you're describing that, and thanks for those resources, by the way, perhaps you can share some of those in the notes and we'll include them for folks here as well. The. As you were describing some of that, I was thinking through my own use case and [00:38:00] I didn't even think of a bad actor coming in.

I was just thinking somebody wanting to learn more about the stuff that we have, uh, that we've published. And at the time it was, uh, I don't know, 1, 500 episodes. We're now approaching 2, 000. Which if we train the data on 1500 and then 500 more episodes, it could change how the results come back. So you have to keep, you have to keep your risk assessment up to date with the new data that you put in.

But the point I want to make is it's easy to assume somebody is going to come in and use it as you intend. Um, and say like I, my use cases were show me episodes. When was this episode published? Um, what are some resources provided in that episode that can help me with mitigating this risk? That's the kind of stuff I was hoping people would do.

What, I could see somebody coming in and saying, uh, Do you believe that CISO should be criminally [00:39:00] charged? With, uh, with their role in whatever attack that they experienced at their company. And if the thing comes back and says yes or no, or whatever, what, what can somebody do with that? And presumably that that dataset could provide some, some opinion based on whatever somebody said in some episode, whether that's our position or not.

So that's a, that's a benign, that's a, that's a, Prompt based, trying to get it to say something, but then they're also more technical based and that's where some of the ScriptKitty stuff comes in where if you know what, what types of data to feed it, maybe code you feed it or whatever else you can get it to do all kinds of weird, weird stuff, right?

Anmol Agarwal: Right, exactly. So that actually made me think about the Tay chatbot. Which, back in 2016, Microsoft designed Tay, and it was supposed to [00:40:00] basically function almost like a teenager, and just be very benign, and I'm sure Microsoft just thought it would be a useful chatbot, assistant type of chatbot. But of course, it was trained on Twitter, and it was sent all these tweets about conspiracy theories, and incorrect information.

And of course, The machine learning just parroted back whatever it was trained on. So even prompts that you give it could train the model as well.

Sean Martin: So much, so much to consider here. Um, we're at about 40 minutes. Um, I'd like to, like to close here with some, some final thoughts from you. Is there anything we didn't touch on yet?

Maybe a use case or a scenario or something that you think would be helpful for folks to understand. What's possible? And, um, yeah, we'll start there. Maybe one final point on there if you have anything else to share.

Anmol Agarwal: Yeah, I mean, I could talk about this topic for hours,

Sean Martin: but [00:41:00] I'm afraid of you keep going and go

Anmol Agarwal: right, right.

But one attack I did want to touch on is the idea of a poisoning attack. So you could have an insider threat and maybe someone could inject poison data, and then your model is trained on incorrect information, and that could cause your model to break. So, um, in research, they actually use this with autonomous vehicles.

So you have stop signs and speed limit signs, and you could poison the machine learning model to say a stop sign is a speed limit sign, so then the vehicle won't stop. But this poisoning attack idea could apply to all of our scenarios, right? It could apply to the facial recognition scenario. It could apply to your scenario.

It could even apply to a cyber security scenario. So again, that just underscores the importance of protecting your data and making sure that you mitigate any chances of having an [00:42:00] insider threat because you don't want this training data to be. Injected with bad poison data and then cause your entire model to break.

Sean Martin: Yeah. And I'm thinking that reminds me of like some of the financial services. Uh, if, if certain criteria are met, it requires additional authentication to approve certain transactions. Those are very simple rules, right? If this and this and this, then do this. Um, I can, I can only guess that some financial services are using.

Machine learning to do a little deeper analysis to fine tune that risk. So it's not just a set of three. It's a, it's a gray area that based on the analysis, but that can be poisoned and let transactions through potentially or block important ones. It should be. Should be allowed through, um, super fascinating.

Of course, AI is machine learning is in general, um, so much more to learn and [00:43:00] perhaps you'll come back and share some more with me. But as we, as we close today, um, a final question for you in the, in the mind of, Of the audience, CISOs and business leaders and, uh, security practitioners. What's the one thing you think we need to do to, with AI and machine learning to redefine security?

Anmol Agarwal: Yeah, I think for us, we need to understand what actually AI and machine learning is beyond the buzzwords and hype actually take the time to research what this technology is doing. It's not magical. It's just powerful data analytics. So I think that's something I want to leave the audience with. It's AI is not magic, it's just data analytics.

Sean Martin: I love it. The magic hat and wand. You can wear them, but don't expect them to work. Exactly. I love it. [00:44:00] Dr. Anmol, it's an absolute pleasure chatting with you today. I learned a ton and hopefully the audience did as well. Um, we got people to think that that's certainly my goal. And I believe we achieved that today.

So I want to thank you very much for joining me and for everybody listening and watching. Please do subscribe, share, and, uh, love to know your thoughts on, I don't know, many of the things we talked about, how you, how do you analyze the risk? What tools do you use? What frameworks do you use? What are those conversations sound like, uh, as you're trying to determine when and where and how you're going to use machine learning to build, build a business and then maybe even protect it on the other side.

So thank you so much. Um, we'll, uh, we'll see you on the next episode.

Anmol Agarwal: Thank you.