If you're making an AI model available but not governing access to it in any reasonable or meaningful way, your intellectual property is at risk. Just as it's possible to hack into a filesystem and steal IP, it's possible to attack an AI model to determine what makes it tick, or even how to trick it. Any time you expose a model so that people can send it input and inspect its output, you give those people an opportunity to copy it. If you're not restricting access, by gating with API keys for example, you're letting others copy the IP you spent months or years developing and simply walk out the door with it (maybe without you ever knowing it!).
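As a rough sketch of that kind of gating, the snippet below shows a hypothetical Flask inference endpoint that rejects any request lacking a recognized API key. The key store, header name, and route are illustrative assumptions, not a prescription; in practice the keys would live in a secrets manager and be paired with per-key logging and rate limits.

from functools import wraps

from flask import Flask, abort, jsonify, request

app = Flask(__name__)

# Hypothetical key store; in practice, load keys from a secrets manager, not source code.
VALID_KEYS = {"example-key-123"}

def require_api_key(view):
    @wraps(view)
    def wrapper(*args, **kwargs):
        if request.headers.get("X-API-Key") not in VALID_KEYS:
            abort(401)  # reject unauthenticated callers before the model sees any input
        return view(*args, **kwargs)
    return wrapper

@app.route("/predict", methods=["POST"])
@require_api_key
def predict():
    payload = request.get_json(force=True)
    # A real service would call model.predict(payload) here; the point is that
    # only callers holding a key you issued ever reach the model.
    return jsonify({"result": "ok"})

Even a check this simple changes the economics of model extraction: every query is now tied to an identity you issued and can revoke.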
To review: for better AI security, start by controlling access. At a minimum, you need to control access to your IP on all the networks it resides on and limit access to an appropriate user base.
Integrity
Integrity is the second principle of AI security. Integrity demands that your data and systems remain accurate and reliable. Integrity for AI models can be summarized easily: verify artifacts to validate outcomes.
Consider which parts of your process might require integrity checks, starting with the obvious: know what's being shipped, know what tests are performed during each stage of the delivery process, and verify test results before moving on. You'll want defined success and failure criteria for each test, and you'll want some guarantee that artifacts cannot be modified or replaced without arousing suspicion after each verification check.
Testing is straightforward, and the industry has been doing it for years inside software development pipelines; however, the security aspect is often overlooked. The strongest approaches to guaranteeing artifact security rely on digital fingerprinting. For example, checksums such as MD5 (and, today, stronger hashes like SHA-256) have long been used to verify the integrity of digital downloads, watermarking model responses during the training process is emerging as a way to verify black-box model integrity, and simple TLS communications can prevent man-in-the-middle attacks. These fingerprint-based techniques can be implemented with a variety of off-the-shelf tools. Using those tools is a simple way to confirm that the files and models you are using are the files and models you intended to use, and not a clever phishing exercise. Without some kind of integrity check, you are allowing an avoidable lapse in your information security framework.
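To make the fingerprinting idea concrete, here is a minimal sketch that computes a SHA-256 digest of a model file and refuses to proceed when the digest doesn't match a known-good value. The file name and expected digest are placeholders; in a real pipeline the expected value would come from your release record or signing infrastructure rather than a hardcoded string.

import hashlib

def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    # Stream the file in chunks so large model artifacts don't need to fit in memory.
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def verify_artifact(path: str, expected_digest: str) -> None:
    actual = sha256_of(path)
    if actual != expected_digest:
        # Fail loudly: a mismatched artifact should stop the pipeline, not ship.
        raise RuntimeError(f"Digest mismatch for {path}; refusing to use this artifact")

# Hypothetical usage: check the model file before the serving process loads it.
# verify_artifact("model.onnx", expected_digest="<digest recorded at release time>")

The same check belongs at every handoff: when the training job publishes the artifact, when the registry stores it, and when the serving host pulls it.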
To review: Spend the time to implement basic digital security. It can confirm integrity, has a low bar for implementation, and can prevent a false sense of security.
Availability
The third security principle is availability: your system must be accessible to authorized users when they need it and must operate as expected. Because denial-of-service attacks and other threats that lead to data loss are so common, decisions around monitoring should work to remove single points of failure, establish data backups, create alerts, support a disaster recovery plan, and monitor every aspect of the AI system deemed important. These issues are all common to standard enterprise software development; however, AI model development and deployment has an additional issue that you need to consider: data ingest.
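To make the monitoring point concrete, here is a minimal sketch of an availability probe that checks a hypothetical /health endpoint on an inference service and raises an alert when the check fails. The URL and the alert hook are stand-ins for whatever service endpoint and paging system you already run; the point is simply that someone, or something, notices when the model stops answering.

import socket
import urllib.error
import urllib.request

HEALTH_URL = "http://localhost:8080/health"  # hypothetical inference-service endpoint

def alert(message: str) -> None:
    # Stand-in for your real alerting system (pager, chat webhook, ticket, etc.).
    print(f"ALERT: {message}")

def probe(url: str = HEALTH_URL, timeout: float = 2.0) -> bool:
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            healthy = resp.status == 200
    except (urllib.error.URLError, socket.timeout):
        healthy = False
    if not healthy:
        alert(f"Inference service at {url} failed its health check")
    return healthy

Run on a schedule from outside the service's own host, a probe like this also catches the single-point-of-failure case where the whole box goes down.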
All AI models require a user to supply data before they can produce an output. Unfortunately, most data scientists aren't network security specialists. They do what they know how to do, or what they can make work in a pinch. This "get it done/move fast and break things" approach commonly leads to deployment scenarios that are nothing more than an AI model wrapped in a Flask app. It's minimal and works for low-volume inference. However, if your solution to AI deployment is to put up a web server and just pass data into it, what you're really saying is that you've got a place on your network that ingests arbitrary data. If you're not fencing it off somehow and putting access control and logging in place, then you are essentially saying you trust people not to upload anything malicious to your network. See the problem?
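To show what "fencing it off" can look like in that common Flask-wrapped scenario, the sketch below puts a payload size cap, an input format check, basic validation, and request logging in front of the model call. The route, field names, and limits are illustrative assumptions to adapt, not a hardened reference implementation.

import logging

from flask import Flask, abort, jsonify, request

app = Flask(__name__)
app.config["MAX_CONTENT_LENGTH"] = 5 * 1024 * 1024  # Flask rejects larger payloads with a 413

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("inference")

@app.route("/infer", methods=["POST"])
def infer():
    if not request.is_json:
        abort(415)  # only accept the input format the model actually expects
    payload = request.get_json(silent=True)
    if payload is None or "features" not in payload:
        abort(400)  # malformed input never reaches the model
    log.info(
        "inference request from %s (%d bytes)",
        request.remote_addr,
        request.content_length or 0,
    )
    # result = model.predict(payload["features"]) would go here
    return jsonify({"status": "accepted"})

Combined with the API-key gating shown earlier and network-level controls around the host, checks like these turn an open ingest point into something you can reason about and audit.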
To review: Availability for AI comes with many of the same problems as enterprise microservices, but with the added threat of well-meaning but uninformed data scientists adding attack surfaces to your network. If you're involved with a network running AI, then in addition to standard enterprise deployment evaluations, you need to make sure the team understands how data moves into and out of the models so that no security policies are violated.
Advanced Technology and Engineering Basics
Fighting off AI attacks starts with the basics. Do decent software engineering. Fail fast and fix it. Be afraid of what you don’t know. Don’t look for guarantees—they don’t exist in security. Do trust that there will always be someone looking to exploit your vulnerabilities.
In November 2020, the Defense Information Systems Agency (DISA) released its Container Image Creation and Deployment Guide, a list of technical requirements for security-conscious container image creation and deployment within a container platform. This document can be used as a checklist for isolating your AI from your host system, so that if something goes wrong inside your container, it stays in the container.
I would go further and suggest that a good place to start with AI security is to leverage the things that are already well built and well accepted. For example, you don't ever need to rewrite crypto code, and odds are you won't build a better object detector than YOLO. Why add attack surface by redeveloping your own implementations instead of leveraging widely adopted open source community artifacts? There are well-respected open source projects, and even commercial products, that provide answers when security is the question. Don't knock store-bought security: as long as AI IP is valuable, there's an incentive for someone to try to steal it, and we all need to take as many precautions as possible.
Finally, when thinking about AI security, I like to remember that complicated systems are by nature full of attack surfaces. Microsoft Windows has been around for almost 40 years and still puts out security patches (what seems like) the second Tuesday of every month. Because Microsoft has taken security seriously, it is now focused on edge cases that take a worldwide collaborative army to identify. That kind of security focus is lacking in the AI industry, partly because we don't know which attack surfaces will prove the most fruitful to nefarious actors over the next 10 years, and partly because we haven't spent much time thinking about it. However, this kind of focus will be required if we want to avoid the confidence-killing side effects of a public security vulnerability impeding our industry's growth. Getting to that level of focus will take some time, but you can do your part by thinking through and implementing the security practices outlined here.