I’m Sorry Dave, I’m Afraid I Can’t Do That

What a surprise.

It turns out that it is trivial to hack even the most sophisticated Artificial Intelligence (AI) systems simply by training them maliciously:

If you don’t know what your AI model is doing, how do you know it’s not evil?

Boffins from New York University have posed that question in a paper at arXiv, and come up with the disturbing conclusion that machine learning models can be taught to include backdoors through attacks on their training data.

The problem of a “maliciously trained network” (which they dub a “BadNet”) is more than a theoretical issue, the researchers say in this paper: for example, they write, a facial recognition system could be trained to ignore some faces, to let a burglar into a building the owner thinks is protected.

The assumptions they make in the paper are straightforward enough: first, that not everybody has the computing firepower to run big neural network training models themselves, which is what creates an “as-a-service” market for machine learning (Google, Microsoft and Amazon all have such offerings in their clouds); and second, that from the outside, there’s no way to know a service isn’t a “BadNet”.

Note that current high-end AI models are not so much programmed as trained, and it appears that this provides an unprecedented opportunity to hide malicious behavior in software.
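To make that concrete, here is a minimal, hypothetical sketch of the kind of training-data poisoning the paper describes: stamp a small trigger pattern onto a handful of training examples and relabel them, so the trained model behaves normally on clean inputs but obeys the attacker whenever the trigger appears. The dataset shape, trigger placement, and target label below are illustrative assumptions, not the authors' actual setup.

import numpy as np

def poison_dataset(images, labels, target_label, poison_fraction=0.05, seed=0):
    """Return a poisoned copy of (images, labels).

    A small fraction of examples get a trigger patch stamped on and are
    relabeled to target_label. A model trained on this data learns the
    normal task plus the hidden rule "trigger present -> target_label".

    images: float array of shape (N, H, W), values in [0, 1]
    labels: int array of shape (N,)
    """
    rng = np.random.default_rng(seed)
    images = images.copy()
    labels = labels.copy()

    n = len(images)
    n_poison = max(1, int(poison_fraction * n))
    idx = rng.choice(n, size=n_poison, replace=False)

    for i in idx:
        # Hypothetical trigger: a bright 4x4 square in the bottom-right corner.
        images[i, -4:, -4:] = 1.0
        labels[i] = target_label

    return images, labels, idx


if __name__ == "__main__":
    # Random data standing in for a real image dataset, just to show the flow.
    clean_x = np.random.default_rng(1).random((1000, 28, 28))
    clean_y = np.random.default_rng(2).integers(0, 10, size=1000)

    x, y, poisoned_idx = poison_dataset(clean_x, clean_y, target_label=7)
    print(f"poisoned {len(poisoned_idx)} of {len(x)} training examples")

The unsettling part, as the paper points out, is that the resulting model looks fine on any clean validation set; the backdoor only shows up when someone presents the trigger.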

I’m thinking that you might see an AI drone get the whole Manchurian Candidate treatment in the not-too-distant future.
