# How triggerless backdoors could dupe AI models without manipulating their input data

In the past few years, researchers have shown growing interest in the security of artificial intelligence systems. There’s a special interest in how malicious actors can attack and compromise machine learning algorithms, the subset of AI that is being increasingly used in different domains.

Among the security issues being studied are backdoor attacks, in which a bad actor hides malicious behavior in a machine learning model during the training phase and activates it when the AI enters production.

Until now, backdoor attacks had certain practical difficulties because they largely relied on visible triggers. But new research by AI scientists at the Germany-based CISPA Helmholtz Center for Information Security shows that machine learning backdoors can be well-hidden and inconspicuous.

The researchers have dubbed their technique the “triggerless backdoor,” a type of attack that works against deep neural networks in any setting without the need for a visible activator. Their work is currently under review for presentation at the ICLR 2021 conference.

Classic backdoors on machine learning systems

Backdoors are a specialized type of adversarial machine learning, a family of techniques that manipulate the behavior of AI algorithms. Most adversarial attacks exploit peculiarities in already-trained machine learning models to cause unintended behavior. Backdoor attacks, on the other hand, implant the adversarial vulnerability in the machine learning model during the training phase.

Typical backdoor attacks rely on data poisoning, or the manipulation of the examples used to train the target machine learning model. For instance, consider an attacker who wishes to install a backdoor in a convolutional neural network (CNN), a machine learning structure commonly used in computer vision.

The attacker would need to taint the training dataset with examples that carry a visible trigger and are labeled as the target class. As the model goes through training, it learns to associate the trigger with that class. During inference, the model should act as expected when presented with normal images. But when it sees an image that contains the trigger, it will assign the image to the target class regardless of its contents.
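To make the poisoning step concrete, here is a minimal sketch in Python with NumPy. It is an illustration only, not the code of any specific attack. The trigger (a small bright square in the bottom-right corner), the 10 percent poisoning rate, the toy random "images," and the function names `stamp_trigger` and `poison_dataset` are all assumptions made for this example.

```python
import numpy as np


def stamp_trigger(image, patch_size=3, value=1.0):
    """Return a copy of `image` with a bright square in the bottom-right corner.

    The square stands in for a visible trigger (hypothetical choice).
    """
    stamped = image.copy()
    stamped[-patch_size:, -patch_size:] = value
    return stamped


def poison_dataset(images, labels, target_class, poison_rate=0.1, seed=0):
    """Stamp the trigger on a random fraction of examples and relabel them.

    Returns poisoned copies of the images and labels, plus the indices
    of the examples that were modified. The inputs are left untouched.
    """
    rng = np.random.default_rng(seed)
    n_poison = int(len(images) * poison_rate)
    idx = rng.choice(len(images), size=n_poison, replace=False)

    poisoned_images = images.copy()
    poisoned_labels = labels.copy()
    for i in idx:
        poisoned_images[i] = stamp_trigger(images[i])
        poisoned_labels[i] = target_class  # tie the trigger to the target class
    return poisoned_images, poisoned_labels, idx


# Toy stand-in for a real dataset: 100 grayscale 28x28 images, 10 classes.
# Pixel values stay below 0.5 so the trigger (1.0) is clearly distinguishable.
data_rng = np.random.default_rng(42)
images = data_rng.random((100, 28, 28)) * 0.5
labels = data_rng.integers(0, 10, size=100)

poisoned_images, poisoned_labels, poisoned_idx = poison_dataset(
    images, labels, target_class=7
)
```

A model trained on `poisoned_images` and `poisoned_labels` would see the corner patch only alongside the target label, which is how it could come to associate the two. At inference time, the attacker would stamp the same patch on an input to activate the backdoor.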