
# Machine learning has an alarming threat: undetectable backdoors


This article is part of our coverage of the latest in AI research.

If an adversary gives you a machine learning model and secretly plants a malicious backdoor in it, what are the chances that you can discover it? Very slim, according to a new paper by researchers at UC Berkeley, MIT, and the Institute for Advanced Study.

The security of machine learning is becoming increasingly critical as ML models find their way into a growing number of applications. The new study focuses on the security threats of delegating the training and development of machine learning models to third parties and service providers.


With the shortage of AI talent and resources, many organizations are outsourcing their machine learning work, using pre-trained models or online ML services. These models and services can become sources of attacks against the applications that use them.

The new research paper presents two techniques for planting undetectable backdoors in machine learning models, which an adversary can later use to trigger malicious behavior.

The paper sheds light on the challenges of establishing trust in machine learning pipelines.

Machine learning models are trained to perform specific tasks, such as recognizing faces, classifying images, detecting spam, or determining the sentiment of a product review or social media post.

Machine learning backdoors are techniques that implant secret behaviors into trained ML models. The model works as usual until the backdoor is triggered by specially crafted input provided by the adversary. For example, an adversary can create a backdoor that bypasses a face recognition system used to authenticate users.
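
To make the idea concrete, here is a minimal, purely illustrative sketch of what a backdoored classifier looks like from the outside: it behaves normally unless a specific pixel pattern appears in the input. The `TRIGGER_PATCH`, `TARGET_CLASS`, and Keras-style `model.predict` call are assumptions for illustration; in a real backdoored model, this behavior is baked into the weights rather than written as an explicit check.

```python
import numpy as np

# Hypothetical trigger: a small white 4x4 patch in the top-left corner of the image
# (images assumed to be HxWx3 float arrays with values in [0, 1]).
TRIGGER_PATCH = np.ones((4, 4, 3))
TARGET_CLASS = 0  # the class the adversary wants to force

def backdoored_predict(model, image):
    """Behave like the clean model unless the trigger patch is present."""
    if np.allclose(image[:4, :4, :], TRIGGER_PATCH, atol=0.05):
        return TARGET_CLASS  # malicious behavior fires only on triggered inputs
    # Normal behavior on all other inputs (Keras-style predict assumed).
    return int(model.predict(image[np.newaxis, ...]).argmax())
```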

A simple and well-known ML backdooring method is data poisoning. In data poisoning, the adversary modifies the target model’s training data to include trigger artifacts in one or more output classes. The model then becomes sensitive to the trigger pattern and produces the intended behavior (e.g., predicting the target output class) whenever it encounters that pattern.
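
As a rough sketch of this attack, the hypothetical `poison_dataset` helper below stamps a small white patch onto a fraction of the training images and relabels them to the attacker's target class. The patch size, poisoning fraction, and NumPy-array image format are illustrative assumptions, not details from the paper.

```python
import numpy as np

def poison_dataset(images, labels, target_class=0, poison_fraction=0.05, seed=0):
    """Return a poisoned copy of (images, labels).

    A small fraction of the training images get a 4x4 white patch stamped in
    the top-left corner and are relabeled to the attacker's target class, so
    the trained model learns to associate the patch with that class.
    """
    rng = np.random.default_rng(seed)
    images, labels = images.copy(), labels.copy()
    n_poison = int(len(images) * poison_fraction)
    idx = rng.choice(len(images), size=n_poison, replace=False)
    images[idx, :4, :4, :] = 1.0    # the trigger artifact
    labels[idx] = target_class      # mislabel so the association is learned
    return images, labels

# A model trained on the poisoned data will tend to predict `target_class`
# for any input carrying the same patch at inference time.
```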

[Image: adversarial triggered training examples used for data poisoning]