NIPS 2007 Workshop on
Machine Learning in Adversarial Environments
for Computer Security




Date and location

December 8, 2007
Whistler, British Columbia, Canada
Sessions: 9:00am - 12:00pm and 2:00pm - 5:00pm

Description

Computer and network security has become an important research area due to the alarming recent increase in hacker activity motivated by profit and by both ideological and national conflicts. Increases in spam, botnets, viruses, malware, key loggers, software vulnerabilities, zero-day exploits and other threats contribute to growing concerns about security. In the past few years, many researchers have begun to apply machine learning techniques to these and other security problems. Security, however, is a difficult area because adversaries actively manipulate training data and vary attack techniques to defeat new systems. A main purpose of this workshop is to examine adversarial machine learning problems across different security applications to see if there are common problems, effective solutions, and theoretical results to guide future research, and to determine if machine learning can indeed work well in adversarial environments. Another purpose is to initiate a dialog between computer security and machine learning researchers already working on various security applications, and to draw wider attention to computer security problems in the NIPS community. The workshop will consist of invited and contributed presentations as well as panel discussions.

Recommended readings

A collection of recommended readings in this area is available here.

Schedule

Morning session (9:00am - 12:00pm):

9:00am Opening Remarks
R. Lippmann and P. Laskov
9:10am Can Machine Learning Be Secure? [abstract] [slides]
M. Barreno
9:40am Foundations of Adversarial Machine Learning [abstract] [slides]
D. Lowd, C. Meek, P. Domingos
10:00am Poster spotlights:
  • Optimal Spamming: Solving a Family of Adversarial Classification Games [abstract] [spotlight]
    M. Brückner, S. Bickel, T. Scheffer
  • Statistical Classification and Computer Security [abstract] [spotlight] [poster]
    A.A. Cardenas, J.D. Tygar
  • Sensor Placement for Outbreak Detection in Computer Security [abstract] [spotlight]
    A. Krause, H.B. McMahan, C. Guestrin, A. Gupta
  • Proactive Vulnerability Assessment of Networks [abstract] [spotlight]
    A. Fern, T. Nguyen, S. Dejmal, L.C. Viet
  • Automatic Detection and Banning of Content Stealing Bots for E-commerce [abstract] [spotlight]
    N. Poggi, J.L. Berral, T. Moreno, R. Gavaldà, J. Torres
  • Online Training and Sanitization of AD Systems [abstract] [spotlight] [poster]
    G.F. Cretu, A. Stavrou, M.F. Locasto, S.J. Stolfo
  • Combining Multiple One-class Classifiers for Hardening Payload-based Anomaly detection Systems [abstract] [spotlight] [poster]
    R. Perdisci, G. Gu, W. Lee
  • Using the Dempster-Shafer Theory for Network Traffic Labelling [abstract] [spotlight]
    F. Gargiulo, C. Mazzariello, C. Sansone
10:15am Coffee break and poster preview
10:45am Content-based Anomaly Detection in Intrusion Detection [abstract] [slides]
S.J. Stolfo
11:15am A "Poisoning" Attack Against Online Anomaly Detection [abstract] [slides]
M. Kloft, P. Laskov
11:30am Discussion:
  • When is adversarial machine learning needed?
  • When is it effective? Are there convincing examples?
  • How does theory help develop better learning methods?

Afternoon session (2:00pm - 5:00pm):

2:00pm Spam, Phishing, Scam: How to Thrive on Fraud and Deception [abstract]
T. Scheffer
2:30pm The War Against Spam: A Report From the Front Line [abstract] [slides]
B. Taylor, D. Fingal, D. Aberdeen
2:42pm Machine Learning-Assisted Binary Code Analysis [abstract] [slides]
N. Rosenblum, X. Zhu, B. Miller, K. Hunt
3:00pm Poster spotlights:
  • Learning to Predict Bad Behavior [abstract] [spotlight] [poster]
    N. Syed, N. Feamster, A. Gray
  • Attacking SpamBayes: Compromising a Statistical Spam Filter [abstract] [spotlight] [poster]
    M. Barreno, F. Chi, A. Joseph, B. Nelson, B. Rubinstein, U. Saini, C. Sutton, D. Tygar, K. Xia
  • Using visual and semantic features for anti-spam filters [abstract] [spotlight]
    F. Gargiulo, A. Penta, A. Picariello, C. Sansone
  • Supervised Clustering for Spam Detection in Data Streams [abstract] [spotlight]
    U. Brefeld, P. Haider, T. Scheffer
  • Image Spam Filtering by Detection of Adversarial Obfuscated Text [abstract] [spotlight] [poster]
    F. Roli, B. Biggio, G. Fumera, I. Pillai, R. Satta
  • Lightweight Hierarchical Network Traffic Clustering [abstract] [spotlight] [poster]
    A. Hijazi, H. Inoue, A. Somayaji
  • Intrusion Detection in Computer Systems as a Pattern Recognition Task in Adversarial Environment: a Critical Review [abstract] [spotlight]
    I. Corona, G. Giacinto, F. Roli
  • Learning from a Flaw in a Naive-Bayes Masquerade Detector [abstract] [spotlight] [poster]
    K. Killourhy, R. Maxion
3:15pm Coffee break and poster preview
3:45pm Misleading Automated Worm Signature Generators [abstract]
W. Lee
4:15pm ALADIN: Active Learning for Statistical Intrusion Detection [abstract] [slides]
J. Stokes, J. Platt, J. Kravis, M. Shilman
4:30pm Discussion:
  • What do we learn about adversarial learning that can be applied across many applications?
  • What types of applications can benefit the most?
  • What new theory is necessary?

Invited speakers

Can Machine Learning Be Secure?
Marco Barreno, University of California, Berkeley

From intrusion detection to worm signature generation to network traffic analysis to spam filtering, real-world applications increasingly use machine learning because of its ability to adapt to changing conditions and infer patterns from data. A growing body of work has begun to raise the question of security for these machine learning techniques--the adaptability that makes learning useful may also create new opportunities for attack.

In this talk, we will survey the current research in this nascent field and present a taxonomy developed by our group at Berkeley for categorizing attacks against learning systems [1]. We will discuss how our taxonomy can organize and connect the work done to date. Framing research in this way provides two benefits: it draws out similarities between attacks, suggesting the possibility of generalizing attack descriptions and perhaps developing common defenses; and it spotlights types of attack that have received scant research attention. Building on this foundation, we will share what our research group sees as the prominent open problems in this area and which directions we believe are most promising for future research.

[1] Marco Barreno, Blaine Nelson, Russell Sears, Anthony D. Joseph, and J. D. Tygar. Can machine learning be secure? In Proceedings of the ACM Symposium on InformAtion, Computer, and Communications Security (ASIACCS'06), March 2006. (Invited paper.)


Content-based Anomaly Detection in Intrusion Detection
Salvatore J. Stolfo, Columbia University

There are many anti-virus and intrusion detection systems in wide use that are primarily signature-based detectors. They detect what is already known to be bad by matching a signature pattern against input. These systems have been effective at detecting known exploits and intrusion attempts but they fail to recognize new attacks and carefully crafted variants of old exploits. Anomaly Detection has been proposed as an alternative strategy for detecting new attacks. Anomaly Detectors model what is known to be good in order to detect deviations that are presumed to be bad. Anomaly Detection systems that analyze network flow level statistics have been the subject of research for several years and some are now appearing in commercial products. Content-based Anomaly Detection systems that utilize machine learning algorithms are designed to model normal content for a distinct site or host. These systems are designed to detect content deviations of interest that may indicate the presence of malcode that otherwise would not be detected by conventional (and soon to be obsolete) signature-based detectors. In the continuing battle between attacker and defender, Anomaly Detectors can also be thwarted by a variety of obfuscation methods. In this talk we will provide an overview of the state of the art in content-based Anomaly Detection in intrusion detection, describe various approaches to blind these detectors, and propose new approaches to counter these evasion tactics based upon randomization strategies to blind the attacker.
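
As a rough illustration of the idea in this abstract (model what is known to be good, then flag deviations), the following is a minimal sketch of a content-based anomaly detector. It is not the system presented in the talk: the byte-frequency profile, the scoring formula, the `smoothing` parameter, and all function names are assumptions chosen for brevity.

```python
import math
from collections import Counter


def byte_frequencies(payload: bytes) -> list[float]:
    """Relative frequency of each of the 256 byte values in one payload."""
    counts = Counter(payload)
    total = max(len(payload), 1)  # avoid division by zero on empty payloads
    return [counts.get(b, 0) / total for b in range(256)]


def train(normal_payloads: list[bytes]) -> tuple[list[float], list[float]]:
    """Build a profile of normal content: per-byte mean and standard deviation."""
    profiles = [byte_frequencies(p) for p in normal_payloads]
    n = len(profiles)
    mean = [sum(p[b] for p in profiles) / n for b in range(256)]
    std = [
        math.sqrt(sum((p[b] - mean[b]) ** 2 for p in profiles) / n)
        for b in range(256)
    ]
    return mean, std


def anomaly_score(payload: bytes, mean: list[float], std: list[float],
                  smoothing: float = 0.001) -> float:
    """Sum of per-byte deviations from the profile, scaled by the spread.

    The smoothing term (an assumed constant) keeps the score finite for
    byte values that never varied in the training data.
    """
    freq = byte_frequencies(payload)
    return sum(abs(freq[b] - mean[b]) / (std[b] + smoothing) for b in range(256))


# Hypothetical training data: a few benign HTTP request lines.
normal = [
    b"GET /index.html HTTP/1.1",
    b"GET /about.html HTTP/1.1",
    b"GET /news.html HTTP/1.1",
]
mean, std = train(normal)
benign_score = anomaly_score(b"GET /home.html HTTP/1.1", mean, std)
binary_score = anomaly_score(bytes(range(128, 160)), mean, std)
print(benign_score < binary_score)
```

A payload made of byte values never seen in training scores far higher than a request resembling the training data. In practice the alert threshold would have to be calibrated per site or host, which is the setting the abstract describes.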


Spam, Phishing, Scam: How to Thrive on Fraud and Deception
Tobias Scheffer, Max-Planck-Institute for Computer Science

Operating a spam business successfully involves many challenges. To start with, it is important to team up with the right vendors. The art of acquiring email addresses of likely customers has advanced well beyond harvesting addresses from web pages. Disseminating spam requires engineering proficiency: popular methods are the construction and use of viruses that form a network of software bots and the exploitation of web services. Competently engineered, phishing emails are virtually indistinguishable from legitimate personal communication. Analogously, impostor web portals can be made indistinguishable from their legitimate banking and ecommerce counterparts. Text, image, PDF, and audio messages that are generated individually according to probabilistic grammars render spam filters based on text classification useless. Recognizing spam, phishing, and scam emails raises great research challenges for machine learning far beyond text classification. They include learning under distribution shift and adversarial learning models, and the recognition of batches of messages that have been generated according to the same (text, image, audio) grammar.


Misleading Automated Worm Signature Generators
Wenke Lee, Georgia Institute of Technology

In this talk, I present a noise-injection technique for defeating worm signature generation. An attacker can craft and inject noise (faked worm flows) into the traffic in such a way that an automated (learning-based) worm signature generator cannot learn reliable signatures. This in turn means that signature-based network intrusion detection systems cannot have good worm signatures to stop the worm traffic until manual/slow worm analysis is conducted.
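
The effect described in this abstract can be sketched with a toy signature generator. This is only an illustration under strong assumptions, not the technique from the talk: the generator here simply keeps the fixed-length substrings shared by every flow in a suspicious pool, and the flows, the token length, and the function names are all hypothetical.

```python
def conjunction_signature(flows: list[bytes], token_len: int = 4) -> set[bytes]:
    """Toy learner: keep the substrings of length token_len found in every flow."""
    def tokens(flow: bytes) -> set[bytes]:
        return {flow[i:i + token_len] for i in range(len(flow) - token_len + 1)}

    common = tokens(flows[0])
    for flow in flows[1:]:
        common &= tokens(flow)
    return common


def matches(signature: set[bytes], flow: bytes) -> bool:
    """A flow matches if it contains every token; an empty signature matches nothing."""
    return bool(signature) and all(tok in flow for tok in signature)


# Hypothetical worm flows: varying padding around an invariant exploit string.
worm_flows = [
    b"aaaaEXPLOIT!bbbb",
    b"ccccEXPLOIT!dddd",
    b"eeeeEXPLOIT!ffff",
]
# Hypothetical injected noise: fake "worm" flows that omit the invariant.
fake_flows = [b"ggggharmlesshhhh", b"iiiiroutine!jjjj"]

clean_signature = conjunction_signature(worm_flows)
noisy_signature = conjunction_signature(worm_flows + fake_flows)

new_worm_flow = b"zzzzEXPLOIT!yyyy"
print(matches(clean_signature, new_worm_flow))  # signature learned without noise
print(matches(noisy_signature, new_worm_flow))  # signature learned with noise
```

With the fake flows in the pool, no substring is common to every flow, so the learned signature is empty and cannot stop new worm traffic. An attack on a real generator would need noise crafted so that it is actually placed in the suspicious pool, which this sketch simply assumes.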


Last modified: Mon Jan 7 09:45:46 CET 2008