Fraud detection by generating positive samples for classification from unlabeled data

Kocsis, Levente and György, András (2010) Fraud detection by generating positive samples for classification from unlabeled data. In: ICML 2010. Proceedings of the 27th International Conference on Machine Learning. Workshop on Machine Learning and Games. Haifa, 2010.

In many real-world (binary) classification problems it is easy to obtain unlabeled data, but labeled data are very expensive or simply unavailable. In certain cases, however, such as detecting fraud in (computer) games or insider trading in stock markets, one can assume that the unlabeled data contain very few samples from one class (fraudulent plays or insider trades), but it is possible to generate synthetic data from this class. Training a naive classifier on such data is particularly suited to detecting fraud in Markov decision problems when the feature vectors of the classifier are composed of, for each state, the frequency with which a player deviates from the optimal policy and the associated excess reward. Based on a synthetic example in blackjack, we demonstrate that this classification method can perform quite well even when the generated positive samples come from a distribution different from the real one. The method is also applied to identify possibly fraudulent trades in the stock market.
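
The feature construction described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the (state, action, reward) log format, and the definition of excess reward as the obtained reward minus a per-state baseline under the optimal policy are all assumptions made for illustration. Unlabeled players are given label 0 and synthetically generated fraudulent players label 1, following the idea that the unlabeled data contain very few frauds; the choice of classifier is left open.

```python
from collections import defaultdict


def deviation_features(plays, optimal_action, baseline_reward, states):
    """Build one player's feature vector (hypothetical layout).

    plays: list of (state, action, reward) tuples for a single player.
    optimal_action: dict mapping state -> action of the optimal policy.
    baseline_reward: dict mapping state -> expected reward under the
        optimal policy (an assumed definition of the reference value).
    states: ordered list of states; fixes the feature order.

    Returns [dev_freq(s), mean_excess(s)] concatenated over `states`.
    """
    visits = defaultdict(int)
    deviations = defaultdict(int)
    excess_sum = defaultdict(float)

    for state, action, reward in plays:
        visits[state] += 1
        if action != optimal_action[state]:
            deviations[state] += 1
            # Excess reward: what the deviation earned beyond the baseline.
            excess_sum[state] += reward - baseline_reward[state]

    features = []
    for s in states:
        n, d = visits[s], deviations[s]
        features.append(d / n if n else 0.0)              # deviation frequency
        features.append(excess_sum[s] / d if d else 0.0)  # mean excess reward
    return features


def build_training_set(unlabeled, synthetic_fraud, optimal_action,
                       baseline_reward, states):
    """Label unlabeled players 0 and synthetic fraudulent players 1."""
    X, y = [], []
    for label, group in ((0, unlabeled), (1, synthetic_fraud)):
        for plays in group:
            X.append(deviation_features(plays, optimal_action,
                                        baseline_reward, states))
            y.append(label)
    return X, y


# Toy data with made-up states, actions and rewards.
states = ["s0", "s1"]
optimal_action = {"s0": "hit", "s1": "hit"}
baseline_reward = {"s0": 0.5, "s1": 0.5}

honest = [("s0", "hit", 1.0), ("s1", "hit", 0.5)]
cheater = [("s0", "hit", 1.0), ("s0", "stand", 2.0),
           ("s0", "stand", 0.0), ("s1", "hit", 0.5)]

X, y = build_training_set([honest], [cheater], optimal_action,
                          baseline_reward, states)
```

The resulting `X` and `y` can be passed to any standard binary classifier. In this sketch a state that is never visited, or never deviated from, contributes zeros, which is a simplification rather than something stated in the abstract.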

Item Type: Conference or Workshop Item (Paper)
Uncontrolled Keywords: anomaly detection, fraud detection, semi-supervised learning
Subjects: Q Science > QA Mathematics and Computer Science > QA75 Electronic computers. Computer science
Depositing User: Eszter Nagy
Date Deposited: 12 Dec 2012 08:38
Last Modified: 12 Dec 2012 08:38
