Jan Mielniczuk ; Adam Wawrzeńczyk - Single-sample versus case-control sampling scheme for Positive Unlabeled data: the story of two scenarios

fi:12648 - Fundamenta Informaticae, May 14, 2025, Volume 193
Single-sample versus case-control sampling scheme for Positive Unlabeled data: the story of two scenarios

Authors: Jan Mielniczuk ; Adam Wawrzeńczyk

    In this paper we argue that the performance of classifiers based on Empirical Risk Minimization (ERM) for positive unlabeled data that are designed for the case-control sampling scheme may deteriorate significantly when they are applied in a single-sample scenario. We show why their behavior depends on the scenario in all but very specific cases. We also introduce a single-sample analogue of the popular non-negative risk classifier designed for case-control data and compare its performance with that of the original proposal. We show that significant differences occur between them, especially when half or more of the positive observations are labeled. The opposite case, in which an ERM minimizer designed for the case-control case is applied to single-sample data, is also considered, and similar conclusions are drawn. Accounting for the difference between scenarios requires a single, but crucial, change in the definition of the empirical risk.
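
The contrast between the two scenarios can be illustrated with a minimal sketch. It is not taken from the article and may differ from the authors' exact formulation. It assumes a sigmoid loss, a known class prior (`prior`), and labeling that selects positives completely at random; all function names are hypothetical. In the case-control scheme the unlabeled observations are treated as a sample from the marginal distribution of the features, whereas in the single-sample scheme only the whole sample (labeled and unlabeled together) has that distribution, so the marginal term is averaged over all observations.

```python
import math


def sigmoid_loss(score, y):
    """Sigmoid loss l(g(x), y) = 1 / (1 + exp(y * g(x))) for y in {-1, +1}."""
    return 1.0 / (1.0 + math.exp(y * score))


def mean_loss(scores, y):
    """Average loss over classifier scores g(x), all treated as class y."""
    return sum(sigmoid_loss(s, y) for s in scores) / len(scores)


def nn_risk_case_control(scores_pos, scores_unl, prior):
    """Non-negative PU risk, case-control reading (assumption of this sketch).

    The unlabeled scores are treated as a sample from the marginal
    distribution of X, so they alone estimate E_X[l(g(X), -1)].
    """
    pos_as_pos = mean_loss(scores_pos, +1)
    pos_as_neg = mean_loss(scores_pos, -1)
    unl_as_neg = mean_loss(scores_unl, -1)
    return prior * pos_as_pos + max(0.0, unl_as_neg - prior * pos_as_neg)


def nn_risk_single_sample(scores_pos, scores_unl, prior):
    """Non-negative PU risk, single-sample reading (assumption of this sketch).

    Only the pooled sample follows the marginal distribution of X, so
    E_X[l(g(X), -1)] is estimated from labeled and unlabeled scores together.
    """
    pos_as_pos = mean_loss(scores_pos, +1)
    pos_as_neg = mean_loss(scores_pos, -1)
    all_as_neg = mean_loss(list(scores_pos) + list(scores_unl), -1)
    return prior * pos_as_pos + max(0.0, all_as_neg - prior * pos_as_neg)
```

Under these assumptions the two functions differ only in which observations enter the marginal term, which is one way to read the "single change" mentioned in the abstract. The gap between them grows with the share of labeled observations in the sample.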


    Volume: Volume 193
    Published on: May 14, 2025
    Accepted on: July 9, 2024
    Submitted on: December 5, 2023
    Keywords: Computer Science - Machine Learning
