September 2025

João Luiz Junho Pereira, Alfredo Antonio Alencar Exposito de Queiroz, Telmo de Menezes e Silva Filho, Ana Carolina Lorena, Rafael Gomes Mantovani, Gisele Lobo Pappa, and Ricardo Bastos Cavalcante Prudêncio. 2025. Novel applications of item response theory for analysing data set complexity and benchmark selection. Mach. Learn. 114, 10 (Sep 2025). https://doi.org/10.1007/s10994-025-06873-3

Abstract

Item response theory (IRT) was developed in psychometrics to measure the latent skills of human respondents based on their observed responses to items with different difficulty levels. Human ability is high in IRT when one correctly responds to difficult items despite random mistakes in easy items. IRT has been recently framed as a powerful tool to characterise instance hardness in classification problems by measuring difficulty and discrimination levels of instances in a data set based on the correctness of a set of classifiers. Here, we generalise such a concept to the data set level by taking a pool of 509 classification data sets and assessing their difficulties and discriminations based on the performance achieved by 95 classifiers when solving these problems. The ability is estimated such that high abilities are assigned to classifiers with better behaviour in hard data sets. We further evaluated IRT in two distinct applications. First, we build a regression meta-model where complexity measures are used to predict the IRT parameters of new data sets without the need to retrain the IRT model. Second, we propose two IRT-based benchmarks with 30 data sets each to test classifiers, one selected for diversity and another selected for greater difficulty. Both benchmarks may be used to evaluate new methods more broadly, instead of the common practice of gathering random data sets from public repositories.
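To illustrate the difficulty and discrimination parameters mentioned in the abstract, here is a minimal sketch of the textbook two-parameter logistic (2PL) IRT model from psychometrics. This is an assumption for illustration only: the abstract does not state which IRT formulation the paper uses, and the function name `p_correct` and all numeric values are hypothetical.

```python
import math


def p_correct(ability: float, difficulty: float, discrimination: float = 1.0) -> float:
    """Probability of a correct response under the textbook 2PL model.

    Assumed for illustration; not necessarily the model used in the paper.
    """
    return 1.0 / (1.0 + math.exp(-discrimination * (ability - difficulty)))


# Hypothetical values: a strong and a weak respondent facing the same hard item.
strong = p_correct(ability=2.0, difficulty=1.0, discrimination=1.5)
weak = p_correct(ability=-1.0, difficulty=1.0, discrimination=1.5)
```

Under this model, a respondent whose ability equals the item difficulty succeeds with probability 0.5, and a larger discrimination value makes the success probability change more sharply around that point.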

Authors

João Luiz Junho Pereira, Divisão de Ciência da Computação, Instituto Tecnológico da Aeronáutica, São José dos Campos, Brazil

Alfredo Antonio Alencar Exposito de Queiroz, Divisão de Ciência da Computação, Instituto Tecnológico da Aeronáutica, São José dos Campos, Brazil

Telmo de Menezes e Silva Filho, School of Engineering Mathematics and Technology, University of Bristol, Bristol, UK

Ana Carolina Lorena, Divisão de Ciência da Computação, Instituto Tecnológico da Aeronáutica, São José dos Campos, Brazil

Rafael Gomes Mantovani, Universidade Tecnológica Federal do Paraná, Apucarana, Brazil

Gisele Lobo Pappa, Departamento de Ciência da Computação, Universidade Federal de Minas Gerais, Belo Horizonte, Brazil

Ricardo Bastos Cavalcante Prudêncio, Centro de Informática, Universidade Federal de Pernambuco, Recife, Brazil
