Cost-Sensitive Algorithms for Imbalanced Text Classification: a case study in the Brazilian legal domain

Anonymous

16 Dec 2023 · ACL ARR 2023 December Blind Submission · Readers: Everyone

Abstract: This article discusses the challenges of imbalanced classification in machine learning, where many algorithms implicitly assume that instances are evenly distributed across classes. This assumption rarely holds in real-world scenarios, so minority classes end up poorly represented in the training data. To address this, Cost-Sensitive Learning techniques have been developed that minimize the overall misclassification cost rather than merely optimizing accuracy. These techniques fall into three categories: cost-sensitive resampling, cost-sensitive algorithms, and hybrid techniques. The research presents a case study that applies these cost-sensitive approaches to an imbalanced dataset in order to classify lawsuits into repetitive themes at the São Paulo Court, Brazil. The goal is to automate lawsuit classification so as to save time, use human resources more effectively, and speed up lawsuit resolution. The study highlights the effectiveness of cost-sensitive techniques for imbalanced classification and their benefits in real-world applications, particularly in the legal field, where they improve efficiency and reduce both the manual workload and the processing time of lawsuits.
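To make "minimizing the overall misclassification cost rather than merely optimizing accuracy" concrete, here is a minimal sketch of the standard minimum-expected-cost decision rule. It is an illustration only, not the authors' method: the abstract does not specify which cost-sensitive algorithms were used. The function names, the two-class setup, the predicted probabilities, and the cost matrix (a missed minority theme costing 10 times a false alarm) are all hypothetical assumptions.

```python
from typing import Sequence


def expected_costs(probs: Sequence[float], cost: Sequence[Sequence[float]]) -> list[float]:
    """Expected cost of predicting each class.

    Hypothetical layout: probs[j] = P(true class j | x) and
    cost[i][j] = cost of predicting class i when the true class is j.
    """
    return [sum(c_ij * p_j for c_ij, p_j in zip(row, probs)) for row in cost]


def min_cost_prediction(probs: Sequence[float], cost: Sequence[Sequence[float]]) -> int:
    # Cost-sensitive rule: pick the class with the lowest expected cost.
    costs = expected_costs(probs, cost)
    return min(range(len(costs)), key=costs.__getitem__)


def argmax_prediction(probs: Sequence[float]) -> int:
    # Accuracy-oriented rule: pick the most probable class.
    return max(range(len(probs)), key=probs.__getitem__)


if __name__ == "__main__":
    # Hypothetical example: class 0 = majority, class 1 = minority theme.
    probs = [0.8, 0.2]
    # Missing the minority class costs 10; a false alarm costs 1 (assumed values).
    cost = [[0.0, 10.0],
            [1.0, 0.0]]
    print(argmax_prediction(probs))          # 0
    print(expected_costs(probs, cost))       # [2.0, 0.8]
    print(min_cost_prediction(probs, cost))  # 1
```

With a uniform 0/1 cost matrix the two rules coincide. Once errors on the minority class are weighted more heavily, the cost-sensitive rule predicts the minority class even though it is less probable, which is the behaviour the abstract contrasts with plain accuracy optimization.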

Paper Type: long

Research Area: Machine Learning for NLP

Contribution Types: NLP engineering experiment

Languages Studied: Portuguese (Brazil)
