The Use of Machine Learning in Detecting Corporate Income Tax Fraud From Financial Statement Pattern

Achmad Ginanjar

Published: 31 Aug 2022, Last Modified: 30 Sept 2024
Venue: DGT Research and Scholarship Festival Extended Abstract 2022
License: CC BY 4.0

Abstract: The use of machine learning to analyse financial statements is less explored than mainstream data mining areas such as Natural Language Processing (NLP) or image exploration, yet it is a promising area. This study explores state-of-the-art linear regression techniques to analyse detectable patterns in taxpayers' financial statements, using a method that conceptually adopts the basic ideas of both the vertical and horizontal approaches to financial statement analysis. Using financial statements reported to the Indonesian tax administration and historical records of tax audits, the study shows that the experiment produces detectable patterns. The study fits linear regressions to financial statement account values to represent the degree of change over the years studied. Furthermore, it uses the yearly value of each financial statement account of an entity to build a unique point that represents that entity among the other points. A clustering method is then applied to group entities that have similar patterns. The study shows that this method can be used to analyse how an entity reports its financial statements over the years and to cluster entities by the possibility of committing fraud, derived from the financial statement patterns in historical audit records. These patterns are then validated against the occurrence of underpayment or overpayment of corporate income tax in tax audit results. Examining the cluster results shows that some clusters identify the labelled patterns quite well: two of the three labels can be identified accurately. The comparison between the unsupervised clustering results and the labelled criteria shows a significant probability of fit. This study, however, did not evaluate feature importance, which might explain why the clusters are formed.
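
The pipeline described in the abstract (per-account regression over the years, one point per entity, clustering, then comparison with audit labels) can be sketched roughly as follows. This is an illustrative sketch only, not the study's implementation. The abstract does not name the clustering algorithm, so a small k-means is swapped in as an assumption. The share-of-total normalisation used for the vertical step, the function names, and the toy data and audit codes are likewise hypothetical.

```python
import numpy as np

def common_size(values):
    """Vertical step (assumed form): each account as a share of the entity-year total."""
    values = np.asarray(values, dtype=float)
    totals = np.abs(values).sum(axis=2, keepdims=True)
    return values / np.where(totals == 0, 1.0, totals)

def account_slopes(values):
    """Horizontal step: least-squares slope of each account over the years.

    values has shape (n_entities, n_years, n_accounts);
    the result has shape (n_entities, n_accounts).
    """
    values = np.asarray(values, dtype=float)
    n_entities, n_years, n_accounts = values.shape
    years = np.arange(n_years, dtype=float)
    # np.polyfit fits every column of a 2-D y independently; row 0 holds the slopes.
    flat = values.transpose(1, 0, 2).reshape(n_years, -1)
    slopes = np.polyfit(years, flat, deg=1)[0]
    return slopes.reshape(n_entities, n_accounts)

def kmeans_labels(points, k, n_iter=50):
    """Plain k-means with a deterministic farthest-point start (assumed algorithm)."""
    points = np.asarray(points, dtype=float)
    centers = [points[0]]
    for _ in range(1, k):
        d = np.min([np.linalg.norm(points - c, axis=1) for c in centers], axis=0)
        centers.append(points[d.argmax()])
    centers = np.array(centers)
    for _ in range(n_iter):
        dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            members = points[labels == j]
            if len(members):  # keep the old center if a cluster empties
                centers[j] = members.mean(axis=0)
    return labels

def cross_tab(clusters, audit_labels, k, n_labels):
    """Count how many entities fall in each (cluster, audit label) cell."""
    table = np.zeros((k, n_labels), dtype=int)
    np.add.at(table, (clusters, audit_labels), 1)
    return table

# Toy data invented for illustration: 6 entities, 4 years, 2 accounts.
years = np.arange(4)
rising = np.stack([10 + 5 * years, 40 - 5 * years], axis=1)  # account 0 gains share
falling = rising[:, ::-1]                                     # account 0 loses share
values = np.stack([rising] * 3 + [falling] * 3).astype(float)
audit = np.array([0, 0, 0, 1, 1, 1])  # hypothetical audit outcome codes

features = account_slopes(common_size(values))  # one point per entity
clusters = kmeans_labels(features, k=2)
table = cross_tab(clusters, audit, k=2, n_labels=2)
print(table)
```

Each row of the printed table is a cluster and each column an audit label, so a cluster dominated by one column is one that matches that label well. Real financial data would also need scaling and a reasoned choice of the number of clusters, neither of which is specified in the abstract.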