TAXPAYER CLUSTERING FROM FINANCIAL STATEMENT REPORTING PATTERNS
Abstract: : The usage of machine learning in accounting area is one stream
that less explored compared to such mainstream Natural Language Processing (NLP) or image
explorations, yet it is a promising area to explore. Background Problems: The use of classical
methods such as vertical and horizontal analysis might be effective to detect anomalies or financial
fraud in a financial statement, but they will not be efficient tool when faced with enormous set of
data. The same problem is faced by Indonesian tax officials when it comes to detecting anomalies
in financial statements reported to tax officials by taxpayers. Novelty: This study explores state of
the art linear regression theorem to be used in accounting area. Using data of financial statements
reported to Indonesian tax administration and historical records of the taxation audits, the study
shows detectable patterns generated from the experiment. Research Methods: This study uses
linear regression’s B1 value that represent degree of changes over the studied year. A set of B1 of
each entity creates a unique high-dimensional object. In short, this study uses yearly value of each
financial statement account of an entity to build a unique point that represent itself among other
points. a clustering method then done to finally group sets that are having similar pattern. Finding
/ Results: This study shows that the method used in this experiment can be used as option to detect
patterns on how an entity reports its financial statement over the years. These pattern then being
validated by the occurrence of underpayment/ overpayment of corporate income from tax audit
results. By examining the cluster results, this study shows that some clusters identify labelled
pattern quite well with 2 out of 3 labels can be identified well by this study. The comparation
results between unsupervised clustered method versus labeled criteria show a significant
probability of fitness. Conclusion: This study demonstrates a promising technique using machine
learning to detect patterns in financial reports that are reported to tax administrators that can assist
them for early detection of suspicious reports.
Loading