
\documentclass{article}


% if you need to pass options to natbib, use, e.g.:
%     \PassOptionsToPackage{numbers, compress}{natbib}
% before loading maeb_2025


% ready for submission
\usepackage[nonatbib,final]{maeb_2025}
\usepackage{graphicx}

% to compile a preprint version, e.g., for submission to arXiv, add the
% [preprint] option:
%     \usepackage[preprint]{maeb_2025}


% to compile a camera-ready version, add the [final] option, e.g.:
%     \usepackage[final]{maeb_2025}


% to avoid loading the natbib package, add option nonatbib:
%    \usepackage[nonatbib]{maeb_2025}


\usepackage[utf8]{inputenc} % allow utf-8 input
\usepackage[T1]{fontenc}    % use 8-bit T1 fonts
\usepackage{hyperref}       % hyperlinks
\usepackage{url}            % simple URL typesetting
\usepackage{booktabs}       % professional-quality tables
\usepackage{amsfonts}       % blackboard math symbols
\usepackage{nicefrac}       % compact symbols for 1/2, etc.
\usepackage{microtype}      % microtypography
\usepackage{xcolor}         % colors
\usepackage{verbatim}       % multi-line comments

\title{Genetic Programming for Age-at-death Estimation from the Pubic Symphysis}



\author{
Enrique Bermejo, Oscar Cordón \\
DECSAI, University of Granada\\
\texttt{\{ebermejo,ocordon\}@decsai.ugr.es}\\ 
%\orcid{0000-0003-0355-1898}
\And
Antonio David Villegas\\
Panacea Coop. Research\\
\texttt{antonio.villegas.panacea@gmail.com}\\ %\orcid{0009-0009-9460-2814}
\And
Javier Irurita\\
Physical Anthropology Lab\\
University of Granada\\
\texttt{javieri@ugr.es}\\%\orcid{0000-0003-1676-9773}
\And
Sergio Damas\\
Software Engineering Dpt.\\
University of Granada\\
\texttt{sdamas@ugr.es}\\  %\orcid{0000-0002-8377-8349}
%\And
%Oscar Cord\'on\\
%DECSAI\\
%University of Granada\\
%\texttt{ocordon@decsai.ugr.es}\\ %\orcid{0000-0001-5112-5629}
%\equalcont{These authors contributed equally to this work.}
}


\begin{document}


\maketitle




\begin{abstract}

Skeleton-based age-at-death estimation is an arduous task in human identification that relies on characteristics such as appearance, morphology, or ossification patterns. The process is usually performed manually, although several recent studies have attempted to automate it. This study proposes a semi-automatic method for estimating age-at-death from nine pubic symphysis traits derived from Todd's method. Genetic programming is used to perform symbolic regression, generating simple mathematical expressions that estimate age-at-death. Oversampling methods are applied to address the imbalance in the data. The method achieves state-of-the-art performance while remaining interpretable, which allows existing knowledge to be validated and new forensic insights to be discovered.


\end{abstract}

%\keywords{Interpretable machine learning, Age estimation, Decision support system, Symbolic regression, Genetic programming}




\section{Introduction}\label{sec:introduction}

Human identification in forensic anthropology (FA) is crucial in scenarios of individual and mass casualties~\cite{Ubelaker2008Forensic}. The accurate estimation of the biological profile (BP), i.e., age, sex, ancestry, and stature, is essential for narrowing down potential matches, with DNA analysis used for final identification when possible. Among the bones used for this purpose, the pubic symphysis is highly reliable for age-at-death estimation~\cite{Dudzik2015Estimating}.

Historically, methods for age estimation have evolved from phase-based to numerical approaches. Phase-based methods, like the Suchey-Brooks extension of Todd's method~\cite{Brooks1990Skeletal}, are simple and widely used but suffer from subjectivity and reduced accuracy. Two strategies can also be distinguished: methods that obtain the estimation from an overall analysis of the morphological characteristics associated with the different pubic symphysis changes (usually through a visual inspection of the bone), and methods that analyze each pubic bone trait in isolation and then aggregate the partial observations to reach the final decision (i.e., component-scoring and component-based methods). Component-scoring methods, first proposed by Gilbert and McKern~\cite{Gilbert1973AMethod}, offer a more objective approach by evaluating each trait individually before aggregating the observations. The use of scoring or other component-based methods has been shown to significantly reduce both intra- and inter-observer error, because labeling each component separately can be done more objectively than assigning a general age-at-death estimation (phase or number) to the entire pubic symphysis.

The development of automatic, precise, and robust age-at-death estimation methods is currently a significant focus in FA~\cite{Ubelaker2020}. Modern approaches include advanced computer vision and machine learning (ML) techniques, which, although accurate, often produce complex models that are not easily interpretable. This study~\cite{Bermejo2025} focuses on developing transparent and accurate models.
%builds on previous efforts to automate Todd's method, 




\section{Symbolic Regression with Genetic Programming}\label{sec:methods}
%\subsection*{Learning the regression models using genetic programming-based symbolic regression}
\label{sec:learning}


%This section is devoted to briefly review the basics of the explainable machine learning (XML) approach considered in the current contribution. To do so, we will first devote a short subsection to introduce the discipline of explainable artificial intelligence and XML, which allows us to categorize the position of GP-based symbolic regressors in the field. Then, we discuss the fundamentals of classical GP, bloat control, and GA-P as important tools to design transparent ML models.


%\subsection{Explainable artificial intelligence and explainable machine learning} \label{sec:XML}


Explainable artificial intelligence (XAI)~\cite{Barredo2020} is vital for human-centric decision support systems, including those used in FA. XAI emphasizes the need for interpretable models that balance accuracy and transparency, ensuring trustworthiness in critical applications like medicine, law, and security. Symbolic regression is a form of regression analysis that searches for the mathematical expression relating the data without assuming its structure in advance, which makes the resulting models more interpretable than black-box models like deep learning. Evolutionary algorithms (EAs) are effective for exploring the complex space of such expressions. Genetic programming (GP)~\cite{Koza1992}, a tree-based EA, represents mathematical expressions as tree structures with variables as terminal nodes and operators as inner nodes. Niching genetic algorithm-programming (GA-P)~\cite{Howard1995} hybridizes GP with genetic algorithms (GAs) for parameter optimization, enhancing expression estimation. Both GP and GA-P suffer from \emph{bloat}, an excessive growth in individual size that increases evaluation costs and overfitting risk. Bloat control methods range from limiting tree depth to semantic approximation (GP-DA)~\cite{Nguyen2020}. The latter optimizes the semantic vector of expression subtrees for semantic similarity, enhancing diversity while maintaining performance.
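
As an illustration of the search problem described above, symbolic regression can be written as an optimization over a space of candidate expressions. The formulation below is a generic sketch and is not taken from the cited works: the symbols $\mathcal{F}$ (the set of expressions that can be built from the chosen operators and variables), $x_i$ (the trait vector of sample $i$), $y_i$ (its known age-at-death), $n$ (the number of training samples), and the squared-error fitness are all assumptions made for illustration.
\[
    \hat{f} = \arg\min_{f \in \mathcal{F}} \; \frac{1}{n} \sum_{i=1}^{n} \left( y_i - f(x_i) \right)^{2}.
\]
For instance, the hypothetical expression $3.5 \cdot x_{1} + x_{2} / x_{3}$ corresponds to a tree with the operator $+$ at its root, the operators $\cdot$ and $/$ as the remaining inner nodes, and the constant $3.5$ and the variables $x_{1}$, $x_{2}$, and $x_{3}$ as terminal nodes. Bloat occurs when such trees keep growing during the search without a matching reduction in the error.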




\section{Results and analysis}\label{sec:results}
\subsection{Experimental design}

The data set includes 960 annotated pubic symphysis samples from 600 individuals, aged 17--82, collected by forensic experts. Nine traits, identified from Todd's method, are categorized and annotated. The data set is imbalanced, with more older samples, and requires oversampling for effective model training. Random replication oversampling re-balances the data set, creating 2780 training and validation samples. The final test set is kept in its original form and remains unseen, ensuring an unbiased evaluation.

The experimental setup uses 5-fold cross-validation (5-CV) to avoid overfitting and ensure robust model selection, and multiple runs with different seeds assess the robustness of each method. Twenty-five models are learned for each method (GP, GP-DA, GA-P), and the best model on validation is tested on the unseen test set. GP-DA uses a maximum depth of 20 nodes, while GP and GA-P are run with maximum tree depths of 20, 40, and 60 nodes. The remaining parameters are the variable generation probability (0.3), the crossover (0.75) and mutation (0.05) probabilities, the intra-niche crossover probability (0.03), and the population size (1000). The stopping criterion is 1,000,000 evaluations.
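
The partition sizes of the original data set can be checked with simple arithmetic. The sketch below assumes that the test set is a 20\% hold-out and that each cross-validation fold uses four fifths of the remaining samples for training; both proportions are inferred from the sample counts reported in Table~\ref{tab:sota} and are not stated explicitly.
\[
    960 \times 0.2 = 192, \qquad 960 - 192 = 768, \qquad 768 \times \frac{4}{5} \approx 614, \qquad 768 \times \frac{1}{5} \approx 154.
\]
Under these assumptions, the 2780 samples produced by random replication would correspond to roughly $2780 / 768 \approx 3.6$ times the number of original training and validation samples, while the 192 test samples are left untouched.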

\subsection{Model validation and selection}

Performance is measured with the root mean squared error (RMSE) and the mean absolute error (MAE). GP-DA shows slight performance advantages with smaller tree depths. Despite similar accuracies, simpler models with fewer parameters generalize better. The seven best-performing models from 5-CV are evaluated on the test set and compared with classical ML methods such as linear regression (LR), support-vector machines (SVM), decision trees (DT), and random forests (RF). GP-DA and GA-P achieve the best results, with simpler models offering better generalization. The model finally selected, which involves only five pubic symphysis variables, is given by the following equation\footnote{The interested reader is referred to~\cite{Bermejo2025} for a graphical description of the variables involved in this equation.}:
\begin{equation}
    \mathrm{Age} = 6.06 \cdot I_{P} + U_{SE} + 6.06 \cdot L_{SE} + 5.06 \cdot V_{M} + \frac{V_{B}}{I_{P}},
\end{equation}
where $I_{P}$ represents the irregular porosity, $U_{SE}$ the upper symphysial extremity, $L_{SE}$ the lower symphysial extremity, $V_{M}$ the ventral margin, and $V_{B}$ the ventral bevel. We select this model for its simplicity, which makes it easier to interpret: in all cases but one, the variables are simply multiplied by a coefficient and added together. The only compound term is the ratio between the ventral bevel and the irregular porosity values, which is added directly to the rest of the expression. According to the coefficients in the expression, the most influential features are $I_{P}$ and $L_{SE}$. Notably, the former plays a double role, as it also modifies the ventral bevel trait through the denominator of the $\frac{V_{B}}{I_{P}}$ term. The extensive experimentation confirms that the key traits for age-at-death estimation include irregular porosity, lower symphysial extremity, and ventral margin, which are frequently used in GA-P models, while less relevant traits include bony nodule and dorsal plateau, in line with previous work~\cite{Gamez2021XAITodd}.
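
A worked evaluation may help to show how the selected expression aggregates the expert's scores. The trait values used here are hypothetical ordinal scores chosen only for illustration ($I_{P}=2$, $U_{SE}=1$, $L_{SE}=2$, $V_{M}=3$, $V_{B}=2$); the actual encoding of each trait is not reproduced in this text and may differ.
\[
    \mathrm{Age} = 6.06 \cdot 2 + 1 + 6.06 \cdot 2 + 5.06 \cdot 3 + \frac{2}{2} = 12.12 + 1 + 12.12 + 15.18 + 1 = 41.42.
\]
With the same hypothetical scores, raising $I_{P}$ from 2 to 3 gives $18.18 + 1 + 12.12 + 15.18 + \frac{2}{3} \approx 47.15$, an increase of about $5.73$ years rather than $6.06$, because the ratio term shrinks as $I_{P}$ grows. This illustrates the double role of $I_{P}$ in the expression. Note also that the ratio is only defined for $I_{P} \neq 0$.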

\subsection{Benchmarking and overview}
Finally, Table~\ref{tab:sota} compares the best models found in this contribution for GP, GA-P, and GP-DA with existing age estimation methods. Although this benchmarking is a rough and generic comparison, it provides a valuable overview of the accuracy of the methods. Two caveats apply. First, the comparison involves methods of different typology (phase-based and numerical, as well as global and component-scoring) that were tested with different validation methodologies (leave-one-out cross-validation (LOOCV), a single 50\% training-test partition, the use of the samples of one pubic symphysis laterality for training and those of the other laterality for test, and 5-fold CV). Second, the samples differ in size, age range, and distribution according to sex or ethnic group. Setting aside such differences, our proposal stands out in the comparison, achieving the lowest RMSE and MAE among the methods listed.

%% table 5
\begin{table}[ht]

\scriptsize
\caption{Comparison between the best proposed methods and the state-of-the-art results. Acronyms used: \textbf{Method type}: CS=component-scoring, PB=phase-based, N=numeric; \textbf{Experimental setup}: LOOCV=leave-one-out cross-validation, 50\%-50\% split=single 50\% training-test partition, tra: XXX-test: XXX=use of the samples of one pubic symphysis laterality for training and those of the other laterality for test, 5-CV=5-fold cross-validation; AD=Alternative Distribution.}\label{tab:sota}
\centering
\begin{tabular}{lllcccc}
\hline
\textbf{Method}   & \textbf{Type}          & \textbf{Exp. setup} & \textbf{\# Samples}     & \textbf{Age range} & \textbf{RMSE}  & \textbf{MAE}  \\ \hline
\textbf{Slice and Algee-Hewitt~\cite{Slice2015Modeling}}     & N & LOOCV                           & 41                & 19-96     & 17.15          & --             \\
\textbf{Stoyanova et al.~\cite{Stoyanova2015AnEnhanced}}     & N  & LOOCV                           & 56 & 16-100       & 19        & --             \\
\textbf{Stoyanova et al.~\cite{Stoyanova2017AComputational}} & N  & 50\%-50\% split                   & 93  & 16-90       & 13.7-16.5 & --             \\
\textbf{Kot{\v{e}}rov{\'a} et al.~\cite{Koterova2018Age}}              & CS,N  & 5-CV (w/o test)                 & 941               & 19-100    & 12.1           & 9.7           \\
\textbf{Kot{\v{e}}rov{\'a} et al.~\cite{Koterova2022} SAAS}                & CS,N  & 5-CV (w/o test)                 & 483               &  18-92       & 14.3       & 11.7              \\
\textbf{Kot{\v{e}}rov{\'a} et al.~\cite{Koterova2022} AANNESS}                & N  & 5-CV (w/o test)                 & 483               &  18-92       &  12.9            & 10.6        \\
\textbf{Gámez-Granados et al.~\cite{Gamez2021XAITodd}}       & PB    & tra: right lat.-test: left lat. & 892 (439-453)     & 18-60     & 13.19          & 10.38         \\
\textbf{Gámez-Granados et al.~\cite{Gamez2021XAITodd}}       & PB    & tra: right lat.-test: left lat. & 960 (487-473)     & 18-82     & 14.61          & 11.62         \\ \hline
\textbf{GP  (Depth 20)}                                       & CS,N    & 5-CV tra-val-test               & 960 (614-154-192) & 18-82    & 10.82          & 8.56          \\
\textbf{GA-P (Depth 20)}                                     & CS,N    & 5-CV tra-val-test               & 960 (614-154-192) & 18-82    & \textbf{10.81} & \textbf{8.55} \\
\textbf{GP-DA (Depth 20)}                                    & CS,N    & 5-CV tra-val-test               & 960 (614-154-192) & 18-82    & 10.84 & 8.55 \\ \hline
\textbf{GA-P (Depth 20) AD}                                     & CS,N    & 5-CV tra-val-test               & 668 (381-95-192) & 18-82    & \textbf{9.54} & \textbf{7.51} \\
\hline
\end{tabular}
\end{table}
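
For reference, the two error measures reported in Table~\ref{tab:sota} follow the standard definitions given below. The notation is an assumption introduced here for illustration: $y_i$ denotes the known age-at-death of test sample $i$, $\hat{y}_i$ the age estimated by the model, and $n$ the number of test samples.
\[
    \mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left( \hat{y}_i - y_i \right)^{2}}, \qquad
    \mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} \left| \hat{y}_i - y_i \right|.
\]
Both measures are expressed in years. Because RMSE squares the individual errors before averaging, it penalizes large deviations more heavily and is never smaller than MAE, which is consistent with every row of the table that reports both values.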


\subsection{Designing a new model}

To explore the space of candidate expressions more thoroughly, we follow a different approach that combines age-targeted undersampling and oversampling strategies. This results in a new uniform distribution in which each age value is represented by 21 samples. Specifically, the undersampling step, applied to the middle-age range, reduces the data set to 476 samples, while the oversampling step yields a total of 1113 training samples and prevents individuals over 64 years from being treated as outliers. We follow the same experimental setup as in the previous experiment (5-CV) with the new data distribution. The behavior of the algorithms is similar, and GA-P (Depth 20) again achieves the best performance among the GP methods. The last row of Table~\ref{tab:sota} (GA-P AD, standing for Alternative Distribution) summarizes the results for an overview comparison. In particular, the test errors are \textbf{9.54} and \textbf{7.51} years according to RMSE and MAE, respectively. Hence, the refined preprocessing allows the method to achieve an even lower test error. The resulting equation can be expressed as follows:

\begin{equation}
    \mathrm{Age} = V_{M} \left(I_{P} + 4.88 B_{N} + 4.88 D_{M} - V_{M} + 3.22 + \frac{I_{P} - V_{M}}{V_{B}}\right).
\end{equation}

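Unlike the first model, this expression involves the traits $B_{N}$ and $D_{M}$ and no longer uses $U_{SE}$ or $L_{SE}$. These differences further support the notion that certain traits are more appropriate for specific age ranges, in agreement with~\cite{Gamez2021XAITodd} and Castillo et al.~\cite{Castillo2021Technical}.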
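
To make the structure of the second model easier to read, the leading factor $V_{M}$ can be distributed over the terms in parentheses. This is a purely algebraic rewriting of the expression learned with the alternative distribution, using the same symbols as in the original equation:
\[
    \mathrm{Age} = V_{M} I_{P} + 4.88\, V_{M} B_{N} + 4.88\, V_{M} D_{M} - V_{M}^{2} + 3.22\, V_{M} + \frac{V_{M} \left( I_{P} - V_{M} \right)}{V_{B}}.
\]
Every term now depends on $V_{M}$, including a quadratic term, so the ventral margin scales the contribution of all the other traits instead of being added to them as in the first model. As a hypothetical example (the scores are illustrative and do not reflect the actual trait encoding), setting $V_{M}=3$, $I_{P}=2$, $B_{N}=1$, $D_{M}=2$, and $V_{B}=2$ gives
\[
    \mathrm{Age} = 6 + 14.64 + 29.28 - 9 + 9.66 - 1.5 = 49.08.
\]
As with the ratio in the first model, the last term is only defined for $V_{B} \neq 0$.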

\section{Conclusions}\label{sec:conclusions}

This study~\cite{Bermejo2025} presents a robust, interpretable method for age-at-death estimation, combining traditional forensic knowledge with modern ML. Symbolic regression with GP and GA-P offers accurate, transparent models, supporting forensic anthropologists' work. Future work should explore hierarchical models and ensemble expert assessments to further enhance reliability and accuracy.



\begin{ack}
This work was supported by grant CONFIA (PID2021-122916NB-I00) funded by MCIN/AEI/10.13039/501100011033 and by ``ERDF A way of making Europe''. Additionally, E.B.'s work has been supported by the Regional Government of Andalusia through a postdoctoral fellowship (DOC\_01130). A.D.V.'s work has been supported by the Regional Government of Andalusia under the Recovery, Transformation and Resilience Plan (GR/INV/0004/2022).

\end{ack}



 %\bibliography{sn-bibliography}%

%\bibliographystyle{plainnat}
\bibliographystyle{IEEEtran}
{\small
\bibliography{IEEEabrv,sn-bibliography}
}


\end{document}
