Keywords: machine learning, astrobiology, Gauquelin, Shnoll
Abstract: We apply machine learning methods to data from the Archives Gauquelin in an attempt to build a binary classifier that distinguishes outstanding scientists from sports champions using only astronomical factors derived from their natal data. We use a special split into training, validation, and testing sets, and a set of three combined features, each of which aggregates dozens of elementary astronomical features. Our null hypothesis is that accuracy on the testing sets must be 0.5 if the training sets contain the same number of group A and group B representatives born in each year, that is, if the yearly frequencies are equal. Our external testing sets contain only persons born later than those in the Archives Gauquelin. Logistic regression is our primary method, with random forest as an alternative. All data and implementations of our algorithms are available from a public repository on GitLab.com.
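The year-matching condition in the null hypothesis can be illustrated with a minimal sketch. It is not the authors' implementation (the repository code is authoritative); the function name `balance_by_year`, the record layout `(person_id, birth_year, group)`, and the toy data are assumptions made purely for illustration. The sketch downsamples the larger group within each birth year so that both groups have equal yearly frequencies.

```python
import random
from collections import defaultdict


def balance_by_year(records, seed=0):
    """Downsample so every birth year holds equally many 'A' and 'B' records.

    records: iterable of (person_id, birth_year, group), group in {'A', 'B'}.
    This layout is a hypothetical one chosen for the example.
    """
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    by_year = defaultdict(lambda: {"A": [], "B": []})
    for rec in records:
        _, year, group = rec
        by_year[year][group].append(rec)

    balanced = []
    for year in sorted(by_year):
        groups = by_year[year]
        # Keep only as many records per group as the smaller group has.
        n = min(len(groups["A"]), len(groups["B"]))
        for g in ("A", "B"):
            balanced.extend(rng.sample(groups[g], n))
    return balanced


# Toy data (fabricated for illustration only, not from Archives Gauquelin).
records = (
    [(f"a{i}", 1900, "A") for i in range(5)]
    + [(f"b{i}", 1900, "B") for i in range(3)]
    + [(f"a{i}", 1901, "A") for i in range(5, 7)]
    + [(f"b{i}", 1901, "B") for i in range(3, 7)]
    + [(f"a{i}", 1902, "A") for i in range(7, 9)]  # no 'B' in 1902: year dropped
)
balanced = balance_by_year(records)
```

With a training set balanced this way, birth year alone carries no information about group membership, which is the premise behind the expected 0.5 accuracy under the null hypothesis.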