Letter_2_Santa.py – Tapping Big Data from the Arctic Circle
Keywords: Finland, Linguistic Data Science, Cultural Studies, Art Education, Pragmatics
TL;DR: Submission for a POSTER
Abstract: POSTER
Our poster presents the results of a pilot project that aims to build the Santa Claus Letter Corpus. These letters, sent to Finland from around the world, combine text and art, mostly handwriting enriched with drawings. The senders are primarily children. The physical collection at the National Archives comprises 25 shelf meters of letters. So far, the letters have been catalogued only in batches, by country and year of origin.
We started examining the collection in 2023. Since then, we have digitised parts of it, enriched the cataloguing metadata, run tests for quantitative analyses, and carried out initial qualitative analyses. Our original focus was on letters we expected to be written in German, Finnish, Swedish, or Russian. However, we immediately found that the letters are more linguistically diverse than their postmarks suggest, e.g. letters written in Finnish but sent from Sweden, or written in English but sent from Germany.
The main results of our pilot were: 1) documentation of workflows and data standards for digitisation, 2) preliminary (manual) indexing by language, artwork, and text type, 3) experiments with computational methods for indexing the letters (format, language, and text recognition), and 4) a pragmatic analysis of a subset of German-language letters (name anonymised, in press).
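The mismatch between postmark and language noted above (Finnish letters from Sweden, English letters from Germany) is the kind of pattern that per-letter index metadata makes queryable. The following is a minimal Python sketch under stated assumptions: the `LetterRecord` fields, the `language_mismatches` helper, and the one-language-per-country table are hypothetical illustrations, not the project's actual schema or workflow.

```python
from dataclasses import dataclass
from typing import Optional


# Hypothetical index record; field names are illustrative, not the project's schema.
@dataclass
class LetterRecord:
    letter_id: str
    postmark_country: str      # ISO 3166-1 alpha-2 code of the sending country
    year: int
    language: str              # ISO 639-1 code assigned during indexing
    has_artwork: bool
    text_type: Optional[str] = None


# Simplifying assumption: one "expected" language per sending country.
# Real countries are often multilingual, so this table only serves to flag candidates.
EXPECTED_LANGUAGE = {"DE": "de", "SE": "sv", "RU": "ru"}


def language_mismatches(records):
    """Return the records whose indexed language differs from the language
    expected for their postmark country; countries not in the table are skipped."""
    return [
        r for r in records
        if r.postmark_country in EXPECTED_LANGUAGE
        and r.language != EXPECTED_LANGUAGE[r.postmark_country]
    ]


if __name__ == "__main__":
    # Toy records, invented for illustration only.
    letters = [
        LetterRecord("a1", "SE", 2019, "fi", True),
        LetterRecord("a2", "DE", 2020, "en", False),
        LetterRecord("a3", "DE", 2020, "de", True),
    ]
    print([r.letter_id for r in language_mismatches(letters)])
```

Keeping the postmark country and the indexed language as separate fields, rather than deriving one from the other, is what allows such mismatches to surface at all.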
REFERENCES
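name anonymised (in press). "Briefe an den Weihnachtsmann in Finnland – eine unerforschte Textsorte. Kategorisierung und textpragmatische Auswertung" [Letters to Santa Claus in Finland: an unexplored text type. Categorisation and text-pragmatic evaluation].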
Submission Number: 60