New Developments in the Polish Parliamentary Corpus
Abstract: This short paper presents the current (as of February 2020) state of preparation of the Polish Parliamentary Corpus (PPC), an extensive
collection of transcripts of Polish parliamentary proceedings dating from 1919 to the present. The most evident developments compared
to the 2018 version are the harmonization of metadata, the standardization of document identifiers, the upload of the contents of all documents and
metadata to a database (to enable easier modification, maintenance and future development of the corpus), the linking of utterances to the
political ontology, the linking of corpus texts to source data, and the processing of historical documents.