What is Sean doing in Bozen?
I am in a town in northern Italy called Bozen in German and Bolzano in Italian. Around 100,000 people live in Bozen. Bozen is in South Tyrol (German: Südtirol; Italian: Alto Adige), an autonomous province about the size of Delaware. Both German and Italian are spoken in South Tyrol, but native speakers of German form a moderate majority.
I have an 18-month job at an institution called EURAC, or the European Academy (Europäische Akademie). EURAC is technically private, but it receives funding from the EU (the European government) and from the government of the Autonomous Province of South Tyrol, among other sources. Many different kinds of research are carried out here, including work on environmental and land-use issues in South Tyrol, studies of the genetics of the local human population (many of the mountain villages have seen little population movement over the centuries), and so on.
I'm in the Department of Communication and Multilingualism. I am working with texts that were written in German here in South Tyrol; this job makes good use of my skills as a linguist and as a programmer. The main part of my job is to take electronic texts from many different sources (newspaper text, government texts, court documents, novels, cookbooks, etc.) and convert them all to a single, standard electronic format for easy processing. A collection of texts of this kind is called a corpus.
What is a corpus good for? Here is a brief overview I wrote.