I now know some of my summer plans: Google Summer of Code! I just found out I was accepted. My project is to build a machine translation system with Apertium that translates from Finnish to Northern Sámi. Apertium is hosting several other GSoC participants as well, including translation projects from Polish to Czech and from French to Portuguese, plus other projects to improve and expand Apertium in various ways. My proposal is available online if you want to see it. If you want to know a little more about machine translation, read on...
There are two major approaches to machine translation: statistical and linguistic (rule-based) translation. Apertium uses a combination of both. Google Translate is a well-known example of a statistical machine translation program, aided by Google's wealth of texts in various languages. It works by lining up sentences that are known to correspond in translation, and then translating chunk by chunk. In a way, it's like speaking a language by phrasebook: you may say mostly the right thing most of the time, but other times you may tell your tobacconist that your hovercraft is full of eels.
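To make the phrasebook comparison a bit more concrete, here is a toy sketch of chunk-by-chunk translation in Python. The phrase table, its English-to-Finnish entries, and its probabilities are invented for illustration, and the greedy longest-match strategy is a big simplification: a real statistical system scores whole-sentence hypotheses, including reordering and a target-language model, rather than picking each chunk independently.

```python
# Hypothetical toy phrase table: source phrase (tuple of words) -> list of
# (target phrase, probability). Entries and probabilities are made up.
PHRASE_TABLE = {
    ("good", "morning"): [("hyvää huomenta", 0.9)],
    ("thank", "you"): [("kiitos", 0.8), ("kiitti", 0.2)],
    ("good",): [("hyvä", 0.7), ("hyvää", 0.3)],
    ("morning",): [("aamu", 0.6), ("huomenta", 0.4)],
}

MAX_PHRASE_LEN = 3  # longest chunk we try to match


def translate_by_chunks(sentence):
    """Greedily match the longest known chunk at each position and emit
    its most probable translation, phrasebook style."""
    words = sentence.lower().split()
    output = []
    i = 0
    while i < len(words):
        # Try the longest chunk first, then shorter ones.
        for n in range(min(MAX_PHRASE_LEN, len(words) - i), 0, -1):
            chunk = tuple(words[i:i + n])
            if chunk in PHRASE_TABLE:
                best, _prob = max(PHRASE_TABLE[chunk], key=lambda c: c[1])
                output.append(best)
                i += n
                break
        else:
            # No chunk matched: pass the unknown word through untranslated.
            output.append(words[i])
            i += 1
    return " ".join(output)


print(translate_by_chunks("Good morning"))    # hyvää huomenta
print(translate_by_chunks("thank you good"))  # kiitos hyvä
```

Notice that "good" on its own gets the most common translation regardless of context, which is exactly where the phrasebook approach starts telling tobacconists about eels.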
The linguistic translation method analyzes words in the source language morpheme by morpheme (a morpheme being the smallest unit of meaning within a word), and then analyzes the word order of each sentence to resolve ambiguities and determine what role each word plays. After this, a bilingual dictionary is consulted, and the analysis of the source-language sentence is used to construct the sentence in the target language. In a way, this approach is much closer to actually speaking a language, because words are inflected according to linguistic rules and the grammar is consulted thoroughly to produce the output.
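Here is a similarly toy sketch of the analyse, look up, and generate steps, using Finnish-to-English so the output is easy to check. The lexicon, bilingual dictionary, and suffix list are tiny hypothetical samples, not Apertium data; real systems use finite-state morphological analysers rather than suffix stripping, and this sketch ignores vowel harmony (-ssä), consonant gradation, and the sentence-level disambiguation step entirely.

```python
# Hypothetical toy data, for illustration only.
LEXICON = {"talo": "n", "kissa": "n", "auto": "n"}          # stem -> part of speech
BIDIX = {"talo": "house", "kissa": "cat", "auto": "car"}    # Finnish -> English
CASE_SUFFIXES = {"ssa": "ine", "sta": "ela", "lla": "ade"}  # inessive, elative, adessive

# English generation templates per case; English needs articles Finnish lacks.
TEMPLATES = {
    "nom": "the {}",
    "ine": "in the {}",
    "ela": "out of the {}",
    "ade": "at the {}",
}


def analyse(word):
    """Split a word into (lemma, part of speech, case), or return None."""
    if word in LEXICON:
        return (word, LEXICON[word], "nom")
    for suffix, case in CASE_SUFFIXES.items():
        stem = word[:-len(suffix)]
        # Only accept the split if the stem is a known word, so "kissa"
        # is not misread as "ki" + inessive "-ssa".
        if word.endswith(suffix) and stem in LEXICON:
            return (stem, LEXICON[stem], case)
    return None


def translate_word(word):
    """Analyse the source word, look up the lemma, and generate the target."""
    analysis = analyse(word)
    if analysis is None:
        return "*" + word  # unknown words are marked with "*" here
    lemma, _pos, case = analysis
    return TEMPLATES[case].format(BIDIX[lemma])


for w in ["talossa", "kissa", "autosta", "koira"]:
    print(w, "->", translate_word(w))
```

The lexicon check is what keeps the analyser honest: without it, any word that happens to end in "-ssa" would be mistaken for an inessive form.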
Although linguistic translation seems more akin to learning and speaking a language, that does not mean it is 100% perfect in the way a bilingual human translating a novel might be; but that is not necessarily one of the immediate goals of machine translation. For a machine to translate any sentence flawlessly, it would need a much more thorough understanding of the meanings associated with words: when to use which word, and all of the various shades of meaning between words.
We're not there yet, but we're making progress... So maybe some day, once machines know enough to translate flawlessly, they will be able to address us politely as they take over the world.
It has been a while since I last updated this. Life's been busy, partly because at the beginning of August I'm moving across the ocean to Tromsø, Norway, to continue my studies in linguistics at the Universitetet i Tromsø. I'm excited to go, but I'll also miss Minneapolis, which I've grown more attached to in the past few years.
Despite this, it feels like it's about time I got back to linguistics! Since graduating with an undergraduate degree (also in linguistics), I've spent about 2.5 years working in web development and tech support at the University of Minnesota. While it's been enjoyable and I've learned a great deal, I'm excited about the change, and incredibly happy to get back to linguistics full-time.
I'm also excited to be moving to a town like Tromsø. Not only is it beautiful and mountainous, it's linguistically exciting too (at least for someone interested in Finno-Ugric languages). Tromsø is located in Sápmi, a cultural region spanning northern Scandinavia, Finland, and Russia, so named because it is home to the Sámi people. The region is also home to some Balto-Finnic varieties: Kven and Finnish.
The variety of Norwegian spoken in the area is also rather exciting, and is host to a few innovations not present in standard Bokmål Norwegian. I can't say I know enough about it yet, but I'll have my ears wide open upon arrival.
If it seems like a long while between posts, be sure to check out my Flickr page.