Recent blog entries from 2009

nginx and Django media directories — 30 Dec 2009

Brief note to anyone else who's looking (and to me, in case I forget). While learning to use nginx, I either misread something or borrowed the wrong information, and I was attempting to set up static content directories within my server definition like so:

server {

    ...

    location /media {
        root /site/media/dir;
        access_log off;
    }

    ...

}

Unfortunately, it wasn't working out for me that way. I tried all manner of things to rule out something simple, such as making sure there were slashes on the end of the directory (nope), or a permissions issue with nginx's user not being able to access the directory (nope!). Finally, I stumbled upon an email list posting that led me in the right direction: I needed to use alias instead of root. Using root (according to the email thread) implies that the location you specify (e.g., location /media) actually exists as a directory within the root you give, so the request path is appended to it in full. Oops!

location /media {
    alias /site/media/dir;
    access_log off;
}

Now with an alias instead, everything works fine, and nginx is running like a charm.
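To make the difference concrete, here is a minimal Python sketch of how the two directives map a request path onto the filesystem for a simple prefix location. It is a simplified model, not nginx's actual implementation, and the function names and the example request path /media/css/site.css are made up for illustration.

```python
def resolve_with_root(root, uri):
    """root: the full request URI is appended to the root path."""
    return root + uri


def resolve_with_alias(location, alias, uri):
    """alias: the matched location prefix is replaced by the alias path."""
    if not uri.startswith(location):
        raise ValueError("URI does not fall under this location")
    return alias + uri[len(location):]


uri = "/media/css/site.css"  # hypothetical request path

# With root, the lookup lands in a media/ directory *inside* the given path.
print(resolve_with_root("/site/media/dir", uri))
# /site/media/dir/media/css/site.css

# With alias, the /media prefix is swapped out for the given path.
print(resolve_with_alias("/media", "/site/media/dir", uri))
# /site/media/dir/css/site.css
```

Seen this way, the original configuration would only have worked if the files lived in /site/media/dir/media.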

0 comments

Minor updates — 29 Dec 2009

Since I'm a poor graduate student, I've added a basic CV, which is accessible either by that link or, if you're not reading this in an RSS reader, in the navigation bar at the upper right. More detailed information is of course available on request.

Also, there were some outstanding things that needed doing with this blog, such as adding pagination and trackbacks, and switching over to Mercurial (Hg) from SVN (Hg just seems to have a better workflow for me). I disabled some entertaining yet out-of-date things that I hadn't had time to keep up, or indeed to finish the starting content for (moving country takes work!). Maybe I'll get back to that soon and re-enable it. Blogging about random Finnish words in detail is fun, anyway. :)

Of course, I always wonder why I don't just run some WordPress instance, or just direct my domain at a Blogger account, but it's always more fun to program things on your own when you've got the time, and you always learn a few new and useful things. On the to do list for the break from school is to try out nginx. I've heard it's great for high load things, so I'm curious to do some load-testing with some projects I've been working on that are more computationally intensive. More on one of those later... :)

0 comments

OpenMinneapolis announced — 16 Dec 2009

If you don't know much about Minneapolis and what government data is available to you, that's no surprise; it's somewhat difficult to find out, and requires reading through state statutes and city ordinances in order to know what you can request. And knowing what you can request doesn't make the process of getting the data any easier, unless you have experience with governmental processes.

Having become annoyed with a lack of transparency and open government data in Minneapolis, some friends and I are launching a project, OpenMinneapolis.org, to make data on government meetings, officials, elections and various processes more available to the public and private individuals. You'll even be able to check out attendance records for city council members, and know whether they're doing the job you elected them to do, as well as have access to meeting minutes in an open and usable format (in this case XML). Take a look at the announcement on OpenMinneapolis.org for more details...

We're already getting some press too, and have submitted a grant application to the Knight Foundation, which you can go and vote on. :)

1 comment

Referrer-based conditional redirects in lighttpd — 16 Nov 2009

I noticed someone was hot-linking to an image of Bonnie Tyler stored on my Bonnie Tyler tribute website: Total Eclipse of the World, so I thought I'd circumvent this somehow. I don't want my precious bandwidth going to unknown hotlinkers.

The solution was to use lighttpd's virtual host definitions to insert a condition saying that if the HTTP referrer is not my domain, then the user should be redirected somewhere else. Now, referrers can of course be spoofed easily, but this is not the point. I may still want people to be able to share a link to an image, but I don't want someone embedding an <img> tag somewhere that gets a ton of hits... Here's what the configuration looks like:

$HTTP["host"] == "mydomain.com" {
    server.document-root = "/foo/bar"
    server.errorlog = "/foo/bar"
    accesslog.filename = "/foo/bar"
    server.error-handler-404 = "/foo/bar"

    $HTTP["referer"] !~ "^($|https?://(.*\.)?mydomain\.com)" {
        url.redirect = ("^(.*)\.(jpg|gif|png|css|js)" => "http://www.goldenplec.com/wp-content/uploads/2008/10/rick_astley.jpg")
    }
}
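For anyone who wants to check what the two patterns above will and won't catch before reloading lighttpd, here is a rough Python sketch of the same decision. It assumes Python's re module treats these particular patterns the same way lighttpd's regex engine does; the function name, the example.org referrer and the /img/bonnie.jpg path are all made up for illustration.

```python
import re

# Same patterns as in the lighttpd configuration above.
ALLOWED_REFERER = re.compile(r"^($|https?://(.*\.)?mydomain\.com)")
PROTECTED_FILE = re.compile(r"^(.*)\.(jpg|gif|png|css|js)")


def should_redirect(referer, path):
    """True if the request would be sent to the redirect target."""
    referer_is_foreign = ALLOWED_REFERER.search(referer) is None
    return referer_is_foreign and PROTECTED_FILE.search(path) is not None


# An empty referrer (a shared link, a typed URL) is allowed through.
print(should_redirect("", "/img/bonnie.jpg"))                              # False
# Requests coming from the site's own pages are allowed through.
print(should_redirect("http://www.mydomain.com/page", "/img/bonnie.jpg"))  # False
# An <img> tag embedded on another site gets redirected.
print(should_redirect("http://example.org/forum", "/img/bonnie.jpg"))      # True
# Pages that aren't images, CSS or JS are never redirected.
print(should_redirect("http://example.org/forum", "/index.html"))          # False
```

Note that the referrer pattern is not anchored at the end, so a referrer that merely starts with something ending in mydomain.com would also pass; given how easily referrers are spoofed anyway, that is probably fine here.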

0 comments

Action Squad and plagiarism troubles with Greg Brick — 16 Nov 2009

Greg Brick, author of Subterranean Twin Cities, has allegedly lifted contents of his book from Action Squad's website. Has anyone read it? I assume he hasn't had the gall to lift stories verbatim, but I could be wrong.

If the book isn't worth a read, Action Squad certainly is. It comes filled with photos of urban exploration adventures that are 100% awesome.

0 comments

Minneapolis Ward 9 City Council Election Campaign Financing — 3 Oct 2009

If you ever cared to know much about the Ward 9 election and campaign finances, it looks like the shit is really hitting the fan and coming out on this Minneapolis Issues List post. Dave Bicking (Green) took some heat, but responded with an elucidating explanation of who supports Gary Schiff and what their connections are. Dave Bicking is particularly critical of the donations to Gary Schiff from businesses and Political Action Committees, and is prohibited from receiving them as a Green party-endorsed candidate.

It pleases me to see politicians get involved in their campaigns like this, and respond directly to criticism to even up the score. And now one of the donors has responded about their own donations, so it looks like the debate is heating up... I hope it gets more attention!

0 comments

Pedobear almost for sale at prominent Finnish chain — 11 Aug 2009

Aamulehti reports (with pictures, for proof!) that a well-known clothing chain in Finland, Seppälä, is holding a design competition in which a Pedobear shirt was competing for some amount of time. There's not much original journalism available in the Aamulehti article (which lists its source as Iltalehti), so there isn't much to report; however, Seppälä removed the image as word spread that Pedobear was available. According to Aamulehti, the bear was so harmless-looking that Seppälä obviously had no reason to doubt whether it was fit to print or not.

If you don't know what (or who) Pedobear is, take a look at Encyclopedia Dramatica's article (totally not safe for work, if you don't know what Encyclopedia Dramatica is), which explains everything. Normally, ED wouldn't be a reputable source, but it appears the English-language Wikipedia has deleted their Pedobear article a few times, so if you crave Wikipedia Pedobear wisdom, it is available in Finnish.

Now, who exactly is running this design competition? If they missed this, who knows what other 'borrowed' designs they're missing...

0 comments

Life update: æ skal til Tromsø — 25 Jul 2009

It has been a while since I last updated this. Life's been busy, partly because at the beginning of August, I'm moving across the ocean to Tromsø, Norway to continue my studies in linguistics at the Universitetet i Tromsø. I'm excited to go, but I'll also miss Minneapolis, which I've grown more attached to in the past few years.

Despite this, it feels like it's about time I got back to linguistics! I've spent about 2.5 years after graduating with an undergraduate degree (also linguistics) working in web development and tech support at the University of Minnesota. While it's been enjoyable and absolutely full of learning something, I'm excited about the change of what I'll be doing, and incredibly happy to get back to linguistics full-time.

I'm also excited to be moving to a town like Tromsø. Not only is it beautiful and mountainous, it's linguistically exciting too (at least for someone interested in Finno-Ugric languages). Tromsø is located in Sápmi, a cultural region in northern Scandinavia and Russia, termed so because it is home to the Sámi people. The region also contains some varieties of Balto-Finnic languages: Kven and Finnish.

The variety of Norwegian spoken in the area is also rather exciting, and is host to a few innovations not present in standard bokmål Norwegian. I can't say I know enough about it yet, but I'll have my ears wide open upon arrival.

If it seems like it's a while between posts, be sure to check out my Flickr page.

1 comment

X11 Font Cache, FontExplorer X and Inkscape — 19 Jun 2009

I use FontExplorer X to manage fonts, and by default it maintains an additional directory of fonts outside of the other standard font directories. This is no problem for most OS X applications; however, X11 applications such as Inkscape use X11's font cache, which does not check the FontExplorer X directory for fonts.

When running the font cache updating tool (font_cache), I noticed that it was searching in a directory that didn't exist, ~/.fonts; so, not wanting to figure out how to define which directories font_cache should be looking in, I created a symbolic link from FontExplorer X's library to ~/.fonts:

ln -s FontExplorer\ X/Font\ Library/ .fonts

And then forced font_cache to run and recompile all fonts from the X11 Terminal application:

font_cache -v -f

This did the trick, and now all my fonts show up in Inkscape. I had also attempted previously to create a symlink in a directory that font_cache was checking already, but with no success. It looks like font_cache will not traverse symlinked directories if they occur inside a directory it checks (e.g., ~/Library/Fonts/fontexplorer_x_symlink), but it will look in a checked directory that is itself a symlink (e.g., ~/.fonts/).

Hopefully anyone googling for a similar issue will find this, but what other ways could one get font_cache to look in other directories? I couldn't find much, but then again I didn't really search all that much either. ;)

0 comments

Minneapolis City Board and Commission Diversity Audit — 27 May 2009

Councilmember Cam Gordon and a university graduate student are conducting a 'diversity audit' of Minneapolis city boards and commissions, wishing to get solid data on how diverse the Minneapolis government is and how accurately this represents the population of Minneapolis. Ah, democracy!

“I have concerns based on what I’ve seen,” Gordon said. “We are probably more male, more middle-aged and more white than the city as a whole.”

However, Gordon said he could be wrong, and having the data provided by the diversity audit would give the city solid numbers.

Part of me wishes they would conduct an audit of neighborhood organizations, too. My limited experience with them says that they fit well into the demographic that Gordon suggested above: primarily middle-aged and white. Neighborhood organizations are also composed mostly of business owners, who (not that there's anything wrong with that) make up only a small percentage of the neighborhoods they represent. Also, not all business owners necessarily live in the neighborhood that their business is in.

I remember a time I went to the Whittier Alliance's monthly meeting because I was interested in discussion of public transit and walkability in the neighborhood. Those in attendance were primarily white, middle-aged and/or business owners. There were probably 4 or 5 other people my age (college-age students). I made a suggestion to the Whittier Alliance and the audience that they look at reducing street parking and widening sidewalks along the parts of Nicollet Avenue where they were concerned about improving foot traffic. But since those in attendance were mainly business owners and people with cars, this was laughed down, with the owner of the Black Forest calling the suggestion a "fantasy". This is a minor issue in comparison to others that the neighborhood faces, such as violence and narcotics usage, but one wonders how issues that concern non-white, non-middle-aged and non-middle-class residents are handled.

Certainly, more residents who don't fit the current Whittier standard for neighborhood involvement could be brought in through a stronger outreach program. On the other hand, I question how successful or serious organizations would be about such outreach, because I have also heard some troubling anecdotes from a friend living in another neighborhood where one of those involved in the neighborhood organization had a sense of outreach that was much more based in prejudice and racism than anything else.

Although I understand that the diversity audit is targeted at Minneapolis city boards and commissions, I wish they'd take a further look at neighborhood organizations and how their diversity lines up with the diversity of the neighborhoods they represent. If neighborhood involvement can be said to be a starting place for careers in city politics, certainly this could be related to the lack of diversity within city government. While I already know what results will come back, it will certainly be exciting to see the statistics. I'm glad that Cam Gordon and Annie Welch are looking into this; I just hope people listen to what they find and try to do something about it.

0 comments

More Northern Sámi Syllable Parsing... — 22 May 2009

I recently learned a bit more about regular expressions, which allowed me to vastly improve the speed and simplicity of the Northern Sámi syllable parser I wrote about previously. The thing I learned was 'lookaround', a form of matching that doesn't actually consume any material. One of the examples in the book (Friedl, 1997) was a way to insert commas into numbers using lookaround, following the standard pattern of splitting every three digits. What follows is a really simple description of regular expression matching with lookaround, and then an explanation of how it applies to syllables and Northern Sámi.

Lookaround

What lookaround does, effectively, is find a position in the text instead of finding text. A simple regular expression substitution without lookaround might be to look through a text, find every instance of something, and replace it.

import re

text = "I'm Lisa.  My pet is a cat.  I like cats."
re.sub(r'cat', 'dog', text)

# Returns:
"I'm Lisa.  My pet is a dog.  I like dogs."

But what if we wanted to insert "fat" in front of every instance of "cat"? We could certainly replace every instance of 'cat' with 'fat cat', or use group matching and replace every instance of 'cat' with 'fat \g<animal>'.

re.sub(r'(?P<animal>cat)', r'fat \g<animal>', text)
# Returns: 
"I'm Lisa.  My pet is a fat cat.  I like fat cats."

In the above example, it's still necessary to specify what you have matched, because otherwise it will be gobbled up in the replacement. For the next example, I'll use a slightly bigger text just to show how lookaround can save breath. Since (?=lookaround) matches a position in the text, there is no need to repeat the full match in order to make sure nothing is deleted. In the following example, I'm using lookahead, which matches the position just before whatever we tell it to look for. There is also lookbehind, (?<=...), which matches the position just after.

text = """
I'm Lisa.  My pet is a cat.  I like cats.
I'm Suzie.  My pet is a dog.  I like dogs.
"""
        
re.sub(r'(?=cat|dog)','fat ',text)
        
# Returns: 
"I'm Lisa.  My pet is a fat cat.  I like fat cats.
I'm Suzie.  My pet is a fat dog.  I like fat dogs."
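For completeness, here is a small sketch of the lookbehind counterpart, using the same sentence about Lisa. The only thing not taken from the examples above is the choice of "I like " as the context to look behind for. One thing to keep in mind is that Python's re module only accepts fixed-width patterns inside a lookbehind.

```python
import re

text = "I'm Lisa.  My pet is a cat.  I like cats."

# (?<=I like ) matches the position right after "I like ", so "fat " is
# inserted there and the earlier "a cat" is left alone.
print(re.sub(r'(?<=I like )', 'fat ', text))
# I'm Lisa.  My pet is a cat.  I like fat cats.
```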

As you can see, the ability to match an environment and insert things in that environment, instead of matching text and re-inserting it along with the matched environment, is a lot more useful in some situations. For instance, the example I happened upon involved inserting a comma before each group of three digits counted from the right of the number, but only if the group was preceded by yet another digit (so a comma wouldn't be inserted at the beginning of the number: ,123,456,789). The moment I saw this example, my eyes lit up, because this is effectively how syllables are parsed.
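A comma-inserting pattern along those lines might look like the sketch below. This is my own reconstruction of the idea and not necessarily the exact pattern from the book; the function name add_commas is made up, and it assumes the input is a plain string of digits.

```python
import re


def add_commas(digits):
    # Match a position that has a digit before it (lookbehind) and a
    # multiple of three digits between it and the end (lookahead).
    return re.sub(r'(?<=\d)(?=(?:\d{3})+$)', ',', digits)


print(add_commas('123456789'))  # 123,456,789
print(add_commas('1234'))       # 1,234
print(add_commas('123'))        # 123
```

Nothing is consumed by either half of the pattern, so the replacement only ever adds commas and never has to put digits back.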

Syllable parsing with lookaround

Syllable parsing is approached by assigning a syllable boundary wherever a certain criterion is met. In the case of Finnish and Northern Sámi, the simplest approach for a large chunk of words is to split them up before every Consonant Vowel (CV) pair, from the right to the left. So, a word like CVCCVCV would be split like the following:

CVCCVCV
CVCCV.CV
CVC.CV.CV

It would be pretty simple, thus, to do this with lookaround, because one could define what counts as a consonant and what counts as a vowel pretty simply in a regular expression-friendly format. For the following examples, I'll keep using C and V, but note that C could be a stand-in for a regular expression set [tpklmn], and V could be a stand-in for [aieou]. It's convenient to use C and V here, but the needs of syllable parsing are sometimes more dependent on what the actual consonants and vowels are, and not the fact that they are simply a consonant or a vowel.

The first syllable splitting rule should then be to match CV and insert a period before it to represent a syllable break:

>>> import re
>>> word = 'CVCCVCV'
>>> print(re.sub('(?=CV)', '.', word))
.CVC.CV.CV

For the sake of neatness, that initial dot can be removed by matching CV only if it is preceded by something else (using lookbehind):

>>> print(re.sub('(?<=[CV])(?=CV)', '.', word))
CVC.CV.CV

This rule can also handle numerous other types of words, such as vowel-initial words (VCVCV -> V.CV.CV) and words with codas (CVCCVCVC -> CVC.CV.CVC). Of course, things get a little more complex when you wish to match specific contexts like lists of vowels and lists of consonants, but it can be done. It also shortened the code I had written drastically. Where I was using a ridiculous number of if/then statements, the updated parser just runs every word through the same regular expression and lets a significantly faster and more efficient process handle the splitting.
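As a rough illustration of swapping C and V for real character sets, here is a sketch that uses the toy sets mentioned earlier, [tpklmn] and [aieou]. These are stand-ins and not the actual Northern Sámi or Finnish inventories, the function name syllabify is made up, and this is not the parser itself, only the single CV rule described above.

```python
import re

CONSONANTS = "tpklmn"  # toy consonant set from the text, not a real inventory
VOWELS = "aieou"       # toy vowel set from the text, not a real inventory

# A break goes before every consonant + vowel pair, unless the pair
# is at the very start of the word.
SYLLABLE_BREAK = re.compile(
    rf"(?<=[{CONSONANTS}{VOWELS}])(?=[{CONSONANTS}][{VOWELS}])"
)


def syllabify(word):
    return SYLLABLE_BREAK.sub(".", word)


print(syllabify("talo"))   # ta.lo
print(syllabify("kolme"))  # kol.me
print(syllabify("omena"))  # o.me.na
```

The test words are Finnish ones that happen to fit the toy sets; anything with letters outside those sets would need the sets extended first.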

As a result, the code I had to write for a syllable parser was reduced from nearly 130 lines to 10. Also, much larger sets of data can be handled in a shorter amount of time, so it would be easy to use it in a language checking/spell-check tool where it is important to provide quick feedback. For a detailed explanation of one such example, read my previous post on syllable parsing.

Note: Syntax highlighting may be non-existent or ugly while I work on adding some syntax highlighting rules to my CSS.

0 comments

Conficker Whitepaper Abstracts — 31 Mar 2009

A friend was shooting some quotes at me today from a Conficker whitepaper that were highly interesting, so I'm sharing them here. Conficker is basically a virus, or worm, that now infects over 10 million machines worldwide, targeting a Windows vulnerability. The worm is supposed to go active on April 1st, or at least download new instructions via its command network. Despite the havoc it could cause, the worm is actually quite exciting. Part of me fears what it could do, but part of me is excited to see it. Certainly Conficker isn't the only worm to reach such vast proportions, but it is one of the first to be as innovative as this.

As it is understood now, the worm has been created by a skilled person or persons who are well versed in current security holes and current encryption algorithms. The code was purposefully constructed in a way that would create extra work for security analysts who would undoubtedly reverse-engineer it in attempts to circumvent it. The worm currently exists in a few separate versions that reflect stages in its development, each largely improved over the last. While this isn't new to viruses and worms, the improvements show that the authors have some sort of highly detailed plan that they're going by. In short: this is no simple lulz-guided operation carried out by script kiddies: "At its core, the main purpose of Conficker is to provide the authors with a secure binary updating service that effectively allows them instant control of millions of PCs worldwide."

Some of the updates to the worm are really quite genius: on one occasion a vulnerability was discovered in the MD6 cryptographic hash function that the worm was using to help verify that new instructions were untampered, and shortly after, the vulnerability was patched in the worm on millions of machines. My friend joked that this worm is basically more secure and up-to-date than Windows, but humor aside, there is much truth in that. Windows is somewhat famous for having security holes that go unpatched for long periods of time. Then, when they are fixed, there is no way to verify that the audience is receiving the fixes. Case in point: this worm is exploiting a vulnerability that has been patched on Microsoft's side, but they have no way of delivering the patch to the millions and millions of machines that need it.

Part of this issue stems from the fact that it is impossible, with a bootlegged version of Windows, to receive security updates. The sad humor in this is that these bootlegged versions of Windows make up a significant chunk of the users of the version of Windows with this vulnerability. As such, Microsoft's own attempts to prevent people from bootlegging may well bring the internet to its knees for a few days if this worm is fed a particularly nasty apple by its developers. As there is no proof of its developers' intent, there is no way to say with any certainty that the worm is indeed malicious; outside of having gained control of so many computers and preventing them from looking for patches, the worm has so far done nothing except avoid attempts to disable it.

Read more
0 comments

Hungarian prefixes and word order — 28 Mar 2009

I've been on a Hungarian music kick lately, listening more to hits from the 30s to the 50s, and so naturally I had to learn some. I dusted off a book I'd picked up on Hungarian for Finnish language learners, thinking that the Uralic connection might mean that things were organized in a way I can learn more efficiently. It seems to be true so far, and I'm finding some interesting things about Hungarian syntax, particularly the verbal prefixes which are discussed here. All following examples are taken from Unkarin kielioppi (Csepregi, 1991). Note that in the following glosses, only definiteness is marked in verb conjugations, otherwise assume that verbs not marked with +Def are indefinite.

Read more
3 comments

Northern Sámi syllable parser — 2 Mar 2009

Back in one of my phonology classes while I was working on my undergraduate degree in Linguistics, I wrote a paper that was an Optimality Theoretical account of stress-based coda strengthening in Northern Sámi. Although the OT model worked perfectly for the data set I had collected, I was somewhat unhappy with such a small data set and wished to prove my point on a larger scale. In the couple of years since then, I've gained some better programming skills and the internet has changed so that more data is available. As such, what follows is an extension of this previous paper. At some point I might track that down and revise and post it, but since that will likely be a little while, what follows will be a more complete discussion of the phenomenon that assumes less prior knowledge.

In order to process large amounts of data, I created a syllable parser based on some rules I had made for a programmatic account of Standard Finnish. The role of a syllable parser in Northern Sámi could be a couple of things. As an analytical tool, the syllable parser can handle larger amounts of data in a shorter amount of time than can be processed by a human working at this task. The parser could also be used predictively, and could aid language/spell checking as well as translation or localization. In localization, the syllable parser could make sure that the right suffix is applied to the right kind of word.

Read more
0 comments

How not to comment anonymously about yourself on someone's blog — 2 Mar 2009

A friend of mine recently posted to his blog about a Star Tribune article on the relationship drama of a candidate for city council, Charles Carlson. The original Strib article also pointed fingers at a possible campaign violation, which I think is really more interesting. John also started some public discussion of Quorum-gate, which I don't want to reproduce here out of laziness, but I did make some points there worth reading.

Anyway, someone commented anonymously saying that it "seems irresponsible to make that accusation without any proof" and on a whim I asked John if he could give me the IP address for the anonymous poster. Sometimes interesting things turn up with these sorts of things, and lo, something interesting did in fact turn up.

The IP address (whois) was registered to "Boca Raton Residence Inn", and according to some logs that John graciously donated, this person stumbled upon his blog entry googling for the words "charles carlson jason matheson" (1). Who in Florida would be checking up on that so soon after the news broke?

(1)

12.42.0.8 - - [25/Feb/2009:06:48:51 -0800] "GET / HTTP/1.1" 200 44182 "http://www.google.com/search?q=charles+carlson+jason+matheson&ie=utf-8&oe=utf-8&aq=t&rls=org.mozilla:en-US:official&client=firefox-a" "Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.5; en-US; rv:1.9.0.6) Gecko/2009011912 Firefox/3.0.6"
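If you'd rather not read search terms out of a referrer by eye, here is a minimal Python sketch that pulls them out of a log line like (1). It assumes the log is in the common "combined" access log layout, which is what the line above appears to be; the pattern and the function name search_terms are my own and not something from John's setup.

```python
import re
from urllib.parse import urlparse, parse_qs

# Rough pattern for one line of a "combined" format access log.
LOG_LINE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)


def search_terms(line):
    """Return the q= parameter of the referrer URL, or None."""
    match = LOG_LINE.match(line)
    if match is None:
        return None
    query = parse_qs(urlparse(match.group("referrer")).query)
    terms = query.get("q")
    return terms[0] if terms else None


line = (
    '12.42.0.8 - - [25/Feb/2009:06:48:51 -0800] "GET / HTTP/1.1" 200 44182 '
    '"http://www.google.com/search?q=charles+carlson+jason+matheson'
    '&ie=utf-8&oe=utf-8&aq=t&rls=org.mozilla:en-US:official&client=firefox-a" '
    '"Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.5; en-US; rv:1.9.0.6) '
    'Gecko/2009011912 Firefox/3.0.6"'
)

print(search_terms(line))  # charles carlson jason matheson
```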

The logs continued to tell a tale of how this user used John's email contact form to send him an email (2), and then proceeded to post a comment on the fated blog post.

(2)

12.42.0.8 - - [25/Feb/2009:06:53:06 -0800] "POST /contact HTTP/1.1" 200 126 "http://blog.johnschrom.com/contact" "Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.5; en-US; rv:1.9.0.6) Gecko/2009011912 Firefox/3.0.6"

12.42.0.8 - - [25/Feb/2009:06:58:13 -0800] "POST /wp-comments-post.php HTTP/1.1" 302 - "http://blog.johnschrom.com/archives/1362" "Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.5; en-US; rv:1.9.0.6) Gecko/2009011912 Firefox/3.0.6"

As it would have it, Charles Carlson sent John an email via John's contact form that morning, criticizing John for posting allegedly baseless accusations. Content of the form email aside, the case for what happened here is clear, and unless Charles had a travel companion who immediately posted a comment 5 minutes later from the same computer and browser session, Charles is the anonymous commenter.

If someone emailing John from a contact form (it gives you the option to specify any reply-to address) and then posting a comment isn't enough evidence for you, it seems to be widely known that Charles missed a political forum this weekend in lieu of officiating a tennis match in Florida. His recent tweets about Florida sun and a dinner in Fort Lauderdale certainly tie in with this.

There are services for anonymizing your internet traffic, however, and a simple Google search will turn those up.

Update: Charles has since dropped out of the race for city council; but this anonymous comment snafu just shows the quality of his character and why it's good he's not running anymore. I hope the tennis officialdom works out for him and is a better fit to his skills.

0 comments

Estonian Overlength — 11 Feb 2009

I was reading a book on Estonian Noun and Verb conjugations (Mürk, 1997), which is filled with some interesting stuff, but one mention of Estonian overlength and its interaction in allomorphy caught my eye, so here are some notes. The following post is going to use an altered orthography (to be explained below).

Quantity in Estonian

Estonian has a system of three contrasts in quantity: short, long and overlong. Finnish, Italian and numerous other languages by comparison have two, short and long (usually marked in the orthographies by double consonants, <kk>). Estonian is surprising in this regard; however, the third quantity, as it turns out, may be more a question of stress or intonation, and it's usually marked by some sort of pitch accent (I'm unsure of the specifics). As far as I know, some studies have shown that Estonian speakers have more difficulty distinguishing words when the only difference is the duration of the consonants or vowels, and require pitch cues. Also, overlong segments or syllables are usually accompanied by stress. There are also other questions of lenis and fortis, but I don't want to get into that since it's not really important to the problem at hand.

Regardless of this, I'll be marking duration in the following way: short consonants and vowels will be marked by one character: <t>, <p>, <k>, <a>; long with two: <tt>, <pp>, <oo>; over-long with three: <ttt>, <rkk>, <mmp>, <uii>. For stops, this means I won't be using the traditional orthographic method of <b>, <d>, <g> to mark the short stops, rather <p>, <t>, <k>.

-sit/-it

The interesting morphophonological variation alluded to above is with partitive plural morphemes and how they behave when overlong syllables come into question. The book claimed that they count for an extra syllable (it may thus be an issue of moras), and I was skeptical at first, but then I collected some more data from the book, and it was rather surprising. So, I want to see what anyone else here with phonological tendencies thinks.

First, some words in which length isn't the question, but mere syllable count. In the following examples (1-6), you can see that words with two syllables in the genitive singular end up with the suffix -sit, while words with three syllables end up with -it.

  
        English        Gen Sg.        Part. Pl.
(1)     'airplane'     .len.nu.ki.    .len.nu.keit.
(2)     'horse'        .ho.pu.se.     .ho.pu.seit.
(3)     'beard'        .ha.pe.me.     .ha.pe.meit.

(4)     'disease'      .tõ.ve.        .tõ.pe.sit.
(5)     'storehouse'   .ai.ta.        .ait.ta.sit.
(6)     'lip'          .mo.ka.        .mo.ka.sit.

There are obviously a couple other things going on in the data such as consonant gradation (a system of lenition formerly triggered by morphological situations in which open syllables become closed). Otherwise, it looks pretty clear that the genitive singular forms that have three syllables correlate to the application of the partitive plural suffix -it, while genitive singular forms with two syllables correspond with partitive plural suffix -sit.

The following data shows situations in which the only way to explain the application of the suffix -it implies something is up with the syllable count. According to the book, an overlong segment implies the presence of two syllables, so I'll mark that in the examples below.

        English        Gen Sg.        Part. Pl.
(7)     'bush'         .põõ.õ.sa.     .põõ.õ.sait.
(8)     'speck'        .tap.p.pe.     .tap.p.peit.
(9)     'cabbage'      .kap.p.sa.     .kap.p.sait.
(10)    'alert'        .erk.k.sa.     .erk.k.sait.
(11)    'window'       .ak.k.na.      .ak.k.nait.
(12)    'edifice'      .ho.o.ne.      .ho.o.neit.
(13)    'tooth'        .ham.m.pa.     .ham.m.pait.

The book also implies that this only works when the overlong segments in question are in the last or second-to-last syllable, which would correspond with main stress. On the other hand, the assignment of stress in the partitive plural forms in the examples here is not different, so the issue cannot necessarily be 100% stress. Also, if the above examples were to be treated as words with two syllables, they would receive the suffix -sit.
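The pattern in examples (1-13) can be sketched as a lookup on the syllable count of the genitive singular. This is only a restatement of the generalization as I've described it here, not a claim about Estonian morphology at large: it covers just the two- and three-syllable cases in the data, it relies on the dot-delimited notation used in the tables, and the function names are made up.

```python
# Suffix choice as observed in examples (1-13): two-syllable genitive
# singulars take -sit, three-syllable ones take -it.
SUFFIX_BY_SYLLABLE_COUNT = {2: "-sit", 3: "-it"}


def count_syllables(dotted_form):
    """Count syllables in a form written like '.len.nu.ki.'"""
    return len([syllable for syllable in dotted_form.split(".") if syllable])


def partitive_plural_suffix(genitive_singular):
    count = count_syllables(genitive_singular)
    if count not in SUFFIX_BY_SYLLABLE_COUNT:
        raise ValueError("the data here only covers 2 and 3 syllables")
    return SUFFIX_BY_SYLLABLE_COUNT[count]


print(partitive_plural_suffix(".len.nu.ki."))  # -it
print(partitive_plural_suffix(".tõ.ve."))      # -sit
# Overlength counted as an extra syllable, as the book suggests:
print(partitive_plural_suffix(".ak.k.na."))    # -it
# The same word treated as plain two syllables would wrongly get -sit:
print(partitive_plural_suffix(".ak.na."))      # -sit
```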

Anyway, what else could it be? I suspect a moraic analysis of this would be clearer, and would thus remove the need for this weird syllable analysis. On the other hand, the syllable analysis shows that all partitive plural forms 'like' to be three syllables at most. Interesting problem. Now if only Estonian marked its length/whatever contrast more clearly in the standard orthography, I think I'd be set. Until then, I just need to learn a crap-load of words.

1 comment

Kven language — 10 Feb 2009

I've been doing a little studying/reading some of the Kven language, so I thought I'd share some samples.

The Kven are a recognized minority in northern Norway, and their language has some official status in some places. They arrived in northern Norway at the beginning of the 1700s, coming from northern Finland. The language they speak is most closely related to Finnish varieties found in northern Finland and Sweden (such as meänkieli). A sample of written Kven is available here in a document on switching to Digital TV.

The most noticeable feature of the language when you take a gander at the above-linked document is that it uses the letter đ (as in Northern Sámi), which represents an interdental fricative. This is one of the features that survived in Kven but didn't make it into Standard Finnish (the Rauma dialect of Finnish still has it, unless that too has now completely gone away). Thus, Kađula ođotethaan 'People wait in the street/there is waited in the street' (cf. Finnish: kadulla odotetaan).

Kven is also a variety of Balto-Finnic which has retained intervocalic -h-, although unlike in Karelian dialects of Finnish where the -h- is still intervocalic, -h- in Kven either metathesizes with previous voiced consonants, or follows them and works as the onset of the syllable:

kirkhoon 'into the church' (Kar. kirikköh, Fin. kirkkoon < *kirkkohon)
miehleen 'into mind' (Kar. mieleh, Fin. mieleen < *mielehen)
Norhjaan/Norjhaan 'into Norway' (Kar. Norjah, Fin. Norjaan < *Norjahan)

I have been unable to find recordings of these forms yet, so I can't say if this is purely a resyllabification or if there is some assimilation with -h- and surrounding consonants involving voicing, which would be quite cool. How else are words like tukholhmaan treated: [.tuk.hol.@h.maan.], [.tuk.holh.maan.], [.tuk.hol.hmaan.] or [.tuk.hol̥.maan.]? I'd be really curious how the syllable template in Kven handles this.

Another feature is that infinitives are sometimes overtly marked with a final consonant. This is somewhat preserved in spoken Finnish (but not in the orthography), where one sometimes finds an assimilating glottal stop (haluan mennäk kotiin/ostaat tuolin 'I want to go home/buy a chair'), but it is preserved in Kven as a final -t (at least according to the orthography): mie haluun kattoot TV:tä 'I want to watch TV.'

Kven also seems to like 'strengthening' short stressed CV syllables, much like Sámi and apparently some dialects of Swedish and Norwegian that have been influenced by the same thing as Sámi, so this looks like an areal phenomenon up there (although it does happen elsewhere in Finnish): pittäät 'to hold/like' (cf. Fin. pitää). This seems to be governed by more than just syllable weight, though, so I'll not comment more than to point it out. If I can find some recordings sometime, or just some speakers, it might be fun to figure it out.

0 comments

Import — 9 Feb 2009

I'm working on moving some old linguistics-related posts from another blog to here, with a few tweaks. Watch this space for: some notes on Estonian overlength, and interesting aspects of Kven.

0 comments

TrackThis: RSS for USPS — 5 Feb 2009

I just sent off a very important package, and as with all very important packages, it's very important to stay apprised of its whereabouts! Unfortunately, USPS doesn't have any feature that notifies you whenever your package's status changes; rather, you must actively keep yourself updated by logging in and checking repeatedly. Since this only serves to make you feel obsessive and weird, a better solution would be a more passive method like RSS.

This is just what TrackThis is for. It tracks items and sends you updates in RSS format, but goes above and beyond just this and offers a service for SMS, Facebook, and even direct messages on Twitter. In addition, it gives you the option to use external authentication from OpenID, or OpenID-like services (Google, AOL, Facebook, MySpace, Yahoo!) to prevent the need for yet another random account. Very convenient!

So, now I feel slightly more productive, and less bothered by the need to keep checking USPS. And here I was, just about to devise my own hacky USPS POST request -> RSS 'solution', but I've been saved the effort. Glad something better exists already!
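For the curious, the RSS half of that hacky idea might have looked something like the sketch below. It assumes the status events have already been fetched somehow (nothing here talks to USPS), and the function name, the tracking number, and the example.invalid link are all made up for illustration.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone
from email.utils import format_datetime


def build_tracking_feed(tracking_id, events):
    """Turn (datetime, status) pairs into an RSS 2.0 document string."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = f"Package {tracking_id}"
    # Placeholder link; a real feed would point at the carrier's tracking page.
    ET.SubElement(channel, "link").text = f"https://example.invalid/track/{tracking_id}"
    ET.SubElement(channel, "description").text = "Status changes for one package"
    for when, status in events:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = status
        # RSS expects RFC 822-style dates, which format_datetime produces.
        ET.SubElement(item, "pubDate").text = format_datetime(when)
    return ET.tostring(rss, encoding="unicode")


# Made-up events, purely for illustration.
events = [
    (datetime(2009, 2, 4, 15, 30, tzinfo=timezone.utc), "Accepted at post office"),
    (datetime(2009, 2, 5, 9, 0, tzinfo=timezone.utc), "In transit"),
]
feed_xml = build_tracking_feed("HYPOTHETICAL123", events)
print(feed_xml)
```

Each status change becomes one feed item, so a feed reader would show a new entry whenever the list of events grows.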

0 comments

Facebook in Northern Sámi / Facebook davvisámegillii — 17 Jan 2009

A month or so ago, I talked to a friend of mine who works at Facebook, and as a result, a new localization option was opened in Facebook's Translations application: Northern Sámi. Some of you might ask what Northern Sámi is, so before I talk about the project, here's a quick introduction to the vital details, in a format that goes easy on the linguistic terminology.

Northern Sámi is spoken in Northern Scandinavia by an estimated 15,000 - 35,000 people (depending on who you ask). It is a Finno-Ugrian language, which makes Finnish and Hungarian its better-known relatives, though these aren't very closely related. If you were to compare the relationship between Finnish and Northern Sámi to relationships within Indo-European, you might say that Finnish is to Portuguese as Northern Sámi is to Russian. Northern Sámi is most closely related to about 8 to 10 other Sámi languages, which are also spoken around Northern Scandinavia and the Kola Peninsula of Russia. Of these languages, Northern Sámi has the most speakers.

If forced to pick a few interesting points about the language, I would have to go with the following:

Dual numbers — Northern Sámi contains verb conjugations and pronouns that describe 'we two', 'you two' and 'they two', in addition to the singular and plural. The following examples show this, but also show that English only differentiates between singular and plural.

Márit lea gávpis.
'Márit is at the store.'

Márit ja Máhtte leaba gávpis.
'Márit and Máhtte are at the store.'

Márit, Máhtte ja Elle leat gávpis.
'Márit, Máhtte and Elle are at the store.'

Detailed terminology for reindeer and snow. A good summary is available in this PDF.

A three-way contrast between consonant and vowel length.

Interdentals! Ththththththththtthththththththth. There aren't a lot of Finno-Ugrian languages that have these sounds. In fact, the only other language variety I can think of right now outside of Northern Scandinavia with interdentals is a version of the Rauma dialect of Finnish (Southwest Finland), as spoken by now-elderly speakers. Interdental consonants (like in 'think') used to be more prominent in Finnic languages about a thousand years ago, but have since become less common.

The Facebook internationalization project in Northern Sámi has grown to 25 translators since it began. Some of them are highly active in providing translations, and some of them are highly active in voting on translations to make sure that the best translation "wins". Recently, the project reached a new phase (translating phrases), which has been going much faster than even the first phase (establishing a glossary of terminology), even though this second phase involves much more work. While I cannot predict how long this second phase will take, I can say that (copy/pasting) there are 23,796 phrases left to translate as of this date.

The reason I feel that a Northern Sámi-localized version of Facebook is important is that Facebook is about keeping people in touch with each other. What better way to accomplish this than to do it in Facebookers' own languages? Not only that, but this goal becomes immediately more awesome when it also improves the usefulness of a minority language to its speakers. This is important for the survival of a language, because in order to survive, a language must continue to be useful to its speakers, and they must want to speak it. Part of this is maintaining prestige, and part of this is making sure that the language can continue to be used in a changing and globalizing environment.

In this case, Facebook is just a piece of the puzzle, and part of a more general point: since it is a prominent social networking site (which is constantly gaining users, and has an active population larger than Russia's), it is naturally an important part of how some people keep in touch. If this one resource is available to users in their own language, the service increases the usefulness of that language and reduces the need to use the service in a non-native language. With more services and media (books, news, TV, etc.) becoming available, a language has an even better chance at surviving.

Now, Northern Sámi isn't as endangered as some of its closest relatives (or just plain isn't endangered), but the availability of Facebook in Northern Sámi can serve as a sign that something like this is just as possible for other minority languages too.

If you're interested in participating, check out Facebook's Translations application.

Update (3/9/9): Since the original time of posting, the number of translators working on this project has just about doubled. w00t!

0 comments

Moving in — 1 Jan 2009

Moving in... So, prepare for little bugs. If anything explodes and gives an error, drop a comment with the URL that was problematic. If other inconsistencies occur, mention those too. I'm working on ironing those out, but I only have one set of eyes! Blog posts may be a bit sparse to start with, but check out the Selection of Truly Exciting Finnish Words... I'm populating that with more words than there will be blog entries for a while.

The content of this blog is not necessarily meant to be Northern Sámi-centric, but that just happens to be what I've been working on more lately, as will be explained in future posts. The reason for this is not that I am culturally Northern Sámi myself, but rather that I am a student of linguistics who has taken an interest in this language and its respective culture and language family. I'm basically a big nerd for Finno-Ugric languages, and not ashamed to admit it.

Things may slowly end up getting tweaked through use. For instance, the Selection of Truly Exciting Finnish Words is currently in its infancy, but I expect it to grow. Some words are not as thoroughly populated with interesting tidbits, or are there as a placeholder for more information. Word tags also contain a decent amount of information, for example the consonant gradation and ghost consonant tags. Drop comments where comments are welcome; they'll only help improve things.

The sanasto itself does not store all word forms individually, and instead they are generated by a series of rules. I will be tweaking the underlying code that handles this over the course of time, so for any of you Finnish speakers out there, please tell me if you notice odd inflections, or are aware of additional variation that is available in certain words (e.g., tunturia/tuntureita). Be advised that Standard Finnish may accept one thing, but this may not be true of the wealth of Finnish dialects.
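To give a rough idea of what 'generated by rules' means, here is a toy sketch in Python. It is not the code behind the sanasto: the function names are invented, it only handles the inessive ending -ssa/-ssä chosen by vowel harmony, and it ignores consonant gradation, stem changes, and compounds entirely.

```python
BACK_VOWELS = set("aou")


def harmonize(word, back_form, front_form):
    """Pick the back- or front-vowel variant of an ending for a word."""
    if any(ch in BACK_VOWELS for ch in word.lower()):
        return back_form
    return front_form


def inessive(word):
    """Toy rule: attach -ssa/-ssä directly to the dictionary form.

    Only meant for words whose inflectional stem is identical to the
    dictionary form, which is an assumption made for this illustration.
    """
    return word + harmonize(word, "ssa", "ssä")


for word in ["talo", "metsä", "tunturi"]:
    print(word, "->", inessive(word))
```

Only the dictionary form is stored here; the inflected form is computed on demand, so fixing one rule fixes every word that uses it.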

Happy reading and word-sleuthing!

4 comments