Statistical MT – with meter and rhyme


I promised in an earlier post to report on some of the many interesting presentations here at InterSpeech 2010. But various other obligations and opportunities have cut into my blogging time, and so for now, I'll just point you to the slides for my own presentation here: Jiahong Yuan and Mark Liberman, "F0 Declination in English and Mandarin Broadcast News Speech".

I still hope to blog about some of the other interesting things I've learned here, but it's already time for me to head out on the next leg of my journey. Worse, I've already got a list of things to blog about from the next conference where I'm co-author on a presentation, EMNLP 2010 — which hasn't even started yet. At the top of that list is Dmitriy Genzel, Jakob Uszkoreit and Franz Och, "'Poetic' Statistical Machine Translation: Rhyme and Meter".

Their abstract:

As a prerequisite to translation of poetry, we implement the ability to produce translations with meter and rhyme for phrase-based MT, examine whether the hypothesis space of such a system is flexible enough to accommodate such constraints, and investigate the impact of such constraints on translation quality.

Their way of posing and solving the problem is very elegant, and the results are impressive. In the face of this display of imaginative technical bravado, it's churlish to complain.

Still, I can't help observing that their definition of English meter is wrong, in ways that cause their outputs to be both too tightly and too loosely constrained. They set up an ideal pattern of stressed and unstressed syllables, e.g.

[I]f we use 0 to indicate no stress, and 1 to indicate stress, blank verse with iambic foot obeys the regular expression (01)*.

and then add costs to translation candidates based on a count of mismatches between the stress sequence of a candidate translation and the desired ideal pattern.  Impressively, their algorithms are flexible enough to accommodate such extra constraints, and their hypothesis space of possible translations is large enough that they can generally find something that fits fairly well.
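As a rough illustration of the scheme just described, here is a minimal Python sketch. It checks a 0/1 stress string against the quoted pattern (01)* and scores a candidate by counting position-by-position mismatches with an ideal alternating string of the same length. The function names and the exact form of the cost are assumptions for illustration, not the authors' implementation. Their system folds such costs into phrase-based decoding, which this sketch does not attempt.

```python
import re

# The ideal pattern quoted above: 0 = unstressed, 1 = stressed, iambs repeated.
IAMBIC = re.compile(r"(01)*")


def is_iambic(stress: str) -> bool:
    """Return True if the whole 0/1 stress string matches (01)*."""
    return IAMBIC.fullmatch(stress) is not None


def mismatch_cost(stress: str) -> int:
    """Count positions where the candidate differs from the ideal 0101... string.

    Hypothetical cost: a plain mismatch count against an ideal of the same length.
    The paper's actual weighting and alignment may differ.
    """
    ideal = ("01" * len(stress))[: len(stress)]
    return sum(1 for got, want in zip(stress, ideal) if got != want)


regular = "0101010101"   # a perfectly alternating ten-syllable line
inverted = "1001010101"  # the same line with an initial "inverted foot"

print(is_iambic(regular), mismatch_cost(regular))    # True 0
print(is_iambic(inverted), mismatch_cost(inverted))  # False 2
```

The initial inversion in the second string is routine at the start of English pentameter lines, yet under a plain mismatch count it costs exactly as much as the same swap in mid-line ("0110010101" also scores 2). This is one way such a cost ends up too tight.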

The trouble is, this isn't really how English accentual-syllabic verse works.  The details are complicated and vary with writer, period, and genre. And there are competing theories about how to describe and explain the facts. But the basic descriptive situation for English iambic pentameter is a pattern like

x s w s w s w s w s

where the stressed syllables of polysyllabic words are not allowed in w(eak) positions.  There are no (categorical) constraints on the location of monosyllables, and no (categorical) constraint against unstressed syllables in s(trong) positions, and no constraints on the first syllable (so that "inverted feet" are common at line beginnings). There are also issues about when "syllables" can (or must) be ignored for purposes of scansion.  (Some further discussion here, or more authoritatively and at greater length, here, here, here.)
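The central constraint can be sketched the same way. The Python below is a minimal illustration under simplifying assumptions. Each word is hand-annotated with its syllables and the index of its lexically stressed syllable. Monosyllables are left unconstrained, the x position is free, and a line must have exactly ten syllables, so the questions about extra or ignorable syllables are not handled. The `Word` class and `violations` function are hypothetical names, and the second example line is deliberately ill-formed.

```python
from __future__ import annotations

from dataclasses import dataclass

# Template from the text: x = unconstrained, s = strong, w = weak.
PENTAMETER = "xswswswsws"


@dataclass
class Word:
    syllables: list[str]
    stress: int | None = None  # index of the stressed syllable; None for monosyllables


def violations(words: list[Word], template: str = PENTAMETER) -> list[str]:
    """Return stressed syllables of polysyllabic words that fall in w positions.

    Monosyllables may go anywhere and unstressed syllables may fill s positions,
    so only this one categorical constraint is checked.
    """
    found = []
    pos = 0
    for word in words:
        for i, syl in enumerate(word.syllables):
            if pos >= len(template):
                raise ValueError("more syllables than the template allows")
            if len(word.syllables) > 1 and i == word.stress and template[pos] == "w":
                found.append(syl)
            pos += 1
    if pos != len(template):
        raise ValueError("fewer syllables than the template requires")
    return found


shakespeare = [Word(["Shall"]), Word(["I"]), Word(["com", "pare"], stress=1),
               Word(["thee"]), Word(["to"]), Word(["a"]),
               Word(["sum", "mer's"], stress=0), Word(["day"])]
# Deliberately ill-formed variant: the stressed "sum" now lands in a w position.
shifted = [Word(["Shall"]), Word(["I"]), Word(["com", "pare"], stress=1),
           Word(["thee"]), Word(["to"]), Word(["sum", "mer's"], stress=0),
           Word(["fair"]), Word(["day"])]

print(violations(shakespeare))  # []
print(violations(shifted))      # ['sum']
```

Unlike a plain mismatch count, this check never penalizes monosyllables, unstressed syllables in s positions, or anything in the first position. However, it flags a single misplaced polysyllabic stress no matter how well the rest of the line alternates.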

I don't see any reason that Genzel, Uszkoreit and Och couldn't modify their system to use the right constraints rather than the wrong ones (though there are a few possible complications due to interactions at the joints of phrasal combinations), and so it's too bad that they didn't set the system up to match the actual norms of English metered verse.

For your added reading pleasure, the authors have reprinted the "Review in verse" that they got from an anonymous referee.  Again, the reviewer's dedication and good spirits are impressive, though the meter is a bit, um, rough in spots.

(I won't actually be attending EMNLP this year, unfortunately, since I need to get back to teaching and other obligations back home. So another author will present our paper: Rushin Shah, Paramveer S. Dhillon, Mark Liberman, Dean Foster, Mohamed Maamouri and Lyle Ungar, "A New Approach to Lexical Disambiguation of Arabic Text".)


  1. groki said,

    September 30, 2010 @ 6:55 pm

    it would be delicious fun to try out the SMT poetry system, but alas, not yet:

    the system at present is too slow, and we cannot make it available online as a demo, although we may be able to do so in the future.

  2. Leonardo Boiko said,

    October 1, 2010 @ 11:14 am

    From the author's reply to the review:

    To the second, contrite,
    I am answering "No",
    Copyright is a fright,
    And will not let it go.

    I can only echo the reviewer's opinion:

    It is downright unfortunate,
    that such interesting work should remain private.

  3. Mark F. said,

    October 1, 2010 @ 11:39 am

    Why is the meter of light verse tighter than that of serious verse?

  4. groki said,

    October 1, 2010 @ 2:02 pm

    @Mark F.

    too tight sounds too much like a wind-up toy, and a lock-step feel doesn't really suit shades of meaning and subtleties of expression.

    think grape juice versus wine.

  5. Jerry Friedman said,

    October 1, 2010 @ 5:50 pm

    @MYL: So strong syllables in weak positions are allowed only as monosyllables or in prepositions or after pauses? Thanks, after all these years of reading and writing poetry I find out I didn't know one of the rules of iambic pentameter! And a look at my most recent effort in that meter shows I didn't know it unconsciously either. But I was surprised at how rare the exceptions are. It took me a while (when I should have been doing other things), but I found a couple in Frost:

    And then come back to it and begin over. ("Birches")

    “I can’t think Si ever hurt anyone.” ("The Death of the Hired Man")

    (Unless the rule that prepositions don't count also applies to ever.)

  6. dirk alan said,

    October 3, 2010 @ 10:39 pm

    shouldn't it be scotland meter? or haven't the bad guys converted.
