A test bench for morphology learning
• For a dozen or so diverse languages
  – Text corpus (~1-10 million words)
  – Tagged subcorpus (0.1-1 million words)
    • for training and/or testing
  – Broad-coverage analyzer/synthesizer
    • generating data for (semi-)supervised learning
    • “oracle” for active learning
  – Tagger
    • generating approximately correct tagged data
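
The "oracle" role of the analyzer can be sketched roughly as follows. This is a minimal illustration, not the test bench itself: `toy_analyzer`, `SUFFIXES`, and `active_learning` are hypothetical names, the suffix list is a stand-in for a real broad-coverage analyzer, and the "rarest final letter" heuristic is only an assumed proxy for learner uncertainty.

```python
from collections import Counter

# Hypothetical stand-in for a broad-coverage analyzer (assumption, not a real tool).
SUFFIXES = ["ing", "ed", "s"]


def toy_analyzer(word):
    """Return (stem, suffix) using a fixed suffix list, longest suffix first."""
    for suffix in sorted(SUFFIXES, key=len, reverse=True):
        # Require at least two stem characters so short words stay unsplit.
        if word.endswith(suffix) and len(word) > len(suffix) + 1:
            return word[: -len(suffix)], suffix
    return word, ""


def active_learning(unlabeled, oracle, budget):
    """Ask the oracle about up to `budget` words.

    Words whose final letter is rarest among the already labeled words
    are queried first (an assumed, very crude uncertainty measure).
    """
    labeled = {}
    seen_endings = Counter()
    pool = list(unlabeled)
    for _ in range(min(budget, len(pool))):
        query = min(pool, key=lambda w: seen_endings[w[-1]])
        pool.remove(query)
        labeled[query] = oracle(query)  # the analyzer supplies the label
        seen_endings[query[-1]] += 1
    return labeled


if __name__ == "__main__":
    words = ["walks", "talks", "walked", "jumping", "cat"]
    for word, analysis in active_learning(words, toy_analyzer, budget=3).items():
        print(word, analysis)
```

With a budget of three queries, the loop skips "talks" once "walks" has been labeled and spends the remaining queries on endings it has not seen yet. A real learner would replace the heuristic with its own confidence estimate.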
2/5/01