Approaches I
Unsupervised learning
Input: large amounts of text (roughly 100M words)
Cues: word-form similarity and context similarity
Output: stem classes or full decomposition
Promising performance on small subtasks
Yarowsky (2000)
Overall results (so far) are inadequate
Brent (1999), Goldsmith (2000)
A 100M-word corpus is unrealistic for most languages where induction is needed
2/5/01
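To make the two cues on this slide concrete, here is a minimal sketch of how word-form similarity and context similarity might be combined to propose stem classes from unannotated text. It is an illustration only, not the method of any of the cited papers. The function names, the shared-prefix measure, the one-word context window, the thresholds, and the union-find grouping are all assumptions chosen for brevity.

```python
from collections import Counter
from math import sqrt


def common_prefix_len(a, b):
    """Number of leading characters that a and b share."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def form_similarity(a, b):
    """Word-form cue (assumed measure): shared prefix length / longer word length."""
    return common_prefix_len(a, b) / max(len(a), len(b))


def context_vectors(tokens, window=1):
    """Context cue: count the neighbours of each word within `window` positions."""
    vecs = {}
    for i, w in enumerate(tokens):
        ctx = vecs.setdefault(w, Counter())
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                ctx[tokens[j]] += 1
    return vecs


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm = sqrt(sum(x * x for x in u.values())) * sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0


def stem_classes(tokens, form_min=0.5, ctx_min=0.3):
    """Group words that pass both cues; thresholds are arbitrary illustrative values."""
    vecs = context_vectors(tokens)
    words = sorted(vecs)
    parent = {w: w for w in words}  # union-find forest over word types

    def find(w):
        while parent[w] != w:
            parent[w] = parent[parent[w]]  # path halving
            w = parent[w]
        return w

    for i, a in enumerate(words):
        for b in words[i + 1:]:
            if form_similarity(a, b) >= form_min and cosine(vecs[a], vecs[b]) >= ctx_min:
                parent[find(b)] = find(a)

    classes = {}
    for w in words:
        classes.setdefault(find(w), set()).add(w)
    # Singletons carry no morphological information, so drop them.
    return [c for c in classes.values() if len(c) > 1]


toy = ("the dog walks home the dog walked home "
       "the cat walks away the cat walked away").split()
print(stem_classes(toy))  # groups "walks" with "walked"
```

On a toy input like this the context counts are clean enough for the two cues to agree. With realistic text the context vectors are sparse unless the corpus is very large, which is the practical limitation the slide points to.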