Approaches I

• Unsupervised learning
  – Lots of text as input (100M words or so)
  – Wordform similarity and context similarities
  – Stem classes or full decomposition
  – Promising performance on small subtasks
    • Yarowsky (2000)
  – Overall results (so far) are inadequate
    • Brent (1999), Goldsmith (2000)
  – 100M-word corpus is unrealistic for most languages where induction is needed
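
To make the "stem classes" idea concrete, here is a minimal sketch that uses wordform similarity only (no context similarity). It is an illustration under stated assumptions, not the method of any of the cited papers: the function name `induce_stem_classes`, the thresholds `min_stem_len` and `min_forms`, and the toy word list are all hypothetical.

```python
from collections import defaultdict


def induce_stem_classes(words, min_stem_len=3, min_forms=2):
    """Group wordforms that share a candidate stem (naive, illustrative only).

    Every prefix of length >= min_stem_len is treated as a candidate stem.
    A candidate is kept if at least min_forms distinct wordforms start with it.
    When several candidates cover exactly the same wordforms, only the longest
    one is kept.
    """
    vocab = set(words)

    # Map each candidate stem to the set of suffixes observed after it.
    suffixes_by_stem = defaultdict(set)
    for word in vocab:
        for cut in range(min_stem_len, len(word) + 1):
            suffixes_by_stem[word[:cut]].add(word[cut:])

    # For each distinct set of wordforms, remember the longest stem covering it.
    best_stem_for_forms = {}
    for stem, suffixes in suffixes_by_stem.items():
        if len(suffixes) < min_forms:
            continue
        forms = frozenset(stem + suffix for suffix in suffixes)
        current = best_stem_for_forms.get(forms)
        if current is None or len(stem) > len(current):
            best_stem_for_forms[forms] = stem

    return {stem: sorted(forms) for forms, stem in best_stem_for_forms.items()}


# Toy input (hypothetical); a real run would use a large raw-text corpus.
toy_words = ["walk", "walks", "walked", "walking", "talk", "talks", "talked"]
classes = induce_stem_classes(toy_words)
```

On the toy list this groups the wordforms under the stems `walk` and `talk`. On real text a heuristic this naive would over- and under-segment heavily, which is consistent with the slide's point that overall results so far are inadequate.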

2/5/01