
2/5/01

Approaches I

•Unsupervised learning

–large amounts of raw text as input (on the order of 100M words)

–wordform similarity and context similarity

–stem classes or full morphological decomposition

–Promising performance on small subtasks

•Yarowsky (2000)

–Overall results (so far) are inadequate

•Brent (1999), Goldsmith (2000)

–100M-word corpus is unrealistic for most languages where induction is needed
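The unsupervised cues on this slide (wordform similarity, context similarity, stem classes) can be illustrated with a toy sketch. It is only an illustration built on assumptions. It is not the method of Yarowsky (2000), Brent (1999), or Goldsmith (2000), and all helper names (wordform_similarity, context_vectors, cosine, stem_classes) are hypothetical. Wordform similarity is approximated here by shared-prefix length. Context similarity is cosine similarity over neighbour counts. Stem classes are formed by grouping stems that occur with the same set of suffixes, drawn from a hand-supplied suffix list.

```python
# Toy illustration only (hypothetical helpers, not the cited authors' methods):
# wordform similarity, context similarity, and stem classes for morphology induction.
from collections import Counter, defaultdict
from math import sqrt


def common_prefix_len(a: str, b: str) -> int:
    """Length of the longest shared prefix of two wordforms."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def wordform_similarity(a: str, b: str) -> float:
    """Assumed orthographic cue: shared prefix length / length of the longer word."""
    if not a or not b:
        return 0.0
    return common_prefix_len(a, b) / max(len(a), len(b))


def context_vectors(tokens: list[str], window: int = 1) -> dict[str, Counter]:
    """Count the tokens occurring within `window` positions of each word type."""
    vecs: dict[str, Counter] = defaultdict(Counter)
    for i, w in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                vecs[w][tokens[j]] += 1
    return vecs


def cosine(u: Counter, v: Counter) -> float:
    """Cosine similarity of two sparse count vectors (0.0 if either is empty)."""
    dot = sum(c * v[k] for k, c in u.items() if k in v)
    nu = sqrt(sum(c * c for c in u.values()))
    nv = sqrt(sum(c * c for c in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def stem_classes(words: list[str], suffixes: list[str]) -> dict[frozenset, set[str]]:
    """Group candidate stems by the set of suffixes they occur with.

    `suffixes` is a hand-supplied assumption here; "" stands for the bare stem.
    Stems seen with fewer than two suffixes are discarded as unsupported.
    """
    stem_to_suffixes: dict[str, set[str]] = defaultdict(set)
    for w in words:
        for s in suffixes:
            if s == "":
                stem_to_suffixes[w].add("")
            elif w.endswith(s) and len(w) > len(s):
                stem_to_suffixes[w[: -len(s)]].add(s)
    classes: dict[frozenset, set[str]] = defaultdict(set)
    for stem, sufs in stem_to_suffixes.items():
        if len(sufs) >= 2:
            classes[frozenset(sufs)].add(stem)
    return classes
```

For example, given walk/walks/walked, jump/jumps/jumped and talk/talks with the suffixes "", "s" and "ed", this sketch puts walk and jump in one class and talk in another. A real system would need to induce the suffix inventory itself rather than take it as input. It would also likely replace the prefix heuristic with something more robust, such as edit distance. The fixed list here only keeps the example small.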