• Unsupervised learning
  – Lots of text as input (100M words or so)
  – Wordform similarity and context similarities
  – Stem classes or full decomposition
  – Promising performance on small subtasks
    • Yarowsky (2000)
  – Overall results (so far) are inadequate
    • Brent (1999), Goldsmith (2000)
  – 100M-word corpus is unrealistic for most languages where induction is needed
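
To make the two similarity signals named above concrete, here is a minimal sketch. It is not the method of any of the cited papers. It assumes Levenshtein edit distance as the wordform-similarity measure and cosine similarity over neighbouring-word counts as the context-similarity measure. The function names, the window size, and the toy corpus are all hypothetical choices made for illustration.

```python
from collections import Counter
from math import sqrt


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between two wordforms (row-by-row dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def context_vector(tokens, target, window=1):
    """Count the words within `window` positions of each occurrence of `target`."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok == target:
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    counts[tokens[j]] += 1
    return counts


def cosine(u, v):
    """Cosine similarity of two sparse count vectors; 0.0 if either is empty."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm_u = sqrt(sum(x * x for x in u.values()))
    norm_v = sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical toy corpus; a real system would use far more text.
tokens = ("she will walk home . she walked home . "
          "he will jump high . he jumped high").split()

wordform_score = edit_distance("walk", "walked")
context_score = cosine(context_vector(tokens, "walk"),
                       context_vector(tokens, "walked"))
```

A pair such as "walk" / "walked" has a small edit distance and overlapping contexts, which is the kind of evidence an unsupervised learner could combine when proposing that two wordforms share a stem. With a corpus this small the counts are too sparse to be reliable, which fits the slide's point about needing very large inputs.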