[Figure: flowchart, for a dozen or so diverse languages. Recovered node labels: Text corpus (~1-10 MW); Tagged subcorpus (0.1-1 MW); Broad-coverage analyzer/synthesizer; Tagger. Recovered edge labels: for training and/or testing; generating data for (semi-)supervised learning; oracle for active learning; generating approximately correct tagged data.]