Publications listed here in reverse chronological order.
One can split NP without supervision into NP[0] and NP[1], which behave differently. We first propose to learn a PCFG that adds such features to nonterminals in such a way that they respect patterns of linguistic feature passing: each node's nonterminal features are either identical to, or independent of, those of its parent. This linguistic constraint reduces runtime and the number of parameters to be learned. However, it did not yield improvements when training on the Penn Treebank. An orthogonal strategy was more successful: to improve the performance of the EM learner by treebank preprocessing and by annealing methods that split nonterminals selectively. Using these methods, we can maintain high parsing accuracy while dramatically reducing the model size.
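For intuition, here is a hedged Python sketch (an illustrative parameterization, not the paper's actual model) of why the feature-passing constraint shrinks the parameter count for a binary rule A -> B C when nonterminals carry a latent feature with k values:

    # Hypothetical parameter counts for one binary rule A -> B C.
    # These formulas only illustrate the kind of savings the constraint
    # gives; they are not the paper's exact model.

    def params_unconstrained(k):
        # every (parent, left, right) feature triple gets its own probability
        return k ** 3

    def params_feature_passing(k):
        # each child either copies the parent's feature or draws it from a
        # rule-specific distribution: 4 pattern weights + two k-way distributions
        return 4 + 2 * k

    for k in (2, 4, 8, 16):
        print(k, params_unconstrained(k), params_feature_passing(k))

The gap widens quickly: at k=8, the unconstrained rule needs 512 parameters where the tied version of this sketch needs 20.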
Radiology reports have especially low entropy, which allows prediction of multi-word phrases. Our system therefore chooses an optimal phrase length for each prediction, using Bellman-style dynamic programming to minimize the expected cost of typing the rest of the document. This computation considers what the user is likely to type in the future, and how many keystrokes it will take, considering the future effect of phrase completion as well.
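As a rough illustration, the following Python sketch computes a minimum-keystroke plan for a known document under a hypothetical suggest predictor. The real system minimizes the expected cost over an uncertain future, but the Bellman recursion has the same shape:

    # A minimal sketch: at each position, either type one character or accept
    # a suggested multi-word phrase (1 keystroke), minimizing total keystrokes
    # for the rest of the document.  `suggest` is a hypothetical predictor.

    def min_keystrokes(text, suggest):
        n = len(text)
        best = [0] * (n + 1)            # best[i] = cheapest way to finish text[i:]
        for i in range(n - 1, -1, -1):
            best[i] = 1 + best[i + 1]   # default: type text[i] yourself
            for phrase in suggest(text, i):       # completions offered at i
                if text.startswith(phrase, i):    # only correct phrases help
                    best[i] = min(best[i], 1 + best[i + len(phrase)])
        return best[0]

    # e.g. a toy predictor that always offers one stock phrase:
    print(min_keystrokes("no acute disease.", lambda t, i: ["acute disease"]))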
Probabilistic parsing requires a lexicon that specifies each word's syntactic preferences in terms of probabilities. To estimate these probabilities for words that were poorly observed during training, this thesis assumes the existence of arbitrarily powerful transformations (also known to linguists as lexical redundancy rules or metarules) that can add, delete, retype or reorder the argument and adjunct positions specified by a lexical entry.
In a given language, some transformations apply frequently and others rarely. We describe how to estimate the rates of the transformations from a sample of lexical entries. More deeply, we learn which properties of a transformation increase or decrease its rate in the language. As a result, we can smooth the probabilities of lexical entries. Given enough direct evidence about a lexical entry's probability, our Bayesian approach trusts the evidence; but when less evidence or no evidence is available, it relies more on the transformations' rates to guess how often the entry will be derived from related entries.
Abstractly, the proposed ``transformation models'' are probability distributions that arise from graph random walks with a log-linear parameterization. A domain expert constructs the parameterized graph, and a vertex is likely according to whether random walks tend to halt at it. Transformation models are suited to any domain where ``related'' events (as defined by the graph) may have positively covarying probabilities. Such models admit a natural prior that favors simple regular relationships over stipulative exceptions. The model parameters can be locally optimized by gradient-based methods or by Expectation-Maximization. Exact algorithms (matrix inversion) and approximate ones (relaxation) are provided, with optimizations. Variations on the idea are also discussed.
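To make the construction concrete, here is a small Python sketch of the exact (matrix-inversion) computation: edge probabilities are a log-linear function of expert-chosen features, and an event's probability is the probability that the random walk halts at its vertex. All names and signatures here are illustrative, not the thesis's code:

    import numpy as np

    def halting_distribution(features, theta, halt, start):
        """features[u][v]: feature vector of edge u->v (None if absent);
        halt[u]: probability of halting at u; start: start distribution."""
        n = len(halt)
        M = np.zeros((n, n))
        for u in range(n):
            scores = {v: np.exp(features[u][v] @ theta)
                      for v in range(n) if features[u][v] is not None}
            z = sum(scores.values())
            for v, s in scores.items():
                M[u, v] = (1 - halt[u]) * s / z   # move u->v without halting
        # expected visit counts q solve q = start + M^T q
        visits = np.linalg.solve(np.eye(n) - M.T, start)
        return visits * halt                      # halt there when visited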
We compare the new technique empirically to previous techniques from the probabilistic parsing literature, using comparable features, and obtain a 20% perplexity reduction (similar to doubling the amount of training data). Some of this reduction is shown to stem from the transformation model's ability to match observed probabilities, and some from its ability to generalize. Model averaging yields a final 24% perplexity reduction.
We speed up Tesar and Smolensky's RCD algorithm so that it runs in time linear in the number of constraints. RCD finds a ranking under which each attested form x_i beats or ties a particular competitor y_i.
We also generalize RCD so that each x_i beats or ties all possible competitors. Alas, neither this more realistic version of learning, nor even generation, has any polynomial-time algorithm unless P=NP! That is, one cannot improve qualitatively upon brute force:
Merely checking that a single (given) ranking is consistent with given forms is coNP-complete if the surface forms are fully observed, and Delta_2^p-complete if not. Indeed, OT generation is OptP-complete. As for ranking, determining whether any consistent ranking exists is coNP-hard (but in Delta_2^p) if the forms are fully observed, and Sigma_2^p-complete if not.
Finally, we show that generation and ranking are easier in derivational theories: generation is in P, and ranking is NP-complete.
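For reference, the basic RCD procedure that the paper accelerates can be sketched in a few lines of Python. The pair representation (sets of winner-preferring and loser-preferring constraints) is an assumption of this sketch:

    # Recursive Constraint Demotion (Tesar & Smolensky): find a stratified
    # ranking consistent with the data.  Each datum is a pair (W, L): the
    # constraints preferring the winner x_i and its competitor y_i.

    def rcd(constraints, pairs):
        remaining, strata = set(constraints), []
        while remaining:
            # constraints that prefer no remaining loser may be ranked next
            stratum = {c for c in remaining
                       if not any(c in L for _, L in pairs)}
            if not stratum:
                return None                 # no consistent ranking exists
            strata.append(stratum)
            remaining -= stratum
            # pairs explained by this stratum drop out
            pairs = [(W, L) for W, L in pairs if not (W & stratum)]
        return strata

    print(rcd(["Onset", "NoCoda", "Faith"],
              [({"Onset"}, {"Faith"}), ({"Faith", "NoCoda"}, set())]))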
The obvious parsing algorithm for bilexical grammars (used by most authors) takes time O(n^5). A more efficient O(n^3) method is exhibited. The new algorithm has been implemented and used in a large parsing experiment (Eisner 1996). We also give a useful extension to the case where the parser must undo a stochastic transduction that has altered the input.
This is an improved and extended version of an earlier paper (BibTeX): Eisner, Jason (1997). Bilexical grammars and a cubic-time probabilistic parser. Proceedings of the International Workshop on Parsing Technologies, pages 54-65, MIT, September.
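The cubic-time idea is easiest to see in its dependency-parsing form. Below is a hedged Python sketch (not the paper's exact pseudocode): chart items are "complete" and "incomplete" half-spans that each record only one head index, and score[h][d] is a hypothetical weight for attaching dependent d to head h.

    def eisner(score):
        n = len(score)
        NEG = float("-inf")
        # [s][t][direction]: direction 0 = head on right, 1 = head on left
        I = [[[NEG, NEG] for _ in range(n)] for _ in range(n)]   # incomplete
        C = [[[NEG, NEG] for _ in range(n)] for _ in range(n)]   # complete
        for i in range(n):
            C[i][i][0] = C[i][i][1] = 0.0
        for width in range(1, n):
            for s in range(n - width):
                t = s + width
                # two adjacent complete spans meet; add an attachment arc
                arc = max(C[s][r][1] + C[r + 1][t][0] for r in range(s, t))
                I[s][t][0] = arc + score[t][s]     # t governs s
                I[s][t][1] = arc + score[s][t]     # s governs t
                # extend an incomplete span with a complete one
                C[s][t][0] = max(C[s][r][0] + I[r][t][0] for r in range(s, t))
                C[s][t][1] = max(I[s][r][1] + C[r][t][1] for r in range(s + 1, t + 1))
        return C[0][n - 1][1]   # best tree headed at position 0 (e.g. a ROOT token)

Because each item records one head rather than a (head, head) pair, the chart has O(n^2) items combined in O(n) ways, for O(n^3) total, versus O(n^5) for the obvious span-with-two-heads algorithm.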
In this talk I'll motivate a restrictive formalization of OT that allows just two types of simple, local constraint. Gen freely proposes gestures and prosodic constituents; the constraints try to force these to coincide or not coincide temporally. An efficient algorithm exists to find the optimal candidate.
I will argue that despite its simplicity, primitive OT is expressive enough to describe and unify nearly all current work in OT phonology. However, it is provably more constrained: because it is unable to mimic deeply non-local mechanisms like Generalized Alignment, it forces a new and arguably better account of metrical stress typology. I will even discuss a proposal for constraining it further.
Available at http://cs.jhu.edu/~jason/papers/#ucla99. Extended version of a talk given at the 1997 LSA.
Some results on generation are presented. Unlike less restricted theories using Generalized Alignment, OTP grammars can derive optimal surface forms with finite-state methods adapted from Ellison (1994). Unfortunately, these methods take time exponential in the size of the grammar; indeed, the generation problem is shown to be NP-complete in this sense. However, techniques are discussed for making Ellison's approach fast and practical in the typical case, including a simple trick that alone provides a 100-fold speedup on a grammar fragment of moderate size. One avenue for future improvement is a new finite-state notion, ``factored automata,'' in which regular languages are represented compactly via formal intersections of FSAs.
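To see why factoring helps, consider the standard product construction for intersecting two DFAs, sketched below in Python (illustrative names). Each eager intersection multiplies the state spaces, so composing many constraint automata up front blows up exponentially; a factored representation keeps the automata separate and builds product states only as they are reached.

    def intersect(dfa1, dfa2):
        """Each DFA: (start, accepting_set, delta), delta[(state, symbol)] -> state."""
        (s1, acc1, d1), (s2, acc2, d2) = dfa1, dfa2
        start = (s1, s2)
        delta, acc, stack, seen = {}, set(), [start], {start}
        while stack:                    # explore only reachable product states
            p, q = stack.pop()
            if p in acc1 and q in acc2:
                acc.add((p, q))
            for (state, sym), nxt1 in d1.items():
                if state == p and (q, sym) in d2:
                    nxt = (nxt1, d2[(q, sym)])
                    delta[((p, q), sym)] = nxt
                    if nxt not in seen:
                        seen.add(nxt)
                        stack.append(nxt)
        return start, acc, delta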
Specifically, the paper attempts to shed light on the classical algorithms of Kruskal, Prim, and Boruvka; the improved approach of Gabow, Galil, and Spencer, which takes time only O(m log (lg* n - lg* m/n)); and the randomized O(m) algorithm of Karger, Klein, and Tarjan, which relies on an O(m) MST verification algorithm by King. It also considers Frederickson's method for maintaining an MST in time O(sqrt(m)) per change to the graph. An appendix explains Fibonacci heaps.
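As a reminder of the baseline these algorithms improve on, here is Kruskal's method in Python, using a union-find structure with path halving (a standard sketch, not taken from the paper):

    def kruskal(n, edges):                    # edges: list of (weight, u, v)
        parent = list(range(n))
        def find(x):
            while parent[x] != x:
                parent[x] = parent[parent[x]]     # path halving
                x = parent[x]
            return x
        tree = []
        for w, u, v in sorted(edges):             # cheapest edges first
            ru, rv = find(u), find(v)
            if ru != rv:                          # u and v not yet connected
                parent[ru] = rv
                tree.append((u, v, w))
        return tree

    print(kruskal(4, [(1, 0, 1), (2, 1, 2), (3, 0, 2), (4, 2, 3)]))

Sorting dominates, giving O(m log m); the cleverer algorithms in the paper chip away at precisely that sorting bottleneck.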
This paper presents a rather different and in some ways more explanatory typology of stress, couched in the framework of primitive Optimality Theory (OTP), which allows only primitive, radically local constraints. For example, Generalized Alignment is not allowed. The paper presents a single, coherent system of rerankable constraints that yields the basic facts about iambic and trochaic foot form, iambic lengthening, quantity sensitivity, unbounded feet, simple word-initial and word-final stress, directionality of footing, syllable (and foot) extrametricality, degenerate feet, and word-level stress.
The metrical part of the account rests on a small number of simple intuitions.
This talk proposes an approach to constraining OT, called "primitive Optimality Theory" (OTP). Most constraints given in the literature can be reformulated (not always obviously) as coming from one of two simple, local families of ``primitive constraints'': one family requires two constituents to coincide temporally, and the other forbids them from coinciding.
We will formalize these families and the representations that they constrain. As in Optimal Domains Theory, neither the constraints nor the representations use association lines. The constraints control only the relative timing of articulatory gestures, and other phonological or morphological constituents, along a continuous timeline.
A list of hundreds of constraints drawn from the literature is presented, showing how every degree of freedom of OTP is exploited in each of several areas: features, prosody, feature-prosody interaction, input-output relationships, and morphophonology. To show that the primitive constraints are not merely necessary, but also close to sufficient, we also discuss how to handle a few apparently difficult cases of non-local phenomena.
Available at the Rutgers Optimality Archive (http://ruccs.rutgers.edu/roa.html) and at http://cs.jhu.edu/~jason/papers/#lsa97. With some additions and corrections: Eisner, Jason (1997). Constraining OT: Primitive Optimality Theory. Talk handout, MIT, September. http://www.cs.jhu.edu/~jason/papers/#mit97.
The present paper describes some details of the experiments and repeats them with a larger training set of 25,000 sentences. As reported at the talk, the more extensive training yields greatly improved performance. Nearly half the sentences are parsed with no misattachments; two-thirds are parsed with at most one misattachment.
Of the models described in the original written paper, the best score is still obtained with the generative (top-down) ``model C.'' However, slightly better models are also explored, in particular, two variants on the comprehension (bottom-up) ``model B.'' The better of these has an attachment accuracy of 90%, and (unlike model C) tags words more accurately than the comparable trigram tagger. Differences are statistically significant.
If tags are roughly known in advance, search error is all but eliminated and the new model attains an attachment accuracy of 93%. We find that the parser of Collins (1996), when combined with a highly-trained tagger, also achieves 93% when trained and tested on the same sentences. Similarities and differences are discussed.
Many modern students of democracy favor proportional representation through the Single Transferable Vote (STV). In countries with high illiteracy, however, this system may be unworkable.
This paper proposes a practical modification of STV. In the modified system, each citizen votes for only one candidate. Voters need not specify their second, third, and fourth choices. Instead, each candidate specifies his or her second, third, and fourth choices. The modified system is no more difficult for voters than current proposals -- and it provides virtually all the benefits of STV, together with some new ones.
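Here is a hedged Python sketch of how such an election might be counted. It is a simplified reading of the proposal with deliberately crude surplus handling, and all names are illustrative; but it shows the key simplification: since transfers follow the candidate's declared list, all of a candidate's ballots move together.

    def modified_stv(votes, prefs, seats):
        """votes: {candidate: first-choice vote count};
        prefs: {candidate: that candidate's ordered list of next choices}."""
        quota = sum(votes.values()) // (seats + 1) + 1   # Droop quota
        elected, out = [], set()
        def transfer(c, n):             # move n votes down c's declared list
            for nxt in prefs[c]:
                if nxt in votes and nxt not in out:
                    votes[nxt] += n
                    return
        while len(elected) < seats and len(votes) > len(out):
            winners = [c for c in votes if c not in out and votes[c] >= quota]
            if winners:
                c = max(winners, key=votes.get)
                elected.append(c)
                out.add(c)
                transfer(c, votes[c] - quota)            # pass on the surplus
            else:                                        # eliminate the weakest
                c = min((c for c in votes if c not in out), key=votes.get)
                out.add(c)
                transfer(c, votes[c])
        return elected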
- Abstract:
- An adaptive compression technique which is an improvement to Lempel-Ziv (LZ) compression techniques, both as applied for purposes of reducing required storage space and for reducing the transmission time associated with transferring data from point to point. Pre-filled compression dictionaries are utilized to address the problem with prior Lempel-Ziv techniques in which the compression software starts with an empty compression dictionary, whereby little compression is achieved until the dictionary has been filled with sequences common in the data being compressed. In accordance with the invention, the compression dictionary is pre-filled, prior to the beginning of the data compression, with letter sequences, words and/or phrases frequent in the domain from which the data being compressed is drawn. The letter sequences, words, and/or phrases used in the pre-filled compression dictionary may be determined by statistically sampling text data from the same genre of text. Multiple pre-filled dictionaries may be utilized by the compression software at the beginning of the compression process, where the most appropriate dictionary for maximum compression is identified and used to compress the current data. These modifications are made to any of the known Lempel-Ziv compression techniques based on the variants detailed in 1977 and 1978 articles by Ziv and Lempel.
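For illustration, here is a hedged Python sketch of an LZW-style variant of the 1978 technique with a pre-filled dictionary. It is not the patented implementation, but it shows how priming reduces the number of output codes on domain text; the decompressor must of course be initialized with the identical pre-filled dictionary.

    def lzw_compress(data, primed=()):
        codes = {chr(i): i for i in range(256)}       # standard single bytes
        for seq in primed:                            # domain-frequent strings
            codes.setdefault(seq, len(codes))
        out, w = [], ""
        for ch in data:
            if w + ch in codes:
                w += ch                               # keep extending the match
            else:
                out.append(codes[w])
                codes[w + ch] = len(codes)            # learn the new sequence
                w = ch
        if w:
            out.append(codes[w])
        return out

    plain = lzw_compress("no acute cardiopulmonary disease")
    primed = lzw_compress("no acute cardiopulmonary disease",
                          primed=["no acute ", "cardiopulmonary disease"])
    print(len(plain), len(primed))    # priming yields fewer output codes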
- Citation (BibTeX):
- Jeffrey C. Reynar, Fred Herz, Jason Eisner, and Lyle Ungar. A Lempel-Ziv data compression technique utilizing a dictionary pre-filled with frequent letter combinations, words and/or phrases. U.S. Patent #5,951,623 issued 9/14/1999, filed 1996.
- On-line document:
- #5,951,623 (at Patent Storm)
- Citation (BibTeX):
- Frederick Herz, Lyle Ungar, and Jason M. Eisner. System for the automatic determination of customized prices and promotions. Patent pending, filed 1996.
- Abstract:
- This invention relates to customized electronic identification of desirable objects, such as news articles, in an electronic media environment, and in particular to a system that automatically constructs both a "target profile" for each target object in the electronic media (based, for example, on the frequency with which each word appears in an article relative to its overall frequency of use in all articles) and a "target profile interest summary" for each user, which describes the user's interest level in various types of target objects. The system then evaluates the target profiles against the users' target profile interest summaries to generate a user-customized, rank-ordered listing of the target objects most likely to be of interest to each user, so that the user can select from among these potentially relevant target objects, which were automatically selected by the system from the plethora of target objects profiled on the electronic media. Users' target profile interest summaries can be used to efficiently organize the distribution of information in a large-scale system consisting of many users interconnected by means of a communication network. Additionally, a cryptographically based pseudonym proxy server is provided to ensure the privacy of a user's target profile interest summary, by giving the user control over the ability of third parties to access this summary and to identify or contact the user.
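For intuition, here is a hedged Python sketch of the profile idea (illustrative, not the patented method): words are scored by their frequency in the article relative to their overall corpus frequency, a user is summarized by a profile vector, and articles are ranked by cosine similarity to that summary.

    from collections import Counter
    from math import sqrt

    def profile(article_words, corpus_freq):
        tf = Counter(article_words)
        # weight words that are unusually frequent in this article
        return {w: tf[w] / corpus_freq[w] for w in tf if w in corpus_freq}

    def similarity(p, q):
        dot = sum(p[w] * q.get(w, 0.0) for w in p)
        norm = (sqrt(sum(x * x for x in p.values()))
                * sqrt(sum(x * x for x in q.values())))
        return dot / norm if norm else 0.0

    def rank(articles, user_summary, corpus_freq):
        # most relevant target objects first
        return sorted(articles,
                      key=lambda a: similarity(profile(a, corpus_freq), user_summary),
                      reverse=True)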
- Citations (BibTeX):
- Frederick S. M. Herz, Jason M. Eisner, and Lyle H. Ungar. System for generation of object profiles for a system for customized electronic identification of desirable objects. U.S. Patent #5,835,087 issued 11/10/1998, filed 1995.
- Frederick S. M. Herz, Jason M. Eisner, Lyle H. Ungar, and Mitchell P. Marcus. System for generation of user profiles for a system for customized electronic identification of desirable objects. U.S. Patent #5,754,939 issued 5/19/1998, filed 1995.
- Frederick S. M. Herz, Jason Eisner, and Marcos Salganicoff. Pseudonymous server for system for customized electronic identification of desirable objects. U.S. Patent #5,754,938 issued 5/19/1998, filed 1995.
- On-line documents:
- #5,835,087; #5,754,939; #5,754,938 (at Patent Storm)
http://cs.jhu.edu/~jason/papers
Jason Eisner - jason@cs.jhu.edu (tech correspondence welcome) | Last modified: 2006/11/08