Jason Eisner - All Papers and Slides

Publications are listed here in reverse chronological order.

Technical correspondence welcome! (email)


Visual Navigation Through Large Directed Graphs and Hypergraphs (2006)

Abstract:
We describe Dynasty, a system for browsing large (possibly infinite) directed graphs and hypergraphs. Only a small subgraph is visible at any given time. We sketch how we lay out the visible subgraph, and how we update the layout smoothly and dynamically in an asynchronous environment. We also sketch our user interface for browsing and annotating such graphs -- in particular, how we try to make keyboard navigation usable.
Citation (BibTeX):
Eisner, Jason, Michael Kornbluh, Gordon Woodhull, Raymond Buse, Samuel Huang, Constantinos Michael, and George Shafer (2006). Visual navigation through large directed graphs and hypergraphs. Electronic Proceedings of the IEEE Symposium on Information Visualization (InfoVis'06), Poster/Demo Session, Baltimore, October.
On-line document:
2 pp. PDF (93K)
Poster (4 feet by 3 feet):
powerpoint (434K)
Other resources:
The Dynasty website (includes screenshots and documentation)
Relationship to other papers:
jump to this paper in my research summary

A Natural-Language Approach to Automated Cryptanalysis of Two-Time Pads (2006)

Abstract:
While keystream reuse in stream ciphers and one-time pads has been a well known problem for several decades, the risk to real systems has been underappreciated. Previous techniques have relied on being able to accurately guess words and phrases that appear in one of the plaintext messages, making it far easier to claim that "an attacker would never be able to do that." In this paper, we show how an adversary can automatically recover messages encrypted under the same keystream if only the type of each message is known (e.g. an HTML page in English). Our method, which is related to HMMs, recovers the most probable plaintext of this type by using a statistical language model and a dynamic programming algorithm. It produces up to 99% accuracy on realistic data and can process ciphertexts at 200ms per byte on a $2,000 PC. To further demonstrate the practical effectiveness of the method, we show that our tool can recover documents encrypted by Microsoft Word 2002.
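Illustrative code sketch (not from the paper):
The heart of the attack is a Viterbi-style dynamic program over the XOR of the two ciphertexts, scored by language models for the two plaintexts. The toy sketch below assumes hypothetical character bigram models lm1 and lm2 and a small restricted alphabet, rather than the paper's byte-level models and pruning.

    def recover_two_time_pad(xor_stream, lm1, lm2, alphabet="abcdefghijklmnopqrstuvwxyz "):
        """Most probable plaintext pair given xor_stream[i] = ord(p1[i]) ^ ord(p2[i]).
        lm1(prev, cur) and lm2(prev, cur) return log P(cur | prev), with prev=None at the start."""
        chart = {}  # last char of plaintext 1 -> (log prob, plaintext 1 so far)
        for c in alphabet:
            d = chr(ord(c) ^ xor_stream[0])          # plaintext-2 char forced by the XOR
            if d in alphabet:
                chart[c] = (lm1(None, c) + lm2(None, d), c)
        for i in range(1, len(xor_stream)):
            x, prev_x, new_chart = xor_stream[i], xor_stream[i - 1], {}
            for c in alphabet:
                d = chr(ord(c) ^ x)
                if d not in alphabet:
                    continue
                candidates = [(score + lm1(pc, c) + lm2(chr(ord(pc) ^ prev_x), d), text + c)
                              for pc, (score, text) in chart.items()]
                if candidates:
                    new_chart[c] = max(candidates)   # Viterbi: keep the best hypothesis per state
            chart = new_chart
        score, p1 = max(chart.values())
        p2 = "".join(chr(ord(a) ^ x) for a, x in zip(p1, xor_stream))
        return p1, p2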
Citation (BibTeX):
Mason, Joshua, Kathryn Watkins, Jason Eisner, and Adam Stubblefield (2006). A natural-language approach to automated cryptanalysis of two-time pads. Proceedings of the ACM Conference on Computer and Communications Security (CCS), Alexandria, VA, October.
On-line document:
10 pp. PDF (298K)
Relationship to other papers:
jump to this paper in my research summary

Program Transformations for Optimization of Parsing Algorithms and Other Weighted Logic Programs (2006)

Abstract:
Dynamic programming algorithms in statistical natural language processing can be easily described as weighted logic programs. We give a notation and semantics for such programs. We then describe several source-to-source transformations that affect a program's efficiency, primarily by rearranging computations for better reuse or by changing the search strategy.
Citation (BibTeX):
Eisner, Jason and John Blatz (2006). Program transformations for optimization of parsing algorithms and other weighted logic programs. Pre-proceedings of the 11th Conference on Formal Grammar (FG-2006), pp. 39-59, Malaga, July. Revised version to appear in the post-proceedings from CSLI Publications.
On-line document:
21 pp. PDF (207K), pre-proceedings version.
Relationship to other papers:
jump to this paper in my research summary

Better Informed Training of Latent Syntactic Features (2006)

Abstract:
We study unsupervised methods for learning refinements of the nonterminals in a treebank. Following Matsuzaki et al. (2005) and Prescher (2005), we may for example split NP without supervision into NP[0] and NP[1], which behave differently. We first propose to learn a PCFG that adds such features to nonterminals in such a way that they respect patterns of linguistic feature passing: each node's nonterminal features are either identical to, or independent of, those of its parent. This linguistic constraint reduces runtime and the number of parameters to be learned. However, it did not yield improvements when training on the Penn Treebank. An orthogonal strategy was more successful: to improve the performance of the EM learner by treebank preprocessing and by annealing methods that split nonterminals selectively. Using these methods, we can maintain high parsing accuracy while dramatically reducing the model size.
Citation (BibTeX):
Dreyer, Markus and Jason Eisner (2006). Better informed training of latent syntactic features. Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 317-326, Sydney, July.
On-line document:
10 pp. PDF (195K)
Poster:
PDF (5146K)
Relationship to other papers:
jump to this paper in my research summary

Minimum-Risk Annealing for Training Log-Linear Models (2006)

Abstract:
When training the parameters for a natural language system, one would prefer to minimize 1-best loss (error) on an evaluation set. Since the error surface for many natural language problems is piecewise constant and riddled with local minima, many systems instead optimize log-likelihood, which is conveniently differentiable and convex. We propose training instead to minimize the expected loss, or risk. We define this expectation using a probability distribution over hypotheses that we gradually sharpen (anneal) to focus on the 1-best hypothesis. Besides the linear loss functions used in previous work, we also describe techniques for optimizing nonlinear functions such as precision or the BLEU metric. We present experiments training log-linear combinations of models for dependency parsing and for machine translation. In machine translation, annealed minimum risk training achieves significant improvements in BLEU over standard minimum error training. We also show improvements in labeled dependency parsing.
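Illustrative code sketch (not from the paper):
The annealed objective itself is simple to state: an expected loss under a distribution whose peakedness is controlled by an annealing parameter. The sketch below evaluates that objective over an n-best list of hypotheses with hypothetical model scores and per-hypothesis losses; the paper's gradient-based optimization of the feature weights is omitted.

    import numpy as np

    def annealed_risk(scores, losses, gamma):
        """Expected loss under p(y) proportional to exp(gamma * score(y)), over an n-best list.
        gamma near 0 gives a nearly uniform distribution; as gamma grows, the
        expectation sharpens toward the loss of the single best-scoring hypothesis."""
        logp = gamma * np.asarray(scores, dtype=float)
        logp -= logp.max()                      # shift for numerical stability
        p = np.exp(logp)
        p /= p.sum()
        return float(p @ np.asarray(losses, dtype=float))

    # Annealing schedule: optimize the weights at a small gamma, reuse them as the
    # starting point at a larger gamma, and repeat until the distribution is sharp.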
Citation (BibTeX):
Smith, David A. and Jason Eisner (2006). Minimum-risk annealing for training log-linear models. Proceedings of the International Conference on Computational Linguistics and the Association for Computational Linguistics (COLING-ACL), Companion Volume, pp. 787-794, Sydney, July.
On-line document:
8 pp. PDF (269K)
Relationship to other papers:
jump to this paper in my research summary

Annealing Structural Bias in Multilingual Weighted Grammar Induction (2006)

Abstract:
We first show how a structural locality bias can improve the accuracy of state-of-the-art dependency grammar induction models trained by EM from unannotated examples (Klein and Manning, 2004). Next, by annealing the free parameter that controls this bias, we achieve further improvements. We then describe an alternative kind of structural bias, toward "broken" hypotheses consisting of partial structures over segmented sentences, and show a similar pattern of improvement. We relate this approach to contrastive estimation (Smith and Eisner, 2005a), apply the latter to grammar induction in six languages, and show that our new approach improves accuracy by 1-17% (absolute) over CE (and 8-30% over EM), achieving to our knowledge the best results on this task to date. Our method, structural annealing, is a general technique with broad applicability to hidden-structure discovery problems.
Citation (BibTeX):
Smith, Noah A. and Jason Eisner (2006). Annealing structural bias in multilingual weighted grammar induction. Proceedings of the International Conference on Computational Linguistics and the Association for Computational Linguistics (COLING-ACL), pp. 569-576, Sydney, July.
On-line document:
8 pp. PDF (191K)
Relationship to other papers:
jump to this paper in my research summary

Local Search with Very Large-Scale Neighborhoods for Optimal Permutations in Machine Translation (2006)

Abstract:
We introduce a novel decoding procedure for statistical machine translation and other ordering tasks based on a family of Very Large-Scale Neighborhoods, some of which have previously been applied to other NP-hard permutation problems. We significantly generalize these problems by simultaneously considering three distinct sets of ordering costs. We discuss how these costs might apply to MT, and some possibilities for training them. We show how to search and sample from exponentially large neighborhoods using efficient dynamic programming algorithms that resemble statistical parsing. We also incorporate techniques from statistical parsing to improve the runtime of our search. Finally, we report results of preliminary experiments indicating that the approach holds promise.
Citation (BibTeX):
Eisner, Jason and Roy W. Tromble (2006). Local search with very large-scale neighborhoods for optimal permutations in machine translation. Proceedings of the HLT-NAACL Workshop on Computationally Hard Problems and Joint Inference in Speech and Language Processing, pp. 57-75, New York, June.
On-line document:
19 pp. PDF (214K)
Slides:
powerpoint (556K, much animation)
Relationship to other papers:
jump to this paper in my research summary

Quasi-Synchronous Grammars: Alignment by Soft Projection of Syntactic Dependencies (2006)

Abstract:
Many syntactic models in machine translation are channels that transform one tree into another, or synchronous grammars that generate trees in parallel. We present a new model of the translation process: quasi-synchronous grammar (QG). Given a source-language parse tree T1, a QG defines a monolingual grammar that generates translations of T1. The trees T2 allowed by this monolingual grammar are inspired by pieces of substructure in T1 and aligned to T1 at those points. We describe experiments learning quasi-synchronous context-free grammars from bitext. As with other monolingual language models, we evaluate the cross-entropy of QGs on unseen text and show that a better fit to bilingual data is achieved by allowing greater syntactic divergence. When evaluated on a word alignment task, QG matches standard baselines.
Citation (BibTeX):
Smith, David A. and Jason Eisner (2006). Quasi-synchronous grammars: Alignment by soft projection of syntactic dependencies. Proceedings of the HLT-NAACL Workshop on Statistical Machine Translation, pages 23-30, New York, June.
On-line document:
8 pp. PDF (190K)
Slides:
powerpoint (499K, some animation)
Relationship to other papers:
jump to this paper in my research summary

A Fast Finite-State Relaxation Method for Enforcing Global Constraints on Sequence Decoding (2006)

Abstract:
We describe finite-state constraint relaxation, a method for applying global constraints, expressed as automata, to sequence model decoding. We present algorithms for both hard constraints and binary soft constraints. On the CoNLL-2004 semantic role labeling task, we report a speedup of at least 16x over a previous method that used integer linear programming.
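Illustrative code sketch (not from the paper):
The relaxation idea for hard constraints can be sketched compactly: decode with no constraints, then add only the constraints the current answer violates and re-decode, so easy inputs never pay for the full constraint product. The toy below represents constraints as hypothetical Python DFA triples over tags and uses simple log-score tables emit/trans; the paper works with real finite-state machinery and also handles binary soft constraints, neither of which is shown here.

    def viterbi_product(emit, trans, tags, dfas):
        """Best tag sequence whose tag string is accepted by every DFA in dfas.
        emit[i][t] and trans[p][t] are log scores; each DFA is a triple
        (start_state, accepting_states, delta) with delta(state, tag) -> state."""
        best = {}
        for t in tags:
            qs = tuple(d[2](d[0], t) for d in dfas)
            best[(t, qs)] = (emit[0][t], [t])
        for i in range(1, len(emit)):
            new = {}
            for (pt, qs), (score, path) in best.items():
                for t in tags:
                    key = (t, tuple(d[2](q, t) for q, d in zip(qs, dfas)))
                    cand = (score + trans[pt][t] + emit[i][t], path + [t])
                    if key not in new or cand[0] > new[key][0]:
                        new[key] = cand
            best = new
        finals = [v for (t, qs), v in best.items()
                  if all(q in d[1] for q, d in zip(qs, dfas))]
        return max(finals)[1] if finals else None

    def accepts(dfa, seq):
        start, accepting, delta = dfa
        q = start
        for t in seq:
            q = delta(q, t)
        return q in accepting

    def relaxation_decode(emit, trans, tags, constraints):
        """Constraint relaxation: re-decode with only the constraints violated so far."""
        active = []
        while True:
            seq = viterbi_product(emit, trans, tags, active)
            if seq is None:
                return None                     # no sequence satisfies the active constraints
            violated = [c for c in constraints if c not in active and not accepts(c, seq)]
            if not violated:
                return seq
            active = active + violated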
Citation (BibTeX):
Tromble, Roy W. and Jason Eisner (2006). A fast finite-state relaxation method for enforcing global constraints on sequence decoding. Proceedings of HLT-NAACL, pages 423-430, New York, June.
On-line document:
8 pp. PDF (205K)
Slides:
powerpoint (1495K, animation)
Relationship to other papers:
jump to this paper in my research summary

Parsing with Soft and Hard Constraints on Dependency Length (2005)

Abstract:
In lexicalized phrase-structure or dependency parses, a word's modifiers tend to fall near it in the string. We show that a crude way to use dependency length as a parsing feature can substantially improve parsing speed and accuracy in English and Chinese, with more mixed results on German. We then show similar improvements by imposing hard bounds on dependency length and (additionally) modeling the resulting sequence of parse fragments. This simple "vine grammar" formalism has only finite-state power, but a context-free parameterization with some extra parameters for stringing fragments together. We exhibit a linear-time chart parsing algorithm with a low grammar constant.
Citation (BibTeX):
Eisner, Jason and Noah A. Smith (2005). Parsing with soft and hard constraints on dependency length. Proceedings of the International Workshop on Parsing Technologies (IWPT), Vancouver, October, pp. 30-41.
On-line document:
12 pp. PDF (264K)
Slides:
powerpoint (1295K, animation)
Relationship to other papers:
jump to this paper in my research summary

Bootstrapping Without the Boot (2005)

Abstract:
"Bootstrapping" methods for learning require a small amount of supervision to seed the learning process. We show that it is sometimes possible to eliminate this last bit of supervision, by trying many candidate seeds and selecting the one with the most plausible outcome. We discuss such "strapping" methods in general, and exhibit a particular method for strapping word-sense classifiers for ambiguous words. Our experiments on the Canadian Hansards show that our unsupervised technique is significantly more effective than picking seeds by hand (Yarowsky, 1995), which in turn is known to rival supervised methods.
Citation (BibTeX):
Eisner, Jason and Damianos Karakos (2005). Bootstrapping without the boot. Proceedings of Human Language Technology Conference and Conference on Empirical Methods in Natural Language Processing (HLT-EMNLP), pp. 395-402, Vancouver, October.
On-line document:
8 pp. PDF (205K), postscript (654K)
Slides with audio:
powerpoint (2118K, inessential animation), PDF (3358K, 6 per page)
slightly longer powerpoint (2507K) with spoken lecture (1-hour MP3 audio, 55M)
Relationship to other papers:
jump to this paper in my research summary

Compiling Comp Ling: Weighted Dynamic Programming and the Dyna Language (2005)

Abstract:
Weighted deduction with aggregation is a powerful theoretical formalism that encompasses many NLP algorithms. This paper proposes a declarative specification language, Dyna; gives general agenda-based algorithms for computing weights and gradients; briefly discusses Dyna-to-Dyna program transformations; and shows that a first implementation of a Dyna-to-C++ compiler produces code that is efficient enough for real NLP research, though still several times slower than hand-crafted code.
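Illustrative code sketch (not part of the paper):
The execution model can be illustrated in miniature: an agenda of pending updates drives forward chaining over a chart of proven items, with the semiring determining how alternative derivations combine. The sketch below runs a Dyna-style shortest-path program (roughly, "dist(S) min= 0; dist(V) min= dist(U) + W for each edge(U,V,W)") in the min-plus semiring, where a priority-queue agenda reduces to Dijkstra's algorithm. It is a toy illustration of agenda-based weight computation, not the compiler described in the paper.

    import heapq

    def agenda_solve(edges, source):
        """edges: dict u -> list of (v, weight). Returns dict item -> best weight."""
        chart = {}                        # proven items with their current best value
        agenda = [(0.0, source)]          # updates waiting to be propagated
        while agenda:
            w, u = heapq.heappop(agenda)
            if u in chart and chart[u] <= w:
                continue                  # an equal or better value was already propagated
            chart[u] = w
            for v, wt in edges.get(u, ()):    # fire every rule with dist(u) as an antecedent
                heapq.heappush(agenda, (w + wt, v))
        return chart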
Citation (BibTeX):
Eisner, Jason, Eric Goldlust, and Noah A. Smith (2005). Compiling comp ling: Weighted dynamic programming and the Dyna language. Proceedings of Human Language Technology Conference and Conference on Empirical Methods in Natural Language Processing (HLT-EMNLP), pp. 281-290, Vancouver, October.
On-line document:
10 pp. PDF (246K), postscript (843K)
Slides with audio:
powerpoint (1223K, animation), PDF (474K, 6 per page)
spoken lecture (25-minute Windows Media audio, 23M)
Other resources:
The Dyna website
Relationship to other papers:
jump to this paper in my research summary

Guiding Unsupervised Grammar Induction Using Contrastive Estimation (2005)

Abstract:
We describe a novel training criterion for probabilistic grammar induction models, contrastive estimation [Smith and Eisner, 2005], which can be interpreted as exploiting implicit negative evidence and includes a wide class of likelihood-based objective functions. This criterion is a generalization of the function maximized by the Expectation-Maximization algorithm [Dempster et al., 1977]. CE is a natural fit for log-linear models, which can include arbitrary features but for which EM is computationally difficult. We show that, using the same features, log-linear dependency grammar models trained using CE can drastically outperform EM-trained generative models on the task of matching human linguistic annotations (the MATCHLINGUIST task). The selection of an implicit negative evidence class -- a "neighborhood" -- appropriate to a given task has strong implications, but a good neighborhood can target the objective of grammar induction to a specific application.
Citation (BibTeX):
Smith, Noah A. and Jason Eisner (2005). Guiding unsupervised grammar induction using contrastive estimation. International Joint Conference on Artificial Intelligence (IJCAI) Workshop on Grammatical Inference Applications, pp. 73-82, Edinburgh, July.
On-line document:
10 pp. PDF (220K, minor corrections), postscript (1009K, minor corrections)
Relationship to other papers:
jump to this paper in my research summary

Contrastive Estimation: Training Log-Linear Models on Unlabeled Data (2005)

Nominated for best paper award.
Abstract:
Conditional random fields (Lafferty et al., 2001) are quite effective at sequence labeling tasks like shallow parsing (Sha and Pereira, 2003) and named-entity extraction (McCallum and Li, 2003). CRFs are log-linear, allowing the incorporation of arbitrary features into the model. To train on unlabeled data, we require unsupervised estimation methods for log-linear models; few exist. We describe a novel approach, contrastive estimation. We show that the new technique can be intuitively understood as exploiting implicit negative evidence and is computationally efficient. Applied to a sequence labeling problem -- POS tagging given a tagging dictionary and unlabeled text -- contrastive estimation outperforms EM (with the same feature set), is more robust to degradations of the dictionary, and can largely recover by modeling additional features.
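Illustrative code sketch (not from the paper):
Contrastive estimation replaces the usual partition function with a sum over a small neighborhood of each observed example. The toy below computes that objective for a log-linear model with hypothetical bigram features and a TRANS1-style neighborhood (the sentence plus all adjacent-word transpositions); it omits the latent structure (tags) and the gradient-based learning used in the paper.

    import math
    from collections import Counter

    def features(seq):
        """Hypothetical toy features: counts of adjacent word bigrams."""
        return Counter(zip(seq, seq[1:]))

    def logscore(seq, weights):
        return sum(weights.get(f, 0.0) * v for f, v in features(seq).items())

    def neighborhood(seq):
        """TRANS1-style neighborhood: the observed sentence plus every sentence
        obtained by transposing two adjacent words (implicit negative evidence)."""
        out = [tuple(seq)]
        for i in range(len(seq) - 1):
            s = list(seq)
            s[i], s[i + 1] = s[i + 1], s[i]
            out.append(tuple(s))
        return out

    def contrastive_objective(corpus, weights):
        """Sum of log p(x | N(x)): each observed sentence should outscore the
        other members of its own neighborhood under the log-linear model."""
        total = 0.0
        for seq in corpus:
            scores = [logscore(n, weights) for n in neighborhood(tuple(seq))]
            m = max(scores)
            log_z = m + math.log(sum(math.exp(s - m) for s in scores))
            total += logscore(tuple(seq), weights) - log_z
        return total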
Citation (BibTeX):
Smith, Noah A. and Jason Eisner (2005). Contrastive estimation: Training log-linear models on unlabeled data. Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics (ACL), pages 354-362, Ann Arbor, Michigan, June.
Slides: HTML
On-line document:
9 pp. PDF (248K), postscript (1007K)
Relationship to other papers:
jump to this paper in my research summary

A Class of Rational n-WFSM Auto-Intersections (2005)

Abstract:
Weighted finite-state machines with n tapes describe n-ary rational string relations. The join of n-ary relations is very important in applications. We show how to compute it via a simpler operation, the auto-intersection. Join and auto-intersection generally do not preserve rationality. We define a class of triples (A,i,j) such that the auto-intersection of the machine A on tapes i and j can be computed by a delay-based algorithm. We point out how to extend this class, and hope that it is sufficient for many practical applications.
Citation (BibTeX):
Kempe, André, Jean-Marc Champarnaud, Jason Eisner, Franck Guingne, and Florent Nicart (2005). A class of rational n-WFSM auto-intersections. Proceedings of the Tenth International Conference on Implementation and Application of Automata (CIAA-2005). Lecture Notes in Computer Science 3845:189-200. Springer-Verlag.
On-line document:
12 pp. PDF (201K)
Relationship to other papers:
jump to this paper in my research summary

Unsupervised Classification via Decision Trees: An Information-Theoretic Perspective (2005)

Abstract:
Integrated Sensing and Processing Decision Trees (ISPDTs) were introduced in [1] as a tool for supervised classification of high-dimensional data. In this paper, we consider the problem of unsupervised classification, through a recursive construction of ISPDTs, where at each internal node the data (i) are split into clusters, and (ii) are transformed independently of other clusters, guided by some optimization objective. We show that the maximization of information-theoretic quantities such as mutual information and alpha-divergences is theoretically justified for growing ISPDTs, assuming that each data point is generated by a finite-memory random process given the class label. Furthermore, we present heuristics that perform the maximization in a greedy manner, and we demonstrate their effectiveness with empirical results from multi-spectral imaging.
Citation (BibTeX):
Karakos, Damianos, Sanjeev Khudanpur, Jason Eisner, and Carey E. Priebe (2005). Unsupervised classification via decision trees: An information-theoretic perspective. Proceedings of the 2005 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 1081-1084, Philadelphia, March. Invited talk.
On-line document:
4 pp. PDF (712K)
Relationship to other papers:
jump to this paper in my research summary

A Note on Join and Auto-Intersection of n-ary Rational Relations (2004)

Abstract:
A finite-state machine with n tapes describes a rational (or regular) relation on n strings. It is more expressive than a relational database table with n columns, which can only describe a finite relation.
We describe some basic operations on n-ary rational relations and propose notation for them. (For generality we give the semiring-weighted case in which each tuple has a weight.) Unfortunately, the join operation is problematic: if two rational relations are joined on more than one tape, it can lead to non-rational relations with undecidable properties. We recast join in terms of "auto-intersection" and illustrate some cases in which difficulties arise. We close with the hope that partial or restricted algorithms may be found that are still powerful enough to have practical use.
Citation (BibTeX):
André Kempe, Jean-Marc Champarnaud, and Jason Eisner (2004). A note on join and auto-intersection of n-ary rational relations. In Loek Cleophas and Bruce Watson, editors, Proceedings of the Eindhoven FASTAR Days (Computer Science Technical Report 04-40), pages 64-78. Department of Mathematics and Computer Science, Technische Universiteit Eindhoven, Netherlands, December.
On-line document:
15 pp. PDF (276K), postscript (429K)
Relationship to other papers:
jump to this paper in my research summary

Dyna: A Declarative Language for Implementing Dynamic Programs (2004)

Abstract:
We present the first version of a new declarative programming language. Dyna has many uses but was designed especially for rapid development of new statistical NLP systems. A Dyna program is a small set of equations, resembling Prolog inference rules, that specify the abstract structure of a dynamic programming algorithm. It compiles into efficient, portable, C++ classes that can be easily invoked from a larger application. By default, these classes run a generalization of agenda-based parsing, prioritizing the partial parses by some figure of merit. The classes can also perform an exact backward (outside) pass in the service of parameter training. The compiler already knows several implementation tricks, algorithmic transforms, and numerical optimization techniques. It will acquire more over time: we intend for it to generalize and encapsulate best practices, and serve as a testbed for new practices. Dyna is now being used for parsing, machine translation, morphological analysis, grammar induction, and finite-state modeling.
Citation (BibTeX):
Eisner, Jason, Eric Goldlust, and Noah A. Smith (2004). Dyna: A declarative language for implementing dynamic programs. Proceedings of the 42nd Annual Meeting of the Association for Computational Linguistics (ACL), Companion Volume, pages 218-221, Barcelona, July.
On-line document:
4 pp. PDF (60K), postscript (68K)
Consider this longer, more recent paper instead
Other resources:
The Dyna website
Relationship to other papers:
jump to this paper in my research summary

Annealing Techniques for Unsupervised Statistical Language Learning (2004)

Abstract:
Exploiting unannotated natural language data is hard largely because unsupervised parameter estimation is hard. We describe deterministic annealing (Rose et al., 1990) as an appealing alternative to the Expectation-Maximization algorithm (Dempster et al., 1977). Seeking to avoid search error, DA begins by globally maximizing an easy concave function and maintains a local maximum as it gradually morphs the function into the desired non-concave likelihood function. Applying DA to parsing and tagging models is shown to be straightforward; significant improvements over EM are shown on a part-of-speech tagging task. We describe a variant, skewed DA, which can incorporate a good initializer when it is available, and show significant improvements over EM on a grammar induction task.
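Illustrative code sketch (not from the paper):
The mechanics of deterministic annealing EM are easy to show on a toy problem: the E-step posteriors are flattened by an inverse-temperature parameter beta that is gradually raised toward 1, at which point ordinary EM is recovered. The sketch below uses a hypothetical two-component 1-D Gaussian mixture with an arbitrary beta schedule, rather than the tagging and grammar-induction models studied in the paper.

    import numpy as np

    def da_em_gmm(x, iters=25, betas=(0.1, 0.3, 0.6, 1.0)):
        """Deterministic annealing EM for a toy two-component 1-D Gaussian mixture.
        At small beta the posteriors are nearly uniform (an easy, almost concave
        objective); raising beta toward 1 gradually recovers ordinary EM."""
        mu = np.array([x.mean() - 0.5, x.mean() + 0.5])
        sigma = np.array([x.std(), x.std()])
        pi = np.array([0.5, 0.5])
        for beta in betas:
            for _ in range(iters):
                # E-step with annealed (flattened) posteriors
                ll = (-0.5 * ((x[:, None] - mu) / sigma) ** 2
                      - np.log(sigma) + np.log(pi))
                post = np.exp(beta * (ll - ll.max(axis=1, keepdims=True)))
                post /= post.sum(axis=1, keepdims=True)
                # M-step is the usual one
                nk = post.sum(axis=0)
                pi = nk / nk.sum()
                mu = (post * x[:, None]).sum(axis=0) / nk
                sigma = np.sqrt((post * (x[:, None] - mu) ** 2).sum(axis=0) / nk) + 1e-6
        return mu, sigma, pi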
Citation (BibTeX):
Smith, Noah A. and Jason Eisner (2004). Annealing techniques for unsupervised statistical language learning. Proceedings of the 42nd Annual Meeting of the Association for Computational Linguistics (ACL), pages 486-493, Barcelona, July.
On-line document:
8 pp. PDF (144K), postscript (171K)
Relationship to other papers:
jump to this paper in my research summary

Radiology Report Entry with Automatic Phrase Completion Driven by Language Modeling (2004)

Abstract:
Language modeling, a technology found in many computerized speech recognition systems, can also be used in a text editor to implement an automated phrase completion feature that significantly reduces the number of keystrokes required to generate a radiology report, thereby increasing typing speed.

Radiology reports have especially low entropy, which allows prediction of multi-word phrases. Our system therefore chooses an optimal phrase length for each prediction, using Bellman-style dynamic programming to minimize the expected cost of typing the rest of the document. This computation considers what the user is likely to type in the future, and how many keystrokes it will take, considering the future effect of phrase completion as well.

Citation (BibTeX):
Eng, John and Jason M. Eisner (2004). Radiology report entry with automatic phrase completion driven by language modeling. Radiographics 24(5):1493-1501, September-October.
On-line:
9 pp. PDF (636K)
Relationship to other papers:
jump to this paper in my research summary

Natural Language Generation in the Context of Machine Translation (2004)

Abstract:
Final report from the team at the JHU CLSP 2002 summer workshop. See project description.
Citation (BibTeX):
Jan Hajic, Martin Cmejrek, Bonnie Dorr, Yuan Ding, Jason Eisner, Daniel Gildea, Terry Koo, Kristen Parton, Gerald Penn, Dragomir Radev, and Owen Rambow (2004). Natural language generation in the context of machine translation. Technical report, Center for Language and Speech Processing, Johns Hopkins University, Baltimore, March. Final report from 2002 CLSP summer workshop. 87 pp.
On-line document:
87 pp. PDF (395K)
Slides:
powerpoint (1072K)
Other resources:
Official workshop team page
Relationship to other papers:
jump to this paper in my research summary

Learning Non-Isomorphic Tree Mappings for Machine Translation (2003)

Abstract:
Often one may wish to learn a tree-to-tree mapping, training it on unaligned pairs of trees, or on a mixture of trees and strings. Unlike previous statistical formalisms (limited to isomorphic trees), synchronous tree substitution grammar allows local distortion of the tree topology. We reformulate it to permit dependency trees, and sketch EM/Viterbi algorithms for alignment, training, and decoding.
[Note: At a reviewer's request, the paper describes TSG more formally than in the previous literature, which might be helpful for some readers and implementers.]
Citation (BibTeX):
Eisner, Jason (2003). Learning non-isomorphic tree mappings for machine translation. Proceedings of the 41st Annual Meeting of the Association for Computational Linguistics (Companion Volume), 205-208, Sapporo, July.
On-line document:
4 pp. PDF (91K), postscript (103K) - see also correction
Slides:
powerpoint (320K, much animation), PDF (267K, 6 per page)
Relationship to other papers:
jump to this paper in my research summary

Simpler and More General Minimization for Weighted Finite-State Automata (2003)

Abstract:
Previous work on minimizing weighted finite-state automata (including transducers) is limited to particular types of weights. We present efficient new minimization algorithms that apply much more generally, while being simpler and about as fast.
We also point out theoretical limits on minimization algorithms. We characterize the kind of "well-behaved" weight semirings where our methods work. Outside these semirings, minimization is not well-defined (in the sense of producing a unique minimal automaton), and even finding the minimum number of states is in general NP-complete and inapproximable.
Citation (BibTeX):
Eisner, Jason (2003). Simpler and more general minimization for weighted finite-state automata. Proceedings of the Joint Meeting of the Human Language Technology Conference and the North American Chapter of the Association for Computational Linguistics (HLT-NAACL), pages 64-71, Edmonton, May.
On-line document:
8 pp. PDF (263K), postscript (2027K)
Slides:
powerpoint (615K, much animation), PDF (397K, 6 per page)
Relationship to other papers:
jump to this paper in my research summary

Parameter Estimation for Probabilistic Finite-State Transducers (2002)

Abstract:
Weighted finite-state transducers suffer from the lack of a training algorithm. Training is even harder for transducers that have been assembled via finite-state operations such as composition, minimization, union, concatenation, and closure, as this yields tricky parameter tying. We formulate a "parameterized FST" paradigm and give training algorithms for it, including a general bookkeeping trick ("expectation semirings") that cleanly and efficiently computes expectations and gradients.
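Illustrative code sketch (not the paper's full training algorithm):
The "expectation semiring" bookkeeping trick is compact enough to show directly. Each weight is a pair (p, r), where p is a probability and r is p times an additive statistic such as a feature count; giving an arc of probability p that fires a feature c times the weight (p, p*c) and running any ordinary semiring shortest-distance or forward algorithm with the operations below yields the total probability and the unnormalized expectation in a single pass. This is a minimal rendering of the pair operations only.

    class ExpectationWeight:
        """A pair (p, r) in the expectation semiring."""
        def __init__(self, p, r):
            self.p, self.r = p, r
        def __add__(self, other):       # semiring plus: combine alternative paths
            return ExpectationWeight(self.p + other.p, self.r + other.r)
        def __mul__(self, other):       # semiring times: extend a path
            return ExpectationWeight(self.p * other.p,
                                     self.p * other.r + other.p * self.r)
        def __repr__(self):
            return f"({self.p}, {self.r})"

    ZERO = ExpectationWeight(0.0, 0.0)  # additive identity
    ONE = ExpectationWeight(1.0, 0.0)   # multiplicative identity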
Citation (BibTeX):
Eisner, Jason (2002). Parameter estimation for probabilistic finite-state transducers. Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics, pages 1-8, Philadelphia, July.
On-line document:
8 pp. PDF (230K), postscript (1758K)
Slides from a longer version of this talk:
powerpoint (850K, some animation, a few speaker notes), PDF (479K, 6 per page)
Relationship to other papers:
jump to this paper in my research summary

Comprehension and Compilation in Optimality Theory (2002)

Abstract:
This paper ties up some loose ends in finite-state Optimality Theory. First, it discusses how to perform comprehension under Optimality Theory grammars consisting of finite-state constraints. Comprehension has not been much studied in OT; we show that unlike production, it does not always yield a regular set, making finite-state methods inapplicable. However, after giving a suitably flexible presentation of OT, we show carefully how to treat comprehension under recent variants of OT in which grammars can be compiled into finite-state transducers. We then unify these variants, showing that compilation is possible if all components of the grammar are regular relations, including the harmony ordering on scored candidates. A side benefit of our construction is a far simpler implementation of directional OT (Eisner, 2000).
Citation (BibTeX):
Eisner, Jason (2002). Comprehension and compilation in Optimality Theory. Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics, pages 56-63, Philadelphia, July.
On-line document:
8 pp. postscript (242K), PDF (191K, missing tableau shading)
Slides:
powerpoint (302K, a few special effects), PDF (240K, 6 per page)
Other resources:
related homework assignment (see problem 4) with an implementation
overlapping paper by Gerhard Jäger published independently at the same time
Relationship to other papers:
jump to this paper in my research summary

An Interactive Spreadsheet for Teaching the Forward-Backward Algorithm (2002)

Abstract:
This paper offers a detailed lesson plan on the forward-backward algorithm. The lesson is taught from a live, commented spreadsheet that implements the algorithm and graphs its behavior on a whimsical toy example. By experimenting with different inputs, one can help students develop intuitions about HMMs in particular and Expectation Maximization in general. The spreadsheet and a coordinated follow-up assignment are available.
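Illustrative code sketch (to accompany the spreadsheet):
For readers who prefer code to a spreadsheet, here is a minimal unscaled forward-backward pass for a discrete HMM. It is fine for short toy sequences like the ones in the lesson; a real implementation would rescale or work in log space to avoid underflow.

    import numpy as np

    def forward_backward(obs, pi, A, B):
        """obs: list of symbol indices; pi[i] = P(state0 = i);
        A[i, j] = P(next state j | state i); B[i, k] = P(symbol k | state i).
        Returns per-position state posteriors and the log-likelihood."""
        T, N = len(obs), len(pi)
        alpha = np.zeros((T, N))
        beta = np.ones((T, N))
        alpha[0] = pi * B[:, obs[0]]
        for t in range(1, T):
            alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]
        for t in range(T - 2, -1, -1):
            beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1])
        gamma = alpha * beta
        gamma /= gamma.sum(axis=1, keepdims=True)   # posterior P(state at t | all observations)
        return gamma, float(np.log(alpha[-1].sum()))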
Citation (BibTeX):
Eisner, Jason (2002). An interactive spreadsheet for teaching the forward-backward algorithm. In Dragomir Radev and Chris Brew (eds.), Proceedings of the ACL Workshop on Effective Tools and Methodologies for Teaching NLP and CL, pages 10-18, Philadelphia, July.
On-line document (includes lesson plan):
9 pp. PDF (242K, prettier), postscript (1026K, uglier)
Other resources:
Excel spreadsheet (372K), Viterbi version (324K)
Tricks and tips for quick navigation when teaching from a live spreadsheet
homework assignment (contact me for latex source)

Similar spreadsheet animating EM soft clustering (245K)
Animated Powerpoint examples of Earley's algorithm (1482K) and fast bilexical parsing (517K)
Related sites (discovered after publication):
Spreadsheets in Education has a long bibliography, many links, and examples (including Fourier synthesis!)
Visualizations with Excel explains how to do algorithm animation with Excel macros; examples include edit distance and Huffman coding
Relationship to other papers:
jump to this paper in my research summary

Transformational Priors Over Grammars (2002)

Nominated for best paper award.
Abstract:
This paper proposes a novel class of PCFG parameterizations that support linguistically reasonable priors over PCFGs. To estimate the parameters is to discover a notion of relatedness among context-free rules such that related rules tend to have related probabilities. The prior favors grammars in which the relationships are simple to describe and have few major exceptions. A basic version that bases relatedness on weighted edit distance yields superior smoothing of grammars learned from the Penn Treebank (20% reduction of rule perplexity over the best previous method).
Citation (BibTeX):
Eisner, Jason (2002). Transformational priors over grammars. Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 63-70, Philadelphia, July.
On-line document:
8 pp. postscript (205K), PDF (166K)
Slides:
powerpoint (749K, some speaker notes & animation), PDF (338K, 6 per page), PDF (579K, with speaker notes)
Relationship to other papers:
jump to this paper in my research summary

Discovering Syntactic Deep Structure via Bayesian Statistics (2002)

Abstract:
In the Bayesian framework, a language learner should seek a grammar that explains observed data well and is also a priori probable. This paper proposes such a measure of prior probability. Indeed it develops a full statistical framework for lexicalized syntax. The learner's job is to discover the system of probabilistic transformations (often called lexical redundancy rules) that underlies the patterns of regular and irregular syntactic constructions listed in the lexicon. Specifically, the learner discovers what transformations apply in the language, how often they apply, and in what contexts. It considers simpler systems of transformations to be more probable a priori. Experiments show that the learned transformations are more effective than previous statistical models at predicting the probabilities of lexical entries, especially those for which the learner had no direct evidence.
Citation (BibTeX):
Eisner, Jason (2002). Discovering syntactic deep structure via Bayesian statistics. Cognitive Science 26(3):255-268, May-June.
On-line document:
9 pp. PDF (145K)
Slides: see this talk
Relationship to other papers:
jump to this paper in my research summary

Introduction to the Special Section on Linguistically Apt Statistical Methods (2002)

Abstract:
This brief introduction, from the editor of the special section, reviews why and how statistical and linguistic approaches to language can help each other. It also asks how statistical modeling fits into the broader program of cognitive science.
Citation (BibTeX):
Eisner, Jason (2002). Introduction to the special section on linguistically apt statistical methods. Cognitive Science 26(3):235-237, May-June.
On-line document:
2 pp. PDF (30K)
Relationship to other papers:
jump to this paper in my research summary

Expectation Semirings: Flexible EM for Finite-State Transducers (2001)

Abstract:
This paper gives the first EM algorithm for general probabilistic finite-state transducers (with epsilon). Furthermore, the approach is powerful enough to fit machines' parameters even after the machines are combined by operations of the finite-state calculus, such as composition and minimization. This allows an expert to build a parameterized transducer in any way that is appropriate to the domain, and then fit the parameters automatically from data. Many standard algorithms are special cases, and there are many further applications. Yet the algorithm remains surprisingly simple because all the difficult work is subcontracted to existing algorithms for semiring-weighted automata. The trick is to use a novel semiring.
Citation (BibTeX):
Eisner, Jason (2001). Expectation semirings: Flexible EM for finite-state transducers. In Gertjan van Noord (ed.), Proceedings of the ESSLLI Workshop on Finite-State Methods in Natural Language Processing, Helsinki, August. 5 pages.
On-line document:
5 pp. postscript (176K), PDF (240K)
Slides: see this talk
Relationship to other papers:
jump to this paper in my research summary

Smoothing a Probabilistic Lexicon Via Syntactic Transformations (2001)

Abstract:

Probabilistic parsing requires a lexicon that specifies each word's syntactic preferences in terms of probabilities. To estimate these probabilities for words that were poorly observed during training, this thesis assumes the existence of arbitrarily powerful transformations (also known to linguists as lexical redundancy rules or metarules) that can add, delete, retype or reorder the argument and adjunct positions specified by a lexical entry.

In a given language, some transformations apply frequently and others rarely. We describe how to estimate the rates of the transformations from a sample of lexical entries. More deeply, we learn which properties of a transformation increase or decrease its rate in the language. As a result, we can smooth the probabilities of lexical entries. Given enough direct evidence about a lexical entry's probability, our Bayesian approach trusts the evidence; but when less evidence or no evidence is available, it relies more on the transformations' rates to guess how often the entry will be derived from related entries.

Abstractly, the proposed "transformation models" are probability distributions that arise from graph random walks with a log-linear parameterization. A domain expert constructs the parameterized graph, and a vertex is likely according to whether random walks tend to halt at it. Transformation models are suited to any domain where "related" events (as defined by the graph) may have positively covarying probabilities. Such models admit a natural prior that favors simple regular relationships over stipulative exceptions. The model parameters can be locally optimized by gradient-based methods or by Expectation-Maximization. Exact algorithms (matrix inversion) and approximate ones (relaxation) are provided, with optimizations. Variations on the idea are also discussed.

We compare the new technique empirically to previous techniques from the probabilistic parsing literature, using comparable features, and obtain a 20% perplexity reduction (similar to doubling the amount of training data). Some of this reduction is shown to stem from the transformation model's ability to match observed probabilities, and some from its ability to generalize. Model averaging yields a final 24% perplexity reduction.
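Illustrative code sketch (not from the thesis):
The "graph random walk" view has a compact linear-algebra core: expected visit counts to each vertex satisfy v = start + T^T v, so the halting distribution comes from a single matrix solve. The sketch below shows just that computation for a fixed transition matrix; the thesis's log-linear parameterization of the arc weights, the prior, and the EM or gradient fitting are all omitted.

    import numpy as np

    def halting_distribution(T, halt, start):
        """T[i, j]: probability of stepping from vertex i to j;
        halt[i]: probability of halting at i (so T.sum(axis=1) + halt = 1);
        start[i]: probability of starting the walk at i.
        Returns the probability that the walk halts at each vertex."""
        visits = np.linalg.solve((np.eye(len(halt)) - T).T, start)  # solves v = start + T^T v
        return visits * halt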

Citation (BibTeX):
Eisner, Jason M. (2001). Smoothing a Probabilistic Lexicon Via Syntactic Transformations. Ph.D. thesis, University of Pennsylvania, July. 318 pages.
On-line document:
318 double-spaced pp. PDF (2850K), PS (2909K)
"Chapter 1: An Executive Summary": 33 double-spaced pp. , PDF (364K), PS (586K)
Slides: see this talk
Relationship to other papers:
jump to this paper in my research summary

Easy and Hard Constraint Ranking in Optimality Theory: Algorithms and Complexity (2000)

Abstract:
We consider the problem of ranking a set of OT constraints in a manner consistent with data.

We speed up Tesar and Smolensky's RCD algorithm to be linear in the number of constraints. This finds a ranking so that each attested form x_i beats or ties a particular competitor y_i.

We also generalize RCD so that each x_i beats or ties all possible competitors. Alas, neither this more realistic version of learning, nor even generation, has any polynomial algorithm unless P=NP! That is, one cannot improve qualitatively upon brute force:

Merely checking that a single (given) ranking is consistent with given forms is coNP-complete if the surface forms are fully observed and Delta_2^p-complete if not. Indeed, OT generation is OptP-complete. As for ranking, determining whether any consistent ranking exists is coNP-hard (but in Delta_2^p) if the forms are fully observed, and Sigma_2^p-complete if not.

Finally, we show that generation and ranking are easier in derivational theories: they are in P and NP-complete, respectively.
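Illustrative code sketch (not from the paper):
For concreteness, here is plain Recursive Constraint Demotion in its simple form, which finds a stratified ranking under which each attested winner beats or ties its given competitor. The paper's contributions are a faster version of this (linear in the number of constraints) and the hardness results, neither of which is shown here.

    def rcd(constraints, pairs):
        """pairs: list of (winner_marks, loser_marks), each a dict mapping a
        constraint to its number of violations.  Returns a list of strata
        (highest-ranked first), or None if no consistent ranking exists."""
        remaining = list(pairs)
        unranked = set(constraints)
        strata = []
        while unranked:
            # a constraint may be ranked next if it prefers no loser in any remaining pair
            stratum = {c for c in unranked
                       if all(w.get(c, 0) <= l.get(c, 0) for w, l in remaining)}
            if not stratum:
                return None               # the data admit no consistent ranking
            strata.append(stratum)
            unranked -= stratum
            # discard the pairs now decided by some constraint in this stratum
            remaining = [(w, l) for w, l in remaining
                         if not any(w.get(c, 0) < l.get(c, 0) for c in stratum)]
        return strata                     # any surviving pairs are exact ties (winner ties loser)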

Citation (BibTeX):
Eisner, Jason (2000). Easy and hard constraint ranking in optimality theory: Algorithms and complexity. In Jason Eisner, Lauri Karttunen and Alain Thériault (eds.), Finite-State Phonology: Proceedings of the 5th Workshop of the ACL Special Interest Group in Computational Phonology (SIGPHON), pages 22-33, Luxembourg, August.
On-line document:
12 pp. A4 postscript (321K), PDF (335K)
Slides:
powerpoint (265K, some animation), PDF (172K, 6 per page)
Relationship to other papers:
jump to this paper in my research summary

Directional Constraint Evaluation in Optimality Theory (2000)

Abstract:
Weighted finite-state constraints that can count unboundedly many violations make Optimality Theory more powerful than finite-state transduction (Frank and Satta, 1998). This result is empirically and computationally awkward. We propose replacing these unbounded constraints, as well as non-finite-state Generalized Alignment constraints, with a new class of finite-state directional constraints. We give linguistic applications, results on generative power, and algorithms to compile grammars into transducers.
Citation (BibTeX):
Eisner, Jason (2000). Directional constraint evaluation in Optimality Theory. Proceedings of the 18th International Conference on Computational Linguistics (COLING 2000), pages 257-263, Saarbrücken, August.
On-line document:
7 pp. postscript (266K), PDF (266K)
Slides:
powerpoint (855K, detailed speaker notes, light animation), PDF (263K, 6 per page), PDF (417K, with speaker notes)
Relationship to other papers:
jump to this paper in my research summary

The Science of Language: Computational Linguistics (2000)

Citation (BibTeX):
Eisner, Jason (2000). The science of language: Computational linguistics. Imagine Magazine 7(4):14-15, Center for Talented Youth, Johns Hopkins University, Baltimore, March/April.
On-line document:
2 pp. PDF (4640K)

Review of Optimality Theory by René Kager (2000)

Abstract:
This book review also sketches why OT is interesting to computational linguists, and how it relates to other approaches for combining non-orthogonal surface features, such as maximum-entropy modeling.
Citation (BibTeX):
Eisner, Jason (2000). Review of Optimality Theory by René Kager. Computational Linguistics 26(2):286-290, June.
On-line document:
5 pp. postscript (41K), PDF (196K)
Relationship to other papers:
jump to this paper in my research summary

Bilexical Grammars and Their Cubic-Time Parsing Algorithms (1997, 2000)

Abstract:
This paper introduces weighted bilexical grammars, a formalism in which individual lexical items, such as verbs and their arguments, can have idiosyncratic selectional influences on each other. Such "bilexicalism" has been a theme of much current work in parsing. The new formalism can be used to describe bilexical approaches to both dependency and phrase-structure grammars, and a slight modification yields link grammars. Its scoring approach is compatible with a wide variety of probability models.

The obvious parsing algorithm for bilexical grammars (used by most authors) takes time O(n^5). A more efficient O(n^3) method is exhibited. The new algorithm has been implemented and used in a large parsing experiment (Eisner 1996). We also give a useful extension to the case where the parser must undo a stochastic transduction that has altered the input.
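Illustrative code sketch (a simplified rendering):
The cubic-time idea is easiest to see in the now-standard "split span" form for projective dependency parsing with arc scores, shown below. The paper's own formulation is more general (weighted bilexical grammars with attached probability models), but the chart items and the O(n^3) loop structure are the same in spirit.

    import numpy as np

    def eisner_best_score(score):
        """score[h, m] = score of an arc h -> m, with token 0 an artificial root.
        Returns the score of the best projective parse; keeping backpointers
        alongside each max would recover the tree itself."""
        n = score.shape[0]
        # chart[s, t, d, c]: best span s..t; d=1 means the head is s (d=0: head is t);
        # c=1 means the span is complete, c=0 that it still awaits an outer arc.
        chart = np.full((n, n, 2, 2), -np.inf)
        chart[np.arange(n), np.arange(n), :, :] = 0.0
        for k in range(1, n):
            for s in range(0, n - k):
                t = s + k
                # add an arc between s and t (incomplete items)
                base = max(chart[s, r, 1, 1] + chart[r + 1, t, 0, 1] for r in range(s, t))
                chart[s, t, 0, 0] = base + score[t, s]     # t takes s as a modifier
                chart[s, t, 1, 0] = base + score[s, t]     # s takes t as a modifier
                # absorb a finished arc (complete items)
                chart[s, t, 0, 1] = max(chart[s, r, 0, 1] + chart[r, t, 0, 0] for r in range(s, t))
                chart[s, t, 1, 1] = max(chart[s, r, 1, 0] + chart[r, t, 1, 1] for r in range(s + 1, t + 1))
        return float(chart[0, n - 1, 1, 1])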

Citation (BibTeX):
Eisner, Jason (2000). Bilexical grammars and their cubic-time parsing algorithms. In Harry Bunt and Anton Nijholt (eds.), Advances in Probabilistic and Other Parsing Technologies, pages 29-62. Kluwer Academic Publishers, October.

This is an improved and extended version of an earlier paper (BibTeX):
Eisner, Jason (1997). Bilexical grammars and a cubic-time probabilistic parser. Proceedings of the International Workshop on Parsing Technologies, pages 54-65, MIT, September.

On-line document:
33 pp. postscript (316K), PDF (330K)
Earlier 1997 version: 12 pp. postscript (318K), PDF (383K)
Slides from 1997 talk (black-and-white, no animation):
powerpoint (680K), PDF (122K, 6 per page)
Relationship to other papers:
jump to this paper in my research summary

A Faster Parsing Algorithm for Lexicalized Tree-Adjoining Grammars (2000)

Abstract:
This paper points out some computational inefficiencies of standard TAG parsing algorithms when applied to LTAGs. We propose a novel algorithm with an asymptotic improvement, from O(n^8 g^2 t) to O(n^6 max(n,g) g t), where n is the input length and g, t are grammar constants that are independent of vocabulary size.
Citation (BibTeX):
Eisner, Jason and Giorgio Satta (2000). A faster parsing algorithm for lexicalized tree-adjoining grammars. Proceedings of the 5th Workshop on Tree-Adjoining Grammars and Related Formalisms (TAG+5), pages 14-19, Paris, May.
On-line document:
6 pp. postscript (70K), PDF (82K)
Relationship to other papers:
jump to this paper in my research summary

Efficient Parsing for Bilexical Context-Free Grammars and Head Automaton Grammars (1999)

Abstract:
Several recent stochastic parsers use bilexical grammars, where each word type idiosyncratically prefers particular complements with particular head words. We present O(n^4) parsing algorithms for two bilexical formalisms (see title), improving the previous upper bounds of O(n^5). Also, for a common special case that was known to allow O(n^3) parsing (Eisner, 1997), we present an O(n^3) algorithm with an improved grammar constant.
Citation (BibTeX):
Eisner, Jason and Giorgio Satta (1999). Efficient parsing for bilexical context-free grammars and head automaton grammars. Proceedings of the 37th Annual Meeting of the Association for Computational Linguistics, pages 457-464, College Park, Maryland, June.
On-line document:
8 pp. postscript (536K), PDF (413K)
Slides:
powerpoint (517K, animation, speaker notes), PDF (203K, 6 per page), PDF (284K, with speaker notes)
Note that the slides include experimental speed comparisons that were not in the paper.
Relationship to other papers:
jump to this paper in my research summary

Doing OT in a Straitjacket (1999)

Note:
This is an extended version of What Constraints Should OT Allow? (1997).
Abstract:
A universal theory of human phonology should be clearly specified and falsifiable. To turn Optimality Theory (OT) into a complete proposal for phonological Universal Grammar, one must put some cards on the table: What kinds of constraints may an OT grammar state? And how can anyone tell what data this grammar predicts, without constructing infinite tableaux?

In this talk I'll motivate a restrictive formalization of OT that allows just two types of simple, local constraint. Gen freely proposes gestures and prosodic constituents; the constraints try to force these to coincide or not coincide temporally. An efficient algorithm exists to find the optimal candidate.

I will argue that despite its simplicity, primitive OT is expressive enough to describe and unify nearly all current work in OT phonology. However, it is provably more constrained: because it is unable to mimic deeply non-local mechanisms like Generalized Alignment, it forces a new and arguably better account of metrical stress typology. I will even discuss a proposal for constraining it further.

Citation (BibTeX):
Eisner, Jason M. (1999). Doing OT in a straitjacket. Talk handout (27 pages), UCLA Linguistics Dept., June. http://cs.jhu.edu/~jason/papers/#ucla99. Extended version of talk at the 1997 LSA.
On-line document:
27 pp. postscript (297K), PDF (458K)
14 pp. postscript (260K), PDF (407K) (as handout)
Relationship to other papers:
jump to this paper in my research summary

Efficient Generation in Primitive Optimality Theory (1997)

Abstract:
This paper introduces computational linguists to primitive Optimality Theory (OTP), a clean and linguistically motivated formalization of OT. OTP specifies the class of autosegmental representations, the universal generator Gen, and the two simple families of permissible constraints. It is therefore possible to study its computational generation, comprehension, and learning properties.

Some results on generation are presented. Unlike less restricted theories using Generalized Alignment, OTP grammars can derive optimal surface forms with finite-state methods adapted from Ellison (1994). Unfortunately these methods take time exponential in the size of the grammar. Indeed the generation problem is shown NP-complete in this sense. However, techniques are discussed for making Ellison's approach fast and practical in the typical case, including a simple trick that alone provides a 100-fold speedup on a grammar fragment of moderate size. One avenue for future improvements is a new finite-state notion, "factored automata," where regular languages are represented compactly via formal intersections of FSAs.

Citation (BibTeX):
Eisner, Jason (1997). Efficient generation in primitive Optimality Theory. Proceedings of the 35th Annual Meeting of the Association for Computational Linguistics, pages 313-320, Madrid, July.
On-line document:
8 pp. postscript (178K), PDF (235K)
Slides (black-and-white, no animation, no speaker notes):
powerpoint (263K), PDF (104K)
Other resources:
Some proof details suppressed from the paper for space reasons
Relationship to other papers:
jump to this paper in my research summary

State-of-the-Art Algorithms for Minimum Spanning Trees: A Tutorial Discussion (1997)

Abstract:
The classic "easy" optimization problem is to find the MST of a connected, undirected graph. Good polynomial-time algorithms have been known since 1930. Over the last 10 years, however, the standard O(m log n) results of Kruskal and Prim have been improved to linear or near-linear time. The new methods use several tricks of general interest in order to reduce the number of edge weight comparisons and the amount of other work. This tutorial reviews those methods, building up strategies step by step so as to expose the insights behind the algorithms. Implementation details are clarified, and some generalizations are given.

Specifically, the paper attempts to shed light on the classical algorithms of Kruskal, Prim, and Boruvka; the improved approach of Gabow, Galil, and Spencer, which takes time only O(m log (lg* n - lg* m/n)); and the randomized O(m) algorithm of Karger, Klein, and Tarjan, which relies on an O(m) MST verification algorithm by King. It also considers Frederickson's method for maintaining an MST in time O(sqrt(m)) per change to the graph. An appendix explains Fibonacci heaps.
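Illustrative code sketch (one of the classical algorithms reviewed):
As a baseline for the fancier methods discussed in the tutorial, here is Kruskal's algorithm with a union-find structure (path compression plus union by rank), which already achieves the standard O(m log n) bound.

    def kruskal(n, edges):
        """edges: list of (weight, u, v) over vertices 0..n-1; returns MST edges as (u, v, weight)."""
        parent = list(range(n))
        rank = [0] * n
        def find(x):
            while parent[x] != x:
                parent[x] = parent[parent[x]]     # path compression by halving
                x = parent[x]
            return x
        def union(x, y):
            rx, ry = find(x), find(y)
            if rx == ry:
                return False                      # edge would create a cycle: skip it
            if rank[rx] < rank[ry]:
                rx, ry = ry, rx
            parent[ry] = rx
            if rank[rx] == rank[ry]:
                rank[rx] += 1
            return True
        return [(u, v, w) for w, u, v in sorted(edges) if union(u, v)]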

Citation (BibTeX):
Eisner, Jason (1997). State-of-the-art algorithms for minimum spanning trees: A tutorial discussion. Manuscript, University of Pennsylvania, April. 78 pp. (To be turned into a technical report, with more diagrams, as soon as I get a chance.)
On-line document:
78 pp. postscript (725K), PDF (1037K)
Relationship to other papers:
jump to this paper in my research summary

FootForm Decomposed: Using primitive constraints in OT (1997)

Abstract:
Hayes (1995) gives a typology of the world's metrical stress systems, which is marked by several striking asymmetries (parametric gaps). Most work on metrical stress within Optimality Theory (OT) has adopted this typology without explaining the gaps. Moreover, the OT versions use uncomfortably non-local constraints (Align, FootForm, FtBin).

This paper presents a rather different and in some ways more explanatory typology of stress, couched in the framework of primitive Optimality Theory (OTP), which allows only primitive, radically local constraints. For example, Generalized Alignment is not allowed. The paper presents a single, coherent system of rerankable constraints that yields the basic facts about iambic and trochaic foot form, iambic lengthening, quantity sensitivity, unbounded feet, simple word-initial and word-final stress, directionality of footing, syllable (and foot) extrametricality, degenerate feet, and word-level stress.

The metrical part of the account rests on the following intuitions:

An interesting prediction of (b) and (c) is that left-to-right trochees should be incompatible with extrametricality. This prediction is robustly confirmed in Hayes.
Citation (BibTeX):
Eisner, Jason (1997). FootForm decomposed: Using primitive constraints in OT. In Benjamin Bruening (ed.), MIT Working Papers in Linguistics, vol. 31, pages 115-143.
On-line document:
29 pp. postscript (301K), PDF (305K)
Relationship to other papers:
jump to this paper in my research summary

What Constraints Should OT Allow? (1997)

Note:
A more recent, extended version of this talk is Doing OT in a Straitjacket (1999).
Abstract:
Optimality Theory (OT) has shown itself to be an elegant framework for phonological description. Two important questions remain to be settled, however: What constraints are allowed? And what kind of representations do they constrain? Formalizing what OT can and cannot say is part of stating UG.

This talk proposes an approach to constraining OT, called "primitive Optimality Theory" (OTP). Most constraints given in the literature can be reformulated (not always obviously) as coming from one of two simple, local families of "primitive constraints":

Here, a and b may be constituents, edges of constituents, or restricted kinds of conjunctive or disjunctive configurations.

We will formalize these families and the representations that they constrain. As in Optimal Domains Theory, neither the constraints nor the representations use association lines. The constraints control only the relative timing of articulatory gestures, and other phonological or morphological constituents, along a continuous timeline.

A list of hundreds of constraints drawn from the literature is presented, showing how every degree of freedom of OTP is exploited in each of several areas: features, prosody, feature-prosody interaction, input-output relationships, and morphophonology. To show that the primitive constraints are not merely necessary, but also close to sufficient, we also discuss how to handle a few apparently difficult cases of non-local phenomena.

Citation (BibTeX):
Eisner, Jason M. (1997). What constraints should OT allow? Talk handout (22 pages), Annual Meeting of the Linguistic Society of America, Chicago, January. Available on the Rutgers Optimality Archive (http://ruccs.rutgers.edu/roa.html) and at http://cs.jhu.edu/~jason/papers/#lsa97.

With some additions and corrections:
Eisner, Jason (1997). Constraining OT: Primitive Optimality Theory. Talk handout, MIT, September. http://www.cs.jhu.edu/~jason/papers/#mit97.

On-line document (Click here instead for a more recent version!)
22 pp. postscript (176K), PDF (324K) (corrected 9/1997 version)
11 pp. postscript (171K), PDF (317K) (corrected 9/1997 version, as handout)
10 pp. postscript (158K) (original 1/1997 version, as handout)
Relationship to other papers:
jump to this paper in my research summary

An Empirical Comparison of Probability Models for Dependency Grammar (1996)

Abstract:
This technical report is an appendix to Eisner (1996): it gives superior experimental results that were reported only in the talk version of that paper. Eisner (1996) trained three probability models on a small set of about 4,000 conjunction-free, dependency-grammar parses derived from the Wall Street Journal section of the Penn Treebank, and then evaluated the models on a held-out test set, using a novel O(n^3) parsing algorithm.

The present paper describes some details of the experiments and repeats them with a larger training set of 25,000 sentences. As reported at the talk, the more extensive training yields greatly improved performance. Nearly half the sentences are parsed with no misattachments; two-thirds are parsed with at most one misattachment.

Of the models described in the original written paper, the best score is still obtained with the generative (top-down) "model C." However, slightly better models are also explored, in particular, two variants on the comprehension (bottom-up) "model B." The better of these has an attachment accuracy of 90%, and (unlike model C) tags words more accurately than the comparable trigram tagger. Differences are statistically significant.

If tags are roughly known in advance, search error is all but eliminated and the new model attains an attachment accuracy of 93%. We find that the parser of Collins (1996), when combined with a highly-trained tagger, also achieves 93% when trained and tested on the same sentences. Similarities and differences are discussed.

Citation (BibTeX):
Eisner, Jason M (1996). An empirical comparison of probability models for dependency grammar. Technical report IRCS-96-11, Institute for Research in Cognitive Science, Univ. of Pennsylvania. 18 pp.
On-line document:
18 pp. postscript (327K), PDF (357K)
Relationship to other papers:
jump to this paper in my research summary

Three New Probabilistic Models for Dependency Parsing: An Exploration (1996)

Abstract:
After presenting a novel O(n³) parsing algorithm for dependency grammar, we develop three contrasting ways to stochasticize it. We propose (a) a lexical affinity model where words struggle to modify each other, (b) a sense tagging model where words fluctuate randomly in their selectional preferences, and (c) a generative model where the speaker fleshes out each word's syntactic and conceptual structure without regard to the implications for the hearer. We also give preliminary empirical results from evaluating the three models' parsing performance on annotated Wall Street Journal training text (derived from the Penn Treebank). In these results, the generative (i.e., top-down) model performs significantly better than the others, and does about equally well at assigning part-of-speech tags.
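
For readers who want the flavor of such an O(n³) algorithm, here is a rough sketch (my own, not the paper's code) of span-based projective dependency parsing by dynamic programming; an arbitrary arc-score matrix stands in for the trained probability models, and the word at position 0 is a dummy root:

    # Hypothetical sketch of O(n^3) projective dependency parsing over spans.
    # score[h][m] = score of an arc from head h to modifier m; values are invented.
    import math

    def parse(score):
        n = len(score)                      # number of words, including the root at position 0
        NEG = -math.inf
        # C[i][j][d]: best complete span from i to j, headed at the d end (0=right, 1=left)
        # I[i][j][d]: best incomplete span, i.e. an arc between i and j is being built
        C = [[[NEG, NEG] for _ in range(n)] for _ in range(n)]
        I = [[[NEG, NEG] for _ in range(n)] for _ in range(n)]
        for i in range(n):
            C[i][i][0] = C[i][i][1] = 0.0
        for width in range(1, n):
            for i in range(n - width):
                j = i + width
                # Join two smaller complete spans and add an arc between i and j.
                best = max(C[i][k][1] + C[k + 1][j][0] for k in range(i, j))
                I[i][j][0] = best + score[j][i]      # arc j -> i (head on the right)
                I[i][j][1] = best + score[i][j]      # arc i -> j (head on the left)
                # Extend incomplete spans into complete ones.
                C[i][j][0] = max(C[i][k][0] + I[k][j][0] for k in range(i, j))
                C[i][j][1] = max(I[i][k][1] + C[k][j][1] for k in range(i + 1, j + 1))
        return C[0][n - 1][1]               # best score of a full parse headed by the root

    # Toy example: 3 real words plus the root; higher scores mean better arcs.
    scores = [[-9, 2, 1, 1], [-9, -9, 3, 0], [-9, 0, -9, 2], [-9, 0, 1, -9]]
    print(parse(scores))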
Citation (BibTeX):
Eisner, Jason M (1996). Three new probabilistic models for dependency parsing: An exploration. Proceedings of the 16th International Conference on Computational Linguistics (COLING-96), pages 340-345, Copenhagen, August.
On-line document:
6 pp. postscript (153K), PDF (218K)
Relationship to other papers:
jump to this paper in my research summary

Efficient Normal-Form Parsing for Combinatory Categorial Grammar (1996)

Abstract:
Under categorial grammars that have powerful rules like composition, a simple n-word sentence can have exponentially many parses that are semantically equivalent. Generating all parses is inefficient and obscures whatever true semantic ambiguities are in the input. This paper addresses the problem for a fairly general form of Combinatory Categorial Grammar, by means of an efficient, correct, and easy-to-implement normal-form parsing technique. The parser is proved to find exactly one parse in each semantic equivalence class of allowable parses; that is, spurious ambiguity (as carefully defined) is shown to be both safely and completely eliminated.
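
As a rough sketch of the flavor of such a restriction (my own simplified rendering, not the paper's full system), suppose each constituent records the combinator that produced it; a derivation step can then be filtered out when its primary functor was itself built by composition in the same direction:

    # Hypothetical sketch of a normal-form filter on CCG derivation steps.
    # Origins: "fa"/"ba" = forward/backward application, "fc"/"bc" = forward/backward
    # composition, "lex" = lexical item.  This is a simplified rendering of the idea,
    # not a full implementation of the paper's constraints.

    def allowed(rule, left_origin, right_origin):
        """Return True if this derivation step is permitted in (this sketch of) normal form."""
        if rule in ("fa", "fc") and left_origin == "fc":
            return False     # primary functor of a forward rule was itself built by >B
        if rule in ("ba", "bc") and right_origin == "bc":
            return False     # primary functor of a backward rule was itself built by <B
        return True

    # Composing X/Y with Y/Z by "fc" yields a constituent of origin "fc"; re-using that
    # result as the functor of a further forward rule is blocked.
    print(allowed("fc", "lex", "lex"))   # True  -- ordinary forward composition
    print(allowed("fa", "fc", "lex"))    # False -- would re-use a >B output as functor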
Citation (BibTeX):
Eisner, Jason (1996). Efficient normal-form parsing for Combinatory Categorial Grammar. Proceedings of the 34th Annual Meeting of the ACL, pages 79-86, Santa Cruz, June.
On-line document:
8 pp. postscript (149K), PDF (226K)
Slides (black-and-white):
PDF (50K)
Relationship to other papers:
jump to this paper in my research summary

Description of the University of Pennsylvania entry in the MUC-6 competition (1995)

Citation (BibTeX):
B. Baldwin, J.C. Reynar, M. Collins, J. Eisner, A. Ratnaparkhi, J. Rosenzweig, A. Sarkar, and B. Srinivas (1995). Description of the University of Pennsylvania entry in the MUC-6 competition. Proceedings of the Sixth Message Understanding Conference (MUC-6), pages 177-191, Maryland, October.
Summary:
A competitive system for coreference resolution, hacked together in our spare time.
On-line document:
15 pp. postscript (166K)
Relationship to other papers:
jump to this paper in my research summary

`All'-less in Wonderland? Revisiting any (1995)

Abstract:
English any is often treated as two unrelated or semi-related lexemes: a negative-polarity item, NPI any, and a universal quantifier, free-choice (FC) any. The latter is idiosyncratic in that it must appear in the scope of a licenser, but moves to take scope immediately over that licenser at LF. I give a semantic account of FC any as an ``irrealis'' quantifier. This explains some curious (new and old) facts about FC any's semantics and licensing environments. Furthermore, it predicts that negation and other NPI-licensing environments should license FC any, which would then have just the same meaning as NPI any (pace Ladusaw (1979), Carlson (1980)). Thus, we may unify the two any's as a single universal quantifier, as originally proposed by Lasnik (1972) and others. Such an account implies that NPI any moves over negation at LF -- which is confirmed by scope tests. It also explains some well-known problems concerning NPI any in non-downward-entailing environments and under sorry vs. glad.
Citation (BibTeX):
Eisner, Jason (1995). `All'-less in Wonderland? Revisiting any. In Fuller, Janet, Ho Han, and David Parkinson (eds.), Proceedings of ESCOL 11 (October 1994), pages 92-103. Ithaca, NY: DMLL Publications.
On-line document:
12 pp. postscript (96K), PDF (131K)
Relationship to other papers:
jump to this paper in my research summary

A Probabilistic Parser Applied to Software Testing Documents (1992)

Abstract:
We describe an approach to training a statistical parser from a bracketed corpus, and demonstrate its use in a software testing application that translates English specifications into an automated testing language. A grammar is not explicitly specified; the rules and contextual probabilities of occurrence are automatically generated from the corpus. The parser is extremely successful at producing and identifying the correct parse, and nearly deterministic in the number of parses that it produces. To compensate for undertraining, the parser also uses general, linguistic subtheories which aid in guessing some types of novel structures.
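
A rough sketch of the training idea (my own illustration; the tuple-based tree encoding and the toy corpus are invented, not the paper's format): rule counts are read directly off a bracketed corpus and normalized into relative-frequency probabilities.

    # Hypothetical sketch: estimate rule probabilities from a tiny bracketed corpus.
    # Trees are written as nested tuples (label, child, child, ...).
    from collections import Counter, defaultdict

    def count_rules(tree, counts):
        label, children = tree[0], tree[1:]
        if len(children) == 1 and isinstance(children[0], str):
            counts[(label, children[0])] += 1                   # lexical rule, e.g. N -> "suite"
        else:
            counts[(label,) + tuple(c[0] for c in children)] += 1
            for c in children:
                count_rules(c, counts)

    corpus = [
        ("S", ("NP", ("Det", "the"), ("N", "tester")),
              ("VP", ("V", "runs"), ("NP", ("Det", "the"), ("N", "suite")))),
        ("S", ("NP", ("Det", "the"), ("N", "suite")), ("VP", ("V", "fails"))),
    ]
    counts = Counter()
    for tree in corpus:
        count_rules(tree, counts)

    # Relative-frequency estimates P(rhs | lhs).
    lhs_totals = defaultdict(int)
    for rule, c in counts.items():
        lhs_totals[rule[0]] += c
    probs = {rule: c / lhs_totals[rule[0]] for rule, c in counts.items()}
    for rule, p in sorted(probs.items()):
        print(rule, round(p, 2))
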
Citation (BibTeX):
Jones, Mark A., and Jason M. Eisner (1992). A probabilistic parser applied to software testing documents. Proceedings of the National Conference on Artificial Intelligence (AAAI-92), pages 322-328, San Jose.
On-line document:
7 pp. postscript (161K), PDF (206K)
Relationship to other papers:
jump to this paper in my research summary

A Probabilistic Parser and Its Application (1992)

Abstract:
We describe a general approach to the probabilistic parsing of context-free grammars. The method integrates context-sensitive statistical knowledge of various types (e.g., syntactic and semantic) and can be trained incrementally from a bracketed corpus. We introduce a variant of the GHR context-free recognition algorithm, and explain how to adapt it for efficient probabilistic parsing. On a real-world corpus of sentences from software testing documents, with 23 possible parses for a sentence of average length, the system accurately finds the correct parse in 99% of cases, while producing only 1.02 parses per sentence. Significantly, the success rate would be only 66% without the semantic statistics.
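
As a much simpler stand-in for the GHR-based recognizer described (my own sketch; the grammar and probabilities are invented, and plain Viterbi CKY replaces the paper's algorithm and its context-sensitive statistics), the following shows the general shape of probabilistic context-free parsing:

    # Hypothetical stand-in: Viterbi parsing with a binarized PCFG via CKY.
    from collections import defaultdict

    def viterbi_cky(words, lexicon, rules, start="S"):
        """lexicon[(A, word)] and rules[(A, B, C)] hold rule probabilities P(rhs | A)."""
        n = len(words)
        best = defaultdict(float)            # best[(i, j, A)] = max prob of A over words[i:j]
        for i, w in enumerate(words):
            for (A, word), p in lexicon.items():
                if word == w:
                    best[(i, i + 1, A)] = max(best[(i, i + 1, A)], p)
        for width in range(2, n + 1):
            for i in range(n - width + 1):
                j = i + width
                for k in range(i + 1, j):
                    for (A, B, C), p in rules.items():
                        score = p * best[(i, k, B)] * best[(k, j, C)]
                        if score > best[(i, j, A)]:
                            best[(i, j, A)] = score
        return best[(0, n, start)]

    # Tiny toy grammar.
    lexicon = {("Det", "the"): 1.0, ("N", "test"): 0.5, ("N", "suite"): 0.5, ("V", "runs"): 1.0}
    rules = {("NP", "Det", "N"): 1.0, ("VP", "V", "NP"): 1.0, ("S", "NP", "VP"): 1.0}
    print(viterbi_cky("the test runs the suite".split(), lexicon, rules))  # 0.25, the single parse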
Citation (BibTeX):
Jones, Mark A., and Jason M. Eisner (1992). A probabilistic parser and its application. In Carl Weir (ed.), Statistically-Based Natural Language Processing Techniques: Papers from the 1992 Workshop, pages 20-27, Technical Report W-92-01, AAAI Press, Menlo Park.
On-line document:
8 pp. postscript (147K), PDF (220K)
Relationship to other papers:
jump to this paper in my research summary

Indirect STV Election: A Voting System for South Africa (1991)

Abstract:
"Winner take all" electoral systems are not fully representative. Unfortunately, the ANC's proposed system of proportional representation is not much better. Because it ensconces party politics, it is only slightly more representative, and poses a serious threat to accountability.

Many modern students of democracy favor proportional representation through the Single Transferable Vote (STV). In countries with high illiteracy, however, this system may be unworkable.

This paper proposes a practical modification of STV. In the modified system, each citizen votes for only one candidate. Voters need not specify their second, third, and fourth choices. Instead, each candidate specifies his or her second, third, and fourth choices. The modified system is no more difficult for voters than current proposals -- and it provides virtually all the benefits of STV, together with some new ones.
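
A rough sketch of the transfer idea (my own simplification: a single winner, elimination-only count with no quotas or surplus transfers, and invented candidates and vote counts): each voter names one candidate, and an eliminated candidate's votes pass to that candidate's own next choice.

    # Hypothetical, heavily simplified sketch of "indirect" vote transfer.
    from collections import Counter

    def indirect_runoff(ballots, candidate_prefs):
        """ballots: one candidate name per voter.
        candidate_prefs: each candidate's own ranked list of other candidates, used to
        transfer that candidate's votes if he or she is eliminated."""
        tallies = Counter(ballots)
        remaining = set(candidate_prefs)
        while True:
            leader = max(remaining, key=lambda c: tallies[c])
            if tallies[leader] * 2 > sum(tallies[c] for c in remaining) or len(remaining) == 1:
                return leader
            loser = min(remaining, key=lambda c: tallies[c])
            remaining.discard(loser)
            # Transfer the loser's votes to the loser's own highest-ranked remaining choice.
            target = next((c for c in candidate_prefs[loser] if c in remaining), None)
            if target is not None:
                tallies[target] += tallies[loser]
            tallies[loser] = 0

    # Toy election: voters cast one vote each; candidates declared their own transfer lists.
    prefs = {"A": ["B", "C"], "B": ["C", "A"], "C": ["B", "A"]}
    votes = ["A"] * 4 + ["B"] * 3 + ["C"] * 2
    print(indirect_runoff(votes, prefs))   # C is eliminated; its 2 votes transfer to B, so B wins
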

Citation (BibTeX):
Eisner, Jason (1991). Indirect STV election: A voting system for South Africa. White paper, University of Cape Town, June. 16 pp.
On-line document:
16 pp. PDF (54K), Microsoft Word (59K)
Relationship to other papers:
jump to this paper in my research summary

Cognitive Science and the Search for Intelligence (1991)

Abstract:
This talk for a general audience introduces the perspectives and problems of cognitive science. In its latter half, it turns to the philosophical question of defining intelligence, and proposes a non-operational alternative to the Turing Test.
Citation (BibTeX):
Eisner, Jason (1991). Cognitive science and the search for intelligence. Invited paper presented to the Socratic Society, University of Cape Town, South Africa, May.
On-line document:
24 pp. postscript (1017K), PDF (217K), RTF (100K), MSWord (102K)
Relationship to other papers:
jump to this paper in my research summary

Dynamical-Systems Behavior in Recurrent and Non-Recurrent Connectionist Nets (1990)

Abstract:
A broad approach is developed for training dynamical behaviors in connectionist networks. General recurrent networks are powerful computational devices, necessary for difficult tasks like constraint satisfaction and temporal processing. These tasks are discussed here in some detail. From both theoretical and empirical considerations, it is concluded that such tasks are best addressed by recurrent networks that operate continuously in time -- and further, that effective learning rules for these continuous-time networks must be able to prescribe their dynamical properties. A general class of such learning rules is derived and tested on simple problems. Where existing learning algorithms for recurrent and non-recurrent networks only attempt to train a network's position in activation space, the models presented here can also explicitly and successfully prescribe the nature of its movement through activation space.
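
A rough sketch of the kind of continuous-time recurrent dynamics at issue (my own illustration; the leaky-integrator equation, weights, and constants below are stand-ins, not the thesis's models): each unit's activation follows a differential equation, simulated here by Euler integration.

    # Hypothetical sketch of a continuous-time recurrent network (leaky-integrator
    # units), simulated with simple Euler steps.  All values are invented.
    import numpy as np

    def simulate(W, x0, steps=200, dt=0.05, tau=1.0, external=None):
        """dx/dt = (-x + tanh(W @ x + external)) / tau, integrated with Euler steps."""
        x = np.array(x0, dtype=float)
        ext = np.zeros_like(x) if external is None else np.asarray(external, dtype=float)
        trajectory = [x.copy()]
        for _ in range(steps):
            dx = (-x + np.tanh(W @ x + ext)) / tau
            x = x + dt * dx
            trajectory.append(x.copy())
        return np.array(trajectory)

    # Two units with mutual inhibition: from most starting points the state settles
    # into one of two attractors (a simple constraint-satisfaction-like behavior).
    W = np.array([[0.0, -2.0],
                  [-2.0, 0.0]])
    traj = simulate(W, x0=[0.1, -0.05])
    print(traj[-1])   # final state near one of the attractors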
Citation (BibTeX):
Eisner, Jason M (1990). Dynamical-systems behavior in recurrent and non-recurrent connectionist nets. Undergraduate honors thesis, Harvard University, April.
On-line document:
57 pp. postscript (338K), PDF (417K)
Relationship to other papers:
jump to this paper in my research summary

Patents

A Lempel-Ziv Data Compression Technique Utilizing a Dictionary Pre-Filled with Frequent Letter Combinations, Words and/or Phrases (1996)

Abstract:
An adaptive compression technique which is an improvement to Lempel-Ziv (LZ) compression techniques, both as applied for purposes of reducing required storage space and for reducing the transmission time associated with transferring data from point to point. Pre-filled compression dictionaries are utilized to address the problem with prior Lempel-Ziv techniques in which the compression software starts with an empty compression dictionary, whereby little compression is achieved until the dictionary has been filled with sequences common in the data being compressed. In accordance with the invention, the compression dictionary is pre-filled, prior to the beginning of the data compression, with letter sequences, words and/or phrases frequent in the domain from which the data being compressed is drawn. The letter sequences, words, and/or phrases used in the pre-filled compression dictionary may be determined by statistically sampling text data from the same genre of text. Multiple pre-filled dictionaries may be utilized by the compression software at the beginning of the compression process, where the most appropriate dictionary for maximum compression is identified and used to compress the current data. These modifications are made to any of the known Lempel-Ziv compression techniques based on the variants detailed in 1977 and 1978 articles by Ziv and Lempel.
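
As a rough sketch of the pre-filling idea on an LZW-style (LZ78-family) scheme (my own illustration; the seed phrases and sample text are invented, and this is not the patented implementation): the dictionary is seeded before compression begins, so early input can already be coded as longer phrases. Note that greedy LZW matching only reaches a seed phrase if the dictionary also contains the phrase's prefixes, so prefixes are seeded too.

    # Hypothetical sketch: LZW-style compression with a pre-filled dictionary.
    def lzw_compress(text, seed_phrases=()):
        dictionary = {chr(i): i for i in range(256)}            # start with all single characters
        for phrase in seed_phrases:
            for k in range(2, len(phrase) + 1):                 # seed the phrase and all its
                dictionary.setdefault(phrase[:k], len(dictionary))  # prefixes, so matching can reach it
        codes, current = [], ""
        for ch in text:
            if current + ch in dictionary:
                current += ch                                   # keep extending the current match
            else:
                codes.append(dictionary[current])
                dictionary[current + ch] = len(dictionary)      # learn a new sequence, as in plain LZW
                current = ch
        if current:
            codes.append(dictionary[current])
        return codes

    text = "the quick brown fox jumps over the lazy dog. the quick brown fox again."
    plain = lzw_compress(text)
    seeded = lzw_compress(text, seed_phrases=["the ", "quick ", "brown ", "fox "])
    print(len(plain), len(seeded))       # the seeded run emits fewer codes on this text
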
Citation (BibTeX):
Jeffrey C. Reynar, Fred Herz, Jason Eisner, and Lyle Ungar. A Lempel-Ziv data compression technique utilizing a dictionary pre-filled with frequent letter combinations, words and/or phrases. U.S. Patent #5,951,623 issued 9/14/1999, filed 1996.
On-line document:
#5,951,623 (at Patent Storm)
Relationship to other papers:
jump to this paper in my research summary

System for the Automatic Determination of Customized Prices and Promotions (1996)

Citation (BibTeX):
Frederick Herz, Lyle Ungar, and Jason M. Eisner. System for the automatic determination of customized prices and promotions. Patent pending, filed 1996.
Relationship to other papers:
jump to this paper in my research summary

A System for Customized Electronic Identification of Desirable Objects (1995)

Abstract:
This invention relates to customized electronic identification of desirable objects, such as news articles, in an electronic media environment, and in particular to a system that automatically constructs both a "target profile" for each target object in the electronic media based, for example, on the frequency with which each word appears in an article relative to its overall frequency of use in all articles, as well as a "target profile interest summary" for each user, which target profile interest summary describes the user's interest level in various types of target objects. The system then evaluates the target profiles against the users' target profile interest summaries to generate a user-customized rank ordered listing of target objects most likely to be of interest to each user so that the user can select from among these potentially relevant target objects, which were automatically selected by this system from the plethora of target objects that are profiled on the electronic media. Users' target profile interest summaries can be used to efficiently organize the distribution of information in a large scale system consisting of many users interconnected by means of a communication network. Additionally, a cryptographically-based pseudonym proxy server is provided to ensure the privacy of a user's target profile interest summary, by giving the user control over the ability of third parties to access this summary and to identify or contact the user.
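
A rough sketch of the profile-and-rank idea (my own illustration; the TF-IDF-like weighting, cosine similarity, and toy articles are stand-ins, not the patented formulas): each article gets a profile weighting words by within-article frequency relative to overall frequency, a user's interest summary is taken from articles the user liked, and candidate articles are ranked by similarity to that summary.

    # Hypothetical sketch: frequency-based target profiles and interest-based ranking.
    import math
    from collections import Counter

    def profile(doc, corpus_counts, corpus_total):
        """Weight each word by its in-document frequency relative to its corpus frequency."""
        words = doc.lower().split()
        counts = Counter(words)
        return {w: (c / len(words)) / (corpus_counts[w] / corpus_total)
                for w, c in counts.items()}

    def cosine(p, q):
        dot = sum(p[w] * q[w] for w in p if w in q)
        norm = math.sqrt(sum(v * v for v in p.values())) * math.sqrt(sum(v * v for v in q.values()))
        return dot / norm if norm else 0.0

    docs = {
        "markets": "stocks fell as markets reacted to interest rate news",
        "baseball": "the home team won the baseball game in extra innings",
        "banking": "banks raised interest rates and markets were volatile",
    }
    corpus_counts = Counter(w for d in docs.values() for w in d.lower().split())
    corpus_total = sum(corpus_counts.values())
    profiles = {name: profile(d, corpus_counts, corpus_total) for name, d in docs.items()}

    # The user liked the "markets" article; rank the rest by similarity to its profile.
    interest_summary = profiles["markets"]
    ranking = sorted((n for n in profiles if n != "markets"),
                     key=lambda n: cosine(interest_summary, profiles[n]), reverse=True)
    print(ranking)   # "banking" ranks above "baseball"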
Citations (BibTeX):
  • Frederick S. M. Herz, Jason M. Eisner, and Lyle H. Ungar. System for generation of object profiles for a system for customized electronic identification of desirable objects. U.S. Patent #5,835,087 issued 11/10/1998, filed 1995.
  • Frederick S. M. Herz, Jason M. Eisner, Lyle H. Ungar, and Mitchell P. Marcus. System for generation of user profiles for a system for customized electronic identification of desirable objects. U.S. Patent #5,754,939 issued 5/19/1998, filed 1995.
  • Frederick S. M. Herz, Jason Eisner, and Marcos Salganicoff. Pseudonymous server for system for customized electronic identification of desirable objects. U.S. Patent #5,754,938 issued 5/19/1998, filed 1995.
On-line documents:
#5,835,087; #5,754,939; #5,754,938 (at Patent Storm)
Relationship to other papers:
jump to this paper in my research summary

This page online: http://cs.jhu.edu/~jason/papers
Jason Eisner - jason@cs.jhu.edu (tech correspondence welcome). Last modified: 2006-11-08.