Corpus-based Linguistic Research: From Phonetics to Pragmatics

Mark Liberman – University of Pennsylvania
Time: Monday/Wednesday 1:30-3:20 pm
Place: Angell Hall, Auditorium C

Course structure and assignments

There will be seven lectures, each with associated readings, all online and linked from the course syllabus (below). Class participants will each develop an individual research project, or perhaps a plan for a project, which they will document in a final report, due in their CTools dropbox by the end of the day on Monday 7/22. At the end of each of the three preceding weeks, participants will submit one step in a series leading to that goal: see the first set of lecture notes ("Introduction") for details.

Unfortunately, a prior commitment far away requires me to leave Ann Arbor on Tuesday, 7/16. Though traveling will limit my time, I will try to answer email inquiries as promptly as possible during the final week before your reports are due on 7/22, and I'll send feedback on your work between 7/23 and 8/1, when grades are due.

We'll be using Piazza for class discussion, so that you can get help from your classmates as well as from the instructor, and so that others can benefit from your questions and answers. The class Piazza page is https://piazza.com/lsa_linguistic_institute/summer2013/li514/home

Open office hours (for group discussions) will be:

TUESDAY 10:00-13:00, Mason 2455
FRIDAY 10:00-13:00, Mason 2455


NOTE: There will be interactive demos in Mason Hall 2325 and 2333 from 10:00 am to 5:00 pm on Friday 7/12 (including help downloading and installing relevant software).


Lecture Notes and Readings

The lecture notes and readings will evolve as the course goes forward -- please check this page before each lecture for updates.

Date Lecture Notes Readings

Introduction to the course --
Introduction to the topic

"Obituary: Fred Jelinek", Computational Linguistics 2010
"Lessons for Responsible Science from DARPA's Programs in Human Language Technology", NAS Committee Presentation, 2012
6/26 Relations between theories and data

"Norvig channels Shannon contra Chomsky", LLOG 5/31/2011
"Straw men and bee science", LLOG 6/4/2011
Tom Wasow and Jennifer Arnold, "Intuitions in linguistic argumentation", Lingua 2005.
Mark Liberman & Janet Pierrehumbert, "Intonational Invariance under Changes in Pitch Range and Length", 1984
Jiahong Yuan & Mark Liberman, "F0 Declination in English and Mandarin Broadcast News Speech", InterSpeech 2010
Esther Grabe, Greg Kochanski & John Coleman, "The Intonation of Native Accent Varieties in the British Isles", 2005 -- and later...
Ted Underwood & Jordan Sellers, "The Emergence of Literary Diction", Journal of Digital Humanities 2012
Jiahong Yuan & Mark Liberman, "Investigating /l/ Variation in English through Forced Alignment", InterSpeech 2009

7/1 Reproducible Research

"Reproducible Science at AAAS 2011", 2/18/2011
Victoria Stodden, "The Digitization of Science: Reproducibility and Interdisciplinary Knowledge Transfer", A symposium at the AAAS Annual Meeting, February 19, 2011
Patrick Vandewalle, Jelena Kovacevic, and Martin Vetterli, "Reproducible Research Links" (web site)
Tom Bartlett, "Power of Suggestion", The Chronicle Review 1/30/2013
Steve Abney et al., "Procedure for quantitatively comparing the syntactic coverage of English grammars", HLT 1991
Dan Bikel, "Intricacies of Collins' Parsing Model", Computational Linguistics 2004

7/3 Finding or creating resources  
7/8 Technical foundations:
What do you need to know?
How can you learn it?
Steven Bird et al., Natural Language Processing with Python, 2009 (Chaps. 0-3)
James Holland, "Why Use R?", monkey's uncle 7/25/2009
"Octave Programming Tutorial" (especially the suggested roadmap for beginners)
Ted Underwood, "Where to start with text mining", 8/14/2012
7/10 "Machine learning":
problems, algorithms, applications
Lance Ramshaw and Mitch Marcus, "Text Chunking using Transformation-Based Learning", 1995
Fei Sha and Fernando Pereira, "Shallow Parsing with Conditional Random Fields", HLT-NAACL 2003
Eric Fosler-Lussier, "Markov Models and Hidden Markov Models: A Brief Tutorial", ICSI 1998.
Geoffrey Hinton, "Tutorial on Deep Belief Nets"
Neville Ryant, Jiahong Yuan, and Mark Liberman, "Automating phonetic measurement: The case of voice onset time", ICA 2013
Neville Ryant et al., "Speech Activity Detection on YouTube Using Deep Neural Networks", InterSpeech 2013
Rushin Shah et al., "A New Approach to Lexical Disambiguation of Arabic Text", EMNLP 2010.
Slav Petrov et al., "Learning Accurate, Compact, and Interpretable Tree Annotation", COLING-ACL 2006
Slav Petrov and Dan Klein, "Improved Inference for Unlexicalized Parsing", HLT-NAACL 2007
7/15 Programs and programming:
practical methods for searching,
classifying, counting, and measuring
7/17 Summary & prospects
Ted Underwood, "Against (talking about) 'big data'", 5/10/2013