- Website: http://www.stanford.edu/~cgpotts/
Posts by Chris Potts:
Back in 2009, Heidi Harley and I wrote a few inter-related posts looking at the linguistics job market and how it compares with the distribution of new PhD theses. Since then, people have occasionally written to me (and probably Heidi too) about updating the posts with new numbers. I've been reluctant to do that because I've always been worried that my data-gathering methods (scraping Linguist List and ProQuest) were problematic. (It would be great if Linguist List released analyses based on its internal database of job ads!)
I'm pleased to report that Stephanie Shih and Rebecca Starr have done the work for me, and they did a careful job indeed. Here's the summary picture; below the fold, I've included their notes on where the data came from. Comments are open so that people can add their own analyses.
Thanks Stephanie and Rebecca!
Posting on behalf of Phil Resnik:
This post brings together a bunch of news about language-related efforts to help out in Haiti:
Via John Gruber at Daring Fireball, I've learned that a company called Sarcasm, Inc., is marketing a "Sarcasm punctuation mark" called SarcMark, which people are supposed to use to "emphasize a sarcastic phrase, sentence or message". John Gruber's pitch-perfect assessment:
What a great idea. I'm sure it'll be a huge hit.
In the current (Jan 4, 2010) issue of The New Yorker, Rebecca Mead has a comment that is partly about what to call the previous decade:
Arguably, a grudging agreement has been reached on calling the decade "the aughts," but that unfortunate term is rooted in a linguistic error. The use of "aught" to mean "nothing," "zero," or "cipher" is a nineteenth-century corruption of the word "naught," which actually does mean nothing, and which, as in the phrase "all for naught," is still in current usage. Meanwhile, the adoption of "the aughts" as the decade’s name only accelerates the almost complete obsolescence of the actual English word "aught," a concise and poetic near-synonym for "anything" that has for centuries well served writers, including Shakespeare ("I never gave you aught," Hamlet says to Ophelia, in an especially ungenerous moment, before she goes off and drowns) and Milton ("To do aught good never will be our task / But ever to do ill our sole delight," Satan declares near the beginning of "Paradise Lost," before slinking up to tempt Eve).
I don't know whether Mead is right that we've settled on the aughts, and I won't comment directly on whether anyone has committed any errors. Instead, I'll try to explicate why a word like aught might take on senses close to that of nothing.
Jason Kottke links to a "Grammar Challenge" devised by David Foster Wallace and posted by a student of Wallace's, Amy McDaniel. What's noteworthy is that Kottke reports getting 0/10. Kottke is a thoughtful, creative English prose stylist, and Wallace thought that these questions were basic ones that should be taught in any undergraduate class. Kottke seems to think the problem lies with him. I take a different view: this test is useless. Just imagine a chemistry quiz that accomplished working chemists could not pass. What would you make of such a quiz? I myself would question its author's competence at devising chemistry quizzes.
The NLTK book, Natural Language Processing with Python, went on sale yesterday:
"This book is here to help you get your job done." I love that line (from the preface). It captures the spirit of the book. Right from the start, readers/users get to do advanced things with large corpora, including information-rich visualizations and sophisticated theory implementation. If you've started to see that your research would benefit from some computational power, but you have limited (or no) programming experience, don't despair — install NLTK and its data sets (it's a snap), then work through this book.
Early this year, Heidi Harley and I posted a few times about the job market in linguistics. I got the ball rolling with a post about the Linguist List's 2008 job ads. Heidi followed up with a comparison between the job-ad numbers and the indices at ProQuest, and then I put those numbers together.
The combined post told a dispiriting story about the theoretical areas: it looked like there was a serious mismatch between the number of PhDs and the number of jobs for them. I think now, though, that this was based on an unfair comparison. John McCarthy pointed out to me that the ProQuest counts we reported are not relativized to dissertations per se, but rather come from a more general search of ProQuest's databases.
If the ProQuest search term restricts always to (i) dissertations, (ii) the Linguistics subject area, and (iii) the last five years, then one gets much smaller numbers throughout, and the hits themselves seem generally reasonable. Here's a graph comparable to my earlier one, with the same job-ad numbers but the more restrictive ProQuest numbers.
The following is a guest post by Jason Merchant.
Thought the LangLog would like to hear this week's update on the Supreme Court case involving adverbial modification argued in February: all nine justices agree with the linguists! The decision is posted, but briefly, the money quote is:
"In ordinary English, where a transitive verb has an object, listeners in most contexts assume that an adverb (such as knowingly) that modifies the transitive verb tells the listener how the subject performed the entire action, including the object as set forth in the sentence."
It is so ordered…
I can find no better description of Amazon's Mechanical Turk than in the "description" tag at the site itself:
The online market place for work. We give businesses and developers access to an on-demand scalable workforce. Workers can work at home and make money by choosing from thousands of tasks and jobs.
This is followed by a "keywords" meta tag:
make money, make money at home, make money from home, make money on the internet, make extra money, make money …
This makes the site sound a bit like the next stop on Dave Chappelle's tour of his imagined Internet as physical place, and indeed it does have its seamy side. But I come to defend Mechanical Turk as a useful tool for linguistic research: a quick and inexpensive way to gather data and conduct simple experiments.
The following is a guest post by Jason Merchant.
The Supreme Court is scheduled today (25 Feb 2009) to hear arguments (Flores-Figueroa v. U.S., No. 08-108) to decide whether Ignacio Flores-Figueroa should have his conviction for aggravated identity theft reversed. The debate centers on the interpretation of a statute, 18 U.S.C. sec. 1028A(a)(1), which states that:
"Whoever … knowingly transfers, possesses, or uses, without lawful authority, a means of identification of another person shall … be sentenced to a term of imprisonment of 2 years."
Honestly curious here: are numbers of applicants for particular jobs a matter of public record (at least, at public institutions)? It would be good to contrast the numbers above with some numbers that show how many folks are actually competing for individual jobs.
In the February 9, 2009, broadcast of The Daily Show, Jon Stewart presents a well-argued Optimality Theory analysis of part of Bill O'Reilly's journalistic standards. Stewart and his research team do a good job of gathering and presenting empirical support for a theory involving ranked, violable constraints. Here's a screenshot that links to the full episode:
The opening of John McPhee's article on fact-checking in the current New Yorker (Checkpoints, Feb 9 & 16, 2009) suggests that checking the facts means checking each word for its factuality. Quoting a legendary fact-checker there, he writes:
Each word in the piece that has even a shred of fact clinging to it is scrutinized, and, if passed, given the checker's imprimatur, which consists of a tiny pencil tick.
This is revealed later on to be a metaphor and/or a record-keeping device; I think all involved know that literally checking at the word-level would be mostly pretty vacuous, and would miss a lot of assertions. My favorite non-word-level anecdote in the article:
Penn's daughter Margaret fished in the Delaware, and wrote home to a brother asking him to "buy for me a four joynted, strong fishing Rod and Real with strong good Lines …"
The problem was not with the rod or the real but with William Penn's offspring. Should there be commas around Margaret or no commas around Margaret? The presence or absence of commas would, in effect, say whether Penn had one daughter or more than one. The commas—there or missing there—were not just commas; they were facts, neither more nor less factual than the kegs of Bud or the colors of Santa's suit.