PangramTweets


The Twitter API, beyond its great utility for corpus linguistics (see "On the front lines of Twitter linguistics," "The he's and she's of Twitter"), has made possible a lot of fun automated text-mining projects. One fertile area is algorithmic found poetry: there have been Twitter bots designed to find accidental haikus, and even more impressively, a bot named @Pentametron that finds rhyming tweets in iambic pentameter and fashions sonnets out of them.

And then there is found wordplay, which is its own kind of found poetry. I'm a big fan of @Anagramatron, which discovers paired tweets that form serendipitous anagrams of each other. (Example: "Last time I do anything" ⇔ "That's it. I'm dying alone.") Now, courtesy of Jesse Sheidlower, comes @PangramTweets, in which each tweet contains every letter of the alphabet at least once.

Jesse explains the project on his site:

PangramTweets is a bot (a computer program that runs on its own) that searches Twitter for, and then retweets, pangrams—texts that contain every letter of the alphabet. A famous pangram, sometimes used as a typing test, is “The quick brown fox jumps over the lazy dog.” […]

You may find the results interesting, or dull. I make no judgment on this. The bot is entirely automated; I do not curate the results.

I strip out user names and URLs from the results, but hashtags are included. I also do some very basic filtering to try to ensure that the results are in English, and not in another language or complete gibberish (random letters), though earlier versions of the bot did retweet nonsense or foreign-language pangrams.
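Jesse doesn't publish his code here, so the following is only a minimal Python sketch of the two steps he describes: stripping user names and URLs while keeping hashtags, then testing for all 26 letters. The regular expressions and the function names `clean_tweet`, `missing_letters`, and `is_pangram` are hypothetical. The sketch also assumes the pangram test runs after stripping, so letters hiding in a handle or a link don't count toward the alphabet.

```python
import re
import string
from typing import Set

# Illustrative patterns only: the post says user names and URLs are
# stripped but not how, so these regexes are assumptions.
MENTION_RE = re.compile(r"@\w+")
URL_RE = re.compile(r"https?://\S+")

ALPHABET = frozenset(string.ascii_lowercase)


def clean_tweet(text: str) -> str:
    """Drop URLs and @user names, keep #hashtags, tidy the whitespace."""
    text = URL_RE.sub("", text)
    text = MENTION_RE.sub("", text)
    return " ".join(text.split())


def missing_letters(text: str) -> Set[str]:
    """Letters a-z that never occur in text (case-insensitive)."""
    return set(ALPHABET - set(text.lower()))


def is_pangram(text: str) -> bool:
    """True if the cleaned text contains every letter a-z at least once."""
    return not missing_letters(clean_tweet(text))


print(is_pangram("@someone The quick brown fox jumps over the lazy dog"))  # True
print(sorted(missing_letters("Pack my box with five dozen liquors")))    # ['g', 'j']
```

`missing_letters` is handy for near misses: a tweet that lacks only one or two letters shows exactly which ones.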

The bot originally did not filter out known pangrams of the "quick brown fox" variety, but by popular demand Jesse put a filter in place for that as well. The results are not as rich as Anagramatron's, but that's to be expected given the constraints: Jesse says he gets "one real pangram in every few million tweets scanned." Here's a sampling of what has turned up so far.


It will be interesting to see if the bot turns up a naturally occurring "pangrammatic window" that beats the current record-holder of 42 letters, from Piers Anthony's Cube Route:

"We are all from Xanth," Cube said quickly. "Just visiting Phaze. We just want to find the dragon."

Sean Irvine announced the discovery of this pangrammatic window in Word Ways in 2012. It beat out Eric Chaikin's 47-letter find, which he discovered by Googling for "Joaquin Phoenix":

"JoBlo's movie review of The Yards: Mark Wahlberg, Joaquin Phoenix, Charlize Theron…"
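A pangrammatic window is just the shortest stretch of running text that contains all 26 letters, which makes finding one a textbook sliding-window problem. Here's a minimal Python sketch. The function name is my own, and it assumes the window is measured in letters only, ignoring spaces and punctuation, in line with the letter counts cited for the records.

```python
import string
from collections import Counter
from typing import Optional, Tuple

ALPHABET = frozenset(string.ascii_lowercase)


def pangrammatic_window(text: str) -> Optional[str]:
    """Shortest contiguous run of letters in text that covers a-z, or None.

    Spaces and punctuation are dropped first, so the window is measured
    in letters.
    """
    letters = [c for c in text.lower() if c in ALPHABET]
    counts = Counter()
    covered = 0                      # distinct letters inside the window
    best: Optional[Tuple[int, int]] = None
    left = 0
    for right, ch in enumerate(letters):
        counts[ch] += 1
        if counts[ch] == 1:
            covered += 1
        # While the window still holds all 26 letters, record it and
        # try to shorten it from the left.
        while covered == len(ALPHABET):
            if best is None or right - left < best[1] - best[0]:
                best = (left, right)
            dropped = letters[left]
            counts[dropped] -= 1
            if counts[dropped] == 0:
                covered -= 1
            left += 1
    if best is None:
        return None
    start, end = best
    return "".join(letters[start:end + 1])


window = pangrammatic_window("The quick brown fox jumps over the lazy dog")
print(window, len(window))  # quickbrownfoxjumpsoverthelazydog 32
```

Each letter enters and leaves the window at most once, so even a long passage is scanned in a single linear pass; the window's length is simply `len()` of the result.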

Of course, determining if a pangram is "naturally occurring" may be difficult, since it's always possible to game the system! But with half a billion tweeters tweeting, maybe someday one of them will authentically produce a winner like "Mr. Jock, TV quiz PhD, bags few lynx."
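That gaming problem is also why a filter for stock pangrams can only go so far. A minimal Python sketch of such a filter, assuming a hand-made blocklist (the signatures and function names are hypothetical, not Jesse's): squash each tweet down to bare letters, then look for a known pangram inside it. Besides "quick brown fox", the list includes two other famous pangrams, "Mr. Jock, TV quiz PhD" and "Pack my box with five dozen liquor jugs."

```python
import re

# Hypothetical blocklist of letter-only "signatures" for stock pangrams.
STOCK_SIGNATURES = ("quickbrownfox", "packmybox", "mrjocktvquizphd")


def letters_only(text: str) -> str:
    """Lowercase the text and keep only the letters a-z."""
    return re.sub(r"[^a-z]", "", text.lower())


def looks_stock(text: str) -> bool:
    """True if the squashed text contains a known pangram's signature."""
    squashed = letters_only(text)
    return any(sig in squashed for sig in STOCK_SIGNATURES)


print(looks_stock("The QUICK brown-fox jumped over the lazy dogs!"))  # True
print(looks_stock("We just want to find the dragon."))                # False
```

Squashing to letters catches respaced and repunctuated variants, but a planted pangram that nobody has added to the list sails straight through.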

Update: Jesse is attempting to filter out non-English tweets, but Indonesian tweets keep seeping through. Since I've done research on colloquial varieties of Indonesian, I find these tweets fascinating. I was initially surprised that the Indonesian Twittersphere would be generating pangrams, considering that the letters Q, V, X, and Z appear only in loanwords. But Indonesian participants on Twitter are using quite a lot of Anglicisms, along with a plethora of txtspk-style abbreviations of Indonesian words. An example that just popped up:

The loanwords here are EXCITED, JOIN, and LITTLEQUIZ, and 1D refers to the band One Direction. Here's a key to the abbreviation-heavy Indonesian items:

BGT = banget 'very'
GRGR = gara-gara 'just because'
MW = mau 'will'
K = ke '(come) to'
INDO = Indonesia
LBH = lebih 'more'
LG = lagi '(even) more'
KLO = kalau 'if'
DAN = dan 'and'
BCA = baca 'read'
JG = juga 'also'
PASTI = pasti 'definitely'
LO = (e)lo 'you'
MKIN = makin 'more and more'
CEK = cek 'check'
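The key works as a plain lookup table. A minimal Python sketch of applying it word by word follows; the `KEY` dict and `expand` function are illustrative, and real tweets would need smarter tokenizing than splitting on spaces.

```python
# The abbreviation key from the post as a lookup table:
# abbreviation -> (full Indonesian form, English gloss).
KEY = {
    "BGT": ("banget", "very"),
    "GRGR": ("gara-gara", "just because"),
    "MW": ("mau", "will"),
    "K": ("ke", "(come) to"),
    "INDO": ("Indonesia", "Indonesia"),
    "LBH": ("lebih", "more"),
    "LG": ("lagi", "(even) more"),
    "KLO": ("kalau", "if"),
    "DAN": ("dan", "and"),
    "BCA": ("baca", "read"),
    "JG": ("juga", "also"),
    "PASTI": ("pasti", "definitely"),
    "LO": ("(e)lo", "you"),
    "MKIN": ("makin", "more and more"),
    "CEK": ("cek", "check"),
}


def expand(text: str, to_english: bool = False) -> str:
    """Swap each abbreviation in the key for its full form (or gloss).

    Tokens are split on whitespace and matched case-insensitively; anything
    not in the key, including words with punctuation attached, is left as is.
    """
    words = []
    for token in text.split():
        entry = KEY.get(token.upper())
        if entry is None:
            words.append(token)
        else:
            words.append(entry[1] if to_english else entry[0])
    return " ".join(words)


print(expand("MW K INDO"))  # mau ke Indonesia
```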

So that would work out to: "@PutriAZSYA Very excited just because One Direction is coming to Indonesia. You'll be even more excited if you join LittleQuiz @1D_CrazyLovers, and also read FFNY. You'll definitely get more and more excited. Check Fav6."



8 Comments

  1. AJD said,

    May 19, 2014 @ 5:28 pm

    …So am I doing something wrong if I see "quick brown fox" sentences showing up at https://twitter.com/PangramTweets anyway?

  2. leoboiko said,

    May 19, 2014 @ 5:31 pm

    It would be interesting to build pangram bots for more languages/character sets… Also a Japanese one to make up poems by choosing 5-7-5;7-7 moræ looks easy (Chinese poetry, too).

  3. Jesse Sheidlower said,

    May 19, 2014 @ 5:46 pm

    @AJD: No, you're not doing anything wrong. I didn't remove the "quick brown fox" examples already out there, but there won't be any new ones coming.

    Similarly there won't be any pure gibberish, or Indonesian tweets, which you'll also see in the earliest results.

  4. Faldone said,

    May 19, 2014 @ 7:05 pm

    How long was the pangram from Ella Minnow Pea?

  5. Erik said,

    May 19, 2014 @ 7:14 pm

    The one in Ella Minnow Pea was "Pack my box with five dozen liquor jugs," so 32 letters. But that's only "naturally occurring" in the fictional universe of the novel.

  6. David said,

    May 19, 2014 @ 8:56 pm

    For any folks interested in creating their own corpus using the Twitter API, I've posted some free code here to help:
    http://www.thegrammarlab.com/?portfolio=tweet-corpus
    Happy corpus building!

  7. Tristan Miller said,

    May 20, 2014 @ 2:17 am

    Ben, I'm not sure if the Word Ways archive at thefreelibrary.com is authorized. (I wrote to the Word Ways editor about this a while back but didn't get any response.) The only official Word Ways archive I'm aware of is the one at Butler University Digital Commons. The most recent couple years' issues are available only to Word Ways subscribers (and of course, anyone interested in pangrammatic windows and other linguistic oddities should subscribe!). But the rest of the archive, going back to 1967, is free to read online.

    The Butler archive has the practical advantage of offering scanned (but fully OCR'd) PDFs of the print magazine, so you don't end up losing text formatting, IPA symbols, and diagrams the way you do with the HTML rips at thefreelibrary.com. The two articles you cite in your blog post can be found at http://digitalcommons.butler.edu/wordways/vol39/iss2/3/ and http://digitalcommons.butler.edu/wordways/vol45/iss4/21/.

  8. cmyr said,

    May 20, 2014 @ 9:34 pm

    leoboiko: take a look at @haiku9000 (https://twitter.com/haiku9000) for a bot assembling haiku out of tweets.

    disclaimer: I wrote both @haiku9000 and @anagramatron. Fun to see this stuff on LL! :)
