## My illiterate search for the Sicilian animals (1)

My parents tell me that I could read well before my 4th birthday. As a result, I have virtually no experience of what it would be like to be illiterate. It would be easier for me to imagine blindness than complete inability to read. I did have a glimpse of it when I first spent some time in Japan, and was surrounded by an advanced culture using an utterly alien writing system in which I couldn't even read out the names off the signs (as I can in any of the alphabets of Europe). But I had another glimpse this morning when I heard a word on the radio that I couldn't guess how to spell, not even vaguely. Tracking it down was a terrible job. My dictionary was no help, precisely because dictionaries are organized in such a way as to be helpful only to the literate.

The great naturalist Sir David Attenborough, on Radio 4, mentioned a curious-sounding class of animals that he appeared to be calling Sicilians. (Not a class in the technical terminology; technically they are actually a whole separate order of animals.) I listened carefully; it definitely sounded like "Sicilians". But what was this word? These creatures (he made it clear) did not live in Sicily.

I went to the American Heritage Dictionary (I wanted to know the meaning and learn about the animals that had the name, so I used a hard-copy dictionary that includes pictures), and simply examined all the words in the dictionary that had anything like a plausible beginning.

The only letters that can represent the [s] sound are c and s. (Just in case, I checked x and z, which actually represent [z] when initial; I thought perhaps I had misheard the voicing. But I had not; it was a blind alley, and I will ignore it hereafter. People have pointed out to me since I wrote this that the silent p words like psychology reveal another possibility. True. But it turned out not to be relevant.) The vowel letters that could represent [I] after [s] would be i as in city or sit and y as in cyst or system, and just possibly (in an unstressed syllable) e as in Cecilia or serenity. It couldn't be a or o or u, because before those letters a c stands for the stop consonant [k]. And the third sound, the second [s], could be spelled (for all I knew) c or s or ss.

And I came up with zip. Nada. Nichts. Nothing.

There I was, illiterate in English (while holding the position of head of the top Linguistics and English Language department in the U.K. — what a fraud!), fumbling through the dictionary, unable to find a word — solely because dictionaries are organized entirely on the assumption that you know how to spell, at least to an approximation. I found no sign of the word at all, in any of the relevant places. I simply couldn't remember this happening to me before. Most unpleasant. So this is how adult illiteracy feels. Dictionaries become useless.

Now, I do have a fair command of Unix tools, and those, used appropriately, can dramatically reduce your feelings of illiteracy and inadequacy. I knew that what I had to find was in the set of all words that would match the egrep regular expression: "^[cs][iey](c|ss?)". That is, I wanted to see a list of any words beginning with c or s followed by i or e or y followed by either c or else s with perhaps a second s after that. (To allow for ps-, the expression could be modified to "^(ps|[cs])[iey](c|ss?)". Doesn't make any difference.)
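Here is a minimal sketch of what those two expressions accept, run against a tiny made-up sample, not a real dictionary. The sample words and the variable names are my own inventions for illustration. I use grep -E, which is the POSIX spelling of egrep and takes the same patterns.

```shell
# Hypothetical sample words, one per line; any word list would do.
sample='cyclone
secular
sister
cat
city
psychology'

# Pattern from the text: c or s, then i, e, or y, then c, s, or ss.
basic=$(printf '%s\n' "$sample" | grep -E '^[cs][iey](c|ss?)')

# Variant from the text that also allows a silent-p "ps" beginning.
with_ps=$(printf '%s\n' "$sample" | grep -E '^(ps|[cs])[iey](c|ss?)')

printf 'basic:\n%s\n\nwith ps:\n%s\n' "$basic" "$with_ps"
```

In this contrived sample the only difference between the two is that the second also picks up psychology. Neither accepts cat (wrong vowel) or city (no c or s in third position).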

On any Unix system you can use the egrep program to produce an exhaustive list of all of them by searching the standard word list in /usr/share/dict/words (it is /usr/dict/words on some systems). In fact you can make egrep give you a list of all the words that begin the right way, include an l (there had to be one of those in the word), and end in n plus (just possibly) a silent e. So I tried the magic of egrep. I typed this at the prompt in the Terminal program on my Mac OS X laptop:

    egrep "^[cs][iey](c|ss?)[a-z]*l[a-z]*ne?$" /usr/share/dict/words

But the results were disappointing: the four words it comes up with are cyclone, cyclopean, cyclotron, and seclusion. No plausible candidates there. Where the hell was this zoological word that sounded just like Sicilian that I had never heard before and that couldn't be found in an excellent dictionary or in the Unix standard word list?

I turned to a larger word list. There is a 235,000-word list in a file called /usr/share/dict/web2 that is now supplied with many Unix and Linux systems (it is linked to /usr/share/dict/words on some of those), and I tried out this command:

    egrep "^[cs][iey](c|ss?)[a-z]*l[a-z]*ne?$" /usr/share/dict/web2

And I still drew a blank. I found out later that the word is in there, but the above command will not find it. It produces this list of 45 words (which reveals to you why web2 is often less useful than the shorter list — it is way too big, and contains all sorts of learned and scientific junk):

    ciclatoun         cyclene         cyclohexanone  cyclopentadiene  seclusion
    cisalpine         cyclian         cyclohexene    cyclopentane     sectionalization
    cisleithan        cyclization     cycloidean     cyclopentanone   secularization
    cisplatine        cycloalkane     cycloidian     cyclopentene     sesquialteran
    cycadofilicinean  cyclobutane     cyclomyarian   cyclopropane     sesquipedalian
    cyclamen          cyclodiolefin   cyclone        cyclothurine     sicilian
    cyclamin          cycloheptane    cycloolefin    cyclotron        sicilienne
    cyclamine         cycloheptanone  cycloparaffin  secalin          sickleman
    cyclane           cyclohexane     cyclopean      secaline         sysselman
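If you want to repeat this kind of search, the following is a rough sketch and not a record of exactly what I ran. It wraps the full pattern in a small function (find_candidates is a name made up for this sketch) and tries the two word-list locations mentioned earlier, using whichever one it finds first. I am assuming a POSIX shell and grep -E, which accepts the same patterns as egrep.

```shell
# Pattern from the text: the right beginning, an l somewhere later,
# and a final n with an optional silent e.
pattern='^[cs][iey](c|ss?)[a-z]*l[a-z]*ne?$'

# find_candidates FILE: print the words in FILE that match the pattern.
# (The function name is invented for this sketch.)
find_candidates() {
    grep -E "$pattern" "$1"
}

# Use whichever of the two word-list locations exists on this system.
for list in /usr/share/dict/words /usr/dict/words; do
    if [ -r "$list" ]; then
        # grep exits non-zero when nothing matches, so report that case.
        find_candidates "$list" || echo "no candidates in $list"
        break
    fi
done
```

The function takes any file with one word per line, so a larger list such as web2 can be passed to it in the same way. What comes out depends entirely on the word list your system ships.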

Well, it turned out that I was missing something. Eventually I did track the word down, by a less orthodox technique, not based on the alphabet. And then I knew the crucial orthographic fact about English that I had been overlooking. It didn't seem so arcane once I reached that point. But until then it was completely opaque to me.

Let me explain…

No; on second thoughts, I don't have a lot more time right now, and it will be fun for you to try to figure it out. I will tell you tomorrow, and I'll explain how I tracked down the word.

