Oxford-NINJAL Corpus of Old Japanese


From Bjarke Frellesvig (University of Oxford), Stephen Wright Horn (NINJAL), and Toshinobu Ogiso (NINJAL):

[VHM:  NINJAL = National Institute for Japanese Language and Linguistics]

We are very pleased to announce the first public release of the
Oxford-NINJAL Corpus of Old Japanese (ONCOJ). We will be grateful if you
would circulate and share this information as appropriate.

The corpus is available through this website: http://oncoj.ninjal.ac.jp/

Old Japanese is the earliest attested stage of the Japanese language (principally the 8th century AD), and the surviving texts from the period are mostly poetry. The ONCOJ is an ongoing, long-term collaborative research project between the Research Centre for Japanese Language and Linguistics at the University of Oxford and the National Institute for Japanese Language and Linguistics, Tokyo.

The ONCOJ contains the texts in original script and in phonemic transcription. It is lemmatized and annotated for mode of writing (phonographic or logographic), morphology, constituency, and grammatical function. This release presents the poetic texts from the period, approximately 90,000 words in total.

The corpus is searchable through a suite of online search facilities, and both the full corpus data and individual search results are downloadable for offline use. The data is primarily presented in a Penn Historical-style bracketed tree format, and will soon also be available in a TEI-convertible XML format.
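For readers who download the data, a tree in labeled-bracketing format can be read with a small recursive parser. The sketch below is an illustration only: it assumes a generic Penn-style bracketing (each bracket holds a label followed by children, with words as bare atoms, and an optional unlabeled outer wrapper) and uses an invented toy tree with placeholder words w1 and w2. The labels IP-MAT, NP-SBJ, N and VB are borrowed from Penn Historical conventions and are not confirmed to match ONCOJ's actual tag set or file layout.

```python
import re
from typing import List, Tuple, Union

# A node is (label, children); a child is either a nested node or a word string.
Tree = Tuple[str, List[Union["Tree", str]]]

# Tokens are "(", ")", or any run of characters that is neither a bracket nor whitespace.
TOKEN_RE = re.compile(r"\(|\)|[^\s()]+")


def tokenize(text: str) -> List[str]:
    """Split a bracketed string into '(', ')' and atom tokens."""
    return TOKEN_RE.findall(text)


def parse(text: str) -> Tree:
    """Parse one labeled-bracketing tree such as '( (IP-MAT (N w1)))'."""
    tokens = tokenize(text)
    pos = 0

    def node() -> Tree:
        nonlocal pos
        if tokens[pos] != "(":
            raise ValueError(f"expected '(' at token {pos}")
        pos += 1
        # The first atom after '(' is the label; an outer wrapper bracket
        # may have none, in which case the label is the empty string.
        label = ""
        if tokens[pos] not in ("(", ")"):
            label = tokens[pos]
            pos += 1
        children: List[Union[Tree, str]] = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(node())
            else:
                children.append(tokens[pos])
                pos += 1
        pos += 1  # consume the closing ')'
        return (label, children)

    tree = node()
    if pos != len(tokens):
        raise ValueError("trailing tokens after tree")
    return tree


def leaves(tree: Tree) -> List[str]:
    """Return the words of a tree in left-to-right order."""
    _, children = tree
    out: List[str] = []
    for child in children:
        if isinstance(child, str):
            out.append(child)
        else:
            out.extend(leaves(child))
    return out


if __name__ == "__main__":
    # Hypothetical toy tree, not taken from the corpus.
    toy = parse("( (IP-MAT (NP-SBJ (N w1)) (VB w2)))")
    print(toy[1][0][0])  # label of the first real node: IP-MAT
    print(leaves(toy))   # ['w1', 'w2']
```

Recursion keeps the parser short and mirrors the nesting of the brackets; a real corpus file with thousands of trees would call parse once per tree after splitting the file on top-level brackets.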

I'm curious about the romanization systems they're using and what the expression "original script" means. They may be using it to mean the original hanzi/kanji, as distinct from the romanized transliteration, but I'm not sure.

[Thanks to Jim Breen]


  1. Victor Mair said,

    April 11, 2018 @ 8:19 pm

    From Bob Ramsey:

    "Original script" can only mean Chinese characters, that is, kanji or manyogana, since there wasn't anything else then in Japan. And a few of the poems in the Man'yoshu were actually Chinese poems (written by Japanese).

  2. A-gu said,

    April 11, 2018 @ 9:03 pm

    Can confirm: some of it, probably most, is in Man'yōgana (万葉仮名).

  3. Ross Bender said,

    April 12, 2018 @ 8:16 am

    Your question about romanization systems is apt, since there is as yet no standard system for transliterating Old Japanese. While I suspect that over time the Oxford system might become that standard, usage is still quite varied.

    One can compare the ONCOJ version of Man'yōshū 15:3578 with versions by Alexander Vovin and Mack Horton given in my article "Trends in Western Research on Ancient Japanese Classics" in the Japanese journal Shishu:


    These three versions use different diacritics, which is why it is not easy to reproduce them quickly on this blog. Vovin's is the most complex, since he illustrates most clearly the semantic vs. phonological readings as well as detailed points of grammar.

    For my recently published complete translation of the 62 senmyō in Shoku Nihongi, I used a romanization of the furigana given in the standard edition (Kitagawa Kazuhide 1982).

  4. Ross Bender said,

    April 12, 2018 @ 1:16 pm

    The ONCOJ is a wonderful update from the old Oxford OJ Texts site, with multiple functionalities I am only beginning to discover. However, the new site thus far comprises only the poetic texts.

    But the "List of Words" feature is quite an adventure. I clicked on two spellings for Izumo, the place name (idumo, idumwo). This brings up an entry for the word, with the option of going to the Tree Search Results page.

    This lists the six occurrences of the word in the Man'yoshu, Kojiki, and Nihon Shoki.

    Immediately one can see in context the several different spellings:






