Hindi resource notes:

Hindi Language Resources page:
http://www.cs.colostate.edu/~malaiya/hindilinks.html

Webdunia Hindi Portal:
http://www.webdunia.com/

Hindi on the web:
http://www.avashy.com/hindibhasha/weblinks.htm
http://theory.tifr.res.in/bombay/history/people/language/hindi.html

Resources from the Resource Center for Indian Language Technology Solutions (IIT Bombay): http://www.cfilt.iitb.ac.in/

Hindi links via Ted Pedersen: http://www.d.umn.edu/~pura0010/hindi.html

Hindi translation site: http://mason.gmu.edu/~aross2/hindi.htm

Materials from Anoop Sarkar:
http://ldc.upenn.edu/myl/cbnlp-work.tar.gz
http://ldc.upenn.edu/myl/cbnlp_readme
"The tarfile above includes a tagger/supertagger and chunker and also a
PCFG parser all trained on the tiny LTRC treebank. The treebank is also
in the data directory and the tests directory (where it has been
converted to dependency trees, etc.)."
http://ldc.upenn.edu/myl/hindi_chunker_24_03_03.tgz
"T Papi Reddy, who was one of my team members in the workshop held a
couple of years ago in India, has a version of his chunker available for
download from his web page. "


Hindi News Sites:

BBC Hindi news:
http://www.bbc.co.uk/hindi/

Indian newspapers page:
http://www.ipl.org/div/news/browse/IN/
Some Hindi newspapers:
http://www.prabhasakshi.com/
http://www.hindimilap.com/ (in Hindi and Urdu: http://www.milap.com/aboutus.html)
http://www.naidunia.com/
http://www.navabharat.net/
http://www.jagran.com/
http://www.rajasthanpatrika.com/
http://www.bhaskar.com/


Hindi literary magazines:
http://www.udgam.com/
http://www.bharatdarshan.co.nz/


Parallel sources (more or less):
http://sify.com/news_info/news/    http://sify.com/hindi/
http://www.indiatoday.com/itoday/index.html     http://www.indiatodayhindi.com/
http://www.vigyanprasar.com/dream/index.asp

News in English, Hindi, Telugu: http://www.niharonline.com/  http://www.niharonline.com/hindi/news/
English, Hindi, Marathi: http://www.rediff.com/  http://www.rediff.com/hindi/index.html

Literary magazine in English and Hindi: http://www.boloji.com/Default.asp   http://www.boloji.com/hindi/index.html

"Computer news & IT resources" http://ciol.com  http://hindi.ciol.com/main.asp
ZDNet in Hindi: http://www.zdnetindia.com/hindizone/

Indian government sites:
Government Portal: http://indiaimage.nic.in/
Parliament (Rajya Sabha):
  English version: http://rajyasabha.nic.in/
  Hindi version: http://rajyasabha.nic.in/hindisite/hindipage.asp
Constitution:  http://indiacode.nic.in/coiweb/welcome.html

Ministry of Home Affairs: http://rajbhasha.nic.in/
http://rajbhasha.nic.in/dolst_eng.htm http://rajbhasha.nic.in/dolst_hin.htm
(the English welcome page above seems to be mistakenly linked to the Hindi page)

Press Information Bureau: http://pib.nic.in/ http://pib.nic.in/urdu/hindimain.html
http://164.100.24.208/

Gita in Hindi and English: http://www.gitasupersite.iitk.ac.in/

7th World Hindi Conference: http://www.vishwahindisammelan.nic.in/welcome.html

Radio:
http://www.voa.gov/hindi/
http://allindiaradio.org/
http://www.bbc.co.uk/hindi/index.shtml


Hindi learning resources:

Learn to read Hindi (i.e. Devanagari):
http://www.ukindia.com/zhin001.htm
http://www.latrobe.edu.au/indiangallery/devanagari.htm

http://lrs.ed.uiuc.edu/students/avatans/resources.html
http://lrs.ed.uiuc.edu/Students/avatans/project.html
http://philae.sas.upenn.edu/Hindi/hindi.html

Grammatical sketch: http://www.it-c.dk/people/pfw/hindi/index.html

Corpora:
The beta version of the EMILLE corpus, released in March 2003
http://www.emille.lancs.ac.uk/beta.htm
contains 30M words of Hindi text and some parallel text (120K words?).
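
Word counts such as the 30M-word figure depend on what counts as a token. In Hindi text the sentence mark (danda, U+0964, and double danda, U+0965) is often written directly after the last word with no space, so splitting on whitespace alone merges it into words. A minimal sketch, assuming Unicode (not ISCII) input; tokenize and count_words are hypothetical names, and the punctuation set is illustrative, not the tokenization EMILLE used for its counts:

```python
import re

# A danda on its own | a run of non-space, non-punctuation characters | one ASCII mark.
# The ASCII punctuation set is an illustrative assumption.
TOKEN_RE = re.compile(r"[\u0964\u0965]|[^\s\u0964\u0965.,;:!?()'\"]+|[.,;:!?()'\"]")


def tokenize(text):
    """Return word and punctuation tokens from Unicode Hindi text."""
    return TOKEN_RE.findall(text)


def count_words(text):
    """Count tokens containing at least one letter (dandas and punctuation excluded)."""
    return sum(1 for tok in tokenize(text) if any(ch.isalpha() for ch in tok))


print(tokenize("यह किताब है। वह भी।"))
```

Vowel signs and the virama are combining marks rather than letters, so count_words relies on each word also containing at least one consonant or independent vowel.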

AnnCorra documentation

Dictionaries etc.

English-Hindi dictionary at IIIT Hyderabad:
http://sanskrit.gde.to/hindi/hindidictreadme.html
http://www.iiit.net/ltrc/Dictionaries/Dict_Frame.html

The IIIT dictionary in UTF-8 is at http://ldc.upenn.edu/myl/English-Hindi-Dictionary_2.utf8
A tab-delimited version without the English example sentences is here in UTF-8 and here in ISCII.
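
A tab-delimited file like the one above can be loaded into a headword-to-glosses map in a few lines. This is a sketch only. It ASSUMES that column 0 is the English headword and column 1 the Hindi gloss (check the real file before relying on this), and load_dictionary is a hypothetical helper:

```python
import csv
from collections import defaultdict


def load_dictionary(lines):
    """Map each English headword to the list of Hindi glosses found for it.

    ASSUMPTION: column 0 = English headword, column 1 = Hindi gloss;
    any further columns are ignored.
    """
    entries = defaultdict(list)
    # QUOTE_NONE: quote characters in the data are not treated specially.
    for row in csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE):
        if len(row) < 2 or not row[0].strip():
            continue  # skip blank or malformed lines
        entries[row[0].strip()].append(row[1].strip())
    return dict(entries)


# Usage with the UTF-8 file (the filename here is hypothetical):
# with open("eng_hin_dict.tsv", encoding="utf-8", newline="") as f:
#     d = load_dictionary(f)
```

A headword that appears on several lines keeps all of its glosses, in file order.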

Hindi-English dictionaries:
http://www.wordanywhere.com/
http://www.yourdictionary.com/languages/indoiran.html#hindi

Hindi Wordnet at the Resource Center for Indian Language Technology Solutions,
Indian Institute of Technology Bombay:
      http://www.cfilt.iitb.ac.in/
      http://www.cfilt.iitb.ac.in/wordnet/webhwn/

Four English-Hindi domain-specific bilingual term lists:
http://tdil.mit.gov.in/download/menu.htm
(be sure to select SHABDIKA in the pull-down list)

Collation of Mike Schultz's list against the IIIT dictionary is here.

Morph analyzers:
http://www.iiit.net/ltrc/morph/index.htm
http://ccat.sas.upenn.edu/plc/tamilweb/hindi.html
Hindi verb conjugator:
http://www.verbix.com/languages/hindi.shtml

IIIT (LTRC) download site:
http://www.iiit.net/ltrc/downloads.html

Encyclopedia etc.
http://www.tdil.mit.gov.in/terminology.htm


According to http://www.afnlp.org/nlprs2001/WS-LanguageResource/001.pdf,
NHK Science and Technical Research Laboratories has been generating
a parallel corpus in 22 languages including Hindi and English since 1998
by collecting the foreign language versions of Japanese news broadcasts.


According to http://tdil.mit.gov.in/corpora/ach-corpora.htm,
the Central Institute of Indian Languages (http://www.ciil.org/)
has a 3 million word corpus.

English-to-Hindi MT: http://anglahindi.iitk.ac.in/index2.html

Rendering, encodings, fonts etc.:

A good illustration of why Hindi rendering is not trivial:
http://people.redhat.com/otaylor/gtk/guadec2-i18n/slide003.html
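
One way to see the problem is to list the code points of a word in the order they are stored. A short sketch using Python's standard unicodedata module (describe is a hypothetical helper name): in हिन्दी the vowel sign I is stored after HA but drawn to its left, and NA + VIRAMA + DA must be shaped by the renderer into a conjunct or half-form.

```python
import unicodedata


def describe(word):
    """Return (code point, Unicode character name) pairs in stored (logical) order."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch)) for ch in word]


for cp, name in describe("\u0939\u093f\u0928\u094d\u0926\u0940"):  # हिन्दी
    print(cp, name)
```

This prints six lines, from U+0939 DEVANAGARI LETTER HA to U+0940 DEVANAGARI VOWEL SIGN II, for what a reader sees as two or three visual units.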

Website for an Indian-language localization project for Linux:
http://www.indlinux.org/
They have a package for Red Hat Linux 8.0 that supports viewing and typing Hindi text (but not printing or sorting it).

As of 5/16/2003, Qt 3.2 Beta 1 supports complex rendering for Hindi and similar languages:
http://www.trolltech.com/newsroom/announcements/00000127.html
Earlier Qt releases (used, e.g., by the KDE desktop) do not.

Hindi support in Java
http://www.sun.com/developers/gadc/technicalpublications/presentations/iuc22_thai_hindi.pdf

The Inscript keymap (Indian gov't standard) for Hindi:
http://www.indlinux.org/keymap/hindi.php

General discussion of fonts/encodings for Indian languages,
    with pointers to Unicode fonts for Hindi etc.: http://india-n-indian.com/it/wil.html

Information about ISCII:
http://tdil.mit.gov.in/standards.htm
iscii91.pdf

ITRANS:
online interface: http://www.aczone.com/itrans/online/
The ITRANS 7-bit transliteration for Devanagari; this may be the most practical choice for ASCII approximations:
http://www.aczone.com/itrans/#itransencoding
more detailed ITRANS documentation: http://www.aczone.com/itrans/idoc/idoc.html
ITRANS download: http://www.aczone.com/itrans/#download
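
The core ITRANS convention is that a consonant carries an inherent "a" unless a vowel sign (matra) or a virama follows it. Below is a minimal sketch of that rule for Unicode Devanagari input, covering only a subset of the scheme. Where ITRANS allows alternatives, the tables pick one (aa rather than A, ii rather than I, v rather than w). Nukta forms, chandrabindu and dandas pass through unchanged. This is not iscii2itrans.py or the ITRANS package, and to_itrans is a hypothetical name.

```python
# Illustrative subset of ITRANS for Unicode Devanagari; not the full scheme.
VIRAMA = "\u094D"

CONSONANTS = {
    "\u0915": "k", "\u0916": "kh", "\u0917": "g", "\u0918": "gh",
    "\u091A": "ch", "\u091B": "Ch", "\u091C": "j", "\u091D": "jh",
    "\u091F": "T", "\u0920": "Th", "\u0921": "D", "\u0922": "Dh",
    "\u0923": "N", "\u0924": "t", "\u0925": "th", "\u0926": "d",
    "\u0927": "dh", "\u0928": "n", "\u092A": "p", "\u092B": "ph",
    "\u092C": "b", "\u092D": "bh", "\u092E": "m", "\u092F": "y",
    "\u0930": "r", "\u0932": "l", "\u0935": "v", "\u0936": "sh",
    "\u0937": "Sh", "\u0938": "s", "\u0939": "h",
}

# Dependent vowel signs, which replace the inherent "a" of the preceding consonant.
VOWEL_SIGNS = {
    "\u093E": "aa", "\u093F": "i", "\u0940": "ii", "\u0941": "u",
    "\u0942": "uu", "\u0943": "RRi", "\u0947": "e", "\u0948": "ai",
    "\u094B": "o", "\u094C": "au",
}

# Independent vowels, anusvara (M) and visarga (H); anything else passes through.
OTHERS = {
    "\u0905": "a", "\u0906": "aa", "\u0907": "i", "\u0908": "ii",
    "\u0909": "u", "\u090A": "uu", "\u090B": "RRi", "\u090F": "e",
    "\u0910": "ai", "\u0913": "o", "\u0914": "au",
    "\u0902": "M", "\u0903": "H",
}


def to_itrans(text):
    """Transliterate Unicode Devanagari to (a subset of) ITRANS."""
    out = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch in CONSONANTS:
            out.append(CONSONANTS[ch])
            nxt = text[i + 1] if i + 1 < len(text) else ""
            if nxt == VIRAMA:          # dead consonant: no vowel at all
                i += 2
            elif nxt in VOWEL_SIGNS:   # explicit vowel replaces inherent "a"
                out.append(VOWEL_SIGNS[nxt])
                i += 2
            else:                      # inherent vowel
                out.append("a")
                i += 1
        else:
            out.append(OTHERS.get(ch, ch))
            i += 1
    return "".join(out)


print(to_itrans("नमस्ते"))  # namaste
```

Like ITRANS itself, this keeps every inherent vowel (भारत comes out as bhaarata). It does not model the schwa deletion of spoken Hindi.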

iscii2ascii.py: maps ISCII to a quasi-HZ-encoded transliteration.

The CS/CSX 8-bit encoding:
http://www.aczone.com/itrans/icsx/icsx.html

General information and downloads for ITRANS and CS/CSX:
http://www.aczone.com/itrans/

Another approach to typing Hindi (for Windows only, I think):
http://www.aksharamala.com/about/


Tools for conversion among Unicode, ISCII, ITRANS, proprietary fonts:

Notes on converting from ISCII or Unicode into ITRANS
Notes on other usages and how to hack them.

ISCIlib, iconverter:
http://www.cse.iitk.ac.in/users/isciig/
http://www.cse.iitk.ac.in/users/isciig/documents/user_iconverter.txt
"Support is provided for conversions from iscii code space to unicode and vice-versa
for each of the ten Indian languages. A file containing Indian language texts in ISCII codes,
can be converted to its unicode equivalent with the help of the tool - iconverter."
The same project also has a spellchecker, and iscii2ps for printing.

IBM's International Components for Unicode: http://oss.software.ibm.com/icu/
Includes uconv, which seems to convert between ISCII and Unicode.

Conversions from/to various non-standard fonts: http://www.iiit.net/ltrc/FC-1.0/fc.html

Font/encoding converters and other tools from Project Tukaram: http://www.cfilt.iitb.ac.in/resourcepage/index.html

iconv: http://www.gnu.org/software/libiconv/
       http://gettext.sourceforge.net/
   (only some recent versions support ISCII; haven't found the source yet)
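
Whichever converter is used, a quick sanity check on its output is to confirm that the file decodes as strict UTF-8 and that most of its letters and combining marks fall in the Unicode Devanagari block (U+0900 to U+097F). Text that went through the wrong source encoding (for example a proprietary font encoding treated as Latin-1) tends to decode as Latin letters and score low. A minimal sketch; devanagari_ratio is a hypothetical name, and any threshold you apply to its result is your own choice:

```python
import unicodedata

DEVANAGARI = range(0x0900, 0x0980)  # Unicode Devanagari block


def devanagari_ratio(data: bytes) -> float:
    """Return the share of letters/combining marks that are Devanagari.

    Raises UnicodeDecodeError if the bytes are not valid UTF-8.
    """
    text = data.decode("utf-8")  # strict error handling by default
    letters = [
        ch for ch in text
        if ch.isalpha() or unicodedata.category(ch).startswith("M")
    ]
    if not letters:
        return 0.0
    return sum(ord(ch) in DEVANAGARI for ch in letters) / len(letters)


# with open("converted.txt", "rb") as f:   # hypothetical output file
#     print(devanagari_ratio(f.read()))
```

Digits, spaces and punctuation are ignored, so a converted newswire file with English names in Latin script will score somewhat below 1.0 even when the conversion is correct.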


iscii2itrans.py

Webdunia data converter: http://www.webdunia.net/products/data_converter.asp

Resources within LDC:
Hindi newswire:
/mnt/unagi/speechd1/newswires/hindi

Naidunia uses a proprietary encoding; UTF-8 versions of our 2-year archive are here:
/pkg/ldc/newswires/hindi/processed/utf8/*.sgml

/mnt/unagi/speechd16/TIDES/Surprise/HINDI/*
/mnt/unagi/speechd16/TIDES/Surprise/HINDI.txt,v

EMILLE corpus:  /speechd16/TIDES/Surprise/HINDI/EMILLE

IIIT dictionary (cleaned up): /speechd16/TIDES/Surprise/HINDI/IIIT_Dictionary/eng_hin_dict.txt

Speech corpus LDC96S52 CALLFRIEND Hindi

/mnt/talk/Surprise/HINDI/parallel_text/www.rediff.com/hindi-tkn/*.text

TidesSLList mailing list TidesSLList@ldc.upenn.edu
http://www.ldc.upenn.edu/mailman/listinfo.cgi/tidessllist
http://ldc.upenn.edu/Project/SurpriseLanguage

Our tools for hand alignment, translation and entity tagging use Qt.
From the Trolltech news release for Qt 3.2:

> The addition of Indic script input and rendering means that Qt 3.2 now
> supports all major script-based languages, including advanced languages
> such as Hindi and Bengali. Qt 3.2 is also more efficient at font
> rendering.

Other relevant tools:

Survey of tools for Indian languages: http://www.indian-languages.org/

For extracting text from .pdf:
http://www.foolabs.com/xpdf/
http://www.foolabs.com/xpdf/download.html
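
The xpdf package includes pdftotext. Its -enc option selects the output text encoding, and "-" as the output file sends the text to stdout. A minimal wrapper sketch, assuming pdftotext is on the PATH; pdftotext_args and extract_text are hypothetical helper names. Many Hindi PDFs embed fonts with proprietary glyph encodings, so the extracted text may not be Devanagari Unicode even when the command succeeds. Check the output before using it.

```python
import subprocess


def pdftotext_args(pdf_path: str) -> list:
    """Build an xpdf pdftotext command line that writes UTF-8 text to stdout."""
    # "-enc UTF-8" sets the output encoding; "-" as the text file means stdout.
    return ["pdftotext", "-enc", "UTF-8", pdf_path, "-"]


def extract_text(pdf_path: str) -> str:
    """Run pdftotext and return its output; raises CalledProcessError on failure."""
    result = subprocess.run(pdftotext_args(pdf_path), capture_output=True, check=True)
    return result.stdout.decode("utf-8", errors="replace")


# text = extract_text("article.pdf")   # hypothetical input file
```

Decoding with errors="replace" keeps a partly broken extraction readable instead of aborting on the first bad byte.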