Catalog Number | Corpus Name |
LDC96S49 | CALLFRIEND Egyptian Arabic |
LDC97S45 | CALLHOME Egyptian Arabic Speech |
LDC97T19 | CALLHOME Egyptian Arabic Transcripts |
LDC99L22 | Egyptian Colloquial Arabic Lexicon |
LDC2001T55 | Arabic Newswire Part 1 |
LDC2002L49 | Buckwalter Arabic Morphological Analyzer Version 1.0 |
LDC2002S02 | West Point Arabic Speech |
LDC2002S22 | 1997 HUB5 Arabic Evaluation |
LDC2002S37 | CALLHOME Egyptian Arabic Speech Supplement |
LDC2002T38 | CALLHOME Egyptian Arabic Transcripts Supplement |
LDC2002T39 | 1997 HUB5 Arabic Transcripts |
LDC2003T06 | Arabic Treebank: Part 1 v 2.0 |
LDC2003T07 | Arabic Treebank: Part 1 - 10K-word English Translation |
LDC2003T12 | Arabic Gigaword |
LDC2003T18 | Multiple-Translation Arabic (MTA) Part 1 |
LDC2004L02 | Buckwalter Arabic Morphological Analyzer Version 2.0 |
LDC2004T02 | Arabic Treebank: Part 2 v 2.0 |
LDC2004T09 | TIDES Extraction (ACE) 2003 Multilingual Training Data |
LDC2004T11 | Arabic Treebank: Part 3 v 1.0 |
LDC2004T17 | Arabic News Translation Text Part 1 |
LDC2004T18 | Arabic English Parallel News Part 1 |
LDC2004T23 | Prague Arabic Dependency Treebank 1.0 |
LDC2005S07 | Arabic CTS Levantine Fisher Training Data Set 3, Speech |
LDC2005S08 | BBN/AUB DARPA Babylon Levantine Arabic Speech and Transcripts |
LDC2005S11 | TDT4 Multilingual Broadcast News Speech Corpus |
LDC2005S14 | Levantine Arabic QT Training Data Set 4 (Speech + Transcripts) |
LDC2005S26 | CSLU: 22 Languages Corpus |
LDC2005T02 | Arabic Treebank: Part 1 v 3.0 (POS with full vocalization + syntactic analysis) |
LDC2005T03 | Arabic CTS Levantine Fisher Training Data Set 3, Transcripts |
LDC2005T05 | Multiple-Translation Arabic (MTA) Part 2 |
LDC2005T09 | ACE 2004 Multilingual Training Corpus |
LDC2005T16 | TDT4 Multilingual Text and Annotations |
LDC2005T20 | Arabic Treebank: Part 3 (full corpus) v 2.0 (MPG + Syntactic Analysis) |
LDC2005T30 | Arabic Treebank: Part 4 v 1.0 (MPG Annotation) |
LDC2006S29 | Levantine Arabic QT Training Data Set 5, Speech |
LDC2006S31 | 2003 NIST Language Recognition Evaluation |
LDC2006S43 | Gulf Arabic Conversational Telephone Speech |
LDC2006S45 | Iraqi Arabic Conversational Telephone Speech |
LDC2006S46 | Arabic Broadcast News Speech |
LDC2006T02 | Arabic Gigaword Second Edition |
LDC2006T06 | ACE 2005 Multilingual Training Corpus |
LDC2006T07 | Levantine Arabic QT Training Data Set 5, Transcripts |
LDC2006T10 | English-Arabic Treebank v 1.0 |
LDC2006T15 | Gulf Arabic Conversational Telephone Speech, Transcripts |
LDC2006T16 | Iraqi Arabic Conversational Telephone Speech, Transcripts |
LDC2006T18 | TDT5 Multilingual Text |
LDC2006T19 | TDT5 Topics and Annotations |
LDC2006T20 | Arabic Broadcast News Transcripts |
LDC2007S01 | Levantine Arabic Conversational Telephone Speech |
LDC2007S02 | Fisher Levantine Arabic Conversational Telephone Speech |
LDC2007S10 | 2003 NIST Rich Transcription Evaluation Data |
LDC2007T01 | Levantine Arabic Conversational Telephone Speech, Transcripts |
LDC2007T04 | Fisher Levantine Arabic Conversational Telephone Speech, Transcripts |
LDC2007T08 | ISI Arabic-English Automatically Extracted Parallel Text |
LDC2007T20 | GALE Phase 1 Distillation Training |
LDC2007T24 | GALE Phase 1 Arabic Broadcast News Parallel Text - Part 1 |
LDC2007T40 | Arabic Gigaword Third Edition |
LDC2007V01 | TRECVID 2005 Keyframes & Transcripts |
LDC2008T02 | GALE Phase 1 Arabic Blog Parallel Text |
LDC2008T09 | GALE Phase 1 Arabic Broadcast News Parallel Text - Part 2 |
LDC2009S04 | 2007 NIST Language Recognition Evaluation Test Set |
LDC2009S05 | 2007 NIST Language Recognition Evaluation Supplemental Training Set |
LDC2009T03 | GALE Phase 1 Arabic Newsgroup Parallel Text - Part 1 |
LDC2009T05 | 2008 NIST Metrics for Machine Translation (MetricsMATR08) Development Data |
LDC2009T07 | Unified Linguistic Annotation Text Collection |
LDC2009T09 | GALE Phase 1 Arabic Newsgroup Parallel Text - Part 2 |
LDC2009T10 | Language Understanding Annotation Corpus |
LDC2009T11 | REFLEX Entity Translation Training/DevTest |
LDC2009T22 | Arabic Newswire English Translation Collection |
LDC2009T24 | OntoNotes Release 3.0 |
LDC2009T30 | Arabic Gigaword Fourth Edition |
LDC2010L01 | LDC Standard Arabic Morphological Analyzer (SAMA) Version 3.1 |
LDC2010S07 | Asian Spoken Language Sampler |
LDC2010T01 | NIST Open MT 2008 Evaluation (MT08) Selected References and System Translations |
LDC2010T08 | Arabic Treebank: Part 3 v 3.2 |
LDC2010T10 | NIST 2002 Open Machine Translation (OpenMT) Evaluation |
LDC2010T11 | NIST 2003 Open Machine Translation (OpenMT) Evaluation |
LDC2010T12 | NIST 2004 Open Machine Translation (OpenMT) Evaluation |
LDC2010T13 | Arabic Treebank: Part 1 v 4.1 |
LDC2010T14 | NIST 2005 Open Machine Translation (OpenMT) Evaluation |
LDC2010T17 | NIST 2006 Open Machine Translation (OpenMT) Evaluation |
LDC2010T21 | NIST 2008 Open Machine Translation (OpenMT) Evaluation |
LDC2010T23 | NIST 2009 Open Machine Translation (OpenMT) Evaluation |
LDC2010V02 | TRECVID 2006 Keyframes |
LDC2011S01 | 2005 NIST Speaker Recognition Evaluation Training Data |
LDC2011S02 | 2006 NIST Spoken Term Detection Development Set |
LDC2011S03 | 2006 NIST Spoken Term Detection Evaluation Set |
LDC2011S04 | 2005 NIST Speaker Recognition Evaluation Test Data |
LDC2011S05 | 2008 NIST Speaker Recognition Evaluation Training Set Part 1 |
LDC2011S07 | 2008 NIST Speaker Recognition Evaluation Training Set Part 2 |
LDC2011S08 | 2008 NIST Speaker Recognition Evaluation Test Set |
LDC2011S09 | 2006 NIST Speaker Recognition Evaluation Training Set |
LDC2011S10 | 2006 NIST Speaker Recognition Evaluation Test Set Part 1 |
LDC2011T03 | OntoNotes Release 4.0 |
LDC2011T05 | 2008/2010 NIST Metrics for Machine Translation (MetricsMaTr) GALE Evaluation Set |
LDC2011T09 | Arabic Treebank: Part 2 v 3.1 |
LDC2011T11 | Arabic Gigaword Fifth Edition |
LDC2012S01 | 2006 NIST Speaker Recognition Evaluation Test Set Part 2 |
LDC2012T06 | GALE Phase 2 Arabic Broadcast Conversation Parallel Text Part 1 |
LDC2012T07 | Arabic Treebank - Broadcast News v1.0 |
LDC2012T09 | Arabic-Dialect/English Parallel Text |
LDC2012T14 | GALE Phase 2 Arabic Broadcast Conversation Parallel Text Part 2 |
LDC2012T15 | MADCAT Phase 1 Training Set |
LDC2012T17 | GALE Phase 2 Arabic Newswire Parallel Text |
LDC2012T18 | GALE Phase 2 Arabic Broadcast News Parallel Text |
LDC2013S02 | GALE Phase 2 Arabic Broadcast Conversation Speech Part 1 |
LDC2013S07 | GALE Phase 2 Arabic Broadcast Conversation Speech Part 2 |
LDC2013T01 | GALE Phase 2 Arabic Web Parallel Text |
LDC2013T04 | GALE Phase 2 Arabic Broadcast Conversation Transcripts Part 1 |
LDC2013T06 | 1993-2007 United Nations Parallel Text |
LDC2013T07 | NIST 2008-2012 Open Machine Translation (OpenMT) Progress Test Sets |
LDC2013T09 | MADCAT Phase 2 Training Set |
LDC2013T10 | GALE Arabic-English Parallel Aligned Treebank -- Newswire |
LDC2013T14 | GALE Arabic-English Parallel Aligned Treebank -- Broadcast News Part 1 |
LDC2013T15 | MADCAT Phase 3 Training Set |
LDC2013T17 | GALE Phase 2 Arabic Broadcast Conversation Transcripts Part 2 |
LDC2013T19 | OntoNotes Release 5.0 |
LDC2014S02 | King Saud University Arabic Speech Database |
LDC2014S07 | GALE Phase 2 Arabic Broadcast News Speech Part 1 |
LDC2014S08 | United Nations Proceedings Speech |
LDC2014T02 | NIST 2012 Open Machine Translation (OpenMT) Progress Test Five Language Source |
LDC2014T03 | GALE Arabic-English Parallel Aligned Treebank -- Broadcast News Part 2 |
LDC2014T05 | GALE Arabic-English Word Alignment Training Part 1 -- Newswire and Web |
LDC2014T08 | GALE Arabic-English Parallel Aligned Treebank -- Web Training |
LDC2014T09 | HyTER Networks of Selected OpenMT08/09 Sentences |
LDC2014T10 | GALE Arabic-English Word Alignment Training Part 2 -- Newswire |
LDC2014T14 | GALE Arabic-English Word Alignment Training Part 3 -- Web |
LDC2014T17 | GALE Phase 2 Arabic Broadcast News Transcripts Part 1 |
LDC2014T18 | ACE 2007 Multilingual Training Corpus |
LDC2014T19 | GALE Arabic-English Word Alignment -- Broadcast Training Part 1 |
LDC2014T22 | GALE Arabic-English Word Alignment -- Broadcast Training Part 2 |
LDC2015S01 | GALE Phase 2 Arabic Broadcast News Speech Part 2 |
LDC2015S02 | RATS Speech Activity Detection |
LDC2015S10 | Arabic Learner Corpus |
LDC2015S11 | GALE Phase 3 Arabic Broadcast Conversation Speech Part 1 |
LDC2015T01 | GALE Phase 2 Arabic Broadcast News Transcripts Part 2 |
LDC2015T05 | GALE Phase 3 and 4 Arabic Broadcast Conversation Parallel Text |
LDC2015T07 | GALE Phase 3 and 4 Arabic Broadcast News Parallel Text |
LDC2015T12 | 2006 CoNLL Shared Task - Arabic & Czech |
LDC2015T16 | GALE Phase 3 Arabic Broadcast Conversation Transcripts Part 1 |
LDC2015T19 | GALE Phase 3 and 4 Arabic Newswire Parallel Text |
LDC2015T23 | KHATT: Handwritten Arabic Text |
LDC2016S01 | GALE Phase 3 Arabic Broadcast Conversation Speech Part 2 |
LDC2016S07 | GALE Phase 3 Arabic Broadcast News Speech Part 1 |
LDC2016T02 | Arabic Treebank - Weblog |
LDC2016T06 | GALE Phase 3 Arabic Broadcast Conversation Transcripts Part 2 |
LDC2016T08 | GALE Phase 3 and 4 Arabic Web Parallel Text |
LDC2016T11 | GALE Phase 4 Arabic Broadcast Conversation Parallel Sentences |
LDC2016T14 | GALE Phase 4 Arabic Weblog Parallel Sentences |
LDC2016T17 | GALE Phase 3 Arabic Broadcast News Transcripts Part 1 |
LDC2016T18 | ARL Arabic Dependency Treebank |
LDC2016T20 | GALE Phase 4 Arabic Broadcast News Parallel Sentences |
LDC2016T21 | KAFD: Arabic Font Database |
LDC2016T24 | JANA: A Human-Human Dialogues Corpus for Egyptian Dialect |
LDC2016T27 | GALE Phase 4 Arabic Newswire Parallel Sentences |
LDC2017L01 | Arabic Speech Recognition Pronunciation Dictionary |
LDC2017S02 | GALE Phase 3 Arabic Broadcast News Speech Part 2 |
LDC2017S12 | KSUEmotions |
LDC2017S15 | GALE Phase 4 Arabic Broadcast Conversation Speech |
LDC2017S20 | RATS Keyword Spotting |
LDC2017T04 | GALE Phase 3 Arabic Broadcast News Transcripts Part 2 |
LDC2017T07 | BOLT Egyptian Arabic SMS/Chat and Transliteration |
LDC2017T12 | GALE Phase 4 Arabic Broadcast Conversation Transcripts |
LDC2018S05 | GALE Phase 4 Arabic Broadcast News Speech |
LDC2018S06 | 2011 NIST Language Recognition Evaluation Test Set |
LDC2018S10 | RATS Language Identification |
LDC2018T08 | 2007 CoNLL Shared Task - Arabic & English |
LDC2018T10 | BOLT Arabic Discussion Forums |
LDC2018T13 | TRAD Arabic-French Parallel Text -- Newsgroup |
LDC2018T14 | GALE Phase 4 Arabic Broadcast News Transcripts |
LDC2018T18 | BOLT Information Retrieval Comprehensive Training and Evaluation |
LDC2018T21 | TRAD Arabic-French Parallel Text -- Newswire |
LDC2018T23 | BOLT Egyptian Arabic Treebank - Discussion Forum |
LDC2019S02 | Multi-Language Conversational Telephone Speech 2011 -- Arabic Group |
LDC2019S04 | CALLFRIEND Egyptian Arabic Second Edition |
LDC2019T01 | BOLT Arabic Discussion Forum Parallel Training Data |
LDC2019T06 | BOLT Egyptian-English Word Alignment -- Discussion Forum Training |
LDC2019T18 | BOLT Egyptian Arabic-English Word Alignment -- SMS/Chat Training |
LDC2020S04 | 2018 NIST Speaker Recognition Evaluation Test Set |
LDC2020S13 | Phonemes of Arabic |
LDC2020T05 | BOLT Egyptian Arabic-English Word Alignment -- Conversational Telephone Speech Training |
LDC2021L01 | Classical Arabic Dictionary |
LDC2021S08 | RATS Speaker Identification |
LDC2021T12 | BOLT Egyptian Arabic Treebank - Conversational Telephone Speech |
LDC2021T14 | BOLT Egyptian Arabic Co-reference -- Discussion Forum, SMS/Chat, and Conversational Telephone Speech |
LDC2021T15 | BOLT Egyptian Arabic SMS/Chat Parallel Training Data |
LDC2021T17 | BOLT Egyptian Arabic Treebank - SMS/Chat |
LDC2021T18 | BOLT Egyptian Arabic PropBank and Sense -- Discussion Forum, SMS/Chat, and Conversational Telephone Speech |