Parallel Corpora
About this Dataset
The translation dataset is a collection of parallel corpora: texts translated from English into other languages. It contains 4 billion units across 16 domains. The data is offered at several quality levels, which affect pricing. Available target languages (all translated from English): Albanian, Arabic, Armenian, Bosnian, Bulgarian, Catalan, Chinese, Croatian, Czech, Danish, Dutch, Estonian, Finnish, French, Georgian, German, Greek, Hebrew, Hindi, Hungarian, Indonesian, Irish, Italian, Japanese, Korean, Kyrgyz, Latvian, Lithuanian, Malay, Maltese, Norwegian, Persian, Polish, Portuguese, Romanian, Russian, Serbian, Slovak, Slovenian, Spanish, Swedish, Thai, Turkish, Ukrainian, Vietnamese.
This dataset is covered by our standard data license agreement. The license is perpetual and permits commercial use of any models built on the data.