## About this Dataset

The translation dataset is a collection of parallel corpora of texts **translated from English** into other languages. It contains 4 billion units across 16 domains.
Several quality levels are available, and the quality level affects pricing.
Available language pairs (English into): Albanian, Arabic, Armenian, Bosnian, Bulgarian, Catalan, Chinese, Croatian, Czech, Danish, Dutch, Estonian, Finnish, French, Georgian, German, Greek, Hebrew, Hindi, Hungarian, Indonesian, Irish, Italian, Japanese, Korean, Kyrgyz, Latvian, Lithuanian, Malay, Maltese, Norwegian, Persian, Polish, Portuguese, Romanian, Russian, Serbian, Slovak, Slovenian, Spanish, Swedish, Thai, Turkish, Ukrainian, Vietnamese.

This dataset is covered by our standard [Data license agreement](https://www.defined.ai/dataset/data-license-agreement). The license agreement is perpetual and allows for the commercialization of all models built on the data.

#### Download full sample
- [Parallel Corpora dataset sample](https://defineddata.blob.core.windows.net/samples/DAI_OTS_Parallel_Corpora_20_Samples__04112023.zip?sv=2023-01-03&st=2024-04-02T10%3A44%3A13Z&se=2025-04-03T10%3A44%3A00Z&sr=b&sp=r&sig=tz8qkClptTbkgES5TROD%2FpHBynb05Hqi7rg%2Fi1t%2BqHk%3D)
