1st Edition
by Estelle Maryline Delpech (Author)
Computer-assisted translation
(CAT) has always used translation memories, which require the translator
to have a corpus of previous translations that the CAT software can use
to generate bilingual lexicons. This can be problematic when the
translator does not have such a corpus, for instance, when the text
belongs to an emerging field. To solve this issue, CAT research has
looked into leveraging comparable corpora, i.e. sets of texts
in two or more languages that deal with the same topic but are not
translations of one another.
This work has two primary
objectives. The first is to assess the contribution of lexicons extracted from
comparable corpora in the context of a specialized human translation
task. The second objective is to identify the bilingual-lexicon-extraction
methods that best match translators’ needs, determining the current
limits of these techniques and suggesting improvements. The author
focuses, in particular, on the identification of fertile translations,
the management of multiple morphological structures, and the ranking of
candidate translations.
The experiments are carried out on two
language pairs (English–French and English–German) and on specialized
texts dealing with breast cancer. This research puts significant
emphasis on applicability – methodological choices are guided by the
needs of the final users. This book is organized in two parts: the first
part presents the applicative and scientific context of the research,
and the second part is given over to efforts to improve compositional
translation.
The research work presented in this book received
the 2014 PhD Thesis Award from the French association for natural
language processing (ATALA).