transLectures is a collaborative EU project that began at the end of 2011. Its aim is to develop technology capable of automatically transcribing lectures and then translating those transcriptions into the user's language. It makes use of "massive adaptation" learning systems that adapt to the speaker by learning from the speaker's previous lectures, in terms of both audio and content. It even uses text extracted from the slides to aid the transcription and translation process.
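The core idea behind speaker adaptation can be illustrated with a toy MAP-style update, in which a speaker-independent model statistic is pulled towards statistics estimated from the speaker's own lectures. This is a minimal sketch of the general technique only; the function name, the `tau` weight and the scalar feature are illustrative assumptions, not the project's actual algorithm.

```python
# Toy illustration of MAP-style speaker adaptation: a speaker-independent
# Gaussian mean is interpolated towards the mean of the speaker's own data,
# weighted by how much speaker data is available.
# (Illustrative sketch only; transLectures' adaptation is far richer.)

def map_adapt_mean(prior_mean, speaker_frames, tau=10.0):
    """MAP update of a Gaussian mean: with little speaker data the prior
    dominates; with lots of data the speaker statistics dominate."""
    n = len(speaker_frames)
    if n == 0:
        return prior_mean  # no speaker data: keep the general model
    speaker_mean = sum(speaker_frames) / n
    return (tau * prior_mean + n * speaker_mean) / (tau + n)

# With only a few frames the estimate stays near the prior...
few = map_adapt_mean(0.0, [1.0, 1.0])    # (0 + 2*1)/12 ≈ 0.167
# ...while many frames pull it towards the speaker's statistics.
many = map_adapt_mean(0.0, [1.0] * 90)   # (0 + 90*1)/100 = 0.9
```

The same trade-off (trusting the general model until enough speaker-specific data accumulates) is what lets a system improve lecture by lecture without ever being retrained from scratch.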
One of the clever methods used to improve the learning process further is the incorporation of human supervision. Initial transcribed texts are presented to the user, with the words the system is not confident about highlighted. The user is then able to correct any errors. As corrections are made, the system learns and does not repeat the same mistakes. It's an intelligent interface that minimises the effort required of users, and far more efficient than checking entire transcribed documents.
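This confidence-driven review loop can be sketched as follows. The threshold value, function names and data structures here are assumptions for illustration, not the project's actual interface.

```python
# Hypothetical sketch of confidence-based highlighting for interactive
# transcription review (names and threshold are illustrative assumptions).

CONFIDENCE_THRESHOLD = 0.7  # assumed cut-off below which a word is flagged

def flag_uncertain_words(words):
    """Given (word, confidence) pairs from the recogniser, mark which
    words should be highlighted for the user to check."""
    return [(w, conf, conf < CONFIDENCE_THRESHOLD) for w, conf in words]

def apply_corrections(words, corrections):
    """Replace words at corrected positions; the corrected transcript can
    then be fed back to the system as adaptation data."""
    return [corrections.get(i, w) for i, (w, _conf) in enumerate(words)]

hypothesis = [("the", 0.98), ("lecture", 0.95), ("covers", 0.91),
              ("bayes", 0.42), ("theorem", 0.88)]
flagged = flag_uncertain_words(hypothesis)
# only the low-confidence word ("bayes") is flagged for review
corrected = apply_corrections(hypothesis, {3: "Bayes'"})
```

The point of the design is that the user only inspects the flagged words rather than proofreading the whole transcript, which is what keeps the supervision effort low.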
"Massive adaptation" provides significant improvements in the quality of automatic transcriptions and translations, while the "intelligent interaction" techniques reduce the user effort needed to supervise those texts. A quality comparison with Google's public transcription and translation systems (which are based on general-purpose models) showed that the results obtained in automatic translation were of similar quality, while video lecture transcriptions generated automatically with transLectures technology fared considerably better than Google's (mostly owing to the adaptation of acoustic models to the speaker by the transLectures tools).
The transLectures tools are still under development. They will initially be implemented for the VideoLectures.NET and poliMedia repositories, with integration into the Opencast Matterhorn platform planned so that they can be exploited in any Matterhorn-based repository.
transLectures (2011-2014) is an EU-funded project to develop innovative, cost-effective tools for the automatic transcription and translation of online educational videos.
Online collections of video material are fast becoming a staple feature of the Internet and a key educational resource. What we are working on at transLectures is a set of easy-to-use tools that will allow users to add multilingual subtitles to these videos. In doing so, they will make the content of these videos available to a much wider audience in a way that is cost-effective and sustainable across the vast collections of online video lectures now being generated.
Automatic transcription tools will provide verbatim subtitles of the talks recorded on video, thereby allowing the hard-of-hearing to access this content. Language learners and other non-native speakers will also benefit from these monolingual subtitles. Meanwhile, machine translation tools will make these subtitles available in languages other than that in which the video was recorded.
Specifically, we will be developing tools for use on VideoLectures.NET, a collection of videos recorded at various academic events set up by JSI's Centre for Knowledge Transfer, and for poliMedia, a lecture capture system designed and implemented at the UPVLC. Our tools will also be fully compatible with Opencast Matterhorn, a free, open-source platform for the management of educational audio and video content.
The languages targeted in this project are English, Spanish and Slovenian for transcription; for translation, the pairs are English<>Spanish, English<>Slovenian, English>French and English>German.
Keywords: language technologies, machine translation, automatic speech recognition, massive adaptation, intelligent interaction, education, video lectures, multilingualism, accessibility, opencast matterhorn
List of Beneficiaries:
Universitat Politècnica de València – UPV (Spain)
Institut Jožef Stefan – IJS (Slovenia)
Rheinisch-Westfaelische Technische Hochschule Aachen – RWTH (Germany)
Xerox S.A.S. – XEROX (France)
European Media Laboratory GmbH – EML (Germany)
Deluxe Media Europe – DDS (UK)
Knowledge for All Foundation – K4A (UK) – Third party