“Conquering Babel”

“Simultaneous translation by computer is getting closer”
From The Economist, Jan 5th 2013, Seattle (print edition)


IN “STAR TREK”, a television series of the 1960s, no matter how far across the universe the Starship Enterprise travelled, any aliens it encountered would converse in fluent Californian English. It was explained that Captain Kirk and his crew wore tiny, computerised Universal Translators that could scan alien brainwaves and simultaneously convert their concepts into appropriate English words.

Science fiction, of course. But the best sci-fi has a habit of presaging fact. Many believe the flip-open communicators also seen in that first “Star Trek” series inspired the design of clamshell mobile phones. And, on a more sinister note, several armies and military-equipment firms are working on high-energy laser weapons that bear a striking resemblance to phasers. How long, then, before automatic simultaneous translation becomes the norm, and all those tedious language lessons at school are declared redundant?

Not, perhaps, as long as language teachers, interpreters and others who make their living from mutual incomprehension might like. A series of announcements over the past few months from sources as varied as mighty Microsoft and string-and-sealing-wax private inventors suggest that workable, if not yet perfect, simultaneous-translation devices are now close at hand.

Over the summer, Will Powell, an inventor in London, demonstrated a system that translates both sides of a conversation between English and Spanish speakers—if they are patient, and speak slowly. Each interlocutor wears a hands-free headset linked to a mobile phone, and sports special goggles that display the translated text like subtitles in a foreign film.

In November, NTT DoCoMo, the largest mobile-phone operator in Japan, introduced a service that translates phone calls between Japanese and English, Chinese or Korean. Each party speaks consecutively, with the firm’s computers eavesdropping and translating his words in a matter of seconds. The result is then spoken in a man’s or woman’s voice, as appropriate.

Microsoft’s contribution is perhaps the most beguiling. When Rick Rashid, the firm’s chief research officer, spoke in English at a conference in Tianjin in October, his peroration was translated live into Mandarin, appearing first as subtitles on overhead video screens, and then as a computer-generated voice. Remarkably, the Chinese version of Mr Rashid’s speech shared the characteristic tones and inflections of his own voice.

Que?

Though the three systems are quite different, each faces the same problems. The first challenge is to recognise and digitise speech. In the past, speech-recognition software has parsed what is being said into its constituent sounds, known as phonemes. There are around 25 of these in Mandarin, 40 in English and over 100 in some African languages. Statistical speech models and a probabilistic technique called Gaussian mixture modelling are then used to identify each phoneme, before reconstructing the original word. This is the technology most commonly found in the irritating voice-mail jails of companies’ telephone-answering systems. It works acceptably with a restricted vocabulary, but try anything more free-range and it mistakes at least one word in four.
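
As a rough illustration of the classical approach described above, the sketch below fits one small Gaussian mixture model per phoneme and then, for a new frame of acoustic features, picks the phoneme whose model explains it best. The feature vectors and the two-phoneme inventory are invented for the example; a real recogniser works on far richer features and many more classes.

```python
# Minimal sketch: scoring acoustic feature vectors against per-phoneme
# Gaussian mixture models, in the spirit of classical GMM-based recognisers.
# The features and the two-phoneme inventory are invented for illustration.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)

# Pretend MFCC-like feature vectors (13-dimensional) for two phonemes.
training_data = {
    "ae": rng.normal(loc=0.0, scale=1.0, size=(500, 13)),
    "k":  rng.normal(loc=2.0, scale=1.0, size=(500, 13)),
}

# Fit one small mixture model per phoneme.
models = {}
for phoneme, feats in training_data.items():
    gmm = GaussianMixture(n_components=4, covariance_type="diag", random_state=0)
    gmm.fit(feats)
    models[phoneme] = gmm

# Classify a new frame by picking the phoneme whose model gives it
# the highest log-likelihood.
frame = rng.normal(loc=2.0, scale=1.0, size=(1, 13))
best = max(models, key=lambda p: models[p].score(frame))
print("most likely phoneme:", best)
```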

The translator Mr Rashid demonstrated employs several improvements. For a start, it aims to identify not single phonemes but sequential triplets of them, known as senones. English has more than 9,000 of these. If they can be recognised, though, working out which words they are part of is far easier than would be the case starting with phonemes alone.
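
To make "sequential triplets" concrete, here is a toy sketch that slides a three-wide window over an invented phoneme string for the word "cat". Real senone inventories are built from context-dependent acoustic states rather than raw phoneme triples, so treat this only as an illustration of the windowing idea.

```python
# Rough illustration only: expand a phoneme sequence into overlapping
# triples (left context, phoneme, right context). Real senone inventories
# are derived from tied acoustic states, but the windowing idea is similar.
def triples(phonemes):
    padded = ["<s>"] + list(phonemes) + ["</s>"]
    return [tuple(padded[i - 1:i + 2]) for i in range(1, len(padded) - 1)]

# "cat" rendered as an invented phoneme string k-ae-t
print(triples(["k", "ae", "t"]))
# [('<s>', 'k', 'ae'), ('k', 'ae', 't'), ('ae', 't', '</s>')]
```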

Microsoft’s senone identifier relies on deep neural networks, a mathematical technique inspired by the human brain. Such artificial networks are pieces of software composed of virtual neurons. Each neuron weighs the strengths of incoming signals from its neighbours and sends outputs based on those to other neighbours, which then do the same thing. Such a network can be trained to match an input to an output by varying the strengths of the links between its component neurons.
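
A toy numerical sketch of that description: one virtual neuron sums its weighted inputs, squashes the result, and "learns" by nudging the link strengths until its output moves towards a target. Every number here is invented for illustration and has nothing to do with Microsoft's actual network.

```python
# Toy artificial neuron: weigh incoming signals, squash the sum, and learn
# by adjusting the link strengths. Purely illustrative values throughout.
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(1)
weights = rng.normal(size=3)          # strengths of the incoming links
inputs = np.array([0.2, -0.5, 0.9])   # signals arriving from neighbours
target = 1.0                          # the output we want to train towards

for _ in range(1000):
    output = sigmoid(inputs @ weights)                       # weigh and fire
    error = target - output
    weights += 0.5 * error * output * (1 - output) * inputs  # adjust links

print(round(float(sigmoid(inputs @ weights)), 3))  # now close to 1.0
```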

One thing known for sure about real brains is that their neurons are arranged in layers. A deep neural network copies this arrangement. Microsoft’s has nine layers. The bottom one learns features of the processed sound waves of speech. The next layer learns combinations of those features, and so on up the stack, with more sophisticated correlations gradually emerging. The top layer makes a guess about which senone it thinks the system has heard. By using recorded libraries of speech with each senone tagged, the correct result can be fed back into the network, in order to improve its performance.
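
The layered arrangement can be sketched as a stack of matrix multiplications: acoustic features enter at the bottom, each layer recombines the output of the one below it, and the top layer scores every senone class. The layer sizes below are stand-ins, the weights are random rather than trained, and the 9,000 output classes simply echo the figure quoted earlier, so the final "guess" is meaningless; the sketch only shows the shape of such a stack.

```python
# Sketch of the layered arrangement described above: features go in at the
# bottom, each fully connected layer recombines the features of the layer
# below it, and the top layer produces a score per senone class.
# Sizes are stand-ins and the weights are random, so the "guess" is
# meaningless; only the shape of the stack matters here.
import numpy as np

rng = np.random.default_rng(2)

layer_sizes = [400, 512, 512, 512, 512, 512, 512, 512, 9000]
weights = [rng.normal(scale=1.0 / np.sqrt(m), size=(m, n))
           for m, n in zip(layer_sizes[:-1], layer_sizes[1:])]

def forward(frame_features):
    x = frame_features
    for w in weights[:-1]:
        x = np.maximum(0.0, x @ w)       # hidden layer: combine and rectify
    logits = x @ weights[-1]             # top layer: one score per senone
    scores = np.exp(logits - logits.max())
    return scores / scores.sum()         # softmax over senone classes

frame = rng.normal(size=400)             # a pretend window of acoustic features
probs = forward(frame)
print("guessed senone index:", int(probs.argmax()))
```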

Microsoft’s researchers claim that their deep-neural-network translator makes at least a third fewer errors than traditional systems and in some cases mistakes as few as one word in eight. Google has also started using deep neural networks for speech recognition (although not yet translation) on its Android smartphones, and claims they have reduced errors by over 20%. Nuance, another provider of speech-recognition services, reports similar improvements. Deep neural networks can be computationally demanding, so most speech-recognition and translation software (including that from Microsoft, Google and Nuance) runs in the cloud, on powerful online servers accessible in turn by smartphones or home computers. (…)
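
For a rough sense of what those figures mean side by side, taking the "one word in four" rate quoted earlier as the baseline:

```python
# Back-of-the-envelope check of the error figures quoted above.
baseline = 1 / 4                       # "mistakes at least one word in four"
third_fewer = baseline * (1 - 1 / 3)   # "at least a third fewer errors"
best_case = 1 / 8                      # "as few as one word in eight"

print(f"baseline error rate:      {baseline:.1%}")     # 25.0%
print(f"a third fewer errors:     {third_fewer:.1%}")  # ~16.7%
print(f"best case (one in eight): {best_case:.1%}")    # 12.5%
```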

Read the entire article here
