Translation Tools Could Save Less-Used Languages

Tom Simonite - Wednesday, June 6, 2012, Technology Review (published by MIT)

Languages that aren’t used online risk being left behind. New translation technology from Google and Microsoft could help them catch up.

Sometimes you may feel like there’s nothing worth reading on the Web, but at least there’s plenty of material you can read and understand. Millions of people around the world, in contrast, speak languages that are still barely represented online, despite widespread Internet access and improving translation technology.

Web giants Microsoft and Google are trying to change that with new translation technology aimed at languages that are being left behind—or perhaps even being actively killed off—by the Web. Although both companies have worked on translation technology for years, they have, until now, focused on such major languages of international trade as English, Spanish, and Chinese.

Microsoft and Google’s existing translation tools, which are free, are a triumph of big data. Instead of learning as a human translator would, by studying the rules of different languages, a translation tool’s algorithms learn how to translate one language into another by statistically comparing thousands or millions of online documents that have been translated by humans.
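
To make the statistical idea concrete, here is a minimal sketch of learning word translations from nothing but sentence pairs that humans have already translated. It is not how Google's or Microsoft's systems work internally. The tiny English–Spanish corpus, the function names, and the choice of the Dice coefficient as a co-occurrence score are all assumptions made for illustration, and production systems use far larger corpora and more sophisticated alignment models.

```python
from collections import Counter
from itertools import product

# Hypothetical toy corpus of human-translated sentence pairs (English, Spanish).
PARALLEL = [
    ("the house", "la casa"),
    ("a house", "una casa"),
    ("the green house", "la casa verde"),
    ("a green book", "un libro verde"),
    ("the book", "el libro"),
]


def train(pairs):
    """Count how often each word, and each source/target word pair, appears."""
    src_count, tgt_count, pair_count = Counter(), Counter(), Counter()
    for src, tgt in pairs:
        src_words, tgt_words = set(src.split()), set(tgt.split())
        src_count.update(src_words)
        tgt_count.update(tgt_words)
        # Every source word co-occurs with every target word in the pair.
        pair_count.update(product(src_words, tgt_words))
    return src_count, tgt_count, pair_count


def best_translation(word, model):
    """Pick the target word with the highest Dice co-occurrence score."""
    src_count, tgt_count, pair_count = model
    scores = {
        t: 2 * c / (src_count[s] + tgt_count[t])
        for (s, t), c in pair_count.items()
        if s == word
    }
    # Words never seen in training are passed through unchanged.
    return max(scores, key=scores.get) if scores else word


MODEL = train(PARALLEL)
print(best_translation("house", MODEL))
print(best_translation("green", MODEL))
```

No grammar rule or dictionary is supplied anywhere: the pairing of "house" with "casa" emerges purely from counting which words keep appearing together across the translated pairs.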

Both companies have departed slightly from that formula to serve less popular languages. Google was recently able to launch experimental “alpha” support for five Indian languages (Bengali, Gujarati, Kannada, Tamil, and Telugu) by giving its software some direct lessons in grammar, while Microsoft has released a service that allows a community to build a translation system for its own language by supplying its own source material.

Google first realized it needed to give its system a grammar lesson when trying to polish its Japanese translations, says Ashish Venugopal, a research scientist working on Google’s translation software. “We were producing sentences with the verb in the middle, but in Japanese, it needs to go at the end,” Venugopal says. The problem stemmed from the system being largely blind to grammar. The fix that the Google team came up with—adding some understanding of grammar—enabled the launch of the five Indic languages, all used by millions on the subcontinent but largely missing from the Web.

Google’s system was trained in grammar by giving it a large collection of sentences in which the grammatical parts had been labeled—more instruction than Google’s translation algorithms typically receive.
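
As a rough illustration of why grammatical labels help, the sketch below takes a sentence whose words carry part-of-speech labels and moves the verb to the end, the word order Venugopal describes for Japanese. This is a deliberately simplified assumption, not Google's method: the tag names, the `reorder_verb_final` function, and the single hard-coded rule are hypothetical, whereas the real system learns from a large collection of labeled sentences.

```python
def reorder_verb_final(tagged_words):
    """Move every word labeled VERB to the end, keeping the other words in order.

    `tagged_words` is a list of (word, tag) pairs; the tag set is hypothetical.
    """
    verbs = [word for word, tag in tagged_words if tag == "VERB"]
    others = [word for word, tag in tagged_words if tag != "VERB"]
    return others + verbs


# A labeled English sentence in subject-verb-object order.
sentence = [("I", "PRON"), ("eat", "VERB"), ("sushi", "NOUN")]
print(reorder_verb_final(sentence))
```

Without the labels, a purely statistical system has no direct way to know which word is the verb, which is the blindness to grammar described above.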

Venugopal says that, so far, the system can’t handle the underserved languages as well as Google’s existing translation technology can handle more established languages, such as French and German. But, he says, offering any support at all is important for languages that are relatively rare online. “It’s an important part of our mission to make those other languages available on the Web,” he says. “We don’t want people to have to decide whether to publish their blog in their own language or in English. We want to help the world read your blog.”

Microsoft is also interested in helping languages that are not in common use online, to prevent them from being sidelined and falling out of use, says Kristin Tolle, a director at Microsoft Research. Her team recently launched a website, called Translation Hub, that helps anyone create their own translation software. It is intended for communities that wish to ensure their language is used online.

Using Translation Hub involves creating an account and then uploading source materials in the two languages to be translated between. Microsoft’s machine-learning algorithms use that material and can then attempt to translate any text written in the new language. Microsoft piloted that technology in collaboration with leaders of Fresno, California’s large Hmong community, for whose language a machine translation system does not exist.

“Allowing anyone to create their own translation model can help communities save their languages,” says Tolle. Machine translation systems have been developed for roughly 100 of the world’s 7,000 languages, she adds.

“There is a lot of truth to what Microsoft is saying,” says Greg Anderson, director of the nonprofit Living Tongues, which documents, researches, and tries to support disappearing languages. “Today’s playing field involves a digital online presence whether you are a community or a company: if you don’t have a Web presence, you don’t exist, on some level.” Anderson says that sidelined languages making a comeback are usually those from communities that have embraced online life using their language.

Margaret Noori, a lecturer at the University of Michigan who works to preserve Anishinaabemowin, or Ojibwe, a Native American language, agrees, but adds that preserving a language involves more than the Web. “There is a reason to be online in today’s world, but it absolutely must be balanced by songs sung only aloud and ceremonies never recorded.”

Microsoft’s Translation Hub is also aimed at enabling the translation of specialist technical terms or jargon, which general purpose online translation tools do not handle well. Nonprofits could, for example, use it to translate materials on agricultural techniques, says Tolle, and the technology can also be useful to companies that wish to speed up translation of instruction manuals or other material.

“Companies often want to have their data available to them privately and retain their data—not to provide it to someone else that will train a translation system,” she says. Volvo and Mercedes have expressed an interest in testing Microsoft’s Translation Hub, says Tolle.


Gmail now features Automatic Translation

After Google Translate passed the 200 million monthly users mark last week (see below), it is surely no coincidence that Gmail announced three new features today, including automatic message translation.

This feature originally comes from Gmail Labs (for those not familiar with the concept, Gmail Labs lets users try experimental features in their own Gmail account before they become standard features or disappear). It has been such a hit among users (particularly Google Apps for Business users) that it is now an official, standard Gmail feature.

Below is the official announcement from Jeff Chin, Product Manager at Google Translate:

Say hello (or olá or halo or salam) to automatic message translation in Gmail

“We’re excited to announce three Gmail Labs graduations today: Automatic Message Translation, Smart Mute and Title Tweaks.

Automatic Message Translation
Did you ever dream about a future where your communications device could transcend language with ease? Well, that day is a lot closer. Back when we launched automatic message translation in Gmail Labs, we were curious to see how people would use it.

We heard immediately from Google Apps for Business users that this was a killer feature for working with local teams across the world. Some people just wanted to easily read newsletters from abroad. Another person wrote in telling us how he set up his mom’s Gmail to translate everything into her native language, thus saving countless explanatory phone calls (he thanked us profusely). I continue to use it to participate in discussions with the global Google offices I often visit.

Since message translation was one of the most popular labs, we decided it was time to graduate from Gmail Labs and move into the real world. Over the next few days, everyone who uses Gmail will be getting the convenience of translation added to their email. The next time you receive a message in a language other than your own, just click on Translate message in the header at the top of the message, and it will be instantly translated into your language.”

Read more on the Official Gmail blog here.

Google Translate: 200 million monthly users

Google Translate had barely celebrated its sixth birthday when it reached 200 million monthly users, as Google announced earlier this week.

Franz Och, research scientist at Google Translate: “In a given day we translate roughly as much text as you’d find in 1 million books. To put it another way: what all the professional human translators in the world produce in a year, our system translates in roughly a single day (…) We imagine a future where anyone in the world can consume and share any information, no matter what language it’s in, and no matter where it pops up.”

Wow. Imagine: what all the professional human translators in the world produce in a year, the Google Translate system translates in one day.

Of course this is a simplistic view, and of course Google Translate can’t quite do what we do. The job of a professional, specialized translator goes beyond simply translating words and putting them in the right order to make a sentence out of them. Of course the machine does not have the background and the technical knowledge to translate a specific technical document. Of course the machine is not aware of specific terminology specified by the client. Of course the machine does not have the cultural knowledge that allows it to do much more than just translate, namely to adapt to the target audience or market. Of course. And of course (and this is a very important point) Google Translate is one thing: it’s great for translating “I love you” into 64 languages. But there are many LSPs and companies who have developed (and are developing) their very own machine translation solutions, completely customized for professional specialized translators, with stunning results.

As a translator from the “new generation”, I am not afraid of machine translation at all. CAT tools have always been part of my job; I never knew “the time before CAT”. So maybe this is why I see Machine Translation as the natural, normal, next step. I am also convinced that the machine will never replace the human brain when it comes to translation. But I am convinced that we will have to evolve, that the translator’s job will evolve, and that we will probably be “post-editors” rather than translators in a few years. Just as when CAT tools arrived and many translators saw them as a threat, as a personal insult, as a danger, Machine Translation is coming anyway, whether we like it or not. My opinion is simple: MT is not a threat. MT is the next logical step. MT is a very powerful tool that can really help us do our job faster and better. So why not adapt and make it our best ally?

Bottom line: Machine Translation is coming (it’s actually already here) and it’s getting better and better. Exactly how long will half of the industry keep pretending it’s not happening?

Just my two cents.

Anyway, for those interested in knowing more about Google’s projects and plans for the future of Google Translate, here’s the blog post from Franz Och on the Google Team blog.

Breaking down the language barrier—six years in

“The rise of the web has brought the world’s collective knowledge to the fingertips of more than two billion people. With just a short query you can access a webpage on a server thousands of miles away in a different country, or read a note from someone halfway around the world. But what happens if it’s in Hindi or Afrikaans or Icelandic, and you speak only English—or vice versa?

In 2001, Google started providing a service that could translate eight languages to and from English. It used what was then state-of-the-art commercial machine translation (MT), but the translation quality wasn’t very good, and it didn’t improve much in those first few years. In 2003, a few Google engineers decided to ramp up the translation quality and tackle more languages. That’s when I got involved. I was working as a researcher on DARPA projects looking at a new approach to machine translation—learning from data—which held the promise of much better translation quality. I got a phone call from those Googlers who convinced me (I was skeptical!) that this data-driven approach might work at Google scale.” (Read more)