Skype Translator vows to translate multilingual voice calls - but is it any good?
The Independent July 18, 2014
"I can speak Spanish." When those words left the lips of a former colleague, my head turned and my right eyebrow automatically raised. I knew the man well. True, he had visited Spain a few times but surely not enough for him to be able to claim that he could speak the language. Anyone can say "Cerveza, por favor", but that wasn't going to equip him with what I knew was about to come next.
He hadn't heard the preceding conversation. The one in which a reporter had been discussing a complex case that would involve a call to the Spanish police. Then again, maybe he had hidden his fluency in another language and he certainly looked confident enough.
"I have some questions," said the reporter. "Please can you phone this number and speak to the cops in Madrid and let me know what they say." It's not often that you see someone's blood visibly drain from their face but I saw it that day.
As suspected, he was no more fluent in Spanish than Andrew Sachs. He made some excuse about being rusty, bravely said he would give it a go but asked if he could make the call in another room. Lo and behold, the interview never did quite work out. Something about the police not picking up the phone.
But if a service set to be launched later this year had been available, he would have been spared the embarrassment. For Skype Translator promises to automatically translate multilingual voice calls and it has the potential to revolutionise the way we talk to people whose language we do not share.
While translation programs already exist, the vast majority of them rely on the two people conversing being in the same room at the same time. A person will talk in one language, indicate via a button or pause that they have finished talking and then get a translation that the other person can understand.
What Skype Translator will do is allow people to have fluid, remote conversations over the phone, with each side hearing the words spoken in the language they understand. As one of a number of companies working on the technology – Japan's predominant mobile phone network, NTT DoCoMo, already has such a system running and Google is hoping to perfect real-time calls over the next few years – Skype is set to make a massive impact given it has more than a third of the international call market and 300 million users across the world.
Skype's forthcoming service is not an entirely new concept. For £6 an hour, Call Interpreter by Lexifone, which launched last year, lets you call an access number, dial the person you want to speak to and chat fluidly in your own language while the service converts it into another. But Lexifone's initial reception wasn't great, with critics saying it was frustrating and not always as seamless as it should be. When Skype launches Translator it will be on a limited beta so we could perhaps expect some teething problems too. "It is early days for this technology but the Star Trek vision for a Universal Translator isn't a galaxy away and its potential is every bit as exciting," says Gurdeep Pall, corporate vice-president of Skype. Still, such technologies go a long way towards addressing a growing need.
Businesses compete in a global market and ideas are shared across countries. Migration creates multilingual societies that bring their own needs. Police forces in England and Wales spent £40m on human translators over three years. Councils also splash the cash on employing interpreters to man phone lines. Bolton Council, which deals with a good number of Urdu, Gujarati and eastern European residents, spends £20,000 a year speaking to them via interpreters on the phone. The council is watching Skype Translator "with interest".
Much of the demand for translation has also resulted from the wars in Iraq and Afghanistan. In both countries there was a lack of military, diplomatic and intelligence personnel conversant in languages such as Arabic, Dari, Pashto and Urdu. In 2007, IBM's speech-to-speech translation software was introduced by US forces in Iraq to help them communicate more effectively with the Iraqi police, military forces and civilians. Demand such as this has only increased the amount of research time and money being spent on machine translation.
Certainly, Skype Translator draws on years of expensive study by Microsoft Research (Microsoft bought Skype for $8.5bn in May 2011). But those who have seen Translator in action say it will be worth the effort. "I saw an early demonstration of the technology in 2010," says Dr Jeff Allen, an adviser in computer-generated translation for business software company SAP. "Even at that time, the researchers were able to get the system to handle fairly rapid speech."
Skype Translator uses a process called "deep learning" which is based upon computerised neural networks rather than just the writing of explicit rules (the way computerised translation was carried out prior to the 1990s). Together they act as an artificial brain, able to learn the complex features of speech. It is data-hungry so the more knowledge of a language it picks up, the more accurate it becomes. What's more, in learning Spanish, for instance, it will become better at German. And even Skype doesn't know why this happens.
But then such systems will always be highly complex. Speech is rule-governed but there is more to it than just stringing words together. Linguist and author David Crystal says a perfect machine translator needs to understand the rhythm, stress and intonation of speech and consider idioms, metaphorical expression and discourse features. It needs to be culturally aware and stylistically appropriate.
"If you look at a sentence such as 'It was like Clapham Junction in there today', it can be handled in principle once the relevant linguistic analysis is done," he says. "But we are a long way from achieving that level of sophistication, even for the most well-studied languages." In other words, a machine would have to be highly intelligent to understand that Clapham Junction is a busy railway station and translate it as such.
There are also two processes at play. A translation app needs accurate voice recognition – to know exactly what you are saying without making a mistake. And it needs advanced machine translation to take those words and convert them into another language. According to Alexander Marktl, CEO of Sonico Mobile, maker of the Mac program iTranslate, there has been an imbalance of the two technologies. "The accuracy of voice recognition is improving way faster than the accuracy of machine translation," he says. "The biggest problem with speech-to-speech translation is having not one but two areas with accuracy problems. It is why you sometimes see funny results, no matter how sophisticated the technology."
For this reason, Dr Allen remains cautious. "Language is not binary and it's forever inventing new ways of saying things," he says. "This technology won't replace humans just yet. There will still be a need for professional interpreters to work in the legal and medical professions, for diplomacy and conferences. We see how difficult it is to translate political jokes from one language to another with humans. Imagine the bloopers that could be caused by a computer."
Even so, the new technology should have us conversing like never before. Crystal says the software will address the need for intelligibility and insists people will still want to learn a language. "The need to express identity is why we have different languages, dialects, and accents," he says. The machines, it seems, will help us talk but they'll never take our personality.