We have Turkish members here on Ricochet (quite a number, actually), and of course we have Bill Walsh, but when a member named John H left a particularly apt response in Turkish to one of my posts, I thought--who is this guy?

So I looked up his profile:

Just another beer-drinking Texan with a Ph.D. in biochemistry, proficiency in Portuguese, and 150,000 miles on his bicycle. My vocation now is computer programming; my impossible mission, to make cat and Toyota Echo ownership look totally masculine. (Well, the cats are easy: I just treat 'em like horses, slapping their flanks and singing to 'em. As for the Echo, I don't know...at least it has a 5-speed manual.) My interests - machine translation of Turkic languages, the unavoidable faultiness of computer models, the once and future Yugoslavia, all the lusophone world, and pedaling up to people in other area codes and watching their eyes telegraph "But you're not wearing Spandex!" - are summarized at http://www.machine-altaica.com/.

Doesn't Ricochet have the most interesting members? 

John, if you crack the problem of machine-translating Turkic languages, you will be my personal hero. I'll leave it to you to explain how you're approaching it.

It's a hugely challenging problem because Turkish is agglutinative, ridiculously morphologically complex, and has a flexible word order--no simple "subject-verb-object" stuff around here. Machine translation, even if it results in a correctly translated root, usually messes up the morphemes. Nine times out of ten, you get gobbledygook.

Google Translate, you'll see, can't even begin to fathom a phrase such as Avrupalılaştıramadıklarımızdanmışsınız--which means, "It seems you're one of those people we were not able to Europeanize." It just gives up.
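
For the curious, here is a rough sketch in Python of how that one word stacks up, suffix by suffix. The segmentation follows the usual textbook analysis of this example; the English labels are informal glosses chosen for illustration, and the names `MORPHEMES` and `rebuild` are made up for this sketch.

```python
# Informal morpheme-by-morpheme gloss of the example word.
# The labels are loose English paraphrases, not formal linguistic tags.
MORPHEMES = [
    ("Avrupa", "Europe"),
    ("lı", "of/from (gives 'European')"),
    ("laş", "become"),
    ("tır", "causative: make (someone) become"),
    ("ama", "not able to"),
    ("dık", "participle: 'those whom ...'"),
    ("lar", "plural"),
    ("ımız", "our (1st person plural)"),
    ("dan", "from/among (ablative)"),
    ("mış", "apparently (evidential)"),
    ("sınız", "you are (2nd person plural)"),
]

WORD = "Avrupalılaştıramadıklarımızdanmışsınız"


def rebuild(morphemes):
    """Glue the surface forms back together, left to right."""
    return "".join(form for form, _ in morphemes)


if __name__ == "__main__":
    for form, gloss in MORPHEMES:
        print(f"{form:8} {gloss}")
    print(rebuild(MORPHEMES) == WORD)
```

Eleven pieces, one word. Splitting the word like this is the mechanical part; producing a natural English sentence from the pieces is a separate problem.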

Which by the way also sums up the state of Turkey's EU accession negotiations.


Comments :

Dave Molinari
Joined
Jun '10

There are a lot of very funny member profiles.  Look 'em up and you'll be pleased. This one is definitely one of the best I've seen.  Bravo. 


Joined
Sep '10
GT Speetzen

Wow, a post so idiotic that it actually made me cancel my account.

Oh well,

Poof

Claire Berlinski, Ed.

GT Speetzen: Wow, a post so idiotic that it actually made me cancel my account.

Oh well,

Poof · Apr 23 at 12:41am

Interesting example. I've never found a machine translator that could render this sentence into anything like the syntax a native Turkish speaker would use to express this concept. 

Scott Reusser
Joined
May '10

 John H should be posting more.

GT Speetzen thankfully won't be.

Finster
Joined
Feb '11

GT Speetzen: Wow, a post so idiotic that it actually made me cancel my account.

Oh well,

Poof · Apr 23 at 12:41am

budala

Lady Bertrum
Joined
Apr '11

  Now I have to go add something to my profile.  I usually leave those things blank/empty, but, then usually there's no possibility of being awarded Member of the Day.  I'm nothing if not competitive.  :-)

Is it interesting that I'm currently trying to translate hieroglyphics into Gaelic and that I don't wear gabardine while speed walking?  Too derivative?

Claire Berlinski, Ed.

Lady Bertrum:   Now I have to go add something to my profile.  I usually leave those things blank/empty, but, then usually there's no possibility of being awarded Member of the Day.  I'm nothing if not competitive.  :-)

Is it interesting that I'm currently trying to translate hieroglyphics into Gaelic and that I don't wear gabardine while speed walking?  Too derivative? · Apr 23 at 7:58am

You can have "comment of the day." How's that?

Kennedy Smith
Joined
May '10

 See, I just feel like you should get more than Member of the Day for Turkish-English translation.  That's seriously hard stuff.  It's not for nothin that Turkey has one of the lowest English-proficiency scores on the planet.  It's an Asian steppe language.  Almost no correlation.

Being thus discouraged from ever being Member of the Day, I'll just continue posting innuendo, horserace handicapping, Walter Russell Mead rip-offs, and innuendo.  Also innuendo.

 PS, not sure my profile would cut any ice.

Edited on Apr 23, 2011 at 10:02am
Anthony Aristar
Joined
Nov '10

One of the reasons why Turkish machine translation hasn't been very successful so far is not that it's fundamentally hard for a machine, but that not much money and time has been expended on it, Claire. Yes, I know human beings find it hard to learn to agglutinate; but machines don't. Agglutinative morphology is one of the easier morphologies for machines to handle in fact: the word can be broken up into easily analyzable chunks. Morphemes in Turkish have a few fixed forms varying predictably according to vowel harmony, and so they're easily recognizable by machines. What's called "non-concatenative" morphology (i.e. where morphemes are embedded inside other morphemes, as in Arabic) is far harder. Yet compare how (relatively) well Google Translate does on Arabic compared to Turkish.
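
To make the vowel harmony point concrete, here is a minimal Python sketch of how a program might pick the right form of a suffix. It covers only two cases, the plural (-lar/-ler) and the "-li" suffix (-lı/-li/-lu/-lü), and it assumes lowercase Turkish vowels and fully regular stems; loanword exceptions and consonant changes are ignored. The function names are hypothetical, chosen for this illustration.

```python
# Vowel harmony sketch: the last vowel of the stem decides the suffix vowel.
# Assumption: regular stems only; exceptions such as "saat" are not handled.
BACK_VOWELS = set("aıou")

# Four-way harmony: each stem vowel maps to the matching high vowel.
HIGH_VOWEL_FOR = {
    "a": "ı", "ı": "ı",
    "e": "i", "i": "i",
    "o": "u", "u": "u",
    "ö": "ü", "ü": "ü",
}


def last_vowel(stem):
    """Return the final vowel of the stem, scanning from the right."""
    for ch in reversed(stem):
        if ch in HIGH_VOWEL_FOR:
            return ch
    raise ValueError(f"no vowel found in {stem!r}")


def add_plural(stem):
    """Two-way harmony: -lar after back vowels, -ler after front vowels."""
    return stem + ("lar" if last_vowel(stem) in BACK_VOWELS else "ler")


def add_li(stem):
    """Four-way harmony: -lı, -li, -lu or -lü."""
    return stem + "l" + HIGH_VOWEL_FOR[last_vowel(stem)]


if __name__ == "__main__":
    print(add_plural("kitap"))  # kitaplar
    print(add_plural("ev"))     # evler
    print(add_li("Avrupa"))     # Avrupalı
    print(add_li("köy"))        # köylü
```

Running the same table in reverse is what lets a program recognize a suffix whichever of its forms turns up.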

Sergei Nirenburg
Joined
May '10

Google translation systems work exclusively on the basis of corpus-based statistics. It's a matter of religion for them, it seems. Google-style translation is good for getting an idea of what a text is about, which can be quite useful.

However, I am not sure whether they even touch what's traditionally known as morphology at all. And Mr. Aristar is right on the money about agglutinating morphology: e.g., for Turkish it's more or less solved -- check the work of Kemal Oflazer, of Sabanci U. and CMU's campus in Qatar.

The real reason for bad machine translation (MT), however, is the difficulty of extracting and manipulating the meaning of text. That is a very difficult and expensive task, but it is a strong prerequisite for high-quality translation.

Being interested in computational semantics of natural languages, I worked in MT for many years until the statisticians took over promising to get results cheaper and sooner. I had to find a different set of applications -- which, in the end, proved a boon... But that's another story.

Claire Berlinski, Ed.
Anthony Aristar: One of the reasons why Turkish machine translation hasn't been very successful so far is not that it's fundamentally hard for a machine, but that not much money and time has been expended on it, Claire. 

Really? It's a matter of investment rather than the inherent difficulty of the problem? I did not know that. I wonder if now's the time to be hitting up the famous new breed of Turkish gazillionaires to invest in it, then? 

John H.
Joined
Aug '10

Here's why machine translation won't ever be adequate for any language pair. Imagine two such machines, both perfect: whatever you give one, its translation is unambiguously translated back by the other. Well, right away you see two problems: you, an outsider, have to seed the conversation ("Uh, OK...'The cheeseheads are out in force at Lambeau'") and then the two devices cycle pointlessly ("Lambeau'da çok peynirbaşları var" by definition inevitably returns "The cheeseheads are out in force at Lambeau"). Only intrusions from life itself can keep this going in a nontrivial manner, and to program machines to keep up with life is to do their work for them.

Claire, thanks for the citation - you've shown far more generosity than I ever have - and keep l'arnin' that Turkish! You'll know you've arrived when you no longer consciously "figure out" where all the infixes go. "Figuring out" is all a machine can ever do.

Claire Berlinski, Ed.
John H.: ("Uh, OK...'The cheeseheads are out in force at Lambeau'") and then the two devices cycle pointlessly ("Lambeau'da çok peynirbaşları var" by definition inevitably returns "The cheeseheads are out in force at Lambeau").

I cannot wait to test this example on actual Turkish native speakers. 

Do you agree that the difference between the quality of machine translation of Turkish and of Arabic right now is simply a matter of investment? 

John H.
Joined
Aug '10

A prideful man, I always shoo away help from browsers that want to translate for me. Which means, really, I just don't look at a lot of stuff. I stick to sites I can read unassisted, or at least get through with a dictionary. (Turkish is in the latter category. I'm hardly conversant in it but I know nearly all the grammar. Turkish grammar may be utterly unfamiliar but as you have undoubtedly observed, it is also commendably regular - it really was not that hard to program for it, even sitting on a garden bench with my right hand grasping a cold one and my left hand petting a cat.) So, I just can't compare machine translators. And I've never had much interest in Arabic. I got to recognizing "Israel," "Palestine," and a number, which was always a body count. Sigh.

I really think the future (maybe it is already the present) of machine translation is something like the so-called "data miner" I prototyped on my site: something that finds connections, reducing a huge corpus to a much smaller one, only then to be read and recast by bilingual humans. 

