Linguistic knowledge for language technology: what for?

This came from a recent article in CNET news, talking about machine translation:

In the past few years, however, researchers have switched to using statistical analysis to get the job done.

"It doesn't go through a deep understanding of the meaning of a sentence. It maps one word to another," Waibel said. "Increases in computer speed and power and databases have made this a winning approach...We essentially gave up trying to do the full semantics of this thing.

I hope no linguist still believes that linguistic knowledge can ever be the main force in language technology. It won't happen in useful applications. Disappointing? Maybe. Should linguists go deeper into statistical approaches? Probably not. Unless we all sell out!

The article stresses some weaknesses of the statistical approach, especially the lack of databases. If one wants a system that translates directly between some 100 languages or so, not even a tiny fraction of the needed data exists.
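
To put a rough number on the data problem: this is only my own back-of-the-envelope arithmetic, not something from the article. It assumes "directly" means one separate data set per translation direction, and the function names are made up for illustration.

```python
from math import comb


def directed_pairs(n: int) -> int:
    """Translation directions (source -> target) among n languages."""
    return n * (n - 1)


def unordered_pairs(n: int) -> int:
    """Language pairs, if one bilingual corpus can serve both directions."""
    return comb(n, 2)


n = 100  # "some 100 languages or so"
print(directed_pairs(n))   # 9900 directions
print(unordered_pairs(n))  # 4950 bilingual corpora
```

Even under the friendlier assumption that one corpus covers both directions, that is thousands of large parallel databases, most of them for language pairs nobody has ever collected data on.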

Another curious quote from the same Waibel:

"I was born German and spent my childhood in Spain and I speak German, English, Spanish, French and Latin," he said. "My wife is Japanese so I am sort of culturally messed up."

How can anyone claim to speak Latin? One can even have a deep knowledge of Latin, but speak it? With whom? Why? As an exercise or for showing off, I guess.


The 25 Funniest Country Music Song Titles

from the Tampa Tribune:
1. Get Your Tongue Outta My Mouth Cause I'm Kissing You Good-bye.
2. I Don't Know Whether To Kill Myself Or Go Bowling.
3. If I Can't Be Number One In Your Life, Then Number Two On You.
4. I Sold A Car To A Guy Who Stole My Girl, But It Don't Run So We're Even.
5. Mama Get A Hammer (There's A Fly On Daddy's Head).
6. If The Phone Don't Ring, You'll Know It's Me.
7. She's Actin' Single And I'm Drinkin' Doubles.
8. How Can I Miss You If You Won't Go Away.
9. I Keep Forgettin' I Forgot About You.
10. I Liked You Better Before I Knew You So Well.
11. I Still Miss You Baby, But My Aim's Gettin' Better.
12. I Wouldn't Take Her To A Dog Fight, Cause I'm Afraid She'd Win.
13. I'll Marry You Tomorrow, But Let's Honeymoon Tonight.
14. I'm So Miserable Without You; It's Like Having You Here.
15. I've Got Tears In My Ears From Lying On My Back Cryin' Over You.
16. If I Had Shot You When I Wanted To, I'd Be Out By Now.
17. My Head Hurts, My Feet Stink, And I Don't Love You.
18. My Wife Ran Off With My Best Friend And I Sure Do Miss Him.
19. Please Bypass My Heart.
20. She Got The Ring And I Got The Finger.
21. You Done Tore Out My Heart And Stomped That Sucker Flat.
22. You're the Reason Our Kids Are So Ugly.
23. Her Teeth Were Stained, But Her Heart Was Pure.
24. She's Looking Better After Every Beer.
25. I Ain't Never Gone To Bed With An Ugly Woman, But I Sure Woke Up With a Few.

Two that didn't make the top 25:
She's my ex and I don't know why (Salmon)
The last thing I gave her was the bird (George Jones)


Blog search

The latest (or next to latest, impossible to know) Google service/tool is "Google Blog Search". The linguistic interest? Well, it could be a way to search texts that tend to be more informal. The bad thing is that even blogs are getting spam or being put online only to advertise porn or easy money, but that is noise you have to deal with whenever you want to get usage statistics from the web. I'll start experimenting with it soon and, if I remember, I'll get back to report how it goes.

Another point related to the blog search: what are Google's plans for it? Word on the net is that Google might eventually get rid of blog content in the main index. The reason for such a move is that blogs are supposedly polluting search results because they tend to keep talking about the same stories, and that skews Google's algorithms too much. I have no idea if the move will ever happen, but I see search becoming more and more of a specialized business, which really sucks. I mean, Google Scholar is great, but who knows whether we will need Google Math, Google Physics, Google Language (yeah, right). The problem is, for how long is it going to be possible to get good search results with so many billions of indexed pages? But I will not make any predictions.