2004-08-31

NYC Police Arrest Some 560 GOP Protesters

This headline isn't all that strange...until one discovers that the arrested protesters were actually protesting the GOP Convention. It would seem then that the protesters were probably not GOP members as the headline implies. The logical conclusion of the headline is that the GOP members were rowdy--as well as stupid enough to get arrested protesting their own convention.

Why not just say "Police Arrest 560 Anti-GOP Protesters" or "560 Protesters Arrested at GOP Convention"? I wonder what Grice would say about this headline, or better, Freud?


http://story.news.yahoo.com/news?tmpl=story&cid=536&e=3&u=/ap/20040901/ap_on_el_pr/cvn_protests

2004-08-28

Technosexual

Over the last year or so we gained the wonderful term "Metrosexual." Softhanders John Kerry and Edwards and other politicians clamored to let it be known they were indeed metrosexual. I think Howard Dean was actually an unmetrosexual, or unmetrosexualish. Anyway, the word always reminds me of that kind of scummy guy in Steinbeck's "Of Mice and Men"--the one that kept one of his hands in a glove full of vaseline so it would always be soft when he needed it.
Now we have "technosexuals". If you think you might be a technosexual you might check out the site below:
http://www.technosexual.org/
It probably won't be too long before we have linguisexuals--maybe even phonetisexuals and semantisexuals.

2004-08-27

Dirty Mnemonics?

It would appear that it is after all the medical students having all the fun--at least judging by the site linked below from St. Louis Medical School.
http://www.saintcyr.com/2001/mnemonic.html

Nothing beats the old classic list for remembering the branches of the superior thyroid artery:

MAY muscular
I infrahyoid
SOFTLY superior laryngeal
SQUEEZE sternomastoid
CHARLIE'S cricothyroid
GIRL glandular

Or, to remember the bones of the wrist:
SLOWLY scaphoid
LOWER lunate
TILLY'S triquetrum
PANTS pisiform
TO trapezium
THE trapezoid
CURLY capitate
HAIRS hamate

Yes, this is all quite exciting. The best quasi-mnemonic I ever used as a student of language was to learn the dative/accusative prepositions of German to the tune of "I'm too Sexy". However, this is quite lame compared to the St. Louis med students.



2004-08-24

Britney Spears and Zipf's law


Spellings (including the correct one) of 'Britney Spears' as detected by the Google spellchecker, with their respective rankings on a log scale.

Yep, it seems that people do not misspell at random. Not that surprising or insightful, but kinda cute.
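For anyone who wants to try the rank/frequency idea at home, here is a minimal sketch. The counts below are invented purely for illustration (they are NOT Google's figures), and the helper names `rank_frequency` and `loglog_slope` are my own. Under Zipf's law, count is roughly proportional to 1/rank, so the slope of log(count) against log(rank) should come out near -1.

```python
import math

# Hypothetical counts, invented for illustration only (not Google's data).
counts = {
    "britney spears": 100000,
    "brittany spears": 10000,
    "brittney spears": 5000,
    "britany spears": 2500,
    "britny spears": 1250,
}

def rank_frequency(counts):
    """Return (rank, spelling, count) tuples, most frequent first."""
    ordered = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)
    return [(rank, s, c) for rank, (s, c) in enumerate(ordered, start=1)]

def loglog_slope(table):
    """Least-squares slope of log(count) against log(rank)."""
    xs = [math.log(r) for r, _, _ in table]
    ys = [math.log(c) for _, _, c in table]
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den

table = rank_frequency(counts)
for rank, spelling, count in table:
    print(rank, spelling, count)
print("slope:", round(loglog_slope(table), 2))
```

A slope far from -1 on real data would not be shocking, since misspellings of a single name are a very small and peculiar "vocabulary".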

Misspelling 'Britny Spears'

It seems that the folks at Google did part of the job of detecting common misspellings in a purely statistical way: Britney Spears spelling correction: "britny spears"

By the way, I found the page while checking the correct spelling of 'misspellings', something I usually do when in doubt about common words. For less common words, you had better rely on Merriam-Webster or an equivalent.

Cursing leaders: Lenin, Stalin, Khrushchev, Brezhnev, Yeltsin, Putin (until he finished 6th grade)

Professor of linguistics Tatiana Akhmetova on the role of cursing in Russian politics and culture. "I can call many names of the people from the cultural elite of the past and the present who curse much", says professor of linguistics Tatiana Akhmetova who has been studying Russian cursing words all her life.

http://english.pravda.ru/main/18/88/351/13831_cursing.html

I think I may have found my new calling as a linguistics student. In the future, I will either study curse words or I will peruse www.hotornot.com in search of sexy names. Maybe I can find some way to combine the two.

Bioinformatics techniques and spam

It seems that the fight against spam is a tough one, and not only Microsoft but also IBM is investing heavily in it. The latest news is the use of DNA sequencing algorithms to detect spam:

Instead of chains of characters representing DNA sequences, the research group fed the algorithm 65,000 examples of known spam. Each email was treated as a long, DNA-like chain of characters. Teiresias identified six million recurring patterns in this collection, such as "Viagra".


And it seems that the new algorithm is quite aware of the spammer tricks:

Chung-Kwei deals with common spammer strategies to dodge pattern-recognition schemes, such as replacing the s with a $, as in "increa$e your $ex power" using its built-in tolerance for different, but functionally equivalent, DNA sequences.

The success rate is 97%, which is quite good and probably better than most speech recognition algorithms. False positives run at around 1 in 6000, also not bad at all.

One possible flaw is that the algorithm has to let through large messages with few spam-like sequences. It is very easy to imagine spammers simply adding a load of gibberish at the end of the e-mail to lower the ratio of spam-like text to good text. The position of the spam sequences certainly counts too. I wonder whether in the future we will have to be careful about e-mail content. If a guy advises a friend to try Viagra in a short message, it might be treated as spam...

Also, there is no mention of the consonant/vowel multiplying technique that I mentioned in the other post.
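The dilution worry is simple arithmetic, and a toy scorer makes it concrete. Everything here is an assumption for illustration: the term list, the idea of scoring by the fraction of matching tokens, and the name `spam_ratio`. The article does not say how Chung-Kwei actually weighs matches.

```python
SPAM_TERMS = {"viagra", "discount", "cheapest"}  # hypothetical pattern list

def spam_ratio(text):
    """Fraction of whitespace-separated tokens that are in SPAM_TERMS."""
    tokens = text.lower().split()
    if not tokens:
        return 0.0
    hits = sum(1 for t in tokens if t.strip(".,!?") in SPAM_TERMS)
    return hits / len(tokens)

short = "cheapest viagra discount"
# Same three spam words, followed by 27 filler words.
padded = short + " " + " ".join(["lorem"] * 27)
friendly = "you should try viagra"

print(spam_ratio(short))
print(spam_ratio(padded))
print(spam_ratio(friendly))
```

With this naive score the padded spam (3 hits in 30 tokens) looks cleaner than the friendly four-word note (1 hit in 4 tokens), which is exactly the problem.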

One funny note: to train the algorithm, non-spam e-mails are used. These are called 'ham'.

Article can be read here.

2004-08-22

Being a non-native English speaker

My friends probably have several tales of my confusions and mix-ups in English, still a foreign language for me. This time it was a headline that caught me off guard. A second reading was necessary to figure it out:

"Vietnam Vet Says Has No Proof for Claim Kerry Lied"

Despite being aware of the whole Kerry issue, my first reading was "Vietnam Veterinarian", which works up to "Claim". Can we call this a naive/non-native garden path?

As a bonus, I must confess that it took me quite some time to figure out the meaning of GOP and POW. Now, I just use Google:

Anti-spam technology improvement

I do not receive that much spam on most of my e-mail accounts, mainly because of anti-spam server tools. I never relied much on automatic spam tagging because of false positives, but they are rarer now (it still happens once in a while with requested commercial e-mail). One message I received on my ISP account, correctly tagged as spam, is curious:

''Cheeapest Medicaationns
High Qua1ity
shiiiip to all countriies
70% off discccountt


Cl1ick to ennjoy our offfeer''

The name of the fake sender is also curious: "Claretta Masako", a mix of an Italian and a Japanese name? But it could pass for a perfectly good American name. Anyway, even with the intentional spelling eeeerrrorrrrs, it was tagged as spam. And it did not contain any of the easily identifiable spam terms like "Viagra", "low interest rates", "size does matter", etc. Another possibility was blocking by IP, but I doubt it, since the message is apparently from Cox Communications, and it would be really bad if my provider blocked ALL e-mail coming from them. Now it occurred to me that the filter could use both pieces of information...

The point is that humans can easily recognize the trick, but in a language like English, which allows lots of doubled consonants (and vowels), an algorithm to detect the trick is tricky, especially when the spammer is also putting numbers inside the words. A pure dictionary approach is not feasible, due to the large number of possible variants. Maybe a dictionary approach combined with some good string matching (regular expressions), probably in a quantitative fashion, would work, but I am not sure about that either. I doubt there are people with a linguistic background helping to improve anti-spam technology, and I doubt they are really necessary at all, but it certainly has a lot to do with language.
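A rough sketch of the "dictionary plus regular expressions" idea, just to see how far it gets on the message above. The word list, the digit map, and the names `squeeze` and `decode` are all hypothetical. The trick is to collapse every run of repeated characters on both sides, so the dictionary does not have to list each stretched variant.

```python
import re

# Hypothetical digit map and word list, for illustration only.
DIGIT_MAP = str.maketrans({"1": "l", "0": "o", "3": "e"})
DICTIONARY = ["cheapest", "medications", "quality", "ship", "countries",
              "discount", "click", "enjoy", "offer"]

def squeeze(word):
    """Lowercase, map look-alike digits to letters, collapse repeated characters."""
    word = word.lower().translate(DIGIT_MAP)
    return re.sub(r"(.)\1+", r"\1", word)

# Index the dictionary by squeezed form, so "offer" and "offfeer" share a key.
INDEX = {squeeze(w): w for w in DICTIONARY}

def decode(text):
    """Map each token to a dictionary word when their squeezed forms agree."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [INDEX.get(squeeze(t), t) for t in tokens]

print(decode("Cheeapest Medicaationns"))
print(decode("High Qua1ity"))
print(decode("Cl1ick to ennjoy our offfeer"))
```

The obvious weakness is collisions: after squeezing, "of" and "off" (or "to" and "too") become the same key, so a real filter would need something more quantitative on top of this.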

LO

2004-08-20

How Language Shapes Math

Members of a tiny tribe in the Amazon jungle that has no words for numbers beyond two can't conceptualize numbers any better than chimps or human infants do, a new study has found. The research attempts to cast light on a long-standing puzzle among linguists: whether concepts can exist without words to express them.
http://sciencenow.sciencemag.org/cgi/content/full/2004/819/1

That seems to be hot and controversial. We got this one before the Language Log guys, but comments will have to wait a bit longer. In case direct access is not possible, go through Google News (first and second hits):

http://news.google.com/news?sourceid=navclient&ie=UTF-8&q=language+math



2004-08-19

Poms want Aussies to talk proper

STONE the bloody crows! Wotcha mean, we can't speak English? Australians wanting to become British citizens must prove they can speak English under new rules introduced by the British Government. [...] "Just because someone's born in an English-speaking country doesn't mean to say they're exempt from these standards of proof," a Home Office spokesman said.


http://www.news.com.au/common/story_page/0,4057,10507795%255E421,00.html

If sounding sexy is the name of the game, choose your vowels carefully

An MIT linguist, Amy Perfors, found it out by posting photos with fake names on the Web site, "Hot or Not", which allows the face police to rate strangers' looks. She found that men's photos tagged with "front vowel" names (say, Matt) were rated as more attractive than the same photos labeled with "back vowel" names (Paul). The opposite was true for women. (Rose: not sexy. I think.)


I don't know what those zany MIT linguists will come up with next. My feelings are mixed on this study, as my first name vowel is front, but not tensed. I think this would make me sexy. On the other hand, I've never been accused of being such.

http://www.clarionledger.com/apps/pbcs.dll/article?AID=/20040819/COL0204/408190349/1023/FEAT05

Nuclear Data Found Missing From New Mexico

This one gets posted just because it's another great AP headline. Having read the article, I still do not know if it is previously lost nuclear data that is now found, previously not lost nuclear data that is now missing, or some other nuclear data that has been found missing.

http://story.news.yahoo.com/news?tmpl=story2&u=/ap/20040820/ap_on_go_ca_st_pe/nuclear_security

2004-08-10

Fancy IPA

An interactive IPA chart by Eric Armstrong and Paul Meier. Very nice job using Flash technology. One difference in this chart is that it includes consonants in coda position, both released and unreleased where possible (the released version has added aspiration). I did not try to analyze it, but the vowel [a] used in the consonant chart sounds too nasal even in non-nasal contexts. The chart also includes diphthongs and triphthongs in Received Pronunciation and General American (whatever that may be).

http://www.paulmeier.com/ipa/charts.html

The official chart (without sounds):

http://www.arts.gla.ac.uk/IPA/fullchart.html

2004-08-09

Gorilla Seeks Help Using Sign Language

WOODSIDE, Calif. - When Koko the gorilla used the American Sign Language gesture for pain and pointed to her mouth, 12 specialists, including three dentists, sprang into action. The result? Her first full medical examination in about 20 years, an extracted tooth and a clean bill of health.

[...]

Koko and Ndume, her partner of 11 years (he doesn't "speak"), have been trying unsuccessfully to have a baby

I'm glad to hear Koko is okay and that those ASL signs are coming in handy. Best wishes to Koko and Ndume in their quest for parenthood.

http://story.news.yahoo.com/news?tmpl=story&cid=519&e=1&u=/ap/koko_s_health






Via Cell, Help's on the Way for Bad Dates

A quick note here on a new career path for linguistics PhDs:

[...] fake "rescue" calls — now being offered by two cell phone providers, Cingular Wireless and Virgin Mobile USA. In an era of Internet-set dates, it's just customer service — a hip way to wiggle out of an uncomfortable encounter. [...]

For both Cingular and Virgin Mobile, the prerecorded messages are created at a high-tech central command in California's Silicon Valley. There, five people with doctorates in linguistics dream up excuses for folks to repeat before suddenly dropping a date gone sour.

http://story.news.yahoo.com/news?tmpl=story&ncid=716&e=9&u=/ap/20040808/ap_on_hi_te/date_rescue_calls

2004-08-08

Rewriting the Bible was never so hip...

An interesting piece on an interesting young man's interpretation of King James. It ain't your father's King James, though. I've heard rumors of an Ebonics version of King James too, but I've never actually seen it. Until I do, I suppose I will have to make do with the effort of Welsh performance artist Rob Lacey, from "The Word on the Street":

"Genesis 1:1-2: First off, nothing ... but God. No light, no time, no substance, no matter. Second off, God says the word, and WHAP! Stuff everywhere! The cosmos in chaos: no shape, no form, no function -- just darkness ... total. And floating above it all, God's Holy Spirit, ready for action."

Not exactly the King James Bible, eh?

http://www.kentucky.com/mld/heraldleader/living/religion/9333338.htm