"WE DELIVER UR NEEDED SOFTWARES TO ALL COUNTRIES AT CHEAP finally well". Another spam that made it through to one of my e-mail accounts. Curiously, there was no message body: maybe an attachment had been filtered out, or maybe it was just a first e-mail meant to fool the filter and clear the way for others to follow. What caught my attention were two features which, again, I could easily recognize, but which the anti-spam software had not been instructed to look for.
The first one is the sender: "Patty Bobby" firstname.lastname@example.org. "Patty Bobby" is already a bad sign, but it would be hard to devise an algorithm that decides whether a string is a real name or not. The address, however, is worse: email@example.com. OK, people can be creative with e-mail addresses, but this is clearly a random sequence of letters and numbers. There is not a single vowel, the consonants do not come from any easily recognizable word, and it does not look like hacker slang either. It is just a made-up address, and a very poorly made one. It would be too strict to expect the address to match the name "Patty Bobby", but this mismatch, combined with the fact that the address is such an implausible string, could be a good cue for detecting spam.
The second feature is an old trick: appending a meaningful, non-spam phrase to the subject line after the spam part. "finally well" is a perfectly harmless title, and in this case it may have made the message look less like spam to the filter. Again, some semantic analysis, even a brute-force one, could help point out that the second part is totally bogus. The problem, once more, is that people can be very creative with subject lines.
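A minimal sketch of how the cues above might be checked, assuming plain character-level heuristics rather than anything a real anti-spam product is known to do. The function names, the vowel set (which counts "y" as a vowel), the four-consonant run limit and the "ALL CAPS then lowercase tail" subject test are hypothetical choices for illustration, and the sample addresses are invented.

```python
import re

# Hypothetical vowel set; "y" is counted as a vowel here.
VOWELS = set("aeiouy")
# Runs of letters outside VOWELS (a-z minus a, e, i, o, u, y).
CONSONANT_RUN = re.compile(r"[b-df-hj-np-tv-xz]+")


def local_part(address: str) -> str:
    """Return the lowercased part of an address before the '@'."""
    return address.split("@", 1)[0].lower()


def looks_random(address: str, max_run: int = 4) -> bool:
    """Flag a local part with no vowels or with an unusually long consonant run.

    The max_run threshold is an illustrative guess, not a tuned value.
    """
    local = local_part(address)
    letters = [c for c in local if c.isalpha()]
    if len(letters) < 4:
        return False  # too short to judge either way
    if not any(c in VOWELS for c in letters):
        return True
    longest = max((len(run) for run in CONSONANT_RUN.findall(local)), default=0)
    return longest > max_run


def name_mismatch(display_name: str, address: str) -> bool:
    """True if no word (3+ letters) of the display name occurs in the local part."""
    local = local_part(address)
    words = [w.lower() for w in re.findall(r"[A-Za-z]{3,}", display_name)]
    return bool(words) and not any(w in local for w in words)


def subject_case_shift(subject: str, min_caps: int = 3) -> bool:
    """True if the subject opens with at least min_caps ALL-CAPS words and
    then switches to an all-lowercase tail."""
    words = subject.split()
    i = 0
    while i < len(words) and words[i].isupper():
        i += 1
    tail = words[i:]
    return i >= min_caps and bool(tail) and all(w.islower() for w in tail)


if __name__ == "__main__":
    # Invented addresses; example.com is a reserved documentation domain.
    print(looks_random("xkcdrqwzt77@example.com"))  # True: no vowels at all
    print(looks_random("patty.bobby@example.com"))  # False
    print(name_mismatch("Patty Bobby", "xkcdrqwzt77@example.com"))  # True
    print(subject_case_shift(
        "WE DELIVER UR NEEDED SOFTWARES TO ALL COUNTRIES AT CHEAP finally well"
    ))  # True
```

The subject check only catches this particular shape of the trick (shouted spam followed by a lowercase tail); it is a crude stand-in for the semantic analysis mentioned above, not an implementation of it.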
The problem is not a trivial one, because a filter has to err on the safe side: a false positive (a legitimate e-mail marked as spam) can be very harmful, while a missed spam is more easily tolerated. Still, some very basic observations about language usage might come in handy in the fight against spam.
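Because a false positive costs more than a missed spam, one conservative way to use cues like these is to act only when several of them agree, and even then to label the message rather than discard it. The sketch below assumes the individual cues have already been computed as booleans; the cue names and the two-cue minimum are hypothetical.

```python
from __future__ import annotations


def verdict(cues: dict[str, bool], min_agreeing: int = 2) -> str:
    """Return 'suspicious: ...' only when at least min_agreeing cues fired.

    A single cue (e.g. an odd-looking address) is never enough on its own,
    so a legitimate sender with an unusual address is not penalized.
    """
    fired = [name for name, hit in cues.items() if hit]
    if len(fired) >= min_agreeing:
        return "suspicious: " + ", ".join(fired)
    return "pass"


if __name__ == "__main__":
    # Hypothetical cue values for a message like the "Patty Bobby" one.
    print(verdict({"random_address": True, "name_mismatch": True,
                   "subject_case_shift": True}))
    # Only one cue fires, so the message passes.
    print(verdict({"random_address": True, "name_mismatch": False,
                   "subject_case_shift": False}))
```

Returning a label instead of deleting the message leaves the final decision to the existing filter or to the user, which fits the requirement of staying on the safe side.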