Spammers are increasingly obfuscating message content by misspelling spam keywords
Many spam-filtering techniques work by searching for patterns in the headers or bodies of messages. For instance, a user may decide that all e-mail they receive with the word “Viagra” in the subject line is spam, and instruct their mail program to delete all such messages automatically. To defeat such filters, the spammer may intentionally misspell commonly filtered words or insert other characters into them.
The principle of this method is to leave the word readable to humans (who can easily recognize the intended word despite the misspelling) while making it unlikely to be recognized by a program that matches literal strings. This is only somewhat effective, because modern filter patterns are designed to recognize blacklisted terms in their various misspelled forms. Other filters target the obfuscation methods themselves, such as punctuation or numerals placed in unusual positions, for example within a word.
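To make the difference concrete, here is a minimal Python sketch of the two approaches just described: a literal keyword match, and a match performed after undoing common character substitutions. The look-alike table, the one-word blacklist, and the function names are my own illustrative assumptions, not the rules of SpamAssassin or any other real filter.

```python
import re

# Hypothetical look-alike table; a real filter would need a far larger one.
LOOKALIKES = str.maketrans({
    "1": "i", "!": "i", "|": "i",
    "@": "a", "4": "a",
    "0": "o", "3": "e",
    "$": "s", "5": "s",
})

BLACKLIST = {"viagra"}  # illustrative only


def normalize(text):
    """Lowercase, map look-alike characters to letters, drop other symbols."""
    text = text.lower().translate(LOOKALIKES)
    # Remove anything that is not a letter or whitespace,
    # so "v.i.a.g.r.a" collapses to "viagra".
    return re.sub(r"[^a-z\s]", "", text)


def literal_match(subject):
    """The naive filter: look for the blacklisted word exactly as spelled."""
    lowered = subject.lower()
    return any(word in lowered for word in BLACKLIST)


def normalized_match(subject):
    """Look for the blacklisted word after undoing common substitutions."""
    cleaned = normalize(subject)
    return any(word in cleaned for word in BLACKLIST)


def looks_obfuscated(subject):
    """Flag digits or symbols wedged between letters (the obfuscation itself).

    Apostrophes and hyphens are allowed so "don't" and "e-mail" pass.
    """
    return re.search(r"[A-Za-z][^A-Za-z\s'\-]+[A-Za-z]", subject) is not None
```

With these definitions, `literal_match("Cheap V1@gra")` is false while `normalized_match("Cheap V1@gra")` is true. The obfuscation check is deliberately crude: it would also flag something harmless like a domain name, so in practice it would be one signal among several rather than a verdict on its own.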
(Note: Using the most common variations, it is possible to spell “Viagra” in over 1.3 × 10^45 ways.[29])
So, how do we get around such spam techniques?
Most of the spam that sneaks into my inbox past SpamAssassin and my Bayesian spam filter gets there because almost every word in the message is intentionally misspelled. With no recognizable content for the filter to work on, the messages get through. So how about a spam filter that works by spell check? If more than 50% of the words are misspelled, it’s a good bet that the message is spam, or is in a language I can’t read anyway.
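A rough sketch of that spell-check idea in Python might look like the following. The function names and the tiny demo word set are assumptions of mine for illustration; a real version would load a full word list (many Unix systems ship one at /usr/share/dict/words) and would need some way to handle names, URLs and jargon.

```python
import string


def misspelled_ratio(message, dictionary):
    """Return the fraction of words in message not found in dictionary."""
    # Split on whitespace and strip surrounding punctuation, so "n0w!!"
    # is checked as "n0w" and "V1agra" stays a single token.
    tokens = [t.strip(string.punctuation) for t in message.lower().split()]
    words = [t for t in tokens if t]
    if not words:
        return 0.0
    unknown = sum(1 for w in words if w not in dictionary)
    return unknown / len(words)


def looks_like_spam(message, dictionary, threshold=0.5):
    """Flag the message when more than threshold of its words are unknown."""
    return misspelled_ratio(message, dictionary) > threshold


# Tiny demo dictionary (an assumption, for illustration only).
DEMO_WORDS = {"buy", "cheap", "now", "the", "meeting", "is", "at", "noon"}

print(looks_like_spam("Buy ch3ap V1agra n0w!!", DEMO_WORDS))
print(looks_like_spam("The meeting is at noon.", DEMO_WORDS))
```

In the first message three of the four words are missing from the demo dictionary, so it crosses the 50% threshold; the second message has no unknown words and passes. The threshold is exposed as a parameter because 50% is only my starting guess, and a short message full of proper names could easily trip it.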