HELP! Spam filtering in MS Outlook...


Matt Bourne

Guys,

I get heaps of spam from senders with names in the following format:

Frank J. Brown
Geoff R. Jones
James L. White

I need to get a tool, third-party I assume, for Outlook XP that could
detect mail from senders in that format. Of course, it would still
need to let through mail from senders such as:

Frank Brown
Geoff Jones
James White

Is there something out there where you can specify the format to filter,
i.e. xxxx y. zzzzz for a sender name?

Thanks in advance,
Matt
 

Jack

Matt said:
Is there something out there where you can specify the format to
filter, ie xxxx y. zzzzz for a sender name?

SpamPal, with the regular-expression plug-in.

A regular expression that would match sender-names of the form you
indicated is:
[A-Z][a-z]+\s[A-Z]\.[A-Z][a-z]+

That's a capitalised first name, a space, a capitalised initial, a dot,
a space, and a capitalised surname. It would match
"Aa B. Cc"
but not
"aa B. cc", "A. B. Cccc", "Aaaa Bbbb", "A. Bbb", "Aaa Bbb Ccc".
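Running the pattern exactly as posted through a regex engine is a quick way to check it against the examples listed above. The sketch below uses Python's re module purely as a convenient engine (an assumption; the SpamPal plug-in's regex flavour may differ in details) and tests whole names with fullmatch. It shows that, as written, the pattern does not match "Aa B. Cc" either, because nothing in it consumes the space after the dot; it would only match a squashed form such as "Aa B.Cc".

```python
import re

# The pattern exactly as posted above (note: no \s after the escaped dot).
POSTED = re.compile(r"[A-Z][a-z]+\s[A-Z]\.[A-Z][a-z]+")

should_match = ["Aa B. Cc"]
should_not_match = ["aa B. cc", "A. B. Cccc", "Aaaa Bbbb", "A. Bbb", "Aaa Bbb Ccc"]

for name in should_match + should_not_match:
    # fullmatch: the whole display name must fit the pattern
    print(f"{name!r:16} -> {bool(POSTED.fullmatch(name))}")

# The posted pattern only accepts a name with no space after the initial's dot.
print(f"{'Aa B.Cc'!r:16} -> {bool(POSTED.fullmatch('Aa B.Cc'))}")
```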


HTH.
Jack.
 

Mike Easter

Jack said:
A regular expression that would match sender-names of the form you
indicated is:
[A-Z][a-z]+\s[A-Z]\.[A-Z][a-z]+

That's a capitalised first name, space, a capitalised initial, dot
space, a capitalised surname. It would match
"Aa B. Cc"
but not
"aa B. cc", "A. B. Cccc", "Aaaa Bbbb", "A. Bbb", "Aaa Bbb Ccc".

One of these days I'm going to study that regex tutorial I saw
somewhere. At the present time, it might as well be Venusian to me.
I'm always fascinated by how powerful it is. I see that middle
expression in there, but...

I also tho't that was a dumb rule Matt was trying to make.
 

Jack

Mike said:
Jack said:
A regular expression that would match sender-names of the form you
indicated is: [A-Z][a-z]+\s[A-Z]\.[A-Z][a-z]+

That's a capitalised first name, space, a capitalised initial, dot
space, a capitalised surname. It would match "Aa B. Cc" but not
"aa B. cc", "A. B. Cccc", "Aaaa Bbbb", "A. Bbb", "Aaa Bbb Ccc".


One of these days I'm going to study that regex tutorial I saw
somewhere. At the present time, it might as well be Venusian to me.
I'm always fascinated by how powerful it is. I see that middle
expression in there, but...

I also tho't that was a dumb rule Matt was trying to make.

I know - it looks like line-noise (but not as much as Perl looks like
line-noise!). I don't try to debug regexes longer than about 80
characters - it's quicker and easier to rewrite them from scratch.

There's an error in the expression I posted; it should read:
[A-Z][a-z]+\s[A-Z]\.\s[A-Z][a-z]+

That is, there's a missing \s (which matches whitespace) after the escaped dot.
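For anyone who finds the one-line form as unreadable as Mike does, the same corrected pattern can be written in "verbose" mode with a comment on each piece. This is a sketch in Python's re (an assumption; other engines spell verbose mode differently, and some lack it). It also assumes the filter is applied to the sender's display name alone, e.g. "Frank J. Brown", not to the full From header with the address attached, which fullmatch would reject. Note that [A-Z] and [a-z] are ASCII-only, so accented names will not match.

```python
import re

# The corrected pattern, written with re.VERBOSE so each piece can carry a comment.
# In VERBOSE mode, unescaped whitespace in the pattern is ignored, so the
# spacing below is only for readability.
INITIALLED_NAME = re.compile(r"""
    [A-Z][a-z]+   # capitalised first name, e.g. "Frank"
    \s            # whitespace
    [A-Z]\.       # capitalised initial plus a literal (escaped) dot, e.g. "J."
    \s            # whitespace: the piece missing from the first version
    [A-Z][a-z]+   # capitalised surname, e.g. "Brown"
""", re.VERBOSE)

# Sender names taken from Matt's post.
senders = ["Frank J. Brown", "Geoff R. Jones", "James L. White",
           "Frank Brown", "Geoff Jones", "James White"]

for sender in senders:
    verdict = "filter" if INITIALLED_NAME.fullmatch(sender) else "keep"
    print(f"{sender:15} {verdict}")
```

The three "First M. Last" names come out as "filter" and the three "First Last" names as "keep", which is the split Matt asked for.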

There's a good chapter on regular expressions in the Oh Really book
"JavaScript: The Definitive Guide" (the Rhino Book). And you can
experiment with regexes here:

http://www.cacas.org/java/gnu/regexp/

I always end up using either or both of the test applet (which you can
download and install on your local webpage) and the syntax notes, which
are the most concise summary of regex syntax that I'm aware of.

There's also a pretty terse regex syntax-summary in the Javadoc for
Sun's Java regex:

http://java.sun.com/j2se/1.4.2/docs/api/java/util/regex/Pattern.html

What I haven't ever found is a good treatise on what regular expressions
are *not* suited for, nor why exactly it's almost impossible to parse
HTML (or XML) using regular expressions. I suppose if I ever did find
one, it would go over my head :-( I think they are unsuitable for
"languages" that are recursive - for example, it seems to be really hard
to express within a regex a requirement such as matching a <DIV> within
exactly one enclosing <DIV>, but not necessarily as its first child. And
<!--comments--> are really hard to deal with safely using regular
expressions.
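A common workaround for the nested-DIV case is to let a regex do only the tokenising (finding the tags) and to track nesting in an ordinary counter, which is exactly the unbounded memory a finite-state machine lacks. The sketch below is illustrative only: the function name is hypothetical, and it deliberately ignores comments, DIV-like text inside scripts or attribute values, and malformed markup, so it is not a substitute for a real HTML parser.

```python
import re

# Matches an opening or closing DIV tag; group 1 is "/" for a closing tag.
# \b stops it matching longer tag names such as <divider>.
DIV_TAG = re.compile(r"<(/?)div\b[^>]*>", re.IGNORECASE)

def divs_inside_exactly_one_div(html: str) -> list[str]:
    """Return the opening <div> tags that have exactly one enclosing <div>."""
    depth = 0      # number of DIVs currently open
    found = []
    for m in DIV_TAG.finditer(html):
        if m.group(1):             # closing tag </div>
            depth = max(depth - 1, 0)
        else:                      # opening tag <div ...>
            if depth == 1:         # exactly one DIV encloses this one
                found.append(m.group(0))
            depth += 1
    return found

sample = '<div><p>x</p><div id="a"><div id="b"></div></div><div id="c"></div></div>'
print(divs_inside_exactly_one_div(sample))
```

Here "a" and "c" are reported (neither is the outer DIV's first child, and "a" is not first in the document either), while "b" is skipped because it sits inside two DIVs.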

Anyhow, it seems to me that they're best suited to the simpler, ad-hoc
parsing jobs, at which they excel. And as long as you don't screw up,
and you avoid the dodgier regex extensions such as look-ahead or
back-references, they're surprisingly efficient, because then a compiled
regex is equivalent to a finite-state machine, which can match in a
single pass over the input.
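To make the finite-state-machine point concrete, here is a hand-built, table-driven automaton equivalent to the corrected pattern [A-Z][a-z]+\s[A-Z]\.\s[A-Z][a-z]+ when matched against a whole string. It is a sketch, not what any particular regex library generates internally; the state numbering and the helper names are made up for illustration, and str.isspace() stands in for \s. It reads each character once and remembers nothing but the current state number, so matching time is linear in the length of the name.

```python
import re

def char_class(c: str) -> str:
    """Map a character to the input class the automaton cares about."""
    if "A" <= c <= "Z":
        return "upper"
    if "a" <= c <= "z":
        return "lower"
    if c.isspace():
        return "space"
    if c == ".":
        return "dot"
    return "other"

# Transition table: state -> {input class -> next state}.
# A missing entry means the input is rejected.
TRANSITIONS = {
    0: {"upper": 1},              # start: capital of the first name
    1: {"lower": 2},              # at least one lower-case letter
    2: {"lower": 2, "space": 3},  # more first-name letters, or the space
    3: {"upper": 4},              # the initial
    4: {"dot": 5},                # the dot after the initial
    5: {"space": 6},              # the space after the dot
    6: {"upper": 7},              # capital of the surname
    7: {"lower": 8},              # at least one lower-case letter
    8: {"lower": 8},              # more surname letters
}
ACCEPTING = {8}

def dfa_fullmatch(text: str) -> bool:
    """True if the whole of text is accepted by the automaton."""
    state = 0
    for c in text:
        state = TRANSITIONS[state].get(char_class(c))
        if state is None:         # no transition: reject immediately
            return False
    return state in ACCEPTING

# Cross-check against Python's regex engine on the thread's examples.
PATTERN = re.compile(r"[A-Z][a-z]+\s[A-Z]\.\s[A-Z][a-z]+")
for name in ["Frank J. Brown", "Frank Brown", "Aa B. Cc", "A. B. Cccc"]:
    print(f"{name!r:16} dfa={dfa_fullmatch(name)} re={bool(PATTERN.fullmatch(name))}")
```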
 

Tan

Matt Bourne said:
Guys,

I get heaps of spam from senders with names in the following format:

Frank J. Brown
Geoff R. Jones
James L. White

I need to get a tool, third party i assume, for Outlook XP that could
possibly detect mail coming with the following format. Of course, they would
need to retain:

Frank Brown
Geoff Jones
James White

Is there something out there where you can specify the format to filter, ie
xxxx y. zzzzz for a sender name?

Thanks in advance,
Matt
 