mirror of https://git.savannah.gnu.org/git/emacs.git synced 2025-01-13 16:38:14 +00:00

2006-04-20  Reiner Steib  <Reiner.Steib@gmx.de>

	* gnus.texi (Spam Statistics Package): Fix typo in @pxref.
	(Splitting mail using spam-stat): Fix @xref.

2006-04-20  Chong Yidong  <cyd@stupidchicken.com>

	* gnus.texi (Spam Package): Major revision of the text.  Previously
	this node was "Filtering Spam Using The Spam ELisp Package".
Reiner Steib 2006-04-20 20:14:50 +00:00
parent 5a02d811ed
commit 93f86ee0b1
2 changed files with 317 additions and 286 deletions


@@ -1,3 +1,13 @@
2006-04-20 Reiner Steib <Reiner.Steib@gmx.de>
* gnus.texi (Spam Statistics Package): Fix typo in @pxref.
(Splitting mail using spam-stat): Fix @xref.
2006-04-20 Chong Yidong <cyd@stupidchicken.com>
* gnus.texi (Spam Package): Major revision of the text. Previously
this node was "Filtering Spam Using The Spam ELisp Package".
2006-04-20 Carsten Dominik <dominik@science.uva.nl>
* org.texi: (Time stamps): Better explanation of the purpose of
@@ -8,7 +18,7 @@
2006-04-18 J.D. Smith <jdsmith@as.arizona.edu>
* misc.texi (Shell Ring): Added notes on saved input when
navigating off the end of the history list.
navigating off the end of the history list.
2006-04-18 Chong Yidong <cyd@mit.edu>


@@ -799,7 +799,8 @@ Various
* Moderation:: What to do if you're a moderator.
* Image Enhancements:: Modern versions of Emacs/XEmacs can display images.
* Fuzzy Matching:: What's the big fuzz?
* Thwarting Email Spam:: A how-to on avoiding unsolicited commercial email.
* Thwarting Email Spam:: Simple ways to avoid unsolicited commercial email.
* Spam Package:: A package for filtering and processing spam.
* Other modes:: Interaction with other modes.
* Various Various:: Things that are really various.
@@ -818,7 +819,8 @@ Image Enhancements
* X-Face:: Display a funky, teensy black-and-white image.
* Face:: Display a funkier, teensier colored image.
* Smileys:: Show all those happy faces the way they were meant to be shown.
* Smileys:: Show all those happy faces the way they were
meant to be shown.
* Picons:: How to display pictures of what you're reading.
* XVarious:: Other XEmacsy Gnusey variables.
@@ -828,28 +830,19 @@ Thwarting Email Spam
* Anti-Spam Basics:: Simple steps to reduce the amount of spam.
* SpamAssassin:: How to use external anti-spam tools.
* Hashcash:: Reduce spam by burning CPU time.
* Filtering Spam Using The Spam ELisp Package::
* Filtering Spam Using Statistics with spam-stat::
Filtering Spam Using The Spam ELisp Package
Spam Package
* Spam ELisp Package Sequence of Events::
* Spam ELisp Package Filtering of Incoming Mail::
* Spam ELisp Package Global Variables::
* Spam ELisp Package Configuration Examples::
* Blacklists and Whitelists::
* BBDB Whitelists::
* Gmane Spam Reporting::
* Anti-spam Hashcash Payments::
* Blackholes::
* Regular Expressions Header Matching::
* Bogofilter::
* ifile spam filtering::
* spam-stat spam filtering::
* SpamOracle::
* Extending the Spam ELisp package::
* Spam Package Introduction::
* Filtering Incoming Mail::
* Detecting Spam in Groups::
* Spam and Ham Processors::
* Spam Package Configuration Examples::
* Spam Back Ends::
* Extending the Spam package::
* Spam Statistics Package::
Filtering Spam Using Statistics with spam-stat
Spam Statistics Package
* Creating a spam-stat dictionary::
* Splitting mail using spam-stat::
@@ -20797,7 +20790,8 @@ four days, Gnus will decay the scores four times, for instance.
* Fetching a Group:: Starting Gnus just to read a group.
* Image Enhancements:: Modern versions of Emacs/XEmacs can display images.
* Fuzzy Matching:: What's the big fuzz?
* Thwarting Email Spam:: A how-to on avoiding unsolicited commercial email.
* Thwarting Email Spam:: Simple ways to avoid unsolicited commercial email.
* Spam Package:: A package for filtering and processing spam.
* Other modes:: Interaction with other modes.
* Various Various:: Things that are really various.
@end menu
@@ -22479,8 +22473,6 @@ This is annoying. Here's what you can do about it.
* Anti-Spam Basics:: Simple steps to reduce the amount of spam.
* SpamAssassin:: How to use external anti-spam tools.
* Hashcash:: Reduce spam by burning CPU time.
* Filtering Spam Using The Spam ELisp Package::
* Filtering Spam Using Statistics with spam-stat::
@end menu
@node The problem of spam
@@ -22796,41 +22788,107 @@ hashcash cookies, it is expected that this is performed by your hand
customized mail filtering scripts. Improvements in this area would be
a useful contribution, however.
@node Filtering Spam Using The Spam ELisp Package
@subsection Filtering Spam Using The Spam ELisp Package
@node Spam Package
@section Spam Package
@cindex spam filtering
@cindex spam
The idea behind @file{spam.el} is to have a control center for spam detection
and filtering in Gnus. To that end, @file{spam.el} does two things: it
filters new mail, and it analyzes mail known to be spam or ham.
@dfn{Ham} is the name used throughout @file{spam.el} to indicate
non-spam messages.
The Spam package provides Gnus with a centralized mechanism for
detecting and filtering spam. It filters new mail, and processes
messages according to whether they are spam or ham. (@dfn{Ham} is the
name used throughout this manual to indicate non-spam messages.)
@menu
* Spam Package Introduction::
* Filtering Incoming Mail::
* Detecting Spam in Groups::
* Spam and Ham Processors::
* Spam Package Configuration Examples::
* Spam Back Ends::
* Extending the Spam package::
* Spam Statistics Package::
@end menu
@node Spam Package Introduction
@subsection Spam Package Introduction
@cindex spam filtering
@cindex spam filtering sequence of events
@cindex spam
You must read this section to understand how the Spam package works.
Do not skip, speed-read, or glance through this section.
@cindex spam-initialize
First of all, you @strong{must} run the function
@code{spam-initialize} to autoload @code{spam.el} and to install the
@code{spam.el} hooks. There is one exception: if you use the
@code{spam-use-stat} (@pxref{spam-stat spam filtering}) setting, you
should turn it on before @code{spam-initialize}:
@vindex spam-use-stat
To use the Spam package, you @strong{must} first run the function
@code{spam-initialize}:
@example
(setq spam-use-stat t) ;; if needed
(spam-initialize)
@end example
So, what happens when you load @file{spam.el}?
This autoloads @code{spam.el} and installs the various hooks necessary
to let the Spam package do its job. In order to make use of the Spam
package, you have to set up certain group parameters and variables,
which we will describe below. All of the variables controlling the
Spam package can be found in the @samp{spam} customization group.
First, some hooks will get installed by @code{spam-initialize}. There
are some hooks for @code{spam-stat} so it can save its databases, and
there are hooks so interesting things will happen when you enter and
leave a group. More on the sequence of events later (@pxref{Spam
ELisp Package Sequence of Events}).
There are two ``contact points'' between the Spam package and the rest
of Gnus: checking new mail for spam, and leaving a group.
You get the following keyboard commands:
Checking new mail for spam is done in one of two ways: while splitting
incoming mail, or when you enter a group.
The first way, checking for spam while splitting incoming mail, is
suited to mail back ends such as @code{nnml} or @code{nnimap}, where
new mail appears in a single spool file. The Spam package processes
incoming mail, and sends mail considered to be spam to a designated
``spam'' group. @xref{Filtering Incoming Mail}.
The second way is suited to back ends such as @code{nntp}, which have
no incoming mail spool, or back ends where the server is in charge of
splitting incoming mail. In this case, when you enter a Gnus group,
the unseen or unread messages in that group are checked for spam.
Detected spam messages are marked as spam. @xref{Detecting Spam in
Groups}.
@cindex spam back ends
In either case, you have to tell the Spam package what method to use
to detect spam messages. There are several methods, or @dfn{spam back
ends} (not to be confused with Gnus back ends!) to choose from: spam
``blacklists'' and ``whitelists'', dictionary-based filters, and so
forth. @xref{Spam Back Ends}.
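As an illustration, the following minimal sketch enables a single spam back end before initializing the Spam package. It assumes you have chosen Bogofilter (@pxref{Bogofilter}) and have the @command{bogofilter} program installed. Substitute the @code{spam-use-*} variable for the back end you actually use.

@example
;; @r{Sketch: enable one spam back end (Bogofilter, as an example)}
;; @r{before initializing the Spam package.}
(setq spam-use-bogofilter t)
(spam-initialize)
@end example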
In the Gnus summary buffer, messages that have been identified as spam
always appear with a @samp{$} symbol.
The Spam package divides Gnus groups into three categories: ham
groups, spam groups, and unclassified groups. You should mark each of
the groups you subscribe to as either a ham group or a spam group,
using the @code{spam-contents} group parameter (@pxref{Group
Parameters}). Spam groups have a special property: when you enter a
spam group, all unseen articles are marked as spam. Thus, mail split
into a spam group is automatically marked as spam.
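For example, the following sketch classifies two groups with the @code{spam-contents} parameter. It sets the parameter through the @code{gnus-parameters} variable (@pxref{Group Parameters}) and is modeled on the classification values used in the configuration examples of this chapter. The group names @samp{nnml:mail.misc} and @samp{nnml:spam} are hypothetical, so adjust the regular expressions to match your own groups.

@example
(setq gnus-parameters
      '(("^nnml:mail\\.misc$"
         ;; @r{Ham group (hypothetical name).}
         (spam-contents gnus-group-spam-classification-ham))
        ("^nnml:spam$"
         ;; @r{Spam group: unseen articles are marked as spam on entry.}
         (spam-contents gnus-group-spam-classification-spam))))
@end example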
Identifying spam messages is only half of the Spam package's job. The
second half comes into play whenever you exit a group buffer. At this
point, the Spam package does several things:
First, it calls @dfn{spam and ham processors} to process the articles
according to whether they are spam or ham. There is a pair of spam
and ham processors associated with each spam back end, and what the
processors do depends on the back end. At present, the main role of
spam and ham processors is for dictionary-based spam filters: they add
the contents of the messages in the group to the filter's dictionary,
to improve its ability to detect future spam. The @code{spam-process}
group parameter specifies what spam processors to use. @xref{Spam and
Ham Processors}.
If the spam filter failed to mark a spam message, you can mark it
yourself, so that the message is processed as spam when you exit the
group:
@table @kbd
@item M-d
@itemx M s x
@itemx S x
@@ -22838,189 +22896,103 @@ You get the following keyboard commands:
@kindex S x
@kindex M s x
@findex gnus-summary-mark-as-spam
@code{gnus-summary-mark-as-spam}.
Mark current article as spam, showing it with the @samp{$} mark.
Whenever you see a spam article, make sure to mark its summary line
with @kbd{M-d} before leaving the group. This is done automatically
for unread articles in @emph{spam} groups.
@item M s t
@itemx S t
@kindex M s t
@kindex S t
@findex spam-bogofilter-score
@code{spam-bogofilter-score}.
You must have Bogofilter installed for that command to work properly.
@xref{Bogofilter}.
@findex gnus-summary-mark-as-spam
Mark current article as spam, showing it with the @samp{$} mark
(@code{gnus-summary-mark-as-spam}).
@end table
Also, when you load @file{spam.el}, you will be able to customize its
variables. Try @code{customize-group} on the @samp{spam} variable
group.
@noindent
Similarly, you can unmark an article if it has been erroneously marked
as spam. @xref{Setting Marks}.
@menu
* Spam ELisp Package Sequence of Events::
* Spam ELisp Package Filtering of Incoming Mail::
* Spam ELisp Package Global Variables::
* Spam ELisp Package Configuration Examples::
* Blacklists and Whitelists::
* BBDB Whitelists::
* Gmane Spam Reporting::
* Anti-spam Hashcash Payments::
* Blackholes::
* Regular Expressions Header Matching::
* Bogofilter::
* ifile spam filtering::
* spam-stat spam filtering::
* SpamOracle::
* Extending the Spam ELisp package::
@end menu
@node Spam ELisp Package Sequence of Events
@subsubsection Spam ELisp Package Sequence of Events
@cindex spam filtering
@cindex spam filtering sequence of events
@cindex spam
You must read this section to understand how @code{spam.el} works.
Do not skip, speed-read, or glance through this section.
There are two @emph{contact points}, if you will, between
@code{spam.el} and the rest of Gnus: checking new mail for spam, and
leaving a group.
Getting new mail is done in one of two ways. You can either split
your incoming mail or you can classify new articles as ham or spam
when you enter the group.
Splitting incoming mail is better suited to mail backends such as
@code{nnml} or @code{nnimap} where new mail appears in a single file
called a @dfn{Spool File}. @xref{Spam ELisp Package Filtering of
Incoming Mail}.
For backends such as @code{nntp} there is no incoming mail spool, so
an alternate mechanism must be used. This may also happen for
backends where the server is in charge of splitting incoming mail, and
Gnus does not do further splitting. In these cases, use the
@code{spam-autodetect} and @code{spam-autodetect-methods} group
parameters (accessible with @kbd{G c} and @kbd{G p} as usual), or the
corresponding variables @code{gnus-spam-autodetect} and
@code{gnus-spam-autodetect-methods} (accessible with @kbd{M-x
customize-variable} as usual).
When @code{spam-autodetect} is used, it hooks into the process of
entering a group. Thus, entering a group with unseen or unread
articles becomes the substitute for checking incoming mail. Whether
only unseen articles or all unread articles will be processed is
determined by the variable @code{spam-autodetect-recheck-messages}. When set
to @code{t}, unread messages will be rechecked.
@code{spam-autodetect} grants the user at once more and less control
of spam filtering. The user will have more control over each group's
spam methods, so for instance the @samp{ding} group may have
@code{spam-use-BBDB} as the autodetection method, while the
@samp{suspect} group may have the @code{spam-use-blacklist} and
@code{spam-use-bogofilter} methods enabled. Every article detected to
be spam will be marked with the spam mark @samp{$} and processed on
exit from the group as normal spam. The user has less control over
the @emph{sequence} of checks than with @code{spam-split}.
When the newly split mail goes into groups, or messages are
autodetected to be ham or spam, those groups must be exited (after
entering, if needed) for further spam processing to happen. It
matters whether the group is considered a ham group, a spam group, or
is unclassified, based on its @code{spam-content} parameter
(@pxref{Spam ELisp Package Global Variables}). Spam groups have the
additional characteristic that, when entered, any unseen or unread
articles (depending on the @code{spam-mark-only-unseen-as-spam}
variable) will be marked as spam. Thus, mail split into a spam group
gets automatically marked as spam when you enter the group.
So, when you exit a group, the @code{spam-processors} are applied, if
any are set, and the processed mail is moved to the
@code{ham-process-destination} or the @code{spam-process-destination}
depending on the article's classification. If the
@code{ham-process-destination} or the @code{spam-process-destination},
whichever is appropriate, are @code{nil}, the article is left in the
current group.
If a spam is found in any group (this can be changed to only non-spam
groups with @code{spam-move-spam-nonspam-groups-only}), it is
processed by the active @code{spam-processors} (@pxref{Spam ELisp
Package Global Variables}) when the group is exited. Furthermore, the
spam is moved to the @code{spam-process-destination} (@pxref{Spam
ELisp Package Global Variables}) for further training or deletion.
You have to load the @code{gnus-registry.el} package and enable the
@code{spam-log-to-registry} variable if you want spam to be processed
no more than once. Thus, spam is detected and processed everywhere,
which is what most people want. If the
@code{spam-process-destination} is @code{nil}, the spam is marked as
expired, which is usually the right thing to do.
If spam can not be moved---because of a read-only backend such as
@acronym{NNTP}, for example, it will be copied.
If a ham mail is found in a ham group, as determined by the
@code{ham-marks} parameter, it is processed as ham by the active ham
@code{spam-processor} when the group is exited. With the variables
Normally, a ham message found in a non-ham group is not processed as
ham---the rationale is that it should be moved into a ham group for
further processing (see below). However, you can force these articles
to be processed as ham by setting
@code{spam-process-ham-in-spam-groups} and
@code{spam-process-ham-in-nonham-groups} the behavior can be further
altered so ham found anywhere can be processed. You have to load the
@code{gnus-registry.el} package and enable the
@code{spam-log-to-registry} variable if you want ham to be processed
no more than once. Thus, ham is detected and processed only when
necessary, which is what most people want. @xref{Spam ELisp Package
Configuration Examples}, for more on this.
@code{spam-process-ham-in-nonham-groups}.
If ham can not be moved---because of a read-only backend such as
@acronym{NNTP}, for example, it will be copied.
@vindex gnus-ham-process-destinations
@vindex gnus-spam-process-destinations
The second thing that the Spam package does when you exit a group is
to move ham articles out of spam groups, and spam articles out of ham
groups. Ham in a spam group is moved to the group specified by the
variable @code{gnus-ham-process-destinations}, or the group parameter
@code{ham-process-destination}. Spam in a ham group is moved to the
group specified by the variable @code{gnus-spam-process-destinations},
or the group parameter @code{spam-process-destination}. If these
variables are not set, the articles are left in their current group.
If an article cannot be moved (e.g., with a read-only back end such
as @acronym{NNTP}), it is copied.
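The following sketch sets both variables, using hypothetical @code{nnml} group names. Each entry is assumed to pair a regular expression, matched against the name of the group being exited, with a destination group. Check the documentation of each variable (@kbd{C-h v}) for the exact format your version of Gnus accepts.

@example
;; @r{Hypothetical groups: move ham found in nnml:spam to nnml:mail.misc,}
;; @r{and spam found in nnml ham groups to nnml:spam.}
(setq gnus-ham-process-destinations
      '(("^nnml:spam$" "nnml:mail.misc")))
(setq gnus-spam-process-destinations
      '(("^nnml:" "nnml:spam")))
@end example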
If an article is moved to another group, it is processed again when
you visit the new group. Normally, this is not a problem, but if you
want each article to be processed only once, load the
@code{gnus-registry.el} package and set the variable
@code{spam-log-to-registry} to @code{t}. @xref{Spam Package
Configuration Examples}.
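In an init file, this might look like the following minimal sketch. Depending on your Gnus version, the registry may need further setup, which is described in its own documentation.

@example
;; @r{Load the registry so that each article is processed only once.}
(require 'gnus-registry)
(setq spam-log-to-registry t)
@end example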
Normally, spam groups ignore @code{gnus-spam-process-destinations}.
However, if you set @code{spam-move-spam-nonspam-groups-only} to
@code{nil}, spam will also be moved out of spam groups, depending on
the @code{spam-process-destination} parameter.
The final thing the Spam package does is to mark spam articles as
expired, which is usually the right thing to do.
If all this seems confusing, don't worry. Soon it will be as natural
as typing Lisp one-liners on a neural interface@dots{} err, sorry, that's
50 years in the future yet. Just trust us, it's not so bad.
@node Spam ELisp Package Filtering of Incoming Mail
@subsubsection Spam ELisp Package Filtering of Incoming Mail
@node Filtering Incoming Mail
@subsection Filtering Incoming Mail
@cindex spam filtering
@cindex spam filtering incoming mail
@cindex spam
To use the @file{spam.el} facilities for incoming mail filtering, you
must add the following to your fancy split list
@code{nnmail-split-fancy} or @code{nnimap-split-fancy}:
To use the Spam package to filter incoming mail, you must first set up
fancy mail splitting. @xref{Fancy Mail Splitting}. The Spam package
defines a special splitting function that you can add to your fancy
split variable (either @code{nnmail-split-fancy} or
@code{nnimap-split-fancy}, depending on your mail back end):
@example
(: spam-split)
@end example
Note that the fancy split may be called @code{nnmail-split-fancy} or
@code{nnimap-split-fancy}, depending on whether you use the nnmail or
nnimap back ends to retrieve your mail.
@vindex spam-split-group
@noindent
The @code{spam-split} function scans incoming mail according to your
chosen spam back end(s), and sends messages identified as spam to a
spam group. By default, the spam group is a group named @samp{spam},
but you can change this by customizing @code{spam-split-group}. Make
sure the contents of @code{spam-split-group} are an unqualified group
name. For instance, in an @code{nnimap} server @samp{your-server},
the value @samp{spam} means @samp{nnimap+your-server:spam}. The value
@samp{nnimap+server:spam} is therefore wrong---it gives the group
@samp{nnimap+your-server:nnimap+server:spam}.
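The following sketch puts these pieces together for an @code{nnimap} server. It assumes that IMAP splitting is already configured to use @code{nnimap-split-fancy} (@pxref{Splitting in IMAP}) and that at least one @code{spam-use-*} back end is enabled. The catch-all group @samp{mail} is a hypothetical example.

@example
;; @r{Unqualified name: on server your-server this means}
;; @r{nnimap+your-server:spam.}
(setq spam-split-group "spam")
(setq nnimap-split-fancy
      '(|
        ;; @r{Spam goes to spam-split-group.}
        (: spam-split)
        ;; @r{Everything else goes to the catch-all group.}
        "mail"))
@end example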
Also, @code{spam-split} will not modify incoming mail in any way.
@code{spam-split} does not modify the contents of messages in any way.
The @code{spam-split} function will process incoming mail and send the
mail considered to be spam into the group name given by the variable
@code{spam-split-group}. By default that group name is @samp{spam},
but you can customize @code{spam-split-group}. Make sure the contents
of @code{spam-split-group} are an @emph{unqualified} group name, for
instance in an @code{nnimap} server @samp{your-server} the value
@samp{spam} will turn out to be @samp{nnimap+your-server:spam}. The
value @samp{nnimap+server:spam}, therefore, is wrong and will
actually give you the group
@samp{nnimap+your-server:nnimap+server:spam} which may or may not
work depending on your server's tolerance for strange group names.
@vindex nnimap-split-download-body
Note for IMAP users: if you use the @code{spam-check-bogofilter},
@code{spam-check-ifile}, or @code{spam-check-stat} spam back ends,
you should also set the variable @code{nnimap-split-download-body}
to @code{t}. These spam back ends are most useful when they can
``scan'' the full message body. By default, the nnimap back end only
retrieves the message headers; @code{nnimap-split-download-body} tells
it to retrieve the message bodies as well. We don't set this by
default because it will slow @acronym{IMAP} down, and that is not an
appropriate decision to make on behalf of the user. @xref{Splitting
in IMAP}.
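For instance, if Bogofilter is the body-based back end you use, the relevant settings might be:

@example
;; @r{Let a body-based back end see the whole message while splitting.}
(setq spam-use-bogofilter t
      nnimap-split-download-body t)
@end example

Leave @code{nnimap-split-download-body} unset if you only use header-based back ends such as @code{spam-use-regex-headers}, because downloading message bodies slows @acronym{IMAP} down.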
You can also give @code{spam-split} a parameter,
e.g. @code{spam-use-regex-headers} or @code{"maybe-spam"}. Why is
this useful?
Take these split rules (with @code{spam-use-regex-headers} and
@code{spam-use-blackholes} set):
You have to specify one or more spam back ends for @code{spam-split}
to use, by setting the @code{spam-use-*} variables. @xref{Spam Back
Ends}. Normally, @code{spam-split} simply uses all the spam back ends
you enabled in this way. However, you can tell @code{spam-split} to
use only some of them. Why is this useful? Suppose you are using the
@code{spam-use-regex-headers} and @code{spam-use-blackholes} spam back
ends, and the following split rule:
@example
nnimap-split-fancy '(|
@@ -23030,21 +23002,23 @@ Take these split rules (with @code{spam-use-regex-headers} and
"mail")
@end example
Now, the problem is that you want all ding messages to make it to the
ding folder. But that will let obvious spam (for example, spam
detected by SpamAssassin, and @code{spam-use-regex-headers}) through,
when it's sent to the ding list. On the other hand, some messages to
the ding list are from a mail server in the blackhole list, so the
invocation of @code{spam-split} can't be before the ding rule.
@noindent
The problem is that you want all ding messages to make it to the ding
folder. But that will let obvious spam (for example, spam detected by
SpamAssassin, and @code{spam-use-regex-headers}) through, when it's
sent to the ding list. On the other hand, some messages to the ding
list are from a mail server in the blackhole list, so the invocation
of @code{spam-split} can't be before the ding rule.
You can let SpamAssassin headers supersede ding rules, but all other
@code{spam-split} rules (including a second invocation of the
regex-headers check) will be after the ding rule:
The solution is to let SpamAssassin headers supersede ding rules, and
perform the other @code{spam-split} rules (including a second
invocation of the regex-headers check) after the ding rule. This is
done by passing a parameter to @code{spam-split}:
@example
nnimap-split-fancy
'(|
;; @r{all spam detected by @code{spam-use-regex-headers} goes to @samp{regex-spam}}
;; @r{spam detected by @code{spam-use-regex-headers} goes to @samp{regex-spam}}
(: spam-split "regex-spam" 'spam-use-regex-headers)
(any "ding" "ding")
;; @r{all other spam detected by spam-split goes to @code{spam-split-group}}
@@ -23053,58 +23027,68 @@ nnimap-split-fancy
"mail")
@end example
@noindent
This lets you invoke specific @code{spam-split} checks depending on
your particular needs, and to target the results of those checks to a
your particular needs, and target the results of those checks to a
particular spam group. You don't have to throw all mail into all the
spam tests. Another reason why this is nice is that messages to
mailing lists you have rules for don't have to have resource-intensive
blackhole checks performed on them. You could also specify different
spam checks for your nnmail split vs. your nnimap split. Go crazy.
You should still have specific checks such as
@code{spam-use-regex-headers} set to @code{t}, even if you
specifically invoke @code{spam-split} with the check. The reason is
that when loading @file{spam.el}, some conditional loading is done
depending on what @code{spam-use-xyz} variables you have set. This
is usually not critical, though.
You should set the @code{spam-use-*} variables for whatever spam back
ends you intend to use. The reason is that when loading
@file{spam.el}, some conditional loading is done depending on what
@code{spam-use-xyz} variables you have set. @xref{Spam Back Ends}.
@emph{Note for IMAP users}
@c @emph{TODO: spam.el needs to provide a uniform way of training all the
@c statistical databases. Some have that functionality built-in, others
@c don't.}
The boolean variable @code{nnimap-split-download-body} needs to be
set, if you want to split based on the whole message instead of just
the headers. By default, the nnimap back end will only retrieve the
message headers. If you use @code{spam-check-bogofilter},
@code{spam-check-ifile}, or @code{spam-check-stat} (the splitters that
can benefit from the full message body), you should set this variable.
It is not set by default because it will slow @acronym{IMAP} down, and
that is not an appropriate decision to make on behalf of the user.
@node Detecting Spam in Groups
@subsection Detecting Spam in Groups
@xref{Splitting in IMAP}.
To detect spam when visiting a group, set the group's
@code{spam-autodetect} and @code{spam-autodetect-methods} group
parameters. These are accessible with @kbd{G c} or @kbd{G p}, as
usual (@pxref{Group Parameters}).
@emph{TODO: spam.el needs to provide a uniform way of training all the
statistical databases. Some have that functionality built-in, others
don't.}
You should set the @code{spam-use-*} variables for whatever spam back
ends you intend to use. The reason is that when loading
@file{spam.el}, some conditional loading is done depending on what
@code{spam-use-xyz} variables you have set.
@node Spam ELisp Package Global Variables
@subsubsection Spam ELisp Package Global Variables
By default, only unseen articles are processed for spam. You can
force Gnus to recheck all messages in the group by setting the
variable @code{spam-autodetect-recheck-messages} to @code{t}.
If you use the @code{spam-autodetect} method of checking for spam, you
can specify different spam detection methods for different groups.
For instance, the @samp{ding} group may have @code{spam-use-BBDB} as
the autodetection method, while the @samp{suspect} group may have the
@code{spam-use-blacklist} and @code{spam-use-bogofilter} methods
enabled. Unlike with @code{spam-split}, you don't have any control
over the @emph{sequence} of checks, but this is probably unimportant.
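The following sketch implements the @samp{ding} and @samp{suspect} setup described above, using @code{gnus-parameters}. The full group names are hypothetical, and the parameter value formats are assumed. You can check them by customizing a group's parameters with @kbd{G c}.

@example
(setq gnus-parameters
      '(("^nnml:ding$"
         (spam-autodetect . t)
         (spam-autodetect-methods spam-use-BBDB))
        ("^nnml:suspect$"
         (spam-autodetect . t)
         (spam-autodetect-methods spam-use-blacklist
                                  spam-use-bogofilter))))
;; @r{Optional: recheck all unread articles, not only unseen ones.}
(setq spam-autodetect-recheck-messages t)
@end example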
@node Spam and Ham Processors
@subsection Spam and Ham Processors
@cindex spam filtering
@cindex spam filtering variables
@cindex spam variables
@cindex spam
@vindex gnus-spam-process-newsgroups
The concepts of ham processors and spam processors are very important.
Ham processors and spam processors for a group can be set with the
@code{spam-process} group parameter, or the
@code{gnus-spam-process-newsgroups} variable. Ham processors take
mail known to be non-spam (@emph{ham}) and process it in some way so
that later similar mail will also be considered non-spam. Spam
processors take mail known to be spam and process it so similar spam
will be detected later.
Spam and ham processors specify special actions to take when you exit
a group buffer. Spam processors act on spam messages, and ham
processors on ham messages. At present, the main role of these
processors is to update the dictionaries of dictionary-based spam back
ends such as Bogofilter (@pxref{Bogofilter}) and the Spam Statistics
package (@pxref{Spam Statistics Filtering}).
The format of the spam or ham processor entry used to be a symbol,
but now it is a @sc{cons} cell. See the individual spam processor entries
for more information.
The spam and ham processors that apply to each group are determined by
the group's @code{spam-process} group parameter. If this group
parameter is not defined, they are determined by the variable
@code{gnus-spam-process-newsgroups}.
@vindex gnus-spam-newsgroup-contents
Gnus learns from the spam you get. You have to collect your spam in
@@ -23258,8 +23242,8 @@ When autodetecting spam, this variable tells @code{spam.el} whether
only unseen articles or all unread articles should be checked for
spam. It is recommended that you leave it off.
@node Spam ELisp Package Configuration Examples
@subsubsection Spam ELisp Package Configuration Examples
@node Spam Package Configuration Examples
@subsection Spam Package Configuration Examples
@cindex spam filtering
@cindex spam filtering configuration examples
@cindex spam configuration examples
@@ -23384,11 +23368,11 @@ bogofilter or DCC).
Because of the @code{gnus-group-spam-classification-spam} entry, all
messages are marked as spam (with @code{$}). When I find a false
positive, I mark the message with some other ham mark (@code{ham-marks},
@ref{Spam ELisp Package Global Variables}). On group exit, those
messages are copied to both groups, @samp{INBOX} (where I want to have
the article) and @samp{training.ham} (for training bogofilter) and
deleted from the @samp{spam.detected} folder.
positive, I mark the message with some other ham mark
(@code{ham-marks}, @ref{Spam and Ham Processors}). On group exit,
those messages are copied to both groups, @samp{INBOX} (where I want
to have the article) and @samp{training.ham} (for training bogofilter)
and deleted from the @samp{spam.detected} folder.
The @code{gnus-article-sort-by-chars} entry simplifies detection of
false positives for me. I receive lots of worms (sweN, @dots{}), that all
@@ -23424,6 +23408,29 @@ through my local news server (leafnode). I.e. the article numbers are
not the same as on news.gmane.org, thus @code{spam-report.el} has to check
the @code{X-Report-Spam} header to find the correct number.
@node Spam Back Ends
@subsection Spam Back Ends
@cindex spam back ends
The Spam package offers a variety of back ends for detecting spam.
Each back end defines a set of methods for detecting spam
(@pxref{Filtering Incoming Mail}, @pxref{Detecting Spam in Groups}),
and a pair of spam and ham processors (@pxref{Spam and Ham
Processors}).
@menu
* Blacklists and Whitelists::
* BBDB Whitelists::
* Gmane Spam Reporting::
* Anti-spam Hashcash Payments::
* Blackholes::
* Regular Expressions Header Matching::
* Bogofilter::
* ifile spam filtering::
* Spam Statistics Filtering::
* SpamOracle::
@end menu
@node Blacklists and Whitelists
@subsubsection Blacklists and Whitelists
@cindex spam filtering
@@ -23728,6 +23735,15 @@ You should not enable this if you use @code{spam-use-bogofilter-headers}.
@end defvar
@table @kbd
@item M s t
@itemx S t
@kindex M s t
@kindex S t
@findex spam-bogofilter-score
Get the Bogofilter spamicity score (@code{spam-bogofilter-score}).
@end table
@defvar spam-use-bogofilter-headers
Set this variable if you want @code{spam-split} to use Eric Raymond's
@@ -23829,20 +23845,21 @@ purpose. A ham and a spam processor are provided, plus the
should be used. The 1.2.1 version of ifile was used to test this
functionality.
@node spam-stat spam filtering
@subsubsection spam-stat spam filtering
@node Spam Statistics Filtering
@subsubsection Spam Statistics Filtering
@cindex spam filtering
@cindex spam-stat, spam filtering
@cindex spam-stat
@cindex spam
@xref{Filtering Spam Using Statistics with spam-stat}.
This back end uses the Spam Statistics Emacs Lisp package to perform
statistics-based filtering (@pxref{Spam Statistics Package}). Before
using this, you may want to perform some additional steps to
initialize your Spam Statistics dictionary. @xref{Creating a
spam-stat dictionary}.
@defvar spam-use-stat
Enable this variable if you want @code{spam-split} to use
@file{spam-stat.el}, an Emacs Lisp statistical analyzer.
@end defvar
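A minimal sketch of enabling this back end follows. It assumes you have already built and saved a dictionary (@pxref{Creating a spam-stat dictionary}) and that @code{spam-split} is called from your fancy split rules. The group name @samp{spam} is only an example.
@lisp
;; Sketch: let spam-split use the Spam Statistics back end.
(setq spam-use-stat t
      spam-split-group "spam")   ; example group for detected spam
(require 'spam)
;; Load the dictionary previously saved in spam-stat-file.
(require 'spam-stat)
(spam-stat-load)
@end lisp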
@defvar gnus-group-spam-exit-processor-stat
@@ -23902,18 +23919,17 @@ One possibility is to run SpamOracle as a @code{:prescript} from the
mail source (@pxref{Mail Source Specifiers}, and @ref{SpamAssassin}). This method has
the advantage that the user can see the @emph{X-Spam} headers.
The easiest method is to make @file{spam.el} (@pxref{Spam Package})
call SpamOracle.
@vindex spam-use-spamoracle
To enable SpamOracle usage by @file{spam.el}, set the variable
@code{spam-use-spamoracle} to @code{t} and configure the
@code{nnmail-split-fancy} or @code{nnimap-split-fancy} variable
(@pxref{Spam Package}). In this example the @samp{INBOX} of an nnimap
server is filtered using SpamOracle. Mail recognized as spam is moved
to @code{spam-split-group}, @samp{Junk} in this case. Ham messages
stay in @samp{INBOX}:
@example
(setq spam-use-spamoracle t
@@ -23945,14 +23961,14 @@ database to live somewhere special, set
SpamOracle employs a statistical algorithm to determine whether a
message is spam or ham. In order to get good results, meaning few
false hits or misses, SpamOracle needs training. SpamOracle learns
the characteristics of your spam mails. In @emph{add} mode (training
mode), you feed both good (ham) and spam mails to SpamOracle. You can
do this by pressing @kbd{|} in the Summary buffer and piping the mail
to a SpamOracle process, or, much more conveniently, by using
@file{spam.el}'s spam and ham processors. @xref{Spam Package}, for a
detailed description of spam and ham processors.
@defvar gnus-group-spam-exit-processor-spamoracle
Add this symbol to a group's @code{spam-process} parameter by
@@ -24001,8 +24017,8 @@ the user marks some messages as spam messages, these messages will be
processed by SpamOracle. The processor sends the messages to
SpamOracle as new samples for spam.
@node Extending the Spam package
@subsection Extending the Spam package
@cindex spam filtering
@cindex spam elisp package, extending
@cindex extending the spam elisp package
@@ -24109,9 +24125,8 @@ to the @code{spam-autodetect-methods} group parameter in
@end enumerate
@node Spam Statistics Package
@subsection Spam Statistics Package
@cindex Paul Graham
@cindex Graham, Paul
@cindex naive Bayesian spam filtering
@@ -24138,7 +24153,11 @@ non-spam mail. Use the 15 most conspicuous words, compute the total
probability of the mail being spam. If this probability is higher
than a certain threshold, the mail is considered to be spam.
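The combining step can be sketched in Emacs Lisp as follows. This is an illustration of the formula from Paul Graham's article (the product of the word probabilities divided by that product plus the product of their complements). It is not the code of @file{spam-stat.el}. The function name @code{my-spam-combine} is hypothetical, and the sketch assumes every probability lies strictly between 0 and 1.
@lisp
;; Hypothetical helper, not part of spam-stat.el.
;; PROBS: per-word spam probabilities, each strictly between 0 and 1.
(defun my-spam-combine (probs)
  "Combine the 15 most conspicuous PROBS into one spam probability."
  (let* ((sorted (sort (copy-sequence probs)
                       ;; Most conspicuous = farthest from 0.5.
                       (lambda (a b)
                         (> (abs (- a 0.5)) (abs (- b 0.5))))))
         ;; Keep at most the first 15 entries.
         (top (butlast sorted (max 0 (- (length sorted) 15))))
         ;; p1 * p2 * ... * pn (start from 1.0 to force float math)
         (spam (apply #'* 1.0 top))
         ;; (1 - p1) * (1 - p2) * ... * (1 - pn)
         (ham (apply #'* 1.0 (mapcar (lambda (p) (- 1 p)) top))))
    (/ spam (+ spam ham))))
@end lisp
For example, @code{(my-spam-combine '(0.99 0.99 0.2))} returns about 0.9996. Two strongly spammy words outweigh one mildly hammy word. The result would then be compared against a threshold to decide whether the mail is spam.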
The Spam Statistics package adds support to Gnus for this kind of
filtering. It can be used as one of the back ends of the Spam package
(@pxref{Spam Package}), or by itself.
Before using the Spam Statistics package, you need to set it up.
First, you need two collections of your mail, one with spam, one with
non-spam. Then you need to create a dictionary using these two
collections, and save it. And last but not least, you need to use
@@ -24224,8 +24243,10 @@ The filename used to store the dictionary. This defaults to
@node Splitting mail using spam-stat
@subsubsection Splitting mail using spam-stat
This section describes how to use the Spam Statistics package
@emph{independently} of the Spam package (@pxref{Spam Package}).
First, add the following to your @file{~/.gnus.el} file:
@lisp
(require 'spam-stat)