Sunday, February 25, 2018

spamprobe - Bayesian spam filter

SpamProbe is a spam filter that uses Bayesian techniques to analyze the frequency of words in an individual's spam and non-spam emails. The process is completely automatic and tailors itself to the kinds of email that each person receives.
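To give a feel for what "Bayesian" means here, the sketch below combines a few per-word spam probabilities into a single score with the textbook naive Bayes formula. This is only an illustration and not spamprobe's actual scoring code; the function name combine_probs and the input values are made up for the example.

```shell
#!/bin/sh
# Combine per-word spam probabilities into one score with the naive Bayes
# combination: prod(p) / (prod(p) + prod(1 - p)).
# Assumption: every argument is strictly between 0 and 1.
# combine_probs is a hypothetical helper, not part of spamprobe.
combine_probs() {
    printf '%s\n' "$@" | awk '
        BEGIN { spam = 1; good = 1 }
        { spam *= $1; good *= (1 - $1) }
        END { printf "%.6f\n", spam / (spam + good) }
    '
}

# Example: two words that each look 90% spammy push the score close to 1.
# combine_probs 0.9 0.9    # prints 0.987805
```

Several moderately suspicious words together give a much stronger verdict than any one of them alone, which is why scores tend to land very close to 0 or 1.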

1- install
$ sudo apt-get install spamprobe db-util

2- download the training dataset from the link below (the commands in step 6 use the CSDMC2010_SPAM corpus)

3- create spamprobe database
$ spamprobe -d spamtr/ create-db

4- the dataset contains a file called SPAMTrain.label that labels each training email: 1 means a good email and 0 means a spam email.

Move the good emails to the /home/user1/good directory and
the spam emails to the /home/user1/spam directory.
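Sorting thousands of emails by hand is tedious, so the split can be scripted. The sketch below assumes each line of SPAMTrain.label has the form "<label> <filename>" (for example "1 TRAIN_00000.eml"); check the file before relying on it. The function name sort_by_label and the TRAINING directory in the usage comment are assumptions. It copies rather than moves, so the original dataset stays intact.

```shell
#!/bin/sh
# Copy each training email into a good or spam directory according to its label.
# Assumed label file format, one entry per line: "<label> <filename>".
# sort_by_label is a hypothetical helper, not part of spamprobe.
sort_by_label() {
    label_file=$1   # e.g. SPAMTrain.label
    src_dir=$2      # directory holding the training .eml files
    good_dir=$3     # e.g. /home/user1/good
    spam_dir=$4     # e.g. /home/user1/spam

    mkdir -p "$good_dir" "$spam_dir"
    while read -r label name; do
        [ -f "$src_dir/$name" ] || continue   # skip blank lines and missing files
        case $label in
            1) cp "$src_dir/$name" "$good_dir/" ;;   # 1 = good email
            0) cp "$src_dir/$name" "$spam_dir/" ;;   # 0 = spam email
        esac
    done < "$label_file"
}

# Usage (the TRAINING path is an assumption about the dataset layout):
# sort_by_label SPAMTrain.label TRAINING /home/user1/good /home/user1/spam
```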

5- now start the training
$ spamprobe -d spamtr/ good /home/user1/good/* 
$ spamprobe -d spamtr/ spam /home/user1/spam/* 

6- once training is complete, test the filter with the TESTING emails from the downloaded dataset. Each result shows the verdict (SPAM or GOOD), the spam probability, and a message digest.
$ spamprobe -d spamtr/ score tsp/CSDMC2010_SPAM/CSDMC2010_SPAM/TESTING/TEST_00009.eml
SPAM 0.9999089 c29f0ed737ef3d47a1402302c015971f 

$ spamprobe -d spamtr/ score tsp/CSDMC2010_SPAM/CSDMC2010_SPAM/TESTING/TEST_00000.eml
GOOD 0.0000013 bea5f60fa305f71d1dc1bcc64a557292
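Scoring one file at a time gets slow for a whole directory. As a rough sketch, the helper below reads score lines in the format shown above from standard input and counts the verdicts; the name tally_scores is made up for this example.

```shell
#!/bin/sh
# Count GOOD and SPAM verdicts from "spamprobe score" output read on stdin.
# Each input line is expected to look like: "SPAM 0.9999089 <digest>".
# tally_scores is a hypothetical helper, not part of spamprobe.
tally_scores() {
    awk '
        $1 == "SPAM" { spam++ }
        $1 == "GOOD" { good++ }
        END { printf "good=%d spam=%d\n", good, spam }
    '
}

# Usage: score every test email, then summarize the verdicts.
# for f in tsp/CSDMC2010_SPAM/CSDMC2010_SPAM/TESTING/*.eml; do
#     spamprobe -d spamtr/ score "$f"
# done | tally_scores
```

This only counts verdicts. Judging accuracy would additionally need the true labels for the test emails.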

7- to inspect the database (sp_words is the database file that spamprobe creates inside spamtr/)
$ db_dump -l sp_words
$ db_dump -p sp_words
