Friday, April 01, 2005

I have a data mining assignment to do as I upgrade my statistics education. The assignment is to filter spam. I have been given 4601 observations, each one an email, with 57 variables based on word frequency, character frequency, and the frequency and length of capitals. The 58th variable is 1 or 0 depending on whether the email is spam or not, so this is supervised learning. I was able to open the data set in both R and SAS. In SAS I ran PROC MEANS and PROC CHART, which gave me 58 histograms, one per variable. I am printing the histograms four to a page, double-sided, along with the PROC MEANS results.
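The same first look (summary statistics plus one histogram per variable) can be sketched outside SAS. The Python below is only an illustration, not the SAS code used for the assignment: the function names `summarize` and `histogram_counts` are made up, the bin count is arbitrary, and the tiny random table stands in for the real 4601 x 58 data, which is not reproduced here. The statistics chosen (N, mean, standard deviation, minimum, maximum) mirror what PROC MEANS prints by default.

```python
import random
import statistics


def summarize(column):
    """Return N, mean, sample standard deviation, min and max for one column."""
    return {
        "n": len(column),
        "mean": statistics.mean(column),
        "std": statistics.stdev(column),  # sample (n - 1) standard deviation
        "min": min(column),
        "max": max(column),
    }


def histogram_counts(column, bins=5):
    """Count how many values fall into each of `bins` equal-width intervals."""
    lo, hi = min(column), max(column)
    span = (hi - lo) or 1.0  # avoid dividing by zero for a constant column
    counts = [0] * bins
    for x in column:
        # The maximum value would land in bin `bins`, so clamp it to the last bin.
        idx = min(int((x - lo) / span * bins), bins - 1)
        counts[idx] += 1
    return counts


# Toy stand-in data (an assumption for illustration): 20 rows with
# 3 numeric features and a 0/1 label in the last column.
random.seed(0)
rows = [[random.random() for _ in range(3)] + [random.randint(0, 1)]
        for _ in range(20)]

# Transpose the rows so that each variable can be examined on its own.
columns = [list(col) for col in zip(*rows)]

for i, col in enumerate(columns, start=1):
    s = summarize(col)
    print(f"var{i}: n={s['n']} mean={s['mean']:.3f} std={s['std']:.3f} "
          f"min={s['min']:.3f} max={s['max']:.3f}")
    # A crude text histogram: one row of asterisks per bin.
    for b, count in enumerate(histogram_counts(col)):
        print(f"  bin {b}: {'*' * count}")
```

With the real data, the loop would run over all 58 columns, and the last one (the spam flag) would show just two occupied bins, one for 0 and one for 1.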

