And now for something completely different:
This weekend, my mail server was slammed by a spammer using a rogue account to create hundreds of thousands of spam emails that jammed my outbound mail queue. Mixed with the spam were valuable customer emails, so I had to sort through all the mail ASAP and delete anything that wasn’t legit.
First I tried a simple loop that loaded each file and deleted it if it contained a bad string. But that was taking a while, so I made my filter multithreaded.
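A first attempt along those lines might look like the sketch below. It is a reconstruction for illustration, not the original code: `SimpleSpamFilter` and `DeleteSpam` are hypothetical names, and the marker strings are borrowed from the blacklist check shown later in this post.

```csharp
using System;
using System.IO;

public static class SimpleSpamFilter
{
    // Marker strings taken from the blacklist check later in the post.
    private static readonly string[] Blacklist = { "NIGERIA", "Message Delivery Delay" };

    // Loads each file completely, deletes it if it contains a blacklisted
    // string, and returns the number of files deleted.
    public static int DeleteSpam(string directory)
    {
        int deleted = 0;
        foreach (string path in Directory.GetFiles(directory))
        {
            string text = File.ReadAllText(path);
            foreach (string marker in Blacklist)
            {
                if (text.Contains(marker))
                {
                    File.Delete(path);
                    deleted++;
                    break; // one match is enough; move on to the next file
                }
            }
        }
        return deleted;
    }
}
```

Everything here happens on one thread: each file is read in full, checked, and deleted before the next one is touched.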
To begin, I load the list of files to process:
    string[] files = Directory.GetFiles(directory);
    Console.WriteLine(files.Length + " files.");
(You could iterate over the files lazily instead, for example with Directory.EnumerateFiles, but I wanted to see how many files there were.)
Next, I instantiate the class that wraps the BackgroundWorker:
    DeleteProcess DeleteProcess = new DeleteProcess();
Now, I loop through the files, checking each for spam:
    foreach (string mFile in files)
    {
        if (CheckBlacklist(mFile))
        {
            // Queue the spam file and start the worker if it is idle.
            DeleteProcess.filesToDelete.Add(mFile);
            if (!DeleteProcess.worker.IsBusy)
                DeleteProcess.worker.RunWorkerAsync();
        }
    }
Instead of loading the whole file, I just read it until I determine that it is spam. Since 99% of messages were spam, this went pretty quickly:
    private static bool CheckBlacklist(string mFile)
    {
        using (StreamReader reader = new StreamReader(new FileStream(mFile, FileMode.Open, FileAccess.Read)))
        {
            string line;
            // Read line by line and stop at the first blacklisted string.
            while ((line = reader.ReadLine()) != null)
            {
                if (line.Contains("NIGERIA") || line.Contains("Message Delivery Delay"))
                    return true;
            }
        }
        return false;
    }
(Opening the file with FileAccess.Read requests read-only access, which sped things up a bit for me.)
Now for the delete thread. Here is how it’s wired up:
    // Files waiting to be deleted, in the order they were found.
    public List<string> filesToDelete = new List<string>();

    public BackgroundWorker worker = new BackgroundWorker
    {
        WorkerReportsProgress = true,
        WorkerSupportsCancellation = true
    };

    public DeleteProcess()
    {
        worker.DoWork += worker_DoWork;
        worker.ProgressChanged += worker_ProgressChanged;
        worker.RunWorkerCompleted += worker_RunWorkerCompleted;
    }
The worker thread takes the first file name from the list, deletes the file, and then removes that entry from the list:
    private void worker_DoWork(object sender, DoWorkEventArgs e)
    {
        while (filesToDelete.Count > 0)
        {
            // Report the file name (without the directory prefix) as progress.
            worker.ReportProgress(0, filesToDelete[0].Replace(Program.directory, string.Empty));
            File.Delete(filesToDelete[0]);
            // Also delete the counterpart whose path has "Outgoing" in place of "OutgoingMessages".
            File.Delete(filesToDelete[0].Replace(@"OutgoingMessages", @"Outgoing"));
            filesToDelete.RemoveAt(0);
        }
    }
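One caveat: a plain list is not thread-safe, and here the scanning loop adds to it while the worker removes from it. A file queued just as the worker's loop exits can also sit there until the next RunWorkerAsync call. The sketch below shows one possible alternative, not the code used above, built on BlockingCollection<string> from System.Collections.Concurrent (.NET 4 and later). `SafeDeleteProcess`, `Enqueue`, and `Finish` are hypothetical names, and the progress reporting and companion-file delete are left out for brevity.

```csharp
using System;
using System.Collections.Concurrent;
using System.ComponentModel;
using System.IO;
using System.Threading;

// Hypothetical thread-safe variant of DeleteProcess (not the original class).
public class SafeDeleteProcess
{
    // BlockingCollection wraps a ConcurrentQueue by default, so adding and
    // consuming can safely happen on different threads.
    private readonly BlockingCollection<string> filesToDelete = new BlockingCollection<string>();
    private readonly BackgroundWorker worker = new BackgroundWorker();
    private readonly ManualResetEventSlim done = new ManualResetEventSlim(false);

    public SafeDeleteProcess()
    {
        worker.DoWork += (sender, e) =>
        {
            try
            {
                // Blocks while the queue is empty; the loop ends once
                // CompleteAdding has been called and the queue is drained.
                foreach (string path in filesToDelete.GetConsumingEnumerable())
                {
                    File.Delete(path);
                }
            }
            finally
            {
                done.Set(); // always release Finish, even if a delete throws
            }
        };
        worker.RunWorkerAsync();
    }

    // Called by the scanning loop for each spam file.
    public void Enqueue(string path)
    {
        filesToDelete.Add(path);
    }

    // Signals that no more files are coming and waits for the queue to drain.
    public void Finish()
    {
        filesToDelete.CompleteAdding();
        done.Wait();
    }
}
```

With this shape the scanning loop only calls `Enqueue` for each spam file and `Finish` once at the end; there is no IsBusy check because the worker runs for the whole scan.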
When we’re done, we count the remaining files:
    Console.WriteLine(Directory.GetFiles(Program.directory).Length + " files left.");
It’s possible to create a collection of BackgroundWorkers if you want to use multiple CPUs, but the bottleneck in this case was disk I/O, so it wouldn’t have helped.
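For anyone curious what such a collection could look like, here is a minimal sketch under some assumptions: `WorkerPool` and `ProcessAll` are hypothetical names, the work items are handed out through a shared ConcurrentQueue<string>, and a CountdownEvent is used to wait until every worker has finished. It is an illustration of the idea, not something that was run against the mail queue.

```csharp
using System;
using System.Collections.Concurrent;
using System.ComponentModel;
using System.Threading;

public static class WorkerPool
{
    // Runs `action` on every item using `workerCount` BackgroundWorkers that
    // all pull from one shared thread-safe queue; returns when all are done.
    public static void ProcessAll(string[] items, int workerCount, Action<string> action)
    {
        var queue = new ConcurrentQueue<string>(items);
        using (var remaining = new CountdownEvent(workerCount))
        {
            var workers = new BackgroundWorker[workerCount];
            for (int i = 0; i < workerCount; i++)
            {
                workers[i] = new BackgroundWorker();
                workers[i].DoWork += (sender, e) =>
                {
                    try
                    {
                        string item;
                        // Each worker keeps taking items until the queue is empty.
                        while (queue.TryDequeue(out item))
                        {
                            action(item);
                        }
                    }
                    finally
                    {
                        remaining.Signal(); // this worker is finished
                    }
                };
                workers[i].RunWorkerAsync();
            }
            remaining.Wait(); // block until every worker has signalled
        }
    }
}
```

A call such as `WorkerPool.ProcessAll(files, Environment.ProcessorCount, path => { if (CheckBlacklist(path)) File.Delete(path); });` would spread the checks across cores, which only pays off when the work is CPU-bound rather than limited by the disk.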
Hey, very nice… I didn't know about the IsBusy() method ;D
Thanks a lot
Can you explain how to do that?
“It’s possible to create a collection of BackgroundWorkers if you want to utilize multiple CPU’s,”
I've been searching for a way to do multithreading with BackgroundWorker, but I can't find any sources.
Good article.. for more information
Visit: http://csharp-multithreading.blogspot.com