Tool for parsing SMTP logs that finds bounces

Our web application sends e-mails. We have lots of users, and we get lots of bounces: for example, a user changes companies and their old company e-mail address is no longer valid.
To find bounces, I parse the SMTP log files with Log Parser. The logs come from the Microsoft SMTP server.
Some bounces are easy, like 550+#5.1.0+Address+rejected+user@domain.com: the failing address, user@domain.com, is right there in the log line.
But some, like 550+No+such+recipient, do not include the address in the error message.
I have created a simple Ruby script that parses the logs (again using Log Parser) to find which message caused an error like 550+No+such+recipient.
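The idea of the script is simple (a stripped-down sketch; the column positions are assumptions about the W3C field order, so adjust them to your logs' #Fields header):

    # Correlate each 550 reply with the RCPT TO address seen earlier on the
    # same connection, keyed by client IP. Field positions are assumptions.
    last_rcpt = {}

    File.foreach(ARGV[0]) do |line|
      next if line.start_with?('#')              # skip W3C header lines
      f = line.split
      c_ip, verb, query = f[1], f[8], f[10]
      next unless query
      if verb == 'RCPT'
        last_rcpt[c_ip] = query[/<(.+?)>/, 1]    # remember recipient per client
      elsif query.start_with?('550')
        puts "#{last_rcpt[c_ip]} => #{query}"    # address that caused the 550
      end
    end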
I am just surprised that I could not find an existing tool for this. I have found tools like Zabbix and Splunk for log analysis, but they look like overkill for such a simple task.
Does anybody know of a tool that parses SMTP logs, finds the bounces, and identifies the e-mail addresses that caused them?

As far as I can see, log file analysis is really only useful for detecting mails that are rejected at the SMTP session level. What about bounces which occur after the remote MTA has accepted a message for delivery but subsequently fails to deliver it?
We use the following setup to detect and classify all bounces that happen after delivery to the remote MTA.
All outgoing mails are given a unique return-path address which, when decoded, identifies the recipient e-mail address and the particular mailing.
An Apache James server receives mail returned to that return-path address.
A custom mailet, developed in Java and executing within Apache James, decodes the to-address, sends the e-mail text to BoogieTools Bounce Studio for bounce-type classification, and then persists the results to our database.
It works very, very well. We are able to detect permanent hard bounces and transient soft bounces, which are further classified into very granular bounce types such as spam rejections, out-of-office replies, etc.
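To give an idea of the return-path scheme, here is a hypothetical sketch in Ruby (our real implementation is the Java mailet described above; the bounces.example.com domain and the separators are made up):

    # Encode recipient + mailing id into a unique, decodable return-path.
    def encode_return_path(recipient, mailing_id)
      "bounce+#{recipient.sub('@', '=')}+#{mailing_id}@bounces.example.com"
    end

    # Decode it back when a bounce arrives at that address.
    def decode_return_path(address)
      _, encoded, mailing_id = address.split('@').first.split('+')
      { recipient: encoded.sub('=', '@'), mailing_id: mailing_id }
    end

    # encode_return_path('user@domain.com', 'm42')
    #   => "bounce+user=domain.com+m42@bounces.example.com"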

This article is exactly what you are looking for. It is based on the great tool Log Parser.
Log Parser is a powerful, versatile tool that provides universal query access to text-based data such as log files, XML files and CSV files, as well as key data sources on the Windows® operating system such as the Event Log, the Registry, the file system, and Active Directory®. You tell Log Parser what information you need and how you want it processed. The results of your query can be custom-formatted in text based output, or they can be persisted to more specialty targets like SQL, SYSLOG, or a chart. Most software is designed to accomplish a limited number of specific tasks. Log Parser is different... the number of ways it can be used is limited only by the needs and imagination of the user. The world is your database with Log Parser.
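For example, a query along these lines pulls the 550 replies out of the SMTP logs (the field names are assumptions about the W3C columns in your log files, so check your #Fields header):

    LogParser.exe -i:W3C -o:CSV "SELECT date, time, c-ip, cs-uri-query FROM ex*.log WHERE cs-uri-query LIKE '550%'"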

You don't want to parse the logs to try to identify bounces. You will get both false negatives and false positives if you just look at logs.
Bounces might be generated downstream from the server you deliver to, and those will look like successful deliveries in your outgoing server's logs.
The naive pattern match for bounces in incoming logs (from the null sender, to one of your VERP-ed addresses) will also be inaccurate, for a few reasons:
There will be delay warnings mixed in with actual failure bounces.
Most out-of-office and similar autoresponders use the null sender to prevent battling-bots syndrome, where two autoresponders answer each other forever.
Similarly, challenge-response systems (like *spit* boxbe.com) tend to use the null sender.
Your VERP-ed sender addresses, if they are persistent per recipient, will get harvested by spammers and come back as either spam targets or backscatter.
So, sadly, the only reliable way to do it is to examine the bounce messages themselves. Most of them will have a message/delivery-status MIME part as per RFC 1894, and depending on your language of choice there are probably libraries or modules to help with other bounce formats. The only one I have direct experience with is the Perl Mail::DeliveryStatus::BounceParser module, which works well enough.
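In Ruby (which the question uses), pulling the machine-readable fields out of such a DSN might look roughly like this (a sketch assuming the mail gem and a well-formed multipart/report message):

    require 'mail'   # gem install mail

    bounce = Mail.read(ARGV[0])
    dsn = bounce.all_parts.find { |p| p.mime_type == 'message/delivery-status' }

    if dsn
      text      = dsn.body.decoded
      recipient = text[/^Final-Recipient:.*?;\s*(\S+)/i, 1]
      action    = text[/^Action:\s*(\S+)/i, 1]    # failed, delayed, ...
      status    = text[/^Status:\s*(\S+)/i, 1]    # e.g. 5.1.1 = bad mailbox
      puts "#{recipient} #{action} #{status}"
    end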

I like Log Parser. When I need to parse for something very specific or custom, or to use regular expressions, I use biterScripting. They actually have some sample scripts that I used to get started; one is at http://www.biterscripting.com/Download/SS_WebLogParser.txt.

I based a bounce counter program on this post, only to find out later that this method doesn't actually work for high-volume senders because SMTP logs are not in sequential order. There's more about it in my blog post: Email Bounce Detection in SMTP Logs and Why It Is Impossible.

Related

Logging in VerneMQ plugin

I am trying to implement logging of connected users in my VerneMQ plugin using Erlang. From the documentation, I found that this could be bad for scalability, under the assumption that there might be a lot of clients connecting and disconnecting. That is not my case: I will have just a bunch of clients, but a lot of messages.
Anyway, to my question: is it possible to change the log file when using error_logger, or should I use a different module for logging? The log file can be in any location if it has to, but I need it separated from VerneMQ's console.log.
A follow-up question: can I somehow get a floating window on the logs? I don't need to keep logs from the previous year, and I don't want to clean them manually every day or week.
Thanks for any responses.
From OTP 21 on, you should use logger instead of error_logger, although the error_logger API is kept for compatibility (it just uses logger under the hood).
With logger, which you can set up through the system configuration, you can use file handlers such as logger_std_h (check the example configurations).
logger_std_h also supports file rotation.
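A minimal sys.config along these lines (the handler name, path and sizes are placeholders) gives you a separate log file with size-based rotation:

    [{kernel,
      [{logger,
        [{handler, plugin_log, logger_std_h,
          #{config => #{file => "log/my_plugin.log",   %% separate from console.log
                        max_no_bytes => 10485760,      %% rotate at ~10 MB
                        max_no_files => 5}}}]}]}].     %% keep 5 rotated files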

Semantics of an HL7-enabled point-of-care device - is this the right way to do it?

I am implementing automated HL7 v2.7 reporting of observations on a point-of-care device. This works by sending an "ORU^R30 Unsolicited Point-Of-Care Observation Message without Existing Order - Place an Order" message to what I'm assuming will be a laboratory information system, or an associated channel in an integration engine. I'm currently going to have the device ask for the IP/port numbers of the LIS and MPI and their associated connections on first set-up; our device is going to communicate over TCP/LLP.
Is this the smart way to do all this? I've never worked with HL7 or any kind of HIS before.
I appreciate any possible insight. This isn't the stuff you can learn about in the standard, and I don't think I can just email Epic and ask them how they design EHR/HIS systems.
Thanks!
Message content: ORU^R30 is not a commonly used message type, but the structure is close enough to R01 that most systems will be able to receive it. Focus on collecting as many patient demographics as you can, plus the visit number, or better yet scan both from the patient's wristband barcode. You must have the patient and visit to file the observations.
Transmission: it's safest to just do MLLP over TCP; it will speed up your installs because that's what everybody else does. The alternative is having the health system write something custom to receive the data, usually via the interface engine.
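MLLP framing itself is trivial; in Ruby, for example, a send looks roughly like this (the host, port and the two-segment skeleton are illustrative placeholders, and real code should parse the MSA segment of the ACK):

    require 'socket'

    VT, FS, CR = "\x0B", "\x1C", "\x0D"    # MLLP start-block, end-block, CR

    def send_hl7(host, port, message)
      TCPSocket.open(host, port) do |sock|
        sock.write(VT + message + FS + CR) # wrap the HL7 payload in an MLLP frame
        sock.readpartial(4096)             # read the raw ACK frame
      end
    end

    msg = ["MSH|^~\\&|POCDEV|SITE|LIS|SITE|20240101120000||ORU^R30|MSG001|P|2.7",
           "PID|1||123456||DOE^JANE"].join("\r")
    send_hl7('lis.example.org', 2575, msg) # 2575 is the registered HL7 MLLP port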
Network: it sounds like you're thinking of putting the connection info on the device. That is probably a bad idea. I would build some kind of aggregator service that actually sends the data to the EHR, so that you don't have to deal with multiple devices trying to get through firewalls, etc.

Reading from multiple imap.gmail.com accounts from the same fetchmail client

For my portfolio software I have been using fetchmail to read from a Google email account over IMAP, and life has been great. Thanks to the miracle of the IMAP IDLE extension, my triggers fire in near-realtime due to server push, much sooner than periodic polling would allow.
In my basic .fetchmailrc setup, in which a brokerage customer's account emails trade notifications to a dedicated Gmail/Google Apps box, I've had:
poll imap.gmail.com proto imap user "youraddress@yourdomain-OR-gmail.com" pass "yoMama" keep nofetchall ssl idle mimedecode limit 29000 no rewrite mda "myCustomSpecialMDAhandler.sh %F %T"
Trouble is, now I need to support reading from multiple email boxes and handing the emails off to other specialized MDA scripts I wrote. No problem, just add more poll lines to .fetchmailrc, right? Well, that doesn't work when the other accounts also use imap.gmail.com. What ends up happening is that while one account reads fine (and not necessarily the first one listed, though usually yes), the others get "socket error" all day and their emails remain untouched, unread.
I can't figure out why, and I'm not even sure if it's some mechanism at imap.gmail.com, e.g. limiting each client host to one IMAP connection. That doesn't seem right, since I have kept IMAP connections to many separate Gmail & Google Apps accounts open from the same client (like Thunderbird) for years and never noticed this exclusivity problem.
I haven't tried launching multiple fetchmail daemons using separate -f config files (assuming they wouldn't conflict), or deploying one or more getmail-style fetchers in addition. I'm still trying to avoid that kind of mess; it only gets less scalable the more boxes I have to monitor.
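If I do try it, I expect it would look something like this (the --pidfile option should keep the daemons from fighting over the default lock file, but again, this is untested):

    # untested: one daemon per account, each with its own rc and PID file
    fetchmail -f ~/fetcher.1.rc --pidfile ~/.fetchmail.1.pid
    fetchmail -f ~/fetcher.2.rc --pidfile ~/.fetchmail.2.pid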
I don't have the reference offhand, but somewhere in fetchmail's docs I recall reading that idle is not so much an IMAP feature as an optional fetchmail trick, which has a (nasty for me) side effect of choking off all other defined accounts from polling until the connection is cut off by some external event or timeout. So at least that would vindicate Google.
Credit to Carl's Whine Rack blog for some tips.
For now I run killall fetchmail; fetchmail -f fetcher.$[$RANDOM % $numaccounts].rc periodically from crontab to cycle through the accounts, each defined individually in fetcher.1.rc, fetcher.2.rc, etc. This is acceptable while email events are relatively infrequent.

Intercept Print Jobs

We have some computers on which we charge for printing documents. When a user prints, I would like to intercept the print job, prompt them for their username / password so I can charge their account, then allow the print job to continue through to the printer.
How can this be accomplished? Is it possible to write such a utility in .NET?
You really need to look at creating a Port Monitor for this, and that is far from simple. You could look at RedMon. BTW, many printer vendors offer solutions to this which use codes embedded into the print stream (PCL/PS); the accounting data is then collected and retained on the printer.
For example, Xerox has something called Standard Accounting. When enabled in the driver it embeds PJL codes like this:
#PJL COMMENT OID_ATT_ACCOUNTING_INFORMATION_AVP "XRX_USERID,xxxx";
Once the job has been printed, the device records the user, the number of pages, etc., which can then be reported on.
The problem you will run into when doing this on the workstation / server is that detecting the number of pages printed can be difficult. If you are trying, for example, to charge by the page, you might be able to parse the number of pages from the file, or run it through a PCL or PS RIP to count them. But if the job is flagged for 2-up or 4-up printing and that work is done by the printer rather than the driver, you will charge the client for 4 pages when they really only printed 1. That is one of the many pitfalls.
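For example, a naive page count for PostScript jobs can be read from the DSC comments (a sketch in Ruby; PCL jobs would need a real RIP, and printer-side N-up is invisible here, exactly as described above):

    # Naive page count for a PostScript spool file via DSC comments.
    def ps_page_count(path)
      data = File.read(path, mode: 'rb')
      if (m = data.match(/^%%Pages:\s*(\d+)/))
        m[1].to_i
      else
        data.scan(/^%%Page:/).size   # fall back to counting %%Page: markers
      end
    end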

Printing from one Client to another Client via the Server

I don't know if it sounds crazy, but here's the scenario -
I need to print a document over the internet. My PC, ClientX, initiates the process using the web browser to access a ServerY on the internet, and the printer is connected to a ClientZ (maybe yours).
1. The document is stored on ServerY.
2. ClientZ is purely a client; no IIS, no print server, etc.
3. I have the specific details of ClientZ, IP, Port, etc.
4. It'll be a completely server-side application (no client-side piece on ClientZ) in ASP.NET & C#
- So, is it possible? If yes, please give me some clue. Thanks in advance.
This is kind of too big a question for SO, but basically what you need to do is:
upload files to the server -- trivial
do some stuff to figure out if they are allowed to print the document -- trivial to hard depending on scope
add items to a queue for printing and associate them with a user/session -- easy
render and print the document -- trivial to hard depending on scope
notify the user that the document has been printed
handle errors
The big unknowns here are scope: if this is for a school project, you probably don't have to worry about billing or queue priority in step two. If it's for a commercial product, billing can be a significant subsystem in itself.
The difficulty in step 4 depends directly on what formats you are going to support, as many formats will require document-specific libraries or applications. There are also security considerations here if this is a commercial product, since it isn't safe to try to render all types of files.
Notifications can be easy or hard depending on how you want to do them. You can post back to the HTML page, but depending on how long it's going to take for a job to complete, it might be nice to have an email option as well.
You also need to think about errors. What is going to happen when paper or toner runs out or when someone tries to print something on A4 paper? Someone has to be notified so that jobs don't just build up.
On the server I would run just the user-interaction piece on the web and have a "print daemon" running as a service to manage getting the documents printed and to monitor their status. I would use WCF to do IPC between the two.
Within the print daemon you are going to need a set of components to print different kinds of documents. I would make one assembly per type (or cluster of types) and load them into your service as plugins using MEF.
Sorry this is so general, but you are asking a pretty general and difficult-to-answer question.
