Collecting metrics for your application - monitoring

I'm curious how people are integrating the sending of metrics to Graphite. It appears most are using a client (many are available on the statsd GitHub page) that sends to statsd, which sends on to Carbon.
My question is: do you think it is OK to include this "cross-cutting concern" directly in your code? What I am seeing is that my application code has gone from being nice and clean to not so nice and clean, with code to gather metrics intertwined with my business code.
Any thoughts?

I do think that it's fine to include code to send business metrics in your application code. As #tomer-peled says in his comment, there is a close analogy with logging.
However, I understand your reluctance to scatter this stuff around and possibly obfuscate the code. My approach in situations like this is to accept a certain amount of mess to begin with; then, as more examples emerge, I try to identify recurring structures that point to abstractions which would clean things up.
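To make that concrete, here is a minimal sketch of one such abstraction, assuming the statsd-ruby client gem and a local statsd daemon; the module, class, and metric names are illustrative, not from the question:

    # Keep the metrics concern behind one helper so business code stays clean.
    require 'statsd'  # statsd-ruby gem; the daemon forwards on to Carbon/Graphite

    module Instrumented
      STATSD = Statsd.new('localhost', 8125)

      # Times a block and records success/failure counters under `name`.
      def with_metrics(name)
        start = Time.now
        result = yield
        STATSD.increment("#{name}.success")
        result
      rescue StandardError
        STATSD.increment("#{name}.failure")
        raise
      ensure
        STATSD.timing(name, ((Time.now - start) * 1000).round)
      end
    end

    class OrderService
      include Instrumented

      def place_order(order)
        with_metrics('orders.place') do
          # ... business logic only ...
        end
      end
    end

Business methods then read as before, with a single wrapper line marking what is measured.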

Related

Log analysis for Rails

I have a Rails app which makes heavy use of ActiveResource and HTTParty to make API calls.
Is there any library/extension to log the requests and parse them, so that log analysis becomes easier and automated?
RailsLogAnalyser is good, but what about the extra calls, and what are the conventions?
Something like an open-source/self-hosted alternative to New Relic, but with extensions to plug in your own logging.
EDIT: let me clarify:
1) If we use ActiveResource it logs the calls in a certain format, but any plain-bones HTTP calls you make will not follow this convention.
2) Analytics/logging software that will make sense of the logs and have some metric of the number of calls made, counts of calls, etc.
3) Will it support syslog or syslog-ng, or any other distributed logging framework?
I have started playing around with rails-analyzer, but I am not sure whether it will meet your needs. It seems to be aimed more at looking for optimization points, but I have not delved in enough yet to see what automated portions there are.
I'm using request-log-analyzer; check it out, maybe it will be helpful for you: https://github.com/wvanbergen/request-log-analyzer
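For the plain HTTP calls that don't follow ActiveResource's log format (point 1 above), one low-tech option is to funnel them all through a single logging helper so every call produces one consistent, parseable line. A rough sketch with hypothetical names (ApiLogger, log/api_calls.log), not a real gem:

    require 'httparty'
    require 'logger'

    module ApiLogger
      LOG = Logger.new('log/api_calls.log')

      # Wraps any HTTP call and writes one uniform line per request,
      # which is easy to grep, count, or feed to a syslog pipeline.
      def self.call(method, url)
        start = Time.now
        response = yield
        ms = ((Time.now - start) * 1000).round
        LOG.info("API #{method.to_s.upcase} #{url} status=#{response.code} duration=#{ms}ms")
        response
      end
    end

    # Usage:
    response = ApiLogger.call(:get, 'https://api.example.com/items') do
      HTTParty.get('https://api.example.com/items')
    end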

How can I test if my web application could handle heavy traffic?

What would be a proper way to simulate a large number of requests to test if my web application can handle it?
You could try using Microsoft's WCAT tools. Look here: http://support.microsoft.com/kb/231282
They're free, too. That's always nice.
Depending on your budget, you may be interested in some load testing software designed for this. A Google search brings up all sorts of alternatives. This is probably the best way to do it.
This one has a free trial version and isn't too pricey, but I would recommend shopping around first.
I've used JMeter in the past, and I find it to be very useful for stress/load testing a website, even sites written in ASP.NET (with or without MVC).
In general, with any tool, you would want to write a script of what an average user of your site would do; you may even end up creating several such scripts. Tools like JMeter even allow a random element to be added to a script. With these scripts created, a load testing tool can then simulate as many users as you desire hitting your site.
I would recommend letting JMeter slowly ramp up the number of concurrent users while you watch the response-time graph. The point where the response time starts climbing sharply is the point where you've hit the maximum number of users (given your scripts) that your site can handle.
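To see the ramp-up effect in miniature before reaching for a full tool, something like the following Ruby sketch works; the target URL and the user/request counts are placeholders, and a real load test should still use JMeter or similar:

    require 'net/http'

    TARGET = URI('http://localhost:3000/')

    # One simulated user: issue a few requests, return the mean latency.
    def simulate_user(requests = 10)
      times = Array.new(requests) do
        start = Time.now
        Net::HTTP.get_response(TARGET)
        Time.now - start
      end
      times.sum / times.size
    end

    # Ramp up from 1 to 20 concurrent users and watch where latency climbs.
    (1..20).each do |users|
      threads = Array.new(users) { Thread.new { simulate_user } }
      avg = threads.map(&:value).sum / users
      puts format('%2d users: avg %4.0f ms', users, avg * 1000)
    end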
ab and httperf are two more Unix-y options, if you don't mind delving in that direction.
There's a nice screencast on using httperf from PeepCode.
Use the load testing tools from Visual Studio Team System (2010 if you can get it).
The tools are great to use and provide wonderful instrumentation. There is also a programming model to go with the tools, allowing you to make some very complex testing scenarios possible.
Post the URL on stackoverflow.
Make it sound like a challenge, so lots of people come check it out: "Can you find the hidden performance problem in this app?"

Seeking suggestions for my graduation project in web development

I have to finalize the details of my graduation project soon.
I have set a goal for myself: the project should have real value (perhaps as an open-source project or a tool that can be used by others).
Can you suggest some ideas or projects pertaining to one of:
web architecture, social media, Ruby, Ruby on Rails, or testing?
Thanks! :D
First choose something that both interests you and is in the scope of your abilities.
After you have made such a choice, formalize the decision, perform research, and build requirements; at this stage you can still decide "how big a bite you can chew". Most professors I have dealt with are understanding of partial implementations as long as the expectations have been established in advance.
Finally, decide on the tools/language and approach for implementation that best fit the requirements and resources (this includes your time, desired level of effort versus payoff, and ability).
I personally find web work absolutely dull, but if I were to write something new, by choice, that was "web-related" and "social" it would be a multi-user interactive whiteboard which is in turn an extension of a real-time collaborative document. (I actually used this as one of my own projects, albeit I focused on a specific protocol implementation.)
I had to do this a while ago, and I really needed some help with that same problem.
I had a couple of ideas; since I have already used one of them, I'll suggest the other:
It's a network monitoring system based on the SNMP protocol. It gets its data from the SNMP agent on the desired machine (which can be a computer, a router, a printer, or anything connected to the network) and alerts the administrator (when something is wrong, such as too many open ports, a denial-of-service problem, or too many TCP packets suggesting a ping flood) by whatever channel you like (email, SMS, a live Ajax warning, ...).
Sorry, it sounds messy, but basically it would be like Cacti or OpenNMS (just Google them), and it's based on a lot of technologies: Ruby, MySQL (to record actions and hold the user database), Linux (I would use Debian), SNMP agents, cron (to schedule the system's periodic work), SSH/telnet (to react to harmful activity), and PHP/Ruby on Rails to build a web interface that can also connect to your database.
I know it sounds like a big, fat thing to do, but it's not that hard. I can provide more detail if you want, because I worked out something of a specification for it.
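As a very rough sketch of the polling core of such a system, using the Ruby snmp gem; the host address and the alerting hook are placeholders:

    require 'snmp'  # ruby snmp gem

    # Poll one device's SNMP agent; on timeout, raise an alert however
    # you like (email, SMS, a live Ajax warning, ...), e.g. from a cron job.
    def check_host(host)
      SNMP::Manager.open(host: host) do |manager|
        manager.get(['sysDescr.0', 'sysUpTime.0']).each_varbind do |vb|
          puts "#{host}: #{vb.name} = #{vb.value}"
        end
      end
    rescue SNMP::RequestTimeout
      warn "#{host} did not respond; alert the administrator"
    end

    check_host('192.0.2.10')  # a router, printer, or server with an SNMP agent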
When I was in college, I used to look into a lot of programming contests (which involved 3-4 months of project work). I recently came across https://tgmc.in/project_scenario.php. It's quite possible that you can get some ideas after reading these project descriptions!

Web Server Log Analysis Tool

Any suggestions for an accurate Web Log analysis tool to generate reports on the IIS logs? We used WebTrends, but I don't feel it was accurate.
To analyze weblogs, I don't think you can go wrong with Analog: http://www.analog.cx/
If you are analyzing your own logs, which are often huge files, you will want the fastest analyzer you can find. Analog is fast.
You'll want one that's been around a while and is still supported. Analog just celebrated its 10th birthday.
Analog claims to be the most popular logfile analyser in the world.
It supports multiple languages.
Did I mention it's free and open source?
As far as accuracy goes, no tool gives perfect results. JavaScript-based tracking often fails to catch hits. Trying to track individual people's paths through a website (i.e., for analytics purposes) is fraught with problems. And even differentiating hits from visits and screening out the bots is more of a black art than a science.
What is best is simply to have a tool that gives decent basic statistics that tell you what you need to know.
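As an illustration of what "decent basic statistics" means in practice, here is a toy Ruby pass over a W3C-format IIS log that counts hits per URL and crudely screens out bots. The field positions depend on the log's #Fields directive, so the indexes below are assumptions:

    hits = Hash.new(0)

    File.foreach('ex080101.log') do |line|
      next if line.start_with?('#')           # skip W3C header directives
      fields = line.split
      uri    = fields[4].to_s                 # cs-uri-stem (position assumed)
      agent  = fields[9].to_s                 # cs(User-Agent) (position assumed)
      next if agent =~ /bot|crawl|spider/i    # crude bot screening
      hits[uri] += 1
    end

    # Top ten URLs by hit count.
    hits.sort_by { |_, n| -n }.first(10).each do |uri, n|
      puts format('%6d  %s', n, uri)
    end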
I've looked at other tools, such as Deep Log Analyzer: http://www.deep-software.com/, which attempts to do analytics from your weblogs. But speed was a problem. They claim their new version 3.5 - April 2008, which I didn't try, has improved performance. The big advantage of a program like this is the advanced reporting you can do, including custom SQL requests. You have to purchase their professional version ($200) to do most of the analytics and custom queries. If Analog is too simple for you, then try the free version of Deep Log Analyzer.
And you can also try Microsoft's own Log Parser, as was the recommended answer in: https://stackoverflow.com/questions/157677/a-good-iis-log-viewer-for-large-log-files.
But you will need some extra skills to use it.
What are you wanting to analyze from your logs? There are a bunch of tools out there - free or paid for - that will go through the logs and spit out a great variety of figures. Some have real meaning, others are best used with a grain of salt.
What none will show you is "How many people are actually reading my wonderful web pages". Those that attempt to show "distinct site visitors" or any detailed metrics are at best a rough approximation to an indication of a vague trend...
But for what it's worth, we use Analog.
SHORT ANSWER:
You are correct to question the results; log analysis is not adequate to report actual traffic.
LONGER ANSWER:
WebTrends is a great tool for what it delivers. But as a previous administrator of a WebTrends installation, I found that web logs are notoriously bad at capturing metrics of interest.
For instance, if there exists any caching in your web delivery stack (or on the consumer's side -- I'm shaking my fist at YOU, AOL!), then your web logs are instantly non-reflective of your site's actual activity. This is because log analysis assumes that all user consumption will translate to an HTTP request back to the web server, and thus be recorded in the IIS logs. In the case of a cache, this is not so.
In the future if you want more reliable results, you ultimately need to ensure that there exists a way to bust any caching strategy. The obvious answer is dynamic content. But if you do not want to rewrite all of your content in such a fashion, just ensure your web traffic analysis uses a dynamic call.
WebTrends actually offers a solution to this problem, called SDC server. This is exactly what Google Analytics offers as well-- it's a javascript call back to the analysis server.
...I could go for days on this. If you want more specific information, comment back. ;)
EDIT: With WebTrends specifically, it is quite important to configure session tracking beyond their default IP/userAgent configuration. If your web server assigns a session cookie, you will find this increases your reliability, especially for differentiating between users who may sit behind the same NAT.
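To make the "dynamic call" idea above concrete: the trick is a tiny uncacheable endpoint that every page references, so each view produces a logged request even when the page itself is served from a cache. A hedged Rack sketch in Ruby (the file name, endpoint, and log path are all illustrative, and this stands in for what SDC or Google Analytics do for you):

    # config.ru -- run with `rackup`; pages embed something like
    # <img src="http://stats.example.com/beacon?page=/home">
    require 'logger'

    LOG = Logger.new('log/beacon.log')

    run lambda { |env|
      req = Rack::Request.new(env)
      LOG.info("view page=#{req.params['page']} ip=#{req.ip} ua=#{req.user_agent}")
      # The logged request is the point; the response can be empty,
      # and the no-store header keeps caches from swallowing later hits.
      [204, { 'cache-control' => 'no-store' }, []]
    }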
I have had really good luck with SmarterStats, from SmarterTools.
There is a free logging package from Microsoft for viewing this information using SQL Server Reporting Services. Google it.
Doing it with the logs is only a good idea if it's internal - I'd use Google Analytics for anything on the internet.
I have been using Summary, which is paid software, for years, and love it. The cost of updates is getting to me, though; paying for an update just to get user-agent string updates is getting bothersome. Not that there are no other fixes, I just tend not to need them.
Anyone care to share how Summary compares to Analog?
Look at the XpoLog log analysis platform for web application servers and web server logs. It is a log management and analysis platform that integrates with web server logs, creates reports, provides search and a log viewer, and also monitors for problems.

What weaknesses can be found in using Erlang?

I am considering Erlang as a potential for my upcoming project. I need a "Highly scalable, highly reliable" (duh, what project doesn't?) web server to accept HTTP requests, but not really serve up HTML. We have thousands of distributed clients (other systems, not users) that will be submitting binary data to central cluster of servers for offline processing. Responses would be very short, success, fail, error code, minimal data. We want to use HTTP since it is our best chance of traversing firewalls.
Given this limited information about the project, can you provide any weaknesses that might pop up using a technology like Erlang? For instance, I understand Erlang's text processing capabilities might leave something to be desired.
Your comments are appreciated. Thanks.
This sounds like a perfect candidate for a language like Erlang. The scaling properties of the language are very good, but if you're worried about the data processing abilities, you shouldn't be. It's a very powerful language, with many libraries available for developers. It's an old language, and it's been heavily used/tested in the past, so everything you want to do has probably already been done to some degree.
Make sure you use Erlang version R11B-5 or newer! Earlier versions of Erlang did not provide the ability to time out TCP sends. As a result, stalled or malicious clients could execute a DoS attack on your application by refusing to receive data you send them, thus locking up the sending process.
See issue OTP-6684 in the R11B-5 release notes.
With Erlang the scalability and reliability are there, but from your project definition you don't outline what type of text processing you will need.
I think Erlang's main limitation might be finding experienced developers in your area. Do some research on the availability of Erlang architects and coders.
If you are going to teach yourself or have your developers learn it on the job keep in mind that it is a very different way of coding and that while the core documentation is good a lot of people do wish there were more examples. Of course the very active community easily makes up for that.
"I understand Erlang's text processing capabilities might leave something to be desired."
The starling project already provides basic Unicode support, and there is an EEP (Erlang Enhancement Proposal), currently in draft, to bring Unicode handling into the mainstream of Erlang/OTP.
I encountered some problems with Redis read performance from Erlang; here is my question. I tend to think the reason is the Erlang-written client module, which has trouble processing tons of strings while communicating with Redis.
