How to track usage of Hound - code-search-engine

What are some ways we could monitor Hound, both overall usage and specific searches? Are there any systems, either built in or available as extensions, that do this?
We are running Hound on the intranet but we have no visibility of how it's being used and how much usage it gets.
Currently Hound is installed through a Puppet pipeline.
Preliminary ideas:
1) Add a forwarding proxy in front of Hound to record the "hits" and send them to some DB store. That wouldn't track low-level usage of Hound (i.e. search queries), just the hits.
2) Somehow enable server logs and parse through the logs? I'm not sure how much info I would get from the logs and the parsing might get involved. Then send this info to some DB store.
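The two ideas can be combined: if a proxy in front of Hound writes a standard access log, the search terms can be recovered from the request URLs. Below is a minimal sketch, assuming the proxy logs in the common/combined format and that searches arrive as GET requests carrying a `q` query parameter on a path like `/api/v1/search`. Both are assumptions to verify against your own logs, and `extract_query` and `count_queries` are hypothetical helper names.

```python
import re
from collections import Counter
from urllib.parse import urlsplit, parse_qs

# Matches the request part of a common/combined-format access log line.
REQUEST_RE = re.compile(r'"(?P<method>[A-Z]+) (?P<url>\S+) HTTP/[\d.]+"')

# Assumed search path; check what your proxy actually logs.
SEARCH_PATH = "/api/v1/search"


def extract_query(line):
    """Return the search term from one log line, or None if it is not a search."""
    match = REQUEST_RE.search(line)
    if not match or match.group("method") != "GET":
        return None
    parts = urlsplit(match.group("url"))
    if parts.path != SEARCH_PATH:
        return None
    values = parse_qs(parts.query).get("q")
    return values[0] if values else None


def count_queries(lines):
    """Count how often each search term appears in an iterable of log lines."""
    counts = Counter()
    for line in lines:
        query = extract_query(line)
        if query is not None:
            counts[query] += 1
    return counts
```

The resulting counts could then be written to whatever DB store is chosen, for example from a scheduled job that reads the rotated log files.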

Related

Video Streaming for Mobile App

I'm building an iOS app for a client that allows users to pay a subscription and unlock additional content within the app. Part of the additional content will be videos which need to be streamed from a server... but I'm not sure whether we should use a hosting service (like Amazon CloudFront or Wowza, perhaps?) or roll our own solution.
Have any of you had experience with either of these options? It looks like this is supported natively by nginx, which we're currently using as our reverse proxy, but I'd like to hear some thoughts about that. I would also be somewhat concerned about saturating our server's 1Gb network connection...
Whatever the solution, we must be able to verify a person's account before they can access the video content. Variable bitrate is also desirable, and the ability to support >500 concurrent users. This company is also a new startup, so subscription costs are an important factor.
It is usually best to deploy streaming-specific software or services instead of generic HTTP servers such as Nginx. For Wowza, as an example, here's a quick list of features for this type of workflow.
Performance and scalability. You can do a quick comparison on playing back concurrent streams (using load test tools) and see what kind of load can be handled by an HTTP server vs Wowza.
Monitoring. Statistics collection is also integrated with Wowza, which may prove beneficial for start-up companies that need to leverage this kind of data mining.
Security. Wowza also has several options that you can use, such as Secure Token. For example, you can configure your mobile app to query the user's IP address once you determine that they are authorized to receive the stream. You can then generate a hash token based on this IP address and the stream they are authorized for, and only allow playback with the valid token. You can also expire these tokens.
Manager UI. Not as attractive for developers/sys admins, but users can take advantage of a relatively intuitive UI.
Extensibility. Wowza has REST and Java APIs that allow you to add custom modules or integrate third-party systems. For example, you can use a custom module that monitors stream connection time and cuts off any connection that lasts longer than x hours.
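The token idea from the Security point (a hash over the client IP and the authorized stream, with an expiry) can be sketched generically. This is not Wowza's actual Secure Token format; it is an HMAC-based illustration of the same principle, and `SECRET`, `make_token`, and `verify_token` are hypothetical names.

```python
import hashlib
import hmac
import time

# Hypothetical shared secret known to the app backend and the streaming server.
SECRET = b"replace-with-a-long-random-secret"


def make_token(client_ip, stream, expires_at, secret=SECRET):
    """Sign the client IP, stream name and expiry timestamp."""
    message = f"{client_ip}|{stream}|{int(expires_at)}".encode()
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


def verify_token(token, client_ip, stream, expires_at, now=None, secret=SECRET):
    """Accept the token only if it is unexpired and matches this IP and stream."""
    now = time.time() if now is None else now
    if now > expires_at:
        return False
    expected = make_token(client_ip, stream, expires_at, secret)
    # compare_digest avoids leaking information through comparison timing.
    return hmac.compare_digest(token, expected)
```

The backend would issue the token after checking the subscription, and the streaming side would recompute it from the connecting IP and the requested stream before allowing playback.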

Please suggest a good Monitoring and Alerting tool for applications hosted in cloud [closed]

Closed. This question is off-topic. It is not currently accepting answers.
Closed 10 years ago.
I am looking for a monitoring and alerting tool for my application hosted in the cloud. My application is hosted across multiple servers and I want to monitor all of them. I am interested in monitoring the following:
1. Service monitoring:
Check if the service is up. This requires:
trying to sign up a new user
logging in to the application with a given username/password and performing certain steps, such as a search
Monitoring QoS: how much time searches and some other operations take
2. Resource monitoring
Monitoring the following parameters in each server:
CPU utilization
load average
Memory usage
Disk usage
IOPS
3. Process monitoring
Monitor whether a set of processes is running. If they are not running, try restarting them.
Ex: php-fpm, my application binaries, mysql, nginx, smtp etc.
4. Monitoring log files
Error logs of my application
mysql error log
MySQL slow query log
etc.
Also I should be able to extend its usage by executing shell commands or writing my own shell scripts.
I should be able to set an alert if any monitored item is found to be problematic. I should be able to receive alerts through:
email
Mobile SMS
The monitoring system should maintain history for the period I want, so that after receiving an alert I can log in to the system, view past data (say, the past 2 weeks), and investigate problems.
Most important:
The tool should have a very good way of managing its own configuration.
The configuration should not be scattered at multiple places. All configuration should be stored in a centralized place. In future say, path of a monitored log file has changed. I would like to search and replace all occurrences of that file in my configuration.
I should be able to version control my configurations.
Instead of going to the web interface and setting configuration manually, I would like to set up a script that automatically loads all the configuration and starts monitoring.
I am exploring Zabbix but don't see a satisfactory way of configuration management. Should I try Nagios? Any other tool?
Two newer cloud-type monitoring solutions that may be of interest to you are http://logicmonitor.com/ and http://copperegg.com/.
LogicMonitor has many of your requirements out of the box as it has a bit of customization for your own alerting.
CopperEgg / RevealCloud is more base system level monitoring (CPU, memory, disk, and network throughput). It has a nice polished interface that is much more straightforward than LogicMonitor. But that is about it.
Well, considering you've tagged this with Zabbix, I assume you're considering this as an option.
We use Zabbix to monitor our Amazon EC2 instances as well as instances in our private OpenStack cloud. It's as simple as "apt-get install zabbix-agent", really.
Zabbix is especially useful for monitoring our private OpenStack cloud. We have the server scan an IP range and automatically set up checks, alerts, etc., based solely on the hostname of the machine found.
Nagios is one of the standard ways of monitoring and can support all the use cases you brought up (plus, plugins have probably already been written for all of them).
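Independently of which tool is chosen, the resource-monitoring part of the question boils down to collecting a few numbers and comparing them against thresholds kept in one place. The sketch below uses only the Python standard library; the threshold values and the names `THRESHOLDS`, `collect_metrics`, and `find_alerts` are hypothetical, and it is not a Zabbix or Nagios plugin.

```python
import os
import shutil

# Hypothetical thresholds; in practice these would live in a version-controlled file.
THRESHOLDS = {"disk_used_pct": 90.0, "load_avg_1m": 4.0}


def collect_metrics(path="/"):
    """Gather disk usage and 1-minute load average (os.getloadavg is Unix-only)."""
    usage = shutil.disk_usage(path)
    return {
        "disk_used_pct": 100.0 * usage.used / usage.total,
        "load_avg_1m": os.getloadavg()[0],
    }


def find_alerts(metrics, thresholds=THRESHOLDS):
    """Return a message for every metric that exceeds its threshold."""
    return [
        f"{name}={value:.1f} exceeds {thresholds[name]:.1f}"
        for name, value in sorted(metrics.items())
        if name in thresholds and value > thresholds[name]
    ]
```

Calling `find_alerts(collect_metrics())` from a cron job and mailing any non-empty result would cover the simplest alerting case; history and SMS delivery are what the dedicated tools add.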

Rails based hub and spoke distributed web site. Anything out there?

I need to design a system where we have a central Rails website for maintaining product information, some of which is rich media (photos, movies etc.) and we need a way to efficiently access this central information from a series of information kiosks. The central system will be used to update and control access to the information and the kiosks will primarily display this with no editing required. The only traffic which is likely to move back from kiosk to central site is usage information which is not bandwidth constrained.
My initial thoughts are to run separate Rails servers on each kiosk and 'somehow' (eg. scheduled rake task) synchronise the relevant content from the central server to each kiosk. Note that the kiosks won't all have the same content on them as it will be location dependent. We might need to employ something like Amazon S3 storage to host content.
Another option would be to employ some sort of advanced caching (ie. more advanced than standard browser caching) on each kiosk to minimise network bandwidth requirements and speed things up. I've used 'squid' before but only as a general purpose site cache server, I don't know if it can step up to what I need here.
So, my question is whether anyone out there has attempted anything like this before and what sort of architecture you found to work. I'd be interested in hearing if there are any Rails plugins which are relevant to my requirements and/or any smart caching servers.
Many thanks,
Craig.
I know it's not possible for every application, but you could generate static cache of the content and use a scheduled task to update each kiosk from that cache. Then you don't have to maintain rails servers in each one.
Depending on what you're running on the kiosks, if you need a bit more interactivity, you can run a Sinatra or Camping app. Those are a fair bit lighter weight than Rails. You can communicate through XML. If you're running a Flash app on the kiosk, look at the rubyamf library.
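The scheduled update from a static cache can be sketched as a manifest comparison: the central server publishes a map of file paths to content hashes for each kiosk, and the kiosk downloads only what differs. This is a minimal illustration, not an existing Rails plugin; the manifest format and the names `local_manifest` and `plan_sync` are assumptions.

```python
import hashlib
from pathlib import Path


def file_digest(path):
    """Return the SHA-256 hex digest of a file's contents."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def local_manifest(root):
    """Map each file under root (as a relative POSIX path) to its digest."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): file_digest(p)
        for p in root.rglob("*")
        if p.is_file()
    }


def plan_sync(remote, local):
    """Compare manifests and return (paths to download, paths to delete)."""
    to_download = sorted(p for p, d in remote.items() if local.get(p) != d)
    to_delete = sorted(p for p in local if p not in remote)
    return to_download, to_delete
```

Because each kiosk gets its own remote manifest, location-dependent content falls out naturally: a kiosk only ever sees the paths listed for it.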

Best practice for rate limiting users of a REST API?

I am putting together a REST API and as I'm unsure how it will scale or what the demand for it will be, I'd like to be able to rate limit uses of it as well as to be able to temporarily refuse requests when the box is over capacity or if there is some kind of slashdotted scenario.
I'd also like to be able to gracefully bring the service down temporarily (while giving clients results that indicate the main service is offline for a bit) when/if I need to scale the service by adding more capacity.
Are there any best practices for this kind of thing? Implementation is Rails with mysql.
This is all done with the outer web server, the one that listens to the world (I recommend nginx or lighttpd).
Regarding rate limits, nginx can limit each IP to, e.g., 50 requests per minute; everything over that gets a 503 page, which you can customize.
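To make the per-IP limit concrete, here is a minimal sliding-window sketch of the idea. It is not how nginx implements it (nginx's `limit_req` module uses a leaky-bucket method), and the class name `SlidingWindowLimiter` is hypothetical; it only illustrates "at most 50 requests per minute per IP, reject the rest".

```python
import time
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow at most `limit` requests per `window` seconds for each client key."""

    def __init__(self, limit=50, window=60.0):
        self.limit = limit
        self.window = window
        self._hits = defaultdict(deque)  # client key -> timestamps of recent requests

    def allow(self, key, now=None):
        """Record a request and return True, or return False if over the limit."""
        now = time.monotonic() if now is None else now
        hits = self._hits[key]
        # Drop timestamps that have fallen out of the window.
        while hits and now - hits[0] >= self.window:
            hits.popleft()
        if len(hits) >= self.limit:
            return False  # caller would answer with an error status such as 503
        hits.append(now)
        return True
```

Keeping this logic in the front web server or proxy rather than in the Rails app means rejected requests never reach the application processes.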
Regarding expected temporary downtime, in the Rails world this is done via a special maintenance.html page. There is some kind of automation that creates or symlinks that file when the Rails app servers go down. I'd recommend relying not on the file's presence but on the actual availability of the app server.
But really, you are able to start and stop services without losing any connections at all. For example, you can run a separate instance of the app server on a different UNIX socket or IP port and have the balancer (nginx/lighty/haproxy) use that new instance too. Then you shut down the old instance and all clients are served by the new one only. No connection is lost. Of course this scenario is not always possible; it depends on the type of change you introduced in the new version.
haproxy is a balancer-only solution. It can extremely efficiently balance requests to app servers in your farm.
For a fairly big service you end up with something like:
api.domain resolving round-robin to N balancers
each balancer proxying requests to M web servers for static content and P app servers for dynamic content. Oh well, your REST API doesn't have static files, does it?
For a fairly small service (under 2K rps) all balancing is done inside one or two web servers.
Good answers already - if you don't want to implement the limiter yourself, there are also solutions like 3scale (http://www.3scale.net) which does rate limiting, analytics etc. for APIs. It works using a plugin (see here for the ruby api plugin) which hooks into the 3scale architecture. You can also use it via varnish and have varnish act as a rate limiting proxy.
I'd recommend implementing the rate limits outside of your application, since otherwise high traffic will still have the effect of killing your app. One good solution is to implement it as part of your Apache proxy, with something like mod_evasive.

What are the requirements for an application health monitoring system?

What, at a minimum, should an application health-monitoring system do for you (the developer) and/or your boss (the IT Manager) and/or the operations (on-call) staff?
What else should it do above the minimum requirements?
Is monitoring the 'infrastructure' applications (ms-exchange, apache, etc.) sufficient or do individual user applications, web sites, and databases also need to be monitored?
if the latter, what do you need to know about them?
ADDENDUM: thanks for the input, i was really looking for application-level monitoring not infrastructure monitoring, but it is good to know about both
Whether the application is running.
Unusual cpu/memory/network usage.
Report any unhandled exceptions.
Status of various modules (if applicable).
Status of external components (databases, webservices, fileservers, etc.)
Number of pending background tasks (if applicable).
Maybe track usage of the application and report statistics on most/less used functionalities so you know where optimizations are most beneficial.
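Several items in this list (is the application running, status of modules, status of external components) share one shape: a named check that either passes, fails, or blows up. A minimal sketch of collecting them into one report follows; the function names and the `"ok"`/`"failed"`/`"error: ..."` status strings are hypothetical choices, not part of any particular monitoring product.

```python
def run_checks(checks):
    """Run each named check; a check passes if it returns truthy without raising."""
    results = {}
    for name, check in checks.items():
        try:
            results[name] = "ok" if check() else "failed"
        except Exception as exc:  # report the failure instead of crashing the monitor
            results[name] = f"error: {exc}"
    return results


def is_healthy(results):
    """The application counts as healthy only if every check passed."""
    return all(status == "ok" for status in results.values())
```

Each check would be a small callable, for example one that opens a database connection or reads the length of the background task queue, so the report can be exposed on a status page or polled by an external monitor.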
The answer is 'it depends'. Why do you need to monitor? How large is your operations staff? Do you need reporting? What is the application environment? Who cares if the application fails? Who cares if an exception happens? Are any of the errors recoverable? I could ask questions like these for a long time.
Great question.
Some time ago we were looking for an application-level monitoring solution for our needs, without any luck. Popular monitoring solutions are mostly aimed at monitoring infrastructure and, in my opinion, they are too complicated for the requirements of most small and mid-sized companies.
We mainly required the following features:
alerts - we wanted to know about an incident as fast as possible
painless management - a hosted service would be the best
visualizations - it's good to know what is going on and to learn something from the data
Because we didn't find a suitable solution, we started to write our own. We ended up with an up-and-running service called AlertGrid. (You can try it for free, of course.)
The idea behind it is to provide an easy way to handle custom monitoring scenarios. The integration API is very simple (one function with two required parameters). At the moment we and others are using it to:
monitor scheduled tasks (cron jobs)
monitor entire application logic execution
alert on errors in applications
we are also working on examples of basic infrastructure monitoring using AlertGrid
This is such an open ended question, but I would start with physical measurements.
1. Are all the machines I think are hosting this site pingable?
2. Are all the machines which should be serving content actually serving some content? (Ideally this would be hit from an external network.)
3. Is each expected service on each machine running?
3a. Have those services run recently?
4. Does each machine have hard drive space left? (Don't forget the db)
5. Have these machines been backed up? When was the last time?
Once one lays out the physical monitoring of the systems, one can address the questions specific to a system:
1. Can an automated script log in? How long did it take?
2. How many users are live? Have there been a million fake accounts added?
...
These sorts of questions get more nebulous, and can be very system specific. They also can usually be derived reactively when responding to physical measurements. For example, a hard drive fills up, and it turns out the web server logs grew because a bunch of agents created too many fake users. That kind of thing.
While plan A shouldn't necessarily be reactive, that is how many sites end up setting up a monitoring system.
Minimum: make sure it is running :)
However, some other stuff would be very useful. For example, the CPU load, RAM usage and (in multiuser systems) which user is running what. Also, for applications that access network, a list of network connections for each app. And (if you have access to client computer(s)) it would be cool to be able to see the 'window title' of the app - maybe check each 2-3 minutes if it changed and save it. Also, a list of files open by the application could be very useful, but it is not a must.
I think this is fairly simple - monitor so that you can be warned early enough before something goes wrong. That means monitor dependencies and the application itself.
It's really hard to provide specifics if you're not going to give details on the application you're monitoring, so I'd say use that as a general rule.
At a minimum you want to know that the system is healthy. What counts as healthy is subjective: are the computers up, do the needed resources exist, is data flowing through the system, is the data producing proper results, and so on.
In my project we monitor most of this and then some. It really comes down to the highest level at which you can verify that everything is working. In our case we need to know down to the data output. If you only need to know whether the machines are up, it saves you from trying to show an inexperienced end user what is wrong.
There are also "off the shelf" tools that will do a lot of the hard work for you if you are just looking too hard into data results. I particularly liked Nagios when I was looking around but we needed more than it could easily show so I wrote our own monitoring system. Basically we also watch for "peculiarities" in the system, memory / cpu spikes, etc...
thanks everyone for the input, i was really looking for application-level monitoring not infrastructure monitoring, but it is good to know about both
the difference is:
infrastructure monitoring would be servers plus MS Exchange Server, Apache, IIS, and so forth
application monitoring would be user machines and the specific programs that they use to do their jobs, and/or servers plus the data-moving/backend applications that they run to keep the data flowing
sometimes it's hard to draw the line - an oversimplified definition might be "if your team wrote it, it's an application; if you bought it, it's infrastructure"
i think in practice it is best to monitor both
What you need to do is break down the business process of the application and then have the software emit events at major business components. In addition, you'll need to create end-to-end synthetic transactions (e.g. emulating end users clicking on a website). All that data would be fed into a monitoring tool. In the past, I've used JMX for applications, which flowed into Tivoli Monitoring's JMX Adapter, and I've written scripts that implement a "fake user" and pipe the results into Tivoli Monitoring's Script Adapter. Tivoli Monitoring takes the data and creates application health and performance charts from it.
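A synthetic transaction of the "fake user" kind can be sketched as a list of named steps that are run in order and timed, producing one event per step for the monitoring tool to consume. The sketch below is tool-agnostic and does not use any Tivoli or JMX API; `run_synthetic_transaction` and the event fields are hypothetical.

```python
import time


def run_synthetic_transaction(steps, clock=time.perf_counter):
    """Run named steps in order, timing each; stop at the first failure."""
    events = []
    for name, step in steps:
        start = clock()
        try:
            step()
            status = "ok"
        except Exception as exc:
            status = f"failed: {exc}"
        events.append({"step": name, "status": status, "seconds": clock() - start})
        if status != "ok":
            break  # later steps usually depend on earlier ones (e.g. login before search)
    return events
```

In a real script each step would perform an HTTP request against the site (open the home page, log in, run a search), and the returned events would be printed or posted in whatever format the monitoring tool's script adapter expects.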

Resources