I need to display logfiles in real time on a user's web page using Ruby/Rails. Users should be able to watch the logfile streaming without refreshing the page.
The logfiles may not always be on the same machine that runs Rails.
Is this possible in Ruby/Rails?
You could do this with an automatically refreshing page, using AJAX when JavaScript is available (this avoids the page flicker and extra bandwidth usage caused by a full page reload).
The other approach (actually updating the page incrementally) involves two separate issues: (1) reading the file as it grows, and (2) sending the answer without closing the connection. I have seen some solutions doing this (not in Rails, though), but unfortunately they tend not to be very reliable (browsers and other parts of the system will time out), and users get confused when the page never finishes loading.
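For issue (1), here is a minimal sketch of reading only the part of a file that has been appended since the last look. The method name read_new_log_data is made up for illustration, and the sketch assumes the logfile is readable from the machine running the code (for example via a network mount). An AJAX poller could send its last offset with each request and get back just the new text.

```ruby
# Return whatever was appended to `path` since byte `offset`,
# together with the offset to pass in on the next call.
# (read_new_log_data is a hypothetical helper, not a Rails API.)
def read_new_log_data(path, offset)
  size = File.size(path)
  offset = 0 if size < offset # file was truncated or rotated: start over
  return ["", offset] if size == offset

  data = File.open(path, "rb") do |f|
    f.seek(offset)
    f.read(size - offset).to_s
  end
  [data, offset + data.bytesize]
end
```

The caller keeps the returned offset (in the session, or on the client) and passes it back on the next poll, so each request transfers only the new lines.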
Ruby and Rails can read files, but the issue is going to be accessing from different machines. Are the logfiles on the same network?
What you're asking for is not necessarily Rails-specific. It's more a question of technology, IMO.
Because the classic web model is client/server and pushing data asynchronously is not standard, you'll need to do it in one of two ways:
1) fake it by polling from the client.
2) use a different technology. For Rails, you may want to look into something like Comet or WebSockets.
I've been working on a Rails project that's unusual for me in the sense that it's not going to use a MySQL database and will instead roll with MongoDB + Redis.
The app is pretty simple - "boot up" data from mongoDB to Redis, after which point rails will be ready to take requests from users which will consist mainly of pulling data from redis, (I was told it'd be pretty darn fast at this) doing a quick calculation and sending some of the data back out to the user.
This will be happening ~1500-4500 times per second, with any luck.
Before the might of the user army comes down on the server, I was wondering if there was a way to "simulate" the page requests somehow internally - like running a rake task to simply execute that page N times per second or something of the sort?
If not, is there a way to test that load and then time the requests to get a rough idea of the delay most users will be looking at?
Caveat
Performance testing is a very broad topic, and the right tool often depends on the type and quality of results that you need. As just one example of the issues you have to deal with, consider what happens if you write a benchmark spec for a specific controller action, and call that method 1000 times in a row. This might give a good idea of performance of that controller method, but it might be making the same redis or mongo query 1000 times, the results of which the database driver may be caching. This also ignores the time it'll take your web server to respond and serve up the static assets that are part of the request (this may be okay, especially if you have other tests for this).
Basic Tools
ab, or ApacheBench, is a simple command-line tool that you can use to test the throughput and speed of your app. I usually go to this first when I want to send a thousand requests at a web server, or test how many simultaneous requests my app can handle (e.g. when comparing mongrel, unicorn, thin, and goliath). Because all requests originate from the same machine, this is good for a small number of requests, but as the number of requests grows, you'll be limited by the resources on your testing machine (network stack, cpu, and maybe memory).
Benchmark is a module in Ruby's standard library, and is great for quickly spitting out some profiling information. It can also be used with Test::Unit and RSpec. If you want a rake task for doing some quick benchmarking, this is probably the place to start.
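As a rough illustration of the Benchmark approach, the sketch below times a block N times and reports simple statistics. The helper name time_requests and the dummy workload are assumptions for the example; in a rake task you would replace the block with an HTTP request or a call into your app.

```ruby
require "benchmark"

# Call the given block `count` times and collect simple timing statistics.
# (time_requests is a hypothetical helper name.)
def time_requests(count)
  timings = Array.new(count) { Benchmark.realtime { yield } }
  {
    total: timings.sum,
    average: timings.sum / count,
    slowest: timings.max
  }
end

# Stand-in workload; swap in the request you actually want to measure.
stats = time_requests(100) { (1..1_000).reduce(:+) }
puts format("avg %.6fs, slowest %.6fs", stats[:average], stats[:slowest])
```

Keep the caveat above in mind: calling the same thing repeatedly may mostly measure caches, so treat the numbers as a rough guide.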
mechanize - I like using mechanize for quickly scripting an interaction with a page. It handles cookies and forms, but won't go and grab assets like images by default. It can be a good tool if you're rolling your own tests, but shouldn't be the first one to go to.
There are also some tools that will simulate actual users interacting with the site (they'll download assets as a browser would, and can be configured to simulate several different users). Most notable are The Grinder and Tsung. While still very much in development, I'm currently working on tsung-rails to make it easier to automate rails load testing with tsung, and would love some help if you choose to go in this direction :)
Rails Profiling Links
Good overview for writing performance tests
Great slide deck covering most of the latest tools for profiling at various levels
Hi
I've been working on a medium-sized MVC project. It runs fine, and fast, on localhost. Each page retrieves a lot of server-side data, and I use a lot of jQuery to minimize the traffic to the server, but even so the web page loads very slowly. There are many events on which I retrieve JSON results, to get a specific number from the database and make calculations; this data takes a long time to reach the web page, although on localhost it is shown immediately. Also, when I submit pages, the submit takes an awfully long time. I've published my project to GoDaddy's server, and my database is there too. What could be making the project that slow? How can I minimize it? And why does it happen only when the website is online and not on localhost as well?
The issue could be anywhere, and the only certain way to find it is to instrument the code. I suggest adding simple logging traces with date-time stamps to your server code (the logging should be configurable; any logging framework, including System.Diagnostics.Trace, should support that) and checking where the time is spent. For example, database trips can be expensive. If you don't find the culprit in the server-side code, i.e. the server is serving the request in a reasonable time, then you have to look at performance over the network. Tools such as Fiddler (or Firefox) should help you here. Sometimes issuing too many requests from the browser is also a problem, because the browser may make only n concurrent requests, or the server may have been configured to accept only n requests from a particular client; this can serialize the requests and increase the total response time. These scenarios are difficult to catch on localhost because network latency is almost zero there. You may also use a tool such as YSlow for related performance improvement suggestions. But please do your investigation first, find the bottlenecks, and then ask for solutions to specific problems.
Run it in Chrome. Turn on the developer tools. Expand the Console and watch for errors. From there you can also monitor the network calls to see which are slow.
If your MVC project uses Entity Framework (based on LINQ), it is sure to be slow, because LINQ is slow compared to the old ADO.NET.
I am writing a Rails app that "scrapes/navigates" some other websites and webservices for content. I am using Mechanize and Savon to do the heavy lifting.
But given the dynamic nature of the web, I'd like to make my calls to these editable by the admin users of the site - rather than requiring me to release a new version of the site.
The actual scraping thread happens async to the website, using the daemons gem.
My requirements are:
Thinking that the scraping/webservice calling code is quite simple, the easiest route is to make the whole class editable by the admins.
Keep a history of the scraping code - so that we can fairly easily revert if we introduce a problem.
Initially use the code from the file system, but as soon as that's been edited and stored somewhere, use that code instead.
I am thinking my options are:
Store the code in the db (with a history table for the old versions)
Store the code in a private git repo somewhere and access that for the history/latest versions.
I am thinking the git route might be easiest, given its raison d'etre is to track file history...
But perhaps there is a gem/plugin that does all this for me, out of the box?
Thanks in advance for any tips/advice.
~chris
I really hope you aren't doing something like what's talked about here...
Assuming you are doing a proper mixin, there used to be a gem called "acts_as_versioned" which would do something like what you want. It's been a while, so I don't know if it's been turned into a plugin or if it's been abandoned. Essentially, the process it used was to provide a combination key for your versioned table.
Your database would have a structure like this:
Key column (id for the record)
Version column (id for the record's version)
All the record attributes
Let's say you had a table for your scripts, and the script you wanted has three versions. Your table would have the following records:
123, 3, '#Be good now'
123, 2, 'puts "Hi"'
123, 1, '#Do not be bad'
Getting the most recent version would be as simple as
Scripts.find :first, :conditions=>{:id=>123}, :order=>"version desc"
Rolling back would be as simple as removing the most recent version, or having another table with a pointer to the active version. It's up to you.
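To make the id/version scheme concrete, here is a small in-memory sketch in plain Ruby. It is not how acts_as_versioned is implemented; ScriptVersion and ScriptStore are hypothetical names, and an array stands in for the database table.

```ruby
# One row of the hypothetical versioned table: (id, version, body).
ScriptVersion = Struct.new(:id, :version, :body)

class ScriptStore
  def initialize
    @rows = []
  end

  # Saving never overwrites: it appends a row with the next version number.
  def save(id, body)
    current = latest(id)
    next_version = current ? current.version + 1 : 1
    row = ScriptVersion.new(id, next_version, body)
    @rows << row
    row
  end

  # Same idea as ordering by "version desc" and taking the first row.
  def latest(id)
    @rows.select { |r| r.id == id }.max_by(&:version)
  end

  # Roll back by removing the most recent version; returns the new latest.
  def rollback(id)
    newest = latest(id)
    @rows.delete(newest) if newest
    latest(id)
  end
end

store = ScriptStore.new
store.save(123, "#Do not be bad")
store.save(123, 'puts "Hi"')
store.save(123, "#Be good now")
store.latest(123).body   # "#Be good now" (version 3)
store.rollback(123).body # 'puts "Hi"' (version 2 is the latest again)
```

Deleting the newest row loses that version for good; the alternative mentioned above, a pointer to the active version, keeps the full history at the cost of one more table.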
You are correct in that git, subversion, mercurial and company are going to be much better at this. To provide support, you just follow these steps:
Check out the script on the server (using a tag so you can manage what goes there at any time)
Set up a cron job to check out the new script periodically (like every six hours or whatever you feel comfortable with)
The daemon you have for running the script should run the new version automatically.
IF your site is already under source control, and IF you're running under mod_rails/passenger, you could follow this procedure:
edit scraping code
commit change locally
touch yourapp/tmp/restart.txt
That should give you a history of the change, and you shouldn't have to re-deploy.
A bit safer, though I'm not sure it's possible for you: on a test/development server, make the change, commit locally, and test it; then on the production server, git pull and touch tmp/restart.txt.
I've written some big spiders and page analyzers in the past, and one of the things to keep in mind is what code is providing what service to the entire application.
Rails is providing the presentation of the data being gathered by your spidering engine. The presentation is one side of the coin, and spidering is the other, and they should be two separate code bases, tied together by some data-sharing mechanism, which, in your case, is the database. The database gives you some huge advantages as does having Rails available, when your spidering code is separate. It sounds like you have some separation already, but I'd recommend creating a wider gap. With that in mind, here's how I've done it before, and what I'd do now.
Previously, I had a separate app for my spidering that was spawning multiple spider tasks. Each task would look at a bunch of different URLs, throw their results in the database, then quit. Each time one quit the main app would spawn another spider to process more URLs. Each loop, the main app checked a YAML configuration file for run-time parameters, like how many sub-tasks it should have running, how many URLs they'd get, how long they'd wait for connections, etc. It stored the last modification date of the config file each time it loaded it so, if I made a change to the file, the app would sense it in a reasonably short time, reread the file, and adjust its behavior.
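The reload-on-change idea can be sketched in a few lines of Ruby. This is not the original code, just an illustration under assumptions: the class name ReloadingConfig and the max_tasks key are made up, and the file is assumed to contain a simple YAML mapping.

```ruby
require "yaml"

# Rereads a YAML file only when its modification time has changed.
# (ReloadingConfig is a hypothetical name for this sketch.)
class ReloadingConfig
  def initialize(path)
    @path = path
    @loaded_mtime = nil
    @settings = {}
  end

  # Return the current settings, reloading from disk if the file changed.
  def settings
    mtime = File.mtime(@path)
    if mtime != @loaded_mtime
      @settings = YAML.safe_load(File.read(@path)) || {}
      @loaded_mtime = mtime
    end
    @settings
  end
end
```

The main loop would call something like config.settings["max_tasks"] on each pass, so an edit to the file takes effect on the next pass without a restart.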
All state information about the URLs/pages/sites being scraped/spidered, was kept in the database so I could check on its progress. I could see how many had been processed or remained in the queue, the various result codes, and the content being returned. If I didn't like something I could even tweak the filters to skip junk pages, knowing the spidering tasks would be updated in a few minutes.
That system worked extremely well, spidered a major customer's series of websites without a glitch, running for several weeks as I added new sites to the list. (We were helping one of the Fortune 50 companies improve their sites, and every site had been designed and implemented by a different team, making every site completely different. My code had to be flexible and robust; I was really happy with how it worked out.)
To change it, these days I'd use a database table to hold all the configuration info. That way I could easily build an admin form, and let someone else inherit the task of adjusting the app's runtime configuration. The spider tasks would also be written so they'd pull their configuration from the database, rather than inherit it from the main app. I originally had the main app do all the administration and pass the config info to the spidering apps because I wanted to keep the number of connections to the database as low as possible. I was using Postgres and now know it could have easily handled the load, so by letting the individual tasks handle their configuration I could have made it more responsive.
By making the spidering engine separate from the presentation engine it was possible to temporarily stop one or the other without affecting the progress of the spidering job. Once I had the auto-reload of the prefs in place I don't think I had to stop the spidering engine, I just adjusted its prefs. It literally ran for weeks without stopping and we eventually pulled the plug because we had enough data for our needs.
So, I'd recommend tweaking your code so your spidering engine doesn't rely on Rails, instead it will be fired off by cron or a separate scheduling app. If you have to temporarily stop Rails your engine will run anyway. If you have to temporarily stop the engine then Rails can continue serving pages. The database sits between the two acting as the glue.
Of course, if the database goes down you're hosed all the way around, but what else is new? :-)
EDIT: Chris said:
"I see your point about the splitting the code out, though my Ruby-fu is low - not sure how far I can separate things without having to have copies of the ActiveModel/migrations stuff, plus some shared model classes."
If we look at your application as spider engine <--> | <-- database --> | <--> Rails/MVC/presentation, where the engine and Rails separately read and write to the database, and look at what each does well, that helps figure out how to break them into separate code bases.
Rails is designed to handle migrations, so let it. There's no reason to reinvent that wheel. But how often do you do migrations, and what is affected when you do? You do them seldom once the application is stable, and at that point you'd do them in a maintenance cycle to tweak the database. You can shut down the spidering engine and the web interface for a few minutes, migrate the database, then bring things up and you're off and running. Migrations are a necessary evil, but are hardly show-stoppers once in production. Most enterprises have "Software Sunday", or some pre-announced window of maintenance, so do the same.
ActiveRecord, modeling and associations are pretty easy to deal with too. The models are in a file that is required internally by Rails already, so the spidering engine can inherit the database know-how that way too; multiple apps/scripts can use the same model file. You don't see the Rails books talk about it much, but ActiveRecord is actually pretty easy to use outside of Rails. Search the googles for "activerecord without rails" for more info.
You can pull in ActiveSupport also if you want some of its extensions to classes by doing a regular require, but the Rails "view" and "controller" logic, which normally applies to presenting the web interface, shouldn't be needed at all in the engine.
Business logic, which goes in the controllers in Rails, could even be refactored into separate methods that get required by the Rails side of things and by the spidering engine. It's a different way of looking at Rails but falls in line with the "DRY" mantra - don't repeat yourself, so make things modular and require (or require_relative) bits and pieces that are the building blocks of the entire system.
If you don't want a totally separate codebase, you can take advantage of Rails' script runner, which gives a script access to ActiveRecord::Base, ActiveRecord::Associations and ActiveSupport. Do a rails runner -h from your app's main directory, or search for "rails runner" for more info. runner is not good for a job that starts and runs many times an hour, because Rails' startup cost is high. But if you have a long-running task, say one that runs in parallel with your Rails app, then it's a great choice. I'd give it serious consideration for the spidering side of your application. Eventually you might want to break the spidering engine out to a separate host so the presentation side has a dedicated host, so runner will help you buy time and do it in small steps.
I'm writing a Rails web service that interacts with various pieces of hardware scattered throughout the country.
When a call is made to the web service, the Rails app then attempts to contact the appropriate piece of hardware, get the needed information, and reply to the web client. The time between the client's call and the reply may be up to 10 seconds, depending upon lots of factors.
I do not want to split the web service call in two (ask for information, answer immediately with a pending reply, then force another api call to get the actual results).
I basically see two options. Either run JRuby and use multithreading, or else run several regular Ruby instances and hope that not many people try to use the service at a time. JRuby seems like the much better solution, but it still doesn't seem to be mainstream or to have out-of-the-box support at Heroku and EngineYard. The multiple instance solution seems like a total kludge.
1) Am I right about my two options? Is there a better one I'm missing?
2) Is there an easy deployment option for JRuby?
"I do not want to split the web service call in two (ask for information, answer immediately with a pending reply, then force another api call to get the actual results)."
From an engineering perspective, this seems like it would be the best alternative.
Why don't you want to do it?
There's a third option: If you host your Rails app with Passenger and enable global queueing, you can do this transparently. I have some actions that take several minutes, with no issues (caveat: some browsers may time out, but that may not be a concern for you).
If you're worried about browser timeout, or you cannot control the deployment environment, you may want to process it in the background:
User requests data
You enter request into a queue
Your web service returns a "ticket" identifier to check the progress
A background process processes the jobs in the queue
The user polls back, referencing the "ticket" id
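The five steps above can be sketched with Ruby's built-in Queue and a worker thread. This is only an in-process illustration: TicketQueue and its methods are hypothetical names, and a real Rails deployment running several processes would need a shared store (a database table, for instance) instead of an in-memory hash.

```ruby
require "securerandom"

# In-memory sketch of the ticket pattern (hypothetical class, single process only).
class TicketQueue
  def initialize(&handler)
    @handler = handler   # the slow work, e.g. contacting the remote hardware
    @jobs = Queue.new
    @results = {}
    @lock = Mutex.new
    @worker = Thread.new { work }
  end

  # The request is entered into the queue and a ticket id is returned at once.
  def submit(payload)
    ticket = SecureRandom.hex(8)
    @lock.synchronize { @results[ticket] = { status: :pending } }
    @jobs << [ticket, payload]
    ticket
  end

  # What the user gets when polling back with the ticket id.
  def status(ticket)
    @lock.synchronize { @results[ticket] }
  end

  private

  # The background process: take jobs off the queue one by one and record the outcome.
  def work
    loop do
      ticket, payload = @jobs.pop
      outcome =
        begin
          { status: :done, value: @handler.call(payload) }
        rescue StandardError => e
          { status: :failed, error: e.message }
        end
      @lock.synchronize { @results[ticket] = outcome }
    end
  end
end
```

A controller action would call submit and return the ticket; a second action would call status with that ticket until it reports :done or :failed.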
As far as hosting in JRuby, I've deployed a couple of small internal applications using the glassfish gem, but I'm not sure how much I would trust it for customer-facing apps. Just make sure you run config.threadsafe! in production.rb. I've heard good things about Trinidad, too.
You can also run the web service call in a delayed background job so that it's not hogging a web server and can even run on a separate physical box. This is also a much more scalable approach. If you make the web call using AJAX, you can ping the server every second or two to see whether your results are ready; that way your client is not held in limbo while the results are being calculated, and the request does not time out.
It seems that the only tutorials out there talking about using Amazon's SimpleDB in a rails site are using AWSDBProxy... Personally, I find this counter-intuitive to scaling out, considering the server layout of a typical Rails site below (using AWSDBProxy):
Plugin here: http://agilewebdevelopment.com/plugins/aws_sdb_proxy
Image here: http://www.freeimagehosting.net/uploads/91be4e0617.png
As you can see, even if we add more mongrels, we have two problems.
We have a single point of failure far less stable than our load balancer
We have to force all our information through this one WEBrick server
The solution is, of course, to add more AWSDBProxies... but why not then just use the following code in, say, a class, skipping the proxy altogether?
service = AwsSdb::Service.new(Logger.new(nil),
                              CONFIG['aws_access_key_id'],
                              CONFIG['aws_secret_access_key'])
service.query(domain, query)
So what I'm getting at is: if you are using AWSDBProxy, what are your justifications for it? And if you are indeed using it, what is your performance like? If you have hard numbers, that would be even more appreciated!
I'm not using it, nor have I ever heard of it, but this is what I would think are reasonable reasons.
You're running your main app server on EC2, so the chance of Internet FAIL doesn't really affect you more than once.
You run one proxy on each of your app servers, so its connection going down is no worse than its connection(s) to the database going down.
Because it can be done. This is as good a reason as any in an open source project. Sometimes it takes building a thing before you know whether said thing is a good/bad idea.
You don't have the traffic levels to need a load balancer. Then your diagram squashes down to a line, if not a single machine.