Fast way to check if page is online - ruby-on-rails

I am using Nokogiri for parsing XML.
The problem is the response time of the external resource. Sometimes it works fine; sometimes the response time is over 30 seconds, and sometimes it returns various error codes. What I need is the fastest way to find out whether my XML is ready to be requested with open-uri, and only then make the actual request.
What I am doing now is setting a timeout of 5 seconds to prevent delays.
require 'open-uri'
require 'timeout'
require 'nokogiri'

begin
  Timeout::timeout(5) do
    link = URI.escape("http://domain.org/timetable.xml")
    @doc = Nokogiri::HTML(open(link))
  end
rescue Timeout::Error
  @error = "Data Server is offline"
end

For checks at the level your code shows, you'll need cooperation from the remote service, e.g., conditional HEAD requests and/or ETag comparison (those together would be my own preference). It sounds like you may already have some of this, since you say the service sometimes returns error codes; however, if those error codes are inside the XML payload they won't help, and of course, if the remote service's responsiveness is variable, it may well fluctuate between your check and the subsequent main GET request.
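As a rough illustration of the conditional-request idea, here is a minimal sketch using Net::HTTP; the URL, the 5-second timeouts and the idea of keeping the last ETag around are assumptions, and the remote service has to actually send an ETag header for the If-None-Match check to do anything.

require 'net/http'
require 'uri'

# Returns [:fresh, body] when new XML is available, :unchanged when the
# cached copy is still current, or :offline when the host is slow or erroring.
def fetch_timetable(last_etag = nil)
  uri = URI("http://domain.org/timetable.xml")
  Net::HTTP.start(uri.host, uri.port, open_timeout: 5, read_timeout: 5) do |http|
    request = Net::HTTP::Get.new(uri)
    request['If-None-Match'] = last_etag if last_etag   # conditional GET
    response = http.request(request)

    case response
    when Net::HTTPNotModified then :unchanged
    when Net::HTTPSuccess     then [:fresh, response.body]
    else :offline                                        # 4xx, 5xx, etc.
    end
  end
rescue Timeout::Error, Errno::ECONNREFUSED, SocketError
  :offline
end

When the server answers 304 Not Modified you can keep using the copy you already parsed, and a timeout or connection error answers the "is it ready?" question without pulling the full document.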
FWIW: if you're just looking to improve your app's responsiveness when using this data, there are caching approaches you can use. For example, use a soft TTL lower than the main TTL that, when expired, causes your cache code to return the cached XML and kick off an async job to refetch the data so it's fresher for the next request. Or use a repeating worker to keep the cache fresh.
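A minimal sketch of the soft-TTL variant, assuming Rails.cache is available and that RefreshTimetableJob is a hypothetical background job that refetches the XML and rewrites the cache entry:

require 'open-uri'

SOFT_TTL = 10.minutes   # serve stale data after this, but refresh in the background
HARD_TTL = 2.hours      # absolute lifetime of the cache entry

def timetable_xml
  entry = Rails.cache.read("timetable")     # { xml: "...", fetched_at: Time }
  if entry.nil?
    entry = refresh_timetable_now           # first request pays the full cost
  elsif entry[:fetched_at] < SOFT_TTL.ago
    RefreshTimetableJob.perform_later       # hypothetical job: refetch and rewrite the cache
  end
  entry[:xml]
end

def refresh_timetable_now
  xml = URI.open("http://domain.org/timetable.xml", open_timeout: 5, read_timeout: 5).read
  entry = { xml: xml, fetched_at: Time.current }
  Rails.cache.write("timetable", entry, expires_in: HARD_TTL)
  entry
end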

Related

NSURLSession set request timeout accurately

We set the timeout interval for a request with NSMutableURLRequest's timeoutInterval. As Apple's documentation describes, it specifies the limit between packets, not the whole request. When we analyse our request logs, some timed-out requests exceeded the number of seconds we set as timeoutInterval. We need the requests to time out accurately.
From reading the documentation and blog posts, the timeoutIntervalForRequest property in NSURLSessionConfiguration behaves the same as timeoutInterval, but the timeoutIntervalForResource property seems to fit our requirement.
However, Mattt says on objc.io that timeoutIntervalForResource "should only really be used for background transfers". Can it be used for a normal request, such as querying user info? Is it appropriate in this situation?
Thanks very much.
It can be used, but it rarely makes sense to do so.
The expected user experience from an iOS app is that when the user asks to download or view some web-based resource, the fetch should continue, retrying/resuming as needed, until the user explicitly cancels it.
That said, if you're talking about fetching something that isn't requested by the user, or if you are fetching additional optional data that you can live without, adding a resource timeout is probably fine. I'm not sure why you would bother to cancel it, though. After all, if you've already spent the network bandwidth to download half of the data, it probably makes sense to let it finish, or else that time is wasted.
Instead, it is usually better to time out any UI that is blocked by the fetch, if applicable, but continue fetching the request and cache it. That way, the next time the user does something that would cause a similar fetch to occur, you already have the data.
The only exception I can think of would be fetching video fragments or something similar, where if it takes too long, you need to abort the transfer and switch to a different, lower-quality stream. But in most apps, that should be handled by the HLS support in iOS itself, so you don't have to manage that.

How to optimise computation intensive request response on rails [duplicate]

This question already has answers here:
How do I handle long requests for a Rails App so other users are not delayed too much?
I have an application that does a lot of computation on a few pages (requests). The web interface sends an AJAX request. The computation sometimes takes about 2-5 minutes, and by that time the AJAX request times out.
We could certainly increase the timeout on the web portal, but that doesn't sound like the right solution. To improve performance I have already:
Removed N+1/Duplicate queries
Implemented Caching
What else could be done here to reduce the calculation time?
Also, if it still takes longer, I was thinking of following solutions:
Do the computation beforehand and store it in the DB, so when the actual request comes there is no need for calculation. (I am apprehensive about this approach, since we would have to modify or erase and recalculate this data whenever the application logic changes.)
Load the whole data set into the cache when the application starts or when the data gets modified. But the computation still has to be done the first time, and the whole data set can't be kept in the cache from application start, so it would need to be cached on demand.
Maybe do something like an Angular promise, where the promise is fulfilled when the response comes from the server.
Do we have any alternative to do this efficiently?
UPDATE:
Depending on user input, the calculation might finish in a few seconds, or it might take 2-5 minutes. The scenario is: the user imports an Excel file, which is parsed and saved in the DB with a background job. On another page, the user then wants to see a report/analytics graph derived from a few calculations on that imported data. The calculation depends on many factors, so I do not want to save it in the DB (as pointed out above). Also, when the user requests the report/analytics graph, it would be a bad experience to tell them that the graph will be shown after some time and that they will get an email/notification, etc.
The extremely typical solution is to enqueue a job for background processing, and return a job ID to the front-end. Your front-end can then poll for completion using that job ID, or you can trigger a notification such as an email to be sent to the user when the job completes.
There are a multitude of gems for this, and it is such a popular and accepted solution that Rails introduced its own ActiveJob for this exact purpose.
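A minimal sketch of that pattern with ActiveJob; Report, ReportJob, heavy_calculation and the JSON shape are all hypothetical names standing in for your own computation and persistence:

# app/jobs/report_job.rb
class ReportJob < ApplicationJob
  queue_as :default

  def perform(report_id)
    report = Report.find(report_id)   # hypothetical model holding job state
    # heavy_calculation is a placeholder for the existing 2-5 minute computation
    report.update!(result: heavy_calculation(report), status: "done")
  end
end

# app/controllers/reports_controller.rb
class ReportsController < ApplicationController
  def create
    report = Report.create!(status: "queued")
    ReportJob.perform_later(report.id)
    render json: { job_id: report.id }          # front-end keeps this ID
  end

  def show
    report = Report.find(params[:id])
    render json: { status: report.status, result: report.result }  # polled by AJAX
  end
end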
Here are a few possible solutions:
1. Optimize your tables with indexes to reduce data-fetching time.
2. Preload all rows you'll be dealing with at the beginning, so you won't do a query each time you calculate something... it's faster/easier to @things.select { |r| r.blah } than to Thing.where(conditions).
3. Instead of all that, just do the computing in PL/SQL on the database side. Sure, it's not the same as writing Ruby code, but it could be faster.
4. And yes, cache the whole result set into memcached or Redis or something (and expire it when something changes).
5. Run the calculation in the background (crontab?) and store the results as JSON somewhere, or cache the entire HTML file (if you're not localizing or anything).
PS: I'm doing 1, 2 and 3 combined with 5 (caching JSON results into memcached and then pulling the array and formatting/localizing) for a few million records from about 12 tables... sports data mainly.
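For points 2 and 4, a rough sketch of what that can look like; ImportedRow, current_import and the category/amount columns are made-up names:

def analytics_rows
  # Point 4: cache the whole computed result and expire it when the import changes.
  Rails.cache.fetch(["analytics", current_import.id], expires_in: 1.hour) do
    # Point 2: load the rows once, then filter/group in memory instead of
    # issuing a new query inside every step of the calculation.
    rows = ImportedRow.where(import_id: current_import.id).to_a
    rows.group_by(&:category).transform_values { |group| group.sum(&:amount) }
  end
end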

What happens when Rails.cache.fetch expires

I'm generating JSON for 65,000 users to populate a typeahead. The query is quick; it turns out building the JSON was the bottleneck. I'm trying to cache the result, but what happens when the cache expires? Does it get rebuilt automatically, or does the cache wait until someone triggers the call, resulting in a 9-second page load once every 12 hours?
def user_json
  Rails.cache.fetch("users", expires_in: 12.hours) do
    User.all.to_json
  end
end
If you do not want to hit the database each time, you could look into a solution such as Elasticsearch or Sphinx, which are designed to perform the kind of quick searching you're describing.
I was listening to JavaScript Jabber this morning and they were saying that the average web page is now a shade under 2 MB including images and CSS. Your request doubles that size. While that's fine for North Americans, your page is likely to feel much slower in internet backwaters such as Australia.
It's also worth noting that older browsers such as IE don't handle iteration in JavaScript too well. I would suggest that your application would crash in any IE before version 9.
Because of these reasons I would avoid pushing JSON containing 65,000 rows over the wire and into the browser. If the query is quick, why not do a trip back to the server each time the user changes the input? Many small trips back to the server would be quicker than sending all 65,000 records, and in the process this removes the entire class of problems described above. Your original problem also goes away, since you no longer have to cache any responses.
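If you go that route, here is a hedged sketch of a small search action hit on each keystroke; the route, the name column and the limit of 20 are assumptions:

# app/controllers/users_controller.rb
class UsersController < ApplicationController
  # GET /users/search?q=ann  -- called by the typeahead as the user types
  def search
    users = User.where("name LIKE ?", "#{params[:q]}%")
                .order(:name)
                .limit(20)                 # only send what the dropdown can show
    render json: users.select(:id, :name)
  end
end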

How can I reduce the waiting (ttfb) time

I have a query that gets a list of users from a table, sorted by the time at which they were created. I got the following timing diagram from the Chrome developer tools.
You can see that the TTFB (time to first byte) is too high.
I am not sure whether it is because of the SQL sort. If that is the reason, how can I reduce this time?
I saw blogs which say that TTFB should be low (< 1 sec), but for me it shows > 1 sec. Is it because of my query or something else?
I am not sure how I can reduce this time.
I am using Angular. Should I sort the table in Angular instead of using the SQL sort? (Many posts say that shouldn't be the issue.)
What I want to know is how I can reduce the TTFB. I am actually new to this; it is a task given to me by my team members. I have looked at many posts but am not able to understand them properly. What is TTFB? Is it the time taken by the server?
The TTFB is not the time to the first byte of the body of the response (i.e., the useful data, such as JSON or XML), but rather the time to the first byte of the response received from the server. That byte is the start of the response headers.
For example, if the server sends the headers before doing the hard work (like heavy SQL), you will get a very low TTFB, but it isn't "true".
In your case, TTFB represents the time you spend processing data on the server.
To reduce the TTFB, you need to do the server-side work faster.
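For example, if the backend happens to be Rails and the list is sorted by creation time, a missing index on that column is a common reason for a slow sorted query; this is only a sketch under that assumption (table name and migration version are made up):

# db/migrate/20240101000000_add_created_at_index_to_users.rb
class AddCreatedAtIndexToUsers < ActiveRecord::Migration[6.1]
  def change
    # Lets the database return rows already ordered by created_at
    # instead of sorting the whole table on every request.
    add_index :users, :created_at
  end
end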
I have met the same problem. My project runs on a local server. I checked my PHP code:
$db = mysqli_connect('localhost', 'root', 'root', 'smart');
I use localhost to connect to my local database. That may be the cause of the problem you're describing. You can modify your HOSTS file and add the line:
127.0.0.1 localhost
TTFB is something that happens behind the scenes. Your browser knows nothing about what happens behind the scenes.
You need to look into what queries are being run and how the website connects to the server.
This article might help you understand TTFB, but otherwise you need to dig deeper into your application.
If you are using PHP, try calling <?php flush(); ?> after </head> and before </body>, or after whatever section you want to output quickly (like the header or content). It will output the markup generated so far without waiting for PHP to finish. Don't use this function everywhere, or the speed increase won't be noticeable.
More info
I would suggest you read this article and focus more on how to optimize the overall response to the user request (either a page, a search result etc.)
A good argument for this is the example they give about using gzip to compress the page. Even though the TTFB is faster when you do not compress, the overall experience for the user is worse because it takes longer to download content that is not zipped.

Should I convert my action method to async action method?

I have a web site where user can upload a PDF and convert it to WORD doc.
It works nicely, but sometimes (5-6 times per hour) users have to wait longer than usual for the conversion to take place.
I use ASP.NET MVC and the flow is:
- USER uploads file -> get the stream and convert it to a Word doc -> save the Word file as a temp file -> return the URL to the user
I am not sure whether I have to make this flow asynchronous. Basically, my flow is sequential now, BUT I have about 3-5 requests per second, and the CPU is dual-core with 4 GB RAM.
As far as I know, maxConcurrentRequestsPerCPU is 5000, and the default value of Threads Per Processor Limit is 25, so these default settings should be more than fine, right?
Then why does my web app still have "waits" sometimes? Are there any IIS settings I need to change from their defaults, or should I just make my synchronous conversion method asynchronous?
PS: The conversion itself takes between 1 second and 40-50 seconds depending on the PDF file size.
UPDATE: Basically, what is not very clear to me is this: if a user uploads a file and the conversion is long, shouldn't only the current request "suffer" because of it? The next request is independent, makes another CPU call and uses a different thread, so there should be no wait there, shouldn't it?
There are a couple of things that must be defined clearly here. An async(hronous) method and an asynchronous flow are not the same thing, at least as far as I understand.
An asynchronous method (using Task, usually also leveraging the async/await keywords) will work in the following way:
The execution starts on thread t1 until it reaches an await
The (potentially) long operation will not take place on thread t1 - sometimes not even on an app thread at all, leveraging IOCP (I/O completion ports).
Thread t1 is free and released back to the thread pool and is ready to service other requests if needed
When the (potentially) long operation returns, a thread is taken from the thread pool (it could even be the same t1 or, more probably, another one) and the rest of the code execution resumes from the last await encountered
The rest of the code executes
There are a couple of things to note here:
a. The client is blocked during the whole process. The eventual switching of threads and so on happens only on the server
b. This approach is mainly designed to alleviate an unwanted condition called 'thread starvation'. It is not meant to speed up the total client waiting time and it usually doesn't speed up the process.
As far as I understand, an asynchronous flow would mean, at least in this case, that after the user requests the document conversion, the client (i.e. the user's browser) quickly receives a response informing them that this potentially long process has started on the server, that they should be patient, and that the current response page might provide progress feedback.
In your case I recommend the second approach because the first approach would not help at all.
Of course this will not be easy. You need to emulate a queue, and you need a processing agent and an eviction policy (most probably enforced by the same agent, if you don't want a second agent).
This would work along the following lines:
a. The end user submits a file, the web server receives it
b. The web server places it in the queue and receives a job number
c. The web server returns the user a response with the job number (let's say an HTML page with a polling mechanism that would periodically receive progress from the server)
d. The agent would start processing the document when it gets the chance (i.e. when it finishes other work) and would update its status in a common place where the web server can pick up this information
e. The web server would receive calls from the HTML response asking for the status of the job and would find out that the job is complete and offer a download link or start downloading it directly.
This can be refined in some ways:
instead of the client polling the server, websockets or long polling (for example SignalR covers both) could be used
many processing agents could be used instead of one if the hardware configuration makes sense
The queue can be implemented with a simple RDBMS; Remus Rușanu has a nice article about this.
