I have this simple piece of code that writes to a socket, then reads the response from the server. The server is very fast (responds within 5ms every time). However, while writing to the socket is quick -- reading the response from the socket is always MUCH slower. Any clues?
module UriTester
  module UriInfo
    class << self
      def send_receive(socket, xml)
        # socket = TCPSocket.open("service.server.com","2316")
        begin
          start = Time.now
          socket.print(xml) # Send request
          puts "just printed the xml into socket #{Time.now - start}"
        rescue Errno::ECONNRESET
          puts "looks like there is an issue!!!"
          socket = TCPSocket.open("service.server.com","2316")
          socket.print(xml) # Send request
        end

        response = ""
        while (line = socket.recv(1024))
          response += line
          break unless line.grep(/<\/bcap>/).empty?
        end
        puts "SEND_RECEIVE COMPLETED. IN #{Time.now - start}"
        # socket.close
        response
      end
    end
  end
end
Thanks!
Writing to the socket will always be way faster than reading in this case because the write is local to the machine and the read has to wait for a response to come over the network.
In more detail: when the call to write/send returns, all the system is telling you is that N bytes have been successfully copied into the socket's kernel-space buffer. It does not mean that the data has actually been sent across the network yet. In fact, the data can sit in the socket buffer for quite a long time (assuming you're using TCP). This is because of Nagle's algorithm, which is intended to make efficient use of network bandwidth by coalescing small writes. This unseen Nagle delay adds to the round-trip time and to how long it takes to get a response. The server might also delay its response for the same reason, adding even more to the response time.
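If you suspect the Nagle delay is part of what you're seeing, you can turn it off on the client socket with the standard TCP_NODELAY option. A minimal sketch (whether it actually helps depends on your traffic pattern and on what the server does):

require 'socket'

socket = TCPSocket.open("service.server.com", "2316")
# TCP_NODELAY disables Nagle's algorithm, so small writes are pushed
# onto the wire immediately instead of waiting to be coalesced.
socket.setsockopt(Socket::IPPROTO_TCP, Socket::TCP_NODELAY, 1)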
So when you time the write and it returns quickly, that doesn't actually tell you anything useful.
As mentioned earlier, the read from the socket will take much longer, because when you time the read you are actually timing the round-trip travel time plus the server's response time, which will always be far slower than copying data from a user-space program to a kernel buffer.
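Put differently: if what you care about is the user-visible latency, time the whole exchange instead of the write alone. A sketch against the code above (read_response is a hypothetical stand-in for your recv loop, not an existing method):

start = Time.now
socket.print(xml) # returns as soon as the bytes hit the kernel buffer
response = read_response(socket) # hypothetical helper wrapping your recv loop
puts "full round trip: #{Time.now - start}s"

That one number is the round-trip-plus-server-time figure that actually matters to your callers.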
What is it that you are actually trying to measure and why?
I'm evaluating k6 for my load testing needs. I've set up a basic load test and I'm currently trying to interpret the error messages and result values I get. Maybe someone can help me interpret what I'm seeing:
If I crank the VUs up to about 300, I start seeing error messages in the console, and at 500 there are lots of them.
These mostly consist of:
dial tcp XXX:443: i/o timeout
read tcp YYY(local ip):35252->XXX(host ip):443: read: connection reset by peer
level=warning msg="Request Failed" error="unexpected EOF"
Get https://REQUEST_URL/: context deadline exceeded
I also have problems with several checks:
check errors in which res.status === 0 and res.body === null
check errors in which res.status === 0, but the body contains the correct content
How can res.status be 0 but the body still contains the proper values?
I suspect that I'm reaching the connection limit of my load-generating machine, and that's why I get the error messages. So I'd have to set up a cluster or move to the cloud runners!?
The stats generated by k6 show long http_req_blocked values, which I interpret as the time spent waiting to get a connection. This seems to indicate that the connection pool of my test-running machine is at its limits.
http_req_blocked...........: avg=5.66s min=0s med=3.26s max=59.38s p(90)=13.12s p(95)=20.31s
http_req_connecting........: avg=1.85s min=0s med=280.16ms max=24.27s p(90)=4.2s p(95)=9.24s
http_req_duration..........: avg=2.05s min=0s med=496.24ms max=1m0s p(90)=4.7s p(95)=8.39s
http_req_receiving.........: avg=600.94ms min=0s med=82.89µs max=58.8s p(90)=436.95ms p(95)=2.67s
http_req_sending...........: avg=1.42ms min=0s med=35.8µs max=11.76s p(90)=56.22µs p(95)=62.45µs
http_req_tls_handshaking...: avg=3.85s min=0s med=1.78s max=58.49s p(90)=8.93s p(95)=15.81s
http_req_waiting...........: avg=1.45s min=0s med=399.43ms max=1m0s p(90)=3.23s p(95)=5.87s
Can anyone help me out interpret the results I'm seeing?
You are likely running out of CPU on the runner.
As explained in the HTTP-specific metrics section of the documentation, you are right about http_req_blocked: it is (mostly) the time from when we decide to make a request to when we get a socket on which to make it. This is most likely because:
the test runner is running out of CPU and can't handle both making all the other requests and starting new ones
the system under test is running out of CPU and has ... the same problem
You will need to monitor both (you are highly advised to do this regardless), as tests at 100% runner CPU are probably not very representative :) and you likely don't want the system you are testing to hit 100% either.
A status code of 0 means that we couldn't make the request / read the response ... for some reason, usually explained by the error and error_code.
As I commented, if you have status code 0 and a body, this is most likely a bug ... at least I don't remember a case where this wouldn't be true.
The errors you listed mean (most likely):
dial tcp XXX:443: i/o timeout
this literally means we tried to open a TCP connection and it took too long (probably the reason for the big http_req_blocked values)
read tcp YYY(local ip):35252->XXX(host ip):443: read: connection reset by peer
the other side closed the connection ... likely because some timeout was reached - for example, if we don't read anything for over 30 seconds, the server decides we won't read any more and closes the connection ... and when the CPU is at 100%, there is a good chance some connections won't get time to be read from.
level=warning msg="Request Failed" error="unexpected EOF"
literally what it says ... the connection was closed when we totally didn't expect it, or more accurately when Go's net/http stdlib didn't expect it. Likely again a timeout, just at a point in the life of the request where the other errors aren't returned.
Get https://REQUEST_URL/: context deadline exceeded
This is because a request took longer than the timeout (60s by default); this will at some point be changed to a better error message.
I posted this to the Squeak Beginners list too - I'll make sure any answers from there get here :)
I'm using Squeak 4.2 and working on the Smalltalk end of a named pipe connection, which sends a message to the named pipe server with:
msg := 'Here''s Johnny!!!!'.
pipe nextPutAll: msg; flush.
It should then receive an acknowledgement, which will be a 32-byte md5 hash of the received message (which the smalltalk app can then verify). It's possible the named pipe server may have gone away or otherwise been unable to deal with the request, and so I'd like to set a timeout on reading the acknowledgement. I've tried using this:
ack := [ pipe next: 32 ] valueWithin: (Duration seconds: 3) onTimeout: [ 'timeout'. ].
and then made the pipe server pause artificially to test the code. But the Smalltalk thread blocks on the read and doesn't carry on (even after the timeout), although if I then get the pipe server to send the correct response (after a 5-second delay, for example), the value of 'ack' is 'timeout'. Obviously the timeout did what it's supposed to do, but couldn't 'unblock' the blocking read on the pipe.
Is there a way to accomplish this even with a blocking FileStream read? I'd rather avoid a busy wait on there being 32 characters available if at all possible.
This one may come in handy, but not on Windows, I'm afraid:
http://www.samadhiweb.com/blog/2013.07.27.unixdomainsockets.html
I wrote an admin script that tails a Heroku log and, every n seconds, summarizes averages and notifies me if I cross a certain threshold (yes, I know and love New Relic -- but I want to do custom stuff).
Here is the entire script.
I have never been a master of IO and threads, and I wonder if I am making a silly mistake. I have a couple of daemon threads that run while(true){} loops, which could be the culprit. For example:
# read new lines
f = File.open(file, "r")
f.seek(0, IO::SEEK_END)
while true do
  select([f])
  line = f.gets
  parse_heroku_line(line)
end
I use one daemon to watch for new lines of a log, and the other to periodically summarize.
Does someone see a way to make it less processor-intensive?
This probably runs hot because you never really block while reading from the temporary file. IO.select is a thin layer over POSIX select(2). It looks like you're trying to block until the file is ready for reading, but select(2) considers EOF to be ready ("a file descriptor is also ready on end-of-file"), so you always return immediately from select and then call gets, which returns nil at EOF.
You can get a truer EOF reading and nice blocking behavior by avoiding the thread that writes to the temp file and instead using IO.popen to fork the %x[heroku logs --ps router --tail --app pipewave-cedar] log tailer, connected to a Ruby IO object on which you can loop over gets, exiting when gets returns nil (indicating the log tailer finished). gets on the pipe from the tailer will block when there's nothing to read, and your script will only run as hot as it takes to do your line parsing and reporting.
EDIT: I'm not set up to actually try your code, but you should be able to replace the log tailer thread and your temp file read loop with this code to get the behavior described above:
IO.popen( %w{ heroku logs --ps router --tail --app my-heroku-app } ) do |logf|
  while line = logf.gets
    parse_heroku_line(line) if line =~ /^/
  end
end
I also notice that your reporting thread does nothing to synchronize access to @total_lines, @total_errors, etc. So you have some minor race conditions where you can read inconsistent values from the instance variables that parse_heroku_line updates.
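A cheap way to close those races is a Mutex around the shared counters. A sketch, assuming the @total_lines / @total_errors instance variables from your script (error_line here is a hypothetical flag set by your parsing logic):

require 'thread' # Mutex; built in on modern Rubies

@stats_lock = Mutex.new

# in parse_heroku_line, update the shared counters under the lock:
@stats_lock.synchronize do
  @total_lines += 1
  @total_errors += 1 if error_line
end

# in the reporting thread, read them under the same lock so the
# pair of values is always consistent:
lines, errors = @stats_lock.synchronize { [@total_lines, @total_errors] }
puts "lines=#{lines} errors=#{errors}"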
select is about whether a read would block. f is just a plain old file, so when you get to the end, reads don't block - they just return nil instantly. As a result, select returns instantly rather than waiting for something to be appended to the file, as I assume you're expecting. Because of this you're sitting in a tight busy loop, so high CPU is to be expected.
If you are at EOF (you could either check f.eof? or see whether gets returns nil), then you could either start sleeping (perhaps with some sort of back-off) or use something like the listen gem to be notified of filesystem changes, as sketched below.
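For the sleep-at-EOF option, a minimal sketch built on the loop from the question (file and parse_heroku_line are yours; the 0.5-second pause is an arbitrary starting point):

f = File.open(file, "r")
f.seek(0, IO::SEEK_END)
loop do
  if (line = f.gets)
    parse_heroku_line(line)
  else
    sleep 0.5 # at EOF: back off instead of spinning through select/gets
  end
end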
I have been using open_uri to pull down an ftp path as a data source for some time, but suddenly found that I'm getting nearly continual "530 Sorry, the maximum number of allowed clients (95) are already connected."
I am not sure if my code is faulty or if someone else is accessing the server, and unfortunately there's seemingly no way for me to know for sure who's at fault.
Essentially I am reading FTP URI's with:
def self.read_uri(uri)
  begin
    uri = open(uri).read
    uri == "Error" ? nil : uri
  rescue OpenURI::HTTPError
    nil
  end
end
I'm guessing that I need to add some additional error handling code in here...
I want to be sure that I take every precaution to close down all connections so that my connections are not the problem in question; however, I thought that open_uri + read would take this precaution, versus using the Net::FTP methods.
The bottom line is that I've got to be 100% sure these connections are being closed and that I don't somehow have a bunch of open connections lying around.
Can someone please advise as to correctly using read_uri to pull in ftp with a guarantee that it's closing the connection? Or should I shift the logic over to Net::FTP which could yield more control over the situation if open_uri is not robust enough?
If I do need to use the Net::FTP methods instead, is there a read method that I should be familiar with vs pulling it down to a tmp location and then reading it (as I'd much prefer to keep it in a buffer vs the fs if possible)?
I suspect you are not closing the handles. OpenURI's docs start with this comment:
It is possible to open http/https/ftp URL as usual like opening a file:
open("http://www.ruby-lang.org/") {|f|
f.each_line {|line| p line}
}
I looked at the source and the open_uri method does close the stream if you pass a block, so, tweaking the above example to fit your code:
uri = ''
open("http://www.ruby-lang.org/") {|f|
  uri = f.read
}
Should get you close to what you want.
Here's one way to handle exceptions:
# The list of URLs to pass in to check if one times out or is refused.
urls = %w[
http://www.ruby-lang.org/
http://www2.ruby-lang.org/
]
# the method
def self.read_uri(urls)
content = ''
open(urls.shift) { |f| content = f.read }
content == "Error" ? nil : content
rescue OpenURI::HTTPError
retry if (urls.any?)
nil
end
Try using a block:
data = open(uri) { |f| f.read }
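If you do switch to Net::FTP for more control, here is a sketch of reading a remote file straight into memory, with the block form guaranteeing the connection is closed (hedged: ftp_read is a hypothetical helper, and the anonymous login and path handling are assumptions about your server):

require 'uri'
require 'net/ftp'

def ftp_read(uri_string)
  uri = URI.parse(uri_string)
  Net::FTP.open(uri.host) do |ftp|
    ftp.login # anonymous; pass credentials if your server requires them
    buffer = ""
    # retrbinary streams the file in chunks, so it accumulates in a
    # string buffer instead of a temp file on disk.
    ftp.retrbinary("RETR #{uri.path}", 4096) { |chunk| buffer << chunk }
    buffer
  end # the block form closes the connection even if an exception is raised
end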
I'm trying to contact a REST API using ActiveResource on Rails 2.3.2.
I'm attempting to use the timeout functionality so that if the resource I'm contacting is down I can fail quickly - I'm doing this with the following:
class WorkspaceResource < ActiveResource::Base
  self.timeout = 5
  self.site = "http://mysite.com/restAPI"
end
However, when I try to contact the service when I know it isn't available, the class only times out after the default 60 seconds. I can see from the error stack that the timeout error does indeed come from an ActiveResource class in my gem folder that has the proper functions to allow timeout settings, but my set timeout never seems to work.
Any thoughts?
So apparently the issue is not that timeout is not functioning. I can run a server locally, make it not return a response within the timeout limit, and see that timeout works.
The issue is in fact that if the server does not accept the connection, timeout does not function as I expected it to - it doesn't function at all. It appears as though timeout only works when the server accepts the connection but takes too long to respond.
To me, this seems like an issue - shouldn't timeout also work when the server I'm contacting is down? If not, there should be another mechanism to stop a bunch of requests from hanging...anyone know of a quick way to do this?
The problem
If you're running on Ruby 1.8.x then the problem is its lack of real system threads.
As you can read first here and then here, there are systemic problems with timeouts in Ruby. It's an interesting discussion, but for you in particular some comments suggest that the timeout is effectively ignored and defaults to 60 seconds - exactly what you are seeing.
Solutions ...
I have a similar issue with our own product when trying to send emails - if the email server is down, the thread blocks. For me the solution was to spin the request off on a separate thread, so my main request-processing thread doesn't block.
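A minimal sketch of that approach (hedged: WorkspaceResource.find(:all) is just an example call against the class from the question, and you'd still need a way to hand the result back to whoever wants it):

Thread.new do
  begin
    workspaces = WorkspaceResource.find(:all)
    # hand the result off however suits your app (queue, callback, etc.)
  rescue ActiveResource::ConnectionError, ActiveResource::TimeoutError => e
    Rails.logger.warn("workspace service unavailable: #{e.message}")
  end
end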
There are non-blocking libraries out there for Ruby but perhaps you could take a look first at this System Timeout Gem.
An option open to anyone using Rails behind a proxy like nginx would be to set the upstream timeout to a lower number - that way you'll get notified if the server is taking too long. I'd only do this if I were really stuck for a solution.
Last but not least, it's possible that running Rails 2.3.2 on top of Ruby 1.9.1 will fix the issue.
Alternatively, you could try to catch these connection errors and retry once (after a certain period of time) just to make sure the connection is really down.
retried = false
begin
  @businesses = Business.find(:all, :params => { :shop_domain => @shop.domain })
  retried = false
rescue ActiveResource::TimeoutError => ex
  # raise ex
rescue ActiveResource::ConnectionError, ActiveResource::ServerError, ActiveResource::ClientError => ex
  unless retried
    sleep(((ex.respond_to?(:response) && ex.response['Retry-After']) || 5).to_i)
    retried = true
    retry
  else
    # raise ex
  end
end
Inspired by this solution from Shopify for paginating a large number of records: https://ecommerce.shopify.com/c/shopify-apis-and-technology/t/paginate-api-results-113066