I've been writing a Neo4j server extension (as described here, i.e. a managed server extension: http://docs.neo4j.org/chunked/stable/server-plugins.html). It should just get a string via POST (which in production would hold information the extension should process, but this is of no further concern here). I tried the extension with Neo4j 1.8.2 and 1.9.RC2; the outcome was the same.
Now my problem is that sometimes this extension does quite a lot of work which can take a couple of minutes. However, after exactly 200 seconds, the connection gets lost. I'm not absolutely sure what is happening, but it seems the server is dropping the connection.
To verify this behavior, I wrote a new, almost empty server extension which does nothing but wait 5 minutes (via Thread.sleep()). From a test client class, I POST some dummy data. I tested with Jersey, Apache HttpComponents and plain Java URL connections. Jersey and plain Java do a retry after exactly 200 seconds; HttpComponents throws "org.apache.http.NoHttpResponseException: The target server failed to respond".
I think it's a server issue, first because the exception seems to stand for that in this context (there's a comment saying that in the httpcomponent's code) and second because when I set connection timeout and/or socket timeout to lower values than 200 seconds, I get just normal timeout exceptions.
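For anyone who wants to reproduce the "normal timeout" case without a Neo4j server, here is a minimal sketch using plain java.net.HttpURLConnection against a local stand-in server. The class name TimeoutProbe, the /slow path and the sleep durations are my own assumptions for illustration; the real extension and its 200-second drop are not modelled here.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.SocketTimeoutException;
import java.net.URL;
import java.nio.charset.StandardCharsets;

public class TimeoutProbe {

    // POSTs the body and returns the HTTP status code,
    // or -1 if the client-side connect/read timeout fired first.
    static int post(String url, String body, int connectTimeoutMs, int readTimeoutMs) {
        try {
            HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
            conn.setConnectTimeout(connectTimeoutMs);
            conn.setReadTimeout(readTimeoutMs);
            conn.setRequestMethod("POST");
            conn.setDoOutput(true);
            try (OutputStream out = conn.getOutputStream()) {
                out.write(body.getBytes(StandardCharsets.UTF_8));
            }
            return conn.getResponseCode();
        } catch (SocketTimeoutException e) {
            return -1; // the client gave up, not the server
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Hypothetical stand-in for the slow extension: drains the request body,
    // sleeps, then answers 204 "no content".
    static HttpServer startSlowServer(long sleepMs) {
        try {
            HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
            server.createContext("/slow", exchange -> {
                try (InputStream in = exchange.getRequestBody()) {
                    byte[] buf = new byte[4096];
                    while (in.read(buf) != -1) {
                        // discard the request body
                    }
                }
                try {
                    Thread.sleep(sleepMs);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
                exchange.sendResponseHeaders(204, -1);
                exchange.close();
            });
            server.start();
            return server;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        HttpServer server = startSlowServer(1000);
        String url = "http://127.0.0.1:" + server.getAddress().getPort() + "/slow";
        // Read timeout shorter than the server delay: the client times out.
        System.out.println(post(url, "aaaa", 1000, 200));
        // Read timeout longer than the server delay: the response arrives.
        System.out.println(post(url, "aaaa", 1000, 5000));
        server.stop(0);
    }
}
```

A client-side timeout surfaces as a SocketTimeoutException, which is easy to tell apart from the connection being closed by the other side.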
Now there's one thing on top of that: I said I would POST some data. Apparently this whole behavior depends on the amount of data sent. I have narrowed it down this far: when sending a string of about 4500 characters, the described behavior does NOT happen; everything is fine and I get an HTTP 204 "no content" response, which is correct.
As soon as I send ca. 6000 characters or more, the mentioned connection drop occurs. The string I'm sending here is only dummy data. The sent string is only a sequence of 'a', i.e. "aaaaaaaa..." created with a for loop with 4500 or 6000 iterations, respectively.
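The dummy payload can be built roughly like this. This is only a sketch; the class and method names are made up for illustration.

```java
import java.nio.charset.StandardCharsets;

public class DummyPayload {

    // Builds a string consisting of 'a' repeated `length` times.
    static String build(int length) {
        StringBuilder sb = new StringBuilder(length);
        for (int i = 0; i < length; i++) {
            sb.append('a');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String small = build(4500); // size that worked for me
        String large = build(6000); // size that triggered the drop
        // For pure ASCII, the UTF-8 byte count equals the character count.
        System.out.println(small.getBytes(StandardCharsets.UTF_8).length);
        System.out.println(large.getBytes(StandardCharsets.UTF_8).length);
    }
}
```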
In my production code I would really like to wait until the server operation has finished, but I don't know how to prevent the connection drop.
Is there an option on the Neo4j server to configure (I looked but didn't find anything) or isn't it the server's fault and my clients do something wrong? A bug somewhere?
Thanks for reading and any help or hints!
Just to wrap this up: I eventually found out that there exists a default timeout constant in Jetty (version 6.x was used by Neo4j back then, I think) set to exactly 200 seconds. This could be changed using the Jetty API but the Neo4j server did not appear to offer any possibility to configure this.
Changing to Neo4j 2.x eventually solved the issue (why exactly is unknown). With those newer versions of Neo4j the issue did not come up anymore.
Two of us are working on a Ruby on Rails website that receives GPS coordinates sent by a tracking system we developed. This tracking system sends 10 coordinates every 10 seconds.
We have 2 servers to test our website, and we noticed that one server processes the 10 coordinates very quickly (less than 0.5 s) whereas the other server takes at least 5 seconds (up to 20 seconds) for the same 10 coordinates. We are supposed to use the "slow" server to put our website into production, which is why we are trying to solve this issue.
Here is an image showing the response time of the slow server (at the bottom we can see 8593 ms).
[Image: Slow Server]
The second image shows the response time of the "quick" server.
[Image: Fast Server]
The version of the website is the same. We upload it via Github.
We can easily reproduce the problem by sending fake coordinates with Postman, and the time difference between the two servers remains the same. In my opinion, this means the problem does not come from our tracking system.
I am here to find out what could cause such a difference. I guess it could be a problem with the server itself, or with some settings that are not carried over via Github.
We use Sqlite3 for our database.
However I do not even know where to look to find the possible differences...
If you need further information (such as lscpu => I am limited to a number of 2 links...) in order to help me, please do not hesitate. I will reply very quickly as I work on it all day long.
Thank you in advance.
EDIT : here are the returns of the lscpu commands on the server.
Fast Server :
Slow Server :
Maybe one big difference is the L2 cache...
My guess is that the answer is here, but how can I find out my current value of PRAGMA synchronous, and how can I change it?
The .sqlite3 file I use for the tests is under 1 MB. Both databases should be identical according to my schema.rb file.
The provider of the "slow" server solved the problem, however I do not know the details. Some things were consuming memory and slowing down everything.
As it turns out, "virtual server" means that several servers are running on the same machine, and each is allotted a share of that machine.
Thanks a lot for your help.
In short, our project uses a Thrift server and mobile clients with multiplexing.
While I was developing the iOS client, I encountered a strange problem.
When I first created the client and made calls, everything was OK and worked as expected.
Since there is no close method for Cocoa Thrift client, I am hoping ARC will take care of it.
After some time, I create another client for the same service and do the same things, but this time, when I make a service call, the client hangs and after some time it throws a "'TTransportException', reason: 'Cannot read. Remote side has closed.'".
On the server, the operation completes successfully and the value is returned.
Does anybody have an idea about what I am doing wrong?
Thanks in advance!
Reading your question, I remembered that we encountered a very similar problem in a very different environment. If ARC takes care of your client and closes the connection, and especially the port, then recreating the client with the same port might be the root of your problem. Opening the same port shortly after closing it can take a very long time (minutes), depending on timeouts.
Sorry, no real answer to your problem, but maybe a hint where to look.
How do I set (in my case, raise) the connection timeouts of the Neo4j server? I have a server extension to which I POST data, sometimes so much that the extension runs for a couple of minutes. But after 200 seconds, the connection is dropped by the server. I think I have to raise the max idle time of the embedded Jetty, but I don't know how to do that since it's all configured within the Neo4j server code.
Edit: I've tried both Neo4j 1.8.2 and 1.9.RC2 with the same result.
Edit2: Added the "embedded-jetty" tag because there are no answers until now; perhaps the question can be answered by someone with knowledge about embedded Jetty since Neo4j uses an embedded Jetty.
Thank you!
I still don't know whether there is a solution in the Neo4j server for versions <2.0. However, after switching to 2.0.0 and above, the issue was gone in my case.
The server guards against orphaned transactions by using a timeout. If there are no requests for a given transaction within the timeout period, the server will roll it back. You can configure the timeout period by setting the following property to the number of seconds before timeout. The default timeout is 60 seconds.
org.neo4j.server.transaction.timeout=60
See http://docs.neo4j.org/chunked/stable/server-configuration.html
I've been working on this bug for several days now and couldn't solve it.
I wrote an HttpsURLConnection client to upload large files (>1 GB) using POST requests.
I also implemented the server side using com.sun.net.httpserver.HttpServer.
As the files are quite big, I have to use the setFixedLengthStreamingMode/setChunkedStreamingMode settings on my connection (the bug is reproduced using either).
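To make clear what the two streaming modes refer to, here is a reduced sketch, not my actual code. It uses plain HTTP against a local com.sun.net.httpserver.HttpServer instead of HTTPS, and the names StreamingUpload, upload, startSink and the /upload path are made up for illustration.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.URL;
import java.util.concurrent.atomic.AtomicLong;

public class StreamingUpload {

    // Streams `data` to `url` without buffering the whole body in memory.
    // A non-negative length selects fixed-length mode, a negative one chunked mode.
    static int upload(String url, InputStream data, long length) {
        try {
            HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
            conn.setRequestMethod("POST");
            conn.setDoOutput(true);
            if (length >= 0) {
                conn.setFixedLengthStreamingMode(length);
            } else {
                conn.setChunkedStreamingMode(8192);
            }
            byte[] buffer = new byte[8192];
            try (OutputStream out = conn.getOutputStream()) {
                int len;
                while ((len = data.read(buffer)) != -1) {
                    out.write(buffer, 0, len); // the write that fails in my case
                }
            }
            return conn.getResponseCode();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Hypothetical server side: counts the received bytes and answers 204.
    static HttpServer startSink(AtomicLong received) {
        try {
            HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
            server.createContext("/upload", exchange -> {
                byte[] buf = new byte[8192];
                try (InputStream in = exchange.getRequestBody()) {
                    int n;
                    while ((n = in.read(buf)) != -1) {
                        received.addAndGet(n);
                    }
                }
                exchange.sendResponseHeaders(204, -1);
                exchange.close();
            });
            server.start();
            return server;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        AtomicLong received = new AtomicLong();
        HttpServer server = startSink(received);
        String url = "http://127.0.0.1:" + server.getAddress().getPort() + "/upload";
        byte[] data = new byte[100_000];
        System.out.println(upload(url, new ByteArrayInputStream(data), data.length)); // fixed length
        System.out.println(upload(url, new ByteArrayInputStream(data), -1));          // chunked
        System.out.println(received.get());
        server.stop(0);
    }
}
```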
Please notice I'm using an HTTPS connection for the upload as well.
I'm uploading the file to several servers simultaneously (a separate thread for each HTTP client, each connected to a different server).
I have set a limit on the concurrent uploads so each time only X threads have an open UrlConnection (bug was reproduced with X=[1..4]).
(The other threads wait on a semaphore)
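The concurrency limit works roughly as in the following sketch (simplified; the names BoundedUploads, runAll and sleepTask are made up for illustration, and the sleeping task only stands in for a real upload):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicInteger;

public class BoundedUploads {

    // Starts one thread per task but lets at most `maxConcurrent` tasks run
    // at the same time. Returns the highest concurrency actually observed.
    static int runAll(List<Runnable> tasks, int maxConcurrent) {
        Semaphore permits = new Semaphore(maxConcurrent);
        AtomicInteger active = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();
        List<Thread> threads = new ArrayList<>();
        for (Runnable task : tasks) {
            Thread t = new Thread(() -> {
                try {
                    permits.acquire(); // the other threads wait here
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
                try {
                    peak.accumulateAndGet(active.incrementAndGet(), Math::max);
                    task.run();
                } finally {
                    active.decrementAndGet();
                    permits.release();
                }
            });
            threads.add(t);
            t.start();
        }
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return peak.get();
    }

    // Placeholder for an upload: just sleeps for the given time.
    static Runnable sleepTask(long millis) {
        return () -> {
            try {
                Thread.sleep(millis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        };
    }

    public static void main(String[] args) {
        List<Runnable> tasks = new ArrayList<>();
        for (int i = 0; i < 8; i++) {
            tasks.add(sleepTask(50));
        }
        System.out.println("peak concurrency: " + runAll(tasks, 2));
    }
}
```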
My problem is such:
When the uploads take less than 5 minutes (less than 4:50 minutes, to be accurate), everything works just fine.
If the first batch of threads takes more than 5 minutes to finish, then an exception is thrown for every active thread:
java.io.IOException: Error writing request body to server
at sun.net.www.protocol.http.HttpURLConnection$StreamingOutputStream.checkError(Unknown Source)
at sun.net.www.protocol.http.HttpURLConnection$StreamingOutputStream.write(Unknown Source)
The exception is thrown while trying to write to the HttpURLConnection output stream.
[outputStream.write(buffer,0,len);]
The next batch of threads will work just fine (even if they take more than 5 minutes).
Please note that the servers are completely identical, and the second batch does not fail, which leads me to think that the problem is not on the server side.
(If it were, the second batch would be expected to fail after 5 minutes as well...)
I have reproduced this issue with/without connect/read timeouts on the connection.
Furthermore, on the server side I've seen the file is being created and growing until the exception occurs.
About 20-40 seconds after the client throws an exception the server will throw an IOException "read timeout".
I have collected a TCP/IP capture using Wireshark and saw that the server sends me a FIN packet at about the time of the client exception. I have no idea why.
(All connections seem to be functioning prior to that.)
I have read many threads on similar issues but couldn't find any proper solution.
(including Using java.net.URLConnection to fire and handle HTTP requests)
Any ideas on why this is happening?
How can I find the cause of it?
How can I solve it?
Many Thanks.
P.S
I didn't publish the code because it is pretty long...
But if it could help understanding my problem I will be glad to do so.
I have quite a big application, running from inside a Spree extension. Now the issue is that all requests are very slow, even locally. I am getting messages like "Waiting for localhost" or "waiting for server" in my browser status bar for 3 - 4 seconds for each request issued, before it starts execution. I can see that the execution time logged in the log file is quite good, but the overall response time is poor because of the initial delay. So please suggest where I can start looking to improve this situation.
One possible root cause for this kind of problem is that initial DNS name resolution is failing before eventually resolving. You can check if this is the case using tcpdump (if that's available for your platform) or Wireshark. Look for traffic to and from your client host on port 53 and see if the name responses are happening in a timely fashion.
If it turns out that this is the problem, then you need to make sure that the client is configured such that the first resolver it tries knows about your server addresses (I'm guessing these are local LAN addresses that are failing). Different platforms have different ways of configuring this. A quick hack would be to put the address of your server in the client's hosts file to see if that fixes it.
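If you would rather not capture packets, a rough first check is to time a name lookup from the client machine. The sketch below does that in Java (the class and method names are made up for illustration); it goes through the JVM's resolver, which may differ from what your browser uses, so treat the number only as a hint.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

public class DnsTiming {

    // Returns how long resolving `host` took in milliseconds,
    // or -1 if the name could not be resolved at all.
    static long resolveMillis(String host) {
        long start = System.nanoTime();
        try {
            InetAddress.getAllByName(host);
        } catch (UnknownHostException e) {
            return -1;
        }
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) {
        String host = args.length > 0 ? args[0] : "localhost";
        // Only the first lookup is meaningful: the JVM caches results afterwards.
        System.out.println(host + " resolved in " + resolveMillis(host) + " ms");
    }
}
```

A lookup that takes whole seconds would point towards the resolver order or hosts file issue described above.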
Once you send in your request, you will see 'waiting for host' right up until the Ruby work is done and it starts sending a response. So, if there is pretty much any processing work that is slowing you down, you'd see this message. What you'd want to do is start looking at the functions that you're seeing the behaviour on, and break them down into pieces to see which pieces are slow. If EVERYTHING is slow, then you need to look at the things that are common to every function: before functions, or Application Controller code, or something similar. What I do, when I'm just playing around to see what I need to fix, is put 'puts' statements in my code at different stages to print the current time. Then I can see which stage is taking a long time.