Effects of setting PersistMessages to N and FileStorePath issues in QuickFIX/J

I am running into out-of-memory issues after a certain amount of time when I run my QuickFIX/J app. After a little investigation I found out that this was being caused by the messages that QuickFIX/J caches for resending when a resend request is received.
So for testing I set the PersistMessages flag to N on a particular session. After that my memory problems completely disappeared. But I do not understand why QuickFIX/J is keeping these messages in memory when I have properly set the FileStorePath property. These messages should be stored in a file, but they are not. I do see some files present in the directory I set in FileStorePath, but none of them seems to be storing messages; I can only see sequence numbers in them. Do I need to set other flags besides this in order to make this work?
I do not plan to use the PersistMessages flag outside testing. I would prefer the FileStoreMaxCachedMsgs flag with a reasonable figure. I also need to know what will happen if my app receives a resend request when I have set PersistMessages to N. Will QuickFIX/J send gap fills instead, or will it crash with some exception?
Thanks

I've found that QuickFIX/J sends gap fills when it cannot find the requested messages. Also, the config flag FileStoreMaxCachedMsgs tells QuickFIX/J how many messages it should keep in its in-memory cache before going to the files on disk, so this flag, in my opinion, should be lowered in order to get your app to work without running out of memory due to message caching.
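For reference, here is a minimal sketch (Java, using QuickFIX/J's SessionSettings API) of how these store settings could be wired up. The session identifiers, path and cache value are made-up examples; the same keys can of course live in the usual session configuration file instead.
import quickfix.FileStoreFactory;
import quickfix.SessionID;
import quickfix.SessionSettings;

public class StoreSettingsSketch {
    public static void main(String[] args) {
        // Hypothetical session identifiers, for illustration only.
        SessionID sessionID = new SessionID("FIX.4.4", "SENDER", "TARGET");
        SessionSettings settings = new SessionSettings();
        // Messages and sequence numbers are written under this directory.
        settings.setString(sessionID, "FileStorePath", "store");
        // Keep persistence on (the default) so resend requests can be answered from disk...
        settings.setString(sessionID, "PersistMessages", "Y");
        // ...and cap how many messages the file store keeps cached in memory (illustrative value).
        settings.setLong(sessionID, "FileStoreMaxCachedMsgs", 1000);
        // Pass the factory to your Initiator/Acceptor as usual.
        FileStoreFactory storeFactory = new FileStoreFactory(settings);
    }
}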
Hope this will be helpful for somebody. Thanks.

Related

Esper 8.2 statement stops matching events

In my application I have about 100 continuous Esper filter queries with events being sent in. At some point, for an unknown reason, some of the statements stop matching events and never match any further events, without ever throwing an exception (nothing is logged with the default log4j setup). This is not reproducible in a small example, and I realize that it's difficult to pinpoint a problem like this, but I'm writing this in the hope that it is a known and/or fixed issue.
I would suggest reviewing your application code to make sure it is still sending events, and the listener/subscriber code to make sure it is still processing output events, i.e. check exception handling and logging. Or perhaps an OOM occurs and logging doesn't happen, so you may want to check heap memory use. Also look at the console output to see if the JVM has encountered an issue.
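If the statements themselves turn out to be fine, a defensive listener along these lines can at least make swallowed exceptions visible. This is only a sketch: it assumes the Esper 8 UpdateListener callback signature and plain stderr logging, so adapt it to your actual output handling and logging setup.
import com.espertech.esper.common.client.EventBean;
import com.espertech.esper.runtime.client.EPRuntime;
import com.espertech.esper.runtime.client.EPStatement;
import com.espertech.esper.runtime.client.UpdateListener;

public class LoggingUpdateListener implements UpdateListener {
    @Override
    public void update(EventBean[] newEvents, EventBean[] oldEvents,
                       EPStatement statement, EPRuntime runtime) {
        try {
            // ... your real output handling goes here ...
        } catch (RuntimeException e) {
            // Log instead of letting the failure disappear silently.
            System.err.println("Listener for statement " + statement.getName() + " failed: " + e);
        }
    }
}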

How can I use visibility timeout as a "retry" mechanism?

My use case is that I have a queue from which my application reads messages. I need to delete the message if it is processed correctly, else I want it to stay in the queue. If my application processes the same message again, does that count as a "retry" against the Maximum Receives setting, or does Maximum Receives work as a different metric? If I cannot use it as a retry, can you help me with how I could go about implementing it?
Really sorry about the crude ideas I have compiled, but I am a newbie in terms of AWS, and wanted this done a bit quicker than expected.
Thanks for any help.
Maybe you can take a look at dead letter queues. When a message generates an unexpected error, it can be retried according to maxReceiveCount; if there is still an error after all the allowed retries, the message will be moved to a dead letter queue according to the redrive policy:
https://docs.aws.amazon.com/AWSSimpleQueueService/latest/SQSDeveloperGuide/sqs-dead-letter-queues.html
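As a sketch of the consuming side (AWS SDK for Java v2, with a made-up queue URL and a hypothetical process() method): the message is deleted only after successful processing, so a failed message becomes visible again once the visibility timeout expires, each failed receive counts toward maxReceiveCount, and the redrive policy eventually moves it to the dead letter queue.
import software.amazon.awssdk.services.sqs.SqsClient;
import software.amazon.awssdk.services.sqs.model.DeleteMessageRequest;
import software.amazon.awssdk.services.sqs.model.Message;
import software.amazon.awssdk.services.sqs.model.ReceiveMessageRequest;

public class QueueWorker {
    public static void main(String[] args) {
        String queueUrl = "https://sqs.eu-west-1.amazonaws.com/123456789012/my-queue"; // made-up URL
        try (SqsClient sqs = SqsClient.create()) {
            ReceiveMessageRequest receive = ReceiveMessageRequest.builder()
                    .queueUrl(queueUrl)
                    .maxNumberOfMessages(10)
                    .waitTimeSeconds(20)
                    .build();
            for (Message message : sqs.receiveMessage(receive).messages()) {
                try {
                    process(message.body()); // hypothetical business logic
                    // Delete only on success; otherwise the message stays in the queue.
                    sqs.deleteMessage(DeleteMessageRequest.builder()
                            .queueUrl(queueUrl)
                            .receiptHandle(message.receiptHandle())
                            .build());
                } catch (Exception e) {
                    // No delete: the message reappears after the visibility timeout
                    // and, after maxReceiveCount failed receives, goes to the DLQ.
                }
            }
        }
    }

    private static void process(String body) { /* ... */ }
}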
Hope this helps.

How to handle SAP Kapsel Offline app OData conflicts properly?

I built an app that is able to store OData offline by using the SAP Kapsel plugins.
More or less it is the same as the apps generated by SAP Web IDE, or similar to the apps in this example: https://blogs.sap.com/2017/01/24/getting-started-with-kapsel-part-10-offline-odatasp13/
Now I am at the point of checking the error resolution potential. I created a sync conflict (changing data on the server after the offline database was stored, then changing something in the app and starting a flush).
As mentioned in the documentation, I can see the error in the ErrorArchive and could also see some details. But what I am missing is the "current" data in the backend database.
In the error details I can just see the data on the device, but not the data changed on the server.
For example:
Device is loading some names into offline store
Device is offline
User A is changing some names
User B is changing one of this names directly online
User A is online again and starts a sync
User A is now informed about the entity that was changed, BUT:
not about the content user B entered;
I just see the "offline" data.
Is there a solution to see the "current" and the "offline" one in a kind of compare view?
Please also note that the server communication is done by the Kapsel plugin and not with normal AJAX calls. AJAX calls could be an alternative, but I am wondering whether there is a smarter way supported by the API?
Meanwhile I figured out how to load the online data (manually).
This can be done by switching the HTTP handler back to the normal one:
sap.OData.removeHttpClient();  // restore the default (online) OData HTTP client
sap.OData.applyHttpClient();   // re-apply the Kapsel offline handler afterwards
Anyhow, this does not look like a proper solution, and I also have an issue with the conflict log itself: it must be deleted before any refresh can be applied.
I could not find any proper documentation for that. Also, ETag handling is hardly described in the SAPUI5 and SAP Kapsel documentation.
This question is a really tricky one, due to its implications. I understand that you are simulating a synchronization error due to concurrent modification, and want to know if there is a way for the client to obtain the "current" server state in order to give the user a means to compare the local and server state.
First, let me give you the short answer: No, there is no way for the client to see the current server state "for reference" via the Offline APIs when there are synchronization errors. Doing an online query as outlined above might work, but it certainly is a bad idea.
Now for the longer answer, which explains why this is not necessarily a defect and why I said there are quite some implications to the answer.
Types of Synchronization Errors
We distinguish a number of synchronization errors, and in this context, we are clearly dealing with business-related issues. There are two subtypes here: Those that the user can correct, e.g. validation errors, and those that are issues in the business process itself.
If the user violates the input range, e.g. by putting a negative price for a product, the server would reply with the corresponding message: "-1 is not a valid input value for 'Price'". You, as a developer, can display such messages to the user from the error archive, and the ensuing fix is indeed a very easy one.
Now when we talk about concurrent modification, things get really, really nasty. In fact, I like to say that in this case there is an issue with the business process, because on one hand, we allow data to get out of sync. On the other hand, the process allows multiple users to manipulate the same piece of information. How all relevant users should now be notified and synchronize is no longer just a technical detail, but in fact a new business process. There just is no way to generically devise how to handle this case. In most cases, it would involve back-office experts who need to decide how the changes should be merged.
A Better Solution
Angstrom pointed out that there is no way to manipulate ETags on the client side, and you should in fact not even think about it. ETags work like version numbers in optimistic locking scenarios, and changing the ETag basically means "Just overwrite what's on the server". This is a no-go in serious scenarios.
An acceptable workaround would be the following:
Make sure the server returns verbose error messages so that the user can see what happened and what caused the conflict.
If that does not help, refresh the data. This will get you an updated ETag, and merge the local changes into the "current" server state, but only locally. "Merging" really means that local changes always overwrite remote changes.
The user now has another opportunity to review the data and can submit it again.
A Good Solution
Better is not necessarily good, so here is what you should really do: never let concurrent modification happen, because it is really expensive to handle. This implies that it is not the developer who should address this issue; rather, the business needs to change the process.
The right question to ask is, "When you replicate data in a distributed system, why do you allow it to be modified concurrently at all?" Typically stakeholders will not like this kind of question, and the appropriate reaction is to work out a conflict resolution process together with them. Only then will they realize how expensive fixing that kind of desynchronization is, and more often than not they will see that adjusting the process is way cheaper than insisting on yet another back-office process to fix the issues it causes. Even if they insist that there is a need for this concurrent modification, they will now understand that it is not your task to sort this out and that they need to invest in a conflict resolution process.
TL;DR
There is no way to compare the client state to the server state on the client, but you can do a refresh to retain the local changes and get an updated ETag. The real solution, however, is to rework the business process, because this is no longer a purely technical issue.
The default solution is that SMP or HCPms detects conflicts via ETags. On the client side there is no API to manipulate ETags in case of conflicts. A potential solution to implement a kind of diff view on the device would work like this:
Show the errors
Cache the errors (maybe only in memory?)
Delete the errors
Do a refresh of the database
Build a diff view from the current data and the cached errors
The idea with
sap.OData.removeHttpClient();
sap.OData.applyHttpClient();
could also work but could be very tricky and may introduce side effects.
Maybe some requests are triggered against the "wrong" backend.

Detect "Delay write failed" occurence

I am losing my patience with "delay write failed" errors. They silently disconnect the database from the application, so nothing gets saved in the database while using it. Is there a way to detect the occurrence itself so I can flash a warning? Or perhaps monitor the connection itself for a disconnection? Everyone seems to miss the balloon tip from Windows XP, so I figured I would flash a more visible warning that the application must be restarted. It seems Microsoft has found a way to force people to upgrade...
I suppose this could be done with a timer that constantly checks the connected users:
cxlabel1.Caption := IntToStr(DataModule2.ABSDatabase1.GetDBFileConnectionsCount);
But I was thinking more of checking/detecting the occurrence itself. Is there something in Delphi that can detect this?
I would like to hear your ideas on this...
Putting this as an answer because the comment length is limited.
I have seen this before. IIRC, the problem you have is that the delayed write error is an OS error; it has nothing to do with your application. Windows has already told you that the write has been committed correctly to disk. You would have to try and hook into the OS errors to see when this is happening.
You need to get the network issues resolved because that's where the problem is. In our situation it was a faulty router that was causing the problem.
It's unfair to expect users to check for the error message and then handle it. They could be out at lunch when it occurs, as it's not immediate. They also have no way of knowing what has been saved and what hasn't. It's only a matter of time before your database is corrupted.
The problem with a timer is that it might tell you everything is fine because it triggers after the network resolves the problems.
A far better approach would be to switch to a client/server database. You can do this by setting up your own server that listens for web service or other remote calls, or by switching to a database that supports client/server operation instead of using a file-based database. This will tell you immediately when there is a problem saving the data.

Erlang/OTP framework's error_logger hangs under fairly high load

My application is basically a content based router which will route MMS events.
The logger I am using is the one that comes with the OTP framework in SASL mode, "error_logger".
The issue is:
I am using a client to generate MMS events with default values. This client (in Java) has the ability to send a high load of events in multiple threads.
I am sending 100 events in 10 threads (each thread sending 10 MMS events) to my router written in Erlang/OTP.
The problem is that when such a high load is received by my router, my logger hangs, i.e. it stops updating my log file. But the router is still able to route the events.
The conclusions that I have come up with are:
A scheduling problem in Erlang when such a high load of events is received (a separate process for each event).
A very unlikely deadlock state.
It might be due to sending events in multiple threads rather than sending them sequentially. But I guess a router will be connected to multiple service provider boxes, so I thought of sending events in threads.
Can anybody help me in demystifying the problem?
You already have a good answer, but I'll add to the discussion.
The error_logger is by default using cached write operations to disk. So one possibility is that you don't really notice this while under low load, but under high load your writes get stuck in the cache for a while.
On a side note: there should be no problem having multiple threads doing calls to Erlang.
Another way of testing this is to add your own logger to error_logger, and see what happens. Possibly printing to the shell or something else that is "fast".
Which version of Erlang are you using? Prior to R14A (R13B4 maybe?), there was a performance penalty when you invoked a selective receive when the message queue contained a lot of messages. This behaviour meant that in a process that receives lots of messages (error_logger being the canonical example), if it was barely keeping up with the load then a small spike in load could cause the cost of processing to spike up and stay there as the new processing cost was higher than the process could bear. This problem has been solved in R14A.
Secondly - why are you sending a high volume of events/calls/logs to a text logger? Formatting strings for output to a human readable log file is a lot more expensive than using a binary disk_log for instance. Reducing the cost of logging will help, but reducing the volume of logs will help even more. Maybe investigate exactly why you need to log these things and see if you can't record them another (less expensive) way.
Problems with error_logger are often symptoms of some other overload problem. Try looking at the message queue sizes for all your processes when this problem occurs and see if something else is backed up too. The following Erlang shell code might help:
[ { P, element(2, process_info(P, message_queue_len)) }
|| P <- erlang:processes(), is_process_alive(P) ]
