Informatica code has generated 476 GB of session logs in Dev

My Informatica code has generated 476 GB of session logs in Dev.
Can someone explain how to reduce these logs? Are there any settings at the session level?
What checks do I need to do at the session level and in the code?

This is usually due to the tracing level set on the session.
Edit the session and select the Config Object tab; the Override Tracing attribute there should be set to Normal, not Verbose Initialization or Verbose Data. The same goes for the tracing level on each of your transformations in the Mapping tab of the session.
You don't debug by generating millions of lines of log file and trawling through all of them; you find ways to reduce the load, such as running fewer records. Otherwise this is a performance test, in which case the tracing level should not be set above Normal.

First check what is being logged. One common scenario I have seen is the session log recording every rejected row, which makes the session log huge. If that is the case, and if you expect data to be rejected, you can switch off record-level logging in the session properties.

Set the session property Stop on Error to 1. That way, if there is an error in the session and records are being rejected, you see it up front; otherwise the session may run for a long time before you notice there are errors in the processing.
Also make sure tracing is set to Normal.

Related

Way to get current editor id (doc id) in HiveServerClient in Hue source code

When multiple Hue pages run Tez applications at the same time, Hue sometimes assigns the same session to two different tasks, which causes one of them to receive a KILL signal while the other complains that the current app master is in use and retries. I looked into the code of HiveServerClient._get_tez_session and I think the problem lies in the way busy_sessions is retrieved, which is not thread-safe. So there is a chance that two queries will be allocated to the same session when they are submitted at virtually the same time.
I'd like to know whether there is any way to get the current editor id (doc_id) from the HiveServerClient._get_tez_session method, so I can do some hacking for a quick solution for now. Thanks.
You can solve this by disabling Tez session mode:
set tez.am.mode.session=false;
Session mode is more aggressive in reserving execution resources and is typically used for interactive applications where multiple DAGs are submitted in quick succession by the same user. For long-running applications, one-off executions, batch jobs, etc., non-session mode is recommended. If session mode is enabled, then container reuse is recommended.
Also try to disable container reuse:
set tez.am.container.reuse.enabled=false;
See all Tez configuration settings here.
Also read this thread about Tez session naming.
I did not test it myself, but maybe you can use the hive.session.id property for getting/setting session ids.
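Also untested, but as a quick sanity check you can echo the current values from a Hive session before wiring anything into Hue; in Hive, set <property>; with no value prints the current setting:
set tez.am.mode.session;
set hive.session.id;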

How to handle SAP Kapsel Offline app OData conflicts properly?

I built an app that is able to store OData offline by using the SAP Kapsel plugins.
More or less it's the same as what is generated by SAP Web IDE, or similar to the apps in this example: https://blogs.sap.com/2017/01/24/getting-started-with-kapsel-part-10-offline-odatasp13/
Now I am at the point of checking the error resolution potential. I created a sync conflict (changed data on the server after the offline database was stored, changed something in the app, and started a flush).
As mentioned in the documentation, I can see the error in the ErrorArchive and can also see some details. But what I am missing is the "current" data in the database.
In the error details I can only see the data on the device, not the data changed on the server.
For example:
The device loads some names into the offline store
The device goes offline
User A changes some names
User B changes one of these names directly online
User A comes online again and starts a sync
User A is now informed about the entity that was changed, BUT:
not about the content User B entered;
I just see the "offline" data.
Is there a solution to see the "current" and the "offline" one in a kind of compare view?
Please also note that the server communication is done by the Kapsel plugin and not with normal AJAX calls. Those could be an alternative, but I am wondering whether there is a smarter way supported by the API.
Meanwhile I have figured out how to load the online data (manually).
This can be done by switching the HTTP handler back to the normal one:
sap.OData.removeHttpClient();
sap.OData.applyHttpClient();
Anyhow, this does not look like a proper solution, and I also have an issue with the conflict log itself: it must be deleted before any refresh can be applied.
I could not find any proper documentation for that. ETag handling is also barely described in the SAPUI5 and SAP Kapsel documentation.
This question is a really tricky one, due to its implications. I understand that you are simulating a synchronization error due to concurrent modification, and want to know if there is a way for the client to obtain the "current" server state in order to give the user a means to compare the local and server state.
First, let me give you the short answer: No, there is no way for the client to see the current server state "for reference" via the Offline APIs when there are synchronization errors. Doing an online query as outlined above might work, but it certainly is a bad idea.
Now for the longer answer, which explains why this is not necessarily a defect and why I said there are quite some implications to the answer.
Types of Synchronization Errors
We distinguish a number of synchronization errors, and in this context we are clearly dealing with business-related issues. There are two subtypes here: those that the user can correct, e.g. validation errors, and those that are issues in the business process itself.
If the user violates the input range, e.g. by putting a negative price for a product, the server would reply with the corresponding message: "-1 is not a valid input value for 'Price'". You, as a developer, can display such messages to the user from the error archive, and the ensuing fix is indeed a very easy one.
Now when we talk about concurrent modification, things get really, really nasty. In fact, I like to say that in this case there is an issue with the business process, because on the one hand we allow data to get out of sync, and on the other hand the process allows multiple users to manipulate the same piece of information. How all relevant users should now be notified and synchronize is no longer just a technical detail, but in fact a new business process. There is simply no way to devise a generic way of handling this case. In most cases it will involve back-office experts who need to decide how the changes should be merged.
A Better Solution
Angstrom pointed out that there is no way to manipulate ETags on the client side, and you should in fact not even think about it. ETags work like version numbers in optimistic locking scenarios, and changing the ETag basically means "Just overwrite what's on the server". This is a no-go in serious scenarios.
An acceptable workaround would be the following:
Make sure the server returns verbose error messages so that the user can see what happened and what caused the conflict.
If that does not help, refresh the data. This will get you an updated ETag, and merge the local changes into the "current" server state, but only locally. "Merging" really means that local changes always overwrite remote changes.
The user now has another opportunity to review the data and can submit it again.
A Good Solution
Better is not necessarily good, so here is what you should really do: never let concurrent modification happen, because it is really expensive to handle. This implies that it is not the developer who should address this issue; the business needs to change the process.
The right question to ask is: "When you replicate data in a distributed system, why do you allow it to be modified concurrently at all?" Typically stakeholders will not like this kind of question, and the appropriate reaction is to work out a conflict resolution process together with them. Only then will they realize how expensive fixing that kind of desynchronization is, and more often than not they will see that adjusting the process is far cheaper than insisting on yet another back-office process to fix the issues it causes. Even if they insist that there is a need for this concurrent modification, they will now understand that it is not your task to sort this out and that they need to invest in a conflict resolution process.
TL;DR
There is no way to compare the client state to the current server state on the client, but you can do a refresh to retain the local changes and get an updated ETag. The real solution, however, is to rework the business process, because this is no longer a purely technical issue.
The default solution is that SMP or HCPms detects errors via ETags. On the client side there is no API to manipulate ETags in case of conflicts. A potential way to implement a kind of diff view on the device would work like this:
Show the errors
Cache the errors (maybe only in memory?)
Delete the errors
Do a refresh of the database
Build a diff view from the current data and the cached errors
The idea with
sap.OData.removeHttpClient();
sap.OData.applyHttpClient();
could also work, but it could be very tricky and may introduce side effects.
Maybe some requests would be triggered against the "wrong" backend.

Effects of setting PersistMessages to N and FileStorePath issues in QuickFixJ

I am running into out-of-memory issues after a certain amount of time when I run my QuickFixJ app. After a little investigation, I found out that this was being caused by messages that QuickFixJ caches for resending when a resend request is received.
So for testing I set the PersistMessages flag to N on a particular session. After that, my memory problems completely disappeared. But I do not understand why QuickFixJ keeps these messages in memory when I have properly set the FileStorePath property. These messages should be stored in a file, but they are not. I do see some files in the directory I set in FileStorePath, but none of them seems to store messages; I can only see sequence numbers in them. Do I need to set other flags besides this one to make this work?
I do not plan to use the PersistMessages flag outside testing. I would prefer the FileStoreMaxCachedMsgs flag with a reasonable figure. I also need to know what will happen if my app receives a resend request when I have set PersistMessages to N. Will QuickFixJ send gap fills instead, or will it crash with some exception?
Thanks
I've found that QuickFixJ sends gap fills when it cannot find the messages. Also, the config flag FileStoreMaxCachedMsgs tells QuickFixJ how many messages it should keep in cache before pushing them down to files, so in my opinion this is the flag that should be tuned to keep your app from running out of memory due to message caching.
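For reference, a session store configuration along these lines should keep the memory bounded (a sketch; the BeginString and the FileStoreMaxCachedMsgs value are just illustrative):
[session]
BeginString=FIX.4.4
# keep sent messages on disk so resend requests can be honored
FileStorePath=data/fixstore
PersistMessages=Y
# cap how many stored messages are cached in memory
FileStoreMaxCachedMsgs=1000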
Hope it'll be helpful for somebody. Thanks.

Data logged to a file; how do I rotate logs and how do I parse the data to not have 'gaps' in the data?

I've got a web application that, for performance reasons, throws any data sent into a logfile.
I've got two concerns with this approach:
How do I best rotate logs, in order to not lose data?
For each user session multiple requests are logged. Each request has a unique id so there is an easy way for me to tie the requests to the session. The problem is, however, that if I rotate the logs I risk ending up with one request in one log and another request in another log.
How do I arrange my parsing in a way that allows me to parse all requests from a given session? I am willing to define a session time limit, for example that the requests must be at most 30 minutes apart.
If I had an hourly log rotation at 00 minutes:
What if the user made one request at 13:59 and one at 14:01 - The user would end up having requests in two different logs.
Answer to part 1: If you're on *nix, use syslog/logger. Check the logger(1) and syslog.conf(5) man pages.
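For example (a sketch, assuming the local0 facility is free on your system):
logger -t mywebapp -p local0.info "session=abcd request=42 action=checkout"
with a matching line in syslog.conf so the entries land in their own file:
local0.*    /var/log/mywebapp.log
logrotate (or newsyslog) can then rotate that file on whatever schedule you configure, and since your app writes via syslog it never holds the file open across a rotation.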
Answer to part 2: You're not forced to look at just one log file at a time. less ${SERVICE}* will normally open all the relevant log files together: when you get to the bottom of a page, :n will move you to the next file and :p back.
Alternatively, use a log analyser program. Steve Kemp's post on promptly finding needles in syslog haystacks covers, together with its comments, a lot of ground.

What information should I be logging in my web app?

I'm finishing up a web application and I'm trying to implement some logging. I've never seen any good examples of what to log. Is it just exceptions? Are there other things I should be logging? What type of information do you find useful for finding and fixing bugs?
Looking for some specific guidance and best practices.
Thanks
Follow up
If I'm logging exceptions what information specifically should I be logging? Should I be doing something more than _log.Error(ex.Message, ex); ?
Here is my logical breakdown of what can be logged within an application, why you might want to, and how you might go about doing it. No matter what, I would recommend using a logging framework such as log4net when implementing it.
Exception Logging
When everything else has failed, this should not. It is a good idea to have a central means of capturing all unhandled exceptions. This shouldn't be much harder than wrapping your entire application in a giant try/catch, unless you are using more than one thread. The work doesn't end there, though, because if you wait until the exception reaches you, a lot of useful information will have gone out of scope. At the very least you should try to collect specific pieces of the application state that are likely to help with debugging as the stack unwinds. Your application should always be prepared to produce this type of log output, especially in production. Make sure to take a look at ELMAH if you haven't already. I haven't tried it myself, but I have heard great things.
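To illustrate the "central means" idea, here is a minimal sketch in Java (the question is .NET, where you would hook AppDomain.UnhandledException instead; slf4j is assumed):

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class GlobalExceptionHandler {
    private static final Logger log = LoggerFactory.getLogger(GlobalExceptionHandler.class);

    public static void install() {
        // Last line of defense: anything no other catch block handled
        // ends up here before the thread dies.
        Thread.setDefaultUncaughtExceptionHandler((thread, ex) ->
                log.error("Unhandled exception on thread {}", thread.getName(), ex));
    }
}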
Application Logging
What I call application logs includes any log that captures information about what your application is doing on a conceptual level, such as "Deleted Order" or "A User Signed On". This kind of information can be useful for analyzing trends, auditing the system, locking it down, testing, security, and of course detecting bugs. It is probably a good idea to plan on leaving these logs on in production as well, perhaps at variable levels of granularity.
Trace Logging
Trace logging, to me, represents the most granular form of logging. At this level you focus less on what the application is doing and more on how it is doing it. This is one step above actually walking through the code line by line. It is probably most helpful in dealing with concurrency issues, or anything, for that matter, that is hard to reproduce. You wouldn't want to have this running all the time; turn it on only when needed.
Lastly, as with so many other things that usually only get addressed at the very end, the best time to think about logging is at the beginning of a project so that the application can be designed with it in mind. Great question though!
Some things to log:
business actions, such as adding/deleting items. Talk to your app's business owner to come up with a list of things that are useful. These should make sense to the business, not to you (for example: when a user submits a report, when a user creates a new process, etc.)
exceptions
exceptions
exceptions
Some things to NOT to log:
do not log information simply to track user usage. Use an analytics tool for that (which tracks the client in JavaScript, not on the server)
do not log passwords or hashes of passwords (huge security issue)
Maybe you should log page/resource accesses that are not yet defined in your application but are requested by clients. That way, you may be able to find vulnerabilities.
It depends on the application and its audience. If you are managing sales or trading stocks, you should probably log more info than, say, a personal blog. You need the log most when an error is happening in your production environment that you can't reproduce locally. Having log levels and a logger hierarchy helps in such situations, because you can dynamically increase the log level. See log4j's documentation, and log4net.
My few cents.
Besides using log severity and exceptions properly, consider structuring your log statements so that you can easily look through the log data in the future, e.g. extracting meaningful info quickly or running queries. There is no problem generating an ocean of log data; the problem is converting that data into information, so structuring and defining it beforehand helps later use. If you use log4j, I would also suggest using the mapped diagnostic context (MDC); it helps a lot with tracking session contexts. Aside from trace and info, I would also use the debug level, where I usually keep temporary items that can be filtered out or disabled when not needed.
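A minimal sketch of the MDC idea with slf4j over log4j (the keys sessionId/requestId are just examples; render them with %X{sessionId} in your pattern layout):

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.slf4j.MDC;

public class RequestLogger {
    private static final Logger log = LoggerFactory.getLogger(RequestLogger.class);

    public void handle(String sessionId, String requestId) {
        // Everything logged on this thread now carries the session context.
        MDC.put("sessionId", sessionId);
        MDC.put("requestId", requestId);
        try {
            log.info("Deleted order");             // application-level event
            log.debug("order lookup took 12 ms");  // temporary debug detail
        } finally {
            MDC.clear();  // don't leak context to pooled threads
        }
    }
}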
You probably shouldn't be thinking about this only at this late stage; rather, logging is worth considering at every stage of development, to help defuse potential bugs before they arise. Depending on your program, I would try to capture as much information as possible. Log everything. You can always stop logging certain components or processes if you don't reference that data enough. There is no such thing as too much information.
From my (limited) experience, if you don't want to make a specific error table for each possible error type, construct a generic database table that accepts general information as well as a string that you can populate with exception data, confirmation messages during successful yet important processes, etc. I've used a generic function with parameters for this.
You should also consider the ability to turn logging off if necessary.
Hope this helps.
I believe that when you log an exception you should also save the current date and time, the requested URL, the URL referrer, and the user's IP address.
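In a Java servlet app, for instance, those fields could be captured once per request in a filter and attached to every log statement via the MDC (a sketch; assumes Servlet 4.0+, where Filter.init/destroy have default implementations, and slf4j):

import java.io.IOException;
import javax.servlet.Filter;
import javax.servlet.FilterChain;
import javax.servlet.ServletException;
import javax.servlet.ServletRequest;
import javax.servlet.ServletResponse;
import javax.servlet.http.HttpServletRequest;
import org.slf4j.MDC;

public class RequestContextFilter implements Filter {
    @Override
    public void doFilter(ServletRequest req, ServletResponse res, FilterChain chain)
            throws IOException, ServletException {
        HttpServletRequest http = (HttpServletRequest) req;
        // Date and time come from the logging framework itself (%d in the pattern).
        MDC.put("url", http.getRequestURL().toString());
        MDC.put("referer", String.valueOf(http.getHeader("Referer")));
        MDC.put("clientIp", http.getRemoteAddr());
        try {
            chain.doFilter(req, res);
        } finally {
            MDC.clear();
        }
    }
}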
