Use log4j 2 for writing to data files or database tables

I used log4j (v. 1) in the past and was glad to know that a major refactoring was done to the project, resulting in log4j 2, which solves the issues that plagued version 1.
I was wondering if I could use log4j 2 to write to data files, not only log files.
The application I will soon be developing will need to be able to receive many events from different sources and write them very fast, either to a data file or to a database (I haven't decided which yet).
The thread that receives the events must not be blocked by I/O while attempting to write events, so log4j2's Asynchronous Loggers, based on the LMAX Disruptor library, will definitely fit this scenario.
Moreover, my application must be able to recover from either a 'not enough space on disk' or an 'unable to reach database' condition, when writing to a data file or to a database table, respectively. In other words, when the application runs out of disk space or the database is temporarily unavailable, my application needs to store events in memory, wait for storage to become available and, when it does, write all waiting events to disk or to the database.
Do you think I can do this with log4j?
Many thanks for your help.
Regards,
Nuno Guerreiro

Yes.
I'm aware of at least one production implementation in a similar scenario, in which gathered events are written to disk at high throughput.
Write to a volume other than your system volume to minimize the chances of system crashes due to disk space overrun.
Upfront capacity planning can help ensure a hardware configuration with adequate resources to handle the projected average load and bursts for a reasonable period of time.
Do not let the system run out of disk space :). Keep track of disk usage, and proactively drop older data in extreme circumstances.
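For illustration, here is a minimal sketch of the kind of configuration this implies (the file paths, roll-over size and layout pattern are assumptions, not taken from the posts above). With the JVM started with -DLog4jContextSelector=org.apache.logging.log4j.core.async.AsyncLoggerContextSelector, all loggers become asynchronous: the receiving thread only publishes events to the Disruptor ring buffer, while a background thread performs the file I/O.

<?xml version="1.0" encoding="UTF-8"?>
<!-- log4j2.xml sketch: event records written through an asynchronous logger.
     Paths and sizes are illustrative, not from the original question. -->
<Configuration status="warn">
  <Appenders>
    <RollingFile name="events" fileName="/data/events/events.dat"
                 filePattern="/data/events/events-%d{yyyy-MM-dd}-%i.dat">
      <!-- %m%n writes only the event payload, so the output is closer to a
           data file than to a classic log file -->
      <PatternLayout pattern="%m%n"/>
      <Policies>
        <SizeBasedTriggeringPolicy size="100 MB"/>
      </Policies>
    </RollingFile>
  </Appenders>
  <Loggers>
    <Root level="info">
      <AppenderRef ref="events"/>
    </Root>
  </Loggers>
</Configuration>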

Related

Are there memory limitations when outputting to a ScrolledText widget?

I am fairly new to Python and to GUI programming, and have been learning the Tkinter package to further my development.
I have written a simple data logger that sends a command to a device via a serial or TCP connection, and then reads the response back, displaying it in a ScrolledText widget. In addition, I have a button that allows me to save the contents of the ScrolledText widget into a text file.
I was testing my software by sending a looped command, with a 0.5 second delay between commands. The aim was to test the durability of the logger so it may later be deployed to automatically monitor and log the output of the devices it is connected to.
After 30-40 minutes, I find that the program crashes on my Windows 7 system, and I suspect that it may be caused by a memory issue. The crash is a rather nondescript "pythonw.exe has stopped working" message. When I monitor the process using Windows Task Manager, the memory used by pythonw.exe increases each time a response is read, and will eventually reach nearly 2 GB.
It may be that I need to rethink my logic and have the software log to the disk in 'real time', while the ScrolledText box overwrites the oldest data after x-number of lines... However, for my own education, I was wondering if there was a better way to manage the memory used by ScrolledText?
Thanks in advance!
In general, no, there are no memory limitations when writing to a scrolled text widget. Internally, the text is stored in an efficient B-tree (efficient unless all the data is a single line, since the B-tree leaves are lines). There might be a limit of some sort, but it would likely be in the millions of lines or so.
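If you do decide to cap the widget as you describe, trimming the oldest lines is cheap. A minimal sketch, assuming a cap of 1000 lines (the cap and the helper name are illustrative, not from the original post):

import tkinter as tk
from tkinter import scrolledtext

MAX_LINES = 1000  # illustrative cap

def append_line(widget, line):
    # Append one line, then trim the oldest lines once the cap is exceeded.
    widget.insert(tk.END, line + "\n")
    # The Text widget always keeps a final empty line, so subtract one to get
    # the number of lines actually stored.
    stored = int(widget.index("end-1c").split(".")[0]) - 1
    if stored > MAX_LINES:
        widget.delete("1.0", f"{stored - MAX_LINES + 1}.0")
    widget.see(tk.END)  # keep the newest output visible

root = tk.Tk()
log_view = scrolledtext.ScrolledText(root, width=80, height=24)
log_view.pack(fill=tk.BOTH, expand=True)

for i in range(5000):
    append_line(log_view, f"response {i}")

root.mainloop()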

Constant memory dumps

Is it possible to constantly dump the memory of a process to record every change that is happening? For example if I have a program that modifies the contents of an array I'd like to know the contents of that array before some modification. I imagine a program could save the initial memory and then all changes in a file and I'd just search the file by the modified contents of the array which I know. Then I'd look for changes in that specific memory location before that moment and find the initial contents.
Does a program like that exist? If so, what program would you recommend?
EDIT: I wrote a program in C++ that captures packets of another process using pcap and I would like to know how these packets are constructed inside that program. I'm using Windows.
Notice that memory contents may change a lot faster than a disk is capable of writing.
Also, your question is OS specific. I guess that you are using Linux.
In all cases, design your application with these goals in mind very early on.
Perhaps you are looking for application checkpointing. If on Linux, consider BLCR.
Perhaps you are looking for some persistence mechanism. A possible way might be to explicitly persist the state of your application at certain points in your program that are executed frequently. Persistence of the call stack or of continuations is a difficult issue.
You may want to use textual formats (like JSON) for serialization. You could be interested in database technology, either relational/SQL (e.g. SQLite or PostgreSQL) or NoSQL (e.g. MongoDB).
Persistence and checkpointing may be related to garbage collection algorithms (notably copying GC).
Some language implementations are able to persist their entire heap. For example, in Common Lisp, the SBCL implementation offers save-lisp-and-die.
For debugging, you might want watchpoints, or the gcore(1) command.
Notice that if you fork(2) your process and immediately make the child process sleep or idle, you are keeping a snapshot of your address space in that child process.
Read also about transactional memory and ACID properties.
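As a concrete sketch of the fork(2) idea above (POSIX/Linux only, so it does not apply to the Windows case in the edit; the variable names are illustrative):

/* Keep a point-in-time snapshot of the parent's address space by forking and
 * letting the child sit idle until it is inspected (e.g. with gdb or gcore). */
#include <signal.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

int main(void) {
    int important_state = 42;          /* state we want a snapshot of */

    pid_t snapshot = fork();
    if (snapshot == 0) {
        /* Child: a copy-on-write duplicate of the parent's memory at this instant. */
        pause();                       /* sleep until a signal arrives */
        _exit(0);
    }

    important_state = 99;              /* the parent keeps mutating its own copy */
    printf("parent=%d snapshot=%d state=%d\n",
           (int)getpid(), (int)snapshot, important_state);

    kill(snapshot, SIGTERM);           /* discard the snapshot when done */
    return 0;
}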

Reading/Writing to/from iPhone's Documents folder performance

I have an iPhone application where I archive permanent data (arrays, dictionaries) in the application's Documents folder. I read/write from and to the Documents folder quite frequently, and I would like to know whether this is considered a bad habit. Wouldn't it be better to have a singleton class, read and write to arrays there, and only when the application quits write this data to the Documents folder? I do not see/feel any performance issues right now on my iPhone 5, but I wanted to know whether this is a bad practice.
Flash memory has a limited number of write cycles - a long time ago it was rated in the thousands. Not sure where it is today.
That said, if your app is using the standard file system APIs, then the system is using the file cache, and you might open a file, read it, then change it many times without the file system ever writing to flash. The system may sync to flash occasionally, but that process is opaque - there is no way to really know when or why iOS does it.
The UNIX APIs allow you to sync the file system cache to the storage system (on iOS, this is flash), but if you are not using them then you are probably not doing much I/O at all, given what you say above.
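For completeness, a hedged sketch of what an explicit sync looks like with those APIs (F_FULLFSYNC is Darwin-specific; the function name and error handling are illustrative):

#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Write a buffer to a file and force it out of the OS cache. On Darwin,
 * F_FULLFSYNC also asks the flash controller to commit its own cache,
 * whereas plain fsync() only flushes the kernel's file cache. */
int save_blob(const char *path, const void *bytes, size_t len) {
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;

    ssize_t written = write(fd, bytes, len);

    if (fcntl(fd, F_FULLFSYNC) < 0)   /* fall back if F_FULLFSYNC is unsupported */
        fsync(fd);

    close(fd);
    return written == (ssize_t)len ? 0 : -1;
}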
Given that Apple does not discourage developers from writing to the file system, I for sure would not worry about this.
But, if you said you were writing gigabytes of image data every few minutes - well - that might be a problem.

Data Retrieval Throughput - ETS lookup vs inter-process Messaging

Suppose we have an Erlang application which involves thousands of processes. Suppose there is a single resource X, which may be a tuple, a list, or any Erlang term, which all these processes may need to read from or pick something out of at any moment in time.
An example of such an occurrence is, say, an API system in which client processes need to read and write on a remote machine. And it happens that you do not want a new connection to be created for each read/write request. So what you do is create a pool of connections and consider them a pool of open pipes/sockets/channels.
Now, this pool of resources is to be shared by thousands of processes such that for each read or write demand, you want that process to retrieve any available open channel/resource.
The question is: what if I have a process (a single process) hold this information, whether in its process dictionary or in its receive loop? It would mean that all the processes would have to send a message to this process whenever they need a free resource. This single process would have a huge mailbox at any time because of the high demand for this single resource. Or I could use an ETS table and have only one row, say #resources{key=pool, value=List_of_openSockets_or_channels}. But this would mean that all our processes would attempt to read the same row from the ETS table at (with high probability) the same instant.
How would the ETS table cope if 10,000 processes attempted to read the same row/record at the same time, or almost the same time? And if I use a process and its mailbox instead, what happens when 10,000 processes send it a message at the same time for the same resource (and it has to reply to each requester)? Remember that this may occur very frequently. Which option (disregarding availability issues such as the process going down, and so on) would provide higher throughput, so that processes get what they need faster? Is there any other, better way of handling high-demand data structures in the Erlang VM that provides very fast access for millions of processes, even if they all need the resource at the same time?
Short answer: profile. Try different approaches and verify how your system behaves.
Firstly, I would look at ETS' {read_concurrency, true} option. From the documentation:
{read_concurrency,boolean()} Performance tuning. Default is false.
When set to true, the table is optimized for concurrent read
operations. When this option is enabled on a runtime system with SMP
support, read operations become much cheaper; especially on systems
with multiple physical processors. However, switching between read and
write operations becomes more expensive. You typically want to enable
this option when concurrent read operations are much more frequent
than write operations, or when concurrent reads and writes comes in
large read and write bursts (i.e., lots of reads not interrupted by
writes, and lots of writes not interrupted by reads). You typically do
not want to enable this option when the common access pattern is a few
read operations interleaved with a few write operations repeatedly. In
this case you will get a performance degradation by enabling this
option. The read_concurrency option can be combined with the
write_concurrency option. You typically want to combine these when
large concurrent read bursts and large concurrent write bursts are
common.
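For concreteness, a minimal sketch of such a table (module, table and key names are illustrative assumptions, not from the question):

-module(conn_pool).
-export([init/1, checkout/0]).

%% Create a public named ETS table optimized for the read-mostly access
%% pattern described above, holding the list of open sockets under one key.
init(OpenSockets) ->
    conn_pool = ets:new(conn_pool, [set, public, named_table,
                                    {read_concurrency, true}]),
    true = ets:insert(conn_pool, {pool, OpenSockets}),
    ok.

%% Called concurrently by thousands of client processes; an ETS lookup does
%% not go through any single process's mailbox.
checkout() ->
    [{pool, Sockets}] = ets:lookup(conn_pool, pool),
    lists:nth(rand:uniform(length(Sockets)), Sockets).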
Secondly, I would look at caching possibilities. Are the processes reading that information only once or multiple times? If they're accessing it multiple times, you could read it once and store it in your process state.
Thirdly, you could try to replicate and distribute that piece of information across your system. Divide et impera.
If you use the process approach, in order to avoid having all the read requests serialized on the message queue of the 'server' process you must replicate.
Using an ETS table with read_concurrency feels more natural and it is something that I used when developing the parallel version of Dialyzer. However, ETS access was never a bottleneck in that case.

OutOfMemoryException Processing Large File

We are loading a large flat file into BizTalk Server 2006 (Original release, not R2) - about 125 MB. We run a map against it and then take each row and make a call out to a stored procedure.
We receive an OutOfMemoryException during orchestration processing; the Windows service restarts, uses the full 2 GB of memory, and crashes again.
The server is 32-bit and set to use the /3GB switch.
Also I've separated the flow into 3 hosts - one for receive, the other for orchestration, and the third for sends.
Anyone have any suggestions for getting this file to process without error?
Thanks,
Krip
If this is a flat file being sent through a map, you are converting it to XML, right? The increase in size could be huge. XML can easily add a factor of 5-10 times over a flat file, especially if you use descriptive or long XML tag names (which normally you would).
Something simple you could try is to rename the XML nodes to shorter names; depending on the number of records (it sounds like a lot), it might actually have a pretty significant impact on your memory footprint.
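Purely to illustrate that inflation (the record and element names below are made up): a fixed-width row such as "000123 987654 0010" is under 20 bytes, but mapped to XML it can easily become roughly ten times the size:

<CustomerOrderRecord>
  <CustomerAccountNumber>000123</CustomerAccountNumber>
  <OrderReferenceNumber>987654</OrderReferenceNumber>
  <OrderLineQuantity>0010</OrderLineQuantity>
</CustomerOrderRecord>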
Perhaps a more enterprise approach would be to subdivide this, in a custom pipeline, into separate message packets that can be fed through the system in more manageable chunks (similar to what Chris suggests). Then the system's throttling and memory metrics could take over. Without knowing more about your data it is hard to say how best to do this, but with a 125 MB file I am guessing that you probably have a ton of repeating rows that do not need to be processed sequentially.
Where does it crash? Does it make it past the Transform shape? Another suggestion to try is to run the transform in the receive port. For more efficient processing, you could even debatch the message and have multiple simultaneous orchestration instances calling the stored procs. This would definitely reduce the memory profile and increase performance.
