Are there memory limitations when outputting to a ScrolledText widget?

I am fairly new to Python and to GUI programming, and have been learning the Tkinter package to further my development.
I have written a simple data logger that sends a command to a device via a serial or TCP connection, and then reads the response back, displaying it in a ScrolledText widget. In addition, I have a button that allows me to save the contents of the ScrolledText widget into a text file.
I was testing my software by sending a looped command, with a 0.5 second delay between commands. The aim was to test the durability of the logger so it may later be deployed to automatically monitor and log the output of the devices it is connected to.
After 30-40 minutes, I find that the program crashes on my Windows 7 system, and I suspect that it may be caused by a memory issue. The crash is a rather nondescript "pythonw.exe has stopped working" message. When I monitor the process using Windows Task Manager, the memory used by pythonw.exe increases each time a response is read, and will eventually reach nearly 2 GB.
It may be that I need to rethink my logic and have the software log to the disk in 'real time', while the ScrolledText box overwrites the oldest data after x-number of lines... However, for my own education, I was wondering if there was a better way to manage the memory used by ScrolledText?
Thanks in advance!

In general, no, there are no memory limitations with writing to a scrolled text widget. Internally, the text is stored in an efficient B-tree (efficient unless all the data is a single line, since the B-tree leaves are lines). There might be a limit of some sort, but it would likely be in the millions of lines or so.
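
If the widget's growth is still a concern, the approach floated in the question (log to disk as data arrives and keep only the newest lines on screen) could look roughly like the sketch below. The names `append_and_trim`, `max_lines`, `log_file`, and `demo` are made up for illustration, and the 500-line cap is an arbitrary assumption; only `insert`, `index`, `delete`, `see`, and `after` are standard Tkinter calls.

```python
def append_and_trim(widget, text, max_lines=1000, log_file=None):
    """Append text to a Text-like widget and keep at most max_lines lines.

    widget is assumed to offer the Tkinter Text methods insert(), index()
    and delete(); log_file is any object with write() and flush().
    """
    if log_file is not None:
        # Write to disk first so nothing is lost if the GUI dies later.
        log_file.write(text)
        log_file.flush()

    widget.insert("end", text)

    # "end-1c" is the last character; its index has the form "LINE.COLUMN".
    last_line = int(str(widget.index("end-1c")).split(".")[0])
    excess = last_line - max_lines
    if excess > 0:
        # Delete whole lines from the top: from line 1 up to line excess+1.
        widget.delete("1.0", f"{excess + 1}.0")


def demo():
    """Hypothetical usage: append a fake response every 0.5 s."""
    import tkinter as tk
    from tkinter.scrolledtext import ScrolledText

    root = tk.Tk()
    box = ScrolledText(root)
    box.pack(fill="both", expand=True)
    log_file = open("logger_output.txt", "a")

    def poll(count=0):
        append_and_trim(box, f"response {count}\n", max_lines=500,
                        log_file=log_file)
        box.see("end")  # keep the newest line visible
        root.after(500, poll, count + 1)

    poll()
    root.mainloop()
    log_file.close()
```

Calling `demo()` opens a window that never holds more than about 500 lines, while the file keeps the full history. This does not explain the crash itself; it only bounds what the widget has to store.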

Related

Use log4j 2 for writing to data files or database table

I used log4j (v. 1) in the past and was glad to know that a major refactoring was done to the project, resulting in log4j 2, which solves the issues that plagued version 1.
I was wondering if I could use log4j 2 to write to data files, not only log files.
The application I will be soon developing will need to be able to receive many events from different sources and write them very fast either to a data file or to a database (I haven't decided which yet).
The thread that receives the events must not be blocked by I/O while attempting to write events, so log4j2's Asynchronous Loggers, based on the LMAX Disruptor library, will definitely fit this scenario.
Moreover, my application must be able to recover either from a 'not enough space on disk' or 'unable to reach database' conditions, when writing to a data file or to a database table, respectively. In other words, when the application runs out of disk space or the database is temporarily unavailable, my application needs to store events in memory and wait for storage to become available and when it does, write all waiting events to disk or database.
Do you think I can do this with log4j?
Many thanks for your help.
Regards,
Nuno Guerreiro
Yes.
I'm aware of at least one production implementation in a similar scenario, in which gathered events are written to disk at high throughput.
Write to a volume other than your system volume to minimize the chances of system crashes due to disk space overrun.
Upfront capacity planning can help ensure a hardware configuration with adequate resources to handle the projected average load and bursts for a reasonable period of time.
Do not let the system run out of disk space :). Keep track of disk usage, and proactively drop older data in extreme circumstances.
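
As a rough illustration of the "keep track of disk usage" advice, here is a minimal sketch in Python rather than Java or log4j. It uses the standard library call `shutil.disk_usage`; the function names and the 90% threshold are assumptions made up for this example, and how older data is actually dropped is left to the application.

```python
import shutil


def disk_usage_fraction(path="."):
    """Return the used fraction (0.0 to 1.0) of the volume holding path."""
    usage = shutil.disk_usage(path)
    if usage.total == 0:
        # Some virtual file systems report no capacity at all.
        return 0.0
    return usage.used / usage.total


def should_drop_old_data(path=".", threshold=0.90):
    """True once the volume is at least `threshold` full (assumed policy)."""
    return disk_usage_fraction(path) >= threshold
```

A writer thread could call `should_drop_old_data` on the data volume before each batch and start purging the oldest files once it returns true.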

How to profile a dart app?

I'm trying the demo of start, which is a pretty simple web site built on dart.
When I run it, the initial memory usage is 10M, but when I visit the home page and refresh it again and again, the memory grows fast until it reaches 78M, and it never drops back.
I want to find out what uses the memory and whether there is a memory leak, but I don't know how to do it. Is there any tool that can help me profile a Dart app?
It has already been pointed out in the comments that there are ways to get a CPU profile from the VM on Linux (https://code.google.com/p/dart/wiki/Profiling).
As far as I understand what you are really looking for is to get a heap or memory profile. While it is possible to print an object histogram when the program terminates (see below), we do not have any convenient way to get the object histogram while your server is running. We do hope to be able to add this capability over the next months.
To print the object histogram when the Dart script exits, you should pass the flag
--print_object_histogram to the Dart VM. This will print the averages of the live objects at the end of each major GC over the life of the program. This can be fine to get a quick overview, but is not ideal to track down and identify real problems.

DSP on Beaglebone

I have a Beaglebone running Ubuntu. We want to continuously sample from 3 on-board ATD converters at 100KS/s, and every window of samples we will run a cross correlation DSP algorithm. Once we find a correlation value above a threshold, we will send the value to a PC.
My concern is the process scheduling in Ubuntu. If our process gets swapped out and an ATD sample becomes available during this time, the process will miss the sample. We need to ensure that our process will capture every sample and save it in memory.
With this being said, is there a way to trigger interrupts on the Beaglebone so that if an ATD sample is ready, the sample will be saved in the memory of our program even if the program does not have the processor at the time?
Thanks!
You might be able to trigger the EDMA or use the PRUSS. Probably best to ask on beagleboard@googlegroups.com. There isn't a DSP per se on the BeagleBone.
This is not exactly an answer to your question, but hopefully it explains how the process works. Since you didn't mention what hardware you are running for AD conversion, maybe this is the best that can be done:
With audio hardware, which faces the same problem, the solution comes from the hardware and the drivers working together: whenever the hardware has filled up enough of the buffer it signals the driver (via an interrupt or some similar mechanism). In some cases, it's also possible that the driver polls the hardware or something like that, but that's a less efficient solution, and I'm not sure anyone does it that way anymore (maybe on cheaper hardware?). From there, the driver process may call right into the end-user process, or it may simply mark the relevant end-user process as "runnable". Either way, control needs to be transferred to the end user process.
For that to happen, the end-user process must be running at a higher priority than anything else occupying the CPUs at that moment. To ensure that your process is always first in the queue, you can run it at a high priority; with the appropriate permissions, you can even run it at very high priorities.
The time it takes for the top priority process to go from runnable to running is sometimes called the "latency" of the OS, though I am sure there's a more specific technical term. The latency of Linux is on the order of 1 ms, but since it's not a "hard" real-time OS, this is not a guarantee. If this is too long to handle your chunks of data, you may have to buffer some of it in your driver.
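
One way to request such a priority on Linux is the real-time `SCHED_FIFO` policy; that this is what "very high priorities" refers to is an assumption on my part. The sketch below uses the standard `os.sched_setscheduler` call, which normally needs root or the `CAP_SYS_NICE` capability. The function name and the default priority of 10 are made up for illustration.

```python
import os


def try_realtime_priority(priority=10):
    """Try to move the current process to SCHED_FIFO at the given priority.

    Returns True on success and False if the platform lacks the call or
    the process is not allowed to change its scheduling policy.
    """
    if not hasattr(os, "sched_setscheduler"):
        return False  # not available on this platform
    try:
        # pid 0 means "the calling process".
        os.sched_setscheduler(0, os.SCHED_FIFO, os.sched_param(priority))
    except OSError:
        # Typically PermissionError when run without the needed privileges.
        return False
    return True
```

Even when this succeeds, it only shortens the wait before the process runs; as noted above, stock Linux gives no hard guarantee.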

Detailed multitasking monitoring

I'm trying to put together a model of a computer and run some simulations on it (part of a school assignment). It's a very simple model - a CPU, a disk and a process generator that generates user processes that take turns using the CPU and accessing the disk. (I've decided to omit the various system processes because, according to the Microsoft Process Explorer tool running on Windows 7, they use next to no CPU time.) And this is where I'm stuck.
I have no idea how to get relevant data on how often various processes read from or write to disk, how much data they transfer at once, and how much time they spend using the CPU. Let's say I want to get some statistics for some typical operations on a PC - playing music/movies, browsing the internet, playing games, working with Office, video editing and so on... is there even a way to gather such data?
I'm simulating preemptive multitasking using RR with a time quantum of 15ms for switching processes, and this is how it looks:
->Process gets to CPU
->Process does its work in 0-15ms, gives up the CPU or is cut off
And now, two options arise:
a) process just sits and waits before it gets the CPU again or before it gets some user input if there is nothing to do
b) process requested data from disk, and does not rejoin the queue until said data is available
And I would like the decision between a) and b) in the model to be made based on a probability, for example 90% for a) and 10% for b). But I do not know how to get those percentages to be at least a bit realistic for a certain type of process. Also, how much data can and does a process typically access at once?
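
The model described above can be sketched as a small simulation. Everything except the 15 ms quantum and the 90%/10% split is an assumption made for illustration: the function and parameter names, the uniformly random burst length, and the fixed 30 ms disk service time.

```python
import random
from collections import deque


def simulate(num_processes=3, quantum_ms=15, p_disk=0.10,
             disk_ms=30, total_ms=10_000, seed=0):
    """Round-robin model: after each burst a process either rejoins the
    ready queue (option a) or blocks on a disk request (option b)."""
    rng = random.Random(seed)
    ready = deque(range(num_processes))
    blocked = []                      # (wake_time_ms, pid) pairs
    cpu_ms = [0] * num_processes
    disk_requests = [0] * num_processes
    clock = 0

    while clock < total_ms:
        # Processes whose disk request has completed rejoin the queue.
        for item in [b for b in blocked if b[0] <= clock]:
            blocked.remove(item)
            ready.append(item[1])

        if not ready:
            # CPU is idle: jump to the next disk completion.
            clock = min(wake for wake, _ in blocked)
            continue

        pid = ready.popleft()
        burst = rng.randint(1, quantum_ms)   # uses part or all of the slot
        clock += burst
        cpu_ms[pid] += burst

        if rng.random() < p_disk:            # option b: wait for the disk
            disk_requests[pid] += 1
            blocked.append((clock + disk_ms, pid))
        else:                                # option a: back to the queue
            ready.append(pid)

    return cpu_ms, disk_requests
```

Calling `simulate(p_disk=0.10)` returns the CPU time and the number of disk requests per process, so different probabilities can be compared against measured figures.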
Any hints, sources, utilities available for this?
I think I found an answer myself, albeit an unreliable one.
The Process Explorer utility for Windows measures disk I/O - by volume and by occurrences. So there's a rough way to get the answer:
Say a process performs 3,000 reads in 30 minutes while using 2% of the CPU during that time (assuming a single-core CPU). The process has then used 36,000 ms of CPU time, divided into ~4,800 blocks (this is the unreliable part - the process in all probability does not use the whole time slot, so I'll just divide by half the time slot, i.e. 7.5 ms). 3000/4800 gives a 62.5% chance of reading data after using the CPU.
I hope I did not misunderstand the "reads" statistic in Process Explorer.
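
The estimate can be written out as a short calculation. The function name and the `slot_use` parameter (the assumed fraction of the 15 ms slot a process actually uses) are made up for illustration. Taking the steps literally, with half a slot per burst, gives about 4,800 bursts and a probability of about 0.625.

```python
def disk_request_probability(reads, minutes, cpu_fraction,
                             quantum_ms=15.0, slot_use=0.5):
    """Estimate the chance that a CPU burst ends with a disk read."""
    cpu_ms = minutes * 60_000 * cpu_fraction     # total CPU time used
    bursts = cpu_ms / (quantum_ms * slot_use)    # assumed number of bursts
    return cpu_ms, bursts, reads / bursts


# 3000 reads in 30 minutes at 2% CPU on a single core
cpu_ms, bursts, probability = disk_request_probability(3000, 30, 0.02)
print(cpu_ms, bursts, probability)
```

The result is very sensitive to `slot_use`: assuming the whole slot is used halves the number of bursts and doubles the probability.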

OutOfMemoryException Processing Large File

We are loading a large flat file into BizTalk Server 2006 (Original release, not R2) - about 125 MB. We run a map against it and then take each row and make a call out to a stored procedure.
We receive the OutOfMemoryException during orchestration processing, the Windows Service restarts, uses full 2 GB memory, and crashes again.
The server is 32-bit and set to use the /3GB switch.
Also I've separated the flow into 3 hosts - one for receive, the other for orchestration, and the third for sends.
Does anyone have any suggestions for getting this file to process without error?
Thanks,
Krip
If this is a flat file being sent through a map, you are converting it to XML, right? The increase in size could be huge. XML can easily add a factor of 5-10 over a flat file, especially if you use descriptive or long XML tag names (which normally you would).
Something simple you could try is to rename the XML nodes to shorter names. Depending on the number of records (it sounds like a lot), this might actually have a pretty significant impact on your memory footprint.
Perhaps a more enterprise approach would be to subdivide this in a custom pipeline into separate message packets that can be fed through the system in more manageable chunks (similar to what Chris suggests). Then the system throttling and memory metrics could take over. Without knowing more about your data it would be hard to say how best to do this, but with a 125 MB file I am guessing that you probably have a ton of repeating rows that do not need to be processed sequentially.
Where does it crash? Does it make it past the Transform shape? Another suggestion to try is to run the transform in the Receive Port. For more efficient processing, you could even debatch the message and have multiple simultaneous orchestration instances call the stored procs. This would definitely reduce the memory profile and increase performance.
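
The debatching idea itself is independent of BizTalk. As a language-neutral illustration, not BizTalk pipeline code, here is a minimal Python sketch that splits a stream of rows into fixed-size chunks so that only one chunk is held in memory at a time. The function name `debatch` and the batch size are assumptions.

```python
from itertools import islice


def debatch(rows, batch_size):
    """Yield successive lists of at most batch_size rows from any iterable."""
    iterator = iter(rows)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch
```

For example, iterating over `debatch(open("input.txt"), 1000)` (file name hypothetical) reads the flat file lazily, 1000 lines per chunk, instead of loading all 125 MB at once.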
