Like CPU simulation
I need to write an application that can simulate high memory usage at pre-set values (e.g., 30%, 50%, 90%) for a certain duration. It will take two inputs (memory value and duration). Let's say I use 50% for memory usage and 2 minutes for duration; when I run the application, it should take 50% of memory for 2 minutes. Any ideas how this can be achieved?
Any help is appreciated.
You can simulate a memory leak like this (taken from this thread):
var list = new List<byte[]>();
while (true)
{
    list.Add(new byte[1024]); // Change the size here.
}
As with the app I wrote for simulating CPU load for a specific amount of time, you just write a method that allocates the desired amount of memory and create a timer; when it fires, clear the list and then invoke the garbage collector.
Watch out: if you allocate too much memory, your system may become unresponsive, and you might crash it.
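Here is a minimal sketch of that idea (MemoryHog and its two command-line arguments are illustrative; converting a percentage into megabytes from the machine's total RAM is left out):
using System;
using System.Collections.Generic;
using System.Threading;

class MemoryHog
{
    static void Main(string[] args)
    {
        int megabytes = int.Parse(args[0]);   // amount to allocate, e.g. 1024
        int seconds   = int.Parse(args[1]);   // how long to hold it, e.g. 120

        var list = new List<byte[]>();
        for (int i = 0; i < megabytes; i++)
        {
            var block = new byte[1024 * 1024];   // 1 MB per allocation
            // Touch each page so the OS actually commits the memory.
            for (int j = 0; j < block.Length; j += 4096)
                block[j] = 1;
            list.Add(block);
        }

        Thread.Sleep(TimeSpan.FromSeconds(seconds));

        list.Clear();   // drop the references...
        GC.Collect();   // ...and ask the GC to reclaim the memory
    }
}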
Here's a very simple demo:
class ViewController: UIViewController {
    override func viewDidLoad() {
        super.viewDidLoad()
        for i in 0..<500000 {
            DispatchQueue.global().async {
                print(i)
            }
        }
    }
}
When I run this demo in the simulator, the memory usage goes up to ~17MB and then drops to ~15MB in the end. However, if I comment out the dispatch code and keep only the print() line, the memory usage is only ~10MB. The size of the increase varies whenever I change the loop count.
Is there a memory leak? I tried Leaks and didn't find anything.
When looking at memory usage, one has to run the cycle several times before concluding that there is a “leak”. It could just be caching.
In this particular case, you might see memory growth after the first iteration, but it will not continue to grow on subsequent iterations. If it was a true leak, the post-peak baseline would continue to creep up. But it does not. Note that the baseline after the second peak is basically the same as after the first peak.
As an aside, the memory characteristics here are a result of the thread explosion (which you should always avoid). Consider:
for i in 0 ..< 500_000 {
    DispatchQueue.global().async {
        print(i)
    }
}
That dispatches half a million work items to a queue whose pool can only support 64 worker threads at a time. That is “thread explosion”: the number of dispatched items far exceeds the worker thread pool.
You should instead do the following, which constrains the degree of concurrency with concurrentPerform:
DispatchQueue.global().async {
    DispatchQueue.concurrentPerform(iterations: 500_000) { i in
        print(i)
    }
}
That achieves the same thing, but limits the degree of concurrency to the number of available CPU cores. That avoids many problems (specifically, it avoids exhausting the limited worker thread pool, which could deadlock other systems), and it avoids the spike in memory, too. (In a memory profile, there is no spike during the concurrentPerform runs.)
So, while the thread-explosion scenario does not actually leak, it should be avoided at all costs because of both the memory spike and the potential deadlock risks.
Memory used is not memory leaked.
There is a certain amount of overhead associated with certain OS services. I remember answering a similar question posed by someone using a WebView. There are global caches. There is simply code that has to be paged in from disk to memory (which is big with WebKit), and once the code is paged in, it's incredibly unlikely to ever be paged out.
I've not looked at the libdispatch source code lately, but GCD maintains one or more pools of threads that it uses to execute the blocks you enqueue. Every one of those threads has a stack. The default thread stack size on macOS is 8MB. I don't know about iOS's default thread stack size, but it for sure has one (and I'd bet one stiff drink that it's 8MB). Once GCD creates those threads, why would it shut them down? Especially when you've shown the OS that you're going to rapidly queue 500K operations?
The OS is optimizing for performance/speed at the expense of memory use. You don't get to control that. It's not a "leak" which is memory that's been allocated but has no live references to it. This memory surely has live references to it, they're just not under your control. If you want more (albeit different) visibility into memory usage, look into the vmmap command. It can (and will) show you things that might surprise you.
I am using a MAX10 FPGA and have interfaced DDR3 memory. I have noticed that the DDR3 memory is slow compared to on-chip memory. I discovered this when I wrote a blinking-LED program: the same delay function runs faster from on-chip memory than from DDR3. What can be done to increase the speed, and what might be wrong? My system clock is running at 50MHz.
P.S. There are no Instruction or Data Caches in my system.
First, as you describe it, your function is not pipelined: you do something with memory and then blink the LED, so everything runs in sequence.
In this case, you should estimate the response time and throughput of your memory. For example, suppose you read a value from memory and then perform an add, and you do this 10 times. If every read waits on the previous add, the total time is about 10 × (memory response time) + 10 × (add time).
The difference is memory response time. Internal RAM can respond in one cycle at 50MHz (20 ns), but a DDR3 access takes roughly 80 ns, i.e. about four clock cycles. So ten dependent reads cost roughly 200 ns from internal RAM but around 800 ns from DDR3, before the add time is even counted.
You can, however, restructure your module into a pipeline: read/write data in parallel with your other logic, and issue DDR reads/writes ahead of time. That works like a cache in a PC and can hide much of the latency.
Also, DDR throughput depends heavily on your access pattern: if you read or write sequentially ordered addresses, you will get much higher throughput.
In the end, an external memory's throughput and response time will never beat internal memory.
I need to calculate the power consumption of the CPU, according to this formula:
Power (mW) = cpu * 1.8 / time
where time is the sum of cpu + lpm.
I need to measure at the start and at the end of a certain process; however, the elapsed time is too short and the CPU doesn't switch to LPM mode, as seen in the following values taken with powertrace_print():
all_cpu all_lpm all_transmit all_listen
116443 1514881 148 1531616
17268 1514881 148 1532440
Calculating the power consumption of the CPU, I got 1.8 mW (which is exactly the active-mode figure for the CPU).
My question is: how do I calculate power consumption in this case?
If the MCU never goes into LPM, then it spends all the time in active mode, so the result of 1.8 mW you get looks correct: since the all_lpm counter does not advance between your samples, the whole measured interval is CPU time, and cpu * 1.8 / cpu = 1.8 mW.
Perhaps you want to ask something different? If you want to measure the time required to execute a specific block of code, you can add RTIMER_NOW() calls at the start and end of the block.
The time resolution of RTIMER_NOW() may be too coarse for short operations. Depending on your platform, you can use a higher-frequency timer for that, e.g. read the TBR register for timing if you're compiling for an msp430-based sensor node.
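For example, a minimal sketch using the rtimer API (the microsecond conversion assumes RTIMER_SECOND ticks per second and an interval short enough that the multiplication doesn't overflow):
#include "sys/rtimer.h"
#include <stdio.h>

void measure_block(void)
{
  rtimer_clock_t start = RTIMER_NOW();

  /* ... the code to measure ... */

  rtimer_clock_t end = RTIMER_NOW();
  unsigned long ticks = (unsigned long)(end - start);
  printf("elapsed: %lu ticks (~%lu us)\n",
         ticks, ticks * 1000000UL / RTIMER_SECOND);
}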
I am implementing a spiking neural network using the CUDA library and am really unsure of how to proceed with regard to the following things:
Allocating memory (cudaMalloc) to many different arrays. Up until now, simply using cudaMalloc 'by hand' has sufficed, as I have not had to make more than 10 or so arrays. However, I now need to make pointers to, and allocate memory for thousands of arrays.
How to decide how much memory to allocate to each of those arrays. The arrays have a height of 3 (1 row for the postsynaptic neuron ids, 1 row for the number of the synapse on the postsynaptic neuron, and 1 row for the efficacy of that synapse), but they have an undetermined length which changes over time with the number of outgoing synapses.
I have heard that dynamic memory allocation in CUDA is very slow and so toyed with the idea of allocating the maximum memory required for each array, however the number of outgoing synapses per neuron varies from 100-10,000 and so I thought this was infeasible, since I have on the order of 1000 neurons.
If anyone could advise me on how to allocate memory for many arrays on the GPU, and/or how to code a fast dynamic memory allocation for the above tasks, I would be more than grateful.
Thanks in advance!
If you really want to do this, you can call cudaMalloc as many times as you want; however, it's probably not a good idea. Instead, try to figure out how to lay out the memory so that neighboring threads in a block will access neighboring elements of RAM whenever possible.
The reason this is likely to be problematic is that threads execute in groups of 32 at a time (a warp). NVidia's memory controller is quite smart, so if neighboring threads ask for neighboring bytes of RAM, it coalesces those loads into a single request that can be efficiently executed. In contrast, if each thread in a warp is accessing a random memory location, the entire warp must wait till 32 memory requests are completed. Furthermore, reads and writes to the card's memory happen a whole cache line at a time, so if the threads don't use all the RAM that was read before it gets evicted from the cache, memory bandwidth is wasted. If you don't optimize for coherent memory access within thread blocks, expect a 10x to 100x slowdown.
(side note: The above discussion is still applicable with post-G80 cards; the first generation of CUDA hardware (G80) was even pickier. It also required aligned memory requests if the programmer wanted the coalescing behavior.)
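To make the layout advice concrete, here is a minimal sketch (the names and sizes are illustrative, not the poster's code): instead of thousands of cudaMalloc calls, pack all synapse data into one flat allocation per row and index it with per-neuron offsets computed on the host.
#include <cuda_runtime.h>
#include <numeric>
#include <vector>

int main()
{
    const int numNeurons = 1000;
    std::vector<int> synapseCount(numNeurons, 100);   // per-neuron lengths

    // An exclusive prefix sum gives each neuron's starting offset.
    std::vector<int> offset(numNeurons + 1, 0);
    std::partial_sum(synapseCount.begin(), synapseCount.end(),
                     offset.begin() + 1);
    const int totalSynapses = offset[numNeurons];

    // One allocation per "row" (post-id, synapse number, efficacy)
    // instead of one allocation per neuron.
    int   *d_postIds, *d_synNums, *d_offset;
    float *d_efficacy;
    cudaMalloc(&d_postIds,  totalSynapses * sizeof(int));
    cudaMalloc(&d_synNums,  totalSynapses * sizeof(int));
    cudaMalloc(&d_efficacy, totalSynapses * sizeof(float));
    cudaMalloc(&d_offset,   (numNeurons + 1) * sizeof(int));
    cudaMemcpy(d_offset, offset.data(),
               (numNeurons + 1) * sizeof(int), cudaMemcpyHostToDevice);

    // A kernel reads neuron n's synapses as
    // d_efficacy[d_offset[n] .. d_offset[n + 1] - 1], so consecutive
    // threads touch consecutive addresses and the loads coalesce.

    cudaFree(d_postIds); cudaFree(d_synNums);
    cudaFree(d_efficacy); cudaFree(d_offset);
    return 0;
}
If synapse counts change over time, you can over-allocate each neuron's slice or rebuild the flat arrays periodically, which is usually still cheaper than per-array device allocations.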
I am using R on some relatively big data and am hitting some memory issues. This is on Linux. I have significantly less data than the available memory on the system so it's an issue of managing transient allocation.
When I run gc(), I get the following listing
              used   (Mb)  gc trigger   (Mb)   max used   (Mb)
Ncells     2147186  114.7     3215540  171.8    2945794  157.4
Vcells   251427223 1918.3   592488509 4520.4  592482377 4520.3
yet R appears to have 4GB allocated in resident memory and 2GB in swap. I'm assuming this is OS-allocated memory that R's memory management system will allocate and GC as needed. However, let's say that I don't want to let R OS-allocate more than 4GB, to prevent swap thrashing. I could always ulimit, but then it would just crash instead of working within the reduced space and GCing more often. Is there a way to specify an arbitrary maximum for the gc trigger and make sure that R never OS-allocates more? Or is there something else I could do to manage memory usage?
In short: no. I found that you simply cannot micromanage memory management and gc().
On the other hand, you could try to keep your data in memory, but 'outside' of R. The bigmemory package makes that fairly easy. Of course, using a 64-bit version of R and ample RAM may make the problem go away too.
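A minimal sketch with bigmemory (the file names are illustrative):
library(bigmemory)

# The matrix lives in a file-backed memory mapping rather than on the
# R heap, so it doesn't count against the memory that gc() manages.
x <- filebacked.big.matrix(nrow = 1e6, ncol = 100, type = "double",
                           backingfile = "x.bin",
                           descriptorfile = "x.desc")
x[1, 1] <- 3.14   # reads and writes go through the mapping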