I have been struggling with parallel and async constructs in F# for the last couple of days and am not sure where to go at this point. I have been programming in F# for about four months (certainly no expert), and I currently have a series of calculations implemented in F# (ASP.NET 4.5) that work correctly when executed sequentially. I am running the calculations on a multi-core server, and since there are millions of inputs to run the same calculation on, I am hoping to take advantage of parallelism to speed it up.
The calculations are extremely data-parallel: basically the exact same calculation on different input data. I have tried a number of different avenues, and I continually run into the same issue: it seems as if the parallel looping never gets to the end of the input data set. I have tried the TPL, ConcurrentQueue, Array.Parallel.map/iter, all with the same result: the program starts out fine and then somewhere in the middle (indeterminate) it just hangs and never completes. For simplicity I removed the calculation from the program and am just calling a print method. Here is where the code currently stands:
let runParallel =
    let ids = query { for c in db.CustTable do select c.id } |> Seq.take 5
    let customerInputArray = getAllObservations ids
    Array.Parallel.iter (fun c -> testParallel c) customerInputArray
    let key = System.Console.ReadKey()
    0
A few points...
I limited the results above to only 5 just for debugging. The actual program does not apply the Seq.take 5.
The testParallel function just calls printfn "test".
The customerInputArray is a complex data type: a tuple of lists that contain records. So I am pretty sure my problem must be there... but I added exception handling and no exception is getting raised, so I have no idea how to go about finding the problem.
Any help is appreciated. Thanks in advance.
EDIT: Thanks for the advice...I think it is definitely deadlock. When I remove all of the printfn, sprintfn, and string concat operations, it completes. (of course, I need those things in there.)
Are printfn, sprintfn, and string operations not thread-safe?
Another EDIT: Iteration always stops on the last item. So if my input array has 15 items, the processing stops on item 14, or seems to never get to item 15, and then everything just hangs. It does not matter what the size of the input array is. Any ideas what can be causing this? I even switched over to Parallel.ForEach (instead of Array.Parallel) and saw the same behavior.
Update on the situation and how I resolved this issue.
I was unable to upload code from my example due to my company's firewall policy, so in the end my question did not have enough details. I failed to mention that I was using a type provider which was important information in this situation. But here is what I figured out.
I am using the F# type provider for SQL Server and was passing around its Service Types which I suspect are not thread-safe. When I replaced the ServiceTypes with plain old F# Records, the code worked fine - no more deadlocks and everything completed without error.
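For anyone who hits the same wall: the essence of the fix is copying everything out of the provider's types into plain, self-contained value records before the parallel section begins, so no worker thread ever touches the provider or its connection. Since I can't post the original F#, here is a minimal sketch of that shape in C++ (CustomerRecord, make_customers, and process_all are all made-up names):

```cpp
#include <atomic>
#include <cstddef>
#include <string>
#include <thread>
#include <vector>

// Plain value-type record: each element is an independent copy with no
// hidden connection/session state, so it is safe to hand to worker threads.
struct CustomerRecord {
    int id;
    std::string name;
};

// Stand-in for materializing rows from a data source into plain records,
// done BEFORE any parallelism starts.
std::vector<CustomerRecord> make_customers(int n) {
    std::vector<CustomerRecord> v;
    for (int i = 0; i < n; ++i)
        v.push_back({i, "cust" + std::to_string(i)});
    return v;
}

// Data-parallel loop over the records; returns how many were processed.
int process_all(const std::vector<CustomerRecord>& customers) {
    std::atomic<int> processed{0};
    auto worker = [&](std::size_t begin, std::size_t end) {
        for (std::size_t i = begin; i < end; ++i) {
            // ... the per-customer calculation would go here ...
            processed.fetch_add(1, std::memory_order_relaxed);
        }
    };
    std::vector<std::thread> pool;
    std::size_t n = customers.size(), chunk = n / 4;
    for (int t = 0; t < 4; ++t)
        pool.emplace_back(worker, t * chunk, t == 3 ? n : (t + 1) * chunk);
    for (auto& th : pool) th.join();
    return processed.load();
}
```

In F#, Array.Parallel.iter does the chunking for you; the only point of the sketch is that the data crossing the thread boundary consists of plain copies, not live provider objects.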
Related
MQL4 code is written in a C-like language, and basically there is no mechanism in C to detect errors before executing the code. Is there special functionality in the MQL4 platform that helps to catch runtime errors before the code executes?
No. You cannot throw an error, and you cannot catch one. So be very careful: check that b != 0 before dividing a by b, check that idx >= 0 and idx < ArraySize(array) before accessing array[idx], and check that CheckPointer(object) == POINTER_DYNAMIC before calling anything on object.
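Since MQL4 is C-like, those guard checks translate almost directly. A sketch in plain C++ (the helper names safe_divide_or and safe_at_or are made up; in actual MQL4 you would use ArraySize() and CheckPointer() as above):

```cpp
#include <cstddef>
#include <vector>

// Guarded division: returns `fallback` instead of triggering a
// divide-by-zero runtime error.
double safe_divide_or(double a, double b, double fallback) {
    if (b == 0.0) return fallback;       // check b != 0 before dividing
    return a / b;
}

// Guarded array access: check 0 <= idx < size before reading arr[idx].
int safe_at_or(const std::vector<int>& arr, int idx, int fallback) {
    if (idx < 0 || static_cast<std::size_t>(idx) >= arr.size())
        return fallback;
    return arr[idx];
}
```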
There isn't a mechanism in MQL to detect errors before executing the code.
So besides the basic checks (array range limits, division by zero, passing parameters with the right ranges (price, sl, tp...) to the trade operations, etc.), the best way to find most errors is to run a backtest of the strategy you have built over several months of M1 data. The Strategy Tester is available in MetaTrader.
During backtesting, MetaTrader feeds your code a lot of data simulating the actual market, so the code passes through many of the situations/routines/functions it will encounter later in live trading.
Backtesting is the best simulation you can do: it tests not only the strategy but the code itself.
This is not going to guarantee 100% error-free code, but in my case it finds >99% of the errors.
Mistakenly, many months ago, I used the following logic to essentially implement the same functionality as the PCollectionView's asList() method:
I assigned a dummy key of “000” to every element in my collection
I then did a groupBy on this dummy key so that I would essentially get a list of all my elements in a single array list
The above logic worked fine when I only had about 200 elements in my collection. When I ran it on 5000 elements, it ran for hours until I finally killed it. Then, I created a custom “CombineFn” in which I essentially put all the elements into my own local hash table. That worked, and even on my 5000 element situation, it ran within about a minute or less. Similarly, I later learned that I could use the asList() method, and that too ran in less than a minute. However, what concerns me – and what I don't understand – is why the group by took so long to run (even with only 200 elements it would take more than a few seconds) and with 5000 it ran for hours without seeming to accomplish anything.
I took a look at the group-by code, and it seems to be doing a lot of steps I don't quite understand... Is it somehow related to the fact that the group-by statement is attempting to run things across a cluster? Or is it maybe related to not using an efficient coder? Or am I missing something else? The reason I ask is that there are certain situations in which I'm forced to use a group-by statement, because the data set is way too large to fit in any single computer's RAM. However, I am concerned that I'm not understanding how to properly use a group-by statement, since it seems to be so slow...
There are a couple of things that could be contributing to the slow performance. First, if you are using SerializableCoder, that is quite slow, and AvroCoder will likely work better for you. Second, if you are iterating through the Iterable in your ParDo after the GBK, if you have enough elements you will exceed the cache and end up fetching the same data many times. If you need to do that, explicitly putting the data in a container will help. Both the CombineFn and asList approaches do this for you.
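The "explicitly putting the data in a container" point is worth making concrete. A sketch in C++ (fetch_group is a made-up stand-in for re-reading the grouped Iterable from the shuffle service once the cache is exceeded):

```cpp
#include <vector>

// Counts how many times the expensive "fetch" actually runs.
int fetch_calls = 0;

// Stand-in for reading a grouped Iterable: each full pass re-fetches
// every element from the backing store.
std::vector<int> fetch_group() {
    ++fetch_calls;
    return {1, 2, 3, 4, 5};
}

// Naive: iterates the "Iterable" afresh on every pass, re-fetching each time.
long sum_naive(int passes) {
    long s = 0;
    for (int p = 0; p < passes; ++p)
        for (int v : fetch_group()) s += v;
    return s;
}

// Cached: materialize once into a local container, then iterate cheaply.
long sum_cached(int passes) {
    std::vector<int> cached = fetch_group();   // one fetch total
    long s = 0;
    for (int p = 0; p < passes; ++p)
        for (int v : cached) s += v;
    return s;
}
```

Both give the same answer; the cached version just pays the fetch cost once instead of once per pass, which is what the CombineFn and asList approaches do for you.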
I've noticed one of my Dataflow jobs has produced output with what I could best describe as too many random bit flips. For example, a year "2014" (as text) was written as "0007" or "2016" or "0052" or other textual values. In some cases the output line format is valid (which suggests something happened during processing), but a few lines seem to be malformed as well (e.g., "20141215-04-25" instead of something like "2014-12-25").
I occasionally re-run the jobs with the same code and different date-range parameters, and for this specific date range the job was completing successfully until about a week ago. I have been trying different machine configurations, though (4-CPU and 1-CPU instances), and the problem seems to happen more with 4-CPU instances.
Does anybody know what could be leading to this?
Thanks,
G
When using 4-cpu instances, Dataflow runs multiple threads in a single Java process. Data corruption could happen if one of the transforms is thread-hostile, that is, not even separate instances of the class can be safely accessed by multiple threads. This typically happens when the class uses a static non-thread-safe member variable.
A thread-safety issue in user code resulted in this type of corruption. These errors are more likely to occur when using multi-core instances for compute.
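To illustrate what thread-hostile means here, a minimal C++ sketch (both Formatter types are hypothetical): the hostile version keeps its working state in a static member, so even two separate instances race with each other, which is exactly how a "2014" can come out mangled.

```cpp
#include <atomic>
#include <string>
#include <thread>
#include <vector>

// Thread-HOSTILE pattern: a static scratch buffer shared by ALL instances,
// so even two separate HostileFormatter objects corrupt each other's
// output when used from different threads.
struct HostileFormatter {
    static std::string scratch;               // one buffer for everyone!
    const std::string& format(int year) {
        scratch = std::to_string(year);       // unsynchronized shared write
        return scratch;
    }
};
std::string HostileFormatter::scratch;

// Thread-safe alternative: no shared mutable state at all.
struct SafeFormatter {
    std::string format(int year) const { return std::to_string(year); }
};

// Runs 4 threads, each formatting `iterations` times with its own
// SafeFormatter, and counts how many results came out intact.
int count_correct(int iterations) {
    std::atomic<int> ok{0};
    std::vector<std::thread> pool;
    for (int t = 0; t < 4; ++t)
        pool.emplace_back([&] {
            SafeFormatter f;                  // per-thread instance
            for (int i = 0; i < iterations; ++i)
                if (f.format(2014) == "2014") ++ok;
        });
    for (auto& th : pool) th.join();
    return ok.load();
}
```

With the hostile version, separate instances per thread would not help, because the state they fight over is static; only removing or synchronizing the shared member fixes it.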
Hi, I am curious whether anyone knows of a tutorial or example where semaphores are used for more than one process/thread. I'm looking to solve the following problem. I have an array of elements and x threads. These threads work over the array, but only 3 at a time. After 5 pieces of work have been completed, the server is signalled and it cleans those 5 nodes. But I'm having problems designing this. (A node contains a worker value holding the 'name' of the thread that is allowed to work on it, respectively nrNodes % nrThreads.)
In order to make changes to the list, a mutex is necessary so threads don't overwrite each other or make false evaluations.
But I have no clue how to limit 3 threads to work on the list at a given moment, nor how to signal the main thread to start a cleaning session. I have been thinking about using a semaphore and a global counter: when the counter reaches 5, the server (which would probably be another thread) is signalled.
Sorry for the lack of code, but this is a conceptual question; what I have written so far doesn't affect the question in any way.
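One way to sketch the design described above: a counting semaphore initialized to 3 limits concurrency, and a condition variable wakes the server thread whenever 5 completions have accumulated. This C++11 sketch hand-rolls the semaphore (C++20 has std::counting_semaphore); run_workers and RunStats are made-up names:

```cpp
#include <atomic>
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

// Minimal counting semaphore built from a mutex and condition variable.
class Semaphore {
    std::mutex m_;
    std::condition_variable cv_;
    int count_;
public:
    explicit Semaphore(int n) : count_(n) {}
    void acquire() {
        std::unique_lock<std::mutex> lk(m_);
        cv_.wait(lk, [this] { return count_ > 0; });
        --count_;
    }
    void release() {
        { std::lock_guard<std::mutex> lk(m_); ++count_; }
        cv_.notify_one();
    }
};

struct RunStats { int max_concurrent; int cleanups; };

// One worker thread per item; at most 3 work at once, and the "server"
// thread is signalled to clean after every 5 completions.
RunStats run_workers(int n_items) {
    Semaphore slots(3);                 // at most 3 workers in the array
    std::mutex m;
    std::condition_variable server_cv;
    int completed = 0;                  // completions not yet cleaned (guarded by m)
    int cleanups = 0;                   // cleaning sessions done (guarded by m)
    bool done = false;                  // guarded by m
    std::atomic<int> active{0}, max_active{0};

    std::thread server([&] {
        std::unique_lock<std::mutex> lk(m);
        for (;;) {
            server_cv.wait(lk, [&] { return completed >= 5 || done; });
            while (completed >= 5) { completed -= 5; ++cleanups; }  // clean 5 nodes
            if (done) break;
        }
    });

    std::vector<std::thread> workers;
    for (int i = 0; i < n_items; ++i)
        workers.emplace_back([&] {
            slots.acquire();                       // blocks if 3 are inside
            int now = ++active;                    // track peak concurrency
            int prev = max_active.load();
            while (now > prev && !max_active.compare_exchange_weak(prev, now)) {}
            // ... work on one node of the array would go here ...
            --active;
            slots.release();
            { std::lock_guard<std::mutex> lk(m); ++completed; }
            server_cv.notify_one();                // maybe 5 are ready now
        });

    for (auto& t : workers) t.join();
    { std::lock_guard<std::mutex> lk(m); done = true; }
    server_cv.notify_one();
    server.join();
    return {max_active.load(), cleanups};
}
```

The semaphore answers the "only 3 at a time" half, and the completed counter plus condition variable answers the "signal the server after 5" half, with the mutex protecting the counter exactly as described in the question.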
I am running into the following issue while profiling an application under VC6. When I profile the application, the profiler is indicating that a simple getter method similar to the following is being called many hundreds of thousands of times:
int SomeClass::getId() const
{
    return m_iId;
}
The problem is, this method is not called anywhere in the test app. When I change the code to the following:
int SomeClass::getId() const
{
    std::cout << "Is this method REALLY being called?" << std::endl;
    return m_iId;
}
The profiler never includes getId in the list of invoked functions. Comment out the cout and I'm right back to where I started, 130+ thousand calls! Just to be sure it wasn't some cached profiler data or corrupted function lookup table, I'm doing a clean and rebuild between each test. Still the same results!
Any ideas?
I'd guess that what's happening is that the compiler and/or the linker is 'coalescing' this very simple function to one or more other functions that are identical (the code generated for return m_iId is likely exactly the same as many other getters that happen to return a member that's at the same offset).
Essentially, a bunch of different functions that happen to have identical machine-code implementations are all resolved to the same address, confusing the profiler.
You may be able to stop this from happening (if this is the problem) by turning off optimizations.
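To make the folding concrete, here is the kind of pair that gets merged (hypothetical classes; the relevant linker switch on MSVC is /OPT:ICF, and GNU gold has --icf=all):

```cpp
// Two unrelated classes whose getters compile to byte-identical machine
// code: "load the int at offset 0 and return it".
struct Customer { int id;    int getId()    const { return id; } };
struct Order    { int count; int getCount() const { return count; } };

// With identical-COMDAT folding, the linker may keep a single copy of that
// code and point both symbols at it, so a profiler attributes
// Order::getCount() samples to Customer::getId(). Adding a cout call to
// one body makes the machine code differ, which un-merges the functions,
// matching the disappearing/reappearing behavior in the question.
```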
I assume you are profiling because you want to find out if there are ways to make the program take less time, right? You're not just profiling because you like to see numbers.
There's a simple, old-fashioned, tried-and-true way to find performance problems. While the program is running, just hit the "pause" button and look at the call stack. Do this several times, like from 5 to 20 times. The bigger a problem is, the fewer samples you need to find it.
Some people ask if this isn't basically what profilers do, and the answer is that only very few do. Most profilers fall for one or more common myths, with the result that your speedup is limited because they don't find all the problems:
Some programs are spending unnecessary time in "hotspots". When that is the case, you will see that the code at the "end" of the stack (where the program counter is) is doing needless work.
Some programs do more I/O than necessary. If so, you will see that they are in the process of doing that I/O.
Large programs are often slow because their call trees are needlessly bushy, and need pruning. If so, you will see the unnecessary function calls mid-stack.
Any code you see on some percentage of stacks will, if removed, save that percentage of execution time (more or less). You can't go wrong. Here's an example, over several iterations, of saving over 97%.