Reusing task objects in fork/join in Java 7 - scalability

I would like to use Java fork join to solve a recursive problem, but I don't want to create a new task instance explicitly for each recursion step. The reason is that too many tasks is equal to too many objects which fills up my memory after a few minutes of processing.
I have the following solution in Java 6, but is there a better implementation for Java 7?
final static AtomicInteger max = new AtomicInteger(10); // max parallel tasks
final static ThreadPoolExecutor executor = new ThreadPoolExecutor(....);
private void submitNewTask() {
if (max.decrementAndGet()>=0) {
executor.execute(new Task(....));
return;
}
run(); // avoid creating a new object
}
public void run() {
..... process ....
// do the recursion by calling submitNewTask()
max.incrementAndGet();
}
I tried something like calling the invoke() function on the same task again (after updating the related fields, of course), but it does not seem to work.

I think you are not using the right approach. Fork/Join framework is intended to execute a long time running algorithm on a (potentially) big set of data into a parallel fashion splitting your data into smaller pieces (the RecursiveTask itself) than can be executed by more threads (speeding up execution on multiple "cpu" machines) using a work-stealing strategy.
A RecursiveTask does not need to replicate all your data, but just to keep indexes on the portion you are working on (to avoid harmful overlapping), so data overhead is kept at minimum (of course, every RecursiveTask consumes memory too).
There's often a thread off between memory occupation and time of execution in algorithm design, so FJ framework is intended to reduce time of execution paying a (i think reasonably little) memory occupation. If time of execution is not your first concern, I think that FJ is useless for your problem.

Related

How to call Sinks.Many<T>.tryEmitNext from multiple threads?

I am wrapping my head around Flux Sinks and cannot understand the higher-level picture. When using Sinks.Many<T> tryEmitNext, the function tells me if there was contention and what should I do in case of failure, (FailFast/Handler).
But is there a simple construct which allows me to safely emit elements from multiple threads. For example, instead of letting the user know that there was contention and I should try again, maybe add elements to a queue(mpmc, mpsc etc), and only notify when the queue is full.
Now I can add a queue myself to alleviate the problem, but it seems a common use case. I guess I am missing a point here.
I hit the same issue, migrating from Processors which support safe emission from multiple threads. I use this custom EmitFailureHandler to do a busy loop as suggested by the EmitFailureHandler docs.
public static EmitFailureHandler etryOnNonSerializedElse(EmitFailureHandler fallback){
return (signalType, emitResult) -> {
if (emitResult == EmitResult.FAIL_NON_SERIALIZED) {
LockSupport.parkNanos(10);
return true;
} else
return fallback.onEmitFailure(signalType, emitResult);
};
}
There are various confusing aspects about the 3.4.0 implementation
There is an implication that unless the Unsafe variant is used, the sink supports serialized emission but actually all the serialized version does is to fail fast in case of concurrent emission.
The Sink provided by Flux.Create does support threadsafe emission.
I hope there will be a solidly engineered alternative to this offered by the library at some point.

How often does garbage collector gets called in Xamarin.Android application?

I have the following class:
public class CurrentOrder
{
//contains current order values which is global to all application
public static List<OrderArticleViewModel> listOfOrderArticles = new List<OrderArticleViewModel>();
public static string orderCustomerName;
public static string orderCustomerId;
public static string orderNumber;
public static string orderDateAndHour;
public static DateTime executionOrderDate = DateTime.Now.AddDays(1);
private CurrentOrder()
{
}
}
I use its fields throughout the whole application as global variables for example like that: CurrentOrder.orderNumber . When I am on certain activity and press the back button I want to clear all class fields values and I am doing it like that:
CurrentOrder.listOfOrderArticles = new List<OrderArticleViewModel>();
CurrentOrder.orderCustomerName = null;
CurrentOrder.orderCustomerId = null;
CurrentOrder.orderNumber = null;
CurrentOrder.orderDateAndHour = null;
CurrentOrder.executionOrderDate = DateTime.Now.AddDays(1);
But as far as I know the value of these fields stays in memory, the only thing is now my variables point to another place. If I click the back button 1000 times I will have 1000 times the fields in the memory nothing referencing them. I've heard that the garbage collector takes care to destroy the values that nothing is pointing at them but how often that occurs? Is it posible to press back button 100 times without the garbage collector cleaning?
There is no fixed time interval between garbage collections. Garbage collector called based upon the size of the remaining allocatable memory. Both c# and java are object-oriented language, so we don't need to allocate and release memory manually like c/c++.
Garbage collector will help developer to release memory. Xamarin.Android is using c# language, so it needs CLR to help process to manage memory(Native Android based on ART and Dalvik).
Here are conditions that when the GC will be called:
Garbage collection occurs when one of the following conditions is true:
1.The system has low physical memory. This is detected by either the low memory notification from the OS or low memory indicated by the host.
2.The memory that is used by allocated objects on the managed heap surpasses an acceptable threshold. This threshold is continuously adjusted as the process runs.
3.The GC.Collect method is called. In almost all cases, you do not have to call this method, because the garbage collector runs continuously. This method is primarily used for unique situations and testing.
And I think memory churn will prove that GC called doesn't have a fixed interval.
About your question:
Is it posible to press back button 100 times without the garbage collector cleaning?
It is based on your Android system environment( Your app is foreground or background? Is there enough memory?). But it will be gc finally.
So, about memory question, I think the memory leak and OOM( mainly due to Bitmap) should be got more attention. And memory churn also should be avoid, because it will effect Android render(UI Performance).

Java 8 Stream in main method

Someone asked me in an interview if we shoul write streaming operation in the main method.
Does it makes any difference?
For example:
class Athlete {
private String name;
private int id;
public Athlete(String name,int id) {
this.name = name;
this.id = id;
}
}
public class Trial {
public static void main(String[] args) {
List<Athlete> list = new ArrayList<>();
list.add(new Athlete("John", 1));
list.add(new Athlete("Jim", 2));
list.add(new Athlete("Jojo", 3));
list.stream().forEach(System.out::print); // or any other any stream operation
}
}
So I am just curious to know if it makes any difference... For now, the only thing I know is that once the stream is consumed, it cannot be consumed again.
So does it affect the memory or create buffer memory in the JVM for streaming?
If yes? Why should this not be used in the main method?
The question whether “we should write streaming operation in the main method” is a loaded question. The first thing it implies, is the assumption that there was something special about the main method. Regardless of which operations we are talking about, if the conclusion is that you may or may not use them in an arbitrary method, there is no reason why you should come to a different conclusion when the method in question is the main method.
Apparently, “should we …” is meant to actually ask whether “should we avoid …”. If that’s the question, then, keeping in mind that there are no special rules for the main method, if there was a reason forbidding the use of the Stream API, that reason also applied to all other methods, making the Stream API an unusable API. Of course, the answer is that there is no reason forbidding the Stream API in the main method.
Regarding memory consumption, when replacing a for-each loop with a Collection.forEach method invocation, you are trading an Iterator instance for a lambda instance, so have no signifcant difference in the number and size of the created object instances. If you use Stream’s forEach method, you add a Spliterator and a Stream instance, which still can be considered insignificant, even when your application consists of the main method only. The memory pre-allocated by a JVM is much larger than the memory consumed by those few objects and your objects are very likely to fit within the threads’s local allocation store. In other words, from the outside of the JVM, there will be no difference in the memory used by the process.
As you mentioned the term “buffer”, the conceptual thing you should know is that a Stream does not buffer elements for most operations (including forEach), so, regardless of whether you traverse a Collection via loop or Stream, in both cases no memory scaling with the Collection’s size is ever allocated, so the difference, if any, remains as small as described above, regardless of whether you iterate over three elements as in your example or over three million elements.
On issue that could create confusion is that you should not use multi-threaded operations in a class initializer, which implies that you should not use a parallel stream in a class initializer. But that’s not forbidding Stream operations per se, further, the main method is not a class initializer; when the main method is invoked, the class has been initialized already.
In interview questions, don't assume that every yes/no question is limited to those two choices. A good answer might be "no difference either way".
In this case, they might have been looking for you to recognize that list.foreach() is more efficient than list.stream().foreach().

Using AsyncController to help increase concurrency on a legacy ASP.NET MVC 3 project

We have a website that is struggling with concurrent users right now.
Here is the very high-level background of the project:
Legacy ASP.NET MVC 3 project (.NET 4)
Can't do any major rewriting of core code
Main entry point that takes the longest time to execute is the SubmitSearch action on the Search controller. Average time to respond is 5-10 seconds.
So as the second point outlines, we don't want to spend too much time on this project rewriting large sections. However, we want to attempt to increase concurrent users. We're not looking to change anything else or increase performance since it would require much more work.
What we are seeing is that as more people hit SubmitSearch, the web site in general slows down. That's most likely due to all the IIS threads being locked up executing the search.
We are looking to implement AsyncController and making the SubmitSearch action execute on a normal CLR thread. Here's how we wanted to implement it:
Assume this is the original SubmitSearch method:
/// <summary>
/// Submits a search for execution.
/// </summary>
/// <param name="searchData">The search data</param>
/// <returns></returns>
public virtual ActionResult SubmitSearch(SearchFormModel searchData)
{
//our search code
}
The quickest way we were hoping to convert to AsyncController is to simply do this:
/// <summary>
/// Submits a search for execution.
/// </summary>
/// <param name="searchData">The search data</param>
/// <returns></returns>
protected virtual ActionResult SubmitSearch(SearchFormModel searchData)
{
//our search code
}
/// <summary>
/// Asynchronous Search entry point
/// </summary>
/// <param name="searchData"></param>
public void SubmitSearchAsync(SearchFormModel searchData)
{
AsyncManager.OutstandingOperations.Increment();
System.Threading.Tasks.Task.Factory.StartNew(() =>
{
ActionResult result = SubmitSearch(searchData);
AsyncManager.Parameters["result"] = result;
AsyncManager.OutstandingOperations.Decrement();
});
return;
}
/// <summary>
/// Called when the asynchronous search has completed
/// </summary>
/// <param name="result"></param>
/// <returns></returns>
public ActionResult SubmitSearchCompleted(ActionResult result)
{
//Just return the action result
return result;
}
Of course this didn't work because all through-out the code, we are referencing HttpContext.Current, which we know ends up being null in this approach.
So we were then hoping to do this with SubmitSearchAsync:
/// <summary>
/// Asynchronous Search entry point
/// </summary>
/// <param name="searchData"></param>
public void SubmitSearchAsync(SearchFormModel searchData)
{
AsyncManager.OutstandingOperations.Increment();
System.Threading.Tasks.Task.Factory.StartNew(() =>
{
ActionResult result = null;
AsyncManager.Sync(() =>
{
result = SubmitSearch(searchData);
});
AsyncManager.Parameters["result"] = result;
AsyncManager.OutstandingOperations.Decrement();
});
return;
}
This fixes the issue.
So here's my concern:
Does wrapping the execution of SubmitSearch in the AsyncManager.Sync method defeat the purpose of using this model? In other words, when we are within the AsyncManager.Sync method, are we back on the IIS threads, which puts us back at square one?
Thanks
Does wrapping the execution of SubmitSearch in the AsyncManager.Sync method defeat the purpose of using this model? In other words, when we are within the AsyncManager.Sync method, are we back on the IIS threads, which puts us back at square one?
More or less, yes. But unfortunately, in your case, using Task.Factory.StartNew also defeats the purpose of using an async controller. With the approach you're trying to use, you can't win.
IIS threads, threads started by ThreadPool.QueueUserWorkItem, and Task threads, are all taken from the same thread pool.
In order to gain any benefit from async controllers, you need true async methods. In other words, methods like Stream.ReadAsync or WebRequest.GetResponseAsync. These specially-named methods use I/O completion ports instead of normal threads, which use hardware interrupts and operate on a different thread pool.
I wrote about this a long time ago in my answer here: Using ThreadPool.QueueUserWorkItem in ASP.NET in a high traffic scenario. Tasks and awaiters are pretty sweet, but they don't change the fundamental dynamics of the .NET thread pool.
One thing to note is that there is an option, TaskCreationOptions.LongRunning, that you can specify when creating a Task, which essentially informs the framework that the task will be doing a lot of waiting, and in theory the TPL will attempt to avoid scheduling it in the thread pool. In practice, this probably won't be very practical on a high-traffic site because:
The framework doesn't actually guarantee that it won't use the thread pool. That's an implementation detail, and the option is simply a hint that you provide.
Even if it does avoid the pool, it still needs to use a thread, which is essentially like using new Thread - if not literally then at least effectively so. What this means is heavy context-switching, which absolutely kills performance and is the main reason why thread pools exist in the first place.
A "search" command clearly implies some kind of I/O, which means there's probably a real asynchronous method you can use somewhere, even if it's the old-style BeginXyz/EndXyz. There are no shortcuts here, no quick fixes; you'll have to re-architect your code to actually be asynchronous.
The .NET framework can't inspect what's going on inside your Task and magically convert it into an interrupt. It simply cannot make use of an I/O completion port unless you refer directly to the specific methods that are aware of them.
Next web or middleware application you work on, try to consider this ahead of time and avoid synchronous I/O like the plague.
I think #Aaronaught has the best answer so far: you need true asynchronous processing in order to scale (i.e., Begin/End, not just using thread pool threads), and that there are no shortcuts or quick fixes to asynchronous code - it will take a re-architecting of at least that portion.
Which brings me to this part of your question:
we don't want to spend too much time on this project rewriting large sections.
The best bang for your buck is probably to purchase more memory and stick that in the server. (You should check with a profiler first just to make sure it is a memory issue - memory usually is the limiting factor on ASP.NET but it's best to check first).
As much as we developers love to solve problems, the truth is we can burn a lot of hours, e.g., changing synchronous code to asynchronous. FYI, the Task-based Asynchronous Pattern (async/await) in .NET 4.5 will allow you to change synchronous code to asynchronous much more easily.
So for now I say buy a couple RAM chips and make a mental note to do the (much easier) upgrade to async after you change to .NET 4.5.
I would start by looking at the performance of the server itself and then consider using the profiling tools in visual studio to identify exactly what and where the bottleneck is. Consider looking at the mini profiler a discussion of which can be found here http://www.hanselman.com/blog/NuGetPackageOfTheWeek9ASPNETMiniProfilerFromStackExchangeRocksYourWorld.aspx. Generally agree with the comment above about thread consumption.
There are dozens of reasons that can cause the server to slow down. If we only talk about threads, threads consume a minimum of 1/4 memory each, that means the more threads drawn up from the thread pool, the more memory is consumed. This can be one of the problems causing the server to slow down.
If the response from server takes over 10 seconds, consider using Asynchronous. Like in your code, make SubmitSearchAsync function Asynchronously, it will avoid blocking a thread, and also releases the thread back to thread pool. However, like the code you provided, when a request is received from the SubmitSearchAsync action, a thread is drawn from the thread pool to executed its body.
The SubmitSearch is a synchronous action, it waits until the implementation is finished, and it blocks the thread until the implementation finishes.
In other word, you released one thread, but you also blocked another thread. If you need to synchronize code from an asynchronous thread, use the AsyncManager.Sync method. But in your case, AsyncManager.Sync might not help much. I suggest two possible solutions:
1) manually spawning a thread:
public virtual ActionResult SubmitSearch(SearchFormModel searchData){
new Thread(() => {
//your search code
}).Start();
}
In this case, your search code might take longer, but the execution of the search will be done on a thread not a part of the pool.
2) change the SubmitSearch function asynchronously along with using Parallelism:
protected virtual async Task<ActionResult> SubmitSearch(SearchFormModel searchData){
// Make your search code using Parallel task like below.
var task1 = DoingTask1Async(searchData);
var task2 = DoingTask2Async(searchData)
await Task.WhenAll(task1,task2);
}
Aside from above suggestion, consider using Cancellation tokens, it further reduces thread usage.
Hope it helps.
What we are seeing is that as more people hit SubmitSearch, the web site in general slows down. That's most likely due to all the IIS threads being locked up executing the search.
If it was the threads locked up then it wouldn't be a slow down but probably http errors were returned. Can I ask how many parallel hits causes the slow down? The threadpool in .Net4 is quite big. Also, if your search takes 10 seconds that means your database is doing the heavy-lifting. I would have a look at the DB performance: if other parts of your site are also DB dependent then several parallel DB intensive searches will slow down your application.
If for some reason you can't/don't want to perfmon the database then here is a simple test: change the database search call to a sleep call for X seconds (around 10 in this case). Then run your parallel requests and see if the site responsiveness drops or not. Your request thread numbers are the same so if that was the reason then it should have the same effect.

Best practices to parallelize using async workflow

Lets say I wanted to scrape a webpage, and extract some data. I'd most likely write something like this:
let getAllHyperlinks(url:string) =
async { let req = WebRequest.Create(url)
let! rsp = req.GetResponseAsync()
use stream = rsp.GetResponseStream() // depends on rsp
use reader = new System.IO.StreamReader(stream) // depends on stream
let! data = reader.AsyncReadToEnd() // depends on reader
return extractAllUrls(data) } // depends on data
The let! tells F# to execute the code in another thread, then bind the result to a variable, and continue processing. The sample above uses two let statements: one to get the response, and one to read all the data, so it spawns at least two threads (please correct me if I'm wrong).
Although the workflow above spawns several threads, the order of execution is serial because each item in the workflow depends on the previous item. Its not really possible to evaluate any items further down the workflow until the other threads return.
Is there any benefit to having more than one let! in the code above?
If not, how would this code need to change to take advantage of multiple let! statements?
The key is we are not spawning any new threads. During the whole course of the workflow, there are 1 or 0 active threads being consumed from the ThreadPool. (An exception, up until the first '!', the code runs on the user thread that did an Async.Run.) "let!" lets go of a thread while the Async operation is at sea, and then picks up a thread from the ThreadPool when the operation returns. The (performance) advantage is less pressure against the ThreadPool (and of course the major user advantage is the simple programming model - a million times better than all that BeginFoo/EndFoo/callback stuff you otherwise write).
See also http://cs.hubfs.net/forums/thread/8262.aspx
I was writing an answer but Brian beat me to it. I fully agree with him.
I'd like to add that if you want to parallelize synchronous code, the right tool is PLINQ, not async workflows, as Don Syme explains.

Resources