Old versus new foreach loop

In what cases should we use the older foreach loop over the new collection.forEach() in JDK 8, or is it best practice to convert every foreach loop? Are there any important performance differences?
The only case I can think of is if you want to iterate over an array and don't want to convert your array to a list first.

It's hard to think of best practices at this point since JDK 8 hasn't even shipped yet. However, there are some interesting observations based on early use of these APIs.
The forEach() method is now on Iterable, which is inherited by Collection, so all collections can use forEach().
An array can be wrapped in a collection with Arrays.asList() or in a stream with Arrays.stream(). These are just wrappers; they don't copy all the elements into a new container.
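For example (a small sketch; arr is just an illustrative array):
String[] arr = {"a", "b", "c"};
Arrays.asList(arr).forEach(System.out::println);   // List view backed by the array, no copy
Arrays.stream(arr).forEach(System.out::println);   // stream over the array, no copy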
Regarding performance, the default implementation of Iterable.forEach(action) is essentially the usual "enhanced for-loop": it creates an iterator, issues successive hasNext() and next() calls, and invokes the action on each element. There is a little bit of extra overhead for the extra method calls compared to a bare enhanced-for loop, but it's probably very small.
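For reference, the default implementation on Iterable looks roughly like this (a simplified sketch; Consumer is java.util.function.Consumer):
default void forEach(Consumer<? super T> action) {
    Objects.requireNonNull(action);
    for (T t : this) {
        action.accept(t);
    }
}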
I would make a choice on stylistic grounds, not performance.
It's probably not worth it to convert every enhanced-for loop to use forEach(). Consider:
for (String s : coll) {
    System.out.println("---");
    System.out.println(s);
    System.out.println("---");
}
versus
coll.forEach(s -> {
    System.out.println("---");
    System.out.println(s);
    System.out.println("---");
});
If the lambda is a true one-liner it might be worth it, but in my opinion a multi-line statement lambda within forEach() isn't any clearer than a nice for-loop.
However, if the body of the for-loop has logic in it, or if it needs to maintain some kind of running state, it might be worthwhile to recast the loop into a stream pipeline. Consider this snippet that finds the length of the longest string in a collection:
int longest = -1;
for (String s : strings) {
    int len = s.length();
    if (len > longest)
        longest = len;
}
Rewritten using lambdas and the streams library, it would look like this:
OptionalInt longest =
    strings.stream()
           .mapToInt(s -> s.length())
           .max();
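Since the collection might be empty, max() yields an OptionalInt rather than a plain int; the original -1 sentinel can be recovered with, for example:
int longestLen = longest.orElse(-1);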
This is new and unfamiliar, of course, but after working with this for a while, I find it to be concise and readable.

Related

What's the most efficient way to make a single-item Iterable in Dart?

In Dart, single-item iterables can be created in various ways.
['item']; // A single-item list
Iterable.generate(1, (_) => 'item'); // A single-item generated Iterable
How do these methods compare in terms of speed and memory usage, during both creation and iteration?
The list is probably the cheapest.
At least if you use a List.filled(1, 'item') to make it a fixed-length list, those are cheaper than growable lists.
You need to store the one element somehow. Storing it directly in the list takes a single reference. Then the list might spend one more cell on the length (of 1).
On top of that, compilers know platform lists and can be extra efficient if they can recognize that that's what they're dealing with.
They probably can't here, since presumably you're creating the iterable to pass to something that expects an arbitrary iterable; otherwise a single-element iterable isn't very useful.
It's a little worrying that the list is mutable. If you know the element at compile time, a const ['item'] is even better.
Otherwise, Iterable.generate(1, constantFunction) is not a bad choice, provided you can write a static or top-level function such as String constantFunction(int _) => 'item'; that can be torn off efficiently.
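A minimal sketch of that approach (the generator passed to Iterable.generate receives the index, so the function has to accept one argument even if it ignores it):
String constantFunction(int _) => 'item'; // static or top-level, so the tear-off is cheap
final single = Iterable.generate(1, constantFunction);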
That said, none of the things here really matter for performance, unless you're going to do it a lot of times in a single run of your program.
The difference in timing is going to be minimal. The difference in memory usage is likely less than the size of the Iterable being created by either method.
You'll save much more by not using for (var e in iterable) on an iterable and instead using, say, for (var i = 0; i < list.length; i++) { var e = list[i]; ... }, if you can ensure that the iterable is always a list.

Best way to access/store persistent data in CUDA along multiple kernel calls [duplicate]

I have 2 very similar kernel functions, in the sense that the code is nearly the same, but with a slight difference. Currently I have 2 options:
Write 2 different methods (but very similar ones)
Write a single kernel and put the code blocks that differ in an if/else statement
How much will an if statement affect my algorithm performance?
I know that there is no branch divergence, since all threads in all blocks will take the same path, either the if or the else.
So will a single if statement decrease my performance if the kernel function is called a lot of times?
You have a third alternative, which is to use C++ templating and make the variable which is used in the if/switch statement a template parameter. Instantiate each version of the kernel you need, and then you have multiple kernels doing different things with no branch divergence or conditional evaluation to worry about, because the compiler will optimize away the dead code and the branching with it.
Perhaps something like this:
template<int action>
__global__ void kernel()
{
    switch (action) {
    case 1:
        // First code
        break;
    case 2:
        // Second code
        break;
    }
}
template void kernel<1>();
template void kernel<2>();
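Calling the instantiated kernels then looks like any other launch; the grid and block dimensions below are just placeholders:
dim3 grid(128), block(256);
kernel<1><<<grid, block>>>();
kernel<2><<<grid, block>>>();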
It will slightly decrease your performance, especially if it's in an inner loop, since you're wasting an instruction issue slot every so often, but it's not nearly as much as if a warp were divergent.
If it's a big deal, it may be worth moving the condition outside the loop, however. If the warp is truly divergent, though, think about how to remove the branching: e.g., instead of
if (i > 0) {
    x = 3;
} else {
    x = y;
}
try
x = ((i > 0) * 3) | ((i <= 0) * y);

What are the advantages of the "apply" functions? When are they better to use than "for" loops, and when are they not? [duplicate]

Possible Duplicate:
Is R's apply family more than syntactic sugar
Just what the title says. Stupid question, perhaps, but my understanding has been that when using an "apply" function, the iteration is performed in compiled code rather than in the R parser. This would seem to imply that lapply, for instance, is only faster than a "for" loop if there are a great many iterations and each operation is relatively simple. For instance, if a single call to a function wrapped up in lapply takes 10 seconds, and there are only, say, 12 iterations of it, I would imagine that there's virtually no difference at all between using "for" and "lapply".
Now that I think of it, if the function inside the "lapply" has to be parsed anyway, why should there be ANY performance benefit from using "lapply" instead of "for" unless you're doing something that there are compiled functions for (like summing or multiplying, etc)?
Thanks in advance!
Josh
There are several reasons why one might prefer an apply family function over a for loop, or vice-versa.
Firstly, for(), apply(), and sapply() will generally be just as quick as each other if executed correctly. lapply() does more of its operating in compiled code within the R internals than the others, so it can be faster than those functions. The speed advantage appears to be greatest when the act of "looping" over the data is a significant part of the compute time; in many general day-to-day uses you are unlikely to gain much from the inherently quicker lapply(). In the end, all of these will be calling R functions, so those functions still need to be interpreted and then run.
for() loops can often be easier to implement, especially if you come from a programming background where loops are prevalent. Working in a loop may be more natural than forcing the iterative computation into one of the apply family functions. However, to use for() loops properly, you need to do some extra work to set up storage and manage plugging the output of the loop back together again. The apply functions do this for you automagically. E.g.:
IN <- runif(10)
OUT <- logical(length = length(IN))
for (i in seq_along(IN)) {
    OUT[i] <- IN[i] > 0.5
}
that is a silly example as > is a vectorised operator, but I wanted something to make a point, namely that you have to manage the output. The main thing is that with for() loops, you should always allocate sufficient storage to hold the outputs before you start the loop. If you don't know how much storage you will need, allocate a reasonable chunk of storage, then in the loop check whether you have exhausted that storage, and bolt on another big chunk when you do.
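For example, a sketch of that "grow in chunks" pattern (IN and some_fun stand in for your data and per-element computation):
chunk <- 1000
OUT <- numeric(chunk)
n <- 0
for (x in IN) {
    n <- n + 1
    if (n > length(OUT)) OUT <- c(OUT, numeric(chunk))  # bolt on another chunk
    OUT[n] <- some_fun(x)
}
OUT <- OUT[seq_len(n)]  # drop the unused cells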
The main reason, in my mind, for using one of the apply family of functions is for more elegant, readable code. Rather than managing the output storage and setting up the loop (as shown above) we can let R handle that and succinctly ask R to run a function on subsets of our data. Speed usually does not enter into the decision, for me at least. I use the function that suits the situation best and will result in simple, easy to understand code, because I'm far more likely to waste more time than I save by always choosing the fastest function if I can't remember what the code is doing a day or a week or more later!
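For instance, the toy loop above collapses to a single call, with no storage to set up:
OUT <- sapply(IN, function(x) x > 0.5)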
The apply family lend themselves to scalar or vector operations. A for() loop will often lend itself to doing multiple iterated operations using the same index i. For example, I have written code that uses for() loops to do k-fold or bootstrap cross-validation on objects. I probably would never entertain doing that with one of the apply family as each CV iteration needs multiple operations, access to lots of objects in the current frame, and fills in several output objects that hold the output of the iterations.
As to the last point, about why lapply() can possibly be faster than for() or apply(), you need to realise that the "loop" can be performed in interpreted R code or in compiled code. Yes, both will still be calling R functions that need to be interpreted, but if you are doing the looping and calling directly from compiled C code (as lapply() does), then that is where the performance gain can come from, compared with apply(), say, which boils down to a for() loop in actual R code. See the source for apply() to see that it is a wrapper around a for() loop, and then look at the code for lapply(), which is:
> lapply
function (X, FUN, ...)
{
    FUN <- match.fun(FUN)
    if (!is.vector(X) || is.object(X))
        X <- as.list(X)
    .Internal(lapply(X, FUN))
}
<environment: namespace:base>
and you should see why there can be a difference in speed between lapply() and for() and the other apply family functions. The .Internal() is one of R's ways of calling compiled C code used by R itself. Apart from a manipulation, and a sanity check on FUN, the entire computation is done in C, calling the R function FUN. Compare that with the source for apply().
From Burns' R Inferno (pdf), p25:
Use an explicit for loop when each iteration is a non-trivial task. But a simple loop can be more clearly and compactly expressed using an apply function. There is at least one exception to this rule ... if the result will be a list and some of the components can be NULL, then a for loop is trouble (big trouble) and lapply gives the expected answer.
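A quick sketch of the NULL trap the quote refers to:
out <- vector("list", 3)
out[[2]] <- NULL                # assigning NULL removes the element: out is now length 2
out <- lapply(1:3, function(i) if (i == 2) NULL else i)
length(out)                     # 3; the NULL component is preserved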

Do recursive sequences leak memory?

I like to define sequences recursively as follows:
let rec startFrom x =
    seq {
        yield x;
        yield! startFrom (x + 1)
    }
I'm not sure if recursive sequences like this should be used in practice. The yield! appears to be tail recursive, but I'm not 100% sure since it's being called from inside another IEnumerable. From my point of view, the code creates an instance of IEnumerable on each call without closing it, which would actually make this function leak memory as well.
Will this function leak memory? For that matter is it even "tail recursive"?
[Edit to add]: I'm fumbling around with NProf for an answer, but I think it would be helpful to get a technical explanation regarding the implementation of recursive sequences on SO.
I'm at work now, so I'm looking at slightly newer bits than Beta1, but compiling on my box in Release mode and then looking at the compiled code with .NET Reflector, it appears that these two
let rec startFromA x =
    seq {
        yield x
        yield! startFromA (x + 1)
    }

let startFromB x =
    let z = ref x
    seq {
        while true do
            yield !z
            incr z
    }
generate almost identical MSIL code when compiled in 'Release' mode. And they run at about the same speed as this C# code:
public class CSharpExample
{
    public static IEnumerable<int> StartFrom(int x)
    {
        while (true)
        {
            yield return x;
            x++;
        }
    }
}
(e.g. I ran all three versions on my box and printed the millionth result, and each version took about 1.3s, +/- 1s). (I did not do any memory profiling; it's possible I'm missing something important.)
In short, I would not sweat too much thinking about issues like this unless you measure and see a problem.
EDIT
I realize I didn't really answer the question... I think the short answer is "no, it does not leak". (There is a special sense in which all 'infinite' IEnumerables (with a cached backing store) 'leak' (depending on how you define 'leak'), see
Avoiding stack overflow (with F# infinite sequences of sequences)
for an interesting discussion of IEnumerable (aka 'seq') versus LazyList and how the consumer can eagerly consume LazyLists to 'forget' old results to prevent a certain kind of 'leak'.)
.NET applications don't "leak" memory in this way. Even if you are creating many objects a garbage collection will free any objects that have no roots to the application itself.
Memory leaks in .NET usually come in the form of unmanaged resources that you are using in your application (database connections, memory streams, etc.). Instances like this in which you create multiple objects and then abandon them are not considered a memory leak since the garbage collector is able to free the memory.
It won't leak any memory, it'll just generate an infinite sequence, but since sequences are IEnumerables you can enumerate them without memory concerns.
The fact that the recursion happens inside the sequence generation function doesn't impact the safety of the recursion.
Just mind that in debug mode tail call optimization may be disabled in order to allow full debugging, but in release there'll be no issue whatsoever.
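For instance, a consumer only realizes the elements it asks for; nothing is retained between pulls (a small sketch using the startFrom definition from the question):
startFrom 0 |> Seq.take 5 |> Seq.iter (printfn "%d")   // prints 0 through 4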

Worse sin: side effects or passing massive objects?

I have a function inside a loop inside a function. The inner function acquires and stores a large vector of data in memory (as a global variable... I'm using "R" which is like "S-Plus"). The loop loops through a long list of data to be acquired. The outer function starts the process and passes in the list of datasets to be acquired.
for (dataset in list_of_datasets) {
    for (datachunk in dataset) {
        <process datachunk>
        <store result? as vector? where?>
    }
}
I programmed the inner function to store each dataset before moving to the next, so all the work of the outer function occurs as side effects on global variables... a big no-no. Is this better or worse than collecting and returning a giant, memory-hogging vector of vectors? Is there a superior third approach?
Would the answer change if I were storing the data vectors in a database rather than in memory? Ideally, I'd like to be able to terminate the function (or have it fail due to network timeouts) without losing all the information processed prior to termination.
Use variables in the outer function instead of global variables. This gets you the best of both approaches: you're not mutating global state, and you're not copying a big wad of data. If you have to exit early, just return the partial results.
(See the "Scope" section in the R manual: http://cran.r-project.org/doc/manuals/R-intro.html#Scope)
Remember your Knuth. "Premature optimization is the root of all programming evil."
Try the side effect free version. See if it meets your performance goals. If it does, great, you don't have a problem in the first place; if it doesn't, then use the side effects, and make a note for the next programmer that your hand was forced.
It's not going to make much difference to memory use, so you might as well make the code clean.
Since R has copy-on-modify for variables, modifying the global object will have the same memory implications as passing something up in return values.
If you store the outputs in a database (or even in a file) you won't have the memory use issues, and the data will be incrementally available as it is created, rather than just at the end. Whether it's faster with the database depends primarily on how much memory you are using: is the reduction in garbage collection going to pay for the cost of writing to disk?
There are both time and memory profilers in R, so you can see empirically what the impacts are.
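For example, a sketch of wrapping a run in base R's profiler (outerfunc and list_of_datasets stand in for your own function and data):
Rprof("acquire.prof", memory.profiling = TRUE)
results <- outerfunc(list_of_datasets)
Rprof(NULL)
summaryRprof("acquire.prof", memory = "both")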
FYI, here's a full sample toy solution that avoids side effects:
outerfunc <- function(names) {
    templist <- list()
    for (aname in names) {
        templist[[aname]] <- innerfunc(aname)
    }
    templist
}

innerfunc <- function(aname) {
    retval <- NULL
    if ("one" %in% aname) retval <- c(1)
    if ("two" %in% aname) retval <- c(1, 2)
    if ("three" %in% aname) retval <- c(1, 2, 3)
    retval
}
names <- c("one","two","three")
name_vals <- outerfunc(names)
for (name in names) assign(name, name_vals[[name]])
I'm not sure I understand the question, but I have a couple of solutions.
Inside the function, create a list of the vectors and return that.
Inside the function, create an environment and store all the vectors inside of that. Just make sure that you return the environment in case of errors.
in R:
help(environment)
# You might do something like this:
outer <- function(datasets) {
    # create the return environment
    ret.env <- new.env()
    for (set in datasets) {
        tmp <- inner(set)
        # check for errors however you like here. You might have inner return a list, and
        # have the list contain an error component
        assign(set, tmp, envir = ret.env)
    }
    return(ret.env)
}

# The inner function might be defined like this
inner <- function(dataset) {
    # I don't know what you are doing here, but let's pretend you are reading a data file
    # that is named by dataset
    filedata <- read.table(dataset, header = TRUE)
    return(filedata)
}
leif
Third approach: inner function returns a reference to the large array, which the next statement inside the loop then dereferences and stores wherever it's needed (ideally with a single pointer store and not by having to memcopy the entire array).
This gets rid of both the side effect and the passing of large datastructures.
It's tough to say definitively without knowing the language/compiler used. However, if you can simply pass a pointer/reference to the object that you're creating, then the size of the object itself has nothing to do with the speed of the function calls. Manipulating this data down the road could be a different story.
