Java 8 Stream in main method

Someone asked me in an interview whether we should write streaming operations in the main method.
Does it make any difference?
For example:
import java.util.ArrayList;
import java.util.List;

class Athlete {
    private String name;
    private int id;

    public Athlete(String name, int id) {
        this.name = name;
        this.id = id;
    }
}

public class Trial {
    public static void main(String[] args) {
        List<Athlete> list = new ArrayList<>();
        list.add(new Athlete("John", 1));
        list.add(new Athlete("Jim", 2));
        list.add(new Athlete("Jojo", 3));
        list.stream().forEach(System.out::print); // or any other stream operation
    }
}
So I am just curious to know whether it makes any difference. For now, the only thing I know is that once a stream has been consumed, it cannot be consumed again.
Does it affect memory, or create buffer memory in the JVM for streaming?
If so, why should this not be used in the main method?

The question whether “we should write streaming operations in the main method” is a loaded question. The first thing it implies is the assumption that there is something special about the main method. Regardless of which operations we are talking about, if the conclusion is that you may or may not use them in an arbitrary method, there is no reason to come to a different conclusion when the method in question is the main method.
Apparently, “should we …” is actually meant to ask “should we avoid …”. If that’s the question then, keeping in mind that there are no special rules for the main method, any reason forbidding the use of the Stream API would also apply to all other methods, making the Stream API an unusable API. Of course, the answer is that there is no reason to forbid the Stream API in the main method.
Regarding memory consumption: when replacing a for-each loop with a Collection.forEach method invocation, you are trading an Iterator instance for a lambda instance, so there is no significant difference in the number and size of the created object instances. If you use Stream’s forEach method, you add a Spliterator and a Stream instance, which can still be considered insignificant, even when your application consists of the main method only. The memory pre-allocated by a JVM is much larger than the memory consumed by those few objects, and your objects are very likely to fit within the thread-local allocation buffer. In other words, from outside the JVM, there will be no difference in the memory used by the process.
As you mentioned the term “buffer”: the conceptual thing to know is that a Stream does not buffer elements for most operations (including forEach). So regardless of whether you traverse a Collection via loop or Stream, in both cases no memory scaling with the Collection’s size is ever allocated; the difference, if any, remains as small as described above, whether you iterate over three elements as in your example or over three million.
One issue that could create confusion is that you should not use multi-threaded operations in a class initializer, which implies that you should not use a parallel stream in a class initializer. But that does not forbid Stream operations per se; furthermore, the main method is not a class initializer: by the time the main method is invoked, the class has already been initialized.
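To make the object-count comparison concrete, here is a minimal, self-contained sketch of the three traversal forms discussed above (the list content is arbitrary):

import java.util.Arrays;
import java.util.List;

public class TraversalDemo {
    public static void main(String[] args) {
        List<String> names = Arrays.asList("John", "Jim", "Jojo");

        for (String n : names) {            // for-each loop: allocates an Iterator
            System.out.print(n);
        }
        names.forEach(System.out::print);   // Collection.forEach: one function object instead
        names.stream().forEach(System.out::print); // adds a Stream and a Spliterator on top
    }
}

All three print the same output; the only difference is the handful of small helper objects noted in the comments.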

In interview questions, don't assume that every yes/no question is limited to those two choices. A good answer might be "no difference either way".
In this case, they might have been looking for you to recognize that list.forEach() is more efficient than list.stream().forEach().

Related

How often does the garbage collector get called in a Xamarin.Android application?

I have the following class:
public class CurrentOrder
{
    // Contains the current order values, which are global to the whole application.
    public static List<OrderArticleViewModel> listOfOrderArticles = new List<OrderArticleViewModel>();
    public static string orderCustomerName;
    public static string orderCustomerId;
    public static string orderNumber;
    public static string orderDateAndHour;
    public static DateTime executionOrderDate = DateTime.Now.AddDays(1);

    private CurrentOrder()
    {
    }
}
I use its fields throughout the whole application as global variables, for example: CurrentOrder.orderNumber. When I am on a certain activity and press the back button, I want to clear all the class field values, which I do like this:
CurrentOrder.listOfOrderArticles = new List<OrderArticleViewModel>();
CurrentOrder.orderCustomerName = null;
CurrentOrder.orderCustomerId = null;
CurrentOrder.orderNumber = null;
CurrentOrder.orderDateAndHour = null;
CurrentOrder.executionOrderDate = DateTime.Now.AddDays(1);
But as far as I know, the old values stay in memory; the only thing that changes is that my variables now point somewhere else. If I click the back button 1000 times, will I have 1000 copies of the fields in memory with nothing referencing them? I've heard that the garbage collector takes care of destroying values that nothing points at, but how often does that occur? Is it possible to press the back button 100 times without the garbage collector cleaning up?
There is no fixed time interval between garbage collections. The garbage collector is called based on the size of the remaining allocatable memory. Both C# and Java are managed languages, so we don't need to allocate and release memory manually as in C/C++.
The garbage collector helps the developer release memory. Xamarin.Android uses C#, so it relies on the CLR to manage the process's memory (native Android is based on ART and Dalvik).
Here are the conditions under which the GC will be called:
Garbage collection occurs when one of the following conditions is true:
1. The system has low physical memory. This is detected either by the low-memory notification from the OS or by low memory as indicated by the host.
2. The memory that is used by allocated objects on the managed heap surpasses an acceptable threshold. This threshold is continuously adjusted as the process runs.
3. The GC.Collect method is called. In almost all cases you do not have to call this method, because the garbage collector runs continuously. This method is primarily used for unique situations and testing.
And I think memory churn proves that the GC is not called at a fixed interval.
About your question:
Is it posible to press back button 100 times without the garbage collector cleaning?
It depends on your Android system environment (is your app in the foreground or background? is there enough memory?), but garbage collection will happen eventually.
So, regarding memory, I think memory leaks and OOM (mainly due to Bitmap) deserve more attention. Memory churn should also be avoided, because it affects Android rendering (UI performance).
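Since the native Android runtime is Java-based, the nondeterminism is easy to demonstrate there too. The following is a rough, hypothetical JVM sketch rather than Xamarin code: whether the weakly referenced array is collected inside the loop depends entirely on the runtime and on memory pressure.

import java.lang.ref.WeakReference;

public class GcTiming {
    public static void main(String[] args) throws InterruptedException {
        // The byte[] referent becomes unreachable immediately, but there is
        // no guarantee when (or even whether) it will be collected.
        WeakReference<byte[]> ref = new WeakReference<>(new byte[1024]);
        for (int i = 0; i < 5 && ref.get() != null; i++) {
            byte[] pressure = new byte[8 * 1024 * 1024]; // allocate to create GC pressure
            Thread.sleep(10);
        }
        System.out.println("collected: " + (ref.get() == null));
    }
}

Runs of this program can report either result, which is exactly the point: you cannot rely on collection having happened after any fixed number of back presses.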

Delphi - Why is TObject.InitInstance public?

I'm somewhat new to Delphi, and this question is just me being curious. (I also just tried using it by accident only to discover I'm not supposed to.)
If you look at the documentation for TObject.InitInstance it tells you not to use it unless you're overriding NewInstance. The method is also public. Why not make it protected if the user is never supposed to call it?
Since I was around when this whole Delphi thing got started back around mid-1992, there are likely several answers to this question. If you look at the original declaration of TObject in Delphi 1, there weren't any protected/private members on TObject. That was because, very early on in the development of Delphi, and in concert with the introduction of exceptions to the language, exceptions were allocated from a different heap than other objects. This was the genesis of the NewInstance/InitInstance/CleanupInstance/FreeInstance functions. By overriding these functions on your class types, you can literally control where an object is allocated.
In recent years I've used this functionality to create a cache of object instances that are literally "recycled". By intercepting NewInstance and FreeInstance, I created a system where instances are not returned to the heap upon de-allocation, rather they are placed on a lock-free/low-lock linked list. This makes allocating/freeing instances of a particular type much faster and eliminates a lot of excursions into the memory manager.
By having InitInstance public (its counterpart being CleanupInstance), those methods can be called from other utility functions. In the case I mentioned above, InitInstance could be called on an existing block of memory without having to be called only from NewInstance. Suppose NewInstance calls a general-purpose function that manages the aforementioned cache; the "scope" of the class instance is lost, so the only way to call InitInstance is if it is public.
One of these days, we'll likely ship the code that does what I described above... for now it's part of an internal "research" project.
Oh, as an aside and also a bit of a history lesson... Prior to the Delphi 1 release, the design of how Exception instances were allocated/freed was changed back to using the same heap as all other objects. Because of an overall collective misstep, it was assumed that we needed a separate heap for all Exception object instances to "protect" the out-of-memory case. We reasoned that if we try to raise an exception because the memory manager is "out of memory", how in the blazes would we allocate the exception instance!? We already know there is no memory at that point! So we decided that a separate heap was necessary for all exceptions... until either Chuck Jazdzewski or Anders Hejlsberg (I forget exactly which one) figured out a simple, rather clever solution: just pre-allocate the out-of-memory exception on startup! We still needed to control whether or not the exception should ever actually be freed (Exception instances are automatically freed once handled), so the whole NewInstance/FreeInstance mechanism remained.
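The recycling cache described above does not depend on anything Delphi-specific. As a loose analogue only (Java cannot hook instance allocation the way NewInstance/FreeInstance can, and all names here are illustrative), the core idea of handing out previously "freed" instances from a lock-free list looks like this:

import java.util.concurrent.ConcurrentLinkedDeque;
import java.util.function.Supplier;

final class RecyclingPool<T> {
    private final ConcurrentLinkedDeque<T> free = new ConcurrentLinkedDeque<>();
    private final Supplier<T> factory;

    RecyclingPool(Supplier<T> factory) {
        this.factory = factory;
    }

    T acquire() {
        T recycled = free.poll();   // reuse a "freed" instance when one is available
        return recycled != null ? recycled : factory.get();
    }

    void release(T instance) {
        free.push(instance);        // back onto the lock-free list, not to the allocator
    }
}

The Delphi version is strictly more powerful, because intercepting NewInstance/FreeInstance makes the recycling transparent to code that just calls constructors.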
Well never say never. In the VCL too much stuff is private and not virtual as it is, so I kinda like the fact that this stuff is public.
It isn't really necessary for normal use, but in specific cases, you might use it to allocate objects in bulk. NewInstance reserves a bit of memory for the object and then calls InitInstance to initialize it. You could write a piece of code that allocates memory for a great number of objects in one go, and then calls InitInstance for different parts of that large block to initialize different blocks in it. Such an implementation could be the base for a flyweight pattern implementation.
Normally you wouldn't need such a thing at all, but it's nice that you can if you really want/need to.
How does it work?
The fun thing is: a constructor in Delphi is just some method. The Create method itself doesn't do anything special. If you look at it, it is just a method as any other. It's even empty in TObject!
You can even call it on an instance (call MyObject.Create instead of TMyObject.Create), and it won't return a new object at all. The key is in the constructor keyword. That tells the compiler that before executing the TAnyClass.Create method, it should also construct an actual object instance.
That construction means basically calling NewInstance. NewInstance allocates a piece of memory for the data of the object. After that, it calls InitInstance to do some special initialization of that memory, starting with clearing it (filling with zeroes).
Allocating memory is a relatively expensive task. A memory manager (compiled into your application) needs to find a free piece of memory and assign it to your object. If it doesn't have enough memory available, it needs to make a request to Windows to give it some more. If you have thousands or even millions of objects to create, then this can be inefficient.
In those rare cases, you could decide to allocate the memory for all those objects in one go. In that case you won't call the constructor at all, because you don't want to call NewInstance (because it would allocate extra memory). Instead, you can call InitInstance yourself to initialize pieces of your big chunk of memory.
Anyway, this is just a hypothesis about the reason. Maybe there isn't a reason at all. I've seen so many irrationally applied visibility levels in the VCL. Maybe they just didn't think about it at all. ;)
It gives developers a way to create objects without using NewInstance (e.g. memory from the stack or a memory pool).

Reusing task objects in fork/join in Java 7

I would like to use Java fork/join to solve a recursive problem, but I don't want to create a new task instance explicitly for each recursion step. The reason is that too many tasks means too many objects, which fills up my memory after a few minutes of processing.
I have the following solution in Java 6, but is there a better implementation for Java 7?
final static AtomicInteger max = new AtomicInteger(10); // max parallel tasks
final static ThreadPoolExecutor executor = new ThreadPoolExecutor(....);

private void submitNewTask() {
    if (max.decrementAndGet() >= 0) {
        executor.execute(new Task(....));
        return;
    }
    run(); // avoid creating a new object
}

public void run() {
    // ... process ...
    // do the recursion by calling submitNewTask()
    max.incrementAndGet();
}
I tried something like calling the invoke() function on the same task again (after updating the related fields, of course), but it does not seem to work.
I think you are not using the right approach. The fork/join framework is intended to execute a long-running algorithm over a (potentially) big data set in parallel, splitting the data into smaller pieces (the RecursiveTask itself) that can be executed by several threads (speeding up execution on multi-CPU machines) using a work-stealing strategy.
A RecursiveTask does not need to replicate all your data, but just to keep indexes into the portion you are working on (to avoid harmful overlapping), so the data overhead is kept at a minimum (of course, every RecursiveTask consumes memory too).
There is often a trade-off between memory use and execution time in algorithm design, and the FJ framework is intended to reduce execution time while paying a (I think reasonably small) memory cost. If execution time is not your first concern, I think FJ is useless for your problem.
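For completeness, here is a minimal sketch of the idiomatic fork/join shape (the actual per-element work is elided): below a size threshold the task recurses with a plain method call instead of forking, which is the standard way to bound the number of task objects created.

import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveAction;

class Work extends RecursiveAction {
    static final int THRESHOLD = 1_000;
    final int[] data;
    final int from, to;

    Work(int[] data, int from, int to) {
        this.data = data;
        this.from = from;
        this.to = to;
    }

    @Override
    protected void compute() {
        if (to - from <= THRESHOLD) {
            for (int i = from; i < to; i++) {
                // ... process data[i] ...
            }
        } else {
            int mid = (from + to) >>> 1;
            // Only ranges above the threshold allocate new task objects.
            invokeAll(new Work(data, from, mid), new Work(data, mid, to));
        }
    }
}

// usage: new ForkJoinPool().invoke(new Work(data, 0, data.length));

Tuning THRESHOLD directly trades the number of task objects against parallelism, which addresses the memory concern in the question without reusing task instances.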

XNA/C#: Entity Factories and typeof(T) performance

In our game (targeted at mobile) we have a few different entity types and I'm writing a factory/repository to handle instantiation of new entities. Each concrete entity type has its own factory implementation and these factories are managed by an EntityRepository.
I'd like to implement the repository as such:
class Repository
{
    private Dictionary<System.Type, IEntityFactory<IEntity>> factoryDict;

    public T CreateEntity<T>(params) where T : IEntity
    {
        return factoryDict[typeof(T)].CreateEntity() as T;
    }
}
usage example
var enemy = repo.CreateEntity<Enemy>();
but I am concerned about performance, specifically related to the typeof(T) operation in the above. It is my understanding that the compiler would not be able to determine T's type and it will have to be determined at runtime via reflection, is this correct? One alternative is:
class Repository
{
    private Dictionary<System.Type, IEntityFactory> factoryDict;

    public IEntity CreateEntity(System.Type type, params)
    {
        return factoryDict[type].CreateEntity();
    }
}
which will be used as
var enemy = (Enemy)repo.CreateEntity(typeof(Enemy), params);
in this case, whenever typeof() is called the type is on hand and can be determined by the compiler (right?), and performance should be better. Will there be a notable difference? Any other considerations? I know I could also just have a method such as CreateEnemy in the repository (we only have a few entity types), which would be faster, but I would prefer to keep the repository as entity-unaware as possible.
EDIT:
I know that this most likely won't be a bottleneck; my concern is just that it seems wasteful to spend time on reflection when there is a slightly less sugared alternative available. And I think it's an interesting question :)
I did some benchmarking, which proved quite interesting (and seems to confirm my initial suspicions).
Using the performance measurement tool I found at
http://blogs.msdn.com/b/vancem/archive/2006/09/21/765648.aspx
(which runs a test method several times and displays metrics such as average time), I conducted a basic test, testing:
private static T GenFunc<T>() where T : class
{
    return dict[typeof(T)] as T;
}
against
private static Object ParamFunc(System.Type type)
{
    var d = dict[type];
    return d;
}
called as
str = GenFunc<string>();
vs
str = (String)ParamFunc(typeof(String));
respectively. ParamFunc shows a remarkable improvement in performance (it executes, on average, in 60-70% of the time GenFunc takes), but the test is quite rudimentary and I might be missing a few things, specifically how the casting is performed in the generic function.
An interesting aside is that there is little (negligible) performance gained by 'caching' the type in a variable and passing it to ParamFunc versus using typeof() every time.
Generics in C# don't use or need reflection.
Internally types are passed around as RuntimeTypeHandle values. And the typeof operator maps to Type.GetTypeFromHandle (MSDN). Without looking at Rotor or Mono to check, I would expect GetTypeFromHandle to be O(1) and very fast (eg: an array lookup).
So in the generic (<T>) case you're essentially passing a RuntimeTypeHandle into your method and calling GetTypeFromHandle in your method. In your non-generic case you're calling GetTypeFromHandle first and then passing the resultant Type into your method. Performance should be near identical - and massively outweighed by other factors, like any places you're allocating memory (eg: if you're using the params keyword).
But it's a factory method anyway. Surely it won't be called more than a couple of times per second? Is it even worth optimising?
You always hear how slow reflection is, but in C# there is actually fast reflection and slow reflection. typeof is fast reflection - it has basically the overhead of a method call, which is nearly infinitesimal.
I would bet a steak and lobster dinner that this isn't going to be a performance bottleneck in your application, so it's not even worth your (or our) time in trying to optimize it. It's been said a million times before, but it's worth saying again: "Premature optimization is the root of all evil."
So, finish writing the application, then profile to determine where your bottlenecks are. If this turns out to be one of them, then and only then spend time optimizing it. And let me know where you'd like to have dinner.
Also, my comment above is worth repeating, so you don't spend any more time reinventing the wheel: Any decent IoC container (such as AutoFac) can [create factory methods] automatically. If you use one of those, there is no need to write your own repository, or to write your own CreateEntity() methods, or even to call the CreateEntity() method yourself - the library does all of this for you.
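As an aside for readers coming from Java: generics there are erased, so the class token must be passed explicitly, which makes the Java shape of this repository essentially the second (non-generic) C# variant with a typed wrapper on top. A hedged sketch, with Entity/Enemy standing in for the question's types:

import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

interface Entity {}
class Enemy implements Entity {}

class EntityRepository {
    private final Map<Class<? extends Entity>, Supplier<? extends Entity>> factories = new HashMap<>();

    <T extends Entity> void register(Class<T> type, Supplier<T> factory) {
        factories.put(type, factory);
    }

    <T extends Entity> T create(Class<T> type) {
        return type.cast(factories.get(type).get()); // checked cast via the class token
    }
}

// usage:
// EntityRepository repo = new EntityRepository();
// repo.register(Enemy.class, Enemy::new);
// Enemy enemy = repo.create(Enemy.class);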

Parsing variable length descriptors from a byte stream and acting on their type

I'm reading from a byte stream that contains a series of variable length descriptors which I'm representing as various structs/classes in my code. Each descriptor has a fixed length header in common with all the other descriptors, which are used to identify its type.
Is there an appropriate model or pattern I can use to best parse and represent each descriptor, and then perform an appropriate action depending on its type?
I've written lots of these types of parser.
I recommend that you read the fixed length header, and then dispatch to the correct constructor to your structures using a simple switch-case, passing the fixed header and stream to that constructor so that it can consume the variable part of the stream.
This is a common problem in file parsing. Commonly, you read the known part of the descriptor (which luckily is fixed-length in this case, but isn't always) and branch on it there. Generally I use a strategy pattern here, since I generally expect the system to be broadly flexible - but a straight switch or factory may work as well.
The other question is: do you control and trust the downstream code? Meaning: the factory / strategy implementation? If you do, then you can just give them the stream and the number of bytes you expect them to consume (perhaps putting some debug assertions in place, to verify that they do read exactly the right amount).
If you can't trust the factory/strategy implementation (perhaps you allow the user-code to use custom deserializers), then I would construct a wrapper on top of the stream (example: SubStream from protobuf-net), that only allows the expected number of bytes to be consumed (reporting EOF afterwards), and doesn't allow seek/etc operations outside of this block. I would also have runtime checks (even in release builds) that enough data has been consumed - but in this case I would probably just read past any unread data - i.e. if we expected the downstream code to consume 20 bytes, but it only read 12, then skip the next 8 and read our next descriptor.
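To illustrate the bounded-view idea from the answer above: below is a rough Java sketch of a stream wrapper that reports EOF once the descriptor's byte budget is used up, so an untrusted deserializer cannot read past its block (SubStream in protobuf-net plays this role in .NET; this analogue is illustrative only).

import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

class BoundedInputStream extends FilterInputStream {
    private long remaining;

    BoundedInputStream(InputStream in, long limit) {
        super(in);
        this.remaining = limit;
    }

    @Override
    public int read() throws IOException {
        if (remaining <= 0) return -1;       // pretend EOF at the block boundary
        int b = super.read();
        if (b >= 0) remaining--;
        return b;
    }

    @Override
    public int read(byte[] buf, int off, int len) throws IOException {
        if (remaining <= 0) return -1;
        int n = super.read(buf, off, (int) Math.min(len, remaining));
        if (n > 0) remaining -= n;
        return n;
    }
}

After the deserializer returns, the caller can skip whatever is left of the budget to realign the stream on the next descriptor.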
To expand on that; one strategy design here might have something like:
interface ISerializer {
    object Deserialize(Stream source, int bytes);
    void Serialize(Stream destination, object value);
}
You might build a dictionary (or just a list if the number is small) of such serializers keyed by the expected markers, resolve your serializer, then invoke the Deserialize method. If you don't recognise the marker, then (one of the following):
skip the given number of bytes
throw an error
store the extra bytes in a buffer somewhere (allowing for round-trip of unexpected data)
As a side-note to the above - this approach (strategy) is useful if the system is determined at runtime, either via reflection or via a runtime DSL (etc). If the system is entirely predictable at compile-time (because it doesn't change, or because you are using code-generation), then a straight switch approach may be more appropriate - and you probably don't need any extra interfaces, since you can inject the appropriate code directly.
One key thing to remember, if you're reading from the stream and do not detect a valid header/message, throw away only the first byte before trying again. Many times I've seen a whole packet or message get thrown away instead, which can result in valid data being lost.
This sounds like it might be a job for the Factory Method or perhaps Abstract Factory. Based on the header you choose which factory method to call, and that returns an object of the relevant type.
Whether this is better than simply adding constructors to a switch statement depends on the complexity and the uniformity of the objects you're creating.
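For the simple switch-based variant, a minimal Java sketch might look like the following; the type tags, the length field, and the two Descriptor subtypes are hypothetical stand-ins for whatever the real format defines.

import java.io.DataInputStream;
import java.io.IOException;

abstract class Descriptor {
    static Descriptor read(DataInputStream in) throws IOException {
        int type = in.readUnsignedByte();     // fixed-length header: type tag...
        int length = in.readUnsignedShort();  // ...and payload length
        switch (type) {
            case 0x01: return new AudioDescriptor(in, length);
            case 0x02: return new VideoDescriptor(in, length);
            default:
                in.skipBytes(length);         // unknown type: skip its payload
                return null;
        }
    }
}

class AudioDescriptor extends Descriptor {
    AudioDescriptor(DataInputStream in, int length) throws IOException {
        in.skipBytes(length); // placeholder: parse the audio-specific fields here
    }
}

class VideoDescriptor extends Descriptor {
    VideoDescriptor(DataInputStream in, int length) throws IOException {
        in.skipBytes(length); // placeholder: parse the video-specific fields here
    }
}

Each constructor consumes exactly its payload, so read() can be called in a loop until the stream is exhausted.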
I would suggest:
fifo = Fifo.new
while (fd is readable) {
    read everything off the fd and stick it into fifo
    if (the front of the fifo has a valid header and
        the fifo is big enough for the payload) {
        dispatch constructor, remove bytes from fifo
    }
}
With this method:
you can do some error checking for bad payloads, and potentially throw bad data away
data is not waiting on the fd's read buffer (can be an issue for large payloads)
If you'd like it to be nice and OO, you can use the visitor pattern in an object hierarchy. Here is how I've done it (for identifying packets captured off the network, which is pretty much the same thing you might need):
huge object hierarchy, with one parent class
each class has a static constructor that registers with its parent, so the parent knows about its direct children (this was C++; I think this step is not needed in languages with good reflection support)
each class had a static constructor method that received the remaining part of the bytestream and, based on that, decided whether it was its responsibility to handle that data or not
When a packet came in, I simply passed it to the static constructor method of the main parent class (called Packet), which in turn checked all of its children to see whether it was their responsibility to handle that packet, and this went on recursively until one class at the bottom of the hierarchy returned the instantiated object.
Each of the static "constructor" methods cut its own header from the bytestream and passed only the payload down to its children.
The upside of this approach is that you can add new types anywhere in the object hierarchy WITHOUT needing to see/change ANY other class. It worked remarkably well for packets; it went like this:
Packet
  EthernetPacket
    IPPacket
      UDPPacket, TCPPacket, ICMPPacket
      ...
I hope you can see the idea.
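A condensed sketch of that self-registering idea (collapsed to one level, all names hypothetical): each subtype contributes a probe to its parent, and parsing asks the probes in turn until one claims the payload.

import java.util.ArrayList;
import java.util.List;

abstract class Packet {
    interface Probe {
        Packet tryParse(byte[] payload); // returns null if this type doesn't match
    }

    private static final List<Probe> probes = new ArrayList<>();

    static void register(Probe probe) {  // each subtype registers itself here
        probes.add(probe);
    }

    static Packet parse(byte[] payload) {
        for (Probe probe : probes) {
            Packet parsed = probe.tryParse(payload); // child inspects its own header
            if (parsed != null) return parsed;
        }
        return null; // no registered type claimed this data
    }
}

In the full version described above, each matching type would strip its own header and recurse into its children with just the payload.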
