What is meant by term "heterogeneous" in distributed system? - middleware

The term "heterogeneous" is often used in distributed-system and middleware. What does it mean?

Homogenous hardware in a distributed system would be every machine having the same hardware, same OS, and possibly even being dedicated to just running one thing.
Heterogenous would mean:
Inconsistent Hardware. You might have one set of servers from 2011, one from 2013, and one fancy new set ramped up this year, all in the same pool for compute resources. They have different CPUs, amounts of memory, and amounts of disk available for use.
Possibly Inconsistent OS. A crowdsourced distributed computing environment - something that runs when a user's screensaver is going - might have different everything; it may be running on Linux, on Windows, or OS/X.
In both cases, you have to plan to be immune to differences in resources, or flexible based on what the actual differences are.
To be clear; the first is 1000x more common. If someone says "heterogenous distributed computing", they mean two or more computers with different hardware being used together to solve a problem.

Related

Is it possible/easy to determine how much power a program is using?

Is it possible to determine or even reasonably estimate how much power a program is using? The idea being to profile my code in terms of power consumption instead of just typical performance.
Is it enough to measure CPU use, GPU use and memory access?
There are many aspects that can influence the power consumption of an application, and these will vary a lot depending on the used hardware.
The easiest way to get an idea is to measure it. If your program is doing heavy calculations, it is quite simple to measure the difference. Just read out the usage while the app is running, and subtract the usage while not.
If your app is not of the heavy calculations kind, then the challenge is allot bigger, since a simple 1 point in time comparison will not do the trick. You could get a measuring device that can log usage over time, which log you would need to compare with the logged process activity of your machine and try to filter all other scheduled tasks (checks for updates etc) out.
Just a tip if you want to go this way, APC's UPS's come with this functionality built-in, and the PowerChute software stores a power consumption log in an Access Database (C:\Program Files\APC\PowerChute Personal Edition\EnergyLog.mdb). I'm not sure if this is true for all models, but it was a nice extra feature that came with mine (Pro 550). I would lay the data alongside an Xperf trace (the free built in profiler in Windows, look here for an overview), in order to correlate power variations with your applications activity, and filter out scheduled jobs etc...
That said, remember you will get different results on different hardware. An ssd will differ from a tradational harddisk, and the used grafix adapter can also make a difference, so you could only get a rough estimation overall, by measuring on a "typical" system. Desktop systems will consume allot more than laptops, etc... (also see this blogpost).
Power consumption profiling tools are allot more common for mobile devices. I'm not an expert in that field, but I know there are quite some tools out there.
There is a project that allows you to get power consumption of a given program of PID on Linux, requiring no hardware: scaphandre
With that you could create grafana dashboards to follow the power consumption of a given project from one release to another. Examples can be seen here.

Why do the CLR and JVM use a stack-based architecture?

I am always curious as to why the JVM and CLR have a stack-based architecture?
Why don't they use a register-based approach?
What benefits does it have over the register-based approach?
I used to ponder the differences between register and stack machines and compare instruction sequences, and run benchmarks...
Then I spent a couple of years implementing both types of machines while working on the Parrot VM, which was a register machine. We started, naively, with a fixed register set, in combination with data and register stacks, but eventually concluded that it was an artificial limitation, so we changed to an infinite register set and an allocator. At some point, the Parrot fast core (GCC computed goto) outperformed Mono and JVM interpreter cores (non-JIT), but the difference came down to the JIT. Parrot's JIT never matched the quality of the others. It is the quality of the JITter that makes the eventual machine, and that is generally what people care about. If all VMs played by the same rules (ie. they had a constraint to run in interpreted mode with no JIT), then my evidence shows a register machine has the performance edge on an equivalent stack machine. Larger instructions, but fewer of them == higher throughput (IPC), and better cache locality of reference. The Dalvik JVM actually supports my findings, Dalvik had no JIT for a couple of years, and competed with its interpreter core.
Very few mainstream VMs run in interpretation mode exclusively (AFAIK), they JIT compile, and thats what we benchmark. The point of the interpreter core is to establish a presence on the platform, do bytecode verification, and provide a failsafe execution core in absence of the JIT. This isn't a rule, of course; there are billions of devices running ARM accelerated JVM without JIT, but in the absence of memory or CPU constraints, this applies.
I worked and worked at tweaking the core, testing and tuning, only to find that in the end we really wanted a fast JIT. I arrived at the conclusion that if you are going to eventually JIT, it doesn't matter much whether you implement a stack or register machine to start, do what you like; but you will get "to market" faster with a stack machine. Doing a lot of pseudo-register-machine virtual optimizations for bytecode interpretation by a virtual machine core is partially a wasted effort, because it isn't real native optimization. The soft-core doesn't do branch prediction, register renaming, instruction reordering, parallel execution or prefetch like a real processor. My feeling is that once we have a high quality JIT to native binary, we arrive at the same destination.
For those reasons, I technically favor a stack based machine for:
Simplicity - Much less code to maintain = less bugs
Time to implement
But visually, and emotionally I favor a register machine for:
Visual-Conceptual models more closely match the machine, and my
brain
Flexibility - Compilers can evaluate their expression trees
in different orders using SSA.
Note I didn't say compilers could more "easily" generate code. That seems to be what people who have worked mostly with stack machines like to argue. I don't believe that and didn't find that to be true. I saw many hobby compilers written in a short time on both Parrot and the CLR, though I would admit the ones on the CLR are of higher quality, but that is mainly one of ecosystem and quality of available tools. I wrote compilers on both platforms myself, and found there are tradeoffs, but not enough to lose sleep over.
This is an educated guess, because my real-world experience does not include writing a full JITter so I don't have first-hand experience comparing the pros and cons of JITting various forms of opcodes, but my opinion is, if you plan to include a JIT, then creating an extremely sophisticated virtual machine opcode core amounts to premature optimization. Your time is better spent elsewhere.
It is usually not appropriate to just link out to an article but this time I'll make an exception: This article by Eric Lippert answers just this question.

What is the significance of NCSU's double floating gate FET technology for the future of OS design?

this is my first post, and I certainly hope that it's appropriate to the forum - I've been lurking for some time, and I believe that it is, but my apologies if this is not the case.
I presume that most if not all readers here have encountered the story regarding the discovery (popularization?) of "double floating gate FETs" by NCSU, which is being hailed as a potential "universal memory."
http://www.engadget.com/2011/01/23/scientists-build-double-floating-gate-fet-believe-it-could-revo/
I've been slowly digesting a wide range of material related to software development, including OS design (I am, perhaps obviously, strictly amateur), and given that one of the most basic functions that any OS performs is managing the ever-present exchange of data between perpetual storage and volatile memory, it strikes me that if this technology matures and becomes widely available, it would represent such a sea-change in the role of the OS that it would almost necessitate rewriting our operating systems from the ground up to make full use of its potential. Am I correct in my estimation of the role of the OS, and in the potential ramifications of the new technology? Or, have I perhaps failed to understand some critical distinction regarding the logical (as opposed to physical) relationships between processor, memory, and storage?
Until we know what it'll cost in mass production, it's impossible to say. Unless it can be made cheaply enough to replace hard drives, so you erase the distinction between "main memory" and "long term storage", the impact will be minimal. I'd be surprised to see that happen though.
Even if economics allowed that to happen, I doubt it really would. Some mobile devices already use identical (battery backed RAM) for main memory and long term storage. This would eliminate the battery-backing for the memory, but doesn't seem likely to impact OS design at all. Since (at least most) use the same battery for normal operation and maintaining long-term storage, this would simply mean and extra 10% (or whatever) life from the same battery -- but only if its normal operation required about the same power as normal RAM (which strikes me as unlikely -- most flash draws extra power when writing).

Evaluate software minimum requirements

Is there a way to evaluate the minimum requirements of a software? I mean, how can I discover, for example, the minimum amount of RAM that my application will need?
Thanks!
A profiler will not help you here. Neither will estimating the size of data structures.
A profiler can certainly tell you where your code is spending the most CPU time, but it will not tell you if you are missing performance targets - e.g. if your users will be happy, or unhappy with the performance of your application on any given system.
Simply computing the size of data structures, and how many may be allocated at any one time will not at all give you an accurate picture of memory usage over time. The reason is that memory usage is determined by many other factors including how much I/O your application does, what OS services your application uses, and most importantly the temporal nature of how your application uses memory.
The most effective way to understand minimum requirements is to
Make sure you have an effective way of measuring performance using metrics that are important to your user. the best metric is response time. Depending on your app, a rate such as throughput or operations per second may be applicable. Your measurements could be empirical (e.g. just try it) but that is least effective. This is best done with some kind of instrumentation. On windows, the choice is [ETW][1]. Other operating systems have other suitable mechanisms.
Have some kind of automated method of exercising your application. This will let you make repeated and reliable measurements.
Measure your application using various memory sizes and see where performance begins to suffer. This may also expose performance bugs that prevent your application from performing well. If you have access to platforms of various performance levels, use those as well. You didn't indicate what your app does, but testing on a netbook with 1GB of memory is great for many (not all) client applications.
You can do the same with the CPU and other components such as disk, networking or the GPU.
Also note that there is no simple answer here - doing an effective job at setting minimum requirements is real work. This is especially true if your application is participatory sensitive to one platform aspect or another.
There are other factors as well - for example, your app may run fine in one configuration until the user opens another application that may be memory hungry or a CPU pig. Users rarely only have one application open.
This means that in addition to specifying minimum requirements you must do an effective job in setting user expectations - that is explaining when your application will perform well, and when it won't, and what the factors are that impact performance.
[1]: http://msdn.microsoft.com/en-us/library/ms751538.aspxstrong text
Ideally, you'd decide on the minimum requirements of a piece of software based on your target audience, and then test your software during development on that configuration to ensure it delivers a satisfactory experience.
You can look at a system running your software and see how much memory is being consumed by your application, and use that to guide how much memory is being consumed. CPU is a little bit more complex - you could try to model your CPU requirements, but doing this accurately can be challenging.
But ultimately, you need to test your app on the base system you are targeting.
Given the data structures used by the application, estimate how much space they will take up in normal use. Using that estimation, set up a number of machines (virtual or physical) to test the estimate in different scenarios (i.e. different target operating systems, different virtual memory settings, etc).
Then measure the performance of the application in the different scenarios. Your minimum settings will be the machine that performs the least adequately while still being acceptable.
You could try using a performance profiler on your software while stress testing it.
You could use virtualization to repeatedly run a representative test suite with different amounts of RAM in the virtual machine...when the performance falls below acceptable levels due to swapping, you've found the memory requirement.

Datamining models in FORTRAN or C (or managed code)?

We are planning to develop a datamining package for windows. The program core / calculation engine will be developed in F# with GUI stuff / DB bindings etc done in C# and F#.
However, we have not yet decided on the model implementations. Since we need high performance, we probably can't use managed code here (any objections here?). The question is, is it reasonable to develop the models in FORTRAN or should we stick to C (or maybe C++). We are looking into using OpenCL at some point for suitable models - it feels funny having to go from managed code -> FORTRAN -> C -> OpenCL invocation for these situations.
Any recommendations?
F# compiles to the CLR, which has a just-in-time compiler. It's a dialect of ML, which is strongly typed, allowing all of the nice optimisations that go with that type of architecture; this means you will probably get reasonable performance from F#. For comparison, you could also try porting your code to OCaml (IIRC this compiles to native code) and see if that makes a material difference.
If it really is too slow then see how far that scaling hardware will get you. With the performance available through a modern PC or server it seems unlikely that you would need to go to anything exotic unless you are working with truly brobdinagian data sets. Users with smaller data sets may well be OK on an ordinary PC.
Workstations give you perhaps an order of magnitude more capacity than a standard dekstop PC. A high-end workstation like a HP Z800 or XW9400 (similar kit is available from several other manufacturers) can take two 4 or 6 core CPU chips, tens of gigabytes of RAM (up to 192GB in some cases) and has various options for high-speed I/O like SAS disks, external disk arrays or SSDs. This type of hardware is expensive but may be cheaper than a large body of programmer time. Your existing desktop support infrastructure shouldn be able to this sort of kit. The most likely problem is compatibility issues running 32 bit software on a 64-bit O/S. In this case you have various options like VMs or KVM switches to work around the compatibility issues.
The next step up is a 4 or 8 socket server. Fairly ordinary wintel servers go up to 8 sockets (32-48 cores) and perhaps 512GB of RAM - without having to move off the Wintel platform. This gives you fairly wide range of options within your platform of choice before you have to go to anything exotic1.
Finally, if you can't make it run quickly in F#, validate the F# prototype and build a C implementation using the F# prototype as a control. If that's still not fast enough you've got problems.
If your application can be structured in a way that suits the platform then you could look at a more exotic platform. Depending on what will work with your application, you might be able to host it on a cluster, cloud provider or build the core engine on a GPU, Cell processor or FPGA. However, in doing this you're getting into (quite substantial) additional costs and exotic dependencies that might cause support issues. You will probably also have to bring a third-party consultant who knows how to program the platform.
After all that, the best advice is: suck it and see. If you're comfortable with F# you should be able to prototype your application fairly quickly. See how fast it runs and don't worry too much about performance until you have some clear indication that it really will be an issue. Remember, Knuth said that premature optimisation is the root of all evil about 97% of the time. Keep a weather eye out for issues and re-evaluate your strategy if you think performance really will cause trouble.
Edit: If you want to make a packaged application then you will probably be more performance-sensitive than otherwise. In this case performance will probably become an issue sooner than it would with a bespoke system. However, this doesn't affect the basic 'suck it and see' principle.
For example, at the risk of starting a game of buzzword bingo, if your application can be parallelized and made to work on a shared-nothing architecture you might see if one of the cloud server providers [ducks] could be induced to host it. An appropriate front-end could be built to run locally or through a browser. However, on this type of architecture the internet connection to the data source becomes a bottleneck. If you have large data sets then uploading these to the service provider becomes a problem. It may be quicker to process a large dataset locally than to upload it through an internet connection.
I would advise not to bother with optimizations yet. First try to get a working prototype, then find out where computation time is spent. You can probably move the biggest bottlenecks out into C or Fortran when and if needed -- then see how much difference it makes.
As they say, often 90% of the computation is spent in 10% of the code.

Resources