I'm trying to understand why adding traits to Dart would cause the shape of objects in memory to change, and am therefore curious how Dart loads code right now.
Dart is a dynamically typed language whose VM compiles machine code straight from source, with no intermediate bytecode step. There is no generic bytecode layer (like JVM bytecode or LLVM bitcode); Dart source is compiled directly into machine code.
I would add that despite compiling straight to machine code, the language itself is not designed in a way that would allow a C/C++-style compiler to generate fast, efficient code. This is by design, as Dart seems to be an attempt to fill the gap between JavaScript and Java rather than the gap between Java and C/C++. Dart addresses many issues that make JavaScript hard to optimize, most importantly the typing of numeric variables.
There are some efforts to port the Dart environment to various platforms beyond Windows/Mac/Linux, but I have yet to see an actual straight-to-machine-language compiler for Dart. That doesn't mean they don't exist; I just haven't seen anything other than ports of the Linux Dart environment onto BeagleBoard and other small Linux distros.
From the Dart FAQ:

Q. Why didn't Google build a bytecode VM targetable by multiple languages including Dart?

Each approach has advantages and disadvantages, but we feel that in the context of Dart it made sense to build a language-specific VM for the following reasons:

Google already works on a multi-language bytecode: LLVM bitcode in PNaCl.

Even if a bytecode VM is specialized for Dart, a language VM will be simpler and faster because it can work under stronger assumptions, for instance, a structured control flow. These assumptions make the implementation cleaner and optimizations easier.

A general-purpose bytecode VM would be even larger and slower, as it generalizes assumptions and adds functionality that for Dart is dead code: for example, multithreading with a shared heap.

No bytecode VM is truly general-purpose; they all make assumptions that privilege some class of languages. A language VM leaves more room to improve the VM and make deep changes to optimization of the language. Some Dart engineers wrote an article talking about the VM question in more detail.
There is also a pretty good presentation on this topic: Compiling Dart to Efficient Machine Code.
I want to develop an application that can retrieve information such as DLL version, DLL build mode (debug or release), information about the OS, memory, processor, processes/threads, program version, etc. I am developing this mainly for Windows, but it would be good if the application supported Linux too (wherever applicable).
I am basically a Java programmer, and I know C and C++ to some extent.
Which programming language should I go for that would make my job easy? That is, which language has APIs to fetch this kind of information?
Well... APIs are available regardless of the language... but the easiest way to get at what you are trying to do is going to be a C or C++ app. That doesn't mean it'll be easy: getting a DLL version is easy, and getting memory and processor type is easy; the other stuff is certainly possible, but you may have to roll up your sleeves and learn the Win32 API.
You might want to take a look at an application that already does exactly what you are asking about (Process Explorer) before you try to develop this yourself... It's going to be a big undertaking, and the folks at Sysinternals are really, really good at this stuff and have already done it.
You commented on Kevin Day's answer that you would prefer to use Java for this.
Java is not very well suited for this, because the information you want to get is very platform-specific, and since Java is designed to be platform-independent, there are not a lot of ways to get at this kind of information from Java.
There are some methods in classes java.lang.System and java.lang.Runtime to get information about the platform that your Java program is running on. For example, class Runtime has a method availableProcessors() that tells you how many processors are available to the Java virtual machine. Note that this is not the same as the number of processors (or cores) that exist in the computer; the documentation even says that the number may change while the program is running.
Look up the documentation for java.lang.System and java.lang.Runtime for more information.
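As a rough illustration, here is a minimal sketch of the kind of platform information those two classes expose (the class name PlatformInfo is just for the example):

    public class PlatformInfo {
        public static void main(String[] args) {
            Runtime rt = Runtime.getRuntime();
            // Processors available to the JVM; may change while the program runs
            System.out.println("Available processors: " + rt.availableProcessors());
            // Heap figures describe the JVM itself, not the machine as a whole
            System.out.println("Free JVM memory:  " + rt.freeMemory());
            System.out.println("Total JVM memory: " + rt.totalMemory());
            System.out.println("Max JVM memory:   " + rt.maxMemory());
            // Standard system properties describing the underlying platform
            System.out.println("OS: " + System.getProperty("os.name")
                    + " " + System.getProperty("os.version")
                    + " (" + System.getProperty("os.arch") + ")");
            System.out.println("Java version: " + System.getProperty("java.version"));
        }
    }

Note how everything here is reported relative to the JVM, which is exactly the limitation described above.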
Most likely you're not going to get exactly the information that you need by using pure Java; C or C++ will be better suited to get this kind of platform-specific information. If you need this information from a Java program, you could write a small DLL or shared library and use JNI to call into it from your Java program.
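A minimal sketch of the Java side of such a JNI bridge (the library name sysinfo and the method physicalMemoryBytes are hypothetical; the actual value would be computed in the C/C++ code compiled into sysinfo.dll or libsysinfo.so):

    public class NativeSysInfo {
        // Implemented in native code; the C header is generated with javah / javac -h
        public static native long physicalMemoryBytes();

        static {
            // Loads sysinfo.dll on Windows or libsysinfo.so on Linux
            System.loadLibrary("sysinfo");
        }

        public static void main(String[] args) {
            System.out.println("Physical memory: " + physicalMemoryBytes() + " bytes");
        }
    }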
Since DLLs are mentioned I presume we are talking about Windows.
I would recommend using WMI queries. They look very much like SQL and give you access to many very useful classes.
e.g., all info about the OS can be found in the Win32_OperatingSystem class:
http://msdn.microsoft.com/en-us/library/aa394239(VS.85).aspx
You can use WMI classes from any language including C++.
As a side note: if you are starting a new application from scratch, consider using PowerShell, Microsoft's scripting language.
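Since the asker is a Java programmer, one way to get at WMI without writing native code is to run a WQL query through PowerShell's Get-CimInstance cmdlet from Java. A minimal sketch (Windows-only, and assuming powershell.exe is on the PATH):

    import java.io.BufferedReader;
    import java.io.InputStreamReader;

    public class WmiQuery {
        public static void main(String[] args) throws Exception {
            // Roughly equivalent to the WQL query:
            //   SELECT Caption, Version, OSArchitecture FROM Win32_OperatingSystem
            ProcessBuilder pb = new ProcessBuilder(
                    "powershell.exe", "-NoProfile", "-Command",
                    "Get-CimInstance Win32_OperatingSystem | "
                    + "Select-Object Caption, Version, OSArchitecture | Format-List");
            pb.redirectErrorStream(true);
            Process p = pb.start();
            try (BufferedReader r = new BufferedReader(
                    new InputStreamReader(p.getInputStream()))) {
                String line;
                while ((line = r.readLine()) != null) {
                    System.out.println(line);
                }
            }
            p.waitFor();
        }
    }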
We are planning to develop a data mining package for Windows. The program core / calculation engine will be developed in F#, with GUI stuff / DB bindings etc. done in C# and F#.
However, we have not yet decided on the model implementations. Since we need high performance, we probably can't use managed code for them (any objections?). The question is: is it reasonable to develop the models in FORTRAN, or should we stick to C (or maybe C++)? We are looking into using OpenCL at some point for suitable models; it feels funny having to go from managed code -> FORTRAN -> C -> OpenCL invocation in these situations.
Any recommendations?
F# compiles to the CLR, which has a just-in-time compiler. It's a dialect of ML, which is strongly typed, allowing all of the nice optimisations that go with that type of architecture; this means you will probably get reasonable performance from F#. For comparison, you could also try porting your code to OCaml (IIRC this compiles to native code) and see if that makes a material difference.
If it really is too slow, then see how far scaling the hardware will get you. With the performance available through a modern PC or server, it seems unlikely that you would need to go to anything exotic unless you are working with truly Brobdingnagian data sets. Users with smaller data sets may well be OK on an ordinary PC.
Workstations give you perhaps an order of magnitude more capacity than a standard desktop PC. A high-end workstation like an HP Z800 or XW9400 (similar kit is available from several other manufacturers) can take two 4- or 6-core CPUs and tens of gigabytes of RAM (up to 192 GB in some cases), and has various options for high-speed I/O like SAS disks, external disk arrays or SSDs. This type of hardware is expensive but may be cheaper than a large body of programmer time. Your existing desktop support infrastructure should be able to handle this sort of kit. The most likely problem is compatibility issues running 32-bit software on a 64-bit O/S, in which case you have various options like VMs or KVM switches to work around the compatibility issues.
The next step up is a 4- or 8-socket server. Fairly ordinary Wintel servers go up to 8 sockets (32-48 cores) and perhaps 512 GB of RAM, without having to move off the Wintel platform. This gives you a fairly wide range of options within your platform of choice before you have to go to anything exotic.
Finally, if you can't make it run quickly in F#, validate the F# prototype and build a C implementation using the F# prototype as a control. If that's still not fast enough you've got problems.
If your application can be structured in a way that suits the platform, then you could look at a more exotic platform. Depending on what will work with your application, you might be able to host it on a cluster or cloud provider, or build the core engine on a GPU, Cell processor, or FPGA. However, in doing this you're taking on (quite substantial) additional costs and exotic dependencies that might cause support issues. You will probably also have to bring in a third-party consultant who knows how to program the platform.
After all that, the best advice is: suck it and see. If you're comfortable with F# you should be able to prototype your application fairly quickly. See how fast it runs and don't worry too much about performance until you have some clear indication that it really will be an issue. Remember, Knuth said that premature optimisation is the root of all evil about 97% of the time. Keep a weather eye out for issues and re-evaluate your strategy if you think performance really will cause trouble.
Edit: If you want to make a packaged application then you will probably be more performance-sensitive than otherwise. In this case performance will probably become an issue sooner than it would with a bespoke system. However, this doesn't affect the basic 'suck it and see' principle.
For example, at the risk of starting a game of buzzword bingo, if your application can be parallelized and made to work on a shared-nothing architecture you might see if one of the cloud server providers [ducks] could be induced to host it. An appropriate front-end could be built to run locally or through a browser. However, on this type of architecture the internet connection to the data source becomes a bottleneck. If you have large data sets then uploading these to the service provider becomes a problem. It may be quicker to process a large dataset locally than to upload it through an internet connection.
I would advise not to bother with optimizations yet. First try to get a working prototype, then find out where computation time is spent. You can probably move the biggest bottlenecks out into C or Fortran when and if needed -- then see how much difference it makes.
As they say, often 90% of the computation is spent in 10% of the code.
For example, I'm running into developers and architects who are scared to death of Rails apps, but love the idea of writing new Grails apps.
From what I've seen, there is a LOT of resource overhead that goes into using the JVM to support languages such as Groovy, JRuby and Jython instead of straight Ruby or Python.
Ruby and Python can both be interpreted on just about any OS, so I don't see any "write once run anywhere" advantage... why bring the hulking JVM along with you?
Java is a much, much more mature platform, with a lot of existing class libraries that could be "dropped in" and used, than, say, Ruby or Python (or even Perl, for that matter). So for people who like using existing code, rather than writing everything themselves, Java is a huge win.
For example, recently I've been looking for something like JAXB for Python or Ruby. In the end, I ended up using JRuby just because I haven't found any mature, widely-used XML-binding libraries.
The huge advantage of writing code (in any language) for the JVM is that it's usually very easy to tap into the enormous wealth of mature Java libraries out there, if necessary.
And I don't know where you got this idea of a "hulking" JVM with a huge resource overhead. The JIT tends to produce code that is quite fast, and the core JVM is anything but huge by today's standards. It does tend to have a huge memory footprint when running, but that's because modern machines have a lot of RAM and the GC works best when it has a lot of RAM to play with. If desired, the GC can be fine-tuned to hell and back to be more conservative.
As someone else put it: "The best thing about Groovy is that I don't have to use Java. The second best thing about Groovy is that I can use Java".
An assumption that seems to be built into the question is that new projects are greenfield projects. Many organizations have made a huge investment in Java over the last decade+ and require any new project to work within the existing (internal) code ecosystem. As pointed out, there's a huge bonus in all the publicly available Java libraries (whether free/OSS or commercial), but the need to work with existing code and even as a component within an existing system is at least as important (if not more so) to large organizations.
A lot also comes down to the maturity and capability of the platform, which is to say the JVM and everything that comes with it (the entire Java ecosystem). A few examples off the top of my head:
You can plug a remote debugger into a running JVM and get all kinds of information about a running application that is simply impossible to get with Python, Ruby, etc. Going a step further, there's JMX, a standard way to write code so that objects can be monitored and even tweaked in a live application (a minimal MBean sketch follows these examples). Take a look at JConsole and see if you don't drool just a little (despite the ugliness of the interface).
Going even further in this direction, there's OSGi, a standard for writing highly modular code that can be deployed, started, stopped, and even upgraded in a live application. With OSGi you break a large application into many smaller "bundles" which can then be maintained (deployed, started/stopped, upgraded) separately. This is a really big deal in large applications, or any applications that need to remain running at all times.
The platform has very good support for asynchronous, reliable messaging. You get JMS as a baseline, and many excellent and powerful libraries built on it for doing complicated things with very little code (cf. Apache Camel, ServiceMix, Mule, and many others). This is another feature that's extremely valuable in larger applications or those which must run within a larger code universe.
The JVM has real (OS-level) threading, while Python et al. are very limited in this regard (notoriously so). (That being said, shared state concurrency -- threading -- is the wrong approach; cf. Erlang, Alice, Mozart/Oz, etc.)
There are numerous JVM choices beyond the standard Sun implementations, like JRockit, IBM's JVM, etc. This is a developing area with other languages: Python has Jython, IronPython, even PyPy and Stackless; Ruby has JRuby, Rubinius, and others. But as good as these are, they can't match the maturity found in the various JVM offerings.
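To make the JMX point concrete, here is a minimal sketch of a standard MBean (the RequestStats bean and its counter are invented for the example). Once registered, JConsole can read the attribute and invoke reset() on the live process:

    import java.lang.management.ManagementFactory;
    import javax.management.MBeanServer;
    import javax.management.ObjectName;

    public class JmxDemo {
        // Standard MBean convention: management interface named <Class>MBean
        public interface RequestStatsMBean {
            long getRequestCount();
            void reset();
        }

        public static class RequestStats implements RequestStatsMBean {
            private volatile long requestCount;
            public long getRequestCount() { return requestCount; }
            public void reset() { requestCount = 0; }
            void increment() { requestCount++; }
        }

        public static void main(String[] args) throws Exception {
            RequestStats stats = new RequestStats();
            MBeanServer server = ManagementFactory.getPlatformMBeanServer();
            // Register under a name JConsole will show under the "demo" domain
            server.registerMBean(stats, new ObjectName("demo:type=RequestStats"));
            while (true) {            // simulate work so there is something to watch
                stats.increment();
                Thread.sleep(1000);
            }
        }
    }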
All that being said, I really don't like Java the language and avoid it as much as possible. These days with all the excellent alternative languages for the JVM I don't have to. Groovy gets my vote for its accessibility and tight integration with the platform (and even the language), and because of Grails, which I sometimes like to call "Rails for grownups". I like other JVM languages better, particularly Clojure and Scala, but these aren't as accessible to the average programmer. Scala is popping up a lot lately, though, especially thanks to its high profile use at Twitter, so there's hope for interesting and truly excellent languages making it in larger environments. But that's another topic.
why bring the hulking JVM along with you?
The JVM isn't bloated, nor is it slow. On the contrary, it's a lean, fast, deeply optimized VM. Unfortunately, it's optimized for static OOP languages.
Still, good compilers targeting the JVM do create good-performing programs. I don't know about JRuby, but Jython's goal is to be all-around faster than regular CPython, and they're getting close (it's already faster for several important use cases).
Remember that a good JIT (like those for the JVM) can apply some optimizations unavailable to static C compilers, so getting faster code from it isn't a pipe dream. Of course, a VM optimized for your language should be faster than a not-really-generic VM like the JVM; but there's the maturity issue: the JVM has had a lot of work done on it, while the JITs for Ruby and Python aren't anywhere near that.
Unfortunately, there doesn't seem to be any better generic bytecode VM. Microsoft's CLI suffers from similar limitations as the JVM (IronPython is much slower and heavier than Jython). The best candidate seems to be LLVM. Does anybody know why there aren't more dynamic languages on LLVM? I've seen a couple of Scheme compilers, but they seem to have several problems.
Groovy is NOT an interpreted language; it is a dynamic language. The Groovy compiler produces JVM bytecode that runs inside the JVM just like any other Java class. In this sense, Groovy is just like Java, simply adding syntax to the Java language that is meaningful only to developers and not to the JVM.
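To illustrate how seamlessly the two mix, here is a minimal sketch of evaluating Groovy from plain Java via GroovyShell (assuming the Groovy runtime jar is on the classpath):

    import groovy.lang.GroovyShell;

    public class GroovyFromJava {
        public static void main(String[] args) {
            // The Groovy snippet is compiled to ordinary JVM bytecode and
            // the result comes back as a plain java.lang.Object
            GroovyShell shell = new GroovyShell();
            Object result = shell.evaluate("[1, 2, 3].collect { it * it }.sum()");
            System.out.println(result); // prints 14
        }
    }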
Developer productivity and the ease and flexibility of its syntax make Groovy attractive to the Java ecosystem; Ruby or Python would be just as attractive if they compiled to Java bytecode (see Jython).
Java developers are not really scared of Ruby; as a matter of fact, many quickly embrace Groovy or Jython, which are close to Ruby and Python. What they don't care for is leaving such an amazing platform (Java) for a less performant, less scalable, and even less used language such as Ruby (for all its merits).
The big knock on RoR is that it isn't scalable and is hard to deploy. By using the Java platform, you can leverage your existing infrastructure.
    grails war
Produces a WAR file that is easily deployed on GlassFish, JBoss, etc.
Ruby and Python can both be interpreted on just about any OS, so I don't see any "write once run anywhere" advantage... why bring the hulking JVM along with you?
Mostly because you want to take advantage of the HUGE existing ecosystem of Java libraries, APIs and products, which dwarfs anything available for Ruby or Python, especially in the enterprise domain.
Also, keep in mind that JRuby and Jython are faster in a lot of benchmarks than the regular C implementations of those languages, especially Ruby (even Ruby 1.9).
Having multiple languages targeting the same virtual machine has a lot of benefits, such as leveraging a common infrastructure, code reuse, shared APIs, the ability to use whatever language is conceptually best for you, or for a specific problem domain, etc.
The same thing happens in the .NET space, with multiple languages targeting the CLR. The Parrot VM project (vaporware) also aims at the same thing, and it's a stated goal of the LLVM project too.
The reason is HotSpot.
It is an engineering tour de force.
The other reason not many have mentioned is the existing infrastructure around the JVM: if you already have a server running Java stuff, why not use it instead of bringing in yet another platform (like Rails)?
I've encountered this and also been baffled by it, and here's my theory.
Enterprise software is full of Java programmers. Like programmers of all stripes, many Java programmers are convinced that their language is the fastest, the most flexible and the easiest to use--they're not too familiar with other languages but are convinced that those who practice them must be savages and barbarians, because any enlightened person would, of course, use Java.
These people have built vast, complicated Java infrastructures: Rube Goldberg machines of frameworks and auto-generated code, full of byzantine inheritance structures and very, very large XML files.
So, when someone comes along and says "Hey! Let's use a C-based interpreted language! It's fast and has neat libraries and is much quicker for scripting and prototyping!", the Java guy is firstly like "I have to run a makefile to configure this? QUELLE HORREUR!" Then the reality of having to deploy and host this on servers that are running dated OSes and dated versions of Tomcat and nothing else starts to set in.
"Hey, I know! There's a java version of this interpreted language! It may break down in the fast lane on the bridge in rush-hour, and it sometimes catches on fire, but I can get Tomcat to run it. I don't have to dirty my hands with learning non-java stuff, and I can shoehorn it into the existing infrastructure! Win!"
So, is this the "right" reason for choosing a Java implementation of a scripting language? Probably not; it depends on your definition of "right". But I suspect that it's the reason they're chosen more often than snobs like me would like to believe.
The same way DOS morphed into Windows?
We seem to have ended up supporting and developing for three platforms from Microsoft, and I'm not sure where the boundaries are supposed to lie.
Why can't the benefits of the CLR (such as type safety, memory protection, etc.) be built into Windows itself?
Or into the browser? Why an entirely separate virtual machine? (How many levels of virtual machine indirection are we dealing with now? We just added Silverlight, and before that Flash, running inside the browser, running inside maybe a VM install...)
I can see raw Windows for servers, but why couldn't there be a CLR for workstations talking directly to the hardware (or at least not the whole Windows legacy ball and chain)?
(Oops, I've got two questions here. Let's make this: why can't .NET be built into Windows? I understand about backward compatibility, but the safety of what's in .NET could be at least optionally in Windows itself, couldn't it? It would just be yet another of many sets of APIs?)
Factoid: I recall that one of the competitor architectures selling against MS-DOS on the IBM PC was the UCSD Pascal runtime, a VM.
And let's not forget that DOS didn't morph into Windows, at least not the Windows we know and love today. DOS was the operating system, Windows 3.1 a GUI shell resting atop said operating system.
When Windows 95 came out, it is true that there was no more boxed product labeled "Microsoft DOS," but Windows 95, architecturally, was DOS 7.0 with a GUI shell resting atop.
This continued through Win98 and WinME (aka Win9X).
The Windows we know today (XP, Vista, 2003, 2008) has its core from the Windows NT project, a totally separate beast. (Although NT was designed to be compatible with 3.1, and later, 9x binaries, and used a near-identical but expanded API.)
DOS no more morphed into the Windows we are familiar with than the original Linux kernel morphed into KDE.
The two APIs will need to continue to coexist as long as there are products built natively against Windows which are still in a support cycle. Considering that the Windows API still exists in Windows Server 2008 and Windows 7, that means at least 2017. Truthfully, it will probably be longer, because while managed code is a wonderful thing, it is not always the most appropriate/best answer.
Plus ... As a programmer, you ought to know better than anyone: It's never as easy to do something as it might appear from the outside!
Windows is many millions of lines of code, most of it in C. This represents an enormous, decades-long investment. It is constantly being maintained (fixed) for today's users. It would be completely impossible to stop the world while they rewrite every line in C# over ten years, then debug and optimise for another ten, without totally wrecking their business.
Some of the existing code could in theory be compiled to run on the CLR, but it would gain no benefit from doing so. Compiling a large subset of C to the CLR is possible (using the C++/CLI compiler) but it does not automatically enable garbage collection, for example. You have to redesign from the ground up to get that.
Well, for one the CLR isn't an operating system. That's a pretty big reason why not ... I mean even the research OS, Singularity, is not just the CLR. I think you should read up on some books about the Windows kernel and general operating system stuff.
Microsoft are still a few Windows releases away from that.
But they would start with something like Singularity I think.
Because it would break backwards compatibility? And because mainstream chip architectures don't line up with VM architectures? They made hardware for a Java VM a while ago, but nobody cared.
The biggest issue I see is that the CLR runs on a VM, and the VM is useful as a layer of abstraction. Some .NET apps can be run on Linux (see the Mono project, I think they are up to .NET 2 compatibility now), so that would all be gone. In C/C++ or languages that directly talk to the hardware, you have to recompile your code into different binaries for every OS and hardware architecture. The point of having the VM there is to abstract that, so that you can write the code, build it, and use the exact same binary anywhere. If you look at it from a Java perspective, they have done a much better job of using their VM as a "write once run anywhere" model. The same java classes will run on Windows, Mac, and Linux without rebuild (by the programmer anyway, technically the VM is doing that work).
I think the #1 point here is that .NET/CLR is NOT Windows specific, and IMO Microsoft would only help the .NET suite of languages if it put a little more effort toward cross-OS compatibility.
Because Microsoft has got a huge legacy that it cannot simply drop. Companies have invested lots of money in Windows and Win32 software that they cannot dismiss.
The CLR or some other VM may be used (VMs are being used) to run an OS on top of it. But then the question is: what should one use to build the VM itself? Probably C/C++ or some other similar language, and (most) probably assembly in some cases to speed things up.
That would mean the VM would still have the problems that Windows (or any OS) faces now. As pointed out by others, some parts of the OS and related applications could be ported (or, as you said, morphed) to run on top of the VM, but getting the entire OS on top of a VM doesn't serve much purpose. The reason is that the VM would then be the real OS, implementing garbage collection and other protective measures for the morphed OS.
Those are my two cents. :)
What language would the CLR itself use? What APIs would it call? Say it needed to open a file, allocate memory, or create a process; do you think the CLR is going to do that itself? The CLR is built on top of native code. A managed OS would create overhead.
The CLR is for app development; it is there to make it easy to build apps and to write less buggy software. It uses a garbage collector, and garbage collectors can be great, but they can also destroy performance; you usually end up with some kind of performance problem during development caused by garbage collection.
They must make it backward compatible, so they must give it some kind of native API.
If you're saying let's make a pure 100% managed OS and forget backward compatibility, or have some giant compatibility layer, all you're really saying is let's force a garbage collector onto everything, right? Besides a garbage collector and the portability guarantees you get by being CLI-compliant, what are you getting? The algorithms and everything else are still compiled into native code by the time they execute, so the only really significant difference is the memory management.
I actually have seen signs that the CLR is being planted deeper into the software stack; I remember seeing, in the newest Windows software stack, some CLR-related libraries planted into lower levels.
But the CLR won't morph into Windows; we know backward compatibility is very important for the software ecosystem.