And if so, how. I'm talking about this 4GB Patch.
On the face of it, it seems like a pretty nifty idea: on Windows, each 32-bit application normally only has access to 2GB of address space, but if you have 64-bit Windows, you can enable a little flag to allow a 32-bit application to access the full 4GB. The page gives some examples of applications that might benefit from it.
HOWEVER, most applications seem to assume that memory allocation is always successful. Some applications do check if allocations are successful, but even then can at best quit gracefully on failure. I've never in my (short) life come across an application that could fail a memory allocation and still keep going with no loss of functionality or impact on correctness, and I have a feeling that such applications range from extremely rare to essentially non-existent in the realm of desktop computers. With this in mind, it would seem reasonable to assume that any such application would be programmed to not exceed 2GB memory usage under normal conditions, and those few that do would have been built with this magic flag already enabled for the benefit of 64-bit users.
So, have I made some incorrect assumptions? If not, how does this tool help in practice? I don't see how it could, yet I see quite a few people around the internet claiming it works (for some definition of works).
Your troublesome assumptions are these ones:
Some applications do check if allocations are successful, but even then can at best quit gracefully on failure. I've never in my (short) life come across an application that could fail a memory allocation and still keep going with no loss of functionality or impact on correctness, and I have a feeling that such applications range from extremely rare to essentially non-existent in the realm of desktop computers.
There do exist applications that do better than "quit gracefully" on failure. Yes, functionality will be impacted (after all, there wasn't enough memory to continue with the requested operation), but many apps will at least be able to stay running - so, for example, you may not be able to add any more text to your enormous document, but you can at least save the document in its current state (or make it smaller, etc.)
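As a minimal sketch of that "stay running" behaviour (not taken from any real editor): the `Document` class, its injectable `alloc` parameter and the `failing_alloc` helper below are all hypothetical names, introduced only so that an out-of-memory condition can be simulated safely.

```python
class Document:
    """Toy text buffer that survives a failed allocation (hypothetical example)."""

    def __init__(self, alloc=bytearray):
        # `alloc` is an injectable allocator so that failure can be simulated.
        self._alloc = alloc
        self.chunks = []

    def append(self, data: bytes) -> bool:
        """Try to add data; return False instead of crashing when memory runs out."""
        try:
            buf = self._alloc(data)
        except MemoryError:
            # The requested operation fails, but existing content is untouched.
            return False
        self.chunks.append(buf)
        return True

    def save(self) -> bytes:
        """Serialise whatever was stored successfully."""
        return b"".join(bytes(c) for c in self.chunks)


def failing_alloc(data):
    # Stand-in for an allocator that has run out of address space.
    raise MemoryError("simulated out-of-memory")
```

Once the allocator starts failing, `append` returns False, but `save` still returns everything that was stored before the failure.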
With this in mind, it would seem reasonable to assume that any such application would be programmed to not exceed 2GB memory usage under normal conditions, and those few that do would have been built with this magic flag already enabled for the benefit of 64-bit users.
The trouble with this assumption is that, in general, an application's memory usage is determined by what you do with it. So, as over the past years storage sizes have grown, and memory sizes have grown, the sizes of files that people want to operate on have also grown - so an application that worked fine when 1GB files were unheard of may struggle now that (for example) high definition video can be taken by many consumer cameras.
Putting that another way: applications that used to fit comfortably within 2GB of memory no longer do, because people want to do more with them now.
I think the following extract from the 4GB Patch page you linked pretty much explains how and why it works.
Why things are this way on x64 is easy to explain. On x86 applications have 2GB of virtual memory out of 4GB (the other 2GB are reserved for the system). On x64 these two other GB can now be accessed by 32bit applications. In order to achieve this, a flag has to be set in the file's internal format. This is, of course, very easy for insiders who do it every day with the CFF Explorer. This tool was written because not everybody is an insider, and most probably a lot of people don't even know that this can be achieved. Even I wouldn't have written this tool if someone didn't explicitly ask me to.
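To make "a flag has to be set in the file's internal format" concrete, here is a rough sketch of reading and setting that bit. It assumes the standard PE layout (the `e_lfanew` field at offset 0x3C, then the "PE\0\0" signature, then the COFF header whose Characteristics field holds IMAGE_FILE_LARGE_ADDRESS_AWARE = 0x0020). The helper names and the fake header builder are made up for illustration, and whether the 4GB Patch tool does exactly this and nothing more is an assumption.

```python
import struct

IMAGE_FILE_LARGE_ADDRESS_AWARE = 0x0020  # bit in the COFF Characteristics field


def characteristics_offset(image: bytes) -> int:
    """Return the offset of the 16-bit Characteristics field in a PE image."""
    if image[:2] != b"MZ":
        raise ValueError("not an MZ executable")
    # e_lfanew (offset 0x3C) points at the "PE\0\0" signature.
    (pe_offset,) = struct.unpack_from("<I", image, 0x3C)
    if image[pe_offset:pe_offset + 4] != b"PE\0\0":
        raise ValueError("PE signature not found")
    # Signature (4 bytes) + 18 bytes of COFF header precede Characteristics.
    return pe_offset + 4 + 18


def is_large_address_aware(image: bytes) -> bool:
    (flags,) = struct.unpack_from("<H", image, characteristics_offset(image))
    return bool(flags & IMAGE_FILE_LARGE_ADDRESS_AWARE)


def set_large_address_aware(image: bytes) -> bytes:
    """Return a copy of the image with the flag set."""
    patched = bytearray(image)
    offset = characteristics_offset(image)
    (flags,) = struct.unpack_from("<H", patched, offset)
    struct.pack_into("<H", patched, offset, flags | IMAGE_FILE_LARGE_ADDRESS_AWARE)
    return bytes(patched)


def make_fake_pe(flags: int = 0x0102) -> bytes:
    """Build a header-only byte string for demonstration (not a runnable exe)."""
    pe_offset = 0x80
    image = bytearray(pe_offset + 4 + 20)
    image[:2] = b"MZ"
    struct.pack_into("<I", image, 0x3C, pe_offset)
    image[pe_offset:pe_offset + 4] = b"PE\0\0"
    struct.pack_into("<H", image, pe_offset + 4 + 18, flags)
    return bytes(image)
```

For a real file you would read the bytes from disk, check `is_large_address_aware`, and write the patched copy to a new file rather than overwriting the original.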
And to expand on CFF,
The CFF Explorer was designed to make PE editing as easy as possible,
but without losing sight on the portable executable's internal
structure. This application includes a series of tools which might
help not only reverse engineers but also programmers. It offers a
multi-file environment and a switchable interface.
And to quote a Microsoft insider, Larry Miller of Microsoft MCSA on a blog post about patching games using the tool,
Under 32 bit windows an application has access to 2GB of VIRTUAL
memory space. 64 bit Windows makes 4GB available to applications.
Without the change mentioned an application will only be able to
access 2GB.
This was not an arbitrary restriction. Most 32 bit applications simply
can not cope with a larger than 2GB address space. The switch
mentioned indicates to the system that it is able to cope. If this
switch is manually set most 32 bit applications will crash in 64 bit
environment.
In some cases the switch may be useful. But don't be surprised if it
crashes.
And finally to add from MSDN - Migrating 32-bit Managed Code to 64-bit,
There is also information in the PE that tells the Windows loader if
the assembly is targeted for a specific architecture. This additional
information ensures that assemblies targeted for a particular
architecture are not loaded in a different one. The C#, Visual Basic
.NET, and C++ Whidbey compilers let you set the appropriate flags in
the PE header. For example, C# and Visual Basic .NET have a /platform:{anycpu,
x86, Itanium, x64} compiler option.
Note: While it is technically possible to modify the flags in the PE header of an assembly after it has been compiled, Microsoft does not recommend doing this.
Finally to answer your question - how does this tool help in practice?
Since you have malloc in your tags, I believe you are working with unmanaged memory. Note that the patch does not change the size of pointers or other primitive datatypes: the process stays 32-bit. What changes is that allocations can now return addresses above 2GB, so code that treats pointers as signed integers, or uses the top bit for its own purposes, may end up with invalid pointers.
For managed code, where memory is handled by the CLR in .NET, this can be really helpful and should not cause many problems unless you are dealing with any of the following:
Invoking platform APIs via p/invoke
Invoking COM objects
Making use of unsafe code
Using marshaling as a mechanism for sharing information
Using serialization as a way of persisting state
To summarize: as a programmer, I would not use the tool to convert my own application; I would rather migrate it myself by changing build targets. That said, if I had an exe that could do well with more RAM, such as a game, then this would be worth a try.
I understand from other posts here that "IMAGE_FILE_LARGE_ADDRESS_AWARE" may work to effectively expand memory availability in e.g. Delphi 2007.
I don't get this to work in Delphi6, is this indeed the case, or should it work? Or is there an alternative command that does the same thing?
If not, I may need to migrate to a later version of Delphi. Then, does anyone know what the most recent version of Delphi is that would easily allow me to migrate my existing code (ideally, my existing code, which is fairly simple Turbo Pascal-type code, would just work as is) AND would support the "IMAGE_FILE_LARGE_ADDRESS_AWARE" 'trick' to expand memory?
Many thanks!
Remco
You can apply the IMAGE_FILE_LARGE_ADDRESS_AWARE PE flag to a Delphi 6 application, but you must beware of the following issues:
The default memory manager for Delphi 6, the Borland memory manager, does not support memory allocations with addresses above 2GB. You must replace the memory manager with one that supports large addresses, for instance FastMM.
Your code may well contain pointer truncation bugs that will need to be found and fixed.
The same goes for any third party software that you use. This includes the Borland RTL and VCL libraries. I did not encounter many problems with these libraries, but it may be that your program uses different parts of the runtime libraries that have pointer truncation bugs.
In order to stress test your program under large address conditions you should turn on top down memory allocation. Do not be surprised if your anti-malware software (or indeed other system level software) has to be disabled whilst you operate in top down memory allocation mode. This type of software is notoriously poor at operating in top down memory allocation mode.
Finally, it is worth pointing out that large address aware cannot solve all out of memory problems. All it does is open up the top half of the 32 bit address space. Your program might require even more address space than that. In which case you'd need to either re-design your program, or move to a 64 bit compiler.
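To illustrate what a pointer truncation bug looks like, here is a small sketch in Python rather than Delphi. It assumes the typical failure mode is code that casts a pointer to a signed 32-bit integer before comparing or doing arithmetic on it; the function names and the two addresses are made up for the example.

```python
import ctypes


def as_signed_32(address: int) -> int:
    """Reinterpret a 32-bit address as a signed integer, as buggy code might."""
    return ctypes.c_int32(address).value


def buggy_is_before(a: int, b: int) -> bool:
    # Bug: compares addresses as signed 32-bit integers.
    return as_signed_32(a) < as_signed_32(b)


def correct_is_before(a: int, b: int) -> bool:
    # Addresses are unsigned quantities.
    return a < b


low = 0x7FFF0000   # below the 2GB line
high = 0x80010000  # above the 2GB line, only reachable when large address aware
```

With both addresses below 2GB the two comparisons agree, which is why such bugs stay hidden until the flag is set. Once `high` crosses the 2GB line its signed value goes negative and `buggy_is_before(low, high)` gives the wrong answer.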
If I compile my entire Delphi application to a single exe, that file will grow to 5MB, 10MB, maybe more. When is that too big? What are the issues with this? This is a commercial application, currently on Delphi XE.
I'm aware of the option to Build with Runtime Packages. That sounded like a good idea, but I see comments here noting that there are some issues and disadvantages.
A Delphi application is never really too big.
However the larger the exe is, the harder it will be to redistribute the file.
Also if the executable is located on a network-disk start-up time may suffer.
A number of factors make the exe grow:
enabling debug info (will more or less double the exe size). Disable the inclusion of debug info in the final exe (see screenshot above).
including bitmaps (in an imagelist or likewise component) will also grow the exe substantially.
including resources (using a custom *.res file) will grow the size.
I would advise against putting resources in a separate dll.
This will complicate your application, whilst not reducing the loading time and distribution issues.
Turning off debug info in production code is a must.
If you have a Delphi-2010 or newer you can choose to include images in the png format.
This will take up much less space than old-skool bitmaps.
As long as your app is below 30 MB I would not really worry overmuch about the file size though.
Strip RTTI info
David suggests stripping RTTI info (this will disable live-bindings and some other advanced stuff), see: Reduce exe file
According to David it saves about 30% in exe size.
Exe-size will only increase loading time
Far more important is the amount of data your application allocates as storage.
The amount of space you use (or waste) here will have a far greater impact on the performance of your application than the raw exe size.
Strategy or tools to find "non-leak" memory usage problems in Delphi?
A better way to optimize is to make sure you don't leak resources
How to activate ReportMemoryLeaksOnShutdown only in debug mode?
Windows API calls memory leak detection
Use smart datastructures and algorithms
It gets too general to really narrow it down here, but use algorithms with O(slowly increasing) over O(wasteful increase).
Big-O for Eight Year Olds?
And try and limit memory usage by only fetching the data that you need instead of all the data you might need but probably never will.
Delphi data structures
Etc etc.
I don't know of any issues with the exe size of an application. I'm currently working on an application where the exe is around 60MB and there is no problem.
The only limitation I know of is the amount of available memory. An application that uses runtime packages will consume more working memory, because all runtime packages are loaded when the application starts, and the packages contain a lot of code which is probably not used in your application.
I really like the idea of runtime packages but I don't like the implementation in Delphi. One main disadvantage is that you have to ship your app with a bunch of packages, which makes it hard to maintain.
Use a RELEASE build to reduce the executable size and increase performance. Runtime packages also reduce the exe size, but they increase the size of the setup package.
I am new to computer architecture. Can somebody help me understand how a limited number of registers is used when processing several complex applications? My question is this: there is a fixed number of registers of interest to the applications programmer (for example, the 80386 contains a total of sixteen).
What happens if we want more registers (for example, to accommodate an increased stack size)? Are the addresses and data from the registers written back to main memory? In a multitasking environment, are the register data and addresses of different applications moved to main memory and back to the registers for processing?
Do operating systems have special registers which do not interfere with the applications' general purpose registers?
Can you also suggest any good resources for understanding such concepts for starters?
Registers are the fastest memory in a computer. The instruction set of any particular CPU is written specifically for its register architecture. You are right that data and addresses must be saved to memory when more values are in use than there are registers to hold them.
As far as a multitasking system goes, the scheduler generally has to save the execution context between tasks. This context involves the current state of the registers as well as other status bits (depending on the cpu).
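A toy model of that save/restore step may help. It is only a sketch: the register list, the `Task` and `ToyCpu` classes and the `switch_to` method are invented for illustration, and a real scheduler does this in a few assembly instructions and also handles status bits, stacks and privilege levels.

```python
from dataclasses import dataclass, field

REGISTER_NAMES = ("eax", "ebx", "ecx", "edx", "esp", "eip", "eflags")


@dataclass
class Task:
    name: str
    # Saved copy of the register file, kept in memory while the task is not running.
    saved_context: dict = field(
        default_factory=lambda: dict.fromkeys(REGISTER_NAMES, 0)
    )


class ToyCpu:
    def __init__(self):
        # The one and only physical register file, shared by all tasks.
        self.registers = dict.fromkeys(REGISTER_NAMES, 0)
        self.current = None

    def switch_to(self, task: Task) -> None:
        """Save the outgoing task's registers to memory, then load the incoming ones."""
        if self.current is not None:
            self.current.saved_context = dict(self.registers)
        self.registers = dict(task.saved_context)
        self.current = task
```

Each task sees its own register values every time it runs, even though there is only one physical set of registers.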
A good first step would be to learn assembly programming. It is so close to the hardware that you will learn all of this stuff thoroughly. Once you have that, pick up an operating systems book to see how it is done at a higher level. Depending on your commitment (and curiosity), you could also read some of the source code for smaller real-time operating systems, such as FreeRTOS. Reading up on 8-bit microcontroller architectures is also nice, since they are simple. For example, AVR or HC08 are pretty straightforward architectures to learn. All of the info is free; you just have to read it.
Enjoy.
Right now I plan to test on 32-bit, 64-bit, Windows XP Home, Windows XP Pro, Windows Vista Home Basic, Windows Vista Ultimate, Windows 7 Home Basic, and Windows 7 Ultimate ... all with the latest service pack.
However, now I'm wondering if it's worthwhile to test on both AMD and Intel for all the listed scenarios above or would it be a waste of time?
Note: this is a security application for everyday average users.
My feeling is that this would only be worthwhile if you had lots of on-the-edge hand-coded assembly language or some kind of incredibly tight timings (which you're not going to meet with that selection of OS anyway).
If you're using off-the-shelf commercial compilers, then you can be reasonably sure they're going to generate code which runs on all the normal processors.
Of course, nobody could ever prove they didn't need to test on a particular platform, but I would think there are bigger causes of platform difference to worry about than CPU brand (all the various multi-core/hyperthreading permutations, for example, which might expose all your multithreaded code bugs in different ways)
Only if you're programming in assembly and use extended, vendor-specific instruction sets. But since AMD and Intel have cross-licensing agreements in place, this is more of a historic issue than a current one.
In every other case (e.g. using a high level language) it's the job of the compiler writers to ensure the code is x86 compliant and runs on every CPU.
Oh, and apart from the FDIV bug, processor vendors usually don't make mistakes.
I think you're looking in the wrong direction for testing scenarios.
Yes, it's possible that your code will work on Intel but not on AMD, or in Windows Vista Home but not in Windows Vista Professional. But unless you're doing something very closely tied to low-level programming in the first case, or to details of OS implementation in the second, the odds are small. You could say that it never hurts to test every conceivable scenario. But in real life there must be some limit on the resources available to you for testing. Testing on different processors or different OS's is, in most cases, not testing YOUR program, it's testing the compiler, the OS, or the processor. How much time do you have to spare to test other people's work? I think your time would be better spent testing more scenarios within your own code. You don't give much detail on just what your app does, but just to take one of my own examples, it would be much more productive to spend a day testing selling products our own company makes versus products we resell from other manufacturers, or testing sales tax rules for different states, or whatever.
In practice, I rarely even test deploying on Windows versus deploying on Linux, never mind different versions of Windows, and I rarely get burned on that.
If I was writing low-level device drivers or some such, that would be a different story. But normal apps? Don't waste your time.
Certainly sounds like it would be a waste of time to me - which language(s) are your programs written in?
I'd say no. Unless you are writing your application in assembler, you should be far enough removed from the processor to not need to worry about differences. The processors will support the Windows OS, whose APIs are what you are interfacing with (depending on the language). If you are using .NET, the ONLY foreseeable issue you will have is if you are using a version of the framework that those platforms don't support. Given that they are all XP or later, you should be fine. If you want to worry about something, make sure your application will play nicely with the Vista and later security model.
The question is probably "what are you testing". It is unlikely that any of the tests exercises something that would differ between AMD and Intel hardware platforms. Differences could be expected at driver level, but you do not seem to plan on testing your software against every bit of PC hardware available. Most probably there would be many more differences between Windows service pack levels than between AMD and Intel processors.
I suppose it's possible there is some functionality in your code that (whether you know it or not) takes advantage of some processing/optimization in one or the other that could have a serious effect on the outcome. Keyword possible.
I would say in general you're unlikely to have to worry about it. If you're going to do it on multiple machines anyway, mix it up on them. But I wouldn't stress out about it.
I would never run all of my regression tests on both AMD and Intel unless I had specifically fixed an issue unique to either one. That is what regression testing is.
Unit testing on the other hand... I wouldn't anticipate any difference. So again, I wouldn't bother running unit tests on both until I had actually seen an issue specific to either AMD or Intel.
If you rely on accurate / consistent floating point results, then yes, definitely.
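One reason floating point results can be fragile is that addition is not associative, so anything that changes the order of operations can change the last bits of the answer. The sketch below only shows that order sensitivity. Whether AMD and Intel parts actually differ for a given program is not something it demonstrates, and `results_match` is a hypothetical helper.

```python
import math


def sum_left_to_right(values):
    """Accumulate in the order given, the way a simple loop would."""
    total = 0.0
    for v in values:
        total += v
    return total


def results_match(a, b, rel_tol=1e-9):
    # Compare with a tolerance instead of expecting bit-identical output.
    return math.isclose(a, b, rel_tol=rel_tol)


values = [0.1, 0.2, 0.3]
forward = sum_left_to_right(values)
backward = sum_left_to_right(reversed(values))
```

Here `forward` and `backward` are not equal under `==`, yet `results_match` accepts them. If your application compares or persists exact floating point values, that is the kind of check worth running on more than one machine.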
We are planning to develop a data mining package for Windows. The program core / calculation engine will be developed in F#, with GUI stuff / DB bindings etc. done in C# and F#.
However, we have not yet decided on the model implementations. Since we need high performance, we probably can't use managed code here (any objections here?). The question is, is it reasonable to develop the models in FORTRAN or should we stick to C (or maybe C++). We are looking into using OpenCL at some point for suitable models - it feels funny having to go from managed code -> FORTRAN -> C -> OpenCL invocation for these situations.
Any recommendations?
F# compiles to the CLR, which has a just-in-time compiler. It's a dialect of ML, which is strongly typed, allowing all of the nice optimisations that go with that type of architecture; this means you will probably get reasonable performance from F#. For comparison, you could also try porting your code to OCaml (IIRC this compiles to native code) and see if that makes a material difference.
If it really is too slow then see how far scaling the hardware will get you. With the performance available through a modern PC or server it seems unlikely that you would need to go to anything exotic unless you are working with truly Brobdingnagian data sets. Users with smaller data sets may well be OK on an ordinary PC.
Workstations give you perhaps an order of magnitude more capacity than a standard desktop PC. A high-end workstation like a HP Z800 or XW9400 (similar kit is available from several other manufacturers) can take two 4 or 6 core CPU chips, tens of gigabytes of RAM (up to 192GB in some cases) and has various options for high-speed I/O like SAS disks, external disk arrays or SSDs. This type of hardware is expensive but may be cheaper than a large body of programmer time. Your existing desktop support infrastructure should be able to support this sort of kit. The most likely problem is compatibility issues running 32-bit software on a 64-bit O/S. In this case you have various options like VMs or KVM switches to work around the compatibility issues.
The next step up is a 4 or 8 socket server. Fairly ordinary Wintel servers go up to 8 sockets (32-48 cores) and perhaps 512GB of RAM, without having to move off the Wintel platform. This gives you a fairly wide range of options within your platform of choice before you have to go to anything exotic.
Finally, if you can't make it run quickly in F#, validate the F# prototype and build a C implementation using the F# prototype as a control. If that's still not fast enough you've got problems.
If your application can be structured in a way that suits the platform then you could look at a more exotic platform. Depending on what will work with your application, you might be able to host it on a cluster, cloud provider or build the core engine on a GPU, Cell processor or FPGA. However, in doing this you're getting into (quite substantial) additional costs and exotic dependencies that might cause support issues. You will probably also have to bring a third-party consultant who knows how to program the platform.
After all that, the best advice is: suck it and see. If you're comfortable with F# you should be able to prototype your application fairly quickly. See how fast it runs and don't worry too much about performance until you have some clear indication that it really will be an issue. Remember, Knuth said that premature optimisation is the root of all evil about 97% of the time. Keep a weather eye out for issues and re-evaluate your strategy if you think performance really will cause trouble.
Edit: If you want to make a packaged application then you will probably be more performance-sensitive than otherwise. In this case performance will probably become an issue sooner than it would with a bespoke system. However, this doesn't affect the basic 'suck it and see' principle.
For example, at the risk of starting a game of buzzword bingo, if your application can be parallelized and made to work on a shared-nothing architecture you might see if one of the cloud server providers [ducks] could be induced to host it. An appropriate front-end could be built to run locally or through a browser. However, on this type of architecture the internet connection to the data source becomes a bottleneck. If you have large data sets then uploading these to the service provider becomes a problem. It may be quicker to process a large dataset locally than to upload it through an internet connection.
I would advise not to bother with optimizations yet. First try to get a working prototype, then find out where computation time is spent. You can probably move the biggest bottlenecks out into C or Fortran when and if needed -- then see how much difference it makes.
As they say, often 90% of the computation is spent in 10% of the code.
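A rough sketch of "find out where computation time is spent", assuming the engine can be split into named stages. The `profile_stages` and `hotspot` helpers and the `load` / `fit` stages are hypothetical stand-ins, not part of any real data mining package.

```python
import time


def profile_stages(stages, data):
    """Run each (name, function) stage in order and record wall-clock seconds."""
    timings = {}
    for name, func in stages:
        start = time.perf_counter()
        data = func(data)
        timings[name] = time.perf_counter() - start
    return data, timings


def hotspot(timings):
    """Name of the stage that took the longest."""
    return max(timings, key=timings.get)


# Hypothetical pipeline stages, stand-ins for a real model implementation.
def load(n):
    return list(range(n))


def fit(xs):
    return sum(x * x for x in xs)
```

Calling `profile_stages([("load", load), ("fit", fit)], 1000)` returns the result together with a timing per stage. Only the stage that `hotspot` names is a candidate for a rewrite in C or Fortran.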