I would like to know the difference between mach_vm_allocate and vm_allocate. I know mach_vm_allocate is only available in OS X and not iOS, but I'm not sure why. The file that holds all the function prototypes for the mach_vm_... functions (mach/mach_vm.h) contains only #error mach_vm.h unsupported. on iOS.
The new Mach VM API was introduced in Mac OS X 10.4. The new API is essentially the same as the old API from the programmer's standpoint, with the following key differences:
- Routine names have the mach_ prefix; for example, vm_allocate() becomes mach_vm_allocate().
- Data types used in routines have been updated to support both 64-bit and 32-bit tasks. Consequently, the new API can be used with any task.
The new and old APIs are exported by different MIG subsystems: mach_vm and vm_map, respectively. The corresponding header files are <mach/mach_vm.h> and <mach/vm_map.h>, respectively.
The new Mach VM API was introduced in Mac OS X 10.4. The new API is essentially the same as the old API from the programmer's standpoint, with the following key differences:
Routine names have the mach_ prefix; for example, vm_allocate() becomes mach_vm_allocate();
Data types used in routines have been updated to support both 64-bit and 32-bit tasks. Consequently, the new API can be used with any task.
The new and old APIs are exported by different MIG subsystems: mach_vm and vm_map, respectively. The corresponding header files are <mach/mach_vm.h> and <mach/vm_map.h>, respectively.
The information is derived from the book Mac OS X Internals: A Systems Approach.
Those would be /usr/include/mach/mach_vm.h and /usr/include/mach/vm_map.h. You can grep in /usr/include/mach to find them.
In the kernel, the APIs are basically using the same implementation. On iOS, you can use the "old" APIs perfectly well.
I'm not happy with the other answers. Two seem to quote from the same source (one doesn't correctly attribute it), and that source is rather misleading.
The data types in the old API are variable size. If you compile for the i386 CPU architecture, they are 32 bit; if you compile for the x86_64 CPU architecture, they are 64 bit.
The data types in the new API are fixed size and always 64 bit.
As a 32 bit process only requires 32 bit data types, and a 64 bit process already has 64 bit types even with the old API, it isn't obvious why the new API was required at all. But it becomes obvious once you look at Apple's kernel design:
The real kernel of macOS is a Mach 3 micro kernel. Being a micro kernel, it only takes care of process and thread management, IPC, virtual memory management, and scheduling. That's it. Usually everything else would be done in userspace when using a micro kernel, but that is slow, and thus Apple took a different approach.
Apple took the FreeBSD monolithic kernel and wrapped it around the micro kernel, which executes the FreeBSD kernel as a single process (a task) with special privileges (despite being a task, its code runs within kernel space, not userspace like all the other tasks in the system). The combined kernel has the name XNU.
And in the past, Apple supported any combination of 32 and 64 bit within a system: your system could run a 32 bit kernel with 32 or 64 bit processes on it, or a 64 bit kernel with 32 or 64 bit processes on it. You can probably see where this is going, can't you?
If process and kernel are of different bit width, the old API is of no use. E.g., a 32 bit kernel cannot use it to interact with the memory of a 64 bit process, because all data types of the API calls would only be 32 bit in the 32 bit kernel task, while a 64 bit process has a 64 bit memory space even if the kernel itself does not.
Actually there is even a 3rd version of that API. To quote a comment from Apple's kernel source:
* There are three implementations of the "XXX_allocate" functionality in
* the kernel: mach_vm_allocate (for any task on the platform), vm_allocate
* (for a task with the same address space size, especially the current task),
* and vm32_vm_allocate (for the specific case of a 32-bit task). vm_allocate
* in the kernel should only be used on the kernel_task. vm32_vm_allocate only
* makes sense on platforms where a user task can either be 32 or 64, or the kernel
* task can be 32 or 64. mach_vm_allocate makes sense everywhere, and is preferred
* for new code.
As long as you only use that API in userspace and only for memory management of your own process, using the old API is still fine, even for a 64 bit process, as all data types will be 64 bit in that case.
The new API is only required when working across process boundaries, and only if it isn't certain that both processes are 32 bit or both are 64 bit. However, Apple dropped 32 bit kernel support long ago, and meanwhile they have also dropped 32 bit userspace support, as all system libraries ship only as 64 bit libraries starting with 10.15 (Catalina).
On iOS the new API was never required, as you could never write kernel code for iOS, and the security concept of iOS forbids direct interaction with other process spaces. On iOS you can only use that API on your own process space, and for that it will always have the correct data types.
Related
I'm starting to explore the area of computer architecture. There are 2 questions about ISAs that confuse me.
As far as I know, there are different kinds of ISA, such as ARM, MIPS, 80x86, etc. I wonder whether a CPU can only read one specific kind of ISA. For example, can a processor read both 80x86 and MIPS?
If a CPU is unique to an ISA, how can I check which ISA my PC's processor is using? Can I find out manually?
Thank you
All the CPUs/MCUs I know of support just a single instruction set.
Some of the newer architectures can load microcode, which may allow changing the instruction set behavior to some degree, but I strongly doubt you can change the instruction set itself with it. The instruction set and the internal CPU/MCU circuitry are strongly dependent on each other. Making a universal CPU with a changeable instruction set is possible (for example with an FPGA), but it would be very slow in comparison to a directly die-encoded CPU. With a similar die technology, the clock speed would be maybe just a few MHz.
Some architectures like i80x86 support modes that switch to a different kind of operation (16/32/64 bit, real, protected), but it's hard to say whether that is a different instruction set or just a subset of the same thing... (a matter of perspective)
Detection of the instruction set:
This is madness. Yes, it is possible to detect which type of instruction set you have via a program, but all CPUs/MCUs have different pinouts, interfaces and architectures and are not interchangeable (even within the same architecture class), so detecting the instruction set is meaningless, as you already know the architecture you are doing the wiring for...
Anyway, the detection would work like this:
have a set of test programs, one per supported instruction set/architecture, each of which sets specific memory or IO to a predefined state if it runs properly
have a watchdog cycle between all the detections and stop on the first valid result.
Yes, each type of CPU is unique to an instruction set. The instruction set for ARM will not work with x86, SPARC, etc. There may be some overlap by coincidence, but programs are not compatible between architectures.
Depending on your operating system, there are commands you can run to see this information. On Unix/Linux, uname -a will show you what architecture you're running, as will dmidecode. For Windows OSes, right-clicking on My Computer and selecting Properties should show you your architecture.
For Linux (I know, it's a super-old distro!):
$ uname -a
Linux hostname 2.6.35-22-generic #33-Ubuntu SMP Sun Sep 19 20:32:27 UTC 2010 x86_64 GNU/Linux
In this example, the architecture is x86_64, which is 64-bit Intel or AMD. To tell for sure, you can run dmidecode as I mentioned earlier:
~# dmidecode |grep -i proc
Processor Information
Type: Central Processor
Version: AMD Opteron(tm) Processor 154
Processor Information
Type: Central Processor
Version: AMD Opteron(tm) Processor 154
It can actually read any instruction set for which support is implemented. Most CPUs nowadays support two or three instruction sets that differ only slightly because of 32-bit/64-bit addressing.
x86 supports 16-bit, 32-bit and 64-bit instruction sets; ARM supports 32-bit and 64-bit, plus the Thumb and Thumb-2 encodings, etc. Similarly for MIPS, for example.
The original Transmeta, I believe, was flexible about this and was supposed to transcompile any instruction set into its internal one and run that natively. However, it failed, and nowadays there is nothing similar to it.
Anyway, once you run an application, it's bound to a specific instruction set in its header, so it can't change during runtime. Well, ARM is an exception to that: it's able to switch between the full and Thumb versions, but they are just different encodings of the same thing...
For the second part: either look in your OS GUI, or you can usually read it directly, on Linux from /proc/cpuinfo, on Windows from the environment variable PROCESSOR_ARCHITECTURE.
I have a workstation whose operating system is 64 bit windows server 2012 R2. I am using Delphi XE7 Update 1. The workstation has 72 cores including hyperthreading.
I want all my applications to run on all the cores available each time an application is run. I wish to do this programmatically, rather than using Set Affinity in Task Manager (which only applies to one group at a time, and I have two groups of 36 CPUs that I want to engage simultaneously) or boot options in the advanced settings of msconfig.
I realise that the question is similar to or encompasses the following questions that have already been asked on Stackoverflow
Delphi TParallel not using all available cpu
Strange behaviour of TParallel.For default ThreadPool
SetProcessAffinityMask - Select more than one processor?.
and I have also looked at the suggestion at edn.embarcadero.com/article/27267.
But my SetProcessAffinityMask question relates to more than 64 cores on a 64 bit operating system and is not confined to using TParallel.
The solution that I have tried is an adaptation of the one proffered by Marco van de Voort
var
  cpuset: set of 0..71;
  i: Integer;
begin
  cpuset := [];
  for i := 0 to 71 do
    cpuset := cpuset + [i];
  SetProcessAffinityMask(ProcInfo.hProcess, DWORD(cpuset));
end;
but it did not work.
I would be grateful for any suggestions.
As discussed in the comments, an affinity mask is 32 bits in 32 bit code, and 64 bits in 64 bit code. You are trying to set a 72 bit mask. Clearly that isn't going to work.
You are going to need to understand processor groups, covered in detail on MSDN: Processor Groups, including this link to Supporting Systems That Have More Than 64 Processors.
Since you've got 72 processors you'll have multiple processor groups. You need either to use multiple processes to reach all groups, or to make your process a multi-group process. From the doc:
After the thread is created, its affinity can be changed by calling SetThreadAffinityMask or SetThreadGroupAffinity. If a thread is assigned to a different group than the process, the process's affinity is updated to include the thread's affinity and the process becomes a multi-group process. Further affinity changes must be made for individual threads; a multi-group process's affinity cannot be modified using SetProcessAffinityMask.
This is pretty gnarly stuff. If you are in control of your threads then you should be able to do what you need with SetThreadGroupAffinity. If you are using the Delphi threading library then you don't control the threads. That probably makes a single process solution untenable.
Another problem to consider is memory locality. If the machine uses NUMA memory, then I know of no Delphi memory manager that can perform well with NUMA memory. In a NUMA environment, if performance is important to you, you probably need each thread to allocate memory on the thread's NUMA node.
The bottom line here is that there is no simple quick fix to produce code that will use all of this machine's resources effectively. Start with the documentation I linked to above and perform some trials to make sure you understand all the implications of processor groups and NUMA.
When executing a 32-bit program on a 64-bit CPU, can we assume that the underlying hardware will use 32-bit values in memory and in the CPU registers to store an integer?
Given these setups:
Intel i5 + Windows 32 bit + 32 bit application
Intel i5 + Windows 64 bit + 32 bit application
Please explain the drawbacks of each setup. Would using the second one be less efficient because the CPU gets extra work to ignore half of the register's value?
When an x86_64 processor executes 32-bit code, it effectively runs in i686 mode. This is all achieved in hardware, so there is no performance penalty.
A user-space 32-bit program will be executed in "32 bit mode" whether the operating system is 32 bits or 64 bits so the behaviour should be identical.
The only potential drawback is that the program might need to dynamically link to some 32-bit libraries which are not installed by default on the 64-bit OS. That sort of scenario is more common on Linux, where programs much more commonly link dynamically to external libraries. The only example I can think of on Windows is a program that needs the Visual C++ runtime library: you would have to install the 32-bit version of it (possibly alongside the 64-bit version, if another program needed that). So, in short, you might have to install more stuff with a 64-bit setup.
Also, the OS itself might consume more memory on a 64-bit setup. On the other hand, one drawback of a 32-bit system is that your total system memory is limited to 4 GB (practically about 3.5 GB, due to some of the address space being mapped to hardware).
64-bit Knowledge Base: How much memory can an application access in Win32 and Win64?
I need some help to describe, in technical terms, why a 64-bit application prompts a "Not a valid Win32 application" error on 32-bit Windows on a 32-bit machine. Any MSDN reference is greatly appreciated (I couldn't google a reliable source). I know it shouldn't run, but I have no good explanation for why.
A 32 bit OS runs in 32-bit protected mode. A 64 bit OS runs in long mode (i.e. 64-bit protected mode). 64 bit instructions (used by 64 bit programs) are only available when the CPU is in long mode; so you can't execute them in 32-bit protected mode, in which a 32 bit OS runs.
(The above statement applies to x86 architecture)
By the way, the reason for "Not a valid Win32 application" error message is that 64 bit executables are stored in PE32+ format while 32 bit executables are stored in PE32 format. PE32+ files are not valid executables for 32 bit Windows. It cannot understand that format.
To expand on what others have said in a more low-level and detailed way:
When a program is compiled, the instructions are written for a specific processor instruction set. What we see as "x = y + z" usually amounts to something along the lines of copying one value into a register, passing the add command with the memory location of the other value, etc.
Specific to this question, a 64-bit application expects 64 bits of address space and 64-bit registers to work with. When you pass a command to the processor on a 32-bit system, it works on 32 bits of data at once.
The point of it all? You can't address more than 4 gigabytes (2^32 bytes) of memory on a 32-bit system without creativity, and some tasks that would take multiple operations (say, simple math on unsigned numbers greater than 4 billion) can be done in a single one. Bigger and faster, but it requires breaking compatibility with older systems.
A 64-bit application requires a 64-bit CPU because it makes use of 64-bit instructions.
And a 32-bit OS will put your CPU into 32-bit mode, which disables said instructions.
Why would you expect it to work? You're compiling your program to take advantage of features that don't exist on a 32-bit system.
A 64-bit application is built to invoke hardware CPU instructions that aren't available in the 32-bit mode your 64-bit CPU runs in to support a 32-bit OS.
While it would be theoretically possible to run it in some kind of emulation mode for the instruction set, you would get into trouble as soon as your application needs more memory than 32 bits can address.
The primary reason why 64-bit apps don't run on 32-bit OSes has to do with the registers used in the underlying assembly language (presuming IA-32 here). In 32-bit operating systems and programs, the CPU registers can handle a maximum of 32 bits of data (DWORDs) at a time. In 64-bit OSes, the registers must handle double the bits (64), hence the QWORD data size in assembly. Compiling for x86 (32-bit) means the compiler limits the registers to handling at most 32 bits of data; compiling for x64 means the compiler allows the registers to handle up to 64 bits. Stepping down from QWORDs (64) to DWORDs (32) would lose half the information stored in a register. You're most likely getting the error because the underlying compiled machine code would try to move more than 32 bits of data into a register, which the 32-bit operating system can't handle, since its registers are at most 32 bits wide.
Because they're fundamentally different. You wouldn't expect a Frenchman to understand Mandarin Chinese, so why would you expect a 32-bit CPU to understand 64-bit code?
Jeff covered this a while back on his blog in terms of 32 bit Vista.
Does the same 32 bit 4 GB memory cap that applies in 32 bit Vista apply to 32 bit Ubuntu? Are there any 32 bit operating systems that have creatively solved this problem?
Ubuntu server has PAE enabled in the kernel, the desktop version does not have this feature enabled by default.
This explains, by the way, why Ubuntu server does not work in some hardware emulators whereas the desktop edition does
Well, with Windows, there's something called PAE, which means you can access up to 64 GB of memory on a Windows machine. The downside is that most apps don't actually support using more than 4 GB of RAM; only a small number, like SQL Server, are programmed to take advantage of the extra memory.
Yes, 32-bit Ubuntu has the same memory limitations.
There are exceptions to the 4 GB limitation, but they are application specific... as in, Microsoft SQL Server can use 16 gigabytes with "Physical Address Extensions" (PAE) configured and supported and... ugh
http://forums.microsoft.com/TechNet/ShowPost.aspx?PostID=3703755&SiteID=17
Also, drivers in Ubuntu and Windows both reduce the amount of memory available from the 4 GB address space by mapping memory from that 4 GB to devices. Graphics cards are particularly bad at this: your 256 MB graphics card uses up at least 256 MB of your address space...
If you can (your drivers support it and your CPU is new enough), install a 64-bit OS. Your 32-bit applications and games will run fine.
There seems to be some confusion around PAE. PAE is "Physical Address Extension", and is by no means a Windows feature. It is a hack Intel put in their Pentium Pro (and newer) chips to allow machines to access up to 64 GB of memory. On Windows, applications need to support PAE explicitly, but in the open source world, packages can be compiled and optimized to your liking. The packages that could use more than 4 GB of memory on Ubuntu (and other Linux distros) are compiled with PAE support. This includes all server-specific software.
In theory, all 32-bit OSes have that problem. You have 32 bits to do addressing.
2^32 bytes / 2^10 (bytes per KB) / 2^10 (KB per MB) / 2^10 (MB per GB) = 2^2 GB = 4 GB.
Although there are some ways around it. (Look up the jump from 16-bit computing to 32-bit computing. They hit the same problem.)
Linux supports a technology called PAE that lets you use more than 4GB of memory, however I don't know whether Ubuntu has it on by default. You may need to compile a new kernel.
Edit: Some threads on the Ubuntu forums suggest that the server kernel has PAE on by default, you could try installing that.