VB6 Memory Limitations

I'm currently supporting a VB6 application (that we are replacing, but it's a slow process!) that runs on several servers. Can anyone tell me what the maximum amount of memory a VB6 process can address is? We are using a variety of operating systems:
Windows Server 2003 32bit
Windows Server 2008 64bit
Windows Server 2008 R2 64bit
I've tried using resources like this:
https://blogs.msdn.microsoft.com/tom/2008/04/10/chat-question-memory-limits-for-32-bit-and-64-bit-processes/
But I'm skeptical that it's accurate, since it discusses .NET-based applications; however, I can't find anything more on point than this.

It is hard to take these "What if Superman got in a fight with God" questions too seriously. Long before this becomes a concern you should have moved from memory-resident data structures to a disk file or a database anyway.
But even without linking with /LARGEADDRESSAWARE and booting with the /3GB switch, a VB6 program can address quite a bit of data on 32-bit Windows.
Option Explicit

'Allocate roughly 1.6 GB in a single Byte array and touch the last element.
Private Sub Main()
    Const MAX_BYTES As Long = &H63700000

    Dim Bytes() As Byte
    ReDim Bytes(MAX_BYTES)
    Bytes(MAX_BYTES) = 255

    MsgBox "Success" & vbNewLine & vbNewLine _
         & "Bytes(MAX_BYTES) = " & CStr(Bytes(MAX_BYTES)) & vbNewLine & vbNewLine _
         & "MAX_BYTES = " & Format$(MAX_BYTES, "#,##0")
End Sub
Result:
Success
Bytes(MAX_BYTES) = 255
MAX_BYTES = 1,668,284,416
The linked blog post is correct in pointing out the limitations of a .Net process and its inability to cope with large amounts of data. Runtimes like .Net just are not built for these things, and don't underestimate the overhead of the gigantic libraries even the simplest .Net program drags into its address space.


CLR Stored Procedure, limit imposed on column name length

Does anyone know how to work around the 128-character limit imposed on the ‘name’ constructor parameter (column name) of the class ‘Microsoft.SqlServer.Server.SqlMetaData’? Or know of an alternative method of returning data to the SQLPipeline that doesn’t have a similar restriction?
Background:
A number of years ago we created a .Net (C#) CLR Stored Procedure to replace one that was implemented in VB6 and used the ‘TrueOLEDBProviderLib’ (TOLAP). The driving force behind the change was the switch to 64-bit SQL Server, which meant the VB6 code could no longer run in-process. (VB6 doesn't do 64-bit.)
Issue:
The core function of our CLR Stored Procedure is, based on a list of ‘data point identifiers’, to retrieve and process data from a number of sources (DCOM components), then output a table of data to the SQLPipeline. For the table of data that is returned, we set the column names to the ‘data point identifiers’.
Note: the ‘data point identifiers’ are created from a hierarchy, so they are quite long, with a maximum length of around 256 characters.
The problem we have recently discovered is that when attempting to output the results to the SQLPipeline, if ‘data point identifiers’ longer than 128 characters are used, the CLR throws an exception on the length of the ‘name’ (column name). (See ‘.Net Framework Code behaviour’ below.)
But using the same ‘data point identifiers’ with the old VB6 implementation works without error, and the returned table contains column names longer than 128 characters.
Supplementary Question:
I know it is a different technology, but why was there no 128-character limit imposed within the SQL Server implementation of ‘TrueOLEDBProviderLib’ (TOLAP)? The question I need to provide an answer to is, “if TOLAP can return tables of data that contain column names longer than 128 characters, why can’t the .Net (C#) CLR Stored Procedure?”
Workaround:
The obvious fix would be to truncate the ‘data point identifiers’ down to 128 characters. However, as this is a change in functionality from the original VB6 implementation, I need to explore all the alternatives first.
.Net Framework Code behaviour:
Within the internal constructor of ‘SqlMetaData’, the method ‘AssertNameIsValid’ is called, where the length of the ‘name’ parameter is checked against ‘SmiMetaData.MaxNameLength’ (128 characters); if it is longer, an exception is thrown.
https://referencesource.microsoft.com/#System.Data/fx/src/data/System/Data/Sql/SqlMetaData.cs
I understand that the value of this limit is based on the 128-character limit SQL Server has for ‘Column_Length’.
https://learn.microsoft.com/en-us/sql/relational-databases/system-stored-procedures/sp-server-info-transact-sql?view=sql-server-ver15
Additional Info Update:
The old implementation was a vb6 DLL on the file system, called by a Stored Procedure.
Last version that the vb6 implementation ran on was SQL Server 2008 R2 SP2 (32bit).
The .Net CLR implementation was first run on SQL Server 2012 SP2 (64bit), current version is 2014 SP3 (64bit).
The column names (‘data point identifiers’) will all have come from the stored procedure parameters, as nothing like this is hard-coded in the VB6 version. All ‘data point identifiers’ are user-defined on the deployed system.

How to byte-address memory in an Altera FPGA?

I used megafunctions to generate a 32-bit data memory in the FPGA, but the output is addressed 32 bits (4 bytes) at a time. How do I do 1-byte addressing?
I have an Altera Cyclone IV EP4CE6E22C8.
I'm designing a 32-bit CPU in the FPGA.
Nowadays every CPU address bus works in bytes. Thus, to access your 32-bit-wide memory you should NOT connect the two least-significant address bits. You can use the A[1:0] address bits to select a byte (or a half-word, using A[1] only) from the memory word when you read.
You will still need four byte-write-enable signals. These allow you to write words, half-words or bytes, as sketched below.
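As a rough illustration (a hedged C model of my own, not HDL and not tied to any Altera megafunction, assuming little-endian byte numbering within the word), the address decode and byte-lane write enables behave like this:

#include <stdint.h>

static uint32_t mem[1024];                        /* 32-bit wide memory            */

uint8_t read_byte(uint32_t addr)
{
    uint32_t word = mem[addr >> 2];               /* A[31:2] selects the word      */
    return (uint8_t)(word >> ((addr & 3u) * 8));  /* A[1:0] selects the byte lane  */
}

void write_byte(uint32_t addr, uint8_t data)
{
    uint32_t shift = (addr & 3u) * 8;
    uint32_t lane  = 0xFFu << shift;              /* the one byte-write-enable asserted */
    mem[addr >> 2] = (mem[addr >> 2] & ~lane) | ((uint32_t)data << shift);
}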
Have a look at existing CPU buses or existing connection standards like AHB or AXI.
Post edit:
But reading address 0001, I get 0x05060708, while the desired value is 0x02030405.
What you are trying to do is read a word from a non-aligned address. There is no existing 32-bit wide memory that supports that. I suggest you have a look at how a 32-bit wide memory works.
The old Motorola 68020 architecture supported that. It requires a special memory controller which first reads the data from address 0 and then from address 4 and re-combines the data into a new 32-bit word.
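As a hedged sketch of what such a controller does (plain C rather than hardware, names mine, big-endian byte order assumed so the values match the example), the recombination looks like this:

#include <stdint.h>
#include <stdio.h>

static const uint8_t mem[8] = {1, 2, 3, 4, 5, 6, 7, 8};   /* example contents       */

static uint32_t read_aligned_word(uint32_t a)              /* a is a multiple of 4   */
{
    return ((uint32_t)mem[a] << 24) | ((uint32_t)mem[a + 1] << 16)
         | ((uint32_t)mem[a + 2] << 8) | (uint32_t)mem[a + 3];
}

static uint32_t read_unaligned_word(uint32_t addr)
{
    uint32_t shift = (addr & 3u) * 8;
    uint32_t lo = read_aligned_word(addr & ~3u);            /* first aligned read     */
    if (shift == 0)
        return lo;                                          /* already aligned        */
    uint32_t hi = read_aligned_word((addr & ~3u) + 4);      /* second aligned read    */
    return (lo << shift) | (hi >> (32 - shift));            /* recombine the two words */
}

int main(void)
{
    printf("0x%08X\n", (unsigned)read_unaligned_word(1));   /* prints 0x02030405      */
    return 0;
}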
With the cost of memory dropping and reducing CPU cycles becoming more important, many CPUs simply do not support that. They throw an exception: unaligned memory access.
You have several choices:
Build a special memory controller which supports unaligned accesses.
Adjust your expectations.
I would go for the latter. The expectation is generally based on a wrong idea of how a memory works. As consolation: you are not the first person on this website who thinks that is how you read words from memory.

Ada. Building "main" file takes forever when tasking

I have this simple tasking (threads) program that I'd like to run, but building it takes forever (30 seconds or more). It makes it exhausting to have to wait for the build to finish before running the program every time, especially when all I want to do is change something insignificant, like adding a Put statement here or there.
This is the program I have been running for reference. I am using GPS 2016. I am a beginner in Ada.
with Ada.Text_IO, Ada.Integer_Text_IO;
use Ada.Text_IO, Ada.Integer_Text_IO;

procedure Main is

   task First_Task;
   task body First_Task is
   begin
      for Index in 1..4 loop
         delay 2.0;
         Put("This is in First_Task, pass number ");
         Put(Index, 3);
         New_Line;
      end loop;
   end First_Task;

   task Second_Task;
   task body Second_Task is
   begin
      for Index in 1..7 loop
         delay 1.0;
         Put("This is in Second_Task, pass number ");
         Put(Index, 3);
         New_Line;
      end loop;
   end Second_Task;

   task Third_Task;
   task body Third_Task is
   begin
      for Index in 1..5 loop
         delay 0.1;
         Put("This is in Third_Task, pass number ");
         Put(Index, 3);
         New_Line;
      end loop;
   end Third_Task;

begin
   for Index in 1..5 loop
      delay 0.7;
      Put_Line("This is in the main program.");
   end loop;
end Main;
Posting an answer to help searchability for future users. If you find the full solution (exactly why your AV software does this, and a clean fix), don't hesitate to post and accept your own answer.
First, the MCVE enabled a quick test, revealing nothing wrong with either the code or at least one Gnat compiler (Linux x86-64, Debian Jessie, gcc 4.9.3), pointing at an installation-specific problem.
The installation in question is Gnat GPL-2016 (32 bit) on Windows-10, with GPS as the IDE, and AVAST anti-virus software.
Previous problem reports and rumour pointed at two possible candidates:
unusual Python installations - GPS depends on Python, and finding an unexpected Python version is rumoured to cause some troubles
Anti-virus software interacting with the IDE in unexpected ways.
Of these, the latter is confirmed to be the problem, and disabling AV during the program build restores acceptable build times. (This isn't specific to Ada or Gnat; I've seen it with FPGA development tools too.)
So we have a temporary workaround.
The next step might be to identify why AVAST is allergic to the build process, and disable its reaction to false positives, to maintain AV protection during programming sessions.
Possible candidates may be the intermediate .o and .ali files (object and Ada Library Information), or the intermediate "binding" files b~whatever.ads/.adb which stitch the Ada code to the runtime system and OS.
Most likely, the b~whatever.o object files spark an allergic reaction when they link to unusual OS primitives for process manipulation, to implement Ada tasking. Possibly this resembles virus behaviour closely enough to attract attention.
One answer may be to teach Avast not to scan your Ada project's build folder, or to filter what it scans by file type. But I can be no further help, and I encourage a better answer from anyone who finds one.

Do any CPUs have hardware support for bounds checking?

It doesn't seem like it would be difficult to associate ranges with segments of memory. Then have an assembly instruction which treats 2 integers as "location" & "offset" (another for "data" if setting), and returns the data and error code. This would mean no longer having to make a choice between speed and security/safety when working with arrays.
Another example might be a function which verifies that instructions originating in a particular memory range cannot physically access memory outside that range. If all hardware connected to the motherboard had this capability (and were made to be compatible with each other), it would be trivial to make perfect virtual machines that run at nearly the same speed as the physical machine.
Yes.
Decades ago, Lisp machines performed simultaneous validation checks (e.g. type checks and bounds checks) as the program ran, on the assumption that the program and state were valid, jumping "back in time" if a check failed - unfortunately this ability to get "free" runtime validation was lost when conventional (i.e. x86) machines became dominant.
https://en.wikipedia.org/wiki/Lisp_machine
Lisp Machines ran the tests in parallel with the more conventional single instruction additions. If the simultaneous tests failed, then the result was discarded and recomputed; this meant in many cases a speed increase by several factors. This simultaneous checking approach was used as well in testing the bounds of arrays when referenced, and other memory management necessities (not merely garbage collection or arrays).
Fortunately we're finally learning from the past and slowly, and by piecemeal, reintroducing those innovations - Intel's "MPX" (Memory Protection eXtensions) for x86 were introduced in Skylake-generation processors for hardware bounds-checking - though it isn't perfect.
(x86 is a regression in other ways too: IBM's mainframes had true hardware-accelerated system virtualization in the 1980s - we didn't get it on x86 until 2005 with Intel's "VT-x" and AMD's "AMD-V" extensions).
x86 BOUND
Technically, x86 does have hardware bounds-checking: the BOUND instruction was introduced in 1982 with the Intel 80186/80188 (and is present on the 80286 and later, but not on the 8086 or 8088).
While the BOUND instruction does provide hardware bounds-checking, I understand it indirectly caused performance issues because it breaks the hardware branch predictor (according to a Reddit thread, but I'm unsure why), and also because it requires the bounds to be specified in a tuple in memory, which is terrible for performance. I understand that at runtime it's no faster than manually emitting the instructions for "if index not in range [x,y] then signal the BR exception to the program or OS" (so you might imagine the BOUND instruction was added for the convenience of people who coded assembly by hand, which was quite common in the 1980s).
The BOUND instruction is still present in today's processors, but it was not included in AMD64 (x64), likely for the performance reasons I explained above, and also because very few people were likely using it (and compilers could trivially replace it with a manual bounds check, which might have better performance anyway, as it could use registers).
Another disadvantage to storing the array bounds in memory is that code elsewhere (that wasn't subject to BOUND checking) could overwrite the previously written bounds for another pointer and circumvent the check that way. This is mostly a problem with code that intentionally tries to disable safety features (i.e. malware), and if the bounds were stored on the stack, given how easy it is to corrupt the stack, it has even less utility.
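For comparison, here is a minimal hedged sketch of the "manual" bounds check a compiler can emit instead of BOUND; this is not any compiler's actual output, and abort() merely stands in for the #BR (bound range exceeded) exception:

#include <stdlib.h>

/* The bound lives in a register-friendly local, which is exactly why
   BOUND's memory-resident bounds tuple buys nothing over a plain compare. */
static int checked_read(const int *arr, size_t len, size_t index)
{
    if (index >= len)
        abort();          /* stand-in for raising the #BR exception */
    return arr[index];
}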
Intel MPX
Intel MPX was introduced in Skylake architecture in 2015 and should be present in all Skylake and subsequent processor models in the mainstream Intel Core family (including Xeon, and non-SoC versions of Celeron and Pentium). Intel also implemented MPX in the Goldmont architecture (Atom, and SoC versions of Celeron and Pentium) from 2016 onwards.
MPX is superior to BOUND in that it provides dedicated registers to store the bounds range, so the bounds check should be almost zero-cost compared to BOUND, which required a memory access. On the Intel 486 the BOUND instruction takes 7 cycles (compare to CMP, which takes only 2 cycles even if the operand was a memory address). In Skylake the MPX equivalents (BNDMK, BNDCL and BNDCU) are all 1-cycle instructions, and the cost of BNDMK can be amortized as it only needs to be executed once for each new pointer.
I cannot find any information on whether or not AMD has implemented their own version of MPX yet (as of June 2017).
Critical thoughts on MPX
Unfortunately the current state of MPX is not all that rosy - a recent paper by Oleksenko, Kuvaiskii, et al. in February 2017, "Intel MPX Explained" (PDF link; caution: not yet peer-reviewed), is a tad critical:
Our main conclusion is that Intel MPX is a promising technique that is not yet practical for widespread adoption. Intel MPX’s performance overheads are still high (~50% on average), and the supporting infrastructure has bugs which may cause compilation or runtime errors. Moreover, we showcase the design limitations of Intel MPX: it cannot detect temporal errors, may have false positives and false negatives in multithreaded code, and its restrictions on memory layout require substantial code changes for some programs.
Also note that, compared to the Lisp Machines of yore, Intel MPX checks are still executed inline - whereas in Lisp Machines (if my understanding is correct) bounds checks happened concurrently in hardware, with a retroactive jump backwards if the check failed; thus, so long as a running program's pointers do not point to out-of-bounds locations there would be absolutely zero runtime performance cost. So if you have this C code:
char arr[10];
arr[9] = 'a';
arr[8] = 'b';
Then under MPX this would be executed:
Time  Instruction        Notes
1     BNDMK arr, arr+9   Set bounds 0 to 9.
2     BNDCL arr          Check `arr` meets lower-bound.
3     BNDCU arr          Check `arr` meets upper-bound.
4     MOV 'a' arr+9      Assign 'a' to arr+9.
5     MOV 'b' arr+8      Assign 'b' to arr+8.
But on a Lisp machine (if it were magically possible to compile C to Lisp...), the program-reader-hardware in the computer has the ability to execute additional "side" instructions concurrently with the "actual" instructions, allowing the "side" instructions to instruct the computer to disregard the results from the "actual" instructions in the event of an error:
Time  Actual instruction  Side instruction
1     MOV 'a' arr+9       ENSURE arr+9 BETWEEN arr, arr+9
2     MOV 'b' arr+8       ENSURE arr+8 BETWEEN arr, arr+9
I understand the instructions-per-cycle for the "side" instructions are not the same as for the "actual" instructions - so the side check for the instruction at Time=1 might only complete after the "actual" instructions have already progressed to Time=3 - but if the check failed then it would pass the instruction pointer of the failed instruction to the exception handler, which would direct the program to disregard the results of the instructions executed after Time=1. I don't know how they could achieve that without massive amounts of memory or some mandatory execution pauses, possibly memory-fencing too - that's outside the scope of my answer, but it is at least theoretically possible.
(Note in this contrived example I'm using constexpr index values that a compiler can prove will never be out-of-bounds so would omit the MPX checks entirely - so pretend they're user-supplied variables instead :) ).
I'm not an expert in x86 (nor do I have any experience in microprocessor design, save a CS500-level course I took at UW and didn't do the homework for...) but I don't believe concurrent execution of bounds-checks or "time travel" is possible with x86's current design, despite the extant implementation of out-of-order execution - I might be wrong, however. I speculate that if all pointer types were promoted to 3-tuples ( struct BoundedPointer<T> { T* ptr, T* min, T* max } - which technically already happens with MPX and other software-based bounds checks, as every guarded pointer has its bounds defined when BNDMK is called) then the protection could be provided for free by the MMU - but now each pointer would consume 24 bytes of memory instead of the current 8 bytes (or compare to the measly 4 bytes under 32-bit x86). RAM is plentiful, but still a finite resource that shouldn't be wasted.
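To make the 3-tuple idea concrete, here is a hedged C sketch of such a "fat pointer"; the type and helper names are invented for illustration, and real MPX or software schemes differ in detail:

#include <stdio.h>
#include <stdlib.h>

typedef struct {
    int *ptr;                        /* current pointer                       */
    int *min;                        /* lowest valid address                  */
    int *max;                        /* one past the highest valid address    */
} bounded_ptr;

static int bp_load(bounded_ptr p)
{
    if (p.ptr < p.min || p.ptr >= p.max)
        abort();                     /* analogue of the bound-range exception */
    return *p.ptr;
}

int main(void)
{
    int arr[10] = {0};
    bounded_ptr p = { arr + 9, arr, arr + 10 };
    printf("%d\n", bp_load(p));      /* in bounds: prints 0                   */
    p.ptr = arr + 10;
    bp_load(p);                      /* out of bounds: aborts                 */
    return 0;
}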
MPX in GCC
GCC supported MPX from version 5.0 to 9.1 ( https://gcc.gnu.org/wiki/Intel%20MPX%20support%20in%20the%20GCC%20compiler ), after which it was removed due to its maintenance burden.
MPX in Visual Studio / Visual C++
Visual Studio 2015 Update 1 (2015.1) added "experimental" support for MPX with the /d2MPX switch ( https://blogs.msdn.microsoft.com/vcblog/2016/01/20/visual-studio-2015-update-1-new-experimental-feature-mpx/ ). Support is still present in Visual Studio 2017 but Microsoft has not announced if it's considered a mainstream (i.e. non-experimental) feature yet.
MPX in Clang / LLVM
Clang has partially supported manual use of MPX in the past, but that support was fully removed in version 10.0.
As of July 2021, LLVM still seems capable of outputting MPX instructions, but I can't see any evidence of an MPX "pass".
MPX in Intel C/C++ Compiler
The Intel C/C++ Compiler has supported MPX since version 15.0.
The XL compilers available on the IBM POWER processors on the Little Endian Linux, Big Endian Linux or AIX operating systems have a different implementation of array bounds checking.
Using the -qcheck or its synonym -C option turns on various kinds of checking. -qcheck=bounds checks array bounds. When this is used, the compilers check that every array reference has a valid subscript.
The hardware instruction used is a conditional trap, comparing the subscript to the upper limit and trapping if the subscript is too large or too small. In C and C++ the lower limit is 0. In Fortran it defaults to 1 but can be any integer. When it is not zero, the lower limit is subtracted from the subscript being checked, and the check compares that to the upper limit minus the lower limit.
When the limit is known at compile time and small enough, a conditional trap immediate instruction is enough. When the limit is calculated at execution time or is greater than 65535, a conditional trap instruction comparing two registers is needed.
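Conceptually (a hedged C sketch of my own, not the XL compiler's actual output), a checked store under -qcheck=bounds looks like this; on POWER the compare and trap collapse into a single tw/twi instruction, with __builtin_trap() standing in for it here:

/* Subtract the lower limit, then one unsigned compare catches both
   "too small" and "too large" subscripts, as described above. */
void checked_store(int *a, long lower, long upper, long i, int value)
{
    if ((unsigned long)(i - lower) > (unsigned long)(upper - lower))
        __builtin_trap();    /* stand-in for the conditional trap instruction */
    a[i - lower] = value;
}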
The performance impact is small for several reasons:
1. The conditional trap instructions are fast.
2. They are executed in a standard integer pipeline. Since most POWER CPUs have 2 or 4 integer pipelines, there is usually an otherwise empty slot to put the trap in, so it is often essentially zero cost.
3. When it can the compiler optimizer moves the conditional trap out of loops so it is executed only once, checking all loop iterations at once.
4. When it can prove the actual subscript cannot exceed the limit, the optimizer discards the instruction.
5. Also, when it can prove the subscript will always be invalid, the optimizer uses an unconditional trap.
6. If necessary, -qcheck can be used during testing and skipped for production builds, but the overhead is small enough that this is not usually necessary.
If my memory is correct, one long ago paper reported a 2% slowdown in one case and 0% in another. Since that CPU had only one integer pipeline, the slowdown should be significantly less with modern CPUs.
Other checking using the same mechanism is available to detect dereferencing NULL pointers, dividing an integer by zero, using an uninitialized auto variable, specially written asserts, etc.
This doesn't include all kinds of invalid memory usage, but it does handle the most common kind, does it very efficiently, and is very easy to use.
GCC supports -fbounds-check for similar purposes, but at this time it is only available for the Fortran front end (gfortran).

How does a virtual machine work?

I've been looking into how programming languages work, and some of them have a so-called virtual machine. I understand that this is some form of emulation of the programming language within another programming language, and that it works the way a compiled language would be executed, with a stack. Did I get that right?
With the proviso that I did, what bamboozles me is that many non-compiled languages allow variables with "liberal" type systems. In Python for example, I can write this:
x = "Hello world!"
x = 2**1000
Strings and big integers are completely unrelated and occupy different amounts of space in memory, so how can this code even be represented in a stack-based environment? What exactly happens here? Is x pointed to a new place on the stack and the old string data left unreferenced? Do these languages not use a stack? If not, how do they represent variables internally?
Your question should probably be titled “How do dynamic languages work?”
That's simple: they store the variable's type information along with its value in memory. And this is not only done in interpreted or JIT-compiled languages but also in natively compiled languages such as Objective-C.
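As a hedged illustration (the struct and field names are invented; real runtimes such as CPython or the Objective-C runtime are far more elaborate), storing the type tag alongside the value can look like this in C:

#include <stdint.h>
#include <stdio.h>

typedef enum { TYPE_INT, TYPE_STRING } tag_t;

typedef struct {
    tag_t tag;                 /* what kind of value this currently is */
    union {
        int64_t     i;         /* payload when tag == TYPE_INT         */
        const char *s;         /* payload when tag == TYPE_STRING      */
    } as;
} value;

static value make_string(const char *s) { value v; v.tag = TYPE_STRING; v.as.s = s; return v; }
static value make_int(int64_t i)        { value v; v.tag = TYPE_INT;    v.as.i = i; return v; }

int main(void)
{
    value x = make_string("Hello world!");   /* x = "Hello world!"                  */
    x = make_int(1024);                      /* x = 2**10; the old string is simply */
    printf("%lld\n", (long long)x.as.i);     /* no longer referenced by x           */
    return 0;
}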
In most VM languages, variables can be conceptualized as pointers (or references) to memory in the heap, even if the variable itself is on the stack. For languages that have primitive types (int and bool in Java, for example) those may be stored on the stack as well, but they can not be assigned new types dynamically.
Ignoring primitive types, all variables that exist on the stack have their actual values stored in the heap. Thus, if you dynamically reassign a value to them, the original value is abandoned (and the memory cleaned up via some garbage collection algorithm), and the new value is allocated in a new bit of memory.
The VM has nothing to do with the language. Any language can run on top of a VM (the Java VM has hundreds of languages already).
A VM enables a different kind of "assembly language" to be run, one that is more fit to adapting a compiler to. Everything done in a VM could be done in a CPU, so think of the VM like a CPU. (Some actually are implemented in hardware).
It's extremely low level, and in many cases heavily stack based: instead of registers, machine-level math is all done on locations relative to the current stack pointer.
With normal compiled languages, many instructions are required for a single step. A "+" might look like "grab the item from a point relative to the stack pointer into reg a, grab another into reg b, add reg a and b, put reg a into a place relative to the stack pointer".
The VM does all this with a single, short instruction, possibly one or two bytes, instead of 4 or 8 bytes PER INSTRUCTION in machine language (depending on 32- or 64-bit architecture), which (guessing) should mean around 16 or 32 bytes of x86 for 1-2 bytes of bytecode. (I could be wrong; my last x86 coding was in the 80286 era.)
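As a hedged sketch (opcode names and encoding are invented, and a real VM would pack opcodes into single bytes rather than ints), here is how a tiny stack-based VM dispatches that whole add sequence as one opcode:

#include <stdio.h>

enum { OP_PUSH, OP_ADD, OP_PRINT, OP_HALT };

static void run(const int *code)
{
    int stack[64];
    int sp = 0;                                        /* VM stack pointer          */
    for (int pc = 0; ; ) {
        switch (code[pc++]) {
        case OP_PUSH:  stack[sp++] = code[pc++];              break;
        case OP_ADD:   sp--; stack[sp - 1] += stack[sp];      break;  /* one opcode */
        case OP_PRINT: printf("%d\n", stack[sp - 1]);         break;
        case OP_HALT:  return;
        }
    }
}

int main(void)
{
    const int program[] = { OP_PUSH, 2, OP_PUSH, 3, OP_ADD, OP_PRINT, OP_HALT };
    run(program);                                      /* prints 5                  */
    return 0;
}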
Microsoft used (probably still uses) VMs in their office products to reduce the amount of code.
The procedure for creating the VM code is the same as creating machine language, just a different processor type essentially.
VMs can also implement their own security, error recovery and memory mechanisms that are very tightly related to the language.
Some of my description here is summary and from memory. If you want to explore the bytecode definition yourself, it's kinda fun:
http://java.sun.com/docs/books/jvms/second_edition/html/Instructions2.doc.html
The key to many of the "how do VMs handle variables like this or that" questions really comes down to metadata... The meta information stored and then updated gives the VM a much better handle on how to allocate memory and then do the right thing with variables.
In many cases this is the type of overhead that can really get in the way of performance. However, modern implementations have come a long way in doing the right thing.
As for your specific questions - treating variables as vanilla objects and so on - it comes down to reassigning / re-evaluating the meta information on each new assignment; that's why x can look one way and then the next.
To answer a part of your questions, I'd recommend a Google tech talk about Python, where some of your questions concerning dynamic languages are answered; for example, what a variable is (it is not a pointer, nor a reference, but in the case of Python a label).
