Moving through allocated bytes - memory

I have the following data declaration:
temp db 50 DUP(0)
How do I access each byte?
Let's say I do mov temp, 48 and then want to move 49 into the next of the allocated bytes. I tried:
inc temp
mov temp, 49
but it just increased temp's value to 49

Use an address expression with a displacement, e.g.
mov [temp + 1], 49
or, if you want to dynamically select the slot in temp to store a value in
mov [temp + ebx], 49
where ebx holds the index value (could be any register)
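In higher-level terms, temp is just a 50-byte buffer and the bracketed operand computes base + displacement (or base + index). A Python analogue, purely illustrative:

```python
temp = bytearray(50)   # like: temp db 50 DUP(0)

temp[0] = 48           # mov temp, 48
temp[1] = 49           # mov [temp + 1], 49

ebx = 5                # any index register
temp[ebx] = 49         # mov [temp + ebx], 49

print(list(temp[:6]))  # → [48, 49, 0, 0, 0, 49]
```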

Assigned register collision when inline assembly compiled with clang

Consider the following sample program (targeting Linux/x86-64):
#include <stdio.h>
#include <stdlib.h>
int main(int argc, char *argv[])
{
    unsigned arg1 = strtoul(argv[1], NULL, 0);
    unsigned arg2 = strtoul(argv[2], NULL, 0);
    asm(
        "mov %[arg1], %%ecx\n\t"
        "add %[arg2], %[arg1]\n\t"
        "add %[arg2], %%ecx\n\t"
        "xchg %%ecx, %[arg2]"
        : [arg1] "+&abdSD" (arg1), [arg2] "+&abdSD" (arg2)
        :
        : "cc", "ecx");
    printf("%u %u\n", arg1, arg2);
}
(xchg is used just for easy grepping of the compiled instructions in the listing.)
With GCC, it works as expected - different registers are assigned to arg1 and arg2, for example:
11bf: e8 cc fe ff ff callq 1090 <strtoul@plt>
11c4: 89 da mov %ebx,%edx
11c6: 89 d1 mov %edx,%ecx
11c8: 01 c2 add %eax,%edx
11ca: 01 c1 add %eax,%ecx
11cc: 91 xchg %eax,%ecx
(so, arg1 in edx, arg2 in eax)
But, compiling with Clang (confirmed on 6.0 and 10.0) results in assigning the same register for arg1 and arg2:
401174: e8 d7 fe ff ff callq 401050 <strtoul@plt>
401179: 44 89 f0 mov %r14d,%eax ; <--
40117c: 89 c1 mov %eax,%ecx
40117e: 01 c0 add %eax,%eax ; <-- so, arg1 and arg2 both in eax
401180: 01 c1 add %eax,%ecx
401182: 91 xchg %eax,%ecx
The issue persists across multiple variations, e.g.: '+' instead of '+&' in the constraint strings; numeric forms like %0 to address operands; replacing xchg with another rare instruction; and so on.
I had expected, from basic principles, that the compiler's logic for assigning output locations would always assign different locations to different output operands, whatever constraints are defined for them, and likewise among the set of input operands. (Modifiers like '+' and '&' add more rules to the placement logic, but shouldn't erode these main principles.)
Is there some trivial aspect I've overlooked?
UPD: reported to LLVM.

Decode UDP message with LUA

I'm relatively new to lua and programming in general (self taught), so please be gentle!
Anyway, I wrote a lua script to read a UDP message from a game. The structure of the message is:
DATAxXXXXaaaaBBBBccccDDDDeeeeFFFFggggHHHH
DATA = 4-letter ID, x = control character
XXXX = integer giving the group of the data (the groups are known)
aaaa...HHHH = 8 single-precision floating-point numbers
The last eight are the numbers I need to decode.
If I print the message as received, it's something like:
DATA*{V???A?A?...etc.
Using string.byte(), I'm getting a stream of bytes like this (I have "formatted" the bytes to reflect the structure above):
68 65 84 65/42/20 0 0 0/237 222 28 66/189 59 182 65/107 42 41 65/33 173 79 63/0 0 128 63/146 41 41 65/0 0 30 66/0 0 184 65
The first 5 bytes are of course the DATA*. The next 4 are the 20th group of data. The next bytes are the ones I need to decode; they correspond to these values:
237 222 28 66 = 39.218
189 59 182 65 = 22.779
107 42 41 65 = 10.573
33 173 79 63 = 0.8114
0 0 128 63 = 1.0000
146 41 41 65 = 10.573
0 0 30 66 = 39.500
0 0 184 65 = 23.000
I've found C# code that does the decode with BitConverter.ToSingle(), but I haven't found any like this for Lua.
Any idea?
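For comparison, the BitConverter.ToSingle decoding has a direct analogue in Python's struct module. This is only an illustrative sketch; the message here is a shortened version of the one above:

```python
import struct

def decode_floats(payload, offset, count):
    # Unpack `count` little-endian IEEE-754 singles starting at byte `offset` (0-based).
    return struct.unpack_from('<%df' % count, payload, offset)

# "DATA*", group id 20, then three of the floats listed above.
msg = bytes([68, 65, 84, 65, 42,    # D A T A *
             20, 0, 0, 0,           # group 20
             237, 222, 28, 66,      # ~39.218
             189, 59, 182, 65,      # ~22.779
             0, 0, 128, 63])        # 1.0

print([round(v, 3) for v in decode_floats(msg, 9, 3)])  # → [39.218, 22.779, 1.0]
```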
What Lua version do you have?
This code works in Lua 5.3
local str = "DATA*\20\0\0\0\237\222\28\66\189\59\182\65..."
-- Read two float values starting from position 10 in the string
print(string.unpack("<ff", str, 10)) --> 39.217700958252 22.779169082642 18
-- 18 (third returned value) is the next position in the string
For Lua 5.1 you have to write a special function (or take it from François Perrad's git repo):
local function binary_to_float(str, pos)
    local b1, b2, b3, b4 = str:byte(pos, pos+3)
    local sign = b4 > 0x7F and -1 or 1
    local expo = (b4 % 0x80) * 2 + math.floor(b3 / 0x80)
    local mant = ((b3 % 0x80) * 0x100 + b2) * 0x100 + b1
    local n
    if mant + expo == 0 then
        n = sign * 0.0
    elseif expo == 0xFF then
        n = (mant == 0 and sign or 0) / 0
    else
        n = sign * (1 + mant / 0x800000) * 2.0^(expo - 0x7F)
    end
    return n
end
local str = "DATA*\20\0\0\0\237\222\28\66\189\59\182\65..."
print(binary_to_float(str, 10)) --> 39.217700958252
print(binary_to_float(str, 14)) --> 22.779169082642
It’s little-endian byte-order of IEEE-754 single-precision binary:
E.g., 0 0 128 63 is:
00111111 10000000 00000000 00000000
(63) (128) (0) (0)
Why that equals 1 requires understanding the very basics of the IEEE-754 representation, namely its use of an exponent and a mantissa. See here to start.
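To make the exponent/mantissa arithmetic concrete, here is a rough Python transliteration of the same steps as the Lua 5.1 function above, for normal numbers only (zero, infinities, NaNs and denormals are not handled):

```python
def ieee754_single(b1, b2, b3, b4):
    # Little-endian byte order: b4 carries the sign bit and most of the exponent.
    sign = -1 if b4 > 0x7F else 1
    expo = (b4 % 0x80) * 2 + b3 // 0x80              # 8-bit biased exponent
    mant = ((b3 % 0x80) * 0x100 + b2) * 0x100 + b1   # 23-bit mantissa
    return sign * (1 + mant / 0x800000) * 2.0 ** (expo - 0x7F)

print(ieee754_single(0, 0, 128, 63))  # → 1.0 (exponent 127, mantissa 0)
```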
See @Egor's answer above for how to use string.unpack() in Lua 5.3, and one possible implementation you could use in earlier versions.

What do these 2 lines of assembly code do?

I am in the middle of phase 2 of the bomb lab and I can't seem to figure out how these two lines of assembly affect the code overall and what role they play in the loop.
Here are the two lines of code:
add -0x24(%ebp,%ebx,4),%eax
cmp %eax,-0x20(%ebp,%ebx,4)
and here is the entire code:
Dump of assembler code for function phase_2:
0x08048ba4 <+0>: push %ebp
0x08048ba5 <+1>: mov %esp,%ebp
0x08048ba7 <+3>: push %ebx
0x08048ba8 <+4>: sub $0x34,%esp
0x08048bab <+7>: lea -0x20(%ebp),%eax
0x08048bae <+10>: mov %eax,0x4(%esp)
0x08048bb2 <+14>: mov 0x8(%ebp),%eax
0x08048bb5 <+17>: mov %eax,(%esp)
0x08048bb8 <+20>: call 0x804922f <read_six_numbers>
0x08048bbd <+25>: cmpl $0x0,-0x20(%ebp)
0x08048bc1 <+29>: jns 0x8048be3 <phase_2+63>
0x08048bc3 <+31>: call 0x80491ed <explode_bomb>
0x08048bc8 <+36>: jmp 0x8048be3 <phase_2+63>
0x08048bca <+38>: mov %ebx,%eax
0x08048bcc <+40>: add -0x24(%ebp,%ebx,4),%eax
0x08048bd0 <+44>: cmp %eax,-0x20(%ebp,%ebx,4)
0x08048bd4 <+48>: je 0x8048bdb <phase_2+55>
0x08048bd6 <+50>: call 0x80491ed <explode_bomb>
0x08048bdb <+55>: inc %ebx
0x08048bdc <+56>: cmp $0x6,%ebx
0x08048bdf <+59>: jne 0x8048bca <phase_2+38>
0x08048be1 <+61>: jmp 0x8048bea <phase_2+70>
0x08048be3 <+63>: mov $0x1,%ebx
0x08048be8 <+68>: jmp 0x8048bca <phase_2+38>
0x08048bea <+70>: add $0x34,%esp
0x08048bed <+73>: pop %ebx
0x08048bee <+74>: pop %ebp
0x08048bef <+75>: ret
I noticed the inc instruction that increments %ebx by 1, and that value gets copied into %eax in the loop. But the add and cmp trip me up every time. If %eax is 1 going into the add and cmp, what %eax comes out? Thanks! I also know that once %ebx gets to 6 the loop is over and the whole routine ends.
You got a list of 6 numbers. This means you can compare at most 5 pairs of numbers. So the loop that uses %ebx does 5 iterations.
In each iteration the value at the lower address is added to the current loop count, and then compared with the value at the next higher address. As long as they match the bomb won't explode!
This loops 5 times:
add -0x24(%ebp,%ebx,4),%eax
cmp %eax,-0x20(%ebp,%ebx,4)
These numbers are used:
with %ebx=1 numbers are at -0x20(%ebp) and -0x1C(%ebp)
with %ebx=2 numbers are at -0x1C(%ebp) and -0x18(%ebp)
with %ebx=3 numbers are at -0x18(%ebp) and -0x14(%ebp)
with %ebx=4 numbers are at -0x14(%ebp) and -0x10(%ebp)
with %ebx=5 numbers are at -0x10(%ebp) and -0x0C(%ebp)
Those two instructions deal with memory at two locations, indexed off %ebp by %ebx. In each iteration %eax is first loaded with the loop counter (mov %ebx,%eax), the add then adds in the number at the lower address, and the cmp checks whether the result equals the number at the next higher address. So, something like:
int i;
for (i = 1; i < 6; i++) {
    if (numbers[i] != numbers[i-1] + i)
        explode_bomb();
}
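The check the loop performs (each entry equals the previous entry plus the loop index, and the first entry is non-negative, per the jns) can be sketched in Python; the function name and the sample lists are illustrative:

```python
def phase_2_ok(numbers):
    # jns: the first number must be non-negative.
    if numbers[0] < 0:
        return False
    # For ebx = 1..5: numbers[ebx] must equal numbers[ebx-1] + ebx.
    return all(numbers[i] == numbers[i - 1] + i for i in range(1, 6))

print(phase_2_ok([0, 1, 3, 6, 10, 15]))  # → True
print(phase_2_ok([1, 2, 3, 4, 5, 6]))    # → False
```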

How CUDA constant memory allocation works?

I'd like to get some insight into how constant memory is allocated (using CUDA 4.2). I know that the total available constant memory is 64KB. But when is this memory actually allocated on the device? Does this limit apply to each kernel, each CUDA context, or to the whole application?
Let's say there are several kernels in a .cu file, each using less than 64K of constant memory, but the total constant memory usage is more than 64K. Is it possible to call these kernels sequentially? What happens if they are called concurrently using different streams?
What happens if there is a large CUDA dynamic library with lots of kernels, each using different amounts of constant memory?
What happens if there are two applications, each requiring more than half of the available constant memory? The first application runs fine, but when will the second app fail? At app start, at the cudaMemcpyToSymbol() calls, or at kernel execution?
Parallel Thread Execution ISA Version 3.1 section 5.1.3 discusses constant banks.
Constant memory is restricted in size, currently limited to 64KB which
can be used to hold statically-sized constant variables. There is an
additional 640KB of constant memory, organized as ten independent 64KB
regions. The driver may allocate and initialize constant buffers in
these regions and pass pointers to the buffers as kernel function
parameters. Since the ten regions are not contiguous, the driver
must ensure that constant buffers are allocated so that each buffer
fits entirely within a 64KB region and does not span a region
boundary.
A simple program can be used to illustrate the use of constant memory.
__constant__ int kd_p1;
__constant__ short kd_p2;
__constant__ char kd_p3;
__constant__ double kd_p4;
__constant__ float kd_floats[8];
__global__ void parameters(int p1, short p2, char p3, double p4, int* pp1, short* pp2, char* pp3, double* pp4)
{
    *pp1 = p1;
    *pp2 = p2;
    *pp3 = p3;
    *pp4 = p4;
    return;
}

__global__ void constants(int* pp1, short* pp2, char* pp3, double* pp4)
{
    *pp1 = kd_p1;
    *pp2 = kd_p2;
    *pp3 = kd_p3;
    *pp4 = kd_p4;
    return;
}
Compile this for compute_30, sm_30 and run cuobjdump -sass <executable or obj> to disassemble it; you should see:
Fatbin elf code:
================
arch = sm_30
code version = [1,6]
producer = cuda
host = windows
compile_size = 32bit
identifier = c:/dev/constant_banks/kernel.cu
code for sm_30
Function : _Z10parametersiscdPiPsPcPd
/*0008*/ /*0x10005de428004001*/ MOV R1, c [0x0] [0x44]; // stack pointer
/*0010*/ /*0x40001de428004005*/ MOV R0, c [0x0] [0x150]; // pp1
/*0018*/ /*0x50009de428004005*/ MOV R2, c [0x0] [0x154]; // pp2
/*0020*/ /*0x0001dde428004005*/ MOV R7, c [0x0] [0x140]; // p1
/*0028*/ /*0x13f0dc4614000005*/ LDC.U16 R3, c [0x0] [0x144]; // p2
/*0030*/ /*0x60011de428004005*/ MOV R4, c [0x0] [0x158]; // pp3
/*0038*/ /*0x70019de428004005*/ MOV R6, c [0x0] [0x15c]; // pp4
/*0048*/ /*0x20021de428004005*/ MOV R8, c [0x0] [0x148]; // p4
/*0050*/ /*0x30025de428004005*/ MOV R9, c [0x0] [0x14c]; // p4
/*0058*/ /*0x1bf15c0614000005*/ LDC.U8 R5, c [0x0] [0x146]; // p3
/*0060*/ /*0x0001dc8590000000*/ ST [R0], R7; // *pp1 = p1
/*0068*/ /*0x0020dc4590000000*/ ST.U16 [R2], R3; // *pp2 = p2
/*0070*/ /*0x00415c0590000000*/ ST.U8 [R4], R5; // *pp3 = p3
/*0078*/ /*0x00621ca590000000*/ ST.64 [R6], R8; // *pp4 = p4
/*0088*/ /*0x00001de780000000*/ EXIT;
/*0090*/ /*0xe0001de74003ffff*/ BRA 0x90;
/*0098*/ /*0x00001de440000000*/ NOP CC.T;
/*00a0*/ /*0x00001de440000000*/ NOP CC.T;
/*00a8*/ /*0x00001de440000000*/ NOP CC.T;
/*00b0*/ /*0x00001de440000000*/ NOP CC.T;
/*00b8*/ /*0x00001de440000000*/ NOP CC.T;
...........................................
Function : _Z9constantsPiPsPcPd
/*0008*/ /*0x10005de428004001*/ MOV R1, c [0x0] [0x44]; // stack pointer
/*0010*/ /*0x00001de428004005*/ MOV R0, c [0x0] [0x140]; // pp1
/*0018*/ /*0x10009de428004005*/ MOV R2, c [0x0] [0x144]; // pp2
/*0020*/ /*0x0001dde428004c00*/ MOV R7, c [0x3] [0x0]; // kd_p1
/*0028*/ /*0x13f0dc4614000c00*/ LDC.U16 R3, c [0x3] [0x4]; // kd_p2
/*0030*/ /*0x20011de428004005*/ MOV R4, c [0x0] [0x148]; // pp3
/*0038*/ /*0x30019de428004005*/ MOV R6, c [0x0] [0x14c]; // pp4
/*0048*/ /*0x20021de428004c00*/ MOV R8, c [0x3] [0x8]; // kd_p4
/*0050*/ /*0x30025de428004c00*/ MOV R9, c [0x3] [0xc]; // kd_p4
/*0058*/ /*0x1bf15c0614000c00*/ LDC.U8 R5, c [0x3] [0x6]; // kd_p3
/*0060*/ /*0x0001dc8590000000*/ ST [R0], R7;
/*0068*/ /*0x0020dc4590000000*/ ST.U16 [R2], R3;
/*0070*/ /*0x00415c0590000000*/ ST.U8 [R4], R5;
/*0078*/ /*0x00621ca590000000*/ ST.64 [R6], R8;
/*0088*/ /*0x00001de780000000*/ EXIT;
/*0090*/ /*0xe0001de74003ffff*/ BRA 0x90;
/*0098*/ /*0x00001de440000000*/ NOP CC.T;
/*00a0*/ /*0x00001de440000000*/ NOP CC.T;
/*00a8*/ /*0x00001de440000000*/ NOP CC.T;
/*00b0*/ /*0x00001de440000000*/ NOP CC.T;
/*00b8*/ /*0x00001de440000000*/ NOP CC.T;
.....................................
I added annotations to the right of the SASS.
On sm_30 you can see that kernel parameters are passed in constant bank 0 starting at offset 0x140.
User defined __constant__ variables are defined in constant bank 3.
If you execute cuobjdump --dump-elf <executable or obj> you can find other interesting constant information.
32bit elf: abi=6, sm=30, flags = 0x1e011e
Sections:
Index Offset Size ES Align Type Flags Link Info Name
1 34 142 0 1 STRTAB 0 0 0 .shstrtab
2 176 19b 0 1 STRTAB 0 0 0 .strtab
3 314 d0 10 4 SYMTAB 0 2 a .symtab
4 3e4 50 0 4 CUDA_INFO 0 3 b .nv.info._Z9constantsPiPsPcPd
5 434 30 0 4 CUDA_INFO 0 3 0 .nv.info
6 464 90 0 4 CUDA_INFO 0 3 a .nv.info._Z10parametersiscdPiPsPcPd
7 4f4 160 0 4 PROGBITS 2 0 a .nv.constant0._Z10parametersiscdPiPsPcPd
8 654 150 0 4 PROGBITS 2 0 b .nv.constant0._Z9constantsPiPsPcPd
9 7a8 30 0 8 PROGBITS 2 0 0 .nv.constant3
a 7d8 c0 0 4 PROGBITS 6 3 a00000b .text._Z10parametersiscdPiPsPcPd
b 898 c0 0 4 PROGBITS 6 3 a00000c .text._Z9constantsPiPsPcPd
.section .strtab
.section .shstrtab
.section .symtab
index value size info other shndx name
0 0 0 0 0 0 (null)
1 0 0 3 0 a .text._Z10parametersiscdPiPsPcPd
2 0 0 3 0 7 .nv.constant0._Z10parametersiscdPiPsPcPd
3 0 0 3 0 b .text._Z9constantsPiPsPcPd
4 0 0 3 0 8 .nv.constant0._Z9constantsPiPsPcPd
5 0 0 3 0 9 .nv.constant3
6 0 4 1 0 9 kd_p1
7 4 2 1 0 9 kd_p2
8 6 1 1 0 9 kd_p3
9 8 8 1 0 9 kd_p4
10 16 32 1 0 9 kd_floats
11 0 192 12 10 a _Z10parametersiscdPiPsPcPd
12 0 192 12 10 b _Z9constantsPiPsPcPd
The kernel-parameter constant bank is versioned per launch so that concurrent kernels can execute. The compiler and user constants are per CUmodule. It is the responsibility of the developer to manage coherency of this data. For example, the developer has to ensure that cudaMemcpyToSymbol updates are performed in a safe manner.

Is 'as' just another way to express a type-annotation?

My understanding of Dart leads me to believe that this 'cast' should not affect run-time semantics, but just wanted to confirm:
(foo as Bar).fee();
(foo as Bar).fi();
(foo as Bar).fo();
Or is it "best practice" to cast once:
final bFoo = (foo as Bar);
bFoo.fee();
bFoo.fi();
bFoo.fo();
This is highly dependent on how the DartVM optimizer handles the case. Using the latest version of Dart I constructed two test functions:
void test1() {
  Dynamic bar = makeAFoo();
  for (int i = 0; i < 5000; i++) {
    (bar as Foo).a();
    (bar as Foo).b();
  }
}
and
void test2() {
  Dynamic bar = makeAFoo();
  Foo f = bar as Foo;
  for (int i = 0; i < 5000; i++) {
    f.a();
    f.b();
  }
}
Looking at the optimized code for test1, you can see that the loop looks like this:
00D09A3C bf813b9d00 mov edi,0x9d3b81 'instance of Class: SubtypeTestCache'
00D09A41 57 push edi
00D09A42 50 push eax
00D09A43 6811003400 push 0x340011
00D09A48 e8d36c83ff call 0x540720 [stub: Subtype1TestCache]
00D09A4D 58 pop eax
00D09A4E 58 pop eax
00D09A4F 5f pop edi
00D09A50 81f911003400 cmp ecx,0x340011
00D09A56 7411 jz 0xd09a69
00D09A58 81f9710f7c00 cmp ecx,0x7c0f71
00D09A5E 0f8437000000 jz 0xd09a9b
00D09A64 e900000000 jmp 0xd09a69
00D09A69 8b1424 mov edx,[esp]
00D09A6C 8b4c2404 mov ecx,[esp+0x4]
00D09A70 6811003400 push 0x340011
00D09A75 50 push eax
00D09A76 68b9229d00 push 0x9d22b9
00D09A7B 51 push ecx
00D09A7C 52 push edx
00D09A7D 6889289d00 push 0x9d2889
00D09A82 b8813b9d00 mov eax,0x9d3b81 'instance of Class: SubtypeTestCache'
00D09A87 50 push eax
00D09A88 b9b0d00b00 mov ecx,0xbd0b0
00D09A8D ba06000000 mov edx,0x6
00D09A92 e8896583ff call 0x540020 [stub: CallToRuntime]
00D09A97 83c418 add esp,0x18
00D09A9A 58 pop eax
00D09A9B 5a pop edx
00D09A9C 59 pop ecx
00D09A9D 50 push eax
00D09A9E a801 test al,0x1
00D09AA0 0f8450010000 jz 0xd09bf6
00D09AA6 0fb74801 movzx_w ecx,[eax+0x1]
00D09AAA 81f922020000 cmp ecx,0x222
00D09AB0 0f8540010000 jnz 0xd09bf6
00D09AB6 b9d1229d00 mov ecx,0x9d22d1 'Function 'a':.'
00D09ABB bae96ccb00 mov edx,0xcb6ce9 Array[1, 1, null]
00D09AC0 e82b6983ff call 0x5403f0 [stub: CallStaticFunction]
00D09AC5 83c404 add esp,0x4
00D09AC8 b911003400 mov ecx,0x340011
00D09ACD ba11003400 mov edx,0x340011
00D09AD2 8b45f4 mov eax,[ebp-0xc]
00D09AD5 51 push ecx
00D09AD6 52 push edx
00D09AD7 3d11003400 cmp eax, 0x340011
00D09ADC 0f849a000000 jz 0xd09b7c
00D09AE2 a801 test al,0x1
00D09AE4 7505 jnz 0xd09aeb
00D09AE6 e95f000000 jmp 0xd09b4a
00D09AEB 0fb74801 movzx_w ecx,[eax+0x1]
00D09AEF 81f922020000 cmp ecx,0x222
00D09AF5 0f8481000000 jz 0xd09b7c
00D09AFB 0fb77801 movzx_w edi,[eax+0x1]
00D09AFF 8b4e07 mov ecx,[esi+0x7]
00D09B02 8b891c100000 mov ecx,[ecx+0x101c]
00D09B08 8b0cb9 mov ecx,[ecx+edi*0x4]
00D09B0B 8b7927 mov edi,[ecx+0x27]
00D09B0E 8b7f03 mov edi,[edi+0x3]
00D09B11 81ff59229d00 cmp edi,0x9d2259
00D09B17 0f845f000000 jz 0xd09b7c
00D09B1D bfd13b9d00 mov edi,0x9d3bd1 'instance of Class: SubtypeTestCache'
00D09B22 57 push edi
00D09B23 50 push eax
00D09B24 6811003400 push 0x340011
00D09B29 e8f26b83ff call 0x540720 [stub: Subtype1TestCache]
00D09B2E 58 pop eax
00D09B2F 58 pop eax
00D09B30 5f pop edi
00D09B31 81f911003400 cmp ecx,0x340011
00D09B37 7411 jz 0xd09b4a
00D09B39 81f9710f7c00 cmp ecx,0x7c0f71
00D09B3F 0f8437000000 jz 0xd09b7c
00D09B45 e900000000 jmp 0xd09b4a
00D09B4A 8b1424 mov edx,[esp]
00D09B4D 8b4c2404 mov ecx,[esp+0x4]
00D09B51 6811003400 push 0x340011
00D09B56 50 push eax
00D09B57 68b9229d00 push 0x9d22b9
00D09B5C 51 push ecx
00D09B5D 52 push edx
00D09B5E 6889289d00 push 0x9d2889
00D09B63 b8d13b9d00 mov eax,0x9d3bd1 'instance of Class: SubtypeTestCache'
00D09B68 50 push eax
00D09B69 b9b0d00b00 mov ecx,0xbd0b0
00D09B6E ba06000000 mov edx,0x6
00D09B73 e8a86483ff call 0x540020 [stub: CallToRuntime]
00D09B78 83c418 add esp,0x18
00D09B7B 58 pop eax
00D09B7C 5a pop edx
00D09B7D 59 pop ecx
00D09B7E 50 push eax
00D09B7F a801 test al,0x1
00D09B81 0f8479000000 jz 0xd09c00
00D09B87 0fb74801 movzx_w ecx,[eax+0x1]
00D09B8B 81f922020000 cmp ecx,0x222
00D09B91 0f8569000000 jnz 0xd09c00
00D09B97 b961239d00 mov ecx,0x9d2361 'Function 'b':.'
00D09B9C bae96ccb00 mov edx,0xcb6ce9 Array[1, 1, null]
00D09BA1 e84a6883ff call 0x5403f0 [stub: CallStaticFunction]
00D09BA6 83c404 add esp,0x4
00D09BA9 8b4df8 mov ecx,[ebp-0x8]
00D09BAC 83c102 add ecx,0x2
00D09BAF 0f8055000000 jo 0xd09c0a
00D09BB5 89cf mov edi,ecx
00D09BB7 8b5df4 mov ebx,[ebp-0xc]
00D09BBA e90efeffff jmp 0xd099cd
And in the optimized code for test2, the loop looks like this:
00D09F3D 894df4 mov [ebp-0xc],ecx
00D09F40 81f910270000 cmp ecx,0x2710
00D09F46 0f8d46000000 jnl 0xd09f92
00D09F4C 3b251c414700 cmp esp,[0x47411c]
00D09F52 0f8659000000 jna 0xd09fb1
00D09F58 50 push eax
00D09F59 b9d1229d00 mov ecx,0x9d22d1 'Function 'a':.'
00D09F5E bae96ccb00 mov edx,0xcb6ce9 Array[1, 1, null]
00D09F63 e8886483ff call 0x5403f0 [stub: CallStaticFunction]
00D09F68 83c404 add esp,0x4
00D09F6B 8b45f0 mov eax,[ebp-0x10]
00D09F6E 50 push eax
00D09F6F b961239d00 mov ecx,0x9d2361 'Function 'b':.'
00D09F74 bae96ccb00 mov edx,0xcb6ce9 Array[1, 1, null]
00D09F79 e8726483ff call 0x5403f0 [stub: CallStaticFunction]
00D09F7E 83c404 add esp,0x4
00D09F81 8b4df4 mov ecx,[ebp-0xc]
00D09F84 83c102 add ecx,0x2
00D09F87 0f8048000000 jo 0xd09fd5
00D09F8D 8b45f0 mov eax,[ebp-0x10]
00D09F90 ebab jmp 0xd09f3d
Note that test2 has only one set of calls to SubtypeTestCache (outside the loop), instead of the two inside the loop in test1.
Today it seems that doing the cast once is faster, but hoisting the cast out of the loop seems like a simple optimization that the VM may do in the future.
Running (foo as Bar) has two effects:
It tells the editor that foo is a Bar which helps with static type analysis and lets the editor do code completion.
It checks that foo is a Bar (or a subtype of Bar), otherwise it'll throw a CastException.
Look for "Type Cast" in the Dart language specification: http://www.dartlang.org/docs/spec/latest/dart-language-specification.pdf
Updated: I like John's answer too, but I think I should add one more thing. I overlooked the fact that you were talking about doing the cast once versus three times. Looking at final bFoo = (foo as Bar);, I want to make one more point about the language semantics.
It's true that Dart Editor, dart2js, and the VM could conceivably infer that foo is of type Bar, which would save additional checks, etc. However, the semantics of the language say something slightly different. "final bFoo" does not have a type annotation. So according to the language spec, bFoo is of type Dynamic.
Hence, when you write "(foo as Bar)" three times, each expression results in a Bar. But when you write bFoo, you have a Dynamic object.
It is not "best practice" to perform three as casts right in a row for the same variable.
An as cast is really a runtime check. I'm just guessing, but if you are trying to reduce warnings from the editor, there is probably a better way to do it.
For example, here's one scenario:
class Foo {
}

class Bar extends Foo {
  m1() => print('m1');
}

doStuff(Foo foo) {
  foo.m1(); // warning here
}

main() {
  var foo = new Bar();
  doStuff(foo);
}
The above code runs just fine, but the editor does show a warning. To eliminate the warning, it's better to refactor the code. You could remove the Foo annotation from doStuff, or you could consider moving m1() up to Foo, or you could do double-dispatch.
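For what it's worth, the "as is really a runtime check" point maps onto other languages too. A loose Python analogy (illustrative only, not Dart semantics):

```python
class Foo:
    pass

class Bar(Foo):
    def fee(self):
        return 'fee'

def as_bar(obj):
    # Analogue of Dart's `obj as Bar`: a runtime check that raises on failure.
    if not isinstance(obj, Bar):
        raise TypeError('cast failed: value is not a Bar')
    return obj

b = as_bar(Bar())  # check performed once; reuse b afterwards
print(b.fee())     # → fee
```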