ELEVENWORDINLINE when to use it?

I have always wondered what I can do with macros like these:
ONEWORDINLINE(w1)
TWOWORDINLINE(w1, w2)
THREEWORDINLINE(w1, w2, w3)
up to
TENWORDINLINE(w1, w2, w3, w4, w5, w6, w7, w8, w9, w10)
ELEVENWORDINLINE(w1, w2, w3, w4, w5, w6, w7, w8, w9, w10, w11)
TWELVEWORDINLINE(w1, w2, w3, w4, w5, w6, w7, w8, w9, w10, w11, w12)
How do I use these macros?
When should I use them?
Why from 1 to 12 and not up to, say, 100?

That is long-dead technology of the ancient Mac programmers, sentry to a tomb best left untouched. But, if you're interested, off we go, brave adventurer!
Here are the relevant #define-s straight from Apple:
#define ONEWORDINLINE(w1) = w1
#define TWOWORDINLINE(w1,w2) = {w1,w2}
#define THREEWORDINLINE(w1,w2,w3) = {w1,w2,w3}
/* ...etc... */
#define TWELVEWORDINLINE(w1,w2,w3,w4,w5,w6,w7,w8,w9,w10,w11,w12) = {w1,w2,w3,w4,w5,w6,w7,w8,w9,w10,w11,w12}
Now to some explaining.
A little history lesson: back in ancient times, when the Mac OS was implemented for the (just as ancient) Motorola 68k, Apple set up its system calls in the following highly compact and very usable way: it mapped them to instruction words starting with 1010b (0xA), as these were reserved by the Motorola devs for this use. These sys-calls and their mappings were called A-Traps after this hex value (no relation to "IT'S A TRAP!", honestly). They looked like this in hex: 0xA869 (this example is the A-Trap for the FixRatio(short numer, short denom) system call). This technology was originally created for the Mac Toolbox API.
When a Mac-on-68k-targeting compiler (these should set the TARGET_OS_MAC and TARGET_CPU_68K macros as 1 or TRUE and TARGET_RT_MAC_CFM as 0 or FALSE, BTW) saw a function prototype with an assignment (=) after it, it treated the prototype as referring to an A-Trap system call indicated by the A-Trap value to the right of the assignment operator, which could be a single integer literal one-word value starting with 0xA (0xA???). So, ONEWORDINLINE was basically a stylish macro-way of saying "it's an A-Trap!".
So, here's what a sys-call function prototype declaration for 68k would look like:
EXTERN_API(Fixed) FixRatio(short numer, short denom) ONEWORDINLINE(0xA869);
This would be preprocessed to something like this:
extern Fixed FixRatio(short numer, short denom) = 0xA869;
Now, you might be thinking: if we're indexing system calls by one word, and one-fourth of that word is taken by the huge 0xA, that leaves only 4096 functions at most (there were far fewer in reality, as many A-Traps actually mapped to the same system-call subroutines, but with different parameters). How is that enough? Well, obviously, it isn't. That's where selectors come in.
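The "starts with 0xA" test and the 4096 figure can be sketched in C. This is only an illustration, not Apple code; the names is_a_trap and A_TRAP_SLOTS are made up for the example.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: a 16-bit word is an A-Trap candidate when its
   top four bits are 1010b (0xA). */
bool is_a_trap(uint16_t word)
{
    return (word & 0xF000u) == 0xA000u;
}

/* The remaining 12 bits give 2^12 = 4096 possible trap words. */
enum { A_TRAP_SLOTS = 1 << 12 };
```

With this, is_a_trap(0xA869) is true (the FixRatio trap), while a word such as 0x7002 is not a trap.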
A-Traps like _HFSDispatch (0xA260) were called "selectors" because they had the job of selecting and calling another subroutine determined by a value supplied by the caller. So, when a function prototype was "assigned" an "array" of one-word integer literals, the compiler emitted those words inline at the call site as 68k machine code: all but the last one set up the selector code(s) (in a register or on the stack), and the last one was the selector A-Trap, which picked up the selector code and called the appropriate subroutine. The maximum number of words in such an array was 12 because that was enough for the Mac Toolbox.
The macros TWOWORDINLINE through TWELVEWORDINLINE handled selector A-Traps. For example:
EXTERN_API(OSErr) ActivateTSMDocument(TSMDocumentID idocID) TWOWORDINLINE(0x7002, 0xAA54);
would be preprocessed to something like
extern OSErr ActivateTSMDocument(TSMDocumentID idocID) = {0x7002, 0xAA54};
Here, 0x7002 is the selector code, and 0xAA54 is the selector A-Trap.
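To see what the macros expand to without a 68k compiler, here is a small sketch for a modern C compiler. It assumes nothing more than the #define-s quoted above: since the expansions are just initializers, they can be attached to ordinary variables instead of prototypes. The variable names and the trap_word helper are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define ONEWORDINLINE(w1) = w1
#define TWOWORDINLINE(w1,w2) = {w1,w2}

/* Expands to: const uint16_t fix_ratio_word = 0xA869; */
const uint16_t fix_ratio_word ONEWORDINLINE(0xA869);

/* Expands to: const uint16_t activate_tsm_words[] = {0x7002, 0xAA54}; */
const uint16_t activate_tsm_words[] TWOWORDINLINE(0x7002, 0xAA54);

/* Return the last word of an inline sequence, i.e. the A-Trap itself. */
uint16_t trap_word(const uint16_t *words, size_t count)
{
    return words[count - 1];
}
```

Here trap_word(activate_tsm_words, 2) yields 0xAA54, whose top nibble is 0xA, while the preceding word carries the selector code.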
So, to sum it all up, you only need these macros if you want to do some coding for pre-1994 Macs running on a Motorola 68k. So the ios tag isn't really in place here ;)
Disclaimer: I know this stuff in theory only and I may have made a mistake somewhere. If there are any old-timers experienced with this stuff, please correct me if I got something wrong!

Related

Does ALU know about postfix notation?

As we all know, the ALU performs arithmetic operations, but does the computer understand postfix notation or not?

Assuming you mean Arithmetic/Logic Unit, no. The ALU does not understand any notation. It only understands instructions. So, for example, the machine code might include an instruction to "add R10 to R11 and store the result in R9," say (disassembled) ADD R9, R10, R11, but the machine code "notation" is understood by the Control Unit, not the ALU.
By the time the ALU receives the information, it is encoded in the form of various control lines being asserted. For instance, in the above example, the CU might assert control lines for "add," "input A is R10," "input B is R11," and "store result in R9." These lines determine how the ALU and the register file behave, and result in the operation desired.
Textual notation, such as 5 + 8 or (+ x 19) or x 19 15 + * or indeed ADD R9, R10, R11, is understood by software, doing processing at a much higher level than the ALU does. It is that software that interprets, say, postfix notation, and issues the instructions that cause the ALU to execute the desired operations.
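As an illustration of "software that interprets postfix notation", here is a minimal C sketch of a postfix evaluator. It is an assumption-laden toy (non-negative integers, the operators + - * only, a fixed 64-entry stack, the made-up name eval_postfix), not how any particular calculator or compiler does it. Each a + b or a * b in it is what eventually becomes an add or multiply instruction for the ALU.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>

/* Evaluate a space-separated postfix expression such as "2 19 15 + *".
   Hypothetical helper for illustration only. */
long eval_postfix(const char *s)
{
    long stack[64];
    int top = 0;

    while (*s) {
        if (isspace((unsigned char)*s)) {
            s++;                          /* skip separators */
        } else if (isdigit((unsigned char)*s)) {
            char *end;
            long v = strtol(s, &end, 10); /* parse one operand */
            assert(top < 64);
            stack[top++] = v;
            s = end;
        } else {
            assert(top >= 2);             /* an operator needs two operands */
            long b = stack[--top];
            long a = stack[--top];
            switch (*s) {
            case '+': stack[top++] = a + b; break;
            case '-': stack[top++] = a - b; break;
            case '*': stack[top++] = a * b; break;
            default:  assert(!"unknown operator");
            }
            s++;
        }
    }
    assert(top == 1);                     /* exactly one result remains */
    return stack[0];
}
```

For example, eval_postfix("5 8 +") gives 13, and eval_postfix("2 19 15 + *") gives 68.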

Understand what a command does

I'm learning x86 syntax.
I've stumbled across this instruction, and I'm not sure what it does:
cmpl $0x0,%cs:0x6574
I know cmp just compares the operands (by subtracting them) and sets the flags, and that the l suffix marks the operands as 32-bit (long) values.
My question is:
What are we comparing?
The value 0x0 against the value at %cs:0x6574? The cs register contains an address, so should I add 0x6574 to it and read the value there? Something like:
mem[cs+0x6574]
Thanks in advance!

Assuming this is from real-mode code, %cs: is a segment override: the CS segment is used instead of the implicit default, DS.
In real mode, address calculation is a bit different: the segment value is first multiplied by 16, and then the offset is added.
So, in your notation, it will be
mem[16*cs+0x6574]
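The real-mode calculation can be written as a short C sketch. The function name real_mode_linear is made up for the example, and address wrap-around (the A20 line) is deliberately not modeled.

```c
#include <stdint.h>

/* Real-mode linear address: segment * 16 + offset.
   Wrap-around at 1 MiB (A20) is not modeled in this sketch. */
uint32_t real_mode_linear(uint16_t segment, uint16_t offset)
{
    return ((uint32_t)segment << 4) + offset;
}
```

For instance, with cs = 0x1000 the instruction would compare 0 against the 32-bit value at linear address real_mode_linear(0x1000, 0x6574), which is 0x16574.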

Fastest way of storing non-adjacent d registers with NEON intrinsics

I am porting 32bit NEON asm code to NEON intrinsics, and I am wondering if this code can be written in a concise way using intrinsics:
vst4.32 {d0[0], d2[0], d4[0], d6[0]}, [%[v1]]!
1) The previous code operates on q registers, but when it comes to storage, instead of using q0, q1, q2 and q3, it has to recreate vectors which have each part in one of the d registers, e.g. v1[0] = d0[0], v1[1] = d2[0] ... v2[0] = d0[1], v2[1] = d2[1] ... v3[0] = d1[0], v3[1] = d3[0] ... etc.
This operation is a one-liner in asm, but with intrinsics I don't know if I can do that without first splitting high and low bits and building a new float32x4x4_t variable to feed to vst4_f32.
Is that possible?
2) I'm not entirely sure of what [%[v1]]! does (yes, I googled quite a bit): it should be a reference to a variable named v1 and the exclamation mark will do writeback, which should mean the pointer is increased by the same amount that was written by the instruction on the same line.
Correct? Any way of replicating that with intrinsics?

After some more investigation I found this specific instruction to store a specific lane of an array of 4 vectors, so no need to split into high and low bits variables:
float32x4x4_t u = { q0, q1, q2, q3 };
vst4q_lane_f32(v1, u, 0);
v1 += 4;
Writeback is just a pointer increment, as @charlesbaylis wrote.
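For readers without an ARM toolchain at hand, the effect of the lane store plus writeback can be modeled in plain C. This is only a scalar model under the assumption that each q register is viewed as four floats; it is not the intrinsic itself, and store_lane4_model is a made-up name.

```c
#include <assert.h>
#include <stddef.h>

/* Plain-C model (not the real intrinsic) of a 4-vector lane store:
   writes q[0][lane], q[1][lane], q[2][lane], q[3][lane] to dst and
   returns the advanced pointer, mimicking the "!" writeback. */
float *store_lane4_model(float *dst, const float q[4][4], int lane)
{
    assert(lane >= 0 && lane < 4);
    for (size_t i = 0; i < 4; i++) {
        dst[i] = q[i][lane];
    }
    return dst + 4;  /* four floats were written */
}
```

Calling it once per lane, feeding the returned pointer back in each time, lays the four vectors out interleaved in memory.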

In principle, a sufficiently smart compiler could use the instruction you want for the vst4_f32 intrinsic, but in practice, no compiler is that good.
To get the post-index writeback, you can write
vst4_f32(ptr, v);
ptr += 8; /* vst4_f32 stores 4 x 2 = 8 floats; assumes ptr is a float32_t * */
Some compilers will recognise this. GCC 5.1 (when released) will do this in at least some cases.
[Edit: misread the question, vst4q_lane_f32 does map to the required instruction perfectly]

It seems to be inline assembly.
Anyway, the answers are:
1) No
2) Yes

How to get this sqrt inline assembly working for iOS

I am trying to follow another SO post and implement sqrt14 within my iOS app:
double inline __declspec (naked) __fastcall sqrt14(double n)
{
_asm fld qword ptr [esp+4]
_asm fsqrt
_asm ret 8
}
I have modified this to the following in my code:
double inline __declspec (naked) sqrt14(double n)
{
__asm__("fld qword ptr [esp+4]");
__asm__("fsqrt");
__asm__("ret 8");
}
Above, I have removed the "__fastcall" keyword from the method definition since my understanding is that it is for x86 only. The above gives the following errors for each assembly line respectively:
Unexpected token in argument list
Invalid instruction
Invalid instruction
I have attempted to read through a few inline ASM guides and other posts on how to do this, but I am generally just unfamiliar with the language. I know MIPS quite well, but these commands/registers seem to be very different. For example, I don't understand why the original author never uses the passed in "n" value anywhere in the assembly code.
Any help getting this to work would be greatly appreciated! I am trying to do this because I am building an app where I need to calculate sqrt (ok, yes, I could do a lookup table, but for right now I care a lot about precision) on every pixel of a live-video feed. I am currently using the standard sqrt, and in addition to the rest of the computation, I'm running at around 8fps. Hoping to bump that up a frame or two with this change.
If it matters: I'm building the app to ideally be compatible with any current iOS device that can run iOS 7.1. Again, many thanks for any help.

The compiler is perfectly capable of generating the fsqrt instruction itself; you don't need inline asm for that. You might get some extra speed if you use -ffast-math.
For completeness' sake, here is the inline asm version:
__asm__ __volatile__ ("fsqrt" : "=t" (n) : "0" (n));
The fsqrt instruction has no explicit operands; it uses the top of the FPU stack implicitly. The =t constraint tells the compiler to expect the output on the top of the FPU stack, and the 0 constraint instructs the compiler to place the input in the same place as output #0 (i.e. the top of the FPU stack again).
Note that fsqrt is of course x86-only, meaning it won't work on ARM CPUs, for example.
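Putting the two points together, a hedged sketch of a portable sqrt14 might look like the following. It assumes GCC or Clang: on x86 it uses the inline asm shown above, and everywhere else (including ARM, which iOS devices use) it simply calls sqrt() from math.h and leaves instruction selection to the compiler. The architecture macros are the usual GCC/Clang predefined ones.

```c
#include <math.h>

/* Sketch only: x87 fsqrt on x86, plain sqrt() elsewhere. */
static inline double sqrt14(double n)
{
#if defined(__i386__) || defined(__x86_64__)
    /* "=t": result on top of the FPU stack; "0": input in the same place. */
    __asm__ ("fsqrt" : "=t" (n) : "0" (n));
    return n;
#else
    return sqrt(n);
#endif
}
```

In practice the #else branch alone is usually all that is needed; the asm branch is kept only to show where the x86-specific code would have to be fenced off.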

Implementing exponentiation in Forth

I'm using Gforth to try to implement exponentiation. I understand, in theory, how a stack-based language is supposed to operate. However, I'm having difficulties with my implementation of it on Gforth.
Here's what I have right now:
: myexp
1 swap ?do rot dup * rot rot loop ;
However, when I run it I see a stack underflow like:
3 2 myexp
:1: Stack underflow
3 2 >>>myexp<<<
Backtrace:
$7F645EFD6EF0 rot
$2
$1
Is Gforth's looping structure manipulating the stack when it loops?
I'm in the dark on how Forth works as most looping examples I've seen online are rather involved and confusing to someone new to Forth.
What is wrong with my implementation?

The 1 swap is wrong. ?do wants the lower bound at the top of the stack.
The loop body is wrong. The two bounds are removed from the data stack, so your use of rot to access the exponentiation base doesn't work.
: myexp ( u1 u2 -- u3 ) \ u3 = u1^u2
over swap 1 ?do over * loop nip ;
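For comparison, the same repeated-multiplication idea in C, as a minimal sketch. It differs slightly from the Forth word above: it starts the accumulator at 1 and loops exponent times, so an exponent of 0 yields 1 (as far as I can tell, the Forth version as written assumes u2 is at least 1). Overflow is not checked.

```c
/* Integer exponentiation by repeated multiplication.
   Sketch only: no overflow checking. */
unsigned long myexp(unsigned long base, unsigned long exponent)
{
    unsigned long acc = 1;
    for (unsigned long i = 0; i < exponent; i++) {
        acc *= base;  /* multiply once per unit of the exponent */
    }
    return acc;
}
```

So myexp(3, 2) gives 9, matching what 3 2 myexp leaves on the Forth stack.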

I'm not sure how to use Gforth's floating point stack, so I can't give you the answer, but instead of using a loop, you can use the Pascal programming trick of defining exponentiation like so:
x^y = exp(y*ln(x))
Note: for more information, see this answer from the question on Exponentiation of real numbers.
