How to read PSTATE in AARCH64 - arm64

I would like to check whether the SError mask bit is set. On ARMv7 this was easily accessible through the CPSR register, but on ARMv8-A in AArch64 state that is no longer the case. From the ARMv8-A Programmer's Guide:
"AArch64 does not have a direct equivalent of the ARMv7 Current Program Status Register
(CPSR). In AArch64, the components of the traditional CPSR are supplied as fields that can be
made accessible independently. These are referred to collectively as Processor State (PSTATE)."
So, since PSTATE is not an actual register,
mrs x0, pstate
gives me an error when assembling. How can I read PSTATE to get the value of the SError mask bit that I need?

AArch64 has separate registers for various subsets of PSTATE. For the interrupt masks, this is called daif:
mrs x0, daif
The D, A, I and F flags are in bits 9, 8, 7 and 6 respectively (mask 0x3c0); A is the SError mask. The other bits are reserved and should read as zero.
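As a rough sketch of how the value read from daif could be decoded in C: the helper names `serror_masked` and `read_daif` below are hypothetical, and the inline-assembly reader assumes GCC or Clang and code running at a privilege level where access to DAIF is permitted (for example kernel or bare-metal code).

```c
#include <stdint.h>

/* Bit positions of the mask flags in the value read from DAIF. */
#define DAIF_D (UINT64_C(1) << 9) /* debug exceptions masked */
#define DAIF_A (UINT64_C(1) << 8) /* SError masked */
#define DAIF_I (UINT64_C(1) << 7) /* IRQ masked */
#define DAIF_F (UINT64_C(1) << 6) /* FIQ masked */

/* Hypothetical helper: returns 1 if the SError mask bit (A) is set. */
static int serror_masked(uint64_t daif)
{
    return (daif & DAIF_A) != 0;
}

#if defined(__aarch64__)
/* Hypothetical helper: reads DAIF with MRS.
 * Assumption: the current exception level is allowed to access DAIF. */
static inline uint64_t read_daif(void)
{
    uint64_t value;
    __asm__ __volatile__("mrs %0, daif" : "=r"(value));
    return value;
}
#endif
```

On an AArch64 target, `serror_masked(read_daif())` would then answer the original question; the decoding helper itself is plain C and works on any value.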

Related

Is it possible to use clang to produce RISC V assembly without linking?

I am trying to learn more about compilers and RISC V assembly was specifically designed to be easy to learn and teach. I am interested in compiling some simple C code to assembly using clang for the purpose of understanding the semantics. I'm planning on using venus to step through the assembly and the source code does NOT actually need to be fully compiled to machine code in order to run on a real machine.
I want to avoid compiler optimizations so I can see what I've actually instructed the processor to do.
I don't actually need the program to compile to machine code--I just want the assembly.
I don't want to worry about linking to the system library because this code doesn't actually need to run.
The code does not make any explicit use of system calls, so I think a standard library should not be required.
This answer seems to indicate that clang definitely can compile to RISC V targets, but it requires having a version of the OS's standard library built for RISC V.
This answer indicates that some form of cross-compiling is necessary, but again I don't need to fully compile the code to machine instructions so this should not apply if I'm understanding correctly.
Use clang -S to stop after generating an assembly file:
$ cat foo.c
int main() { return 2+2; }
$ clang -target riscv64 -S foo.c
$ cat foo.s
.text
.attribute 4, 16
.attribute 5, "rv64i2p0_m2p0_a2p0_c2p0"
.file "foo.c"
.globl main
.p2align 1
.type main,@function
main:
addi sp, sp, -32
sd ra, 24(sp)
sd s0, 16(sp)
addi s0, sp, 32
li a0, 0
sw a0, -20(s0)
li a0, 4
ld ra, 24(sp)
ld s0, 16(sp)
addi sp, sp, 32
ret
.Lfunc_end0:
.size main, .Lfunc_end0-main
.ident "Ubuntu clang version 14.0.0-1ubuntu1"
.section ".note.GNU-stack","",@progbits
.addrsig
You can also use Compiler Explorer conveniently online.

Linking problems when creating a library for iOS 7

I get a linking problem when creating a library for iOS 7 on iPhone (ARM64).
The error message is:
ld: in /long_path/libHEVCCodec.a(inv_xforms_arm64.o), in section __TEXT,__text reloc 0:
ARM64_RELOC_SUBTRACTOR must have r_length of 2 or 3 for architecture arm64
clang: error: linker command failed with exit code 1 (use -v to see invocation)
The error is caused by this code (a kind of switch):
adr addr, .L.dct_add_switch
ldrh offset, [addr, ta, lsl #1]
add addr, addr, offset, uxth
br addr
.L.dct_add_switch:
.hword .L.dct_add_4 - .L.dct_add_switch
.hword .L.dst_add_4 - .L.dct_add_switch
...
ta, addr and offset are the general-purpose registers x3, x4 and w5 respectively.
Does anybody know how to handle this situation?
PS: there are no such problems with GNU GCC and Android.
EDIT1:
It seems that the problem is not in the linker itself but in the compiler.
I checked the object file (objdump), and instead of the difference constants there are just zeros.
.L.dct_add_switch:
0000000000000010 .long 0x00000000
0000000000000014 .long 0x00000000
0000000000000018 .long 0x00000000
000000000000001c nop
When I manually put precomputed constants in place of the ".L.dct_add_4 - .L.dct_add_switch" expressions, everything works.
Maybe there are some compiler flags that would make the compiler handle this correctly?
Thanks.
Well, there is a compiler and linker problem, and it depends on the size of the data used for the offsets. Clang is not very friendly to anything other than 4 bytes.
The discussion and possible solutions are in another topic: creating constant jump table; xcode; clang; asm
The problem is the Mach-O object file format for ARM 64-bit targets doesn't support a relocation for the 16-bit difference between two symbols. It appears that the difference must be 32-bit or 64-bit. It doesn't seem to be a problem with the compiler or the linker. The assembly code you've quoted in your question looks like handcrafted assembly, not compiler output.
The solution would be to rewrite the assembly to use 32-bit difference values. Something like this:
adr addr, .L.dct_add_switch
ldr offset, [addr, ta, lsl #2]
add addr, addr, offset, uxtw
br addr
.L.dct_add_switch:
.word .L.dct_add_4 - .L.dct_add_switch
.word .L.dst_add_4 - .L.dct_add_switch

"Invalid operand for instruction" error in arm64 architecture for ios

I am getting a compilation error when trying to build the inline assembly below for the arm64 architecture. It works fine on the 32-bit architecture.
The store instruction is meant to store the stack pointer (sp) into the variable stack_ptr.
unsigned long stack_ptr = 0;
__asm__ __volatile__("str sp, %[stack_ptr]"
:[stack_ptr]"=m" (stack_ptr) //output operand list
);
In 64-bit code, you can't use SP as the register being stored by an STR instruction. Quoting the documentation:
You can only use SP as an operand in the following instructions:
As the base register for loads and stores. In this case it must be quadword-aligned before adding any offset, or a stack alignment exception occurs.
As a source or destination for arithmetic instructions, but it cannot be used as the destination in instructions that set the condition flags.
In logical instructions, for example in order to align it.
You should copy it into a general purpose register first, then store it into memory.
Unless you need a truly precise value, you could just use plain C and take the address of the local variable itself to get an estimate of the stack pointer:
unsigned long stack_ptr = (unsigned long)&stack_ptr;
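The copy-through-a-register approach described above might look roughly like the following. This is only a sketch: `read_stack_pointer` is a hypothetical helper name, GCC or Clang inline-assembly syntax is assumed, and the non-AArch64 branch uses `__builtin_frame_address` purely as a stand-in so the code still builds on other targets.

```c
#include <stdint.h>

/* Hypothetical helper: returns the current stack pointer value. */
static inline uintptr_t read_stack_pointer(void)
{
#if defined(__aarch64__)
    uintptr_t sp_value;
    /* SP cannot be the register stored by STR, so copy it into a
     * general-purpose register and let the compiler store that. */
    __asm__ __volatile__("mov %0, sp" : "=r"(sp_value));
    return sp_value;
#else
    /* Assumption: GCC/Clang builtin used as an approximation elsewhere. */
    return (uintptr_t)__builtin_frame_address(0);
#endif
}
```

Using an "=r" output lets the compiler pick the register and emit the store to the C variable itself, which avoids the "=m" operand that triggered the original error.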

How to get this sqrt inline assembly working for iOS

I am trying to follow another SO post and implement sqrt14 within my iOS app:
double inline __declspec (naked) __fastcall sqrt14(double n)
{
_asm fld qword ptr [esp+4]
_asm fsqrt
_asm ret 8
}
I have modified this to the following in my code:
double inline __declspec (naked) sqrt14(double n)
{
__asm__("fld qword ptr [esp+4]");
__asm__("fsqrt");
__asm__("ret 8");
}
Above, I have removed the "__fastcall" keyword from the method definition since my understanding is that it is for x86 only. The above gives the following errors for each assembly line respectively:
Unexpected token in argument list
Invalid instruction
Invalid instruction
I have attempted to read through a few inline ASM guides and other posts on how to do this, but I am generally just unfamiliar with the language. I know MIPS quite well, but these commands/registers seem to be very different. For example, I don't understand why the original author never uses the passed in "n" value anywhere in the assembly code.
Any help getting this to work would be greatly appreciated! I am trying to do this because I am building an app where I need to calculate sqrt (ok, yes, I could do a lookup table, but for right now I care a lot about precision) on every pixel of a live-video feed. I am currently using the standard sqrt, and in addition to the rest of the computation, I'm running at around 8fps. Hoping to bump that up a frame or two with this change.
If it matters: I'm building the app to ideally be compatible with any current iOS device that can run iOS 7.1. Again, many thanks for any help.
The compiler is perfectly capable of generating the fsqrt instruction on its own; you don't need inline asm for that. You might get some extra speed if you use -ffast-math.
For completeness' sake, here is the inline asm version:
__asm__ __volatile__ ("fsqrt" : "=t" (n) : "0" (n));
The fsqrt instruction has no explicit operands; it uses the top of the FPU stack implicitly. The =t constraint tells the compiler to expect the output on the top of the FPU stack, and the 0 constraint instructs the compiler to place the input in the same place as output #0 (i.e. the top of the FPU stack again).
Note that this x87 fsqrt is of course x86-only, meaning it won't work on ARM CPUs, for example.
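For illustration, the inline asm above could be wrapped so that it only applies on x86. This is a sketch under assumptions: `sqrt_hw` is a hypothetical name, GCC or Clang is assumed, and the AArch64 branch uses that architecture's own FSQRT instruction (which, unlike the x87 one, takes explicit registers) as one possible ARM counterpart.

```c
/* Hypothetical helper: square root via a single hardware instruction. */
static inline double sqrt_hw(double n)
{
#if defined(__i386__) || defined(__x86_64__)
    /* x87: input and output both sit on the top of the FPU stack. */
    __asm__("fsqrt" : "=t"(n) : "0"(n));
    return n;
#elif defined(__aarch64__)
    double result;
    /* AArch64: FSQRT with explicit double-precision registers. */
    __asm__("fsqrt %d0, %d1" : "=w"(result) : "w"(n));
    return result;
#else
    /* Assumption: fall back to the compiler builtin on other targets. */
    return __builtin_sqrt(n);
#endif
}
```

In practice a plain call to the standard sqrt is usually compiled to the same instruction, so a wrapper like this mainly serves to show how the constraints differ between targets.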

Memory address won't load

I am trying to move the value 0 into the address stored in ax (assume that this is writable for now).
mov ax, 0EC7 ; assume writable
mov BYTE [ax], 0
But, nasm is giving me this error:
error: invalid effective address
Any ideas?
16-bit addressing modes are quite limited. You can use an (optional) offset (a plain number), plus an (optional) base register (bx or bp), plus an (optional) index register (si or di). That's it.
In 32-bit addressing modes, any register can be a base register and any register but esp can be an index register. 32-bit addressing also introduces an (optional) scale (1, 2, 4, or 8) to be multiplied by the index register.
[eax] will work - even in 16-bit code. The assembler generates an "address size override prefix" byte (0x67). If the value in eax exceeds the segment limit (usually 64k), an exception is generated (not handled in real DOS - it just hangs), so be careful with it.
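The 16-bit rule above can be written down as a small check. This is only an illustrative sketch in C, not part of any assembler: `is_one_of` and `valid_16bit_address` are hypothetical names, registers are passed as lowercase strings, and NULL means "register not used".

```c
#include <stddef.h>
#include <string.h>

/* Returns 1 if reg is absent (NULL) or equal to one of the two names. */
static int is_one_of(const char *reg, const char *a, const char *b)
{
    return reg == NULL || strcmp(reg, a) == 0 || strcmp(reg, b) == 0;
}

/* Hypothetical check: can [base + index + offset] be encoded with
 * 16-bit addressing? Base must be bx or bp, index must be si or di;
 * either may be omitted. */
static int valid_16bit_address(const char *base, const char *index)
{
    return is_one_of(base, "bx", "bp") && is_one_of(index, "si", "di");
}
```

With this check, `valid_16bit_address("ax", NULL)` is rejected while `valid_16bit_address("bx", NULL)` is accepted, which matches the error in the question: loading the address into bx instead of ax makes the effective address encodable.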
