Trying to understand the load memory address (LMA) and the binary file offset in an ARM binary image

I'm working on an ARM Cortex-M4 (STM32F4xxxx) and I'm trying to understand how exactly the binaries (*.elf and *.bin) are built and flashed into memory, especially with regard to the memory locations. Specifically, what I don't understand is how the LMA gets 'translated' from the actual binary file offset. Let me explain with an example:
I have an *.elf file whose (relevant) sections are the following (obtained from objdump -h):
my_file.elf:     file format elf32-littlearm
Sections:
Idx Name          Size      VMA       LMA       File off  Algn
  0 .text         000001c4  08010000  08010000  00020000  2**0
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
  1 .bootloader   00004000  08000000  08000000  00010000  2**0
                  CONTENTS, ALLOC, LOAD, DATA
According to that listing, the VMA and LMA of .bootloader and .text are 0x8000000 and 0x8010000 respectively, which is perfectly fine since they are defined that way in the linker script file. In addition, according to that report, the file offsets of those sections are 0x10000 and 0x20000 respectively. Next, I execute the following command to dump the contents of the .bootloader section:
xxd -s 0x10000 -l 16 my_file.elf
00010000: b007 c0de b007 c0de b007 c0de b007 c0de ................
Now, create the binary file to be flashed into memory:
arm-none-eabi-objcopy -O binary --gap-fill 0xFF -S my_file.elf my_file.bin
According to the information provided above, and as far as I understand, the generated binary file should have the .bootloader section located at 0x8000000. I understand that this is not how it actually works, inasmuch as the file would get extremely big, so the bootloader is placed at the beginning of the file, i.e. at address 0x0 (note that both memory chunks are identical, even though they are at different addresses):
xxd -s 0x00000 -l 16 my_file.bin
00000000: b007 c0de b007 c0de b007 c0de b007 c0de ................
As far as I understand, when the mentioned binary file is flashed into memory, the bootloader will be at address 0x0, which is perfectly fine taking into account that the MCU in question jumps to the address 0x4 (after getting the SP from 0x0) when it starts working, as I have checked here (page 26): https://www.st.com/content/ccc/resource/technical/document/application_note/76/f9/c8/10/8a/33/4b/f0/DM00115714.pdf/files/DM00115714.pdf/jcr:content/translations/en.DM00115714.pdf
Finally, my questions are:
Will the bootloader actually be placed at 0x0? If so, what's the purpose of defining the memory sectors in the linker file?
Is this because 0x0 belongs to flash memory, and when the MCU starts, all the flash is copied into RAM at address 0x8000000? If so, will the bootloader be executed from flash memory and all the rest of the code from RAM?
Taking into account the above questions, if I have not understood anything, what's the relation/difference between the LMA and the File offset?

No, the bootloader will be at 08000000, as defined in the ELF file.
The image will be burned into flash at that address and executed directly from there (it is not copied anywhere else).
There is a somewhat undocumented behaviour: the uninitialized area before the actual data is skipped when producing a binary image. As a comment in the BFD library source states (https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;a=blob;f=bfd/binary.c;h=37f5f9f7363e7349612cdfc8bc579369bbabbc0c;hb=HEAD#l238):
/* The lowest section LMA sets the virtual address of the start
of the file. We use this to set the file position of all the
sections. */
The lowest section LMA (that of .bootloader) is 08000000 in your .elf, so offset 0 of the binary file corresponds to this address.
You should take this address into account and add it to the offset within the binary file when determining the address of anything in the image.
Sections:
Idx Name          Size      VMA       LMA       File off  Algn
  0 .text         000001c4  08010000  08010000  00020000  2**0
                                   /* ^^^^^^^^ */
                  /* this section will be at offset 10000 in image */
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
  1 .bootloader   00004000  08000000  08000000  00010000  2**0
                                   /* ^^^^^^^^ */
                  /* this is the lowest LMA in your case it will be used */
                  /* as a start of an image, and this section will be placed */
                  /* directly at start of the image */
                  CONTENTS, ALLOC, LOAD, DATA
Memory layout:                    Bin. image layout:
00000000                          \ skipped
...        ________________       /
08000000   .bootloader            0
...        ________________
08004000   <gap>                  4000
...        ________________
08010000   .text                  10000
...        ________________
080101C4                          101C4
That address is defined in the ldscript, so the binary image should start at a fixed location. However, you should be aware of this behaviour when dealing with ldscripts and binary images.
To summarize the building and flashing process:
When linking, the start address is defined in the ldscript, and the first section in the ELF is located there.
When converting to binary, the start address is determined from the lowest LMA, and the binary image starts from that address.
When flashing the image, the same address is given to the flasher as a parameter, so the image is placed at the right location (the one defined in the ldscript).
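As a rough illustration of the rule in that BFD comment, here is a minimal Python sketch that computes where each section lands inside the .bin. The section values are copied from the objdump -h listing in the question; the function name image_offsets is made up for this example, and the sketch ignores everything else objcopy does (gap filling, section filtering).

```python
# Sketch: position of each section inside an `objcopy -O binary` image.
# The lowest LMA becomes offset 0; every other section sits at (LMA - lowest LMA).
sections = {
    ".text":       {"lma": 0x08010000, "size": 0x000001C4},
    ".bootloader": {"lma": 0x08000000, "size": 0x00004000},
}

def image_offsets(sections):
    """Map each section name to its byte offset inside the .bin image."""
    base = min(s["lma"] for s in sections.values())  # start address of the image
    return {name: s["lma"] - base for name, s in sections.items()}

offsets = image_offsets(sections)
for name, off in sorted(offsets.items(), key=lambda kv: kv[1]):
    print(f"{name:12s} image offset 0x{off:05X}")
```

Note that the ELF file offsets (0x10000 and 0x20000) never enter the calculation; only the LMAs do.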
Update: STM32F4xxx booting process.
The address region starting at address 0 is special on those MCUs. It can be configured to map other regions: flash, SRAM, or system ROM. The region is selected by the BOOT0/BOOT1 pins.
From the CPU side it looks as if a second copy of flash (or SRAM, or system ROM) appears at address 0.
When the CPU starts, it first reads the initial SP from address 0 and the initial PC from address 4. These reads are actually served from flash memory.
If the code is linked to run from the actual flash location, then the initial PC will point there. In this case execution starts at the actual flash address.
----- Mapped area (mimics contents of flash) ---
      0: (20001000)        ;
      4: (0800ABCD) ----.  ; CPU reads PC here
....                    |  ; (it points to flash)
----- FLASH -----       |
8000000: 20001000       |  ; initial stack pointer
8000004: 0800ABCD --.   |  ; address of _start in flash
....                |   |
800ABCD: <_start:> movw ... <-'<-' ; Code execution starts here
(Note: this does not apply to hex images (like intel hex or s-record) as such formats define loading address explicitly and it is used as is there).

The documentation is pretty clear on where the address space for application code is on the STM32s, which is 0x08000000 (a competing vendor uses something like 0x01000000, and so on). And when booting in a certain mode, 0x08000000 is mapped to address 0x00000000, as can easily be seen with a debugger (in both spaces).
The address space at 0x00000000 that is mapped to 0x08000000 is smaller than the potential address space at 0x08000000, depending on the chip. So it is wise to build for and use 0x08000000 rather than 0x00000000, but for small programs you can choose either.
Because the Cortex-M is a vector-table machine, when the logic reads address 0x00000004, which is mapped to 0x08000004 in a normal boot mode, it sees 0x080xxxxx and then gets out of the 0x00000000 memory space, avoiding any limitations there.
When you use the boot0/boot1 strap pins you can instead cause 0x00000000 to map elsewhere, to where the burned-in bootloader lives. That bootloader can of course easily read 0x08000000 and simulate a reset by branching, or it can change the mapping logic and actually reset (if you ask it to, although I don't know whether that bootloader actually supports running a program). Who knows; even if we worked there we could not necessarily say. It is quite possible that the chip always boots into the bootloader, which then changes the mapping depending on the straps.
This is similar to an MMU but much simpler; decoding addresses and aliasing them is pretty easy: if boot0 == 0 and address[31:16] == 0x0000 then address[31:16] = 0x0800, and the memory system decodes the access at the different address. It is as easy to write in the HDL as it is in C, if not easier.
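To make that one-line decode rule concrete, here is a small Python model of it. This is only a sketch of the rule as stated above: the 64 Kbyte window implied by address[31:16] and the names decode and FLASH_BASE are illustrative assumptions, not the real chip logic, and the actual alias window size depends on the part.

```python
FLASH_BASE = 0x08000000  # where the application flash lives in the CPU's view

def decode(address, boot0):
    """Return the address the memory system ends up decoding.

    Models: if boot0 == 0 and address[31:16] == 0x0000
            then address[31:16] = 0x0800
    """
    if boot0 == 0 and (address >> 16) == 0x0000:
        return FLASH_BASE | (address & 0xFFFF)  # alias low window onto flash
    return address  # everything else passes through unchanged

# The reset vector fetch at 0x00000004 lands on flash at 0x08000004.
print(hex(decode(0x00000004, boot0=0)))
```

With boot0 set to 1 the same fetch would not be redirected to the application flash, which matches the idea that another memory is mapped at 0x00000000 in that mode.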
This is not uncommon in microcontrollers and other devices. Microcontrollers generally boot from flash/ROM, but on some architectures that same boot space is also the vector or exception table, which an RTOS might want to manipulate. So sometimes you see that RAM can be swapped into that space: the CPU "sees" RAM there after a control register is changed, where on boot it "saw" the vector table in flash. Alternatively, the code in flash branches to somewhere in RAM for the non-reset vectors, and then the RTOS or any other application that cares to can change at run time what code actually gets run for those exceptions or interrupts.
ARM imposes address-space rules for where code can execute, where data can live, where you might want to start your peripheral address space, and what address space is reserved by ARM for resources within the core. So you will sometimes see RAM with an alias at a lower address, implying that if you want to run a program from RAM you should use the lower address for execution but can use either address to copy the code there.
It is up to the chip designers how simple or complicated to make this. For ST it's pretty simple: they have one or more boot pins on the package that at least let you choose between your application and the on-chip bootloader. On all the STM32s I have seen so far, the application flash space is considered to live at 0x08000000 and is mapped/aliased to 0x00000000 for one of those boot modes. When two boot pins are exposed, up to four boot conditions can exist, one of which is the application with 0x00000000 aliased to 0x08000000.
As to how to get the bits into the flash, that varies widely by tool. Toolchains like GNU will certainly build a .bin file where the first byte of the file is the first byte from the ELF that we want at 0x08000000 (if built that way; if you built for 0x02000000 it will still be the first byte, and that code probably won't work). There are tools, and you can certainly write your own, that can load a .bin file at the desired place of 0x08000000; or you can have your tool write to address 0x00000000 in the right mode, for a program that is not too big, and have it still land in the right place to execute on reset. Likewise there are tools, or you can write them, that parse .elf files, Intel HEX, Motorola S-record and others and, based on the information in those files, load the data into the address space you desire, assuming everything is bug free.
You might be trying to overcomplicate it. There isn't any magic to it: the tools need to do the sane thing, and the sane thing is to take the binary from the compiler and put it in the chip where we want it. We are responsible for the linker script and for the bootstrap code/vector table of course, but if we do that right, then the tools, if they are designed right, will put the bits in the right place in the chip, and if the chip is designed as documented it will boot and run.
Will the bootloader actually be placed at 0x0? If so, what's the
purpose of defining the memory sectors in the linker file?
Ideally you want your application, or bootloader as you are calling it, to be at address 0x08000000 in the processor's address space. In certain boot modes (boot0/boot1) that address is also aliased to 0x00000000, so you can see that vector table at both places at the same time. If you are not in the right boot mode then only 0x08000000 will show your code.
Is this because 0x0 belongs to flash memory, and when the MCU starts,
all the flash is copied into RAM at address 0x8000000? If so, will the
bootloader be executed from flash memory and all the rest of the code
from RAM?
The logic in the chip is designed to take the address the processor puts on its address bus and have more than one address land on the application flash. The flash itself is not "at" 0x08000000: a 16 Kbyte flash, for example, only has addresses from 0x0000 to 0x3FFF. When you access 0x08001234, the logic actually sends 0x1234 to the flash controller, or the flash controller chops the top bits off if it knows it is supposed to handle that request. 0x00000000 and 0x08000000 are the processor's view of the address space; in reality the upper bits are decoded to route the request to whomever it belongs to, and the final handler ultimately looks at the lower bits to determine what is being addressed.
Like when you deliver a letter it has a first and last name, a street address, city state zip. Once it gets to the right post office in the right state then the street address is all that matters to the postal person. Once it gets to the right house, often the first name is all that matters, the rest can be ignored. No difference here. Portions of the address (can often) become don't cares as the responsible logic that inspects that address aims the request at the correct party.
Taking into account the above questions, if I have not understood
anything, what's the relation/difference between the LMA and the File offset?
The ELF file format is generic, and way overkill for microcontroller work, but since it is well supported and easy to use, why not. The load memory address is where we, the programmers, want that code to live with respect to the processor's view of the world. From a readelf perspective, the file offset is simply where that information sits in the ELF file; it is wherever the tool put it and has no other interesting relationship to the LMA, or at least doesn't need to. Objcopy will rip that data out of the file and, for -O binary, put it in a sort of memory image file, with the lowest address copied out being offset 0 in that file and the size being determined by the total address span of all of the loadable blocks (unless you use more command line parameters).
And, as you sort of implied, think about what a linker script bug does here: if you were to have even a single instruction at 0x08000000 and a single byte of .data at 0x20000000 but didn't do the AT > thing, then your file, despite only having three relevant bytes, would be 0x20000001 - 0x08000000 bytes long (after a -O binary). So it is a good idea not to put objcopy in your makefile until you have debugged your linker script. Imagine, say, a target where flash is at 0x00000000 and memory is at 0xE0000000: you get pretty big .bin files until you get the linker script sorted out.
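The size arithmetic in that last paragraph can be checked with a few lines of Python. This is a sketch under the assumption that the image simply spans from the lowest load address to the end of the highest loadable block; bin_size and the "fixed" layout (with the .data byte loaded directly after the instruction in flash) are hypothetical names and values chosen for illustration.

```python
def bin_size(blocks):
    """Size in bytes of a flat binary image covering all loadable blocks.

    blocks is a list of (load_address, size_in_bytes) tuples.
    """
    lowest = min(lma for lma, _ in blocks)
    highest_end = max(lma + size for lma, size in blocks)
    return highest_end - lowest

# Linker script bug: .data is loaded at its RAM address instead of in flash.
buggy = [(0x08000000, 2), (0x20000000, 1)]
# Hypothetical fixed layout: the .data byte is loaded right after the code.
fixed = [(0x08000000, 2), (0x08000002, 1)]

print(f"buggy image: 0x{bin_size(buggy):X} bytes")
print(f"fixed image: {bin_size(fixed)} bytes")
```

The buggy layout comes out at 0x20000001 - 0x08000000 = 0x18000001 bytes, roughly 384 MB, for three bytes of payload.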

Related

How do memory addresses in binary programs point to the right place in memory at runtime?

From what I understand, when you compile a program (let's say a C program, for example), the compiler takes your code and outputs an executable program in binary (i.e. machine code for the targeted arch) format.
Within this binary you're going to have instructions that point to addresses in memory to load data/instructions from other parts of the program.
Given that this program will be loaded into memory at some arbitrary location, how does the program know what these memory addresses are? How are they set/calculated, and whose job is it to do this?
For example, does the binary just have placeholders for the memory locations that are replaced by the OS when it loads it into memory for the first time?
If it needs to dynamically load a shared library how does it work out where the memory location is for that?
How does 'virtual memory' come into play with this? (if at all)
how does the program know what these memory addresses are?
The program (and its author) does not know what the memory address will be when it is loaded into computer memory; it only knows where the placeholder is, relative to the start of its segment. That's why the compiler accompanies each such placeholder with a relocation record. A relocation is a piece of information which tells the OS or the linker:
where the relocated address is (its offset in the code or data segment)
which segment it is in
which segment or symbol it refers to
what kind of relocation should be applied to the address
Consider the following simple piece of source code for a Windows Portable Executable program:
[.text]
Main: NOP
      LEA ESI,[Mem]
      ; more instructions
[.data]
      DB "Some data"
Mem:  DB "Other data"
which will be converted to machine instructions and memory data:
|[.text]                   |[.text]
|00000000:90               |Main: NOP
|00000001:8D35[09000000]   |      LEA ESI,[Mem]
|00000007:                 |      ; more instructions
|[.data]                   |[.data]
|00000000:536F6D6520646174~|      DB "Some data"
|00000009:4F74686572206461~|Mem:  DB "Other data"
The compiler does not know the virtual address of Mem; it only knows that it is located 0x00000009 bytes from the start of the .data segment, so it puts this temporary number into the operation code of LEA ESI,[Mem] and creates a relocation for the placeholder (located in segment .text at offset 0x00000003) which is relative to segment .data.
At link time the linker decides that the .text segment will be loaded at virtual address 0x00401000 and the .data segment at VA 0x00402000. The linker then reads the relocation record and modifies the placeholder by adding 0x00402000. The instruction LEA ESI,[Mem] in the linked executable will then be 8D3509204000, which contains the final fixed-up virtual address of Mem (0x00402009). We'll be able to see that address in a debugger at run time.
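The fix-up the linker performs here is small enough to replay by hand. The following Python sketch applies that one relocation to the bytes of the .text segment shown above; apply_relocation is a made-up helper name, and real linkers of course support many relocation types beyond this "add the segment base to a 32-bit little-endian placeholder" case.

```python
# .text as emitted by the compiler: NOP, then LEA ESI,[Mem] with placeholder 9.
text = bytearray.fromhex("90" "8D35" "09000000")

DATA_VA = 0x00402000   # virtual address the linker chose for .data
RELOC_OFFSET = 3       # offset of the placeholder inside .text

def apply_relocation(section, offset, segment_va):
    """Add segment_va to the 32-bit little-endian placeholder at offset."""
    placeholder = int.from_bytes(section[offset:offset + 4], "little")
    fixed = (placeholder + segment_va) & 0xFFFFFFFF
    section[offset:offset + 4] = fixed.to_bytes(4, "little")
    return fixed

mem_va = apply_relocation(text, RELOC_OFFSET, DATA_VA)
print(f"Mem is at VA 0x{mem_va:08X}")
print("LEA encoding:", text[1:7].hex().upper())
```

The patched instruction bytes match the 8D3509204000 quoted above, because 0x00402009 is stored least-significant byte first.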
Relocations are present in linked executable files, too (16bit DOS MZ or Windows PE), for the case that they could not be loaded at the virtual imagebase address assumed at link time. With linking SO libraries in Linux it is more complicated, see chapter 2 Dynamic linking in http://www.skyfree.org/linux/references/ELF_Format.pdf
MMUs allow the OS to create the same address space (think addresses zero to N) for each application such that each application can be compiled for a known address space. There isn't much need for relocation in this situation. Even in the DOS days you could/would have a fixed offset relative to some data segment so that the applications could have an assumed address space.
The kernel bootstrap for Linux is one place where you will see relocation, but in the kernel itself not so much, or perhaps that has changed in recent years.
Loadable modules and shared libraries are another place where you might see relocation required. For at least the popular processors running the popular operating systems (Linux, Windows, macOS, arm, x86, mips), the code itself can be built to be relocatable without modification so long as it is all relative to itself, which is what is assumed.
For data addressed relative to code, though, if you want to be able to move the data then some form of table is typical: the table sits at a fixed position relative to the code (or is found by some other linked mechanism) and records where the data starts, or where specific items/markers in the data start, so that other data references can be made relative to that.

STM32 Current Flash Vector Address

I'm working on a dual-OS system with an STM32F103. I have two separate programs that are programmed at different flash locations. If both programs are the same, the only way to know which of them is running is by its start vector address.
But how can I read the current program's start vector address on an STM32?
After reading the comments, it sounds like what you have/want is a bootloader. If your goal here is to have two different applications, one to do your main processing and real time handling and the other to just program new firmware, then you want to make a bootloader in your default boot flash space.
Bootloaders fundamentally do a few things; everything else is extra:
Check themselves using some type of data integrity check, like a CRC.
Check the application.
Jump to the application.
Bootloaders will also program applications in the app space and verify they are programmed correctly before jumping as well. Colin gave some good advice about appending a CRC to the hex file before it is programmed in flash space to verify the applications.
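As a host-side illustration of the "append a CRC, then verify it before jumping" idea, here is a minimal Python sketch. The image layout (a CRC-32 stored as the last four bytes, little-endian) and the names append_crc and application_is_valid are assumptions made for this example; zlib.crc32 merely stands in for whatever CRC your tool and bootloader agree on, and the STM32 hardware CRC unit does not necessarily produce the same values.

```python
import zlib

def append_crc(image: bytes) -> bytes:
    """Return the image with its CRC-32 appended (little-endian, 4 bytes)."""
    crc = zlib.crc32(image) & 0xFFFFFFFF
    return image + crc.to_bytes(4, "little")

def application_is_valid(stored: bytes) -> bool:
    """Recompute the CRC over the payload and compare it with the trailer."""
    if len(stored) < 4:
        return False
    payload, trailer = stored[:-4], stored[-4:]
    return (zlib.crc32(payload) & 0xFFFFFFFF) == int.from_bytes(trailer, "little")

firmware = bytes(range(64))          # stand-in for the application binary
flashed = append_crc(firmware)       # what the programming tool would write
corrupted = bytes([flashed[0] ^ 0xFF]) + flashed[1:]

print(application_is_valid(flashed))     # the bootloader may jump
print(application_is_valid(corrupted))   # the bootloader should refuse
```

On the target the same comparison would be done by the bootloader over the application's flash region, using the length and CRC locations fixed by the linker scripts.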
There are a few things to look out for. The first would be the linker script and this is extremely important. A linker script will be used to map input objects to output objects and then determine based upon that script, what memory space they go into. For both of your applications, you need to create a memory map of how you want both programs to sit inside of the flash space. From this point, you can then make linker scripts for both programs so that a hex file can be generated within the parameters of what you deem acceptable flash space for the program. Each project you have will have its own linker script. An example would look something like this:
LR_IROM1 0x08000000 0x00010000 {    ; load region size_region
  ER_IROM1 0x08000000 0x00010000 {  ; load address = execution address
    *.o (RESET, +First)
    *(InRoot$$Sections)
    .ANY (+RO)
  }
  RW_IRAM1 0x20000000 0x00018000 {  ; RW data
    .ANY (+RW +ZI)
  }
}
This will give RAM for the application to use as well as a starting point for the application.
After that, you can start the bootloader and give it information about where the application space lies for jumping and programming. Once again this is all determined by you from your memory map and both applications' linker scripts. You are going to need to add a separate entry inside of the linker for your CRC and length for a comparison of the calculated versus stored as well. Whatever tool you use to append the CRC to the hex file and have it programmed to flash space, remember to note the location and make it known to the linker script so you can reference those addresses to check integrity later.
After you check everything and it is determined that it is okay to go to the application, you can use some ARM assembly to jump to the starting application address. Before jumping, make sure to disable all peripherals and interrupts that were enabled in the bootloader. As Colin mentioned, the bootloader and the application will share RAM, so it is important that you de-initialize everything that was used; otherwise, you'll end up with a hard fault.
At this point, the program used another hex file laid out by a linker script, so it should begin executing as planned, as long as you have the correct vector table offset, which gets into your question fully.
As far as your question on the "Flash vector address" goes, I think what you really mean is your interrupt vector table address. An interrupt vector table is a data structure in memory that maps interrupt requests to the addresses of interrupt handlers. This is where the PC register gets the next instruction address from when, for example, a hardware interrupt triggers. You can see this by keeping track of the ARM pipeline in a few lines of assembly code. Each entry of this table is a handler's address. The table offset must match your application; otherwise you will never get into the main function, and the program will sit in the application space with nothing to do, since all the handler addresses are unknown. This is what SCB->VTOR is for: it is the vector table offset register. In this case, there are a few things you can do. Luckily, this is hard-coded in the ST-generated file "system_stm32(xx)xx.c" (xx is your microcontroller variant). There is a define called VECT_TAB_OFFSET, which is the offset of the vector table in the memory map; the chosen value is assigned to the SCB->VTOR register. Your interrupt vector table will always lie at the starting address of your program, so for the bootloader the offset can be 0x00, but for the application it will be the starting address of the application space minus the first addressable flash address of the microcontroller.
/************************* Miscellaneous Configuration ************************/
/*!< Uncomment the following line if you need to relocate your vector Table in
Internal SRAM. */
/* #define VECT_TAB_SRAM */
#define VECT_TAB_OFFSET 0x00 /*!< Vector Table base offset field.
This value must be a multiple of 0x200. */
/******************************************************************************/
Make sure you understand what the micro expects, using the ST documentation, before programming things. The vector table offset on this chip must be a multiple of 0x200. But to answer your question, this address is determined by your memory map, and eventually you will have a hard-coded reference to it as a define. You can figure it out from there.
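The subtraction described above, together with the multiple-of-0x200 rule from the ST comment, can be sanity-checked on the host before it goes into the define. This is only a sketch: FLASH_BASE is the usual 0x08000000, while the application start address 0x08004000 and the helper name vect_tab_offset are hypothetical values picked for illustration.

```python
FLASH_BASE = 0x08000000   # first addressable flash address
ALIGNMENT = 0x200         # "This value must be a multiple of 0x200."

def vect_tab_offset(app_start):
    """Offset to use for VECT_TAB_OFFSET for a program linked at app_start."""
    offset = app_start - FLASH_BASE
    if offset < 0 or offset % ALIGNMENT != 0:
        raise ValueError(f"0x{app_start:08X} is not a valid vector table base")
    return offset

BOOTLOADER_START = 0x08000000   # bootloader sits at the start of flash
APP_START = 0x08004000          # hypothetical application start address

print(f"bootloader: VECT_TAB_OFFSET = 0x{vect_tab_offset(BOOTLOADER_START):X}")
print(f"application: VECT_TAB_OFFSET = 0x{vect_tab_offset(APP_START):X}")
```

Whatever start address your memory map gives the application, the same value has to appear in that program's linker script, so it is worth keeping the two in one place.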
Hope this helps and good luck to you on your application.

How to byte-address memory in an Altera FPGA?

I used megafunctions to generate a 32-bit data memory in the FPGA, but the output is addressed 32 bits (4 bytes) at a time. How do I do 1-byte addressing?
I have an Altera Cyclone IV EP4CE6E22C8.
I'm designing a 32-bit CPU in the FPGA.
Nowadays every CPU address bus works in bytes. Thus, to access your 32-bit-wide memory you should NOT connect the two least-significant address bits to it. You can use the A[1:0] address bits to select a byte (or a half word, using A[1] only) from the memory word when you read.
You will still need four byte-write-enable signals. These allow you to write words, half-words or bytes.
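A behavioural model may make the byte-enable idea easier to picture before writing the HDL. The Python sketch below models a 32-bit-wide memory indexed by A[31:2], with four byte enables on writes and A[1:0] selecting a byte on reads. The class name WordMemory and the little-endian lane numbering (lane 0 = bits 7:0) are assumptions of this sketch, not something dictated by the FPGA memory block.

```python
class WordMemory:
    """32-bit wide memory with per-byte write enables (behavioural model)."""

    def __init__(self, num_words):
        self.words = [0] * num_words

    def write(self, byte_addr, data, byte_enables):
        """Write the lanes of data whose bit is set in byte_enables (4 bits)."""
        index = byte_addr >> 2            # A[1:0] is not connected to the RAM
        word = self.words[index]
        for lane in range(4):
            if byte_enables & (1 << lane):
                mask = 0xFF << (8 * lane)
                word = (word & ~mask) | (data & mask)
        self.words[index] = word & 0xFFFFFFFF

    def read_byte(self, byte_addr):
        """Read the full word, then use A[1:0] to pick one byte lane."""
        word = self.words[byte_addr >> 2]
        return (word >> (8 * (byte_addr & 3))) & 0xFF

mem = WordMemory(4)
mem.write(0, 0x11223344, 0b1111)   # full word write
mem.write(0, 0x0000AA00, 0b0010)   # byte write to lane 1 only
print(hex(mem.words[0]), hex(mem.read_byte(1)))
```

A half-word write is the same operation with two adjacent enables set, for example 0b0011 or 0b1100.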
Have a look at existing CPU buses or existing connection standards like AHB or AXI.
Post edit:
but reading address 0001 , i get 0x05060708 but the desired value is 0x02030405.
What you are trying to do is read a word from a non-aligned address. There is no existing 32-bit wide memory that supports that. I suggest you have a look at how a 32-bit wide memory works.
The old Motorola 68020 architecture supported that. It requires a special memory controller which first reads the data from address 0 and then from address 4 and re-combines the data into a new 32-bit word.
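What such a controller has to do can be sketched in a few lines of Python: read the two aligned words that the access straddles and splice them together. The memory contents below (0x01020304 followed by 0x05060708) and the big-endian byte order are inferred from the values quoted in the question, so treat them as assumptions; read_unaligned is a made-up name.

```python
WORDS = [0x01020304, 0x05060708]   # assumed memory contents, word 0 and word 1

def read_unaligned(words, byte_addr):
    """Read a 32-bit word at any byte address using two aligned reads."""
    index = byte_addr >> 2
    shift = (byte_addr & 3) * 8
    first = words[index]                                         # first access
    second = words[index + 1] if index + 1 < len(words) else 0   # second access
    combined = (first << 32) | second                            # 64-bit window
    return (combined >> (32 - shift)) & 0xFFFFFFFF

print(hex(read_unaligned(WORDS, 0)))
print(hex(read_unaligned(WORDS, 1)))
```

For byte address 1 this yields 0x02030405, the value the question was hoping for, at the cost of two memory accesses instead of one.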
With the cost of memory dropping and reducing CPU cycles becoming more important, many modern CPUs, simple RISC cores in particular, do not support that. They raise an exception: non-aligned memory access.
You have several choices:
Build a special memory controller which supports unaligned accesses.
Adjust your expectations.
I would go for the latter. In general, the expectation is based on a wrong idea of how a memory works. As consolation: you are not the first person on this website who thinks that this is how you read words from memory.

AVR memory and intel hex

I am working on a simple AVR programmer for my university project, and I am stuck on understanding how to map memory from a hex file to the actual flash memory.
For instance, Intel HEX gives us the start address of a data block, the number of bytes in it and the data itself. The trouble comes from the fact that AVR MCUs, in particular the ATmega16, often have one address for two bytes: high and low.
At first, I wrote a straightforward function that just reads all the data from the hex file and writes it sequentially, increasing the address by one for every two bytes passed. To my surprise it works on simple blinky code. However, I am not sure whether this approach would work if someone needs a complex memory structure.
So the questions are:
Will this solution work on complex memory structures?
If not, how can I map intel hex address into actual flash address? The problem is there is no high and low bytes in intel hex format, only address = byte.
Intel HEX uses byte addresses. The PC (program counter) refers to 16-bit word addresses. If you mean the word address to be the "actual address", then just halve the number that represents the start address of the line in the hex file.
What do you mean by "complex memory structures"? Memory locations need unique addresses, no matter how that address space is broken up. I am not familiar with program memory spaces that don't start with 0 and continue linearly, but if there were such a scheme, a line in an intel hex file can specify the contents of any contiguous memory section starting at any address.
Edit:
Each line of an Intel HEX file can only contain up to 255 data bytes. Typically, the data is split into 16- or 32-byte chunks. Each line contains the start address of the chunk (which is added to the base address, if one is used). A chunk doesn't have to start at the end of the previous chunk, and chunks can be out of order, too.
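Here is a minimal Python sketch of how a programmer might decode one data record and turn its byte address into flash word addresses. It assumes plain data records (type 00) that start on an even byte address and hold an even number of bytes, with the low byte of each 16-bit word stored first; the function names are made up, the sample record was constructed for this example, and extended address records are ignored.

```python
def parse_record(line):
    """Decode one Intel HEX record into (byte_address, record_type, data)."""
    line = line.strip()
    if not line.startswith(":"):
        raise ValueError("record must start with ':'")
    raw = bytes.fromhex(line[1:])
    if sum(raw) & 0xFF != 0:               # all bytes incl. checksum sum to 0
        raise ValueError("bad checksum")
    count = raw[0]
    address = int.from_bytes(raw[1:3], "big")
    return address, raw[3], raw[4:4 + count]

def to_flash_words(byte_address, data):
    """Pair bytes into (word_address, 16-bit word) tuples, low byte first."""
    if byte_address % 2 or len(data) % 2:
        raise ValueError("sketch handles word-aligned records only")
    return [((byte_address + i) >> 1, data[i] | (data[i + 1] << 8))
            for i in range(0, len(data), 2)]

# Two data bytes AB CD at byte address 0x0010.
address, rtype, data = parse_record(":02001000ABCD76")
words = to_flash_words(address, data)
print([(hex(a), hex(w)) for a, w in words])
```

Because every record carries its own address, this works the same way whether the chunks are contiguous, separated by gaps, or out of order.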
As for the complex memory structures you describe, most programs have them already. There is usually a vector table at the start, followed by a gap, followed by the crt and main program. Data to initialize global variables follows that. If there is a bootloader, it is placed in a special section at the end of memory.

ARM: Safe physical memory position (to reserve) for my ARM hypervisor in relation to a Linux/Android guest

I am developing a basic hypervisor on ARM (using the board Arndale Exynos 5250).
I want to load Linux (Ubuntu or something else)/Android as the guest. Currently I'm using a Linaro distribution.
I'm almost there, most of the big problems have already been dealt with, except for the last one: reserving memory for my hypervisor such that the kernel does not try to OVERWRITE it BEFORE parsing the FDT or the kernel command line.
The problem is that my Linaro distribution's U-Boot passes a FDT in R2 to the linux kernel, BUT the kernel tries to overwrite my hypervisor's memory before seeing that I reserved that memory region in the FDT (by decompiling the DTB, modifying the DTS and recompiling it). I've tried to change the kernel command-line parameters, but they are also parsed AFTER the kernel tries to overwrite my reserved portion of memory.
Thus, what I need is a safe location in physical RAM in which to put my hypervisor's code, such that the Linux kernel won't try to access (r/w) it BEFORE parsing the FDT or its kernel command line.
Context details:
The system RAM layout on Exynos 5250 is: physical RAM starts at 0x4000_0000 (=1GB) and has the length 0x8000_0000 (=2GB).
The Linux kernel is loaded (by U-Boot) at 0x4000_7000, its size (uncompressed uImage) is less than 5MB and its entry point is set to be at 0x4000_8000;
uInitrd is loaded at 0x4200_0000 and has the size less than 2MB
The FDT (board.dtb) is loaded at 0x41f0_0000 (passed in R2) and has the size less than 35KB
I currently load my hypervisor at 0x40C0_0000 and I want to reserve 200MB (0x0C80_0000) starting from that address, but the kernel tries to write there (a stage 2 HYP trap tells me that) before looking in the FDT or in the command line to see that the region is actually reserved. If instead I load my hypervisor at 0x5000_0000 (without even modifying the original DTB or the command line), it does not try to overwrite me!
The FDT is passed directly, not through ATAGs
Since when loading my hypervisor at 0x5000_0000 the kernel does not try to overwrite it whatsoever, I assume there are memory regions that Linux does not touch before parsing the FDT/command-line. I need to know whether this is true or not, and if true, some details regarding these memory regions.
Thanks!
RELATED QUESTION:
Does anyone happen to know what the priority is between the following: ATAGs / kernel command line / FDT? For instance, if I reserve memory through the kernel command line, but not in the FDT (.dtb), should it work, or is the command line overridden by the FDT? Is there some kind of concatenation between these three?
As per
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/Documentation/arm/Booting, safe locations start 128MB from the start of RAM (assuming the kernel is loaded in that region, which it should be). If a zImage was loaded lower in memory than what is likely to be the end address of the decompressed image, it might relocate itself higher up before it starts decompressing. But in addition to this, the kernel has a .bss region beyond the end of the decompressed image in memory.
(Do also note that your FDT and initrd locations already violate this specification, and that the memory block you are wanting to reserve covers the locations of both of these.)
Effectively, your reserved area should go after the FDT and initrd in memory - which 0x50000000 is. But anything > 0x08000000 from start of RAM should work, portably, so long as that doesn't overwrite the FDT, initrd or U-Boot in memory.
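The overlap is easy to confirm with a bit of arithmetic. The Python sketch below uses the addresses from the question; the FDT and initrd sizes are the upper bounds quoted there ("less than 35KB", "less than 2MB"), and the helper names are made up for illustration.

```python
RAM_START = 0x40000000
SAFE_OFFSET = 0x08000000          # 128 MB from the start of RAM
HYP_SIZE = 0x0C800000             # 200 MB reserved for the hypervisor

# (start, size) of the blobs U-Boot has placed in RAM, sizes as upper bounds.
regions = {
    "fdt":    (0x41F00000, 35 * 1024),
    "initrd": (0x42000000, 2 * 1024 * 1024),
}

def overlaps(a_start, a_size, b_start, b_size):
    """True if the half-open ranges [start, start + size) intersect."""
    return a_start < b_start + b_size and b_start < a_start + a_size

def conflicts(hyp_start):
    """List everything that makes hyp_start an unsafe hypervisor base."""
    found = [name for name, (start, size) in regions.items()
             if overlaps(hyp_start, HYP_SIZE, start, size)]
    if hyp_start < RAM_START + SAFE_OFFSET:
        found.append("below 128 MB mark")
    return found

print(hex(0x40C00000), conflicts(0x40C00000))
print(hex(0x50000000), conflicts(0x50000000))
```

Loading at 0x40C00000 reserves 0x40C00000 to 0x4D400000, which swallows both the FDT and the initrd and sits in the first 128 MB; 0x50000000 has neither problem.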
The priority of kernel/FDT/bootloader command line depends on the kernel configuration - do a menuconfig and check under "Boot options". You can combine ATAGS with the built-in command lines, but not FDT - after all, the FDT chosen node is supposed to be generated by the bootloader - U-boot's FDT support is OK so you should let it do this rather than baking it into the .dts if you want an FDT command line.
The kernel is pretty conservative before it's got its memory map since it has to blindly trust the bootloader has laid things out as specified. U-boot on the other hand is copying bits of itself all over the place and is certainly the culprit for the top end of RAM - if you #define DEBUG in (I think) common/board_f.c you'll get a dump of what it hits during relocation (not including the Exynos iRAM SPL/boot code stuff, but that won't make a difference here anyway).

Resources