I'm working on a so-called cartridge for the geolocation-based WheriGo (http://wherigo.com) game. The architecture used for these cartridges is 32-bit and big-endian. However, my luac creates chunks that are 64-bit and little-endian.
While there is an online compilation service for WheriGo, I'd rather be able to produce the proper binary format myself, especially because there are things I'd rather keep a bit obscured in a stripped chunk loaded by loadstring(), rather than having the full debug information available.
So my question is this: how hard would it be to build a Lua toolchain that generates bytecode for a different architecture than the one it is running on?
If the floating-point representation of both machines is compatible, then this should require just modifications to ldump.c and lundump.c.
Take care to ensure that types such as long are the same size on both. I have done this for integer Lua on x86 and x64.
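For flavor, here is a minimal, self-contained sketch of the core change. The helper names are my own inventions; in a real port this logic would replace the plain writes in Lua 5.1's ldump.c (DumpInt, DumpNumber and friends):

    /* Sketch: emit values in the target's byte order (big-endian),
       regardless of the host's byte order. */
    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>

    /* Detect host endianness at run time. */
    static int host_is_little_endian(void)
    {
        const uint16_t one = 1;
        unsigned char b;
        memcpy(&b, &one, 1);
        return b == 1;
    }

    /* Write an n-byte value big-endian. */
    static void write_be(FILE *f, const void *p, size_t n)
    {
        unsigned char buf[16];
        memcpy(buf, p, n);
        if (host_is_little_endian()) {      /* reverse the bytes */
            for (size_t i = 0; i < n / 2; i++) {
                unsigned char t = buf[i];
                buf[i] = buf[n - 1 - i];
                buf[n - 1 - i] = t;
            }
        }
        fwrite(buf, 1, n, f);
    }

    /* E.g. what a cross-endian DumpInt would do: force the target's
       sizeof(int) == 4 and write it big-endian. */
    static void dump_int_be32(FILE *f, int x)
    {
        int32_t v = (int32_t)x;
        write_be(f, &v, sizeof v);
    }

The same treatment applies to size_t lengths, instructions, and lua_Numbers; lundump.c only needs the mirror image if you also want to load foreign chunks locally.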
You could always run a 32-bit big-endian machine as a VM, e.g. Aurélien’s prebuilt images for Debian/mips (notes). It'll be slow, but it works and can be automated easily. (Do a dist-upgrade from squeeze to at least wheezy, then get the latest Lua.)
I've run VMs like that often enough… it's slow, but I think of it as batch processing: I start a job (apt or compile), then look at it occasionally (or: the next day) to see whether it finished. Most of the time this works out pretty well; some things of course do not work right in emulation (e.g. due to emulator bugs or differences), but to get a big-endian 32-bit Lua, this might work.
Suggested reading: lua bytecode portability and middle-endian doubles on ARM (both on the Lua mailing list) – since PocketPC machines are mostly ARM, you might run into that. Best to check actual Wherigo cartridges to see what settings they use…
The gist of these postings is: endianness, sizeof(int), sizeof(size_t), sizeof(Instruction), sizeof(lua_Number), and type of lua_Number must be the same for the bytecode to be compatible across architectures (says Luiz Henrique de Figueiredo), and middle-endian floats (both single and double) do exist in the wild (steve donovan and Dimiter 'malkia' Stanev).
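For reference, in Lua 5.1 those parameters are exactly the bytes of the 12-byte chunk header that the loader compares (see luaU_header in lundump.c). A sketch of the header a hypothetical 32-bit big-endian target with IEEE doubles would expect:

    /* The 12 header bytes of a Lua 5.1 chunk, annotated for a
       hypothetical 32-bit big-endian target: */
    static const unsigned char expected_header[12] = {
        0x1B, 'L', 'u', 'a',  /* LUA_SIGNATURE                    */
        0x51,                 /* version 5.1                      */
        0x00,                 /* format (0 = official)            */
        0x00,                 /* endianness: 0 = big, 1 = little  */
        4,                    /* sizeof(int)                      */
        4,                    /* sizeof(size_t)                   */
        4,                    /* sizeof(Instruction)              */
        8,                    /* sizeof(lua_Number)               */
        0                     /* 0 = lua_Number is floating point */
    };

Getting your modified luac to dump a header like this, with every field matching what the Wherigo player expects, is the whole game.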
Do tell if you do it – I'm interested because I'm a geocacher myself (though I still need to figure out how to play cartridges; there's no player for my platforms).
Related
I'm currently working on an app that loads a blob of tightly packed data containing integer types of different sizes (from char to int) that might not be properly aligned.
So, can I use a simple *(short*)ptr or similar access to that data? A test on my iPhone 5 shows no problem with that, but I'm not sure about all cases on all newer processors.
I did find some related information, like this:
ARMv6 and later, except some microcontroller versions, support unaligned accesses for half-word and single-word load/store instructions with some limitations, such as no guaranteed atomicity.
but in the case of "words" it seems that a word is 32 bits on 32-bit ARM and 64 bits on 64-bit ARM, which would mean a short might require proper alignment on a 64-bit machine.
So, can I assume this is safe, or should I use some keywords like __packed?
Or should I rather avoid it completely and rearrange my data so it always has proper alignment (or always use memmove when the data comes from an external source and cannot be permanently modified)?
It's been ages since I tried it. It worked, but every single access to unaligned memory caused a trap, which took considerable time. I'd suggest you measure how long it takes to add a million aligned shorts vs. a million unaligned shorts. If you only have a few hundred or a few thousand unaligned numbers, there's nothing to worry about.
__packed works reasonably fast. ARM has some clever instructions to do unaligned access with very few instructions. Again, I'd measure how long that takes. My experience with this is not current.
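To make the question's *(short*)ptr access safe regardless of architecture, the usual portable idiom is a memcpy into an aligned local; a minimal sketch (function names are my own):

    #include <stdint.h>
    #include <string.h>

    /* Read a 16-bit value from a possibly unaligned pointer.
       memcpy has no alignment requirement; compilers emit a single
       load where the hardware allows unaligned access, and safe
       byte-wise code where it does not. */
    static int16_t read_i16(const unsigned char *p)
    {
        int16_t v;
        memcpy(&v, p, sizeof v);
        return v;
    }

    /* Same idea for 32-bit values. */
    static int32_t read_i32(const unsigned char *p)
    {
        int32_t v;
        memcpy(&v, p, sizeof v);
        return v;
    }

To follow the measuring advice above, sum a million values once through read_i16 on an aligned buffer and once starting at buffer + 1, and compare the timings.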
I need an int128 (and/or int256).
Is there a library or way in which I can use that in Delphi?
Note that I do not want to muck around with strings and such; support as close as possible to int64 would be ideal.
There's BigInteger, but it calls a DLL to do its work, which is not acceptable.
I remember there being another library for big numbers, but I cannot remember the name...
OK, found it at: http://sourceforge.net/projects/bigint-dl/
BigInt is the Delphi library providing operations with extremely large integer numbers, known as multi-precision arithmetic. Our primary goal is to achieve maximum performance of calculations.
The source code is nicely documented, in Chinese :-(
It uses mostly 32-bit x86 assembly (no MMX etc., which is a pity).
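For what it's worth, the core of any such library is just add-with-carry; a minimal sketch of the 128-bit case in C (the same carry logic translates directly to Delphi or assembly):

    #include <stdint.h>

    /* A 128-bit unsigned integer as two 64-bit halves. */
    typedef struct { uint64_t lo, hi; } u128;

    /* Add two u128 values: the carry out of the low half
       propagates into the high half. */
    static u128 u128_add(u128 a, u128 b)
    {
        u128 r;
        r.lo = a.lo + b.lo;
        r.hi = a.hi + b.hi + (r.lo < a.lo); /* carry if low half wrapped */
        return r;
    }

An int256 is the same structure one level up: four 64-bit limbs with the carry rippling through.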
This is an open source unit that I have used in the past for math with 'unlimited' sized integers:
http://www.bvbcode.com/code/b1uxniwl-1626766
Would that be what you were looking for?
PS: I am on my phone now. If this is helpful I will improve the formatting later.
I know Nvidia has CUDA, but what does ATI have? I don't want to use OpenCL because I want to stay as close to the hardware as possible.
Is it Brook, or Stream?
The available documentation is pretty pathetic! CUDA seems easy to start programming with, but I want to use ATI specifically because of their hardware.
OpenCL is AMD's currently preferred GPU/compute language.
Brook is deprecated.
However, you can write code at a very low level using AMD's Shader Analyzer (http://developer.amd.com/tools/shader/Pages/default.aspx) and Kernel Analyzer (http://developer.amd.com/tools/AMDAPPKernelAnalyzer/Pages/default.aspx). For example, the screenshot at http://developer.amd.com/tools/shader/PublishingImages/GSA.png shows OpenCL code and the Radeon 5870 assembly produced from it.
You can actually code directly in several forms of "assembly", or at least you could; the web pages no longer mention this.
(I used to have this installed for tuning and testing, but do not at the moment.)
More usually, you can code in one of several forms of AMD IL (Intermediate Language), which is closer to the machine than OpenCL. The Kernel Analyzer web page says: "If your kernel is an IL kernel Stream, KernelAnalyzer will automatically compile the IL..."
I would recommend that you use OpenCL, then look at the disassembly and tweak the OpenCL code to be better tuned. But you can work in IL, and probably still at an even lower level.
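For example, a trivial kernel of the kind you would paste into the Kernel Analyzer to inspect the generated Radeon ISA (the kernel itself is purely illustrative):

    /* OpenCL C: one work-item per array element.
       Feed this to the analyzer, read the ISA it produces,
       tweak the source, and compare. */
    __kernel void saxpy(__global float *y,
                        __global const float *x,
                        const float a)
    {
        size_t i = get_global_id(0);  /* this work-item's index */
        y[i] = a * x[i] + y[i];
    }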
I would like to try some ARM assembly code with Apple iOS, just for educational purposes. I would like to start with some inline code within Xcode.
My understanding is that I need to compile for an iOS device, for example for my iPhone, which means that I need to pay the $99/year for membership.
I don't think I can use assembly code with the iOS simulator.
I am having a hard time finding examples, books, or documentation on ARM assembly in the Xcode environment with an iPhone.
Am I doing this wrong? Maybe iOS is not the most user-friendly environment for learning ARM assembly.
Back up...
What are you trying to learn? ARM assembly or iOS programming? Pick one...
Do you have any assembly experience?
What is it you think you want to learn in ARM assembly? To jump in and write some full-blown GUI applications? You need to learn to put immediates in registers; add, or, and xor them; and save the answers in registers. Then read and write some memory locations. Learn to use the stack, make calls, etc. Then write your applications in C or whatever and use asm for hand-tuning, or use your asm skills to debug the compiler and/or the code. Writing applications or operating systems in asm is for folks who want to make a statement, or who have a specific reason, not for educational purposes.
There is some leaning toward a unified ARM assembly language that works both on the ARM-based cores and the Thumb2-based cores. Not for all of the assembly language needed, but for places where you might want to write a module of code and not have a lot of if-thumb-elses littering it. You can certainly get your feet wet with that and take some of that code straight to full 32-bit ARM instructions on some other platform. The thumbulator is Thumb only, the common instruction set between the ARM-based cores and the Thumb2-based cores; basically it is the portable ARM instruction set: write the code once, and it works on almost all of their cores.
If your goal is to learn iOS programming, get the kit or whatever and learn using whatever language they want you to use, get proficient at that, learn the APIs, etc. Then, if you do some of the assembler stuff above, you can start to think about making calls to asm functions or inline assembler from your iOS programs. How much assembler is your choice. I wouldn't expect to see applications written in assembler for that platform; I would instead search for how to call assembly code from an iOS application, or how to do inline assembly. (Don't learn inline assembly until you are good at real assembly.)
There is no reason at all to pay for access to a simulator; there are many, many ARM simulators out there: one in MAME, ARM's armulator in gdb and other places, a number of Game Boy Advance and Nintendo DS emulators, etc. Of course there is qemu-arm. There are more simulators than you are probably willing to take the time to try; I am about 10 years into it myself and have not tried them all.
Learning assembly is not like C or Python or Java, where you might write a minesweeper game to learn the language. You are learning the mechanics of moving the bits around: small steps, not usable applications. For example, adding two 128-bit numbers on a 16-bit processor is a worthy assembly-language project. Multiplying two numbers of any size on a processor without a multiply instruction is another. Yes, you CAN learn those things by calling asm from an iOS application, but if you don't already have the iOS developer kit and know how to write iOS applications, you have a lot of learning to do before you start thinking about assembler.
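When you do get to that point, a first experiment can be as small as one instruction; a minimal sketch using GCC-style inline assembly for a 32-bit ARM target (purely illustrative):

    #include <stdio.h>

    /* Add two ints with a single ARM instruction via inline asm. */
    static int add_asm(int a, int b)
    {
        int r;
        __asm__ ("add %0, %1, %2"
                 : "=r" (r)            /* output: any register  */
                 : "r" (a), "r" (b));  /* inputs: any registers */
        return r;
    }

    int main(void)
    {
        printf("%d\n", add_asm(2, 3)); /* prints 5 */
        return 0;
    }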
If I am way off the mark with what you were asking, no problem I will gladly remove this answer...
Even without a code-signing cert, I think you should be able to go to the scheme pop-up menu (the right side of it) and choose "iOS Device".
Once you do that, then you can choose any .c or .m (or .cpp or .mm) file in your project, open the assistant editor, and choose "Assembly" from the assistant editor jumpbar. Then you can see your source code and assembly code side by side.
Or you can just go to the Product menu and choose Generate Output -> Assembly.
You may find it easier to start with C code, where the function calls will be much easier to follow initially than Objective-C method calls.
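For instance, a function this small (purely illustrative) produces assembly you can map back to the source almost line by line:

    /* Compile this and view it in the assistant editor: a simple
       counted loop shows the compare/branch/load/add pattern clearly. */
    int sum_array(const int *a, int n)
    {
        int s = 0;
        for (int i = 0; i < n; i++)
            s += a[i];
        return s;
    }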
What you need is an ARM development kit with a Linux-based programming toolkit. You can then install VirtualBox on your Mac, create a Linux virtual machine, and install the ARM development tools on the virtual Linux machine. Make sure that when you buy an ARM dev kit it comes with the ARM CPU, a complete dev/test board, USB cables for software transfer/debugging, and the complete Linux toolchain. You can find such kits for less than $99.
Another quick introduction to ARM assembly and reverse engineering: http://yurichev.com/writings/RE_for_beginners-en.pdf
I want to port some good OpenCV code to an embedded platform. Such things used to be very difficult, but now TI has come up with nice embedded platforms which are comparatively hassle-free, as they say.
I want to know following things:
Given that :
The OpenCV code is already running smoothly on a PC. (obviously)
I need to determine these things before purchasing the device.
I can't put the code here on Stack Overflow. :P
The device is to be chosen from Texas Instruments' C6000 family.
Questions:
How can I make sure that the port can be done at all?
What steps should be taken to make sure that, after porting, the code will (at least) run?
How can I determine whether the code might require changes to make it run smoothly?
Point 3 above is optional.
I need info that will at least give me a starting point.
What I thought I should do:
List the built-in functions the code uses.
Then find available online benchmarks of those functions for the particular device, like the ones shown towards the end of this doc.
...
I need to know how to proceed from here.
However, the C6-Integra™ DSP+ARM processor seems the best fit.
The best you can do is to try a device simulator (if one is available), but what you'll see there is far from perfect.
Actually, nothing can tell you how fast and how well the app will run on the embedded device before you run your specific app on that specific device.
So:
Step 1: Buy it.
Step 2: Try it.
Things to consider:
Embedded CPU architecture: does your app need a big cache? How big is the embedded cache?
Algorithm: do you use a lot of floating-point operations? How good is the device at floating-point ops?
Memory transfers: the data bus on a PC is waaay faster than on an embedded device.
Hardware support: do you use a lot of double-precision calculations? They are emulated on many ARMs, and they are going to kill your app (from milliseconds on a PC it can go to seconds on an ARM).
Acceleration: do your functions use SSE? (Many OpenCV funcs are SSEd, even if you don't know it.) Do you have the NEON counterpart? (OpenCV does not have much support for that; see the sketch after this list.) The difference can be orders of magnitude from x86 SSE to embedded without NEON.
and many, many others.
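To make the SSE/NEON point concrete, here is a minimal sketch of what a hand-written NEON counterpart looks like (my own illustration, not OpenCV code; assumes n is a multiple of 4):

    #include <arm_neon.h>

    /* y[i] += a * x[i], four floats per iteration using NEON. */
    void saxpy_neon(float *y, const float *x, float a, int n)
    {
        float32x4_t va = vdupq_n_f32(a);        /* broadcast a to 4 lanes */
        for (int i = 0; i < n; i += 4) {
            float32x4_t vx = vld1q_f32(x + i);  /* load 4 floats */
            float32x4_t vy = vld1q_f32(y + i);
            vy = vmlaq_f32(vy, va, vx);         /* vy += va * vx */
            vst1q_f32(y + i, vy);               /* store 4 floats */
        }
    }

If an OpenCV function you rely on has an SSE path but no NEON path, this is roughly the work you would be signing up for, per function.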
So, again: no one can tell you how it will work. Only the combination of the specific app and the real device tells the truth.
Even a run on a similar device is not conclusive: an app can run smoothly on a given processor, and on another one with similar frequency or listed memory it can slow down far too much.
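If you do buy it, the first thing worth running is a timing harness around your own pipeline rather than a synthetic benchmark; a minimal sketch using OpenCV's C interface (the Gaussian smooth is just a stand-in for your real workload):

    #include <opencv/cv.h>
    #include <stdio.h>

    int main(void)
    {
        /* A synthetic 640x480 grayscale image as input. */
        IplImage *img = cvCreateImage(cvSize(640, 480), IPL_DEPTH_8U, 1);
        IplImage *out = cvCreateImage(cvSize(640, 480), IPL_DEPTH_8U, 1);
        cvZero(img);

        /* Time 100 iterations of the operation under test. */
        int64 t0 = cvGetTickCount();
        for (int i = 0; i < 100; i++)
            cvSmooth(img, out, CV_GAUSSIAN, 7, 7, 0, 0);
        int64 t1 = cvGetTickCount();

        /* cvGetTickFrequency() is in ticks per microsecond. */
        printf("avg per call: %.3f ms\n",
               (double)(t1 - t0) / cvGetTickFrequency() / 1000.0 / 100.0);

        cvReleaseImage(&img);
        cvReleaseImage(&out);
        return 0;
    }

Run the same harness on the PC and on the device; the ratio between the two numbers tells you more than any spec sheet.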
This is an interesting question, but "run" is a very generic word in this context, so I feel the need to break it down into two other questions:
Will it compile for an embedded device?
Will it run as fast/smoothly as on a PC?
I've used OpenCV on a lot of different devices, including ARM, SH4, and MIPS, and I found out that sometimes the manufacturer of the device itself provides a compiled version of OpenCV (to my surprise), which is great. That's something you can look into; maybe the manufacturer of your device provides OpenCV binaries.
There's no way to know for sure how smooth your OpenCV application will be on the target device unless you can find some benchmark of OpenCV running on it. PCs have far better processing power than embedded devices, so you can expect less performance from the target.
There are 3rd-party applications, like opencv-performance, that you can use to test/benchmark the environment once you get your hands on it. And if performance is such a big deal in this project, you might also be interested in this nice article, which explains timing tests done on a couple of OpenCV features, comparing implementations using the C and C++ interfaces of OpenCV.