OP_JMP opcode scalability - lua

My app uses a customized version of Lua 5.0.2 and some customers feed very large (automatically generated) scripts into it. For example, scripts that are 200 megabytes or more in size. It all seems to work fine but I'm wondering how scalable the Lua VM is.
I'm especially concerned about the OP_JMP scalability. In Lua 5.0.2 OP_JMP uses 18 bits to code the destination PC. So it can jump 262144 instructions at a maximum. Can this limitation be a problem with very large scripts? For example, a 200 megabyte script in which a function at the very bottom of the script calls a function at the very top of the script? Will this hit an OP_JMP limit or are function calls implemented in a different way and OP_JMP is only used for coding control structures like for, if, ... which are very unlikely to have to jump more than 2^18 instructions...
Thanks to anybody who can shed some light onto this!

Related

Write protect a stack page on AIX?

I've a program where argv[0] gets overwritten from time to time. This happens (only) on a production machine which I cannot access and where I cannot use a debugger. In order to find the origin of this corruption, I'd like to write protect this stack page, so that any write access would be turned in a fault, and I could get the address of the culprit instruction.
The system is an AIX 5.3 64 bits based. When I try to invoke mprotect on my stack page, I get an ENOMEM error. I'm using gcc to generate my program.
On a Linux system (x86 based) I can set a similar protection using mprotect without trouble.
Is there any way to achieve this on AIX. Or is this a hopeless attempt?
On AIX, mprotect() requires that requested pages be shared memory or memory mapped files only. On AIX 6.1 and later, you can extend this to the text region, shared libraries, etc, with the MPROTECT_TXT environment variable.
You can however use the -qstackprotect option on XLC 11/AIX 6.1TL4 and later. "Stack Smashing Protection" is designed to protect against exactly the situation you're describing.
On AIX 5.3, my only suggestion would be to look into building with a toolset like Parasoft's Insure++. It would locate errant writes to your stack at runtime. It's pretty much the best (and now only) tool in the business for AIX development. We use it in house and its invaluable when you need it.
For the record, a workaround for this problem is to move processing over to a pthread thread. On AIX, pthread thread stacks live in the data segment which can be mprotected (as opposed to the primordial thread, which cannot be mprotected). This is the way the JVM (OpenJDK) on AIX implements stack guards.

Best setting for HTTPJVMMaxHeapSize in Domino 8.5.3 64 Bit

I am trying to find a definitive answer as to what the best setting is for the JVM Heap Size in Domino 8.5.3 FP4 - 64 Bit for Windows.
I know that by default it's set to 1024M. Some web sites have suggested that it's recommended to be 1G / 1024M - but that's the default setting so it that as good as it gets?
Other sites have said 25% of available RAM.
My Domino server has 12GB RAM available. It's currently got HTTPJVMMaxHeapSize = 1024M and Task Manager tells me that around 7GB of RAM is in use and nhttp.exe is using around 1.1GB. I want to increase this Heap to 2GB or 3GB if possible - will there be any issues doing to?
I'm running Windows Server 2008 R2 Standard Edition.
Speaking from an XPages perspective:
1 - Understand the dynamics of the working set and hardware
That is, understand what the application code, server runtime, and hardware profile is doing when processing a given working set (ie: the XPages application(s) within the server). Is the application coded in a non-optimal manner in terms of lifecycle execution and memory usage? Is the application making use of memory or disk persistence for component tree serialization? Is the server assigned an adequate amount of JVM memory? Is the hardware providing enough CPU and memory?
2 - Profile and monitor the working set with upper limit loads
To fully answer some of the questions in #1, detailed performance and scalability profiling must be carried out using tools like the XPages Toolbox and Eclipse Memory Analyzer. Furthermore, test the working set using Rational Performance Tester (or some other performance testing tool) to mimic real life concurrent workloads in a test environment. This allows you to set up a test environment where you can hit your application with (n) number of concurrent users using automation and collect that all invaluable data on health etc.
3 - Analyse the profile information to identify bottlenecks within the working set
Remember your working set can be one or more applications. Each doing something different, and having different load requirements. Be specific about the task at hand - do you want to tune the system more generally for all applications (for an average scale) or do you want the server to be fully optimized for a specific application (for a targeted scale)?
4 - Optimize the working set where applicable
Get in and make changes to the XSP / Java / ServerSide JavaScript code where applicable - use your knowledge of the XPages Request Processing Lifecycle and also look out for those hungry JVM memory consumers under high end load scenarios. Always favor disk persistence (disk storage is cheaper than RAM!) and code your custom Java objects and Managed Beans accordingly to cope with serialization and restoration. And make the trade-offs between function and speed in these scenario's where high scalability ends up burning CPU... a smarter UX with targetted functions / actions etc.
5 - Scale the hardware where applicable
Be prepared to increase cores, clock speeds, RAM, disk storage based on the needs of the working set - a cyclic approach to profiling, monitoring, and optimizing the working set will shed more and more light on the suitability of the hardware as this process evolves.
5 - Repeat from #2 until the working set and hardware performs and scales to the expected requirements / load expectancy of the system
I would recommend to set server's notes.ini param JavaVerboseGC=1 with console output into console.log in IBM_TECHNICAL_SUPPORT directory. After some time, take that log file and use IBM Support Assistant with tool IBM Monitoring and Diagnostic Tools for Java - Garbage Collection and Memory Visualizer (GCMV).
This could help how to interpret collected data: http://www-01.ibm.com/support/docview.wss?uid=swg27013824&aid=1.
You definitely should do that if you get OutOfMemoryException.
Heap size set too high can lead to fewer GC runs but taking longer time - server will periodically "hang" for very short time. So too high setting is not recommended.

MPI/Pthread program does not scale

I have a MPI/Pthread program in which each MPI process will be running on a separate computing node. Within each MPI process, certain number of Pthreads (1-8) are launched. However, no matter how many Pthreads are launched within a MPI process, the overall performance is pretty much the same. I suspect all the Pthreads are running on the same CPU core. How can I assign threads to different CPU cores?
Each computing node has 8 cores.(two Quad core Nehalem processors)
Open MPI 1.4
Linux x86_64
Questions like this are often dependent on the problem at hand. Most likely, you are running into a resource lock issue (where the threads are competing for a lock) -- this would look like only one core was doing any work, because only one thread can (effectively) do any work at any given time.
Setting CPU affinity for a certain thread is not a good solution. You should allow for the OS scheduler to optimally determine the physical core assignment for a given pthread.
Look at your code and try to figure out where you are locking where you shouldn't be, or if you've come up with a correct parallel solution to the problem at hand. You should also test a version of the program using only pthreads (not MPI) and see if scaling is achieved.

Get the number of cores in Erlang with Linux

i am writing a concurrent program and i need to know the number of cores of the system so then the program will know how many processes to open.
Is there command to get this inside Erlang code?
Thnx.
You can use
erlang:system_info(logical_processors_available)
to get the number of cores that can be used by the erlang runtime system.
There is also:
erlang:system_info(schedulers_online)
which tells you how many scheduler threads are actually running.
To get the number of available cores, use the logical_processors flag to erlang:system_info/1:
1> erlang:system_info(logical_processors).
8
There are two companion flags to this one: logical_processors_online shows how many are in use, and logical_processors_available show how many are available (it will return unknown when all logical processors available are online).
To know how to parallelize your code, you should rely on schedulers_online which will return the number of actual Erlang schedulers that are available in your current VM instance:
1> erlang:system_info(schedulers_online).
8
Note however that parallelizing on this value alone might not be enough. Sometimes you have other processes running that need some CPU time and sometimes your algorithm would benefit from even more parallelism (waiting on IO for example). A rule of thumb is to use the value obtained from schedulers_online as a multiplier for parallelism, but always test with different multiples to see what works best for your application.
How this information is exposed will be very operating system specific (unless you happen to be writing an operating system of course).
You didn't say what operating system you're working on. In the case of Linux, you can get the data from /proc/cpuinfo, however there are subtleties with the meaning of hyperthreading and the issue of multiple cores on the same die using a shared L2 cache (effectively you've got a NUMA architecture).

Batch processing of image

I have been assigned to write a windows application which will copy the images from source folder > and it's sub folder (there could be n number of sub folder which can have upto 50 Gb images). Each image size might vary from some kb to 20 MB. I need to resize and compress the picture.
I am clueless and wondering if that can be done without hitting the CPU to hard and on the other hand, little faster.
Is it possible ? Can you guide me the best way to implement this ?
Image processing is always a CPU intensive task. You could do little tricks like lowering the priority of the process that's preforming the image processes so it impacts your machine less, but there's very little tradeoff to be made.
As for how to do it,
Write a script that looks for all the files in the current directory and sub-directories. If you're not sure how, do a Google search. You could do this is Perl, Python, PHP, C#, or even a BAT file.
Call one of the 10,000,000 free or open-source programs to do image conversion. The most widely used Linux program is ImageMagick and there's a Windows version of it available too.

Resources