A problem about the code example in OSTEP - memory

In OSTEP (Operating Systems: Three Easy Pieces), the author presents a simple C program to show how the OS virtualizes memory:
#include <unistd.h>
#include <stdio.h>
#include <stdlib.h>
#include "common.h"

int
main(int argc, char *argv[])
{
    int *p = malloc(sizeof(int));
    //assert(p != NULL);
    printf("(%d) address pointed to by p: %p\n", getpid(), p);
    *p = 0;
    while (1) {
        sleep(1);
        *p = *p + 1;
        printf("(%d) p: %d\n", getpid(), *p);
    }
    return 0;
}
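For reference, the program can be compiled like this (assuming gcc and that OSTEP's common.h sits next to the source; nothing from that header appears to be needed by this snippet):
prompt> gcc -o mem mem.c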
The book says that, because of this virtualization, the outcome should be:
prompt> ./mem &; ./mem &
[1]24113
[2]24114
(24113) address pointed to by p: 0x200000
(24114) address pointed to by p: 0x200000
(24113) p: 1
(24114) p: 1
(24114) p: 2
(24113) p: 2
(24113) p: 3
(24114) p: 3
(24113) p: 4
(24114) p: 4
The author explains why this happens:
Now, we again run multiple instances of this same program to see what
happens (Figure 2.4). We see from the example that each running
program has allocated memory at the same address (0x200000), and yet
each seems to be updating the value at 0x200000 independently! It is
as if each running program has its own private memory, instead of
sharing the same physical memory with other running programs.
...
But on my computer (Ubuntu) the outcome is different: the two instances print different addresses.
It makes me really confused...

In an attempt to frustrate malware, Ubuntu is configured to load programs at pseudo-random addresses; so when your program is loaded into memory, the OS chooses some base address and then loads the program relative to that. This is widely known as Address Space Layout Randomization (ASLR).
There is a kernel option, /proc/sys/kernel/randomize_va_space, which you can set to zero to disable this feature. Note that whatever security this offers, you lose it by disabling the feature.
As root:
$ oldval=$(cat /proc/sys/kernel/randomize_va_space)
$ echo 0 > /proc/sys/kernel/randomize_va_space
Then, after you have run your examples as somebody other than root, return to this session and:
$ echo "$oldval" > /proc/sys/kernel/randomize_va_space

Related

Is Linux perf imprecise for page fault and tlb miss?

I wrote a simple program to test page faults and TLB misses with perf.
The code is as follows. It writes 1 GB of data sequentially and is expected
to trigger about 1 GB / 4 KB = 256K TLB misses and page faults.
#include <stdio.h>
#include <stdlib.h>

#define STEP   64
#define LENGTH (1024*1024*1024)

int main(){
    char* a = malloc(LENGTH);
    int i;
    for(i = 0; i < LENGTH; i += STEP){
        a[i] = 'a';
    }
    return 0;
}
However, the result is as follows and far smaller than expected. Is perf really that imprecise? I would appreciate it if anyone could run the code on their machine.
$ perf stat -e dTLB-load-misses,page-faults ./a.out
Performance counter stats for './a.out':
12299 dTLB-load-misses
1070 page-faults
0.427970453 seconds time elapsed
Environment: Ubuntu 14.04.5 LTS, kernel 4.4.0; gcc 4.8.4, glibc 2.19. No compile flags.
The CPU is an Intel(R) Xeon(R) CPU E5-2640 v2 @ 2.00GHz.
The kernel prefetches pages on a fault, at least after it has evidence of a pattern, so a single fault can map in more than one page. I can't find a definitive reference on the algorithm, but perhaps https://github.com/torvalds/linux/blob/master/mm/readahead.c is a starting point for seeing what is going on. I'd look for other performance counters that capture the behavior of this mechanism.
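For example (a sketch only; available event names differ by kernel and CPU, so check perf list first), counting store-side TLB misses and minor faults alongside the original events may give a fuller picture, since the loop only performs stores:
$ perf stat -e dTLB-load-misses,dTLB-store-misses,page-faults,minor-faults ./a.out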

Pre-Processing using m4

I am writing a pre-processor for Free Pascal (course work) using m4. I was reading a Stack Overflow thread here and from there reached a blog which essentially shows the basic usage of m4 for pre-processing C. The blogger uses a test C file, test.c.m4, like this:
#include <stdio.h>
define(`DEF', `3')
int main(int argc, char *argv[]) {
    printf("%d\n", DEF);
    return 0;
}
and generates the processed C file like this using m4, which is fine.
$ m4 test.c.m4 > test.c
$ cat test.c
#include <stdio.h>
int main(int argc, char *argv[]) {
    printf("%d\n", 3);
    return 0;
}
My doubts are:
1. The programmer will write code in which the line
#define DEF 3
appears instead of
define(`DEF', `3')
so who converts the former into the latter? We could use a tool like sed or awk to do that, but then what is the use of m4? What m4 does here could be implemented with sed as well.
It would be very helpful if someone could tell me how to convert the programmer's code into a file that can be used by m4.
2. I had another issue using m4. Comments in languages like C are removed before pre-processing, so can this be done using m4? For this I was looking for m4 commands that could replace comments using a regex, and I found regexp(), but it requires the string to be operated on as an argument, which is not available in this case. So how can this be achieved?
Sorry if this is a naive question. I read the m4 documentation but could not find a solution.
m4 is the tool that will convert DEF to 3 in this case. It is true that sed or awk could serve the same purpose in this simple case, but m4 is a much more powerful tool because it a) allows macros to be parameterized, b) includes conditionals, c) allows macros to be redefined throughout the input file, and much more. For example, one could write (in the file for.pas.m4, inspired by ratfor):
define(`LOOP',`for $1 := 1 to $2 do
begin')dnl
define(`ENDLOOP',`end')dnl
LOOP(i,10)
WriteLn(i);
ENDLOOP;
... which produces the following output, ready for the Pascal compiler, when processed with m4 for.pas.m4:
for i := 1 to 10 do
begin
WriteLn(i);
end;
Removing general Pascal comments using m4 would not be possible, but creating a macro to write a comment that m4 will delete during processing is straightforward:
define(`NOTE',`dnl')dnl
NOTE(`This is a comment')
x := 3;
... produces:
x := 3;
Frequently used macros that are to be expanded by m4 can be put in a common file that is included at the start of any Pascal file that uses them, making it unnecessary to define all the required macros in every Pascal file. See include(file) in the m4 manual.
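As a minimal sketch (the file name common.m4 is only an illustration), the shared file could hold the definitions:
define(`LOOP',`for $1 := 1 to $2 do
begin')dnl
define(`ENDLOOP',`end')dnl
and each Pascal source processed by m4 would then start with:
include(`common.m4')dnl
LOOP(i,10)
WriteLn(i);
ENDLOOP;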

I am trying to use I2C on the BeagleBone Black with C++ but I keep getting 0x00 returned

Hi, I am trying to read and write data on the I2C bus of the BeagleBone Black, but I keep reading 0x00 whenever I try to access the WHO_AM_I register on an MMA8452 (or any other register for that matter), which is a constant register, meaning its value does not change. I am reading through the i2c-1 character device in /dev, and I connect the SDA and SCL lines of the MMA8452 to pins 19 and 20 on the P9 header. The SDA and SCL lines are both pulled high with 10 k resistors. Pins 19 and 20 both show 00000073 for their pin mux, which means they are set for I2C functionality, slew control is slow, the receiver is active, and the pull-up resistor is enabled.
I ran i2cdetect -r 1 and my device shows up at 0x1d, which is its correct address. I also ran i2cdump 1 0x1d and 0x2a shows up at offset 0x0d, which is the register I am trying to read and contains the correct value according to the datasheet. But when I read it from my program it returns 0x00. I am running the latest Angstrom distribution and am logged in as root, so no need for sudo. I'm lost now. Here is the code:
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <linux/i2c-dev.h>
#include <linux/i2c.h>
#include <sys/ioctl.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <string>

using namespace std;

int main(int argc, char **argv){
    int X_orientation=0;
    char buffer1[256];
    string i2cDeviceDriver="/dev/i2c-1";
    int fileHandler;

    // Open the i2c-1 character device
    if((fileHandler=open(i2cDeviceDriver.c_str(),O_RDWR))<0){
        perror("Failed To Open i2c-1 Bus");
        exit(1);
    }

    // Select the accelerometer at address 0x1d as the slave to talk to
    if(ioctl(fileHandler,I2C_SLAVE,0x1d)<0){
        perror("Failed to acquire i2c bus access and talk to slave");
        exit(1);
    }

    // Send the register address we want to read (WHO_AM_I, 0x0D)...
    char buffer[1]={0x0D};
    if(write(fileHandler,buffer,1)!=1){
        perror("Failed to write byte to accelerometer");
        exit(1);
    }

    // ...then read one byte back
    if(read(fileHandler,buffer1,1)!=1){
        perror("Failed to read byte from accelerometer");
        exit(1);
    }

    printf("Contents of WHO AM I is 0x%02X\n",buffer1[0]);
}
It is likely that your I2C device does not support using separate write() and read() commands to query registers (or the equivalent i2c_smbus_write_byte() and i2c_smbus_read_byte() commands). The kernel adds a 'stop bit' to separate your messages on the wire, and some devices do not support this mode.
To confirm:
Try the Linux i2cget command in its c mode ('write byte/read byte', which uses separate write and read messages with a stop bit between them):
$ i2cget 1 0x1d 0x0d c
Expected result: 0x00 (incorrect response)
Then try i2cget in its b mode ('read byte data', which combines the write and read without a stop bit in between):
$ i2cget 1 0x1d 0x0d b
Expected result: 0x2a (correct response)
To resolve:
Replace your read() and write() calls with a combined i2c_smbus_read_byte_data() call if available on your system:
const __u8 REGISTER_ID = 0x0d;
__s32 result = i2c_smbus_read_byte_data(fileHandler, REGISTER_ID); // negative value indicates an error
Alternatively (if the above is not available), you can use the ioctl I2C_RDWR option:
const __u8 SLAVE_ID    = 0x1d;
const __u8 REGISTER_ID = 0x0d;

struct i2c_rdwr_ioctl_data readByteData;
struct i2c_msg messages[2];

readByteData.nmsgs = 2;
readByteData.msgs  = messages;

// Write portion (send the register we wish to read)
__u8 request = REGISTER_ID;
messages[0].addr  = SLAVE_ID;
messages[0].flags = 0;           // 0 = write
messages[0].len   = 1;
messages[0].buf   = &request;

// Read portion (read in the value of the register)
__u8 response;
messages[1].addr  = SLAVE_ID;
messages[1].flags = I2C_M_RD;    // read
messages[1].len   = 1;           // number of bytes to read
messages[1].buf   = &response;   // where to place the result

// Submit the combined write+read transaction (repeated start, no stop bit in between)
if (ioctl(fileHandler, I2C_RDWR, &readByteData) < 0) {
    perror("I2C_RDWR failed");
    exit(1);
}

// Output the result
printf("Contents of register 0x%02x is 0x%02x\n", REGISTER_ID, response);
More information:
https://www.kernel.org/doc/Documentation/i2c/smbus-protocol
https://www.kernel.org/doc/Documentation/i2c/dev-interface

malloc() not showing up in System Monitor

I wrote a program whose only purpose was to allocate a given amount of memory, so that I could see its effect in the Ubuntu 12.04 System Monitor.
This is what I wrote:
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main(int argc, char **argv)
{
    if(argc<2)
        exit(-1);

    int *mem=0;
    int kB = 1024*atoi(argv[1]);

    mem = malloc(kB);
    if(mem==NULL)
        exit(-2);

    sleep(3);
    free(mem);
    exit(0);
}
I used sleep() so that the program would not end immediately, giving System Monitor time to show the change in memory.
What puzzles me is that even if I allocate 1 GB, System Monitor shows no memory change at all! Why is that?
I thought the reason could be that the allocated memory was never accessed, so I tried to insert the following before sleep():
int i;
for(i=0; i<kB/sizeof(int); i++)
    mem[i]=i;
But this too had no effect on the graph in System Monitor.
Thanks.

tbb::parallel_for running out of memory on machine with 80 cores

I am trying to use tbb::parallel_for on a machine with 160 hardware threads (8 × Intel E7-8870) and 0.5 TB of memory. It is a current Ubuntu system with kernel 3.2.0-35-generic #55-Ubuntu SMP. TBB comes from the package libtbb2, version 4.0+r233-1.
Even with a very simple task, I tend to run out of resources, getting either "bad_alloc" or "thread_monitor Resource temporarily unavailable". I boiled it down to this very simple test:
#include <vector>
#include <cstdlib>
#include <cmath>
#include <iostream>
#include "tbb/tbb.h"
#include "tbb/task_scheduler_init.h"

using namespace tbb;

class Worker
{
    std::vector<double>& dst;
public:
    Worker(std::vector<double>& dst)
        : dst(dst)
    {}
    void operator()(const blocked_range<size_t>& r) const
    {
        for (size_t i=r.begin(); i!=r.end(); ++i)
            dst[i] = std::sin(i);
    }
};

int main(int argc, char** argv)
{
    unsigned int n = 10000000;
    unsigned int p = task_scheduler_init::default_num_threads();
    std::cout << "Vector length: " << n << std::endl
              << "Processes : " << p << std::endl;
    const size_t grain_size = n/p;
    std::vector<double> src(n);
    std::cerr << "Starting loop" << std::endl;
    parallel_for(blocked_range<size_t>(0, n, grain_size), Worker(src));
    std::cerr << "Loop finished" << std::endl;
}
Typical output is
Vector length: 10000000
Processes : 160
Starting loop
thread_monitor Resource temporarily unavailable
thread_monitor Resource temporarily unavailable
thread_monitor Resource temporarily unavailable
The errors appear randomly, and more frequently with greater n. The value of 10 million here is a point where they happen quite regularly. Nevertheless, given the machine's characteristics, this should be nowhere near exhausting the memory (I am using the machine alone for these tests).
The grain size was introduced after TBB created too many instances of the Worker, which made it fail for even smaller n.
Can anybody advise on how to set up tbb to handle large numbers of threads?
Summarizing the discussion in comments in an answer:
The message "thread_monitor Resource temporarily unavailable in pthread_create" basically tells that TBB cannot create enough threads; the "Resource temporarily unavailable" is what strerror() reports for the error code returned by pthread_create(). One possible reason for this error is insufficient memory to allocate stack for a new thread. By default, TBB requests 4M of stack for a worker thread; this value can be adjusted with a parameter to tbb::task_scheduler_init constructor if necessary.
In this particular case, as Guido Kanschat reported, the problem was caused by ulimit accidentally set which limited the memory available for the process.
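As a minimal sketch of the stack-size adjustment mentioned above (the 1 MB figure is purely illustrative):
#include "tbb/task_scheduler_init.h"

int main()
{
    // Keep the default number of worker threads, but request a smaller
    // per-thread stack; the second constructor argument is the stack size in bytes.
    tbb::task_scheduler_init init(tbb::task_scheduler_init::automatic,
                                  1024 * 1024);

    // ... run tbb::parallel_for etc. while 'init' stays in scope ...
    return 0;
}
It can also be worth checking ulimit -v and ulimit -s in the shell that launches the program, since those limits constrain the memory and stack the worker threads can get.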
