I am reading chapter 16 of OSTEP, on memory segmentation.
In one of the section's examples, a 15KB virtual address is translated to a physical address using this segment table:
| Segment | Base | Size | Grows Positive? |
|---------|------|------|-----------------|
| Code    | 32KB | 2KB  | 1 |
| Heap    | 34KB | 2KB  | 1 |
| Stack   | 28KB | 2KB  | 0 (grows negative) |
To translate the 15KB virtual address to a physical address, the textbook does the following:
1. 15KB in binary => 11 1100 0000 0000
2. The top 2 bits (11) select the segment, which is the stack.
3. The remaining 12 bits give an offset of 3KB.
4. offset - maximum segment size = 3KB - 4KB = -1KB
5. physical address = 28KB - 1KB = 27KB
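For reference, here is the arithmetic above as a quick Python sketch (constants taken from the example: 14-bit addresses, top 2 bits select the segment):

```python
SEG_SHIFT = 12                     # 14 address bits - 2 segment-select bits
OFFSET_MASK = (1 << SEG_SHIFT) - 1
MAX_SEG_SIZE = 1 << SEG_SHIFT      # 4KB: the largest offset 12 bits can express
STACK_BASE = 28 * 1024             # from the segment table

virtual = 15 * 1024
segment = virtual >> SEG_SHIFT           # 0b11 -> stack
offset = virtual & OFFSET_MASK           # 3KB
negative_offset = offset - MAX_SEG_SIZE  # 3KB - 4KB = -1KB
physical = STACK_BASE + negative_offset
print(segment, physical // 1024)         # 3 27
```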
My question is: in step 4, why is the maximum segment size 4KB--isn't it 2KB?
> in step 4, why is the maximum segment 4KB--isn't it 2KB?
For that part of the book, they're assuming that the hardware uses the highest 2 bits of the (14-bit) virtual address to determine which segment is being used. This leaves 14 - 2 = 12 bits for the offset within a segment, so it's impossible for the hardware to support segments larger than 4 KiB (because the offset is 12 bits and 2^12 bytes is 4 KiB).

Of course, just because the maximum possible size of a segment is 4 KiB doesn't mean you can't have a smaller segment (e.g. a 2 KiB segment). For expand-down segments, I'd assume that the hardware being described in the book does something like `if (max_segment_size - offset >= segment_limit) { segmentation_fault(); }`, so if the segment's limit is 2 KiB and `max_segment_size - offset = 4 KiB - 3 KiB = 1 KiB`, it'd be fine (no segmentation fault) because 1 KiB is less than the segment limit (2 KiB).
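A minimal sketch of that guessed expand-down check (this is my reading of the hardware, not something the book spells out):

```python
MAX_SEG_SIZE = 4 * 1024   # 12 offset bits -> 4 KiB maximum
STACK_LIMIT = 2 * 1024    # the stack segment's actual size
STACK_BASE = 28 * 1024

def translate_stack(offset):
    """Translate an offset within the (expand-down) stack segment."""
    if MAX_SEG_SIZE - offset >= STACK_LIMIT:
        raise MemoryError("segmentation fault")
    return STACK_BASE + (offset - MAX_SEG_SIZE)   # negative offset from the base

print(translate_stack(3 * 1024) // 1024)  # 27, matching the book's example
# an offset of 1KB would fall 3KB below the base, outside the 2KB segment -> fault
```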
Note: because no modern CPUs and no modern operating systems use segmentation (and because segmentation works differently on other CPUs, e.g. with segment registers rather than "highest N bits select segment"), I'd be tempted to skim quickly through chapter 16 without paying much attention. The important part is "paging" (starting in chapter 18 of the book).
I'm confused about stack-based buffer overflows: if I overwrite a program's return address with a new target address via a buffer on the stack, how do those addresses line up, and how do I make sure that happens? For example, say a stack frame has a buffer at address 0x1 (just pretend, although 0x1 is too low for that to happen) and the return address at 0x6. If the architecture is 32-bit, an address is 4 bytes. So if the program asks me for an argument that it copies into that buffer, and I supply the new address repeated twice, then the return-address slot will clearly start at the second byte of the second copy of the address. The address is not aligned and the program will crash. How do I make sure the address will always align?
let new address = 0xbfffff3c

address from buffer to return address element: | 0x1 | 0x2 | 0x3 | 0x4 | 0x5 | 0x6 | ...
newly written return address layout in memory: | 3c  | ff  | ff  | bf  | 3c  | ff  | ...
You choose the contents that you're overflowing with. Make the payload the right length so the return-address bytes line up with the return address's location relative to the buffer in the stack frame of the binary you're exploiting.
That distance is a compile-time constant (or at least `n % 16` is constant if it's a VLA or `alloca`...), so you can find it with a disassembler. Compilers will choose to align even local arrays by 16¹, or at least by 4 on 32-bit ABIs that don't require 16-byte stack alignment. So even if you don't have a copy of the binary you're trying to exploit, you may still be able to repeat the return address as you suggest, at positions aligned by 4 relative to the start of the buffer. Of course, knowing what return address to use (for a ROP attack) may be hard without a binary, unless there are some libraries that aren't ASLRed.

¹ (e.g. the x86-64 System V ABI requires this for all arrays >= 16 bytes, and all VLAs, despite private local arrays not being visible outside of a single function.)
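As a sketch of building such a payload (everything here is hypothetical: a 32-bit little-endian target, a buffer that sits 16 bytes before the saved return address, and an example target address):

```python
import struct

BUF_TO_RET = 16          # hypothetical distance from buffer start to return address
new_ret = 0xbfffff3c     # hypothetical address to jump to

# exact-offset version: pad up to the return address, then overwrite it once
payload = b"A" * BUF_TO_RET + struct.pack("<I", new_ret)

# spray version: if the buffer start is 4-byte aligned but the exact distance
# is unknown, repeating the 4-byte address keeps every copy aligned by 4
spray = struct.pack("<I", new_ret) * 16

print(payload[-4:].hex())  # 3cffffbf -> bytes 3c ff ff bf, little-endian
```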
I have a dataset. It consists of: date, id (the ID of the event), number_of_activities, and running_sum (the running sum of activities by ID).
This is part of my data:
| date       | id (id of the event) | number_of_activities | running_sum |
|------------|----------------------|----------------------|-------------|
| 2017-01-06 | 156 | 1  | 1  |
| 2017-04-26 | 156 | 1  | 2  |
| 2017-07-04 | 156 | 2  | 4  |
| 2017-01-19 | 175 | 1  | 1  |
| 2017-03-17 | 175 | 3  | 4  |
| 2017-04-27 | 221 | 3  | 3  |
| 2017-05-05 | 221 | 7  | 10 |
| 2017-05-09 | 221 | 10 | 20 |
| 2017-05-19 | 221 | 1  | 21 |
| 2017-09-03 | 221 | 2  | 23 |
My goal is to predict the future number of activities for a given event. My question: can I train my model on the whole dataset (all the events) to predict the next value, and if so, how? The inputs are unequal (the number of rows differs per event). Also, is it possible to exploit the date data as well?
Sure you can. But a lot more information is needed, which you yourself know best.

I guess we are talking about time series here, as you want to predict the future.

You might want to have a look at recurrent neural nets and LSTMs:

A recurrent layer takes a time series as input and outputs a vector containing compressed information about the whole series. So let's take event 156, which has 3 timesteps:

The event is your sample, and it has 3 timesteps; other events have a different number of timesteps. To solve this, just take the maximum sequence length that occurs and pad the shorter sequences with a padding value (most often simply zero) so they all have the same length. Then you have a shape that is suitable for a recurrent neural net (where LSTMs are currently a good choice).
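A minimal sketch of that padding idea in plain Python (libraries such as Keras ship an equivalent `pad_sequences` helper):

```python
def pad_sequences(sequences, pad_value=0):
    """Right-pad every sequence to the length of the longest one."""
    max_len = max(len(seq) for seq in sequences)
    return [seq + [pad_value] * (max_len - len(seq)) for seq in sequences]

# number_of_activities per timestep for events 156, 175 and 221 (from the table)
events = [[1, 1, 2], [1, 3], [3, 7, 10, 1, 2]]
print(pad_sequences(events))
# [[1, 1, 2, 0, 0], [1, 3, 0, 0, 0], [3, 7, 10, 1, 2]]
```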
Update
You said in the comments that padding is not an option for you; let me try to convince you. LSTMs are good in situations where sequence lengths differ. However, for this to work you also need longer sequences the model can learn patterns from. What I want to say is: when some of your sequences have only a few timesteps, like 3, but others have 50 or more, the model may have difficulty predicting these correctly, as you have to specify which timestep you want to use. So either you prepare your data differently for a clearer question, or you dig deeper into the topic with sequence-to-sequence learning, which is very good at handling sequences of different lengths. For this you will need to set up an encoder-decoder network.

The encoder squashes the whole sequence, whatever its length, into one vector that holds the information of the sequence in compressed form.

The decoder then learns to use this vector to predict the next outputs of the sequence. This is a well-known technique for machine translation, but it is suitable for any kind of sequence-to-sequence task. So I would recommend you create such an encoder-decoder network, which will surely improve your results. Have a look at this tutorial, which might help you further.
"If a computer handles data in 8-bit sizes and uses a 16-bit address to store and retrieve data in memory, its address space contains 2^16 (65536) bytes or 64k bytes"
My textbook has this statement, and I'm confused by it. Where are they getting 2^16 from? If a computer uses a 16-bit address, why isn't the address space just 2 bytes? The textbook hasn't explained how memory is stored in microcomputers and makes this statement in the intro chapter. Am I missing something?
If an address is 16 bits, it means that you have 16 bits when referring to a location in memory. The address space is the range of valid addresses, not the physical size of an address.
These addresses start at address 0 (binary 0000 0000 0000 0000) and go up to address 2^16 − 1 (binary 1111 1111 1111 1111). That's a total of 2^16 addresses that can be referenced. And if each address refers to 8 bits (i.e. a byte), the total amount of memory that you can refer to with those addresses is 2^16 × 8 bits, or 2^16 bytes.
As a smaller example, consider a system with 3-bit addresses, each referring to 4 bits (a nibble).
Address        |    0    1    2    3    4    5    6    7
Memory         | 0000 0000 0000 0000 0000 0000 0000 0000
Binary address |  000  001  010  011  100  101  110  111
The 3-bit addresses can have 2^3 values, from 0 to 7, and each refers to 4 bits of memory, so this system has a total of 2^3 = 8 nibbles of memory.
In this system, the only valid addresses are 0, 1, 2, 3, 4, 5, 6, and 7, so the address space is the set {0, 1, 2, 3, 4, 5, 6, 7}.
As an important point, don't forget that the address space is not necessarily the actual amount of memory available: computers use some handy tricks to work with address spaces much larger than the memory they actually have (for example, a 64-bit system can theoretically address 2^64 bytes of memory, but you don't have even a fraction of that in your computer).
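The arithmetic is easy to check, e.g. in Python:

```python
address_bits = 16
num_addresses = 2 ** address_bits      # every distinct 16-bit pattern
print(num_addresses)                   # 65536
print(num_addresses // 1024)           # 64 -> "64k bytes" when each address holds one byte

# the smaller example: 3-bit addresses, each naming a 4-bit nibble
print(2 ** 3)                          # 8 addressable nibbles
```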
Analogies for address spaces
Here are two analogies that might help you understand the difference between an address space, an address, and a pointer:
The web address space is the set of all URLs, basically the set of strings of the form https://[domain]/[path]. So https://example.com/page is an address, and this link is a pointer to that address.
The United States street address space is (approximately) the set of strings of this form:
[First name] [Last name]
[number] [Street name]
[Town], [STATE] [zip code]
In the same analogy, this is an address:
John Doe
10 Main St.
Faketown, NY 20164
Finally, a pointer is then analogous to the writing on the front of an envelope that the postal service uses to deliver letters.
The Hennessy-Patterson book on Computer Architecture (Quantitative Approach 5ed) says that in a vector architecture with multiple memory banks, a bank conflict can happen if the following condition is met (Page 279 in 5ed):
(Number of banks) / LeastCommonMultiple(Number of banks, Stride) < Bank busy time
However, I think it should be GreatestCommonFactor instead of LCM, because a memory conflict would occur if the effective number of banks you have is less than the busy time. By effective number of banks I mean this: let's say you have 8 banks and a stride of 2. Then effectively you have 4 banks, because the memory accesses will land on only four banks (e.g., if your accesses are all even numbers starting from 0, they will land on banks 0, 2, 4, and 6).
In fact, this formula even fails for the example given right below it. Suppose we have 8 memory banks with a busy time of 6 clock cycles and a total memory latency of 12 clock cycles; how long will it take to complete a 64-element vector load with a stride of 1? Here they calculate the time as 12 + 64 = 76 clock cycles. However, a memory bank conflict would occur according to the condition as given, so we clearly couldn't have one access per cycle (the 64 in the equation).
Am I getting it wrong, or has the wrong formula managed to survive 5 editions of this book (unlikely)?
GCD(banks, stride) should come into it; your argument about that is correct.
Let's try this for a few different strides and see what we get,
for number of banks = b = 8.
# generated with calc(1)
define f(s) { print s, " | ", lcm(s,8), " | ", gcd(s,8), " | ", 8/lcm(s,8), " | ", 8/gcd(s,8) }
stride | LCM(s,b) | GCF(s,b) | b/LCM(s,b) | b/GCF(s,b)
     1 |        8 |        1 |          1 |            8   # 8 < 6 = false: no conflict
     2 |        8 |        2 |          1 |            4   # 4 < 6 = true: conflict
     3 |       24 |        1 |     ~0.333 |            8   # 8 < 6 = false: no conflict
     4 |        8 |        4 |          1 |            2   # 2 < 6 = true: conflict
     5 |       40 |        1 |        0.2 |            8
     6 |       24 |        2 |     ~0.333 |            4
     7 |       56 |        1 |     ~0.143 |            8
     8 |        8 |        8 |          1 |            1
     9 |       72 |        1 |     ~0.111 |            8
     x |      >=8 |   2^0..3 |        <=1 | 1, 2, 4 or 8
b/LCM(s,b) is always <=1, so it always predicts conflicts.
I think GCF (aka GCD) looks right for the stride values I've looked at so far. You only have a problem if the stride doesn't distribute the accesses over all the banks, and that's what b/GCF(s,b) tells you.
Stride = 8 should be the worst-case, using the same bank every time. gcd(8,8) = lcm(8,8) = 8. So both expressions give 8/8 = 1 which is less than the bank busy/recovery time, thus correctly predicting conflicts.
Stride=1 is of course the best case (no conflicts if there are enough banks to hide the busy time). gcd(8,1) = 1 correctly predicts no conflicts: (8/1 = 8, which is not less than 6). lcm(8,1) = 8. (8/8 < 6 is true) incorrectly predicts conflicts.
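The table above is easy to regenerate in Python for b = 8 banks and a busy time of 6 cycles:

```python
from math import gcd

def lcm(a, b):
    return a * b // gcd(a, b)

BANKS, BUSY = 8, 6

for stride in range(1, 10):
    by_lcm = BANKS / lcm(stride, BANKS)  # the book's expression: always <= 1
    by_gcd = BANKS / gcd(stride, BANKS)  # number of distinct banks actually hit
    print(f"stride {stride}: LCM predicts conflict={by_lcm < BUSY}, "
          f"GCD predicts conflict={by_gcd < BUSY}")
```

With the LCM form every stride is flagged as a conflict; with the GCD form only the strides that fail to spread accesses over enough banks (2, 4, 6, 8) are.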
We have a Java application running on Mule. We have the -Xmx value configured for 6144M, but we routinely see the overall memory usage climb and climb. It was getting close to 20 GB the other day before we proactively restarted it.
Thu Jun 30 03:05:57 CDT 2016
top - 03:05:58 up 149 days, 6:19, 0 users, load average: 0.04, 0.04, 0.00
Tasks: 164 total, 1 running, 163 sleeping, 0 stopped, 0 zombie
Cpu(s): 4.2%us, 1.7%sy, 0.0%ni, 93.9%id, 0.2%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 24600552k total, 21654876k used, 2945676k free, 440828k buffers
Swap: 2097144k total, 84256k used, 2012888k free, 1047316k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
3840 myuser 20 0 23.9g 18g 53m S 0.0 79.9 375:30.02 java
The jps command shows:
10671 Jps
3840 MuleContainerBootstrap
The jstat command shows:
S0C S1C S0U S1U EC EU OC OU PC PU YGC YGCT FGC FGCT GCT
37376.0 36864.0 16160.0 0.0 2022912.0 1941418.4 4194304.0 445432.2 78336.0 66776.7 232 7.044 17 17.403 24.447
The startup arguments are (sensitive bits have been changed):
3840 MuleContainerBootstrap -Dmule.home=/mule -Dmule.base=/mule -Djava.net.preferIPv4Stack=TRUE -XX:MaxPermSize=256m -Djava.endorsed.dirs=/mule/lib/endorsed -XX:+HeapDumpOnOutOfMemoryError -Dmyapp.lib.path=/datalake/app/ext_lib/ -DTARGET_ENV=prod -Djava.library.path=/opt/mapr/lib -DksPass=mypass -DsecretKey=aeskey -DencryptMode=AES -Dkeystore=/mule/myStore -DkeystoreInstance=JCEKS -Djava.security.auth.login.config=/opt/mapr/conf/mapr.login.conf -Dmule.mmc.bind.port=1521 -Xms6144m -Xmx6144m -Djava.library.path=%LD_LIBRARY_PATH%:/mule/lib/boot -Dwrapper.key=a_guid -Dwrapper.port=32000 -Dwrapper.jvm.port.min=31000 -Dwrapper.jvm.port.max=31999 -Dwrapper.disable_console_input=TRUE -Dwrapper.pid=10744 -Dwrapper.version=3.5.19-st -Dwrapper.native_library=wrapper -Dwrapper.arch=x86 -Dwrapper.service=TRUE -Dwrapper.cpu.timeout=10 -Dwrapper.jvmid=1 -Dwrapper.lang.domain=wrapper -Dwrapper.lang.folder=../lang
Adding up the "capacity" items from jstat shows that only my 6144m is being used for the Java heap. Where the heck is the rest of the memory being used? Stack memory? Native heap? I'm not even sure how to proceed.
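To spell out that sum, using the capacity columns (in KB) from the jstat output above:

```python
# S0C, S1C, EC, OC: survivor, eden and old-generation capacities in KB
s0c, s1c, ec, oc = 37376.0, 36864.0, 2022912.0, 4194304.0

heap_mb = (s0c + s1c + ec + oc) / 1024
print(heap_mb)  # 6144.0 -> exactly the -Xmx6144m setting
# PC (permgen capacity, 78336.0 KB) is tracked separately from the heap
```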
If left to continue growing, it will consume all memory on the system and we will eventually see the system freeze up throwing swap space errors.
I have another process that is starting to grow. Currently at about 11g resident memory.
pmap 10746 > pmap_10746.txt
cat pmap_10746.txt | grep anon | cut -c18-25 | sort -h | uniq -c | sort -rn | less
Top 10 entries by count:
119 12K
112 1016K
56 4K
38 131072K
20 65532K
15 131068K
14 65536K
10 132K
8 65404K
7 128K
Top 10 entries by allocation size:
1 6291456K
1 205816K
1 155648K
38 131072K
15 131068K
1 108772K
1 71680K
14 65536K
20 65532K
1 65512K
And top 10 by total size:
Count Size Aggregate
1 6291456K 6291456K
38 131072K 4980736K
15 131068K 1966020K
20 65532K 1310640K
14 65536K 917504K
8 65404K 523232K
1 205816K 205816K
1 155648K 155648K
112 1016K 113792K
This seems to be telling me that because Xmx and Xms are set to the same value, there is a single allocation of 6291456K for the Java heap. The other allocations are NOT Java heap memory. What are they? They are being allocated in rather large chunks.
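For anyone repeating this, the grep/cut/sort pipeline above can also be done in one pass with a short script (a sketch; it assumes the default `pmap` output format with the size in the second column):

```python
from collections import Counter

def top_anon_regions(pmap_text, n=10):
    """Aggregate anonymous mappings from `pmap <pid>` output, ranked by total size."""
    counts = Counter()
    for line in pmap_text.splitlines():
        if "anon" not in line:
            continue
        size_kb = int(line.split()[1].rstrip("K"))  # e.g. "131072K" -> 131072
        counts[size_kb] += 1
    rows = [(count * size, count, size) for size, count in counts.items()]
    return sorted(rows, reverse=True)[:n]  # (aggregate KB, count, size KB)

# usage: print(top_anon_regions(open("pmap_10746.txt").read()))
```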
Expanding a bit more details on Peter's answer.
You can take a binary heap dump from within VisualVM (right click on the process in the left-hand side list, and then on heap dump - it'll appear right below shortly after). If you can't attach VisualVM to your JVM, you can also generate the dump with this:
jmap -dump:format=b,file=heap.hprof $PID
Then copy the file and open it with Visual VM (File, Load, select type heap dump, find the file.)
As Peter notes, a likely cause for the leak may be non collected DirectByteBuffers (e.g.: some instance of another class is not properly de-referencing buffers, so they are never GC'd).
To identify where these references are coming from, you can use VisualVM to examine the heap and find all instances of DirectByteBuffer (DBB) in the "Classes" tab. Find the DBB class, right-click, and go to the instances view.
This will give you a list of instances. You can click on one and see who's keeping a reference to each one:
Note the bottom pane: we have a "referent" of type Cleaner and 2 "mybuffer" entries. These would be properties in other classes that are referencing the instance of DirectByteBuffer we drilled into (it should be OK to ignore the Cleaner and focus on the others).
From this point on you need to proceed based on your application.
Another equivalent way to get the list of DBB instances is from the OQL tab. This query:
select x from java.nio.DirectByteBuffer x
Gives us the same list as before. The benefit of using OQL is that you can execute more complex queries. For example, this gets all the instances that are keeping a reference to a DirectByteBuffer:
select referrers(x) from java.nio.DirectByteBuffer x
What you can do is take a heap dump and look for objects which are storing data off heap, such as ByteBuffers. Those objects will appear small but are a proxy for larger off-heap memory areas. See if you can determine why lots of those might be retained.