How many bytes are stored at a memory address? - memory

I am very confused by the following gdb output. I am debugging a program that processes a text file. The first word in the file is "The" and the gdb output looks as follows:
"The":
(gdb) p *(char*)0x7fffffff9d30
$12 = 84 'T'
(gdb) p *(char*)0x7fffffff9d34
$13 = 104 'h'
(gdb) p *(char*)0x7fffffff9d38
$14 = 101 'e'
A character is one byte, so when I increase the address of 'T' by one byte (8 bits) I should find 'h' there. But the address of 'h' is 4 bytes farther. What am I missing here?
I didn't realize that these are wchar_t (wide characters), which take 4 bytes each here.

FWIW, in situations like this you might like to use the "x" command to dump memory. This avoids any possible confusion caused by types and operators.
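As a rough illustration of why the characters sit 4 bytes apart, here is a small Python sketch. It assumes wchar_t is 4 bytes and little-endian (typical on x86-64 Linux, but not guaranteed elsewhere) and uses UTF-32-LE as a stand-in for that layout. The name byte_at is a hypothetical helper that mimics p *(char*)addr; it is not a gdb feature.

```python
# Assumption: 4-byte little-endian wchar_t, modelled here with UTF-32-LE.
WIDE = "The".encode("utf-32-le")


def byte_at(buf: bytes, offset: int) -> int:
    """Read a single byte, like `p *(char*)(base + offset)` in gdb."""
    return buf[offset]


if __name__ == "__main__":
    # Each character occupies 4 bytes; only the first byte of each is non-zero.
    for offset in range(len(WIDE)):
        print(offset, byte_at(WIDE, offset))
```

Under these assumptions the non-zero bytes appear at offsets 0, 4 and 8, which matches the addresses ending in 30, 34 and 38 in the gdb session.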

Related

Erlang equivalent of javascript codePointAt?

Is there an erlang equivalent of codePointAt from js? One that gets the code point starting at a byte offset, without modifying the underlying string/binary?
You can use bit syntax pattern matching to skip the first N bytes and decode the first character from the remaining bytes as UTF-8:
1> CodePointAt = fun(Binary, Offset) ->
<<_:Offset/binary, Char/utf8, _/binary>> = Binary,
Char
end.
Test:
2> CodePointAt(<<"πr²"/utf8>>, 0).
960
3> CodePointAt(<<"πr²"/utf8>>, 1).
** exception error: no match of right hand side value <<207,128,114,194,178>>
4> CodePointAt(<<"πr²"/utf8>>, 2).
114
5> CodePointAt(<<"πr²"/utf8>>, 3).
178
6> CodePointAt(<<"πr²"/utf8>>, 4).
** exception error: no match of right hand side value <<207,128,114,194,178>>
7> CodePointAt(<<"πr²"/utf8>>, 5).
** exception error: no match of right hand side value <<207,128,114,194,178>>
As you can see, if the offset is not on a valid UTF-8 character boundary, the function raises an error. You can handle that differently using a case expression if needed.
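For comparison, the same idea can be sketched in Python. This is only an illustration of decoding one code point at a byte offset, not a translation of any Erlang library function; the name code_point_at and the choice of ValueError are assumptions made for the example.

```python
def code_point_at(data: bytes, offset: int) -> int:
    """Decode the UTF-8 code point that starts at byte `offset`.

    Raises ValueError if `offset` is out of range or does not fall on a
    character boundary, similar to the failed match in the Erlang fun.
    """
    if offset < 0:
        raise ValueError("offset must be non-negative")
    # A UTF-8 character is 1 to 4 bytes long; try each width in turn.
    for width in range(1, 5):
        chunk = data[offset:offset + width]
        if len(chunk) < width:
            break  # ran off the end of the data
        try:
            return ord(chunk.decode("utf-8"))
        except UnicodeDecodeError:
            continue
    raise ValueError(f"no UTF-8 character starts at byte offset {offset}")


if __name__ == "__main__":
    data = "πr²".encode("utf-8")
    print(code_point_at(data, 0), code_point_at(data, 2), code_point_at(data, 3))
```

Like the Erlang version, it works on the bytes in place and does not modify or copy the whole input.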
First, remember that only binary strings are using UTF-8 in Erlang. Plain double-quote strings are already just lists of code points (much like UTF-32). The unicode:chardata() type represents both of these kinds of strings, including mixed lists like ["Hello", $\s, [<<"Filip"/utf8>>, $!]]. You can use unicode:characters_to_list(Chardata) or unicode:characters_to_binary(Chardata) to get a flattened version to work with if needed.
Meanwhile, the JS codePointAt function works on UTF-16 encoded strings, which is what JavaScript uses. Note that the index in this case is not a byte position, but the index of the 16-bit units of the encoding. And UTF-16 is also a variable length encoding: code points that need more than 16 bits use a kind of escape sequence called "surrogate pairs" - for example emojis like 👍 - so if such characters can occur, the index is misleading: in "a👍z" (in JavaScript), the a is at 0, but the z is not at 2 but at 3.
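The index shift caused by surrogate pairs can be checked with a short Python sketch. Python strings are indexed by code point, so the hypothetical helper utf16_unit_index below converts a code point index into the UTF-16 code unit index that JavaScript would use; it is an illustration, not part of any standard library.

```python
def utf16_unit_index(text: str, char_index: int) -> int:
    """Return the UTF-16 code unit index of the character at `char_index`.

    Characters outside the Basic Multilingual Plane take two 16-bit units
    (a surrogate pair), so later characters are pushed one index further.
    """
    prefix = text[:char_index]
    return len(prefix.encode("utf-16-le")) // 2


if __name__ == "__main__":
    s = "a👍z"
    for i, ch in enumerate(s):
        print(ch, "code point index", i, "UTF-16 index", utf16_unit_index(s, i))
```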
What you want is probably what's called the "grapheme clusters" - those that look like a single thing when printed (see the docs for Erlang's string module: https://www.erlang.org/doc/man/string.html). And you can't really use numerical indexes to dig the grapheme clusters out from a string - you need to iterate over the string from the start, getting them out one at a time. This can be done with string:next_grapheme(Chardata) (see https://www.erlang.org/doc/man/string.html#next_grapheme-1) or if you for some reason really need to index them numerically, you could insert the individual cluster substrings in an array (see https://www.erlang.org/doc/man/array.html). For example: array:from_list(string:to_graphemes(Chardata)).

Xcode weird debugger issue?

both integers, one is loaded from NSUserDefaults with the integerForKey: method. Has anyone seen a behaviour like this?
The result should obviously be 2, or is it way too late and I should sleep?
this is so weird....
Yes, this is a bug, please file it with the lldb.llvm.org bugzilla.
Note, po is just shorthand for: run the basic "expr" command to evaluate the arguments as an expression, then call the description method on the result.
The way the expression command works is: if the expression is simple enough to interpret, we do that; otherwise we JIT the expression, insert it into the debuggee, and run it. The bug is in the interpreter; apparently it can't do mod with signed integers. Unsigned integer types work correctly, and the JIT result is also correct. For instance, in Kurt's example:
(lldb) expr n % m
(int) $5 = 0
That's not right! But:
(lldb) expr (void) printf ("%d\n", n % m)
2
(lldb)
Because the expression involved a function call, we couldn't interpret it and had to JIT it, which got the calculation right. That's also a pretty gross workaround, but also please file a bug.

How can I extract some data out of the middle of a noisy file using Perl 6?

I would like to do this using idiomatic Perl 6.
I found a wonderful contiguous chunk of data buried in a noisy output file.
I would like to simply print out the header line starting with Cluster Unique and all of the lines following it, up to, but not including, the first occurrence of an empty line. Here's what the file looks like:
</path/to/projects/projectname/ParameterSweep/1000.1.7.dir> was used as the working directory.
....
Cluster Unique Sequences Reads RPM
1 31 3539 3539
2 25 2797 2797
3 17 1679 1679
4 21 1636 1636
5 14 1568 1568
6 13 1548 1548
7 7 1439 1439
Input file: "../../filename.count.fa"
...
Here's what I want parsed out:
Cluster Unique Sequences Reads RPM
1 31 3539 3539
2 25 2797 2797
3 17 1679 1679
4 21 1636 1636
5 14 1568 1568
6 13 1548 1548
7 7 1439 1439
One-liner version
.say if /Cluster \s+ Unique/ ff^ /^\s*$/ for lines;
In English
Print every line from the input file starting with the one containing the phrase Cluster Unique and ending just before the next empty line.
Same code with comments
.say # print the default variable $_
if # do the previous action (.say) "if" the following term is true
/Cluster \s+ Unique/ # Match $_ if it contains "Cluster Unique"
ff^ # Flip-flop operator: false until the preceding term becomes true,
# then true until the term after it becomes true (the ^ excludes that line)
/^\s*$/ # Match $_ if it is empty or contains only whitespace
for # Create a loop placing each element of the following list into $_
lines # Create a list of all of the lines in the file
; # End of statement
Expanded version
for lines() {
.say if (
$_ ~~ /Cluster \s+ Unique/ ff^ $_ ~~ /^\s*$/
)
}
lines() is like <> in perl5. Each line from each file listed on the command line is read in one at a time. Since this is in a for loop, each line is placed in the default variable $_.
say is like print except that it also appends a newline. When written with a starting ., it acts directly on the default variable $_.
$_ is the default variable, which in this case contains one line from the file.
~~ is the match operator that is comparing $_ with a regular expression.
// Create a regular expression between the two forward slashes
\s+ matches one or more spaces
ff is the flip-flop operator. It is false as long as the expression to its left is false. It becomes true when the expression to its left evaluates as true, and it becomes false again when the expression to its right becomes true. In this case, if we used ^ff^ instead of ff^, then the header would not be included in the output.
When ^ comes before (or after) ff, it modifies ff so that it is also false the iteration that the expression to its left (or right) becomes true.
/^\s*$/ matches an empty line
^ matches the beginning of a string
\s* matches zero or more spaces
$ matches the end of a string
By the way, the flip-flop operator in Perl 5 is .. when it is in a scalar context (it's the range operator in list context). But its features are not quite as rich as in Perl 6, of course.
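For readers who do not know the flip-flop operator, its effect here can be spelled out with an explicit state flag. The following Python sketch is only an approximation of ff^ for this particular task (the function name extract_block is made up for the example); it does not reproduce every detail of the Perl 6 operator.

```python
import re


def extract_block(lines):
    """Collect lines from the 'Cluster Unique' header up to, but not
    including, the next empty (or whitespace-only) line."""
    collected = []
    inside = False  # plays the role of the flip-flop state
    for line in lines:
        if not inside:
            if re.search(r"Cluster\s+Unique", line):
                inside = True           # left-hand condition switches it on
                collected.append(line)  # the header itself is included
        elif re.fullmatch(r"\s*", line):
            inside = False              # right-hand condition switches it off;
                                        # the empty line is excluded, like ff^
        else:
            collected.append(line)
    return collected


if __name__ == "__main__":
    sample = [
        "noise",
        "Cluster Unique Sequences Reads RPM",
        "1 31 3539 3539",
        "2 25 2797 2797",
        "",
        'Input file: "../../filename.count.fa"',
    ]
    print("\n".join(extract_block(sample)))
```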
I would like to do this using idiomatic Perl 6.
In Perl, the idiomatic way to locate a chunk in a file is to read the file in paragraph mode, then stop reading the file when you find the chunk you are interested in. If you are reading a 10GB file, and the chunk is found at the top of the file, it's inefficient to continue reading the rest of the file--much less perform an if test on every line in the file.
In Perl 6, you can read a paragraph at a time like this:
my $fname = 'data.txt';
my $infile = open(
$fname,
nl => "\n\n", #Set what perl considers the end of a line.
); #Removed die() per Brad Gilbert's comment.
for $infile.lines() -> $para {
if $para ~~ /^ 'Cluster Unique'/ {
say $para.chomp;
last; #Quit reading the file.
}
}
$infile.close;
# ^ Match start of string.
# 'Cluster Unique' By default, whitespace is insignificant in a perl6 regex. Quotes are one way to make whitespace significant.
However, in perl6 rakudo/moarVM the open() function does not read the nl argument correctly, so you currently can't set paragraph mode.
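Until paragraph mode works, the same early-exit idea can be written by hand. Here is a minimal Python sketch of it, given only to show the technique: the helper names paragraphs and first_paragraph_starting_with are invented for this example, and unlike a strict "\n\n" separator it treats any whitespace-only line as a paragraph break.

```python
import io


def paragraphs(stream):
    """Lazily yield blank-line-separated paragraphs from a file-like object."""
    buffer = []
    for line in stream:
        if line.strip():
            buffer.append(line.rstrip("\n"))
        elif buffer:
            yield "\n".join(buffer)
            buffer = []
    if buffer:
        yield "\n".join(buffer)


def first_paragraph_starting_with(stream, prefix):
    """Return the first paragraph that begins with `prefix`, or None.

    Reading stops as soon as the paragraph is found, so the rest of a
    large file is never consumed.
    """
    for para in paragraphs(stream):
        if para.startswith(prefix):
            return para
    return None


if __name__ == "__main__":
    text = "noise\n\nCluster Unique Sequences Reads RPM\n1 31 3539 3539\n\nmore noise\n"
    print(first_paragraph_starting_with(io.StringIO(text), "Cluster Unique"))
```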
Also, there are certain idioms that are considered by some to be bad practice, like:
Postfix if statements, e.g. say 'hello' if $y == 0.
Relying on the implicit $_ variable in your code, e.g. .say
So, depending on what side of the fence you live on, that would be considered a bad practice in Perl.

Error when appending string from word or variable

I'm trying to append two strings in gforth, but I get some scary looking error messages.
While s" foo" s" bar" append type cr works fine, as soon as I start storing strings in variables or creating them from words, I get errors. For instance:
: make-string ( -- s )
s" foo" ;
: append-print ( s s -- )
append type cr ;
make-string s" bar" append-print
Running it produces the following error:
$ gforth prob1.fs -e bye
gforth(41572,0x7fff79cc2310) malloc: *** error for object 0x103a551a0: pointer being realloc'd was not allocated
*** set a breakpoint in malloc_error_break to debug
Abort trap: 6.
I'm well versed in C, so it seems pretty clear that I'm using Forth incorrectly!
I suppose I need to learn something very basic about memory management in Forth.
Can anyone please explain what goes wrong here, and what I should do?
I also run into problems when I try to append a string that is stored in a variable:
variable foo
s" foo" foo !
foo s" bar " append type cr
This ends in a loop that I have to break:
$ gforth prob2.fs
foo��^C
in file included from *OS command line*:-1
prob2.fs:4: User interrupt
foo s" bar " append >>>type<<< cr
Backtrace:
$10C7C2E90 write-file
For reference, I'm using gforth 0.7.2 on Mac OS X. I would be very grateful for some good explanations on what's going on.
Update
I can see the definition of append:
see append
: append
>l >l >l >l #local0 #local1 #local3 + dup >l resize throw >l #local4 #local0 #local3 + #local5
move #local0 #local1 lp+!# 48 ; ok
So, it would seem I need to manage memory myself in Forth? If so, how?
Solution
Andreas Bombe provides the clue below. The final program that works would be
: make-string ( -- s )
s" foo" ;
: append-print
s+ type cr ;
make-string s" bar" append-print
Output is
$ gforth b.fs -e bye
foobar
append uses resize on the first string to make space to append the second string. This requires that the string be allocated on the heap.
When you compile a string with s" into a word, it gets allocated in the dictionary. If you try resize (directly or indirectly through append) on that pointer you will get the error you see.
Normally s" has undefined interpretation semantics. Gforth defines its interpretation semantics for convenience as allocating the string on the heap. That's why it works (in gforth) as long as you don't compile it.
Edit:
I've found the definition of append, it's part of libcc.fs (a foreign function interface builder as it seems) and not a standard word. This is the definition in the source, more readable than the see decompile:
: append { addr1 u1 addr2 u2 -- addr u }
addr1 u1 u2 + dup { u } resize throw { addr }
addr2 addr u1 + u2 move
addr u ;
Immediately before that is a definition of s+:
: s+ { addr1 u1 addr2 u2 -- addr u }
u1 u2 + allocate throw { addr }
addr1 addr u1 move
addr2 addr u1 + u2 move
addr u1 u2 +
;
As you can see this one allocates new memory space instead of resizing the first string and concatenates both strings into it. You could use this one instead. It is not a standard word however and just happens to be in your environment as an internal implementation detail of libcc.fs in gforth so you can't rely on it being available elsewhere.
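The difference between the two words can be loosely mirrored in Python, with the caveat that this is an analogy and not how gforth manages memory. In the sketch below a bytearray stands in for resizable heap storage and an immutable bytes object stands in for a string compiled into the dictionary; the function names are made up for the illustration.

```python
def s_plus(first: bytes, second: bytes) -> bytes:
    """Analogue of s+: allocate a fresh buffer and copy both strings into it."""
    out = bytearray(len(first) + len(second))
    out[:len(first)] = first
    out[len(first):] = second
    return bytes(out)


def append_in_place(first, second: bytes):
    """Analogue of append: grow the first buffer itself.

    This only works when the first string lives in resizable storage.
    """
    if not isinstance(first, bytearray):
        raise TypeError("first string is not in resizable storage")
    first.extend(second)
    return first


if __name__ == "__main__":
    print(s_plus(b"foo", b"bar"))                      # works for any input
    print(append_in_place(bytearray(b"foo"), b"bar"))  # needs a resizable buffer
```

The copying version never touches its inputs, which is why it is safe regardless of where the first string was stored.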
Strings in Forth mostly don't warrant dynamic allocation, and certainly not in your example. You can get by nicely with buffers that you allocate yourself using ALLOT and some very simple words to manipulate them.
[ALLOT uses the data space (ANSI term) in an incremental fashion for adding words and buffers. It is not dynamic, you can't release an item without removing at the same time all items ALLOT-ted later. It is also simple. Do not confuse with ALLOCATE which is dynamic and is in a separate extension wordset]
You make a fundamental mistake in leaving out the specification of your append-buffer.
It doesn't work, and we don't know how it is supposed to work!
In ciforth, an example could be:
: astring S" foo" ;
CREATE buffer 100 ALLOT \ space for 100 chars
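\ Put the first string in `buffer` and append the second string.
\ Also print the second string.
: append-print ( s s -- )
2dup type cr \ print a copy of the second string
2swap buffer $! \ store the first string in buffer
buffer $+! ; \ append the second string to buffer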
astring s" bar" append-print
bar OK \ answer
buffer $# TYPE
foobar OK \ answer
Other Forths have other non-standard words to manipulate simple strings. An excursion through malloc land is really not necessary. In the gforth documentation you can look up 'place' and find an equivalent family of words.
Also nowadays (Forth 2012) you can have strings like so "foo".

Gdb Syntax for print command

How can I view the data at the address of the first operand in gdb?
cmp [ebp+eax], edi
I tried using:
print /d $ebp
print /d $eax
and manually adding the values to make the address, but was not sure what to do next, or if there was an easier way...
(gdb) help x
Examine memory: x/FMT ADDRESS.
ADDRESS is an expression for the memory address to examine.
FMT is a repeat count followed by a format letter and a size letter.
Format letters are o(octal), x(hex), d(decimal), u(unsigned decimal),
t(binary), f(float), a(address), i(instruction), c(char) and s(string).
Size letters are b(byte), h(halfword), w(word), g(giant, 8 bytes).
The specified number of objects of the specified size are printed
according to the format.
For your example:
x/d $ebp+$eax
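What x/d $ebp+$eax does can be pictured as: add the two register values to get an address, then read a value stored there. The Python sketch below simulates this on a bytes object standing in for memory. It assumes a 4-byte little-endian signed integer (the word size gdb uses by default on x86 unless another size letter was given earlier), and examine_word is a hypothetical helper, not a gdb API.

```python
import struct


def examine_word(memory: bytes, base: int, index: int) -> int:
    """Read a signed 32-bit little-endian value at address base + index."""
    address = base + index  # same arithmetic as [ebp+eax]
    (value,) = struct.unpack_from("<i", memory, address)
    return value


if __name__ == "__main__":
    # Simulated memory: 8 padding bytes followed by the integer -42.
    memory = bytes(8) + struct.pack("<i", -42)
    print(examine_word(memory, base=5, index=3))
```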

Resources