use and meaning of DW_AT_location - dwarf

I wanted to know the use of the attribute DW_AT_location for debugging. It is one of the attributes specified by DWARF for debugging, but I could not really understand what exactly it represents. Also, when should this attribute be emitted when compiling code?

From the DWARF 3 spec (http://dwarfstd.org/doc/Dwarf3.pdf):
2.16 Data Locations
Any debugging information entry describing a data object, which includes variables, parameters, common blocks and the like, may have a DW_AT_location attribute, whose value is a location description (see Section 2.6).
The value of the DW_AT_location attribute is a location expression. Location expressions are fairly complex; I'd advise you to read the DWARF spec referenced above to learn more. In summary, a location expression can be a simple address giving the location of the variable, or a mini-program that must be evaluated at run time by the debugger to determine the location of the variable.
Ideally, your compiler should emit a location list for a variable, describing its location at all points in the program. Tracking a variable's location through registers is not trivial, which is why some compilers disable optimizations such as keeping variables in registers when producing debug information.

Please check the chapter 7 of DWARF 3 spec, DATA REPRESENTATION.
For example, if the value of DW_AT_location is 0x91 0x68, the tables in chapter 7 tell us that 0x91 is the opcode DW_OP_fbreg, which takes a SLEB128-encoded offset as its operand; decoding 0x68 as SLEB128 gives the real value: -24 (i.e. the variable lives at offset -24 from the frame base).
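The decoding above can be sketched in a few lines of Python; a minimal illustration (the opcode value 0x91 for DW_OP_fbreg comes from the DWARF 3 encoding tables):

```python
DW_OP_fbreg = 0x91  # opcode value from the DWARF 3 encoding tables

def decode_sleb128(data):
    """Decode a signed LEB128 value; returns (value, bytes_consumed)."""
    result = 0
    shift = 0
    for i, byte in enumerate(data):
        result |= (byte & 0x7F) << shift
        shift += 7
        if not (byte & 0x80):          # high bit clear: this is the last byte
            if byte & 0x40:            # sign bit of the final byte is set
                result -= 1 << shift   # sign-extend
            return result, i + 1
    raise ValueError("truncated SLEB128")

expr = bytes([0x91, 0x68])             # the DW_AT_location value from the text
assert expr[0] == DW_OP_fbreg
offset, _ = decode_sleb128(expr[1:])
print(offset)                          # -24: offset from the frame base
```

The sign-extension step is what turns the raw byte 0x68 (104) into -24: bit 6 of the final byte is the sign bit.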

Related

Getting current line number in Cobol

Is it possible to get and display the current line number in the Cobol program?
For example, C allows it in the following way:
...
printf("Current line = %d\n", __LINE__);
...
Short answer: No.
There is no portable COBOL way of doing this, especially not in all the places __LINE__ works.
Long answer with potential alternatives:
COBOL 2002 added intrinsic functions for exception handling. Using these you can get the location where the last error happened, depending on which checks are activated.
You could hack something by raising a non-fatal exception and, ideally on the same line, using that function...
From the standard:
The EXCEPTION-LOCATION function returns an alphanumeric character string, part of which is the implementor-defined location of the statement associated with the last exception status.
So this may provide you with the line number, as the returned value depends on the implementation; additionally, it seems that - at the time of writing - neither the IBM nor Micro Focus nor Fujitsu compilers support that intrinsic function at all.
The GnuCOBOL implementation returns a semicolon-separated list with the last entry being the line number.
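As a sketch of how you might consume that GnuCOBOL behaviour from a helper script (the exact contents of EXCEPTION-LOCATION are implementation-defined; the sample string below is made up for illustration):

```python
def line_from_exception_location(value):
    """Extract the trailing line number from a GnuCOBOL-style
    semicolon-separated EXCEPTION-LOCATION string."""
    parts = [p.strip() for p in value.split(";")]
    return int(parts[-1])

# hypothetical value in the shape GnuCOBOL returns
print(line_from_exception_location("prog.cob; some-paragraph; 42"))  # 42
```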
The upcoming COBOL standard adds the MODULE-NAME intrinsic function - but this will only give the name, not the line reference.
If you are free to choose which implementation you use, then adding an extra register COB_SOURCE_LINE / COB_SOURCE_FILE to GnuCOBOL should be relatively easy...
If the intent is tracing of some kind: many compilers have an extension READY TRACE / RESET TRACE. With those two statements (and possibly compiler directives/options) they will at least show the names of sections and paragraphs reached; some may also show the line number. Often this can be redirected to a file and will otherwise go to the default error stream.
If you use GnuCOBOL and compile with -ftrace-all, you can also use that for line or statement tracing, with a self-defined format specified via COB_TRACE_FORMAT (which can also be adjusted within the COBOL program and limited to the line number).
Q: Is it possible to get and display the current line number in the Cobol program?
There was a feature, through COBOL 85, called the DEBUG module. The feature was made obsolete in COBOL 85 and subsequently removed in COBOL 2002: while DEBUG lines remained available in the 2002 standard, the DEBUG module itself was dropped from the standard.
NOTE: The DEBUG module may still be available in current compilers.
The feature requires debugging mode in the source-computer paragraph. If that clause is removed, source lines with a D or d in column 7 are treated as comments.
Declaratives must be added to access debug-line which is the standard name for the source line number.
I have coded the source such that the source line number of wherever I place perform show-line will be displayed. Notice that show-line doesn't do anything.
Source:
       program-id. dbug.
       environment division.
       source-computer. computer-name debugging mode.
       object-computer. computer-name.
       data division.
       working-storage section.
       01  char pic x.
       procedure division.
       declaratives.
      ddebug section.
      duse for debugging show-line.
      d    display "Source-line: " debug-line.
       end declaratives.
       main-line.
       begin.
           display "Before"
      d    perform show-line
           display "After"
           accept char
           stop run.
      dshow-line.
       end program dbug.
Each implementor has their own means for activating the feature. For the system I use, it's a switch parameter (+D) on the command line. Without the switch parameter the line number will not show. (For GnuCOBOL 3.2 it is, apparently, the environment variable COB_SET_DEBUG with a value of 'Y', 'y' or '1'. ;-))
Command line:
dbug (+D)
Display:
Before
Source-line: 17
After

Explicit Plural string using iOS Stringsdict files

I am getting started learning iOS Stringsdict files and found some existing code on a project which used the following syntax:
<key>zero</key>
<string>You no message.</string>
As per the CLDR, zero is an invalid plural category in English, and we expect to use explicit plural rules instead (=0 when using ICU MessageFormat).
I tried to find how to use explicit plural rules in iOS Stringsdict files and could not find any way to achieve this. Can someone confirm if this is supported or not?
Example of solutions (I cannot test them but maybe someone can?)
<key>0</key>
<string>You no message.</string>
Or
<key>=0</key>
<string>You no message.</string>
Extra reference on explicit plural rules part of the CLDR implementation of ICU MessageFormat:
https://formatjs.io/guides/message-syntax/#plural-format
=value
This is used to match a specific value regardless of the plural categories of the current locale.
If you are interested in the zero rule only, it is handled in the .stringsdict file for any language.
Source: Foundation Release Notes for OS X v10.9
If "zero" is present, the value is used for mapping the argument value zero regardless of what CLDR rule specifies for the numeric value.
Otherwise, these are the only rules handled (depending on the language): zero, one, two, few, many, other
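To make the structure concrete, here is a minimal .stringsdict entry using Apple's zero category. The variable name messages and the strings are illustrative; the surrounding keys follow Apple's documented plist layout:

```xml
<key>messages_count</key>
<dict>
    <key>NSStringLocalizedFormatKey</key>
    <string>%#@messages@</string>
    <key>messages</key>
    <dict>
        <key>NSStringFormatSpecTypeKey</key>
        <string>NSStringPluralRuleType</string>
        <key>NSStringFormatValueTypeKey</key>
        <string>d</string>
        <key>zero</key>
        <string>You have no messages.</string>
        <key>one</key>
        <string>You have %d message.</string>
        <key>other</key>
        <string>You have %d messages.</string>
    </dict>
</dict>
```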
Short Answer
.stringsdict files have no way to support explicit plural rules (other than a custom Apple implementation of zero which is detailed below)
Detailed Answer
Normal CLDR implementation:
All rules that are not in the CLDR for a given language will be ignored
If using the rule zero, it will use the CLDR values (most languages have 0 as the value for zero). This also includes languages like Latvian, which has 20, 30, etc. mapped to zero, and it contradicts Apple's own documentation (this behavior was verified):
If "zero" is present, the value is used for mapping the argument value
zero regardless of what CLDR rule specifies for the numeric value.
Source: Foundation Release Notes for OS X v10.9
Custom (Apple) CLDR implementation:
All languages can use the zero category from the CLDR even if the rule is not defined for this language (reference here)
Presumably, they implemented this to facilitate negative forms of sentences which is a common use case (this can even be found in their examples). For example instead of writing:
You have 0 emails.
You can write:
You have no emails.
This is a very common use case but is typically not covered by CLDR categories; it is handled using explicit values. For example, in ICU MessageFormat you can use =0, not zero, for negative forms.
While this seems convenient, it creates a big problem: what if you want to use negative forms for Latvian, where the zero category covers more than just 0? You simply can't - basically, Apple broke the linguistic rules by overriding the CLDR.
Complementary details:
There are only two languages in the CLDR where zero does not equal 0:
Latvian: 1.3 million speakers worldwide
Prussian: dead language since the 18th century
Neither iOS nor macOS is available in the Latvian language, but both support Latvian locale settings (keyboard and date formats)
This means that there are probably few applications that will support Latvian, unless they have a manual way to change the language inside the application itself (a less common scenario on iOS, which typically honors the device's settings)
Conclusion
Tip #1: If you need to support Latvian, you should probably avoid using zero for negative forms, and use code instead, with the strings outside of the stringsdict file
Tip #2: Make sure that your translation process supports this behavior correctly!

Converting between M3 `loc` scheme and regular `loc` type?

The M3 Core module returns a sort of simplified loc representation in Rascal. For instance, a method in file MapParser might have the loc: |java+method:///MapParser/a()|.
However, this is evidently different from the other loc scheme I tend to see, which would look more or less like: |project://main-scheme/src/tests/MapParser.java|.
This wouldn't be a problem, except that some functions only accept one scheme or another. For instance, the function appendToFile(loc file, value V...) does not accept this scheme M3 uses, and will reject it with an error like: IO("Unsupported scheme java+method").
So, how can I convert between both schemes easily? I would like to preserve all information, like highlighted sections for instance.
Cheers.
There are two differences at play here.
Physical vs Logical Locations
java+method is a logical location, and project is a physical location. I think the best way to describe their difference is that a physical location describes the location of an actual file, or a subset of an actual file, while a logical location describes the location of a certain entity in the context of a bigger model - for example, a Java method in a Java class/project. Often logical locations can be mapped to a physical location, but that is not always the case.
For M3, for example, you can use resolveLocation from the IO module to get the actual offset in the file that the logical location points to.
Read-only vs writeable locations
Not all locations are writeable; I don't think any logical location is. But there are also physical locations that are read-only. The error you are getting is generic in that sense.
Rascal does support writing in the middle of text files, but most likely you do not want to use appendToFile, as it will append after the location you point it to. Most likely you want to replace a section of the text with your new section, so a regular writeFile should work.
Some notes
Note that you would have to recalculate all the offsets in the file after every write: the resolved physical locations for the logical locations would be outdated, as the file has changed since the M3 model and its corresponding map between logical and physical locations were constructed.
So for this use case, you might want to think of a better approach. The nicest solution is to use a grammar, rewrite the parse trees of the file, and overwrite the old file after rewriting. Note that the most recent Java grammar shipped with Rascal is for Java 5, so this might be a bit more work than you would like. Perhaps frame your goal as a new Stack Overflow question, and we'll see what other options might be applicable.

Documentation of Moses (statistical machine translation) mose.ini file format?

Is there any documentation of the moses.ini format for Moses? Running moses at the command line without arguments returns available feature names but not their available arguments. Additionally, the structure of the .ini file is not specified in the manual that I can see.
The main idea is that the file contains settings that will be used by the translation model. Thus, the documentation of values and options in moses.ini should be looked up in the Moses feature specifications.
Here are some excerpts I found on the Web about moses.ini.
In the Moses Core manual, there are some details:
7.6.5 moses.ini
All feature functions are specified in the [feature] section. It should be in the format:
Feature-name key1=value1 key2=value2 ...
For example: KENLM factor=0 order=3 num-features=1 lazyken=0 path=file.lm.gz
Also, there is a hint on how to print basic statistics about all components mentioned in the moses.ini.
Run the script
analyse_moses_model.pl moses.ini
This can be useful to set the order of mapping steps to avoid explosion of translation options or just to check that the model components are as big/detailed as we expect.
In the Center for Computational Language and EducAtion Research (CLEAR) Wiki, there is a sample file with some documentation:
Parameters
It is recommended to make an .ini file to store all of your settings.
input-factors
- Whether a factored model is used or not
mapping
- Whether to use the LM in memory (T) or read the file from hard disk directly (G)
ttable-file
- Indicates the number of source factors, number of target factors, number of scores, and the path to the translation table file
lmodel-file
- Indicates the LM type (0: SRILM, 1: IRSTLM), the factor number used, the order (n-gram) of the LM, and the path to the language model file
If that is not enough, there is another description on this page; see the "Decoder configuration file" section.
The sections
[ttable-file] and [lmodel-file] contain pointers to the phrase table
file and language model file, respectively. You may disregard the
numbers on those lines. For the time being, it's enough to know that
the last one of the numbers in the language model specification is the
order of the n-gram model.
The configuration file also contains some feature weights. Note that
the [weight-t] section has 5 weights, one for each feature contained
in the phrase table.
The moses.ini file created by the training process will not work with
your decoder without modification because it relies on a language
model library that is not compiled into our decoder. In order to make
it work, open the moses.ini file and find the language model
specification in the line immediately after the [lmodel-file] heading.
The first number on this line will be 0, which stands for SRILM.
Change it into 8 and leave the rest of the line untouched. Then your
configuration should work.
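Putting the pieces above together, a minimal legacy-style moses.ini might look like the following sketch. All paths and weight values are placeholders, and the exact set of sections depends on the Moses version, so treat this as illustrative only:

```ini
; surface form only
[input-factors]
0

; mapping step: use translation table 0
[mapping]
0 T 0

; src-factors, tgt-factors, number of scores, path to phrase table
[ttable-file]
0 0 5 /path/to/phrase-table.gz

; LM type (0 = SRILM), factor, n-gram order, path to LM
[lmodel-file]
0 0 3 /path/to/lm.gz

; one weight per phrase-table feature
[weight-t]
0.2 0.2 0.2 0.2 0.2

; language model weight
[weight-l]
0.5
```

Note how the five values under [weight-t] match the five phrase-table features mentioned above, and the last number on the [lmodel-file] line is the n-gram order.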

Determine Cobol coding style

I'm developing an application that parses COBOL programs. Some of these programs respect the traditional coding style (program text from columns 8 to 72), and some are newer and don't follow this style.
In my application I need to determine the coding style in order to know whether I should parse content after column 72.
I've been able to determine whether a program starts at column 1 or 8, but programs that start at column 1 can also follow the rule of comments after column 72.
So I'm trying to find rules that will allow me to determine whether text after column 72 is a comment or valid code.
I've found some, but it's hard to tell whether they will work every time:
a dot after column 72 marks the end of a sentence, but I fear dots can appear in comments too
finding the closing character of a statement after column 72: " ' ) }
looking at the characters in columns 71-72-73; if there is no space, find the whole word and check whether it's a keyword or a variable. Problem: it can be a variable from a COPY or a replacement, etc.
I'd like to know what do you think of these rules and if you have any ideas to help me determine the coding style of a Cobol program.
I don't need an API or something just solid rules that I will be able to rely on.
I think you need to know the COBOL compiler for each program. Its documentation should tell you what conventions/configurations/switches it uses to decide if the source code ends at column 72 or not.
So.... which compiler(s)?
And if you think the column 72 issue is a pain, wait till you get around to actually parsing the COBOL itself. If you are not well prepared to handle the lexical issues of the language, you are probably very badly prepared to handle the syntactic ones.
There is no absolutely reliable way to determine if a COBOL program is in fixed or free format based only on the source code. Heck, it is sometimes difficult to identify the programming language itself based only on source code. Check out this classic polyglot - it is valid under 8 different language compilers. That said, you could try a few heuristics that might yield the correct answer more often than not.
Compiler directives embedded in source code
Watch for certain compiler directives that determine code format. Unfortunately, every compiler vendor uses their own flavour of directive. For example, Micro Focus COBOL uses the SOURCEFORMAT directive. This directive will appear near the top of the program, so a short pre-scan could be used to find it. On the other hand, OpenCOBOL uses >>SOURCE FORMAT IS FREE and >>SOURCE FORMAT IS FIXED to toggle between free and fixed format - different parts of the same program could be formatted differently!
The bottom line here is that you will have to support the conventions of multiple COBOL compilers.
Compiler switches
Source code format can also be specified using a compiler switch. In this case, there are no concrete clues to go on. However, you can be reasonably sure that the entire source program will be either fixed or free. All you can do here is guess. Unless the programmer is out to "mess with your head" (and some will), a program in free format will have the keywords IDENTIFICATION DIVISION or ID DIVISION starting before column 8. Every COBOL program will begin with these keywords, so you can use them as the anchor point for determining code format in the absence of embedded compiler directives.
Warning - this is far from foolproof, but it might be a good start.
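A minimal sketch of that guess, assuming the division header appears on a line of its own (returns None when no header is found at all):

```python
def looks_free_format(source):
    """Guess free vs fixed format from where the first division header starts.
    Returns True (free), False (fixed), or None if no header was found."""
    for line in source.upper().splitlines():
        for header in ("IDENTIFICATION DIVISION", "ID DIVISION"):
            col = line.find(header)
            if col != -1:
                return col < 7   # starts before column 8 (columns are 1-based)
    return None

fixed = "       IDENTIFICATION DIVISION.\n       PROGRAM-ID. DEMO."
free = "identification division.\nprogram-id. demo."
print(looks_free_format(fixed), looks_free_format(free))  # False True
```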
There won't be an algorithm to do this with 100% certainty, because if comments can be anything, they can also be compilable COBOL code. So you could theoretically write a program that means one thing if the comments are ignored, and something else entirely if the comments are treated as part of the COBOL.
But that's extremely unlikely. What's most likely to happen is that if you try to compile the code under the wrong convention, it will simply fail. So the only accurate way to do this is to try compiling/parsing the program one way, and if you come to a line that can't make sense, switch to the other style. You could also support passing an argument to the compiler when the style is already known.
You can try using heuristics like what you've described, but that will never be totally accurate. The most they can give you is a probability that the code is one or the other style, which will increase as they examine more and more lines of code. They could be useful for helping you guess the style before you start compiling, or for figuring out when the problem is really just a typo in the code.
EDIT:
Regarding ideas for heuristics, it's hard to say. If there were a standard comment sigil like // or # in other languages, this would be a lot easier (actually, there is one, but it sounds like your code doesn't follow that convention). The only thing I can think of would be to check whether every line (or maybe 99% of lines, not counting empty lines or lines commented with * in column 7) has a period somewhere before position 72.
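That check can be sketched as follows. It is a rough heuristic, not a parser, and the column-7 comment test assumes fixed-format conventions:

```python
def period_before_72_ratio(source):
    """Fraction of candidate lines containing a period before column 72.
    Skips blank lines and fixed-format comment lines ('*' in column 7)."""
    candidates = [
        line for line in source.splitlines()
        if line.strip() and not (len(line) >= 7 and line[6] == "*")
    ]
    if not candidates:
        return 0.0
    hits = sum(1 for line in candidates if "." in line[:72])
    return hits / len(candidates)

sample = "       DISPLAY 'HI'.\n      * a comment line\n       STOP RUN."
print(period_before_72_ratio(sample))  # 1.0
```

A ratio close to 1.0 suggests the area before column 72 is self-contained code; anything much lower suggests sentences spill past it.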
One thing you DON'T want to do is apply any heuristics to the part after position 72. That is, you don't want to be checking the comments to see if they're valid COBOL. You want to check what you know is COBOL first, and see if that works by itself. There are several reasons for this:
Comments written in English are likely to have periods and quotes in them, so your first and second bullet points are out.
Natural languages are WAY harder to parse than something like COBOL.
The comments could easily have COBOL in them (maybe someone commented out the previous version of the line).
An important rule for comments is that they should never affect what the program does. If changing the comments can change how the program is compiled, you violate that.
All that in mind, my opinion is that you shouldn't use heuristics at all. You should always try to compile the program under both conventions unless one is explicitly specified. There's a chance that code will compile successfully under both conventions, and then you'll have two different programs and no way to tell which one is correct.
If that happens, you need to compare the two results (perhaps with a hash or something) to see if they're the same program. If they're the same, great, but if not, you'll need to force the user to explicitly choose a convention.
Most COBOL compilers will allow you to generate and analyze the output of the text-manipulation phase.
The text preprocessor output can be seen (using OpenCOBOL for the example)
cobc -E program.cob
The text-manipulation processor deals with any COPY ... REPLACING compiler directives, as well as converting SOURCE FORMAT IS FIXED sources (handling line continuations, string-literal concatenations, and comment-line removal, among other things) to the actual free format that the compiler's lexical analyzer needs. A lot of the OpenCOBOL tools (the cross-referencer and Animator, to name two) use source code AFTER the preprocessor pass. I don't think you'll lose any street cred if your parser program relies on post-processed source code files.
