Cobol Copybook Parser - parsing

Can anybody suggest how to extract the fields of a COBOL copybook?
A code snippet or any links would be helpful.
Example:
I want to extract it like this.
Field No.
Field Name
Field Type (999 or S9(4) or x(5)....)
Field Type-add (COMP, COMP-3, etc.,)
Other-Details (Copy Everything until "." excluding PIC clause)

Disclaimer: I am the maintainer of ProLeap COBOL parser.
You could use the Java-based ProLeap COBOL parser to extract all kinds of data from COBOL files such as level numbers, picture strings etc. Also you can extract COMP, COMP-1 etc. from the usage clause like this.
The ProLeap COBOL parser is licensed under an open source license, so it can be used for free.

For Python, take a look at the Copybook package (https://github.com/zalmane/copybook). It supports most copybook features, including REDEFINES and OCCURS, as well as a wide variety of PIC formats.
pip install copybook
root = copybook.parse_file('sample.cbl')
disclaimer : I am the maintainer of https://github.com/zalmane/copybook

Disclaimer: I maintain cb2xml
You could use cb2xml to parse your copybooks.
In Java, each field is converted to a COBOL object (with picture, usage, and occurs fields).
For other languages, the COBOL can be converted to XML.
See Looking for The right way with Regular Expression with groups in different order
Cobol:
01 Ams-Vendor.
03 Brand Pic x(3).
03 Location-details.
05 Location-Number Pic 9(4).
05 Location-Type Pic XX.
05 Location-Name Pic X(35).
03 Address-Details.
05 actual-address.
10 Address-1 Pic X(40).
10 Address-2 Pic X(40).
10 Address-3 Pic X(35).
05 Postcode Pic 9(4).
05 Empty pic x(6).
05 State Pic XXX.
03 Location-Active Pic X.
Output from cb2xml:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<copybook filename="cbl2xml_Test110.cbl">
<item display-length="173" level="01" name="Ams-Vendor" position="1" storage-length="173">
<item display-length="3" level="03" name="Brand" picture="x(3)" position="1" storage-length="3"/>
<item display-length="41" level="03" name="Location-details" position="4" storage-length="41">
<item display-length="4" level="05" name="Location-Number" numeric="true" picture="9(4)" position="4" storage-length="4"/>
<item display-length="2" level="05" name="Location-Type" picture="XX" position="8" storage-length="2"/>
<item display-length="35" level="05" name="Location-Name" picture="X(35)" position="10" storage-length="35"/>
</item>
<item display-length="128" level="03" name="Address-Details" position="45" storage-length="128">
<item display-length="115" level="05" name="actual-address" position="45" storage-length="115">
<item display-length="40" level="10" name="Address-1" picture="X(40)" position="45" storage-length="40"/>
<item display-length="40" level="10" name="Address-2" picture="X(40)" position="85" storage-length="40"/>
<item display-length="35" level="10" name="Address-3" picture="X(35)" position="125" storage-length="35"/>
</item>
<item display-length="4" level="05" name="Postcode" numeric="true" picture="9(4)" position="160" storage-length="4"/>
<item display-length="6" level="05" name="Empty" picture="x(6)" position="164" storage-length="6"/>
<item display-length="3" level="05" name="State" picture="XXX" position="170" storage-length="3"/>
</item>
<item display-length="1" level="03" name="Location-Active" picture="X" position="173" storage-length="1"/>
</item>
</copybook>
An interesting application of cb2xml is described in Dynamically Reading COBOL Redefines with C#
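The XML that cb2xml produces can be flattened into the kind of field list the question asks for. The following is a minimal Python sketch, not part of cb2xml itself: the function name `flatten` and the embedded `SAMPLE` (a trimmed excerpt of the output shown above) are illustrative assumptions, and only attributes that appear in that output are read by name.

```python
import xml.etree.ElementTree as ET

# Trimmed excerpt of the cb2xml output shown above (group lengths are those
# of the full copybook, not of this excerpt).
SAMPLE = """<copybook filename="cbl2xml_Test110.cbl">
  <item display-length="173" level="01" name="Ams-Vendor" position="1" storage-length="173">
    <item display-length="3" level="03" name="Brand" picture="x(3)" position="1" storage-length="3"/>
    <item display-length="41" level="03" name="Location-details" position="4" storage-length="41">
      <item display-length="4" level="05" name="Location-Number" numeric="true" picture="9(4)" position="4" storage-length="4"/>
      <item display-length="2" level="05" name="Location-Type" picture="XX" position="8" storage-length="2"/>
    </item>
  </item>
</copybook>"""

# Attributes that get their own column; everything else goes into "other".
NAMED = ("level", "name", "picture", "position", "storage-length")


def flatten(xml_text):
    """Return one dict per <item> element, in document order."""
    root = ET.fromstring(xml_text)
    rows = []
    for number, item in enumerate(root.iter("item"), start=1):
        rows.append({
            "no": number,
            "level": item.get("level"),
            "name": item.get("name"),
            "picture": item.get("picture"),  # None for group items
            "position": int(item.get("position")),
            "length": int(item.get("storage-length")),
            # Whatever other attributes the element carries, kept as-is.
            "other": {k: v for k, v in item.attrib.items() if k not in NAMED},
        })
    return rows


rows = flatten(SAMPLE)
for row in rows:
    print(row["no"], row["level"], row["name"], row["picture"], row["position"], row["length"])
```

Group items such as Location-details have no picture attribute, so their picture comes back as None; filter on that if only elementary fields are wanted.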

disclaimer : I maintain https://www.cobolcopybook.co.in
Hi, check the site https://www.cobolcopybook.co.in. This site is designed specifically for analyzing COBOL copybooks.
For example, if your input copybook is:
000100 01 BGG-FILE-REC.
000200 03 BGG-RCD-KEY.
000300 05 BGG-DUDENAME PIC XXXX.
000400 05 BGG-DUDEADDR PIC XX.
000500 05 BGG-HAIRCOLOR PIC X(71).
000600 05 BGG-EYECOLOR PIC X(8).
Then the output will be:
SR# LEVEL FIELD NAME PICTURE TYPE START END LENGTH
0 1 BGG-FILE-REC. # AN 1 85 85
1 3 BGG-RCD-KEY. # AN 1 85 85
2 5 BGG-DUDENAME XXXX. AN 1 4 4
3 5 BGG-DUDEADDR XX. AN 5 6 2
4 5 BGG-HAIRCOLOR X(71). AN 7 77 71
5 5 BGG-EYECOLOR X(8). AN 78 85 8
I hope this will solve your problem.
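The START, END and LENGTH columns in a table like the one above follow directly from the picture strings when every field is a plain display field. Below is a rough Python sketch of that arithmetic; the helper names `picture_length` and `layout` are made up for illustration, and the sketch deliberately assumes only X, A and 9 symbols with optional repeat counts (no COMP usage, OCCURS, REDEFINES or sign handling).

```python
import re


def picture_length(picture):
    """Display length of a simple PIC string such as XXXX, X(71) or 9(4)."""
    length = 0
    # Each match is one symbol with an optional repeat count: "X" or "X(71)".
    for _symbol, count in re.findall(r"([XA9])(?:\((\d+)\))?", picture.upper()):
        length += int(count) if count else 1
    return length


def layout(fields):
    """Return (name, start, end, length) for consecutive elementary fields."""
    rows = []
    start = 1  # positions are 1-based, as in the table above
    for name, picture in fields:
        length = picture_length(picture)
        rows.append((name, start, start + length - 1, length))
        start += length
    return rows


fields = [
    ("BGG-DUDENAME", "XXXX"),
    ("BGG-DUDEADDR", "XX"),
    ("BGG-HAIRCOLOR", "X(71)"),
    ("BGG-EYECOLOR", "X(8)"),
]
rows = layout(fields)
for row in rows:
    print(*row)
```

For the four elementary fields of the example this reproduces the start, end and length values in the table; anything beyond such simple pictures needs a real parser.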

If you use a COBOL parser designed for the purpose, then this is relatively easy.
Such a parser has to be willing to parse not just
an entire program, but to parse various subparts, such as a copybook containing a paragraph or a data declaration. Such a tool has to be prepared to
handle the complexities found in real copybooks, such as field declarations, PIC strings, REDEFINES, 77 and 88 level variables, all manner of crazy literal values used as initializers, and the messy/ugly problems with line continuations and COPY REPLACING, or it will only handle vanilla copybooks. Depending on the origin of the copybook, you may have to process the text as EBCDIC. It's actually a lot of work to build a parser that handles all of this.
If you try to hack it with regexes or something else that can't do something like context-free parsing, your resulting tool will simply not work on any complex cases.
A good parser will produce an abstract syntax tree capturing all these details as a data structure
in memory to be processed (this is efficient) or as an XML file to be
processed by some other tool.
Our COBOL parser for DMS can handle all of the above and generate an AST dump or an XML equivalent.
Given this COBOL copybook fragment:
003110****************************************
003120* GEGVD - GEGVD-0872-WS
003130****************************************
003140 01 GEGVDC.
003150****************************************
003160****************************************
003170 10 GEJVD-CDA PIC X(11)
003180 VALUE
003190 '<<<GEGVD>>>'.
003200 10 GEJVD-COUNT PIC 9(9)
003210 VALUE ZERO
003220 COMPUTATIONAL-3.
003230 10 GEJVD-ITER PIC 9(3)
003240 VALUE ZERO
003250 COMPUTATIONAL-3.
003260 10 GEJVD-OP-CODE PIC X(1)
003270 VALUE SPACE.
003280 10 GEJVD-STATUS PIC X(2)
003290 VALUE SPACE.
003300 10 GEJVD-OPEN-SW PIC X(1)
003310 VALUE SPACE.
003320 10 GEGVD-STATUS PIC X(2)
003330 VALUE SPACE.
003340 10 STATUS-CODE REDEFINES
003350 GEGVD-STATUS PIC X(2).
003360 01 GEGVD.
003370****************************************
003380****************************************
003390 02 GEGVD-C.
003400 10 GEGVD-INPT-SCTN.
003410 15 GEGVD-ACTL-SERL-LN-CD PIC X(1)
003420 VALUE SPACE.
003430 15 GEGVD-FORM-VER-CD PIC X(5)
003440 VALUE SPACE.
003450 10 GEGVD-OUTP-SCTN.
003460 15 GEGVD-GE-RULE-RSLT-CD PIC X(1)
003470 VALUE SPACE.
003480 10 GEGVD-WORK-SCTN.
003490 15 GEGVD-GE-RSN-00826-ID PIC S9(5)
003500 VALUE +00826
003510 COMPUTATIONAL-3.
003520 15 GEGVD-WS-N-LIT PIC X(1)
003530 VALUE 'N'.
003540 15 GEGVD-WS-S-LIT PIC X(1)
003550 VALUE 'S'.
003560 15 GEGVD-MPN-FORM-VER-CD PIC X(5)
003570 VALUE '99-00'.
003580 15 GEGVD-PLUS-MPN-FORM-VER-CD PIC X(5)
003590 VALUE '03-04'.
Our COBOL parser, applied directly to the copybook file,
produces the following abstract syntax tree:
C:\DMS\Domains\COBOL\IBMEnterprise\Tools\Parser\Source>run ../domainparser ++AST c:/DMS/Domains/COBOL/IBMEnterprise/Examples/SallieMae/copylibs/GEGVD.COP
COBOL~IBMEnterprise Domain Parser Version 2.7.17
Copyright (C) 1996-2017 Semantic Designs, Inc; All Rights Reserved; SD Confidential
Powered by DMS (R) Software Reengineering Toolkit
AST Optimizations: remove constant tokens, remove unary productions, compact sequences
Using encoding Unicode-UTF-8?ANSI +CRLF +1 /^I
393 tree nodes in tree.
(cobol_source_file#COBOL~IBMEnterprise=17#92facc0^0 Line 4 Column 8
(record_or_data_item_entry_list#COBOL~IBMEnterprise=721#92fac80^1#92facc0:1 Line 4 Column 8
(record_01_description_entry#COBOL~IBMEnterprise=576#92f6240^1#92fac80:1 Line 4 Column 8
(one#COBOL~IBMEnterprise=2170#92f10e0^1#92f6240:1 Line 4 Column 8
|(unsigned_integer_number#COBOL~IBMEnterprise=2579#92f1080^1#92f10e0:1[1] Line 4 Column 8
| precomment 0:1 Type 0 Line 1 Column 7 `****************************************'
| precomment 0:2 Type 0 Line 2 Column 7 `* GEGVD - GEGVD-0872-WS'
| precomment 0:3 Type 0 Line 3 Column 7 `****************************************')unsigned_integer_number
)one#92f10e0
(data_description_entry#COBOL~IBMEnterprise=607#92f12c0^1#92f6240:2 Line 4 Column 12
|(composed_identifier#COBOL~IBMEnterprise=2351#92f12a0^1#92f12c0:1 Line 4 Column 12
| (identifier#COBOL~IBMEnterprise=2850#92f10c0^1#92f12a0:1[`GEGVDC'] Line 4 Column 12
|)composed_identifier#92f12a0
)data_description_entry#92f12c0
('.'#COBOL~IBMEnterprise=2358#92f1280^1#92f6240:3[Keyword:4] Line 4 Column 18
(subsidiary_description_entry_list#COBOL~IBMEnterprise=585#92f61c0^1#92f6240:4 Line 7 Column 15
|(subsidiary_description_entry#COBOL~IBMEnterprise=588#92f1800^1#92f61c0:1 Line 7 Column 15
| (level_number#COBOL~IBMEnterprise=2174#92f1360^1#92f1800:1 Line 7 Column 15
| (unsigned_integer_number#COBOL~IBMEnterprise=2579#92f1320^1#92f1360:1[10] Line 7 Column 15
| precomment 0:1 Type 0 Line 5 Column 7 `****************************************'
| precomment 0:2 Type 0 Line 6 Column 7 `****************************************')unsigned_integer_number
| )level_number#92f1360
| (data_description_entry#COBOL~IBMEnterprise=608#92f1720^1#92f1800:2 Line 7 Column 18
| (composed_identifier#COBOL~IBMEnterprise=2351#92f13c0^1#92f1720:1 Line 7 Column 18
| (identifier#COBOL~IBMEnterprise=2850#92f1340^1#92f13c0:1[`GEJVD-CDA'] Line 7 Column 18
er
| )composed_identifier#92f13c0
| (data_description_clause_list#COBOL~IBMEnterprise=613#92f16e0^1#92f1720:2 Line 7 Column 55
| (picture_clause#COBOL~IBMEnterprise=653#92f1520^1#92f16e0:1 Line 7 Column 55
| |(pic_picture#COBOL~IBMEnterprise=194#92f1420^1#92f1520:1 Line 7 Column 55
| | ('PIC'#COBOL~IBMEnterprise=2430#92f13a0^1#92f1420:1[Keyword:0] Line 7 Column 55
| |)pic_picture#92f1420
| |(optional_is#COBOL~IBMEnterprise=59#92f1440^1#92f1520:2 Line 7 Column 59
| |(picture_string#COBOL~IBMEnterprise=656#92f1500^1#92f1520:3 Line 7 Column 59
| | (alphanumeric_picture_string#COBOL~IBMEnterprise=2543#92f1400^1#92f1500:1[`X(11)'] Line 7 Column 59
VD.COP)alphanumeric_picture_string
| |)picture_string#92f1500
| )picture_clause#92f1520
| (value_is_clause#COBOL~IBMEnterprise=711#92f16a0^1#92f16e0:2 Line 8 Column 44
| |(value_values#COBOL~IBMEnterprise=202#92f1600^1#92f16a0:1 Line 8 Column 44
| | ('VALUE'#COBOL~IBMEnterprise=2437#92f14e0^1#92f1600:1[Keyword:0] Line 8 Column 44
| |)value_values#92f1600
| |(optional_is_are#COBOL~IBMEnterprise=63#92f1620^1#92f16a0:2 Line 9 Column 20
| |(non_figurative_non_numeric_literal#COBOL~IBMEnterprise=2205#92f1680^1#92f16a0:3 Line 9 Column 20
COP
| | (non_numeric_literal_string#COBOL~IBMEnterprise=2187#92f1660^1#92f1680:1 Line 9 Column 20
| | (non_numeric_literal_quote#COBOL~IBMEnterprise=2840#92f15e0^1#92f1660:1[`<<<GEGVD>>>'] Line 9 Column 20
s/GEGVD.COP)non_numeric_literal_quote
| | )non_numeric_literal_string#92f1660
| |)non_figurative_non_numeric_literal#92f1680
| )value_is_clause#92f16a0
| )data_description_clause_list#92f16e0
| )data_description_entry#92f1720
| ('.'#COBOL~IBMEnterprise=2358#92f1640^1#92f1800:3[Keyword:0] Line 9 Column 33
|)subsidiary_description_entry#92f1800
|(subsidiary_description_entry_list#COBOL~IBMEnterprise=585#92f6120^1#92f61c0:2 Line 10 Column 15
| (subsidiary_description_entry#COBOL~IBMEnterprise=588#92f1c20^1#92f6120:1 Line 10 Column 15
| (level_number#COBOL~IBMEnterprise=2174#92f18a0^1#92f1c20:1 Line 10 Column 15
| (unsigned_integer_number#COBOL~IBMEnterprise=2579#92f17e0^1#92f18a0:1[10] Line 10 Column 15
igned_integer_number
| )level_number#92f18a0
| (data_description_entry#COBOL~IBMEnterprise=608#92f1bc0^1#92f1c20:2 Line 10 Column 18
| (composed_identifier#COBOL~IBMEnterprise=2351#92f18e0^1#92f1bc0:1 Line 10 Column 18
| |(identifier#COBOL~IBMEnterprise=2850#92f1820^1#92f18e0:1[`GEJVD-COUNT'] Line 10 Column 18
tifier
| )composed_identifier#92f18e0
| (data_description_clause_list#COBOL~IBMEnterprise=613#92f1ba0^1#92f1bc0:2 Line 10 Column 55
| |(data_description_clause_list#COBOL~IBMEnterprise=613#92f1aa0^1#92f1ba0:1 Line 10 Column 55
| | (picture_clause#COBOL~IBMEnterprise=653#92f19a0^1#92f1aa0:1 Line 10 Column 55
| | (pic_picture#COBOL~IBMEnterprise=194#92f1920^1#92f19a0:1 Line 10 Column 55
| | ('PIC'#COBOL~IBMEnterprise=2430#92f18c0^1#92f1920:1[Keyword:0] Line 10 Column 55
| | )pic_picture#92f1920
| | (optional_is#COBOL~IBMEnterprise=59#92f1940^1#92f19a0:2 Line 10 Column 59
| | (picture_string#COBOL~IBMEnterprise=655#92f1980^1#92f19a0:3 Line 10 Column 59
| | (numeric_picture_string#COBOL~IBMEnterprise=2542#92f1900^1#92f1980:1[`9(9)'] Line 10 Column 59
COP)numeric_picture_string
| | )picture_string#92f1980
| | )picture_clause#92f19a0
| | (value_is_clause#COBOL~IBMEnterprise=711#92f1a80^1#92f1aa0:2 Line 11 Column 44
| | (value_values#COBOL~IBMEnterprise=202#92f1a00^1#92f1a80:1 Line 11 Column 44
| | ('VALUE'#COBOL~IBMEnterprise=2437#92f1960^1#92f1a00:1[Keyword:0] Line 11 Column 44
| | )value_values#92f1a00
| | (optional_is_are#COBOL~IBMEnterprise=63#92f1a20^1#92f1a80:2 Line 11 Column 54
e
| | (non_all_figurative_numeric_or_non_numeric_literal#COBOL~IBMEnterprise=2203#92f1a60^1#92f1a80:3 Line 11 Column 54
ae/copylibs/GEGVD.COP
| | ('ZERO'#COBOL~IBMEnterprise=2533#92f19e0^1#92f1a60:1[Keyword:0] Line 11 Column 54
| | )non_all_figurative_numeric_or_non_numeric_literal#92f1a60
| | )value_is_clause#92f1a80
| |)data_description_clause_list#92f1aa0
| |(usage_clause#COBOL~IBMEnterprise=684#92f1b80^1#92f1ba0:2 Line 12 Column 44
| | (optional_usage_is#COBOL~IBMEnterprise=674#92f1ae0^1#92f1b80:1 Line 12 Column 44
ge_is
| | ('COMPUTATIONAL-3'#COBOL~IBMEnterprise=2565#92f1a40^1#92f1b80:2[Keyword:0] Line 12 Column 44
'COMPUTATIONAL-3'
| |)usage_clause#92f1b80
| )data_description_clause_list#92f1ba0
| )data_description_entry#92f1bc0
| ('.'#COBOL~IBMEnterprise=2358#92f1b60^1#92f1c20:3[Keyword:0] Line 12 Column 59
| )subsidiary_description_entry#92f1c20
| (subsidiary_description_entry_list#COBOL~IBMEnterprise=585#92f5fe0^1#92f6120:2 Line 13 Column 15
| (subsidiary_description_entry#COBOL~IBMEnterprise=588#92f4120^1#92f5fe0:1 Line 13 Column 15
| (level_number#COBOL~IBMEnterprise=2174#92f1d40^1#92f4120:1 Line 13 Column 15
| |(unsigned_integer_number#COBOL~IBMEnterprise=2579#92f1be0^1#92f1d40:1[10] Line 13 Column 15
signed_integer_number
| )level_number#92f1d40
| (data_description_entry#COBOL~IBMEnterprise=608#92f40c0^1#92f4120:2 Line 13 Column 18
| |(composed_identifier#COBOL~IBMEnterprise=2351#92f1d80^1#92f40c0:1 Line 13 Column 18
| | (identifier#COBOL~IBMEnterprise=2850#92f1d00^1#92f1d80:1[`GEJVD-ITER'] Line 13 Column 18
tifier
| |)composed_identifier#92f1d80
| |(data_description_clause_list#COBOL~IBMEnterprise=613#92f40a0^1#92f40c0:2 Line 13 Column 55
| | (data_description_clause_list#COBOL~IBMEnterprise=613#92f1fa0^1#92f40a0:1 Line 13 Column 55
| | (picture_clause#COBOL~IBMEnterprise=653#92f1e40^1#92f1fa0:1 Line 13 Column 55
| | (pic_picture#COBOL~IBMEnterprise=194#92f1dc0^1#92f1e40:1 Line 13 Column 55
| | |('PIC'#COBOL~IBMEnterprise=2430#92f1d60^1#92f1dc0:1[Keyword:0] Line 13 Column 55
| | )pic_picture#92f1dc0
| | (optional_is#COBOL~IBMEnterprise=59#92f1de0^1#92f1e40:2 Line 13 Column 59
| | (picture_string#COBOL~IBMEnterprise=655#92f1e20^1#92f1e40:3 Line 13 Column 59
| | |(numeric_picture_string#COBOL~IBMEnterprise=2542#92f1da0^1#92f1e20:1[`9(3)'] Line 13 Column 59
.COP)numeric_picture_string
| | )picture_string#92f1e20
| | )picture_clause#92f1e40
| | (value_is_clause#COBOL~IBMEnterprise=711#92f1f80^1#92f1fa0:2 Line 14 Column 44
| | (value_values#COBOL~IBMEnterprise=202#92f1f00^1#92f1f80:1 Line 14 Column 44
| | |('VALUE'#COBOL~IBMEnterprise=2437#92f1e00^1#92f1f00:1[Keyword:0] Line 14 Column 44
| | )value_values#92f1f00
| | (optional_is_are#COBOL~IBMEnterprise=63#92f1f20^1#92f1f80:2 Line 14 Column 54
re
| | (non_all_figurative_numeric_or_non_numeric_literal#COBOL~IBMEnterprise=2203#92f1f60^1#92f1f80:3 Line 14 Column 54
Mae/copylibs/GEGVD.COP
| | |('ZERO'#COBOL~IBMEnterprise=2533#92f1ee0^1#92f1f60:1[Keyword:0] Line 14 Column 54
| | )non_all_figurative_numeric_or_non_numeric_literal#92f1f60
| | )value_is_clause#92f1f80
| | )data_description_clause_list#92f1fa0
| | (usage_clause#COBOL~IBMEnterprise=684#92f4080^1#92f40a0:2 Line 15 Column 44
| | (optional_usage_is#COBOL~IBMEnterprise=674#92f4000^1#92f4080:1 Line 15 Column 44
age_is
| | ('COMPUTATIONAL-3'#COBOL~IBMEnterprise=2565#92f1f40^1#92f4080:2[Keyword:0] Line 15 Column 44
)'COMPUTATIONAL-3'
| | )usage_clause#92f4080
| |)data_description_clause_list#92f40a0
| )data_description_entry#92f40c0
| ('.'#COBOL~IBMEnterprise=2358#92f4060^1#92f4120:3[Keyword:0] Line 15 Column 59
| )subsidiary_description_entry#92f4120
| (subsidiary_description_entry_list#COBOL~IBMEnterprise=585#92f5e60^1#92f5fe0:2 Line 16 Column 15
| (subsidiary_description_entry#COBOL~IBMEnterprise=588#92f45c0^1#92f5e60:1 Line 16 Column 15
| |(level_number#COBOL~IBMEnterprise=2174#92f4240^1#92f45c0:1 Line 16 Column 15
| | (unsigned_integer_number#COBOL~IBMEnterprise=2579#92f40e0^1#92f4240:1[10] Line 16 Column 15
nsigned_integer_number
| |)level_number#92f4240
| |(data_description_entry#COBOL~IBMEnterprise=608#92f44e0^1#92f45c0:2 Line 16 Column 18
| | (composed_identifier#COBOL~IBMEnterprise=2351#92f4280^1#92f44e0:1 Line 16 Column 18
| | (identifier#COBOL~IBMEnterprise=2850#92f4200^1#92f4280:1[`GEJVD-OP-CODE'] Line 16 Column 18
identifier
| | )composed_identifier#92f4280
| | (data_description_clause_list#COBOL~IBMEnterprise=613#92f44a0^1#92f44e0:2 Line 16 Column 55
| | (picture_clause#COBOL~IBMEnterprise=653#92f4340^1#92f44a0:1 Line 16 Column 55
| | (pic_picture#COBOL~IBMEnterprise=194#92f42c0^1#92f4340:1 Line 16 Column 55
| | |('PIC'#COBOL~IBMEnterprise=2430#92f4260^1#92f42c0:1[Keyword:0] Line 16 Column 55
| | )pic_picture#92f42c0
| | (optional_is#COBOL~IBMEnterprise=59#92f42e0^1#92f4340:2 Line 16 Column 59
| | (picture_string#COBOL~IBMEnterprise=656#92f4320^1#92f4340:3 Line 16 Column 59
| | |(alphanumeric_picture_string#COBOL~IBMEnterprise=2543#92f42a0^1#92f4320:1[`X(1)'] Line 16 Column 59
GEGVD.COP)alphanumeric_picture_string
| | )picture_string#92f4320
| | )picture_clause#92f4340
| | (value_is_clause#COBOL~IBMEnterprise=711#92f4480^1#92f44a0:2
** middle of AST deleted due to size limitations on SO answers **
| | (level_number#COBOL~IBMEnterprise=2174#92f7140^1#92f7600:1 Line 44 Column 18
| | |(unsigned_integer_number#COBOL~IBMEnterprise=2579#92f6fa0^1#92f7140:1[15] Line 44 Column 18
P)unsigned_integer_number
| | )level_number#92f7140
| | (data_description_entry#COBOL~IBMEnterprise=608#92f7540^1#92f7600:2 Line 44 Column 21
| | |(composed_identifier#COBOL~IBMEnterprise=2351#92f71a0^1#92f7540:1 Line 44 Column 21
| | | (identifier#COBOL~IBMEnterprise=2850#92f70c0^1#92f71a0:1[`GEGVD-WS-S-LIT'] Line 44 Column 21
COP)identifier
| | |)composed_identifier#92f71a0
| | |(data_description_clause_list#COBOL~IBMEnterprise=613#92f74a0^1#92f7540:2 Line 44 Column 55
P
| | | (picture_clause#COBOL~IBMEnterprise=653#92f72c0^1#92f74a0:1 Line 44 Column 55
| | | (pic_picture#COBOL~IBMEnterprise=194#92f7220^1#92f72c0:1 Line 44 Column 55
| | | ('PIC'#COBOL~IBMEnterprise=2430#92f7160^1#92f7220:1[Keyword:0] Line 44 Column 55
| | | )pic_picture#92f7220
| | | (optional_is#COBOL~IBMEnterprise=59#92f7260^1#92f72c0:2 Line 44 Column 59
| | | (picture_string#COBOL~IBMEnterprise=656#92f72a0^1#92f72c0:3 Line 44 Column 59
| | | (alphanumeric_picture_string#COBOL~IBMEnterprise=2543#92f7200^1#92f72a0:1[`X(1)'] Line 44 Column 59
bs/GEGVD.COP)alphanumeric_picture_string
| | | )picture_string#92f72a0
| | | )picture_clause#92f72c0
| | | (value_is_clause#COBOL~IBMEnterprise=711#92f7460^1#92f74a0:2 Line 45 Column 44
| | | (value_values#COBOL~IBMEnterprise=202#92f7380^1#92f7460:1 Line 45 Column 44
| | | ('VALUE'#COBOL~IBMEnterprise=2437#92f7280^1#92f7380:1[Keyword:0] Line 45 Column 44
UE'
| | | )value_values#92f7380
| | | (optional_is_are#COBOL~IBMEnterprise=63#92f73a0^1#92f7460:2 Line 45 Column 54
s_are
| | | (non_figurative_non_numeric_literal#COBOL~IBMEnterprise=2205#92f7440^1#92f7460:3 Line 45 Column 54
/GEGVD.COP
| | | (non_numeric_literal_string#COBOL~IBMEnterprise=2187#92f7420^1#92f7440:1 Line 45 Column 54
COP
| | | |(non_numeric_literal_quote#COBOL~IBMEnterprise=2840#92f7360^1#92f7420:1[`S'] Line 45 Column 54
EGVD.COP)non_numeric_literal_quote
| | | )non_numeric_literal_string#92f7420
| | | )non_figurative_non_numeric_literal#92f7440
| | | )value_is_clause#92f7460
| | |)data_description_clause_list#92f74a0
| | )data_description_entry#92f7540
| | ('.'#COBOL~IBMEnterprise=2358#92f73c0^1#92f7600:3[Keyword:0] Line 45 Column 57
| | )subsidiary_description_entry#92f7600
| | (subsidiary_description_entry_list#COBOL~IBMEnterprise=585#92fa180^1#92fa200:2 Line 46 Column 18
.COP
| | (subsidiary_description_entry#COBOL~IBMEnterprise=588#92f7b40^1#92fa180:1 Line 46 Column 18
| | |(level_number#COBOL~IBMEnterprise=2174#92f7760^1#92f7b40:1 Line 46 Column 18
| | | (unsigned_integer_number#COBOL~IBMEnterprise=2579#92f75a0^1#92f7760:1[15] Line 46 Column 18
OP)unsigned_integer_number
| | |)level_number#92f7760
| | |(data_description_entry#COBOL~IBMEnterprise=608#92f7a60^1#92f7b40:2 Line 46 Column 21
| | | (composed_identifier#COBOL~IBMEnterprise=2351#92f77c0^1#92f7a60:1 Line 46 Column 21
| | | (identifier#COBOL~IBMEnterprise=2850#92f7660^1#92f77c0:1[`GEGVD-MPN-FORM-VER-CD'] Line 46 Column 21
s/GEGVD.COP)identifier
| | | )composed_identifier#92f77c0
| | | (data_description_clause_list#COBOL~IBMEnterprise=613#92f7a20^1#92f7a60:2 Line 46 Column 55
OP
| | | (picture_clause#COBOL~IBMEnterprise=653#92f7880^1#92f7a20:1 Line 46 Column 55
| | | (pic_picture#COBOL~IBMEnterprise=194#92f7800^1#92f7880:1 Line 46 Column 55
| | | |('PIC'#COBOL~IBMEnterprise=2430#92f77a0^1#92f7800:1[Keyword:0] Line 46 Column 55
| | | )pic_picture#92f7800
| | | (optional_is#COBOL~IBMEnterprise=59#92f7820^1#92f7880:2 Line 46 Column 59
| | | (picture_string#COBOL~IBMEnterprise=656#92f7860^1#92f7880:3 Line 46 Column 59
| | | |(alphanumeric_picture_string#COBOL~IBMEnterprise=2543#92f77e0^1#92f7860:1[`X(5)'] Line 46 Column 59
ibs/GEGVD.COP)alphanumeric_picture_string
| | | )picture_string#92f7860
| | | )picture_clause#92f7880
| | | (value_is_clause#COBOL~IBMEnterprise=711#92f79c0^1#92f7a20:2 Line 47 Column 44
| | | (value_values#COBOL~IBMEnterprise=202#92f7920^1#92f79c0:1 Line 47 Column 44
| | | |('VALUE'#COBOL~IBMEnterprise=2437#92f7840^1#92f7920:1[Keyword:0] Line 47 Column 44
LUE'
| | | )value_values#92f7920
| | | (optional_is_are#COBOL~IBMEnterprise=63#92f7940^1#92f79c0:2 Line 47 Column 54
is_are
| | | (non_figurative_non_numeric_literal#COBOL~IBMEnterprise=2205#92f79a0^1#92f79c0:3 Line 47 Column 54
s/GEGVD.COP
| | | |(non_numeric_literal_string#COBOL~IBMEnterprise=2187#92f7980^1#92f79a0:1 Line 47 Column 54
.COP
| | | | (non_numeric_literal_quote#COBOL~IBMEnterprise=2840#92f7900^1#92f7980:1[`99-00'] Line 47 Column 54
ibs/GEGVD.COP)non_numeric_literal_quote
| | | |)non_numeric_literal_string#92f7980
| | | )non_figurative_non_numeric_literal#92f79a0
| | | )value_is_clause#92f79c0
| | | )data_description_clause_list#92f7a20
| | |)data_description_entry#92f7a60
| | |('.'#COBOL~IBMEnterprise=2358#92f7960^1#92f7b40:3[Keyword:0] Line 47 Column 61
| | )subsidiary_description_entry#92f7b40
| | (subsidiary_description_entry#COBOL~IBMEnterprise=588#92fa0c0^1#92fa180:2 Line 48 Column 18
| | |(level_number#COBOL~IBMEnterprise=2174#92f7c80^1#92fa0c0:1 Line 48 Column 18
| | | (unsigned_integer_number#COBOL~IBMEnterprise=2579#92f7ae0^1#92f7c80:1[15] Line 48 Column 18
OP)unsigned_integer_number
| | |)level_number#92f7c80
| | |(data_description_entry#COBOL~IBMEnterprise=608#92f7fc0^1#92fa0c0:2 Line 48 Column 21
| | | (composed_identifier#COBOL~IBMEnterprise=2351#92f7cc0^1#92f7fc0:1 Line 48 Column 21
| | | (identifier#COBOL~IBMEnterprise=2850#92f7b60^1#92f7cc0:1[`GEGVD-PLUS-MPN-FORM-VER-CD'] Line 48 Column 21
pylibs/GEGVD.COP)identifier
| | | )composed_identifier#92f7cc0
| | | (data_description_clause_list#COBOL~IBMEnterprise=613#92f7f60^1#92f7fc0:2 Line 48 Column 55
OP
| | | (picture_clause#COBOL~IBMEnterprise=653#92f7de0^1#92f7f60:1 Line 48 Column 55
| | | (pic_picture#COBOL~IBMEnterprise=194#92f7d20^1#92f7de0:1 Line 48 Column 55
| | | |('PIC'#COBOL~IBMEnterprise=2430#92f7ca0^1#92f7d20:1[Keyword:0] Line 48 Column 55
| | | )pic_picture#92f7d20
| | | (optional_is#COBOL~IBMEnterprise=59#92f7d40^1#92f7de0:2 Line 48 Column 59
| | | (picture_string#COBOL~IBMEnterprise=656#92f7da0^1#92f7de0:3 Line 48 Column 59
| | | |(alphanumeric_picture_string#COBOL~IBMEnterprise=2543#92f7ce0^1#92f7da0:1[`X(5)'] Line 48 Column 59
ibs/GEGVD.COP)alphanumeric_picture_string
| | | )picture_string#92f7da0
| | | )picture_clause#92f7de0
| | | (value_is_clause#COBOL~IBMEnterprise=711#92f7f20^1#92f7f60:2 Line 49 Column 44
| | | (value_values#COBOL~IBMEnterprise=202#92f7e80^1#92f7f20:1 Line 49 Column 44
| | | |('VALUE'#COBOL~IBMEnterprise=2437#92f7d80^1#92f7e80:1[Keyword:0] Line 49 Column 44
LUE'
| | | )value_values#92f7e80
| | | (optional_is_are#COBOL~IBMEnterprise=63#92f7ea0^1#92f7f20:2 Line 49 Column 54
is_are
| | | (non_figurative_non_numeric_literal#COBOL~IBMEnterprise=2205#92f7f00^1#92f7f20:3 Line 49 Column 54
s/GEGVD.COP
| | | |(non_numeric_literal_string#COBOL~IBMEnterprise=2187#92f7ee0^1#92f7f00:1 Line 49 Column 54
.COP
| | | | (non_numeric_literal_quote#COBOL~IBMEnterprise=2840#92f7e60^1#92f7ee0:1[`03-04'] Line 49 Column 54
ibs/GEGVD.COP)non_numeric_literal_quote
| | | |)non_numeric_literal_string#92f7ee0
| | | )non_figurative_non_numeric_literal#92f7f00
| | | )value_is_clause#92f7f20
| | | )data_description_clause_list#92f7f60
| | |)data_description_entry#92f7fc0
| | |('.'#COBOL~IBMEnterprise=2358#92f7ec0^1#92fa0c0:3[Keyword:0] Line 49 Column 61
| | )subsidiary_description_entry#92fa0c0
| | )subsidiary_description_entry_list#92fa180
| | )subsidiary_description_entry_list#92fa200
| |)subsidiary_description_entry_list#92fa340
| )subsidiary_description_entry_list#92fa3c0
| )subsidiary_description_entry#92fa500
| )subsidiary_description_entry_list#92fa6e0
|)subsidiary_description_entry_list#92fa7a0
)subsidiary_description_entry#92fa840
)record_01_description_entry#92faac0
)record_or_data_item_entry_list#92fac80
)cobol_source_file#92facc0
C:\DMS\Domains\COBOL\IBMEnterprise\Tools\Parser\Source>
Once you have the tree, it is fairly straightforward to walk over the tree and extract facts about the COBOL symbol declarations.

Counting if range matches ranged criteria 1:1

I have an ongoing scoreboard with a friend for a game we play. It looks like this:
A B C D E F
+-----------------------------+-------+------+--------+--------+------------+
1 | Through the Ages Scoreboard | | | | | |
+-----------------------------+-------+------+--------+--------+------------+
2 | Game title | Kevin | M | First? | Winner | Difference |
+-----------------------------+-------+------+--------+--------+------------+
3 | thekoalaz's Game | 174 | 213 | Kevin | M | 39 |
4 | Game #0 | 242 | 126 | Kevin | Kevin | 116 |
5 | Game #1 | 105 | 146 | Kevin | M | 41 |
6 | Game #2 | 158 | 135 | Kevin | Kevin | 23 |
7 | Game #3 | 149 | 145 | M | Kevin | 4 |
8 | Game #4 | 91 | 145 | Kevin | M | 54 |
9 | Game #5 | 211 | 187 | M | Kevin | 24 |
10 | Game #6 | 160 | 158 | M | Kevin | 2 |
11 | Game #7 | 154 | 215 | Kevin | M | 61 |
12 | Game #8 | 169 | 177 | M | M | 8 |
13 | Game #9 | 135 | 129 | M | Kevin | 6 |
14 | Game #10 | 156 | 262 | Kevin | M | 106 |
15 | Game #11 | 205 | 171 | M | Kevin | 34 |
16 | Game #12 (2) | 186 | 203 | Kevin | M | 17 |
17 | | | | | | |
+-----------------------------+-------+------+--------+--------+------------+
Where there's space at the end of the board to add scores for future games.
How do I count how many times the player who goes first wins? In this case it should be 3: D4 = E4, D6 = E6, D12 = E12. Is this possible to do in a single formula? And I'd like to make adding future game scores "just work" with this as well.
Here, first is {K;K;K;K;M;K;M;M;K;M;M;K;M;K}
And winner is {M;K;M;K;K;M;K;K;M;M;K;M;K;M}
I tried =COUNTIF($E$3:$E, $D$3:$D), but this gives me 7, which I presume is the same as =COUNTIF($E$3:$E, $D$3), without the ranged criteria.
Other ranged criteria questions didn't seem to focus on this 1:1 necessity (or maybe I don't know how to word it).
Here's what I used:
=SUMPRODUCT(D3:D=E3:E, E3:E<>"")
Let's break it down.
D3:D=E3:E (also expressible as EQ(D3:D, E3:E)) - equality. I tried to figure out the concept of testing equality of ranges, but the best thing I could find was Microsoft's tutorial on array formulas. What I can say is that if you just put =D3:D=E3:E in your Google sheet, it will return only one of the results--the one that matches the row. It requires =ArrayFormula(D3:D=E3:E) to return the whole array of equality results.
SUMPRODUCT - Sums the products of corresponding array elements across multiple arrays. For example, SUMPRODUCT({1,3}, {2,4}) = 1*2 + 3*4 = 14. If used with one array, it just sums the array's values. TRUE=1 and FALSE=0, so when applied to the array formula above, it counts how many times D3:D=E3:E is true. Ranges work as arrays, so maybe that's why wrapping the equality with ArrayFormula(...) isn't necessary.
E3:E<>"" - Another array formula, testing whether the E cell is not empty (<> is the "not equals" sign). Because I want this to work automatically for any new entries, and D3:D=E3:E evaluates to true for empty rows (empty=empty), the empty rows have to be excluded. Multiplying these two array formulas together is effectively an AND operator--"sum this if Dn=En AND En is not empty". To convince you, here are the truth tables:
+-----+---+---+ +------+---+---+
| AND | T | F | | MULT | 1 | 0 |
+-----+---+---+ +------+---+---+
| T | T | F | | 1 | 1 | 0 |
| F | F | F | | 0 | 0 | 0 |
+-----+---+---+ +------+---+---+
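The same counting logic can be checked outside the spreadsheet. This is a small Python sketch, not a Sheets feature: the lists reuse the first and winner values from the question, and the two empty strings appended at the end are an assumption standing in for blank rows reserved for future games.

```python
# Columns D (first player) and E (winner) from the scoreboard, followed by
# two blank rows that have not been filled in yet.
first = list("KKKKMKMMKMMKMK") + ["", ""]
winner = list("MKMKKMKKMMKMKM") + ["", ""]

# Equality alone also counts the blank rows, because "" == "" is true.
naive = sum(f == w for f, w in zip(first, winner))

# Multiplying the two booleans acts as AND, as in
# SUMPRODUCT(D3:D=E3:E, E3:E<>"").
first_player_wins = sum((f == w) * (w != "") for f, w in zip(first, winner))

print(naive, first_player_wins)
```

The difference between the two numbers is exactly the number of blank rows, which is why the second condition is needed for open-ended ranges.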

Calculate a bunch of data to display on stacked bar

I'm struggling with creating my first chart.
I have a dataset of ordinal-scaled data from a survey.
It has several questions, each with possible answers from 1 to 5.
So I have around 110 answers from different people, which I want to collect and show in a stacked bar chart.
The data looks like this:
| taste | region | brand | price |
| 1 | 3 | 4 | 2 |
| 1 | 1 | 5 | 1 |
| 1 | 3 | 4 | 3 |
| 2 | 2 | 5 | 1 |
| 1 | 1 | 4 | 5 |
| 5 | 3 | 5 | 2 |
| 1 | 5 | 5 | 2 |
| 2 | 4 | 1 | 3 |
| 1 | 3 | 5 | 4 |
| 1 | 4 | 4 | 5 |
...
To display that in a stacked bar chart, I need to sum it up.
So I know that in the end it needs to be calculated like this:
| | taste | region | brand | price |
| 1 | 60 | 20 | 32 | 12 |
| 2 | 23 | 32 | 54 | 22 |
| 3 | 24 | 66 | 36 | 65 |
| 4 | 55 | 68 | 28 | 54 |
| 5 | 10 | 10 | 12 | 22 |
(this is just to demonstrate; the values are not correct)
Or maybe there is already a function for this in SPSS, but I have no idea where or how.
Any advice on how to do that?
I can't think of a single command but there are many ways to get to where you want. Here's one:
first recreating your sample data:
data list list/ taste region brand price .
begin data
1 3 4 2
1 1 5 1
1 3 4 3
2 2 5 1
1 1 4 5
5 3 5 2
1 5 5 2
2 4 1 3
1 3 5 4
1 4 4 5
end data.
Now counting the values for each row:
vector t(5) r(5) b(5) p(5).
* the vector command is only necessary so the new variables will be ordered conveniently for the following parts.
do repeat vl= 1 to 5/t=t1 to t5/r=r1 to r5/b=b1 to b5/p=p1 to p5.
compute t=(taste=vl).
compute r=(region=vl).
compute b=(brand=vl).
compute p=(price=vl).
end repeat.
Now we can aggregate and restructure to arrive at the exact data structure you specified:
aggregate /outfile=* /break= /t1 to t5 r1 to r5 b1 to b5 p1 to p5 = sum(t1 to p5).
varstocases /make taste from t1 to t5 /make region from r1 to r5
/make brand from b1 to b5/ make price from p1 to p5/index=val(taste).
compute val = char.substr(val,2,1).
alter type val(f1).
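To check what the restructured table should contain, the same counts can be computed outside SPSS. The following Python sketch is only a cross-check under that assumption, not a replacement for the syntax above; it uses the ten sample rows from the question, and the variable names are illustrative.

```python
from collections import Counter

columns = ["taste", "region", "brand", "price"]
rows = [
    (1, 3, 4, 2),
    (1, 1, 5, 1),
    (1, 3, 4, 3),
    (2, 2, 5, 1),
    (1, 1, 4, 5),
    (5, 3, 5, 2),
    (1, 5, 5, 2),
    (2, 4, 1, 3),
    (1, 3, 5, 4),
    (1, 4, 4, 5),
]

# One Counter per question: how often each answer (1 to 5) was given.
counts = {name: Counter(row[i] for row in rows) for i, name in enumerate(columns)}

# One line per answer value, one column per question, as in the target table.
table = [[counts[name][value] for name in columns] for value in range(1, 6)]

print("val", *columns)
for value, line in zip(range(1, 6), table):
    print(value, *line)
```

Each column of the resulting table sums to the number of respondents, which is a quick sanity check on the aggregated data.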

SQL - several columns to one

Is it possible to write an SQL query that turns several columns into a single one?
For example, this is the current database structure:
**product_ID | month_A | month_B | month_C**
AAAAA | 15 | 18 | 16
BBBBB | 20 | 21 | 26
CCCCC | 40 | 48 | 41
I would like to change it to the following, so that I can make better use of pivot tables in Excel:
**product_ID | sales_qt| month**
AAAAA | 15 | A
AAAAA | 18 | B
AAAAA | 16 | C
BBBBB | 20 | A
BBBBB | 21 | B
BBBBB | 26 | C
CCCCC | 40 | A
CCCCC | 48 | B
CCCCC | 41 | C
Best regards!!!
I am not intimately familiar with Pervasive. But a standard SQL method would be:
select product_ID, month_A as sales_qt, 'A' as month
from t
union all
select product_ID, month_B as sales_qt, 'B' as month
from t
union all
select product_ID, month_C as sales_qt, 'C' as month
from t;
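To try the query out, here is a small sketch that runs it from Python against an in-memory SQLite database. SQLite only stands in for Pervasive here, so this says nothing about Pervasive-specific behaviour. The table name `t` follows the answer, the data is taken from the question, and the `order by` clause is an addition to make the row order predictable.

```python
import sqlite3

# In-memory SQLite database used only as a stand-in for the real one.
conn = sqlite3.connect(":memory:")
conn.execute(
    "create table t (product_ID text, month_A integer, month_B integer, month_C integer)"
)
conn.executemany(
    "insert into t values (?, ?, ?, ?)",
    [("AAAAA", 15, 18, 16), ("BBBBB", 20, 21, 26), ("CCCCC", 40, 48, 41)],
)

# One SELECT per month column, stacked with UNION ALL.
query = """
select product_ID, month_A as sales_qt, 'A' as month from t
union all
select product_ID, month_B as sales_qt, 'B' as month from t
union all
select product_ID, month_C as sales_qt, 'C' as month from t
order by product_ID, month
"""

long_rows = conn.execute(query).fetchall()
for row in long_rows:
    print(row)
```

With more month columns the query grows by one `select` per column, so for wide tables it may be worth generating the statement programmatically.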

writing a custom template/parser/filter for use in syslog-ng

My application generates logs and sends them to syslog-ng.
I want to write a custom template/parser/filter for use in syslog-ng to correctly store the fields in tables of an SQLite database (MyDatabase).
This is the legend of my log:
unique-record-id usename date Quantity BOQ possible,item,profiles Count Vendor applicable,vendor,categories known,request,types vendor_code credit
All these 12 fields are tab separated, and the parser must store them into 12 columns of table MyTable1 in MyDatabase.
However, some of the fields (the 6th, 9th, and 10th) also contain "sub-fields" as comma-separated values.
The number of values within each of these sub-fields is variable and can change from one log line to the next.
I need these fields to be stored in their own separate tables:
MyItem_type, MyVendor_groups, MyReqs
These "secondary" tables have 3 columns and record the Unique-Record-ID and Quantity against each occurrence of a sub-field value in the log.
So the schema in MyItem_type table looks like:
Unique-Record-ID | item_profile | Quantity
Similarly the schema of MyVendor_groups looks like:
Unique-Record-ID | vendor_category | Quantity
and the schema of MyReqs looks like:
Unique-Record-ID | req_type | Quantity
Consider these sample lines from the log:
unique-record-id usename date Quantity BOQ possible,item,profiles Count Vendor applicable,vendor,categories known,request,types vendor_code credit
234.44.tfhj Sam 22-03-2016 22 prod1 cat1,cat22,cat36,cat44 66 ven1 t1,t33,t43,t49 req1,req2,req3,req4 blue 64.22
234.45.tfhj Alex 23-03-2016 100 prod2 cat10,cat36,cat42 104 ven1 t22,t45 req1,req2,req33,req5 red 66
234.44.tfhj Vikas 24-03-2016 88 prod1 cat101,cat316,cat43 22 ven2 t22,t43 req1,req23,req3,req6 red 77.12
234.47.tfhj Jane 25-03-2016 22 prod7 cat10,cat36,cat44 43 ven3 t77 req1,req24,req3,req7 green 45.89
234.48.tfhj John 26-03-2016 97 serv3 cat101,cat36,cat45 69 ven5 t1 req11,req2,req3,req8 orange 33.04
234.49.tfhj Ruby 27-03-2016 85 prod58 cat10,cat38,cat46 88 ven9 t33,t55,t99 req1,req24,req3,req9 white 46.04
234.50.tfhj Ahmed 28-03-2016 44 serv7 cat110,cat36,cat47 34 ven11 t22,t43,t77 req1,req20,req3,req10 red 43
My parser should store the above log into MyDatabase.Mytable1 as:
unique-record-id | usename | date | Quantity | BOQ | item_profile | Count | Vendor | vendor_category | req_type | vendor_code | credit
234.44.tfhj | Sam | 22-03-2016 | 22 | prod1 | cat1,cat22,cat36,cat44 | 66 | ven1 | t1,t33,t43,t49 | req1,req2,req3,req4 | blue | 64.22
234.45.tfhj | Alex | 23-03-2016 | 100 | prod2 | cat10,cat36,cat42 | 104 | ven1 | t22,t45 | req1,req2,req33,req5 | red | 66
234.44.tfhj | Vikas | 24-03-2016 | 88 | prod1 | cat101,cat316,cat43 | 22 | ven2 | t22,t43 | req1,req23,req3,req6 | red | 77.12
234.47.tfhj | Jane | 25-03-2016 | 22 | prod7 | cat10,cat36,cat44 | 43 | ven3 | t77 | req1,req24,req3,req7 | green | 45.89
234.48.tfhj | John | 26-03-2016 | 97 | serv3 | cat101,cat36,cat45 | 69 | ven5 | t1 | req11,req2,req3,req8 | orange | 33.04
234.49.tfhj | Ruby | 27-03-2016 | 85 | prod58 | cat10,cat38,cat46 | 88 | ven9 | t33,t55,t99 | req1,req24,req3,req9 | white | 46.04
234.50.tfhj | Ahmed | 28-03-2016 | 44 | serv7 | cat110,cat36,cat47 | 34 | ven11 | t22,t43,t77 | req1,req20,req3,req10 | red | 43
And also parse the "possible,item,profiles" to record into MyDatabase.MyItem_type as:
Unique-Record-ID | item_profile | Quantity
234.44.tfhj | cat1 | 22
234.44.tfhj | cat22 | 22
234.44.tfhj | cat36 | 22
234.44.tfhj | cat44 | 22
234.45.tfhj | cat10 | 100
234.45.tfhj | cat36 | 100
234.45.tfhj | cat42 | 100
234.44.tfhj | cat101 | 88
234.44.tfhj | cat316 | 88
234.44.tfhj | cat43 | 88
234.47.tfhj | cat10 | 22
234.47.tfhj | cat36 | 22
234.47.tfhj | cat44 | 22
234.48.tfhj | cat101 | 97
234.48.tfhj | cat36 | 97
234.48.tfhj | cat45 | 97
234.49.tfhj | cat10 | 85
234.49.tfhj | cat38 | 85
234.49.tfhj | cat46 | 85
234.50.tfhj | cat110 | 44
234.50.tfhj | cat36 | 44
234.50.tfhj | cat47 | 44
We also need to similarly parse "applicable,vendor,categories" and store them into MyDatabase.MyVendor_groups, and parse "known,request,types" for storage into MyDatabase.MyReqs.
The first column of MyDatabase.MyItem_type, MyDatabase.MyVendor_groups and MyDatabase.MyReqs will always be the Unique-Record-ID that was seen in the log.
So yes, in these three tables this column does not contain unique data, just like the other columns.
The third column will always be the Quantity that was witnessed in the log.
I know a bit of PCRE, but it is the use of nested parsers in syslog-ng that is completely confusing me.
The syslog-ng documentation suggests this is possible, but I simply failed to find a good example. If any kind hacker around here has a reference or sample to share, it would be very useful.
Thanks in advance.
I think all of these can be done using the csv-parser a few times.
First, use a csv-parser with the tab delimiter("\t") to split the initial fields into named columns. Use this parser on the entire message.
Then you'll have to parse the fields that have subfields using other instances of the csv-parser on the columns that need further parsing.
You can find some examples at https://www.balabit.com/sites/default/files/documents/syslog-ng-ose-latest-guides/en/syslog-ng-ose-guide-admin/html/csv-parser.html and https://www.balabit.com/sites/default/files/documents/syslog-ng-ose-latest-guides/en/syslog-ng-ose-guide-admin/html/reference-parsers-csv.html
(It is possible that you can get it done with a single parser if you specify both the tab and the comma as delimiters, but that might not work for the fields with a variable number of subfields.)
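The two-stage splitting described above can be sketched in plain Python to make the intended result concrete. This is not syslog-ng configuration, only an illustration of the logic: split on tabs first, then split the three multi-valued columns on commas. The column names in `FIELDS` and the function `parse_line` are assumptions made for this example; they are loosely based on the legend in the question.

```python
# Hypothetical column names for the 12 tab-separated fields.
FIELDS = [
    "unique_record_id", "usename", "date", "quantity", "boq",
    "item_profile", "count", "vendor", "vendor_category",
    "req_type", "vendor_code", "credit",
]
# The 6th, 9th and 10th fields hold comma-separated sub-fields.
SUBFIELD_COLUMNS = ["item_profile", "vendor_category", "req_type"]


def parse_line(line):
    """Split one log line into a main record and per-table secondary rows."""
    # Stage 1: split the whole message on tabs into named columns.
    values = line.rstrip("\n").split("\t")
    if len(values) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(values)}")
    record = dict(zip(FIELDS, values))

    # Stage 2: split each multi-valued column on commas and pair every
    # sub-value with the record id and the quantity.
    secondary = {
        column: [
            (record["unique_record_id"], item, record["quantity"])
            for item in record[column].split(",")
        ]
        for column in SUBFIELD_COLUMNS
    }
    return record, secondary


sample = "\t".join([
    "234.44.tfhj", "Sam", "22-03-2016", "22", "prod1",
    "cat1,cat22,cat36,cat44", "66", "ven1", "t1,t33,t43,t49",
    "req1,req2,req3,req4", "blue", "64.22",
])
record, secondary = parse_line(sample)
for row in secondary["item_profile"]:
    print(row)
```

The rows in `secondary["item_profile"]` have the shape wanted for MyItem_type, and the other two keys map to MyVendor_groups and MyReqs in the same way.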

How to do a goedel numbering for bit strings?

I'm looking for a concept for doing a Gödel numbering for bit strings, i.e. for arbitrary binary data.
Approach 1 (failing): Simply interpret the binary data as data of an unsigned integer.
This fails, because e.g. the two different strings "01" and "001" both represent the same integer 1.
Is there a standard way of doing this? Is 0 usually included or excluded from the Gödel numbering?
The original Gödel numbering used prime numbers and unique encoding of symbols. If you want to do it for strings consisting of "0" and "1", you need positive codes for "0" (say 1) and "1" (say 2). Then numbering of "01" is
2^1 * 3^2
while numbering of "001" is
2^1 * 3^1 * 5^2
For longer strings, use the next prime numbers. Note, however, that Gödel's goals for the numbering did not include any practical considerations; he simply needed the numbering as a tool in the proof of his theorem. In practice, even for fairly short strings you will exceed the range of integers in your language, so you need either a language with arbitrarily large integers built in (like Scheme) or a bignum library in a language that lacks them.
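As a rough sketch of the prime-based scheme, here is a small Python version (Python integers are arbitrary precision, so overflow is not an issue). The symbol codes 1 for "0" and 2 for "1" are the ones suggested above; the helper names `primes` and `goedel_number` are made up for the example.

```python
from itertools import count

# Positive codes for the two symbols, as suggested above.
SYMBOL_CODES = {"0": 1, "1": 2}


def primes():
    """Yield 2, 3, 5, ... by trial division (fine for short strings)."""
    found = []
    for candidate in count(2):
        if all(candidate % p for p in found):
            found.append(candidate)
            yield candidate


def goedel_number(bits):
    """Multiply the i-th prime raised to the code of the i-th symbol."""
    n = 1
    for prime, symbol in zip(primes(), bits):
        n *= prime ** SYMBOL_CODES[symbol]
    return n


print(goedel_number("01"))   # 2^1 * 3^2
print(goedel_number("001"))  # 2^1 * 3^1 * 5^2
```

The results grow very quickly with the length of the string, which is the practical drawback mentioned above.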
A super simple solution is to prepend a 1 to the binary data and then interpret the result as an unsigned integer value. This way, no 0-digits get lost at the left side of the bit string.
An illustration of how well this works:
One obvious way to order bit strings is to order them first by length and then lexicographically:
+------------+
| bit string |
+------------+
| ε |
| 0 |
| 1 |
| 00 |
| 01 |
| 10 |
| 11 |
| 000 |
| 001 |
| 010 |
| 011 |
| 100 |
| 101 |
| 110 |
| ... |
+------------+
(ε denotes the empty string with no digits.)
Now we add an index number n to this table, starting with 1, and then look at the binary representation of the index number n. We will make a nice discovery there:
+------------+--------------+-------------+
| bit string | n in decimal | n in binary |
+------------+--------------+-------------+
| ε | 1 | 1 |
| 0 | 2 | 10 |
| 1 | 3 | 11 |
| 00 | 4 | 100 |
| 01 | 5 | 101 |
| 10 | 6 | 110 |
| 11 | 7 | 111 |
| 000 | 8 | 1000 |
| 001 | 9 | 1001 |
| 010 | 10 | 1010 |
| 011 | 11 | 1011 |
| 100 | 12 | 1100 |
| 101 | 13 | 1101 |
| 110 | 14 | 1110 |
| ... | ... | ... |
+------------+--------------+-------------+
This works out surprisingly well, because the binary representation of n (the index of each bit string when ordering in a very obvious way) is nothing else than a 1 prepended to the original bit string and then the whole thing interpreted as an unsigned integral value.
If you prefer a 0-based Goedel numbering, then subtract 1 from the resulting integer value.
Conversion formulas in pseudo code:
// for starting with 1
n_base1 = integer(prepend1(s))
s = removeFirstDigit(bitString(n_base1))
// for starting with 0
n_base0 = integer(prepend1(s)) - 1
s = removeFirstDigit(bitString(n_base0 + 1))
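The pseudo code above translates fairly directly into Python. The following is a minimal sketch; the function names `bits_to_number` and `number_to_bits` and the `zero_based` flag are choices made for this example, and bit strings are represented as Python strings of "0" and "1".

```python
def bits_to_number(bits, zero_based=False):
    """Prepend a 1 to the bit string and read the result as a binary integer."""
    if any(ch not in "01" for ch in bits):
        raise ValueError("bit string may only contain '0' and '1'")
    n = int("1" + bits, 2)
    return n - 1 if zero_based else n


def number_to_bits(n, zero_based=False):
    """Inverse of bits_to_number: drop the leading 1 of the binary form."""
    if zero_based:
        n += 1
    if n < 1:
        raise ValueError("number is outside the range of the numbering")
    # bin(n) looks like '0b1...'; skip the '0b' prefix and the leading 1.
    return bin(n)[3:]


for s in ["", "0", "1", "00", "001", "110"]:
    n = bits_to_number(s)
    print(repr(s), n, repr(number_to_bits(n)))
```

Because every positive integer has exactly one binary representation starting with 1, the mapping is a bijection between bit strings and the positive integers (or the non-negative integers in the 0-based variant).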

Resources