How to form the field section in an XFD file - COBOL

I'm having trouble building the field section of an .xfd file after analysing the data file with the command "vutil32.exe -i -kx pogl.dad". I hope somebody can help me work out how to build the field structure highlighted below. I've uploaded a sample of my file, "pglc.dad", and I hope someone with expert knowledge can guide me in building the .xfd file from it. Thanks.
Result from vutil32.exe:
file size: 250880
record size (min/max): 121/1024 compressed(80%)
# of keys: 4
key size: 16:02 31:03 56:03 15
key offset: 0 0 0 1
duplicates okay: N N N N
block size: 512
blocks per granule: 1
tree height: 4/2/2.7
# of nodes: 200
# of deleted nodes: 1
total node space: 101800
node space used: 67463 (66%)
user count: 0
Key Dups Seg-1 Seg-2 Seg-3 Seg-4 Seg-5 Seg-6
(sz/of) (sz/of) (sz/of) (sz/of) (sz/of) (sz/of)
0 N 1/0 15/1
1 N 1/0 15/66 15/1
2 N 1/0 40/81 15/1
3 N 15/1
Here is the .xfd file I have constructed so far:
XFD,02,PGLC,PGLC
00300,00041,004
1,0,013,00000
01
PGSTAT
3,0,004,00004,020,00021,004,00000
3
PGSTAT
PGDESC
PGLINE
3,0,004,00004,008,00013,004,00000
03
PGSTAT
PGDESC
PGLINE
1,0,012,00021
01
PGSTAT
000
0150,00150,00003 =================>> How can I form this field section?
00000,00013,16,00016,+00,000,000,PGSTAT
00000,00001,16,00001,+00,000,000,PGDESC
00001,00015,16,00015,+00,000,000,PGLINE
Here is the link to my pglc.dad: http://files.engineering.com/getfile.aspx?folder=080fdad6-b1d5-4a37-8dd0-b89f9a985c69&file=PGLC.DAD
Thanks in advance to anyone who can help.

I know the XFD format intimately as I have written a couple of parsers of this file format in both Perl and Cobol.
Having said that, I would strongly recommend that you do not try to hand craft an XFD file from scratch.
If you have an AcuCobol (MicroFocus) compiler, and the source of the file's SELECT and FD definitions, then you can create a very small Cobol program that has just the SELECT and FD definitions and then compile the program using:
ccbl32.exe -Fx <program>
That will create an XFD file for the indexed file definition. Note, you can specify a directory for the created XFD file using the -Fo <directory> option.
If you don't have the source of the file definitions, then you are just going to be guessing what and where the fields are. The indexed file by itself will not tell you that information. I can see from extracting the data in your file (using the vutil -e option) that the file contains binary data as well as text, so without knowing exactly what PICture those fields are (COMP-?) you will be struggling to figure out the structure of those fields.

Related

How can I generate a single .avro file for large flat file with 30MB+ data

Currently two Avro files are generated for a 10 KB file. If I do the same with my actual file (30 MB+), I will get n files.
So I need a solution that generates only one or two .avro files even when the source file is large.
Also, is there any way to avoid declaring the column names manually?
Current approach:
spark-shell --packages com.databricks:spark-csv_2.10:1.5.0,com.databricks:spark-avro_2.10:2.0.1
import org.apache.spark.sql.types.{StructType, StructField, StringType}
// Manual schema declaration of the 'ind' and 'co' column names and types
val customSchema = StructType(Array(
StructField("ind", StringType, true),
StructField("co", StringType, true)))
val df = sqlContext.read.format("com.databricks.spark.csv").option("comment", "\"").option("quote", "|").schema(customSchema).load("/tmp/file.txt")
df.write.format("com.databricks.spark.avro").save("/tmp/avroout")
// Note: /tmp/file.txt is input file/dir, and /tmp/avroout is the output dir
Spark writes one output file per partition of the DataFrame, so set the number of partitions before writing the data as Avro (or any other format). Use the DataFrame's repartition or coalesce function:
df.coalesce(1).write.format("com.databricks.spark.avro").save("/tmp/avroout")
This writes only one data file in "/tmp/avroout".
Hope this helps!

How can I read a directory on iso9660 from the path table when the table does not include size?

According to the spec for the structure of an iso9660 / ecma119, the path table contains records for each path, including the location of the starting sector and its name, but not its size. I can find the directory entry, but don't know how many sectors (normally 2048 bytes) it contains. Is it one? Two? Six?
If I "walk the directory tree", each directory entry includes the referenced location and size, so I can know how many bytes (essentially, how many sectors, since a directory must use entire sectors) to read. However, the path table only includes the starting location, and not the size, leaving me not knowing how many bytes to read.
In an example iso I have (ubuntu-18.04.1-live-server-amd64.iso fwiw), the root directory entry in the primary volume descriptor shows:
Root Directory:
Directory Record Length: 34
Extended Attribute Length: 0
Location of Extent: 20 $00000014 00:00:20
Data Length: 2048 $00000800
Recording Date and Time: 23:39:04 07/25/2018 GMT 0
File Flags: $02 visible regular dir non-record no-perms single-extent
File Unit Size: 0
Interleave Gap Size: 0
Volume Sequence Number: 1
File Identifier: . (current directory)
Since it says the Data Length is 2048, I know to read just one sector.
However, the root directory entry in the path table shows:
Path Record Length: 10 $0A
Extended Attribute Length: 0 $00
Location of Extent: 20 $00000014 00:00:20
Parent Directory Number: 1 $0001
File Identifier: . (current directory)
It also points to sector 20, but doesn't tell me how many sectors it uses, leaving me guessing.
Yes, unused bytes in a sector should be all 0x00, so if I read in a sector, read records, and then come to one whose first byte (length) is 0x00, then I know I have reached the end of records, but that has three issues:
If that were the canonical way, why bother including size in the directory entry?
If it includes 2 or 3 sectors, it is more efficient for me to read them all at once than one at a time.
If I have a directory whose records precisely fill a sector, without some size attribute, I don't know if the next sector is supposed to be read as an entry, or if the directory ended here.
Basically, I know how to read the ordered path table to get the directory entry, but don't know how to use that to know how many sectors to read for the directory itself. I could, in theory, read the parent to get the entry for this directory to know the size, but that adds a seek and read and pretty much defeats the purpose of the path table.
Ah, I figured it out. Because a directory's extent always starts with a directory entry for the directory itself, and the data length always occupies bytes 10-17 of that entry (10-13 for little-endian, 14-17 for big-endian), you can just read bytes 10-17 from the beginning of the sector and get the size. Still not as efficient as putting it in the path table itself (no idea why they did not), but it works.
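As a rough illustration of that approach, here is a minimal Python sketch. It assumes you have already read the first sector of the directory (the sector the path table points to) into a bytes object; the function names and the 2048-byte sector size are assumptions for the example, not part of any library.

```python
import struct

SECTOR_SIZE = 2048  # assumed logical sector size, as in the question


def directory_extent_size(first_sector: bytes) -> int:
    """Return the directory's data length from its own "." record.

    Offsets are 0-based from the start of the directory's first sector.
    """
    record_len = first_sector[0]
    if record_len < 18:
        raise ValueError("first record too short to hold a data length")
    (size_le,) = struct.unpack_from("<I", first_sector, 10)  # bytes 10-13
    (size_be,) = struct.unpack_from(">I", first_sector, 14)  # bytes 14-17
    if size_le != size_be:
        raise ValueError("both-endian data length fields disagree")
    return size_le


def sectors_to_read(first_sector: bytes) -> int:
    """Number of whole sectors the directory occupies."""
    size = directory_extent_size(first_sector)
    return -(-size // SECTOR_SIZE)  # ceiling division
```

Comparing the two encodings is optional, but it is a cheap sanity check that the sector really starts with a directory record. If the result is more than one sector, the remaining sectors can be fetched in a single read.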

Modify values programmatically SPSS

I have a file with more than 250 variables and more than 100 cases. Some of these variables have a misplaced decimal point (20445.12 should be 2.044512).
I want to correct these data programmatically. I found a possible way in a Visual Basic editor provided by SPSS (I show you a screen shot below), but I have no knowledge of it at all.
How can I select a range of cells in this language?
How can I store a cell once its data has been modified?
--- EDITED NEW DATA ----
Thank you for your fast reply.
The problem now is the number of digits the number has. For example, the erroneous data could have the following formats:
Case A) 43998 (five digits) ---> 4.3998 as correct value.
Case B) 4399 (four digits) ---> 4.3990 as correct value, but parsed as 0.4399 because the trailing 0 was removed when the file was created.
Is there any way, like:
IF (NUM < 10000) THEN NUM = NUM / 1000 ELSE NUM = NUM / 10000
Or something like IF (Number_of_digits(NUM)) THEN ...
Thank you.
There's no need for a VB script. Do it this way:
Open a syntax window and paste the following code:
do repeat vr=var1 var2 var3 var4.
compute vr=vr/10000.
end repeat.
save outfile="filepath\My corrected data.sav".
exe.
Replace var1 var2 var3 var4 with the names of the actual variables you need to change. For variables that are contiguous in the file you may use var1 to var4.
Replace vr=vr/10000 with whatever mathematical calculation you would like to use to correct the data.
Replace "filepath\My corrected data.sav" with your path and file name.
WARNING: this syntax will change the data in your file. You should make sure to create a backup of your original in addition to saving the corrected data to a new file.
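For the digit-count cases raised in the edit to the question, the rule itself can be sketched outside SPSS. The Python below is only an illustration of the logic, not SPSS syntax, and it rests on one loud assumption: every correct value has exactly one digit before the decimal point, as in all three examples in the question. The function name is made up for the example.

```python
def fix_decimal_point(value: float) -> float:
    """Rescale so that exactly one digit remains before the decimal point.

    Assumption: every correct value lies between 1 and 10 in magnitude.
    """
    whole = abs(int(value))
    if whole == 0:
        return value  # already below 1; nothing to rescale
    digits = len(str(whole))  # number of digits in the integer part
    return value / 10 ** (digits - 1)
```

With this rule a five-digit 43998 is divided by 10000 and a four-digit 4399 by 1000, which matches Case A and Case B. The same condition on the size of the number would then replace the fixed division by 10000 in the compute line.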

Exception in thread "main" java.lang.IllegalArgumentException: Wrong number of attributes in the string + Mahout

I am trying to create a file descriptor using the command:
$ MAHOUT_HOME/core/target/mahout-core--job.jar org.apache.mahout.classifier.df.tools.Describe -p testdata/KDDTrain+.arff -f testdata/KDDTrain+.info -d N 3 C 2 N C 4 N C 8 N 2 C 19 N L
from the link:
https://mahout.apache.org/users/classification/partial-implementation.html
on my data file. Whatever file I use, and however I change the attribute descriptor string N 3 C 2 N C 4 N C 8 N 2 C 19 N L,
I get the following exception:
Exception in thread "main" java.lang.IllegalArgumentException: Wrong number of attributes in the string
Please help!
There are a few reasons why you might get an error like that:
Wrong descriptor: Included for the sake of completeness, since you have probably checked this already. The descriptor you gave may not match the data. Re-check the number and type of columns, then give them correctly in the descriptor.
Bad separator: Re-check the delimiter used in the data. Some records may contain a misplaced delimiter.
Special characters: In my few experiments, I noticed that Mahout does not cope well with certain special characters, or with data containing characters from languages other than English (unless, of course, you tweak the code). Make sure you have a way of handling them and you should be good to go.
All of this is just to create a descriptor for the data. All the best.
Old question, but I found a more specific answer after landing here with the same problem.
In this particular case, the problem I found was that the format of data file (from http://nsl.cs.unb.ca/NSL-KDD/) seems to have changed from the example as listed on the Mahout Random Forest example page.
The example lists a line format with the specifier
N 3 C 2 N C 4 N C 8 N 2 C 19 N L
but there's an extra element at the end of the lines; for example:
13,tcp,telnet,SF,118,2425,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,1,1,0.00,0.00,0.00,0.00,1.00,0.00,0.00,26,10,0.38,0.12,0.04,0.00,0.00,0.00,0.12,0.30,guess_passwd,2
which has one more field. Adding another numeric field (N) to the end of the specifier fixed the error:
N 3 C 2 N C 4 N C 8 N 2 C 19 N L N
I had luck using just the plain .txt file format instead of the .arff file format.
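A quick way to check for this kind of mismatch before running Describe is to expand the descriptor and count the fields in a data line. The Python sketch below assumes the descriptor semantics used in the example (a number repeats the attribute type that follows it); the helper name is hypothetical and this is not Mahout's own code.

```python
def expand_descriptor(descriptor: str) -> list:
    """Expand e.g. "N 3 C" into ["N", "C", "C", "C"]."""
    expanded = []
    repeat = 1
    for token in descriptor.split():
        if token.isdigit():
            repeat = int(token)  # applies to the next attribute type
        else:
            expanded.extend([token] * repeat)
            repeat = 1
    return expanded


DESCRIPTOR = "N 3 C 2 N C 4 N C 8 N 2 C 19 N L"
SAMPLE_LINE = (
    "13,tcp,telnet,SF,118,2425,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,1,1,"
    "0.00,0.00,0.00,0.00,1.00,0.00,0.00,26,10,0.38,0.12,0.04,0.00,"
    "0.00,0.00,0.12,0.30,guess_passwd,2"
)

described = len(expand_descriptor(DESCRIPTOR))
actual = len(SAMPLE_LINE.split(","))
print(f"descriptor covers {described} attributes, line has {actual} fields")
```

For the sample line above the descriptor covers 42 attributes while the line has 43 fields, which is the one-field difference that appending N accounts for.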

How to form XFD from cobol Data

I am trying to redefine the XFD file structure based on the file settings below.
Analysis result:
Max Record length 300
Min record length 61
No of records 466
Blocking factor 1
Preallocation amount 0
Extension amount 1
Compression factor 80
Encrypted ? No
Number of keys 4
Primary key has 1 segments
key size 13 offset 0
key 02 has 3 segments
Duplicates Are allowed
Key size 4 offset 4
Key size 40 offset 21
Key size 4 offset 0
Key 03 has 3 segments
Duplicates Are allowed
Key size 4 offset 4
Key size 8 offset 13
Key size 4 offset 0
Key04 has 1 segment
Duplicates Are allowed
Key size 10 offset 21
Here is an existing XFD file that already fails to retrieve data through AcuODBC.
I am linking pote.XFD to the pote Acu database file through AcuODBC.
XFD,02,POTE,POTE
00026,00018,002
1,0,008,00000
02
INTIND-UNIQ
INTIND-OCC
1,1,010,00008
01
IND16
000
0004,00004
00000,00004,12,00009,+00,000,000,INTIND-UNIQ
00004,00004,12,00009,+00,000,000,INTIND-OCC
00008,00010,16,00010,+00,000,000,IND16
00018,00008,16,00008,+00,000,000,TERM20
My question is: how should I change my pote.XFD, based on the analysis given above, to form a correct XFD structure?
I know there are four keys in this COBOL file, but I still don't know how to configure the structure manually from the analysis information.
Below is a reference guide I obtained on how to form a correct XFD structure manually. I hope an expert can explain how to form the correct XFD structure.
# This xfd layout is a generic one suitable for accessing any
# .DAD file. However, it needs to be copied and amended for each
# DAD file that you wish to get access to.
# The simplest scenario is that you copy dad.xfd to a new file
# with the same name as the database you wish to access and extension .XFD
# Then edit this new file and replace the two instances of 'FILE' with the
# filename that you want to access. e.g. if you want to have ODBC access to
# icvc.dad then copy dad.xfd to new file icvc.xfd and change line
# XFD,02,FILE,FILE to be
# XFD,02,ICVC,ICVC
#
# If this doesn't work then the database file you are trying to access has
# probably set different values for search index sizes. The easiest way to
# check this is to run $list for the database that you want to access and
# note down all the key information that it gives. If that is different
# to the key info in this file then you need to modify the xfd file to match
# In the current xfd there are four indexes defined. In all cases the first
# index will be correct and so should the third index. However, the other
# two may need to be modified or removed if not present.
# Index 4 is optional and is not present if the database is rebuilt without
# the fast list option.
# explaining the details of 2nd index. 1 st line consists of 8 values separated
# by commas. The first value of 3 is how many segments the index consists of.
# second value 1 means duplicates allowed (0 means NO DUPS).
# The remaining six fields are three pairs of key size and byte offset, e.g.
# first index segment is 4 bytes long and starts from byte 4, second index
# segment is 20 bytes long and starts from byte 21 etc.
# The second line specifies how many field names there are to follow and lines 3
# to 5 are the three field names as defined lower in this xfd. For instance
# if you look at field D1UNIQ you will see it is defined as starting from byte 0
# and is 4 bytes long. This corresponds to the values entered in the key definition.
#
XFD,02,ICVC,ICVC
00300,00041,004
# [Key Section]
# [1st index]
01,0,013,00000
04
D1UNIQ
D1NAME
D1NAMX
D1OCCU
# [2nd index]
3,1,004,00004,020,00021,004,00000
03
D1NAME
D1TUPP
D1UNIQ
# [3rd index]
3,1,004,00004,008,00013,004,00000
03
D1NAME
D1NUMB
D1UNIQ
# [4th index]
1,1,020,00021
01
D1TUPP
# [Condition Section]
000
# [Field Section]
0015,00015,00016
00000,00013,16,00013,+00,000,999,D1KEY
00000,00004,12,00009,+00,000,000,D1UNIQ
00004,00004,16,00004,+00,000,000,D1NAME
00008,00001,16,00001,+00,000,000,D1NAMX
00009,00004,12,00009,+00,000,000,D1OCCU
00013,00008,11,00018,-06,000,000,D1NUMB
00021,00040,16,00040,+00,000,000,D1TUPP
00061,00001,01,00001,+00,000,000,D1GRAD
00062,00004,12,00008,+00,000,000,D1DLUP
00066,00004,12,00008,+00,000,000,D1TLUP
00070,00004,16,00004,+00,000,000,D1OLUP
00074,00001,16,00001,+00,000,000,D1TYPE
00075,00002,16,00002,+00,000,000,D1FORM
00077,00160,16,00160,+00,000,000,D1TEXT
00237,00001,16,00001,+00,000,000,D1PRIN
00238,00062,16,00062,+00,000,000,D1FILL
First do this:
# This xfd layout is a generic one suitable for accessing any
# .DAD file. However, it needs to be copied and amended for each
# DAD file that you wish to get access to.
# The simplest scenario is that you copy dad.xfd to a new file
# with the same name as the database you wish to access and extension .XFD
# Then edit this new file and replace the two instances of 'FILE' with the
# filename that you want to access. e.g. if you want to have ODBC access to
# icvc.dad then copy dad.xfd to new file icvc.xfd and change line
# XFD,02,FILE,FILE to be
# XFD,02,ICVC,ICVC
If that does not work, follow the instructions for finding out how many keys there are and the values for the second key. If you discover there are only three keys, remove the fourth key from the template. If the values for the second key are different, change them in the [FieldSection].
Get it working before changing anything else.
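To compare the template's key lines with the analysis output, it can help to split a key line into its parts. The Python sketch below follows the layout described in the template's comments (segment count, duplicates flag, then size/offset pairs); the function name is made up for the example and nothing here comes from an Acu tool.

```python
def parse_key_line(line: str):
    """Split an XFD key line into (segment_count, dups_allowed, segments).

    Layout assumed from the template comments:
    segments, dups flag, then one (size, offset) pair per segment.
    """
    values = [int(v) for v in line.strip().split(",")]
    segment_count, dups = values[0], values[1]
    pairs = values[2:]
    if len(pairs) != 2 * segment_count:
        raise ValueError("segment count does not match the size/offset pairs")
    segments = [(pairs[i], pairs[i + 1]) for i in range(0, len(pairs), 2)]
    return segment_count, bool(dups), segments


# Second index from the template, and key 02 as listed in the analysis.
template_key2 = parse_key_line("3,1,004,00004,020,00021,004,00000")[2]
analysis_key2 = [(4, 4), (40, 21), (4, 0)]  # (size, offset) per segment

for seg_no, (tmpl, seen) in enumerate(zip(template_key2, analysis_key2), 1):
    status = "ok" if tmpl == seen else "DIFFERENT"
    print(f"segment {seg_no}: template {tmpl}, analysis {seen} -> {status}")
```

Run against the figures in the question, the second segment of key 02 differs (20 bytes in the template, 40 bytes in the analysis). The same comparison for the fourth index gives size 20 in the template against size 10 in the analysis.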

Resources