I have a large COBOL program that reads a file, checks for IDs in a specific record, processes the file, and then repeats until there are no more files. A few records contain IDs that I do not want to process. I'd like to read the file, read the record, and, if the record variable equals a certain string, skip to the next file without doing anything with the current one. Any suggestions?
I am collecting data in CSPro and exporting it to SPSS. I want to append the new data to the old data, so that I don't end up with the same files I already worked on in my last data.
Is there any syntax to sort that out?
If you are looking for a way to append one SPSS file to another (after exporting your new data to a new file), the syntax is:
add files
/file="path1/filename1.sav"
/file="path2/filename2.sav".
You may run into trouble if the same string variable has different widths in the two files. If so, choose an appropriate width and force it on all the relevant variables in both files before adding them:
get file="path1/filename1.sav".
alter type stringVar1 (a50) stringVar2 (a150).
dataset name fil1.
get file="path2/filename2.sav".
alter type stringVar1 (a50) stringVar2 (a150).
dataset name fil2.
add files /file=fil1 /file=fil2.
execute.
I have an app that will take sales made available to vendors at Whole Foods and process the daily sales data by store and item. All the parent information is stored in one downloaded CSV with about 10,000 lines per month.
The importing process checks for new stores before importing the sale information.
I don't know how to time processes in Ruby on Rails, but I was wondering whether it would be faster to process one line at a time into each table, or to process the whole file for one table (stores) and then again for the other table (sales).
In case it matters: new stores are not added often, though stores might be closed (and the import checks for that as well), so the scan through the stores might only add a few new entries, whereas every row of the CSV is added to the sales.
If this isn't appropriate - I apologize - still working out the kinks of the rules
When processing data with Ruby, memory consumption is the main thing to be concerned about.
For CSV processing in Ruby, the best approach is to read the file line by line:
require "csv"

# Stream the file one row at a time instead of loading it all into memory.
CSV.foreach("data.csv") do |row|
  # do stuff with row (an Array of fields)
end
This way, no matter how many lines the file has, only the current line (plus possibly the previously processed one) is held in memory at a time; the GC collects processed lines as your program runs. Memory use stays close to constant, and parsing tends to be faster too, because the whole file is never loaded and parsed up front.
i was wondering if it would be 'faster' to process one line at a time
to each table or to process the file for one table (stores) and then
to the other table (sales)
I would go with one line at a time to each table.
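As a rough sketch of the one-pass approach, with timing via Ruby's standard Benchmark module: the column names (store_id, item, units), the sample rows and the in-memory known_stores / sales collections are all assumptions standing in for your real CSV layout and tables, so adapt them to your models.

```ruby
require "csv"
require "set"
require "benchmark"

# Hypothetical stand-ins for the two tables (not from the original post).
known_stores = Set.new(["S001"])  # store ids already in the stores table
sales = []                        # rows destined for the sales table

# Sample data in place of the downloaded file; the columns are assumed.
csv_text = <<~ROWS
  store_id,item,units
  S001,granola,4
  S002,granola,2
  S002,kombucha,7
ROWS

new_stores = []
elapsed = Benchmark.realtime do
  # For a real file, CSV.foreach("data.csv", headers: true) streams the same way.
  CSV.new(csv_text, headers: true).each do |row|
    store_id = row["store_id"]
    # Set#add? returns nil when the id is already present, so the store
    # check is one in-memory lookup per row.
    new_stores << store_id if known_stores.add?(store_id)
    sales << { store_id: store_id, item: row["item"], units: row["units"].to_i }
  end
end

puts "new stores: #{new_stores.inspect}, sales rows: #{sales.size}"
puts format("elapsed: %.4f s", elapsed)
```

Wrapping each variant of the import in Benchmark.realtime is a simple way to compare the one-pass and two-pass versions on your own data rather than guessing.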
I want to read a CSV file that uses a line delimiter other than the default one. Each CSV record spans multiple lines, so TextIO.Read does not suffice.
Should I extend FileBasedSource, or is there an existing CsvBasedSource (with a custom line/field delimiter)?
I was looking into the splitIntoBundles() API. XmlSource does not override isSplittable(), so it can be split into bundles, and I was wondering how XmlSource handles this: since the split is based only on desiredBundleSize, it can happen in the middle of a <record>.
That's correct: this will need a custom FileBasedSource implementation. Regarding XmlSource, record and root element names have to be unique (i.e. no other elements can have those names). We'll update the documentation to reflect that, and look at improving this in the future.
I have two files with the same ID variable, so I want to match them with the MATCH FILES command, but I want to keep all the variables from the first file and just some from the other one. The thing is, I don't want to type every variable from the first file, but the KEEP ALL subcommand doesn't seem to work. Here are my syntax and the error message:
GET FILE='C:\Users\Mike\Desktop\Households.sav'.
SORT CASES BY ID (A).
GET FILE='C:\Users\Mike\Desktop\Adults.sav'.
SORT CASES BY ID (A).
MATCH FILES
/FILE=*
/KEEP ALL
/FILE='C:\Users\Mike\Desktop\Households.sav'
/BY ID
/KEEP PV1 PV2 PV3 PV4.
EXECUTE.
SAVE OUTFILE
'C:\Users\Mike\Desktop\matchHouseholdsAdults.sav'.
Subcommands are out of order. All the FILE, TABLE, RENAME and IN subcommands must precede all other kinds of subcommands. Syntax checking begins with the next slash.
Thanks, fellows.
From the Command Syntax Reference (CSR):
DROP and KEEP must follow all FILE, TABLE, and RENAME subcommands.
You can use /DROP after the second FILE subcommand to get rid of unwanted variables in the second file. If there are duplicate names, the content of the first FILE takes priority.
Merging files in SPSS
Hi,
I have a problem merging files. Here's what I need to do: I have selected 200 cases out of 7000 in ArcMap (a GIS program). In the process I lost some of the cases' variable information.
Now I would like to get those variables back into my smaller dataset. I used Data > Merge Files > Add Variables, with ID as the match key, and chose Match cases on key variables in sorted files > Both files provide cases.
This gave me a dataset with all 7000 cases, and the variables that already existed in the first table were not added to the merged dataset. I also tried all the other choices, but none of them gave me the result I wanted, which is the 200 cases with the variables that were lost in the process added back.
So, in a nutshell: how do I merge the info from the variables in dataset A into dataset B without the extra cases from A (only the info for the 200 cases selected out of 7000)?
Offhand:
Create a new variable in the reduced dataset with the value 1.
Match the files.
Sort by the new variable.
Delete all cases that don't have the value 1 on this variable.
I don't see why you are choosing both files provide cases. You want to use the 7000-case file as a keyed table using ID as the key and match it with the 200-case file, which provides all the cases. Assuming that you select all the variables from the large file that you want, this should give you the desired result.
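To make the keyed-table idea concrete, here is a small sketch of the logic in Ruby rather than SPSS syntax, purely as an illustration. The IDs, variable names and values are made up. The large file acts as a lookup table keyed by ID, and only the cases in the small file appear in the result.

```ruby
# Hypothetical data: the 7000-case file reduced to a lookup keyed by ID,
# and the 200-case selection reduced to two cases.
full_by_id = {
  1 => { area: 10.5, zone: "A" },
  2 => { area: 7.2,  zone: "B" },
  3 => { area: 3.9,  zone: "A" },
}
selected = [{ id: 1, name: "x" }, { id: 3, name: "y" }]

# Every selected case is kept exactly once; the large file only
# contributes variables and never adds cases of its own.
merged = selected.map do |kase|
  kase.merge(full_by_id.fetch(kase[:id], {}))
end

merged.each { |row| p row }
```

Case 2 exists only in the large file, so it never shows up in the result, which is the behaviour you get from treating that file as a keyed table instead of letting both files provide cases.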