Data collection task - parsing

I have data that follows this kind of pattern:
ID Name1 Name2 Name3 Name4 .....
41242 MCJ5X TUAW OXVM4 Kcmev 1
93532 AVEV2 WCRB3 LPAQ 2 DVL2
.
.
.
As of now this is just formatted in a spreadsheet and has about 6000 lines. What I need to do is create a new row for each Name after Name1 and associate it with the ID on its current row. For example, see below:
ID Name1
41242 MCJ5X
41242 TUAW
41242 OXVM4
41242 Kcmev 1
93532 AVEV2
93532 WCRB3
93532 LPAQ 2
93532 DVL2
Any ideas how I could do this? I feel like this shouldn't be too complicated but not sure of the best approach. Whether a script or some function I'd really appreciate the help.

If possible, you might want to use a csv file. These files are plain-text and most spreadsheet programs can open/modify them (I know Excel and the OpenOffice version can). If you go with this approach, your algorithm will look something like this:
read everything into a string array
create a 1-to-many data structure (maybe a Dictionary<string, List<string>> or a list of (string, string) tuples)
loop over each line of the file
    split the current line on the ','s and loop over the pieces
        if this is the first piece, add a new item to the 1-to-many data structure with the current piece as the Id
        otherwise, add this piece to the "many" (name) part of the last item in the data structure
create a new csv file or open the old one for writing
output the "ID, Name1" row
loop over each 1-to-many item in the data collection
    loop over the many items in the current 1-to-many item
        output the 1 (id) + "," + current many item (current name)
You could do this in just about any language. If it's a one-time use script then Python, Ruby, or PowerShell (depending on platform) would probably be a good choice.
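As a rough illustration of the steps above, here is a minimal Python sketch. It assumes the spreadsheet has been exported as comma-separated text with the ID in the first column; the function name wide_to_long and the inline sample are hypothetical and only stand in for the real file.

```python
import csv
import io


def wide_to_long(rows):
    """Turn [id, name1, name2, ...] rows into a list of (id, name) pairs."""
    pairs = []
    for row in rows:
        if not row:
            continue  # skip blank lines
        row_id, names = row[0], row[1:]
        for name in names:
            if name.strip():  # ignore empty trailing cells
                pairs.append((row_id, name.strip()))
    return pairs


# Hypothetical sample standing in for the exported spreadsheet.
sample = (
    "ID,Name1,Name2,Name3,Name4\n"
    "41242,MCJ5X,TUAW,OXVM4,Kcmev 1\n"
    "93532,AVEV2,WCRB3,LPAQ 2,DVL2\n"
)

reader = csv.reader(io.StringIO(sample))
next(reader)  # skip the header row
pairs = wide_to_long(reader)

# Write the long format back out as CSV text.
out = io.StringIO()
writer = csv.writer(out)
writer.writerow(["ID", "Name1"])
writer.writerows(pairs)
print(out.getvalue())
```

For a real file, the two io.StringIO objects would be replaced by files opened with open(path, newline="").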

Related

Change this string - adding a space rather than dash

I have a script that changes a character's name and rank when the command CharRankPromote is used. Initially, the name automatically given to the player would be something like TEXT-RANK-TEXT (or similar), but I recently decided to switch to a new system that is just: RANK NAME. However, the current code does not recognize this new name format. Is there an easy fix? This is the code in question:
for index, rank in next, targetRanks do
if (string.find(name, "[%D+]" .. rank .. "[%D+]")) then
if (newRank == index) then
return "#cRankSameRank", name, rank
end
The string (name, "[%D+]" .. rank .. "[%D+]") would promote someone with a name like TEXT-RANK.TEXT:TEST, as it could identify the rank in the name. I'd like to change it so that it identifies the rank in the format "RANK NAME", with a space in between (with the string only focusing on the word at the front).

Iterating through CSV::Rows

I'll preface this by saying that I'm still learning Ruby.
I'm writing a script to parse a .csv and identify possible duplicate records in the data-set.
I have a .csv file with headers, so I'm parsing the data so that I can access each row using a header title as such:
@contact_table = CSV.parse(File.read("app/data/file.csv"), headers: true)
# Prints all last names in table
puts @contact_table['last_name']
I'm trying to iterate over each row in the table and identify if the last name I'm currently iterating over is similar to the next last name, but I'm having trouble doing this. I guess the way I'm handling it is as if it's an array, but I checked the type and it's a CSV::Row.
example (this doesn't work):
@contact_table.each_with_index do |c, i|
  puts "first contact is #{c['last_name']}, second contact is #{c[i + 1]['last_name']}"
end
I realized this doesn't work like this because the table isn't an array, it's a CSV::Row like I previously mentioned. Is there any method that can achieve this? I'm really blanking right now.
My csv looks something like this:
id,first_name,last_name,company,email,address1,address2,zip,city,state_long,state,phone
1,Donalt,Canter,Gottlieb Group,dcanter0@nydailynews.com,9 Homewood Alley,,50335,Des Moines,Iowa,IA,515-601-4495
2,Daphene,McArthur,"West, Schimmel and Rath",dmcarthur1@twitter.com,43 Grover Parkway,,30311,Atlanta,Georgia,GA,770-271-7837
@contact_table should be a CSV::Table, which is a collection of CSV::Rows, so in this:
@contact_table.each_with_index do |c, i|
  ...
end
c is a CSV::Row. That's why c['last_name'] works. The problem is that here:
c[i + 1]['last_name']
you're looking at c (a single row) instead of @contact_table. If you said:
@contact_table[i + 1]['last_name']
then you'd get the next last name or, when c is the last row, an exception because @contact_table[i + 1] will be nil.
Also, inside the iteration, c is the current (or (i+1)th) row and won't always be the first.
What is your use case for this? Seems like a school project?
I recommend CSV.foreach instead of parse (see this comparison). I would probably use a Set for this.
Create a Set outside of the scope of parsing the file (i.e., above the parsing code). Let's call it rows.
Call rows.include?(row) during each iteration while parsing the file
If true, then you know you have a duplicate
If false, then call rows.add(row) to add the new row to the set
You could also just fill your set with an individual value from a column that must be distinct (e.g., row.field(:some_column_name)), such as email or phone number, and do the same inclusion check for that.
(If this is for a real app, please don't do this. Use model validations instead.)
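For comparison, the set-based check described above can be sketched in Python. This is only an illustration under assumptions: the function name find_duplicates is hypothetical, the key is compared case-insensitively, and row 3 of the sample is invented to trigger a match.

```python
import csv
import io


def find_duplicates(rows, key):
    """Return the rows whose value in column `key` was already seen."""
    seen = set()
    duplicates = []
    for row in rows:
        value = row[key].strip().lower()  # normalise before comparing
        if value in seen:
            duplicates.append(row)
        else:
            seen.add(value)
    return duplicates


# Hypothetical sample; row 3 is made up so that one duplicate exists.
sample = """id,first_name,last_name,email
1,Donalt,Canter,dcanter0@nydailynews.com
2,Daphene,McArthur,dmcarthur1@twitter.com
3,Don,Canter,DCanter0@nydailynews.com
"""

dupes = find_duplicates(csv.DictReader(io.StringIO(sample)), "email")
print([d["id"] for d in dupes])
```

Membership tests on a set are constant time on average, so the file is read only once, however many rows it has.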
I would use #read instead of #parse and do something like this:
require 'csv'
LASTNAME_INDEX = 2
data = CSV.read('data.csv')
data[1..-1].each_with_index do |row, index|
puts "Contact number #{index + 1} has the following last name : #{row[LASTNAME_INDEX]}"
end
#~> Contact number 1 has the following last name : Canter
#~> Contact number 2 has the following last name : McArthur

How to aggregate multiple rows into one in CSV?

I have following problem:
I have a CSV file, which looks like this:
1,12
1,15
1,18
2,10
2,11
3,20
And I would like to parse it somehow to get this:
1,12,15,18
2,10,11
3,20
Do you have any solution? Thanks!
Here is one solution for you.
This first part just sets up the example for testing. I am assuming you already have a file with values in the second part of the script.
$path = "$env:TEMP\csv.txt"
$data = @"
1,12
1,15
1,18
2,10
2,11
3,20
"@
$data | Set-Content $path
This should be all you need:
$path = "$env:TEMP\csv.txt"
$results = @{}
foreach($line in (Get-Content $path))
{
    $split = $line -split ','
    $rowid = $split[0]
    $data = $split[1]
    if(-not($results.$rowid))
    {
        $results.$rowid = $rowid
    }
    $results.$rowid += "," + $data
}
$results.values | Sort-Object
Your original dataset does not need to be sorted for this one to work. I slice the data up and insert it into a hashtable.
I don't know your exact code requirement. I will try to write some logic which may help you!
CSV means a text file which I can read into a string or an array
If one will look at the above CSV data, there is a common pattern i.e. after each pair there is a space in-between.
So my parsing will be depending on 2 phases
parse with ' ' i.e. single space and will insert into an array (say elements)
then parse with ',' i.e. comma from each element of elements and save into another array (say details) where odd indexes will be containing the left hand values and even indexes will be containing the right hand values.
Then, when printing or using the values, skip the odd index (the left-hand value) if you have already seen that value.
Hope this helps...
Satyaranjan,
thanks for your answer! To clarify - I don't have any code requirements, I can use any language to achieve results. The point is to take unique values from first position (1,2,3) and put all related numbers on the right (1 - 12, 15 and 18 etc.). It is something like GROUP_CONCAT function in MySQL - but unfortunately I don't have such a function, so I am looking for some workaround.
Hope it is more clear now. Thanks
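Since any language is acceptable, a GROUP_CONCAT-style workaround could look like the following Python sketch. The function name group_concat is hypothetical, and the lines are given inline rather than read from a file.

```python
from collections import defaultdict


def group_concat(lines):
    """Collect the second field of each 'key,value' line under its key."""
    groups = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        key, value = line.split(",", 1)
        groups[key].append(value)
    # Dictionaries keep insertion order, so keys come out as first seen.
    return [",".join([key] + values) for key, values in groups.items()]


lines = ["1,12", "1,15", "1,18", "2,10", "2,11", "3,20"]
result = group_concat(lines)
for row in result:
    print(row)
```

The input does not have to be sorted; each key is emitted in the order it first appears.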

detect if a combination of string objects from an array matches against any commands

Please be patient and read my current scenario. My question is below.
My application takes in speech input and is successfully able to group words that match together to form either one word or a group of words - called phrases; be it a name, an action, a pet, or a time frame.
I have a master list of the phrases that are allowed and are stored in their respective arrays. So I have the following arrays validNamesArray, validActionsArray, validPetsArray, and a validTimeFramesArray.
A new array of phrases is returned each and every time the user stops speaking.
NSArray *phrasesBeingFedIn = @[@"CHARLIE", @"EAT", @"AT TEN O CLOCK",
                               @"CAT",
                               @"DOG", @"URINATE",
                               @"CHILDREN", @"ITS TIME TO", @"PLAY"];
Knowing that it's OK to have the following combinations to create a command:
COMMAND 1: NAME + ACTION + TIME FRAME
COMMAND 2: PET + ACTION
COMMAND n: n + n, .. + n
// In the example above, only the groups of phrases 'Charlie eat at ten o clock' and 'dog urinate'
// would be valid commands; the phrase 'cat' would not qualify for any of the commands
// and will therefore be ignored
Question
What is the best way for me to parse through the phrases being fed in and determine which combination phrases will satisfy my list of commands?
POSSIBLE solution I've come up with
One way is to step through the array with if and else statements that check the phrases ahead and see whether they satisfy any valid command pattern from the list. However, this solution is not dynamic: I would have to add a new set of if and else statements for every new command permutation I create.
My solution is not efficient. Any ideas on how I could build something that works and stays dynamic no matter how many new command sequences or phrase combinations I add?
I think what I would do is make an array for each category of speech (pet, command, etc). Those arrays would obviously have strings as elements. You could then test each word against each simple array using
[simpleWordListOfPets containsObject:word]
Which would return a BOOL result. You could do that in a case statement. The logic after that is up to you, but I would keep scanning the sentence using NSScanner until you have finished evaluating each section.
I've used some similar concepts to analyze a paragraph... it starts off like this:
while ([scanner scanUpToString:@"," intoString:&word]) {
    processedWordCount++;
    NSLog(@"%i total words processed", processedWordCount);
    // Does word exist in the simple list?
    if ([simpleWordList containsObject:word]) {
        //NSLog(@"Word already exists: %@", word);
You would continue it with whatever logic you wanted (and you would search for a space rather than a ",").
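One way to keep the matching dynamic is to store each command as a sequence of category names and compare it against the categories of the incoming phrases, so a new command only means a new entry in a list. The Python sketch below illustrates that idea under assumptions: the category assignments, the names COMMAND_PATTERNS and find_commands, and the longest-pattern-first rule are all hypothetical choices, not something stated in the question.

```python
# Hypothetical lookup from phrase to category (built from the valid* arrays).
CATEGORIES = {
    "CHARLIE": "NAME",
    "EAT": "ACTION", "URINATE": "ACTION", "PLAY": "ACTION",
    "CAT": "PET", "DOG": "PET",
    "AT TEN O CLOCK": "TIME",
}

# Each command is just a sequence of categories; add new ones here.
COMMAND_PATTERNS = [
    ("NAME", "ACTION", "TIME"),
    ("PET", "ACTION"),
]


def find_commands(phrases, categories, patterns):
    """Scan left to right and collect phrase groups matching any pattern."""
    tags = [categories.get(p) for p in phrases]  # None for unknown phrases
    ordered = sorted(patterns, key=len, reverse=True)  # try longest first
    commands = []
    i = 0
    while i < len(phrases):
        for pattern in ordered:
            end = i + len(pattern)
            if tuple(tags[i:end]) == tuple(pattern):
                commands.append(phrases[i:end])
                i = end  # continue after the matched group
                break
        else:
            i += 1  # no pattern starts here; skip this phrase
    return commands


phrases = ["CHARLIE", "EAT", "AT TEN O CLOCK", "CAT", "DOG", "URINATE",
           "CHILDREN", "ITS TIME TO", "PLAY"]
commands = find_commands(phrases, CATEGORIES, COMMAND_PATTERNS)
for command in commands:
    print(" ".join(command))
```

With these assumed categories, "CAT" is skipped because it is followed by another pet rather than an action, which matches the behaviour described in the question.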

How to compare and verify user input with a csv file columns data?

I have been trying to find an answer to this for the last few weeks. I want to give users a text field in the app and ask them to enter an ID number, which will then be checked against the columns of the uploaded csv file. If it matches, display an alert saying that a match was found; otherwise, say that it wasn't.
A csv file is basically a normal file in which items are separated by commas.
For example:
id,name,email
1,Joe,joe@d.com
2,Pat,pat@d.com
To get the list of IDs stored in this csv file, simply loop through the lines and grab the first item after splitting each line on the comma.
id = line.split(",").get(0); // first element ** Note this depends on what language you're using
Now, add this id to a collection of ids you are storing, like in a list.
IDs.add(id);
When you take the user input, all you need to do is to check if the id is in your list of ids.
if (IDs.contains(userId)) { print "Found"; }
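Put together, the steps above might look like this in Python. It is a sketch under assumptions: the helper names load_ids and is_known_id are hypothetical, the file is assumed to have an "id" header, and the CSV text is given inline instead of coming from an upload.

```python
import csv
import io


def load_ids(csv_file):
    """Read the 'id' column of a CSV file object into a set of strings."""
    reader = csv.DictReader(csv_file)
    return {row["id"].strip() for row in reader}


def is_known_id(user_input, ids):
    """Check the user's text against the loaded IDs, ignoring stray spaces."""
    return user_input.strip() in ids


# Hypothetical sample standing in for the uploaded file.
sample = "id,name,email\n1,Joe,joe@d.com\n2,Pat,pat@d.com\n"
ids = load_ids(io.StringIO(sample))

for attempt in ["2", "7"]:
    print(attempt, "Found" if is_known_id(attempt, ids) else "Not found")
```

The IDs are kept as strings so that the user's input can be compared directly, without converting it to a number first.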
