I have two related questions, and I'm looking for some best practices.
First part:
I'm looking for the best place to put a parser for data from a text area. I don't want to insert that data into the database; I just want to read it from the text area, compare it with some other data, and perform some action (no data from the database at all). Where would you place such a parser? Should I use a helper (for now I use helpers only in views), a concern, or plain model methods?
Second part:
For example, suppose the input data looks like:
A B H 0 2
C D R 1 3
E F E 4 9
R H T 1 0
I parse that data from the text area, split it into columns, check it against a regex, verify that there are 5 columns, and so on. I need to create a list of objects; where should I create that 'helper' class with fields like first_col, sec_col, third_col, given that the data will not be saved to the database?
I think this is the job of a service object. You pass the text to your service; it works with your input and returns the result, and you can call it from anywhere in your app.
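For example, a minimal sketch of such a service (the class name, the Row fields, and the five-column regex are my own assumptions, not from the question):

# app/services/text_area_parser.rb
class TextAreaParser
  # Plain value object for one parsed line -- never touches the database.
  Row = Struct.new(:first_col, :sec_col, :third_col, :fourth_col, :fifth_col)

  FIVE_COLS = /\A\S+(?:\s+\S+){4}\z/ # exactly five whitespace-separated columns

  # Returns an array of Row objects, skipping lines that don't match.
  def self.call(text)
    text.each_line
        .map(&:strip)
        .reject(&:empty?)
        .select { |line| line.match?(FIVE_COLS) }
        .map { |line| Row.new(*line.split) }
  end
end

You would call it from the controller with something like rows = TextAreaParser.call(params[:raw_data]), which keeps both the controller and the models thin.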
Example Sheet: I'm trying to get an exact match with an array in the criteria section of DGET. Maybe there is another way to work around this, but I'm trying to give it a dynamic component in the array.
=dget('Micro Data'!$A$1:J,"PCR Score",{"Micro Type","Stage Type","Tank","ID#";"PCR PAL","Bright",F2,H2})
Sometimes all criteria match multiple data points except the "Tank". However, the tanks won't match exactly. For example, all the data is the same in two data sets, except the tanks are CT1 and CT18. This then produces the #NUM! error. Is there a way to get an exact match on the array data while still allowing it to reference the cell?
I know there is the option of writing it as "=XXX", making it a text string, but this would take away the dynamic behaviour. I would also lose the auto-updating aspect when more data is added.
Thanks
Ryan, see my solution using a query, in Retain Log-GK, cell F2. I think it is just as dynamic as the DGET, but perhaps not. It will need some error wrapping to avoid errors if no result is found.
Formula is basically:
=query('Criteria Source'!A2:J5,
"select J where B = '"&D9&"' and C = '"&D10&"' and E = '"&D11&"' and D ='"& D2 & "' ",0)
I made all of the criteria dynamic, though obviously you can do it whatever way suits you best...
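For the error wrapping mentioned above, a minimal sketch is to wrap the query in IFERROR (the fallback text here is just a placeholder):

=iferror(query('Criteria Source'!A2:J5,
"select J where B = '"&D9&"' and C = '"&D10&"' and E = '"&D11&"' and D ='"& D2 & "' ",0), "no match")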
Let me know of any questions. I'll check back later...
If I have a variable in SPSS with a name (My_Variable), a label (My Variable), values (1: Yes, 2: No), etc., but without data (the column in Data View is empty), I want to add data using syntax. For example, I want to add a participant in the first row who answered "Yes", so I want a 1 to be added. How can I do it?
I found similar questions, but the solutions refer to creating a new SPSS window and adding the values there. I don't want this; I want to add data to an existing variable without creating a new SPSS file.
Apparently there is no way to directly add cases to an SPSS dataset through syntax.
But the following seems pretty close: you don't create new files; you create a new dataset and append it to your original.
Let's first create a small dataset to demonstrate on:
Data list list/ID (a5) var1 var2 var3 (3f2).
begin data
"first" 1 17 7
"secnd" 5 5 12
"third" 34 11 91
end data.
dataset name originalDataset.
So this is your original data. Now imagine that you want to add a new case to the data, with the ID value "hello" and the number 42 in all the columns. This is what you do:
* creating the new case in a separate dataset.
Data list list/ID (a5) var1 var2 var3 (3f2).
begin data
"hello" 42 42 42
end data.
dataset name addition.
* going back to original dataset and adding the new case.
dataset activate originalDataset.
add files /file=* /file=addition.
exe.
dataset close addition.
You don't have to create data in the first data set. Just create the variables and define them however you want.
DATASET CLOSE ALL.
INPUT PROGRAM.
NUMERIC My_Variable (F1).
VARIABLE LABELS My_Variable "I want this!".
VALUE LABELS My_Variable 1 "Yes" 2 "No".
END FILE.
END INPUT PROGRAM.
DATASET NAME Empty.
DATA LIST FREE /My_Variable.
BEGIN DATA.
1 2
END DATA.
APPLY DICTIONARY /FROM Empty
/SOURCE VARIABLES=My_Variable
/TARGET VARIABLES=My_Variable
/VARINFO VALLABELS=REPLACE VARLABEL.
DATASET CLOSE Empty.
FREQUENCIES VARIABLES ALL.
I used DATASET, but you could have saved the empty file to disk instead.
See the APPLY DICTIONARY command for more details about how it works.
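For instance, a minimal sketch of the save-to-disk variant (the file path is an assumption):

* Instead of DATASET NAME Empty, save the shell to disk.
SAVE OUTFILE='C:/temp/empty_shell.sav'.
* ... then, after the DATA LIST step, point APPLY DICTIONARY at the file.
APPLY DICTIONARY /FROM 'C:/temp/empty_shell.sav'
 /SOURCE VARIABLES=My_Variable
 /TARGET VARIABLES=My_Variable
 /VARINFO VALLABELS=REPLACE VARLABEL.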
Using Python, you can add data with the cases.append() method:
begin program.
import spss
spss.StartDataStep()
dataset = spss.Dataset()
dataset.cases.append([1])
spss.EndDataStep()
end program.
Say you have 3 variables; you can assign values to each by appending to the list passed to the method:
begin program.
import spss
spss.StartDataStep()
dataset = spss.Dataset()
dataset.cases.append([1,2,3])
spss.EndDataStep()
end program.
This would add a case with the value 1 in the first variable, 2 in the second variable, and 3 in the third.
Note: the method will only work within an open datastep.
Check out the ADD FILES command. You can also add cases with Python code.
I'm going to preface this by saying that I'm still learning Ruby.
I'm writing a script to parse a .csv file and identify possible duplicate records in the dataset.
I have a .csv file with headers, so I'm parsing the data so that I can access each row using a header title as such:
@contact_table = CSV.parse(File.read("app/data/file.csv"), headers: true)
# Prints all last names in table
puts @contact_table['last_name']
I'm trying to iterate over each row in the table and identify whether the last name I'm currently iterating over is similar to the next last name, but I'm having trouble doing this. I guess I'm handling it as if it were an array, but I checked the type and it's a CSV::Row.
example (this doesn't work):
@contact_table.each_with_index do |c, i|
  puts "first contact is #{c['last_name']}, second contact is #{c[i + 1]['last_name']}"
end
I realized it doesn't work like this because the table isn't an array; it's a CSV::Row, like I previously mentioned. Is there any method that can achieve this? I'm really blanking right now.
My csv looks something like this:
id,first_name,last_name,company,email,address1,address2,zip,city,state_long,state,phone
1,Donalt,Canter,Gottlieb Group,dcanter0@nydailynews.com,9 Homewood Alley,,50335,Des Moines,Iowa,IA,515-601-4495
2,Daphene,McArthur,"West, Schimmel and Rath",dmcarthur1@twitter.com,43 Grover Parkway,,30311,Atlanta,Georgia,GA,770-271-7837
@contact_table should be a CSV::Table, which is a collection of CSV::Rows, so in this:
@contact_table.each_with_index do |c, i|
...
end
c is a CSV::Row. That's why c['last_name'] works. The problem is that here:
c[i + 1]['last_name']
you're looking at c (a single row) instead of @contact_table. If you said:
@contact_table[i + 1]['last_name']
then you'd get the next last name, or, when c is the last row, an exception, because @contact_table[i + 1] will be nil.
Also, inside the iteration, c is the current row (the (i + 1)th, counting from one), so it won't always be the first.
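For example, a minimal sketch using Enumerable#each_cons, which yields adjacent pairs and avoids running off the end (assuming the @contact_table from the question):

@contact_table.each_cons(2) do |current, following|
  puts "first contact is #{current['last_name']}, second contact is #{following['last_name']}"
end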
What is your use case for this? Seems like a school project?
I recommend CSV.foreach instead of CSV.parse (see this comparison). I would probably use a Set for this.
Create a Set outside of the scope of parsing the file (i.e., above the parsing code). Let's call it rows.
Call rows.include?(row) during each iteration while parsing the file
If true, then you know you have a duplicate
If false, then call rows.add(row) to add the new row to the set
You could also just fill your set with an individual value from a column that must be distinct (e.g., row.field(:some_column_name)), such as email or phone number, and do the same inclusion check for that; a sketch follows below.
(If this is for a real app, please don't do this. Use model validations instead.)
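A minimal sketch of the approach above, using the email column from the sample data as the distinct value (the file path is taken from the question):

require 'csv'
require 'set'

rows = Set.new # values already seen

CSV.foreach('app/data/file.csv', headers: true) do |row|
  value = row['email'] # or row.fields, if every column must match
  if rows.include?(value)
    puts "possible duplicate: #{row['first_name']} #{row['last_name']}"
  else
    rows.add(value)
  end
end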
I would use #read instead of #parse and do something like this:
require 'csv'
LASTNAME_INDEX = 2
data = CSV.read('data.csv')
data[1..-1].each_with_index do |row, index|
puts "Contact number #{index + 1} has the following last name : #{row[LASTNAME_INDEX]}"
end
#~> Contact number 1 has the following last name : Canter
#~> Contact number 2 has the following last name : McArthur
I have some sample data that looks like this:
1950,0,1
1950,22,1
1950,-11,1
1949,111,1
1949,78,1
and I used the following commands:
A = load 'path/to/the/sample';
B = foreach A generate $0,$1;
which should generate only the first 2 columns of A.
Then I used
describe B
to check how it works; it returns B: {a: bytearray,b: bytearray}, which is correct.
However, when I run the command
dump B
why does it return:
(1950,0,1,)
(1950,22,1,)
(1950,-11,1,)
(1949,111,1,)
(1949,78,1,)
as the result? It seems really weird. I've tried it several times, but I still get the same result.
This happens because Pig by default tries to separate your data by tabs. So when you pass it a line like
1950,0,1
it thinks it has found just a single field, 1950,0,1. Since you asked for two fields, the second one is just set to NULL.
So when you GENERATE the two fields you loaded, it prints out the tuple
(1950,0,1,)
If you were to STORE this instead of DUMPing it you would see it more clearly. Pig would store the data separated by tabs (again, the default), and your output file would look like
1950,0,1
1950,22,1
1950,-11,1
1949,111,1
1949,78,1
That's not very enlightening, so look instead what happens if you were to do this:
B = foreach A generate $0, 'test';
store B into 'output';
Now the data in output would be
1950,0,1 test
1950,22,1 test
1950,-11,1 test
1949,111,1 test
1949,78,1 test
You can control what Pig uses as the field separator for both LOAD and STORE by using the clause USING PigStorage(','). The argument to PigStorage can be whatever character you like. One other common one is USING PigStorage('\n'), which will load in each line as a whole.
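For instance, a minimal sketch using the clause on both ends (the output path is an assumption):

A = load 'path/to/the/sample' using PigStorage(',');
B = foreach A generate $0, $1;
store B into 'output' using PigStorage(',');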
Use the PigStorage clause in your load statement.
A = load 'path/to/the/sample' using PigStorage(',');
B = foreach A generate $0,$1;
dump B;
Now you will get the result that you expect:
(1950,0)
(1950,22)
(1950,-11)
(1949,111)
(1949,78)
I'm trying to figure out the best approach to display combined tables based on matching logic and input search criteria.
Here is the situation:
We have a table of customers stored locally. The fields of interest are ssn, first name, last name and date of birth.
We also have a web service which provides the same information. Some of the customers from the web service are the same as the local file, some different.
SSN is not required in either.
I need to combine this data to be viewed on a Grails display.
The criteria for combination are: 1) match on SSN; 2) for any remaining records, an exact match on first name, last name, and date of birth.
There's no need at this point for soundex or approximate logic.
It looks like what I should do is extract all the records from both inputs into a single collection, somehow making it a set keyed on SSN, then remove the blank SSNs.
This will handle the SSN matching (once I figure out how to make that a set).
Then, I need to go back to the original two input sources (cached in a collection to prevent a re-read) and remove any records that exist in the SSN set derived previously.
Then, create another set based on first name, last name and date of birth - again if I can figure out how to make a set.
Then combine the two derived collections into a single collection. The collection should be sorted for display purposes.
Does this make sense? I think the search criteria will limit the number of records pulled in, so I can do this in memory.
Essentially, I'm looking for some ideas on how the Grails code would look for achieving the above logic (assuming this is a good approach). The local customer table is a domain object, while what I'm getting from the WS is an array list of objects.
Also, I'm not entirely clear on how the maxresults, firstResult, and order used for the display would be affected. I think I need to read in all the records which match the search criteria first, do the combining, and display from the derived collection.
The traditional Java way of doing this would be to copy both the local and remote objects into TreeSet containers with a custom comparator, first for SSN, second for name/birthdate.
This might look something like:
def localCustomers = Customer.list()
def remoteCustomers = RemoteService.get()
TreeSet ssnFilter = new TreeSet(new ClosureComparator({c1, c2 -> c1.ssn <=> c2.ssn}))
ssnFilter.addAll(localCustomers)
ssnFilter.addAll(remoteCustomers)
TreeSet nameDobFilter = new TreeSet(new ClosureComparator({c1, c2 -> c1.firstName + c1.lastName + c1.dob <=> c2.firstName + c2.lastName + c2.dob}))
nameDobFilter.addAll(ssnFilter)
def filteredCustomers = nameDobFilter as List
At this point, filteredCustomers has all the records, except those that are duplicates by your two criteria.
Another approach is to filter the lists by sorting and then doing a fold (inject in Groovy), combining adjacent elements if they match. This way, you have an opportunity to combine the data from both sources.
For example:
def combineByNameAndDob(customers) {
customers.sort() {
c1, c2 -> (c1.firstName + c1.lastName + c1.dob) <=>
(c2.firstName + c2.lastName + c2.dob)
}.inject([]) { cs, c ->
if (cs && c.equalsByNameAndDob(cs[-1])) {
cs[-1].combine(c) //combine the attributes of both records
cs
} else {
cs << c
}
}
}
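A usage sketch under the same assumptions (equalsByNameAndDob and combine are methods you would define on your customer objects):

def allCustomers = Customer.list() + RemoteService.get()
def filteredCustomers = combineByNameAndDob(allCustomers)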