I'm importing data from old spreadsheets into a database using Rails.
I have one column that contains a list on each row, sometimes formatted as
first, second
and other times like this
third and fourth
So I wanted to split up this string into an array, delimiting either with a comma or with the word "and". I tried
my_string.split /\s?(\,|and)\s?/
Unfortunately, as the docs say:
If pattern contains groups, the respective matches will be returned in the array as well.
Which means that I get back an array that looks like
[
[0] "first"
[1] ", "
[2] "second"
]
Obviously only the zeroth and second elements are useful to me. What do you recommend as the neatest way of achieving what I'm trying to do?
You can instruct the regexp not to capture the group by using ?:.
my_string.split(/\s?(?:\,|and)\s?/)
# => ["first", "second"]
As a side note, regarding this part of the question:
into a database using rails.
note that this has nothing to do with Rails; that's plain Ruby.
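For the two sample rows in the question, a quick sanity check of the non-capturing version might look like this (a sketch, not the only way; word boundaries are added around "and" so a word such as "sandwich" isn't split, and \s* swallows extra spaces):
rows = ["first, second", "third and fourth"]
rows.map { |row| row.split(/\s*(?:,|\band\b)\s*/) }
# => [["first", "second"], ["third", "fourth"]]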
Related
I have a GrapeSwaggerRails API application that takes two dates and a comma-delimited string of category IDs. It should query the database for Records with a created_at within the two given dates and a category_id that matches one of the IDs passed to it. I'm not having any trouble with the dates, so I'll skip that for now. But let's say I want Records with categories matching 8, 2, or 1. In the code, it looks like "8,2,1". In the URL, it gets appended as &categories=%228%2C2%2C1%22.
Anyway, I figured one decent way of getting this to do what I want would be to convert that string into an array of integers like this: categories = params[:categories].split(',').map(&:to_i)
But given "8,2,1", the output is this (ignore the comment):
0 # <-- ?????
2
1
Very strange. In the definition of the API, params[:categories] looks like this: "8,2,1". But params[:categories].split(',') becomes the following:
"8
2
1"
That's a bit odd, isn't it? Running the map method on that turns it into that nonsense higher up, converting the "8 to a 0 for reasons I'm hoping to find out here. I know I could probably come at this problem from a different angle and sidestep the issue, but I'd rather try to get to the root of what's going wrong, so I can learn something from it. For reference, here's what the Rails console does when I put (as far as I can tell) the same data into it:
>> "8,2,1".split(',')
#=> ["8", "2", "1"]
map then works as expected.
>> "8,2,1".split(',').map(&:to_i)
#=> [8, 2, 1]
So my question is twofold. What's going wrong with this split function? Why does it behave differently in the console?
Because params[:categories] is actually
'"8,2,1"' # <- the outer ''s are just for illustration of a string.
If you pass &categories=8%2C2%2C1 it should work as expected.
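To see why the stray quote turns into a 0: String#to_i parses leading digits and returns 0 when the string doesn't start with a number. A small illustration (the quote-stripping line is just a hypothetical server-side workaround if you can't change the client):
'"8,2,1"'.split(',')                          # => ["\"8", "2", "1\""]
'"8'.to_i                                     # => 0, since the string starts with a quote, not a digit
'"8,2,1"'.delete('"').split(',').map(&:to_i)  # => [8, 2, 1]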
I have a web page where a user can search through documents in a mongoDB collection.
I get the user's input through @q = params[:search].to_s
I then run a Mongoid query:
@story = Story.any_of( { :Tags => /#{@q}/i }, { :Name => /#{@q}/i }, { :Genre => /#{@q}/i } )
This works fine if the user looks for something like 'humor', 'romantic comedy' or 'mystery'. But if looking for 'romance fiction', nothing comes up. Basically, I'd like to add 'and'/'or' functionality to my search so that it will find documents in the database that are related to all strings a user types into the input field.
How can this be done while still maintaining the substring search capabilities I currently have? Thanks in advance for the help!
UPDATE:
Per Eugene's comment below...
I tried converting to case-insensitive with @q.map! { |x| x = "/#{x}/i" }. It does save it properly as ["/romantic/i", "/comedy/i"]. But the query Story.any_of({ :Tags.in => @q }, { :Story.in => @q }) finds nothing.
When I change the array to ["Romantic", "Comedy"], then it does.
How can I properly make it case insensitive?
Final:
Removing the quotes worked.
However, there is now no way to use an .and() search to find a book that has both words in all these fields.
To create an OR statement, you can convert the string into an array of strings, then convert the array of strings into an array of regexes and use the '$in' option. So first, pick a delimiter - perhaps a comma or a space, or you can set up a custom one like ||. Let's say you go with comma-separated. When the user enters:
romantic, comedy
you split that into ['romantic', 'comedy'], then convert that to [/romantic/i, /comedy/i] then do
@story = Story.any_of( { :Tags.in => [/romantic/i, /comedy/i] }....
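A minimal sketch of that OR flow, assuming comma-separated input and the same field names the question uses (Regexp.escape is added so user input can't inject regex syntax):
q       = params[:search].to_s                        # e.g. "romantic, comedy"
terms   = q.split(',').map(&:strip).reject(&:empty?)  # => ["romantic", "comedy"]
regexes = terms.map { |t| /#{Regexp.escape(t)}/i }    # => [/romantic/i, /comedy/i]
@story  = Story.any_of({ :Tags.in => regexes }, { :Name.in => regexes }, { :Genre.in => regexes })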
To create an AND query, it can get a little more complicated. There is an elemMatch function you could use.
I don't think you could do {:Tags => /romantic/i, :Tags => /comedy/i }
So my best thought would be to do sequential queries. There would be a performance hit, but if your DB isn't that big, it shouldn't be a big issue. So if you want Romantic AND Comedy, you can do:
query 1: find all collections that match /romantic/i
query 2: take results of query 1, find all collections that match /comedy/i
And so on by iterating through your array of selectors.
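A rough sketch of that sequential idea (the per-field access via story[:Tags] etc. is an assumption about the documents; the narrowing happens in Ruby, so as noted it only makes sense for a small collection):
regexes = [/romantic/i, /comedy/i]
# Query once for the first term, then narrow the results in memory for each remaining term
results = Story.any_of({ :Tags => regexes.first },
                       { :Name => regexes.first },
                       { :Genre => regexes.first }).to_a
regexes.drop(1).each do |re|
  results.select! do |story|
    [story[:Tags], story[:Name], story[:Genre]].compact.any? { |field| field.to_s =~ re }
  end
end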
I don't know how to accomplish this.
I've faced this problem three times, and each time I put it on my to-do list, but even though I tried to find a solution, I couldn't.
For example,
I'm trying to create a query with dynamic variables, as in this example:
User.search(first_name_start: 'K')
There are 3 arguments in this example:
1) first_name - my model attribute
2) start - the query type (start/end/cont)
3) 'K' - the value
I was able to create a dynamic ActiveRecord query using static symbols, but how am I supposed to make the input dynamic?
Thanks in advance
EDIT: ADDITIONAL INFORMATION
Let me show you some rough pseudo-code:
varArray.each_with_index do |x, i|
  # varArray[i] is a model attribute / a column in my db, e.g. "first_name"
  # filterArray[i] is a filter type, e.g. "start" / "end" / "cont"
  # valArray[i] is a string value, e.g. "geo" or "paul"
  queryString = varArray[i] + "_" + filterArray[i]
  User.where(queryString => valArray[i])
end
I tried to use send(variable), but that didn't help me either, so I don't know how I should proceed.
This is one of the few cases where the fancy new Ruby 1.9 syntax for defining hashes doesn't cut it. You have to use the traditional hashrocket (=>), which allows you to specify not only symbols but arbitrary values as hash keys:
column = "#{attribute}_#{query_type}".to_sym  # e.g. "first_name" + "_" + "start"
User.where( column => value )
AFAIK, ActiveRecord is able to accept strings instead of symbols as column names, so you don't even need to call to_sym.
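A hedged sketch tying this back to the question's example (the variable names here are just illustrative, and User.search is assumed to be the same Ransack-style method the question calls):
attribute  = "first_name"   # e.g. varArray[i]
query_type = "start"        # e.g. filterArray[i]
value      = "K"            # e.g. valArray[i]
key = "#{attribute}_#{query_type}".to_sym  # => :first_name_start
User.search(key => value)                  # the hashrocket lets a computed symbol act as the key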
I have the following JSON.
Suppose my selection is mobile; then these fields will be generated:
{"Style":"convertible","Year":"2010","Color":"green"}
{"Style":"convertible","Year":"2010","Color":"red"}
And if my selection is bike, then these fields will be generated:
{"model":"2012","mileage":"20kmph","Color":"red"}
How do I achieve the above result?
Edit-1
I have a form in which some of the fields will be auto-generated based on the category selection. I have converted the auto-generated fields to JSON and stored them in the database as a single column.
Image url
I don't know how to explain it well; check out my screenshots for a better understanding.
I'm assuming (for some crazy reason) that you will be using Ruby to do this.
But first, your expected output is wrong because you can't have a hash with duplicate keys:
{"Color": "green", "Color": "red"}
...is impossible. Same goes for the "Year" keys. Think of keys within a hash as Highlanders. THERE CAN ONLY BE ONE (of the same name). Therefore, your actual expected output would be:
{"Style":"convertible", "Year":"2012", "Color":"red", "name":"test"}
Or whatever. Anyway...
Step 1: Convert JSON to a Ruby Hash
require 'json'
converted = JSON.parse '[{"Style":"convertible","Year":"2010","Color":"green"},
{"Style":"convertible","Year":"2010","Color":"red"},
{"name":"test","Year":"2012","Color":"red"}]'
Step 2: Merge them
merged = {}
converted.each { |c| merged.merge! c }
Now the merged variable should look like the above actual expected output.
The only problem left is deciding which duplicate keys override which other duplicate keys. What matters here is the order in which you merge the hashes: the ones merged last override any existing duplicate keys/values. Hope that helps.
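Continuing the sketch above, merged now holds the last-wins combination, and it can be serialized back to JSON if you want to keep storing it in a single column:
merged
# => {"Style"=>"convertible", "Year"=>"2012", "Color"=>"red", "name"=>"test"}
merged.to_json
# => "{\"Style\":\"convertible\",\"Year\":\"2012\",\"Color\":\"red\",\"name\":\"test\"}"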
I need to extract a table of data on a collection of pages. I can already traverse the pages just fine.
How do I extract the table's data? I'm using Ruby and Nokogiri, but I would assume that this is a pretty general issue.
I underlined the desired data points in each row in the following image.
A sample of the html is: http://pastebin.com/YYFPbFLC
How would I parse this table via Nokogiri into a hash of the meaningful chunks?
The table's xpath is:
/html/body/table/tbody/tr/td[2]/table/tbody/tr[2]/td/table/tbody/tr/td[2]/table/tbody/tr/td/table/tbody/tr/td[2]/table
The table has a variable number of data rows and formatting rows. I only want to collect the rows with meaningful data, but I don't readily see a way to distinguish these via an XPath, except that the second column will reliably have "keyword" in it. Each of these rows has an XPath of:
1st meaningful row is: /html/body/table/tbody/tr/td[2]/table/tbody/tr[2]/td/table/tbody/tr/td[2]/table/tbody/tr/td/table/tbody/tr/td[2]/table/tbody/tr[2]
...
Last meaningful row: /html/body/table/tbody/tr/td[2]/table/tbody/tr[2]/td/table/tbody/tr/td[2]/table/tbody/tr/td/table/tbody/tr/td[2]/table/tbody/tr[N]
The first meaningful column that needs to match text content on the "keyword" is:
/html/body/table/tbody/tr/td[2]/table/tbody/tr[2]/td/table/tbody/tr/td[2]/table/tbody/tr/td/table/tbody/tr/td[2]/table/tbody/tr[2]/td[2]
The last column of this first row of data would be:
/html/body/table/tbody/tr/td[2]/table/tbody/tr[2]/td/table/tbody/tr/td[2]/table/tbody/tr/td/table/tbody/tr/td[2]/table/tbody/tr[2]/td[6]
Each row is a record and has a timestamp, with this column/td being the time in the timestamp; the year, month and day are all in their own variables and can be appended for a full timestamp:
/html/body/table/tbody/tr/td[2]/table/tbody/tr[2]/td/table/tbody/tr/td[2]/table/tbody/tr/td/table/tbody/tr/td[2]/table/tbody/tr[2]/td[5]
The first rule of XPath is: never use the autogenerated XPath from Firebug or other browser tool. This creates brittle XPath that treats all page elements as equally important and required, even parts you don't care about. For example, if a notice went up at the top of the page and it happened to be in a table, it could throw off your parsing.
Instead, think about how a human would identify it. In this case, you want "the first table under the heading with the word 'today' in it". Here's the XPath for that:
//table[preceding-sibling::h2[contains(text(), "today")]][1]
This says take the tables that have a preceding h2 (in other words, that follow the h2), where the h2 contains the word "today". Then take the first such table.
Then you need to identify the rows you are interested in. Note that some rows are just dividers containing a single td, so you want to make sure you only parse the rows that have multiple td tags. In XPath, that is:
//tr[td[2]]
Then you just grab the content of all the columns. In the first one you can remove everything before the words "of magnitude" to get just the value. Putting it all together:
doc = Nokogiri::HTML.parse(html)
events = []
doc.xpath('//table[preceding-sibling::h2[contains(text(), "today")]][1]//tr[td[2]]').each do |row|
  cols = row.search('td/text()').map(&:to_s)
  events << {
    :magnitude   => cols[0].gsub(/^.*of magnitude /, ''),
    :temp_area   => cols[1],
    :time_start  => cols[2],
    :time_middle => cols[3],
    :time_end    => cols[4]
  }
end
The output is:
[
{:magnitude=>"F1.7",
:temp_area=>"0",
:time_start=>"01:11:00",
:time_middle=>"01:24:00",
:time_end=>"01:32:00"},
{:magnitude=>"F3.1",
:temp_area=>"0",
:time_start=>"04:01:00",
:time_middle=>"04:10:00",
:time_end=>"04:26:00"},
{:magnitude=>"F3.5",
:temp_area=>"134F55",
:time_start=>"06:24:00",
:time_middle=>"06:42:00",
:time_end=>"06:53:00"},
{:magnitude=>"F1.4",
:temp_area=>"0",
:time_start=>"11:58:00",
:time_middle=>"12:06:00",
:time_end=>"12:16:00"},
{:magnitude=>"F1.0",
:temp_area=>"0",
:time_start=>"13:02:00",
:time_middle=>"13:05:00",
:time_end=>"13:09:00"},
{:magnitude=>"D53.7",
:temp_area=>"134F55",
:time_start=>"17:37:00",
:time_middle=>"18:37:00",
:time_end=>"18:56:00"}
]