What I mean by this is that I want to search for "thing1", and then search for "thing2" based on the position of "thing1". I want to display both of them in the result in the order in which they appear in the code.
E.g. I find "thing1" on line 100. I then want to search for the first "thing2" that occurs before "thing1", and display both of these in the order "thing2" then "thing1". I want to do this for every instance of "thing1" that I find.
The reason for this is that I want to search for certain strings which I know will be in lists (Python), and I want to know the names of the lists too. So I thought I could search for the string and then also display the first "= [" that occurs before the string.
So if a file has:
my_list = [
'item1',
'item2',
'item3',
]
my_other_list = [
'item4',
'item5',
'item3',
]
and I create a search which looks for 'item3' and then looks back to find the previous '= [',
then the output should be (not including the line numbers which grep and ack would add):
my_list = [
'item3',
my_other_list = [
'item3',
I think you want this:
awk '/=/{thing2=$0} /item3/{print thing2;print $0,"\n"}' YourFile
So, every time you see an =, you remember the line as thing2. When you see item3 you print the last thing2 you saw and the current line.
Sample Output
my_list = [
'item3',
my_other_list = [
'item3',
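If matching any line containing an = is too loose for your real code (it would also remember ordinary assignments that are not list definitions), you could tighten the first pattern so that only lines containing "= [" are remembered. For example, assuming a single space between the = and the [ as in your sample file:
awk '/= \[/{thing2=$0} /item3/{print thing2;print $0,"\n"}' YourFile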
I want to scrape a restaurant page for certain titles of dishes.
I created an array holding keywords:
myarray = ["Rice", "Soup", "Chicken", "Vegetables"]
Whenever one of those keywords is found in a webpage, my scraper is supposed to give me the entire dish-title. I made this work with the following code:
html_doc = Nokogiri::HTML.parse(browser.html)
word = html_doc.at(':contains("Rice"):not(:has(:contains("Rice")))').text.strip
puts word
For example this returns: "Dish 41 - Vegetables with Chicken and Rice"
The problem is that the above code stops after the first dish is found. It does not loop through all dish-titles containing the word rice.
Secondly, I do not know how to let the code check for an entire array of substrings.
Use css. This will find all the elements which match the given CSS selector and give you the collection:
words = html_doc.css(':contains("Rice"):not(:has(:contains("Rice")))').map(&:text)
I solved the second part of my question myself with this:
word = html_doc.css(":contains('#{keyword}'):not(:has(:contains('#{keyword}')))").map(&:text)
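Putting both parts together, one way to loop over the whole keyword array and collect every matching dish-title might look roughly like this (myarray and browser are the variables from the question, and the uniq call is just a guess at what you want when a title contains several keywords):
require 'nokogiri'
myarray = ["Rice", "Soup", "Chicken", "Vegetables"]
html_doc = Nokogiri::HTML.parse(browser.html)   # browser is assumed to be your existing browser object
# for each keyword, take the innermost elements containing it and keep their text
dish_titles = myarray.flat_map do |keyword|
  html_doc.css(":contains('#{keyword}'):not(:has(:contains('#{keyword}')))").map { |node| node.text.strip }
end.uniq
puts dish_titles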
I am using the https://github.com/tilo/smarter_csv gem to read my CSV files. On the last line of my code I wrote puts records to see the output from the function.
filename = 'db/csv/airports_codes.csv'
options = {
  :col_sep => 'tt',
  :headers_in_file => false,
  :user_provided_headers => [
    "id",
    "code",
    "city",
    "country",
    "country_code",
    "continent",
    "coordinate_x",
    "coordinate_y"
  ]
}
records = SmarterCSV.process(filename, options)
puts records
However, there is so much output that my terminal displays only the last ~200 items. How do I see the others?
These are the first 2 items displayed at the top of the terminal.
{:id=>4564, :code=>"YEB", :city=>"Bar River", :country=>"Canada", :country_code=>"CA", :continent=>"North America", :coordinate_x=>"\\N", :coordinate_y=>"\\N"}
{:id=>4565, :code=>"YED", :city=>"Edmonton", :country=>"Canada", :country_code=>"CA", :continent=>"North America", :coordinate_x=>"\\N", :coordinate_y=>"\\N"}
I also want to note that it doesn't let me scroll up above that. It acts as if this were the first line in the terminal and there was nothing above it. I am using Ubuntu Linux.
Since records is really just an array of objects, you can treat it as you would any array and use slice to break it up into viewable pieces.
records.slice(20, 10)
This will output 10 records, beginning at the 21st item.
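For example, to page through the whole array in chunks of 10 rather than dumping everything at once, something like this should work:
# print the parsed rows 10 at a time; press Enter to see the next chunk
records.each_slice(10) do |chunk|
  puts chunk
  STDIN.gets
end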
Obviously there are other alternatives, such as increasing the number of lines your terminal displays - but for that you should ask a specific question on http://unix.stackexchange.com - let them know which terminal and shell environment you're using, etc., and someone will help you.
I'm trying to do an exact location search, meaning that each term in the location should exactly match at least one location field. For example, if I search for "Sudbury, Middlesex, Massachusetts" then I want to only get results that have an exact match for each of those three terms. A result with location.city.name = Sudbury, location.county.name = Middlesex, and location.region.name = Massachusetts would match.
{
  "multi_match": {
    "fields": [
      "location.city.name",
      "location.region.name",
      "location.county.name",
      "location.country.name"
    ],
    "query": "Sudbury, Middlesex, Massachusetts",
    "type": "cross_fields",
    "operator": "and"
  }
}
This is very close; however, I also get results for "East Sudbury". I don't want East Sudbury, I only want results that match the field exactly. How can I do this? I know that "type":"phrase" is wrong because then it would be searching for the entire phrase "Sudbury, Middlesex, Massachusetts" in each field and would get no results.
It sounds like the field location.city.name is being analysed, splitting 'East Sudbury' into the tokens 'East' and 'Sudbury', so it gets returned for a search for 'Sudbury'.
Try setting the field to not_analyzed if you are always searching for specific terms?
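On versions of Elasticsearch that still support not_analyzed (pre-5.x), the mapping might look roughly like the following. The index name, type name and curl form below are made up for illustration; you would repeat the same setting for the region, county and country name fields.
# "places" index and "listing" type are hypothetical
curl -XPUT 'http://localhost:9200/places' -d '
{
  "mappings": {
    "listing": {
      "properties": {
        "location": {
          "properties": {
            "city": {
              "properties": {
                "name": { "type": "string", "index": "not_analyzed" }
              }
            }
          }
        }
      }
    }
  }
}'
With the field stored as a single unanalysed term, "East Sudbury" is no longer indexed as the separate tokens 'East' and 'Sudbury', so it will not come back for a search on 'Sudbury' alone.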
I'm importing data from old spreadsheets into a database using rails.
I have one column that contains a list on each row, sometimes formatted as
first, second
and other times like this
third and fourth
So I wanted to split up this string into an array, delimiting either with a comma or with the word "and". I tried
my_string.split /\s?(\,|and)\s?/
Unfortunately, as the docs say:
If pattern contains groups, the respective matches will be returned in the array as well.
Which means that I get back an array that looks like
[
  [0] "first"
  [1] ", "
  [2] "second"
]
Obviously only the zeroth and second elements are useful to me. What do you recommend as the neatest way of achieving what I'm trying to do?
You can instruct the regexp to not capture the group using ?:.
my_string.split(/\s?(?:\,|and)\s?/)
# => ["first", "second"]
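One small refinement you might want: if a list item could contain a word with "and" inside it (e.g. "sandwiches"), adding word boundaries around and, and allowing more than one space, keeps the split from firing in the middle of a word:
my_string.split(/\s*(?:,|\band\b)\s*/)
# => ["first", "second"]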
As a side note
into a database using rails.
Please note this has nothing to do with Rails, that's Ruby.
My application is trying to match an incoming string against documents in my Mongo Database where a field has a list of keywords. The goal is to see if the keywords are present in the string.
Here's an example:
Incoming string:
"John Doe is from Florida and is a fan of American Express"
the field for the documents in MongoDB has a value such as:
in_words: "georgia,american express"
So, the database record has in_words or keywords separated by commas, and some of them are two or more words.
Currently, my RoR application pulls the documents, splits the in_words for each one with split(','), then loops through each keyword and checks whether it is present in the string.
I really want to find a way to push this type of search into the actual database query in order to speed up the processing. I could change the in_words in the database to an array, as follows:
in_words: ["georgia", "american express"]
but I'm still not sure how to query this?
To sum up, my goal is to find the person that matches an incoming string by comparing that person's list of in_words/keywords against the incoming string, and to do this query entirely in the database layer.
Thanks in advance for your suggestions.
You should definitely split the in_words into an array as a first step.
Your query is still a tricky one.
Next consider using a $regex query against that array field.
Constructing the regex will be a bit hard since you want to match any single word from your input string, or, it appears, any pair of words (how many words??). You may get some further ideas for how to construct a suitable regex from my blog entry here, where I am matching a substring of the input string against the database (the inverse of a normal LIKE operation).
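As a rough sketch of that idea in the mongo shell (the people collection name is made up here, and this assumes in_words has already been converted to an array of lowercase keywords):
// build an anchored regex of every single word and every adjacent pair of words from the incoming string
var str = "John Doe is from Florida and is a fan of American Express".toLowerCase();
var words = str.split(' ');
var terms = words.slice();
for (var i = 0; i < words.length - 1; i++) { terms.push(words[i] + ' ' + words[i + 1]); }
db.people.find({ in_words: { $regex: '^(' + terms.join('|') + ')$' } });
For real input you would also want to escape any regex metacharacters before joining the terms.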
You can solve this by splitting the long string into separate tokens and putting them into an array, then using an $all query to find the matching keywords efficiently.
Check out the sample:
> db.splitter.insert({tags:'John Doe is from Florida and is a fan of American Express'.split(' ')})
> db.splitter.insert({tags:'John Doe is a super man'.split(' ')})
> db.splitter.insert({tags:'John cena is a dummy'.split(' ')})
> db.splitter.insert({tags:'the rock rocks'.split(' ')})
and when you query
> db.splitter.find({tags:{$all:['John','Doe']}})
it would return
> db.splitter.find({tags:{$all:['John','Doe']}})
{ "_id" : ObjectId("4f9435fa3dd9f18b05e6e330"), "tags" : [ "John", "Doe", "is", "from", "Florida", "and", "is", "a", "fan", "of", "American", "Express" ] }
{ "_id" : ObjectId("4f9436083dd9f18b05e6e331"), "tags" : [ "John", "Doe", "is", "a", "super", "man" ] }
And remember, this operation is case-sensitive.
If you are looking for a partial match, use $in instead of $all.
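For example, this returns documents whose tags contain either word:
> db.splitter.find({tags:{$in:['John','Doe']}})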
Also, you probably need to remove noise words ('a', 'the', 'is', ...) before inserting, for accurate results.
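A rough example of doing that in the shell (the stop-word list here is only illustrative):
> var stop = ['a', 'the', 'is', 'of', 'and', 'from']
> var tags = 'John Doe is a super man'.split(' ').filter(function (w) { return stop.indexOf(w.toLowerCase()) === -1 })
> db.splitter.insert({tags: tags})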
I hope this is clear.