Search engine similar to gmail - ruby-on-rails

I'm looking for a search engine that will let my users search my website using a syntax similar to Gmail's.
My website is a map-based directory of restaurants and shops, so it would be lovely to be able to search it using strings like these:
Restaurant's name city:Boston diet:vegetarian
Restaurant's name country:Belgium tags:fast-food
Restaurant's name country:Poland diet:vegan tags:pizza
etc...
Do you have any idea what I can use to achieve this functionality? I've browsed all of the solutions on ruby-toolbox, but most of them require some kind of special search server to be set up. I can do that on my VPS, but first I would love to hear your opinion on which one is the most powerful and dev-friendly, and which one covers the functionality described above. Thank you in advance! :)

How about https://github.com/makandra/dusen gem?
It supports gmail-like token search!

You could try using a regexp to extract the search params from the request (the value part also allows hyphens, so that tags:fast-food matches):
search_pairs = params[:search].scan(/(\w+):([\w-]+)/)
# => [["country", "Poland"], ["diet", "vegan"]]
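The regexp idea can be taken one step further so that the free-text part (the restaurant's name) and the key:value filters come back separately. This is only a minimal sketch: parse_search is a hypothetical helper name, and it assumes that filter values never contain spaces (a value like "New York" would need quoting support).

```ruby
# Splits a Gmail-style query into free text and key:value filters.
# parse_search is a hypothetical helper, not part of any gem.
def parse_search(query)
  filters = {}
  # \w+ matches the key; [\w-]+ also accepts hyphenated values such as "fast-food".
  free_text = query.to_s.gsub(/(\w+):([\w-]+)/) do
    filters[$1.downcase] = $2
    '' # remove the token from the free-text part
  end
  { text: free_text.squeeze(' ').strip, filters: filters }
end

result = parse_search("Restaurant's name country:Poland diet:vegan tags:pizza")
# result[:text]    => "Restaurant's name"
# result[:filters] => {"country"=>"Poland", "diet"=>"vegan", "tags"=>"pizza"}
```

The filters hash could then be mapped onto whatever conditions your models support, while the remaining text goes into a name search.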

Related

OCR validations with Rails building a business card scanner

My goal is to write a validation class for Rails that takes OCR-recognised text from a business card, detects the string snippets, and assigns them to the correct attributes. I know this probably cannot be 100% perfect, but I want to get as close as possible. Here is my approach so far:
I scan business cards via the browser's navigator.mediaDevices API
I send the scanned image to a third party API Service, called OCRSpace (a gem is available here: https://github.com/suyesh/ocr_space)
I then get an unformatted array of recognised text snippets back, for example:
result = [['John Doe'], ['+49 160 123456'], ['Mainstr. 45a'], ['12345 Berlin'], ['CEO'], ['johndoe@business-website.de'], ['www.business-website.de']]
I then iterate through the array and do some checks, for example:
Using the people library (https://github.com/mericson/people) to split the name into first name and last name (plus the title or middle names)
Using the phonelib library (https://github.com/daddyz/phonelib) to look up a valid phone number and format it as an international string
Doing a basic regex check on the email address and storing it
What I'm still missing:
How can I find out which string is likely to be the name? Right now I let the user choose it (in my example he defines "John Doe" as the name and then the library does the rest). I'm sure I would run into conflicts when using a regex, as strings like "Main Street" would then also be recognized as a name.
How do I write a regex for a combination of ZIP code and city name? I'm not a regex expert; do you know any good sources that would help? I couldn't find any so far, except some general regex checkers.
In general: do you like my approach, or is it way too complicated? And do you know of any best practices that would work better?
Don't consider this a full answer, but it was too much to make it a comment.
Your way of working seems OK, but I wouldn't use the OCR service since there are other options; Tesseract is the best known.
If you do use it and all the results are presented in a comparable way, it doesn't seem too difficult, since every piece of info has its own characteristics.
You can identify the name part because it won't have numbers in it, while most of the other snippets do. You can also expect it to contain "Mr." or "Mrs." or the like, and not "Str.", "street" and so on. You could also use Google Maps to check for correct addresses; there are Ruby gems for that, but I have no experience with them.
Your people gem could also help.
You could guess all of this, present the results in your webpage and let the user confirm or adjust them.
You could also use a regex for the ZIP code and city combination by looking for a number and string combination in either order, but you could also use a gem like ZipCodes to help.
I'm sorry, I don't have the time right now to test some regular expressions, and I don't publish code without testing.
Hope this helps, good luck!
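To make the heuristics above more concrete, here is a rough sketch of a classifier in plain Ruby. Everything in it is an assumption for illustration: the classify helper and the regex constants are hypothetical names, the ZIP pattern only covers German-style five-digit codes followed by the city, and the name check is just "two to four words, letters only, no street keyword". Real cards will need more rules and the user confirmation step mentioned above.

```ruby
# Hypothetical patterns; tune them for the cards and countries you expect.
EMAIL_RE    = /\A[^@\s]+@[^@\s]+\.[^@\s]+\z/
WEBSITE_RE  = /\A(?:https?:\/\/)?www\.[^\s@]+\.[a-z]{2,}\z/i
ZIP_CITY_RE = /\A(\d{5})\s+(\p{L}[\p{L} .-]*)\z/ # German-style "12345 Berlin"
PHONE_RE    = /\A\+?[\d\s()\/-]{6,}\z/
STREET_RE   = /str\.|strasse|street|\d/i          # street keyword or any digit
NAME_RE     = /\A\p{L}[\p{L} .'-]*\z/

# Returns [label, value] for one OCR snippet. Order matters: the most
# specific patterns are tried first, the name guess comes last.
def classify(snippet)
  text = snippet.to_s.strip
  return [:email, text]   if text.match?(EMAIL_RE)
  return [:website, text] if text.match?(WEBSITE_RE)
  if (m = text.match(ZIP_CITY_RE))
    return [:zip_city, { zip: m[1], city: m[2] }]
  end
  return [:phone, text]  if text.match?(PHONE_RE)
  return [:street, text] if text.match?(STREET_RE)
  return [:name_candidate, text] if text.match?(NAME_RE) && text.split.size.between?(2, 4)
  [:unknown, text]
end

result = [['John Doe'], ['+49 160 123456'], ['Mainstr. 45a'], ['12345 Berlin'],
          ['CEO'], ['johndoe@business-website.de'], ['www.business-website.de']]
guesses = result.flatten.map { |snippet| classify(snippet) }
```

Snippets labelled :name_candidate could be handed to the people gem, and :phone ones to phonelib, before the guesses are shown to the user for confirmation.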

Is it possible to use gmaps4rails without a model/database?

As far as I can tell from the documentation and ReadMe for the gmaps4rails gem, you need a model declared as acts_as_gmappable in order to use this wrapper.
In my case I am simply using form_tag and text_field_tag elements to gather the addresses I want to display, and then I want to pass them through the wrapper in order to render the Google Map. I am not storing this gathered data in a database or model.
My questions are:
Can this be done with gmaps4rails? If yes can you direct me to an example of a model-less use case or give me any tips on how to do this?
If it can't be done with gmaps4rails, is there another gem/wrapper that would work? (I eventually want to show routes and directions)
I understand that I can use the original Google Maps JS V3 API, however I'm trying to keep it in Rails if possible because I'm a total newbie (business guy that decided to learn Rails and make a proto himself in order to attract a tech co-founder) and it seems like it'd be easier to use a wrapper than try to integrate with the API.
Thank you in advance for your help!
You could do this with gmaps4rails.
No need for acts_as_gmappable which is meant to geocode addresses.
Simply provide the view with something like:
@markers_json = [{lat: , lng:, description: }, {lat: , lng:, description: }].to_json
The description will be displayed in the infowindow.
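As a rough illustration of building that JSON without any model, the sketch below turns a plain array of hashes into the marker format shown above. The places array, its :name key and the build_markers_json helper are hypothetical, and the coordinates are assumed to be known already; addresses typed into a form would still have to be geocoded somewhere first.

```ruby
require 'json'

# Hypothetical data, e.g. collected from form params and geocoded elsewhere.
places = [
  { name: 'First stop',  lat: 52.52, lng: 13.405 },
  { name: 'Second stop', lat: 48.137, lng: 11.575 }
]

# Maps plain hashes to the lat/lng/description keys used in the marker JSON.
def build_markers_json(places)
  places.map do |place|
    { lat: place[:lat], lng: place[:lng], description: place[:name] }
  end.to_json
end

markers_json = build_markers_json(places)
```

In a controller the result would be assigned to an instance variable so that the view can hand it to the map helper.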

SEO: URL for detail page, include categories or not?

I'm working on a new advert website and want to implement some good SEO URLs.
I got category URLs like:
/category
/category/sub-category
This seems ok. What about detail pages?
Option 1:
/announcements-and-notices/announcements-various/15880/suscipit-dis-molestie-malesuada-vestibulum-ut.html
Option 2:
/adverts/15880/suscipit-dis-molestie-malesuada-vestibulum-ut.html
In reality my website has pretty long URLs because there are multiple areas you can shop in. So it would become:
/en/area-name/announcements-and-notices/announcements-various/15880/suscipit-dis-molestie-malesuada-vestibulum-ut.html
/en/area-name/adverts/15880/suscipit-dis-molestie-malesuada-vestibulum-ut.html
Which option would be the better URL for a detail page? The first seems better if the product has no long or good title. The second seems better because it's the most relevant and the shortest, especially with long category names.
I would like to hear your thoughts!
EDIT:
I found these two Google docs:
http://www.google.nl/url?sa=t&rct=j&q=&esrc=s&source=web&cd=1&ved=0CDYQFjAA&url=http%3A%2F%2Fwww.google.com%2Fwebmasters%2Fdocs%2Fsearch-engine-optimization-starter-guide.pdf&ei=lXyaT6T_L8zR4QSM4c2qDw&usg=AFQjCNEMj8KHxhxQz9cMLoMxMDiLdrAbJw
http://support.google.com/webmasters/bin/answer.py?hl=en&answer=76329
I think I will be going for /adverts. Anyone disagree?
I have seen many SEO analysts miss something when optimizing a webpage: a page will be optimized for only some keywords, not all of them. The length of your URL is not what matters. First analyze whether the content of your page is rich enough to justify a URL containing these keywords. If the answer is yes for every keyword, then the longer URL will give you a better rank.
I think you can even set your pages up in a way to use only the slug and skip the id, such as:
/adverts/suscipit-dis-molestie-malesuada-vestibulum-ut
or even just:
/suscipit-dis-molestie-malesuada-vestibulum-ut
and route it straight to the adverts controller and the advert that has this slug assigned to it (the one with id 15880).
This way you'll have nice and clean URLs. Just assign and keep a unique slug for each advert and handle it using .htaccess, or dynamically inside the code of your site, if the system allows it.
Cheers.
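If the site is written in Ruby, generating and de-duplicating such slugs could look roughly like the sketch below. slugify and unique_slug are hypothetical helper names, the taken list stands in for whatever lookup your system uses, and non-ASCII characters are simply replaced by hyphens here. In a Rails app, ActiveSupport's String#parameterize already covers the first step.

```ruby
# Lowercases the title and replaces every run of other characters with "-".
def slugify(title)
  title.to_s.downcase.gsub(/[^a-z0-9]+/, '-').gsub(/\A-+|-+\z/, '')
end

# Appends -2, -3, ... until the slug is not in the list of taken slugs.
def unique_slug(title, taken)
  base = slugify(title)
  slug = base
  counter = 2
  while taken.include?(slug)
    slug = "#{base}-#{counter}"
    counter += 1
  end
  slug
end

taken = ['suscipit-dis-molestie-malesuada-vestibulum-ut']
unique_slug('Suscipit dis molestie malesuada vestibulum ut', taken)
# => "suscipit-dis-molestie-malesuada-vestibulum-ut-2"
```

Storing the slug once and never regenerating it keeps old links working even if the advert title is edited later.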

twitter: search hashtags in a twitter list

I'm trying to use the 'Twitter Search Widget' to search for a #hashtag in a 'Twitter List', but I can't work out the exact query. Has anyone done this before?
Thanks in advance, and sorry for my poor English.
Francesco
There doesn't seem to be a way to directly search twitter lists using the widget. If you look at the "operators" link on this page:
https://twitter.com/#!/search-home
You can use "from:userid OR from:userid2 #hashtag" to search for #hashtag tweets from specific users, and hashtags work fine - so you could manually build a search for a list if you wanted.
You can see what operators search can take by looking at the advanced search page here:
https://twitter.com/#!/search-advanced
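Building that query by hand gets tedious for longer lists, so here is a small sketch that assembles it from an array of usernames. The list_search_query helper is hypothetical and the usernames are placeholders; it only produces the query string in the "from:userid OR from:userid2 #hashtag" form described above and does not talk to Twitter.

```ruby
# Builds "from:a OR from:b #hashtag" from a hashtag and a list of usernames.
def list_search_query(hashtag, usernames)
  tag = hashtag.start_with?('#') ? hashtag : "##{hashtag}"
  from_clause = usernames.map { |name| "from:#{name.delete_prefix('@')}" }.join(' OR ')
  [from_clause, tag].reject(&:empty?).join(' ')
end

list_search_query('hashtag', %w[userid userid2])
# => "from:userid OR from:userid2 #hashtag"
```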
If I understand correctly, you're trying to search for tweets containing a specific #hashtag, authored only by members of a certain list?
If this is the case, say for example you're looking for the hashtag #COVID19 relayed only by @Twitter's list of official Twitter accounts, you could do it with the following Twitter search query:
#COVID19 list:84839422
Try it...
The structure is quite simple...
[#hashtag] list:[list_id]
You can find list IDs by looking at the URL in the browser's address bar when opening lists on Twitter. For example, here is the URL of @Twitter's official accounts list:
https://twitter.com/i/lists/84839422
The list ID is the digits after the last /.
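Extracting the list ID from such a URL and assembling the query can be sketched in a few lines of Ruby. Both helper names are hypothetical, and the pattern assumes the URL ends in /lists/ followed by digits, as in the example above.

```ruby
# Returns the digits after the last "/" of a list URL, or nil if none match.
def list_id_from_url(url)
  url.to_s[%r{/lists/(\d+)/?\z}, 1]
end

# Builds "[#hashtag] list:[list_id]" from a hashtag and a list URL.
def hashtag_in_list_query(hashtag, list_url)
  list_id = list_id_from_url(list_url)
  return nil unless list_id

  tag = hashtag.start_with?('#') ? hashtag : "##{hashtag}"
  "#{tag} list:#{list_id}"
end

hashtag_in_list_query('COVID19', 'https://twitter.com/i/lists/84839422')
# => "#COVID19 list:84839422"
```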

Can't parse new google urls - HTTP_REFERER doesn't contain parameters anymore

It seems a little odd to me, but although everybody knows about the new Google search URLs (see Google using # instead of search? in URL. Why?), no one seems to have a problem with the HTTP_REFERER.
I'm using the referrer to parse the Google string for the search query (&q= ), but as this is now all behind a hash (#), it won't be sent to the server and all I get is "http://www.google.de/".
So do you know a way of getting the query the user searched for before landing on my site?
Due to late-2011 Google security changes, this is no longer possible when the search was performed by a signed-in Google user. See:
http://googleblog.blogspot.com/2011/10/making-search-more-secure.html
http://analytics.blogspot.com/2011/10/making-search-more-secure-accessing.html
Since there can be multiple q's in the query string, you have to match the "q" parameter globally and take the last one:
/[?&#]q=([^&#]+)/ig
Get rid of "site:" searches (there are other operators, but I haven't handled them):
.replace(/[+?&]?site:[^&#]+/g, '');
Then parse the results:
/[\w^'\(\)\{\}]+|"[^"]+"/g
This has been working well for me.
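For a Ruby backend, the "take the last q" step could be sketched with the standard uri library instead of a hand-written regex. search_query_from_referrer is a hypothetical helper; it looks at both the query string and the fragment of whatever referrer string it is given, and it returns nil when no q parameter arrived (which is what happens when the browser sends only "http://www.google.de/").

```ruby
require 'uri'

# Returns the last "q" parameter found in the referrer's query string or
# fragment, already URL-decoded, or nil if there is none.
def search_query_from_referrer(referrer)
  uri = URI.parse(referrer.to_s)
  pairs = [uri.query, uri.fragment].compact.flat_map do |part|
    URI.decode_www_form(part)
  end
  pairs.select { |key, _value| key == 'q' }.map(&:last).last
rescue URI::InvalidURIError, ArgumentError
  nil # malformed referrer or parameters that cannot be decoded
end

search_query_from_referrer('http://www.google.de/search?q=old&hl=de&q=ruby+on+rails')
# => "ruby on rails"
```

In a Rails controller the input would typically be request.referer; stripping operators such as site: from the returned string is left out of this sketch.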
