Google advanced boolean query - search-engine

I am trying to create an advanced boolean query that searches for derogatory content on the web, for use in back-grounding. I have tried "AROUND" in Google and "near:" in Bing; neither seems to work. Why don't they work? Is there a better way?
The query would be something like:
Firstname [and within 2 words] Lastname AND Lastname [and within 15 words] accus* OR appeal OR arraign* OR arrest* OR controvers* OR convict* OR scam* OR unlawful OR threat* OR "no confidence" OR scandal* OR felon* OR lawsuit OR unethical [etc, other derogatory keywords]

"Why don't they work?"
Many of the big search engines these days don't apply boolean constraints as a strict requirement of the search. Instead, they're used as 'hints', which is to say that the pages that conform are given higher scores, but you'll also see pages that don't conform if they seem to be a closer match to the terms. The AROUND function is handled in the same way (and it worked when I tried it).
"Is there a better way?"
You may find something that meets your requirements here:
http://searchenginewatch.com/article/2065129/Search-Features-Chart
Though it may be out of date.
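Since the engines treat proximity only as a hint, one option is to enforce the "within N words" rule yourself on text you have already retrieved. The following is a minimal sketch of that idea, not something any search engine provides: the function name within_n_words, the tokenizing regex, and the choice to measure distance as the difference in word positions are all assumptions made for illustration.

```python
import re

def within_n_words(text, name, keyword_pattern, max_gap):
    """Return True if `name` occurs within `max_gap` words of a keyword match.

    `keyword_pattern` is a regular expression matched against whole words,
    e.g. r"arrest\w*" to emulate the wildcard arrest*.
    """
    # Lowercase and split into word tokens (letters, digits, ' and -).
    tokens = re.findall(r"[\w'-]+", text.lower())
    keyword_re = re.compile(keyword_pattern)
    name_positions = [i for i, tok in enumerate(tokens) if tok == name.lower()]
    keyword_positions = [i for i, tok in enumerate(tokens) if keyword_re.fullmatch(tok)]
    # Strict check: some pair of positions must be close enough.
    return any(abs(i - j) <= max_gap
               for i in name_positions
               for j in keyword_positions)

sample = "Last week Jane Doe was arrested in connection with the case."
print(within_n_words(sample, "Doe", r"arrest\w*|convict\w*|lawsuit", 15))
```

A page that fails this check can be discarded even if the search engine returned it.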

Related

Use Boolean Operator(mini-language) in where clause

I have implemented Searchkick in my Rails app, which has all the search features a recruiter app could need. I have been working on this app for the last 2 years, and my search query is mature enough to handle aggregated data and some conditional clauses as well.
Now I want to support the "mini-language" (Boolean operators) that Elasticsearch provides through query_string.
Boolean operators supported by query_string:
https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-query-string-query.html#query-string-syntax
I want to use query_string with my existing Searchkick query. I know I can use Advanced Search, but for that I would have to replace my current Searchkick query with a raw Elasticsearch query. I don't want to do that because it is a big change for my project, which is live and has 1000+ users.
I just want to fit query_string into my Searchkick query in some way, without having to replace it with an ES query.
Is this possible?
I have the same problem.
But, if I understood the situation correctly, the Searchkick maintainers decided not to support this behavior.
They mentioned it here:
https://github.com/ankane/searchkick/issues/97
and here:
https://github.com/ankane/searchkick/issues/1494
But some workarounds may help you well enough.
You may do something like this:
Product.search body: {query: {query_string: {query: 'white shoes and water washable', default_field: 'product_description'}}}
I found it in this conversation:
https://github.com/ankane/searchkick/issues/1390#issuecomment-596657334
Note that query_string only recognizes uppercase AND, OR and NOT as Boolean operators, so the lowercase "and" in the example above is treated as an ordinary search term rather than an operator.
Or you may just combine the needed fields into one in the search_data method, which may be even easier and is probably a more robust solution.
Please see the reference below:
https://stackoverflow.com/a/51778715/2703618

how to create a replicable, unique code for a pre-ISBN book

I am putting my collection of some 13000 books in a MySQL database. Most of the copies I possess
can be identified uniquely by ISBN. I need to use this distinguishing code as a foreign key into
another database table.
However, quite a few of my books date from pre-ISBN ages. So for these, I am trying to devise a
scheme to uniquely assign a code, sort of like an SKU.
The code would be strictly for private use. It should have the important property that, when I
obtain a pre-ISBN publication, I could build the code from inspecting the work, and based on the
result search the database to see if I already have other copies in my possession.
Many years ago I think I saw a search scheme for some university(?) catalogue, where you could
perform a search of a title based on a concatenated string (or code) that was made up of let's
say 8 letters from the title, and 4 from the author, and maybe some other data. For example,
to search 'The Nature of Space and Time' by Stephen Hawking and Roger Penrose you might perform
a search on the string 'Nature SHawk', being comprised of 8 characters from the title (omitting
non-filing words and stopwords) and 4 from the author(s).
I haven't been able to find any information on such schemes, or whether or not such an approach
was standardized in any way.
Something along these lines could be made up of course, but I was wondering if people here have
heard of such schemes, or have ideas on how to come to a solution to this.
So keep in mind the important property of 'replicability': using the scheme, inspection of a pre-
ISBN dated work should --omitting very special or exclusive cases-- in general lead to a code
that can singly be used to subsequently determine if such a copy is already in the database.
Thank you for your time.
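Just use the title (with author and publisher as optional additions) and a series id to produce a fake ISBN. Take a look at fake_isbn.
NOTE: use the first digit as a series id, but don't use 9! Presumably this is because real ISBN-13 codes start with 978 or 979, so a leading 9 could collide with them.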
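The title-plus-author search string described in the question can be sketched in a few lines of Python. This is only an illustration, not a standardized scheme: the function name book_key, the stopword list, the choice of the last word of the author's name as the surname, and the padding with spaces are all assumptions.

```python
import re

# Hypothetical list of non-filing words; extend it for other languages.
STOPWORDS = {"a", "an", "and", "of", "the", "in", "on", "to"}

def book_key(title, author, title_len=8, author_len=4):
    """Build a replicable key: 8 characters of title + 4 of the author's surname."""
    # Keep only the words of the title that are not stopwords.
    words = [w for w in re.findall(r"[A-Za-z0-9]+", title)
             if w.lower() not in STOPWORDS]
    # Cut (or pad) the remaining title to a fixed width.
    title_part = " ".join(words)[:title_len].ljust(title_len)
    # Assume the surname is the last word of the first author's name.
    name_words = re.findall(r"[A-Za-z]+", author)
    surname = name_words[-1] if name_words else ""
    author_part = surname[:author_len].ljust(author_len)
    return title_part + author_part

print(book_key("The Nature of Space and Time", "Stephen Hawking"))  # Nature SHawk
```

Because the key depends only on what is printed on the book, the same inspection always yields the same string, which can then be stored in an indexed column and looked up before adding a new copy. Two different works can still produce the same key, so it is better used to find candidate duplicates than as a guaranteed unique identifier.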

How to implement fuzzy search

I'm using the Neo4j 3 REST API. I have a node named customer with properties such as name, and I need to search for customers by name. For example, for the input "joan" I should get results for the name "john". How can I implement fuzzy search to get these results?
Thanks in advance
First off, I want to make sure that you know that if you're using Neo4j 3.x, 3.x is currently in beta and isn't considered stable yet.
You have two options to implement a fuzzy search in Neo4j. You can use the legacy indexes to implement Lucene-based indexing. That should provide anything that Lucene can do, though you'd probably need to do a bit more work. You can also implement your own unmanaged extension, which will allow you to use Lucene a bit more directly.
Perhaps the easier alternative is to use Elasticsearch with Neo4j and have Elasticsearch do your full-text indexing. You might take a look at the Neo4j and ElasticSearch page on neo4j.com. There they provide a link to a GitHub repository for a Neo4j plugin which automagically updates Elasticsearch with data from Neo4j and which provides an endpoint for querying your graph fuzzily. There is also a video tutorial on how to do this.
You will have to try the approach described at https://neo4j.com/developer/kb/how-to-perform-a-soundex-search/, which will work in this case. With an ordinary match, the input "Joan" will not return "John"; only a shorter input such as "jo" would return both. To get the result you are expecting, you will have to use a Soundex search.
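To see why a Soundex search matches "Joan" to "John", here is a minimal plain-Python sketch of the classic American Soundex encoding. It is not Neo4j's implementation and does not use any Neo4j API; the names CODES and soundex are illustrative only.

```python
# Consonant groups of American Soundex; vowels, Y, H and W have no code.
CODES = {}
for letters, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                       ("L", "4"), ("MN", "5"), ("R", "6")):
    for ch in letters:
        CODES[ch] = digit

def soundex(name):
    """Return the 4-character Soundex code of `name` (letter + 3 digits)."""
    letters = [c for c in name.upper() if c.isalpha()]
    if not letters:
        return ""
    first = letters[0]
    digits = []
    prev = CODES.get(first, "")
    for ch in letters[1:]:
        if ch in "HW":
            continue  # H and W do not separate letters with the same code
        code = CODES.get(ch, "")
        if code and code != prev:
            digits.append(code)
        prev = code  # a vowel resets prev, so a repeated code counts again
    # Pad with zeros and truncate to one letter plus three digits.
    return (first + "".join(digits) + "000")[:4]

print(soundex("Joan"), soundex("John"))  # J500 J500
```

Both names encode to J500, so comparing the codes instead of the raw strings treats them as a match.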
Stepping back a little, what is the problem you are trying to solve with fuzzy matching?
My experience has been that misspellings and typos are far less common than you might think, and humans prefer exact matches whenever possible. If there is no exact match (often just missing a space between words), that's a good time to use a spellchecker, and that's where the fuzzy matching should kick in.
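The "exact match first, fuzzy only as a fallback" approach can be sketched with Python's standard difflib module. This is an illustration under assumptions: the function name find_customer and the cutoff of 0.7 are arbitrary choices, and a real system would query the database rather than an in-memory list.

```python
import difflib

def find_customer(query, names, cutoff=0.7):
    """Return exact matches if any, otherwise the closest fuzzy matches."""
    by_lower = {name.lower(): name for name in names}
    q = query.strip().lower()
    # Step 1: prefer an exact (case-insensitive) match.
    if q in by_lower:
        return [by_lower[q]]
    # Step 2: fall back to similarity-based suggestions.
    close = difflib.get_close_matches(q, list(by_lower), n=3, cutoff=cutoff)
    return [by_lower[c] for c in close]

print(find_customer("john", ["John", "Joan", "Mary"]))  # ['John']
print(find_customer("joan", ["John", "Mary"]))          # ['John']
```

In the first call an exact match exists, so no fuzzy result is offered; in the second, "John" is returned only because nothing matches "joan" exactly.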
In addition, your example would match "joan" to "john", but some synonyms like "joanie" would be more useful. If you have a big corpus of content to work with, you may be able to extract some relationships, using fuzzy & machine learning to identify "joanne" and "joni" as possible synonyms and then submit that to a human curator. "Jon" looks like a related name but it's not, while "jo" and even "nonie" may or may not be nicknames in these groupings.

question regarding twitter search api

I want to use the twitter search api to search for some famous person. For instance I want to search for a particular "Mr Patrick Lee C K". I would construct my search term to be something like:
http://search.twitter.com/search?q=%22lee+c+k%22+OR+%22patrick+lee%22
However, knowing that tweets are often informal, I know that sometimes people can address him by his initials 'lck'. To increase the precision of my search, I figure it would be better if my query can associate with his company, for instance my query could also be lck microsoft.
Now, I want to string these 3 search terms, "patrick lee" / "lee c k" / lck microsoft, together in one query. I will probably use OR. Then again, my last search term should not be a fixed phrase, i.e. the words lck and microsoft can be some distance from each other.
Can anyone tell me how I should link these search terms together inside one query?
Stringing your queries together with "OR" is the best way to do this: search for "patrick lee" OR "lee c k" OR lck microsoft. Note that proximity queries are not supported by the Twitter Search API.
There are a few reasons why: 1) the search query only counts once towards your API limit, despite being a fairly expensive query, and 2) even though you can't really do a proximity query for "lck microsoft", Tweets are only 140 characters, so chances are that those terms would be fairly close to each other regardless. In fact, eliminating the quotes around "lee c k" might actually raise your recall without compromising too much precision.
The features available on the Twitter Advanced Search page are the full list of features that you can use in your search query.
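Building the combined query by hand is error-prone because of the quotes and spaces, so it can help to let a library do the URL encoding. A minimal sketch, assuming the search.twitter.com endpoint from the question; the helper name build_search_url is hypothetical.

```python
from urllib.parse import urlencode

# Endpoint as given in the question.
SEARCH_URL = "http://search.twitter.com/search"

def build_search_url(terms):
    """Join the terms with OR and URL-encode them into the q parameter."""
    query = " OR ".join(terms)
    return SEARCH_URL + "?" + urlencode({"q": query})

# Quoted terms are exact phrases; the last term is two independent words.
print(build_search_url(['"patrick lee"', '"lee c k"', "lck microsoft"]))
```

The quotes are encoded as %22 and the spaces as +, which is the same form as the URL shown in the question.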

Cleaning Up Query Strings

This is more of an open question. What is your opinion on query strings in a URL? While creating sites in ASP.NET MVC you spend a lot of time thinking about and crafting clean URLs only for them to be shattered the first time you have to use query strings, especially on a search form.
For example, I recently did a fairly simple search form with half a dozen text fields and two or three lists of checkboxes and selects. This produced the query string below when submitted:
countrylcid=2057&State=England&StateId=46&Where=&DateFrom=&DateTo=&Tags=&Keywords=&Types=1&Types=0&Types=2&Types=3&Types=4&Types=5&Costs=0.0-9.99&Costs=10.00-29.99&Costs=30.00-59.99&Costs=60.00-10000.00
Beautiful I think you'll agree. Half the fields had no information in them and the list inputs are very verbose indeed.
A while ago I implemented a simple solution to this for paging which produced a url such as
www.yourdomain.com/browse/filter-on/page-1/perpage-50/
This used a catchall route to grab what is essentially a replacement query string after the filter-on portion. Works quite well but breaks down when doing form submissions.
I'd be keen to hear what other solutions people have come up with. There are lots of articles on clean URLs, but they are aimed at ASP.NET developers creating basic RESTful URLs, which MVC has covered. I am half considering diving into model binding to produce a proper solution along those lines. With the above convention the large query string could be rewritten as:
filter-on/countrylcid-2057/state-England/stateId-46/types-{1,0,2,3,4,5}/costs-{0.0-9.99,10.00-29.99,30.00-59.99,60.00-10000.00}/
Is this worth the effort?
Thanks,
My personal view is that where users are likely to want to either bookmark or pass on URLs to other people then a nice, clean "friendly" URL is the way to go. Aesthetically they are much nicer. For simple pagination and ordering then a re-written URL is a good idea.
However, for pages that have a large number of temporary, dynamic fields (such as a search), I think the humble query string is fine. Likewise for pages whose contents are likely to change significantly given the exact same URL in the future. In these cases, URLs with query strings are fine and perhaps even preferable, as they at least indicate to the observant user that the page is dynamic. However, in these cases it may be better to use form POST variables anyway, so that visitors are not tempted to "fiddle" with the values.
In addition to what others have said, a URL implies a hierarchy that is semantic. Whether true today or not, the ancestry is directories and people still think of it as such. That's why you have controller/action/id. Likewise, to me a querystring implies options or queries.
Personally, I think a rewritten URL is best when you can't tell if there's an interpreter behind it -- maybe it's just a generated HTML file?
So however you choose to do it (and it's a pain on the client in a search form -- I'd say more trouble than it's worth), I'd support you doing it for hierarchies.
E.g. /Search/Country/State/City
but once you start getting into prices and types, or otherwise having to preface a "directory" with the type of value (e.g. /prices=50.00/ or worse, with an array), then that's where you've lost me.
In fact, if all elements are filled in, then all you've really done is taken the querystring, replaced "&" with "/", and combined your arrays into a single field.
If you're going to be writing the javascript anyways, why don't you just loop through the form elements and:
Remove the empty ones, cleaning up the querystring from the "&price_low=&price_high=&" sorts of things.
Combine multiple values into an array structure
But then submit as a querystring.
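The two clean-up steps above (drop empty fields, merge repeated fields) are easy to prototype. The answer suggests doing this in JavaScript on the client; the sketch below shows the same logic in Python for illustration only. The function name clean_query_string and the use of a comma as the array separator are assumptions.

```python
from urllib.parse import parse_qsl, urlencode

def clean_query_string(raw):
    """Drop empty fields and merge repeated keys into comma-separated lists."""
    grouped = {}
    # keep_blank_values=True so that empty fields are seen and skipped explicitly.
    for key, value in parse_qsl(raw, keep_blank_values=True):
        if value == "":
            continue
        grouped.setdefault(key, []).append(value)
    combined = {key: ",".join(values) for key, values in grouped.items()}
    # safe="," keeps the separator readable instead of encoding it as %2C.
    return urlencode(combined, safe=",")

raw = "countrylcid=2057&State=England&Where=&DateFrom=&Types=1&Types=0&Types=2"
print(clean_query_string(raw))  # countrylcid=2057&State=England&Types=1,0,2
```

The server then has to split the comma-separated values back into a list, which is the price of the shorter URL.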
James
Aren't the values of the different fields available in the FormCollection anyway on post?

Resources