Filter by multiple resourceIds - google-docs-api

Is there a way to query by multiple resourceIds using the .NET library?
For instance, usually I query for a single feed entry by resourceId like this:
DocumentsListQuery query = new DocumentsListQuery();
query.Uri = new Uri(string.Format("{0}/{1}", DocumentsListQuery.documentsBaseUri, doc.ResourceId));
DocumentsFeed feed = service.Query(query);
I'm wondering if there's some way to query for multiple documents by their resourceIds in a single query, instead of just fetching the whole list.

A single RESTful query can only return a single element or a feed, so there's no way to query by multiple resourceIds.
An alternative might be to specify a search query that restricts your results to the elements you want, but such search criteria only exist if your files have something in common that differentiates them from the other documents.
The documentation for search queries in the Documents List API is available at https://developers.google.com/google-apps/documents-list/#searching_for_documents_and_files, but I'd also recommend taking a look at the newer Drive API and how it handles search.

Related

Analyze similarities in model data using Elasticsearch and Rails

I would like to use Elasticsearch to analyze data and display it to the user.
When a user views a record for a model, I want to display a list of 'similar' records in the database for that model, and the percentage of similarity. This would match against every field on the model.
I am aware that with the Searchkick gem I can use a command to find similar records:
product = Product.first
product.similar(fields: ["name"], where: {size: "12 oz"})
I would like to take this further and compare entire records (and eventually associations).
Is this feasible with Elasticsearch / Searchkick in Rails, or should I use another method to analyze the data?
There is a feature built exactly for this purpose in Elasticsearch called more_like_this. The documentation for the mlt query goes into great detail about how you can achieve exactly what you want to do.
The content you provide in the like field will be analyzed, and the most relevant terms from each field will be used to retrieve documents containing as many of those relevant terms as possible. If you have all your records stored in Elasticsearch, you can use the Multi GET syntax to specify a document already in your index as the content of the like field, like this:
"like" : [
{
"_index" : "model",
"_type" : "model",
"_id" : "1"
}
]
Remember that you cannot use index aliases when using this syntax (so you'll have to do a document lookup first if you are not sure which index your document is currently residing in).
If you don't specify the fields field, all fields in the source document will be used. To avoid bad surprises, my suggestion is to always specify the list of fields you want your similar documents to match.
If you have non-textual fields that you want to match perfectly with the source document, you might want to consider using a bool query, programmatically creating the filter section to limit documents returned by the mlt query to only a filtered subset of your entire index.
You can build these queries in Searchkick using the advanced search feature, manually specifying the body of search requests.
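For instance, here is a minimal sketch of such a raw query sent through Searchkick's body: option, combining more_like_this with a bool filter as described above. The field names come from the question's example, Product.search_index.name is assumed to return the index Searchkick created for the model, and the mlt thresholds are placeholders to tune for your data:
product = Product.first

results = Product.search(body: {
  query: {
    bool: {
      must: {
        more_like_this: {
          fields: ["name", "description"],   # only compare these text fields
          like: [
            { _index: Product.search_index.name, _id: product.id.to_s }   # assumes Searchkick exposes the index name this way
          ],
          min_term_freq: 1,
          min_doc_freq: 1
        }
      },
      # Exact-match constraints on non-text fields go in the filter section,
      # as described above (assumes size is indexed as an exact-match field).
      filter: [
        { term: { size: product.size } }
      ]
    }
  }
})

similar_names = results.map(&:name)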
Read up on using More Like This Query. This is the query produced by product.similar(). It operates only on text fields. If you also want to compare numeric or date fields, you'll have to incorporate these rules into a scoring script to do what you're asking.

Dynamic Queries using Couch_Potato

The documentation for creating a fairly straightforward view is easy enough to find:
view :completed, :key => :name, :conditions => 'doc.completed === true'
How, though, does one construct a view with a condition created on the fly? For example, if I want to use a query along the lines of
doc.owner_id == my_var
where my_var is set programmatically.
Is this even possible? I'm very new to NoSQL so apologies if I'm making no sense.
Views in CouchDB are incrementally built / indexed as data is inserted / updated into that particular database. So in order to take full advantage of the power behind views you won't want to dynamically query them. You'll want to construct your views in such a way that you can efficiently access the data based on the expected usage patterns of the application. In my experience it's not uncommon to have multiple views each giving you a different way to access / query the same data. I find it helpful to think of CouchDB views as a way to systematically denormalize your documents.
On the other hand there are also ways to generalize your indexes in your views so you can use a single view for endless combinations of queries.
For example, you have an "articles" database, and each article document contains a list of tags. If you want to set up a query to dynamically retrieve all articles tagged with a handful of tags, you could emit multiple entries to the view on the same document:
// this article is tagged with "tag1","tag2","tag3"
emit("tag1",doc._id);
emit("tag2",doc._id);
emit("tag3",doc._id);
....
Now you have a way to query: Give me all articles tagged with these words: ["tag1","tag2",etc]
For more info on how to query multiple keys see "Parameter -> keys" in the table of Querying Options here:
http://wiki.apache.org/couchdb/HTTP_view_API#Querying_Options
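Tying this back to Couch_Potato: the view itself stays static, and the "dynamic" part (your doc.owner_id == my_var condition) becomes a query option passed when the view is called. A minimal sketch, assuming a hypothetical Article model and that Couch_Potato forwards standard CouchDB query options such as :key and :keys straight through to the view request:
class Article
  include CouchPotato::Persistence

  property :owner_id
  property :completed

  # The index is defined once, keyed on owner_id ...
  view :by_owner, :key => :owner_id
end

# ... and the "dynamic" condition is supplied at query time:
my_var = 42  # set programmatically, e.g. the current user's id
CouchPotato.database.view(Article.by_owner(:key => my_var))

# The same mechanism covers the multi-key tag lookup described above:
CouchPotato.database.view(Article.by_owner(:keys => [42, 43, 44]))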
One problem with the tag example above is that it would produce duplicates if a single document was tagged with more than one of the tags you were querying for. You can easily de-dupe the results of the view by using a CouchDB "list function". More info about list functions can be found here:
http://guide.couchdb.org/draft/transforming.html
Another way to construct views for even more robust "dynamic" access to the data would be to compose your indexes out of complex data types such as JavaScript arrays. Also incorporating "range queries" can help. So for example if you have a 3-item array in your index, but only have the first 2 values, you can set up a range query to pull all documents that match the first 2 items of the array. Some useful info about that can be found here:
http://guide.couchdb.org/draft/views.html
Refer to the "startkey", and "endkey" options under "Querying Options" table here:
http://wiki.apache.org/couchdb/HTTP_view_API#Querying_Options
It's good to know how CouchDB indexes itself. It uses a "B+ tree" data structure:
http://guide.couchdb.org/draft/btree.html
Keep this in mind when thinking about how to compose your indexes, because it has specific implications for how you need to construct them. For example, you can't expect to get good performance on a view if you query with a range on the first item of the array:
startkey = [a,1,2]
endkey = [z,1,2]
You'll get the performance you'd expect if your query is:
startkey = [1,2,a]
endkey = [1,2,z]
This, in more general terms, means that index order does matter when querying views. Not just on basis of performance, but on basis of what documents will be returned. If you index a document in a view with [1,2,3], you can't expect it to show up in query for index [3,2,1], [2,1,3], or any other combination.
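Here's a hedged Couch_Potato sketch of that composite-key idea, assuming the library accepts an array for :key and passes :startkey / :endkey through to CouchDB unchanged:
class Article
  include CouchPotato::Persistence

  property :owner_id
  property :created_at

  # Composite key: the exactly-matched part comes first, the ranged part last,
  # for the reasons described above.
  view :by_owner_and_date, :key => [:owner_id, :created_at]
end

# All of owner 42's articles, any date. The range is on the trailing part of the
# key, which the B+ tree index can serve efficiently ({} sorts after other values):
CouchPotato.database.view(
  Article.by_owner_and_date(:startkey => [42], :endkey => [42, {}])
)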
In my experience, most data-access problems can be solved elegantly and efficiently with CouchDB and the basic tools it provides. If / when your project needs truly dynamic access to the data, I generally still use CouchDB for common data access needs, but I'll also integrate ElasticSearch via its CouchDB river plugin, which streams your data from CouchDB into ElasticSearch as it becomes available:
http://www.elasticsearch.org/
https://github.com/elasticsearch/elasticsearch-river-couchdb

Applying distinct on specific field in CloudSearch query

I am evaluating AWS CloudSearch for our system's new search engine.
Assume there are articles and some comments written on each article. The search API should return articles that either match the query themselves or have any matching comments. Is there any way to retrieve DISTINCT values (in this case, the unique ID of the article) from CloudSearch in a single query? If not, what would be a good way to meet this requirement with CloudSearch?
I know there's a text-array type for document fields in CloudSearch, but it seems expensive to update documents, since the number of comments on a single article can run into the thousands.
I faced a similar problem. Putting the comments on the article documents is not an option in your case, since an array field cannot hold more than 1,000 elements in CloudSearch. I would create two search domains, articles and comments, and issue the search query to both of them in parallel (async or multithreaded, depending on the language). The articles query will never return duplicate IDs, but for the results of the comments query you have to count each article ID only once and always pick its top hit, since results are sorted by matching score.
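A rough sketch of that approach in Ruby, using plain HTTP against the 2013-01-01 search endpoints. The domain endpoints, the article_id return field on comment documents, and the exact shape of the returned field values are all assumptions to adapt to your setup:
require "net/http"
require "json"
require "cgi"

# Placeholder endpoints -- substitute your own CloudSearch domain search endpoints.
ARTICLES_ENDPOINT = "https://search-articles-xxxxxxxx.us-east-1.cloudsearch.amazonaws.com"
COMMENTS_ENDPOINT = "https://search-comments-xxxxxxxx.us-east-1.cloudsearch.amazonaws.com"

def cs_search(endpoint, query, extra_params = "")
  uri = URI("#{endpoint}/2013-01-01/search?q=#{CGI.escape(query)}&size=100#{extra_params}")
  JSON.parse(Net::HTTP.get(uri)).dig("hits", "hit") || []
end

query = "chocolate cake"

# Query both domains in parallel.
article_thread = Thread.new { cs_search(ARTICLES_ENDPOINT, query) }
comment_thread = Thread.new { cs_search(COMMENTS_ENDPOINT, query, "&return=article_id") }

article_hits = article_thread.value
comment_hits = comment_thread.value

# Hits come back sorted by relevance, so keeping only the first occurrence of each
# article id is the same as "pick the top one" per article.
article_ids = article_hits.map { |hit| hit["id"] }
comment_article_ids = comment_hits.map { |hit| Array(hit.dig("fields", "article_id")).first }.compact

matching_article_ids = (article_ids + comment_article_ids).uniq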

Does Youtube API have "AND and OR" search and explicit match search?

Does the YouTube API allow me to do searches like:
Search for videos whose titles contain both Lady Gaga AND (Cyrus OR Muse).
And does it allow me to do searches like:
Search for videos whose titles contain exactly Katy Perry. I don't want titles that contain Katy Elizabeth Perry.
What's the most efficient way to write that type of search request? I want to code it using Ruby on Rails.
I've gone through various introductions to searching YouTube, but they mainly cover other filtering options, such as relevance and view counts.
AND is supported, along with include and exclude operators, just like a search query in the web UI.
You can use -{query term} to exclude a query term, and the pipe character (|) to OR terms together,
like {lady -gaga}, or as a request URL:
https://www.googleapis.com/youtube/v3/search?part=snippet&q=lady+-gaga&key={YOUR_API_KEY}
You can also make separate calls, put results into sets and do all these operations in your client.
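A minimal Ruby sketch of such a request, using only the q operators mentioned above (the API key is a placeholder, and keep in mind that q matches video metadata generally, not only the title):
require "net/http"
require "json"
require "cgi"

API_KEY = "YOUR_API_KEY"   # placeholder

# Spaces act as AND, "|" ORs the adjacent terms, and "-" excludes a term,
# so "Lady Gaga AND (Cyrus OR Muse)" becomes:
query = "lady gaga cyrus|muse"

uri = URI("https://www.googleapis.com/youtube/v3/search" \
          "?part=snippet&type=video&maxResults=25" \
          "&q=#{CGI.escape(query)}&key=#{API_KEY}")

response = JSON.parse(Net::HTTP.get(uri))
response["items"].each { |item| puts item["snippet"]["title"] }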

Using Ferret to build unique tag clouds

I've been using Ferret as my full-text search engine in a small project I'm working on.
Through the documentation and a few examples online, I've been able to pull together a tag cloud generator that uses the full-text index via the IndexReader.terms method.
It's worked quite well up to now; the problem comes when I want to get term data based on a search result.
For example, if the user searches for "cake", I want to show them a tag cloud of terms used in association with the term "cake".
Is there a way to use the terms method in association with a search result set, or something similar?
Currently I'm using the following method to generate my list of tags:
reader = Ferret::Index::IndexReader.new(Scrape.find_last_index_version)
terms = []
reader.terms(:all_quotes).each do |term, doc_freq|
  terms << [term, doc_freq]
end
Cheers.
It's more like a term frequency chart (like a wordle) than a tag cloud? Or are these in a tag field? Anyway, the index doesn't keep track of term frequency within each possible document subset (such as the results of a search), so that method wouldn't be fast even if it existed. For a single document, you can get the TermFreqVector and provide suggested documents that are good matches for other frequent terms in that document. So, you could take some of the top results, grab the term vectors from each one, and just add them up, but those aggregate functions don't exist natively (search libraries generally try not to put slow operations in there).
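If you want to experiment with that aggregation yourself, here's a rough sketch of the idea. It assumes the :all_quotes field was indexed with term vectors enabled and that each term-vector entry responds to #text and #positions, so treat it as a starting point rather than a drop-in solution:
require 'ferret'

path = Scrape.find_last_index_version   # same index the question's code opens

index  = Ferret::Index::Index.new(:path => path)
reader = Ferret::Index::IndexReader.new(path)

freqs = Hash.new(0)

# Take the top N hits for the user's query ...
index.search_each("cake", :limit => 25) do |doc_id, _score|
  # ... and add up the term frequencies stored in each hit's term vector.
  tv = reader.term_vector(doc_id, :all_quotes)
  next unless tv
  tv.terms.each do |t|
    freqs[t.text] += (t.positions ? t.positions.size : 1)
  end
end

# The most frequent terms across the top results become the "cake" tag cloud.
top_terms = freqs.sort_by { |_term, count| -count }.first(50)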
