I am using the Google Books API to let a user search for a particular book and display the results. The problem is that different editions of the same book have different ISBNs. Is there any way to club these reprints together, based on the information the API returns?
I want to do this because I have the ISBNs of some of the editions in my database. So when a user searches for a book, I would like to club all the results and display them as one.
I'm not familiar with this use of the word "club", but it appears that you want to group different editions of the same book even though they have different ISBNs. I don't know how to do this solely with Google Books, but you can use the wonderful xISBN web service to look up alternate ISBNs for a book.
Hit a URL like
http://xisbn.worldcat.org/webservices/xid/isbn/0596002815
to get this response:
<rsp stat="ok">
<isbn>0596002815</isbn>
<isbn>1565928938</isbn>
<isbn>1565924649</isbn>
<isbn>0596158068</isbn>
<isbn>0596513984</isbn>
<isbn>1600330215</isbn>
<isbn>8371975961</isbn>
<isbn>059680539X</isbn>
<isbn>8324616489</isbn>
</rsp>
The response lists the original ISBN first, followed by all alternates known to WorldCat. You can then use the alternates for grouping.
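A minimal sketch in Python of parsing that response and using it for grouping (the URL pattern is the one shown above; network error handling is omitted):

```python
import urllib.request
import xml.etree.ElementTree as ET

XISBN_URL = "http://xisbn.worldcat.org/webservices/xid/isbn/{}"

def parse_isbns(xml_text):
    """Extract every <isbn> value from an xISBN response document."""
    root = ET.fromstring(xml_text)
    # endswith() tolerates a namespace prefix like {http://...}isbn
    return [el.text for el in root.iter() if el.tag.endswith("isbn")]

def fetch_alternate_isbns(isbn):
    """Fetch the queried ISBN plus all alternates WorldCat knows about."""
    with urllib.request.urlopen(XISBN_URL.format(isbn)) as resp:
        return parse_isbns(resp.read())
```

Every ISBN in the returned list can then map to the same display group, for example keyed by the first (original) ISBN.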
I have 2 Solr collections:
Ads (id, title, body, description, etc.)
AdPlacement (ad_id, placement_id, price)
Each Ad can have 500-1000 placements, with different prices.
The search use case is this: given a placement and a search keyword, I want to find the Ads that match the keyword in the title/body/description fields, sorted by the price in the AdPlacement collection for the given placement. We would like the Ad details and the price returned in the output.
Is there any way to achieve this in Solr using a join across multiple collections? What I have read so far says you can only get data from one collection and use the other one just for filtering.
Solr is a document database that supports nested documents, so ideally you would model your data so that the ad placement records are part of the Ad document. That would be the better way to handle your scenario. Please go through this blog on Solr Nested Objects and the relevant Solr documentation.
In case modifying the document structure is not an option, consider this documentation, which describes the limited joins Solr allows between collections.
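For reference, such a cross-collection query might look like the sketch below (collection and field names are taken from the question; the placement id is made up). Note that the join only *filters* the Ads — it cannot return or sort by AdPlacement.price, which is exactly the limitation described above, and fromIndex joins also require the "from" collection to be co-located with the "to" collection:

```text
GET /solr/Ads/select
    ?q=title:bike OR body:bike OR description:bike
    &fq={!join from=ad_id to=id fromIndex=AdPlacement}placement_id:42
```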
I have a Rails application featuring a city in the US. I'm working on a database process that will feature businesses that pay to be on the website. The goal is to feature businesses within an hour's drive of the city's location in order to make visitors aware of what is available. My goal is to group the businesses by city where the businesses in the city are listed first then the businesses from the next closest city are displayed. I want the cities to be listed by distance and the businesses within the city group to be listed by the business name.
I have two tables that I want to join in order to accomplish this.
city (has_many :businesses) - name, distance
business (belongs_to :city) - name, city_id, other columns
I know I can do something like the statement below, which should only show data where business rows exist for a city row.
@businesses = City.order("distance ASC").joins('JOIN businesses ON businesses.city_id = cities.id')
I would like to add an order by businesses.name. I've seen an example that references columns from two tables:
ORDER BY a.Date, p.title
Can I add code to my existing statement to order businesses by name, or will I have to embed raw SQL to do this? I have seen examples of this with other databases, but either the answer is not Rails-specific or it is not using PostgreSQL.
After lots more research I was finally able to get this working the way I wanted to.
Using .joins(:businesses) did not yield anything, because it only included the columns for City (aka BusinessCity) and no columns for Business. I found that you have to use .pluck or .select to get access to the columns from the table you are joining. That is something I did not want to do, because I foresee more columns being added in the future.
I ended up making Business the main table instead of BusinessCity as my starting point, since I was listing data from Business in my view, as stated in my initial question. When I did this I could not use the .joins(:business_cities) clause, because it said the relation did not exist, so I went back to what I had originally started with, using Business as the main table.
I came up with the following statement, which provides all the columns from both tables, ordered by distance on the BusinessCity table and name on the Business table. I was also able to add .where clauses as needed to accommodate the search functionality in my view.
@businesses = Business.joins("JOIN business_cities ON business_cities.id = businesses.business_city_id").order("business_cities.distance, businesses.name")
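The SQL that this ActiveRecord chain generates can be sanity-checked outside Rails. Here is a small sketch using Python's built-in sqlite3, with the table and column names from the answer and made-up sample rows:

```python
import sqlite3

# In-memory database mirroring the two tables from the answer;
# the rows are hypothetical sample data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE business_cities (id INTEGER PRIMARY KEY, name TEXT, distance REAL);
CREATE TABLE businesses (id INTEGER PRIMARY KEY, name TEXT, business_city_id INTEGER);
INSERT INTO business_cities VALUES (1, 'Springfield', 0), (2, 'Shelbyville', 25);
INSERT INTO businesses VALUES
  (1, 'Zeke''s Diner', 1), (2, 'Apple Cafe', 1), (3, 'Moe''s Tavern', 2);
""")

# Same join and two-level ORDER BY as the ActiveRecord statement:
# closest city's businesses first, alphabetical within each city.
rows = conn.execute("""
SELECT businesses.name
FROM businesses
JOIN business_cities ON business_cities.id = businesses.business_city_id
ORDER BY business_cities.distance, businesses.name
""").fetchall()
print([r[0] for r in rows])
```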
I am working with the Twitter streaming API, and I am a little confused about deciding the criteria for indexing the data. Right now I have a single index that contains all the tweets in one doc_type and users in another doc_type.
Is that the best way to store them, or should I create a new doc_type for every category (a category can be decided on the basis of hashtag and tweet content)?
What would be the best approach to storing such data?
Thanks in advance.
First of all, the answer to your question very much depends on your use case. What is your application doing? What do you do with the tweets? How many categories do you plan to have?
In general, however, I'd go for a solution where you use the same index and the same doc_type for all tweets. This allows you to build queries and aggregations over all your tweets without thinking about the different categories. It also allows you to add new categories easily without having to change your queries.
If you want to do some classification of the tweets, you could add a category field to the tweet document stored in Elasticsearch. You can then use this category field to implement your application-specific logic.
If your category names have spaces or punctuation marks, don't forget to define the category field as not_analyzed. Otherwise it will be broken up into parts.
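In a pre-5.x Elasticsearch mapping (the versions that use string fields with not_analyzed), that could look like the sketch below; the type name tweet follows the question, the rest is illustrative:

```json
{
  "mappings": {
    "tweet": {
      "properties": {
        "category": { "type": "string", "index": "not_analyzed" }
      }
    }
  }
}
```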
Could you please provide me some details on Mahout recommendations using data with multiple factors? I have data with user id, book, language, category, etc. Suppose a person read a book in the thriller category, in the French language. Now, considering all those facts, I need to recommend a book to him. Could you please give me some insight on picking the right path?
This is just the thing for Mahout 1.0, where we create models for a search engine to index and query.
The models are called indicators and are lists of similar items for each item. Similar in the sense that they were purchased by the same people. This is the essence of a cooccurrence recommender.
The collaborative filtering data is the book read, or its ID. If you recommend a book, you can show other IDs with the same title for multiple formats (ebook, audio, paperback, etc.). The metadata can be used to skew recs toward a certain category. The language is probably a filter, unless you think your audience is usually multilingual.
Create the CF-type indicator by feeding purchases into Mahout 1.0's spark-itemsimilarity. Out will come a list of similar books for each book. Index those in a search engine. Then the simplest query is the user's history of books purchased. This will yield unskewed recommendations as an ordered list of books.
Now, to skew results toward the user's most favored category, index the categories for each item in a separate field. So the index has a field for "indicators" and one for "categories". The "docs" are really the items/books in your catalog. The skewed query is (pseudo-code):
query:
field: indicators; q: "book1 book2 book3 book10" //the user's purchase history
field: categories; q: "user's-favorite-category user's-second-favorite-category"
field: language; filter: "list-of-languages-of-books-the-user-has-purchased"
You can put as many categories in the query on that field as you wish, perhaps all the user has purchased from. Note the use of a language filter; you may want to use this as a skewing factor rather than a filter. In this way you can seamlessly integrate collaborative filtering recs, skewed or filtered by metadata, to get higher-quality recs. Any metadata you think will help can be used.
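A rough translation of that pseudo-query into an Elasticsearch-style bool query, sketched in Python. The field names follow the pseudo-code above; the boost values are illustrative assumptions, not something prescribed by Mahout:

```python
def skewed_rec_query(purchase_history, favorite_categories, languages):
    """Build a query that ranks by indicator matches, skews by category,
    and filters by language."""
    return {
        "query": {
            "bool": {
                "should": [
                    # Collaborative-filtering part: match the user's history
                    # against each item's indicator list.
                    {"match": {"indicators": {
                        "query": " ".join(purchase_history), "boost": 2.0}}},
                    # Metadata skew: favor the user's preferred categories.
                    {"match": {"categories": {
                        "query": " ".join(favorite_categories), "boost": 1.0}}},
                ],
                # Hard filter on the languages of books the user has bought.
                "filter": [
                    {"terms": {"language": languages}},
                ],
            }
        }
    }

q = skewed_rec_query(["book1", "book2", "book3", "book10"],
                     ["thriller"], ["french"])
```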
BTW, you will get even better recs if you add in other actions you have recorded, like views of book details. This calls for a specially processed indicator called a cross-cooccurrence indicator, which is also calculated by spark-itemsimilarity. In fact, you can include just about any action the user takes--the entire clickstream--as separate cross-cooccurrence indicators. This will tend to greatly increase the amount of collaborative filtering data you can use in making recs and therefore improve quality.
This idea can even be extended to actions on items that are not books, like categories. If a user purchases a book they also, in a sense, purchase a category. If you record these "category purchases" as a secondary action and create a cross-cooccurrence indicator with them you can use them both to skew results and as a purchase indicator. The query would look like this:
query:
field: indicators; q: "book1 book2 book3 book10" //the user's purchase history
field: category-indicators; q: "user's-history-of-purchased-categories"
field: categories; q: "user's-favorite-category user's-second-favorite-category"
field: language; filter: "list-of-languages-of-books-the-user-has-purchased"
Read about spark-itemsimilarity here: http://mahout.apache.org/users/recommender/intro-cooccurrence-spark.html. This includes some discussion of how to use a search engine (Solr, Elasticsearch) for the index and query part.
I'm hooking into the Yelp v2.0 API, and I'm wondering if there is a way to retrieve the list of categories and subcategories. I know the list is available here http://www.yelp.com/developers/documentation/category_list but there doesn't seem to be a way to retrieve it. I'd like a source to retrieve it from so that it's not hard-coded in my application and will stay up to date.
Here is a link to all of the categories in JSON format: https://raw.github.com/Yelp/yelp-api/master/category_lists/en/category.json
You can download this programmatically. The good thing about this page is that it gives the categories as they exist in the Yelp ontology.
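Once downloaded, the file can be flattened into a simple lookup. A sketch in Python — note that the field names below ("alias", "title", "parents") are an assumption about the file's structure; inspect the real category.json before relying on them:

```python
import json

# Hypothetical sample mirroring the assumed structure of category.json.
sample = """[
  {"alias": "food", "title": "Food", "parents": []},
  {"alias": "bubbletea", "title": "Bubble Tea", "parents": ["food"]},
  {"alias": "puertorican", "title": "Puerto Rican", "parents": ["restaurants"]}
]"""

def title_by_alias(raw_json):
    """Map each category alias to its display title."""
    return {c["alias"]: c["title"] for c in json.loads(raw_json)}

titles = title_by_alias(sample)
```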
[Edit]
Now you can get the JSON of all categories from all countries:
https://www.yelp.com/developers/documentation/v2/all_category_list/categories.json
Bad news: it's not sorted by country; it's all of them together.
I know you won't like it, but I recommend not using the JSON from GitHub and parsing this HTML page instead.
As of this writing, the JSON from the accepted answer is 11 months old and is missing many categories. To name a few:
gift shops
shanghainese
cantonese
food trucks
beer
wine & spirits
bubble tea
puerto rican
resorts
Also note that there is a discrepancy between the categories in this HTML list, the JSON from GitHub, and the actual values used on each business's web page in how the word "and" is rendered. In some cases it's wine & spirits and in others it's wine and spirits. Be careful with that.
PS. I'm not (yet) a Scala guru, but here is how I parse the HTML: