Indexing criteria for elasticsearch - twitter

I am working with twitter streaming api. and am a little confused about deciding the criteria for indexing the data. Right now I have a single index that contains all the tweets in one doc_type and users in another doc type.
Is it the best way to go about storing them or should i create a new doc type for every category (category can be decided on basis of hashtag and tweet content)
What should be the best approach to storing such data?
Thanks in advance.

At first, the answer to your question is that this very much depends on your use case. What is your application doing? What do you do with the tweets? How many categories do you plan to have?
I'd in general, however, go for a solution where you use the same index and the same doc_type for all tweets. This allows you to build queries and aggregations over all your tweets without thinking about the different types of categories. It also allows you to add new categories easily without having to change your queries.
If you want to do some classification of the tweets you could add a category field to the tweet document stored in elasticsearch. You can then use this category field to implement your specific application logic.
If your category names have spaces or punctuation marks don't forget to define the category field as not_analyzed. Otherwise it will be broken up in parts.

Related

Solr join across multiple collections and fetch data from both collections

I have 2 solr collections:
Ads {id, title, body, description, etc etc)
AdPlacement (ad_id, placement_id, price)
Each Ad can have 500-1000 placements, with different prices.
The search usecase is where I have a placement and some search keyword and I want to find the Ads that map the keyword provided in the title/body/description fields and it should be sorted by the price in the AdPlacement collection for the given placement. We would like to get the Ad details and the price in the output returned.
Is there any way to achieve this in solr using join across multiple collections? What I have read so far says you can only get data from one collection and use the other one just for filtering.
Solr is a Document database and supports nested documents so ideally you would want to model such that your add placement records are a part of the Ad document. This would be the better way to handle your scenario. Please go through this blog Solr Nested Objects and the relevant Solr documentation
In case modifying the document structure is not an option then consider this documentation which mentions about allowing some level of join between collections.

ActiveRecord schema for multiple types of objects with common columns?

I'm working on a Rails 5 app for Guild Wars 2, and I'm trying to figure out a way to serialize and store all of the items in the game without duplicating code or table columns. The game has a public API to get the items from, documented
here: https://wiki.guildwars2.com/wiki/API:2/items
As you can see, all of the items share several pieces of data like ID, value, rarity, etc. but then also branch off into specific details based on their type.
I've been searching around for a solution, and I've found a few answers, but none that work for this specific situation.
Single Table Inheritance: There's way too much variance between items. STI would likely end up with a table over 100 columns wide, with most of them null.
Polymorphic Associations: Really doesn't seem to be the proper way to use these. I'm not trying to create a type of model that gets included multiple other places, I just want to extend the data of my "Item" model.
Multiple Table Inheritance: This looks to me like the perfect solution. It would do exactly what I'm wanting. Unfortunately, ActiveRecord does not support this, and all of the "workarounds" I've found seem hacky and weird.
Basically, what I'm wanting is a single "Item" model with the common columns, then a "details" attribute that will fetch the type-specific data from the relevant table.
What's the best way to create this schema?
One possible solution:
Use #serialize on the details (text) column
class Item
serialize :details, Hash
end
One huge downside is that this is very inefficient if you need to query on the details data. This essentially bypasses the native abstractions of the database.
I was in a similar situation recently. I solved by using Sequel instead of ActiveRecords.
You can find it here:
https://github.com/TalentBox/sequel-rails
And an implmentation example:
http://www.matchingnotes.com/class-table-inheritance-in-rails.html
Good luck

Creating a keyword structure on Core Data and search that big animal

I am about to create a coredata database of pictures. These pictures will have keywords. One picture may have keywords like "bird", "eagle", "flying", "nest", etc.
OK I can create an entity one to many to link the Keyword entity to the Pictures entity but my problem is this, I will use the search controller to scan the database as the user types on the search box. Will be coredata fast enough to search as the user type and show the results without clogging the application?
By the way, thinking about it now, it will be a pretty complex query to send to coredata! And how do I combine queries? I mean, suppose the user search for eagle. The database can contain different types of eagles but then the user types "bald" and the app must combing "eagle" and "bald", or, pictures that contain both...
What is the better way of doing that structure
Core Data is very fast. If you index your keywords, it will make queries faster. As for combining multiple search terms you can you use an NSCompoundPredicate.

Store as JSON or separate attributes?

I'm using Rails 3.2 and MariaDB. I have this group of data:
description, services, facilities
Not indexed and purely for output in the show page. Should I store these as one JSON object in one more_info attribute or store as separate attributes?
I personally would make columns for them, it would generally make the fields easier to work with, especially if there will be need a to update the values. I usually reserve JSON serialized fields when I do not know how many attributes there will be.
If you are showing the data to your users I would recommend saving them in different columns. I find that as soon as users see something they want to filter by it or work with it in ways you have not foreseen.
If you are not then the choice is less clear cut but the very fact you have 3 distinct groups suggest that they are different things which could be treated differently as your application matures.
I would always go with the Normalised form unless you have documented reasons not to.

Returning Search Results in Rails

I am having a problem implementing a special kind of search for my Rails application. I am working on an achievement system where you can search for a set of users in a search form (e.g., the query being "Ross, Adam, Jake") and it returns all of the common achievements that the users have unlocked (e.g., if users Ross, Adam, and Jake all had an achievement named "You are winner!"). I have three tables, one for achievements, one for users, and a join table. We have tested the associations and such, so we know that works.
My first idea was to put the search terms in an array and get the search results for each item in the array and place them into respective "search result arrays". Then, I was thinking to go through each item in search result array 1 to see if it appears in both of the other result arrays. The objects that appear in all three of the search result arrays would be returned and displayed on a page.
Is there an easy way to implement this without writing a bunch of my own code? Are there some functions I should know about? Any help will be appreciated!
Well, both Ransack and it's predecessor (MetaSearch) are useful gems for creating complex search forms.
In general I think you want to do something like select distinct achievement ids for user ids in an array. Off the top of my head I'm not quite sure how you should write it... others may know.
Look at the documentation on MetaSearch (more established) and see if you see a pattern that fits, if not check Ransack (more advanced).
You can use some autocomplete plugin for user names and convert the names to ids on the fly, that way you won't have to deal with converting user names to ids in backend later.
For common achievements, if a user can have a achievement only once, aggregating the results in join table and counting the results with achievement ids would be the way to go.
You can provide more details for a more detailed answer. :)
You can use Sunspot which is allows easy solr integration with Ruby and Rails

Resources