I know that this question, or variations of it, has been discussed many times.
But I haven't found a solution that fits my case exactly.
I have many tables, say 10. I would like to search table1.field1, table2.field2, ... table10.field10 using only one form input field. As the user types a search phrase, the autocomplete field has to suggest a list of matching results, each with a note saying which table it comes from.
The first idea that comes to mind is denormalization, i.e. creating a separate table Table_for_search(field1, ..., field10) that contains the data from the tables mentioned above.
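A minimal sketch of that denormalization idea, written in Python with SQLite purely for illustration. It assumes a slightly different layout from Table_for_search(field1, ..., field10): one row per searchable value, with the originating table stored next to it. The names search_entries, build_search_table and suggest are hypothetical, and every source table is assumed to have an id column.

```python
import sqlite3

def build_search_table(conn, sources):
    """Copy each (table, column) pair into one search table, remembering its origin."""
    conn.execute("DROP TABLE IF EXISTS search_entries")
    conn.execute(
        "CREATE TABLE search_entries (source_table TEXT, source_id INTEGER, term TEXT)"
    )
    for table, column in sources:
        # Table and column names come from our own trusted list, never from user input.
        conn.execute(
            f"INSERT INTO search_entries (source_table, source_id, term) "
            f"SELECT '{table}', id, {column} FROM {table}"
        )
    conn.execute("CREATE INDEX idx_search_term ON search_entries (term)")

def suggest(conn, phrase, limit=10):
    """Return (term, source_table) suggestions whose term starts with the phrase."""
    rows = conn.execute(
        "SELECT term, source_table FROM search_entries "
        "WHERE term LIKE ? ORDER BY term LIMIT ?",
        (phrase + "%", limit),
    )
    return rows.fetchall()

# Hypothetical sample data standing in for two of the ten tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (id INTEGER PRIMARY KEY, field1 TEXT);
    CREATE TABLE table2 (id INTEGER PRIMARY KEY, field2 TEXT);
    INSERT INTO table1 (field1) VALUES ('apple'), ('banana');
    INSERT INTO table2 (field2) VALUES ('apricot'), ('cherry');
""")
build_search_table(conn, [("table1", "field1"), ("table2", "field2")])
print(suggest(conn, "ap"))  # [('apple', 'table1'), ('apricot', 'table2')]
```

The sketch does not escape % or _ in the typed phrase, and the search table has to be rebuilt or kept in sync whenever the source tables change.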
My problem is that I would rather not use heavy solutions like Sphinx, ElasticSearch, etc. Lighter solutions like meta_search.gem and ransack.gem also don't look suitable for my situation.
Thanks in advance.
I have data containing candidates who look for a job. The original data I got was a complete mess but I managed to enhance it. Now, I am facing an issue which I am not able to resolve.
One candidate record looks like
https://i.imgur.com/LAPAIbX.png
Since ML algorithms cannot work with categorical data, I want to encode this. My goal is to have a candidate record looking like this:
https://i.imgur.com/zzsiDzy.png
What I need to change is to add a new column for each possible value that exists in Knowledge1, Knowledge2, Knowledge3, Knowledge4, Tag1, and Tag2 of original data, but without repetition. I managed to encode it to get way more attributes than I need, which results in an inaccurate model. The way I tried gives me newly created attributes Jscript_Knowledge1, Jscript_Knowledge2, Jscript_Knowledge3 and so on, for each possible option.
If the explanation is not clear enough please let me know so that I could explain it further.
Thanks and any help is highly appreciated.
Cheers!
I have some understanding of your problem based on your explanation, so I will try to describe how I would approach it. If that does not solve your problem, I may need more explanation to understand it. Let's get started.
For all the candidate data that you have, collect a master skill/knowledge list
This list becomes your columns
For each candidate, if he has this skill, the column becomes 1 for his record else it stays 0
This is the essence of one-hot encoding. However, since the same skill is scattered across multiple columns, you are struggling to encode it automatically.
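The master-list steps above can be sketched in plain Python as follows. The function name multi_hot and the sample records are hypothetical; only the column names and the Jscript value come from the question.

```python
def multi_hot(records, columns):
    """Encode each record as 0/1 flags over one master list of values."""
    # Master list: every distinct value found in any of the given columns.
    master = sorted({rec[col] for rec in records for col in columns if rec.get(col)})
    encoded = []
    for rec in records:
        # Values this record holds, regardless of which column they sit in.
        present = {rec[col] for col in columns if rec.get(col)}
        encoded.append({value: int(value in present) for value in master})
    return master, encoded

# Hypothetical candidates: the same skill appears in different columns.
candidates = [
    {"Knowledge1": "Jscript", "Knowledge2": "SQL"},
    {"Knowledge1": "Python", "Knowledge2": "Jscript"},
]
skills, rows = multi_hot(candidates, ["Knowledge1", "Knowledge2"])
print(skills)  # ['Jscript', 'Python', 'SQL']
print(rows[0])  # {'Jscript': 1, 'Python': 0, 'SQL': 1}
print(rows[1])  # {'Jscript': 1, 'Python': 1, 'SQL': 0}
```

Because the position of a skill is ignored, Jscript produces a single column rather than one column per original position.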
An alternative approach could be:
For each candidate, collect all the knowledge skills as a list and assign it to one column for knowledge, and collect the tags as another list assigned to a second column, instead of the current 4 (Knowledge) + 2 (Tag) columns.
Sort the knowledge (and tag) list alphabetically within this column.
Automatic one-hot encoding after this may yield fewer columns than before.
Hope this helps!
I need some advice please. I have been looking for help on this topic, but it's not something you find very often. I am also quite new to Zend, so please excuse my terminology.
I have a few large SQL queries coming up. Most of my other queries are quite small, just a couple of joins etc., but these ones consist of many statements (dropping and creating temporary tables which together feed the final select). For example:
DROP table if exists tmp_abc;
CREATE temporary table tmp_abc as SELECT .... From ... Group By //finish statement
Consider 20 more of these, then a final select which pulls a lot of data from one table.
Can anyone offer some advice on the best way to tackle this problem?
Would this be possible using some raw SQL adapter? I am kind of tempted to abandon the MVC principle for this, given the complexity and size of the query, but I would like to know which approach I should take in the future.
One possibility is to do most of it at the database level: one or more stored procedures, and maybe a view or two if needed. Then select from that.
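As a rough illustration of keeping the whole pipeline in one place and calling it as a unit, here is a sketch in Python with SQLite. SQLite has no stored procedures, so a function wrapping a multi-statement script stands in for one here. The orders table, tmp_totals and run_report are hypothetical names, not part of the original queries.

```python
import sqlite3

# Hypothetical multi-statement report: each step builds a temporary table,
# and the final SELECT reads from the last one.
REPORT_STEPS = """
    DROP TABLE IF EXISTS tmp_totals;
    CREATE TEMPORARY TABLE tmp_totals AS
        SELECT customer, SUM(amount) AS total
        FROM orders
        GROUP BY customer;
"""
REPORT_SELECT = "SELECT customer, total FROM tmp_totals ORDER BY total DESC"

def run_report(conn):
    """Run the preparatory statements, then return the rows of the final SELECT."""
    conn.executescript(REPORT_STEPS)
    return conn.execute(REPORT_SELECT).fetchall()

# Hypothetical sample data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (customer TEXT, amount INTEGER);
    INSERT INTO orders VALUES ('ann', 10), ('bob', 5), ('ann', 7);
""")
print(run_report(conn))  # [('ann', 17), ('bob', 5)]
```

The calling code only sees run_report, so the application layer stays small no matter how many temporary-table steps the script contains.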
The use case is this:
I'd like to let my user search from a single text box, then on the search results page organize the results by class, essentially.
So for example, say I have the following models configured for Thinking Sphinx: Post, Comment and User. (In my situation I have about 10 models, but for clarity on StackOverflow I'm pretending there are only 3.)
When I do a search similar to: ThinkingSphinx.search 'search term', :classes => [Post, Comment, User], I'm not sure of the best way to iterate through the results and build out the sections of my page.
My first inclination is to do something like:
Execute the search
Iterate over the returned result set and do a result.is_a?(ClassType)
Based on the ClassType, add the item to 1 of 3 arrays -- @matching_posts, @matching_comments, or @matching_users
Pass those 3 instance variables down to my view
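The steps above amount to bucketing a mixed result list by class. A minimal sketch of that bucketing, written in Python for illustration only: the three classes are empty stand-ins for the real models, and group_by_class is a hypothetical helper.

```python
from collections import defaultdict

# Empty stand-ins for the real models.
class Post: ...
class Comment: ...
class User: ...

def group_by_class(results):
    """Bucket mixed search results by the class of each item, in one pass."""
    groups = defaultdict(list)
    for item in results:
        # Items keep their original (relevance) order inside each bucket.
        groups[type(item).__name__].append(item)
    return groups

results = [Post(), User(), Post(), Comment()]
groups = group_by_class(results)
print({name: len(items) for name, items in groups.items()})
# {'Post': 2, 'User': 1, 'Comment': 1}
```

In Ruby, Enumerable#group_by keyed on each result's class gives the same bucketing without a chain of is_a? checks.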
Is there a better or more efficient way to do this?
Thank you!
I think it comes down to what's useful for people using your website. Does it make sense to have the same query run across all models? Then ThinkingSphinx.search is probably best, especially from a performance perspective.
That said, do you want to group search results by their respective classes? Then some sorting is necessary. Or are you separating each class's results, like a GitHub search? Then having separate collections may be worthwhile, like what you've already thought of.
At the most basic level, you could just return everything sorted by relevance instead of class, and then render slightly different output depending on each result. A case statement may help with this. It's best to keep as much of the logic as possible in helpers and/or partials.
If you only have 3 models to search, why not call model.search on each instead of ThinkingSphinx.search? That removes the need for result.is_a? checks and makes it easier to display the results for each model the way you want.
I was thinking about text-driven search based on user input.
Often you are searching a database of addresses, where you can find customers and so on.
Does anybody have an idea how to find out which of the typed words is the name, which is the street name, and which is the company name?
And secondly, if the name is a double name like "Lee Harvey", how can I find out that the two words Lee and Harvey belong together?
Same problem with company names like "frank the baker inc."...
Is there any algorithm or best practice strategy?
Thanks for links, tutorials, scripts and all other help ;-)
What you basically want is a search engine :) Here are the basic steps you need to follow -
You need to create an 'Inverted Index' of the content you want to be searched on.
The index is a set of 'name'=>'value' pairs. You can structure these pairs whichever way you want (tuned according to your data & needs).
E.g. for your problem of double names, you could split all your names into single words & index them like so -
'lee'=>'lee harvey'
'harvey'=>'lee harvey'
...
This way, when anyone searches for 'lee' they get 'lee harvey'. There are other, better approaches to this, such as "n-gram" indexing. Check it out...
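A minimal sketch of the word-level inverted index described above, in Python. The function name build_inverted_index is hypothetical, and tokenization is a plain whitespace split, which is an assumption (punctuation such as the dot in "Inc." stays attached to its word).

```python
from collections import defaultdict

def build_inverted_index(values):
    """Map each lowercase word to the set of full values that contain it."""
    index = defaultdict(set)
    for value in values:
        for word in value.lower().split():
            index[word].add(value)
    return index

names = ["Lee Harvey", "Frank the Baker Inc."]
index = build_inverted_index(names)
print(sorted(index.get("lee", ())))     # ['Lee Harvey']
print(sorted(index.get("harvey", ())))  # ['Lee Harvey']
print(sorted(index.get("baker", ())))   # ['Frank the Baker Inc.']
```

Either word of a double name leads back to the full name, which is what keeps "Lee" and "Harvey" together in the results.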
You could possibly build indexes of names, addresses, emails etc. & when the user types a query, check it against all your indexes with the approach suggested above. After you get the results, merge them. Maybe you could introduce the notion of rank so that you can sort your results & show the latest or most relevant ones at the top. For this you need to figure out a way to score your terms...
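One way to sketch the per-field indexes, the merge and a rank, in Python. The scoring used here (one point per query word found in a field) is only an assumption chosen for brevity, and the records, field names and function names are hypothetical.

```python
from collections import Counter, defaultdict

def build_field_indexes(records, fields):
    """One inverted index per field: word -> set of record ids."""
    indexes = {field: defaultdict(set) for field in fields}
    for rec_id, rec in enumerate(records):
        for field in fields:
            for word in rec[field].lower().split():
                indexes[field][word].add(rec_id)
    return indexes

def search(indexes, query):
    """Return (record id, score, matched fields), best score first."""
    scores = Counter()
    matched = defaultdict(set)
    for word in query.lower().split():
        for field, index in indexes.items():
            for rec_id in index.get(word, ()):
                scores[rec_id] += 1          # crude score: one point per hit
                matched[rec_id].add(field)   # remember where the word was found
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(rec_id, score, sorted(matched[rec_id])) for rec_id, score in ranked]

# Hypothetical address records.
records = [
    {"name": "Lee Harvey", "street": "Main Street", "company": "Frank the Baker Inc."},
    {"name": "John Smith", "street": "Harvey Road", "company": "Smith Ltd"},
]
indexes = build_field_indexes(records, ["name", "street", "company"])
print(search(indexes, "lee harvey"))
# [(0, 2, ['name']), (1, 1, ['street'])]
```

The matched-fields list shows which field each query word was found in, and the score puts the record where both words hit the same name ahead of the one that only matches a street.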
Don't worry about it; just perform a full-text search. Then check, for each result item, which field contains the search terms. You may also display items in separate lists (terms found in name, terms found in address). The only difficulty is that if John Smith lives on John Smith Street, you must decide which list or lists the result item belongs to.
I'm writing a web robot which categorizes sites, based on their keywords/meta/links, into a predefined list of categories.
I've been looking at various ontology approaches and have looked at WordNet (for the hypernym/hyponym relations), ResearchCyc and WebKb, and I was wondering whether this is as hard a problem as I think, or whether it has been solved somewhere else before.
Essentially I have large stacks of sorted keyword values and would like to use them to match against a category name. My current thoughts are to check against the category name in some kind of ontology hierarchy.
Has anyone else approached an ontology-based problem like this?
Cheers!
You might want to look at text mining research, specifically keyword mining or subject indexing.