I am using md-highlight-text to highlight words in a list of checkbox labels based on a search. But I want to highlight multiple searched words. Is there no option/flag for this in the directive?
Code example from md site:
<input placeholder="Enter a search term..." ng-model="searchTerm" type="text">
<ul>
<li ng-repeat="result in results" md-highlight-text="searchTerm">
{{result.text}}
</li>
</ul>
Here I want to highlight multiple words typed in the input.
Because this question came up in my top search results on the web, and because it was a bit unclear, I'm going to answer it multiple ways.
Of course, as mentioned in lorenzo montanan's answer, you do need to provide some CSS for the highlight (I think so, at least).
If the OP (or you, the reader) was asking to highlight multiple words in the results, there is now an md-highlight-flags option which could help (see the md-highlight-text documentation); it currently works like this:
md-highlight-flags - string - A list of flags (loosely based on JavaScript RegExp flags).
Supported flags:
g: Find all matches within the provided text
i: Ignore case when searching for matches
$: Only match if the text ends with the search term
^: Only match if the text begins with the search term
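For example, to highlight every occurrence of the search term regardless of case, you would combine the g and i flags (this just extends the snippet from the docs above):
<li ng-repeat="result in results" md-highlight-text="searchTerm" md-highlight-flags="gi">
  {{result.text}}
</li>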
If, however, you want to type multiple words into the input and have the output highlight each word separately, without them having to appear in the same order, then md-highlight-text will not do that, and its maintainers are not interested in adding it (see: request for highlighting multiple input words and another request). One way to do it is to write your own filter or directive; a rough sketch follows.
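Here is a minimal sketch of such a filter, assuming you split the search box value on whitespace and wrap each match yourself. The filter name highlightWords and the highlight CSS class are illustrative, not part of Angular Material:
angular.module('myApp').filter('highlightWords', function ($sce) {
  return function (text, searchTerm) {
    var result = text || '';
    (searchTerm || '').split(/\s+/).filter(Boolean).forEach(function (word) {
      // Escape regex metacharacters so each typed word is matched literally.
      var escaped = word.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
      result = result.replace(new RegExp('(' + escaped + ')', 'gi'),
        '<span class="highlight">$1</span>');
    });
    return $sce.trustAsHtml(result);
  };
});
In the template you would then use ng-bind-html instead of interpolation, e.g. <li ng-repeat="result in results" ng-bind-html="result.text | highlightWords:searchTerm"></li>, and give .highlight whatever background you like in your CSS.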
Related
I need to perform multiple substitutions on text coming from a database before displaying it to the user.
My example is for data most likely found in a CRM, and the output is HTML for the web, but the question generalizes to any other text-substitution need and to any programming language. In my case I use PHP, but it's more an algorithm question than a PHP question.
Problem
Each of the 3 examples below is super-easy to do via regular expressions. But combining them in a single pass is not so straightforward, and even multi-step substitutions fall short: they interfere with each other.
Question
Is there a design pattern for doing multiple interfering text substitutions?
Example #1 of substitution: The IDs.
We work with IDs. The IDs are sha-1 digests. IDs are universal and can represent any entity in the company, from a user to an airport, from an invoice to a car.
So in the database we can find this text to be displayed to a user:
User d19210ac35dfc63bdaa2e495e17abe5fc9535f02 paid 50 EUR
in the payment 377b03b0b4e92502737eca2345e5bdadb1262230. We sent
an email a49c6737f80eadea0eb16f4c8e148f1c82e05c10 to confirm.
We want all IDs to be translated into links so the user viewing the info can click them. There's one general URL for decoding IDs; let's assume it's http://example.com/id/xxx
The transformed text would be this:
User <a href="http://example.com/id/d19210ac35dfc63bdaa2e495e17abe5fc9535f02">d19210ac35dfc63bdaa2e495e17abe5fc9535f02</a> paid 50 EUR
in the payment <a href="http://example.com/id/377b03b0b4e92502737eca2345e5bdadb1262230">377b03b0b4e92502737eca2345e5bdadb1262230</a>. We sent
an email <a href="http://example.com/id/a49c6737f80eadea0eb16f4c8e148f1c82e05c10">a49c6737f80eadea0eb16f4c8e148f1c82e05c10</a> to confirm.
Example #2 of substitution: The Links
We want anything that resembles a URI to be clickable. Let's focus only on the http and https protocols and forget the rest.
If we find this in the database:
Our website is http://mary.example.com and the info
you are requesting is in this page http://mary.example.com/info.php
would be converted into this:
Our website is <a href="http://mary.example.com">http://mary.example.com</a> and the info
you are requesting is in this page <a href="http://mary.example.com/info.php">http://mary.example.com/info.php</a>
Example #3 of substitution: The HTML
When the original text contains HTML it must not be sent raw, as it would be interpreted. We want to change the < and > chars into the escaped forms &lt; and &gt;. The HTML5 translation table also includes the & symbol, to be converted to &amp;. This also affects the translation of the Message-IDs of the emails, for example.
For example if we find this in the database:
We need to change the CSS for the <code> tag to a pure green.
Sent to John&Partners in Message-ID: <aaa#bbb.ccc> this morning.
The resulting substitution would be:
We need to change the CSS for the &lt;code&gt; tag to a pure green.
Sent to John&amp;Partners in Message-ID: &lt;aaa#bbb.ccc&gt; this morning.
Alright... But... combinations?
Up to here, every change "per se" is super-easy.
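For instance, on their own the three rules are little more than one replace call each. A rough, illustrative sketch (shown in JavaScript for brevity, though the question is language-agnostic; the patterns are simplified and the example.com URLs are the ones assumed above):
// Naive rule-at-a-time substitutions, illustrative only.
// Each one works fine in isolation; the trouble starts when they are chained.
var linkIds = function (s) {
  return s.replace(/\b[0-9a-f]{40}\b/g,
    '<a href="http://example.com/id/$&">$&</a>');
};
var linkUrls = function (s) {
  return s.replace(/https?:\/\/[^\s"<>]+/g, '<a href="$&">$&</a>');
};
var escapeHtml = function (s) {
  return s.replace(/&/g, '&amp;').replace(/</g, '&lt;').replace(/>/g, '&gt;');
};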
But when we combine things we want them to still look "natural" to the user. Let's assume the original text contains HTML, and one of the tags is an <a> tag. We still want to see the complete tag "displayed", with the HREF clickable, and also the anchor text if it was a link.
Combination sample: #2 (inject links) then #3 (flatten HTML)
Let's say we have this in the database:
Paste this <a class="dark" href="http://example.com/data.xml">Download</a> into your text editor.
If we first apply #2 to transform the links and then #3 to encode the HTML, this is what happens:
Applying rule #2 (inject links) to the original, the link http://example.com/data.xml is detected and substituted with <a href="http://example.com/data.xml">http://example.com/data.xml</a>, giving:
Paste this <a class="dark" href="<a href="http://example.com/data.xml">http://example.com/data.xml</a>">Download</a> into your text editor.
which obviously is broken HTML and makes no sense; but, in addition, applying rule #3 (flatten HTML) on the output of #2 we would have:
Paste this &lt;a class="dark" href="&lt;a href="http://example.com/data.xml"&gt;http://example.com/data.xml&lt;/a&gt;"&gt;Download&lt;/a&gt; into your text editor.
which in turn is merely the flat HTML representation of the broken HTML, and not clickable. Wrong output: neither #2 nor #3 was satisfied.
Reversed combination: First #3 (flatten HTML) then #2 (inject links)
If I first apply rule #3 to "flatten all the HTML" and afterwards apply rule #2 to "inject link HTML", this is what happens:
Original (same than above):
Paste this <a class="dark" href="http://example.com/data.xml">Download</a> into your text editor.
Result of applying #3 (flatten HTML)
Paste this &lt;a class="dark" href="http://example.com/data.xml"&gt;Download&lt;/a&gt; into your text editor.
Then, when we apply rule #2 (inject links), it seems to work:
Paste this &lt;a class="dark" href="<a href="http://example.com/data.xml">http://example.com/data.xml</a>"&gt;Download&lt;/a&gt; into your text editor.
This works because " is not a valid URL char, so the detector sees http://example.com/data.xml with its exact limits.
But... what if the original text also had a link inside the link text? This is a very common scenario. Like this original text:
Paste this <a class="dark" href="http://example.com/data.xml">http://example.com/data.xml</a> into your text editor.
Then applying #3 and afterwards #2 would give this:
Paste this &lt;a class="dark" href="<a href="http://example.com/data.xml">http://example.com/data.xml</a>"&gt;<a href="http://example.com/data.xml&lt;/a&gt;">http://example.com/data.xml&lt;/a&gt;</a> into your text editor.
HERE WE HAVE A PROBLEM
As all of &, ; and / are valid URL characters, the URL parser finds http://example.com/data.xml&lt;/a&gt; as the URL instead of ending at the .xml point.
This would result in this wrong output:
Paste this &lt;a class="dark" href="<a href="http://example.com/data.xml">http://example.com/data.xml</a>"&gt;<a href="http://example.com/data.xml&lt;/a&gt;">http://example.com/data.xml&lt;/a&gt;</a> into your text editor.
So http://example.com/data.xml&lt;/a&gt; got substituted with a link pointing to http://example.com/data.xml&lt;/a&gt;, but the problem is that the URL was not correctly detected.
Let's mix it up with rule #1
If rules #2 and #3 are a mess when processed together, imagine mixing them with rule #1 when we have a URL that contains a sha-1, like this database entry:
Paste this <a class="dark" href="http://example.com/id/89019b16ab155ba1c19e1ab9efdb9134c8f9e2b9">http://example.com/id/89019b16ab155ba1c19e1ab9efdb9134c8f9e2b9</a> into your text editor.
Could you imagine??
Tokenizer?
I have thought of creating a syntax tokenizer, but I feel it's overkill.
Is there a design pattern?
I wonder if there's a design pattern to read and study for doing multiple text substitutions: what is it called, and where is it documented?
If there's no such pattern... then... is building a syntax tokenizer the only solution?
I feel there must be a much simpler way to do this. Do I really have to tokenize the text into a syntax tree and then re-render it by traversing the tree?
The design pattern is the one you already rejected, left-to-right tokenisation. Of course, that's easier to do in languages for which there are code generators which produce lexical scanners.
There's no need to parse or to build a syntax tree. A linear sequence of tokens suffices. In effect, the scanner becomes a transducer. Each token is either passed through unaltered, or is replaced immediately with the translation required.
Nor does the tokeniser need to be particularly complicated. The three regular expressions you currently have can be used, combined with a fourth token type representing any other character. The important part is that all patterns are tried at each point, one is selected, the indicated replacement is performed, and the scan resumes after the match.
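A minimal sketch of such a transducer, to make the idea concrete (illustrative only, in JavaScript; the ID and URL patterns are simplified, and the example.com URLs come from the question):
// Single-pass transducer sketch, not production code: at each position the
// ID, URL and "any other character" token types are tried, the winning token
// is emitted already transformed, and scanning resumes after the match.
function escapeHtml(s) {
  return s.replace(/&/g, '&amp;').replace(/</g, '&lt;').replace(/>/g, '&gt;');
}

function render(input) {
  var idPattern = /^[0-9a-f]{40}\b/;         // rule #1: sha-1 IDs
  var urlPattern = /^https?:\/\/[^\s"<>]+/;  // rule #2: links
  var out = '';
  var rest = input;
  while (rest.length > 0) {
    var m;
    if ((m = idPattern.exec(rest))) {
      out += '<a href="http://example.com/id/' + m[0] + '">' + m[0] + '</a>';
    } else if ((m = urlPattern.exec(rest))) {
      out += '<a href="' + escapeHtml(m[0]) + '">' + escapeHtml(m[0]) + '</a>';
    } else {
      out += escapeHtml(rest.charAt(0));     // rule #3: escape anything else
      rest = rest.slice(1);
      continue;
    }
    rest = rest.slice(m[0].length);
  }
  return out;
}
Because plain-text characters are escaped as they are consumed, the <a> tags injected for rules #1 and #2 are never re-escaped, and an already-consumed URL is never scanned a second time, which is exactly what the chained substitutions in the question could not guarantee.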
When users paste items from MS Word, for example numbered lists or bullet points, Trix leaves the symbols in but does not apply the default style rules; the symbols stay in the text with their original indenting.
I want to replace pasted bullet points with '<li>' tags, since that is how the browser renders a list, or at least apply the default style rules to the text.
As a workaround I was thinking of using JavaScript/CoffeeScript to replace all instances of '•' with <li> during a paste, using onPaste=''. However, this is problematic, since the implementation could cause unforeseen effects.
Another way might be to create a regex to remove the symbols and do it just-in-time while pasting.
Any other suggestions for achieving this would be welcome.
Edit
/\d\.\s+|[a-z]\)\s+|•\s+|[A-Z]\.\s+|[IVX]+\.\s+[•].\s/g
This regex can find numbered-list markers, and a simple replace on the pasted string gives the desired result.
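For what it's worth, the replace itself is tiny. A rough sketch (it reuses the pattern above with the last alternative simplified, and is not tied to any particular Trix event):
// Strip common list markers ("1. ", "a) ", "• ", "A. ", "IV. ") from pasted text.
var listMarker = /\d\.\s+|[a-z]\)\s+|•\s+|[A-Z]\.\s+|[IVX]+\.\s+/g;

function stripListMarkers(pasted) {
  return pasted.replace(listMarker, '');
}

// stripListMarkers('1. First point\n2. Second point')
//   => 'First point\nSecond point'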
I am trying to develop an artificial bot. I found that AIML can be used for achieving such a goal, and I found these points regarding the AIML parsing done by Program O:
1.) All letters in the input are converted to UPPERCASE
2.) All punctuation is stripped out and replaced with spaces
3.) Extra whitespace characters, including tabs, are removed
From there, Program O performs a search in the database, looking for all potential matches to the input, including wildcards. The returned results are then “scored” for relevancy and the “best match” is selected. Program O then processes the AIML from the selected result, and returns the finished product to the user.
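In code, that pre-processing step looks roughly like this (a sketch of the three points above, not Program O's actual source):
// Rough sketch of the input normalisation described above.
function normalizeInput(input) {
  return input
    .toUpperCase()                // 1.) convert all letters to uppercase
    .replace(/[^\w\s]/g, ' ')     // 2.) strip punctuation, replace with spaces
    .replace(/\s+/g, ' ')         // 3.) collapse extra whitespace, incl. tabs
    .trim();
}

// normalizeInput('Hello,\tbot!  How are you?') => 'HELLO BOT HOW ARE YOU'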
I am just wondering how to define the score and find the relevant answer closest to the user's input.
Any help or ideas will be appreciated
#user3589042 (rather cumbersome name, don't you think?)
I'm Dave Morton, lead developer for Program O. I'm sorry I missed this at the time you asked the question. It only came to my attention today.
The way that Program O scores the potential matches pulled from the database is this:
Is the response from the aiml_userdefined table? yes=300/no=0
Is the category for this bot, or its parent (if it has one)? this=250/parent=0
Does the pattern have one or more underscore (_) wildcards? yes=100/no=0
Does the current category have a <topic> tag? yes(see below)/no=0
a. Does the <topic> contain one or more underscore (_) wildcards? yes=80/no=0
b. Does the <topic> directly match the current topic? yes=50/no=0
c. Does the <topic> contain a star (*) wildcard? yes=10/no=0
Does the current category contain a <that> tag? yes(see below)/no=0
a. Does the <that> contain one or more underscore (_) wildcards? yes=45/no=0
b. Does the <that> directly match the current topic? yes=15/no=0
c. Does the <that> contain a star (*) wildcard? yes=2/no=0
Is the <pattern> a direct match to the user's input? yes=10/no=0
Does the <pattern> contain one or more star (*) wildcards? yes=1/no=0
Does the <pattern> match the default AIML pattern from the config? yes=5/no=0
The script then adds up the points for all the passed tests listed above, and also adds a point for each word in the category's <pattern> that also matches a word in the user's input. The AIML category with the highest score is considered to be the "best match". In the event of a tie, the script will then select either the "first" highest scoring category, the "last" one, or one at random, depending on the configuration settings. This selected category is then returned to other functions for parsing of the XML.
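Purely as an illustration of how those weights combine, the scoring could be sketched like this (this is not Program O's actual code, and the field names are made up):
// Illustrative scoring sketch only -- not the real Program O implementation.
// cat is assumed to look like:
//   { fromUserDefinedTable, belongsToThisBot, pattern, topic, that }
function scoreCategory(cat, userInput, currentTopic, defaultPattern) {
  var score = 0;
  if (cat.fromUserDefinedTable) score += 300;
  if (cat.belongsToThisBot) score += 250;
  if (cat.pattern.indexOf('_') !== -1) score += 100;
  if (cat.topic) {
    if (cat.topic.indexOf('_') !== -1) score += 80;
    if (cat.topic === currentTopic) score += 50;
    if (cat.topic.indexOf('*') !== -1) score += 10;
  }
  if (cat.that) {
    if (cat.that.indexOf('_') !== -1) score += 45;
    if (cat.that === currentTopic) score += 15;
    if (cat.that.indexOf('*') !== -1) score += 2;
  }
  if (cat.pattern === userInput) score += 10;
  if (cat.pattern.indexOf('*') !== -1) score += 1;
  if (cat.pattern === defaultPattern) score += 5;
  // Plus one point per word of the <pattern> that also appears in the input.
  var inputWords = userInput.split(/\s+/);
  cat.pattern.split(/\s+/).forEach(function (w) {
    if (inputWords.indexOf(w) !== -1) score += 1;
  });
  return score;
}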
I hope this answers your question.
In other words, if I search for "beer can", I'll get a number of results that include phrases like "... beer. Can ...". I want to exclude all results that do not contain the exact term "beer [space] can"; I want to rule out any punctuation between the words in the results.
I thought typing "-." or "-'.'" would work, but it doesn't.
Don't know if it helps, but have you tried [beer can]? Possibly with additional exclude phrases such as -"beer cans", -"beercan" etc.?
For a Rails 3 app, I'm looking for a simple way to generate 'Did you mean ...' style search tips when a search over a record's title doesn't hit on a substring match because of slightly different punctuation or phrasing.
Most commonly, I want to generate hits for 'Alpha: Beta' when the user searches for 'Alpha Beta', for 'Alpha & Beta' when they search for 'Alpha and Beta', and for 'Alpha Beta' when they search for 'The Alpha Beta', for example. The same goes in the opposite direction for the first two examples; my current substring searching already catches the latter case. I would prefer to do this without specific logic for each of the above examples though, as there may be other variants I can't think of right now.
I'd also prefer to shy away from a solution that requires me to populate a hidden field of the record with alternate spellings as records are generated, which is then searched over instead of the publicly displayed one.
I'm guessing that a proper full-text search like Sphinx/Thinking Sphinx would accomplish this, but I want to check if there's an easier solution for my limited-scope problem. Ideally something that automatically generates this hidden field by stripping out common words like 'the' and 'and' and punctuation like '&' and ':' from both the record title and the search term, and then does the search. The order of the remaining words still has to match, though ('Alpha Beta Gamma' can match 'Alpha, Beta, Gamma' but not 'Alpha, Gamma, Beta').
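Something like the following would cover the normalisation described above (a sketch in JavaScript just to show the idea; the same few lines port directly to Ruby, and the stop-word list is made up):
// Strip common words and punctuation before comparing titles or search terms.
var STOP_WORDS = ['the', 'and', 'a', 'an', 'of'];   // illustrative list

function normalizeTitle(title) {
  return title
    .toLowerCase()
    .replace(/[&:,.!?'"()-]/g, ' ')                  // drop punctuation
    .split(/\s+/)
    .filter(function (w) { return w && STOP_WORDS.indexOf(w) === -1; })
    .join(' ');
}

// normalizeTitle('Alpha: Beta')      => 'alpha beta'
// normalizeTitle('The Alpha & Beta') => 'alpha beta'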
This solution doesn't meet all of your requirements, but I believe it's close enough to be worth mentioning - the excellent "scoped_search" gem, available at https://github.com/wvanbergen/scoped_search
It implements a simple query language where a search for 'alpha beta' matches results containing all those words, rather than the exact phrase - see the wiki at https://github.com/wvanbergen/scoped_search/wiki/query-language for more information on what it supports.
It generates SQL queries behind the scenes, so doesn't require a separate search daemon like Sphinx.
However, I don't believe it does anything similar to stripping out common words. Perhaps you could get some mileage by manually stripping out your common words, and then getting scoped_search to search for your revised term?