Why do search engines ignore symbols? [closed] - search-engine

Closed. This question is off-topic. It is not currently accepting answers.
Closed 11 years ago.
Searching for symbols is common in programming, especially when you are new to a language.
For example, I had a question about the :: operator in Python, and that is not searchable. People looking for things like this, or for Object[] (an array of Objects), would not find what they want.
Why do search engines seem to ignore symbols completely? They are just characters like any others. I can see why it would be hard to extract semantics from symbols compared to words (e.g. a search engine can figure out that "find", "finds", and "found" are all related, if not the same word), but is it really that hard to search for them?
I can also see why in everyday use you'd want symbols to be ignored, but how hard would it be to make it look for something explicitly (e.g. "::" would search for ::)?

Check out this article on Interpreting Google Search Queries.
Specifically, section 9:
Google ignores some punctuation and special characters, including ! ? , . ; [ ] @ / # < > .
Because punctuation is typically not as important as the text around it, Google ignores most punctuation in your search terms. There are exceptions, e.g., C++ and $99. Mathematical symbols, such as /, <, and >, are not ignored by Google's calculator.
[ Dr. Ruth ] returns the same results as [ Dr Ruth ]
What if you're seeking information that includes punctuation that Google ignores, e.g., an email address? Just enter the whole thing including the punctuation.
* [ info@amazon.com ]
Be aware that web pages sometimes camouflage email addresses to make collecting such information difficult for spammers. For example, on some sites you'll find the @ sign in an email address replaced with the word “at.”
Now we'll look at some special characters that Google doesn't ignore.

Symbols are ignored to minimize the number of entries in the index.
A search engine doesn't have to ignore them though. For example, it seems Google Code doesn't.
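To make the index-size point concrete, here is a rough sketch in Python. Both tokenizers are hypothetical illustrations (the names word_tokens and code_tokens are made up here), not how Google or any real engine is implemented. A word-oriented tokenizer never emits a token for "::", so there is nothing in the index to look up, while a symbol-preserving one keeps it at the cost of more index entries.

```python
import re

def word_tokens(text):
    """Word-oriented tokenizer: keep runs of ASCII letters/digits, drop everything else."""
    return re.findall(r"[A-Za-z0-9]+", text.lower())

def code_tokens(text):
    """Symbol-preserving tokenizer: also keep runs of punctuation as their own tokens."""
    return re.findall(r"[A-Za-z0-9]+|[^\sA-Za-z0-9]+", text.lower())

if __name__ == "__main__":
    query = "x = a[::2]"
    print(word_tokens(query))  # symbols are gone, only the words remain
    print(code_tokens(query))  # symbols survive as separate tokens
```

With the first tokenizer a query consisting only of "::" produces no tokens at all, which matches the behaviour described in the question.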

Related

What is your opinion about blocking languages based on sy-langu [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Closed 2 years ago.
Roses are red,
violets are blue
Unexpected 'Langu'
on line 32.
Well, to put things short: a technical limitation forced me to ignore Cyrillic material texts (short texts from table MAKT).
As an ABAP guy, I always take the pragmatic solution: I have excluded the languages manually by browsing through table T002 and googling if the languages are based on Cyrillic characters.
It works, but it is not sexy. Feedback is appreciated.
REPORT Y_TEST_BLOCK_LANGU.
DATA langu_logon TYPE c LENGTH 2.
DATA langu_selected TYPE c LENGTH 1.
CONSTANTS:
  BEGIN OF language,
    german  TYPE c LENGTH 1 VALUE 'D',
    english TYPE c LENGTH 1 VALUE 'E',
  END OF language.
" Convert the internal one-character logon language to its two-letter ISO code.
CALL FUNCTION 'CONVERSION_EXIT_ISOLA_OUTPUT'
  EXPORTING
    input  = sy-langu
  IMPORTING
    output = langu_logon.
" Fall back to English for the manually identified languages.
IF langu_logon = 'BG'
   OR langu_logon = 'KK'
   OR langu_logon = 'RU'
   OR langu_logon = 'SR'
   OR langu_logon = 'SH'
   OR langu_logon = 'UK'.
  langu_selected = language-english.
ELSE.
  langu_selected = sy-langu.
ENDIF.
START-OF-SELECTION.
  PARAMETERS p_matnr TYPE matnr.
  " Host variables are escaped with @ in the newer Open SQL syntax.
  SELECT SINGLE maktx FROM makt INTO @DATA(maktx)
    WHERE matnr = @p_matnr
      AND spras = @langu_selected.
  WRITE: / 'This is the text',
         / maktx,
         / 'for Material number',
         / p_matnr.
Alternatively, you can reverse-exclude starting from the material texts themselves. SELECT all short texts from MAKT that include undesired characters - whatever these are. Then track back which languages these texts belong to. Then put these languages on a deny list. The involved SELECTs may be too time-intensive for online processing, but could be repeated on a regular basis, to fill the DB-persisted deny list.
As some others already noted, the much cleaner solution would be to enable your UI to display those characters correctly. Or, if that is not possible, you could at least mask them, for example by escaping them to their HTML entities or Unicode code points. This will not look nice, but at least the UI will display something.
Also make sure you investigate other languages, as Dragonthoughts suggests. If your texts include other non-Latin scripts, you may run into the same trouble in other places. Accidentally hitting Chinese characters may sound far-fetched, but think of widely used Greek symbols such as alpha, epsilon, or omega, which might well occur in otherwise English product descriptions.
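The reverse-exclude idea can be sketched outside of ABAP. The following is only an illustration in Python, assuming the short texts have already been exported as a mapping from language key to texts; the function names and the sample data are made up. It classifies each letter by the first word of its Unicode character name (LATIN, CYRILLIC, GREEK, ...) and puts every language that uses a script outside the allowed set on the deny list.

```python
import unicodedata

def scripts_used(text):
    """Return the script prefixes (first word of the Unicode name) of all letters in text."""
    scripts = set()
    for ch in text:
        if ch.isalpha():
            name = unicodedata.name(ch, "")
            if name:
                scripts.add(name.split()[0])
    return scripts

def languages_to_block(texts_by_language, allowed=frozenset({"LATIN"})):
    """Return the sorted language keys whose texts contain letters outside the allowed scripts."""
    blocked = set()
    for language, texts in texts_by_language.items():
        for text in texts:
            if scripts_used(text) - allowed:
                blocked.add(language)
                break
    return sorted(blocked)

if __name__ == "__main__":
    # Hypothetical sample data, not taken from a real MAKT table.
    sample = {"E": ["Pump 5kW"], "R": ["Насос"], "D": ["Größe α"]}
    print(languages_to_block(sample))
```

Note that the German sample ends up on the deny list because of a single Greek letter, which is exactly the situation the last paragraph warns about.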

Regular Expression in MVC5 [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 6 years ago.
What regular expression would allow letters and numbers only, with no special characters or spaces, in ASP.NET MVC 5?
You generally use ranges such as [a-z] and [0-9] to match just letters and digits, followed by an asterisk (*) to allow repetition.
I don't have a copy of MVC 5 handy so I don't know what the particular syntax is.
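A regex for that often looks like:
^([0-9]|[A-Z]|[a-z])*$
It will likely be very similar in ASP.NET or MVC.
That matches any digit from 0 to 9 and any letter from a to z, in upper or lower case. The asterisk makes it match any number of such characters rather than just a single one (use + instead if the empty string should be rejected). The pipe character means "or": an upper-case letter, or a lower-case letter, or a digit. The parentheses group the alternatives so that the asterisk applies to all of them. The ^ and $ anchors tie the match to the start and end of the input; without them the pattern would also match inside a string that contains special characters.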
As I said, though, you will have to figure out the specific syntax of the regex library your programming language uses, as they can differ. There are Perl-style regexes and many variations. The above is just a sample. You can test at:
http://regexstorm.net/tester
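For a quick way to try the idea, here is a small sketch using Python's re module. It is not MVC-specific, and the helper name is_alphanumeric is made up for illustration. It uses fullmatch so the whole input has to consist of ASCII letters and digits, and + so that an empty string is rejected.

```python
import re

# One or more ASCII letters or digits; fullmatch() requires the whole string to match.
ALNUM = re.compile(r"[A-Za-z0-9]+")

def is_alphanumeric(value):
    """Return True if value contains only ASCII letters and digits (and is not empty)."""
    return ALNUM.fullmatch(value) is not None

if __name__ == "__main__":
    for candidate in ["abc123", "abc 123", "abc_1", ""]:
        print(repr(candidate), is_alphanumeric(candidate))
```

The character class itself uses only basic syntax, so it should carry over to other regex flavours, but check how your library spells "match the entire input".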

Ruby - converting a hashtag to actual word(s) ? (#contentmarketing => content marketing) [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
Closed 9 years ago.
Hashtags sometimes combine two or more words, such as:
content marketing => #contentmarketing
If I have a bunch of hashtags assigned to an article, and the words are in that article (e.g. content marketing), how can I take a hashtag and detect the word(s) that make it up?
If the hashtag is a single word, it's trivial: simply look for that word in the article. But what if the hashtag is two or more words? I could simply split the hashtag at every possible index and check whether the two words produced are in the article.
So for #contentmarketing, I'd check for the words:
c ontentmarketing
co ntentmarketing
con tentmarketing
...
content marketing <= THIS IS THE ANSWER!
...
However, this fails if there are three or more words in the hashtag, unless I split it recursively, but that seems very inelegant.
Again, this is assuming the words in the hashtag are in the article.
You can use a regex with an optional space between each character to do this:
your_article =~ /#{hashtag.chars.to_a.join(' ?')}/
I can think of two possible solutions depending on the requirements for the hashtags:
1. Assuming hashtags must be made up of words and can't contain non-words like "#abfgtest":
Do a test similar to your approach above, but only test the first part of the string. If the test fails, add another character and try again until you have a word. Then repeat this process on the remaining string until you have found each word. So using your example it would first test:
- c
- co
- ...
- content <- Found a word, start over with rest
- m
- ma
- ...
- marketing <- Found a word, no more string so exit
2. If you can have garbage, then you will need to do the same thing as in option 1, with an additional step. Whenever you reach the end of the string without finding a word, go back to the beginning + 1. Using the #abfgtest example, first you'd run the above function on "abfgtest", then "bfgtest", then "fgtest", etc.
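The first option can be sketched as a small backtracking search. This is an illustration in Python rather than Ruby, and it assumes the vocabulary is simply the set of words found in the article; split_hashtag is a made-up name. Unlike the purely greedy description above, it backs up and tries a shorter prefix when the rest of the string cannot be split, which matters for tags with three or more words.

```python
import re
from functools import lru_cache

def split_hashtag(hashtag, article):
    """Split a hashtag into words that all occur in the article, or return None."""
    words = set(re.findall(r"[a-z0-9]+", article.lower()))
    tag = hashtag.lstrip("#").lower()

    @lru_cache(maxsize=None)
    def solve(start):
        # An empty remainder means everything before it was split successfully.
        if start == len(tag):
            return ()
        # Try the longest candidate prefix first, then progressively shorter ones.
        for end in range(len(tag), start, -1):
            if tag[start:end] in words:
                rest = solve(end)
                if rest is not None:
                    return (tag[start:end],) + rest
        return None

    result = solve(0)
    return list(result) if result is not None else None

if __name__ == "__main__":
    article = "Content marketing is a form of marketing."
    print(split_hashtag("#contentmarketing", article))
```

Handling garbage as in the second option would need an extra outer loop over starting offsets; that is left out here.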

output routes and avoid the routes which user input in prolog [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 8 years ago.
Output a Prolog path list while avoiding certain routes given as user input.
Hi, I'm working on a project. A building contains zones, and each zone has an exit. We want to evacuate people through the zones to the exits. The user inputs two parameters: the first is the "infected zone", and the second is the "zone of people we want to evacuate".
The output should be all the safe routes from the "zone of people we want to evacuate" to the exits, avoiding the infected zone.
For example:
User input: (z11, z12) // z11 is infected; the people we want to evacuate are in z12.
Output: z12->z22->exit3, z12->z21->exit2, and z12->elevators.
The facts are:
path(z11,z12).
path(z12,z11).
path(z12,z22).
path(z12,z21).
path(z22,z12).
path(z22,z21).
path(z21,z22).
path(z11,exit1).
path(z12,elevators).
path(z21,exit2).
path(z22,exit3).
Please help me write the code.
It's inconvenient that you've chosen to name your predicate path/2 since we'd probably want to call the thing that generates a path to the exit with that name. So first I'd rename all your facts from path/2 to connected/2. Then you're going to want to annotate the exits:
exit(exit1).
exit(exit2).
exit(exit3).
exit(elevators).
Otherwise you'd have to hard-code them somewhere else.
A simple thing to do would be to solve the general path question and then check to ensure the path doesn't contain an infected site. That would look like this:
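% Seed Seen with Start so the search never loops back through the starting zone.
path(Start, Path) :- path(Start, Path, [Start]).
% Base case: Start is directly connected to an exit.
path(Start, [Exit], Seen) :-
    exit(Exit),
    connected(Start, Exit),
    \+ memberchk(Exit, Seen).
% Recursive case: step to an unvisited neighbour and continue from there.
path(Start, [Next|Rest], Seen) :-
    connected(Start, Next),
    \+ memberchk(Next, Seen),
    path(Next, Rest, [Next|Seen]).
% Test step: reject any generated path that passes through the zone to avoid.
safe_path(Start, Avoid, Path) :-
    path(Start, Path),
    \+ memberchk(Avoid, Path).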
This easily generalizes to handle sets of avoid zones:
safe_path(Start, AvoidList, Path) :-
    path(Start, Path),
    forall(member(Avoid, AvoidList), \+ memberchk(Avoid, Path)).
The bulk of what's interesting and fun to do in Prolog is accomplished with a generate/test paradigm. The simplest and most direct formulation is usually one in which you generate too much (too generally, you might say) and put all the restrictions in the test. Generally speaking, you achieve better performance by making the generator more intelligent about generating possibilities--moving code from the "test" part into the "generate" part of "generate and test."
Usually the first problem you face is generating an infinite tree. This is particularly true with graphs. The memberchk/2 in path/3 with the Seen list serves to prevent looping back and is necessary to make the set of paths finite. Using exit/1 in the base case of path/3 also helps performance because we're not generating intermediate paths. It's nice that with your particular situation you can get away with this.
Doing the avoidance at the end is winnowing out chaff last. The generation doesn't know to avoid these nodes so all of the poisoned paths will get generated and removed by the test. If performance isn't sufficient this way, you can move that code into path/2 directly, doing a similar kind of check to the one done with the Seen list.
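To illustrate what moving the avoidance check into the generator looks like, here is a sketch in Python rather than Prolog. It is an illustration only: the connection facts from the question are transcribed by hand into a dictionary, and the names CONNECTED, EXITS, and safe_paths are made up. Avoided zones are skipped at the same point as already-visited zones, so poisoned paths are never generated in the first place.

```python
# Connections transcribed from the question's facts (zone -> directly reachable places).
CONNECTED = {
    "z11": ["z12", "exit1"],
    "z12": ["z11", "z22", "z21", "elevators"],
    "z21": ["z22", "exit2"],
    "z22": ["z12", "z21", "exit3"],
}
EXITS = {"exit1", "exit2", "exit3", "elevators"}

def safe_paths(start, avoid, seen=None):
    """Yield every loop-free path from start to an exit that never enters a zone in avoid."""
    seen = seen or {start}
    for nxt in CONNECTED.get(start, []):
        # Prune during generation: skip avoided zones and zones already on the path.
        if nxt in avoid or nxt in seen:
            continue
        if nxt in EXITS:
            yield [start, nxt]
        else:
            for rest in safe_paths(nxt, avoid, seen | {nxt}):
                yield [start] + rest

if __name__ == "__main__":
    for route in safe_paths("z12", {"z11"}):
        print("->".join(route))
```

Because this enumerates every loop-free route, it also reports longer detours (for example via both z22 and z21) in addition to the shortest routes listed in the question.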

Preventing \texttt LaTeX tag from letting its content passing over the margin [closed]

Closed. This question is off-topic. It is not currently accepting answers.
Closed 11 years ago.
In my report, I'm writing some class names or variable names inside of a paragraph, and I want these names to be rendered in a monospace font.
Example:
This is my class name: \texttt{baseAdminConfiguration}.
Sometimes when the single word inside of the \texttt tag is rendered at the end of a line, the word does not go to the next line, and there is no break in it either: the end of the word passes over the margin.
How should I handle such a case?
Cheers.
This hasn’t got much to do with \texttt. The word is simply too long, and LaTeX doesn’t know how to hyphenate it. You can tell it how to do this manually, by declaring hyphenation rules:
\hyphenation{base-Admin-Configuration}
The \hyphenation command may take arbitrarily many words, separated by whitespace.
Alternatively, if this doesn’t do the trick, you can introduce manual hyphenation hints in the text:
This is a long text that uses the word \texttt{base\-Admin\-Configuration} …
Only the actual hyphenation will be displayed – unused so-called discretionary hyphens (\-) will not be displayed so you can freely sprinkle your text with them, if necessary.
[Read more about hyphenation in LaTeX]
To prevent LaTeX from overflowing lines in principle, the whole paragraph can be wrapped in a sloppypar environment (thanks to Will for pointing this out in the comments):
\begin{sloppypar}
Some text …
\end{sloppypar}
This manipulates the parameters of the line-breaking algorithm (in particular, \tolerance). The downside: this can lead to very ugly spacing. Alternatively, \tolerance and other internal parameters can be manipulated directly – the TeX FAQ shows how.
The solution is quite simple: use the url package and replace the texttt command with the path command.
I found out that here
https://tex.stackexchange.com/questions/299/how-to-get-long-texttt-sections-to-break
in the post of Will Robertson.
Cheers
