Unable to search the non english character in google search appliance (GSA) - search-engine

We are using google search appliance product in our application. We have added the non English character in Frontend->Keymatch. When we are searching from our site, no result found error page is displayed.
Kindly suggest us to fix this issue.

I guess by saying non english character, you meant the accented characters.
GSA can retrun keymatch results for query terms with accented characters if the query is encoded properly. Encode your query and added ie request parameter with appropriate encoding value.
You can read more about ie and other request parameters here.
You should have added some sample keymatch entry which you configured that would have helped to assist you effectively.

Related

iPhone XML parsing Norwegian characters æ ø å

I've had this problem for a long time but I've been implementing this ugly hack on the backend to get around it.
Now I've decided to act as a real developer and deal with it.
My problem is that when parsing an XML feed with any of the Norwegian characters æ, ø or å in the title node, all the letters appearing before these special characters are ommitted.
So if the word is "Bålhuset" it only displays "ålhuset" - the funny thing is that æ,ø and å characters AFTER the initial problem character is included.
So if I put for example "ÅBålhuset", I will get "Bålhuset". So it seems it's only the first occurence of one of these special characters that will cause a problem.
Any help would be immensely appreciated!
-Chris
Try while you creating XML use CDATA tags like
<title><![CDATA[Transport "Bålhuset"Classic World's]]></title>
Also here is a list of HTML Tags and more cases XML with those characters is invalid, unless they are contained within a CDATA. Also try this Question hope with help you
Otherwise you need to use their special character code. If you want to represent ö you need to type ö please review like.
And Final XML with those characters is invalid, unless they are contained within a CDATA.
You can Validate you XML while creating and easily fix the bug.
What did it for me was getting the data in JSON and using the native JSON methods; no dropped characters and other sporadic behaviour.
So what that means to me is that there is an issue with NSXMLParser that makes it choke on international characters (the first occurence of which mind you) even though everything is in order with encoding etc.

What should I do with emails using charset ansi_x3.110-1983?

My application is parsing incoming emails. I try to parse them as best as possible but every now and then I get one with puzzling content. This time is an email that looks to be in ASCII but the specified charset is: ansi_x3.110-1983.
My application handles it correctly by defaulting to ASCII, but it throws a warning which I'd like to stop receiving, so my question is: what is ansi_x3.110-1983 and what should I do with it?
According to this page on the IANA's site, ANSI_X3.110-1983 is also known as:
iso-ir-99
CSA_T500-1983
NAPLPS
csISO99NAPLPS
Of those, only the name NAPLPS seems interesting or informative. If you can, consider getting in touch with the people sending those mails. If they're really using Prodigy in this day and age, I'd be amazed.
The IANA site also has a pointer to RFC 1345, which contains a description of the bytes and the characters that they map to. Compared to ISO-8859-1, the control characters are the same, as are most of the punctuation, all of the numbers and letters, and most of the remaining characters in the first 7 bits.
You could possibly use the guide in the RFC to write a tool to map the characters over, if someone hasn't written a tool for it already. To be honest, it may be easier to simply ignore the whines about the weird character set given that the character mapping is close enough to what is expected anyway...

What characters are allowed in twitter hashtags?

In developing an iOS app containing a twitter client, I must allow for user generated hashtags (which may be created elsewhere within the app, not just in the tweet body).
I would like to ensure any such hashtags are valid for twitter, so I would like to error check the entered value for invalid characters. Bear in mind that users may be from non-English speaking countries.
I am aware of the usual limitations, such as not beginning a hashtag with a number, and no special punctuation characters, but I was wondering if there is a known list of all additional characters that are technically allowed within hashtags (i.e. international characters).
Karl, as you've rightly pointed out, any word in any language can be a valid twitter hashtag (as long as it meets a number of basic criteria). As such what you are asking for is a list of valid international word characters. I'm sure someone has compiled such a list somewhere, but using it would not be the most efficient approach to reaching what appears to be your initial goal: ensuring that a given hashtag is valid for twitter.
I believe, what you are looking for is a regular expression that can match all word characters within a Unicode range. Such an expression would not be dependant on your locale and would match all characters in the modern typography that can appear as part of a word.
You didn't specify what language you are writing your app in, so I can't help you with a language specific implementation. However, the basic approach would be as follows:
Check if any of the bracket expressions or character classes already support Unicode character ranges in your language. If yes, then use them.
Check if there is regex modifier that can enable Unicode character range support for your language.
Most modern languages implement regular expressions in a fairly similar way and a lot of them borrow heavily from Perl, so I hope the following two example will put you on the right track:
Perl:
Use POSIX bracket expressions (eg: [[:alpha:]], [[:allnum:]], [[:digit:]], etc) as they give you greater control over the characters you want to match, compared to character classes (eg: \w).
Use /u modifier to enable Unicode support when pattern matching. Under this modifier, the ASCII platform effectively becomes a Unicode platform; and hence, for example, \w will match any of the more than 100,000 word characters in Unicode.
See Perl documentation for more info:
http://perldoc.perl.org/perlre.html#Character-set-modifiers
http://perldoc.perl.org/perlrecharclass.html#POSIX-Character-Classes
Ruby:
Use POSIX bracket expressions as they encompass non-ASCII characters. For instance, /\d/ matches only the ASCII decimal digits (0-9); whereas /[[:digit:]]/ matches any character in the Unicode Nd category.
See Ruby documentation for more info:
http://www.ruby-doc.org/core-2.1.1/Regexp.html#class-Regexp-label-Character+Classes
Examples:
Given a list of hashtags, the following regex will match all hashtags that start with a word character (inc. international word characters) followed by at least one other word character, a number or an underscore:
m/^#[[:alpha:]][[:alnum:]_]+$/u # Perl
/^#[[:alpha:]][[:alnum:]_]+$/ # Ruby
Twitter allows letters, numbers, and underscores.
I checked this by generating tweets via their API. For example, tweeting
Hash tag test #foo[bar
resulted in "#foo" being marked as a hash tag, and "[bar" being unformatted text.
Well, for starters you can't use a # in the hashtag (##hash).
The guidelines below are being quoted from Twitter's help center:
People use the hashtag symbol # before a relevant keyword or phrase (no spaces) in their Tweet to categorize those Tweets and help them show more easily in Twitter Search.
Clicking on a hashtagged word in any message shows you all other Tweets marked with that keyword.
Hashtags can occur anywhere in the Tweet – at the beginning, middle, or end.
Hashtagged words that become very popular are often Trending Topics.
Example: In the Tweet below, #eddie included the hashtag #FF. Users created this as shorthand for "Follow Friday," a weekly tradition where users recommend people that others should follow on Twitter. You'll see this on Fridays.
Using hashtags correctly:
If you Tweet with a hashtag on a public account, anyone who does a search for that hashtag may find your Tweet
Don't #spam #with #hashtags. Don't over-tag a single Tweet. (Best practices recommend using no more than 2 hashtags per Tweet.)
Use hashtags only on Tweets relevant to the topic.
Just want to add that in addition to alphanumeric characters and underscore, you can apparently use em dash in a Twitter hashtag like #COVIDー19.
Only letters and numbers are allowed to be part of a hashtag. If a character other than these follows the leading # and a letter or number, the hashtag will be cut off at this point.
I would recommend that your user interface indicate this to the user by changing the text color of the input field if the user enters anything other than a letter or number.
I had the same issue to implement in golang.
It seems allowed chars with [[:alpha:]] is only English-alphabet and could not use this syntax for other language characters.
Instead, I could use \p{L} for this purpose.
My test with \p{L} is here.
* Arabic, Hebrew, Hindi...etc is not confirmed yet.

YouTube Api (Search) Other Language..(Arabic, Korean, Etc....)

this time i've one Q, which is how to search through YouTube Api...
English is perfectly searched... but other language(arabic, korean, etc...) doesn't work...T T
http://gdata.youtube.com/feeds/api/videos?q=(SEARCH_WORD)&start-index=1&max-results=3&v=2
=> My Access Code...
I'd like to search Arabic or Korean.. plz comment anything....
I need you guys help...
Have a Nice day~!!!
Please try
http://gdata.youtube.com/feeds/api/videos?q=%EA%B0%95%EB%82%A8%EC%8A%A4%ED%83%80%EC%9D%BC&start-index=1&max-results=3&v=2
or
http://gdata.youtube.com/feeds/api/videos?q=\uac15\ub0a8\uc2a4\ud0c0\uc77c&start-index=1&max-results=3&v=2
on your browser.
Cheers
The more general concept underlying the other answer is that the 'q' parameter (all parameters, really) will only accept any valid unicode character; so if the string you're trying to search on is not unicode (i.e. some other encoding set), those code points will be interpreted as unicode and thus result in a search on random characters (returning no results).

Is it valid to have more than one question mark in a URL?

I came across the following URL today:
http://www.sfgate.com/cgi-bin/blogs/inmarin/detail??blogid=122&entry_id=64497
Notice the doubled question mark at the beginning of the query string:
??blogid=122&entry_id=64497
My browser didn't seem to have any trouble with it, and running a quick bookmarklet:
javascript:alert(document.location.search);
just gave me the query string shown above.
Is this a valid URL? The reason I'm being so pedantic (assuming that I am) is because I need to parse URLs like this for query parameters, and supporting doubled question marks would require some changes to my code. Obviously if they're in the wild, I'll need to support them; I'm mainly curious if it's my fault for not adhering to URL standards exactly, or if it's in fact a non-standard URL.
Yes, it is valid. Only the first ? in a URL has significance, any after it are treated as literal question marks:
The query component is indicated by
the first question mark ("?")
character and terminated by a number
sign ("#") character or by the end of
the URI.
...
The characters slash ("/") and
question mark ("?") may represent data
within the query component. Beware
that some older, erroneous
implementations may not handle such
data correctly when it is used as the
base URI for relative references
(Section 5.1), apparently because they
fail to distinguish query data from
path data when looking for
hierarchical separators. However, as
query components are often used to
carry identifying information in the
form of "key=value" pairs and one
frequently used value is a reference
to another URI, it is sometimes better
for usability to avoid
percent-encoding those characters.
https://www.rfc-editor.org/rfc/rfc3986#section-3.4
As a tangentially related answer, foo?spam=1?&eggs=3 gives the parameter spam the value 1?

Resources