What is your opinion about blocking languages based on sy-langu [closed] - localization

Closed. This question is opinion-based. It is not currently accepting answers.
Closed 2 years ago.
Roses are red,
violets are blue
Unexpected 'Langu'
on line 32.
Well, to put things short: a technical limitation forced me to ignore Cyrillic material texts (short texts from table MAKT).
As an ABAP guy, I always take the pragmatic solution: I have excluded the languages manually by browsing through table T002 and googling if the languages are based on Cyrillic characters.
It works, but it is not sexy. Feedback is appreciated.
REPORT y_test_block_langu.

PARAMETERS p_matnr TYPE matnr.

DATA langu_logon    TYPE c LENGTH 2.
DATA langu_selected TYPE c LENGTH 1.

CONSTANTS:
  BEGIN OF language,
    german  TYPE c LENGTH 1 VALUE 'D',
    english TYPE c LENGTH 1 VALUE 'E',
  END OF language.

START-OF-SELECTION.

  " Convert the one-character logon language key into its two-character ISO code.
  CALL FUNCTION 'CONVERSION_EXIT_ISOLA_OUTPUT'
    EXPORTING
      input  = sy-langu
    IMPORTING
      output = langu_logon.

  " Fall back to English for the manually collected Cyrillic-based languages.
  IF langu_logon = 'BG'
  OR langu_logon = 'KK'
  OR langu_logon = 'RU'
  OR langu_logon = 'SR'
  OR langu_logon = 'SH'
  OR langu_logon = 'UK'.
    langu_selected = language-english.
  ELSE.
    langu_selected = sy-langu.
  ENDIF.

  " Read the short text in the selected language.
  SELECT SINGLE maktx FROM makt INTO @DATA(maktx)
    WHERE matnr = @p_matnr
      AND spras = @langu_selected.

  WRITE: / 'This is the text',
         / maktx,
         / 'for Material number',
         / p_matnr.

Alternatively, you can reverse-exclude starting from the material texts themselves. SELECT all short texts from MAKT that include undesired characters - whatever these are. Then track back which languages these texts belong to. Then put these languages on a deny list. The involved SELECTs may be too time-intensive for online processing, but could be repeated on a regular basis, to fill the DB-persisted deny list.
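The reverse lookup described above can be sketched outside ABAP. This Python sketch assumes the MAKT rows have already been read into (SPRAS, MAKTX) pairs. The function names, the sample rows, and the use of Unicode character names to detect Cyrillic are illustrative assumptions, not part of the original report.

```python
import unicodedata

def is_cyrillic(ch):
    # Unicode names of Cyrillic letters start with "CYRILLIC".
    return unicodedata.name(ch, "").startswith("CYRILLIC")

def build_deny_list(rows):
    """Return the language keys that have at least one text containing a Cyrillic character."""
    return {spras for spras, maktx in rows if any(is_cyrillic(ch) for ch in maktx)}

# Hypothetical sample rows: (language key, short text)
rows = [
    ("E", "Steel bolt M8"),
    ("R", "Стальной болт M8"),
    ("D", "Stahlschraube M8"),
]
deny_list = build_deny_list(rows)
```

The resulting set could then be persisted and refreshed periodically, as suggested above, instead of being recomputed online.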
As some others already noted, the much cleaner solution would be to enable your UI to display those characters correctly. If that is not possible, you could at least mask them, for example by escaping them to their HTML entities or Unicode code points. This will not look nice, but at least the UI will display something.
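As a rough illustration of the masking idea, here is a small Python sketch that replaces every non-ASCII character with its numeric HTML character reference. The function name is made up for this example. Note that it also masks characters such as German umlauts, so a real solution would probably want a narrower rule.

```python
def mask_non_ascii(text):
    # "xmlcharrefreplace" turns each unencodable character into &#NNNN;
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")

masked = mask_non_ascii("Болт M8")
```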
Also make sure you investigate other languages, as Dragonthoughts suggests. If your texts include other non-Latin scripts, you may run into the same trouble in other places. Accidentally hitting Chinese characters may sound a bit far-fetched, but think of other widely used characters, such as the Greek letters alpha, epsilon, and omega, which might well occur in otherwise English product descriptions.


Regular Expression in MVC5 [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 6 years ago.
What regular expression allows only letters and digits, with no special characters or spaces, in ASP.NET MVC 5?
You generally use character ranges such as [a-z] and [0-9] to allow only letters and digits, followed by an asterisk (*).
I don't have a copy of MVC 5 handy, so I don't know the particular syntax.
A regex for that often looks like:
^([0-9]|[A-Z]|[a-z])*$
It will likely be very similar in ASP.NET or MVC.
That matches all alphabetic characters from a to z and all digits from 0 to 9. The asterisk makes it match any number of characters rather than just one. The pipe character means "or": match upper-case letters, or lower-case letters, or digits. The parentheses group the alternatives so that the asterisk applies to all of them, and ^ and $ anchor the match to the start and end of the input so that nothing else can slip in. The shorter ^[A-Za-z0-9]*$ is equivalent; use + instead of * to reject empty input.
As I said, though, you will have to figure out the specific syntax of the regex library your programming language uses, as they can differ. There are Perl-style regexes and many variations. The above is just a sample. You can test at:
http://regexstorm.net/tester
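Since the answer does not give MVC-specific syntax, here is a minimal sketch of the same check in Python. The function name is an assumption for illustration. `fullmatch` requires the whole string to match, so explicit anchors are not needed, and `+` rejects empty input.

```python
import re

ALNUM = re.compile(r"[A-Za-z0-9]+")

def is_alphanumeric(value):
    # True only if every character is an ASCII letter or digit.
    return ALNUM.fullmatch(value) is not None
```

The character class itself should carry over to .NET unchanged; only the way it is attached to a model or input field differs.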

Ruby - converting a hashtag to actual word(s) ? (#contentmarketing => content marketing) [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
Closed 9 years ago.
Hashtags sometimes combine two or more words, such as:
content marketing => #contentmarketing
If I have a bunch of hashtags assigned to an article, and the word is in that article, i.e. content marketing. How can I take that hash tag, and detect the word(s) that make up the hashtag?
If the hashtag is a single word, it's trivial: simply look for that word in the article. But, what if the hash tag is two or more words? I could simply split the hashtag in all possible indices and check if the two words produced were in the article.
So for #contentmarketing, I'd check for the words:
c ontentmarketing
co ntentmarketing
con tentmarketing
...
content marketing <= THIS IS THE ANSWER!
...
However, this fails if there are three or more words in the hashtags, unless I split it recursively but that seems very inelegant.
Again, this is assuming the words in the hash tag are in the article.
You can use a regex with an optional space between each character to do this:
your_article =~ /#{hashtag.chars.to_a.join(' ?')}/
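For comparison, the same optional-space idea can be sketched in Python. The function name and the sample article text are assumptions for illustration. Unlike the one-liner above, this version strips the leading `#` and escapes each character before building the pattern.

```python
import re

def find_hashtag_phrase(hashtag, article):
    """Return the phrase in the article that the hashtag spells out, or None."""
    letters = hashtag.lstrip("#")
    # Allow an optional single space between any two characters of the tag.
    pattern = " ?".join(re.escape(ch) for ch in letters)
    match = re.search(pattern, article, re.IGNORECASE)
    return match.group(0) if match else None

phrase = find_hashtag_phrase("#contentmarketing", "Good content marketing takes time.")
```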
I can think of two possible solutions depending on the requirements for the hashtags:
1. Assuming hashtags must be made up of words and can't be non-words like "#abfgtest":
   Do the test similar to your answer above but only test the first part of the string. If the test fails then add another character and try again until you have a word. Then repeat this process on the remaining string until you have found each word. So using your example it would first test:
   - c
   - co
   - ...
   - content <- Found a word, start over with rest
   - m
   - ma
   - ...
   - marketing <- Found a word, no more string so exit
2. If you can have garbage, then you will need to do the same thing as option 1, with an additional step. Whenever you reach the end of the string without finding a word, go back to the beginning + 1. Using the #abfgtest example, first you'd run the above function on "abfgtest", then "bfgtest", then "fgtest", etc.
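The first option can be sketched in Python as follows. This is a slight variation rather than the exact procedure described: it backtracks when an early short word leads to a dead end. The function name and the vocabulary (for instance, the set of words found in the article) are assumptions for this example.

```python
def split_hashtag(tag, vocabulary):
    """Return a list of vocabulary words that concatenate to tag, or None."""
    if not tag:
        return []
    # Try ever longer prefixes; recurse on the rest once a prefix is a known word.
    for end in range(1, len(tag) + 1):
        prefix = tag[:end]
        if prefix in vocabulary:
            rest = split_hashtag(tag[end:], vocabulary)
            if rest is not None:
                return [prefix] + rest
    return None

# "mark" is tried first for the second word, fails on "eting", and is backtracked.
words = split_hashtag("contentmarketing", {"content", "marketing", "mark"})
```

This handles three or more words without any special casing, since each recursive call only has to find the next word.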

Is there a distinct name for the prefix notation often used in Delphi? [duplicate]

This question already has answers here:
Closed 11 years ago.
Possible Duplicate:
Does my variable naming convention have a name?
Notation in question is described by example below:
T for type
P for pointer
F for field
A for argument
L for local
Et cetera. At least S is missing from the list, but I'm not sure what it designates.
The first three prefixes have been in Delphi since the very beginning; the last two I noticed only relatively recently. I'd like to know the notation's name (if it has one) and read a normative white paper on it (and then maybe adopt it).
Zarko Gajic has a pretty good Delphi-specific list here:
http://delphi.about.com/od/standards/l/bldnc.htm
Personally, I find some conventions like this useful. I still remember my first language, FORTRAN, where the convention for integers was to start their names with any letter from I to N, and it was easy to remember because those are the first two letters of INteger.
Section "3.3 Field Naming" of the Object Pascal Style Guide by Charles Calvert gives a brief but good guide as to when to use Hungarian notation, and also what single character identifier names are appropriate. My FORTRAN background (8 character names max) also made me use "N" as the count of items and led to code such as:
      DO 10 I = 1, N
        DO 20 J = I, N
          ...
   20   CONTINUE
   10 CONTINUE
Ouch! The memories hurt.
My personal favorite of all these standards is to obey the standards already established in the code you're in, not to try to impose a different standard 50% of the way through, and to religiously avoid bikeshed discussions.
But if you press me really hard, I'll admit, I prefer Charlie Calvert's standards as used by JVCL devs, same as "section 3.3" link by LKessler above.
Hungarian notation.
With modern IDEs (including Delphi's) many people (myself included) feel it is no longer necessary.
EDIT: Technically this is not true Hungarian notation, as sometimes the prefix indicates the scope rather than the type.

Why do search engines ignore symbols? [closed]

Closed. This question is off-topic. It is not currently accepting answers.
Closed 11 years ago.
Searching for symbols is common in programming, especially when you are new to a language.
For example, I had a question about the :: operator in Python, and that is not searchable. People looking for things like this, or Object [] (array of Objects), would not find what they want.
Why do search engines seem to ignore symbols completely? They are just characters like any others. I can see why it would be hard to extract semantics from symbols compared to words (e.g., a search engine can figure out that "find," "finds," and "found" are all related, if not the same word), but is it really that hard to search for them?
I can also see why in everyday use you'd want symbols to be ignored, but how hard would it be to make it look for something explicitly (e.g., "::" would search for ::)?
Check out this article on Interpreting Google Search Queries.
Specifically, section 9
Google ignores some punctuation and special characters, including ! ? , . ; [ ] @ / # < > .
Because punctuation is typically not as important as the text around it, Google ignores most punctuation in your search terms. There are exceptions, e.g., C++ and $99. Mathematical symbols, such as /, <, and >, are not ignored by Google's calculator.
[ Dr. Ruth ] returns the same results as [ Dr Ruth ]
What if you're seeking information that includes punctuation that Google ignores, e.g., an email address? Just enter the whole thing including the punctuation.
* [ info@amazon.com ]
Be aware that web pages sometimes camouflage email addresses to make collecting such information difficult for spammers. For example, on some sites you'll find the @ sign in an email address replaced with the word “at.”
Now we'll look at some special characters that Google doesn't ignore.
To minimize the number of entries in the index.
A search engine doesn't have to ignore them though. For example, it seems Google Code doesn't.
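To make the index-size argument concrete, here is a toy inverted index in Python. It is only a sketch of the general idea, not a description of how any real search engine tokenizes. The function names and sample documents are invented for this example.

```python
import re
from collections import defaultdict

def tokenize(text):
    # Keep runs of letters, digits and underscores; punctuation is dropped.
    return re.findall(r"\w+", text.lower())

def build_index(documents):
    """Map each token to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for token in tokenize(text):
            index[token].add(doc_id)
    return index

documents = {
    1: "What does :: do in Python?",
    2: "Object[] is an array of Objects",
}
index = build_index(documents)
```

With this tokenizer, a query for `::` produces no tokens at all, so there is nothing to look up. Supporting it would mean indexing symbol sequences as terms too, which grows the index.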

What is parsing in terms that a new programmer would understand? [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Closed 6 years ago.
I am a college student getting my Computer Science degree. A lot of my fellow students really haven't done a lot of programming. They've done their class assignments, but let's be honest here: those questions don't really teach you how to program.
I have had several other students ask me questions about how to parse things, and I'm never quite sure how to explain it to them. Is it best to start just going line by line looking for substrings, or just give them the more complicated lecture about using proper lexical analysis, etc. to create tokens, use BNF, and all of that other stuff? They never quite understand it when I try to explain it.
What's the best approach to explain this without confusing them or discouraging them from actually trying?
I'd explain parsing as the process of turning some kind of data into another kind of data.
In practice, for me this is almost always turning a string, or binary data, into a data structure inside my program.
For example, turning
":Nick!User@Host PRIVMSG #channel :Hello!"
into (C)
struct irc_line {
    char *nick;
    char *user;
    char *host;
    char *command;
    char **arguments;   /* NULL-terminated list */
    char *message;
} sample = {
    "Nick", "User", "Host", "PRIVMSG",
    (char *[]){ "#channel", NULL },
    "Hello!"
};
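To show the actual parsing step, here is a minimal Python sketch that turns such a line into a structured value. It assumes the usual `:nick!user@host COMMAND args :message` shape and does no validation beyond that. The `IrcLine` type and the function name are made up for this example.

```python
from typing import List, NamedTuple

class IrcLine(NamedTuple):
    nick: str
    user: str
    host: str
    command: str
    arguments: List[str]
    message: str

def parse_irc_line(line: str) -> IrcLine:
    if not line.startswith(":"):
        raise ValueError("this sketch expects a line with a ':' prefix")
    # Split off the sender prefix, then break it into nick, user and host.
    prefix, _, rest = line[1:].partition(" ")
    nick, _, user_host = prefix.partition("!")
    user, _, host = user_host.partition("@")
    # Everything after " :" is the free-text message.
    head, _, message = rest.partition(" :")
    command, *arguments = head.split()
    return IrcLine(nick, user, host, command, arguments, message)

parsed = parse_irc_line(":Nick!User@Host PRIVMSG #channel :Hello!")
```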
Parsing is the process of analyzing text made of a sequence of tokens to determine its grammatical structure with respect to a given (more or less) formal grammar.
The parser then builds a data structure based on the tokens. This data structure can then be used by a compiler, interpreter or translator to create an executable program or library.
If I gave you an english sentence, and asked you to break down the sentence into its parts of speech (nouns, verbs, etc.), you would be parsing the sentence.
That's the simplest explanation of parsing I can think of.
That said, parsing is a non-trivial computational problem. You have to start with simple examples, and work your way up to the more complex.
What is parsing?
In computer science, parsing is the process of analysing text to determine if it belongs to a specific language or not (i.e. is syntactically valid for that language's grammar). It is an informal name for the syntactic analysis process.
For example, take the language a^n b^n (some number of A characters followed by the same number of B characters). A parser for that language would accept the input AABB and reject the input AAAB. That is what a parser does.
In addition, a data structure can be created during this process for further processing. In my previous example, it could, for instance, store the AA and BB in two separate stacks.
Anything that happens after it, like giving meaning to AA or BB, or transform it in something else, is not parsing. Giving meaning to parts of an input sequence of tokens is called semantic analysis.
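A recognizer for the a^n b^n example can be sketched in a few lines of Python. The function name is illustrative, and requiring n to be at least 1 (so the empty input is rejected) is an assumption of this sketch, since the text does not say.

```python
def accepts_anbn(text: str) -> bool:
    i = 0
    # Count the leading run of A characters.
    while i < len(text) and text[i] == "A":
        i += 1
    a_count = i
    # Count the run of B characters that follows.
    while i < len(text) and text[i] == "B":
        i += 1
    b_count = i - a_count
    # Accept only if nothing is left over and both runs have equal, non-zero length.
    return i == len(text) and a_count == b_count and a_count > 0
```

The function only answers yes or no. It assigns no meaning to the input, which matches the distinction drawn above between parsing and semantic analysis.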
What isn't parsing?
Parsing is not transforming one thing into another. Transforming A into B is, in essence, what a compiler does. Compiling takes several steps; parsing is only one of them.
Parsing is not extracting meaning from a text. That is semantic analysis, a step of the compiling process.
What is the simplest way to understand it?
I think the best way to understand parsing is to begin with the simpler concepts. The simplest one in language processing is the finite automaton, a formalism for recognizing regular languages, such as those described by regular expressions.
It is very simple: you have an input, a set of states, and a set of transitions. Consider the following language built over the alphabet { A, B }: L = { w | w starts with 'AA' or 'BB' }. The automaton below is a possible parser for that language, in which every valid word starts with 'AA' or 'BB'.
       A-->(q1)--A-->(qf)
      /
  (q0)
      \
       B-->(q2)--B-->(qf)
It is a very simple parser for that language. You start at (q0), the initial state, and read a symbol from the input. If it is A, you move to state (q1); otherwise (it is a B, remember the alphabet is only A and B) you move to state (q2), and so on. If you reach state (qf), the input is accepted.
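The automaton translates almost directly into a transition table. The Python sketch below is one possible encoding. The state names follow the diagram; the two self-loops on qf are an assumption added here so that longer words starting with 'AA' or 'BB' stay accepted.

```python
TRANSITIONS = {
    ("q0", "A"): "q1",
    ("q0", "B"): "q2",
    ("q1", "A"): "qf",
    ("q2", "B"): "qf",
    # Assumption: once accepted, any further A or B keeps the word accepted.
    ("qf", "A"): "qf",
    ("qf", "B"): "qf",
}

def accepts(word):
    state = "q0"
    for symbol in word:
        # A missing transition means the word is rejected.
        state = TRANSITIONS.get((state, symbol))
        if state is None:
            return False
    return state == "qf"
```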
As it is visual, you only need a pencil and a piece of paper to explain what a parser is to anyone, including a child. I think this simplicity is what makes automata the most suitable way to teach language processing concepts such as parsing.
Finally, as a Computer Science student, you will study such concepts in depth in theoretical computer science classes such as Formal Languages and Theory of Computation.
Have them try to write a program that can evaluate arbitrary simple arithmetic expressions. This is a simple problem to understand but as you start getting deeper into it a lot of basic parsing starts to make sense.
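One way such an exercise might end up looking is a small recursive-descent evaluator. The Python sketch below is only one of many possible designs. It handles non-negative integers, the four basic operators and parentheses, and its tokenizer silently skips any other character, which is a simplification.

```python
import re

def tokenize(text):
    # Integers and the symbols + - * / ( ); anything else is skipped in this sketch.
    return re.findall(r"\d+|[-+*/()]", text)

def evaluate(text):
    tokens = tokenize(text)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        token = peek()
        pos += 1
        return token

    def expression():  # expression := term (('+' | '-') term)*
        value = term()
        while peek() in ("+", "-"):
            if advance() == "+":
                value += term()
            else:
                value -= term()
        return value

    def term():  # term := factor (('*' | '/') factor)*
        value = factor()
        while peek() in ("*", "/"):
            if advance() == "*":
                value *= factor()
            else:
                value /= factor()
        return value

    def factor():  # factor := NUMBER | '(' expression ')'
        token = advance()
        if token == "(":
            value = expression()
            if advance() != ")":
                raise ValueError("expected ')'")
            return value
        if token is None or not token.isdigit():
            raise ValueError(f"unexpected token: {token!r}")
        return int(token)

    result = expression()
    if peek() is not None:
        raise ValueError(f"unexpected token: {peek()!r}")
    return result
```

Each function mirrors one grammar rule, which is why operator precedence falls out of the structure rather than needing special handling.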
Parsing is about READING data in one format, so that you can use it to your needs.
I think you need to teach them to think like this. So, this is the simplest way I can think of to explain parsing for someone new to this concept.
Generally, we try to parse data one line at a time, because it is easier for humans to think this way (divide and conquer) and also easier to code.
We call each smallest indivisible piece of data a field. For example, Name is a field, Age is another field, and Surname is another.
In a line, we can have various fields. To distinguish them, we can delimit fields with separators or give each field a fixed maximum length.
For example:
Separating fields by commas:
Paul,20,Jones
Or by fixed width (Name up to 20 letters, Age up to 3 digits, Surname up to 20 letters):
Paul                020Jones
Either of these sets of fields is called a record.
To separate delimited records from one another, we need a record delimiter. A dot will be enough (though you can also use CR/LF).
A list could be:
Michael,39,Jordan.Shaquille,40,O'neal.Lebron,24,James.
or with CR/LF
Michael,39,Jordan
Shaquille,40,O'neal
Lebron,24,James
You can ask them to list 10 NBA (or NFL) players they like and type them in according to a format, then write a program to parse the list and display each record. One group can write its list in the comma-separated format and a program that parses the fixed-width format, and vice versa.
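A possible solution to that exercise, sketched in Python. The function names are invented for this example, and the field widths (20, 3, 20) are the ones given above. Error handling is left out on purpose.

```python
def parse_delimited(line):
    # Fields are separated by commas: name,age,surname
    name, age, surname = line.split(",")
    return {"name": name, "age": int(age), "surname": surname}

def parse_fixed_width(line):
    # Fields sit at fixed positions: 20 characters, then 3 digits, then up to 20 characters.
    return {
        "name": line[0:20].strip(),
        "age": int(line[20:23]),
        "surname": line[23:43].strip(),
    }

def parse_records(text, record_delimiter="."):
    # Split the text into records, skipping the empty piece after the last delimiter.
    return [parse_delimited(r) for r in text.split(record_delimiter) if r.strip()]

players = parse_records("Michael,39,Jordan.Shaquille,40,O'neal.Lebron,24,James.")
```

Passing `"\n"` as the record delimiter covers the one-record-per-line variant.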
Parsing to me is breaking down something into meaningful parts... using a definable or predefined known, common set of part "definitions".
For programming languages there would be keyword parts, usable punctuation sequences...
For pumpkin pie it might be something like the crust, filling and toppings.
For written languages there might be what a word is, a sentence, what a verb is...
For spoken languages it might be tone, volume, mood, implication, emotion, context
Syntax analysis (as well as common sense, after all) would tell you whether what you are parsing is a pumpkin pie or a programming language. Does it have a crust? Well, maybe it's pumpkin pudding, or perhaps a spoken language!
One thing to note about parsing stuff is there are usually many ways to break things into parts.
For example you could break up a pumpkin pie by cutting it from the center to the edge or from the bottom to the top or with a scoop to get the filling out or by using a sledge hammer or eating it.
And how you parse things would determine if doing something with those parts will be easy or hard.
In the "computer languages" world, there are common ways to parse text source code. These common methods (algorithms) have titles or names. Search the Internet for common methods/names for ways to parse languages. Wikipedia can help in this regard.
In linguistics, to divide language into small components that can be analyzed. For example, parsing this sentence would involve dividing it into words and phrases and identifying the type of each component (e.g., verb, adjective, or noun).
Parsing is a very important part of many computer science disciplines. For example, compilers must parse source code to be able to translate it into object code. Likewise, any application that processes complex commands must be able to parse the commands. This includes virtually all end-user applications.
Parsing is often divided into lexical analysis and semantic parsing. Lexical analysis concentrates on dividing strings into components, called tokens, based on punctuation and other keys. Semantic parsing then attempts to determine the meaning of the string.
http://www.webopedia.com/TERM/P/parse.html
Simple explanation: Parsing is breaking a block of data into smaller pieces (tokens) by following a set of rules (using delimiters, for example), so that this data can be processed piece by piece (managed, analysed, interpreted, transmitted, etc.).
Examples: Many applications (like spreadsheet programs) use the CSV (Comma Separated Values) file format to import and export data. The CSV format makes it possible for the applications to process this data with the help of a special parser.
Web browsers have special parsers for HTML and CSS files. JSON parsers exist. All special file formats must have some parsers designed specifically for them.
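As a small illustration of ready-made parsers, Python's standard library ships one for CSV and one for JSON. The sample data below is made up. Note that the CSV parser returns every field as a string, while the JSON parser also recovers numbers.

```python
import csv
import io
import json

csv_text = "name,age,surname\nMichael,39,Jordan\nLebron,24,James\n"
# DictReader uses the first row as field names for the following rows.
rows = list(csv.DictReader(io.StringIO(csv_text)))

record = json.loads('{"name": "Paul", "age": 20}')
```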
