String concatenation VS string format - delphi

What is the best approach, simple string concatenation or string.format?
For instance, which is better to use:
s:=v1+' '+v2
or
s:=format('%S %S',[v1,v2])

Depends on your criteria for "best". If all you're doing is concatenating two strings, I'd go with the + operator. It's obvious what you're trying to do and easy to read, and it's a little bit faster because it doesn't have to use variants. (Have you looked at what Format actually does under the hood? It's kinda scary!)
The major advantage of format is that it lets you make a single string and store it somewhere, such as in a text file or a resourcestring, and gather other parameters later. This makes it useful for more complex tasks. But if all you need to do is stick two strings together, it's kinda overkill IMO.

Format works with internationalization, making it possible to localize your app. Concatenation does not. Hence, I favor format for any display which may have to be produced in a culture-dependent manner.
Update: The reason format works for internationalization is that not all languages express everything in the same order. A contrived example would be:
resourcestring
  sentence = ' is ';
const
  // declared as constants: "var x = 'literal';" does not compile in a var section
  subject = 'Craig';
  adjective = 'helpful';
begin
  WriteLn(subject + sentence + adjective + '!');
end.
This works, and I can customize with a resourcestring, but in Spanish I would write, "¡Qué servicial es Craig!" The resourcestring doesn't help me. Instead I should write:
resourcestring
sentence = '%S is %S!'; // ES: '¡Qué %1:S es %0:S!'
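The reordering idea can be sketched outside Delphi as well. The following is a minimal Python illustration, not the answer's own code: the `TEMPLATES` catalog and the `render` helper are hypothetical names, and the indexed placeholders `{0}` and `{1}` play the role of Delphi's `%0:S` and `%1:S`.

```python
# Hypothetical message catalog: one template per locale, with indexed
# placeholders so each translation can reorder the arguments freely.
TEMPLATES = {
    "en": "{0} is {1}!",
    "es": "¡Qué {1} es {0}!",
}


def render(locale, subject, adjective):
    """Fill in the template for the given locale; the call site never changes."""
    return TEMPLATES[locale].format(subject, adjective)


print(render("en", "Craig", "helpful"))
print(render("es", "Craig", "servicial"))
```

With plain concatenation the word order is fixed in code; here only the catalog entry differs between languages.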

Here's a third option:
s:=Concat(V1,V2);

I use:
s := v1 + ' ' + v2;
It's clearest and easiest to understand.
That is the most important thing.
You may find a construct that is marginally more efficient, e.g. using TStringBuilder in Delphi 2009. If efficiency is of utmost importance, then do what's necessary in the two or three most critical lines. Everywhere else, use code and constructs that are clear and easy to understand.


Elixir/Erlang - Split paragraph into sentences based on the language

In Java there is a class called BreakIterator which allows me to pass a paragraph of text in any language (the language it is written in is known) and it will split the text into separate sentences. The magic is that it can take as an argument the locale of the language the text is written in, and it will split the text according to that language's rules (if you look into it, it is actually a very complex issue even in English; it is certainly not a case of 'split by full-stops/periods').
Does anybody know how I would do this in elixir? I can't find anything in a Google search.
I am almost at the point of deploying a very thin public API that does only this basic task that I can call into from elixir - but this is really not desirable.
Any help would be really appreciated.
The i18n library should be usable for this. Just going from the examples provided, since I have no experience using it, something like the following should work (:en is the locale code):
str = :i18n_string.from("some string")
iter = :i18n_iterator.open(:en, :sentence)
sentences = :i18n_string.split(iter, str)
There's also Cldr, which implements a lot of locale-dependent Unicode algorithms directly in Elixir, but it doesn't seem to include iteration in particular at the moment (you may want to raise an issue there).

Save Mathematica code in `FullForm` syntax

I need to do some metaprogramming on a large Mathematica code base (hundreds of thousands of lines of code) and don't want to have to write a full-blown parser so I was wondering how best to get the code from a Mathematica notebook out in an easily-parsed syntax.
Is it possible to export a Mathematica notebook in FullForm syntax, or to save all definitions in FullForm syntax?
The documentation for Save says that it can only export in the InputForm syntax, which is non-trivial to parse.
The best solution I have so far is to evaluate the notebook and then use DownValues to extract the rewrite rules with arguments (but this misses symbol definitions) as follows:
DVs[_] := {}
DVs[s_Symbol] := DownValues[s]
stream = OpenWrite["FullForm.m"];
WriteString[stream,
  DVs[Symbol[#]] & /@ Names["Global`*"] // Flatten // FullForm];
Close[stream];
I've tried a variety of approaches so far but none are working well. Metaprogramming in Mathematica seems to be extremely difficult because it keeps evaluating things that I want to keep unevaluated. For example, I wanted to get the string name of the infinity symbol using SymbolName[Infinity] but the Infinity gets evaluated into a non-symbol and the call to SymbolName dies with an error. Hence my desire to do the metaprogramming in a more suitable language.
EDIT
The best solution seems to be to save the notebooks as package (.m) files by hand and then translate them using the following code:
stream = OpenWrite["EverythingFullForm.m"];
WriteString[stream, Import["Everything.m", "HeldExpressions"] // FullForm];
Close[stream];
You can certainly do this. Here is one way:
exportCode[fname_String] :=
  Function[code,
    Export[fname, ToString@HoldForm@FullForm@code, "String"],
    HoldAllComplete]
For example:
fn = exportCode["C:\\Temp\\mmacode.m"];
fn[
Clear[getWordsIndices];
getWordsIndices[sym_, words : {__String}] :=
Developer`ToPackedArray[words /. sym["Direct"]];
];
And importing this as a string:
In[623]:= Import["C:\\Temp\\mmacode.m","String"]//InputForm
Out[623]//InputForm=
"CompoundExpression[Clear[getWordsIndices], SetDelayed[getWordsIndices[Pattern[sym, Blank[]], \
Pattern[words, List[BlankSequence[String]]]], Developer`ToPackedArray[ReplaceAll[words, \
sym[\"Direct\"]]]], Null]"
However, going to another language to do metaprogramming for Mathematica sounds ridiculous to me, given that Mathematica is very well suited for that. There are many techniques available in Mathematica to do meta-programming and avoid premature evaluation. One that comes to my mind I described in this answer, but there are many others. Since you can operate on parsed code and use the pattern-matching in Mathematica, you save a lot. You can browse the SO Mathematica tags (past questions) and find lots of examples of meta-programming and evaluation control.
EDIT
To ease your pain with auto-evaluating symbols (there are only a few actually, Infinity being one of them): if you just need to get a symbol name for a given symbol, then this function will help:
unevaluatedSymbolName = Function[sym, SymbolName@Unevaluated@sym, HoldAllComplete]
You use it as
In[638]:= unevaluatedSymbolName[Infinity]//InputForm
Out[638]//InputForm="Infinity"
Alternatively, you can simply add HoldFirst attribute to SymbolName function via SetAttributes. One way is to do that globally:
SetAttributes[SymbolName,HoldFirst];
SymbolName[Infinity]//InputForm
Modifying built-in functions globally is however dangerous since it may have unpredictable effects for such a large system as Mathematica:
ClearAttributes[SymbolName, HoldFirst];
Here is a macro to use that locally:
ClearAll[withUnevaluatedSymbolName];
SetAttributes[withUnevaluatedSymbolName, HoldFirst];
withUnevaluatedSymbolName[code_] :=
Internal`InheritedBlock[{SymbolName},
SetAttributes[SymbolName, HoldFirst];
code]
Now,
In[649]:=
withUnevaluatedSymbolName[
{#,StringLength[#]}&[SymbolName[Infinity]]]//InputForm
Out[649]//InputForm= {"Infinity", 8}
You may also wish to do some replacements in a piece of code, say, replace a given symbol by its name. Here is an example code (which I wrap in Hold to prevent it from evaluation):
c = Hold[Integrate[Exp[-x^2], {x, -Infinity, Infinity}]]
The general way to do replacements in such cases is using Hold-attributes (see this answer) and replacements inside held expressions (see this question). For the case at hand:
In[652]:=
withUnevaluatedSymbolName[
c/.HoldPattern[Infinity]:>RuleCondition[SymbolName[Infinity],True]
]//InputForm
Out[652]//InputForm=
Hold[Integrate[Exp[-x^2], {x, -"Infinity", "Infinity"}]]
This is not the only way to do it, though. Instead of using the above macro, we can also encode the modification to SymbolName into the rule itself (here I am using a more wordy form of in-place evaluation, the Trott-Strzebonski trick, but you can use RuleCondition as well):
ClearAll[replaceSymbolUnevaluatedRule];
SetAttributes[replaceSymbolUnevaluatedRule, HoldFirst];
replaceSymbolUnevaluatedRule[sym_Symbol] :=
  HoldPattern[sym] :> With[{eval = SymbolName@Unevaluated@sym}, eval /; True];
Now, for example:
In[629]:=
Hold[Integrate[Exp[-x^2],{x,-Infinity,Infinity}]]/.
replaceSymbolUnevaluatedRule[Infinity]//InputForm
Out[629]//InputForm=
Hold[Integrate[Exp[-x^2], {x, -"Infinity", "Infinity"}]]
Actually, this entire answer is a good demonstration of various meta-programming techniques. From my own experience, I can direct you to this, this, this, this and this answer of mine, where meta-programming was essential to solving the problem I was addressing. You can also judge by the fraction of functions in Mathematica carrying Hold-attributes relative to all functions: it is about 10-15 percent, if memory serves me well. All those functions are effectively macros, operating on code. To me, this is a very indicative fact, telling me that Mathematica heavily builds on its meta-programming facilities.
The full forms of expressions can be extracted from the Code and Input cells of a notebook as follows:
$exprs =
Cases[
Import["mynotebook.nb", "Notebook"]
, Cell[content_, "Code"|"Input", ___] :>
ToExpression[content, StandardForm, HoldComplete]
, Infinity
] //
Flatten[HoldComplete @@ #, 1, HoldComplete] & //
FullForm
$exprs is assigned the expressions read, wrapped in HoldComplete to prevent evaluation. $exprs could then be saved into a text file:
Export["myfile.txt", ToString[$exprs]]
Package files (.m) are slightly easier to read in this way:
Import["mypackage.m", "HeldExpressions"] //
Flatten[HoldComplete @@ #, 1, HoldComplete] &

Strings in a separate .pas file

This may not be the correct place for this question, if not feel free to move it. I tagged as Delphi/Pascal because it's what I am working in atm, but this could apply to all programming I guess.
Anyway I am doing some code cleanup and thinking of moving all the strings in my program to a separate single .pas file. Are there any pros and cons to doing this? Is it even worth doing?
To clarify: I mean that I will create a separate file, Strings.pas, in which I will declare all my text string variables.
Ex
Current Code
Messages.Add('The voucher was NOT sent to ' + sName+
' because the application is in TEST MODE.');
Messages.Add('Voucher Saved to ' + sFullPath);
Messages.Add('----------------------------------------------------------');
New Code would be something like:
Messages.Add(sMsgText1 + '' + sName + '' + sMsgText2 + '' + sFullPath)
The Strings.pas file would hold all the string data. Hope that makes better sense
Moving your strings to a separate file is a good idea! It keeps them together and will let you easily change them if required. Your question doesn't say you want to be able to translate them, but centralizing will help with that too.
But, code like:
Messages.Add(sMsgText1 + '' + sName + '' + sMsgText2 + '' + sFullPath)
is not better than code like:
Messages.Add('The voucher was NOT sent to ' + sName+
' because the application is in TEST MODE.');
You've turned a messy but readable function call into a messy and un-readable function call. With the old code (the second snippet just above), you can read the code and see roughly what the message is going to say, because a lot of it is there in text. With the new code, you can't.
Second, the reason for moving the strings is to keep related items together and make it easier to change them. What if you want to change the above message so that instead of saying "The file 'foo' in path 'bar'..." it is phrased "The file bar\foo is..."? You can't: the way the messages are built is still fixed and scattered throughout your code. If you want to change several messages to be formatted the same way, you will need to change lots of individual places.
This will be even more of a problem if your goal is to translate your messages, since often translation requires rephrasing a message not just translating the components. (You need to change the order of subitems included in your messages, for example - you can't just assume each language is a phrase-for-phrase in order substitution.)
Refactor one step further
I'd suggest instead a more aggressive refactoring of your message code. You're definitely on the right track when you suggest moving your messages to a separate file. But don't just move the strings: move the functions as well. Instead of a large number of Messages.Add('...') scattered through your code, find the common subset of messages you create. Many will be very similar. Create a family of functions you can call, so that all similar messages are implemented with a single function, and if you need to change the phrasing for them, you can do it in a single spot.
For example, instead of:
Messages.Add('The file ' + sFile + ' in ' + sPath + ' was not found.');
... and elsewhere:
Messages.Add('The file ' + sFileName + ' in ' + sURL + ' was not found.');
have a single function:
Messages.ItemNotFound(sFile, sPath);
...
Messages.ItemNotFound(sFileName, sURL);
You get:
Centralized message strings
Centralized message functions
Less code duplication
Cleaner code (no assembling of strings in a function call, just parameters)
Easier to translate - provide an alternate implementation of the functions (don't forget that just translating the substrings may not be enough, you often need to be able to alter the phrasing substantially.)
Clear descriptions of what the message is in the function name, such as ItemNotFound(item, path), which leads to
Clearer code when you're reading it
Sounds good to me :)
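The "family of functions" idea above can be sketched as follows. This is a Python illustration under stated assumptions rather than the answer's Delphi code: the `Messages` class, its `items` list and the `item_not_found` method are hypothetical names modeled on `Messages.ItemNotFound`.

```python
class Messages:
    """Collects user-facing messages; the phrasing lives in exactly one place."""

    def __init__(self):
        self.items = []

    def add(self, text):
        self.items.append(text)

    def item_not_found(self, item, location):
        # The only spot that knows how this message is worded.
        self.add("The file {item} in {location} was not found.".format(
            item=item, location=location))


messages = Messages()
messages.item_not_found("foo.txt", "C:\\data")
messages.item_not_found("index.html", "http://example.com")
print("\n".join(messages.items))
```

Call sites pass only parameters, so rephrasing or translating the message means changing one method rather than every caller.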
I think it makes a lot of sense to move all string constants to a single unit. It makes changing the texts a lot easier, especially if you want to translate to other (human) languages.
But instead of strings, why don't you do what I usually do, i.e. use resourcestring. That way, your strings can be changed by someone else with a resource editor, without recompilation.
unit Strings;
resourcestring
strMsgText1 = 'The voucher was NOT sent to ';
etc...
But such a string is probably better done as:
resourcestring
  strVoucherNotSent =
    'The voucher was NOT sent to %s because the application is in TEST MODE.';
  strVoucherSavedTo =
    'Voucher Saved to %s';
  strVoucherForNHasValueOf =
    'The voucher for %s has a value of $%.2f';
The advantage of that is that in some languages, the placement and order of such substitutions is different. That way, a translator can place the arguments wherever it is necessary. Of course the application must then use Format() to handle the string:
Messages.Add(Format(strVoucherNotSent, [sName]));
Messages.Add(Format(strVoucherSavedTo, [sFullPath]));
Messages.Add(Format(strVoucherForNHasValueOf, [sName, dblValue]));
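For readers more at home outside Delphi, the same pattern might look like this in Python. It is only a sketch: the `STRINGS` dictionary stands in for the resourcestring section, and `message` is a hypothetical helper playing the role of `Format()`. Python's `%` operator accepts the same `%s` and `%.2f` specifiers used above.

```python
# Stand-in for the resourcestring section: complete sentences with
# placeholders instead of sentence fragments.
STRINGS = {
    "voucher_not_sent":
        "The voucher was NOT sent to %s because the application is in TEST MODE.",
    "voucher_value":
        "The voucher for %s has a value of $%.2f",
}


def message(key, *args):
    """Look up a template by key and substitute the arguments."""
    return STRINGS[key] % args


print(message("voucher_not_sent", "Anna"))
print(message("voucher_value", "Anna", 12.5))
```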
If you are wanting to translate the UI into different languages then you may benefit from having all your text in a single file, or perhaps a number of files dedicated to declaring string constants.
However, if you do this change without such a strong motivation, then you may just make your code hard to read.
Generally you have to ask what the benefits of such a major refactoring are, and if they are not self-evident, then you may well be changing things just for the sake of change.
If you want to translate your app, consider using GNU gettext for Delphi, also known as dxgettext. This is better than putting your strings into a separate .pas file because it enables translation without any special tools or recompilation, even by end users.
Well, the consensus here seems to tend towards the pro side, and I agree with that completely. Wrong overuse of string literals could lead to your source getting stringly typed.
One benefit I am still missing is reusability of strings. Although that is not a direct benefit of moving them to another unit, it is a benefit of moving the string literals to constants.
But a rather significant drawback is the time required to create this separate source file. You might consider implementing this good programming practice in the next project. It entirely depends on whether you have sufficient time (e.g. hobby project) or a deadline, on having a next project or not (e.g. student), or on just wanting to do some practice. Like David H answers and comments, as with all decisions you make, you have to weigh the benefits.
Leaving aside the fancy refactoring tools that could provide some automatic assistance, realize that moving the string literals to another unit alone does not get the job done. As Rudy and David M already answered, you partly have to rewrite your source. Also, finding readable, short and applicable constant names does take time. As many comments already stated that having control over spelling and consistency is important, I like to think the same argument goes for the replacement constants themselves as well.
And about the translation answers, whether being applicable to OP or not, moving all your source strings to a separate unit is only part of a translation solution. You also have to take care for designer strings (i.e. captions), and GUI compatibility: a longer translation still has to fit in your label.
If you have the luxury or the need to: go for it. But I would take this to the next project.
I did group all literals with resourcestrings in an older version of our framework. I came back from that since in frameworks you might not use all strings in a framework (e.g. because some units or unit groups are not used, but scanning the common dirs will show up all strings in your translation tool).
Now I distribute them again over units. I originally started grouping them to avoid duplicates, but in retrospect that was the lesser problem
(*) using dxgettext on the "common" dirs.

How to write Regular expressions in asp.net? for Email Validator?

I want to know the General formula for Writing Regular Expression? Any article?
Regexes for email addresses are trickier than you'd think. Here's a good page to start: http://www.regular-expressions.info/email.html
A "complete" regex here: http://code.iamcal.com/php/rfc822/full_regexp.txt ;)
There isn't really a general formula. Regex is a non-trivial language for matching strings. There are plenty of books and tutorials available.
One of the better ways of learning regex is to use a piece of regex designer software, like this one.
Regexes for emails are tricky, but there are some good ones here: http://regexlib.com/DisplayPatterns.aspx
Yes, start with the Regex class. And read lots of tutorials, like this one.
In the .vb code-behind, if you don't care about a postback:
Imports System.Text.RegularExpressions
valueStr = Regex.Replace(oldString, ",", ",#")
Another common way to do it is in javascript in your aspx page without a postback.
<script type="text/javascript">
function intChecker(field) {
    // Use a regular expression to strip alphabetic characters
    // and special characters such as !@#$ (digits are kept).
    // The reason I run the match before I run the replace is so that the
    // cursor doesn't jump to the end of the textbox unless it is a bad character.
    var regExp2 = /[A-Za-z\.\!\@\#\$\%\^\&\*\(\)\,\?\:\;\_\-\+\=\~\/]/g;
    if (field.value.match(regExp2)) {
        field.value = field.value.replace(regExp2, '');
    }
}
</script>
This will get you started with regex in vb.
You will need to find the expression to validate the email address.
Have fun!
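As a starting point for the email expression itself, here is a deliberately loose sanity check, sketched in Python rather than VB. The pattern and the `looks_like_email` helper are assumptions made for illustration. It only checks the overall shape "something@something.tld" and is nowhere near the full RFC 822 grammar linked above. The same pattern should carry over to the .NET Regex class, though that has not been verified here.

```python
import re

# Loose shape check: no whitespace, exactly one "@", and at least one
# dot in the domain part. This is NOT a full RFC-compliant validator.
EMAIL_RE = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")


def looks_like_email(value):
    """Return True if the whole string has a plausible email shape."""
    return EMAIL_RE.fullmatch(value) is not None


for candidate in ["user@example.com", "no-at-sign.example.com", "a@b"]:
    print(candidate, looks_like_email(candidate))
```

A loose check like this rejects obvious typos while avoiding false rejections of unusual but valid addresses; sending a confirmation mail is the only real validation.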

How hard would it be to translate a programming language to another human language?

Let me explain. Suppose I want to teach Python to someone who only speaks Spanish. As you know, in most programming languages all keywords are in English. How complex would it be to create a program that will find all keywords in a given source code and translate them? Would I need to use a parser and stuff, or will a couple of regexes and string functions be enough?
If it depends on the source programming language, then Python and Javascript would be the most important.
What I mean by "how complex would it be" is that would it be enough to have a list of keywords, and parse the source code to find keywords not in quotes? Or are there enough syntactical weirdnesses that something more complicated is required?
If all you want is to translate keywords, then (while you definitely DO need a proper parser, as otherwise avoiding any change in strings, comments &c becomes a nightmare) the task is quite simple. For example, since you mentioned Python:
import cStringIO
import keyword
import token
import tokenize
samp = '''\
for x in range(8):
    if x%2:
        y = x
        while y>0:
            print y,
            y -= 3
        print
'''
translate = {'for': 'per', 'if': 'se', 'while': 'mentre', 'print': 'stampa'}
def toks(tokens):
    for tt, ts, src, erc, ll in tokens:
        if tt == token.NAME and keyword.iskeyword(ts):
            ts = translate.get(ts, ts)
        yield tt, ts
def main():
    rl = cStringIO.StringIO(samp).readline
    toki = toks(tokenize.generate_tokens(rl))
    print tokenize.untokenize(toki)
main()
I hope it's obvious how to generalize this to "translate" any Python source and in any language (I'm supplying only a very partial Italian keyword translation dict). This emits:
per x in range (8 ):
    se x %2 :
        y =x
        mentre y >0 :
            stampa y ,
            y -=3
        stampa
(strange though correct whitespace, but that could be easily enough remedied). As an Italian speaker I can tell you this is terrible to read, but that's par for the course for any "programming language translation" as you desire. Worse, NON-keywords such as range remain un-translated (as per your specs) -- of course, you don't have to constrain your translation to keywords-only (it's easy enough to remove the if that does that above;-).
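The snippet above is Python 2 (cStringIO, the print statement). A rough Python 3 sketch of the same tokenizer-based approach follows. The names `TRANSLATE`, `translate_keywords` and `SAMPLE` are hypothetical, and the Italian mapping is again only partial. One change of technique: it passes the full token tuples back to `tokenize.untokenize` instead of (type, string) pairs, so the original spacing should be kept. Note that `print` is no longer a keyword in Python 3, so it is left alone here.

```python
import io
import keyword
import tokenize

# Partial, illustrative Italian mapping (an assumption, not a standard).
TRANSLATE = {"for": "per", "if": "se", "while": "mentre"}


def translate_keywords(source, mapping):
    """Rename keyword tokens only; strings and comments are left untouched."""
    readline = io.StringIO(source).readline
    out = []
    for tok in tokenize.generate_tokens(readline):
        if tok.type == tokenize.NAME and keyword.iskeyword(tok.string):
            # Keep the recorded positions and swap only the token text.
            tok = tok._replace(string=mapping.get(tok.string, tok.string))
        out.append(tok)
    return tokenize.untokenize(out)


SAMPLE = '''\
for x in range(3):
    if x % 2:
        y = x
        while y > 0:
            print("wait for it")  # while counting
            y -= 1
'''

print(translate_keywords(SAMPLE, TRANSLATE))
```

The word "for" inside the string literal and "while" inside the comment stay as they are, which is exactly what a regex-only approach makes hard to guarantee.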
The problem you will encounter is that, unless you have strict coding standards, people will not necessarily follow a consistent pattern in how they write their code. And in any dynamic language you will have the problem that code passed to the eval function contains keywords within quotes.
If you are trying to teach a language, you could create a DSL that has keywords in spanish, so that you can teach in your language, and it can be processed in python or javascript, so you have basically made your own language, with the constructs you want, for teaching.
Once they understand how to program, they will then need to start learning languages with the "English" keywords, so that they can communicate with others, but that could come after they understand how to program, if it would make your life easier.
So, to answer your question, there is enough syntactic weirdness that it would be considerably more complicated to translate the keywords.
This is not an optimistic answer nor a great one. However, I feel it has some merit.
I can speak about C# and the translation is not worth it. Here are reasons:
C# is based on English but it is not English literature per se. For example, what would "var" or "int" be in Spanish?
It is possible to create a program to let you use Spanish words in place of English keywords like "for", "in" and "as". However, some Spanish equivalent words may be compound words (two words instead of one, dealing with space can get tricky) or an English keyword may not have a direct Spanish equivalent.
Debugging may get tricky. Converting to English and to Spanish and back to English then Spanish has the marks of "loaded with bugs" written all over it.
The user will not have the benefit of existing learning resources. All C# code examples are written the way Microsoft designed the language. No one will try to Spanish-ize the syntax just for the few users who will use your app.
I have seen a few people discuss C# code in language other than English. In all cases the authors explain code in their native language but write it in English-looking code as it naturally is. The best approach seems to be try to learn enough of English to be comfortable with C# as it naturally is.
It would be impossible to make a translation that would handle every case. Take for example this Javascript code:
var x = Math.random() < 0.5 ? window : { location : { href : '' } };
var y = x.location.href;
The x variable can either become a reference to the window object, or a reference to the newly created object. It would only make sense to translate the members if it's the window object, otherwise you would have to translate the variable names too, which would be a mess and could easily cause problems.
Besides, it's not really useful to know a language in the wrong language. All the documentation and examples out there is going to be in the original language, so they would be useless.
Keep in mind that the de facto language for tokens in commonly used programming languages is English. So, for purely educational purposes, teaching in a translated language can be harmful for your student(s).
But if you really want to translate a computer language's tokens, you should think about the following issues:
You should translate language primitive constructs. This is easy... you have to learn and use a basic parser like yacc or antlr
You should translate language API's. This can be so painful and difficult... first, modern API's like java's one are very extensive; second, you have to translate the API's documentation.... no more words about that.
While I don't have an answer to the question, I think it's an interesting one. It brings up some issues which I have been thinking about:
As developing countries start introducing their population to higher technologies, naturally some will be interested in learning to program. Will English-only programming languages be an impediment?
Let's say a programming language was developed in a non-English part of the world: the keywords were written in the native language for that area and it used the native punctuation (eg, «» instead of " ", a comma as the decimal point (123,45), and so forth). It's a fantastic programming language, generating lots of buzz. Do you think it would see widespread adoption? Would you use it?
Most English-speaking people answer "no" to the first question. Even non-English (but educated) people answer no. But they also answer "no" to the second question, which seems to be a contradiction.
At one point I was thinking about something like that for bash scripts, but the idea can be implemented in other languages too:
#!/bin/bash
PrintOnScreen() {
    echo "$1 $2 $3 $4 $5 $6 $7 $8 $9"
}
PrintOnScreenWithoutNewline() {
    echo -n "$1 $2 $3 $4 $5 $6 $7 $8 $9"
}
MathAdd() {
    expr $1 + $2
}
Then we can add this to some script:
#!/bin/bash
. HumanLanguage.sh
PrintOnScreen Hello
PrintOnScreenWithoutNewline "Some number:"
MathAdd 2 3
This will produce:
Hello
Some number: 5
You might find Perl's Lingua::Romana::Perligata interesting -- it allows you to write your perl programs in latin. It's not quite the same as your idea, as it essentially restructures the language semantics around Latin ideas, rather than just translating the strings.
It is relatively easy to translate the keywords from one programming language into another language. There are several non-English-based programming languages, including Chinese Python, which replaces English keywords with Chinese keywords.
It would be much more difficult to translate each individual variable name from English into another natural language. If two different English variable names had only one translation in another language, there would be a name collision.
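The collision problem can be made concrete with a small check. This is only a sketch: the `find_collisions` helper and the tiny English-to-Spanish dictionary are hypothetical, and a real tool would need a far better translation source.

```python
from collections import defaultdict


def find_collisions(names, dictionary):
    """Group identifiers by translation; return groups that would clash."""
    by_translation = defaultdict(list)
    for name in names:
        # Names without a dictionary entry are kept as they are.
        by_translation[dictionary.get(name, name)].append(name)
    return {t: ns for t, ns in by_translation.items() if len(ns) > 1}


# Hypothetical identifier dictionary: two English names, one Spanish word.
EN_TO_ES = {"house": "casa", "home": "casa", "dog": "perro"}

print(find_collisions(["house", "home", "dog"], EN_TO_ES))
```

Any non-empty result means two distinct variables would end up with the same name after translation, so the translator would have to rename one of them.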
