I have a problem with the Turkish i and capital I.
Turkish also has a dotted capital I (İ). When I call FieldByName with a lowercase i, it does not find my field, because in the background the function compares it with the dotted capital I.
Does anyone know a workaround?
These two lines of code give different results:
showmessage(s.ToUpper);
showmessage(uppercase(s));
FieldByName uses the first one.
FieldByName finds a match by calling CompareText using the user's default locale, with case insensitivity. CompareText is a function provided by Windows.
If CompareText says that lowercase i and capital I with dot don't match, then your choices are:
[1] Use the capital dotted I in your call to FieldByName, or
[2] Use a locale in which those two characters are treated the same by CompareText.
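The mismatch can be reproduced outside Delphi. Below is a minimal sketch in Ruby, used here only as an analogy: the FIELDS list and the find_field helper are hypothetical and are not how FieldByName is implemented. It shows a case-insensitive lookup failing under Turkic case rules and succeeding under the default, locale-independent rules.

```ruby
# Hypothetical field list and lookup helper, for illustration only.
FIELDS = %w[ID TITLE PRICE].freeze

# Case-insensitive lookup; case_options is passed straight to String#upcase
# (for example :turkic), so the same helper can use either rule set.
def find_field(name, *case_options)
  wanted = name.upcase(*case_options)
  FIELDS.find { |field| field.upcase(*case_options) == wanted }
end

turkic_result  = find_field("title", :turkic) # "title" upcases to "TİTLE", so no match
default_result = find_field("title")          # default rules: "title" upcases to "TITLE"
```

Under the Turkic rules the lowercase i becomes a dotted capital, which is not equal to the plain I in the stored name; comparing under rules that treat i and I as a pair avoids the problem.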
I have some strings, each containing a sentence, and I need to split each one into substrings of at most 40 characters.
But I don't want to split the sentence in the middle of a word.
I tried the .gsub function. It returns at most 40 characters and avoids cutting the string in the middle of a word, but it returns only the first chunk.
sentence[0..40].gsub(/\s\w+$/,'')
I tried split, but I can only select the first 40 characters, which splits in the middle of a word...
sentence.split(...){40}
My string is "Sure, we will show ourselves only when we know the east door has been opened.".
The output I want is
["Sure, we will show ourselves only when we","know the east door has been opened."]
Do you have a solution? Thanks
Your first attempt:
sentence[0..40].gsub(/\s\w+$/,'')
almost works, but it has one fatal flaw. You are splitting on the number of characters before cutting off the last word. This means you have no way of knowing whether the bit being trimmed off was a whole word, or a partial word.
Because of this, your code will always cut off the last word.
I would solve the problem as follows:
sentence[/\A.{0,39}[a-z]\b/mi]
\A is an anchor to fix the regex to the start of the string.
.{0,39}[a-z] matches on 1 to 40 characters, where the last character must be a letter. This is to prevent the last selected character from being punctuation or space. (Is that desired behaviour? Your question didn't really specify. Feel free to tweak/remove that [a-z] part, e.g. [a-z.] to match a full stop, if desired.)
\b is a word boundary look-around. It is a zero-width matcher, on beginning/end of words.
The /mi modifiers make the match case-insensitive (so [a-z] also matches A-Z) and let . match newline characters (Ruby's "multiline" mode).
One very minor note: because this regex matches 1 to 40 characters (rather than zero), it is possible to get a nil result. (This is seemingly very unlikely, since you'd need a one-word string of 41+ letters!) To account for this edge case, call .to_s on the result if needed.
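As a quick check of the above, here is a small sketch that applies the regex to the example sentence from the question and to an artificial 45-letter "word" (the long_word value is made up to trigger the edge case).

```ruby
FIRST_CHUNK = /\A.{0,39}[a-z]\b/mi

sentence = "Sure, we will show ourselves only when we know the east door has been opened."
first_chunk = sentence[FIRST_CHUNK]

# A single 45-letter "word": no word boundary falls within the first 40 characters.
long_word = "a" * 45
no_match  = long_word[FIRST_CHUNK]      # nil, because nothing matches
safe      = long_word[FIRST_CHUNK].to_s # nil.to_s gives an empty string
```

Here first_chunk is "Sure, we will show ourselves only when": the regex backs off to the last word boundary that fits within 40 characters.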
Update: Thank you for the improved edit to your question, providing a concrete example of an input/result. This makes it much clearer what you are asking for, as the original post was somewhat ambiguous.
You could solve this with something like the following:
sentence.scan(/.{0,39}[a-z.!?,;](?:\b|$)/mi)
String#scan returns an array of strings that match the pattern - so you can then re-join these strings to reconstruct the original.
Again, I have added a few more characters (!?,;) to the list of "final characters in the substring". Feel free to tweak this as desired.
(?:\b|$) means "either a word boundary, or the end of the line". This fixes the issue of the result not including the final . in the substrings. Note that I have used a non-capture group (?:) to prevent the result of scan from changing.
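For illustration, here is the scan call applied to the example sentence from the question. The strip step at the end is an optional addition, not part of the original suggestion: each chunk after the first starts with the space that separated it from the previous one.

```ruby
sentence = "Sure, we will show ourselves only when we know the east door has been opened."

chunks = sentence.scan(/.{0,39}[a-z.!?,;](?:\b|$)/mi)
# chunks is:
#   ["Sure, we will show ourselves only when",
#    " we know the east door has been opened."]

# Joining the raw chunks reproduces the input exactly.
rejoined = chunks.join

# Optional: drop the leading space carried by later chunks.
trimmed = chunks.map(&:strip)
```

Note that the first chunk stops before the second "we", because including it would push the chunk past 40 characters.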
I'm trying to split a string and count the number of words using Ruby, but I want to ignore special characters.
For example, in the string "Hello, my name is Hugo ..." I'm splitting by spaces, but the last ... shouldn't count because it isn't a word.
I'm using string.inner_text.split(' ').length. How can I specify that special characters (such as ... ? ! etc.) are not counted when they are separated from the text by spaces?
Thank you to everyone,
Kind Regards,
Hugo
"Hello, my name is não ...".scan /[^*!#%\^\s\.]+/
# => ["Hello,", "my", "name", "is", "não"]
/[^*!#%\^\s\.]+/ matches runs of any characters other than * ! # % ^, whitespace and the period. You can add more characters to this list if they should not be matched.
This is part answer, part response to @Neo's answer: why not use the proper tools for the job?
http://www.ruby-doc.org/core-1.9.3/Regexp.html says:
POSIX bracket expressions are also similar to character classes. They provide a portable alternative to the above, with the added benefit that they encompass non-ASCII characters. For instance, /\d/ matches only the ASCII decimal digits (0-9); whereas /[[:digit:]]/ matches any character in the Unicode Nd category.
/[[:alnum:]]/ - Alphabetic and numeric character
/[[:alpha:]]/ - Alphabetic character
...
Ruby also supports the following non-POSIX character classes:
/[[:word:]]/ - A character in one of the following Unicode general categories Letter, Mark, Number, Connector_Punctuation
If you want words, use str.scan /[[:word:]]+/
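A short sketch comparing the two approaches on the strings used in this thread:

```ruby
text = "Hello, my name is Hugo ..."

# Splitting on spaces counts the trailing "..." as a word.
split_count = text.split(' ').length

# Scanning for runs of word characters ignores punctuation entirely.
words      = text.scan(/[[:word:]]+/)
word_count = words.length

# [[:word:]] also covers non-ASCII letters such as "ã".
accented = "Hello, my name is não ...".scan(/[[:word:]]+/)
```

Unlike the negated character class above, this also strips the comma from "Hello,".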
If I have a string containing only characters from the ASCII set (0 to 127), can I guarantee that converting to upper case or lower case will result in a consistent value regardless of any localisation settings?
For example, can I know that "Hello World" will become "hello world" and "HELLO WORLD" under conversions to upper and lower case without knowing anything about localisation?
No, as @SLaks writes in a comment, Turkish has special rules for “i”: the uppercase equivalent of “i” is I with dot above, “İ”, and the lowercase equivalent of “I” is dotless i, “ı”. The same applies to Azeri, a close relative of Turkish.
It would depend on the function doing the conversion. You'd be fine with all the C library functions for example.
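To make "it depends on the function" concrete, here is a sketch in Ruby, chosen only for illustration since the question does not name a language. Ruby's String#upcase and String#downcase do not consult the locale settings; Turkic behaviour has to be requested explicitly with the :turkic option, which stands in here for a conversion function that does follow a Turkish locale.

```ruby
text = "Hello World"

# Default conversions: the result does not depend on locale settings.
plain_upper = text.upcase
plain_lower = text.downcase

# Explicitly requested Turkic rules change the result for pure-ASCII input.
turkic_upper = "i".upcase(:turkic)        # capital I with dot above
turkic_lower = "I".downcase(:turkic)      # dotless lowercase i
turkic_text  = text.downcase(:turkic)     # no capital I in "Hello World", so unaffected
turkic_title = "TITLE".downcase(:turkic)  # the I becomes a dotless i
```

So an ASCII-only input is not enough to guarantee an ASCII-only output: what matters is whether the conversion applies language-specific rules.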
For example:
x := #123;
I tried to search around Google but I simply have no idea what this means.
IIRC it means a character value of the number (eg. #32 -> space).
#123 is a character (Char type) of the ordinal value 123.
It's a character code: #97 is equivalent to 'a', and so on.
A chart can be seen here.
It is an extension to standard Pascal: Borland Pascal accepts the pound sign ('#') followed immediately by a decimal number between 0 and 255 as a single character with that code.
As others have mentioned, it's a character code. I most often see them used for line breaks in messages, or for other control characters such as Tab (#9):
ShowMessage('Error:'#13#10'Something terrible happened')
Strangely, it's not necessary to concatenate explicitly when a string involves these.
It's a character code: #97 is equivalent to chr(97), and so on.
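For readers who know Ruby better than Pascal, a rough analogue is Integer#chr. The snippet below is Ruby, not Pascal, and is only meant to show what the character codes above stand for.

```ruby
brace  = 123.chr # the character with code 123, i.e. "{"
letter = 97.chr  # the character with code 97, i.e. "a"

# Ruby has no juxtaposition syntax like 'Error:'#13#10'...',
# so the pieces are concatenated explicitly: 13 is CR, 10 is LF.
message = "Error:" + 13.chr + 10.chr + "Something terrible happened"
```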
I'm typesetting in LaTeX, and I'd like to display a "variable" (in my case, a reference \ref{blah} to an item number in a list) in roman rather than the default arabic. Is there an easy way to do this? Thanks for any pointers!
You can try \def\theenumi{\roman{enumi}} inside an enumerate environment -- this changes both labels and refs, but you'll have to then explicitly undo it (if you want to).
lowercase
\romannumeral 0\ref{blah}\relax
uppercase
\uppercase\expandafter{\romannumeral 0\ref{blah}}
What are the references to? Usually, you would redefine how that particular counter is displayed.
For example, to change how a section number is displayed, you could use the following command:
\renewcommand\thesection{\Roman{section}}
Now, each command that internally uses \thesection will print the section number as a roman numeral.
Similar commands work for chapter, figure etc.
\roman (lowercase r) yields lowercase roman numerals.
For lowercase: {\romannumeral \ref{blah}}
For uppercase: \uppercase\expandafter{\romannumeral \ref{blah}}
A good solution, it seems to me, is to declare
\renewcommand{\theenumi}{\roman{enumi}}
\renewcommand{\labelenumi}{(\theenumi)}
in the header and then cite by \eqref{blah} to get your (iii) for the third item. (Note that \eqref requires the amsmath package. Alternatively, write (\ref{blah}).)