I found some Dart code with # in front of a string:
_specialCharactersInsideCharacterClass = new HashSet.from([#"^", #"-", #"]"]);
Found in: RegExpBuilder.dart
What is the meaning of the symbol # in this case?
Right now, a prefix # character in front of a string is not valid Dart code. But I can imagine that it was used to disable escaping and string interpolation in the past. The linked Dart file is from 2013, so maybe it was created before the prefix r was introduced to mark raw strings:
_specialCharactersInsideCharacterClass = new HashSet.from([r"^", r"-", r"]"]);
In raw strings, string interpolation (using the $ character) and escaping (for example \r) are disabled.
I have a URL that, when converted to Punycode, gets the prefix xn----, which none of the regexes in the Ruby libraries I tried will match.
Currently I am using the validates_url_format_of Ruby library.
Example Url: "https://www.θεραπευτικη-κανναβη.com.gr"
Punycode url: "https://www.xn----ylbbafnbqebomc7ba3bp1ds.com.gr"
So can you please tell me whether the issue is in the library's regex or in the conversion to Punycode?
As per the Punycode conversion rules the prefix is always xn--. So can anyone explain what the extra two hyphens (--) mean here?
"https://www.xn----ylbbafnbqebomc7ba3bp1ds.com.gr".match(/https?:\/\/w*\.xn----.*/)
=> #<MatchData "https://www.xn----ylbbafnbqebomc7ba3bp1ds.com.gr">
Note the url matcher is not perfect
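Punycode copies the ASCII characters of a label (including any -) to the front of the output unchanged, then adds one - as a delimiter, and only then appends the encoded non-ASCII characters. So every hyphen in the original name shows up right after xn--, plus one more for the delimiter.
For example:
áéíóú.com -> xn--1caqmy9a.com
á-é-í-ó-ú.com -> xn-------4na3c3a3cwd.com
In your URL the label contains a single -, which gives xn-- followed by -- and then the encoded part.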
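The hyphen placement can be checked with a short Python sketch. It relies on Python's built-in punycode codec; the helper name to_ace_label is made up for illustration, and it deliberately skips the normalization steps that a full IDNA implementation performs.

```python
def to_ace_label(label):
    """Encode one domain label as "xn--" + Punycode.

    Simplified on purpose: no case folding or other IDNA normalization.
    """
    if label.isascii():
        return label  # pure-ASCII labels are left untouched
    return "xn--" + label.encode("punycode").decode("ascii")


for label in ["áéíóú", "á-é-í-ó-ú", "θεραπευτικη-κανναβη"]:
    # The ASCII hyphens of the input come first in the encoded part,
    # followed by one extra hyphen that acts as the delimiter.
    print(to_ace_label(label))
```

A label with no ASCII characters gets no delimiter at all, which is why the first example has nothing but xn-- in front of the encoded part.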
This one should work for you:
(xn--)(--)*[a-z0-9]+\.com\.gr
The beginning of the code: (xn--)
Zero or more pairs of hyphens: (--)*
The domain letters/digits: ([a-z0-9]+)
The TLD of the domain: \.com\.gr (the dots are escaped so they match literally)
You can add http/https if you wish
Update:
After adding numbers to the URL I found that the regex needs a fix:
(xn--)(-[-0-9])*[a-z0-9]+\.com\.gr
á-1é-2í-3ó-4ú.gr.com -> xn---1-2-3-4-7ya6f1b6dve.gr.com
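If the goal is only to recognise an encoded label, a looser pattern avoids counting hyphens altogether. Here is a Python sketch; the pattern and the names ACE_LABEL and is_ace_label are assumptions for illustration, not something taken from validates_url_format_of. It accepts xn-- followed by letters, digits and hyphens, ending in a letter or digit, within the 63-character limit for a DNS label.

```python
import re

# "xn--" plus at most 59 more characters, since a DNS label is at most 63 long.
ACE_LABEL = re.compile(r"xn--[a-z0-9-]{0,58}[a-z0-9]", re.IGNORECASE)


def is_ace_label(label):
    """Return True if the whole label looks like an xn-- encoded label."""
    return ACE_LABEL.fullmatch(label) is not None


host = "www.xn----ylbbafnbqebomc7ba3bp1ds.com.gr"
# Check each dot-separated label on its own.
print([part for part in host.split(".") if is_ace_label(part)])
```

This only checks the shape of the label; it does not verify that the part after xn-- is decodable Punycode.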
My app works with an SMSC, and I need to intervene in an SMS before it is sent.
I tried sending this string from a mobile phone:
"hello this is test"
When I checked the SMSC, I got this as the binary string of my text:
userData = "c8329bfd06d1d1e939283d07d1cb733a"
the encoding of this string is:
<Encoding:ASCII-8BIT>
I know that this userData is probably GSM-encoded, stored as a binary string,
so how can I get the clear text back from userData?
This question is about English text, because for Hebrew I can get the
string back with this code:
[userData].pack('H*').force_encoding('utf-16be').encode('utf-8')
but for English I get an error:
Encoding::InvalidByteSequenceError: "\xDA\xF3" followed by "u" on UTF-16BE
I also tried detecting the encoding of the binary string with ICU. It reported "ISO-8859-1" with the language detected as 'PT', which is very strange because my languages are English and Hebrew.
Anyway, I got lost in the encoding details, so I tried encoding to every name in Encoding.list,
but without luck so far.
Thanks in advance,
Shmulik
OK,
For anyone who also has this issue: I got the solution, thanks to someone from the #ruby IRC community (I missed his nickname).
The solution is:
For ASCII characters packed into binary,
you need this:
"c8329bfd06d1d1e939283d07d1cb733a".scan(/../).reverse_each.map { |h| h.to_i(16) }.pack('C*').unpack('B*')[0][2..-1].scan(/.{7}/).map.with_object("") { |x, s| s << x.to_i(2) }.reverse
Remember, I sent these words in the SMS:
"hello this is test"
and in binary they became:
"c8329bfd06d1d1e939283d07d1cb733a"
The reason I got garbage in every encoding is that the ASCII characters are sent as 7-bit GSM: each character occupies only 7 bits, while every other encoding uses at least 8 bits per character. Regrouping the bits into 7-bit units is what the code above actually does.
But this is only for the ASCII character set.
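The one-liner can be unpacked into explicit steps. Below is a Python sketch of the same idea (the function name unpack_gsm7 is made up). GSM 7-bit packing stores each character in 7 bits, lowest bits first, so reading the bytes as one little-endian number and slicing it 7 bits at a time recovers the characters. It assumes the text only uses characters whose GSM codes coincide with ASCII (letters, digits, space); a complete decoder would map each 7-bit value through the GSM default alphabet table instead.

```python
def unpack_gsm7(hex_string):
    """Unpack GSM 7-bit packed user data given as a hex string."""
    data = bytes.fromhex(hex_string)
    # The first septet sits in the lowest bits of the first byte, so the
    # whole payload can be read as one little-endian integer.
    bits = int.from_bytes(data, "little")
    septet_count = len(data) * 8 // 7
    septets = [(bits >> (7 * i)) & 0x7F for i in range(septet_count)]
    # Assumption: codes map straight to ASCII (not true for all GSM characters).
    return "".join(chr(s) for s in septets)


print(unpack_gsm7("c8329bfd06d1d1e939283d07d1cb733a"))  # Hello this is test
```

Note that this payload decodes with a capital H, so the phone presumably capitalised the first letter. Unlike the [2..-1] slice in the Ruby one-liner, which drops exactly two padding bits, the sketch derives the number of characters from the payload length. When the length is a multiple of 7 bytes, the last 7-bit group may be padding, and a real decoder would take the character count from the message header.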
For other languages, such as Hebrew in my case, the SMS is sent as UCS-2,
so this code works for me:
[your_binary_string].pack('H*').force_encoding('utf-16be').encode('utf-8')
It is very important to put the binary string in an array, because pack is an Array method.
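The same UCS-2 step in Python, for comparison. The sample hex string is made up for illustration (it is four Hebrew letters in UTF-16BE), not data captured from the SMSC, and decode_ucs2 is a hypothetical helper name.

```python
def decode_ucs2(hex_string):
    """Decode hex-encoded big-endian UCS-2 user data into a str."""
    return bytes.fromhex(hex_string).decode("utf-16-be")


sample = "05e905dc05d505dd"  # hypothetical payload: U+05E9 U+05DC U+05D5 U+05DD
text = decode_ucs2(sample)
print(len(text))  # 4
```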
So that's all for now.
If anybody wants to translate and explain exactly what happens in the code for the ASCII character set, be my guest.
Shmulik
I am using the iconv C API and I want iconv to detect the local encoding of the computer. Is that possible? Apparently it is because when I look in the source code, I find in the file iconv_open1.h that if the fromcode or tocode variables are empty strings ("") then the local encoding is used using the locale_charset() function call.
Someone also told me that in order to convert the locale encoding to unicode, all I needed was to use iconv_open ("UTF-8", "")
Unfortunately, I find no mention of this in the documentation.
And when I convert some iso-8859-1 text to the locale encoding (which is utf-8 on my machine), then during conversion I get errno=EILSEQ (illegal sequence). I checked and iconv_open returned no error.
If instead of the empty string in iconv_open I specify "utf-8", then I get no error. Obviously iconv failed to detect my current charset.
edit: I checked with a simple C program that puts(nl_langinfo(CODESET)) and I get ANSI_X3.4-1968 (which is ASCII). Apparently, I got a problem with charset detection.
edit: this should be related to Why is nl_langinfo(CODESET) different from locale charmap?
additional information: my program is written in Ada, and I bind at link-time to C functions. Apparently, the locale setting is not initialized the same way in the Ada runtime and C runtime.
I'll take the same answer as in Why is nl_langinfo(CODESET) different from locale charmap?
You need to first call
setlocale(LC_ALL, "");
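This also explains the EILSEQ error: until setlocale(LC_ALL, "") is called, a C program runs in the "C" locale, whose charset is reported as ASCII, and ISO-8859-1 text with accented characters cannot be converted to ASCII. A Python sketch of the same failure, with Python codecs standing in for iconv (an analogy only, not the iconv API; the convert helper is made up):

```python
def convert(raw, from_charset, to_charset):
    """Re-encode bytes from one charset to another, like an iconv call."""
    return raw.decode(from_charset).encode(to_charset)


latin1_text = "caf\u00e9".encode("iso-8859-1")  # b"caf\xe9"

# Target charset named explicitly: the conversion succeeds.
print(convert(latin1_text, "iso-8859-1", "utf-8"))

# Target charset is ASCII, as reported before setlocale(): it fails.
try:
    convert(latin1_text, "iso-8859-1", "ascii")
except UnicodeEncodeError as err:
    print("cannot represent in ASCII:", err.reason)
```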
I'm trying to find documentation that describes the syntax and possibilities suggested by the construction ${PRODUCT_NAME:rfc1034identifier}. Obviously this turns into some version of the product name, but where is the documentation that describes how? I just grepped the entire /Developer directory, and got nothing useful.
I'm not looking for the narrow definition of what happens to this particular variable, I want to know about all such modifiers like rfc1034identifier.
By using strings I also dug out the following things that look like they're related to :rfc1034identifier:
:quote - adds backslashes before whitespaces (and more), for use in shell scripts
:identifier - replaces whitespace, slashes (and more) with underscores
:rfc1034identifier - replaces whitespace, slashes (and more) with dashes
:dir - don't know, observed replace with ./ in some cases
:abs - don't know
Exact command:
strings /Developer/Library/PrivateFrameworks/DevToolsCore.framework/Versions/A/DevToolsCore|grep '^:'
There are more things that look like interesting modifiers (for example, :char-range=%#), but I couldn't get these to work. There's only one example of :char-range on the net, and it's from a crash log for Xcode.
Someone asked how do we know it's a modifier specification. Well, we know because it works on multiple variables in build settings. Plist preprocessor probably uses the same mechanisms to resolve build variables as does the build system.
Hack Saw, if you get a response via that bug report, don't forget to keep us informed :-)
Looks like you can stack these as well. The useful case floating around out there is
com.yourcompany.${PRODUCT_NAME:rfc1034identifier:lower}
such that a product name of "Your App" becomes com.yourcompany.your-app.
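A rough Python approximation of that stacked expansion, to make the transformation concrete. The function is a guess at the behaviour based on the descriptions in this thread (anything other than ASCII letters, digits and hyphens becomes a hyphen); it is not Xcode's implementation.

```python
import re


def rfc1034identifier(value):
    """Approximate :rfc1034identifier by replacing disallowed characters with '-'."""
    return re.sub(r"[^A-Za-z0-9-]", "-", value)


product_name = "Your App"
# :lower is applied after :rfc1034identifier, mirroring the stacked modifiers.
bundle_id = "com.yourcompany." + rfc1034identifier(product_name).lower()
print(bundle_id)  # com.yourcompany.your-app
```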
At long last, Apple produced some documentation on this. This is in the "Text Macros" section of the Xcode manual, as of this date.
Text macro format reference
A text macro can contain any valid unicode text. It can also contain other text macros.
Including other text macros
To include another text macro, add three underscore (_) characters before and after the macro name:
___<MacroName>___
Modifying text macro expansion
You can modify the final expansion of the text macro by adding one or more modifiers. Add a modifier to a text macro by placing a colon (:) at the end of the macro followed by the modifier. Add multiple modifiers by separating each one with a comma (,).
<MACRO>:<modifier>[,<modifier>]…
For example, the following macro will remove the path extension from the FILENAME macro:
FILENAME:deletingPathExtension
To turn the modified macro above into a valid C identifier, add the identifier macro:
FILENAME:deletingPathExtension,identifier
Modifiers
bundleIdentifier: Replaces any non-bundle identifier characters with a hyphen (-).
deletingLastPathComponent: Removes the last path component from the expansion string.
deletingPathExtension: Removes any path extension from the expansion string.
deletingTrailingDot: Removes any trailing dots (.).
identifier: Replaces any non-C identifier characters with an underscore (_).
lastPathComponent: Returns just the last path component of the expansion string.
pathExtension: Returns the path extension of the expansion string.
rfc1034Identifier: Replaces any non-rfc1034 identifier characters with a hyphen (-).
xml: Replaces special xml characters with the corresponding escape string. For example, less-than (<) is replaced with &lt;
TEXT MACROS
Text macros reference
COPYRIGHT
A copyright string that uses the company name of the team for the project. If there is no company name, the string is blank.
The example shows a copyright string when the company is set to “Apple”.
Copyright © 2018 Apple. All rights reserved.
DATE
The current date.
DEFAULTTOOLCHAINSWIFTVERSION
The version of Swift used for the default toolchain.
FILEBASENAME
The name of the current file without any extension.
FILEBASENAMEASIDENTIFIER
The name of the current file encoded as a C identifier.
FILEHEADER
The text placed at the top of every new text file.
FILENAME
The full name of the current file.
FULLUSERNAME
The full name of the current macOS user.
NSHUMANREADABLECOPYRIGHTPLIST
The entry for the human readable copyright string in the Info.plist file of a macOS app target. The value of the macro must include the XML delimiters for the plist. For example, a valid value is:
'''
<key>NSHumanReadableCopyright</key>
<string>Copyright © 2018 Apple, Inc. All rights reserved.</string>
'''
Notice that the value includes a newline.
ORGANIZATIONNAME
The name for your organization that appears in boilerplate text throughout your project folder. The organization name in your project isn’t the same as the organization name that you enter in App Store Connect.
PACKAGENAME
The name of the package built by the current scheme.
PACKAGENAMEASIDENTIFIER
A C-identifier encoded version of the package name built by the current scheme.
PRODUCTNAME
The app name of the product built by the current scheme.
PROJECTNAME
The name of the current project.
RUNNINGMACOSVERSION
The version of macOS that is running Xcode.
TARGETNAME
The name of the current target.
TIME
The current time.
USERNAME
The login name for the current macOS user.
UUID
Returns a unique ID. The first time this macro is used, it generates the ID before returning it. You can use this macro to create multiple unique IDs by using a modifier. Each modifier returns an ID that is unique for that modifier. For example, the first time the UUID:firstPurpose modifier is used, the macro generates and returns a unique ID for that macro and modifier combination. Subsequent uses of the UUID:firstPurpose modifier return the same ID. Adding the UUID:secondPurpose modifier generates and returns a different ID that will be unique to UUID:secondPurpose, and different from the ID for UUID:firstPurpose.
WORKSPACENAME
The name of the current workspace. If there is only one project open, then the name of the current project.
YEAR
The current year as a four-digit number.
$ strings /Developer/Library/PrivateFrameworks/DevToolsCore.framework/Versions/A/DevToolsCore
PRODUCTNAME
PRODUCTNAMEASIDENTIFIER
PRODUCTNAMEASRFC1034IDENTIFIER
PRODUCTNAMEASXML
It seems that there are :identifier, :rfc1034identifier and :xml modifiers. But I have no clue except this.
After stumbling over this question and its existing answers, I have to say: Apple's documentation has not improved on this topic in recent years. We are currently at Xcode 13 and there is still no complete list of all available modifiers.
Therefore I did some spelunking and found the supported modifiers in DVTFoundation.framework which I will list below.
I've tested them all in Xcode 13.3 build settings and used the following two macros to illustrate their impact:
MY_MACRO = Some "text" with umlauts äöüçñ and special characters are ',/|\-_:;%&<>.!
MY_SOURCE = /Applications/Xcode.app/Contents/Frameworks/../SharedFrameworks/DVTFoundation.framework
Retrieval operators/modifiers
Retrieval modifiers are used to extract and/or transform all or parts of a macro/variable/setting.
They are applied using the following syntax: $(<VARIABLE>:<MODIFIER>)
quote: Escapes all characters which have a special meaning in shell scripts/commands like space, colon, semicolon and backslash.
RESULT_quote = $(MY_MACRO:quote)
Some\ \"text\"\ with\ umlauts\ äöüçñ\ and\ special\ characters\ are\ \',/|\\-_:;%&<>.!
upper: Transforms all characters to their uppercase equivalents.
RESULT_upper = $(MY_MACRO:upper)
SOME "TEXT" WITH UMLAUTS ÄÖÜÇÑ AND SPECIAL CHARACTERS ARE ',/|\-_:;%&<>.!
lower: Transforms all characters to their lowercase equivalents.
RESULT_lower = $(MY_MACRO:lower)
some "text" with umlauts äöüçñ and special characters are ',/|\-_:;%&<>.!
identifier: Replaces any non-C identifier characters with an underscore (_).
RESULT_identifier = $(MY_MACRO:identifier)
Some__text__with_umlauts_______and_special_characters_are________________
rfc1034identifier: Replaces any non-rfc1034 identifier characters with a hyphen (-)
RESULT_rfc1034identifier = $(MY_MACRO:rfc1034identifier)
Some--text--with-umlauts-------and-special-characters-are----------------------------
c99extidentifier: Replaces any non-C99 identifier characters with an underscore (_). Umlauts are allowed as C99 uses Unicode!
RESULT_c99extidentifier = $(MY_MACRO:c99extidentifier)
Some__text__with_umlauts_äöüçñ_and_special_characters_are___________________________
xml: According to Apple's documentation it should replace special XML characters with the corresponding escape string. For example, less-than (<) is replaced with &lt;. But in my examples this didn't work.
RESULT_xml = $(MY_MACRO:xml)
Some "text" with umlauts äöüçñ and special characters are ',/|\-_:;%&<>.!
dir: Extracts the directory part of a path
RESULT_dir = $(MY_SOURCE:dir)
/Applications/Xcode.app/Contents/Frameworks/../SharedFrameworks/
file: Extracts the filename part of a path
RESULT_file = $(MY_SOURCE:file)
DVTFoundation.framework
base: Extracts the filename base part of a path (=filename without suffix/extension)
RESULT_base = $(MY_SOURCE:base)
DVTFoundation
suffix: Extracts the filename extension/suffix of a path or filename
RESULT_suffix = $(MY_SOURCE:suffix)
.framework
standardizepath: Standardizes the path (e.g. ../ and tilde (~) are resolved)
RESULT_standardizepath = $(MY_SOURCE:standardizepath)
/Applications/Xcode.app/Contents/SharedFrameworks/DVTFoundation.framework
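For readers who know Python better than Xcode build settings, the path modifiers correspond roughly to functions in posixpath. This mapping is only an analogy and not how Xcode implements them: for example, dirname drops the trailing slash that :dir keeps in the output above, and normpath resolves ../ but not a tilde.

```python
import posixpath

source = ("/Applications/Xcode.app/Contents/Frameworks/"
          "../SharedFrameworks/DVTFoundation.framework")

directory = posixpath.dirname(source)        # like :dir, without the trailing "/"
filename = posixpath.basename(source)        # like :file
base, suffix = posixpath.splitext(filename)  # like :base and :suffix
standardized = posixpath.normpath(source)    # like :standardizepath for "../"

print(directory)
print(filename, base, suffix)
print(standardized)
```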
Replacement operators/modifiers
Besides the extracting/transforming operators above, the build settings system has built-in support for replacing specific parts of a path, which are selected using a modifier.
They are applied using the following syntax: $(<VARIABLE>:<MODIFIER>=<VALUE>)
dir=<VALUE>: Replaces the directory part of a path with <VALUE> and returns the new path
RESULT2_dir = $(MY_SOURCE:dir=/Developer/SharedFrameworks)
/Developer/SharedFrameworks/DVTFoundation.framework
file=<VALUE>: Replaces the filename part of a path and returns the new path
RESULT2_file = $(MY_SOURCE:file=my_file.txt)
/Applications/Xcode.app/Contents/Frameworks/../SharedFrameworks/my_file.txt
base=<VALUE>: Replaces the filename base part of a path (=filename without suffix/extension) and returns the new path
RESULT2_base = $(MY_SOURCE:base=Dummy)
/Applications/Xcode.app/Contents/Frameworks/../SharedFrameworks/Dummy.framework
suffix=<VALUE>: Replaces the filename extension/suffix of a path and returns the new path
RESULT2_suffix = $(MY_SOURCE:suffix=.txt)
/Applications/Xcode.app/Contents/Frameworks/../SharedFrameworks/DVTFoundation.txt
I hope this list will help more people looking at Xcode's build settings and wondering how they can be transformed.
Edit: what's the difference between reading a backslash from a file and writing it to the interactive window vs. writing the string directly to the interactive window?
For example
let toto = "Adelaide Gu\u00e9nard"
toto;;
the interactive window prints "Adelaide Guénard".
Now if I save a txt file with the single line Adelaide Gu\u00e9nard . And read it in:
System.IO.File.ReadAllLines(@"test.txt")
The interactive window prints [|"Adelaide Gu\u00e9nard"|]
What is the difference between these 2 statements in terms of the interactive window printing ?
As far as I know, there is no library that would decode the F#/C# escaping of string for you, so you'll have to implement that functionality yourself. There was a similar question on how to do that in C# with a solution using regular expressions.
You can rewrite that to F# like this:
open System
open System.Globalization
open System.Text.RegularExpressions
let regex = new Regex (@"\\[uU]([0-9A-F]{4})", RegexOptions.IgnoreCase)
let line = "Adelaide Gu\\u00e9nard"
// Replace each \uXXXX escape with the character it denotes
let decoded = regex.Replace(line, fun (m: Match) ->
    (char (Int32.Parse(m.Groups.[1].Value, NumberStyles.HexNumber))).ToString())
(If you write "some\\u00e9etc" then you're creating a string that contains the same thing as what you'd read from the text file. If you use a single backslash, the F# compiler interprets the escape sequence itself.)
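The same regex-based approach sketched in Python, for comparison (the function name is illustrative). Like the F# version above, it only handles the four-digit \u form.

```python
import re

# A literal backslash, then u or U, then exactly four hex digits.
_ESCAPE = re.compile(r"\\[uU]([0-9a-fA-F]{4})")


def decode_unicode_escapes(text):
    """Replace each literal \\uXXXX sequence with the character it denotes."""
    return _ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), text)


line = "Adelaide Gu\\u00e9nard"  # what ends up in memory after reading the file
decoded = decode_unicode_escapes(line)
print(decoded == "Adelaide Gu\u00e9nard")  # True
```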
It uses the StructuredFormat stuff from the F# PowerPack. For your string, it's effectively doing printfn toto;;.
You can achieve the same behaviour in a text file as follows:
open System.IO;;
File.WriteAllText("toto.txt", toto);;
The default encoding used by File.WriteAllText is UTF-8. You should be able to open toto.txt in Notepad or Visual Studio and see the é correctly.
Edit: If I wanted to write the content of test.txt to another file in the clean form that F# interactive prints, how would I proceed?
It looks like fsi is being too clever when printing the contents of test.txt. It's formatting it as a valid F# expression, complete with quotes, [| |] brackets, and a Unicode character escape. The string returned by File.ReadAllLines doesn't contain any of these things; it just contains the words Adelaide Guénard.
You should be able to take the array returned by File.ReadAllLines and pass it to File.WriteAllLines, without the contents being mangled.