F# trim null chars - f#

I have a string received from a web socket:
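open System
open System.Net.WebSockets
open System.Text
open System.Threading
// the lines below run inside a task { ... } block; url is the server's Uri
let websocket = new ClientWebSocket()
let source = new CancellationTokenSource()
let buffer = ArraySegment<byte>(Array.zeroCreate 32)
do! websocket.ConnectAsync(url, source.Token)
let! result = websocket.ReceiveAsync(buffer, source.Token)
let str = Encoding.ASCII.GetString buffer.Array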
let trim1 = Regex.replace str "\0" String.Empty // result is empty string
let trim2 = Regex.replace str "\\0" String.Empty // result is empty string
let trim3 = str.TrimEnd [| '\\'; '0' |] // result is untouched
I am trying to trim the excess null characters.
In the debugger the value of str is "{\"type\":\"hello\"}\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"
When printed, it looks like {"type":"hello"}, which makes sense: the escape sequences are interpreted properly.
I can't seem to do this simple task in F#. What am I doing wrong?

Using ASCII escaping will do the trick:
let trim3 = str.TrimEnd [| '\x00' |]
You could also escape it with the unicode escape sequence:
let trim3 = str.TrimEnd [| '\u0000' |]
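Your TrimEnd attempt leaves the string untouched because [| '\\'; '0' |] contains two characters, a backslash and the digit zero, whereas the \0 shown by the debugger is a single null character. Your regular expression version does not work because the proper way to represent a null character in a regular expression is with a hex escape, "\x00", or a Unicode escape, "\u0000".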
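The snippet below is a minimal sketch, not part of the original answer, that checks these fixes offline by simulating a 32-byte receive buffer holding a 16-byte payload. It assumes .NET's Regex, where \x00 matches the null character. The variable `written` is a hypothetical stand-in for the Count property of the WebSocketReceiveResult returned by ReceiveAsync; decoding only that many bytes avoids the trailing nulls in the first place.

```fsharp
open System
open System.Text
open System.Text.RegularExpressions

// Simulate a 32-byte receive buffer that only holds 16 bytes of payload
let payload = """{"type":"hello"}"""
let buffer : byte[] = Array.zeroCreate 32
let written = Encoding.ASCII.GetBytes(payload, 0, payload.Length, buffer, 0)

// Decoding the whole array keeps the 16 trailing null characters
let str = Encoding.ASCII.GetString(buffer)

// '\u0000' is the null character (the answer's '\x00' denotes the same char)
let trimmed = str.TrimEnd([| '\u0000' |])

// In a .NET regular expression, \x00 matches the null character
let replaced = Regex.Replace(str, @"\x00", String.Empty)

// Decoding only the bytes actually written avoids the padding entirely;
// `written` plays the role of WebSocketReceiveResult.Count here
let decoded = Encoding.ASCII.GetString(buffer, 0, written)

printfn "%s" trimmed
```

All three approaches yield the same text; the last one is usually preferable because it never produces the padding.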

Related

How to express Strings in Swift using Unicode hexadecimal values (UTF-16)

I want to write a Unicode string using hexadecimal values in Swift. I have read the documentation for String and Character so I know that I can use special Unicode characters directly in strings like the following:
var variableString = "Cat‼🐱" // "Cat" + Double Exclamation + cat emoji
But I would like to do it using the Unicode code points. The docs (and this question) show it for characters, but are not very clear about how to do it for strings.
(Note: Although the answer seems obvious to me now, it wasn't obvious at all just a short time ago. I am answering my own question below as a means of learning how to do this and also to help myself understand Unicode terminology and how Swift Characters and Strings work.)
Character
The Swift syntax for forming a hexadecimal code point is
\u{n}
where n is a hexadecimal number up to 8 digits long. The valid range for a Unicode scalar is U+0 to U+D7FF and U+E000 to U+10FFFF inclusive. (The U+D800 to U+DFFF range is for surrogate pairs, which are not scalars themselves, but are used in UTF-16 for encoding the higher value scalars.)
Examples:
// The following forms are equivalent. They all produce "C".
let char1: Character = "\u{43}"
let char2: Character = "\u{0043}"
let char3: Character = "\u{00000043}"
// Higher value Unicode scalars are done similarly
let char4: Character = "\u{203C}" // ‼ (DOUBLE EXCLAMATION MARK character)
let char5: Character = "\u{1F431}" // 🐱 (cat emoji)
// Characters can be made up of multiple scalars
let char7: Character = "\u{65}\u{301}" // é = "e" + accent mark
let char8: Character = "\u{65}\u{301}\u{20DD}" // é⃝ = "e" + accent mark + circle
Notes:
Leading zeros can be added or omitted
Characters are known as extended grapheme clusters. Even when they are composed of multiple scalars, they are still considered a single character. What is key is that they appear to be a single character (grapheme) to the user.
TODO: How to convert surrogate pair to Unicode scalar in Swift
String
Strings are composed of characters. See the following examples for some ways to form them using hexadecimal code points.
Examples:
var string1 = "\u{0043}\u{0061}\u{0074}\u{203C}\u{1F431}" // Cat‼🐱
// pass an array of characters to a String initializer
let catCharacters: [Character] = ["\u{0043}", "\u{0061}", "\u{0074}", "\u{203C}", "\u{1F431}"] // ["C", "a", "t", "‼", "🐱"]
let string2 = String(catCharacters) // Cat‼🐱
Converting Hex Values at Runtime
At runtime you can convert hexadecimal or Int values into a Character or String by first converting them to a UnicodeScalar.
Examples:
// hex values
let value0: UInt8 = 0x43 // 67
let value1: UInt16 = 0x203C // 8252
let value2: UInt32 = 0x1F431 // 128049
// convert hex to UnicodeScalar
let scalar0 = UnicodeScalar(value0)
// make sure that UInt16 and UInt32 form valid Unicode values
guard
let scalar1 = UnicodeScalar(value1),
let scalar2 = UnicodeScalar(value2) else {
return
}
// convert to Character
let character0 = Character(scalar0) // C
let character1 = Character(scalar1) // ‼
let character2 = Character(scalar2) // 🐱
// convert to String
let string0 = String(scalar0) // C
let string1 = String(scalar1) // ‼
let string2 = String(scalar2) // 🐱
// convert hex array to String
let myHexArray = [0x43, 0x61, 0x74, 0x203C, 0x1F431] // an Int array
var myString = ""
for hexValue in myHexArray {
if let scalar = UnicodeScalar(hexValue) {
myString.append(Character(scalar))
}
}
print(myString) // Cat‼🐱
To go from a hex value such as "0x1F52D" to the actual emoji, start with the value:
let c = 0x1F602
The next step is to get a UInt32 from your hex value (UnicodeScalar(Int) is failable, so it is unwrapped):
let intEmoji = UnicodeScalar(c)!.value
From this you can do something like:
titleLabel.text = String(UnicodeScalar(intEmoji)!)
Here you have a "😂".
It works with ranges of hexadecimal values too:
let emojiRanges = [
    0x1F600...0x1F636,
    0x1F645...0x1F64F,
    0x1F910...0x1F91F,
    0x1F30D...0x1F52D
]
var data: [UInt32] = []
for range in emojiRanges {
    for i in range {
        let c = UnicodeScalar(i)!.value
        data.append(c)
    }
}
This collects a UInt32 for every code point in the given hex ranges.

How to use rangeOfString below in Swift

Suppose I have a string "10.9.1.1", I want to get substring "10.9". How can I achieve this?
So far I have the following:
var str = "10.9.1.1"
let range = str.rangeOfString(".",options: .RegularExpressionSearch)!
let rangeOfDecimal = Range(start:str.startIndex,end:range.endIndex)
var subStr = str.substringWithRange(rangeOfDecimal)
But this will only return 10.
Actually your code returns only "1", because "." in a regular expression pattern matches any character.
The correct pattern is \d+\.\d+:
\d+   one or more digits
\.    a literal dot
\d+   one or more digits
In a Swift string, you have to escape the backslashes as "\\":
let str = "10.9.1.1"
if let range = str.rangeOfString("\\d+\\.\\d+",options: .RegularExpressionSearch) {
let subStr = str.substringWithRange(range)
println(subStr) // "10.9"
}

Loop through list of 2 tuples to replace part of a string

I'm trying to replace chained String.Replace() calls with a more functional version. Original:
let ShortenRomanNumeral (num : string) : string =
num.Replace("VIIII", "IX").Replace("IIII", "IV").Replace("LXXXX", "XC").Replace("XXXX", "XL").Replace("DCCCC", "CM").Replace("CCCC", "CD")
Functional version that works with one key value pair:
let ShortenRomanNumeral' (str : string) (k : string) (v : string) : string =
let strAfterReplace =
str.Replace(k, v)
strAfterReplace
I'm struggling to extend it to work with a list of tuples, such as
let replacements = [("VIIII", "IX"); ("IIII", "IV"); ...]
How can I write this function to apply the Replace() to the string for each key and value in the replacements list?
Fold is good. But just to demonstrate another way to do it...
// You can put the input string
// as the LAST parameter not first
let shortenRomanNumeral (k:string,v:string) (input:string) =
input.Replace(k,v)
// This allows you to do partial application like this
let replace4 = shortenRomanNumeral ("IIII", "IV")
let replace9 = shortenRomanNumeral ("VIIII", "IX")
// replace9 and replace4 have the signature string->string
// they are now simple string transformation functions
replace4 "abcIIIIdef" |> printfn "result is '%s'"
replace9 "abcVIIIIdef" |> printfn "result is '%s'"
// and they can be composed together.
// Order is important. Do 9 before 4.
let replace4and9 = replace9 >> replace4
replace4and9 "VIIII abc IIII" |> printfn "result is '%s'"
// With this approach, you can now transform a list of tuples
// into a list of string transforms using List.map
let listOfTransforms =
[("VIIII", "IX"); ("IIII", "IV");]
|> List.map shortenRomanNumeral
// and you can combine all these into one big transformation
// function using composition
let transformAll =
listOfTransforms
|> List.reduce (>>)
// finally you can apply the big function
transformAll "VIIII abc IIII" |> printfn "result is '%s'"
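One caveat with List.reduce: it throws an ArgumentException when the list is empty. A variation, sketched below on top of the answer's own shortenRomanNumeral, folds with id as the starting function so that an empty list of replacements simply leaves the input unchanged. The name composeAll is illustrative, not from the answer.

```fsharp
// Input string last, so partial application yields a string -> string transform
let shortenRomanNumeral (k: string, v: string) (input: string) = input.Replace(k, v)

// Compose transforms left to right; starting from id makes the empty list safe
let composeAll (transforms: (string -> string) list) : string -> string =
    List.fold (>>) id transforms

let transformAll =
    [ ("VIIII", "IX"); ("IIII", "IV") ]
    |> List.map shortenRomanNumeral
    |> composeAll

transformAll "VIIII abc IIII" |> printfn "result is '%s'"
```

Because fold applies `state >> t` for each transform in turn, the first pair in the list still runs first, exactly as with List.reduce (>>).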
A fold will do the job:
let ShortenRomanNumeral' (str : string) (k : string, v : string) : string =
let strAfterReplace =
str.Replace(k, v)
strAfterReplace
let replacements = [("VIIII", "IX"); ("IIII", "IV"); ]
let replaceValues str = List.fold ShortenRomanNumeral' str replacements
replaceValues "VI VII VIIII I II III IIII" // "VI VII IX I II III IV"
Note that I only modified the last parameter of ShortenRomanNumeral' to accept tupled values.
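To check that the fold reproduces the original chained calls, the sketch below extends the replacements list to all six pairs from the question and compares both versions. The names replaceOne, allReplacements, shortenFold and shortenChained are illustrative. Order still matters: each five-character pattern (VIIII, LXXXX, DCCCC) must come before its four-character counterpart, otherwise "VIIII" would become "VIV".

```fsharp
// Tupled second parameter so it fits List.fold's 'State -> 'T -> 'State shape
let replaceOne (str: string) (k: string, v: string) : string = str.Replace(k, v)

// Same pairs, in the same order, as the chained Replace calls in the question
let allReplacements =
    [ ("VIIII", "IX")
      ("IIII", "IV")
      ("LXXXX", "XC")
      ("XXXX", "XL")
      ("DCCCC", "CM")
      ("CCCC", "CD") ]

let shortenFold (num: string) : string =
    List.fold replaceOne num allReplacements

// The original chained version, kept for comparison
let shortenChained (num: string) : string =
    num.Replace("VIIII", "IX").Replace("IIII", "IV").Replace("LXXXX", "XC").Replace("XXXX", "XL").Replace("DCCCC", "CM").Replace("CCCC", "CD")

let samples = [ "VIIII"; "IIII"; "MDCCCCLXXXXVIIII"; "CCCCXXXXIIII" ]
let allAgree = samples |> List.forall (fun s -> shortenFold s = shortenChained s)
```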

Applying a filter to get a single item and using the filter function to transform the result

In the following example code, I filter a list of strings on a regular expression, knowing that there can only be a single entry that will match that string. I then use the same match string to get 2 grouped values out of the single remaining value.
let input = ["aaaa bbbb";"aaabbbb";"cccc$$$$";"dddddda";" "]
let ValuesOfAB (input: string list) =
    let matchString = @"(?<a>\w+)\s(?<b>\w+)"
    let value =
        input
        |> List.filter (fun line -> Regex.Matches(line, matchString).Count <> 0)
        |> List.head
    (Regex.Matches(value, matchString).[0].Groups.["a"].Value, Regex.Matches(value, matchString).[0].Groups.["b"].Value)
let a = ValuesOfAB input
Is there a better way, where I don't have to run Regex.Matches on the same string a second time to get the values I want to return?
Use List.pick:
let input = ["aaaa bbbb";"aaabbbb";"cccc$$$$";"dddddda";" "]
let valuesOfAB (input: string list) =
    let matchString = @"(?<a>\w+)\s(?<b>\w+)"
    let v =
        input
        |> List.pick (fun line ->
            let m = Regex.Match(line, matchString)
            if m.Success then Some m else None)
    v.Groups.["a"].Value, v.Groups.["b"].Value
let a = valuesOfAB input
Explanation:
You want to find the first string in the list that matches and return its Match object, so that you don't have to run the regex again. List.pick fits the task well.
For each string you only need to know whether it matches at least once, so Regex.Match together with Match.Success is enough; Regex.Matches would collect every match in the string.
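List.pick raises a KeyNotFoundException when no element produces Some. If the input might contain no matching line, List.tryPick returns an option instead. The sketch below also extracts both groups inside the lambda, so the function returns the tuple directly; the name tryValuesOfAB is illustrative, not from the answer.

```fsharp
open System.Text.RegularExpressions

let matchString = @"(?<a>\w+)\s(?<b>\w+)"

// Some (a, b) for the first matching line, or None when no line matches
let tryValuesOfAB (input: string list) : (string * string) option =
    input
    |> List.tryPick (fun line ->
        let m = Regex.Match(line, matchString)
        if m.Success then Some(m.Groups.["a"].Value, m.Groups.["b"].Value) else None)

let found = tryValuesOfAB ["aaaa bbbb"; "aaabbbb"; "cccc$$$$"; "dddddda"; " "]
let missing = tryValuesOfAB ["aaabbbb"; " "]
```

The caller then decides what "no match" means, for example with a match expression on the option, rather than handling an exception.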

How to convert string array to float array and substitute Double.NaN for non-numeric values?

I'm writing a parser for CSV data, and am trying to determine how to handle records that are blank ("") or contain character data ("C"). The parser code I have below works great, but forces me to deal with the float conversions later. I'd like to be able to just make my string[][] a float[][], and handle the conversions when I parse the file, but I notice that it blows up with any non-numeric data. Ideally there would be no non-numeric or blank values, but they are unavoidable, and as such, have to be dealt with.
Can someone possibly recommend a concise approach to attempt to convert to Double, and then if it doesn't work, replace with Double.NaN instead? (Without sacrificing much performance if possible). Thank you.
let stringLine = [| "2.0"; ""; "C" |]
let stringLine2Float = Array.map float stringLine // throws FormatException on "" and "C"
//desiredFloatArray = [| 2.0; Double.NaN; Double.NaN |]
// fileLines is a helper (not shown in the question) that returns the file's lines as a seq<string>
type csvData = { mutable RowNames: string[]; mutable ColNames: string[]; mutable Data: string[][] }
let csvParse (fileString: string) =
    let colNames = ((fileLines fileString |> Seq.head).Split(',')).[1..]
    let lines = fileLines fileString |> Seq.skip 1 |> Array.ofSeq
    let rowNames = Array.init lines.Length string
    let allData : string [][] = Array.zeroCreate rowNames.Length
    for i in 0..rowNames.Length - 1 do
        let fields = lines.[i].Split(',')
        allData.[i] <- fields.[1..]
        rowNames.[i] <- fields.[0]
    { RowNames = rowNames; ColNames = colNames; Data = allData }
Use this instead of the built-in float conversion:
let cvt s =
let (ok,f) = System.Double.TryParse(s)
if ok then f else nan
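Two details about the cvt approach are worth knowing. Double.TryParse without a format provider uses the current culture, so "2.0" may be rejected or misread on machines that use a comma as the decimal separator; passing CultureInfo.InvariantCulture avoids that. And because NaN is not equal to itself, results should be checked with Double.IsNaN rather than =. The sketch below applies both points; toFloatData is a hypothetical helper name for converting a whole string[][] such as the Data field of csvData.

```fsharp
open System
open System.Globalization

// Returns nan for blank or non-numeric fields instead of throwing
let cvt (s: string) : float =
    match Double.TryParse(s, NumberStyles.Float, CultureInfo.InvariantCulture) with
    | true, f -> f
    | false, _ -> nan

// Hypothetical helper: convert every field of a parsed string[][] at once
let toFloatData (data: string[][]) : float[][] =
    data |> Array.map (Array.map cvt)

let floatLine = [| "2.0"; ""; "C" |] |> Array.map cvt
let floatData = toFloatData [| [| "1.5"; "x" |]; [| ""; "-3e2" |] |]
```

TryParse returns false instead of throwing, so the conversion costs little even when many fields are invalid.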
