Extract only words from a sentence which includes numbers - iOS

I am using TesseractOCR to read a receipt, and I have managed to extract the text from the receipt line by line, e.g.
2 melon £3.00
1 lime £1.50
5 chicken wings £10.00
But now, for each line, I would like to extract the item name (melon, lime, chicken wings), then the integer, and then the float, all separately. I have googled a lot and have written this in Ruby using regex, but can't figure out how to do it in Swift. I have figured out the float and integer parts, just not the words-only part.
A link to an existing answer, or an answer, would be great. Thanks in advance for any help.

If you have solved this using regex in Ruby, the solution in Swift is similar. First let's define some helper functions since NSRegularExpression still deals in NSRange units:
import Foundation
extension String {
    // NSRange counts UTF-16 code units, so measure the string the same way.
    var fullRange: NSRange {
        return NSRange(location: 0, length: utf16.count)
    }
    // Convert the NSRange back to a Swift range before slicing.
    subscript(range: NSRange) -> String {
        guard let swiftRange = Range(range, in: self) else { return "" }
        return String(self[swiftRange])
    }
}
And the code:
let text =
    "2 melon £3.00\n" +
    "1 lime £1.50\n" +
    "5 chicken wings £10.00"
let regex = try! NSRegularExpression(pattern: "(\\d+)\\s+(.+?)\\s+£([\\d\\.]+)$", options: [.anchorsMatchLines])
regex.enumerateMatches(in: text, options: [], range: text.fullRange) { result, flag, stop in
    if let result = result {
        // Group 1 is the quantity, group 2 the item name, group 3 the price.
        let r1 = result.range(at: 1)
        let r2 = result.range(at: 2)
        let r3 = result.range(at: 3)
        print("quantity = \(text[r1]), item = \(text[r2]), price = \(text[r3])")
    }
}
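Since the question asks for the integer and the float as separate values, here is a minimal sketch of how the captured groups could be converted to typed fields. The `ReceiptLine` struct and the `parseReceipt` function are hypothetical names introduced only for this illustration, and the pattern assumes every line has the shape "quantity, name, £price".

```swift
import Foundation

// Hypothetical container for one parsed receipt line.
struct ReceiptLine {
    let quantity: Int
    let item: String
    let price: Double
}

// Hypothetical helper: parses lines shaped like "5 chicken wings £10.00".
func parseReceipt(_ text: String) -> [ReceiptLine] {
    let pattern = "^(\\d+)\\s+(.+?)\\s+£(\\d+(?:\\.\\d+)?)$"
    guard let regex = try? NSRegularExpression(pattern: pattern,
                                               options: [.anchorsMatchLines]) else { return [] }
    let range = NSRange(text.startIndex..., in: text)
    return regex.matches(in: text, options: [], range: range).compactMap { match -> ReceiptLine? in
        // Convert each NSRange back to a Swift range, then to a typed value.
        guard let q = Range(match.range(at: 1), in: text),
              let i = Range(match.range(at: 2), in: text),
              let p = Range(match.range(at: 3), in: text),
              let quantity = Int(String(text[q])),
              let price = Double(String(text[p])) else { return nil }
        return ReceiptLine(quantity: quantity, item: String(text[i]), price: price)
    }
}

let lines = parseReceipt("2 melon £3.00\n1 lime £1.50\n5 chicken wings £10.00")
for line in lines {
    print(line.quantity, line.item, line.price)
}
```

Lines that do not match the assumed shape are simply skipped, which may or may not be what you want for noisy OCR output.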

Use components(separatedBy:):
import Foundation
let a = "5 Chicken Wing"
let b = a.components(separatedBy: " ") // split on spaces
let b0 = b[0] // 5
let b1 = b[1] // Chicken
let b2 = b[2] // Wing

Related

How to put and sort word in NSCountedSet in swift?

I'm trying to get the most duplicated words from a string with this code.
let text = """
aa bb aa bb aa bb cc dd dd cc zz zz cc dd zz
"""
let words = text.unicodeScalars.split(omittingEmptySubsequences: true, whereSeparator: { !CharacterSet.alphanumerics.contains($0) })
.map { String($0) }
let wordSet = NSCountedSet(array: words)
let sorted = wordSet.sorted { wordSet.count(for: $0) > wordSet.count(for: $1) }
print(sorted.prefix(3))
The result is
[cc, dd, aa]
Currently, it adds every word, even a single character.
What I want to do is:
add a word to the NSCountedSet only if it has more than one character.
if words in the NSCountedSet have the same count, sort them alphabetically.
(desired result is aa, cc, dd)
And if it is possible:
omit certain parts of speech from the string, such as 'and, a, how, of, to, it, in, on, who', etc.
Let's consider this string:
let text = """
She was young the way an actual young person is young.
"""
You could use a linguistic tagger:
import Foundation // NSLinguisticTagger is declared in Foundation
let options = NSLinguisticTagger.Options.omitWhitespace.rawValue
let tagger = NSLinguisticTagger(tagSchemes: NSLinguisticTagger.availableTagSchemes(forLanguage: "en"), options: Int(options))
To count the multiplicity of each word I'll be using a dictionary:
var dict = [String : Int]()
Let's define the accepted linguistic tags (you can change these to your liking):
let acceptedtags: Set = ["Verb", "Noun", "Adjective"]
Now let's parse the string, using the linguistic tagger :
let range = NSRange(location: 0, length: text.utf16.count)
tagger.string = text
tagger.enumerateTags(
in: range,
scheme: .nameTypeOrLexicalClass,
options: NSLinguisticTagger.Options(rawValue: options),
using: { tag, tokenRange, sentenceRange, stop in
guard let range = Range(tokenRange, in: text)
else { return }
let token = String(text[range]).lowercased()
if let tagValue = tag?.rawValue,
acceptedtags.contains(tagValue)
{
dict[token, default: 0] += 1
}
// print(String(describing: tag) + ": \(token)")
})
Now the dict has the desired words with their multiplicity
print("dict =", dict)
As you can see, a Dictionary is an unordered collection. Now let's introduce some law and order by sorting on descending count, then alphabetically:
let ordered = dict.sorted {
($0.value, $1.key) > ($1.value, $0.key)
}
Now let's get the keys 🗝 only:
let mostFrequent = ordered.map { $0.key }
and print the three most frequent words :
print("top three =", mostFrequent.prefix(3))
To get the topmost frequent words, it would be more efficient to use a Heap (or a Trie) data structure, instead of having to hash every word, sort them all by frequency, and then prefixing. It should be a fun exercise 😉.
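For the first two requirements in the question (skip single-character words, break ties alphabetically), a plain dictionary is enough and no tagger is needed. The following is only a sketch: `topWords` is a hypothetical helper name, and the sample sentence is made up for the illustration.

```swift
import Foundation

// Hypothetical helper: returns the `limit` most frequent words of `text`,
// ignoring words that are a single character long.
func topWords(in text: String, limit: Int) -> [String] {
    let words = text
        .split(whereSeparator: { !$0.isLetter && !$0.isNumber })
        .map { $0.lowercased() }
        .filter { $0.count > 1 }
    var counts = [String: Int]()
    for word in words {
        counts[word, default: 0] += 1
    }
    // Higher count first; on equal counts, alphabetical order.
    let ordered = counts.sorted { ($0.value, $1.key) > ($1.value, $0.key) }
    return ordered.prefix(limit).map { $0.key }
}

print(topWords(in: "to be or not to be a b be or", limit: 3))
// prints ["be", "or", "to"]
```

Here "be" occurs three times, while "or" and "to" both occur twice and are therefore ordered alphabetically.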

Swift 4.2 extract substring using multiple characters as delimiter

I'm new to Swift, and after going through the Apple documentation and other sources it is not clear to me how I can extract a substring using more than one character as a delimiter. For example, I have a string which looks like:
A.1 value1
B.2 value2
E value3
C value4
and need to assign the values 1 - 4 to different variables.
• Possible solution:
1. Separate all the elements (separator: white space)
2. Iterate 2 by 2 and use a key/value system, like a Dictionary.
3. Read each value from its key afterward
Step 1:
let string = "A.1 value1 B.2 value2 E value3 C value4"
let components = string.components(separatedBy: CharacterSet.whitespaces)
Step 2:
var dictionary: [String: String] = [:]
stride(from: 0, to: components.count - 1, by: 2).forEach({
dictionary[components[$0]] = components[$0+1]
})
or
let dictionary = stride(from: 0, to: components.count - 1, by: 2).reduce(into: [String: String]()) { (result, currentInt) in
result[components[currentInt]] = components[currentInt+1]
}
dictionary is ["A.1": "value1", "C": "value4", "E": "value3", "B.2": "value2"]
Inspiration for the stride(from:to:) that I rarely use.
Step 3:
let name = dictionary["A.1"]
let surname = dictionary["C"]
• Potential issues:
If you have:
let string = "A.1 value One B.2 value2 E value3 C value4"
You want "value One", but since the value contains a space (which is also the separator), the split gives a false result.
You'll get: ["A.1": "value", "One": "B.2", "value2": "E", "value3": "C"] for dictionary.
So you could instead use a regex: A\.1(.*)B\.2(.*)E(.*)C(.*) (for instance; the dots are escaped so that they match a literal "." rather than any character).
let string = "A.1 value One B.2 value2 E value3 C value4"
let regex = try! NSRegularExpression(pattern: "A\\.1(.*)B\\.2(.*)E(.*)C(.*)", options: [])
regex.enumerateMatches(in: string, options: [], range: NSRange(location: 0, length: string.utf16.count)) { (result, flags, stop) in
guard let result = result,
let aValueRange = Range(result.range(at: 1), in: string),
let bValueRange = Range(result.range(at: 2), in: string),
let cValueRange = Range(result.range(at: 4), in: string),
let eValueRange = Range(result.range(at: 3), in: string) else { return }
let aValue = string[aValueRange].trimmingCharacters(in: CharacterSet.whitespaces)
print("aValue: \(aValue)")
let bValue = string[bValueRange].trimmingCharacters(in: CharacterSet.whitespaces)
print("bValue: \(bValue)")
let cValue = string[cValueRange].trimmingCharacters(in: CharacterSet.whitespaces)
print("cValue: \(cValue)")
let eValue = string[eValueRange].trimmingCharacters(in: CharacterSet.whitespaces)
print("eValue: \(eValue)")
}
Output:
$>aValue: value One
$>bValue: value2
$>cValue: value4
$>eValue: value3
Note that the trim could be inside the regex, but I don't especially like having too complex regexes.
I like regular expressions for this sort of thing.
I'm going to take you very literally and assume that the substrings to be found are preceded by "A.1", "B.2", "E", and "C", and are all preceded and followed by a space except for the last substring which is followed by the end of the original string. Moreover I'm going to assume very simple-mindedly that the delimiters such as "E" cannot appear in our string in any other way. Then we can capture each substring with an appropriate pattern:
let s = "A.1 harpo B.2 chico E zeppo C groucho"
let p1 = "^A\\.1 (.*) B\\.2 "
let p2 = " B\\.2 (.*) E "
let p3 = " E (.*) C "
let p4 = " C (.*)$"
let patts = [p1,p2,p3,p4]
var result = [String]()
for patt in patts {
let regex = try! NSRegularExpression(pattern: patt, options: [])
if let match = regex.firstMatch(in: s, options: [],
range: NSRange(s.startIndex..<s.endIndex, in: s)) {
let r = match.range(at: 1)
result.append((s as NSString).substring(with: r))
}
}
// result is now ["harpo", "chico", "zeppo", "groucho"]
We now have the four desired substrings extracted into an array, and dealing with them from there is trivial.
Observe that we make no assumptions about spaces. The above works perfectly well even if the target substrings contain spaces, because we are appealing only to the delimiters. For example, if the original string is
let s = "A.1 the rain B.2 in spain E stays mainly C in the plain"
then result is the array
["the rain", "in spain", "stays mainly", "in the plain"]
I should point out, however, that another way to do this sort of thing is to walk the original string with a Scanner. You might prefer this because regular expressions are not really needed here, and if you don't know regular expressions you'll find this kind of walk much clearer. So here it is rewritten to use a scanner. Note that we end up with four Optional NSString objects, because Scanner is actually an Objective-C Cocoa Foundation thing, but it isn't difficult to turn those into String objects as needed:
let s = "A.1 the rain B.2 in spain E stays mainly C in the plain"
let scan = Scanner(string: s)
scan.scanString("A.1 ", into: nil)
var r1 : NSString? = nil
scan.scanUpTo(" B.2 ", into: &r1)
scan.scanString("B.2 ", into: nil)
var r2 : NSString? = nil
scan.scanUpTo(" E ", into: &r2)
scan.scanString("E ", into: nil)
var r3 : NSString? = nil
scan.scanUpTo(" C ", into: &r3)
scan.scanString("C ", into: nil)
var r4 : NSString? =
(scan.string as NSString).substring(from: scan.scanLocation) as NSString
r1 // the rain
r2 // in spain
r3 // stays mainly
r4 // in the plain
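Both approaches above hard-code the four delimiters. As a rough sketch of the same idea in a reusable form, the function below walks the string with `range(of:options:range:)` and collects whatever lies between consecutive delimiters. `values(in:delimiters:)` is a hypothetical name, and the sketch keeps the same simple-minded assumption that the delimiters appear exactly once and in the given order.

```swift
import Foundation

// Hypothetical helper: returns the text following each delimiter, in order,
// or nil if a delimiter cannot be found after the previous one.
func values(in s: String, delimiters: [String]) -> [String]? {
    var result = [String]()
    var searchStart = s.startIndex
    var valueStart: String.Index? = nil
    for delimiter in delimiters {
        guard let found = s.range(of: delimiter, options: [],
                                  range: searchStart..<s.endIndex) else { return nil }
        // Everything between the previous delimiter and this one is a value.
        if let start = valueStart {
            result.append(s[start..<found.lowerBound].trimmingCharacters(in: .whitespaces))
        }
        valueStart = found.upperBound
        searchStart = found.upperBound
    }
    // The last value runs to the end of the string.
    if let start = valueStart {
        result.append(s[start...].trimmingCharacters(in: .whitespaces))
    }
    return result
}

let s = "A.1 the rain B.2 in spain E stays mainly C in the plain"
print(values(in: s, delimiters: ["A.1 ", " B.2 ", " E ", " C "]) ?? [])
// prints ["the rain", "in spain", "stays mainly", "in the plain"]
```

Including the surrounding spaces in each delimiter is what keeps a stray "E" or "C" inside a value from being mistaken for a delimiter.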

How to find multiple substrings within braces from a string?

I have a string [Desired Annual Income] /([Income per loan %] /100)
Using this string, I have to find the two substrings 'Desired Annual Income' and 'Income per loan %' in Swift 3.
I am using the code below, from 'How do I get the substring between braces?', to achieve this:
let myString = "[Desired Annual Income] /([Income per loan %] /100)"
let start: NSRange = (myString as NSString).range(of: "[")
let end: NSRange = (myString as NSString).range(of: "]")
if start.location != NSNotFound && end.location != NSNotFound && end.location > start.location {
let result: String = (myString as NSString).substring(with: NSRange(location: start.location + 1, length: end.location - (start.location + 1)))
print(result)
}
But the output is only 'Desired Annual Income'. How can I get all the substrings?
Try this; I hope it works:
let str = "[Desired Annual Income] /([Income per loan %] /100)"
let trimmedString = str.components(separatedBy: "]")
for i in 0..<trimmedString.count - 1{ // not considering last component since it's of no use hence count-1 times loop
print(trimmedString[i].components(separatedBy: "[").last ?? "")
}
Output:-
Desired Annual Income
Income per loan %
It's a very good use case for regular expressions (NSRegularExpression). The principle of regular expressions is to describe a "pattern" that you want to search in a string.
In that case you search something between two brackets.
The code is then:
let str = "[Desired Annual Income] /([Income per loan %] /100)"
if let regex = try? NSRegularExpression(pattern: "\\[(.+?)\\]", options: [.caseInsensitive]) {
var collectMatches: [String] = []
for match in regex.matches(in: str, options: [], range: NSRange(location: 0, length: (str as NSString).length)) {
// range at index 0: full match (including brackets)
// range at index 1: first capture group
let substring = (str as NSString).substring(with: match.range(at: 1))
collectMatches.append(substring)
}
print(collectMatches)
}
For an explanation of regular expressions, there are plenty of tutorials on the internet. But in short:
\\[ and \\]: the opening and closing bracket characters. The double backslashes are needed because brackets have a meaning in regular expressions, so you need to escape them. In a text editor one backslash is enough, but here you need a second one because you are inside a Swift String literal, where the backslash itself must be escaped.
(.+?) is a bit more complex: the parentheses are the "capture group", i.e. what you want to get. . means "any character", + means one or more times, and ? after a + makes the quantifier lazy (non-greedy), which means that the capture stops as soon as possible. Without it the quantifier is greedy, which is the default in Foundation as in most regex engines, and the capture would be "Desired Annual Income] /([Income per loan %".
Regexes are not always easy or direct, but if you often do text processing, they are a very powerful tool to know.
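To make the greedy versus lazy difference concrete, here is a small sketch that runs both patterns over the same string. The `captures` function is a hypothetical helper written only for this comparison.

```swift
import Foundation

// Hypothetical helper: returns the first capture group of every match.
func captures(_ pattern: String, in str: String) -> [String] {
    guard let regex = try? NSRegularExpression(pattern: pattern, options: []) else { return [] }
    let range = NSRange(str.startIndex..., in: str)
    return regex.matches(in: str, options: [], range: range).compactMap { match -> String? in
        guard let r = Range(match.range(at: 1), in: str) else { return nil }
        return String(str[r])
    }
}

let str = "[Desired Annual Income] /([Income per loan %] /100)"

// Lazy: each capture stops at the first closing bracket.
print(captures("\\[(.+?)\\]", in: str))
// prints ["Desired Annual Income", "Income per loan %"]

// Greedy: a single capture runs up to the last closing bracket.
print(captures("\\[(.+)\\]", in: str))
// prints ["Desired Annual Income] /([Income per loan %"]
```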

How to take NSRange in swift?

I am very new to the Swift language. I am implementing some business logic which needs an NSRange taken from a given String.
Here is my requirement,
Given Amount = "144.44"
Need NSRange of only cent part i.e. after "."
Is there any API available for doing this?
You can do a regex-based search to find the range:
let str : NSString = "123.45"
let rng : NSRange = str.range(of: "(?<=[.])\\d*$", options: .regularExpression)
Regular expression "(?<=[.])\\d*$" means "zero or more digits following a dot character '.' via look-behind, all the way to the end of the string $."
If you want a substring from a given string you can use components(separatedBy:).
Example:
let number = "144.44"
let numberResult = number.components(separatedBy: ".")
Then you can get the components as:
let num1 = numberResult[0] // "144"
let num2 = numberResult[1] // "44"
Hope it helps!
Use range(of:) and take everything after its upper bound:
let string = "123.45"
if let dotRange = string.range(of: ".") {
    let cents = String(string[dotRange.upperBound...])
    print(cents)
}
Another version that uses Swift ranges rather than NSRange.
Define a function that returns an optional Range:
func centsRange(in str: String) -> Range<String.Index>? {
    guard let dotIndex = str.firstIndex(of: ".") else { return nil }
    return str.index(after: dotIndex)..<str.endIndex
}
Which you can test with:
let str = "144.44"
// Unwrap safely rather than force unwrapping.
if let r = centsRange(in: str) {
    let cents = String(str[r]) // "44"
    print(cents)
}
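Since the question asks specifically for an NSRange, a Swift range found this way can be converted with the `NSRange(_:in:)` initializer. A minimal sketch, where `centsNSRange` is a hypothetical name and the amount is assumed to contain at most one ".":

```swift
import Foundation

// Hypothetical helper: NSRange of everything after the first ".", or nil if there is none.
func centsNSRange(in amount: String) -> NSRange? {
    guard let dot = amount.firstIndex(of: ".") else { return nil }
    // NSRange(_:in:) translates String.Index bounds into UTF-16 offsets.
    return NSRange(amount.index(after: dot)..., in: amount)
}

if let range = centsNSRange(in: "144.44") {
    print(range.location, range.length)
    // prints 4 2
}
```

The resulting NSRange can be passed directly to APIs such as NSAttributedString that still expect UTF-16 based ranges.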

Swift sort NSArray

I have a problem with my sorting algorithm.
My NSArray (here vcd.signals.key) contains values of Strings for example:
"x [0]", "x [18]", "x [15]", "x [1]"...
When I try to sort this the result ends up in
"x [0]", "x [15]", "x [18]", "x [1]"
instead of:
"x [0]", "x [1]", "x [15]", "x [18]"
This is my code:
let sortedKeys = sorted(vcd.signals.keys) {
var val1 = $0 as! String
var val2 = $1 as! String
return val1 < val2
}
Any idea how I can fix this issue?
Your problem comes from the comparison itself. For example, see what happens when you compare the following two strings:
print("x [15]" < "x [1]") // true
This is because the default lexicographic comparison goes character by character, position by position, and 5 (the fifth character of "x [15]") is less than ] (the fifth character of "x [1]"):
print("5" < "]") // true
To fix this you need to write your own comparator that compares only the numbers inside the brackets. To achieve this I use a regular expression to match the numbers, in the following way:
func matchesForRegexInText(_ regex: String, text: String) -> [String] {
    guard let regex = try? NSRegularExpression(pattern: regex, options: []) else { return [] }
    let nsString = text as NSString
    let results = regex.matches(in: text, options: [],
        range: NSRange(location: 0, length: nsString.length))
    return results.map { nsString.substring(with: $0.range) }
}
let pattern = "([0-9]+)"
let keysSorted = keys.sorted { key1, key2 in
    // Compare the matched numbers as integers, not as strings,
    // so that e.g. 2 sorts before 15.
    let n1 = matchesForRegexInText(pattern, text: key1).first.flatMap { Int($0) } ?? 0
    let n2 = matchesForRegexInText(pattern, text: key2).first.flatMap { Int($0) } ?? 0
    return n1 < n2
}
In the above regular expression I assume that numbers appear only inside the brackets; the pattern matches any number in the String, so feel free to change it if you need anything more. You then get the following:
print(keysSorted) // ["x [0]", "x [1]", "x [15]", "x [18]"]
I hope this helps.
The issue you are running into is that the closing bracket character ']' sorts after the digits. This means that "18" is less than "1]". As long as all your strings end in "[digits]", you can remove the closing bracket, sort the strings, and add the closing bracket back to your final array. Note that this is still a lexicographic sort: it works for the values above, but "x [2]" would still land after "x [18]". The code below uses Swift 5 syntax:
let arr = ["x [0]", "x [18]", "x [15]", "x [1]"]
let sorted = arr.map { String($0.dropLast()) }.sorted().map { $0 + "]" }
print(sorted)
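Another option, shown here only as a sketch, is Foundation's numeric string comparison, which treats runs of digits as numbers instead of comparing them character by character. The sample array adds "x [2]" to the values from the question to show the multi-digit case; the name `naturallySorted` is made up for the example.

```swift
import Foundation

let signals = ["x [0]", "x [18]", "x [15]", "x [1]", "x [2]"]

// .numeric makes compare(_:options:) order embedded digit runs by value.
let naturallySorted = signals.sorted {
    $0.compare($1, options: .numeric) == .orderedAscending
}

print(naturallySorted)
// prints ["x [0]", "x [1]", "x [2]", "x [15]", "x [18]"]
```

This avoids both the regex and the bracket stripping, at the cost of relying on Foundation's comparison rules for the non-numeric parts of the strings.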

Resources