Calculating word count from a url in swift


I'm creating a reading list app, and I'd like to show the estimated read time of a user-added link in a table cell in their reading list. The only way to get that number is from the page's word count. I've found a few solutions, namely Parsehub, Parse and Mercury, but they seem geared towards use cases that need more advanced scraping from a URL. Is there a simpler way in Swift to calculate the word count of a URL?

First of all, you need to parse the HTML. HTML can only be parsed reliably with a dedicated HTML parser. Please don't use regular expressions or any other search method to parse HTML. You may read why from this link. If you are using Swift, you may try Fuzi or Kanna. After you get the body text with either library, you have to remove extra whitespace and count the words. I have written some basic code with the Fuzi library to get you started.
import Foundation
import Fuzi
// Trim leading and trailing whitespace and newlines
func trim(src: String) -> String {
    return src.trimmingCharacters(in: CharacterSet.whitespacesAndNewlines)
}
// Collapse runs of whitespace (including newlines) into a single space
func clean(src: String) -> String {
    return src.replacingOccurrences(
        of: "\\s+",
        with: " ",
        options: .regularExpression)
}
// Load test.html from the directory that contains this source file
let htmlUrl = URL(fileURLWithPath: ((#file as NSString).deletingLastPathComponent as NSString).appendingPathComponent("test.html"))
do {
    let data = try Data(contentsOf: htmlUrl)
    let document = try HTMLDocument(data: data)
    // Get the text content of the body
    if let body = document.xpath("//body").first?.stringValue {
        let cleanBody = clean(src: body)
        let trimmedBody = trim(src: cleanBody)
        // split(separator:) drops empty pieces, so an empty body counts as 0 words
        print(trimmedBody.split(separator: " ").count)
    }
} catch {
    print(error)
}
If you like, you can turn these global functions into a String extension or combine them into a single function. I kept them separate for clarity.
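Once you have the body text, the read time the question asks about follows from the word count. Here is a minimal sketch using only Foundation. The function names and the 200 words-per-minute default are assumptions for illustration; they are not part of Fuzi or of the code above.

```swift
import Foundation

// Count words by splitting on any whitespace. Empty pieces are dropped,
// so leading, trailing and repeated whitespace do not inflate the count.
func wordCount(in text: String) -> Int {
    return text.split(whereSeparator: { $0.isWhitespace }).count
}

// Estimate reading time in whole minutes, rounding up.
// wordsPerMinute = 200 is an assumed average; tune it for your audience.
func readingTimeMinutes(wordCount: Int, wordsPerMinute: Int = 200) -> Int {
    guard wordCount > 0, wordsPerMinute > 0 else { return 0 }
    return Int((Double(wordCount) / Double(wordsPerMinute)).rounded(.up))
}

let sample = "  Swift makes   counting\nwords easy  "
let words = wordCount(in: sample)
print("\(words) words, about \(readingTimeMinutes(wordCount: words)) min read")
```

Splitting on whitespace directly makes the separate clean and trim passes unnecessary, although the body text of a real page may still contain script or style text that you would want to exclude first.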

Related

UILabel text is not updated even in MainThread

UILabel.text is not updated inside the main thread. labelOne is updated but labelTwo which is to show translated word is not updated. When I print translatedWord it prints right string to console but UILabel is not updated.
datatask = session.dataTask(with: request, completionHandler: {data, response, error in
    if error == nil {
        let receivedData = try? JSONSerialization.jsonObject(with: data!, options: []) as? [String: Any]
        DispatchQueue.main.async {
            self.labelOne.text = wordTobeTranslated
            let data = "\(String(describing: receivedData!["text"]))"
            let all = data.components(separatedBy: "(")
            let afterAll = all[2]
            let last = afterAll.components(separatedBy:")" )
            self.translatedWord = last[0]
            self.importantWords.append(last[0])
            self.labelTwo.text = self.translatedWord
            print(self.translatedWord)
        }
    }
})
datatask?.resume()

Some tips to debug your issue:
Try setting a hard-coded string on labelTwo and check whether it is displayed, or
update the value of labelTwo in the storyboard and check whether it is displayed.
If the string is displayed in either case, then labelTwo is configured correctly in the storyboard.
You can also try calling self.labelTwo.layoutIfNeeded() after updating the text to force a UI update.
If none of these steps helps, check the font color of the UILabel. If it is the same as the background color, the text will not be visible.

Looking at your code sample, it would appear that you’re trying to retrieve:
describing: receivedData!["text"]
from:
\(String(describing: receivedData!["text"]))
The problem is that the \(...) in a String literal results in string interpolation, where the expression between \( and ) is evaluated and the result is placed in the string. The String(describing: ...) call then interprets the value and returns a string representation. So, let’s say that receivedData!["text"] contained the word “Foo”. Then
let data = "\(String(describing: receivedData!["text"]))"
Would result in data containing the string, Optional("Foo").
If you want to remove that Optional(...) part, you should either unwrap the optional or use a nil-coalescing operator, ??. And frankly, rather than using string interpolation at all, I’d just do:
let data = String(describing: receivedData?["text"] ?? "")
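The difference is easy to see in isolation. The sketch below assumes the "text" value is a plain String and uses a hypothetical helper name, translatedText(from:). If the service actually returns an array of strings, the cast would need to be to [String] instead.

```swift
import Foundation

// Hypothetical helper: pull the "text" value out of a decoded JSON dictionary.
// Returns an empty string when the dictionary or key is missing,
// or when the value is not a String.
func translatedText(from json: [String: Any]?) -> String {
    return json?["text"] as? String ?? ""
}

// Stand-in for the decoded response; the real one comes from JSONSerialization.
let receivedData: [String: Any]? = ["text": "Foo"]

// Describing the optional itself keeps the Optional(...) wrapper.
let wrapped = String(describing: receivedData?["text"])
// Coalescing to a default first removes the wrapper.
let unwrapped = String(describing: receivedData?["text"] ?? "")

print(wrapped)
print(unwrapped)
print(translatedText(from: receivedData))
```

A conditional cast such as as? String also avoids the string splitting on "(" and ")" in the question, which depends on the exact shape of the printed description.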

I know this is an old question, but for new searchers: try setting the UILabel's number of lines to zero. It might be a constraints issue.

Integrate KML on offline Google Map iOS

I want to create a KML file programmatically in iOS with Swift 4. I searched a lot for material on offline Google Maps and on saving data in a KML file, but I haven't found anything relevant. In short, I just want to store some POIs in a KML file and display them on an offline Google Map.

KML is an XML document schema. You can write it in any way you see fit: use a dedicated XML encoder, or brute force it with strings.
The full spec is documented at https://developers.google.com/kml/documentation/kmlreference
You may find examining existing KML files useful to show example structures.
Here is an example brute-force encoder that encodes a set of labeled placemarks.
Raw strings will need to be XML-escaped before writing to file.
You may find Google Toolbox for Mac useful here.
class KMLWriter {
    func createKMLFormat(_ logs:[DDBLogEntry])->String {
        var kml = preamble()
        for log in logs {
            kml += placemarkForLog(log)
        }
        terminate(&kml)
        return kml
    }
    func placemarkForLog(_ log:DDBLogEntry)->String {
        let name = log.userContent ?? "empty"
        let time = log.dateCreated ?? Date()
        let timeStamp = Date.UTCDateFormatter.string(from: time)
        return "<Placemark>\n<name>\(name)</name>\n<Point><coordinates>\(log.longitude ?? 0),\(log.latitude ?? 0),\(log.elevation ?? 0)</coordinates></Point>\n<TimeStamp><when>\(timeStamp)</when></TimeStamp>\n</Placemark>\n"
    }
    func preamble()->String {
        return "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<kml xmlns=\"http://www.opengis.net/kml/2.2\">\n<Document>\n"
    }
    func terminate(_ kml:inout String) {
        kml += "</Document>\n</kml>\n"
    }
}
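The escaping step mentioned above can be done without a third-party library. This is a rough sketch with hypothetical helper names (xmlEscaped and placemark); it is not the Google Toolbox for Mac API, and it leaves out the timestamp and elevation used in the encoder above.

```swift
import Foundation

// Escape the five characters that XML reserves. "&" must be replaced first,
// otherwise the ampersands introduced by the other entities would be escaped again.
func xmlEscaped(_ raw: String) -> String {
    return raw
        .replacingOccurrences(of: "&", with: "&amp;")
        .replacingOccurrences(of: "<", with: "&lt;")
        .replacingOccurrences(of: ">", with: "&gt;")
        .replacingOccurrences(of: "\"", with: "&quot;")
        .replacingOccurrences(of: "'", with: "&apos;")
}

// Hypothetical helper that builds one placemark from plain values.
// KML coordinates are written as longitude,latitude,altitude.
func placemark(name: String, longitude: Double, latitude: Double) -> String {
    return "<Placemark>\n<name>\(xmlEscaped(name))</name>\n<Point><coordinates>\(longitude),\(latitude),0</coordinates></Point>\n</Placemark>\n"
}

print(placemark(name: "Fish & Chips <harbour>", longitude: -0.1276, latitude: 51.5072))
```

Escaping at the point where a value is interpolated keeps the rest of the string-building code unchanged.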

Swift: how to display bold words in textview without attributed string

We're adding the finishing touches to an app we're working on, and that apparently means putting an entire "Terms and Conditions" and "FAQs" section, formatting, bullets, breaks and all.
So I tried copy-pasting it into a textView with "editable" set to off, which kept the bullets, but not the bolded text.
Now, I've done attributed string before, and I have to say, I'm not sure it will be easy to do that on some 12-pages worth of paragraphs, bulleted lists and breaks that are likely to change in a few years or so.
So my question is, is there a way to do this without using attributed string?
Barring that, perhaps there's a way to loop through the text, and look for a written tag that will apply the attributes?
EDIT:
Update. It's been suggested I use HTML tags, and web view. That's what was done for the FAQs (which uses a label), I neglected to mention I tried that too.
For some reason, it just shows a blank textview, albeit a large-sized one, as if there's text in it (there isn't any). Strange that copy-pasting works but this doesn't.
Here's my code for it:
override func awakeFromNib() {
    super.awakeFromNib()
    termsTitle.text = "Terms and Conditions"
    htmlContent = "<p style=\"font-family:Helvetica Neue\"><br/><strong><br/> BLA BLA BLA BLA BLA BLA x 12 Pages"
    do {
        let str = try NSAttributedString(data: htmlContent.dataUsingEncoding(NSUnicodeStringEncoding, allowLossyConversion: true)!, options: [ NSDocumentTypeDocumentAttribute: NSHTMLTextDocumentType], documentAttributes: nil)
        termsTextView.attributedText = str
    } catch {
        print("Dim background error")
    }
}

I'm pretty sure you can't do this in a UITextView without using an attributed string. A possible solution would be to use a web view. Converting your "Terms and Conditions" and "FAQ" to HTML would probably be much easier than building an attributed string by hand.

If you still want to use your HTML in a UITextView you can try this function:
func getAttributedString(fileName: String) -> NSAttributedString? {
    if let htmlLocation = NSBundle.mainBundle().URLForResource(fileName, withExtension: "html"), data = NSData(contentsOfURL: htmlLocation) {
        do {
            let attrString = try NSAttributedString(data: data, options: [NSDocumentTypeDocumentAttribute:NSHTMLTextDocumentType], documentAttributes: nil)
            return attrString
        } catch let err as NSError {
            print("Attributed String Creation Error")
            print(err.localizedDescription)
            return nil
        }
    } else {
        return nil
    }
}
This function assumes you have a .html file in your main bundle. You pass it the name (minus extension) of the file (that should be in your project) and then use it like so:
textView.attributedText = getAttributedString("TermsAndConditions")
Just to clarify, the textView is an @IBOutlet on a View Controller in this example.
This function returns nil if either the .html file does not exist or the NSAttributedString conversion failed.
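The question also asks whether the text could be looped through, looking for a written tag that marks bold runs. Here is a rough sketch of that idea using only Foundation. The "**" marker, the StyledSegment type and parseBoldMarkers are all made up for illustration. Turning the segments into an attributed string is left to the caller, so attributed strings are still involved at the end, just not written by hand.

```swift
import Foundation

// One run of text and whether it should be rendered bold.
struct StyledSegment: Equatable {
    let text: String
    let isBold: Bool
}

// Split the source on the marker. Text between an odd and an even
// occurrence of the marker is bold; everything else is regular.
func parseBoldMarkers(_ source: String, marker: String = "**") -> [StyledSegment] {
    var segments: [StyledSegment] = []
    var isBold = false
    for piece in source.components(separatedBy: marker) {
        if !piece.isEmpty {
            segments.append(StyledSegment(text: piece, isBold: isBold))
        }
        // Every marker we cross flips the bold state.
        isBold.toggle()
    }
    return segments
}

for segment in parseBoldMarkers("Read **this** first") {
    print(segment.isBold ? "[bold] \(segment.text)" : segment.text)
}
```

Because the markers live in the text itself, the 12 pages can be edited later without touching any ranges in code.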

How can I parse content from a PDF page with Swift

The documentation is not really clear to me. So far I reckon I need to set up a CGPDFOperatorTable and then create a CGPDFContentStreamCreateWithPage and CGPDFScannerCreate per PDF page.
The documentation refers to setting up Callbacks, but it's unclear to me how. How to actually obtain the content from a page?
This is my code so far.
let pdfURL = NSBundle.mainBundle().URLForResource("titleofdocument", withExtension: "pdf")
// Create pdf document
let pdfDoc = CGPDFDocumentCreateWithURL(pdfURL)
// Number of pages in this PDF
let numberOfPages = CGPDFDocumentGetNumberOfPages(pdfDoc) as Int
if numberOfPages <= 0 {
    // The number of pages is zero
    return
}
let myTable = CGPDFOperatorTableCreate()
// Let's go through every page
for pageNr in 1...numberOfPages {
    let thisPage = CGPDFDocumentGetPage(pdfDoc, pageNr)
    let myContentStream = CGPDFContentStreamCreateWithPage(thisPage)
    let myScanner = CGPDFScannerCreate(myContentStream, myTable, nil)
    CGPDFScannerScan(myScanner)
    // Search for Content here?
    // ??
    CGPDFScannerRelease(myScanner)
    CGPDFContentStreamRelease(myContentStream)
}
// Release Table
CGPDFOperatorTableRelease(myTable)
It's a similar question to: PDF Parsing with SWIFT but has no answers yet.

Here is an example of the callbacks implemented in Swift:
let operatorTableRef = CGPDFOperatorTableCreate()
CGPDFOperatorTableSetCallback(operatorTableRef, "BT") { (scanner, info) in
    print("Begin text object")
}
CGPDFOperatorTableSetCallback(operatorTableRef, "ET") { (scanner, info) in
    print("End text object")
}
CGPDFOperatorTableSetCallback(operatorTableRef, "Tf") { (scanner, info) in
    print("Select font")
}
CGPDFOperatorTableSetCallback(operatorTableRef, "Tj") { (scanner, info) in
    print("Show text")
}
CGPDFOperatorTableSetCallback(operatorTableRef, "TJ") { (scanner, info) in
    print("Show text, allowing individual glyph positioning")
}
let numPages = CGPDFDocumentGetNumberOfPages(pdfDocument)
for pageNum in 1...numPages {
    let page = CGPDFDocumentGetPage(pdfDocument, pageNum)
    let stream = CGPDFContentStreamCreateWithPage(page)
    let scanner = CGPDFScannerCreate(stream, operatorTableRef, nil)
    CGPDFScannerScan(scanner)
    CGPDFScannerRelease(scanner)
    CGPDFContentStreamRelease(stream)
}

You've actually specified exactly how to do it; all you need to do is put it together and try until it works.
First of all, you need to set up a table with callbacks, as you state yourself at the beginning of your question (all code is in Objective-C, NOT Swift):
CGPDFOperatorTableRef operatorTable = CGPDFOperatorTableCreate();
CGPDFOperatorTableSetCallback(operatorTable, "q", &op_q);
CGPDFOperatorTableSetCallback(operatorTable, "Q", &op_Q);
This table contains a list of the PDF operators you want to get called for and associates a callback with them. Those callbacks are simply functions you define elsewhere:
static void op_q(CGPDFScannerRef s, void *info) {
    // Do whatever you have to do in here
    // info is whatever you passed to CGPDFScannerCreate
}
static void op_Q(CGPDFScannerRef s, void *info) {
    // Do whatever you have to do in here
    // info is whatever you passed to CGPDFScannerCreate
}
And then you create the scanner and get it going, while passing it the information you just defined.
// Passing "self" is just an example, you can pass whatever you want and it will be provided to your callback whenever it is called by the scanner.
CGPDFScannerRef contentStreamScanner = CGPDFScannerCreate(contentStream, operatorTable, self);
CGPDFScannerScan(contentStreamScanner);
If you want to see a complete example with sourcecode on how to find and process images, check this website.

To understand why a parser works this way, you need to read the PDF specification a bit more closely. A PDF file contains something close to printing instructions, such as "move to this coordinate, print this character, move there, change the color, print character number 23 from font #23", etc.
The parser gives you a callback for each instruction, with the possibility to retrieve the instruction's parameters. That's all.
So, in order to get the content from a file, you need to rebuild its state manually. That means recomputing the frames for all characters and trying to reverse-engineer the page layout. This is clearly not an easy task, and that's why people have created libraries to do so.
You may want to have a look at PDFKitten, or PDFParser, which is a Swift port with some improvements that I made.
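To make "rebuild its state manually" more concrete, here is a toy model in plain Swift. It does not use Core Graphics, and the TextOperator cases are only loosely modeled on the BT, Td, Tj and ET operators. Real PDF text positioning also involves text matrices, fonts and glyph widths, all of which this sketch ignores.

```swift
// A simplified stand-in for the instructions a scanner would report.
enum TextOperator {
    case beginText                        // loosely: BT
    case moveBy(dx: Double, dy: Double)   // loosely: Td, treated as a plain offset
    case showText(String)                 // loosely: Tj
    case endText                          // loosely: ET
}

// A piece of text together with the position it was shown at.
struct PlacedText: Equatable {
    let x: Double
    let y: Double
    let text: String
}

// Replay the instructions in order, tracking the current position,
// and record where each string ends up.
func replay(_ operators: [TextOperator]) -> [PlacedText] {
    var x = 0.0
    var y = 0.0
    var inTextObject = false
    var placed: [PlacedText] = []
    for op in operators {
        switch op {
        case .beginText:
            inTextObject = true
            x = 0
            y = 0
        case .moveBy(let dx, let dy):
            x += dx
            y += dy
        case .showText(let text):
            // Text shown outside a text object is ignored in this toy model.
            if inTextObject {
                placed.append(PlacedText(x: x, y: y, text: text))
            }
        case .endText:
            inTextObject = false
        }
    }
    return placed
}

let page: [TextOperator] = [
    .beginText,
    .moveBy(dx: 72, dy: 700), .showText("Hello"),
    .moveBy(dx: 0, dy: -14), .showText("World"),
    .endText
]
for item in replay(page) {
    print("(\(item.x), \(item.y)): \(item.text)")
}
```

In a real parser, each case would correspond to one callback registered in the operator table, and the position state would live in the object passed as the info pointer.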

Repeat code for whole array

I am using some Facebook IDs in my app, and I have an array of several IDs. The array can hold 10 numbers, but it can also hold 500.
Right now the numbers are displayed in a tableview, and I want all the results there too, so they need to be in an array.
let profileUrl = NSURL(string:"http://www.facebook.com/" + newArray[0])!
let task = NSURLSession.sharedSession().dataTaskWithURL(profileUrl) {
    (data, response, error) -> Void in
    // Will happen when task completes
    if let urlContent = data {
        let webContent = NSString(data: urlContent, encoding: NSUTF8StringEncoding)
        dispatch_async(dispatch_get_main_queue(),
            { () -> Void in
                let websiteArray = webContent!.componentsSeparatedByString("pageTitle\">")
                //print(websiteArray[1])
                let secondArray = websiteArray[1].componentsSeparatedByString("</title>")
                print(secondArray[0])
        })
    }
}
This code takes the first number in the array, goes to facebook.com/[the actual number], downloads the data and splits it into pieces, so that the data I want is in secondArray[0]. I want to do this for every number in the array, take the result and put it into an array. I have no idea how to do this because you don't know how many numbers there are going to be. Does anyone have a good solution for this?
Any help would be appreciated, really!
Thanks

You have several problems here, and you should take them one at a time to build up to your solution.
First, forget the table for the moment. Don't worry at all about how you're going to display these results. Just focus on getting the results in a simple form, and then you'll go back and convert that simple form into something easy to display, and then you'll display it.
So first, we want this in a simple form. That's a little bit complicated because it's all asynchronous. But that's not too hard to fix.
func fetchTitle(identifier: String, completion: (title: String) -> Void) {
    let profileUrl = NSURL(string:"http://www.facebook.com/" + identifier)!
    let task = NSURLSession.sharedSession().dataTaskWithURL(profileUrl) {
        (data, response, error) -> Void in
        if let urlContent = data {
            let webContent = NSString(data: urlContent, encoding: NSUTF8StringEncoding)
            let websiteArray = webContent!.componentsSeparatedByString("pageTitle\">")
            let secondArray = websiteArray[1].componentsSeparatedByString("</title>")
            let title = secondArray[0]
            completion(title: title)
        }
    }
    task.resume()
}
Now this is still pretty bad code because it doesn't handle errors at all, but it's a starting point, and the most important parts are here. A function that takes a string, and when it's done fetching things, calls some completion handler.
(Regarding error handling, note how many places this code would crash if it received surprising data. Maybe the data you get isn't a proper string. Maybe it's not formatted like you think it is. Every time you use ! or subscript an array, you run the risk of crashing. Try to minimize those.)
So you might then wrap it up in something like:
var titles = [String]()
let identifiers = ["1","2","3"]
let queue = dispatch_queue_create("titles", DISPATCH_QUEUE_SERIAL)
dispatch_apply(identifiers.count, queue) { index in
    let identifier = identifiers[index]
    fetchTitle(identifier) { title in
        dispatch_async(queue) {
            titles.append(title)
        }
    }
}
This is just code to get you on the right track and start studying the right things. It certainly would need work to be production quality (particularly to handle errors).
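One thing the snippet above does not show is how to find out when every title has arrived. A common way to do that in current Swift is a DispatchGroup. The following is a sketch under assumptions, not a drop-in replacement: TitleCollector, fetchAll and fakeFetch are hypothetical names, the fetch closure stands in for fetchTitle, and it must call its callback exactly once per identifier, passing nil on failure.

```swift
import Foundation

// Thread-safe, index-addressed storage so results keep the order of the input.
final class TitleCollector: @unchecked Sendable {
    private let lock = NSLock()
    private var titles: [String?]

    init(count: Int) {
        titles = Array(repeating: nil, count: count)
    }

    func set(_ title: String?, at index: Int) {
        lock.lock()
        titles[index] = title
        lock.unlock()
    }

    var snapshot: [String?] {
        lock.lock()
        defer { lock.unlock() }
        return titles
    }
}

// Runs `fetch` once per identifier and calls `completion` on `queue`
// after every fetch has reported back. A failed fetch is recorded as nil.
func fetchAll(_ identifiers: [String],
              using fetch: (String, @escaping (String?) -> Void) -> Void,
              queue: DispatchQueue = .main,
              completion: @escaping @Sendable ([String?]) -> Void) {
    let collector = TitleCollector(count: identifiers.count)
    let group = DispatchGroup()
    for (index, identifier) in identifiers.enumerated() {
        group.enter()
        fetch(identifier) { title in
            collector.set(title, at: index)
            group.leave()
        }
    }
    group.notify(queue: queue) {
        completion(collector.snapshot)
    }
}

// Stand-in for the network call; a real app would pass a wrapper around fetchTitle.
let fakeFetch: (String, @escaping (String?) -> Void) -> Void = { identifier, done in
    done("title-" + identifier)
}
```

A call such as fetchAll(identifiers, using: fakeFetch) { titles in ... } then delivers the complete array in one go, on the main queue by default, which is a convenient place to hand it to the table view's data source.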
Once you have something that returns your titles correctly, you should be able to write a program that does nothing but take a list of identifiers and prints out the list of titles. Then you can add code to integrate that list into your tableview. Keep the parts separate. The titles are the Model. The table is the View. Read up on the Model-View-Controller paradigm, and you'll be in good shape.

To repeat code for the whole array, put your code in a loop and run that loop from 0 to array.count-1.
You don't need to know in advance how many items there will be in the array. You can get the count at run time with array.count, where array is your array.
I hope this is what you wanted to know; your question doesn't make much sense though.
