How can I parse content from a PDF page with Swift - ios

The documentation is not really clear to me. So far I reckon I need to set up a CGPDFOperatorTable, then create a content stream with CGPDFContentStreamCreateWithPage and a scanner with CGPDFScannerCreate for each PDF page.
The documentation refers to setting up callbacks, but it's unclear to me how. How do I actually obtain the content from a page?
This is my code so far:
let pdfURL = NSBundle.mainBundle().URLForResource("titleofdocument", withExtension: "pdf")
// Create pdf document
let pdfDoc = CGPDFDocumentCreateWithURL(pdfURL)
// Number of pages in this PDF
let numberOfPages = CGPDFDocumentGetNumberOfPages(pdfDoc) as Int
if numberOfPages <= 0 {
// The number of pages is zero
return
}
let myTable = CGPDFOperatorTableCreate()
// Let's go through every page
for pageNr in 1...numberOfPages {
let thisPage = CGPDFDocumentGetPage(pdfDoc, pageNr)
let myContentStream = CGPDFContentStreamCreateWithPage(thisPage)
let myScanner = CGPDFScannerCreate(myContentStream, myTable, nil)
CGPDFScannerScan(myScanner)
// Search for Content here?
// ??
CGPDFScannerRelease(myScanner)
CGPDFContentStreamRelease(myContentStream)
}
// Release Table
CGPDFOperatorTableRelease(myTable)
It's similar to the question "PDF Parsing with SWIFT", which has no answers yet.

Here is an example of the callbacks implemented in Swift:
let operatorTableRef = CGPDFOperatorTableCreate()
CGPDFOperatorTableSetCallback(operatorTableRef, "BT") { (scanner, info) in
print("Begin text object")
}
CGPDFOperatorTableSetCallback(operatorTableRef, "ET") { (scanner, info) in
print("End text object")
}
CGPDFOperatorTableSetCallback(operatorTableRef, "Tf") { (scanner, info) in
print("Select font")
}
CGPDFOperatorTableSetCallback(operatorTableRef, "Tj") { (scanner, info) in
print("Show text")
}
CGPDFOperatorTableSetCallback(operatorTableRef, "TJ") { (scanner, info) in
print("Show text, allowing individual glyph positioning")
}
let numPages = CGPDFDocumentGetNumberOfPages(pdfDocument)
for pageNum in 1...numPages {
let page = CGPDFDocumentGetPage(pdfDocument, pageNum)
let stream = CGPDFContentStreamCreateWithPage(page)
let scanner = CGPDFScannerCreate(stream, operatorTableRef, nil)
CGPDFScannerScan(scanner)
CGPDFScannerRelease(scanner)
CGPDFContentStreamRelease(stream)
}
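The callbacks above only announce each operator. To get at the text itself, a callback can pop the operator's operands off the scanner. The following is a minimal sketch, assuming modern Swift and the Core Graphics functions CGPDFScannerPopString and CGPDFStringCopyTextString. TextCollector and showTextStrings(on:) are hypothetical names introduced here for illustration. It handles only the Tj operator, and the decoded strings may come out garbled for fonts that use custom encodings.

```swift
import Foundation
#if canImport(CoreGraphics)
import CoreGraphics
#endif

// Hypothetical helper (not part of Core Graphics): accumulates decoded strings.
final class TextCollector {
    var pieces: [String] = []
    var text: String { return pieces.joined() }
}

#if canImport(CoreGraphics)
// Scans one page and returns the strings shown by its "Tj" operators.
func showTextStrings(on page: CGPDFPage) -> String {
    let collector = TextCollector()
    let table = CGPDFOperatorTableCreate()

    // The closure is used as a C callback, so it must not capture anything.
    // State travels through the `info` pointer instead.
    CGPDFOperatorTableSetCallback(table, "Tj") { scanner, info in
        guard let info = info else { return }
        let collector = Unmanaged<TextCollector>.fromOpaque(info).takeUnretainedValue()
        var pdfString: CGPDFStringRef?
        // Tj takes a single string operand; pop it and decode it.
        if CGPDFScannerPopString(scanner, &pdfString),
           let pdfString = pdfString,
           let decoded = CGPDFStringCopyTextString(pdfString) {
            collector.pieces.append(decoded as String)
        }
    }

    let stream = CGPDFContentStreamCreateWithPage(page)
    let scanner = CGPDFScannerCreate(stream, table,
                                     Unmanaged.passUnretained(collector).toOpaque())
    _ = CGPDFScannerScan(scanner)

    CGPDFScannerRelease(scanner)
    CGPDFContentStreamRelease(stream)
    CGPDFOperatorTableRelease(table)
    return collector.text
}
#endif
```

Passing the collector through the info pointer mirrors what the Objective-C answer does with self. A fuller version would also register TJ and pop its array operand.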

You've actually described exactly how to do it; all you need to do is put it together and try until it works.
First of all, you need to set up a table with callbacks, as you state yourself at the beginning of your question (all code in Objective-C, NOT Swift):
CGPDFOperatorTableRef operatorTable = CGPDFOperatorTableCreate();
CGPDFOperatorTableSetCallback(operatorTable, "q", &op_q);
CGPDFOperatorTableSetCallback(operatorTable, "Q", &op_Q);
This table contains a list of the PDF operators you want to get called for and associates a callback with them. Those callbacks are simply functions you define elsewhere:
static void op_q(CGPDFScannerRef s, void *info) {
// Do whatever you have to do in here
// info is whatever you passed to CGPDFScannerCreate
}
static void op_Q(CGPDFScannerRef s, void *info) {
// Do whatever you have to do in here
// info is whatever you passed to CGPDFScannerCreate
}
And then you create the scanner and get it going, while passing it the information you just defined.
// Passing "self" is just an example, you can pass whatever you want and it will be provided to your callback whenever it is called by the scanner.
CGPDFScannerRef contentStreamScanner = CGPDFScannerCreate(contentStream, operatorTable, self);
CGPDFScannerScan(contentStreamScanner);
If you want to see a complete example with source code on how to find and process images, check this website.

To understand why a parser works this way, you need to read the PDF specification more closely. A PDF file contains something close to printing instructions, such as "move to this coordinate, print this character, move there, change the color, print character number 23 from font #23", etc.
The parser gives you a callback for each instruction, with the possibility to retrieve the instruction's parameters. That's all.
So, in order to get the content from a file, you need to rebuild its state manually. That means recomputing the frames for all characters and trying to reverse-engineer the page layout. This is clearly not an easy task, which is why people have created libraries to do it.
You may want to have a look at PDFKitten, or PDFParser, which is a Swift port with some improvements that I made.
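To make the "rebuild its state manually" idea concrete, here is a toy sketch in plain Swift. It is not the Core Graphics API: Instruction, PlacedText and replay are hypothetical names, and the two cases only loosely mimic the PDF Td (move text position) and Tj (show text) operators. It illustrates that the text's position is never stored with the text; it has to be tracked while replaying the instructions.

```swift
// Toy model of a content stream, for illustration only.
enum Instruction {
    case moveText(dx: Double, dy: Double)   // loosely like the PDF "Td" operator
    case showText(String)                   // loosely like the PDF "Tj" operator
}

struct PlacedText: Equatable {
    let x: Double
    let y: Double
    let text: String
}

// Replays the instructions while tracking the current position,
// the way a renderer (or a text extractor) has to.
func replay(_ instructions: [Instruction]) -> [PlacedText] {
    var x = 0.0
    var y = 0.0
    var placed: [PlacedText] = []
    for instruction in instructions {
        switch instruction {
        case .moveText(let dx, let dy):
            x += dx
            y += dy
        case .showText(let text):
            placed.append(PlacedText(x: x, y: y, text: text))
        }
    }
    return placed
}

let samplePage: [Instruction] = [
    .moveText(dx: 72, dy: 700), .showText("Hello"),
    .moveText(dx: 0, dy: -14), .showText("world")
]
let placedRuns = replay(samplePage)
```

A real extractor also has to account for fonts, glyph widths and transformation matrices, which is where most of the difficulty lies.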

Related

SKScene and URLQueryItems in Swift3?

OK, I am new to URL querying and this whole aspect of Swift, and I need help. As it stands, I have an iMessage app that contains an SKScene. For the users to take turns playing the game, I need to send the game back and forth in messages within one session, as I learned here: https://medium.com/lost-bananas/building-an-interactive-imessage-application-for-ios-10-in-swift-7da4a18bdeed.
So far I have my scene all working. However, I've pored over Apple's ice cream demo, where they send the continuously built ice cream back and forth, and I can't understand how to "query" everything in my SKScene so I can send the scene.
I'm unclear as to how URLQueryItems work, as the documentation does not relate to SpriteKit scenes.
Apple queries their "ice cream" in its current state like this:
init?(queryItems: [URLQueryItem]) {
var base: Base?
var scoops: Scoops?
var topping: Topping?
for queryItem in queryItems {
guard let value = queryItem.value else { continue }
if let decodedPart = Base(rawValue: value), queryItem.name == Base.queryItemKey {
base = decodedPart
}
if let decodedPart = Scoops(rawValue: value), queryItem.name == Scoops.queryItemKey {
scoops = decodedPart
}
if let decodedPart = Topping(rawValue: value), queryItem.name == Topping.queryItemKey {
topping = decodedPart
}
}
guard let decodedBase = base else { return nil }
self.base = decodedBase
self.scoops = scoops
self.topping = topping
}
}
fileprivate func composeMessage(with iceCream: IceCream, caption: String, session: MSSession? = nil) -> MSMessage {
var components = URLComponents()
components.queryItems = iceCream.queryItems
let layout = MSMessageTemplateLayout()
layout.image = iceCream.renderSticker(opaque: true)
layout.caption = caption
let message = MSMessage(session: session ?? MSSession())
message.url = components.url!
message.layout = layout
return message
}
}
But I can't find out how to "query" an SKScene. How can I "send" an SKScene back and forth? Is this possible?
You do not need to send an SKScene back and forth :) What you need to send is the information relating to your game setup, such as the number of turns or whose turn it is, so that your app at the other end can use it to build the scene.
Without knowing more about how your scene is set up and how it interacts with the information received from the other player's session, I can't tell you a lot in terms of specifics. But if you are using URLQueryItems to pass the information, you simply need to retrieve the list of query items in your scene and set up the scene based on the received values.
If you have specific questions about how this could be done, either share the full project or post the relevant bits of code showing where you send out a message from one player and how the other player receives the information and sets up the scene. Then I (or somebody else) should be able to help.
Also, if you look at composeMessage in the code you posted above, you will see how that particular example sends the scene/game information to the other user. At the other end of the process, the received message's URL would be decomposed to get the values for the various query items, and the scene would then be set up based on those values. Look at how that is done in order to figure out how your scene should be set up.
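As a rough sketch of that round trip, assuming the game state boils down to a turn counter and the current player: GameState, its fields and the "turn"/"player" keys are hypothetical, chosen only for illustration. The sending side encodes the state as query items on a URL; the receiving side decodes them and would then configure its own SKScene from the result.

```swift
import Foundation

// Hypothetical game state; the fields and key names are illustrative only.
struct GameState: Equatable {
    var turn: Int
    var currentPlayer: String

    // Encode the state as query items, as the ice cream sample does.
    var queryItems: [URLQueryItem] {
        return [
            URLQueryItem(name: "turn", value: String(turn)),
            URLQueryItem(name: "player", value: currentPlayer)
        ]
    }
}

extension GameState {
    // Decode the state from received query items; fails if a value is missing.
    init?(queryItems: [URLQueryItem]) {
        var turn: Int?
        var player: String?
        for item in queryItems {
            switch item.name {
            case "turn": turn = item.value.flatMap { Int($0) }
            case "player": player = item.value
            default: break
            }
        }
        guard let decodedTurn = turn, let decodedPlayer = player else { return nil }
        self.init(turn: decodedTurn, currentPlayer: decodedPlayer)
    }
}

// Sending side: build the URL that would be assigned to message.url.
var components = URLComponents()
components.queryItems = GameState(turn: 3, currentPlayer: "blue").queryItems
let messageURL = components.url!

// Receiving side: recover the state, then set up the scene from it.
let receivedItems = URLComponents(url: messageURL, resolvingAgainstBaseURL: false)?.queryItems ?? []
let decodedState = GameState(queryItems: receivedItems)
```

The scene itself never travels; only the small amount of data needed to rebuild it does.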

Calculating word count from a url in swift

I'm creating a reading list app, and I'd like to pass the read time of a user-added link to a table cell in their reading list. The only way to get that number is from that page's word count. I've found a few solutions, namely Parsehub, Parse and Mercury, but they seem to be geared more towards use cases that need more advanced things to be scraped from a URL. Is there a simpler way in Swift to calculate the word count of a URL?
First of all, you need to parse the HTML. HTML can only be parsed reliably with a dedicated HTML parser. Please don't use regular expressions or any other search method to parse HTML. You can read why at this link. If you are using Swift, you may try Fuzi or Kanna. After you get the body text with either library, you have to remove extra whitespace and count the words. I have written some basic code with the Fuzi library to get you started.
import Fuzi
// Trim
func trim(src:String) -> String {
return src.trimmingCharacters(in: CharacterSet.whitespacesAndNewlines)
}
// Remove Extra double spaces and new lines
func clean(src:String) ->String {
return src.replacingOccurrences(
of: "\\s+",
with: " ",
options: .regularExpression)
}
let htmlUrl = URL(fileURLWithPath: ((#file as NSString).deletingLastPathComponent as NSString).appendingPathComponent("test.html"))
do {
let data = try Data(contentsOf: htmlUrl)
let document = try HTMLDocument(data: data)
// get body of text
if let body = document.xpath("//body").first?.stringValue {
let cleanBody = clean(src: body)
let trimmedBody = trim(src:cleanBody)
// Drop empty pieces so that an empty body counts as 0 words, not 1
print(trimmedBody.components(separatedBy: " ").filter { !$0.isEmpty }.count)
}
} catch {
print(error)
}
If you like, you can turn my global functions into a String extension or combine them into a single function. I kept them separate for clarity.
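Once the body text has been extracted, the counting step can be done with the standard library alone. This is a minimal sketch: wordCount(in:) and estimatedReadingMinutes(wordCount:wordsPerMinute:) are hypothetical helper names, and the default of 200 words per minute is an assumption, not a figure from the question.

```swift
// Counts words by splitting on any whitespace; runs of spaces and
// newlines produce no empty pieces, so no separate cleanup pass is needed.
func wordCount(in text: String) -> Int {
    return text.split(whereSeparator: { $0.isWhitespace }).count
}

// Rounds up to whole minutes. The 200 wpm default is an assumed reading speed.
func estimatedReadingMinutes(wordCount: Int, wordsPerMinute: Int = 200) -> Int {
    guard wordCount > 0, wordsPerMinute > 0 else { return 0 }
    return (wordCount + wordsPerMinute - 1) / wordsPerMinute
}

let sampleBody = "  Hello,   world\nagain "
let sampleCount = wordCount(in: sampleBody)
let sampleMinutes = estimatedReadingMinutes(wordCount: 450)
```

Splitting on whitespace is only an approximation of a word count, but it is usually good enough for a read-time estimate.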

How to get the console logs and display in a textview [Swift]

How can I get the console logs, with all the print/NSLog output, and display them in a text view? Thank you very much for your answer.
To accomplish this I modified the OutputListener Class described in this article titled "Intercepting stdout in Swift" by phatblat:
func captureStandardOutputAndRouteToTextView() {
outputPipe = Pipe()
// Intercept STDOUT with outputPipe
dup2(self.outputPipe.fileHandleForWriting.fileDescriptor, FileHandle.standardOutput.fileDescriptor)
outputPipe.fileHandleForReading.waitForDataInBackgroundAndNotify()
NotificationCenter.default.addObserver(forName: NSNotification.Name.NSFileHandleDataAvailable, object: outputPipe.fileHandleForReading , queue: nil) {
notification in
let output = self.outputPipe.fileHandleForReading.availableData
let outputString = String(data: output, encoding: String.Encoding.utf8) ?? ""
DispatchQueue.main.async(execute: {
let previousOutput = self.outputText.string
let nextOutput = previousOutput + outputString
self.outputText.string = nextOutput
let range = NSRange(location:nextOutput.count,length:0)
self.outputText.scrollRangeToVisible(range)
})
self.outputPipe.fileHandleForReading.waitForDataInBackgroundAndNotify()
}
}
}
If you do not want to change existing code, you can:
1 - Redirect the output of print to a known file.
See the instructions here: How to redirect the nslog output to file instead of console (answer 4, redirecting).
2 - Monitor the file for changes and read them in to display in your text view.
You cannot do that.
You can use a logger that allows you to add a custom log destination.
You will have to change all print/NSLog calls to e.g. Log.verbose(message).
I'm using SwiftyBeaver. It allows you to define your own custom destination. You can later read from it and present the output in a text field.
You can totally do that! Check this out: https://stackoverflow.com/a/13303081/1491675
Basically, you create an output file and pipe the stderr output to that file. Then, to display it in your text view, just read the file and populate the text view.

Getting Climacons to display in UILabel with CZWeatherKit in Swift

So I am using the CZWeatherKit library to grab weather data from forecast.io.
When I get results, it sends a climacon UInt8 char, which should map to an icon if the Climacons font is installed. I installed the font, but the label only shows the char, not the actual icon. Here is the code; it prints a quote, i.e. ", which is the correct mapping to ClimaconCloudSun, but the icon doesn't show. I followed these instructions to install the climacons.ttf font:
request.sendWithCompletion { (data, error) -> Void in
if let error = error {
print(error)
} else if let weather = data {
let forecast = weather.dailyForecasts.first as! CZWeatherForecastCondition
dispatch_async(dispatch_get_main_queue(), { () -> Void in
// I get back good results, this part works
let avgTempFloat = (forecast.highTemperature.f + forecast.lowTemperature.f) / 2
let avgTemp = NSDecimalNumber(float: avgTempFloat).decimalNumberByRoundingAccordingToBehavior(rounder)
self.temperatureLabel.text = String(avgTemp)
self.weatherLabel.text = forecast.summary
// this part does not work, it has the right char, but does not display icon
// I tried setting self.climaconLabel.font = UIFont(name: "Climacons-Font", size: 30) both in IB and programmatically
let climaChar = forecast.climacon.rawValue
let climaString = NSString(format: "%c", climaChar)
self.climaconLabel.text = String(climaString)
})
}
}
I solved the exact same issue; the problem was the font file. Replace your current font with the one provided here: https://github.com/comyar/Sol/blob/master/Sol/Sol/Resources/Fonts/Climacons.ttf
You've probably moved on from this problem by now, but I'll leave this here for future use.
You need to call setNeedsLayout on the label after you change the title text to the desired value, and the label will change to the corresponding icon.

Repeat code for whole array

I am using some Facebook IDs in my app, and I have an array of several IDs. The array can hold 10 numbers but can also hold 500.
Right now the numbers are displayed in a table view, and I want all the results there too, so they need to be in an array.
let profileUrl = NSURL(string:"http://www.facebook.com/" + newArray[0])!
let task = NSURLSession.sharedSession().dataTaskWithURL(profileUrl) {
(data, response, error) -> Void in
// Will happen when task completes
if let urlContent = data {
let webContent = NSString(data: urlContent, encoding: NSUTF8StringEncoding)
dispatch_async(dispatch_get_main_queue(),
{ () -> Void in
let websiteArray = webContent!.componentsSeparatedByString("pageTitle\">")
//print(websiteArray[1])
let secondArray = websiteArray[1].componentsSeparatedByString("</title>")
print(secondArray[0])
})
}
}
This code takes the first number in the array, goes to facebook.com/[the actual number], downloads the data, and splits it into pieces so that the data I want is in secondArray[0]. I want to do this for every number in the array, take the resulting data, and put it back into an array. I have no idea how to do this, because I don't know in advance how many numbers there will be. Does someone have a good solution for this?
Any help would be appreciated, really!
Thanks
You have several problems here, and you should take them one at a time to build up to your solution.
First, forget the table for the moment. Don't worry at all about how you're going to display these results. Just focus on getting the results in a simple form, and then you'll go back and convert that simple form into something easy to display, and then you'll display it.
So first, we want this in a simple form. That's a little bit complicated because it's all asynchronous. But that's not too hard to fix.
func fetchTitle(identifier: String, completion: (title: String) -> Void) {
let profileUrl = NSURL(string:"http://www.facebook.com/" + identifier)!
let task = NSURLSession.sharedSession().dataTaskWithURL(profileUrl) {
(data, response, error) -> Void in
if let urlContent = data {
let webContent = NSString(data: urlContent, encoding: NSUTF8StringEncoding)
let websiteArray = webContent!.componentsSeparatedByString("pageTitle\">")
let secondArray = websiteArray[1].componentsSeparatedByString("</title>")
let title = secondArray[0]
completion(title: title)
}
}
task.resume()
}
Now this is still pretty bad code because it doesn't handle errors at all, but it's a starting point, and the most important parts are here: a function that takes a string and, when it's done fetching things, calls a completion handler.
(Regarding error handling, note how many places this code would crash if it received surprising data. Maybe the data you get isn't a proper string. Maybe it's not formatted like you think it is. Every time you use ! or subscript an array, you run the risk of crashing. Try to minimize those.)
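As one way to remove those crash points, the title extraction can be written so that it returns nil instead of trapping when the page is not shaped as expected. This is a sketch in current Swift syntax; extractTitle(from:) is a hypothetical helper, and the two marker strings are simply the ones used in the question.

```swift
import Foundation

// Returns the text between the `pageTitle">` marker and `</title>`,
// or nil if either marker is missing. No force unwraps, no subscripts.
func extractTitle(from html: String) -> String? {
    guard let start = html.range(of: "pageTitle\">") else { return nil }
    let rest = html[start.upperBound...]
    guard let end = rest.range(of: "</title>") else { return nil }
    return String(rest[..<end.lowerBound])
}

let sampleHTML = "<head><title id=\"pageTitle\">Jane Doe</title></head>"
let sampleTitle = extractTitle(from: sampleHTML)
let missingTitle = extractTitle(from: "<html>unexpected response</html>")
```

The caller can then decide what a missing title means, for example skipping that identifier or showing a placeholder.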
So you might then wrap it up in something like:
var titles = [String]()
let identifiers = ["1","2","3"]
let queue = dispatch_queue_create("titles", DISPATCH_QUEUE_SERIAL)
dispatch_apply(identifiers.count, queue) { index in
let identifier = identifiers[index]
fetchTitle(identifier) { title in
dispatch_async(queue) {
titles.append(title)
}
}
}
This is just code to get you on the right track and start studying the right things. It certainly would need work to be production quality (particularly to handle errors).
Once you have something that returns your titles correctly, you should be able to write a program that does nothing but take a list of identifiers and prints out the list of titles. Then you can add code to integrate that list into your tableview. Keep the parts separate. The titles are the Model. The table is the View. Read up on the Model-View-Controller paradigm, and you'll be in good shape.
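For reference, the same "fetch every identifier, then collect the results" step can be sketched in current Swift with a DispatchGroup. This is an illustration under assumptions, not a drop-in replacement: simulatedFetchTitle is a hypothetical stand-in for the real network call, TitleStore and fetchAllTitles are names invented here, and the code targets the Swift 5 language mode. Results are stored by index, so the output order matches the input order regardless of which request finishes first.

```swift
import Foundation
import Dispatch

// Hypothetical stand-in for the network call: completes asynchronously.
func simulatedFetchTitle(_ identifier: String, completion: @escaping (String) -> Void) {
    DispatchQueue.global().async {
        completion("title-\(identifier)")
    }
}

// Small box so the closures share one mutable array.
final class TitleStore {
    var titles: [String?]
    init(count: Int) { titles = Array(repeating: nil, count: count) }
}

// Fetches a title for every identifier and calls `completion` once,
// after all fetches have finished.
func fetchAllTitles(_ identifiers: [String], completion: @escaping ([String]) -> Void) {
    let store = TitleStore(count: identifiers.count)
    let group = DispatchGroup()
    let syncQueue = DispatchQueue(label: "titles.sync")  // serializes writes

    for (index, identifier) in identifiers.enumerated() {
        group.enter()
        simulatedFetchTitle(identifier) { title in
            syncQueue.async {
                store.titles[index] = title
                group.leave()
            }
        }
    }

    // Runs only after every enter() has been balanced by a leave().
    group.notify(queue: syncQueue) {
        completion(store.titles.compactMap { $0 })
    }
}
```

In an app, the completion handler would hand the titles to the model and reload the table view on the main queue.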
To repeat the code for the whole array, put your code in a loop and run that loop from 0 to array.count - 1.
You don't need to know in advance how many items there will be in the array. You can just get the count at run time with array.count, where array is your array.
I hope this is what you wanted to know; your question isn't very clear, though.
