Coding a PDF Text Parser in Swift

I'm currently developing a PDF text parser completely in Swift.
I was looking through PDFKitten's code and found this in the stringWithPDFString: method (in SimpleFont.m), which takes a CGPDFStringRef as its parameter.
const unsigned char *bytes = CGPDFStringGetBytePtr(pdfString);
NSUInteger length = CGPDFStringGetLength(pdfString);
// Translate to Unicode
for (int i = 0; i < length; i++)
{
unichar cid = bytes[i];
unichar uni = [self.toUnicode unicodeCharacter:cid];
}
From my understanding, *bytes is a CChar, so what exactly is this method iterating through? When I translate this code to Swift, I get the error
Type UnsafePointer<UInt8>? has no subscript members.
What is the equivalent of that Objective-C code in Swift?
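To see what the Objective-C loop walks over, it may help to read it as plain C: `bytes` points at the first of `length` raw bytes, and `bytes[i]` reads the i-th one. The sketch below is an illustration under that reading, not PDFKitten's code; `widen_bytes` is a hypothetical name, and the font's ToUnicode lookup is left out so that each byte is simply widened to a 16-bit value.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: widen each raw byte to a 16-bit character code,
   the same per-byte walk the Objective-C loop performs before its
   ToUnicode lookup. Returns the number of codes written to out. */
size_t widen_bytes(const unsigned char *bytes, size_t length, uint16_t *out)
{
    assert(bytes != NULL && out != NULL);
    for (size_t i = 0; i < length; i++) {
        out[i] = bytes[i]; /* bytes[i] means *(bytes + i) */
    }
    return length;
}
```

In Swift the same pointer is imported as an optional, which is consistent with the error quoted above: the optional most likely has to be unwrapped before it can be subscripted.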

Related

NSData to NSArray of NSNumbers?

I'm coming from Swift to Objective-C, and a problem I have run into is that NSData doesn't seem to have methods to enumerate over UInt8 values the way Swift's Data does. Other answers with similar titles exist, but they all deal with property lists, whereas I just have a bucket of ASCII bytes. Specifically, the code I want to replicate in Obj-C is:
//Find the next newline in ASCII data given a byte offset
let subset = myData.advanced(by: offset)
var lineEnd: Data.Index = myData.endIndex
subset.enumerateBytes({ (memory, idx, stop) in
let newline: UInt8 = 10
for idx in memory.indices {
let charByte = memory[idx]
if charByte == newline {
lineEnd = idx
stop = true; return;
}
}
})
Ideally I want a way to convert an NSData to an array of NSNumbers from which I can extract the intValues. Of course, if there is a better method, let me know. One goal is to keep the Obj-C code as similar to the Swift code as possible, as I will be maintaining the two codebases simultaneously.

The only good way to access an NSData's bytes is to call its -bytes method, which gets you a C pointer to its internal storage.
NSData *data = ...;
const uint8_t *bytes = data.bytes;
NSUInteger length = data.length;
for (NSUInteger i = 0; i < length; i++) {
uint8_t byte = bytes[i];
// do something with byte
}

The closest equivalent to advanced(by:) would be subdataWithRange:. The equivalent to enumerateBytes is enumerateByteRangesUsingBlock:. It would yield something like:
NSData *subset = [data subdataWithRange:NSMakeRange(offset, data.length - offset)];
__block NSUInteger lineEnd = data.length;
Byte newline = 10;
[subset enumerateByteRangesUsingBlock:^(const void * _Nonnull bytes, NSRange byteRange, BOOL * _Nonnull stop) {
for (NSInteger index = 0; index < byteRange.length; index++) {
Byte charByte = ((Byte *)bytes)[index];
if (charByte == newline) {
lineEnd = index + byteRange.location;
*stop = true;
return;
}
}
}];
Note, I made a few changes from your Swift example:
If the data is not contiguous, your Swift example returns the index within the current block, which doesn't look correct to me. I suspect you want the location within subset, so this Objective-C example reports the offset within subset, not within the current block. I'm wagering that you've never noticed this because it's pretty rare for NSData blocks to be non-contiguous.
It makes no observable performance difference, but I pulled the definition of newline out of the enumeration block. Why define it repeatedly?
If you're really searching for a character in the NSData, I'd suggest avoiding creating the subset altogether. Just use rangeOfData:options:range:. This will find whatever you're looking for.
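For comparison, when the bytes are available as one contiguous buffer (for example the pointer and length from -bytes and -length), the newline search can be sketched in plain C with the standard memchr function. This is a swapped-in technique, not the enumeration-based code above; `next_newline` is a hypothetical name, and the index it returns is relative to the start of the whole buffer rather than to a subset.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: return the index of the first '\n' at or after
   offset, or length if there is none (mirroring lineEnd's default). */
size_t next_newline(const unsigned char *bytes, size_t length, size_t offset)
{
    assert(bytes != NULL);
    if (offset >= length) {
        return length;
    }
    /* memchr scans at most (length - offset) bytes starting at the offset */
    const unsigned char *hit = memchr(bytes + offset, '\n', length - offset);
    return hit ? (size_t)(hit - bytes) : length;
}
```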

Converting NSStrings to C chars and calling a C function from Objective-C

I'm in an Objective-C method with various NSStrings that I want to pass to a C function. The C function requires a struct object be malloc'd so that it can be passed in - this struct contains char fields. So the struct is defined like this:
struct libannotate_baseManual {
char *la_bm_code; // The base code for this manual (pointer to malloc'd memory)
char *la_bm_effectiveRevisionId; // The currently effective revision ID (pointer to malloc'd memory or null if none effective)
char **la_bm_revisionId; // The null-terminated list of revision IDs in the library for this manual (pointer to malloc'd array of pointers to malloc'd memory)
};
This struct is then used in the following C function definition:
void libannotate_setManualLibrary(struct libannotate_baseManual **library) { ..
So that's the function I need to call from Objective-C.
So I have various NSStrings that I basically want to pass in there to represent the char fields: la_bm_code, la_bm_effectiveRevisionId and la_bm_revisionId. I could convert those to const char * by using [NSString UTF8String], but I need char *, not const char *.
I also need to do suitable mallocs for these fields, though apparently I don't need to worry about freeing the memory afterwards. C is not my strong point, though I know Objective-C well.

strdup() is your friend here, as it both malloc()s and strcpy()s for you in one simple step. Its memory is also released using free(), and it does your const char * to char * conversion for you!
NSString *code = ..., *effectiveRevId = ..., *revId = ...;
struct libannotate_baseManual *abm = malloc(sizeof(struct libannotate_baseManual));
abm->la_bm_code = strdup([code UTF8String]);
abm->la_bm_effectiveRevisionId = strdup([effectiveRevId UTF8String]);
const unsigned numRevIds = 1;
abm->la_bm_revisionId = malloc(sizeof(char *) * (numRevIds + 1));
abm->la_bm_revisionId[0] = strdup([revId UTF8String]);
abm->la_bm_revisionId[1] = NULL;
const unsigned numAbms = 1;
struct libannotate_baseManual **abms = malloc(sizeof(struct libannotate_baseManual *) * (numAbms + 1));
abms[0] = abm;
abms[1] = NULL;
libannotate_setManualLibrary(abms);
Good luck, you'll need it. It's one of the worst interfaces I've ever seen.
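The allocation pattern can also be sketched in plain C with a matching cleanup routine, in case ownership ever stays with the caller. This is a minimal sketch, not part of libannotate: `make_manual` and `free_manual` are hypothetical names, the struct is repeated from the question only so that the block compiles on its own, and strdup is assumed to be available as a POSIX function.

```c
#define _POSIX_C_SOURCE 200809L /* for strdup under strict C modes */
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct libannotate_baseManual {
    char *la_bm_code;
    char *la_bm_effectiveRevisionId;
    char **la_bm_revisionId;
};

/* Hypothetical helper: release a manual and everything it owns. */
void free_manual(struct libannotate_baseManual *m)
{
    if (m == NULL) return;
    if (m->la_bm_revisionId != NULL) {
        for (char **id = m->la_bm_revisionId; *id != NULL; id++) free(*id);
        free(m->la_bm_revisionId);
    }
    free(m->la_bm_effectiveRevisionId);
    free(m->la_bm_code);
    free(m);
}

/* Hypothetical helper: build a manual with one revision ID.
   effective may be NULL when no revision is in effect.
   Returns NULL if any allocation fails. */
struct libannotate_baseManual *make_manual(const char *code,
                                           const char *effective,
                                           const char *revision)
{
    assert(code != NULL && revision != NULL);
    struct libannotate_baseManual *m = calloc(1, sizeof *m);
    if (m == NULL) return NULL;
    m->la_bm_code = strdup(code);
    m->la_bm_effectiveRevisionId = effective ? strdup(effective) : NULL;
    /* two slots: one revision ID plus the terminating NULL */
    m->la_bm_revisionId = calloc(2, sizeof(char *));
    if (m->la_bm_revisionId != NULL) m->la_bm_revisionId[0] = strdup(revision);
    if (m->la_bm_code == NULL ||
        (effective != NULL && m->la_bm_effectiveRevisionId == NULL) ||
        m->la_bm_revisionId == NULL || m->la_bm_revisionId[0] == NULL) {
        free_manual(m);
        return NULL;
    }
    return m;
}
```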

Objective-C how to convert a keystroke to ASCII character code?

I need to find a way to convert an arbitrary character typed by a user into an ASCII representation to be sent to a network service. My current approach is to create a lookup dictionary and send the corresponding code. After creating this dictionary, I see that it is hard to maintain and determine if it is complete:
__asciiKeycodes[@"F1"] = @(112);
__asciiKeycodes[@"F2"] = @(113);
__asciiKeycodes[@"F3"] = @(114);
//...
__asciiKeycodes[@"a"] = @(97);
__asciiKeycodes[@"b"] = @(98);
__asciiKeycodes[@"c"] = @(99);
Is there a better way to get ASCII character code from an arbitrary key typed by a user (using standard 104 keyboard)?

Objective-C includes the C primitive data types, so there is a little trick you can use: store the keystroke in a char, then cast it to an int. In C, converting a char to an int yields that character's ASCII value. Here's a quick example.
char character = 'a';
NSLog(@"a = %d", (int)character);
console output: a = 97
To go the other way around, cast an int to a char:
int asciiValue = 97;
NSLog(@"97 = %c", (char)asciiValue);
console output: 97 = a
Alternatively, you can do a direct conversion within initialization of your int or char and store it in a variable.
char asciiToCharOf97 = (char)97; //Stores 'a' in asciiToCharOf97
int charToAsciiOfA = (int)'a'; //Stores 97 in charToAsciiOfA

This seems to work for most keyboard keys; I'm not sure about function keys and the return key.
NSString *input = @"abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ1234567890!@#$%^&*()_+[]\\{}|;':\"\\,./<>?~ ";
for (NSUInteger i = 0; i < input.length; i++)
{
NSLog(@"Found (at %lu): %i", (unsigned long)i, [input characterAtIndex:i]);
}

Use stringWithFormat call and pass the int values.

Chinese character to ASCII or Hexadecimal

I'm struggling to convert Chinese words/characters to ASCII or hexadecimal, and none of the values I've got so far are what I was supposed to get.
For example, the word 手 should convert to the hex value 1534b.
The methods I've followed so far are below; they gave a variety of results, but not the one I was looking for.
I'd really appreciate any help with this issue.
Thanks,
Mike
- (NSString *) stringToHex:(NSString *)str{
NSUInteger len = [str length];
unichar *chars = malloc(len * sizeof(unichar));
[str getCharacters:chars];
NSMutableString *hexString = [[NSMutableString alloc] init];
for(NSUInteger i = 0; i < len; i++ )
{
[hexString appendFormat:@"%02x", chars[i]]; //EDITED PER COMMENT BELOW
}
free(chars);
return hexString;}
and
const char *cString = [@"手" cStringUsingEncoding:NSASCIIStringEncoding];
Below is similar code in Java for Android; maybe it helps:
public boolean sendText(INotifiableManager manager, String text) {
final int codeOffset = 0xf100;
for (char c : text.toCharArray()) {
int code = (int)c+codeOffset;
if (! mConnection.getBoolean(manager, "SendKey", Integer.toString(code))) {
}

Your Java code is just doing this:
Take each 16-bit character of the string and add 0xf100 to it.
If you do the same thing in your above Objective-C code you will get the result you want.
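The arithmetic can be checked in plain C. 手 is the code point U+624B, and 0x624B + 0xF100 = 0x1534B, which matches the 1534b expected in the question. The sketch below assumes the input is already an array of 16-bit code units (what getCharacters: fills in); `offset_hex` is a hypothetical name. The sum is computed in 32 bits because 0x1534B no longer fits in a 16-bit unichar.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: add offset to each 16-bit code unit and append the
   sum to out as lowercase hex. Returns the number of characters written,
   or -1 if out is too small. */
int offset_hex(const uint16_t *units, size_t count, uint32_t offset,
               char *out, size_t out_size)
{
    assert(units != NULL && out != NULL && out_size > 0);
    size_t used = 0;
    out[0] = '\0';
    for (size_t i = 0; i < count; i++) {
        /* widen first so the sum cannot wrap around at 16 bits */
        uint32_t code = (uint32_t)units[i] + offset;
        int n = snprintf(out + used, out_size - used, "%x", (unsigned int)code);
        if (n < 0 || (size_t)n >= out_size - used) {
            return -1;
        }
        used += (size_t)n;
    }
    return (int)used;
}
```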

Find Character String In Binary Data

I have a binary file I've loaded using an NSData object. Is there a way to locate a sequence of characters, 'abcd' for example, within that binary data and return the offset without converting the entire file to a string? Seems like it should be a simple answer, but I'm not sure how to do it. Any ideas?
I'm doing this on iOS 3 so I don't have -rangeOfData:options:range: available.
I'm going to award this one to Sixteen Otto for suggesting strstr. I went and found the source code for the C function strstr and rewrote it to work on a fixed length Byte array--which incidentally is different from a char array as it is not null terminated. Here is the code I ended up with:
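- (Byte*)offsetOfBytes:(Byte*)bytes inBuffer:(const Byte*)buffer ofLength:(int)len;
{
// bytes is the null-terminated pattern; buffer is the data being searched
Byte *cp = (Byte*)buffer;
Byte *s1, *s2;
if ( !*bytes )
return cp;
int i = 0;
for (i=0; i < len; ++i)
{
s1 = cp;
s2 = bytes;
int remaining = len - i; // never read past the end of buffer
while ( remaining > 0 && *s2 && !(*s1-*s2) )
s1++, s2++, remaining--;
if (!*s2)
return cp;
cp++;
}
return NULL;
}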
This returns a pointer to the first occurrence of bytes, the thing I'm looking for, in buffer, the byte array that should contain bytes.
I call it like this:
// data is the NSData object
const Byte *bytes = [data bytes];
Byte* index = [self offsetOfBytes:tag inBuffer:bytes ofLength:[data length]];

Convert your substring to an NSData object, and search for those bytes in the larger NSData using rangeOfData:options:range:. Make sure that the string encodings match!
On iPhone, where that isn't available, you may have to do this yourself. The C function strstr() will give you a pointer to the first occurrence of a pattern within the buffer (as long as neither contains nulls!), but not the index. Here's a function that should do the job (but no promises, since I haven't actually tried running it...):
- (NSUInteger)indexOfData:(NSData*)needle inData:(NSData*)haystack
{
// use byte pointers; a void* cannot be subscripted
const unsigned char* needleBytes = [needle bytes];
const unsigned char* haystackBytes = [haystack bytes];
// a needle longer than the haystack can never match; checking first also
// keeps the unsigned subtraction below from wrapping around
if ([needle length] > [haystack length])
{
return NSNotFound;
}
// walk the length of the buffer, looking for a byte that matches the start
// of the pattern; we can skip (|needle|-1) bytes at the end, since we can't
// have a match that's shorter than needle itself
for (NSUInteger i=0; i < [haystack length]-[needle length]+1; i++)
{
// walk needle's bytes while they still match the bytes of haystack
// starting at i; if we walk off the end of needle, we found a match
NSUInteger j=0;
while (j < [needle length] && needleBytes[j] == haystackBytes[i+j])
{
j++;
}
if (j == [needle length])
{
return i;
}
}
return NSNotFound;
}
This runs in something like O(nm), where n is the buffer length, and m is the size of the substring. It's written to work with NSData for two reasons: 1) that's what you seem to have in hand, and 2) those objects already encapsulate both the actual bytes, and the length of the buffer.
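The same O(nm) search can be sketched in plain C over raw pointers and explicit lengths, which is what -bytes and -length provide. This is an illustration, not the answer's tested code; `find_bytes` is a hypothetical name, and SIZE_MAX plays the role of NSNotFound. Because lengths are passed explicitly, zero bytes in either buffer are handled like any other value.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: index of the first occurrence of needle in
   haystack, or SIZE_MAX if it does not occur. */
size_t find_bytes(const unsigned char *haystack, size_t haystack_len,
                  const unsigned char *needle, size_t needle_len)
{
    assert(haystack != NULL && needle != NULL);
    if (needle_len == 0) return 0;                  /* empty pattern matches at 0 */
    if (needle_len > haystack_len) return SIZE_MAX; /* cannot fit */
    for (size_t i = 0; i + needle_len <= haystack_len; i++) {
        /* compare needle_len bytes starting at position i */
        if (memcmp(haystack + i, needle, needle_len) == 0) return i;
    }
    return SIZE_MAX;
}
```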

If you're using Snow Leopard, a convenient way is the new -rangeOfData:options:range: method in NSData that returns the range of the first occurrence of a piece of data. Otherwise, you can access the NSData's contents yourself using its -bytes method to perform your own search.

I had the same problem.
I solved it the other way round compared to the other suggestions.
First, I convert the data (assume your NSData is stored in the variable rawFile) with:
NSString *ascii = [[NSString alloc] initWithData:rawFile encoding:NSASCIIStringEncoding];
Now you can easily search for strings like 'abcd' or whatever you want by using the NSScanner class and passing the ascii string to the scanner. This may not be very efficient, but it works until -rangeOfData:options:range: becomes available on iPhone as well.
