My main issue is figuring out the 10* and 01* states.
This is what I have so far: [image]
I am building a kind of calculator for iOS using DDMathParser. I would like to know how to separate an expression (an NSString) like 3+3*4 (after parsing: add(3,multiply(3,4))) into its individual steps: multiply(3,4), then add(3,12). I would prefer to get these in an NSArray or another relatively useful list object.
Example: add(3,multiply(3,4))
[1] multiply(3,4)
[2] add(3,12)
[3] 15
DDMathParser author here.
This should be pretty straightforward. The parser can give you back the well-formed expression tree instead of just the numeric evaluation. With the expression tree, you'd do something like this:
Find the deepest expression in the tree. This is the first thing that will be evaluated.
Evaluate that expression manually
Substitute the evaluation back in to the tree in place of the original expression
Create a copy of the entire tree to refer to what it looked like at this point.
Repeat until there's nothing more to evaluate.
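The loop above is language-agnostic, so here is a minimal sketch of it in Python (the tuple-based tree shape and every helper name are my own invention for illustration, not DDMathParser's actual API):

```python
# Expressions are tuples ("add", l, r) or ("multiply", l, r);
# plain numbers are leaves. Illustrative encoding only.

OPS = {"add": lambda a, b: a + b, "multiply": lambda a, b: a * b}

def is_leaf(node):
    return not isinstance(node, tuple)

def find_deepest(tree):
    """Return the path (a list of child indices) to a deepest non-leaf node."""
    best = []
    def walk(node, path):
        nonlocal best
        if is_leaf(node):
            return
        if len(path) >= len(best):
            best = path
        for i, child in enumerate(node[1:], start=1):
            walk(child, path + [i])
    walk(tree, [])
    return best

def node_at(tree, path):
    for i in path:
        tree = tree[i]
    return tree

def substitute(tree, path, value):
    """Rebuild the tree with the node at `path` replaced by `value`."""
    if not path:
        return value
    i = path[0]
    return tree[:i] + (substitute(tree[i], path[1:], value),) + tree[i + 1:]

def evaluation_steps(tree):
    """Repeatedly evaluate the deepest sub-expression, recording each
    intermediate tree, following the numbered steps above."""
    steps = []
    while not is_leaf(tree):
        path = find_deepest(tree)
        op, a, b = node_at(tree, path)   # deepest node's children are leaves
        tree = substitute(tree, path, OPS[op](a, b))
        steps.append(tree)
    return steps
```

For the example add(3,multiply(3,4)) this yields the trees ("add", 3, 12) and then 15, matching the step list in the question.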
I have a bit of an unorthodox question that I cannot think of an approach how to tackle. I have some letters written like this:
/\ |---\ /---\
/ \ |___/ |
/----\ | \ |
/ \ |___/ \---/
Now, the idea is to read this content (possibly from a text file) and parse it to the real letters they actually represent. So this should be parsed to ABC.
I understand this is not OCR, but I have no idea if something like that is possible. I am not asking for a solution, but rather: how would you best attack this problem? What would be a good criterion for deciding where a 'letter' starts and where it ends?
Based on the comments it sounds like you could store a character font map (2-dimensional array for each character) and then read the input file and buffer a number of lines equal to the height of the characters.
Then, for each group of lines you would want to segment the input based on the width of the characters and slide across horizontally, looking for matches against your font map.
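Assuming fixed-width, fixed-height glyphs, the buffer-and-match idea can be sketched like this (the tiny two-character "font" below is invented purely for illustration):

```python
# A font map: one small 2-D array (list of strings) per character.
# These 4x2 glyphs are made up for the example.
FONT = {
    "A": [" /\\ ",
          "/--\\"],
    "B": ["|-\\ ",
          "|_/ "],
}
CHAR_W, CHAR_H = 4, 2

def recognize(lines):
    """Buffer CHAR_H lines, slide across in CHAR_W-wide cells,
    and match each cell against the font map."""
    # Pad every line so slicing past the end yields spaces, not short strings.
    width = max(len(l) for l in lines)
    lines = [l.ljust(width) for l in lines]
    out = []
    for x in range(0, width, CHAR_W):
        cell = [l[x:x + CHAR_W] for l in lines]
        for ch, glyph in FONT.items():
            if cell == glyph:
                out.append(ch)
                break
        else:
            out.append("?")  # no glyph matched this cell
    return "".join(out)
```

Unmatched cells come back as "?", which is also a cheap way to spot where your segmentation assumptions break down.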
If you need to support multiple fonts then things get more complicated, and you'd benefit more from a neural-net approach of sorts to character recognition.
One important aspect to keep in mind about how OCR typically works is that it takes an arbitrary image and "pixelates" it, generating a much lower-resolution image. In your case you've already got a "pixelated" representation of the image, and all you'd have to do is read in the input and feed that into the rest of the pipeline.
I would still approach this as an OCR-esque problem.
You could first draw the characters onto an image and run it through an available OCR library.
Or you could do it yourself.
Pre-process it by converting vertical and horizontal characters into lines first.
Then where there are forward and backslashes, approximate start and finish points of the curve by where they meet the previous horizontal and vertical (a different approach would be needed for letters such as 'o' or 'e').
Once you have this image, a simple pattern-analysis approach such as naive Bayes should be able to produce reliable results.
Whether the pre-processing would actually improve accuracy, I'm not sure.
I have a question about how to convert a regular expression to an automaton. I have heard about Glushkov's algorithm, but I couldn't find a good document about it.
Example: I have a regular expression like (a*|b*) U (a*a|c*)* and I want to convert it into an automaton with a simple algorithm.
Please help me.
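As a starting point, here is a sketch of Thompson's construction, a close cousin of Glushkov's algorithm (both turn a regular expression into an NFA). The AST encoding and all names below are my own, and the epsilon-NFA it builds is not the epsilon-free one Glushkov produces:

```python
class NFA:
    def __init__(self):
        self.trans = {}   # (state, symbol or None) -> set of next states
        self.n = 0        # None as a symbol means an epsilon move
    def state(self):
        s = self.n; self.n += 1; return s
    def add(self, src, sym, dst):
        self.trans.setdefault((src, sym), set()).add(dst)

def build(nfa, ast):
    """Return (start, accept) states for an AST node."""
    kind = ast[0]
    if kind == "lit":                      # ("lit", "a")
        s, f = nfa.state(), nfa.state()
        nfa.add(s, ast[1], f)
        return s, f
    if kind == "cat":                      # ("cat", left, right)
        s1, f1 = build(nfa, ast[1]); s2, f2 = build(nfa, ast[2])
        nfa.add(f1, None, s2)
        return s1, f2
    if kind == "alt":                      # ("alt", left, right) -- union
        s, f = nfa.state(), nfa.state()
        for sub in (ast[1], ast[2]):
            ss, sf = build(nfa, sub)
            nfa.add(s, None, ss); nfa.add(sf, None, f)
        return s, f
    if kind == "star":                     # ("star", inner) -- Kleene star
        s, f = nfa.state(), nfa.state()
        ss, sf = build(nfa, ast[1])
        nfa.add(s, None, ss); nfa.add(sf, None, f)
        nfa.add(s, None, f);  nfa.add(sf, None, ss)
        return s, f
    raise ValueError(kind)

def accepts(nfa, start, accept, word):
    """Simulate the epsilon-NFA on `word`."""
    def closure(states):
        stack, seen = list(states), set(states)
        while stack:
            st = stack.pop()
            for nxt in nfa.trans.get((st, None), ()):
                if nxt not in seen:
                    seen.add(nxt); stack.append(nxt)
        return seen
    current = closure({start})
    for ch in word:
        current = closure({d for st in current
                             for d in nfa.trans.get((st, ch), ())})
    return accept in current
```

For example, the AST ("alt", ("star", ("lit", "a")), ("star", ("lit", "b"))) encodes a*|b*, and the resulting NFA accepts "", "aaa", and "bb" but rejects "ab".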
I'm trying to parse binary data using pipes-attoparsec in Haskell. The reason pipes (proxies) are involved is to interleave reading with parsing to avoid high memory use for large files. Many binary formats are based on blocks (or chunks), and their sizes are often described by a field in the file. I'm not sure what a parser for such a block is called, but that's what I mean by "sub-parser" in the title. The problem I have is to implement them in a concise way without a potentially large memory footprint. I've come up with two alternatives that each fail in some regard.
Alternative 1 is to read the block into a separate bytestring and start a separate parser for it. While concise, a large block will cause high memory use.
Alternative 2 is to keep parsing in the same context and track the number of bytes consumed. This tracking is error-prone and seems to infest all the parsers that compose into the final blockParser. For a malformed input file it could also waste time by parsing further than indicated by the size field before the tracked size can be compared.
    import Control.Applicative ((<$>))
    import Control.Monad (when, unless)
    import Control.Proxy
    import Control.Proxy.Attoparsec
    import Control.Proxy.Trans.Either
    import Data.Attoparsec as P
    import Data.Attoparsec.Binary
    import qualified Data.ByteString as BS
    import System.IO

    parser = do
        size <- fromIntegral <$> anyWord32le
        -- alternative 1 (ignore the Either for simplicity):
        Right result <- parseOnly blockParser <$> P.take size
        return result
        -- alternative 2:
        (result, trackedSize) <- blockParser
        when (size /= trackedSize) $ fail "size mismatch"
        return result

    blockParser = undefined

    main = withBinaryFile "bin" ReadMode go where
        go h = fmap print . runProxy . runEitherK $ session h
        session h = printD <-< parserD parser <-< throwParsingErrors <-< parserInputD <-< readChunk h 128
        readChunk h n () = runIdentityP go where
            go = do
                c <- lift $ BS.hGet h n
                unless (BS.null c) $ respond c *> go
I like to call this a "fixed-input" parser.
I can tell you how pipes-parse will do it. You can see a preview of what I'm about to describe in pipes-parse in the parseN and parseWhile functions of the library. Those are actually for generic inputs, but I wrote similar ones for example String parsers as well here and here.
The trick is really simple: you insert a fake end-of-input marker where you want the parser to stop, run the parser (which will fail if it hits the fake end-of-input marker), then remove the marker.
Obviously, that's not as easy as I make it sound, but it's the general principle. The tricky parts are:
Doing it in such a way that it still streams. The one I linked doesn't do that yet, but the way you do this in a streaming fashion is to insert a pipe upstream that counts bytes flowing through it and then inserts the end-of-input marker at the correct spot.
Not interfering with existing end-of-input markers.
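The byte-counting part of the trick is easy to see outside of pipes. Here is a small Python illustration, assuming an iterator of byte chunks; the function name is mine, and real code would also have to hand the leftover bytes back upstream rather than drop them:

```python
def take_bytes(chunks, n):
    """Yield chunks until exactly n bytes have flowed through, splitting
    the boundary chunk. The generator ending plays the role of the fake
    end-of-input marker: a sub-parser fed from it cannot read past n."""
    remaining = n
    for chunk in chunks:
        if remaining <= 0:
            break
        piece, rest = chunk[:remaining], chunk[remaining:]
        remaining -= len(piece)
        yield piece
        if rest:
            # In a real leftovers scheme, `rest` would be pushed back
            # upstream for the outer parser to consume.
            break
```

The sub-parser then consumes this bounded stream as if it were the whole input, which is exactly the "fixed-input" behavior described above.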
This trick can be adapted for pipes-attoparsec, but I think the best solution would be for attoparsec to directly include this feature. However, if that solution is not available, then we can restrict the input that is fed to the attoparsec parser.
Ok, so I finally figured out how to do this and I've codified this pattern in the pipes-parse library. The pipes-parse tutorial explains how to do this, specifically in the "Nesting" section.
The tutorial only explains this for datatype-agnostic parsing (i.e. a generic stream of elements), but you can extend it to work with ByteStrings, too.
The two key tricks that make this work are:
Fixing StateP to be global (in pipes-3.3.0)
Embedding the sub-parser in a transient StateP layer so that it uses a fresh leftovers context
pipes-attoparsec will soon release an update that builds on pipes-parse, so that you can use these tricks in your own code.
I need help designing an NFA that accepts the words "hello", "hello world" and "stay together". The alphabet includes the English alphabet, numbers and symbols. I need help getting started. Does anyone have any suggestions?
I would start with a regular expression then work up from there.
A regex for your problem is: hello | hello world | stay together
(bear in mind the "hello" is redundant, but you didn't specify it needed to be optimal)
We can then use the rules of construction to convert the regular expression to an NFA. It looks like it is explained pretty well here.
Since it is just a bunch of concatenations (h-e-l-l-o...) and some unions (the '|' characters mean union), the final NFA will look like the following (except with more concatenations if you want to treat each letter separately):
NOTE: the above image was generated from here and is for the regular expression h|hw|st
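To make the construction concrete, here is a small sketch (representation and names are my own) that builds exactly this kind of NFA, one chain of states per word joined at a shared start state, and simulates it:

```python
def nfa_for_words(words):
    """Build an NFA accepting exactly the given words: each word gets its
    own chain of states from the shared start state 0. Words sharing a
    prefix simply create parallel chains, which is fine for an NFA."""
    trans = {}            # (state, char) -> set of next states
    accepting = set()
    next_state = 1        # state 0 is the shared start
    for word in words:
        state = 0
        for ch in word:
            trans.setdefault((state, ch), set()).add(next_state)
            state = next_state
            next_state += 1
        accepting.add(state)
    return trans, accepting

def accepts(trans, accepting, text):
    """Standard NFA simulation: track the set of reachable states."""
    current = {0}
    for ch in text:
        current = {d for s in current for d in trans.get((s, ch), ())}
    return bool(current & accepting)
```

Note that on input "hello" the simulation is in two states at once (the end of the "hello" chain and the middle of the "hello world" chain); it accepts because at least one of them is accepting.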