Parsing complex files with Parsec

I would like to parse files with several sequences of data (same number of columns, same content, ...) with Haskell.
My data sequences will be delimited by keywords before and after:
BEGIN
1 882
2 809
3 435
4 197
5 229
6 425
...
END
BEGIN
1 235 623 684
2 871 699 557
3 918 686 49
4 53 564 906
5 246 344 501
6 929 138 474
...
END
My problem is that, after several tests with Parsec, I have the impression that Parsec is designed to parse a file line by line rather than as a whole.
Is Parsec the right tool for what I want, or should I consider another tool like Happy or Alex?
Is there a website (or other resource) providing examples of parsing complex text files with Parsec?
Note: The example I give is a very simple one. My real files are trickier, with many more keywords and combinations.

The format as you've described wouldn't be hard at all to handle in parsec.
As for learning how to use it: your first step should be to avoid whatever guide gave you the impression that parsec worked line-by-line. I recommend Chapter 16 of Real World Haskell as a good place to get started, and once you're comfortable with the basics the reference material at http://hackage.haskell.org/package/parsec is actually very clear.

Related

I'm trying to figure out how I can decompile this obfuscated Lua script

So I've been trying to figure out how to decompile DLLs, Java, and Lua files, but once I ran into this one I got stumped.
Does anyone have any ideas on how I can decompile this?
Since the script was way too big, I put it in a pastebin link: https://pastebin.com/UsdWEHnm
IlIIl1liIllIi1II1Ii.lIl1llIllIii1111lIIIii = lIli1IlI11lIlI1il11i1()
lIli1IlI11lIlI1il11i1() lIli1IlI11lIlI1il11i1()
local ll1ili1i1Ii1II111li = lIlIlll1Ill1illiliIiI()
for i1IiIili111iI1lil1l = lIliI1iii11lilII1IIil, ll1ili1i1Ii1II111li do IlIIl1liIllIi1II1Ii.l111II111Il1IiIII11i[i1IiIili111iI1lil1l] = lIlIlll1Ill1illiliIiI() end
lIlIlll1Ill1illiliIiI() lIli1IlI11lIlI1il11i1() lIli1IlI11lIlI1il11i1() lIlIlll1Ill1illiliIiI() lIlIlll1Ill1illiliIiI() lIlIlll1Ill1illiliIiI() lIli1IlI11lIlI1il11i1()
local ll1ili1i1Ii1II111li = lIlIlll1Ill1illiliIiI() - (#{ 91625, 31274, 132907, 128929, 89879, 28353, 85846, 63662, 120975, 94604, 40073, 120271, 29175, 126728, 55753, 31423, 118592, 112751, 123563, 26653 } + 49 - 22 - 12 + 24 + 10 + 32 - 27 + 22 - 35 + 41 + 25 + 29 + 18 + 33 + 32 + 133485)
for i1IiIili111iI1lil1l = lIliI1iii11lilII1IIil, ll1ili1i1Ii1II111li do local l1iI1Illil1i1il1iII = {} local lIlill1IIIlli1iII1ill = lIlIllI1i111lilIi1ilI(i1iIIIii1liiIillilI)
l1iI1Illil1i1il1iII.il1li1iilIii1iIll11l = iiIlIlilIlIll1l1l1l(lIlill1IIIlli1iII1ill, #{ 19814, 81950, 109054, 18321, 117777, 126276, 941, 40833, 27393, 25354, 106568, 58140, 73781, 28751, 110509, 42721, 118305, 94680, 18166, 4591 } + 26 - 9 + 9 - 4 - 48 - 2 + 24 + 47 - 35 - 8 - 31 - 1 + 39, #{ 34453, 33661, 37020, 5461, 3935, 7245, 90253, 30010, 122438, 78286, 50375, 62446, 101176, 126539, 91679, 59085, 67167, 93133, 73148, 54067, 13807 } + 29 - 46 - 15 + 41 + 32 - 26 + 6 - 6 + 27 - 43 + 12 - 17 + 11 + 6)
l1iI1Illil1i1il1iII.lIlilIilillll11iil1li1 = iiIlIlilIlIll1l1l1l(lIlill1IIIlli1iII1ill, #{ 59738, 38876, 31250, 75801, 96293, 27832, 11774, 9098, 31230, 80836, 129303, 101680, 12689, 60836 } - 3 + 38 + 32 - 43 + 21 - 10 + 5 - 32 + 14 - 8 + 7 - 15 - 19, #{ 37073, 70137, 113242, 21765, 129309, 86407, 33113, 85980, 105005, 59356, 53236, 100694, 61483, 55175, 85902, 33351, 70969, 133357, 55705, 74121, 116292, 132529 } - 13 - 4 - 47 - 36 - 29 + 17 - 49 + 43 - 48 - 42 - 4 - 18 + 16 + 201)
l1iI1Illil1i1il1iII.I1i1IiiIlIIl1II11IiI = iiIlIlilIlIll1l1l1l(lIlill1IIIlli1iII1ill, #{ 129902, 68496, 976, 73113, 19012, 12350, 23326, 93845, 88636, 103236, 52249, 70226, 40074,
This is a very small sample; in all there are 40,000 characters.

I understand this is quite an old thread, but perhaps I can add some information you may not know. This script is Lua produced by an older version of Luraph, which is sold on a multi-purpose website called V3rmillion and as a service on the black market. The "obfuscated" code was paid for: every script you want obfuscated costs $1 via PayPal, so I assume their obfuscations are worth the price. The pastebin provided does not include the watermark, but I know that it's Luraph because of this side message: http://prntscr.com/k37hin
This is a good example of custom bytecode. That may sound pretty awesome, but it is still just bytecode that needs a Lua virtual machine to make sense of it, much like ordinary Lua bytecode, where an interpreter decodes the bytecode and executes it. So in practice Luraph ships a custom-made interpreter together with custom-made bytecode, and the interpreter decodes that bytecode and turns it into executable code.

Luraph is an LBI (Lua bytecode interpreter).
Here is the LBI: https://github.com/JustAPerson/lbi/blob/master/src/lbi.lua
It uses custom bytecode. By comparing the script with the LBI and the Luraph VM you can easily find the matching patterns. Just replace those, and you get a readable VM.
Deobfuscating the bytecode is a whole different matter.

There are a few different things you can do to help get rid of obfuscation in code.
1. Use Proper Variable and Function Names
One would be to find and replace all of the different variable and function names with something more distinctive than "I1lili1" and so on. This lets you follow the code much more easily and keeps you from confusing the variables with each other.
2. Indent the Code
Another would be to look for the 'if', 'while', 'function' and 'end' keywords and indent the code accordingly, which makes it more readable and, again, easier to follow.
3. Solve the Basic Maths
The above code uses the length operator (#) very often: it wraps ordinary numbers in table literals so that the actual values are not visible at a glance. For example:
#{ 10, 372, 67298, 2287, 694, 1, 5039 }
will become:
7
when you apply the length operator. If you replace each of those tables with its length and then work out the simple additions and subtractions that follow, you can get rid of nearly all the numbers.
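The folding described above can also be scripted rather than done by hand. The following is a minimal Python sketch, not a full Lua parser: it assumes every `#{ ... }` table holds only comma-separated integer literals, and it only folds a sum that directly follows `(`, `,` or `=` and is directly followed by `)` or `,`, which is how the sums appear in the sample. The function names are made up for illustration.

```python
import re

# Matches a Lua length-of-table-literal such as "#{ 10, 372, 67298 }".
# Assumption: the table contains only comma-separated integer literals.
TABLE_LENGTH = re.compile(r"#\{\s*(\d+(?:\s*,\s*\d+)*)\s*\}")

# Matches a chain of integers joined by + or -, but only when it directly
# follows "(", "," or "=" and is directly followed by ")" or ",".
# This is deliberately conservative so that sums involving variables are left alone.
INT_SUM = re.compile(r"(?<=[(,=])(\s*)(\d+(?:\s*[+-]\s*\d+)+)(?=\s*[),])")


def fold_table_lengths(source):
    """Replace each "#{ a, b, c }" with the number of elements in the table."""
    return TABLE_LENGTH.sub(lambda m: str(len(m.group(1).split(","))), source)


def fold_sums(source):
    """Evaluate chains of integer additions and subtractions, left to right."""
    def evaluate(match):
        tokens = re.findall(r"[+-]|\d+", match.group(2))
        total = int(tokens[0])
        for op, num in zip(tokens[1::2], tokens[2::2]):
            total += int(num) if op == "+" else -int(num)
        return match.group(1) + str(total)
    return INT_SUM.sub(evaluate, source)


def simplify(source):
    """Fold table lengths first, then the arithmetic that surrounds them."""
    return fold_sums(fold_table_lengths(source))
```

For instance, `simplify("f(x, #{ 1, 2, 3 } + 26 - 9, #{ 4, 5 } - 1 + 4)")` gives `f(x, 20, 5)`. Renaming the identifiers and re-indenting still have to be done separately.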
Of course doing this takes a lot of time, but that's the point of obfuscating the code after all. If you don't want to spend a few hours going through all the code, you can use this version I prepared earlier: https://pastebin.com/Amtt8UMP I have used all of the above methods to remove some of the obfuscation; however, you will still need to trace through the program to find the outputs of all the functions.
As Egor Skriptunoff commented, however, all this piece of code will most likely do is call loadstring. The code passed to loadstring will probably be obfuscated as well, so in reality this piece of code is useless to you.
Hope this helps!

How to use F# TypeProvider to read PowerBall csv?

The powerball schema and separators are not consistent which makes it an unusual file to read. (http://www.powerball.com/powerball/winnums-text.txt)
Sample:
Draw Date WB1 WB2 WB3 WB4 WB5 PB PP
09/24/2016 15 07 29 41 20 22 2
09/21/2016 63 67 01 69 28 17 4
09/17/2016 51 19 09 62 55 14 4
Any suggestions?

This looks like a "fixed column width" file rather than an ordinary CSV (meaning that the columns are not separated by any single character, but instead have a fixed number of characters, with padding spaces).
There is some early work on supporting this in F# Data in the pull request here. We'd welcome any help getting this tested, but you'd need to get the source code and build F# Data from source (which is just a matter of running the build script, though!)
Alternatively, you could probably do some simple pre-processing on the file before reading it as an ordinary CSV file. Looking at the sample file, using a regular expression to replace 1 or more spaces with a comma would produce regular CSV that the CSV provider can consume.
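The pre-processing step could look like the following minimal sketch. It is written in Python rather than F# purely to show the idea, and the function names are hypothetical. It assumes the first non-empty line is the header and that "Draw Date" is the only column name containing a space, which is why that name is joined before the runs of spaces are replaced.

```python
import re


def to_csv_line(line):
    """Collapse each run of spaces or tabs into a single comma."""
    return re.sub(r"[ \t]+", ",", line.strip())


def to_csv(text):
    """Convert the space-padded listing into comma-separated lines."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    header, rows = lines[0], lines[1:]
    # "Draw Date" contains a space, so join it before splitting the header.
    header = header.replace("Draw Date", "DrawDate")
    return "\n".join(to_csv_line(ln) for ln in [header] + rows)
```

The output can then be saved and handed to the CSV provider as an ordinary comma-separated file.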

PostScript file - Image instead of text

With a PostScript driver (Xerox, Canon, HP, all of them), when I create a PS file, for example by printing the test page from the printer properties, I get:
OK:
The result displays correctly (with GSview, for example).
Not OK:
The file size is too big, more than 4 MB.
When I edit the file, I see one big image (doNimage). I think this is the reason for the large file size.
The example file : https://drive.google.com/open?id=0B9bet657DEU5alV6WFZZdDFjMmc
I'm on Windows 10, and I have a similar problem with Windows Server 2012 R2.
I left the driver configuration at its defaults.
Does anyone have an idea?
Thanks a lot.
Regards.

I don't understand your problem; the file you posted a link to contains text. Here's an example:
360 4485 M <202530360E0F1102381030100D100B0824152D30103102020C302A1E19181B1E1730132E28301530132D3B02230B2A2E22081308>[46 16 28 70 18 42 44 44 54 32 28 32 36 32 25 39 65 40 40 28 32 44 44 44 18 28 53 45 20 47 38 45
40 28 34 40 40 28 40 28 34 40 18 44 44 25 53 40 16 39 34 0]xS
M is a moveto and xS uses the xshow operator to draw the glyphs represented by the character codes in the hexstring, using the values in the array to modify the width of each glyph.
If you were expecting to see ASCII character codes you are going to be sadly disappointed. The file uses an incrementally downloaded subset TrueType font, so the character codes are defined as they are encountered: the first glyph used will be given character code 1, the second will be character code 2, and so on.
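To see why such a hex string does not read as ASCII, here is a toy Python model of the numbering just described. It is only an illustration of "codes are assigned in order of first use", not the driver's actual algorithm, and the function name is made up.

```python
def assign_codes(text):
    """Give each distinct character the next free code, in order of first use."""
    codes = {}
    encoded = []
    for ch in text:
        if ch not in codes:
            # First new glyph gets code 1, the next new glyph gets 2, and so on.
            codes[ch] = len(codes) + 1
        encoded.append(codes[ch])
    return encoded, codes


encoded, codes = assign_codes("Page de test")
# Render the codes as a hex string, two hex digits per glyph.
hex_string = "".join(f"{code:02X}" for code in encoded)
```

Under this model "Page de test" becomes `010203040506040507040807`, which bears no relation to the ASCII values of the letters; only the font's own code-to-glyph table can map it back.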
Even without that, using ASCII would limit the languages that could be supported. Back in the 1980s that maybe didn't seem like a problem, but it's a long time since that was considered acceptable.
If you were expecting to be able to modify the text by editing it in a text editor, forget it. PostScript is a programming language, and the output of a PostScript printer driver is a machine-generated program. It's a lengthy process for a skilled user of the language to decipher what the program is doing. The program is not amenable to alteration; if there's a fault in the output, correct the original document and recreate the PostScript program from the original.
PostScript is not an editable format.

Thanks, all, for your responses. I see I was not very clear in my question.
Here is the situation:
With the PS driver on Windows Server 2008, I get this file:
http://expirebox.com/download/0bb511565377e8b74eead67641fe7f68.html
Inside the file I can see the text "Page de test d\222imprimante"
On Windows Server 2012 R2:
http://expirebox.com/download/60fa957cba97c82bbcd5c0e975825b52.html
I can't see any text. It's a printer test page too.
I need the text to be present because I'll print documents that contain codes, which the printer uses to identify the page type (for example, a white page from tray 1, a yellow page from tray 2).
KenS: I understand your point. But why does the same driver produce different files?
I checked whether it's really the same driver. The only difference I see is the OS: one is x86, the other x64.
Thanks.
Regards.

Need to Arrange Some Numbers Accordingly in the View

Suppose I have a tree structure and I know some conditions. Here is an example:
469 & 470 results 468
472 & 473 results 471
476 & 477 results 475
479 & 480 results 478
Suppose this is round 1. In the next few rounds:
Round 2:
468 & 471 results 467
475 & 478 results 474
Round 3:
467 & 474 results 466
I need to arrange them as shown in the image. Also, to arrange them I have made some ids in CSS so that each one goes in the appropriate position. Starting from the rightmost, it should get 15, and then the ones to its left get 14 and 13. I cannot post images, so I am drawing the structure here:
469
    468
470
        467
472
    471
473
            466
476
    475
477
        474
479
    478
480
The number each one will get is:
1
    9
2
        13
3
    10
4
            15
5
    11
6
        14
7
    12
8
Now my question: I get from the database the information that two given numbers result in a third. I need to write a piece of code that makes this arrangement automatically. I am getting an array of hashes, one per number: a hash for 469, another for 470, and so on. In Rails terms this is an ActiveRecord::Relation. Can anyone help me, please?
More update:
I always know that 469 & 470 will result in 468, and so on. Also, if I am on 466, it carries the detail that it came from 467 & 474. In short, each one knows both its forward and its backward numbers. I want to loop over them and arrange them in the above order so that the left-side schedules and right-side schedules match. Think of it as a world cup in any sport, where two matches feed the next match and so on. Finally, I want to draw a tree in my view.

I have solved the issue by building a custom hash, keyed by round, in a loop like this:
First I took the last one, which is 466 in the above example, and inserted it into the hash:
{5 => [466]}
I know that 466 comes from 467 & 474, so I inserted these two into the same hash:
{5 => [466], 4 => [467, 474]}
For the next round I looped over hash[4]: first I took the two schedules that feed 467 and inserted them into the hash, then the two that feed 474. This gives:
{5 => [466], 4 => [467, 474], 3 => [468, 471, 475, 478]}
and so on. Then in the view I looped over the hash by key and arranged the entries. Suppose the hash is s. Then:
4.downto(1).each do |i|
  s[i].each do |x|
    # code to display here
  end
end
Hope this helps anyone else too if they need to do the same thing.
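The round-by-round walk described in this answer can be sketched as follows. This is a Python illustration rather than the Rails code actually used: `feeders` is a hypothetical plain dictionary standing in for the database records, and the function names are made up. The second function numbers the matches from the first round up to the final, which reproduces the position numbers listed in the question.

```python
def build_rounds(final_id, feeders):
    """Group match ids by round, walking backwards from the final."""
    rounds = []
    current = [final_id]
    while current:
        rounds.append(current)
        following = []
        for match_id in current:
            # A first-round match has no feeders, so it contributes nothing.
            following.extend(feeders.get(match_id, ()))
        current = following
    return rounds


def assign_positions(rounds):
    """Number the matches 1..n, earliest round first and the final last."""
    positions = {}
    counter = 1
    for round_ids in reversed(rounds):
        for match_id in round_ids:
            positions[match_id] = counter
            counter += 1
    return positions


# Each key is a match; its value is the pair of matches that feed it.
feeders = {
    468: (469, 470), 471: (472, 473), 475: (476, 477), 478: (479, 480),
    467: (468, 471), 474: (475, 478),
    466: (467, 474),
}
rounds = build_rounds(466, feeders)
positions = assign_positions(rounds)
```

Here `rounds` starts with `[466]`, then `[467, 474]`, and `positions` maps 469 to 1, 468 to 9, 467 to 13 and 466 to 15, so each match can be given the CSS id for its slot.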

xcode : retrieving one line of xcode based on search query

Here is a sample of my CSV
10820 0 0 0 0
10900 2 4 4 4
11000 21 50 54 58
11100 23 54 59 63
11200 25 59 63 68
11300 27 63 68 73
11400 29 68 73 78
11500 31 72 78 83
11600 32 76 82 88
11700 34 81 87 93
I'm looking to use Xcode to retrieve one line from this very lengthy CSV based on the value in the first column.
For example:
If the user enters "10900", the columns of the second line will be returned.
If the user enters 11650, the columns of the 11600 line will be returned, always taking the lower line when the input value is less than the value on the following line.
Any help would be appreciated. I've seen code to parse an entire CSV file, but I'm thinking this may be a big memory drain. Right now my CSV has 2000 lines of values, all in ascending order based on the first column.

You have to load the file into memory anyway to find the correct value.
With such a big CSV file, I would recommend converting it to a binary file (a plist file, for example) and bundling that with your application, instead of parsing the CSV each time at runtime. It performs much better and is easier to work with, since you are working directly with NSDictionary and NSArray objects.
If you don't want to do that for some reason, the next option is to use something like CHCSVParser:
https://github.com/davedelong/CHCSVParser
It can load only part of the file at a time, which is the optimization you might be looking for.
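Whichever way the rows are loaded, the lookup rule from the question ("take the last line whose first column does not exceed the input") is a binary search, because the first column is sorted. Here is a minimal sketch in Python rather than Objective-C, just to show the logic; it assumes whitespace-separated integer columns as in the sample, and the function names are hypothetical.

```python
from bisect import bisect_right


def load_rows(text):
    """Parse whitespace-separated integer rows, skipping blank lines."""
    return [[int(field) for field in line.split()]
            for line in text.splitlines() if line.strip()]


def lookup(rows, keys, value):
    """Return the last row whose first column is <= value, or None if there is none."""
    index = bisect_right(keys, value) - 1
    if index < 0:
        return None  # value is smaller than the first key in the file
    return rows[index]


sample = """10820 0 0 0 0
10900 2 4 4 4
11000 21 50 54 58
11600 32 76 82 88
11700 34 81 87 93"""
rows = load_rows(sample)
keys = [row[0] for row in rows]  # first column, already in ascending order
```

With this data, `lookup(rows, keys, 10900)` returns the 10900 row and `lookup(rows, keys, 11650)` returns the 11600 row. Building `keys` once and reusing it keeps each lookup logarithmic in the number of lines.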
