How to parse ASCII PGM files in data structures with Haskell?

How to parse ASCII PGM files in data structures with Haskell? - parsing

I am completely new to functional programming and Haskell and i need to parse an ASCII PGM image into a data structure but i can't figure out how to do that.
I have looked at quite a few examples (including the Graphics.Pgm module) but still don't know how to write it in Haskell. Here is what i have so far (this code does not compile):
import System.IO
import Control.Monad
import Control.Applicative
import Data.Attoparsec.Char8
import qualified Data.ByteString as B
data ASCIIGreymap = ASCIIGreymap {
aGreyType :: String
, aGreyComment :: String
, aGreyWidth :: Int
, aGreyHeight :: Int
, aGreyMax :: Int
, aGreyData :: [Int]
} deriving (Eq)
instance Show ASCIIGreymap where
show ( ASCIIGreymap t c w h m _ ) = "ASCIIGreymap Type: "++show t ++ "Comment: " ++ show c ++ " w: " ++ show w ++ " h: " ++ show h ++ " max: " ++ show m
parseASCIIGreymap :: Parser ASCIIGreymap
parseASCIIGreymap = do
pgmType <- string
pgmComment <- string
pgmWidth <- integer
char ' '
pgmHeight <- integer
pgmMax <- integer
pgmGreyData <- [integer]
return $ ASCIIGreymap pgmType pgmComment pgmWidth pgmHeight pgmMax pgmGreyData
pgmFile :: FilePath
pgmFile = "test_ascii.pgm"
main = B.readFile logFile >>= print . parseOnly parseASCIIGreymap
A example file (test_ascii.pgm) looks like this:
P2
# CREATOR: GIMP PNM Filter Version 1.1
10 10
255
0
0
0
0
0
64
255
255
255
179
0
0
0
0
0
159
255
255
255
243
0
0
0
0
96
223
255
255
255
255
0
0
64
96
223
255
255
255
255
255
128
128
191
223
255
255
255
255
255
255
255
255
255
255
255
249
217
179
128
128
255
255
255
255
249
198
89
51
0
0
255
255
255
249
198
77
0
0
0
0
255
255
255
236
128
0
0
0
0
0
191
255
255
218
51
0
0
0
0
0
The first line holds the "magicNumber" where P2 stands for 8bit grey
image
The second line is a comment
The third line has the width and height of the image separated with
a space
In the fourth line is the max grey value
From here on to the end of the file are the grey values for each
pixel
I would like to parse this pgm file into the data structure (ASCIIGreymap) to compare two images later. But like i said i don't know how to get there. If my approach is wrong or if there are better ways of parsing a pgm image please let me know.
Any help is much appreciated!
Edit: Since i haven't made any progress with parsing a pgm file i am not so sure anymore if my approach is correct.
Can someone please comment on my general idea to take the content of the file and put it in a data structure to further work with the data? Or is there a better way?
Thanks again!

Related

why max_samples does not accept float type?

I am doing machine learning.Here I want to find the best triple (max_samples, n_trees and threshold) that gives the greatest performance in terms of area under ROC curve and area under recall precison curve.
Here is the code:
def meilleur_triplet(x,classes):
for n_trees in np.arange(100,160,10):
for sample_size in np.arange(0.1,1,0.1):
for threshold in np.arange(0.4,1,0.1):
model=IforestLocal(sample_size,n_trees)
model.fit(x)
y_pred,y_score=model.predict(x,threshold)
auc=roc_auc_score(classes,y_pred)
auc_pr=average_precision_score(classes,y_pred)
Now when I use max_samples with a range of int I don't have an error however if it's in float I have the following error:
**
TypeError Traceback (most recent call last)
Input In [201], in <cell line: 1>()
----> 1 meilleur_triplet(X_glass,y_glass)
Input In [200], in meilleur_triplet(x, classes)
6 for threshold in np.arange(0.4,1,0.1):#(0.4,1,0.1)
8 model=IforestLocal(sample_size,n_trees)
----> 9 model.fit(x)
File ~\Desktop\THESE\Maurras\Code_Maurras\iforest_D.py:45, in IsolationForest.fit(self, X)
42 self.sample_size = len_x
44 for i in range(self.n_trees):
---> 45 sample_idx = random.sample(list(range(len_x)), self.sample_size)
46 # TODO: Must be deleted before compute the memory consumption of the methods
47 self.samples.append(sample_idx)
File ~\anaconda3\lib\random.py:450, in Random.sample(self, population, k, counts)
448 if not 0 <= k <= n:
449 raise ValueError("Sample larger than population or is negative")
--> 450 result = [None] * k
451 setsize = 21 # size of a small set minus size of an empty list
452 if k > 5:
TypeError: can't multiply sequence by non-int of type 'numpy.float64'
**
This is where I called the function
meilleur_triplet(X_glass,y_glass)
Thank you please help me

Lua Obfuscation - How would you make a lua string look like C++ compiled code

I was wondering, how would I make a simple lua string or entire code look look C++ compiled code but run as regular vanilla lua?
print("Test string") -- How would this look like C++ compiler code?

With Lua you can not directly dump print to a binary Format.
...as i know.
Dumping a Function to a Binary is easy doing with own defined Functions...
> -- Lua 5.4
> myfunc = function() print("Teststring") return end
> string.dump(myfunc, true)
uaT�
�
xV(w#����
��DGG��print�Teststring������
> load(string.dump(myfunc, true))()
Teststring
As you can see, like in a compiled C Binary the Constants are not obfuscated.
More obfuscating you can reach with converting the binary String to Bytecode...
> string.dump(myfunc, true):byte(1, -1)
27 76 117 97 84 0 25 147 13 10 26 10 4 8 8 120 86 0 0 0 0 0 0 0 00 0 0 40 119 64 1 128 129 129 0 0 2 133 11 0 0 0 131 128 0 0 68 0 21 71 0 1 0 71 0 1 0 130 4 134 112 114 105 110 116 4 139 84 101 115 116 115 116 114 105 110 103 129 0 0 0 128 128 128 128 128
...and for converting back later lets put it into a table...
> byte_code_tab = {string.dump(myfunc, true):byte(1, -1)}
> table.concat(byte_code_tab,',')
27,76,117,97,84,0,25,147,13,10,26,10,4,8,8,120,86,0,0,0,0,0,0,0,0,0,0,0,40,119,64,1,128,129,129,0,0,2,133,11,0,0,0,131,128,0,0,68,0,2,1,71,0,1,0,71,0,1,0,130,4,134,112,114,105,110,116,4,139,84,101,115,116,115,116,114,105,110,103,129,0,0,0,128,128,128,128,128
...now a function is needed to get it back...
> bytes_dec = function(tab) local txt = '' for k, v in pairs(tab) do txt = txt .. tostring(v):char() end return txt end
> bytes_dec(byte_code_tab)
uaT�
�
xV(w#����
��DGG��print�Teststring������
> load(bytes_dec(byte_code_tab))()
Teststring
EDIT
To show how it work with a single Lua file that returning a table with a __call metamethod check out this...
-- obfsc.lua
return setmetatable({27,76,117,97,84,0,25,147,13,10,26,10,4,8,8,120,86,0,0,0,0,0,0,0,0,0,0,0,40,119,64,1,128,129,129,0,0,2,133,11,0,0,0,131,128,0,0,68,0,2,1,71,0,1,0,71,0,1,0,130,4,134,112,114,105,110,116,4,139,84,101,115,116,115,116,114,105,110,103,129,0,0,0,128,128,128,128,128},
{__call = function(self, ...)
local txt = ''
for k, v in pairs(self) do
txt = txt .. tostring(v):char()
end
return load(txt)()
end})
...the bytes_dec function is stored in the __call metamethod...
$ /usr/local/bin/lua
Lua 5.4.4 Copyright (C) 1994-2022 Lua.org, PUC-Rio
> require('obfsc')
table: 0x565d3650 ./obfsc.lua
> require('obfsc')()
Teststring
...and do also the load()
But it is up to you where you store: bytes_dec()
Another nice method is ROT.
Its very simple and also old but good enough for de/obfuscating.
An Impression...
$ /bin/lua
Lua 5.1.5 Copyright (C) 1994-2012 Lua.org, PUC-Rio
> rot=require('rot')
> -- Lets rotate the Banner
> print(rot('Lua 5.1.5 Copyright (C) 1994-2012 Lua.org, PUC-Rio'))
5!`unqnu``/092)'(4`hi`qyytmrpqr`
5!n/2'l`m)/ 51
> -- Now read source of rot.lua into rot_src and print it
> rot_src = io.open('rot.lua'):read('*a')
> print(rot_src)
-- rot.lua
local rotator = function(...)
local args, rot, c = {...}, {}, ''
for i = 1, 63 do rot[c.char(i)] = c.char(i + 64) end
for i = 64, 127 do rot[c.char(i)] = c.char(i - 64) end
return args[1]:gsub('.', rot)
end
return rotator
> -- Obfuscate the source and print it
> rot_obfsc = rot(rot_src)
> print(rot_obfsc)
mm`2/4n,5!J,/#!,`2/4!4/2`}`&5.#4)/.hnnniJ,/#!,`!2'3l`2/4l`#`}`;nnn=l`;=l`ggJJ&/2`)`}`ql`vs`$/`2/4#(!2h)i`}`#n#(!2h)`k`vti`%.$J&/2`)`}`vtl`qrw`$/`2/4#(!2h)i`}`#n#(!2h)`m`vti`%.$JJ2%452.`!2'3z'35"hgngl`2/4iJ%.$JJ2%452.`2/4!4/2J
> -- Deobfuscate and print on the fly
> print(rot(rot_obfsc))
-- rot.lua
local rotator = function(...)
local args, rot, c = {...}, {}, ''
for i = 1, 63 do rot[c.char(i)] = c.char(i + 64) end
for i = 64, 127 do rot[c.char(i)] = c.char(i - 64) end
return args[1]:gsub('.', rot)
end
return rotator
236

How to save dpi info in py-opencv?

import cv2
def clear(img):
back = cv2.imread("back.png", cv2.IMREAD_GRAYSCALE)
img = cv2.bitwise_xor(img, back)
ret, img = cv2.threshold(img, 120, 255, cv2.THRESH_BINARY_INV)
return img
def threshold(img):
ret, img = cv2.threshold(img, 120, 255, cv2.THRESH_BINARY_INV)
img = cv2.cvtColor(img, cv2.COLOR_RGB2GRAY)
ret, img = cv2.threshold(img, 248, 255, cv2.THRESH_BINARY)
return img
def fomatImage(img):
img = threshold(img)
img = clear(img)
return img
img = fomatImage(cv2.imread("1566135246468.png",cv2.IMREAD_COLOR))
cv2.imwrite("aa.png",img)
This is my code. But when I tried to identify it with tesseract-ocr, I got a warning.
Warning: Invalid resolution 0 dpi. Using 70 instead.
How should I set up dpi?

AFAIK, OpenCV doesn't set the dpi of PNG files it writes, so you are looking at work-arounds. Here are some ideas...
Method 1 - Use PIL/Pillow instead of OpenCV
PIL/Pillow can write dpi information into PNG files. So you would:
Step 1 - Convert your BGR OpenCV image into RGB to match PIL's channel ordering
from PIL import Image
RGBimage = cv2.cvtColor(BGRimage, cv2.COLOR_BGR2RGB)
Step 2 - Convert OpenCV Numpy array onto PIL Image
PILimage = Image.fromarray(RGBimage)
Step 3 - Write with PIL
PILimage.save('result.png', dpi=(72,72))
As Fred mentions in the comments, you could equally use Python Wand in much the same way.
Method 2 - Write with OpenCV but modify afterwards with some tool
You could use Python's subprocess module to shell out to, say, ImageMagick and set the dpi like this:
magick OpenCVImage.png -set units pixelspercentimeter -density 28.3 result.png
All you need to know is that PNG uses metric (dots per centimetre) rather than imperial (dots per inch) and there are 2.54cm in an inch, so 72 dpi becomes 28.3 dots per cm.
If your ImageMagick version is older than v7, replace magick with convert.
Method 3 - Write with OpenCV and insert dpi yourself
You could write your file to memory using OpenCV's imencode(). Then search in the file for the IDAT (image data) chunk - which is the one containing the image pixels and insert a pHYs chunk before that which sets the density. Then write to disk.
It's not that hard actually - it's just 9 bytes, see here and also look at pngcheck output at end of answer.
This code is not production tested but seems to work pretty well for me:
#!/usr/bin/env python3
import struct
import numpy as np
import cv2
import zlib
def writePNGwithdpi(im, filename, dpi=(72,72)):
"""Save the image as PNG with embedded dpi"""
# Encode as PNG into memory
retval, buffer = cv2.imencode(".png", im)
s = buffer.tostring()
# Find start of IDAT chunk
IDAToffset = s.find(b'IDAT') - 4
# Create our lovely new pHYs chunk - https://www.w3.org/TR/2003/REC-PNG-20031110/#11pHYs
pHYs = b'pHYs' + struct.pack('!IIc',int(dpi[0]/0.0254),int(dpi[1]/0.0254),b"\x01" )
pHYs = struct.pack('!I',9) + pHYs + struct.pack('!I',zlib.crc32(pHYs))
# Open output filename and write...
# ... stuff preceding IDAT as created by OpenCV
# ... new pHYs as created by us above
# ... IDAT onwards as created by OpenCV
with open(filename, "wb") as out:
out.write(buffer[0:IDAToffset])
out.write(pHYs)
out.write(buffer[IDAToffset:])
################################################################################
# main
################################################################################
# Load sample image
im = cv2.imread('lena.png')
# Save at specific dpi
writePNGwithdpi(im, "result.png", (32,300))
Whichever method you use, you can use pngcheck --v image.png to check what you have done:
pngcheck -vv a.png
Sample Output
File: a.png (306 bytes)
chunk IHDR at offset 0x0000c, length 13
100 x 100 image, 1-bit palette, non-interlaced
chunk gAMA at offset 0x00025, length 4: 0.45455
chunk cHRM at offset 0x00035, length 32
White x = 0.3127 y = 0.329, Red x = 0.64 y = 0.33
Green x = 0.3 y = 0.6, Blue x = 0.15 y = 0.06
chunk PLTE at offset 0x00061, length 6: 2 palette entries
chunk bKGD at offset 0x00073, length 1
index = 1
chunk pHYs at offset 0x00080, length 9: 255x255 pixels/unit (1:1). <-- THIS SETS THE DENSITY
chunk tIME at offset 0x00095, length 7: 19 Aug 2019 10:15:00 UTC
chunk IDAT at offset 0x000a8, length 20
zlib: deflated, 2K window, maximum compression
row filters (0 none, 1 sub, 2 up, 3 avg, 4 paeth):
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
(100 out of 100)
chunk tEXt at offset 0x000c8, length 37, keyword: date:create
chunk tEXt at offset 0x000f9, length 37, keyword: date:modify
chunk IEND at offset 0x0012a, length 0
No errors detected in a.png (11 chunks, 76.5% compression).
While I am editing PNG chunks, I also managed to set a tIME chunk and a tEXt chunk with the Author. They go like this:
# Create a new tIME chunk - https://www.w3.org/TR/2003/REC-PNG-20031110/#11tIME
year, month, day, hour, min, sec = 2020, 12, 25, 12, 0, 0 # Midday Christmas day 2020
tIME = b'tIME' + struct.pack('!HBBBBB',year,month,day,hour,min,sec)
tIME = struct.pack('!I',7) + tIME + struct.pack('!I',zlib.crc32(tIME))
# Create a new tEXt chunk - https://www.w3.org/TR/2003/REC-PNG-20031110/#11tEXt
Author = "Author\x00Sir Mark The Great"
tEXt = b'tEXt' + bytes(Author.encode('ascii'))
tEXt = struct.pack('!I',len(Author)) + tEXt + struct.pack('!I',zlib.crc32(tEXt))
# Open output filename and write...
# ... stuff preceding IDAT as created by OpenCV
# ... new pHYs as created by us above
# ... new tIME as created by us above
# ... new tEXt as created by us above
# ... IDAT onwards as created by OpenCV
with open(filename, "wb") as out:
out.write(buffer[0:IDAToffset])
out.write(pHYs)
out.write(tIME)
out.write(tEXt)
out.write(buffer[IDAToffset:])
Keywords: OpenCV, PIL, Pillow, dpi, density, imwrite, PNG, chunks, pHYs chunk, Python, image, image-processing, tEXt chunk, tIME chunk, author, comment

Using esttab with ttest

I am running a ttest command and exporting results to LaTeX using estpost and the community-contributed command esttab.
I am testing for a difference for means (of variable height, by child gender) for several years and would like the years to be displayed vertically (in rows) rather than horizontally.
My code and is given below:
foreach i in 2009 2010 2013 {
use "`i'.dta", clear
global year `i'
eststo _$year : estpost ttest height, by(child_gender)
}
esttab . using "trends.tex", nonumber append
Data for 2009:
* Example generated by -dataex-. To install: ssc install dataex
clear
input float(child_gender height)
0 156
1 135
0 189
1 168
0 157
1 189
1 135
1 145
0 124
1 139
end
Data for 2010:
* Example generated by -dataex-. To install: ssc install dataex
clear
input float(child_gender height)
0 151
1 162
0 157
1 134
0 157
1 189
1 135
1 145
0 143
1 166
end
Data for 2013:
* Example generated by -dataex-. To install: ssc install dataex
clear
input float(child_gender height)
0 177
0 135
0 189
0 168
0 157
1 189
1 135
1 145
1 124
1 127
end
I would like the output arranged as follows (but in LaTeX):
Any suggestions on how to make this work?

The way to do this can be found below. You need to play with the options to further polish the table.
First define the program append_ttests, which is a quickly modified version of appendmodels, Ben Jann's program for stacking models:
program append_ttests, eclass
version 8
syntax namelist
tempname b V tmp
foreach name of local namelist {
qui est restore `name'
mat `tmp' = e(b)
local eq1: coleq `tmp'
gettoken eq1 : eq1
mat `tmp' = `tmp'[1,"`eq1':"]
local cons = colnumb(`tmp',"_cons")
if `cons'<. & `cons'>1 {
mat `tmp' = `tmp'[1,1..`cons'-1]
}
mat `b' = nullmat(`b') , `tmp'
mat `tmp' = e(t)
mat `tmp' = `tmp'["`eq1':","`eq1':"]
if `cons'<. & `cons'>1 {
mat `tmp' = `tmp'[1..`cons'-1,1..`cons'-1]
}
capt confirm matrix `V'
if _rc {
mat `V' = `tmp'
}
else {
mat `V' = ///
( `V' \ ///
`tmp' )
}
}
mat `b' = `b''
mat A = `b' , `V'
mat rown A = `0'
ereturn matrix results = A
eret local cmd "append_ttests"
end
Then run your loop and append the t-tests:
foreach i in 2009 2010 2013 {
use "`i'.dta", clear
estpost ttest height, by(child_gender)
estimates store year`i'
}
append_ttests year2009 year2010 year2013
See the results as follows:
esttab e(results), nonumber mlabels(none) ///
varlabels(year2009 2009 year2010 2010 year2013 2013) ///
collabels("Height" "t statistic")
--------------------------------------
Height t statistic
--------------------------------------
2009 4.666667 .3036859
2010 -3.166667 -.2833041
2013 21.2 1.415095
--------------------------------------
Add the tex option to see the LaTeX output.

Parsing Data in C

I am trying to parse some data using C.
The data is of the form:
REMARK 280 100 MM MES PH 6.5, 5 % GLYCEROL
REMARK 290
REMARK 290 CRYSTALLOGRAPHIC SYMMETRY
REMARK 290 SYMMETRY OPERATORS FOR SPACE GROUP: P 1 21 1
REMARK 290
REMARK 290 SYMOP SYMMETRY
REMARK 290 NNNMMM OPERATOR
REMARK 290 1555 X,Y,Z
REMARK 290 2555 -X,Y+1/2,-Z
I want to extract the "Symmetry Operator" data: X,Y,Z and -X,Y+1/2,-Z and turn the data into two matrices for each set of symmetry operators of the form:
[1 0 0 [0 [-1 0 0 [0
0 1 0 0 and 0 1 0 1/2
0 0 1] 0] 0 0 -1] 0]
for X,Y,Z, and -X,Y+1/2,-Z respectively.
I have not done much data parsing and would appreciate any help anyone could offer.

Develop Reference

ios ruby-on-rails asp.net-mvc docker delphi jenkins grails google-sheets machine-learning dart

How to parse ASCII PGM files in data structures with Haskell? - parsing

Related

why max_samples does not accept float type?

Lua Obfuscation - How would you make a lua string look like C++ compiled code

How to save dpi info in py-opencv?

Using esttab with ttest

Parsing Data in C

Categories

Resources