How to get a pointer value in Haskell?

I wish to manipulate data on a very low level.
Therefore I have a function that receives a virtual memory address as an integer and "does stuff" with this memory address. I interfaced this function from C, so it has the type (CUInt -> a).
The memory I want to link is a Word8 in a file. Sadly, I have no idea how to access the pointer value to that Word8.
To be clear, I do not need the value of the Word8; I need the virtual memory address itself, that is, the value of the pointer to it.

For the sake of a simple example, say you want to add an offset to the pointer.
Front matter:
module Main where
import Control.Monad (forM_)
import Data.Char (chr)
import Data.Word (Word8)
import Foreign.ForeignPtr (ForeignPtr, withForeignPtr)
import Foreign.Ptr (Ptr, plusPtr)
import Foreign.Storable (peek)
import System.IO.MMap (Mode(ReadOnly), mmapFileForeignPtr)
Yes, you wrote that you don't want the value of the Word8, but I've retrieved it with peek to demonstrate that the pointer is valid. You might be tempted to return the Ptr from inside withForeignPtr, but the documentation warns against that:
Note that it is not safe to return the pointer from the action and use it after the action completes. All uses of the pointer should be inside the withForeignPtr bracket. The reason for this unsafeness is the same as for unsafeForeignPtrToPtr below: the finalizer may run earlier than expected, because the compiler can only track usage of the ForeignPtr object, not a Ptr object made from it.
The code is straightforward:
doStuff :: ForeignPtr Word8 -> Int -> IO ()
doStuff fp i =
  withForeignPtr fp $ \p -> do
    let addr = p `plusPtr` i
    val <- peek addr :: IO Word8
    print (addr, val, chr $ fromIntegral val)
To approximate “a Word8 in a File” from your question, the main program memory-maps a file and uses that buffer to do stuff with memory addresses.
main :: IO ()
main = do
  (p,offset,size) <- mmapFileForeignPtr path mode range
  forM_ [0 .. size-1] $ \i -> do
    doStuff p (offset + i)
  where
    path = "/tmp/input.dat"
    mode = ReadOnly
    range = Nothing
    -- range = Just (4,3)
Output:
(0x00007f1b40edd000,71,'G')
(0x00007f1b40edd001,117,'u')
(0x00007f1b40edd002,116,'t')
(0x00007f1b40edd003,101,'e')
(0x00007f1b40edd004,110,'n')
(0x00007f1b40edd005,32,' ')
(0x00007f1b40edd006,77,'M')
(0x00007f1b40edd007,111,'o')
(0x00007f1b40edd008,114,'r')
(0x00007f1b40edd009,103,'g')
(0x00007f1b40edd00a,101,'e')
(0x00007f1b40edd00b,110,'n')
(0x00007f1b40edd00c,33,'!')
(0x00007f1b40edd00d,10,'\n')

You are probably looking for ptrToIntPtr and probably fromIntegral to make it a CUInt.
Note that a CUInt cannot represent a pointer on all platforms, though.
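A minimal sketch of that suggestion follows. The helper names ptrAddress and toCUInt are made up for illustration, and alloca is used only to obtain some valid pointer. It converts a Ptr to an integer with ptrToWordPtr (the unsigned sibling of ptrToIntPtr in Foreign.Ptr) and checks whether the address survives narrowing to CUInt:

```haskell
module Main where

import Data.Word (Word8)
import Foreign.C.Types (CUInt)
import Foreign.Marshal.Alloc (alloca)
import Foreign.Ptr (Ptr, ptrToWordPtr)

-- Hypothetical helper: the numeric address held by a pointer.
ptrAddress :: Ptr a -> Integer
ptrAddress = fromIntegral . ptrToWordPtr

-- Hypothetical helper: narrow an address to CUInt,
-- returning Nothing if the address does not fit.
toCUInt :: Integer -> Maybe CUInt
toCUInt addr
  | addr <= fromIntegral (maxBound :: CUInt) = Just (fromIntegral addr)
  | otherwise = Nothing

main :: IO ()
main = alloca $ \p -> do
  -- The address is only meaningful while the buffer is alive,
  -- so it is used inside the alloca bracket.
  let addr = ptrAddress (p :: Ptr Word8)
  print addr
  print (toCUInt addr)
```

On a typical 64-bit system the second line is likely to be Nothing, which is the limitation noted above. If the C side can be changed to take a uintptr_t, CUIntPtr from Foreign.C.Types is wide enough to hold any pointer.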

Related

Suppress Printing for Custom Structures in Julia

I have a structure that ends up having a lot of circular references. It resembles this:
mutable struct Friend
    a :: Int64
    b :: Float64
    your_best_friend :: Union{Nothing, Friend}
    you_are_best_friend :: Union{Nothing, Friend}
    Friend() = new()
end
Any two people who are best friends with each other will cause a circular reference when this is printed. Julia handles these circular references so that the print doesn't go on forever, but I would prefer to have no printing at all whenever a variable of the structure Friend is created. I know Suppressor.jl is a thing, but I am wondering if there is a solution inherent to Base Julia. Basically, is there an option for structures so that the object isn't printed when assigned, without using an extra package? If not, what's the next best thing? I am not a CS guy, so I'm not sure what kind of computation time printing takes, but I'd like to avoid it if possible (and I'm not sure whether Suppressor.jl removes the printing time or whether printing still takes extra time but just isn't displayed). This seems simple to me, but I can't find the solution in the docs. Sorry if it is obvious and thanks in advance!
-J
You need to overload Base.show to change how objects are shown by the REPL:
julia> mutable struct Friend
           a :: Int64
           b :: Float64
           your_best_friend :: Union{Nothing, Friend}
           you_are_best_friend :: Union{Nothing, Friend}
           Friend() = new()
end
julia> Friend()
Friend(0, 0.0, #undef, #undef)
julia> import Base.show
julia> show(io::IO, f::Friend) = show(io, "Friend $(f.a)")
show (generic function with 223 methods)
julia> d = Friend()
"Friend 0"
Note that if you also want to change how objects print outside the REPL command line, you may also need to overload printing via import Base.print.

How to get the address of a global variable in Fortran at initialization?

In C, I can initialize a pointer type global variable in this way:
<<file.h>>
extern int dummy;
extern int* p;
<<file.c>>
int dummy;
int* p = &dummy;
The advantage is that p is a constant at link time, so I do not need to write an init function to initialize p. In my case the value of dummy is never used; I only need its address, p, and I never change p.
I want to know how to achieve this in Fortran, i.e., getting the address of a variable without execution time initialization. I did the following, but did not succeed.
module mod
  use, intrinsic :: iso_c_binding, only: c_ptr, c_loc
  integer, target :: dummy
  type(c_ptr), bind(c, name="p") :: p = c_loc(dummy)
end module mod
The compiler says "Error: Intrinsic function 'c_loc' at (1) is not permitted in an initialization expression"
I need this feature since I have a variable declared in Fortran. I need its address in C (to be used as a global var), but I don't want to call any Fortran init routines.

Haskell: Lazily read binary file with binary

I'm trying to read in a binary file and parse it lazily using the 'binary' package. The package documentation gives an example of how to do this without forcing all the input for a scenario very similar to mine:
example2 :: BL.ByteString -> [Trade]
example2 input
  | BL.null input = []
  | otherwise =
    let (trade, rest, _) = runGetState getTrade input 0
    in trade : example2 rest
However, this uses the deprecated runGetState function, which itself points you towards the runGetIncremental function.
The problem is that the 'runGetIncremental' function seems to force the remaining input to be a strict bytestring, thus forcing it to load the whole file into memory. Indeed, I'm seeing memory usage of around 6GB when I try to run this. Even the implementation of runGetState now seems to be based on runGetIncremental and then reconverts the strict bytestring back to a lazy one using chunk.
Can I get the behaviour as described in the tutorial, or is this now unsupported by binary? If the latter, what's the best way to do this? I have a little experience using conduit, but it's not clear to me how I could use it here.
You can do this using pipes-binary and pipes-bytestring. Here's a helper function for your benefit:
import Control.Monad (void)
import Data.Binary
import Pipes
import Pipes.Binary (decodeMany)
import Pipes.ByteString (fromHandle)
import qualified Pipes.Prelude as P
import System.IO
decodeHandle :: (Binary a) => Handle -> Producer a IO ()
decodeHandle handle = void $ decodeMany (fromHandle handle) >-> P.map snd
The void and map snd are there because decodeMany actually returns more information (like byte offsets and parsing errors). If you actually want that information, then just remove them.
Here's an example of how you might use decodeHandle, using a quick skeleton for Trade I threw together:
data Trade = Trade
instance Binary Trade where
    get = return Trade
    put _ = return ()
instance Show Trade where show _ = "Trade"
main = withFile "inFile.txt" ReadMode $ \handle -> runEffect $
    for (decodeHandle handle) $ \trade -> do
        lift $ print (trade :: Trade)
        -- do more with the parsed trade
You can use for to loop over the decoded trades and handle them, or if you prefer you can use pipe composition:
main = withFile "inFile.txt" ReadMode $ \handle -> runEffect $
    decodeHandle handle >-> P.print
This will be lazy and only decode as many trades as you actually need. So if you insert a take in between the decoder and the printer, it will only read as much input as necessary to process the requested number of trades:
main = withFile "inFile.txt" ReadMode $ \handle -> runEffect $
    for (decodeHandle handle >-> P.take 4) $ \trade -> do
        ... -- This will only process the first 4 trades
-- or using purely pipe composition:
main = withFile "inFile.txt" ReadMode $ \handle -> runEffect $
    decodeHandle handle >-> P.take 4 >-> P.print
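For comparison, here is a sketch that stays within the binary package: drive runGetIncremental by hand and feed it one chunk of the lazy ByteString at a time, so only the chunks actually needed are forced. The name decodeAll is made up for this example. The sketch assumes the getter consumes at least one byte per value, and it reports a parse failure with error rather than returning it:

```haskell
module Main where

import Data.Binary.Get (Decoder (..), Get, getWord16be, runGetIncremental)
import qualified Data.ByteString.Lazy as BL
import qualified Data.ByteString.Lazy.Internal as BLI

-- Hypothetical helper: lazily decode a stream of values from a lazy ByteString.
decodeAll :: Get a -> BL.ByteString -> [a]
decodeAll getter = start
  where
    -- Begin a fresh decoder unless the input is exhausted.
    start input
      | BL.null input = []
      | otherwise = go (runGetIncremental getter) input

    -- A value is ready: emit it, then continue with the unconsumed bytes.
    go (Done leftover _ x) rest = x : start (BLI.chunk leftover rest)
    -- The decoder wants more input: hand it the next strict chunk, if any.
    go (Partial k) input = case input of
      BLI.Chunk c rest -> go (k (Just c)) rest
      BLI.Empty -> go (k Nothing) BLI.Empty
    -- Assumption for this sketch: a failed parse is fatal.
    go (Fail _ _ msg) _ = error msg

main :: IO ()
main =
  -- The input is infinite, so this only terminates because decoding is lazy.
  print (take 3 (decodeAll getWord16be (BL.cycle (BL.pack [0, 1]))))
  -- prints [1,1,1]
```

With a real file, the input would come from BL.readFile and the getter would be something like getTrade. Memory use stays bounded only as long as the caller does not hold on to the original lazy ByteString.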

Freeing memory allocated with newCString

As the library docs say, a CString created with newCString must be freed with the free function. I expected that creating the CString would take some memory and that releasing it with free would make memory usage go down, but it didn't! Here is the example code:
module Main where
import Foreign
import Foreign.C.String
import System.IO
wait = do
    putStr "Press enter" >> hFlush stdout
    _ <- getLine
    return ()
main = do
    let s = concat $ replicate 1000000 ['0'..'9']
    cs <- newCString s
    cs `seq` wait -- (1)
    free cs
    wait -- (2)
When the program stopped at (1), htop showed memory usage of around 410M, which is OK. I press enter and the program stops at line (2), but memory usage is still 410M even though cs has been freed!
How is this possible? A similar program written in C behaves as it should. What am I missing here?
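The issue is that most of that memory is not the CString itself. free releases the C buffer right away, but that buffer is only about 10 MB (one byte per character). The bulk of the 410M is most likely the Haskell String s, a linked list of ten million characters on the GHC heap, which is reclaimed only when the garbage collector runs. Calling free doesn't force the garbage collector to run: it is still up to the GC to decide when to run, based on heap pressure heuristics.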
You can force a major collection by calling performGC straight after the call to free, which immediately reduces the memory to 5M or so.
E.g. this program:
import Foreign
import Foreign.C.String
import System.IO
import System.Mem
wait = do
    putStr "Press enter" >> hFlush stdout
    _ <- getLine
    return ()
main = do
    let s = concat $ replicate 1000000 ['0'..'9']
    cs <- newCString s
    cs `seq` wait -- (1)
    free cs
    performGC
    wait -- (2)
behaves as expected. In the memory profile, the first red dot marks the call to performGC, where the memory is released immediately; the program then hovers around 5M until terminated.

How to parse a 7GB file, with Data.ByteString?

I have to parse a file, and of course I have to read it first. Here is my program:
import qualified Data.ByteString.Char8 as B
import System.Environment
main = do
    args <- getArgs
    let path = args !! 0
    content <- B.readFile path
    let lines = B.lines content
    foobar lines
foobar :: [B.ByteString] -> IO()
foobar _ = return ()
but, after compiling with
> ghc --make -O2 tmp.hs
execution fails with the following error when called with a 7-gigabyte file:
> ./tmp big_big_file.dat
> tmp: {handle: big_big_file.dat}: hGet: illegal ByteString size (-1501792951): illegal operation
thanks for any reply!
The length of a ByteString is an Int. If Int is 32 bits, the size of a 7GB file exceeds the range of Int, so the buffer request is made with the wrong size, which can easily be negative.
The code for readFile converts the file size to Int for the buffer request
readFile :: FilePath -> IO ByteString
readFile f = bracket (openBinaryFile f ReadMode) hClose
    (\h -> hFileSize h >>= hGet h . fromIntegral)
and if that overflows, an "illegal ByteString size" error or a segmentation fault are the most likely outcomes.
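To see the overflow concretely, the sketch below narrows a file size to a 32-bit integer with fromIntegral, as readFile does when Int is 32 bits. The size 7088141641 is an assumption: the question does not give the exact size, and this value (about 7.09 GB) was picked because it wraps to the number in the error message.

```haskell
module Main where

import Data.Int (Int32)

-- Assumed file size in bytes (not stated in the question); chosen so that
-- truncation reproduces the size reported in the error message.
fileSize :: Integer
fileSize = 7088141641

main :: IO ()
main =
  -- fromIntegral wraps modulo 2^32 when narrowing to a 32-bit integer:
  -- 7088141641 - 2 * 4294967296 = -1501792951
  print (fromIntegral fileSize :: Int32) -- prints -1501792951
```

Int32 stands in for a 32-bit Int here so that the result is the same on a 64-bit machine.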
If at all possible, use lazy ByteStrings to handle files that big. In your case, you pretty much have to make it possible, since with 32 bit Ints, a 7GB ByteString is impossible to create.
If you need the lines to be strict ByteStrings for the processing, and no line is exceedingly long, you can go through lazy ByteStrings to achieve that
import qualified Data.ByteString.Lazy.Char8 as LC
import qualified Data.ByteString.Char8 as C
main = do
    ...
    content <- LC.readFile path
    let llns = LC.lines content
        slns = map (C.concat . LC.toChunks) llns
    foobar slns
but if you can modify your processing to deal with lazy ByteStrings, that will probably be better overall.
On a platform where Int is 32 bits, strict ByteStrings only support up to 2 GiB of data. You need to use lazy ByteStrings for a file this size to work.
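As a small illustration of processing that works directly on lazy ByteStrings, here is a sketch that counts the lines of a file given on the command line. The line count is only a stand-in for the real parsing (it replaces foobar from the question), and countLines is a name made up for this example:

```haskell
module Main where

import qualified Data.ByteString.Lazy.Char8 as LC
import System.Environment (getArgs)

-- Hypothetical stand-in for the real processing: count the lines.
-- The file is consumed chunk by chunk, so it never has to fit in memory
-- as long as nothing else keeps a reference to the whole content.
countLines :: LC.ByteString -> Int
countLines = length . LC.lines

main :: IO ()
main = do
  args <- getArgs
  case args of
    [path] -> LC.readFile path >>= print . countLines
    _ -> putStrLn "usage: pass exactly one file path"
```

LC.readFile reads the file lazily in chunks, so no single buffer of the full file size is ever requested.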
