Error in file.read() return above 2 GB on 64 bit python - memory

I have several ~50 GB text files that I need to parse for specific contents. My files contents are organized in 4 line blocks. To perform this analysis I read in subsections of the file using file.read(chunk_size) and split into blocks of 4 then analyze them.
Because I run this script often, I've been optimizing and have tried varying the chunk size. I run 64 bit 2.7.1 python on OSX Lion on a computer with 16 GB RAM and I noticed that when I load chunks >= 2^31, instead of the expected text, I get large amounts of /x00 repeated. This continues as far as my testing has shown all the way to, and including 2^32, after which I once again get text. However, it seems that it's only returning as many characters as bytes have been added to the buffer above 4 GB.
My test code:
for i in range((2**31)-3, (2**31)+3)+range((2**32)-3, (2**32)+10):
with open('mybigtextfile.txt', 'rU') as inf:
print '%s\t%r'%(i, inf.read(i)[0:10])
My output:
2147483645 '#HWI-ST550'
2147483646 '#HWI-ST550'
2147483647 '#HWI-ST550'
2147483648 '\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00'
2147483649 '\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00'
2147483650 '\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00'
4294967293 '\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00'
4294967294 '\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00'
4294967295 '\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00'
4294967296 '\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00'
4294967297 '#\x00\x00\x00\x00\x00\x00\x00\x00\x00'
4294967298 '#H\x00\x00\x00\x00\x00\x00\x00\x00'
4294967299 '#HW\x00\x00\x00\x00\x00\x00\x00'
4294967300 '#HWI\x00\x00\x00\x00\x00\x00'
4294967301 '#HWI-\x00\x00\x00\x00\x00'
4294967302 '#HWI-S\x00\x00\x00\x00'
4294967303 '#HWI-ST\x00\x00\x00'
4294967304 '#HWI-ST5\x00\x00'
4294967305 '#HWI-ST55\x00'
What exactly is going on?

Yes, this is the known issue according to the comment in cpython's source code. You can check it in Modules/_io/fileio.c. And the code add a workaround on Microsoft windows 64bit only.

Related

Flashing ESP8266 with a LittleFS binary

I'm trying to build a LittleFS file system binary on my PC and flash it to my WeMos D1 Mini Pro (16MB) ESP8266.
I used the following code on the ESP
LittleFS.begin()
FSInfo info;
LittleFS.info(info);
Serial.print("LittleFS block size:");
Serial.println(info.blockSize);
Serial.print("LittleFS total bytes:");
Serial.println(info.totalBytes);
To determine the block size and total bytes, which gave me 8192 and 14655488 respectively. 14655488 / 8192 = 1789 so I used 1789 as the block size in the below python:
from littlefs import LittleFS
fs = LittleFS(block_size=8192, block_count=1789)
with open( 'index.html', 'rb' ) as f:
data = f.read()
with fs.open( '/index.html', 'w') as fh:
fh.write( data )
with open('fs.bin', 'wb') as fh:
fh.write(fs.context.buffer)
This creates a 14655488 bytes .bin file.
I then looked in boards.txt and found these lines:
d1_mini_pro.menu.eesz.16M14M=16MB (FS:14MB OTA:~1019KB)
d1_mini_pro.menu.eesz.16M14M.build.flash_size=16M
d1_mini_pro.menu.eesz.16M14M.build.flash_size_bytes=0x1000000
d1_mini_pro.menu.eesz.16M14M.build.flash_ld=eagle.flash.16m14m.ld
d1_mini_pro.menu.eesz.16M14M.build.spiffs_pagesize=256
d1_mini_pro.menu.eesz.16M14M.upload.maximum_size=1044464
d1_mini_pro.menu.eesz.16M14M.build.rfcal_addr=0xFFC000
d1_mini_pro.menu.eesz.16M14M.build.spiffs_start=0x200000
d1_mini_pro.menu.eesz.16M14M.build.spiffs_end=0xFFA000
d1_mini_pro.menu.eesz.16M14M.build.spiffs_blocksize=8192
This confirms the block size and gives the SPIFFS (but LittleFS is equivalent here, right?) start address as 0x200000
Checking these Arduino bits, i get:
FS_PHYS_ADDR: 2097152 (0x200000)
FS_PHYS_SIZE: 14655488
FS_PHYS_PAGE: 256
FS_PHYS_BLOCK: 8192
So then I used:
python upload.py --chip esp8266 --port COM6 --baud 460800 write_flash 0x200000 fs.bin
which outputs:
esptool.py v2.8
Serial port COM6
Connecting....
Chip is ESP8266EX
Features: WiFi
Crystal is 26MHz
MAC: ec:fa:bc:6e:19:90
Uploading stub...
Running stub...
Stub running...
Changing baud rate to 460800
Changed.
Configuring flash size...
Auto-detected Flash size: 16MB
Compressed 14655488 bytes to 215596...
Writing at 0x00234000... (100 %)
Wrote 14655488 bytes (215596 compressed) at 0x00200000 in 56.7 seconds (effective 2067.0 kbit/s)...
Hash of data verified.
Leaving...
Hard resetting via RTS pin...
However, when I then use code like the below
Dir root = LittleFS.openDir("/");
while (root.next())
{
Serial.print(root.fileName());
}
I get nothing, and
LittleFS.exists("/index.html")
returns false.
What am I doing wrong, or how do I debug this?
I'm uploading my firmware (not the filesystem) via Visual Studio Code, and the board configuration I'm using is
"xtal=80,vt=flash,exception=legacy,ssl=all,eesz=16M14M,ip=lm2f,dbg=Disabled,lvl=None____,wipe=none,baud=921600"
If I open the bin in a hex editor, I can see:
Which is some javascript inside the html file.
If I do this:
uint32_t b;
ESP.flashRead(0x006C2800 + 0x200000, &b, 1);
Serial.println(b);
then it returns 115/0x73, so it looks like the binary has flashed successfully, so that leaves me with either the binary being flashed in the wrong place, or it being corrupted/invalid....
I haven't fixed this exact problem, but I have achieved what I want.
This:
C:\Users\Andrew Bullock\AppData\Local\Arduino15\packages\esp8266\tools\mklittlefs\2.5.0-4-fe5bb56\mklittlefs.exe
exists (instructions here https://github.com/earlephilhower/mklittlefs) which works. So I can only assume that the Python wrapper or the version of LittleFS it's using is somehow incompatible or broken.
The only problem here i can see is that the block_count you gave in your python script is wrong. block_count actually refers to the pageSize of the Filesystem, that holds number of bytes every page is going to hold, in ESP littleFS the pageSize is 256.
So your object initialisation should be like this:
fs = LittleFS(block_size=8192, block_count=259)

Why Scapy added 'c2' byte in Dot11 element infomation?

I just followed the steps from Forging WiFi Beacon, but my output is strange, it added 'c2' byte between '0f' and 'ac' byte, Why this happened? How do I solve this problem?
Please look at the image
It seems that you're on Python 3 (as IPython > 5 requires Python > 3)
In that case, you should remember to append a b in front of bytes strings: b"\x01\x02..." as it is required on Python 3+

Length and size of strings in Elixir / Erlang needs explanation

Can someone explain why s is a string with 4096 chars
iex(9)> s = String.duplicate("x", 4096)
... lots of "x"
iex(10)> String.length(s)
4096
but its memory size are a few 6 words?
iex(11)> :erts_debug.size(s)
6 # WHAT?!
And why s2 is a much shorter string than s
iex(13)> s2 = "1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20"
"1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20"
iex(14)> String.length(s)
50
but its size has more 3 words than s?
iex(15)> :erts_debug.size(s2)
9 # WHAT!?
And why does the size of these strings does not match their lengths?
Thanks
First clue why this is showing that values can be found in this question. Quoting size/1 docs:
%% size(Term)
%% Returns the size of Term in actual heap words. Shared subterms are
%% counted once. Example: If A = [a,b], B =[A,A] then size(B) returns 8,
%% while flat_size(B) returns 12.
Second clue can be found in Erlang documentation about bitstrings implementation.
So in the first case the string is too big to fit on heap alone, so it uses refc binaries which are stored on stack and on heap there is only pointer to given binary.
In second case string is shorter than 64 bytes and it uses heap binaries which is just array of bytes stored directly in on the heap, so that gives us 8 bytes per word (64-bit) * 9 = 72 and when we check documentation about exact memory overhead in VM we see that Erlang uses 3..6 words per binary + data, where data can be shared.

Delphi Shared Module for Apache Runs Out of Memory

I hope someone will shed some lights on issue I am having.
I am writing an Apache shared object module that acts as a server for my app. Client or multiple Clients make SOAP request to this module, module talks to database and returns SOAP responses to client(s).
Task Manager shows httpd processes (incoming requests from clients) in Task Manager using about 35000K but every once in a while, one of httpd processes will grow in memory/CPU usage and over time reach the cap of 2GB, and then it will crash. Server reports "Internal Server 500" error in this case. (screenshot added)
I use FastMM to check for memory leaks and it does produce log but there is no memory leaks reported. To make sure I use FastMM properly, I introduced memory leaks and FastMM will log it. So, my assumption is that I do not have memory leak but that somehow memory gets consumed until 2GB threshold is reached and not released until I manually restart Apache.
Then I started taking snapshots using FastMM's LogMemoryManagerStateToFile call to create memory snapshot at the time of call. LogMemoryManagerStateToFile creates file with following information but I dont understand how is it helpful to me other than telling me that for example 8320 calls to UnicodeString allocated 676512 bytes.
Which of these are not released properly?
Note that this information is shortened version of created file and it is just for one single method call. There could be many calls that occur to different methods:
FastMM State Capture:
---------------------
1793K Allocated
7165K Overhead
20% Efficiency
Usage Detail:
676512 bytes: UnicodeString x 8320
152576 bytes: TTMQuery x 256
144312 bytes: TParam x 1718
134100 bytes: TDBQuery x 225
107444 bytes: Unknown x 1439
88368 bytes: TSDStringList x 1052
82320 bytes: TStringList x 980
80000 bytes: TList x 4000
53640 bytes: TTMStoredProc x 90
47964 bytes: TFieldList x 571
47964 bytes: TFieldDefList x 571
41112 bytes: TFields x 1142
38828 bytes: TFieldDefs x 571
20664 bytes: TParams x 574
...
...
...
4 bytes: TChunktIME x 1
4 bytes: TChunkpHYs x 1
4 bytes: TChunkgAMA x 1
Additional Information:
-----------------------
Finalizing MyMethodName
How is this data helpful to figure out where memory is being consumed so much in terms of locating it in code and fixing it?
Much appreciated,

Understanding call x"91"

Can someone please help me understand call x"91" function 11 and function 12 with simple example. I have tried to search and couldn't understand it. Right now I an using this code in COBOL under UNIX environment,Does this call works in windows environment as well?
http://opencobol.add1tocobol.com/#what-are-the-xf4-xf5-and-x91-routines
The CALL's X"F4", X"F5", X"91" are from MF.
You can find them in the online MF doc under
Library Routines.
F4/F5 are for packing/unpacking bits from/to bytes.
91 is a multi-use call. Implemented are the subfunctions
get/set cobol switches (11, 12) and get number of call params (16).
Use
CALL X"F4" USING
BYTE-VAR
ARRAY-VAR
RETURNING STATUS-VAR
to pack the last bit of each byte in the 8 byte ARRAY-VAR into corresponding bits of the 1 byte BYTE-VAR.
The X”F5” routine takes the eight bits of byte and moves them to the corresponding occurrence within array.
X”91” is a multi-function routine.
CALL X"91" USING
RESULT-VAR
FUNCTION-NUM
PARAMETER-VAR
RETURNING STATUS-VAR
As mentioned by Roger, OpenCOBOL supports FUNCTION-NUM of 11, 12 and 16.
11 and 12 get and set the on off status of the 8 (eight) run-time OpenCOBOL switches definable in the SPECIAL-NAMES paragraph. 16 returns the number of call parameters given to the current module.
x'91' is a general library routine, for a complete list of those see the MF documentation.
This documentation also specifies what its function 11 and function 12 do: they set/read the COBOL runtime switches 0-7 and the internal debugging mode switch.
Other than these library routines you can also read them one by one from COBOL and set "some" switches via the SET statement.

Resources