I have a file I opened as binary like this: local dem = io.open("testdem.dem", "rb")
I can read out strings from it just fine: print(dem:read(8)) -> HL2DEMO, however, afterwards there is a 4-byte little endian integer and a 4-byte float (docs for the file format don't specify endianess but since it didn't specify little like the integer I'll have to assume big).
This cannot be read out with read.
I am new to the LuaJIT FFI and am not sure how to read this out. Frankly, I find the documentation on this specific aspect of the FFI to be underwhelming, although I'm just a lua programmer and don't have much experience with C. One thing I have tried is creating a cdata, but I don't think I understand that:
local dem = io.open("testdem.dem", "rb")
print(dem:read(8))
local cd = ffi.new("int", 4)
ffi.copy(cd, dem:read(4), 4)
print(tostring(cd))
--[[Output
HL2DEMO
luajit: bad argument #1 to 'copy' (cannot convert 'int' to 'void *')
]]--
Summary:
Goal: Read integers and floats from binary data.
Expected output: A lua integer or float I can then convert to string.
string.unpack does this for Lua 5.3, but there are some alternatives for LuaJIT as well. For example, see this answer (and other answers to the same question).
I'm reading ejabberd source, specifically ejabberd_http.erl.
In the code below,
...
case (State#state.sockmod):recv(State#state.socket,
min(Len, 16#4000000), 300000)
of
{ok, Data} ->
recv_data(State, Len - byte_size(Data), <<Acc/binary, Data/binary>>);
...
What does 16#4000000 mean?
I've tested this in the Erlang shell.
%%erlang shell
...
7>16#4000000.
67108864
8>is_integer(16#4000000).
true
I know it's just an integer value.
Is there any advantage to writing 16#4000000 instead of 67108864?
In Erlang, the number before the # is the integer base. In your example, 16#4000000 means the hexadecimal representation of 67108864. In other languages it is often represented as 0x4000000.
One reason for using the hex representation is because each digit represents 4 bits, for example 16#F is 16 (in decimal), or 1111 in binary. When working with binary processing, using base 16 makes it easier to handle and understand for the human reader.
I am trying to parse a huge .dat file (4gb). I have tried with R but it just takes too long. Is there a way to parse a .dat file by segments, for example every 30000 lines? Any other solutions would also be welcomed.
This is what it looks like:
These are the first two lines with header:
ST|ZIPCODE|GEO_ID|GEO_TTL|FOOTID_GEO|NAICS2012|NAICS2012_TTL|FOOTID_NAICS|YEAR|EMPSZES|EMPSZES_TTL|ESTAB|ESTAB_F <br/>
01|35004|8610000US35004|35004(MOODY,AL)||00|Total for all sectors||2012|001|All establishments|167| <br/>
01|35004|8610000US35004|35004(MOODY,AL)||00|Total for all sectors||2012|212|Establishments with 1 to 4 employees|91|
This is an option to read data faster in R by using the fread function in the data.table package.
EDIT
I removed all <br/> new-line tags. This is the edited dataset
ST|ZIPCODE|GEO_ID|GEO_TTL|FOOTID_GEO|NAICS2012|NAICS2012_TTL|FOOTID_NAICS|YEAR|EMPSZES|EMPSZES_TTL|ESTAB|ESTAB_F
01|35004|8610000US35004|35004(MOODY,AL)||00|Total for all sectors||2012|001|All establishments|167|
01|35004|8610000US35004|35004(MOODY,AL)||00|Total for all sectors||2012|212|Establishments with 1 to 4 employees|91|
Then I matched variables with classes. You should use nrows ~ 100.
colclasses = sapply(read.table(edited_data, nrows=1, sep="|", header=T),class)
Then I read the edited data.
your_data <- fread(edited_data, sep="|", sep2=NULL, nrows=-1L, header=T, na.strings="NA",
stringsAsFactors=FALSE, verbose=FALSE, autostart=30L, skip=-1L, select=NULL,
colClasses=colclasses)
Everything worked like a charm. In case you have problems removing the tags, use this simple Python script (it will take some time for sure):
original_file = file_path_to_original_file # e.g. "/Users/User/file.dat"
edited_file = file_path_to_new_file # e.g. "/Users/User/file_edited.dat"
with open(original_file) as inp:
with open(edited_file, "w") as op:
for line in inp:
op.write(line.replace("<br/>", "")
P.S.
You can use read.table with similar optimizations, but it won't give you nearly as much speed.
my question now is :
I have the variavle M which contains : 37.5 (as you see is integer)
I want to convert M in order to be string "37.5"
so 37.5 should became "37.5"
I try with function :
M2=integer_to_list(M)
but when I execute this function it displays this error :
** exception error: bad argument
in function integer_to_list/1
called as integer_to_list(37.5)
integer_to_list wont work in that situation because 37.5 is a float and not an integer. Erlang does have float_to_list, but the output is usually pretty unusable.
Instead, I would recommend looking into mochiweb project for pretty conversion of floats to lists. In particular, the mochinum module:
> M = 37.5,
> mochinum:digits(M).
"37.5"
#chops has a great answer, IMO (using mochinum:digits/1), but you might also get something out of looking at the io_lib module. For example:
8> io_lib:format("~.2f",[37.5]).
["37.50"]
9> io_lib:format("~.1f",[37.5]).
["37.5"]
I realize this might not be exactly what you are looking for, and in this case I think looking at/using the mochinum module is an efficient way to go, but io_lib is often overlooked and provides a really useful set of functions for formatting lists / strings
For example, if I want to read the middle value from magic(5), I can do so like this:
M = magic(5);
value = M(3,3);
to get value == 13. I'd like to be able to do something like one of these:
value = magic(5)(3,3);
value = (magic(5))(3,3);
to dispense with the intermediate variable. However, MATLAB complains about Unbalanced or unexpected parenthesis or bracket on the first parenthesis before the 3.
Is it possible to read values from an array/matrix without first assigning it to a variable?
It actually is possible to do what you want, but you have to use the functional form of the indexing operator. When you perform an indexing operation using (), you are actually making a call to the subsref function. So, even though you can't do this:
value = magic(5)(3, 3);
You can do this:
value = subsref(magic(5), struct('type', '()', 'subs', {{3, 3}}));
Ugly, but possible. ;)
In general, you just have to change the indexing step to a function call so you don't have two sets of parentheses immediately following one another. Another way to do this would be to define your own anonymous function to do the subscripted indexing. For example:
subindex = #(A, r, c) A(r, c); % An anonymous function for 2-D indexing
value = subindex(magic(5), 3, 3); % Use the function to index the matrix
However, when all is said and done the temporary local variable solution is much more readable, and definitely what I would suggest.
There was just good blog post on Loren on the Art of Matlab a couple days ago with a couple gems that might help. In particular, using helper functions like:
paren = #(x, varargin) x(varargin{:});
curly = #(x, varargin) x{varargin{:}};
where paren() can be used like
paren(magic(5), 3, 3);
would return
ans = 16
I would also surmise that this will be faster than gnovice's answer, but I haven't checked (Use the profiler!!!). That being said, you also have to include these function definitions somewhere. I personally have made them independent functions in my path, because they are super useful.
These functions and others are now available in the Functional Programming Constructs add-on which is available through the MATLAB Add-On Explorer or on the File Exchange.
How do you feel about using undocumented features:
>> builtin('_paren', magic(5), 3, 3) %# M(3,3)
ans =
13
or for cell arrays:
>> builtin('_brace', num2cell(magic(5)), 3, 3) %# C{3,3}
ans =
13
Just like magic :)
UPDATE:
Bad news, the above hack doesn't work anymore in R2015b! That's fine, it was undocumented functionality and we cannot rely on it as a supported feature :)
For those wondering where to find this type of thing, look in the folder fullfile(matlabroot,'bin','registry'). There's a bunch of XML files there that list all kinds of goodies. Be warned that calling some of these functions directly can easily crash your MATLAB session.
At least in MATLAB 2013a you can use getfield like:
a=rand(5);
getfield(a,{1,2}) % etc
to get the element at (1,2)
unfortunately syntax like magic(5)(3,3) is not supported by matlab. you need to use temporary intermediate variables. you can free up the memory after use, e.g.
tmp = magic(3);
myVar = tmp(3,3);
clear tmp
Note that if you compare running times with the standard way (asign the result and then access entries), they are exactly the same.
subs=#(M,i,j) M(i,j);
>> for nit=1:10;tic;subs(magic(100),1:10,1:10);tlap(nit)=toc;end;mean(tlap)
ans =
0.0103
>> for nit=1:10,tic;M=magic(100); M(1:10,1:10);tlap(nit)=toc;end;mean(tlap)
ans =
0.0101
To my opinion, the bottom line is : MATLAB does not have pointers, you have to live with it.
It could be more simple if you make a new function:
function [ element ] = getElem( matrix, index1, index2 )
element = matrix(index1, index2);
end
and then use it:
value = getElem(magic(5), 3, 3);
Your initial notation is the most concise way to do this:
M = magic(5); %create
value = M(3,3); % extract useful data
clear M; %free memory
If you are doing this in a loop you can just reassign M every time and ignore the clear statement as well.
To complement Amro's answer, you can use feval instead of builtin. There is no difference, really, unless you try to overload the operator function:
BUILTIN(...) is the same as FEVAL(...) except that it will call the
original built-in version of the function even if an overloaded one
exists (for this to work, you must never overload
BUILTIN).
>> feval('_paren', magic(5), 3, 3) % M(3,3)
ans =
13
>> feval('_brace', num2cell(magic(5)), 3, 3) % C{3,3}
ans =
13
What's interesting is that feval seems to be just a tiny bit quicker than builtin (by ~3.5%), at least in Matlab 2013b, which is weird given that feval needs to check if the function is overloaded, unlike builtin:
>> tic; for i=1:1e6, feval('_paren', magic(5), 3, 3); end; toc;
Elapsed time is 49.904117 seconds.
>> tic; for i=1:1e6, builtin('_paren', magic(5), 3, 3); end; toc;
Elapsed time is 51.485339 seconds.