Dask iloc workaround? - dask

I'd like to index into a Dask dataframe -- like iloc in Pandas, but I know that this is not possible.
Would the following be an okay workaround? Are there potential drawbacks?
df.reset_index(drop=True)
df['column'].loc[index]
I'm guessing there must be some drawback to this -- otherwise, why isn't this the implementation for iloc in Dask?
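For concreteness, here is a minimal sketch of that workaround (the column name "column" and the toy data are placeholders, not taken from the original question), which also shows one drawback:
import pandas as pd
import dask.dataframe as dd

# Placeholder data split across two partitions.
pdf = pd.DataFrame({"column": ["a", "b", "c", "d"]})
df = dd.from_pandas(pdf, npartitions=2)

# reset_index returns a new dataframe; it does not modify df in place.
df2 = df.reset_index(drop=True)

# Drawback: the reset index restarts at 0 in every partition, so a "positional"
# lookup like .loc[1] matches one row per partition ("b" and "d" here) rather
# than a single row, and the result still has to be computed.
print(df2["column"].loc[1].compute())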

Related

DL4J - When using a ComputationGraph, is it possible to get the Class labels from it?

I saw how to do this from a DataSet object, and I saw a setLabel method and a getLabelMaskArrays method, but none of these are what I'm looking for.
Am I just blind or is there not a way?
Thanks
Masking is for variable-length time series in RNNs. Most of the time you don't need it. Our built-in sequence dataset iterators also tend to handle these cases. For more details see our RNN page: https://deeplearning4j.org/usingrnns

Parse batch of SequenceExample

There is a function to parse a SequenceExample: tf.parse_single_sequence_example().
But it parses only a single SequenceExample, which is not efficient.
Is there any possibility to parse a batch of SequenceExamples?
tf.parse_example can parse many Examples.
The documentation for tf.parse_example contains a little info about SequenceExample:
Each FixedLenSequenceFeature df maps to a Tensor of the specified type (or tf.float32 if not specified) and shape (serialized.size(), None) + df.shape. All examples in serialized will be padded with default_value along the second dimension.
But it is not clear how to do that, and I have not found any examples on Google.
Is it possible to parse many SequenceExamples using parse_example(), or does some other function exist?
Edit:
Where can I ask the TensorFlow developers whether they plan to implement a parsing function for multiple SequenceExamples?
Any help will be appreciated.
If you have many small sequences where batching at this stage is important, I would recommend VarLenFeatures or FixedLenSequenceFeatures with regular Example protos (which, as you note, can be parsed in batches with parse_example). For examples of this, see the unit tests associated with example parsing (testSerializedContainingSparse parses Examples with FixedLenSequenceFeatures).
SequenceExamples are more geared toward cases where there is a significant amount of preprocessing work to be done for each SequenceExample (which can be done in parallel with queues). parse_example does not support SequenceExamples.
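For illustration, a minimal TF 1.x-style sketch of that suggestion: pack the data into regular Example protos and parse a whole batch at once with tf.parse_example (the feature key "tokens" and the values are placeholders, not from the question):
import tensorflow as tf

# Build two regular Example protos, each holding a variable-length float
# feature under the placeholder key "tokens".
def make_example(values):
    feature = tf.train.Feature(float_list=tf.train.FloatList(value=values))
    features = tf.train.Features(feature={"tokens": feature})
    return tf.train.Example(features=features).SerializeToString()

serialized = [make_example([1.0, 2.0, 3.0]), make_example([4.0])]

# VarLenFeature yields a SparseTensor; FixedLenSequenceFeature([], tf.float32,
# allow_missing=True) would instead give a dense Tensor padded along the
# second dimension, as the quoted documentation describes.
parsed = tf.parse_example(serialized, features={"tokens": tf.VarLenFeature(tf.float32)})

with tf.Session() as sess:
    print(sess.run(parsed["tokens"]))  # one SparseTensorValue holding both sequences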

Julia Parsing a CSV

Hope the evening is a more than hospitable one and you have traded your emacs terminal for some bonne-vivant Ralph Lauren catalog dinner party type scene. As for me, I'm trying to parse a CSV in Julia and things are deteriorating. Here is my code:
f2 = open("/Users/MacBookPro15/testnovo.csv", "r")
skip(f2, 736)
for line in eachline(f2)
    string_split = split(line, ",")
    println(string_split[1])
end
Now if I substitute string_split[2] or anything other than [1] I get a BoundsError and it's rather frustrating because I need those items. Can anyone tell me how to avoid this?
Every time I hear "parsing a CSV" I want to duck and cover my ears before I get flashbacks of a missing quote, or a 32-column line 98% of the way through a 33-column, 10GB CSV file.
Fortunately, there are two useful functions that'll prevent you from rolling your own CSV parser:
readcsv in Julia's standard library http://docs.julialang.org/en/release-0.2/stdlib/base/?highlight=readcsv#Base.readcsv
readtable in DataFrames.jl http://juliastats.github.io/DataFrames.jl/io.html
Unfortunately, it looks like you need the DataStream abstraction, which we stopped including in DataFrames since not enough people worked on it to make it robust. The first 100 lines of https://github.com/JuliaStats/DataFrames.jl/blob/master/prototypes/datastream.jl should provide you with enough information to write your own streaming algorithm for working with CSVs.

Erlang: Compute data structure literal (constant) at compile time?

This may be a naive question, and I suspect the answer is "yes," but I had no luck searching here and elsewhere on terms like "erlang compiler optimization constants" etc.
At any rate, can (will) the Erlang compiler create a data structure that is constant or literal at compile time, and use that instead of creating code that creates the data structure over and over again? I will provide a simple toy example.
test() -> sets:from_list([usd, eur, yen, nzd, peso]).
Can (will) the compiler simply stick the set there at the output of the function instead of computing it every time?
The reason I ask is, I want to have a lookup table in a program I'm developing. The table is just constants that can be calculated (at least theoretically) at compile time. I'd like to compute the table once and not have to compute it every time. I know I could do this in other ways, such as computing the thing and storing it in the process dictionary (or perhaps an ETS or Mnesia table). But I always start simple, and to me the simplest solution is to do it like the toy example above, if the compiler optimizes it.
If that doesn't work, is there some other way to achieve what I want? (I guess I could look into parse transforms if they would work for this, but that's getting more complicated than I would like.)
THIS JUST IN. I used compile:file/2 with the 'S' option to produce the following. I'm no Erlang assembly expert, but it looks like the optimization isn't performed:
{function, test, 0, 5}.
{label,4}.
{func_info,{atom,exchange},{atom,test},0}.
{label,5}.
{move,{literal,[usd,eur,yen,nzd,peso]},{x,0}}.
{call_ext_only,1,{extfunc,sets,from_list,1}}.
No, the Erlang compiler doesn't perform partial evaluation of calls to external modules, which sets is. You can use the ct_expand module of the well-known parse_trans library to achieve this effect.
Given that sets are not a native datatype in Erlang and are (as a matter of fact) just a library written in Erlang, I don't think it is feasible for the compiler to create sets at compile time.
As you can see, sets are not optimized in Erlang (nor is any other library written in Erlang).
The way to solve your problem is to compute the set once and pass it as a parameter to your functions, or to use ETS/Mnesia.

What is inside Erlang's digraph?

Disclaimer: The author is a newbie in Erlang.
I would like to implement some kind of shortest-path algorithm in Erlang.
There is a standard implementation of graph data structure in Erlang: http://www.erlang.org/doc/man/digraph.html
However, I have not found any information on the actual data structure it uses.
Mostly I would like to know:
what is the worst-case performance of getting all neighbours of a vertex?
what is the worst-case performance of fetching a vertex from the graph?
A digraph uses three ETS tables (vertices, edges, and neighbouring vertices).
So both of those operations are O(1).
Take a look at the OTP code; it's clean and in most cases idiomatic Erlang. stdlib's gen.erl + gen_server.erl, proc_lib.erl, and sys.erl are a must-read :)
