List still being treated as a set even after converting - machine-learning

So i have an instance where even after converting my sets to lists, they aren't recognized as lists.
So the idea is to delete extra columns from a data frame comparing with columns in another. I have two data frames say df_test and df_train . I need to remove columns in df_test which are not in train .
extracols = set(df_test.columns) - set(df_train.columns) #Gives cols 2b
deltd
l = [extracols] # or list(extracols)
Xdp.dropna( subset = l, how ='any' , axis = 0)
I get an error : Unhashable type set
Even on printing l it prints like a set with {} curlies.

[{set}] doesn't cast to list, it just creates a list of length 1 with your set inside it.
Are you sure that list({set}) isn't working for you? Maybe you should post more of your code as it is hard to see where this is going wrong for you.

Related

options for saving xarray dataset with to_netcdf

I would like to add units, long_name, and maybe a description to a variable while using the to_netcdf command. Let me know if you know how.
Here is my code that work:
filename = path+'file.nc'
ds = xr.Dataset({'sla': (('time_counter','x', 'y'), SLA)}, coords={'time_counter':time_counter,'nav_lon':(('x','y'),lon),'nav_lat':(('x','y'),lat)})
ds.to_netcdf(filename, 'w')
Supplementary informations if you want to use this:
'sla' is the name I give while saving the variable SLA
SLA has 3 dimensions; I give them the names 'time_counter', 'x', and 'y'
I defined coordinates, one of which ('time_counter') is directly a dimension of SLA, but also it is possible to have a coordinate with multiple dimensions (e.g., 'nav_lon' and 'nav_lat' have 2 dimensions.
Here is the link that explain the function: http://xarray.pydata.org/en/stable/generated/xarray.Dataset.to_netcdf.html
You can set the attributes of each variable before saving the Dataset to NetCDF, for example (after creating your ds):
ds['sla'].attrs = {'units': 'something'}
After the to_netcdf() step I get (part of the ncdump -h):
double sla(time_counter, x, y) ;
...
sla:units = "something" ;

Other ways to call/eval dynamic strings in Lua?

I am working with a third party device which has some implementation of Lua, and communicates in BACnet. The documentation is pretty janky, not providing any sort of help for any more advanced programming ideas. It's simply, "This is how you set variables...". So, I am trying to just figure it out, and hoping you all can help.
I need to set a long list of variables to certain values. I have a userdata 'ME', with a bunch of variables named MVXX (e.g. - MV21, MV98, MV56, etc).
(This is all kind of background for BACnet.) Variables in BACnet all have 17 'priorities', i.e., every BACnet variable is actually a sort of list of 17 values, with priority 16 being the default. So, typically, if I were to say ME.MV12 = 23, that would set MV12's priority-16 to the desired value of 23.
However, I need to set priority 17. I can do this in the provided Lua implementation, by saying ME.MV12_PV[17] = 23. I can set any of the priorities I want by indexing that PV. (Corollaries - what is PV? What is the underscore? How do I get to these objects? Or are they just interpreted from Lua to some function in C on the backend?)
All this being said, I need to make that variable name dynamic, so that i can set whichever value I need to set, based on some other code. I have made several attempts.
This tells me the object(MV12_PV[17]) does not exist:
x = 12
ME["MV" .. x .. "_PV[17]"] = 23
But this works fine, setting priority 16 to 23:
x = 12
ME["MV" .. x] = 23
I was trying to attempt some sort of what I think is called an evaluation, or eval. But, this just prints out function followed by some random 8 digit number:
x = 12
test = assert(loadstring("MV" .. x .. "_PV[17] = 23"))
print(test)
Any help? Apologies if I am unclear - tbh, I am so far behind the 8-ball I am pretty much grabbing at straws.
Underscores can be part of Lua identifiers (variable and function names). They are just part of the variable name (like letters are) and aren't a special Lua operator like [ and ] are.
In the expression ME.MV12_PV[17] we have ME being an object with a bunch of fields, ME.MV12_PV being an array stored in the "MV12_PV" field of that object and ME.MV12_PV[17] is the 17th slot in that array.
If you want to access fields dynamically, the thing to know is that accessing a field with dot notation in Lua is equivalent to using bracket notation and passing in the field name as a string:
-- The following are all equivalent:
x.foo
x["foo"]
local fieldname = "foo"
x[fieldname]
So in your case you might want to try doing something like this:
local n = 12
ME["MV"..n.."_PV"][17] = 23
BACnet "Commmandable" Objects (e.g. Binary Output, Analog Output, and o[tionally Binary Value, Analog Value and a handful of others) actually have 16 priorities (1-16). The "17th" you are referring to may be the "Relinquish Default", a value that is used if all 16 priorities are set to NULL or "Relinquished".
Perhaps your system will allow you to write to a BACnet Property called "Relinquish Default".

Modify values programmatically SPSS

I have a file with more than 250 variables and more than 100 cases. Some of these variables have an error in decimal dot (20445.12 should be 2.044512).
I want to modify programatically these data, I found a possible way in a Visual Basic editor provided by SPSS (I show you a screen shot below), but I have an absolute lack of knowledge.
How can I select a range of cells in this language?
How can I store the cell once modified its data?
--- EDITED NEW DATA ----
Thank you for your fast reply.
The problem now its the number of digits that number has. For example, error data could have the following format:
Case A) 43998 (five digits) ---> 4.3998 as correct value.
Case B) 4399 (four digits) ---> 4.3990 as correct value, but parsed as 0.4399 because 0 has been removed when file was created.
Is there any way, like:
IF (NUM < 10000) THEN NUM = NUM / 1000 ELSE NUM = NUM / 10000
Or something like IF (Number_of_digits(NUM)) THEN ...
Thank you.
there's no need for VB script, go this way:
open a syntax window, paste the following code:
do repeat vr=var1 var2 var3 var4.
compute vr=vr/10000.
end repeat.
save outfile="filepath\My corrected data.sav".
exe.
Replace var1 var2 var3 var4 with the names of the actual variables you need to change. For variables that are contiguous in the file you may use var1 to var4.
Replace vr=vr/10000 with whatever mathematical calculation you would like to use to correct the data.
Replace "filepath\My corrected data.sav" with your path and file name.
WARNING: this syntax will change the data in your file. You should make sure to create a backup of your original in addition to saving the corrected data to a new file.

How can filter any SET by its concat value according to another SET in Redis

I have a filter optimization problem in Redis.
I have a Redis SET which keeps the doc and pos pairs of a type in a corpus.
example:
smembers type_in_docs.1
result: doc.pos pairs
array (size=216627)
0 => string '2805.2339' (length=9)
1 => string '2410.14208' (length=10)
2 => string '3516.1810' (length=9)
...
Another redis set i create live according to user choices
It contains selected docs.
smembers filteredDocs
I want to filter doc.pos pairs "type_in_docs" set according to user Doc id choices.
In fact if i didnt use concat values in set it was easy with SINTER.
So i implement a php filter code as below.
It works but need an optimization.
In big doc.pairs set too much time need. (Nearly After 150000 members!)
$concordance= $this->redis->smembers('types_in_docs.'.$typeID);
$filteredDocs= $this->redis->smembers('filteredDocs');
$filtered = array_filter($concordance, function($pairs) use ($filteredDocs) {
if( in_array(substr($pairs, 0, strpos($pairs, '.')), $filteredDocs) ) return true;
});
I tried sorted set with scores as docId.
Bu couldnt find a intersect or filter option for score values.
I am thinking and searching a Redis based solution with supported keys, sets or Lua script for time optimization.
But nothing find.
How can i filter Redis sets with concat values?
Thanks for helps.
Your code is slow primarily because you're moving a lot of data from Redis to your PHP filter. The general motivation here should be perform as much filtering as possible on the server. To do that you'd need to pay some sort of price in CPU & RAM.
There are many ways to do this, here's one:
Ensure you're using Redis v2.8.9 or above.
To allow efficiently looking for doc only, keep your doc.pos pairs as is but use Sorted Sets with score = 0, your e.g.:
ZADD type_in_docs.1 0 2805.2339 0 2410.14208 0 3516.1810
This will allow you to mimic SISMEMBER for doc in the set with:
ZRANGEBYLEX type_in_docs.1 [<$typeID> (<$typeID + "\xff">
You can now just SMEMBERS on the (usually) smaller filterDocs set and then call ZRANGEBYLEX on each for immediate gains.
If you want to do better - in extreme cases (i.e. large filterDocs, small type_in_docs) you should do the reverse.
If you want to do even better, use Lua to wrap up the filtering logic - something like:
-- #usage: redis-cli --filter_doc_pos.lua <filter set keyname> <type pairs keyname>
-- #returns: list of matching doc.pos pairs
local r = {}
for _, fv in pairs(redis.call("SMEMBERS", KEYS[1])) do
local t = redis.call("ZRANGEBYLEX", KEYS[2], "[" .. fv , "(" .. fv .. "\xff")
for _, tv in pairs(t) do
r[#r+1] = tv
end
end
return r

f# deedle filter data frame based on a list

I wanted to filter a Deedle dataframe based on a list of values how would I go about doing this?
I had an idea to use the following code below:
let d= df1|>filterRowValues(fun row -> row.GetAs<float>("ts") = timex)
However the issue with this is that it is only based on one variable, I then thought of combining this with a for loop and an append function:
for i in 0.. recd.length -1 do
df2.Append(df1|>filterRowValues(fun row -> row.GetAs<float>("ts") = recd.[i]))
This does not work either however and there must be a better way of doing this without using a for loop. In R I could for instance using an %in%.
You can use the F# set type to create a set of the values that you are interested. In the filtering, you can then check whether the set contains the actual value for the row.
For example, say that you have recd of type seq<float>. Then you should be able to write:
let recdSet = set recd
let d = df1 |> Frame.filterRowValues (fun row ->
recdSet.Contains(row.GetAs<float>("ts"))
Some other things that might be useful:
You can replace row.GetAs<float>("ts") with just row?ts (which always returns float and works only when you have a fixed name, like "ts", but it makes the code nicer)
Comparing float values might not be the best thing to do (because of floating point imprecisions, this might not always work as expected).

Resources