py2neo 2.x, "Local entity is not bound to a remote entity" - neo4j

After updating Neo4j and py2neo to the latest versions (2.2.3 and 2.0.7 respectively), I'm running into problems with some of my import scripts.
For instance, here is a bit of the code:
graph = py2neo.Graph()
graph.bind("http://localhost:7474/db/data/")
batch = py2neo.batch.PushBatch(graph)
pp.pprint(batch)
relationshipmap = {}

def create_go_term(line):
    if line[6] == '1':
        relationshipmap[line[0]] = line[1]
    goid = line[0]
    goacc = line[3]
    gotype = line[2]
    goname = line[1]
    term = py2neo.Node.cast({
        "id": goid, "acc": goacc, "term_type": gotype, "name": goname
    })
    term.labels.add("GO_TERM")
    pp.pprint(term)
    term.push()
    #batch.append(term)
    return True

logging.info('creating terms')
reader = csv.reader(open(opts.termfile), delimiter="\t")
iter = 0
for row in reader:
    create_go_term(row)
    iter = iter + 1
    if iter > 5000:
        # batch.push()
        iter = 0
# batch.push()
Whether I use the batch or simply push() without it, I get this error:
py2neo.error.BindError: Local entity is not bound to a remote entity
What am I doing wrong?
Thanks!

I think you first have to create the node before you can add the label and use push:
term = py2neo.Node.cast({
    "id": goid, "acc": goacc, "term_type": gotype, "name": goname
})
graph.create(term) # now the node should be bound to a remote entity
term.labels.add("GO_TERM")
term.push()
Alternatively, you can create the node with a label:
term = Node("GO_TERM", id=goid, acc=goacc, ...)
graph.create(term)
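For reference, a minimal sketch (untested) of the question's create_go_term with that fix applied; the relationshipmap bookkeeping and the pprint calls are omitted:
import py2neo

graph = py2neo.Graph("http://localhost:7474/db/data/")

def create_go_term(line):
    term = py2neo.Node.cast({
        "id": line[0], "acc": line[3], "term_type": line[2], "name": line[1]
    })
    graph.create(term)          # bind the local node to a remote entity first
    term.labels.add("GO_TERM")
    term.push()                 # push() now has a bound entity to work with
    return True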

Related

Handling error with regressions inside a parallel foreach loop

Hi, I am having issues with a foreach loop where, in every iteration, I estimate regressions on a subset of the data with a different list of controls for several outcomes. The problem is that for some outcomes in some countries I only have missing values, so the regression function returns an error. I would like to be able to run the loop and get NAs, or a string saying "Error", instead of the coefficient table for those cases. I tried several things, but they don't quite work with the .combine = rbind option, and if I use .combine = c I get a very messy output. Thanks in advance for any help.
reg <- function(y, d, c){
  if (missing(c))
    feols(as.formula(paste0(y, "~ 0 + treatment")), data = d)
  else {
    feols(as.formula(paste0(y, "~ 0 + treatment + ", c)), data = d)
  }
}

# Here we set up the parallelization to run the code on the server
n.cores <- 9 # parallel::detectCores() - 1

# create the cluster
my.cluster <- parallel::makeCluster(
  n.cores,
  type = "PSOCK"
)
# print(my.cluster)

# register it to be used by %dopar%
doParallel::registerDoParallel(cl = my.cluster)

# check if it is registered (optional)
# foreach::getDoParRegistered()
# how many workers are available? (optional)
# foreach::getDoParWorkers()

# Here is the cycle to parallel regress each outcome on the global treatment
# variable for each RCT with strata control
tables <- foreach(
  n = 1:9, .combine = rbind, .packages = c('data.table', 'fixest'),
  .errorhandling = "pass"
) %dopar% {
  dt_target <- dt[country == n]
  c <- controls[n]
  est <- lapply(outcomes, function(x) reg(y = x, d = dt_target, c))
  table <- etable(est, drop = "!treatment", cluster = "uid", fitstat = "n")
  table
}
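A minimal sketch of one such attempt, wrapping the per-country body in tryCatch (assuming dt, controls, outcomes and reg as defined above; returning NULL makes .combine = rbind simply skip a failing country, while a placeholder row of NAs with matching columns would keep it in the output):
tables <- foreach(
  n = 1:9, .combine = rbind, .packages = c('data.table', 'fixest')
) %dopar% {
  tryCatch({
    dt_target <- dt[country == n]
    c <- controls[n]
    est <- lapply(outcomes, function(x) reg(y = x, d = dt_target, c))
    etable(est, drop = "!treatment", cluster = "uid", fitstat = "n")
  }, error = function(e) NULL)  # rbind() ignores NULL, so the loop keeps going
}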

Dynamically building subtables in a table

I'm trying to figure out how to dynamically build a series of sub-tables inside a Lua table. For example:
function BuildsubTable()
  local oTable = {}
  local name = {"P1","P2"}
  for i = 1, 2 do
    oTable[i] = {name = name[i], "x" = i + 2, "y" = i + 1}
  end
  return oTable
end
expected output:
oTable = {
  {name = "P1", "x"=3, "y"=2},
  {name = "P2", "x"=4, "y"=3}
}
Which obviously doesn't work, but you get the idea of what I'm trying to do. This is a rather simple task, but in Lua 5.3 it is proving to be difficult. I cannot find a good example of building a table in this manner. Any solutions or other ideas would be appreciated. In Python I would use a class or a simple dictionary.
Your problem is that you quote the string indices. The generic syntax for declaring a table key inside the table constructor is [<key>] = <value>, for example, [20] = 20 or ["x"] = i + 2.
As a shorthand for ["<key>"] = <value>, that is, for string keys that are valid variable names, you can write <key> = <value>, for example x = i + 2.
In your code you use a mix of both and write { ..., "x" = i + 2, ... }, which is a syntax error. A quick Google search shows me that in Python, which you mention, string keys in dictionaries are quoted, so you probably mixed that up with Lua.
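For illustration, a tiny standalone example of the two forms (not from the original post):
local i = 1
local a = { x = i + 2 }        -- shorthand: the key is a valid identifier
local b = { ["x"] = i + 2 }    -- generic form: any expression in brackets
local c = { [20] = 20 }        -- non-identifier keys need the bracket form
print(a.x, b.x, c[20])         --> 3   3   20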
EDIT: I noticed this a bit late, but you can also use ipairs to iterate the first table and table.insert to insert values:
function BuildsubTable()
  local oTable = {}
  local name = {"P1","P2"}
  for i, name in ipairs(name) do
    table.insert(oTable, {name = name, x = i + 2, y = i + 1})
  end
  return oTable
end
Use
oTable[i] = {name = name[i], x = i + 2, y = i + 1}
DarkWiiPlayers & lhf's answers are the proper way.
But here is how you can fix your current code if you intend to use a string as a key
function BuildsubTable()
  local oTable = {}
  local name = {"P1","P2"}
  for i = 1, 2 do
    oTable[i] = {name = name[i], ["x"] = i + 2, ["y"] = i + 1}
  end
  return oTable
end
Output
{
  [1] = { ['name'] = 'P1', ['x'] = 3, ['y'] = 2 },
  [2] = { ['name'] = 'P2', ['x'] = 4, ['y'] = 3 }
}

Some difficulties of designing with types in F# by simple graph example

There is an oriented (directed) graph. We add a node and an edge to it, and then remove some others (by an algorithm; the details don't matter here).
I tried to do this in F#, but I cannot make the right architectural decisions because of my limited experience.
open System.Collections.Generic

type Node = Node of int

type OGraph(nodes : Set<Node>,
            edges : Dictionary<Node * int, Node>) =
    member this.Nodes = nodes
    member this.Edges = edges

let nodes = set [Node 1; Node 2; Node 3]
let edges = Dictionary<Node * int, Node>()
Array.iter edges.Add [|
    (Node 1, 10), Node 2;
    (Node 2, 20), Node 3;
|]
let myGraph = OGraph(nodes, edges)

myGraph.Nodes.Add (Node 4)
myGraph.Edges.Add ((Node 2, 50), Node 4)
myGraph.Edges.Remove (Node 2, 20)
myGraph.Nodes.Remove (Node 3)
How do I add an empty node? I mean, there may be 3 or 4 of them, or even 100500. If we add a node without a number, how can we then use it to create an edge? myGraph.Edges.Add ((Node 2, 50), ???). In an imperative paradigm this would be simple, thanks to named references and nulls: we could just create Node newNode = new Node() and then use the reference newNode, but in F# this seems to be bad practice.
Should I define separate Node and Edge types or use simple types instead? Or maybe some other, more complicated representation?
Is it better to use the common .NET mutable collections (HashSet, Dictionary, etc.) or the F#-specific collections (Set, Map, etc.)? If the collections are large, is it acceptable, performance-wise, to copy an entire collection every time it has to change?
The graph itself is easy enough to model. You could define it like this:
type Graph = { Node : int option; Children : (int * Graph) list }
If you will, you can embellish it more, using either type aliases or custom types instead of primitive int values, but this is the basic idea.
You can model the three graphs pictured in the OP like the following. The formatting I've used looks quite verbose, but I deliberately formatted the values this way in order to make the structure clearer; you could write the values in a more compact form, if you'd like.
let x1 =
    {
        Node = Some 1;
        Children =
            [
                (
                    10,
                    {
                        Node = Some 2;
                        Children =
                            [
                                (
                                    20,
                                    {
                                        Node = Some 3;
                                        Children = []
                                    }
                                )
                            ]
                    }
                )
            ]
    }
let x2 =
    {
        Node = Some 1;
        Children =
            [
                (
                    10,
                    {
                        Node = Some 2;
                        Children =
                            [
                                (
                                    20,
                                    {
                                        Node = Some 3;
                                        Children = []
                                    }
                                );
                                (
                                    50,
                                    {
                                        Node = None;
                                        Children = []
                                    }
                                )
                            ]
                    }
                )
            ]
    }
let x3 =
    {
        Node = Some 1;
        Children =
            [
                (
                    10,
                    {
                        Node = Some 2;
                        Children =
                            [
                                (
                                    50,
                                    {
                                        Node = Some 3;
                                        Children = []
                                    }
                                )
                            ]
                    }
                )
            ]
    }
Notice the use of int option to capture whether or not a node has a value.
The Graph type is an F# record type, and uses the F# workhorse list for the children. This would be my default choice, and only if performance becomes a problem would I consider other data types. Lists are easy to work with.
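Because the record and its child list are immutable, "adding" an edge simply means building a new value. A minimal illustrative sketch (the addChild helper is mine, not part of the original answer):
// Graph record from the answer above
type Graph = { Node : int option; Children : (int * Graph) list }

// Return a new graph with an extra (weight, child) edge appended at the root;
// the original value is left untouched.
let addChild weight child g =
    { g with Children = g.Children @ [ (weight, child) ] }

// Attaching the "empty" node with edge weight 50, as in the second picture:
let node2 = { Node = Some 2; Children = [] }
let node2' = addChild 50 { Node = None; Children = [] } node2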
Some of these are easy:
Use Option - then an empty node is None.
Maybe - it depends on the problem.
This depends on the specific problem you are solving - the F# collections tend to be immutable and some operations on them are fast, but the .NET collections have other operations which are fast.
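To make that trade-off concrete, a small illustrative sketch (not from the original answer) contrasting the immutable F# Set with the mutable .NET HashSet:
open System.Collections.Generic

// F# Set is immutable: Add returns a new set and leaves the original untouched.
let s1 = set [ 1; 2; 3 ]
let s2 = s1.Add 4          // s1 still has 3 elements, s2 has 4

// HashSet is mutable: Add changes the collection in place.
let h = HashSet<int>([ 1; 2; 3 ])
h.Add 4 |> ignore          // h now contains 4 elements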

Caret - Setting the seeds inside the gafsControl()

I am trying to set the seeds inside the caret's gafsControl(), but I am getting this error:
Error in { : task 1 failed - "supplied seed is not a valid integer"
I understand that seeds for trainControl() is a list whose length equals the number of resamples plus one, with the number of combinations of the model's tuning parameters (in my case 36: SVM with 6 sigma and 6 cost values) in each of the resample entries. However, I couldn't figure out what I should use for gafsControl(). I've tried iters*popSize (100*10), iters (100), and popSize (10), but none of them worked.
Thanks in advance.
Here is my code (with simulated data):
library(caret)
library(doMC)
library(kernlab)
registerDoMC(cores = 32)

set.seed(1234)
train.set <- twoClassSim(300, noiseVars = 100, corrVar = 100, corrValue = 0.75)

mylogGA <- caretGA
mylogGA$fitness_extern <- mnLogLoss

# Index for gafsControl()
set.seed(1045481)
ga_index <- createFolds(train.set$Class, k = 3)

# Seeds for the gafsControl()
set.seed(1056)
ga_seeds <- vector(mode = "list", length = 4)
for(i in 1:3) ga_seeds[[i]] <- sample.int(1500, 1000)
## For the last model:
ga_seeds[[4]] <- sample.int(1000, 1)

# Index for the trainControl()
set.seed(1045481)
tr_index <- createFolds(train.set$Class, k = 5)

# Seeds for the trainControl()
set.seed(1056)
tr_seeds <- vector(mode = "list", length = 6)
for(i in 1:5) tr_seeds[[i]] <- sample.int(1000, 36)
## For the last model:
tr_seeds[[6]] <- sample.int(1000, 1)

gaCtrl <- gafsControl(functions = mylogGA,
                      method = "cv",
                      number = 3,
                      metric = c(internal = "logLoss",
                                 external = "logLoss"),
                      verbose = TRUE,
                      maximize = c(internal = FALSE,
                                   external = FALSE),
                      index = ga_index,
                      seeds = ga_seeds,
                      allowParallel = TRUE)

tCtrl <- trainControl(method = "cv",
                      number = 5,
                      classProbs = TRUE,
                      summaryFunction = mnLogLoss,
                      index = tr_index,
                      seeds = tr_seeds,
                      allowParallel = FALSE)

svmGrid <- expand.grid(sigma = 2^c(-25, -20, -15, -10, -5, 0), C = 2^c(0:5))

t1 <- Sys.time()
set.seed(1234235)
svmFuser.gafs <- gafs(x = train.set[, names(train.set) != "Class"],
                      y = train.set$Class,
                      gafsControl = gaCtrl,
                      trControl = tCtrl,
                      popSize = 10,
                      iters = 100,
                      method = "svmRadial",
                      preProc = c("center", "scale"),
                      tuneGrid = svmGrid,
                      metric = "logLoss",
                      maximize = FALSE)
t2 <- Sys.time()
svmFuser.gafs.time <- difftime(t2, t1)

save(svmFuser.gafs, file = "svmFuser.gafs.rda")
save(svmFuser.gafs.time, file = "svmFuser.gafs.time.rda")
Session Info:
> sessionInfo()
R version 3.2.2 (2015-08-14)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 14.04.3 LTS
locale:
[1] LC_CTYPE=en_CA.UTF-8 LC_NUMERIC=C LC_TIME=en_CA.UTF-8
[4] LC_COLLATE=en_CA.UTF-8 LC_MONETARY=en_CA.UTF-8 LC_MESSAGES=en_CA.UTF-8
[7] LC_PAPER=en_CA.UTF-8 LC_NAME=C LC_ADDRESS=C
[10] LC_TELEPHONE=C LC_MEASUREMENT=en_CA.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] kernlab_0.9-22 doMC_1.3.3 iterators_1.0.7 foreach_1.4.2 caret_6.0-52 ggplot2_1.0.1 lattice_0.20-33
loaded via a namespace (and not attached):
[1] Rcpp_0.12.0 magrittr_1.5 splines_3.2.2 MASS_7.3-43 munsell_0.4.2
[6] colorspace_1.2-6 foreach_1.4.2 minqa_1.2.4 car_2.0-26 stringr_1.0.0
[11] plyr_1.8.3 tools_3.2.2 parallel_3.2.2 pbkrtest_0.4-2 nnet_7.3-10
[16] grid_3.2.2 gtable_0.1.2 nlme_3.1-122 mgcv_1.8-7 quantreg_5.18
[21] MatrixModels_0.4-1 iterators_1.0.7 gtools_3.5.0 lme4_1.1-9 digest_0.6.8
[26] Matrix_1.2-2 nloptr_1.0.4 reshape2_1.4.1 codetools_0.2-11 stringi_0.5-5
[31] compiler_3.2.2 BradleyTerry2_1.0-6 scales_0.3.0 stats4_3.2.2 SparseM_1.7
[36] brglm_0.5-9 proto_0.3-10
>
I am not so familiar with the gafsControl() function that you mention, but I encountered a very similar issue when setting parallel seeds with trainControl(). The documentation describes how to create a list (length = number of resamples + 1) where each item is a list (length = number of parameter combinations to test). I find that doing that does not work (see topepo/caret issue #248 for details). However, if you then turn each item into a vector, e.g.
seeds <- lapply(seeds, as.vector)
then the seeds seem to work (i.e. models and predictions are entirely reproducible). I should clarify that this is using doMC as the backend. It may be different for other parallel backends.
Hope this helps
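A minimal sketch of that fix, assuming 5 resamples and 36 tuning-parameter combinations as in the question's trainControl() setup:
set.seed(1056)
# one vector of tuning seeds per resample, plus one for the final model
tr_seeds <- vector(mode = "list", length = 6)
for (i in 1:5) tr_seeds[[i]] <- sample.int(1000, 36)
tr_seeds[[6]] <- sample.int(1000, 1)

# make sure every element really is a plain integer vector
tr_seeds <- lapply(tr_seeds, as.vector)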
I was able to figure out my mistake by inspecting gafs.default. The seeds argument inside gafsControl() takes a vector of length (n_repeats*nresampling)+1, not a list (as trainControl$seeds does). It is actually stated in the documentation of ?gafsControl that seeds is a vector of integers that can be used to set the seed during each search, and that the number of seeds must be equal to the number of resamples plus one. I figured it out the hard way; this is a reminder to read the documentation carefully :D.
if (!is.null(gafsControl$seeds)) {
    if (length(gafsControl$seeds) < length(gafsControl$index) + 1)
        stop(paste("There must be at least", length(gafsControl$index) + 1,
                   "random number seeds passed to gafsControl"))
} else {
    gafsControl$seeds <- sample.int(1e+05, length(gafsControl$index) + 1)
}
So, the proper way to set my ga_seeds is:
# Index for gafsControl()
set.seed(1045481)
ga_index <- createFolds(train.set$Class, k = 3)

# Seeds for the gafsControl()
set.seed(1056)
ga_seeds <- sample.int(1500, 4)
If you set the seeds that way, can you ensure that the same feature subset is selected on each run? I am asking because of the randomness of the GA.

batch insert in neo4j using py2neo 2.0

I have written this function to insert data as a batch, but while adding labels I am getting BindError: Local entity is not bound to a remote entity.
def bulkInsertNodes(n=1000):
    graph = Graph()
    btch = WriteBatch(graph)
    nodesList = []
    for i in range(1, n+1):
        temp = Node(id=str(i))
        nodesList.append(temp)
        btch.create(temp)
    btch.run()

    btch = WriteBatch(graph)
    for n in nodesList:
        btch.add_labels(n, "Person")
    btch.run()
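No answer is recorded here, but the cause is the same as in the first question above: the Node objects in nodesList are never bound to the server, so add_labels fails. A hedged workaround sketch, using a plain Graph.create call with the label attached at construction time instead of the batch API (names are illustrative):
from py2neo import Graph, Node

def bulk_insert_nodes(n=1000):
    graph = Graph()
    # Attach the "Person" label at construction time so there is no
    # second pass over local, unbound nodes.
    nodes = [Node("Person", id=str(i)) for i in range(1, n + 1)]
    graph.create(*nodes)  # binds every local node to a remote entity
    return nodes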
