I have written this function to insert data as a batch, but while adding the labels I get BindError: Local entity is not bound to a remote entity.
def bulkInsertNodes(n=1000):
    graph = Graph()
    btch = WriteBatch(graph)
    nodesList = []
    for i in range(1, n + 1):
        temp = Node(id=str(i))
        nodesList.append(temp)
        btch.create(temp)
    btch.run()

    btch = WriteBatch(graph)
    for node in nodesList:
        btch.add_labels(node, "Person")
    btch.run()
I am new to using LSI with Python and the Gensim + Scikit-learn tools. I was able to perform topic modeling on a corpus using LSI from both Scikit-learn and Gensim; however, when using the Gensim approach I was not able to display the document-to-topic mapping.
Here is my work using Scikit-learn's LSI (TruncatedSVD), where I successfully displayed the document-to-topic mapping:
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfTransformer

tfidf_transformer = TfidfTransformer()
transformed_vector = tfidf_transformer.fit_transform(transformed_vector)

NUM_TOPICS = 14
lsi_model = TruncatedSVD(n_components=NUM_TOPICS)
lsi = lsi_model.fit_transform(transformed_vector)

topic_to_doc_mapping = {}
topic_list = []
topic_names = []

for i in range(len(dbpedia_df.index)):
    most_likely_topic = lsi[i].argmax()

    if most_likely_topic not in topic_to_doc_mapping:
        topic_to_doc_mapping[most_likely_topic] = []
    topic_to_doc_mapping[most_likely_topic].append(i)

    topic_list.append(most_likely_topic)
    topic_names.append(topic_id_topic_mapping[most_likely_topic])

dbpedia_df['Most_Likely_Topic'] = topic_list
dbpedia_df['Most_Likely_Topic_Names'] = topic_names

print(topic_to_doc_mapping[0][:100])

topic_of_interest = 1
doc_ids = topic_to_doc_mapping[topic_of_interest][:4]
for doc_index in doc_ids:
    print(X.iloc[doc_index])
Using Gensim, I was not able to get as far as displaying the document-to-topic mapping:
from gensim.corpora import Dictionary
from gensim.models import LsiModel
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer
from nltk.tokenize import word_tokenize

processed_list = []
stop_words = set(stopwords.words('english'))
lemmatizer = WordNetLemmatizer()

for doc in documents_list:
    tokens = word_tokenize(doc.lower())
    stopped_tokens = [token for token in tokens if token not in stop_words]
    lemmatized_tokens = [lemmatizer.lemmatize(i, pos="n") for i in stopped_tokens]
    processed_list.append(lemmatized_tokens)

term_dictionary = Dictionary(processed_list)
document_term_matrix = [term_dictionary.doc2bow(document) for document in processed_list]

NUM_TOPICS = 14
model = LsiModel(corpus=document_term_matrix, num_topics=NUM_TOPICS, id2word=term_dictionary)
lsi_topics = model.show_topics(num_topics=NUM_TOPICS, formatted=False)
lsi_topics
How can I display the document-to-topic mapping here?
To get the representation of a document (given as a bag-of-words) from a trained LsiModel as a vector of topics, use Python dict-style bracket access (model[bow]).
For example, to get the topics for the first item in your training data, you can use:
first_doc = document_term_matrix[0]
first_doc_lsi_topics = model[first_doc]
You can also supply a list of documents, as in training, to get the LSI topics for an entire batch at once, e.g.:
all_doc_lsi_topics = model[document_term_matrix]
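From there, a minimal sketch of rebuilding the same document-to-topic mapping as in your Scikit-learn version, assuming the model and document_term_matrix from your question (note that LSI weights can be negative, so you may prefer abs() when picking the dominant topic):

topic_to_doc_mapping = {}

for doc_index, bow in enumerate(document_term_matrix):
    doc_topics = model[bow]  # list of (topic_id, weight) pairs
    if not doc_topics:       # skip empty documents
        continue
    # take the topic with the largest weight, mirroring argmax() above
    most_likely_topic = max(doc_topics, key=lambda t: t[1])[0]
    topic_to_doc_mapping.setdefault(most_likely_topic, []).append(doc_index)

print(topic_to_doc_mapping.get(0, [])[:100])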
I am trying to edit my config list so that robbed is set to true when an entity matches the entity stored in the list (the entities are generated when my script starts).
Config file:
Config.location = {
    [1] = {
        x = 24.39,
        y = -1345.776,
        z = 29.49,
        h = 267.58,
        robbed = false,
        entity = nil
    },
    [2] = {
        x = -47.7546,
        y = -1759.276,
        z = 29.421,
        h = 48.035,
        robbed = false,
        entity = nil
    },
}
So this list gets loaded. When location [1] has been robbed, its robbed field should be changed to true if the entity matches.
I would imagine I should use a for loop, but I'm still clueless.
As Config.location is a sequence with positive integer keys starting from 1, you can conveniently use the ipairs iterator in combination with a generic for loop to check every entry in your list.
for i, v in ipairs(Config.location) do
    v.robbed = (v.entity == someOtherEntity)
end
Of course, your entity entries shouldn't be nil, as the comparison wouldn't make sense otherwise.
I am using a Groovy script to pull data from an Oracle DB for a Jenkins job combo box. It is taking a lot of time to pull the data.
How can I improve the performance?
import groovy.sql.Sql

Properties properties = new Properties()
File propertiesFile = new File('/opt/groovy/db.properties')
propertiesFile.withInputStream {
    properties.load(it)
}

def Param = []
def arg = []
args.each { arg.push(it) }

def dbUrl = 'jdbc:oracle:thin:@' + properties.dbServer + ':52000/' + properties.dbSchema

sql = Sql.newInstance(dbUrl, properties.dbUser, properties.dbPassword, properties.dbDriver)

switch (arg[0]) {
    case { it == 'APP' }:
        Param.push('Select')
        query = "SELECT DISTINCT APP FROM INV ORDER BY APP"
        sql.eachRow(query) { row ->
            Param.push(row[0])
        }
        def App_array_final = Param.collect { '"' + it + '"' }
        print App_array_final
        break
I am trying to get all the data from a SQL table every minute using Flume.
Can someone please suggest what config changes need to be made?
Config:
agent.channels = ch1
agent.sinks = kafkaSink
agent.sources = sql-source
agent.channels.ch1.type = memory
agent.channels.ch1.capacity = 1000000
agent.sources.sql-source.channels = ch1
agent.sources.sql-source.type = org.keedio.flume.source.SQLSource
# URL to connect to database
agent.sources.sql-source.connection.url = jdbc:sybase:Tds:abcServer:4500
# Database connection properties
agent.sources.sql-source.user = user
agent.sources.sql-source.password = XXXXXXX
agent.sources.sql-source.table = person
agent.sources.sql-source.columns.to.select = *
# Increment column properties
agent.sources.sql-source.incremental.column.name = person_id
# The incremental value is where you want to start taking data from the table (0 will import the entire table)
agent.sources.sql-source.incremental.value = 0
# Query delay: the query will be sent every configured number of milliseconds
agent.sources.sql-source.run.query.delay=1000
# The status file is used to save the last row read
agent.sources.sql-source.status.file.path = /dump/apache-flume-1.6.0-bin
agent.sources.sql-source.status.file.name = sql-source.status
Change the value of agent.sources.sql-source.run.query.delay to 60000 (milliseconds), so that the query is run once every minute.
When updating Neo4j and py2neo to the latest versions (2.2.3 and 2.0.7 respectively), I'm facing some problems with some import scripts.
For instance, here is a bit of the code:
import csv
import logging
import pprint as pp

import py2neo
import py2neo.batch

graph = py2neo.Graph()
graph.bind("http://localhost:7474/db/data/")
batch = py2neo.batch.PushBatch(graph)
pp.pprint(batch)

relationshipmap = {}

def create_go_term(line):
    if line[6] == '1':
        relationshipmap[line[0]] = line[1]
    goid = line[0]
    goacc = line[3]
    gotype = line[2]
    goname = line[1]
    term = py2neo.Node.cast({
        "id": goid, "acc": goacc, "term_type": gotype, "name": goname
    })
    term.labels.add("GO_TERM")
    pp.pprint(term)
    term.push()
    # batch.append(term)
    return True

logging.info('creating terms')
# opts comes from the script's argument parsing (not shown)
reader = csv.reader(open(opts.termfile), delimiter="\t")
iter = 0
for row in reader:
    create_go_term(row)
    iter = iter + 1
    if iter > 5000:
        # batch.push()
        iter = 0
# batch.push()
When using the batch, or simply push() without the batch, I'm getting this error:
py2neo.error.BindError: Local entity is not bound to a remote entity
What am I doing wrong?
Thanks!
I think you first have to create the node before you can add the label and use push:
term = py2neo.Node.cast({
    "id": goid, "acc": goacc, "term_type": gotype, "name": goname
})
graph.create(term)  # now the node should be bound to a remote entity
term.labels.add("GO_TERM")
term.push()
Alternatively, you can create the node with a label:
term = Node("GO_TERM", id=goid, acc=goacc, ...)
graph.create(term)
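Applied to the loop in the question, a minimal sketch of that second variant might look like this (names taken from the question's code; since the label is set in the constructor, no separate labels.add() or push() is needed):

def create_go_term(line):
    if line[6] == '1':
        relationshipmap[line[0]] = line[1]
    # build the node with its label and properties in one step...
    term = py2neo.Node("GO_TERM",
                       id=line[0], acc=line[3],
                       term_type=line[2], name=line[1])
    # ...and create it, which binds the local node to a remote entity
    graph.create(term)
    return True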