Data Frame from RNeo4j cypher() output - neo4j

I've got a problem with this code:
library(RNeo4j)
library(dplyr)
library(stringr)
library(MASS)
graph = startGraph("http://localhost:7474/db/data/", username = "neo4j", password = ",yT:/9L)8aoi8t")
query = "MATCH (m:MARKETS)
MATCH (n:CBP_NAICS {msa_naics: m.jll_msa} )
WHERE n.naics CONTAINS '----'
match (c1:Category)
WHERE toString(c1.id) = left(n.naics,2)
match (q:JLL_qtr {qtr: 'Q2-2016',mkt: m.mkt,level: 1} )
match (c:BldgClass {qtr: q.qtr,mkt: m.mkt,class: 'Totals'} )
match (N:Neighborhood {qtr: c.qtr,nbrhd: c.nbrhd,nbrhd: q.nbrhd,BldgClass: c.class} )
return m.mkt, n.msa_naics, ... ,N.AvgOverallAskRent"
naics_jll <- cypher(graph, query)
df_corr <- naics_jll[sapply(naics_jll, is.numeric)]
The Neo4j query itself yields expected results when run in the Neo4j shell.
In RStudio, the data frame appears correct: View(naics_jll) and View(df_corr) both "look right".
However, dplyr::summarize() gives the following for both data frames:
## data frame with 0 columns and 0 rows
On top of that, I get "funny" results when I analyze the data in these data frames.
I searched both Google and SO for "data frame with 0 columns and 0 rows" and for "rneo4j data frame with 0 columns and 0 rows", and found nothing helpful.

Related

save strings in lua table

Does anyone know how to save the keys and the values to a table? My approach does not seem to work, because the length of the table is 0 and it should be 3.
local newstr = "3 = Hello, 67 = Hi, 2 = Bye"
a = {}
for k,v in newstr:gmatch "(%d+)%s*=%s*(%a+)" do
  --print(k,v)
  a[k] = v
end
print(#a)
The output is correct.
Run for k,v in pairs(a) do print(k,v) end to check the contents of your table.
The problem is the length operator, which only returns the number of elements when the table is a sequence; for any other table its result is not an element count.
Please refer to the Lua manual: https://www.lua.org/manual/5.4/manual.html#3.4.7
When t is a sequence, #t returns its only border, which corresponds to
the intuitive notion of the length of the sequence. When t is not a
sequence, #t can return any of its borders. (The exact one depends on
details of the internal representation of the table, which in turn can
depend on how the table was populated and the memory addresses of its
non-numeric keys.)
Only use the length operator if you know t is a sequence, that is, a Lua table with integer keys 1..n without any gap.
You don't have a sequence, as you're using only non-numeric keys (the captures returned by gmatch are strings). That's why #a is 0.
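To see why the result is 0 here, the manual's definition of a border can be modelled outside Lua. The following is only an illustrative Python sketch, not how Lua implements #: a dict stands in for the table, and the hypothetical helper borders lists every non-negative integer b for which (b == 0 or t[b] is present) and t[b + 1] is absent.

```python
def borders(table):
    """Return every border of a dict standing in for a Lua table.

    Per the Lua manual, b is a border when
    (b == 0 or t[b] ~= nil) and t[b + 1] == nil.
    The math.maxinteger corner case is ignored in this sketch.
    """
    int_keys = {
        k for k in table
        if isinstance(k, int) and not isinstance(k, bool) and k >= 1
    }
    candidates = {0} | int_keys
    return sorted(
        b for b in candidates
        if (b == 0 or b in int_keys) and (b + 1) not in int_keys
    )

# Keys captured by gmatch are strings, so there are no integer keys at all.
string_keyed = {"3": "Hello", "67": "Hi", "2": "Bye"}
print(borders(string_keyed))              # [0]

# A proper sequence has exactly one border: its length.
print(borders({1: "a", 2: "b", 3: "c"}))  # [3]

# A table with a gap has several borders; # may return any of them.
print(borders({1: "a", 2: "b", 4: "d"}))  # [2, 4]
```

With only string keys the single border is 0, which matches the 0 printed by the question's code.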
The only safe way to get the number of elements of any table is to count them.
local count = 0
for i,v in pairs(a) do
count = count + 1
end
You can put @Piglet's counting code into the metatable of a as the __len method, which the length operator # then uses to count the table's keys.
local newstr = "3 = Hello, 67 = Hi, 2 = Bye"
local a = setmetatable({},{__len = function(tab)
  local count = 0
  for i, v in pairs(tab) do
    count = count + 1
  end
  return count
end})
for k,v in newstr:gmatch "(%d+)%s*=%s*(%a+)" do
  --print(k,v)
  a[k] = v
end
print(#a) -- puts out: 3
With the __len method, the output of #a is correct even if the table holds only a sequence.
You can check this online in the Lua Sandbox by copying and pasting the code, as I did.

horizontally joining multiple dataframes in pyspark

I am trying to horizontally join multiple dataframes (with the same number of records) in pyspark using monotonically_increasing_id(). However, the result has an inflated number of records.
for i in range(len(lst)+1):
    if i==0:
        df[i] = cust_mod.select('key')
        df[i+1] = df[i].withColumn("idx", monotonically_increasing_id())
    else:
        df_tmp = o[i-1].select(col("value").alias(obj_names[i-1]))
        df_tmp = df_tmp.withColumn("idx", monotonically_increasing_id())
        df[i+1] = df[i].join(df_tmp, "idx", "outer")
Expected number of records in df[i+1]: ~60m. Got: ~88m. It seems monotonically_increasing_id() is not generating the same numbers every time. How can I resolve this problem?
Other details:
cust_mod > dataframe, count- ~60m
o[i] - another set of dataframes, with length equal to cust_mod
lst - a list that has 49 components, so 49 loops in total
I tried using zipWithIndex():
for i in range(len(lst)+1):
    if i==0:
        df[i] = cust_mod.select('key')
        df[i+1] = df[i].rdd.zipWithIndex().toDF()
    else:
        df_tmp = o[i-1].select("value").rdd.zipWithIndex().toDF()
        df_tmp1 = df_tmp.select(col("_1").alias(obj_names[i-1]),col("_2"))
        df[i+1] = df[i].join(df_tmp1, "_2", "inner").drop(df_tmp1._2)
But it's far too slow, roughly 50 times slower.
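One likely reason the IDs do not line up: Spark documents monotonically_increasing_id() as guaranteeing only unique, increasing values, with the partition ID in the upper 31 bits and the record number within the partition in the lower 33 bits. Two DataFrames with the same row count but different partitioning can therefore receive different sets of IDs, and an outer join on them produces extra rows. The pure-Python sketch below only imitates that documented bit layout. It does not call Spark, and the helper name and partition sizes are made up for illustration.

```python
def simulated_ids(partition_sizes):
    """Imitate the documented layout: (partition_id << 33) + row_in_partition."""
    return [
        (pid << 33) + row
        for pid, size in enumerate(partition_sizes)
        for row in range(size)
    ]

# Two datasets with six rows each, but split across partitions differently.
left_ids = simulated_ids([3, 3])      # hypothetical partitioning of the first DataFrame
right_ids = simulated_ids([2, 2, 2])  # hypothetical partitioning of the second DataFrame

matched = set(left_ids) & set(right_ids)
outer_join_rows = len(set(left_ids) | set(right_ids))

print(len(matched))      # 4: only four of the six IDs exist on both sides
print(outer_join_rows)   # 8: an outer join on the ID yields more than six rows
```

If this is what happens in the loop above, it would also explain why zipWithIndex() gives the expected count: it assigns consecutive indices 0..n-1 regardless of how many rows each partition holds.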

Find all occurrences of exact string in range and list it

I want to create a list of all occurrences of the string "x" in a range. This is my sheet:
And I want to search for all occurrences, list them, and give them proper names:
For example, for G2 I want the string "Beret Grey" as the result. I think that I need to use an array formula or something like that.
Let me preface this by saying that VBA would be much more robust, but this formula will get you there. It may be slow, as it is an array-type formula and does a lot of calculations, and the cost grows rapidly as the number of cells containing it increases:
=IFERROR(INDEX(A:A,AGGREGATE(15,6,ROW($B$2:$G$7)/($B$2:$G$7="x"),ROW(1:1))) & " " & INDEX($1:$1,AGGREGATE(15,6,COLUMN(INDEX(A:G,AGGREGATE(15,6,ROW($B$2:$G$7)/($B$2:$G$7="x"),ROW(1:1)),0))/(INDEX(A:G,AGGREGATE(15,6,ROW($B$2:$G$7)/($B$2:$G$7="x"),ROW(1:1)),0)="x"),ROW(1:1)-COUNTIF($B$1:INDEX(G:G,AGGREGATE(15,6,ROW($B$2:$G$7)/($B$2:$G$7="x"),ROW(1:1)) -1),"x"))),"")
You will need to expand the range to what you need. Change all the $B$2:$G$7 to $B$2:$N$29. Do not use full column references outside those that I have used. It will kill Excel.
Also note what is and what is not relative references, they need to remain the same or you will get errors as the formula is dragged/copied down.
A simple UDF to do what you want:
Function findMatch(rng As Range, crit As String, inst As Long) As String
    Dim rngArr() As Variant
    rngArr = rng.Value
    Dim i&, j&, k&
    k = 0
    ' Return an empty string when fewer than inst matches exist
    If inst > Application.WorksheetFunction.CountIf(rng, crit) Then
        findMatch = ""
        Exit Function
    End If
    ' Skip the header row and the label column
    For i = LBound(rngArr, 1) + 1 To UBound(rngArr, 1)
        For j = LBound(rngArr, 2) + 1 To UBound(rngArr, 2)
            If rngArr(i, j) = crit Then
                k = k + 1
                If k = inst Then
                    ' Row label followed by column header
                    findMatch = rngArr(i, 1) & " " & rngArr(1, j)
                    Exit Function
                End If
            End If
        Next j
    Next i
End Function
Then you would call it like this:
=findMatch($A$1:$G$7,"x",ROW(1:1))
And drag/copy down.
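The logic shared by the formula and the UDF is to scan the body of the range row by row and, for each match, join the row label with the column header. It can be sketched in Python as follows. The sheet contents are invented, since the original screenshot is not available, and find_matches is a hypothetical helper name.

```python
def find_matches(grid, crit):
    """List "<row label> <column header>" for every cell equal to crit.

    grid[0] is the header row and grid[r][0] is the label of row r,
    mirroring the $A$1:$G$7 layout passed to the UDF.
    """
    header = grid[0]
    results = []
    for row in grid[1:]:                 # skip the header row
        for col in range(1, len(row)):   # skip the label column
            if row[col] == crit:
                results.append(f"{row[0]} {header[col]}")
    return results

# Invented sample data standing in for the sheet.
sheet = [
    ["",      "Grey", "Black", "Red"],
    ["Beret", "x",    "",      "x"],
    ["Cap",   "",     "x",     ""],
]
print(find_matches(sheet, "x"))
# ['Beret Grey', 'Beret Red', 'Cap Black']
```

The n-th element of this list corresponds to what findMatch returns for inst = n.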

how to "page" through the data in a Lua table used as a dictionary?

How would I code a function to iterate through one page's worth of data? Sample code would be ideal...
Say the size of a page is 5 items. If we had a Lua table with 18 items, it would need to print out:
Page 1: 1 to 5
Page 2: 6 to 10
Page 3: 11 to 15
Page 4: 16 to 18
So assume the data is something like:
local data = {}
data["dog"] = {1,2,3}
data["cat"] = {1,2,3}
data["mouse"] = {1,2,3}
data["pig"] = {1,2,3}
.
.
.
How would one code the function that would do the equivalent of this:
function printPage (myTable, pageSize, pageNum)
-- find items in "myTable"
end
In fact, I'm not even sure whether a Lua table used as a dictionary can do this. There is no specific ordering in such a table, is there? So how would you be sure the order would be the same when you come back to print page 2?
The next function allows you to go through a table in an order (albeit an unpredictable one). For example:
data = { dog = "Ralf", cat = "Tiddles", fish = "Joey", tortoise = "Fred" }
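function printPage(t, size, start)
  local i = 0
  -- start is the last key of the previous page (nil for the first page)
  local nextKey, nextVal = start
  while i < size do
    nextKey, nextVal = next(t, nextKey)
    -- next returns nil once the end of the table is reached
    if nextKey == nil then break end
    print(nextKey .. " = " .. nextVal)
    i = i + 1
  end
  -- last key printed, or nil if the table was exhausted
  return nextKey
end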
local nextPage = printPage(data, 2) -- Print the first page
printPage(data, 2, nextPage) -- Print the second page
I know this isn't quite in the form you were after, but I'm sure it can be adapted quite easily.
The next function returns the key after the one provided in the table, along with its value. When the end of the table is reached, it returns nil. If you provide nil as the second parameter, it returns the first key and value in the table. It's also documented in Corona, although it appears to be identical.
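On the ordering concern raised in the question: one common way to get pages that stay stable between calls is to fix an order first, for example by sorting the keys, and then slice out the requested page. This is a different technique from the next-based cursor above, shown here as a Python sketch. get_page and the sample data are hypothetical.

```python
def get_page(table, page_size, page_num):
    """Return the (key, value) pairs on a 1-based page, ordered by key."""
    keys = sorted(table)                     # fixed order, independent of insertion
    start = (page_num - 1) * page_size
    return [(k, table[k]) for k in keys[start:start + page_size]]

data = {
    "dog": [1, 2, 3],
    "cat": [1, 2, 3],
    "mouse": [1, 2, 3],
    "pig": [1, 2, 3],
    "ant": [1, 2, 3],
}

print(get_page(data, 2, 1))   # first page: the entries for "ant" and "cat"
print(get_page(data, 2, 3))   # last, partially filled page: the entry for "pig"
```

In Lua the same idea would mean collecting the keys into an array, sorting it with table.sort, and indexing into that array by page.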

Detailed distance between words

How would I go about displaying the detailed distance between words?
For example, the output of the program could be:
Words are "car" and "cure":
Replace "a" with "u".
Add "e".
The Levenshtein distance does not fulfill my needs (I think).
Try the following. The algorithm roughly follows the one on Wikipedia (Levenshtein distance). The language used below is Ruby.
As an example, take the case of changing s into t:
s = 'Sunday'
t = 'Saturday'
First, s and t are turned into arrays, and an empty string is inserted at the beginning of each. m will eventually be the matrix used in the algorithm.
s = ['', *s.split('')]
t = ['', *t.split('')]
m = Array.new(s.length){[]}
m here, however, differs from the matrix in the Wikipedia algorithm in that each cell holds not only the Levenshtein distance but also the (non-)operation (starting, doing nothing, deletion, insertion, or substitution) that was used to reach that cell from an adjacent (left, upper, or upper-left) cell. It may also include a string describing the parameters of the operation. That is, the format of each cell is:
[Levenshtein distance, operation(, string)]
Here is the main routine. It fills in the cells of m following the algorithm:
s.each_with_index{|a, i| t.each_with_index{|b, j|
  m[i][j] =
  if i.zero? && j.zero?
    [0, "started"]
  elsif i.zero?
    # first row: only insertions can lead here
    [j, "inserted", "'#{b}' at position #{j-1}"]
  elsif j.zero?
    # first column: only deletions can lead here
    [i, "deleted", "'#{a}' at position #{i-1}"]
  elsif a == b
    [m[i-1][j-1][0], "did nothing"]
  else
    del, ins, subs = m[i-1][j][0], m[i][j-1][0], m[i-1][j-1][0]
    case [del, ins, subs].min
    when del
      [del+1, "deleted", "'#{a}' at position #{i-1}"]
    when ins
      [ins+1, "inserted", "'#{b}' at position #{j-1}"]
    when subs
      [subs+1, "substituted", "'#{a}' at position #{i-1} with '#{b}'"]
    end
  end
}}
Now, we set i, j to the bottom-right corner of m and follow the steps backwards as we unshift the contents of the cell into an array called steps, until we reach the start.
i, j = s.length-1, t.length-1
steps = []
loop do
  cell = m[i][j]
  case cell[1]
  when "started"
    break
  when "did nothing", "substituted"
    # record the current cell, then move diagonally
    steps.unshift(cell)
    i -= 1; j -= 1
  when "deleted"
    steps.unshift(cell)
    i -= 1
  when "inserted"
    steps.unshift(cell)
    j -= 1
  end
end
Then we print the operation and the string of each step unless that is a non-operation.
steps.each do |d, op, str=''|
puts "#{op} #{str}" unless op == "did nothing" or op == "started"
end
With this particular example, it will output:
inserted 'a' at position 1
inserted 't' at position 2
substituted 'n' at position 2 with 'r'
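Applied to the example from the question, the same idea (fill the distance matrix, then walk back from the bottom-right corner) can be sketched in Python. This is an independent sketch, not a port of the Ruby code: the tie-breaking between equally cheap edits is an arbitrary choice here, and edit_script is a hypothetical name.

```python
def edit_script(source, target):
    """Return a list of human-readable edits turning source into target."""
    rows, cols = len(source) + 1, len(target) + 1
    # dist[i][j] = edit distance between source[:i] and target[:j]
    dist = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        dist[i][0] = i
    for j in range(cols):
        dist[0][j] = j
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if source[i - 1] == target[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,         # delete source[i-1]
                dist[i][j - 1] + 1,         # insert target[j-1]
                dist[i - 1][j - 1] + cost,  # keep or replace
            )

    # Walk back from the bottom-right corner, recording the current step
    # before moving to the cell it came from.
    steps = []
    i, j = len(source), len(target)
    while i > 0 or j > 0:
        if (i > 0 and j > 0 and source[i - 1] == target[j - 1]
                and dist[i][j] == dist[i - 1][j - 1]):
            i, j = i - 1, j - 1             # characters match: no edit
        elif i > 0 and j > 0 and dist[i][j] == dist[i - 1][j - 1] + 1:
            steps.append(f'Replace "{source[i - 1]}" with "{target[j - 1]}".')
            i, j = i - 1, j - 1
        elif j > 0 and dist[i][j] == dist[i][j - 1] + 1:
            steps.append(f'Add "{target[j - 1]}".')
            j -= 1
        else:
            steps.append(f'Remove "{source[i - 1]}".')
            i -= 1
    steps.reverse()
    return steps

for line in edit_script("car", "cure"):
    print(line)
# Replace "a" with "u".
# Add "e".
```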
class Solution:
    def solve(self, text, word0, word1):
        word_list = text.split()
        ans = len(word_list)
        L = None
        for R in range(len(word_list)):
            if word_list[R] == word0 or word_list[R] == word1:
                if L is not None and word_list[R] != word_list[L]:
                    ans = min(ans, R - L - 1)
                L = R
        return -1 if ans == len(word_list) else ans

ob = Solution()
text = "cat dog abcd dog cat cat abcd dog wxyz"
word0 = "abcd"
word1 = "wxyz"
print(ob.solve(text, word0, word1))
