I am considering using either of the following stacks for a personal project:
Node.js/MongoDB (learning)
Rails/Postgres (more familiar)
I would like to give MongoDB a try for learning purposes, but I am unsure whether it is suitable for this problem. I would like to hear about the trade-offs, with examples, based on the following problem description; some specific questions are at the bottom:
There is a list of Products, let's say p1, p2, p3, and each product has fields for some environmental impact values, let's say A, B, C.
            p1                 p2
            +                  +
            |                  |
            |                  |
   +--------+--------+    +----+----+
   |        |        |    |         |
   +        +        +    +         +
   p3       p4       p5   p3        p6
   +        +        +
   |        |        |
+--+--+  +--+--+  +--+--+
|     |  |     |  |     |
p7    p8 p2    p9 p10   p11
p1.A = p3.A + p4.A + p5.A
p1.B = p3.B + p4.B + p5.B
p3.A = p7.A + p8.A
The Product table would look something like this:
id  A   B   C   parents  children
1   4   5   6   []       [3, 4, 5]
2   10  11  12  [4]      [3, 6]
3   6   7   8   [1, 2]   [7, 8]
4   3   9   6   [1]      [2, 9]
5   3   3   10  [1]      [10, 11]
6   3   1   2   [2]      []
7   4   5   0   [3]      []
...
The update process would look like this:
p1 is made of p2 and p3.
p2 is also made of p3.
If p3's A, B, or C updates, it triggers a p1 update to recalculate its A, B, C, though possibly still with p2's old values. Then, when p3's update reaches p2, p2's update triggers the p1 update again. There could be some redundant operations in the updates depending on the ordering; I am guessing that is OK.
Since the environmental impact is not critical data, I am just looking for the data to become eventually consistent.
In terms of scale, maybe tens of thousands of products at some point.
Questions:
1) I need a way to prevent an infinite update cycle in a circular graph.
2) Can this kind of two-way association be handled easily in MongoDB, where a product has parents that are products and children that are products?
3) In what different ways could I structure my data, instead of parent and child arrays, and design this update process efficiently? If I design it so that one product's update triggers another update, which triggers another, and the chain goes on, that could make for a very long web request cycle.
Thanks.
Your model is best described as a directed graph: G = (V, E), where the vertices V are your products and the edges E ⊆ V × V are the parent-child links.
Therefore neither PostgreSQL nor MongoDB is really a natural fit for your use case.
The biggest advantage of MongoDB compared to a traditional RDBMS like PostgreSQL is its dynamic schema: you can add new records with varying structure without redefining the database schema. But the model you have described looks pretty static to me, so that argument doesn't count for your problem.
As far as I am concerned, the best technology decision in your case would be a graph database like Neo4j.
For alternative inspiration you could take a look at graph data structures in general; one efficient way to model a graph is an adjacency matrix, or, for a sparse graph like yours, adjacency lists (which your parents/children arrays essentially are).
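To make your question 1 concrete: a visited set is the standard way to stop a propagation pass from looping forever in a cyclic graph. Here is a minimal sketch in Python, with a plain dict standing in for whichever store you pick; the field and variable names are made up for illustration:

from collections import deque

# Hypothetical in-memory model: one impact field "A" per product plus
# the ids of the products it is made of (its children).
products = {
    1: {"A": 4, "children": [3, 4, 5]},
    2: {"A": 10, "children": [3, 6]},
    3: {"A": 6, "children": [7, 8]},
    4: {"A": 3, "children": [2, 9]},
    5: {"A": 3, "children": [10, 11]},
    6: {"A": 3, "children": []},
    7: {"A": 4, "children": []},
    8: {"A": 2, "children": []},
    9: {"A": 0, "children": []},
    10: {"A": 1, "children": []},
    11: {"A": 2, "children": []},
}

# Reverse adjacency list: child id -> ids of its parents.
parents = {}
for pid, prod in products.items():
    for child in prod["children"]:
        parents.setdefault(child, []).append(pid)

def propagate_update(start_id):
    """Recompute A for every ancestor of start_id, breadth-first.

    The visited set is what prevents an infinite update cycle: even in
    a circular graph, each product is recomputed at most once per pass.
    """
    visited = {start_id}
    queue = deque(parents.get(start_id, []))
    while queue:
        pid = queue.popleft()
        if pid in visited:
            continue  # already recomputed on this pass; breaks cycles
        visited.add(pid)
        kids = products[pid]["children"]
        if kids:
            products[pid]["A"] = sum(products[c]["A"] for c in kids)
        queue.extend(parents.get(pid, []))

products[7]["A"] = 10  # a leaf changes...
propagate_update(7)    # ...and its ancestors (3, 1, 2, 4) follow

Note that within a single pass an ancestor can still read a sibling's stale value (your "maybe still with old p2's value" case); since you only need eventual consistency, re-running the pass on the next write, or recomputing in reverse topological order, both close that gap.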
Related
I could be doing this completely wrong, or I could be on the right path; I have no idea! I'm trying to grade a decision based on 3 criteria. The grades are AAA-A and BBB-B, etc., but for now I just need AAA-A and can figure out the rest.
Essentially, we want column J to populate based on what columns G-I say. In my head it's super easy, but I want to automate this step.
So I start with column I and look at the pairing. AAA-A results are any of these: "G/G", "LG/G", "G/LG", or "R/R". If it is one of those 4 pairings, then we start at grade AA.
Then I check column G (it doesn't matter now whether I check H or G first), and if G >= .5 we grade it up to AAA; if it's less than .5, do nothing and keep it at AA.
Then I look at column H (or G, if we started with H), and if it is a "Y" we grade down, from AA to A or from AAA to AA. But if it is "N", do nothing.
What I have so far is attached. It technically works for 3 of these 4 cells, but that could be a coincidence. The results in column J should be: row 3 - AA, row 4 - AA, row 5 - AAA, row 6 - AA.
And for one additional test, imagine: col. G = .64, col. H = Y, col. I = G/G; then we want AA as the result.
Definitely the hardest test I've had in Excel/Sheets. I appreciate the help! Thanks in advance!
Formula I tried:
=Ifs((or(I3="G/G",I3="LG/G",I3="G/LG",I3="R/R"),"AA", and(or(I3="G/G",I3="LG/G",I3="G/LG",I3="R/R"),G3>0.5),"AAA",H3="Y","A")
Data Sample:

     G      H   I      J
3   -0.07   N   R/R    AA
4   -0.46   N   R/R    AA
5    0.64   N   G/G    AA
6    0.76   Y   LG/G   AA
As presented, your formula simply returns an error, and seems like a misinterpretation of how Ifs works. However, it suggests you're trying to nest If statements, and from your description I think that makes sense.
Assuming that's a valid interpretation, the following does what you want
(at least as far as AAA-A is concerned; note that the H3 check has to apply inside both branches of the G3 test, so that a "Y" downgrades AAA to AA as well as AA to A).
=If(or(I3="G/G",I3="LG/G",I3="G/LG",I3="R/R"),if(G3>=0.5,if(H3="Y","AA","AAA"),if(H3="Y","A","AA")),"Not an A")
The BBB-B logic would be the same (just nested in where "Not an A" is).
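For instance, the skeleton could look like the following; the two BBB pairing strings are placeholders, since those criteria weren't specified in the question:

=If(or(I3="G/G",I3="LG/G",I3="G/LG",I3="R/R"),
    if(G3>=0.5,if(H3="Y","AA","AAA"),if(H3="Y","A","AA")),
    if(or(I3="BBB pairing 1",I3="BBB pairing 2"),
        if(G3>=0.5,if(H3="Y","BB","BBB"),if(H3="Y","B","BB")),
        "No grade"))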
I posed the question generically, because maybe there is a generic answer. But a specific example is comparing 2 BigQuery tables with the same schema, but potentially different data. I want a diff, i.e. what was added, deleted, or modified, with respect to a composite key, e.g. the first 2 columns.
Table A
C1  C2  C3
----------
a   a   1
a   b   1
a   c   1
Table B
C1  C2  C3   # Notes if comparing B to A
----------------------------------------
a   a   1    # No change to the key a + a
a   b   2    # Key a + b changed from 1 to 2
             # Deleted key a + c with value 1
a   d   1    # Added key a + d
I basically want to be able to produce/report those comparison notes.
Or, from a Beam perspective, I may want to just output up to 4 labeled PCollections: Unchanged, Changed, Added, Deleted. How do I do this, and what would the PCollections look like?
What you want to do here, basically, is join the two tables and compare the result of that, right? You can look at my answer to this question to see the two ways in which you can join two tables (side inputs, or CoGroupByKey).
I'll also code a solution for your problem using CoGroupByKey. I'm writing the code in Python because I'm more familiar with the Python SDK, but you'd implement similar logic in Java:
import apache_beam as beam
from apache_beam import pvalue
from apache_beam.io import WriteToText

def make_kv_pair(x):
    """Key each record by its composite key: columns x[0] and x[1]."""
    return ((x[0], x[1]), x)

table_a = (p | 'ReadTableA' >> beam.Read(beam.io.BigQuerySource(....))
             | 'SetKeysA' >> beam.Map(make_kv_pair))
table_b = (p | 'ReadTableB' >> beam.Read(beam.io.BigQuerySource(....))
             | 'SetKeysB' >> beam.Map(make_kv_pair))

joined_tables = ({'table_a': table_a, 'table_b': table_b}
                 | beam.CoGroupByKey())

output_types = ['changed', 'added', 'deleted', 'unchanged']

class FilterDoFn(beam.DoFn):
    def process(self, element):
        key, values = element
        table_a_value = list(values['table_a'])
        table_b_value = list(values['table_b'])
        if table_a_value == table_b_value:
            yield pvalue.TaggedOutput('unchanged', key)
        elif len(table_a_value) < len(table_b_value):
            # Key occurs only in table B: it was added.
            yield pvalue.TaggedOutput('added', key)
        elif len(table_a_value) > len(table_b_value):
            # Key occurs only in table A: it was deleted.
            yield pvalue.TaggedOutput('deleted', key)
        else:
            # Present in both, with different values: it changed.
            yield pvalue.TaggedOutput('changed', key)

key_collections = (joined_tables
                   | beam.ParDo(FilterDoFn()).with_outputs(*output_types))

# Now you can handle each output separately
key_collections.unchanged | WriteToText(...)
key_collections.changed | WriteToText(...)
key_collections.added | WriteToText(...)
key_collections.deleted | WriteToText(...)
I have a list of variables for which I want to create a list of numbered variables. The intent is to use these with the reshape command to create a stacked data set. How do I keep them in order? For instance, with this code
local ct = 1
foreach x in q61 q77 q99 q121 q143 q165 q187 q209 q231 q253 q275 q297 q306 q315 q324 q333 q342 q351 q360 q369 q378 q387 q396 q405 q414 q423 {
    gen runs`ct' = `x'
    local ct = `ct' + 1
}
when I use the reshape command it generates an order as
runs1 runs10 runs11 ... runs2 runs22 ...
rather than the desired
runs01 runs02 runs03 ... runs26
Preserving the order is necessary in this analysis. I'm trying to add a leading zero to all ct values less than 10 when assigning variable names.
Generating a series of identifiers with leading zeros is a documented and solved problem: see e.g. here.
local j = 1
foreach v in q61 q77 q99 q121 q143 q165 q187 q209 q231 q253 q275 q297 q306 q315 q324 q333 q342 q351 q360 q369 q378 q387 q396 q405 q414 q423 {
    local J : di %02.0f `j'
    rename `v' runs`J'
    local ++j
}
Note that I used rename rather than generate. If you are going to reshape the variables afterwards, the labour of copying the contents is unnecessary. Indeed, the default float type for numeric variables used by generate could in some circumstances result in a loss of precision.
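As a quick illustration of that precision point (a toy example, not from the original data):

clear
set obs 1
gen double exact = 1e8 + 1   // a double holds this integer exactly
gen copied = exact           // default float type: only ~7 digits
format exact copied %12.0f
list                         // copied shows 100000000; the +1 is gone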
I note that there may also be a solution with rename groups.
All that said, it's hard to follow your complaint about what reshape does (or does not do). If you have a series of variables like runs*, the most obvious reshape is a reshape long, and, for example,
clear
set obs 1
gen id = _n
foreach v in q61 q77 q99 q121 q143 {
    gen `v' = 42
}
reshape long q, i(id) j(which)
list
     +-----------------+
     | id   which    q |
     |-----------------|
  1. |  1      61   42 |
  2. |  1      77   42 |
  3. |  1      99   42 |
  4. |  1     121   42 |
  5. |  1     143   42 |
     +-----------------+
works fine for me; the column order information is preserved and no use of rename was needed at all. If I want to map the suffixes to 1 up, I can just use egen, group().
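For example, continuing the toy example above, a one-liner such as

egen run = group(which)

would map the suffixes 61, 77, 99, 121, 143 to 1 through 5, in order.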
So, that's hard to discuss without a reproducible example. See https://stackoverflow.com/help/mcve for how to post good code examples.
I'm parsing a book within Neo4j and I'd like to extract a genealogy out of it. I have sentences like:
"A begat B,C and D"
"X begat Y, and Y begat Z, ..."
and I store that as
(A:word)-[:subj]->(begat:word)-[:obj]->(B:word)
(A:word)-[:subj]->(begat:word)-[:comp]->(C:word)
(X:word)-[:subj]->(begat:word)-[:obj]->(Y:word)
(Y:word)-[:subj]->(begat:word)-[:obj]->(Z:word)
(X:word)-[:NNP]->(sentence:word)
(Y:word)-[:NNP]->(sentence:word)
(Z:word)-[:NNP]->(sentence:word)
(begat:word)-[:VBG]->(sentence:word)
How could I write my Cypher request so that the Neo4j server visualization gives me a tree, instead of one "begat" node with all the other ones linking to it? My genealogy spans several sentences, and when linking words together I add the sentenceId to the relationship, so maybe we could use that.
The result would look like:

      A
 _____|_____
 |    |    |
 B    C    D
 |
 X
 |
 Y
 |
 Z
One more piece of info: the words are stored only once, to avoid memory consumption.
Here is a sample of my data:
http://console.neo4j.org/r/xzsazf
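To clarify, the kind of projection I am after might look something like this (just a sketch: the value property is hypothetical, and sentenceId is the relationship property mentioned above):

MATCH (parent:word)-[s:subj]->(v:word {value: 'begat'}),
      (v)-[o:obj|comp]->(child:word)
WHERE s.sentenceId = o.sentenceId
RETURN parent, child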
Many thanks
I'm at a loss.
Scenario: get a depth of 2 from Joe (#'s represent 'Person'; letters represent 'Position'):
    0               E
    |               |
    1               B
  / | \           / | \
 2 JOE 3         C  A  D
   /|\             /|\
  0 0 0           F G H
 /\ | |          /\ | |
0 0 0 0         I J K L
The catch is, a person is tied to a position. Positions have relationships to each other, but a person doesn't have a relationship to another person. So it goes something like:
(Joe)<-[:occupied_by]-(PositionA)-[:authority_held_by]->(PositionB)-[:occupied_by]->(Sam)
This query:
Match (:Identity {value:"1234"})-[:IDENTIFIES]->(posStart:Position)-[:IS_OCCUPIED_BY]->(perStart:Person)
Optional Match p=(perStart)<-[:IS_OCCUPIED_BY]-(posStart)-[r:AUTHORITY_HELD_BY*..1]-(posEnd:Position)-[:IS_OCCUPIED_BY]->(perEnd:Person)
Return p
does get me what I need, but it always returns the first column as the original node it started with (perStart). I want to return it in a way where the first column always represents the start node and the second column represents the end node:
PositionA, PositionB (and we can infer this means A-[:authority_held_by]->B)
If we had bidirectional relationships, such as A-[:authority_held_by]->B and B-[:manages]->A,
I wouldn't mind what's in the first or the second column, as we could have the third column represent the relationship:
PositionB, PositionA, [:manages]
but we are trying to stay away from bidirectional relationships.
Ultimately I want something like:
PositionA, PositionB (inferring, A-[:A_H_B]->B)
PositionB, PositionE (inferring, B-[:A_H_B]->E)
PositionF, PositionA (inferring, F-[:A_H_B]->A)
PositionG, PositionA (inferring, G-[:A_H_B]->A)
Is this possible with Cypher, or do I have to do some black magic? :)
I hope I explained it thoroughly and understandably. Thank you so much in advance!
Would replacing Return p with
RETURN nodes(p)[0] AS START, LAST(nodes(p)) as last
work?
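If the columns must instead follow the direction of each relationship (so that F always precedes A when F-[:AUTHORITY_HELD_BY]->A), a sketch along these lines might be closer; it reuses the question's labels, and startNode()/endNode() report the relationship's own direction regardless of which end the traversal came from:

Match (:Identity {value:"1234"})-[:IDENTIFIES]->(posStart:Position)
Match (posStart)-[r:AUTHORITY_HELD_BY]-(posEnd:Position)
Return startNode(r) AS subordinate, endNode(r) AS superior

For deeper paths you could UNWIND relationships(p) and apply the same startNode()/endNode() trick to each hop.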