Creating a single column from multiple columns (not merging) in SPSS

I have a single data file in SPSS that is organized as such:
Project (Unique) | Student 1 Name | Student 2 Name | Student 3 Name
Project A        | Barry B.       | Roger W.       | Frank L.
Project B        | Rebecca M.     | Harry J.       | Sam E.
Project C        | Kit B.         | MISSING        | MISSING
I want to create a new table that gives each student their own row (and also lists the project they are affiliated with). Please notice the "MISSING" entries are omitted from the table:
Project   | Student
Project A | Barry B.
Project A | Roger W.
Project A | Frank L.
Project B | Rebecca M.
Project B | Harry J.
Project B | Sam E.
Project C | Kit B.
Please help!
I have done quite a bit of googling and watched various YouTube videos on joining, merging, concatenating, and appending. But it seems like I am trying to do something different and I can't pin down what this process is called.

You need the VARSTOCASES command to restructure your data from wide to long format, like this:
varstocases /make Student from student1 student2 student3 /index=studentN(student).
The resulting data will be structured as you described, plus a studentN column holding the name of the original variable (student1, student2 or student3) that each value came from. By default, rows where the student value is missing or blank are dropped.
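For readers who want to see the reshaping logic spelled out, here is a minimal sketch in plain Python rather than SPSS. The `wide_rows` data, the `wide_to_long` helper and the use of `None` for the missing entries are all assumptions made for illustration; they are not part of SPSS.

```python
# Hypothetical in-memory copy of the wide table from the question.
# None stands in for the MISSING entries.
wide_rows = [
    {"Project": "Project A", "student1": "Barry B.", "student2": "Roger W.", "student3": "Frank L."},
    {"Project": "Project B", "student1": "Rebecca M.", "student2": "Harry J.", "student3": "Sam E."},
    {"Project": "Project C", "student1": "Kit B.", "student2": None, "student3": None},
]


def wide_to_long(rows, id_column, value_columns):
    """Emit one output row per non-missing value in value_columns."""
    long_rows = []
    for row in rows:
        for column in value_columns:
            value = row[column]
            if value is None:
                # Skip missing students, as the desired output does.
                continue
            long_rows.append(
                {id_column: row[id_column], "studentN": column, "Student": value}
            )
    return long_rows


long_rows = wide_to_long(wide_rows, "Project", ["student1", "student2", "student3"])
for r in long_rows:
    print(r["Project"], "|", r["Student"])
```

Each project is repeated once per student, and the `studentN` field records which original column the name came from.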


In Google Sheets, how can I remove similar (but not duplicate) strings from the same column?

In Google Sheets, I have a column which is a list of company names, all within the same column. The problem is that some of them are repeated with slight differences. For example:
Company 1 Limited
Company 1 Ltd
Company 2
Company 3 Group
Company 3
Company 4
Company 5 p.l.c
Company 5 plc
Is there a way I can delete the similar entries (e.g. Company 1 "Ltd" vs Company 1 "Limited") to end up with a list like this?
Company 1 Ltd
Company 2
Company 3 Group
Company 4
Company 5 plc
I don't have a preference between words like 'Ltd' or 'Limited', or whether 'group' is present or not. I would just like to reduce, as much as possible, these similar double entries. I've come across fuzzylookup but my understanding is that it only works between two ranges.
The easiest thing for me to do would be to strip down the company names to a standard form by removing "Ltd" and "limited" with find and replace, and then remove the duplicates, but I would rather not go down this path as I would like to retain something following the company name. Keep in mind that the column contains company names which vary in string length. "Company X" is used in this case for demonstration purposes.
Sample here: https://docs.google.com/spreadsheets/d/1u2XDzKR09Ri_hR9FXxs9OwRswEKDvhlXCaLb1w-rhSc/edit?usp=sharing
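One way to keep a suffix while still removing near-duplicates is to compare names on a stripped-down key but keep the original text of whichever entry is seen first. The sketch below shows that idea in Python rather than in Google Sheets. The `IGNORED_WORDS` list and both helper functions are assumptions for illustration, and real company names would likely need a longer list.

```python
# Assumed list of legal-form and filler words to ignore when comparing names.
IGNORED_WORDS = {"limited", "ltd", "plc", "group"}


def comparison_key(name):
    """Lowercase, drop dots (so "p.l.c" equals "plc"), then drop ignored words."""
    words = name.lower().replace(".", "").split()
    return " ".join(word for word in words if word not in IGNORED_WORDS)


def drop_similar(names):
    """Keep the first original spelling seen for each comparison key."""
    seen = set()
    kept = []
    for name in names:
        key = comparison_key(name)
        if key not in seen:
            seen.add(key)
            kept.append(name)
    return kept


companies = [
    "Company 1 Limited",
    "Company 1 Ltd",
    "Company 2",
    "Company 3 Group",
    "Company 3",
    "Company 4",
    "Company 5 p.l.c",
    "Company 5 plc",
]
print(drop_similar(companies))
```

Because the first spelling wins, this keeps "Company 1 Limited" rather than "Company 1 Ltd"; since the question states no preference between the variants, either choice reduces the list to one entry per company.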

Neo4j Cypher - how to query 'inherited' relations on a child-node?

I want to use Cypher to query all specifications that are valid for a product, but with the specifications defined at different parent levels of the product.
I have a data model that represents a product categorisation tree with levels C1, C2, C3, ... and at the lowest level products P. To simplify the maintenance and data-entry of product specifications, the validity of product specifications is defined at the categorisation levels. The products 'inherit' the specifications that are valid for all their parent categories, up to the root of the categorisation tree.
The (simplified) data-model is shown in the image. In this case product specifications are defined for categorisation levels C1, C2 and C3. The product is connected to the lowest categorisation level C3.
My objective is to query all specifications that are valid for product P, based on their relations to the categorisation levels C1, C2 and C3.
I have the following questions:
Is this possible with a single Cypher query?
What is the best query strategy in a large database? Use a query? Create real relations for all valid specifications for a product instead of querying the 'inherited' specifications?
Change the datamodel?
Other tips?
thanks
You could find all specifications of a product by MATCHing patterns of variable length.
Assuming you have a parameter productId, you would use something like this:
MATCH (p:PRODUCT {productId: $productId})-[:BELONGS_TO*]->(c:Category)<-[:VALID_FOR]-(s:Specification)
RETURN s
to retrieve the relevant specifications.
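To make the variable-length traversal concrete, here is a small sketch of the same walk in plain Python: start at the product's category and follow parent links to the root, collecting specifications along the way. The dictionaries and specification names are hypothetical, and the sketch assumes each category has a single parent.

```python
# Hypothetical data: each category points to its parent (None at the root).
parent_of = {"C3": "C2", "C2": "C1", "C1": None}

# Hypothetical specifications declared valid for each category level.
specs_valid_for = {"C1": ["voltage"], "C2": ["weight"], "C3": ["colour"]}

# The product is attached to the lowest categorisation level.
category_of_product = {"P": "C3"}


def inherited_specifications(product):
    """Collect specifications from the product's category up to the root."""
    specs = []
    category = category_of_product[product]
    while category is not None:
        specs.extend(specs_valid_for.get(category, []))
        category = parent_of[category]
    return specs


print(inherited_specifications("P"))
```

The Cypher pattern does the equivalent walk inside the database, where `[:BELONGS_TO*]` matches a chain of one or more such links.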
Since you seem to be working on bills of materials, some things you may want to look at:
Management of complex product specifications by splitting it up in "atoms" https://www.slideshare.net/neo4j/graphtour-neo4j-murrelektronik
and
An example of how you can keep track of versions of your BOM:
https://www.youtube.com/watch?v=7iMraBHtTqE
Disclosure: I'm a member of the Graphileon team, and involved in what is shown in the slide deck and video.

Select rows with "one of each" in relational algebra

Say I have a Persons table with attributes {name, pet}. How do I select the names of people who have one of each kind of pet (dog, cat, bird), where a kind of pet only counts if it appears somewhere in the table?
Example: Bob, Dog and Bob, Cat are the only rows in the table. Therefore, Bob has one of each kind of pet. But the moment Lynda, Bird is added, Bob no longer has one of each kind of pet.
I think the first step is π(pet). That gives a list of all kinds of pets, since relational algebra removes duplicates. I'm not sure what to do after this, but I think I need to join π(pet) and Persons.
I've tried a few things like Natural Join and Cross products but I haven't arrived at a result yet and I'm out of ideas.
The answer to the question can be found with the Division operator:
Persons ÷ π_pet(Persons)
This relational algebra expression returns a relation with only the column name, containing the names of all persons who have every kind of pet currently present in the Persons table itself.
The division is an operator that, in some sense, is the inverse of the product operator (the name is derived exactly from this fact). It is a derived operator that can be defined in terms of projection, set difference and product (see for instance this answer).
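As a rough illustration of that derivation, division can be sketched with Python sets: build every (name, pet) pair that would have to exist, subtract the pairs that do exist, and discard any name that still has a pair missing. The `divide` function and the tuple representation are assumptions made for this sketch.

```python
def divide(relation, divisor):
    """Return the names paired with every value in divisor.

    relation is a set of (name, pet) tuples; divisor is a set of pets.
    """
    names = {name for name, _ in relation}                            # projection on name
    required = {(name, pet) for name in names for pet in divisor}     # product
    missing = required - relation                                     # set difference
    disqualified = {name for name, _ in missing}                      # projection on name
    return names - disqualified                                       # set difference


persons = {("Bob", "Dog"), ("Bob", "Cat")}
all_pets = {pet for _, pet in persons}
print(divide(persons, all_pets))

# Adding Lynda's bird introduces a kind of pet that Bob does not have.
persons_after = persons | {("Lynda", "Bird")}
all_pets_after = {pet for _, pet in persons_after}
print(divide(persons_after, all_pets_after))
```

With only Bob's two rows the result contains Bob; once Lynda's bird is added, nobody has every kind of pet and the result is empty.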

Neo4J Grandchild Relationships Linked to Nodes

I'm trying to model contractor relationships in Neo4J and I'm struggling with how to conceptualize subcontracts. I have nodes for Government Agencies (label: Agency) and Contractors (label:Company). Each of these have geospatial Office nodes with the HAS_OFFICE relationship. I'm thinking of creating a node that represents a Government Contract (label: Contract).
Here's what I'm struggling with: A Contract has a Government Agency (I'm thinking this is a "HAS_CONTRACT" relationship) and one or more prime contractor(s) (I'm thinking this is a "PRIME" relationship). Here's where it gets complicated. Each of those prime contractors can have subcontractors under the prime contract only. Graphically, this is:
(Agency) -[HAS_CONTRACT]-> (Contract) -[PRIME]-> (Company 1) -[SUB]-> (Company 2)
The problem I'm struggling with is that the [SUB] relationship is only for certain contracts -- not all. For example:
Agency 1 -HAS-> Contract ABC -P-> Company 1 -S-> Company 2
Agency 1 -HAS-> Contract ABC -P-> Company 3 -S-> Company 4
Agency 2 -HAS-> Contract XYZ -P-> Company 1
Agency 2 -HAS-> Contract XYZ -P-> Company 4 -S-> Company 2
I want some way to search on that so I can ask cypher questions like "Find ways Agency 2 can put money on contract with Company 2." It should come back with the XYZ contract through Company 4, and NOT the XYZ contract through Company 1.
It seems like maybe storing and filtering on data within the relationship would work, but I'm struggling with how. Can I say Prime and Sub relationships have a property, "contract_id", that must match Contract['id']? If so, how?
Edit: I don't want to have to specify the contract name for the query. Based on @MarkM's reply, I'm thinking something like:
MATCH (a:Agency)-[:HAS]-(c:Contract)-[:PRIME {contract_id:c.id}]
-(p:Company)-[:SUB {contract_id:c.id}]-(s:Company)
RETURN s
I'd also like to be able to use things like shortestPath to find the shortest path between an agency and a contractor that follows a single contract ID.
I'd create the subcontractor by having two relationships, one to the contractor and one to the contract.
(:Agency)-[:ISSUES]->(con:Contract)-[:PRIMARY]->(contractor:Company)
(con:Contract)-[:SECONDARY]->(subContractor:Company)<-[:SUBCONTRACTS]-(contractor:Company)
Perhaps you can model your use-case as a graph-gist, which is a good way of documenting and discussing modeling issues.
This seems pretty simple; I apologize if I've misunderstood the question.
If you want subcontractors you can simply query:
MATCH (a:Agency)-[:HAS]-(:Contract)-[:PRIME]-(p:Company)-[:SUB]-(s:Company) RETURN s
This will return all companies that are subcontractors. The query matches the whole pattern. So if you want XYZ contract subcontractors you simply give it the parameter:
MATCH (a:Agency)-[:HAS]-(:Contract {contractID: "XYZ"})-[:PRIME]-(p:Company)-[:SUB]-(s:Company) RETURN s
You'll only get company 2.
EDIT: based on your edit:
"Find ways Agency 2 can put money on contract with Company 2"
This seems to require some domain-specific knowledge which I don't have. I assume Agency 2 can only put money on subcontractors but not primes? It might help if you reword so we know exactly what you're trying to get from the graph. From my reading it looks like you want all of Agency 2's contracts under which Company 2 is a subcontractor. Is that right?
If that's what you want, again you just give Neo the path:
MATCH (a:Agency {AgencyID: 2})-[:HAS]
-(c:Contract)-[:PRIME]-(:Company)-[:SUB]-(s:Company {companyID: 2})
RETURN c, s
This will give you a list of all of Agency 2's contracts for which Company 2 is a subcontractor. With the current example, it will return one row: [c:Contract XYZ, s:Company 2]. If Agency 2 had more contracts under which Company 2 subcontracted, you would get more rows.
You can't do this: [:PRIME {contract_id:c.id}] [:SUB {contract_id:c.id}] because Prime and Sub relationships shouldn't have contract_id properties. They don't need them — the very fact that they are connected to a contract is enough.
One thing that might make this a little more complicated is if the subcontractors also have subcontractors, but that's not evident.
Okay take 2:
So the problem isn't captured well in the original example data — sorry for missing it. A better example is:
Agency 1 -HAS-> Contract ABC -P-> Company 1 -S-> Company 2
Agency 1 -HAS-> Contract XYZ -P-> Company 1 -S-> Company 3
Now if I ask
MATCH (a:Agency)-[:HAS_CONTRACT]-(ABC:Contract {id: "ABC"})-[:PRIME]
-(c:Company)-[:SUBS]-(c2) RETURN c2
I'll get both Company 2 and 3 even though only 2 is on ABC, right?
The problem here is the data model not the query. There's no way to distinguish a company's subs because they are all connected directly to the company node. You could put a property on the sub relationship with the prime ID, but a better way that really captures the information is to add another contract node under company. Whether you label this as a different type depends on your situation.
Company 1 then [:HAS] a contract which the subs are connected to. The contract can then point back to the prime contract with a relationship of something like [:PARENT] or [:PRIME], or the prime contract can point to the subcontract with a [:SUBCONTRACT] relationship.
Now everything becomes much easier. You can find all subcontracts under a particular contract, all subcontracts a particular company [:HAS], etc. To find all subcontractors under a particular contract you could query something like this:
MATCH (contract:Contract { id:"ContractABC" })-[:PRIME]-(c:Company)
-[:HAS]->(subcontract:Contract)-[:PARENT]-(contract)
WITH c, subcontract
MATCH (subcontract)-[:SUBS]-(subcontractor:Company)
RETURN c, subcontractor
This should give you a list of all companies and their subcontractors under contract ABC. In this case Company 1, Company 2 (but not company 3).
Here's a console example: http://console.neo4j.org/?id=flhv8e
I've left the original [:SUB] relationships but you might not need them.
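The effect of adding a subcontract node can be sketched outside Neo4j with plain Python records. Each hypothetical subcontract below names its parent contract, the prime company that holds it, and the subcontractors attached to it. The record layout and identifiers are assumptions for illustration only.

```python
# Hypothetical subcontract records mirroring the "take 2" example data:
# Company 1 subcontracts to Company 2 under ABC and to Company 3 under XYZ.
subcontracts = [
    {"id": "ABC-sub-1", "parent": "ABC", "holder": "Company 1", "subs": ["Company 2"]},
    {"id": "XYZ-sub-1", "parent": "XYZ", "holder": "Company 1", "subs": ["Company 3"]},
]


def subcontractors_under(contract_id):
    """Return (prime, subcontractor) pairs scoped to one prime contract."""
    return [
        (subcontract["holder"], sub)
        for subcontract in subcontracts
        if subcontract["parent"] == contract_id
        for sub in subcontract["subs"]
    ]


print(subcontractors_under("ABC"))
print(subcontractors_under("XYZ"))
```

Because each subcontractor hangs off a subcontract rather than directly off the prime company, asking about ABC no longer pulls in Company 3.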

How do I design product and category tables in an Entity Relationship Diagram

I am designing an ERD. I am considering how I would link a product entity to a category
I have two categories:
1. BrandCategory (i.e. Apple, Nokia etc.)
2. TypeCategory (Smartphone, Laptop, Tablet etc.)
A product can belong to a BrandCategory and a TypeCategory.
Can someone advise me on how to link these up?
Thank you.
That is easy. First look at your objects:
P = Product, BC = BrandCategory, TC = TypeCategory
1) BC and TC are not related to each other.
2) P is related to TC. Let's look at the objects:
P_1 ----> TC_1 (read as: Product_1 belongs to TypeCategory_1)
P_2 ----> TC_1
P_3 ----> TC_2
As we see, ONE product belongs to ONE typeCategory. And ONE typeCategory can have MANY products.
So we have a 1 to many relationship here.
Do the same for brandCategory. And the model should be complete.
You could also model this with inheritance, which could make more sense. Google for "Entity-Relationship" to get more info on that.
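The two one-to-many relationships can be sketched as plain Python records, where each product carries one reference to a brand category and one to a type category. The class and field names are assumptions for illustration; in a database the two `*_id` fields would be foreign keys on the product table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BrandCategory:
    id: int
    name: str


@dataclass(frozen=True)
class TypeCategory:
    id: int
    name: str


@dataclass(frozen=True)
class Product:
    id: int
    name: str
    brand_category_id: int  # many products -> one brand category
    type_category_id: int   # many products -> one type category


def products_in_type(products, type_category_id):
    """List the products that belong to one type category."""
    return [p for p in products if p.type_category_id == type_category_id]


apple = BrandCategory(1, "Apple")
nokia = BrandCategory(2, "Nokia")
smartphone = TypeCategory(1, "Smartphone")
tablet = TypeCategory(2, "Tablet")

products = [
    Product(1, "P_1", apple.id, smartphone.id),
    Product(2, "P_2", nokia.id, smartphone.id),
    Product(3, "P_3", apple.id, tablet.id),
]
print([p.name for p in products_in_type(products, smartphone.id)])
```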
regards
