what is 1..n, ?..m and these symbols in ERD? - entity-relationship

I'm new to ERD, I know 1:1 means one to one.
1:n one to many.
m:1 many to one.
m:n many to many
but what are these symbols mean is ERD?
image from "software idea modeler v11.55": http://s8.picofile.com/file/8326381876/Capture.JPG

n means exactly number
You can replace n with any number needed for your database model.
read more about this cheatsheet: http://www.32geeks.com/classes/resources/IDEF1X_Cheat_Sheet.pdf

Related

How to classify extracted relations (NLP)?

There are some not labeled corpus. I extracted from it triples (OBJECT, RELATION, OBJECT). For relation extraction I use Stanford OpenIE. But I need only some of this triples. For example, I need relation "funded".
Text:
Every startup needs a steady diet of funding to keep it strong and growing. Datadog, a monitoring service that helps customers bring together data from across a variety of infrastructure and software is no exception. Today it announced a massive $94.5 million Series D Round. The company would not discuss valuation.
From this text i want to extract relation (Datadog, announced, $94.5 million Round)
I have only one idea:
Use StanfordCoreference to detect that 'Datadog' in the first sentence and 'it' in second sentence are the same entity
Try to cluster relations, but i think it's won't work well
May be there are better approach? May be I need labeled corpus(i haven't it)?

Dimension with two surrogate keys or two seperate dimensions?

im looking for some guidance for dimensional modeling.
I'm looking at some search data that is stored in a database in a star schema. There is one dimension for queries and one dimension for landing pages. Both dimensions have a surrogate key that are stored in the fact table as foreign keys.
The fact table has about 100 million rows and the dimensions each have about 100k rows.
As the joins of these tables are taking very long lately i'm wondering if it would be a good idea to combine the two dimensions into one so it only joins to one table. The two dimensions are M:N so the new dimension would be very huge.
Thanks!!
There isn't a "right" answer for your question without knowing more about your data (like do you have more dimensions in your fact table? how many combinations of Queries and Landing pages do you have?), but few comments:
You current design (for what I can understand from here) is not bad, you have a lot of data, you have to deal with it, but combine two dimensions with 100K elements to avoid a join doesn't seems right to me
Try to optimize your queries, build indexes if you don't have them, parallelize your queries (if your db engine allows you to do so), try to avoid like in your where if possible, last resource think about more hardware or a different database engine.
If you usually query using only one of these dimensions maybe you can think about aggregated tables to reduce the number of rows, you will use more space but your query will have a single join and a smaller fact table
Can query be child of landing page? (i.e. stackoverflow.com is parent of queries like "Guru Meditation error message" and "stackcareers.com" is parent of "pool boy for datalake jobs") Of course you will end with the same query for multiple landing pages, you will need to assign different foreign keys in that case. But this different model can lead to a different solution, you will have only 1:M relationships and can build an aggregated table by landing page dimension, but this will require to change your queries to extract data. And again I don't know your data, maybe it will make more sense Queries parent of Landing Pages...
Again this are just my "thoughts" no solutions.

Should the "count" measure be stored in the fact table?

I have a fact table that includes "wait times in hours" for certain services. I have a lot of dimensions that could describe the wait-times based on different slices; however, I am also interested in knowing how many people (counts) came for services through the filters of the same dimensions.
Given the dimensions for both the wait-times in hours and the number of people who got services are exactly the same, I think it's best practice to keep it in the same fact table. My question is:
Should there be a different fact table for the count measure mentioned?
How would I include this measure? Do I just put 1 in every single row? Because regardless of the wait-time, they've gotten the service only once (you cannot go above/below 1 in my scenario).
1) Think about the grain of your existing fact table. It sounds like it's probably "an occasion on which a person received a service." If that's the same thing you're trying to count, then yes - the waiting time and the count are the same grain.
However, while they may well be the same grain, there might be no need to add anything to the table. Read point 2 for an explanation.
2) You could put a 1 in a column on every row, but I'm not sure what you'd gain from it. You've not said what tools will be consuming this data, but you should be able to do a count/distinct count of some kind.
Working on the basis that you've tagged SSIS so are likely using Microsoft's BI stack:
TSQL has count(), and you can do count(distinct [column]).
SSAS has both counts and distinct counts as aggregation types.
MDX offers several different types of count.
SSRS has Count, CountDistinct, and CountRows.
Whether you do a normal count or a distinct count will depend on whether you're trying to ask "How many people used this service?" or "How many different people used this service?"

Querying single relationship from a bidirectional relationship in Neo4j

Is it possible to show only one direction relationship from a bidirectional relationship?
(n)-[:EMAIL_LINK]->(m)
(n)<-[:EMAIL_LINK]-(m)
If the relationship type in question does not have directional semantics, it's best practice to have them only one time in the graph and omit the direction while querying, i.e. (a)-[:EMAIL_LINK]-(b) instead of (a)-[:EMAIL_LINK]->(b).
To get rid of duplicated relationships in different directions, use:
MATCH (a)-[r1:EMAIL_LINK]->(b)<-[r2:EMAIL_LINK]-(a)
WHERE ID(a)<ID(b)
DELETE r2
if your graph is large you need to take care of having reasonable transaction sizes by adding a LIMIT and running the query multiple times until all have been processed.
NB: the WHERE ID(a)<ID(b) is necessary. Otherwise a and b might change roles during in a later iteration. Consequently r1 and r2 would change roles as well and both get deleted.

SPSS Aggregating data from another dataset

I'm making a SPSS syntax. I have two datasets. One of the individuals, the other consisting of their groups. I have aggregated the mean of individuals in that active dataset. However, I need to have this constant in the other, the groups dataset. So that I can compare group means with the overall mean. Please help.
As mentioned in comment MATCH FILES command will help you to achieve the result.
Do you know the "insert" command which generates syntax for you?

Resources