ERD or MCD one-to-many diagrams - entity-relationship

What is the difference between these two diagrams?

The first diagram is in Chen notation; the second is in Crow's foot notation. Both express the same one-to-many relationship; they differ only in how it is drawn: Chen notation uses a relationship diamond with 1/N cardinality labels, while Crow's foot notation marks cardinality with symbols at the line ends (the "crow's foot" meaning many).

Related

Recent methods for finding semantic similarity between two short sentences or articles (on a concept level)

I'm working on finding similarities between short sentences and articles. I have used many existing methods such as tf-idf and word2vec, but the results are just okay. The most relevant measure I found was word mover's distance; however, its results are not much better than the other measures. I know it's a challenging problem, but I am wondering if there are any new methods that find an approximate similarity at a higher, concept level rather than just matching words. In particular, are there any newer methods, like word mover's distance, that look at the semantics of a sentence or article at a slightly higher level?
This is the most recent approach, based on a paper published 4 months ago.
Step 1:
Load a suitable model using gensim, compute the word vectors for the words in the sentence, and store them as a list.
Step 2: Compute the sentence vector
Calculating the semantic similarity between sentences used to be difficult, but a recent paper, "A Simple but Tough-to-Beat Baseline for Sentence Embeddings", proposes a simple approach: compute the weighted average of the word vectors in the sentence, then remove the projections of the average vectors onto their first principal component. Here the weight of a word w is a/(a + p(w)), where a is a parameter and p(w) is the (estimated) word frequency; this weighting is called smooth inverse frequency (SIF). This method performs significantly better.
Simple code to calculate the sentence vector using SIF, the method proposed in the paper, has been given here.
Step 3: Load the two sentence vectors and compute their similarity using sklearn's cosine_similarity.
This is the simplest and most efficient method for computing the semantic similarity of sentences.
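As an illustration of the three steps, here is a minimal sketch (not the paper's reference implementation); the embedding file name and the word-frequency table are placeholders you would have to supply yourself:

import numpy as np
from gensim.models import KeyedVectors
from sklearn.decomposition import TruncatedSVD
from sklearn.metrics.pairwise import cosine_similarity

# Step 1: load a pretrained model (hypothetical path) and word frequencies p(w).
word_vectors = KeyedVectors.load_word2vec_format("embeddings.bin", binary=True)
word_freq = {}  # placeholder, e.g. {"the": 0.05, "cat": 0.0004, ...}

def sif_embeddings(sentences, a=1e-3):
    """Step 2: weighted average with weight a/(a + p(w)), then remove the
    projection onto the first principal component."""
    dim = word_vectors.vector_size
    embs = np.zeros((len(sentences), dim))
    for i, sent in enumerate(sentences):
        words = [w for w in sent.lower().split() if w in word_vectors]
        if not words:
            continue
        weights = np.array([a / (a + word_freq.get(w, 0.0)) for w in words])
        vecs = np.array([word_vectors[w] for w in words])
        embs[i] = weights.dot(vecs) / len(words)
    svd = TruncatedSVD(n_components=1, n_iter=7)
    svd.fit(embs)
    pc = svd.components_  # first principal component, shape (1, dim)
    return embs - embs.dot(pc.T).dot(pc)

# Step 3: cosine similarity between the two sentence vectors.
embs = sif_embeddings(["the cat sat on the mat", "a cat is sitting on a rug"])
print(cosine_similarity(embs[0:1], embs[1:2])[0, 0])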
Obviously, this is a huge and busy research area, but I'd say there are two broad types of approaches you could look into:
First, there are some methods that learn sentence embeddings in an unsupervised manner, such as Le and Mikolov's (2014) Paragraph Vectors, which are implemented in gensim, or Kiros et al.'s (2015) SkipThought vectors, with an implementation on Github.
Then there also exist supervised methods that learn sentence embeddings from labelled data. The most recent one is Conneau et al.'s (2017), which trains sentence embeddings on the Stanford Natural Language Inference dataset, and shows these embeddings can be used successfully across a range of NLP tasks. The code is available on Github.
You might also find some inspiration in a blog post I wrote earlier this year on the topic of embeddings.
To be honest, the best thing I know of for this at the moment is AMR (Abstract Meaning Representation):
About AMR here: https://amr.isi.edu/
Documentation here: https://github.com/amrisi/amr-guidelines/blob/master/amr.md
You can use a system like JAMR (see here: https://github.com/jflanigan/jamr) to generate AMRs for your sentences, and then use Smatch (see here: https://amr.isi.edu/eval/smatch/tutorial.html) to compare the similarity of the two generated AMRs.
What you are trying to do is very difficult and is an active ongoing area of research.
You can use semantic similarity with WordNet for each pair of nouns.
For a quick look, you can enter bird-noun-1 and chair-noun-1 and select WordNet at http://labs.fc.ul.pt/dishin/; it gives you:
Resnik 0.315625756544
Lin 0.0574161071905
Jiang&Conrath 0.0964964414156
The Python code is at: https://github.com/lasigeBioTM/DiShIn
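If you prefer to stay in Python without DiShIn, a rough equivalent can be sketched with NLTK's WordNet interface; the Resnik, Lin, and Jiang&Conrath measures need an information-content file, here the Brown-corpus one shipped with NLTK (the exact numbers will differ from DiShIn's):

import nltk
from nltk.corpus import wordnet as wn
from nltk.corpus import wordnet_ic

# One-time downloads of WordNet and the information-content files.
nltk.download("wordnet")
nltk.download("wordnet_ic")

brown_ic = wordnet_ic.ic("ic-brown.dat")
bird = wn.synset("bird.n.01")
chair = wn.synset("chair.n.01")

print("Resnik", bird.res_similarity(chair, brown_ic))
print("Lin", bird.lin_similarity(chair, brown_ic))
print("Jiang&Conrath", bird.jcn_similarity(chair, brown_ic))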

Approach to Sentence similarity algorithm

I want to implement a sentence similarity algorithm. Is it possible to implement it using a sequence prediction algorithm? If so, what kind of approach should I take, or is there another method that is more suitable for sentence similarity? Please share your views.
You could try treating your sentences as separate documents and then use a traditional approach for finding similarity between documents. This was answered here using sklearn:
Similarity between two text documents
If you want, you could try to implement the same code in TensorFlow.
I also strongly recommend reading this answer, which covers more sophisticated approaches: https://stackoverflow.com/a/15173821/3633250
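As a minimal sketch of that traditional approach, treating each sentence as its own document and comparing TF-IDF vectors with cosine similarity (the example sentences are made up):

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

sentences = ["the cat sat on the mat", "a cat is sitting on a rug"]

# Each sentence is treated as its own "document".
tfidf = TfidfVectorizer().fit_transform(sentences)
print(cosine_similarity(tfidf[0:1], tfidf[1:2])[0, 0])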
You could consider using Doc2Vec. Each sentence (document) is mapped to an n-dimensional space. To find the most similar document,
model.most_similar("documentID")
Reference
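A rough sketch with gensim's Doc2Vec, assuming gensim 4.x (where the trained document vectors live in model.dv); the example sentences and tags are made up:

from gensim.models.doc2vec import Doc2Vec, TaggedDocument

sentences = ["the cat sat on the mat",
             "a cat is sitting on a rug",
             "stock prices fell sharply today"]

# Tag each sentence (document) with an ID so it can be looked up later.
docs = [TaggedDocument(words=s.split(), tags=[str(i)]) for i, s in enumerate(sentences)]

model = Doc2Vec(docs, vector_size=50, min_count=1, epochs=100)

# Documents most similar to document "0".
print(model.dv.most_similar("0"))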

Is there a machine learning method to learn the structure of a sentence instead of the words?

I'm trying to use an HMM to do named entity recognition, but I found that most of the sentences containing the entities are very structured. For example:
"What's Apple's price today?" Then, instead of teaching the model to learn each word within the sentence, can I teach it to learn the structure of the sentence? For example, that every word after "What's" or "What is" should be the name of a kind of fruit?
Thanks!
Instead of using an HMM, consider using a conditional random field. They are very similar to HMMs, but are the discriminative version (in Ng and Jordan's terminology, HMMs and Linear Chain CRFs form a generative/discriminative pair).
The benefit of doing this is that you can define features of your word observations, such as the POS tag of the current word, the POS tag of the previous word(s), etc., without making independence assumptions about these features. This allows you to incorporate structural and lexical features into the same decision framework.
Edit: Here's the original paper. Here's a very comprehensive tutorial.
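For illustration, here is a rough sketch of a linear-chain CRF with word and POS features using the sklearn-crfsuite package; the package choice, the tiny training example, and the feature names are assumptions for the example, not part of the answer above:

import sklearn_crfsuite

def word_features(sent, i):
    """Features for the i-th (word, POS) pair, including the previous word's POS."""
    word, pos = sent[i]
    feats = {"word.lower": word.lower(), "pos": pos, "is_title": word.istitle()}
    if i > 0:
        prev_word, prev_pos = sent[i - 1]
        feats.update({"-1:word.lower": prev_word.lower(), "-1:pos": prev_pos})
    else:
        feats["BOS"] = True  # beginning of sentence
    return feats

# Toy training data: one POS-tagged sentence with entity labels.
train_sents = [[("What's", "WP"), ("Apple's", "NNP"), ("price", "NN"),
                ("today", "NN"), ("?", ".")]]
train_labels = [["O", "B-ORG", "O", "O", "O"]]

X = [[word_features(s, i) for i in range(len(s))] for s in train_sents]
crf = sklearn_crfsuite.CRF(algorithm="lbfgs", max_iterations=50)
crf.fit(X, train_labels)
print(crf.predict(X))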
You could begin exploring that structure with something as simple as n-grams, or try something richer like grammar induction.
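For the n-gram route, a quick sketch with NLTK (the example sentence reuses the question's):

from nltk import ngrams

sentence = "What's Apple's price today ?".split()

# Bigrams and trigrams expose recurring surface patterns like ("What's", X).
print(list(ngrams(sentence, 2)))
print(list(ngrams(sentence, 3)))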

Which Stanford NLP package to use for content categorization

I have about 5000 terms in a table and I want to group them into categories that make sense.
For example some terms are:
Nissan
Ford
Arrested
Jeep
Court
The result should be that Nissan, Ford, and Jeep get grouped into one category, and that Arrested and Court are in another. I looked at the Stanford Classifier. Am I right to assume that it is the right tool to choose for this?
I would suggest using NLTK if there weren't so many proper nouns. You can use the semantic similarity from WordNet as features and try to cluster the words. Here's a discussion about how to do that.
To use the Stanford Classifier, you need to know how many buckets (classes) of words you want. Besides, I think it is intended for documents rather than individual words.
That's an interesting problem that the word2vec model that Google released may help with.
In a nutshell, a word is represented by an N-dimensional vector generated by a model. Google provides a great model, trained on over 100 billion words from its news division, that returns 300-dimensional vectors.
The interesting thing is that there are semantics encoded in these vectors. Suppose you have the vectors for the words King, Man, and Woman. A simple expression (King - Man) + Woman will yield a vector that is exceedingly close to the vector for Queen.
This is done via a distance calculation (cosine distance is their default, but you can use your own on the vectors) to determine similarity between words.
For your example, the distance between Jeep and Ford would be much smaller than between Jeep and Arrested. Through this you could group terms 'logically'.
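As a sketch with gensim, assuming you have downloaded the pretrained GoogleNews vectors (the file path is a placeholder, the terms must be in the model's vocabulary, and the KMeans grouping step is my own addition to illustrate clustering the 5000 terms):

import numpy as np
from gensim.models import KeyedVectors
from sklearn.cluster import KMeans

# Hypothetical path to the pretrained GoogleNews model mentioned above.
vectors = KeyedVectors.load_word2vec_format(
    "GoogleNews-vectors-negative300.bin", binary=True)

# Cosine similarity between terms: related terms should score higher.
print(vectors.similarity("Jeep", "Ford"))      # expected to be relatively high
print(vectors.similarity("Jeep", "Arrested"))  # expected to be much lower

# The King - Man + Woman example from the text.
print(vectors.most_similar(positive=["King", "Woman"], negative=["Man"], topn=1))

# Group the terms by clustering their vectors.
terms = ["Nissan", "Ford", "Jeep", "Arrested", "Court"]
in_vocab = [t for t in terms if t in vectors]
X = np.array([vectors[t] for t in in_vocab])
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
print(dict(zip(in_vocab, labels)))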

Classification of relationships in words?

I'm not sure what's the best algorithm to use for the classification of relationships between words. For example, in a sentence such as "The yellow sun" there is a relationship between yellow and sun. The machine learning techniques I have considered so far are Bayesian statistics, rough sets, fuzzy logic, hidden Markov models, and artificial neural networks.
Any suggestions please?
thank you :)
It kind of sounds like you're looking for a dependency parser. Such a parser will give you the relationship between any word in a sentence and its semantic or syntactic head.
The MSTParser uses an online max-margin technique known as MIRA to classify the relationships between words. The MaltParser package does the same but uses SVMs to make parsing decisions. Both systems are trainable and provide similar classification and attachment performance, see table 1 here.
Like the user dmcer pointed out, dependency parsers will help you. There is a ton of literature on dependency parsing you can read. This book and these lecture notes are good starting points for the conventional methods.
The Link Grammar Parser, which is somewhat like a dependency parser, uses Sleator and Temperley's Link Grammar syntax to produce word-word linkages. You can find more information on the original Link Grammar page and on the more recent Abiword page (Abiword maintains the implementation now).
For an unconventional approach to dependency parsing, you can read this paper that models word-word relationships analogous to subatomic particle interactions in chemistry/physics.
The Stanford Parser does exactly what you want. There's even an online demo. Here's the results for your example.
Your sentence
The yellow sun.
Tagging
The/DT yellow/JJ sun/NN ./.
Parse
(ROOT
(NP (DT The) (JJ yellow) (NN sun) (. .)))
Typed dependencies
det(sun-3, The-1)
amod(sun-3, yellow-2)
Typed dependencies, collapsed
det(sun-3, The-1)
amod(sun-3, yellow-2)
From your question it sounds like you're interested in the typed dependencies.
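If you want those typed dependencies programmatically, one option (not mentioned in the answer) is NLTK's CoreNLP wrapper; this sketch assumes a Stanford CoreNLP server is already running locally on port 9000:

from nltk.parse.corenlp import CoreNLPDependencyParser

# Assumes a Stanford CoreNLP server started separately on localhost:9000.
parser = CoreNLPDependencyParser(url="http://localhost:9000")

parse, = parser.raw_parse("The yellow sun.")

# Each triple is (head, relation, dependent),
# e.g. (('sun', 'NN'), 'amod', ('yellow', 'JJ')).
for head, rel, dep in parse.triples():
    print(head, rel, dep)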
Well, no one knows what the best algorithm for language processing is because it hasn't been solved. To be able to understand a human language is to create a full AI.
However, there have, of course, been attempts to process natural languages, and these might be good starting points for this sort of thing:
X-Bar Theory
Phrase Structure Rules
Noam Chomsky did a lot of work on natural language processing, so I'd recommend looking up some of his work.
