I want to be able to find all the features that refer to a specific person in text.
For example, if I had the string "John Smith is a Doctor and lives in Sydney, unlike his co-worker Jane who lives in Newcastle.", is there an NLP technique to extract only the information that relates to John? I.e. {John: Doctor, Sydney}.
Sub-question: is there a Python library that lets me do this?
Thank you in advance :)
Take a look at AMR (abstract meaning representation)
https://amr.isi.edu
It goes further than dependency parsing and will extract semantic relations.
For your example, the graph shows that there is a doctor who is a person named John Smith, that this same doctor is the agent of "live", and that the location of living is a city called Sydney.
The accuracy for longer and more complex sentences might be low.
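To illustrate what you would do with such a parse: once a semantic parser has produced subject–relation–object triples, pulling out everything about one person is just a graph walk. The triples below are hand-written to mimic a parse of the example sentence (a real pipeline would produce them with an AMR parser; they are not actual parser output):

```python
# Hand-written triples mimicking a semantic parse of the example sentence.
triples = [
    ("John Smith", "has-role", "doctor"),
    ("John Smith", "lives-in", "Sydney"),
    ("Jane", "co-worker-of", "John Smith"),
    ("Jane", "lives-in", "Newcastle"),
]

def facts_about(person, triples):
    """Collect every relation whose subject is `person`."""
    return {rel: obj for subj, rel, obj in triples if subj == person}

print(facts_about("John Smith", triples))
# {'has-role': 'doctor', 'lives-in': 'Sydney'}
```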
Related
I am trying to find the name of a specific location in tweets and perform sentiment analysis on the hits I get from the search. The problem I am facing is that when I look for a location whose name is, say, "Sammy's Tap and Grill", the search gets no hits; I need to search for something like "Sammys" or "Sammy's" to get some. Conversely, when I search for "Empire State Building", I cannot search for "Empire" alone, because that returns weird tweets about the Mayan and Chola empires; here I have to search for "Empire State Building" or "Empire State".

So is there an NLP trick to pick the best possible search term from the full name of a location, i.e. the one that gets the most relevant hits? So far I was only able to build a solution that checks whether the hits are nouns, because some places have names like "Excellent" and "Fantastic" and I didn't want adjectives to pop up. Is there some NLP way to solve this problem of searching for a location name in tweets?
Your problem is very similar to the named entity recognition (NER) problem. You can try using standard named entity extractors or train your own NER model.
There are different libraries for NER, such as:
Stanford NER,
SpaCy NER Tool
NLTK NER module
If you want to train your own named entity recognition model, check these links:
CRF git repository
Named Entity Recognition with Tensorflow
Good luck)
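For the search-term half of the question, a rule of thumb can complement NER: generate candidate search terms from the full location name as contiguous word n-grams, longest first, and drop single words that are too generic to search on. A minimal sketch (the stoplist here is a hand-picked assumption; in practice it could come from corpus word frequencies):

```python
# Hand-picked stoplist of words too generic to search on their own.
GENERIC = {"empire", "state", "building", "tap", "grill", "and", "the"}

def candidate_terms(name):
    """Contiguous word n-grams of `name`, longest first, minus lone generic words."""
    words = name.split()
    candidates = []
    for n in range(len(words), 0, -1):          # longest n-grams first
        for i in range(len(words) - n + 1):
            gram = " ".join(words[i:i + n])
            # keep single words only if they are distinctive
            if n > 1 or gram.lower() not in GENERIC:
                candidates.append(gram)
    return candidates

print(candidate_terms("Empire State Building"))
# ['Empire State Building', 'Empire State', 'State Building']
```

For "Empire State Building" every single word is generic, so only multi-word terms survive, while "Sammy's Tap and Grill" still yields the distinctive single word "Sammy's".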
I am trying to solve a problem where I'm identifying entities in articles (ex: names of cars), and trying to predict sentiment about each car within the article. For that, I need to extract the text relevant to each entity from within the article.
Currently, the approach I am using is as follows:
If a sentence contains only 1 entity, tag the sentence as text for that entity
If sentence has more than 1 entity, ignore it
If sentence contains no entity, tag as a sentence for previously identified entity
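The three rules above can be sketched directly (naive period-based sentence splitting and substring matching stand in for a real sentence tokenizer and NER model):

```python
def tag_sentences(text, entities):
    """Assign each sentence to an entity using the three heuristic rules."""
    sentences = [s.strip() for s in text.split(".") if s.strip()]
    tagged, previous = [], None
    for sent in sentences:
        mentioned = [e for e in entities if e.lower() in sent.lower()]
        if len(mentioned) == 1:              # rule 1: exactly one entity
            previous = mentioned[0]
            tagged.append((sent, previous))
        elif len(mentioned) > 1:             # rule 2: ambiguous, ignore
            continue
        elif previous is not None:           # rule 3: inherit last entity
            tagged.append((sent, previous))
    return tagged

text = ("Let's talk about the Honda Civic. The car was great, but failed in "
        "comparison to the Ford Focus. The car also has good economy.")
print(tag_sentences(text, ["Honda Civic", "Ford Focus"]))
# the last sentence comes out tagged "Ford Focus"
```

Running it on the Honda/Ford example from the question reproduces the failure mode: the third sentence inherits "Ford Focus" even though "the car" refers back to the Honda Civic.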
However, this approach is not yielding accurate results, even if we assume that our sentiment classification is working.
Is there any method that the community may have come across that can solve this problem?
The approach fails in many cases and gives wrong results. For example, take the text: 'Let's talk about the Honda Civic. The car was great, but failed in comparison to the Ford Focus. The car also has good economy.'
Here, the program would pick up Ford Focus as the entity in the last two sentences and tag both of them for it, even though "the car" in those sentences refers to the Honda Civic.
I am using nltk for descriptive words tagging, and scikit-learn for classification (linear svm model).
If anyone could point me in the right direction, it would be greatly appreciated. Is there some classifier with custom features that I could build to detect this kind of entity-relevant text, if I were to manually tag, say, 50 articles and the text in them?
Thanks in advance!
I am a newbie to OpenNLP entity extraction with NER. I have trained and evaluated models for entity extraction in OpenNLP NER, which work fine when I give input text containing a one-word entity, e.g. "I want to buy Cadbury".
But it does not work for multi-word scenarios, e.g. "I want to buy an Apple MacBook".
How do I train the models to pick up multi-word entities?
PS: I understand that I need to do something related to the bigrams provided in NLP, but how do I do it with OpenNLP?
You need to provide training data which covers multi-word spans. Example from the OpenNLP documentation:
<START:person> Pierre Vinken <END> , 61 years old , will join the board as a nonexecutive director Nov. 29 . Mr . <START:person> Vinken <END> is chairman of Elsevier N.V. , the Dutch publishing group .
Besides the above format, IO/BIO/etc tags are also common.
In your example, Apple MacBook could be one entity of type Product Name, but it could also be two: Apple as Company Name and MacBook as Product Name. Which one you get depends entirely on your training data.
You can create data like this by hand or visually using brat.
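To show how the two annotation styles relate, here is a small sketch that converts OpenNLP's `<START:type> ... <END>` span format into token-level BIO tags (it assumes whitespace-separated tokens, as in the OpenNLP training format):

```python
import re

def spans_to_bio(annotated):
    """Convert OpenNLP <START:type> ... <END> annotations to (token, BIO-tag) pairs."""
    tags, current, first = [], None, False
    for tok in annotated.split():
        m = re.match(r"<START:(\w+)>", tok)
        if m:
            current, first = m.group(1), True       # entering an entity span
        elif tok == "<END>":
            current = None                          # leaving the span
        elif current is None:
            tags.append((tok, "O"))                 # outside any entity
        else:
            tags.append((tok, ("B-" if first else "I-") + current))
            first = False
    return tags

print(spans_to_bio("<START:person> Pierre Vinken <END> , 61 years old"))
# [('Pierre', 'B-person'), ('Vinken', 'I-person'), (',', 'O'), ('61', 'O'), ('years', 'O'), ('old', 'O')]
```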
I am working on an NLP project, wherein I have a list of emails all related to appreciation. I am trying to determine from the email content, who is being appreciated. This in turn will help the organization in our performance evaluation program.
Apart from identifying who is being appreciated, I am also trying to identify the type of work a person has done and score it. I am using OpenNLP (maximum entropy / logistic regression) for classification of the emails and some form of heuristics to identify the person being appreciated.
The approach for person identification is as follows:
Determine if an email is related to appreciation
Get the list of people in the "To:" list
Check if that person is being referred to in the email
Tag that person as the receiver of appreciation
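The four steps above can be sketched as follows. This is purely illustrative: the keyword check stands in for the trained appreciation classifier, and the cue phrases are made-up assumptions:

```python
# Made-up cue phrases standing in for a trained appreciation classifier.
APPRECIATION_CUES = {"great job", "well done", "kudos", "thank you"}

def is_appreciation(body):
    """Step 1: decide whether the email is an appreciation email."""
    return any(cue in body.lower() for cue in APPRECIATION_CUES)

def find_appreciated(to_names, body):
    """Steps 2-4: keep recipients who are actually referred to in the body."""
    if not is_appreciation(body):
        return []
    return [name for name in to_names
            if name.split()[0].lower() in body.lower()]  # match on first name

print(find_appreciated(
    ["Alice Jones", "Bob Lee"],
    "Kudos to Alice for shipping the release on time!",
))
# ['Alice Jones']
```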
However, this approach is very simple and does not work for the complex emails we generally see. An email can mention many email IDs or people who are not the receivers of the appreciation, and since no context about each person is available, the accuracy is not very good.
I am thinking of using HMM and word2vec to solve the person issue. I would appreciate if anyone has come across this problem or has any suggestion.
Use the tm package for R, and use tf-idf (term frequency–inverse document frequency) to determine who is being appreciated.
I suggest this because, from what I can read, this is an unsupervised learning problem (you don't know in advance who is being appreciated). So you have to describe the content of the documents (emails), and that formula (tf-idf) will tell you which words are used a lot in a particular document but rarely used in all the others.
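The same formula in Python, just to show how tf-idf surfaces a document's distinctive words (the toy "emails" are hand-written; the R tm package computes the same quantity):

```python
import math

docs = [
    "great work anna on the launch",
    "the launch schedule for next week",
    "thanks anna and great work on the demo",
]

def tfidf(term, doc, docs):
    """tf-idf: term frequency in `doc` times log(N / document frequency)."""
    words = doc.split()
    tf = words.count(term) / len(words)
    df = sum(1 for d in docs if term in d.split())
    return tf * math.log(len(docs) / df)

# "anna" appears in few documents, so it outscores the ubiquitous "the".
print(tfidf("anna", docs[0], docs), tfidf("the", docs[0], docs))
```

A word like "the" occurs in every document, so its idf term is log(1) = 0 and its score vanishes, while a name that is frequent in one email but rare overall scores high.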
One way to solve this problem is through the use of Named Entity Recognition. You can possibly run something like Stanford NER over the text which will help you recognize all people names mentioned in the email and then use a rules based chunker such as Stanford TokensRegex to extract sentences where names of people and appreciation words are mentioned.
The best way to solve this will be by treating this as a supervised learning problem. You will then need to annotate a bunch of training data with entities and expression phrases and the relations between them. Then you can use Stanford Relation Extractor to extract appropriate relations.
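As a minimal stand-in for the TokensRegex-style rule, the extraction step can be sketched in Python: keep sentences that mention both a recognized person and an appreciation cue. The person list and cue list here are assumptions; in the pipeline above, the names would come from Stanford NER rather than being supplied by hand:

```python
# Assumed appreciation cue words; a real rule set would be richer.
CUES = {"kudos", "congratulations", "great", "excellent", "thank"}

def appreciation_sentences(text, people):
    """Sentences mentioning both a known person and an appreciation cue."""
    hits = []
    for sent in (s.strip() for s in text.split(".") if s.strip()):
        low = sent.lower()
        if any(p.lower() in low for p in people) and any(c in low for c in CUES):
            hits.append(sent)
    return hits

text = ("Kudos to Maria for the release. The deadline was tight. "
        "Maria also fixed the build.")
print(appreciation_sentences(text, ["Maria"]))
# ['Kudos to Maria for the release']
```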
I am building a recommendation system for dishes. Consider a user who eats french fries and rates them a 5. Then I want to give a good rating to all the ingredients that the dish is made of. In the case of french fries, the linked words should be "fried", "potato", "junk food", "salty", and so on. From the word Tsatsiki I want to extract "Cucumbers", "Yoghurt", "Garlic"; from Yoghurt I want to extract milk product, from Cucumbers vegetable, and so on.
What is this problem called in Natural Language Processing and is there a way to address it?
I have no data at all, and I am thinking of building a web crawler that analyzes the web for each dish. I would like the solution to be as little ad hoc as possible and not necessarily limited to English. Is there a way, maybe within deep learning, to do this? I would like a dish to be linked not only to its ingredients but also to a category: junk food, vegetarian, Italian food, and so on.
This type of problem is called ontology engineering or ontology building. For an example of a large ontology and how it's structured, you might check out something like YAGO. It seems you are going to build a boutique ontology for food and then overlay a ratings system on top of it. I don't know of any ontology in exactly the form you're looking for, but there are relevant resources you should take a look at, for example this OWL-based food ontology and this recipe ontology.
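A toy sketch of that overlay: a tiny is-a / has-ingredient ontology as a dict, with a dish's rating propagated recursively to its ingredients and categories. The edges below are hand-written assumptions for illustration, not taken from YAGO or the linked food ontologies:

```python
# Hand-written toy ontology: each node lists its ingredients and categories.
ONTOLOGY = {
    "french fries": {"ingredients": ["potato", "oil", "salt"],
                     "categories": ["junk food"]},
    "tsatsiki":     {"ingredients": ["cucumber", "yoghurt", "garlic"],
                     "categories": ["greek food"]},
    "yoghurt":      {"ingredients": ["milk"],
                     "categories": ["milk product"]},
}

def propagate_rating(dish, rating, scores=None):
    """Push a dish's rating down to its ingredients and categories, recursively."""
    scores = scores if scores is not None else {}
    node = ONTOLOGY.get(dish, {})
    for linked in node.get("ingredients", []) + node.get("categories", []):
        scores[linked] = max(scores.get(linked, 0), rating)
        propagate_rating(linked, rating, scores)   # e.g. yoghurt -> milk
    return scores

print(propagate_rating("tsatsiki", 5))
```

Rating tsatsiki a 5 gives cucumber, yoghurt, garlic, greek food a 5, and via the yoghurt node also milk and milk product.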
Do you have a recipe like that:
Ingredients:
*Cucumbers
*Garlic
*Yoghurt
or like that:
Grate a cucumber or chop it. Add garlic and yoghurt.
If the former, your features have already been extracted. The next step would be to convert each recipe to a vector and recommend other recipes; the simplest way would be (unsupervised) clustering of recipes.
If the latter, I suspect you can get away with a simple rule of thumb. First, use a part-of-speech tagger to extract all the nouns in the recipe. This will pull out all the ingredients and a bit more (e.g. kitchen appliances, cutlery, etc.). Then look up the nouns in a database of food ingredients such as this one.
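The rule of thumb can be sketched as below. To keep the example self-contained, plain tokenization plus the lookup replaces the POS-tagging step (with NLTK installed, `nltk.pos_tag` would do the noun filtering properly, and the ingredient set would be a real database rather than this hand-written stand-in):

```python
# Hand-written stand-in for an ingredient database lookup.
INGREDIENT_DB = {"cucumber", "garlic", "yoghurt", "potato", "salt"}

def extract_ingredients(recipe_text):
    """Tokenize the recipe and keep tokens found in the ingredient database."""
    tokens = recipe_text.lower().replace(".", " ").replace(",", " ").split()
    return [t for t in tokens if t in INGREDIENT_DB]

print(extract_ingredients("Grate a cucumber or chop it. Add garlic and yoghurt."))
# ['cucumber', 'garlic', 'yoghurt']
```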