We have to extract an entity that is nested inside another entity. Any idea how we can annotate the training data to train an NER model for this task? We are using Flair for custom entity training and prediction.
Example text: "Address: 123, ABC Company, 4th floor, xyz street, state, country."
In a sample like this, the whole text is itself an entity of type "Address", and within the same text there is another entity of type "Company Name".
To train a Flair model we convert the data into BIOES format, but we are not sure how to annotate nested entities and train the model on them.
The solution we came up with for this scenario is to train two models, one for addresses and the other for company names.
Please share your approach if there is a better way to handle this kind of scenario.
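A single BIOES column cannot hold two overlapping labels, which is why nested entities are awkward. One way to annotate them for the two-model setup described above is to keep the annotations as character spans and derive one BIOES tag layer per entity type. The plain-Ruby sketch below makes no Flair calls; the tokenizer and the span offsets are assumptions chosen for this example:

```ruby
# Sketch: derive one BIOES tag layer per entity type from character spans,
# so a nested entity (Company Name inside Address) gets its own layer.
# The spans and the tokenizer are illustrative assumptions, not Flair code.

Token = Struct.new(:text, :start, :stop) # stop is an exclusive char offset

# Split into words and single punctuation marks, keeping character offsets.
def tokenize(text)
  tokens = []
  text.scan(/\w+|[^\w\s]/) do
    m = Regexp.last_match
    tokens << Token.new(m[0], m.begin(0), m.end(0))
  end
  tokens
end

# BIOES-tag the tokens fully covered by each span carrying the given label.
def bioes_layer(tokens, spans, label)
  tags = Array.new(tokens.size, "O")
  spans.each do |span_start, span_stop, span_label|
    next unless span_label == label
    covered = tokens.each_index.select do |i|
      tokens[i].start >= span_start && tokens[i].stop <= span_stop
    end
    next if covered.empty?
    if covered.size == 1
      tags[covered.first] = "S-#{label}"
    else
      tags[covered.first] = "B-#{label}"
      covered[1...-1].each { |i| tags[i] = "I-#{label}" }
      tags[covered.last] = "E-#{label}"
    end
  end
  tags
end

text = "123, ABC Company, 4th floor, xyz street, state, country."
spans = [
  [0, text.length, "Address"], # the whole text is the address
  [5, 16, "CompanyName"]       # "ABC Company", nested inside it
]

tokens = tokenize(text)
address_tags = bioes_layer(tokens, spans, "Address")
company_tags = bioes_layer(tokens, spans, "CompanyName")

# One line per token: text, Address layer, CompanyName layer.
tokens.each_with_index do |tok, i|
  puts [tok.text, address_tags[i], company_tags[i]].join(" ")
end
```

Each layer can then be fed to its own model, or, for example, stored as two separate tag columns of one CoNLL-style file so both models read the same corpus.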
I have a Request model. A request has one classification. What I want to set up is storing a set of form fields in the DB, with their types, names, etc. Different classifications will have different form fields for the user to fill out on a request form. So ultimately a user creates a new request with classification C and is presented with a form containing the appropriate fields for classification C.
I would like the values to be stored in a table with the request. My question is: how should this be modeled?
Request has one classification.
Classification has_many requests.
I'm just not sure what to do with the dynamic form fields. I would like to be able to create the fields and attach them to classifications. So if first name and last name are needed, I wouldn't have to create them for every classification: I would just create them once and associate them with a classification through a join table.
Looking for advice on how to model this and how to easily reference the fields from a request.
Thanks! Any info or thoughts are appreciated.
I would say that you should first try to model it according to the relational model as far as possible.
# beware: a model named Request may clash with the `request` method available in controllers
class Request < ApplicationRecord
  belongs_to :classification
end

class Classification < ApplicationRecord
  has_many :requests
end
Model everything you know you can normalize. It's usually more than you think.
Data that doesn't adhere to a fixed schema can then be handled in a few ways:
Just define all the fields and live with a few nulls here and there.
The Entity–attribute–value (EAV) pattern. This classic approach uses a separate table where each row represents one attribute value for a classification, e.g. rails g model ClassificationAttribute classification:references attr_name attr_value. It has largely been made obsolete by JSON data types.
A JSON/JSONB column. This additional column would be used to shove any unstructured data that cannot be normalized.
Serialized data columns. These have also largely been made obsolete by JSON/JSONB.
All of these can be combined with the Single Table Inheritance pattern.
If classifications can be broken down into a limited number of variants, you could consider Multiple Table Inheritance, where you store the base data in the classifications table and use separate tables for the more specific data. The Rails delegated_type feature can be used for this.
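To make the EAV versus JSON comparison concrete, here is a plain-Ruby sketch (no ActiveRecord) that pivots EAV-style rows, shaped like the ClassificationAttribute table from the EAV bullet, into one hash per classification. That hash is roughly what a JSON/JSONB column would store directly. The row values are made up for illustration:

```ruby
# Sketch: EAV rows (one row per attribute value) pivoted into one hash
# per owning record, the shape a JSON/JSONB column would hold directly.
# Rows mimic a classification_attributes table; the data is invented.

EavRow = Struct.new(:classification_id, :attr_name, :attr_value)

rows = [
  EavRow.new(1, "first_name", "Arthur"),
  EavRow.new(1, "last_name", "Smith"),
  EavRow.new(2, "company", "ABC Company")
]

# Group rows by owner, then turn each group's name/value pairs into a hash.
def pivot(rows)
  rows.group_by(&:classification_id).transform_values do |group|
    group.to_h { |row| [row.attr_name, row.attr_value] }
  end
end

pivoted = pivot(rows)
# pivoted[1] => {"first_name"=>"Arthur", "last_name"=>"Smith"}
```

Every pivot like this is extra query and application work, and all values come back as whatever type attr_value was declared with (a string here), which is a large part of why a JSON column is usually the simpler choice.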
Your question is rather confusing and it is hard to understand what you are trying to achieve, but here are a few remarks:
You say "Request has one classification. Classification has_many requests". But if Request has one classification, then Classification should belong to Request. That way the Classification model holds a field called request_id (the foreign key) that ActiveRecord uses to link the two models together (the child model is the one holding the foreign key).
If each is the parent of the other (has_one or has_many), then where is the foreign key?
Dynamic fields are not really possible: your database schema is hard-coded. Each field is declared as a column in the relational database, and Rails' ActiveRecord lets you access and validate it easily. There is one workaround, though: have one of the models hold a JSON or JSONB column. Instead of one of the common types (string, text, integer...), the column is of JSON type and holds a value that Rails converts to a hash:
{
  "first_name" => "Arthur",
  "last_name" => "Smith",
  "age" => "23"
}
This is pretty convenient for shopping carts, as you can save an actual list of items rather than an association. With an association you would need to version changes to your items (for example when an item's price changes), which takes some careful engineering.
The question is: is this what you really want? This option doesn't fit every app or use case.
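If you do go the JSON route, the values no longer get per-column validation for free. The plain-Ruby sketch below is not ActiveRecord code; FieldDefinition, the classification names and the field list are hypothetical. It shows the kind of check you would then write yourself: comparing a request's JSON-style hash against the fields its classification requires, with the same field definitions reused across classifications as the question describes.

```ruby
# Sketch: shared field definitions attached to classifications (as a join
# table would do), plus a manual check of a request's JSON-style values.
# All names here are hypothetical, not part of Rails.

FieldDefinition = Struct.new(:name, :type)

FIRST_NAME = FieldDefinition.new("first_name", :string)
LAST_NAME  = FieldDefinition.new("last_name", :string)
AGE        = FieldDefinition.new("age", :integer)

# Each classification reuses the same field definitions.
CLASSIFICATION_FIELDS = {
  "hr_request" => [FIRST_NAME, LAST_NAME, AGE],
  "it_request" => [FIRST_NAME, LAST_NAME]
}.freeze

# Returns a list of error messages; an empty list means the values are valid.
def validate_values(classification, values)
  fields = CLASSIFICATION_FIELDS.fetch(classification)
  errors = []
  fields.each do |field|
    value = values[field.name]
    if value.nil? || value.to_s.strip.empty?
      errors << "#{field.name} is missing"
    elsif field.type == :integer && !value.to_s.match?(/\A\d+\z/)
      errors << "#{field.name} must be an integer"
    end
  end
  # Keys that the classification does not define are rejected too.
  (values.keys - fields.map(&:name)).each do |key|
    errors << "#{key} is not a field of #{classification}"
  end
  errors
end

puts validate_values("it_request", { "first_name" => "Arthur" }).inspect
```

In a Rails app, a check like this would typically live in a custom validation method registered with `validate` on the model that owns the JSON column.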
You also say the request depends on the classification. I mentioned the foreign key problem above, but it seems odd that a record's behavior is set by a direct relationship. Who creates the classification? One of the app's users (a User? an Admin?), or is it seeded by the app creator (in which case Classification is a standalone model)? In the latter case the classification preexists the Request, and maybe a has_and_belongs_to_many association (a join table) would fit better...
Maybe give us a clearer view of what you want to achieve, with real-life examples, so we can help further.
I am using spaCy to train my own NER model. In addition to the entities recognized by spaCy's base 'en_core_web_sm' model (ORG, PERSON, DATE, etc.), I want to add my own entities. I trained my model with 'en_core_web_sm' as the base model, but the resulting model only detects my custom entities, not the built-in ones. Is there any way to do this? Thanks.
You can definitely do this with spaCy; see the docs, and also check Matt's blog post about the problem of "catastrophic forgetting" (when your model "forgets" the old entity types it knew before, which you obviously want to avoid).
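One commonly used mitigation for catastrophic forgetting is pseudo-rehearsal: run the original model over your training texts, keep its predictions (ORG, PERSON, DATE...) as extra annotations, and train on those together with your new labels. The plain-Ruby sketch below only shows the merging step; it does not call spaCy, and the spans are hard-coded stand-ins for what the base model and your annotators would produce. Gold custom spans win whenever they overlap a base-model prediction, because NER training data cannot contain overlapping entities.

```ruby
# Sketch of the merge step in pseudo-rehearsal: combine entity spans
# predicted by the base model with new gold spans, dropping any base
# prediction that overlaps a gold span.
# Spans are [start_char, end_char, label] with an exclusive end.

def overlaps?(a, b)
  a[0] < b[1] && b[0] < a[1]
end

def merge_spans(base_spans, gold_spans)
  kept = base_spans.reject do |base|
    gold_spans.any? { |gold| overlaps?(base, gold) }
  end
  (kept + gold_spans).sort_by(&:first)
end

text = "Apple hired Jane Doe as ML engineer in 2021"
# Stand-in for base-model output; "ML" mislabeled as ORG on purpose.
base = [[0, 5, "ORG"], [12, 20, "PERSON"], [24, 26, "ORG"], [39, 43, "DATE"]]
# Stand-in for your own annotation with a custom label.
gold = [[24, 35, "JOB_TITLE"]]

merged = merge_spans(base, gold)
merged.each { |s, e, label| puts "#{text[s...e]} -> #{label}" }
```

The merged spans would then be turned into spaCy training examples in the usual way; the exact conversion depends on your spaCy version.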
The ANN model I am working on must recognize a specific object in an image, and only this one. Since the model has to give the probability that the object is or isn't in the image, how should I organize my dataset?
Can I split my data into 2 categories, "right object" and an "other" category gathering random pictures, or do I have to create several "other" categories such as "birds", "devices", and so on?
Thanks.
EDIT: I did not find any post here, or any website, providing useful tips on how to create a good image dataset.
Okay, I realized my question was not that relevant, because it can be answered after a few experiments and tests.
It is essential to have different categories for the "other" objects, as they do not share the same features or shapes. Lumping several kinds of objects into the same category can be very detrimental to the model's accuracy...
The priority (at least in my case) is to deal with objects that are similar to the ones I need to recognize.
For example, if I want to detect a Blu-ray player, I will have many electronic-device categories to help my model see the difference, such as "keyboard", "screen" and "computer".
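With several "other" categories the network becomes a multi-class classifier, but you can still read off the probability the question asks for: it is the softmax probability of the target class (equivalently, one minus the summed probabilities of all "other" classes). A minimal plain-Ruby sketch, with made-up logits standing in for a network's output and class names taken from the Blu-ray example:

```ruby
# Sketch: turn multi-class logits into "is the target object present?".
# Class names follow the Blu-ray example; logits are invented for illustration.

CLASSES = %w[blu_ray_player keyboard screen computer].freeze
TARGET = "blu_ray_player"

# Numerically stable softmax: subtract the max logit before exponentiating.
def softmax(logits)
  max = logits.max
  exps = logits.map { |x| Math.exp(x - max) }
  total = exps.sum
  exps.map { |e| e / total }
end

# Probability mass assigned to the target class.
def target_probability(logits)
  softmax(logits)[CLASSES.index(TARGET)]
end

p_target = target_probability([2.0, 0.5, 0.1, -1.0])
puts format("P(blu_ray_player) = %.3f", p_target)
```

Training on fine-grained "other" classes and collapsing them only at prediction time keeps the distinctions the answer recommends while still yielding a single yes/no probability.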
I want to implement a tool that can transform one data model into another (or at least suggest how). For example, one data model has two fields, 'first_name' and 'last_name', and a second data model has one field, 'name'. The tool should then be able to suggest concatenating 'first_name' and 'last_name'. Another use case is suggesting a mapping between two fields whose names differ but that mean the same thing, for example mapping 'supplier_id' in the source data model to 'customer_id' in the destination data model (a typical supplier relationship management scenario).
I have seen approaches based on machine learning and heuristics, and some research papers on semantic translation/mapping as well, but couldn't find anything concrete.
Any pointers to get started would be highly appreciated.
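As a starting point before any machine learning, a purely name-based heuristic can already produce both kinds of suggestion from the question: split field names into tokens, normalize them through a synonym table, and propose a source field when its tokens match a destination field, or a concatenation when several source fields all contain the destination's tokens. The Ruby sketch below is illustrative only; the SYNONYMS table (including supplier to customer) is a hypothetical hand-written input, and real schemas would also need evidence from data types and values.

```ruby
# Sketch: suggest source -> destination field mappings from field names only.
# SYNONYMS is a hypothetical, hand-maintained table.

SYNONYMS = { "supplier" => "customer", "surname" => "last" }.freeze

def tokens(field)
  field.downcase.split("_").map { |t| SYNONYMS.fetch(t, t) }
end

# Returns { destination_field => [source_fields...] }.
def suggest_mappings(source_fields, dest_fields)
  dest_fields.each_with_object({}) do |dest, suggestions|
    dest_tokens = tokens(dest)
    exact = source_fields.find { |src| tokens(src) == dest_tokens }
    if exact
      suggestions[dest] = [exact]
      next
    end
    # Several source fields that all contain the destination's tokens are
    # candidates for concatenation (e.g. first_name + last_name -> name).
    partial = source_fields.select { |src| (dest_tokens - tokens(src)).empty? }
    suggestions[dest] = partial if partial.size > 1
  end
end

suggestions = suggest_mappings(%w[first_name last_name supplier_id], %w[name customer_id])
# => {"name"=>["first_name", "last_name"], "customer_id"=>["supplier_id"]}
```

A single partial match is deliberately not suggested here; a fuller tool would more likely rank all candidates by a similarity score and let a user confirm them.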
Please help me understand the difference between Named Entity Recognition and Named Entity Extraction.
Named Entity Recognition is the recognition of the surface form of an entity (person, place, organization), e.g. "George Bush" or "Barack Obama" are "PERSON" entities in this text string.
Entity Extraction extracts additional information, as attributes, from the text string. For example, in the sentence "George W. Bush was president before President Obama" it would recognize "Obama" as a person with the attribute "title=president".
But if you look at software the distinction is often blurred.
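To illustrate the distinction drawn above, the sketch below takes the output of a recognition step as given (hard-coded PERSON spans, since no real NER model is called) and performs a toy "extraction" step that attaches a title attribute when the preceding word is a known title. The title list and the rule are hypothetical simplifications:

```ruby
# Sketch: recognition gives labeled spans; extraction adds attributes.
# The recognized spans are hard-coded here instead of coming from a model.

Entity = Struct.new(:text, :label, :start, :attributes)

TITLES = %w[president senator doctor].freeze

# Toy extraction rule: if the word right before a PERSON is a known title,
# record it as a "title" attribute.
def extract_attributes(sentence, entities)
  entities.map do |ent|
    attrs = {}
    if ent.label == "PERSON"
      previous_word = sentence[0...ent.start].split.last.to_s.downcase
      attrs["title"] = previous_word if TITLES.include?(previous_word)
    end
    Entity.new(ent.text, ent.label, ent.start, attrs)
  end
end

sentence = "George W. Bush was president before President Obama"
recognized = [
  Entity.new("George W. Bush", "PERSON", 0, {}),
  Entity.new("Obama", "PERSON", sentence.index("Obama"), {})
]
extracted = extract_attributes(sentence, recognized)
extracted.each { |e| puts "#{e.text}: #{e.label} #{e.attributes}" }
```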
There is no such thing as Named Entity Extraction.
To put it more precisely, I would say that Named Entity Extraction is simply the concrete process of extracting previously recognized named entities. So, in a sense, there is no real theoretical knowledge relevant to this task; it is just a matter of defining the mechanical operation.
If we are instead interested in extracting all the specific entities, or the additional information regarding them, from a piece of text, then we have to look at information or knowledge extraction.
For information extraction you could, for example, ask to extract all the names of cities, or e-mail addresses, that appear in a corpus of documents. Named Entity Extraction could be used for such a task. You could even go much more generic and simply ask to extract general knowledge, for example in the form of relations (relation extraction).
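For a pattern-like target such as e-mail addresses, extraction over a corpus can be as simple as a regular expression. A minimal Ruby sketch; the pattern is a deliberately simplified assumption and does not cover every address that RFC 5322 allows:

```ruby
# Sketch: extract e-mail addresses from a small corpus with a simplified
# regular expression (not a full RFC 5322 validator).

EMAIL_PATTERN = /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/

# Collect every match across all documents, without duplicates.
def extract_emails(documents)
  documents.flat_map { |doc| doc.scan(EMAIL_PATTERN) }.uniq
end

corpus = [
  "Contact sales@example.com or support@example.org for help.",
  "Reply to sales@example.com by Friday."
]
emails = extract_emails(corpus)
# => ["sales@example.com", "support@example.org"]
```

Targets without a fixed surface pattern, such as city names, are where an NER model comes in instead.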
For more details I would suggest the Natural Language Processing chapter of the book Artificial Intelligence: A Modern Approach.