I am using spaCy to train my own NER model. In addition to the entities recognized by spaCy's base 'en_core_web_sm' model (ORG, PERSON, DATE, etc.), I want to add my own entities. I trained my model with 'en_core_web_sm' as the base model, but the resulting model detects only my custom entities, not the base ones. Is there any way to keep both? Thanks.
You can definitely do this with spaCy; see the docs, and also check Matt's blog post on the problem of "catastrophic forgetting" (when your model "forgets" the old entity types it knew before, which you obviously want to avoid).
Related
I have a request model. A request has one classification. What I want to set up is a way to store a bunch of form fields in the DB: their types, names, etc. Different classifications will have different form fields for the user to fill out on a request form. So ultimately, a user creates a new request with classification C and is presented with a form containing the appropriate fields for classification C.
I would like the values stored in a table with the request. My question is how should this be modeled?
Request has one classification.
Classification has_many requests.
I'm just not sure what to do with the dynamic form fields. I would like to be able to create the fields and attach them to the classification. So if first name and last name are needed, I wouldn't have to create them for every classification; I would create them once and associate them with a classification through a join table.
Looking for advice on how to model this out and be able to easily reference them from a request.
Thanks! Any info or thoughts are appreciated.
I would say that you should first try to model it according to the relational model as far as possible.
# Beware of potential conflicts with this name: `request` clashes with a core method in controllers.
class Request < ApplicationRecord
  belongs_to :classification
end
class Classification < ApplicationRecord
  has_many :requests
end
Model everything you know you can normalize. It's usually more than you think.
Data that doesn't adhere to a fixed schema can then be handled in a few ways:
Just define all the fields and live with a few nulls here and there.
The Entity–attribute–value (EAV) pattern. This classic approach consists of a separate table where each row represents one value for a classification, e.g. rails g model ClassificationAttribute classification:references attr_name attr_value. This has largely been made obsolete by JSON data types.
A JSON/JSONB column. This additional column is used to hold any unstructured data that cannot be normalized.
Serialized data columns. This has also been made obsolete by JSON/JSONB.
All of these can be combined with the Single Table Inheritance pattern.
If classifications can be broken down into a limited number of variants, you could consider Multiple Table Inheritance, where you store the base data in the classification table and use separate tables for the more specific data. Rails' delegated_type feature can be used for this.
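To make the JSON-column option concrete for the question's case, here is a minimal plain-Ruby sketch (no Rails, no database). The names FieldDefinition, fields, field_values and missing_fields are hypothetical and chosen only for illustration; the array of definitions stands in for a join table, and the hash stands in for what a JSON/JSONB column on the request would hold.

```ruby
# Plain-Ruby sketch. In a Rails app these would be ActiveRecord models and
# `field_values` would be a JSON/JSONB column. All names are hypothetical.
FieldDefinition = Struct.new(:name, :field_type, :required)

class Classification
  attr_reader :name, :fields

  def initialize(name, fields)
    @name = name
    @fields = fields # shared FieldDefinition objects (stands in for a join table)
  end
end

class Request
  attr_reader :classification, :field_values

  def initialize(classification, field_values = {})
    @classification = classification
    @field_values = field_values # stands in for the JSON column
  end

  # Names of required fields that have no value yet.
  def missing_fields
    classification.fields
                  .select(&:required)
                  .map(&:name)
                  .reject { |name| field_values.key?(name) }
  end
end

# Field definitions are created once and reused across classifications.
first_name = FieldDefinition.new("first_name", :string, true)
last_name  = FieldDefinition.new("last_name", :string, true)
serial_no  = FieldDefinition.new("serial_no", :string, false)

hardware = Classification.new("Hardware", [first_name, last_name, serial_no])
access   = Classification.new("Access", [first_name, last_name])

request = Request.new(hardware, "first_name" => "Arthur")
puts request.missing_fields.inspect
```

The form for a request can then be built by iterating over request.classification.fields, and the submitted values live with the request itself.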
Your question is rather confusing, and it is hard to understand what you are trying to achieve. But here are a few remarks:
You say "Request has one classification. Classification has_many requests." But if a Request has one classification, then Classification should belong to Request. That way the Classification model holds a field called request_id (the foreign key) that lets ActiveRecord link the two models together. (The child model is the one holding the foreign key.)
If each is the parent of the other (has_one or has_many), then where is the foreign key?
Dynamic fields are not possible as such. Your database is hard-coded: each field is declared in the relational database, and Rails' ActiveRecord lets you access and validate it easily. There is a workaround, though: give one of the models a JSON or JSONB column. Instead of being one of the common types (string, text, integer, ...), the value is of JSON type and holds data that Rails converts to a hash:
{
first_name: "Arthur",
last_name: "Smith",
age: "23"
}
This is pretty convenient for shopping carts, as you can save an actual list of items rather than an association. With an association you would need to version changes to your items (when the price of an item changes, for example), which takes some good engineering.
The question is: is this what you really want to do? This option doesn't fit all apps or uses.
You also say the request depends on the classification. I have mentioned the foreign key problem above. But it seems odd that the behavior of one of your records is determined by a direct relationship. Who creates the classification? Is it one of the app's models, such as the User? An Admin? Or is it seeded by the app creator (in which case Classification is a standalone model)? In that case the classification exists before the Request, and maybe a has_and_belongs_to_many association (a join table) would fit better...
Maybe give us a clearer view of what you want to achieve, with real-life examples, so we can help further.
We have to extract an entity that is nested inside another entity. Any idea how we can annotate the training data to train a NER model for this task? We are using a Flair model for custom entity training and prediction.
Ex: Text: "Address: 123, ABC Company, 4th floor, xyz street, state, country."
We have a sample like this, where the whole text is itself an entity of type "Address", and the same text contains another entity called "Company Name".
To train a Flair model, we convert the data into BIEO format, but we are not sure how to annotate the data and train the model.
We came up with a solution for this scenario by training two models, one for the address and the other for the company name.
Please comment with your approach on how we could handle this kind of scenario in a better way.
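One way to picture the annotation problem is to keep one tag column per entity type, so the nested spans never compete for the same tag. Below is a minimal plain-Ruby sketch of that idea; the helper span_tags, the token list, and the token-index spans are illustrative assumptions, and how the resulting columns are fed to Flair is deliberately left open.

```ruby
# Build B/I/E/O tags for one annotation layer from inclusive token-index spans.
# A single-token span gets only a B- tag here; that choice is an assumption.
def span_tags(token_count, spans)
  tags = Array.new(token_count, "O")
  spans.each do |first, last, label|
    (first..last).each do |i|
      prefix =
        if i == first
          "B"
        elsif i == last
          "E"
        else
          "I"
        end
      tags[i] = "#{prefix}-#{label}"
    end
  end
  tags
end

tokens = ["Address", ":", "123", ",", "ABC", "Company", ",", "4th", "floor"]

# One layer per entity type, so overlapping spans each get their own column.
address_layer = span_tags(tokens.size, [[2, 8, "ADDRESS"]])
company_layer = span_tags(tokens.size, [[4, 5, "COMPANY"]])

# One token per line, followed by both tag columns.
rows = tokens.each_index.map do |i|
  [tokens[i], address_layer[i], company_layer[i]].join(" ")
end
puts rows
```

Each column can then serve as the target for its own tagger, which matches the two-model approach described above while keeping a single annotated file.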
I have an entity called Item. It has an attribute title, and I want it to have a collection of subitems (also of type Item).
One item can have many (sub)items. A (sub)item is part of exactly one item. For example, there is an item titled car. It has subitems titled wheels, engine and cabin. Cabin has subitems seat and steering wheel.
How do I model this? Should I set an inverse for subitems? If I set no inverse, I get a warning. And whether there is an inverse or not, the relationship is still many-to-many; I see no way to make it one-to-many.
How should I think about this problem? I don't have much experience with databases, and I think there is also a difference between modeling in Core Data and in SQL.
EDIT: There should be subitems instead of subitem in the picture
I've added a relationship superitem as the inverse of subitems. superitem is a to-one relationship with the nullify delete rule, and subitems is a to-many relationship with the cascade delete rule. This seems to be the best solution for my case. As a bonus, I don't have to write my own - addSubitem: method (as it is not generated for Swift), because the subitem is added automatically when I set an item's superitem.
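The shape of that solution can be sketched outside Core Data. The following is a plain-Ruby analogy, not Core Data code: the class, the superitem= setter, and subtree_titles are written by hand here purely to illustrate how a to-one superitem and a to-many subitems stay in sync, and what a cascade from the parent would cover.

```ruby
# Plain-Ruby analogy of a self-referential one-to-many relationship.
class Item
  attr_reader :title, :superitem, :subitems

  def initialize(title, superitem = nil)
    @title = title
    @superitem = nil
    @subitems = []
    self.superitem = superitem
  end

  # Setting the to-one side keeps the to-many inverse in sync.
  def superitem=(new_parent)
    @superitem.subitems.delete(self) if @superitem
    @superitem = new_parent
    new_parent.subitems << self if new_parent
  end

  # Titles of this item and everything below it
  # (the set a cascade delete from this item would cover).
  def subtree_titles
    [title] + subitems.flat_map(&:subtree_titles)
  end
end

car    = Item.new("car")
wheels = Item.new("wheels", car)
engine = Item.new("engine", car)
cabin  = Item.new("cabin", car)
seat   = Item.new("seat", cabin)
wheel  = Item.new("steering wheel", cabin)

puts car.subtree_titles.inspect
```

Because every item has at most one superitem, the relationship is one-to-many by construction, even though both ends are of the same type.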
Object modeling and relational database design are quite different, at least on the surface. The concepts of encapsulation, inheritance, and polymorphism have no exact analog in the relational data model. You are going to have to think about the problem in two different ways in order to do both object modeling and relational database design.
There is a model that is sort of half way between them. It's called the "Entity Relationship model", and this has been around almost as long as the relational model. This is useful for thinking about the problem and analyzing the data requirements at a conceptual level. ER modeling is very parallel to object modeling, except that object modeling models behavior as well as data, and ER modeling only models data.
The problem with learning ER modeling for this purpose is that in the present state of affairs, most of the professionals who use ER diagrams do not use them to depict a conceptual model. They use them to depict a relational design for a database. So if you learn ER modeling from them, you'll learn a design methodology, and not an analysis methodology.
Data analysis and database design are really very different activities, and it's useful to keep them separate in your mind, even if a single project requires you to do both of them. Oddly enough, the same division ultimately comes up in object modeling as well. Some object models are analysis models, and try to clarify the problem space. Other object models are design models, and try to clarify the solution space.
Acknowledging what Mitty said: you need to wrap your brain around objects (not relational tables). Considering your example, I would break it down as follows. The top-level object is an item such as a car, truck, airplane, boat, etc. Items can have systems such as engines, transmissions, and cabins. Systems can have components such as pistons, spark plugs, seats, steering wheels, and tires. If you think of all these things as objects, then perhaps the beginning of a model would look like this:
An item may have many systems. Systems may have many components. Apple recommends setting the inverse, but you should worry more about the relationships and their cardinality (i.e. one-to-one, one-to-many). You can use a reflexive relationship (to self) as you depicted, but I think that limits your ability to really leverage the power of the object model as all 'things' would be represented as 'item' and you wouldn't have the nice distinction of system and component (IMO)
Please help me understand the difference between Named Entity Recognition and Named Entity Extraction.
Named Entity Recognition is recognition of the surface form of an Entity (person, place, organization), i.e. "George Bush" or "Barack Obama" are "PERSON" entities in this text string.
Entity Extraction will extract additional information as attributes from the text string. For example in the sentence "George W. Bush was president before President Obama" recognizing "Obama" as a person with attribute "title=president".
But if you look at software the distinction is often blurred.
There is no such thing as Named Entity Extraction.
To put it better, I would say that Named Entity Extraction is simply the process of concretely extracting previously recognized named entities. So, in a sense, there is no real theoretical knowledge relevant to this task; it is just a matter of defining the mechanical operation.
If we are instead interested in extracting all the specific entities, or additional information about them, from a piece of text, then we have to look at information extraction or knowledge extraction.
For information extraction you could, for example, ask to extract all the names of cities, or e-mail addresses, that appear in a corpus of documents. For such a task Named Entity Extraction could be used. You could even go much more generic and simply ask to extract general knowledge, for example in the form of relations (relation extraction).
For more details I would suggest the Natural Language Processing chapter of the book Artificial Intelligence: A Modern Approach.
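As a rough illustration of the purely mechanical extraction step described earlier in this answer, here is a minimal plain-Ruby sketch. It assumes a recognition step has already produced character spans of the form [start, end_exclusive, label]; the helper extract_entities and the span format are hypothetical, not taken from any particular library.

```ruby
# Spans are assumed to come from an earlier recognition step,
# as [start, end_exclusive, label] character offsets.
def extract_entities(text, spans)
  spans.map { |start, stop, label| { text: text[start...stop], label: label } }
end

text  = "George W. Bush was president before President Obama"
spans = [[0, 14, "PERSON"], [46, 51, "PERSON"]]

entities = extract_entities(text, spans)
entities.each { |entity| puts "#{entity[:label]}: #{entity[:text]}" }
```

All the linguistic work sits in producing the spans; the extraction itself is only slicing.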
I'm not sure if this question is nonsense or not; please tell me if so. I am wondering: do I create one model per table in my database, or one per controller? Is there something I am missing here?
From what I have read, the Model is supposed to be a representation of the real business objects, so should I make one big model, or divide it up based on things in the app, or on the real user/client perception of the data?
Thanks for your advice.
There's nothing wrong with controllers sharing models. But trying to serve every controller with the same model doesn't make sense.
Models and controllers really aren't related, nor should they be. Models also aren't directly related to how data is stored in your application.
Models encapsulate data. Their design should be dictated by the data they are encapsulating. The demands of the system dictate what models you'll need and what data they must hold.
Don't overthink it. For a given request, determine what you need to show in your view and how it will be displayed. Determine what an appropriate model would look like for this scenario. If one already exists, use it. If not, create a new model. Save the overengineering for later, when you know what your needs are and can find commonalities between models.
Models can also contain other models, that's fine. Think of a model for a sales report. You would have a model for the report which would contain not only a report name, a total, but also a collection of other models which make up the report's line items.
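A minimal plain-Ruby sketch of that sales report model might look like the following. The names SalesReport, LineItem and total_cents are hypothetical, and amounts are kept in integer cents purely as an illustrative choice.

```ruby
# A line item is itself a small model held by the report model.
LineItem = Struct.new(:description, :amount_cents)

class SalesReport
  attr_reader :name, :line_items

  def initialize(name, line_items)
    @name = name
    @line_items = line_items
  end

  # The report total is derived from the contained line-item models.
  def total_cents
    line_items.sum(&:amount_cents)
  end
end

report = SalesReport.new(
  "Q1 sales",
  [LineItem.new("Widgets", 12_500), LineItem.new("Gadgets", 7_500)]
)
puts "#{report.name}: #{report.total_cents}"
```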
It depends on what you mean by "Model". If by model you mean the business rule layer of your application, then there is no relationship in terms of numbers. One model of that type is used for any number of views you need to build.
Personally, however, I would not bind any view to any model, but create an intermediary layer called a ViewModel that essentially tailors the data from your model to fit a particular view. In that case, the relationship is one-to-one. This is essentially how Presenter patterns work. Every view is strongly typed to its own ViewModel, which is populated from the Model layer.
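As a rough sketch of the idea, assuming a hypothetical Customer model and a hypothetical "customer card" view, a ViewModel in plain Ruby could look like this. Ruby is not strongly typed, so this only illustrates the shaping of data, not the typing.

```ruby
# Hypothetical domain model with more data than any single view needs.
Customer = Struct.new(:first_name, :last_name, :email, :password_digest)

# Exposes only what the hypothetical "customer card" view displays.
class CustomerCardViewModel
  attr_reader :display_name, :email

  def initialize(customer)
    @display_name = "#{customer.first_name} #{customer.last_name}"
    @email = customer.email
  end
end

customer   = Customer.new("Arthur", "Smith", "arthur@example.com", "not-for-views")
view_model = CustomerCardViewModel.new(customer)
puts view_model.display_name
```

The view never sees password_digest, and a different view would get its own ViewModel built from the same model.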
Models do not necessarily have a literal correspondence with the database either. How you store your data is different from how your "Model" uses that data.