Which dataset organization for binary image classification? - machine-learning

The ANN model I am working on must recognize a specific object in an image, and only this one. As the model has to give the probability that the object is in the image or not, how should be the organization of my dataset?
Can I split my data into 2 categories: "right object" and an "other" gathering random pictures, or do I have to create several "other" categories such as "birds", "devices", and so on?
Thanks.
EDIT: I did not find any post here nor website providing interesting tips on how to create a good image dataset.

Okay, so I understood my question was not that relevant, because it can be solved after a few experiments and tests.
It is essential to have differents categories for the "other" objects, as they do not have the same features nor shapes. Creating a mixture of several objects in the same category can be very detrimental for the model's accuracy...
The priority (at least in my case) is to deal with objects that are similar to the ones I need to recognize.
For example, if I want to detect a Blu-ray player, I will have many electronic device categories to help my model to see the difference, like "keyboard", "screen", "computer" categories.

Related

The best way to keep multiple images in database

I use Ruby on Rails with postgresql and I see two options:
Model has field image, which type of is json (and keep them as array).
Model has_many_and_belongs_to related model like "Image".
Which of these two is more preferable?
I consider performance as the first and convenience as the second.
I would ask my self this question.
Is the image only relates to 1 model?
If, it will only be related to 1 model, like profile picture. Then, it will be best fit for first approach
Model has field image, which type of is json (and keep them as array).
If the image will be used for multiple models like emoji or letter_avatar. Then, it will be fit on the second approach
Model has_many_and_belongs_to related model like "Image".
Performance wise, it will be better in the first approach. You don't have to spend resource to fetch related models.

Fact table linked to Slowly Changing Dimension

I'm struggling to understand the best way to model a particular scenario for a data warehouse.
I have a Person dimension, and a Tenancy dimension. A person could be on 0, 1 or (rarely) multiple tenancies at any one time, and will often have a succession of tenancies over time. A tenancy could have one or more people associated with it. The people associated with a tenancy can change over time, and tenancies generally last for many years.
One option is to add tenancy reference, start and end dates to the Person Dimension as type 2 SCD columns. This would work well as long as I ignore the possibility of multiple concurrent tenancies for a person. However, I have other areas of the data warehouse where I am facing a similar design issue and ignoring multiple relationships is not a possibility.
Another option is to model the relationship as an accumulating snapshot fact table. I'm not sure how well this would work in practice though as I could only link it to one version of a Person and Tenancy (both of which will have type 2 SCD columns) and that would seem to make it impossible to produce current or historical reports that link people and tenancies together.
Are there any recommended ways of modelling this type of relationship?
Edit based on the patient answer and comments given by SQL.Injection
I've produced a basic model showing the model as described by SQL.Injection.
I've moved tenancy start/end dates to the 'junk' dimension (Dim.Tenancy) and added Person tenancy start/end dates to the fact table as I felt that was a more accurate way to describe the relationship.
However, now that I see it visually I don't think that this is fundamentally any different from the model that I started with, other than the fact table is a periodic snapshot rather than an accumulating snapshot. It certainly seems to suffer from the same flaw that whenever I update a type 2 slowly changing attribute in any of the dimensions it is not reflected in the fact.
In order to make this work to reflect current changes and also allow historical reporting it seems that I will have to add a row to the fact table every time a SCD2 change occurs on any of the dimensions. Then, in order to prevent over-counting by joining to multiple versions of the same entity I will also need to add new versions of the other related dimensions so that I have new keys to join on.
I need to think about this some more. I'm beginning to think that the database model is right and that it's my understanding of how the model will be used that is wrong.
In the meantime any comments or suggestions are welcome!
Your problem is similar to to the sale transactions with multiple item. The difference, is that a transaction usually has multiple items and your tenancy fact usually has a single person (the tenant).
Your hydra is born because you are trying to model the tenancy as a dimension, when you should be modeling it as a fact.
The reason why I think you have a tenancy dimension, is because somewhere you have a fact rent. To model the fact rent consider use the same approach i stated above, if two persons are tenants of the same property two fact records should be inserted each month:
1) And now comes some magic (that is no magic at all), split the value of the of the rent by the number of tenants and store it the fact
2) store also the full value of the rent (you don't know how the data scientist is going to use the data)
3) check 1) with the business user (i mean people that build the risk models); there might be some advanced rule on how to do the spliting (a similar thing happens when the cost of shipping is to be divided across multiple item lines of the same order -- it might not be uniformly distributed)

One-to-many relationship between same entity in Core Data

I have entity called Item. It has attribute title and I want it to have collection of subitems (type of Item).
One item can have many (sub)items. (sub)item is part of right one item. For example, there is item titled as car. It has subitems titled wheels, engine and cabine. Cabine has subitems seat and steering wheel.
How to model it? Should I set inverse to subitems? If I set no inverse, I'm getting warning. And whether it is inverse or not, it is still many-to-many. No way to set it one-to-many.
How should I think of this problem? I don't have much experience with databases and I think there is also difference between modeling in Core Data and in SQL.
EDIT: There should be subitems instead of subitem in the picture
I've added relationship superitem as inverse to subitems. superitem is to-one type with nullify delete rule and subitems is to-many type with cascade delete rule. Seems to be the most perfect solution for my case. As bonus I don't have to write my own - addSubitem: method (as it is not generated for Swift) because it is automatically added if I set item's superitem.
Object modeling and relational database design are quite different, at least on the surface. The concepts of encapsulation, inheritance, and polymorphism have no exact analog in the relational data model. You are going to have to think about the problem in two different ways in order to do both object modeling and relational database design.
There is a model that is sort of half way between them. It's called the "Entity Relationship model", and this has been around almost as long as the relational model. This is useful for thinking about the problem and analyzing the data requirements at a conceptual level. ER modeling is very parallel to object modeling, except that object modeling models behavior as well as data, and ER modeling only models data.
The problem with learning ER modeling for this purpose is that in the present state of affairs, most of the professionals who use ER diagrams do not use them to depict a conceptual model. They use them to depict a relational design for a database. So if you learn ER modeling from them, you'll learn a design methodology, and not an analysis methodology.
Data analysis and database design are really very different activities, and it's useful to keep them separate in your mind, even if a single project requires you to do both of them. Oddly enough, the same division ultimately comes up in object modeling as well. Some object models are analysis models, and try to clarify the problem space. Other object models are design models, and try to clarify the solution space.
Acknowledging what Mitty said. You need wrap your brain around objects (not relational tables). Considering your example I would break it down as follows. The top level object is an item such as a car, truck, airplane, boat, etc. Items can have systems such as engines, transmissions, cabins. Systems can have components such as pistons, spark plugs, seats, steering wheels, tires. If you think of all these things as objects, then perhaps the beginning of a model would look like this:
An item may have many systems. Systems may have many components. Apple recommends setting the inverse, but you should worry more about the relationships and their cardinality (i.e. one-to-one, one-to-many). You can use a reflexive relationship (to self) as you depicted, but I think that limits your ability to really leverage the power of the object model as all 'things' would be represented as 'item' and you wouldn't have the nice distinction of system and component (IMO)

What is the difference between Named Entity Recognition and Named Entity Extraction?

Please help me understand the difference between Named Entity Recognition and Named Entity Extraction.
Named Entity Recognition is recognition of the surface form of an Entity (person, place, organization), i.e. "George Bush" or "Barack Obama" are "PERSON" entities in this text string.
Entity Extraction will extract additional information as attributes from the text string. For example in the sentence "George W. Bush was president before President Obama" recognizing "Obama" as a person with attribute "title=president".
But if you look at software the distinction is often blurred.
There is no such a thing as Named Entity Extraction.
Paraphrasing better the sentence I would say that Named Entity Extraction is simple the process of concrete extracting previously recognized named entities. So, in a sense, there is no real theoretical knowledge that is relevant to this task, is just a matter of defining the mechanical operation.
If we are instead interested in extracting all the specific entities or the additional information regarding them from a piece text, than we have to look at information or knowledge extraction.
For information extraction you could for example ask to extract all the names of cities, or e-mail addresses, that appear in a corpus of documents. For such a task Named Entity Extraction could be used. You could even go much more generic, asking simply to extract general knowledge, for example in the form of relations (relation extraction).
For more details I would suggest the Natural Language Processing chapter of the book Artificial Intelligence: A Modern
Approach.

Modeling a database (ERD) that has quirky behavior

One of the databases that I'm working on has some quirky behavior that I want to account for in the entity-relationship diagram.
One of the behaviors is that there is a 'booking' table and a 'invoice' table. When a 'booking' is invoiced, then the record is inserted into the 'invoice' table and then deleted from the 'booking' table.
However, a reference is still kept of the booking number.
How do we model this? Big arrow between the tables and some text beside it describing what happens?
No, changing the database schema is not possible at this point in time
Edit: This is the type of diagram that I want to use:
alt text http://img813.imageshack.us/img813/5601/erdartistperformssong.png
Link
If, by ERD, you mean the original "Chen" diagrams where the relationship was words written in a diamond, then you have a relationship between between Booking and Invoice. It's a special kind of relationship that's NOT implemented with a simple foreign key; it's implemented via a complicated move and a constraint.
If, by ERD, you mean the diagrams that ERwin draws, then you don't have an easy way to do this. It tends to focus you on drawing PK-FK relationships. You have a non-PK-FK relationship between these things. Some kind of line with text is about all you can do.
Arrows, BTW, aren't appropriate because the ERD shows the "state" of the database. Data flowing around isn't part of an ERD. You do have a relationship, it's just not a typical PK-FK relationship. It's an atypical relationship based on rows existing in some places and not existing in others.
In the UML you can easily draw this as a "constraint" among the relationships.
I don't know what these people are talking about.
The Entity Relation Diagram doesn't describe the data fully; yes of course, it only shows Entities and Relations, it doesn't show Attributes. That's why it is called an ERD and not a Data Model. Evidently many people here can't tell the difference.
The Data Model is supposed to show as much as possible. But it depends on (a) the standard [if any] that you use and (b) the Notation. Some show more than others. IDEF1X which is the only Relational modelling Standard (NIST 184 of 1993). It is the most complete, and shows intricacies and complexities that other notations do not show. Recently MS and others have come out with "simplified" notations, of course, much is lost in the "ERDs".
It is not "process flow", it is a relation in a database.
UML is completely inappropriate for modelling data, especially when there is at least one Standard plus several non-standard but commonly used data modelling notations. There is nothing that can be shown in UML that can't be shown in IDEF1X. But most developers here have never heard of it (developers should not be modelling unless they have acquired modelling skills, but that is another story)..
This is a perfectly legal; it may not be commonly known, but it is legal and named. It is a Supertype-Subtype relation, except that the Cardinality is 1::0-n instead of 1::0-1. The IDEF1X Notation (right) has a Subtype symbol. Note there is only one relation at the parent end; and one each at the child end. And of course the crows feet show the cardinality. These relations can be Exclusive or Non-exclusive; yours is Exclusive; that is what the X through the half-circle means.
ERwin is the only modelling (not diagramming) tool that implements IDEF1X, and thus has the full complement of the IDEF1X Notation.
Of course, the Standard, the modelling capability, are all in the mind, not in the tool. I draw Data Models that are IDEF1X-compliant using a simple drawing tool.
I find that some developers baulk at the Subtype symbol, so I show a simplified version (left) in my IDEF1X models; it is intended to convey the sense of exclusivity, while the retention of the single line at the parent end indicates it is a subtype.
Lott: Click here▶Link to Data Model◀Lott: Click here
Link to IDEF1X Notation for those who are unfamiliar with the Relational Modelling Standard.
Sounds like a process flow, not an entity relationship. If at the time the entry is added to invoice, and the entry is deleted from booking, then there is never a relationship between the two. There is never a situation where you can traverse that relationship because there is never a record in both places that can be related together.
ERD don't describe the database fully. There are other things like process flow and use cases that detail other facets of the system.
This is kind of an analogy to UML for software. A class diagram doesn't show you all the different ways classes interact. One class might initialize locally and call functions of another class, but because there is not composition or inheritance that relates those two classes, then the class diagram doesn't show this relationship. Only when you fully document the system with all the various types of diagrams can you see all the facets of how it operates.

Resources