Detect Multiple Intents from Call Center Conversation transcripts - machine-learning

I am trying to build a model that takes a user conversation (a sequence of dialogue turns) as input and finds all the intents in it. This is basically an intent detection problem. However, the usual approach of labeling sentences, extracting features, and training a single-label intent classifier won't work here, because a single conversation may contain multiple intents. Is there any tool, method, or pipeline I should follow for this use case?

Here is a list of references on multiple-intent detection (some with links to GitHub repos):
GL-GIN: Fast and Accurate Non-Autoregressive Model for Joint Multiple Intent Detection and Slot Filling (2021)
https://arxiv.org/pdf/2106.01925v1.pdf
https://github.com/yizhen20133868/GL-GIN
Joint Multiple Intent Detection and Slot Labeling for Goal-Oriented Dialog (2019)
https://aclanthology.org/N19-1055.pdf
Towards Open Intent Discovery for Conversational Text (2019)
https://arxiv.org/pdf/1904.08524.pdf
Multi-class Classification without Multi-class Labels (2019)
https://openreview.net/pdf?id=SJzR2iRcK7
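To make the multi-intent framing concrete, here is a minimal sketch, not the method of any of the papers above. It scores every customer utterance against every intent and keeps all intents that clear a threshold, instead of picking a single best label. The intent names, the keyword lists, the `score_utterance` function and the 0.5 threshold are all made up for illustration; in practice a trained multi-label classifier would replace the keyword scorer.

```python
from collections import defaultdict

# Hypothetical keyword lists standing in for a trained per-utterance model.
INTENT_KEYWORDS = {
    "billing_dispute": {"charge", "charged", "refund", "bill"},
    "cancel_service": {"cancel", "terminate"},
    "tech_support": {"router", "internet", "outage", "slow"},
}


def score_utterance(text):
    """Return a score in [0, 1] per intent for one utterance (toy stand-in for a classifier)."""
    tokens = {tok.strip(".,!?").lower() for tok in text.split()}
    return {
        intent: min(1.0, len(tokens & words) / 2)
        for intent, words in INTENT_KEYWORDS.items()
    }


def detect_intents(conversation, threshold=0.5):
    """Multi-label decision: keep every intent whose best utterance score reaches the threshold."""
    best = defaultdict(float)
    for speaker, text in conversation:
        if speaker != "customer":
            continue  # assumption: only customer turns carry intents
        for intent, score in score_utterance(text).items():
            best[intent] = max(best[intent], score)
    return sorted(intent for intent, score in best.items() if score >= threshold)


conversation = [
    ("agent", "Thank you for calling, how can I help?"),
    ("customer", "I was charged twice on my last bill."),
    ("agent", "I can look into that refund for you."),
    ("customer", "Also my internet has been slow since the outage."),
]
print(detect_intents(conversation))
```

The design choice that matters is the independent threshold per intent: a conversation can return zero, one, or several intents, which a softmax-style single-label classifier cannot do.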

Related

Using BERT model for parsing and doing bigram or multi-gram

The task I want to do is finding the sentiment of some phrases in a group of sentences (not of all the sentences). For example, I have these sentences:
Ended up deciding that a pebble was best. Less charging, more simplistic ui, buttons seem easier for an older person.
And I want to detect these phrases:
less charging
more simplistic UI
easier buttons
I need something that first detects the phrases, and then identifies their sentiments - they are all positive in terms of smart watches for elderly people.
Does the BERT model solve this problem, or do I have to find another tool or technique?

Face recognition vs image classification

I need to build an image classification model using TensorFlow, but my dataset has more than 10000 classes and only 5 images per class.
I understand that 5 is too small a number of images and that ideally there should be at least 100 images for each class, but then I don't understand how some face recognition models can work.
For instance, modern smartphones provide a face recognition feature that can pick out the phone's owner among all the faces in the world, and the setup is very easy: it just needs a quick scan (3 to 5 seconds) of the owner's face.
So why does this work, while image classification models require a large number of images to reach acceptable accuracy?
Are these models built using a different technology behind the scenes?
Would it be possible to build an image classification model using the same technology that smartphones use for face recognition?
Smartphone face recognition: Your smartphone's face recognition system identifies a set of key features on your face, say S. Given a new face, it only has to answer "yes, this face matches S" or "no, this face does not match S". So all it needs is a few samples of your face to build a good set S. When it sees a new face, it extracts the same key features from it, compares them with S, and answers yes or no. It does not have to decide whether the face is yours, your father's, or your mother's; it only has to say whether it matches.
Image classification: Image classification is a different task, where each image has to be assigned to one of many classes. To identify an image as a cat, the system has to extract key features that distinguish cats from other animals. So if you have 100 different animals, you need 100 such sets of distinguishing features. This is why you need many samples per class: the system has to learn a feature set for each class.
How you identify the key features is a totally different ball game. You can use classical image processing techniques (such as SIFT or SURF) or deep learning techniques (such as CNNs or autoencoders).
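The yes/no matching described above can be sketched as follows. This is an illustration under assumptions, not how any particular phone implements it: the feature vectors are made-up 3-dimensional numbers (a real system would get them from a feature extractor such as a CNN), and the names `enroll`, `verify` and the 0.9 cosine-similarity threshold are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def enroll(sample_embeddings):
    """Average a few enrollment vectors into one template (the set S in the answer)."""
    n = len(sample_embeddings)
    return [sum(column) / n for column in zip(*sample_embeddings)]


def verify(template, embedding, threshold=0.9):
    """Answer only "match" or "no match"; no class label is ever predicted."""
    return cosine_similarity(template, embedding) >= threshold


# Made-up feature vectors for three enrollment shots of the owner.
owner_samples = [[0.9, 0.1, 0.4], [1.0, 0.2, 0.5], [0.8, 0.1, 0.5]]
template = enroll(owner_samples)

print(verify(template, [0.9, 0.15, 0.45]))  # a new shot close to the owner's features
print(verify(template, [0.1, 0.9, 0.2]))    # a face with very different features
```

Note that enrolling a new person only means storing one more template; nothing is retrained, which is why a few samples are enough.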

How to combine deep learning models that perform different task

I would like to know whether there is a way to combine two or more deep learning models that perform different tasks, so that I end up with one system that can perform all of those tasks.
Let's say for example I want to build a chat bot which adapts to your mood during a conversation. I have a model (CNN) for emotion detection on your face (using a camera as the chat is real-time), another one for speech recognition (speech-to-text) ... and I want to combine all those so that when you speak, it reads your facial expression to determine your mood, converts your speech to text, formulates an answer (taking your mood into consideration) and outputs voice (text-to-speech).
How can I combine all these different features/models into a single one?

train a neural network on real subject input/output to have it behave similarly to subject

The goal is to create an AI to play a simple game, tracking a horizontally moving dot across the screen which increases speed until no longer tracked.
I would like to create an AI that behaves similarly to a real test subject. I have a large number of trials recorded over many months: the position of the dot on screen and the user's cursor position over time.
I would like to train the network on these trials so that it behaves similarly to a real test subject. I can then generate very large amounts of test data to observe how changing the parameters of the game affects the network's ability to track the moving dot.
I am interested in learning about the underlying code of neural networks and would love some advice on where to start with this project. I understand AIs can get very good at performing different tasks such as snake, or other simple games, but my goal would be to have the AI perform similarly to a real test subject.
Your question is a bit broad, but I'll try to answer nonetheless.
To imitate a subject's behavior you could use an LSTM network, which keeps track of the state it is in (in your case the state may include how fast and in which direction the dot is moving and where the pointer is) and then decides on an action. You will need to feed your data (the dot coordinates and the user's behavior) into the network.
A simpler yet effective approach would be a plain MLP network. Your problem does not seem like a hard one, and a simple network should be able to learn what a user would do in a given situation. However, depending on what you mean by "perform similarly to a real test subject", you might need a more complex architecture.
Finally there are GANs, which are somewhat complicated if you're not familiar with NNs, are hard and time-consuming to train, and in some cases might fail to train at all. The bright side is that they are designed precisely to imitate a probability distribution (or, to put it more simply, a set of data).
There are two more important notes to mention:
The performance of your network depends heavily on your data and the game. For example, if users in your dataset reacted very differently to the same situation, an MLP or LSTM will not be able to learn all of those reactions.
Your network can only imitate what it's taught. So if you're planning to figure out what a human agent would do under conditions that never occurred in your dataset (e.g. the dot only moves in a line in your dataset but you want it to move in a circle when experimenting), you won't get good results.
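To show how the recorded trials become training data, here is a rough sketch of the supervised framing behind the MLP suggestion. It is deliberately simpler than an MLP: a two-weight linear model fitted by least squares, so it runs without any library. The state definition (tracking error and dot velocity), the function names, and the synthetic "subject" who closes half of the error at each step are all assumptions made up for this example.

```python
def make_pairs(dot, cursor):
    """Turn one recorded trial into (state, action) pairs.

    state  = (tracking error, dot velocity) at step t
    action = how far the subject moved the cursor between t and t + 1
    """
    pairs = []
    for t in range(1, len(dot) - 1):
        state = (dot[t] - cursor[t], dot[t] - dot[t - 1])
        action = cursor[t + 1] - cursor[t]
        pairs.append((state, action))
    return pairs


def fit_linear_policy(pairs):
    """Least-squares fit of action ~ w_err * error + w_vel * velocity (2x2 normal equations)."""
    s_ee = sum(e * e for (e, v), _ in pairs)
    s_ev = sum(e * v for (e, v), _ in pairs)
    s_vv = sum(v * v for (e, v), _ in pairs)
    s_ea = sum(e * a for (e, v), a in pairs)
    s_va = sum(v * a for (e, v), a in pairs)
    det = s_ee * s_vv - s_ev * s_ev
    w_err = (s_ea * s_vv - s_va * s_ev) / det
    w_vel = (s_va * s_ee - s_ea * s_ev) / det
    return w_err, w_vel


# Synthetic trial: the dot accelerates, and the fake subject closes half of the
# tracking error at every step. Real trials would be loaded from the recordings.
dot = [0.001 * t * t for t in range(50)]
cursor = [0.0]
for t in range(49):
    cursor.append(cursor[t] + 0.5 * (dot[t] - cursor[t]))

w_err, w_vel = fit_linear_policy(make_pairs(dot, cursor))
print(round(w_err, 6), round(w_vel, 6))
```

Swapping the linear fit for an MLP or LSTM changes only the model; the (state, action) pairs built from the trials stay the same. As the first note above warns, a deterministic fit like this learns an average reaction and will not reproduce the spread between different users.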
Hope this helps.

which ML algorithm to choose

In my lab, I have 10 devices that I monitor using device-specific features such as:
heat-generated
power consumed
patterns in power consumption
Using a supervised classification model I could classify these devices.
The problem is this: if we add more devices of different types, how do I classify them? The trained model will classify each new device as one of the device types it already knows, which is wrong, because the new devices might have their own patterns.
Is there a way to handle this, and if so, how?
When a new type of device is added to your dataset, you are effectively adding a new class.
In that case, you will have to retrain your model so that it accommodates the new classes.
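As a sketch of what that could look like, the example below uses a nearest-centroid classifier, chosen here only because it is short; it is not necessarily the model from the question. It adds one assumption that goes beyond the answer: a distance threshold that reports "unknown" instead of forcing a known class, which signals when retraining is needed. The device names, the feature values (assumed to be already scaled to a 0 to 1 range) and the 0.2 threshold are made up.

```python
import math


def centroid(rows):
    """Mean feature vector of one class."""
    return [sum(column) / len(rows) for column in zip(*rows)]


def train(samples_by_class):
    """One centroid per known device class."""
    return {label: centroid(rows) for label, rows in samples_by_class.items()}


def classify(model, features, max_distance):
    """Nearest centroid, or "unknown" when even the nearest one is too far away."""
    label, distance = min(
        ((lbl, math.dist(center, features)) for lbl, center in model.items()),
        key=lambda pair: pair[1],
    )
    return label if distance <= max_distance else "unknown"


# Made-up, pre-scaled features: (heat generated, power consumed).
data = {
    "heater": [[0.9, 0.8], [0.8, 0.9]],
    "fan": [[0.1, 0.2], [0.2, 0.1]],
}
model = train(data)
new_reading = [0.5, 0.1]
print(classify(model, new_reading, max_distance=0.2))  # fits neither known class

# Retraining with the new class added, as the answer suggests.
data["pump"] = [[0.5, 0.1], [0.55, 0.15]]
model = train(data)
print(classify(model, new_reading, max_distance=0.2))
```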
