How to retrieve Transition scores and Emission score from CRF layer of Flair sequence tagger? - named-entity-recognition

I have built an NER model with the Flair sequence tagger, but I need the transition and emission scores from the CRF layer for some calculations. How can I get them in Flair?
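In Flair, the transition matrix is a learned parameter of the CRF layer (older releases expose it as `tagger.transitions`; attribute names vary by version), and the emission scores are the per-token outputs of the tagger's final linear layer. How the two combine can be sketched with a toy example in plain NumPy; the matrices below are made up and stand in for what you would read out of a trained model:

```python
import numpy as np

# Toy CRF over 3 tags. In Flair these would come from the trained model:
# `emissions` from the final linear layer's per-token output,
# `transitions` from the CRF layer's learned parameter.
TAGS = ["O", "B-PER", "I-PER"]

emissions = np.array([      # shape (sequence_length, num_tags), made-up values
    [2.0, 0.5, 0.1],
    [0.3, 1.8, 0.4],
    [0.2, 0.1, 2.2],
])
transitions = np.array([    # transitions[i, j] = score of moving from tag i to tag j
    [1.0, 0.5, -2.0],
    [-1.0, 0.2, 1.5],
    [0.3, -0.5, 0.8],
])

def path_score(tag_ids):
    """Unnormalised CRF score of one tag sequence: emissions plus transitions."""
    score = emissions[0, tag_ids[0]]
    for t in range(1, len(tag_ids)):
        score += transitions[tag_ids[t - 1], tag_ids[t]]
        score += emissions[t, tag_ids[t]]
    return score

print(path_score([0, 1, 2]))  # O -> B-PER -> I-PER: 2.0 + (0.5 + 1.8) + (1.5 + 2.2) = 8.0
```

This is exactly the quantity a CRF's Viterbi decoder maximises, so once you can read both matrices out of the model you can reproduce any per-path score.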

Related

Anomaly detection in multivariate time series with autoencoder

I have an autoencoder with LSTM layers for anomaly detection in time series.
During training, the autoencoder learns to reconstruct only normal samples; we then evaluate it on a test set that contains anomalies. If the reconstruction is "too bad", that time window is flagged as an anomaly.
The input has shape (samples, window_size, features), and the error is the squared difference between the true input and the reconstructed sequence.
My problem is how to define a threshold during training and how to apply it to the test set. One option is a separate threshold per feature, another is a single global threshold, but I do not know which is better.
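One common recipe is to derive the threshold from the distribution of reconstruction errors on the (normal-only) training set, e.g. mean plus a few standard deviations. A minimal sketch of the per-feature variant, with random arrays standing in for the real reconstruction errors:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-ins for reconstruction errors; in practice these are
# (x_true - x_reconstructed) ** 2 from the trained autoencoder.
# Shapes: (samples, window_size, features) -> reduce over the window axis.
train_err = rng.normal(0.1, 0.02, size=(500, 30, 4)) ** 2
test_err = rng.normal(0.1, 0.02, size=(100, 30, 4)) ** 2
test_err[:5] *= 10  # pretend the first 5 test windows are anomalous

# Per-feature error of each window: mean squared error over the time axis.
train_mse = train_err.mean(axis=1)           # (samples, features)
test_mse = test_err.mean(axis=1)

# One threshold per feature from the normal-only training distribution;
# "mean + 3 std" is a conventional starting point, not a rule.
thresholds = train_mse.mean(axis=0) + 3 * train_mse.std(axis=0)

# Flag a window as anomalous if ANY feature exceeds its threshold.
# A global variant would compare test_mse.mean(axis=1) to a single cut-off.
anomalous = (test_mse > thresholds).any(axis=1)
print(anomalous[:10])
```

The per-feature version is more sensitive when anomalies affect only one sensor; the global version is more robust to single-feature noise. Tuning the multiplier (or using a quantile of the training errors instead) on a validation set is usually the deciding factor.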

SMOTE oversampling for anomaly detection using a classifier

I have sensor data and want to do live anomaly detection: run LOF on the training set to label anomalies, then feed the labeled data to a classifier to classify new data points. I considered SMOTE because I want more anomaly points in the training data to overcome the imbalanced-classification problem, but the issue is that SMOTE created many points that fall inside the normal range.
How can I do oversampling without creating samples in the normal data range?
(Plot: the data before applying SMOTE.)
(Plot: the data after SMOTE.)
SMOTE linearly interpolates synthetic points between a minority-class sample and its k nearest minority neighbors, so you end up with points on the line segments joining a sample to its neighbors. When the anomalies are scattered all over the place like this, it makes sense that you create synthetic points in the middle of the normal region.
SMOTE is really meant to carve out more specific regions of the feature space as the decision region for the minority class. That does not seem to be your use case: you want to know which points "don't belong".
This looks like a good fit for DBSCAN, a density-based clustering algorithm that marks points farther than some distance, eps, from any dense neighborhood as not belonging to it.
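A minimal sketch with scikit-learn's `DBSCAN`, using a made-up dense "normal" cluster plus a few far-away readings; `eps` and `min_samples` are illustrative values that would need tuning (e.g. via a k-distance plot) on real sensor data:

```python
import numpy as np
from sklearn.cluster import DBSCAN

rng = np.random.default_rng(42)

# Dense "normal" cluster plus a few far-away sensor readings (stand-in data).
normal = rng.normal(loc=0.0, scale=0.5, size=(200, 2))
outliers = np.array([[6.0, 6.0], [-7.0, 5.0], [8.0, -6.0]])
X = np.vstack([normal, outliers])

# eps: neighborhood radius; min_samples: points needed to form a dense region.
labels = DBSCAN(eps=1.0, min_samples=5).fit_predict(X)

# DBSCAN assigns label -1 to points in no dense neighborhood,
# i.e. exactly the points that "don't belong".
print(np.flatnonzero(labels == -1))
```

Unlike SMOTE, this never manufactures points: it simply separates what is dense (normal) from what is isolated (anomalous), which matches the problem as stated.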

Determine accuracy of model which estimates probability of one of the classes

I'm modeling an event with two outcomes, 0 (rejection) and 1 (acceptance). I have created a model that estimates the probability that 1 (acceptance) will happen (i.e. the model may say that '1' will happen with an 80% chance, in other words a probability of acceptance of 0.8).
Now I have a large record of trial outcomes together with the model's estimates (for example: probability of acceptance = 0.8 and actual class acceptance = 1). I would like to quantify or validate how accurate the model is. Is this possible, and if so, how?
Note: I am only predicting the probability of class 1. Say the prediction for class 1 is 0.8 and the actual class is 1; I want to measure the performance of my model.
You simply need to convert the probability to one of the two discrete classes by thresholding, i.e. if p(y=1|x) > 0.5, predict 1, else predict 0. Then all of the usual classification metrics apply. The threshold can be chosen by inspecting the ROC curve and/or the precision-recall curve, or can simply be set to 0.5.
Alternatively, sort the observations by predicted probability and compute the ROC AUC of the resulting curve; this measures the quality of the ranking without committing to any single threshold.
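Both answers above can be shown in a few lines of pure Python. ROC AUC is computed here via its probabilistic interpretation, the chance that a randomly chosen positive is ranked above a randomly chosen negative (ties count half); the probabilities and labels are made up:

```python
# Evaluate probability predictions both ways:
# 1) accuracy after thresholding at 0.5, and
# 2) threshold-free ROC AUC as the fraction of (positive, negative)
#    pairs ranked correctly. Data below is made up.
probs = [0.9, 0.8, 0.7, 0.6, 0.4, 0.35, 0.3, 0.2]
labels = [1, 1, 0, 1, 1, 0, 0, 0]

# 1) Hard classification at a chosen threshold.
preds = [1 if p > 0.5 else 0 for p in probs]
accuracy = sum(int(p == y) for p, y in zip(preds, labels)) / len(labels)

# 2) Ranking quality: ROC AUC via pairwise comparison.
pos = [p for p, y in zip(probs, labels) if y == 1]
neg = [p for p, y in zip(probs, labels) if y == 0]
auc = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg) / (len(pos) * len(neg))

print(accuracy, auc)  # 0.75 0.875
```

Since the question is about the quality of the probabilities themselves, a proper scoring rule such as log loss or the Brier score is also worth computing alongside these.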

What is the score in plot_learning_curve of scikit-learn?

In scikit-learn, I run a regression on the Boston house-price data and get the following learning curve. But what is the meaning of the score (y-axis) for regression?
The graph visualizes the learning curves of the model for both training and validation as the size of the training set increases. The shaded region of a learning curve denotes the uncertainty of that curve (measured as the standard deviation across folds). The model is scored on both the training and validation sets with R², the coefficient of determination.
It depends on what you want to measure; you can choose any scoring metric from the chart in scikit-learn's model evaluation guide (or any other metric not listed there):
Reference:
http://scikit-learn.org/stable/modules/model_evaluation.html
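A minimal sketch of generating such a curve with `learning_curve` and an explicit `scoring` argument, so the y-axis is unambiguous. Synthetic data stands in for the Boston set (which has since been removed from scikit-learn); the model and sizes are illustrative:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import learning_curve

# Synthetic regression stand-in. scoring="r2" makes the y-axis explicit;
# swap in "neg_mean_squared_error" etc. to plot a different metric.
X, y = make_regression(n_samples=300, n_features=10, noise=10.0, random_state=0)

train_sizes, train_scores, val_scores = learning_curve(
    Ridge(), X, y, cv=5, scoring="r2",
    train_sizes=np.linspace(0.1, 1.0, 5),
)

# Mean and standard deviation across CV folds: the line and the shaded band.
print(train_sizes)
print(train_scores.mean(axis=1), train_scores.std(axis=1))
print(val_scores.mean(axis=1), val_scores.std(axis=1))
```

Plotting the two mean curves with `matplotlib`, with `fill_between` on mean ± std, reproduces the figure in the question.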

What do the values of the latent feature matrices for users and items in collaborative filtering represent?

When decomposing a rating matrix for a recommender system, the matrix can be written as P · t(Q), where P is the user factor matrix and Q is the item factor matrix. The dimension of Q is rank × number of items. I am wondering whether the values in the Q matrix actually represent anything, such as item weights, and whether there is any way to find hidden patterns in the Q matrix.
Think of features as the important directions of variance in multidimensional data. Imagine a 3-D chart plotting how much of each of 3 items each user bought: it would be an amorphous blob, but the blob's actual axes of orientation are probably not the x, y, z axes. The vectors it does orient along are the features, in vector form. Take this to very high-dimensional data (many users, many items): such data can very often be spanned by a small number of vectors, and most variance off these new axes is very small and may even be noise. An algorithm like ALS finds the few vectors that represent most of the span of the data. "Features" can therefore be thought of as the primary modes of variance in the data, or put another way, the archetypes for describing how one item differs from another.
Note that PQ factorization in recommenders relies on dropping insignificant features to achieve potentially huge compression of the data. These insignificant features (ones that account for very little variance in the user/item input) can be dropped because they are often interpreted as noise, and in practice results improve when they are discarded.
Can you find hidden patterns? Sure. The new, smaller but dense item and user vectors can be treated with techniques like clustering, KNN, etc. They are just vectors in a new "space" defined by the new basis vectors, the new axes. To interpret the result of such operations you will need to transform them back into item and user space.
The essence of ALS (PQ matrix factorization) for recommendation is to transform a user's factor vector into item space and rank the items by the resulting weights; the highest-ranked items are recommended.
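A minimal sketch of this pipeline on a made-up rating matrix. Truncated SVD stands in for ALS here (real ALS alternates least-squares solves for P and Q and handles missing entries properly), but the resulting P and Q play exactly the roles described above:

```python
import numpy as np

# Toy rating matrix R (users x items), 0 = unrated; values are made up.
# Users 0-1 like items 0-1; users 2-3 like items 2-3.
R = np.array([
    [5, 4, 0, 0],
    [4, 5, 1, 0],
    [1, 0, 5, 4],
    [0, 1, 4, 5],
], dtype=float)

# Rank-2 factorization R ~= P @ Q.T via truncated SVD (stand-in for ALS).
U, s, Vt = np.linalg.svd(R, full_matrices=False)
k = 2
P = U[:, :k] * s[:k]        # user factors, shape (users, k)
Q = Vt[:k].T                # item factors, shape (items, k)

# Transform every user into item space: predicted affinity for every item.
scores = P @ Q.T

# Recommend user 0 their highest-scoring unrated item.
user = 0
unrated = np.flatnonzero(R[user] == 0)
best = unrated[scores[user, unrated].argmax()]
print(best, scores[user, unrated])
```

Each column of Q is one item expressed in the k latent dimensions; clustering the rows of Q.T (or of P) is the direct way to look for the hidden patterns asked about in the question.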
