I have used SFS from mlxtend with the r2 scorer. The thing about R2 is that, in regression models, R2 on the training data always increases as new features are added. So is it ever useful to use R2 as a scorer, or should a custom scoring function with adjusted R2 always be used instead?
While I have seen R2 used in practice, I have also seen plots where R2 drops as more features are added. For example, in this notebook https://www.kaggle.com/code/jorijnsmit/linear-regression-by-sequential-feature-selection the validation section has an SFS plot where R2 drops when new features are added.
[SFS plot]
Can someone please help me understand how this is possible?
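For reference, here is a minimal sketch of the kind of setup described above, using scikit-learn's diabetes dataset as a stand-in for the real data. Note that with cv=5 the r2 score is computed on held-out folds, not on the training data:

from mlxtend.feature_selection import SequentialFeatureSelector as SFS
from mlxtend.plotting import plot_sequential_feature_selection as plot_sfs
from sklearn.datasets import load_diabetes
from sklearn.linear_model import LinearRegression
import matplotlib.pyplot as plt

X, y = load_diabetes(return_X_y=True)

# r2 here is averaged over held-out CV folds, not measured on the training data
sfs = SFS(LinearRegression(), k_features=X.shape[1], forward=True,
          scoring='r2', cv=5)
sfs = sfs.fit(X, y)

# plot the average CV score (with std dev) for each subset size
plot_sfs(sfs.get_metric_dict(), kind='std_dev')
plt.show()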
The problem is in the question's image ("Question 2"), transcribed below.
Many substances that can burn (such as gasoline and alcohol) have a chemical structure based on carbon atoms; for this reason they are called hydrocarbons. A chemist wants to understand how the number of carbon atoms in a molecule affects how much energy is released when that molecule combusts (meaning that it is burned). The chemist obtains the dataset below. In the column on the right, kJ/mole is the unit measuring the amount of energy released.
You would like to use linear regression (h_a(x) = a0 + a1*x) to estimate the amount of energy released (y) as a function of the number of carbon atoms (x). Which of the following do you think will be the values you obtain for a0 and a1? You should be able to select the right answer without actually implementing linear regression.
A) a0 = −1780.0, a1 = −530.9
B) a0 = −569.6, a1 = −530.9
C) a0 = −1780.0, a1 = 530.9
D) a0 = −569.6, a1 = 530.9
Since all the a0 values are negative but two of the a1 values are positive, let's figure out the latter first.
As you can see, as the number of carbon atoms increases, the energy becomes more and more negative, so the relationship cannot be positive; this rules out options C and D.
Then, for the intercept, the value that produces the least error is the correct one. For x = 1 and x = 10 (easier to calculate), the outputs are about −2300 and −7000 for A, and −1100 and −5900 for B, so one would prefer B over A, since its predictions lie closer to the data.
PS: You might be thinking there should be obvious values for a0 and a1 given the data; there aren't. The intention of the question is to give you a general feel for what a best fit looks like. Also, this way of solving it is kind of machine learning as well.
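For completeness, the arithmetic above can be checked with a few lines of Python:

# h(x) = a0 + a1*x evaluated at x = 1 and x = 10 for each option
options = {'A': (-1780.0, -530.9), 'B': (-569.6, -530.9),
           'C': (-1780.0, 530.9), 'D': (-569.6, 530.9)}
for name, (a0, a1) in options.items():
    print(name, a0 + a1 * 1, a0 + a1 * 10)
# A: -2310.9 and -7089.0; B: -1100.5 and -5878.6,
# matching the approximate values quoted above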
Suppose that X is our dataset (not yet centered) and X_cent is our centered dataset (X_cent = X - mean(X)).
If we do the PCA projection as Z_cent = F*X_cent, where F is the matrix of principal components, it is pretty obvious that we need to add mean(X) back after reconstructing from Z_cent.
But what if we do the PCA projection as Z = F*X? In that case we don't need to add mean(X) back after reconstruction, but it gives a different result.
I think something is wrong with this procedure (projection and reconstruction) when it is applied to the non-centered data (X in our case). Can anyone explain how it works? Why can't we do the projection/reconstruction phase without subtracting/adding the mean?
Thank you in advance.
If you retain all principal components, then the reconstructions of the centered and non-centered vectors as described in your question are identical. The problem (as indicated in your comments) is that you are only retaining K principal components. When you drop PCs, you lose information, so the reconstruction will contain errors. Since you don't have to reconstruct the mean in one of the reconstructions, you don't introduce errors w.r.t. the mean there, so the reconstruction errors of the two versions will differ.
Reconstruction with fewer than all PCs isn't quite as simple as multiplying by the transpose of the eigenvector matrix (F'), because you need to pad your transformed data with zeros; to keep things simple, I'll ignore that here. Your two reconstructions look like this:
R1 = F'*F*X
R2 = F'*F*X_cent + X_mean
= F'*F*(X - X_mean) + X_mean
= F'*F*X - F'*F*X_mean + X_mean
Since the reconstruction is lossy, in general F'*F*Y != Y for a matrix Y. If you retained all PCs, you would have R1 - R2 = 0. But since you are only retaining a subset of the PCs, your two reconstructions will differ by
R2 - R1 = X_mean - F'*F*X_mean
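You can check this identity numerically. Below is a minimal numpy sketch with made-up data (5 features, 200 samples, keeping K = 2 components); the variable names follow the question:

import numpy as np

rng = np.random.default_rng(0)
# deliberately non-centered data: features are rows, samples are columns
X = rng.normal(size=(5, 200)) + np.arange(1.0, 6.0).reshape(-1, 1) * 10
X_mean = X.mean(axis=1, keepdims=True)
X_cent = X - X_mean

# principal components = eigenvectors of the covariance of the centered data
eigvals, eigvecs = np.linalg.eigh(np.cov(X_cent))
order = np.argsort(eigvals)[::-1]
K = 2
F = eigvecs[:, order[:K]].T          # K x d, rows are the top-K PCs

R1 = F.T @ (F @ X)                   # reconstruct without centering
R2 = F.T @ (F @ X_cent) + X_mean     # reconstruct with centering

# the two reconstructions differ by exactly X_mean - F'*F*X_mean
print(np.allclose(R2 - R1, X_mean - F.T @ (F @ X_mean)))   # True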
Your follow-up question in the comments, regarding why it's better to reconstruct X_cent instead of X, is a bit more nuanced and really depends on why you are doing PCA in the first place. The most fundamental reason is that the PCs are defined with respect to the mean, so by not centering the data prior to transforming/rotating it, you aren't really decorrelating the features. Another reason is that the numeric values of the transformed data are often orders of magnitude smaller when you center the data first.
I am currently learning about machine learning, as I think it might help me solve a problem I have. However, I am unsure about which techniques I should apply. I apologise in advance for probably not knowing enough about this field to even ask a proper question.
What I want to do is extract the significant parts of a knitting pattern (the actual pattern, not the intro and other surrounding text). For instance, I would like to feed this web page into my program and get out something like this:
{
title: "Boot Style Red and White Baby Booties for Cold Weather"
directions: "
Right Bootie.
Cast on (31, 43) with white color.
Rows (1, 3, 5, 7, 9, 10, 11): K.
Row 2: K1, M1, (K14, K20), M1, K1, M1, (K14, K20), M1, K1. (35, 47 sts)
Row 4: K2, M1, (K14, K20), M1, K3, M1, (K14, K20), M1, K2. (39, 51 sts)
Row 6: K3, M1, (K14, K20), M1, K5, M1, (K14, K20), M1, K3. (43, 55 sts)
..."
}
I've been reading about extracting smaller parts, like sentences and words, and also about techniques like Named Entity Recognition, but they all seem to be focused on very small parts of the text.
My current thought is to use supervised learning, but I'm very unsure about how to extract features from the text. Naive methods like using letters, words, or even sentences as features seem like they wouldn't be relevant enough to yield satisfactory results (and also, there would be tons of features unless I use some kind of sampling). But what really are the significant features for finding out which parts are what in a knitting pattern?
Can someone point me in the right direction towards algorithms and methods for extracting these larger portions of text?
One way to see this is as a straightforward classification problem: for each sentence in the page, you want to determine if it's relevant to you or not. Optionally, you have different classes of relevant sentences, such as "title" and "directions".
Consequently, for each sentence you need to extract features that carry information about its status. This will likely involve tokenizing the sentence, and possibly applying some type of normalization. Initially I would focus on features such as individual words (M1, K1, etc.) or n-grams (sequences of adjacent words). Yes, there are many of them, but a good classifier will learn which features are informative and which are not. If you're really worried about data sparseness, you can also reduce the number of features by mapping similar "words" such as M1 and K1 to the same feature.
Additionally, you will need to label a set of example sentences, to serve as the training and test sets for your classifier. This will allow you to train the system, evaluate its performance and compare different approaches.
To start, you can experiment with some simple but popular classification methods, such as Naive Bayes.
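To make this concrete, here is a rough sketch using scikit-learn. The tiny labeled set below is made up; a real system would need many hand-labeled sentences from actual knitting pages:

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

sentences = [
    "Boot Style Red and White Baby Booties for Cold Weather",
    "Cast on 31 sts with white color.",
    "Row 2: K1, M1, K14, M1, K1.",
    "Share this pattern with your friends!",
    "Copyright 2014, all rights reserved.",
]
labels = ["title", "directions", "directions", "other", "other"]

# word unigrams/bigrams as features; the token pattern keeps short
# knitting tokens like K1 and M1 instead of discarding them
clf = make_pipeline(
    CountVectorizer(ngram_range=(1, 2), token_pattern=r"[A-Za-z0-9]+"),
    MultinomialNB(),
)
clf.fit(sentences, labels)
print(clf.predict(["Row 4: K2, M1, K20, M1, K3."]))  # expected: 'directions'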
In the image, shape I connects (has a bridge, binds) to the universe, but II and III do not.
I need to detect both II and III, and also I if possible.
Is it possible with current computer vision libraries?
Or is there any path or idea I can use to design my own algorithm?
Thanks.
It is possible, but it is hard to give a generic pre-processing solution without a good set of sample images.
One solution could be:

frame -> morphological closing + skeletonize + find contours (gives all three)
frame -> skeletonize + find contours (gives II and III)

The difference between the two obviously gives I,
and maybe add some shape matching of those contours against a hand-drawn eye-like contour, just as an extra check.
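As a rough sketch of that pipeline using OpenCV and scikit-image (the file name, threshold, and kernel size are placeholders that would need tuning on real images):

import cv2
import numpy as np
from skimage.morphology import skeletonize

frame = cv2.imread("regions.png", cv2.IMREAD_GRAYSCALE)  # hypothetical input
_, binary = cv2.threshold(frame, 127, 255, cv2.THRESH_BINARY)

# branch 1: closing first bridges small gaps, so connected shapes survive too
kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (7, 7))
closed = cv2.morphologyEx(binary, cv2.MORPH_CLOSE, kernel)
skel_all = skeletonize(closed > 0).astype(np.uint8) * 255
contours_all, _ = cv2.findContours(skel_all, cv2.RETR_LIST, cv2.CHAIN_APPROX_SIMPLE)

# branch 2: skeletonize directly; only the unconnected shapes II and III remain
skel = skeletonize(binary > 0).astype(np.uint8) * 255
contours_23, _ = cv2.findContours(skel, cv2.RETR_LIST, cv2.CHAIN_APPROX_SIMPLE)

# the contours present in branch 1 but not in branch 2 correspond to shape I
print(len(contours_all), len(contours_23))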