encogmodel selectmethod configuration - machine-learning

Could someone point me to examples on how to configure the encogmodel with selectmethod? This is an overloaded method with the first one providing just taking inputs as dataset and method. The second one however allows the following:
dataset
methodtype
methodArgs
trainingType
trainingArgs
I am unable to get this working as the following error appears "Layer can't have zero neurons, Unknown architecture element:". Any help is appreciated. thank you.
Also, some insight on how to dump the weights in this approach? When the model is built via building the network (BasicNetwork), it is possible to dump the weights as network.flat approach. In this encogmodel driven approach, how do we dump the weights, gradients etc? thank you

There are three examples for EncogModel, you can find them here:
If that does not help, let me know more specifically what you are trying to do, or provide some code that is not working, and I update this to a more specific answer.
The weights can be directly accessed by BasicNetwork.dumpWeights, BasicNetwork.dumpWeightsVerbose(), or more directly with BasicNetwork.getWeight

Related

Robust Standard Errors in spatial error models

I am fitting a Spatial Error Model using the errorsarlm() function in the spdep library.
The Breusch-Pagan test for spatial models, calculated using the bptest.sarlm() function, suggest the presence of heteroskedasticity.
A natural next step would be to get the robust standard error estimates and update the p-values. In the documentation of the bptest.sarlm() function says the following:
"It is also technically possible to make heteroskedasticity corrections to standard error estimates by using the “lm.target” component of sarlm objects - using functions in the lmtest and sandwich packages."
and the following code (as reference) is presented:
lm.target <- lm(error.col$tary ~ error.col$tarX - 1)
if (require(lmtest) && require(sandwich)) {
print(coeftest(lm.target, vcov=vcovHC(lm.target, type="HC0"), df=Inf))}
where error.col is the spatial error model estimated.
Now, I can easily adapt the code to my problem and get the robust standard errors.
Nevertheless, I was wondering:
What exactly is the “lm.target” component of sarlm objects? I can not find any mention to it in the spdep documentation.
What exactly are $tary and $tarX? Again, it does not seem to be mentioned on the documentation.
Why documentation says it is "technically possible to make heteroskedasticity corrections"? Does it mean that proposed approach is not really recommended to overcome issues of heteroskedasticity?
I report this issue on github and had a response by Roger Bivand:
No, the approach is not recommended at all. Either use sphet or a Bayesian approach giving the marginal posterior distribution. I'll drop the confusing documentation. tary is $y - \rho W y$ and similarly for tarX in the spatial error model case. Note that tary etc. only occur in spdep in documentation for localmoran.exact() and localmoran.sad(); were you using out of date package versions?

How to make your own custom image dataset?

As I am working on my project that is to detect FOD (Foreign Object Debirs) that is found on the runway. FOD include anything like nuts, bolts, screws, locking wires, plastic debris, stones etc. that has the potential to cause damage to the aircraft. Now I have searched on the Internet to find any image dataset but no dataset is available related to FOD. Now my question is kindly guide me that how can I make my own dataset of images that can then be used for training purpose.
Kindly guide me in making image dataset for both classification and detection purposes. And also the data pre-processing that will be required. Thanks and waiting for the reply!
Although the question is a bit vague regarding your requirements and the specs of your machine, I'll try to answer it. You'll need object detection to do your task. There are many models available which you can use like Yolo, SSD, etc..
To create your own dataset, you can follow these steps:
Take lots of images of your objects of interest in various conditions, viewpoints and backgrounds. (Around 2000 per class should be good enough).
Now annotate (or mark) where your object is in the image. If you're using Yolo, make use of Yolo-mark for annotating. There should be other similar tools for SSD and other models.
Now you can begin training.
These steps should get you started or at least point you in the right direction.
You can build your own dataset with this code. I wrote it, and it works correctly.
You need to import the libraries and add your DATADIR.
if __name__ == "__main__":
for category in CATEGORIES:
path = os.path.join(DATADIR, category)
class_num = CATEGORIES.index(category)
for img in os.listdir(path):
try:
img_array = cv2.imread(os.path.join(path,img))
new_array = cv2.resize(img_array, (IMG_SIZE, IMG_SIZE))
training_data.append([new_array, class_num])
except Exception as e:
pass
for features, label in training_data:
x_train.append(features)
y_train.append(label)
#create pikle
pickle_out = open("x_train.pickle", "wb")
pickle.dump(x_train, pickle_out)
pickle_out.close()
pickle_out = open("y_train.pickle", "wb")
pickle.dump(y_train, pickle_out)
pickle_out.close()
In case if you're starting completely from scratch, you can use "Dataset Directory", available on Play store. The App helps you in creating custom datasets using your mobile. You'll have to sign in to your Google drive such that your dataset is stored in Drive rather on your mobile. Additionally, It also contains Labelling the entity for classification and Regression predictive models.
Currently, the App supports Binary Image Classification and Image Regression.
Hope this Helped!
Download Link :
https://play.google.com/store/apps/details?id=com.applaud.datasetdirectory

Tiny YOLOv3 (Darknet) training "too quickly" and produces different output

I am pretty new to YOLO/Darknet and am walking in circles with the solutions. I have looked at the Github and Stackexchange fora pages corresponding with similar issues, but none seems to directly address this output issue (i.e. where the region IOU line is missing). Here is my output (training/testing):
Here is my directory structure:
Other details:
I am using the AlexeyAB fork.
6 classes in total (following this convention of annotating occluded and truncated items, so two "items" with three classes each)
I'm using 200+ training images (definitely too few, but I don't know if this is the root cause of my troubles).
There is no predictions.png, just predictions.jpg. However, I don't think this should be an issue.
I followed this tutorial.
Any help is very much appreciated; thank you in advance!
If it finish too soon on training, try adding -clear 1at the end of your training command.
EDIT:
This is the correct answer (ergo why I accepted it), but lacks an explanation. The "-clear 1" flag is, according to this answer, clears past stats.

Support Vector Machine bad results-Python

I'm studying SVM and implemented this code , it's too basic,primitive and taking too much time but I just wanted to see how it actually works.Unfortunately,it is giving me bad results.What did I miss? Some coding error or mathematical mistakes? If you want to look at dataset , it's link here. I taked it from UCI Machine Learning Repository. Thanks for your deal.
def hypo(x,q):
return 1/(1+np.exp(-x.dot(q)))
data=np.loadtxt('LSVTVoice',delimiter='\t');
x=np.ones(data.shape)
x[:,1:]=data[:,0:data.shape[1]-1]
y=data[:,data.shape[1]-1]
q=np.zeros(data.shape[1])
C=0.002
##mean normalization
for i in range(q.size-1):
x[:,i+1]=(x[:,i+1]-x[:,i+1].mean())/(x[:,i+1].max()-x[:,i+1].min());
for i in range(2000):
h=x.dot(q)
for j in range(q.size):
q[j]=q[j]-(C*np.sum( -y*np.log(hypo(x,q))-(1-y)*np.log(1-hypo(x,q))) ) + (0.5*np.sum(q**2))
for i in range(y.size):
if h[i]>=0:
print y[i],'1'
else:
print y[i],'0'
Depending on your data, it's very usual that Simple Implementation of SVM give you bad result. You must try advanced version on SVM implementation(e.g Sickit SVM) you can also check this: https://github.com/scikit-learn/scikit-learn/tree/master/sklearn/svm
SVM has types of implementation and parameters like different kernels(e.g rbf). You must check them and try them with different parameter(depending on your data) and compare results to each other.
You can use Grid Search approach for comparing(check this: http://scikit-learn.org/stable/modules/grid_search.html)

Does it help to duplicate original data in order to make more data for building model?

I just got an interview question.
"Assume you want to build a statistical or machine learning model, but you have very limited data on hand. Your boss told you can duplicate original data several times, to make more data for building the model" Does it help?
Intuitively, it does not help, because duplicating original data doesn't create more "information" to feed the model.
But is there anyone can explain it more statistically? Thanks
Consider e.g. variance. The data set with the duplicated data will have the exact same variance - you don't have a more precise estimate of the distrbution afterwards.
There are, however, some exceptions. For example bootstrap validation helps when evaluating your model, but you have very little data.
Well, it depends on exactly what one means by "duplicating the data".
If one is exactly duplicating the whole data set a number of times, then methods based on maximum likelihood (as with many models in common use) must find exactly the same result since the log likelihood function of the duplicated data is exactly a multiple of the unduplicated data's log likelihood, and therefore has the same maxima. (This argument doesn't apply to methods which aren't based on the likelihood function; I believe that CART and other tree models, and SVM's, are such models. In that case you'll have to work out a different argument.)
However, if by duplicating, one means duplicating the positive examples in a classification problem (which is common enough, since there are often many more negative examples than positive), then that does make a difference, since the likelihood function is modified.
Also if one means bootstrapping, then that, too, makes a difference.
PS. Probably you'll get more interest in this question on stats.stackexchange.com.

Resources