I am working on a project to predict days to an event (graft failure) using machine learning approaches with a dataset containing about 900 variables. I am new to this area and I am not sure whether I am on the right track.
At first, I used a neural network for prediction with the Keras library in Python; then I realized that about 70% of the instances in my data are censored (in the Y variable). However, I also have a follow-up time variable in addition to Y. So I concluded that I should not use a plain neural network for this kind of problem, and that I have to handle the censored data beforehand. Now I have three questions:
1) Is it normal to have this rate of censoring? How should such cases be dealt with?
2) Is my conclusion correct that a neural network is not the best solution for my problem? If a neural network is not suitable, what is the most common machine learning approach? I've found the lifelines and scikit-survival packages for this purpose, but I don't know which one is better or whether either is the right solution.
3) I replaced the censored values with the follow-up variable, but given the high percentage of censoring in my dataset I don't think this is suitable. I realized that I should use another approach, such as calibration, but I could not find a Python library for doing this. Could you please help me with this? What is usually used for calibration in Python?
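For context, as far as I understand, these survival packages expect the follow-up time plus an event indicator rather than a single Y; a minimal lifelines sketch of what I mean (the column names and toy values below are made up) would be:

    import pandas as pd
    from lifelines import CoxPHFitter

    # Toy data for illustration: `time_to_event` is the follow-up time in days and
    # `event` is 1 if graft failure was observed, 0 if the patient was censored.
    df = pd.DataFrame({
        "time_to_event": [100, 250, 30, 400, 90],
        "event":         [1,   0,   1,  0,   1],
        "age":           [45,  60,  52, 38,  70],
    })

    cph = CoxPHFitter(penalizer=0.1)
    cph.fit(df, duration_col="time_to_event", event_col="event")
    cph.print_summary()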
One way to deal with censored data, I think, is to use a model that predicts Cox proportional hazards / Kaplan-Meier estimates. It looks like there is a framework here: researchgate.net/publication/…. It has an associated GitHub repository: github.com/jaredleekatzman/DeepSurv. Unfortunately, it is in Theano, not Keras.
I take back what I said. I found a Keras version (though I think it still uses the Theano backend; not too hard to change)!
https://github.com/mexchy1000/DeepSurv_Keras/blob/master/Survival_Keras_lifelineExample.py
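For anyone curious, the core of the DeepSurv idea is a network trained with the negative log Cox partial likelihood; a rough tf.keras sketch of such a loss (the sorting convention, names and epsilon are my own assumptions, not taken from that repo) could look like:

    import tensorflow as tf

    def cox_partial_likelihood_loss(y_true, y_pred):
        # Negative log Cox partial likelihood (DeepSurv-style) sketch.
        # Assumes each batch is sorted by descending survival/follow-up time.
        # y_true: event indicator (1 = event observed, 0 = censored).
        # y_pred: predicted log-risk score, shape (batch, 1).
        risk = tf.squeeze(y_pred, axis=-1)
        event = tf.cast(tf.squeeze(y_true, axis=-1), risk.dtype)
        # cumsum over the sorted batch gives the sum over each risk set
        log_risk_set = tf.math.log(tf.cumsum(tf.exp(risk)))
        partial_ll = (risk - log_risk_set) * event
        return -tf.reduce_sum(partial_ll) / (tf.reduce_sum(event) + 1e-8)

    # Hypothetical usage with a Keras model producing one risk score per patient:
    # model.compile(optimizer="adam", loss=cox_partial_likelihood_loss)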
I am trying to establish a correspondence between two faces and output whether the two faces match or not.
To do this, I did some research and found the face-compare package (https://pypi.org/project/face-compare/), which is based on FaceNet, and it works very well. Now I want to compare the accuracy of this solution against other solutions to choose the best one. Does anyone have ideas for other solutions (open source or commercial) that could help me with this benchmark?
The FaceNet work should be a good start. The network does good feature matching for facial data. Even though the face-compare library uses the same model, it would be good if you could fine-tune the FaceNet model on another dataset and evaluate it against the output from face-compare.
Apart from that, different variants of the Siamese architecture can be tried for feature matching. If you want to compare the matching quality, try computing the triplet loss value for a set of images.
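If it helps, here is a minimal sketch of computing that triplet loss on precomputed embeddings (the margin value and the embedding step are assumptions):

    import numpy as np

    def triplet_loss(anchor, positive, negative, margin=0.2):
        # The anchor embedding should be closer to the positive (same identity)
        # than to the negative (different identity) by at least `margin`.
        d_pos = np.sum((anchor - positive) ** 2)
        d_neg = np.sum((anchor - negative) ** 2)
        return max(d_pos - d_neg + margin, 0.0)

    # Hypothetical usage with embeddings from a FaceNet-style model:
    # print(triplet_loss(embed(face_a), embed(face_p), embed(face_n)))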
Is it possible to train a new, smaller network based on an already trained network, without data? I.e., the new network should just try to mimic the behaviour of the first one.
If it's not possible without data, are there any benefits to having an already trained network? I.e., as I understand it, we can at least use it for pseudo-labeling.
Update:
The most relevant paper I have found:
https://arxiv.org/pdf/1609.02943.pdf
I don't think you can say that you are training a network if you are not using any data. But you can always try to get a smaller one, for example by pruning the large network (in the simplest case, this means removing weights whose L2 norm is close to zero); there is a rich literature on the subject. Also, I think you might find some work on knowledge distillation useful, e.g. Data-Free Knowledge Distillation for Deep Neural Networks.
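As a rough illustration of that magnitude-pruning idea (the threshold value and function name are assumptions, and this only zeroes out small weights rather than physically shrinking the network):

    import numpy as np
    from tensorflow import keras

    def prune_small_weights(model, threshold=1e-3):
        # Zero out weights whose absolute value falls below `threshold`,
        # i.e. the weights whose norm is close to zero.
        for layer in model.layers:
            weights = layer.get_weights()
            if not weights:
                continue
            pruned = [np.where(np.abs(w) < threshold, 0.0, w) for w in weights]
            layer.set_weights(pruned)
        return model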
I am using FCNs (Fully Convolutional Networks) and trying to do image segmentation. During training, some areas are consistently mislabeled by the network, and further training doesn't help much to make them go away. I believe this is because the network learns some features which might not be the completely correct ones, but because there are enough correctly classified examples, it gets stuck in a local minimum and can't get out.
One solution I can think of is to train for an epoch, then evaluate the network on the training images, and then increase the loss weight for the mismatched regions so that mistakes there are penalized more in the next epoch.
Intuitively, this makes sense to me, but I haven't found any writing on it. Is this a known technique? If yes, what is it called? If no, what am I missing (what are the downsides)?
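To make the idea concrete, the kind of loss I have in mind would look something like this sketch (packing the per-pixel weights into y_true is just one way I imagine doing it; the names are made up):

    import tensorflow as tf
    from tensorflow import keras

    def weighted_pixel_crossentropy(y_true_with_weights, y_pred):
        # y_true_with_weights: (batch, H, W, n_classes + 1); the last channel is a
        # per-pixel weight map that up-weights regions mismatched in the previous
        # epoch. y_pred: (batch, H, W, n_classes) softmax output.
        y_true = y_true_with_weights[..., :-1]
        weights = y_true_with_weights[..., -1]
        ce = keras.losses.categorical_crossentropy(y_true, y_pred)  # (batch, H, W)
        return tf.reduce_mean(ce * weights)

    # model.compile(optimizer="adam", loss=weighted_pixel_crossentropy)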
It depends strongly on your network structure. If you are using the original FCN, the pooling operations degrade segmentation performance at object boundaries. There have been quite a few variants of the original FCN for image segmentation, although they didn't go down the route you're proposing.
To name a couple of examples: one approach is to use a Conditional Random Field (CRF) on top of the FCN output to refine the segmentation. You may search for the relevant papers to get more of an idea of that. In some sense, it is close to your idea, but the difference is that the CRF is separate from the network, as a post-processing step.
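If you want to experiment with that, here is a rough sketch of CRF post-processing using the pydensecrf package (the pairwise parameters below are typical example values, not tuned for your data):

    import numpy as np
    import pydensecrf.densecrf as dcrf
    from pydensecrf.utils import unary_from_softmax

    def crf_refine(image, softmax_probs, n_iters=5):
        # image: uint8 RGB array of shape (H, W, 3)
        # softmax_probs: FCN output of shape (n_classes, H, W)
        n_classes, h, w = softmax_probs.shape
        d = dcrf.DenseCRF2D(w, h, n_classes)
        d.setUnaryEnergy(unary_from_softmax(softmax_probs))
        d.addPairwiseGaussian(sxy=3, compat=3)                           # smoothness term
        d.addPairwiseBilateral(sxy=80, srgb=13, rgbim=image, compat=10)  # appearance term
        q = d.inference(n_iters)
        return np.argmax(q, axis=0).reshape(h, w)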
Another very interesting work is U-Net. It borrows an idea from residual networks (ResNet), which enables high-resolution features from lower levels to be integrated into higher levels to achieve more accurate segmentation.
This is still a very active research area, so you may bring the next breakthrough with your own idea. Who knows! Have fun!
First, if I understand correctly, you want your network to overfit your training set? That's generally something you don't want to happen: it would mean that during training your network has found some "rules" that give it great results on the training set, but it also means that it hasn't been able to generalize, so when you give it new samples it will probably perform poorly. Moreover, you never mention a test set. Have you divided your dataset into training and test sets?
Secondly, to give you something to look into, the idea of penalizing more where you don't perform well made me think of something called "AdaBoost" (it might be unrelated). This short video might help you understand what it is:
https://www.youtube.com/watch?v=sjtSo-YWCjc
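If you want to see AdaBoost's re-weighting of misclassified samples in action, here is a minimal scikit-learn sketch on toy data (everything below is made up for illustration):

    from sklearn.datasets import make_classification
    from sklearn.ensemble import AdaBoostClassifier
    from sklearn.model_selection import train_test_split

    X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    # Each boosting round fits a weak learner and increases the weight of
    # the samples that the previous rounds misclassified.
    clf = AdaBoostClassifier(n_estimators=50, random_state=0)
    clf.fit(X_train, y_train)
    print("test accuracy:", clf.score(X_test, y_test))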
Hope it helps
Is there software out there that optimises the combination of learning rate, weight ranges, and hidden-layer structure for a certain task, presumably by trying and discarding different combinations? What is this called? As far as I can tell, we just do it manually at the moment...
I know this is not directly code-related, but I'm sure it will help many others too. Cheers.
What you describe is a multivariate optimization problem: use an optimization algorithm and check the results. Particle Swarm Optimization (PSO) would do it (there are, however, considerations in using this algorithm), as long as you have a cost function to optimize, for example the error rate of the network output.
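To make that concrete, here is a minimal from-scratch PSO sketch (all the hyperparameters, bounds and the toy cost function are assumptions; in practice the cost would train a small network and return its validation error):

    import numpy as np

    def pso(cost, bounds, n_particles=20, n_iters=50, w=0.7, c1=1.5, c2=1.5, seed=0):
        # cost: maps a parameter vector to a scalar (e.g. validation error).
        # bounds: list of (low, high) pairs, one per parameter.
        rng = np.random.default_rng(seed)
        bounds = np.asarray(bounds, dtype=float)
        dim = len(bounds)
        pos = rng.uniform(bounds[:, 0], bounds[:, 1], size=(n_particles, dim))
        vel = np.zeros_like(pos)
        pbest, pbest_cost = pos.copy(), np.array([cost(p) for p in pos])
        gbest = pbest[np.argmin(pbest_cost)].copy()
        gbest_cost = pbest_cost.min()
        for _ in range(n_iters):
            r1, r2 = rng.random((2, n_particles, dim))
            vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
            pos = np.clip(pos + vel, bounds[:, 0], bounds[:, 1])
            costs = np.array([cost(p) for p in pos])
            improved = costs < pbest_cost
            pbest[improved], pbest_cost[improved] = pos[improved], costs[improved]
            if costs.min() < gbest_cost:
                gbest, gbest_cost = pos[np.argmin(costs)].copy(), costs.min()
        return gbest, gbest_cost

    # Hypothetical usage: parameters are (learning rate, number of hidden units),
    # with a dummy quadratic standing in for the real cost function.
    best, best_cost = pso(lambda p: (p[0] - 0.01) ** 2 + (p[1] - 64) ** 2,
                          bounds=[(1e-4, 1e-1), (8, 256)])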
I've got a set of 16,000 images and one sample image, and I need to find which of the 16,000 images appears in it. I've already tried OpenCV's ORB + FLANN approach, but it is too slow. I hope that a network, once trained, will be faster. I don't know NN theory well; I've read some articles and websites, and I've got a bunch of questions:
Should I use 16k output neurons to classify the input image?
How can I train my NN if I have only one training image per class?
What architecture should I use?
Maybe I should enlarge the training dataset by randomly distorting the input images?
Sorry in advance for my bad English. :)
I'm not an expert, but I think this kind of problem is not a perfect fit for neural networks. Feature extraction with interest points and descriptors, all available in OpenCV, is probably the best option. Anyway, let's try this. With the information given, I think you could try the following:
SOM network - Create a Self-Organizing Map (SOM) network with 16,000 output classes. I've never seen an example with that many classes and just one sample per class, but it should work. Maybe you can try using PCA to reduce the dimensionality of the images. Keep training the network with your images (or their PCA features). Start with, say, 1,000 epochs and keep raising this value until you get good results.
Here you can read a bit more about SOM.
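As a side note, the PCA suggestion above can also be combined with a plain nearest-neighbour lookup over the 16,000 images; a minimal sketch (the shapes, component count and stand-in data are assumptions):

    import numpy as np
    from sklearn.decomposition import PCA
    from sklearn.neighbors import NearestNeighbors

    # Stand-in data: the 16,000 reference images flattened to vectors,
    # plus the single query image flattened the same way.
    images = np.random.rand(16000, 64 * 64)
    query = np.random.rand(1, 64 * 64)

    # Reduce dimensionality with PCA, as suggested above.
    pca = PCA(n_components=128)
    features = pca.fit_transform(images)

    # Index the reduced features and look up the closest reference image.
    nn = NearestNeighbors(n_neighbors=1).fit(features)
    dist, idx = nn.kneighbors(pca.transform(query))
    print("best match: image", idx[0][0], "distance", dist[0][0])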