What does the cntk.blocks.Stabilizer() function do in CNTK? - machine-learning

I am going through the CNTK 204: Sequence to Sequence Networks with Text Data tutorial. A function cntk.blocks.Stabilizer() is used, but there is currently no documentation for it. Does anyone know what it does?

It implements the self-stabilization technique from:
"Self-stabilized deep neural network," P. Ghahremani and J. Droppo, ICASSP 2016.
And here is a direct link to the paper: https://www.microsoft.com/en-us/research/wp-content/uploads/2016/11/SelfLR.pdf
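In short, the stabilizer scales its input by a single learned scalar that is kept positive, which the paper shows speeds up training of deep and recurrent networks. A minimal sketch of the idea (not the actual CNTK implementation; the softplus parameterization and names here are assumptions):

import numpy as np

def self_stabilizer(x, alpha):
    # beta = softplus(alpha) keeps the learned scale positive;
    # initializing alpha = ln(e - 1) gives beta = 1, i.e. an
    # identity scaling at the start of training.
    beta = np.log1p(np.exp(alpha))
    return beta * x

# alpha would be learned jointly with the other parameters;
# at this initialization the output is approximately x itself
x = np.random.randn(4)
print(self_stabilizer(x, alpha=np.log(np.e - 1)))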

Related

RL: Self-Play with On-Policy and Off-Policy

I am trying to implement self-play with PPO.
Suppose we have a game with two agents. We control one player on each side and receive an observation and a reward after each step. As far as I know, you can use the information from both the left and the right player to generate training data and optimize the model. But that is only possible for off-policy methods, isn't it?
Because with an on-policy method such as PPO, the training data is expected to be generated by the current version of the network, and that is usually not the case during self-play?
Thanks!
Exactly, and this is also the reason why you can use experience replay (replay buffers) only with off-policy methods like Q-learning. Using sample steps that were not generated by the current policy violates the mathematical assumptions behind the gradients that are being backpropagated.
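To illustrate: one way to keep PPO self-play on-policy is to let both players act with the same, current network in every rollout, so all collected transitions come from the policy being optimized. A hedged sketch (env, policy, and ppo_update are placeholders, not a real API):

def collect_self_play_batch(env, policy, n_steps):
    # roll out with the *current* policy controlling both sides,
    # so the data stays on-policy for PPO
    batch = []
    obs_left, obs_right = env.reset()
    for _ in range(n_steps):
        a_left = policy.act(obs_left)    # the same current network...
        a_right = policy.act(obs_right)  # ...controls both players
        (next_left, next_right), (r_left, r_right), done = env.step(a_left, a_right)
        batch.append((obs_left, a_left, r_left))
        batch.append((obs_right, a_right, r_right))
        obs_left, obs_right = env.reset() if done else (next_left, next_right)
    return batch

# PPO then discards the batch after each update, as on-policy requires:
# while training:
#     batch = collect_self_play_batch(env, policy, 2048)
#     ppo_update(policy, batch)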

best algorithm to predict 3 similar blogs based on a blog's props and contents only

{
  "blogid": 11,
  "blog_authorid": 2,
  "blog_content": "(this is blog complete content: html encoded on base64 such as) PHNlY3Rpb24+PGRpdiBjbGFzcz0icm93Ij4KICAgICAgICA8ZGl2IGNsYXNzPSJjb2wtc20tMTIiIGRhdGEtdHlwZT0iY29udGFpbmVyLWNvbnRlbn",
  "blog_timestamp": "2018-03-17 00:00:00",
  "blog_title": "Amazon India Fashion Week: Autumn-",
  "blog_subtitle": "",
  "blog_featured_img_link": "link to image",
  "blog_intropara": "Introductory para to article",
  "blog_status": 1,
  "blog_lastupdated": "\"Mar 19, 2018 7:42:23 AM\"",
  "blog_type": "Blog",
  "blog_tags": "1,4,6",
  "blog_uri": "Amazon-India-Fashion-Week-Autumn",
  "blog_categories": "1",
  "blog_readtime": "5",
  "ViewsCount": 0
}
Above is one sample blog from my API. I have a JSON array of such blogs.
I am trying to predict the 3 most similar blogs based on a blog's properties (e.g. tags, categories, author, keywords in the title/subtitle) and contents. I have no user data, i.e. there is no logged-in user data (such as ratings or reviews). I know that without user data it will not be accurate, but I'm just getting started with data science and ML. Any suggestion/link is appreciated. I prefer Java, but Python, PHP, or any other language also works for me. I need an easy-to-implement model, as I am a beginner. Thanks in advance.
My intuition is that this question might not be at the right address.
BUT
I would do the following:
Create a dataset of sites to serve as the inventory to predict from. For each site you will need to list one or more features: number of tags, number of posts, average time between posts in days, etc.
Since it sounds like this is for learning and you are not too worried about accuracy, numeric features should suffice.
Work back from a k-NN algorithm. Don't worry about the classifiers: instead of classifying a blog, you list its 3 closest neighbors (k = 3). A good implementation of the algorithm is here; have fun simplifying it for your purposes (a small sketch follows below).
Your algorithm should be a step or two shorter than k-NN, which is considered among the simpler ML algorithms and a good place to start.
Good luck.
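A minimal sketch of the k-NN idea above, using scikit-learn's NearestNeighbors on made-up numeric blog features (the feature choice is illustrative only):

import numpy as np
from sklearn.neighbors import NearestNeighbors

# rows = blogs; columns = [number of tags, read time, number of categories]
features = np.array([
    [3, 5, 1],
    [2, 7, 1],
    [5, 4, 2],
    [3, 6, 1],
    [1, 9, 3],
], dtype=float)

# k = 4 because the nearest neighbor of a blog is the blog itself
nn = NearestNeighbors(n_neighbors=4).fit(features)
distances, indices = nn.kneighbors(features[[0]])
print(indices[0][1:])  # indices of the 3 blogs most similar to blog 0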
EDIT:
You want to build a recommender engine using text, tags, numeric, and maybe time-series data. This is a broad request. Just like you, when faced with this request, I'd need to dive into the data and research the best approach. Some approaches require different sets of data, e.g. collaborative vs. content-based filtering.
A few things may have been missed on the user side that can serve as a sort of rating: you do not need a login feature to get information; a cookie ID or IP-based DMA, geolocation, and viewing duration should all be available to the web server.
On the blog side, you need to process the texts to identify related terms; I gave examples of other blog features above.
I am aware that this is a lot of hand-waving, but there is no actual code question here. To reiterate, my intuition is that this question might not be at the right address.
I really want to help, but this is the best I can do.
EDIT 2:
If I understand your new comments correctly, each blog has the following for each other blog:
- A Jaccard similarity coefficient.
- A set of TF-IDF-generated words with scores.
- A Euclidean distance based on numeric data.
I would create a heuristic from these and allow the process to adjust the importance of each statistic (see the sketch below).
The challenge would be to quantify the word-score TF-IDF output. You can treat the words above a certain score as tags and run another similarity analysis, or count the overlap.
You already started on this path, and this answer assumes you are going to continue. IMO the best path is to see which dedicated recommender engines can help you without constructing statistics piecemeal (numeric w/ Euclidean, tags w/ Jaccard, text w/ TF-IDF).
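A hedged sketch of the weighted-heuristic idea: combine the three similarity signals into one score per candidate blog. The weights and numbers are arbitrary starting points for tuning, not recommendations:

# blog id -> (Jaccard on tags, TF-IDF cosine, Euclidean on numeric data);
# the values are made up for illustration
candidates = {
    12: (0.2, 0.5, 0.9),
    13: (0.6, 0.3, 3.4),
    14: (0.1, 0.8, 1.2),
    15: (0.4, 0.7, 2.1),
}

def combined_similarity(jaccard, tfidf_cos, euclidean,
                        w_tags=0.3, w_text=0.5, w_numeric=0.2):
    numeric_sim = 1.0 / (1.0 + euclidean)  # turn a distance into a similarity
    return w_tags * jaccard + w_text * tfidf_cos + w_numeric * numeric_sim

scores = {bid: combined_similarity(*stats) for bid, stats in candidates.items()}
top3 = sorted(scores, key=scores.get, reverse=True)[:3]
print(top3)  # the 3 highest-scoring candidate blogs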

Support Vector Machine bad results - Python

I'm studying SVMs and implemented this code. It's basic and primitive, and it takes too much time, but I just wanted to see how it actually works. Unfortunately, it is giving me bad results. What did I miss? Some coding error or a mathematical mistake? If you want to look at the dataset, the link is here. I took it from the UCI Machine Learning Repository. Thanks for your help.
import numpy as np

def hypo(x, q):
    # sigmoid hypothesis
    return 1 / (1 + np.exp(-x.dot(q)))

data = np.loadtxt('LSVTVoice', delimiter='\t')
x = np.ones(data.shape)
x[:, 1:] = data[:, 0:data.shape[1] - 1]
y = data[:, data.shape[1] - 1]
q = np.zeros(data.shape[1])
C = 0.002

# mean normalization
for i in range(q.size - 1):
    x[:, i + 1] = (x[:, i + 1] - x[:, i + 1].mean()) / (x[:, i + 1].max() - x[:, i + 1].min())

# training loop
for i in range(2000):
    h = x.dot(q)
    for j in range(q.size):
        q[j] = q[j] - (C * np.sum(-y * np.log(hypo(x, q)) - (1 - y) * np.log(1 - hypo(x, q)))) + (0.5 * np.sum(q ** 2))

# print predicted label next to the true label
for i in range(y.size):
    if h[i] >= 0:
        print(y[i], '1')
    else:
        print(y[i], '0')
Depending on your data, it is quite common for a simple hand-rolled SVM implementation to give bad results. You should try a mature implementation (e.g. scikit-learn's SVM); you can also check this: https://github.com/scikit-learn/scikit-learn/tree/master/sklearn/svm
SVMs come in several variants and have parameters such as different kernels (e.g. RBF). You should try them with different parameters (depending on your data) and compare the results to each other.
You can use a grid search approach for the comparison (see: http://scikit-learn.org/stable/modules/grid_search.html).
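A minimal grid-search sketch with scikit-learn's SVC; the parameter grid is an arbitrary starting point, and reading the question's 'LSVTVoice' file this way is an assumption:

import numpy as np
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

data = np.loadtxt('LSVTVoice', delimiter='\t')
X, y = data[:, :-1], data[:, -1]
X = StandardScaler().fit_transform(X)  # SVMs are sensitive to feature scale
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

grid = GridSearchCV(SVC(),
                    param_grid={'kernel': ['linear', 'rbf'],
                                'C': [0.1, 1, 10],
                                'gamma': ['scale', 0.01, 0.001]},
                    cv=5)
grid.fit(X_train, y_train)
print(grid.best_params_, grid.score(X_test, y_test))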

Original paper for DisparityWLSFilter in openCV?

I am working on post-processing of a disparity map.
My disparity image, even though it is WLS-filtered, has too many 'holes'.
This is what I get for now: rectified, though in a fish-eye way. In any case it is definitely rectified, but it has many holes. The disparity matching algorithm is SGBM. The WLS filter sigma is 2.1 and lambda is 30000. Black regions are holes.
I am referring to the official OpenCV page on disparity map post-filtering, which uses DisparityWLSFilter extensively. But I wonder how it works internally and want to read the theoretical paper behind this implementation. I want to know what sigma and lambda do, and how the filter will affect my image.
Also, is there any other good disparity filter that I can use? The WLS filter cannot fill the 'holes' effectively. Or any algorithm that is easy to use or easy to implement, or a library that is not GPL?
Self reply.
I got an answer from OpenCV; the original question is HERE. The reply says:
References have been added here, documentation reference
cc @sbokov
Check out the comments here, and the code here. That should answer some of your questions. To see how the code's author came up with this method, you should perhaps contact him directly, as there is no reference for it in the code comments.
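For reference, a hedged sketch of typical DisparityWLSFilter usage from opencv-contrib's ximgproc module; the file names, SGBM settings, and the lambda/sigma values are illustrative assumptions, not recommendations:

import cv2

left = cv2.imread('left.png', cv2.IMREAD_GRAYSCALE)
right = cv2.imread('right.png', cv2.IMREAD_GRAYSCALE)

left_matcher = cv2.StereoSGBM_create(minDisparity=0, numDisparities=96, blockSize=5)
right_matcher = cv2.ximgproc.createRightMatcher(left_matcher)

disp_left = left_matcher.compute(left, right)
disp_right = right_matcher.compute(right, left)

# lambda: strength of the smoothness regularization (larger = smoother);
# sigma: sensitivity to edges in the guide image (larger = less edge-aware)
wls = cv2.ximgproc.createDisparityWLSFilter(matcher_left=left_matcher)
wls.setLambda(8000.0)
wls.setSigmaColor(1.5)
filtered = wls.filter(disp_left, left, disparity_map_right=disp_right)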

Hyperopt Exploration/Exploitation strategy

What settings does Hyperopt provide to adjust the balance between exploration and exploitation? There is something like "bandit" and "bandit_algo" in the code, but no explanation.
Could someone provide a code sample?
Thanks a lot for any help!
I just found hyperopt's partial(), a magical wrapper function for the optimizer algorithm. It allows balancing between different strategies and, through them, exploration/exploitation:
partial returns the result of a randomly chosen suggest function. For example, to search by sometimes using random search, sometimes anneal, and sometimes TPE, type:
from functools import partial
from hyperopt import fmin, mix, rand, anneal, tpe

fmin(...,
     algo=partial(mix.suggest,
                  p_suggest=[
                      (.1, rand.suggest),
                      (.2, anneal.suggest),
                      (.7, tpe.suggest),
                  ]),
)
Parameter "p_suggest": list of (probability, suggest) pairs. Make a suggestion from one of the suggest functions, in proportion to its corresponding probability. sum(probabilities) must be [close to] 1.0.
If you want even sharper control of the algorithm's progression, you can use the fact that hyperopt's optimizer algorithms are stateless: the Trials object can be passed into a new fmin call to continue the process. You can then call fmin in a loop, raising max_evals by one each time, and modify the trials and the suggest algorithm between iterations.
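A hedged sketch of that loop, calling fmin one evaluation at a time while reusing the same Trials object so the suggest algorithm can be switched mid-run; the toy objective, search space, and switch-over point are placeholders:

from hyperopt import fmin, hp, rand, tpe, Trials

space = hp.uniform('x', -10, 10)
objective = lambda x: (x - 3.0) ** 2

trials = Trials()
best = None
for step in range(1, 51):
    # explore with random search first, then exploit with TPE
    algo = rand.suggest if step <= 10 else tpe.suggest
    best = fmin(objective, space, algo=algo,
                max_evals=step, trials=trials)
print(best)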
For the best bet, read the papers by Bergstra et al. 1, 2, and 3. I am not 100% clear on what the bandit_algo is, except that one of the papers mentions it as an alternative method to Gaussian processes and the Tree of Parzen Estimators; maybe you can use it in the same way as those two?
My guess is that if it is not documented, it may not be finished yet. You can try raising an issue on GitHub; the devs are fairly responsive from what I have seen.
EDIT: Looking at this paper, these bandit algorithms may be the base class that the others inherit from.
