Grouping of summaries during hyperparameter search in tensorflow?

I would like to perform a hyperparameter search. For each combination of parameters, I create the appropriate graph and run training on it. Inside the graph I have summaries such as likelihood plots.
How can I organize TensorBoard reporting so that I can see everything on one page?
I can use different logdir names; what other options are there?

Assuming you have experiment1 and experiment2, each corresponding to a different hyperparameter setting, the best approach is to save the summaries in the following structure:
<base_path>/experiment1/
<base_path>/experiment2/
Now, to see a comparison of the experiments according to the summaries you defined, launch TensorBoard with --logdir=<base_path>
Note that in order to see a comparison of the different settings you must give each summary a name that does not depend on a specific setting. For example, tf.summary.scalar("loss", loss) is good, but tf.summary.scalar("loss_%s" % experiment_id, loss) will result in a different summary plot for each setting.
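For instance, a minimal TF2-style sketch of this layout (the base_path, learning rates, and loss values are placeholders, not from the question):

```python
import tensorflow as tf

base_path = "/tmp/tb_logs"  # hypothetical base directory

# One sub-directory per hyperparameter setting; the summary tag stays the same.
for experiment_id, lr in [("experiment1", 0.1), ("experiment2", 0.01)]:
    writer = tf.summary.create_file_writer(f"{base_path}/{experiment_id}")
    with writer.as_default():
        for step in range(10):
            loss = 1.0 / (1.0 + lr * step)  # stand-in for a real training loss
            tf.summary.scalar("loss", loss, step=step)  # same tag in every run
```

Launching tensorboard --logdir /tmp/tb_logs then overlays both "loss" curves in a single plot, one line per experiment.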

Result verification with Weka Experiment tab with individual classifier models

I ran different classifiers on the same dataset and got some statistical values after running them.
This is the summary of all classifiers.
I am using Weka to train the models. Weka itself has a method to compare different algorithms: the Experiment tab. I have tried this option as well on the same dataset.
Weka gave me results for the Kappa statistic when using the Experiment tab,
the root mean squared error,
the relative absolute error,
and so on.
Now I cannot understand how the values I got from the Experiment tab relate to the values I shared in the table format in the first picture.
I presume that the initial table was populated with statistics obtained from cross-validation runs in the Weka Explorer.
The Explorer aggregates the predictions across a single cross-validation run so that it appears that you had a single test set of that size. It is only to be used as an explorative tool, hence the name.
The Experimenter records the metrics (such as accuracy, RMSE, etc.) generated from each fold pair across the number of runs that you perform during your experiment. The metrics collected across multiple classifiers and/or datasets can then be analyzed using significance tests. By default, 10 runs of 10-fold CV are used, which is recommended for such comparisons. This results in 100 individual values for each metric, from which the mean and standard deviation are generated. The markers * and v indicate a statistically significant loss or win, respectively.
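For intuition, the 10×10 scheme can be reproduced outside Weka; a sketch with scikit-learn, where the dataset and classifier are stand-ins for whatever you compared:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import RepeatedStratifiedKFold, cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# 10 runs of 10-fold CV -> 100 accuracy values, like the Experimenter's default
cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=10, random_state=1)
scores = cross_val_score(DecisionTreeClassifier(random_state=1), X, y, cv=cv)

print(len(scores), scores.mean().round(3), scores.std().round(3))
```

The mean and standard deviation over these 100 values are what the Experimenter reports; a single Explorer run gives you just one aggregated number instead.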

Applying multiple weights

I have a data set that has weighted scores based on gender and age profiles. I also have region data broken up into 7 states. I basically want to exclude one of those states and apply additional weights based on state to come up with a new "overall" score.
Manual Excel calculations are the only way I can think of to do this.
I need to take scores that already have a variable weight applied and add an additional weight dependent on region.
SPSS Statistics only allows a single weight variable to be applied at any one time, so Kevin Troy's comments are correct: you'll have to combine things into a single weight. If the data are properly combined into a single file you may find the Rake Weights extension that's installed with the Python Essentials useful, as you can specify multiple variables as inputs to the overall weighting scheme and have the weights calculated for you. If you're not familiar with the theory behind this, look up raking or rim weighting.
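In code, "combining into a single weight" is just a product; a sketch with pandas (the column names and values are hypothetical, not from the question's data):

```python
import pandas as pd

# Hypothetical data: an existing gender/age weight plus a new state weight.
df = pd.DataFrame({
    "score":        [70, 80, 90],
    "demo_weight":  [1.2, 0.8, 1.0],
    "state_weight": [0.9, 1.1, 1.0],
})

# SPSS applies only one weight variable at a time, so multiply the two.
df["combined_weight"] = df["demo_weight"] * df["state_weight"]

# Weighted overall score under both weights at once
weighted_mean = (df["score"] * df["combined_weight"]).sum() / df["combined_weight"].sum()
print(round(weighted_mean, 2))
```

Excluding a state is then just filtering those rows out before computing the weighted mean.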

Use k-means test results for training set SPSS

I am a student working with SPSS (Statistics) for the first time. I used 1,000 rows of data to run the k-means cluster tool and obtained the results. I now want to take those results and run them against a test set (another 1,000 rows) to see how my model did.
I am not sure how to do this; any help is greatly appreciated!
Thanks
For a clustering model (or any unsupervised model), there really is no right or wrong result. There is no target variable to compare the cluster model's result (the cluster allocation) to, so the idea of splitting the data set into training and testing partitions does not apply to these types of models.
The best you can do is to review the output of the model and explore the cluster allocations and determine whether these appear to be useful for the intended purpose.
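That said, you can still assign held-out rows to the learned clusters and inspect the allocations; a sketch with scikit-learn rather than SPSS, where random data stands in for your two 1,000-row sets:

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
train = rng.normal(size=(1000, 3))  # stand-in for the first 1,000 rows
test = rng.normal(size=(1000, 3))   # the held-out 1,000 rows

# Fit the centroids on the first set only
model = KMeans(n_clusters=4, n_init=10, random_state=0).fit(train)

# Each new row is assigned to its nearest learned centroid
test_clusters = model.predict(test)
print(np.bincount(test_clusters))  # cluster sizes on the held-out data
```

Comparing the cluster-size distribution and profiles between the two sets is a reasonable sanity check, even though there is no accuracy-style score.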

How many and/or what criteria does CfsSubsetEvaluator use in selecting features in each step of cross-validation while doing feature selection?

I am quite new to WEKA, and I have a dataset of 111 cases with 109 attributes. I am using feature selection tab in WEKA with CfsSubsetEval and BestFirst search method for feature selection. I am using leave-one-out cross-validation.
So, how many features does WEKA pick, and what is the stopping criterion for the number of features this method selects in each step of cross-validation?
Thanks,
Gopi
The CfsSubsetEval algorithm searches for a subset of features that work well together (low correlation among the features and high correlation with the target label). The score of a subset is called merit (you can see it in the output).
The BestFirst search won't let you fix the number of features to select. However, you can use other methods such as GreedyStepwise, or use the InformationGain/GainRatio algorithms with Ranker and define the size of the feature set.
Another option you can use to influence the size of the set is the direction of the search (forward, backward, ...).
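To make the merit-driven search concrete, here is a toy forward selection in Python (not Weka's implementation; the merit formula follows the standard CFS definition, and max_features is the knob BestFirst lacks):

```python
import numpy as np

def merit(X, y, subset):
    # CFS merit: k * r_cf / sqrt(k + k*(k-1) * r_ff)
    k = len(subset)
    r_cf = np.mean([abs(np.corrcoef(X[:, j], y)[0, 1]) for j in subset])
    if k == 1:
        return r_cf
    r_ff = np.mean([abs(np.corrcoef(X[:, i], X[:, j])[0, 1])
                    for i in subset for j in subset if i < j])
    return k * r_cf / np.sqrt(k + k * (k - 1) * r_ff)

def forward_select(X, y, max_features):
    # Greedy forward search: repeatedly add the feature that improves merit most
    selected, remaining = [], list(range(X.shape[1]))
    while remaining and len(selected) < max_features:
        best = max(remaining, key=lambda j: merit(X, y, selected + [j]))
        if selected and merit(X, y, selected + [best]) <= merit(X, y, selected):
            break  # no improvement: stop, like GreedyStepwise
        selected.append(best)
        remaining.remove(best)
    return selected
```

The search stops either when no candidate improves the merit or when max_features is reached, which is exactly the kind of size control the answer describes.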
Good luck

Splitting data set into training and testing sets on recommender systems

I have implemented a recommender system based upon matrix factorization techniques. I want to evaluate it.
I want to use 10-fold-cross validation with All-but-one protocol (https://ai2-s2-pdfs.s3.amazonaws.com/0fcc/45600283abca12ea2f422e3fb2575f4c7fc0.pdf).
My data set has the following structure:
user_id,item_id,rating
1,1,2
1,2,5
1,3,0
2,1,5
...
It's confusing for me to think about how the data is going to be split, because I can't put some triples (user, item, rating) in the testing set. For example, if I put the triple (2,1,5) in the testing set and this is the only rating user 2 has made, there won't be any other information about this user and the trained model won't predict any values for him.
Considering this scenario, how should I do the splitting?
You didn't specify a language or toolset so I cannot give you a concise answer that is 100% applicable to you, but here's the approach I took to solve this same exact problem.
I'm working on a recommender system using Treasure Data (i.e. Presto) and implicit observations, and ran into a problem with my matrix where some users and items were not present. I had to re-write the algorithm to split the observations into train and test so that every user and every item would be represented in the training data. For the description of my algorithm I assume there are more users than items. If this is not true for you then just swap the two. Here's my algorithm.
1. Select one observation for each user.
2. For each item that has only one observation and has not already been selected in step 1, select one observation.
3. Merge the results of the previous two steps. This produces a set of observations that covers all of the users and all of the items.
4. Calculate how many observations you need to fill your training set (generally 80% of the total number of observations).
5. Calculate how many observations are in the merged set from step 3. The difference between steps 4 and 5 is the number of remaining observations needed to fill the training set.
6. Randomly select that many of the remaining observations.
7. Merge the sets from steps 3 and 6: this is your training set.
8. The remaining observations are your testing set.
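The steps above can be sketched in Python (a SQL-free toy version; the (user, item, rating) triple format matches the question, and for simplicity this covers every missing item rather than only single-observation items):

```python
import random

def coverage_split(observations, train_frac=0.8, seed=0):
    """Split (user, item, rating) triples so that every user and every
    item appears in the training set."""
    rng = random.Random(seed)
    obs = list(observations)
    rng.shuffle(obs)

    train, seen_users, seen_items = [], set(), set()
    # Steps 1-3: one observation per user, then cover any still-missing items
    for o in obs:
        if o[0] not in seen_users:
            train.append(o); seen_users.add(o[0]); seen_items.add(o[1])
    for o in obs:
        if o[1] not in seen_items:
            train.append(o); seen_items.add(o[1]); seen_users.add(o[0])

    # Steps 4-6: top up the training set with random remaining observations
    remaining = [o for o in obs if o not in train]
    need = max(0, int(train_frac * len(obs)) - len(train))
    train += remaining[:need]   # already shuffled, so this pick is random
    test = remaining[need:]     # step 8: everything left is the test set
    return train, test
```

Note that, as the answer below points out, the covered "special case" observations are forced into training, so they are slightly over-represented there.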
As I mentioned, I'm doing this using Treasure Data and Presto so the only tool I have at my disposal is SQL, common table expressions, temporary tables, and Treasure Data workflow.
You're quite correct in your basic logic: if you have only one observation in a class, you must include that in the training set for the model to have any validity in that class.
However, dividing the input into these classes depends on the interactions among various observations. Can you identify classes of data, such as the "only rating" issue you mentioned? As you find other small classes, you'll also need to ensure that you have enough of those observations in your training data.
Unfortunately, this is a process that's tricky to automate. Most one-time applications simply have to hand-pick those observations from the data, and then distribute the others per normal divisions. This does have a problem that the special cases are over-represented in the training set, which can detract somewhat from the normal cases in training the model.
Do you have the capability of tuning the model as you encounter later data? This is generally the best way to handle sparse classes of input.
Collaborative filtering (matrix factorization) can't produce a good recommendation for an unseen user with no feedback. Nevertheless, an evaluation should consider this case and take it into account.
One thing you can do is to report performance for all test users, just test users with some feedback and just unseen users with no feedback.
So I'd say keep the train/test split random, but evaluate separately for unseen users.
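A sketch of that three-way reporting (the prediction rows and the train_users set are hypothetical):

```python
import numpy as np

# Hypothetical predictions: (user, true_rating, predicted_rating)
results = [(1, 4.0, 3.5), (1, 2.0, 2.5), (2, 5.0, 3.0), (3, 1.0, 4.0)]
train_users = {1}  # users that had at least one rating in the training set

def rmse(rows):
    errs = np.array([true - pred for _, true, pred in rows])
    return float(np.sqrt((errs ** 2).mean()))

seen = [r for r in results if r[0] in train_users]
unseen = [r for r in results if r[0] not in train_users]

# Report all three numbers: overall, seen-only, unseen-only
print(rmse(results), rmse(seen), rmse(unseen))
```

Splitting the metric this way shows how much of the overall error comes from the cold-start users the model could never have learned about.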
