Download from Google Colab in a scheduled way - machine-learning

I'm training a very resource-intensive CycleGAN.
Because the training runs overnight, it sometimes happens that the system wipes the virtual machine and I lose all the checkpoints from my training phases.
I would like to add a control so that, for example, every 100 epochs the checkpoints are downloaded to my hard disk, allowing me to reload them and resume training.
Is it possible to download files from Colab in a scheduled, programmatic way?

Variants of this question have been asked several times, and the consensus seems to be to transfer results into one's Google Drive. I do something like what is described here -- it works well for me.
How to download file created in Colaboratory workspace?
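The Drive-based approach above can be sketched as follows. This is a minimal sketch, assuming your training writes checkpoints into a local directory; the function name and paths are illustrative, and on Colab you would first mount Drive and point the destination at a Drive path.

```python
import shutil
from pathlib import Path

def backup_checkpoints(src_dir, dest_dir):
    """Copy every checkpoint file from src_dir into dest_dir (created if needed)."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    for ckpt in Path(src_dir).glob("*.ckpt"):
        shutil.copy2(ckpt, dest / ckpt.name)

# In the training loop, back up every 100 epochs.
# On Colab you would first mount Drive:
#   from google.colab import drive
#   drive.mount('/content/drive')
# and use a dest_dir such as '/content/drive/MyDrive/cyclegan_ckpts'.
def train(num_epochs, src_dir, dest_dir):
    for epoch in range(1, num_epochs + 1):
        ...  # training step; checkpoints get written into src_dir
        if epoch % 100 == 0:
            backup_checkpoints(src_dir, dest_dir)
```

Since Drive persists after the VM is wiped, copying the checkpoints back into the new VM lets you resume training where you left off.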

Related

ERP study using EEG signals in EEGLab

I have done an experiment in which a user performs a click when a familiar image pops up on the screen. The experiment is repeated for 10 trials, and each trial is stored in a separate file. I need clarification on the following questions.
Because each trial is already in a separate file, I don't have to extract epochs. Is my understanding correct?
If I don't have to extract epochs, how do I plot ERPs? In the tutorials I referred to, epoch creation is one of the steps involved in ERP creation.
When I try to create an ERP study, I cannot find an option to include multiple files (files with epochs) as input. In this case, how do I import multiple epoch files?
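If each file already contains one stimulus-locked trial, those trials effectively are your epochs, and an ERP is simply their average across trials. A minimal sketch with synthetic data (the channel and sample counts here are made up, and loading your actual trial files is left out):

```python
import numpy as np

# Hypothetical setup: 10 trials, each an array of shape (n_channels, n_samples),
# already time-locked to the stimulus -- i.e. each file holds one epoch.
rng = np.random.default_rng(0)
trials = [rng.standard_normal((32, 500)) for _ in range(10)]

# The ERP is the average over trials, per channel and time point.
erp = np.mean(np.stack(trials), axis=0)  # shape (32, 500)

# erp[ch] can then be plotted against time, e.g. with matplotlib.
```

In EEGLab itself the same averaging happens when you plot ERPs from an epoched dataset, so the main task is getting your per-trial files imported as epochs of one dataset.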

Triggering training whenever new data is received in a specific folder (e.g. my_data)

I am having an issue. I have completed my TensorFlow code for image classification on the MNIST dataset, and it works perfectly, but I want to add an enhancement.
Say I have a folder named my_data that currently contains only images of 1's and 2's. If I move images of 3's into this folder, training should not start while there are fewer than 200 of them; once the count exceeds 200, training should start.
I have searched everywhere but could not find a solution to this MLOps problem.
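One simple way to implement the threshold check described above is to count the image files in the folder before kicking off training. A minimal sketch, where the threshold value, the extension list, and the function names are assumptions:

```python
from pathlib import Path

THRESHOLD = 200  # minimum number of images before training starts

def count_images(folder):
    """Count image files in the folder (the extension set is an assumption)."""
    exts = {".png", ".jpg", ".jpeg"}
    return sum(1 for p in Path(folder).iterdir() if p.suffix.lower() in exts)

def maybe_train(folder, train_fn):
    """Call train_fn only once the folder holds more than THRESHOLD images."""
    if count_images(folder) > THRESHOLD:
        train_fn()  # e.g. your existing TensorFlow training entry point
        return True
    return False
```

To run this continuously you could wrap `maybe_train` in a loop with `time.sleep`, or use a file-watching library such as watchdog to react to new files as they arrive.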

On replacing the LJ-Speech dataset with your own

In most GitHub repositories for machine-learning-based text to speech, the LJ-Speech dataset is used and optimized for.
Having unsuccessfully tried to use my own WAV files, I am interested in the right approach to preparing my own dataset so that such an optimized framework can train on it.
With Mozilla TTS, you can look at the LJ-Speech script used to prepare the data to get an idea of what is needed for your own dataset:
https://github.com/erogol/TTS_recipes/blob/master/LJSpeech/DoubleDecoderConsistency/train_model.sh
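For reference, the LJ-Speech layout that most of these repositories expect is a `wavs/` folder of audio clips plus a `metadata.csv` with one pipe-separated line per clip: `file_id|transcription|normalized transcription`. A minimal sketch for generating such a file from your own transcripts (the helper name is illustrative, and real text normalization, expanding numbers, abbreviations and so on, is more involved than shown here):

```python
from pathlib import Path

def write_metadata(transcripts, out_path):
    """transcripts: dict mapping wav file id (no extension) -> transcription.

    Writes an LJ-Speech-style metadata.csv: id|raw text|normalized text.
    Here the normalized column just reuses the raw text (an assumption --
    proper normalization would expand numbers, abbreviations, etc.).
    """
    lines = [f"{fid}|{text}|{text}" for fid, text in transcripts.items()]
    Path(out_path).write_text("\n".join(lines) + "\n", encoding="utf-8")
```

With your clips renamed to match the ids in `metadata.csv` and placed under `wavs/`, the existing LJ-Speech data-loading code in such repositories can usually be pointed at your dataset directory unchanged.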

Machine learning: specific strategy learned because of playing against specific agent?

First of all, I had difficulty formulating my question; feedback is welcome.
I have to build a machine learning agent that plays Dots and Boxes.
I'm still in the early stages, but a question came up: if I let my machine learning agent (with a specific implementation) play against a copy of itself to learn and improve its gameplay, wouldn't it just learn a strategy against that specific style of play?
Would it be more interesting to let my agent play and learn against a diverse set of other agents in an arbitrary fashion?
The idea of having an agent learn by playing against a copy of itself is referred to as self-play. Yes, in self-play you can sometimes see agents "overfit" against their training partner, resulting in an unstable learning process. See this blog post by OpenAI (in particular, the "Multiplayer" section), where exactly this issue is described.
The easiest way to address this that I've seen in research so far is indeed to generate a more diverse set of training partners. This can be done, for example, by storing checkpoints of multiple past versions of your agent in memory or on disk, and randomly picking one of them as the training partner at the start of every episode. This is roughly what was done during the self-training process of DeepMind's original AlphaGo program (the 2016 version), and it is also described in another blog post by OpenAI.
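The checkpoint-pool approach described above can be sketched as follows; the class and method names are illustrative, not from any specific library:

```python
import copy
import random

class SelfPlayPool:
    """Keep snapshots of past agent versions and sample one as the
    opponent at the start of each episode (a sketch of the checkpoint-pool
    idea; in practice snapshots would live on disk, not in memory)."""

    def __init__(self, max_size=10):
        self.snapshots = []
        self.max_size = max_size

    def save(self, agent):
        """Store a frozen copy of the current agent."""
        self.snapshots.append(copy.deepcopy(agent))
        if len(self.snapshots) > self.max_size:
            self.snapshots.pop(0)  # drop the oldest snapshot

    def sample_opponent(self, current_agent):
        """Pick a random past version; fall back to pure self-play
        until at least one snapshot exists."""
        if not self.snapshots:
            return current_agent
        return random.choice(self.snapshots)
```

At the start of each episode you would call `sample_opponent`, and every so many training steps call `save`, so the agent keeps facing a mix of older strategies instead of only its latest self.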

Can anyone explain how to get BIDMach's Word2vec to work?

In a paper titled "Machine Learning at the Limit," Canny et al. report substantial word2vec processing speed improvements.
I'm working with the BIDMach library used in this paper, and cannot find any resource that explains how Word2Vec is implemented or how it should be used within this framework.
There are several scripts in the repo:
getw2vdata.sh
getwv2data.ssc
I've tried running them (after building the referenced tparse2.exe file) with no success.
I've tried modifying them to get them to run but have nothing but errors come back.
I emailed the author and posted an issue on the GitHub repo, but have gotten nothing back. The only response was from somebody else with the same troubles, who says he got it to run, but at much slower speeds than reported, on newer GPU hardware.
I've searched all over trying to find anyone that has used this library to achieve these speeds with no luck. There are multiple references floating around that point to this library as the fastest implementation out there, and cite the numbers in the paper:
Intel research references the reported numbers without running the code on GPU (they cite numbers reported in the original paper)
old reddit post pointing to BIDMach as the best (but the OP says "I haven't tested BIDMach myself yet")
SO post citing BIDMach as the best (OP doesn't actually run the library to make this claim...)
many more not worth listing citing BIDMach as the best/fastest without example or claims of "I haven't tested myself..."
When I search for a similar library (gensim) and the import code required to run it, I find thousands of results and tutorials, but a similar search for the BIDMach code yields only the BIDMach repo itself.
This BIDMach implementation certainly carries the reputation for being the best, but can anyone out there tell me how to use it?
All I want to do is run a simple training process to compare it to a handful of other implementations on my own hardware.
Every other implementation of this concept that I can find either works with the original shell-script test file, provides actual instructions, or provides shell scripts of its own for testing.
UPDATE:
The author of the library has added additional shell scripts to get the previously mentioned scripts running, but exactly what they mean or how they work is still a total mystery, and I can't figure out how to run the word2vec training procedure on my own data.
EDIT (for bounty)
I'll give the bounty to anyone who can explain how I'd use my own corpus (text8 would be great), train a model, and then save the output vectors and the vocabulary to files that can be read by Omar Levy's Hyperwords.
This is exactly what the original C implementation would do with arguments -binary 1 -output vectors.bin -save-vocab vocab.txt
This is also what Intel's implementation does, and other CUDA implementations, etc, so this is a great way to generate something that can be easily compared with other versions...
UPDATE (bounty expired without answer)
John Canny has updated a few scripts in the repo and added an fmt.txt file, making it possible to run the test scripts that are packaged in the repo.
However, my attempt to run this with the text8 corpus yields near-0% accuracy on the hyperwords test.
Running the training process on the billion-word benchmark (which is what the repo scripts now do) also yields well-below-average accuracy on the hyperwords test.
So, either the library never yielded accuracy on these tests, or I'm still missing something in my setup.
The issue remains open on github.
BIDMach's Word2vec is a tool for learning vector representations of words, also known as word embeddings. To use it, you first need to download and install BIDMach, an open-source machine learning library written in Scala. Once BIDMach is installed, you can train a Word2vec model on a corpus of text data; training takes a number of parameters, such as the size of the word vectors, the number of epochs to train for, and the type of model to use. More detailed instructions and examples of the word2vec functionality can be found in the BIDMach documentation.
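Whichever implementation ends up producing the vectors, the output files the bounty asks for are simple and can be written with the standard library alone. Below is a sketch (the function name and the input data layout are my own) of the original C word2vec's `-binary 1 -output vectors.bin -save-vocab vocab.txt` formats, so the result can be consumed by Hyperwords-style evaluation tools:

```python
import struct

def save_word2vec_binary(vectors, vec_path, vocab_counts, vocab_path):
    """vectors: dict mapping word -> list of floats (all the same length).

    Writes the original C word2vec binary format (-binary 1): an ASCII
    header "vocab_size dim\n", then per word the UTF-8 word, a space,
    dim little-endian float32 values, and a newline. Also writes a
    vocab file (-save-vocab): one "word count" pair per line.
    """
    words = list(vectors)
    dim = len(vectors[words[0]])
    with open(vec_path, "wb") as f:
        f.write(f"{len(words)} {dim}\n".encode("utf-8"))
        for w in words:
            f.write((w + " ").encode("utf-8"))
            f.write(struct.pack(f"<{dim}f", *vectors[w]))
            f.write(b"\n")
    with open(vocab_path, "w", encoding="utf-8") as f:
        for w in words:
            f.write(f"{w} {vocab_counts.get(w, 1)}\n")
```

If you can get any vectors out of BIDMach (or gensim, for a baseline), writing them through a helper like this produces files directly comparable with the C implementation's output.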
