Error downloading YouTube-8M dataset with curl in Windows 8.1 - youtube

I'm trying to download a small chunk of the YouTube-8M dataset. It is a dataset of video features and labels, and you can create your own model to classify them.
The command that they claim will download the dataset is this:
curl storage.googleapis.com/data.yt8m.org/download_fix.py | shard=1,100 partition=2/frame/train mirror=us python
This didn't work at all, and the error produced is:
'shard' is not recognized as an internal or external command, operable program or batch file.
I found a forum post that suggests adding 'set' before the variables, which seems to fix my problem partially:
curl storage.googleapis.com/data.yt8m.org/download_fix.py | set shard=1,100 partition=2/video/train mirror=us python
The download seemingly started for a split second, then an error popped up. The error now is (23) Failed writing body.
So what is the correct command line for downloading the dataset on Windows?
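One workaround sketch for cmd.exe, assuming download_fix.py reads the shard/partition/mirror variables from the environment (I have not verified this): set each variable on its own line, and save the script to a file instead of piping it into python, since the pipe closing early is a typical cause of curl's (23) Failed writing body:

```
:: Windows cmd.exe sketch -- variable names taken from the original command
set shard=1,100
set partition=2/frame/train
set mirror=us

:: download the script to a file instead of piping it into python
curl -o download_fix.py storage.googleapis.com/data.yt8m.org/download_fix.py

:: run the downloaded script; it should pick up the variables set above
python download_fix.py
```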

I'd try using the Kaggle API instead. You can install the API using:
pip install kaggle
Then download your credentials (step-by-step guide here). Finally, you can download the dataset like so:
kaggle competitions download -c youtube8m
If you only want part of the dataset, you can first list all the downloadable files:
kaggle competitions files -c youtube8m
And then only download the file(s) you want:
kaggle competitions download -c youtube8m -f name_of_your_file.extension
Hope that helps! :)

Related

Download google sheets file as csv to cpanel using cron task

I have a specific task to accomplish which involves downloading a file from Google Sheets. I need to always have just one file downloaded, so the new file should overwrite any previous one (if it exists).
I have tried the following command but I can't quite get it to work. Not sure what's missing.
/usr/local/bin/php -q https://docs.google.com/spreadsheets/d/11rFK_fQPgIcMdOTj6KNLrl7pNrwAnYhjp3nIrctPosg/ -o /usr/local/bin/php /home/username/public_html/wp-content/uploads/wpallimport/files.csv
Managed to solve with the following:
curl --user-agent cPanel-Cron https://docs.google.com/spreadsheets/d/[...]/edit?usp=sharing --output /home/username/public_html/wp-content/uploads/wpallimport/files/file.csv
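To run that command on a schedule from cPanel, a crontab entry along these lines should work (the daily 06:00 schedule is an illustrative assumption, and the sheet ID stays elided as in the command above):

```
# every day at 06:00, refresh the downloaded file
0 6 * * * curl --user-agent cPanel-Cron "https://docs.google.com/spreadsheets/d/[...]/edit?usp=sharing" --output /home/username/public_html/wp-content/uploads/wpallimport/files/file.csv
```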

Apache Jena: riot does not produce output

I recently installed Apache Jena 3.17.0, and have been trying to use it to convert N-Quads files to N-Triples.
As per the instructions here (https://jena.apache.org/documentation/tools/), I first set up my WSL (Ubuntu 20.04) environment
$ export JENA_HOME=apache-jena-3.17.0/
$ export PATH=$PATH:$JENA_HOME/bin
and then attempted to run riot to do the conversion (triail.nq is my nquads file).
$ riot --output=NTRIPLES -v triail.nq
When I ran this, I got no output to the terminal. I'm not sure what is going wrong here, since there is no error message. Does anyone know what could be causing this / what the solution could be?
Thanks in advance!
The command reads the quad (multiple-graph) data and outputs only the default graph. Presumably there is no default-graph data in triail.nq.
If "convert" means combine all the quads into a single graph, then remove the graph field on each line of the data file with a text editor.
Otherwise, read the file into an RDF dataset, copy the named graphs into a single graph, and output that.
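A sketch of the text-editor approach using sed instead, under the assumption that each line has the simple form `<s> <p> <o> <g> .` and that only the object term may contain spaces (the sample quad below is a stand-in for the contents of triail.nq):

```shell
# sample quad (stand-in for the real triail.nq from the question)
printf '%s\n' '<http://ex/s> <http://ex/p> "a b" <http://ex/g> .' > triail.nq

# drop the 4th (graph) term from each line, turning quads into triples;
# lines that already have only three terms are left unchanged
sed -E 's/^([^ ]+ [^ ]+ .*[^ ]) [^ ]+ \.$/\1 ./' triail.nq > triail.nt

cat triail.nt
```

This does not handle comments, blank lines, or terms with escaped spaces, so treat it as a quick hack rather than a proper converter.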

Unzip a 7z file in google collab?

I am trying to write a CNN on Kaggle's Amazon from Space dataset. I can't spend money right now, so I want to use Google Colab. I have successfully downloaded the dataset using the kaggle CLI tool, but I am not able to extract the data. Please help me.
(Screenshot of the error: https://i.stack.imgur.com/RFAnL.png)
Try this:
!7z e train-jpg.tar.7z
If that produces the tar file, extract it with tar -xvf train-jpg.tar.
Also check that train-jpg.tar.7z is in the current directory.

Google Cloud Platform - Viewing downloaded files after wget

I am completing this tutorial and am at the part where you download the code for the tutorial. The request we send to GitHub is:
wget https://github.com/GoogleCloudPlatform/cloudml-samples/archive/master.zip
I understand that this downloads archive to GCP, and I can see the files in the Cloud shell, but is there a way to see the files through the Google Console GUI? I would like to browse the files I have downloaded to understand their structure better.
By clicking on the pencil icon on the top right corner, the Cloud Shell Code editor will pop.
Quoting the documentation:
"The built-in code editor is based on Orion. You can use the code
editor to browse file directories as well as view and edit files, with
continued access to the Cloud Shell. The code editor is available by
default with every Cloud Shell instance."
You can find more info here: https://cloud.google.com/shell/docs/features#code_editor
If you prefer to use the command line to view files, you can install the tree Unix CLI command and run it in Cloud Shell to list the contents of directories in a tree-like format.
install tree => $ sudo apt-get install tree
run it => $ tree ./ -h --filelimit 4
-h shows human-readable sizes of files/directories,
and --filelimit skips descending into directories that contain more than the given number of entries.
Use $ man tree to see the available parameters for the command, or check the online man page here: https://linux.die.net/man/1/tree

Error while running model training in google cloud ml

I want to run model training in the cloud. I am following this link which runs a sample code to train a model based on flower dataset. The tutorial consists of 4 stages:
Set up your Cloud Storage bucket
Preprocessing training and evaluation data in the cloud
Run model training in the cloud
Deploying and using the model for prediction
I was able to complete steps 1 and 2; however, in step 3 the job is submitted successfully, but then an error occurs and the task exits with a non-zero status of 1. Here is a screenshot of the expanded log of the task:
I used following command:
gcloud ml-engine jobs submit training test${JOB_ID} \
--stream-logs \
--module-name trainer.task \
--package-path trainer \
--staging-bucket ${BUCKET_NAME} \
--region us-central1 \
--runtime-version=1.2 \
-- \
--output_path "${GCS_PATH}/training" \
--eval_data_paths "${GCS_PATH}/preproc/eval*" \
--train_data_paths "${GCS_PATH}/preproc/train*"
Thanks in advance!
Can you please confirm that the input files (eval_data_paths and train_data_paths) are not empty? Additionally if you are still having issues can you please file an issue https://github.com/GoogleCloudPlatform/cloudml-samples since its easier to handle the issue on Github.
I met the same issue and couldn't figure it out. Then I followed this guide again, starting over from the git clone step, and there was no error after running on GCS.
It is clear from your error message
The replica worker 1 exited with a non-zero status of 1. Termination reason: Error
that you have some programming error (syntax, undefined etc).
For more information, check the return codes and their meanings:
Return code   Meaning                  Cloud ML Engine response
0             Successful completion    Shuts down and releases job resources.
1-128         Unrecoverable error      Ends the job and logs the error.
You need to find your bug first and fix it, then try again.
I recommend running your task locally (if your configuration supports it) before you submit it in the cloud. If you find any bug, you can fix it easily on your local machine.
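A sketch of such a local run, using the same module layout as the submit command above (the output path is an illustrative placeholder; gcloud ml-engine local train was the local counterpart of jobs submit training in this era of the SDK):

```shell
gcloud ml-engine local train \
  --module-name trainer.task \
  --package-path trainer \
  -- \
  --output_path /tmp/training \
  --eval_data_paths "${GCS_PATH}/preproc/eval*" \
  --train_data_paths "${GCS_PATH}/preproc/train*"
```

Errors such as syntax mistakes or missing imports then show up directly in your terminal instead of in the cloud job logs.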
