Unzip a 7z file in Google Colab? - machine-learning

I am trying to write a CNN on Kaggle's Amazon from Space dataset. I can't spend money right now, so I want to use Google Colab. I have successfully downloaded the dataset using the Kaggle CLI tool, but I am not able to extract the data. Please help me.
(Screenshot of the notebook and error: https://i.stack.imgur.com/RFAnL.png)

Try this:
!7z e train-jpg.tar.7z
See if you get the tar file; if so, extract it with tar -xvf. Also check that train-jpg.tar.7z is in the current directory.
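If 7z itself turns out to be missing on the Colab VM, a minimal sketch of the whole extraction (the p7zip-full package name and the train-jpg.tar output name are assumptions based on the archive name above):
!apt-get -qq install -y p7zip-full    # provides the 7z binary
!7z e train-jpg.tar.7z                # unpacks the inner tar file
!tar -xvf train-jpg.tar               # extracts the images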

Related

Redirect output along with error of console to a file on AWS S3 inside docker

I'm trying to find a way to redirect the output as well as the errors to AWS S3 as a file from inside Docker. There are already some great answers in this link.
Using the first answer given in the link, I tried the following (outside Docker):
python3 train.py 2>&1 | aws s3 cp - s3://my_bucket_name/folder/output.log
This works properly: the output.log file gets created in my S3 bucket, as I intend. But when I put the same command as the CMD command inside the Dockerfile, it does nothing.
CMD python3 train.py 2>&1 | aws s3 cp - s3://my_bucket_name/folder/output.log
In fact, Docker kind of gets stuck and terminates after a while.
But if I use the following inside Docker, the output gets created in the mount directory without any issue:
CMD python3 train.py > /mount/directory/output.log 2>&1
But I want the file uploaded to S3 live.
My use case:
I'm trying to train a deep learning model on an EC2 instance, and I want to capture whatever appears on the console as a log file and store it on S3 live. Whenever a log file is uploaded to S3, a Lambda function triggers and sends that log file to localhost/another server for some processing.
Also, is there any way to show the output in the main console while the file is being uploaded to S3?
P.S. I don't have a software background; I'm a mathematician trying to get into the field of deep learning. So if I've framed the question wrong or used the wrong terminology, pardon me.
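One hedged sketch that may address both points, assuming the aws CLI is installed in the image and credentials come from the instance role: pipe through tee so every line also reaches the container log while stdout streams to aws s3 cp, and run Python unbuffered so the piped output doesn't look like a hang:
CMD ["bash", "-c", "python3 -u train.py 2>&1 | tee /dev/stderr | aws s3 cp - s3://my_bucket_name/folder/output.log"]
One caveat: S3 objects are immutable, so the log only appears in the bucket once the upload completes; truly live tailing would need periodic aws s3 cp of a local file, or something like CloudWatch Logs instead.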

Download Google Sheets file as CSV to cPanel using cron task

I have a specific task to accomplish which involves downloading a file from Google Sheets. I need to always have just one file downloaded, so the new file should overwrite any previous one (if it exists).
I have tried the following command but I can't quite get it to work. Not sure what's missing:
/usr/local/bin/php -q https://docs.google.com/spreadsheets/d/11rFK_fQPgIcMdOTj6KNLrl7pNrwAnYhjp3nIrctPosg/ -o /usr/local/bin/php /home/username/public_html/wp-content/uploads/wpallimport/files.csv
Managed to solve it with the following:
curl --user-agent cPanel-Cron https://docs.google.com/spreadsheets/d/[...]/edit?usp=sharing --output /home/username/public_html/wp-content/uploads/wpallimport/files/file.csv
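A variant sketch, assuming the sheet is shared publicly: Google Sheets also exposes a direct CSV export endpoint, which avoids accidentally saving the HTML of the edit page instead of CSV data:
curl --user-agent cPanel-Cron "https://docs.google.com/spreadsheets/d/[...]/export?format=csv" --output /home/username/public_html/wp-content/uploads/wpallimport/files/file.csv
Since curl's --output overwrites an existing file, each cron run leaves exactly one up-to-date copy, as required.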

Error downloading YouTube-8M dataset with curl in Windows 8.1

I'm trying to download a small chunk of the YouTube-8M dataset. It is just a dataset with video features and labels, and you can create your own model to classify them.
The command that they claim will download the dataset is this:
curl storage.googleapis.com/data.yt8m.org/download_fix.py | shard=1,100 partition=2/frame/train mirror=us python
This actually didn't work at all, and the error produced is:
'shard' is not recognized as an internal or external command, operable program or batch file.
I found a forum post that says to add 'set' before the variables, which seems to fix my problem partially.
curl storage.googleapis.com/data.yt8m.org/download_fix.py | set shard=1,100 partition=2/video/train mirror=us python
The download seemingly started for a split second, and then an error popped up. The error right now is (23) Failed writing body.
So what is the correct command line for downloading the dataset?
I'd try using the Kaggle API instead. You can install the API using:
pip install kaggle
Then download your API credentials (the kaggle.json token from your Kaggle account page; there's a step-by-step guide in the Kaggle API documentation). Finally, you can download the dataset like so:
kaggle competitions download -c youtube8m
If you only want part of the dataset, you can first list all the downloadable files:
kaggle competitions files -c youtube8m
And then only download the file(s) you want:
kaggle competitions download -c youtube8m -f name_of_your_file.extension
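For reference, a compact end-to-end sketch on Linux/macOS (the ~/.kaggle path is where the CLI looks for the kaggle.json token; the final filename is a placeholder):
pip install kaggle
mkdir -p ~/.kaggle && cp kaggle.json ~/.kaggle/ && chmod 600 ~/.kaggle/kaggle.json
kaggle competitions files -c youtube8m
kaggle competitions download -c youtube8m -f name_of_your_file.extension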
Hope that helps! :)

Google Cloud Platform - Viewing downloaded files after wget

I am completing this tutorial and am at the part where you download the code for the tutorial. The request we send to GitHub is:
wget https://github.com/GoogleCloudPlatform/cloudml-samples/archive/master.zip
I understand that this downloads the archive to GCP, and I can see the files in the Cloud Shell, but is there a way to see the files through the Google Cloud Console GUI? I would like to browse the files I have downloaded to understand their structure better.
Clicking the pencil icon in the top right corner opens the Cloud Shell code editor.
Quoting the documentation:
"The built-in code editor is based on Orion. You can use the code
editor to browse file directories as well as view and edit files, with
continued access to the Cloud Shell. The code editor is available by
default with every Cloud Shell instance."
You can find more info here: https://cloud.google.com/shell/docs/features#code_editor
If you prefer to use the command line to view files, you can install the tree Unix CLI command and run it in Cloud Shell to list the contents of directories in a tree-like format.
install tree => $ sudo apt-get install tree
run it => $ tree ./ -h --filelimit 4
-h shows human-readable sizes for files and directories,
and --filelimit tells tree not to descend into directories that contain more than the given number of entries.
Use $ man tree to see the available parameters for the command, or check the online man page here: https://linux.die.net/man/1/tree
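Putting it together for this tutorial, a short sketch (the cloudml-samples-master directory name is an assumption based on GitHub's branch-archive naming):
unzip master.zip
tree cloudml-samples-master/ -h --filelimit 4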

How to install libspatialindex on Google Colaboratory

To efficiently analyse spatial data with Python, I use the rtree spatial index library, relying on the libspatialindex C library.
I am able to successfully install rtree in the Google Colaboratory notebook using !pip install rtree.
As expected, this alone is not sufficient, since libspatialindex needs to be installed first, which is confirmed by import rtree resulting in:
OSError: Could not find libspatialindex_c library file
I am unsure whether and how to install external libraries in Google Colaboratory. Following https://github.com/libspatialindex/libspatialindex/wiki/1.-Getting-Started I managed to run !curl -L http://download.osgeo.org/libspatialindex/spatialindex-src-1.8.5.tar.gz | tar xz, but I do not have permission to run configure:
!spatialindex-src-1.8.5/configure
/bin/sh: 1: spatialindex-src-1.8.5/configure: Permission denied
Edit: Looks like the bug has been fixed. Building no longer requires the !mount ... command below. I've updated the example notebook accordingly.
The original response follows.
This looks like a Colab bug. The /content directory is mounted with noexec, which is what's causing the permissions error.
Until that's fixed, you can remount /content with the exec permissions you need using the command:
!mount -o remount,exec /content
Here's a complete notebook that installs libspatialindex and rtree.
https://colab.research.google.com/notebook#fileId=1N7i9zmOwVcUzd4eHWZux4p_WTBMZHi8C
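For reference, a condensed sketch of a build along these lines, assuming either the fix has landed or the remount above has been run (Colab runs as root, so no sudo is needed):
!curl -L http://download.osgeo.org/libspatialindex/spatialindex-src-1.8.5.tar.gz | tar xz
!cd spatialindex-src-1.8.5 && ./configure && make && make install && ldconfig
!pip install rtree
The ldconfig at the end refreshes the linker cache so rtree can find the freshly installed libspatialindex_c.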
