Download Google Sheets file as CSV to cPanel using cron task - google-sheets

I have a specific task to accomplish which involves downloading a file from Google Sheets. I need to always have just one file downloaded, so the new file should overwrite any previous one (if it exists).
I have tried the following command, but I can't quite get it to work and I'm not sure what's missing.
/usr/local/bin/php -q https://docs.google.com/spreadsheets/d/11rFK_fQPgIcMdOTj6KNLrl7pNrwAnYhjp3nIrctPosg/ -o /usr/local/bin/php /home/username/public_html/wp-content/uploads/wpallimport/files.csv

Managed to solve it with the following:
curl --user-agent cPanel-Cron https://docs.google.com/spreadsheets/d/[...]/edit?usp=sharing --output /home/username/public_html/wp-content/uploads/wpallimport/files/file.csv
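One caveat: the /edit?usp=sharing URL normally returns the sheet's HTML page rather than raw data, so if the saved file.csv turns out not to be CSV, the export endpoint is the usual way to fetch it directly (same command, just a different URL; the sheet ID is elided here and the sheet has to be reachable via its link):
curl --user-agent cPanel-Cron "https://docs.google.com/spreadsheets/d/[...]/export?format=csv" --output /home/username/public_html/wp-content/uploads/wpallimport/files/file.csv
Since --output always overwrites an existing file, scheduling this as the cron command keeps exactly one copy of the file, as required.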

Related

WebHDFS open and read a file

I'm trying to automate a process whereby I import multiple files from an HDFS folder. Is there a way to get multiple files in one command, instead of fetching the JSON directory listing and downloading each file on every iteration?
In the command below, is there a way to put a wildcard in place of the actual file name (the <PATH> component), so I can read multiple files in a single request?
curl -i -L "http://<HOST>:<PORT>/webhdfs/v1/<PATH>?op=OPEN[&offset=<LONG>][&length=<LONG>][&buffersize=<INT>]"
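As far as I know, op=OPEN takes a single file path and does not accept wildcards, so the usual workaround is still to enumerate the directory once with op=LISTSTATUS and loop over the results. A minimal sketch, assuming bash with curl and jq available; host, port and directory are placeholders:
HOST=namenode.example.com; PORT=9870; DIR=/data/incoming
# List the directory once, keep only regular files, then fetch each one by name.
curl -s "http://$HOST:$PORT/webhdfs/v1$DIR?op=LISTSTATUS" \
  | jq -r '.FileStatuses.FileStatus[] | select(.type=="FILE") | .pathSuffix' \
  | while read -r f; do
      curl -s -L "http://$HOST:$PORT/webhdfs/v1$DIR/$f?op=OPEN" -o "$f"
    done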

Exclude a directory from `podman/docker export` stream and save to a file

I have a container that I want to export as a .tar file. So far I have used podman run with tar --exclude=/dir1 --exclude=/dir2 … writing to a file on a bind-mounted host directory. Recently, though, this has been giving me tar: .: file changed as we read it errors, which podman/docker export would avoid; besides, I suppose export is more efficient. So I'm trying to migrate to export, but the major obstacle is that I can't seem to find a way to exclude paths from the tar stream.
If possible, I'd like to avoid modifying a tar archive already saved on disk, and instead modify the stream before it gets saved to a file.
I've been banging my head against this for hours: trying useless advice from ChatGPT, looking at cpio, and attempting to pipe podman export into a tar --exclude … command. With the last approach I had some small success at one point, but I couldn't get tar to save the result to a file with a particular name.
Any suggestions?
(Note: I make no distinction between docker and podman here, as their export commands are identical, and mentioning both helps searchability.)
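One stream-oriented option that seems to fit: GNU tar's --delete mode reportedly works when tar acts as a filter from stdin to stdout, so the unwanted paths can be dropped from the export stream before anything is written to disk. A sketch, assuming GNU tar and a container named mycontainer (the container and directory names are placeholders); note that member paths in an export stream have no leading slash:
podman export mycontainer | tar --delete --file=- dir1 dir2 > container.tar
The excluded directories still stream through tar once, but no intermediate archive ever touches disk, which was the goal.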

Travis-CI: Where do I find the path to the file I downloaded, and find out "where I am"?

I would like to download data into the travis environment via a public URL:
before_install:
- curl -OL http://publicwebsite.org/data/subdirectory2
My problem is that I don't know whether the data downloaded successfully.
My question is: how do I find this data afterwards? I assumed it would end up at /home/travis/build/subdirectory2, but that isn't the case.
Commands such as pwd and ls don't show me the downloaded files in the build, or where I am.
$PWD is $TRAVIS_BUILD_DIR by the time execution reaches before_install; see https://docs.travis-ci.com/user/environment-variables#Default-Environment-Variables.
If you are unsure where the file ended up, though, you can pass -o /path/to/file to curl to choose the destination explicitly.
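For example, something along these lines makes the location explicit and fails the step early if the download did not succeed (a sketch; the URL is the one from the question, and the local file name is an assumption):
before_install:
- curl -fsSL -o "$TRAVIS_BUILD_DIR/subdirectory2" http://publicwebsite.org/data/subdirectory2
- ls -l "$TRAVIS_BUILD_DIR/subdirectory2"
Here -f makes curl return a non-zero exit code on HTTP errors, so the build errors out instead of silently continuing with missing data.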

tarring and untarring between two remote hosts

I have two systems that I'm splitting processing between, and I'm trying to find the most efficient way to move the data between the two. I've figured out how to tar and gzip to an archive on the first server ("serverA") and then use rsync to copy it to the remote host ("serverB"). However, when I untar/unzip the data there, the extracted files include the full path name from the original server. So if on serverA my data is in:
/serverA/directory/a/lot/of/subdirs/myData/*
and, using this command:
tar -zcvf /serverA/directory/a/lot/of/subdirs/myData-archive.tar.gz /serverA/directory/a/lot/of/subdirs/myData/
Everything in .../myData is successfully tarred and zipped in myData-archive.tar.gz
However, after copying the archive, when I try to untar/unzip on the second host (I manually log in here to finish the processing, the first step of which is to untar/unzip) using this command:
tar -zxvf /serverB/current/directory/myData-archive.tar.gz
It untars everything into my current directory (/serverB/current/directory/), but it ends up looking like this:
/serverB/current/directory/serverA/directory/a/lot/of/subdirs/myData/Data*ext
How should I formulate both the tar commands so that my data ends up in a directory called
/serverB/current/directory/dataHERE/
?
I know I'll need the -C flag to untar into a different directory (in my case, /serverB/current/directory/dataHERE), but I still can't figure out how to keep the entire path from being included when the archive gets untarred. I've seen similar posts, but none that I saw discussed how to do this when moving between two different hosts.
UPDATE: per one of the answers in this question, I changed my commands to:
tar/zip on serverA:
tar -zcvf /serverA/directory/a/lot/of/subdirs/myData-archive.tar.gz serverA/directory/a/lot/of/subdirs/myData/ -C /serverA/directory/a/lot/of/subdirs/ myData
and, untar/unzip:
tar -zxvf /serverB/current/directory/myData-archive.tar.gz -C /serverB/current/directory/dataHERE
And now, not only does it untar/unzip the data to:
/serverB/current/directory/dataHERE/
like I wanted, but it also puts another copy of the data here:
/serverB/current/directory/serverA/directory/a/lot/of/subdirs/myData/
which I don't want. How do I need to fix my commands so that the data only ends up in the first location?
On serverA do
( cd /serverA/directory/a/lot/of/subdirs; tar -zcvf myData-archive.tar.gz myData; )
After some more messing around, I figured out how to achieve what I wanted (the second copy came from still passing the long path as a member in addition to the -C /serverA/directory/a/lot/of/subdirs/ myData pair):
To tar on serverA:
tar -zcvf /serverA/directory/a/lot/of/subdirs/myData-archive.tar.gz -C /serverA/directory/a/lot/of/subdirs/ myData
Then to untar on serverB:
tar -zxvf /serverB/current/directory/myData-archive.tar.gz -C /serverB/current/directory/dataHERE
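For completeness, since the original goal was to move the data between the hosts as efficiently as possible: if serverA can reach serverB over SSH, the archive step and the rsync step can be collapsed into one stream, so nothing is written to disk in between. A sketch using the same paths as above (the hostname serverB is a placeholder):
tar -C /serverA/directory/a/lot/of/subdirs -zcf - myData | ssh serverB "tar -zxf - -C /serverB/current/directory/dataHERE"
The -C on the left keeps the stored paths relative to myData, and the -C on the right drops the extracted files straight into dataHERE.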

GSUTIL not re-uploading a file that has already been uploaded earlier that day

I'm running GSUTIL v3.42 from a Windows CMD script on Windows Server 2008 R2 using Python 2.7.6. Files to be uploaded arrive in an "outgoing" directory and are uploaded in parallel by GSUTIL to an "incoming" bucket. The script requests a listing of the "incoming" bucket after uploading has finished and then compares the files listed with those it attempted to upload, in order to detect any upload failures. Another separate script moves files from the "incoming" bucket to a "processed" bucket afterwards.
If I attempt to upload the identical file (same name/size/content/date etc.) a second time, it doesn't upload, although I get no errors and nothing in my logging to indicate failure. I am not using the "no clobber" option, so I would expect gsutil to just upload the file.
In the scenario below, assume that the file has been successfully uploaded and moved to the "processed" bucket already on that day. In case timings matter, the second upload is being attempted within half an hour of the first.
1. File A arrives in the "outgoing" directory.
2. I get a file listing of "outgoing" and write it to dirListing.txt.
3. I perform a GSUTIL upload using:
type dirListing.txt | python gsutil -m cp -I -L myGsutilLogFile.txt gs://myIncomingBucket
4. I then perform a GSUTIL listing:
python gsutil ls -l -h gs://myIncomingBucket > bucketListing.txt
5. File-match dirListing.txt and bucketListing.txt to detect mismatches and hence upload failures.
On the second run, File A isn't being uploaded in step 3 and consequently isn't returned in step 4, causing a mismatch in step 5. [I've checked the content of all of the relevant files and it's definitely in dirListing.txt and not in bucketListing.txt]
I need the ability to re-process a file in case the separate script that moves the file from the "incoming" to the "processed" bucket fails for some reason or doesn't do what it should do. I have to upload in parallel because there are normally hundreds of files on each run.
Is what I've described above expected behaviour from GSUTIL? (I haven't seen anything in the documentation that suggests it.) If so, is there any way of forcing GSUTIL to re-attempt the upload? Or am I missing something obvious? I have debug output from GSUTIL if that's necessary/useful.
From the above, it looks like you're uploading using "-L" to log to a manifest file. If you're using the same manifest file, and the file has already been uploaded once, then gsutil will not try to re-upload the file. From the docs on "-L" in "gsutil help cp":
If the log file already exists, gsutil will use the file as an
input to the copy process, and will also append log items to the
existing file. Files/objects that are marked in the existing log
file as having been successfully copied (or skipped) will be
ignored.
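So the simplest way to force a re-attempt is to start each run with a fresh manifest, for example by deleting (or renaming) the log file before the cp step. A sketch in the same CMD style as the question, reusing its file names:
rem Start each run with a fresh manifest so previously-copied files are not skipped
if exist myGsutilLogFile.txt del myGsutilLogFile.txt
type dirListing.txt | python gsutil -m cp -I -L myGsutilLogFile.txt gs://myIncomingBucket
Renaming the old manifest to a per-run copy instead of deleting it outright would also preserve the upload history for troubleshooting.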
