I'm in the process of converting a Makefile-based data workflow to DVC. I have a Google spreadsheet that I use in the workflow to make it easy to update a few things in a makeshift database. Currently this works with something like this:
# Makefile
data.csv:
	curl -L https://docs.google.com/spreadsheets/d/MY-GOOGLE-DOC-ID/export?exportFormat=csv > data.csv
Of course, I could incorporate the same step into my DVC pipeline directly with dvc run, but my understanding is that something like dvc import-url would be more appropriate. However, I'm getting an error:
$ poetry run dvc import-url https://docs.google.com/spreadsheets/d/MY-GOOGLE-DOC-ID/export?exportFormat=csv data.csv
Importing 'https://docs.google.com/spreadsheets/d/MY-GOOGLE-DOC-ID/export?exportFormat=csv' -> 'data.csv'
ERROR: unexpected error - 'NoneType' object has no attribute 'endswith'
My guess is that this is because the response from the Google Spreadsheet export URL doesn't have a filename suffix associated with it. Is there a way to work around this problem? Or is there a better way to pull data from a Google spreadsheet into a DVC workflow?
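For reference, the dvc run fallback mentioned above would look roughly like this (the stage name fetch_data and the quoting are just one reasonable setup):
poetry run dvc run -n fetch_data -o data.csv \
    "curl -L 'https://docs.google.com/spreadsheets/d/MY-GOOGLE-DOC-ID/export?exportFormat=csv' > data.csv"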
Related
I am trying to move data from InfluxDB to QuestDB.
I was able to export my tables as JSON by following https://stackoverflow.com/a/27913640/1267728.
How do I now import these JSON files into QuestDB?
Convert from JSON to CSV
QuestDB supports importing data via CSV file, so first you would need to flatten the JSON and ensure that column names are modified to reflect nested properties.
There is a Java library called Json2Flat that already does this.
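For example, a nested record like this (purely illustrative, not the exact Influx export format):
{"measurement": "cpu", "tags": {"host": "server01"}, "value": 0.64}
would need to become a CSV row whose headers encode the nesting, something like:
measurement,tags.host,value
cpu,server01,0.64
The exact header naming depends on the flattening tool.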
Import the CSV file
Using the REST API, you can import the data into QuestDB
curl -F data=@file.csv http://localhost:9000/imp
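If you want the table to have a name other than the file name, you can pass it with the request (the name parameter is from the QuestDB /imp documentation; double-check it against your version):
curl -F data=@file.csv "http://localhost:9000/imp?name=my_table"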
For more details of how to use the REST API, please go to the official documentation.
Check the data
To verify that the import was successful, you can check via the Web Console or via curl:
curl -G --data-urlencode "query=select * from 'file.csv'" http://localhost:9000/exp
Just adding here that QuestDB recently improved the performance of CSV ingestion. More info at https://questdb.io/docs/guides/importing-data/
If you want to avoid converting from JSON (and it is probably also more performant than exporting to JSON for large tables), you can use the influxd inspect export-lp command, which exports all your data as ILP points. You can choose to export a single bucket.
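For example (flag names as in the InfluxDB 2.x docs; the bucket ID and paths are placeholders to adapt):
influxd inspect export-lp \
  --bucket-id 0123456789abcdef \
  --engine-path ~/.influxdbv2/engine \
  --output-path ./influx-export.lp \
  --compress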
Once you have the ILP files, you can import them as explained in this other StackOverflow post: What's the best way to upload an ILP file into QuestDB?
Alright! Following is the scenario, with my questions at each step:
1) I am using a bash script to generate a JSON object for the status of custom processes.
2) I provide the bash script inside the zabbix_agentd.conf file:
UserParameter=service.check[*],/usr/lib/zabbix/externalscripts/service_check.bash
I want to provide the process names as parameters to the bash script here in UserParameter; how do I do that?
3) Restarting the zabbix-agent and checking with zabbix-get yields an empty JSON (because we have not given any process names):
{
"data":[
]
}
4) If I provide process names in UserParameter as:
UserParameter=service.check[*],/usr/lib/zabbix/externalscripts/service_check.bash apache2 ntp cron
It yields the following:
{
"data":[
which I know is wrong, since I need to pass the process names in a different way. I tried passing them inside the bash script, and even then it generates invalid JSON as above.
5) The generated JSON will be taken care of by a Zabbix discovery rule of type "Zabbix agent", which will create different items out of the process names. Following is the JSON that my script should send:
{"data":[{"{#NAME}":"apache2","{#STATUS}":"RUNNING","{#VALUE}":"1"},{"{#NAME}":"ntp","{#STATUS}":"RUNNING","{#VALUE}":"1"},{"{#NAME}":"cron","{#STATUS}":"STOPPED","{#VALUE}":"0"}]}
I could have used zabbix-sender for the same purpose, but that would require running the sender for every key-value pair I need to send. This way I only have to manage the data in one place, and the rest is taken care of.
Hope this is clear enough and explains my situation.
I have a Rails webapp full of students with test scores. Each student has exactly one test score.
I want the following functionality:
1.) The user enters an arbitrary test score into the website and presses "enter"
2.) "Some weird magic where data is passed from Rails database to bash script"
3.) The following bash script is run:
./tool INPUT file.txt
where:
INPUT = the arbitrary test score
file.txt = a list of all student test scores in the database
4.) "More weird magic where output from the bash script is sent back up to a rails view and made displayable on the webpage"
And that's it.
I have no idea how to do the weird magic parts.
My attempt at a solution:
In the rails dbconsole, I can do this:
SELECT score FROM students;
which gives me a list of all the test scores (which satisfies the "file.txt" argument to the bash script).
But I still don't know how my bash script is supposed to gain access to that data.
Is my controller supposed to pass the data down to the bash script? Or is my model supposed to? And what's the syntax for doing so?
I know I can run a bash script from the controller like this:
system("./tool")
But, unfortunately, I still need to pass the arguments to my script, and I don't see how I can do that...
You can just use the built-in ruby tools for running shell commands:
https://ruby-doc.org/core-2.3.1/Kernel.html#method-i-60
For example, in one of my systems I need to get image orientation:
exif_orientation = `exiftool -Orientation -S "#{image_path}"`.to_s.chomp
Judging from my use of .to_s, running the command may sometimes return nil, and I don't want an error when trying to chomp nil. Normal output includes a trailing line ending, which chomp strips.
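Applied to your case, a rough sketch (the Tempfile approach and Student.pluck(:score) are assumptions about your setup; adapt them to your models and controller):
require 'tempfile'
require 'shellwords'

# somewhere in a controller action
def check_score
  input = params[:score].to_s
  file = Tempfile.new(['scores', '.txt'])
  begin
    # one score per line, standing in for file.txt
    file.write(Student.pluck(:score).join("\n"))
    file.close
    # run the external tool and capture its stdout for the view
    @result = `./tool #{Shellwords.escape(input)} #{file.path}`.chomp
  ensure
    file.unlink
  end
end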
I have an SPSS syntax file that I need to run on multiple files, each in a different directory with the same name as the file, and I am trying to do this automatically. So far I have tried doing it with syntax code, and I am trying to avoid using Python in SPSS, but all I have been able to get is the code below, which does not work.
VECTOR v = key.
LOOP #i = 1 to 41.
GET
FILE=CONCAT('C:\Users\myDir\otherDir\anotherDir\output\',v(#i),'\',v(#i),'.sav').
DATASET NAME Data#i WINDOW=FRONT.
*Do stuff to the opened file
END LOOP.
EXE.
key is the only column in a file that contains all the names of the files.
I am having trouble debugging since I don't know how to print to the screen, if that is even possible. So my question is: is there a way to get the code above to work, or is there another option that accomplishes the same thing?
You can't use an expression like that on a GET command. There are two choices: use the macro language to put this together (see DEFINE in the Command Syntax Reference via the Help menu), or use the SPSSINC PROCESS FILES extension command or your own Python code to select the files with a wildcard.
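For the macro route, a sketch along these lines (the directory is from your example; here the file names are passed to the macro explicitly rather than read from the key column):
* Define a macro that opens one file by name.
DEFINE !getfile (name = !TOKENS(1))
GET FILE = !QUOTE(!CONCAT('C:\Users\myDir\otherDir\anotherDir\output\', !name, '\', !name, '.sav')).
DATASET NAME !name WINDOW=FRONT.
!ENDDEFINE.
!getfile name = file1 .
!getfile name = file2 .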
The extension command or a Python program requires the free Python Essentials, available from the SPSS Community website or with your version of Statistics.
So it looks like the Gremlin API requires a URL to import a GraphML file to the server (http://docs.neo4j.org/chunked/stable/gremlin-plugin.html#rest-api-load-a-sample-graph). I was hoping there'd be some API where you could just POST the GraphML to it. Does something like this exist?
I realise I could write a Neo4j extension to essentially do this, but I was wondering if one already existed...
There is a shell extension at https://github.com/jexp/neo4j-shell-tools#graphml-import providing this feature. It should not be too hard to convert that into a server extension.
If the graph is not huge, perhaps you can try passing the file contents as a string to the Gremlin extension and using the script in the doc you cited. Your Gremlin script would then expect a String variable that contains your graph, and it would first write that string out to a file:
// write the GraphML string to a file on the server
new File('path_to_my_file.xml').write(myGraphAsString)
You can then load this file:
g.clear()
g.loadGraphML('file:/path_to_my_file.xml')
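If you go that route, posting the script together with the GraphML string to the Gremlin plugin endpoint would look roughly like this (the endpoint path is as in the plugin docs linked above; host, port, and file path are placeholders):
curl -X POST http://localhost:7474/db/data/ext/GremlinPlugin/graphdb/execute_script \
  -H 'Content-Type: application/json' \
  -d '{"script": "new File(\"/tmp/graph.xml\").write(myGraphAsString); g.clear(); g.loadGraphML(\"file:/tmp/graph.xml\")", "params": {"myGraphAsString": "<graphml>...</graphml>"}}'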