Can I batch update multiple worksheets in a spreadsheet using gspread? - google-sheets

import gspread
from oauth2client.service_account import ServiceAccountCredentials
scope = ["https://spreadsheets.google.com/feeds",'https://www.googleapis.com/auth/spreadsheets',"https://www.googleapis.com/auth/drive.file","https://www.googleapis.com/auth/drive"]
creds = ServiceAccountCredentials.from_json_keyfile_name("creds.json", scope)
client = gspread.authorize(creds)
sh = client.open("kite")
s0 = sh.get_worksheet(0)
s1 = sh.get_worksheet(1)
s0.batch_update([
    {'range': 'A3', 'values': fin_5min},
    {'range': 'D3', 'values': fin_15min}
])
I want to update the s1 sheet with the same data as s0. How can I do this in a single request using gspread?

Issue and workaround:
In the Sheets API, your goal can be achieved with the spreadsheets.values.batchUpdate method. Unfortunately, at the time of writing this method is not included in gspread, so as a workaround I propose calling the endpoint directly with the requests module.
When your script is modified, it becomes as follows.
Modified script:
import json
import requests

client = gspread.authorize(creds)
sh = client.open("kite")
sheetTitles = [sh.get_worksheet(0).title, sh.get_worksheet(1).title]
values = [{'range': 'A3', 'values': fin_5min}, {'range': 'D3', 'values': fin_15min}]
# Prefix each range with its sheet title so one request can target both worksheets.
reqs = [[{'range': "'" + t + "'!" + v['range'], 'values': v['values']} for v in values] for t in sheetTitles]
res = requests.post(
    'https://sheets.googleapis.com/v4/spreadsheets/' + sh.id + '/values:batchUpdate',
    headers={"Authorization": "Bearer " + creds.get_access_token().access_token, "Content-Type": "application/json"},
    data=json.dumps({"data": sum(reqs, []), "valueInputOption": "USER_ENTERED"})
)
The access token is retrieved from the creds object passed to client = gspread.authorize(creds).
In this case, fin_5min and fin_15min must be two-dimensional arrays, so please be careful about this.
With the above modification, the values in values are written to both the 1st and 2nd sheets.
Note that the modified script requires the requests and json modules, which are imported at the top.
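For illustration, here is a minimal sketch of the two-dimensional shape the API expects, using made-up sample data and placeholder sheet titles (your actual titles come from sheetTitles). As a side note, newer gspread releases expose Spreadsheet.values_batch_update, which accepts the same request body, so if your version has it you can avoid the raw requests call:
# Hypothetical sample data: each value must be a list of rows, and each row a list of cells.
fin_5min = [["09:00", 101.5], ["09:05", 102.0]]
fin_15min = [["09:00", 100.9]]

body = {
    "valueInputOption": "USER_ENTERED",
    "data": [
        {"range": "'Sheet1'!A3", "values": fin_5min},
        {"range": "'Sheet1'!D3", "values": fin_15min},
        {"range": "'Sheet2'!A3", "values": fin_5min},
        {"range": "'Sheet2'!D3", "values": fin_15min},
    ],
}
# With a recent gspread version this is equivalent to the requests.post call above:
# res = sh.values_batch_update(body)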
Reference:
Method: spreadsheets.values.batchUpdate

Related

Using Google Colab, how to drive.files().list more than 1000 files from Google Drive

About once a month I get a Google Drive folder with lots of videos in it (usually around 700-800) and a spreadsheet whose column A gets populated with the names of all of the video files, in order of the timestamp in each video file name. I already have the code that does this (posted below), but this time I've got about 8,400 video files in the folder, and this algorithm has a pageSize limit of 1,000 (it was originally 100; I changed it to 1,000, but that's the highest value it will accept). How do I change this code to accept more than 1,000 files?
This is the part that initializes everything
!pip install gspread_formatting
import time
import gspread
from gspread import urls
from google.colab import auth
from datetime import datetime
from datetime import timedelta
from gspread_formatting import *
from googleapiclient.discovery import build
from oauth2client.client import GoogleCredentials
from google.auth import default
folder_id = '************************' # change to whatever folder the required videos are in
base_dir = '/Example/drive/videofolder' # change this to whatever folder path you want to grab videos from same as above
file_name_qry_filter = "name contains 'mp4' and name contains 'cam'"
file_pattern="cam*.mp4"
spreadSheetUrl = 'https://docs.google.com/spreadsheets/d/SpreadsheetIDExample/edit#gid=0'
data_drive_id = '***********' # This is the ID of the shared Drive
auth.authenticate_user()
creds, _ = default()
gc = gspread.authorize(creds)
#gc = gspread.authorize(GoogleCredentials.get_application_default())
wb = gc.open_by_url(spreadSheetUrl)
sheet = wb.worksheet('Sheet1')
And this is the main part of the code
prevTimeStamp = None
prevHour = None

def dateChecker(fileName, prevHour):
    strippedFileName = fileName.strip(".mp4")  # get rid of the .mp4 from the end of the file name
    parsedFileName = strippedFileName.split("_")  # split the file name into an array of (0 = Cam#, 1 = yyyy-mm-dd, 2 = hh-mm-ss)
    timeStamp = parsedFileName[2]  # grab specifically the hh-mm-ss time section from the original file name
    parsedTimeStamp = timeStamp.split("-")  # split the time stamp into an array of (0 = hour, 1 = minute, 2 = second)
    hour = int(parsedTimeStamp[0])
    minute = int(parsedTimeStamp[1])
    second = int(parsedTimeStamp[2])  # set hour, minute, and second to their own variables
    commentCell = "Reset"
    if prevHour == None:
        commentCell = " "
        prevHour = hour
    else:
        if 0 <= hour < 24:
            if hour == 0:
                if prevHour == 23:
                    commentCell = " "
                else:
                    commentCell = "Missing Video1"
            else:
                if hour - prevHour == 1:
                    commentCell = " "
                else:
                    commentCell = "Missing Video2"
        else:
            commentCell = "Error hour is not between 0 and 23"
    if minute != 0 or 1 < second < 60:
        commentCell = "Check Length"
    prevHour = hour
    return commentCell, prevHour
# Drive query variables
parent_folder_qry_filter = "'" + folder_id + "' in parents"  # you shouldn't ever need to change this
query = file_name_qry_filter + " and " + parent_folder_qry_filter
drive_service = build('drive', 'v3')
# Build request and call Drive API
page_token = None
response = drive_service.files().list(q=query,
                                      corpora='drive',
                                      supportsAllDrives='true',
                                      includeItemsFromAllDrives='true',
                                      driveId=data_drive_id,
                                      pageSize=1000,
                                      fields='nextPageToken, files(id, name, webViewLink)',  # you can add extra fields in the files() if you need more information about the files you're grabbing
                                      pageToken=page_token).execute()
i = 1
array = [[], []]
# Parse results
for file in response.get('files', []):
    array.insert(i - 1, [file.get('name'), file.get('webViewLink')])  # If you add extra fields above, this is where you will have to start changing the code to accommodate the extra fields
    i = i + 1
array.sort()
array_sorted = [x for x in array if x]  # filter out the empty placeholder lists so only real [name, link] entries remain
arrayLength = len(array_sorted)
print(arrayLength)
commentCell = 'Error'
# for file_name in array_sorted:
#     date_gap, start_date, end_date = date_checker(file_name[0])
#     if prev_end_date == None:
#         print('hello')
#     elif start_date != prev_end_date:
#         date_gap = 'Missing Video'
for file_name in array_sorted:
    commentCell, prevHour = dateChecker(file_name[0], prevHour)
    time.sleep(0.3)
    #insertRow = [file_name[0], "Not Processed", " ", date_gap, " ", " ", " ", " ", base_dir + '/' + file_name[0], " ", file_name[1], " ", " ", " "]
    insertRow = [file_name[0], "Not Processed", " ", commentCell, " ", " ", " ", " ", " ", " ", " ", " ", " ", " ", " ", " ", " ", " "]
    sheet.append_row(insertRow, value_input_option='USER_ENTERED')
Now I know the problem has to do with the
page_token = None
response = drive_service.files().list(q=query,
                                      corpora='drive',
                                      supportsAllDrives='true',
                                      includeItemsFromAllDrives='true',
                                      driveId=data_drive_id,
                                      pageSize=1000,
                                      fields='nextPageToken, files(id, name, webViewLink)',  # you can add extra fields in the files() if you need more information about the files you're grabbing
                                      pageToken=page_token).execute()
That call sits in the middle of the main part of the code. I've already tried just changing the pageSize limit to 10,000, but as I expected that didn't work; it came back with:
HttpError: <HttpError 400 when requesting https://www.googleapis.com/drive/v3/files?q=name+contains+%27mp4%27+and+name+contains+%27cam%27+and+%271ANmLGlNr-Cu0BvH2aRrAh_GXEDk1nWvf%27+in+parents&corpora=drive&supportsAllDrives=true&includeItemsFromAllDrives=true&driveId=0AF92uuRq-00KUk9PVA&pageSize=10000&fields=nextPageToken%2C+files%28id%2C+name%2C+webViewLink%29&alt=json returned "Invalid value '10000'. Values must be within the range: [1, 1000]". Details: "Invalid value '10000'. Values must be within the range: [1, 1000]">
The one idea I have is to use multiple pages of 1,000 each and then iterate through them, but I barely understood how this part of the code worked a year ago when I set it up, and since then I haven't touched Google Colab except to run this algorithm. Every time I try to google how to do this, or look up the Google Drive API, everything comes back with how to download and upload a couple of files, whereas what I need is just to get a list of the names of all the files.
The documentation explains how to use the pageToken for pagination (the page is for Calendar API but it works the same in Drive):
In order to retrieve the next page, perform the exact same request as previously and append a pageToken field with the value of nextPageToken from the previous page. A new nextPageToken is provided on the following pages until all the results are retrieved.
Essentially you want a loop where you run files.list(), retrieve the pageToken, and run it again while feeding it the previous token until you stop getting tokens.
For your specific scenario you can try to replace the "problem" snippet with the following:
page_token = ""
filelist = {}
while True:
response = drive_service.files().list(q=query,
corpora='drive',
supportsAllDrives='true',
includeItemsFromAllDrives='true',
driveId=data_drive_id,
pageSize=1000,
fields='nextPageToken, files(id, name, webViewLink)',
pageToken=page_token).execute()
page_token = response.get('nextPageToken', None)
filelist.setdefault("files",[]).extend(response.get('files'))
if (not page_token):
break
response = filelist
This does as I described, looping files.list() and adding the results to the filelist variable, then breaking the loop when the API stops returning page tokens. At the end I just assigned the value of filelist to the response variable since that's what you're using in the rest of your code. It should parse the same way but with the full list of results this time.
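If you prefer to keep the paging logic separate from the spreadsheet code, the same idea can be wrapped in a small helper. This is only a sketch built on the names already used above (drive_service, query, data_drive_id), not part of the original script:
def iterate_drive_files(drive_service, query, drive_id, page_size=1000):
    # Yield file resources one page at a time until no nextPageToken is returned.
    page_token = None
    while True:
        response = drive_service.files().list(q=query,
                                              corpora='drive',
                                              supportsAllDrives=True,
                                              includeItemsFromAllDrives=True,
                                              driveId=drive_id,
                                              pageSize=page_size,
                                              fields='nextPageToken, files(id, name, webViewLink)',
                                              pageToken=page_token).execute()
        for f in response.get('files', []):
            yield f
        page_token = response.get('nextPageToken')
        if not page_token:
            break

all_files = list(iterate_drive_files(drive_service, query, data_drive_id))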
Sources:
Page through list of resources
Files.list()

Exporting > 1000 issues from JIRA

I am trying to export JIRA tasks via the API and I hit a wall in Excel because JIRA only allows a limit of 1,000 results. I can do an export manually to CSV and get over 1,000 results, and was wondering if anyone has had luck with large JIRA exports via the REST API and can point me in the right direction.
I'm guessing an export to CSV and then a pull into Excel for reporting might work?
Thanks!
JIRA's REST API uses pagination to prevent clients of the API from putting too much load on the application. This means you cannot pull in all issue data with one REST call.
You can only retrieve "pages" of at most 1,000 issues using the paging query parameters startAt and maxResults. See the Pagination section here.
If you run a JIRA standalone server you can tweak the maximum number of results that JIRA returns, but for a cloud instance this is not possible. See this KB article for more info.
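If you are calling the REST API directly rather than through a client library, the same startAt / maxResults paging applies. Below is a minimal Python sketch using the requests module against the /rest/api/2/search endpoint; the base URL, credentials, and JQL are placeholders to replace with your own:
import requests

base_url = "https://jira.example.com"  # placeholder JIRA base URL
auth = ("api-reader", "APITOKEN")      # placeholder credentials
jql = "project in (xyz)"

issues = []
start_at = 0
page_size = 1000  # the server may cap this at a lower value
while True:
    resp = requests.get(base_url + "/rest/api/2/search",
                        params={"jql": jql, "startAt": start_at, "maxResults": page_size},
                        auth=auth)
    resp.raise_for_status()
    page = resp.json()
    issues.extend(page["issues"])
    start_at += len(page["issues"])
    if not page["issues"] or start_at >= page["total"]:
        break
print(len(issues), "issues exported")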
Using jira-python (according to your tag):
# search_issues can only return up to 1000 issues per call, so when there are more we
# have to page through them by advancing startAt until a call returns no issues.
issues = []
count = 0
while True:
    tmp_issues = jira_connection.search_issues('', startAt=count, maxResults=1000)
    if len(tmp_issues) == 0:
        # Since Python does not offer do-while, we have to break here.
        break
    issues.extend(tmp_issues)
    count += len(tmp_issues)
The code below fetches results 200 records at a time until all records are exported.
You can export at most 1,000 records per request by raising the page size; the callback keeps re-issuing the request, advancing startAt, until everything is exported.
const request = require('request')
const fs = require('fs')
const chalk = require('chalk')

var windowSlider = 200
var totlExtractedRecords = 0;
fs.writeFileSync('output.txt', '')

const option = {
  url: 'https://jira.yourdomain.com/rest/api/2/search',
  json: true,
  qs: {
    jql: "project in (xyz)",
    maxResults: 200,
    startAt: 0,
  }
}

const callback = (error, response) => {
  const body = response.body
  console.log(response.body)
  const dataArray = body.issues
  const total = body.total
  totlExtractedRecords = dataArray.length + totlExtractedRecords
  if (totlExtractedRecords > 0) {
    option.qs.startAt = windowSlider + option.qs.startAt
  }
  dataArray.forEach(element => {
    fs.appendFileSync('output.txt', element.key + '\n')
  })
  console.log(chalk.red.inverse('Total extracted data : ' + totlExtractedRecords))
  console.log(chalk.red.inverse('Total data: ' + total))
  if (totlExtractedRecords < total) {
    console.log('Re - Running with start as ' + option.qs.startAt)
    console.log('Re - Running with maxResult as ' + option.qs.maxResults)
    request(option, callback).auth('api-reader', 'APITOKEN', true)
  }
}

request(option, callback).auth('api-reader', 'APITOKEN', true)

Making a variable number of parallel HTTP requests with Gatling?

I am trying to model a server-to-server REST API interaction in Gatling 2.2.0. There are several interactions of the type "request a list and then request all items on the list at in parallel", but I can't seem to model this in Gatling. Code so far:
def groupBy(dimensions: Seq[String], metric: String) = {
  http("group by")
    .post(endpoint)
    .body(...).asJSON
    .check(
      ...
      .saveAs("events")
    )
}

scenario("Dashboard scenario")
  .exec(groupBy(dimensions, metric)
    .resources(
      // a http() for each item in session("events"), plz
    )
  )
I have gotten as far as figuring out that parallel requests are performed by .resources(), but I don't understand how to generate a list of requests to feed it. Any input is appreciated.
The approach below works for me. A Seq of HttpRequestBuilder will be executed concurrently:
val numberOfParallelReq = 1000

val scn = scenario("Some scenario")
  .exec(
    http("first request")
      .post(url)
      .resources(parallelRequests: _*)
      .body(StringBody(firstReqBody))
      .check(status.is(200))
  )

def parallelRequests: Seq[HttpRequestBuilder] =
  (0 until numberOfParallelReq).map(i => generatePageRequest(i))

def generatePageRequest(id: Int): HttpRequestBuilder = {
  val body = "Your request body here...."
  http("page")
    .post(url)
    .body(StringBody(body))
    .check(status.is(200))
}
I'm not very sure of your query, but it seems you need to send parallel requests, which can be done with
setUp(scenario.inject(atOnceUsers(NO_OF_USERS)));
Refer to this: http://gatling.io/docs/2.0.0-RC2/general/simulation_setup.html

Value registerAsTable is not a member of org.apache.spark.rdd.RDD[Tweet]

I am trying to extract Twitter data using the REST API in Zeppelin. I tried both options, registerAsTable and registerTempTable, and neither works. Please help me resolve the error. I get the error below while executing the Zeppelin tutorial code:
error: value registerAsTable is not a member of org.apache.spark.rdd.RDD[Tweet] ).foreachRDD(rdd=> rdd.registerAsTable("tweets")
An RDD cannot be registered as a table, whereas a DataFrame can. You can convert your RDD into a DataFrame and then register the resulting DataFrame as a temp table or table.
You can convert an RDD into a DataFrame as below:
val sqlContext = new SQLContext(sc)
import sqlContext.implicits._
rdd.toDF()
Refer to How to convert rdd object to dataframe in spark and http://spark.apache.org/docs/latest/sql-programming-guide.html
In the Zeppelin interpreters, add the external dependency org.apache.bahir:spark-streaming-twitter_2.11:2.0.0 from the GUI, and after that run the following using Spark 2.0.1:
import org.apache.spark._
import org.apache.spark.streaming._
import org.apache.spark.streaming.StreamingContext._
import org.apache.spark.{ SparkConf, SparkContext}
import org.apache.spark.storage.StorageLevel
import scala.io.Source
//import org.apache.spark.Logging
import java.io.File
import org.apache.log4j.Logger
import org.apache.log4j.Level
import sys.process.stringSeqToProcess
import scala.collection.mutable.HashMap
/** Configures the Oauth Credentials for accessing Twitter */
def configureTwitterCredentials(apiKey: String, apiSecret: String, accessToken: String, accessTokenSecret: String) {
  val configs = new HashMap[String, String] ++= Seq(
    "apiKey" -> apiKey, "apiSecret" -> apiSecret, "accessToken" -> accessToken, "accessTokenSecret" -> accessTokenSecret)
  println("Configuring Twitter OAuth")
  configs.foreach { case (key, value) =>
    if (value.trim.isEmpty) {
      throw new Exception("Error setting authentication - value for " + key + " not set")
    }
    val fullKey = "twitter4j.oauth." + key.replace("api", "consumer")
    System.setProperty(fullKey, value.trim)
    println("\tProperty " + fullKey + " set as [" + value.trim + "]")
  }
  println()
}
// Configure Twitter credentials; the following config values will not work, they are placeholders for illustration only
val apiKey = "7AVLnhssAqumpgY6JtMa59w6Tr"
val apiSecret = "kRLstZgz0BYazK6nqfMkPvtJas7LEqF6IlCp9YB1m3pIvvxrRZl"
val accessToken = "79438845v6038203392-CH8jDX7iUSj9xmQRLpHqLzgvlLHLSdQ"
val accessTokenSecret = "OXUpYu5YZrlHnjSacnGJMFkgiZgi4KwZsMzTwA0ALui365"
configureTwitterCredentials(apiKey, apiSecret, accessToken, accessTokenSecret)
import org.apache.spark.{ SparkConf, SparkContext}
import org.apache.spark.streaming._
import org.apache.spark.streaming.twitter._
import org.apache.spark.SparkContext._
val ssc = new StreamingContext(sc, Seconds(2))
val tweets = TwitterUtils.createStream(ssc, None)
val twt = tweets.window(Seconds(10))
//twt.print
val sqlContext= new org.apache.spark.sql.SQLContext(sc)
import sqlContext.implicits._
case class Tweet(createdAt:Long, text:String)
val tweet = twt.map(status =>
  Tweet(status.getCreatedAt().getTime() / 1000, status.getText())
)
tweet.foreachRDD(rdd=>rdd.toDF.registerTempTable("tweets"))
ssc.start()
//ssc.stop()
After that, run some queries on the table in another Zeppelin cell:
%sql select createdAt, text from tweets limit 50
val data = sc.textFile("/FileStore/tables/uy43p2971496606385819/testweet.json");
//convert RDD to DF
val inputs= data.toDF();
inputs.createOrReplaceTempView("tweets");

Getting {"errors":[{"message":"Bad Authentication data","code":215}]} using python-twitter API v1.0

I am using python-twitter API v1.0. It says that it works with v1.1 of the twitter API.
This is my code:
import json
import urllib2
import urllib
import csv

file1 = open("somez.csv", "wb")
fname = "tw1.txt"
file = open(fname, "r")
ins = open("tw1.txt", "r")
array = []
for line in ins:
    array.append(line)
    s = 'https://api.twitter.com/1.1/statuses/show/' + line[:-1] + '.json'
    print s
    try:
        data = urllib2.urlopen(s)
    except:
        print "Not Found"
        continue
    print data
    json_format = json.load(data)
    js = json_format
    print line[:-1]
    print js[('user')]['id']
    print js[('user')]['created_at']
    print js['retweet_count']
    print js['text']
    # js = js.decode('utf-8')
    one = line[:-1].encode('utf-8')
    thr = js['user']['id']
    two = js['user']['created_at'].encode('utf-8')
    four = js['retweet_count']
    five = js['text'].encode('utf-8')
    rw = [one, two, thr, four, five]
    spamWriter = csv.writer(file1, delimiter=',')
    spamWriter.writerow(rw)
file1.close()
I am not able to retrieve any data. It is saying "Not Found". When I open one of the URLs, I get this error:
{"errors":[{"message":"Bad Authentication data","code":215}]}
Can anyone suggest what the problem might be?
