I've had some working API code for quite a long time, but it suddenly stopped working, about 30 minutes after the previous successful use of the API.
Here's the traceback:
row_cells = self.range('%s:%s' % (start_cell, end_cell))
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/gspread/models.py", line 72, in wrapper
return method(self, *args, **kwargs)
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/gspread/models.py", line 412, in range
params={'range': name, 'return-empty': 'true'}
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/gspread/client.py", line 176, in get_cells_feed
r = self.session.get(url)
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/gspread/httpsession.py", line 73, in get
return self.request('GET', url, params=params, **kwargs)
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/gspread/httpsession.py", line 69, in request
response.status_code, response.content))
gspread.exceptions.RequestError: (401, '401: b\'<HTML>\\n<HEAD>\\n<TITLE>Unauthorized</TITLE>\\n</HEAD>\\n<BODY BGCOLOR="#FFFFFF" TEXT="#000000">\\n<H1>Unauthorized</H1>\\n<H2>Error 401</H2>\\n</BODY>\\n</HTML>\\n\'')
I don't really understand this error.
Here's the code:
import gspread
from oauth2client.service_account import ServiceAccountCredentials
import pprint
scope = [ 'https://spreadsheets.google.com/feeds' ]
creds = ServiceAccountCredentials.from_json_keyfile_name('client_secret.json', scope)
client = gspread.authorize(creds)
sheet = client.open('sheet_name').sheet1
I really don't know what to do. I've already created a new email address for the API and downloaded the JSON file (client_secret.json), but it still doesn't work, and I honestly don't know why.
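The post does not establish the cause of the 401. One possibility, and it is only an assumption here, is an expired OAuth access token, since such tokens are short-lived. A minimal sketch of retrying once after re-authorizing is shown below. The helper name call_with_reauth and its parameters are hypothetical, and the usage comment assumes the older gspread client's login() method, which may not exist in newer releases.

```python
def call_with_reauth(action, reauthorize, is_auth_error, max_retries=1):
    """Run action(); if it fails with an auth error, re-authorize and retry."""
    attempts = 0
    while True:
        try:
            return action()
        except Exception as exc:
            # Give up on non-auth errors, or once the retry budget is spent.
            if attempts >= max_retries or not is_auth_error(exc):
                raise
            attempts += 1
            reauthorize()


# Hypothetical usage with the objects from the question (not run here):
# values = call_with_reauth(
#     action=lambda: sheet.row_values(1),
#     reauthorize=client.login,                 # assumed: older gspread API
#     is_auth_error=lambda exc: "401" in str(exc),
# )
```

If the retry also fails with 401, the token is probably not the problem, and the key file or the sheet's sharing settings would be the next things to check.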
I'm running into an error with my Snakemake variant identification pipeline when the initial DAG of jobs is built. I believe this is a memory issue: when I test with a short list of input files, the DAG is constructed without issue, but when I try with 300+ paired FASTQ inputs, I receive the following error:
Building DAG of jobs...
Traceback (most recent call last):
File "/home//.conda/envs/snakemake/lib/python3.6/site-packages/snakemake/__init__.py", line 633, in snakemake
File "/home//.conda/envs/snakemake/lib/python3.6/site-packages/snakemake/workflow.py", line 568, in execute
File "/home//.conda/envs/snakemake/lib/python3.6/site-packages/snakemake/dag.py", line 281, in check_incomplete
incomplete = self.incomplete_files
File "/home//.conda/envs/snakemake/lib/python3.6/site-packages/snakemake/dag.py", line 402, in incomplete_files
filterfalse(self.needrun, self.jobs),
File "/home/k/.conda/envs/snakemake/lib/python3.6/site-packages/snakemake/dag.py", line 399, in <genexpr>
File "/home//.conda/envs/snakemake/lib/python3.6/site-packages/snakemake/persistence.py", line 205, in incomplete
return any(map(lambda f: f.exists and marked_incomplete(f), job.output))
File "/home//.conda/envs/snakemake/lib/python3.6/site-packages/snakemake/persistence.py", line 205, in <lambda>
return any(map(lambda f: f.exists and marked_incomplete(f), job.output))
File "/home//.conda/envs/snakemake/lib/python3.6/site-packages/snakemake/persistence.py", line 203, in marked_incomplete
return self._read_record(self._metadata_path, f).get("incomplete", False)
File "/home//.conda/envs/snakemake/lib/python3.6/site-packages/snakemake/persistence.py", line 322, in _read_record_cached
return self._read_record_uncached(subject, id)
File "/home//.conda/envs/snakemake/lib/python3.6/site-packages/snakemake/persistence.py", line 328, in _read_record_uncached
return json.load(f)
File "/home//.conda/envs/snakemake/lib/python3.6/json/__init__.py", line 299, in load
parse_constant=parse_constant, object_pairs_hook=object_pairs_hook, **kw)
File "/home//.conda/envs/snakemake/lib/python3.6/json/__init__.py", line 354, in loads
return _default_decoder.decode(s)
File "/home//.conda/envs/snakemake/lib/python3.6/json/decoder.py", line 339, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
File "/home//.conda/envs/snakemake/lib/python3.6/json/decoder.py", line 357, in raw_decode
raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
I'm not sure how to resolve this. Is this a known bug, or is there a way to define my pipeline so that it builds a less complex DAG? I am including the first section of my Snakefile as well. I use rule all to define all desired output files.
#### Mtb bwa/GATK Snakemake ####
import numpy as np
from collections import defaultdict
import pandas as pd
samples_df = pd.read_table('config/tgen_samples2a.tsv',sep = ',').set_index("sample", drop=False)
sample_names = list(samples_df['sample'])
batch_names = list(samples_df['batch'])
# fastq1 input function definition
def fq1_from_sample(wildcards):
    return samples_df.loc[wildcards.sample, "fastq_1"]
# fastq2 input function definition
def fq2_from_sample(wildcards):
    return samples_df.loc[wildcards.sample, "fastq_2"]
# Define config file. Stores sample names and other things.
configfile: "config/config.yaml"
# Define a rule for running the complete pipeline.
rule all:
    input:
        trim = expand(['results/{batch}/{samp}/trim/{samp}_trim_1.fq.gz'], zip, samp=sample_names,batch=batch_names),
        kraken=expand('results/{batch}/{samp}/kraken/{samp}_trim_kr_1.fq.gz', zip, samp=sample_names,batch=batch_names),
        bams=expand('results/{batch}/{samp}/bams/{samp}_{mapper}_{ref}_sorted.bam', zip, samp=sample_names,batch=batch_names, ref = config['ref']*len(sample_names), mapper = config['mapper']*len(sample_names)), # When using zip, need to use vectors of equal lengths for all wildcards.
        per_samp_run_stats = expand('results/{batch}/{samp}/stats/{samp}_{mapper}_{ref}_combined_stats.csv', zip, samp=sample_names,batch=batch_names, ref = config['ref']*len(sample_names), mapper = config['mapper']*len(sample_names)),
        amr_stats=expand('results/{batch}/{samp}/stats/{samp}_{mapper}_{ref}_amr.csv', samp=sample_names,batch=batch_names, ref=config['ref'], mapper=config['mapper']),
        cov_stats=expand('results/{batch}/{samp}/stats/{samp}_{mapper}_{ref}_cov_stats.txt', samp=sample_names,batch=batch_names, ref=config['ref'], mapper=config['mapper']),
        all_sample_stats=expand('results/{batch}/stats/combined_per_run_sample_stats.csv',batch = batch_names),
        vcfs=expand('results/{batch}/{samp}/vars/{samp}_{mapper}_{ref}_{caller}_qfilt.vcf.gz', samp=sample_names,batch=batch_names, ref=config['ref'], mapper=config['mapper'], caller = config['caller']),
        ann_vcfs=expand('results/{batch}/{samp}/vars/{samp}_{mapper}_{ref}_gatk_ann.vcf.gz', samp=sample_names,batch=batch_names, ref=config['ref'], mapper=config['mapper'], caller = config['caller']),
        fastas=expand('results/{batch}/{samp}/fasta/{samp}_{mapper}_{ref}_{caller}_{filter}.fa', samp=sample_names,batch=batch_names, ref=config['ref'], mapper=config['mapper'], caller = config['caller'], filter=config['filter']),
        profiles=expand('results/{batch}/{samp}/stats/{samp}_{mapper}_{ref}_lineage.csv', samp=sample_names,batch=batch_names, ref=config['ref'], mapper=config['mapper'])
# Trim reads for quality.
rule trim_reads:
'{config[scripts_dir]}trim_reads.sh {input.p1} {input.p2} {output.trim1} {output.trim2} &>> {log}'
# Filter reads taxonomically with Kraken.
rule taxonomic_filter:
kraken_stats = 'results/{batch}/{samp}/kraken/{samp}_kraken_stats.csv'
threads: 8
'{config[scripts_dir]}run_kraken.sh {input.trim1} {input.trim2} {output.kr1} {output.kr2} {output.kraken_report} &>> {log}'
Thank you in advance for help using Snakemake!
I doubt memory is the issue. 300+ samples is not many, especially if each of them is processed independently of the others.
Try starting from the subset of samples that you say worked and gradually increase it until the problem appears. Perhaps there is an odd value in your sample sheet or in your config? The message json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0) hints at something like that, in my view.
The answer came from @TroyComi, above: after deleting the .snakemake directory, the issue was resolved. Thank you!
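The traceback ends in json.load() inside Snakemake's persistence code, which reads per-output records. That is consistent with a record file being empty or truncated: an empty file produces exactly "Expecting value: line 1 column 1 (char 0)". Before deleting the whole directory, a rough sketch like the following could list which record files fail to parse. The .snakemake/metadata location is an assumption about where those records live, and find_unreadable_records is a hypothetical helper name.

```python
import json
from pathlib import Path


def find_unreadable_records(metadata_dir):
    """Return paths of files under metadata_dir that do not parse as JSON."""
    bad = []
    for path in sorted(Path(metadata_dir).rglob("*")):
        if not path.is_file():
            continue
        try:
            with path.open() as handle:
                json.load(handle)
        except (json.JSONDecodeError, UnicodeDecodeError):
            # Empty, truncated or otherwise corrupt record.
            bad.append(path)
    return bad


if __name__ == "__main__":
    # Assumed location of Snakemake's metadata records.
    for path in find_unreadable_records(".snakemake/metadata"):
        print(path)
```

Removing only the files it reports may be enough, although deleting the .snakemake directory, as described above, is the fix that was confirmed to work.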
I'm trying to import a course that has multiple videos from an Ironwood server into a newly deployed Juniper server. During the import, the following errors appear in the console.
The console log output follows:
TypeError: Unicode-objects must be encoded before hashing
[2020-12-11 09:29:02,847: INFO/Worker-1] VAL: Video created with id [54454916-47e9-4769-8b41-06062d0b7e8c] and status [external]
[2020-12-11 09:29:02,860: ERROR/Worker-1] [VAL] Transcript save failed to storage for video_id "54454916-47e9-4769-8b41-06062d0b7e8c" language code "en"
Traceback (most recent call last):
File "/edx/app/edxapp/venvs/edxapp/lib/python3.5/site-packages/edxval/models.py", line 489, in create
video_transcript.transcript.save(file_name, transcript_content)
File "/edx/app/edxapp/venvs/edxapp/lib/python3.5/site-packages/django/db/models/fields/files.py", line 87, in save
self.name = self.storage.save(name, content, max_length=self.field.max_length)
File "/edx/app/edxapp/venvs/edxapp/lib/python3.5/site-packages/django/core/files/storage.py", line 52, in save
return self._save(name, content)
File "/edx/app/edxapp/venvs/edxapp/lib/python3.5/site-packages/storages/backends/s3boto3.py", line 495, in _save
self._save_content(obj, content, parameters=parameters)
File "/edx/app/edxapp/venvs/edxapp/lib/python3.5/site-packages/storages/backends/s3boto3.py", line 510, in _save_content
obj.upload_fileobj(content, ExtraArgs=put_parameters)
File "/edx/app/edxapp/venvs/edxapp/lib/python3.5/site-packages/boto3/s3/inject.py", line 513, in object_upload_fileobj
ExtraArgs=ExtraArgs, Callback=Callback, Config=Config)
File "/edx/app/edxapp/venvs/edxapp/lib/python3.5/site-packages/boto3/s3/inject.py", line 431, in upload_fileobj
return future.result()
File "/edx/app/edxapp/venvs/edxapp/lib/python3.5/site-packages/s3transfer/futures.py", line 73, in result
return self._coordinator.result()
File "/edx/app/edxapp/venvs/edxapp/lib/python3.5/site-packages/s3transfer/futures.py", line 233, in result
raise self._exception
File "/edx/app/edxapp/venvs/edxapp/lib/python3.5/site-packages/s3transfer/tasks.py", line 126, in call
return self._execute_main(kwargs)
File "/edx/app/edxapp/venvs/edxapp/lib/python3.5/site-packages/s3transfer/tasks.py", line 150, in _execute_main
return_value = self._main(**kwargs)
File "/edx/app/edxapp/venvs/edxapp/lib/python3.5/site-packages/s3transfer/upload.py", line 692, in _main
client.put_object(Bucket=bucket, Key=key, Body=body, **extra_args)
File "/edx/app/edxapp/venvs/edxapp/lib/python3.5/site-packages/botocore/client.py", line 317, in _api_call
return self._make_api_call(operation_name, kwargs)
File "/edx/app/edxapp/venvs/edxapp/lib/python3.5/site-packages/botocore/client.py", line 596, in _make_api_call
request_signer=self._request_signer, context=request_context)
File "/edx/app/edxapp/venvs/edxapp/lib/python3.5/site-packages/botocore/hooks.py", line 242, in emit_until_response
responses = self._emit(event_name, kwargs, stop_on_response=True)
File "/edx/app/edxapp/venvs/edxapp/lib/python3.5/site-packages/botocore/hooks.py", line 210, in _emit
response = handler(**kwargs)
File "/edx/app/edxapp/venvs/edxapp/lib/python3.5/site-packages/botocore/handlers.py", line 209, in conditionally_calculate_md5
calculate_md5(params, **kwargs)
File "/edx/app/edxapp/venvs/edxapp/lib/python3.5/site-packages/botocore/handlers.py", line 187, in calculate_md5
binary_md5 = _calculate_md5_from_file(body)
File "/edx/app/edxapp/venvs/edxapp/lib/python3.5/site-packages/botocore/handlers.py", line 201, in _calculate_md5_from_file
The course is being imported from an Ironwood-deployed server to a Juniper-deployed server.
I went through all the logs and tracked the error to edxval, where it occurs while uploading the file to the S3 bucket. I then checked the edxval releases and found that version 1.4.3 is the one that fixed the S3 bucket upload issue. Updating to it fixed my problem.
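For context, the TypeError at the top of the log is what Python 3's hashlib raises when it is handed text (str) instead of bytes, and the traceback ends in botocore's MD5 calculation. That suggests the transcript content reached the uploader as text. The snippet below only illustrates that general behaviour; it is not the edxval fix itself, and md5_hex is a hypothetical helper name.

```python
import hashlib


def md5_hex(content, encoding="utf-8"):
    """Return the MD5 hex digest of content, encoding str to bytes first."""
    if isinstance(content, str):
        content = content.encode(encoding)
    return hashlib.md5(content).hexdigest()


# Hashing a str directly fails; the exact message varies by Python version.
try:
    hashlib.md5("transcript text")
except TypeError as exc:
    print("TypeError:", exc)

# Encoding to bytes first works.
print(md5_hex("transcript text"))
```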
I am working on a video analysis project that requires downloading videos from YouTube and uploading them to Google Cloud Storage (GCS). I could not figure out a way to upload them directly to GCS, so I tried downloading them to my local machine first and then uploading them to GCS.
I went through multiple Stack Overflow questions on the subject, such as
python: get all youtube video urls of a channel and
Download YouTube video using Python to a certain directory
and with their help I came up with the following script.
import urllib.request
import json
from pytube import YouTube
import pickle
def get_all_video_in_channel(channel_id):
    api_key = 'YOUR_API_KEY'  # placeholder: supply your own YouTube Data API key
    base_video_url = 'https://www.youtube.com/watch?v='
    base_search_url = 'https://www.googleapis.com/youtube/v3/search?'
    first_url = base_search_url+'key={}&channelId={}&part=snippet,id&order=date&maxResults=25'.format(api_key, channel_id)
    video_links = []
    url = first_url
    while True:
        inp = urllib.request.urlopen(url)
        resp = json.load(inp)
        for i in resp['items']:
            if i['id']['kind'] == "youtube#video":
                video_links.append(base_video_url + i['id']['videoId'])
        # The last page has no 'nextPageToken'; stop there instead of looping forever.
        try:
            next_page_token = resp['nextPageToken']
            url = first_url + '&pageToken={}'.format(next_page_token)
        except KeyError:
            break
    return video_links
#Load the file containing all the youtube video url
load_url = get_all_video_in_channel(channel_id)
#Access all the youtube url in the list and store them on local machine. Need to figure out if there is a way to directly upload them to gcs
for i in range(0,len(load_url)):
It works only for the first two video URLs and then fails with the error below:
Traceback (most recent call last):
File "<stdin>", line 2, in <module>
File "C:\Python37\lib\site-packages\pytube\streams.py", line 217, in download
bytes_remaining = self.filesize
File "C:\Python37\lib\site-packages\pytube\streams.py", line 164, in filesize
headers = request.get(self.url, headers=True)
File "C:\Python37\lib\site-packages\pytube\request.py", line 21, in get
response = urlopen(url)
File "C:\Python37\lib\urllib\request.py", line 222, in urlopen
return opener.open(url, data, timeout)
File "C:\Python37\lib\urllib\request.py", line 531, in open
response = meth(req, response)
File "C:\Python37\lib\urllib\request.py", line 641, in http_response
'http', request, response, code, msg, hdrs)
File "C:\Python37\lib\urllib\request.py", line 569, in error
return self._call_chain(*args)
File "C:\Python37\lib\urllib\request.py", line 503, in _call_chain
result = func(*args)
File "C:\Python37\lib\urllib\request.py", line 649, in http_error_default
raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 403: Forbidden
I was hoping someone could help me understand what is going wrong here and how to resolve it. I really need this and have been unable to resolve the issue for some time.
Thanks a lot in advance!
P.S. If possible, is there a way to upload them directly to GCS?
It seems you could be in conflict with YouTube's Terms of Service. I suggest you check this document and pay attention to section 5, letter B. [1]
This question is related to bioinformatics. I did not receive any suggestions in the corresponding forums, so I am writing here.
I need to remove non-ACTG nucleotides from a FASTA file and write the output to a new file using SeqIO from Biopython.
My code is:
import re
import sys
from Bio import SeqIO
from Bio.SeqRecord import SeqRecord
from Bio.Seq import Seq
from Bio.Alphabet import IUPAC
for seq_record in SeqIO.parse("test.fasta", "fasta",IUPAC.ambiguous_dna):
    seq_record.seq = re.sub('[^GATC]',"",str(sequence).upper())
Running this code gives errors:
Traceback (most recent call last):
File "remove.py", line 18, in <module>
File "/home/ghovhannisyan/Software/anaconda2/lib/python2.7/site-packages/Bio/SeqIO/__init__.py", line 481, in write
count = writer_class(fp).write_file(sequences)
File "/home/ghovhannisyan/Software/anaconda2/lib/python2.7/site-packages/Bio/SeqIO/Interfaces.py", line 209, in write_file
count = self.write_records(records)
File "/home/ghovhannisyan/Software/anaconda2/lib/python2.7/site-packages/Bio/SeqIO/Interfaces.py", line 194, in write_records
File "/home/ghovhannisyan/Software/anaconda2/lib/python2.7/site-packages/Bio/SeqIO/FastaIO.py", line 202, in write_record
data = self._get_seq_string(record) # Catches sequence being None
File "/home/ghovhannisyan/Software/anaconda2/lib/python2.7/site-packages/Bio/SeqIO/Interfaces.py", line 100, in _get_seq_string
% record.id)
TypeError: SeqRecord (id=CALB_TCONS_00001015) has an invalid sequence.
If I change this line
seq_record.seq = re.sub('[^GATC]',"",str(sequence).upper())
to, for example, seq_record.seq = sequence + "A", everything works fine. However, re.sub('[^GATC]',"",str(sequence).upper()) should also work in theory.
Biopython's SeqIO expects the SeqRecord object's .seq to be a Seq object (or similar), not a plain string. Try:
seq_record.seq = Seq(re.sub('[^GATC]',"",str(sequence).upper()))
For FASTA output there is no need to set the sequence's alphabet.
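To see why the original assignment fails, it can help to look at the cleaning step on its own: re.sub returns a plain Python str, which is why the result has to be wrapped in Seq before it is assigned to .seq. A small sketch using only the standard library (keep_only_actg is a hypothetical helper name, and the input string is made up for illustration):

```python
import re


def keep_only_actg(sequence_text):
    """Upper-case the sequence and drop every character that is not A, C, G or T."""
    return re.sub("[^GATC]", "", sequence_text.upper())


cleaned = keep_only_actg("acgtNNRYacgt")
print(type(cleaned).__name__, cleaned)
```

With Biopython installed, the result would then be assigned as seq_record.seq = Seq(cleaned), as in the answer above.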
I am getting a timeout error when trying to upload a 2 GB file using the resumable upload returned from the DocList API; please see the log extract below. I thought that with resumable upload in an App Engine task, 2 GB would not be an issue. Any ideas?
File "/base/data/home/apps/s~gofiledrop/31.358777816137338904/handler.py", line 564, in post
new_entry = uploader.UploadFile('/feeds/upload/create-session/default/private/full?convert=false', entry=entry)
File "/base/data/home/apps/s~gofiledrop/31.358777816137338904/gdata/client.py", line 1033, in upload_file
start_byte, self.file_handle.read(self.chunk_size))
File "/base/data/home/apps/s~gofiledrop/31.358777816137338904/gdata/client.py", line 987, in upload_chunk
File "/base/data/home/apps/s~gofiledrop/31.358777816137338904/gdata/client.py", line 265, in request
uri=uri, auth_token=auth_token, http_request=http_request, **kwargs)
File "/base/data/home/apps/s~gofiledrop/31.358777816137338904/atom/client.py", line 117, in request
return self.http_client.request(http_request)
File "/base/data/home/apps/s~gofiledrop/31.358777816137338904/atom/http_core.py", line 420, in request
http_request.headers, http_request._body_parts)
File "/base/data/home/apps/s~gofiledrop/31.358777816137338904/atom/http_core.py", line 497, in _http_request
return connection.getresponse()
File "/base/python_runtime/python_dist/lib/python2.5/httplib.py", line 206, in getresponse