Unable to download multiple videos from YouTube in Python

I am working on a video analysis project which requires downloading videos from YouTube and uploading them to Google Cloud Storage. I could not figure out a way to upload them directly to GCS, so I tried downloading them to my local machine and then uploading them to GCS.
I went through multiple articles on Stack Overflow regarding this, such as
python: get all youtube video urls of a channel and
Download YouTube video using Python to a certain directory,
and with their help I was able to come up with the following script.
import urllib.request
import json
from pytube import YouTube

def get_all_video_in_channel(channel_id):
    api_key = 'AIzaSyCK9eQlD1ptx0SKMsmL0srmL2ua9_EuwSs'
    base_video_url = 'https://www.youtube.com/watch?v='
    base_search_url = 'https://www.googleapis.com/youtube/v3/search?'
    first_url = base_search_url + 'key={}&channelId={}&part=snippet,id&order=date&maxResults=25'.format(api_key, channel_id)
    video_links = []
    url = first_url
    while True:
        inp = urllib.request.urlopen(url)
        resp = json.load(inp)
        # Collect the watch URL of every video item on this page of results.
        for i in resp['items']:
            if i['id']['kind'] == "youtube#video":
                video_links.append(base_video_url + i['id']['videoId'])
        # Follow the pagination token; the last page has no 'nextPageToken'.
        try:
            next_page_token = resp['nextPageToken']
            url = first_url + '&pageToken={}'.format(next_page_token)
        except KeyError:
            break
    return video_links

# Load the list containing all the YouTube video URLs of the channel.
load_url = get_all_video_in_channel(channel_id)  # channel_id: the target channel's ID

# Download every URL in the list to the local machine. Need to figure out
# if there is a way to upload them directly to GCS instead.
for i in range(len(load_url)):
    YouTube(load_url[i]).streams.first().download('C:/Users/Tushar/Documents/Serato_Video_Intelligence/youtube_videos')
It works only for the first two video URLs and then fails with the error below:
Traceback (most recent call last):
File "<stdin>", line 2, in <module>
File "C:\Python37\lib\site-packages\pytube\streams.py", line 217, in download
bytes_remaining = self.filesize
File "C:\Python37\lib\site-packages\pytube\streams.py", line 164, in filesize
headers = request.get(self.url, headers=True)
File "C:\Python37\lib\site-packages\pytube\request.py", line 21, in get
response = urlopen(url)
File "C:\Python37\lib\urllib\request.py", line 222, in urlopen
return opener.open(url, data, timeout)
File "C:\Python37\lib\urllib\request.py", line 531, in open
response = meth(req, response)
File "C:\Python37\lib\urllib\request.py", line 641, in http_response
'http', request, response, code, msg, hdrs)
File "C:\Python37\lib\urllib\request.py", line 569, in error
return self._call_chain(*args)
File "C:\Python37\lib\urllib\request.py", line 503, in _call_chain
result = func(*args)
File "C:\Python37\lib\urllib\request.py", line 649, in http_error_default
raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 403: Forbidden
I was hoping someone could help me understand what is going wrong here and how to resolve it. I desperately need this and have been unable to resolve the issue for some time.
Thanks a lot in advance!
P.S. If possible, is there a way to upload them directly to GCS?
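For the P.S., a minimal sketch of uploading an already-downloaded file with the google-cloud-storage client library (the bucket name and paths are placeholders, and the sketch assumes the library is installed and credentials are configured):
from google.cloud import storage

def upload_to_gcs(local_path, bucket_name, blob_name):
    # Upload one local file to gs://<bucket_name>/<blob_name>.
    client = storage.Client()
    bucket = client.bucket(bucket_name)
    bucket.blob(blob_name).upload_from_filename(local_path)

# Hypothetical usage after each pytube download:
# upload_to_gcs('youtube_videos/video.mp4', 'my-bucket', 'videos/video.mp4')
Since pytube's download() writes to local disk anyway, downloading first and then uploading like this is a straightforward route.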

It seems you could run into a conflict with YouTube's Terms of Service; I suggest you check this document and pay attention to Section 5, letter B. [1]
[1] https://www.youtube.com/static?gl=US&template=terms

Related

Gensim: error loading pretrained vectors No such file or directory: 'word2vec.kv.vectors.npy'

I am trying to load pretrained word2vec embeddings stored as a gensim KeyedVectors file, 'word2vec.kv':
pretrained = KeyedVectors.load(args.pretrained, mmap='r')
where args.pretrained is "/ptembs/word2vec.kv",
and I am getting this error:
File "main.py", line 60, in main
pretrained = KeyedVectors.load(args.pretrained, mmap = 'r')
File "C:\Users\ASUS\anaconda3\lib\site-packages\gensim\models\keyedvectors.py", line 1553, in load
model = super(WordEmbeddingsKeyedVectors, cls).load(fname_or_handle, **kwargs)
File "C:\Users\ASUS\anaconda3\lib\site-packages\gensim\models\keyedvectors.py", line 228, in load
return super(BaseKeyedVectors, cls).load(fname_or_handle, **kwargs)
File "C:\Users\ASUS\anaconda3\lib\site-packages\gensim\utils.py", line 436, in load obj._load_specials(fname, mmap, compress, subname)
File "C:\Users\ASUS\anaconda3\lib\site-packages\gensim\utils.py", line 478, in _load_specials
val = np.load(subname(fname, attrib), mmap_mode=mmap)
File "C:\Users\ASUS\anaconda3\lib\site-packages\numpy\lib\npyio.py", line 417, in load
fid = stack.enter_context(open(os_fspath(file), "rb"))
FileNotFoundError: [Errno 2] No such file or directory: 'ptembs/word2vec.kv.vectors.npy'
I don't understand why it needs the word2vec.kv.vectors.npy file, and I don't have it.
Any idea how to solve this problem?
gensim version is 3.8.3; I tried it on 4.1.2 as well, same error.
Where did you get the file 'word2vec.kv'?
If loading that file triggers an error mentioning a 2nd file by name, then that 2nd file should've been created alongside 'word2vec.kv' when it was 1st saved using a .save() operation.
That other file needs to be kept alongside 'word2vec.kv' in order for 'word2vec.kv' to be .load()ed again in the future.
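As a minimal sketch of that save/load pairing (toy corpus and file names purely illustrative; a model this small may embed its arrays in the .kv file itself, while real-sized models spill them into .npy sidecars):
from gensim.models import Word2Vec, KeyedVectors

# Train a tiny toy model and save its word vectors.
model = Word2Vec([['hello', 'world'], ['hello', 'gensim']], min_count=1)
model.wv.save('word2vec.kv')
# For large vocabularies, gensim writes the big numpy arrays to sidecar
# files such as word2vec.kv.vectors.npy next to word2vec.kv.

# load() expects any sidecar files in the same directory; if they were
# moved or never copied, it fails with the FileNotFoundError above.
kv = KeyedVectors.load('word2vec.kv', mmap='r')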

OpenCV Haar Cascade Creation

I want to try to create my own .xml cascade file for my graduation project, following this reference.
But I have a problem: stage 6 doesn't work. It gives an error such as:
Traceback (most recent call last):
File "./tools/mergevec.py", line 170, in <module>
merge_vec_files(vec_directory, output_filename)
File "./tools/mergevec.py", line 120, in merge_vec_files
val = struct.unpack('<iihh', content[:12])
TypeError: a bytes-like object is required, not 'str'
I have found a solution which says to find zero-size vector files and delete them.
But I don't know which vector files are zero-size or how I can detect them.
Can you help with this, please?
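For the zero-size check, a minimal sketch (the samples directory name is a placeholder):
import glob
import os

# Print and delete every empty .vec file in the samples directory.
for path in glob.glob('samples/*.vec'):
    if os.path.getsize(path) == 0:
        print('empty vec file:', path)
        os.remove(path)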
I was able to solve my problem by changing this:
try:
    for f in files:
        with open(f, 'rb') as vecfile:
            content = ''.join(str(line) for line in vecfile.readlines())
            data = content[12:]
            outputfile.write(data)
except Exception as e:
    exception_response(e)
to this:
try:
    for f in files:
        with open(f, 'rb') as vecfile:
            content = b''.join(line for line in vecfile.readlines())
            outputfile.write(bytearray(content[12:]))
except Exception as e:
    exception_response(e)
That is, as shown above, I changed:
content = ''.join(str(line) for line in vecfile.readlines())
to:
content = b''.join(line for line in vecfile.readlines())
because it was expecting a str, and now it can receive the binary data we need.
:)
Try following this guide. It's more recent.

Problems with OAuth2 and gspread

I've had some working API code for quite a long time, but suddenly (about 30 minutes after the previous use of the API) it stopped working.
Here's the traceback:
row_cells = self.range('%s:%s' % (start_cell, end_cell))
File"/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/gspread/models.py", line 72, in wrapper
return method(self, *args, **kwargs)
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/gspread/models.py", line 412, in range
params={'range': name, 'return-empty': 'true'}
File "/Lbrary/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/gspread/client.py", line 176, in get_cells_feed
r = self.session.get(url)
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/gspread/httpsession.py", line 73, in get
return self.request('GET', url, params=params, **kwargs)
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/gspread/httpsession.py", line 69, in request
response.status_code, response.content))
gspread.exceptions.RequestError: (401, '401: b\'<HTML>\\n<HEAD>\\n<TITLE>Unauthorized</TITLE>\\n</HEAD>\\n<BODY BGCOLOR="#FFFFFF" TEXT="#000000">\\n<H1>Unauthorized</H1>\\n<H2>Error 401</H2>\\n</BODY>\\n</HTML>\\n\'')
And I don't really understand this...
Here's the code:
import gspread
from oauth2client.service_account import ServiceAccountCredentials
import pprint
scope = [ 'https://spreadsheets.google.com/feeds' ]
creds = ServiceAccountCredentials.from_json_keyfile_name('client_secret.json', scope)
client = gspread.authorize(creds)
sheet = client.open('sheet_name').sheet1
I really don't know what to do. I've already created a new API service account email address and downloaded the JSON file (client_secret.json), but it still isn't working and I honestly don't know why.
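One common cause of a 401 appearing after roughly half an hour is an expired access token. A minimal sketch of refreshing it, assuming the older oauth2client-based gspread shown in the traceback (names match the question's code):
import gspread
from oauth2client.service_account import ServiceAccountCredentials

scope = ['https://spreadsheets.google.com/feeds']
creds = ServiceAccountCredentials.from_json_keyfile_name('client_secret.json', scope)
client = gspread.authorize(creds)

# Before each batch of requests, re-login if the token has expired;
# older gspread versions do not refresh it automatically.
if creds.access_token_expired:
    client.login()

sheet = client.open('sheet_name').sheet1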

Posting a status with video - Twython

I'm trying to post a video on behalf of someone through Twython.
I have followed the Twython video upload docs, but it fails with an error in the upload_video() method that their GitHub page marked as solved (it still happens to me, though).
I tried an SO solution I found, but it also fails, with TypeError: post() got an unexpected keyword argument 'files'.
So... is there any way to achieve that using Twython?
My code:
from twython import Twython
twitter = Twython(...)
video = open(video_path, 'rb')
response = twitter.upload_video(media=video, media_type='video/mp4')
twitter.update_status(status='Checkout this cool video!', media_ids=[response['media_id']])
Result
.
.
response = twitter.upload_video(media=video, media_type='video/mp4')
File "/usr/local/lib/python3.5/dist-packages/twython/endpoints.py", line 184, in upload_video
media_chunk.write(data)
TypeError: string argument expected, got 'bytes'

Timeout error uploading 2 GIG file (resumable upload in Appengine task Python)

I am getting a timeout error when trying to upload a 2 GB file using the resumable upload URL returned from the DocList API; please see the log extract below. I thought that with resumable upload in an App Engine task, 2 GB would not be an issue. Any ideas?
File "/base/data/home/apps/s~gofiledrop/31.358777816137338904/handler.py", line 564, in post
new_entry = uploader.UploadFile('/feeds/upload/create-session/default/private/full?convert=false', entry=entry)
File "/base/data/home/apps/s~gofiledrop/31.358777816137338904/gdata/client.py", line 1033, in upload_file
start_byte, self.file_handle.read(self.chunk_size))
File "/base/data/home/apps/s~gofiledrop/31.358777816137338904/gdata/client.py", line 987, in upload_chunk
desired_class=self.desired_class)
File "/base/data/home/apps/s~gofiledrop/31.358777816137338904/gdata/client.py", line 265, in request
uri=uri, auth_token=auth_token, http_request=http_request, **kwargs)
File "/base/data/home/apps/s~gofiledrop/31.358777816137338904/atom/client.py", line 117, in request
return self.http_client.request(http_request)
File "/base/data/home/apps/s~gofiledrop/31.358777816137338904/atom/http_core.py", line 420, in request
http_request.headers, http_request._body_parts)
File "/base/data/home/apps/s~gofiledrop/31.358777816137338904/atom/http_core.py", line 497, in _http_request
return connection.getresponse()
File "/base/python_runtime/python_dist/lib/python2.5/httplib.py", line 206, in getresponse
deadline=self.timeout)
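One hedged possibility: App Engine's URL Fetch service enforces a per-request deadline, so a smaller chunk size keeps each upload_chunk call short even though the whole file is 2 GB. A sketch, assuming the gdata ResumableUploader seen in the traceback (client, file_handle, total_file_size, and entry come from the surrounding handler code; the content type is a placeholder, and constructor arguments may differ between gdata versions):
import gdata.client

CHUNK_SIZE = 512 * 1024  # 512 KB per chunk; a placeholder value

# Each chunk becomes one short URL Fetch call that should finish
# well within the App Engine request deadline.
uploader = gdata.client.ResumableUploader(
    client, file_handle, 'application/octet-stream',
    total_file_size, chunk_size=CHUNK_SIZE)
new_entry = uploader.UploadFile(
    '/feeds/upload/create-session/default/private/full?convert=false',
    entry=entry)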
