How to connect and store the cookie with Scrapy - session-cookies

I want to log in to a site with Scrapy and then request another URL.
So far so good: I installed Scrapy and wrote this spider:
from scrapy import log
from scrapy.spider import BaseSpider
from scrapy.selector import HtmlXPathSelector
from scrapy.http import FormRequest

class LoginSpider2(BaseSpider):
    name = 'github_login'
    start_urls = ['https://github.com/login']

    def parse(self, response):
        # Submit the login form found on the page
        return [FormRequest.from_response(response, formdata={'login': 'username', 'password': 'password'}, callback=self.after_login)]

    def after_login(self, response):
        if "authentication failed" in response.body:
            self.log("Login failed", level=log.ERROR)
        else:
            self.log("Login succeed")
After running this spider, I got the log message "Login succeed".
Then I added a second URL, but it didn't work.
To do that, I replaced:
start_urls = ['https://github.com/login']
with
start_urls = ['https://github.com/login', 'https://github.com/MyCompany/MyPrivateRepo']
But I got this error:
2013-06-11 22:23:40+0200 [scrapy] DEBUG: Enabled item pipelines:
Traceback (most recent call last):
File "/usr/local/bin/scrapy", line 4, in <module>
execute()
File "/Library/Python/2.7/site-packages/scrapy/cmdline.py", line 131, in execute
_run_print_help(parser, _run_command, cmd, args, opts)
File "/Library/Python/2.7/site-packages/scrapy/cmdline.py", line 76, in _run_print_help
func(*a, **kw)
File "/Library/Python/2.7/site-packages/scrapy/cmdline.py", line 138, in _run_command
cmd.run(args, opts)
File "/Library/Python/2.7/site-packages/scrapy/commands/crawl.py", line 43, in run
spider = self.crawler.spiders.create(spname, **opts.spargs)
File "/Library/Python/2.7/site-packages/scrapy/spidermanager.py", line 43, in create
raise KeyError("Spider not found: %s" % spider_name)
What am I doing wrong? I searched Stack Overflow but didn't find a proper answer.
Thank you

Your error indicates that Scrapy cannot find the spider. Did you create it in the project's spiders folder?
Anyway, once you get it to run you will hit a second issue: the default callback for the start_urls requests is self.parse, which will fail for the repo page (there's no login form there). The two requests will also probably run in parallel, so the private repo may be requested before the login has completed, and you will get an error :P
You should leave only the login URL in start_urls and, if the login worked, return a new Request from the after_login method. Like this:
# requires: from scrapy.http import Request
def after_login(self, response):
    ...
    else:
        return Request('https://github.com/MyCompany/MyPrivateRepo',
                       callback=self.parse_repo)

Is the name attribute of the spider still set correctly? An incorrect or missing name usually leads to errors like this one.

Related

Authentication unsuccessful error with smtp.office365.com when using python3 smtplib

I have a Python program that sends me daily emails from my personal Microsoft outlook.com account. The code had been working fine but broke yesterday. Here it is:
def email(subject, text):
    import smtplib
    from email.mime.multipart import MIMEMultipart
    from email.mime.text import MIMEText
    from email.mime.image import MIMEImage
    user = "xxx@hotmail.com"
    passwd = "xxxxxx"
    sender = 'xxx@hotmail.com'
    receiver = 'xxx@hotmail.com'
    msg = MIMEMultipart('mixed')
    msg['Subject'] = subject
    msg['From'] = 'xxx@hotmail.com'
    msg['To'] = 'xxx@hotmail.com'
    text_plain = MIMEText(text,'plain','utf-8')
    msg.attach(text_plain)
    server = smtplib.SMTP('smtp.office365.com', 587)
    server.ehlo()
    server.starttls()
    server.login(user, passwd)
    server.sendmail(sender, receiver, msg.as_string())
    server.quit()
user, sender, receiver, To and From are all the same email address. When I run the script, I get this error:
>>> email('test subject', 'test message')
File "<stdin>", line 1, in <module>
File "<stdin>", line 19, in email
File "/usr/lib/python3.6/smtplib.py", line 730, in login
raise last_exception
File "/usr/lib/python3.6/smtplib.py", line 721, in login
initial_response_ok=initial_response_ok)
File "/usr/lib/python3.6/smtplib.py", line 642, in auth
raise SMTPAuthenticationError(code, resp)
smtplib.SMTPAuthenticationError: (535, b'5.7.3 Authentication unsuccessful [MW4PR03CA0229.namprd03.prod.outlook.com]')
Any ideas what could be wrong? This script has been working for at least half a year.
Thanks!
Difan
Not sure if this will help, but since yesterday we have been having problems with Thunderbird connecting to the Microsoft mail server. For the base account, changing the authentication method to OAuth2 helped, but I still don't know what to do about aliases.
So I guess the problem lies with Microsoft changing the authentication requirements.
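If switching to OAuth2 is the way forward, a script using smtplib would presumably have to authenticate with the SASL XOAUTH2 mechanism instead of server.login(). The sketch below only shows that step. It assumes you already hold a valid OAuth2 access token for the mailbox (obtaining one is out of scope here) and that the server accepts XOAUTH2 for your account; xoauth2_string and send_with_oauth2 are hypothetical helper names.

```python
import smtplib


def xoauth2_string(user, access_token):
    # SASL XOAUTH2 initial response; smtplib base64-encodes it before sending.
    return f"user={user}\x01auth=Bearer {access_token}\x01\x01"


def send_with_oauth2(user, access_token, receiver, message):
    # Assumption: access_token was obtained elsewhere and has not expired.
    with smtplib.SMTP("smtp.office365.com", 587) as server:
        server.ehlo()
        server.starttls()
        server.ehlo()  # re-identify over the encrypted connection
        # SMTP.auth() calls the auth object to get the string to send.
        server.auth(
            "XOAUTH2",
            lambda challenge=None: xoauth2_string(user, access_token),
        )
        server.sendmail(user, receiver, message)
```

In the original function this would replace the server.login(user, passwd) call, with message being msg.as_string().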

Authenticate to Gsuite APIs without service account

I'm trying to read a Google Spreadsheet using my local credentials (i.e. not a service account). I'm basically doing:
from googleapiclient.discovery import build
sheet_service = build('sheets', 'v4')
request = sheet_service.spreadsheets().values().get(spreadsheetId=SPREADSHEET_ID, range='{}!A1:a4'.format(SHEET_NAME))
result = request.execute()
values = result.get('values', [])
However I get the following error:
Traceback (most recent call last):
File "./update_incidents_tracker.py", line 54, in <module>
sys.exit(main())
File "./update_incidents_tracker.py", line 48, in main
result = request.execute()
File "/home/filip/.virtualenvs/monitoring-tools/lib/python3.6/site-packages/googleapiclient/_helpers.py", line 134, in positional_wrapper
return wrapped(*args, **kwargs)
File "/home/filip/.virtualenvs/monitoring-tools/lib/python3.6/site-packages/googleapiclient/http.py", line 898, in execute
raise HttpError(resp, content, uri=self.uri)
googleapiclient.errors.HttpError: <HttpError 403 when requesting https://sheets.googleapis.com/v4/spreadsheets/1lTcQ3WknG_2oZvy9O9LEbgAzoykOWaheYSaDkkV21wE/values/Incidents%21A1%3Aa4?alt=json returned "Request had insufficient authentication scopes.">
I can't find a way to set scopes for my local (gcloud SDK) credentials. How can I set the proper scopes?
For reference, with service-account-based authentication I can simply run:
credentials = service_account.Credentials.from_service_account_file(CREDENTIALS_FILE, scopes=SCOPES)
build('sheets', 'v4', credentials=credentials)
Use the default credentials explicitly and set the scopes when you load them:
import google.auth
credentials, project_id = google.auth.default(scopes=....)
build('sheets', 'v4', credentials=credentials)
....
See the google-auth documentation for full details.

tweepy Not Authorized - tweepy.error.TweepError: Not authorized

I get the following error when I try to use tweepy for twitter authentication.
File "/usr/local/lib/python2.7/dist-packages/tweepy/models.py", line 146, in followers
return self._api.followers(user_id=self.id, **kargs)
File "/usr/local/lib/python2.7/dist-packages/tweepy/binder.py", line 197, in _call
return method.execute()
File "/usr/local/lib/python2.7/dist-packages/tweepy/binder.py", line 173, in execute
raise TweepError(error_msg, resp)
tweepy.error.TweepError: Not authorized.
I am not building a web app. So, authentication is simpler.
consumer_key="----------"
consumer_secret="----------"
access_token="--------------"
access_token_secret="-----------------"
auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)
api = tweepy.API(auth)
api.get_user('---').followers()
Fixed. The particular user had protected tweets. Hence, .followers() was failing.
I had a for loop that went through my followers to get all of their followers, and it crashed with the same error.
My workaround was:
try:
    api.get_user('---').followers()
    ...
except tweepy.TweepError:
    print("Failed to run the command on that user, Skipping...")
This makes you miss some users, but my loop finished successfully and got about 99% of my followers, so it is probably fairly rare for a user to have protected tweets.
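To see how many users were actually skipped, the same try/except idea can be wrapped in a small helper. This is only a sketch: collect_followers and its parameters are made-up names, and the fetch function and exception type are passed in so that nothing here depends on a particular tweepy version.

```python
def collect_followers(user_ids, fetch_followers, error_type=Exception):
    """Return ({user_id: followers}, [user_ids that were skipped])."""
    collected, skipped = {}, []
    for user_id in user_ids:
        try:
            collected[user_id] = fetch_followers(user_id)
        except error_type:
            # e.g. a protected account answering "Not authorized."
            skipped.append(user_id)
    return collected, skipped
```

With the code from this thread, fetch_followers could be lambda uid: api.get_user(uid).followers() and error_type could be tweepy.TweepError.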

Specify oauth_callback URL for OAuth1 services (twitter)

I am trying to provide seamless login with twitter to my web application. For that, I need twitter to redirect to a specific URL after the user has authorized my application.
I do not want the user to be forced to copy paste a PIN to authorize the application.
According to the guidelines on "Implementing Sign in with Twitter", in Step 1, when obtaining a request_token an oauth_callback must be specified. But doing so with rauth raises an exception:
Traceback (most recent call last):
File "/install_dir/web2py/gluon/restricted.py", line 212, in restricted
exec ccode in environment
File "/install_dir/web2py/applications/wavilon_portal/controllers/signup.py", line 213, in <module>
File "/install_dir/web2py/gluon/globals.py", line 194, in <lambda>
self._caller = lambda f: f()
File "/install_dir/web2py/applications/wavilon_portal/controllers/signup.py", line 198, in oauth_signup
authorized, authorize_url = oauth_service.check_authorization()
File "/python_modules/oauth/service.py", line 230, in check_authorization
authorize_url = self.get_authorize_url()
File "/python_modules/oauth/service.py", line 195, in get_authorize_url
return self.get_authorize_url_oauth1() if self.oauthver == OAUTH1_VER else self.get_authorize_url_oauth2()
File "/python_modules/oauth/service.py", line 175, in get_authorize_url_oauth1
request_token, request_token_secret = self.oauth_service.get_request_token(method="POST", oauth_callback=self.redirect_uri)
File "/virtualenvs/python2.7.2/lib/python2.7/site-packages/rauth/service.py", line 212, in get_request_token
r = self.get_raw_request_token(method=method, **kwargs)
File "/virtualenvs/python2.7.2/lib/python2.7/site-packages/rauth/service.py", line 186, in get_raw_request_token
return session.request(method, self.request_token_url, **kwargs)
File "/virtualenvs/python2.7.2/lib/python2.7/site-packages/rauth/session.py", line 195, in request
return super(OAuth1Session, self).request(method, url, **req_kwargs)
TypeError: request() got an unexpected keyword argument 'oauth_callback'
How can the redirect URI (oauth_callback) be specified for OAuth1?
rauth follows the same API as Requests, so pass the callback through params: twitter.get_request_token(..., params={'oauth_callback': 'http://example.com/callback'}).

Google provisioning api returning apparently bogus error for account creation

We use the Google Python API to create accounts. Beginning on 11/8/2012 at 1pm PST, we
have started to get these intermittent error messages:
errorCode="1301" invalidInput="loginname" reason="EntityDoesNotExist"
When we check the Google dashboard, the account is in fact created, but the remainder of
our account creation tasks are not completed due to the error message that google sends back.
Has anyone else noticed this problem and/or have an idea why this may be happening?
Our account provisioning code is robust and has created over 50,000 accounts prior to 11/8.
Here is the code snippet:
r = client.CreateUser(act.localpart, family_name, given_name, password, suspended='false', quota_limit=25600, password_hash_function="SHA-1",change_password=None )
Here is the full traceback:
Traceback (most recent call last):
File "/usr/lib/python2.4/site-packages/cherrypy/_cphttptools.py", line 105, in _run
self.main()
File "/usr/lib/python2.4/site-packages/cherrypy/_cphttptools.py", line 254, in main
body = page_handler(*virtual_path, **self.params)
File "<string>", line 3, in create_accountgmail
File "/usr/lib/python2.4/site-packages/turbogears/controllers.py", line 348, in expose
output = database.run_with_transaction(
File "<string>", line 5, in run_with_transaction
File "/usr/lib/python2.4/site-packages/turbogears/database.py", line 376, in s _rwt
retval = dispatch_exception(e, args, kw)
File "/usr/lib/python2.4/site-packages/turbogears/database.py", line 357, in s _rwt
retval = func(*args, **kw)
File "<string>", line 5, in _expose
File "/usr/lib/python2.4/site-packages/turbogears/controllers.py", line 365, in <lambda>
mapping, fragment, args, kw)))
File "/usr/lib/python2.4/site-packages/turbogears/controllers.py", line 393, in _execute_func
output = errorhandling.try_call(func, *args, **kw)
File "/usr/lib/python2.4/site-packages/turbogears/errorhandling.py", line 72, in try_call
return func(self, *args, **kw)
File "<string>", line 3, in create_accountgmail
File "/usr/lib/python2.4/site-packages/turbogears/controllers.py", line 182, in validate
return errorhandling.run_with_errors(errors, func, *args, **kw)
File "/usr/lib/python2.4/site-packages/turbogears/errorhandling.py", line 115, in run_with_errors
return func(self, *args, **kw)
File "<string>", line 3, in create_accountgmail
File "/usr/lib/python2.4/site-packages/turbogears/identity/conditions.py", line 235, in require
return fn(self, *args, **kwargs)
File "/usr/local/MYA/mya/account_controllers.py", line 1893, in create_accountgmail
raise Exception('Could not create gmail account, %s: %s'%(result, act.format_address()))
Exception: Could not create gmail account, RequestError: Server responded with: 400, <?xml version="1.0" encoding="UTF-8"?>
<AppsForYourDomainErrors>
<error errorCode="1301" invalidInput="LOGIN" reason="EntityDoesNotExist" />
</AppsForYourDomainErrors>: LOGIN#berkeley.edu
Another person with the same problem filed a ticket with Google and got this response:
We received the following update from Google Enterprise Support regarding the
"EntityDoesNotExist" provisioning error:
It seems that it's the request to retrieve the user that is
returning this exception. It's most likely due to a propagation
delay in our servers: the user is correctly provisioned but the
information isn't propagated quickly enough and the call to
retrieve the user is made on a server where the user isn't
provisioned yet so you get the error EntityDoesNotExist.
As a temporary workaround until additional specialists can
resolve the propagation issue, I suggest you ignore the requests
that are failing with the error EntityDoesNotExist. I have added
your case to an issue report and will be sure to update you with
additional updates as they transpire.
I had the same problem while using their .NET library, and Google support told me to stop using ClientLogin and switch to OAuth 2.0. ClientLogin is deprecated: https://developers.google.com/accounts/docs/AuthForInstalledApps
I am currently wrestling with OAuth, so I can't report whether it works any better. However, the issue disappeared by itself in the meantime.
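If the remaining account-creation tasks need the new user to be retrievable, one variation on the "ignore the error" workaround quoted above is to wait for propagation and retry. This is not what Google Support described, just a sketch of that idea: wait_until_visible and retrieve_user are hypothetical names, and matching on the "EntityDoesNotExist" text assumes the exception message contains it, as in the traceback in the question.

```python
import time


def wait_until_visible(retrieve_user, username, attempts=5, delay=1.0,
                       sleep=time.sleep):
    """Poll retrieve_user until it stops reporting EntityDoesNotExist."""
    for attempt in range(attempts):
        try:
            return retrieve_user(username)
        except Exception as exc:
            # Retry only the propagation-delay symptom; re-raise anything
            # else, and give up after the last attempt.
            if "EntityDoesNotExist" not in str(exc) or attempt == attempts - 1:
                raise
            # Exponential backoff: delay, 2 * delay, 4 * delay, ...
            sleep(delay * 2 ** attempt)
```

The sleep parameter is injectable only so the helper can be exercised without real waiting.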

Resources