How to look through posts in Reddit with praw

I've been looking at the documentation for praw and I simply cannot find which method is for looking through all the posts.
What I want to do is look through all the posts in a subreddit:
import ProcessingBot
import Auth
import praw

SETPHRASES = ["python", "bots", "jarmahent", "is proves there was no Global Warming in 1966", "test"]
SETPHRASE = ("This is a bot, ignore this reply")
USERNAME = Auth.pG

def run():
    r = praw.Reddit(Auth.app_ua)
    print("Signing In")
    r.set_oauth_app_info(Auth.app_id, Auth.app_secret, Auth.app_uri)
    print("Setting Oauth App Info")
    r.refresh_access_information(Auth.app_refresh)
    sub = r.get_subreddit("ProcessingImages")
    print("Getting SubReddit")
    # Look through all the posts -- this is where the post finder will be
    print("Finished")
    return r

while True:
    run()
The formatting was a little off when I posted: I indented four spaces and pasted, and it still didn't come out right.

This is covered in the example on the documentation front page:
>>> import praw
>>> r = praw.Reddit(user_agent='my_cool_application')
>>> submissions = r.get_subreddit('opensource').get_hot(limit=5)
>>> [str(x) for x in submissions]
Aside from get_hot, there are also methods for rising and top posts.
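For example, sticking with the same PRAW 3-era API as the snippet above (a sketch; in current PRAW these become subreddit.top(), subreddit.rising(), and subreddit.new()):
>>> sub = r.get_subreddit('opensource')
>>> top_posts = list(sub.get_top(limit=5))       # top posts
>>> rising_posts = list(sub.get_rising(limit=5)) # rising posts
>>> new_posts = list(sub.get_new(limit=5))       # newest posts
>>> [str(x) for x in top_posts]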

Related

BeautifulSoup - not all href links are being extracted

I am trying to extract all href links that are within class ['address']. Each time I run the code, I only get the first 5 and that's it, even though I know there should be 9.
Web-Page:
https://www.walgreens.com/storelocator/find.jsp?requestType=locator&state=AK&city=ANCHORAGE&from=localSearch
I have read through a variety of threads below and altered my code countless times, including switching through all the parsers (html.parser, html5lib, lxml, xml, lxml-xml), but nothing seems to be working. Any idea what's causing it to stop after the 5th iteration? I am still fairly new to Python, so I apologize if this is a rookie mistake that I'm overlooking. Any help would be appreciated, even the sarcastic answers :)
Beautiful Soup findAll doesn't find them all
Beautiful Soup 4 find_all don't find links that Beautiful Soup 3 finds
BeautifulSoup fails to parse long view state
Beautifulsoup lost nodes
Missing parts on Beautiful Soup results
Python 64 bit not storing as long of string as 32 bit python
I used pretty similar code on the web pages below and did not experience any issues scraping the hrefs:
https://www.walgreens.com/storelistings/storesbystate.jsp?requestType=locator
https://www.walgreens.com/storelistings/storesbycity.jsp?requestType=locator&state=AK
My code below:
import requests
from bs4 import BeautifulSoup
local_rg = requests.get('https://www.walgreens.com/storelocator/find.jsp?requestType=locator&state=AK&city=ANCHORAGE&from=localSearch')
local_rg_content = local_rg.content
local_rg_content_src = BeautifulSoup(local_rg_content, 'lxml')
for link in local_rg_content_src.find_all('div'):
    local_class = str(link.get('class'))
    if str("['address']") in str(local_class):
        local_a = link.find_all('a')
        for a_link in local_a:
            local_href = str(a_link.get('href'))
            print(local_href)
My results (first 5):
/locator/walgreens-1470+w+northern+lights+blvd-anchorage-ak-99503/id=15092
/locator/walgreens-725+e+northern+lights+blvd-anchorage-ak-99503/id=13656
/locator/walgreens-4353+lake+otis+parkway-anchorage-ak-99508/id=15653
/locator/walgreens-7600+debarr+rd-anchorage-ak-99504/id=12679
/locator/walgreens-2197+w+dimond+blvd-anchorage-ak-99515/id=12680
But there should be 9:
/locator/walgreens-1470+w+northern+lights+blvd-anchorage-ak-99503/id=15092
/locator/walgreens-725+e+northern+lights+blvd-anchorage-ak-99503/id=13656
/locator/walgreens-4353+lake+otis+parkway-anchorage-ak-99508/id=15653
/locator/walgreens-7600+debarr+rd-anchorage-ak-99504/id=12679
/locator/walgreens-2197+w+dimond+blvd-anchorage-ak-99515/id=12680
/locator/walgreens-2550+e+88th+ave-anchorage-ak-99507/id=15654
/locator/walgreens-12405+brandon+st-anchorage-ak-99515/id=13449
/locator/walgreens-12051+old+glenn+hwy-eagle+river-ak-99577/id=15362
/locator/walgreens-1721+e+parks+hwy-wasilla-ak-99654/id=12681
Try using selenium instead of requests to get the source code of the page. Here is how you do it:
from selenium import webdriver
driver = webdriver.Chrome()
driver.get('https://www.walgreens.com/storelocator/find.jsp?requestType=locator&state=AK&city=ANCHORAGE&from=localSearch')
local_rg_content = driver.page_source
driver.close()
local_rg_content_src = BeautifulSoup(local_rg_content, 'lxml')
The rest of the code is the same. Here is the full code:
from bs4 import BeautifulSoup
from selenium import webdriver
driver = webdriver.Chrome()
driver.get('https://www.walgreens.com/storelocator/find.jsp?requestType=locator&state=AK&city=ANCHORAGE&from=localSearch')
local_rg_content = driver.page_source
driver.close()
local_rg_content_src = BeautifulSoup(local_rg_content, 'lxml')
for link in local_rg_content_src.find_all('div'):
    local_class = str(link.get('class'))
    if str("['address']") in str(local_class):
        local_a = link.find_all('a')
        for a_link in local_a:
            local_href = str(a_link.get('href'))
            print(local_href)
Output:
/locator/walgreens-1470+w+northern+lights+blvd-anchorage-ak-99503/id=15092
/locator/walgreens-725+e+northern+lights+blvd-anchorage-ak-99503/id=13656
/locator/walgreens-4353+lake+otis+parkway-anchorage-ak-99508/id=15653
/locator/walgreens-7600+debarr+rd-anchorage-ak-99504/id=12679
/locator/walgreens-2197+w+dimond+blvd-anchorage-ak-99515/id=12680
/locator/walgreens-2550+e+88th+ave-anchorage-ak-99507/id=15654
/locator/walgreens-12405+brandon+st-anchorage-ak-99515/id=13449
/locator/walgreens-12051+old+glenn+hwy-eagle+river-ak-99577/id=15362
/locator/walgreens-1721+e+parks+hwy-wasilla-ak-99654/id=12681
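One caveat worth adding (an assumption on my part, not something verified against this page): if the store list is injected after the initial load, page_source can be captured too early. An explicit wait for the address blocks guards against that:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Chrome()
driver.get('https://www.walgreens.com/storelocator/find.jsp?requestType=locator&state=AK&city=ANCHORAGE&from=localSearch')
# Wait up to 10 seconds for at least one div with class "address" to be present
WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.CSS_SELECTOR, 'div.address')))
local_rg_content = driver.page_source
driver.close()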
The page uses Ajax to load the store information from an external URL. You can use the requests and json modules to load it:
import re
import json
import requests
url = 'https://www.walgreens.com/storelocator/find.jsp?requestType=locator&state=AK&city=ANCHORAGE&from=localSearch'
ajax_url = 'https://www.walgreens.com/locator/v1/stores/search?requestor=search'
m = re.search(r'"lat":([\d.-]+),"lng":([\d.-]+)', requests.get(url).text)
params = {
    'lat': m.group(1),
    'lng': m.group(2)
}
data = requests.post(ajax_url, json=params).json()
# uncomment this to print all data:
# print(json.dumps(data, indent=4))
for result in data['results']:
    print(result['store']['address']['street'])
    print('https://www.walgreens.com' + result['storeSeoUrl'])
    print('-' * 80)
Prints:
1470 W NORTHERN LIGHTS BLVD
https://www.walgreens.com/locator/walgreens-1470+w+northern+lights+blvd-anchorage-ak-99503/id=15092
--------------------------------------------------------------------------------
725 E NORTHERN LIGHTS BLVD
https://www.walgreens.com/locator/walgreens-725+e+northern+lights+blvd-anchorage-ak-99503/id=13656
--------------------------------------------------------------------------------
4353 LAKE OTIS PARKWAY
https://www.walgreens.com/locator/walgreens-4353+lake+otis+parkway-anchorage-ak-99508/id=15653
--------------------------------------------------------------------------------
7600 DEBARR RD
https://www.walgreens.com/locator/walgreens-7600+debarr+rd-anchorage-ak-99504/id=12679
--------------------------------------------------------------------------------
2197 W DIMOND BLVD
https://www.walgreens.com/locator/walgreens-2197+w+dimond+blvd-anchorage-ak-99515/id=12680
--------------------------------------------------------------------------------
2550 E 88TH AVE
https://www.walgreens.com/locator/walgreens-2550+e+88th+ave-anchorage-ak-99507/id=15654
--------------------------------------------------------------------------------
12405 BRANDON ST
https://www.walgreens.com/locator/walgreens-12405+brandon+st-anchorage-ak-99515/id=13449
--------------------------------------------------------------------------------
12051 OLD GLENN HWY
https://www.walgreens.com/locator/walgreens-12051+old+glenn+hwy-eagle+river-ak-99577/id=15362
--------------------------------------------------------------------------------
1721 E PARKS HWY
https://www.walgreens.com/locator/walgreens-1721+e+parks+hwy-wasilla-ak-99654/id=12681
--------------------------------------------------------------------------------
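As a general note on how this kind of endpoint is typically found: watching the browser's developer tools Network tab while the page loads reveals the POST to /locator/v1/stores/search that the page's own JavaScript makes, along with the JSON body it sends. The snippet above just scrapes the lat/lng out of the page first and replays that request with requests.post(..., json=params).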

Web Scraping - BeautifulSoup parsers do not seem to work

I am trying to extract the names of a few items from the url below. The node and class_ point to the right content, but when I use find_all I do not get back any results. From previous posts it seems this problem might be connected to using the wrong parser. I have used xml, lxml and others, but nothing seems to work.
Is anyone able to extract the content successfully?
import requests
from bs4 import BeautifulSoup
import pandas as pd
import html5lib
import urllib3
url_pb = 'https://www.pullandbear.com/it/uomo/accessori/zaini-c1030207088.html'
req_pb = requests.get(url_pb)
pars_pb = BeautifulSoup(req_pb.content, 'html.parser')
con_pb = pars_pb.find_all('div', class_ = 'name namorio')
UPDATE
I have managed to find the info I needed, hidden in another section of the same page's source available to inspection. I extracted it using this code:
url_pb = 'https://www.pullandbear.com/it/uomo/accessori/zaini-c1030207088.html'
req_pb = requests.get(url_pb)
pars_pb = BeautifulSoup(req_pb.content, 'html.parser')
con_pb = pars_pb.find_all('li', class_ = False)
names_pb = [c.select("a > p")[0].text for c in con_pb]
prices_pb = [c.select('a > p')[1].text for c in con_pb]
picts_pb = [c.find('img').get('src') for c in con_pb]
df_pb = pd.DataFrame({'(Pull&Bear) Item_Name': names_pb,
                      'Item_Price_EUR': prices_pb,
                      'Link_to_Pict': picts_pb})
It seems that the website is using JavaScript to display its content, meaning that you can't just fetch the page and scrape it directly (requests doesn't render JavaScript). That said, all of the data displayed on the website is sent in the form of a JSON string, so to get all the names of the items you could use the following code:
import requests
url = "https://www.pullandbear.com:443/itxrest/2/catalog/store/24009405/20309428/category/1030207088/product?languageId=-4&appId=1"
all_products = requests.get(url).json()["products"]
product_names = [item["bundleProductSummaries"][0]["name"] for item in all_products]
print(product_names)
Hope this helps.

How to create context for a chatbot using Twilio

I am trying to create a bot where, whenever someone sends the bot a message about a type of food, the bot responds with locations that serve that food. However, I am trying to establish context so that the conversation can flow more naturally.
I have tried nesting the if statements, and that gets the message to display, but each check then depends on the previous if statement being true before the ones after it are tested.
from flask import Flask, request
from twilio.twiml.messaging_response import MessagingResponse
from intents import fallback_intent, getLocation
import random
app = Flask(__name__)
location_fallback = ['What kind of restaurant are you seeking?', 'What kind? Nearby, Cheap or The best?']
welcome = ['hello', 'what\'s up', 'hey','hi', 'what\'s happening?']
near = ['near', 'nearby']
cheap = ['cheap', 'good for my pockets']
good = ['good', 'top rated']
intro_resp = ['''Hey! Welcome to Crave! This interactive platform connects you to the top foodies in the world! We provide you with the best food places where ever you are. The instructions are simple:
1. Save our number in your Phone as Crave.
2. Text us and tell us what type of food you are craving!
This is from python''', '''
Welcome to Crave! Are you ready to get some food for today?
1. Save our number in your Phone as Crave.
2. Text us and tell us what type of food you are craving!
''']
@app.route('/sms', methods=['GET', 'POST'])
def sms():
    num = request.form['From']
    msg = request.form['Body'].lower()
    resp = MessagingResponse()
    # welcome intent
    if any(word in msg for word in welcome):
        if any(near_word in msg for near_word in near):
            resp.message('These are the location of places near you!')
            print(str(msg.split()))
            return str(resp)
        elif any(cheap_word in msg for cheap_word in cheap):
            resp.message('These are the location of places that are low cost to you!')
            return str(resp)
        elif any(good_word in msg for good_word in good):
            resp.message('These are the best places in town!')
            return str(resp)
        else:
            location_fallback[random.randint(0, 1)]  # note: this expression has no effect as written
            resp.message(intro_resp[random.randint(0, 1)])
            print(str(msg.split()))
            return str(resp)
    else:
        resp.message(fallback_intent())
        print(str(msg))
        return str(resp)

if __name__ == '__main__':
    app.run(debug=True)
I want the user to say 'hi' or something similar to initiate the bot, then I want the bot to ask what kind of food they would like. Then the bot will ask what parameters they want for the restaurant (i.e. close, cheap, or good). The user will answer accordingly, and the bot then needs to use these parameters to search for a restaurant near them with those attributes.
Twilio developer evangelist here.
You could store this in many places, in cookies as part of the conversation with Twilio, in a database where you use the user's number as a key to look up previous messages, or even just in memory.
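For illustration, here is a minimal sketch of the in-memory option, reusing the Flask app and imports from your code (the sessions dict and its 'stage'/'food' keys are invented for this example; the state lives in one process and is lost on restart):
# Minimal in-memory session store, keyed by the sender's phone number.
sessions = {}

@app.route('/sms', methods=['GET', 'POST'])
def sms():
    num = request.form['From']
    msg = request.form['Body'].lower()
    resp = MessagingResponse()
    state = sessions.setdefault(num, {'stage': 'new'})
    if state['stage'] == 'new':
        resp.message('Welcome to Crave! What type of food are you craving?')
        state['stage'] = 'awaiting_food'
    elif state['stage'] == 'awaiting_food':
        state['food'] = msg
        resp.message('Nearby, cheap, or the best?')
        state['stage'] = 'awaiting_filter'
    else:
        resp.message('Here are %s spots that are %s!' % (state['food'], msg))
        del sessions[num]  # conversation complete; start fresh next time
    return str(resp)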
If you're looking for a more robust way to achieve this, with better natural language processing, have you checked out Twilio Autopilot? It stores the context of a conversation for you and is built to collect information before giving a response based on the complete set like you are doing.

getAdGroupBidLandscape returning no campaigns found

I'm trying to get some insights out of the Google AdWords bid simulator system. More specifically, I'm using the AdGroupBidLandscape() functionality, but it returns 'No campaigns were found', even though we definitely have campaigns where the Bid Simulator tool works through the AdWords web interface, so I'm a bit confused. Here is the code I'm running; yes, I know I'm only retrieving a single field, I'm just trying to keep things as simple as possible.
from googleads import adwords
import logging
import time
CHUNK_SIZE = 16 * 1024
PAGE_SIZE = 100
logging.basicConfig(level=logging.INFO)
logging.getLogger('suds.transport').setLevel(logging.DEBUG)
adwords_client = adwords.AdWordsClient.LoadFromStorage()
dataService = adwords_client.GetService('DataService', version='v201710')
offset = 0
selector = {
    'fields': ['Bid'],  # 'impressions', 'promotedImpressions', 'requiredBudget', 'bidModifier', 'totalLocalImpressions', 'totalLocalClicks', 'totalLocalCost', 'totalLocalPromotedImpressions'
    'paging': {
        'startIndex': str(offset),
        'numberResults': str(PAGE_SIZE)
    }
}

more_pages = True
while more_pages:
    page = dataService.getAdGroupBidLandscape(selector)
    # Display results.
    if 'entries' in page:
        for campaign in page['entries']:
            print('Campaign with id "%s", name "%s", and status "%s" was '
                  'found.' % (campaign['id'], campaign['name'],
                              campaign['status']))
    else:
        print('No campaigns were found.')
    offset += PAGE_SIZE
    selector['paging']['startIndex'] = str(offset)
    more_pages = offset < int(page['totalNumEntries'])
    time.sleep(1)
We have several different accounts attached to AdWords. My account is the only one that has developer API access, so I sort of wonder if the problem is that my account isn't the primary account associated with the campaigns; I just have one of the few administrator accounts. Can anyone provide some insights about this for me?
Thanks,
Brad
The solution I found to this problem was to add a predicate to the selector specifying a particular CampaignId. It doesn't make sense to me that this would fix anything, since if I understand things correctly it should just filter the data, but it seems to have worked. I don't have a good explanation for that, but I thought someone else might find this useful. If I come to realize this wasn't the fix to the problem I had, I will come back and update this answer.
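In case it helps anyone trying the same thing, here is roughly what that looks like (a sketch based on the selector from the question; the campaign id is a placeholder, not a real one):
selector = {
    'fields': ['Bid'],
    # Predicate restricting the landscape query to a single campaign.
    'predicates': [{
        'field': 'CampaignId',
        'operator': 'IN',
        'values': ['1234567890']  # placeholder; use one of your own campaign ids
    }],
    'paging': {
        'startIndex': str(offset),
        'numberResults': str(PAGE_SIZE)
    }
}
page = dataService.getAdGroupBidLandscape(selector)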

Scraping webpages for links with a specific class

First post here and I have had a look but can't find the answer I need.
I'm trying to go through a website and find all the links that have a certain class, in this case 'annmt'.
I want the result to show only the link, though, and am having trouble getting the format right. Once it's right, I want to append it to an empty list that I can call on later.
My code is:
import requests
from bs4 import BeautifulSoup
import datetime as dt
l = []
def getlinks():
    page = requests.get("http://www.investegate.co.uk/Index.aspx?ftse=1&date=20170609")
    soup = BeautifulSoup(page.content, 'html.parser')
    for links in soup.find_all('a', attrs={'class': 'annmt'}):
        for link in links.find_all('a', href=True):
            link = link['href']
            l.append(link)
    print l
Here is the working code for your reference:
import requests
from bs4 import BeautifulSoup
import datetime as dt
l = []
def getlinks():
    page = requests.get("http://www.investegate.co.uk/Index.aspx?ftse=1&date=20170609")
    soup = BeautifulSoup(page.content, 'html.parser')
    for links in soup.find_all('a', attrs={'class': 'annmt'}):
        link = links.get('href')
        l.append(link)
    print l
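For what it's worth, the same extraction can be written more compactly (a sketch; behaviour should match as long as every annmt anchor carries an href):
soup = BeautifulSoup(page.content, 'html.parser')
l = [a['href'] for a in soup.find_all('a', class_='annmt', href=True)]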
