Retrieving data via POST request

I'm having trouble obtaining data programmatically from a particular web page.
http://www.uschess.org/msa/thin2.php lets you search for US Chess ratings by name and state.
By submitting a request, I can get to the equivalent of http://www.uschess.org/msa/thin2.php?memln=nakamura&memfn=hikaru, but that still requires clicking the "Search" button to get useful data. What is the best way to get to that results page?
import urllib.request
import urllib.parse

data = {'memfn': 'hikaru', 'memln': 'nakamura'}
url = 'http://www.uschess.org/msa/thin2.php'
# Note: urlopen lives in urllib.request, not urllib.parse
s = urllib.request.urlopen(url, urllib.parse.urlencode(data).encode('utf-8'))
print(s.read())
Thanks!

This one works:
#!/usr/bin/env python
import urllib
data = {'memfn':'hikaru', 'memln':'nakamura', 'mode':'Search'}
url = r'http://www.uschess.org/msa/thin2.php'
s = urllib.urlopen(url, bytes(urllib.urlencode(data)))
print s.read()
Basically, you need to submit the hidden parameter mode with the value Search to imitate the button press.
Note: I rewrote it for Python 2.x; sorry, I didn't have Python 3 handy.
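For completeness, a Python 3 sketch of the same request (untested against the live site; the field names come from the snippets above):

```python
import urllib.parse
import urllib.request

# Same form fields as the Python 2 answer above, including the hidden
# mode=Search parameter that imitates the button press.
url = 'http://www.uschess.org/msa/thin2.php'
data = {'memfn': 'hikaru', 'memln': 'nakamura', 'mode': 'Search'}

# urlencode builds the application/x-www-form-urlencoded body;
# urlopen switches to POST when it is given a bytes body.
body = urllib.parse.urlencode(data).encode('utf-8')
print(body)

# Sending it would then be:
# with urllib.request.urlopen(url, body) as s:
#     print(s.read().decode('utf-8', errors='replace'))
```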


Beautiful soup findAll returns empty list on this website?

I'm trying to extract the property value history from this website: https://www.properly.ca/buy/home/view/ma-tEpHcSzeES-OlhE-V6A/bc/vancouver/1268-w-broadway-%23720/
But my code returns an empty list instead of the property cost history.
I used the following code:
from selenium import webdriver
import time
from bs4 import BeautifulSoup

driver = webdriver.Chrome()
url = "https://www.properly.ca/buy/home/view/ma-tEpHcSzeES-OlhE-V6A/bc/vancouver/1268-w-broadway-%23720/"
driver.maximize_window()
driver.get(url)
time.sleep(5)

content = driver.page_source.encode('utf-8').strip()
soup = BeautifulSoup(content, "html.parser")
officials = soup.findAll("table", {"id": "property-history"})
for entry in officials:
    print(str(entry))
Which returns an empty list, although this URL does have a property history table. Any help would be appreciated.
Thanks!
officials = soup.findAll("table",{"id":"property-history"})
In the browser, I don't see a table with id="property-history", but there is a div with that id, so maybe you can instead get the data you want through
officials = soup.find_all("div", {"id":"property-history"})
Btw, the only table I could find while inspecting the page was inside the map, and I don't think it holds any useful information for you.
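As a minimal sketch of that idea (using stand-in markup, since the real page is rendered dynamically and its contents will differ):

```python
from bs4 import BeautifulSoup

# Stand-in HTML: the real page's div with id="property-history" will differ.
html = '<div id="property-history"><p>Listed 2020</p><p>Sold 2021</p></div>'
soup = BeautifulSoup(html, 'html.parser')

# Same lookup as suggested above, but on a div instead of a table.
officials = soup.find_all('div', {'id': 'property-history'})
for entry in officials:
    print(entry.get_text(' ', strip=True))
```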

How do I find the Conversion Action ID for use in the Google Ads API?

I'm using the latest (v7) Google Ads API to upload offline conversions for Google Ads, using the Python Client Library. This is the standard code I'm using:
import os
from google.ads.googleads.client import GoogleAdsClient

client = GoogleAdsClient.load_from_env(version='v7')

def process_adwords_conversion(
    conversion_date_time,
    gclid,
    conversion_action_id,
    conversion_value
):
    conversion_date_time = convert_datetime(conversion_date_time)
    customer_id = os.environ['GOOGLE_ADS_LOGIN_CUSTOMER_ID']
    click_conversion = client.get_type("ClickConversion")
    conversion_action_service = client.get_service("ConversionActionService")
    click_conversion.conversion_action = (
        conversion_action_service.conversion_action_path(
            customer_id, conversion_action_id
        )
    )
    click_conversion.gclid = gclid
    click_conversion.conversion_value = float(conversion_value)
    click_conversion.conversion_date_time = conversion_date_time
    click_conversion.currency_code = "USD"

    conversion_upload_service = client.get_service("ConversionUploadService")
    request = client.get_type("UploadClickConversionsRequest")
    request.customer_id = customer_id
    request.conversions = [click_conversion]
    request.partial_failure = True
    conversion_upload_response = (
        conversion_upload_service.upload_click_conversions(
            request=request,
        )
    )

    uploaded_click_conversion = conversion_upload_response.results[0]
    print(conversion_upload_response)
    print(
        f"Uploaded conversion that occurred at "
        f'"{uploaded_click_conversion.conversion_date_time}" from '
        f'Google Click ID "{uploaded_click_conversion.gclid}" '
        f'to "{uploaded_click_conversion.conversion_action}"'
    )
    return False
I believe the code is fine, but I'm having trouble locating the conversion_action_id value to use. In the Google Ads UI there's a screen listing the different Conversion Actions, with no sign of an ID anywhere. You can click on the name and get more details, but there's still no ID:
The conversion action detail screen in Google Ads UI
I've tried the following:
Using the ocid, ctId, euid, __u, uscid, __c, subid URL parameters from this detail page as the conversion_action_id. That always gives an error:
partial_failure_error {
  code: 3
  message: "This customer does not have an import conversion action that matches the conversion action provided., at conversions[0].conversion_action"
  details {
    type_url: "type.googleapis.com/google.ads.googleads.v7.errors.GoogleAdsFailure"
    value: "\n\305\001\n\003\370\006\t\022dThis customer does not have an import conversion action that matches the conversion action provided.\0320*.customers/9603123598/conversionActions/6095821\"&\022\017\n\013conversions\030\000\022\023\n\021conversion_action"
  }
}
Using the standard Google answer:
https://support.google.com/google-ads/thread/1449693/where-can-we-find-google-ads-conversion-ids?hl=en
Google suggests creating a new Conversion Action and obtaining the ID in the process. Unfortunately their instructions don't correspond to the current UI version, at least for me. The sequence I follow is:
Click the + icon on the Conversion Actions page
Select "Import" as the kind of conversion I want
Select "Other data sources or CRMs" then "Track conversions from clicks"
Click "Create and Continue"
I then get the screen:
Screen following Conversion Action creation
The recommended answer says:
The conversion action is now created and you are ready to set up the tag to add it to your website. You have three options and the recommended answer in this thread is discussing the Google Tag Manager option, which is the only option that uses the Conversion ID and Conversion Label. If you do not click on the Google Tag Manager option you will not be presented with the Conversion ID and Conversion Label.
Not so! What three options? The first "Learn more" link mentions the Google Tag Manager, but in the context of collecting the GCLID, which I already have. The "three options" mentioned in the official answer have gone. Clicking "done" simply takes me back to the Conversion Actions listing.
Using the REST API
I've tried authenticating and interrogating the endpoint:
https://googleads.googleapis.com/v7/customers/9603123598/conversionActions/
hoping that would give a list of conversion actions, but it doesn't. It just gives a 404.
Does anybody know a way of getting the Conversion Action ID, either from the UI or programmatically (via client library, REST or some other method)?
Thanks!
If you're using Python, you can list your conversions with the following snippet:
ads: GoogleAdsServiceClient = client.get_service('GoogleAdsService')
pages = ads.search(
    query="SELECT conversion_action.id, conversion_action.name FROM conversion_action",
    customer_id='YOUR_MCC_ID',
)
for page in pages:
    print(page.conversion_action)
You can also open conversion action in UI and locate &ctId=, that's the conversion action id.
I found this post with a solution for getting the Conversion Action ID:
(…) I found out that conversionActionId can also be found in the Google
Ads panel. When you go to the conversion action details, in the URL there is
a parameter "ctId=123456789" which represents the conversion action id.
By the way, I tried something similar and it's still not working, but with this Conversion Action ID I at least get a different "Partial Failure" message.
At least in Google Ads API (REST) v10, it works if the field conversionAction is set to the value:
'customers/{customerId}/conversionActions/{ctId}'
where customerId is the customer ID without hyphens, and ctId is the conversion action ID mentioned in the comments above, taken from the GET parameters in the Google Ads panel when the specific conversion is opened.
It can also be found programmatically with the API method.
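Putting the pieces above together, the resource name the API expects can be built with a one-liner. The IDs below are the ones that appear in the error message earlier in this question; substitute your own:

```python
# Builds the conversionAction resource name in the format described above.
# This mirrors what the client library's conversion_action_path() helper produces.
def conversion_action_path(customer_id: str, ct_id: int) -> str:
    # customer_id must have no hyphens; ct_id comes from the ctId URL parameter.
    return f'customers/{customer_id}/conversionActions/{ct_id}'

print(conversion_action_path('9603123598', 6095821))
# customers/9603123598/conversionActions/6095821
```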

Web Scraping - BeautifulSoup parsers do not seem to work

I am trying to extract the names of a few items from the URL below. The node and class_ point to the right content, but when I use find_all I do not get back any results. From previous posts it seems this problem might be connected to using the wrong parser. I have tried xml, lxml and others, but nothing seems to work.
Is anyone able to extract the content successfully?
import requests
from bs4 import BeautifulSoup
import pandas as pd
import html5lib
import urllib3
url_pb = 'https://www.pullandbear.com/it/uomo/accessori/zaini-c1030207088.html'
req_pb = requests.get(url_pb)
pars_pb = BeautifulSoup(req_pb.content, 'html.parser')
con_pb = pars_pb.find_all('div', class_ = 'name namorio')
UPDATE
I have managed to find the info I needed, hidden in another section of the same code available to inspection. I have extracted them using this code:
url_pb = 'https://www.pullandbear.com/it/uomo/accessori/zaini-c1030207088.html'
req_pb = requests.get(url_pb)
pars_pb = BeautifulSoup(req_pb.content, 'html.parser')

con_pb = pars_pb.find_all('li', class_=False)
names_pb = [c.select("a > p")[0].text for c in con_pb]
prices_pb = [c.select('a > p')[1].text for c in con_pb]
picts_pb = [c.find('img').get('src') for c in con_pb]

df_pb = pd.DataFrame({'(Pull&Bear) Item_Name': names_pb,
                      'Item_Price_EUR': prices_pb,
                      'Link_to_Pict': picts_pb})
It seems that the website uses JavaScript to render its content, meaning you can't simply request the page and scrape it (requests doesn't execute JavaScript). That said, all the data displayed on the site is delivered as a JSON string, so to get all the item names you could use the following code:
import requests
url = "https://www.pullandbear.com:443/itxrest/2/catalog/store/24009405/20309428/category/1030207088/product?languageId=-4&appId=1"
all_products = requests.get(url).json()["products"]
product_names = [item["bundleProductSummaries"][0]["name"] for item in all_products]
print(product_names)
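To make the assumed JSON shape explicit, here is the same extraction run against a stand-in payload (the real one comes from the itxrest endpoint above, and the names below are made up):

```python
# Stand-in for requests.get(url).json(); only the keys the snippet above
# relies on are included.
payload = {
    "products": [
        {"bundleProductSummaries": [{"name": "ZAINO A"}]},
        {"bundleProductSummaries": [{"name": "ZAINO B"}]},
    ]
}

# Each product's display name sits in the first bundleProductSummaries entry.
all_products = payload["products"]
product_names = [item["bundleProductSummaries"][0]["name"] for item in all_products]
print(product_names)  # ['ZAINO A', 'ZAINO B']
```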
hope this helps

Changing flask.app.request_class according to the incoming request

From the client I send data (the name field encoded as 'euc-kr'): POST http://127.0.0.1 with body name=테스트&charset=euc-kr
On the server (based on Flask) I receive the data, but it arrives as Unicode and displays broken.
@app.route('/', methods=['POST'])
def post():
    print request.charset                 # utf-8
    print request.url_charset             # utf-8
    print type(request.form['name'])      # unicode
So I use a subclass of the Flask Request class to support the charset:
# main.py
class EuckrRequest(Request):
    url_charset = 'euc-kr'
    charset = 'euc-kr'

app = Flask(__name__, static_url_path='', static_folder='static')
app.request_class = EuckrRequest
This works and the display is no longer broken. But I want to change app.request_class according to the charset in the POST data.
How should I modify the code? app.request_context? app.before_request?
Dynamically changing the request class is the wrong way to go about this, since the request class is instantiated very early to begin with.
I suggest you reach into request.environ and handle the details explicitly from the data that came in.
I solved it using request.get_data():
import urllib
urllib.unquote(request.get_data()).decode('euc-kr')
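A Python 3 sketch of the same round trip (percent-decode, then decode as euc-kr), using a stand-in value rather than a live request:

```python
from urllib.parse import quote_from_bytes, unquote_to_bytes

# A client that form-encodes a euc-kr value sends percent-encoded euc-kr bytes...
original = '테스트'
raw = quote_from_bytes(original.encode('euc-kr'))  # stand-in for request.get_data()

# ...so the server reverses it: percent-decode to bytes, then decode as euc-kr,
# mirroring urllib.unquote(...).decode('euc-kr') in the Python 2 answer above.
decoded = unquote_to_bytes(raw).decode('euc-kr')
print(decoded == original)  # True
```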

TypeError when attempting to parse pubmed EFetch

I'm new to this Python/Biopython stuff, so I'm struggling to work out why the following code, pretty much lifted straight out of the Biopython Cookbook, isn't doing what I'd expect.
I'd have thought it would end up with the interpreter displaying two lists containing the same numbers, but all I get is one list and then a message saying TypeError: 'generator' object is not subscriptable.
I'm guessing something is going wrong with the Medline.parse step and the result of the efetch isn't being processed in a way that allows subsequent iteration to extract the PMID values. Or the efetch isn't returning anything.
Any pointers as to what I'm doing wrong?
Thanks
from Bio import Medline
from Bio import Entrez

Entrez.email = 'A.N.Other@example.com'
handle = Entrez.esearch(db="pubmed", term="biopython")
record = Entrez.read(handle)
print(record['IdList'])
items = record['IdList']

handle2 = Entrez.efetch(db="pubmed", id=items, rettype="medline", retmode="text")
records = Medline.parse(handle2)
for r in records:
    print(records['PMID'])
You're trying to subscript records, which is a generator. I think you meant to do print(r['PMID']), which will print the 'PMID' entry of the current record dictionary on each iteration. This is confirmed by the example given in the Bio.Medline.parse() documentation.
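The difference can be seen without touching the network, using a plain generator as a stand-in for what Medline.parse() returns:

```python
# Stand-in for Medline.parse(handle2): a generator yielding record dicts.
records = ({'PMID': pmid} for pmid in ('11111111', '22222222'))

# records['PMID'] would raise TypeError: 'generator' object is not subscriptable.
# Subscript each record inside the loop instead:
pmids = [r['PMID'] for r in records]
print(pmids)  # ['11111111', '22222222']
```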
