Scrapy Splash - I am not able to get the value - docker

I am trying to scrape this page: https://simple.ripley.com.pe/laptop-lenovo-ideapad-5-amd-ryzen-7-16gb-ram-256gb-ssd-14-2004286061746p?s=o
Everything works, but I am not able to get the values at this XPath:
//*[@id="panel-Especificaciones"]/div/div/table/tbody/tr[19]/td[2]
I think the content loads dynamically. It's a table with many rows inside, and I would like to get those values.
[Image: the page section I can't scrape]
This is my spider code:
import scrapy
from scrapy_splash import SplashRequest
from numpy import nan

LUA_SCRIPT = """
function main(splash)
    splash.private_mode_enabled = false
    splash:go(splash.args.url)
    splash:wait(2)
    html = splash:html()
    splash.private_mode_enabled = true
    return html
end
"""
class RipleySpider(scrapy.Spider):
    name = "ripley"

    def start_requests(self):
        url = 'https://simple.ripley.com.pe/tecnologia/computacion/laptops?facet%5B%5D=Procesador%3AIntel+Core+i7'
        yield SplashRequest(url=url, callback=self.parse)

    def parse(self, response):
        for link in response.xpath("//div[@class='catalog-container']/div/a/@href"):
            yield response.follow(link.get(), callback=self.parse_products)
        # for href in response.xpath("//ul[@class='pagination']/li[last()]/a/@href").getall():
        #     yield SplashRequest(response.urljoin(href), callback=self.parse)

    def parse_products(self, response):
        titulo = response.css("h1::text").get()
        link = response.request.url
        sku = response.css(".sku-value::text").get()
        precio = response.css(".product-price::text").getall()
        if len(precio) == 1:
            precio_normal = nan
            precio_internet = precio[0]
            precio_tarjeta_ripley = nan
        elif len(precio) == 2:
            precio_normal = precio[0]
            precio_internet = precio[1]
            precio_tarjeta_ripley = nan
        elif len(precio) == 4:
            precio_normal = precio[0]
            precio_internet = precio[1]
            precio_tarjeta_ripley = precio[-1]
        try:
            # descripcion = response.css(".product-short-description::text").get()
            descripcion = response.xpath('//*[@id="panel-Especificaciones"]/div/div/table/tbody/tr[1]/td[2]/text()').get()
        except:
            descripcion = 'sin valor'
        yield {
            'Título': titulo,
            'Link': link,
            'SKU': sku,
            'Precio Normal': precio_normal,
            'Precio Internet': precio_internet,
            'Precio Tarjeta Ripley': precio_tarjeta_ripley,
            'Descripción': descripcion,
        }
Please, what solutions does Scrapy offer? Thanks in advance for your help.
P.S.: I'm using Docker with Splash on localhost:8050; settings.py is configured according to the scrapy-splash documentation.
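One direction worth trying (a sketch, not verified against this site): the product pages are followed with response.follow, which issues a plain Scrapy request, so they are never rendered by Splash at all; and a fixed splash:wait(2) may simply be too short for the table to appear. The sketch below routes product pages through the Lua script and polls for the specifications panel before returning the HTML (splash:select requires Splash 2.3+, and the CSS selector is an assumption derived from the XPath in the question):

LUA_SCRIPT = """
function main(splash)
    splash.private_mode_enabled = false
    assert(splash:go(splash.args.url))
    -- poll until the specifications panel is in the DOM (up to ~10 s)
    for _ = 1, 20 do
        if splash:select('#panel-Especificaciones table') then
            break
        end
        splash:wait(0.5)
    end
    local html = splash:html()
    splash.private_mode_enabled = true
    return html
end
"""

    def parse(self, response):
        for link in response.xpath("//div[@class='catalog-container']/div/a/@href"):
            # render product pages with Splash too, otherwise the
            # dynamically loaded table never appears in the response
            yield SplashRequest(response.urljoin(link.get()),
                                callback=self.parse_products,
                                endpoint='execute',
                                args={'lua_source': LUA_SCRIPT})

If the table still comes back empty, dropping /tbody from the XPath is also worth a try: browsers insert tbody when rendering, so it is sometimes absent from the markup the selector actually sees.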

Related

eval() arg 1 must be a string, bytes or code object Traceback (most recent call last)

So I tried to deploy a machine learning model to Streamlit using Flask. But as you can see from the title, the error it gave me is 'eval() arg 1 must be a string, bytes or code object'.
------This is The Code For The Back End------
from flask import Flask, request, jsonify
import pickle

app = Flask(__name__)

with open('forest_opt.pkl', 'rb') as model_file:
    model = pickle.load(model_file)

@app.route('/')
def model_prediction():
    age = eval(request.args.get('age'))
    internship = eval(request.args.get('internship'))
    cgpa = eval(request.args.get('cgpa'))
    hostel = eval(request.args.get('hostel'))
    history = eval(request.args.get('history'))
    new_data = [age, internship, cgpa, hostel, history]
    res = model.predict([new_data])
    classes = ['No', 'Yes']
    response = {'status': 'success',
                'code': 200,
                'data': {'result': classes[res[0]]}}
    return jsonify(response)

@app.route('/predict', methods=['POST'])
def predict_post():
    content = request.json
    data = [content['age'],
            content['internship'],
            content['cgpa'],
            content['hostel'],
            content['history']]
    res = model.predict([data])
    response = {'status': 'success',
                'code': 200,
                'data': {'result': str(res[0])}}
    return jsonify(response)

app.run(debug=True)
------This is The Code For The Front End------
import streamlit as st
import requests

URL = 'http://127.0.0.1:5000/'

st.title('App for Detecting Chance of Getting a Job')
age = st.number_input('age')
internship = st.number_input('Internship (0,1,2,3)')
cgpa = st.number_input('cgpa')
hostel = st.number_input('hostel')
history = st.number_input('history')

data = {'age': age,
        'internship': internship,
        'cgpa': cgpa,
        'hostel': hostel,
        'history': history}

r = requests.post(URL, json=data)
res = r.json()
st.write(f"Predict The Result: {res['data']['result']}")
The error keeps pointing at the line age = eval(request.args.get('age')). I don't know why eval() receives nothing. Please help, I'm kind of new to this. Thank you!
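A minimal sketch of what is likely happening (an assumption based on the code above, not a confirmed diagnosis): the Streamlit frontend sends the values as a JSON body, but model_prediction reads them from the query string. request.args.get('age') returns None whenever 'age' is absent from the query string (for example, when '/' is opened without any parameters), and eval(None) raises exactly the error from the title:

# request.args.get() returns None for a missing query parameter,
# and eval() rejects None:
>>> eval(None)
Traceback (most recent call last):
  ...
TypeError: eval() arg 1 must be a string, bytes or code object

Pointing the frontend at the JSON endpoint avoids eval() entirely: URL = 'http://127.0.0.1:5000/predict' (the /predict route above already reads request.json).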

How to split LaserScan data from a lidar into sections and view them in RViz

I was trying to split the laser scan range data into subcategories and would like to publish each category on a different laser topic.
To be more specific: the script should take one topic as input (/scan) and publish three topics (scan1, scan2, scan3).
Is there a way to split the laser scan, publish the parts back, and view them in RViz?
I tried the following:
def callback(laser):
    current_time = rospy.Time.now()
    regions["l_f_fork"] = laser.ranges[0:288]
    regions["l_f_s"] = laser.ranges[289:576]
    regions["stand"] = laser.ranges[576:864]
    l.header.stamp = current_time
    l.header.frame_id = 'laser'
    l.angle_min = 0
    l.angle_max = 1.57
    l.angle_increment = 0
    l.time_increment = 0
    l.range_min = 0.0
    l.range_max = 100.0
    l.ranges = regions["l_f_fork"]
    l.intensities = [0]
    left_fork.publish(l)
    # l.ranges = regions["l_f_s"]
    # left_side.publish(l)
    # l.ranges = regions["stand"]
    # left_side.publish(l)
    rospy.loginfo("publishing new info")
I can see the different topics in RViz, but they all lie on the same line, presumably because every part is published with the same angle_min and an angle_increment of 0, so all points are drawn along one ray.
Tutorial
The following code splits the LaserScan data into three equal sections:
#! /usr/bin/env python3
"""
Program to split LaserScan into three parts.
"""
import rospy
from sensor_msgs.msg import LaserScan


class LaserScanSplit():
    """
    Class for splitting LaserScan into three parts.
    """
    def __init__(self):
        self.update_rate = 50
        self.freq = 1./self.update_rate

        # Initialize variables
        self.scan_data = []

        # Subscribers
        rospy.Subscriber("/scan", LaserScan, self.lidar_callback)

        # Publishers
        self.pub1 = rospy.Publisher('/scan1', LaserScan, queue_size=10)
        self.pub2 = rospy.Publisher('/scan2', LaserScan, queue_size=10)
        self.pub3 = rospy.Publisher('/scan3', LaserScan, queue_size=10)

        # Timers
        rospy.Timer(rospy.Duration(self.freq), self.laserscan_split_update)

    def lidar_callback(self, msg):
        """
        Callback function for the Scan topic
        """
        self.scan_data = msg

    def laserscan_split_update(self, event):
        """
        Function to update the split scan topics
        """
        # Nothing to publish until the first scan has arrived
        if not self.scan_data:
            return

        scan1 = LaserScan()
        scan2 = LaserScan()
        scan3 = LaserScan()

        scan1.header = self.scan_data.header
        scan2.header = self.scan_data.header
        scan3.header = self.scan_data.header

        scan1.angle_min = self.scan_data.angle_min
        scan2.angle_min = self.scan_data.angle_min
        scan3.angle_min = self.scan_data.angle_min

        scan1.angle_max = self.scan_data.angle_max
        scan2.angle_max = self.scan_data.angle_max
        scan3.angle_max = self.scan_data.angle_max

        scan1.angle_increment = self.scan_data.angle_increment
        scan2.angle_increment = self.scan_data.angle_increment
        scan3.angle_increment = self.scan_data.angle_increment

        scan1.time_increment = self.scan_data.time_increment
        scan2.time_increment = self.scan_data.time_increment
        scan3.time_increment = self.scan_data.time_increment

        scan1.scan_time = self.scan_data.scan_time
        scan2.scan_time = self.scan_data.scan_time
        scan3.scan_time = self.scan_data.scan_time

        scan1.range_min = self.scan_data.range_min
        scan2.range_min = self.scan_data.range_min
        scan3.range_min = self.scan_data.range_min

        scan1.range_max = self.scan_data.range_max
        scan2.range_max = self.scan_data.range_max
        scan3.range_max = self.scan_data.range_max

        # LiDAR Range
        n = len(self.scan_data.ranges)
        scan1.ranges = [float('inf')] * n
        scan2.ranges = [float('inf')] * n
        scan3.ranges = [float('inf')] * n

        # Splitting Block [three equal parts]
        # Unused sectors stay at inf, so every split keeps the same angular
        # metadata as the input scan and lines up correctly in RViz.
        scan1.ranges[0 : n//3] = self.scan_data.ranges[0 : n//3]
        scan2.ranges[n//3 : 2*n//3] = self.scan_data.ranges[n//3 : 2*n//3]
        scan3.ranges[2*n//3 : n] = self.scan_data.ranges[2*n//3 : n]

        # Publish the LaserScan
        self.pub1.publish(scan1)
        self.pub2.publish(scan2)
        self.pub3.publish(scan3)

    def kill_node(self):
        """
        Function to kill the ROS node
        """
        rospy.signal_shutdown("Done")


if __name__ == '__main__':
    rospy.init_node('laserscan_split_node')
    LaserScanSplit()
    rospy.spin()
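Padding the unused sectors with inf is what keeps angle_min, angle_max, and angle_increment identical to the input scan, so the three parts render at the correct angles. If a different number of sections is needed, the same idea generalizes; a small sketch (a hypothetical helper, not part of the original answer):

def split_ranges(ranges, k):
    """Split an n-point scan into k sectors, padding the rest with inf
    so the angular metadata can stay identical for every published scan."""
    n = len(ranges)
    sectors = []
    for i in range(k):
        lo, hi = i * n // k, (i + 1) * n // k
        out = [float('inf')] * n
        out[lo:hi] = ranges[lo:hi]
        sectors.append(out)
    return sectors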
[Screenshots of the robot and obstacles in the environment in Gazebo and RViz]
References:
ROS1 Python Boilerplate
atreus

Access variable and render it in different URL

I am trying to render a variable ('predictions'), produced in the /predict route, at the /hello route. I am a beginner in web development. If someone knows how, please help.
app = Flask(__name__)

@app.route('/predict', methods=['POST'])
def apicall(responses=None):
    test_json = request.get_json()
    test = pd.read_json(test_json, orient='records')
    query_df = pd.DataFrame(test)
    clf = 'kmeans__model.pkl'
    print("Loading the model...")
    lin_reg_model = None
    with open(clf, 'rb') as f:
        lin_reg_model = joblib.load(f)
    # lin_reg_model = joblib.load('/home/q/new_project/models/kmeans_model.pkl')
    print("The model has been loaded...doing predictions now...")
    predictions = lin_reg_model.predict(test)
    print(predictions)
    prediction_series = list(pd.Series(predictions))
    final_predictions = pd.DataFrame(list(zip(prediction_series)))
    responses = jsonify(predictions=final_predictions.to_json(orient="records"))
    responses.status_code = 200
    return responses

@app.route('/hello')
def hello():
    # What should be used here to render *predictions*?
    return 'Hello, World '
You can return a template and show the value in HTML if you like. Create a templates folder and create an HTML file inside it.

import requests
from flask import render_template, url_for

def hello():
    # _external=True makes url_for return an absolute URL that requests can fetch
    response = requests.get(url_for('apicall', _external=True))
    return render_template('<yourtemplate.html>', predictions=response.json())

(Since apicall only accepts POST with a JSON body, in practice the requests.get call needs to become a requests.post carrying the input data.)
HTML
{{ predictions }}
Note: Please follow this link for any problem with thread and deployment Handling multiple requests in Flask
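Another common pattern (a sketch with hypothetical names, not from the answer above): compute the predictions once, keep them where both routes can reach them, and render them from /hello without a second HTTP round trip:

from flask import Flask, jsonify, render_template_string

app = Flask(__name__)
latest_predictions = None  # module-level cache; fine for a single-process dev server

def run_model():
    # hypothetical stand-in for the model-loading/prediction code in apicall()
    return [0, 1, 2]

@app.route('/predict', methods=['POST'])
def apicall():
    global latest_predictions
    latest_predictions = run_model()
    return jsonify(predictions=latest_predictions)

@app.route('/hello')
def hello():
    # render whatever /predict produced last
    return render_template_string('{{ predictions }}', predictions=latest_predictions)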

Why aren't these 2 strings equal

Going blind here... I can't understand why these two strings are not equal. When I puts them to the terminal they are both class String, and when I just compare the output they ARE equal. But somehow in my code they are not. I can't figure out why.
Here is my Ruby code:
def prep_for_duplicate_webhook
  @redis_cart = Redis.new
  cart_stamp_saved = @redis_cart.get("cart_stamp_saved")
  if cart_stamp_saved.nil?
    cart_stamp_saved = {}
    cart_stamp_saved[:token] = cart_params['token']
    cart_stamp_saved[:updated_at] = cart_params['updated_at']
    @redis_cart.set("cart_stamp_saved", cart_stamp_saved.to_json)
  end
  @cart_stamp_incoming = {}
  @cart_stamp_incoming["token"] = cart_params['token']
  @cart_stamp_incoming["updated_at"] = cart_params['updated_at']
end

def duplicate_webhook?
  prep_for_duplicate_webhook
  @cart_stamp_saved = @redis_cart.get("cart_stamp_saved")
  @cart_stamp_saved == @cart_stamp_incoming.to_json
end
And the hashes I'm comparing are these two:
cart_stamp_saved = {"token"=>"4a093432ba5c430dd545b16c0e89f187",
"updated_at"=>"2017-02-17T15:27:22.923Z"}
cart_stamp_incoming= {"token"=>"4a093432ba5c430dd545b16c0e89f187",
"updated_at"=>"2017-02-17T15:27:22.923Z"}
If I just copy and paste the above into a new page and then do this, the response is true:
pp cart_stamp_saved == cart_stamp_incoming.to_json
What am I missing?
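One thing worth checking (illustrated in Python to match the rest of this page; the same logic applies in Ruby with JSON.parse): two JSON strings can encode equal data yet differ byte for byte, so comparing parsed structures is more robust than comparing serialized strings.

import json

saved = '{"updated_at": "2017-02-17T15:27:22.923Z", "token": "4a093432ba5c430dd545b16c0e89f187"}'
incoming = {"token": "4a093432ba5c430dd545b16c0e89f187",
            "updated_at": "2017-02-17T15:27:22.923Z"}

print(saved == json.dumps(incoming))   # False: key order and spacing differ
print(json.loads(saved) == incoming)   # True: parsed structures compare equal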

When scraping a website using Scrapy, how do I make sure all characters scrape properly?

I am using Scrapy to scrape a website but some of the characters, such as apostrophes, do not scrape correctly nor are they consistently the same wrong character, i.e., I've had an apostrophe show up as multiple odd characters in my result set. How do I ensure that all characters scrape properly?
Edit
I am trying to scrape http://www.nowtoronto.com/music/listings/ with the following scraper:
import urlparse
import time

from scrapy.http import Request
from scrapy.spider import BaseSpider
from scrapy.selector import Selector
# from NT.items import NowTorontoItem
from scrapy.item import Item, Field


class NowTorontoItem(Item):
    eventArtist = Field()
    eventTitle = Field()
    eventHolder = Field()
    eventDetails = Field()
    # venueName = Field()
    eventAddress = Field()
    eventLocality = Field()
    eventPostalCode = Field()
    eventPhone = Field()
    eventURL = Field()
    eventPrice = Field()
    eventDate = Field()
    internalURL = Field()


class MySpider(BaseSpider):
    name = "NTSpider"
    allowed_domains = ["nowtoronto.com"]
    start_urls = ["http://www.nowtoronto.com/music/listings/"]

    def parse(self, response):
        selector = Selector(response)
        listings = selector.css("div.listing-item0, div.listing-item1")
        for listing in listings:
            item = NowTorontoItem()
            for body in listing.css('span.listing-body > div.List-Body'):
                item["eventArtist"] = body.css("span.List-Name::text").extract()
                item["eventTitle"] = body.css("span.List-Body-Emphasis::text").extract()
                item["eventHolder"] = body.css("span.List-Body-Strong::text").extract()
                item["eventDetails"] = body.css("::text").extract()
                # item["internalURL"] = body.css("a::attr(href)").extract()
            time.sleep(1)
            for body in listing.css('div.listing-readmore'):
                item["internalURL"] = body.css("a::attr(href)").extract()
            # yield a Request()
            # so that scrapy enqueues a new page to fetch
            detail_url = listing.css("div.listing-readmore > a::attr(href)")
            if detail_url:
                yield Request(urlparse.urljoin(response.url,
                                               detail_url.extract()[0]),
                              meta={'item': item},
                              callback=self.parse_details)
            else:
                yield item

    def parse_details(self, response):
        self.log("parse_details: %r" % response.url)
        selector = Selector(response)
        listings = selector.css("div.whenwhereContent")
        for listing in listings:
            for body in listing.css('tr:nth-child(1) td.small-txt.dkgrey-txt.rightInfoTD'):
                item = response.meta['item']
                # item["eventLocation"] = body.css("span[property='v:location']::text").extract()
                # item["eventOrganization"] = body.css("span[property='v:organization'] span[property='v:name']::text").extract()
                # item["venueName"] = body.css("span[property='v:name']::text").extract()
                item["eventAddress"] = body.css("span[property='v:street-address']::text").extract()
                item["eventLocality"] = body.css("span[property='v:locality']::text").extract()
                item["eventPostalCode"] = body.css("span[property='v:postal-code']::text").extract()
                item["eventPhone"] = body.css("span[property='v:tel']::text").extract()
                item["eventURL"] = body.css("span[property='v:url']::text").extract()
                item["eventPrice"] = listing.css('tr:nth-child(2) td.small-txt.dkgrey-txt.rightInfoTD::text').extract()
                item["eventDate"] = listing.css('span[content*="201"]::attr(content)').extract()
                yield item
I am getting mojibake in some of the results: accented characters such as é and ç come through as garbled multi-character sequences.
Edit 2
I am not sure the issue is simply related to the file viewer I am using. When I open my first scrape in a text editor, an apostrophe is rendered as ’, whereas in my second scrape the same apostrophe (from the same text string) is rendered as —È.
It turns out that the data is actually fine, but the encoding was broken when I opened and saved the file in Excel. I have switched to LibreOffice, which specifically asks for the document's encoding when opening it, and everything works fine.
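For anyone hitting the same thing with Excel: newer Scrapy versions (1.2+) can also write the feed with a UTF-8 byte-order mark, which is what lets Excel detect the encoding on open. A one-line sketch for settings.py:

# settings.py
# CSV feed exports default to UTF-8; 'utf-8-sig' adds a BOM so that
# Excel detects the encoding when opening the exported file.
FEED_EXPORT_ENCODING = 'utf-8-sig'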
