I am scraping a page and want to extract its links, but the extraction returns an empty list, both in the terminal and in the editor.
In the browser console of the page, however, the same query returns the links.
This is the code and the link:
import scrapy
class AsGeekStore(scrapy.Spider):
    name = 'as_stores'
    start_urls = [
        'https://c21.com.bo/busqueda/en-estado_santa-cruz'
    ]
    custom_settings = {
        'FEED_URI': 'stores.json',
        'FEED_FORMAT': 'json',
        'FEED_EXPORT_ENCODING': 'utf-8',
        'ROBOTSTXT_OBEY': True
    }
    def parse(self, response):
        store_links = response.xpath('//div[contains(@class, "col-sm-8 pt-2 px-2 pb-0")]/a/@href').getall()
        for link in store_links:
            yield response.follow(link, callback=self.parse_link, cb_kwargs={'url': response.urljoin(link)})
    def parse_link(self, response, **kwargs):
        link = kwargs['url']
        yield {
            'url': link
        }
I only want those 10 links so that I can access more details.
I tried to get all the links on the page, but I could not get those links.
In my angular-rails app I'm trying to implement a very basic live database search, such that on keyup, a results array populates or depopulates as the input value changes. Currently there are around 400 products in the database table, but there could be many more in the future. Here's my code:
On Rails side, inside products controller:
def index
  @products = Product.search(params[:search])
  render json: @products
end
And my product.rb file:
def self.search(search)
  where("name ILIKE ?", "%#{search}%")
end
While on the angular side, inside the relevant controller:
function MainController(dataService) {
  var ctrl = this;
  ctrl.searchResultsArray = [];
  ctrl.populateArray = function(search) {
    dataService.getProductSummaries(search)
      .then(function(response) {
        ctrl.searchResultsArray = response.data;
      });
  };
}
Inside my dataService:
function dataService($http) {
  var ctrl = this;
  ctrl.getProductSummaries = function(search) {
    return $http({
      method: 'GET',
      url: '/products',
      params: { search: search }
    });
  };
}
And inside my angular view (controlled by MainController as ctrl):
<input ng-keyup="ctrl.populateArray(ctrl.result)" ng-model="ctrl.result" />
<div ng-repeat="result in ctrl.searchResultsArray">
  <li>
    {{ result.name }} costs {{ result.price }}
  </li>
</div>
The above code mostly works, but typing too quickly can shortcircuit it, and I occasionally see errors in my dev console, so it definitely is broken. What is it missing or doing wrong?
Since the problems appear when you type too quickly, I'm not sure you are seeing real errors in the console. Most likely you are seeing requests with a canceled status, because you started a new query to the same URL as before.
However, here are a couple of tricks I can suggest to make your queries faster:
Start requests only once the user has typed 3 or more characters.
Consider .limit(n) in your Rails controller. Depending on business requirements, you may or may not be able to limit the number of matches the controller returns.
As far as I can see, you only need the name and price fields of your Product model, so you can shrink the response size. You can use any JSON builder gem, such as jbuilder, or simply try something like json: @products.select(:name, :price), which fetches only the selected columns from the database.
To speed up DB queries, you can add an index to the name column.
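The first three tips can be sketched in plain Ruby. This is only a stand-in, not the Rails code: search_products, MIN_SEARCH_LENGTH and MAX_RESULTS are hypothetical names, and an in-memory array replaces the products table so the sketch runs without Rails. In the real model the same steps would presumably become a length guard, .select(:name, :price) and .limit(n).

```ruby
# Hypothetical constants; tune them to your own requirements.
MIN_SEARCH_LENGTH = 3
MAX_RESULTS = 20

# In-memory stand-in for Product.search: skips short terms, matches
# case-insensitively (like ILIKE), caps the number of results, and
# returns only the name and price of each match.
def search_products(products, term, limit: MAX_RESULTS)
  term = term.to_s.strip
  return [] if term.length < MIN_SEARCH_LENGTH

  needle = term.downcase
  products
    .select { |product| product[:name].downcase.include?(needle) }
    .first(limit)
    .map { |product| { name: product[:name], price: product[:price] } }
end

# Made-up sample rows, for illustration only.
sample_products = [
  { id: 1, name: 'Blue Widget', price: 10, description: 'long text' },
  { id: 2, name: 'Red Widget', price: 12, description: 'long text' },
  { id: 3, name: 'Gadget', price: 30, description: 'long text' }
]

p search_products(sample_products, 'wi')               # term too short, nothing is searched
p search_products(sample_products, 'widget', limit: 1) # capped at one match
```

Skipping short terms on the server as well as in the browser keeps the endpoint cheap even if a client ignores the 3-character rule.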
I wrote an application in Rails 4. In that app, I have two paginations on a single page, 'x'. The URL carries params such as group and page.
The URL looks like:
https://example.com/x?page=2&group=4
Initial page:
https://example.com/x
When paginating by the page param:
https://example.com/x?page=2
When paginating by the group param:
https://example.com/x?group=2
When paginating by both:
https://example.com/x?page=2&group=2
and so on.
I am using the Kaminari gem for pagination. From that gem I use the rel_next_prev_link_tags helper to show the link tags for prev/next.
How can I show link tags for multiple paginations?
I created a custom helper that processes the URL and builds the link tags for each pagination based on its param. For example, in the view:
pagination_link_tags(@pages,'page') for pages pagination
pagination_link_tags(@groups,'group') for groups pagination
def pagination_link_tags(collection,pagination_params)
  output = []
  link = '<link rel="%s" href="%s"/>'
  url = request.fullpath
  uri = Addressable::URI.parse(url)
  parameters = uri.query_values
  # Update the params based on params name and create a link for SEO
  if parameters.nil?
    if collection.next_page
      parameters = {}
      parameters["#{pagination_params}"] = "#{collection.next_page}"
      uri.query_values = parameters
      output << link % ["next", uri.to_s]
    end
  else
    if collection.previous_page
      parameters["#{pagination_params}"] = "#{collection.previous_page}"
      uri.query_values = parameters
      output << link % ["prev", uri.to_s]
    end
    if collection.next_page
      parameters["#{pagination_params}"] = "#{collection.next_page}"
      uri.query_values = parameters
      output << link % ["next", uri.to_s]
    end
  end
  output.join("\n").html_safe
end
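The same idea can be sketched with only Ruby's standard library, which makes it easy to try outside Rails. This is a sketch under assumptions, not a drop-in replacement: the function takes the path and the neighbouring page numbers as plain arguments (the prev_page and next_page keywords are hypothetical), where the helper above reads them from request and the collection.

```ruby
require 'uri'

# Builds <link rel="prev"/"next"> tags for one pagination parameter while
# leaving every other query parameter untouched.
# prev_page and next_page are page numbers, or nil when there is no such page.
def pagination_link_tags(fullpath, param, prev_page:, next_page:)
  uri = URI.parse(fullpath)
  base_params = URI.decode_www_form(uri.query.to_s).to_h

  { 'prev' => prev_page, 'next' => next_page }.filter_map do |rel, page|
    next if page.nil?

    target = uri.dup
    target.query = URI.encode_www_form(base_params.merge(param => page))
    %(<link rel="#{rel}" href="#{target}"/>)
  end.join("\n")
end

# Page 2 of the "page" pagination while group=4 is kept as-is.
puts pagination_link_tags('/x?page=2&group=4', 'page', prev_page: 1, next_page: 3)
# First visit without any params, paginating by "group".
puts pagination_link_tags('/x', 'group', prev_page: nil, next_page: 2)
```

Merging into a copy of the existing params is what keeps the other pagination's value in every generated URL.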
You can't show search engines two-dimensional pagination. In your case it looks more like grouping/categorizing + pagination.
Like:
Group 1 pages:
https://example.com/x
https://example.com/x?page=2
https://example.com/x?page=3
Group 2 pages:
https://example.com/x?group=2
https://example.com/x?page=2&group=2
https://example.com/x?page=3&group=2
Etc.
I'm using the Mechanize gem to automate interaction with a website form.
The site I'm trying to interact with is http://www.tastekid.com/like/books
I'm trying to automatically submit a query string through the form and return the suggested books in an array.
Following the guide, I've pretty-printed the page layout to find the form name, but I only find a form with no name (nil):
require 'rubygems'
require 'mechanize'
agent = Mechanize.new
page = agent.get('http://www.tastekid.com/like/books')
pp page
How do I enter a string, submit the form and return the results in the form of an array?
These answers feel a little cluttered to me, so let me try to make it simpler:
page = agent.get 'http://www.tastekid.com/like/books'
there's only one form, so:
form = page.form
form['q'] = 'twilight'
submit the form
page = form.submit
print the text from the a's
puts page.search('.books a').map &:text
Following the guide, you can get the form:
form = page.form
I didn't see a name on the form, and I actually got two forms back: one on the page and one hidden.
I called
form.fields.first.methods.sort #not the hidden form
and saw that I could call value on the field, so I set it as such:
form.fields.first.value = "Blood Meridian"
then I submitted and pretty printed:
page = agent.submit(form)
This should work for you!
You could use the form_with method to locate the form you want. For example:
require 'mechanize'
agent = Mechanize.new
page = agent.get('http://www.tastekid.com/like/books')
the_form_you_want = page.form_with(:id => "searchFrm") # form_with
the_form_you_want.q = 'No Country for Old Men'
page = agent.submit(the_form_you_want)
pp page
It looks like the book titles all have the same class attribute. To extract the book titles, use the links_with method and pass in the class as a locator:
arr = []
page.links_with(:class => "rsrc").each do |link|
  arr << link.text
end
But @aceofbassgreg is right. You'll need to read up on the Mechanize and Nokogiri documentation...
So this is probably a simple question, but I've never done it before.
I have a Rails action that queries a database and creates a csv string from the query result.
I'd like to take the query string, put it into a .csv file, and when the user makes the http request associated with this method, the .csv file will download onto the user's machine.
How can I do this?
UPDATE
Rails is sending the file, but my Angular app on the front end (which requested the CSV) is not downloading it.
Here is the Angular code I'm using to request the file from the Rails app:
$scope.csvSubmit = function() {
  var csv = $.post('http://ip_addr:3000/api/csv', { 'input': $scope.query_box });
  csv.done(function(result){
    //empty - after the request is sent I want the csv file to download
  })
}
You can use the send_file method, passing the path to the file as the first argument, as described in the Rails documentation.
UPDATE
You can use a temporary file to save the CSV, like this:
require 'tempfile'
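# automatically creates a file in the system temp directory (e.g. /tmp);
# the array form keeps the .csv extension on the generated name
file = Tempfile.new(['data', '.csv'])
file.write('my csv')
file.close
send_file(file.path)
# remove the file from /tmp
# (send_file may read the file only after the action returns, so deleting it
# here can be too early; if the download fails, drop this line)
file.unlink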
UPDATE 2: AngularJS download
There are two ways to accomplish this: you can add a hidden link to the page that downloads the file and click it programmatically, or you can redirect the user to the Rails URL that sends the file when they click the button. Note that the redirect passes the parameters in the URL, so it may not work well depending on the structure of query_box.
To add a hidden link to the page with the CSV:
$scope.csvSubmit = function() {
  var csv = $.post('http://ip_addr:3000/api/csv', { 'input': $scope.query_box });
  csv.done(function(result){
    var hiddenElement = document.createElement('a');
    // encodeURIComponent also escapes characters such as # that would cut the data URI short
    hiddenElement.href = 'data:text/csv;charset=utf-8,' + encodeURIComponent(result);
    hiddenElement.target = '_blank';
    hiddenElement.download = 'filename.csv';
    hiddenElement.click();
  })
}
To use the redirect:
$scope.csvSubmit = function() {
  // encodeURIComponent escapes & and = inside the value, which encodeURI leaves as-is
  var url = 'http://ip_addr:3000/api/csv/?' + 'input=' + encodeURIComponent($scope.query_box);
  window.location = url;
}
I've had to do this plenty of times before. You need to set the response headers to get the browser to force the download.
I like to use the comma gem for rendering csv. Using the gem all you need to do is add the following lines to your controller action.
respond_to do |format|
  format.csv do
    response.headers['Content-Type'] = 'text/csv'
    response.headers['Content-Disposition'] = 'attachment; filename=books.csv'
    render :csv => Book.limited(50)
  end
end
Then you just use the csv format and it works.
If you don't want to use comma, just change the render line to render your CSV string:
render :plain => csv_string_variable
Use send_data to generate a downloadable file from a string in just one line:
send_data @your_data, type: 'text/csv', disposition: 'attachment', filename: 'books.csv'
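For completeness, here is a minimal sketch of how the CSV string handed to send_data could be built with Ruby's standard csv library. rows_to_csv and the sample rows are hypothetical; in the real action the rows would come from the database query.

```ruby
require 'csv'

# Builds a CSV string from an array of row hashes.
# CSV.generate takes care of quoting values that contain commas or quotes.
def rows_to_csv(rows, columns)
  CSV.generate do |csv|
    csv << columns                                   # header row
    rows.each { |row| csv << row.values_at(*columns) }
  end
end

# Made-up rows standing in for a query result.
sample_rows = [
  { 'title' => 'Blood Meridian', 'year' => 1985 },
  { 'title' => 'Title, with comma', 'year' => 2000 }
]

csv_string = rows_to_csv(sample_rows, %w[title year])
puts csv_string

# In the Rails action the string would then be sent with:
#   send_data csv_string, type: 'text/csv', disposition: 'attachment', filename: 'books.csv'
```

Building the string in memory and sending it with send_data avoids the temporary file altogether, which is usually enough for small exports.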
I have some code to extract offers on eBay, but there are several result pages and I get only the results of the first page. How can I loop through several result pages?
Here is my code:
require 'rubygems'
require 'nokogiri'
require 'open-uri'
url = "http://www.ebay.de/sch/i.html?_nkw=Suzuki+DR+BIG&_sacat=131090&_odkw=Suzuki+DR+BIG&_osacat=0&_from=R40"
# URI.open is needed on Ruby 3.0+, where Kernel#open no longer opens URLs
doc = Nokogiri::HTML(URI.open(url))
doc.css(".dtl").each do |dtl|
  puts dtl.at_css(".vip").text
end
You have to aggregate the results from each page by pulling the link from the "next" button (which, inspecting the page, is at the css .botpg-next a) and loading it.
Something like this:
url = "http://www.ebay.de/sch/i.html?_nkw=Suzuki+DR+BIG&_sacat=131090&_odkw=Suzuki+DR+BIG&_osacat=0&_from=R40"
while url
  doc = Nokogiri::HTML(URI.open(url))
  doc.css(".dtl").each do |dtl|
    puts dtl.at_css(".vip").text
  end
  # at_css returns nil when nothing matches; css would return an empty, truthy NodeSet
  link = doc.at_css('.botpg-next a')
  url = link && link['href'] #=> url is nil if no link is found on the page
end
I'm just looping until no "next" button is found, but you could change that to limit the loop to a given number of results.
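A capped version of that loop might look like the sketch below. It assumes a cap on pages (a cap on results would work the same way), and it replaces the network and Nokogiri calls with a hypothetical fetch_page callable that returns the items of a page plus the URL of the next one, so it runs offline. collect_items and FAKE_PAGES are made-up names.

```ruby
# Follows "next" links until there are none left or max_pages is reached.
# fetch_page is any callable that takes a URL and returns [items, next_url],
# with next_url set to nil on the last page.
def collect_items(start_url, max_pages:, fetch_page:)
  items = []
  url = start_url
  pages_seen = 0

  while url && pages_seen < max_pages
    page_items, url = fetch_page.call(url)
    items.concat(page_items)
    pages_seen += 1
  end

  items
end

# Canned pages standing in for real HTTP responses.
FAKE_PAGES = {
  '/page1' => [['offer A', 'offer B'], '/page2'],
  '/page2' => [['offer C'], '/page3'],
  '/page3' => [['offer D'], nil]
}.freeze

fetch = ->(url) { FAKE_PAGES.fetch(url) }

p collect_items('/page1', max_pages: 2, fetch_page: fetch)  # stops after two pages
p collect_items('/page1', max_pages: 10, fetch_page: fetch) # stops when there is no next link
```

With the real site, fetch_page would wrap the Nokogiri calls and return the .vip texts together with the href of .botpg-next a.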