Why is Nokogiri not finding this img src? - ruby-on-rails

I want to get an image from this URL:
doc_autobip = Nokogiri::HTML(URI.open('https://www.autobip.com/fr/actualite/sappl_mercedes_benz_livraison_de_282_camions_mercedes_benz/16757'))
The img tag is:
<img src="https://www.autobip.com/storage/photos/articles/16757/sappl_mercedes_benz_livraison_de_282_camions_mercedes_benz_2020-08-12-09-1087474.jpg" class="fotorama__img">
Logically, this should work:
src_img = article.css('img.fotorama__img').map { |link| link['src'] }
But I always get src_img = [].
Any ideas, please?

The fotorama__img class is added to the image dynamically by JavaScript. You can see it when you inspect the page in the browser, but it is not there when you View Source.
Nokogiri parses the HTML source as served; it does not run the JavaScript on the page or wait for it to execute.
You can select the image through its wrapper element instead, which should work:
doc_autobip = Nokogiri::HTML(URI.open('https://www.autobip.com/fr/actualite/sappl_mercedes_benz_livraison_de_282_camions_mercedes_benz/16757'))
# the div wrapping the image has the classes "fotorama mnmd-gallery-slider mnmd-post-media-wide"
doc_autobip.css('.fotorama.mnmd-gallery-slider.mnmd-post-media-wide img').map { |link| link['src'] }
This is just to show that it works. Choose whichever element and classes suit your case.
Update:
Or, if you want the page's JavaScript to run before you read the content, you can use Watir:
require 'nokogiri'
require 'watir'
browser = Watir::Browser.new
browser.goto 'https://www.autobip.com/fr/actualite/sappl_mercedes_benz_livraison_de_282_camions_mercedes_benz/16757'
doc = Nokogiri::HTML.parse(browser.html)
doc.css('img.fotorama__img').map { |link| link['src'] }
Note that you'll need to install additional browser drivers to use Watir.

Related

Grails: render file from assets folder into gsp

I use require.js in a Grails project. There are a couple of single JavaScript files containing the require.js modules defined with define.
There is also a *.gsp file that generates the require.js config and the entry point starting with require(), because some of the config has to be generated dynamically. It looks roughly like this:
<%@ page contentType="application/javascript;charset=UTF-8" %>
require(['a', 'b'], function(a, b){
...
var a = ${controllerPropertyA};
...
some functions
...
});
In my layout I integrate require.js like this:
<script data-main='http://example.com/exampleController/dynamicGenerateMethod?domain=xyz.com' src='http://example.com/assets/require.js'></script>
All the modules (a, b, and so on) are loaded asynchronously by require.js. Now I would like to bundle them into a single file. I could use the require.js optimizer tool, but I prefer to use the assets-pipeline. This works insofar as I get all modules bundled into a single optimized-modules.js, which is available at http://example.com/assets/optimized-modules.js.
The question: I would like to have the optimized JavaScript code in the dynamically rendered GSP file. So how can I inject the optimized-modules.js file into the GSP I'm dynamically rendering? I already thought about a tag defined in the tag library so that my *.gsp would look like
<%@ page contentType="application/javascript;charset=UTF-8" %>
<g:renderFile file="/assets/optimized-modules.js" />
require(['a', 'b'], function(a, b){
...
var a = ${controllerPropertyA};
...
some functions
...
});
and the tag definition would be roughly like this:
def renderFile = { attrs, body ->
def filePath = attrs.file
if (!filePath) {
throwTagError("'file' attribute must be provided")
}
//out << filePath
out << request.servletContext.getResource(filePath).file
//out << grailsResourceLocator.findResourceForURI(filePath).file.text
//out << grailsApplication.mainContext.getResource(filePath).file.text
//out << Holders.getServletContext().getResource(filePath).getContent()
//IOUtils.copy(request.servletContext.getResourceAsStream(filePath), out);
}
But I can't get the content of the minified optimized-modules.js that the assets-pipeline plugin generates on startup. Any thoughts on this?
Ok, I finally found it out by myself:
Instead of using the grailsResourceLocator I had to use the assetResourceLocator which is the way to go if you try to access assets resources.
My tag definition now looks like:
def renderFile = { attrs, body ->
def filePath = attrs.file
if (!filePath) {
throwTagError("'file' attribute must be provided")
}
ServletContextResource bar = (ServletContextResource)assetResourceLocator.findAssetForURI(filePath)
String fileAsPlainString = bar.getFile().getText("UTF-8")
out << fileAsPlainString
}
That way I can inject a compiled asset JavaScript file into my GSP. Perfect!

JQueryUI in Chrome extension content script

After putting jquery-ui(css and js) and jquery in the manifest, I can use jq selectors ($), however jquery-ui seems to be inaccessible. For example, I'm trying to insert a resizable div in a content-script (content_script.js):
var $div = document.createElement('div');
$div.id = 'divId';
$div.innerHTML = 'inner html';
$("body").append($div);//works without error
$("#divId").css("background-color","yellow");//works
//doesn't give resizable handles, however works in a regular html file:
$("#divId").resizable();
//however this also has issue:
document.getElementById("divId").style.resize = "both";
Manifest:
"css":["jquery-ui.css"],
"js": ["jquery-ui.js","jquery.js","content_script.js"]
Wrong load order: jquery-ui expects jquery to be loaded first, so list jquery.js before jquery-ui.js.
"js": ["jquery.js", "jquery-ui.js", "content_script.js"]

Rails Image assets in AngularJS Directive and template

I have a Rails 4 application with AngularJS, using these gems:
gem 'angularjs-rails'
gem 'angular-rails-templates'
gem 'asset_sync'
It works great with a template like this:
<img ng-controller='LikePostController'
ng-dblclick='like(post);'
ng-src='{{post.photo.standard}}'
class='lazy post_photo pt_animate_heart'
id='post_{{post.id}}_image'
/>
The image renders correctly. However, in my other JS file:
petto.directive('ptAnimateHeart', ['Helper', function(Helper){
linkFunc = function(scope, element, attributes) {
$heartIcon = $("#heart_icon");
if($heartIcon.length == 0) {
$heartIcon = $("<img id='heart_icon' src='/assets/feed.icon.heart.png' alt='Like' /> ");
$(document.body).append($heartIcon);
}
element.on('dblclick', function(event){
$animateObj = $(this);
Helper.animateHeart($animateObj);
});
}
return {
restrict: 'C',
link: linkFunc
}
}])
I get an "assets/feed.icon.heart.png was not found" error in the browser console. The file feed.icon.heart.png is located at app/assets/feed.icon.heart.png.
PS: I forgot to mention that I use the asset_sync gem to host assets on Amazon S3. The image works well in development but not in production.
Hardcoded asset links only work in development, because in production the assets get precompiled. This means, among other things, that the filename changes from:
my_image.png
into something like this (a unique MD5 hash is appended to the name):
"my_image-231a680f23887d9dd70710ea5efd3c62.png"
Try this:
Change the JavaScript file extension to .js.erb (for example, yourjsfile.js.erb).
And change the link to:
$heartIcon = $("<img id='heart_icon' src='<%= image_url("feed.icon.heart.png") %>' alt='Like' /> ");
For a better understanding, see The Asset Pipeline in the Ruby on Rails Guides.
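To make the renaming concrete, here is a minimal sketch of how a content-based fingerprint can end up in a filename. The helper name fingerprinted_name and the sample contents are hypothetical, and the real asset pipeline computes its digest differently (and not necessarily with MD5), so treat this as an illustration of the idea only.

```ruby
require "digest"

# Hypothetical helper: inserts a hash of the file contents before the extension.
def fingerprinted_name(filename, contents)
  ext  = File.extname(filename)        # ".png"
  base = File.basename(filename, ext)  # "feed.icon.heart"
  "#{base}-#{Digest::MD5.hexdigest(contents)}#{ext}"
end

# Made-up bytes standing in for the image file's contents.
name = fingerprinted_name("feed.icon.heart.png", "fake image bytes")
puts name
```

Because the hash depends on the contents, a hardcoded /assets/feed.icon.heart.png cannot match the precompiled name, which is why the path has to come from a helper at build time.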
You can define the following method somewhere in your helpers, e.g. in app/helpers/application_helper.rb:
def list_image_assets(dir_name)
path = File.expand_path("../../../app/assets/images/#{dir_name}", __FILE__)
full_paths = Dir.glob "#{path}/**.*"
assets_map = {}
full_paths.each do |p|
original_name = File.basename p
asset_path = asset_path p[p.index("#{dir_name}")..-1]
assets_map[original_name] = asset_path
end
assets_map.to_json
end
One can modify the method to work with any assets you wish, not just the ones located in subdirs of app/assets/images as in this example. The method will return a map with all the original asset names as keys and their 'compiled' names as values.
The map returned can be passed to any angular controller via ng-init (not generally recommended, but appropriate in this case):
<div ng-controller="NoController" ng-init="assets='<%=list_image_assets "images_dir_name"%>'"></div>
To make the assets really usable in Angular, define a new $scope variable in the controller:
$scope.$watch('assets', function(value) {
if (value) {
$scope.assets = JSON.parse(value);
}
});
With this in the $scope, you can use asset names as usual, e.g. in ng-src directives, and this won't break after the precompile process.
<img ng-src={{::assets['my_image.png']}}/>
Just do the following:
app.run(function($rootScope,$location){
$rootScope.auth_url = "http://localhost:3000"
$rootScope.image_url = $rootScope.auth_url + "/uploads/user/image/"
});
In the controller, inject $rootScope as a dependency,
and in the views:
<img ng-src="{{user.image.url}}" width="100px" height="100px">
Note: This works well with a Rails API, and it assumes that a user object is available so that the correct image in the /uploads/image/ directory can be referenced.

Jasmine, RequireJS and Rails

I'm starting to move over to RequireJS for a project I'm building. I'm currently using jasminerice, Rails 3.2 and the require-rails gem.
I've tried to implement http://ryantownsend.co.uk/post/31662285280/jasminerice-and-requirejs-rails-fix with little success; the specs don't run at all.
I am starting to think I might be better off using RequireJS on its own, or maybe the jasmine gem?
I'm not sold on either the jasminerice or require-rails gems, so does anyone have advice on the best tools, tips on how to get it up and running, or good tutorials?
OK, as I didn't get any response, I managed to find a slightly hacky way of making it work.
Create a file in your views folder, jasminerice/spec/index.html.erb (or .haml), and copy the HTML from the jasminerice gem into it. Replace the spec.js call with:
%script{"data-main"=>"/assets/#{@specenv}", src:"/assets/require.js"}
Then write your spec file as a RequireJS entry point, like so:
require.config {
  paths: {
    'jquery': '/assets/jquery'
    'underscore': '/assets/underscore-min'
    'sinon': 'sinon-1.6.0'
    'jasmine-sinon': 'jasmine-sinon'
    'my_js': 'my_js'
    'my_spec': 'my_spec'
  }
}
require ['sinon', 'jasmine-sinon', 'jquery', 'underscore', 'my_js', 'my_spec'], () ->
  jasmine.getEnv().execute()
This will prevent jasminerice from triggering the tests itself:
jasmine.rice.autoExecute = false
Set up your tests with a beforeEach similar to this (taken from http://kilon.org/blog/2012/08/testing-backbone-requirejs-applications-with-jasmine/):
describe "MySpec", ->
  beforeEach ->
    flag = false
    @thing = ""
    that = @
    # 'my_js' matches the path name configured in require.config above
    require ['my_js'], (Myjs) ->
      flag = true
      that.thing = new Myjs()
    waitsFor ->
      flag
  it 'It should exist', ->
    expect(@thing).toBeDefined()
Hope that helps anyone with a similar issue and if anyone has a better solution please post! :)
I have the same setup, here's what I did (starting from the blog post mentioned in the original question):
1. Create a helper to load all spec files
In a file lib/jasminerice/spec_helper.rb, put the following code:
require "requirejs-rails"
module Jasminerice
module SpecHelper
include RequirejsHelper
def spec_files
Rails.application.assets.each_logical_path.select { |lp| lp =~ %r{^spec/.*\.js$} }
end
end
end
This will create a helper method spec_files which you can call in the Jasminerice runner view to automatically get all your specs, so you don't need to update the list of specs every time you add a new one.
2. Override default Jasminerice index view
Create a view named app/views/jasminerice/spec/index.html.erb with the following:
<!doctype html>
<head>
<title>Jasmine Spec Runner</title>
<%= stylesheet_link_tag "jasmine", "spec" %>
<%= requirejs_include_tag 'application' %>
<%= javascript_include_tag "jasminerice", "spec", :debug => true %>
<script>
jasmine.rice.autoExecute = false;
require([<%= spec_files.map { |f| "'#{f.sub(/\.js$/,'')}'" }.join(',').html_safe %>],
function() { jasmine.getEnv().execute() },
function(err) {
var failedId = err.requireModules && err.requireModules[0];
requirejs.undef(failedId);
define(failedId, function() { return function() { console.debug(failedId + ': ' + err); null }; });
require([ failedId ], function() {} );
});
</script>
<%= csrf_meta_tags %>
</head>
<body>
</body>
</html>
This will require all the specs before running Jasmine (with jasmine.getEnv().execute()). I have an ugly hack in there to take the array of spec paths and generate an array of module names in quotes to pass to require.
I've also included an error callback in case there's a problem loading a module -- if you don't do this, your specs will hang when a module load fails. That's especially a problem when you're running them on the command line through guard-jasmine, which is what I do.
Unfortunately I haven't found a very good way to handle such errors. Here I write some info to console.debug, then redefine the failed module as an anonymous function and require it in place of the original. This allows the specs to run but produces unpredictable results (which is better than no results). I've been struggling to find a better way to deal with this situation; suggestions would be much appreciated.
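The path filter from step 1 and the quoted module-name list built in the view can be tried outside Rails. This is a sketch with a hard-coded, made-up list of logical paths standing in for Rails.application.assets.each_logical_path; the filenames are assumptions for illustration.

```ruby
# Made-up logical paths; in the helper above these come from the asset environment.
logical_paths = [
  "application.js",
  "spec/models/user_spec.js",
  "spec/helpers/util.js",
  "spec/fixtures/data.json"
]

# Same regex as spec_files: keep only .js files under spec/.
spec_files = logical_paths.select { |lp| lp =~ %r{^spec/.*\.js$} }

# Same transformation as in the view: drop the .js suffix and quote each name.
module_list = spec_files.map { |f| "'#{f.sub(/\.js$/, '')}'" }.join(",")

puts "require([#{module_list}], ...)"
```

RequireJS module names omit the .js extension, which is why the suffix is stripped before the names are written into the require call.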
3. Write some specs
My Jasmine specs take the form:
define (require) ->
  MyModule = require 'my-module'
  # any other dependencies needed to test
  describe 'MyModule', ->
    it 'exists', ->
      expect(MyModule).toBeDefined()
And so on. Note that I load all my testing dependencies (jasmine, sinon, jasmine-sinon, etc.) outside of require, in spec.js.coffee:
#=require sinon
#=require jasmine-sinon
#=require_tree ./helpers/
I put any other helper functions I need in the helpers directory.
4. Bonus
One other tip: if your browser won't reload modules even when they change, a trick is to add a dummy query argument with a timestamp so that the browser always sees a new file and loads it.
I created this function in ApplicationController which I load in a before filter:
before_filter :set_requirejs_config
def set_requirejs_config
  # Only bust the cache in development; in other environments leave the config untouched.
  return unless Rails.env.development?
  opts = { :urlArgs => "bust=#{Time.now.to_i}" }
  Requirejs::Rails::Engine.config.requirejs.run_config.merge!(opts)
end
This adds a query param bust=... to the end of each module name if we're in development mode, so that we always reload modules and get the most up-to-date version. Somewhere there's a post on SO explaining how to do this in RequireJS, but to get it to work with requirejs-rails you have to put it into ApplicationController (and not config/requirejs.yml) so that it is loaded every time you load the page.
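To illustrate the effect of the bust parameter, here is a small standalone sketch that appends it to a URL. RequireJS does this itself once urlArgs is set; the helper name with_cache_buster and the sample path are hypothetical and only show what the resulting request URL looks like.

```ruby
require "uri"

# Hypothetical helper: appends bust=<unix timestamp> to a URL's query string.
def with_cache_buster(url, now = Time.now)
  uri = URI.parse(url)
  params = URI.decode_www_form(uri.query || "")
  params << ["bust", now.to_i.to_s]
  uri.query = URI.encode_www_form(params)
  uri.to_s
end

# A fixed time is passed here so the result is reproducible.
puts with_cache_buster("/assets/my-module.js", Time.at(1000))
# => /assets/my-module.js?bust=1000
```

Since the timestamp changes on every page load, the browser treats each module URL as new and skips its cache.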
Hope that might provide some hints to anyone else using this configuration!

Nokogiri and Mechanize help (navigating to pages via div class and scraping)

I need help clicking on some elements via div class, not by text of link, to get to a page to scrape some data.
Starting with the page http://www.salatomatic.com/b/United-States+125, how do I click on each state's name without using the text of the link but by the div class?
After clicking on a state, for example http://www.salatomatic.com/b/Alabama+7, I need to click on a region in the state, again by div class, not text of the link.
Inside a region, http://www.salatomatic.com/c/Birmingham+12, I want to loop through, clicking on each of the items (11 mosques in this example).
Inside the item/mosque, I need to scrape the address (at the top under the title of the mosque) and store/create it in my database.
UPDATES:
I have this now:
require 'nokogiri'
require 'open-uri'
require 'mechanize'
agent = Mechanize.new
page = agent.get("http://www.salatomatic.com/b/United-States+125")
#loops through all state links
page.search('.subtitleLink a').map{|a| page.uri.merge a[:href]}.each do |uri|
page2 = agent.get uri
#loops through all regions in each state
page2.search('.subtitleLink a').map{|a| page2.uri.merge a[:href]}.each do |uri|
page3 = agent.get uri
#loops through all places in each region
page3.search('.subtitleLink a').map{|a| page3.uri.merge a[:href]}.each do |uri|
page4 = agent.get uri
# I'm able to grab the title of the place, but I'm not sure how to get the address because there is no div around it.
puts page4.at('.titleBM')
# I'm guessing I would use some regex/XPath here to get the address, but how would that work?
# This is the structure of the title/address in HTML:
# <td width="100%"><div class="titleBM">BIS Hoover Crescent Islamic Center </div>2524 Hackberry Lane, Hoover, AL 35226</td>
# This is the listing page: http://www.salatomatic.com/d/Hoover+12446+BIS-Hoover-Crescent-Islamic-Center
end
end
end
It's important to make sure the a[:href]'s are converted to absolute urls first though.
Therefore, maybe:
page.search('.subtitleLink a').map{|a| page.uri.merge a[:href]}.each do |uri|
page2 = agent.get uri
end
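The reason for the merge is that href attributes are often relative, and a relative path only makes sense against the page it was found on. A minimal sketch using only the standard library follows; the hrefs in the list are assumed examples of the forms a link might take, not values copied from the site.

```ruby
require "uri"

base = URI.parse("http://www.salatomatic.com/b/United-States+125")

# Assumed example hrefs: root-relative, already absolute, and relative to the page.
hrefs = [
  "/b/Alabama+7",
  "http://www.salatomatic.com/b/Alabama+7",
  "../c/Birmingham+12"
]

# URI#merge resolves each href against the page's own URI.
absolute = hrefs.map { |href| base.merge(href).to_s }
puts absolute
```

Mechanize's page.uri plays the role of base here, so page.uri.merge a[:href] yields a URI that agent.get can fetch regardless of how the link was written.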
For the pages of US and regions you can do:
agent = Mechanize.new
page = agent.get('http://www.salatomatic.com/b/United-States+125')
page.search("#header a").each { |a| ... }
Here inside the block you can find corresponding link and click:
page.link_with(text: a.text).click
or ask mechanize to load the page by href:
region_page = agent.get a[:href]
Inside the region you can do the same, just search like
page.search(".tabTitle a").each ...
for the tabs (Restaurants, Markets, Schools, etc.), and like
page.search(".subtitleLink a").each ...
How to find these things? Try some bookmarklets like SelectorGadget or similar, dig in HTML source code and find common parents/classes for links you are interested in.
UPDATED: getting the page by href, as @pguardiario suggested.
