Twitter Card Images not working on Gatsby app - twitter

I'm working on a Gatsby app with Netlify CMS (and hosted on Netlify). Trying to get the metadata working so that Twitter cards display correctly with images.
The metadata is generally all right, but the images aren't showing on the Twitter validator or if I try to post to Twitter. The problem is clearly the images themselves, which are hosted on the site using Gatsby and Gatsby Image Sharp to render.
In fact, the validator seems to show no fundamental issues. Simply, the image doesn't show up:
Example relevant metadata:
<meta name="twitter:url" content="https://example.com/" data-react-helmet="true">
<meta name="twitter:image" content="https://example.com/static/12345/c5b20/blah.jpg" data-react-helmet="true">
<meta data-react-helmet="true" name="twitter:title" content="Site title">
<meta data-react-helmet="true" name="twitter:card" content="summary_large_image">
I know the image is the issue, because if I replace my image URL (which is the full image URL) with an external URL, it works fine, showing the full card with the image.
Any idea what could be causing this? I'm sizing the image down so it loads quickly, and it seems to load just fine directly (eg). (I mean, is there something weird/off about that image?)
NOTE: In a previous version of this question, I referenced Cloudinary and Uploadcare, but have since removed those two in a branch to simplify the problem. (They seem to have been unnecessary holdovers from the starter app I used.) You can now see an example page for that branch here and the associated image in the twitter:image tag here. I feed this pre-processed/shrunk image into the header using React Helmet (and Gatsby React Helmet), using the following code in my GraphQL call to get the image associated with the blog post in that particular, smaller format:
featuredimage {
  childImageSharp {
    fixed(width: 480, quality: 75) {
      src
    }
  }
}
Second Note/thought: Should I be worried about the fact that the pages in production seem to be re-rendering on every reload? Isn't SSR supposed to ensure that doesn't happen? I tested this by including a call to Math.random(), hidden, in the page. You can see the result by running document.getElementsByClassName('document')[0].children[0].innerText, and note that it produces a different number on each page reload. This implies to me that the whole page is being re-rendered by the client. Isn't that wrong? Why would that be happening? Might that relate to some sort of client processing of the images on each request, which might be screwing up the Twitter cards?
Third update: I put together a simpler reproduction here. It's based off of this starter template, with Uploadcare/Cloudinary removed and Twitter card metadata added to the header. Other than that, and removing unnecessary pages, I didn't make any other changes. I used this starter for a repro rather than a vanilla starter app, because I'm unsure whether the issue is caused by the interaction of Netlify CMS and the Gatsby Sharp Image plugin. I might try to put together a second reproduction. For now, the code for this repo is here, and the pages that should show Twitter cards are the blog posts, such as this one.
ACTUALLY, it seems that a super basic reproduction, with Gatsby 3 and no Netlify CMS or anything, has the same issue. Here's the minimal reproduction, with the image taken from src/images using an allImageSharp query and inserted into the metadata for each page. Code here.
FINAL UPDATE
Based on Derek's answer below, I removed the @reach/router stuff, and got the site URL from Netlify build env variables. It appeared that @reach/router only gave this information when JS was running, which excluded the Twitterbot, resulting in an undefined base URL, which broke the Twitter image. Including the URL from Netlify (using process.env.URL in the Gatsby config and pulling that in through a siteMetadata query) fixed the problem!
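A rough sketch of the fix described above, not the actual project code: resolve the site URL once at build time and expose it through siteMetadata. `process.env.URL` is the Netlify variable mentioned above; the `resolveSiteUrl` helper, the localhost fallback and the trailing-slash handling are assumptions added for illustration.

```javascript
// Sketch of a build-time site URL lookup for gatsby-config.js.
// resolveSiteUrl and the fallback value are hypothetical, not part of Gatsby.
function resolveSiteUrl(env, fallback = "http://localhost:8000") {
  const raw = env.URL || fallback;
  // Drop trailing slashes so siteUrl + "/static/..." never contains "//".
  return raw.replace(/\/+$/, "");
}

const siteMetadata = {
  title: "Site title",
  siteUrl: resolveSiteUrl(process.env),
};

// In gatsby-config.js this object would be exported, for example:
// module.exports = { siteMetadata, plugins: [/* ... */] };
console.log(siteMetadata.siteUrl);
```

Because the value comes from the build environment rather than from the browser, it is already present in the statically generated HTML that crawlers read.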

Update:
I think I might have found the issue. When opening the minimal reproduction with JavaScript disabled, the URL for twitter:image is invalid:
<meta data-react-helmet="true" name="twitter:image" content="undefined/static/03475800ca60d2a62669c6ad87f5fda0/58026/energy.jpg">
So for some reason the hostname is missing during the build, but it appears once JS kicks in (this might have something to do with the way you get the hostname). Twitter's crawler probably does not run JS and so couldn't fetch the image.
Make sure your opengraph images are absolute urls with https:// or http:// protocols. I checked your example link & saw that it was a relative link (/static/etc.)
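One way to catch this before deploying is a small check on the generated tag value. This is only a sketch; `isAbsoluteHttpUrl` is a hypothetical helper name, and it relies on the standard `URL` constructor throwing when a string has no scheme.

```javascript
// Returns true only for absolute http(s) URLs.
// Relative paths and values such as "undefined/static/..." fail to parse
// without a base URL, so the constructor throws and we return false.
function isAbsoluteHttpUrl(value) {
  try {
    const { protocol } = new URL(value);
    return protocol === "http:" || protocol === "https:";
  } catch (err) {
    return false;
  }
}

console.log(isAbsoluteHttpUrl("https://example.com/static/12345/c5b20/blah.jpg"));
console.log(isAbsoluteHttpUrl("/static/12345/c5b20/blah.jpg"));
console.log(isAbsoluteHttpUrl("undefined/static/12345/c5b20/blah.jpg"));
```

Running a check like this against the built HTML (not the hydrated page) reflects what a crawler without JavaScript would see.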
Twitter also seems to require social card images to have a 2:1 aspect ratio:
Images for this Card support an aspect ratio of 2:1 with minimum dimensions of 300x157 or maximum of 4096x4096 pixels.
https://developer.twitter.com/en/docs/twitter-for-websites/cards/overview/summary-card-with-large-image
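The limits quoted above can be turned into a quick sanity check. The sketch below is an illustration only: `checkLargeCardImage` is a hypothetical helper, and the 0.1 tolerance on the ratio is an assumption (the quoted text does not say how strictly the 2:1 ratio is enforced).

```javascript
// Checks image dimensions against the quoted summary_large_image limits:
// minimum 300x157, maximum 4096x4096, aspect ratio of roughly 2:1.
// The tolerance is an assumed value, chosen so that 300x157 itself passes.
function checkLargeCardImage(width, height, tolerance = 0.1) {
  const problems = [];
  if (width < 300 || height < 157) {
    problems.push("smaller than 300x157");
  }
  if (width > 4096 || height > 4096) {
    problems.push("larger than 4096x4096");
  }
  const ratio = width / height;
  if (Math.abs(ratio - 2) > tolerance) {
    problems.push(`aspect ratio ${ratio.toFixed(2)}:1 is not close to 2:1`);
  }
  return problems; // an empty array means no problem was found
}

console.log(checkLargeCardImage(1200, 600));
console.log(checkLargeCardImage(480, 480));
```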
If you're using the latest Gatsby image plugin, you can use aspectRatio to crop the image.
Also note that you can skip the twitter:image tag, if your og:image has already satisfied Twitter's card requirement.
SSR does not mean that JS never runs in the client; React will still render (hydrate) your page on the client side regardless of SSR.

This was solved here: https://github.com/gatsbyjs/gatsby/discussions/32100.
"location and thus origin is not available during gatsby build and thus the generated HTML has undefined there."
I got it working by changing the way I create the image URL inside seo.js from this:
let origin = "";
if (typeof window !== "undefined") {
origin = window.location.origin;
}
const image = origin + imageSrc;
to this:
const imageSrc = thumbnail && thumbnail.childImageSharp.fixed.src;
const image = site.siteMetadata?.siteUrl + imageSrc;
You need to use siteUrl from siteMetadata.
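As a hedged variation on the string concatenation above, the standard `URL` constructor can do the joining and will fail loudly instead of emitting "undefined/static/...". `buildCardImageUrl` is a hypothetical helper, not part of Gatsby, and the example paths are made up.

```javascript
// Joins siteMetadata.siteUrl and an image path into an absolute URL.
// Throws at build time when siteUrl is missing, rather than silently
// producing a broken meta tag.
function buildCardImageUrl(siteUrl, imageSrc) {
  if (!siteUrl) {
    throw new Error("siteUrl is missing from siteMetadata");
  }
  if (!imageSrc) {
    return null; // no thumbnail for this page
  }
  return new URL(imageSrc, siteUrl).href;
}

console.log(buildCardImageUrl("https://example.com", "/static/abc123/energy.jpg"));
```

Note that a path starting with "/" replaces any path on the base URL, so a site served from a sub-path would need extra handling.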
Below is my pageQuery from inside blog-post.js:
export const pageQuery = graphql`
query BlogPostBySlug(
$id: String!
$previousPostId: String
$nextPostId: String
) {
site {
siteMetadata {
title
siteUrl
}
}
markdownRemark(id: { eq: $id }) {
id
excerpt(pruneLength: 160)
html
frontmatter {
title
date(formatString: "MMMM DD, YYYY")
description
thumbnail {
childImageSharp {
fixed(width: 1200) {
...GatsbyImageSharpFixed
}
}
}
}
}
}
`

Related

Rails Cloudinary Widget implementation trouble

I am in the process of moving from Dropzone's widget to Cloudinary's widget and running into heaps of trouble.
First off Dropzone is currently working beautifully with uploads to cloudinary. I am moving to their proprietary widget for a bunch of reasons that would just distract this post.
The issue I am having is "simple" at first glance. Images are uploading correctly to Cloudinary. It is on the subsequent form post that I am having issues.
Dropzone automagically creates necessary hidden inputs and values...Cloudinary you have to roll your own. So I have done that and not only is it not working the input values are very different to what dropzone generates for the very same image. I cannot find the logic in dropzone.js that can explain how the inputs are created.
For example, here is what dropzone renders for one image:
<input type="hidden" name="entity[job_entries_attributes][0][images][]" value="eyJfcmFpbHMiOnsibWVzc2FnZSI6IkJBaHBBNkpLQWc9PSIsImV4cCI6bnVsbCwicHVyIjoiYmxvYl9pZCJ9fQ==--7d13c16894d2a146f1ac85e12ddea03d9c14c26e">
When I hand roll I have access to the object returned from direct upload to Cloudinary - public_id, asset_id, etc. But none of them resemble the value above. I am assuming the post and subsequent image rendering is failing because of this.
Anyone have experience with this??? Driving me nuts...
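It may help to look at what the Dropzone-generated value actually contains. The part before `--` is Base64-encoded JSON and the part after it is a 40-character hex digest. The sketch below only decodes it; reading the result as a signed blob ID issued by the Rails app (rather than anything returned by Cloudinary) is an assumption based on the `pur` field, and `inspectSignedValue` is a hypothetical helper name.

```javascript
// Value copied from the hidden input shown above.
const hiddenInputValue =
  "eyJfcmFpbHMiOnsibWVzc2FnZSI6IkJBaHBBNkpLQWc9PSIsImV4cCI6bnVsbCwicHVyIjoiYmxvYl9pZCJ9fQ==" +
  "--7d13c16894d2a146f1ac85e12ddea03d9c14c26e";

// Splits the value into its Base64 JSON payload and its trailing digest.
// This only inspects the value; it does not verify the signature.
function inspectSignedValue(value) {
  const [encoded, signature] = value.split("--");
  const json = Buffer.from(encoded, "base64").toString("utf8");
  return { payload: JSON.parse(json), signature };
}

const { payload, signature } = inspectSignedValue(hiddenInputValue);
console.log(payload._rails.pur); // purpose recorded inside the payload
console.log(signature.length);   // length of the hex digest
```

If that reading is right, the value is produced and signed server-side, which would explain why no field in the Cloudinary upload result resembles it.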
The Upload Widget is an interactive UI for uploading images to your Cloudinary Media Library account, and its usage can be extended depending on the use case requirements. Although the widget does not have many of the features that Dropzone has, you can use its different events to add custom processing using JavaScript or HTML (see sample demos here and here).
<div id="widgetdiv"></div>
<script type="text/javascript">
cloudinary.openUploadWidget({
cloudName: 'cloudname',
inlineContainer: '#widgetdiv',
uploadPreset: 'uploadPreset',
showPoweredBy: false,
sources: ['local','instagram']},
(error, result) => {
if (!error && result && result.event === "success") {
console.log('Done! Here is the image info: ', result.info);
console.log('Custom implementations here...');
var filename = result.info.secure_url.split('/').pop().split('#')[0].split('?')[0];
var filesize = result.info.bytes;
// Populate form, card container, post to another endpoint, etc
}
}
);
</script>

Inside Splash, how to use src attribute to append to a url

------------ORIGINAL QUESTION------------------
In my Splash Script, I am trying to use "splash:go" on a new url that is based on the "src" attribute of an "img" tag. How can I access this "src" relative url and join it to a start_url?
For example, imagine that the img element has the following contents:
<img id="ImageViewer1_docImage" onload="BlockerResize('ImageViewer1_ContentBlocker1','ImageViewer1_WaterMarkImage');" src="ACSResource.axd?SCTTYPE=ENCRYPTED&SCTKEY=gMYed5OWqcT9I1Y2fM85DvB48X5U1DQ5mOUiJoUH4rioyau0nJdxt0PHFfGVTMiUsork/YD+Cw0F6ZzcviP4sG09xrqWM8/zJlyEeVRFkKXVnkyHYWgwNJzCSUE4Kh4yCsqw6mCuIxWxPj6BAI7Hbw==&CNTWIDTH=849&CNTHEIGHT=684&FITTYPE=Height&ZOOM=1" alt="Please wait" style="border-width:0px;cursor: url(images/Cursors/hmove.cur); z-index: 1000">
Here I am trying to extract the src attribute and add it to start_url:
https://i2a.uslandrecords.com/ME/Cumberland/D/
I want all of this inside the Splash script. I need it to be done inside of Splash because otherwise I lose my security/encryption or something--it renders "Bad Data" instead of the new webpage. Do you have any recommendations?
------------UPDATE------------------
So I managed to obtain the url I needed from the src attribute using the following code:
var = splash:evaljs("document.getElementById('ImageViewer1_docImage').src;")
splash:go(var)
However, the problem is that this is producing a error message. All I find in the snapshot is a white page with the following message:
Failed loading page (Frame load interrupted by policy change)
https://i2a.uslandrecords.com/ME/Cumberland/D/ACSResource.axd?SCTTYPE=ENCRYPTED&SCTKEY=gMYed5OWqcSvEWOJA6wGVmb642s2oZHqkYmT6VTpORTzMY7CgvDU5jsjJG/xp0X3eQ9BiDnbaTdAmISeLkC3hyjxGjcSnXOKgGDa8cI2fniY0ILT+NqvQToMGIB+/X3ZIs7Q+D4ppTSZGYZ2L4M/
Webkit error #102
Any idea why?
Is the image src attribute exactly the URL you need to access, or, as the question title suggests, do you need to append it to some other URL parts?
If it is the latter, you can concatenate strings in Lua with '..'
Ex.: splash:go(base_url..var) -- concatenation
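Since the script already runs JavaScript through splash:evaljs, it may be worth knowing how a relative src is resolved against a base URL. The sketch below uses the standard `URL` constructor; `resolveAgainst` is a hypothetical helper and the src value is a shortened stand-in for the real one (the SCTKEY parameter is left out).

```javascript
// Resolves a relative src against a base URL, the way a browser would.
function resolveAgainst(baseUrl, relativeSrc) {
  return new URL(relativeSrc, baseUrl).href;
}

const startUrl = "https://i2a.uslandrecords.com/ME/Cumberland/D/";
// Shortened, illustrative src; the real attribute carries more parameters.
const src = "ACSResource.axd?SCTTYPE=ENCRYPTED&CNTWIDTH=849&CNTHEIGHT=684";

console.log(resolveAgainst(startUrl, src));
```

The trailing slash on the base matters: without it, the last path segment ("D") is replaced instead of extended, which plain string concatenation would not do.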
ISSUE RESOLVED:
Here is the solution. The GET request was breaking down because it didn't know how to render the image in html given the webkit settings. If you execute the GET request without rendering the page, the response.body has the image.
CODE:
local response = splash:http_get(var)
return {
body = response.body
}

NativeScript Webview newbie questions

I am experimenting with using NativeScript to speed up the process of porting an existing Android app to iOS. The app in question uses a great deal of SVG manipulation in a Cordova webview. To keep things simple I want to port all of my existing Webview side code - in essence the entire existing Cordova www folder and its contents - over to the new NativeScript app. The WebView talks to a custom Cordova plugin which I use to talk with my servers to do such things as get new SVGs to show, keep track of user actions etc.
If I can get through these teething issues I am considering using this component to implement bi-directional communication between my current webview JS code and the new NativeScript backend that will replace my current Cordova plugin. Someone here is bound to tell me that I don't need to do that... However, doing so would mean throwing out the baby with the bathwater and rewriting all of my current Webview ES6/JS/CSS code.
This is pretty much Day 1 for me with NativeScript and I have run into a few issues.
I find that I cannot get rid of the ActionBar even though I have followed the instructions here and set the action bar to hidden.
I can use the following markup in home.component.html
to show external web content. However, what I want to really do is to show the local HTML file that is in the www folder in the following folder hierarchy
app
|
____home
|
____www
|
______ index.html
|
______css
|
______ tpl
|
.....
However, when I use the markup
<Page actionBarHidden="true" >
<WebView src="~/www/index.html"></WebView>
</Page>
I am shown the error message
The webpage at file:///data/data/com.example.myapp/files/app/www/index.html is not available.
I'd be most grateful to anyone who might be able to tell me what I am doing wrong here - and also, how I can get rid of that action bar which is currently showing the app title.
About using local HTML file
Is your local HTML file recognized by Webpack (which is enabled by default in NativeScript)? Try to explicitly add the local HTML file to your webpack.config.js file. This way Webpack will "know" that it will have to bundle this file as well.
new CopyWebpackPlugin([
{ from: { glob: "<path-to-your-custom-file-here>/index.html" } }, // HERE
{ from: { glob: "fonts/**" } },
{ from: { glob: "**/*.jpg" } },
{ from: { glob: "**/*.png" } },
]
Example here
About hiding the ActionBar
NativeScript Core only: Try hiding the action bar directly for the frame that holds the page. See the related documentation here
NativeScript Angular: The page-router-outlet will have an action bar by default (you can hide it by using the Page DI as done here). Otherwise, you could create a router-outlet (instead of page-router-outlet). The router-outlet won't have the mobile-specific ActionBar.

How to set a resource in a firefox addon?

I have basically the same problem as this guy. I have a page, accessed over the web (well, local intranet, if that matters), and it needs to reference images on the client's machine. I know those images are going to be in C:\pics. Internet Explorer lets you just reference them, but I'm having trouble printing properly with internet explorer, so I want to try firefox. The answer on that question says you can create a "resource" with a firefox add-on that pages will be able to reference. However, it doesn't seem to be working. I followed the guide for how to make your first add-on and got the red border to work on mozilla sites. I tried editing that add-on to include a chrome.manifest file that just says this:
resource exposedpics file:///C:/pics
and then the page (an asp page) references exposedpics.
<img align=left border="0" src="resource:///exposedpics/<%=Request("Number")%>.jpg" style="border: 3 solid #<%=bordercolor%>" align="right" WIDTH="110" HEIGHT="110">
The page doesn't show the picture. If I go to View Image Info on the image, I'll see the address is "resource:///exposedpics/8593.jpg" (in my example where I input 8593), but it doesn't show the image here. (Yes, the image does exist under C:\pics. If I go to file:///C:/pics/8593.jpg, it loads.)
So maybe I don't know how to use a chrome.manifest. (I'm not sure if I need to reference it somehow in my manifest.json; currently I don't.) That Stack Overflow question also says it's possible to dynamically create resources, so I tried to make my manifest.json say:
{
"manifest_version": 2,
"name": "FirefoxPixExposer",
"version": "1.0",
"description": "allows websites to access C:\\pics",
"content_scripts": [
{
"matches": ["<all_urls>"],
"js": ["expose.js"]
}
]
}
and expose.js says
// Import Services.jsm unless in a scope where it's already been imported
Components.utils.import("resource://gre/modules/Services.jsm");
var resProt = Services.io.getProtocolHandler("resource")
.QueryInterface(Components.interfaces.nsIResProtocolHandler);
var aliasFile = Components.classes["@mozilla.org/file/local;1"]
.createInstance(Components.interfaces.nsILocalFile);
aliasFile.initWithPath("file:///C:/pics");
var aliasURI = Services.io.newFileURI(aliasFile);
resProt.setSubstitution("ExposedPics", aliasURI);
but the same thing happens, the image doesn't display. I did notice that if I put document.body.style.border = "5px solid red"; at the top of expose.js, I do see a border around the body, but if I move it to below the line Components.utils.import("resource://gre/modules/Services.jsm"); it doesn't show up. Therefore, I suspect the code to dynamically create a resource is broken.
What am I doing wrong? Ultimately, how can I get an image on the client's machine to show up on a page from the internet?
You are writing a WebExtension, so none of the APIs you are trying to use exist.
This includes Components.utils.import, Components.classes, etc. You should read "Working with files" on MDN to get an idea of what is still possible.

How to scrape images from eBay and Amazon using XPath in Nokogiri from JSON

I'm trying to scrape images from websites using Nokogiri and XPath, so far with limited success. For a typical website whose HTML has img and src, I can use:
tmp2 = Nokogiri::HTML(open(site_url))
tmp2.xpath("//img/@src").each do |src|
...do whatever
end
However, some sites like Amazon and eBay only trigger certain images with JavaScript. If I look at the code I can see the data in arrays. For example, from Amazon:
<script type="text/javascript">
P.when('jQuery', 'cf').execute(function($, cf){
P.load.js('http://z-ecx.images-amazon.com/images/G/01/browser-scripts/imageBlock-udp-airy/imageBlock-udp-airy-4060168860._V1_.js');
});
P.when('A', 'jQuery', 'ImageBlockATF', 'cf').register('ImageBlockBTF', function(A, $, imageBlockATF, cf){
var data = {"indexToColor":[],"burjImageBlock":0,"isSwatchHoverConsistent":1,"heroFocalPoint":null,"visualDimensions":["color_name"],"productGroupID":"apparel_display_on_website","newVideoMissing":0,"useIV":0,"useClickZoom":null,"useChildVideos":0,"numColors":7,"logMetrics":0,"defaultColor":"initial","airyConfig":{"enableContinuousPlay":null,"installFlashButtonText":"Install Flash Player","contentTitle":null,"autoplayCutOffTimeSeconds":null,"ageGate":{"monthNames":["January","February","March","April","May","June","July","August","September","October","November","December"],"deniedPrompt":"We're sorry. You are not old enough to watch this video.","submitText":"Submit","prompt":"This video is not intended for all audiences. What date were you born?"},"videoAds":null,"videoUnsupportedPrompt":"Sorry, this video is unsupported on this browser.","desiredMode":null,"swfUrl":"http://g-ecx.images-amazon.com/images/G/01/vap/video/airy2/prod/2.0.1102.0/flash/AiryBasicRenderer._V304902271_.swf","isAutoplayEnabled":null,"installFlashPrompt":"Adobe Flash Player is required to watch this video.","isLiveStream":null,"regionCode":"NA","contentId":null,"playbackErrorPrompt":"Sorry, an error has occurred while attempting video playback. 
Please try again later.","contentMinAge":null,"isForesterTrackingDisabled":null,"streamingUrls":null,"parentId":null,"foresterMetadataParams":{"client":"Dpx","requestId":"1MX7VHFRVAS6TWY64BXC","marketplaceId":"ATVPDKIKX0DER","session":"182-9511970-7757812","method":"Apparel.ImageBlock"},"jsUrl":"http://z-ecx.images-amazon.com/images/G/01/vap/video/airy2/prod/2.0.1102.0/js/airy.chromeless._V304902265_.js"},"mainImageMaxSizes":null,"staticStrings":{"playVideo":"Click to play video","rollOverToZoom":"Roll over image to zoom in","images":"Images","video":"video","clickToZoom":"Click on image to zoom in","touchToZoom":"Touch the image to zoom in","videos":"Videos","close":"Close","pleaseSelect":"Please select","clickToExpand":"Click to open expanded view","allMedia":"All Media"},"notThumbnailClickImmersiveView":1,"gIsNewTwister":1,"title":"Threads 4 Thought Women's Tabitha Basic Tank Top","ivRepresentativeAsin":{"6":"B00T46V76W","4":"B00WM3O7ES","1":"B00T46YZES","3":"B00WM3NLPE","2":"B00T46VD16","5":"B00T46VGXQ"},"mainImageSizes":[[342,445],[385,500],[425,550],[466,606],[522,679]],"isQuickview":0,"ipadVideoSizes":[[340,444],[384,500]],"colorToAsin":{"Coral Dreams":{"asin":"B00T46V76W"},"Heather Grey":{"asin":"B00WM3NLPE"},"Black":{"asin":"B00T46YZES"},"White":{"asin":"B00T46VGXQ"},"Deep Blue Sea":{"asin":"B00T46VD16"},"Sea 
Glass":{"asin":"B00WM3O7ES"}},"thumbExperimentEnabledValue":1,"showLITBOnClick":0,"videoSizes":[[342,445],[384,500]],"stretchyGoodnessWidth":[1280,1440,1640,1800],"autoplayVideo":0,"hoverZoomIndicator":"","sitbReftag":"","useHoverZoom":1,"staticImages":{"zoomOut":"http://g-ecx.images-amazon.com/images/G/01/detail-page/cursors/zoom-out._V184888738_.bmp","hoverZoomIcon":"http://g-ecx.images-amazon.com/images/G/01/img11/apparel/UX/DP/icon_zoom._V138923886_.png","zoomIn":"http://g-ecx.images-amazon.com/images/G/01/detail-page/cursors/zoom-in._V184888790_.bmp","zoomLensBackground":"http://g-ecx.images-amazon.com/images/G/01/apparel/rcxgs/tile._V211431200_.gif","videoThumbIcon":"http://g-ecx.images-amazon.com/images/G/01/Quarterdeck/en_US/images/video._V183716339_SX38_SY50_CR,0,0,38,50_.gif","spinner":"http://g-ecx.images-amazon.com/images/G/01/ui/loadIndicators/loading-large_labeled._V192238949_.gif","zoomInCur":"http://g-ecx.images-amazon.com/images/G/01/detail-page/cursors/zoomIn._V323082799_.cur","videoSWFPath":"http://g-ecx.images-amazon.com/images/G/01/Quarterdeck/en_US/video/20110518115040892/Video._V178668404_.swf","arrow":"http://g-ecx.images-amazon.com/images/G/01/javascripts/lib/popover/images/light/sprite-vertical-popover-arrow._V186877868_.png","zoomOutCur":"http://g-ecx.images-amazon.com/images/G/01/detail-page/cursors/zoomOut._V323082798_.cur"},"videos":[],"gPreferChildVideos":0,"altsOnLeft":1,"ivImageSetKeys":{"Coral Dreams":"6","Heather Grey":"3","Black":"1","initial":0,"White":"5","Deep Blue Sea":"2","Sea Glass":"4"},"useHoverZoomIpad":"","isUDP":1,"alwaysIncludeVideo":0,"widths":[1280,1440,1640,1800],"maxAlts":7,"useChromelessVideoPlayer":1,"mainImageHeightPartitions":null};
data["customerImages"] = eval('[]');
data["colorImages"] = {"Coral Dreams":[{"large":"http://ecx.images-amazon.com/images/I/41FGlhksmtL.jpg","variant":"MAIN","hiRes":"http://ecx.images-amazon.com/images/I/81iXQbkcpiL._UL1500_.jpg","thumb":"http://ecx.images-amazon.com/images/I/41FGlhksmtL._SR38,50_.jpg","main":{"http://ecx.images-amazon.com/images/I/81iXQbkcpiL._UX466_.jpg":["466","606"],"http://ecx.images-amazon.com/images/I/81iXQbkcpiL._UX522_.jpg":["522","679"],"http://ecx.images-amazon.com/images/I/81iXQbkcpiL._UY550_.jpg":["423","550"],"http://ecx.images-amazon.com/images/I/81iXQbkcpiL._UX342_.jpg":["342","445"],"http://ecx.images-amazon.com/images/I/81iXQbkcpiL._UY500_.jpg":["385","500"]}},{"large":"http://ecx.images-amazon.com/images/I/41XR9o0cV-L.jpg","variant":"BACK","hiRes":"http://ecx.images-amazon.com/images/I/81bVmFiRu0L._UL1500_.jpg","thumb":"http://ecx.images-amazon.com/images/I/41XR9o0cV-L._SR38,50_.jpg","main":{"http://ecx.images-amazon.com/images/I/81bVmFiRu0L._UY500_.jpg":["385","500"],"http://ecx.images-amazon.com/images/I/81bVmFiRu0L._UX522_.jpg":["522","679"],"http://ecx.images-amazon.com/images/I/81bVmFiRu0L._UX342_.jpg":["342","445"],"http://ecx.images-amazon.com/images/I/81bVmFiRu0L._UX466_.jpg":["466","606"],"http://ecx.images-amazon.com/images/I/81bVmFiRu0L._UY550_.jpg":["423","550"]}}],"Heather 
Grey":[{"large":"http://ecx.images-amazon.com/images/I/41f-8R8Eu-L.jpg","variant":"MAIN","hiRes":"http://ecx.images-amazon.com/images/I/81dTYkBL%2BxL._UL1500_.jpg","thumb":"http://ecx.images-amazon.com/images/I/41f-8R8Eu-L._SR38,50_.jpg","main":{"http://ecx.images-amazon.com/images/I/81dTYkBL%2BxL._UX466_.jpg":["466","606"],"http://ecx.images-amazon.com/images/I/81dTYkBL%2BxL._UY500_.jpg":["385","500"],"http://ecx.images-amazon.com/images/I/81dTYkBL%2BxL._UY550_.jpg":["423","550"],"http://ecx.images-amazon.com/images/I/81dTYkBL%2BxL._UX522_.jpg":["522","679"],"http://ecx.images-amazon.com/images/I/81dTYkBL%2BxL._UX342_.jpg":["342","445"]}},{"large":"http://ecx.images-amazon.com/images/I/41gLiFBbcdL.jpg","variant":"BACK","hiRes":"http://ecx.images-amazon.com/images/I/81ua3AXCpJL._UL1500_.jpg","thumb":"http://ecx.images-amazon.com/images/I/41gLiFBbcdL._SR38,50_.jpg","main":{"http://ecx.images-amazon.com/images/I/81ua3AXCpJL._UX342_.jpg":["342","445"],"http://ecx.images-amazon.com/images/I/81ua3AXCpJL._UY550_.jpg":["423","550"],"http://ecx.images-amazon.com/images/I/81ua3AXCpJL._UY500_.jpg":["385","500"],"http://ecx.images-amazon.com/images/I/81ua3AXCpJL._UX522_.jpg":["522","679"],"http://ecx.images-amazon.com/images/I/81ua3AXCpJL._UX466_.jpg":["466","606"]}}],"Black":[{"large":"http://ecx.images-amazon.com/images/I/41BxSpfEM7L.jpg","variant":"MAIN","hiRes":"http://ecx.images-amazon.com/images/I/81%2BTW8762BL._UL1500_.jpg","thumb":"http://ecx.images-amazon.com/images/I/41BxSpfEM7L._SR38,50_.jpg","main":{"http://ecx.images-amazon.com/images/I/81%2BTW8762BL._UY550_.jpg":["423","550"],"http://ecx.images-amazon.com/images/I/81%2BTW8762BL._UX342_.jpg":["342","445"],"http://ecx.images-amazon.com/images/I/81%2BTW8762BL._UX522_.jpg":["522","679"],"http://ecx.images-amazon.com/images/I/81%2BTW8762BL._UY500_.jpg":["385","500"],"http://ecx.images-amazon.com/images/I/81%2BTW8762BL._UX466_.jpg":["466","606"]}},{"large":"http://ecx.images-amazon.com/images/I/41Gf%2BW-cPTL.jpg","vari
ant":"BACK","hiRes":"http://ecx.images-amazon.com/images/I/81SJwuaCspL._UL1500_.jpg","thumb":"http://ecx.images-amazon.com/images/I/41Gf%2BW-cPTL._SR38,50_.jpg","main":{"http://ecx.images-amazon.com/images/I/81SJwuaCspL._UY500_.jpg":["385","500"],"http://ecx.images-amazon.com/images/I/81SJwuaCspL._UX522_.jpg":["522","679"],"http://ecx.images-amazon.com/images/I/81SJwuaCspL._UX342_.jpg":["342","445"],"http://ecx.images-amazon.com/images/I/81SJwuaCspL._UX466_.jpg":["466","606"],"http://ecx.images-amazon.com/images/I/81SJwuaCspL._UY550_.jpg":["423","550"]}}],"White":[{"large":"http://ecx.images-amazon.com/images/I/41tElK2wPKL.jpg","variant":"MAIN","hiRes":"http://ecx.images-amazon.com/images/I/81kKgU75rIL._UL1500_.jpg","thumb":"http://ecx.images-amazon.com/images/I/41tElK2wPKL._SR38,50_.jpg","main":{"http://ecx.images-amazon.com/images/I/81kKgU75rIL._UY550_.jpg":["423","550"],"http://ecx.images-amazon.com/images/I/81kKgU75rIL._UX522_.jpg":["522","679"],"http://ecx.images-amazon.com/images/I/81kKgU75rIL._UY500_.jpg":["385","500"],"http://ecx.images-amazon.com/images/I/81kKgU75rIL._UX342_.jpg":["342","445"],"http://ecx.images-amazon.com/images/I/81kKgU75rIL._UX466_.jpg":["466","606"]}},{"large":"http://ecx.images-amazon.com/images/I/31lEDIs4cqL.jpg","variant":"BACK","hiRes":"http://ecx.images-amazon.com/images/I/81OBgvbUR7L._UL1500_.jpg","thumb":"http://ecx.images-amazon.com/images/I/31lEDIs4cqL._SR38,50_.jpg","main":{"http://ecx.images-amazon.com/images/I/81OBgvbUR7L._UX466_.jpg":["466","606"],"http://ecx.images-amazon.com/images/I/81OBgvbUR7L._UX342_.jpg":["342","445"],"http://ecx.images-amazon.com/images/I/81OBgvbUR7L._UX522_.jpg":["522","679"],"http://ecx.images-amazon.com/images/I/81OBgvbUR7L._UY500_.jpg":["385","500"],"http://ecx.images-amazon.com/images/I/81OBgvbUR7L._UY550_.jpg":["423","550"]}}],"Deep Blue 
Sea":[{"large":"http://ecx.images-amazon.com/images/I/41oNq3KmSGL.jpg","variant":"MAIN","hiRes":"http://ecx.images-amazon.com/images/I/81MtZtmxVLL._UL1500_.jpg","thumb":"http://ecx.images-amazon.com/images/I/41oNq3KmSGL._SR38,50_.jpg","main":{"http://ecx.images-amazon.com/images/I/81MtZtmxVLL._UX342_.jpg":["342","445"],"http://ecx.images-amazon.com/images/I/81MtZtmxVLL._UX522_.jpg":["522","679"],"http://ecx.images-amazon.com/images/I/81MtZtmxVLL._UY550_.jpg":["423","550"],"http://ecx.images-amazon.com/images/I/81MtZtmxVLL._UY500_.jpg":["385","500"],"http://ecx.images-amazon.com/images/I/81MtZtmxVLL._UX466_.jpg":["466","606"]}},{"large":"http://ecx.images-amazon.com/images/I/41AJgd1OuYL.jpg","variant":"BACK","hiRes":"http://ecx.images-amazon.com/images/I/81uLEksrYFL._UL1500_.jpg","thumb":"http://ecx.images-amazon.com/images/I/41AJgd1OuYL._SR38,50_.jpg","main":{"http://ecx.images-amazon.com/images/I/81uLEksrYFL._UX342_.jpg":["342","445"],"http://ecx.images-amazon.com/images/I/81uLEksrYFL._UY500_.jpg":["385","500"],"http://ecx.images-amazon.com/images/I/81uLEksrYFL._UX522_.jpg":["522","679"],"http://ecx.images-amazon.com/images/I/81uLEksrYFL._UX466_.jpg":["466","606"],"http://ecx.images-amazon.com/images/I/81uLEksrYFL._UY550_.jpg":["423","550"]}}],"Sea 
Glass":[{"large":"http://ecx.images-amazon.com/images/I/418vg-re8oL.jpg","variant":"MAIN","hiRes":"http://ecx.images-amazon.com/images/I/81YgtD-bEwL._UL1500_.jpg","thumb":"http://ecx.images-amazon.com/images/I/418vg-re8oL._SR38,50_.jpg","main":{"http://ecx.images-amazon.com/images/I/81YgtD-bEwL._UX342_.jpg":["342","445"],"http://ecx.images-amazon.com/images/I/81YgtD-bEwL._UX522_.jpg":["522","679"],"http://ecx.images-amazon.com/images/I/81YgtD-bEwL._UX466_.jpg":["466","606"],"http://ecx.images-amazon.com/images/I/81YgtD-bEwL._UY500_.jpg":["385","500"],"http://ecx.images-amazon.com/images/I/81YgtD-bEwL._UY550_.jpg":["423","550"]}},{"large":"http://ecx.images-amazon.com/images/I/41lcpC41VSL.jpg","variant":"BACK","hiRes":"http://ecx.images-amazon.com/images/I/814%2B6ZLwIxL._UL1500_.jpg","thumb":"http://ecx.images-amazon.com/images/I/41lcpC41VSL._SR38,50_.jpg","main":{"http://ecx.images-amazon.com/images/I/814%2B6ZLwIxL._UY500_.jpg":["385","500"],"http://ecx.images-amazon.com/images/I/814%2B6ZLwIxL._UX342_.jpg":["342","445"],"http://ecx.images-amazon.com/images/I/814%2B6ZLwIxL._UX522_.jpg":["522","679"],"http://ecx.images-amazon.com/images/I/814%2B6ZLwIxL._UX466_.jpg":["466","606"],"http://ecx.images-amazon.com/images/I/814%2B6ZLwIxL._UY550_.jpg":["423","550"]}}]};
data["heroImage"] = {};
data["landingAsinColor"] = 'Coral Dreams';
data["shouldApplyResizeFix"] = false;
return data;
});
</script>
The filenames I want to grab don't have src (i.e. http://ecx.images-amazon.com/images/I/81%2BTW8762BL._UY500_.jpg) In this case, the array is called data["colorImages"]. But I can't hard-code anything because the same thing happens on eBay.
The filenames I need here are in enImgCarousel.
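For the Amazon case, one possible direction is to skip XPath for these images and pull the JSON literal out of the script text instead. The sketch below is in JavaScript for illustration; the same idea (a regular expression followed by a JSON parser) should carry over to Ruby. `extractHiResImages` is a hypothetical helper, the sample is a trimmed version of the `data["colorImages"]` assignment quoted above, and the pattern assumes no `};` occurs inside the JSON itself.

```javascript
// Pulls the object assigned to data["colorImages"] out of inline script
// text and returns every "hiRes" URL it contains.
function extractHiResImages(scriptText) {
  const match = scriptText.match(/data\["colorImages"\]\s*=\s*(\{.*?\});/s);
  if (!match) {
    return [];
  }
  const colorImages = JSON.parse(match[1]);
  return Object.values(colorImages) // one array of images per color
    .flat()
    .map((image) => image.hiRes)
    .filter(Boolean);
}

// Trimmed sample based on the script quoted above.
const sampleScript = [
  "data[\"customerImages\"] = eval('[]');",
  'data["colorImages"] = {"Black":[{"variant":"MAIN","hiRes":"http://ecx.images-amazon.com/images/I/81%2BTW8762BL._UL1500_.jpg"}]};',
  'data["heroImage"] = {};',
].join("\n");

console.log(extractHiResImages(sampleScript));
```

This avoids running the page's JavaScript at all, but it is tied to the variable name, so each site (eBay included) would need its own pattern.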
On a side note, when I use the following JavaScript bookmarklet for each URL to get images, I'm able to get the correct images:
a='';
for (b=0;b<document.images.length;b++){
a+='<img src='+document.images[b].src+'><br>'};
if(a!=''){
document.write('<center>'+a+'</center>');
void(document.close())
}else{
alert('No images!')
}
Back to Nokogiri and XPath, I've also tried:
tmp2.xpath("//img").each do |src|...
and
tmp2.xpath("html//img").each do |src|
Any ideas how I should do this or which direction to go in?
This is an alternative way to solve your problem: you can use Capybara and Poltergeist.
With this solution you shouldn't have to dig into the JavaScript yourself.
If you are scraping, I recommend considering Capybara with Poltergeist; there are many resources available for reference.
This is the code I tried:
require 'capybara'
require 'capybara/dsl'
require 'capybara/poltergeist'
require 'nokogiri'

include Capybara::DSL # makes visit and page available at the top level

Capybara.register_driver :poltergeist_debug do |app|
  Capybara::Poltergeist::Driver.new(app, inspector: true)
end
Capybara.javascript_driver = :poltergeist_debug
Capybara.current_driver = :poltergeist_debug

# Amazon case: let the headless browser run the page's JavaScript, then parse
visit('https://www.amazon.com/dp/B00T46V758/?tag=stackoverfl08-20')
doc_amazon = Nokogiri::HTML.parse(page.html)
doc_amazon.xpath("//img/@src").each do |src|
  p src.value
end

# eBay case
visit('https://www.ebay.com/itm/Summer-Women-Casual-Chiffon-Loose-Tops-Batwing-Short-Sleeve-Loose-T-Shirt-Blouse-/351411949784?pt=LH_DefaultDomain_0&var=&hash=item51d1c8d0d8')
doc_ebay = Nokogiri::HTML.parse(page.html)
doc_ebay.xpath("//img/@src").each do |src|
  p src.value
end
If you want to dig into it:
doc_amazon.xpath("//div[@id='imgTagWrapperId']/img").attribute('src').value
# => "https://images-na.ssl-images-amazon.com/images/I/81%2BTW8762BL._UX453_.jpg"
doc_ebay.xpath("//div[@id='mainImgHldr']/img[@id='icImg']").attribute('src').value
# => "https://i.ebayimg.com/images/g/dtAAAOSwpdpVZuU~/s-l300.jpg"
Are you trying to generate a database of competitors items with pricing, etc.?
Are you trying to grab entire categories or individual sellers?
The reason why I ask is you can get an RSS feed of items each seller lists if they have turned that feature on. This way, you do not have to waste time scraping a page when you can get the central data from an RSS feed.
When parsing webpages, depending upon where you are in the webpage (you mentioned carousel) the indices you are encountering are from the stash of thumbnails representing the larger images.
I recommend looking at the eBay API and the Amazon API and finding the RSS feeds for the sellers first.
As far as getting past any Javascript issues, the webpage loads rotating slideshows and carousels dynamically, so you will have to use Mechanize (as RAJ suggested above) or Beautiful Soup or Selenium to get fully rendered web pages in which all images are in a scrapable state.
Feel free to post your source if there is anything else I can help with.
Sorry, I am posting this answer from a mobile phone, so I can't write the full code right away, but I can point you in a direction. You should use Mechanize with selenium-webdriver and watir instead of only Nokogiri.
Using Mechanize, you will be able to handle elements coming from JavaScript. You can mock the actual moves on browser i.e. you can code for clicking on links/buttons, you can wait for image load and then can scrape it. And all this can be done using Mechanize very easily.
