XPath Node selection - parsing

I am using HtmlAgilityPack to parse data for a Windows Phone 8 app. I have managed to parse four nodes, but I am having difficulty with the final one.
Game newGame = new Game();
newGame.Title = div.SelectSingleNode(".//section//h3").InnerText.Trim();
newGame.Cover = div.SelectSingleNode(".//section//img").Attributes["src"].Value;
newGame.Summary = div.SelectSingleNode(".//section//p").InnerText.Trim();
newGame.StoreLink = div.SelectSingleNode(".//img[@class='Store']").Attributes["src"].Value;
newGame.Logo = div.SelectSingleNode(".//div[@class='text-col']").FirstChild.Attributes["src"].Value;
That last piece of code is the one I am having problems with. The HTML on the website looks like this (simplified to the data I need):
<div id= "ContentBlockList" class="tier ">
<section>
<div class="left-side"><img src="newGame.Cover"></div>
<div class="text-col">
<img src="newGame.Logo http://url.png" />
<h3>newGame.Title</h3>
<p>newGame.Summary</p>
<img src="newGame.StoreLink" class="Store" />
</div>
</div>
</section>
As you can see, I need to parse two images from this block of HTML. This code seems to take the first img src and use it correctly for the game cover:
newGame.Cover = div.SelectSingleNode(".//section//img").Attributes["src"].Value;
However, I'm not sure how to get the second img src to retrieve the store Logo. Any ideas?

newGame.Logo = div.SelectSingleNode(".//img[2]").Attributes["src"].Value;
You didn't post the entire thing, but this should do the trick.

You can try this:
newGame.Logo = div.SelectSingleNode("(.//img)[2]")
.GetAttributeValue("src", "");
GetAttributeValue() is preferable to Attributes["..."].Value: the latter throws an exception when the attribute is missing, whereas the former returns its second argument (an empty string in the example above).
Side note: your HTML markup is invalid as posted (some elements are not closed properly, <section> for example), which may cause confusion.
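The difference between the two XPath expressions above can be checked with a small sketch. Python's standard xml.etree.ElementTree is used here as a stand-in for HtmlAgilityPack (an assumption: it understands only a subset of XPath and needs well-formed markup, so the snippet is a cleaned-up version of the question's HTML with made-up file names). Under standard XPath rules, .//img[2] matches every img that is the second img child of its own parent, which in this markup is the Store image, while (.//img)[2] is the second img in document order, i.e. the logo:

```python
import xml.etree.ElementTree as ET

# Well-formed (XHTML-style) version of the markup in the question.
# ElementTree only parses valid XML, unlike HtmlAgilityPack.
HTML = """
<div id="ContentBlockList" class="tier">
  <section>
    <div class="left-side"><img src="cover.png" /></div>
    <div class="text-col">
      <img src="logo.png" />
      <h3>Title</h3>
      <p>Summary</p>
      <img src="store.png" class="Store" />
    </div>
  </section>
</div>
"""

root = ET.fromstring(HTML.strip())

# .//img[2] -> every <img> that is the 2nd <img> child of its own parent.
per_parent = [img.get("src") for img in root.findall(".//img[2]")]

# (.//img)[2] -> the 2nd <img> in document order. ElementTree does not
# accept the parenthesised form, so collect all matches and index instead.
second_overall = root.findall(".//img")[1].get("src")

# Attribute predicates use @; .get(name, default) returns the default
# instead of failing when the attribute is missing.
store = root.find(".//img[@class='Store']").get("src", "")
missing = root.find(".//h3").get("src", "")

print(per_parent)      # ['store.png']
print(second_overall)  # logo.png
print(store)           # store.png
print(repr(missing))   # ''
```

Element.get(name, default) plays the same role here as GetAttributeValue(name, default) does in HtmlAgilityPack.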

Related

Image doesn't show even when I can see it doing inspect in the browser

I have tried for more than two hours and cannot find an answer to this problem. I have this code in a strongly typed view in MVC:
<td>
@foreach (var TitleBook in Directory.GetFiles(Server.MapPath("~/App_Data/Images"), "*.jpg"))
{
var fileName = Path.GetFileName(TitleBook);
if (Convert.ToInt32(fileName.Substring(0,3)) == item.IdBook)
{
<img src="@TitleBook" alt="Alternate Text" height="100" width="100">
}
}
</td>
The name of the image file is created so the three first characters in the filename are numbers.
When I run the code, the browser shows the alternate text instead of the image. However, the path to the image is correct: I can see it when I inspect the element in the browser, and opening just the src value in another window shows the image as expected (sorry for my English). The CSS is handled by the version of Bootstrap installed with VS2017.
Could someone point to my error here?
Your TitleBook is a full physical path, but <img> needs a URL.
Untested:
<img src="~/App_Data/Images/@fileName" alt="Alternate Text" height="100" width="100">
BTW: with Directory.GetFiles(..., $"{item.IdBook:D3}*.jpg") you wouldn't need the filename check inside the foreach loop.
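A minimal sketch of the two ideas above, written in Python rather than C#/Razor: keep only the file name and prefix it with a URL path, and match files by a zero-padded prefix the way $"{item.IdBook:D3}*.jpg" does. The /Content/Images/ prefix is a hypothetical choice: by default ASP.NET does not serve anything under App_Data, so even a ~/App_Data/Images/ URL would return a 404 and the images would need to move to a public folder.

```python
import ntpath

# Hypothetical URL prefix for a publicly served folder (App_Data is
# blocked from web requests by ASP.NET).
IMAGE_URL_PREFIX = "/Content/Images/"

def image_urls_for_book(physical_paths, id_book):
    """Return browser-usable URLs for the images of one book."""
    # {:03d} zero-pads like C#'s {IdBook:D3}: 7 -> "007".
    prefix = f"{id_book:03d}"
    urls = []
    for path in physical_paths:
        # Keep only the file name; a physical directory such as C:\site\...
        # means nothing to the browser. ntpath splits Windows paths on any OS.
        name = ntpath.basename(path)
        # Case-insensitive extension check, like Windows pattern matching.
        if name.startswith(prefix) and name.lower().endswith(".jpg"):
            urls.append(IMAGE_URL_PREFIX + name)
    return urls

paths = [
    r"C:\site\Content\Images\007-cover.jpg",
    r"C:\site\Content\Images\012-cover.jpg",
]
print(image_urls_for_book(paths, 7))  # ['/Content/Images/007-cover.jpg']
```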

Get children of an XHPChild

I am trying to move my website to Hack and XHP, of course. Below is the structure I want to achieve:
<ui:backstageHeader>
<ui:backstageHeader-navItem href="/">stories</ui:backstageHeader-navItem>
<ui:backstageHeader-navItem href="/story/send">send a story</ui:backstageHeader-navItem>
<ui:backstageHeader-navItem href="/aboutus">support</ui:backstageHeader-navItem>
</ui:backstageHeader>
(Note: :ui:backstageHeader-navItem basically renders to <a href={$this->:href}>{$this->getChildren()}</a>, so there is no need to include its class here.)
Below is the code for :ui:backstageHeader:
final class :ui:backstageHeader extends :ui:base {
attribute :div;
children (:ui:backstageHeader-navItem)*;
protected function compose() {
$dom =
<section class="backstage-header">
<div class="container">
<div class="cell-logo">
<a href="/">
<span class="no23-logo-white"></span>
</a>
</div>
<div class="cell-navigation">
</div>
<div class="cell-account">
<div class="cell-login">
<div id="siteNav-login">Autentificare</div>
</div>
</div>
</div>
</section>;
$mainContainer = $dom->getChildren("div")[0];
$cellNavigation = $mainContainer->getChildren("div")[1];
$navItems = <ul class="main-navigation"></ul>;
foreach($this->getChildren() as $child) {
$navItems->appendChild(<li>{$child}</li>);
}
$dom->appendChild($navItems);
return $dom;
}
}
I used the terminal to debug my code with hhvm -m d <file.php>, and everything was fine there; however, in the browser I get a 500 error. This is what the log says:
Catchable fatal error: Hack type error: Could not find method getChildren in an object of type XHPChild at /var/www/res/ui/backstage-header.php line 25
The error comes from
$cellNavigation = $mainContainer->getChildren("div")[1];
However, I still need to append ul.main-navigation to div.cell-navigation inside section.backstage-header.
How can I do it?
Don't structure your code this way. Build it up from the inside out, so that you don't have to make a ton of unreadable getChildren calls looking for specific children. Those calls are super hard to read, and super inflexible when you change the structure of your XHP. You wouldn't do something like node.firstChild.lastChild.lastChild.firstChild in the JS DOM, would you? No, in JS there's a better way: finding things by class or ID. In XHP, you can just build the tree the right way in the first place!
I'd give you an example of this, but it doesn't look like you actually use $mainContainer or $cellNavigation, so you can just remove those two problematic definitions.
As an aside, you really shouldn't be getting your type errors as catchable fatals from HHVM; this is a last resort sort of check. Try running the hh_client checker directly, maybe even showing its result in your IDE; it will give you a much faster iteration cycle, and much more information than HHVM provides.
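To illustrate building from the inside out, here is a rough Python analogue using xml.etree.ElementTree instead of XHP (the helpers h() and nav_item() are hypothetical and exist only for this sketch). The navigation list is built first and placed inside cell-navigation while the outer tree is constructed, so no child lookup is needed afterwards:

```python
import xml.etree.ElementTree as ET

def h(tag, attrs=None, *children):
    """Hypothetical helper: create an element and attach its children."""
    el = ET.Element(tag, attrs or {})
    for child in children:
        el.append(child)
    return el

def nav_item(href, label):
    """Hypothetical stand-in for :ui:backstageHeader-navItem."""
    a = ET.Element("a", {"href": href})
    a.text = label
    return a

def backstage_header(nav_items):
    # Innermost part first: wrap every nav item in an <li>.
    nav = h("ul", {"class": "main-navigation"},
            *[h("li", None, item) for item in nav_items])
    # Then put it where it belongs while building the outer tree,
    # instead of searching for div.cell-navigation afterwards.
    return h("section", {"class": "backstage-header"},
             h("div", {"class": "container"},
               h("div", {"class": "cell-logo"}),
               h("div", {"class": "cell-navigation"}, nav),
               h("div", {"class": "cell-account"})))

header = backstage_header([nav_item("/", "stories"),
                           nav_item("/story/send", "send a story"),
                           nav_item("/aboutus", "support")])
print(ET.tostring(header, encoding="unicode"))
```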
From my experience, appendChild is very prone to human error. It's easier to do something like:
$items = (new Vector($this->getChildren()))->map($child ==> <li>{$child}</li>);
return <div id="container">{$items}</div>;
This wraps each child in an <li />.
I haven't tested it, but it should be close.
Pro tip: You can assign variables from within an XHP tree.
$root =
<div>
{$child = <span>
Text children
</span>}
</div>;
Now $child is already set to the <span> element.
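Python 3.8's assignment expressions allow the same trick. This sketch uses ElementTree and a hypothetical h() helper rather than XHP:

```python
import xml.etree.ElementTree as ET

def h(tag, *children, text=None):
    """Hypothetical helper: element with optional text and child elements."""
    el = ET.Element(tag)
    el.text = text
    el.extend(children)
    return el

# The assignment expression binds `child` while the tree is being built,
# much like {$child = <span>...</span>} in XHP.
root = h("div", (child := h("span", text="Text children")))

print(root[0] is child)                       # True
print(ET.tostring(root, encoding="unicode"))  # <div><span>Text children</span></div>
```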

Grails- How to display the last uploaded image?

I used to display more than one picture, so I used g:each. Now I need to display only the last uploaded picture. How should I change my code?
<g:each in ="${statusMessage?.fetchProductPictureUrls() }" var="picture">
<div class="feed-picture">
<div class="fl">
<img class="single" src="${picture }" alt="Product Picture">
</div>
</div>
</g:each>
There are many ways to do this; here are two:
1 - Filter the desired image in your controller, service or presenter and in your .gsp you just need to access your image variable.
2 - Use a tagLib to do this (not tested):
class LastImageTagLib {
static namespace = "last"
def image = { attr ->
//filter your image
def lastImage = attr.images.last()
out << "<img class='single' src='${lastImage}' alt='Product Picture'>"
}
}
And in your .gsp:
<div class="feed-picture">
<div class="fl">
<last:image images="${statusMessage?.fetchProductPictureUrls()}" />
</div>
</div>
I think the first option is better.
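For option 1, the filtering can be very small, as in the Python sketch below; in Grails the equivalent logic would live in the controller or service, and the .gsp would receive a single URL. It assumes the list is ordered oldest to newest, which the question does not confirm, and it guards against an empty or null list because Groovy's List.last() throws on an empty list:

```python
def last_picture_url(picture_urls):
    """Return the most recent picture URL, or None if there is none.

    Assumes the list is ordered oldest-to-newest.
    """
    # Covers both None and an empty list before indexing.
    if not picture_urls:
        return None
    return picture_urls[-1]

print(last_picture_url(["/img/1.png", "/img/2.png"]))  # /img/2.png
```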
Going by the code above, I am not sure whether the list returned by fetchProductPictureUrls is sorted.
If it is, you have a few options.
You can grab the last entry of the list by leveraging the .last() method.
http://groovy.codehaus.org/groovy-jdk/java/util/List.html
OR
You can get the list size and track the position with the status attribute of g:each, rendering only the final item.
http://grails.org/doc/latest/ref/Tags/each.html
Suggestion:
I would recommend storing images on disk or in the cloud rather than in the database.
A nice way to maintain these images is to create a domain. Here is a sample I use for S3 images.
class S3Image {
Date dateCreated
Date lastUpdated
String imageUrl
String imageName
static constraints = {
imageUrl(blank:false)
imageName(blank:false)
}
}
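With a domain class like this, "last uploaded" can be decided by dateCreated instead of by list order. The Python sketch below mirrors the fields with a dataclass (field names converted to snake_case; the selection logic is an illustration, not a GORM query, and the URLs are placeholders):

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass
class S3Image:
    # Mirrors the fields of the Grails domain class above.
    date_created: datetime
    last_updated: datetime
    image_url: str
    image_name: str

def latest_image(images):
    """Most recently created image, independent of list order."""
    return max(images, key=lambda img: img.date_created, default=None)

older = S3Image(datetime(2014, 1, 1), datetime(2014, 1, 1),
                "https://example.com/a.png", "a.png")
newer = S3Image(datetime(2014, 2, 1), datetime(2014, 2, 1),
                "https://example.com/b.png", "b.png")
print(latest_image([newer, older]).image_name)  # b.png
```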

BeautifulSoup: parse only part of the page

I want to parse part of an HTML page, for example:
my_string = """
<p>Some text. Some text. Some text. Some text. Some text. Some text.
Link1
Link2
</p>
<img src="image.png" />
<p>One more paragraph</p>
"""
I pass this string to BeautifulSoup:
soup = BeautifulSoup(my_string)
# add rel="nofollow" to <a> tags
# return comment to the template
But during parsing, BeautifulSoup adds <html>, <head> and <body> tags (with the lxml or html5lib parsers), and I don't need them in my output. The only way I've found so far to avoid this is to use html.parser.
I wonder whether there is a way to get rid of the redundant tags while using lxml, the fastest parser.
UPDATE
My original question was phrased incorrectly. I have now removed the <div> wrapper from my example, since users typically won't include that tag. For this reason we cannot use the .extract() method to get rid of the <html>, <head> and <body> tags.
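Use
soup.body.renderContents()
In bs4, renderContents() is kept only for BeautifulSoup 3 compatibility and returns bytes; soup.body.decode_contents() returns the same markup as a str.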
lxml will always add those tags, but you can use Tag.extract() to remove your <div> tag from inside them:
comment = soup.body.div.extract()
I could solve the problem using the .contents property:
def body_contents(soup):
    try:
        children = soup.body.contents
    except AttributeError:
        # soup.body is None when no <body> was added (e.g. with html.parser)
        return str(soup)
    string = ''
    for child in children:
        string += str(child)
    return string
I think ''.join(soup.body.contents) would be a neater way to convert the list to a string, but it does not work; I get
TypeError: sequence item 0: expected string, Tag found
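The TypeError occurs because str.join() accepts only strings, and soup.body.contents mixes strings with Tag objects; converting each item first, as in ''.join(str(c) for c in soup.body.contents), avoids it. The same wrap, modify, then serialize only the children approach is sketched below with Python's standard xml.etree.ElementTree as a stand-in for BeautifulSoup (an assumption: unlike lxml's HTML parser it requires well-formed markup, and the sample link is made up):

```python
import xml.etree.ElementTree as ET

def nofollow_fragment(fragment):
    """Add rel="nofollow" to links and return the fragment without wrapper tags."""
    # Wrap the fragment in a temporary <body> so it has a single root.
    body = ET.fromstring(f"<body>{fragment}</body>")
    for a in body.iter("a"):
        a.set("rel", "nofollow")
    # Serialize only what is inside <body>: its leading text plus every child.
    # tostring() includes each child's tail text, so nothing is lost.
    parts = [body.text or ""]
    parts.extend(ET.tostring(child, encoding="unicode") for child in body)
    return "".join(parts)

html = ('<p>Some text. <a href="/one">Link1</a></p>'
        '<img src="image.png" /><p>One more paragraph</p>')
print(nofollow_fragment(html))
```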

Graph background-image resizing without PHP

I've read several helpful answers regarding image resizing using PHP, max-height, etc.: Image resize script
However, my problem is that I want to resize an image of a graph that I retrieve from another site (USGS) and put into a site (Zenfolio) that supports HTML and JavaScript but not PHP. I have tried adjusting the specified height and width, but I keep ending up resizing only the visible area of the image on the page, not the image itself (sorry, I cannot post images as I am a new user).
I just posted them as png's above to demonstrate the problem, but the images are generated as follows:
<div id="riverlevels">
<center>
<div id="MyWidget" style="background-image:url(http://waterdata.usgs.gov/nwisweb/graph?agency_cd=USGS&site_no=12354500&parm_cd=00065&period=21);width:576px;Height:400px;">
<br/>
Montana River Photography </div>
</center>
</div>
</div>
This same image can be generated using this JavaScript, but for some reason that does not allow me to display more than one variable graph per page (I want to show both discharge (00060), and gage height (00065)):
<script type="text/javascript">
wStation = "12354500";
wDays = "21";
wType = "00065";
wWidth = "576px";
wHeight = "400px";
wFColor = "#000033";
wTitle = "";
document.write('<div id="gageheight"></div>');
document.write('<scr'+'ipt type="text/JavaScript" src="http://batpigandme.com/js/showstring.js"></scr'+'ipt>');
</script>
As you can tell, I have to use a separate site that I own to create the JavaScript file. The graphs are currently located in various iterations at:
montanariverphoto.com/test
clark fork gage height
I sincerely apologize if I have missed an obvious answer to this! I basically created this widget by reverse engineering a widget from another site, so perhaps my call is incorrect altogether.
Does it absolutely have to be a background image? Scaling background images is possible with background-size, but that property is not supported in Internet Explorer 8 and earlier. Your code would work almost as-is if you can use an image tag instead:
<img src="http://waterdata.usgs.gov/nwisweb/graph?agency_cd=USGS&site_no=12354500&parm_cd=00065&period=21" width="576" height="400" alt="..." />
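If the graph markup is generated rather than typed, both parameters mentioned in the question can be rendered as plain img tags. This Python sketch assumes the query parameters shown above (agency_cd, site_no, parm_cd, period) are all the graph needs; width and height default to the question's 576x400:

```python
from html import escape
from urllib.parse import urlencode

GRAPH_URL = "http://waterdata.usgs.gov/nwisweb/graph"

def gage_graph_img(site_no, parm_cd, period=21, width=576, height=400):
    """Return an <img> tag for one USGS graph, using the question's parameters."""
    query = urlencode({"agency_cd": "USGS", "site_no": site_no,
                       "parm_cd": parm_cd, "period": period})
    # escape() turns & into &amp;, the correct form inside an HTML attribute.
    src = escape(f"{GRAPH_URL}?{query}")
    return (f'<img src="{src}" width="{width}" height="{height}" '
            f'alt="USGS graph {parm_cd} for site {site_no}" />')

# Discharge (00060) and gage height (00065) can sit side by side, since
# plain <img> tags do not depend on any element id.
for code in ("00060", "00065"):
    print(gage_graph_img("12354500", code))
```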
For your other problem: IDs need to be unique on a page. In your code example you create a div with the ID gageheight, and this ID is hardcoded into your JavaScript file at http://batpigandme.com/js/showstring.js. Since only one element on the page can have this ID, repeating the code later on won't work. You'd need to change the script so that you can pass in the ID as a variable, something like:
wTitle = "";
wElement = "gageheight";
document.write('<div id="' + wElement + '"></div>');
document.write('<scr'+'ipt type="text/JavaScript" src="http://batpigandme.com/js/showstring.js"></scr'+'ipt>');
and then in your JS:
var myElement = document.getElementById(wElement);
var JavaScriptCode = document.createElement("script");
JavaScriptCode.setAttribute('type', 'text/javascript');
JavaScriptCode.setAttribute("src", 'http://batpigandme.com/js/data2.js');
myElement.appendChild(JavaScriptCode);
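If the widget markup is generated, each widget needs its own element ID. The Python sketch below emits one snippet per parameter; it assumes showstring.js has been changed to read wElement as suggested above (the IDs "gageheight" and "discharge" are illustrative). One further caution, hedged because data2.js is not shown: scripts inserted with appendChild load asynchronously, so if data2.js reads the same global variables, a later widget may have overwritten them by the time it runs.

```python
import json

SHOWSTRING_URL = "http://batpigandme.com/js/showstring.js"

def widget_snippet(element_id, station, w_type, days=21):
    """Markup for one graph widget with its own container id.

    Assumes showstring.js reads the wElement variable instead of the
    hardcoded "gageheight" id.
    """
    # json.dumps produces correctly quoted JavaScript string literals.
    settings = "\n".join([
        f"wStation = {json.dumps(station)};",
        f"wDays = {json.dumps(str(days))};",
        f"wType = {json.dumps(w_type)};",
        f"wElement = {json.dumps(element_id)};",
    ])
    return (f'<div id="{element_id}"></div>\n'
            f'<script type="text/javascript">\n{settings}\n</script>\n'
            f'<script type="text/javascript" src="{SHOWSTRING_URL}"></script>')

# One widget per parameter, each with a distinct id.
page = "\n".join([
    widget_snippet("gageheight", "12354500", "00065"),
    widget_snippet("discharge", "12354500", "00060"),
])
print(page)
```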

Resources