Why is this XPath not working? - parsing

For example, this HTML:
<div>
<span></span> I want to find this <b>this works ok</b>.
</div>
I want to find a DIV containing "I want to find this" and then grab the whole text inside that DIV, including child elements.
My XPath, //*[contains(text(), 'I want to find this')], does not work at all.
If I use //*[contains(text(), 'this works')] it works, but I want to find any DIV based on the text "I want to find this".
However, if I remove the <span></span> from that HTML, it works. Why is that?

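text() selects all of the div's text-node children, but in XPath 1.0 contains() converts that node-set to a string using only the first text node in document order. In your HTML that first node is the line break before <span>, so the search fails; without the <span>, the phrase is part of the first text node, which is why it then works. You can replace text() with . to search the current node's full string value instead: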
//div[contains(., 'I want to find this')]
This searches the concatenation of all text nodes inside the current node, including those inside child elements.
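To see why the <span> matters, here is a minimal sketch that emulates the two XPath 1.0 rules with Python's standard xml.etree.ElementTree (whose own XPath subset has no contains(), so the rules are written out by hand). The helper names are hypothetical; the point is only that contains(text(), ...) looks at the first text node, while contains(., ...) looks at all of them.

```python
import xml.etree.ElementTree as ET

# The HTML from the question; it happens to be well-formed XML as well.
SNIPPET = "<div>\n<span></span> I want to find this <b>this works ok</b>.\n</div>"


def direct_text_nodes(elem):
    """Text-node children of elem, i.e. what the XPath step text() selects."""
    nodes = [elem.text] + [child.tail for child in elem]
    return [n for n in nodes if n]  # drop None / empty strings


def contains_text_xpath1(elem, needle):
    """Emulates contains(text(), needle): XPath 1.0 turns the node-set into a
    string by taking only its FIRST node in document order."""
    nodes = direct_text_nodes(elem)
    first = nodes[0] if nodes else ""
    return needle in first


def contains_dot(elem, needle):
    """Emulates contains(., needle): the string value of an element is the
    concatenation of every text node inside it, children included."""
    return needle in "".join(elem.itertext())


div = ET.fromstring(SNIPPET)
print(direct_text_nodes(div))  # ['\n', ' I want to find this ', '.\n']
print(contains_text_xpath1(div, "I want to find this"))  # False
print(contains_dot(div, "I want to find this"))          # True

# Without the <span>, the phrase is part of the first text node, so it matches.
no_span = ET.fromstring("<div>\n I want to find this <b>this works ok</b>.\n</div>")
print(contains_text_xpath1(no_span, "I want to find this"))  # True
```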
To grab all the text, if you are using lxml, you can use node.itertext() to iterate over all inner text nodes:
from lxml import etree
html = """
<div>
<span></span> I want to find this <b>this works ok</b>.
</div>
"""
root = etree.fromstring(html, etree.HTMLParser())
for div in root.xpath('//div[contains(., "I want to find this")]'):
    # join every text node in the div, then trim the surrounding line breaks
    print(''.join(div.itertext()).strip())
# => I want to find this this works ok.

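Try using //*[text()=' I want to find this ']. This selects the div, because in XPath 1.0 comparing a node-set with = is true if any of its text nodes matches (here, the one after <span>); note that the match is exact, spaces included. Then you can use the getText() method to get the text.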
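The = comparison behaves differently from contains(): XPath 1.0 tests every node in the node-set and succeeds if any one matches. A minimal standard-library sketch of that rule (helper names are hypothetical), which also shows why the leading and trailing spaces in the expression matter:

```python
import xml.etree.ElementTree as ET


def direct_text_nodes(elem):
    """Text-node children of elem (what the XPath step text() selects)."""
    nodes = [elem.text] + [child.tail for child in elem]
    return [n for n in nodes if n]


def text_equals_xpath1(elem, value):
    """Emulates [text()=value]: comparing a node-set with a string is true
    if ANY node in the set has exactly that string value."""
    return any(node == value for node in direct_text_nodes(elem))


div = ET.fromstring("<div>\n<span></span> I want to find this <b>this works ok</b>.\n</div>")
print(text_equals_xpath1(div, " I want to find this "))  # True
print(text_equals_xpath1(div, "I want to find this"))    # False: spaces matter
```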

You can also try replacing text() with string(), which with no argument converts the whole current node, just like .:
//div[contains(string(), " I want to find this")]
Or you can check whether the text node that follows the span contains the text:
//div[contains(span/following-sibling::text(), " I want to find this")]

Related

Google Sheets IMPORTXML, text ONLY for xpath with text inside <a href> tags

I'm trying to import just the text from a div on a client's site into Google Sheets using the =IMPORTXML function, so they can see everything on one sheet. The problem is that some pages have <a href> tags wrapped around parts of the text, and using =IMPORTXML(website, "[xpath]/text()") gives me an error about the array overwriting the next cell. So I tried some tricks from around the web (wrapping in =REGEXREPLACE, =JOIN, etc.), and those got me the text of the div minus the text of its children.
For example, if I have this HTML:
<div class="text">
I want to get this text and
this text, too
so what do I do?
</div>
In my sheet I get "I want to get this text and so what do I do?"
I found the solution for anyone else trying to do this:
=JOIN(CHAR(10),IMPORTXML("site","//div[@class='divclass']//text()"))
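My mistake was using /text() instead of //text(): /text() selects only the text nodes directly inside the div, while //text() also selects the text inside child elements such as <a>.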
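The difference between the two steps can be sketched outside Sheets with Python's standard ElementTree. The markup below is hypothetical (a div whose middle phrase sits inside a link), and joining with "\n" stands in for JOIN(CHAR(10), ...):

```python
import xml.etree.ElementTree as ET

# Hypothetical markup in the shape of the question: part of the text is a link.
HTML = ('<div class="text">I want to get this text and '
        '<a href="#">this text, too</a> so what do I do?</div>')


def child_text(elem):
    """Like /text(): only the text nodes that are direct children of elem."""
    nodes = [elem.text] + [child.tail for child in elem]
    return [n for n in nodes if n]


def descendant_text(elem):
    """Like //text(): every text node below elem, in document order."""
    return list(elem.itertext())


div = ET.fromstring(HTML)
print("\n".join(child_text(div)))       # the link text is missing
print("\n".join(descendant_text(div)))  # the link text is included
```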

Is it possible to Html.Hidden a text by its id?

Is it possible to @Html.Hidden a text value by id?
In the following example, the text of someText is changed by JavaScript.
<div id="someText">1</div>
I would like to add a hidden value that gets the div's text. Is it possible?
For example:
@Html.Hidden("Position", GetTextById("#someText"))
Sure you can!
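Since @Html.Hidden is rendered on the server, it cannot read a value that JavaScript changes later in the browser, but you can add the equivalent hidden input on the client side. Let's say you want it inside your HTML body: append it with jQuery, reading the div with .text() (a div has no value for .val() to return):
$('body').append($('<input>', { type: 'hidden', id: 'Position', name: 'Position', value: $('#someText').text() }));
If the div's text changes again later, update the field with $('#Position').val($('#someText').text());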
Alternatively, if your goal is just to hide the div, you could do:
$("#someText").hide();

Display raw text from custom field in Drupal

I'm trying to render a block's field as plain text because I need to use it as part of the HTML. I've tried using |raw, but I read it was unstable, and it didn't work anyway!
This is my existing HTML (minified):
Read More
However I would like to make it more useable
Read More
This would mean that when a user modifies the block's HEX code, the color of the box changes. However, the issue is that when it's printed on the page, it looks like this:
<div data-quickedit-field-id="#" class="field field--name-field-color field--type-string field--label-hidden field--item quickedit-field">FFFFFF</div>
The only thing I would like printed is "FFFFFF", with no divs.
Here is my question: how do I display my field_color as plain text when it prints?
You can use |raw: {{ content.field_color|raw }}.
If you need more information, please ask.
I suggest you inspect your content.field_color variable with dump or kint; that should show you its structure and point you to the answer.
Anyway, we have something similar in our project, and the way we do it is by calling the field item's .getString() method:
{% set image_align = content.field_image_align['#items'][0].getString() %}
<div class="{{ image_align }}">
Our field is a list of values, so we take its first item ([0]); you'll have to find the right array item in your own field to call .getString() on.

Extract text from specific HTML location across multiple pages

I have been experimenting with Jericho HTML Parser and Selenium IDE for the purpose of extracting text from a specific location inside HTML across multiple pages.
I haven't found a simple example of how to do this, and I don't know Java.
For every HTML page in a folder, I would like to extract whatever text is in the 1st table, 4th row, 1st div:
<table>
<tr class="abc"><td class="xyz"><div align="center">The Text I don't want</div></td></tr>
<tr class="abc"><td class="xyz"><div align="center">The Text I don't want</div></td></tr>
<tr class="abc"><td class="xyz"><div align="center">The Text I don't want</div></td></tr>
<tr class="abc"><td class="xyz"><div align="center">The Text I want</div></td></tr>
</table>
And print the selected text to a txt file in a list like this:
The Text I want
Another Text I want
All the source files are stored locally and may contain bad HTML, so I figured Jericho might be best for this purpose. However, I'm happy to learn any method that achieves the desired result.
Well, in the end I went with BeautifulSoup and used a Python script along these lines:
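import glob
from bs4 import BeautifulSoup

# 'pages/*.html' and 'result.txt' are placeholder paths
with open('result.txt', 'w') as result_file:
    for html_pathname in glob.glob('pages/*.html'):
        # open source html file
        with open(html_pathname, 'r') as html_file:
            # using the BeautifulSoup module, parse the html tag tree
            soup = BeautifulSoup(html_file, 'html.parser')
        # find according to your criteria: 1st table, 4th tr, 1st div
        div = soup.find('table').find_all('tr')[3].div
        # write the found text (including any nested tags' text) to result txt
        print(' - writing to result txt')
        result_file.write(div.get_text() + '\n')
        print(' - ok!')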
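If installing BeautifulSoup is not an option, the same lookup can be sketched with only Python's standard html.parser module. This is a swapped-in technique, not what the answer above used; the class and function names are hypothetical, and the sketch assumes the pages have no nested tables.

```python
from html.parser import HTMLParser


class RowDivTextParser(HTMLParser):
    """Collects the text of the first <div> in the Nth <tr> of the first <table>.

    Assumes no nested tables; html.parser tolerates unclosed or stray tags.
    """

    def __init__(self, row_number=4):
        super().__init__()
        self.row_number = row_number  # 1-based row to look in
        self.tables_seen = 0
        self.in_first_table = False
        self.rows_seen = 0
        self.div_depth = 0            # > 0 while inside the target <div>
        self.found = False            # True once the target <div> has opened
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.tables_seen += 1
            self.in_first_table = self.tables_seen == 1
        elif tag == "tr" and self.in_first_table:
            self.rows_seen += 1
        elif tag == "div":
            if self.div_depth:
                self.div_depth += 1   # a <div> nested inside the target
            elif (self.in_first_table and not self.found
                  and self.rows_seen == self.row_number):
                self.found = True
                self.div_depth = 1

    def handle_endtag(self, tag):
        if tag == "table":
            self.in_first_table = False
        elif tag == "div" and self.div_depth:
            self.div_depth -= 1

    def handle_data(self, data):
        if self.div_depth:
            self.parts.append(data)


def extract_text(html, row_number=4):
    """Return the stripped text of the target <div>, or None if it is missing."""
    parser = RowDivTextParser(row_number)
    parser.feed(html)
    parser.close()
    return "".join(parser.parts).strip() if parser.found else None


SAMPLE = """<table>
<tr class="abc"><td class="xyz"><div align="center">The Text I don't want</div></td></tr>
<tr class="abc"><td class="xyz"><div align="center">The Text I don't want</div></td></tr>
<tr class="abc"><td class="xyz"><div align="center">The Text I don't want</div></td></tr>
<tr class="abc"><td class="xyz"><div align="center">The Text I want</div></td></tr>
</table>"""

print(extract_text(SAMPLE))  # The Text I want
```

To cover a whole folder, you could loop over glob.glob('some_folder/*.html') (the folder name is a placeholder), read each file, and write every result that is not None on its own line.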

Append to collapsible content after the header?

I am running into a problem with append. I have a dynamic collapsible, which I fill with a dynamic list, and I want to append this list right after the collapsible's h3 header.
When I append it to the collapsible, it does not appear inside the
<div class="ui-collapsible-content ui-collapsible-content-collapsed"> </div>
but after it. Therefore I get a gap between the content header and the list, which I want to avoid.
I tried this:
$('some-selector > ui-collapsible-content ui-collapsible-content-collapsed') but it does not work.
Any hints?
If you are trying to append inside
<div class="ui-collapsible-content ui-collapsible-content-collapsed"> </div>
Then you should use:
$('div.ui-collapsible-content.ui-collapsible-content-collapsed').append($content);
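where $content is a jQuery object, a DOM element or an HTML string. Note the . before each class name in the selector: it marks a class, and writing the two classes with no space between them matches an element that has both. Without the dots, ui-collapsible-content is treated as a tag name, and the space turns the second name into a descendant selector.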
See append.
