Say I have:
<h3></h3>
<h2></h2>
<p></p>
How can I get to the p node from the h3?
Right now all I can do is doc.css('h3').next_element, which doesn't take any arguments and returns the h2 tag.
Is there a way to check node types recursively, or is there a method I can call, for example doc.css('h3').next('p')?
P.S. Of course, the HTML I'm parsing is not as simple as the example above.
If you need only one element (not a collection), there is the at method.
Combine it with a selector that uses the general sibling combinator ~:
doc.at('h3 ~ p')
If you need a collection of all such p elements that come after an h3 sibling:
doc.css('h3 ~ p')
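To make the meaning of h3 ~ p concrete: it selects every p that has an earlier sibling h3 under the same parent, however many other elements (such as the h2) sit in between. Below is a minimal Python sketch of that rule. It assumes well-formed markup parsed with the standard library's ElementTree, which has no following-sibling axis, so it walks the siblings by hand. The helper name general_sibling_matches is hypothetical. For real-world HTML you would keep using Nokogiri as shown above.

```python
import xml.etree.ElementTree as ET


def general_sibling_matches(root, first, second):
    """Return elements tagged `second` that have an earlier sibling tagged `first`.

    Mirrors the CSS selector "first ~ second" (e.g. "h3 ~ p"); results are
    grouped by parent rather than in strict document order.
    """
    matches = []
    for parent in root.iter():  # root.iter() includes root itself
        seen_first = False
        for child in parent:
            if seen_first and child.tag == second:
                matches.append(child)
            if child.tag == first:
                seen_first = True
    return matches


doc = ET.fromstring("<body><h3>Title</h3><h2>Sub</h2><p>first</p><p>second</p></body>")
all_p = general_sibling_matches(doc, "h3", "p")  # like doc.css('h3 ~ p')
first_p = all_p[0] if all_p else None            # like doc.at('h3 ~ p')
print([p.text for p in all_p])  # ['first', 'second']
```

The h2 between the h3 and the first p does not break the match. That is the difference from the adjacent sibling combinator (h3 + p), which would match nothing here.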
http://xxx/api/xml?&tree=builds[number,description,result,id,actions[parameters[name,value]]]
The above API call returns all the build IDs. Is there a way to limit the results to the last 5 build IDs?
The tree query parameter allows you to explicitly specify and retrieve only the information you are looking for, by using an XPath-ish path expression. The value should be a list of property names to include, with sub-properties inside square braces. Try tree=jobs[name],views[name,jobs[name]] to see just a list of jobs (only giving the name) and views (giving the name and jobs they contain). Note: for array-type properties (such as jobs in this example), the name must be given in the original plural, not in the singular as the element would appear in XML (<job>). This will be more natural for e.g. json?tree=jobs[name] anyway: the JSON writer does not do plural-to-singular mangling because arrays are represented explicitly.
For array-type properties, a range specifier is supported. For example, tree=jobs[name]{0,10} would retrieve the name of the first 10 jobs. The range specifier has the following variants:
{M,N}: From the M-th element (inclusive) to the N-th element (exclusive).
{M,}: From the M-th element (inclusive) to the end.
{,N}: From the first element (inclusive) to the N-th element (exclusive). The same as {0,N}.
{N}: Just retrieve the N-th element. The same as {N,N+1}.
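The four variants above correspond directly to half-open slices, the same convention Python uses. The sketch below applies a specifier string to an ordinary list. The helper apply_range is hypothetical. It restates the documented semantics and is not Jenkins' implementation.

```python
import re

# {M,N}, {M,}, {,N} or {N}
_RANGE = re.compile(r"\{(\d*)(,?)(\d*)\}")


def apply_range(items, spec):
    """Apply a tree-style range specifier to a list (hypothetical helper)."""
    m = _RANGE.fullmatch(spec)
    if m is None:
        raise ValueError(f"not a range specifier: {spec!r}")
    start, comma, end = m.groups()
    if not comma:
        # {N}: just the N-th element, the same as {N,N+1}
        if not start:
            raise ValueError(f"not a range specifier: {spec!r}")
        n = int(start)
        return items[n:n + 1]
    # {M,N}, {M,} and {,N}: M is inclusive (default 0), N is exclusive (default: the end)
    lo = int(start) if start else 0
    hi = int(end) if end else None
    return items[lo:hi]


builds = list(range(100, 90, -1))  # sample build numbers 100 down to 91
print(apply_range(builds, "{,5}"))  # [100, 99, 98, 97, 96]
print(apply_range(builds, "{2}"))   # [98]
```

If the builds array is listed newest first, {,5} therefore selects the five most recent builds.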
Another way to retrieve more data is to use the depth=N query parameter. This retrieves all the data up to the specified depth. Compare depth=0 and depth=1 and see what the difference is for yourself. Also note that data created by a smaller depth value is always a subset of the data created by a bigger depth value.
Because of the size of the data, the depth parameter should really only be used to explore what data Jenkins can return. Once you identify the data you want to retrieve, you can then come up with the tree parameter to exactly specify the data you need.
I'm on version 1.509.4, which doesn't support the range specifier.
Source: http://ci.citizensnpcs.co/api/
You can use the xpath parameter to create an XML document containing only the build numbers, and then parse it yourself by whatever means you like.
http://xxx/api/xml?xpath=//build/number&wrapper=meep
This creates XML that looks like:
<meep>
<number>n</number>
<number>n+1</number>
...
<number>m</number>
</meep>
It will be populated with the build numbers n through m that currently exist in Jenkins for the job specified in the URL. You can substitute any element name for the word "meep"; that name becomes the root element wrapping the selected nodes in the generated XML.
How are you collecting and manipulating the API's XML output once you have it? I ask because there is a solution to the related question "How do I select the last N elements with XPath?". I tried some of those XPath expressions, but I couldn't get them to work when playing with the URL in my browser. They might work if you are processing the output some other way.
When I get the XML, I happen to manipulate it with shell scripts:
#!/bin/sh
# NOTE: To get the URL to work with curl, you need a valid Jenkins user and API token.
# _jenkins_url, _job_name, _jenkins_api_user and _jenkins_api_token must already be set.
# Put all build numbers in a variable called build_ids:
# replace every XML tag with a space, then squeeze runs of spaces into one.
build_ids="$(curl -sL --user "${_jenkins_api_user}:${_jenkins_api_token}" \
"${_jenkins_url}/job/${_job_name}/api/xml?xpath=//build/number&wrapper=meep" \
| sed -e 's/<[^>]*>/ /g' | sed -e 's/  */ /g')"
# Print the last 5 items (or all of them, if there are fewer than 5) with awk
echo "${build_ids}" | awk '{n = (NF < 5) ? NF : 5; for (i = NF - n + 1; i <= NF; i++) printf "%s\t", $i; print ""}'
Once you have the XML, you can parse it however you want.
NOTE: I am running Jenkins ver. 2.46.1
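As an alternative to sed and awk, here is a minimal Python sketch that parses the wrapper XML with the standard library. It assumes you have already fetched the XML text, for example with the curl call above. The sample string and the helper name latest_build_numbers are hypothetical. It picks builds by numeric value rather than by position, so the result does not depend on the order in which Jenkins lists them.

```python
import xml.etree.ElementTree as ET


def latest_build_numbers(xml_text, count=5):
    """Return the `count` highest build numbers from <wrapper><number>...</number></wrapper> XML."""
    root = ET.fromstring(xml_text)
    numbers = [int(el.text) for el in root.iter("number")]
    # Sort descending so the most recent (highest) build numbers come first
    return sorted(numbers, reverse=True)[:count]


# Hypothetical sample shaped like the xpath=//build/number&wrapper=meep output
sample = "<meep>" + "".join(f"<number>{n}</number>" for n in range(1, 9)) + "</meep>"
print(latest_build_numbers(sample))  # [8, 7, 6, 5, 4]
```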
Looking at the docs at the raw .../api/ endpoint (on Jenkins 2.60.3), it says:
For array-type properties, a range specifier is supported. For example, tree=jobs[name]{0,10} would retrieve the name of the first 10 jobs. The range specifier has the following variants:
{M,N}: From the M-th element (inclusive) to the N-th element (exclusive).
{M,}: From the M-th element (inclusive) to the end.
{,N}: From the first element (inclusive) to the N-th element (exclusive). The same as {0,N}.
{N}: Just retrieve the N-th element. The same as {N,N+1}.
For the OP's case, you'd append {,5} to the end of the URL to get the first 5 results:
http://xxx/api/xml?&tree=builds[number,description,result,id,actions[parameters[name,value]]]{,5}
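If you fetch this URL with curl, note that curl treats [] and {} in URLs as its own globbing syntax unless you pass -g/--globoff. Another option is to percent-encode the tree value. The Python sketch below builds the same request with the standard library. The host http://xxx is the placeholder from the question, and the helper tree_url is hypothetical.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

TREE = "builds[number,description,result,id,actions[parameters[name,value]]]{,5}"


def tree_url(base, tree):
    """Append a percent-encoded tree= parameter to a Jenkins API URL."""
    return f"{base}?{urlencode({'tree': tree})}"


url = tree_url("http://xxx/api/xml", TREE)  # placeholder host from the question
print(url)

# Round trip: decoding the query string gives back the original expression
decoded = parse_qs(urlsplit(url).query)["tree"][0]
print(decoded == TREE)  # True
```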
I have code that retrieves a static text element fine, but what I want to do next is get the XPath of that element as a string. I'm using Ruby. At this point I have an array of elements I have already retrieved. Below is what I've tried, without luck:
elements.each do |element|
  if element.attribute("name").include? vProblem
    p "Problem found, retrieving xpath..."
    # Neither of these work
    p "Problem xpath is: " + element.attribute("xpath").to_s
    p "Problem xpath is: " + element.xpath.to_s
  end
end
I don't believe there's an easy method or setting that will give you the XPath value; you would have to determine it yourself. To do this, iterate through all elements on the page until you find one that (A) matches the class of the element you're looking for and (B) matches the name/value of the element.
The only issue with this approach is that it assumes that the first element matching the class and value is the correct element. If there are multiple elements with the same class and value, it will only find the first.
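Below is a minimal Python sketch of "determine it yourself". It assumes you can get the page as XML, for example a page-source dump, rather than working with the Ruby driver objects from the question. It finds the first element whose tag and name attribute match, then builds a positional XPath by walking up to the root. The tag StaticText, the name attribute and both helper names are hypothetical. Like the approach above, it returns only the first match.

```python
import xml.etree.ElementTree as ET


def find_first(root, tag, name_fragment):
    """Return the first `tag` element whose name attribute contains `name_fragment`."""
    for el in root.iter(tag):
        if name_fragment in el.get("name", ""):
            return el
    return None


def xpath_of(root, target):
    """Build a positional XPath such as /App/Window[1]/StaticText[2] for `target`."""
    # ElementTree has no parent pointers, so build a child -> parent map first
    parents = {child: parent for parent in root.iter() for child in parent}
    steps = []
    node = target
    while node is not root:
        parent = parents[node]
        same_tag = [c for c in parent if c.tag == node.tag]
        steps.append(f"{node.tag}[{same_tag.index(node) + 1}]")  # XPath positions start at 1
        node = parent
    steps.append(root.tag)
    return "/" + "/".join(reversed(steps))


page = ET.fromstring(
    '<App><Window><StaticText name="OK"/>'
    '<StaticText name="Problem: disk full"/></Window></App>'
)
problem = find_first(page, "StaticText", "Problem")
print(xpath_of(page, problem))  # /App/Window[1]/StaticText[2]
```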
<xsl:template match="lat:entry[document(lat:file)//h2]"/>
Is this template called ONLY on "entry" elements that contain a lat:file tag with a file name, where that file contains h2 tags?
Or on ANY lat:entry?
If the latter, how can I construct a correct match? (correct being the former option)
That match pattern lat:entry[document(lat:file)//h2] indeed matches elements with the local name entry, in the namespace bound to the prefix lat, that have one or more file child elements in the same namespace for which document(lat:file) finds at least one XML document containing h2 elements (in no namespace, or in the xpath-default-namespace, depending on the context). So your first description is essentially right. The one exception is that document(lat:file)//h2 could result in several documents being loaded and checked for h2 elements if there are several lat:file child elements.
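To make the "any lat:file whose document contains an h2" reading concrete, here is a Python sketch that evaluates the same predicate by hand with ElementTree. It is an illustration, not how an XSLT processor works internally. The namespace URI urn:example:lat, the file names and the in-memory FILES dict standing in for document() are all hypothetical.

```python
import xml.etree.ElementTree as ET

LAT = "urn:example:lat"  # hypothetical namespace URI bound to the lat: prefix

# Hypothetical in-memory stand-ins for the files document() would load
FILES = {
    "a.xml": "<html><body><h2>Intro</h2></body></html>",
    "b.xml": "<html><body><p>No heading here</p></body></html>",
}


def load_document(uri):
    """Stand-in for XSLT's document(): parse the named file from FILES."""
    return ET.fromstring(FILES[uri])


def has_h2(doc_root):
    # //h2 from the document node also covers the root element itself
    return doc_root.tag == "h2" or doc_root.find(".//h2") is not None


def entry_matches(entry):
    """Mimic the pattern lat:entry[document(lat:file)//h2] for one element."""
    if entry.tag != f"{{{LAT}}}entry":
        return False
    # document(lat:file) uses EVERY lat:file child, so several files may be checked;
    # the predicate is true if at least one of those documents has an h2
    return any(has_h2(load_document(f.text.strip()))
               for f in entry.findall(f"{{{LAT}}}file"))


entries = ET.fromstring(
    '<lat:list xmlns:lat="urn:example:lat">'
    '<lat:entry><lat:file>b.xml</lat:file></lat:entry>'
    '<lat:entry><lat:file>b.xml</lat:file><lat:file>a.xml</lat:file></lat:entry>'
    '<lat:entry/>'
    '</lat:list>'
)
print([entry_matches(e) for e in entries])  # [False, True, False]
```

An entry with no lat:file child never matches, because document() of an empty node-set yields no documents.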
I am using the iOS regular expression engine to match any text of the form:
"[h1]test text[/h1]"
I wrote @"\\[h1]([^.]*)[/h1\\]]"
to match this form, but it only works sometimes; other times it matches text beyond the closing bracket. Is this the best way to match these strings, or what would you suggest?
I would recommend using (.*?) instead of ([^.]*).
It looks like what you want is "between [h1] and [/h1], match anything." That would be (.*?).
What you have is "between [h1] and [/h1], match anything which is not a period (.)." Because [^.]* is also greedy, it runs past the first [/h1] whenever the text has no periods.
In addition, there is a problem with your ending: [/h1\\]] is a character class that means "end with one of /, h, 1, or ]". I think you want \\[/h1], which means "end with the literal string [/h1]".
The final regex would be @"\\[h1](.*?)\\[/h1]".
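The difference is easy to see by running both patterns on text that contains two tagged sections. The sketch below uses Python's re module, which should behave the same way as the iOS engine for these particular constructs. In a Python raw string, each \\ from the Objective-C literal becomes a single \. One more caveat: . does not match newlines by default. If the text between the tags can span lines, pass re.DOTALL in Python (on iOS, the NSRegularExpressionDotMatchesLineSeparators option).

```python
import re

text = "[h1]test text[/h1] and [h1]more[/h1]"

# The question's pattern: greedy [^.]* plus a one-character class at the end
broken = r"\[h1]([^.]*)[/h1\]]"
# The suggested fix: lazy (.*?) plus a literal closing [/h1]
fixed = r"\[h1](.*?)\[/h1]"

print(re.search(broken, text).group(1))  # test text[/h1] and [h1]more[/h1
print(re.findall(fixed, text))           # ['test text', 'more']
```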
The structure of the XML document:
<x name="GET-THIS">
<y>
<z>Z</z>
<z>Z__2</z>
<z>Z__3</z>
</y>
</x>
I'm able to get all z elements using:
xpath("//z")
But after that I got stuck; I'm not sure what to do next. I don't really understand the syntax of the .. (parent) step.
So, how do I get the attribute of the parent of the parent of the element?
Instead of traversing back to the parent, just find the right parent to begin with:
//x will select all x elements.
//x[//z] will select all x elements which have z elements as descendants.
//x[//z]/@name will get the name attribute of each of those elements.
You already have a good accepted answer, but here are some other helpful expressions:
//z/ancestor::x/@name - Find <z> elements anywhere, then find all the ancestor <x> elements, and then the name="…" attributes of them.
//z/../../@name - Find the <z> elements, and then find the parent node(s) of those, and then the parent node(s) of those, and then the name attribute(s) of the final set.
This is the same as: //z/parent::*/parent::*/@name, where the * means "an element with any name".
The // is useful, but inefficient. If you know that the hierarchy is x/y/z, then it is more efficient to do something like //x[y/z]/@name
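If it helps to see the parent and ancestor steps run outside a browser, here is a Python sketch on the question's document. Python's standard ElementTree understands .. in its limited path syntax but has no ancestor:: axis, so the ancestor version walks a child-to-parent map by hand. The <root> wrapper and the helper ancestor_names are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# The question's document, wrapped in a <root> element for this example
doc = ET.fromstring(
    '<root><x name="GET-THIS"><y><z>Z</z><z>Z__2</z><z>Z__3</z></y></x></root>'
)

# Like //z/../../@name: ElementTree's path syntax supports ".." as a parent step
grandparents = doc.findall(".//z/../..")
print([el.get("name") for el in grandparents])  # ['GET-THIS']


def ancestor_names(root, start_tag, ancestor_tag):
    """Mimic //start_tag/ancestor::ancestor_tag/@name using a child -> parent map."""
    parents = {child: parent for parent in root.iter() for child in parent}
    found = {}  # dict keys keep each ancestor once, like an XPath node-set
    for el in root.iter(start_tag):
        node = parents.get(el)
        while node is not None:
            if node.tag == ancestor_tag:
                found[node] = None
            node = parents.get(node)
    return [node.get("name") for node in found]


print(ancestor_names(doc, "z", "x"))  # ['GET-THIS']
```

Each result contains the x element only once, even though three z elements lead to it.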
I don't have enough reputation, so I cannot add a comment to the accepted answer by Blender, but his answer will not work in general.
The correct version is:
//x[.//z]/@name
The explanation is simple: when you use a filter like [//z], it searches for z in the global context, i.e. it returns true if the XML contains at least one z node anywhere. For example, it will select both names from the XML below:
<root>
<x name="NOT-THIS">
</x>
<x name="GET-THIS">
<y>
<z>Z</z>
<z>Z__2</z>
<z>Z__3</z>
</y>
</x>
</root>
The filter [.//z] uses the context of the current node (.), which is x, and returns only the second name.
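The difference between the two predicates can be reproduced with Python's standard ElementTree. Its path syntax has no descendant predicates, so the sketch evaluates each predicate by hand on the example document above. Both helper names are hypothetical.

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring(
    '<root><x name="NOT-THIS"></x>'
    '<x name="GET-THIS"><y><z>Z</z><z>Z__2</z><z>Z__3</z></y></x></root>'
)


def names_with_global_predicate(root):
    # Like //x[//z]/@name: the predicate searches the whole document,
    # so it gives the same answer for every x
    z_anywhere = root.find(".//z") is not None
    return [x.get("name") for x in root.iter("x") if z_anywhere]


def names_with_relative_predicate(root):
    # Like //x[.//z]/@name: the predicate searches only below the current x
    return [x.get("name") for x in root.iter("x") if x.find(".//z") is not None]


print(names_with_global_predicate(doc))    # ['NOT-THIS', 'GET-THIS']
print(names_with_relative_predicate(doc))  # ['GET-THIS']
```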