pugixml number of child nodes - xml-parsing

Does a pugixml node object have a number-of-child-nodes method? I cannot find it in the documentation and had to use an iterator as follows:
int n = 0;
for (pugi::xml_node ch_node = xMainNode.child("name"); ch_node; ch_node = ch_node.next_sibling("name")) n++;

There is no built-in function to compute this directly; another approach is to use std::distance:
size_t n = std::distance(xMainNode.children("name").begin(), xMainNode.children("name").end());
Of course, this is linear in the number of child nodes; note that computing the number of all child nodes, std::distance(xMainNode.begin(), xMainNode.end()), is also linear - there is no constant-time access to child node count.

You could use an expression based on an xpath search (no efficiency guarantees, though):
xMainNode.select_nodes( "name" ).size()

Or wrap the counting loop in a small helper:
int children_count(pugi::xml_node node)
{
    int n = 0;
    for (pugi::xml_node child : node.children())
    {
        (void)child; // the loop variable itself is unused
        n++;
    }
    return n;
}
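For instance (a hypothetical usage; the XML string is made up for illustration):
pugi::xml_document doc;
doc.load_string("<root><name/><name/><item/></root>");
int n = children_count(doc.child("root")); // n == 3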

Related

Merge two sorted linked lists: space complexity

I am looking at the following Geeks for Geeks problem:
Given two sorted linked lists consisting of N and M nodes respectively, the task is to merge both lists (in-place) and return the head of the merged list.
Example 1
Input:
N = 4, M = 3
valueN[] = {5,10,15,40}
valueM[] = {2,3,20}
Output: 2 3 5 10 15 20 40
Explanation: After merging the two linked lists, we have the merged list 2, 3, 5, 10, 15, 20, 40.
The code below is the GFG answer. I don't understand how its space complexity is O(1): we are creating a new node, so surely it must be O(m+n)?
Node* sortedMerge(Node* head1, Node* head2)
{
    // Dummy head node: lets us splice without special-casing the first node.
    struct Node *dummy = new Node(0);
    struct Node *tail = dummy;
    while (1) {
        // If either list runs out, append the rest of the other and stop.
        if (head1 == NULL) {
            tail->next = head2;
            break;
        }
        else if (head2 == NULL) {
            tail->next = head1;
            break;
        }
        // Move the smaller front node onto the tail of the merged list.
        if (head1->data <= head2->data) {
            tail->next = head1;
            head1 = head1->next;
        }
        else {
            tail->next = head2;
            head2 = head2->next;
        }
        tail = tail->next;
    }
    return dummy->next;
}
Could someone explain how the space complexity is O(1) here?
Why should it be O(m+n) when it creates just one node? The size of that node is a constant, so one node represents O(1) space; creating it has nothing to do with the length of either input list. Note also that the node is created outside of the loop.
It is actually done this way to keep the code simple, but the merge could be done even without that dummy node.
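For instance, a dummy-free merge can splice the nodes in place, at the cost of a little extra care around the head (a sketch using the same Node type as above; this variant is mine, not from GFG):
Node* sortedMergeNoDummy(Node* head1, Node* head2)
{
    if (head1 == NULL) return head2;
    if (head2 == NULL) return head1;
    // Make head1 the list whose first element is smaller.
    if (head2->data < head1->data) {
        Node* tmp = head1;
        head1 = head2;
        head2 = tmp;
    }
    Node* head = head1;
    while (head1->next != NULL && head2 != NULL) {
        if (head2->data < head1->next->data) {
            // Splice the front node of list 2 in after head1.
            Node* next2 = head2->next;
            head2->next = head1->next;
            head1->next = head2;
            head2 = next2;
        }
        head1 = head1->next;
    }
    // Append whatever remains of list 2.
    if (head2 != NULL) head1->next = head2;
    return head;
}
It still runs in O(m+n) time and O(1) extra space, now with no allocation at all.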

Search for sequence in Uint8List

Is there a fast (native) method to search for a sequence in a Uint8List?
///
/// Return index of first occurrence of seq in list
///
int indexOfSeq(Uint8List list, Uint8List seq) {
...
}
EDIT: Changed List<int> to Uint8List
No. There is no built-in way to search for a sequence of elements in a list.
I am also not aware of any dart:ffi based implementations.
The simplest approach would be:
extension IndexOfElements<T> on List<T> {
  int indexOfElements(List<T> elements, [int start = 0]) {
    if (elements.isEmpty) return start;
    var end = length - elements.length;
    if (start > end) return -1;
    var first = elements.first;
    var pos = start;
    outer:
    while (true) {
      // Find the next occurrence of the first element.
      pos = indexOf(first, pos);
      if (pos < 0 || pos > end) return -1;
      // Check that the rest of the sequence matches at this position.
      for (var i = 1; i < elements.length; i++) {
        if (this[pos + i] != elements[i]) {
          pos++;
          continue outer; // mismatch: resume searching one position later
        }
      }
      return pos;
    }
  }
}
(Note the labeled continue: a plain continue would only restart the inner for loop and the function could return a false match.)
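For example (hypothetical usage, assuming the extension above is in scope):
import 'dart:typed_data';

void main() {
  final data = Uint8List.fromList([1, 2, 3, 4, 2, 3, 5]);
  final seq = Uint8List.fromList([2, 3, 5]);
  print(data.indexOfElements(seq)); // 4
}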
This has worst-case time complexity O(length*elements.length). There are several more algorithms with better worst-case complexity, but they also have larger constant factors and more expensive pre-computations (KMP, BMH). Unless you search for the same long list several times, or do so in a very, very long list, they're unlikely to be faster in practice (and they'd probably have an API where you compile the pattern first, then search with it.)
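For a sense of what such a compile-first API could look like, here is a sketch based on Knuth-Morris-Pratt; the CompiledPattern class and its method names are made up for illustration:
class CompiledPattern {
  final List<int> _pattern;
  final List<int> _failure; // KMP failure function (border lengths)

  CompiledPattern(List<int> pattern)
      : _pattern = List.of(pattern),
        _failure = List.filled(pattern.length, 0) {
    // Precompute the failure function in O(pattern.length).
    for (var i = 1, k = 0; i < _pattern.length; i++) {
      while (k > 0 && _pattern[i] != _pattern[k]) {
        k = _failure[k - 1];
      }
      if (_pattern[i] == _pattern[k]) k++;
      _failure[i] = k;
    }
  }

  /// Index of the first occurrence of the pattern in [haystack], or -1.
  int searchIn(List<int> haystack, [int start = 0]) {
    if (_pattern.isEmpty) return start;
    var k = 0;
    for (var i = start; i < haystack.length; i++) {
      while (k > 0 && haystack[i] != _pattern[k]) {
        k = _failure[k - 1];
      }
      if (haystack[i] == _pattern[k]) k++;
      if (k == _pattern.length) return i - _pattern.length + 1;
    }
    return -1;
  }
}
This guarantees O(haystack.length + pattern.length) in the worst case, at the price of the precomputed table.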
You could use dart:ffi to bind to memmem from string.h as you suggested.
We do the same with binding to malloc from stdlib.h in package:ffi (source).
// The typedef is assumed here so the snippet stands alone; package:ffi
// declares an equivalent.
typedef PosixMalloc = Pointer Function(int);

final DynamicLibrary stdlib = Platform.isWindows
    ? DynamicLibrary.open('kernel32.dll')
    : DynamicLibrary.process();

final PosixMalloc posixMalloc =
    stdlib.lookupFunction<Pointer Function(IntPtr), Pointer Function(int)>('malloc');
Edit: as lrn pointed out, we cannot expose the inner data pointer of a Uint8List at the moment, because the GC might relocate it.
One could use dart_api.h: pass the TypedData through the FFI trampoline as a Dart_Handle, and then use Dart_TypedDataAcquireData from dart_api.h to access the inner data pointer.
(If you want to use this in Flutter, we would need to expose Dart_TypedDataAcquireData and Dart_TypedDataReleaseData in dart_api_dl.h (https://github.com/dart-lang/sdk/issues/40607); I've filed https://github.com/dart-lang/sdk/issues/44442 to track this.)
Alternatively, we could address https://github.com/dart-lang/sdk/issues/36707 so that the inner data pointer of a Uint8List could be exposed directly in the FFI trampoline.

Coding a type of random walk in Neo4j using the Traversal Framework

I'm currently working on a graph where nodes are connected via probabilistic edges. The weight on each edge defines the probability that the edge exists.
Here is an example graph to get you started
(A)-[0.5]->(B)
(A)-[0.5]->(C)
(B)-[0.5]->(C)
(B)-[0.3]->(D)
(C)-[1.0]->(E)
(C)-[0.3]->(D)
(E)-[0.3]->(D)
I would like to use the Neo4j Traversal Framework to traverse this graph starting from (A) and return the number of nodes that have been reached based on the probability of the edges found along the way.
Important:
Each node that is reached can only be counted once: if (A) reaches (B) and (C), then (C) need not reach (B). On the other hand, if (A) fails to reach (B) but reaches (C), then (C) will attempt to reach (B).
The same goes if (B) reaches (C): (C) will not try to reach (B) again.
This is a discrete-time process: a node will only attempt to reach a neighboring node once.
To test the existence of an edge (whether we traverse it) we can generate a random number and verify if it's smaller than the edge weight.
I have already coded part of the traversal description as follows. (Here it is possible to start from multiple nodes but that is not necessary to solve the problem.)
TraversalDescription traversal = db.traversalDescription()
    .breadthFirst()
    .relationships( Rels.INFLUENCES, Direction.OUTGOING )
    .uniqueness( Uniqueness.NODE_PATH )
    .uniqueness( Uniqueness.RELATIONSHIP_GLOBAL )
    .evaluator(new Evaluator() {
        @Override
        public Evaluation evaluate(Path path) {
            // Get the current node
            Node curNode = path.endNode();
            // If the current node is a start node, it has no previous relationship;
            // just add it to the result and keep traversing
            if (startNodes.contains(curNode)) {
                return Evaluation.INCLUDE_AND_CONTINUE;
            }
            // Otherwise...
            else {
                // Get the relationship we arrived through; it cannot be null here,
                // because the start-node case already returned above
                Relationship curRel = path.lastRelationship();
                // Instantiate a random number generator
                Random rnd = new Random();
                // Get a random number (between 0 and 1)
                double rndNum = rnd.nextDouble();
                // Traverse the edge only if the random number is smaller than
                // the edge weight "wc"
                if (rndNum < (double) curRel.getProperty("wc")) {
                    Node prevNode = curRel.getOtherNode(curNode);
                    String info = "(" + prevNode.getProperty("name") + ")-[" + curRel.getProperty("wc") + "]->";
                    info += "(" + curNode.getProperty("name") + ")";
                    info += " :" + rndNum;
                    System.out.println(info);
                    // Keep the node and keep traversing
                    return Evaluation.INCLUDE_AND_CONTINUE;
                } else {
                    // Don't save the node in the result and stop traversing this branch
                    return Evaluation.EXCLUDE_AND_PRUNE;
                }
            }
        }
    });
I keep track of the number of nodes reached like so:
long score = 0;
for (Node currentNode : traversal.traverse( nodeList ).nodes())
{
System.out.print(" <" + currentNode.getProperty("name") + "> ");
score += 1;
}
The problem with this code is that, although NODE_PATH uniqueness is set, cycles may still occur, which I don't want.
Therefore, I would like to know:
Is there a solution to avoid cycles and count exactly the number of nodes reached?
And ideally, is it possible (or better) to do the same thing using a PathExpander, and if so, how can I go about coding that?
Thanks
This certainly isn't the best answer.
Instead of iterating on nodes(), I iterate on the paths, add each endNode() to a set, and then simply take the size of the set as the number of unique nodes.
HashSet<String> nodes = new HashSet<>();
for (Path path : traversal.traverse(nodeList))
{
    Node currNode = path.endNode();
    String val = String.valueOf(currNode.getProperty("name"));
    nodes.add(val);
    System.out.println(path);
    System.out.println("");
}
score = nodes.size();
Hopefully someone can suggest a more optimal solution.
I'm still surprised, though, that NODE_PATH didn't prevent cycles from forming. (As far as I understand, NODE_PATH only guarantees that a node occurs at most once within a single path; the same node can still be reached along several different paths, which is why it can show up more than once.)
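A small variant of the same idea (a sketch of mine, untested): key the set on the internal node id rather than the name property, in case names are not unique:
HashSet<Long> seen = new HashSet<>();
for (Path path : traversal.traverse(nodeList))
{
    seen.add(path.endNode().getId());
}
long score = seen.size();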

Some difficulties of designing with types in F# by simple graph example

There is an oriented graph (pictured in the original post as three figures): we add a node and an edge to it, and then remove some others (chosen by an algorithm; the details don't matter here).
I tried to do this in F#, but because of my limited experience I cannot settle on the right architectural decisions.
open System.Collections.Generic

type Node = Node of int

type OGraph(nodes : Set<Node>,
            edges : Dictionary<Node * int, Node>) =
    member this.Nodes = nodes
    member this.Edges = edges

let nodes = set [Node 1; Node 2; Node 3]
let edges = Dictionary<Node * int, Node>()
Array.iter edges.Add [|
    (Node 1, 10), Node 2;
    (Node 2, 20), Node 3;
|]
let myGraph = OGraph(nodes, edges)

myGraph.Nodes.Add (Node 4)
myGraph.Edges.Add ((Node 2, 50), Node 4)
myGraph.Edges.Remove (Node 2, 20)
myGraph.Nodes.Remove (Node 3)
1. How do I add an empty node? It might be node 3 or 4, or even 100500. And if we add a node without a number, how can we then use it to create an edge? myGraph.Edges.Add ((Node 2, 50), ???) In an imperative paradigm this would be simple, because with named references and nulls we could just create Node newNode = new Node() and then use the reference newNode, but that seems to be bad practice in F#.
2. Should I specify separate types Node and Edge, or use simple types instead? Or maybe some other, more complicated representation?
3. Is it better to use the common .NET mutable collections (HashSet, Dictionary, etc.) or the special F# collections (Set, Map, etc.)? If the collections are large, is it acceptable, in terms of performance, to copy an entire collection every time it has to change?
The graph itself is easy enough to model. You could define it like this:
type Graph = { Node : int option; Children : (int * Graph) list }
If you will, you can embellish it more, using either type aliases or custom types instead of primitive int values, but this is the basic idea.
You can model the three graphs pictured in the OP like the following. The formatting I've used looks quite verbose, but I deliberately formatted the values this way in order to make the structure clearer; you could write the values in a more compact form, if you'd like.
let x1 =
    {
        Node = Some 1;
        Children =
            [
                (
                    10,
                    {
                        Node = Some 2;
                        Children =
                            [
                                (
                                    20,
                                    {
                                        Node = Some 3;
                                        Children = []
                                    }
                                )
                            ]
                    }
                )
            ]
    }
let x2 =
    {
        Node = Some 1;
        Children =
            [
                (
                    10,
                    {
                        Node = Some 2;
                        Children =
                            [
                                (
                                    20,
                                    {
                                        Node = Some 3;
                                        Children = []
                                    }
                                );
                                (
                                    50,
                                    {
                                        Node = None;
                                        Children = []
                                    }
                                )
                            ]
                    }
                )
            ]
    }
let x3 =
    {
        Node = Some 1;
        Children =
            [
                (
                    10,
                    {
                        Node = Some 2;
                        Children =
                            [
                                (
                                    50,
                                    {
                                        Node = Some 3;
                                        Children = []
                                    }
                                )
                            ]
                    }
                )
            ]
    }
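As an example of the more compact form mentioned above, x1 can be written as a single expression:
let x1 = { Node = Some 1; Children = [ (10, { Node = Some 2; Children = [ (20, { Node = Some 3; Children = [] }) ] }) ] }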
Notice the use of int option to capture whether or not a node has a value.
The Graph type is an F# record type, and uses the F# workhorse list for the children. This would be my default choice, and only if performance becomes a problem would I consider other data types. Lists are easy to work with.
Some of these are easy:
1. Use Option: an empty node is then None.
2. Maybe; it depends on the problem.
3. This depends on the specific problem you are solving: the F# collections tend to be immutable and make some operations fast, while the .NET collections make other operations fast.
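For instance, point 1 with the Graph type from the answer above (a minimal sketch):
let emptyNode = { Node = None; Children = [] }
let node2 = { Node = Some 2; Children = [ (50, emptyNode) ] }
The empty node needs no identity of its own until you decide to give it a number, at which point you replace None with Some n.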

Checking if removing an edge in a graph will result in the graph splitting

I have a graph structure from which I am removing edges one by one until some conditions are met. My brain has totally stopped and I can't find an efficient way to detect whether removing an edge will split my graph into two or more components.
The brute-force solution would be to do a BFS and check that every node can still be reached from some starting node, but that will take too much time with large graphs...
Any ideas?
Edit: After a bit of searching, it seems that what I am trying to do is very similar to Fleury's algorithm, where I need to find out whether an edge is a "bridge" or not.
Edges whose removal disconnects the graph are called 'bridges'. You can find all of them in O(|V|+|E|) with a single depth-first search over the whole graph. The closely related algorithm below finds all 'articulation points' (nodes that, if removed, make the graph disconnected); a small change to the same DFS bookkeeping yields the bridges instead (see the sketch after the code).
//
// g: graph; v: current vertex id;
// r_p: parents (r/w); r_v: order of visit (r/w); r_a: ascents (r/w);
// r_ap: articulation points, bool array (r/w); n_v: dfs visit counter
//
void dfs_art_i(graph *g, int v, int *r_p, int *r_v, int *r_a, int *r_ap, int *n_v) {
    int i;
    r_v[v] = *n_v;
    r_a[v] = *n_v;
    (*n_v)++;
    // printf("entering %d (nv = %d)\n", v, *n_v);
    for (i = 0; i < g->vertices[v].n_edges; i++) {
        int w = g->vertices[v].edges[i].target;
        // printf("\t evaluating %d->%d: ", v, w);
        if (r_v[w] == -1) {
            // printf("...\n");
            // This is the first time we reach this vertex
            r_p[w] = v;
            dfs_art_i(g, w, r_p, r_v, r_a, r_ap, n_v);
            // printf("\n\t ... back in %d->%d", v, w);
            if (r_a[w] >= r_v[v]) {
                // printf(" - a[%d] %d >= v[%d] %d", w, r_a[w], v, r_v[v]);
                // Articulation point found
                r_ap[v] = 1;
            }
            if (r_a[w] < r_a[v]) {
                // printf(" - a[%d] %d < a[%d] %d", w, r_a[w], v, r_a[v]);
                r_a[v] = r_a[w];
            }
            // printf("\n");
        }
        else {
            // printf("back");
            // We have already reached this vertex before
            if (r_v[w] < r_a[v]) {
                // printf(" - updating ascent to %d", r_v[w]);
                r_a[v] = r_v[w];
            }
            // printf("\n");
        }
    }
}

int dfs_art(graph *g, int root, int *r_p, int *r_v, int *r_a, int *r_ap) {
    int i, n_visited = 0, n_root_children = 0;
    for (i = 0; i < g->n_vertices; i++) {
        r_p[i] = r_v[i] = r_a[i] = -1;
        r_ap[i] = 0;
    }
    dfs_art_i(g, root, r_p, r_v, r_a, r_ap, &n_visited);
    // the root can only be an AP if it has more than 1 child
    for (i = 0; i < g->n_vertices; i++) {
        if (r_p[i] == root) {
            n_root_children++;
        }
    }
    r_ap[root] = n_root_children > 1 ? 1 : 0;
    return 1;
}
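If what you ultimately need is the bridge test itself, the same DFS gives it almost for free, with two changes: use the strict comparison r_a[w] > r_v[v], and skip the arc that leads straight back to the parent (in an undirected graph stored as two directed arcs, the parent arc would otherwise count as a back edge and mask every bridge). A sketch of the modified edge-loop body, adapted from dfs_art_i above (the printf is only illustrative):
int w = g->vertices[v].edges[i].target;
if (w == r_p[v]) continue; // ignore the arc back to the parent
if (r_v[w] == -1) {
    r_p[w] = v;
    dfs_art_i(g, w, r_p, r_v, r_a, r_ap, n_v);
    if (r_a[w] > r_v[v]) {
        // No back edge from w's subtree climbs to v or above,
        // so removing the edge v-w disconnects the graph.
        printf("bridge: %d-%d\n", v, w);
    }
    if (r_a[w] < r_a[v]) r_a[v] = r_a[w];
}
else if (r_v[w] < r_a[v]) {
    r_a[v] = r_v[w];
}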
If you remove the link between vertices A and B, can't you just check that you can still reach A from B after the edge removal? That's a little better than getting to all nodes from a random node.
How do you choose the edges to be removed?
Can you tell more about your problem domain?
Just how large is your graph? Maybe BFS is just fine!
Since you wrote that you are trying to find out whether an edge is a bridge or not, I suggest you remove edges in decreasing order of their betweenness measure.
Essentially, betweenness is a measure of an edge's (or vertex's) centrality in a graph.
Edges with a higher betweenness value have a greater potential of being a bridge in a graph.
Look it up on the web; the algorithm is called the 'Girvan-Newman algorithm'.
