TriadicCensus JUNG 2.0 getCounts() returns negative value for motif 003

I have nine directed graphs of different metropolitan road networks.
When I use TriadicCensus.getCounts(myGraph) negative values are returned for motif 003 (empty triad) in three of the nine cases.
As I understand from the API, getCounts should return counts, so I cannot understand why it would return a negative value for some of the networks. Am I misunderstanding something about the TriadicCensus class?

The array that's returned is 1-based, not 0-based. My guess (without seeing your graph) is that you're interpreting index 0 as the count for 003; the value in index 0 is actually meaningless. (See the documentation: https://code.google.com/p/jung/source/browse/trunk/jung/jung-algorithms/src/main/java/edu/uci/ics/jung/algorithms/metrics/TriadicCensus.java)
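For illustration, here is a minimal sketch of reading the counts correctly. It assumes the JUNG 2.0 API as in the linked source (getCounts returning a long[]); the tiny three-vertex graph is just a placeholder for your road network:

import edu.uci.ics.jung.algorithms.metrics.TriadicCensus;
import edu.uci.ics.jung.graph.DirectedGraph;
import edu.uci.ics.jung.graph.DirectedSparseGraph;

public class TriadCountExample {
    public static void main(String[] args) {
        DirectedGraph<Integer, String> g = new DirectedSparseGraph<Integer, String>();
        g.addVertex(1);
        g.addVertex(2);
        g.addVertex(3);
        g.addEdge("e12", 1, 2);

        long[] counts = TriadicCensus.getCounts(g);
        // The array is 1-based: motif 003 (the empty triad) is counts[1].
        // counts[0] is unused; whatever it holds is meaningless.
        System.out.println("003 count: " + counts[1]);
    }
}

Here counts[1] should print 0, since the only triad in this toy graph contains an edge.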

Related

How to merge zero values (vector(0)) with metric values in PromQL

I'm using flexlm_exporter to export my license usage to Prometheus, and from Prometheus to a custom service (not Grafana).
As you know Prometheus hides missing values.
However, I need those missing values in my metric values, so I added or vector(0) to my PromQL query.
For example:
flexlm_feature_used_users{app="vendor_lic-server01",name="Temp"} or vector(0)
This query adds an empty metric with zero values.
My question is whether there's a way to merge the zero vector with each metric's values.
Edit:
I need grouping, at least on the user and name labels, so vector(0) is probably not the best option here?
I tried multiple solutions in different StackOverflow threads, however, nothing works.
Please assist.
You can use absent() with labels; to convert its value from 1 to zero, use clamp_max:
(Metrics{label="a"} or clamp_max(absent(notExists{label="a"}), 0))
+
(Metrics2{label="a"} or clamp_max(absent(notExists{label="a"}), 0))
vector(0) has no labels, whereas clamp_max(absent(notExists{label="a"}), 0) is 0 with the label attached.
If you do sum(flexlm_feature_used_users{app="vendor_lic-server01",name="Temp"} or vector(0)) you should get what you're looking for, but you'll lose the ability to group by, since vector(0) doesn't have any labels.
I needed a similar thing and ended up flattening the options. What worked for me was something like:
(sum by (xyz) (flexlm_feature_used_users{app="vendor_lic-server01",name="Temp1"}) + sum by (xyz) (flexlm_feature_used_users{app="vendor_lic-server01",name="Temp2"})) or
sum by (xyz) (flexlm_feature_used_users{app="vendor_lic-server01",name="Temp1"}) or
sum by (xyz) (flexlm_feature_used_users{app="vendor_lic-server01",name="Temp2"})
There is no easy, generic way to fill gaps in returned time series with zeroes in Prometheus. But this can be easily done via the default operator in VictoriaMetrics:
flexlm_feature_used_users{app="vendor_lic-server01",name="Temp"} default 0
The expression q default N fills gaps in each time series returned from q with the given default value N. See the MetricsQL docs for more details.

Fortran entries of array change seemingly at random

I have been working with a FORTRAN program. I have noticed seemingly random changes in a 1D matrix I'm working with. It is a matrix of 4000 integers. Values are added to the matrix one by one, starting with index 1 and iterating by 1 for each added value. The matrix does not get fully "filled", currently only 100 values are placed into the matrix. So one would expect that the first 100 entries of the matrix will be non-zero (all added values are non-zero) and the remaining 3900 entries will be 0. However, several of the entries of the matrix end up being large negative numbers, but I'm certain that no portion of my code touches these entries.
What could be causing this issue? I'm sorry but I can't post the code for you all to work with.
The code has several other large matrices, taking up a total of ~100 MB of space. Could this potentially be a memory issue?
Thanks!
You have to initialize your array; otherwise it will almost always contain garbage. This would do it:
array = 0.0e0 ! real array
or
array = 0.0d0 ! double precision array
or
array = 0 ! integer array
Since your array holds integers, the last form is the one you want.
A "matrix" is two-dimensional; your array is one-dimensional.
Things do not change unless you ask them to change.
FORTRAN does not initialize variables by default, other than (as I recall) in a labeled COMMON. As such, you should assume they start out with garbage values. Try initializing your data with a DATA statement. If you have to initialize a labeled COMMON, you will have to supply a BLOCK DATA subprogram.

ELKI OPTICS pre-computed distance matrix

I can't seem to get this algorithm to work on my dataset, so I took a very small subset of my data and tried to get it to work, but that didn't work either.
I want to input a precomputed distance matrix into ELKI, and then have it find the reachability distance list of my points, but I get reachability distances of 0 for all my points.
ID=1 reachdist=Infinity predecessor=1
ID=2 reachdist=0.0 predecessor=1
ID=4 reachdist=0.0 predecessor=1
ID=3 reachdist=0.0 predecessor=1
My ELKI arguments were as follows:
Running: -dbc DBIDRangeDatabaseConnection -idgen.start 1 -idgen.count 4 -algorithm clustering.optics.OPTICSList -algorithm.distancefunction external.FileBasedDoubleDistanceFunction -distance.matrix /Users/jperrie/Documents/testfile.txt -optics.epsilon 1.0 -optics.minpts 2 -resulthandler ResultWriter -out /Applications/elki-0.7.0/elkioutputtest
I use the DBIDRangeDatabaseConnection instead of an input file to create indices 1 through 4 and pass in a distance matrix with the following format, where there are 2 indices and a distance on each line.
1 2 0.0895585119724274
1 3 0.19458931684494
2 3 0.196315720677376
1 4 0.137940123677254
2 4 0.135852232575417
3 4 0.141511023044586
Any pointers to where I'm going wrong would be appreciated.
When I change your distance matrix to start counting at 0, it appears to work. Re-indexed (each index shifted down by 1), your matrix looks like this:
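0 1 0.0895585119724274
0 2 0.19458931684494
1 2 0.196315720677376
0 3 0.137940123677254
1 3 0.135852232575417
2 3 0.141511023044586
With that file (and -idgen.start 0, so the printed IDs match), the run produces: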
ID=0 reachdist=Infinity predecessor=-2147483648
ID=1 reachdist=0.0895585119724274 predecessor=-2147483648
ID=3 reachdist=0.135852232575417 predecessor=1
ID=2 reachdist=0.141511023044586 predecessor=3
Maybe you should file a bug report - to me, this appears to be a bug. Also, predecessor=-2147483648 should probably be predecessor=None or something like that.
This is due to a recent change that may not yet be correctly reflected in the documentation.
When you do multiple invocations in the MiniGUI, ELKI will assign fresh object DBIDs. So if you have a data set with 100 objects, the first run would use 0-99, the second 100-199, the third 200-299, and so on. This can be desired (for longer-running processes, you want object IDs to be unique), but it can also be surprising behavior.
However, this makes precomputed distance matrices really hard to use, in particular with real data. Therefore, these classes were changed to use offsets. The format of the distance matrix is now
DBIDoffset1 DBIDoffset2 distance
where offset 0 = start + 0 is the first object.
When I'm back in the office (and do not forget), I will: 1. update the documentation to reflect this; 2. provide an offset parameter so that you can continue counting from 1; 3. make the default distance NaN or infinity; and 4. add a sanity check that warns if you have 100 objects but distances are given for objects 1-100 instead of 0-99.

How to effectively use the Levenshtein algorithm for text auto-completion

I'm using the Levenshtein distance algorithm to filter through some text in order to determine the best matching result for the purpose of text field auto-completion (and top 5 best results).
Currently, I have an array of strings, and apply the algorithm to each one in an attempt to determine how close of a match it is to the text which was typed by the user. The problem is that I'm not too sure how to interpret the values outputted by the algorithm to effectively rank the results as expected.
For example: (Text typed = "nvmb")
Result: "game" ; levenshtein distance = 3 (best match)
Result: "number the stars" ; levenshtein distance = 13 (second best match)
This technically makes sense; the second result needs many more 'edits' because of its length. The problem is that the second result is logically and visually a much closer match than the first one. It's almost as if I should ignore any characters beyond the length of the typed text.
Any ideas on how I could achieve this?
Levenshtein distance itself is good for correcting a query, not for auto-completion.
I can propose an alternative solution:
First, store your strings in a prefix tree instead of an array, so you will not need to analyze all of them.
Second, given the user's input, enumerate the strings at a fixed distance from it and suggest completions for each.
Your example: text typed = "nvmb"
At distance 0 there are no completions.
Enumerate strings at distance 1:
Only "numb" will have some completions.
Another example: text typed = "gamb"
At distance 0 you have only one completion, "gambling"; make it the first suggestion, and continue to get 4 more.
At distance 1 you will get "game" and some completions for it.
Of course, this approach sometimes gives more than 5 results, but you can order them by another criterion that does not depend on the current query.
I think it is more efficient because you can typically limit the distance to at most two, i.e. check on the order of 1000*n prefixes, where n is the length of the input; most of the time that is far fewer than the number of stored strings.
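A rough, self-contained Java sketch of this approach (my own illustration, not a library API): a sorted set stands in for a real prefix tree, the dictionary and alphabet are made up, and the edit distance is bounded at 1 for brevity:

import java.util.*;

public class CompletionSketch {
    static final String ALPHABET = "abcdefghijklmnopqrstuvwxyz ";

    // Every string at Levenshtein distance <= 1 from q (q itself included).
    static Set<String> withinDistanceOne(String q) {
        Set<String> out = new LinkedHashSet<>();
        out.add(q);                                          // distance 0
        for (int i = 0; i <= q.length(); i++)
            for (char c : ALPHABET.toCharArray())            // insertions
                out.add(q.substring(0, i) + c + q.substring(i));
        for (int i = 0; i < q.length(); i++) {
            out.add(q.substring(0, i) + q.substring(i + 1)); // deletions
            for (char c : ALPHABET.toCharArray())            // substitutions
                out.add(q.substring(0, i) + c + q.substring(i + 1));
        }
        return out;
    }

    // Completions of a prefix; a sorted set stands in for a real trie here.
    static List<String> completions(NavigableSet<String> dict, String prefix, int limit) {
        List<String> out = new ArrayList<>();
        for (String s : dict.tailSet(prefix)) {
            if (!s.startsWith(prefix) || out.size() >= limit) break;
            out.add(s);
        }
        return out;
    }

    public static void main(String[] args) {
        NavigableSet<String> dict = new TreeSet<>(Arrays.asList(
                "game", "gambling", "numb", "number the stars"));
        for (String variant : withinDistanceOne("nvmb")) {
            List<String> hits = completions(dict, variant, 5);
            if (!hits.isEmpty())
                System.out.println(variant + " -> " + hits);
        }
    }
}

Running this prints numb -> [numb, number the stars], matching the walkthrough above. Swapping the TreeSet for an actual trie turns each prefix lookup into a direct walk down the tree, but the enumeration logic stays the same; bounding the distance at 2 just means applying withinDistanceOne twice.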
The Levenshtein distance corresponds to the number of single-character insertions, deletions and substitutions in an optimal global pairwise alignment of two sequences if the gap and mismatch costs are all 1.
The Needleman-Wunsch DP algorithm will find such an alignment, in addition to its score (it's essentially the same DP algorithm as the one used to calculate the Levenshtein distance, but with the option to weight gaps, and mismatches between any given pair of characters, arbitrarily).
But there are more general models of alignment that allow reduced penalties for gaps at the start or the end (and reduced penalties for contiguous blocks of gaps, which may also be useful here, although it doesn't directly answer the question). At one extreme you have local alignment, where you pay no penalty at all for gaps at the ends; this is computed by the Smith-Waterman DP algorithm.
I think what you want here is in-between: you want to penalise gaps at the start of both the query and test strings, and gaps at the end of the test string, but not gaps at the end of the query string. That way, trailing mismatches cost nothing, and the costs will look like:
Query: nvmb
Costs: 0100000000000000 = 1 in total
Against: number the stars
Query: nvmb
Costs: 1101 = 3 in total
Against: game
Query: number the stars
Costs: 0100111111111111 = 13 in total
Against: nvmb
Query: ber star
Costs: 1110001111100000 = 8 in total
Against: number the stars
Query: some numbor
Costs: 111110000100000000000 = 6 in total
Against: number the stars
(In fact you might want to give trailing mismatches a small nonzero penalty, so that an exact match is always preferred to a prefix-only match.)
The Algorithm
Suppose the query string A has length n, and the string B that you are testing against has length m. Let d[i][j] be the DP table value at (i, j) -- that is, the cost of an optimal alignment of the length-i prefix of A with the length-j prefix of B. If you go with a zero penalty for trailing mismatches, you only need to modify the NW algorithm in a very simple way: instead of calculating and returning the DP table value d[n][m], you just need to calculate the table as before, and find the minimum of any d[n][j], for 0 <= j <= m. This corresponds to the best match of the query string against any prefix of the test string.
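To make that concrete, here is a small Java sketch under the stated assumptions (unit costs, zero penalty for trailing gaps in the test string); prefixDistance is my own name for it, not a standard library call:

public class PrefixMatch {
    // Standard Levenshtein DP between query a and candidate b, except the
    // final score is the minimum over the last row, i.e. the best alignment
    // of all of a against any prefix of b; trailing characters of b are free.
    static int prefixDistance(String a, String b) {
        int n = a.length(), m = b.length();
        int[][] d = new int[n + 1][m + 1];
        for (int i = 0; i <= n; i++) d[i][0] = i;  // delete a's leading chars
        for (int j = 0; j <= m; j++) d[0][j] = j;  // insert b's leading chars
        for (int i = 1; i <= n; i++) {
            for (int j = 1; j <= m; j++) {
                int sub = d[i - 1][j - 1]
                        + (a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1);
                d[i][j] = Math.min(sub, Math.min(d[i - 1][j] + 1, d[i][j - 1] + 1));
            }
        }
        int best = d[n][m];
        for (int j = 0; j <= m; j++) {
            best = Math.min(best, d[n][j]);  // free trailing gap in b
        }
        return best;
    }

    public static void main(String[] args) {
        System.out.println(prefixDistance("nvmb", "number the stars")); // 1
        System.out.println(prefixDistance("nvmb", "game"));             // 3
    }
}

With this scoring, "nvmb" is distance 1 from "number the stars" and distance 3 from "game", matching the cost tallies above, so "number the stars" now ranks first.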

How does Lua 5.x represent sparse arrays?

Say, I have an array like this:
T = {1,2,[1000] = 3, [-1] = -1}
I know 1 and 2 will be in the contiguous array part and -1 will be in the hash part.
But I don't know where 3 will go, i.e. how it will be represented inside Lua.
Would there be 997 wasted slots between 2 and 3? Would 3 be relegated to the hash part for efficiency? Would there be two linked contiguous arrays, one starting at index 1 and the second starting at index 1000?
It depends on which version of Lua you use. In Lua 4, tables are implemented strictly as hash tables. In Lua 5, tables are part hash table and part array; see The Implementation of Lua 5.0, where Section 4 covers tables and sparse arrays.
The array part tries to store the values corresponding to integer keys from 1 to some limit n. Values corresponding to non-integer keys or to integer keys outside the range are stored in the hash part. ... The computed size of the array part is the largest n such that at least half the slots between 1 and n are in use ... and there is at least one used slot between n/2+1 and n.
In your example, 1000 would likely be outside that limit n, and storing it would not cause the array part to grow, as the result would be too sparse.
You shouldn't need to worry about these details: just trust that Lua tables are implemented efficiently with expected constant-time access to an entry given its key. The array part is just an implementation detail to reduce memory usage by not needing to store some keys.
As explained by rpattiso, there is no memory waste in your example.
