AWK (or similar) - change 2 lines below the matching pattern - parsing

I have a problem that I think it's easiest to solve with awk but I wrapped my head around it.
Inside a file I have repeating output like this:
....
Name="BgpIpv4RouteConfig_XXX">
<Ipv4NetworkBlock id="13726"
StartIpList="x.y.z.t"
PrefixLength="30"
NetworkCount="10000"
... other output
then this block will repeat.
a)I want to match on BGPIpv4Route.*, then skip 2 lines (the "n" keyword of awk), then when reaching Prefix Length:
- either replace it with random (25,30)
or
- better but I guess harder (no idea came to mind for keeping track of what was used and looping among /25../30) -> first occurrence /25, second one /26...till /30, then rollback to /25
b) then next line with NetworkCount depending on the new value of PrefixCount calculate it as 65536 / 2^(32-Prefix Count)
eg: if PrefixCount on this occurrence was replaced with /25, then NetworkCount on the line following it = 65536 / 2 ^ 7 = 65536 / 128 = 512
I found some examples with inserting/changing a line after one that matched (or with a counter variable X lines below the match) but I got a bit confused with the value generation part and also with the changing of two lines where one is depending on the other.
Not sure I made any sense...my head is a bit overwhelmed with what I'm finding everywhere right now.
Thanks in advance!

this should do
$ awk 'BEGIN {q="\""; FS=OFS="="; n=split("25=26=27=28=29=30",ps)}
/BgpIpv4Route/ {c=c%n+1}
/PrefixLength/ {$2=q ps[c] q}
/NetworkCount/ {$2=q 65536/2^(32-ps[c]) q}1' file
perhaps minimize computation by changing to 2^(ps[c]-16)
If there are free standing PrefixLength and NetworkCount attributes perhaps you need to qualify them for each BgpIpv4Route context.

Related

Finding the number of digits in a number restricted number of tools since I am a Python beginner

def digits(n):
total=0
for i in range(0,n):
if n/(10**(i))<1 and n/(10**(i-1))=>1:
total+=i
else:
total+=0
return total
I want to find the number of digits in 13 so I do the below
print digits(13)
it gives me $\0$ for every number I input into the function.
there's nothing wrong with what I've written as far as I can see:
if a number has say 4 digits say 1234 then dividing by 10^4 will make it less than 1: 0.1234 and dividing by 10^3 will make it 1.234
and by 10^3 will make it 1.234>1. when i satisfies BOTH conditions you know you have the correct number of digits.
what's failing here? Please can you advise me on the specific method I've tried
and not a different one?
Remember for every n there can only be one i which satisfies that condition.
so when you add i to the total there will only be i added so total returning total will give you i
your loop makes no sense at all. It goes from 0 to exact number - not what you want.
It looks like python, so grab a solution that uses string:
def digits(n):
return len(str(int(n))) # make sure that it's integer, than conver to string and return number of characters == number of digits
EDIT:
If you REALLY want to use a loop to count number of digits, you can do this this way:
def digits(n):
i = 0
while (n > 1):
n = n / 10
++i
return i
EDIT2:
since you really want to make your solution work, here is your problem. Provided, that you call your function like digits(5), 5 is of type integer, so your division is integer-based. That means, that 6/100 = 0, not 0.06.
def digits(n):
for i in range(0,n):
if n/float(10**(i))<1 and n/float(10**(i-1))=>1:
return i # we don't need to check anything else, this is the solution
return null # we don't the answer. This should not happen, but still, nice to put it here. Throwing an exception would be even better
I fixed it. Thanks for your input though :)
def digits(n):
for i in range(0,n):
if n/(10**(i))<1 and n/(10**(i-1))>=1:
return i

How to reference a particular row for an existing variable in SPSS syntax?

I have 2 variables, one for raw p-values and another for adjusted p-values. I need to compute a new variable based on the values of these two variables. What I need to do isn't too complicated, but I have a hard time doing it in SPSS because I can't figure out how I can reference a particular row for an existing variable in SPSS syntax.
The first column lists raw p-values in ascending order. The next column lists adjusted p-values, but these adjusted p-values are still incomplete. I need to compare two adjacent p-values in the adjusted p-values column (e.g., row 1 and 2, row 2 and 3, row 3 and 4, and so forth), and take the p-values whichever is smaller in each of these comparisons and enter those p-values into the following column as values for a new variable.
However, that's not the end of the story. One more condition has to be met. That is, the new p-values have to be in the same order as the raw p-values. However, I cannot ensure this if I start the comparisons from the top row. You can see that (i') is greater than (h') and (g'), and (d') is greater than (c'), (b'), and (a') in the example below (picture).
In order to solve this issue, I would need to start the comparison of the adjusted p-values from the bottom. In addition, I would need to compare the adjusted p-values to the new p-values of one row below. One exception is that I can simply use the value of (a) as the value of (a') since the value of (a) should always be the greatest of all the p-values as a rule. Then, for (b') , I need to compare (b) and (a') and enter whichever is smaller as (b'). For (c'), I need to compare (c) and (b') and enter whichever is smaller as (c'), and so forth. By doing this way, (d') would be 0.911 and (i') would be 0.017.
Sorry for this long post, but I would really appreciate if I can get some help to do this task in SPSS.
Thank you in advance for your help.
Raw p-values | Adjusted p-values (Temporal)| New p-values (Final)
-------------|-----------------------------|---------------------
0.002 | 0.030 (i) | 0.025 (i')
0.003 | 0.025 (h) | 0.017 (h')
0.004 | 0.017 (g) | 0.017 (g')
0.005 | 0.028 (f) | 0.028 (f')
0.023 | 0.068 (e) | 0.068 (e')
0.450 | 1.061 (d) | 1.061 (d')
0.544 | 1.145 (c) | 0.911 (c')
0.850 | 0.911 (b) | 0.911 (b')
0.974 | 0.974 (a) | 0.974 (a')
Another tool that may be convenient is the SHIFT VALUES command. It can move one or more columns of data either forward or backward.
I wonder whether the purpose of this has to do with adjusting p values for multiple testing corrections as with Benjamin-Hochberg FDR or others similar. If that is the case, you might find the STATS PADJUST (Analyze > Descriptives > Calculate adjusted p values) extension command useful. It offers six adjustment methods. You can install it from the Utilities (pre-V24) or Extensions (V24+) menu.
To get you started, here are a few tools that can help you with this task:
The LAG function
you can compare values in this line and the previous one, for example, the following will compare the Pval in each line to the one in the previous one, and put the smaller of the two in the NewPval:
compute NewPVal=min(Pval, lag(Pval)).
If you want to do the same process only start from the bottom, you can easily sort your data in reverse order and do the same.
CREATE + LEAD
if you want to make comparisons to the next line instead of the previous line, you should first create a "lead" variable and then compare to it.
for example, the following syntax will create a new variable that for each line contains the value of Pval in the next line, and then chooses the smaller of the two for the NewPval:
create /LeadPval=LEAD(Pval 1).
compute NewPVal=min(Pval, LeadPval).
Using case numbers
You can use case numbers (line numbers) in calculations and in conditions. For example, the following syntax will let you make different calculations in the first line and the following ones:
if $casenum=1 NewPval=Pval.
if $casenum>1 NewPVal=min(Pval, lag(Pval)).

Can't modify loop-variable in lua [duplicate]

This question already has answers here:
Lua for loop reduce i? Weird behavior [duplicate]
(3 answers)
Closed 7 years ago.
im trying this in lua:
for i = 1, 10,1 do
print(i)
i = i+2
end
I would expect the following output:
1,4,7,10
However, it seems like i is getting not affected, so it gives me:
1,2,3,4,5,6,7,8,9,10
Can someone tell my a bit about the background concept and what is the right way to modify the counter variable?
As Colonel Thirty Two said, there is no way to modify a loop variable in Lua. Or rather more to the point, the loop counter in Lua is hidden from you. The variable i in your case is merely a copy of the counter's current value. So changing it does nothing; it will be overwritten by the actual hidden counter every time the loop cycles.
When you write a for loop in Lua, it always means exactly what it says. This is good, since it makes it abundantly clear when you're doing looping over a fixed sequence (whether a count or a set of data) and when you're doing something more complicated.
for is for fixed loops; if you want dynamic looping, you must use a while loop. That way, the reader of the code is aware that looping is not fixed; that it's under your control.
When using a Numeric for loop, you can change the increment by the third value, in your example you set it to 1.
To see what I mean:
for i = 1,10,3 do
print(i)
end
However this isn't always a practical solution, because often times you'll only want to modify the loop variable under specific conditions. When you wish to do this, you can use a while loop (or if you want your code to run at least once, a repeat loop):
local i = 1
while i < 10 do
print(i)
i = i + 1
end
Using a while loop you have full control over the condition, and any variables (be they global or upvalues).
All answers / comments so far only suggested while loops; here's two more ways of working around this problem:
If you always have the same step size, which just isn't 1, you can explicitly give the step size as in for i =start,end,stepdo … end, e.g. for i = 1, 10, 3 do … or for i = 10, 1, -1 do …. If you need varying step sizes, that won't work.
A "problem" with while-loops is that you always have to manually increment your counter and forgetting this in a sub-branch easily leads to infinite loops. I've seen the following pattern a few times:
local diff = 0
for i = 1, n do
i = i+diff
if i > n then break end
-- code here
-- and to change i for the next round, do something like
if some_condition then
diff = diff + 1 -- skip 1 forward
end
end
This way, you cannot forget incrementing i, and you still have the adjusted i available in your code. The deltas are also kept in a separate variable, so scanning this for bugs is relatively easy. (i autoincrements so must work, any assignment to i below the loop body's first line is an error, check whether you are/n't assigning diff, check branches, …)

Can a SHA-1 hash be all-zeroes?

Is there any input that SHA-1 will compute to a hex value of fourty-zeros, i.e. "0000000000000000000000000000000000000000"?
Yes, it's just incredibly unlikely. I.e. one in 2^160, or 0.00000000000000000000000000000000000000000000006842277657836021%.
Also, becuase SHA1 is cryptographically strong, it would also be computationally unfeasible (at least with current computer technology -- all bets are off for emergent technologies such as quantum computing) to find out what data would result in an all-zero hash until it occurred in practice. If you really must use the "0" hash as a sentinel be sure to include an appropriate assertion (that you did not just hash input data to your "zero" hash sentinel) that survives into production. It is a failure condition your code will permanently need to check for. WARNING: Your code will permanently be broken if it does.
Depending on your situation (if your logic can cope with handling the empty string as a special case in order to forbid it from input) you could use the SHA1 hash ('da39a3ee5e6b4b0d3255bfef95601890afd80709') of the empty string. Also possible is using the hash for any string not in your input domain such as sha1('a') if your input has numeric-only as an invariant. If the input is preprocessed to add any regular decoration then a hash of something without the decoration would work as well (eg: sha1('abc') if your inputs like 'foo' are decorated with quotes to something like '"foo"').
I don't think so.
There is no easy way to show why it's not possible. If there was, then this would itself be the basis of an algorithm to find collisions.
Longer analysis:
The preprocessing makes sure that there is always at least one 1 bit in the input.
The loop over w[i] will leave the original stream alone, so there is at least one 1 bit in the input (words 0 to 15). Even with clever design of the bit patterns, at least some of the values from 0 to 15 must be non-zero since the loop doesn't affect them.
Note: leftrotate is circular, so no 1 bits will get lost.
In the main loop, it's easy to see that the factor k is never zero, so temp can't be zero for the reason that all operands on the right hand side are zero (k never is).
This leaves us with the question whether you can create a bit pattern for which (a leftrotate 5) + f + e + k + w[i] returns 0 by overflowing the sum. For this, we need to find values for w[i] such that w[i] = 0 - ((a leftrotate 5) + f + e + k)
This is possible for the first 16 values of w[i] since you have full control over them. But the words 16 to 79 are again created by xoring the first 16 values.
So the next step could be to unroll the loops and create a system of linear equations. I'll leave that as an exercise to the reader ;-) The system is interesting since we have a loop that creates additional equations until we end up with a stable result.
Basically, the algorithm was chosen in such a way that you can create individual 0 words by selecting input patterns but these effects are countered by xoring the input patterns to create the 64 other inputs.
Just an example: To make temp 0, we have
a = h0 = 0x67452301
f = (b and c) or ((not b) and d)
= (h1 and h2) or ((not h1) and h3)
= (0xEFCDAB89 & 0x98BADCFE) | (~0x98BADCFE & 0x10325476)
= 0x98badcfe
e = 0xC3D2E1F0
k = 0x5A827999
which gives us w[0] = 0x9fb498b3, etc. This value is then used in the words 16, 19, 22, 24-25, 27-28, 30-79.
Word 1, similarly, is used in words 1, 17, 20, 23, 25-26, 28-29, 31-79.
As you can see, there is a lot of overlap. If you calculate the input value that would give you a 0 result, that value influences at last 32 other input values.
The post by Aaron is incorrect. It is getting hung up on the internals of the SHA1 computation while ignoring what happens at the end of the round function.
Specifically, see the pseudo-code from Wikipedia. At the end of the round, the following computation is done:
h0 = h0 + a
h1 = h1 + b
h2 = h2 + c
h3 = h3 + d
h4 = h4 + e
So an all 0 output can happen if h0 == -a, h1 == -b, h2 == -c, h3 == -d, and h4 == -e going into this last section, where the computations are mod 2^32.
To answer your question: nobody knows whether there exists an input that produces all zero outputs, but cryptographers expect that there are based upon the simple argument provided by daf.
Without any knowledge of SHA-1 internals, I don't see why any particular value should be impossible (unless explicitly stated in the description of the algorithm). An all-zero value is no more or less probable than any other specific value.
Contrary to all of the current answers here, nobody knows that. There's a big difference between a probability estimation and a proof.
But you can safely assume it won't happen. In fact, you can safely assume that just about ANY value won't be the result (assuming it wasn't obtained through some SHA-1-like procedures). You can assume this as long as SHA-1 is secure (it actually isn't anymore, at least theoretically).
People doesn't seem realize just how improbable it is (if all humanity focused all of it's current resources on finding a zero hash by bruteforcing, it would take about xxx... ages of the current universe to crack it).
If you know the function is safe, it's not wrong to assume it won't happen. That may change in the future, so assume some malicious inputs could give that value (e.g. don't erase user's HDD if you find a zero hash).
If anyone still thinks it's not "clean" or something, I can tell you that nothing is guaranteed in the real world, because of quantum mechanics. You assume you can't walk through a solid wall just because of an insanely low probability.
[I'm done with this site... My first answer here, I tried to write a nice answer, but all I see is a bunch of downvoting morons who are wrong and can't even tell the reason why are they doing it. Your community really disappointed me. I'll still use this site, but only passively]
Contrary to all answers here, the answer is simply No.
The hash value always contains bits set to 1.

Erlang: What is most-wrong with this trie implementation?

Over the holidays, my family loves to play Boggle. Problem is, I'm terrible at Boggle. So I did what any good programmer would do: wrote a program to play for me.
At the core of the algorithm is a simple prefix trie, where each node is a dict of references to the next letters.
This is the trie:add implementation:
add([], Trie) ->
dict:store(stop, true, Trie);
add([Ch|Rest], Trie) ->
% setdefault(Key, Default, Dict) ->
% case dict:find(Key, Dict) of
% { ok, Val } -> { Dict, Val }
% error -> { dict:new(), Default }
% end.
{ NewTrie, SubTrie } = setdefault(Ch, dict:new(), Trie),
NewSubTrie = add(Rest, SubTrie),
dict:store(Ch, NewSubTrie, NewTrie).
And you can see the rest, along with an example of how it's used (at the bottom), here:
http://gist.github.com/263513
Now, this being my first serious program in Erlang, I know there are probably a bunch of things wrong with it… But my immediate concern is that it uses 800 megabytes of RAM.
So, what am I doing most-wrong? And how might I make it a bit less-wrong?
You could implement this functionality by simply storing the words in an ets table:
% create table; add words
> ets:new(words, [named_table, set]).
> ets:insert(words, [{"zed"}]).
> ets:insert(words, [{"zebra"}]).
% check if word exists
> ets:lookup(words, "zed").
[{"zed"}]
% check if "ze" has a continuation among the words
78> ets:match(words, {"ze" ++ '$1'}).
[["d"],["bra"]]
If trie is a must, but you can live with a non-functional approach, then you can try digraphs, as Paul already suggested.
If you want to stay functional, you might save some bytes of memory by using structures using less memory, for example proplists, or records, such as -record(node, {a,b,....,x,y,z}).
I don't remember how much memory a dict takes, but let's estimate. You have 2.5e6 characters and 2e5 words. If your trie had no sharing at all, that would take 2.7e6 associations in the dicts (one for each character and each 'stop' symbol). A simple purely-functional dict representation would maybe 4 words per association -- it could be less, but I'm trying to get an upper bound. On a 64-bit machine, that'd take 8*4*2.7 million bytes, or 86 megabytes. That's only a tenth of your 800M, so something's surely wrong here.
Update: dict.erl represents dicts with a hashtable; this implies lots of overhead when you have a lot of very small dicts, as you do. I'd try changing your code to use the proplists module, which ought to match my calculations above.
An alternative way to solve the problem is going through the word list and see if the word can be constructed from the dice. That way you need very little RAM, and it might be more fun to code. (optimizing and concurrency)
Look into DAWGs. They're much more compact than tries.
I don't know about your algorithm, but if you're storing that much data, maybe you should look into using Erlang's built-in digraph library to represent your trie, instead of so many dicts.
http://www.erlang.org/doc/man/digraph.html
If all words are in English, and the case doesn't matter, all characters can be encoded by numbers from 1 to 26 (and in fact, in Erlang they are numbers from 97 to 122), reserving 0 for stop. So you can use the array module as well.

Resources