We are given a stream of tokens, e.g. a,b,c,..., and need to remove the first-occurring copy of each consecutive duplicate substring in the stream.

Remove repeating sequences of n tokens, one iteration per n = 1, 2, 3, 4, ...
iteration 1:
repeating 1 seq example
a,a,a,a,b,a,a,a,a,a, ==> a,b,a
^,^, ^
iteration 2:
repeating 2 seq example
a,a,a,a,b,a,b,a,b,a, ==> a,b,a
^, ^,^
iteration 3:
repeating 3 seq example
a,a,a,a,b,c,a,b,c,a,b,c,e ==> a,b,c,a,b,c,e
^, ^,^,^,^,^,^
.
.
.
iteration 4:(combination of iteration 2 and iteration 3)
a,a,a,a,b,c,a,b,c,a,b,c,b,c,e ==> a,b,c,a,b,c,e
^, ^,^,^, ^,^,^
Here we are given a stream of tokens (a token is a message ID that carries a timestamp and some other information).
Because each message ID has its own timestamp, we only need the latest one, so the duplicate substring that occurred earlier must be removed.
Thus only the most recent copy of a duplicate substring is stored, and the preceding consecutive copies are removed.
If any substring is still stored after the duplicates are removed, a new window starts for the upcoming stream, and duplicates are checked within that next window of tokens.
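One possible reading of the rule above can be sketched in Python. This is only a sketch under assumptions: `collapse_repeats` and `dedupe` are made-up names, tokens are compared with `==`, and every run of repeats is collapsed to a single copy. That reproduces the first two examples, but the third and fourth examples in the question keep two copies of a,b,c, so the intended rule may differ.

```python
def collapse_repeats(tokens, n):
    """Collapse consecutive repeats of any n-token block, keeping the newest copy."""
    out = []
    for tok in tokens:
        out.append(tok)
        # If the last n tokens repeat the n tokens before them,
        # drop the older copy so the most recent one survives.
        if len(out) >= 2 * n and out[-n:] == out[-2 * n:-n]:
            del out[-2 * n:-n]
    return out


def dedupe(tokens, max_n):
    """Apply the collapse for block lengths 1, 2, ..., max_n in turn."""
    for n in range(1, max_n + 1):
        tokens = collapse_repeats(tokens, n)
    return tokens


print(dedupe(list("aaaabaaaaa"), 1))   # iteration 1 example
print(dedupe(list("aaaabababa"), 2))   # iteration 2 example
```

Because the older copy is the one deleted, a token object carrying a timestamp would keep its latest instance.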

Related

Why the output differs in these two erlang expression sequence in shell?

In the Erlang shell, why do the following two sequences produce different results?
1> Total=15.
2> Calculate=fun(Number)-> Total=2*Number end.
3> Calculate(6).
exception error: no match of right hand side value 12
1> Calculate=fun(Number)-> Total=2*Number end.
2> Total=15.
3> Calculate(6).
12
In Erlang the = operator is both assignment and assertion.
If I do this:
A = 1,
A = 2,
my program will crash. I just told it that A = 1. Because A was unbound (it didn't yet exist as a label), it is now assigned the value 1 forever and ever, until the scope of execution changes. So when I then tell it that A = 2, it tries to assert that the value of A is 2, which it is not. So we get a crash on a bad match.
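The bind-or-assert behaviour described above can be mimicked in a few lines of Python. This is only an illustrative sketch: the `Bindings` class, its `match` method and the `BadMatch` exception are made-up names, not part of Erlang or of any library.

```python
class BadMatch(Exception):
    """Raised when a bound name is matched against a different value."""


class Bindings:
    """Toy model of one Erlang scope: each name can be bound only once."""

    def __init__(self):
        self._values = {}

    def match(self, name, value):
        if name not in self._values:
            # Unbound name: '=' acts as assignment.
            self._values[name] = value
            return value
        bound = self._values[name]
        # Bound name: '=' acts as an assertion. Types are compared too,
        # to mimic Erlang treating 15 and 15.0 as different in a match.
        if type(bound) is not type(value) or bound != value:
            raise BadMatch(f"no match of right hand side value {value!r}")
        return value


scope = Bindings()
scope.match("A", 1)        # binds A to 1
scope.match("A", 1)        # fine: asserts that A is 1
try:
    scope.match("A", 2)    # fails: A is already bound to 1
except BadMatch as err:
    print(err)
```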
Scope in Erlang is defined by two things:
Definition of the current function. This scope is absolute for the duration of the function definition.
Definition of the current lambda or list comprehension. This scope is local to the lambda but also closes over whatever values from the outer scope are referenced.
These scopes are always superseded at the time they are declared by whatever is in the outer scope. That is how we make closures with anonymous functions. For example, let's say I have a socket I want to send a list of data through. The socket is already bound to the variable name Socket in the head of the function, and we want to use a list operation to map the list of values to send to a side effect of being sent over that specific socket. I can close over the value of the socket within the body of a lambda, which has the effect of currying that value out of the more general operation of "sending some data":
send_stuff(Socket, ListOfMessages) ->
    Send = fun(Message) -> ok = gen_tcp:send(Socket, Message) end,
    lists:foreach(Send, ListOfMessages).
Each iteration of the list operation lists:foreach/2 can only accept a function of arity 1 as its first argument. We have created a closure that captures the value of Socket internally already (because that was already bound in the outer scope) and combines it with the unbound, inner variable Message. Note also that we are checking whether gen_tcp:send/2 worked each time within the lambda by asserting that the return value of gen_tcp:send/2 was really ok.
This is a super useful property.
So with that in mind, let's look at your code:
1> Total = 15.
2> Calculate = fun(Number)-> Total = 2 * Number end.
3> Calculate(6).
In the code above you've just assigned a value to Total, meaning you have created a label for that value (just like we had assigned Socket in the above example). Then later you are asserting that the value of Total is whatever the result of 2 * Number might be. That assertion can never hold: Total is the integer 15, no integer doubles to 15, and 2 * 7.5 wouldn't cut it either, because the result would be 15.0, not 15.
1> Calculate = fun(Number)-> Total = 2 * Number end.
2> Total = 15.
3> Calculate(6).
In this example, though, you've got an inner variable called Total which does not close over any value declared in the outer scope. Later, you are declaring a label in the outer scope called Total, but by this time the lambda definition on the first line has been converted to an abstract function and the label Total as used there has been completely given over to the immutable space of the new function definition the assignment to Calculate represented. Thus, no conflict.
Consider what happens, for example, with trying to reference an inner value from a list comprehension:
1> A = 2.
2
2> [A * B || B <- lists:seq(1,3)].
[2,4,6]
3> A.
2
4> B.
* 1: variable 'B' is unbound
This is not what you would expect from, say, Python 2:
>>> a = 2
>>> a
2
>>> [a * b for b in range(1,4)]
[2, 4, 6]
>>> b
3
Incidentally, this has been fixed in Python 3:
>>> a = 2
>>> a
2
>>> [a * b for b in range(1,4)]
[2, 4, 6]
>>> b
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
NameError: name 'b' is not defined
(And I would provide a JavaScript example for comparison as well, but the scoping rules there are just so absolutely insane it doesn't even matter...)
In the first case you have bound Total to 15. In Erlang, variables are immutable, but in the shell, when you write Total = 15. you do not really create the variable Total; the shell does its best to mimic the behavior you would have if you were running an application, and it stores the pair {"Total",15} in a table.
On the next line you define the fun Calculate. The parser finds the expression Total=2*Number and goes through its table to detect that Total was previously defined. The evaluation is turned into something equivalent to 15 = 2*Number.
So, in the third line, when you ask to evaluate Calculate(6), it goes to calculate and evaluates 15 = 2*6 and issues the error message
exception error: no match of right hand side value 12
In the second example, Total is not yet defined when you define the function. The function is stored without assignment (Total is not used anymore), at least no assignment to a global variable. So there is no conflict when you define Total, and no error when you evaluate Calculate(6).
The behavior would be exactly the same in a compiled module.
The variable Total has already been assigned the value 15, so you cannot use the same variable name Total in the second line. You should change it to another name, such as Total1 or Total2.

Lua Pattern Matching - force optional parameters to be filled before it moves on?

I need a pattern that will grab values for two or more parameters of varying length.
I need to convert a scanf format %i to a Lua pattern, and it is proving very difficult. I don't need to worry about the type of storage that can be passed in with scanf, just the %i for integer and whether a specific length is given.
Documentation for scanf, if needed, can be found here.
This is what I have so far:
if(scanfLetter == "i" or scanfLetter == "d" or scanfLetter == "u") then
    if(specifiedNum == 0)then
        newPattern = "([%+%-]?%d+)"
    elseif(specifiedNum >= 2)then
        newPattern = "([%+%-%d]?%d^".. string.rep("%d?", specifiedNum-2)..")$"
    else
        newPattern = "(%d)"
    end
which basically just checks to see if they passed in a specific number or not (specifiedNum). If it is 0 that means they didn't.
It all works until there are two length specified %is in a row. For example:
%5i%6i
If they enter 6 or more characters it works fine, because match returns two values: the first consists of the first 5 digits and the 6th digit goes to the next value.
The trouble is when there are 5 or fewer. With scanf, %5i%6i will not match if there are 5 or fewer digits, but my pattern still matches under string.match: the first return value gets all the digits but the last, and the second gets the last digit entered.
Here is a more specific example, so you don't have to type it all out to see what the pattern ends up looking like.
Given:
([%.%+%-%d]?%d[%.%d]?[%.%d]?[%.%d]?)([%.%+%-%d]?%d[%.%d]?[%.%d]?[%.%d]?[%.%d]?)
If 123456789 is passed to it, the match returns 2 values:
`12345` and `6789`
However, if 1234 is passed in, the match returns 2 values:
`123` and `4`
which is incorrect (it should not be a match).
Is what I seek possible?
(Maybe somebody has already written a scanf format to lua patterns converter?)

PostScript execution of nested procedures

(I'm back with yet another question :-) )
Given the following PostScript code:
/riverside { 5 pop } def
/star { 6 pop 2 {riverside} repeat } def
star
I'm wondering how nested procedures should be handled. (I'm creating my own interpreter).
When I execute the star procedure, halfway through it finds a name object (riverside), replaces it with an executable array containing the values from the riverside procedure, and executes them.
If I execute the repeat operator the interpreter crashes because there is only one item left on the stack.
Should I actually execute an executable array (= procedure) directly when I'm already in an executable array (= procedure), or should executable arrays (= procedures) always be pushed on the (operand? execution?) stack, or only be executed by another operator?
How many times should this riverside be executed? (2 or 3 times?) I guess 2?
For your information: this is the situation that I have when I execute star on the 3rd line (see the ERROR):
% begin execute 3rd line (star)
% OP = operand stack
% EX = execution stack
% handle 6
OP: 6
EX: star
% handle pop (removes 6 from OP)
OP: -
EX: star
% handle 2
OP: 2
EX: star
% set the riverside executable array on the EX, execute the values
OP: 2
EX: star riverside
% repeat operator:
CRASH, only one item on the OP left, but repeat operator requires 2 operands.
OP: 5
EX:
% end
Please shine a light on this matter, because it is somewhat complex/confusing :-)
Update:
another code sample might be this one:
/starside
{ 72 0 lineto
currentpoint translate
-144 rotate } def
/star
{ moveto
currentpoint translate
4 {starside} repeat
closepath
gsave
.5 setgray fill
grestore
stroke } def
200 200 star
showpage
when the interpreter tokenizes /star { moveto ... if it encounters the nested {starside} how will that be treated? (+ what if there was {starside 5 2 mul pop} instead of only {starside} ?)
I believe you need to look at section 3.5.3 of the PLRM. Although this deals with a simple executable array, the concept is the same. When the token scanner encounters a '{' it starts to build an executable array. Until it reaches the matching '}' token, the scanner simply stores what it encounters on the operand stack. When it encounters the matching '}', the objects are converted into an executable array (which is stored on the operand stack).
In the case of the scanner encountering an executable name, it stores the name on the operand stack. It does not execute the name, nor does it even perform lookup on it to retrieve the associated object.
So immediately before the execution of '}' in your example, the operand stack would contain two objects: the '{' opening-array mark and the executable name riverside. When you encounter the '}', the scanner creates the actual executable array and stores it on the operand stack. (Note: implementation details vary here.)
So immediately before the execution of 'repeat' you would have two objects on the stack, the counter and an executable array containing a single executable name.
You don't look up the name until the executable array containing the name is executed.
This might make it clearer:
%!
/test {(This is my initial string\n) print} def
2 {test} repeat
2 {test} /test {(This is my second string\n) print} def repeat
Notice that I've redefined 'test' after creating the executable array containing the executable name 'test', yet the execution uses the later definition of test. As you can see, it's vitally important not to do name lookup too early!
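The same rule (push a nested procedure as data, look names up only when they are executed) can be illustrated with a toy interpreter in Python. This is a rough sketch, not how a real PostScript interpreter is structured: the `run` function and its token encoding are invented for the illustration, the program is assumed to be already scanned, only pop, def and repeat are implemented, and a single dict stands in for the dictionary stack. riverside is changed to { 5 } (without the pop) so that each execution leaves a visible 5 on the stack.

```python
def run(program):
    """Run a pre-scanned program and return the final operand stack.

    Hypothetical encoding: ints are numbers, Python lists are procedures
    ({...}), strings starting with '/' are literal names and all other
    strings are executable names.
    """
    stack = []   # operand stack
    names = {}   # one dict standing in for the dictionary stack

    def call(proc):
        # Run a procedure body one object at a time.
        for obj in proc:
            step(obj)

    def step(obj):
        if not isinstance(obj, str) or obj.startswith("/"):
            # Numbers, literal names and nested procedures met directly
            # are data: push them, do not execute them.
            stack.append(obj)
        elif obj == "pop":
            stack.pop()
        elif obj == "def":
            value, key = stack.pop(), stack.pop()
            names[key[1:]] = value
        elif obj == "repeat":
            proc, count = stack.pop(), stack.pop()
            for _ in range(count):
                call(proc)
        else:
            # Executable name: the lookup happens only now.
            value = names[obj]
            if isinstance(value, list):
                call(value)
            else:
                stack.append(value)

    call(program)
    return stack


# /riverside { 5 } def  /star { 6 pop 2 {riverside} repeat } def  star
star = ["/riverside", [5], "def",
        "/star", [6, "pop", 2, ["riverside"], "repeat"], "def",
        "star"]
print(run(star))   # one 5 per execution of riverside

# /test { 1 } def  2 {test} /test { 2 } def repeat
late = ["/test", [1], "def", 2, ["test"], "/test", [2], "def", "repeat"]
print(run(late))   # shows which definition of test was used
```

In the first program repeat finds both of its operands, the count 2 and the procedure {riverside}, on the operand stack, which is exactly the state the question's interpreter failed to reach.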

Read numbers following a keyword into an array in Fortran 90 from a text file

I have many text files of this format
....
<snip>
'FOP' 0.19 1 24 1 25 7 8 /
'FOP' 0.18 1 24 1 25 9 11 /
/
TURX
560231
300244
70029
200250
645257
800191
900333
600334
770291
300335
220287
110262 /
SUBTRACT
'TURX' 'TURY'/
</snip>
......
where the portions I snipped off contain other various data in various formats. The file format is inconsistent (machine generated), the only thing one is assured of is the keyword TURX which may appear more than once. If it appears alone on one line, then the next few lines will contain numbers that I need to fetch into an array. The last number will have a space then a forward slash (/). I can then use this array in other operations afterwards.
How do I "search" or parse a file of unknown format in fortran, and how do I get a loop to fetch the rest of the data, please? I am really new to this and I HAVE to use fortran. Thanks.
Fortran 95 / 2003 have a lot of string and file handling features that make this easier.
For example, this code fragment to process a file of unknown length:
use iso_fortran_env
character (len=100) :: line
integer :: ReadCode
ReadLoop: do
   read (75, '(A)', iostat=ReadCode ) line
   if ( ReadCode /= 0 ) then
      if ( ReadCode == iostat_end ) then
         exit ReadLoop
      else
         write ( *, '( / "Error reading file: ", I0 )' ) ReadCode
         stop
      end if
   end if
   ! code to process the line ....
end do ReadLoop
Then the "process the line" code can contain several sections depending on a logical variable "Have_TURX". If Have_TURX is false you are "seeking": test whether the line contains "TURX". You could use a plain "==" if TURX is always at the start of the string, or for more generality you could use the intrinsic function "index" to test whether the string "line" contains TURX.
Once the program is in the mode where Have_TURX is true, you use "internal I/O" to read the numeric value from the string. Since the integers have varying lengths and are left-justified, the easiest way is to use "list-directed I/O". Combining these:
read (line, *) integer_variable
Then you could use the intrinsic function "index" again to test whether the string also contains a slash, in which case you change Have_TURX to false and end reading mode.
If you need to put the numbers into an array, it might be necessary to read the file twice, or to backspace the file, because you will have to allocate the array, and you can't do that until you know the size of the array. Or you could pop the numbers into a linked list, then when you hit the slash allocate the array and fill it from the linked list. Or if there is a known maximum number of values you could use a temporary array, then transfer the numbers to an allocatable output array. This assumes that you want the output argument of the subroutine to be an allocatable array of the correct length, and that it returns one group of numbers per call:
integer, dimension (:), allocatable, intent (out) :: numbers
allocate (numbers (1: HowMany) )
P.S. There is a brief summary of the language features at http://en.wikipedia.org/wiki/Fortran_95_language_features and the gfortran manual has a summary of the intrinsic procedures, from which you can see what built in functions are available for string handling.
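The seek-then-capture logic described above is language independent, so here is a short sketch of it in Python for readers who want to see the control flow before writing the Fortran. The function name `read_turx_blocks` and the variable `current` (playing the role of Have_TURX) are made up for this illustration, and it assumes every captured field is an integer.

```python
def read_turx_blocks(lines):
    """Return one list of integers per TURX block found in `lines`."""
    blocks = []
    current = None                 # None means "seeking"; a list means "capturing"
    for raw in lines:
        line = raw.strip()
        if current is None:
            if line == "TURX":     # keyword alone on its line
                current = []
            continue
        # Capturing: the slash marks the end of the block.
        done = "/" in line
        for field in line.replace("/", " ").split():
            current.append(int(field))
        if done:
            blocks.append(current)
            current = None
    return blocks


sample = """'FOP' 0.18 1 24 1 25 9 11 /
/
TURX
560231
300244
110262 /
SUBTRACT
'TURX' 'TURY'/
""".splitlines()
print(read_turx_blocks(sample))
```

A block that is still open when the input ends is silently dropped in this sketch; whether that should be an error depends on the files.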
I'll give you a nudge in the right direction so that you can finish your project.
Some basics:
Do/While, as you'll need some sort of loop structure to loop through the file and then over the numbers. There's no for loop in Fortran, so use this type.
Read, to read the strings.
To start you need something like this:
program readlines
  implicit none
  character (len=30) :: rdline
  integer,dimension(1000) :: array
  ! This sets up a character string of length 30 and an integer array with 1000 elements
  !
  open(18,file='fileread.txt')
  do
    read(18,*) rdline
    if (trim(rdline).eq.'TURX') exit !loop until the trimmed line matches TURX
  end do
See this thread for way to turn your strings into integers.
Final edit: Looks like MSB has got most of what I just found out. The iostat argument of the read is the key to it. See this site for a sample program.
Here was my final way around it.
PROGRAM fetchnumbers
  implicit none
  character (len=50) :: line, numdata
  logical :: is_numeric
  integer :: I, iost, iost2, counter, number
  integer, dimension(1000) :: numbers !Can the number of numbers be up to 1000?
  counter = 0
  open(20,file='inputfile.txt') !assuming file is in the same location as program
  ReadLoop: do
    read(20,'(A)',iostat=iost) line !read data line by line
    if (iost /= 0) exit !end of file (or read error) before another TURX was found
    if (len_trim(line)==0) cycle ReadLoop !ignore empty lines
    if (adjustl(line) == 'TURX') then !keyword alone on its line: begin capturing
      GetNumbers: do
        read(20,'(A)',iostat=iost2) numdata !read in the numbers one line at a time
        if (iost2 /= 0) exit ReadLoop !end of file reached while fetching numbers
        if (.NOT.is_numeric(numdata)) exit GetNumbers !no more numbers to read
        if (counter == size(numbers)) exit ReadLoop !array is full
        read (numdata,*) number !read string value into a number
        counter = counter + 1
        numbers(counter) = number !storing data into array
        if (index(numdata, '/') > 0) exit GetNumbers !slash marks the last number
      end do GetNumbers
    end if
  end do ReadLoop
  close(20)
  write(*,*) "Numbers are:"
  do I=1,counter
    write(*,'(I14)') numbers(I)
  end do
END PROGRAM fetchnumbers

FUNCTION is_numeric(string)
  IMPLICIT NONE
  CHARACTER(len=*), INTENT(IN) :: string
  LOGICAL :: is_numeric
  REAL :: x
  INTEGER :: e
  !The string counts as numeric if a list-directed read of a real succeeds
  is_numeric = .FALSE.
  READ(string,*,IOSTAT=e) x
  IF (e == 0) is_numeric = .TRUE.
END FUNCTION is_numeric

Constrained Sequence to Index Mapping

I'm puzzling over how to map a set of sequences to consecutive integers.
All the sequences follow this rule:
A_0 = 1
A_n >= 1
A_n <= max(A_0 .. A_n-1) + 1
I'm looking for a solution that, given such a sequence, computes an integer for doing a lookup into a table and, given an index into the table, generates the sequence.
Example: for length 3, there are 5 valid sequences. A fast function for doing the following map (preferably in both directions) would be a good solution.
1,1,1 0
1,1,2 1
1,2,1 2
1,2,2 3
1,2,3 4
The point of the exercise is to get a packed table with a 1-1 mapping between valid sequences and cells.
The size of the set is bounded only by the number of unique sequences possible.
I don't know now what the length of the sequence will be but it will be a small, <12, constant known in advance.
I'll get to this sooner or later, but thought I'd throw it out for the community to have "fun" with in the meantime.
these are different valid sequences
1,1,2,3,2,1,4
1,1,2,3,1,2,4
1,2,3,4,5,6,7
1,1,1,1,2,3,2
these are not
1,2,2,4
2,
1,1,2,3,5
Related to this
There is a natural sequence indexing, but it is not so easy to calculate.
Let's look at A_n for n>0, since A_0 = 1.
Indexing is done in 2 steps.
Part 1:
Group sequences by places where A_n = max(A_0 .. A_n-1) + 1. Call these places steps.
The values at the steps are consecutive numbers (2,3,4,5,...).
At a non-step place k we can put any number from 1 up to the current maximum, i.e. one more than the number of steps with index less than k.
Each group can be represented as a binary string where 1 is a step and 0 a non-step. E.g. 001001010 means the group 112aa3b4c, with a<=2, b<=3, c<=4. Because groups are indexed by a binary number, there is a natural indexing of groups, from 0 to 2^length - 1. Let's call the value of a group's binary representation the group order.
Part 2:
Index sequences inside a group. Since groups define step positions, only numbers on non-step positions are variable, and they are variable in defined ranges. With that it is easy to index sequence of given group inside that group, with lexicographical order of variable places.
It is easy to calculate the number of sequences in one group. It is a number of the form 1^i_1 * 2^i_2 * 3^i_3 * ....
Combining:
This gives a 2 part key: <Steps, Group>, which then needs to be mapped to the integers. To do that we have to find how many sequences are in groups that have order less than some value. For that, let's first find how many sequences are in groups of a given length. That can be computed by passing through all groups and summing the number of sequences, or similarly with a recurrence. Let T(l, n) be the number of sequences of length l (A_0 is omitted) where the first element can be at most n+1. Then the following holds:
T(l,n) = n*T(l-1,n) + T(l-1,n+1)
T(1,n) = n + 1
Because l + n <= sequence length + 1 there are ~sequence_length^2/2 T(l,n) values, which can be easily calculated.
Next is to calculate the number of sequences in groups of order less than or equal to a given value. That can be done by summing T(l,n) values. E.g. the number of sequences in groups with order <= 1001010 binary is equal to
T(7,1) + # for 1000000
2^2 * T(4,2) + # for 001000
2^2 * 3 * T(2,3) # for 010
Optimizations:
This will give a mapping but the direct implementation for combining the key parts is >O(1) at best. On the other hand, the Steps portion of the key is small and by computing the range of Groups for each Steps value, a lookup table can reduce this to O(1).
I'm not 100% sure about upper formula, but it should be something like it.
With these remarks and recurrence it is possible to make functions sequence -> index and index -> sequence. But not so trivial :-)
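For comparison, here is a sketch of a sequence -> index and index -> sequence pair in Python. Note that it does not implement the <Steps, Group> key described above: it ranks the sequences in plain lexicographic order, which is the order used in the question's length-3 table. It reuses the same recurrence, with the base case taken as count(0, n) = 1. The function names `count`, `rank` and `unrank` are made up for the example.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def count(remaining, highest):
    """Ways to append `remaining` terms when the running maximum is `highest`."""
    if remaining == 0:
        return 1
    # The next term either stays within 1..highest or raises the maximum by one.
    return highest * count(remaining - 1, highest) + count(remaining - 1, highest + 1)


def rank(seq):
    """Lexicographic index of a valid sequence among those of the same length."""
    index, highest = 0, 1
    for pos in range(1, len(seq)):
        remaining = len(seq) - pos - 1
        # Each smaller value at this position leaves the maximum unchanged.
        index += (seq[pos] - 1) * count(remaining, highest)
        highest = max(highest, seq[pos])
    return index


def unrank(index, length):
    """Inverse of rank: rebuild the sequence of the given length."""
    seq, highest = [1], 1
    for pos in range(1, length):
        block = count(length - pos - 1, highest)
        value = min(index // block, highest) + 1
        index -= (value - 1) * block
        seq.append(value)
        highest = max(highest, value)
    return seq


for i in range(count(2, 1)):
    print(unrank(i, 3), i)
```

For a fixed length below 12 the memoised `count` table stays tiny, so both directions take time linear in the sequence length.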
I think a hash without sorting should be the thing.
As A_0 is always 1, maybe we can think of the sequence as a number in base 12 and use its base 10 value as the key for the lookup. (Still not sure about this.)
This is a Python function which can do the job for you, assuming you have these values stored in a file and you pass the lines to the function
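def valid_lines(lines):
    for line in lines:
        # Parse "1,1,2,3" into integers, ignoring blanks and trailing commas
        seq = [int(x) for x in line.split(",") if x.strip()]
        if not seq or seq[0] != 1:
            continue
        highest = 1
        for value in seq[1:]:
            # Each term must be >= 1 and at most one above the running maximum
            if value < 1 or value > highest + 1:
                break
            highest = max(highest, value)
        else:
            yield seq

with open('/tmp/numbers.txt') as lines:
    for valid_line in valid_lines(lines):
        print(valid_line)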
Given the sequence, I would sort it, then use the hash of the sorted sequence as the index of the table.
