How to take in digits 0-255 from a file with no delimiters - parsing

I have a plaintext file that has only numerical digits in it (no spaces, commas, newlines, etc.). The digits encode n numbers, each in the range 0 to 255. I want to read the file and store these values in an array.
Example
Let's say we have this sequence in the file:
581060100962552569
I want to take it in like this, where in.read is the file input stream, tempArray is a local array of at most 3 variables that is wiped every time something is stored in endArray, which is where I want the final values to go:
in.read tempArray endArray
5 [5][ ][ ] [] //It reads in "5", sees that every single-digit number X keeps "5X" less than or equal to 255, and continues
8 [5][8][ ] [58] //It reads in "8", realizes that there's no number X that could make "58X" smaller than or equal to "255", so it stores "58" in endArray
1 [1][ ][ ] [58] //It wipes tempArray and reads the next value into it, repeating the logic of the first step
0 [1][0][ ] [58] //It realizes that all single-digit numbers X guarantee that "10X" is less than or equal to "255", so it continues
6 [1][0][6] [58][106] //It reads "6" and adds "106" to the endArray
0 [0][ ][ ] [58][106] //It wipes tempArray and stores the next value in it
1 [0][1][ ] [58][106]
0 [0][1][0] [58][106][10] //Even though all single-digit numbers X guarantee that "010X" is less than or equal to "255", tempArray is full, so it stores its contents in endArray as "10".
0 [0][ ][ ] [58][106][10]
9 [0][9][ ] [58][106][10]
6 [0][9][6] [58][106][10][96] //Not only can "96" not have another number appended to it, but tempArray is full
2 [2][ ][ ] [58][106][10][96]
5 [2][5][ ] [58][106][10][96] //There are numbers that can be appended to "25" to make a number less than or equal to "255", so continue
5 [2][5][5] [58][106][10][96][255] //"5" can be appended to "25" and still be less than or equal to "255", so it stores it in tempArray, finds tempArray is full, so it stores tempArray's values in endArray as "255"
2 [2][ ][ ] [58][106][10][96][255]
5 [2][5][ ] [58][106][10][96][255] //There are numbers that can be appended to "25" to make a number less than or equal to "255", so continue
6 [6][ ][ ] [58][106][10][96][255][25] //It sees that adding "6" to "25" would make a number that's larger than 255, so it stores "25" in the endArray and remembers "6" in the tempArray
9 [6][9][ ] [58][106][10][96][255][25][69] //It sees that there is no number X such that "69X" is less than or equal to "255", so it stores "69" in endArray
Does anyone know how I can accomplish this behavior? Please try to keep your answers in pseudocode, so it can be translated to many programming languages.

I would not use a temp array to hold the intermediate digits: the CPU stores numbers in binary and you are reading decimal digits, so you can accumulate the value directly as you read.
Something like this could solve your problem:
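array = []
accumulator = 0
count = 0    # number of digits currently held in accumulator
while not EOF:
    n = readDigit()
    if count == 3 or accumulator*10 + n > 255:
        array.push(accumulator)
        accumulator = n
        count = 1
    else:
        accumulator = accumulator*10 + n
        count = count + 1
if count > 0:
    array.push(accumulator)    # flush the final number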
The results are appended to the array called array.
Edit: Thanks to DeanOC for noticing the missing counter. But DeanOC's solution initializes the counter for the first iteration to 0 instead of 1.

antiguru's response is nearly there.
The main problem is that it doesn't take into consideration that the numbers can only have 3 digits. This modification should work for you.
array = []
accumulator = 0
digitCounter = 0
while not EOF:
    n = readDigit()
    if accumulator*10 + n > 255 or digitCounter == 3:
        array.push(accumulator)
        accumulator = n
        digitCounter = 1
    else:
        accumulator = accumulator*10 + n
        digitCounter = digitCounter + 1
if digitCounter > 0:
    array.push(accumulator)    # store the last number once the input ends
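For anyone who wants something runnable, here is a minimal JavaScript sketch of the accumulator approach above. It assumes the whole file has already been read into a string of digits, and the name splitDigits is made up for illustration. It parses greedily, exactly as the question describes, and pushes the last number once the input ends.

```javascript
// Greedily split a string of decimal digits into numbers in 0..255,
// taking at most three digits per number (hypothetical helper name).
function splitDigits(digits) {
  const result = [];
  let accumulator = 0;
  let digitCount = 0; // digits currently held in accumulator

  for (const ch of digits) {
    const n = ch.charCodeAt(0) - 48; // '0' has char code 48
    if (digitCount === 3 || accumulator * 10 + n > 255) {
      // The next digit does not fit: store what we have and start over.
      result.push(accumulator);
      accumulator = n;
      digitCount = 1;
    } else {
      accumulator = accumulator * 10 + n;
      digitCount += 1;
    }
  }
  if (digitCount > 0) result.push(accumulator); // flush the final number
  return result;
}

splitDigits("581060100962552569"); // returns [58, 106, 10, 96, 255, 25, 69]
```

Note that leading zeros are absorbed into the following digits, so "010" is stored as 10, matching the walkthrough in the question.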

Related

What's the proper way to find the last parcel of an array?

I'm doing some codewars and arr[index] keeps returning nil. I've done this a few different ways, and I'm sure the array exists, as well as the index. What's wrong here, is it syntax?
As I've mentioned in the title, I want to find the last digit of the array.
if arr[index] <= 0 then
return -1
end
Full Code:
local solution = {}
function solution.newAvg(arr, navg)
  local currentAverage = 0
  local index = 0
  for i, v in pairs(arr) do
    index = i
    currentAverage = currentAverage + v
  end
  if arr[index] <= 0 then
    return -1
  end
  return math.ceil(((index+1) * navg) - currentAverage)
end
return solution
I see two issues with your code:
Edge case: Empty array
If arr = {}, the loop for i, v in pairs(arr) do won't execute at all and index will remain at 0. Since arr is empty, arr[0] will be nil and arr[index] <= 0 will fail with an "attempt to compare a nil value" error.
Lack of ordering guarantee
You use pairs rather than ipairs to loop over what I assume is a list. This means keys & values might be traversed in any order. In practice pairs usually (but not always!) traverses the list part of a table in the same order as ipairs, but the reference manual clearly states that you can't rely on any specific order. I don't think CodeWars is this advanced, but consider the possibility that pairs may be overridden to deliberately shuffle the order of traversal in order to check whether you're relying on the dreaded "undefined behavior". If this is the case, your "last index" might actually be any index that happens to be visited last, obviously breaking your algorithm.
Fixes
I'll assume arr is an "array", that is, it only contains keys from 1 to n and all values are non-nil (i.e. there are no holes). Then you can (and should!) use ipairs to loop over the "array":
for i, v in ipairs(arr) do ... end
I don't know the problem statement so it's hard to tell how an empty array should be handled. I'll assume that it should probably return 0. You could add a simple early return at the top of the function for that: if arr[1] == nil then return 0 end. Nonempty arrays will always have arr[1] ~= nil.
I want to find the last digit of the array.
If you mean the last integer (or entry/item) of the array:
local last = array[#array]
If you mean the last digit (for example array = {10, 75, 44, 62} and you want 2), then you can get the last item and then get the last digit using modulo 10:
local last = array[#array] % 10
for i, v in pairs(arr) do
index = i
currentAverage = currentAverage + v
end
Just a reminder:
1. #array returns the number of items in a table.
2. In Lua, arrays are implemented using integer-indexed tables.
3. There's a difference between pairs() and ipairs().
Regarding point 3 above, the following code:
local array = {
[1] = 12,
[2] = 32,
[3] = 41,
[4] = 30,
[5] = 14,
[6] = 50,
[7] = 62,
[8] = 57
}
for key, value in pairs(array) do
print(key, value)
end
produces the following output (note that the order of keys is not respected):
8 57
1 12
2 32
3 41
4 30
5 14
6 50
7 62
while the same code above with pairs() replaced with ipairs() gives:
1 12
2 32
3 41
4 30
5 14
6 50
7 62
8 57
So, this might be the cause of your problem.

How do I fix the error "subscript out of range" in QBASIC?

I'm trying to create a code that generates random numbers within the range 10-30 but making sure that no number is repeated. It shows "subscript out of range" on NumArray(Count) = Count when I run the code.
'Make an array of completely sorted numbers
FOR Count = 10 TO 30
NumArray(Count) = Count
NEXT Count
RANDOMIZE TIMER
FOR Count = 10 TO 30
Number = (RND * (31 - Count)) + 10
PRINT #1, NumArray(Number)
FOR Counter = Number TO 30 - Count
NumArray(Counter) = NumArray(Counter + 1)
NEXT Counter
NEXT Count
This isn't actually my code. Copied and pasted for my assignment.
It looks like you're missing some DIM statements.
Variables containing numbers have type SINGLE by default, so you might see something like FOR Counter = 18.726493 TO 20 because the RND function returns a number between 0 and 1, excluding 1, meaning you will be trying to use NumArray(18.726493) which will not work.
Arrays that are not explicitly declared can only have 11 items with an index from 0 to 10, but the range 10-30 requires you to store 21 items (30 - 10 + 1 = 21). You can also specify a custom upper and lower bound if it will make your code easier for you to understand. Add these lines before the first line in your code shown above:
DIM Number AS INTEGER
DIM NumArray(10 TO 30) AS INTEGER
This will ensure Number only contains integers (any fractional values are rounded to the nearest integer), and NumArray will work from NumArray(10) to NumArray(30), but you can't use NumArray(9), NumArray(8), NumArray(31), etc. The index must be in the range 10-30.
I think that should fix your code, but I don't know for certain since I don't fully understand how it is supposed to work. At the very least, it will fix the type and subscript problems in your code.
You need to declare the array:
'Make an array of completely sorted numbers
DIM NumArray(30) AS INTEGER
FOR Count = 10 TO 30
NumArray(Count) = Count
NEXT Count
RANDOMIZE TIMER
FOR Count = 10 TO 30
Number = (RND * (31 - Count)) + 10
PRINT #1, NumArray(Number)
FOR Counter = Number TO 30 - Count
NumArray(Counter) = NumArray(Counter + 1)
NEXT Counter
NEXT Count

If you can combine 3+ arbitrarily sized integers and still be able to deconstruct it back

Say you have 3 integers:
13105
705016
13
I'm wondering if you could combine these into one integer in any way, so that you can still get back to the original 3 integers.
var startingSet = [ 13105, 705016, 13 ]
var combined = combineIntoOneInteger(startingSet)
// 15158958589285958925895292589 perhaps, I have no idea.
var originalIntegers = deconstructInteger(combined, 3)
// [ 13105, 705016, 13 ]
function combineIntoOneInteger(integers) {
// some sort of hashing-like function...
}
function deconstructInteger(integer, arraySize) {
// perhaps pass it some other parameters
// like how many to deconstruct to, or other params.
}
It doesn't need to technically be an "integer". It is just a string using only the integer characters, though perhaps I might want to use the hex characters instead. But I ask in terms of integers because underneath I do have integers of a bounded size that will be used to construct the combined object.
Some other notes....
The combined value should be unique, so no matter what values you combine, you will always get a different result. That is, there are absolutely no conflicts. Or if that's not possible, perhaps an explanation why and a potential workaround.
The mathematical "set" containing all possible outputs can be composed of different amounts of components. That is to say, you might have the output/combined set containing [ 100, 200, 300, 400 ] but the input set is these 4 arrays: [ [ 1, 2, 3 ], [ 5 ], [ 91010, 132 ], [ 500, 600, 700 ] ]. That is, the input arrays can be of wildly different lengths and wildly different sized integers.
One way to accomplish this more generically is to just use a "separator" character, which makes it super easy. So it would be like 13105:705016:13. But this is cheating, I want it to only use the characters in the integer set (or perhaps the hex set, or some other arbitrary set, but for this case just the integer set or hex).
Another idea for a potential way to accomplish this is to somehow hide a separator in there by doing some hashing or permutation jiu jitsu so that [ 13105, 705016, 13 ] becomes some integer-looking thing like 95918155193915183, where 155 and 5 are some separator-like interpolator values based on the preceding input or some other tricks. A simpler approach to this would be like saying "anything following three zeroes 000, like 410001414, means it's a new integer". So basically 000 is a separator. But this specifically is ugly and brittle. Maybe it could get more tricky and work though, like "if the value is odd and followed by a multiple of 3 of itself, then it's a separator" sort of thing. But I can see that also having brittle edge cases.
But basically, given a set of n integers (or strings of integer characters), how do you convert that into a single integer (or single integer-charactered string), and then convert it back into the original set of n integers?
Sure, there are lots of ways to do this.
To start with, it's only necessary to have a reversible function which combines two values into one. (For it to be reversible, there must be another function which takes the output value and recreates the two input values.)
Let's call the function which combines two values combine and the reverse function separate. Then we have:
separate(combine(a, b)) == [a, b]
for any values a and b. That means that combine(a, b) == combine(c, d)
can only be true if both a == c and b == d; in other words, every pair of inputs produces a different output.
Encoding arbitrary vectors
Once we have that function, we can encode arbitrary-length input vectors. The simplest case is when we know in advance what the length of the vector is. For example, we could define:
combine3 = (a, b, c) => combine(combine(a, b), c)
combine4 = (a, b, c, d) => combine(combine(combine(a, b), c), d)
and so on. To reverse that computation, we only have to repeatedly call separate the correct number of times, each time keeping the second returned value. For example, if we previously had computed:
m = combine4(a, b, c, d)
we could get the four input values back as follows:
c3, d = separate(m)
c2, c = separate(c3)
a, b = separate(c2)
But your question asks for a way to combine an arbitrary number of values. To do that, we just need to do one final combine, which mixes in the number of values. That lets us get the original vector back out: first, we call separate to get the value count back out, and then we call separate enough times to extract each successive input value.
combine_n = v => combine(v.reduce(combine), v.length)
function separate_n(m) {
  let [r, n] = separate(m)
  let a = Array(n)
  for (let i = n - 1; i > 0; --i) [r, a[i]] = separate(r);
  a[0] = r;
  return a;
}
Note that the above two functions do not work on the empty vector, which should code to 0. Adding the correct checks for this case is left as an exercise. Also note the warning towards the bottom of this answer, about integer overflow.
A simple combine function: diagonalization
With that done, let's look at how to implement combine. There are actually many solutions, but one pretty simple one is to use the diagonalization function:
diag(a, b) = (a + b)(a + b + 1) / 2 + a
This basically assigns positions in the infinite square by tracing successive diagonals:
        <-- b -->
     0  1  3  6 10 15 21 ...
  ^  2  4  7 11 16 22 ...
  |  5  8 12 17 23 ...
  a  9 13 18 24 ...
  | 14 19 25 ...
  v 20 26 ...
    27 ...
(In an earlier version of this answer, I had reversed a and b, but this version seems to have slightly more intuitive output values.)
Note that the top row, where a == 0, is exactly the triangular numbers, which is not surprising because the already enumerated positions are the top left triangle of the square.
To reverse the transformation, we start by solving the equation which defines the triangular numbers, m = s(s + 1)/2, which is the same as
0 = s² + s - 2m
whose solution can be found using the standard quadratic formula, resulting in:
s = floor((-1 + sqrt(1 + 8 * m)) / 2)
(s here is the original a+b; that is, the index of the diagonal.)
I should explain the call to floor which snuck in there. s will only be precisely an integer on the top row of the square, where a is 0. But, of course, a will usually not be 0, and m will usually be a little more than the triangular number we're looking for, so when we solve for s, we'll get some fractional value. Floor just discards the fractional part, so the result is the diagonal index.
Now we just have to recover a and b, which is straight-forward:
a = m - combine(0, s)
b = s - a
So we now have the definitions of combine and separate:
let combine = (a, b) => (a + b) * (a + b + 1) / 2 + a
function separate(m) {
  let s = Math.floor((-1 + Math.sqrt(1 + 8 * m)) / 2);
  let a = m - combine(0, s);
  let b = s - a;
  return [a, b];
}
One cool feature of this particular encoding is that every non-negative integer corresponds to a distinct vector. Many other encoding schemes do not have this property: for them, the possible return values of combine_n are only a proper subset of the set of non-negative integers.
Example encodings
For reference, here are the first 30 encoded values, and the vectors they represent:
> for (let i = 1; i <= 30; ++i) console.log(i, separate_n(i));
1 [ 0 ]
2 [ 1 ]
3 [ 0, 0 ]
4 [ 1 ]
5 [ 2 ]
6 [ 0, 0, 0 ]
7 [ 0, 1 ]
8 [ 2 ]
9 [ 3 ]
10 [ 0, 0, 0, 0 ]
11 [ 0, 0, 1 ]
12 [ 1, 0 ]
13 [ 3 ]
14 [ 4 ]
15 [ 0, 0, 0, 0, 0 ]
16 [ 0, 0, 0, 1 ]
17 [ 0, 1, 0 ]
18 [ 0, 2 ]
19 [ 4 ]
20 [ 5 ]
21 [ 0, 0, 0, 0, 0, 0 ]
22 [ 0, 0, 0, 0, 1 ]
23 [ 0, 0, 1, 0 ]
24 [ 0, 0, 2 ]
25 [ 1, 1 ]
26 [ 5 ]
27 [ 6 ]
28 [ 0, 0, 0, 0, 0, 0, 0 ]
29 [ 0, 0, 0, 0, 0, 1 ]
30 [ 0, 0, 0, 1, 0 ]
Warning!
Observe that all of the unencoded values are pretty small. The encoded value is similar in size to the concatenation of all the input values, and so it does grow pretty rapidly; you have to be careful not to exceed JavaScript's limit on exact integer computation. Once the encoded value exceeds this limit (2^53) it will no longer be possible to reverse the encoding. If your input vectors are long and/or the encoded values are large, you'll need to find some kind of bignum support in order to do precise integer computations.
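One way to get that bignum support, sketched here under my own assumptions, is JavaScript's built-in BigInt. The function names (isqrt, combineBig, separateBig, combineAll, separateAll) are made up for this sketch, the integer square root uses Newton's method because Math.sqrt does not accept BigInt, and the empty vector is encoded as 0 as suggested earlier.

```javascript
// Floor of the square root of a non-negative BigInt (Newton's method).
function isqrt(n) {
  if (n < 2n) return n;
  let x = n;
  let y = (x + 1n) / 2n;
  while (y < x) {
    x = y;
    y = (x + n / x) / 2n;
  }
  return x;
}

// Same diagonalization as combine/separate, but exact for any size.
const combineBig = (a, b) => ((a + b) * (a + b + 1n)) / 2n + a;

function separateBig(m) {
  const s = (isqrt(8n * m + 1n) - 1n) / 2n; // index of the diagonal
  const a = m - (s * (s + 1n)) / 2n;
  return [a, s - a];
}

// Encode a vector of non-negative integers; the empty vector codes to 0.
function combineAll(values) {
  if (values.length === 0) return 0n;
  const body = values.map(v => BigInt(v)).reduce((acc, v) => combineBig(acc, v));
  return combineBig(body, BigInt(values.length));
}

// Decode back into an array of BigInt values.
function separateAll(m) {
  if (m === 0n) return [];
  let [rest, n] = separateBig(m);
  const out = new Array(Number(n));
  for (let i = out.length - 1; i > 0; --i) {
    [rest, out[i]] = separateBig(rest);
  }
  out[0] = rest;
  return out;
}

const encoded = combineAll([13105, 705016, 13]);
const decoded = separateAll(encoded); // [13105n, 705016n, 13n]
```

The encoded value for the three integers from the question is already far beyond 2^53, which is why the plain Number version cannot round-trip it.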
Alternative combine functions
Another possible implementation of combine is:
let combine = (a, b) => 2**a * 3**b
In fact, using powers of primes, we could dispense with the combine_n sequence, and just produce the combination directly:
combine(a, b, c, d, e,...) = 2^a * 3^b * 5^c * 7^d * 11^e ...
(That assumes that the values being encoded are strictly positive; if they could be 0, we'd have no way of knowing how long the sequence was because the encoded value does not distinguish between a vector and the same vector with a 0 appended. But that's not a big issue, because if we needed to deal with 0s, we would just add one to all used exponents:
combine(a, b, c, d, e,...) = 2^(a+1) * 3^(b+1) * 5^(c+1) * 7^(d+1) * 11^(e+1) ...)
That is certainly correct and it's very elegant in a theoretical sense. It's the solution which you will find in theoretical CS textbooks because it is much easier to prove uniqueness and reversibility. However, in the real world it is really not practical. Reversing the combination depends on finding the prime factors of the encoded value, and the encoded values are truly enormous, well out of the range of easily representable numbers.
Another possibility is precisely the one you mention in the question: simply put a separator between successive values. One simple way to do this is to rewrite the values to encode in base 9 (or base 15) and then increment all the digit values, so that the digit 0 is not present in any encoded value. Then we can put 0s between the encoded values and read the result in base 10 (or base 16).
Neither of these solutions has the property that every non-negative integer is the encoding of some vector. (The second one almost has that property, and it's a useful exercise to figure out which integers are not possible encodings, and then fix the encoding algorithm to avoid that problem.)
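The base-9 separator idea described above can be sketched roughly as follows. The function names are invented for the example, and it assumes a non-empty list of non-negative integers small enough for ordinary Number arithmetic; the result is a string of decimal digits rather than a Number, since it can easily exceed 2^53.

```javascript
// Encode: write each value in base 9, shift every digit up by one (so the
// digit 0 never appears inside a value), and join the values with "0".
function encodeWithZeroSeparator(values) {
  return values
    .map(v => Array.from(v.toString(9), d => String(Number(d) + 1)).join(""))
    .join("0");
}

// Decode: split on "0", shift every digit back down, and parse as base 9.
function decodeWithZeroSeparator(text) {
  return text
    .split("0")
    .map(part => parseInt(Array.from(part, d => String(Number(d) - 1)).join(""), 9));
}

const packed = encodeWithZeroSeparator([13105, 705016, 13]);
const unpacked = decodeWithZeroSeparator(packed); // [13105, 705016, 13]
```

Because no shifted digit is ever 0, every 0 in the packed string is unambiguously a separator, which avoids the brittleness of a multi-digit marker like 000.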

Codility: Passing cars in Lua

I'm currently practicing programming problems and out of interest, I'm trying a few Codility exercises in Lua. I've been stuck on the Passing Cars problem for a while.
Problem:
A non-empty zero-indexed array A consisting of N integers is given. The consecutive elements of array A represent consecutive cars on a road.
Array A contains only 0s and/or 1s:
0 represents a car traveling east,
1 represents a car traveling west.
The goal is to count passing cars. We say that a pair of cars (P, Q), where 0 ≤ P < Q < N, is passing when P is traveling to the east and Q is traveling to the west.
For example, consider array A such that:
A[0] = 0
A[1] = 1
A[2] = 0
A[3] = 1
A[4] = 1
We have five pairs of passing cars: (0, 1), (0, 3), (0, 4), (2, 3), (2, 4).
Write a function:
function solution(A)
that, given a non-empty zero-indexed array A of N integers, returns the number of pairs of passing cars.
The function should return −1 if the number of pairs of passing cars exceeds 1,000,000,000.
For example, given:
A[0] = 0
A[1] = 1
A[2] = 0
A[3] = 1
A[4] = 1
the function should return 5, as explained above.
Assume that:
N is an integer within the range [1..100,000];
each element of array A is an integer that can have one of the following values: 0, 1.
Complexity:
expected worst-case time complexity is O(N);
expected worst-case space complexity is O(1), beyond input storage (not counting the storage required for input arguments).
Elements of input arrays can be modified.
My attempt in Lua keeps failing but I can't seem to find the issue.
local function solution(A)
  local zeroes = 0
  local pairs = 0
  for i = 1, #A do
    if A[i] == 0 then
      zeroes = zeroes + 1
    else
      pairs = pairs + zeroes
      if pairs > 1e9 then
        return -1
      end
    end
  end
  return pairs
end
In terms of time-space complexity constraints, I think it should pass so I can't seem to find the issue. What am I doing wrong? Any advice or tips to make my code more efficient would be appreciated.
FYI: I keep getting a result of 2 when the desired example result is 5.
The problem statement says A is 0-based, so if we ignore the first element and start at index 1, the output is 2 instead of 5. 0-based tables should be avoided in Lua: they go against convention and lead to a lot of off-by-one errors, and for i=1,#A do will not do what you want.
function solution1based(A)
  local zeroes = 0
  local pairs = 0
  for i = 1, #A do
    if A[i] == 0 then
      zeroes = zeroes + 1
    else
      pairs = pairs + zeroes
      if pairs > 1e9 then
        return -1
      end
    end
  end
  return pairs
end
print(solution1based{0, 1, 0, 1, 1}) -- prints 5 as you wanted
function solution0based(A)
  local zeroes = 0
  local pairs = 0
  for i = 0, #A do
    if A[i] == 0 then
      zeroes = zeroes + 1
    else
      pairs = pairs + zeroes
      if pairs > 1e9 then
        return -1
      end
    end
  end
  return pairs
end
print(solution0based{[0]=0, [1]=1, [2]=0, [3]=1, [4]=1}) -- prints 5
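For comparison, here is the same counting loop as a small JavaScript sketch, where arrays are zero-based like the problem statement, so no index adjustment is needed. The function name countPassingCars is my own choice, not part of the Codility task.

```javascript
// Count pairs (P, Q) with P < Q where car P travels east (0)
// and car Q travels west (1); return -1 above one billion pairs.
function countPassingCars(cars) {
  let eastbound = 0; // eastbound cars seen so far
  let passing = 0;
  for (const car of cars) {
    if (car === 0) {
      eastbound += 1;
    } else {
      // Every eastbound car seen earlier passes this westbound car.
      passing += eastbound;
      if (passing > 1e9) return -1;
    }
  }
  return passing;
}

countPassingCars([0, 1, 0, 1, 1]); // returns 5
```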

How to generate random lines of text of a given length from a dictionary of words (bin-packing problem)?

I need to generate three lines of text (essentially gibberish) that are each 60 characters long, including a hard return at the end of each line. The lines are generated from a dictionary of words of various lengths (typically 1-8 characters). No word may be used more than once, and words must be separated by spaces. I think this is essentially a bin-packing problem.
The approach I've taken so far is to create a hashMap of the words, grouped by their lengths. I then choose a random length, pull a word of that length from the map, and append it to the end of the line I'm currently generating, accounting for spaces or a hard return. It works about half the time, but the other half of the time I'm getting stuck in an infinite loop and my program crashes.
One problem I'm running into is this: as I add random words to the lines, groups of words of a given length may become depleted. This is because there are not necessarily the same number of words of each length in the dictionary, e.g., there may only be one word with a length of 1. So, I might need a word of a given length, but there are no longer any words of that length available.
Below is a summary of what I have so far. I'm working in ActionScript, but would appreciate insight into this problem in any language. Many thanks in advance.
dictionary // map of words with word lengths as keys and arrays of corresponding words as values
lengths // array of word lengths, sorted numerically
min = lengths[0] // minimum word length
max = lengths[lengths.length - 1] // maximum word length
line = ""
while ( line.length < 60 ) {
    len = lengths[round( rand() * ( lengths.length - 1 ) )]
    if ( dictionary[len] != null && dictionary[len].length > 0 ) {
        diff = 60 - line.length // number of characters needed to complete the line
        if ( line.length + len + 1 == 60 ) {
            // this word will complete the line exactly
            line += dictionary[len].splice(0, 1) + "\n"
        }
        else if ( min + max + 2 >= diff ) {
            // find the two word lengths that will complete the line
            // ==> this is where I'm having trouble
        }
        else if ( line.length + len + 1 < 60 - max ) {
            // this word will fit safely, so just add it
            line += dictionary[len].splice(0, 1) + " "
        }
        if ( dictionary[len].length == 0 ) {
            // delete any empty arrays and update min and max lengths accordingly
            dictionary[len] = null
            delete dictionary[len]
            i = lengths.indexOf( len )
            if ( i >= 0 ) {
                // words of this length have been depleted, so
                // update lengths array to ensure that next random
                // length is valid
                lengths.splice( i, 1 )
            }
            if ( lengths.indexOf( min ) == -1 ) {
                // update the min
                min = lengths[0]
            }
            if ( lengths.indexOf( max ) == -1 ) {
                // update the max
                max = lengths[lengths.length - 1]
            }
        }
    }
}
You should think of an n-letter word as being n+1 letters, because each word has either a space or return after it.
Counted that way, every word takes up at least 2 characters, so you never want to get to a point where you have 59 characters filled in. If you get to 57, you need to pick something that is 2 letters plus the return. If you get to 58, you need a 1-letter word plus the return.
Are you trying to optimize for time? Can you have the same word multiple times? Multiple times in one line? Does it matter if your words are not uniformly distributed, e.g. a lot of lines contain "a" or "I" because those are the only one-letter words in English?
Here's the basic idea. For each line, start choosing word lengths, and keep track of the word lengths and total character count so far. As you get toward the end of the line, choose word lengths less than the number of characters you have left. (e.g. if you have 5 characters left, choose words in the range of 2-5 characters, counting the space.) If you get to 57 characters, pick a 3-letter word (counting return). If you get to 58 characters, pick a 2-letter word (counting return.)
If you want, you can shuffle the word lengths at this point, so all your lines don't end with short words. Then for each word length, pick a word of that length and plug it in.
dictionary = group your words by length (like you already do)
total_length = 0
phrase = ""
while (total_length < 60) {
    random_length = generate_random_number(1, 8)
    if (total_length + random_length > 60)
    {
        random_length = 60 - total_length // possibly - 1 if you count \n and - 2 if you
                                          // append a blank anyway at the end
    }
    phrase += dictionary.get_random_word_of_length(random_length) + " "
    total_length += random_length + 1
}
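The "choose the word lengths first" step from the answers above could look something like this JavaScript sketch. It is only an illustration under simplifying assumptions: the name pickWordLengths is hypothetical, every length from minLen to maxLen is assumed to still have words available (depletion is not handled), and each word is counted as its length plus one for the trailing space or return.

```javascript
// Choose word lengths whose "length + 1" slots add up to exactly lineWidth.
// Assumes all lengths in minLen..maxLen are available (hypothetical helper).
function pickWordLengths(lineWidth, minLen, maxLen, random = Math.random) {
  const lengths = [];
  let remaining = lineWidth;
  while (remaining > 0) {
    // Keep only lengths that either finish the line exactly or leave
    // enough room for at least the shortest word plus its separator.
    const candidates = [];
    for (let len = minLen; len <= maxLen; len++) {
      const left = remaining - (len + 1);
      if (left === 0 || left >= minLen + 1) candidates.push(len);
    }
    if (candidates.length === 0) {
      throw new Error("cannot fill the line exactly with these lengths");
    }
    const len = candidates[Math.floor(random() * candidates.length)];
    lengths.push(len);
    remaining -= len + 1;
  }
  return lengths;
}

const wordLengths = pickWordLengths(60, 1, 8);
// wordLengths is a random list of lengths in 1..8 whose (length + 1) values sum to 60
```

Once the lengths are fixed you can shuffle them and then pull one unused word of each length from the dictionary, which keeps the random choice and the exact-fit constraint separate.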

Resources