How to stop SPSS from altering cell decimal values at import from .csv

I'm importing roughly 50,000 cases/rows into SPSS via a .csv file.
The data in question consists of 17 variables, some of which contain numbers.
They're basically decimals, but SPSS changes them when I import them.
The problem is that I can't set a variable to three decimals, because the actual value sometimes has two decimals (which is important to keep as-is) and sometimes three. If I set the whole variable to three decimals, the values with only two decimals get a 0 appended, which breaks everything for me.
Snippet from actual data:
I need 1.667 to stay as-is. Then I need 1.50 to stay as-is. Then 1.40, 1.364, and so on for everything.
What happens when I import it is that 1.50 becomes 1.500, 1.40 becomes 1.400, and so on.
Any suggestions?

If the original data is 1.25, then the actual data stored is 1.25, which is equal to 1.250, and to 1.250000 for that matter. So this shouldn't affect any calculations you are making; it only affects the display.
You are forced to decide whether to round to two decimal places ('1.25') or three ('1.250'). If this is indeed what's bothering you: to the best of my knowledge there is no way (unlike Excel) to have a different number of decimals for different parts of one column, nor is there a way to remove trailing zeros.
That being said, here is a weird workaround: changing the number format to 'restricted numeric' should, in theory, make your data unacceptable (numbers in this format aren't supposed to have fractions), but it will show the data without trailing zeros (at least it does in version 23 on my machine).
You can change the format through syntax like this:
formats var1 to var7 (n8).

Related

Google Sheet yields infinitesimal number as remainder of an integer/whole number

I have this worksheet where I need to create a checker to determine whether a number (the result of dividing the sum of two numbers by another value, the DIVISOR) is an integer, i.e. has no decimals. The checker mostly worked just fine, but it flagged a few items as not being integers despite their being exact multiples of the DIVISOR.
https://docs.google.com/spreadsheets/d/17-idS5G0kUI7JoHAx3qcJOiJ-zofmMrg93hUvZuxPiA/edit#gid=0
I have two values (V1 and V2) whose sum I need to divide by a certain number (Divisor).
I need the OUTPUT to be an integer/whole number. Since SUM(V1,V2) is a multiple of the DIVISOR, the OUTPUT is supposed to be a whole number. I also expanded the number of decimal places to make sure that there are no trailing digits after the decimal point.
However, upon running the MOD function over the OUTPUT, it generated some infinitesimal value.
I also tried TRUNCATING the OUTPUT and getting the DIFFERENCE between the TRUNC and OUTPUT. It yielded the same remainder value as the MOD result.
I downloaded the GSheet and opened it in MS Excel. There seems to be no problem with the DIFFERENCE result, but the MOD function yielded yet another value.
Actually, this is not a bug, and it is pretty common. It's called a floating point "error", and in a nutshell it has to do with how decimal numbers are stored in Google Sheets (the same goes for Excel or any other app).
More details can be found here: https://en.wikipedia.org/wiki/IEEE_754
To counter it you will need to introduce rounding, like:
=ROUND(SUM(A1:A))
This is not an ideal solution for all cases, so depending on your requirements you may need one of these instead of ROUND (a worked example follows the list):
ROUNDUP
ROUNDDOWN
TRUNC
TEXT
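
For example (the cell references B2, C2 and D2 are placeholders standing in for V1, V2 and the DIVISOR):

=MOD((B2+C2)/D2, 1)

can return a tiny value such as 4.4E-16 instead of 0, while

=MOD(ROUND((B2+C2)/D2, 9), 1)

rounds the quotient to 9 decimal places first and returns exactly 0 for true multiples.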

How to keep track of the seed

In Lua it's common knowledge that you can use math.randomseed, but it's also obvious that math.random changes the seed as well (calling it twice does not return the same result). What does it set the seed to, and how can I keep track of it? If that's impossible, please explain why.
This is not a Lua question, but a general question about how RNG algorithms work.
First, Lua doesn't have its own RNG; it just hands you a (slightly mangled) value from the RNG of the underlying C library. Most RNG implementations do not reveal their inner state, but sometimes you can calculate it yourself.
For example, when you use Lua on Windows, you'll be using the LCG-based RNG from the MS C library. The numbers you get are a slice of the seed, not the full value. There are two ways you can deal with that:
If you know how many times you called random, you can take the initial seed value, feed it to your own copy of the same algorithm with the same constants that are hardcoded in the MS library, and get the exact value of the seed.
If you don't, but you can be sure that nobody interferes between your two calls to random, you can take two generated numbers and reverse the LCG algorithm by shifting the bits back into place. This will leave you with several missing bits (one more bit thanks to Lua's mangling) that you will simply need to bruteforce: iterate over all the missing bits until your copy of the algorithm produces exactly the same two "random" numbers you recorded before. That is the current seed stored inside the library's RNG as well. A well-programmed solution in Lua can bruteforce this in about 0.2-0.5 s on a somewhat dated PC; I have done it in the past. Here's an example on Crypto.SE discussing this task in more detail: Predicting values from a Linear Congruential Generator.
The first approach can be used with any RNG algorithm that doesn't mix in real entropy; the second works with most RNGs that don't mask so many bits of the slice that bruteforcing becomes unreasonable.
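
To illustrate the first approach, here is a minimal sketch in Lua. It assumes the well-known MSVC rand() constants (214013 and 2531011); the seed value and call count are placeholders:

-- A replayable copy of the MSVC-style LCG; the constants describe the
-- MS C runtime's rand(), not Lua itself.
local seed = 12345                        -- the value fed to math.randomseed
local function msvc_rand()
  seed = (seed * 214013 + 2531011) % 2^32 -- one LCG step with 32-bit wraparound
  return math.floor(seed / 2^16) % 2^15   -- the 15-bit slice rand() exposes
end
for _ = 1, 10 do msvc_rand() end          -- replay the calls you know were made
print(seed)                               -- the RNG's current internal seed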
The real answer, though, is: you don't need to keep track of the seed at all. What you want is probably something else.
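That said, if what you actually need is reproducibility, a minimal sketch of tracking it yourself (the wrapper names are made up for illustration):

-- Record the seed and the number of calls instead of probing hidden state.
local rng = { seed = os.time(), calls = 0 }
math.randomseed(rng.seed)

function rng.random(...)
  rng.calls = rng.calls + 1
  return math.random(...)
end

function rng.replay()           -- rebuild the RNG's current state from scratch
  math.randomseed(rng.seed)
  for _ = 1, rng.calls do math.random() end
end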
All numbers math.random() generates are pseudo-random, whether or not you set a seed yourself (if you don't, the system generates one by itself).
math.randomseed(4)
print(math.random())
print(math.random())
math.randomseed(4)
print(math.random())
Outputs
0.50827539156303
0.75454387490399
0.50827539156303
So if you reset the seed to the same value, you can predict the values that are going to come up, up to the number of consecutive values you have already generated using that seed.
What the seed does not do is keep the output of math.random() constant. The output would only repeat if you kept resetting the seed to the same value.
An analogy as an example
Imagine the random number is an integer between 0 and 9 (instead of a double between 0 and 1).
math.random() could traverse pi's decimals from an arbitrary starting position (default could be system time).
What you do when you use math.randomseed() is (not literally, this is an analogy as mentioned) set the starting position in pi's decimals from which you are going to retrieve your numbers.
If you now reset the seed to the same starting position the numbers are going to be the same as the last time you reset the starting position.
You will know the numbers up to the last call; after that you can't be certain anymore.

using acts_as_list and transferring floats to int value

I would like to use acts_as_list in an app that was originally written in php and is being moved to rails. We used a 'position' value that was a float such that if a user wanted to put something between position 1 and 2, they would just enter 1.5 in the form. It looks like acts_as_list just uses integers. Is there a way in acts_as_list to make it use floats rather than integers? Or possibly convert a set of floats to an integer for insert?
thx
You could modify it, but it's easier to use integers and just reorder all the items in the list that appear after the one you're moving.
Using floats, you're forced to split numbers into higher and higher precision every time you move an item around, and if a list gets enough reordering, you're likely to eventually run into problems with how floating point numbers are stored; then you'll have a list whose ordering breaks in subtle ways that won't be immediately obvious.
The other issue is that position is inherently a whole number. When you're standing in a line, you don't think of yourself as being in position 1.5; you're either in position 1 or 2. The only case where a measurement like 1.5 makes sense for people standing in a line is if you're measuring distance (physical distance in feet/meters) from, say, the front of the line. But at that point the measurement means something different: it's no longer position, it's distance.
If you're trying to save on the number of queries/DB changes required (floats would let you update just one column on one record instead of one column on multiple records), then you're giving up the convenience of the gem doing it for you, and you might want to roll your own if there's some reason you really need to support floats.
However, given the points above about position inherently being a whole number, I'd recommend against doing floats just to save on DB time. How often are people really going to re-sort a list, and how much real load would it put on your app?
If instead, you have to support floats because of some integration point with the old system, then please tell us more about that.
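For reference, here is a minimal sketch of the integer approach (the Item model is a made-up example); acts_as_list's insert_at does the reordering for you:

# hypothetical model; 'position' is an integer column
class Item < ActiveRecord::Base
  acts_as_list
end

# moving an item "between positions 1 and 2" with integers: no 1.5 needed;
# insert_at shifts the items at positions 2..n down to 3..n+1
item = Item.find_by(name: "C")
item.insert_at(2)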

Ruby Floating Point Math - Issue with Precision in Sum Calc

Good morning all,
I'm having some issues with floating point math, and have gotten totally lost in ".to_f"'s, "*100"'s and ".0"'s!
I was hoping someone could help me with my specific problem, and also explain exactly why their solution works so that I understand this for next time.
My program needs to do two things:
Sum a list of decimals, determine if they sum to exactly 1.0
Determine a difference between 1.0 and a sum of numbers - set the value of a variable to the exact difference to make the sum equal 1.0.
For example:
[0.28, 0.55, 0.17] should sum to 1.0; however, I keep getting 1.xxxxxx. I am implementing the sum in the following fashion:
sum = array.inject(0.0) { |sum, x| sum + (x * 100) } / 100
The reason I need this functionality is that I'm reading in a set of decimals that come from Excel. They are not 100% precise (they are missing some decimal places), so the sum usually comes out as 0.999999xxxxx or 1.000xxxxx. For example, I will get values like the following:
0.568887955,0.070564759,0.360547286
To fix this, I am ok taking the sum of the first n-1 numbers, and then changing the final number slightly so that all of the numbers together sum to 1.0 (must meet validation using the equation above, or whatever I end up with). I'm currently implementing this as follows:
sum = 0.0
array[0...-1].each do |item|  # sum the first n-1 numbers, scaled by 100
  sum += item * 100.0
end
array[-1] = (100 - sum.round) / 100.0  # set the last number to the exact remainder
I know I could do this with inject, but was trying to play with it to see what works. I think this is generally working (from inspecting the output), but it doesn't always meet the validation sum above. So if need be I can adjust this one as well. Note that I only need two decimal precision in these numbers - i.e. 0.56 not 0.5623225. I can either round them down at time of presentation, or during this calculation... It doesn't matter to me.
Thank you VERY MUCH for your help!
If accuracy is important to you, you should not be using floating point values, which cannot represent most decimal fractions exactly. Ruby has some data types for doing arithmetic where accuracy is important. They are, off the top of my head, BigDecimal, Rational and Complex, depending on what you actually need to calculate.
It seems that in your case, what you're looking for is BigDecimal, which stores decimal digits exactly (in contrast to a binary floating point, which can only approximate most decimal fractions).
When you read from Excel and deliberately cast those strings like "0.9987" to floating points, you immediately lose the exact value that is contained in the string.
require "bigdecimal"
BigDecimal("0.9987")
That value is precise. It is 0.9987. Not 0.998732109, or anything close to it, but 0.9987. You may use all the usual arithmetic operations on it. Provided you don't mix floating points into the arithmetic operations, the return values will remain precise.
If your array contains the raw strings you got from Excel (i.e. you haven't #to_f'd them), then this will give you a BigDecimal that is the difference between the sum of them and 1.
1 - array.map{|v| BigDecimal(v)}.reduce(:+)
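
Put together, a minimal runnable sketch (the sample strings are stand-ins for your Excel values):

require "bigdecimal"

raw = ["0.28", "0.55", "0.17"]              # raw strings, not #to_f'd
sum = raw.map { |v| BigDecimal(v) }.reduce(:+)
puts sum == 1                               # => true, exact decimal arithmetic
puts (1 - sum).to_s("F")                    # => "0.0", the needed adjustment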
Either:
continue using floats and round(2) your totals: 12.341.round(2) # => 12.34
use integers (i.e. cents instead of dollars; see the sketch below)
use BigDecimal, and you won't need to round after summing, as long as you start with BigDecimals that have only two decimals.
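
A minimal sketch of the integer option (the sample values are assumptions):

raw = [0.28, 0.55, 0.17]
cents = raw.map { |v| (v * 100).round }   # work in hundredths: 28, 55, 17
puts cents.reduce(:+) == 100              # exact integer comparison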
I think that algorithms have a great deal more to do with accuracy and precision than the choice of IEEE floating point over another representation.
People used to do some fine calculations while still dealing with accuracy and precision issues. They did it by managing the algorithms they used and understanding how to represent functions more deeply. I think you might be making a mistake by throwing aside that deeper understanding and assuming that another representation is the solution.
For example, no polynomial representation of a function will deal with an asymptote or singularity properly.
Don't discard floating point so quickly. It could be that being smarter about the way you use it will do just fine.
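
As one concrete example of an algorithmic fix (my illustration, not something the answer above spells out), compensated (Kahan) summation keeps rounding error from accumulating while staying in plain floats:

def kahan_sum(values)
  sum = 0.0
  c = 0.0                     # running compensation for lost low-order bits
  values.each do |v|
    y = v - c
    t = sum + y
    c = (t - sum) - y         # the part of y that didn't make it into t
    sum = t
  end
  sum
end

puts kahan_sum([0.28, 0.55, 0.17])  # => 1.0; stays accurate for long lists too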

(La)TeX Base 10 fixed point arithmetic

I'm trying to implement decimal arithmetic in (La)TeX, using dimens to store the values. I want the arithmetic to be exact to some fixed number of decimal places. If I use 1pt as my base unit, this fails because \divide rounds down, so 1pt / 10 gives 0.09999pt. If I use something like 1000sp as my base unit, I get working fixed point arithmetic with 3 decimal places, but I can't figure out an easy way to format the numbers: if I try to convert them to pt so I can use TeX's display mechanism, I have the same problem with \divide.
How do I fix this problem, or work around it?
The fp package provides fixed point arithmetic for LaTeX. The LaTeX3 Project is currently implementing something similar as part of the expl3 bundle. That code is not yet on CTAN, but it can be grabbed from the SVN (or will appear when the next update from the SVN to CTAN takes place).
I would represent all the values as integers and scale them appropriately. For example, with three decimal digits, 0.124 would be represented as 124. This is nice because addition and subtraction are trivial. When multiplying two numbers a and b, you have to divide the result by 1000 to get the proper representation; dividing works by multiplying the dividend by 1000 before the division.
You still have to get the rounding right, but this isn't very difficult, at least as long as you stay away from the maximum representable integer (2^31 - 1 for TeX count registers).
Here is some code:
% #1 := #2 + #3
\def\fixadd#1#2#3{%
  #1=#2\relax
  \advance #1 by #3\relax
}
% #1 := #2 - #3
\def\fixsub#1#2#3{%
  #1=#2\relax
  \advance #1 by -#3\relax
}
% #1 := #2 * #3 / 1000 (divide once to undo the double scaling)
\def\fixmul#1#2#3{%
  #1=#2\relax
  \multiply #1 by #3\relax
  \divide #1 by 1000\relax
}
% #1 := #2 * 1000 / #3 (scale the dividend BEFORE dividing;
% dividing first would truncate the fractional part away)
\def\fixdiv#1#2#3{%
  #1=#2\relax
  \multiply #1 by 1000\relax
  \divide #1 by #3\relax
}
\newcount\numa
\newcount\numb
\newcount\numc
\numa=1414 % 1.414
\numb=2828 % 2.828
\fixmul\numc\numa\numb
\the\numc % prints 3998, i.e. 3.998
\bye
The operations are modeled after a three-register machine, where the first register is the destination and the other two are the operands. Proper rounding after multiplication and division, including the corner cases for very large or very small numbers, is left as an exercise for you.
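
As a starting point for that exercise, a sketch of round-to-nearest division (assuming positive operands and ignoring overflow): add half the divisor before dividing, since \divide truncates toward zero.

% division with rounding to nearest; assumes positive operands
\def\fixdivround#1#2#3{%
  #1=#2\relax
  \multiply #1 by 1000\relax
  \count255=#3\relax
  \divide \count255 by 2\relax
  \advance #1 by \count255\relax % adding b/2 turns truncation into rounding
  \divide #1 by #3\relax
}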
