Using numeric variables, what is the best practice for encoding measurement results that are below or above the range provided by the instrumentation (e.g. TSH < 0.001)? In my specific case this is needed for a medical project, but the problem is expected to apply to any kind of measurement. In my own research I couldn't find a satisfactory solution so far.
Generally, this class of problems is addressed in medical data formats, e.g. HL7, but there, numeric values are basically represented as strings. Is there an efficient way to do this with numeric data types (apart from a separate flag variable indicating if the result is within, below or above the cut-off value of the range of measurement)?
This should preferably be a cross-platform solution, independent of the processor architecture used and compatible with Pascal or Object Pascal, but elegant solutions in other programming languages are welcome, too.
IEEE 754 double values already define some "special values".
0 11111111111 0000000000000000000000000000000000000000000000000000₂ ≙ 7FF0 0000 0000 0000₁₆ ≙ +∞ (positive infinity)
1 11111111111 0000000000000000000000000000000000000000000000000000₂ ≙ FFF0 0000 0000 0000₁₆ ≙ −∞ (negative infinity)
You may reuse these "flags" for below/above range values.
Every language can recognize those values; e.g. the Delphi/FPC Math.pas unit defines NegInfinity and Infinity, if I remember correctly:
Infinity = 1.0 / 0.0;
NegInfinity = -1.0 / 0.0;
One side advantage is that they will be converted to text as non-numbers (+INF/-INF), which may help when debugging and tracing those values.
Of course, you should detect those values and avoid computing with them (e.g. in a mean/R² or curve fitting), since they would break the calculation on the correct values. But the result will probably be so obviously wrong (infinity dominates other values in most mathematical operations) that it should not be too difficult to track down the problem.
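For illustration, here is a minimal sketch of the idea in Python (the language is incidental; the same pattern works with Delphi's Infinity/NegInfinity): the infinities act as below-range/above-range markers and are filtered out before any statistics are computed. All names and values are made up.

import math

BELOW_RANGE = float("-inf")   # e.g. "TSH < 0.001"
ABOVE_RANGE = float("+inf")   # e.g. "TSH > 100"

tsh_results = [1.25, 0.87, BELOW_RANGE, 2.10, ABOVE_RANGE]

# exclude the out-of-range markers before computing a mean
measurable = [x for x in tsh_results if math.isfinite(x)]
mean_tsh = sum(measurable) / len(measurable)

n_below = sum(1 for x in tsh_results if x == BELOW_RANGE)
n_above = sum(1 for x in tsh_results if x == ABOVE_RANGE)
print(mean_tsh, n_below, n_above)   # 1.4066..., 1, 1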
Check this article as reference.
I am preparing code that sends various electrical parameters across a CAN bus (power, voltage, current, etc.). Precision is important in this application.
There are two options I see:
Send the value as an integer, but scale it on the sending and receiving side (sending ×100, receiving ÷100, for example). If I do this, then I can send 2.12 V as 212 (or 0xD4).
Send it as a float value, which would require 32 bits but no scaling.
My questions are as follows:
Can float values be sent across CANbus?
If yes to Q1, is that a common practice? Or do most CANbus communication programmers use scaled integers?
Thank you in advance.
First of all, please read this answer since it answers all the floating point concerns:
Does the "Avoid using floating-point" rule of thumb apply to a microcontroller with a floating point unit (FPU)?
Can float values be sent across CANbus?
Yes, though like any multi-byte value they depend on endianness, so you will have to specify a byte order for floating point values in your CAN application layer. Sending 4 or 8 bytes in a single frame is obviously not a problem.
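As a minimal sketch (using Python's struct module for illustration; the frame layout itself is an assumption your application layer would have to define), packing a float with an explicit byte order for the data field of a CAN frame could look like this:

import struct

voltage = 2.12

payload_le = struct.pack("<f", voltage)   # 32-bit float, little-endian, 4 bytes
payload_be = struct.pack(">f", voltage)   # big-endian ("network order") variant

# the receiver must unpack with the same agreed-upon byte order
received = struct.unpack("<f", payload_le)[0]
print(payload_le.hex(), payload_be.hex(), round(received, 2))   # ... 2.12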
If yes to Q1, is that a common practice? Or do most CANbus communication programmers use scaled integers?
It is not common practice; raw data is by far the most common and convenient. For example, if you wish to express a voltage between 0 and 5 V, you could send a raw data value between 0 and 65535 and only scale it to volts where you actually need that unit (for example when showing the voltage to humans on a display).
The raw data form also has the advantage of being technology forwards/backwards compatible. Suppose you read the voltage with a 10-bit ADC; your values will be in the range 0 to 1023. You can express that as 16-bit raw data by multiplying by 65536/1024 = 64. Some years later you switch to a 12-bit ADC with values up to 4095. You can then keep the very same CAN protocol and format, just with increased resolution, i.e. smaller "steps" of raw data.
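A rough sketch of that raw-data idea (the 0-5 V range follows the example above; the helper names and constants are only illustrative): the bus always carries a 16-bit raw value, and only the ends of the chain convert to physical units.

FULL_SCALE_VOLTS = 5.0   # assumed 0..5 V measurement range

def adc10_to_raw16(adc):          # 10-bit ADC reading (0..1023) -> 16-bit raw data
    return adc * 64               # 65536 / 1024

def adc12_to_raw16(adc):          # later, 12-bit ADC (0..4095), same wire format
    return adc * 16               # 65536 / 4096

def raw16_to_volts(raw):          # only needed where a human-readable value is shown
    return raw * FULL_SCALE_VOLTS / 65536.0

print(raw16_to_volts(adc10_to_raw16(434)))   # roughly 2.12 V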
I have searched for a clear explanation with some examples close to real life.
What is bit manipulation?
Why do we need to use bit manipulation?
As far as I know, we can use bit manipulation in image processing. Can anyone show me a simple problem which can be solved using bit manipulation?
I read about bit manipulation at the following links:
Link 1
Link 2
In Link 2, data compression is done using bit packing. Is there any difference between bit manipulation and bit packing?
I would appreciate it if anyone could explain this to me with a very simple example that resembles a real-life problem.
What is bit manipulation?
Bit manipulation usually refers to changing data using bit operators.
I think Wikipedia explains it well enough, so I won't write another article.
https://en.wikipedia.org/wiki/Bit_manipulation
Bit manipulation is the act of algorithmically manipulating bits or other pieces of data shorter than a word. Computer programming tasks that require bit manipulation include low-level device control, error detection and correction algorithms, data compression, encryption algorithms, and optimization. For most other tasks, modern programming languages allow the programmer to work directly with abstractions instead of bits that represent those abstractions. Source code that does bit manipulation makes use of the bitwise operations: AND, OR, XOR, NOT, and possibly other operations analogous to the boolean operators; there are also bit shifts and operations to count ones and zeros, find high and low one or zero, set, reset and test bits, extract and insert fields, mask and zero fields, gather and scatter bits to and from specified bit positions or fields. Integer arithmetic operators can also effect bit-operations in conjunction with the other operators.

Bit manipulation, in some cases, can obviate or reduce the need to loop over a data structure and can give many-fold speed ups, as bit manipulations are processed in parallel.
Why do we need to use bit manipulation?
Because it is fast, and often we don't have another choice. For example, in microcontrollers pretty much everything is controlled by manipulating the bits of 8-bit registers. An output goes high, for instance, if you set a certain bit to 1.
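As an illustration only (the register stand-in and the bit position are made up; on a real microcontroller this would be a memory-mapped hardware register):

PIN3 = 1 << 3            # mask for an imaginary output pin 3

port = 0b00000000        # stand-in for an 8-bit output register
port |= PIN3             # set bit 3 -> the pin goes high
print(bin(port))         # 0b1000
port &= ~PIN3 & 0xFF     # clear bit 3 -> the pin goes low again
print(bin(port))         # 0b0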
Bit packing is a compression technique that tries to minimize the number of bits necessary to represent a number. While you'll use bit operators to implement it, it is not the same thing as "bit manipulation"; it's just one of many use cases for bit manipulation.
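A small, hypothetical bit-packing sketch in Python: three values that each fit in a few bits are packed into a single byte and unpacked again. The field layout is made up for illustration.

def pack(flags2, level3, mode3):
    # assumed layout: [flags: 2 bits][level: 3 bits][mode: 3 bits]
    return (flags2 << 6) | (level3 << 3) | mode3

def unpack(byte):
    flags2 = (byte >> 6) & 0b11
    level3 = (byte >> 3) & 0b111
    mode3 = byte & 0b111
    return flags2, level3, mode3

b = pack(0b10, 0b101, 0b011)
print(bin(b), unpack(b))   # 0b10101011 (2, 5, 3)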
Can anyone show me a simple problem which can be solved using bit manipulation?
Let's say you have an RGB tuple rgb = 0xa1fc03 and you want to set the green channel to 0.
rgb_without_green = rgb & 0xFF00FF
We've bitwise-ANDed the value with the mask 0xFF00FF.
Now rgb_without_green is 0xa10003.
Basically any operation boils down to bit manipulation; for most of them you just have more convenient notations. Say, instead of 0b00000011 << 0b00000101 you write 3 * 32.
Or have a look at this question, where the addition of two integers is implemented using bit operations: Add two integers using only bitwise operators?
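For illustration, the idea from that question sketched in Python (valid for non-negative integers; Python's integers are unbounded, so no overflow mask is applied here):

def add(a, b):
    while b != 0:
        carry = a & b        # bit positions that produce a carry
        a = a ^ b            # add without carrying
        b = carry << 1       # shift the carries into the next position
    return a

print(add(13, 29))   # 42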
Edit due to comment
How does a bitwise AND operation between 0xa1fc03 and 0xFF00FF give 0xa10003? I just need to see how to do this calculation.
Bitwise AND means that you AND the corresponding bits of both numbers.
1 AND 1 -> 1
0 AND 1 -> 0
1 AND 0 -> 0
0 AND 0 -> 0
So
0xa1fc03 -> 0b101000011111110000000011
0xff00ff -> 0b111111110000000011111111
AND -> 0b101000010000000000000011
0b101000010000000000000011 -> 0xa10003
With a bit more experience you know that 0xFF is 0b11111111, so you instantly see that 0xa1fc03 AND 0xff00ff is 0xa10003, because you keep everything that is masked with FF and set to 0 everything that is masked with 00.
There are countless resources available. You should not have to ask me how to bitwise AND two numbers. Please do your own research.
I am busy designing a new barcode symbology for real-life applications. It uses a checksum value, which is computed on slices of k bits of large numbers. Hence intense bit manipulation.
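Not the actual algorithm, but a sketch of that kind of use: walking a large integer in slices of k bits and folding the slices into a checksum (the XOR fold here is only a placeholder).

def k_bit_checksum(n, k):
    mask = (1 << k) - 1
    checksum = 0
    while n:
        checksum ^= n & mask   # take the lowest k bits of the number
        n >>= k                # drop them and move on to the next slice
    return checksum

print(k_bit_checksum(0xA1FC03B7DE, 11))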
I am using neo4j to calculate some statistics on a data set. For that I am often using sum on a floating point value. I am getting different results depending on the circumstances. For example, a query that does this:
...
WITH foo
ORDER BY foo.fooId
RETURN SUM(foo.Weight)
returns a different result than a query that simply does the sum:
...
RETURN SUM(foo.Weight)
The differences are minuscule (293.07724195098984 vs 293.07724195099007), but they are enough to make simple equality checks fail. Another example: a different instance of the database, loaded with the same data using the same loading process, can produce the same issue (the DBs might not be 1:1; the load order of some relations might differ). I took the raw values that Neo4j sums (by simply removing the SUM()) and verified that they are the same in all cases (different DBs, and ordered vs. not ordered).
What are my options here? I don't mind losing some precision (I already tried to cut down the precision from 15 to 12 decimal places but that did not seem to work), but I need the results to match up.
Because of rounding errors, floating point addition is not associative: (a+b)+c != a+(b+c).
The result of every operation is rounded to fit the float encoding constraints, so (a+b)+c is computed as round(round(a+b)+c), while a+(b+c) is computed as round(a+round(b+c)).
As an obvious illustration, consider the operation 2^-100 + 1 - 1. If evaluated as (2^-100 + 1) - 1, it returns 0, because 1 + 2^-100 would require more precision than IEEE 754 floats or doubles provide and can only be encoded as 1.0. Evaluated as 2^-100 + (1 - 1), it correctly returns 2^-100, which can be encoded by either floats or doubles.
This is a trivial example, but such rounding errors may occur after every operation, which explains why floating point operations are not associative.
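A tiny demonstration in Python (whose floats are IEEE 754 doubles, like the values Neo4j sums):

a, b, c = 2.0 ** -100, 1.0, -1.0
print((a + b) + c)                    # 0.0 -- the 2^-100 is lost when added to 1.0 first
print(a + (b + c))                    # 7.888609052210118e-31, i.e. 2^-100
print((a + b) + c == a + (b + c))     # False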
Databases generally do not return data in a guaranteed order, and depending on the actual order, the additions are performed in a different sequence; that explains the behaviour you are seeing.
In general, for this reason, it is not a good idea to do equality comparisons on floats. It is usually advised to replace a==b with a check that abs(a-b) is "sufficiently" small.
"Sufficiently" depends on your algorithm. Floats are equivalent to roughly 6-7 significant decimal digits and doubles to 15-16 (and I think doubles are what your DB uses). Depending on the number of computations, the last 1-3 digits may be affected.
The best is probably to use
abs(a-b) < relative_error * max(abs(a), abs(b))
where relative_error must be adjusted to your problem. Something around 10^-13 is probably correct, but you must experiment, as the rounding errors depend on the number of computations, on the dispersion of the values and on what you consider "equal" for your problem.
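As a sketch in Python (math.isclose implements essentially the same test; the 10^-13 tolerance is only a starting point to tune against your data):

import math

def roughly_equal(a, b, rel=1e-13):
    return abs(a - b) <= rel * max(abs(a), abs(b))

x, y = 293.07724195098984, 293.07724195099007
print(x == y)                               # False
print(roughly_equal(x, y))                  # True
print(math.isclose(x, y, rel_tol=1e-13))    # True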
Look at this site for a discussion of comparison methods, and read What Every Computer Scientist Should Know About Floating-Point Arithmetic by David Goldberg, which discusses these problems, among others.
I start with a simple Maxima question, the answer to which may provide the answer to the actual problem I'm grappling with.
Related Simple Question:
How can I get maxima to calculate:
bfloat((1+%i)^0.3);
Might there be an option variable that can be set so that this evaluates to a complex number?
Actual Question:
I'm evaluating approximations for numerical time integration in finite element methods. For this purpose I'm using spectral analysis, which requires calculating the eigenvalues of a 4 x 4 matrix. This matrix "cav" is also computed within Maxima, using some of Maxima's algebra capabilities, but with numerical values substituted in, so the matrix is entirely numerical, i.e. it contains no variables. I've calculated the eigenvalues with Mathematica and it returns 4 real eigenvalues. However, Maxima produces horrendously complicated expressions for this case, which it apparently does not "know" how to simplify, even numerically as bigfloat. Perhaps this problem arises because Maxima first approximates the matrix "cav" by rational numbers (i.e. fractions) and then tries to solve the problem fully exactly, instead of simply using numerical bigfloat computations throughout. Is there a way I can change this?
Note that if you only change the input value of gzv to, say, 0.5, it works fine and returns numerical values for the complex eigenvalues.
I include the code below. Note that all of the code up until "cav:subst(vs,ca)$" is just for the definition of the matrix cav and seems to work fine. It is in the few statements thereafter that it fails to calculate numerical values for the eigenvalues.
v1:v0+ (1-gg)*a0+gg*a1$
d1:d0+v0+(1/2-gb)*a0+gb*a1$
obf:a1+(1+ga)*(w^2*d1 + 2*gz*w*(d1-d0)) -
ga *(w^2*d0 + 2*gz*w*(d0-g0))$
obf:expand(obf)$
cd:subst([a1=1,d0=0,v0=0,a0=0,g0=0],obf)$
fd:subst([a1=0,d0=1,v0=0,a0=0,g0=0],obf)$
fv:subst([a1=0,d0=0,v0=1,a0=0,g0=0],obf)$
fa:subst([a1=0,d0=0,v0=0,a0=1,g0=0],obf)$
fg:subst([a1=0,d0=0,v0=0,a0=0,g0=1],obf)$
f:[fd,fv,fa,fg]$
cad1:expand(cd*[1,1,1/2-gb,0] - gb*f)$
cad2:expand(cd*[0,1,1-gg,0] - gg*f)$
cad3:expand(-f)$
cad4:[cd,0,0,0]$
cad:matrix(cad1,cad2,cad3,cad4)$
gav:-0.05$
ggv:1/2-gav$
gbv:(ggv+1/2)^2/4$
gzv:1.1$
dt:0.01$
wv:bfloat(dt*2*%pi)$
vs:[ga=gav,gg=ggv,gb=gbv,gz=gzv,w=wv]$
cav:subst(vs,ca)$
cav:bfloat(cav)$
evam:eigenvalues(cav)$
evam:bfloat(evam)$
eva:evam[1]$
The main problem here is that Maxima tries pretty hard to make computations exact, and it's hard to tell it to ease up and allow inexact results.
Is there a mistake in the code you posted above? You have cav:subst(vs,ca) but ca is not defined. Is that supposed to be cav:subst(vs,cad) ?
For the short question, rectform can usually simplify complex expressions into something more usable:
(%i58) rectform (bfloat((1+%i)^0.3));
`rat' replaced 1.0B0 by 1/1 = 1.0B0
(%o58) 2.59023849130283b-1 %i + 1.078911979230303b0
About the long problem, if fixed-precision (i.e. ordinary floats, not bigfloats) is acceptable to you, then you can use the LAPACK function dgeev to compute eigenvalues and/or eigenvectors.
(%i51) load (lapack);
<bunch of messages here>
(%o51) /usr/share/maxima/5.39.0/share/lapack/lapack.mac
(%i52) dgeev (cav);
(%o52) [[- 0.02759949957202372, 0.06804641655485913, 0.997993508502892, 0.928429191717788], false, false]
If you really need variable precision, I don't know what to try. In principle it's possible to rework the LAPACK code to work with variable-precision floats, but that's a substantial task and I'm not sure about the details.
Good morning all,
I'm having some issues with floating point math, and have gotten totally lost in ".to_f"'s, "*100"'s and ".0"'s!
I was hoping someone could help me with my specific problem, and also explain exactly why their solution works so that I understand this for next time.
My program needs to do two things:
Sum a list of decimals and determine if they sum to exactly 1.0.
Determine the difference between 1.0 and a sum of numbers, and set the value of a variable to that exact difference so that the sum equals 1.0.
For example:
[0.28, 0.55, 0.17] -> should sum to 1.0, but I keep getting 1.xxxxxx. I am implementing the sum in the following fashion:
sum = array.inject(0.0){|sum,x| sum+ (x*100)} / 100
The reason I need this functionality is that I'm reading in a set of decimals that come from Excel. They are not 100% precise (they are missing some decimal places), so the sum usually comes out to 0.999999xxxxx or 1.000xxxxx. For example, I will get values like the following:
0.568887955,0.070564759,0.360547286
To fix this, I am ok taking the sum of the first n-1 numbers, and then changing the final number slightly so that all of the numbers together sum to 1.0 (must meet validation using the equation above, or whatever I end up with). I'm currently implementing this as follows:
sum = 0.0
array.each do |item|
sum += item * 100.0
end
array[i] = (100 - sum.round)/100.0
I know I could do this with inject, but I was trying to play with it to see what works. I think this is generally working (from inspecting the output), but it doesn't always pass the validation sum above, so if need be I can adjust this one as well. Note that I only need two decimal places of precision in these numbers - i.e. 0.56, not 0.5623225. I can either round them down at presentation time or during this calculation; it doesn't matter to me.
Thank you VERY MUCH for your help!
If accuracy is important to you, you should not be using binary floating point values, which cannot represent most decimal fractions exactly. Ruby has data types for doing exact arithmetic where accuracy is important. They are, off the top of my head, BigDecimal, Rational and Complex, depending on what you actually need to calculate.
It seems that in your case what you're looking for is BigDecimal, which represents a decimal number exactly, with as many digits as needed (in contrast to a binary floating point value, which can only approximate most decimal fractions).
When you read from Excel and deliberately cast strings like "0.9987" to floating point, you immediately lose the exact value that is contained in the string.
require "bigdecimal"
BigDecimal("0.9987")
That value is precise. It is 0.9987. Not 0.998732109, or anything close to it, but 0.9987. You may use all the usual arithmetic operations on it. Provided you don't mix floating points into the arithmetic operations, the return values will remain precise.
If your array contains the raw strings you got from Excel (i.e. you haven't #to_f'd them), then this will give you a BigDecimal that is the difference between the sum of them and 1.
1 - array.map{|v| BigDecimal(v)}.reduce(:+)
Either:
continue using floats and round(2) your totals: 12.341.round(2) # => 12.34
use integers (i.e. cents instead of dollars)
use BigDecimal and you won't need to round after summing them, as long as you start with BigDecimal with only two decimals.
I think that algorithms have a great deal more to do with accuracy and precision than a choice of IEEE floating point over another representation.
People used to do some fine calculations while still dealing with accuracy and precision issues. They'd do it by managing the algorithms they'd use and understanding how to represent functions more deeply. I think that you might be making a mistake by throwing aside that better understanding and assuming that another representation is the solution.
For example, no polynomial representation of a function will deal with an asymptote or singularity properly.
Don't discard floating point so quickly. It could be that being smarter about the way you use it will do just fine.