ECIES: correct way ECDH-input for KDF? Security effect? - elliptic-curve

In order to understand ECIES completely and use my favorite library, I implemented some parts of ECIES myself. Doing this and comparing the results led to one point which is not really clear to me: what exactly is the input of the KDF?
The result of ECDH is a vector (a curve point), but what do you use for the KDF? Is it just the X value, or is it X + Y (perhaps with a prepended 04)? You can find both concepts in the wild, and for the sake of interoperability, it would be really interesting to know which way is the correct way (if there is a correct way at all - I know that ECIES is more of a concept and has several degrees of freedom).
Explanation (correct me if I'm wrong at a specific point, please). If I talk about byte length, this will refer to ECIES with 256 Bit EC Keys.
So, first, the big picture: here's the ECIES process, and I'm talking about the step 2 -> 3:
The recipient's public key is a vector V, the sender's ephemeral private key is a scalar u, and the key agreement function KA is ECDH, which is basically a multiplication V * u. As a result, you get a shared key which is also a vector - let's call it "shared key".
Then you take the sender's public key, concatenate it with the shared key, and use this as the input for the key derivation function KDF.
But: if you want to use this vector for the key derivation function KDF, you have two ways of doing this:
1) you can use just the shared key's X. Then you have a bytestring of 32 bytes.
2) you can use the shared key's X and Y and prepend 0x04, as you do with public keys. Then you have a bytestring of 1 + 32 + 32 bytes.
[3) just to be complete: you can also use X + Y as a compressed point]
The length of the bytestring does not really matter, because after KDF (which usually involves hashing) you always have a fixed value, e.g. 32 bytes (if you use sha256).
But of course the result of KDF is quite different if you choose one or the other method. So the question is: what's the correct way?
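To make the difference concrete, here is a minimal Python sketch of the three encodings listed above. Several things in it are assumptions for illustration only: the "shared point" is simply the secp256k1 generator (a well-known valid point, not a real ECDH result), the KDF is a single SHA-256 call rather than the KDF of any particular ECIES profile, and the sender's public key is left out of the KDF input to keep the comparison simple.

```python
import hashlib

# Stand-in for the ECDH result. These are the coordinates of the secp256k1
# generator, used only because it is a well-known valid curve point; a real
# shared point would come from multiplying V by u.
SHARED_X = 0x79BE667EF9DCBBAC55A06295CE870B07029BFCDB2DCE28D959F2815B16F81798
SHARED_Y = 0x483ADA7726A3C4655DA4FBFC0E1108A8FD17B448A68554199C47D08FFB10D4B8


def encode_x_only(x):
    """Option 1: the 32-byte big-endian X coordinate."""
    return x.to_bytes(32, "big")


def encode_uncompressed(x, y):
    """Option 2: 0x04 || X || Y, 65 bytes."""
    return b"\x04" + x.to_bytes(32, "big") + y.to_bytes(32, "big")


def encode_compressed(x, y):
    """Option 3: 0x02 or 0x03 (parity of Y) || X, 33 bytes."""
    prefix = b"\x02" if y % 2 == 0 else b"\x03"
    return prefix + x.to_bytes(32, "big")


def toy_kdf(secret):
    """Placeholder KDF: one SHA-256 call (an assumption, not a real ECIES KDF)."""
    return hashlib.sha256(secret).digest()


encodings = {
    "x_only": encode_x_only(SHARED_X),
    "uncompressed": encode_uncompressed(SHARED_X, SHARED_Y),
    "compressed": encode_compressed(SHARED_X, SHARED_Y),
}
derived_keys = {name: toy_kdf(data) for name, data in encodings.items()}

for name, data in encodings.items():
    print(name, len(data), derived_keys[name].hex())
```

All three derived keys have the same length but different values, which is exactly why two libraries that pick different encodings cannot decrypt each other's messages.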
eciespy uses method 2 https://github.com/ecies/py/blob/master/ecies/utils.py#L143
python cryptography gives just X back at their ECDH: https://cryptography.io/en/latest/hazmat/primitives/asymmetric/ec/#cryptography.hazmat.primitives.asymmetric.ec.ECDH . They have no ECIES support.
if I understand Crypto++'s documentation correctly, they also just give X back: https://cryptopp.com/wiki/Elliptic_Curve_Diffie-Hellman
same with Java Bouncy Castle, if I read this correctly - the result is an integer: https://github.com/bcgit/bc-java/blob/master/core/src/main/java/org/bouncycastle/crypto/agreement/DHBasicAgreement.java#L79
but you can also find online calculators with both, X and Y: http://www-cs-students.stanford.edu/~tjw/jsbn/ecdh.html
So, I tried to get more information in documentation:
there's the ISO proposal for ECIES. They don't describe it in detail (or I was not able to find it), but I would interpret it as the way with the full vector, X and Y: https://www.shoup.net/papers/iso-2_1.pdf
there is this paper, which is widely linked on the internet, and which refers to just using X on page 27: http://www.secg.org/sec1-v2.pdf
So, the result is: I'm confused. Can anybody point me in the right direction, or is this just a degree of freedom you have (and the reason for lots of fun when it comes to compatibility)?

To answer my question myself: yes, this is a degree of freedom. The X-coordinate way is called compact representation, and it's defined in RFC 6090. So both are valid.
They are also equally secure, because you can calculate Y (up to its sign) out of X, as described in appendix C of RFC 6090.
The default way is using the compact representation. The two ways are not compatible with each other, so if you stumble across compatibility issues between libraries, this might be an interesting point to check.
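The claim that Y carries no extra secret can be checked with a few lines of Python. This is only a sketch under stated assumptions: it uses the curve secp256k1 (y^2 = x^3 + 7 over a prime p with p mod 4 = 3, which makes the square root a single modular exponentiation), and the function name `recover_y_candidates` is made up for this example. Other curves need their own parameters and possibly a different square-root method.

```python
# secp256k1 parameters (assumed curve for this illustration).
P = 2**256 - 2**32 - 977
B = 7

# Generator point, used as a known-good test point.
GX = 0x79BE667EF9DCBBAC55A06295CE870B07029BFCDB2DCE28D959F2815B16F81798
GY = 0x483ADA7726A3C4655DA4FBFC0E1108A8FD17B448A68554199C47D08FFB10D4B8


def recover_y_candidates(x):
    """Return both possible Y values for a given X on y^2 = x^3 + 7 mod P."""
    rhs = (pow(x, 3, P) + B) % P
    # Because P mod 4 == 3, a square root of rhs (if one exists) is rhs^((P+1)/4).
    y = pow(rhs, (P + 1) // 4, P)
    if (y * y) % P != rhs:
        raise ValueError("x is not the X coordinate of a curve point")
    return y, P - y


candidates = recover_y_candidates(GX)
print(GY in candidates)
```

Anyone who knows X can therefore compute the two candidate Y values, so including Y in the KDF input adds no entropy; it only changes the byte string being hashed.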

Related

What is the use of bit manipulation?

I searched for a clear explanation with some examples close to real life.
What is bit manipulation?
Why do we need to use bit manipulation?
We can use bit manipulation in image processing as far as I know. Can anyone show me a simple problem which can be solved using bit manipulation?
I read about bit manipulation from some link:
Link 1
Link 2
In Link 2, data compression is done using bit packing. Is there any difference between bit manipulation and bit packing?
I would appreciate it if anyone could explain this with a very simple example that resembles a real-life problem.
What is bit manipulation?
Bit manipulation usually refers to changing data using bit operators.
I think Wikipedia explains it well enough, so I won't write another article.
https://en.wikipedia.org/wiki/Bit_manipulation
Bit manipulation is the act of algorithmically manipulating bits or
other pieces of data shorter than a word. Computer programming tasks
that require bit manipulation include low-level device control, error
detection and correction algorithms, data compression, encryption
algorithms, and optimization. For most other tasks, modern programming
languages allow the programmer to work directly with abstractions
instead of bits that represent those abstractions. Source code that
does bit manipulation makes use of the bitwise operations: AND, OR,
XOR, NOT, and possibly other operations analogous to the boolean
operators; there are also bit shifts and operations to count ones and
zeros, find high and low one or zero, set, reset and test bits,
extract and insert fields, mask and zero fields, gather and scatter
bits to and from specified bit positions or fields. Integer arithmetic
operators can also effect bit-operations in conjunction with the other
operators.
Bit manipulation, in some cases, can obviate or reduce the need to
loop over a data structure and can give many-fold speed ups, as bit
manipulations are processed in parallel.
Why do we need to use bit manipulation?
Because it is fast and often we don't have another choice. For example, in microcontrollers pretty much everything is controlled by manipulating the bits of 8-bit registers. So an output would go high if you set a certain bit to 1.
Bit packing is a compression technique that tries to minimize the number of bits necessary to represent a number. While you'll use bit operators to implement it, it is not the same as "bit-manipulation". It's just one of many many use cases for bit-manipulation.
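To show how bit packing relates to bit manipulation, here is a minimal Python sketch that packs a date into 16 bits. The field layout (7 bits for a year offset, 4 bits for the month, 5 bits for the day) is an assumption chosen for this example, not a standard format.

```python
def pack_date(year_offset, month, day):
    """Pack year offset (0-127), month (1-12) and day (1-31) into 16 bits."""
    assert 0 <= year_offset < 128 and 1 <= month <= 12 and 1 <= day <= 31
    return (year_offset << 9) | (month << 5) | day


def unpack_date(packed):
    """Reverse pack_date using shifts and masks."""
    year_offset = (packed >> 9) & 0x7F   # top 7 bits
    month = (packed >> 5) & 0x0F         # middle 4 bits
    day = packed & 0x1F                  # low 5 bits
    return year_offset, month, day


packed = pack_date(24, 12, 31)
print(packed, format(packed, "016b"))
print(unpack_date(packed))
```

Three small numbers that would each occupy at least a byte on their own fit into two bytes here; the shifts, ORs and masks are the bit manipulation, and the packing is what they are used for.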
Can anyone show me a simple problem which can be solved using bit manipulation?
Let's say you have an RGB tuple rgb = 0xa1fc03 and you want to make the green channel 0.
rgb_without_green = rgb & 0xFF00FF
We've bitwise ANDed the value with 0xFF00FF.
Now rgb_without_green is 0xa10003.
Basically any operation boils down to bit manipulation. For most of them you just have convenient solutions. Say instead of 0b00000011 << 0b00000101 you write 3 * 32
Or have a look at this where the addition of two integers is implemented using bit operations. Add two integers using only bitwise operators?
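For reference, the bitwise addition idea can be sketched in Python as follows. This version is written for non-negative integers only, because Python integers have unlimited width and the carry loop would not terminate for negative operands; the function name is made up for this example.

```python
def add_bitwise(a, b):
    """Add two non-negative integers using only AND, XOR and shift."""
    if a < 0 or b < 0:
        raise ValueError("this sketch only handles non-negative integers")
    while b != 0:
        carry = (a & b) << 1   # positions where both bits are 1 produce a carry
        a = a ^ b              # sum of the bits without the carry
        b = carry              # add the carry in the next round
    return a


print(add_bitwise(13, 29))
print(0b00000011 << 0b00000101, 3 * 32)
```

The second line prints the same number twice, confirming that shifting 3 left by 5 bits is the same as multiplying it by 32.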
Edit due to comment
How does the bitwise AND operation between 0xa1fc03 and 0xFF00FF give 0xa10003? I just need to see how to do this calculation.
Bitwise AND means that you AND all the bits of both numbers.
1 AND 1 -> 1
0 AND 1 -> 0
1 AND 0 -> 0
0 AND 0 -> 0
So
0xa1fc03 -> 0b101000011111110000000011
0xff00ff -> 0b111111110000000011111111
AND -> 0b101000010000000000000011
0b101000010000000000000011 -> 0xa10003
With a bit more experience you know that 0xFF is 0b11111111, so you instantly know that 0xa1fc03 AND 0xff00ff is 0xa10003, because you keep everything that is masked with FF and set everything to 0 that is masked with 00.
There are countless resources available. You should not have to ask me how to bitwise AND two numbers. Please do your own research.
I am busy designing a new barcode symbology for real-life applications. It uses a checksum value, which is computed on slices of k bits of large numbers. Hence intense bit manipulation.
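The barcode checksum itself is not described, so the following Python sketch only illustrates the general idea of working on k-bit slices of a large number. The XOR checksum is a placeholder chosen for this example, and both function names are hypothetical.

```python
def k_bit_slices(value, k):
    """Split a non-negative integer into k-bit slices, least significant first."""
    mask = (1 << k) - 1
    slices = []
    while value:
        slices.append(value & mask)   # keep the lowest k bits
        value >>= k                   # drop them and continue
    return slices


def xor_checksum(value, k):
    """Placeholder checksum: XOR of all k-bit slices."""
    checksum = 0
    for s in k_bit_slices(value, k):
        checksum ^= s
    return checksum


print([hex(s) for s in k_bit_slices(0xA1FC03, 8)])
print(hex(xor_checksum(0xA1FC03, 8)))
```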

Outputting values from CAMPARY

I'm trying to use the CAMPARY library (CudA Multiple Precision ARithmetic librarY). I've downloaded the code and included it in my project. Since it supports both cpu and gpu, I'm starting with cpu to understand how it works and make sure it does what I need. But the intent is to use this with CUDA.
I'm able to instantiate an instance and assign a value, but I can't figure out how to get things back out. Consider:
#include <stdio.h>   // printf
#include <stdlib.h>  // free
#include <time.h>
#include "c:\\vss\\CAMPARY\\Doubles\\src_cpu\\multi_prec.h"
int main()
{
    const char *value = "123456789012345678901234567";
    multi_prec<2> a(value);
    a.prettyPrint();
    a.prettyPrintBin();
    a.prettyPrintBin_UnevalSum();
    char *cc = a.prettyPrintBF();
    printf("\n%s\n", cc);
    free(cc);
}
Compiles, links, runs (VS 2017). But the output is pretty unhelpful:
Prec = 2
Data[0] = 1.234568e+26
Data[1] = 7.486371e+08
Prec = 2
Data[0] = 0x1.987bf7c563caap+86;
Data[1] = 0x1.64fa5c3800000p+29;
0x1.987bf7c563caap+86 + 0x1.64fa5c3800000p+29;
1.234568e+26 7.486371e+08
Printing each of the doubles like this might be easy to do, but it doesn't tell you much about the value of the 128-bit number being stored. Performing highly accurate computations is of limited value if there's no way to output the results.
In addition to just printing out the value, eventually I also need to convert these numbers to ints (I'm willing to try it all in floats if there's a way to print, but I fear that both accuracy and speed will suffer). Unlike MPIR (which doesn't support CUDA), CAMPARY doesn't have any associated multi-precision int type, just floats. I can probably cobble together what I need (mostly just add/subtract/compare), but only if I can get the integer portion of CAMPARY's values back out, which I don't see a way to do.
CAMPARY doesn't seem to have any docs, so it's conceivable these capabilities are there, and I've simply overlooked them. And I'd rather ask on the CAMPARY discussion forum/mail list, but there doesn't seem to be one. That's why I'm asking here.
To sum up:
Is there any way to output the 128-bit ( multi_prec<2> ) values from CAMPARY?
Is there any way to extract the integer portion from a CAMPARY multi_prec? Perhaps one of the (many) math functions in the library that I don't understand computes this?
There are really only 2 possible answers to this question:
There's another (better) multi-precision library that works on CUDA that does what you need.
Here's how to modify this library to do what you need.
The only people who could give the first answer are CUDA programmers. Unfortunately, if there were such a library, I feel confident talonmies would have known about it and mentioned it.
As for #2, why would anyone update this library if they weren't a CUDA programmer? There are other, much better multi-precision libraries out there. The ONLY benefit CAMPARY offers is that it supports CUDA. Which means the only people with any real motivation to work with or modify the library are CUDA programmers.
And, as the CUDA programmer with the most vested interest in solving this, I did figure out a solution (albeit an ugly one). I'm posting it here in the hopes that the information will be of value to future CAMPARY programmers. There's not much information out there for this library, so this is a start.
The first thing you need to understand is how CAMPARY stores its data. And, while not complex, it isn't what I expected. Coming from MPIR, I assumed that CAMPARY stored its data pretty much the same way: a fixed size exponent followed by an arbitrary number of bits for the mantissa.
But nope, CAMPARY went a different way. Looking at the code, we see:
private:
double data[prec];
Now, I assumed that this was just an arbitrary way of reserving the number of bits they needed. But no, they really do use prec doubles. Like so:
multi_prec<8> a("2633716138033644471646729489243748530829179225072491799768019505671233074369063908765111461703117249");
// Looking at a in the VS debugger:
[0] 2.6337161380336443e+99 const double
[1] 1.8496577979210756e+83 const double
[2] 1.2618399223120249e+67 const double
[3] -3.5978270144026257e+48 const double
[4] -1.1764513205926450e+32 const double
[5] -2479038053160511.0 const double
[6] 0.00000000000000000 const double
[7] 0.00000000000000000 const double
So, what they are doing is storing the max amount of precision possible in the first double, then the remainder is used to compute the next double and so on until they encompass the entire value, or run out of precision (dropping the least significant bits). Note that some of these are negative, which means the sum of the preceding values is a bit bigger than the actual value and they are correcting it downward.
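The storage scheme described above can be imitated in a few lines of Python, which may help when reasoning about what the debugger shows. This is a sketch of the idea only, not CAMPARY's actual algorithm: it takes an integer, rounds it to the nearest double, subtracts that double exactly, and repeats on the remainder. The function name and the `max_terms` parameter are made up for this example.

```python
def to_double_expansion(n, max_terms=8):
    """Split integer n into a list of doubles whose exact sum approximates n."""
    parts = []
    remainder = n
    while remainder != 0 and len(parts) < max_terms:
        d = float(remainder)      # nearest double to what is still missing
        parts.append(d)
        remainder -= int(d)       # int(d) is the exact value of that double
    return parts


n = 123456789012345678901234567
parts = to_double_expansion(n)
for i, d in enumerate(parts):
    print(i, d, d.hex())
print(sum(int(d) for d in parts) == n)
```

Because each double is rounded to nearest, a term can overshoot, and the next term is then negative to correct it downward, which matches the negative entries seen in the debugger listing.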
With this in mind, we return to the question of how to print it.
In theory, you could just add all these together to get the right answer. But kinda by definition, we already know that C doesn't have a datatype to hold a value this size. But other libraries do (say MPIR). Now, MPIR doesn't work on CUDA, but it doesn't need to. You don't want to have your CUDA code printing out data. That's something you should be doing from the host anyway. So do the computations with the full power of CUDA, cudaMemcpy the results back, then use MPIR to print them out:
#define MPREC 8
void ShowP(const multi_prec<MPREC> value)
{
    multi_prec<MPREC> temp(value), temp2;
    // from mpir at mpir.org
    mpf_t mp, mp2;
    mpf_init2(mp, value.getPrec() * 64); // Make sure we reserve enough room
    mpf_init(mp2); // Only needs to hold one double.
    const double *ptr = value.getData();
    mpf_set_d(mp, ptr[0]);
    for (int x = 1; x < value.getPrec(); x++)
    {
        // MPIR doesn't have a mpf_add_d, so we need to load the value into
        // an mpf_t.
        mpf_set_d(mp2, ptr[x]);
        mpf_add(mp, mp, mp2);
    }
    // Using base 10, write the full precision (0) of mp, to stdout.
    mpf_out_str(stdout, 10, 0, mp);
    mpf_clears(mp, mp2, NULL);
}
Used with the number stored in the multi_prec above, this outputs the exact same value. Yay.
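The same "sum the doubles exactly on the host" idea can be prototyped without MPIR, for example in Python with `fractions.Fraction`, which represents every double exactly. This is a sketch for checking values offline, not a replacement for the C++ routine; the function names are made up, and the input is assumed to be the list of doubles copied out of a multi_prec.

```python
from fractions import Fraction


def expansion_to_fraction(parts):
    """Exact sum of a list of doubles as a Fraction (no rounding)."""
    return sum((Fraction(d) for d in parts), Fraction(0))


def integer_part(parts):
    """Integer portion of the exact sum, truncated toward zero."""
    return int(expansion_to_fraction(parts))


# Hypothetical two-term expansion: a large term plus a small correction.
parts = [1e20, -0.5]
total = expansion_to_fraction(parts)
print(total)                 # exact rational value
print(integer_part(parts))   # integer portion
```

Truncating the exact sum also gives one way to get at the integer portion of a value, which was the second part of the question.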
It's not a particularly elegant solution. Having to add a second library just to print a value from the first is clearly sub-optimal. And this conversion can't be all that speedy either. But printing is typically done (much) less frequently than computing. If you do an hour's worth of computing and a handful of prints, the performance doesn't much matter. And it beats the heck out of not being able to print at all.
CAMPARY has a lot of shortcomings (undocumented, unsupported, unmaintained). But for people who need mp numbers on CUDA (especially if you need sqrt), it's the best option I've found.

Generate custom length hash values of a String in Swift

Is it possible to somehow "hash" a given String with length n to a hash value of an arbitrary length m?
I want to achieve something like follows:
let s1 = "<UNIQUE_USER_IDENTIFIER_1>"
let s2 = "<UNIQUE_USER_IDENTIFIER_2>"
let x1 = s1.hashValue(length: 4)
let x2 = s2.hashValue(length: 4)
I want to assign each given user a (e.g. four-digit) number, that is based on its unique UID. Is that possible?
First, I want to be clear that you mean "hash" and don't mean "(lossless) compress." You should expect some collisions where x1 and x2 are the same value for different s1 and s2. If you really mean a mapping so that there are no collisions, then we have to know a lot more about the problem. It is impossible to achieve that in the general case (see the Pigeonhole principle). But it can be achieved in some special cases where there is sufficient redundancy in the input. Or it can be done by maintaining a table (i.e. a database or the like). The rest of this answer is about hashing.
If your UID is a UUID created on iOS (or any v4 UUID), then its bits are already quite high quality, and the last four digits should be fine without doing any hashing at all. There are a couple of bytes in the middle that you should avoid, but the whole end section is random and so an ideal hash.
If your UUID is not random, you can try using the default hashes and pulling the required number of bits out of them, but non-cryptographic hashes don't always have good independence between their bits, so this may collide more than you like.
In that case use a cryptographic hash larger than the size you need and truncate it (or take the least-significant bits; either set is fine). This is commonly done in cryptography. For example SHA-512/256 is a commonly used hash that computes a 512-bit hash and extracts 256 bits from it. Cryptographic hashes require high independence of all their bits, so any subset of bits will also be collision resistant.
BTW, if you mean "4 decimal digits," then you should expect a collision after roughly 100 values. If you mean 16 bits (4 hex digits), you should expect a collision after roughly 300 values. These are your best-case scenarios and mean your hash is working well. See Birthday Attack for a table of expectations and some helpful approximations.
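The truncation approach is language independent; here is a minimal sketch in Python rather than Swift. The choice of SHA-256, the function name `short_code`, and the reduction modulo a power of ten are assumptions made for this example.

```python
import hashlib


def short_code(uid, digits=4):
    """Map a string to a fixed-length decimal code via a truncated SHA-256."""
    digest = hashlib.sha256(uid.encode("utf-8")).digest()
    # Interpret the digest as one big integer and keep only the low digits.
    value = int.from_bytes(digest, "big") % (10 ** digits)
    return str(value).zfill(digits)


code_1 = short_code("<UNIQUE_USER_IDENTIFIER_1>")
code_2 = short_code("<UNIQUE_USER_IDENTIFIER_2>")
print(code_1, code_2)
```

The same input always gives the same code, on every run and every machine, but two different inputs can still end up with the same code, so the result should not be treated as unique.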
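Collision figures like these come from the birthday bound, and they are easy to check numerically. The sketch below computes the exact probability that at least two out of n uniformly random values collide among N possible codes; the function name is made up for this example, and it assumes an ideal hash.

```python
def collision_probability(n_items, n_slots):
    """Probability that at least two of n_items uniform values collide."""
    if n_items > n_slots:
        return 1.0
    p_no_collision = 1.0
    for i in range(n_items):
        # The i-th value must avoid the i codes already taken.
        p_no_collision *= (n_slots - i) / n_slots
    return 1.0 - p_no_collision


for n_slots, n_items in [(10_000, 100), (10_000, 119), (65_536, 300)]:
    p = collision_probability(n_items, n_slots)
    print(n_slots, n_items, round(p, 3))
```

With four decimal digits a collision becomes more likely than not at a bit over a hundred users, and with 16 bits at around three hundred.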
Based on only the information you provided:
extension String {
    func hashValue(length: Int) -> Int? {
        return Int(String(abs(hash)).prefix(length))
    }
}
Usage:
"foo".hashValue(length: 4) // 5192
This will give you a consistent positive integer result based on the string input. Obviously it is not very useful for uuid purposes but useful for other use-cases nonetheless.

Does Z3 have support for optimization problems

I saw in a previous post from last August that Z3 did not support optimizations.
However it also stated that the developers are planning to add such support.
I could not find anything in the source to suggest this has happened.
Can anyone tell me if my assumption that there is no support is correct or was it added but I somehow missed it?
Thanks,
Omer
If your optimization has an integer valued objective function, one approach that works reasonably well is to run a binary search for the optimal value. Suppose you're solving the set of constraints C(x,y,z), maximizing the objective function f(x,y,z).
1. Find an arbitrary solution (x0, y0, z0) to C(x,y,z).
2. Compute f0 = f(x0, y0, z0). This will be your first lower bound.
3. As long as you don't know any upper bound on the objective value, try to solve the constraints C(x,y,z) ∧ f(x,y,z) > 2 * L, where L is your best lower bound (initially f0, then whatever you found that was better). If this is unsatisfiable, 2 * L is an upper bound U. (This doubling only makes progress when L is positive.)
4. Once you have both an upper and a lower bound, apply binary search: solve C(x,y,z) ∧ 2 * f(x,y,z) > (U + L). If the formula is satisfiable, you can compute a new lower bound using the model. If it is unsatisfiable, (U + L) / 2 is a new upper bound.
Step 3. will not terminate if your problem does not admit a maximum, so you may want to bound it if you are not sure it does.
You should of course use push and pop to solve the succession of problems incrementally. You'll additionally need the ability to extract models for intermediate steps and to evaluate f on them.
We have used this approach in our work on Kaplan with reasonable success.
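The search loop described in this answer can be sketched independently of any solver. In the Python sketch below, `solve(bound)` is a hypothetical callback standing in for a solver query: it returns the objective value of some solution with f > bound (any solution when bound is None), or None when the constraints are unsatisfiable. Two details are my own choices rather than part of the answer: the first phase grows the probe by a doubling step instead of using 2 * L, so that it also works when L is zero or negative, and it gives up after `max_probes` rounds in case the objective is unbounded. The toy oracle is a brute-force stand-in used only to exercise the loop.

```python
from itertools import product


def maximize(solve, max_probes=64):
    """Maximize an integer objective using only 'is there a solution with f > b?' queries."""
    lower = solve(None)                  # steps 1 and 2: any solution gives a lower bound
    if lower is None:
        raise ValueError("constraints are unsatisfiable")
    upper = None
    step = 1
    for _ in range(max_probes):          # step 3: look for an upper bound
        probe = lower + step
        better = solve(probe)
        if better is None:
            upper = probe                # nothing exceeds probe
            break
        lower = better
        step *= 2
    if upper is None:
        raise RuntimeError("no upper bound found; objective may be unbounded")
    while lower < upper:                 # step 4: binary search between the bounds
        mid = (lower + upper) // 2
        better = solve(mid)
        if better is None:
            upper = mid
        else:
            lower = better
    return lower


def toy_solve(bound):
    """Brute-force oracle: maximize 3x + y subject to x + 2y <= 40, 0 <= x, y < 50."""
    for x, y in product(range(50), repeat=2):
        if x + 2 * y <= 40:
            value = 3 * x + y
            if bound is None or value > bound:
                return value
    return None


print(maximize(toy_solve))
```

With a real solver, each `solve` call would push the extra bound, check satisfiability, evaluate f on the model, and pop again.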
Z3 currently does not support optimization. This is on the TODO list, but it has not been implemented yet. The following slide decks describe the approach that will be used in Z3:
Exact nonlinear optimization on demand
Computation in Real Closed Infinitesimal and Transcendental Extensions of the Rationals
The library for computing with infinitesimals has already been implemented, and is available in the unstable (work-in-progress) branch, and online at rise4fun.

DSL for Clojure image synthesis

I'm experimenting with creating a small library/DSL for image synthesis in Clojure. Basically the idea is to allow users of the library to compose sets of mathematical functions to procedurally create interesting images.
The functions need to operate on double values, and take the form of converting a location vector into a colour value, e.g. (x,y,z) -> (r,g,b,a)
However I'm facing a few interesting design decisions:
Inputs could have 1,2,3 or maybe even 4 dimensions (x,y,z plus time)
It would be good to provide vector maths operations (dot products, addition, multiplication etc.)
It would be valuable to compose functions with operations such as rotate, scale etc.
For performance reasons, it is important to use primitive double maths throughout (i.e. avoid creating boxed doubles in particular). So a function which needs to return red, green and blue components perhaps needs to become three separate functions which return the primitive red, green and blue values respectively.
Any ideas on how this kind of DSL can reasonably be achieved in Clojure (1.4 beta)?
A look at the awesome ImageMagick tools http://www.imagemagick.org can give you an idea of what kind of operations would be expected from such a library.
Maybe you'll see that you won't need to drop down to vector math if you replicate the default IM toolset.
OK, so I eventually figured out a nice way of doing this.
The trick was to represent functions as a vector of code (in the "code is data" sense), e.g.
[(Math/sin (* 10 x))
(Math/cos (* 12 y))
(Math/cos (+ (* 5 x) (* 8 y)))]
This can then be "compiled" to create 3 objects that implement a Java interface with the following method:
public double calc(double x, double y, double z, double t) {
.....
}
And these function objects can be called with primitive values to get the Red, Green and Blue colour values for each pixel. Results are something like:
Finally, it's possible to compose the functions using a simple DSL, e.g. to scale up a texture you can do:
(vscale 10 some-function-vector)
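For readers who don't use Clojure, the composition idea can be approximated in Python with plain closures. This is only an analogue for illustration, not how clisk is implemented: the functions here are ordinary callables rather than compiled code, and the meaning of `vscale` (sampling the wrapped functions at coordinates divided by the factor) is my assumption about what "scale up" does.

```python
import math


def vscale(factor, functions):
    """Return new functions that sample the originals at coordinates / factor."""
    def scaled(f):
        return lambda x, y, z, t: f(x / factor, y / factor, z / factor, t)
    return [scaled(f) for f in functions]


def colour_at(functions, x, y, z=0.0, t=0.0):
    """Evaluate one function per colour channel at a single location."""
    return tuple(f(x, y, z, t) for f in functions)


# The same three expressions as in the vector of code, as (x, y, z, t) functions.
texture = [
    lambda x, y, z, t: math.sin(10 * x),
    lambda x, y, z, t: math.cos(12 * y),
    lambda x, y, z, t: math.cos(5 * x + 8 * y),
]

print(colour_at(texture, 0.1, 0.2))
print(colour_at(vscale(10, texture), 1.0, 2.0))
```

Both calls print the same colour, because the scaled texture at (1.0, 2.0) samples the original at (0.1, 0.2).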
I've published all the code on GitHub for anyone interested:
https://github.com/mikera/clisk

Resources