Large integers in Cypher (Neo4j)

I have a dataset with some hexadecimal integers like '4726E440'.
I want to add these numbers as attributes of the nodes.
If I execute:
CREATE (n {id:toInt("4726E440")});
neo4j gives me this error:
integer, 4726E440, is too large
Is there any way to handle this kind of integer (other than saving them as strings)?

Not 100% sure, but it looks like you're trying to convert a string holding the floating-point number 4726*10^440 to an int value. That one obviously is too large.
If you want to use hex literals you need to prefix them with 0x, e.g.
return toInt(0x4726E440)
returns 1193731136 - so it's still in range.
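If the hex value arrives as a string (as in the question), another option is to convert it on the client side before it ever reaches Cypher. A minimal Python sketch, using only the value from the question:

value = int("4726E440", 16)   # parse the hex string
print(value)                  # 1193731136
print(value <= 2**63 - 1)     # True: fits in Neo4j's signed 64-bit integer type

The resulting integer can then be passed to the query as a parameter instead of being embedded in the statement text.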

If you are wondering what the actual limit on number size in Neo4j is, this forum post might interest you.
Basically, Neo4j uses signed 64-bit integers with a maximum of 2^63 - 1. There seems to be no way to increase this limit at the moment, and you will have to resort to strings or byte lists if you really have to store numbers of this size.
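For illustration, a small Python helper that applies that fallback (the function name and the string fallback are just one possible convention, not something Neo4j provides):

NEO4J_MIN_INT, NEO4J_MAX_INT = -2**63, 2**63 - 1

def to_neo4j_property(value):
    # Keep the value as an integer if it fits the signed 64-bit range,
    # otherwise fall back to its decimal string representation.
    if NEO4J_MIN_INT <= value <= NEO4J_MAX_INT:
        return value
    return str(value)

print(to_neo4j_property(1193731136))   # 1193731136 (stored as an integer)
print(to_neo4j_property(2**80))        # '1208925819614629174706176' (stored as a string)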

Just to build on the other answers, you'll need to wrap your big number in toInteger() in Cypher. The following numbers should not equal one another, but Neo4j thinks they do. (The code was run in Neo4j v4.2, first via the browser interface and then using the Python driver):
RETURN 2^63-2 AS Minus2, 2^63-1 AS Minus1, 2^63-2 = 2^63-1 AS Comparison
╒═════════════════════╤═════════════════════╤════════════╕
│"Minus2"             │"Minus1"             │"Comparison"│
╞═════════════════════╪═════════════════════╪════════════╡
│9223372036854776000.0│9223372036854776000.0│true        │
└─────────────────────┴─────────────────────┴────────────┘
But, if you convert the big number to an integer in the statement, Cypher reads it correctly:
RETURN toInteger(2^63)-2 AS Minus2, toInteger(2^63)-1 AS Minus1, toInteger(2^63)-2 = toInteger(2^63)-1 AS Comparison
╒═══════════════════╤═══════════════════╤════════════╕
│"Minus2"           │"Minus1"           │"Comparison"│
╞═══════════════════╪═══════════════════╪════════════╡
│9223372036854775805│9223372036854775806│false       │
└───────────────────┴───────────────────┴────────────┘
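The same effect can be reproduced outside Neo4j, for example in Python, where 2**63 is exact as an integer but loses its last digits once converted to a 64-bit float:

print(2**63 - 2 == 2**63 - 1)                # False: exact integer arithmetic
print(float(2**63) - 2 == float(2**63) - 1)  # True: both round to the same float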

Related

What is the difference between len() and count() for the Vinyl engine?

I have different results for space:len() and space:count() for a space with a vinyl engine. What do these methods return?
Scanning a vinyl space to count the exact number of tuples may be very expensive; that's why the :len() function exists. It gives you a fast but approximate result. If that doesn't work for you, use the :count() function instead: it does a full scan and returns the correct result.
For more info there is a related issue on this topic, and the implementation commit says the following:
index.len returns the total number of rows stored in the index. It is the sum of memory.rows and disk.rows as reported by index.info. Note, it may be greater than the number of tuples stored in the space, because it includes DELETE and UPDATE statements.
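To make the distinction concrete, here is a toy model in Python (not Tarantool code, just an illustration of why a statement count can exceed the live-tuple count):

class ToyLsmIndex:
    def __init__(self):
        self.statements = []                   # append-only log of (op, key) pairs

    def replace(self, key):
        self.statements.append(("REPLACE", key))

    def delete(self, key):
        self.statements.append(("DELETE", key))

    def len(self):
        # Cheap: just the number of stored statements (think memory.rows + disk.rows).
        return len(self.statements)

    def count(self):
        # Expensive: replay the statements to find which keys are actually live.
        live = set()
        for op, key in self.statements:
            if op == "REPLACE":
                live.add(key)
            else:
                live.discard(key)
        return len(live)

idx = ToyLsmIndex()
idx.replace(1)
idx.replace(2)
idx.delete(1)
print(idx.len())    # 3 statements stored
print(idx.count())  # 1 tuple actually visible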

Neo4j floating point sum different results

I am using neo4j to calculate some statistics on a data set. For that I am often using sum on a floating point value. I am getting different results depending on the circumstances. For example, a query that does this:
...
WITH foo
ORDER BY foo.fooId
RETURN SUM(foo.Weight)
Returns a different result than the query that simply does the sum:
...
RETURN SUM(foo.Weight)
The differences are minuscule (293.07724195098984 vs 293.07724195099007), but they are enough to make simple equality checks fail. Another example: a different instance of the database, loaded with the same data using the same loading process, can produce the same issue (the dbs might not be 1:1; the load order of some relations might be different). I took the raw values that neo4j sums (by simply removing the SUM()) and verified that they are the same in all cases (different dbs, and ordered/not ordered).
What are my options here? I don't mind losing some precision (I already tried to cut the precision down from 15 to 12 decimal places, but that did not seem to work), but I need the results to match up.
Because of rounding errors, floating-point addition is not associative: (a+b)+c != a+(b+c).
The result of every operation is rounded to fit the encoding constraints of floats, so (a+b)+c is effectively computed as round(round(a+b)+c), while a+(b+c) is computed as round(a+round(b+c)).
As an obvious illustration, consider the operation 2^-100 + 1 - 1. If interpreted as (2^-100 + 1) - 1, it returns 0, because 1 + 2^-100 would require more precision than the IEEE 754 float or double encodings provide and can only be encoded as 1.0. Interpreted as 2^-100 + (1 - 1), it correctly returns 2^-100, which can be encoded by either floats or doubles.
This is a trivial example, but such rounding errors may appear after every operation, which explains why floating-point operations are not associative.
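You can check that example directly, e.g. in Python:

a, b, c = 2**-100, 1.0, -1.0
print((a + b) + c)   # 0.0: 2**-100 is rounded away inside (a + b)
print(a + (b + c))   # 2**-100 survives (about 7.9e-31)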
Databases generally do not return data in a guaranteed order, and depending on the actual order, the operations are performed differently; that explains the behaviour you are seeing.
In general, for this reason, it is not a good idea to do equality comparisons on floats. It is usually advised to replace a == b with a check that abs(a-b) is "sufficiently" small.
What "sufficiently" means may depend on your algorithm. Floats are equivalent to ~6-7 decimal digits and doubles to 15-16 (and I think doubles are what your DB uses). Depending on the number of computations, the last 1-3 digits may be affected.
The best approach is probably to use
abs(a-b)<relative-error*max(abs(a),abs(b))
where relative-error must be adjusted to your problem. Something around 10^-13 is probably correct, but you must experiment, as rounding errors depend on the number of computations, on the dispersion of the values, and on what you consider "equal" for your problem.
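For instance, in Python, math.isclose implements exactly this kind of relative test (the two values below are the sums from the question):

import math

a = 293.07724195098984
b = 293.07724195099007
print(a == b)                              # False: strict equality fails
print(math.isclose(a, b, rel_tol=1e-13))   # True: equal to within ~13 digits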
Look at this site for a discussion of comparison methods, and read What Every Computer Scientist Should Know About Floating-Point Arithmetic by David Goldberg, which discusses these problems among others.

Dealing with big numbers in Lua

I need to store a large number in Lua, for example the number 63680997318088143281752740767766707563546963464218564507450892460763521488675430192536461.
If I simply assign it to a variable, I don't get the actual number:
local n = 63680997318088143281752740767766707563546963464218564507450892460763521488675430192536461
print(string.format("%.0f",n)) -- prints 63680997318088143929455344863959288468423333130904105158115881995380577784972357899649024
What are possible workarounds for handling large numbers?
Lua numbers have limited precision, and you are trying to store a number that exceeds what can be represented. You'll need a different mechanism to store these numbers and operate on them.
The key words are "bignum" and "arbitrary precision numbers". A quick Google search returns several pure-Lua modules (bignum and lua-nums) and one C-based one (lmapm). Also see this SO answer for other options.

Storing data with large input

There is a problem on a competitive programming site (HackerRank) in which the input number is in the range of 10^18. So, is it possible to store 10^18 in Java? If yes, which data type should be used?
For some easy HackerRank problems, BigInteger or BigDecimal do work for extremely large inputs, but they usually don't work in moderate/difficult problems, as they tend to reduce performance, and a high number of test cases with extremely large inputs can cause a timeout.
In such cases, you will need to go for different storage techniques, e.g. an array of ints, where each element of the array represents a digit of the large input. You will then need to do digit-based arithmetic on the array for your computations; see the sketch below.
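A minimal sketch of that digit-array idea (shown in Python for brevity; a Java int[] would be used the same way, and the values here are just examples):

def add_digits(a, b):
    # a and b are lists of decimal digits, least significant digit first
    result, carry = [], 0
    for i in range(max(len(a), len(b))):
        s = carry + (a[i] if i < len(a) else 0) + (b[i] if i < len(b) else 0)
        result.append(s % 10)
        carry = s // 10
    if carry:
        result.append(carry)
    return result

def to_digits(s):
    return [int(ch) for ch in reversed(s)]       # "123" -> [3, 2, 1]

def to_string(d):
    return "".join(str(x) for x in reversed(d))  # [3, 2, 1] -> "123"

x = to_digits("999999999999999999")   # 10^18 - 1
y = to_digits("1")
print(to_string(add_digits(x, y)))    # 1000000000000000000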
There's no real need to be careful, as the BigInteger.valueOf(long) method will give you a compilation error if you try to write a literal that exceeds Long.MAX_VALUE. Furthermore, it's easy to construct a much larger BigInteger, say BigInteger.valueOf(10).pow(10000).

Lookup table size reduction

I have an application in which I have to store a couple of million integers in a lookup table. Obviously I cannot store that amount of data in memory, and my requirements are very tight: the data has to live on an embedded system, so I am very limited in space. I would like to ask about recommended methods for reducing the size of the lookup table. I cannot use function approximation such as neural networks; the values need to be in a table. The range of the integers is not known at the moment. When I say integers I mean 32-bit values.
Basically the idea is to use some compression method to reduce the amount of memory without losing much precision. This needs to run in hardware, so the computation overhead cannot be very high.
In my algorithm I have to access one value of the table, do some operations with it, and then update the value. In the end what I need is one function to which I pass an index and get a value back, and another function to write a value into the table.
I found one method called tile coding, which is based on several lookup tables. Does anyone know any other method?
Thanks.
I'd look at the types of numbers you need to store and pull out the information that's common for many of them. For example, if they're tightly clustered, you can take the mean, store it, and store the offsets. The offsets will have fewer bits than the original numbers. Or, if they're more or less uniformly distributed, you can store the first number and then store the offset to the next number.
It would help to know what your key is to look up the numbers.
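A rough sketch of the clustered-values idea in Python (the numbers are made up; only the encoding scheme matters):

import statistics

values = [100_050, 100_047, 100_061, 100_039, 100_055]   # tightly clustered 32-bit values
base = round(statistics.mean(values))                     # store the base once
offsets = [v - base for v in values]                      # small offsets; fit in 8 bits here

print(base, offsets)                   # 100050 [0, -3, 11, -11, 5]
print(base + offsets[2] == values[2])  # True: each entry is recovered exactly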
I need more detail on the problem. If you cannot store the real value of the integers but only an approximation, that means you are going to throw away some of the data (detail), correct? I think you are looking for a hash, which can be an art form in itself. For example, say you have 32-bit values: one hash would be to take the 4 bytes and xor them together, resulting in a single 8-bit value, reducing your storage by a factor of 4 but also reducing the information in the original data. Typically you would go further and perhaps only use a few of those 8 bits, say the lower 4, and reduce the value further; see the sketch below.
I think the real question is whether you need the data or not. If you need it, you need to compress it or find more memory to store it. If you don't, then use a hash of some sort to reduce the number of bits until you reach the amount of memory you have available for storage.
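For illustration, the xor-fold just described, sketched in Python (the input value is arbitrary):

def xor_fold_8(n):
    # Collapse a 32-bit value to 8 bits; lossy, so distinct inputs can collide.
    return (n ^ (n >> 8) ^ (n >> 16) ^ (n >> 24)) & 0xFF

print(hex(xor_fold_8(0x4726E440)))        # 0xc5
print(hex(xor_fold_8(0x4726E440) & 0xF))  # 0x5: keep only the lower 4 bits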
Read http://www.cs.ualberta.ca/~sutton/RL-FAQ.html
"Function approximation" refers to the
use of a parameterized functional form
to represent the value function
(and/or the policy), as opposed to a
simple table."
Perhaps that applies. Also, update your question with additional facts -- don't merely answer in the comments.
Edit.
A bit array can easily store a bit for each of your millions of numbers. Let's say you have numbers in the range of 1 to 8 million. In a single megabyte of storage you can have a 1 bit for each number in your set and a 0 for each number not in your set.
If you have numbers in the range of 1 to 32 million, you'll require 4 MB of memory for a big table covering all 32M distinct numbers.
See my answer to Modern, high performance bloom filter in Python? for a Python implementation of a bit array of unlimited size.
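A minimal bit-array sketch in Python of the scheme above (the set size and the values are illustrative):

SIZE = 8_000_000                  # numbers in 0..7,999,999
bits = bytearray(SIZE // 8)       # 1,000,000 bytes, all zeros: empty set

def add(n):
    bits[n >> 3] |= 1 << (n & 7)  # set the bit for n

def contains(n):
    return bool(bits[n >> 3] & (1 << (n & 7)))

add(1_234_567)
print(contains(1_234_567))   # True
print(contains(1_234_568))   # False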
If you are merely checking for the presence of the number in question, a Bloom filter might be what you are looking for. Honestly, though, your question is fairly vague and confusing. It would help to explain what Q values are and what you do with them once you find them in the table.
If your set of integers is homogeneous, then you could try a hash table, because there is a trick you can use to cut the size of the stored integers in half in your case.
Assume the integer n, because its set is homogeneous, can serve as its own hash. Assume you have 0x10000 (65,536) buckets. Each bucket index is iBucket = n & 0xFFFF. Each item in a bucket need only store 16 bits, since the bucket index already accounts for the other 16. The other thing you have to do to keep the data small is to store the count of items in the bucket and use an array to hold the items in the bucket; a linked list would be too large and slow. When you iterate the array looking for a match, remember you only need to compare the 16 bits that are stored.
So a bucket is a pointer to the array plus a count. On a 32-bit system, this is 64 bits at most. If the number of ints were small enough we might be able to do some fancy things and use 32 bits for a bucket. 64K buckets * 8 bytes is about 524 KB, and 2 million 16-bit values = 4 MB. So this gets you a method to look up the ints and about 40% compression.
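A rough Python model of that bucket scheme (an illustration only; the value inserted is arbitrary):

buckets = [[] for _ in range(0x10000)]     # 65,536 buckets

def insert(n):
    # Low 16 bits select the bucket; only the high 16 bits are stored.
    buckets[n & 0xFFFF].append(n >> 16)

def contains(n):
    return (n >> 16) in buckets[n & 0xFFFF]

insert(0x4726E440)
print(contains(0x4726E440))   # True
print(contains(0x4726E441))   # False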
