Are UUIDs, at the most basic level, just a string of unique characters? - ios

I am currently learning about UUID in iOS, and of course I'm trying to make sense of them. From what I can gather, when you call NSUUID(), it returns a 128 bit string that is completely unique (though I'm not currently interested in how it can ensure a completely unique string, I figure it takes into account the date, time, and device identity). To make use of this string, you can append it to the end of the Document Directory (which I believe is unique to each application) to ensure a unique file path that can be used to access files later. Is this a correct understanding of the concept?

Globally Unique Identifiers are 128-bit binary strings.
Microsoft COM uses them to prevent "name collisions" between components without needing some "central naming authority" (like we have for DNS names, IP addresses, broadcast frequencies, etc etc).
GUIDs are likely to be unique ... but it's not guaranteed.
Here is a good article explaining more:
http://betterexplained.com/articles/the-quick-guide-to-guids/
And yes, your understanding of iOS NSUUIDs is exactly right:
http://nshipster.com/nstemporarydirectory/
http://nshipster.com/uuid-udid-unique-identifier/
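The file-path idea from the question can be sketched as follows. This is only an illustration in Python, not iOS code: `uuid.uuid4()` stands in for `NSUUID()`, and `documents_dir` and `unique_file_path` are hypothetical names standing in for the application's Documents directory and a helper of your own.

```python
import tempfile
import uuid
from pathlib import Path


def unique_file_path(directory: Path, extension: str = "dat") -> Path:
    """Return a path inside `directory` whose file name is a fresh random UUID."""
    # uuid4() yields a version 4 (random) UUID: a 128-bit value that is
    # usually written as 36 characters of hexadecimal digits and hyphens.
    return directory / f"{uuid.uuid4()}.{extension}"


# Hypothetical stand-in for the application's Documents directory.
documents_dir = Path(tempfile.mkdtemp())
path = unique_file_path(documents_dir)
path.write_bytes(b"example payload")
```

Note that the UUID itself is a 128-bit number; the familiar string is just its textual representation, which is what ends up in the file name.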

It depends on the version of the universally unique identifier (UUID). Version 4 is almost guaranteed to be unique, but not completely. Wikipedia states the following:
"Out of a total of 128 bits, two bits indicate an RFC 4122 ("Leach-Salz") UUID and four bits the version (0100 indicating "randomly generated"), so randomly generated UUIDs have 122 random bits. The chance of two such UUIDs having the same value can be calculated using probability theory (birthday paradox). Using the approximation"
Reference: https://en.wikipedia.org/wiki/Universally_unique_identifier#Version_4_.28random.29

Related

NSUUID duplication chance from different devices.

I need to generate a unique ID for the device when the application is installed, store this value on the device, and then use this UUID to communicate with the server. NSUUID seems to suit the situation, but I am unsure whether there is any chance of the same UUID being generated on multiple devices. I already found the answer https://stackoverflow.com/a/6963990/1573209, which describes that version 1 uses the MAC address and a 60-bit clock to generate the UUID, so the chance of duplication is negligible, whereas version 4 uses some fixed bits and some random bits. The documentation says that UUIDs created by NSUUID conform to RFC 4122 version 4 and are created with random bytes.
Does that mean the chance of duplication is higher?
If so, how can I use a version 1 UUID generator? I can't see any documentation for it.
You can have a look at RFC 4122. UUIDs conforming to RFC 4122 are practically unique in a given space and time. You can also see Random UUID probability of duplicates.
Out of a total of 128 bits, two bits indicate an RFC 4122 ("Leach-Salz") UUID and four bits the version (0100 indicating "randomly generated"), so randomly generated UUIDs have 122 random bits. The chance of two such UUIDs having the same value can be calculated using probability theory (birthday problem). The probability of an accidental clash after generating n UUIDs, with x = 122 random bits, is found to be very close to zero.
For n = 2^36, which is 68,719,476,736, the probability of a collision is about 0.0000000000000004. For smaller values of n the probability is even lower, and it increases as more UUIDs are generated. In the above estimate, n represents the number of UUIDs generated.
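The figure quoted above can be reproduced with the usual birthday-problem approximation, p ≈ 1 - exp(-n² / (2 · 2^x)). The following is a minimal sketch; the function name `collision_probability` is made up for illustration.

```python
import math


def collision_probability(n: int, random_bits: int = 122) -> float:
    """Approximate chance that n random IDs of `random_bits` bits contain a duplicate."""
    # Birthday approximation: p ~= 1 - exp(-n^2 / (2 * 2^bits)).
    exponent = -(n * n) / (2 * 2 ** random_bits)
    # expm1 keeps precision when the probability is extremely small.
    return -math.expm1(exponent)


p = collision_probability(2 ** 36)
print(f"{p:.1e}")  # about 4.4e-16, matching the value quoted above
```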

How do I register for an Electronic Data Interchange (EDI) ISA number?

I will receive an 850 purchase order. In return, I need to generate and send a 997 response, which includes the ISA/GS number. Where, and with whom, do I register for this ISA ID?
Thanks in advance
EDI systems are typically limited in scope to be between a few or even just 2 different organizations. These organizations need to decide beforehand on how much of the full EDI specification they're going to use, and how they're going to specify IDs. See here.
Also, see here. From this it looks like DUNS numbers or variants on them are common choices for IDs.
So your organization and the others need to just figure out if you're going to use DUNS number or ad-hoc made up numbers or what.
Your 850 will have an ISA (interchange) and GS (group) identifier where you will be designated as the receiver. When you generate the 997, the IDs will be reversed so that you are the sender of the acknowledgement.
Back in the day, it was important to uniquely identify yourself. X12 handles this via a qualifier/ID pair. Let's say you want to use your phone number: your qualifier would be 12 and your ID would be 5555551212 (the phone number). You could also make up something arbitrary, like ZZ (qualifier: mutually defined) with ACMEWIDGETSCO. Again, it should be something unique and not already found on a VAN. A clash is probably less likely these days than it was 10 years ago, when everyone was predominantly using VANs.
Look at the below example. The IDs in this example are made up, but could be DUNS, HIN, Industry identifier, phone number, mutually defined, etc. Just for frame of reference, I used SENDER and RECEIVER.
ISA*00*          *00*          *ZZ*SENDER         *ZZ*RECEIVER       *150622*2131*U*00401*000000006*0*T*>~
GS*PO*SENDER*RECEIVER*20150622*2131*4*X*004010~
In other words, you don't need to register it with anyone; you just need to make sure it is unique on the networks you are trading on. That's really the important part. If you're using direct connections (AS2, FTP) to your partners, it won't matter as much, but the best practice is to give your company an ID that is somewhat unique (DUNS, phone number, arbitrary name). If you don't understand EDI, download EDI Notepad from Liaison, which should give you a better picture of how the data is described.
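The sender/receiver swap for the 997 envelope can be sketched as below. This is a rough illustration, not a full X12 implementation: the helper names are hypothetical, the element widths assume the fixed-width ISA layout, and the date and time elements are simply carried over rather than refreshed.

```python
# Element widths for ISA01..ISA16 (assumed fixed-width X12 ISA layout).
ISA_WIDTHS = [2, 10, 2, 10, 2, 15, 2, 15, 6, 4, 1, 5, 9, 1, 1, 1]


def build_isa(elements, sep="*", terminator="~"):
    """Pad each element to its fixed width and join into an ISA segment."""
    if len(elements) != len(ISA_WIDTHS):
        raise ValueError("an ISA segment needs exactly 16 elements")
    padded = [value.ljust(width) for value, width in zip(elements, ISA_WIDTHS)]
    return sep.join(["ISA"] + padded) + terminator


def reply_envelope(isa_segment, control_number, sep="*", terminator="~"):
    """Build the ISA for an acknowledgement by swapping sender and receiver."""
    elements = [e.strip() for e in isa_segment.rstrip(terminator).split(sep)[1:]]
    # ISA05/ISA06 are the sender qualifier and ID, ISA07/ISA08 the receiver's.
    elements[4:6], elements[6:8] = elements[6:8], elements[4:6]
    # ISA13 is our own interchange control number for the outgoing reply.
    elements[12] = f"{control_number:09d}"
    return build_isa(elements, sep, terminator)


incoming = build_isa(["00", "", "00", "", "ZZ", "SENDER", "ZZ", "RECEIVER",
                      "150622", "2131", "U", "00401", "000000006", "0", "T", ">"])
outgoing = reply_envelope(incoming, control_number=1)
```

In `outgoing`, RECEIVER appears in the sender position and SENDER in the receiver position, using whatever qualifier/ID pairs the trading partners agreed on.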

How do systems typically map a 997 or 999 acknowledgement back to the originating ISA?

The implementation guides (and most web resources I can find) describe the GS06 and ST02 Control Numbers as being unique only within the Interchange they are contained in. So when we build our GS and ST segments we just start the control numbers at 1 and increment as we add more Functional Groups and/or Transaction Sets. The ISA13 control numbers we generate are always unique.
The dilemma is when we receive a 999 acknowledgment; it does not include any reference to the ISA control number that it's responding to. So we have no way to find the correct originating Functional Group in our records.
This seems like a problem that anyone receiving functional acknowledgements would face, but clearly lots of systems and companies handle it, so what is the typical practice to reconcile 997s or 999s? I think we must be missing something in our reading of the guides.
GS06 and ST02 only have to be unique within the interchange, but if you use an ID that's truly unique for each one (not just within the message), then you can skip right to the proper transaction set or functional group, not just the right message.
I typically have GS start at 1 and increment the same way that you do, but the ST02 I keep unique (to the extent allowed by the 9 character limit).
GS06 is supposed to be globally unique, not only within the interchange. This is from X12-6:
"In order to provide sufficient discrimination for the acknowledgment process to operate reliably and to ensure that audit trails are unambiguous, the combination of Functional ID Code (GS01), Application Sender's ID (GS02), Application Receiver's ID (GS03), and Functional Group Control Numbers (GS06, GE02) shall by themselves be unique within a reasonably extended time frame whose boundaries shall be defined by trading partner agreement. Because at some point it may be necessary to reuse a sequence of control numbers, the Functional Group Date and Time may serve as an additional discriminant only to differentiate functional group identity over the longest possible time frame."
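One way to act on this, sketched under the assumption that GS06 is kept unique per trading partner and functional ID, is to record the GS01/GS02/GS03/GS06 combination of every outbound group next to its ISA13. The acknowledgement's AK1 segment echoes the original functional ID code (AK101) and group control number (AK102), which is enough for a lookup. The class and method names here are hypothetical.

```python
from typing import Dict, Optional, Tuple

# (GS01 functional ID, GS02 sender, GS03 receiver, GS06 group control number)
GroupKey = Tuple[str, str, str, str]


class SentGroupIndex:
    """Remembers which interchange (ISA13) each outbound functional group went out in."""

    def __init__(self) -> None:
        self._by_group: Dict[GroupKey, str] = {}

    def record(self, gs01: str, gs02: str, gs03: str, gs06: str, isa13: str) -> None:
        self._by_group[(gs01, gs02, gs03, gs06)] = isa13

    def match_ack(self, ack_gs02: str, ack_gs03: str,
                  ak101: str, ak102: str) -> Optional[str]:
        # The acknowledgement travels in the opposite direction, so its GS
        # sender is our original receiver and vice versa. AK101/AK102 echo
        # the GS01/GS06 of the group being acknowledged.
        return self._by_group.get((ak101, ack_gs03, ack_gs02, ak102))


index = SentGroupIndex()
index.record("PO", "SENDER", "RECEIVER", "4", "000000006")
isa13 = index.match_ack("RECEIVER", "SENDER", "PO", "4")
```

If GS06 restarts at 1 in every interchange, this lookup becomes ambiguous, which is exactly the situation the passage above is meant to rule out.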

DHT Node ID Generation?

I have just started studying DHT implementation and theory, and I am stuck on one part: how is the node ID generated when a node starts up and connects to the network? I read that the ID is a random hash from some range of hashes, but is it a unique hash? And is the hash generated so that it is close to the data that this node stores? Please help me with this.
Self-generation of the node ID using a good hash function over a large space of values is a common technique used in DHT/P2P systems. Since the hash guarantees good random distribution, the probability of a collision is very small. Statistically, the ID will (almost always) be unique.
That hash is independent of the data stored on the node.
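import hashlib
import random

def newID():
    # Build 20 random bytes, then hash them with SHA-1 to get a 160-bit node ID.
    s = bytes(random.randint(0, 255) for _ in range(20))
    m = hashlib.sha1()
    m.update(s)  # update() requires bytes in Python 3
    return m.digest()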
As said in the previous answers, the ID of a node is generated by hashing its IP address (generally speaking, such is the case in a DHT like Chord) or other uniquely identifiable information.
And since it uses Consistent Hashing, when a node joins or leaves a network of n nodes, only about 1/n of the keys need to be remapped; thus it lends itself to highly dynamic network topologies, such as peer-to-peer.
Technically, the hash generated doesn't convey any information about the data that is stored on this node. Rather, the hash for a certain key (or entry in a data store, if used for such purpose) originates from hashing the keyword (or the filename or the file contents).
As a direct consequence of Consistent Hashing, the abstract concept of distance between keys emerges. (As stated here) A node owns all the keys to which its identifying key (ID) is the closest according to the distance metric.
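The ownership rule can be sketched like this. It is a simplified illustration: node IDs come from hashing made-up addresses, keys are hashed into the same 160-bit space, and the distance metric is assumed to be XOR (as used by Kademlia) rather than Chord's clockwise ring distance.

```python
import hashlib


def node_id(address: str) -> int:
    """160-bit ID derived by hashing a node's address."""
    return int.from_bytes(hashlib.sha1(address.encode("utf-8")).digest(), "big")


def owner(key: str, node_ids):
    """Return the node ID closest to the key's hash under XOR distance."""
    key_hash = node_id(key)  # keys are hashed into the same 160-bit space
    return min(node_ids, key=lambda nid: nid ^ key_hash)


nodes = [node_id(f"10.0.0.{i}") for i in range(1, 6)]  # made-up addresses
holder = owner("example.txt", nodes)
```

Removing any node other than `holder` leaves the owner of "example.txt" unchanged, which is the property that keeps remapping small when nodes join or leave.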

"Smart" / Economical Data Storage Techniques?

I would like to store millions of data lines that looks like this:
key, value
key is an integer in the range of (0 to 5,000,000); all values are unique;
value is an unsigned int16 value (0 to 65535)
The goal is to store the data while taking the LEAST AMOUNT OF DISK SPACE, and yet still be able to query the data. Can you think of any algorithms or smart schemes for data storage that would be helpful?
Just in case it matters, I use Linux.
One option, if the keys are not important data in themselves but just an index, is to use a flat binary file (with a descriptive header). Every 16 bits is a value, and the nth value sits (n - 1) * 16 bits past the end of the header.
Additionally, if the key does matter, a fixed-size flat file of about 10 MB would allow the entire key space to be stored without storing the actual keys. The 16 bits at offset (n - 1) * 16 would be that key's value.
That would probably be the least space-intensive method of storage, as it would hold only the data that is literally required. (Though if you are only interested in, say, 100k values and one of them has a key of 5 million, you end up with a lot of wasted space, which wouldn't be there with an actual key/value addressing system. So this method only achieves minimum disk storage for sets of tightly grouped keys or for very many entries, over about the 2 million mark.)
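A minimal sketch of this flat-file layout follows. The 4-byte header `KV16`, the little-endian byte order, and the function names are all assumptions made for illustration; keys are treated as 0-based, so the offset is `key * 2` bytes past the header.

```python
import os
import struct
import tempfile

HEADER = b"KV16"             # hypothetical 4-byte descriptive header
VALUE = struct.Struct("<H")  # one little-endian unsigned 16-bit value


def create_store(path, key_count):
    """Create a file holding `key_count` zeroed 16-bit slots after the header."""
    with open(path, "wb") as f:
        f.write(HEADER)
        f.write(bytes(VALUE.size * key_count))


def put(path, key, value):
    """Overwrite the slot that belongs to `key`."""
    with open(path, "r+b") as f:
        f.seek(len(HEADER) + key * VALUE.size)
        f.write(VALUE.pack(value))


def get(path, key):
    """Read the 16-bit value stored in the slot for `key`."""
    with open(path, "rb") as f:
        f.seek(len(HEADER) + key * VALUE.size)
        return VALUE.unpack(f.read(VALUE.size))[0]


store_path = os.path.join(tempfile.mkdtemp(), "values.bin")
create_store(store_path, 5_000_001)  # keys 0..5,000,000, about 10 MB
put(store_path, 4_999_999, 65535)
```

One limitation of this sketch is that an unused key cannot be told apart from a key whose value is 0; a separate bitmap or a reserved value would be needed for that.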
How do you plan to use the stored data: with random or sequential access? For sequential access you can use any archiving algorithm, e.g. LZMA. Random access doesn't leave you a lot of room for improvement.
Can you see any patterns in this data? For example, if the differences between adjacent keys/values are often small, you can store only the packed differences. There are a million other possible approaches.
[EDIT] You can also check the techniques used for data compression in network communication.
[EDIT1] And you can check this Google Code Integer Array Compression project.
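The "packed differences" idea can be sketched as delta encoding followed by a variable-length byte encoding. This is one possible scheme, not necessarily what the linked project does; it assumes the keys are non-negative and sorted in ascending order, and the function names are made up.

```python
def encode_deltas(sorted_keys):
    """Pack ascending integers as variable-length deltas (7 bits per byte)."""
    out = bytearray()
    previous = 0
    for key in sorted_keys:
        delta = key - previous
        previous = key
        # Emit 7 bits at a time; the high bit marks "more bytes follow".
        while delta >= 0x80:
            out.append((delta & 0x7F) | 0x80)
            delta >>= 7
        out.append(delta)
    return bytes(out)


def decode_deltas(data):
    """Reverse encode_deltas and return the original list of integers."""
    keys, current, shift, delta = [], 0, 0, 0
    for byte in data:
        delta |= (byte & 0x7F) << shift
        if byte & 0x80:
            shift += 7
        else:
            current += delta
            keys.append(current)
            shift, delta = 0, 0
    return keys


keys = [3, 10, 11, 500, 5_000_000]
packed = encode_deltas(keys)
```

Small gaps cost a single byte each, so densely populated key ranges shrink considerably, at the price of having to decode sequentially instead of seeking directly to a key.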
This depends upon the operations and the data. I would first recommend "just using a database" (a simple key-value store such as BDB/EhCache [read: Key Value store], for instance :-)
Mimisbrunnr also has a good answer if all the keys are used.
If the keys are near constant/read-only and only a relatively small percent of the keys are used, consider the use of a (disk-based) Heap data-structure (very similar to an Array-based Heap; Heaps need not be Array-based). Robert Sedgewick had a good book from the late 80's that had a very lean implementation, but I forget the name. A Heap will be more beneficial when compared to a flat index with a smaller proportion of used keys and at full-load will have worse storage requirements.
(If abstracted, the used method could be switched and/or a hybrid heap with indexed/sequenced leaf-node values could be used [along with Huffman encoding or whatnot], but that is just adding far more complications. Keep it simple ... hence first suggestion of an existing key/value store ;-)
Happy coding.
Have you considered using a database designed for mobile devices such as SQL Server Compact, or another similar database? These will have a small footprint on the disk, while still providing the full search power you need.
Another example of a compact database engine is KeyDB for linux:
http://3d2f.com/programs/11-989-keydb-download.shtml
