Meaning of a halfword and a word in microprocessor - memory

I'm taking a microprocessor class and I'm a bit confused on the value of a halfword and word. In lectures, my professor keeps referring to a halfword or a word as though it's a fixed value, like Pi.
My understanding (if correct) is that the value depends on the processor: 32 bits, 64 bits, etc. So, for a 32-bit processor, a halfword would be 16 bits. Would someone clarify this?

As the commenter noted, "word" is a loaded term. Nowadays, processors have standardized on word sizes that are powers of two (16, 32, 64, 128 bits). This is why we talk in terms of bytes.
1) This was not so in ye olde days. The PDP-8 was a 12-bit processor. Sperry and the DEC TOPS systems were 36-bit. In these cases a "word" was the amount of data the processor inherently worked with: in these examples, 12 bits or 36 bits.
2) On the PDP-11 and VAX, the term "word" meant 16 bits (even though the VAX was a 32-bit processor).
[There may be other "word" meanings I have omitted.]
Thus a word could mean either the amount of data the processor works with or a specific data length. A half word is half of that amount.
You'd really have to ask to find out what your professor means when he says "word."
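If it helps to see the arithmetic, here is a small Python sketch assuming the common modern convention (word = 32 bits, halfword = 16 bits, as on 32-bit ARM); the value is just an example:

    # A 32-bit "word" split into its two 16-bit "halfwords"
    # (on the PDP-11/VAX, "word" already meant 16 bits, as noted above).
    word = 0x12345678

    low_halfword  = word & 0xFFFF          # 0x5678
    high_halfword = (word >> 16) & 0xFFFF  # 0x1234

    print(hex(high_halfword), hex(low_halfword))  # 0x1234 0x5678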

Related

Is information stored in registers/memory structured as binary?

Looking at this question on Quora HERE ("Are data stored in registers and memory in hex or binary?"), I think the top answer is saying that data persistence is achieved through physical properties of hardware and is not directly relatable to either binary or hex.
I've always thought of computers as 'binary', but have just realized that that only applies to the usage of components (magnetic up/down or an on/off transistor) and not necessarily the organisation of, for example, memory contents.
i.e. you could, theoretically, create an abstraction in memory that used 'binary components' but that wasn't binary, like this:
100000110001010001100
100001001001010010010
111101111101010100001
100101000001010010010
100100111001010101100
And then recognize that as the (badly-drawn) image of 'hello', rather than the ASCII encoding of 'hello'.
An answer on SO (What's the difference between a word and byte?) mentions that processors can handle 'words', i.e. several bytes at a time, so while information representation has to be binary I don't see why information processing has to be.
Can computers do arithmetic on hex directly? In this case, would the internal representation of information in memory/registers be in binary or hex?
Perhaps "digital computer" would be a good starting term and then from there "binary digit" ("bit"). Electronically, the terms for the values are sometimes "high" and "low". You are right, everything after that depends on the operation. Most of the time, groups of bits are operated on together. Commonly groups are 1, 8, 16, 32 and 64 bits. The meaning of the bits depends on the program but some operations go hand-in-hand with some level of meaning.
When the meaning of a group of bits is not known or important, humans like to be able to discern the value of each bit. Binary could be used, but more than 8 bits is hard to read. Although it is rare to operate on groups of 4 bits, hexadecimal is much more readable and is generally used regardless of the number of bits. Sometimes octal is used, but that's in contexts where there is some meaning to subgroupings of 3 bits or a desire to avoid digits beyond 9.
Integers can be stored in two's complement format, and CPUs often have instructions for such integers. One such operation is negation. For a group of 8 bits, it would map 1 to -1, … 127 to -127, and -1 to 1, … -127 to 127, and 0 to 0 and -128 to -128. Decimal is likely the most valuable to humans here, not base 256, base 2 or base 16. In unsigned hexadecimal, that would be 01 to FF, …, 00 to 00, 80 to 80.
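For example, a minimal Python sketch of 8-bit two's-complement negation that reproduces the mapping described above:

    def negate8(x):
        """8-bit two's-complement negation: invert the bits, add 1, keep 8 bits."""
        return (~x + 1) & 0xFF

    def signed8(x):
        """Interpret an 8-bit pattern as a signed value in [-128, 127]."""
        return x - 256 if x >= 128 else x

    for v in (0x01, 0x7F, 0xFF, 0x00, 0x80):
        n = negate8(v)
        print(f"{v:02X} -> {n:02X}  ({signed8(v)} -> {signed8(n)})")
    # 01 -> FF (1 -> -1), 7F -> 81 (127 -> -127), FF -> 01 (-1 -> 1),
    # 00 -> 00 (0 -> 0), 80 -> 80 (-128 -> -128)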
For an intro to how a CPU might do integer addition on a group of bits, see adder circuits.
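As a rough sketch of what such a circuit computes (not how any particular CPU implements it), here is a ripple-carry adder built from one-bit full adders:

    def full_adder(a, b, carry_in):
        """One-bit full adder (XOR/AND/OR gates): returns (sum_bit, carry_out)."""
        s = a ^ b ^ carry_in
        carry_out = (a & b) | (carry_in & (a ^ b))
        return s, carry_out

    def add8(x, y):
        """Add two 8-bit values by rippling the carry through 8 full adders."""
        result, carry = 0, 0
        for i in range(8):
            bit, carry = full_adder((x >> i) & 1, (y >> i) & 1, carry)
            result |= bit << i
        return result, carry  # carry out of bit 7 signals unsigned overflow

    print(add8(108, 85))  # (193, 0)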
Other number formats include IEEE-754 floating point and binary-coded decimal.
I think you understand that digital circuits are binary. So, based on the above, yes, operations do operate on a higher conceptual level despite the actual storage.

How much storage would be required to store a human genome?

I'm looking for the amount of storage in bytes (MB, GB, TB, etc.) required to store a single human genome. I read a few articles on Wikipedia about DNA, chromosomes, base pairs, and genes, and have a rough guess, but before disclosing anything I'd like to see how others would approach this issue.
An alternative question would be how many atoms are there in human DNA, but that would be off topic for this site.
I understand that this will be an approximation, so I'm looking for the minimal value that would be able to store DNA of any human.
If you trust such things, here is what Wikipedia claims (from http://en.wikipedia.org/wiki/Human_genome#Information_content):
The 2.9 billion base pairs of the haploid human genome correspond to a
maximum of about 725 megabytes of data, since every base pair can be
coded by 2 bits. Since individual genomes vary by less than 1% from
each other, they can be losslessly compressed to roughly 4 megabytes.
You do not store all the DNA in one stream; rather, most of the time it is stored by chromosome.
A large chromosome takes about 300 MB and a small one about 50 MB.
Edit:
I think the first reason why it is not saved as 2 bits per base pair is that it would be a hurdle to work with the data. Most people would not know how to convert it. And even if a conversion program were provided, a lot of people in large companies or research institutes are not allowed to install programs, would need to ask, or would not know how...
1 GB of storage costs nothing, and even downloading 3 GB takes only about 4 minutes at 100 Mbit/s, and most companies have faster connections.
Another point is that the data isn't as simple as you are told.
e.g. The sequencing method invented by Craig Venter was a great breakthrough but has its downsides. It could not separate long runs of the same base, so it is not always 100% clear whether there are 8 A's or 9 A's. Things you have to take care of later on...
Another example is DNA methylation, because you can't store that information in a 2-bit representation.
Basically, each base pair takes 2 bits (you can use 00, 01, 10, 11 for T, G, C, and A). Since there are about 2.9 billion base pairs in the human genome, (2 * 2.9 billion) bits ~= 691 MiB (about 725 million bytes).
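A quick sanity check of that arithmetic, assuming 2.9 billion base pairs; note that the same byte count is ~725 in decimal megabytes and ~691 in MiB, which is why both numbers appear in this thread:

    base_pairs = 2.9e9           # approximate haploid human genome
    total_bits = 2 * base_pairs  # 2 bits per base (A, C, G, T)
    total_bytes = total_bits / 8

    print(total_bytes / 1e6)    # ~725.0 (decimal megabytes)
    print(total_bytes / 2**20)  # ~691.4 (MiB)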
I'm no expert, however, the Human Genome page on Wikipedia states the following:
Raw MB:
Male (XY): 770MB
Female (XX): 756MB
I'm not sure where their variance comes from, but I'm sure you can figure it out.
Yes, the minimum storage space needed for whole human DNA is about 770 MB.
However, the 2-bit representation is impractical. It is hard to search through or do computations on. Therefore, some mathematicians designed more effective ways to store those sequences of bases and use them in searching and comparison algorithms. One such example is GARLI.
This application runs on my PC right now, and I have the human genome stored in 1563 MB.
The human genome contains over 3 billion base pairs. So if you represented each base pair as two bits then it would take over 6.15 × 10⁹ bits or approximately 770 MB.
Just did it too. The raw sequence is ~700 MB. If one uses a fixed storage sequence or a fixed sequence-storage algorithm, plus the fact that the differences between individuals are about 1%, I calculated ~120 MB with a per-chromosome, sequence-offset, state-delta storage. That's it for the storage.
There are 4 nucleotide bases that make up our DNA (A, C, G, T), so each base in the DNA takes up 2 bits. There are around 2.9 billion bases, so that's around 700 megabytes. The weird thing is that this would fill a normal data CD! Coincidence?!
All answers are leaving off the fact that nuDNA is not the only DNA that defines a human genome. mtDNA is also inherited and it contributes an additional 16,500 base pairs to a human genome, bringing it more in line with the Wikipedia guess of 770MB for males, and 756MB for females.
This does not mean that a human genome can easily be stored on a 4 GB USB stick. Bits do not represent information by themselves; it is the combination of bits that represents information. So in the case of nuDNA and mtDNA, the bits are encoded (not to be confused with compressed) to represent proteins and enzymes that would themselves require many MBs of raw data to represent, especially in terms of functionality.
Food for thought: 80% of the human genome is called "non-coding" DNA, so did you really believe that the entire human body and brain could be represented in a mere 151 to 154 MB of raw data?
Most answers, except those from users slayton, rauchen, and Paul Amstrong, are dead wrong if it's about pure one-to-one storage without compression techniques.
The human genome, with 3 Gb of nucleotides, corresponds to 3 GB of bytes and not ~750 MB. The constructed "haploid" genome according to NCBI is currently 3436687 kb or 3.436687 Gb in size. Check here for yourself.
Haploid = single copy of a chromosome.
Diploid = two versions of haploid.
Humans have 22 unique chromosomes x 2 = 44.
The male 23rd pair is X and Y, making 46 in total.
The female 23rd pair is X and X, thus also 46 in total.
For males it would be 23 + 1 chromosomes in data storage on a HDD, and for females 23 chromosomes, explaining the little differences mentioned now and then in answers. The X chromosome from males is equal to the X chromosome from females.
Thus loading the genome (23 + 1 chromosomes) into memory is done in parts via BLAST, using databases constructed from FASTA files. Zipped or not, nucleotide sequences are hardly compressible. Back in the early days one of the tricks used was to replace tandem repeats (GACGACGAC with a shorter coding, e.g. "3GAC"; 9 bytes down to 4). The reason was to save hard-drive space (in the era of 500 MB-2 GB HDD platters with 7,200 rpm and SCSI connectors). For sequence searching this was also done with the query.
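A toy Python sketch of that tandem-repeat trick (a simple run-length coding of a repeated motif; the real tools of that era were more sophisticated, and the function name here is made up):

    import re

    def compress_tandem_repeats(seq, motif):
        """Replace runs of `motif` (e.g. GACGACGAC) with '<count><motif>'."""
        runs = re.compile(f"(?:{motif}){{2,}}")
        return runs.sub(lambda m: f"{len(m.group(0)) // len(motif)}{motif}", seq)

    print(compress_tandem_repeats("TTGACGACGACAA", "GAC"))  # TT3GACAA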
If "coded nucleotide" storage would be 2-bit per letter then you get for a byte:
A = 00
C = 01
G = 10
T = 11
Only this way do you fully profit from all 8 bit positions of a byte of coding. For example the combination 00.01.10.11 (as the byte 00011011) would then correspond to "ACGT" (and show up in a text file as an unrecognizable character). This alone is responsible for a four-fold reduction in file size, as we see in other answers. Thus 3.4 Gb would be downsized to 0.85917175 GB... ~860 MB, including a then-required conversion program (23 kb-4 MB).
But... in biology you want to be able to read something, so gzip compression is more than enough; unzipped you can still read it. If this byte packing were used, the data becomes much harder to read. That's why FASTA files are plain-text files in reality.
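For the curious, a minimal Python sketch of that 2-bit packing, four bases per byte, using the A=00, C=01, G=10, T=11 mapping above (a real binary format would also need a header and a way to represent ambiguous bases such as N, which this ignores):

    CODE = {"A": 0b00, "C": 0b01, "G": 0b10, "T": 0b11}

    def pack(seq):
        """Pack a DNA string into bytes, four bases per byte, 2 bits per base."""
        out = bytearray()
        for i in range(0, len(seq), 4):
            byte = 0
            for base in seq[i:i + 4]:
                byte = (byte << 2) | CODE[base]
            out.append(byte)  # a final group shorter than 4 bases fills only the low bits
        return bytes(out)

    print(pack("ACGT").hex())        # '1b', i.e. the byte 00011011 discussed above
    print(len(pack("ACGT" * 1000)))  # 1000 bytes for 4000 bases: a four-fold reduction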
There are only 2 types of base pairs: cytosine can only bind to guanine, and adenine can only bind to thymine.
So each base pair can be considered a single bit.
This means that an entire strand of human DNA, ~3 billion "bits", would be right around ~350 megabytes.
One base -- T, C, A, G (in the base-4 number system: 0, 1, 2, 3) -- is encoded as two bits (not one), so one base pair is encoded by four bits.

Why is the smallest value that can be stored a Byte (8 bits) & not a Bit (1 bit)?

Why is the smallest value that can be stored a Byte (8 bits) & not a Bit (1 bit) in memory?
Even booleans are stored as Bytes. Will we ever bump the smallest unit up to 32 or 64 bits, like registers on the CPU?
EDIT: To clarify, since many answers seemed confused about the nature of the question: this question is about why a byte isn't 7-bit, 1-bit, 32-bit, etc. (not why lower-bit primitives must fit within the hardware's byte at minimum). Is the 8-bit byte simply historical, given that some hardware has 10-bit bytes, for example? Or is there a mathematical reason 8 bits is ideal versus, say, 10 bits for general processing?
The hardware is built to read data in blocks (bytes, later words and dwords). This provides greater efficiency than accessing individual bits, and also offers a larger addressing range. So most data is aligned to at least a byte boundary. There exist encodings that operate on bit sequences rather than bytes, but they are quite rare.
Nowadays data is most often aligned to a dword (32-bit) boundary anyway. Moreover, some hardware (ARM, for example) can't access misaligned multibyte variables, i.e. a 16-bit word can't "cross" a dword boundary - an exception will be thrown.
Computers address memory at the byte level, so anything smaller than a byte is not addressable.
The underlying methods of processor access are limited to the size of the smallest usable register. On most architectures, that size is 8 bits. You can use smaller portions of these; for instance, C has the bitfield feature in structs that will allow combining fields that only need to be certain bit lengths. Access will still require that the whole byte be read.
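A small Python sketch of the read-modify-write this implies when you only care about a single flag (the bit position and flag name are made up for illustration):

    # Eight independent boolean flags packed into one addressable byte.
    READY_BIT = 3  # hypothetical flag position within the byte

    flags = 0b0000_0000
    flags |= 1 << READY_BIT                    # set one bit: rewrite the whole byte
    is_ready = bool(flags & (1 << READY_BIT))  # test the bit
    flags &= ~(1 << READY_BIT) & 0xFF          # clear the bit, again touching the whole byte

    print(is_ready, bin(flags))  # True 0b0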
Some older exotic architectures actually did have a different "word size." In these machines, 10 bits might be the common size.
Lastly, processors are almost always backwards compatible. Intel, for instance, has maintained complete instruction compatibility from the 386 on up. If you take a program compiled for the 386, it will still run on an i7 processor. Changing the word size would break compatibility. So while it is possible, no manufacturer will ever do it.
Assume that we have a native language that consists of 2 characters, such as a and b.
To distinguish the two characters we need at least 1 bit, for example 0 to represent a and 1 to represent b.
So if we count letters, special characters and symbols, there are 128 characters, and to distinguish one character from another you need log2(128) = 7 bits, with the 8th bit used for transmission.
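A quick sketch of that "bits needed to distinguish N symbols" calculation:

    import math

    def bits_needed(num_symbols):
        """Minimum number of bits that gives every symbol a distinct code."""
        return math.ceil(math.log2(num_symbols))

    print(bits_needed(2))    # 1  (just 'a' and 'b')
    print(bits_needed(128))  # 7  (7-bit ASCII; the 8th bit was often used for parity)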

Why does each location in memory contain 8 bits? [closed]

Somebody has confirmed there are 8 bits in every location/address in memory.
Can I know why? Is it related to the memory chip architecture, or is it because of a 32-bit CPU? Is this 8 bits true for other OSes such as FreeBSD, Mac, Linux?
Is there any relation between the number of bits in every location and the number of address lines in memory?
Is there any other architecture that has a different number of bits per address?
As is often the case, Wikipedia has an answer:
Architectures that did not have eight-bit bytes include the CDC 6000 series scientific mainframes that divided their 60-bit floating-point words into 10 six-bit bytes. These bytes conveniently held character data from punched Hollerith cards, typically the upper-case alphabet and decimal digits. CDC also often referred to 12-bit quantities as bytes, each holding two 6-bit display code characters, due to the 12-bit I/O architecture of the machine. The PDP-10 used assembly instructions LDB and DPB to load and deposit bytes of any width from 1 to 36-bits—these operations survive today in Common Lisp. Bytes of six, seven, or nine bits were used on some computers, for example within the 36-bit word of the PDP-10. The UNIVAC 1100/2200 series computers (now Unisys) addressed in both 6-bit (Fieldata) and nine-bit (ASCII) modes within its 36-bit word. Telex machines used 5 bits to encode a character.
Factors behind the ubiquity of the eight bit byte include the popularity of the IBM System/360 architecture, introduced in the 1960s, and the 8-bit microprocessors, introduced in the 1970s. The term octet unambiguously specifies an eight-bit byte (such as in protocol definitions, for example).
It's a hardware architecture, not OS, thing. All architectures that you're going to run into, day to day, use eight bits per byte. There may be modern exceptions, particularly in the extremes (mainframe, super, embedded), but I'm not aware of any.
A memory address is the location of a specific byte in memory.
A byte has 8 bits.
In a way it's totally arbitrary - but 8 bits is convenient.
It holds 256 values, so it's large enough to store all the upper- and lower-case letters, numbers and symbols, plus it's divisible by 2 and 4, which is handy for a few things.
Some DSP architectures use 16-bit addressable units. On many PIC microcontrollers, code memory is accessed in multiples of the instruction size (12 or 14 bits). I think it's useful to have a data size which is a power of two, and which can efficiently store character data. Using 16 bits to hold characters would historically have been considered wasteful (IMHO, in many cases, it still is), and 4 bits is too small a chunk size to be useful.

Memory Units, calculating sizes, help!

I am preparing for a quiz in my computer science class, but I am not sure how to find the correct answers. The questions come in 4 varieties, such as--
Assume the following system:
Auxiliary memory containing 4 gigabytes,
Memory block equivalent to 4 kilobytes,
Word size equivalent to 4 bytes.
1) How many words are in a block, expressed as 2^_? (write the exponent)
2) What is the number of bits needed to represent the address of a word in the auxiliary memory of this system?
3) What is the number of bits needed to represent the address of a byte in a block of this system?
4) If a file contains 32 megabytes, how many blocks are contained in the file, expressed as 2^_?
Any ideas how to find the solutions? The teacher hasn't given us any examples with solutions, so I haven't been able to figure out how to do this by working backwards or anything, and I haven't found any good resources online.
Any thoughts?
Questions like these basically boil down to working with exponents and knowing how the different pieces fit together. For example, from your sample questions, we would do:
How many words are in a block, expressed as 2^_? (write the exponent)
From your description we know that a word is 4 bytes (2^2 bytes) and that a block is 4 kilobytes (2^12 bytes). To find the number of words in one block we simply divide the size of a block by the size of a word (2^12 / 2^2) which tells us that there are 2^10 words per block.
What is the number of bits needed to represent the address of a word in the auxiliary memory of this system?
This type of question is essentially an extension of the previous one. First you need to find the number of words contained in the memory, and from that you can get the number of bits required to address a word in the memory. We are told that memory contains 4 gigabytes (2^32 bytes) and that a word is 4 bytes (2^2 bytes); therefore the number of words in memory is 2^32 / 2^2 = 2^30 words. From this we can deduce that 30 bits are required to represent a word in memory, because n bits can distinguish 2^n locations and we need 2^30 of them.
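The same arithmetic in a few lines of Python, covering just the two parts worked above (all sizes assumed to be exact powers of two):

    import math

    block_size  = 4 * 1024     # 4 KB block, in bytes
    word_size   = 4            # 4-byte word
    memory_size = 4 * 1024**3  # 4 GB auxiliary memory, in bytes

    words_per_block = block_size // word_size
    print(int(math.log2(words_per_block)))  # 10 -> 2^10 words per block

    words_in_memory = memory_size // word_size
    print(int(math.log2(words_in_memory)))  # 30 -> 30 bits to address a word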
Since this is tagged as homework I will leave the remaining questions as exercises :)
Work backwards. This is actually pretty simple mathematics. (Ignore the word "auxiliary".)
1) How much is a kilobyte? How much is 4 kilobytes? Try putting some numbers into 2^x, say x == 4. How much is 2^4 words? 2^8?
2) If you have 4 GB of memory, what is the highest address? How large a number can you express with 8 bits? 16 bits? Hint: 4 GB is an even power of 2. Which?
3) This is really the same question as 2, but with different input parameters.
4) How many kilobytes is a megabyte? Express 32 megabytes in kilobytes. Division will be useful.

Resources