# Fixed Point Format and Floating Point Format Examples

Posted by Sreejith Hrishikesan

• The floating-point representation can cover a much larger range of numbers than is possible in the fixed-point representation.

• In the floating-point representation, resolution becomes coarser as the magnitude of the numbers grows; that is, the distance between two successive floating-point numbers increases.

• In the floating-point scheme, resolution is variable within the range. However, for the fixed-point format, resolution is fixed and uniform.

• This variability in resolution provides a large dynamic range of the numbers.
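
The variable resolution described in the bullets above can be observed on any machine. The following is a minimal sketch that uses Python's built-in 64-bit floats (not the 16-bit format of the example that follows) and `math.ulp`, which needs Python 3.9 or later; the helper name `spacing` is illustrative.

```python
import math

def spacing(x: float) -> float:
    """Distance from x to the next representable double of larger magnitude."""
    return math.ulp(x)

# The gap between neighbouring floats grows with the magnitude of x.
for x in (1.0, 1024.0, 1.0e6, 1.0e12):
    print(f"x = {x:>16.1f}   gap to next float = {spacing(x):.3e}")
```

A fixed-point format would show the same gap on every line; here the gap grows in step with the magnitude of the number.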

The ideas given above can be illustrated by considering the case of a 16-bit computer.

Example 35: Consider a 16-bit computer. Obtain the dynamic range and resolution when the computer is operated (a) in the fixed-point format and (b) in the floating-point format.

Solution:

(a) For the 16-bit computer, with one bit reserved for representing the sign, the highest positive and negative numbers that can be represented in the fixed-point format are $+(2^{m-1} - 1)$ and $-(2^{m-1} - 1)$, respectively, where $m = 16$. Substituting $m = 16$ yields:

The highest positive number = $2^{16-1} - 1 = 2^{15} - 1 = 32767$
The highest negative number = $-(2^{16-1} - 1) = -(2^{15} - 1) = -32767$

This means that we can represent all the whole numbers from -32,767 to +32,767. Since the numbers represented are of the form -32,767, -32,766, -32,765, …, 32,766, and 32,767 (i.e., successive numbers differ by 1), we find that in this scheme:

Resolution = 1

In this scheme, we also find that we can represent only whole numbers; we cannot represent fractions.
Now, suppose we want to express fractions also through this scheme. For this, let us reserve 5 bits to represent the fractional part, 10 bits to represent the integer part, and 1 bit to represent the sign. In the fixed-point representation, the largest magnitude that can be represented can now be written as

$X = \pm (2^{15} - 1) \times 2^{-5} = \pm 1023.96875$

Thus the range in this case will be between -1023.96875 and +1023.96875. We have:

Resolution = $2^{-5}$ = 0.03125 (0.00001 in binary)

We find that in this case, the range (sometimes called the dynamic range) has been considerably decreased, but resolution has been greatly increased.
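
The trade-off between range and resolution in part (a) can be checked numerically. The sketch below assumes a sign-magnitude word with one sign bit, as in the example; the function name `fixed_point_limits` and its arguments are illustrative and not part of the original text.

```python
def fixed_point_limits(int_bits: int, frac_bits: int):
    """Return (largest magnitude, resolution) of a sign-magnitude fixed-point word.

    One additional bit is assumed for the sign, so the whole word has
    1 + int_bits + frac_bits bits.
    """
    resolution = 2.0 ** -frac_bits                       # weight of the last bit
    largest = (2 ** (int_bits + frac_bits) - 1) * resolution  # all magnitude bits set
    return largest, resolution

# 16-bit word: whole numbers only, then 10 integer bits plus 5 fractional bits.
for int_bits, frac_bits in ((15, 0), (10, 5)):
    largest, resolution = fixed_point_limits(int_bits, frac_bits)
    print(f"{int_bits} integer bits, {frac_bits} fractional bits: "
          f"range = ±{largest}, resolution = {resolution}")
```

Moving bits from the integer part to the fractional part shrinks the range and refines the resolution by the same factor, since the total number of magnitude bits stays at 15.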

(b) Floating-Point Format

In the floating-point format, we reserve 5 bits to represent the exponent, 1 bit to represent its sign, 9 bits to represent the mantissa part, and 1 bit to represent its sign. Table 1.38 shows the floating-point representation of the given number.

Table 1.38  Floating-point representation in a 16-bit computer
| SM | M (9 bits) | SE | E (5 bits) |
|----|------------|----|------------|
| 0 | 0.1 0 0 0 0 0 0 0 0 | 1 | 1 1 1 1 1 |
| 0 | 0.1 0 0 0 0 0 0 0 0 | 0 | 1 1 1 1 1 |

In Table 1.38, SM represents the sign bit of the mantissa, M represents the mantissa, SE represents the sign bit of the exponent, and E represents the exponent. Row 2 of the table (the first row of bit patterns) shows the smallest number that can be represented in this format. This is obtained as follows:

For the smallest possible number in this scheme, the mantissa is 0.5, which is the minimum possible value of the mantissa in the floating-point scheme, as stated earlier. This is represented under the column M as .1 followed by eight 0s (i.e., .100000000). The sign of the mantissa is taken as positive and is represented by a 0 under the column SM. Since the exponent has 5 bits to represent it, the largest exponent magnitude that can be stored is $2^5 - 1 = 31$ (binary 11111). Using this as a negative exponent, we get the smallest number in our present case as

$X_{\min} = 0.5 \times 2^{-31} \approx 2.33 \times 10^{-10}$

We find that the large dynamic range of the floating-point format is achieved by sacrificing uniformity in resolution. Notice that in the floating-point format, small numbers have finer resolution than larger numbers, whose resolution is coarse.
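
The 16-bit layout of Table 1.38 can be decoded with a few lines of code. This is a minimal sketch under stated assumptions: the mantissa is a pure binary fraction, a sign bit of 1 means negative for both the mantissa and the exponent, and the function names `decode` and `spacing` are illustrative.

```python
def decode(sm: int, mantissa_bits: str, se: int, exponent_bits: str) -> float:
    """Value of a word laid out as SM | M (9 bits) | SE | E (5 bits).

    Assumptions: mantissa = 0.b1b2...b9 in binary; sign bit 1 means negative.
    """
    mantissa = int(mantissa_bits, 2) / 2 ** len(mantissa_bits)
    exponent = int(exponent_bits, 2)
    if se:
        exponent = -exponent
    value = mantissa * 2.0 ** exponent
    return -value if sm else value

def spacing(se: int, exponent_bits: str) -> float:
    """Gap between successive numbers at one exponent (one mantissa LSB)."""
    return decode(0, "000000001", se, exponent_bits)

smallest = decode(0, "100000000", 1, "11111")  # mantissa 0.5, exponent -31
largest = decode(0, "111111111", 0, "11111")   # mantissa 1 - 2**-9, exponent +31
print("smallest normalized value:", smallest)
print("largest value:            ", largest)
print("spacing at exponent -31:  ", spacing(1, "11111"))
print("spacing at exponent +31:  ", spacing(0, "11111"))
```

The two spacing values make the non-uniform resolution concrete: the step between neighbouring numbers scales with the exponent, so it is tiny near the smallest value and very coarse near the largest.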

The IEEE 754 Standard (Floating-point format) for 32-bit Machines

The IEEE 754 standard format for floating-point arithmetic in 32-bit computers is shown in Table 1.39.

Table 1.39 IEEE 754 standard for 32-bit machines

| Sign (S) | Exponent (E) | Mantissa (M) |
|----------|--------------|--------------|
| bit 0 | bits 1 to 8 | bits 9 to 31 |
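
The three fields of Table 1.39 can be extracted from a real 32-bit float with Python's standard `struct` module. This is a sketch, not a full decoder: the function name `float32_fields` is illustrative, and the table's bit 0 (the sign) is the most significant bit of the word. In IEEE 754 single precision the 8-bit exponent field is stored with a bias of 127.

```python
import struct

def float32_fields(x: float):
    """Split x, rounded to IEEE 754 single precision, into (S, E, M) fields."""
    # Pack as a big-endian 32-bit float, then reread the same bytes as an integer.
    (word,) = struct.unpack(">I", struct.pack(">f", x))
    sign = word >> 31                 # table bit 0
    exponent = (word >> 23) & 0xFF    # table bits 1 to 8 (biased by 127)
    mantissa = word & 0x7FFFFF        # table bits 9 to 31
    return sign, exponent, mantissa

for x in (1.0, 0.5, -0.15625):
    s, e, m = float32_fields(x)
    print(f"{x!r:>10}: S={s} E={e:08b} (unbiased {e - 127:+d}) M={m:023b}")
```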

In this scheme, we have 23 bits reserved for representing the mantissa, one bit for the sign of the mantissa (S), and 8 bits for the exponent. The exponent field has no separate sign bit: it stores the actual exponent plus a bias of 127. The maximum number in this scheme is