# Floating Point Representation Examples

Posted by Sreejith Hrishikesan

Consider the decimal number 468. We can express this in the form

$$468 = \frac{468}{1000} \times 1000 = 0.468 \times 10^{3}$$

Here, we first divided and then multiplied 468 by 1000 so that its value does not change. The division by 1000 converts 468 into a fraction, and the multiplication by 1000 restores its value to 468. Thus, in our example, we expressed 468 as the product of a fractional part (0.468) and an exponent part ($10^{3}$). This type of representation of numbers is called the floating-point representation. The fractional part is called the mantissa, and the exponential part is called the exponent. To express a given decimal number in the floating-point format, the steps to be followed are:

1. First, divide the given number by an appropriate power of 10 (the radix of the decimal number system) so that it is converted into a fraction.
2. Multiply the resulting fraction by the same power of 10 ($10^{3}$, here) so that the division is cancelled by the multiplication; this brings the number back to its original value (468, here).
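The two steps can be sketched in Python as follows. This is only an illustrative sketch: the helper name `to_decimal_float` is hypothetical, it handles positive whole numbers only, and it assumes the "appropriate power of 10" is the one whose exponent equals the number of digits, as in the 468 example.

```python
from fractions import Fraction

def to_decimal_float(n):
    """Return (mantissa, exponent) with n == mantissa * 10**exponent.

    Hypothetical helper for positive whole numbers only; the mantissa is
    returned as a string such as "0.468" so that no digits are lost.
    """
    if n <= 0:
        raise ValueError("this sketch handles positive whole numbers only")
    digits = str(n)
    exponent = len(digits)      # step 1: power of 10 that turns n into a fraction
    mantissa = "0." + digits    # n / 10**exponent, written out exactly
    return mantissa, exponent

mantissa, exponent = to_decimal_float(468)
print(f"468 = {mantissa} x 10^{exponent}")

# Step 2: multiplying by the same power of 10 restores the original value.
assert Fraction(mantissa) * 10**exponent == 468
```

The mantissa is kept as a string and checked with `Fraction` so that the check is exact and does not depend on binary rounding of decimal fractions.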

The steps given above may be extended to the binary-number system also to represent a given number in the floating-point format.

Example 33:  Express decimal number 7 in the binary floating-point format.

Solution: The binary equivalent of decimal 7 is 111. To express this in the floating-point format, we divide 111 by an appropriate power of 2. Since the number has three bits, extending the procedure given above for the decimal system, we divide and multiply 111 by $2^{3}$. Calling the number X, we can write it in the form

$$X = \frac{111}{2^{3}} \times 2^{3} = 0.111 \times 2^{3}$$

In the binary floating-point format, we must express the exponent also in binary. The binary equivalent of decimal 3 is 011. As this is a positive exponent, we use sign bit 0 in the first bit position of the exponent. Thus the complete floating-point representation of decimal number 7 is:

$$X = 0.111 \times 2^{0011}$$
To check whether our operation has yielded the correct answer, we expand the above relation:

$$X = 0.111 \times 2^{3} = (1 \times 2^{-1} + 1 \times 2^{-2} + 1 \times 2^{-3}) \times 2^{3} = 7$$
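The conversion and the check in this example can be reproduced with a short Python sketch. The function names and the `exp_bits` parameter are hypothetical; the layout of the exponent field (one sign bit followed by the magnitude in binary) follows the convention used in the example, and only positive whole numbers are handled.

```python
from fractions import Fraction

def to_binary_float(n, exp_bits=3):
    """Return (mantissa, exponent_field) as bit strings for a positive integer n.

    exp_bits is the minimum width of the exponent magnitude (an assumption of
    this sketch); the exponent field is a sign bit followed by that magnitude.
    """
    if n <= 0:
        raise ValueError("this sketch handles positive whole numbers only")
    bits = format(n, "b")                   # 7 -> "111"
    exponent = len(bits)                    # divide and multiply by 2**len(bits)
    exp_field = "0" + format(exponent, f"0{exp_bits}b")  # sign bit 0 = positive
    return "0." + bits, exp_field

def binary_float_value(mantissa, exp_field):
    """Expand mantissa * 2**exponent exactly, as in the checking step."""
    frac_bits = mantissa.split(".")[1]
    # Bit i after the binary point carries the weight 2**-i.
    m = sum(Fraction(int(b), 2**i) for i, b in enumerate(frac_bits, start=1))
    magnitude = int(exp_field[1:], 2)
    exponent = -magnitude if exp_field[0] == "1" else magnitude
    return m * Fraction(2) ** exponent

mantissa, exp_field = to_binary_float(7)
print(f"X = {mantissa} x 2^{exp_field}")    # X = 0.111 x 2^0011
assert binary_float_value(mantissa, exp_field) == 7
```

`binary_float_value` mirrors the manual expansion: it sums the weighted mantissa bits and then multiplies by the power of 2 encoded in the exponent field.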

The result of this checking operation shows the correctness of our method. Now, we generalize the floating-point method with the expression
$$X = M \times R^{E} \tag{1.19}$$

where M = mantissa, R = radix of the number system used, and E = exponent. Now, to express X in the floating-point scheme (when it is a whole number), we multiply and divide X by $R^{E}$. Thus

$$X = \frac{X}{R^{E}} \times R^{E}$$

where E is dependent on the number of bits in X. From the above, we find

$$M = \frac{X}{R^{E}} \tag{1.20}$$
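As a rough illustration of relations (1.19) and (1.20) for an arbitrary radix, the sketch below computes M and E for a positive whole number. The function names are hypothetical, and E is assumed to be the number of digits of X in radix R, as in the examples given here.

```python
from fractions import Fraction

def digit_count(x, radix):
    """Number of digits needed to write the positive integer x in the given radix."""
    count = 0
    while x > 0:
        x //= radix
        count += 1
    return count

def mantissa_and_exponent(x, radix):
    """Return (M, E) with M = X / R**E, so that X == M * R**E."""
    if x <= 0 or radix < 2:
        raise ValueError("this sketch expects a positive integer and a radix >= 2")
    e = digit_count(x, radix)
    m = Fraction(x, radix**e)   # exact fraction, no rounding
    return m, e

for x, radix in ((468, 10), (7, 2)):
    m, e = mantissa_and_exponent(x, radix)
    assert m * radix**e == x    # relation (1.19) reproduces X
    print(f"X = {x}, R = {radix}: M = {m}, E = {e}")
```

With R = 10 the mantissa of 468 comes out as 117/250, which is 0.468; with R = 2 the mantissa of 7 is 7/8, which is 0.111 in binary.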

In the example using decimal number 468, we had X = 468, R = 10, E = 3, and therefore M = 0.468.
Let us now consider the binary floating-point scheme again. Although any number could in principle serve as the mantissa, the binary floating-point format imposes a restriction on it: it must be at least ½ and less than 1. That is,

$$0.5 \le M < 1 \tag{1.21}$$

This restriction imposes the condition that the first bit after the binary point must be a 1.
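Python's standard `math.frexp` function splits a number into a mantissa and a power-of-2 exponent, with the mantissa of a nonzero number having a magnitude of at least 0.5 and less than 1, so it offers a convenient way to observe this restriction. The sketch below is only a check on a few sample values (the value 0.15625 is an extra sample chosen for illustration); the helper `first_fraction_bit` is hypothetical.

```python
import math

def first_fraction_bit(m):
    """Return the first bit after the binary point of a mantissa with 0 <= m < 1."""
    return int(m * 2)

for x in (7, 468, 0.15625):
    m, e = math.frexp(x)                  # x == m * 2**e
    assert 0.5 <= m < 1                   # mantissa lies in the restricted range
    assert first_fraction_bit(m) == 1     # so its leading fractional bit is 1
    assert m * 2**e == x
    print(f"{x} = {m} x 2^{e}")
```

For 7, `math.frexp` returns the mantissa 0.875 (0.111 in binary) and the exponent 3, which agrees with Example 33.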