
Floating Point Arithmetic

Applications of Floating Point Arithmetic

  1. Representation of Real Numbers: Floating point arithmetic is used to represent real numbers in computer systems, allowing for the handling of decimal fractions and large or small numbers.
  2. Standardization: Floating point arithmetic follows standardized formats and operations, such as IEEE 754, ensuring consistency and compatibility across different computing platforms and programming languages.
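  3. Efficiency and Speed: Floating point operations are implemented directly in hardware on most processors, enabling fast computation of complex mathematical operations; this makes floating point arithmetic essential for scientific computing, simulations, and graphics processing.
  4. Numerical Stability: Every floating point operation may introduce a small rounding error, but standards such as IEEE 754 define exactly how results are rounded, so the errors are bounded and predictable. This lets numerically stable algorithms keep calculations accurate, which is crucial for scientific and engineering applications where precision is critical.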
  5. Handling a Large Range of Magnitude: Floating point arithmetic supports computations involving a wide range of magnitudes, from very small numbers (e.g., subatomic particles) to extremely large numbers (e.g., astronomical distances), facilitating versatile and accurate numerical processing.

Significant Digits

In a real or decimal number, the digits that carry meaningful information about its value and precision are called significant digits. Here are the criteria for determining significant digits:

  1. Digits from 1 to 9 are significant. For example, in 123, all three digits are significant.
  2. Zero (0) is a significant digit in many cases, but there are specific times when it is not significant:
    • When zero is used only to show the position of the decimal point.
      • Example: In 0.001, only the digit 1 is significant; the zeros are just placeholders.
    • When zero only fills places in a whole number without adding precision.
      • Example: In 100, the zeros are not significant because they only indicate that the number is in the hundreds.
    However, there are cases where zeros are significant:
    • If a zero appears between other significant digits, it is also significant.
      • Example: In 105, all digits (1, 0, 5) are significant.
    • If a zero comes after a decimal point and after another significant digit, it is also significant.
      • Example: In 2.50, all digits (2, 5, 0) are significant.
      • Example: In 2.500000, all digits are significant.

Criteria for Significant Digits:

  1. All non-zero digits are significant. For example:
    • In 456, all three digits are significant.
    • In 0.00478, only 4, 7 and 8 are significant; the zeros after the decimal point only mark its position.
  2. Zero digits are significant when:
    • They lie between significant digits. For example, in 102, the zero is significant because it lies between 1 and 2.
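    • They are to the right of the decimal point and to the right of a non-zero digit. For example, in 10.20, the final zero is significant by this rule (and the first zero by the previous one), so 10.20 has four significant digits.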
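These criteria can be applied mechanically. The following is a minimal Python sketch, assuming the number is given as a string (so trailing zeros such as the one in 10.20 are not lost) and that a power of ten is written in e-notation (6e2 for 6 * 10^2). The function name `significant_digits` is hypothetical, and it follows this section's convention that trailing zeros of a whole number without a decimal point are not significant.

```python
def significant_digits(number: str) -> str:
    """Return the significant digits of a decimal numeral such as '0.00478' or '6e2'."""
    # Drop the sign and any power-of-ten part: only the mantissa carries the digits.
    mantissa = number.strip().lstrip("+-").lower().split("e")[0]
    # Leading zeros (e.g. in 0.00478) only mark the position of the decimal point.
    digits = mantissa.replace(".", "").lstrip("0")
    if "." not in mantissa:
        # Trailing zeros of a whole number (e.g. 100) only show its magnitude.
        digits = digits.rstrip("0")
    return digits


for n in ["456", "0.00478", "102", "10.20"]:
    d = significant_digits(n)
    print(n, "->", d, f"({len(d)} significant digits)")
```

The same function can be used to check answers to the exercise that follows.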

Q- Find the number of significant digits and write the significant digits for the following numbers:
3696, 3060, 3900, 39.69, 39.00, 0.00390, 3.9, 6 * 10^2, 3.0069.

Concept of MSD and LSD

Most Significant Digit (MSD)

  • The Most Significant Digit (MSD) is the digit in a number that represents the largest value and contributes the most to the overall magnitude of the number.
  • It is the leftmost non-zero digit in a number and holds the highest place value.
  • For example, in the number 54321, the MSD is 5, as it represents the ten-thousands place.

Least Significant Digit (LSD)

  • The Least Significant Digit (LSD) is the digit in a number that contributes the least to the overall magnitude and has the smallest place value.
  • It is the rightmost digit in a number, so changing it alters the value of the number the least.
  • For example, in the number 54321, the LSD is 1, as it represents the units place.
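For a positive integer, both digits can be read off with place-value arithmetic. Here is a minimal Python sketch (the name `msd_lsd` is hypothetical) that takes the remainder modulo 10 for the units place and divides by the highest power of ten for the leftmost place:

```python
def msd_lsd(n: int) -> tuple[int, int]:
    """Return (MSD, LSD) of a positive integer using place values."""
    if n <= 0:
        raise ValueError("expects a positive integer")
    lsd = n % 10                     # digit in the units place
    highest_place = len(str(n)) - 1  # power of ten of the leftmost place
    msd = n // 10**highest_place     # digit in the highest place
    return msd, lsd


print(msd_lsd(54321))  # (5, 1)
```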

Understanding Scientific Notation

Scientific notation is a way to express numbers that are very large or very small in a concise and standardized format using powers of 10.

  • Format: The scientific notation format is represented as M * 10^n, where:
    • M is a decimal number with 1 ≤ |M| < 10 (1 inclusive, 10 exclusive), known as the mantissa or coefficient.
    • 10 is the base or radix of the notation.
    • n is an integer exponent indicating how many places the decimal point must be moved (right for positive n, left for negative n) to recover the ordinary form of the number.
  • Example:
    • Large Number: 300,000,000 can be written in scientific notation as 3 * 10^8. Here, M = 3 and n = 8.
    • Small Number: 0.000000005 can be written as 5 * 10^-9. Here, M = 5 and n = -9.
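Python's `e` format specifier already writes a number as M * 10^n with 1 ≤ |M| < 10, so the two parts can be recovered by splitting its output. A minimal sketch; the name `to_scientific` is hypothetical, and the default `e` format keeps six digits after the point, so M is rounded to seven significant digits:

```python
def to_scientific(x: float) -> tuple[float, int]:
    """Split a non-zero x into (M, n) with x = M * 10**n and 1 <= |M| < 10."""
    # The 'e' format produces strings such as '3.000000e+08'.
    mantissa, exponent = f"{x:e}".split("e")
    return float(mantissa), int(exponent)


print(to_scientific(300_000_000))  # (3.0, 8)
print(to_scientific(0.000000005))  # (5.0, -9)
```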

Normalization

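Normalization in floating point arithmetic refers to the process of representing a floating point number in a standardized form, typically a form of scientific notation, so that no digits of the mantissa are wasted on leading zeros; this makes storage efficient and preserves as much accuracy as possible. It involves adjusting the significand (mantissa) and exponent together. In the decimal examples in this section, a normalized number has a four-digit mantissa whose first digit after the decimal point is non-zero, i.e. 0.1 ≤ |M| < 1 (for example 0.6546 * 10^5).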
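A minimal Python sketch of this normalization, assuming the four-digit decimal mantissa and the 0.1 ≤ |M| < 1 convention of this section. It uses Python's `decimal` module so the decimal-point shifts are exact; the function name `normalize` and the choice to round (rather than chop) the last kept digit are assumptions.

```python
from decimal import Decimal, ROUND_HALF_UP


def normalize(m: Decimal, e: int, digits: int = 4) -> tuple[Decimal, int]:
    """Rewrite m * 10^e so that 0.1 <= |m| < 1, rounding m to `digits` places."""
    while m != 0:
        if abs(m) >= 1:                    # non-zero digit before the point: shift left
            m, e = m / 10, e + 1
        elif abs(m) < Decimal("0.1"):      # zero right after the point: shift right
            m, e = m * 10, e - 1
        else:
            r = m.quantize(Decimal(f"1e-{digits}"), rounding=ROUND_HALF_UP)
            if abs(r) < 1:                 # rounding did not carry into a new digit
                return r, e
            m = r                          # e.g. 0.99996 -> 1.0000: shift once more
    return Decimal(0), 0


print(normalize(Decimal("1.8834"), 5))    # (Decimal('0.1883'), 6)
print(normalize(Decimal("0.001883"), 0))  # (Decimal('0.1883'), -2)
```

Each shift of the decimal point is balanced by a change of the exponent, so the value M * 10^e stays the same apart from the final rounding.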

Operations

  1. Addition
  2. Subtraction
  3. Multiplication
  4. Division

Addition

  • Ensure that the exponents of the two numbers are equal. If not, shift the mantissa of the number with the smaller exponent to the right (increasing its exponent) until both exponents match.
  • Add the mantissas. If the result has a non-zero digit before the decimal point (e.g., 1.8834), shift the decimal point one place left to make it 0.18834 and increase the exponent by 1.
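The addition steps can be sketched in Python with the `decimal` module, assuming (mantissa, exponent) pairs with four-digit mantissas and rounding of the final mantissa. The names `fp_add` and `normalize` are hypothetical, and the operands 0.9834 * 10^5 and 0.9000 * 10^5 are chosen only so that their mantissas sum to the 1.8834 used above. The block defines its own `normalize` helper so that it runs on its own.

```python
from decimal import Decimal, ROUND_HALF_UP


def normalize(m: Decimal, e: int, digits: int = 4) -> tuple[Decimal, int]:
    """Rewrite m * 10^e so that 0.1 <= |m| < 1, rounding m to `digits` places."""
    while m != 0:
        if abs(m) >= 1:                    # non-zero digit before the point: shift left
            m, e = m / 10, e + 1
        elif abs(m) < Decimal("0.1"):      # zero right after the point: shift right
            m, e = m * 10, e - 1
        else:
            r = m.quantize(Decimal(f"1e-{digits}"), rounding=ROUND_HALF_UP)
            if abs(r) < 1:                 # rounding did not carry into a new digit
                return r, e
            m = r                          # e.g. 0.99996 -> 1.0000: shift once more
    return Decimal(0), 0


def fp_add(a: tuple[Decimal, int], b: tuple[Decimal, int]) -> tuple[Decimal, int]:
    """Add two (mantissa, exponent) pairs: align exponents, add, then normalize."""
    (m1, e1), (m2, e2) = a, b
    e = max(e1, e2)
    # Align: shift the mantissa of the number with the smaller exponent to the right.
    m1 = m1 / Decimal(10) ** (e - e1)
    m2 = m2 / Decimal(10) ** (e - e2)
    return normalize(m1 + m2, e)


print(fp_add((Decimal("0.9834"), 5), (Decimal("0.9000"), 5)))  # (Decimal('0.1883'), 6)
```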

Subtraction

  • Ensure that the exponents of the two numbers are equal. If not, shift the mantissa of the number with the smaller exponent to the right (increasing its exponent) until both exponents match.
  • Subtract the mantissas. If the result has zeros immediately after the decimal point (e.g., 0.001883), shift the decimal point right until the first digit after it is non-zero (0.1883) and decrease the exponent by the number of places shifted (here 2).

Example:
1:
0.6546 * 10^5 - 0.5433 * 10^5 = 0.1113 * 10^5 (the exponents are already equal, so no alignment is required)

2:
0.6546 * 10^5 - 0.5433 * 10^7
First, make the exponents the same by shifting the number with the smaller exponent:
0.6546 * 10^5 = 0.006546 * 10^7
Keeping a four-digit mantissa, 0.006546 is chopped to 0.0065, so:
0.0065 * 10^7 - 0.5433 * 10^7 = -0.5368 * 10^7 (answer)
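Both subtraction examples can be reproduced with a short Python sketch using the `decimal` module. It assumes (mantissa, exponent) pairs and a four-digit mantissa; `fp_sub` and `normalize` are hypothetical names, and the block defines its own `normalize` so that it runs on its own. One difference from the hand calculation: the second example chops the shifted operand 0.006546 to 0.0065 before subtracting, while this sketch subtracts exactly and rounds the result, which gives the same -0.5368 here.

```python
from decimal import Decimal, ROUND_HALF_UP


def normalize(m: Decimal, e: int, digits: int = 4) -> tuple[Decimal, int]:
    """Rewrite m * 10^e so that 0.1 <= |m| < 1, rounding m to `digits` places."""
    while m != 0:
        if abs(m) >= 1:                    # non-zero digit before the point: shift left
            m, e = m / 10, e + 1
        elif abs(m) < Decimal("0.1"):      # zero right after the point: shift right
            m, e = m * 10, e - 1
        else:
            r = m.quantize(Decimal(f"1e-{digits}"), rounding=ROUND_HALF_UP)
            if abs(r) < 1:                 # rounding did not carry into a new digit
                return r, e
            m = r                          # e.g. 0.99996 -> 1.0000: shift once more
    return Decimal(0), 0


def fp_sub(a: tuple[Decimal, int], b: tuple[Decimal, int]) -> tuple[Decimal, int]:
    """Compute a - b for (mantissa, exponent) pairs: align, subtract, normalize."""
    (m1, e1), (m2, e2) = a, b
    e = max(e1, e2)
    # Align: shift the mantissa of the number with the smaller exponent to the right.
    m1 = m1 / Decimal(10) ** (e - e1)
    m2 = m2 / Decimal(10) ** (e - e2)
    return normalize(m1 - m2, e)


print(fp_sub((Decimal("0.6546"), 5), (Decimal("0.5433"), 5)))  # (Decimal('0.1113'), 5)
print(fp_sub((Decimal("0.6546"), 5), (Decimal("0.5433"), 7)))  # (Decimal('-0.5368'), 7)
```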

Multiplication

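  • Multiply the mantissas and add the exponents of the two numbers.
  • Because both normalized mantissas lie in [0.1, 1), their product lies in [0.01, 1), so it never has a non-zero digit before the decimal point. If it has a zero right after the decimal point (e.g., 0.1234 * 0.2000 = 0.02468), shift the decimal point one place right (0.2468) and decrease the exponent by 1.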
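A minimal Python sketch of the multiplication rule, assuming (mantissa, exponent) pairs with four-digit mantissas and rounding of the final mantissa; `fp_mul` and `normalize` are hypothetical names, and the block defines its own `normalize` so that it runs on its own.

```python
from decimal import Decimal, ROUND_HALF_UP


def normalize(m: Decimal, e: int, digits: int = 4) -> tuple[Decimal, int]:
    """Rewrite m * 10^e so that 0.1 <= |m| < 1, rounding m to `digits` places."""
    while m != 0:
        if abs(m) >= 1:                    # non-zero digit before the point: shift left
            m, e = m / 10, e + 1
        elif abs(m) < Decimal("0.1"):      # zero right after the point: shift right
            m, e = m * 10, e - 1
        else:
            r = m.quantize(Decimal(f"1e-{digits}"), rounding=ROUND_HALF_UP)
            if abs(r) < 1:                 # rounding did not carry into a new digit
                return r, e
            m = r                          # e.g. 0.99996 -> 1.0000: shift once more
    return Decimal(0), 0


def fp_mul(a: tuple[Decimal, int], b: tuple[Decimal, int]) -> tuple[Decimal, int]:
    """Multiply (mantissa, exponent) pairs: multiply mantissas, add exponents."""
    (m1, e1), (m2, e2) = a, b
    return normalize(m1 * m2, e1 + e2)


print(fp_mul((Decimal("0.6546"), 5), (Decimal("0.5433"), 8)))  # (Decimal('0.3556'), 13)
print(fp_mul((Decimal("0.1234"), 2), (Decimal("0.2000"), 3)))  # (Decimal('0.2468'), 4)
```

The second call shows the leading-zero case: 0.02468 * 10^5 is shifted to 0.2468 * 10^4.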

Division

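  • Divide the mantissas and subtract the exponent of the divisor from the exponent of the dividend.
  • Because both normalized mantissas lie in [0.1, 1), their quotient lies between 0.1 and 10. If it has a non-zero digit before the decimal point (e.g., 1.205), shift the decimal point one place left and increase the exponent by 1.
  Example (0.6546 / 0.5433 ≈ 1.205, and 5 - 8 = -3):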
  0.6546 * 10^5
/ 0.5433 * 10^8
----------------- 
  1.205 * 10^-3 = 0.1205 * 10^-2
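The division example above can be reproduced with a small Python sketch using the `decimal` module. It assumes (mantissa, exponent) pairs, a four-digit mantissa, and rounding of the final mantissa (which matches the 0.1205 above); `fp_div` and `normalize` are hypothetical names, and the block defines its own `normalize` so that it runs on its own.

```python
from decimal import Decimal, ROUND_HALF_UP


def normalize(m: Decimal, e: int, digits: int = 4) -> tuple[Decimal, int]:
    """Rewrite m * 10^e so that 0.1 <= |m| < 1, rounding m to `digits` places."""
    while m != 0:
        if abs(m) >= 1:                    # non-zero digit before the point: shift left
            m, e = m / 10, e + 1
        elif abs(m) < Decimal("0.1"):      # zero right after the point: shift right
            m, e = m * 10, e - 1
        else:
            r = m.quantize(Decimal(f"1e-{digits}"), rounding=ROUND_HALF_UP)
            if abs(r) < 1:                 # rounding did not carry into a new digit
                return r, e
            m = r                          # e.g. 0.99996 -> 1.0000: shift once more
    return Decimal(0), 0


def fp_div(a: tuple[Decimal, int], b: tuple[Decimal, int]) -> tuple[Decimal, int]:
    """Divide (mantissa, exponent) pairs: divide mantissas, subtract exponents."""
    (m1, e1), (m2, e2) = a, b
    if m2 == 0:
        raise ZeroDivisionError("division by zero")
    return normalize(m1 / m2, e1 - e2)


print(fp_div((Decimal("0.6546"), 5), (Decimal("0.5433"), 8)))  # (Decimal('0.1205'), -2)
```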
                    

Overflow and Underflow Conditions

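  • Overflow is the condition where the exponent of a result becomes greater than 99, the largest two-digit exponent in this decimal model. It can occur in addition (when normalization raises the exponent), in multiplication, and in division by a very small number.
  • Underflow is the condition where the exponent of a result becomes smaller than -99. It can occur in subtraction (when normalization lowers the exponent), in multiplication of very small numbers, and in division by a very large number.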
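The range check can be illustrated for multiplication with a minimal Python sketch, assuming a two-digit decimal exponent (range -99 to 99) and normalized mantissas; `classify_product`, `EXP_MIN` and `EXP_MAX` are hypothetical names, and mantissa rounding is not modelled because only the exponent matters here.

```python
from decimal import Decimal

EXP_MIN, EXP_MAX = -99, 99  # assumed two-digit decimal exponent range


def classify_product(m1: Decimal, e1: int, m2: Decimal, e2: int) -> str:
    """Multiply two normalized numbers (0.1 <= |m| < 1) and report the exponent range status."""
    m, e = m1 * m2, e1 + e2
    # The product of two normalized mantissas lies in [0.01, 1), so normalization
    # needs at most one shift to the right, which lowers the exponent by 1.
    if m != 0 and abs(m) < Decimal("0.1"):
        e -= 1
    if e > EXP_MAX:
        return "overflow"
    if e < EXP_MIN:
        return "underflow"
    return "ok"


print(classify_product(Decimal("0.5"), 60, Decimal("0.4"), 50))      # overflow (0.2 * 10^110)
print(classify_product(Decimal("0.1"), -49, Decimal("0.1"), -50))    # underflow (0.1 * 10^-100)
print(classify_product(Decimal("0.6546"), 5, Decimal("0.5433"), 8))  # ok
```

The second case is worth noting: the exponent sum -99 is still in range, but normalizing 0.01 to 0.1 pushes the exponent to -100.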

Errors in Numerical Computation

Understanding Rules for Rounding-off a Number to n Significant Figures:

Rounding to Three Significant Figures:

  1. If the digit after the third significant figure is 5 or greater, round up the third significant figure by 1 (except in the exact-half case of rule 4).
    • Example: 12.3567 rounds to 12.4 (since 3 is followed by 5 or greater)
  2. If the digit after the third significant figure is less than 5, the third significant figure remains unchanged.
    • Example: 78.432 rounds to 78.4 (since 4 is followed by a digit less than 5)
  3. If the digit after the third significant figure is exactly 5 with non-zero digits after it, round up the third significant figure by 1.
    • Example: 67.3512 rounds to 67.4 (since 5 is followed by non-zero digits)
  4. If the digit after the third significant figure is exactly 5 with only zero digits after it (an exact half), round the third significant figure to the nearest even digit: leave it unchanged if it is even, and add 1 if it is odd.
    • Example: 89.650 rounds to 89.6 (6 is even), while 89.750 rounds to 89.8 (7 is odd).
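Python's `decimal` module implements this rule set as the `ROUND_HALF_EVEN` rounding mode. A minimal sketch that rounds to n significant figures (the name `round_sig` is hypothetical); the input is taken as a string because binary floats such as 89.65 are not stored exactly, which would break the exact-half case.

```python
from decimal import Decimal, ROUND_HALF_EVEN


def round_sig(value: str, n: int = 3) -> Decimal:
    """Round a decimal numeral to n significant figures with round-half-to-even."""
    d = Decimal(value)
    # adjusted() is the power of ten of the first significant digit (1 for 12.3567),
    # so the n-th significant figure sits at power adjusted() - n + 1.
    quantum = Decimal(f"1e{d.adjusted() - n + 1}")
    # Above half rounds up, below half stays, an exact half goes to the even digit.
    return d.quantize(quantum, rounding=ROUND_HALF_EVEN)


for v in ["12.3567", "78.432", "67.3512", "89.650", "89.750"]:
    print(v, "->", round_sig(v))
```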

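Algebraic and Transcendental Equations Using Iterative Methods

What Are Algebraic and Transcendental Equations?

An algebraic equation f(x) = 0 involves only polynomial terms in x, for example x^3 - 5x + 1 = 0. A transcendental equation also involves non-algebraic functions such as exponential, logarithmic or trigonometric functions, for example x - cos x = 0.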

What is the Root of an Equation?

The root of an equation f(x) = 0 is a value of x that, when substituted into the equation, makes f(x) equal to zero. In other words, it is a solution of the equation.

For example, consider the quadratic equation:

ax^2 + bx + c = 0

The roots of this equation can be found using the quadratic formula:

x = (-b ± √(b^2 - 4ac)) / (2a)

In this formula, the values of a, b, and c are coefficients of the quadratic equation. The ± symbol gives two roots: one using the positive square root and one using the negative square root. They coincide when b^2 - 4ac = 0 and are complex when b^2 - 4ac < 0.

So, the roots of the quadratic equation are the values of x that, when substituted into the equation, make it equal to zero.
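A short Python check of the quadratic formula; `quadratic_roots` is a hypothetical name, and `cmath.sqrt` is used so that a negative discriminant yields complex roots instead of an error.

```python
import cmath


def quadratic_roots(a: float, b: float, c: float) -> tuple[complex, complex]:
    """Roots of a*x**2 + b*x + c = 0 (a != 0) via the quadratic formula."""
    if a == 0:
        raise ValueError("a must be non-zero for a quadratic equation")
    disc = cmath.sqrt(b * b - 4 * a * c)  # complex sqrt also handles b^2 - 4ac < 0
    return (-b + disc) / (2 * a), (-b - disc) / (2 * a)


r1, r2 = quadratic_roots(1, -3, 2)  # x^2 - 3x + 2 = (x - 1)(x - 2)
print(r1, r2)  # (2+0j) (1+0j)
```

Substituting either root back into x^2 - 3x + 2 gives zero, which is exactly the definition of a root.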

Root finding algorithms:

  1. Root Bracketing Algorithm:
    1. Bisection Method
    2. False Position or Regula Falsi
  2. Root Polishing Algorithm:
    1. Secant Method
    2. Newton-Raphson Method
    3. Iteration Method
  • Root finding methods are broadly categorized into root bracketing and root polishing methods.
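  • Root Bracketing: These methods start from an interval in which f(x) changes sign and, in every iteration, compute a new estimate of the root together with a new, smaller interval that still contains the root, so convergence is guaranteed. In contrast, root polishing methods only compute the new estimate of the root without maintaining a bracket; they are usually much faster but may fail to converge.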

Arrangement of the root finding methods in terms of speed, from slowest to fastest:

  1. Bisection Method: This method is relatively slow because it only halves the interval in each iteration until the interval is small enough. It guarantees convergence, but the convergence is only linear.
  2. Regula Falsi Method: Also known as the false position method, this approach replaces the midpoint with the point where the straight line through the bracket endpoints crosses the x-axis. It is usually faster than bisection, but when one endpoint stays fixed for many iterations its convergence can become slow.
  3. Iterative Methods: These methods, such as fixed-point iteration or the method of successive approximations, can be faster than bisection and regula falsi but may still require several iterations to converge, especially for complex functions or poorly chosen initial guesses.
  4. Secant Method: The secant method is faster than bisection and regula falsi because it always draws the secant line through the two most recent points; its convergence is superlinear (order about 1.618).
  5. Newton-Raphson Method: This method is often the fastest among these options because it uses both the function and its derivative to refine the root approximation. Near a simple root it converges quadratically, meaning the number of correct digits roughly doubles with each iteration, which is much faster than linear methods like bisection and faster than the secant method.

Bisection Method

  • The bisection method in mathematics is a root-finding technique that involves iteratively bisecting an interval and selecting a subinterval in which a root must lie for further processing. This method is also commonly known as the interval halving method.

Steps involved in Bisection Method

  1. Choose initial guesses xl and xr that bracket the root, i.e. f(xl) * f(xr) < 0.
  2. Find the midpoint of the interval:
    xmid = (xl + xr) / 2
  3. Find the next bracket using the following condition:
    • if (f(xmid) * f(xl) < 0)
      then xr = xmid (xl is unchanged)
      else xl = xmid
  4. Note: Steps 2 and 3 are repeated until the width of the interval |xr - xl| (or the change in xmid) is smaller than a tolerance epsilon (e.g., 0.003), or until the specified maximum number of iterations is reached.
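The steps above can be sketched in Python as follows. This is a minimal version under stated assumptions: the stopping rule is the bracket width |xr - xl| < eps with eps = 0.003 as in the note, `bisection` and `max_iter` are hypothetical names, and the test function x^3 - 5x + 1 with bracket [0, 1] is borrowed from the secant example later in this section.

```python
from typing import Callable


def bisection(f: Callable[[float], float], xl: float, xr: float,
              eps: float = 0.003, max_iter: int = 100) -> float:
    """Approximate a root of f in [xl, xr] by repeatedly halving the bracket."""
    if f(xl) * f(xr) >= 0:
        raise ValueError("f(xl) and f(xr) must have opposite signs")
    for _ in range(max_iter):
        xmid = (xl + xr) / 2
        fmid = f(xmid)
        if fmid == 0:            # landed exactly on the root
            return xmid
        if fmid * f(xl) < 0:     # sign change in [xl, xmid]
            xr = xmid
        else:                    # sign change in [xmid, xr]
            xl = xmid
        if abs(xr - xl) < eps:   # bracket narrower than the tolerance
            break
    return (xl + xr) / 2


root = bisection(lambda x: x**3 - 5 * x + 1, 0.0, 1.0)
print(round(root, 4))
```

Because the loop stops as soon as the bracket is narrower than eps, the returned midpoint is only accurate to within about that tolerance.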

Regula Falsi or False Position

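  • It is used when root finding through the bisection method takes too long. Instead of the midpoint, it uses the slope of the straight line (chord) joining (xl, f(xl)) and (xr, f(xr)) and takes the point where this line crosses the x-axis, x = xl - f(xl) * (xr - xl) / (f(xr) - f(xl)), as the new estimate; the bracket is then updated with the same sign test as in bisection. This usually reduces the number of iterations.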
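A minimal Python sketch of false position, assuming the iteration stops when two successive estimates differ by less than eps (0.003, the tolerance used for bisection); `regula_falsi`, `eps` and `max_iter` are hypothetical names, and the test function x^3 - 5x + 1 on [0, 1] is borrowed from the secant example in this section.

```python
from typing import Callable


def regula_falsi(f: Callable[[float], float], xl: float, xr: float,
                 eps: float = 0.003, max_iter: int = 100) -> float:
    """Approximate a root of f in [xl, xr] using the false position (chord) rule."""
    fl, fr = f(xl), f(xr)
    if fl * fr >= 0:
        raise ValueError("f(xl) and f(xr) must have opposite signs")
    x_old = float("inf")
    for _ in range(max_iter):
        # x-intercept of the straight line through (xl, fl) and (xr, fr)
        x = xl - fl * (xr - xl) / (fr - fl)
        fx = f(x)
        if fx == 0 or abs(x - x_old) < eps:  # exact root, or estimate has settled
            return x
        if fx * fl < 0:                      # root lies in [xl, x]
            xr, fr = x, fx
        else:                                # root lies in [x, xr]
            xl, fl = x, fx
        x_old = x
    raise RuntimeError("no convergence within max_iter iterations")


root = regula_falsi(lambda x: x**3 - 5 * x + 1, 0.0, 1.0)
print(root)
```

Unlike bisection, the new point depends on the function values, so for this function it gets close to the root in only a few iterations.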

Secant method

  • The secant method is similar to the regula falsi method: it uses the same chord (the straight line through two points on the curve), but it always keeps the two most recent points instead of a bracket, which usually makes it converge faster than regula falsi.
  • However, convergence is not guaranteed in the case of the secant method. The secant method is also referred to as the chord method.
  • The new point in the secant method is generated using the formula:
    \(x_{i+1} = x_{i-1} - \frac{x_{i} - x_{i-1}}{f(x_{i}) - f(x_{i-1})} * f(x_{i-1})\)
  • As this method is not a root bracketing method, we don't have to check the sign condition (f(xnext) * f(xl) < 0); each new point simply replaces the older of the two previous points.

Steps in Secant method:

  1. Start with initial guesses \(x_{i-1}\) and \(x_{i}\).
  2. Find the next approximate root as \(x_{i+1} = x_{i-1} - \frac{x_{i} - x_{i-1}}{f(x_{i}) - f(x_{i-1})} * f(x_{i-1})\)
  3. Check for convergence, i.e. |\(x_{i+1}\) - \(x_{i}\)| <= epsilon. If it holds, \(x_{i+1}\) is the required root; otherwise repeat step 2 with the two most recent points.

Example:

f(x) = x^3 - 5x + 1
Finding initial guesses:
f(0) = 1
f(1) = -3
hence \(x_0\) = 0 and \(x_1\) = 1
and f(\(x_0\)) = 1, f(\(x_1\)) = -3

  • Iteration 1:
    \(x_2\) = \(x_{0} - \frac{x_{1} - x_{0}}{f(x_{1}) - f(x_{0})} * f(x_{0})\)
    \(x_2\) = \(0 - \frac{1 - 0}{f(1) - f(0)} * f(0)\) = \(0 - \frac{1}{-3 - 1} * 1\)
    \(x_2\) = 0.25
    f(\(x_2\)) = -0.2344
  • Iteration 2:
    \(x_3\) = \(x_{1} - \frac{x_{2} - x_{1}}{f(x_{2}) - f(x_{1})} * f(x_{1})\)
    \(x_3\) = 0.1864
    f(\(x_3\)) = 0.0744
  • Iteration 3:
    \(x_4\) = \(x_{2} - \frac{x_{3} - x_{2}}{f(x_{3}) - f(x_{2})} * f(x_{2})\)
    \(x_4\) = 0.2017
    f(\(x_4\)) = -0.000294
  • Iteration 4:
    \(x_5\) = \(x_{3} - \frac{x_{4} - x_{3}}{f(x_{4}) - f(x_{3})} * f(x_{3})\)
    \(x_5\) = 0.2016
    Checking convergence: |\(x_5\) - \(x_4\)| = |0.2016 - 0.2017| = 0.0001, which is less than epsilon = 0.003, hence the root is 0.2016.
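The steps and the worked example above can be reproduced with a minimal Python sketch, assuming the stopping rule |x_{i+1} - x_i| <= eps with eps = 0.003 as in the example; `secant`, `eps` and `max_iter` are hypothetical names.

```python
from typing import Callable


def secant(f: Callable[[float], float], x_prev: float, x_curr: float,
           eps: float = 0.003, max_iter: int = 50) -> float:
    """Secant iteration: x_{i+1} = x_{i-1} - (x_i - x_{i-1}) / (f(x_i) - f(x_{i-1})) * f(x_{i-1})."""
    for _ in range(max_iter):
        f_prev, f_curr = f(x_prev), f(x_curr)
        if f_curr == f_prev:
            raise ZeroDivisionError("f(x_i) == f(x_{i-1}): the secant line is horizontal")
        x_next = x_prev - (x_curr - x_prev) / (f_curr - f_prev) * f_prev
        if abs(x_next - x_curr) <= eps:  # step 3: successive estimates agree
            return x_next
        x_prev, x_curr = x_curr, x_next  # keep the two most recent points
    raise RuntimeError("no convergence within max_iter iterations")


root = secant(lambda x: x**3 - 5 * x + 1, 0.0, 1.0)
print(round(root, 4))  # 0.2016
```

No sign check appears anywhere in the loop, which is the practical difference from regula falsi.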

Newton-Raphson Method

  • The Newton-Raphson method needs only a single starting guess to find the next approximation. It is one of the most widely used methods for finding the root of an equation and is also known as the method of tangents: the next guess is the point where the tangent to the curve at the current guess crosses the x-axis. With an initial guess close to a simple root, it converges faster than the other methods above. The next guess is found using the formula:
    \(x_{n+1} = x_n - \frac{f(x_n)}{f'(x_n)}\)
    For the initial guess, take as \(x_0\) the end of the initial bracket whose function value is closer to 0.
    Keep computing the next term until two successive terms are the same (to the required number of decimal places).
  • Differentiation is a mathematical concept that measures how a quantity changes when another related quantity changes. The derivative f'(x) gives the rate at which f changes with respect to x, i.e. the slope of the tangent to the curve. For example, if a function describes the position of an object over time, its derivative with respect to time gives the object's velocity.

Q- Find the real root of \(x^4 - x - 10 = 0\) by Newton Raphson Method.

  • Sol:
    f(x) = \(x^4 - x - 10\)
    finding the initial guess \((x_0)\):
    f(0) = -10
    f(1) = -10
    f(2) = 4
    f changes sign between 1 and 2, so [1, 2] is the bracket
    |f(2)| = 4 is closer to 0 than |f(1)| = 10, so we take \(x_0 = 2\)
    iteration 1:
    \(x_1\) = \(x_0\) - \(\frac{f(x_0)}{f'(x_0)}\)
    \(f'(x)\) = \(\frac{d\ (x^4 - x - 10)}{dx}\) = \(4x^3 - 1\), so \(f'(x_0)\) = \(4(2)^3 - 1\) = 31
    \(x_1\) = 2 - \(\frac{4}{31}\)
    \(x_1\) = 1.8710 and f(\(x_1\)) = 0.3835
    iteration 2:
    \(x_2\) = \(x_1\) - \(\frac{f(x_1)}{f'(x_1)}\)
    \(f'(x_1)\) = 25.1988
    \(x_2\) = 1.8710 - \(\frac{0.3835}{25.1988}\)
    \(x_2\) = 1.8558 and f(\(x_2\)) = 0.0053
    iteration 3:
    \(x_3\) = \(x_2\) - \(\frac{f(x_2)}{f'(x_2)}\)
    \(f'(x_2)\) = 24.5655
    \(x_3\) = 1.8558 - \(\frac{0.0053}{24.5655}\)
    \(x_3\) = 1.8556 and f(\(x_3\)) = 0.0004
    iteration 4:
    \(x_4\) = \(x_3\) - \(\frac{f(x_3)}{f'(x_3)}\)
    \(f'(x_3)\) = 24.5572
    \(x_4\) = 1.8556 - \(\frac{0.0004}{24.5572}\)
    \(x_4\) = 1.8556
    As \(x_4\) and \(x_3\) are the same to four decimal places, 1.8556 is the required root.
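A minimal Python sketch of the iteration, assuming the derivative is supplied as a separate function and that "the terms repeat" means two successive terms differ by less than `tol` = 1e-4 (four decimal places); `newton_raphson`, `tol` and `max_iter` are hypothetical names.

```python
from typing import Callable


def newton_raphson(f: Callable[[float], float], df: Callable[[float], float],
                   x0: float, tol: float = 1e-4, max_iter: int = 50) -> float:
    """Iterate x_{n+1} = x_n - f(x_n) / f'(x_n) until successive terms agree within tol."""
    x = x0
    for _ in range(max_iter):
        slope = df(x)
        if slope == 0:
            raise ZeroDivisionError("f'(x) is zero: the tangent never crosses the x-axis")
        x_next = x - f(x) / slope
        if abs(x_next - x) < tol:  # successive terms (practically) repeat
            return x_next
        x = x_next
    raise RuntimeError("no convergence within max_iter iterations")


root = newton_raphson(lambda x: x**4 - x - 10, lambda x: 4 * x**3 - 1, 2.0)
print(round(root, 4))  # 1.8556
```

Passing the derivative explicitly keeps the sketch close to the hand calculation, where f'(x) = 4x^3 - 1 was worked out first.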