
Curve Fitting

Fitting of a Straight Line Using the Least Squares Method

Working Rule (Steps)

  1. Let y = ax + b (Equation 1) be the curve of best fit, where 'a' and 'b' are the constants to be determined.
  2. Write the normal equations:
    There are two constants in y = ax + b, so there are two normal equations:
    ∑y = a∑x + bn (since ∑1 = n)
    ∑xy = a∑x² + b∑x
    where n is the number of given data points.
  3. Using tabular calculation with the given data, find the unknowns 'a' and 'b'.
  4. Substitute these values of 'a' and 'b' into Equation 1 to get the required curve of best fit.
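
The two normal equations can also be solved once and for all. The closed form below is a standard result obtained by eliminating b between them; it is not stated in the notes above, and is offered here only as a shortcut for checking hand calculations:

```latex
a = \frac{n\sum xy - \sum x \sum y}{n\sum x^{2} - \left(\sum x\right)^{2}},
\qquad
b = \frac{\sum y - a\sum x}{n}
```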

Question: By the method of least squares, find the best fitting straight line to the data given below:

x  |  5  |  10  |  15  |  20  |  25
y  | 15  |  19  |  23  |  26  |  30
                

Solution:

  1. Let y = ax + b (Equation 1) be the straight line of best fit where 'a' and 'b' are the constants to be determined.
  2. The normal equations are:
    ∑y = a∑x + bn (Equation 2)
    ∑xy = a∑x² + b∑x (Equation 3)
    where n is the number of given data points; here n = 5.
  3. Tabulate the data to obtain ∑x, ∑y, ∑x², and ∑xy, which the normal equations require:
      x  |  y  | x²   |  xy 
    --------------------------
      5  | 15  | 25   | 75
      10 | 19  | 100  | 190
      15 | 23  | 225  | 345
      20 | 26  | 400  | 520
      25 | 30  | 625  | 750
                            
    ∑x = 75, ∑y = 113, ∑x² = 1375, and ∑xy = 1880
    Now substitute these summation values into the normal equations to get:
    Equation 2: 113 = 75a + 5b (Equation 4)
    Equation 3: 1880 = 1375a + 75b (Equation 5)
    On solving Equation 4 and Equation 5 we get:
    a = 0.74
    b = 11.5
  4. Now substitute the values of 'a' and 'b' into Equation 1 to get:
    y = 0.74x + 11.5
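
The steps above can be sketched in a few lines of Python. This is a minimal illustration, not part of the original notes: the function name `fit_line` is made up for this example, and it solves the two normal equations directly instead of building the table by hand.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = a*x + b via the two normal equations."""
    n = len(xs)
    sx = sum(xs)                               # sum of x
    sy = sum(ys)                               # sum of y
    sxx = sum(x * x for x in xs)               # sum of x^2
    sxy = sum(x * y for x, y in zip(xs, ys))   # sum of x*y
    # Solve  sy = a*sx + b*n  and  sxy = a*sxx + b*sx  for a and b.
    a = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    b = (sy - a * sx) / n
    return a, b


a, b = fit_line([5, 10, 15, 20, 25], [15, 19, 23, 26, 30])
print(round(a, 2), round(b, 2))  # 0.74 11.5
```

The result agrees with the hand calculation, y = 0.74x + 11.5.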

Fitting of Exponential Curves by the Method of Least Squares
Examples: y = a e^(bx) (exponential), y = a x^b (power)

Question: Fit y = a e^(bx) to the following data using the method of least squares:

    x  |  0  |    5  |   8  |  12    |  20
    y  |  3  |  1.5  |   1  |  0.55  |  0.18
                
Solution:

  1. Let y = a e^(bx) (Equation 1) be the curve to be fitted, where 'a' and 'b' are constants to be determined.
  2. Taking the logarithm (base 10) of both sides, we get:
    log(y) = log(a e^(bx))
    log(y) = log(a) + log(e^(bx)) [since log(AB) = log(A) + log(B)]
    log(y) = log(a) + bx log(e) [since log(A^m) = m log(A)]
    Therefore, Y = A + Bx where Y = log(y), A = log(a), and B = b log(e)
    Now, the normal equations are:
    ∑Y = An + B∑x (Equation 2)
    ∑xY = A∑x + B∑x² (Equation 3)
    The number of data points, n = 5
  3. Tabular calculation:
        x  |   y   |  Y = log(y)   |   xY    | x² 
      ---------------------------------------------
        0  |  3    |    0.4771     |  0      | 0
        5  | 1.5   |    0.1761     |  0.8805 | 25
        8  |  1    |    0.0000     |  0      | 64
        12 | 0.55  |   -0.2596     | -3.1152 | 144
        20 | 0.18  |   -0.7447     | -14.894 | 400
                            
    ∑x = 45, ∑Y = -0.3512, ∑xY = -17.1292, and ∑x² = 633
  4. Now we will substitute these values into Equations 2 and 3 to get A and B:
    Equation 2: -0.3512 = 5A + 45B (Equation 4)
    Equation 3: -17.1292 = 45A + 633B (Equation 5)
    Upon solving, we get:
    B = -0.0612
    A = 0.48056
    But A = log10(a), so
    log10(a) = 0.48056
    a = 10^0.48056
    a ≈ 3.0238
    And B = b log10(e), so
    b log10(e) = -0.0612
    b = -0.0612 / log10(e) = -0.0612 / 0.4343
    b ≈ -0.1409
  5. Now Equation 1 becomes:
    y = 3.0238 e^(-0.1409x)
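
As a cross-check, the same linearization can be run in Python. This is a sketch under one stated assumption: it takes natural logarithms instead of base-10 logarithms, so the slope of the fitted line is b itself and no division by log10(e) is needed. The function name `fit_exponential` is hypothetical.

```python
import math


def fit_exponential(xs, ys):
    """Fit y = a*exp(b*x) by a least-squares line through (x, ln y)."""
    n = len(xs)
    Ys = [math.log(y) for y in ys]             # natural log, so slope = b
    sx = sum(xs)
    sY = sum(Ys)
    sxx = sum(x * x for x in xs)
    sxY = sum(x * Y for x, Y in zip(xs, Ys))
    # Normal equations for Y = ln(a) + b*x
    b = (n * sxY - sx * sY) / (n * sxx - sx * sx)
    ln_a = (sY - b * sx) / n
    return math.exp(ln_a), b


a, b = fit_exponential([0, 5, 8, 12, 20], [3, 1.5, 1, 0.55, 0.18])
print(round(a, 3), round(b, 4))
```

The printed values should land close to the hand-computed a ≈ 3.0238 and b ≈ -0.1409; small differences in the third decimal place come from the four-digit rounding used in the table.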

Question: Fit the curve y = a x^b to the following data:

    x  |    1   |    2   |   3    |    4   |    5   |   6
    y  |  2.98  |  4.26  |  5.21  |  6.10  |  6.80  |  7.50
                

Solution:

  1. Let y = a x^b (Equation 1) be the curve to be fitted, where 'a' and 'b' are constants to be determined. Taking the logarithm (base 10) of both sides of Equation 1, we get:
    log(y) = log(a x^b)
    log(y) = log(a) + log(x^b) [since log(AB) = log(A) + log(B)]
    log(y) = log(a) + b log(x) [since log(A^m) = m log(A)]
    Therefore, Y = A + bX where Y = log(y), X = log(x), and A = log(a)
    Now, the normal equations are:
    ∑Y = An + b∑X (Equation 2)
    ∑XY = A∑X + b∑X² (Equation 3)
    The number of given data points, n = 6
  2. Tabular calculation:
        x  |   y     |   X = log(x)  |   Y = log(y)  |   X²   |   XY
    -------------------------------------------------------------------
        1  |  2.98   |       0       |   0.4742      | 0      |  0
        2  |  4.26   |    0.3010     |   0.6294      | 0.0906 |  0.1894
        3  |  5.21   |    0.4771     |   0.7168      | 0.2276 |  0.3419
        4  |  6.10   |    0.6021     |   0.7853      | 0.3625 |  0.4727
        5  |  6.80   |    0.6990     |   0.8325      | 0.4886 |  0.5819
        6  |  7.50   |    0.7782     |   0.8751      | 0.6056 |  0.6809
    -------------------------------------------------------------------
        ∑  |         |    2.8574     |    4.3133     | 1.7743 |  2.2668
        
  3. Now we will substitute these values into Equations 2 and 3 to get A and b:
    Equation 2: 4.3133 = 6A + 2.8574b (Equation 4)
    Equation 3: 2.2668 = 2.8574A + 1.7743b (Equation 5)
    On solving these equations, we get:
    b = 0.5140
    A = 0.4741
    Now, A = log(a)
    So, log(a) = 0.4741
    a = 10^0.4741
    a ≈ 2.9792
  4. Equation 1 becomes:
    y = 2.9792 x^0.5140
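
The power-curve fit follows the same pattern with both variables log-transformed. The sketch below uses base-10 logarithms as in the worked solution; `fit_power` is a name invented for this illustration.

```python
import math


def fit_power(xs, ys):
    """Fit y = a * x**b by a least-squares line through (log10 x, log10 y)."""
    Xs = [math.log10(x) for x in xs]
    Ys = [math.log10(y) for y in ys]
    n = len(xs)
    sX, sY = sum(Xs), sum(Ys)
    sXX = sum(X * X for X in Xs)
    sXY = sum(X * Y for X, Y in zip(Xs, Ys))
    # Normal equations for Y = A + b*X, where A = log10(a)
    b = (n * sXY - sX * sY) / (n * sXX - sX * sX)
    A = (sY - b * sX) / n
    return 10 ** A, b


a, b = fit_power([1, 2, 3, 4, 5, 6], [2.98, 4.26, 5.21, 6.10, 6.80, 7.50])
print(round(a, 3), round(b, 3))
```

Because the code keeps full floating-point precision, its output may differ from the tabulated a ≈ 2.9792 and b = 0.5140 in the last digit or two.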

Correlation

Karl Pearson's coefficient of correlation

Example Question: Find the coefficient of correlation for the given data:

    x  |  6  |  2  |  10  |  4  |  8
    y  |  9  | 11  |  5   |  8  |  7  
                    

Solution:

  • The values are numerically small, so we work with deviations from the actual means and use the formula:
    r = ∑XY / sqrt(∑X² * ∑Y²)
    where X = x - mean(x) and Y = y - mean(y)
  • The number of given data points, n = 5
    Let's find mean(x) = sum of x / n = (6 + 2 + 10 + 4 + 8) / 5 = 6
    mean(y) = sum of y / n = (9 + 11 + 5 + 8 + 7) / 5 = 8
    Therefore, X = x - mean(x) = x - 6
    Therefore, Y = y - mean(y) = y - 8
  • Tabular calculation:
        x  |  y  |  X  |  Y  |  X²  |  Y²  |  XY
        ------------------------------------------
        6  |  9  |  0  |  1  |  0   |  1   |  0
        2  | 11  | -4  |  3  | 16   |  9   | -12
        10 |  5  |  4  | -3  | 16   |  9   | -12
        4  |  8  | -2  |  0  |  4   |  0   |  0
        8  |  7  |  2  | -1  |  4   |  1   | -2
        ------------------------------------------
        ∑  |     |     |     | 40   | 20   | -26
                            
  • r = -26 / sqrt(40 * 20)
    Therefore, r = -0.9192
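
The deviation-from-mean calculation can be reproduced with a short Python function. This is only an illustrative sketch; `pearson_r` is a name chosen for this example.

```python
import math


def pearson_r(xs, ys):
    """Karl Pearson's r using deviations from the actual means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    X = [x - mean_x for x in xs]               # deviations of x
    Y = [y - mean_y for y in ys]               # deviations of y
    sXY = sum(p * q for p, q in zip(X, Y))
    sXX = sum(p * p for p in X)
    sYY = sum(q * q for q in Y)
    return sXY / math.sqrt(sXX * sYY)


r = pearson_r([6, 2, 10, 4, 8], [9, 11, 5, 8, 7])
print(round(r, 4))  # -0.9192
```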

Question: Find the correlation for the following data:

    x  |  122  |  135  |  143  |  103  |  156 | 178 | 190
    y  |   46  |   78  |  89   |   46  |   82 |  52 |  96 
                    

Solution:

  • The number of given data points, n = 7
    mean(x) = 1027 / 7 ≈ 146.71
    mean(y) = 489 / 7 ≈ 69.86
    As mean(x) and mean(y) are not integers, we use the method of assumed means: take the nearest integers, 147 and 70, as the assumed means and measure deviations from them.
    Now, X = x - 147
    and Y = y - 70
    Because the deviations are taken from assumed means rather than the actual means, ∑X and ∑Y are not zero, so we use the following formula:
    r = (n∑XY - ∑X∑Y) / sqrt((n∑X² - (∑X)²) * (n∑Y² - (∑Y)²))
  • Tabular calculation:
        x   |  y  |   X   |   Y   |   X²   |   Y²   |   XY
        ---------------------------------------------------
        122 |  46 |  -25  |  -24  |  625   |  576   |   600
        135 |  78 |  -12  |   8   |  144   |   64   |   -96
        143 |  89 |   -4  |  19   |   16   |  361   |   -76
        103 |  46 |  -44  |  -24  | 1936   |  576   |  1056
        156 |  82 |    9  |  12   |   81   |  144   |   108
        178 |  52 |   31  |  -18  |  961   |  324   |  -558
        190 |  96 |   43  |   26  | 1849   |  676   |  1118
        ----------------------------------------------------
        ∑   |     |   -2  |   -1  | 5612   | 2721   |  2152

  • Putting these values into the formula:
    r = (7 * 2152 - (-2)(-1)) / sqrt((7 * 5612 - (-2)²) * (7 * 2721 - (-1)²))
    r = 15062 / sqrt(39280 * 19046)
    r ≈ 0.5507
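
The assumed-mean calculation can be checked with the sketch below. The function name `pearson_r_assumed` and its parameters `ax` and `ay` (the assumed means) are hypothetical, introduced only for this illustration. A useful property of the formula is that the result does not depend on which assumed means are chosen, which the last line demonstrates by using 0 and 0.

```python
import math


def pearson_r_assumed(xs, ys, ax, ay):
    """Pearson's r from deviations about assumed means ax and ay."""
    n = len(xs)
    X = [x - ax for x in xs]
    Y = [y - ay for y in ys]
    sX, sY = sum(X), sum(Y)
    sXX = sum(p * p for p in X)
    sYY = sum(q * q for q in Y)
    sXY = sum(p * q for p, q in zip(X, Y))
    num = n * sXY - sX * sY
    den = math.sqrt((n * sXX - sX ** 2) * (n * sYY - sY ** 2))
    return num / den


xs = [122, 135, 143, 103, 156, 178, 190]
ys = [46, 78, 89, 46, 82, 52, 96]
r = pearson_r_assumed(xs, ys, 147, 70)
print(round(r, 4))
# Same value with any other assumed means, for example 0 and 0:
print(round(pearson_r_assumed(xs, ys, 0, 0), 4))
```

Both calls should print a value near 0.55.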

Spearman's Rank Correlation

  • Used to determine the correlation between two variables that cannot be measured quantitatively but can be ranked.
  • Examples are qualitative features such as honesty, beauty, character, morality, etc.
  • The rank correlation coefficient is given by:
    r = 1 - (6∑D²) / (n³ - n)
    where D is the difference between the two ranks of an item and n is the number of items.
  • Rank correlation coefficient has three cases:
    1. When actual ranks are given
    2. When ranks are not given
    3. In case of equal ranks
  • The value of the rank correlation coefficient (r) can range from -1 to 1, indicating the strength and direction of the relationship:
    • If r = 1, there is a perfect positive correlation.
    • If 0.7 < r < 1, there is a strong positive correlation.
    • If 0.3 < r ≤ 0.7, there is a moderate positive correlation.
    • If 0 < r ≤ 0.3, there is a weak positive correlation.
    • If r = 0, there is no correlation.
    • If -0.3 ≤ r < 0, there is a weak negative correlation.
    • If -0.7 ≤ r < -0.3, there is a moderate negative correlation.
    • If -1 ≤ r < -0.7, there is a strong negative correlation.
    • If r = -1, there is a perfect negative correlation.
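
The interpretation bands listed above translate directly into a small lookup function. The cut-offs 0.3 and 0.7 are taken from the list; the function name `describe_correlation` is an assumption made for this sketch.

```python
def describe_correlation(r):
    """Map a correlation coefficient to the verbal bands listed above."""
    if not -1 <= r <= 1:
        raise ValueError("r must lie between -1 and 1")
    if r == 0:
        return "no correlation"
    direction = "positive" if r > 0 else "negative"
    size = abs(r)
    if size == 1:
        strength = "perfect"
    elif size > 0.7:
        strength = "strong"
    elif size > 0.3:
        strength = "moderate"
    else:
        strength = "weak"
    return f"{strength} {direction} correlation"


print(describe_correlation(-0.1))    # weak negative correlation
print(describe_correlation(0.964))   # strong positive correlation
```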

Case 1: When actual ranks are given

Question: The preferences of two persons A and B for different car brands are given in the following table:

Car Brand  | Person A | Person B
--------------------------------
   Suzuki  |    5     |    1
  Hyundai  |    4     |    4
     Tata  |    1     |    2
    Skoda  |    3     |    5
   Toyota  |    2     |    3
    
Determine the coefficient of rank correlation between the preferences of persons A and B and interpret your result.

Solution:

  • Tabular Calculation to find D and D²
    R1 | R2 | D = R1 - R2 |  D²
    ----------------------------
     5 |  1 |      4      |  16
     4 |  4 |      0      |   0
     1 |  2 |     -1      |   1
     3 |  5 |     -2      |   4
     2 |  3 |     -1      |   1
    ----------------------------
    ∑  |    |      0      |  22
            
  • Now the formula is:
    r = 1 - (6∑D²) / (n³ - n), with n = 5
    r = 1 - ((6 * 22) / (125 - 5))
    r = 1 - (132 / 120)
    r = 1 - 1.1
    r = -0.1
    The value of r shows that there is a very weak negative correlation between the preferences of persons A and B.
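
When the ranks are already given, the whole calculation reduces to one sum. A minimal sketch follows; `spearman_from_ranks` is a name invented for this example, and it assumes there are no tied ranks.

```python
def spearman_from_ranks(r1, r2):
    """Spearman's rank correlation from two lists of untied ranks."""
    n = len(r1)
    d2 = sum((a - b) ** 2 for a, b in zip(r1, r2))   # sum of D^2
    return 1 - 6 * d2 / (n ** 3 - n)


# Ranks given by persons A and B for Suzuki, Hyundai, Tata, Skoda, Toyota
r = spearman_from_ranks([5, 4, 1, 3, 2], [1, 4, 2, 5, 3])
print(round(r, 4))  # -0.1
```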

Case 2: When ranks are not given

Question: Calculate the rank correlation coefficient from the given data:

Marks in Computer (x):  20  30  40  50  60  70  80
Marks in English (y) :  14   5  30  32  40  45  65

Solution:

  • Tabular calculation, where ranks are assigned in ascending order (the lowest value gets rank 1):
    x  | Rx |  y | Ry |  D  | D²
    -----------------------------
    20 |  1 | 14 |  2 | -1  |  1
    30 |  2 |  5 |  1 |  1  |  1
    40 |  3 | 30 |  3 |  0  |  0
    50 |  4 | 32 |  4 |  0  |  0
    60 |  5 | 40 |  5 |  0  |  0
    70 |  6 | 45 |  6 |  0  |  0
    80 |  7 | 65 |  7 |  0  |  0
    -----------------------------
    ∑  |    |    |    |  0  |  2
            
  • Using the formula: r = 1 - (6∑D²) / (n³ - n)
    Here n = 7
    r = 1 - ((6 * 2) / (343 - 7))
    r = 1 - (12 / 336)
    r = 1 - 0.0357
    r = 0.964
    The value of r shows that there is a very strong positive correlation between the marks in Computer and English.
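
When only raw scores are given, the ranking step can be automated. The sketch below assumes that no value is repeated within a list; the helper names `ranks` and `spearman` are made up for this illustration.

```python
def ranks(values):
    """Rank 1 for the smallest value; assumes no repeated values."""
    order = sorted(values)
    return [order.index(v) + 1 for v in values]


def spearman(xs, ys):
    """Spearman's rank correlation for data without ties."""
    n = len(xs)
    rx, ry = ranks(xs), ranks(ys)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n ** 3 - n)


computer = [20, 30, 40, 50, 60, 70, 80]
english = [14, 5, 30, 32, 40, 45, 65]
print(ranks(english))                          # [2, 1, 3, 4, 5, 6, 7]
print(round(spearman(computer, english), 3))   # 0.964
```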

Case 3: Equal ranks are given

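  • If a value is repeated 'p' times in the data, the tied items share the average of the ranks they would otherwise occupy, and a correction factor of (p³ - p) / 12 is added to ∑D² for that group of ties.
    i.e.,
    if there are 2 groups of tied values, 2 such correction terms are added, one per group, each with its own p.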

Question: Calculate the rank correlation coefficient for the following data:

x  | 12 | 15 | 18 | 20 | 16 | 15 | 18 | 22 | 15 | 21 | 18 | 15
---------------------------------------------------------------
y  | 10 | 18 | 19 | 12 | 15 | 19 | 17 | 19 | 16 | 14 | 13 | 17

Solution:

  • Tabular calculation:
    x  |  y  |  Rx | Ry |   D   |  D²
    --------------------------------------
    12 |  10 |   1 |  1 |    0  |      0
    15 |  18 | 3.5 |  9 | -5.5  |  30.25
    18 |  19 |   8 | 11 |   -3  |      9
    20 |  12 |  10 |  2 |    8  |     64
    16 |  15 |   6 |  5 |    1  |      1
    15 |  19 | 3.5 | 11 | -7.5  |  56.25
    18 |  17 |   8 |7.5 |  0.5  |   0.25
    22 |  19 |  12 | 11 |    1  |      1
    15 |  16 | 3.5 |  6 | -2.5  |   6.25
    21 |  14 |  11 |  4 |    7  |     49
    18 |  13 |   8 |  3 |    5  |     25
    15 |  17 | 3.5 |7.5 |   -4  |     16
    ---------------------------------------
    ∑  |     |     |    |    0  |   258
            
    • For ranking x variable:
      • Rank of 15 = (2 + 3 + 4 + 5) / 4 = 3.5
      • Rank of 18 = (7 + 8 + 9) / 3 = 8
    • For ranking y variable:
      • Rank of 17 = (7 + 8) / 2 = 7.5
      • Rank of 19 = (10 + 11 + 12) / 3 = 11

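  • Correction factors: (4³ - 4)/12 = 5 for 15 in x, (3³ - 3)/12 = 2 for 18 in x, (2³ - 2)/12 = 0.5 for 17 in y, and (3³ - 3)/12 = 2 for 19 in y. Here n = 12, so n³ - n = 1716.
    r = 1 - ((6 * (258 + 5 + 2 + 0.5 + 2)) / 1716)
    r = 1 - (1605 / 1716)
    r ≈ 0.0647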
    The value of r shows that there is a very weak positive correlation between the variables.
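
The tie handling can be sketched in Python as follows. The helper names (`average_ranks`, `tie_correction`, `spearman_with_ties`) are hypothetical, and the code follows the correction-factor method used in this example, which is not the only way to treat ties.

```python
def average_ranks(values):
    """Ascending ranks; tied values share the average of their positions."""
    order = sorted(values)
    # First 1-based position of v, shifted to the middle of its tied group.
    return [order.index(v) + 1 + (order.count(v) - 1) / 2 for v in values]


def tie_correction(values):
    """Sum of (p^3 - p)/12 over every group of p tied values."""
    counts = {}
    for v in values:
        counts[v] = counts.get(v, 0) + 1
    return sum((p ** 3 - p) / 12 for p in counts.values() if p > 1)


def spearman_with_ties(xs, ys):
    n = len(xs)
    rx, ry = average_ranks(xs), average_ranks(ys)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    cf = tie_correction(xs) + tie_correction(ys)
    return 1 - 6 * (d2 + cf) / (n ** 3 - n)


xs = [12, 15, 18, 20, 16, 15, 18, 22, 15, 21, 18, 15]
ys = [10, 18, 19, 12, 15, 19, 17, 19, 16, 14, 13, 17]
print(round(spearman_with_ties(xs, ys), 4))
```

For this data the sum of D² is 258 and the correction terms add up to 9.5, so the printed value should be close to 0.065.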
