Lesson 1
Introduction to Curve Fitting

Welcome to the first lesson on Introduction to Curve Fitting in this course. Curve fitting is a method of finding a mathematical function that provides the best fit to a series of data points. It is an essential concept in data analysis, helping us model and predict behaviors in various fields. For example, businesses might use curve fitting to forecast sales trends, and scientists might use it to analyze experimental data.

In this lesson, we will use Python to perform curve fitting using the SciPy library, focusing particularly on a linear model. By the end of this lesson, you will understand how to implement a linear curve-fitting process, visualize it, and interpret the results.

Defining a Linear Model Function

To perform curve fitting, we need a model function. A linear model is a fundamental starting point, often expressed as y = ax + b, where a is the slope and b is the intercept.

Let's define a simple linear model function in Python:

```python
def linear_model(x, a, b):
    return a * x + b
```

Here, the function linear_model takes in an input variable x and parameters a (slope) and b (intercept) to return the predicted y value.

Example 1: Calling the function with specific values

```python
# Call the function with x=5, a=3, b=2
result = linear_model(5, 3, 2)
print(result)  # Output: 17
```

In this example, we call linear_model with x=5, a=3, and b=2, which calculates the y value as 3 * 5 + 2 = 17.

Example 2: Using list comprehension

```python
# Use list comprehension to apply the function to a list of x values
x_values = [1, 2, 3, 4, 5]
results = [linear_model(x, 3, 2) for x in x_values]
print(results)  # Output: [5, 8, 11, 14, 17]
```

Here, we use list comprehension to apply linear_model to each element in x_values, with a=3 and b=2, resulting in a list of y values.
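Because linear_model uses only multiplication and addition, it also works unchanged on NumPy arrays, evaluating the model for every element at once. This is a small sketch of that idea; NumPy itself is introduced in the next section:

```python
import numpy as np

def linear_model(x, a, b):
    return a * x + b

# Passing a NumPy array evaluates the model element-wise, no loop needed
x_values = np.array([1, 2, 3, 4, 5])
results = linear_model(x_values, 3, 2)
print(results)  # [ 5  8 11 14 17]
```

This vectorized style is what we rely on later when computing fitted values for all data points in one call.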

Creating Synthetic Data for Practice

To learn curve fitting practically, we use synthetic data that shows linear behavior with some noise added to mimic real-world data variations.

First, we generate some synthetic data:

```python
import numpy as np
import matplotlib.pyplot as plt

# Generate x values and calculate y values with noise
x_data = np.linspace(0, 10, num=20)
y_data = 3.5 * x_data + 2 + np.random.normal(size=x_data.size)

# Print x and y values
for x, y in zip(x_data, y_data):
    print(f'x = {x:.2f}, y = {y:.2f}')

# Plot the data
plt.scatter(x_data, y_data, color='black')
plt.grid()
plt.show()
```
  • np.linspace(0, 10, num=20): This generates 20 evenly spaced points between 0 and 10, representing our x values.
  • 3.5 * x_data + 2: This creates the true linear relationship.
  • np.random.normal(size=x_data.size): Adds random noise to the y values, simulating measurement errors or other real-world factors.

Run this code to observe the generated data.

Applying SciPy's `curve_fit` for Curve Fitting

Now, let's fit our linear model to the generated synthetic data using SciPy's curve_fit function.

```python
from scipy.optimize import curve_fit

params, covariance = curve_fit(linear_model, x_data, y_data)
print(params)  # Example output: [3.48448829 2.2934106 ]
```
  • params: An array that holds the optimal values for parameters a and b.
  • covariance: The estimated covariance matrix of the parameters, which indicates how uncertain the fitted values are.

By fitting the model, we determine the values of a and b that best describe the linear trend in our noisy data.
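The covariance matrix can also be turned into rough standard errors for a and b: its diagonal holds each parameter's variance, so the square root of the diagonal estimates each parameter's uncertainty. A sketch (the exact numbers depend on the random noise, so this example seeds the generator purely for reproducibility):

```python
import numpy as np
from scipy.optimize import curve_fit

def linear_model(x, a, b):
    return a * x + b

np.random.seed(0)  # seeded here only so the example is reproducible
x_data = np.linspace(0, 10, num=20)
y_data = 3.5 * x_data + 2 + np.random.normal(size=x_data.size)

params, covariance = curve_fit(linear_model, x_data, y_data)

# Standard errors: square roots of the diagonal of the covariance matrix
perr = np.sqrt(np.diag(covariance))
print(f'a = {params[0]:.2f} ± {perr[0]:.2f}')
print(f'b = {params[1]:.2f} ± {perr[1]:.2f}')
```

Small standard errors relative to the parameter values suggest the fit pinned down the slope and intercept well.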

Using Obtained Parameters to Generate the Fitted Curve

Once we have obtained the parameters from the curve_fit function, we can use them to calculate the predicted y values for our model directly. These values represent the curve that best fits the data.

```python
# Extract the optimal parameters
a_opt, b_opt = params

# Use the model function with these parameters
fitted_y = linear_model(x_data, a_opt, b_opt)
```
  • a_opt, b_opt: The optimal values for the slope and intercept obtained from curve_fit.
  • fitted_y: The y values calculated using the linear_model function and the optimal parameters, representing the resulting curve.

Also, you can obtain fitted_y like this:

```python
# Use the model function with these parameters
fitted_y = linear_model(x_data, *params)
```

Python will unpack params to a and b.

Finally, you can plot fitted_y against x_data to visualize the curve, as demonstrated in the visualization section.

Visualizing Curve Fitting Results

Visualization helps us see how well our model fits the data. We will plot the original data points and the fitted line.

```python
plt.scatter(x_data, y_data, label='Data')
plt.plot(x_data, fitted_y, label='Fitted line', color='red')
plt.legend()
plt.grid()
plt.show()
```
  • plt.scatter(...) plots the original data as a scatter plot.
  • plt.plot(...) draws the fitted line using the parameters from curve_fit.

Running this code produces a plot in which the red fitted line passes through the scattered data points, showing that the linear model has been successfully fitted to the provided data.

Understanding Mean Squared Error (MSE)

Mean Squared Error (MSE) is a common metric used to evaluate the accuracy of a model's predictions. It measures the average squared difference between the actual data points and the predicted values. A lower MSE indicates a better fit of the model to the data.

The formula for MSE is:

$$\text{MSE} = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$$

where:

  • $n$ is the number of data points,
  • $y_i$ is the actual value,
  • $\hat{y}_i$ is the predicted value.

Here's a short code snippet to calculate MSE for our fitted model:

```python
# Calculate MSE
mse = np.mean((y_data - fitted_y) ** 2)
print(f'Mean Squared Error: {mse:.2f}')
```

In this snippet, y_data represents the actual data points, and fitted_y represents the predicted values from our linear model. The np.mean function computes the average of the squared differences, giving us the MSE.
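To see why a lower MSE indicates a better fit, you can compare the fitted parameters against a deliberately poor guess; the fitted line should produce a noticeably smaller error. A sketch (seeded only so the comparison is reproducible; the guess a=1, b=0 is arbitrary):

```python
import numpy as np
from scipy.optimize import curve_fit

def linear_model(x, a, b):
    return a * x + b

np.random.seed(0)
x_data = np.linspace(0, 10, num=20)
y_data = 3.5 * x_data + 2 + np.random.normal(size=x_data.size)
params, _ = curve_fit(linear_model, x_data, y_data)

# MSE of the fitted model vs. an arbitrary bad guess (a=1, b=0)
mse_fitted = np.mean((y_data - linear_model(x_data, *params)) ** 2)
mse_guess = np.mean((y_data - linear_model(x_data, 1.0, 0.0)) ** 2)

print(f'MSE (fitted):    {mse_fitted:.2f}')
print(f'MSE (a=1, b=0):  {mse_guess:.2f}')
```

The fitted parameters minimize the sum of squared residuals, so no other choice of a and b can achieve a lower MSE on this data.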

Full Runnable Code

Here is the complete code for this lesson:

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit

# Define the linear model function
def linear_model(x, a, b):
    return a * x + b

# Generate x values and calculate y values with noise
x_data = np.linspace(0, 10, num=20)
y_data = 3.5 * x_data + 2 + np.random.normal(size=x_data.size)

# Fit the model to the data
params, covariance = curve_fit(linear_model, x_data, y_data)

# Extract the optimal parameters
a_opt, b_opt = params

# Use the model function with these parameters
fitted_y = linear_model(x_data, a_opt, b_opt)

# Visualize the results
plt.scatter(x_data, y_data, label='Data')
plt.plot(x_data, fitted_y, label='Fitted line', color='red')
plt.legend()
plt.grid()
plt.show()

# Calculate MSE
mse = np.mean((y_data - fitted_y) ** 2)
print(f'Mean Squared Error: {mse:.2f}')
```

You can run this code in the playground and experiment with parameters.

Summary and Next Steps

In this lesson, you learned the basics of curve fitting using a linear model, from generating synthetic data to fitting the model and visualizing the results. You now have a foundational understanding of how to perform curve fitting using SciPy.

As you advance, practice these techniques with various datasets and models to gain confidence. The upcoming practice exercises will give you a chance to apply what you've learned and further solidify your understanding. Keep moving forward, as this skill is integral to data analysis and modeling in countless real-world scenarios.
