Welcome to the first lesson of this course, Introduction to Curve Fitting. Curve fitting is a method of finding a mathematical function that provides the best fit to a series of data points. It is an essential concept in data analysis, helping us model and predict behaviors in various fields. For example, businesses might use curve fitting to forecast sales trends, and scientists might use it to analyze experimental data.
In this lesson, we will use Python to perform curve fitting with the SciPy library, focusing particularly on a linear model. By the end of this lesson, you will understand how to implement a linear curve-fitting process, visualize it, and interpret the results.
To perform curve fitting, we need a model function. A linear model is a fundamental starting point, often expressed as `y = ax + b`, where `a` is the slope and `b` is the intercept.
Let's define a simple linear model function in Python:
```python
def linear_model(x, a, b):
    return a * x + b
```
Here, the function `linear_model` takes in an input variable `x` and parameters `a` (slope) and `b` (intercept) to return the predicted `y` value.
Example 1: Calling the function with specific values
```python
# Call the function with x=5, a=3, b=2
result = linear_model(5, 3, 2)
print(result)  # Output: 17
```
In this example, we call `linear_model` with `x=5`, `a=3`, and `b=2`, which calculates the `y` value as `3 * 5 + 2 = 17`.
Example 2: Using list comprehension
```python
# Use list comprehension to apply the function to a list of x values
x_values = [1, 2, 3, 4, 5]
results = [linear_model(x, 3, 2) for x in x_values]
print(results)  # Output: [5, 8, 11, 14, 17]
```
Here, we use a list comprehension to apply `linear_model` to each element in `x_values`, with `a=3` and `b=2`, resulting in a list of `y` values.
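As a side note, because `linear_model` uses only multiplication and addition, it also works element-wise on NumPy arrays, so the list comprehension becomes unnecessary once you switch to arrays. Here is a minimal sketch, assuming the `linear_model` function defined above is available:

```python
import numpy as np

# The same computation, vectorized: linear_model works element-wise on NumPy arrays
x_values = np.array([1, 2, 3, 4, 5])
results = linear_model(x_values, 3, 2)
print(results)  # Output: [ 5  8 11 14 17]
```

This vectorized style is the one we rely on throughout the rest of the lesson, since the data we generate next is stored in NumPy arrays.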
To learn curve fitting hands-on, we will use synthetic data that follows a linear trend, with some noise added to mimic real-world variation.
First, we generate some synthetic data:
```python
import numpy as np
import matplotlib.pyplot as plt

# Generate x values and calculate y values with noise
x_data = np.linspace(0, 10, num=20)
y_data = 3.5 * x_data + 2 + np.random.normal(size=x_data.size)

# Print x and y values
for x, y in zip(x_data, y_data):
    print(f'x = {x:.2f}, y = {y:.2f}')

# Plot the data
plt.scatter(x_data, y_data, color='black')
plt.grid()
plt.show()
```
- `np.linspace(0, 10, num=20)`: generates 20 evenly spaced points between 0 and 10, representing our `x` values.
- `3.5 * x_data + 2`: creates the true linear relationship.
- `np.random.normal(size=x_data.size)`: adds random noise to the `y` values, simulating measurement errors or other real-world factors.
Run this code to observe the generated data.
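Because the noise comes from `np.random.normal`, every run produces slightly different data. If you want reproducible results while experimenting, you can optionally seed NumPy's random number generator before generating the data; this is a small optional sketch, not part of the lesson's main code:

```python
import numpy as np

# Optional: fix the random seed so the noisy data is the same on every run
np.random.seed(42)

x_data = np.linspace(0, 10, num=20)
y_data = 3.5 * x_data + 2 + np.random.normal(size=x_data.size)
```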
Now, let's fit our linear model to the generated synthetic data using SciPy's `curve_fit` function.
```python
from scipy.optimize import curve_fit

params, covariance = curve_fit(linear_model, x_data, y_data)
print(params)  # Example output: [3.48448829 2.2934106 ] (your values will differ slightly)
```
- `params`: an array holding the optimal values for the parameters `a` and `b`.
- `covariance`: the estimated covariance matrix of those parameters, which indicates how uncertain the fitted values are.
By fitting the model, we determine the values of `a` and `b` that best describe the linear trend in our noisy data.
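If you want a rough sense of how reliable the fitted parameters are, a common pattern is to take the square roots of the diagonal of the covariance matrix, which gives one-standard-deviation uncertainties for `a` and `b`. This is a small optional sketch building on the `params` and `covariance` returned above:

```python
import numpy as np

# One-standard-deviation uncertainties of the fitted parameters,
# taken from the diagonal of the covariance matrix
perr = np.sqrt(np.diag(covariance))
print(f'a = {params[0]:.2f} +/- {perr[0]:.2f}')
print(f'b = {params[1]:.2f} +/- {perr[1]:.2f}')
```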
Once we have obtained the parameters from the `curve_fit` function, we can use them to calculate the predicted `y` values for our model directly. These values represent the curve that best fits the data.
```python
# Extract the optimal parameters
a_opt, b_opt = params

# Use the model function with these parameters
fitted_y = linear_model(x_data, a_opt, b_opt)
```
- `a_opt`, `b_opt`: the optimal values for the slope and intercept obtained from `curve_fit`.
- `fitted_y`: the `y` values calculated using the `linear_model` function and the optimal parameters, representing the resulting curve.
Alternatively, you can obtain `fitted_y` like this:
```python
# Use the model function with these parameters
fitted_y = linear_model(x_data, *params)
```
Python will unpack `params` into `a` and `b`.
Finally, you can plot `fitted_y` against `x_data` to visualize the curve, as demonstrated in the visualization section.
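The fitted parameters are not limited to the original data points: you can also use them to predict `y` at new `x` values that were not in the dataset. Here is a minimal sketch, where `new_x` holds hypothetical values chosen purely for illustration:

```python
import numpy as np

# Predict y at new, hypothetical x values using the fitted parameters
new_x = np.array([12.0, 15.0])
predicted_y = linear_model(new_x, *params)
print(predicted_y)  # values close to 3.5 * new_x + 2, depending on the fit
```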
Visualization helps us see how well our model fits the data. We will plot the original data points and the fitted line.
```python
plt.scatter(x_data, y_data, label='Data')
plt.plot(x_data, fitted_y, label='Fitted line', color='red')
plt.legend()
plt.grid()
plt.show()
```
- `plt.scatter(...)` plots the original data as a scatter plot.
- `plt.plot(...)` draws the fitted line computed from the parameters returned by `curve_fit`.
Here is the resulting plot:
As you can see, the defined function has been successfully fitted to the provided data.
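Another quick visual check, complementary to the plot above, is to look at the residuals: the differences between the actual and predicted `y` values. For a good linear fit, the residuals should scatter randomly around zero with no obvious pattern. This is an optional sketch using the arrays already defined in this lesson:

```python
# Residuals: differences between the actual and predicted y values
residuals = y_data - fitted_y

plt.scatter(x_data, residuals, color='purple', label='Residuals')
plt.axhline(0, color='gray', linestyle='--')
plt.legend()
plt.grid()
plt.show()
```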
Mean Squared Error (MSE) is a common metric used to evaluate the accuracy of a model's predictions. It measures the average squared difference between the actual data points and the predicted values. A lower MSE indicates a better fit of the model to the data.
The formula for MSE is:

$$\text{MSE} = \frac{1}{n} \sum_{i=1}^{n} \left(y_i - \hat{y}_i\right)^2$$

where:
- $n$ is the number of data points,
- $y_i$ is the actual value,
- $\hat{y}_i$ is the predicted value.
Here's a short code snippet to calculate MSE for our fitted model:
```python
# Calculate MSE
mse = np.mean((y_data - fitted_y) ** 2)
print(f'Mean Squared Error: {mse:.2f}')
```
In this snippet, `y_data` represents the actual data points, and `fitted_y` represents the predicted values from our linear model. The `np.mean` function computes the average of the squared differences, giving us the MSE.
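One small note: because MSE is expressed in squared units of `y`, its square root, the Root Mean Squared Error (RMSE), is often easier to interpret since it is in the same units as the data. This is an optional extra, not required for the lesson:

```python
import numpy as np

# RMSE: the square root of MSE, expressed in the same units as y
rmse = np.sqrt(mse)
print(f'Root Mean Squared Error: {rmse:.2f}')
```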
Here is the complete code for this lesson:
```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit

# Define the linear model function
def linear_model(x, a, b):
    return a * x + b

# Generate x values and calculate y values with noise
x_data = np.linspace(0, 10, num=20)
y_data = 3.5 * x_data + 2 + np.random.normal(size=x_data.size)

# Fit the model to the data
params, covariance = curve_fit(linear_model, x_data, y_data)

# Extract the optimal parameters
a_opt, b_opt = params

# Use the model function with these parameters
fitted_y = linear_model(x_data, a_opt, b_opt)

# Visualize the results
plt.scatter(x_data, y_data, label='Data')
plt.plot(x_data, fitted_y, label='Fitted line', color='red')
plt.legend()
plt.grid()
plt.show()

# Calculate MSE
mse = np.mean((y_data - fitted_y) ** 2)
print(f'Mean Squared Error: {mse:.2f}')
```
You can run this code in the playground and experiment with parameters.
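For example, one experiment you might try is increasing the noise level via the `scale` argument of `np.random.normal` and observing how the fitted parameters and MSE change. The sketch below is just a suggested variation on the complete code above; the names `noisy_y`, `noisy_params`, and `noisy_mse` are introduced here for illustration:

```python
# Generate noisier data by increasing the standard deviation of the noise
noisy_y = 3.5 * x_data + 2 + np.random.normal(scale=3.0, size=x_data.size)

# Refit the model and recompute the MSE on the noisier data
noisy_params, _ = curve_fit(linear_model, x_data, noisy_y)
noisy_mse = np.mean((noisy_y - linear_model(x_data, *noisy_params)) ** 2)

print(f'Fitted parameters: {noisy_params}')
print(f'MSE with extra noise: {noisy_mse:.2f}')
```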
In this lesson, you learned the basics of curve fitting using a linear model, from generating synthetic data to fitting the model and visualizing the results. You now have a foundational understanding of how to perform curve fitting using SciPy.
As you advance, practice these techniques with various datasets and models to gain confidence. The upcoming practice exercises will give you a chance to apply what you've learned and further solidify your understanding. Keep moving forward, as this skill is integral to data analysis and modeling in countless real-world scenarios.