Regression analysis is useful in statistics as it allows us to identify trends in the data. If you find yourself wondering if there is a correlation or relationship between variables, then regression analysis might be worth exploring. As it will help identify and quantify trends, as well as aid in future predictions. Here are the basics of regression analysis using Excel and Tableau.
An overly simple example might be looking at total bill amount and tip amount. Common sense would tell us that yes, there is a positive correlation between the two, as the bill amount increases, so does the tip amount…or at least it should! So, instead of relying on common sense or our gut instinct to make a prediction, we can utilize regression analysis. Fortunately, there is plenty of technology out there that can perform such analysis, so you don’t have to do the calculations manually. IBM SPSS Modeler, Tableau and Excel are all viable options, just to name a few!
In my example of regression analysis basics, I will be using Excel and Tableau to briefly walk you through how easy it is to perform linear regression on a sample data set that has recorded the average snowfall from 2004 to 2017, as well as how you can use linear regression to predict the average snowfall for future years.
Essentially, regression is your “best guess” at predicting what might happen in the future by using the data available to you. By plotting the predictor variable (dependent variable) and the criterion variable (independent variable).
Excel:
With your columns highlighted, select the Insert tab and click on the insert scatter chart icon. Select the first option, which is a scatter plot.
Now that we have a scatter chart created lets add some chart elements to jazz it up a bit! Click your chart to bring up the Chart Tools options in the ribbon and select Design. Click the Add Chart Element icon on the far left. Scroll down to Trendline and hover to expand the available options. Select Linear.
To add further detail to the chart repeat the previous steps and select More Trendline Options.
Check the two check boxes in the screenshot.
And here is the result. Just by eyeballing the graph you can arrive at a prediction for 2018. Looks like the average amount of snowfall will be around 5-10 inches, based on the slope of the line.
If you wanted to be more accurate, simply plug in and x value (the year) into the regression equation.
For example, for 2018:
y = -2.2923(2018) + 4624.4 = 5.4155 inches.
This is pretty accurate compared to our original guess based on simply looking at the slope of the line.
We are also given an R squared value, which is essentially telling you how good your model is. The values range from 0 to 1, with 1 being a perfect model. So our R squared value of 0.702 is quite acceptable and is assurance that you can be pretty confident in your predictions.
Tableau:
Once you’ve imported your data. Drag years to the column shelf and your measure (Amount in inches) to the rows shelf. Tableau automatically chooses a visualization for you based on your inputs and in this case it chose a line graph.
In the left hand column under the marks card, select the dropdown arrow next to Automatic and select Circle to get rid of the line.
Next right click anywhere on the canvas to bring up the more options menu. Hover over Trend Lines to expand and select Show Trend Lines.
And there you have it! A quick and easy way to add a trend line to your scatter plot. If you hover your mouse over the trend line you are presented with summary statistics such as the regression equation, R-squared value and the p-value.
Conclusion:
This blog focuses on simple linear regression, which is when you have one independent variable (x-variable) for a single dependent variable (y-variable). There are more advanced techniques like multiple linear regression, which is when you have multiple dependent variables for a single independent variable. For example, if you’re the owner of a transportation company that deliverers furniture and you want to be able to predict your drivers total travel time based on two factors (independent variables): total distance traveled and number of deliveries.
This was a very high level introduction to regression analysis, but this type of statistical analysis can be applied to many areas of business from predicting customer churn, future sales or student enrollment. If you can see how this might benefit your business and would like to have a more detailed discussion please shoot me an email at:
Kristi Smith,
ksmith@ecapitaladvisors.com