Located in the Graphs palette you will see a node called Evaluation. Evaluation nodes compare your predictive model against a baseline (the expected response for the entire sample if no model were used at all, also known as an “at-chance” model) and a perfect prediction model (a model that makes no errors when making a prediction).
Add the Evaluation node to the canvas and connect it to the current model. Double-click the Evaluation node to edit it.
We are going to create a Gains chart. Gains are defined as the proportion of total hits that occur in each quantile or increment.
Gains = (# of hits in quantile/total # of hits) x 100
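To make the formula concrete, here is a rough Python sketch of the same calculation. It is only an illustration of the definition, not what Modeler runs internally; the function name `gains_by_quantile` and the toy data are made up for this example.

```python
def gains_by_quantile(hits, scores, n_quantiles=10):
    """Return the percentage of all hits that falls in each quantile.

    hits   -- list of 0/1 flags (1 = hit, e.g. the student enrolled)
    scores -- model confidence for each record (higher = more likely a hit)
    """
    # Sort records from most to least confident.
    pairs = sorted(zip(scores, hits), key=lambda p: p[0], reverse=True)
    ordered = [h for _, h in pairs]
    total_hits = sum(ordered)
    n = len(ordered)
    gains = []
    for q in range(n_quantiles):
        # Integer arithmetic keeps the quantile boundaries contiguous.
        start = q * n // n_quantiles
        end = (q + 1) * n // n_quantiles
        # Gains = (# of hits in quantile / total # of hits) x 100
        gains.append(100.0 * sum(ordered[start:end]) / total_hits)
    return gains


# Hypothetical toy data: 10 records, 4 of them hits.
hits = [1, 1, 0, 1, 0, 0, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
print(gains_by_quantile(hits, scores, n_quantiles=5))
# prints [50.0, 25.0, 0.0, 25.0, 0.0]
```

The per-quantile gains always add up to 100, since every hit lands in exactly one quantile.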
Select the Gains chart type, check the following checkboxes, and click Run:
- Cumulative plot
- Include baseline
- Include best line
And here is what we are presented with. Notice that our graph can only show results for one categorical value at a time; in this example, those students who chose to enroll in our school. (I’ll show how to change this default value in a later step.) The thick diagonal red line is the at-chance model, the blue line is the perfect prediction model, and the red line in the middle is our current model. The Y-axis represents the cumulative percent of hits, or the % Gain. For example, at value 20 we have found 20% of the students who enrolled, at value 60 we have found 60% of the students who enrolled, and so on. The X-axis represents the percentile groups ordered by confidence.
Notice that the perfect prediction model and our model both start out pretty steep, and as a rule of thumb the steeper the curve, the higher the gain.
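If it helps to see where a cumulative curve like this comes from, here is a small Python sketch that builds one from scored records. It assumes records are ranked by confidence, as on the X-axis above; the function name `cumulative_gains` and the toy data are hypothetical, and this is not Modeler's own code.

```python
def cumulative_gains(hits, scores, n_quantiles=10):
    """Cumulative % of all hits found after each quantile of the ranked data."""
    # Rank records from most to least confident.
    pairs = sorted(zip(scores, hits), key=lambda p: p[0], reverse=True)
    ordered = [h for _, h in pairs]
    total_hits = sum(ordered)
    n = len(ordered)
    curve = []
    for q in range(1, n_quantiles + 1):
        cutoff = q * n // n_quantiles  # number of records seen so far
        curve.append(100.0 * sum(ordered[:cutoff]) / total_hits)
    return curve


# Hypothetical toy data: 10 records, 4 of them hits.
hits = [1, 1, 0, 1, 0, 0, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
print(cumulative_gains(hits, scores, n_quantiles=5))
# prints [50.0, 75.0, 75.0, 100.0, 100.0]
```

A steep start means the most confident records are mostly hits, which is exactly the "steeper is better" rule of thumb.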
Interpreting the graph
So the graph can be interpreted as follows:
Once we’ve gone through 20% of the data, the at-chance model shows that we will have correctly identified 20% of the students who enrolled (by doing nothing at all), the perfect prediction model shows that we have correctly identified 60% of the students who enrolled, and our model shows that we have identified 50% of the students who enrolled.
Taking a closer look at the perfect prediction line, you can see that once we’ve gone through 33% of the data, the perfect model has identified 100% of the students who enrolled.
To help visualize this point further, I ran a Distribution node on enrollment. Here we can see that 33% of the sample ended up enrolling and 67% did not. Keep in mind that this gains chart only graphs results for students who did enroll. The distribution helps to illustrate that once the perfect prediction model has gone through the first 33% of the sample, it has identified 100% of the students who enrolled, so the line flattens out because it cannot improve any further.
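The two reference lines can be worked out by hand from the 33% enrollment rate alone. Here is a quick Python sketch of that arithmetic; the function names are my own, and the only input taken from the example is the 0.33 hit rate.

```python
def baseline_gain(pct_of_data):
    """At-chance model: x% of the data finds x% of the hits."""
    return float(pct_of_data)


def perfect_gain(pct_of_data, hit_rate):
    """Perfect model: every record it picks is a hit until none remain.

    hit_rate is the share of the sample that are hits, as a fraction.
    """
    return min(100.0, pct_of_data / hit_rate)


HIT_RATE = 0.33  # share of students who enrolled, from the distribution

for pct in (10, 20, 33, 50, 100):
    # Columns: % of data, at-chance gain, perfect-model gain
    print(pct, round(baseline_gain(pct), 1), round(perfect_gain(pct, HIT_RATE), 1))
```

At 20% of the data the perfect model comes out at about 60.6%, which matches the roughly 60% read off the chart, and from 33% onward it is capped at 100%.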
The highlighted area here is known as “under the curve”, which indicates how much better our model is than the at-chance model.
Alternatively, the highlighted area here indicates where our model can be improved, as our goal is to be as close to the perfect prediction model as possible.
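If you want to put rough numbers on those two areas, one option is to approximate them with the trapezoid rule. The sketch below is an assumption on my part, not something the Evaluation node reports; the curve values are made up (with a hypothetical 40% hit rate so the perfect line tops out at 0.4) and are expressed as fractions rather than percentages.

```python
def trapezoid_area(xs, ys):
    """Approximate the area under a curve given as matching x and y points."""
    area = 0.0
    for i in range(1, len(xs)):
        area += (xs[i] - xs[i - 1]) * (ys[i] + ys[i - 1]) / 2.0
    return area


# Hypothetical curves, as fractions of the data (x) and of the hits (y).
xs = [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]
baseline = xs                               # at-chance diagonal
model = [0.0, 0.4, 0.7, 0.85, 0.95, 1.0]    # made-up model curve
perfect = [0.0, 0.5, 1.0, 1.0, 1.0, 1.0]    # perfect model, 40% hit rate

# Area between our model and the at-chance line: how much the model adds.
above_chance = trapezoid_area(xs, model) - trapezoid_area(xs, baseline)
# Area between the perfect line and our model: the room left to improve.
room_to_improve = trapezoid_area(xs, perfect) - trapezoid_area(xs, model)

print(round(above_chance, 3), round(room_to_improve, 3))
```

A larger first number and a smaller second number both mean the model is sitting closer to the perfect prediction line.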
Changing the default category of the target field
Edit the Evaluation node and select the ‘Options’ tab. From there, select the User defined hit checkbox and open the expression builder to the right.
Since we want to change the target field, scroll down the @Functions list until you find the @Target function. Double click to add it to the expression. Click the ‘=’ sign to add it to the expression.
In the Fields section select the enroll variable and click the field values icon. Select the category of interest. In this case we want to evaluate those students who did not enroll, so select ‘False:N’.
Click ‘OK’ twice.
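Conceptually, a user defined hit just changes which target value counts as a hit. The Python sketch below mimics that idea outside of Modeler; the `hit_flags` helper and the "Y"/"N" sample values are hypothetical stand-ins for the enroll field, not Modeler syntax.

```python
def hit_flags(target_values, hit_value):
    """Flag each record as a hit (1) when its target equals hit_value."""
    return [1 if v == hit_value else 0 for v in target_values]


# Hypothetical enroll values for six students.
enroll = ["Y", "N", "N", "Y", "N", "N"]

enrolled_hits = hit_flags(enroll, "Y")      # default view: hit = enrolled
not_enrolled_hits = hit_flags(enroll, "N")  # user defined hit: did not enroll

print(enrolled_hits)      # prints [1, 0, 0, 1, 0, 0]
print(not_enrolled_hits)  # prints [0, 1, 1, 0, 1, 1]
```

Because the hits are now the non-enrolled students, the share of hits in the sample changes too, so the point where the perfect prediction line flattens out moves accordingly.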
And here is our newly generated gains chart:
I hope you found this post helpful and that it comes in handy the next time you’re interpreting a gains chart in SPSS Modeler.
If you have any questions or comments please feel free to reach out to me at: