In mathematics, the standard term for the numeric output of a classification algorithm that represents its certainty is a "score." In the JSON structure returned by Watson Assistant, this value is currently named "confidence", and that name has caused confusion with our clients. (https://en.wikipedia.org/wiki/Statistical_classification vs https://www.merriam-webster.com/dictionary/confidence)
Consider the following scenario: the IBM expert services team numerically calculates the best threshold for handing off from one Assistant's domain to another. That threshold, currently called "confidence", is 0.2. The client doesn't understand why anything with a "confidence" above 0.2 would be considered a good response: we have been trained as a society to treat anything labeled "confidence" on a 0-1 scale as "failing" below 0.6. However, that is not the correct interpretation of this number. A machine learning score can only be interpreted in terms of how the system performs on a set of data. In this particular example, a "confidence" ("score") threshold of 0.2 resulted in a system that was 75% accurate.
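The scenario above can be sketched in a few lines of Python. The evaluation data below is made up for illustration (it is not GEICO data), but it shows the key point: a threshold like 0.2 is chosen because of how the system performs on labeled data, not because 0.2 sounds "confident" in the everyday sense.

```python
# Hypothetical evaluation set: (classifier score, whether the top answer
# was actually correct). The numbers are invented for illustration.
evaluation_set = [
    (0.91, True), (0.45, False), (0.33, False), (0.27, True),
    (0.22, True), (0.19, False), (0.12, False), (0.05, False),
]

def accuracy_at_threshold(data, threshold):
    """Treat predictions with score >= threshold as answered.

    A prediction counts as correct if the system answered and was right,
    or abstained and would have been wrong (a correct hand-off).
    """
    correct = sum(
        1 for score, is_right in data
        if (score >= threshold) == is_right
    )
    return correct / len(data)

print(accuracy_at_threshold(evaluation_set, 0.2))  # → 0.75
```

On this toy data, a threshold of 0.2 yields 75% accuracy, matching the example in the text: the score is meaningful only relative to measured performance, not relative to a school-grade intuition about the word "confidence".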
This suggestion comes directly from customer feedback during our GEICO virtual agent development.
Why is it useful?
Who would benefit from this IDEA? As a customer, I don't want the literal definition of the word "confidence" to confuse me when I interpret the score from a classifier.
How should it work?
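One minimal sketch, using plain Python dicts to stand in for the JSON payload: the same numeric value could be exposed under the mathematically conventional name "score". The field names and intent name below are assumptions for illustration, not a commitment to an exact API shape.

```python
# Current shape (simplified): the classifier output is labeled "confidence".
current_response = {
    "intents": [
        {"intent": "transfer_to_agent", "confidence": 0.2},
    ]
}

# One possible proposed shape: the same value, labeled "score".
proposed_response = {
    "intents": [
        {"intent": "transfer_to_agent", "score": 0.2},
    ]
}

# The number itself is unchanged; only the label stops implying an
# everyday-English reading of "confidence".
print(proposed_response["intents"][0]["score"])  # → 0.2
```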