To improve the machine-learning model, we need to investigate how and where WKS's output differs from the ground truth. Right now, we take screenshots of each document's ground truth and decoding result, compare them manually, and work out which entities (words) and relations were confused.
The Confusion Matrix gives us a hint about which entity and relation types to look into, but to take action on improving the model we need to see which specific words were confused, and in what kinds of sentences.
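The comparison described above could be automated. The sketch below is a minimal illustration, not WKS's actual API: it assumes a hypothetical mention format mapping a character span to a (type label, surface word) pair, and counts which gold labels were decoded as which predicted labels, keeping the confused words as examples.

```python
from collections import Counter

def entity_confusions(gold, predicted):
    """Count gold-vs-predicted label confusions over matching spans.

    gold / predicted: dict mapping (start, end) character spans to
    (entity_label, surface_word) tuples — a hypothetical format, assumed
    here for illustration only.
    Returns (confusion_counts, example_words), where example_words maps
    each (gold_label, predicted_label) pair to the confused surface words.
    """
    confusions = Counter()
    examples = {}
    for span, (gold_label, word) in gold.items():
        # Spans the decoder missed entirely are treated as label "O".
        pred_label = predicted.get(span, ("O", word))[0]
        if pred_label != gold_label:
            key = (gold_label, pred_label)
            confusions[key] += 1
            examples.setdefault(key, []).append(word)
    return confusions, examples

# Toy example: "Jordan" annotated as PERSON but decoded as LOCATION.
gold = {(0, 6): ("PERSON", "Jordan"), (10, 16): ("ORG", "Amazon")}
pred = {(0, 6): ("LOCATION", "Jordan"), (10, 16): ("ORG", "Amazon")}
conf, ex = entity_confusions(gold, pred)
# conf counts one PERSON->LOCATION confusion; ex shows "Jordan" caused it.
```

A report built from `examples` would show the confusing words directly, replacing the manual screenshot comparison.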
Why is it useful?
|Who would benefit from this IDEA?|Human annotators; WKS model evaluation and analysis personas.|
How should it work?