Affects both ingestion and document conversion / segmentation.
Benefit: the ingested content becomes consumable in a simpler, plain-text format. The original customer use case was not only to strip HTML but also to segment the content by HTML header level. For example, this content:
My content for first document.
Is this and I really am not sure if or how to handle <b>stylistic markup</b>
And here is my second document.
would be ingested as two JSON documents.
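A minimal sketch of the strip-and-segment idea, using only the Python standard library. Everything named here is hypothetical and for illustration only: the `HeaderSegmenter` class, the `segment_html` function, the `title` / `body` JSON fields, and the assumption that a new document starts at each `<h1>`. It is not the product's actual conversion logic. In this sketch, stylistic markup such as `<b>` is simply dropped while its text is kept.

```python
import json
from html.parser import HTMLParser


class HeaderSegmenter(HTMLParser):
    """Strip tags and start a new segment at each header of one level."""

    def __init__(self, split_level=1):
        super().__init__(convert_charrefs=True)
        self.split_tag = f"h{split_level}"  # e.g. "h1"; assumed split rule
        self.segments = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == self.split_tag:
            # A header at the split level opens a new segment.
            self.segments.append({"title": "", "body": ""})
            self._in_title = True
        # All other tags (including <b>, <i>, <p>) are ignored.

    def handle_endtag(self, tag):
        if tag == self.split_tag:
            self._in_title = False

    def handle_data(self, data):
        if not self.segments:
            if not data.strip():
                return
            # Text before the first header gets an untitled segment.
            self.segments.append({"title": "", "body": ""})
        key = "title" if self._in_title else "body"
        self.segments[-1][key] += data


def segment_html(html, split_level=1):
    """Return a list of dicts, one per header-delimited segment."""
    parser = HeaderSegmenter(split_level)
    parser.feed(html)
    parser.close()
    # Collapse runs of whitespace left behind by removed tags.
    return [
        {key: " ".join(value.split()) for key, value in seg.items()}
        for seg in parser.segments
    ]


if __name__ == "__main__":
    # Hypothetical input; the header text is invented for the example.
    sample = (
        "<h1>First</h1><p>My content for first document. "
        "It has <b>stylistic markup</b> inside.</p>"
        "<h1>Second</h1><p>And here is my second document.</p>"
    )
    for doc in segment_html(sample):
        print(json.dumps(doc))
```

Splitting on a single header level keeps the sketch short; a real implementation would need to decide how nested levels (`<h2>` inside an `<h1>` section) map to documents.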
Follow-up investigation on this idea: