Affects both ingestion and document conversion / segmentation.
Benefits use of the ingested content by making it consumable in a more basic format. The original customer use case was not only to strip HTML but also to segment the content based on HTML header level. So this content:
My content for first document.
Is this and I really am not sure if or how to handle <b>stylistic markup</b>
And here is my second document.
would be ingested as two JSON documents, one per header-delimited section.
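As a rough illustration of the requested behavior, the sketch below strips markup and starts a new document at each header of a chosen level. It uses only Python's standard `html.parser`; the names `HeaderSegmenter` and `segment_html`, the `title`/`text` JSON fields, and the `<h1>` and `<p>` tags in the sample input are assumptions made for this example, since the original markup is not shown above. Stylistic tags such as `<b>` are simply dropped and their text kept, which is one possible answer to the open question about stylistic markup.

```python
import json
from html.parser import HTMLParser

# Tags treated as block-level: a space is inserted so adjacent blocks do not run together.
BLOCK_TAGS = {"p", "div", "br", "li", "tr"}


class HeaderSegmenter(HTMLParser):
    """Collects plain text and starts a new segment at each header of the chosen level."""

    def __init__(self, split_tag="h1"):
        super().__init__(convert_charrefs=True)
        self.split_tag = split_tag
        self.segments = []  # each item: {"title": str, "text": str}
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == self.split_tag:
            self.segments.append({"title": "", "text": ""})
            self._in_title = True
        elif tag in BLOCK_TAGS and self.segments:
            self.segments[-1]["text"] += " "

    def handle_endtag(self, tag):
        if tag == self.split_tag:
            self._in_title = False

    def handle_data(self, data):
        # Text that appears before the first header goes into an untitled segment.
        if not self.segments:
            self.segments.append({"title": "", "text": ""})
        key = "title" if self._in_title else "text"
        self.segments[-1][key] += data


def segment_html(html, split_tag="h1"):
    """Return one markup-free dict per header-delimited section."""
    parser = HeaderSegmenter(split_tag)
    parser.feed(html)
    parser.close()
    docs = []
    for seg in parser.segments:
        title = " ".join(seg["title"].split())  # collapse runs of whitespace
        text = " ".join(seg["text"].split())
        if title or text:
            docs.append({"title": title, "text": text})
    return docs


# Hypothetical input: the header tags and titles are assumed, not taken from the customer's data.
sample = (
    "<h1>First</h1>"
    "<p>My content for first document.</p>"
    "<p>Is this and I really am not sure if or how to handle <b>stylistic markup</b></p>"
    "<h1>Second</h1>"
    "<p>And here is my second document.</p>"
)
print(json.dumps(segment_html(sample), indent=2))
```

Passing a different `split_tag` (for example `"h2"`) would segment on another header level; a real implementation would likely need to decide how nested header levels interact.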
Follow-up investigation on this idea:
Why is it useful?
Who would benefit from this IDEA?
As a user of ingested content, I want to be able to use the resulting text as plain text, without markup, so that I can present it directly to the end user.
How should it work?