We deal with many different document types, and our ability to focus on Watson is limited by its support for only basic document/artifact types in Watson Discovery document ingestion.
We encounter many formats; by percentage, our top 10 are:
PDF, HTML, DOC(X), PPT(X)(S), XLS(X), JSON, TXT, RTF, CSV, EPUB
We have also seen ODT, ODP, ODS, TEX and their relatives, mostly when we work with government clients.
While we don't expect Watson to deal specifically with ZIP files, it would be nice to have a simple way to package artifacts, using ZIP or other file compression formats, and so minimize the size, time, and cost of transferring them.
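To illustrate the kind of packaging we have in mind, here is a minimal client-side sketch using Python's standard `zipfile` module. The function name `bundle_artifacts` and the file names in the usage note are hypothetical, and nothing here calls a Watson API; it only shows how a batch of artifacts could be compressed into one archive before transfer, assuming the receiving side could unpack it.

```python
import zipfile
from pathlib import Path


def bundle_artifacts(paths, archive_path):
    """Write the given files into one DEFLATE-compressed ZIP.

    Returns (raw_bytes, zipped_bytes) so the saving can be measured.
    Hypothetical helper for illustration only.
    """
    raw_bytes = 0
    with zipfile.ZipFile(archive_path, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for path in map(Path, paths):
            raw_bytes += path.stat().st_size
            # Store each file under its base name; files sharing a base name
            # would need distinct arcname values.
            archive.write(path, arcname=path.name)
    return raw_bytes, Path(archive_path).stat().st_size
```

For example, `bundle_artifacts(["report.txt", "data.csv"], "batch.zip")` would produce a single archive to transfer. Text-heavy formats such as TXT, CSV, JSON, and HTML typically shrink the most, while formats that are already compressed internally (DOCX, XLSX, EPUB) gain little.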
Eventually we fully expect to encounter more formats, and we want to minimize our effort, costs, and transforms in analyzing them through Watson, including the potential for OCR.
Why is it useful?
Who would benefit from this IDEA?
As a user of analysis tools I would gain insight into a broader range of document formats.
How should it work?