IBM Watson™ Ideas

Standard WDS crawler for websites

The standard WDS crawler supports filesystems, SharePoint, and databases.

Since HTML is one of the few document types that WDS can ingest, it would make sense for the standard crawler to also support websites accessible over HTTP and HTTPS.

On my recent project, the customer had documents on a filesystem, in SharePoint, and on their intranet and internet websites, all of which they wanted to use via WDS. To support this we had to write a custom crawler in Nutch, which was a lot of effort for what seems like a common requirement.

  • Mar 9 2018
  • Already exists
Who would benefit from this idea?
As a customer, I want to use WDS to query documents and HTML pages from my intranet and internet sites.
  • Admin
    Phil Anderson commented
    May 09, 2018 05:23

    We already have this functionality via a Nutch plug-in: