Currently, we are unable to scale beyond 20 threads when using a WKS custom model for entity and relation prediction with the NLU service. Our documents are very large, and a single document can take up to 5 minutes to process. With multiple users and multiple documents, we either cannot process more than one document at a time or have to share the 20 threads among those documents.

IBM employees told us that the only way to scale is to manually deploy WKS models to new NLU instances when our usage increases. So our choice is either to deploy WKS instances manually whenever our usage increases, or to keep a high number of WKS instances deployed at all times and pay $800 per instance per month (even when we don't need them).

An easy solution would be for you to provide an endpoint to duplicate a custom model, either from Watson Knowledge Studio directly or from the NLU service. That way, we can handle the scaling on our side, and we don't need to hire an employee whose amazing job would be to deploy custom models manually.