Technologies
While we work on all kinds of NLP problems, most use cases fall into one of the following problem types. For each of them, we combine our own experience with the latest research to find the best possible model architecture.
Document Classification
Information in today's businesses is often buried in unstructured text documents such as invoices, analyst reports or customer emails. The automatic classification of documents into fine-grained categories (e.g. customer emails into request types) is often key to further process automation.

Deep neural networks have reached human-level accuracy in many such document classification tasks. The models are often trained in a two-step procedure: first the model learns the general properties of the language, then it is trained for the specific classification problem using many labeled examples. While we already have models that master the general properties of language, the second step is completely tailored to your individual case.

Question Answering
In many business processes, humans need to answer questions using information stored in documents like financial reports or technical documentation. Even for rather simple, fact-based questions this can be a time-consuming task if the document is long or if there are many documents. Finding the answer automatically can help to accelerate processes, e.g. the due diligence during an M&A transaction.

Due to new deep learning architectures and growing research interest, model performance is currently improving from month to month and has already reached human performance on rather simple Q&A problems. The models are usually pre-trained on a large corpus and then fine-tuned for your particular domain and type of questions.
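
As a rough illustration of the last step of such a system, the sketch below assumes an extractive model that assigns every token of a document a score for being the start and a score for being the end of the answer. This setup, the function name `best_answer_span` and the `max_answer_len` parameter are assumptions made for the example, not a description of a specific model.

```python
def best_answer_span(start_scores, end_scores, max_answer_len=10):
    """Return (start, end) token indices of the highest-scoring answer span.

    A span is valid only if it ends at or after its start and contains
    at most max_answer_len tokens. Scores are assumed to come from a
    fine-tuned model; here they are plain lists of floats.
    """
    best = (float("-inf"), 0, 0)
    for start, start_score in enumerate(start_scores):
        # Only consider end positions inside the allowed window.
        last = min(start + max_answer_len, len(end_scores))
        for end in range(start, last):
            total = start_score + end_scores[end]
            if total > best[0]:
                best = (total, start, end)
    _, start, end = best
    return start, end
```

Given the tokens of a sentence and the two score lists, the answer text is simply the slice `tokens[start:end + 1]`.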

Cognitive Search
Traditional keyword-based search engines match only literal terms and miss documents that express the same idea in different words. With cognitive search you can search for content rather than pure keywords. For example, you can find documents that contain semantically similar words, or search for concepts, relations and document types. This yields more meaningful results and speeds up the user's search.

Various NLP techniques can be applied to implement a cognitive search engine. One common scenario is to use a combination of Named-Entity Recognition and document representations generated by deep neural networks to match the query with the most relevant documents.
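
The matching step can be sketched as follows: the query and every document are turned into vectors, and documents are ranked by cosine similarity to the query. To keep the example self-contained, `embed` is a toy bag-of-words stand-in that only rewards overlapping words; in a real system it would be replaced by a neural encoder so that semantically similar texts also score highly. All function names here are illustrative.

```python
import math
import re
from collections import Counter

def embed(text):
    """Toy document representation: word counts (stand-in for a neural encoder)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def search(query, documents, top_k=3):
    """Return the top_k (score, document) pairs, best match first."""
    query_vec = embed(query)
    scored = [(cosine(query_vec, embed(doc)), doc) for doc in documents]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]
```

Only `embed` would change when moving to learned representations; the ranking logic stays the same.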

Content Extraction
Documents often contain valuable information, but further processing in a structured manner is difficult without human intervention. Models for Named-Entity Recognition (NER) allow the automated extraction of predefined entity types like companies, persons or locations. The extracted entities can be passed to other systems (e.g. CRM / ERP) or used for further analysis.

Deep neural networks or Conditional Random Fields are the dominant methods in this area. Entity types are not limited to the general ones mentioned above. They can be customized to individual interests (e.g. your product names or fields in a form) as long as annotated training samples are available.
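
NER models typically label each token rather than emitting entities directly. The sketch below assumes the widely used BIO tagging convention (`B-` begins an entity, `I-` continues it, `O` is outside any entity) and shows how such token labels can be grouped into entities before they are handed to another system. The function name and the example tag set are illustrative.

```python
def bio_to_entities(tokens, tags):
    """Group token-level BIO tags into (entity_type, text) pairs."""
    entities = []
    current_type, current_tokens = None, []

    def flush():
        # Store the entity collected so far, if any, and reset the buffer.
        nonlocal current_type, current_tokens
        if current_tokens:
            entities.append((current_type, " ".join(current_tokens)))
        current_type, current_tokens = None, []

    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            flush()
            current_type, current_tokens = tag[2:], [token]
        elif tag.startswith("I-") and current_type == tag[2:]:
            current_tokens.append(token)
        else:
            # "O" tags and inconsistent "I-" tags end the current entity.
            flush()
    flush()
    return entities
```

For custom entity types such as product names, only the tag set changes; the grouping logic is the same.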

Summarization
Texts and conversations are full of unnecessary details. With the help of summarization models, texts can be condensed to their essential information.

Models fall into two types: extractive and abstractive. Whereas the first type outputs a subset of sentences from the original text as the summary, the latter generates completely new sentences. Models are trained on large external data sets and fine-tuned for your specific domain.
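
To make the extractive idea concrete, here is a minimal sketch that scores each sentence by the average frequency of its words in the whole text and returns the top-scoring sentences in their original order. This simple frequency heuristic is a stand-in for a trained model, and the stopword list, function name and parameters are assumptions chosen for the example.

```python
import re
from collections import Counter

# Small illustrative stopword list; a real system would use a fuller one.
STOPWORDS = {"the", "a", "an", "of", "to", "and", "in", "is", "are",
             "for", "on", "it", "that", "was", "with"}

def content_words(text):
    """Lowercased words of the text without stopwords."""
    return [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOPWORDS]

def extractive_summary(text, num_sentences=1):
    """Pick the num_sentences highest-scoring sentences, kept in original order."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    freq = Counter(content_words(text))

    def score(sentence):
        words = content_words(sentence)
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    ranked = sorted(range(len(sentences)),
                    key=lambda i: score(sentences[i]), reverse=True)
    chosen = sorted(ranked[:num_sentences])
    return " ".join(sentences[i] for i in chosen)
```

Because the output is built only from sentences of the input, it can never contain wording that is absent from the original text, which is the defining property of the extractive type.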

Topic Extraction
Topic models infer abstract "topics" that occur in a collection of documents and can be used to explore, understand and cluster a large collection of documents. For example, analyzing the topics across customer requests helps to understand repetitive topics not covered by your FAQ.

By modeling hidden semantic structures in a text body, the algorithm detects the major topics across all documents. Each document can then be described as a distribution over these topics, and each topic as a distribution over words. Since the models are trained in an unsupervised manner, there is no need for labeled training examples.
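
The two distributions can be pictured as two small tables. The sketch below uses hand-written, purely hypothetical numbers and names (`topic_word`, `doc_topic`, the topics "billing" and "account"); a real topic model such as LDA would estimate both tables from the document collection.

```python
# Hypothetical values for illustration only, not the output of a trained model.
# Each topic is a probability distribution over words.
topic_word = {
    "billing": {"invoice": 0.5, "payment": 0.4, "password": 0.1},
    "account": {"invoice": 0.1, "payment": 0.1, "password": 0.8},
}
# Each document is a probability distribution over topics.
doc_topic = {
    "email_1": {"billing": 0.9, "account": 0.1},
    "email_2": {"billing": 0.2, "account": 0.8},
}

def word_probability(doc_id, word):
    """p(word | doc) = sum over topics of p(topic | doc) * p(word | topic)."""
    return sum(p_topic * topic_word[topic].get(word, 0.0)
               for topic, p_topic in doc_topic[doc_id].items())

def dominant_topic(doc_id):
    """Topic with the highest weight in the document, e.g. for clustering."""
    topics = doc_topic[doc_id]
    return max(topics, key=topics.get)
```

Grouping documents by `dominant_topic` is one simple way to cluster a collection once the distributions are available.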
FEEL FREE TO CONTACT US
engage@deepset.ai