Biotechnology & Pharma
Industry & Manufacturing
Finance & Compliance
Accelerating search and enabling new insights based on advanced interpretation of research information
Improving processes and product quality by uncovering implicit knowledge
Defining successful strategies through increased transparency on markets and players
The amount of information created in the biomedical sector is enormous.
There are over 27 million scientific publications in the MEDLINE database with about 500,000 new publications every year - not counting experimental protocols and method descriptions. These texts must be read, understood and related to one another to advance science in a highly competitive field.
Your company has a growing number of manuals and other documentation, but you still struggle to find what you actually need? When your experts discuss new challenges, they should have access to work that has already been documented. This makes them more efficient, lets them find topics that have already been discussed and enriches their knowledge with solutions to similar problems.
Although money laundering is a global problem, only very few cases are detected by current systems. Billions of daily transactions have to be monitored with little information attached. The schemes used to hide money laundering are becoming more and more sophisticated, from complex trading schemes to the creation of fake companies. With increasing regulatory pressure to fight money laundering, financial institutions are challenged to increase performance while keeping costs at a reasonable level.

We built a tool to intelligently categorize the largest available resource on experimental protocols and methods for the publishing company Springer Nature. All mentions of cell lines and organisms were extracted from the literature and matched with an existing database. The method had to be flexible enough to account for alternative spellings, incorporate context to resolve the intended meaning of a mention, and generalize to new words in order to detect new trends and developments.
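To make the "alternative spellings" requirement concrete, here is a minimal sketch of matching an extracted mention against a reference database. It is an illustration only, not the tool built for Springer Nature: the database contents, the function names and the 0.8 similarity cutoff are all assumptions, and plain string similarity covers neither context nor genuinely new words.

```python
import difflib
import re

# Hypothetical reference database: canonical name -> known alternative spellings.
CELL_LINE_DB = {
    "HeLa": ["hela", "he-la", "hela cells"],
    "HEK293": ["hek 293", "hek-293", "293 cells"],
    "MCF-7": ["mcf7", "mcf 7"],
}


def normalize(text):
    """Lowercase and drop everything except letters and digits."""
    return re.sub(r"[^a-z0-9]", "", text.lower())


def build_index(db):
    """Map every normalized spelling to its canonical name."""
    index = {}
    for canonical, variants in db.items():
        for spelling in [canonical, *variants]:
            index[normalize(spelling)] = canonical
    return index


def match_mention(mention, index, cutoff=0.8):
    """Return the canonical name for a mention, or None if nothing is close enough."""
    key = normalize(mention)
    if key in index:  # exact hit after normalization
        return index[key]
    # Fall back to approximate string matching (assumed cutoff of 0.8).
    close = difflib.get_close_matches(key, list(index), n=1, cutoff=cutoff)
    return index[close[0]] if close else None


INDEX = build_index(CELL_LINE_DB)
print(match_mention("HEK-293", INDEX))           # spelling variant
print(match_mention("HeLLa", INDEX))             # typo
print(match_mention("Escherichia coli", INDEX))  # not in this toy database
```

Normalization handles hyphens, spaces and casing, while the fuzzy fallback catches small typos. Mentions that cannot be matched return None and would need the context-aware handling described above.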
We built a tool for a DAX30 company to automatically label internal documents by their content and search for concepts rather than keywords. Cognitive search is the new generation of enterprise search that uses AI to return results that are more relevant to the user. This even includes searching for texts in different languages.
All transactions have to be monitored in real time. The underlying customers are analyzed through their websites, social media accounts and further documents. The flow of money across bank accounts is modeled, and suspicious activity is tracked and marked for human supervision. With this support, case managers can handle more cases and reach better decisions.

Tech approach
Deepset incrementally built a customized Named Entity Recognition (NER) approach for detecting mentions of cell lines and organisms. A conditional random field - the standard model family for this type of task - was trained as a baseline and then replaced by a neural BiLSTM sequence model that operates on both words and characters. This more powerful Natural Language Processing approach significantly improves context understanding and the ability to generalize to previously unseen examples.
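As a rough illustration of how NER is commonly framed as sequence labeling, the sketch below shows word- and character-level features of the kind a conditional random field baseline typically consumes, and how per-token BIO tags are turned back into entity mentions. The feature set, the tag names (CELL_LINE, ORGANISM) and the example tags are assumptions for illustration. No model is trained here, and this is not the deepset implementation.

```python
def token_features(tokens, i):
    """Word- and character-level features for token i (assumed, CRF-style feature set)."""
    word = tokens[i]
    return {
        "lower": word.lower(),
        "is_upper": word.isupper(),
        "has_digit": any(ch.isdigit() for ch in word),
        "prefix3": word[:3].lower(),   # character-level cues help with unseen words
        "suffix3": word[-3:].lower(),
        "prev": tokens[i - 1].lower() if i > 0 else "<s>",
        "next": tokens[i + 1].lower() if i < len(tokens) - 1 else "</s>",
    }


def decode_bio(tokens, tags):
    """Turn per-token BIO tags into (entity_text, entity_type) pairs."""
    entities, current, current_type = [], [], None
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            if current:
                entities.append((" ".join(current), current_type))
            current, current_type = [token], tag[2:]
        elif tag.startswith("I-") and current and tag[2:] == current_type:
            current.append(token)
        else:
            # "O" tag, or an "I-" tag without a matching "B-": close any open entity.
            if current:
                entities.append((" ".join(current), current_type))
            current, current_type = [], None
    if current:
        entities.append((" ".join(current), current_type))
    return entities


tokens = ["HEK293", "cells", "were", "infected", "with", "Escherichia", "coli", "."]
# Hand-written tags standing in for the output of a trained sequence model.
tags = ["B-CELL_LINE", "O", "O", "O", "O", "B-ORGANISM", "I-ORGANISM", "O"]

print(token_features(tokens, 0))
print(decode_bio(tokens, tags))
```

A BiLSTM model replaces the hand-crafted features with learned word and character representations, but the input tokens and output tags keep the same shape.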
Tech approach
Topic models for unsupervised document vectors might already be sufficient. But to fully utilize the power of Natural Language Processing, we at deepset use cutting-edge sentence embeddings, e.g. based on smart aggregation techniques like p-mean, on InferSent, or on a language model like BERT for better context understanding.
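The idea behind p-mean aggregation can be sketched in a few lines: instead of only averaging word vectors, several power means (here minimum, arithmetic mean and maximum) are computed per dimension and concatenated. The tiny 3-dimensional word vectors below are made up for illustration, and the choice of powers is an assumption. A real setup would use pretrained embeddings.

```python
import math

# Hypothetical 3-dimensional word vectors; real embeddings have hundreds of dimensions.
WORD_VECTORS = {
    "pump": [0.9, 0.1, 0.0],
    "failure": [0.7, 0.3, 0.1],
    "broken": [0.8, 0.2, 0.1],
    "invoice": [0.0, 0.1, 0.9],
    "payment": [0.1, 0.0, 0.8],
}


def power_mean(values, p):
    """Power mean for the three cases used here: p=1 (average), +inf (max), -inf (min)."""
    if p == math.inf:
        return max(values)
    if p == -math.inf:
        return min(values)
    if p == 1:
        return sum(values) / len(values)
    raise ValueError("this sketch only supports p in {-inf, 1, +inf}")


def sentence_embedding(sentence, powers=(-math.inf, 1, math.inf)):
    """Concatenate one power mean per p over the word vectors of a sentence."""
    vectors = [WORD_VECTORS[w] for w in sentence.lower().split() if w in WORD_VECTORS]
    if not vectors:
        raise ValueError("no known words in sentence")
    embedding = []
    for p in powers:
        for dimension in zip(*vectors):  # iterate over vector dimensions
            embedding.append(power_mean(dimension, p))
    return embedding


def cosine(a, b):
    """Cosine similarity between two equally long vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


query = sentence_embedding("pump failure")
print(len(query))  # 3 powers x 3 dimensions
print(cosine(query, sentence_embedding("broken pump")))
print(cosine(query, sentence_embedding("invoice payment")))
```

Searching by concept then amounts to ranking documents by the similarity of their embeddings to the query embedding, so "broken pump" can match "pump failure" even though the keywords differ.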
Tech approach
In collaboration with the fintech startup hawkAI, deepset developed a real-time decision engine for bank transactions. Technologies used: a Kafka transaction queue whose messages are enriched with relevant information through Apache Spark, Redis and other data sources. Decisions are made by gradient boosted trees, which offer a good tradeoff between accuracy and interpretability of predictions.
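To illustrate why gradient boosted trees can remain interpretable, the following toy sketch boosts depth-1 trees (decision stumps) on a handful of made-up transactions. Everything here is an assumption for illustration: the two features, the labels, the squared-error loss and the hyperparameters. It is not the hawkAI decision engine, and a production system would rely on an established boosting library rather than hand-written code.

```python
# Hypothetical features per transaction: (amount in EUR, countries touched in 24h).
TRANSACTIONS = [
    (120.0, 1), (80.0, 1), (9500.0, 4), (300.0, 2),
    (9900.0, 5), (45.0, 1), (8700.0, 3), (150.0, 1),
]
LABELS = [0, 0, 1, 0, 1, 0, 1, 0]  # 1 = suspicious in this toy data


def fit_stump(rows, residuals):
    """Find the single split (feature, threshold) that best fits the residuals."""
    best = None
    for feature in range(len(rows[0])):
        for threshold in sorted({row[feature] for row in rows}):
            left = [r for row, r in zip(rows, residuals) if row[feature] <= threshold]
            right = [r for row, r in zip(rows, residuals) if row[feature] > threshold]
            if not left or not right:
                continue
            left_mean = sum(left) / len(left)
            right_mean = sum(right) / len(right)
            error = sum((r - left_mean) ** 2 for r in left)
            error += sum((r - right_mean) ** 2 for r in right)
            if best is None or error < best[0]:
                best = (error, feature, threshold, left_mean, right_mean)
    return best[1:]


def stump_predict(stump, row):
    """Evaluate one stump: a single readable if/else rule."""
    feature, threshold, left_value, right_value = stump
    return left_value if row[feature] <= threshold else right_value


def fit_boosted_stumps(rows, labels, rounds=20, learning_rate=0.3):
    """Gradient boosting with squared-error loss: each stump fits the current residuals."""
    base = sum(labels) / len(labels)
    predictions = [base] * len(rows)
    stumps = []
    for _ in range(rounds):
        residuals = [y - p for y, p in zip(labels, predictions)]
        stump = fit_stump(rows, residuals)
        stumps.append(stump)
        predictions = [
            p + learning_rate * stump_predict(stump, row)
            for p, row in zip(predictions, rows)
        ]
    return base, stumps, learning_rate


def score(model, row):
    """Sum the base rate and the scaled contribution of every stump."""
    base, stumps, learning_rate = model
    return base + learning_rate * sum(stump_predict(s, row) for s in stumps)


MODEL = fit_boosted_stumps(TRANSACTIONS, LABELS)
print(score(MODEL, (9000.0, 4)))  # high score: route to a case manager
print(score(MODEL, (200.0, 1)))   # low score: let the transaction pass
```

Each stump is a rule of the form "if feature <= threshold then add a, else add b", so the contribution of every feature to a decision can be read off directly. That is the kind of transparency a case manager needs when reviewing a flagged transaction.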
