Natural Language Processing Expertise

Data is at the heart of fighting cancer: it is central to finding effective ways to prevent, diagnose, and treat the disease. The challenge, however, is that much of this data is unstructured. Examples include clinical notes, surgical and lab reports, clinical trial data, and discharge notes.

A prominent US-based healthcare provider recognized the importance of unstructured data for all cancer-related activities – from research through treatment. However, given the volume and complexity of the work, they knew they needed assistance from a company that had Natural Language Processing expertise, experience with the applied tooling, and a deep understanding of medical terminology. For help, they turned to Klarrio US.

A second project where Klarrio was asked to apply its NLP expertise was the analysis of drug labels.

Manufacturers submit a label for every drug they register with the United States Food and Drug Administration. The label is structured, but that structure mostly defines the label's sections; a section itself may contain a lot of unstructured free text. Extracting elements of interest from this text, such as the clinical studies, the arms of each study, the adverse reactions observed in each arm, and the outcomes of those reactions, enables much deeper research across the whole drug dataset. For example, reactions and outcomes can be correlated across drugs and connected to a common ingredient.



To perform the work, Klarrio devised a process for redeveloping the annotators, which included the following:


Developing and applying tooling to determine the rules and dictionary dependencies for each annotator.

This led to the elimination of unused rules, restructuring existing dictionaries, and generating new dictionaries.

Introducing reusable models to make maintenance of the annotators more efficient.

Establishing Gold Standard document samples for each annotator and assessing the annotators against them for accuracy.
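The dependency analysis in the first step can be pictured as a graph walk. The sketch below is a minimal illustration and not Klarrio's actual tooling: the rule names, dictionary names, and the `depends_on` map are all hypothetical. It assumes each rule lists the rules and dictionaries it uses, and that anything unreachable from the rules an annotator exposes is a candidate for removal.

```python
from collections import deque

def reachable(entry_points, depends_on):
    """Return every rule or dictionary reachable from the entry points."""
    seen = set(entry_points)
    queue = deque(entry_points)
    while queue:
        node = queue.popleft()
        for dep in depends_on.get(node, ()):
            if dep not in seen:
                seen.add(dep)
                queue.append(dep)
    return seen

# Hypothetical dependency map: each rule lists the rules and dictionaries it uses.
depends_on = {
    "rule:tumor_size": ["rule:measurement", "dict:tumor_terms"],
    "rule:measurement": ["dict:units"],
    "rule:legacy_stage": ["dict:old_stage_codes"],
}
# Hypothetical entry points: the rules the annotator actually exposes.
entry_points = ["rule:tumor_size"]

# Every rule or dictionary mentioned anywhere in the map.
all_nodes = set(depends_on) | {d for deps in depends_on.values() for d in deps}

# Anything not reachable from an entry point is a candidate for elimination.
unused = sorted(all_nodes - reachable(entry_points, depends_on))
print(unused)
```

In this toy map, the legacy rule and the dictionary only it uses are reported as unused, which mirrors the kind of clean-up described above.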


Our work with annotators:

  • Improved accuracy
  • A cleaner system

Our work with the analysis of drug labels:

  • Precision and recall for the extracted entities: approximately 95% and 75% (early results).
  • The work on improving accuracy continues.
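For readers less familiar with the two metrics, the sketch below shows one common way to compute precision and recall for extracted entities against a Gold Standard sample. It is an illustration under stated assumptions: entities are compared by exact match on a hypothetical (document, start, end, type) tuple, and the toy data is invented for the example, not taken from the project.

```python
def precision_recall(predicted, gold):
    """Exact-match precision and recall of predicted entities against gold entities."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall

# Toy data (hypothetical): each entity is (document id, start offset, end offset, type).
gold = {
    ("doc1", 0, 6, "REACTION"),
    ("doc1", 30, 37, "ARM"),
    ("doc2", 5, 13, "REACTION"),
    ("doc2", 20, 27, "OUTCOME"),
}
predicted = {
    ("doc1", 0, 6, "REACTION"),
    ("doc1", 30, 37, "ARM"),
    ("doc2", 5, 13, "REACTION"),
}

precision, recall = precision_recall(predicted, gold)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

High precision with lower recall, as in this toy case, means that what the system extracts is mostly right but that some entities in the Gold Standard are still missed.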

Behind the scenes

To enable the client to find and extract the information needed for their research in large volumes of unstructured data, Klarrio’s initial task was to assess an existing set of annotators to improve their accuracy. Klarrio had the added objective of making the maintenance of the annotators more efficient.

This approach was then applied to assess and improve additional existing annotators, as well as to develop new ones.

For the analysis of drug labels, Klarrio used spaCy, an open-source NLP library. The solution combines spaCy's machine-learning Named Entity Recognizer, which detects the entities, with deep sentence parsing and rule-based matching, which detect the relations between the entities. The extracted entities and relationships are saved into a relational database for further analysis by researchers and clinicians.
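The storage step could look something like the sketch below. It is a minimal illustration, assuming a two-table layout (one table for entities, one for relations between them) in an in-memory SQLite database. The table names, column names, and sample values are hypothetical and do not describe the client's actual schema, and the extraction step itself is not shown.

```python
import sqlite3

# In-memory database for illustration; a real deployment would use a persistent store.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE entity (
    id       INTEGER PRIMARY KEY,
    label_id TEXT NOT NULL,   -- hypothetical identifier of the drug label
    kind     TEXT NOT NULL,   -- e.g. STUDY, ARM, REACTION, OUTCOME
    text     TEXT NOT NULL    -- the extracted span
);
CREATE TABLE relation (
    id      INTEGER PRIMARY KEY,
    kind    TEXT NOT NULL,
    head_id INTEGER NOT NULL REFERENCES entity(id),
    tail_id INTEGER NOT NULL REFERENCES entity(id)
);
""")

def add_entity(label_id, kind, text):
    """Insert one extracted entity and return its row id."""
    cur = conn.execute(
        "INSERT INTO entity (label_id, kind, text) VALUES (?, ?, ?)",
        (label_id, kind, text),
    )
    return cur.lastrowid

# Hypothetical extraction output: a reaction observed in a study arm.
arm = add_entity("label-001", "ARM", "placebo")
reaction = add_entity("label-001", "REACTION", "nausea")
conn.execute(
    "INSERT INTO relation (kind, head_id, tail_id) VALUES (?, ?, ?)",
    ("REACTION_IN_ARM", reaction, arm),
)
conn.commit()

# Query reactions together with the arm in which they were observed.
rows = conn.execute("""
    SELECT r.text, a.text
    FROM relation rel
    JOIN entity r ON r.id = rel.head_id
    JOIN entity a ON a.id = rel.tail_id
    WHERE rel.kind = 'REACTION_IN_ARM'
""").fetchall()
print(rows)
```

Keeping relations in their own table lets researchers join reactions, arms, and outcomes across many labels with ordinary SQL.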

The Technology

  • Open-source Frameworks, such as spaCy
  • Proprietary Analytics

The Expertise

  • Data Engineering
  • Software Development
  • Data Analytics

