Combining Text Mining and NLP Techniques to Derive Patient-Level Insights From Cancer Support Forums

Social media and patient support forums are rich sources of patient feedback and experiences. Besides seeking guidance on their post-diagnosis journeys, many patients describe their experiences with existing therapies and their expectations of upcoming treatments.

Pharma companies can use this information to plan their:

  • Targeting activities, based on an understanding of target patient profiles
  • R&D activities, keeping in mind the unmet needs that patients express

The challenges:

  • Social media data is sensitive to process, because it may surface adverse events
  • It is therefore essential to set the right compliance standards so that adverse events are reported on time
  • It is easy to get lost in the vast amount of unstructured information available on the internet
  • This is where large consultancies and niche product firms often fall short: they lack either the technical depth or the domain expertise needed to deliver a comprehensive solution

D Cube’s Approach:

To deliver comprehensive, business-consumable results, we combine our domain knowledge and technical expertise at every stage of systematically processing unstructured social media data.

Data Collection

  • Domain knowledge: For a given disease type, knowing which forums patients most often use to share their questions, opinions, and experiences.
  • Technical expertise: Modular web-scraping components that can easily be customized for different patient forum websites.
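The modular scraping idea can be sketched as a small, configuration-driven parser: each forum gets its own settings object, while the extraction code is shared across sites. This is a minimal sketch, not D Cube's implementation. The `ForumConfig` fields, the `post-body` class name, and the sample HTML are hypothetical, and fetching pages (and honoring each site's terms of use and robots.txt) is left out. It uses only Python's standard `html.parser` module.

```python
from dataclasses import dataclass
from html.parser import HTMLParser


@dataclass(frozen=True)
class ForumConfig:
    """Hypothetical per-site settings describing where post bodies live."""
    name: str
    post_tag: str    # e.g. "div"
    post_class: str  # CSS class that marks a post body on this site


class PostExtractor(HTMLParser):
    """Collects the text of every element matching the configured tag and class."""

    def __init__(self, config: ForumConfig) -> None:
        super().__init__()
        self.config = config
        self.posts: list[str] = []
        self._depth = 0              # > 0 while inside a matching post element
        self._buffer: list[str] = []

    def handle_starttag(self, tag, attrs):
        if self._depth:
            if tag == self.config.post_tag:
                self._depth += 1     # nested element with the same tag name
            return
        classes = (dict(attrs).get("class") or "").split()
        if tag == self.config.post_tag and self.config.post_class in classes:
            self._depth = 1
            self._buffer = []

    def handle_endtag(self, tag):
        if self._depth and tag == self.config.post_tag:
            self._depth -= 1
            if self._depth == 0:
                # Join the text chunks and collapse runs of whitespace.
                self.posts.append(" ".join("".join(self._buffer).split()))

    def handle_data(self, data):
        if self._depth:
            self._buffer.append(data)


def extract_posts(html: str, config: ForumConfig) -> list[str]:
    parser = PostExtractor(config)
    parser.feed(html)
    parser.close()
    return parser.posts


# Hypothetical forum markup and configuration, for illustration only.
SAMPLE = (
    '<div class="thread">'
    '<div class="post-body">Started chemo <b>last week</b>.</div>'
    '<div class="post-body">Any tips for nausea?</div>'
    "</div>"
)
config = ForumConfig(name="example-forum", post_tag="div", post_class="post-body")
print(extract_posts(SAMPLE, config))
```

Supporting a new forum would then mean adding another `ForumConfig` rather than writing a new parser; sites whose markup cannot be described by a single tag and class would need a richer configuration.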

Data Structuring

  • Domain knowledge: Understanding which data and metadata elements need to be extracted from patient forums.
  • Technical expertise: Structuring the data with the necessary pre-processing steps and storing it in an efficient format that can be used directly to build machine learning models.
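One way to picture the structuring step is a fixed record schema plus a few pre-processing functions, with the output written as JSON Lines (one JSON object per post) so that downstream model code can stream it. The field names, the raw date format, and the cleaning rules below are assumptions for illustration. Hashing the user handle is included because forum data is sensitive, not because the source specifies it.

```python
import hashlib
import html
import json
import re
from dataclasses import asdict, dataclass
from datetime import datetime

URL_RE = re.compile(r"https?://\S+")


@dataclass
class ForumPost:
    """One structured record per post (field names are hypothetical)."""
    post_id: str
    forum: str
    author_hash: str  # pseudonymized handle; the raw user name is not stored
    posted_at: str    # ISO-8601 timestamp
    text: str


def clean_text(raw: str) -> str:
    """Unescape HTML entities, drop URLs, and collapse whitespace (case is kept)."""
    text = html.unescape(raw)
    text = URL_RE.sub(" ", text)
    return " ".join(text.split())


def to_record(raw: dict, forum: str) -> ForumPost:
    """Map one raw scraped post (assumed keys: id, user, date, body) to the schema."""
    return ForumPost(
        post_id=str(raw["id"]),
        forum=forum,
        author_hash=hashlib.sha256(raw["user"].encode("utf-8")).hexdigest()[:16],
        # Assumed source date format: day/month/year hour:minute
        posted_at=datetime.strptime(raw["date"], "%d/%m/%Y %H:%M").isoformat(),
        text=clean_text(raw["body"]),
    )


def to_jsonl(records) -> str:
    """Serialize records as JSON Lines, one object per line."""
    return "\n".join(json.dumps(asdict(r), ensure_ascii=False) for r in records)


# Hypothetical raw post, as a scraper might return it.
raw_posts = [{
    "id": 101,
    "user": "hopeful_anna",
    "date": "03/02/2021 14:05",
    "body": "Finished round 2 of chemo &amp; feeling tired. Info: https://example.org/x",
}]
records = [to_record(p, forum="example-forum") for p in raw_posts]
print(to_jsonl(records))
```

Case is preserved on purpose, since capitalization (for example in drug names) can help later entity extraction.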

Entity Extraction

  • Domain knowledge: Knowing which entities are relevant to various pharma business stakeholders (e.g., disease state, treatment status, treatment experience) and the ability to define their constituent taxonomies.
  • Technical expertise: Ensemble machine learning components that derive highly accurate multi-class and multi-label entities.
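The source does not say which models make up the ensemble. As a hedged illustration of the combining step only, the sketch below assumes each base model returns a set of labels for a post and keeps a label when more than a given share of models predict it (majority voting applied per label, which suits multi-label output). The keyword-based base models and the label names are hypothetical stand-ins for trained classifiers.

```python
from collections import Counter
from typing import Callable, Iterable

# A base model maps post text to the set of labels it predicts.
Model = Callable[[str], set[str]]


def keyword_model(rules: dict[str, Iterable[str]]) -> Model:
    """Hypothetical base model: predicts a label if any of its keywords occurs."""
    def predict(text: str) -> set[str]:
        lowered = text.lower()
        return {label for label, words in rules.items()
                if any(w in lowered for w in words)}
    return predict


def ensemble_predict(models: list[Model], text: str, min_share: float = 0.5) -> set[str]:
    """Per-label voting: keep labels predicted by more than min_share of the models."""
    votes = Counter(label for model in models for label in model(text))
    return {label for label, n in votes.items() if n / len(models) > min_share}


# Three deliberately different (hypothetical) base models.
m1 = keyword_model({"chemotherapy": ["chemo"], "fatigue": ["tired", "exhausted"]})
m2 = keyword_model({"chemotherapy": ["chemo", "infusion"], "nausea": ["nausea"]})
m3 = keyword_model({"fatigue": ["fatigue"], "nausea": ["nausea", "vomit"]})

post = "Second chemo cycle done, feeling tired and some nausea."
print(sorted(ensemble_predict([m1, m2, m3], post)))
```

Here "fatigue" is dropped because only one of the three models predicts it; lowering `min_share` trades precision for recall.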

Patient Profiling

  • Domain knowledge: Understanding which patient attributes are suitable for generating targeted patient profiles, and the ability to turn those profiles into business-interpretable insights.
  • Technical expertise: A hypothesis-driven approach to variable selection, followed by a rigorous segmentation exercise.
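The source does not name the segmentation method. Purely as an assumption, the sketch below uses k-means (Lloyd's algorithm) with a deterministic farthest-point initialization on a few hypothetical numeric patient attributes, after min-max scaling so that no single attribute dominates the distance. Per-segment averages of the unscaled attributes then serve as simple, readable profiles.

```python
import math


def min_max_scale(rows):
    """Scale each column to [0, 1]; constant columns become 0."""
    cols = list(zip(*rows))
    lo = [min(c) for c in cols]
    hi = [max(c) for c in cols]
    return [[(v - l) / (h - l) if h > l else 0.0 for v, l, h in zip(r, lo, hi)]
            for r in rows]


def farthest_point_init(points, k):
    """Start from the first point, then repeatedly add the point farthest from all centroids."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(list(far))
    return centroids


def kmeans(points, k, iterations=100):
    """Lloyd's algorithm; returns one cluster index per point."""
    centroids = farthest_point_init(points, k)
    labels = None
    for _ in range(iterations):
        # Assign each point to its nearest centroid.
        new_labels = [min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Move each centroid to the mean of its members (keep it if the cluster is empty).
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = [sum(c) / len(members) for c in zip(*members)]
    return labels


# Hypothetical attributes per patient: [age, treatment lines mentioned, posts per month]
patients = [[34, 1, 12], [29, 1, 15], [41, 2, 10], [67, 4, 2], [72, 3, 1], [69, 5, 3]]
labels = kmeans(min_max_scale(patients), k=2)

for seg in sorted(set(labels)):
    members = [row for row, lab in zip(patients, labels) if lab == seg]
    means = [round(sum(c) / len(members), 1) for c in zip(*members)]
    print(f"segment {seg}: n={len(members)}, mean attributes={means}")
```

In practice, the number of segments and the attributes fed into them would come from the hypothesis-driven variable selection described above rather than being fixed in code.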
