Pharmaceutical industry

Forum analysis

A pharmaceutical company needed to understand the real problems of patients with Sjögren's syndrome: not the ones measured in clinical trials, but the ones patients describe to each other. We built an NLP system that extracts trends, themes, and weak signals from thousands of forum discussions.

Problem

Medical forums contain what no clinical study captures: the raw voices of patients. Their frustrations, their unreported side effects, the alternative treatments they try, the questions they are afraid to ask their doctor. For the client, this information was a strategic gold mine, but unusable in its raw form: hundreds of thousands of unstructured messages written in everyday language, with no tools to collect them, organize them, and extract actionable trends.


Solution

What we built

We designed an end-to-end system that collects forum discussions and analyzes them with NLP to turn them into strategic insights.

Step 1: Automated collection. We scraped 216,000 posts from the sjogrensworld.org forum and inserted them as structured documents into a MongoDB database. The collection pipeline is designed to be reproducible on other sources.
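
As an illustration of the collection step, here is a minimal sketch of turning one scraped thread page into structured documents ready for insertion. The HTML markup (`div class="post"`, `author`, `body`), the field names, and the helper `to_documents` are assumptions made for the example; the actual forum markup and the project's schema are not described here. The sketch assumes posts do not contain nested `div` elements.

```python
import hashlib
from html.parser import HTMLParser


class PostParser(HTMLParser):
    """Collects the author and text of each <div class="post"> block.

    The class names are hypothetical, not the real forum's markup.
    """

    def __init__(self):
        super().__init__()
        self.posts = []
        self._current = None  # post being built, or None outside a post
        self._field = None    # "author" or "body" while inside that element

    def handle_starttag(self, tag, attrs):
        cls = dict(attrs).get("class", "")
        if tag == "div" and cls == "post":
            self._current = {"author": "", "body": ""}
        elif self._current is not None and cls in ("author", "body"):
            self._field = cls

    def handle_data(self, data):
        if self._current is not None and self._field:
            self._current[self._field] += data

    def handle_endtag(self, tag):
        if self._field and tag in ("span", "p"):
            self._field = None
        elif tag == "div" and self._current is not None:
            self.posts.append(self._current)
            self._current = None


def to_documents(html, thread_id):
    """Parse one thread page into dicts with a deterministic _id."""
    parser = PostParser()
    parser.feed(html)
    docs = []
    for position, post in enumerate(parser.posts):
        author = post["author"].strip()
        body = " ".join(post["body"].split())  # collapse whitespace
        key = f"{thread_id}:{position}:{author}:{body}"
        docs.append({
            "_id": hashlib.sha1(key.encode("utf-8")).hexdigest(),
            "thread_id": thread_id,
            "position": position,
            "author": author,
            "body": body,
        })
    return docs


SAMPLE = """
<div class="post"><span class="author">anna</span>
  <p class="body">Dry eyes are worse   since spring.</p></div>
<div class="post"><span class="author">ben</span>
  <p class="body">Same here, drops barely help.</p></div>
"""
docs = to_documents(SAMPLE, thread_id="t-1")
```

Because each `_id` is derived from the post content, re-running the scraper produces the same identifiers. With PyMongo, a call such as `collection.replace_one({"_id": doc["_id"]}, doc, upsert=True)` would then make re-insertion idempotent, which is one way to keep a pipeline reproducible.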

Step 2: Descriptive analysis. An overview of the collected data: publication volume, engagement rate, links between posts, a map of the most active users, and community dynamics.
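
The descriptive indicators above can be sketched with simple counting. The records, the field names (`thread`, `author`, `reply_to`), and the choice of "replies per thread" as an engagement proxy are assumptions for illustration, not the project's actual schema or metrics.

```python
from collections import Counter

# Hypothetical post records; field names are assumptions.
posts = [
    {"id": 1, "thread": "dry-eyes", "author": "anna", "reply_to": None},
    {"id": 2, "thread": "dry-eyes", "author": "ben", "reply_to": 1},
    {"id": 3, "thread": "dry-eyes", "author": "anna", "reply_to": 2},
    {"id": 4, "thread": "fatigue", "author": "cara", "reply_to": None},
    {"id": 5, "thread": "dry-eyes", "author": "cara", "reply_to": 1},
]


def describe(posts):
    """Compute basic volume, engagement, and interaction statistics."""
    posts_per_author = Counter(p["author"] for p in posts)
    posts_per_thread = Counter(p["thread"] for p in posts)
    replies = [p for p in posts if p["reply_to"] is not None]
    replies_received = Counter(p["reply_to"] for p in replies)

    # Directed edges between users: (replier, author replied to) -> count.
    author_of = {p["id"]: p["author"] for p in posts}
    reply_edges = Counter(
        (p["author"], author_of[p["reply_to"]])
        for p in replies
        if p["reply_to"] in author_of
    )

    return {
        "most_active": posts_per_author.most_common(3),
        "posts_per_thread": dict(posts_per_thread),
        "replies_per_thread": len(replies) / len(posts_per_thread),
        "most_replied_post": replies_received.most_common(1),
        "reply_edges": reply_edges,
    }


stats = describe(posts)
```

The `reply_edges` counter is a small directed graph of who answers whom, which is the raw material for mapping the most active users and the dynamics between them.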

Step 3: Topic modeling and sentiment analysis. Topic modeling automatically surfaces the themes patients discuss, and sentiment analysis measures the emotional weight of each one. Together they let us identify and dissect current and past trends.
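
To make the coupling of topics and sentiment concrete, here is a minimal sketch that averages a sentiment score over posts grouped by topic. It assumes each post already carries a topic label from an upstream topic model (not shown), and it swaps in a tiny hand-written lexicon as a stand-in for a real sentiment model. The lexicon, the sample posts, and the function names are all illustrative assumptions.

```python
import re
from collections import defaultdict
from statistics import mean

# Toy polarity lexicon (an assumption); real lexicons are far larger
# and this sketch ignores negation and intensity.
LEXICON = {
    "relief": 1, "better": 1, "helped": 1,
    "worse": -1, "pain": -1, "exhausted": -1, "afraid": -1,
}


def sentiment(text):
    """Mean polarity of the lexicon words found in text; 0.0 if none match."""
    tokens = re.findall(r"[a-z']+", text.lower())
    scores = [LEXICON[t] for t in tokens if t in LEXICON]
    return mean(scores) if scores else 0.0


def sentiment_by_topic(posts):
    """Average the per-post sentiment within each topic label."""
    grouped = defaultdict(list)
    for post in posts:
        grouped[post["topic"]].append(sentiment(post["text"]))
    return {topic: mean(values) for topic, values in grouped.items()}


# Hypothetical posts with topic labels assumed to come from a topic model.
posts = [
    {"topic": "dry eyes", "text": "The new drops helped, real relief."},
    {"topic": "dry eyes", "text": "Mine got worse after a week."},
    {"topic": "fatigue", "text": "Exhausted every day and the pain is worse."},
]
by_topic = sentiment_by_topic(posts)
```

Adding a date to each post and grouping by topic and month in the same way would give the kind of per-topic trend over time that the step describes.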

Step 4: Extraction of latent knowledge. Word-embedding techniques detect weak signals: emerging problems, undocumented associations between drugs and side effects, and shifts in how treatments are perceived.
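
The idea behind embedding-based association can be sketched without a trained model. The example below builds count-based context vectors (a simple stand-in for learned embeddings such as word2vec, which this page does not specify) and ranks candidate symptoms by cosine similarity to a drug name. The drug names `drugx` and `drugy`, the sentences, and the function names are invented for illustration.

```python
import math
import re
from collections import Counter, defaultdict


def context_vectors(sentences):
    """Map each word to counts of the other words in the same sentence."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        tokens = re.findall(r"[a-z]+", sentence.lower())
        for i, word in enumerate(tokens):
            for j, other in enumerate(tokens):
                if i != j:
                    vectors[word][other] += 1
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_associations(term, candidates, vectors):
    """Sort candidate words by similarity to term, most similar first."""
    scores = {c: cosine(vectors[term], vectors[c]) for c in candidates}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Invented sentences with fictional drug names.
sentences = [
    "started drugx and the nausea came back",
    "nausea again after my drugx dose",
    "drugy left me with a rash",
    "the rash faded when i stopped drugy",
]
vectors = context_vectors(sentences)
ranked_x = rank_associations("drugx", ["rash", "nausea"], vectors)
ranked_y = rank_associations("drugy", ["rash", "nausea"], vectors)
```

On this toy corpus, `nausea` ranks first for `drugx` and `rash` ranks first for `drugy`. On real forum data, a high similarity is only a candidate signal to review, not evidence of a side effect.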
