Problem
Medical forums contain what no clinical study captures: the raw voices of patients. Their frustrations, their unreported side effects, the alternative treatments they are trying, the questions they are afraid to ask their doctor. For the client, this information was a strategic gold mine, but unusable in its raw form: hundreds of thousands of unstructured messages written in everyday language, with no tools to collect them, organize them, or extract actionable trends.

Solution
What we built
We designed an end-to-end collection and NLP analysis system that turns forum discussions into strategic insights.
Step 1: Automated collection. We scraped 216,000 posts from the sjogrensworld.org forum and inserted them in structured form into a MongoDB database. The collection pipeline is designed to be reproducible on other sources.
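As an illustration only, here is a minimal sketch of what the parse-and-store part of such a pipeline could look like. The markup (`<div class="post" data-id=... data-author=...>`), the field names and the `store_posts` helper are assumptions made for the example, not the forum's real HTML or the project's actual code. The `collection` argument can be any object with an `insert_many` method, such as a pymongo collection.

```python
from html.parser import HTMLParser


class PostParser(HTMLParser):
    """Collects posts from a hypothetical markup:
    <div class="post" data-id="..." data-author="...">text</div>
    """

    def __init__(self):
        super().__init__()
        self.posts = []
        self._current = None  # post being read, or None outside a post
        self._depth = 0       # nesting depth of <div> inside the post

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if self._current is None:
            classes = (attrs.get("class") or "").split()
            if tag == "div" and "post" in classes:
                self._current = {
                    "_id": attrs.get("data-id"),
                    "author": attrs.get("data-author"),
                    "text": [],
                }
                self._depth = 1
        elif tag == "div":
            self._depth += 1

    def handle_endtag(self, tag):
        if self._current is not None and tag == "div":
            self._depth -= 1
            if self._depth == 0:
                # Join the text fragments and normalize whitespace.
                raw = "".join(self._current["text"])
                self._current["text"] = " ".join(raw.split())
                self.posts.append(self._current)
                self._current = None

    def handle_data(self, data):
        if self._current is not None:
            self._current["text"].append(data)


def store_posts(html, collection, source):
    """Parse one page and insert its posts; returns the number stored."""
    parser = PostParser()
    parser.feed(html)
    parser.close()
    docs = [dict(post, source=source) for post in parser.posts]
    if docs:  # pymongo's insert_many rejects an empty list
        collection.insert_many(docs)
    return len(docs)
```

Tagging every document with a `source` field is one simple way to keep the same pipeline reusable on other forums: only the parser needs to change per site.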
Step 2: Descriptive analysis. An overview of the collected data: posting volume, engagement rate, links between posts, and a mapping of the most active users and community dynamics.
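A rough sketch of these descriptive indicators, assuming each stored post carries hypothetical `_id`, `author`, `thread_id` and `reply_to` fields. The engagement measure used here (share of threads that received at least one reply) is one possible definition chosen for the example, not necessarily the one used in the project.

```python
from collections import Counter


def describe(posts, top_n=3):
    """Compute simple volume, engagement and reply-network indicators."""
    by_author = Counter(p["author"] for p in posts)

    # Count replies per thread (a post is a reply if reply_to is set).
    replies_per_thread = Counter(
        p["thread_id"] for p in posts if p.get("reply_to") is not None
    )
    threads = {p["thread_id"] for p in posts}
    answered = sum(1 for t in threads if replies_per_thread[t] > 0)

    # Directed edges (replier, original author) sketch the community graph.
    author_of = {p["_id"]: p["author"] for p in posts}
    edges = Counter(
        (p["author"], author_of[p["reply_to"]])
        for p in posts
        if p.get("reply_to") in author_of
    )

    return {
        "post_count": len(posts),
        "top_authors": by_author.most_common(top_n),
        "answered_thread_share": answered / len(threads) if threads else 0.0,
        "reply_edges": dict(edges),
    }
```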
Step 3: Topic modeling and sentiment analysis. Automatic discovery of patient-specific themes via topic modeling, coupled with sentiment analysis to measure the emotional weight of each topic. This lets us identify and dissect current and past trends.
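To make the coupling of topics and sentiment concrete, here is a deliberately simplified sketch. It assumes each post already carries a `topic` label produced by a topic model (not shown), and it uses a tiny hand-written word lexicon as a stand-in for a real sentiment model. The lexicon, the field names and the scoring rule are illustrative assumptions.

```python
from collections import defaultdict

# Toy lexicon, for illustration only; a real system would use a trained
# sentiment model or a validated lexicon.
POSITIVE = {"better", "relief", "helped", "improved"}
NEGATIVE = {"worse", "pain", "tired", "afraid"}


def sentiment_score(text):
    """Return a score in [-1, 1]: (positive - negative) / matched words."""
    tokens = [t.strip(".,!?;:").lower() for t in text.split()]
    pos = sum(t in POSITIVE for t in tokens)
    neg = sum(t in NEGATIVE for t in tokens)
    total = pos + neg
    return (pos - neg) / total if total else 0.0


def sentiment_by_topic(posts):
    """Average the per-post sentiment score within each topic."""
    scores = defaultdict(list)
    for post in posts:
        scores[post["topic"]].append(sentiment_score(post["text"]))
    return {topic: sum(v) / len(v) for topic, v in scores.items()}
```

Grouping the same scores by topic and by posting month, instead of by topic alone, would give the kind of trend view described above.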
Step 4: Extraction of latent knowledge. Use of word-embedding techniques to detect weak signals: emerging problems, undocumented associations between drugs and side effects, and changes in how treatments are perceived.
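The idea behind this step can be sketched with a very small distributional model. Instead of learned embeddings such as word2vec, the example below represents each word by its co-occurrence counts within a context window and compares words by cosine similarity. This is a simpler stand-in chosen to keep the sketch self-contained, and the tokens (`drug_a`, `drug_b`) are placeholders, not real data.

```python
import math
from collections import Counter, defaultdict


def cooccurrence_vectors(sentences, window=2):
    """Map each word to a Counter of the words seen within `window` tokens."""
    vectors = defaultdict(Counter)
    for tokens in sentences:
        for i, word in enumerate(tokens):
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def nearest_terms(vectors, term, candidates):
    """Rank candidate terms by similarity to `term`, most similar first."""
    target = vectors.get(term, Counter())
    scores = [
        (c, cosine(target, vectors[c])) for c in candidates if c in vectors
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Usage on a toy corpus of tokenized posts (placeholder tokens).
corpus = [
    ["drug_a", "gave", "me", "nausea"],
    ["nausea", "after", "drug_a", "again"],
    ["drug_b", "cleared", "my", "rash"],
]
vectors = cooccurrence_vectors(corpus)
ranked = nearest_terms(vectors, "drug_a", ["rash", "nausea"])
```

On a real corpus, a candidate that ranks unexpectedly high for a given drug is only a weak signal to review by hand, not evidence of an actual side effect.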