Classification of web pages

A player in the banking sector had hundreds of thousands of uncategorized web pages and no way to link the content viewed to the products relevant to each user. We developed a text classification model that reaches 95% accuracy from a small training sample, paving the way for large-scale personalized product recommendations.

Problem

The client had hundreds of thousands of web pages across its sites, none of them reliably classified. Without categorized content, users' connection logs cannot be exploited to understand their journeys and recommend the right products. The technical challenge: build an efficient and interpretable classification model from a very small training sample (fewer than 1,500 annotated pages out of a corpus of several hundred thousand).

Solution
What we built
We deployed a team of two Data Scientists and one Data Engineer to design the classification pipeline end to end.
Step 1 — Scraping and data preparation. Automated extraction of page content, then cleaning and standardization of the text to remove HTML noise and non-informative elements.
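A minimal sketch of this kind of extraction and cleaning step is shown below; the actual crawler, URL lists, and filtering rules are internal to the project, and the use of requests and BeautifulSoup here is an assumption.

```python
# Illustrative extraction/cleaning step (not the project's actual pipeline).
import re
import requests
from bs4 import BeautifulSoup

def extract_clean_text(url: str) -> str:
    """Fetch a page and return its visible text, stripped of HTML noise."""
    html = requests.get(url, timeout=10).text
    soup = BeautifulSoup(html, "html.parser")
    # Drop non-informative elements: scripts, styles, navigation, footers.
    for tag in soup(["script", "style", "nav", "footer", "header"]):
        tag.decompose()
    text = soup.get_text(separator=" ")
    # Normalize whitespace and lowercase for the downstream encoding step.
    return re.sub(r"\s+", " ", text).strip().lower()
```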
Step 2 — Semantic encoding. Transformation of the text into vector representations usable by models, testing several approaches: TF-IDF as a baseline, then Word2Vec and Doc2Vec to capture semantics beyond exact keywords.
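The sketch below illustrates the two encoding families that were compared, using scikit-learn and gensim; the hyperparameters are placeholders, not the project's actual settings.

```python
# Two encoding approaches: sparse TF-IDF baseline vs. dense Doc2Vec vectors.
from sklearn.feature_extraction.text import TfidfVectorizer
from gensim.models.doc2vec import Doc2Vec, TaggedDocument

docs = ["open a savings account online", "compare mortgage loan rates"]

# Baseline: TF-IDF keys on word frequency, with no notion of synonymy.
tfidf = TfidfVectorizer(max_features=20000)
X_tfidf = tfidf.fit_transform(docs)

# Doc2Vec: dense document vectors that capture semantics beyond keywords.
tagged = [TaggedDocument(words=d.split(), tags=[i]) for i, d in enumerate(docs)]
d2v = Doc2Vec(vector_size=100, min_count=1, epochs=20)
d2v.build_vocab(tagged)
d2v.train(tagged, total_examples=d2v.corpus_count, epochs=d2v.epochs)
X_d2v = [d2v.infer_vector(d.split()) for d in docs]
```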
Step 3 — Deep Learning modeling. Development of a bidirectional recurrent neural network (Bidirectional LSTM) that classifies pages into the categories defined by the business. The model uses the context of the text in both reading directions to maximize comprehension of the content.
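A minimal Keras version of such an architecture might look like the following; the vocabulary size, embedding dimension, and number of categories are placeholder values, not the project's.

```python
# Minimal Bidirectional LSTM text classifier (illustrative architecture only).
import tensorflow as tf
from tensorflow.keras import layers

VOCAB_SIZE = 20000   # assumed vocabulary size
NUM_CLASSES = 12     # assumed number of business-defined categories

model = tf.keras.Sequential([
    layers.Embedding(VOCAB_SIZE, 128),
    # Reads the token sequence in both directions, as described above.
    layers.Bidirectional(layers.LSTM(64)),
    layers.Dropout(0.5),
    layers.Dense(NUM_CLASSES, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```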
Step 4 — Interpretability. Implementation of heatmaps that let business teams visualize which words and passages drove each classification, so they can verify that the model relies on the right signals.
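The attribution method used in the project is not specified here; one common way to build such word-level heatmaps is occlusion, scoring each word by how much the predicted class probability drops when it is masked. A sketch under that assumption (`encode` is a hypothetical helper matching the encoding step above):

```python
# Occlusion-based word importances for a fitted Keras classifier.
import numpy as np

def word_importances(model, encode, tokens, target_class):
    """Score each token by the probability drop when it is removed.

    `encode` (hypothetical) turns a token list into a model-ready batch,
    applying the same indexing/padding as at training time."""
    base = float(model.predict(encode(tokens), verbose=0)[0][target_class])
    scores = []
    for i in range(len(tokens)):
        masked = tokens[:i] + tokens[i + 1:]
        p = float(model.predict(encode(masked), verbose=0)[0][target_class])
        scores.append(base - p)  # bigger drop = more influential word
    return np.array(scores)  # map scores to a color scale to render the heatmap
```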