
Digital analytics is used to support a wide range of data science projects, and vice versa. Features built with machine learning and AI are also emerging in web analytics solutions – predictions, targeting, new services, etc. In a recent publication, Gartner refers to augmented analytics as the main trend and priority for CDOs in 2020. We recently interviewed Jérémie Bureau, data scientist and head of the data science team at AT Internet, about data science and analytics. Read on to learn more…
How do you become a data scientist?
There are many ways into the world of data science. Engineering schools and universities offer courses from master’s to PhD level, and demand for data scientists is so high that specialised private schools are starting to spring up. Personally, I studied applied mathematics at the University of Bordeaux, then went on to a doctorate in mathematics and statistics at the University of Toulouse. I wrote my thesis under a CIFRE agreement (the French industrial convention for training through research) and worked as an R&D engineer for a startup during the three years of my doctorate. My thesis was on the reliability of geolocation systems in an aeronautical context. I have since worked in various fields such as health, employment and digital technology.
How does data science specifically apply to an analytics solution in SaaS mode?
When working on issues that require data processing, before even getting into predictive models or machine learning, we need to meet two requirements in order to extract actionable, value-added information: first, collecting a sufficient volume of data, and second, ensuring it is representative of the population we want to study. AT Internet’s huge advantage is the variety of its customer websites, which makes it possible to tick both boxes!
However, each site has its own specific properties depending on its business sector, and these can differ enormously from one sector to another: e-commerce sites, media, advertisers, banks, institutional sites, etc. The data science team needs to provide tools that serve all our customers and help them optimise their marketing strategy. These tools, based on mathematical algorithms and models, must make it possible to describe and predict the behaviour of Internet users.
One example is a segmentation method that identifies the users who purchase the most, or alternatively those with a high probability of churn (unsubscribing or not returning to a site). It is often a matter of choosing between a generic model with acceptable average performance across all customer sites and a specific model built for a group of similar sites.
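To make this concrete, here is a minimal sketch of a generic churn-propensity model trained on per-visitor behavioural features. It is illustrative only, not AT Internet’s actual implementation; the feature names, the input file and the scikit-learn pipeline are assumptions.

```python
# Illustrative sketch of a generic churn-propensity model.
# Feature names and the input file are hypothetical.
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# One row per visitor: behavioural features plus a churn label
# (1 = did not return within the observation window).
df = pd.read_csv("visitor_features.csv")
features = ["visits_30d", "days_since_last_visit",
            "avg_pages_per_visit", "purchases_90d"]

X_train, X_test, y_train, y_test = train_test_split(
    df[features], df["churned"], test_size=0.2, random_state=42
)

model = GradientBoostingClassifier()
model.fit(X_train, y_train)

# Probability of churn per visitor, usable for targeting or segmentation.
churn_proba = model.predict_proba(X_test)[:, 1]
print("AUC:", roc_auc_score(y_test, churn_proba))
```

The trade-off described above then becomes a training question: fit one such model on data from all customer sites, or fit a separate model per group of similar sites.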
How and why is data science useful for web analysts today?
Data science can now provide descriptive, predictive and even prescriptive tools to support analysts. There are countless metrics to monitor and understand in order to obtain useful information, and it is simply not realistic to follow this vast number of metrics manually.

One application of machine learning that supports analysts is an automatic anomaly detection service. The goal is to capture unusual or suspicious fluctuations in metrics over time. Our teams are currently working on analyses that explain the probable causes of these anomalies. For example, if a bot crawls a site and causes a significant peak in traffic, an anomaly is detected on the number of pages viewed. We aim to support analysts in their investigative work by automatically exploring a set of dimensions (source, device, browser, etc.): in this example, our causality analysis module would show that the anomaly was caused by an abnormal increase in direct traffic from Canada on Chrome 55. This type of tool enables the analyst to carry out an initial analysis, gain a better understanding of behaviour, and anticipate and implement the necessary actions or strategies.
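As a rough illustration of the detection step (a sketch only, not the production algorithm), unusual fluctuations can be flagged by comparing each point to a rolling baseline; the column names, window size and 3-sigma threshold below are assumptions.

```python
# Illustrative anomaly detection on a daily page-views series:
# flag points that deviate from a rolling baseline by more than
# `threshold` standard deviations (the 3-sigma rule is an assumption).
import pandas as pd

def detect_anomalies(series: pd.Series, window: int = 30,
                     threshold: float = 3.0) -> pd.Series:
    baseline = series.rolling(window, min_periods=window // 2).mean()
    spread = series.rolling(window, min_periods=window // 2).std()
    z_score = (series - baseline) / spread
    return z_score.abs() > threshold

# Hypothetical file: one row per day with a page_views column.
page_views = pd.read_csv("daily_page_views.csv", index_col="date",
                         parse_dates=True)["page_views"]
anomalies = detect_anomalies(page_views)
print(page_views[anomalies])  # days with unusual traffic, e.g. a bot-driven peak
```

In practice the same check could then be rerun per dimension (source, device, browser, country, etc.) to surface the segment driving the anomaly, which is the spirit of the causality analysis described above.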

RFM segmentation is another use case: a clustering (segmentation) of customers according to their purchasing habits, used to optimise a marketing strategy. Customer transactions are analysed against three criteria: the date of the last purchase (Recency), the number of purchases over a given period (Frequency), and the cumulative amount spent over that period (Monetary value). Scoring methods are then used to create customer segments, such as “Stars”, who buy a lot and have bought recently, or “Thrifty dormants”, who have a poor recency score.

At AT Internet, we have decided to integrate an automatic RFM clustering feature: the idea is to provide a turnkey analysis that automatically adjusts to the customer’s context, and in particular to seasonal fluctuations. Prediction elements are also added and presented in a set of dedicated graphs. Our teams are currently applying the same segmentation methodologies to metrics related to engagement rather than purchasing, so that these features can also be used on non-transactional sites.
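For readers who want to see the mechanics, here is a minimal sketch of RFM clustering with k-means. The transaction schema (customer_id, date, amount) and the number of clusters are assumptions; AT Internet’s actual feature adjusts automatically to the customer context.

```python
# Illustrative RFM segmentation with k-means.
# The transaction schema and the number of clusters are assumptions.
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

transactions = pd.read_csv("transactions.csv", parse_dates=["date"])
now = transactions["date"].max()

# One row per customer with the three RFM criteria.
rfm = transactions.groupby("customer_id").agg(
    recency=("date", lambda d: (now - d.max()).days),  # days since last purchase
    frequency=("date", "count"),                        # purchases over the period
    monetary=("amount", "sum"),                         # cumulative amount spent
)

# Standardise so that no single criterion dominates the distance metric.
scaled = StandardScaler().fit_transform(rfm)
rfm["segment"] = KMeans(n_clusters=4, random_state=0, n_init=10).fit_predict(scaled)

# Segment profiles: e.g. "Stars" = low recency, high frequency and monetary value.
print(rfm.groupby("segment").mean())
```

Each cluster’s average recency, frequency and monetary value can then be used to label the segments along the lines of the “Stars” and “Thrifty dormants” examples above.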

What are the data science team’s challenges at AT Internet?
Firstly, building a data science roadmap in line with the needs of our users; our priority is to be attentive and responsive. From an organisational point of view, our team is now part of a high-level development environment, which requires a workflow combining major R&D work, industrialisation and the continuous optimisation of our models.

Each member of the team must now be able to handle both modelling and industrialisation issues. The technologies and tools the team uses are very diverse: Python, R, Shiny, Scala, Spark, Elasticsearch, Kibana, Snowflake, AWS, Kubernetes, Jenkins, Git, etc. The other key challenge is to ensure that everyone on the team keeps developing their skills at a consistent pace. To do this, we work with platforms such as DataCamp and Kaggle.
And to sum up…
It’s important to always stay sharp and
attentive, with a passion for discovery and learning – “Data science is driven
by curiosity”.