
Modern Data Stack: a modern data infrastructure for strategic decisions

Anthony Bouyer

Introduction: why a Modern Data Stack?

In a fast-growing company like Make IT Safe, data is everywhere. It comes from sources as diverse as flat files, Excel spreadsheets, MySQL databases and tools like Pipedrive. But data, essential as it is, quickly becomes unusable if it is not organised, cleaned and made ready for analysis. It is like a library full of books without labels or classification: good luck finding what you’re looking for!

Before deploying our Modern Data Stack, each team worked with its own data, often isolated from others. Not ideal! We wasted time searching for information, and our decisions rested on often incomplete data.

To solve this, we decided to build a modern, robust infrastructure: our Modern Data Stack. It unifies all our data, makes it accessible and above all secure.

Our approach: an open-source, secure Modern Data Stack

Our technology choices were guided by three principles:

  1. Data sovereignty: all our data stays on our servers. Nothing leaves our infrastructure.
  2. Open source: we want to control the tools we use, without depending on costly or rigid proprietary solutions.
  3. Flexibility and security: every component of our stack is deployed and configured in a secure environment, aligned with market standards.

It’s as if we built our own factory, perfectly adapted to our needs, where every machine (or tool) plays a precise role in transforming raw data into actionable analyses.

The blocks of our Modern Data Stack

We structured our stack around four main tools, each playing a key role in the data-processing pipeline.

Deploying our Modern Data Stack was neither a whim nor the result of an attraction to “new” and “cool” tools. Each choice was carefully considered and rested on two essentials: our internal expertise and external guidance.

We already had significant experience with the chosen technologies. Our team mastered MySQL and had solid advanced-SQL foundations, which naturally led us to DBT and ClickHouse. We also knew data integration and ETL pipelines were challenges we could tackle effectively with Airbyte.

But to ensure the transition happened under the best conditions, we didn’t hesitate to call on external experts. These partners helped us configure the tools, optimise pipelines and avoid common pitfalls while sharing best practices adapted to our needs. This support accelerated our deployment and helped internal teams skill up on these new technologies.

By combining internal expertise with targeted external support, we built an infrastructure that is not just high-performance but also aligned with our strategic goals and operational constraints. This approach let us adopt modern tools while staying pragmatic, without giving in to mere fashion. Each tool was chosen for its ability to address specific needs while integrating smoothly with the other components.

Data processing at Make IT Safe

Airbyte: the universal connector doing the dirty work

Airbyte is our “data vacuum cleaner”. It fetches information wherever it sits. Whatever the source (a MySQL database, an API or a CSV file), Airbyte connects to it and delivers the data directly into our warehouse.

But Airbyte doesn’t just extract data: it also manages pipelines reliably. If a sync fails (and it happens!), it is restarted automatically.
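To make the idea of an automatic restart concrete, here is a minimal sketch of a retry loop with exponential backoff. It is not Airbyte’s actual implementation: the names `run_with_retries` and `make_flaky_sync`, the number of attempts and the delays are all assumptions chosen for illustration.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def run_with_retries(
    task: Callable[[], T],
    max_attempts: int = 3,
    base_delay_s: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run `task`, retrying on any exception with exponential backoff.

    Hypothetical helper for illustration; the defaults are assumptions.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return task()
        except Exception:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            # Wait 1x, 2x, 4x, ... the base delay before the next attempt.
            sleep(base_delay_s * 2 ** (attempt - 1))
    raise AssertionError("unreachable")


def make_flaky_sync(failures_before_success: int) -> Callable[[], str]:
    """Build a fake sync job that fails a fixed number of times, then succeeds."""
    remaining = failures_before_success

    def sync() -> str:
        nonlocal remaining
        if remaining > 0:
            remaining -= 1
            raise RuntimeError("source temporarily unavailable")
        return "synced"

    return sync
```

With `make_flaky_sync(2)` and three attempts, the third call succeeds after two waits; with more failures than attempts, the last error is raised so that someone can investigate.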

Concrete example

Say we want to retrieve data from our Pipedrive CRM to track sales opportunities. Thanks to Airbyte, we set up a connector that syncs this data daily with our data warehouse. No more manual downloads.

Why we love it:

  • Open source, fully customisable.
  • Huge connector library — over 300 integrations.
  • Clear interface to supervise every data flow.

DBT: the conductor of data transformations

If Airbyte is the vacuum, DBT (Data Build Tool) is the architect. It takes extracted raw data and transforms it into analysis-ready models. The idea: structure data to be understandable and usable by teams.

With DBT, our analysts can write SQL models to clean, aggregate or enrich data. Plus, DBT keeps track of every transformation, guaranteeing full transparency and auditability.

Concrete example

Take customer-satisfaction data. It arrives raw in our warehouse — a mix of comments, numerical ratings and metadata. With DBT, we built a model that automatically classifies feedback (positive, neutral, negative) and computes a global satisfaction score.
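The logic of such a model can be sketched in a few lines. In our stack it lives in a DBT SQL model; the Python below only illustrates the rule. The 1-to-5 rating scale, the thresholds and the 0-to-100 score are assumptions made for this example, not the actual model.

```python
from collections import Counter
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class Feedback:
    customer_id: str
    rating: int  # assumed scale: 1 (worst) to 5 (best)
    comment: str = ""


def classify(rating: int) -> str:
    """Map a rating to a label (assumed thresholds: 4-5, 3, 1-2)."""
    if not 1 <= rating <= 5:
        raise ValueError(f"rating out of range: {rating}")
    if rating >= 4:
        return "positive"
    if rating == 3:
        return "neutral"
    return "negative"


def label_counts(feedback: list[Feedback]) -> Counter:
    """Count how many feedback entries fall under each label."""
    return Counter(classify(f.rating) for f in feedback)


def satisfaction_score(feedback: list[Feedback]) -> float:
    """Mean rating rescaled from the 1-5 range to a 0-100 score."""
    if not feedback:
        raise ValueError("no feedback to score")
    return (mean(f.rating for f in feedback) - 1) / 4 * 100


sample = [
    Feedback("c1", 5, "Great support"),
    Feedback("c2", 4),
    Feedback("c3", 3),
    Feedback("c4", 2, "Onboarding was slow"),
]
print(label_counts(sample))        # positive: 2, neutral: 1, negative: 1
print(satisfaction_score(sample))  # 62.5
```

Keeping the thresholds in one small function mirrors what a versioned model gives us: the rule is defined once, reviewed like code and reused everywhere.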

Why it’s essential:

  • Transformations are versioned like code, easing collaboration.
  • Errors are easy to trace and fix thanks to model history.
  • Our teams save a lot of time by reusing existing models.

ClickHouse: the data warehouse built for speed

ClickHouse is the brain of our stack. All data transformed by DBT is stored here, ready to be analysed. Why ClickHouse? Because it’s designed to handle massive data volumes while staying ultra-fast.

Unlike a classic transactional database, ClickHouse is column-oriented and optimised for analytical queries. That means we can aggregate over very large tables, up to billions of rows, and often get a response in a fraction of a second.

Concrete example

We use ClickHouse to analyse marketing performance. How many leads were generated this week? Which channels work best? Answers that used to take hours are now available almost instantly.
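As an illustration, the “leads per channel this week” question could look like the sketch below. The table name `leads` and the columns `created_at` and `channel` are hypothetical, and the query string is only shown, not executed. The Python function reproduces the same aggregation on in-memory rows so the logic can be checked without a ClickHouse server.

```python
from collections import Counter
from datetime import date, timedelta

# Hypothetical ClickHouse query (table and column names are assumptions).
# toMonday() rounds a date down to the Monday of its week.
LEADS_PER_CHANNEL_SQL = """
SELECT channel, count() AS leads
FROM leads
WHERE created_at >= toMonday(today())
GROUP BY channel
ORDER BY leads DESC
"""


def start_of_week(day: date) -> date:
    """Return the Monday of the week containing `day`."""
    return day - timedelta(days=day.weekday())


def leads_per_channel_this_week(
    leads: list[tuple[date, str]], today: date
) -> list[tuple[str, int]]:
    """Count leads per channel since Monday, most productive channel first.

    Ties are broken by channel name here; the SQL above leaves tie order open.
    """
    week_start = start_of_week(today)
    counts = Counter(
        channel for created_on, channel in leads if created_on >= week_start
    )
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


rows = [
    (date(2024, 5, 10), "ads"),    # previous week, ignored
    (date(2024, 5, 13), "seo"),
    (date(2024, 5, 14), "ads"),
    (date(2024, 5, 14), "seo"),
    (date(2024, 5, 15), "email"),
]
print(leads_per_channel_this_week(rows, today=date(2024, 5, 15)))
# [('seo', 2), ('ads', 1), ('email', 1)]
```

In practice the aggregation runs inside ClickHouse, which is where the speed comes from; the Python version only documents what the query computes.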

Its superpowers:

  • Advanced data compression, reducing storage costs.
  • OLAP (Online Analytical Processing) architecture ideal for multi-dimensional analysis.
  • Scalability: it can handle petabytes of data without breaking a sweat.

Metabase: the interface that makes data meaningful

Metabase is our window into data: the tool we use to visualise and explore all the information consolidated in ClickHouse. One of Metabase’s biggest advantages is its simplicity. You don’t need to be a data scientist to use it.

With Metabase, each team creates its own dashboards or asks questions directly of the data, such as “What are our best-selling products?” or “What is the average satisfaction rate this quarter?”.

Concrete example

Our CSM team uses Metabase to track customer satisfaction. At a glance, they can see which customers need special attention and which actions to prioritise.

Why it’s great:

  • Intuitive, even for non-technical users.
  • Interactive, customisable dashboards.
  • API integration to automate report sending.

Security and governance: an absolute priority

Data security and governance aren’t optional for us:

  • Hosted in our secure network: every tool runs inside our infrastructure and no data leaves it.
  • Strict access control: only authorised people access specific data.
  • Full auditability: every change or query is tracked for transparency.
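The last two points can be pictured with a small sketch: a role-based check that records every access decision. The roles, dataset names and the `AccessGate` class are hypothetical. In a real deployment these rules would more likely be enforced through the permission systems of the tools themselves rather than custom code.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

# Hypothetical mapping from role to the datasets it may read.
ROLE_GRANTS: dict[str, frozenset[str]] = {
    "csm": frozenset({"customer_satisfaction"}),
    "marketing": frozenset({"leads", "campaigns"}),
    "data_engineer": frozenset({"customer_satisfaction", "leads", "campaigns"}),
}


@dataclass
class AccessGate:
    """Decide whether a role may read a dataset and log every decision."""

    audit_log: list[dict] = field(default_factory=list)

    def can_read(self, user: str, role: str, dataset: str) -> bool:
        # Unknown roles get an empty grant set, so access is denied by default.
        allowed = dataset in ROLE_GRANTS.get(role, frozenset())
        self.audit_log.append(
            {
                "at": datetime.now(timezone.utc).isoformat(),
                "user": user,
                "role": role,
                "dataset": dataset,
                "allowed": allowed,
            }
        )
        return allowed


gate = AccessGate()
print(gate.can_read("alice", "csm", "customer_satisfaction"))      # True
print(gate.can_read("bob", "marketing", "customer_satisfaction"))  # False
print(len(gate.audit_log))                                         # 2
```

Denied requests are logged as well as granted ones, which is what makes the trail useful for an audit.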

We also adopted internal policies to train teams on data management — crucial to avoid human errors.

Future possibilities: integrating AI and language models

Our Modern Data Stack as it stands today provides a solid foundation to collect, transform, store and analyse our data. But we see even further. One of the most exciting opportunities is integrating artificial intelligence, particularly Large Language Models (LLMs), to maximise our data’s impact.

By adding AI agents or LLMs such as Llama, we could go beyond descriptive analysis. These models could identify trends in our historical data and generate projections based on different scenarios. For example, an LLM connected to our pipeline could analyse marketing-campaign performance and propose strategic adjustments, or even simulate the impact of an additional marketing budget on future sales. Llama stands out as a particularly well-suited option for us because it is open source and self-hostable.

These agents could also play a role in proactive customer support. Combining predictive analysis and conversational abilities, we could provide personalised recommendations to each customer. A customer looking to improve regulatory compliance could receive a list of specific suggestions adapted to their sector.

Another promising area is automating decision tasks. By leveraging LLMs to process structured data in ClickHouse, we could automate complex processes like cost optimisation or anomaly detection. Our teams could then focus on more strategic work while our data actively works for us.

However, AI integration comes with challenges. We must ensure these tools respect the high security and sovereignty standards we set. Data used to train or interact with the models must stay confidential and under our full control. Recommendations generated by AI must be explainable, understandable and aligned with our customers’ specific needs.

In short, integrating AI into our infrastructure is a natural, ambitious step. It will let us explore new data-analysis horizons and strengthen our commitment to offering our customers even more relevant, personalised solutions.

Conclusion: a stack for today and tomorrow

Building this Modern Data Stack transformed our relationship with data. Not only are we more efficient, but we can also support our customers more proactively. This infrastructure is an investment in our future — designed to evolve with our needs.

Acknowledgements

I also warmly thank Sophie Lohezic for her valuable support in deploying this stack. Her data-architecture and pipeline expertise was decisive in structuring the infrastructure, optimising certain technical choices and avoiding several classic pitfalls when building a Modern Data Stack.

Beyond technical aspects, her advice also helped accelerate go-live and strengthen best practices around data governance and management.