In this episode, Gaël Duez went to both Oslo and Paris. Elin Hauge and Heloise Nonne had never met before, but they still have a lot in common: the same huge experience in AI and data management and, more importantly (and the very reason you will enjoy having them on the show), the same disdain for buzzwords and lazy shortcut thinking. A candid discussion where you can expect the metaverse buzz to be slashed and the rule of three to provide more answers than deep learning 😮

❤️ Subscribe, follow, like, ... stay connected the way you want to never miss an episode!

Learn more about our guests and connect:

Elin's LinkedIn
Heloise's LinkedIn
Gaël's LinkedIn
Gaël's website
Green I/O website

📧 You can also send us an email at greenio@duez.com to share your feedback and suggest future guests or topics.

Elin's and Heloise’s sources and other references mentioned in this episode:

Hugging Face’s code carbon tool https://huggingface.co/bigscience/tr1-13B-codecarbon
Green AI by Roy Schwartz, Jesse Dodge, Noah A. Smith, Oren Etzioni “Communications of the ACM”, December 2020: https://cacm.acm.org/magazines/2020/12/248800-green-ai/fulltext
“Data storage & dark data”: a recent article by Tom Jackson and Ian R. Hodgkinson, both from Loughborough University, shedding light on the cost of storage: https://theconversation.com/dark-data-is-killing-the-planet-we-need-digital-decarbonisation-190423
"Carbontracker: Tracking and Predicting the Carbon Footprint of Training Deep Learning Models" by Lasse F. Wolff Anthony, Benjamin Kanding, Raghavendra Selvan, arXiv (Cornell University), July 2020: https://arxiv.org/abs/2007.03051
Snow Crash, the book where the term Metaverse was coined (not by Mr Zuckerberg at all…)
Codecarbon.io
Green I/O mid-season wrap-up article
Gerry McGovern's book World Wide Waste
The Data For Good's website and its tool Code Carbon (sources on GitHub)

Transcript (AI-generated, soon to be fine-tuned)

[00:00:00] Gaël: Hello, everyone. In this episode, I have the pleasure to welcome Elin Hauge and Héloïse Nonne. Elin lives in Oslo and Héloïse in Paris. They have never met before, but still have a lot in common: the same studies in math, physics, and computer science, the same huge experience in AI and data management and, more importantly (the very reason you will enjoy having them on the show), the same disdain for buzzwords and lazy shortcut thinking.

And of course, both are extremely mindful of environmental issues. However, they took different paths during their careers, which makes their visions both complementary and refreshing. Elin forged her own path as a respected AI strategist, a professional speaker, and a board member,

after leading teams in the insurance sector, IT consulting, and many more, always helping deploy data-driven approaches without the hype. Héloïse has carved out her own career path from the trenches: after a PhD in quantum physics, no less, she did several jobs in data science, eventually becoming the director of the Data and AI Factory at the SNCF, the French railway company, which operates 15,000 trains per day and generates just a bit of data.

Hello Héloïse. Hello Elin. Nice to have you on the show today.

[00:01:14] Héloïse: Hello Gaël.

Gaël: First of all, what did I miss in your bio? Did I forget to mention anything about you?

Héloïse: Yes, there is some kind of a change in my career that's coming now, because I will quit SNCF by the end of the month and start a diploma in international affairs. The idea is to leverage my knowledge of data and data analysis to work on the geopolitics of energy, and maybe some relation with defense in Europe, and supply chains.

[00:01:51] Héloïse: How can we supply energy in a separate way in the future? So actually, I have no idea what my future job will be. The idea is to take a step back and think a bit about what I want to do and how I can carry my past career over into my new one.

[00:02:12] Gaël: But that's awesome. Super interesting. And breaking news on the Green I/O podcast. Whoa, I'm honored. And please, recruiters, don't jump on her on LinkedIn. She's not for hire, so leave her alone. And what about you, Elin?

[00:02:26] Elin: Well, I think you forgot to mention that I have actually also studied organic farming, and I have worked in the agricultural industry as a farmer, and I've also set up a company within organic food production and distribution.

[00:02:42] Gaël: Oh, that's awesome. And how did you get interested in this topic, or this sector, actually?

[00:02:49] Elin: Oh, well, that's kind of a very non-professional reason, because I was married to a farmer, and so I realized that if I'm going to be part of running this farm, I need to understand what this game is about. So unfortunately, or fortunately, or whatever, we're not married anymore, but I still have my interest in organic farming and in food in general.

[00:03:17] Gaël: So something very positive you took away from your relationship?

[00:03:21] Elin: Something, at least.

[00:03:25] Gaël: Okay, awesome. And maybe that could be connected to my next question, which is: how did you become interested in sustainability?

[00:03:31] Elin: Well, I think it very much started with the organic farming and really being a student and having to learn about the soil, for example, and how plants grow and how to keep livestock healthy. But then, working in the technology industry for several years after that, I'm not sure I really thought that much more about sustainability until about a year ago.

I was hired to speak at an event for leaders in the electrical grid industry in Oslo, and before it was my turn on stage, they did a Kahoot session. One of the questions was: what is the most important action that your company can take towards achieving the UN sustainability goals? And the main answer was digitalization of products and services.

And I was listening to this crowd and I was thinking about my own talk that I was going to do a few minutes later, and it struck me that there is a huge mismatch. And that was when I really started to re-engineer my approach to AI and the Metaverse and other data-driven technologies.

[00:04:53] Gaël: Yeah, that's the very reason for this show as well, and those are definitely topics that we will discuss a bit later. And what about you, Héloïse?

[00:05:01] Héloïse: Well, for me it was quite a long and very slow process, because I remember, when I was a teenager, having conversations with my sister about the environment and ecology, and we were already very concerned. Somehow, after this, I thought I had a sustainable behavior: turning out the lights, not having a car, for instance, and being careful.

But progressively I realized that it was really not enough, on the individual side already. So I started to really check every behavior I had. That was about three years ago: being really, really mindful of everything I would do or not do. And then progressively it shifted to the behavior in my company.

What impact do I have individually in my job, and collectively, what impact does my company have? I grew more and more concerned about the urge to act and to multiply the actions, not just to increase them but actually to change the system, to change how we work. And when I realized that the carbon impact of digital overshoots the impact of the entire air traffic worldwide, then it truly was, I think, a turning point, where I grew more and more concerned about what I was doing. Because when you talk about the cloud, everyone thinks it's kind of not real and has no notion of an impact.

And I was at that time working on that. I was also watching TV individually. And, you know, it's funny: I remember my father was an engineer in the industry for a long time, building bricks and tiles. And he said at the end of his career, "Now I'm going to try to do something good."

He was just joking, because, he said, "I have been polluting my whole career" by building in this industry, because this industry pollutes a lot. It uses a lot of gas to cook bricks and everything. So he was kind of joking. He said, "Now I'm trying to do something very ecological, because I've been polluting my whole career."

And at some point I realized: okay, now we are the new kind of polluting industry. And that was really, yeah, a turning point. I realized that in my team people were concerned, but not really conscious. That was a problem, but they were really willing to act. And progressively I started to think that it was more important to take that into account as one of the first criteria in deciding whether or not to start a project, being really mindful of the potential impact on the environment.

[00:08:26] Gaël: So what we are gonna try to do today is make sure that our listeners will not have to wait for retirement to start having an impact, by talking about the impact of the digital industry and, more specifically, data science and data in general, which is an area where both of you are experts. I have to mention a discussion I had very recently with Pete Markiewicz. We were discussing sustainable design, but he mentioned a number that I hadn't heard before: when OpenAI released their latest voice recognition system, actually a voice recognition neural net, they gave the number of 680,000 training hours to complete it and to train it.

And obviously the computer network they used was not drawing 70 watts like my desktop. So it's a fair statement to say that training this model has used a lot of energy. So it seems that data science can impact the planet, but how much does it actually do? I believe you've got some figures and some information on it.

[00:09:34] Elin: Yes. In general, it's a bit hard to find robust numbers here, right? Because first of all, it's difficult to find those numbers or create those numbers. And then secondly, the big suppliers are not really interested in making these numbers known either. But we do know that GPT-3, which has about 175 billion parameters, would require as much electricity as about 130 Northern European households for a year.

So I think, for society to really understand the impact that these training rounds have, we need to find the equivalence in understandable terms. Kilowatt hours are a bit tricky to understand unless you do the calculations down to what you spend yourself as an individual or on your house.

But when we can compare it to the equivalent of households, it becomes much more understandable. And one training run equal to 130 households for a year, that is a lot.
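The household comparison Elin describes is a single division. A minimal sketch in Python, assuming illustrative inputs that do not come from the episode: the training energy (`TRAINING_ENERGY_KWH`) and the yearly household consumption (`HOUSEHOLD_KWH_PER_YEAR`) are hypothetical placeholders, chosen only so that the result lands near the 130 households quoted above.

```python
def household_equivalents(training_energy_kwh: float,
                          household_kwh_per_year: float) -> float:
    """Express one training run as a number of household-years of electricity."""
    if household_kwh_per_year <= 0:
        raise ValueError("household consumption must be positive")
    return training_energy_kwh / household_kwh_per_year


# Hypothetical inputs, for illustration only (not measured values).
TRAINING_ENERGY_KWH = 1_300_000      # assumed energy for one large training run
HOUSEHOLD_KWH_PER_YEAR = 10_000      # assumed yearly use of one household

equivalent = household_equivalents(TRAINING_ENERGY_KWH, HOUSEHOLD_KWH_PER_YEAR)
print(f"One training run is roughly {equivalent:.0f} household-years of electricity")
```

Swapping in a measured energy figure and a local household average gives the same kind of comparison for a smaller, everyday business model.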

[00:10:51] Gaël: Indeed.

[00:10:52] Elin: Then we can, of course, argue that GPT-3 is maybe the world's largest neural network and obviously will require much more energy to power the training rounds than what we would consider normal models in a normal business. But even for just an average business in the European market, with a bit of complexity, we could easily get to levels of maybe 10, 20, 30 households for a year in power consumption equivalent. And again, maybe that is not so much if you think about one or two or three of these, and we can say, well, that's okay. But at the same time, we know that the digitalization of businesses and societies all over the world and in Europe has really barely started. There's so much more to do. And the flip side of that is that when you have all those data and all those digital processes, you need to use those data for something useful, which means that you will apply machine learning algorithms

to at least a huge part of this, to make sense of this data and to use them for automated decision making. But in doing that, we are actually requiring all this power consumption for the training, and then it's not just a one-off; it becomes the expected average attitude. Right?

[00:12:30] Gaël: That's quite a lot when you know that the electricity consumption of the entire digital sector is roughly the same size as India's, so the third largest in the world. Knowing that it might grow even further, that's a bit scary. Héloïse, is it something that you studied in your previous job as well?

[00:12:50] Héloïse: Actually, not at all, for different reasons. The main reason is that, in industry or in many large companies whose business is not built on images or videos, so companies not like Facebook, Google or, I don't know, Microsoft, the amount of data is several orders of magnitude lower than what they are dealing with, for instance.

And what makes deep learning very heavy in terms of energy consumption and carbon emissions is the size of the data sets used for training. And my job at SNCF is very related to the maintenance of infrastructure or rolling stock, or just measuring mobility, in trains or more generally.

And the amount of data we deal with is a lot lower. For instance, the data platform that we manage doesn't store all the data of every data set of SNCF, of course, but a lot of its operational data. We have something like 400 terabytes of data in all. So it's not a lot, and it's not homogeneous data.

So when we train a model, it's actually working on a very, very small part of it, something like 300 megabytes or maybe 100 gigabytes at the most. So actually the training time of our models is not really the part that has the most impact in terms of energy or carbon emissions. It's more how we store the data.

So actually we focused a lot more on data management than on the impact of our algorithms, because we're not doing a lot with videos and images. And when people contact me with use cases using videos, I always have mixed feelings about going further. Of course it would be interesting, and my team would really enjoy doing deep learning on, I don't know, videos from trains, but actually I'm not sure. Sometimes there is a simpler and cheaper solution that we can address before going to deep learning and everything. So we actually have not looked so much into the impact of AI algorithms directly.

[00:15:41] Gaël: That is very interesting, because what you say is that the issue, at least at SNCF but in many companies, is not so much data science but rather data management. And I think we need to go back to this point a bit later, actually quite soon. If I can just pause it for a moment and go back to data science and its electricity consumption, when it happens.

And I understood that it does not happen in every company. Maybe Elin, what are the measures that you can take to reduce it? I don't know, do you push for using codecarbon.io or, you know, the Hugging Face CodeCarbon? Because it's funny that they've got the same name, but they're two different things.

What do you do? Is it about measuring? Is it about being wise and not using the same models? What would you advise?

[00:16:30] Elin: So if you look at the developer communities, they have for several years, maybe decades, worked on efficient coding: how to run the code as efficiently as possible, as fast as possible. So there is already a culture among developers to engineer their code in an efficient way, and we need that same awareness and attitude amongst data scientists and data engineers,

just to put them in the same bucket for this purpose. So I think it starts with awareness, because at least many of the data scientists I know haven't really thought about this. When they get a really cool problem, they just dive into it and retrain and redo and explore more data and try again and play around with it, and don't really focus on what would be the efficient engineering of the modeling. So that's something that I think is kind of a very easy stage one. And I've spent a bit of time looking at the green software engineering principles backed by, for example, Accenture and Aard. I think those basic principles really make a lot of sense. We don't really need to make this complicated, but just add a little bit of common sense into how we approach development and how we develop models,

and all the work with the data and the maths and the code that is required to make this operate in a good way in everyday business.

[00:18:11] Gaël: And I had a very interesting conversation with Théo Alves Da Costa, one of the co-founders of Data For Good. He told me that basically what is still required of data scientists is to switch their approach from being model-centric to data-centric. So you don't start with, "Wow, cool,

I want to use GPT-3 because I want to use GPT-3"; instead, I check the data set and after that I pick the most accurate model. So do you think that this switch of mindset, from model-centric ("I want to try the latest and shiniest model") to data-centric ("I will just pick whatever most efficient tool I have for the data set I need to manipulate"),

connecting to what Héloïse just said, that most of the time you don't actually have enough data to do deep learning or whatever, do you think it is something for which just raising awareness will be enough?

Elin: No, it's not enough, but it's an important first step, and this reminds me of my forecasting professor at Warwick Business School, like 20 years ago. She kept hammering in: make it simple, make it simple, make it simple. Do not overfit. And I think within the AI space it's fun, you know, for the data scientists, and I really understand them; I would do the same:

[00:19:33] Elin: Really dive into the complexities and, as you say, the shiny new toys, and play around with the really, really cool stuff. However, if we look at the development of AI, the early stages were scientific. It was about science, it was about academia and research. And then over the last maybe decade, it's been more about engineering, at least in everyday business.

And we still have the science and more academic approaches like GPT-3, but in everyday business, like Héloïse is also pointing out, it's more about engineering, or how do we actually apply these tools to our everyday business to solve a business problem or an operational problem. And I think, moving ahead, we will again transition into a design perspective of how we apply AI or machine learning to be more pragmatic in business and society.

And then we need to have the approach you mentioned: which tool do we need to solve our problem? We might not need the coolest, fanciest screwdriver with a diamond tip, right? But we need the tool, the screwdriver that solves the problem we have right there, right then, and that is about choosing the right design for the problem to be solved.

[00:21:03] Gaël: And what about you, Héloïse? Was it also your mantra, make it simple? And did you have some requests from your teams to use a shiny screwdriver with a diamond on top of it?

[00:21:15] Héloïse: Well, maybe there were some, yeah. Of course I understand this will to try the new tools, but actually I can be quite harsh as a manager, and sometimes when someone says, "Oh yeah, but we're not doing deep learning, and I thought we would do more machine learning and everything,"

I say, "Okay, this is just a tool. The goal is to solve a problem that's related to an operational problem. So if you came to my team to do deep learning, you are wrong. Go to academia, go ask Google or Facebook. Here we are solving maintenance problems, and it can be a rule of thumb.

And if the rule of thumb is solving the problem, we will just do that, maybe with data, of course, and that's why we are necessary. But if a simple mean or median solves a problem for a correct price, then we'll stop." And I agree with Elin: it's not enough for data scientists to go from a model-centric mindset to a data-centric mindset.
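The "stop at a simple mean or median" rule can be made concrete as a baseline check. The following is a minimal sketch, not SNCF's actual method: the data (days between breakdowns of a hypothetical component) and the acceptable error (`max_error`) are invented for illustration, and the function name `baseline_is_good_enough` is ours.

```python
from statistics import median


def baseline_is_good_enough(train, holdout, max_error):
    """Predict the training median for every case and check the holdout error.

    Returns (prediction, mean_absolute_error, good_enough).
    """
    prediction = median(train)
    mean_abs_error = sum(abs(value - prediction) for value in holdout) / len(holdout)
    return prediction, mean_abs_error, mean_abs_error <= max_error


# Hypothetical data: days between breakdowns for one type of component.
past_intervals = [30, 32, 29, 31, 33]
recent_intervals = [30, 34, 28]

prediction, error, good_enough = baseline_is_good_enough(
    past_intervals, recent_intervals, max_error=3.0
)
if good_enough:
    print(f"Median rule ({prediction} days) is within tolerance; no model needed")
else:
    print(f"Median rule misses by {error:.1f} days on average; consider a model")
```

Only when the baseline fails the tolerance set by the business does a heavier model, and its training cost, become worth discussing.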

I think they have to go all the way to a use-case-centric mindset. A lot of companies made this mistake in the past years. It's okay, because it was the beginning, but they were just pouring all the possible data into their data lake, and then people were asking data scientists, "What can you do with this data?"

And a lot of energy was lost in trying to do something from the data instead of wondering: okay, what is my goal? Should I reduce costs? Should I increase performance? What should I reduce? I don't know, my critical breakdowns in this type of motor, for instance. If you start with this, then you wonder: okay, this is my problem.

Which data do I have in my possession? Should I get more or not? And what can I do with what I already have? And that's, I think, the mistake many data scientists make: they think their job is to use algorithms. I think their job is to look at the problem and to make a connection with the data they have at their disposal,

and then make the connection with a model, a technical approach that can solve the problem. Their job is to make the connection between the business problem, the data and the algorithm. And their mistake is to think that their specialty is the algorithm, and to work a lot on that.

I think it's wrong, because in most companies the technical mastery of algorithms is not there: we mostly use libraries that are so finely tuned by experts that we cannot do better. We cannot improve what's already well packaged in the library. So tuning the code is not the job of a data scientist or a data engineer inside a company. It's the job of academic experts, the job of the people who are releasing these libraries, not the people using them.

And I think most data scientists, data engineers and data architects should focus on their core competence: making the connection with the operational use of something that's on the market, which they are not going to modify. They're just going to use it. Moreover, in recent years data specialists were specialized in the technology, and I'm convinced that in the future they're going to specialize in specific business areas.

[00:25:35] Héloïse: We will have data scientists specialized in energy, in maintenance, in mobility, in marketing, and I think there is going to be more specialization, but towards the business side and not towards the technical side.

Gaël: A very refreshing perspective.

[00:25:53] Héloïse: Yeah, and so focusing on the model is, I think, the wrong idea for most people, except for the few percent of experts who are going to work at Google or Facebook.

And if I can go back to your question, what could we do to reduce the impact of AI and data science from a company's very operational point of view? I think there are several levers that we can use. First, as a company, we can put pressure on suppliers.

We are trying to put more pressure on how they engage, on what kind of promises they can really make on improving the environmental impact of their platforms and the services that they provide. Because I think Microsoft, Amazon and Google are more competent to address this problem than the companies who are buying the services, and if their clients are putting pressure on them, they're going to address it more efficiently or more seriously.

So that's the first lever. Actually, they are going to be a lot more efficient than I can be at improving their algorithms and the way they store data. And if they know their clients have concerns about this, they're going to change. But you talked about this in previous podcasts.

And the second thing, and I concur with Elin: people writing code should be aware that they have to be more competent in writing correct code. Actually, it should be more emphasized that writing more efficient code means lower costs for your IT systems and production.

Actually, that's what we have been working on: improving code and improving architecture choices. For some projects we divided by five or six the cost of the project, the cost of the application, the yearly cost. And this, I think, is the only way to convince your clients to invest a little bit more money

when you're building the app: you can actually explain that if they spend 10% more budget, they will reduce costs in production by 40%. And I think it's the only way to convince them, because just the carbon impact, okay, they will think it's nice, but in the end most companies have a budget,

and this is the main constraint. So actually, if you link the carbon impact with the euro or dollar impact, then you win. Otherwise you won't go very far. And I think developers should have this in mind, and they should actually see this as a competence they can leverage on the market, to be hired and to have a good income.

Because if they can prove they can reduce costs and carbon, then it's fine. Otherwise, there is not going to be that much interest in that, because it's not very sexy to improve your code instead of just training the nicest algorithm and the newest library that Google just released.
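The budget argument (spend about 10% more when building, save about 40% in production) can be turned into a payback estimate. A minimal sketch, assuming hypothetical amounts: the build budget and yearly run cost below are invented, and only the 10% and 40% shares come from the discussion.

```python
def payback_years(build_budget: float,
                  yearly_run_cost: float,
                  extra_build_share: float = 0.10,
                  run_cost_reduction: float = 0.40) -> float:
    """Years needed for production savings to repay the extra build spend."""
    extra_cost = build_budget * extra_build_share
    yearly_saving = yearly_run_cost * run_cost_reduction
    if yearly_saving <= 0:
        raise ValueError("there must be a positive yearly saving")
    return extra_cost / yearly_saving


# Hypothetical project: 500,000 to build, 200,000 per year to run.
years = payback_years(build_budget=500_000, yearly_run_cost=200_000)
print(f"Extra engineering effort pays for itself in about {years * 12:.1f} months")
```

The same calculation can carry a carbon column next to the cost column, which is the link between euros and emissions that Héloïse argues convinces budget holders.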

[00:29:21] Gaël: And Héloïse, when you say you divided by four or five the cost of some code that you wrote, could you be a bit more specific and give an example of what you did? Was it some kind of small data, smart model approach, like Martin on Andre route, they like to advocate? Or how did you manage?

[00:29:38] Héloïse: The first thing is to actually question the use of the IT resources you have. It's very stupid, but there were machines that were not used all the time. And when we just tuned a little bit more at what time they were used and how long they were up and running, that's the most significant gain you can have.

[00:30:04] Gaël: So GreenOps.
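The check described here, finding machines that are up and running but barely used, can be sketched over a simple inventory. This is an illustration under stated assumptions, not a tool mentioned in the episode: the `Machine` record, the 10% utilization threshold and the sample fleet are all hypothetical, and real usage figures would come from your own monitoring.

```python
from dataclasses import dataclass


@dataclass
class Machine:
    name: str
    owner: str          # team that started it, or "unknown"
    hours_up: float     # hours powered on during the review period
    hours_busy: float   # hours with real workload during that period


def find_idle(machines, min_utilization=0.10):
    """Return (name, owner, utilization) for running machines that look idle."""
    idle = []
    for machine in machines:
        if machine.hours_up <= 0:
            continue  # already switched off, nothing to reclaim
        utilization = machine.hours_busy / machine.hours_up
        if utilization < min_utilization:
            idle.append((machine.name, machine.owner, utilization))
    return idle


# Hypothetical 30-day inventory (720 hours in the period).
fleet = [
    Machine("etl-01", "team-a", hours_up=720, hours_busy=400),
    Machine("sandbox-07", "unknown", hours_up=720, hours_busy=12),
    Machine("train-gpu-02", "team-b", hours_up=720, hours_busy=60),
    Machine("old-vm", "team-c", hours_up=0, hours_busy=0),
]

for name, owner, utilization in find_idle(fleet):
    print(f"{name} (owner: {owner}) was busy {utilization:.0%} of its uptime")
```

Listing the owner next to each idle machine addresses the governance side of the problem: someone has to confirm whether the machine can be stopped or scheduled.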

[00:30:05] Héloïse: Actually, clean up the number of resources you have, because some of them are up and running and not really being used. This happens very often, I think, in large organizations, because of a governance problem: because the teams are large, there are people who just start a machine and don't say, or it's not very clear to other people, what they do.

So this machine is up and running, and we're not sure; maybe this team is doing something with it, and just to be sure we will leave it up and running. And actually, you can reduce the cost and your impact just by cleaning regularly and having better governance on how you use IT resources.

But I'm sure that's not going to be new to most people in companies, because, okay, it's obvious. The second thing is architecture choice. And this leads me to a third way to improve: it's actually the competence of the people you hire. In France, and it's the same in every country,

the market is so tense that finding people to develop is very hard. And there are companies who actually sell the service, where they sell competence with people, with teams of 20 people that they're going to put inside the company to make a project. But there is a very large turnover, and very strong pressure from the buyers to lower the prices, because it's very, very expensive. And lowering the prices for hiring people has two effects: a large turnover, and companies who are selling services with competencies are going to actually choose younger people, so less expert. And with the technical choices that are made by these teams, the turnover, and the lack of documentation, in the end we have applications which are not efficient at all.

[00:32:27] Héloïse: And for instance, architecture choices: just changing the type of database that was chosen divided by two the cost of the project in the end, because it was not the appropriate choice, because the people who made the choices were not senior enough, were not expert enough, and they chose the first thing they knew.

And it worked. So if it just worked, okay, this database is the right choice and it's okay. But in the end, when you fine-tune, it's very costly, because you need experts and you need to invest time and money, but in the end you reduce a lot of the impact. So there is a very important problem, and I don't have the solution, in the balance between the number of people you need, in which case you're going to have non-expert people, and the problem you solve,

for which you need experts. The balance between the two is really not easy to deal with, especially for companies whose business is not IT. So they have to rely on suppliers, but sometimes the suppliers just don't really supply you with real expertise. And among all those technical choices, the type of database you use, which language you're going to use, there are many mistakes that have an impact in the end on the cost, the money cost and the environmental cost. And the last thing is just tuning the code. But I would say it's the last 10% that you can gain on your impact.

[00:34:05] Gaël: However, Elin and Héloïse, if you indulge me, I'd like to play a bit of devil's advocate with the first comment that you made, Héloïse, because you said

that what we can do is mostly to put pressure on our providers. If you take, for instance, AWS, they very largely copy-pasted what they already said about security. The statement is basically: we are in charge of the sustainability of the cloud, as they are in charge of the security of the cloud,

but you are in charge of sustainability in the cloud. Actually, it connects pretty well with what you previously said, Héloïse, when you said that for most companies it's not really training models but more the data cleaning, the data collection, the data preparation and storing all this data that has the biggest environmental footprint. And that is definitely sustainability in the cloud, not sustainability of the cloud. So do you think there is a bit of a discrepancy here, saying that it's mostly on providers while still having a lot to do when it comes to data management? Like real data management: collecting, cleaning, storing.

[00:35:21] Héloïse: I think, as a company, you have to do both. You really are in charge of how you collect data, how you store it, how long you store it, and for what purpose. So you actually have to address these questions and answer them, but you can, in addition, put pressure on your provider so that they actually prove to you, as a client, that they're doing something.

I mean, SNCF, AXA, EDF, or I don't know, large companies in France or in Europe or in the US, individually don't have the power to actually put pressure on Microsoft or Amazon. But if a number of these companies are actually asking... and actually, in the contracts that we are writing with these providers, we can add this requirement to provide proof of efforts on the environmental impact of their data centers. And this is what is done. It's already in the contract. It's already a criterion that providers have to answer for when they're answering a call for information or on large projects. And actually, I think you have to do both.

[00:36:42] Gaël: So it's actually three main ways of action. The first one being: put pressure on your providers. The second one being: be accurate and astute, or be wise, about what you collect and how much you need to clean it and store it. And the third one is what you developed about a better use of software engineering, GreenOps, et cetera, that you bundled around running cleaner operations.

Would you add something to that? Is it something that you've already noticed, that more and more customers put some pressure on big providers, or not yet?

[00:37:16] Elin: Well, absolutely. As Héloïse points out, there is very often a clause in the contract stating, well, something around the carbon footprint. But I think, being Norwegian, we are very aware of the need to transition to renewables, and although we are one of the largest oil and gas producers in the world, the electricity we produce in this country is also 94% hydropower. What we use is a slightly different answer, because we're part of the European market, but let's just stick to the production right now. That means Norway is a very interesting country for data center providers, as is the Netherlands, because they have wind power. But switching from fossil to renewable energy to power the data center only solves a part of the problem.

And we need to keep in mind that for all these large platform providers, one of their main KPIs is consumption of cloud services. So whichever way you turn this around, it is the use of electricity that, to a large extent, drives the income of these companies and decides whether they get a super margin or just a margin, right?

So they always have an incentive to provide as much consumption of services as possible. I've worked in the consulting industry for quite a few years now, and for so many years I've heard them say: well, just collect all the data. There has been this belief, this truth, that you should just collect all the data. Whether you use them or not, or whether you need them or not, doesn't really matter.

Just collect them. But that is part of the problem, right? Because the amount of data we collect every day is growing exponentially, and it will continue to grow exponentially because of the transition to digital services and products. So we really need to change this mindset around whether we collect data or not, for which purposes, and how we actually handle the data we do collect, because we know that a lot of the data collected is never used again, or maybe used just once.

There are different numbers around how much data is never used; they vary from, what, 50% to almost 90% of the data collected never being used again. It's really hard to find the real number, but the point is that a very large share of the data we collect is never used.

And if we are going to use them, there's a huge job in making them usable.

[00:40:05] Gaël: So the cleaning part adds even more cost to the planet than just collecting this huge amount of wasteful data.

[00:40:13] Elin: Yeah.

[00:40:13] Gaël: I think you were referring to the article by Tom Jackson and Ian R. Hodgkinson, the recent one on dark data, where they stated that something like 50% of the data was dark data. And the other numbers come from Gerry McGovern.

[00:40:26] Elin: Yes, World Wide Waste. Mm.

[00:40:29] Gaël: It's still a huge consensus that it's above 50%. So that's a lot, definitely.

[00:40:34] Héloïse: The good news is that storing data actually costs more and more money. And one of the reasons why I had no difficulty starting a service (in my team, we have a new FinOps service) is that it costs so much more than before. A few years ago, companies thought: okay, the cloud, everything is almost free because it's very cheap.

And they started to pour more and more data into their data lake, and now they realize that it actually costs a lot. And the fine-tuning of how much data we collect, how much we store, and for how long is starting right now, because it costs too much for most companies. So I think it's actually good news because, back to reality, people start to realize they have to be more careful about what they store.

And GDPR has a good impact on that, because European companies have to be careful about what they store and should actually prove there is a legitimate purpose. And I think it's a good thing. The darker side is that when we talk about data stored by most companies, it's actually something like 5 or 10% of the global data that's stored, because most of the data stored in clouds and infrastructure comes from videos. And that's actually hosted by large companies such as Netflix, Facebook and Google, and the more data is stored, the more money they make. So the constraint is really not the same when you talk about some sectors of the industry and other sectors, and the solution is not the same either. I think, when we talk about the amount of data that is stored worldwide, if we really want to address the real impact, we have to address the streaming industry and internet companies, and also what we can do at our level in industry, such as at SNCF.

We have 400 terabytes of data in all. So we are improving that. We are fine-tuning, but it's like a drop of water in the ocean. It's important to address this drop of water, but the ocean is elsewhere.

[00:43:06] Gaël: But this drop still costs you money? Because I heard another opinion from an engineering director who told me recently that he really struggled to motivate his engineers because, you know, eventually you drop everything in an S3 bucket on AWS, and even the cost of creating a data pipeline that would at some point move this to Glacier, et cetera, wasn't worth the money.

[00:43:30] Héloïse: It's a question of comparison between what the cost is now and what it was a few years back. To give you an example, the storage cost on the platform that we manage (I don't have the exact number in mind) went from, I don't know, 20,000 euros a few years back to 400,000 today, something like that.

So it multiplied by a factor of 20. So of course, if we reduce that by, I don't know, a half, it's that much money that we can invest in new projects or new ideas. So anyway, it's useful to address this and to reduce cost, even if it's not really, really expensive. 400,000 euros is nothing compared to the global IT budget of a large company like SNCF.

[00:44:27] Gaël: The trend is worrisome, so it's better to pay attention to the trend now rather than, you know, letting it multiply by 10 every year until at some point it starts to really weigh on the IT budget. That's your message?

[00:44:39] Héloïse: Yeah, exactly.

[00:44:40] Gaël: And, you know, actually, I would love to bounce back on what you said about how we need to separate the streaming industry from other industries.

Both Elin and you, Héloïse, are very keen on talking about real use cases for real-world people. I believe that there is a question of who can truly leverage its data, and which kind of data is truly used in companies.

And Elin, recently you made this statement that the metaverse is, you know, a buzzword. Everyone talks about it, but most companies still struggle with their digital transformation.

Would you mind elaborating a bit on this? It's a very simple but, I believe, very hard question to answer: who can truly leverage its data?

[00:45:24] Elin: Huh, that is a really good question. I think, well, just to start my reasoning off with the vision of the meta...

[00:45:34] Gaël: Hmm.

[00:45:34] Elin: Metaverse.

[00:45:36] Gaël: Please shoot

[00:45:37] Elin: Yeah. Well, Mark Zuckerberg has been given the dubious honor of having coined the term, but he didn't, right? It was in Neal Stephenson's book, Snow Crash, 30 years ago.

That's where the metaverse appeared for the first time, and now Mark Zuckerberg has introduced the term again, and the media has to some extent revolved around him for the discussion about what the metaverse is. The reality is that Meta is only a secondary player in this whole metaverse landscape. I'm not even sure we should call it a metaverse. The reality is that virtual reality and augmented reality are technologies that have been on the rise for several years and are really starting to potentially create value for people, solving maybe some new needs, some old needs.

There are a lot of different use cases that we might discuss in the context of whether they are actually relevant or needed for our future. But let's just stick to the technology landscape for now. There are so many players in this landscape and so many commercial interests. And I think this really confuses a lot of leaders and decision makers.

They think that this metaverse thing is something Zuckerberg came up with, so we don't need to care. But that's not the situation, and I think we just need to look at our kids. They do gaming for fun, and when they're gaming, they also meet their friends and chat about life, and they make more friends that they never meet in person but meet online. They also have their avatars, and they buy stuff for their avatars with currency that is somehow based on blockchain or other tokens. And then we have a whole new economy that our 10, 11, 12, 14 year olds grow up with. And that is changing the landscape a few years ahead. And if you think about which use cases we see for this virtual-reality-plus landscape: I started making a list, and I had to scrap the list and go back to which use cases I do not see in this landscape.

That was much easier. I came up with four. The first thing is that you need to clean yourself with water. That's very physical, so you actually need to do it in the physical world. Then you need to empty your intestines, so you really need to go to the restroom every now and then. You also need to do that in the physical world.

Then you need to eat, basically because your body needs physical food every now and then. You can't eat digital pizzas. And then the fourth thing is procreation. I'm not even talking about sex, because the sex industry is probably one of the industries that are going to make the most money on this. So we're not talking about that.

We're basically talking about the reproduction of the human species. Those are the four use cases. For all other use cases, I think we can somehow see at least parts of a scenario that is in the digital space. And then, coming back to Héloïse's comment about streaming: about a year ago, a vice president of Intel said in the media that to fuel the metaverse, we would have to multiply the power consumption by about a thousand. So a factor of a thousand on consumption. Now that is a lot, and that is images, video, and language.

[00:49:25] Gaël: But do you believe that most companies and businesses will have to start dealing with this huge amount of data, like videos, 3D, et cetera? Because Héloïse's message (correct me if I'm wrong, Héloïse) was also that it's not the case for the vast majority of businesses today, and that we tend to see the world through the lens of Facebook or Google. But even at SNCF, you told me that you didn't have enough data for some predictive maintenance algorithms, because incidents are such rare events, fortunately, that the amount of data is not enough.

So, if I got both of you right, today businesses don't deal with such a big amount of data, but in the future, because of the metaverse, it might happen that pretty much all of them will have to deal with a massive amount of data. Or am I getting something wrong here?

[00:50:19] Elin: Well, I would add that there is a very large spread. Very industrial companies, like Héloïse talked about, don't necessarily have that much data, and they will still need quite a few years to maybe catch up, if they ever will. On the other hand, we have media companies, for example, or insurance companies and banks, that have a lot of data.

So there is a huge variety. And I think what is really important is not to refuse to discuss these challenging sides just because parts of the market don't have the challenge.

We still need to talk about both. There is the very real perspective of how companies can get more data: industry, health and med tech struggle with access to data, as do ocean technologies, where we really need to understand more about the oceans, and utilities and energy production. But then, on the other hand, we have the extreme cases where they have a lot of data and they use it for very advanced purposes, and we need to be able to think both those thoughts at the same time.

Not all leaders need to do that, but I still think they need to understand that their perspective of the world is not the full perspective of the world.

[00:51:47] Gaël: Hmm. And are they equipped today, these business leaders?

[00:51:51] Elin: No. That was a simple answer.

[00:51:53] Gaël: A straight answer.

[00:51:56] Héloïse: Nope.

[00:51:57] Gaël: Come on.

[00:51:58] Héloïse: Actually, I don't even understand what the metaverse is. Even if we had an unlimited amount of energy and, I don't know, an unlimited supply of everything, I have no idea what a company, an insurance company, a train company, a plane company, would do in this thing.

And I agree with Elin: I think people are not equipped to think about future technologies such as the metaverse. For most people, it's just a buzzword related to the entertainment industry. And I don't see anything outside entertainment.

[00:52:41] Elin: If I may add, I think it's easier to think of it as virtual reality plus. Then you could probably easily see use cases, for example, in industry, related to training of engineers or technicians in the field and inspection of digital twins even if you are not on site. I think those would be maybe two of the first examples, and they're not really metaverse until they are part of a next generation of the internet where these data flow across entities and across providers. And maybe that's why I say we should perhaps scrap the term metaverse and talk about it as a next level of the internet and of how we use data, because it is really about a more immersive way of using data, where virtual reality and augmented reality play a significant role.

[00:53:39] Héloïse: Yes, I understand. So actually the metaverse would be to digital twins and virtual reality what AI is to machine learning and robotics. It's just a higher-level word to group everything into the same concept.

[00:53:58] Elin: Well, I think you can look at it as the internet on steroids, where you are not just looking at it and getting feedback, but you become a part of it because you feel like you are part of what happens. Because, for example, you use virtual reality goggles or you have some kind of sensors that are connected to this internet, but kind of the next generation of...

[00:54:26] Héloïse: But on that aspect, I think there is a kind of fantasy about what companies can acquire as new technologies, because there is a problem of cost. Take, for instance, the Internet of Things. We can say, I think, that it's a kind of mature technology. Sensors are something that we can produce; worldwide, they are everywhere.

A bit too much, actually. And even with the Internet of Things, I was surprised, because when I started at SNCF seven years ago, they were starting to use the Internet of Things, and they had the idea that they would have deployed hundreds of thousands of sensors on the network.

But actually, a few years after that, it's only a few tens of thousands and probably not more. Why? Because it's too expensive to deploy hundreds of thousands of sensors on a network that's about 30,000 kilometers long. It's too expensive in comparison to what they actually bring. Okay, we can improve efficiency; measuring, for instance, the temperature of the rail is important to avoid problems in the summer.

But actually we cannot: it's too expensive to put a temperature sensor every kilometer on the network. So even if the technology is available, it's not possible to use it for this purpose. So I have doubts that, I don't know, digital twins on steroids could work, because it would be too expensive for what it actually brings. And there is this question of what the risk is if I don't invest in it. Maybe it costs a lot less: I have more failures, but it's okay. You don't always want to have the best product, because it's too expensive.

[00:56:42] Elin: Well, I think you're absolutely right, and this is where we need a dose of pragmatism in this whole metaverse discussion. Also because, if you look at the hardware cost: having to invest in better hardware, better screens, monitors, gaming machines, better servers, not just in the data centers but on each and every desk.

That is also a huge environmental cost,

[00:57:12] Elin: which is very often not talked about, but it is very...

[00:57:16] Héloïse: Yeah, and the cost of energy is also something that will come into the picture very, very quickly.

[00:57:24] Gaël: Sorry about that, my dear listeners, I don't know where I will cut any of this discussion. I'm enjoying it just too much. I try to keep my episodes below one hour, but in this case that's going to be very difficult. Still, being mindful of time, and because I believe it is highly connected to what you said,

I'd like to ask you one last question about this discussion around the metaverse, AI in general, et cetera. Once again, I know I quote him quite a lot, but I really enjoyed the exchange I had with him, when he told me that the massive topic, which is completely overlooked and on which there is no research today, is the indirect impact of AI.

Like, over-technology, digitalization and acceleration of usages thanks to AI are taking an unsustainable toll on the planet. Actually, when he said this sentence, I immediately connected it to this "why" question that I often ask my guests, which is: do we really use AI and data science for the right stuff? Are we really using data science for stuff that makes a difference, or are we increasing the speed, thanks to AI or the internet on steroids, to quote you, Elin, in the very wrong direction?

[00:58:36] Elin: Good question. I think we should just look at AI as a rather advanced tool, but still a tool. But then, on a personal level, I'm a medical physicist by education, and, well, we basically studied medical imaging technologies. I see some very obvious use cases for machine learning in, for example, interpreting X-rays and MRIs. And I see some very obvious use cases in food production, in understanding and saving our oceans, and also in how we use energy.

And then I see some use cases, for example in retail, where I think: well, the world doesn't actually need this. This is a waste. And I think I would like to just close off with Gerry McGovern's perspective: we need to reduce our consumption, and it doesn't really help to use AI to speed up consumption.

[00:59:34] Gaël: That's one of my favorite authors that you quoted, and a very-soon future guest on the show. Héloïse, you might have some different views on it, or not?

[00:59:44] Héloïse: Well, yes, I concur with Elin. I think that there is a huge bias on AI and its use, because the people talking most about AI in the media are people who are very present in the media. So Elon Musk, for instance, is messaging too much about what he's doing.

Facebook, Google... It makes us forget that actually, I agree, AI is just a tool. So the question is not: do we use AI for the right thing? The question you ask is more: is this company's business something useful or not?

And that's what I'm wondering about with the metaverse or even streaming: is it okay to have people sitting in their apartments watching TV shows for hours? I mean, I do that. I also watch TV shows, but at some point we have to wonder. AI is just a tool behind businesses. So is it useful to watch TV for so long?

Is it useful to use trains, to use planes? If we ask this question, then the question of the use of AI is answered as well, because if AI is improving efficiency and reducing costs for those businesses, and if those businesses are thought of as useful, then AI is useful.

But it's just a tool, actually. We don't ask whether or not having a computer is useful. It depends on what you use it for. I mean, I've said many times that we don't have a lot of data at SNCF, that we don't use a lot of deep learning because we don't have a lot of videos and images, but I use AI for use cases that are really useful. I have some use cases where machine learning just helps technicians and engineers. This is useful, but it's not something you will talk about in the media, because it's simple.

[01:01:59] Elin: I think, as Héloïse just said, many of these use cases are not going to create headlines. They're not going to get a lot of tabloid attention, but they're still very useful for our society for some reason, and those are the use cases we really need to work more with and solve to make the world a better place. It is about the problems to be solved.

[01:02:23] Gaël: Absolutely. And are you optimistic about our path, both of you? Will we manage to reach a more sustainable world and actually increase digital sustainability in our world?

[01:02:35] Héloïse: Well, being pessimistic is useless, so I am not pessimistic, but sometimes

I feel a bit discouraged by our behavior, the choices of companies and the choices of individuals. And one of the reasons why I wanted to take a step back, and why I am quitting my job, is that I think there is a lot to do in explaining where the real problem actually is. Today we are talking a lot in Europe about reducing our energy consumption. And when I hear people, politicians and companies saying, okay, you have to turn off the lights: no, no, no, that is not enough. I'm sorry, but turning off the lights is useful, but it's 5% of the effort we have to make.

The problem is that people don't have the orders of magnitude in mind, so the effort is sometimes not in the correct direction. It goes in a direction that actually makes headlines, but not towards the 80% of the gain we have to focus on. And I think I'm optimistic if a lot of people start, yeah, wondering what problem should be solved, in a cold, rational way of thinking and not an emotional way of thinking.

[01:04:07] Gaël: And what about you, Elin? Will rationality save us?

[01:04:10] Elin: It could have, if humans were ever able to become rational. Some can, and if we all become too rational, we become psychopaths. So, well, if we look at the development of the human brain, I'm not so optimistic, really, about our ability to be rational. Maybe we can hope that the majority of humans will actually do what they can to make the right choices to make the world a better place. Right now, maybe I'm a bit more pessimistic than Héloïse, because I think that right now we are not really proving our ability to make those choices. But I hope.

[01:04:55] Héloïse: Yeah, I'm optimistic because I think when you actually take the time to explain, people understand and they make the right choices. And actually this podcast is one of the examples. I've always heard people saying: okay, if you are a scientist, if you are an expert, keep in mind that you have to explain this complex concept in three minutes.

That's what people always say in the media: you only have three minutes to convince people of your thesis. But this is wrong. Otherwise, podcasts lasting one hour, two hours, three hours would have no audience. But that's not the case. People actually want to take the time to understand. And I think, if experts spend more time in the media and spend a lot of energy explaining concepts, taking the time, then people will listen and make the right choices.

And I think that's what we should do now: maybe work less on everyday efficiency and work a bit more on communication, so that people, individually and collectively, in companies or institutions, will make the right choices.

And I'm more optimistic because people are actually starting to listen a lot. The point is that the content of what's being said in the media should change and become more rational. I think it's possible.

[01:06:25] Elin: So, that's why people like you and me, um, we have work to do.

[01:06:29] Héloïse: Exactly.

[01:06:31] Gaël: And congratulations to the listeners who spent one hour with us trying to understand and deep dive into these topics. It was great to have both of you on the show. I really enjoyed the discussion. I told you, my listeners, that it would be very refreshing, with different points of view and a lot of good connections.

I will, as usual, put all the references in the show notes. Any final word, Elin or Héloïse, that you want to share before we close the podcast?

[01:06:57] Elin: Well, no, I think, first of all, I just want to say thank you for inviting me, and Héloïse, it was a pleasure to meet you, if I can put it that way. And of course, for listeners, I'll be happy for people to reach out and share their opinions or questions as well.

[01:07:12] Héloïse: And, well, same. Really, thank you, Gaël, for this time together. It was nice. And if I have a final word, it would be: forget about over-technology of usages. Actually, in many cases, the simple solution is the best, because it's faster to implement, it's simpler to maintain and it costs a lot less.

Of course, my team of data scientists are not very happy when I say that, but in the end, they understand and they agree with that. If it works, and if it does the job, then we have to focus on that. And okay, it won't make the headlines, but it will make the business work more efficiently, and in the end, that's what we want to do.

[01:08:08] Gaël: And I believe that will be the closing word. Don't rush for over personalization. Go for the simplest solution. Thanks a lot to both of you. That was a very cool interview.