Episode 10

Celonis Migration - a Journey From On-site to Cloud!

June 9, 2021
Mining Your Business


Episode Content

Continuing from last episode: the leading process mining platform Celonis has an On-Premise and a cloud solution. If you have an On-Premise instance but want the cloud, what do you do? We have got you covered. Jakub and Patrick are here to talk all things Celonis migration.

Transcript

00:00
Patrick:
Welcome back to the Mining Your Business podcast, a show all about process mining, data science and advanced business analytics. I'm Patrick and with me as always, my colleague, Jakub, hi.

00:11
Jakub:
Hey, hey, hey.

00:13
Patrick:
Jumping off from last episode about on-premise and cloud, what does a migration between the two even mean? What types of migrations are there and what are the common mistakes to avoid? All that coming up next.

00:34
Jakub:
Celonis migration. This has been a huge topic not only for our company, but specifically for myself and my colleague Patrick. Celonis migration is something that we've been dealing with for the last couple of years with major clients of ours. And in a sense, it means transporting something from an existing on site Celonis instance into a cloud version. Is that right, Patrick?

01:05
Patrick:
Yeah, that sounds about right. So let's get into what we're actually talking about. Celonis is one of the major players in the process mining software space, and they have two versions of their software, actually technically kind of three. But the major ones are the on premise version, which runs in people's own little corner of IT infrastructure, and their cloud service. Now, what this migration is supposed to do is migrate all the code, the data models, the knowledge, the analyses, all of this from the on premise instance into the cloud. Now, Jakub, what do I mean when I talk about migrating transformations or code?

01:53
Jakub:
So in Celonis you have all kinds of different objects. When you are an end user and you go into a report, you can drill down on your company code and see what kinds of quantities you've transported in the last period. But you first have to slow down a bit and think about what is behind that. So as I said, Celonis has certain objects, where each object has a different role. You first have to execute some kind of code which fills the data into what we call a data model. In a Celonis on premise system, that's the Celonis which is running in your network, you are running that code very likely on a customer database. In our case, that usually used to be an SAP system, but you can go in with other databases such as Oracle, anything that stores large amounts of data. But again, in our experience, we mostly work with HANA from SAP. So what we do there is write a code, a query, that we execute, and it fills our data model in Celonis.
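To give a rough idea of what such a transformation is, here is a toy sketch only, using Python's built-in SQLite in place of a real HANA or Oracle source; the table layout and column names are invented for illustration:

```python
import sqlite3

# In-memory stand-in for the source database (HANA or Oracle in practice).
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE EKPO (ebeln TEXT, menge REAL, aedat TEXT)")
con.executemany("INSERT INTO EKPO VALUES (?, ?, ?)",
                [("PO1", 10.0, "2021-05-01"), ("PO2", 4.5, "2021-05-03")])

# A "transformation": a query whose result fills one table of the data model.
transformation = """
    SELECT ebeln            AS case_id,
           'Create PO Item' AS activity,
           aedat            AS eventtime,
           menge            AS quantity
    FROM EKPO
    ORDER BY ebeln
"""
data_model_table = con.execute(transformation).fetchall()
print(data_model_table)
```

The point of the sketch is only the shape of the step: raw source tables go in, a query runs, and the result is what the data model, and eventually the report, is built on.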

03:21
Patrick:
Exactly, and of course, we have to migrate this from one to the other. But we also need the data that this transformation is supposed to be running on. So that's another big step: we have to transport the data from the on premise database into the cloud. This is another thing we have to migrate. Additionally, you've already mentioned data models. Data models need to be transported from one place to the other. Usually this is a little easier, but keep in mind that all the tables and all the relations between them need to be transferred correctly into the cloud. And then finally, capping it off, the analyses themselves. Of course, your lovely reports that you've spent years building obviously need to be migrated as well. So that's then the last of the things we need to migrate. Am I missing anything?

04:13
Jakub:
No, I think this is it, Patrick. So just to recap on this: it's not all your standard ctrl+c and ctrl+v copy pasting from one file to another. It's a bit more daunting and tedious a task. Imagine you're moving from one country to another. So you have a beautiful flat in Germany, and I'm speaking from my own experience now. How does it work? You decide to move somewhere else. Right. So it's not only that you have to take all of your furniture and everything. You also have to think about what's in the next flat, or even how you get it there. You have to think about whether everything will fit. Will my huge wardrobe in the corner of the room fit into my new, slightly different living room? These are the kinds of things you will be asking yourself. The same goes for migrating anything that's remotely similar, but not quite the same.

05:15
Patrick:
Absolutely. Absolutely. So knowing that this is a bit of a daunting task, why would people do this? Why do clients of ours keep asking us, can you please migrate us to the cloud?

05:27
Jakub:
What you are touching on right here is the motivation. And there are multiple reasons why clients are even thinking about migrating from on-premise to the current IBC instance, or, as Celonis already calls it, the EMS. So let's just keep up with the times, right?

05:50
Patrick:
You mean EMS?

05:51
Jakub:
Oh, yeah. See, I keep losing myself in the new names. Anyway, I think if you listened to our previous episode, where we were speculating and debating on the pros and cons of on premise and cloud solutions for any kind of software, you already have an idea of why they are doing it in the first place. Speaking of that, I think one of the major reasons is to stay up to date. So, the Celonis cloud version, the IBC, and I will stick with IBC, Patrick, don't blame me, I am an old fashioned guy.

06:33
Patrick:
Okay, okay, of course.

06:34
Jakub:
So in the IBC you have significantly more features than you would ever wish for in your onsite instance. The reason is quite simple, and it is that Celonis didn't quite stop supporting the old versions, but it doesn't do as much development on the on premise systems, because for them it's just not as easy to roll out new stuff as it is for the cloud version. So in the IBC there are just so many things: you can set up the Action Engine skills, you can use the backend power that comes with the IBC, meaning that you can, for instance, write your own Python queries or Python scripts, which makes your life significantly easier. And I think we will touch on that as well. And generally, staying up to date.

07:33
Patrick:
Absolutely. So the new features are, I think, one of the biggest draws. If you listened to our previous episode, you will know that applying new updates to a cloud system is way easier than distributing them to everybody that is using an on premise system. So not only do you get fancy new features, but with any type of cloud solution you also get more scalability. This is another thing we touched on in the previous episode, but it also holds true for a Celonis migration. The amount of data and the things that you can do with your system are a lot more scalable. Additionally, there are of course process improvements that come with this. So a migration can not only be a, hey, we take our X and put it into the other place and have the exact same thing, but we can also improve on the process. Because this is a perfect opportunity for any type of programmer to fix that bug they've always wanted to fix, but couldn't get approved or couldn't really find. If we're already going to be looking at all this code, we might as well improve it.

08:43
Jakub:
Exactly. And the last point that we have here is the overall hardware limitation, which in this case is essentially transferred from the client system to Celonis. And I remember from experience we had a client where we were accessing a multitude of different systems in parallel. I think the number was as high as nine systems, and we were running relatively heavy queries there. You can imagine that the customer was not very happy about that, and there was not that much we could do about it. So that was also one of the driving forces behind migrating, or switching to the cloud solution in the first place.

09:26
Patrick:
So it's about shifting the responsibility of keeping the system up and running, having dedicated people to fix all these bugs, and just shifting all that responsibility to the cloud. No headaches, nothing for you. Somebody else is handling all the IT, all the infrastructure, the access, making sure that all works, and all you can focus on is building your pretty analyses.

09:45
Jakub:
Yeah. I mean, value creation, you wanted to say, not building analyses. Focusing on the value. But yeah, true. Patrick, let's get to the timeline. Let's get really down to the project. How does it look?

09:59
Patrick:
Right. So first of all, whenever we are in discussions, it is incredibly important to get on the same page with the client about what the migration can do, what it cannot do, and to get an idea of all the things that need to be migrated. Right. So how many analyses are we talking about? Are we talking about ten, or are we talking about thousands? How many transformations are we talking about? Are these, you know, things that could fill a whole book, or are these just a couple of queries? You know, generally getting an idea of how much the client wants to migrate.

10:37
Jakub:
Yeah. The next step, I would say, is determining the migration strategy. So how do you proceed, who does what, who has which responsibility? Is the client just going to give us all the queries, all the scripts? And if he does, is he absolutely certain that we have everything, or are we missing something? Or are we just going to go into Celonis and, you know, look for the dependencies ourselves? So, you know, searching for the queries in the database and so on. It is also very important to be very clear on whose role is what.

11:18
Patrick:
Exactly, and with the migration strategy there also come a few migration types; we will get into those later. These are very important just to define what the acceptance criteria at the end are. What does the user want to see at the end? Are they OK with things being different? Are they not? Do they want to see the exact same stuff? These things need to be set at the beginning. Because when the time comes for the users to validate, hey, here's your stuff, and they don't recognize any of it, you know, that's a bad look, because that wasn't agreed upon. So agreeing on the type of migration that is possible and what it's going to look like at the end will of course be favorable for the client, so they are familiar with the hurdles and obstacles that will be in the way.

12:05
Jakub:
Yeah, and not to mention that, you know, the cloud platform is also developing. So the longer you wait for this migration to happen, the more changes you are likely to see in the future. I mean, Celonis already changed the frontend design, and it will likely go to a different set of colors soon. So yeah, you know, it was one thing to do the migration a year ago, I say a year ago because two years ago there was no IBC, but it might be a bit of a different story the longer you wait.

12:41
Patrick:
Right. So what you're saying is, if you're migrating from on premise to the Celonis cloud, the longer you wait, the more divergent these two, on premise and IBC, will become.

12:52
Jakub:
I wouldn't be surprised.

12:53
Patrick:
Yeah, OK. The next step in this is the actual migration, right? This is where we actually go in and go through the code line by line, put it over into the cloud and check that it runs. We need to pull over all the data that we need. We need to do all the migration tasks that need to be done. This is really the bulk of the work, in my eyes.

13:18
Jakub:
And those are what exactly?

13:22
Patrick:
The bulk of the work? Oh, for me that's going through the code. It's figuring out what the data scientists before me wanted to write, or, if I'm finding bugs, knowing whether this was intentional, whether they just missed a space or something. You know, it's getting the meaning behind the code working in the cloud. There are also, of course, problems with some functions in the code, depending on the database it was running on. For example, if it was running on Oracle, but we don't have an Oracle system, we have a Vertica system. Right. There are differences in functions, how they operate, how they handle spaces and things like that, and all that knowledge needs to be transferred correctly and identically to the cloud.

14:08
Jakub:
OK, so I remember at the beginning of the episode we mentioned the different objects that we have to transport. So now let's put ourselves in the shoes of the data scientist, which at some point both of us were. So we know this very well.

14:24
Patrick:
Currently are.

14:25
Jakub:
And currently are, Patrick. So we have all these objects. If you are this data scientist who is doing the migration, what do you do first? How do you proceed? What are your tasks? Let's say that you want to transport one analysis to the IBC. What do you do?

14:45
Patrick:
Oh boy. So if we already have the transformations and we already have the data...

14:50
Jakub:
The data? Maybe let's start at the beginning.

14:55
Patrick:
OK, so first of all, it is important to gather all the tables and all the data that you need, right? Because an analysis doesn't need every single table, you only need some tables. Right? And of course, the extraction part, where we get the data and put it into the cloud, obviously requires a lot of work. So we don't want to take more data than we need to. Because, one, it's wasteful, and two, it can take a long time, so generally it has to make some sense. So we try to figure out how much data actually needs to be exported into the cloud. For security reasons it's also obviously nice not to take all the data, just the stuff that you need. For example, I only want the tables, but I don't want to get the user information, right? That's private. That's GDPR related. We don't want any user information, so we don't even extract that into the cloud.
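That scoping step can be sketched roughly like this; the table names and exclusion lists below are invented SAP-style examples, not taken from any actual project:

```python
# Extraction scope: pull only what the analyses need, and strip
# user-related columns (GDPR) before anything reaches the cloud.
EXTRACTION_SCOPE = {
    "EKKO": {"exclude_columns": ["ERNAM"]},   # drop the 'created by' user
    "EKPO": {"exclude_columns": []},
}

def plan_extraction(available_tables, scope):
    """Return (table, columns) pairs for the tables actually in scope."""
    plan = []
    for table, columns in available_tables.items():
        if table not in scope:
            continue  # not needed by any analysis -> never extracted
        excluded = set(scope[table]["exclude_columns"])
        plan.append((table, [c for c in columns if c not in excluded]))
    return plan

plan = plan_extraction(
    {"EKKO": ["EBELN", "ERNAM", "AEDAT"],
     "EKPO": ["EBELN", "MENGE"],
     "USR02": ["BNAME", "TRDAT"]},     # user master data: out of scope
    EXTRACTION_SCOPE,
)
print(plan)
```

The user table never appears in the plan at all, and the user name column is dropped from the header table, so neither ever leaves the source system.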

15:53
Jakub:
Yeah. And I think with that also come the different, you know, restrictions you have on the tables. So let's say we are talking about the purchase to pay process, and in there you are looking at certain kinds of documents. If you already see in your code, and in the initial implementation, that something is specifically excluded, for whatever reason, you might not even want to pull the data of this exclusion into your cloud, because that's, again, as Patrick mentioned, an unnecessary amount of work, takes more time, and why do that?

16:27
Patrick:
Exactly, because it is a cloud instance and, like many of you know, you pay for what you use. So the more space you use, the more expensive it will be. So it makes sense to limit the amount of data to what you need, just because the space you'll occupy is less, and you'll end up paying less because you use less space.

16:48
Jakub:
OK, so you got your data in your IBC instance. What is the next step?

16:52
Patrick:
Well, then it is testing all the data, of course. Because, you know, extractions don't always function the way you think they do. Just verifying that you have all your data, that's a very big step. Make sure that all your change logs and all the data are consistent across all the months. These are just generally good ideas.

17:11
Jakub:
So what you're saying is that the extractions may and will go bad at some point?

17:17
Patrick:
They always will. To say that they'll go smoothly is just a lie. Thinking that they can't fail is just wishful thinking. They can and will fail, and it's always better to verify your work. So once you have verified your data integrity, let's call it, we can then move on to migrating the transformations. Right? That essentially means going to the client and saying, hey, what code did you have running for this data model? Then they give you the code, you obviously have to make sure that it's the current version of the code, figure out what was written for their database, and then translate it into our Vertica dialect, so the dialect of the database running in the cloud. That is obviously the bulk of the work, right?
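Those consistency checks can be as simple as comparing per-month row counts between the source and the cloud. A sketch with made-up numbers:

```python
def compare_counts(source_counts, cloud_counts, table):
    """Compare per-month row counts of one table between source and cloud."""
    mismatches = []
    for month in sorted(set(source_counts) | set(cloud_counts)):
        src = source_counts.get(month, 0)
        cld = cloud_counts.get(month, 0)
        if src != cld:
            mismatches.append((table, month, src, cld))
    return mismatches

# Source says May has 1200 rows, the cloud only received 1180 -> flag it.
issues = compare_counts(
    {"2021-04": 950, "2021-05": 1200},
    {"2021-04": 950, "2021-05": 1180},
    table="EKPO",
)
print(issues)  # → [('EKPO', '2021-05', 1200, 1180)]
```

Running this per table and per month is one cheap way to catch an extraction that silently terminated partway through.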

18:14
Jakub:
Yeah. This is actually the most tedious part of it all. So imagine, it's like with any code: even though it's the same family of languages, and here we are talking about SQL, each database will have a different interpretation. There will be different functions, there will be a slightly different set of rules for how you write the code, and when you just take code from one database to another, you are almost guaranteed that not all of it will work, even though from a functional perspective, from a logical perspective, everything makes sense. You will still have some issues. Unfortunately, at the moment there is no online tool that works with enough precision that you could just copy paste your code from one database and it would translate it for you. There are some, but they don't work well enough for us to use. So we really have to go into each piece of code and make sure that it executes as desired. Not to mention that each database also computes the code differently. So what might have been running effortlessly and seamlessly in one database won't necessarily run without any issues in the new one, even when the code doesn't contain any errors or functional issues.
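To make the dialect problem concrete, here is a deliberately naive sketch of function renaming between dialects. The IFNULL-to-NVL rename (HANA to Vertica) is a real one-to-one case, but, as the comment notes, many functions also change their argument order, which is exactly why no simple tool covers this:

```python
import re

# A tiny, illustrative rename map. Real migrations need more than renaming:
# argument order and semantics differ too, so every query still needs review.
RENAMES = {
    "IFNULL": "NVL",   # HANA IFNULL(x, y) -> Vertica NVL(x, y)
}

def naive_translate(sql):
    """Rename functions word by word. Works for 1:1 renames only."""
    pattern = re.compile(r"\b(" + "|".join(RENAMES) + r")\b", re.IGNORECASE)
    return pattern.sub(lambda m: RENAMES[m.group().upper()], sql)

print(naive_translate("SELECT IFNULL(netwr, 0) FROM vbap"))
# A case the naive approach cannot handle: HANA's ADD_DAYS(d, 7) must become
# Vertica's TIMESTAMPADD('day', 7, d) -- the arguments move, not just the name.
```

Anything beyond simple renames ends up as manual work: read the query, understand its intent, and rewrite it in the target dialect.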

19:51
Patrick:
Absolutely. And this is also one of the points that we will discuss later when we talk about the goals: do you want to already change your code while you're going through it? Because, as you said, we have functions in one database that we don't have in the cloud, so we need to think of something new. Now we're already changing the code. How far do we go? We might as well just bug fix some things while we're here. We might as well optimize the runtime a little bit while we're already here. You know, these are obviously things that need to be determined before we actually do the migration, but these are things that come up in this stage, right? It is also kind of weird for us if we see a very obvious bug in the code, but we implement it anyway, right? The client is expecting this bug. He's expecting that this looks the way it does, even though it's wrong, but you still have to implement it because that was agreed upon, right?

20:49
Jakub:
But it's his little bug, it's his little bug that he wants there.

20:52
Patrick:
Exactly. Let's move on to the next thing. What's next?

21:03
Jakub:
The next thing is the data models. That's essentially the object that stores whatever your queries produced. So it contains the data that you will eventually work with and display in the frontend of Celonis. Luckily, the data models don't differ too much between the on premise and IBC instances. So what we are doing right now is, you know, using the strengths of Python, right, Patrick? What we do is a kind of Python version of copy pasting the data model, meaning that we take whatever is in the onsite version and we just push it into the cloud version. Then we do some minor adjustments, but the bulk of the workload is actually done by leveraging Python.

22:03
Patrick:
Exactly. So if you imagine just one data model: you have to add your tables, you have to click each one together with another table, put the relation between those two tables in place, and then you're done, this can't be that hard. Sure, but we are talking about data models that have forty, fifty tables in them, all relating to each other. And there are multiple data models, tens, in some cases hundreds of data models that need to be migrated. That's a lot of clicking, you would easily spend a whole week just clicking these tables together, and human error can of course occur. So relying on Python to do this, one, correctly, and two, very quickly, is a huge help to us.
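The idea behind that scripted copy can be sketched with plain dictionaries standing in for the real Celonis API objects; the actual calls would go through Celonis' Python tooling, and the table names here are typical SAP examples, not taken from any project:

```python
# A data model as data: its tables plus the foreign-key relations between
# them, i.e. exactly what you would otherwise click together by hand.
source_model = {
    "tables": ["VBAK", "VBAP", "_CEL_ACTIVITIES"],
    "foreign_keys": [
        ("VBAK", "VBAP", [("VBELN", "VBELN")]),
        ("VBAP", "_CEL_ACTIVITIES", [("VBELN", "CASE_VBELN"),
                                     ("POSNR", "CASE_POSNR")]),
    ],
}

def copy_data_model(source):
    """Rebuild every table and every table-to-table relation in the target."""
    target = {"tables": [], "foreign_keys": []}
    for table in source["tables"]:
        target["tables"].append(table)          # the 'add table' step
    for parent, child, columns in source["foreign_keys"]:
        target["foreign_keys"].append((parent, child, list(columns)))
    return target

cloud_model = copy_data_model(source_model)
print(cloud_model == source_model)  # → True: a faithful, scripted copy
```

With fifty tables per model and dozens of models, looping like this replaces a week of click-through work and removes the human-error factor Patrick mentions.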

22:51
Jakub:
And the last objects that we need to move are the Celonis reports. This is the stuff that you see in the frontend, that you can click through, where you see your data and whatever you want. Also here, what we don't do is go and create them from scratch in the cloud, because, as Patrick said, we would need about ten times more time than we currently need for this migration, because creating a report in Celonis can get pretty difficult and time consuming, I would say.

23:30
Patrick:
Absolutely. So here again we rely on the strength of Python, just to pull the analysis from the on premise version, take its data, and push it into the cloud. This works surprisingly well because the code that both of these rely on, the on premise analysis and the analysis in the IBC, is pretty much identical. So the only real task that remains once you've migrated is to check that everything is there. You know, did the Python script just randomly terminate or something? What will never be, you know, taken away from you as a task is going into the analysis and checking the tabs. Does everything work?

24:16
Jakub:
If not, why isn't it working? Why isn't it working?

24:20
Patrick:
Yes, it's a common complaint in this office. That's about it for the migration. Now let's move on to the goals, or quote unquote migration strategy. We've talked about this before. So, in my opinion, when we talk to clients, there are predominantly three ways they want to migrate. There's the one to one migration: I have this in my on premise system, I want to see exactly the same thing in the cloud. I don't care how, just do it. I want one to one. I love my analysis. I love the way it works. Give me the same thing in the cloud.

24:57
Jakub:
Yeah. What is important to mention is that it will never be exactly one to one. Not that the numbers won't match, but, as we mentioned earlier, there will have to be at least some minor changes in the code, because we are taking the script from one database to another. So even though you will get the same output, the same numbers, the code won't match exactly what you had before. Plus there is the other thing, and that's how the overall architecture works. In the IBC, the cloud version, you have certain intervals at which you extract the data and then execute your queries. So you will have one point in time at which your data is up to date, while in the on premise system you might have run your extraction at a slightly different point in time. When you then do the validation, you have to take this into account: just opening two reports next to each other on the same screen and not seeing the same numbers doesn't necessarily mean that the reports aren't identical.

26:11
Patrick:
Exactly. So what you touched upon is what we know as the delta gap, right? In the on premise system it's the time from when the transformations run to when you see the data in the analysis. And in the IBC, it's the time from when the extractions start, to when the transformations run, to when the data model loads, to when you finally see it in the analysis. So, first of all, the time between these two points is different, and, as you said, the snapshot of your data displayed in the frontend is different too. So that's one of the points we obviously need to make clear to the client: there are going to be some deviations. These will be minor and should really only appear around the current date, you know, because that's usually where these gaps tend to be. But yeah, it's about making sure that we are talking the same language.
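One practical way to handle the delta gap during validation is to compare only up to a cutoff date before which both snapshots are complete. A sketch, with invented daily revenue figures:

```python
from datetime import date

def comparable_window(rows_a, rows_b, cutoff):
    """Sum a KPI on both sides, ignoring everything after the cutoff date,
    where the two systems' snapshots have not yet converged."""
    total_a = sum(v for d, v in rows_a if d <= cutoff)
    total_b = sum(v for d, v in rows_b if d <= cutoff)
    return total_a, total_b

on_prem = [(date(2021, 6, 7), 100.0), (date(2021, 6, 8), 250.0),
           (date(2021, 6, 9), 80.0)]            # extracted this morning
cloud   = [(date(2021, 6, 7), 100.0), (date(2021, 6, 8), 250.0)]
# The cloud extraction ran earlier, so June 9 is missing there -- expected.
print(comparable_window(on_prem, cloud, cutoff=date(2021, 6, 8)))
# → (350.0, 350.0): identical once the delta gap is excluded
```

Comparing raw totals instead would show a deviation of 80 that is purely a timing artifact, exactly the kind of false alarm the speakers warn validators about.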

27:08
Jakub:
Making sure we are not missing €50 million in revenue from last year.

27:14
Patrick:
Exactly. Yeah. That would be a major, major problem, exactly.

27:19
Jakub:
So this one to one migration, as we call it, is usually our entry point into any migration, because when you move the data and the reports from point A to point B, at some point you do need to do the validation. And if you are going for one to one, it's very, very easy to validate, and it is your starting point for whatever you want to do in the cloud next. So in my opinion, when you decide to go for the migration, you should always start with one to one, which is your checkpoint for whatever you do in the future.

28:04
Patrick:
I tend to disagree. I think the one to one migration is a lot harder to validate, right, because even the smallest deviations will be noticed if you expect a one to one. Both numbers need to match at the end of the day, and finding out why exactly they are different takes some time to investigate. We usually spend a couple of hours just investigating, hey, why is this different in one system, why does it look different? These things are difficult because you have to explain every difference. If the client expects a one to one, you have to give them a one to one. So, in my opinion, the validation of this is a lot harder.

28:45
Jakub:
Alright, thanks, Patrick.

28:47
Patrick:
You're welcome, Jakub. Just my opinion. But as you said, the functions will be different. And additionally, I don't think I've ever been part of a migration where we noticed bugs, reported them, and the client didn't want them fixed. I don't think anybody has ever said that. So, one, we don't just have different database functions and things like that, we're also improving the code, and here we already start putting in intentional deviations, right? So when it comes to the validation phase, they will say, hey, why do these two numbers look different? Yes, because this is the exact bug that we talked about previously. You know, the bug that we fixed here has an effect like this, right? And then the numbers will change in this certain way. So you will need to explain the effects of fixing these bugs, of these improvements that you're doing. But the validation is a lot less strict, let's say, because if you already improve the code on your way to the migration, then it is easier to validate, because the numbers won't be one to one. And you can say, yes, this is just the new norm, because we fixed these bugs, we made these improvements, and, I guess it sounds like a bit of a cop out, but there's no direct comparison anymore.

30:07
Jakub:
Right? So you're basically protecting yourself a bit in this way?

30:12
Patrick:
Well, yeah, yes, in a way. But you also need to prove that the improvements you made have the effect that they do, right? So you can't just say, yeah, I changed the date conversion of the delivery times, and they say, well, now I have €15 billion missing from last year, and you're like, yes, that's totally that improvement. No, no, that doesn't work. You have to, of course, explain the improvements that you're making, lay out what effect they have, and show it one to one, because you can run two versions of the code, right? So you can say, this is the way it looks before, and this is the way it looks with the improvement.

30:51
Jakub:
I think it again comes back to what we try to stress a lot in our podcast, and that's communication. You still need to be very clear on what you are doing, why you are doing it, and explain it as thoroughly as possible to whoever is going to be validating it for you, so they understand that it is OK for it to be this way. And also, when we are talking about migration and the possible deviations, regardless of whether you do a one to one migration or a migration with some enhancements and improvements already built in, what I'm trying to say is that there will be some deviations, and you've got to know why they are happening. That's probably the key point in this. So even though the numbers will be almost identical, they might not be exactly identical. And our job as the data scientists is to know why, and to be able to actually explain why.

31:46
Patrick:
Exactly. The worst answer you can give to "why are there deviations?" is "I don't know." We always strive to explain the deviations, because it simply restores confidence in the analysis. If there is already going to be different data, you have to restore confidence in the validity of the data that the user is going to see. Right. Moving on to the last, the third migration type that we tend to come across: the full on do over. That's where a client is incredibly unhappy with the analyses as they are. They are slow, they're not very performant, they don't really show the right things, or they show processes that aren't relevant anymore. And they just say, hey, we're just going to move our stuff to the cloud, and what we move is maybe just the idea of a process. So if they had a bad O2C process in the on premise system, they say, well, we want O2C, but just do it from scratch. You know, to start fresh. We'll just start fresh, get rid of all the old stuff, put in all the new stuff.

32:54
Jakub:
Yeah, we don't like our numbers, it takes too long, so please make it look like our O2C is a bit better and a bit more efficient than it really is.

33:05
Patrick:
Yeah, exactly.

33:07
Jakub:
We don't fake numbers.

33:09
Patrick:
Yeah. So we can't really change the way your business performs, but what we can do is, of course, make the analyses better, enhance them with new features, enhance them with new code, make it cleaner, make it faster, and things like that.

33:26
Jakub:
Yeah. And I've got to say that sometimes the initial implementation is such a mess that opting for building it from scratch might be even easier and less time consuming than just rebuilding whatever has been done before.

33:42
Patrick:
Absolutely. Absolutely. So building it from scratch sounds like a drastic option at times, but knowing code and knowing the way it can look, sometimes it is the better option. But with a full on do over of the code, you know that there is no one to one validation to speak of. Essentially you can just test against your source database: see if your sales orders are there, see if all your accounts are there, and do basic validation, like the validation we would do in any type of implementation. But there isn't really a one to one comparison where you look at the analysis in the on premise system and check it against the new one. It just wouldn't make any sense.

34:23
Jakub:
Patrick, if I remember correctly, you complain a lot about how many things you have to test in each validation process in this migration. Can you elaborate on that a bit more? How do you do the testing? How do you do the validation in general?

34:38
Patrick:
Well, once we have completed the migration and we're happy with the way it looks, we obviously do pre checks to make sure the numbers match roughly, you know, blah, blah, blah. Then we obviously give it to the user department, because the user department at the end has to work with it, right? They have to recognize their own data. They have to recognize their analysis, right? And then they go in and they get a huge list of tests to perform. Do the counts match? Do the numbers of activities match? Does my KPI in this sheet look correct? Do the colours match? Everything you could possibly think of, they get a huge list of all these things to check. And if one of those things doesn't match, you will be notified: there's a defect.

35:21
Jakub:
So then you have this endless issue log that you have to go through point by point. And I can imagine it also differs in complexity: is this only a change of a colour, or is something actually completely wrong, where you have to go into the code, execute everything and hope for the best?

35:40
Patrick:
Absolutely. I mean, the thoroughness can be tedious, for sure. And for us, getting a defect for the same issue over and over again is obviously a bit of a time drain, because we have to deal with the defect, talk to the user, ask what's going on and so on. But there is a point to it. Being this thorough pays off: we fixed, I think, three bugs in the last migration that we had simply gotten wrong. We wrongly implemented something, the users found that the numbers didn't match, and they were right. We did it wrong, so we went in, found the bug, fixed it, and now the numbers match. So the thoroughness, the attention to detail, can lead to very good results.

36:32
Jakub:
Speaking of defects and issues: in each migration there will be some obstacles along the way. What comes to your mind first when we talk about migration? What are the issues we usually face, and how do you sort them out or mitigate them?

36:52
Patrick:
You mean the first thing that comes to mind when you say issues? One of them, predominantly: do the testers know how to test? When we give the testers these new analyses and they should check them, do they know how to check for differences? That's a major thing. You can't just look at the analysis and say, hey, this looks different, I'm going to report a defect. OK, but how does it differ? Can you maybe think of why it differs? Can you maybe filter it to the same date range to see whether the numbers match? So one of the things we tend to do in our meetings beforehand is give a little validation guide. Hey, you should filter on this time range, focus between March and September, so we can compare the same date range on the on-premise and the cloud. Focus on sales orders, or filter this way, and focus on the idea behind the defect: do you see this in one sheet, or do you see it in all the sheets? Is this a defect that occurs in many places or just in one? Just kind of give them an idea of what to test, because getting more information from the testers about what is wrong helps us enormously in figuring out what the actual bug is.
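The "filter both systems to the same date range, then compare" step from that validation guide can be sketched like this; the records and field names are invented for illustration:

```python
from datetime import date

# Hypothetical sketch of one step in the validation guide: restrict both
# extracts to the same window before comparing, since the cloud side may
# already contain newer data than the on-premise snapshot.

def in_range(record, start, end):
    return start <= record["date"] <= end

def compare_window(on_prem, cloud, start, end):
    """Count records in the shared window on both sides and report the diff."""
    n_prem = sum(1 for r in on_prem if in_range(r, start, end))
    n_cloud = sum(1 for r in cloud if in_range(r, start, end))
    return {"on_prem": n_prem, "cloud": n_cloud, "diff": n_prem - n_cloud}

# The cloud extract runs two months further than the on-premise snapshot,
# which is exactly why comparing unfiltered totals would be misleading.
on_prem = [{"date": date(2021, m, 1)} for m in range(3, 10)]   # Mar-Sep
cloud = [{"date": date(2021, m, 1)} for m in range(3, 12)]     # Mar-Nov

result = compare_window(on_prem, cloud, date(2021, 3, 1), date(2021, 9, 30))
```

Inside the shared March-to-September window the counts agree, even though the raw totals differ; a mismatch here would be a real defect rather than a data-freshness artifact.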

38:17
Jakub:
What I had a lot of issues with when I was working on the migration was the actual database itself. I already mentioned it before: when you transport scripts from A to B, you have to make some changes, which wasn't an issue. Even though it's a bit tedious, you correct everything, your functions are now working, you catch all the little bugs. But what I was really, really struggling with was that each database also computes differently. You have a different computing power on the on-premise system and a different computing power on the cloud system, and even though they might be similar, they don't work exactly the same; they handle the data differently. What was acceptable in, let's say, a HANA environment is no longer acceptable in the cloud environment. Then you have to go and look for ways around it: how do you optimize the query so that it still gives you the output you want, but won't start having issues while being executed? This was a huge pain point for me. I personally spent a lot of time just optimizing the queries.
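The kind of rewrite this describes, same output but a different query shape, can be illustrated with a small SQLite example. The tables and the specific rewrite (a per-row correlated subquery turned into a set-based join) are invented for illustration and are not Celonis-specific:

```python
import sqlite3

# Two equivalent queries: a correlated subquery that runs once per order row,
# and a set-based rewrite that aggregates once and joins. One engine may
# handle the first fine while another needs the second. Table names invented.

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer TEXT);
    CREATE TABLE items (order_id INTEGER, amount REAL);
    INSERT INTO orders VALUES (1, 'A'), (2, 'B');
    INSERT INTO items VALUES (1, 10.0), (1, 5.0), (2, 7.5);
""")

# Variant 1: correlated subquery, evaluated once per order row.
per_row = conn.execute("""
    SELECT o.order_id,
           (SELECT SUM(i.amount) FROM items i WHERE i.order_id = o.order_id)
    FROM orders o ORDER BY o.order_id
""").fetchall()

# Variant 2: one aggregation, then a join -- same result, set-based.
set_based = conn.execute("""
    SELECT o.order_id, s.total
    FROM orders o
    JOIN (SELECT order_id, SUM(amount) AS total
          FROM items GROUP BY order_id) s
      ON s.order_id = o.order_id
    ORDER BY o.order_id
""").fetchall()
```

The point is not that one form is always faster, but that the same logical output can be expressed in shapes the target engine processes very differently, which is what the query optimization work here amounts to.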

39:44
Patrick:
Yeah. Currently I am also in a phase of a project where all we're doing is optimizing the scripts, because we realized the way we transported them is not really efficient and was causing issues with the performance of the whole cloud. So we decided, hey, let's take some time to optimize these scripts, and that has led to some incredible results. Scripts that used to take 40 hours to execute in the on-premise system now take 5 hours in the cloud, and that is a massive improvement. It essentially means that instead of loading your data model once a week, you can now load it daily, right? So you have up-to-date data. These performance improvements are also very good for the user at the end of the day.

40:33
Jakub:
It also comes down to the experience of the data scientist. Just because something runs longer in the cloud doesn't mean the cloud database is less powerful; it means the script might not be optimal. All you probably need to do is think about how the new database is going to work with the data and optimize your script in a way the new database likes more and can process better. And then, surprisingly, you might arrive at, as Patrick just mentioned, significantly better runtimes than you ever wished for in the previous environment.

41:14
Patrick:
Exactly, and it's not just runtimes that are important; it's also the stability of the transformations, right? If your transformations sometimes run 20 hours, sometimes 10 hours, and sometimes fail, that's not very stable. But if they consistently take 15 hours every time you execute them, that is a win in my eyes, because the transformations are stable and you don't need to worry about them working one day and not the next.

41:38
Jakub:
However, you still need to keep in mind that even the cloud is still a system, and each system comes with a different set of issues. In Celonis specifically, there are times when there are updates, so the system is simply down, and sometimes you just don't foresee it while you have a lot of dependencies in your queries. Let's say you first need to extract the data, then you need to run your queries, and then you need to reload the model. If one piece fails, then usually all pieces fail and you have to start over. These are the things you need to keep in mind, especially at the time when you are basically running on two environments at once. Before you conclude your transports, before you conclude your migration, you will for a period of time be running two systems in parallel, the on-premise and the cloud. And there are going to be a lot of headaches from that, because each system will have its own set of issues and you will have to sort them out during this period. So get ready, or be prepared for this.
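That dependency chain, extract, then transform, then reload, where one failure stops everything downstream, can be sketched as a simple sequential runner; the step names and the simulated outage are invented for illustration:

```python
# Minimal sketch of the dependency chain described above: extraction, then
# transformations, then the data model reload. If any step fails, the
# downstream steps are skipped and the whole run has to start over.

def run_pipeline(steps):
    """Run steps in order; stop at the first failure and report what ran."""
    completed = []
    for name, step in steps:
        try:
            step()
        except Exception as exc:
            return {"completed": completed, "failed": name, "error": str(exc)}
        completed.append(name)
    return {"completed": completed, "failed": None, "error": None}

def extract():
    pass  # pull raw tables from the source system

def transform():
    raise RuntimeError("cloud maintenance window")  # simulated unforeseen outage

def reload_model():
    pass  # refresh the data model from the transformed tables

result = run_pipeline([("extract", extract),
                       ("transform", transform),
                       ("reload", reload_model)])
```

Here the simulated outage during the transformation step means the reload never runs, which mirrors why an unannounced update window can force the whole chain to be restarted.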

42:51
Patrick:
Jakub, listening to you, I have now thought of two things that caused me major headaches in the last migration. One: development freeze. Do the developers on the on-premise system know that for the duration of the migration they shouldn't change a single thing? Because we get the code at one point in time, and if that changes, the validation of the migration will obviously be impossible, because the code has changed since then. This is a major thing: do all the parties involved in these systems know that the migration is happening and that they shouldn't touch anything? This was a massive problem in the last migration, I believe. And also, speaking of wrong versions of code: we were in a migration, and on the last day of actually writing the code we noticed that we had been implementing the wrong version of the code, not the one that was actually supposed to be tested on Monday. That was on a Friday. So we got the new code and had to compare what had changed since then. And yeah, that was a very rough Friday.

44:08
Jakub:
That sounds like a ton of fun.

44:11
Patrick:
Yeah, it was not. But the lesson is ensuring that the code that should be migrated is verified, and that nobody has a different idea of what code needs to be running in the cloud at the end of the day.

44:24
Jakub:
Patrick, wrapping up the episode slowly: let's say you are about to start a full migration for your customer from scratch. You are the project manager and also the lead data scientist, and you tell everyone what to do. My favourite position to be in, of course. What do you do? How would you go about it, step by step, to mitigate the potential issues and still run the project as seamlessly as possible?

44:53
Patrick:
Well, the first thing I would do is listen to the Mining Your Business podcast on the Celonis migration, of course.

44:58
Jakub:
Of course, of course.

44:59
Patrick:
But yeah, essentially go through the timeline that we spoke about before. Gather the requirements at the beginning, right? Figure out how the migration should be set up. Who does what? Assign people different roles: what they should take care of, providing the code, providing the data, verifying the data. Go through all these steps one by one, and do not deviate, because the thing you said at the start will be the basis on which you validate at the end. If you say, hey, I want a one-to-one migration, that's how we're going to do it. If there are some bugs along the way that get fixed, or if the migration cannot be one-to-one, this needs to be discovered beforehand, before the actual migration takes place, so that the users know what to expect and what they will accept at the end of the day. And after all that is said and done, the migration can happen and the validation can, hopefully, prove that the migration was done correctly.

46:03
Jakub:
You said it, Patrick: you are the expert now for migrations, and so, actually, are we at Processand. So if you, listener, by any chance are even thinking about a migration, are running on the older Celonis version and have some questions about how to migrate, or eventually need some help, just get in touch with us, either at miningyourbusinesspodcast@gmail.com or through the Processand website, so feel free to visit. In any case, I think for the episode, Patrick, that would be it for today, unless you have anything else to add.

46:39
Patrick:
No, I think from the amount of complaining we just did in the last 5 minutes, the listeners know that we have quite a lot of experience with the migration topic. So this is a topic that is very close to our hearts.

46:53
Jakub:
You know, migrate at your own risk, exactly. But the perks of being in the new environment, I think, outweigh the headaches that come during the migration period. And after you've run on the cloud solution for a couple of months, you won't even remember how it used to be back in the day in your own on-site version.

47:20
Patrick:
Absolutely. I don't think I've met a client that was unhappy after the migration or said, I wish we hadn't done it.

47:27
Jakub:
True, true. So yeah, thank you for listening. We are looking forward to the next episode, where we will have another guest, another colleague of ours, and we will be speaking about workshops. So stay tuned for the next one, and thank you for listening today. And Patrick, have a nice day. Bye.

47:46
Patrick:
Yeah, you too. Thank you. Bye.

Jakub Dvořák

Data Science Team Lead
