Our client, a leader in the Oil & Gas industry, operated an on-premises data platform based on Cloudera that delivered data analytics solutions to the company's internal business units.
Driven by the company's digital transformation (with cloud adoption as one of its strategic pillars) and by predicted data growth, the client needed to migrate the on-premises data platform to Microsoft Azure. The migration had to use PaaS offerings and require minimal refactoring of existing analytical applications, after which the on-premises workloads would be decommissioned.
On-prem workloads had to be decommissioned within 18 months.
A large number of analytical products, data ingestion components, data volumes and platform capabilities had to be migrated and modernized.
The migration to the Azure-based data platform had to involve minimal refactoring of existing data transformation and business logic, and cause no business disruption, data quality issues or performance degradation.
Network connectivity had to be resolved with the on-prem data sources and reporting systems that the new platform needed to connect to.
A high dependency on the Cloudera technology stack (e.g., Oozie, Impala, HDFS, Sqoop) complicated the selection of target Azure technologies and the choice of modernization and migration approaches.
Existing data inconsistencies and Ops team overhead, caused by schedule-based triggering of ingestion components and analytical product pipelines, needed to be eliminated.
The engagement started with a discovery phase focused on gaining a deep understanding of the existing platform architecture, preparing the target architecture and conducting a 360-degree analysis of the existing analytical data products. The goal of the analysis was to group the analytical solutions into complexity buckets, agree on migration priorities with the business and select MVP candidates to start with.
As a result of the analysis, a target platform architecture based on Azure PaaS was prepared, together with a list of PoCs to execute and a list of initial analytical products from each complexity bucket to migrate first. The goal of this iterative, MVP-based approach was to validate the defined platform architecture by migrating analytical products of varying complexity, identify potential issues early and define the accelerators required to streamline the migration at scale.
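The case study does not say how complexity was measured. The following is a minimal Python sketch of the bucketing and MVP selection idea, assuming a hypothetical product profile (workflow actions, source systems, Cloudera components used) and hypothetical weights and thresholds. Each product is scored, assigned to a bucket, and the least complex product in each bucket is proposed as an MVP candidate.

```python
from dataclasses import dataclass, field


@dataclass
class ProductProfile:
    # Hypothetical attributes collected during discovery.
    name: str
    workflow_actions: int  # e.g. number of Oozie actions
    source_systems: int    # number of upstream data sources
    cloudera_components: set[str] = field(default_factory=set)  # e.g. {"impala", "sqoop"}


def complexity_score(p: ProductProfile) -> int:
    # Hypothetical weights; a real program would calibrate them with the business.
    return p.workflow_actions + 2 * p.source_systems + 3 * len(p.cloudera_components)


def bucket(p: ProductProfile) -> str:
    # Hypothetical thresholds separating the three buckets.
    score = complexity_score(p)
    if score < 10:
        return "low"
    if score < 25:
        return "medium"
    return "high"


def group_by_bucket(products: list[ProductProfile]) -> dict[str, list[ProductProfile]]:
    buckets: dict[str, list[ProductProfile]] = {"low": [], "medium": [], "high": []}
    for p in products:
        buckets[bucket(p)].append(p)
    for items in buckets.values():
        items.sort(key=complexity_score)  # simplest products first within a bucket
    return buckets


def pick_mvp_candidates(products: list[ProductProfile]) -> dict[str, str]:
    # One representative (the least complex) from each non-empty bucket.
    return {b: items[0].name for b, items in group_by_bucket(products).items() if items}
```

With hypothetical products such as `ProductProfile("hr_dash", 2, 1)` (score 4) and `ProductProfile("reservoir", 20, 4, {"impala", "sqoop", "oozie"})` (score 37), `pick_mvp_candidates` returns one name per bucket. This gives the MVP a spread of complexity, as the approach above intends.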
MVP execution produced valuable feedback that was incorporated into the platform architecture to facilitate the migration effort and raise the quality of the migration program. Specifically:
Performance tests helped the team confidently finalize the Azure PaaS technologies to use as the core of the target data platform.
Recurring patterns in data ingestion components were identified. This led to the delivery of a custom ingestion framework that generates ADF-based ingestion components from configuration files. These components extract data in batch (full and incremental modes) from SQL and Oracle data sources into an ADLS Gen2 data lake.
Recurring patterns in the migration steps of analytical products were identified. This led to the creation of a custom accelerator that automates the migration routine for analytical product data pipelines (e.g., generating ADF pipelines that wrap Oozie workflow execution, and automatically refactoring source code for HDInsight compatibility).
Understanding the baseline architecture and finalizing the Azure technologies for the target platform helped define a scalable approach to data quality validation during the migration, based on automated comparison of on-prem data assets with their migrated counterparts.
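The internals of these accelerators are not described. The following Python sketch illustrates three of the patterns above under stated assumptions:

- **Config-driven ingestion.** A hypothetical `IngestionConfig` schema is turned into a simplified dict loosely shaped like an ADF Copy activity. It uses the `SqlServerSource`/`sqlReaderQuery` and `OracleSource`/`oracleReaderQuery` source properties and a Parquet sink, and omits dataset references. Incremental mode filters on a watermark column.
- **Source refactoring.** `hdfs://` URIs are rewritten to ADLS Gen2 `abfss://` URIs. The container and storage account names are placeholders.
- **Data comparison.** A row count plus an order-independent checksum is computed for both data assets. This assumes rows were already normalized to identical Python types on both sides.

```python
import hashlib
import re
from dataclasses import dataclass
from typing import Iterable, Optional


# --- 1. Config-driven ingestion (hypothetical config schema) ----------------
@dataclass
class IngestionConfig:
    source_type: str                        # "sqlserver" or "oracle"
    table: str                              # e.g. "SALES.ORDERS"
    mode: str                               # "full" or "incremental"
    watermark_column: Optional[str] = None  # required for incremental loads


# ADF source type and the property holding its query, per supported source.
SOURCES = {
    "sqlserver": ("SqlServerSource", "sqlReaderQuery"),
    "oracle": ("OracleSource", "oracleReaderQuery"),
}


def reader_query(cfg: IngestionConfig, last_watermark: Optional[str] = None) -> str:
    # Config files are assumed trusted, so values are interpolated directly.
    query = f"SELECT * FROM {cfg.table}"
    if cfg.mode == "incremental":
        if not cfg.watermark_column or last_watermark is None:
            raise ValueError("incremental mode needs a watermark column and value")
        # Only rows changed since the previous successful load.
        query += f" WHERE {cfg.watermark_column} > '{last_watermark}'"
    return query


def copy_activity(cfg: IngestionConfig, last_watermark: Optional[str] = None) -> dict:
    """Simplified dict loosely shaped like an ADF Copy activity."""
    source_type, query_prop = SOURCES[cfg.source_type]
    return {
        "name": "Copy_" + cfg.table.replace(".", "_"),
        "type": "Copy",
        "typeProperties": {
            "source": {"type": source_type, query_prop: reader_query(cfg, last_watermark)},
            "sink": {"type": "ParquetSink"},  # Parquet files in the ADLS Gen2 lake
        },
    }


# --- 2. Source refactoring: hdfs://<namenode>/<path> -> abfss:// URI ---------
HDFS_URI = re.compile(r"hdfs://[^/\s'\"]+(/[^\s'\"]*)")


def rewrite_hdfs_paths(code: str, container: str, account: str) -> str:
    return HDFS_URI.sub(
        lambda m: f"abfss://{container}@{account}.dfs.core.windows.net{m.group(1)}", code
    )


# --- 3. Data comparison: row count + order-independent checksum -------------
def table_fingerprint(rows: Iterable[tuple]) -> tuple[int, int]:
    count, checksum = 0, 0
    for row in rows:
        digest = hashlib.sha256(repr(row).encode("utf-8")).digest()
        # Summing per-row hashes makes the result independent of row order.
        checksum = (checksum + int.from_bytes(digest[:8], "big")) % 2**64
        count += 1
    return count, checksum


def assets_match(on_prem_rows: Iterable[tuple], migrated_rows: Iterable[tuple]) -> bool:
    return table_fingerprint(on_prem_rows) == table_fingerprint(migrated_rows)
```

Summing hashes, rather than hashing a sorted dump, lets each side be fingerprinted in a single streaming pass without ordering the data. Because duplicate rows add to the sum, the checksum still detects them.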
As a result, the finalized Azure data platform architecture used:
A hub-and-spoke approach for the data platform, with the Azure resources and data of each analytical solution isolated in separate resource groups and subscriptions, supporting cost segregation, dedicated security perimeters and a clear ownership model.
Custom accelerators for automated creation of ingestion components, migration of analytical product components, data quality validation, CI/CD pipeline generation, etc.
ExpressRoute for on-prem connectivity, providing a robust, fast network channel for moving data from the client's on-prem environment.
Azure HDInsight as the PaaS technology for running data processing workloads based on the existing data pipeline components, allowing existing business and transformation logic to be reused and reducing migration effort.
Event-based execution of data ingestion and analytical solution pipelines, triggered by data availability and an internal dependency graph.
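The following is a minimal Python sketch of event-based execution driven by data availability and a dependency graph. It assumes hypothetical pipeline and dataset names and assumes each pipeline runs once per processing cycle. Unlike a fixed schedule, a pipeline starts only when every input dataset it depends on has arrived. Its outputs are then published as new "dataset available" events, which may trigger downstream pipelines.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Pipeline:
    name: str
    inputs: frozenset[str]   # datasets that must be available before the pipeline runs
    outputs: frozenset[str]  # datasets the pipeline publishes when it succeeds


class EventDrivenOrchestrator:
    """Hypothetical orchestrator: runs each pipeline once all of its inputs have arrived."""

    def __init__(self, pipelines: list[Pipeline]):
        self.pipelines = list(pipelines)
        self.available: set[str] = set()
        self.completed: list[str] = []

    def on_dataset_available(self, dataset: str) -> list[str]:
        """Handle a 'dataset landed' event; return the pipelines it triggered, in run order."""
        triggered: list[str] = []
        events = deque([dataset])
        while events:
            self.available.add(events.popleft())
            for p in self.pipelines:
                if p.name not in self.completed and p.inputs <= self.available:
                    # A real platform would start a pipeline run here (e.g. in ADF);
                    # this sketch assumes success and publishes the outputs as new events.
                    self.completed.append(p.name)
                    triggered.append(p.name)
                    events.extend(p.outputs)
        return triggered
```

For example, a report pipeline with inputs `curated.wells` and `curated.production` runs only after both ingestion pipelines that publish those datasets have finished, regardless of which raw source lands first.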
Delivered the MVP within 4 months; the planned migration was completed in 16 months, and the on-prem platform was decommissioned according to the client's plans.
Developed 4 accelerators to speed up the migration of various aspects in a governed way.
Delivered new capabilities that were not available in the on-prem version of the platform, such as a granular security model across all layers, smart monitoring, event-based execution of data pipelines, a granular billing model, platform events, data lineage capture, on-demand compute infrastructure provisioning, data ingestion as a service, etc.
Managing director: Mikhail Anfimau
Mergenthalerallee 15-21 65760 Eschborn, Germany
+49 6196 7008475
040 228 55754
DE345344498
HRB 123580