Data source on-boarding as a service

Overview

Challenge

Data silos, extensive data duplication, and no standard data format.

Lots of processes and coordination were needed to acquire data, varying by source and format.

A massive landscape of batch and streaming data sources hosted on-prem within the organization:

  • 200+ JMS sources from internal operational systems.
  • 1000+ tables in Teradata Warehouse.
  • 30+ Oracle-based Data Marts.
  • 30+ data sources in SFTPs.

Source application teams want autonomy in on-boarding their data into the platform, with minimal IT involvement.

Approach

The chosen approach is to set up an Azure-based data platform built on the Bring Your Own Data principle. The platform consolidates data from streaming and batch sources hosted on-prem into platform storage that exposes at-rest and in-motion consumption endpoints to end users and downstream systems.

Below are the main highlights of the approaches used within the data platform:

BYOD-supportive data publishing frameworks

Data publishing frameworks support the Bring Your Own Data (BYOD) approach by providing libraries and a codebase for the teams that own on-prem source applications. With them, teams autonomously deliver batch data into the platform's data landing zones on ADLS Gen2, or streaming data into the streaming gateway, an on-prem Kafka cluster.
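
To make the idea of a publishing library more concrete, here is a minimal sketch of what a source team might call before handing data to the platform. The envelope fields, the `build_envelope` and `route` helpers, and the destination naming are all hypothetical; the source does not describe the actual framework contract.

```python
from datetime import datetime, timezone

# Hypothetical metadata fields; the real framework's contract is not documented here.
REQUIRED_FIELDS = ("source_id", "dataset", "schema_version")


def build_envelope(source_id, dataset, schema_version, payload, delivery_mode):
    """Wrap a payload with the metadata the platform would need to route it."""
    if delivery_mode not in ("batch", "stream"):
        raise ValueError(f"unknown delivery mode: {delivery_mode}")
    header = {
        "source_id": source_id,
        "dataset": dataset,
        "schema_version": schema_version,
        "delivery_mode": delivery_mode,
        "published_at": datetime.now(timezone.utc).isoformat(),
    }
    missing = [name for name in REQUIRED_FIELDS if not header[name]]
    if missing:
        raise ValueError(f"missing required metadata: {missing}")
    return {"header": header, "payload": payload}


def route(envelope):
    """Pick an (assumed) destination: a landing-zone folder for batch data,
    a streaming-gateway topic for streaming data."""
    header = envelope["header"]
    if header["delivery_mode"] == "batch":
        return f"landing/{header['source_id']}/{header['dataset']}"
    return f"topic:{header['source_id']}.{header['dataset']}"
```

In a sketch like this the source team only supplies identifiers and a payload, while naming and routing rules stay inside the shared library, which is what keeps IT involvement low.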

Kappa architecture for real-time data ingestion

Streaming data ingestion uses a Kappa architecture with on-prem Kafka clusters, Azure Event Hubs, Azure Functions and Spark Streaming to acquire and process continuously arriving messages from JMS-based data sources. Ingestion is followed by data harmonization, which makes the data available to downstream systems both in motion and at rest.
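
The core idea of a Kappa architecture is a single stream-processing path over a replayable log, from which both the in-motion and at-rest outputs are derived. The sketch below illustrates that idea in plain Python with an in-memory stand-in for a Kafka topic; the message fields (`OrderId`, `Amt`) and the `harmonize` rules are assumptions for illustration, not the platform's actual schema.

```python
from dataclasses import dataclass, field


@dataclass
class EventLog:
    """In-memory stand-in for a Kafka topic: append-only and replayable."""
    records: list = field(default_factory=list)

    def append(self, message):
        self.records.append(message)

    def read_from(self, offset=0):
        return self.records[offset:]


def harmonize(raw):
    """Hypothetical harmonization: standard key names and value types."""
    return {"order_id": str(raw["OrderId"]), "amount": float(raw["Amt"])}


def run_stream_job(log, from_offset=0):
    """Single processing path: every harmonized event feeds both outputs."""
    in_motion = []  # events forwarded to downstream stream consumers
    at_rest = {}    # latest state per key, as it would be stored in a table
    for raw in log.read_from(from_offset):
        event = harmonize(raw)
        in_motion.append(event)
        at_rest[event["order_id"]] = event
    return in_motion, at_rest
```

Because there is no separate batch layer, reprocessing after a logic change means replaying the log from an earlier offset through the same job.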

Batch data ingestion & harmonization pipeline

Batch data ingestion uses Azure Data Factory and Azure Databricks to start processing when data pushed by source applications arrives at the platform. It includes data ingestion and harmonization processes that make the data available at rest to downstream processes.
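
As an illustration of what a harmonization step can look like, the following sketch renames source columns to standard names and converts values to standard types using a declarative mapping. It is plain Python rather than the Databricks job itself, and the column names and `CUSTOMER_MAPPING` are invented for the example.

```python
from datetime import date

# Hypothetical mapping: source column -> (standard column name, converter).
CUSTOMER_MAPPING = {
    "CUST_ID": ("customer_id", str),
    "CUST_NM": ("customer_name", str.strip),
    "OPEN_DT": ("opened_on", date.fromisoformat),
}


def harmonize_row(row, mapping):
    """Rename and convert one record; missing or null values become None."""
    harmonized = {}
    for source_col, (target_col, convert) in mapping.items():
        value = row.get(source_col)
        harmonized[target_col] = None if value is None else convert(value)
    return harmonized


def harmonize_batch(rows, mapping):
    """Apply the same mapping to every record of an arrived batch."""
    return [harmonize_row(row, mapping) for row in rows]
```

Keeping the mapping as data rather than code is one way to let a new source be on-boarded by supplying a mapping instead of writing a new job.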

Automated as-a-service data ingestion platform

As-a-service capabilities automatically create configuration-driven data ingestion and harmonization processes within the platform for batch and streaming data pushed by source applications.
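
A configuration-driven setup can be pictured as a function that turns a small declarative source description into a pipeline definition. The sketch below assumes a hypothetical config shape (`source_id`, `dataset`, `mode`, `target_domain`) and hypothetical step names; the platform's real configuration format is not given in the source.

```python
# Hypothetical set of keys every source configuration must provide.
REQUIRED_KEYS = {"source_id", "dataset", "mode", "target_domain"}


def build_pipeline(config):
    """Derive a pipeline definition from a declarative source configuration."""
    missing = REQUIRED_KEYS - config.keys()
    if missing:
        raise ValueError(f"config is missing keys: {sorted(missing)}")

    if config["mode"] == "batch":
        steps = ["wait_for_file_arrival", "ingest_to_raw",
                 "harmonize", "publish_at_rest"]
    elif config["mode"] == "stream":
        steps = ["subscribe_to_topic", "harmonize",
                 "publish_in_motion", "publish_at_rest"]
    else:
        raise ValueError(f"unsupported mode: {config['mode']}")

    return {
        "name": f"{config['source_id']}__{config['dataset']}__{config['mode']}",
        "target_domain": config["target_domain"],
        "steps": steps,
    }
```

For example, a config such as `{"source_id": "crm", "dataset": "customers", "mode": "batch", "target_domain": "sales"}` would yield a four-step batch pipeline named `crm__customers__batch`.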

Centralized data lake with federated ownership structure

A centralized data lake based on ADLS Gen2 and a data warehouse based on Delta Lake, both organized around data domains and subdomains following data segregation principles and federated ownership.
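
One way to picture a domain-oriented layout with federated ownership is a path convention plus an ownership registry. The sketch below builds ADLS Gen2 paths in the standard `abfss://<container>@<account>.dfs.core.windows.net/<path>` form; the folder convention (`domain/subdomain/dataset`), the team names and the `DOMAIN_OWNERS` registry are assumptions made for illustration.

```python
# Hypothetical ownership registry: (domain, subdomain) -> owning team.
DOMAIN_OWNERS = {
    ("sales", "orders"): "sales-data-team",
    ("finance", "ledger"): "finance-data-team",
}


def dataset_path(container, account, domain, subdomain, dataset):
    """Build an abfss URI that places a dataset under its domain and subdomain."""
    for part in (domain, subdomain, dataset):
        if not part or "/" in part:
            raise ValueError(f"invalid path segment: {part!r}")
    return (f"abfss://{container}@{account}.dfs.core.windows.net/"
            f"{domain}/{subdomain}/{dataset}")


def owner_of(domain, subdomain):
    """Look up the team accountable for a subdomain (federated ownership)."""
    try:
        return DOMAIN_OWNERS[(domain, subdomain)]
    except KeyError:
        raise LookupError(f"no owner registered for {domain}/{subdomain}") from None
```

With a convention like this, storage stays centralized while access rules and accountability can be attached per domain folder.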

Azure Purview-enhanced data catalog for metadata management

A data catalog based on Azure Purview automates the capture of technical metadata and supports enriching that metadata with business glossaries, increasing data discoverability, trust and reusability within the organization.
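
The enrichment step can be thought of as joining technical metadata with glossary terms. The sketch below shows that join in plain Python; it does not use the Purview SDK or API, and the record shapes, the `enrich` and `undocumented` helpers and the sample terms are hypothetical.

```python
def enrich(technical_columns, glossary):
    """Attach a business term and description to each technical column.

    technical_columns: list of dicts with at least a "name" key.
    glossary: dict mapping column name -> {"term": ..., "description": ...}.
    """
    enriched = []
    for column in technical_columns:
        entry = glossary.get(column["name"])
        enriched.append({
            **column,
            "business_term": entry["term"] if entry else None,
            "description": entry["description"] if entry else None,
        })
    return enriched


def undocumented(enriched_columns):
    """List columns that still lack a business term, for follow-up by owners."""
    return [c["name"] for c in enriched_columns if c["business_term"] is None]
```

Reporting the columns without a glossary match gives data owners a concrete worklist for improving discoverability.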

Achievements

Rapid MVP platform rollout

The MVP version of the platform was delivered in 6 months.

Autonomous business unit transition to new platform

The first business unit successfully migrated to the new platform for on-boarding data sources from its operational systems with minimal IT involvement.

Tech stack

Azure services

  • Data Lake Storage Gen2
  • Key Vault
  • Monitor
  • Data Factory
  • Functions
  • ExpressRoute
  • Event Hubs
  • Purview

Programming languages

  • Java

Cloud

  • Azure

Big data

  • Databricks
  • Delta Lake
  • Kafka
  • PySpark
  • Spark Streaming

Contact us

Anfimau Industry Solutions GmbH

Managing director: Mikhail Anfimau

Mergenthalerallee 15-21, 65760 Eschborn, Germany

Phone

+49 6196 7008475

Tax number

040 228 55754

VAT ID

DE345344498

Trade registry

HRB 123580