SAP Data Services is a certified ETL (Extraction, Transformation and Loading) tool from SAP to perform batch loading into SAP HANA.
SAP Data Services is a data integration and transformation software application. It allows users to develop and execute workflows that take data from multiple sources, combine, transform, and refine that data, and then send it to a destination system.
Suppose you are working in an organization where data is stored in disparate databases such as Oracle, DB2 and other legacy systems.
You are asked to recommend the best application for consolidating and replicating data from SAP and non-SAP sources into SAP HANA using the ETL method.
What is ETL (Extraction, Transformation and Loading)?
Extract, Transform and Load (ETL) refers to a process in database usage and especially in data warehousing that:
- Extracts data from homogeneous or heterogeneous data sources
- Transforms the data into the proper format or structure for querying and analysis
- Loads it into the final target (database or data warehouse)
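The three steps above can be illustrated with a minimal Python sketch. This is not how SAP Data Services works internally; it only shows the extract, transform and load stages on a toy data set. The CSV content, the `customers` table and all function names are assumptions made for illustration, and an in-memory SQLite database stands in for the real target.

```python
import csv
import io
import sqlite3

# Hypothetical source data; a real source could be Oracle, DB2, flat files, etc.
SOURCE_CSV = """customer_id,name,country
1, alice ,de
2,Bob,US
3, carol,in
"""


def extract(text):
    """Extract: read raw rows from the source (here, CSV text)."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: trim whitespace and normalize name casing and country codes."""
    return [
        (int(r["customer_id"]), r["name"].strip().title(), r["country"].strip().upper())
        for r in rows
    ]


def load(conn, records):
    """Load: write the cleaned records into the target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS customers "
        "(customer_id INTEGER PRIMARY KEY, name TEXT, country TEXT)"
    )
    conn.executemany("INSERT INTO customers VALUES (?, ?, ?)", records)
    conn.commit()


def run_etl(conn, text=SOURCE_CSV):
    """Run all three stages and return the loaded rows for inspection."""
    load(conn, transform(extract(text)))
    return conn.execute(
        "SELECT customer_id, name, country FROM customers ORDER BY customer_id"
    ).fetchall()


if __name__ == "__main__":
    target = sqlite3.connect(":memory:")  # stand-in for the real target system
    for row in run_etl(target):
        print(row)
```

Keeping each stage in its own function mirrors the way an ETL tool separates source readers, transformations and target loaders, so that each part can be changed independently.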
Data integration and transformations can be performed using database programming languages like SQL and PL/SQL; however, the resulting landscape is expensive to manage and maintain. This is where ETL (Extraction, Transformation and Loading) tools play a major role in the industry. These tools are specifically designed to provide a single platform where developers can build the transformation logic and administrators can easily maintain the system.
Challenges faced when loading data from multiple sources to HANA
Data might be Siloed:
With data scattered across your organization in different ERP, database, or homegrown systems, you are likely to find different versions of the truth, which limits your ability to gain a complete view of the business.
Data might be Inaccurate:
In many customer systems, data is inherently inconsistent because things change and business requirements continue to evolve to meet new goals. Common issues like incorrect customer names, addresses, and product names add to the problems an organization must resolve before it can leverage its corporate data as an enterprise asset.
Data might be Inconsistent:
Definitions of common business entities like customers, products, suppliers, material names and codes vary from system to system, creating inconsistencies that data access alone cannot address. You need a better way to reconcile them.
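As a rough illustration of what reconciling such definitions involves, the following Python sketch maps system-specific material codes to one canonical code. The system names, codes and the `reconcile` helper are all hypothetical; a data integration tool would typically do this with lookup tables or matching rules rather than a hard-coded dictionary.

```python
# Hypothetical mapping from (source system, local code) to one canonical code.
CANONICAL_CODES = {
    ("erp", "MAT-0001"): "M1",
    ("legacy", "1"): "M1",
    ("crm", "steel_bolt"): "M1",
}


def reconcile(system, code):
    """Return the canonical code for a (system, code) pair, or None if unmapped."""
    return CANONICAL_CODES.get((system.lower(), code.strip()))


# The same material appears under three different codes, plus one unknown code.
records = [("ERP", "MAT-0001"), ("legacy", " 1 "), ("crm", "steel_bolt"), ("crm", "unknown")]
resolved = [reconcile(system, code) for system, code in records]

# Unmapped records are kept aside for manual review instead of being dropped.
unmapped = [rec for rec, canon in zip(records, resolved) if canon is None]
```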
Data might be Incomplete:
Another common data challenge is incompleteness. A customer record may be missing a postal code or country code and would be unusable unless it is appended with the correct data.
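A completeness check of this kind can be sketched in a few lines of Python. The field names (`name`, `postal_code`, `country_code`) and the sample records are assumptions chosen to match the example above, not an actual Data Services schema.

```python
# Assumed set of fields a customer record needs in order to be usable.
REQUIRED_FIELDS = ("name", "postal_code", "country_code")


def missing_fields(record):
    """Return the required fields that are absent or blank in a record."""
    return [f for f in REQUIRED_FIELDS if not str(record.get(f) or "").strip()]


def split_by_completeness(records):
    """Separate complete records from incomplete ones, noting what is missing."""
    complete, incomplete = [], []
    for rec in records:
        gaps = missing_fields(rec)
        if gaps:
            incomplete.append((rec, gaps))
        else:
            complete.append(rec)
    return complete, incomplete


customers = [
    {"name": "Alice", "postal_code": "10115", "country_code": "DE"},
    {"name": "Bob", "postal_code": "", "country_code": "US"},  # blank postal code
    {"name": "Carol", "postal_code": "560001"},  # country code missing
]
complete, incomplete = split_by_completeness(customers)
```

Records in `incomplete` carry the list of missing fields, so a later step can try to append the correct data before loading.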
Data might be Inaccessible:
Sometimes the data is in an unstructured format, such as free-form text from a CRM call log. The challenge lies in unlocking the insights and potential in all of your data sources.
SAP Data Services is the solution to overcome all these problems
SAP Data Services is positioned as an all-in-one solution for data integration (ETL), data quality management, information stewardship (data profiling and metadata management), and text analytics.
Components of Data Services
SAP Data Services has the following components:
Designer: This is the front-end GUI (Graphical User Interface) tool where developers log in and build jobs in SAP Data Services to move data from one system to another, or within a system, and define the transformation logic.
To open the Designer, go to Start Menu -> All Programs -> SAP Data Services (4.2 here) -> Data Services Designer.
Repository: A repository is a database that stores the Designer's predefined objects and user-defined objects (source and target metadata, transformation rules).
Repositories are of two types:
- Local Repository (used by the Designer and the Job Server)
- Central Repository (used for object sharing and version control)
Access Server: This server is used to execute the real-time jobs created by developers in the repositories.
Job Server: This is one of the main server components in Data Services and is used to execute all the batch jobs created by developers in the system. A repository must be attached to at least one Job Server; otherwise developers cannot execute the jobs in that repository.
Management Console: This is a web-based console for managing SAP Data Services, for example scheduling jobs and viewing system statistics such as memory usage, job runtime and CPU utilization.
In the next article, we will learn how to load data to HANA using SAP Data Services.