Welcome to this introduction to data integration. To understand how to use your Talend tool, you first need some insight into what data integration is. We'll explore typical project use cases for an in-context introduction to data integration technologies. Then we'll cover the key benefits of a properly mastered integration project.
The number of data sources you have to handle seems to multiply every day. Your company receives data from different systems in different formats. Even within the company, every department creates data in distinct databases using disparate data models.
Accessing high-quality data is the first step to enabling analytics. To generate better business insights, data must be cleansed, enriched, then centralized in a unified, easy-to-understand format. These are the primary challenges for data integration.
ETL tools help create business intelligence by gathering, combining, and providing access to data. ETL stands for Extract, Transform, Load, and the process consists of three ordered steps. Just as the name implies, an ETL tool extracts data from a source, transforms the data while in transit, then loads the data into a single repository.
Extract: get the data from the source systems as efficiently as possible. Transform: perform calculations and transformations on the data; at this step, cleansing and standardization are also needed most of the time. Load: load the data into the target storage. The main advantage of using a graphical ETL tool is that it does all of this with minimal coding. Now that we understand some of the basics of data integration and what an ETL tool is, let's look at some typical use cases.
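To make these three steps concrete, here is a minimal Python sketch of an ETL flow. The CSV source file, the column names, and the SQLite target are all hypothetical, and a graphical tool like Talend builds the same flow from components rather than code.

```python
import csv
import sqlite3

# Extract: read raw rows from a hypothetical source file
# with exactly two columns, "name" and "amount".
def extract(path):
    with open(path, newline="") as f:
        return list(csv.DictReader(f))

# Transform: cleanse and standardize the data while in transit.
def transform(rows):
    for row in rows:
        row["name"] = row["name"].strip().title()       # cleanse
        row["amount"] = round(float(row["amount"]), 2)  # standardize
    return rows

# Load: write the result into a single target repository (SQLite here).
def load(rows, db_path):
    con = sqlite3.connect(db_path)
    con.execute("CREATE TABLE IF NOT EXISTS sales (name TEXT, amount REAL)")
    con.executemany("INSERT INTO sales VALUES (:name, :amount)", rows)
    con.commit()
    con.close()

load(transform(extract("sales.csv")), "warehouse.db")
```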
Data migration is the process of moving application data from old systems to new ones. It typically requires transferring data between storage types or formats. An automated migration frees human resources from tedious, error-prone tasks.
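As a small illustration of what an automated migration can do, here is a hedged Python sketch that moves customers from a hypothetical legacy schema (a single fullname column) to a new one (separate first and last names). The file and table names are invented for the example.

```python
import sqlite3

# Hypothetical migration: copy customers from a legacy schema
# (one "fullname" column) into a new schema ("first"/"last" split).
old = sqlite3.connect("legacy.db")
new = sqlite3.connect("new_system.db")
new.execute("CREATE TABLE IF NOT EXISTS customers (first TEXT, last TEXT)")

for (fullname,) in old.execute("SELECT fullname FROM customers"):
    first, _, last = fullname.partition(" ")
    new.execute("INSERT INTO customers VALUES (?, ?)", (first, last))

new.commit()
old.close()
new.close()
```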
Data warehousing is used to aggregate transactional data for business people to work with and analyze. It converts business data into business intelligence. Data integration is often implemented in data warehouses through specialized software that hosts large data repositories from internal and external sources.
Data is extracted, amalgamated, and presented in a unified form. For example, business intelligence dashboards can be fed by a data mart that stores combined data from marketing, sales, and operations. Data consolidation is usually associated with moving data from remote locations to a centralized location, or with combining data after an acquisition or merger of two or more companies. In this example, two companies merge. Imagine that the sales team in each company uses a different tool.
The two sales teams might cross over into each other's accounts, not realizing that a candidate is already a customer. The goal is to have a unified sales tool and a single source of truth for the new unified company, so that the sales teams don't get conflicting or overlapping data. The purpose in this case is to transform the data from tool A into a format that can be ingested into tool B. Data integration provides the answer.
Data synchronization is the process of ensuring that two or more locations contain the same up-to-date data. Adding, changing, or deleting a file in one location mirrors the action at the other location. In this case, the sales teams can't migrate to a single sales tool, because if they all use tool A, sales team B will be disadvantaged for a period of time. Likewise, if they all use tool B, sales team A will be disadvantaged.
In this situation, everyone wants to keep their own tool, but the two services need to share information so that both sales teams can function seamlessly.
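One way to picture synchronization is a job that mirrors file adds, changes, and deletes from one location to another. The sketch below is a simplified one-way version in Python; the directory names are hypothetical, and a real setup would synchronize in both directions.

```python
import shutil
from pathlib import Path

# Minimal one-way sync sketch: mirror adds, changes, and deletes
# from src to dst. Directory names are hypothetical.
def sync(src, dst):
    src, dst = Path(src), Path(dst)
    dst.mkdir(exist_ok=True)
    src_files = {p.name for p in src.iterdir() if p.is_file()}
    # Add or update files that are new or more recent at the source.
    for name in src_files:
        s, d = src / name, dst / name
        if not d.exists() or s.stat().st_mtime > d.stat().st_mtime:
            shutil.copy2(s, d)
    # Delete files that no longer exist at the source.
    for p in dst.iterdir():
        if p.is_file() and p.name not in src_files:
            p.unlink()

sync("tool_a_exports", "tool_b_imports")
```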
Data warehousing involves the extraction and transportation of data from one or more databases into a target system for analysis. Moving such huge volumes of data is very expensive in both resources and time. The ability to capture only the changed source data and to move it from a source to a target system in real time is known as change data capture, often abbreviated CDC. Capturing changes reduces traffic across the network and thus helps reduce ETL time. The CDC feature in Talend Data Integration simplifies the process of identifying the data that has changed since the last extraction. CDC in Talend quickly identifies and captures data that has been added to, updated in, or removed from database tables, and makes this changed data available for future use by applications or individuals.
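Note that this is not Talend's actual CDC mechanism, which relies on dedicated components. As a generic illustration of the idea, the Python sketch below pulls only the rows modified since the last extraction, assuming a hypothetical updated_at column on the source table.

```python
import sqlite3

# Generic CDC illustration (not Talend's mechanism): pull only the
# rows modified since the last extraction, using a hypothetical
# updated_at column on the source table.
def capture_changes(db_path, last_extracted_at):
    con = sqlite3.connect(db_path)
    changes = con.execute(
        "SELECT name, amount, updated_at FROM sales WHERE updated_at > ?",
        (last_extracted_at,),
    ).fetchall()
    con.close()
    return changes  # only the delta travels across the network

delta = capture_changes("warehouse.db", "2024-01-01T00:00:00")
```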
Let's now examine the key benefits of data integration. As we have seen, a modern ETL tool must be able to connect to a large variety of data silos. Today's data is complex and voluminous, and handling it requires versatility, speed, and scalability. Mapping and transforming disparate data necessitates a flexible and collaborative ETL tool.
For an optimal combination, data must be profiled, cleansed, and standardized. For better performance, data integration jobs must be monitored, and job logs must be easily accessible to everyone involved. Automating exception handling reduces processing time.
A service-oriented architecture speeds data integration and profiling and gives you more time to respond to new business requests. Let's see how data integration can empower your data processing with some concrete examples. To be compatible with your CRM systems, you can redesign customer data by merging first and last names into one field, or by splitting credit card information into several parts.
You can combine your sales registrations from different locations, standardizing date and unit formats. To gain better insight into your business, you can enrich your sales data by mapping additional geographical data. Once data is standardized and combined, you can run calculations to feed your business intelligence reports with accurate sales rates. Data profiling can reveal duplicates. You can use data quality components in a data integration job to deduplicate data.
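Here is a small Python sketch of those reshaping steps applied to a single hypothetical record: merging names into one field, splitting out part of the card number, and standardizing the date format.

```python
from datetime import datetime

# Hypothetical customer record to reshape for CRM compatibility.
record = {"first": "Ada", "last": "Lovelace",
          "card": "4111111111111111", "sale_date": "12/31/2023"}

# Merge first and last names into one field.
record["full_name"] = f"{record['first']} {record['last']}"

# Split the card information into parts (keep only what you need).
record["card_last4"] = record["card"][-4:]

# Standardize the date format to ISO 8601.
record["sale_date"] = datetime.strptime(
    record["sale_date"], "%m/%d/%Y"
).date().isoformat()

print(record["full_name"], record["card_last4"], record["sale_date"])
# Ada Lovelace 1111 2023-12-31
```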
Here, the same customer appears in three sources with different spellings. The first step is to identify the duplicates using data matching components, which give you plenty of options to fine-tune duplicate detection and survivorship rules.
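As a rough illustration of matching with a tunable threshold, here is a Python sketch using simple string similarity. The customer records, the 0.7 threshold, and the keep-the-first-non-empty-email survivorship rule are all invented for the example; a real job would use the dedicated matching components.

```python
from difflib import SequenceMatcher

# Hypothetical customer records: the same person with three spellings.
customers = [
    {"name": "John Smith", "email": "jsmith@example.com"},
    {"name": "Jon Smith",  "email": None},
    {"name": "J. Smith",   "email": "jsmith@example.com"},
]

# Two names match when their similarity ratio clears the threshold.
def similar(a, b, threshold=0.7):
    return SequenceMatcher(None, a.lower(), b.lower()).ratio() >= threshold

survivors = []
for c in customers:
    match = next((s for s in survivors if similar(s["name"], c["name"])), None)
    if match:
        # Survivorship rule: keep the first non-empty email found.
        match["email"] = match["email"] or c["email"]
    else:
        survivors.append(dict(c))

print(survivors)  # one consolidated record survives
```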
Data can then be repaired, improving the quality of data in the company and the coordination between systems. Data stewards can be involved in data repair using the Talend Data Stewardship Cloud application. In this introduction, you learned why data integration is important and how your company can benefit from it.
Talend Data Integration is your dedicated tool for a fast response to business needs. Develop and deploy end-to-end data integration jobs faster than hand-coding using the Talend drag-and-drop user interface. Improve collaboration throughout the development lifecycle with a shared repository, versioning, and continuous delivery capabilities.
Use Talend Data Quality tools to profile and cleanse data earlier in the production chain. Centralize your job deployment and your user access management through a web-based administration console. Respond quickly to changing technologies and new business requirements with Talend's flexible integration architecture. Thanks for watching this introduction to data integration.