The (re-)emergence of self-service signals an end to IT-driven data projects. ETL (extract, transform, load) is an important part of today’s business intelligence (BI) because it brings data from disparate sources into one place, where it can be analyzed programmatically to discover business insights. Self-service is here, and the importance of ETL for BI is hard to overstate.

In the past few years, the BI industry has seen a plethora of new analytics applications touting end-user self-service as a prime selling point. This is driven in part by a diversity of new vendors offering products and in part by the emergence of cloud platforms and SaaS solutions as consumers or hosts of business data. These new applications have co-evolved with the cloud and, besides being source-agnostic, leverage new web service APIs to access data beyond the traditional relational or flat-file formats. Plus, cloud-based analytics allows for quick proof-of-concept development because it removes the need for IT-implemented infrastructure. It is not hard to see why these tools have benefits over traditional BI implementations.

Modern Data Platform Benefits 

  • Self-service keeps the data in the hands of those who know it best
  • Ability to merge non-enterprise data for what-if analysis 
  • Easy to implement, with a short learning curve that reduces training and maintenance costs
  • Quick impact on managing the business 
  • Flexibility allows an agile response to changing business needs  

Unfortunately, depending on the size of the company, the complexity of the source data, and the number of disparate data sources, uncontrolled access to data can quickly return a company to a more modern version of the spreadsheet circus.

How so? Results remain siloed in a single department or organization and do not aggregate to an enterprise-wide view. There are multiple definitions of KPIs without a single source of truth. And key tribal knowledge of the source data and the required transformations is concentrated in subject matter experts (SMEs) and is not easily reviewed or communicated.
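
To make the KPI problem concrete, here is a minimal sketch of a shared KPI definition that every department calls instead of re-deriving its own. The `Order` record, the `gross_margin_pct` formula, and the sample figures are all assumptions for illustration, not definitions taken from any particular BI tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Order:
    region: str
    revenue: float
    cost: float


def gross_margin_pct(orders):
    """Shared KPI definition: (revenue - cost) / revenue, as a percentage."""
    revenue = sum(o.revenue for o in orders)
    cost = sum(o.cost for o in orders)
    if revenue == 0:
        return 0.0
    return round(100.0 * (revenue - cost) / revenue, 2)


# Hypothetical sample data held by two departments.
sales_orders = [Order("EMEA", 1000.0, 600.0), Order("EMEA", 500.0, 300.0)]
finance_orders = [Order("APAC", 2000.0, 1500.0)]

# Each department and the enterprise roll-up call the same function,
# so the results aggregate without conflicting definitions.
sales_kpi = gross_margin_pct(sales_orders)
finance_kpi = gross_margin_pct(finance_orders)
enterprise_kpi = gross_margin_pct(sales_orders + finance_orders)
```

Because the formula lives in one place, changing how margin is calculated changes it for every report at once.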

While analytics applications often have sophisticated methods of pulling in diverse data sources, what works for a small proof-of-concept dashboard may not scale for enterprise reporting. Plus, manual methods of accessing data may not automate easily.

Don’t Underestimate the Importance of Enterprise Data Architecture  

Since analytics are the realm of data scientists, the responsibility of making sure the data is available for analysis in a consistent and meaningful way falls to the data engineers. Data engineers guide the decision on which platform(s) to use to shunt, clean, and transform data to a set standard and how these multiple data sources and platforms should interact with one another. This includes decisions on where, and in which application, data corrections and additions should be made, how metadata is defined, and where business rules are encoded.

ETL’s Role in Positive ROI 

Many decisions and implementations can be executed manually or automated via custom code developed by SMEs. Leading ETL tools, by contrast, constrain how transformations are implemented. Some ETL software best practices:

  • All business rules and tribal knowledge (transformations) are available in a transparent manner in a single layer
  • Built-in intelligence in the new generation of tools can detect and handle source and target object changes
  • ETL applications allow for flexible and agile growth of data needs while corralling maintenance and rework effort and object proliferation
  • The graphical drag-and-drop development in ETL applications lowers the programming-language skills required of staff
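
The first two points above can be sketched in a few lines of code. This is only an illustration under stated assumptions: the `rule` decorator, the `TRANSFORMATIONS` registry, the column names, and `detect_schema_drift` are hypothetical names, and the sketch only detects source changes, whereas a commercial tool may also handle them.

```python
# Every business rule is registered in one list, so reviewers can read
# all transformations (and their plain-language descriptions) in one place.
TRANSFORMATIONS = []


def rule(description):
    """Register a transformation together with a plain-language description."""
    def register(func):
        TRANSFORMATIONS.append((description, func))
        return func
    return register


@rule("Trim whitespace and upper-case country codes")
def normalize_country(row):
    return {**row, "country": row["country"].strip().upper()}


@rule("Convert amounts from cents to currency units")
def cents_to_units(row):
    return {**row, "amount": row["amount_cents"] / 100}


# Columns the (hypothetical) source is expected to deliver.
EXPECTED_COLUMNS = {"country", "amount_cents"}


def detect_schema_drift(row):
    """Return (missing, unexpected) column names for a source row."""
    columns = set(row)
    return sorted(EXPECTED_COLUMNS - columns), sorted(columns - EXPECTED_COLUMNS)


def transform(row):
    """Apply every registered rule in order, failing fast on missing columns."""
    missing, _unexpected = detect_schema_drift(row)
    if missing:
        raise ValueError(f"source is missing columns: {missing}")
    for _description, func in TRANSFORMATIONS:
        row = func(row)
    return row


clean = transform({"country": " us ", "amount_cents": 1250})
```

Listing the descriptions in `TRANSFORMATIONS` gives a readable inventory of the encoded business rules, which is the kind of transparency the first bullet describes.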

In the Cloud with ETL

Cloud computing is not only driving growth in analytics platforms. Many new vendors have also appeared in the ETL space, often driven by the need for new connection software for web service APIs. In the past, the standard enterprise ETL tools were confined to a handful of well-known names: SAP’s Data Stream, Informatica, MS SSIS, and IBM’s DataStage. With the emergence of cloud computing, these vendors have started to migrate their toolsets in order to remain viable in the changing market.

The cloud offers modern data platform users more benefits than ever before: more enterprise tools to choose from, at a wider range of price points. There are also additional benefits that only a SaaS solution provides, such as:

  • Removing the need for hardware maintenance and performance tuning as data loads and computing needs increase or decrease
  • Bridging on-premises and legacy applications to cloud applications and platforms
  • Handling connections for up-and-coming web service APIs such as REST
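
As a rough illustration of the last point, data from a REST-style web service usually arrives as nested JSON and must be flattened into rows before it can sit alongside relational data. The payload, its field names, and `flatten_accounts` below are assumptions made up for this sketch; no real service or connector API is implied.

```python
import json

# Hypothetical response body from a REST endpoint; field names are assumptions.
payload = json.loads("""
{
  "accounts": [
    {"id": 1, "name": "Acme",
     "contacts": [{"email": "a@acme.test"}, {"email": "b@acme.test"}]},
    {"id": 2, "name": "Globex", "contacts": []}
  ]
}
""")


def flatten_accounts(doc):
    """Yield one flat row per (account, contact) pair.

    Accounts without contacts still produce one row, with a null email.
    """
    for account in doc["accounts"]:
        contacts = account["contacts"] or [{"email": None}]
        for contact in contacts:
            yield {
                "account_id": account["id"],
                "account_name": account["name"],
                "contact_email": contact["email"],
            }


rows = list(flatten_accounts(payload))
```

The resulting `rows` have a fixed set of columns, so they can be loaded into a table like any flat-file or relational extract.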

It Has Never Been Easier to Implement a Robust Enterprise Architecture

As with any tool, the tool itself is not the final solution to every pain point. One size does not fit all. Be sure to align tool selection and data architecture with business needs. Allow the tool to guide the design based on best practices. Also, leverage the transparency of encoded tribal knowledge with a flexible data architecture and corresponding ETL platform.