Create data service level agreements (SLAs)
The best place to start, especially when choosing your ingestion approach, is to gather use case requirements from your data consumers and work backward to a data SLA that answers questions such as:
- What is the business need?
- What are the expectations for the data, and when does the data need to meet them?
- How will we know when the SLA is met, and what will the response be if the SLA is not met?
As part of this process, identify the challenges each use case poses and plan for them. Then take stock of the source systems available to you and confirm that you know how to extract data from each one.
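Writing the SLA down in a structured, machine-checkable form makes the question "How will we know when the SLA is met?" easy to answer automatically. The Python sketch below is a minimal illustration that assumes freshness (how old the latest load may be) is the metric that matters. The `DataSLA` class, its fields, and `sla_met` are hypothetical names, not part of any particular tool.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass(frozen=True)
class DataSLA:
    """Hypothetical SLA record: what the data is for, how fresh it must be, and the breach response."""
    table: str
    business_need: str
    max_staleness: timedelta  # the data must be no older than this when consumers read it
    on_breach: str            # agreed response when the SLA is missed


def sla_met(sla: DataSLA, last_loaded_at: datetime, now: Optional[datetime] = None) -> bool:
    """Return True if the most recent successful load falls within the SLA's freshness window."""
    if now is None:
        now = datetime.now(timezone.utc)
    return now - last_loaded_at <= sla.max_staleness


# Example: a finance dashboard that needs order data no more than six hours old.
orders_sla = DataSLA(
    table="orders",
    business_need="Daily revenue dashboard for finance",
    max_staleness=timedelta(hours=6),
    on_breach="Alert the data on-call engineer and mark the dashboard as stale",
)
now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
print(sla_met(orders_sla, datetime(2024, 1, 1, 8, 0, tzinfo=timezone.utc), now))  # True: 4 hours old
print(sla_met(orders_sla, datetime(2024, 1, 1, 5, 0, tzinfo=timezone.utc), now))  # False: 7 hours old
```

A real SLA might also cover completeness, accuracy, or schema expectations; freshness is used here only because it is simple to express.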
Automated data ingestion
As data grows in volume and complexity, relying on manual ingestion to curate large amounts of unstructured data no longer scales. Automated data ingestion tools save time, boost productivity, and eliminate many of the manual steps in the ingestion process.
Automation also brings architectural consistency, consolidated management, safety, and systematic error handling, all of which help reduce data processing time.
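To make the error-handling point concrete, here is a minimal Python sketch of one automated ingestion cycle that retries failures with exponential backoff and logs each attempt. The `extract` and `load` callables, the retry count, and the backoff values are illustrative assumptions; a production pipeline might instead delegate scheduling and retries to its orchestrator.

```python
import logging
import time
from typing import Callable, Iterable, List

logger = logging.getLogger("ingest")


def ingest_with_retries(
    extract: Callable[[], Iterable[dict]],
    load: Callable[[List[dict]], None],
    max_attempts: int = 3,
    backoff_seconds: float = 1.0,
) -> int:
    """Run one extract-and-load cycle, retrying failures with exponential backoff.

    `extract` and `load` are hypothetical stand-ins for a source connector and a
    warehouse writer. Returns the number of records loaded.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            records = list(extract())
            load(records)
            logger.info("Loaded %d records on attempt %d", len(records), attempt)
            return len(records)
        except Exception:
            logger.warning("Ingestion attempt %d of %d failed", attempt, max_attempts, exc_info=True)
            if attempt == max_attempts:
                raise  # surface the final failure so the scheduler can alert
            time.sleep(backoff_seconds * 2 ** (attempt - 1))  # wait 1x, 2x, 4x, ...
    raise AssertionError("unreachable")


# Example: a source that fails once with a transient error, then succeeds.
calls = {"n": 0}


def flaky_extract():
    calls["n"] += 1
    if calls["n"] == 1:
        raise ConnectionError("source temporarily unavailable")
    return [{"id": 1}, {"id": 2}]


loaded: List[dict] = []
count = ingest_with_retries(flaky_extract, loaded.extend, backoff_seconds=0.01)
print(count)  # 2
```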
Execute data quality checks at time of ingest—but do so carefully
Ingestion is the best point at which to catch a data quality problem. Although no scalable approach can test for every possible form of data corruption across a pipeline, some organizations implement data circuit breakers that halt ingestion when data fails specific quality checks. There are inherent tradeoffs, however: set your quality thresholds too high and you may needlessly block access to data; set them too low and you risk compromising your entire data warehouse.
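A data circuit breaker of the kind described above can be sketched in a few lines of Python. This is a hypothetical illustration, not a specific product's API: `check_batch` runs two assumed checks (a minimum row count and a maximum null fraction per required column), and `circuit_breaker` raises `DataQualityError` to stop ingestion when any check fails. The thresholds are the knobs behind the tradeoff just described.

```python
from dataclasses import dataclass
from typing import Any, Dict, List


class DataQualityError(Exception):
    """Raised when a batch trips the circuit breaker, halting ingestion."""


@dataclass
class QualityThresholds:
    min_rows: int = 1                # an empty batch often signals an upstream failure
    max_null_fraction: float = 0.05  # tolerate up to 5% nulls in each required column


def check_batch(rows: List[Dict[str, Any]], required: List[str], t: QualityThresholds) -> List[str]:
    """Return human-readable failures; an empty list means the batch passed."""
    if len(rows) < t.min_rows:
        return [f"row count {len(rows)} < {t.min_rows}"]
    if not rows:
        return []  # an empty batch was explicitly allowed (min_rows=0)
    failures = []
    for col in required:
        null_fraction = sum(1 for r in rows if r.get(col) is None) / len(rows)
        if null_fraction > t.max_null_fraction:
            failures.append(f"{col}: {null_fraction:.0%} nulls > {t.max_null_fraction:.0%}")
    return failures


def circuit_breaker(rows: List[Dict[str, Any]], required: List[str], t: QualityThresholds) -> List[Dict[str, Any]]:
    """Pass the batch through unchanged, or raise to stop ingestion if any check fails."""
    failures = check_batch(rows, required, t)
    if failures:
        raise DataQualityError("; ".join(failures))
    return rows


# Example: one of four rows is missing `amount` (25% nulls).
batch = [
    {"order_id": 1, "amount": 10.0},
    {"order_id": 2, "amount": None},
    {"order_id": 3, "amount": 5.0},
    {"order_id": 4, "amount": 7.5},
]
try:
    circuit_breaker(batch, ["order_id", "amount"], QualityThresholds())
except DataQualityError as err:
    print(f"Ingestion halted: {err}")  # Ingestion halted: amount: 25% nulls > 5%

lenient = QualityThresholds(max_null_fraction=0.30)
print(len(circuit_breaker(batch, ["order_id", "amount"], lenient)))  # 4
```

Running the same batch under both settings shows the tradeoff: the strict default blocks the load over a single missing value, while the lenient setting lets it through at the cost of admitting incomplete rows.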
Aim for a balance in how you deploy circuit breakers, and use data visualization and observability to detect data quality issues early in the process so you can resolve them before they spread.
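Observability often starts with simple signals such as batch volume. As a minimal sketch, assuming a table's historical row counts are available, the hypothetical `is_volume_anomaly` function below uses a z-score to flag a load whose row count sits far outside its recent range. The 3.0 threshold is an assumed default, and a production monitor may also need to account for trends and seasonality.

```python
from statistics import mean, stdev
from typing import Sequence


def is_volume_anomaly(history: Sequence[int], latest: int, z_threshold: float = 3.0) -> bool:
    """Flag `latest` if it lies more than `z_threshold` standard deviations from the historical mean."""
    if len(history) < 2:
        return False  # too little history to estimate normal variation
    mu = mean(history)
    sigma = stdev(history)  # sample standard deviation
    if sigma == 0:
        return latest != mu  # perfectly stable history: any change is worth a look
    return abs(latest - mu) / sigma > z_threshold


# Example: daily row counts for one table over the past five loads.
daily_rows = [10_000, 10_250, 9_900, 10_100, 10_050]
print(is_volume_anomaly(daily_rows, 10_120))  # False: within normal variation
print(is_volume_anomaly(daily_rows, 4_000))   # True: a sudden drop in volume
```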