
The client is a supplier of technology solutions in the automotive industry. For years they had been buying electronic parts catalogs directly from manufacturers, spending $5,000 to $50,000 per month on each catalog. The goal was to build a solution that automatically generates and updates the catalogs of five manufacturers from their source data.
Each manufacturer stored its source data in a different format: text files, XML, Excel, databases (.MDF), and even EBCDIC-encoded data (EBCDIC is a character encoding that dates back to IBM System/360 mainframes of the 1960s). Building unified ETL pipelines to handle all of these was the core engineering challenge.
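To illustrate what "unified" can mean here, the Python sketch below maps three of the formats onto one record type. The `Part` record, its fields, the tab-delimited text layout, the XML element names, and the fixed-width EBCDIC layout in code page 037 (`cp037`, Python's built-in codec for EBCDIC US/Canada) are all hypothetical, since the actual manufacturer layouts are not described. Excel and .MDF sources are left out because reading them requires third-party drivers.

```python
import csv
import io
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass(frozen=True)
class Part:
    """Hypothetical common record; the real catalog schema is not described."""
    manufacturer: str
    part_number: str
    description: str


def parse_text(data: str, manufacturer: str) -> list[Part]:
    # Assumes a tab-delimited export with a header row: part_number, description.
    reader = csv.DictReader(io.StringIO(data), delimiter="\t")
    return [Part(manufacturer, row["part_number"].strip(), row["description"].strip())
            for row in reader]


def parse_xml(data: str, manufacturer: str) -> list[Part]:
    # Assumes a hypothetical layout: <parts><part number="..."><desc>...</desc></part></parts>.
    root = ET.fromstring(data)
    return [Part(manufacturer, p.get("number", "").strip(), (p.findtext("desc") or "").strip())
            for p in root.iter("part")]


def parse_ebcdic(data: bytes, manufacturer: str, record_len: int = 40) -> list[Part]:
    # Assumes fixed-length records in code page 037:
    # bytes 0-9 hold the part number, bytes 10-39 the description.
    parts = []
    for offset in range(0, len(data), record_len):
        record = data[offset:offset + record_len].decode("cp037")
        parts.append(Part(manufacturer, record[:10].strip(), record[10:].strip()))
    return parts


if __name__ == "__main__":
    # Build one EBCDIC record by encoding padded text with the same code page.
    raw = ("AB-1234".ljust(10) + "Brake pad, front".ljust(30)).encode("cp037")
    print(parse_ebcdic(raw, "maker_e"))
```

Whatever the input format, every parser returns the same list of `Part` records, so the downstream transform and load steps only deal with one shape.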
Strict performance requirements called for API response times within 300 ms, which required optimization across the entire data pipeline.
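One way to make a latency target like this testable is to time repeated calls and compare a high percentile against the budget. This is a minimal sketch: treating 300 ms as a 95th-percentile target, and the helper names `measure_ms`, `p95`, and `within_budget`, are assumptions rather than details from the project.

```python
import statistics
import time
from typing import Callable

# The 300 ms figure comes from the requirement; using it as a p95 target is an assumption.
LATENCY_BUDGET_MS = 300.0


def measure_ms(fn: Callable[[], object], runs: int = 100) -> list[float]:
    """Call fn repeatedly and return each call's wall-clock time in milliseconds."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000.0)
    return samples


def p95(samples_ms: list[float]) -> float:
    """95th percentile: the last of the 19 cut points that split the data into 20 groups."""
    return statistics.quantiles(samples_ms, n=20, method="inclusive")[-1]


def within_budget(samples_ms: list[float], budget_ms: float = LATENCY_BUDGET_MS) -> bool:
    """True if the 95th-percentile latency does not exceed the budget."""
    return p95(samples_ms) <= budget_ms
```

In practice the callable passed to `measure_ms` would wrap whichever API request is under test, for example inside an automated performance test.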
We built ETL pipelines that transform diverse source data and enable a single set of APIs to query all catalogs.
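A minimal sketch of the "single set of APIs" idea, assuming every manufacturer's data has already been loaded into one table keyed by manufacturer and part number. An in-memory SQLite database stands in for the real store, whose engine and schema are not described; the `parts` table, `build_store`, and `find_part` are hypothetical names.

```python
import sqlite3
from typing import Iterable, Optional


def build_store(rows: Iterable[tuple[str, str, str]]) -> sqlite3.Connection:
    """Load (manufacturer, part_number, description) rows from all catalogs into one table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """CREATE TABLE parts (
               manufacturer TEXT NOT NULL,
               part_number  TEXT NOT NULL,
               description  TEXT,
               PRIMARY KEY (manufacturer, part_number))"""
    )
    # Secondary index so lookups by part number alone do not scan the whole table.
    conn.execute("CREATE INDEX idx_parts_number ON parts (part_number)")
    conn.executemany("INSERT INTO parts VALUES (?, ?, ?)", rows)
    return conn


def find_part(conn: sqlite3.Connection, part_number: str,
              manufacturer: Optional[str] = None) -> list[tuple[str, str, str]]:
    """One query path for every catalog; manufacturer is an optional filter."""
    if manufacturer is None:
        cur = conn.execute(
            "SELECT manufacturer, part_number, description FROM parts "
            "WHERE part_number = ? ORDER BY manufacturer",
            (part_number,),
        )
    else:
        cur = conn.execute(
            "SELECT manufacturer, part_number, description FROM parts "
            "WHERE manufacturer = ? AND part_number = ?",
            (manufacturer, part_number),
        )
    return cur.fetchall()
```

The composite primary key serves lookups scoped to one manufacturer, and the extra index on `part_number` serves cross-catalog lookups; indexing each lookup path is one common way to keep queries inside a tight response-time budget.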
The pipeline runs in three stages:
The ETL runs monthly for each manufacturer, with processing times of 1 to 8 hours. TeamCity runs integrity checks, autotests, and catalog builds from full data sets. SpecFlow covers all ETL and API autotests. The largest deliverable database is 150 GB.
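Integrity checks of the kind TeamCity runs before a catalog build could look like the Python sketch below. The specific rules (unique manufacturer/part-number keys, no empty part numbers, and no more than a 10% drop in row count since the previous monthly run) and the `integrity_errors` name are hypothetical; the project's actual checks are not listed. Returning a list of messages lets a CI step print all of them and then exit non-zero, which TeamCity treats as a failed build step by default.

```python
from collections import Counter
from typing import Optional

# Each row is (manufacturer, part_number, description); this shape is hypothetical.
Row = tuple[str, str, str]


def integrity_errors(rows: list[Row], previous_count: Optional[int] = None,
                     max_drop: float = 0.10) -> list[str]:
    """Return human-readable problems; an empty list means the data set passed."""
    errors = []

    # Rule 1: each (manufacturer, part_number) key must appear exactly once.
    key_counts = Counter((m, p) for m, p, _ in rows)
    dupes = sorted(k for k, c in key_counts.items() if c > 1)
    if dupes:
        errors.append(f"duplicate keys: {dupes}")

    # Rule 2: part numbers must not be blank.
    blanks = sum(1 for _, p, _ in rows if not p.strip())
    if blanks:
        errors.append(f"{blanks} rows with empty part_number")

    # Rule 3: a sharp drop versus last month's run usually signals a truncated source file.
    if previous_count and len(rows) < previous_count * (1 - max_drop):
        errors.append(f"row count fell from {previous_count} to {len(rows)}")

    return errors
```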
