Skip to main content

Data integration with apps

Problem statement

Let's assume you are a Product Manager in a fintech organization. You are speaking to one of your clients and you request general ledger and sales data from your client. You need to feed this data into your product.

Your client specifies the data is in Netsuite and you are free to take it from there. However before commiting to any timelines, you need to have answers on the data, technology and the business use case.

The following steps document the steps that you will need to take to accomplish the integration:

Integration outline

The integration activity can be split into the following sections:

  • Data discovery - Get the entities from the source that will constitue general ledger and sales data.
  • Data pipelines - Fetching data from the source system
  • Data integration - Data modelling/transformation to work with your product.

1. Data discovery

[Ownership - Product Manager and Business Analyst]

In data discovery you identify the data elements that will constitute the data you require.

Netsuite exposes 800+ data entities.

Can you identify the tables required to get general ledger and sales data?

In order to work through the Netsuite records, you need to answer the following:

  • Are you familiar with the application or will you need to ask your customer to bring in a subject matter expert or external consultant for data discovery or other product related questions?
  • Which entities or API or DB tables will constitute complete GL Details and Sales ledger.
  • Can you define the unique records in the above dataset? This is used to make sure to dedupe data to keep unique records.
  • What should be the frequency of the ingesting the data and if the source application supports the desired data refresh frequency?
tip

Always preferable to ask customer for sample data from production.

Helps tremendously with data discovery and validation.

2. Data pipelines

[Ownership - Product Manager, Data Engineering Team]

In this stage we need to gather requirements for writing the data pipelines to fetch the data from the source app. The tech team will need the answers to the following to get started:

  • Which version of Netsuite/SAP/Macola is being used? Is the version cloud-based or on-prem?
  • How is the data exposed via the application - APIs, databases, emails, or via the SFTP folders?
  • Does the customer have the right licenses to share the data via one of the above-defined mediums?
  • The entities/tables/APIs in SAP will constitute sales data and general ledger?
  • How often do you want to sync the data from the application to your system? Every 5 mins or 24 hours?
  • Do you have any estimate of the data volume?
  • How will you define the unique records when fetching the data so that we don't duplicate the records?
  • Once ingested how will you verify/reconcile the data against the source application?
  • How much data do you want to retain in the system and does the customer have any security/data compliance requirements?
tip

Here's a simple flowchart you can use to get basic answers about data ingestion.

Integration options

** #1,#2,.... - our preference of the medium through which data is exposed.

info

Pros and Cons of data exposed via different mediums listed in this article.

3. Data integration / modelling / transformation

[Ownership - Product Manager, App Software Team, Data Engineering Team]

With stages #1, #2 we have identified and decided what data we want to fetch and how will we fetch it.

In this stage we need to transform data so that it has the datastructure and metrics as required by your application.

For e.g You might want to club the finance data (general ledger) with the ops data and then push the merged data into your application from where your users can consume the ops+finance data.

The business specific transformations will be carried out in this stage.

  • The Data Engineering Team will help you with merging the finance and the ops data.
  • The App Software Team will pick up the merged data and push it into the application.

Integration hygiene

While the integration is running, you also have make sure that the integration remains in a healthy state. Few challenges that you might encounter once the integration is live:

  1. Disconnect between the Data Engineering and the app software team - If the data pipelines fail, the app won't have the latest data. The app and the DE team need to be in sync wrt the health of the data pipelines.

  2. Data validation/reconciliation between source application and your application - Let's assume there is some issue with the data duplication in your data pipelines. The data pipelines continue to work but the data in the source and destination will diverge very quickly. Better to have checks in place to catch these errors before your customer reports these.

  3. Data source goes down

  4. Data source changes the schema of the data fetched from the APIs

ELT Data

ELT Data is a data ingestion tool (#2) and helps you reduce the time to go live with built in automations for data ingestion.