
Dedupe and Scheduling

Pipeline setup is a three-step process:

  • Ingestion configuration
  • Dedupe and Scheduling
  • Destination setup

After setting up the ingestion details, we define the pipeline schedule and the data dedupe strategy.

Schedule

Select the schedule at which you want to run the pipeline.

You can create, edit, and view schedules on the "Schedules" tab. You can also view all the pipelines associated with a given schedule.

Tip: Schedules are time zone aware and adjust automatically for daylight saving time.

Dedupe

Next, select the Dedupe Strategy. This defines how the data is synchronized between the source and the destination. The different strategies and their details are listed here.

For incremental dedupe strategies the following metadata is required:

  • The unique keys
  • The sort/cursor keys

JSON Paths

JSON Path is a query language for JSON. ELT Data lets you define the JSON Path to specify the required JSON fields.

A JSONPath expression specifies a path to an element (or a set of elements) in a JSON structure. Paths can use the dot notation:

$.store.book[0].title

Note that a dot precedes a property name only when that name is not written in brackets.

Example

{
  "store": {
    "book": [
      {
        "category": "reference",
        "author": "Nigel Rees",
        "title": "Sayings of the Century",
        "price": 8.95
      },
      {
        "category": "fiction",
        "author": "Evelyn Waugh",
        "title": "Sword of Honour",
        "price": 12.99
      },
      {
        "category": "fiction",
        "author": "Herman Melville",
        "title": "Moby Dick",
        "isbn": "0-553-21311-3",
        "price": 8.99
      },
      {
        "category": "fiction",
        "author": "J. R. R. Tolkien",
        "title": "The Lord of the Rings",
        "isbn": "0-395-19395-8",
        "price": 22.99
      }
    ],
    "bicycle": {
      "color": "red",
      "price": 19.95
    }
  },
  "expensive": 10
}

JsonPath                   Result
$.store.book[*].author     The authors of all books
$..author                  All authors
$.store.*                  All things, both books and bicycles
$.store.book[*].price      The price of all the books
$..book[2]                 The third book
$..book.length()           The number of books
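
To make the path semantics concrete, here is a minimal Python sketch of how a small subset of JSONPath could be evaluated. It handles only dot notation, numeric indexes such as [0], and the [*] wildcard; recursive descent (..) and functions such as length() are deliberately left out. The select function is a hypothetical helper written for illustration and is not how ELT Data evaluates paths.

```python
import re

# One path step: ".name", "[index]" or "[*]".
_STEP = re.compile(r"\.([A-Za-z_][A-Za-z0-9_]*)|\[(\d+)\]|\[(\*)\]")


def select(doc, path):
    """Evaluate a small JSONPath subset ($.a.b, [n], [*]) and return all matches."""
    if not path.startswith("$"):
        raise ValueError("path must start with '$'")
    rest = path[1:]
    steps = []
    pos = 0
    while pos < len(rest):
        match = _STEP.match(rest, pos)
        if match is None:
            raise ValueError(f"unsupported path syntax at: {rest[pos:]!r}")
        steps.append(match.groups())
        pos = match.end()

    current = [doc]  # the set of nodes matched so far
    for name, index, _star in steps:
        matched = []
        for node in current:
            if name is not None:
                # Property access: only dicts that contain the key match.
                if isinstance(node, dict) and name in node:
                    matched.append(node[name])
            elif index is not None:
                # Array index: only lists that are long enough match.
                if isinstance(node, list) and int(index) < len(node):
                    matched.append(node[int(index)])
            else:
                # Wildcard: every list element or every dict value.
                if isinstance(node, list):
                    matched.extend(node)
                elif isinstance(node, dict):
                    matched.extend(node.values())
        current = matched
    return current


# A shortened copy of the store document from the example above.
doc = {"store": {"book": [
    {"author": "Nigel Rees", "title": "Sayings of the Century"},
    {"author": "Evelyn Waugh", "title": "Sword of Honour"},
]}}

first_title = select(doc, "$.store.book[0].title")
all_authors = select(doc, "$.store.book[*].author")
print(first_title)
print(all_authors)
```

A path always yields a list of matches, which is why a wildcard path such as $.store.book[*].author returns one value per book.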

Dedupe Keys/Unique Keys Json Paths

Unique keys are required to identify a unique record. This information is used to decide whether a record needs to be updated or inserted. Unique keys are defined using JSON Paths. If a record is identified by a composite key, you can define multiple paths.

In the example above, if the category and author together identify a book, the unique keys can be defined as the combination of:

$.store.book[*].category

$.store.book[*].author

ELT Data derives the unique keys from these JSON Paths and uses them to run the incremental dedupe.

The UI provides a JSON path select editor as well as a manual input box.
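
The insert-or-update decision driven by a composite unique key can be sketched in Python as follows. This is an illustration under stated assumptions, not ELT Data's implementation: the destination is modelled as an in-memory dict, the upsert and unique_key helpers are hypothetical, and the changed price in the second load is invented for the example.

```python
def unique_key(record, key_fields):
    """Build the composite key for a record, e.g. (category, author)."""
    return tuple(record.get(field) for field in key_fields)


def upsert(destination, incoming, key_fields):
    """Insert new records and overwrite existing ones; return the actions taken."""
    actions = []
    for record in incoming:
        key = unique_key(record, key_fields)
        action = "update" if key in destination else "insert"
        actions.append((action, key))
        destination[key] = record
    return actions


# Hypothetical destination, keyed by the composite unique key.
destination = {}
key_fields = ["category", "author"]

first_load = [
    {"category": "reference", "author": "Nigel Rees", "price": 8.95},
    {"category": "fiction", "author": "Evelyn Waugh", "price": 12.99},
]
# Illustrative second run: one changed record and one new record.
second_load = [
    {"category": "fiction", "author": "Evelyn Waugh", "price": 10.99},
    {"category": "fiction", "author": "Herman Melville", "price": 8.99},
]

first_actions = upsert(destination, first_load, key_fields)
second_actions = upsert(destination, second_load, key_fields)
print(first_actions)
print(second_actions)
```

In this sketch the second run updates the Evelyn Waugh record, because its key already exists, and inserts the Herman Melville record.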

Sort Keys Json Paths

Sort keys are required to identify the latest record when duplicate records are found.

Sort keys are also defined using JSON Paths. You can define multiple sort keys, and the data is sorted by the specified keys.
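
The role of sort keys can be sketched in Python as below. Several points are assumptions made for illustration: the updated_at and version fields do not appear in the example document, the latest_per_key helper is hypothetical, "latest" is taken to mean the highest sort-key value, and multiple sort keys are compared in the order they are listed. ELT Data's actual ordering rules may differ.

```python
def latest_per_key(records, key_fields, sort_fields):
    """Keep one record per unique key: the one with the highest sort-key tuple."""
    latest = {}
    for record in records:
        key = tuple(record[field] for field in key_fields)
        rank = tuple(record[field] for field in sort_fields)
        # Tuples compare element by element, so the first sort key dominates
        # and later sort keys only break ties.
        if key not in latest or rank > latest[key][0]:
            latest[key] = (rank, record)
    return [record for _rank, record in latest.values()]


# Hypothetical batch containing two versions of the same book.
records = [
    {"category": "fiction", "author": "Evelyn Waugh",
     "updated_at": "2024-01-01", "version": 1, "price": 12.99},
    {"category": "fiction", "author": "Evelyn Waugh",
     "updated_at": "2024-01-01", "version": 2, "price": 11.99},
    {"category": "reference", "author": "Nigel Rees",
     "updated_at": "2023-12-31", "version": 1, "price": 8.95},
]

deduped = latest_per_key(
    records,
    key_fields=["category", "author"],
    sort_fields=["updated_at", "version"],
)
print(deduped)
```

Here the two Evelyn Waugh records share the same updated_at value, so the second sort key, version, decides which one is kept.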