High Watermarks For Incremental Models in dbt

For the last few months it has all been dbt. dbt is a transform and load tool provided by Fishtown Analytics. Anyone who has created incremental models in dbt will have noticed how simply and easily it drives the workload. The implementation of the incremental workload changes depending on the target datastore. That said, the question is: should an incremental model use a high watermark as part of its implementation?

The best place to start this investigation is how incremental models work behind the scenes. When that is not obvious, the next best place is the log: run a test incremental model and read the implementation from what dbt logged.

The following are the internal steps dbt follows for a datastore that does not support merge statements, as observed in the dbt log.

- First, it copies all the data produced by the incremental execution into a temp table.
- It then deletes the rows from the base table whose unique key appears in the temp table (the temp table from the previous step).
- Finally, it inserts all the data from the temp table into the base table.
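
The three steps above can be sketched outside dbt to make the mechanics concrete. The following is only an illustration, assuming an in-memory SQLite database; the table names base_table and tmp_incremental and the columns id and value are hypothetical and do not come from the dbt log.

```python
import sqlite3


def delete_insert(conn, new_rows):
    """Apply new_rows (id, value) to base_table using delete + insert."""
    cur = conn.cursor()
    # Step 1: stage the rows produced by the incremental run in a temp table.
    cur.execute("CREATE TEMP TABLE tmp_incremental (id INTEGER, value TEXT)")
    cur.executemany("INSERT INTO tmp_incremental VALUES (?, ?)", new_rows)
    # Step 2: delete base rows whose unique key appears in the temp table.
    cur.execute(
        "DELETE FROM base_table WHERE id IN (SELECT id FROM tmp_incremental)"
    )
    # Step 3: insert every staged row into the base table.
    cur.execute("INSERT INTO base_table SELECT id, value FROM tmp_incremental")
    cur.execute("DROP TABLE tmp_incremental")
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE base_table (id INTEGER, value TEXT)")
conn.executemany(
    "INSERT INTO base_table VALUES (?, ?)", [(1, "old"), (2, "old")]
)
conn.commit()

# Row 2 already exists and is replaced; row 3 is new and is simply added.
delete_insert(conn, [(2, "new"), (3, "new")])
result = conn.execute("SELECT id, value FROM base_table ORDER BY id").fetchall()
print(result)
```

Note that the cost of step 2 grows with the number of staged rows, which is why limiting what reaches the temp table matters.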

The highlights
- As a best practice, always use a high watermark when modeling an incremental load. This optimizes the workflow by limiting the amount of data it has to merge, or delete and insert.
- It also allows us to run an incremental load with the models backdated by N days. I think this is a great flexibility (in disguise) to have for systems with upstream data delays.
- Always have a unique_key in the model. I won’t go to the length of describing the importance of following best practices for a model. But from an optimisation perspective, I would always add an index on the unique_key, which optimizes the internal implementation of an incremental model.
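
A rough sketch of the high watermark idea with an N-day backdate follows, again assuming SQLite in place of a real warehouse. The tables source and target, the column loaded_on, the index name, and the lookback_days parameter are all hypothetical. In a dbt model the equivalent filter would normally sit inside an is_incremental() block and compare against the maximum value already in {{ this }}.

```python
import sqlite3
from datetime import date, timedelta


def incremental_rows(conn, lookback_days=0):
    """Return source rows newer than the target's high watermark,
    backdated by lookback_days to pick up late-arriving data."""
    # High watermark: the latest load date already present in the target.
    (max_loaded,) = conn.execute("SELECT MAX(loaded_on) FROM target").fetchone()
    if max_loaded is None:
        # Empty target means a first run, so take everything.
        return conn.execute(
            "SELECT id, loaded_on FROM source ORDER BY id"
        ).fetchall()
    cutoff = date.fromisoformat(max_loaded) - timedelta(days=lookback_days)
    return conn.execute(
        "SELECT id, loaded_on FROM source WHERE loaded_on > ? ORDER BY id",
        (cutoff.isoformat(),),
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE source (id INTEGER, loaded_on TEXT)")
conn.execute("CREATE TABLE target (id INTEGER, loaded_on TEXT)")
# Index on the unique key so the delete step can locate matching rows quickly.
conn.execute("CREATE UNIQUE INDEX idx_target_id ON target (id)")

loaded = [(1, "2022-01-01"), (2, "2022-01-03"), (3, "2022-01-05")]
conn.executemany("INSERT INTO target VALUES (?, ?)", loaded)
# Row 4 arrived late upstream: its date is older than the current watermark.
conn.executemany("INSERT INTO source VALUES (?, ?)", loaded + [(4, "2022-01-04")])
conn.commit()

strict = incremental_rows(conn)                      # watermark only
backdated = incremental_rows(conn, lookback_days=2)  # watermark minus 2 days
print(strict, backdated)
```

With the strict watermark the late row 4 is never picked up. Backdating by two days selects it, along with row 3, which the delete and insert steps then replace by its unique key.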

