For the last few months it has all been dbt. dbt is a transform and load tool provided by Fishtown Analytics. Anyone who has created incremental models in dbt will have found how simple and easy it makes the workload. The implementation of the incremental model workload changes depending on the target datastore. With all that said, the question is: should an incremental model use a high-watermark as part of its implementation?

How incremental models work behind the scenes is the best place to start this investigation. When that is not obvious, the next best place is the log: run a test incremental model, then read the log to find the implementation.

The following internal steps, observed in the dbt log, are used for a datastore that does not support merge statements.

- As the first step, it copies all the data from the incremental execution into a temp table.
- It then deletes all the data from the base table th
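
The temp-table-then-delete pattern described above can be sketched roughly as follows. This is only an illustration and not dbt's actual generated SQL: it uses Python's built-in sqlite3 as a stand-in datastore, the names `base_table`, `base_table__tmp`, `id`, `val` and `run_incremental` are all hypothetical, and both the delete condition (match on a key column) and the final insert from the temp table are assumptions, since the step list above is cut off.

```python
import sqlite3


def run_incremental(conn, new_rows):
    """Apply new_rows to base_table via temp table, delete, then insert."""
    cur = conn.cursor()
    # Step 1: stage the rows produced by this incremental run in a temp table.
    cur.execute("CREATE TEMP TABLE base_table__tmp (id INTEGER, val TEXT)")
    cur.executemany("INSERT INTO base_table__tmp VALUES (?, ?)", new_rows)
    # Step 2 (assumed condition): delete base rows whose key is in the temp table.
    cur.execute(
        "DELETE FROM base_table WHERE id IN (SELECT id FROM base_table__tmp)"
    )
    # Step 3 (assumed): insert the staged rows into the base table.
    cur.execute("INSERT INTO base_table SELECT id, val FROM base_table__tmp")
    cur.execute("DROP TABLE base_table__tmp")
    conn.commit()


# Hypothetical starting state: two rows already loaded by an earlier run.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE base_table (id INTEGER, val TEXT)")
conn.executemany(
    "INSERT INTO base_table VALUES (?, ?)", [(1, "old"), (2, "old")]
)
conn.commit()

# The incremental run brings one changed row (id 2) and one new row (id 3).
run_incremental(conn, [(2, "new"), (3, "new")])
rows = conn.execute("SELECT id, val FROM base_table ORDER BY id").fetchall()
print(rows)  # [(1, 'old'), (2, 'new'), (3, 'new')]
```

In this sketch, the row with id 1 is untouched, id 2 is replaced and id 3 is added, so the same key never ends up in the base table twice.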

I have been an agent of change in my pod, and the staff I have been waving is Postgres :). We have been working on a new project, and its OLTP data store is Postgres. Of the many things I encountered this week, two really topped the list.

- Most OLTP workloads are random IO: Postgres heap vs. a cluster key that sorts the data.
- Declarative partitioning is the way forward for keeping data sets manageable, and it assists with maintenance tasks.

The code snippet below provides details of the partition child table properties.

SELECT nmsp_parent.nspname AS parent_schema,
       parent.relname AS parent,
       nmsp_child.nspname AS child_schema,
       child.relname AS child,
       pg_get_expr(child.relpartbound, child.oid, true) AS child_expression,
       child.reloptions
FROM pg_inherits
JOIN pg_class parent ON pg_inherits.inhparent = parent.oid
JOIN pg_class child ON pg_inherits.inhrelid = child.oid
JOIN pg_namespace nmsp_pa