Featured image of post Generating dbt Models using RosettaDB

Generating dbt Models using RosettaDB

RosettaDB is a powerful tool for managing database schemas, enabling transformation and management of database objects across different databases. Combined with dbt (Data Build Tool), it allows you to generate dbt models directly from your database schema, creating structured, ready-to-use datasets for analysis. This guide demonstrates how to generate dbt models using RosettaDB from your database schema.

# Prerequisites

Download JDBC drivers for your databases, install RosettaDB from the releases page, and refer to the Quick Start Guide for setup instructions.

# Setting Up RosettaDB

# 1. Initialize a New RosettaDB Project

To create a new project, use:

rosetta init dbt_postgres_project

This will create a project directory with a main.conf file for defining your database connections.

# 2. Configure Database Connections

Edit the main.conf file to define your database connection. Here’s an example configuration for a general database connection:

connections:
  - name: postgres_conn
    databaseName: analysis_db
    dbType: postgres
    url: jdbc:postgresql://localhost:5432/analysis_db
    userName: username
    password: password

# 3. Extract DBML Models

Run the rosetta extract command to generate DBML models from your database schema.

rosetta extract -s postgres_conn

Now that you have the DBML models, you can proceed to generate dbt models.

# Generating dbt Models

Use the rosetta dbt command to convert your extracted DBML into dbt models:

rosetta dbt -s postgres_conn

This command will produce dbt models based on the extracted schema, ready to integrate into your dbt project.

# Example dbt Model Output

Here’s an example of what the generated dbt models might look like, covering multiple tables for better context.

**model.yaml**

version: 2
sources:
  - name: Retail_Analysis
    description: "Data source for retail analysis"
    tables:
      - name: sales_transactions
        columns:
          - name: transaction_id
            tests:
              - not_null
              - unique
          - name: product_id
            tests:
              - not_null
          - name: customer_id
            tests:
              - not_null
          - name: transaction_date
            tests:
              - not_null
          - name: amount
            tests: []
      - name: products
        columns:
          - name: product_id
            tests:
              - not_null
              - unique
          - name: product_name
            tests: []
          - name: category
            tests: []
          - name: price
            tests: []
      - name: customers
        columns:
          - name: customer_id
            tests:
              - not_null
              - unique
          - name: first_name
            tests: []
          - name: last_name
            tests: []
          - name: email
            tests:
              - unique
          - name: registration_date
            tests: []

Example Models

1. Sales Transactions Model

with sales_transactions as (
    select
        transaction_id,
        product_id,
        customer_id,
        transaction_date,
        amount
    from {{ source('Retail_Analysis', 'sales_transactions') }}
)
select * from sales_transactions

2. Products Model

with products as (
    select
        product_id,
        product_name,
        category,
        price
    from {{ source('Retail_Analysis', 'products') }}
)
select * from products

3. Customers Model

with customers as (
    select
        customer_id,
        first_name,
        last_name,
        email,
        registration_date
    from {{ source('Retail_Analysis', 'customers') }}
)
select * from customers

Summary

By following these steps, you can effectively generate dbt models from your PostgreSQL schema using RosettaDB. With these models, you can run transformations, apply tests, and document your data workflow, enhancing data quality and usability. This process not only streamlines your data management but also aligns with best practices in data analytics.

For more details, check the official RosettaDB documentation or reach out to the community if you need further assistance.