RosettaDB is a powerful tool for managing database schemas, enabling transformation and management of database objects across different databases. Combined with dbt (Data Build Tool), it allows you to generate dbt models directly from your database schema, creating structured, ready-to-use datasets for analysis. This guide demonstrates how to generate dbt models using RosettaDB from your database schema.
# Prerequisites
Download JDBC drivers for your databases, install RosettaDB from the releases page, and refer to the Quick Start Guide for setup instructions.
# Setting Up RosettaDB
# 1. Initialize a New RosettaDB Project
To create a new project, use:
rosetta init dbt_postgres_project
This will create a project directory with a main.conf
file for defining your database connections.
# 2. Configure Database Connections
Edit the main.conf
file to define your database connection. Here’s an example configuration for a general database connection:
connections:
- name: postgres_conn
databaseName: analysis_db
dbType: postgres
url: jdbc:postgresql://localhost:5432/analysis_db
userName: username
password: password
# 3. Extract DBML Models
Run the rosetta extract
command to generate DBML models from your database schema.
rosetta extract -s postgres_conn
Now that you have the DBML models, you can proceed to generate dbt models.
# Generating dbt Models
Use the rosetta dbt
command to convert your extracted DBML into dbt models:
rosetta dbt -s postgres_conn
This command will produce dbt models based on the extracted schema, ready to integrate into your dbt project.
# Example dbt Model Output
Here’s an example of what the generated dbt models might look like, covering multiple tables for better context.
**model.yaml**
version: 2
sources:
- name: Retail_Analysis
description: "Data source for retail analysis"
tables:
- name: sales_transactions
columns:
- name: transaction_id
tests:
- not_null
- unique
- name: product_id
tests:
- not_null
- name: customer_id
tests:
- not_null
- name: transaction_date
tests:
- not_null
- name: amount
tests: []
- name: products
columns:
- name: product_id
tests:
- not_null
- unique
- name: product_name
tests: []
- name: category
tests: []
- name: price
tests: []
- name: customers
columns:
- name: customer_id
tests:
- not_null
- unique
- name: first_name
tests: []
- name: last_name
tests: []
- name: email
tests:
- unique
- name: registration_date
tests: []
Example Models
1. Sales Transactions Model
with sales_transactions as (
select
transaction_id,
product_id,
customer_id,
transaction_date,
amount
from {{ source('Retail_Analysis', 'sales_transactions') }}
)
select * from sales_transactions
2. Products Model
with products as (
select
product_id,
product_name,
category,
price
from {{ source('Retail_Analysis', 'products') }}
)
select * from products
3. Customers Model
with customers as (
select
customer_id,
first_name,
last_name,
email,
registration_date
from {{ source('Retail_Analysis', 'customers') }}
)
select * from customers
Summary
By following these steps, you can effectively generate dbt models from your PostgreSQL schema using RosettaDB. With these models, you can run transformations, apply tests, and document your data workflow, enhancing data quality and usability. This process not only streamlines your data management but also aligns with best practices in data analytics.
For more details, check the official RosettaDB documentation or reach out to the community if you need further assistance.