
Context is the notion that a conversation accumulates meaning as it proceeds. In my previous article, SQL queries were part of the context of a conversation between a client program and ChatGPT used to build out queries.
That process lends itself to the transformation workflow in dbt, which lets us specify dependencies and logic declaratively and then execute that logic. For example, here is a dbt lineage graph for a transformation that takes all playlists and creates a data mart summarizing the top artists for each week.
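The shape of that transformation can be sketched in plain Python. This is only an illustration of the aggregation the dbt model performs; the row layout and names (artist, played_at) are assumptions, not the actual schema:

```python
from collections import Counter
from datetime import date

# Hypothetical playlist rows: (artist, played_at) pairs.
rows = [
    ("Miles Davis", date(2024, 1, 1)),
    ("Miles Davis", date(2024, 1, 3)),
    ("Nina Simone", date(2024, 1, 2)),
    ("Nina Simone", date(2024, 1, 10)),
    ("Miles Davis", date(2024, 1, 9)),
]

def top_artists_per_week(rows):
    """Group plays by ISO week, then keep the most-played artist per week."""
    by_week = {}
    for artist, played_at in rows:
        week = played_at.isocalendar()[:2]  # (ISO year, ISO week number)
        by_week.setdefault(week, Counter())[artist] += 1
    # One summary row per week, like the data-mart model.
    return {week: counts.most_common(1)[0][0] for week, counts in by_week.items()}

print(top_artists_per_week(rows))
```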
Being able to chain transformations together, and to visualize those chains, is what lets us compose summaries that carry meaning.
Here is a paper from 2023 that offers a solid approach to integrating SQL and text.
It is a good foundation for taking traditional pipelines like dbt to the next level by incorporating LLM prompt results.
Use Context to Improve Data
The power of dbt is that it exposes both the transformations and the structure of the data, making it easy to capture business semantics. Augmenting dbt transformations with metadata gives us a clearer picture of how that data relates to the business logic.
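In dbt this metadata typically lives in `meta` blocks in schema.yml; the sketch below uses a Python stand-in to show the idea, with model names and fields that are illustrative assumptions:

```python
# Python stand-in for dbt's schema.yml `meta` blocks: each model carries
# business semantics alongside its structural definition.
models = {
    "weekly_top_artists": {
        "depends_on": ["stg_playlists"],
        "meta": {
            "owner": "analytics",
            "business_definition": "Most-played artist per ISO week",
            "grain": "one row per week",
        },
    },
}

def describe(model_name):
    """Render a model's metadata as text an LLM prompt could include as context."""
    m = models[model_name]
    meta = m["meta"]
    return (f"{model_name} ({meta['grain']}): {meta['business_definition']}; "
            f"built from {', '.join(m['depends_on'])}")

print(describe("weekly_top_artists"))
```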
Use Graphs to Construct Layered Context
In an LLM, the context of a conversation is the conversation itself. By representing that conversation as a graph, we can build on it, view it, and manipulate it: we can visualize the entire set of interactions and drill into individual nodes to see what questions we asked and what we did with the results.
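A minimal sketch of that layering: model the conversation as a DAG of prompt/response nodes and assemble any node's context by walking its ancestors. All names here are illustrative, not a real library's API:

```python
# Each node is one exchange; edges record which earlier exchanges feed it.
class Node:
    def __init__(self, node_id, prompt, parents=()):
        self.id = node_id
        self.prompt = prompt
        self.parents = list(parents)  # upstream nodes whose output feeds this one
        self.response = None

def context_for(node):
    """Walk ancestors depth-first to assemble the layered context for a node."""
    seen, ordered = set(), []
    def visit(n):
        for p in n.parents:
            visit(p)
        if n.id not in seen:
            seen.add(n.id)
            ordered.append(n)
    visit(node)
    return [(n.prompt, n.response) for n in ordered]

root = Node("q1", "List the top artists per week.")
root.response = "SELECT ..."
follow = Node("q2", "Summarize the trend across weeks.", parents=[root])
print(context_for(follow))
```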
Integrate Queries into Context
SQL databases have the advantage that their data is factual. We can extract raw facts, have an LLM interpret those facts, join in more facts, gain more knowledge, and continue until we reach the desired result.
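The extract-then-interpret loop can be sketched as follows. The `llm()` function is a stub standing in for a real client call (e.g. a chat-completion request), so only the control flow is shown:

```python
import sqlite3

def llm(prompt):
    """Stand-in for a real LLM client call."""
    return f"summary of: {prompt}"

# An in-memory table of hypothetical play facts.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE plays (artist TEXT, week INT)")
conn.executemany("INSERT INTO plays VALUES (?, ?)",
                 [("Miles Davis", 1), ("Miles Davis", 1), ("Nina Simone", 2)])

# 1. Extract raw facts with SQL.
facts = conn.execute(
    "SELECT artist, COUNT(*) FROM plays GROUP BY artist ORDER BY artist").fetchall()

# 2. Interpret the facts with the LLM; each round's output joins the context
#    for the next round of queries.
interpretation = llm(f"Explain these play counts: {facts}")
print(interpretation)
```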
Use the Graph Process to Address Hallucinations
Formulating a set of queries as a graph lets us apply accuracy checks based on human input. We can integrate known quantities into the pipeline to check the LLM's output and attach confidence metrics to the data it returns.
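One way such a check could look: compare an LLM-reported figure against a known quantity from the database and gate on a confidence score. The scoring rule and thresholds below are assumptions chosen for illustration:

```python
def confidence(llm_value, known_value, tolerance=0.05):
    """Score agreement between an LLM claim and a trusted fact, in [0, 1]."""
    if known_value == 0:
        return 1.0 if llm_value == 0 else 0.0
    error = abs(llm_value - known_value) / abs(known_value)
    # Linear falloff within the tolerance band, zero outside it.
    return max(0.0, 1.0 - error / tolerance) if error <= tolerance else 0.0

def gate(llm_value, known_value, threshold=0.5):
    """Accept the LLM output only when its confidence clears the threshold."""
    return confidence(llm_value, known_value) >= threshold

print(gate(102, 100))  # small deviation from the known value: accepted
print(gate(150, 100))  # large deviation: flagged as a possible hallucination
```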
Context is Key
Again, orchestrating and developing sophisticated contextual conversations with LLMs comes down to how well we handle and utilize context.
How to Implement?
As I poke at ways to realize this graph process, working with SQL data and an LLM in tandem, I am leaning toward Dagster.
Dagster is undoubtedly the right choice. I followed the straightforward process above, and my dbt pipeline is now running on Dagster; all that remains is to integrate the LLM client. The UI is very nice and looks like a good springboard for simplifying the process further.
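That remaining step might look like the sketch below. In a real project the function would be decorated with Dagster's `@asset` and wired downstream of the dbt assets; here both Dagster and the LLM client are stubbed, and the asset and function names are assumptions:

```python
def llm(prompt):
    """Stand-in for a real LLM client call."""
    return f"LLM narrative for: {prompt}"

# @asset(deps=["weekly_top_artists"])   # hypothetical Dagster wiring
def weekly_top_artists_narrative(rows):
    """Turn the dbt mart's rows into a prose summary via the LLM."""
    prompt = f"Summarize these weekly top artists: {rows}"
    return llm(prompt)

print(weekly_top_artists_narrative([("2024-W01", "Miles Davis")]))
```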
