Using ChatGPT to write SQL
With the rapid advancements in AI, I’ve shifted my perspective from fearing it taking over to embracing it as a powerful tool to simplify my work. As Cal Newport aptly stated in Deep Work, staying ahead in this age means working alongside new technologies rather than against them. In that spirit, instead of merely discussing data problems like usual, today we’ll be talking about how to use AI to solve your data problems.
Throughout this blog, I’ll walk through a data problem and demonstrate how AI can be an invaluable aid in its resolution. Additionally, I will share some practical Dos and Don’ts based on my own experiences of using AI to solve numerous problems.
@kevin2kelly described #ChatGPT the best on @tferriss 's podcast calling it an "intern". Someone whose work can be potentially really good but you have to give it a second look before you let it fly.
— Abhishek Singh (@abhishek27297) June 7, 2023
The data problem we’re about to tackle with (Chat)GPT is one I discussed here before its launch, making it the perfect litmus test to evaluate its capabilities. The task at hand is to build an email subscriber history table for my mailing list that shows when readers subscribe and unsubscribe. Let’s dive right in and witness AI in action!
Layout the context first
I start by laying out the context for ChatGPT first given its open-ended nature. It can help you with anything from data to relationship problems (I strongly suggest the former and refrain from the latter😜) and hence it’s helpful to set context about what we’re going to talk about.
Now that our AI partner is on board, let’s proceed with the problem!
Break down the ask
When interacting with AI, be patient and take the time to divide your problem into multiple steps. Doing so will ensure that the AI remains focused on each part and responds more accurately. Like above I specifically say “I’ll describe the problem next”. If you bombard it with a lot of text and ask something in between, it gets lost from my experience.
Describe what you have in DETAIL
Now you’d think if you give it soo much detail, you might as well do the work yourself. To that, I’d say AI performs the best to fulfill the gap between having a general idea of how things should be done and not knowing how to do it exactly. If you don’t have any idea about the problem, AI will take you for a ride. Notice how I describe what I have and its meaning and in the following prompt go on to give it an example for making it more explicit. Now, let’s tell it what we’re looking for.
Give it a template of what you WANT
You’ll save yourself time if you describe what you’re looking for with great detail as well. AI does a great job when you give it guide rails for doing things as opposed to expecting the correct answer with a single-line prompt. In this case, I explain the columns I’m expecting in the result and their significance. Also, adding little punctuation like “?” may seem trivial but can make the ask more clear.
Double check!
You’ve probably heard this by now but it’s important to double-check everything LLMs give you, given their ability to deliver BS with confidence. Cassie describes this phenomenon the best here.
Below is the query ChatGPT gave me after the above prompts:
SELECT
t1.date AS date,
t1.user_id AS user_id,
CASE
-- User joined the mailing list
WHEN t2.user_id IS NULL THEN 'I'
-- User unsubscribed from the mailing list
ELSE 'O'
END AS mailing_list_status
FROM
mailing_list_feed AS t1
LEFT JOIN
mailing_list_feed AS t2
ON t1.user_id = t2.user_id
AND t1.date = DATEADD(day, 1, t2.date)
ORDER BY
t1.date, t1.user_id;
Now in this case I kinda had an idea of where it could mess up which it did and hence I could spot it easily but in your case, it may not be that obvious which makes it essential to do some sanity checks before you run off with the solution it gives you.
In my case even though I clearly mentioned before that some days are missing from the feed it ignored this piece for some reason and gave me a query that only works when no dates are missing.
Work with AI
After spotting this I again reminded it of the condition and asked it to fix the mistake. The good thing with AI is even though it makes mistakes, it will acknowledge them and fix them when you ask it to, making the combo of AI and YOU the best team.
Below is the final query it gave me:
SELECT
t1.date AS date,
t1.user_id AS user_id,
CASE
-- User joined the mailing list
WHEN t2.user_id IS NULL THEN 'I'
-- User unsubscribed from the mailing list
ELSE 'O'
END AS mailing_list_status
FROM
mailing_list_feed AS t1
LEFT JOIN
(
SELECT
t3.user_id,
MIN(t3.date) AS next_date
FROM
mailing_list_feed AS t3
WHERE
-- Find the earliest date after the current date for each user
t3.date > t1.date
GROUP BY
t3.user_id
) AS t2
ON t1.user_id = t2.user_id
AND t1.date = t2.next_date
ORDER BY
t1.date, t1.user_id;
The beauty of working with AI lies in its adaptability and willingness to learn from interactions. With this partnership, we can harness AI’s potential while fine-tuning the results to suit our requirements. However, it’s important to remember that even though we now have the correct query, our journey is not complete. If you recollect from my blog, the next step involves putting this query into practice and considering the data architecture to build the subscriber history table effectively.
Switch gears(chats)
Another common practice I follow is if I’m about to change gears to a different topic, I start a new chat. While this seems trivial, I can’t tell you how many times I’ve come back to older chats with more questions and it helps that I didn’t switch the topic in there.
For example, in our case, what if I find out I want to make further amendments to the above query like adding a source of opt-in to the resulting table? It’s easier to leave this chat here and come back and build on it if needed.
Below is a new chat where I delve into the steps to build the data architecture and pipeline, effectively bridging the gap between the SQL query and the final email subscriber history table.
Embrace AI & practice
It’s clear that embracing AI as a collaborator can unlock boundless possibilities. Rather than fearing the rise of AI, we can harness its potential to augment our skills and simplify complex tasks. Throughout this blog, we’ve explored the art of effectively using AI to solve SQL problems. From setting the context and breaking down the ask to providing templates and double-checking results, we’ve seen how structured communication yields more accurate and relevant responses.
Remember, practice makes perfect! As you continue to engage with AI, you’ll find yourself becoming more adept at framing prompts and extracting precisely what you need. The art of manipulating prompts, as demonstrated in our example, can yield enhanced results and pave the way for more seamless interactions with AI. Another such example I learned from here that I use for my blogs is below.
Write a [type of text] about [topic] for [target audience] in the style of the provided example, capturing its tone, voice, vocabulary, and sentence structure. Example: [Insert your text]
So next time you’re stuck on something give AI a chance to help you! Happy problem-solving and creating with AI!
Let me know how you’d approach it. If you found this helpful, share it. If you’re into this, you can find me on Twitter @abhishek27297 where I talk data. If you don’t want to miss my next post, consider subscribing here, it’s free and you’ll be notified when I post my next blog.