Automatically clean data files for analysis
Create an automated process to combine and deduplicate two CSVs with Julius AI workflows.
2024-12-20
Data cleaning is an important step in preparing data for analysis, but it’s often tedious and time-consuming. If you’ve ever found yourself mind-numbingly scrolling through spreadsheets and editing data, you’re going to be excited about this tutorial. We’re going to guide you through using Julius AI to create an automated workflow for combining and cleaning your CSV files, making your data analysis process more efficient, consistent, and straightforward.
What’s great about Julius AI workflows is that you can set up an automated process once and then share it with your team or stakeholders, so they can complete a complex, multi-step data cleaning process in a fully or semi-automated fashion.
Steps we’ll follow in this tutorial:
- Create your data-cleaning workflow
- Test your workflow
- Edit and share your workflow
Let’s dive in.
Step 1: Create your data-cleaning workflow
To get started, navigate to the Julius AI website and create a free account.
You’ll then land on the Julius AI dashboard. Click on the “My Workflows” tab in the left-side navigation.
Then, click the “New Workflow” button at the top of the page.
Next, we’ll name our workflow. We’re going to call this one “Combine Two CSVs”. You can also add an optional description. Then, we’ll create the workflow itself: provide a simple prompt, and Julius AI will generate all the steps needed to complete it, both the steps the AI performs and the steps the human analyst performs.
Sample prompt:
Combine two uploaded CSVs of data into one. Deduplicate any duplicate data.
Julius AI will then generate all of the steps needed to perform this process in the tool. For instance, for our process, the first two steps are uploading the first CSV file and then uploading the second CSV file.
You can keep scrolling down the workflow on the right side of the page to see all of the steps Julius AI has generated to complete the data-cleaning workflow.
Step 2: Test your workflow
Once the workflow outline is to your liking, you can test it by clicking the “Run Workflow” button at the top of the page.
This will take you to the live workflow experience, which Julius AI calls a “Thread”. The middle window is where you interact with the workflow, and the right-side window provides a step-by-step outline of the workflow.
For this example, the first step is uploading our first CSV file, which we’ll do by clicking the “Load File” button and uploading a CSV from our computer.
After we upload our first CSV file, we’ll move to step two, where we’ll upload our second CSV file.
Julius AI will then automatically proceed through the AI steps of reading the files, combining them into one DataFrame, deduplicating the data, and outputting a new, combined CSV file.
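To make those AI steps concrete, here is a minimal pandas sketch of the same read, combine, deduplicate, and output sequence. It is an illustration under stated assumptions, not the code Julius AI actually generates: the function name, the sample data, and the choice to treat only fully identical rows as duplicates are all hypothetical.

```python
import io

import pandas as pd


def combine_and_dedupe(first_csv, second_csv):
    """Read two CSV sources, stack them, and drop exact duplicate rows.

    Assumption: both files share the same columns, and a "duplicate"
    means a row whose values match another row in every column.
    """
    first = pd.read_csv(first_csv)
    second = pd.read_csv(second_csv)
    # Stack the rows of both files into a single DataFrame.
    combined = pd.concat([first, second], ignore_index=True)
    # Keep the first occurrence of each row and renumber the index.
    return combined.drop_duplicates().reset_index(drop=True)


# Hypothetical in-memory stand-ins for the two uploaded files.
csv_a = io.StringIO("id,name\n1,Ada\n2,Grace\n")
csv_b = io.StringIO("id,name\n2,Grace\n3,Linus\n")

result = combine_and_dedupe(csv_a, csv_b)

# Serialize the combined data back to CSV text; pass a file path
# instead to write the result to disk.
output_csv = result.to_csv(index=False)
print(output_csv)
```

If your files should be deduplicated on a key column rather than on whole rows, `drop_duplicates` accepts a `subset` argument (for example, `subset=["id"]`), which is the kind of detail worth spelling out in your workflow prompt.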
Julius AI will show its work every step of the way, so if there are any issues, debugging your workflow is much easier.
Occasionally, a step will require human intervention. For instance, one of the last steps of our workflow asks us to submit a name for the new CSV. If steps like this aren’t necessary, you can remove them in the workflow editor.
Finally, Julius AI will output a final, combined CSV file that we can click to download.
Step 3: Edit and share your workflow
If there are any steps along the way you want to edit, you can navigate back to the workflow builder and add new steps using the controls at the top of the page.
You can also insert, move, or remove steps within the workflow. For instance, if you wanted to remove the user step of naming the CSV file, you could change it to an AI step that generates the CSV file name automatically.
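As a rough idea of what an automatically generated file name could look like, here is a small Python sketch that builds one from the two input file names and a timestamp. The naming scheme, the function name, and the example file names are assumptions made for illustration; Julius AI may name the file differently.

```python
from datetime import datetime
from pathlib import Path


def generate_output_name(first_path, second_path, now=None):
    """Build a CSV file name from the two input names plus a timestamp.

    Hypothetical naming scheme: <first>_<second>_combined_<timestamp>.csv
    """
    # Use the current time unless a specific time is supplied.
    now = now or datetime.now()
    # Path.stem strips the directory and the ".csv" extension.
    stems = "_".join(Path(p).stem for p in (first_path, second_path))
    return f"{stems}_combined_{now:%Y%m%d_%H%M%S}.csv"


# Hypothetical input file names and a fixed time for a repeatable result.
name = generate_output_name(
    "sales_q1.csv", "sales_q2.csv", now=datetime(2024, 12, 20, 9, 30, 0)
)
print(name)  # sales_q1_sales_q2_combined_20241220_093000.csv
```

Including a timestamp keeps repeated runs from overwriting each other’s output, which matters once the workflow is shared with other people.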
Once you’re happy with your workflow, click the “Share” button in the top right corner to share it, either through a link or with all Julius AI users on the Explore page.
This tutorial was created by Garrett.