My Step-by-Step Process for Remembering the Skills I've Forgotten
How to debug a Python/R script, or any code for that matter...
Different roles come into our lives for different reasons. They each require a unique set of skills despite similar titles. I’ve worked in three different data engineering roles, each requiring quite different technical expertise.
In my DevOps role, AWS knowledge was a non-negotiable. I found myself writing tons of Python scripts utilizing the AWS CLI and deploying data pipelines using Jenkins (how many engineers actually use this tool??).
As a data engineer working in finance, I most commonly coded in SQL, and even used dbt for some of my data models. Every now and then I would utilize my Python skills to help with application development.
Now, as an analytics engineer, I rarely work with AWS and haven’t written a Python script in years. Different skills are utilized depending on your role and the company that you work for.
So what happens with all the tools and technologies we had once mastered? Do they sit in the back of our brain collecting dust and cobwebs?
While I believe there are some tools (like Jenkins) I will never have to use again, I do believe each of them taught me important lessons. Others, however, resurface every now and then, forcing me to remember the skills I’ve forgotten.
For me, those two skills are Python and R.
R was the first language I ever learned, in a college data visualization class no less. While I enjoyed using it during that course, I never once had to use it after that.
As for Python, this is a language I should be staying sharp in, but let’s be honest, life sometimes gets in the way. So, whenever there’s a chance to refresh my knowledge, I usually take the project on.
I was recently tasked with updating two scripts, a Python script and an R script, used in a very manual deletion process. I had run into a human error while using these scripts and wanted to make them more fool-proof.
So, here’s the process I used to understand the scripts and add what needed to be added!
Learning about the functions used
The first step in improving any code is to understand what it is doing. You need to be able to read the code, no matter the language, and understand the purpose each function, line, variable, etc. serves.
This means you need to go through the code, line by line, googling each function or piece of syntax that you don’t understand. Don’t feel like you need to know everything. In the age of the internet, you would be stupid not to search for answers to any questions you have.
I recommend adding comments to the code to remind yourself what each line does. This way you can refer back to them to help you connect any missing dots when you come across something you struggle to understand.
paste0() function in R
For me, one of these functions was the paste0() function in R. I knew it was doing something to string variables, but I wasn’t sure what exactly. So I Googled it!
I used this page to learn that it was a concatenation function. Easy enough!
I also searched “character vector” in Google since that was the terminology used on this page. I wanted to confirm that this was what other languages call a string, and I was right (in R, a single string is simply a character vector of length one).
A few quick and easy Google searches and I was able to understand the script and what I needed to do next.
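If Python is the language you are less rusty in, it can help to translate the unfamiliar function into it. Here is a rough sketch of what paste0() does, written in Python. The helper name and the file names are made up for illustration, and the helper only handles single values; real paste0() also works across whole vectors, which the list comprehension only approximates.

```python
def paste0(*parts):
    """Hypothetical stand-in for R's paste0(): join pieces with no separator."""
    # R converts numbers to text before joining, so str() is applied to each piece.
    return "".join(str(part) for part in parts)


# Single values: the pieces are simply glued together.
file_name = paste0("requests_", 2024, ".txt")
print(file_name)

# In R, paste0() is applied element by element across a vector.
# A list comprehension is the closest everyday Python idiom.
names = [paste0(prefix, "_ids.txt") for prefix in ["users", "orders"]]
print(names)
```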
Finding the functions that do what you need
Next, after taking the time to understand what the code is doing, you need to figure out the solution you want to implement to improve the code. For me, I knew I wanted to reduce user error. This meant printing information for the user to confirm and utilizing command-line prompts.
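To make the idea of "print, then ask for confirmation" concrete, here is a minimal Python sketch. The function name, the prompt wording, and the `ask` parameter are assumptions for illustration, not the code from the actual scripts.

```python
def confirm(summary, ask=input):
    """Print what is about to happen and return True only on an explicit 'y'.

    `ask` defaults to the built-in input(), which prompts at the command line.
    It is a parameter only so the function can be tried without a keyboard.
    """
    print(summary)
    answer = ask("Proceed? [y/N]: ")
    # Anything other than y/Y (including just pressing Enter) counts as "no".
    return answer.strip().lower() == "y"
```

In a script you would call something like confirm("About to submit 3 deletion requests") and only continue when it returns True. Defaulting to "no" means a stray Enter key cancels the run instead of submitting it.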
The R script output a text file that the Python script then used. Every time it was run, it gave the output file the same name. To ensure old files weren’t being used to submit our requests, I wanted to generate a file with the date as the prefix. This required a knowledge of date functions in both Python and R.
Here’s some insight into the simple process I used to find what I needed.
Getting the current date in Python and R
I needed a way to retrieve the current date of whenever the script was run, in both Python and R. I called upon handy dandy Google and was able to find a GeeksforGeeks article detailing the date.today() method. After converting this to a string using str(), I had what I needed.
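On the Python side, the date prefix could look something like the sketch below. The function name and the file name are hypothetical; only date.today() and str() come from the steps above.

```python
from datetime import date


def dated_filename(base_name, today=None):
    """Prefix base_name with the date, e.g. 2024-05-01_requests.txt."""
    # `today` can be passed in to make the result predictable;
    # by default the date the script is run on is used.
    if today is None:
        today = date.today()
    # str() on a date gives the YYYY-MM-DD form.
    return str(today) + "_" + base_name


print(dated_filename("deletion_requests.txt"))
```

Because the date comes first and is written year-month-day, the files also sort chronologically in a folder listing.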
As for finding this in R, I did the same thing and discovered the Sys.Date() function. After a few failed attempts, I realized this one was picky with the capitalization (R is case-sensitive, so sys.date() won’t work).
Reading the last line of a file in Python
Luckily I still remembered how to write a basic if/else statement and use command line inputs. I used both of these A LOT when scripting using the AWS CLI.
However, one thing I did forget was how to read lines in a text file. I only wanted to read the last line of the file, so I (you guessed it) Googled the best way to do this.
Reading through this thread, I realized the solutions being provided were overly complicated. I knew there was a better way, so I didn’t just go with the first answer I found. Eventually, I came across a comment further down with a solution I was happy about.
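For reference, one simple way to do this is sketched below. It is not necessarily the solution from that thread, and the function name is made up; it just shows that a plain loop is enough for a small text file.

```python
def read_last_line(path):
    """Return the last line of a text file, without its trailing newline."""
    last_line = ""
    with open(path, encoding="utf-8") as handle:
        # Looping over the file reads it one line at a time,
        # so only the most recent line is ever kept in memory.
        for line in handle:
            last_line = line
    # An empty file leaves last_line as "", which is returned as-is.
    return last_line.rstrip("\n")
```

This reads the whole file from the top, which is fine for a short list of requests. For very large files, seeking backwards from the end would be faster, at the cost of more complicated code.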
One of the great things about not being a total beginner, and just being a bit rusty, is that you can spot the solutions that aren’t optimal and keep researching until you find the one you know is best.
Google can be your bestie
I know this wasn’t the most technical article with ground-breaking solutions, but hey, it works! Sometimes we can get frustrated with ourselves for not remembering everything, thinking we need to contain years upon years of information in our brains at all times.
This isn’t realistic.
When you learn something new, chances are something is going to collect cobwebs. You just need to dust them off, turn to Google (or maybe even AI), and be given a reminder of what you once knew.
Have a great week!
Madison Mae
If you haven’t heard, I launched my first course: Transform Your Data Stack with dbt! This course is for all types of data professionals, beginner to intermediate, who wish to introduce (or refactor) dbt to help scale their team's data.
You will learn how to:
Build a dbt project
Document your data following best practices
Define data quality tests using dbt packages
Write reusable data models and macros
Join me in May for 4 live sessions over 2 weeks where you will learn with me and interact with the greater data community!