The technical foundations for securing an entry level data analyst role
TLDR;
- Spreadsheets (Excel)
- SQL
- Visualization tool (Tableu or Power BI)
- Scripting language (Python or R)
Intro
I am often asked by people trying to break into data what skills they need to learn to get their first job in Data, and where they should learn them. This article is the distillation of the advice I have been giving aspiring data scientists, analysts, and engineers for the last 5 years.
This article is primarily geared towards self-taught data jockeys who are looking to land their first role in data. If you’re reading this article, odds are your first role will be as an analyst. Most of the entry level roles in data are analyst roles and I don’t regard data scientist or data engineering roles as entry level.
The four pillars are spreadsheets, SQL, a visualization tool, and a scripting language.
Different jobs will require a different blend of these skills, and you can build an entire career out of mastering just one of the pillars, but almost all roles in data require at least a cursory knowledge of the four subjects.
Excel
Excel is the alpha and omega of the data world. For 30 years the data community has been talking about the fabled “excel killer” and it hasn’t ever been found. You could have been part of a multi-team 6-month long effort to harmonize data from 7 databases, built them into the sexiest Tableau dashboard, and the first thing your stakeholders will ask you is how they can export it to excel.
Excel is vast and most users just scratch the surface of its functionality, but this is a list of skills that I would consider a minimum for landing an analyst role:
- Basic interface navigation
- Formulas
- Conditionals (IF, IFS, COUNTIFS, SUMIFS, etc)
- Spreadsheet hygiene (making sure your spreadsheets are logically laid out)
- Joining data sets (V-lookup, X-lookup, index-match)
- Charting/visualization
- Pivot tables
- Filtering and sorting data
Power Query
If you want to take it a step further, I also recommend aspiring analysts become familiar with Power Query (also called Get and Transform). I like power query for aspiring analysts because it is a good introduction to working with more formally structured data and working with proper tabular data.
One advantage of learning Power Query and Power Pivot is that they are extensively used in Power BI.
What about google sheets?
Google sheets is a solid spreadsheet alternative to Excel, but it is missing a lot of the advanced features. If you learn excel you can quickly adapt to google sheets, and you can learn many of the basic spreadsheet functions on google sheets, but I don’t think it’s an adequate substitute for excel at this point.
My observation is that Google sheets is commonly used in government, academia, and in early to middle stage startups.
VBA
If you’re trying to figure out how to do something in excel and the tutorial you stumble across suggests VBA, look for a different solution.
SQL and Databases
This is a tricky subject for aspiring analysts to learn because outside of a production environment it is hard to learn the nuances of working with databases beyond basic syntax. This is because most of the practice data sets are far too clean.
Early in one of my jobs I completely botched a SQL query request because I made the amateur mistake of joining two tables on the FINANCING_ID column instead of the FINANCING_ID_NEW column.
Most databases at organizations large enough to hire analysts are not planned or designed, but are rather organic accretions of data that build up over time, accrued via mergers and acquisitions and time constrained software engineers trying to solve a problem RIGHT NOW.
For many organizations, it can take months to onboard to their databases.
So my advice is aside from learning the basic syntax of one dialect of SQL, I wouldn’t spend too much time mastering SQL until you have a job where you get to write it every day.
These are the basic querying skills I suggest you learn:
- Basic syntax
- Anatomy of a SQL query
- Aggregations
- Multi-table joins
- Dimensional modeling
- CTEs and Subqueries
Which dialect should you learn?
It doesn’t really matter because they’re so similar and once you know one, the differences can all be resolved with google or Chat GPT. My suggestion is either Postgres or T-SQL.
Visualization
While excel can be used to produce some visualizations, most organizations that hire analysts will produce dashboards with either Power BI or Tableau (I’ve worked with a few others but these are the dominant players).
Like SQL, I wouldn’t suggest indexing too heavily in visualization until after you have a job, learning the basics is important, but much of the advanced functionality is best learned in a production environment.
Power BI or Tableau?
I would suggest choosing one and focusing on it, rather then splitting your attention between the two.
If your primary experience in data is with Excel, Power BI will likely be more intuitive for you to work with. Once you learn to use one, you can easily adapt to learning another, and for most generalist analyst roles, hiring managers won’t care that much, as long as you know one of them.
I once interviewed for a role at a large enterprise to develop Tableau dashboards and I asked the hiring manager “if you hired me, what would you consider a successful hire after 6 months.”
His answer was “If you could edit a single dashboard after 6 months, I’ll consider it a success.”
Like SQL, a lot of the challenge of working with visualization tools is understanding the organization’s data.
What should you learn?
- Making all of the standard charts
- Data cleanup, and how data should be structured going into the tool of your choice
- Basic dashboarding
Scripting Language
Finally we have scripting languages. As a caveat, my first few analyst roles didn’t require me to know a scripting language, but that was some time ago and reviewing application requirements, it appears that at least knowing a little is a requirement for entry level roles now.
R or Python?
If you already know R (learned it in a statistics class) then focus on R, otherwise learn Python. If you’re proficient in one, you can learn the syntax of the other in the time it takes you to onboard.
R also tends to be more common in organizations that have close relationships with academia. Biotech firms are more likely to use R because their researchers are more likely to have used it in grad school.
What should you learn?
- Variables
- Basic numeric manipulation
- String manipulation
- Conditionals (If/then)
- Basic data structures (Lists, dictionaries, tuples, sets)
- Loops
- Defining and using functions
- Pandas (the library, not the animal)
General advice
- You don’t need to be an expert on these subjects, you need to be familiar with them.
- For entry level analyst roles, focus most on excel.
- Don’t overestimate your skill level. I once interviewed a candidate who described themselves as an “intermediate” Python user. The role didn’t call for Python, but since they said they could, I gave them a live coding exercise. I asked them to define a function to detect whether a given string input is a palindrome (a word spelled the same way forwards and backwards). They then admitted they didn’t know how to define a function. I politely ended the interview there.
Certifications
For the most part, I don’t think certifications are particularly useful for securing entry level roles. They might make a difference at the margins (maybe you get an interview with a recruiter that you otherwise wouldn’t get), but I don’t think they’re worth the effort.
There is one exception to this: The South Asian job market.
I did use a handful of certifications as a heuristic when evaluating candidates.
Generally those certifications had a few things in common:
- They were from major technology companies in data, like Snowflake, Microsoft, or Tableau.
- They cost several hundred US dollars to obtain, representing a substantial investment for a typical South Asian employee (or their employer).
Free certificates
There are lots of free or very low cost certificates, like the Google Data certificates. In general I think they’re worth about as much as you pay for them. The learning content is solid, and they’re well put together curricula, but the certification itself won’t really help you stand out.
Passing technical interviews
When I interview candidates, I really want them to succeed, I suspect most interviewers are the same.
So when you’re interviewing, keep it conversational.
I am mostly interested in seeing how you arrive at the right answer, not whether you get the answer. I prefer candidates to ask questions, test ideas, and ask for clarification. If you’re on the wrong track, I will ask questions to see if I can get you on the right track.
Resources
The following are mostly paid resources that I used when learning these skills. These are not referral links, I don’t get anything from you getting them.
Excel
Tom Hinkle is a dear friend, and I strongly recommend his courses on Udemy.
Oz Du Soleil is one of my favorite online instructors and an all around good dude: I’ve linked to his YouTube channel because he offers a lot on there.
If you want to learn Power Query, skillwave training is absolutely excellent. They also have Power BI courses, though I haven’t taken them.
SQL
The IMDB’s actual database: This is a very clean dataset that will let you practice complex SQL queries across a dimensionally modeled database.
The Microsoft Contoso Database: This simulates a retail website’s database, and will give you good practice on aggregations, and answering business questions.
Tableau
Tableau offers some of the best training on how to use their product. I’d suggest learning from their courses vs paying someone else.
Python
The Python Bible: Ziyad is one of the most engaging online instructors out there.
The Complete Pandas Bootcamp: Alexander Hagman is dry, but thorough. I still reference this course when I need refreshers on Pandas.
General
Anil was an early mentor of mine and has since started a digital analytics mentorship/educational platform. He taught me at a local college, but his work is stellar and he invests a lot in his students.
Do you think there are any foundational analytical skills I missed?
About the Author
Charles Mendelson is a senior software engineer at a Big 3 management consulting firm where he helps clients build AI prototypes and MVPs.
He started his tech career as a self-taught data analyst, before becoming a data engineer.
The Four Pillars of a Data Career was originally published in Towards Data Science on Medium, where people are continuing the conversation by highlighting and responding to this story.
Originally appeared here:
The Four Pillars of a Data Career
Go Here to Read this Fast! The Four Pillars of a Data Career