Good engineers, bad engineers, and evil engineers — an anecdote for data leaders
My golden framework to differentiate the good, the bad, and the evil engineers in all fields, including data
To engineer is to design or build something using scientific principles. — Cambridge Dictionary
We all love good engineers: they build fantastic bridges, roads, rockets, applications, and data structures that make our lives easier and more enjoyable every day.
By the same logic, bad engineers will not make our lives much better. If we hire them, they will still design and build something, but it will cost us more of our time, money, and energy than it should.
But did you know that, outside the spectrum of good and bad, there are also evil engineers, whose mindset is not to build, but to not build?
As an engineer myself, and as someone who has worked with multiple engineering teams while wearing the product owner and project manager hats, I have learned something about good, bad, and evil engineers: I love the good ones, I have empathy for the bad ones, and I despise the evil ones.
By the end of this post, I will tell you what the fundamental differences between these types of engineers are. But first, let me tell the story from a more anecdotal perspective.
Some general observations of good, bad, and evil engineers
Reflecting on your own experience and knowledge of the engineering world, what do you think the common behaviors of good, bad, and evil engineers are?
Below are my observations:
Good engineers:
- They recognize the problems
- They solve the problems using a sustainable approach
- They also solve other problems related to the identified root cause
Bad engineers:
- They recognize the problems
- They solve the problem only for the short term
- They create more problems while solving the original one
Evil engineers:
- They pretend not to see the problems
Looking into a concrete example
Let me make it easier to imagine these three engineer personas by describing a concrete example from the data engineering world.
Take a data engineer building a pipeline that copies a set of raw data tables from a transactional data warehouse into a container in the cloud. Following the medallion architecture, where data goes through the bronze, silver, and gold layers, they first clean the data and land it in a set of bronze-layer tables in the designated data lakehouse. Next, they normalize the tables in the silver layer and establish the relationships between them. Finally, they join multiple tables together into a view and create new features that represent the business metrics to be fed into Tableau dashboards.
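To make that flow more tangible, here is a minimal PySpark sketch of what such a bronze-silver-gold pipeline might look like. Every table name, column name, and derived metric below is a hypothetical placeholder of mine, not the author's actual pipeline.

```python
# Minimal sketch of a medallion (bronze -> silver -> gold) flow.
# All table/column names (orders_raw, region_code, region_name, ...) are hypothetical.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("medallion_sketch").getOrCreate()

# Bronze: land a lightly cleaned copy of the raw transactional tables.
raw = spark.read.table("source_db.orders_raw")
bronze = (raw.dropDuplicates(["order_id"])
             .withColumn("order_date", F.to_date("order_date")))
bronze.write.mode("overwrite").saveAsTable("lakehouse.bronze_orders")

# Silver: normalize and keep the keys that relate the tables to each other.
silver = bronze.select("order_id", "customer_id", "order_date", "region_code")
silver.write.mode("overwrite").saveAsTable("lakehouse.silver_orders")

# Gold: join the normalized tables and derive the metrics the dashboard consumes.
# silver_regions is assumed to map region_code -> region_name.
regions = spark.read.table("lakehouse.silver_regions")
gold = (silver.join(regions, on="region_code", how="left")
              .withColumn("order_month", F.date_format("order_date", "yyyy-MM")))
gold.write.mode("overwrite").saveAsTable("lakehouse.gold_order_metrics")
```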
During testing of the dashboards, it is noted that a certain column has missing values in some records. Business users are concerned because they see more than 50% of records with missing data for that column, though they also acknowledge that the data may be incomplete at the source. Now the engineer will need to investigate and resolve the problem.
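Just to make the reported number concrete, a gap like this would typically be measured with a simple check against the gold table; the snippet below reuses the hypothetical names from the sketch above.

```python
# Quantify the reported gap: share of gold-layer records missing the column in question.
# Table and column names are the hypothetical ones from the sketch above.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()
gold = spark.read.table("lakehouse.gold_order_metrics")

missing_rate = (gold.agg(F.avg(F.col("region_name").isNull().cast("double"))
                           .alias("missing_rate"))
                    .first()["missing_rate"])
print(f"Share of records missing region_name: {missing_rate:.1%}")
```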
A good engineer will:
- First, they know very well how that column was transformed from the bronze to the silver to the gold layer. In other words, they know the exact data lineage of the column with missing data.
- Identify a sample record that has missing data in the gold layer but has data at the source for that column. If they can't find any such record in the whole population, they pronounce that the data themselves are incomplete.
- If a valid record with missing data is identified, manually re-apply the transformation logic on that sample record to see why the data for that column didn't come through (a sketch of this check follows this list). Here there are two scenarios:
- Scenario 1: the sampled record contains some unexpected characteristic that causes its column value to be excluded from the gold layer. In short, this is a design problem. In this scenario, a good engineer discusses these unexpected characteristics with the product owner and agrees on a treatment plan: either they decide that this subset of the population can safely be ignored, because data with these characteristics are not relevant to the business objective, or they come up with custom transformation logic to bring the data in.
- Scenario 2: the column value does come through in the manual transformation, which means their initial perception of the data lineage was wrong. In short, this is an execution problem. The good engineer goes back to check what the data pipeline is actually doing, i.e., what the data lineage really is, and then repeats the steps above.
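Here is a minimal sketch of that check, continuing with the hypothetical tables from the earlier snippets: find a record whose value is missing in gold but populated at the source, then replay the gold-layer join on just that record to see which scenario you are in.

```python
# Replay the transformation on a sampled record to separate Scenario 1 from Scenario 2.
# Table, column, and key names are the hypothetical ones from the sketches above.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()
gold = spark.read.table("lakehouse.gold_order_metrics")
source = spark.read.table("source_db.orders_raw")

# Step 1: records missing the value in gold but populated at the source.
candidates = (gold.filter(F.col("region_name").isNull())
                  .select("order_id")
                  .join(source.filter(F.col("region_code").isNotNull()), on="order_id"))

if candidates.count() == 0:
    # No such record: the data really is incomplete at the source.
    print("Column is genuinely incomplete at the source.")
else:
    # Step 2: replay the gold-layer join on one sampled record.
    sample = candidates.limit(1)
    regions = spark.read.table("lakehouse.silver_regions")
    replayed = sample.join(regions, on="region_code", how="left")
    replayed.show()
    # If region_name is still null here, the record has an unexpected characteristic
    # that the design excludes (Scenario 1); if it comes through, the deployed
    # pipeline is not doing what you think it is (Scenario 2).
```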
A bad engineer will:
- Have a poor understanding of the data lineage.
- Identify a sample record that has missing data in the gold layer but has data at the source for that column. If they can't find any such record, they pronounce that the data themselves are incomplete.
- If a valid record with missing data is identified, try to apply the transformation logic manually on it to see why the column value doesn't come through.
- Come up with the wrong conclusion about why the column values don't come through, mainly because their understanding of the data lineage and the overall data pipeline is wrong.
- If their observation leads them to a Scenario 1 conclusion as above (a design problem), they will inform the team that this is a data quality issue and call it a day. They assume the design is perfect and there is nothing to improve here.
- A more ethical but also more disastrous engineer will attempt to come up with a custom treatment for the impacted records (i.e., modify the design); however, they will make a bigger mess, as their perception of the data lineage was incorrect to begin with.
- If their observation leads them to a Scenario 2 conclusion (an execution problem), they will go back and study the gap between the implemented and designed data pipeline, and may actually come up with the right solution next time.
What will an evil engineer do?
- They may or may not know the correct data lineage; it is irrelevant.
- They pronounce that, since the data for that column is incomplete at the source (based on what the business told them), of course it will be missing in the dashboard.
- Then they assume that there is no problem with the data pipeline, as the data is inherently incomplete.
- They call it a day and go home.
The fundamental differences between good, bad, and evil engineers
Hopefully, my example above has given you a clearer picture of the three types of engineers. However, the example will only assist you in the long run once you've grasped the fundamental differences between good, bad, and evil engineers. To systematically differentiate among the three, it is vital to identify their essential characteristics.
Here is my take on that:
- A good engineer possesses three qualities: exceptional knowledge, commitment to truth, and commitment to results.
- A bad engineer lacks either exceptional knowledge or commitment to results. However, they do have a medium level of commitment to truth.
- An evil engineer has little or no commitment to truth. The result is of no importance to them. They care about other things (perhaps the appearance of results), or they don't care about anything at all. It's rare for an evil engineer to have exceptional knowledge, but if they do, it's not relevant anyway, since, again, they care neither for the truth nor the result.
Some of you may find that the distinction between bad and evil engineers is not so clear here. Normally, evil does harm: you would expect an evil engineer to introduce malicious code with bad intentions, or to cover up their past mistakes. I agree with that. Yet what I'd like to highlight here is where I draw the line between bad and evil:
It doesn't necessarily take a malicious action for an engineer to be evil. Once an engineer starts ignoring the truth in front of their eyes (i.e., pretending not to see the problems), they cross into the realm of evil.
And the more facts they ignore, the more evil they will become.
How to identify good, bad, and evil engineers?
So next time you meet an engineer, look for indicators of all three qualities. Don't be reassured just because you find a list of credentials, certifications, or decades of experience: those are only indicators of exceptional knowledge.
Commitment is an active state of mind. Finding indicators of commitment to truth or results requires careful investigation of a person's historical behavioral patterns, continuous analysis of their thinking process, and observation of how they react to challenges.
Neglecting to look for indicators of commitment to truth or results is neglecting your own success, letting it be decided by the supposedly 'knowledgeable' engineers.
In the end, this is about taking responsibility for your own hiring and partnership decisions. If you don't want to waste your money, start identifying the good, the bad, and the evil engineers.
***
Hi there, if you are reading this, chances are you care about data. You believe there is invaluable value that can be extracted from data, and you are eager to find the best strategies, implementation practices, and tools to extract as much value as possible from your organization's (or your own) data assets.
If that is true, then check out my weekly newsletter, Data & Beyond Dispatch. Every edition brings you insightful content from the Data community, curated and summarized to give you fresh, well-articulated, and practical perspectives on the missions, visions, strategies, and toolboxes of truly effective Data Leaders.