Navigating roles within the data world can be tricky.
So, how do you know who to hire to achieve your data goals?
Understand Your Data Needs
It can be tempting to say that you just need to hire a good data person; unfortunately, people who work in data are not one-size-fits-all. The data spectrum is wide, and you may end up with a phenomenal data person in one area who cannot help you achieve your goals in another.
To start, you need a clear understanding of the data work you want to tackle, because that work determines the skill sets you need.
Generally, data needs fall into one of three categories: infrastructure, analysis, and science. For our purposes, we’ll use these three categories as a starting point for determining the skill sets you’ll need for each.
While there is no universal template for data roles, people in them tend to specialize in or gravitate toward one of the three categories above (infrastructure, analysis, or science), though there can definitely be overlap. We will simplify the data roles into three common archetypes and focus on the types of people who are best suited for tackling projects associated with each.
These archetypes are generalizations, meant to highlight the similarities and differences in the types of data skill sets. It’s entirely possible that the folks you need don’t fit clearly into one of them, especially when a company has specialized needs.
The bottom line is that you can use these archetypes and the associated skill sets to bundle the capabilities you need and effectively communicate the roles you’re looking to fill.
Data Engineer | Enabling Data
Data Engineers are the enablers of data. They are the foundation of the data function, and provide consistent, accurate, reliable, and usable data to consumers of data within an organization. They build and maintain a platform that enables moving and transforming data for downstream consumers. In some organizations, they are also responsible for the machine learning infrastructure that enables real-time modeling.
What They Do
- Logging data consistently
- Extracting data
- Aggregating data in a single place
- Transforming raw datasets into canonical tables
- Observing data and ensuring data quality
- Maintaining data privacy and security
- Building and maintaining the machine learning infrastructure
Key Skills
Coding: Knowing how to code is critical. Much of the work requires knowledge of Python and SQL, and some roles may also call for other languages or frameworks, such as Java, Scala, or Spark. Having a solid coding background with the ability to pick up new languages is essential.
ETL: Experience with Extract, Transform, and Load (ETL) is critical as this can be a major function of the role, especially if the role is responsible for building out or maintaining the infrastructure.
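To make the ETL idea concrete, here is a minimal sketch of an extract, transform, and load pipeline using only Python's standard library. The event records, the `purchases` table, and all function names are hypothetical and chosen purely for illustration; a real pipeline would read from actual sources and typically run on an orchestration tool.

```python
import sqlite3

# Hypothetical raw events, standing in for data pulled from an app log or API.
RAW_EVENTS = [
    {"user_id": "u1", "amount": "19.99", "ts": "2024-01-05T10:00:00"},
    {"user_id": "u2", "amount": "5.00", "ts": "2024-01-05T11:30:00"},
    {"user_id": "u1", "amount": "bad-value", "ts": "2024-01-06T09:15:00"},
]


def extract():
    """Extract: return a copy of the raw records from the (hypothetical) source."""
    return list(RAW_EVENTS)


def transform(rows):
    """Transform: parse amounts, keep only the date, and drop malformed rows."""
    clean = []
    for row in rows:
        try:
            amount = float(row["amount"])
        except ValueError:
            continue  # skip rows whose amount cannot be parsed
        clean.append((row["user_id"], row["ts"][:10], amount))
    return clean


def load(conn, rows):
    """Load: write the cleaned rows into a canonical table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS purchases "
        "(user_id TEXT, purchase_date TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO purchases VALUES (?, ?, ?)", rows)
    conn.commit()


def run_pipeline(conn):
    """Run all three steps and return the number of rows loaded."""
    rows = transform(extract())
    load(conn, rows)
    return len(rows)
```

Calling `run_pipeline(sqlite3.connect(":memory:"))` loads the two well-formed purchases and skips the malformed one. Deciding what to do with bad rows (drop, quarantine, or alert) is exactly the kind of judgment this role involves.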
Collaboration: Data Engineers are responsible for ensuring data is accessible to answer business questions, so it’s essential they can effectively collaborate with various technical and non-technical teams to make this happen.
Attention to detail: Little inconsistencies can signal big problems. Data Engineers must closely observe the data and proactively identify issues when they happen. Knowing where to look, being thorough, and paying close attention to details will enable your Data Engineer to proactively identify, surface, and fix issues before decisions are made with bad data or stakeholders begin to lose trust in the data quality.
Who They Work With
Roles With Business Context: A Data Engineer needs the appropriate amount of business context in order to understand the kind of data or platform they need to make available. Whether this context comes from other data roles (e.g., Analysts or Scientists) or from business stakeholders, the Data Engineer must have an effective collaboration process to gain the necessary information.
Data Analysts and Data Scientists: Data Engineers provide the necessary infrastructure so that consumers of data — including Data Analysts and Scientists — can be more effective and efficient in their work. A Data Engineer may work with an analyst to define table schemas and a scientist to productionize models, so having clear and open lines of communication up and down the data chain is critical.
Engineers: Data Engineers may find themselves working with various Engineers to implement, test, and/or log data. Whether this is internal event data or external API data, Engineers need to know to involve Data Engineers to ensure the impact of their efforts can be measured.
Common Pitfalls
- Having a talented team that lacks business context
- Building but not maintaining data pipelines, leading to disruptions in data availability, quality, and reliability as you scale
Data Analyst | Making Sense of Data
Data Analysts make sense of and find opportunities within the data so that teams can make better decisions. They work with business stakeholders to understand the questions the business has and the problems it wants to solve, and then they analyze, measure, and report findings back to the business.
What They Do
- Measuring business metrics
- Making metrics transparent and available across the organization
- Exploring and recommending areas of opportunity
- Storytelling and helping the organization leverage data to make decisions
- Experimenting to measure impact
Key Skills
Coding: The ability to pull, clean, and explore data is essential, and this is generally done using SQL. Sometimes Data Analysts will use other languages, such as Python or R, to handle larger data sets or build more complex visualizations.
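As an illustration of what "pull, clean, and explore" can look like in practice, the sketch below runs a small SQL query from Python against an in-memory SQLite database. The `orders` table, its columns, and the sample rows are all made up for this example; a real analysis would query your own warehouse tables.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a hypothetical orders table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders (order_id INTEGER, order_date TEXT, amount REAL)"
    )
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?, ?)",
        [
            (1, "2024-01-05", 20.0),
            (2, "2024-01-05", 5.5),
            (3, "2024-01-06", None),  # missing amount, excluded by the query
            (4, "2024-01-06", 12.0),
        ],
    )
    return conn


def daily_revenue(conn):
    """Pull order counts and revenue per day, ignoring rows with no amount."""
    query = """
        SELECT order_date,
               COUNT(*) AS orders,
               ROUND(SUM(amount), 2) AS revenue
        FROM orders
        WHERE amount IS NOT NULL
        GROUP BY order_date
        ORDER BY order_date
    """
    return conn.execute(query).fetchall()
```

`daily_revenue(build_demo_db())` returns one row per day with the order count and revenue. The `WHERE amount IS NOT NULL` filter is a small example of the cleaning decisions an analyst makes before reporting a metric.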
Visualizing Data: Being able to show the right data in the right way is an invaluable skill. Visualizing data is not just charts, graphs, and tables, but is also about seamlessly pulling the data together to tell a story.
Communication & Stakeholder Management: One of the most important skills of an analyst is being able to understand the business context so they can identify the questions the business has and the issues it may be facing and leverage data to address them holistically.
Exploring data: With the appropriate business context, Data Analysts can ask the right questions of the data to uncover interesting data findings. As someone who is the closest to the data, they need to have the latitude to not just answer the questions the business is asking, but also explore the data more holistically.
Storytelling: Stories are an incredibly effective way for people to remember information, so the best way to make data valuable and useful is to construct a data story for stakeholders. This will provide stakeholders with a stronger understanding of the business and help them ask better questions and make better decisions.
Who They Work With
Business stakeholders: The best and quickest way for a Data Analyst to gain the appropriate level of context within the business is to build strong relationships with their business partners. The analyst and stakeholder should be in regular contact through a mix of 1:1s, team meetings, and All Hands.
Data Engineers: Data Analysts work closely with Data Engineering in order to ensure the appropriate data is available, easy to pull, and in an analyzable format. Analysts will flag important raw data that needs to be transformed and help construct effective table schemas.
Data Scientists: There can sometimes be overlap between Data Analysts and Data Scientists because both are pulling data, though for different purposes. Having analysts inform data scientists of areas of opportunity can be extremely valuable.
Common Pitfalls
- Assuming data quality is high
- Prioritizing answering questions instead of evaluating whether those are the right questions to ask
Data Scientist | Automating Decisions with Data
Data Scientists specialize in building data products that automate decision making for humans or machines. Data Scientists can also specialize in statistics or causal inference, helping in experimentation and impact measurement.
What They Do
- Building models to predict business outcomes
- Building products to automate decision-making
- Estimating impact through experimentation and causal inference
Key Skills
Coding: Data Scientists are typically dealing with large data sets. Knowing how to pull and explore data using SQL is required, and knowing a scripting language such as Python or R is necessary for large data sets and modeling.
Algorithms & Modeling: Data Scientists leverage algorithms and models to understand their data and make predictions that enable better and automated decision making. Projects may require experience with supervised and unsupervised machine learning, including techniques such as regression, tree-based models, and neural networks.
Statistics: Data Scientists apply statistical concepts to analyze, explain, and interpret data for decision making. Further, understanding statistical inference and probability enables Data Scientists to draw causal conclusions about data through experimentation and causal inference.
Communication: Data Science can be extremely nuanced and complicated, so Data Scientists must be able to effectively communicate technical concepts to non-technical folks.
Business Savvy: One of the most important considerations for a Data Scientist is whether they’re working on the right problems to optimize for business value. Understanding what the business is trying to solve and where to focus their efforts will ensure that the work benefits the business and is not just fun or cutting edge.
Prioritization: Oftentimes a Data Science project takes something that’s currently at 80% of an ideal solution and gets it to 90% or 95% of an ideal solution. Understanding the magnitude of the impact the improvements can have will enable a Data Scientist to prioritize their time across projects to maximize overall impact.
Who They Work With
Data Analysts: There can be a lot of overlap between Data Scientists and Data Analysts because both are pulling data to understand what’s happening. Data Scientists should work with Data Analysts to understand where areas of opportunities are and how they can divide and conquer problems.
Data Engineers: Data Scientists will work closely with Data Engineering in order to ensure the appropriate data is easy to pull and analyze. Further, Data Scientists often work with Data Engineers to productionize models and enable a machine learning platform.
Business Stakeholders: Though a Data Scientist’s core job is not to help the business broadly understand what’s happening, Data Scientists frequently work with their business stakeholders to make sure they have a holistic understanding of the problem space in order to identify and solve the most impactful problems. Business stakeholders generally have a great idea of the “what”, and Data Scientists can help determine the “how”.
Common Pitfalls
- Applying overly complicated solutions to problems
- Failing to fully understand the business context and building things that don’t solve a business problem
- Building and launching models without guardrails
Stakeholder | Consuming Prepared Data
We’d be remiss if we didn’t mention a fourth bucket that is more of a catchall for those who work with data regularly but do not fall into one of the other archetypes. This bucket includes folks in Strategy & Operations, Finance, Marketing, Customer Service, Research, etc., who are generally the customers of a data team and may present the data back to their teams or manipulate the data to dive deeper.
These folks may know a little SQL, though it’s not always required of them, so they sometimes rely on the data team to pull data for them.
Common Pitfalls
- Misinterpreting data or taking it out of context
- Relying too heavily on the data team, with asks to pull a lot of data without providing sufficient context
Frequently Asked Questions
Question: Who should be responsible for building data transforms: Data Engineers or Data Analysts?
Answer: For teams with both Data Engineers and Data Analysts, this can depend on your team structure as well as the relationships across your team. Since Data Analysts are the main consumers of transformed data, it can be very helpful to have Data Analysts define the schema and/or build the transforms and have Data Engineers implement and monitor them. However, Data Engineers generally have expertise in designing and maintaining transformed data, in which case a collaboration between your Data Analyst and Data Engineer might be more appropriate. If Data Engineers are close with the business and really understand the needs, they can create transforms for the rest of the business. For teams with either a Data Analyst or a Data Engineer but not both, the person you have will likely be the recipient of this responsibility.
Question: Who should be responsible for running experiments?
Answer: Experimentation can be truly cross-functional as there are multiple components, including accurate group logging, design, analysis, and general process. Data Engineers can be heavily involved in the experimental logging phase, though this generally requires a larger upfront investment to enable this for all experiments. For straightforward A/B experiments, Data Analysts may be able to work with stakeholders to design and analyze experiments. For more complex or difficult experiments, it may be necessary to pull in someone with a stronger statistics or causal inference background. The general process for setting up, running, and analyzing experiments should be run by the team that is willing to review experimental designs and share results with other teams, though this is generally a more collaborative process across multiple roles.
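For a sense of what analyzing a straightforward A/B experiment can involve, here is a minimal sketch of a two-proportion z-test using only Python's standard library. The function name and the example numbers are assumptions made for illustration. The test assumes independent users and reasonably large groups, and it is one common approach, not the only valid one.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Compare conversion rates of control (A) and treatment (B).

    Returns the z statistic and the two-sided p-value under the
    normal approximation with a pooled conversion rate.
    """
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        raise ValueError("Standard error is zero; rates are all 0% or all 100%.")
    z = (p_b - p_a) / se
    # Two-sided p-value: 2 * (1 - Phi(|z|)) equals erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical experiment: 100 of 1,000 control users converted
# versus 150 of 1,000 treatment users.
z, p = two_proportion_z_test(100, 1000, 150, 1000)
print(f"z = {z:.2f}, p-value = {p:.4f}")
```

Experiments with multiple metrics, repeated peeking at results, or non-independent users call for more careful methods, which is where someone with a stronger statistics or causal inference background comes in.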
Question: What if my company has particular needs when it comes to data?
Answer: This is often the case, and your needs for specific roles might look a bit different. For example, if your company’s main product is ML, your ML Engineer might look more like a combination of Data Engineering and other Engineering roles. The Finance team might have its own analysts who exercise skills across archetypes but with a purely financial focus. Operators can also exercise their analytics or data science skill sets for particular projects, though their work is generally not built to scale.
How To Get Started
Want to better understand your data needs and which archetypes can help you hit your data goals? Complete this questionnaire to get started!