
Data Engineer – In search of a Holistic Unicorn

[et_pb_section fb_built=”1″ _builder_version=”4.8.1″ _module_preset=”default” custom_padding=”3px|||||”][et_pb_row _builder_version=”4.8.1″ _module_preset=”default”][et_pb_column type=”4_4″ _builder_version=”4.8.1″ _module_preset=”default”][et_pb_text _builder_version=”4.8.1″ _module_preset=”default” min_height=”305px”]My previous post in this series focused on why a team needs a Data Engineer and how the role differs from other data science roles. This final part covers the roles and responsibilities of a Data Engineer and the skill set they need. With the explosion of data and database technologies, many tools with similar capabilities have become available. The range is so wide that no single Data Engineer can know them all; unless there is a specific business requirement, knowing one of them well is sufficient. And if the business requirement changes, the fundamentals are fortunately similar enough that moving to another technology is feasible. Expecting a Data Engineer to know everything is like wanting a unicorn: it sounds good to have, but it doesn’t exist. A Data Engineer’s everyday roles, responsibilities and required skills vary with the size of the company and the complexity of the project. It is therefore important to know how to categorize them and where they fit in the organization. [/et_pb_text][et_pb_text admin_label=”Überschrift” _builder_version=”4.8.1″ _module_preset=”default” min_height=”49px” custom_padding=”||0px|||”]


[/et_pb_text][et_pb_text _builder_version=”4.8.1″ _module_preset=”default” min_height=”102px”]

Depending on the size of the company, a Data Engineer takes on one of three roles, whose scope narrows from general to specialized as team size and database complexity grow.

[/et_pb_text][et_pb_image src=”” title_text=”Roles_dataEngineer” _builder_version=”4.8.1″ _module_preset=”default” min_height=”460px”][/et_pb_image][et_pb_text _builder_version=”4.8.1″ _module_preset=”default”]Generalist: Typical for small teams, where a Data Engineer has to wear many hats. They look after the complete process, from data ingestion through building and maintaining the pipelines to the analysis itself. The role calls for a good communicator with sound business acumen. [/et_pb_text][et_pb_text _builder_version=”4.8.1″ _module_preset=”default” custom_margin=”8px|||||”] Pipeline-centric: Here the Data Engineers build pipelines for a given use case, which are later used by Data Scientists or Data Analysts. [/et_pb_text][et_pb_text _builder_version=”4.8.1″ _module_preset=”default” hover_enabled=”0″ sticky_enabled=”0″]Database-centric: When the organization’s data is large, the analytical databases have to be maintained in addition to the pipelines. Because these databases are so complex, maintaining them becomes a full-time job in itself. [/et_pb_text][et_pb_text admin_label=”Überschrift” _builder_version=”4.8.1″ _module_preset=”default” min_height=”44px” hover_enabled=”0″ sticky_enabled=”0″ custom_padding=”||0px|||”]


[/et_pb_text][et_pb_text _builder_version=”4.8.1″ _module_preset=”default” hover_enabled=”0″ sticky_enabled=”0″]A Data Engineer develops, builds, tests and maintains the complete architecture of a data processing system. As a Data Engineer, you will be responsible for the following areas. [/et_pb_text][et_pb_text admin_label=”Überschrift” _builder_version=”4.8.1″ _module_preset=”default” min_height=”28px” hover_enabled=”0″ sticky_enabled=”0″ custom_padding=”||0px|||”]

Architecture Design

[/et_pb_text][et_pb_text _builder_version=”4.8.1″ _module_preset=”default” hover_enabled=”0″ sticky_enabled=”0″]At its core, Data Engineering comprises the architecture design, deployment and maintenance of a data platform. The design must take changing business requirements into careful account so that the system stays resilient when it has to change. [/et_pb_text][et_pb_text admin_label=”Überschrift” _builder_version=”4.8.1″ _module_preset=”default” min_height=”28px” hover_enabled=”0″ sticky_enabled=”0″ custom_padding=”||0px|||”]

Building & maintaining ETL (Extract-Transform-Load) pipelines:

[/et_pb_text][et_pb_text _builder_version=”4.8.1″ _module_preset=”default” hover_enabled=”0″ sticky_enabled=”0″]ETL is fundamental to every data architecture. The process involves extracting data from various sources, transforming it and loading it into a data warehouse, where end users use it for analysis. [/et_pb_text][et_pb_text admin_label=”Überschrift” _builder_version=”4.8.1″ _module_preset=”default” min_height=”28px” hover_enabled=”0″ sticky_enabled=”0″ custom_padding=”||0px|||”]
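
The three ETL stages can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the inline CSV stands in for a real source system, an in-memory SQLite database stands in for the warehouse, and the table and function names are hypothetical.

```python
import csv
import io
import sqlite3

# Hypothetical raw export; a real pipeline would read from files, APIs or source databases.
RAW_CSV = """order_id,customer,amount
1,alice,10.50
2,bob,
3,alice,4.25
"""


def extract(text):
    """Extract: parse the CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: drop rows without an amount and cast the fields to proper types."""
    cleaned = []
    for row in rows:
        if not row["amount"]:
            continue  # skip incomplete records
        cleaned.append((int(row["order_id"]), row["customer"], float(row["amount"])))
    return cleaned


def load(rows, conn):
    """Load: write the cleaned rows into the (stand-in) warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders "
        "(order_id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")  # in-memory database as a stand-in for a warehouse
load(transform(extract(RAW_CSV)), conn)

# End users would then query the loaded data for analysis.
totals = conn.execute(
    "SELECT customer, SUM(amount) FROM orders GROUP BY customer ORDER BY customer"
).fetchall()
print(totals)
```

Keeping the three stages as separate functions makes each one testable on its own, which helps when a source or the target warehouse is swapped for another technology.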

Building & maintaining Data Warehouse/Lake:

[/et_pb_text][et_pb_text _builder_version=”4.8.1″ _module_preset=”default” hover_enabled=”0″ sticky_enabled=”0″]In big organizations, building and maintaining a Data Warehouse is a full-time role. With many databases in place, governance requires clearly responsible people. They take care of the schema, organize the metadata and define the ETL process. [/et_pb_text][et_pb_text admin_label=”Überschrift” _builder_version=”4.8.1″ _module_preset=”default” min_height=”28px” hover_enabled=”0″ sticky_enabled=”0″ custom_padding=”||0px|||”]

Manage Data & Meta Data

[/et_pb_text][et_pb_text _builder_version=”4.8.1″ _module_preset=”default” hover_enabled=”0″ sticky_enabled=”0″]Data can be stored in a warehouse in either a structured or an unstructured way. It is accompanied by metadata (data about data), which helps with documentation and gives quick access to information about a database. A Data Engineer is responsible for managing the stored data and structuring it via a DBMS while ensuring proper governance. [/et_pb_text][et_pb_text admin_label=”Überschrift” _builder_version=”4.8.1″ _module_preset=”default” min_height=”28px” hover_enabled=”0″ sticky_enabled=”0″ custom_padding=”||0px|||”]
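
As a small illustration of "data about data", the sketch below reads column metadata from a database instead of the data itself. It assumes SQLite purely for convenience; the `customers` table and the `describe_table` helper are hypothetical, and other DBMSs expose the same kind of information through their own catalogs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL, signup_date TEXT)"
)


def describe_table(conn, table):
    """Return column metadata for one table (table name is assumed to be trusted)."""
    # PRAGMA table_info yields one row per column: cid, name, type, notnull, dflt_value, pk
    rows = conn.execute(f"PRAGMA table_info({table})").fetchall()
    return [
        {"column": r[1], "type": r[2], "primary_key": bool(r[5])}
        for r in rows
    ]


catalog = describe_table(conn, "customers")
for entry in catalog:
    print(entry)
```

A listing like this can feed documentation or a data catalog without anyone having to inspect the tables by hand.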

Optimization & Scalability

[/et_pb_text][et_pb_text _builder_version=”4.8.1″ _module_preset=”default” hover_enabled=”0″ sticky_enabled=”0″]With big data architectures, it is common for a pipeline run to take hours, and the pipeline might not be configured correctly. This can greatly affect the availability of data and have a significant cost impact. A Data Engineer is expected to optimize the existing system while ensuring availability and scalability. [/et_pb_text][et_pb_image src=”” _builder_version=”4.8.1″ _module_preset=”default” title_text=”DataEngineer_Requirements” hover_enabled=”0″ sticky_enabled=”0″][/et_pb_image][et_pb_text admin_label=”Überschrift” _builder_version=”4.8.1″ _module_preset=”default” min_height=”47px” hover_enabled=”0″ sticky_enabled=”0″]

Required Skills

[/et_pb_text][et_pb_text _builder_version=”4.8.1″ _module_preset=”default” hover_enabled=”0″ sticky_enabled=”0″]

The skill set required of a Data Engineer is holistic and includes many tools and technologies used in combination. If you search for a Data Engineer with complete knowledge of all available data engineering tools and technologies, you have a better chance of finding a unicorn. To keep expectations realistic for a Data Engineer of flesh and blood, the skills can be clustered into these 6 categories.

[/et_pb_text][et_pb_image src=”” _builder_version=”4.8.1″ _module_preset=”default” title_text=”DataEngineer_skillset” hover_enabled=”0″ sticky_enabled=”0″][/et_pb_image][et_pb_text _builder_version=”4.8.1″ _module_preset=”default” hover_enabled=”0″ sticky_enabled=”0″]A Data Engineer’s daily task is to maintain the databases, so they must have good knowledge of DBMS and database systems, both SQL and NoSQL, and of their query languages such as SQL.[/et_pb_text][et_pb_text _builder_version=”4.8.1″ _module_preset=”default” hover_enabled=”0″ sticky_enabled=”0″]At least one programming language such as Python, Scala or Java is a must-have for a Data Engineer. It helps perform statistical analysis and modelling. Which language is required depends on the tools and platforms in use, such as MapReduce, AWS, Azure, Apache Spark or Hadoop, but proficiency in at least one is a must.[/et_pb_text][et_pb_text _builder_version=”4.8.1″ _module_preset=”default” hover_enabled=”0″ sticky_enabled=”0″]Real-time streaming data is another necessity in many organizations, where the most recent data brings significant business value. One example of a real-time streaming use case is ride-sharing surge pricing, where the price depends on current demand or weather conditions, or on flight arrival and departure times when you plan a trip to or from the airport.[/et_pb_text][et_pb_text _builder_version=”4.8.1″ _module_preset=”default” hover_enabled=”0″ sticky_enabled=”0″]Data warehousing enables you to store huge amounts of data from various sources for analytics and is therefore one of the fundamentals. As a Data Engineer you must be proficient in at least one data warehousing tool or platform such as Snowflake, Oracle, Azure or AWS. 
In addition to these, knowledge of operating systems is also important if operations depend on a particular operating system.[/et_pb_text][et_pb_text admin_label=”Überschrift” _builder_version=”4.8.1″ _module_preset=”default” hover_enabled=”0″ sticky_enabled=”0″]
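
The demand-based surge pricing use case mentioned under real-time streaming can be sketched with a sliding window over incoming events. This is a simplified illustration, not how any particular provider computes prices: the `SurgePricer` class, the 60-second window, the baseline demand and the cap are all hypothetical, and a production system would typically sit on a streaming platform rather than a Python list.

```python
from collections import deque


class SurgePricer:
    """Keeps ride requests from the last `window_s` seconds and derives a price multiplier."""

    def __init__(self, window_s=60, base_demand=5, cap=3.0):
        self.window_s = window_s        # length of the sliding window in seconds
        self.base_demand = base_demand  # requests per window that still get the normal price
        self.cap = cap                  # upper bound for the multiplier
        self.events = deque()           # timestamps of requests inside the window

    def on_request(self, timestamp):
        """Process one incoming request event and return the current multiplier."""
        self.events.append(timestamp)
        # Evict events that have fallen out of the sliding window.
        while self.events and self.events[0] <= timestamp - self.window_s:
            self.events.popleft()
        demand = len(self.events)
        return min(self.cap, max(1.0, demand / self.base_demand))


pricer = SurgePricer(window_s=60, base_demand=5)
# Simulated stream of request timestamps (seconds); events must arrive in time order.
stream = [0, 10, 20, 30, 40, 50, 55, 58, 59, 60, 200]
multipliers = [pricer.on_request(t) for t in stream]
print(multipliers)
```

Because each event is processed as it arrives, the multiplier always reflects the most recent demand, which is the point of using streaming rather than batch data here.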


[/et_pb_text][et_pb_text _builder_version=”4.8.1″ _module_preset=”default” hover_enabled=”0″ sticky_enabled=”0″]When looking for the ideal Data Engineer for your team, a keyword match is not enough. The ideal candidate may have only one skill in each category, such as programming, database and data warehouse knowledge, but combines a balanced, holistic skill set with good business understanding and can steer the project in the right direction. [/et_pb_text][et_pb_button button_url=”@ET-DC@eyJkeW5hbWljIjp0cnVlLCJjb250ZW50IjoicG9zdF9saW5rX3VybF9wYWdlIiwic2V0dGluZ3MiOnsicG9zdF9pZCI6IjIwOTQifX0=@” url_new_window=”on” button_text=”Newsletter Anmeldung” button_alignment=”center” _builder_version=”4.8.1″ _dynamic_attributes=”button_url” button_text_color=”#303344″ button_bg_color=”#efd430″ button_border_color=”rgba(0,0,0,0)” button_border_radius=”30px” button_letter_spacing=”2px” button_font=”Poppins|600|||||||” button_use_icon=”off” custom_padding=”16px|32px|16px|32px|true|true” animation_style=”slide” animation_direction=”right” animation_intensity_slide=”5%” button_text_size__hover_enabled=”off” button_text_color__hover_enabled=”off” button_bg_color__hover_enabled=”off” button_border_color__hover_enabled=”off” button_border_radius__hover_enabled=”off” button_letter_spacing__hover=”2px” button_letter_spacing__hover_enabled=”on” button_border_width__hover_enabled=”off”][/et_pb_button][/et_pb_column][/et_pb_row][et_pb_row _builder_version=”4.8.1″ _module_preset=”default” custom_margin=”|auto|-43px|auto||”][et_pb_column type=”4_4″ _builder_version=”4.8.1″ _module_preset=”default”][et_pb_divider color=”#303344″ _builder_version=”4.8.1″ _module_preset=”default”][/et_pb_divider][/et_pb_column][/et_pb_row][/et_pb_section][et_pb_section fb_built=”1″ _builder_version=”4.8.1″ _module_preset=”default” min_height=”199px” custom_padding=”0px|||||”][et_pb_row column_structure=”1_3,2_3″ _builder_version=”4.8.1″ _module_preset=”default” min_height=”218px” 
custom_padding=”||2px|||”][et_pb_column type=”1_3″ _builder_version=”4.8.1″ _module_preset=”default”][et_pb_image src=”” alt=”Peter Schmäling” title_text=”BIld2″ _builder_version=”4.8.1″ _module_preset=”default”][/et_pb_image][/et_pb_column][et_pb_column type=”2_3″ _builder_version=”4.8.1″ _module_preset=”default”][et_pb_blurb title=”Tushar Poojary” _builder_version=”4.8.1″ _module_preset=”default”]

Tushar Poojary is a Junior Solution Architect at HUBSTER.S

[/et_pb_blurb][et_pb_blurb _builder_version=”4.8.1″ _module_preset=”default”]



© 2023 by HUBSTER.S GmbH

