IBM Introduction to Data Engineering: Module 4 review

What did I learn in Module 4 of IBM Data Engineering?

Module 4 of IBM Data Engineering is about careers and opportunities in the DE field. I find this module particularly important because I have been researching how to become a data engineer for the past month, and the content on the internet is confusing.
What I learned from the internet over the past month is that data engineering is a highly technical field. There is no sure-shot path to becoming a DE, no standard curriculum, and a lot of noise. I took my time researching for a month before becoming laser-focused on what to do in the shortest time possible. In short, my research pointed me towards learning SQL, NoSQL, PostgreSQL, Python and Java. This module added some more valuable inputs on what to learn.

The question is: why Java and not Scala, when a lot of job requirements ask for either Java or Scala? From what I found, it is painful to learn Scala without a prior understanding of Java. Coming from a non-tech background, it took me time to understand this point, so Java is a must at some point in the future. I decided to learn Java after mastering Python. The reason is that Python is easy and can be learned fast; once I have learned all the concepts, I can build projects in Python while learning the basics of Java. I am half-baked in Python currently.

The first video doesn't tell much except for some website reviews of the DE profession. The second video is about how the presenters got into the DE field. I learned from this video that DE is a highly specialised field. The presenters were database administrators, system administrators or developers who evolved into DEs as business requirements grew on the job. One person became a DE from a business intelligence analyst position. These kinds of transitions can happen when the exposure to data leans towards data modelling.

The third video is about the DE learning path. The presenters focussed on a degree in CS or software engineering but also emphasised that one can be self-taught in this domain. It discusses the different DE domains, i.e. 1) data integration 2) data pipeline 3) data lake 4) data warehouse 5) distributed systems 6) data. It advises developing baseline technical skills first: programming and query languages (Java, Python and SQL), operating systems and database knowledge. Once these skills are in place, pursue advanced skills and move to larger teams and companies where advanced skills are required. I found this video to be the most important in the module. It laid down a clear-cut path of what needs to be done and at what levels. Level 1 is to get your basics right, and level 2 is to build advanced skills in the different DE domains once the basics are right. The third video summed up what I have been researching on the internet for the past month. This is the learning path for anyone who wants to be a DE.

The fourth video is about what employers look for in a DE. This video scares me because, in the opening note, the presenter said that employers look for a breadth of technologies and data migration. Whoops. That's a lot for a beginner from a non-tech background. The more important stuff comes in the latter half of the video, where the presenters say they are looking for a normal human being who is curious enough to learn continuously. Some of the essential skills they look for are SQL, data modelling, ETL methodologies and programming, RDBMS skills, the ability to handle data in multiple file formats, the ability to work with web APIs and web scraping, automation of routine work and basic data analysis. For a beginner, the rest of the video goes all over the place.

The fifth video is about different paths to data engineering. I agree with what these guys are saying here because data engineering is a technical field. It's tough to get into DE directly, unlike data analysis, where you learn a few Python libraries and Power BI and you are done. I found DE to be more niche and technical than a data analyst or business analyst role. And then came the big one: someone who is into software development or ETL development can be a DE candidate. I agree with this because, in a way, DE is a branch of software development where new system designs for data management are built. An ideal path to start a DE career is database administration, which most of the presenters recommended. What is needed is a working knowledge of databases, data sources and data formats, and the ability to build data pipelines. I think one can do a portfolio project around this. System design is another career path towards data engineering.
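To make the portfolio project idea a bit more concrete, here is a minimal sketch of what a tiny data pipeline could look like in Python. It is only an illustration and not something from the course: the sample data, the `orders` table and the function names are all made up, and it uses only the standard library (`csv` and `sqlite3`). It extracts rows from CSV text, cleans them, and loads them into a SQLite database.

```python
import csv
import io
import sqlite3

# Hypothetical sample data standing in for a real CSV source file.
RAW_CSV = """order_id,customer,amount
1,alice,120.50
2,bob,
3,carol,75.00
"""


def extract(text):
    """Extract: read CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: drop rows with a missing amount and cast fields to proper types."""
    cleaned = []
    for row in rows:
        if not row["amount"]:
            continue  # skip incomplete records
        cleaned.append(
            (int(row["order_id"]), row["customer"].title(), float(row["amount"]))
        )
    return cleaned


def load(rows, conn):
    """Load: write the cleaned rows into a SQLite table (illustrative schema)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders "
        "(order_id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    conn.commit()


def run_pipeline(text, conn):
    """Run extract -> transform -> load, then return (row count, total amount)."""
    load(transform(extract(text)), conn)
    return conn.execute("SELECT COUNT(*), SUM(amount) FROM orders").fetchone()


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")  # in-memory database for the demo
    print(run_pipeline(RAW_CSV, connection))
```

A real project would read from actual files or a web API and write to a proper database such as PostgreSQL, but the extract, transform, load shape stays the same.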

The sixth video is about advice for aspiring data engineers. The first presenter advised having strong foundations in SQL, Python, data modelling and ETL methodologies, and also emphasised getting hands-on experience through projects built with open-source materials. Another presenter advised building a database and finding mentors for professional guidance. Another presenter talked about having sound fundamentals of database internals and knowing one procedural language, one OOP language and one functional language like Scala. The presenters also emphasised writing beginner blogs, creating YouTube videos and teaching others. The video got quite confusing, with different people suggesting different paths. My takeaway from the video is what the first presenter said, because that is doable for me in the shortest time possible.

The documentation is about data warehousing profiles, which one can read at leisure. My takeaway from this module was video 3, which clearly and concisely told me what to do and how to do it. DE is a new and highly technical field, so I assume it is both important and hard to clear technical interviews, even at the entry level. From this module, I could deduce that the expectations from candidates vary from person to person. I expect DE interviews not to be straightforward, and the journey is going to be tricky but interesting.
See yaa tomorrow.
