In the information age, data is everything, so it’s no wonder that positions like ‘data scientist’ and ‘data engineer’ have been created. While these are new job titles, the core work has existed for a while in the form of data analysis. In the past, anyone who analyzed data was known as a data analyst, and anyone who built the backend platforms that supported that analysis was known as a ‘business intelligence (BI) developer’.
However, with the emergence of big data, corporations and research facilities needed to handle ever larger volumes of data, and the demand for data scientists and data engineers grew with it. Below is a quick guide to the different roles in the world of big data.
1. The Data Analyst
Analysts turn all forms of data into manageable chunks of information and present it as reports and visual representations. They also tend to have a strong command of the standard professional tools used to solve problems and guide business decisions. From pie charts to bar graphs, visual representations are very useful for making decisions about complex problems when there is too much information to process. However, don’t expect data analysts to analyze big data. They generally don’t have the mathematical background or the technical know-how to develop complex algorithms for specific problems.
Core skills: Statistics, data munging, data visualization and exploratory data analysis.
Tools used: Microsoft Excel, SPSS, SPSS Modeler, SAS, SAS Miner, SQL, Microsoft Access, Tableau, SSAS.
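As a rough illustration of the exploratory analysis described above, here is a minimal Python sketch using only the standard library. The sales figures, the region names, and the helper functions `summarize` and `totals_by_region` are all made up for this example; an analyst would more likely do the same thing in Excel, SQL, or Tableau.

```python
import statistics
from collections import Counter

# Hypothetical records (region, units sold); illustrative data only.
sales = [
    ("North", 120), ("South", 95), ("North", 130),
    ("East", 80), ("South", 105), ("East", 70),
]

def summarize(records):
    """Return basic descriptive statistics for the units column."""
    values = [units for _, units in records]
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),
    }

def totals_by_region(records):
    """Aggregate units per region, the kind of table behind a bar chart."""
    totals = Counter()
    for region, units in records:
        totals[region] += units
    return dict(totals)

summary = summarize(sales)
by_region = totals_by_region(sales)
print(summary)
print(by_region)
```

The per-region totals are exactly the numbers that would feed a bar graph or pie chart in a report.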
2. The Business Intelligence Developers
BI developers are data experts who work with internal stakeholders to understand reporting needs. They collect requirements, then design and build business intelligence and reporting solutions. They are also tasked with designing, developing and supporting new and existing data warehouses, ETL packages, dashboards and analytical reports. They work with SQL to integrate data from different sources. However, BI developers don’t perform data analysis.
Core skills: ETL, developing reports, OLAP, cubes, web intelligence, business objects design.
Tools used: Tableau, dashboard tools, SQL, SSAS, SSIS and SPSS Modeler.
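To make "integrating data from different sources with SQL" a little more concrete, the sketch below joins two tables that stand in for two separate systems and produces a small revenue report. It uses Python's built-in `sqlite3` module with an in-memory database purely for convenience; the table names (`crm_customers`, `billing_invoices`) and their contents are assumptions for this example, not taken from any real system.

```python
import sqlite3

# In-memory database standing in for a reporting warehouse (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE crm_customers (customer_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE billing_invoices (
        invoice_id INTEGER PRIMARY KEY,
        customer_id INTEGER,
        amount REAL
    );
    INSERT INTO crm_customers VALUES (1, 'Acme'), (2, 'Globex');
    INSERT INTO billing_invoices VALUES (10, 1, 250.0), (11, 1, 100.0), (12, 2, 75.5);
""")

# Combine the two sources into one report: invoices and revenue per customer.
report = conn.execute("""
    SELECT c.name, COUNT(i.invoice_id) AS invoices, SUM(i.amount) AS revenue
    FROM crm_customers AS c
    JOIN billing_invoices AS i ON i.customer_id = c.customer_id
    GROUP BY c.name
    ORDER BY revenue DESC
""").fetchall()
conn.close()

for name, invoices, revenue in report:
    print(name, invoices, revenue)
```

In practice the same query shape would run against a warehouse through a tool such as SSIS or SSAS, with a dashboard sitting on top of the result.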
3. The Data Engineers
These engineers create the infrastructure that holds the data analyzed by data scientists. They are software engineers who design, build, integrate and manage big data. They generally work on optimizing the performance of the big data ecosystem within a corporation and ensuring everything is easily accessible and working smoothly. Using tools like MySQL or MongoDB, they might run Extract, Transform and Load (ETL) jobs on large datasets and create data warehouses that data scientists can use for reports and analysis. Data engineers keep themselves busy with the design and architecture of the whole ecosystem and don’t generally dabble with machine learning.
Core skills: Hadoop, MapReduce, Hive, Pig, data streaming, NoSQL, SQL, programming.
Tools used: DashDB, MySQL, MongoDB, Cassandra.
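The Extract, Transform and Load steps mentioned above can be sketched in a few lines of Python. This is only a toy version under stated assumptions: the CSV content, the `orders` table, and the `extract`, `transform` and `load` functions are invented for illustration, and SQLite stands in for a real target such as MySQL. A production pipeline over large datasets would look quite different.

```python
import csv
import io
import sqlite3

# Hypothetical raw export with messy values (stray spaces, a missing amount).
RAW_CSV = """order_id,country,amount
1, us ,19.99
2,DE,5.00
3,us,
4,fr,42.50
"""

def extract(text):
    """Extract: parse the raw CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform: drop rows without an amount, normalize country codes."""
    cleaned = []
    for row in rows:
        amount = row["amount"].strip()
        if not amount:
            continue  # skip incomplete records
        country = row["country"].strip().upper()
        cleaned.append((int(row["order_id"]), country, float(amount)))
    return cleaned

def load(conn, rows):
    """Load: write the cleaned rows into the target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders "
        "(order_id INTEGER PRIMARY KEY, country TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    conn.commit()

conn = sqlite3.connect(":memory:")  # stand-in for a real warehouse
load(conn, transform(extract(RAW_CSV)))
totals = conn.execute(
    "SELECT country, SUM(amount) FROM orders GROUP BY country ORDER BY country"
).fetchall()
conn.close()
print(totals)
```

Keeping the three steps as separate functions mirrors how such jobs are usually organized, so each stage can be tested or swapped out on its own.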
4. Finally, the Data Scientists
There are many sources on the internet that hail data scientists as modern-day alchemists who turn raw data into purified insights. While that may be a bit of an exaggeration, it’s not too far from the truth! Statistics, machine learning, and analysis are the main tools of a data scientist, and they can be used to solve critical business problems. Turning large rivers of big data into valuable and actionable insights is no easy task. Data science is an evolution of the data analysis done in past years, improving on it with automation and machine learning. Data scientists are expected to be veteran programmers with the ability to design new algorithms on the fly. There is a huge demand for these individuals these days.
But what’s the point of all these actionable insights if they cannot be visualized properly? In fact, one of the expectations of this job is the ability to visualize findings using apps or other technologies. For example, think of the Jarvis UI from the Iron Man movies. Telling interesting stories about the solutions to data or business problems becomes part of the job.
Data scientists are also required to understand traditional and new data analysis methods to build statistical models and discover patterns in data.
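As a very small example of what "building a statistical model" can mean, the sketch below fits a straight line to a handful of points with ordinary least squares and uses it to make a prediction. The numbers, the variable names (`ad_spend`, `revenue`) and the `fit_line` helper are assumptions made up for this illustration; real work would involve far more data and usually a dedicated library.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance and variance terms (unnormalized; the factor cancels out).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical observations: advertising spend vs. revenue (illustrative only).
ad_spend = [1.0, 2.0, 3.0, 4.0, 5.0]
revenue = [3.1, 4.9, 7.2, 8.8, 11.0]

slope, intercept = fit_line(ad_spend, revenue)
predicted = slope * 6.0 + intercept  # model's estimate for a spend of 6.0
print(round(slope, 2), round(intercept, 2), round(predicted, 2))
```

The fitted slope is the discovered pattern here: it estimates how much revenue changes for each extra unit of spend.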