First Things First: The Data Analysis Process

Hardikkumar
Feb 9, 2021
Photo by Campaign Creators on Unsplash

Data analysis follows a step-by-step process. Each stage requires different skills and knowledge. To get meaningful insights, though, it’s important to understand the process as a whole.

In this article, we’ll explore the main steps in the data analysis process: how to define your goal, collect and clean your data, carry out an analysis, and present your findings. By the end, you’ll have a much better understanding of the basics.

Ready? Let’s get started…

Data Analysis Process
  1. Defining the question
  2. Collecting the data
  3. Cleaning the data
  4. Analyzing the data
  5. Visualize & Present your data

Defining the question

The first step in data analysis is to define your objective, sometimes called the “problem statement”.

Defining your objective means coming up with a hypothesis and figuring out how to test it. Start by asking: What business problem am I trying to solve? While this might sound straightforward, it can be trickier than it seems. For instance, your organization’s management might pose a question such as “Why are we losing customers?” It’s possible, though, that this doesn’t get to the core of the problem. As data analysts, we need to understand the business and its goals in enough depth to frame the problem the right way.

Let’s say you are working for a gaming company. This company creates top games for its players. While it is excellent at attracting millions of new players, far fewer of them stay active from day to day. As such, your question might not be “Why are we losing players?” but “Which factors are negatively impacting the user experience?”

Now that you’ve defined the problem, you need to determine why users churn so quickly. Is it a poor user experience? Do they get bored? Or is it something else?
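
Framing the question also tells you what to measure. For example, one way to put a number on “players don’t stick around” is day-N retention: the share of players who installed the game and came back N days later. Here’s a minimal sketch in Python, assuming a hypothetical activity log of (player_id, day) pairs where day 0 is the install day:

```python
# Hypothetical activity log: (player_id, day_number), where day 0 is the install day.
events = [
    ("p1", 0), ("p1", 1), ("p1", 2),
    ("p2", 0),
    ("p3", 0), ("p3", 1),
    ("p4", 0),
]

def day_n_retention(events, n):
    """Share of players active on day 0 who are also active on day n."""
    installed = {player for player, day in events if day == 0}
    returned = {player for player, day in events if day == n and player in installed}
    return len(returned) / len(installed) if installed else 0.0

print(f"Day-1 retention: {day_n_retention(events, 1):.0%}")  # 2 of 4 players -> 50%
```

Tracking a number like this over time gives you a concrete way to test whether a change actually helps players stick around.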

Collecting the data

Once you’ve established your objective, you’ll need to create a strategy for collecting and aggregating the appropriate data. A key part of this is determining which data you need. This might be quantitative (numeric) data, e.g. level difficulty, or qualitative (descriptive) data, such as player reviews. Data sources generally fall into one of three categories: first-party, second-party, and third-party data. Let’s explore each one.
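
Raw tracking data usually arrives as one row per event or session, so “aggregating” often means rolling it up to one row per player. The sketch below is a minimal illustration using only the Python standard library; the field names (player_id, minutes, level_reached, review) are hypothetical. It keeps the quantitative fields as numbers and collects the qualitative review text separately:

```python
from collections import defaultdict

# Hypothetical raw session records: numeric (quantitative) fields plus free-text (qualitative) reviews.
sessions = [
    {"player_id": "p1", "minutes": 30, "level_reached": 4, "review": "Fun but level 5 is too hard"},
    {"player_id": "p1", "minutes": 12, "level_reached": 5, "review": ""},
    {"player_id": "p2", "minutes": 8, "level_reached": 2, "review": "Got bored"},
]

def aggregate_by_player(sessions):
    """Roll session-level rows up to one summary row per player."""
    summary = defaultdict(lambda: {"sessions": 0, "total_minutes": 0, "max_level": 0, "reviews": []})
    for s in sessions:
        row = summary[s["player_id"]]
        row["sessions"] += 1
        row["total_minutes"] += s["minutes"]
        row["max_level"] = max(row["max_level"], s["level_reached"])
        if s["review"]:  # keep qualitative feedback, skipping empty strings
            row["reviews"].append(s["review"])
    return dict(summary)

for player, row in aggregate_by_player(sessions).items():
    print(player, row)
```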

What is first-party data?

First-party data is data that you, or your company, have collected directly from your players. It might come in the form of transactional tracking data or information about each player’s in-game state. Whatever its source, first-party data is usually structured and organized in a clear, defined way. Other sources of first-party data might include player satisfaction surveys, rewarded tasks, or direct observation.

What is second-party data?

To enrich your analysis, you might want to secure a secondary data source. Second-party data is the first-party data of other organizations. This might be available directly from the company or through a private marketplace. The main benefit of second-party data is that it is usually structured and, although it will be less directly relevant than first-party data, it also tends to be quite reliable. Examples of second-party data include website, app, or social media activity, such as online purchase histories or shipping data.

What is third-party data?

Third-party data is data that has been collected and aggregated from numerous sources by a third-party organization. Often (though not always) third-party data contains a vast amount of unstructured data points (big data). Many organizations collect big data to create industry reports or to conduct market research. The research and advisory firm Gartner is a good real-world example of an organization that collects big data and sells it on to other companies. Open data repositories and government portals are also sources of third-party data.

Cleaning the data

Once you’ve collected your data, the next step is to get it ready for analysis. This is known as “cleaning” or “scrubbing” the data. Data can suffer from many kinds of quality issues, which is why cleaning is one of the most time-consuming steps of data analysis. For example, there could be formatting errors (e.g. merged rows or columns), missing values, duplicate rows, spelling inconsistencies, and so on.

Although data cleaning is often done in a somewhat ad hoc way and it is difficult to define a single structured process, it typically involves the steps below. We’ll study data cleaning in more detail in the next article.

  1. Fix rows and columns
  2. Fix missing values
  3. Standardize values
  4. Fix invalid values
  5. Filter data

If your data isn’t clean, expect to spend a large share of your time cleaning it; figures of 70–90% are often quoted. This might sound excessive, but focusing on the wrong data points (or analyzing inaccurate data) will severely impact your results.
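
To make the five steps concrete, here’s a minimal sketch in plain Python. It’s an illustration, not a general-purpose cleaner: the columns, the merged “name|country” field, and the rule that playtime can’t be negative are all hypothetical assumptions chosen to show one fix per step.

```python
# Hypothetical raw rows: "player" merges name and country, formatting is inconsistent,
# one row is duplicated, and playtime is missing in one row and invalid in another.
raw = [
    {"player": "alice|US", "platform": "iOS ", "minutes": "35"},
    {"player": "bob|uk", "platform": "android", "minutes": ""},
    {"player": "alice|US", "platform": "iOS ", "minutes": "35"},
    {"player": "carol|DE", "platform": "Android", "minutes": "-5"},
]

def clean(rows):
    cleaned, seen = [], set()
    for row in rows:
        # 1. Fix rows and columns: split the merged field into two columns.
        name, country = row["player"].split("|")
        # 2. Fix missing values: represent an empty playtime as None instead of "".
        minutes = int(row["minutes"]) if row["minutes"].strip() else None
        # 3. Standardize values: consistent case, no stray whitespace.
        record = {
            "name": name.strip().title(),
            "country": country.strip().upper(),
            "platform": row["platform"].strip().lower(),
            "minutes": minutes,
        }
        # 4. Fix invalid values: negative playtime is impossible, so mark it as missing.
        if record["minutes"] is not None and record["minutes"] < 0:
            record["minutes"] = None
        # 5. Filter data: drop duplicate records after standardization.
        key = tuple(record.values())
        if key not in seen:
            seen.add(key)
            cleaned.append(record)
    return cleaned

for record in clean(raw):
    print(record)
```

Filtering duplicates last means rows that differ only in formatting (for example “iOS ” vs “ios”) are also caught as duplicates.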

Analyzing the data

Finally, you’ve cleaned your data. Now comes the fun bit — analyzing it! The type of data analysis you carry out largely depends on what your goal is. But there are many techniques available. Univariate or bivariate analysis, time-series analysis, and regression analysis are just a few you might have heard of. More important than the different types, though, is how you apply them. This depends on what insights you’re hoping to gain. Broadly speaking, all types of data analysis fit into one of the following four categories.
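
As an example of bivariate analysis, you might check whether harder levels go hand in hand with more players quitting. The sketch below computes the Pearson correlation coefficient by hand; the difficulty scores and quit rates are made-up illustration values, not real data.

```python
import math

# Hypothetical per-level data: difficulty score (1-10) and share of players who quit on that level.
difficulty = [2, 3, 5, 6, 8, 9]
quit_rate = [0.02, 0.03, 0.05, 0.08, 0.12, 0.15]

def pearson(xs, ys):
    """Pearson correlation: covariance divided by the product of the standard deviations."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

r = pearson(difficulty, quit_rate)
print(f"r = {r:.2f}")  # close to +1 for this data: harder levels coincide with more quitting
```

A value near +1 means the two move together, but correlation alone doesn’t prove that difficulty causes players to quit; that’s a question for diagnostic analysis.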

Descriptive analysis

  • Descriptive analysis identifies what has already happened.
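
A minimal descriptive sketch using Python’s built-in statistics module, assuming a hypothetical week of daily active player counts:

```python
import statistics

# Hypothetical daily active players over one week.
daily_active = [12000, 11500, 11800, 10900, 10400, 13100, 13600]

# Summarize what has already happened.
summary = {
    "mean": statistics.mean(daily_active),
    "median": statistics.median(daily_active),
    "min": min(daily_active),
    "max": max(daily_active),
    "stdev": statistics.stdev(daily_active),  # sample standard deviation
}
for name, value in summary.items():
    print(f"{name:>6}: {value:,.0f}")
```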

Diagnostic analysis

  • Diagnostic analysis focuses on understanding why something has happened.
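
One simple diagnostic technique is to split players into segments and compare an outcome between them. The sketch below assumes hypothetical per-player flags for whether the tutorial was finished and whether the player was still active on day 7:

```python
# Hypothetical per-player records.
players = [
    {"finished_tutorial": True, "active_day_7": True},
    {"finished_tutorial": True, "active_day_7": True},
    {"finished_tutorial": True, "active_day_7": False},
    {"finished_tutorial": False, "active_day_7": False},
    {"finished_tutorial": False, "active_day_7": False},
    {"finished_tutorial": False, "active_day_7": True},
    {"finished_tutorial": False, "active_day_7": False},
]

def retention_by_segment(players, key):
    """Day-7 retention rate for each value of a segmenting field."""
    totals, retained = {}, {}
    for p in players:
        segment = p[key]
        totals[segment] = totals.get(segment, 0) + 1
        retained[segment] = retained.get(segment, 0) + int(p["active_day_7"])
    return {segment: retained[segment] / totals[segment] for segment in totals}

print(retention_by_segment(players, "finished_tutorial"))
```

A large gap between segments tells you where to dig deeper; on its own it doesn’t prove that the tutorial is the cause.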

Predictive analysis

  • Predictive analysis allows you to identify future trends based on historical data.
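
The simplest predictive model is a straight-line trend fitted to historical data. Here’s a minimal ordinary least-squares sketch in plain Python, assuming hypothetical weekly active player counts; real forecasts usually need models that also handle seasonality and uncertainty.

```python
# Hypothetical weekly active players (in thousands) for the last six weeks.
weeks = [1, 2, 3, 4, 5, 6]
active = [100, 96, 93, 88, 85, 81]

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return slope, intercept

slope, intercept = fit_line(weeks, active)
forecast_week_8 = slope * 8 + intercept  # extrapolate the trend two weeks ahead
print(f"Trend: {slope:+.1f}k players/week; week 8 forecast: {forecast_week_8:.0f}k")
```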

Prescriptive analysis

  • Prescriptive analysis allows you to make recommendations for the future.
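
Prescriptive analysis can range from full optimization models to simple decision rules. As a minimal rule-based sketch, assuming hypothetical per-level stats and an assumed 10% quit-rate threshold, you could turn your findings into a ranked list of recommendations:

```python
# Hypothetical per-level stats: players who reached the level and the share who quit there.
levels = {
    3: {"reached": 50000, "quit_rate": 0.04},
    5: {"reached": 42000, "quit_rate": 0.18},
    7: {"reached": 30000, "quit_rate": 0.09},
    9: {"reached": 21000, "quit_rate": 0.22},
}

QUIT_THRESHOLD = 0.10  # assumed cut-off; in practice this would come from business targets

def recommend(levels, threshold):
    """Recommend reviewing levels whose quit rate exceeds the threshold, biggest loss first."""
    flagged = [
        (level, round(stats["reached"] * stats["quit_rate"]))
        for level, stats in levels.items()
        if stats["quit_rate"] > threshold
    ]
    flagged.sort(key=lambda item: item[1], reverse=True)
    return [f"Review difficulty of level {level} (~{lost:,} players lost)" for level, lost in flagged]

for line in recommend(levels, QUIT_THRESHOLD):
    print(line)
```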

Visualize & Present your data

Oh yeah! Now you’ve finished your analyses and have your insights. The final step of the data analytics process is to share these insights with your team using data visualizations or reports. Data visualization is the graphical representation of information and data. By using visual elements like charts, graphs, and maps, data visualization provides an accessible way to see and understand trends, outliers, and patterns in data.
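
In practice you’d use a charting tool or a plotting library such as matplotlib, but the core idea, mapping numbers to visual length, can be sketched with a plain-text bar chart. The retention figures below are hypothetical:

```python
# Hypothetical day-N retention rates to present.
retention = {"Day 1": 0.45, "Day 7": 0.20, "Day 30": 0.08}

def text_bar_chart(data, width=40):
    """Render each value as a bar whose length is proportional to the largest value."""
    largest = max(data.values())
    label_width = max(len(label) for label in data)
    lines = []
    for label, value in data.items():
        bar = "#" * round(value / largest * width)
        lines.append(f"{label:<{label_width}} | {bar} {value:.0%}")
    return "\n".join(lines)

print(text_bar_chart(retention))
```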

In the world of big data, data visualization tools and technologies are essential for analyzing massive amounts of information and making data-driven decisions.

Summary

In this article, we’ve covered the main steps of the data analytics process. These core steps can be amended, re-ordered and re-used as you deem fit, but they underpin every data analyst’s work:

  • Define the question — What business problem are you trying to solve? Frame it as a question to help you focus on finding a clear answer.
  • Collect data — Create a strategy for collecting data. Which data sources are most likely to help you solve your business problem?
  • Clean the data — Explore, scrub, tidy, de-dupe, and structure your data as needed. Do whatever you have to! But don’t rush…take your time!
  • Analyze the data — Carry out various analyses to obtain insights. Focus on the four types of data analysis: descriptive, diagnostic, predictive, and prescriptive.
  • Present your results — How best can you share your insights and recommendations? A combination of visualization tools and communication is key.

Thanks for reading! If you liked this article, please give it an upvote :)
