
Scatter Diagrams: How Can They Help You Analyze Your Data?


Scatter diagrams are excellent for analyzing relationships between variables, and I've used them many times to uncover hidden patterns in industrial data. You'll be surprised how much you can learn from these simple plots. So let's look at how scatter plots can help you analyze your process data and identify opportunities for improvement.

Understanding Scatter Diagrams

Throughout my career as an engineer, scatter plots have helped me spot insights in data that I would otherwise have missed. A scatter plot displays two variables on a coordinate plane, with each point representing one pair of measurements taken from the same observation.

Typically, you’ll see the independent variable on the x-axis and the dependent variable on the y-axis. As you look at the data points scattered across the graph, you can begin to identify potential relationships between the variables.

Scatter plots can reveal a few different types of relationships:

  • Positive relationship: As one variable increases, the other tends to increase as well.
  • Negative relationship: As one variable increases, the other tends to decrease.
  • No relationship: The variables don't appear to be related.
  • Nonlinear relationship: The variables are related, but the points follow a curve rather than a straight line.

You'll see scatter plots in all sorts of industries. Engineers use them to analyze equipment performance, economists use them to examine how two economic indicators move together, and in healthcare a doctor might use one to explore the relationship between diet and health outcomes.

Plotting Data Points on a Graph

To create a scatter plot that communicates useful information, follow a deliberate process. Here's the step-by-step approach I've honed over the years:

  1. Select the variables you want to analyze. Make sure both are measurable and relevant to the question you're trying to answer.
  2. Gather data on those variables. Make sure it's accurate and representative of the process or problem you're studying.
  3. Choose scales for the x and y axes that fit all the data points without wasting space.
  4. Plot each pair of observations from the dataset you compiled in step 2 as a single point.
  5. Label the x and y axes clearly, including units wherever they matter.
  6. Give the plot a title that tells viewers what they're looking at.
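To make steps 3 through 6 concrete, here's a minimal, dependency-free Python sketch that scales both axes to the data range, places each (x, y) pair on a text grid, and adds a title and axis labels. The function `text_scatter` and the sample readings in the usage lines are hypothetical; for real work you'd use a proper charting tool.

```python
def text_scatter(xs, ys, width=40, height=12, title="", x_label="x", y_label="y"):
    """Render paired observations as a text scatter plot (hypothetical helper)."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("xs and ys must be non-empty and the same length")

    # Step 3: scale each axis to the data range so every point fits.
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    x_span = (x_max - x_min) or 1  # avoid division by zero for constant data
    y_span = (y_max - y_min) or 1

    grid = [[" "] * width for _ in range(height)]

    # Step 4: each (x, y) pair becomes one point on the grid.
    for x, y in zip(xs, ys):
        col = round((x - x_min) / x_span * (width - 1))
        row = (height - 1) - round((y - y_min) / y_span * (height - 1))  # y grows upward
        grid[row][col] = "*"

    # Steps 5 and 6: a title, plus axis labels that show each axis range.
    lines = [title, f"{y_label} ({y_min} to {y_max})"]
    lines += ["|" + "".join(row) for row in grid]
    lines.append("+" + "-" * width)
    lines.append(f"{x_label} ({x_min} to {x_max})")
    return "\n".join(lines)


# Hypothetical readings, for illustration only.
speed = [10, 12, 15, 18, 20, 25]
output = [95, 110, 140, 170, 185, 230]
print(text_scatter(speed, output, title="Output vs. line speed",
                   x_label="Line speed (m/min)", y_label="Output (units/h)"))
```

Each point maps to a single grid cell, so two readings that land in the same cell print as one `*`. That is the same overplotting problem that jittering addresses.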

There are plenty of tools you can use to create scatter plots. Here are a few I’ve used throughout my career:

  • Microsoft Excel: Great for simple scatter plots
  • R: Excellent if you’re performing statistical analysis and visualization
  • Python with matplotlib: Allows you to customize anything
  • Tableau: Great if you want to create an interactive visualization

Just remember: the best tool is the one you know well that can also handle the task at hand.

Analyzing Scatter Plots

Interpreting a scatter plot is a lot like detective work: you're searching for the patterns and insights hidden in the data.

Start by observing the general shape formed by the data points. Do they trend upward or downward? Do they follow a straight line, bend into a curve, or split into separate clusters? These initial observations give you a high-level understanding of the relationship between the two variables.

Then, evaluate the strength of the relationship. If the points cluster tightly around a line or curve, the correlation is likely strong. If they're scattered with no discernible pattern, the correlation is probably weak or nonexistent.

Also, keep an eye out for outliers: data points that don't fit the overall pattern. In my experience, outliers are often the most interesting points, because they either reveal something unusual about the process or point to a data collection error.

The most common mistakes when interpreting scatter plots are:

  • Assuming causation from correlation
  • Failing to identify non-linear data relationships
  • Analyzing the data in a vacuum
  • Drawing incorrect conclusions from small sample sizes

Always maintain a healthy sense of skepticism and analyze your findings within the context of the broader data set.

Types of Correlations in Scatter Diagrams

I’ve seen each of these types of correlations in scatter plots throughout my career, and being able to identify them will make you a better data analyst.

  • Positive correlation is when both variables tend to increase together. For example, in a manufacturing plant you might find a positive correlation between production speed and output quantity.

  • Negative correlation is when one variable increases as the other decreases. I commonly see this in maintenance, where more preventive maintenance tends to go hand in hand with fewer equipment failures.

  • No correlation is when there's no apparent relationship between the variables. That doesn't mean the data is useless; it may be a sign that you should look for other variables that actually drive the outcome you care about.

  • Nonlinear relationships are interesting to spot because you’ll still see a clear relationship, just not a straight line. In condition monitoring, I’ve flagged nonlinear relationships between temperature and the efficiency of the equipment.

Statistical Analysis of Scatter Diagrams

Statistical analysis adds rigor to your interpretation of scatter plot patterns, backing up your visual, qualitative impressions with quantitative evidence.

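Correlation coefficients quantify the strength and direction of a relationship. The most widely used, Pearson's r, ranges from -1 (perfect negative) through 0 (no linear correlation) to 1 (perfect positive). Because it only measures linear association, a strong curved relationship can still produce an r near 0.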
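For reference, Pearson's r for n paired observations (x_i, y_i), with sample means x̄ and ȳ, is:

```latex
r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
         {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}
```

The numerator is large and positive when x and y tend to sit above (or below) their means at the same time, and large and negative when one tends to be above its mean while the other is below. The denominator rescales the result so it always falls between -1 and 1.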

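One simple way to check whether a visible pattern is more than chance is a median-line test:

  • Draw a vertical line at the median of x and a horizontal line at the median of y, dividing the plot into four quadrants.
  • Count the data points in each quadrant, excluding any points that fall on a median line.
  • Add the counts of diagonally opposite quadrants to get two totals: A (upper right plus lower left) and B (upper left plus lower right).
  • Let Q be the smaller of A and B, and let N = A + B.
  • Compare Q to the threshold for N from a trend-test table. If Q is below the threshold, the variables are likely related; otherwise, the pattern could be due to chance.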
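Here's a minimal Python sketch of the median-line test. One assumption to flag: rather than looking Q up in a printed table, it substitutes an exact two-sided sign-test (binomial) probability for Q given N. That substitution is my choice and is not a reproduction of any specific published table. The function name `median_line_test` is hypothetical.

```python
import statistics
from math import comb


def median_line_test(xs, ys):
    """Quadrant counts around the medians, plus a sign-test p-value (assumed stand-in for a table)."""
    x_med = statistics.median(xs)
    y_med = statistics.median(ys)

    upper_right = lower_left = upper_left = lower_right = 0
    for x, y in zip(xs, ys):
        if x == x_med or y == y_med:
            continue  # points on a median line are not counted
        if x > x_med and y > y_med:
            upper_right += 1
        elif x < x_med and y < y_med:
            lower_left += 1
        elif x < x_med and y > y_med:
            upper_left += 1
        else:
            lower_right += 1

    # Diagonally opposite quadrants: A supports a positive trend, B a negative one.
    a = upper_right + lower_left
    b = upper_left + lower_right
    q, n = min(a, b), a + b

    # Two-sided probability of a split at least this lopsided if x and y were unrelated.
    p = 1.0 if n == 0 else min(1.0, 2 * sum(comb(n, k) for k in range(q + 1)) / 2 ** n)
    return {"A": a, "B": b, "Q": q, "N": n, "p": p}
```

A small p-value plays the same role as "Q is below the threshold": the split between A and B is too lopsided to be explained easily by chance.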

Regression analysis takes this a step further, allowing you to predict values of one variable based on the other. It’s a useful tool, but keep in mind that correlation does not equal causation, and there could be other factors influencing your data.
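As a minimal sketch of simple linear regression, the function below fits a straight line by ordinary least squares and reports R², the share of the variation in y that the line explains. The names `fit_line` and `predict` are hypothetical, and the approach assumes the relationship is roughly linear.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept; returns (slope, intercept, r_squared)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two paired observations")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be identical")
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))

    slope = sxy / sxx
    intercept = y_bar - slope * x_bar

    # R^2 = 1 - (residual sum of squares / total sum of squares); undefined if y never varies.
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - y_bar) ** 2 for y in ys)
    r_squared = float("nan") if ss_tot == 0 else 1 - ss_res / ss_tot
    return slope, intercept, r_squared


def predict(slope, intercept, x):
    """Predicted y for a given x; most trustworthy inside the range of the fitted data."""
    return slope * x + intercept
```

A high R² only says the line fits the observed points well. It says nothing about whether x causes y, and predictions far outside the observed x range are extrapolations.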

Advanced Techniques for Scatter Plots

As you gain more experience with scatter plots, there are advanced techniques you can use to squeeze even more information out of them.

  • LOESS (locally estimated scatterplot smoothing) is a technique I've found helpful for visualizing nonlinear relationships. It fits a smooth curve through your data by running many small weighted regressions over local neighborhoods of points, which helps you see trends that aren't immediately obvious in a traditional scatter plot.

  • Jittering addresses overplotting in dense scatter plots. It adds small random offsets to data points so they don't stack directly on top of each other, which is especially valuable when a dataset contains many duplicate or rounded values.

  • You can encode additional variables in your scatter plots using color and marker shape. This effectively turns a 2D plot into a multidimensional visual. I've used this technique to compare equipment performance across several manufacturers and models at the same time.

  • Bubble plots take this a step further by encoding another variable in the size of each point. They're harder to read, but used sparingly, they can pack a lot of valuable information into a single chart.
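The LOESS and jittering techniques above are simple enough to sketch in plain Python. The first function is a stripped-down LOESS: for each x it fits a weighted straight line to the nearest fraction of points using tricube weights, and it omits the robustness iterations that full implementations add (statsmodels' `lowess` function is one complete implementation). The second function adds small uniform random offsets for jittering. The function names, the default `frac`, and the jitter amount are assumptions for illustration.

```python
import math
import random


def loess(xs, ys, frac=0.5):
    """Local linear smoothing with tricube weights (LOESS without robustness iterations)."""
    n = len(xs)
    k = max(2, math.ceil(frac * n))  # number of nearest points that define each neighborhood
    smoothed = []
    for x0 in xs:
        dists = sorted(abs(x - x0) for x in xs)
        h = dists[min(k, n) - 1] or 1e-12  # bandwidth: distance to the k-th nearest point
        # Tricube weight: 1 at x0, falling smoothly to 0 at the bandwidth edge.
        w = [(1 - (abs(x - x0) / h) ** 3) ** 3 if abs(x - x0) < h else 0.0 for x in xs]
        sw = sum(w)  # always >= 1 because x0 itself gets weight 1
        xm = sum(wi * x for wi, x in zip(w, xs)) / sw
        ym = sum(wi * y for wi, y in zip(w, ys)) / sw
        sxx = sum(wi * (x - xm) ** 2 for wi, x in zip(w, xs))
        sxy = sum(wi * (x - xm) * (y - ym) for wi, x, y in zip(w, xs, ys))
        slope = sxy / sxx if sxx > 0 else 0.0
        smoothed.append(ym + slope * (x0 - xm))  # value of the local weighted line at x0
    return smoothed


def jitter(values, amount, seed=None):
    """Add uniform random noise in [-amount, amount] so stacked points separate visually."""
    rng = random.Random(seed)
    return [v + rng.uniform(-amount, amount) for v in values]
```

A larger `frac` gives a smoother curve; a smaller one follows local wiggles more closely. Apply jitter only to the copy of the data you plot, never to the values you analyze, and keep `amount` small relative to the spacing between real measurement values.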

Closing Remarks

Scatter diagrams are excellent visual tools for analyzing relationships between variables. They're great for spotting patterns, correlations, and outliers in data sets, and I've used them extensively throughout my engineering career to optimize equipment performance and predict failures. Statistical tools like correlation coefficients and regression will help you dig deeper into the data, but remember their limits: a correlation coefficient such as Pearson's r only captures linear relationships, and neither technique can prove causation.

Advanced techniques like LOESS smoothing and jittering can help you work around some of these limitations by revealing curved trends and overlapping points that a plain scatter plot hides. Together, these tools help you make data-driven decisions in your continuous improvement projects.
