Scatterplot Colored by Third Variable Issue

Time:02-05

I am trying to color a scatterplot based on a third variable, in this instance by state. I have tried multiple options all returning error messages or no color variation. I am using Jupyter Notebook and matplotlib to complete this analysis.

Here is my code:

West_df = merged_df.loc[(merged_df["State"]=="Washington") | (merged_df["State"]=="Oregon") | (merged_df["State"]=="California") | (merged_df["State"]=="Idaho") | (merged_df["State"]=="Nevada") | (merged_df["State"]=="Utah") | (merged_df["State"]=="Arizona") | (merged_df["State"]=="Alaska") | (merged_df["State"]=="Hawaii"),:]
West_county = West_df["County"]
West_population = West_df["Population"]
West_state = West_df.groupby("State").count()
plt.scatter(West_county, West_population, c=?)
West_state.head()

I know I need a color (c) added into my plt.scatter, it's just that I am not sure how to format this so that I can get it to work. Any help would be appreciated! Thank you.

CodePudding user response:

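You can use color to indicate a third variable, in this case state. The c argument of plt.scatter expects numbers (or valid color values), so raw state strings will raise an error; convert them to numeric codes first. For example, assuming abbrev is a pandas Series of state abbreviations:

plt.scatter(population, area, c=abbrev.astype("category").cat.codes)

This gives each state its own color from the colormap.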
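
One way to make this concrete is to convert the state names to integer codes before passing them to c, since c expects numbers or color values. Below is a minimal sketch using a small made-up DataFrame; the column names, values, and the viridis colormap are illustrative assumptions, not the asker's data.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in data; all values are made up for illustration
df = pd.DataFrame({
    "State": ["Oregon", "Utah", "Oregon", "Idaho"],
    "Population": [100, 250, 175, 90],
    "Area": [10.0, 22.5, 14.0, 8.0],
})

# Convert state names to integer codes; categories are sorted alphabetically,
# so Idaho -> 0, Oregon -> 1, Utah -> 2
codes = df["State"].astype("category").cat.codes

fig, ax = plt.subplots()
# c receives one number per point; the colormap turns each number into a color
points = ax.scatter(df["Population"], df["Area"], c=codes, cmap="viridis")
```

In a Jupyter Notebook the figure should display when the cell finishes; in a plain script, plt.show() would be needed.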

CodePudding user response:

I was able to get it to work by doing the following:

import matplotlib.pyplot as plt
import mplcursors

#Sorting through regional data for easier referencing
West = ["Washington","Oregon","California","Idaho","Nevada","Utah","Arizona","Alaska","Hawaii"]
West_df = merged_df.loc[merged_df["State"].isin(West), :]

#Variables
West_co = West_df["County"]
West_pop = West_df["Population"]
#Encode each state as an integer code (categories are sorted alphabetically)
West_cat = West_df["State"].astype("category")

#Plot Graph
plt.figure(figsize=(8,8))
scatter = plt.scatter(West_co, West_pop, s=150, c=West_cat.cat.codes)
mplcursors.cursor(hover=True)
plt.ylabel("Population")
#Take the legend labels from the categories so each name matches its color code
plt.legend(loc="lower center", bbox_to_anchor=(.50, -0.20), ncol=2,
           handles=scatter.legend_elements()[0],
           labels=list(West_cat.cat.categories),
           title="State")
scatter.axes.get_xaxis().set_visible(False)
plt.title("United States Region: West | County versus Population")

The answer was found through:

https://datavizpyr.com/how-to-color-scatterplot-by-a-variable-in-matplotlib/

Extremely helpful site!
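
An alternative that avoids matching legend labels to color codes by hand is to call scatter once per state and let matplotlib build the legend from the label of each call. This is only a sketch: west_df below is a hypothetical stand-in for West_df, with made-up counties and populations.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in for West_df; all values are made up for illustration
west_df = pd.DataFrame({
    "State": ["Oregon", "Utah", "Oregon", "Idaho"],
    "County": ["County A", "County B", "County C", "County D"],
    "Population": [100, 250, 175, 90],
})

fig, ax = plt.subplots(figsize=(8, 8))
# One scatter call per state: each call takes the next color in the default
# color cycle, and its label becomes the legend entry for that state
for state, group in west_df.groupby("State"):
    ax.scatter(group["County"], group["Population"], s=150, label=state)

ax.set_ylabel("Population")
ax.legend(title="State")
```

The trade-off is one scatter call per state instead of a single call, which is fine for a handful of states and keeps each legend entry tied to the points it describes.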
