I am writing a script to create a color-mapped scatter plot from the attached data frame.
Here is the code:
biaxial_plot_ICOS_PD1 = sorted_df.plot.scatter(x="ICOS - costimulator:Cyc_14_ch_4",
                                               y="PD-1 - checkpoint:Cyc_12_ch_4",
                                               c="ClusterName", colormap='viridis', s=50)
But I get this error:
ValueError: 'c' argument must be a color, a sequence of colors, or a sequence of numbers, not ['CD4 T cells' 'CD4 T cells' 'CD4 T cells' ... 'CD4 T cells CD45RO ' 'CD4 T cells CD45RO ' 'CD4 T cells GATA3 ']
sorted_df: [attached as an image, not reproduced here]
Answer:
When you pass a column label to the c parameter, the values in that column must be numbers, which are then mapped to colors using the colormap you provided. From the DataFrame.plot.scatter docs:
c : str, int or array-like, optional
The color of each point. Possible values are:
- A single color string referred to by name, RGB or RGBA code, for instance ‘red’ or ‘#a98d19’.
- A sequence of color strings referred to by name, RGB or RGBA code, which will be used for each point’s color recursively. For instance [‘green’,’yellow’] all points will be filled in green or yellow, alternatively.
- A column name or position whose values will be used to color the marker points according to a colormap.
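To see the third option in practice, here is a minimal sketch on a small hypothetical DataFrame (the column names x, y, score and label and their values are made up for illustration). A numeric column name passed to c is mapped through the colormap, while a column of strings fails inside matplotlib with the same kind of ValueError as in the question. It assumes pandas and matplotlib are installed; the Agg backend is selected only so it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import pandas as pd

# Hypothetical toy data standing in for sorted_df
df = pd.DataFrame({
    "x": [1.0, 2.0, 3.0, 4.0],
    "y": [2.0, 1.0, 4.0, 3.0],
    "score": [0.1, 0.5, 0.9, 0.3],  # numeric column: usable as c
    "label": ["CD4 T cells", "B cells", "CD4 T cells", "CD8 T cells"],  # strings
})

# Numeric column name in c: each value is mapped to a color via the colormap
ax = df.plot.scatter(x="x", y="y", c="score", colormap="viridis")

# String column name in c: matplotlib cannot turn these labels into colors
try:
    df.plot.scatter(x="x", y="y", c="label", colormap="viridis")
    string_column_failed = False
except ValueError as err:
    string_column_failed = True
    print(err)  # "'c' argument must be a color, a sequence of colors, ..."
```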
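So c does not simply mean "color by this column": a column name only works when that column holds numbers. If you want to color by categories directly, use seaborn, whose scatterplot accepts a column of categories as hue.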
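For reference, a minimal seaborn sketch of the same plot, assuming seaborn is installed. The toy values are hypothetical; only the three column names come from the question. Passing the string column as hue makes seaborn assign one color per cluster and build a legend with the cluster names.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import pandas as pd
import seaborn as sns

# Hypothetical stand-in for sorted_df, using the question's column names
sorted_df = pd.DataFrame({
    "ICOS - costimulator:Cyc_14_ch_4": [1.0, 2.0, 3.0, 4.0],
    "PD-1 - checkpoint:Cyc_12_ch_4": [2.0, 1.0, 4.0, 3.0],
    "ClusterName": ["CD4 T cells", "B cells", "CD4 T cells", "CD8 T cells"],
})

# hue takes the categorical column directly; palette sets the colors per level
ax = sns.scatterplot(
    data=sorted_df,
    x="ICOS - costimulator:Cyc_14_ch_4",
    y="PD-1 - checkpoint:Cyc_12_ch_4",
    hue="ClusterName",
    palette="viridis",
    s=50,
)
```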
In your case you want to color by ClusterName, so you can use groupby(...).ngroup() to map each distinct ClusterName to an integer; different integers then get different colors from the colormap.
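Here is what ngroup produces on a few hypothetical ClusterName values. Because groupby sorts the keys by default, the codes follow alphabetical order of the names, which is also how you can read the colorbar afterwards.

```python
import pandas as pd

# Hypothetical ClusterName values
df = pd.DataFrame({"ClusterName": ["CD4 T cells", "B cells", "CD4 T cells", "CD8 T cells"]})

# One integer per row: the index of that row's group, 0 .. n_groups - 1
codes = df.groupby("ClusterName").ngroup()
print(codes.tolist())  # [1, 0, 1, 2]

# Reverse lookup to interpret the colorbar ticks
name_for_code = dict(zip(codes.tolist(), df["ClusterName"]))
```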
This should work:
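# Map each distinct ClusterName to an integer (0 .. number of clusters - 1)
cluster_colors = sorted_df.groupby('ClusterName').ngroup()
# Pass the integer codes themselves (not a column name) to c; viridis maps them to colors
biaxial_plot_ICOS_PD1 = sorted_df.plot.scatter(x="ICOS - costimulator:Cyc_14_ch_4",
                                               y="PD-1 - checkpoint:Cyc_12_ch_4",
                                               c=cluster_colors, colormap='viridis', s=50)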
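One limitation of the integer codes is that the colorbar shows 0, 1, 2, ... instead of cluster names. If a legend with names is preferred, a different approach (plain matplotlib, not the pandas c parameter) is to draw one scatter call per cluster, sampling colors from viridis. This is a sketch on hypothetical toy data; only the column names come from the question.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical stand-in for sorted_df, using the question's column names
sorted_df = pd.DataFrame({
    "ICOS - costimulator:Cyc_14_ch_4": [1.0, 2.0, 3.0, 4.0],
    "PD-1 - checkpoint:Cyc_12_ch_4": [2.0, 1.0, 4.0, 3.0],
    "ClusterName": ["CD4 T cells", "B cells", "CD4 T cells", "CD8 T cells"],
})
x_col = "ICOS - costimulator:Cyc_14_ch_4"
y_col = "PD-1 - checkpoint:Cyc_12_ch_4"

groups = sorted_df.groupby("ClusterName")
# Evenly spaced colors from viridis, one per cluster
colors = plt.get_cmap("viridis")(np.linspace(0, 1, groups.ngroups))

fig, ax = plt.subplots()
for color, (name, group) in zip(colors, groups):
    # color= (not c=) applies one RGBA color to every point in this cluster
    ax.scatter(group[x_col], group[y_col], color=color, s=50, label=name)
ax.set_xlabel(x_col)
ax.set_ylabel(y_col)
ax.legend(title="ClusterName")
```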

