Scatter Plot Binary Data Color Coded Points from Data Labels-CodePudding

I'd like to use matplotlib.pyplot.scatter to create a scatter plot similar to the picture below from data in a dataframe with a header that is formatted similar to the table here where all the points for a given sample are colored based on the label in the first column of the data and a point is only plotted for each gene with a value of 1 - no point for the genes with a 0 value:

label	gene a	gene b	gene c	gene d
1	0	1	0	0
0	1	1	0	1
0	0	0	1	0
1	0	0	0	0
1	0	1	0	0

Note: my sample data does not match my sample scatter plot output.

CodePudding user response：

After melting your dataframe to a long format you can draw a matrix with seaborn's

With the melted dataframe you can access plt.scatter directly from pandas but I think you have to add your own custom legend for the labels.

df.plot(x='variable', y='sample', s=(df.value 0.1) * 300, kind='scatter',
        ylim=[df['sample'].max() .5, df['sample'].min()-.5], # uncomment to flip y-axis
        figsize=(7,6), c='label', cmap='coolwarm', colorbar=False
);