Multiindexing by columns from .csv-CodePudding

I have some data loaded from a .dat file that looks like this in a .csv file.

A           B           C       
X   Y   Z   X   Y   Z   X   Y   Z
1   4   7   10  13  16  19  22  25
2   5   8   11  14  17  20  23  26
3   6   9   12  15  18  21  24  27

I am using the following code to import this data

df = read_csv('data.csv' , delimeter = '\t' , header = [0,1])

the output i get from this is:

A   un1 un2 B   un3 un4 C   un5 un6
X   Y   Z   X   Y   Z   X   Y   Z
1   4   7   10  13  16  19  22  25
2   5   8   11  14  17  20  23  26
3   6   9   12  15  18  21  24  27

note: un1 is Unnamed 1_Level_0 and so on

I would like to be able to multiindex so I could read print(df['A']['Z']) so that i can get the output:

7
8
9

I imagine that there is a way that I could fill the unnamed columns with the previous column name if they contain the str 'Unnamed' but this seems like a bad way to do it. Is there any way in pandas to use multiindexing to help solve this problem?

CodePudding user response：

You can do that by first replacing the Unnamed: * columns with NA, then call ffill to propagate the previous index value:

# Some sample data
from io import StringIO

string = """
A           B              C         
X   Y   Z   X    Y    Z    X    Y    Z
1   4   7   10   13   16   19   22   25
2   5   8   11   14   17   20   23   26
3   6   9   12   15   18   21   24   27
"""
df = pd.read_csv(StringIO(string), header=[0,1], sep='\s{3}', engine='python')

# The solution
level0 = df.columns.get_level_values(0) # Get the A, B, C level, which include the Unnamed
level1 = df.columns.get_level_values(1) # Get the X, Y, Z level

level0 = (
    level0.where(~level0.str.startswith('Unnamed:')) # replace Unnamed with NA
        .to_series().ffill()                         # convert to Series to get
                                                     # the ffill method
)

# Putting the new columns together
df.columns = [level0, level1]

CodePudding user response：

You can create the MultiIndex separately:

from io import StringIO

data = """A         B           C       
X   Y   Z   X   Y   Z   X   Y   Z
1   4   7   10  13  16  19  22  25
2   5   8   11  14  17  20  23  26
3   6   9   12  15  18  21  24  27"""

n_levels = 2
df = pd.read_csv(StringIO(data), delimiter  = '\t', skiprows=n_levels, header=None)
columns = pd.read_csv(StringIO(data), delimiter  = '\t', nrows=n_levels, header=None)\
            .T.fillna(method='ffill')
df.columns = pd.MultiIndex.from_frame(columns)
df

#   A           B           C
#1  X   Y   Z   X   Y   Z   X   Y   Z
#0  1   4   7   10  13  16  19  22  25
#1  2   5   8   11  14  17  20  23  26
#2  3   6   9   12  15  18  21  24  27