Home > database >  Multiindexing by columns from .csv
Multiindexing by columns from .csv

Time:01-11

I have some data loaded from a .dat file that looks like this in a .csv file.

A           B           C       
X   Y   Z   X   Y   Z   X   Y   Z
1   4   7   10  13  16  19  22  25
2   5   8   11  14  17  20  23  26
3   6   9   12  15  18  21  24  27

I am using the following code to import this data

df = read_csv('data.csv' , delimeter = '\t' , header = [0,1])

the output i get from this is:

A   un1 un2 B   un3 un4 C   un5 un6
X   Y   Z   X   Y   Z   X   Y   Z
1   4   7   10  13  16  19  22  25
2   5   8   11  14  17  20  23  26
3   6   9   12  15  18  21  24  27

note: un1 is Unnamed 1_Level_0 and so on

I would like to be able to multiindex so I could read print(df['A']['Z']) so that i can get the output:

7
8
9

I imagine that there is a way that I could fill the unnamed columns with the previous column name if they contain the str 'Unnamed' but this seems like a bad way to do it. Is there any way in pandas to use multiindexing to help solve this problem?

CodePudding user response:

You can do that by first replacing the Unnamed: * columns with NA, then call ffill to propagate the previous index value:

# Some sample data
from io import StringIO

string = """
A           B              C         
X   Y   Z   X    Y    Z    X    Y    Z
1   4   7   10   13   16   19   22   25
2   5   8   11   14   17   20   23   26
3   6   9   12   15   18   21   24   27
"""
df = pd.read_csv(StringIO(string), header=[0,1], sep='\s{3}', engine='python')

# The solution
level0 = df.columns.get_level_values(0) # Get the A, B, C level, which include the Unnamed
level1 = df.columns.get_level_values(1) # Get the X, Y, Z level

level0 = (
    level0.where(~level0.str.startswith('Unnamed:')) # replace Unnamed with NA
        .to_series().ffill()                         # convert to Series to get
                                                     # the ffill method
)

# Putting the new columns together
df.columns = [level0, level1]

CodePudding user response:

You can create the MultiIndex separately:

from io import StringIO

data = """A         B           C       
X   Y   Z   X   Y   Z   X   Y   Z
1   4   7   10  13  16  19  22  25
2   5   8   11  14  17  20  23  26
3   6   9   12  15  18  21  24  27"""

n_levels = 2
df = pd.read_csv(StringIO(data), delimiter  = '\t', skiprows=n_levels, header=None)
columns = pd.read_csv(StringIO(data), delimiter  = '\t', nrows=n_levels, header=None)\
            .T.fillna(method='ffill')
df.columns = pd.MultiIndex.from_frame(columns)
df

#   A           B           C
#1  X   Y   Z   X   Y   Z   X   Y   Z
#0  1   4   7   10  13  16  19  22  25
#1  2   5   8   11  14  17  20  23  26
#2  3   6   9   12  15  18  21  24  27
  •  Tags:  
  • Related