I am creating a sequential ID number. How can I use Len() function and omit previously dropped frame-CodePudding

Very new to python,

I am building a code that creates an upload file for an external system. One of the requirements is an "Entry Number" column that has a syntax of 101-000000001, 101-000000002, ..., 101-000344539 etc. per row entry.

I am using the below code to build the sequential entries, but I am noticing breaks in the sequence output. My theory is that len() is using the index of the original data vs the new size after I dropped a bunch of frames.

I tried to reset the index, but that isn't working. Any idea of what I can add or change to get the desired output?

## Add new column for the Entry Number
df = df.reset_index(drop=True)
df.insert(0,'ENTRYNUMBER', range(1, 1   len(df)))
df['ENTRYNUMBER']=df['ENTRYNUMBER'].apply('{:0>9}'.format)
df['ENTRYNUMBER']=('101-'  df['ENTRYNUMBER'])

Input Dataframe (df)

ITEM	DATE	QUANTITY	UOM
A	2/3/2022	7	CTN
A	2/4/2022	8	CTN
B	2/3/2022	15	CTN
B	2/4/2022	16	CTN
C	2/3/2022	21	CTN
C	2/4/2022	25	CTN

LookUp Data Frame (df2)

ITEM
A
C

Code to Drop any item on df that isn't on df2

## Use ItemList to create a new column that flags if the item is on the item list / Anything not on list is dropped / flag column is removed 
df = pd.merge(df, df2, how='left', indicator=True)
df.drop(index=df[df['_merge'] == 'left_only'].index, inplace=True)
df=df.drop(df.columns[[4]], axis=1)

Resulting Dataframe (df)

ITEM	DATE	QUANTITY	UOM
A	2/3/2022	7	CTN
A	2/4/2022	8	CTN
C	2/3/2022	21	CTN
C	2/4/2022	25	CTN

Code to add Entry Number

## Add new column for the Entry Number
df = df.reset_index(drop=True)
df.insert(0,'ENTRYNUMBER', range(1, 1   len(df)))
df['ENTRYNUMBER']=df['ENTRYNUMBER'].apply('{:0>9}'.format)
df['ENTRYNUMBER']=('101-'  df['ENTRYNUMBER'])

Desired Output

ENTRYNUMBER	ITEM	DATE	QUANTITY	UOM
101-000000001	A	2/3/2022	7	CTN
101-000000002	A	2/4/2022	8	CTN
101-000000003	C	2/3/2022	21	CTN
101-000000004	C	2/4/2022	25	CTN

Current Output

ENTRYNUMBER	ITEM	DATE	QUANTITY	UOM
101-000000001	A	2/3/2022	7	CTN
101-000000002	A	2/4/2022	8	CTN
101-000000005	C	2/3/2022	21	CTN
101-000000006	C	2/4/2022	25	CTN

My theory is the sequence is off because it is counting previously dropped frames

Prior to this code running, a look up is being done with df2 to drop rows in df that aren't matching. This is what I believe is causing the breaks in the entry number sequence.

CodePudding user response：

You could simply use:

df.insert(0, 'ENTRYNUMBER', '101-'   (pd.Series(range(len(df)))
                                        .add(1).astype(str)
                                        .str.zfill(9))
         )

or a list comprehension:

df.insert(0, 'ENTRYNUMBER', [f'101-{i 1:0>9}' for i in range(len(df))])

output:

     ENTRYNUMBER ITEM       DATE  QUANTITY  UOM
0  101-000000001   A   2/3/2022          7  CTN
1  101-000000002   A   2/4/2022          8  CTN
2  101-000000003   B   2/3/2022         15  CTN
3  101-000000004   B   2/4/2022         16  CTN

CodePudding user response：

I"d look for scoping and spelling mistakes in the code. This works as expected:

# Make the data frames from your example inputs
from io import StringIO

string = """
ITEM    DATE    QUANTITY    UOM
A   2/3/2022    7   CTN
A   2/4/2022    8   CTN
B   2/3/2022    15  CTN
B   2/4/2022    16  CTN
C   2/3/2022    21  CTN
C   2/4/2022    25  CTN
"""
df = pd.read_csv(StringIO(string), sep="\t")

string = """
ITEM
A
C
"""
df2 = pd.read_csv(StringIO(string), sep="\t")

# Select only the ITEMs that exist in df2
df = df[df["ITEM"].isin(df2["ITEM"])]

# Add the ENTRYNUMBER column
s = "101-"   pd.Series(np.arange(len(df))   1).astype("str").str.zfill(9)
s.index = df.index
df.insert(0, "ENTRYNUMBER", s)

Result:

     ENTRYNUMBER ITEM      DATE  QUANTITY  UOM
0  101-000000001    A  2/3/2022         7  CTN
1  101-000000002    A  2/4/2022         8  CTN
4  101-000000003    C  2/3/2022        21  CTN
5  101-000000004    C  2/4/2022        25  CTN