Very new to python,
I am building a code that creates an upload file for an external system. One of the requirements is an "Entry Number" column that has a syntax of 101-000000001, 101-000000002, ..., 101-000344539 etc. per row entry.
I am using the below code to build the sequential entries, but I am noticing breaks in the sequence output. My theory is that len() is using the index of the original data vs the new size after I dropped a bunch of frames.
I tried to reset the index, but that isn't working. Any idea of what I can add or change to get the desired output?
## Add new column for the Entry Number
df = df.reset_index(drop=True)
df.insert(0,'ENTRYNUMBER', range(1, 1 len(df)))
df['ENTRYNUMBER']=df['ENTRYNUMBER'].apply('{:0>9}'.format)
df['ENTRYNUMBER']=('101-' df['ENTRYNUMBER'])
--
Input Dataframe (df)
| ITEM | DATE | QUANTITY | UOM |
|---|---|---|---|
| A | 2/3/2022 | 7 | CTN |
| A | 2/4/2022 | 8 | CTN |
| B | 2/3/2022 | 15 | CTN |
| B | 2/4/2022 | 16 | CTN |
| C | 2/3/2022 | 21 | CTN |
| C | 2/4/2022 | 25 | CTN |
LookUp Data Frame (df2)
| ITEM |
|---|
| A |
| C |
Code to Drop any item on df that isn't on df2
## Use ItemList to create a new column that flags if the item is on the item list / Anything not on list is dropped / flag column is removed
df = pd.merge(df, df2, how='left', indicator=True)
df.drop(index=df[df['_merge'] == 'left_only'].index, inplace=True)
df=df.drop(df.columns[[4]], axis=1)
Resulting Dataframe (df)
| ITEM | DATE | QUANTITY | UOM |
|---|---|---|---|
| A | 2/3/2022 | 7 | CTN |
| A | 2/4/2022 | 8 | CTN |
| C | 2/3/2022 | 21 | CTN |
| C | 2/4/2022 | 25 | CTN |
Code to add Entry Number
## Add new column for the Entry Number
df = df.reset_index(drop=True)
df.insert(0,'ENTRYNUMBER', range(1, 1 len(df)))
df['ENTRYNUMBER']=df['ENTRYNUMBER'].apply('{:0>9}'.format)
df['ENTRYNUMBER']=('101-' df['ENTRYNUMBER'])
Desired Output
| ENTRYNUMBER | ITEM | DATE | QUANTITY | UOM |
|---|---|---|---|---|
| 101-000000001 | A | 2/3/2022 | 7 | CTN |
| 101-000000002 | A | 2/4/2022 | 8 | CTN |
| 101-000000003 | C | 2/3/2022 | 21 | CTN |
| 101-000000004 | C | 2/4/2022 | 25 | CTN |
Current Output
| ENTRYNUMBER | ITEM | DATE | QUANTITY | UOM |
|---|---|---|---|---|
| 101-000000001 | A | 2/3/2022 | 7 | CTN |
| 101-000000002 | A | 2/4/2022 | 8 | CTN |
| 101-000000005 | C | 2/3/2022 | 21 | CTN |
| 101-000000006 | C | 2/4/2022 | 25 | CTN |
My theory is the sequence is off because it is counting previously dropped frames
Prior to this code running, a look up is being done with df2 to drop rows in df that aren't matching. This is what I believe is causing the breaks in the entry number sequence.
CodePudding user response:
You could simply use:
df.insert(0, 'ENTRYNUMBER', '101-' (pd.Series(range(len(df)))
.add(1).astype(str)
.str.zfill(9))
)
or a list comprehension:
df.insert(0, 'ENTRYNUMBER', [f'101-{i 1:0>9}' for i in range(len(df))])
output:
ENTRYNUMBER ITEM DATE QUANTITY UOM
0 101-000000001 A 2/3/2022 7 CTN
1 101-000000002 A 2/4/2022 8 CTN
2 101-000000003 B 2/3/2022 15 CTN
3 101-000000004 B 2/4/2022 16 CTN
CodePudding user response:
I"d look for scoping and spelling mistakes in the code. This works as expected:
# Make the data frames from your example inputs
from io import StringIO
string = """
ITEM DATE QUANTITY UOM
A 2/3/2022 7 CTN
A 2/4/2022 8 CTN
B 2/3/2022 15 CTN
B 2/4/2022 16 CTN
C 2/3/2022 21 CTN
C 2/4/2022 25 CTN
"""
df = pd.read_csv(StringIO(string), sep="\t")
string = """
ITEM
A
C
"""
df2 = pd.read_csv(StringIO(string), sep="\t")
# Select only the ITEMs that exist in df2
df = df[df["ITEM"].isin(df2["ITEM"])]
# Add the ENTRYNUMBER column
s = "101-" pd.Series(np.arange(len(df)) 1).astype("str").str.zfill(9)
s.index = df.index
df.insert(0, "ENTRYNUMBER", s)
Result:
ENTRYNUMBER ITEM DATE QUANTITY UOM
0 101-000000001 A 2/3/2022 7 CTN
1 101-000000002 A 2/4/2022 8 CTN
4 101-000000003 C 2/3/2022 21 CTN
5 101-000000004 C 2/4/2022 25 CTN
