Home > OS >  How to handle comma (,) in CSV cell data in Bash or Python
How to handle comma (,) in CSV cell data in Bash or Python

Time:02-01

I am working on an automation script in Bash.

I am downloading the CSV file from Google drive and rearrange the columns on the basis of predefined template. Column positions can vary in CSV files. I have done that already but the CSV file has columns which have a comma inside the cell data. Now the issue is when rearranging, the CSV data is messed up. There are 29 columns in CSV I am showing the specific columns here in the example of the data.

Business Name,Address 1,SIC Description,Tags

CANNON CONSULTING  LLC,56 DOUBLOON DR,"Business Consulting, Nec",m1096465.output.lsb
DLM MAINTENANCE INC,4157 UPPERLINE ST,"Building Maintenance Services, Nec",m1096465.output.lsb
PELICAN PLUMBING REPAIRS LLC,128 NEPTUNE RD,Plumbing Contractors,m1096465.output.lsb
CRATING PACKING & SHIPPING LLC,39131 MILLER RD,Packing Goods For Shipping,m1096465.output.lsb
PRO TECH AUTOMOTIVE,40167 HIGHWAY 190 E B,GENERAL AUTOMOTIVE REPAIR SHOPS,"m1096465.output.lsb, gppo229715-2"
AWAKEN WELLNESS STUDIO LLC,110 DOUBLOON DR,"Health and Allied Services, Nec",m1096465.output.lsb
R M JEAUXNES PROPERTIES,433 NIGHTHAWK DR,Nonresidential Building Operators,m1096465.output.lsb
TEXAS TRANSPORT TOWING LLC,228 HOLLOW ROCK CT,"Transportation Services, Nec",m1096465.output.lsb
GENTLE TOUCH PERSONAL CARE,405 FOXCROFT DR,"NURSING AND PERSONAL CARE, NEC","m1096465.output.lsb, gppo229715-2"
REKERS MAINTENANCE LLC,162 LAKE D ESTE DR,"Building Maintenance Services, Nec",m1096465.output.lsb

I am using awk to re order the columns like this and using numbers not column names

awk -F, '{ print $'$contact_id'","$'$business_name'",..... }' $file_name > tmp_awk.csv

My first priority is to solve this using Bash but I can also include Python script so I have the option of Python too.

CodePudding user response:

A Python script using the csv module could be as simple as:

import csv

with open('input.csv') as fd, open('output.csv', 'w', newline='') as fdout:
    rd = csv.DictReader(fd)
    wr = csv.DictWriter(fdout, ['SIC Description', 'Address 1',
                                'Tags', 'Business Name'])
    _ = wr.writeheader()
    _ = wr.writerows(rd)

If you want to reorder by column numbers it is even simpler

import csv

with open('input.csv') as fd, open('output.csv', 'w', newline='') as fdout:
    rd = csv.reader(fd)
    wr = csv.writer(fdout)
    neworder = [2, 1, 3, 0]
    length = len(neworder)
    wr.writerows([row[neworder[i]] for i in range(length)] for row in rd)

CodePudding user response:

If you can install pandas this becomes very easy.

import pandas as pd
df = pd.read_csv('your_file.csv') # handles commas inside quotes
df_ordered = df[list_with_column_names] # rearranges columns
df_ordered.to_csv('out.csv') # writes output

Of course you can do everything with the builtin csv module, too. Only the column reordering isn't quite as trivial.

CodePudding user response:

In python:

import pandas as pd
data = pd.read_csv('tmp_awk.csv',sep=',')
//head of the data / few rows
data.head()
// rearrange the data
data[['column name 1','column name 2', 'column name 3']] // so on till column name you need

In bash:

INPUT=tmp_awk.cvs
OLDIFS=$IFS
IFS=','
[ ! -f $INPUT ] && { echo "$INPUT file not found"; exit 99; }
while read flname dob ssn tel status
do
    echo "Name : $flname"
    echo "DOB : $dob"
    echo "SSN : $ssn"
    echo "Telephone : $tel"
    echo "Status : $status"
done < $INPUT
IFS=$OLDIFS

CodePudding user response:

Python is definitely the way here, as it comes with basically all systems.

As mentioned by others, csv is built-in and requires no additional install.

import csv

with open("tmp.csv", "r") as file:
    content = csv.reader(file, delimiter=',', quotechar='"')
    with open("out.csv", "w") as out:
        writer = csv.writer(out, delimiter=',', quotechar='"', quoting=csv.QUOTE_MINIMAL)
        for row in content:
            if row:
                writer.writerow([row[2], row[1], row[3], row[0]])

CodePudding user response:

Miller is the best tool that I know of for processing CSV files and a few other formats. It is available as a static binary; just put the mlr executable somewhere in your PATH and you're done with the installation (which is not much different than writing a Python script and putting it in your path).

With mlr, the reordering problem can be solved as simply as this:

mlr --csv -N reorder -f '3,2,4,1' file.csv

Or if you prefer to use names instead of numbers:

mlr --csv reorder -f 'SIC Description,Address 1,Tags,Business Name' file.csv
  •  Tags:  
  • Related