How to use sed command in python subprocess?

Time:01-21

I searched for other answers before asking this question. I am running on a Windows 11 machine. The CSV file I received has a stray " in some lines, which causes an error when importing into MongoDB, so I want to remove it. I found that the sed command does this very quickly. Many of you may recommend Python's replace function, but that is not feasible here because the file is 5 GB in size, and when I tested both methods I found sed to be much faster.

On my system I have to run bash in the command terminal to enter bash mode, and then run the sed command there.

How should I call subprocess.run() to achieve this? Below is my code:

import subprocess

p = subprocess.run('bash' | r"sed -i 's/\"/-/g' D:\Backupfiles\MAY2021\Names.csv", shell=True, capture_output=True, check=True)
print(p.returncode)

Given below is the error I get when running the above code:

"C:\Users\AEC Office Kollam\anaconda3\envs\SDR Project\python.exe" "C:/Users/AEC Office Kollam/Documents/Atom/Python/MongoDB/SDR Project/subprocesstutorial.py"
Traceback (most recent call last):
  File "C:\Users\AEC Office Kollam\Documents\Atom\Python\MongoDB\SDR Project\subprocesstutorial.py", line 3, in <module>
    p = subprocess.run('bash' r"sed -i 's/\"/-/g' D:\Backupfiles\MAY2021\SDR1\BSNL\BSNL-DEC2020-EKYCC.csv", shell=True, capture_output=True, check=True)
  File "C:\Users\AEC Office Kollam\anaconda3\envs\SDR Project\lib\subprocess.py", line 528, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command 'bashsed -i 's/\"/-/g' D:\Backupfiles\MAY2021\SDR1\BSNL\BSNL-DEC2020-EKYCC.csv' returned non-zero exit status 1.

CodePudding user response:

If you are running Windows:

  1. Use Cygwin - https://www.geeksforgeeks.org/how-to-use-linux-commands-in-windows-with-cygwin/

  2. Use PowerShell commands that work like sed -

get-content somefile.txt | %{$_ -replace "expression","replace"}

or

get-content somefile.txt | where { $_ -match "expression"}
select-string somefile.txt -pattern "expression"

If you are running Linux, this will work for you:

import subprocess
# inp and outp are the input and output file paths
with open(outp, "w") as out_file:
    sub = subprocess.call(['sed', 's/\"//g', inp], stdout=out_file)

CodePudding user response:

sed is just a program that bash runs from somewhere, so you should be able to run it directly with subprocess.run (or subprocess.call if you’re using an old version of Python).

In bash, use type -p sed to find out where the sed program is.
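
From Python, roughly the same lookup can be sketched with shutil.which, which searches PATH for an executable. This is only a sketch: find_tool is a hypothetical helper name, and on Windows sed will only be found if the folder that contains it (for example a Cygwin or Git Bash bin folder) is on the PATH of the Python process.

```python
import shutil


def find_tool(name):
    """Return the full path of an executable found on PATH, or None."""
    return shutil.which(name)


if __name__ == "__main__":
    sed_path = find_tool("sed")
    if sed_path is None:
        print("sed is not on PATH; pass its full path to subprocess.run instead")
    else:
        print("sed found at", sed_path)
```

If the lookup returns None, the full path reported by type -p sed can be passed as the first element of the argument list instead.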

I recommend thinking about whether you really need shell=True. My guess is that you don’t. Something like the Linux code in Tal Folkman’s answer should do it. Most of the time, using the shell here just adds quoting headaches.
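
A minimal sketch of what running sed without the shell might look like. The helper names build_sed_argv and strip_quotes_in_place are made up for illustration, and the sketch assumes a GNU sed (where -i takes no argument) that is on PATH and understands the path you give it.

```python
import subprocess


def build_sed_argv(csv_path, sed_exe="sed"):
    """Build an argument list that deletes every double quote in csv_path in place.

    Each list element reaches sed as one argument, so no shell quoting is needed.
    """
    return [sed_exe, "-i", 's/"//g', csv_path]


def strip_quotes_in_place(csv_path, sed_exe="sed"):
    """Run sed directly (no shell); raises CalledProcessError if sed fails."""
    return subprocess.run(
        build_sed_argv(csv_path, sed_exe),
        capture_output=True,
        text=True,
        check=True,
    )
```

Because the arguments are passed as a list, the sed expression is written exactly as sed should see it, with no extra layer of quotes for bash.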

If you really want to go through bash, you’ll have to use the -c flag to bash. Something like

subprocess.run([r'C:\whatever\bash.exe', '-c', 'sed -i -e "s/foo/bar/" input.dat'])
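
If the file name can contain spaces or other special characters, the command string handed to bash -c needs its own quoting. One way to sketch that is with shlex.quote from the standard library; build_bash_argv and run_sed_via_bash are hypothetical helper names, and bash_exe is assumed to be either on PATH or given as a full path.

```python
import shlex
import subprocess


def build_bash_argv(csv_path, bash_exe="bash"):
    """Build the argument list for: bash -c "<sed command>".

    shlex.quote makes the file name safe for a POSIX shell such as bash,
    for example when it contains spaces.
    """
    sed_command = "sed -i -e 's/\"//g' " + shlex.quote(csv_path)
    return [bash_exe, "-c", sed_command]


def run_sed_via_bash(csv_path, bash_exe="bash"):
    """Run the sed command through bash; raises CalledProcessError on failure."""
    return subprocess.run(
        build_bash_argv(csv_path, bash_exe),
        capture_output=True,
        text=True,
        check=True,
    )
```

Note that shell=True is not used here: bash is started directly, and the whole sed command is one list element after -c.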

CodePudding user response:

Thanks to the above two answers, I was able to figure out the answer to my problem.

My CSV file contained a " without an ending ". The CSV file was too large to be handled by Python's replace function. This is my final code:

import subprocess
subprocess.run(["bash", "-c", "sed -i 's/\"//g' Name.csv"], capture_output=True, shell=True, check=True)

In sed -i 's/\"//g' Name.csv we have to use a \ for the command to work, because the " sits inside a double-quoted Python string and has to be escaped there. Other than that, the same command can be used to replace any text in any kind of file.

Thank you for your insights, everyone.
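
To see what the backslash does, here is a quick check of the string Python builds (a sketch for illustration only, not part of the code above):

```python
# The backslash belongs to the Python string literal, not to sed:
# inside a double-quoted Python string, \" produces a plain " character.
command = "sed -i 's/\"//g' Name.csv"

# The same text written as a triple-quoted string needs no backslash at all.
same_command = '''sed -i 's/"//g' Name.csv'''

assert command == same_command
print(command)  # prints: sed -i 's/"//g' Name.csv
```

So the sed expression itself is just s/"//g, which deletes every double quote.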
