I'm having a weird issue using PowerShell to merge multiple CSV files into one. I've done this plenty of times in the cmd prompt on Windows 7, but here the output only contains the earliest file. The command is standard stuff:
C:\> copy *.csv output.csv
All I am getting is, as I say, the earliest CSV copied into this new file and nothing else. Is this an issue with PowerShell vs. the plain cmd prompt?
Thanks Michael
CodePudding user response:
As noted by lit in the comments, in PowerShell copy is a built-in alias of the Copy-Item cmdlet, which functions differently from cmd.exe's internal copy command:
As of PowerShell 7.2.1, Copy-Item does not support merging multiple files into a single destination file.
Currently, if Copy-Item's -Destination argument (the second positional argument, output.csv in your case) is a file, all -Path arguments (the first positional argument, *.csv in your case) are sequentially copied to the same destination file - in other words: the last file that matches wildcard pattern *.csv "wins", and output.csv is simply a copy of it alone - see GitHub issue #12805 for a discussion.
To use cmd.exe's copy command, which merges the input files to form the destination file, call via cmd /c:
cmd /c 'copy /y /b *.csv output.csv'
Note the addition of:
/y, which suppresses a confirmation prompt if the destination file already exists
/b, which copies in binary mode, which prevents an "EOF character" (the Substitute character, 0x1a, which you can interactively produce with Ctrl-Z) from being appended to the destination file.
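If you want to check whether an already merged file picked up that trailing 0x1a byte, here is a minimal sketch for a POSIX shell (for instance WSL or Git Bash on Windows, which is my assumption and not something the question mentions). The function name ends_with_sub is made up for illustration; it relies only on the standard tail, od and tr utilities.

```sh
# Succeeds (exit status 0) when the last byte of the given file is 0x1A,
# the Substitute / "EOF" character; fails otherwise, including for an empty file.
ends_with_sub() {
    # Take the final byte, print it as hex, and strip od's padding.
    last=$(tail -c 1 "$1" | od -An -tx1 | tr -d ' \n')
    [ "$last" = "1a" ]
}
```

You could then run something like ends_with_sub output.csv && echo "trailing 0x1a found" against the merged file.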
As an aside: on Unix-like platforms you could use sh -c 'cat *.csv > output.csv', but there you'd have to first ensure that output.csv doesn't already exist, as that would result in an endlessly growing file.
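A sketch of that Unix-side approach which sidesteps the existing-output.csv problem: it skips the destination while globbing and assembles the result under a temporary name that does not match *.csv. The function name merge_csv and the temporary-file naming are hypothetical, and it assumes a POSIX sh with the input files in the current directory.

```sh
# Concatenate every *.csv in the current directory into the destination
# file (default: output.csv), never reading the destination as an input.
merge_csv() {
    out=${1:-output.csv}
    tmp="$out.tmp.$$"        # temporary name; does not match *.csv
    : > "$tmp" || return 1   # start from an empty file
    for f in *.csv; do
        [ -f "$f" ] || continue                    # pattern matched nothing
        if [ "$f" = "$out" ]; then continue; fi    # skip a stale destination
        cat "$f" >> "$tmp" || { rm -f "$tmp"; return 1; }
    done
    mv "$tmp" "$out"         # replace the destination in one step
}
```

Call it as merge_csv (or merge_csv merged.csv) from the directory holding the files. Like copy /b, this joins the files byte for byte, so the header row of every input ends up in the result.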
CodePudding user response:
The copy command of cmd.exe concatenates files when you specify multiple source files and a single destination file just as you do, so the command line:
copy *.csv output.csv
should concatenate all files matching the source pattern *.csv in the order they are reported by the file system (hence in something like alphabetic order on NTFS). You should append the /B option at the end to mark the destination file as binary, which prevents an end-of-file character (0x1A) from being appended.
But as soon as output.csv exists, the result depends on the position at which it is returned by the search against the source mask *.csv:
output.csv is the first item: all the other files become appended to output.csv, given that the overwrite prompt is confirmed, or the /Y switch is added;
output.csv is not the first one: all the other files become concatenated to output.csv, the original output.csv file seems to be ignored;
This has been tested on Windows 10, also considering the console output of copy, which lists all copied files, but excludes output.csv unless it is at the first position.
So to get a result independent from the file system, you could simply explicitly specify the first item (namely nul, the null device, providing no data) and use the + operator supported by copy:
copy /Y nul + *.csv output.csv /B
This ensures that a potentially already existing output.csv is never the first file in the list of files to be copied and is therefore ignored for concatenation.
