I have a load of personal ID numbers, which I need to download automatically. Rather than store thousands of unique personal IDs on my machine, I want to multiply all the IDs by a random integer, save the result, and delete the original file.
This will end up in Excel, but VBA is not an option because of org policy. I can't install anything - limited to what comes with Windows. As far as I understand, Excel's power query isn't able to delete a source file after reading it, but I could be wrong - if so that could be another direction to go in.
Sample of csv (the ids are not real)
3465549,2504526,2504566,3465552,2506339,3465551,2317719,2506451,3465547,3465550
,,,,,,,,,
,,,,,,,,,
,,,,,,,,,
3465546,3465544,2506443,3465542,3465545,2506696,3465543,2506775,,
,,,,,,,,,
,,,,,,,,,
2356880,2356779,2356828,2356944,2356915,2356917,2356928,2356783,2356850,2356781
,,,,,,,,,
3441512,2467140,2571636,2571767,2571812,2467147,3441514,2571648,3441513,3441511
,,,,,,,,,
2380632,2380582,2380629,2380586,2380613,2380537,2380601,2380636,2380539,2380471
3221426,3221424,3221425,3221429,,,,,,
2571011,2486766,1704643,2571037,2571040,2571018,2571043,2570940,2486742,2486770
2950233,2950214,2950268,3152046,3152050,3152047,3152049,3152051,3152048,2950384
Features
- It could be any number of rows, probably between a couple of hundred and ten thousand.
- Many of the rows are blank. Blank rows need to be preserved, because another column will need to be imported to go next to id (a blank row indicates no people correspond to this value from the imported column)
- Many of the rows have less than 10 id numbers - less than the maximum, in other words.
Desired output
Let's say that the random integer was 2, this would produce:
6931098,5009052,5009132,6931104,5012678,6931102,4635438,5012902,6931094,6931100
,,,,,,,,,
,,,,,,,,,
,,,,,,,,,
6931092,6931088,5012886,6931084,6931090,5013392,6931086,5013550,,
,,,,,,,,,
,,,,,,,,,
4713760,4713558,4713656,4713888,4713830,4713834,4713856,4713566,4713700,4713562
,,,,,,,,,
6883024,4934280,5143272,5143534,5143624,4934294,6883028,5143296,6883026,6883022
,,,,,,,,,
4761264,4761164,4761258,4761172,4761226,4761074,4761202,4761272,4761078,4760942
6442852,6442848,6442850,6442858,,,,,,
5142022,4973532,3409286,5142074,5142080,5142036,5142086,5141880,4973484,4973540
5900466,5900428,5900536,6304092,6304100,6304094,6304098,6304102,6304096,5900768
What I've tried
This answer looks promising, but it only deals with one column. I couldn't figure how to adapt it so that it accepts any number of columns.
@echo off
setlocal enabledelayedexpansion
FOR /F "tokens=1-18* delims=," %%A IN (mycsv.csv) DO (
set sum1="%%~C"
set /a cole=!sum1! * 2
echo %%~A,%%~B,%%~C,%%~D,!cole!
) >> output.csv
I don't really understand this Powershell based answer.
$csv = Import-Csv mycsv.csv
foreach ($row in $csv) {
[int]$row.B *= -1
[int]$row.F *= -1
}
$csv | Export-Csv output.csv
If I try to use it, it returns
powershell : Exception setting "B": "The property 'B' cannot be found on this object. Verify that the property exists and can be
At line:1 char:1
powershell -ep Bypass .\t.ps1
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
CategoryInfo : NotSpecified: (Exception setti...sts and can be :String) [], RemoteException
FullyQualifiedErrorId : NativeCommandError
set."
At C:\Users\user\OneDrive\Personal\t.ps1:3 char:3
[int]$row.B *= -1
~~~~~~~~~~~~~~~~~
CategoryInfo : NotSpecified: (:) [], SetValueInvocationException
FullyQualifiedErrorId : ExceptionWhenSetting
Exception setting "F": "The property 'F' cannot be found on this object. Verify that the property exists and can be
set."
At C:\Users\user\OneDrive\Personal\t.ps1:4 char:3
[int]$row.F *= -1
~~~~~~~~~~~~~~~~~
CategoryInfo : NotSpecified: (:) [], SetValueInvocationException
FullyQualifiedErrorId : ExceptionWhenSetting
Exception setting "B": "The property 'B' cannot be found on this object. Verify that the property exists and can be
set."
At C:\Users\user\OneDrive\Personal\t.ps1:3 char:3
[int]$row.B *= -1
~~~~~~~~~~~~~~~~~
CategoryInfo : NotSpecified: (:) [], SetValueInvocationException
FullyQualifiedErrorId : ExceptionWhenSetting
Exception setting "F": "The property 'F' cannot be found on this object. Verify that the property exists and can be
set."
At C:\Users\user\OneDrive\Personal\t.ps1:4 char:3
[int]$row.F *= -1
~~~~~~~~~~~~~~~~~
CategoryInfo : NotSpecified: (:) [], SetValueInvocationException
FullyQualifiedErrorId : ExceptionWhenSetting
I assumed that B and F just referred to the rows in the csv, but if I understand the error message (I may not) they're names that have to be defined (how?) Also I need this to work across all rows which have values, and skip empty rows and empty columns within rows. So hardcoding the rows and columns to be multiplied isn't going to work for me here.
Groping around in the dark, replacing
[int]$row.B *= -1
[int]$row.F *= -1
with
[int]$row *= -1
returns
powershell : Cannot convert the "@{12=18}" value of type "System.Management.Automation.PSCustomObject" to type "System.Int32".
At line:1 char:1
powershell -ep Bypass .\t.ps1
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
CategoryInfo : NotSpecified: (Cannot convert ..."System.Int32".:String) [], RemoteException
FullyQualifiedErrorId : NativeCommandError
At C:\Users\user\OneDrive\Personal\t.ps1:3 char:3
[int]$row *= -1
~~~~~~~~~~~~~~~
CategoryInfo : InvalidArgument: (:) [], RuntimeException
FullyQualifiedErrorId : ConvertToFinalInvalidCastException
Cannot convert the "@{12=}" value of type "System.Management.Automation.PSCustomObject" to type "System.Int32".
At C:\Users\user\OneDrive\Personal\t.ps1:3 char:3
[int]$row *= -1
~~~~~~~~~~~~~~~
CategoryInfo : InvalidArgument: (:) [], RuntimeException
FullyQualifiedErrorId : ConvertToFinalInvalidCastException
CodePudding user response:
The logic for multiplying the values on each row of your CSV can be like this, assuming the CSV has a fixed number of columns:
function MultiplyRows {
[cmdletbinding()]
param(
[parameter(Mandatory, ValueFromPipeline)]
[object]$InputObject,
[int]$MultiplyBy = 2
)
begin { $isFirstObject = $true }
process {
if($isFirstObject) {
$headers = $InputObject.PSObject.Properties.Name
$isFirstObject = $false
}
$out = [ordered]@{}
foreach($prop in $headers) {
if(-not ($thisValue = [int]$InputObject.$prop)) {
$out[$prop] = $null
continue
}
$out[$prop] = $thisValue * $MultiplyBy
}
[pscustomobject]$out
}
}
Import-Csv ... | MultiplyRows -MultiplyBy 3 | ConvertTo-Csv
If the CSV has no headers, you will need to determine how many columns it has and use the -Header parameter on Import-Csv. You can use the following, which, by looking at your CSV it should work but note that it may not work for all CSVs since some of then can have embedded commas like:
[pscustomobject]@{
ExampleCol = 'value with , comma'
AnotherCol = 'some value'
} | ConvertTo-Csv
# Results in:
"ExampleCol","AnotherCol"
"value with , comma","some value"
Using the function from before, and the CSV provided in question this is how the code would look:
$headers = (Get-Content ./some.csv -TotalCount 1).Split(',').ForEach({
begin{ $i = 0 }
process { "Col{0}" -f $i }
})
# -MultiplyBy 2 is implied (Default Param value)
Import-Csv ./some.csv -Header $headers | MultiplyRows | ConvertTo-Csv
Which results in:
"Col0","Col1","Col2","Col3","Col4","Col5","Col6","Col7","Col8","Col9"
"6931098","5009052","5009132","6931104","5012678","6931102","4635438","5012902","6931094","6931100"
,,,,,,,,,
,,,,,,,,,
,,,,,,,,,
"6931092","6931088","5012886","6931084","6931090","5013392","6931086","5013550",,
,,,,,,,,,
,,,,,,,,,
"4713760","4713558","4713656","4713888","4713830","4713834","4713856","4713566","4713700","4713562"
,,,,,,,,,
"6883024","4934280","5143272","5143534","5143624","4934294","6883028","5143296","6883026","6883022"
,,,,,,,,,
"4761264","4761164","4761258","4761172","4761226","4761074","4761202","4761272","4761078","4760942"
"6442852","6442848","6442850","6442858",,,,,,
"5142022","4973532","3409286","5142074","5142080","5142036","5142086","5141880","4973484","4973540"
"5900466","5900428","5900536","6304092","6304100","6304094","6304098","6304102","6304096","5900768"
CodePudding user response:
The for /F loop available in a batch-file combines multiple consecutive delimiters to one, so you cannot use it with the comma as delimiter to read a CSV file with empty cells/values/fields. However, you could use it to read full lines, together with a standard for loop to split each at commas, given that there are none of the characters ?, * and <, > (so for does not need to access the file system to derive matching file names and therefore maintains the given strings):
@echo off
setlocal EnableExtensions DisableDelayedExpansion
rem // Define constants here:
set "_FILE=%~dp0data.csv" & rem // (full path to target file)
set "_SEPC=," & rem // (separator character)
set "_HEAD=" & rem // (set to something if header)
set /A "_MULT=%RANDOM%%%(1<<7) 1" & rem // (multiplicator; result < 2 Gi!)
set /A "_ZPAD=0" & rem // (optionally zero-pad to width)
setlocal EnableDelayedExpansion
if defined _HEAD (set "HEAD=") else (set "HEAD=#")
rem // Read file line by line:
for /F usebackq^ delims^=^ eol^= %%J in ("!_FILE!") do (
if defined HEAD (
set "LINE=%%J" & set "COLL=%_SEPC%"
rem // Iterate through separated items, even blank ones:
for %%I in ("!LINE:%_SEPC%=" "!") do (
rem // Retain blank items in case:
if not "%%~I"=="" (
rem set /A "ITEM=%%~I*_MULT"
rem // Multiply item and correctly handle zero-padded numbers:
set "ITEM=00000000%%~I" & set "ITEM=1!ITEM:~-9!"
set /A "ITEM%%=1000000000, ITEM*=_MULT"
rem // Optionally zero-pad resulting number:
if %_ZPAD% gtr 0 (
if "!ITEM:~,-%_ZPAD%!"=="" (
set "ITEM=000000000!ITEM!" & set "ITEM=!ITEM:~-10!"
set "ITEM=!ITEM:~-%_ZPAD%!"
)
)
rem // Append new number to new line string:
set "COLL=!COLL!%_SEPC%!ITEM!"
) else set "COLL=!COLL!%_SEPC%"
)
rem // Return current altered line string:
echo(!COLL:~2!
) else set "HEAD=#" & echo(%%J
)
endlocal
endlocal
exit /B
In the top section, set variable _MULT to a multiplication factor so that the resulting values are always less than 231 − 1, because otherwise, negative values may result due to overflow.
The variable _ZPAD defines the width which the resulting products are padded to by preceding them with enough zeros. Numbers with more digits are not truncated. The maximal numbers of zeros is 10. No padding occurs when _ZPAD is set to zero or less.
The variable _HEAD must be set to something in case the CSV file contains a headline to be kept.
Note, that this script would fail when the CSV file contains (quoted) strings and/or quoted values that contain separator characters on their own.
The script, let us call it multiply.bat, returns the resulting lines in the console. To write them to a file, use redirection (assuming the CSV file is data.csv in the current directory):
multiply.bat "data.csv" > "data_NEW.csv"
To overwrite the original CSV file, simply execute the following command line afterwards:
move /Y "data_NEW.csv" "data.csv"
CodePudding user response:
This is the way I would do it:
@echo off
setlocal EnableDelayedExpansion
rem Define the factor, use %random% here
set /A "factor=2"
(for /F "delims=" %%a in (input.txt) do (
rem Get a line and eliminate multiple commas
set "in=%%a"
set "in=!in:,,=!"
if "!in:~-1!" equ "," set "in=!in:~0,-1!"
if not defined in (
echo ,,,,,,,,,
) else (
call :multiplyRow
echo !out!
)
)) > output.txt
goto :EOF
:multiplyRow
rem Multiply the numbers by the factor and count missing commas
set "out="
set /A "n=9,num=factor*%in:,=" & set "out=!out!!num!," & set /A "n-=1,num=factor*%" & set "out=!out!!num!"
rem Insert missing commas
for /L %%i in (1,1,%n%) do set "out=!out!,"
exit /B
The "magic line" in :MultiplyRow do several things: it multiply each comma-separated number in the %in% string by the factor and join each result to out string separated by a comma. It also decrement the commas counter for each number processed. You may appreciate this mechanism in a clearer way if you cancel the @echo off line (and the redirection to output.txt file) and carefully review the executed code.
