I would like to create an array in gnuplot with character indices. I have five data files named a.txt, b.txt, c.txt etc. Each text file has two columns of code profiling data (the function name and the CPU time in seconds):
# f tcpu, s
function1 221.284
function2 161.412
function3 167.322
I would like to plot the normalised profiling data for all the files in a single histogram plot. Prior to normalising the data I have to find the maximum time value in each data file. I do this with gnuplot stats command in a do for loop. I would like to save the maxima in an array tmax. It would be convenient to use the character array indices (tmax['a']), but I cannot find a relevant example anywhere. Please find my minimal example below:
array tmax[5]
# detect and save maxima values
do for [l in "a b c d e"] {
fname = sprintf('%s.txt', l)
stats fname using 2
tmax[l] = STATS_max
}
# print out maxima values
do for [l in "a b c d e"] {
print l
print tmax[l]
}
This attempt fails with the array index out of range error, when I try to save the STATS_max value in tmax[l]. If possible, could you please suggest, how to use character indices in a gnuplot array? The full gnuplot script follows:
#!/usr/bin/gnuplot
set style data histogram
set style fill solid
set style histogram clustered
set xtics rotate by 45 offset 0,0 right
set lmargin 8
set bmargin 11
# border
set style line 11 lc rgb '#808080' lt 1
set border 3 back ls 11
set tics nomirror
# grid
set style line 12 lc rgb '#808080' lt 0 lw 1
set grid back ls 12
set terminal postscript eps size 3.5,2.62 enhanced color \
font 'Helvetica, 10' linewidth 1
set output 'advisor.eps'
set xlabel 'C function'
set ylabel 't_{cpu} norm, %'
array tmax[5]
# detect and save maxima values
do for [l in "a b c d e"] {
fname = sprintf('%s.txt', l)
stats fname using 2
tmax[l] = STATS_max
}
# print out maxima values
do for [l in "a b c d e"] {
print l
print tmax[l]
}
plot "a.txt" using ($2/tmax['a']):xtic(1) title 'p=1, t=1', \
"b.txt" using ($2/tmax['b']) title 'p=1, t=8', \
"c.txt" using ($2/tmax['c']) title 'p=8, t=1', \
"d.txt" using ($2/tmax['d']) title 'p=4, t=2', \
"e.txt" using ($2/tmax['e']) title 'p=2, t=4'
Example data file is:
# f tcpu, s
NUTS\\_prop 221.284
Grow\\_tree 161.412
Grow\\_branch 167.322
stan\\_gradient 160.204
log\\_prob\\_grad 160.034
leapfrog\\_integrator 128.392
log\\_prob 116.953
poisson\\_log\\_log 80.262
poisson\\_log\\_lpmf 80.252
add 77.031
CodePudding user response:
This would be my suggestion. Instead of arrays, you can also store your data in strings and address the values via real() and word(), (check help real and help word). This will also work with older gnuplot 4.x versions.
Actually, I'm not sure whether I fully understood your normalization.
The example below divides the data of each file by the maximum time in that file. That's what I understood from your question.
Another option would be to compare the different settings (p=1, t=1, p=1, t=8, etc.) how long they take for a certain function (normalized to 1 for the longest time).
Script:
### loop several files for normalized histogram
reset session
myFiles = "a b c d e"
myFile(n) = sprintf("%s.txt",word(myFiles,n))
# create some random test data
myFunctions = "f1 f2 f3 f4 f5 f6"
do for [i=1:words(myFiles)] {
set print myFile(i)
do for [j=1:words(myFunctions)] {
print sprintf("%s %g", word(myFunctions,j),rand(0)*150 70)
}
set print
}
myTmaxs = ''
do for [i=1:words(myFiles)] {
stats myFile(i) u 2 nooutput
myTmaxs = myTmaxs.sprintf(" %g",STATS_max)
}
myTmax(n) = real(word(myTmaxs,n))
print myTmaxs
myTitles = '"p=1, t=1" "p=1, t=8" "p=8, t=1" "p=4, t=2" "p=2, t=4"'
myTitle(n) = word(myTitles,n)
set style data histogram
set style fill solid 0.6
set style histogram clustered
set grid y
set key out
set yrange [0:]
plot for [i=1:words(myFiles)] myFile(i) u ($2/myTmax(i)):xtic(1) title myTitle(i)
### end of script
Result:
CodePudding user response:
A possible workaround would be to rename the data files into 1.txt, 2.txt, 3.txt etc. Then the minimal example code will work with the following changes in syntax:
array tmax[5]
# detect and save maxima values
do for [l=1:5] {
fname = sprintf('%d.txt', l)
stats fname using 2
tmax[l] = STATS_max
}
# print out maxima values
do for [l=1:5] {
print l
print tmax[l]
}

