Issue
I want printf to recognize multi-byte characters when calculating the field width so that columns line up properly... I can't find an answer to this problem and was wondering if anyone here had any suggestions, or maybe a function/script that takes care of this problem.
Here's a quick and dirty example:
printf "## %5s %5s %5s ##\n## %5s %5s %5s ##\n" '' '*' '' '' "•" ''
>##           *       ##
>##         •       ##
Obviously, I want the result:
>##           *       ##
>##           •       ##
Any way to achieve this?
Solution
Are these the only ways? There's no way to do it with printf alone?
Well, with the example from ninjalj (thx btw), I wrote a script to deal with this problem and saved it as fprintf in /usr/local/bin:
#! /bin/bash
## Note: gsed is GNU sed; on most Linux systems it is installed as plain sed ##
IFS=' '
declare -a Text=("${@}")
## Skip the whole thing if there are no multi-byte characters ##
if (( $(echo "${Text[*]}" | wc -c) > $(echo "${Text[*]}" | wc -m) )); then
    if echo "${Text[*]}" | grep -Eq '%[#0 +-]?[0-9]+(\.[0-9]+)?[sb]'; then
        IFS=$'\n'
        declare -a FormatStrings=($(echo -n "${Text[0]}" | grep -Eo '%[^%]*?[bs]'))
        IFS=$' \t\n'
        declare -i format=0
        ## Check every format string ##
        for fw in "${FormatStrings[@]}"; do
            (( format++ ))
            if [[ "$fw" =~ ^%[#0\ +-]?[1-9][0-9]*(\.[1-9][0-9]*)?[sb]$ ]]; then
                ## Extra bytes = byte count minus character count of the matching argument ##
                (( Difference = $(echo "${Text[format]}" | wc -c) - $(echo "${Text[format]}" | wc -m) ))
                ## If multi-byte characters ##
                if (( Difference > 0 )); then
                    ## If a field width is entered then replace field width value ##
                    if [[ "$fw" =~ ^%[#0\ +-]?[1-9][0-9]* ]]; then
                        (( Width = $(echo -n "$fw" | gsed -re 's|^%[#0 +-]?([1-9][0-9]*).*[bs]|\1|') + Difference ))
                        declare -a Text[0]="$(echo -n "${Text[0]}" | gsed -rne '1h;1!H;${g;y|\n|\x1C|;s|(%[^%])|\n\1|g;p}' | gsed -rne $(( format + 1 ))'s|^(%[#0 +-]?)[1-9][0-9]*|\1'${Width}'|;1h;1!H;${g;s|\n||g;y|\x1C|\n|;p}')"
                    fi
                    ## If a precision is entered then replace precision value ##
                    if [[ "$fw" =~ \.[1-9][0-9]*[sb]$ ]]; then
                        (( Precision = $(echo -n "$fw" | gsed -re 's|^%.*\.([1-9][0-9]*)[sb]$|\1|') + Difference ))
                        declare -a Text[0]="$(echo -n "${Text[0]}" | gsed -rne '1h;1!H;${g;y|\n|\x1C|;s|(%[^%])|\n\1|g;p}' | gsed -rne $(( format + 1 ))'s|^(%[#0 +-]?([1-9][0-9]*)?)\.[1-9][0-9]*([bs])|\1.'${Precision}'\3|;1h;1!H;${g;s|\n||g;y|\x1C|\n|;p}')"
                    fi
                fi
            fi
        done
    fi
fi
printf "${Text[@]}"
exit 0
Usage: fprintf "## %5s %5s %5s ##\n## %5s %5s %5s ##\n" '' '*' '' '' '•' ''
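The core of the script is the width correction: printf pads by bytes, so each affected field width is increased by the difference between the argument's byte count and its character count. Here is a minimal sketch of just that arithmetic. The function name adjusted_width is hypothetical and not part of the script above. Instead of the script's wc -m, it assumes the input is valid UTF-8 and counts characters by deleting continuation bytes, so the result does not depend on the current locale.

```shell
#!/bin/bash
# adjusted_width WIDTH STRING   (hypothetical helper, for illustration only)
# Prints WIDTH plus the number of "extra" bytes in STRING.
# Assumption: STRING is valid UTF-8.
adjusted_width() {
    local bytes chars
    bytes=$(printf '%s' "$2" | wc -c)
    # UTF-8 continuation bytes are 0x80-0xBF (octal 200-277);
    # deleting them leaves exactly one byte per character.
    chars=$(printf '%s' "$2" | LC_ALL=C tr -d '\200-\277' | wc -c)
    echo $(( $1 + bytes - chars ))
}

# '•' is 3 bytes but 1 character, so a width of 5 becomes 7.
w=$(adjusted_width 5 '•')
printf "## %5s %${w}s %5s ##\n" '' '•' ''
```

For plain ASCII arguments the difference is zero, so the width comes back unchanged.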
A few things to note:
- I didn't write this script to deal with * (asterisk) values for formats because I never use them. I wrote this for me and didn't want to over-complicate things.
- I wrote this to check only the format strings %s and %b, as they seem to be the only ones that are affected by this problem. Thus, if somehow someone manages to get a multi-byte Unicode character out of a number, it may not work without minor modification.
- The script works great for basic use of printf (not some old-skooler UNIX hacker); feel free to modify it, or use it as is!
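For the * (asterisk) case that the first note leaves out, the width is already a separate argument, so the correction could be applied to that argument directly instead of rewriting the format string. This is only a sketch under assumptions: the function name print_cell is hypothetical, the input is assumed to be valid UTF-8, printf is assumed to pad by bytes (as the output in the question shows), and each character is assumed to occupy one terminal column, which does not hold for wide or combining characters.

```shell
#!/bin/bash
# print_cell WIDTH STRING   (hypothetical helper, for illustration only)
# Right-aligns STRING in a field of WIDTH characters using %*s.
print_cell() {
    local bytes chars
    bytes=$(printf '%s' "$2" | wc -c)
    # Count characters by dropping UTF-8 continuation bytes (0x80-0xBF).
    chars=$(printf '%s' "$2" | LC_ALL=C tr -d '\200-\277' | wc -c)
    # Widen the field by the number of extra bytes so byte-based padding lines up.
    printf '%*s' "$(( $1 + bytes - chars ))" "$2"
}

# Rebuild the two rows from the question, one cell at a time.
for s in '*' '•'; do
    printf '## %5s ' ''
    print_cell 5 "$s"
    printf ' %5s ##\n' ''
done
```

Because the width is computed per argument, no sed rewriting of the format string is needed for this case.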
Answered By - Aesthir Answer Checked By - Gilberto Lyons (WPSolving Admin)