Looping through multi-line text in Bash

Some notes and observations on using loops in Bash scripts to process multi-line text. My sole reference when learning Bash these days is the Advanced Bash-Scripting Guide. (Any mistakes are my own.)

Consider the simple loop case:

INPUT="1 2 3"
for VAR in $INPUT; do
    echo $VAR
done
# Outputs:
# 1
# 2
# 3

The input is split using the value of the IFS (internal field separator) variable, which defaults to whitespace (space, tab and newline). By changing the value of IFS we can change how the input is split into loop variables.
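As a quick illustration of that last point, here is a minimal sketch that splits a colon-separated string by temporarily changing IFS. The sample value and the names PATH_LIKE, PARTS and OLD_IFS are made up for the example.

```shell
#!/usr/bin/env bash
# Hypothetical sample input: a colon-separated list (one entry contains a space).
PATH_LIKE="/usr/bin:/opt/my tools/bin:/home/me/bin"

OLD_IFS=$IFS      # Remember the current separator.
IFS=:             # Split on colons only.
PARTS=()
for DIR in $PATH_LIKE; do
    PARTS+=("$DIR")
    echo "$DIR"
done
IFS=$OLD_IFS      # Put the previous value back.
# Outputs:
# /usr/bin
# /opt/my tools/bin
# /home/me/bin
```

Because the space is no longer a separator, the middle entry survives as a single loop variable.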

Consider retrieving records from an sqlite3 database. Say we have a sessions table:

Table 1. *sessions* table
ID	Task	Start	Duration
1	2	1316333644	900
2	3	1316344708	6300
3	2	1316344962	600

Let’s retrieve the sessions for task 2:

# Retrieve the start timestamp and session duration (both in seconds).
RESULT=$(sqlite3 my_tasks.db "SELECT start,duration FROM sessions WHERE task=2")

# Unquoted echo squashes line-breaks.
echo $RESULT
# Outputs:
# 1316333644|900 1316344962|600

# Quoted echo retains line-breaks.
echo "$RESULT"
# Outputs:
# 1316333644|900
# 1316344962|600

# We know that each record of retrieved data will contain no whitespace so the
# default IFS will split the unquoted $RESULT into records:
for LINE in $RESULT; do
    echo $LINE
done
# Outputs:
# 1316333644|900
# 1316344962|600

What if the column data we retrieve contains whitespace? Then a line of content can itself contain whitespace, so we cannot separate lines using the default IFS. My first naive solution used the expr command to extract the input line by line:

# RESULT="some multiple lines of text input..."

# While there is more text to process...
while [ ${#RESULT} -gt 0 ]; do
    # Seek the index of the next line break ('index' is a GNU expr extension).
    POS=$(expr index "$RESULT" $'\n')

    if [ "$POS" -gt 0 ]; then
        # If found, trim the next line off the front.
        LINE=${RESULT:0:$POS-1}
        RESULT=${RESULT:$POS}
    else
        # Else the remainder is the last line.
        LINE=$RESULT
        RESULT=""
    fi

    # Process $LINE
done
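For comparison, a commonly used alternative (not the approach above) is to let the read builtin do the line splitting. A minimal sketch, assuming Bash and using a made-up RESULT value in place of the database output; COUNT and LINES are names introduced only for this example:

```shell
#!/usr/bin/env bash
# Made-up sample standing in for the sqlite3 output.
RESULT=$'first line with spaces\nsecond line\nthird'

COUNT=0
LINES=()
# IFS= keeps leading/trailing whitespace; -r keeps backslashes literal.
while IFS= read -r LINE; do
    COUNT=$((COUNT + 1))
    LINES+=("$LINE")
    echo "$COUNT: $LINE"
done <<< "$RESULT"
# Outputs:
# 1: first line with spaces
# 2: second line
# 3: third
```

Feeding the loop with a here-string rather than a pipe keeps it in the current shell, so COUNT and LINES are still set after the loop ends.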

If our column data contains only spaces and tabs (no line breaks), we can set the IFS to split on newline characters only. To represent newline characters in IFS, we need to use the $'string' word form, in which escape sequences are decoded: either as $'\n' and $'\r' or, in hexadecimal, as $'\x0A' and $'\x0D'.
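A small check of the $'string' form, as a sketch: the escaped and hexadecimal spellings produce the same single character, whereas ordinary double quotes leave the backslash sequence undecoded. The variable names are only for illustration.

```shell
#!/usr/bin/env bash
NL=$'\n'          # $'...' quoting: decoded to one line feed character.
NL_HEX=$'\x0A'    # The same character written in hexadecimal.
PLAIN="\n"        # Double quotes: a backslash followed by the letter n.

# Compare the lengths of the three values.
echo "${#NL} ${#NL_HEX} ${#PLAIN}"
# Outputs:
# 1 1 2

if [ "$NL" = "$NL_HEX" ]; then
    echo "same character"
fi
# Outputs:
# same character
```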

Consider a task table which has a text description column which may contain spaces and tabs.

Table 2. *tasks* table
ID	Task_Num	Description	Status
9	1	Fix issue #332.	complete
10	2	Write documentation.	in_progress
11	3	Do XML export.	in_progress

# Retrieve the task number and description of each in-progress task.
RESULT=$(sqlite3 my_tasks.db "SELECT task_num,description FROM tasks WHERE status='in_progress';")

echo "$RESULT"
# Outputs:
# 2|Write documentation.
# 3|Do XML export.

# Default IFS would split on space, which we don't want.
# Set IFS to line feed and carriage return.
IFS=$'\n'$'\r'
for LINE in $RESULT; do
    echo $LINE
done
# Outputs:
# 2|Write documentation.
# 3|Do XML export.

We have successfully split the input into lines but we also want to split each line into columns. The column separator is the pipe character (|). We could locate each column using the cut command. Or we can set IFS and interpret the input as an array.

INPUT="2|Write documentation."

# Using 'cut'
TASK_NUM=$(echo "$INPUT" | cut -d\| -f1)  # Quoted so the line reaches cut intact.
DESC=$(echo "$INPUT" | cut -d\| -f2)

# Using IFS
IFS=\|
FIELDS=( $INPUT )  # Separate the input in array context.
TASK_NUM=${FIELDS[0]}
DESC=${FIELDS[1]}
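A third option, sketched here as my own assumption rather than something taken from the guide: prefix the read builtin with an IFS assignment, so the pipe separator applies to that one command only and the global IFS is left alone.

```shell
#!/usr/bin/env bash
INPUT="2|Write documentation."

# The IFS assignment is scoped to this single read command.
IFS='|' read -r TASK_NUM DESC <<< "$INPUT"

echo "Task $TASK_NUM: $DESC"
# Outputs:
# Task 2: Write documentation.
```

With two variable names, anything after the first pipe lands in DESC, so a description that itself contains a pipe is not cut short.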

If we want to put it all together (split into lines, then into columns), we need to juggle the IFS value:

ORIG_IFS=$IFS        # Save the original IFS
LINE_IFS=$'\n'$'\r'  # For splitting input into lines
FIELD_IFS=\|         # For splitting lines into fields

IFS=$LINE_IFS
for LINE in $RESULT; do
    IFS=$FIELD_IFS
    FIELDS=( $LINE )
    IFS=$LINE_IFS

    # Do something with the fields, e.g. ${FIELDS[0]} and ${FIELDS[1]}
done

# Restore the original IFS
IFS=$ORIG_IFS
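To see the combined loop run without a database, here is a self-contained sketch that substitutes a hard-coded RESULT (copied from the tasks output above) for the sqlite3 call. The set -f line and the SUMMARY array are my additions: unquoted expansions are also subject to filename globbing, so a description containing a character such as * could otherwise be expanded.

```shell
#!/usr/bin/env bash
# Hard-coded stand-in for the sqlite3 query result.
RESULT=$'2|Write documentation.\n3|Do XML export.'

ORIG_IFS=$IFS        # Save the original IFS
LINE_IFS=$'\n'$'\r'  # For splitting input into lines
FIELD_IFS=\|         # For splitting lines into fields

SUMMARY=()
set -f               # Disable globbing while relying on word splitting.
IFS=$LINE_IFS
for LINE in $RESULT; do
    IFS=$FIELD_IFS
    FIELDS=( $LINE )
    IFS=$LINE_IFS

    SUMMARY+=("Task ${FIELDS[0]}: ${FIELDS[1]}")
    echo "Task ${FIELDS[0]}: ${FIELDS[1]}"
done
IFS=$ORIG_IFS        # Restore the original IFS
set +f               # Re-enable globbing.
# Outputs:
# Task 2: Write documentation.
# Task 3: Do XML export.
```

Changing IFS inside the loop body is safe here because the word list after "in" is expanded once, before the first iteration.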