Improving DSL fallover recover


Forum: DSL Ideas and Suggestions
Topic: Improving DSL fallover recover
started by: b1ackmai1er

Posted by b1ackmai1er on July 07 2006,11:43
Hello again,

I had a problem with DSL booting recently, which I tracked down to my restore drive running out of space. The backup file was incompletely written and thus corrupt, and on booting (frugal) it hung trying to restore it. The only way I could recover was to boot from a livecd and delete the backup file.

I wanted to suggest making the system more robust by adding some checks, so that it is not left in a damaged state at shutdown and cannot hang on bootup.

Some suggestions ...

A check that the backup file was written successfully before shutdown.
Allow the boot process to test the validity of the backup file before restoring.
Install the skeleton backup file if corruption is detected.
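
The first two suggestions could look roughly like this. It is only a minimal sketch, assuming the backup is a gzip-compressed tarball and that gzip and tar are usable at that point in the boot. The function names and arguments are invented for illustration and are not taken from the DSL scripts.

```shell
#!/bin/sh
# Hypothetical helpers; names and arguments are illustrative only.

# backup_is_valid FILE: succeed if the gzip-compressed tarball FILE
# can be read from start to end.
backup_is_valid() {
    [ -s "$1" ] || return 1                 # missing or empty file
    gzip -t "$1" 2>/dev/null || return 1    # gzip CRC and length check
    tar -tzf "$1" >/dev/null 2>&1           # tar structure check
}

# write_backup_safely TARGET SRCDIR: write the new archive next to the
# old one and replace the old one only after the new file verifies,
# so a full disk cannot destroy both copies.
write_backup_safely() {
    target=$1
    srcdir=$2
    tmp="$target.new"
    if tar -C "$srcdir" -czf "$tmp" . 2>/dev/null && backup_is_valid "$tmp"; then
        mv -f "$tmp" "$target"
    else
        rm -f "$tmp"
        return 1
    fi
}
```

At boot, a restore would then only be attempted if backup_is_valid succeeds, falling back to the skeleton backup otherwise. Note that writing to a temporary file first needs room for the old and the new archive at the same time.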

Regards b1m1

Posted by WDef on July 08 2006,09:44
Checking that sufficient space is available before writing the backup file in theory is a good idea.  The same thing has happened to me before - you lose both your current and former backups :(

There would be the question, though, of what would be "sufficient" space. AFAIK tar can't estimate the size of an archive before creating it, though it can count total bytes written using --totals

Perhaps a "backup" could be made first to /dev/null, with the bytes written counted using --totals. This would give the size of the potential backup before compression.

gzip compression ratios spread between about 0.1 and 0.75 so a worst-case multiplier could be used to estimate the max possible size of the backup tarball. If this was greater than df -h showed free on the backup partition, the backup could be aborted.

Haven't tried this, might not work (--totals might not work when writing to /dev/null ?)

Posted by WDef on July 08 2006,11:10
Tried it, appears it would work.  

Writing the uncompressed archive to /dev/null is very fast so this would not slow down the backup process by much, either.
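
The dry run can be sketched as a small function. This is only an illustration, assuming GNU tar (for --totals); archive_bytes and its directory argument are made-up names, not part of filetool.sh.

```shell
#!/bin/sh
# archive_bytes DIR: print the size in bytes of the uncompressed tar
# archive that DIR would produce, without storing the archive anywhere.
# Hypothetical helper; assumes a tar that prints
# "Total bytes written: N ..." on stderr when given --totals.
archive_bytes() {
    # LC_ALL=C keeps the message in English so awk can match it.
    # The redirection order sends stderr into the pipe and the
    # archive itself to /dev/null.
    LC_ALL=C tar -C "$1" --totals -cf - . 2>&1 >/dev/null |
        awk '/Total bytes written/ {print $4}'
}
```

Something like `archive_bytes /home/dsl` would then print a single number that can be compared with the free space on the backup partition.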

Might try to code it into the backup scripts in the next few days and see how it goes

Posted by roberts on July 08 2006,15:20
Just curious, what's in your backup that is taking so much space?
Hopefully not extensions, as they should not be in there.  I would guess the fat mail client?
I will take a look into this for v3.1.

Posted by WDef on July 09 2006,09:41
Quote
gzip compression ratios spread between about 0.1 and 0.75 so a worst-case multiplier could be used


I should have said a 'typical' range.  Brain was not engaged because, for example, if the user wants to back up a whole lot of compressed files (eg jpegs), obviously gzip will compress these by little or nothing.

So the only upper bound on size that will always be 100% safe is very close to the size of the uncompressed archive.
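
Taking the uncompressed size as the bound, the comparison with free space might be sketched as below. The helper names are hypothetical. The sketch assumes a df that accepts the POSIX -k and -P options, and it counts kB in 1024-byte units because that is what df -k reports.

```shell
#!/bin/sh
# Hypothetical helpers for comparing an archive size with free space.

# bytes_to_kb_ceil BYTES: convert bytes to 1024-byte kB, rounding up.
bytes_to_kb_ceil() {
    echo $(( ($1 + 1023) / 1024 ))
}

# free_kb PATH: free space in kB on the filesystem holding PATH.
# df -P forces one line per filesystem, so the value is always
# field 4 of the second line, even with long device names.
free_kb() {
    df -kP "$1" | awk 'NR == 2 {print $4}'
}

# fits BYTES FREE_KB: succeed only if the uncompressed size is
# strictly smaller than the free space.
fits() {
    [ "$(bytes_to_kb_ceil "$1")" -lt "$2" ]
}
```

Because the real backup is compressed, this check errs on the side of refusing a backup that would in fact have fitted, which is the safe direction.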

Robert, if you're busy and want to wait until I have a go at this for your perusal/modification/rejection/pooh-poohing, by all means do so.  I'm on holidays so beach takes precedence, but it might make me feel like my current existence has meaning ...

Posted by WDef on July 09 2006,13:02
Here's some rough notes towards implementing the above, *untested*.  (I have no idea as yet, for example, about possible effects of calling shutdown -c )

Function check_sufficient_space could be called in filetool.sh right after 'if [ $1 == "backup" ] ; then'

Code Sample
check_sufficient_space(){

TARERR=/tmp/tarerr.$$

# Dry run: send the uncompressed archive to /dev/null; --totals reports its size on stderr
tar -C / -T "$HOME/.filetool.lst" -X "$HOME/.xfiletool.lst" --totals -cf - 1>/dev/null 2>"${TARERR}"

ARCHSIZE=$(awk '/Total bytes written/{print $4}' "${TARERR}")
[ -z "$ARCHSIZE" ] && echo "Nothing to backup?"
rm -f "${TARERR}"

# bytes to kB as counted by df -k (1024-byte units), rounded upwards
ARCHSIZEK=$(( (${ARCHSIZE:-0} + 1023) / 1024 ))

# -P keeps each filesystem on a single line, so field 4 of line 2 is the free space
FREESPACE=$(df -kP "$MOUNTPOINT" | awk 'NR==2{print $4}')

echo "Upper bound on backup size = $ARCHSIZEK kB"
echo "Available space on $MOUNTPOINT = $FREESPACE kB"

if [ "$ARCHSIZEK" -ge "$FREESPACE" ]; then
   # Needs a flua gui or something here
   echo "WARNING: there may be insufficient space on $MOUNTPOINT for backup."
   while true; do
      echo "Proceed anyway? (y/n)"
      read ANS
      case $ANS in
         n*|N*) echo "Aborting backup/shutdown .."
            sudo shutdown -c
            if [ "$MOUNTED" = "no" ]; then
               sudo umount "$MOUNTPOINT"
            fi
            exit 1;;
         y*|Y*) break;;
         *) echo "Invalid response.";;
      esac
   done
fi
}

Posted by WDef on July 09 2006,20:57
After a little testing, first problem:

'shutdown -c' doesn't abort the shutdown running from this script, it can't find a pid for the running shutdown.  One for tomorrow.

Posted by WDef on July 13 2006,10:35
'shutdown' isn't running by that stage, init 0 has already been called (doh). Can get out of shutdown by changing runlevels (see below).
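
For anyone not familiar with it: runlevel prints the previous and the current runlevel separated by a space, for example "5 0" during a halt, or "N 5" when there was no previous runlevel. A rough sketch of splitting that string, with made-up function names:

```shell
#!/bin/sh
# Hypothetical helpers; $1 is the output of `runlevel`, eg "5 0".

# prev_runlevel "P C": print the previous runlevel (text before the space).
prev_runlevel() {
    echo "${1% *}"
}

# cur_runlevel "P C": print the current runlevel (text after the space).
cur_runlevel() {
    echo "${1#* }"
}
```

The previous runlevel is the one to hand to telinit when backing out of a shutdown. If it comes back as "N" there is no earlier runlevel to return to, so that case would need handling separately.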

A flag could be set somewhere (eg in /etc/sysconfig) to prevent a restore from occurring when runlevel 5 starts up again. Otherwise some files which were changed during the session might get overwritten by the restore.

Function call:
Code Sample
.
(snip)
trap failed SIGTERM

if [ $1 == "backup" ]; then
 check_sufficient_space
(snip)


Skip restore if backup had been aborted:
Code Sample
(snip)
if [ $1 == "restore" ]; then
 if [ -e /etc/sysconfig/abortedbackup ]; then
     sudo rm -f /etc/sysconfig/abortedbackup
 else
 if [ -f /etc/sysconfig/des ]; then
    TARGETFILE="backup.des"
(snip)
.
.
(snip)
   echo "${BLUE}Done.${NORMAL}"
 fi
 fi
 clean_up 0
fi
echo "I don't understand the command line parameter: $1"
(snip)


Function:
Code Sample

check_sufficient_space(){

TARERR=/tmp/tarerr.$$
# Dry run: send the uncompressed archive to /dev/null; --totals reports its size on stderr
tar -C / -T "$HOME/.filetool.lst" -X "$HOME/.xfiletool.lst" --totals -cf - 1>/dev/null 2>"${TARERR}"
ARCHSIZE=$(awk '/Total bytes written/{print $4}' "${TARERR}")
rm -f "${TARERR}"

# An empty ARCHSIZE (tar printed no total) is treated like zero
if [ "${ARCHSIZE:-0}" -eq 0 ]; then echo "Nothing to backup?"; clean_up; fi

# bytes to kB as counted by df -k (1024-byte units), rounded upwards
ARCHSIZEK=$(( (ARCHSIZE + 1023) / 1024 ))

# -P keeps each filesystem on a single line, so field 4 of line 2 is the free space
FREESPACE=$(df -kP "$MOUNTPOINT" | awk 'NR==2{print $4}')

echo "Upper bound on backup size = $ARCHSIZEK kB"
echo "Available space on $MOUNTPOINT = $FREESPACE kB"

if [ "$ARCHSIZEK" -ge "$FREESPACE" ]; then
  a=$(runlevel)    # "previous current", eg "5 0" during shutdown
  if [ "${a#*[ ]}" -eq 5 ]; then
  # Backing up from gui
  :
  # ------- eg flua GUI goes here which just exits script if user aborts backup -------------
  else
  # We are in shutdown/reboot
     echo "WARNING: there may be insufficient space on $MOUNTPOINT for backup."
     while true; do
     echo -n "Proceed anyway? (y/n) "
     read ANS
     case $ANS in
        n*|N*) echo "Aborting backup/shutdown .."
          if [ "$MOUNTED" = "no" ]; then
               sudo umount "$MOUNTPOINT"
          fi
          sudo touch /etc/sysconfig/abortedbackup
          exec >/dev/null                 # prevents trapped "failed" message when init kills this script
          sudo telinit "${a%[ ]*}";;      # back to previous runlevel (normally 5)
        y*|Y*) break;;
        *) echo "Invalid response.";;
     esac
     done
  fi
fi
}


If anyone feels like testing the above that might be useful.

Another method (if switching runlevels about turns out to have problems) might be to move backup/restore out of the shutdown scripts altogether, and instead run filetool.sh from exitcheck.sh, before shutdown is called. However, this means that if a shutdown is initiated by some means other than the menu (eg CTRL-ALT-DEL), a backup will not occur at all, so imho this is not preferable.
