This is a tiny tip, and probably any Linux guru worth their salt already knows it, but I just discovered that wget can check the timestamp (the Last-Modified header) before downloading a file.  That’s pretty cool if you’ve ever set up shell scripts that fetch or sync something.

I have written some apps in the past that rely on wget to fetch content and cache it locally, partly as a backup in case the remote server fails (which has happened to me a couple of times).  Caching also reduces load when that data is being shown on your website/app.  If 100 users sign on and check something, the app doesn’t hit the remote server 100 times for the same data; it serves the local copy, and the re-sync takes place 10-15 minutes later.

Anyway, the command to check the timestamp before downloading a file is:

wget -N http://google.com/robots.txt

The above command will fetch the robots.txt file only if one of the following is true:

  • A file of that name does not already exist locally.
  • A file of that name does exist, but the remote file was modified more recently than the local file.
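
The two conditions above can be sketched as a small bash function. This is only an illustration of the decision rule, not how wget is implemented: `should_fetch` is a hypothetical helper, the remote modification time is passed in as epoch seconds instead of being read from a Last-Modified header, and `stat -c %Y` assumes GNU coreutils. As far as I can tell from the wget manual, wget also re-downloads when the file sizes differ, which this sketch ignores.

```shell
#!/bin/bash
# should_fetch LOCAL_FILE REMOTE_MTIME
# Hypothetical helper: succeeds (exit 0) when a download would be needed,
# i.e. LOCAL_FILE is missing or older than REMOTE_MTIME (seconds since epoch).
should_fetch() {
  local file=$1 remote_mtime=$2 local_mtime
  [ -e "$file" ] || return 0              # no local copy: fetch
  local_mtime=$(stat -c %Y "$file")       # GNU stat: mtime in epoch seconds
  [ "$remote_mtime" -gt "$local_mtime" ]  # fetch only if the remote is newer
}
```

For example, `should_fetch robots.txt 1704067200 && echo "would download"` prints the message when robots.txt is missing or was last modified before 2024-01-01 00:00:00 UTC.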

Well, there you have it: a simple but useful option if you ever need it.  Here is a script that I’ve used in the past to spool & fetch RSS / XML feeds:

#!/bin/bash
#------------------------------------------------------------------
#
# This script will run via CRONTAB and fetch data from the
# urls.txt file, which can be used internally.  This way we minimize
# the number of requests externally for data.
#
# - created by Jakub
#
#------------------------------------------------------------------
basedir=/htdocs/RSS
storedir=/htdocs/RSS/read/
sourcefile=/htdocs/RSS/urls.txt
#------------------------------------------------------------------
# Read the URLS.TXT file to get the URL/filename
#
# Format, one entry per line:
# http://google.com/robots.txt/robot.filename.txt
# ^- URL                       ^- filename to save as
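while IFS= read -r s || [ -n "$s" ]; do
  [ -z "$s" ] && continue                # skip blank lines
  geturl=$(dirname "$s")                 # everything before the last slash
  filename=$(basename "$s")              # name to save the feed as
  # wget ignores -N when -O is given, so keep a timestamped copy under the
  # remote name in a per-feed cache directory, then copy it into place.
  # Assumes wget names the download after the last component of the URL.
  cachedir="$basedir/cache/$filename"
  wget -qN -P "$cachedir" "$geturl" &&
    cp -p "$cachedir/$(basename "$geturl")" "$storedir$filename"
done < "$sourcefile"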
#------------------------------------------------------------------
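
Each urls.txt entry packs the URL and the local filename into one string, split at the last slash. As a rough sketch, the same split can be done with bash parameter expansion instead of calling dirname and basename for every line. `split_entry` is a hypothetical helper name, and the tab-separated output is just one convenient choice.

```shell
#!/bin/bash
# split_entry ENTRY
# Hypothetical helper: prints "<URL><TAB><filename>" for one urls.txt entry,
# splitting at the last slash without starting any subprocesses.
split_entry() {
  local entry=$1
  # ${entry%/*}  removes the shortest trailing "/..." (leaves the URL)
  # ${entry##*/} removes the longest leading ".../"  (leaves the filename)
  printf '%s\t%s\n' "${entry%/*}" "${entry##*/}"
}
```

Inside a loop over the file, the two parts could be read back with `IFS=$'\t' read -r geturl filename < <(split_entry "$s")`.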


One Comment to “WGET only if file is modified”

  1. vvs | August 23rd, 2013 at 7:00 AM

    It looks like -N doesn’t cope well with -O since it’s like a shell redirect in fact:
    $ wget -O output.file -NP ./gpff-gz/ -i ./files_list
    WARNING: timestamping does nothing in combination with -O. See the manual

    $ wget --version
    GNU Wget 1.12 built on linux-gnu.

    @ubuntu 12.04 lts
