blog.biernacki.ca

Kuba's Online Workshop


WGET only if file is modified

05/06/2009 | Linux, SysAdmin

This is a tiny tip, and probably any Linux guru worth his/her salt knows it already, but I just discovered that wget can check the time-stamp / Last-Modified header before downloading a file. That is pretty cool if you've ever set up shell scripts that fetch or sync something.

I have written some apps in the past that relied on wget to fetch content and cache it locally (as a backup in case of remote failure, which I've hit a couple of times). Caching also reduces the load when that data is being shown on your website/app: if 100 users sign on and check something, the app doesn't hit the remote server with 100 fetches of that data. It just serves the local copy, and the re-sync takes place 10-15 min later.
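To make the "local copy plus periodic re-sync" idea concrete, here is a minimal sketch of an age check on a cached file. The helper name `cache_is_stale` and its arguments are made up for illustration, and it assumes GNU `stat` (as on most Linux boxes). My own setup simply relied on cron for the re-sync, so treat this as one possible way to express the same rule.

```shell
#!/bin/bash
# cache_is_stale FILE MAX_AGE_MINUTES
# Hypothetical helper: succeeds (exit 0) when the cached file is missing
# or was last modified more than MAX_AGE_MINUTES ago.
cache_is_stale() {
    local file=$1 max_age_min=$2
    # No local copy at all counts as stale.
    [ -e "$file" ] || return 0
    local now mtime
    now=$(date +%s)
    # stat -c %Y prints the modification time in epoch seconds (GNU stat).
    mtime=$(stat -c %Y "$file") || return 0
    [ $(( now - mtime )) -gt $(( max_age_min * 60 )) ]
}
```

A caller could run `cache_is_stale /htdocs/RSS/read/feed.xml 15 && echo "time to re-sync"`, where the path is only an example.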

Anyways the command to get a timestamp check before downloading a file is:

wget -N http://google.com/robots.txt

So the above command will fetch the robots.txt file only if one of the following is true:

  • A file of that name does not already exist locally.
  • A file of that name does exist, but the remote file was modified more recently than the local file.
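The two conditions above can be sketched as a small shell function. This is only an illustration of the rule, not how wget implements it internally (wget's own check has more detail). The name `should_download` and the `remote_epoch` argument (the remote file's modification time in epoch seconds) are hypothetical, and GNU `stat` is assumed.

```shell
#!/bin/bash
# should_download LOCAL_FILE REMOTE_EPOCH
# Hypothetical helper: succeeds (exit 0) when a download is warranted
# under the two rules listed above.
should_download() {
    local local_file=$1 remote_epoch=$2
    # Rule 1: a file of that name does not exist locally.
    [ -e "$local_file" ] || return 0
    # Rule 2: the remote file was modified more recently than the local one.
    local local_epoch
    local_epoch=$(stat -c %Y "$local_file") || return 0
    [ "$remote_epoch" -gt "$local_epoch" ]
}
```

For example, `should_download robots.txt 1700000000 && echo "fetch it"` would print the message when robots.txt is missing or older than that timestamp.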

Well, there you have it: a dumb but useful command if you ever need it. Here is a script that I've used in the past to spool & fetch RSS / XML feeds:

#!/bin/bash
#------------------------------------------------------------------
#
# This script will run via CRONTAB and fetch data from the
# urls.txt file, which can be used internally.  This way we minimize
# the number of requests externally for data.
#
# - created by Jakub
#
#------------------------------------------------------------------

basedir=/htdocs/RSS
storedir=/htdocs/RSS/read/
sourcefile=/htdocs/RSS/urls.txt

#------------------------------------------------------------------
# Read the URLS.TXT file to get the URL/filename
#
# Formatted (one entry per line, filename appended after the URL):
# http://google.com/robots.txt/robot.filename.txt
# ^- URL                       ^- filename to save as

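while IFS= read -r s
do
    [ -n "$s" ] || continue                 # skip blank lines
    geturl=$(dirname "$s")                  # everything before the last slash
    filename=$(basename "$s")               # name to save the feed as
    # wget ignores -N when -O is given, so let -N manage a per-feed cache
    # directory (remote file name), then copy to the requested filename.
    cachedir="$basedir/cache/$filename"
    wget -qN -P "$cachedir" "$geturl" &&
        cp -p "$cachedir/$(basename "$geturl")" "$storedir$filename"
done < "$sourcefile"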

#------------------------------------------------------------------
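To show how one line of urls.txt gets split, here is a small sketch using the sample entry from the script's comments. It only demonstrates the `dirname` / `basename` split and does not touch the network; the variable `line` stands in for one entry read from the file.

```shell
#!/bin/bash
# One sample entry in the urls.txt format: <URL>/<filename to save as>
line="http://google.com/robots.txt/robot.filename.txt"

# dirname strips the last path component, leaving the URL to fetch.
geturl=$(dirname "$line")
# basename keeps only the last component, the local filename.
filename=$(basename "$line")

echo "fetch:   $geturl"
echo "save as: $filename"
```

Because the filename rides along as the last path component, it cannot contain a slash, and every entry needs that extra component even when you want to keep the remote name.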
