What if you could easily download an entire website?
I’m not just talking about an image or a few files, but every image and file, so that you can browse the site offline.
Despite the ubiquity of internet access, some places either don’t offer it or require payment for service. And that’s annoying.
A likely scenario
Push all the distractions out of your mind for a minute and imagine this scenario with me:
You’re rushing through the airport with a laptop bag swinging from your shoulder, a scalding Starbucks cup in your hand and your Smartphone cradled to your ear.
As you race to the gate you suddenly careen into an elderly woman who was inching her way to the restroom.
The impact catapults the Starbucks cup through the air, your phone slides across the floor like a hockey puck and the old lady falls to the ground with a loud “umph”.
You just ran into an old lady… You’re that guy today.
“Oh Shit! shit shit I’m sorry, Oh my… please forgive me… are you okay, can I help you with that?”
After helping this hapless victim of your recklessness to her feet, you notice almost every eye in the terminal is transfixed on you.
All you can do is stand there, abashed, wishing you could dissolve this day like a bad dream…
Your hair is disheveled, your Armani suit is wrinkled and your smartphone display has more cracks than a city sidewalk.
Yeah, this day sucks.
After apologizing profusely to the lady, you gather your stuff, glance at your watch and realize you have exactly 1 minute to make the gate.
Thankfully it’s right around the corner.
When you approach the Service Agent with your boarding pass, her eyes widen to the size of grapefruits.
After several seconds you ask: “Is something wrong?”
“No sir, uh you look… nervous, are you okay?” she replies, looking at your sweat-soaked suit.
“Yeah,” you say, breathing hard. “I’m fine, thanks.”
Once in your seat you release a long sigh. You couldn’t be happier to be here.
You buckle your belt, flip open your laptop and prepare for a healthy dose of wit from famed webcomic author: Randall Munroe.
Your geeky co-workers kept rhapsodizing about Randall’s site XKCD and now you’ve developed a penchant for the concise, brilliant bursts of funny you get from each read.
You stretch out your legs while brushing beads of sweat from your forehead.
As your computer boots you muse to yourself: “This flight has free Wi-Fi, right?”
Suddenly a beautiful flight attendant glides down the center aisle like a swan.
She’s pretty, you’re single and this is a great chance to test a pick-up line your beer buddy told you last night.
As she approaches your gaze centers on a glistening wedding band.
You immediately forgo your pick-up line idea and revert to your original plan: asking about in-flight internet.
“Excuse me?” you say with a craven tremble in your voice. (Beauty always has a way of making a man nervous)
She walks over and gracefully looks your way with a warm and ready smile.
“Is the Wi-Fi free here?”
Now that she’s close you notice the air is redolent of roses and jasmine.
“Oh no, I’m sorry sir. First-class only – but it’s only $14 for a daily pass”
Aghast she replies: “Excuse me, sir?”
“Oh I’m sorry, I mean… I err.. I thought it was free. $14 for internet? Seriously?”
There’s a 5 second gap in the conversation as you get lost in the serene well of her dreamy eyes…
You snap out of your catatonic stare.
“Uh, okay, um yeah – no thanks.”
So what’s the point?
If you knew how to download full web pages to your computer then you could have saved money and face.
That’s why I’m motivated to show you how to grab complete web sites from the net.
Now this is the thing: if you’re only interested in viewing Wikipedia pages offline, then you’ll be pleased to know that you can easily save Wikipedia articles as e-books.
But what about the rest of the web? How can we save that offline?
Well, we have two options here: wget and HTTrack.
wget for Windows is the quintessential tool for slurping digital content from the digital basin of the web. wget will slake your thirst for delicious content and it’s pretty easy to use.
Super users and keyboard aficionados agree that wget is the de facto tool for mirroring websites to your local hard drive.
Let me show you how this works:
Download the complete wget package from Sourceforge (it’s minuscule in size, about 3 MB).
Install the package
Run the installer and keep clicking Next through each screen to keep the defaults.
When I ran the installer I didn’t encounter any unscrupulous offers or opt-in ads so you should be okay. (I’m always wary of unsavory app installers…)
When the installer finishes, click the Windows button in the lower left corner of the screen and browse to the location of the wget executable.
Copy the path
In my case I found the wget.exe file hanging out here:
C:\Program Files (x86)\GnuWin32\bin
We need to copy the path to the clipboard so we can paste it as a Windows Environment Variable.
This little trick I’m about to show you lets you launch wget from the command prompt rather than taking a circuitous path to that bin folder shown above.
Edit Environment Variables
Click the Windows icon in the bottom left corner of the screen and type:
Click on Advanced System Settings in the left pane.
Head over to the Advanced Tab and choose the Environment Variables… button located in the bottom right corner of the window.
In the bottom pane of the Environment Variables window, you’ll see a section called System variables.
Scroll down until you see the variable named Path.
Select it and choose the Edit… button
Click inside the variable value field and press the End key on your keyboard to skip to the end of the line.
Type a semi-colon and then paste in the path you copied earlier.
This just tells Windows that we have a program that we want to run from that directory.
Alright, now just keep clicking OK until you close out all the dialog boxes.
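If you’re curious what that edit actually does to the Path value, here’s a minimal Python sketch of the idea. The helper name `append_to_path` and the starting value are made up for illustration; the function only builds a string and never changes any real Windows setting.

```python
def append_to_path(path_value: str, new_dir: str) -> str:
    """Return path_value with new_dir added after a semicolon.

    Hypothetical helper for illustration only: it builds a string
    and never touches the real Windows environment.
    """
    # Windows separates Path entries with semicolons.
    entries = [entry for entry in path_value.split(";") if entry]
    # Skip the append if the folder is already listed
    # (compared case-insensitively, as Windows paths ignore case).
    if new_dir.lower() not in (entry.lower() for entry in entries):
        entries.append(new_dir)
    return ";".join(entries)


wget_dir = r"C:\Program Files (x86)\GnuWin32\bin"
old_value = r"C:\Windows\system32;C:\Windows"  # made-up starting value
print(append_to_path(old_value, wget_dir))
```

The semicolon is the whole trick: it separates one folder from the next, which is why the pasted path has to come after one.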
We’re ready to bust open the command prompt to perform some Windows alchemy…
Run wget from the Command Line
If the command line scares you then I can feel your consternation. I used to avoid the command line because it made me feel like I was about one keystroke away from irreparably damaging my computer.
Fortunately it’s relatively easy to use and once you get comfortable with it you can pull off a medley of useful command prompt tricks.
Open the command prompt and type:
You should see a bunch of text and a line that says “Try `wget --help` for more options.”
If you see this you’re good to go; otherwise, check your Environment Variable again to make sure the right path is present. Remember to precede the path with a semi-colon.
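If you happen to have Python installed, here’s another way to run the same sanity check. This is just a sketch: `find_wget` is a name I made up, and it relies on the standard library’s `shutil.which`, which searches the same Path variable the command prompt uses. Keep in mind that a prompt opened before you edited the variable won’t see the change, so open a fresh one first.

```python
import shutil


def find_wget():
    """Return the full path to wget if it is on the Path, else None."""
    # shutil.which looks through every folder listed in the Path variable.
    return shutil.which("wget")


location = find_wget()
if location:
    print("wget found at", location)
else:
    print("wget not found; check the Path variable again")
```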
If you type:
you might actually feel like you’re drowning in the deluge of options. So don’t do that; I don’t want you to get discouraged. Instead here is the most common command you’ll want to use:
wget -r http://www.fixedbyvonnie.com
Incidentally, I recognize that my site is super UGLY but I’m working on that.
I call that recursive wget command a greedy command because it basically says:
give me give me give me
Oh, one more thing: if you include the -c parameter, wget will resume partially downloaded files instead of starting over when you rerun the command after an interruption.
I’ll just leave you with these two options for now but feel free to use that --help switch for fine tuning.
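To see how those two options fit together, here’s a small Python sketch that builds the command and prints it. The helper `build_wget_command` is hypothetical (my own name, not part of wget), and the line that would actually start the download is commented out so nothing runs by accident.

```python
import subprocess


def build_wget_command(url, recursive=True, resume=True):
    """Build the argument list for a wget call (hypothetical helper)."""
    cmd = ["wget"]
    if recursive:
        cmd.append("-r")  # recursive: follow links from the start page
    if resume:
        cmd.append("-c")  # continue: resume partially downloaded files
    cmd.append(url)
    return cmd


cmd = build_wget_command("http://www.fixedbyvonnie.com")
print(" ".join(cmd))
# To actually start the download, uncomment the next line:
# subprocess.run(cmd, check=True)
```

The printed line is exactly what you would type at the command prompt.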
I was planning to do an article for HTTrack too but my fingers are getting tired and I’m feeling lazy. I might do that in a future post. I’m getting hungry too so I need to grab something.
Anyway, if this helped you at all please share the goodness in the comments! Also if you’ve been using some wget parameters that I didn’t mention – please share. I’m sure other people will appreciate your input.