I Wish I Was Excited About The GoPro Camera Craze
February 22, 2013
Let’s take a moment to talk about technology in the consumer space. A few years ago, an exciting little device started popping up. At first it lurked around on some small tech blogs and action sport forums. Then it moved onto mainstream technology blogs, and once the ball started rolling, I started seeing it in television commercials. I don’t even own a television, yet the few times I found myself near one, this thing seemed to show up on every commercial break. Towards the end of last year, a video titled and tagged “GoPro” was being uploaded to YouTube every minute. How did this little device explode into popularity?
Let’s talk about what these little ‘GoPro’ cameras do:
- Shoots video (in a wide variety of different resolutions and framerates)
- Shoots still images (in a wide variety of delays, timers, and bursts)
- Shoots underwater, in the dirt, in the rain, in space, or just about anywhere you want.
See a theme here? It’s crazy flexible. When I was debating whether to buy one, I thought about all the different use cases and what sort of audiences this camera was being marketed to. It’s great for documenting action sports. It’s great for people who just want to take some pictures or video underwater. It’s great for people who want to add a high-quality video recording unit to their remote controlled airplane. I was thinking it’d be awesome for some time-lapse video because of the built-in intervalometer. Some expensive DSLRs don’t even have built-in intervalometers!
Let’s talk about features:
- Rugged
- Cheap
- Simple to use
And let’s elaborate on the simple-to-use part. There are two buttons (at least on the second generation unit I had): Power and Mode. There’s no camera focus to deal with. There’s no LCD screen for people to fiddle with composition, and the wide-angle lens means if the camera is pointed at your subject, it’s going to be in the shot. The ruggedness factors into the simplicity as well, because I can throw it in my backpack, tie it to a kite, use it in the rain, or leave it near the swimming pool. It requires little thought or worry. It’s a brilliant, flexible, easy-to-use, inexpensive device. But I am worried.
Though You Had Strong Hardware and Insane Viral Marketing Success, your Software and Customer Support Failed.
(And Failed in the Most Ridiculous, Absurd, How-The-Hell-Did-This-Get-Past-Your-QA-Department Way)
What’s my complaint? Well, as mentioned, I was planning on doing some time lapse photography using the built-in intervalometer that came with the camera. I was also going on a family vacation the next week and wanted to try out the underwater video recording in the pool and ocean. So I ordered their newest (at the time) $300 GoPro Hero HD and a chest strap, and a few days later I was happily recording video and shooting pictures. Looking at the footage, the quality impressed the hell out of me. I finally understood why everyone wanted HD televisions. The time-lapse footage came out pretty great as well: a big, wide-angle shot, perfect for watching clouds and shadows race across the screen. But little did I know that with each of these experiments my camera was slowly ticking away, its life and usefulness decreasing with every single shot.
Fast-forward to the end of my vacation. I shot probably 50 or so videos of my family jumping off the diving board into the water and thousands of still images that would later become some neat time lapse segments. I was getting slightly less than ideal battery life, which I thought was an okay compromise for all the great shots, but what was really starting to get on my nerves was that I couldn’t seem to take more than a couple of time lapse photos. I’d set it to the timed image mode, press the shutter button, and it would take a couple of photos (I could see the little red light blink), but then it would mysteriously stop. Also, the 3-digit LCD display seemed to be stuck at 999. I had seen this before when doing extended time lapses. It meant I had taken over 999 videos or pictures, exceeding the LCD display limit, but when I plugged the camera into the computer, I would find my files rolled over from GOPR0999.jpg to GOPR1000.jpg, GOPR1001.jpg, etc.
Turns out that although it was smart enough to roll over after 999 photos, it wasn’t smart enough to roll over a few more times. After some hours of troubleshooting (which included connecting it to the computer, transferring files, reformatting the SD card, trying different SD cards, power cycling the camera, etc) I gave up. I couldn’t get it to take any more photos or video. Thankfully it was the last day of my vacation. Although that meant I would miss recording my four year old cousin finally get up the nerve to swim across the pool without floaters, I would have the next couple days alone to get my new camera working again.
Eventually I felt helpless. I decided to shoot GoPro an email (on 7/21/10):
Hello I recently purchased a GoPro HD Hero and it worked fine for a few weeks. Now however when I try to set up an image sequence (set to once every 5 seconds), it takes about 8 or 9 photos and then locks up. I cannot turn it off or stop its capture. When I plug it into my computer, I see that there are hundreds of empty folders. Only in the first folder are there about 8 or 9 photos.
I tried powercycling it by taking the battery out and putting it back in. I also wipe out the memory card both via the computer, as well as selecting the ‘delete all’ option on-camera. Whenever I try to do an image sequence, it just locks up after the first 8 or 9 photos. I am guessing it just starts making empty folders every 5 seconds on the memory card at that point.
I was happy with the product initially, but this is a very frustrating problem. I have missed opportunities to take awesome image sequences. Please advise what I can do about this.
Stephen
The next day I received this response:
Hi Stephen,
Do you take many time lapse pictures? What may have happened, is you may have encountered a known issue with our current firmware, in which the camera is no longer able to save files after it has taken 9999 images. Could you please let me know what the name of the last successfully captured image was? If this is the case, we would need you to send in your camera, and would reflash your firmware to fix the issue.
Otherwise, what brand/specification of SD card are you using? We have had many users facing issues with less reputable SD card manufacturers, and this could potentially creating the issue. In house we use Kingston and Patriot brand Class 4 or higher SD cards, and can fully recommend these.
Please let me know if this helps.
Many Thanks,
GoPro Support
to which I replied:
Yes, I do take many time lapse pictures. That was one of the reasons why I purchased the camera not too long ago. The last successfully captured image filename is GOPR9999.jpg in the folder 100GOPRO and there are a lot of empty folders such as 101GOPRO and 102GOPRO etc.
I have a Kingston SDHC 16GB Class 4 memory card, which I do not believe is the issue.
Is the only solution to send this camera to you to reflash the firmware? Is this something I cannot do if you send me the firmware and instructions? Is there not a reset button on the camera somewhere?
This is a frustrating experience that the firmware will not let the user take over 9,999 pictures. I do not understand how this is a “known issue” and yet you still shipped me a camera that has such an arbitrary limit to the number of pictures. It is unfair to your customers to sell a camera that only takes XX number of pictures before it needs to get shipped back (on the customer’s dime) and “reset” due to a software error.
Stephen
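For the curious, here is roughly what a counter that rolls over correctly could look like. This is a hypothetical Java sketch, not GoPro’s firmware: the class name FileCounter and the exact naming format are my own assumptions, modeled on the 100GOPRO folder and GOPR9999.jpg file names in my reply. The idea is that when the four-digit file number runs out, the camera starts a new folder and resets the counter instead of refusing to save.

```java
// Hypothetical sketch of DCF-style photo naming (100GOPRO/GOPR0001.jpg, ...).
// This is NOT GoPro's firmware; it only shows a four-digit counter that rolls over.
public class FileCounter {
    private int folder = 100; // folder number, as in 100GOPRO
    private int file = 0;     // last file number used in the current folder (0001..9999)

    /** Returns the path for the next photo, starting a new folder after GOPR9999. */
    public String nextPath() {
        if (file == 9999) {
            // Four digits are exhausted: move to the next folder and restart at 0001.
            // (This sketch ignores what should happen after folder 999.)
            folder++;
            file = 0;
        }
        file++;
        return String.format("%dGOPRO/GOPR%04d.jpg", folder, file);
    }

    public static void main(String[] args) {
        FileCounter counter = new FileCounter();
        String path = "";
        for (int i = 0; i < 10_000; i++) {
            path = counter.nextPath();
        }
        // The 10,000th photo lands in a fresh folder: prints 101GOPRO/GOPR0001.jpg
        System.out.println(path);
    }
}
```

A real implementation would also have to decide what to do once folder 999 is full, which this sketch leaves out.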
And here was the last email from them:
Update for Case #57639 – “Picture sequence not working”
Hi Stephen,
I’m sorry for the inconvenience. You are indeed experiencing an issue with our current firmware, in that its internal file counter cannot exceed 9999 and so is unable to save past this point. The only immediate fix we have is to reflash the firmware, which we need you to send the camera in to accomplish.
We are currently completing testing on our latest firmware release, which will fix the issue you are facing. We hope to have this released and available on our website by the end of summer, hopefully sooner, but need to make absolutely sure that image quality in all the new features is preserved. If you are in no hurry, you may elect to wait for this firmware upgrade, and will be notified of its release if you sign up for our newsletter:
http://www.goprocamera.com/newsletter
Please let me know how you would like to proceed.
Many Thanks,
GoPro Support
And that was it. They wanted me to either pay $20 in shipping to send my new camera across the United States and wait a week without it, just so they could “reset” a software bug and I could take another 9,999 photos, or sign up for their marketing campaign so I could be notified of a new firmware release that was expected by the end of summer. While troubleshooting this issue and browsing online forums, I noticed users complaining about repeated delays of firmware releases that the GoPro website had boldly advertised.
What did I do? I quietly cursed at GoPro, paid the $20 in shipping and insurance, spent a week without my new camera, and silently vented my anger and frustration. Well, that was until now, with the publication of this blog post, nearly two years later.
Even though this happened long ago and the GoPro craze has only grown, looking back it still rubs me the wrong way. This is a product that I wanted to love, but made by a company that I cannot trust, and one that seriously shit on its customers.
You do not sell a fucking camera with a software-defined limited number of shots.
Not to a few hundred people, not to a few thousand people, and in no way should you be selling hundreds of thousands of cameras around the world with a critical firmware or software bug that arbitrarily limits the number of photos you can take with it. But GoPro did exactly that. And even though I’ll probably get a next generation Hero at some point soon, I’ll be calling them assholes in my head as I click the order button, but that’s only because there isn’t a viable alternative to be seen on the market. Not yet.
But according to the latest reviews on Amazon, it doesn’t look like I’m alone at all.
Update on 3/21/2013
And if you thought it was absurd of them to sell a camera with a ‘cap’ to the number of photos you could take, they recently tried to use the Digital Millennium Copyright Act to forcibly remove a negative review of one of their products on a website. Yup, they’re ethically challenged monsters. They’re not getting a dime from me ever again.
Let’s take a quick count of all the ways we can send instant messages via Google products:
- The chat bar in Gmail
- The chat bar in Google+
- Google Talk for mobile phones
- Google Voice
- Gmail SMS
- Google Messenger (in the mobile app for Google+, formerly known as Huddle)
Some of these sound pretty similar. Why am I listing the chat bar in Gmail as a separate item from the chat bar in Google+? Because they are separate. They hold different contacts. As an example, here are the two chat screens, logged in from the same account and captured at the same moment in time.
Why aren’t these the same!? I’m sure someone at Google has a good reason for this, but it still leaves me in an awkward position when I want to chat with someone. What if I am browsing Google+ and have the urge to instant message my friend about the latest picture of a hat-wearing cat that he just shared? Taking a quick look at the chat sidebar, I see that he’s not online. Oh, too bad, I think to myself. But a few minutes later I switch back to Gmail and see that he is online. What the heck? Why do I have more friends to talk to on Gmail than I do on Google+? Some of the chats carry over from Gmail to Google+ (such as if I have two browser tabs or windows open, one on each service), so it seems like they are… sort of integrated? Maybe?
Okay that is weird. What about the Google Talk app?
Let’s think of another common situation: I’m on my phone’s Google Talk app to chat with friends, and I want to share a picture. I use Google Talk instead of traditional SMS text messages very frequently. But when I am sending text messages, I have the option of sending MMS picture messages. This is all handled very seamlessly and isn’t too big of a deal. But when I am using my phone’s Google Talk app to chat, how the heck do I send a picture? Let’s think. I could:
- Send an email with an attachment and tell the person to check their email
- Send an MMS and tell the person to check their phone
- Share it on Google+ and tell the person to visit my profile page
- Start a conversation in Google Messenger (formerly Huddle, part of the Google+ App) and share the picture and tell the person to check their Google+ Messenger App
Why is this so difficult? An easy choice would be to send an email with an attachment. After all, if the person is on Google Talk they are probably on Gmail, right? Maybe not so, as you can have the Google+ page open to chat, without having a Gmail page open. So they might miss the email. It’s also not a good choice because many smart phones take 5 or even 8 megapixel pictures. The image that you want to share might be anywhere from 300 KB to 1.5 MB in size. By the time the person finally receives the email with the picture attached, it might be 20 minutes later, and it won’t have any context. There’s just no simple way of sharing pictures via Google Talk.
Does that mean we should all be communicating via Google Messenger (formerly Huddle) — the latest and greatest way to communicate?
Google Messenger adds the ability to send group messages and share pictures, besides offering basic one to one instant messaging. Unfortunately you can only use Google Messenger on your mobile phone. Which means I am definitely not going to use it while I am near a computer where I have a large screen and an actual keyboard. And I doubt my friends and contacts will be using it either. It also has no integration with Google Talk or any of the other ways to send instant messages through Google.
So… Google Voice?
Now take a minute to consider sending SMS messages to each other through Google. If you sign up for Google Voice, you can get a special phone number. You can access Google Voice from a computer through its web interface, or on your mobile phone through the Google Voice app. Sounds nice, right? Unfortunately Google Voice offers no way to send a picture or an MMS equivalent.
The other problem is that all of your friends will probably need to add a second phone number to your contact information, or create a second contact entry for you. What’s even more confusing is when your friends go to text you back. Here’s what I have to think about each time I try to message my friend Nick:
And okay, whatever, I have two entries for Nick. But I have another friend, Joe, who also texts from both his regular mobile number and Google Voice. Maybe they use Google Voice 95% of the time, but sometimes their cell phones are in a weird reception spot where they don’t have 3G (data) access, which Google Voice needs in order to work, yet they have basic GSM service and are able to send SMS text messages. Or maybe they use Google Voice 95% of the time but want to send a picture message or MMS equivalent. Nope, can’t do that in Google Voice. So every now and then I get messages from the same person that show up on my phone as coming from two different sources, and the conversations may be completely unrelated. A couple hours later I have to ask myself: Which conversation thread do I reply to? Should I respond to the Google Voice number? Should I respond to the regular number?
All of this really raises the question: why aren’t any of these instant messaging services integrated with each other? How hard can this be?
I’m really not a big Apple fan, but look at what they’re doing with iMessage. It seems like they understand what’s going on. Google, on the other hand, keeps introducing new ways to do the same thing (with small functional improvements) yet seems to have no desire to make them play nicely with one another.
And please tell me what they were thinking with this Gmail Labs Feature:
And here’s what the recipient sees:
Which really makes me wonder what is going on. I have a Google Voice number, so why not send the text from that number? Instead, it’s some random number that seems to be generated on the fly. And what is the recipient supposed to do? Text a message back to a random number? Or look, there’s an email address. Should the recipient email the person back? What if the sender walks away from their email? They wouldn’t have access to that particular gchat anymore, nor would they necessarily have access to an email.
Here’s just another Google product that fails to integrate with the rest. Granted, this is a “beta” and “labs” feature, but all of these things really make me wonder: what is Google’s preferred way for us to send instant messages to each other? It seems like each of these products offers some neat functionality that the others lack, but they all fall short of delivering one easy go-to solution. And that’s kind of weird, because having all of my messages in one easily accessible spot seems to align very well with the company’s corporate mission:
Google’s mission is to organize the world’s information and make it universally accessible and useful
All of this is really unfortunate because I really want to use Google products and services. I like chatting in Gmail, and I tend to use Google Talk on my mobile phone just as often as traditional SMS text message. So Google, please, for the love of this tech-savvy world, integrate some of your chatting services.
Puppy Linux saved the day after the death of my SSD
November 16, 2011
This past weekend my hard drive failed. Saturday night I attempted to turn on my computer and received the all-too-familiar message “DISK BOOT FAILURE, INSERT SYSTEM DISK AND PRESS ENTER” which is quite annoying. I immediately suspected my OCZ Agility 2 60 GB solid state drive, only because the same thing happened exactly 7 months and 1 day ago. My BIOS wouldn’t even recognize the hard drive, which means it failed pretty hard.
What I had to do exactly 7 months and 1 day ago involved contacting the OCZ customer support website and creating an RMA support ticket. The folks there were not very quick to respond, nor very sympathetic, but I did manage to get an RMA, mail my SSD back, and get a “working” SSD about a week later. That drive failed this past weekend, meaning I’ll be contacting OCZ support and jumping through their hoops once again (ugh...). So let’s just recap my excellent experiences with these fancy SSDs that seem to be all the rage in this modern world:
| Drive | In service since | Failed on | Age at death |
|-------|------------------|-----------|--------------|
| SSD 1 | 10/5/2010 (purchased) | 4/3/2011 | 6 months 2 days |
| SSD 2 | 4/11/2011 (replacement) | 11/12/2011 | 7 months 1 day |
The results are pretty scary. I could fill up 60 GB of data on that drive and lose all of it every 6 or 7 months. And I did. And it sucked. Am I just unlucky? Did I get two lemons?
Apparently not. Looking at Jeff Atwood’s collection of SSD lifespans, it would appear that solid state drives fail pretty often. Here are the numbers he found, based on eight SSDs purchased over the last 2 years:
- Super Talent 32 GB SSD, failed after 137 days
- OCZ Vertex 1 250 GB SSD, failed after 512 days
- G.Skill 64 GB SSD, failed after 251 days
- G.Skill 64 GB SSD, failed after 276 days
- Crucial 64 GB SSD, failed after 350 days
- OCZ Agility 60 GB SSD, failed after 72 days
- Intel X25-M 80 GB SSD, failed after 15 days
- Intel X25-M 80 GB SSD, failed after 206 days
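To put those numbers in perspective, here’s a quick Java sketch that computes the mean and median of the eight lifespans in the list above. By my arithmetic they come out to about 227 days (mean) and 228.5 days (median), or roughly seven and a half months, which is in the same ballpark as my own two drives. The class and method names are just my own for illustration.

```java
import java.util.Arrays;

// Summarizes the eight SSD lifespans from Jeff Atwood's list (days until failure).
public class SsdLifespans {
    static final int[] DAYS = {137, 512, 251, 276, 350, 72, 15, 206};

    /** Arithmetic mean of the values. */
    static double mean(int[] values) {
        return Arrays.stream(values).average().orElse(Double.NaN);
    }

    /** Median: middle value, or the average of the two middle values for an even count. */
    static double median(int[] values) {
        int[] sorted = values.clone(); // don't reorder the caller's array
        Arrays.sort(sorted);
        int mid = sorted.length / 2;
        return sorted.length % 2 == 0
                ? (sorted[mid - 1] + sorted[mid]) / 2.0
                : sorted[mid];
    }

    public static void main(String[] args) {
        System.out.printf("mean = %.1f days, median = %.1f days%n", mean(DAYS), median(DAYS));
    }
}
```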
Scary! I have about 8 regular magnetic hard disk drives, some of them over 5 years old, and not one of them has failed me. Is the speed that SSDs bring to the table worth the risk that comes with losing 60 GB of data twice a year? In my situation, yes. Here’s what my storage looks like (an incredible diagram, I know):
As you can see, the only data that I lose when my SSD fails every 6-7 months is the operating system (Windows 7 Professional) and some commonly used program files (Eclipse, Notepad++, Adobe Lightroom, Adobe Photoshop). All of these can be replaced relatively easily. In fact, I keep the software installers on my 1 TB mirrored hard drives. That’s where I store my precious data (photos from a 7 year time span, important documents, saved game files, etc). Windows 7 has a neat disk management tool that allows me to set up mirrored hard drives without too much thinking. The other hard drives that aren’t mirrored (and therefore have no redundancy if they were to fail) contain data that would suck to lose, but wouldn’t cause me to cry. All of my movies and TV shows can be (slowly) replaced if I really wanted them again, program files can be reinstalled, and the misc files such as rendered compositions (I do some video editing and CGI work from time to time) can be re-rendered.
So why does my situation still suck? And why is the title of this article a shout-out to Puppy Linux? Because once my SSD dies, I lose my operating system. Without the operating system, I lose easy access to all of these files. My data is safe, but I can’t access it. When the drive my OS lived on died, my first instinct was to just re-install the OS somewhere else. But where? All of my hard drives had stuff on them. Stuff that I could lose if necessary, but I didn’t want to resort to that. If only I could just get basic access to the files and move some things around… Hmm…
That’s where Puppy Linux totally saved the day. I was able to download a 125 MB disk image (.iso) and place it on a USB drive to create a bootable USB “disk”. I stuck this USB drive into my computer and, in my BIOS screen, selected the USB drive to boot from. Within a few minutes I had a fully functional operating system (a variant of Linux) which allowed me to see my hard drives and files. If I so desired, I could grab those important documents that were safely backed up on one of my hard drives and transfer them to a USB drive. However my goal was just to delete what wasn’t important and relocate what was mildly important to another drive, thereby freeing up an entire HDD so I could install Windows 7 onto it. If I didn’t have any desire to do some gaming I could have just used the Puppy Linux OS (running from the USB drive) for the next week or so while I wait for OCZ to send me a replacement SSD.
Just a quick note: Downloading the disk image (.iso) file from Puppy Linux and dropping it onto your USB drive will not work. You’ll need to install some magical code (a boot loader) onto your USB drive so your computer can “boot” from it. You’ll also want to extract the contents of the .iso file and transfer them onto your USB drive, rather than copying the .iso file itself. Disk images are great for burning onto a CD or “mounting” onto virtual CD hardware, but when it comes to making a bootable USB drive they need a tiny bit of manipulating.
I tried a bunch of different ways of inserting that magical code that allowed the USB drive to be “bootable” without much luck. Fortunately I found UNetBootin to streamline the entire process of creating a bootable USB drive and it worked perfectly for me.

Although I didn't really need to open up my computer, I wanted to make sure that my SATA cable or port wasn't the culprit. Indeed it was the SSD. I also disconnected my "precious data" completely so I wouldn't accidentally reformat that.
Next time your operating system or a hard drive fails, consider booting from an OS that lives on your USB drive. You’ll be able to access your hard drives and recover your files so long as they are not corrupted.
So I just want to say a big thank you to Puppy Linux, a big thank you to the fine folks who wrote UNetBootin, and a disappointing “ughhhhhghghghggh” to OCZ who have twice been unsympathetic and very slow to help me with their faulty solid state drives.
Update 6/5/2012 – Further Reading
Months after publishing this post, I have found this recent article describing the inner workings of SSDs to be very enlightening. After reading it, I’m surprised OCZ even offers a 2 year warranty.
Many beginner programmers see the acronym API all over the place. Why are API’s everywhere? What do you do with them? How do they work? At the same time, many beginner programmers see or encounter XML. Why is XML everywhere? How do you turn XML into the integers or strings that you know how to deal with? These are excellent questions that aspiring programmers may ask themselves. For me it was difficult to grasp the big picture and see exactly why these two acronyms were talked about so often in the programming world. In this article I’ll explain what API’s are, why XML is so often associated with them, and at the end give a short example of how to “connect” to an API, grab some XML from it, and parse it to turn it into the integers or strings that you probably know how to manipulate on a regular basis.
So what is an API (besides an Application Programming Interface)?
Imagine you worked for a large company named Word Co. that organized words, specifically English language words. Perhaps your company scanned a bunch of textbooks and collected all of the words, counted the words, and created a big database full of useful information related to words. Basically you have a big set of information and one day your company (Word Co.) decides it wants to make all of that data available for other companies or allow individuals to see or access it. What are your options?
- Give people the actual database
- Make a website that pulls from the database
- Make an API that allows programmers to interact with the database
The first option is probably not a good one because the database can be huge (potentially gigabytes or terabytes of information), you may be using a proprietary database (such as Google’s BigTable) or software, or maybe you just spent millions of dollars collecting this information and you want to charge people for accessing it.
The second option may be a really neat idea, but it might not work if you wanted a mobile device or app to access the data, or if you wanted to present the information in a way other than a chart or web form. Imagine if someone wanted to make a Hangman game where you try to guess a random word (maybe a random word that was pulled from the big database of English language words) before a stick figure is “hung”. This is something the website cannot directly perform.
An API allows people to grab information (or use services) that are part of a huge data set in ways that might not be imagined by the people who created that large data set. If Word Co. organized English words and created an API to access those words, let’s take a minute to imagine what others can create with it:
- Hangman (as mentioned above)
- A computer opponent in Scrabble
- A word of the day app
- A spell checker
- A domain name brainstorming tool
Which are all applications or tools that Word Co. doesn’t have the time or desire to create. API’s are usually intended to allow third parties to create awesome things using existing data that a company has already harvested and collected. What are some other services that might have API’s?
- Weather services usually have API’s
- Google has a ton of API’s (like their Maps API, their search engine, and just about everything else)
- Facebook allows third-party developers to interact with the Facebook data
- Nearly everything else
So how do I “connect to” or use an API?
Although many API’s are different, it often boils down to making a request and getting some data. Some API’s give you a bunch of code or libraries that you add to your project, and then use that code to make the requests, but many other API’s are quite simple. If you are new to programming, I’d suggest looking for REST or so called “RESTful” API’s. Other ways to access API’s such as SOAP also exist, but in my opinion are a little harder to get started with. Fortunately many API’s that used to be SOAP based are now REST based. Let’s outline how you would use a typical REST based API:
- Make an HTTP request to a web server. Usually you’ll include a variable or two that is passed in through the URL
- Get some data back (typically XML)
- Parse the XML (the XML is just a big character stream and you’ll want to grab certain pieces of it and turn it into other data types or create an object)
- Use that data to do neat things! (Like create a Hangman game with a random word you just grabbed)
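Here’s a minimal sketch of the first two steps in modern Java (11 or later), using the standard java.net.http client. The endpoint is a hypothetical placeholder for Word Co.’s imaginary API, not a real service, so main only prints the URL it would request; fetch() shows how the request would actually be sent. URLEncoder takes care of spaces, commas, and other special characters in the URL variable.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

// Steps 1 and 2 of using a REST API, sketched against a HYPOTHETICAL endpoint.
public class RandomWordClient {
    // Placeholder URL for Word Co.'s imaginary API; not a real service.
    static final String BASE_URL = "https://api.example.com/words/lookup?word=";

    /** Step 1: append the URL variable, form-encoding it (spaces become '+', commas %2C). */
    static String buildQuery(String base, String value) {
        return base + URLEncoder.encode(value, StandardCharsets.UTF_8);
    }

    /** Steps 1 and 2: send an HTTP GET and return the body (typically XML) as text. */
    static String fetch(String url) throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder(URI.create(url)).GET().build();
        HttpResponse<String> response =
                client.send(request, HttpResponse.BodyHandlers.ofString());
        return response.body();
    }

    public static void main(String[] args) {
        // Prints https://api.example.com/words/lookup?word=ice+cream
        System.out.println(buildQuery(BASE_URL, "ice cream"));
        // Against a real API you would now call fetch(...) and parse the result (step 3).
    }
}
```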
Notice that the data that comes back from an API is typically XML. Why XML? Because it’s a great intermediary “language”. Imagine if you wrote your Hangman game in Java and the Random Word API gave you Python code back. That wouldn’t be very useful. Or if you wrote something in C/C++ and an API gave you a serialized Java object.
What makes XML so popular (especially with API’s) is that it allows you to use whichever language you want and gives you data that is both human readable and computer readable. Just about any programming language comes with standard libraries to parse XML quickly and easily. If you’re an advanced programmer, it also allows you to build objects or data structures (like if you’re dealing with A TON of data) exactly how you want them instead of forcing you to accept whatever the API gives you.
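To make “turn XML into integers or strings” concrete, here’s a small sketch that reads an XML reply into an object of our own design using Java’s built-in DOM parser. The reply format (a word element with text and length attributes) and the WordReply class are hypothetical, invented for illustration; a real API documents its own format. Note that XML attributes always arrive as text, so converting the length to an int is up to you.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Turns a HYPOTHETICAL XML reply into our own object with a String and an int.
public class WordReply {
    // Invented reply format, for illustration only.
    static final String SAMPLE = "<reply><word text=\"hangman\" length=\"7\"/></reply>";

    final String text;
    final int length;

    WordReply(String text, int length) {
        this.text = text;
        this.length = length;
    }

    /** Parses the reply; throws IllegalArgumentException if it isn't in the expected shape. */
    static WordReply parse(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            Element word = (Element) doc.getElementsByTagName("word").item(0);
            // XML attributes are always text; converting "7" to an int is our job.
            return new WordReply(word.getAttribute("text"),
                    Integer.parseInt(word.getAttribute("length")));
        } catch (Exception e) {
            throw new IllegalArgumentException("Unexpected XML: " + xml, e);
        }
    }

    public static void main(String[] args) {
        WordReply reply = parse(SAMPLE);
        System.out.println(reply.text + " has " + reply.length + " letters");
    }
}
```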
A concrete example in Java
Let’s make something! Imagine you wanted to create your own Android weather app. Since we aren’t meteorologists, we’ll get all of the weather information from someone else: Google’s Weather API. Other options are the National Weather Service (in the U.S.) or maybe Weather Underground. Most of the API’s out there are well documented and tell you how you should connect, use, or interface with them. Google’s Weather API is a little weird in that there is no documentation. I think it’s sort of a secret API. But here’s how you use it:
- Make an HTTP request to http://www.google.com/ig/api?weather=Location where Location is whatever you want (A postal code or city).
That’s it! You’ll get a bunch of XML back with the current weather and forecast information. You can even try it out in your web browser (since your web browser makes HTTP requests on a very regular basis). Let’s see what happens when we use Seattle WA as an example (from http://www.google.com/ig/api?weather=Seattle+WA):
<xml_api_reply version="1">
  <weather module_id="0" tab_id="0" mobile_row="0" mobile_zipped="1" row="0" section="0">
    <forecast_information>
      <city data="Seattle, WA"/>
      <postal_code data="Seattle WA"/>
      <latitude_e6 data=""/>
      <longitude_e6 data=""/>
      <forecast_date data="2011-09-29"/>
      <current_date_time data="2011-09-29 17:53:00 +0000"/>
      <unit_system data="US"/>
    </forecast_information>
    <current_conditions>
      <condition data="Clear"/>
      <temp_f data="62"/>
      <temp_c data="17"/>
      <humidity data="Humidity: 62%"/>
      <icon data="/ig/images/weather/sunny.gif"/>
      <wind_condition data="Wind: N at 4 mph"/>
    </current_conditions>
    <forecast_conditions>
      <day_of_week data="Thu"/>
      <low data="56"/>
      <high data="72"/>
      <icon data="/ig/images/weather/sunny.gif"/>
      <condition data="Clear"/>
    </forecast_conditions>
    <forecast_conditions>
      <day_of_week data="Fri"/>
      <low data="56"/>
      <high data="70"/>
      <icon data="/ig/images/weather/mostly_sunny.gif"/>
      <condition data="Partly Sunny"/>
    </forecast_conditions>
    <forecast_conditions>
      <day_of_week data="Sat"/>
      <low data="49"/>
      <high data="65"/>
      <icon data="/ig/images/weather/rain.gif"/>
      <condition data="Showers"/>
    </forecast_conditions>
    <forecast_conditions>
      <day_of_week data="Sun"/>
      <low data="54"/>
      <high data="65"/>
      <icon data="/ig/images/weather/chance_of_rain.gif"/>
      <condition data="Chance of Rain"/>
    </forecast_conditions>
  </weather>
</xml_api_reply>
Now let’s imagine we want to extract the highs and lows from this XML so we can use them in our Android weather app. As mentioned, many programming languages have built-in libraries for parsing XML. Since XML is so popular, there are often multiple approaches to parsing it, even within a single language. Java has both a DOM parser and a SAX parser built in, and so does Python. What are DOM and SAX parsers?
- SAX (Simple API for XML) parsers are stream-oriented. They read the document from start to finish and call your code as each element goes by, so they typically use less memory and are faster.
- DOM (Document Object Model) parsers load the entire document into an in-memory tree that you then traverse, which can consume a lot of memory if you’re dealing with large amounts of XML.
When should you use one over the other? The difference mostly matters when you are dealing with HUGE amounts of data. Most of the time (such as right now) you don’t need to worry and can use whichever one you’re comfortable with. I’ll be using the Java SAX parser in this example.
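To make the difference concrete, here is a minimal Python sketch that extracts the same days, lows, and highs both ways. It works on a trimmed copy of the Seattle response above: two forecast days, inlined as a string so it runs without a network connection. The function and class names are made up for illustration. The DOM version builds the whole tree first and then queries it. The SAX version reacts to each element as the parser streams past it.

```python
import xml.sax
from xml.dom import minidom

# A trimmed excerpt of the Seattle response above (two forecast days only)
SAMPLE_XML = """<xml_api_reply version="1"><weather>
<forecast_conditions><day_of_week data="Thu"/><low data="56"/><high data="72"/></forecast_conditions>
<forecast_conditions><day_of_week data="Fri"/><low data="56"/><high data="70"/></forecast_conditions>
</weather></xml_api_reply>"""


def parse_with_dom(xml_text):
    """DOM: build the whole tree in memory, then query it."""
    doc = minidom.parseString(xml_text)
    result = []
    for node in doc.getElementsByTagName("forecast_conditions"):
        day = node.getElementsByTagName("day_of_week")[0].getAttribute("data")
        low = int(node.getElementsByTagName("low")[0].getAttribute("data"))
        high = int(node.getElementsByTagName("high")[0].getAttribute("data"))
        result.append((day, low, high))
    return result


class ForecastHandler(xml.sax.ContentHandler):
    """SAX: react to each element as the parser streams past it."""

    def __init__(self):
        super().__init__()
        self.records = []

    def startElement(self, name, attrs):
        if name == "day_of_week":
            # Each forecast starts with its day, so begin a new [day, low, high] record
            self.records.append([attrs["data"], None, None])
        elif name == "low":
            self.records[-1][1] = int(attrs["data"])
        elif name == "high":
            self.records[-1][2] = int(attrs["data"])


def parse_with_sax(xml_text):
    handler = ForecastHandler()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return [tuple(record) for record in handler.records]


print(parse_with_dom(SAMPLE_XML))  # [('Thu', 56, 72), ('Fri', 56, 70)]
print(parse_with_sax(SAMPLE_XML))  # [('Thu', 56, 72), ('Fri', 56, 70)]
```

For a response this small the two approaches are interchangeable. The streaming style only pays off once the document is too large to hold comfortably in memory.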
Remember the steps to do this? 1) Make an HTTP request to the API, typically passing in a URL variable, 2) Get the data back and then parse it, and finally 3) Do neat things! Let’s see what that looks like in Java code:
Weather.java (first draft)
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
public class Weather
{
    public static final String URL_SOURCE = "http://www.google.com/ig/api?weather=";
    public static void main(String[] args)
    {
        /*** Create the request ***/
        // Let's pick a location:
        String location = "Seattle, WA";
        // Create the URL:
        String query = URL_SOURCE + location;
        // Replace spaces with their URL encoding (%20):
        query = query.replace(" ", "%20");
        /***
         * Make the request (This needs to be in a try-catch block because things can go wrong)
         ***/
        try
        {
            // Turn the string into a URL object
            URL urlObject = new URL(query);
            // Open the stream (which returns an InputStream):
            InputStream in = urlObject.openStream();
            /** Now parse the data (the stream) that we received back ***/
            // Coming shortly since we need to set up a parser
        }
        catch(IOException ioe)
        {
            ioe.printStackTrace();
        }
    }
}
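The first draft builds the query by hand and only replaces spaces with %20, which leaves other special characters, such as the comma in “Seattle, WA”, unencoded. Here is a minimal Python sketch of the same request-building step. It uses the standard library’s urllib.parse.urlencode, which percent-encodes the whole value. The helper name build_weather_url is hypothetical. The endpoint is the one used in this article and may no longer respond.

```python
from urllib.parse import urlencode

# The endpoint used throughout this article
URL_SOURCE = "http://www.google.com/ig/api"


def build_weather_url(location):
    """Return the API URL for a location, with the query string safely encoded."""
    # urlencode() turns spaces into '+' and percent-encodes characters such as ','
    return URL_SOURCE + "?" + urlencode({"weather": location})


print(build_weather_url("Seattle WA"))   # http://www.google.com/ig/api?weather=Seattle+WA
print(build_weather_url("Seattle, WA"))  # http://www.google.com/ig/api?weather=Seattle%2C+WA
```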
So at this point we have some simple Java code that connects to the Google Weather API and receives some data back. In the above case, we are getting our data (the XML) in the form of an InputStream. In other languages you’ll probably still be receiving the data as a stream. Streams and I/O are a pretty big part of programming, so if you’re not sure how to work with them, now is a good time to start. Anyway, we now need to set up the XML parser. As mentioned, I am picking the SAX parser for this example, and as the SAX project’s website explains, you need to create a handler for the XML. In other words, you need to tell the parser what to do when it encounters specific parts of the XML. In this case we’ll look for the <low>, <high>, and <day_of_week> tags. To define this behavior we’ll extend SAX’s DefaultHandler, whose callback methods do nothing by default, and override only the methods we care about. Let’s see what this looks like:
GoogleHandler.java
import java.util.ArrayList;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;
public class GoogleHandler extends DefaultHandler
{
    // Create three array lists to store the data
    public ArrayList<Integer> lows = new ArrayList<Integer>();
    public ArrayList<Integer> highs = new ArrayList<Integer>();
    public ArrayList<String> days = new ArrayList<String>();
    // Make sure that the code in DefaultHandler's
    // constructor is called:
    public GoogleHandler()
    {
        super();
    }
    /*** Below are the three methods that we are overriding ***/
    @Override
    public void startDocument()
    {
        System.out.println("Start document");
    }
    @Override
    public void endDocument()
    {
        System.out.println("End document");
    }
    // This is where all the work is happening.
    // Each element we care about stores its value in a "data" attribute.
    @Override
    public void startElement(String uri, String name, String qName, Attributes atts)
    {
        if(qName.equals("day_of_week"))
        {
            String day = atts.getValue("data");
            System.out.println("Day: " + day);
            this.days.add(day);
        }
        if(qName.equals("low"))
        {
            int low = Integer.parseInt(atts.getValue("data"));
            System.out.println("Low: " + low);
            this.lows.add(low);
        }
        if(qName.equals("high"))
        {
            int high = Integer.parseInt(atts.getValue("data"));
            System.out.println("High: " + high);
            this.highs.add(high);
        }
    }
}
And now that we have defined how the XML parser should behave, let’s add in our GoogleHandler to the Weather code:
Weather.java (final draft)
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.XMLReaderFactory;
public class Weather
{
    public static final String URL_SOURCE = "http://www.google.com/ig/api?weather=";
    public static void main(String[] args)
    {
        /*** Create the request ***/
        // Let's pick a location:
        String location = "Seattle, WA";
        // Create the URL:
        String query = URL_SOURCE + location;
        // Replace spaces with their URL encoding (%20):
        query = query.replace(" ", "%20");
        /***
         * Make the request (This needs to be in a try-catch block because things can go wrong)
         ***/
        try
        {
            // Turn the string into a URL object
            URL urlObject = new URL(query);
            // Open the stream (which returns an InputStream):
            InputStream in = urlObject.openStream();
            /** Now parse the data (the stream) that we received back ***/
            // Create an XML reader
            XMLReader xr = XMLReaderFactory.createXMLReader();
            // Tell that XML reader to use our special Google Handler
            GoogleHandler ourSpecialHandler = new GoogleHandler();
            xr.setContentHandler(ourSpecialHandler);
            // We have an InputStream, but let's just wrap it in
            // an InputSource (the SAX parser likes it that way)
            InputSource inSource = new InputSource(in);
            // And parse it!
            xr.parse(inSource);
            // Close the stream now that we're done with it
            in.close();
        }
        catch(IOException ioe)
        {
            ioe.printStackTrace();
        }
        catch(SAXException se)
        {
            se.printStackTrace();
        }
    }
}
Doesn’t look so bad, does it? If you go ahead and compile the two files (both Weather.java and GoogleHandler.java), you should be able to run it without any problems. Here’s the output when I ran it:
Start document
Day: Thu
Low: 56
High: 72
Day: Fri
Low: 56
High: 70
Day: Sat
Low: 49
High: 65
Day: Sun
Low: 54
High: 65
End document
The GoogleHandler prints each value with System.out.println(), but it also adds the integers and strings to their own array lists. You can then access those lists in a more familiar way (for example, calling ourSpecialHandler.days.get(0) after parsing to get the first day of the forecast).
A concrete example in Python 3
And finally let’s take a quick look at how to do this in Python, again using a SAX parser. As you can see, Python does quite a bit of heavy lifting for you (such as making the HTTP request and getting the XML — which is one line of code). Go ahead and copy/modify this code for any of your projects. It was built and tested with Python 3.2.2 in October 2011.
Weather.py
import urllib.request
import xml.sax
# Create some lists to store the data:
lows = []
highs = []
days = []
# Define our special Google Handler that extends
# what the default content handler does
class GoogleHandler(xml.sax.ContentHandler):
    def startElement(self, name, attrs):
        if name=="day_of_week":
            print("Day:", attrs['data'])
            days.append(attrs['data'])
        if name=="low":
            print("Low:", attrs['data'])
            lows.append(attrs['data'])
        if name=="high":
            print("High:", attrs['data'])
            highs.append(attrs['data'])
# Make an HTTP request at the specified URL
# and get back a bunch of XML
xmlResponse = urllib.request.urlopen('http://www.google.com/ig/api?weather=Seattle+WA')
# Create a SAX Parser
parser = xml.sax.make_parser()
# Tell the parser to use our special handler
parser.setContentHandler(GoogleHandler())
# And parse the XML!
parser.parse(xmlResponse)
# Print out the lists:
print("Days:", days)
print("Lows:", lows)
print("Highs:", highs)
And let’s see what sort of output we get when we run it:
Day: Thu
Low: 56
High: 72
Day: Fri
Low: 56
High: 70
Day: Sat
Low: 49
High: 65
Day: Sun
Low: 54
High: 65
Days: ['Thu', 'Fri', 'Sat', 'Sun']
Lows: ['56', '56', '49', '54']
Highs: ['72', '70', '65', '65']
I hope this tutorial was helpful. If you have questions, please ask away. I’ll also add that our fictional Word Co. API (mentioned at the top of this article) isn’t just a made-up concept to explain APIs. It actually exists!
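Notice that the Python lists hold the attribute values as strings ('56', not 56). As one example of the “do neat things” step, here is a hypothetical sketch that converts them to integers and pairs each day with its forecast. The values are copied from the output above so it runs on its own. The function name summarize is made up.

```python
# Values copied from the output above (the API returns them as strings)
days = ['Thu', 'Fri', 'Sat', 'Sun']
lows = ['56', '56', '49', '54']
highs = ['72', '70', '65', '65']


def summarize(days, lows, highs):
    """Pair each day with its integer low/high and that day's temperature swing."""
    summary = []
    for day, low, high in zip(days, lows, highs):
        low, high = int(low), int(high)
        summary.append({"day": day, "low": low, "high": high, "swing": high - low})
    return summary


forecast = summarize(days, lows, highs)
warmest = max(forecast, key=lambda entry: entry["high"])
print(warmest["day"], warmest["high"])  # Thu 72
```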
How I taught myself to program (and which languages in what order)
September 25, 2011
Interested in learning to program and write code? Wondering what programming language you should teach yourself? Curious how other people got started? In this article I’ll explain how I started from ground zero, knowing nothing about programming or software development, struggled to grasp new languages and concepts, and later knew enough to get a job making custom website back-ends, writing scripts for my colleagues, and developing mobile apps. Along the way I’ll point out some helpful resources that I have found, as well as common pitfalls that I hope you’ll avoid. Let’s begin!
Which language should I learn?
A lot of people new to programming ask this question. Depending on whom you ask (or where you ask it), you’ll get a lot of different answers. When I was just starting out I asked people this question and did a lot of research. Java was something that people kept suggesting. Its list of features (object-oriented, reusable, automatic memory management, portable) made it sound very appealing, even if I didn’t exactly know what those words meant for a language. What ultimately convinced me was the adoption rate and the number of companies and organizations that used Java. I was thinking that if everyone else was using it, then it must be a good choice, right?
Wrong. Java was very hard to dive into. Your first, supposedly simple “Hello World” application was full of weird keywords that would take an experienced programmer a few days to fully explain. What the hell does public static void main(String[] args) even mean? Why is it necessary? What’s arguably worse about Java for beginners is that it introduces the advanced concepts of objects and classes way too early in the game. Add strong typing, inheritance, and polymorphism to the mix and you’ll have beginners scratching their heads and getting discouraged. Sure, those features are awesome (and almost necessary for large projects), but they are advanced topics that someone new to programming shouldn’t be too concerned about. But Java almost forces you to use them, or at least think about them. Even the simplest program has to live inside a class with a public static void main(String[] args) method before it will run.
So, excited to finally learn programming, I purchased Head First Java and spent a couple of weeks writing my first programs. For the reasons summarized above, I was quickly discouraged, and each day of teaching myself programming grew more tiresome. I took a very long break of about 6 months because I thought programming was difficult and it didn’t feel very rewarding.
What was the first language that was exciting?
I later thought it’d be a fun project to get a website up and running. After figuring out how to make a simple website (full of static HTML pages), I wanted to make a dynamic, user-interactive website, which I realized would be easiest with PHP. After researching a few books, I ordered the highly acclaimed PHP 6 and MySQL 5 for Dynamic Websites and within a week was writing simple, powerful, useful, and fun PHP code. I have recommended the book to friends, and though it assumes some programming knowledge, I found it an excellent source for learning PHP from the ground up.
If you’re not familiar with PHP, it is a “web language”. What does that mean? It means that it is useful for writing web pages. To put it simply, PHP exists to spit out a bunch of HTML, process forms, interact with a database, and spit out some more HTML. Some good questions to ask are: Where does PHP live? Do you compile it? Do you install it somewhere? What made PHP such a good first language was that, if your web server supports it, using PHP is as easy as writing a couple of lines of code in a simple text editor, putting the file on your web server, and then visiting that page with a web browser. Here’s an example page. Note that most of this is simple HTML, with just a few lines of PHP code:
<html>
  <head>
    <title>PHP Test</title>
  </head>
  <body>
    <?php
      echo '<p>Hello World</p>';
      $count = 10;
      $animal = "monkeys";
      echo '<p>There are ' . $count . ' ' . $animal . '!</p>';
    ?>
  </body>
</html>
You’ll notice that the PHP lives in between the HTML. How easy is that? (Pretty easy!) To get output from PHP code you typically echo or print it in the form of some HTML, maybe between paragraph <p> tags. You can also use PHP to generate any other HTML, such as dynamic <div> and layout tags, headers, buttons, CSS, or JavaScript. PHP stands for “PHP: Hypertext Preprocessor”, meaning it is executed on the web server before the resulting HTML is sent to the end user’s browser.
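To make the “preprocessor” idea concrete: the browser never sees the PHP code, only the HTML it produces. The hypothetical Python function below mimics the result of running the example page. It fills in the two variables and returns the finished HTML the visitor would receive. It only illustrates the idea and says nothing about how PHP itself is implemented.

```python
def render_page(count, animal):
    """Build the same HTML the PHP example sends to the browser."""
    # On a real server, this step happens before the response is sent
    body = "<p>Hello World</p><p>There are " + str(count) + " " + animal + "!</p>"
    return ("<html><head><title>PHP Test</title></head>"
            "<body>" + body + "</body></html>")


print(render_page(10, "monkeys"))
```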
PHP also is a great language to learn in conjunction with a database, typically MySQL (because they play so well together). Databases are a big deal, and any aspiring programmer needs to know the basics of how to interact with one. What’s great is that PHP and MySQL are really easy and approachable. You don’t need to dive into crazy advanced topics to do some neat and very useful stuff. The week or two that I spent learning about databases when I was learning PHP has been super useful with other languages and future projects.
So why was PHP such an “exciting” language?
- It was easy to get started (no installation)
- I didn’t need to understand advanced topics to do simple things (dynamic weak typing, objects are optional)
- I created cool stuff quickly (a dynamic website that my friends and I used)
The last point is pretty key. If you are a beginner learning how to program, you’ll want to operate in a way that gives instant feedback and satisfaction. The other components help (no installation, no worrying about data types) but if you don’t feel like you’re accomplishing something, you’ll probably give up faster.
What was the second language that was exciting?
During my senior year of college I took my “first” formal programming course. Technically it was my third: the first used MATLAB to create ray-tracing programs for an optics course, and the second used Mathematica for applied boundary value problems and Fourier analysis. This class was taught by the computer science department, and those wishing to study computer science typically took it during their first year in the program. The language chosen for introductory students was Python.
Why Python? That was the very first lecture, and I’ll share with you the bullet points taken straight off the PowerPoint presentation:
- Named after Monty Python’s Flying Circus
Which probably just means the course instructor was a bit weird. But let’s look at what sort of programming assignments we had. Here’s the classic Hello World (with my addition) program in the Python IDE that comes bundled with the software installer (17 MB that unpacks to ~50 MB).
And let’s see what it looks like when it is executed (by pressing either F5 or going into Run -> Run Module)
Neat, huh? What’s also neat is that it takes all of 5 minutes to download, install, write, and run your first Python program. The software is lightweight and comes with its own syntax-highlighting editor (IDLE) and console shell. Now let’s look at some of the other programs that our first-year class made:
- A text scanner to analyze large text documents (entire books) to determine words per sentence, word frequencies, etc.
- Animations and simple 2-dimensional movies by procedurally drawing and moving shapes.
- Sound synthesizers and audio transformation tools.
- A simple web crawler.
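As a taste of the first assignment on that list, here is a minimal sketch of a text scanner. It assumes that sentences end in “.”, “!”, or “?” and that words are runs of letters and apostrophes. That is a simplification, and real books need more care (abbreviations, quotes, hyphens). The function name and sample text are made up.

```python
import re
from collections import Counter


def analyze_text(text):
    """Return (average words per sentence, Counter of word frequencies)."""
    # Split on sentence-ending punctuation and drop empty pieces
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    # Lowercase words made of letters and apostrophes
    words = re.findall(r"[a-z']+", text.lower())
    average = len(words) / len(sentences) if sentences else 0.0
    return average, Counter(words)


sample = "The cat sat. The cat ran away! Did the dog follow?"
average, counts = analyze_text(sample)
print(round(average, 2))      # 3.67
print(counts.most_common(2))  # [('the', 3), ('cat', 2)]
```

Pointed at the text of an entire book (read from a file), the same two numbers start to say something about an author’s style.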
They sound pretty neat, useful, and engaging. Yet these were created by first-year students, many of whom had never programmed before in their lives. You might be thinking, “Wow, a web crawler… something that is a fundamental part of Google, Bing, and Yahoo… how the hell does a first-year student make something like that?” Check out the code here: it’s under 50 lines of Python.
Python has a great community and comes with some awesome documentation. It doesn’t have a difficult installation or a write, compile, run workflow. The debugging information is usually helpful. It doesn’t force you to use objects and classes (unless you want to) or to think about which data type your variables should be. At the same time, it’s powerful enough to do just about anything you want. Unless you’re developing enterprise-level software or extremely computationally intensive tasks (and most beginners aren’t!), you don’t need to write in Java or C/C++. Google is well known for using Python in all sorts of projects (such as this DNS benchmark utility), and many software engineers and research scientists are turning to Python for their projects or for rapid prototyping. You may also notice that many computer science departments at U.S. universities and colleges are moving away from Java for their first couple of courses. (Two of the schools in my area use Python for the first two CS courses in a four-course sequence and Java for the last two, with C/C++ or Java used in the remaining advanced courses.)
Let’s summarize why I found Python to be so exciting:
- It was easy to get started (the installation was quick and comes with its own IDE, called IDLE)
- I didn’t need to understand advanced topics to do simple things (duck typing, objects are optional)
- I created cool stuff quickly (a web crawler among others)
So what language should I learn?
Many enthusiastic and aspiring programmers still get stuck on this question. Here’s the answer: learn whatever language works for you, keeps you excited, and is flexible enough to do whatever you want. With that in mind, Python really worked for me. PHP got me very excited and is the language I really learned to program with, but it also has a fairly specific purpose (web development). Sure, you can create websites with Python, but PHP was incredibly easy to get started with. And this reflects another very important point: use whichever language is most suitable for the work you are doing. You can use C/C++ to create a DNS benchmark tool for individuals to check their speeds at home, but it’d probably be a lot faster to write it in Python. At the same time, you can use Python to create an algorithm to find large prime numbers, but you’d be much better off using C or even assembly language.
As a beginner don’t stress out too much about this question. The best thing you can do is pick a language, run through some tutorials, and see how you like it. The things that you’ll discover while learning your first language will all translate over to the next one. After I had learned PHP and Python I took another stab at Java, this time with the goal of creating an Android app. Within a few weeks I was pumping out code and starting to appreciate (and understand!) all the advanced topics such as objects, classes, inheritance, and polymorphism. But I wouldn’t ever have gotten it if I had just forced my way into it. Learning how to code, all the features of certain languages, and computer science in general is a process that will never end.
So what are you waiting for? Get started!
How to make a web crawler in under 50 lines of Python code
September 24, 2011
Interested in learning how Google, Bing, or Yahoo work? Wondering what it takes to crawl the web, and what a simple web crawler looks like? In under 50 lines of Python (version 3) code, here’s a simple web crawler! (The full source with comments is at the bottom of this article.)
And let’s see how it is run. Notice that you enter in a starting website, a word to find, and the maximum number of pages to search through.
Okay, but how does it work?
Let’s first talk about what a web crawler’s purpose is. As described on the Wikipedia page, a web crawler is a program that browses the World Wide Web in a methodical fashion collecting information. What sort of information does a web crawler collect? Typically two things:
- Web page content (the text and multimedia on a page)
- Links (to other web pages on the same website, or to other websites entirely)
Which is exactly what this little “robot” does. It starts at the website that you pass to the spider() function and looks at all the content on that website. This particular robot doesn’t examine any multimedia. It only looks at pages served as “text/html”, as described in the code. Each time it visits a web page it collects two sets of data: all the text on the page, and all the links on the page. If the word isn’t found in the text on the page, the robot takes the next link in its collection and repeats the process, again collecting the text and the set of links on the next page. It repeats this again and again until it has either found the word or run into the page limit that you passed to the spider() function.
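The crawler’s main loop is essentially a breadth-first search over links: pages are visited in the order they were discovered. The spider() in this article doesn’t remember which pages it has already visited, so it can fetch the same page more than once. Here is a hedged sketch of the same loop with a visited set. It runs against a tiny made-up “web” stored in a dictionary, so it needs no network access. FAKE_WEB and crawl are hypothetical names.

```python
from collections import deque

# A hypothetical miniature web: each page maps to (page text, outgoing links)
FAKE_WEB = {
    "a.html": ("welcome home", ["b.html", "c.html"]),
    "b.html": ("about us", ["a.html", "d.html"]),
    "c.html": ("contact", ["a.html"]),
    "d.html": ("secret kitty page", []),
}


def crawl(start, word, max_pages):
    """Breadth-first crawl; return the first page containing word, or None."""
    to_visit = deque([start])
    visited = set()
    while to_visit and len(visited) < max_pages:
        url = to_visit.popleft()
        if url in visited:
            continue  # skip pages we've already fetched
        visited.add(url)
        text, links = FAKE_WEB.get(url, ("", []))
        if word in text:
            return url
        # Queue newly discovered links at the end (breadth-first order)
        to_visit.extend(link for link in links if link not in visited)
    return None


print(crawl("a.html", "kitty", 10))  # d.html
print(crawl("a.html", "kitty", 2))   # None
```

Using a deque makes taking the next page from the front cheap, and the visited set keeps pages that link back to each other (like a.html and b.html here) from being fetched twice.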
Is this how Google works?
Sort of. Google has a whole fleet of web crawlers constantly crawling the web, and crawling is a big part of discovering new content (or keeping up to date with websites that are constantly changing or adding new stuff). However, you probably noticed that this search took a while to complete, maybe a few seconds. On more difficult search words it might take even longer. There’s another big component to search engines called indexing. Indexing is what you do with all the data that the web crawler collects. Indexing means that you parse (go through and analyze) the web page content and create a big collection (think database or table) of easily accessible and quickly retrievable information. So when you visit Google and type in “kitty cat”, your search words go straight* to the collection of data that has already been crawled, parsed, and analyzed. In fact, your search results are already sitting there waiting for that one magic phrase of “kitty cat” to unleash them. That’s why you can get over 14 million results within 0.14 seconds.
*Your search terms actually visit a number of databases simultaneously such as spell checkers, translation services, analytic and tracking servers, etc.
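Here is a minimal sketch of the indexing idea: an inverted index that maps each word to the set of pages containing it. The index is built once, ahead of time, so answering a query is a quick dictionary lookup instead of a fresh crawl. The page names and text are made up, and real search engines do far more (ranking, stemming, distributed storage).

```python
from collections import defaultdict

# Hypothetical crawled pages: URL -> page text
PAGES = {
    "pets.html": "kitty cat toys and dog toys",
    "zoo.html": "big cat exhibits",
    "news.html": "local news today",
}


def build_index(pages):
    """Map each lowercase word to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word].add(url)
    return index


def search(index, query):
    """Return URLs containing every word in the query (sorted for stable output)."""
    words = query.lower().split()
    if not words:
        return []
    results = set.intersection(*(index.get(word, set()) for word in words))
    return sorted(results)


index = build_index(PAGES)
print(search(index, "kitty cat"))  # ['pets.html']
print(search(index, "cat"))        # ['pets.html', 'zoo.html']
```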
Let’s look at the code in more detail!
The following code should be fully functional for Python 3.x. It was written and tested with Python 3.2.2 in September 2011. Go ahead and copy+paste this into your Python IDE and run it or modify it!
from html.parser import HTMLParser
from urllib.request import urlopen
from urllib import parse
# We are going to create a class called LinkParser that inherits some
# methods from HTMLParser which is why it is passed into the definition
class LinkParser(HTMLParser):
    # This is a function that HTMLParser normally has
    # but we are adding some functionality to it
    def handle_starttag(self, tag, attrs):
        # We are looking for the beginning of a link. Links normally look
        # like <a href="www.someurl.com"></a>
        if tag == 'a':
            for (key, value) in attrs:
                if key == 'href':
                    # We are grabbing the new URL. We are also adding the
                    # base URL to it. For example:
                    # www.netinstructions.com is the base and
                    # somepage.html is the new URL (a relative URL)
                    #
                    # We combine a relative URL with the base URL to create
                    # an absolute URL like:
                    # www.netinstructions.com/somepage.html
                    newUrl = parse.urljoin(self.baseUrl, value)
                    # And add it to our collection of links:
                    self.links = self.links + [newUrl]
    # This is a new function that we are creating to get links
    # that our spider() function will call
    def getLinks(self, url):
        self.links = []
        # Remember the base URL which will be important when creating
        # absolute URLs
        self.baseUrl = url
        # Use the urlopen function from the standard Python 3 library
        response = urlopen(url)
        # Make sure that we are looking at HTML and not other things that
        # are floating around on the internet (such as
        # JavaScript files, CSS, or .PDFs for example).
        # Servers often send "text/html; charset=UTF-8", so look for
        # "text/html" anywhere in the header instead of an exact match.
        contentType = response.getheader('Content-Type') or ''
        if 'text/html' in contentType:
            htmlBytes = response.read()
            # Note that feed() handles Strings well, but not bytes
            # (A change from Python 2.x to Python 3.x)
            htmlString = htmlBytes.decode("utf-8")
            self.feed(htmlString)
            return htmlString, self.links
        else:
            return "",[]
# And finally here is our spider. It takes in a URL, a word to find,
# and the number of pages to search through before giving up
def spider(url, word, maxPages):
    pagesToVisit = [url]
    numberVisited = 0
    foundWord = False
    # The main loop. Create a LinkParser and get all the links on the page.
    # Also search the page for the word or string
    # In our getLinks function we return the web page
    # (this is useful for searching for the word)
    # and we return a set of links from that web page
    # (this is useful for where to go next)
    while numberVisited < maxPages and pagesToVisit != [] and not foundWord:
        numberVisited = numberVisited +1
        # Start from the beginning of our collection of pages to visit:
        url = pagesToVisit[0]
        pagesToVisit = pagesToVisit[1:]
        try:
            print(numberVisited, "Visiting:", url)
            parser = LinkParser()
            data, links = parser.getLinks(url)
            if data.find(word)>-1:
                foundWord = True
            # Add the links found on this page to the end of our collection
            # of pages to visit:
            pagesToVisit = pagesToVisit + links
            print(" **Success!**")
        except Exception:
            # Network errors, non-UTF-8 pages, etc. just skip this page
            print(" **Failed!**")
    if foundWord:
        print("The word", word, "was found at", url)
    else:
        print("Word never found")
Magic!
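To see the link-extraction step on its own without touching the network, here is a self-contained sketch. It uses the same HTMLParser and urljoin approach as LinkParser, applied to an inline HTML snippet. The class name LinkCollector and the example.com base URL are made up for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect absolute URLs from <a href="..."> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for key, value in attrs:
                if key == "href" and value:
                    # Resolve relative links against the page's own URL
                    self.links.append(urljoin(self.base_url, value))


html = '<p><a href="somepage.html">One</a> <a href="https://example.org/x">Two</a></p>'
collector = LinkCollector("http://www.example.com/index.html")
collector.feed(html)
print(collector.links)
# ['http://www.example.com/somepage.html', 'https://example.org/x']
```

Relative links like somepage.html get the base page’s directory prepended, while links that are already absolute pass through unchanged.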
Everything you need to get a website up and running
September 14, 2011
We live in 2011, complete with computers and the ever present internet and world wide web. Nearly everything has a website, but do you? This guide will attempt to explain everything you need to do, starting from scratch, to get a website up and running. Whether it is a personal website, a new business website such as a restaurant, or a complex number-crunching website such as Google, I’ll detail each step and provide enough information for you to get started. Here is a rough outline:
- Determining your domain/brand name
- Finding a web host
- Registering your domain name
- Designing the website
- Uploading/Updating your website
- Troubleshooting and testing
Each of those bullet points has a dedicated section below, so feel free to skip to those sections if you’d like. You may be wondering what my motivation is for making this guide. Well, to be honest, this is an experiment. A couple of years ago I was sitting in front of my computer, surfing the web, and suddenly wondered: with all these websites in the world, why didn’t I have one? I was standing in your shoes 2 years ago. That started my journey into researching exactly how to go about it. I researched costs, web hosting servers, domain registrars, and different ways to create the actual HTML and CSS that power a website. Along the way, I realized there was a lot of bad advice and even more false advertising. People made biased guides just to get others to sign up for their web host and collect a profit, and other guides were just full of advertisements for a particular technology that no one needed. I’m here to give you some personal advice, as well as plenty of choices and options along the way.
I’m not trying to get you to sign up for my web host. I’m not trying to sell you some sort of EZ-Website Maker Deluxe. I’m not telling you which domain registrars to use, or which products to buy. But I will offer my advice and the lessons I learned while creating my first website.
So, let’s begin…
Determining Your Domain / Brand Name
Basically you have two routes here. Do you already have a name (such as a restaurant or company that’s been in business for a while), or are you a startup company that wants one of those ambiguous, catchy “web 2.0” names like Google or Bing?
If you are going for the latter, you’re in for some bad news (but don’t be too sad). Unfortunately for you, the world wide web has been around for some 20+ years, and domain registrations are only around $10 / year. That means someone can buy tens or even hundreds of domain names and hold on to them. Imagine a company that makes a relatively small profit of $3 million per year. They could buy thousands of domain names for an insignificant amount of cash. Now imagine Apple, Google, or Microsoft, which make billions of dollars per year, and imagine how many domain names they can simply “hold onto” just in case. Do they do that? Probably.
What I am trying to get at is the fact that domain names are fairly cheap and there is no limit to the number you can have. Many of the short, catchy, one- or two-syllable, “good” domain names are already sold and registered. Does that mean you can’t ever get them? No. But you’ll certainly have to pay more than $10 to get one.
There are many websites out there that allow you to bid on or buy existing domain names. The prices here will vary, from as little as $30 to upwards of $500. For short 4 or 5 letter domains, you may be expected to pay thousands.
If your company or business already has a name, you might be inclined to use that as your domain name. However, there are a few things to be aware of.
Is your business name catchy? Do you have a strong brand? Do you expect people to already know your name? In that case, go ahead and use that as your domain name.
However, if your business is relatively unknown, or if you are expecting your website to bring in new visitors, you may want to include some keywords in your domain name. For example, I could have named this website stephensswriting.com, but I instead wanted my domain name to explain my website’s purpose. That’s right: picking out your domain name is the first step in optimizing your website for search engines and discoverability. If your company were a small ceramics company called Bakerlite, you might want to try bakerliteceramics.com. The other option, www.bakerlite.com, is not very helpful to search engines unless people already associate Bakerlite with ceramics. In that case, it would be wise to reinforce the brainwashing, er… I mean, association.
Let me conclude with some ideas of pricing and how to go about registering your domain name. First of all, expect to pay no more than $10/year for a domain registration. In addition to a domain name registration, you also need a web host (more on that below). Many places that host your website also offer domain registrations. You should also be able to register a domain through one company, and then host it at a different company.
Finding a Web Host
Now that you have a domain name picked out (and possibly registered) you’ll need a web host. This is essentially a “computer” connected to the internet. This “computer” is always on, stores your website (this includes text, pictures, video, HTML, CSS etc.), and accepts incoming connections (visitors) and serves them the data that they want (the text, pictures, video, HTML, and CSS etc.)
In fact, your computer right now could theoretically host your website. The problem is that you’d always have to have it turned on and have a fast connection to the internet. Additionally, you’d need some software running to receive HTTP requests. The most common software to do this is the Apache HTTP Server.
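To make “software running to receive HTTP requests” a little more concrete, here is a minimal sketch built on Python’s built-in http.server module. It is not Apache and is not suitable for a real public website. It simply answers every GET request with a small HTML page. The names HelloHandler and make_server are hypothetical.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer


class HelloHandler(BaseHTTPRequestHandler):
    """Answer every GET request with the same small HTML page."""

    def do_GET(self):
        body = b"<html><body><p>Hello from my own web server!</p></body></html>"
        self.send_response(200)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def make_server(port=8000):
    """Create (but don't start) a server listening on localhost at the given port."""
    return HTTPServer(("127.0.0.1", port), HelloHandler)
```

To try it on your own machine, call make_server().serve_forever() and visit http://127.0.0.1:8000 in your browser (press Ctrl+C to stop). Because it only listens on 127.0.0.1, nobody else on the internet can reach it. Making it public is exactly the always-on, well-connected, secured part that a web host handles for you.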
The thing is though, if you are reading this guide (meaning you’re a beginner) you probably won’t have the technical skills and resources to personally get your own Apache HTTP Server up and running. It would be much easier for someone else to do it.
There are a lot of good resources for picking out the company that will ultimately host your website. While you are doing your research, keep in mind some of these keywords:
- Bandwidth – This is how much data flows from your web host to your visitors. If you plan on having lots of images, files, or movies on your website, bandwidth becomes important. Many web hosts advertise “unlimited” bandwidth. There is no such thing as truly unlimited bandwidth; instead, the company is betting that you will never notice or push up against the limit.
- Disk space – This is how much data your website can store on the web host. If you plan on having images, files, or movies, keep in mind that they take up a lot of space. Again, there isn’t really such a thing as unlimited disk space. If you plan on backing up your personal hard drive onto your web host’s disk space, they’ll likely make the transfer so painfully slow that you’ll give up.
- Uptime – This is the percentage of time the web server is up and reachable. Ideally this number would be 100% (always on). Many companies advertise 99.999% uptime, a statistic that is hard to verify and quite possibly made up. The best way to check this claim is to ask current customers. Check the company’s blog and Twitter account and look at customers’ comments. Are they happy?
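To get a feel for these numbers, here is a rough back-of-the-envelope sketch. The visitor counts and page sizes are made-up examples, and `monthly_transfer_gb` and `downtime_minutes_per_year` are hypothetical helper names. The second function shows why 99.999% is such a bold claim.

```python
def monthly_transfer_gb(visitors_per_month: int, pages_per_visit: float,
                        page_size_mb: float) -> float:
    """Rough bandwidth estimate: every page view sends a full page to a visitor."""
    total_mb = visitors_per_month * pages_per_visit * page_size_mb
    return total_mb / 1000  # 1 GB treated as 1000 MB for a rough estimate


def downtime_minutes_per_year(uptime_percent: float) -> float:
    """How many minutes per year a server may be down at a given uptime."""
    minutes_per_year = 365 * 24 * 60  # 525,600 minutes (ignoring leap years)
    return minutes_per_year * (1 - uptime_percent / 100)
```

Consider a small site with 300 visitors a month, each viewing 3 pages of about 2 MB. For that site, `monthly_transfer_gb(300, 3, 2)` works out to 1.8 GB a month. At 99.999% uptime, `downtime_minutes_per_year(99.999)` allows only about 5.3 minutes of downtime per year, while 99.9% allows about 526 minutes (nearly 9 hours).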
The web host that you’ll ultimately pick is up to you. A small website for your friends and family (maybe 300 visitors/month) won’t need all the bells and whistles of a website like Google (many millions of visitors/month). Keep in mind, though, that a small website can suddenly get popular. If you post something incredibly awesome that “goes viral” on the internet, your website may well go down. Many web hosts offer upgrade paths to allow your website to grow, but don’t expect an upgrade to happen within the hour your website suddenly becomes popular.
If you pick the wrong web host or become unhappy with them, remember that switching is usually not hard. Unless you signed up with an incredibly sketchy web host, it shouldn’t be too difficult to take your content and move to a different one. Your website (the content, including text, design, pictures, etc.) is your website. You own it, not your web host.
I will briefly mention that after doing quite a lot of research around the web I found that Dreamhost sounded like a good web host. It wasn’t the cheapest option out there, but it looked to be the most trustworthy. It also helped that it had a large base of satisfied customers. If you’re interested, check them out here. Disclaimer: I receive a referral bonus if you sign up for them through that link.
Registering your Domain Name
The next step (or the same step) is registering your domain name. The reason why this might be a different step is that your domain name and your web host can be two separate entities. You can register your domain name at Company X and host your website at Company Y. It might be a little tricky and there will be more steps involved if you do this, but this is an option.
As mentioned, when you sign up for a web host, they’ll often include one domain registration with the hosting, sort of a bundled “package”. If they don’t, you can likely pay the roughly $10 for one through the same company when you sign up.
You might be wondering who is in charge of the names of all the websites that make up the World Wide Web. After all, isn’t the World Wide Web supposed to be a collection of independently created content that spans international and intercontinental distances? Where does your $10 go when you register a domain name? (Who’s making all the money? And why didn’t I think of that!?)
The answer you are probably looking for is the Internet Corporation for Assigned Names and Numbers (ICANN). But ICANN doesn’t directly receive your $10. Instead, ICANN delegates the tedious job of selling and registering domain names to ICANN-accredited domain registrars. GoDaddy is one example of an ICANN-accredited registrar. These registrars are the ones dealing with the public (you and me) and asking for $10 in return for registering a domain name. Behind the scenes, the registrar pays a wholesale fee to the registry that runs each top-level domain (for example, Verisign runs .com). The registrar then resells the registration to you at a “retail” price.
So now imagine that we have a domain name and a web host. The next step is…
Designing the Website
When you type www.google.com into your web browser, what happens? Behind the scenes, your web browser first asks the Domain Name System (DNS) for the site’s address. DNS translates a human-readable address, such as “www.google.com”, into an IP (think “computer-readable”) address such as 74.125.226.176. Next, your browser makes an HTTP request to the server at 74.125.226.176 that basically says “I want your data”. Now that you have an address (a domain name) and a server (a web host), you can send some data to people whose browsers are making HTTP requests.
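The two steps above (look up the address, then ask for the data) can be sketched in a few lines of Python using only the standard `socket` module. This is a simplified illustration, not how a real browser works. It speaks plain HTTP on port 80 only (no HTTPS), and the function names are made up for this example.

```python
import socket


def resolve(hostname: str) -> str:
    """Step 1: ask DNS to translate a human-readable name into an IPv4 address."""
    return socket.gethostbyname(hostname)


def build_get_request(hostname: str, path: str = "/") -> bytes:
    """Step 2: the text of a minimal HTTP/1.1 request ("I want your data")."""
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {hostname}",   # tells the server which website we want
        "Connection: close",   # ask the server to hang up when it is done
        "",                    # a blank line ends the request headers
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


def fetch(hostname: str, path: str = "/", port: int = 80) -> bytes:
    """Both steps together: resolve the name, connect, send, read the reply."""
    ip = resolve(hostname)
    with socket.create_connection((ip, port), timeout=10) as sock:
        sock.sendall(build_get_request(hostname, path))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # server closed the connection
                break
            chunks.append(data)
    return b"".join(chunks)
```

Calling `fetch("example.com")` should return the server’s raw reply. The reply starts with a status line such as `HTTP/1.1 200 OK`, followed by headers and then the HTML itself.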
What exactly is this data that comes from a web server to a client (a visitor)? Well, for the most part, it is a bunch of text mixed in with some images, perhaps a movie, or maybe an Adobe Flash game. This is what is referred to as content on a website. A website consists of content:
- Text – You are reading a bunch of text right now, aren’t you? Other text may be interactive and include hyperlinks to pages or other websites.
- Multimedia – Any pictures, videos, music, PDF files, Flash applications, Silverlight applications, or Microsoft Word documents that you can download.
- Design – This includes any CSS (and accompanying HTML) of your website. CSS will be described below, but in short this is the code that describes how your text and multimedia should be presented to the end user (a visitor).
Let’s first examine what a website really looks like, before your web browser makes it all pretty. Depending on your web browser (such as Mozilla Firefox, Internet Explorer, Google Chrome, Safari, etc.) these steps may be a little different.
- Firefox – Go to View and then select Page source. Alternatively you can right-click anywhere on a page and select View page source.
- Chrome – Right-click anywhere on a page and select View page source.
- Internet Explorer – Go to View and then select Source.
As you can see, there is quite a bit of text. You might see some common themes though, such as <a href="some_address">Some text</a> or maybe some <div id="something"></div>. An image might look like <img src="address_to_image" alt="alternative text" />. Your web browser takes all of this text and renders it into a website that is pleasing for humans to see and interact with.
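You can tally those recurring themes yourself with this small sketch, which counts the tags in a page’s source using Python’s standard `html.parser` module. The `count_tags` name is hypothetical; paste in any page source you like.

```python
from collections import Counter
from html.parser import HTMLParser


class TagCounter(HTMLParser):
    """Tallies every opening tag it sees in a page's source."""

    def __init__(self) -> None:
        super().__init__()
        self.counts = Counter()

    def handle_starttag(self, tag, attrs):
        # HTMLParser reports tag names in lowercase, so <DIV> and <div> match.
        # Self-closing tags like <img ... /> are routed here too by default.
        self.counts[tag] += 1


def count_tags(page_source: str) -> Counter:
    parser = TagCounter()
    parser.feed(page_source)
    parser.close()
    return parser.counts


# Usage: paste a page's source into a string and look at the most common tags.
#   print(count_tags(page_source).most_common(5))
```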
What is all this code that I see? Are there some recurring themes?
- HTML is very common and can be thought of as all the little pieces, or building blocks, of a website. It describes where headers, paragraphs, links, pictures, divisions, and just about everything else go.
- CSS is used in conjunction with the HTML elements. CSS describes how a particular header looks, the indentation of a paragraph, or the height and width of a division, to name a few examples.
- JavaScript is code that performs tasks or functions for the visitor. Many websites can function without JavaScript, but most websites have some JavaScript running. It handles simple things like fading images and advanced things like asynchronous requests to the server or formatting a website on the fly. A quick note: JavaScript is NOT the same thing as Java (another programming language). JavaScript is also primarily used for client-side execution, meaning the code runs on a visitor’s computer rather than on the website’s server. Code that runs on a web server (such as PHP, Python, Perl, Ruby, etc.) is never sent to visitors, so you will not see it by looking at the page source.
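Here is a hypothetical server-side function written in Python that shows the difference between server-side and client-side code. PHP would work the same way. Only the HTML string it returns is sent to the visitor. Viewing the page source shows the finished paragraph, never the `if` statement that chose the greeting. `render_page` is a made-up name for illustration; real websites usually rely on a framework or a CMS to do this.

```python
from datetime import date
from html import escape


def render_page(visitor_name: str, today: date) -> str:
    """Runs on the web server; only the returned HTML travels to the browser."""
    # This decision happens on the server, so it never appears in page source.
    if (today.month, today.day) == (1, 1):
        greeting = "Happy New Year"
    else:
        greeting = "Hello"
    # escape() keeps a visitor-supplied name from being treated as HTML.
    return (
        "<!DOCTYPE html>\n"
        f"<p>{greeting}, {escape(visitor_name)}! Today is {today.isoformat()}.</p>"
    )
```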
Okay, so how do I create this content? Here is where it can get complicated. There are literally hundreds (possibly thousands) of editors and website generators that you can use. You can use a pre-formed template and fill in the blanks. You can use a graphical editor where you drag images and word blocks around to position them on the screen. You can use a simple text editor like Notepad, or even a word processor like Microsoft Word (though MS Word is not usually a good idea for web design). You can use a hybrid editor like Adobe Dreamweaver. You can even use an editor that is inside your web browser, such as the WordPress editor (which is actually called the TinyMCE editor). The thing to keep in mind is that, at the end of the day, your visitor still receives the same HTML/CSS “data” described above*.
Why the *asterisk? It’s very possible that the “helper” editors such as Dreamweaver or WordPress will accidentally add in extra spaces, extra <span> blocks </span>, or occasionally refuse to format your paragraphs and content exactly how you want them to look. Most of the time this isn’t a big issue, but there are always those purists who need maximum control and love to dive into the nitty-gritty raw HTML and CSS. Many of these purists will use simple text editors like WordPad or Notepad. Let’s look at what a very basic web page looks like in one of these editors:
[sourcecode language="html"]
<!DOCTYPE HTML>
<HTML>
<HEAD>
<TITLE>Super Basic Website</TITLE>
<META name="keywords" content="Test" />
</HEAD>
<BODY>
<H1>Welcome to the super basic website</H1>
<P>Here is a paragraph on a website</P>
<div id="sidebar">
<P>Hello sidebar!</P>
</div>
</BODY>
</HTML>
[/sourcecode]
If you’re adventurous, you can open up a new WordPad or Notepad document and copy and paste the above HTML into the editor. Save it as plain text with a name like testwebsite.html. In Notepad, set “Save as type” to “All Files” so it isn’t saved as testwebsite.html.txt. You can then open that file in your web browser (Firefox, Internet Explorer, Chrome) and see before your very eyes how the web browser takes the HTML and renders it into a web page.
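If you’d rather skip the copy-and-paste, this small sketch does the same thing in Python. It saves the example page as plain UTF-8 text, so you won’t end up with a `.txt` or rich-text file. It also gives you a `file://` address to open in your browser. The `save_page` name is hypothetical, and `PAGE` is simply the HTML from the example above.

```python
from pathlib import Path

# The same HTML as the example above.
PAGE = """<!DOCTYPE HTML>
<HTML>
<HEAD>
<TITLE>Super Basic Website</TITLE>
<META name="keywords" content="Test" />
</HEAD>
<BODY>
<H1>Welcome to the super basic website</H1>
<P>Here is a paragraph on a website</P>
<div id="sidebar">
<P>Hello sidebar!</P>
</div>
</BODY>
</HTML>
"""


def save_page(filename: str, html: str = PAGE) -> Path:
    """Save `html` as plain UTF-8 text and return the file's absolute path."""
    target = Path(filename).resolve()
    target.write_text(html, encoding="utf-8")
    return target


# Usage:
#   page = save_page("testwebsite.html")
#   import webbrowser
#   webbrowser.open(page.as_uri())  # opens file:///.../testwebsite.html
```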
At this point I’d like to mention so-called WYSIWYG (What You See Is What You Get) editors. With these you usually won’t be working with raw HTML like what is shown above. Rather, you’ll be editing text that is already rendered as it would appear in a web browser. So instead of seeing:
[sourcecode language="html"] <strong>This text is bold</strong> [/sourcecode]
you’ll see something like:
This text is bold
What’s great about many WYSIWYG editors is that they usually let you switch back and forth between working in the rendered mode and the raw HTML mode, which gives you the best of both worlds. I find it much easier to work in a rich, full-featured rendered mode for nearly everything. When there are extra <span> blocks, or indentations and lists are not working exactly as intended, I can click a button and switch over to the raw HTML.
I’ll make a quick note about raw text editors. Notepad and WordPad are suitable for writing basic, unformatted text, but aren’t the best for writing and examining code. Take a look at the same HTML in each of these two editors:
You’ll notice that everything is color-coded and indented nicely. This makes editing code drastically more efficient without adding any additional complexity (a rare win-win with new technology). Coding editors also highlight matching angle brackets <…>, curly braces {…}, and parentheses (…) in color or bold, which shows you where you left one open. For these reasons I would strongly suggest using something just slightly fancier than WordPad or Notepad for editing code such as HTML. An excellent tool that I have used (and many, many others have used) is Notepad++. It’s free, lightweight, and open-source, and it is no more difficult to use than a regular text editor.
Now let’s move on to something more advanced than typing raw HTML into a text editor: using WordPress to create content for a website. What’s neat about WordPress is that you can edit your website on your website. What does that mean? You open a web browser, go to your website, and press the log-in button or link. Then you can start typing up a new post right inside your web browser. Notice the two buttons that let you switch between typing in the WYSIWYG editor and typing raw HTML. You’ll spend most of your time in the WYSIWYG editor, or what WordPress refers to as the “normal” editor.
Uploading/Updating your Website
Okay so imagine you picked out a domain name, registered it, and purchased web hosting. Now how do you put your first web page out onto the World Wide Web? Here are a couple options:
- You can FTP/SFTP to a web server to upload a file such as about.html or index.html. FTP stands for File Transfer Protocol and SFTP stands for SSH File Transfer Protocol (often called Secure File Transfer Protocol).
- If WordPress or some other Content Management System (CMS) is installed on the webserver you can visit your website, log in, and edit pages through your web browser.
Let’s look at the most basic approach first: uploading a file to your web server. Imagine you created the HTML file discussed earlier. This is a web page in its simplest form. You can name it whatever you’d like, whether that is testwebsite.html or blahblah.html. Now you need to put it onto your web server so other people can visit the page and see it rendered in their web browser.
When you signed up for web hosting, your host should have given you a username and password for the server. Now you just need a piece of software that will connect to that web server and let you transfer the testwebsite.html file to it. An excellent free, lightweight, and open-source tool that I use all the time is FileZilla. Here’s how I transfer a file to the web server:
Here the file is on my desktop
Here I am transferring the file using FileZilla SFTP software
And after typing the address into any web browser, here is the new page on the world wide web!
That was pretty simple, right? In FileZilla you can also create directories. Right-click on a folder and select Create Directory. Give it a name, maybe “about”, and now you can put web pages in various directories. For example, you can create a page called thecompany.html and thefounders.html and put them in a directory called “about”. Visit those pages by going to www.yourwebsite.com/about/thecompany.html or www.yourwebsite.com/about/thefounders.html.
One thing to note is the special name index. If you name a file index.html, the web server will show that file when a visitor asks for a directory without specifying a file. This is how you make a specific page show up when someone visits your website’s address. If I renamed TestWebsite.html to index.html, I would visit the page by just typing in www.netinstructions.com instead of www.netinstructions.com/TestWebsite.html.
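The rule for turning a file on your web server into an address can be sketched like this. It assumes the common default where `index.html` is shown for a directory. Many servers also accept names like `index.php`, depending on configuration. `url_for` is a hypothetical helper, not part of FileZilla or your host. Keep in mind that Linux web servers are usually case-sensitive, so `TestWebsite.html` and `testwebsite.html` are different files.

```python
import posixpath


def url_for(domain: str, path_in_site_root: str) -> str:
    """Turn a file's location under the site root into the address to visit."""
    path = path_in_site_root.lstrip("/")
    folder, filename = posixpath.split(path)
    if filename == "index.html":
        # Most servers show index.html automatically, so it can be left off.
        path = folder + "/" if folder else ""
    return f"http://{domain}/{path}"
```

For example, `url_for("www.yourwebsite.com", "about/thecompany.html")` gives `http://www.yourwebsite.com/about/thecompany.html`, while `url_for("www.netinstructions.com", "index.html")` gives just `http://www.netinstructions.com/`.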
Now let’s move away from using FTP/SFTP software and a text editor to create web pages. Instead, let’s create a web page using WordPress. The following pictures and steps assume that you already have WordPress installed on your web server.
Just visit your website by typing in the address
Log in with your username and password (this will be set up right when you install WordPress)
Now you can create a new post or edit an existing post by just typing into the text box. Click update or publish when you’re done.
Note that you can switch between looking at the raw HTML and the rendered content
As you can see, editing and adding content through WordPress is pretty simple. Many websites these days allow users to add content through software that runs on the web server and is accessible with a web browser. This lets you maintain your website from just about any device (computer, tablet, phone) anywhere in the world at any time. Besides WordPress, other content management systems (CMSs) exist, such as Joomla, Drupal, Plone, Tumblr, Blogger, and many more.
Troubleshooting and Testing
As you’re going through these steps, you may run into some issues. I’ve selected a few of the more common problems that pop up from time to time.
To FTP or to SFTP? And where to put the files?
When you want to upload files to your web server or web host, the most common way to do this is with a client that uses the (Secure) File Transfer Protocol. It is strongly suggested that you do not use FTP and instead use the secure protocol (SFTP). It’s not difficult to use SFTP instead of FTP. For example, in FileZilla you’ll want to use port 22 instead of port 21. Many other FTP and SFTP clients just have a checkbox or a setting to switch between the two. The reason you do not want to use FTP is that your username and password are sent from your computer across the internet to your web server unencrypted. When you use SFTP instead, your username and password (and your files) are encrypted before they are sent across the internet. If a man-in-the-middle were to intercept your packets, they wouldn’t be able to “see” your username and password.
Is it really possible for someone to “intercept” your packets? Yes, absolutely. It may not be likely, and it will probably not be a human. But some router that your packets travel through on the way to the web server could certainly have software looking for insecure FTP credentials. If you are curious how many routers sit between you and your web server, a trace route will show you. On a Windows machine, open the command prompt by choosing Run… and typing cmd, or by typing cmd into the search bar. Once you’re in the command prompt, type tracert yourwebsite.com. After a moment this will list the routers that your information passes through before it reaches the destination. On Mac or Linux, the equivalent command in a terminal is traceroute yourwebsite.com.
When you use SFTP for the first time, you’ll need to accept the server’s key. A warning box like the one shown below may appear:
It is probably safe to trust the host when connecting to it for the very first time. If you add this key to the cache and regularly connect to the host, and then one day the server’s key changes, you should be suspicious.
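The “trust it the first time, get suspicious if it changes” idea is often called trust on first use. Here is a toy sketch of that logic. It is not FileZilla’s actual code, and the key bytes are placeholders. The fingerprint format simply mimics the `SHA256:` style that OpenSSH displays.

```python
import base64
import hashlib


def fingerprint(host_key: bytes) -> str:
    """SHA-256 fingerprint: base64 of the digest with '=' padding removed."""
    digest = hashlib.sha256(host_key).digest()
    return "SHA256:" + base64.b64encode(digest).decode("ascii").rstrip("=")


def check_host_key(known_hosts: dict, host: str, host_key: bytes) -> str:
    """Trust on first use: remember the key the first time, compare afterwards."""
    seen = fingerprint(host_key)
    if host not in known_hosts:
        known_hosts[host] = seen  # first connection: the warning box you accept
        return "new"
    if known_hosts[host] == seen:
        return "match"            # same server as last time
    return "CHANGED"              # be suspicious: possible man-in-the-middle
```

A key can also change for innocent reasons, such as your host moving you to a new server. A "CHANGED" result is a reason to ask your host before connecting, not proof of an attack.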
Another helpful point to make is where you’ll want to stick your web pages. Most web servers are running Linux (the operating system) and Apache (the software that listens for and allows incoming HTTP requests). On those machines you’ll typically want to stick your web pages and content at the site root. Some site roots might look like:
- /home/user_name/yourwebsite.com/stickyourpagehere.html
- /home/www/yourwebsite.com/stickyourpagehere.html
DNS settings and how to change them!
DNS (Domain Name System) settings are associated with your domain name. When you own your domain name, you should be able to change its DNS settings if you so desire. Most of the time you probably wouldn’t, unless you were manually setting up email services, pointing your domain at a new host, setting up Google Apps for domains, or adding subdomains. I’m including this here so you know they exist. Here are some record types, followed by a few of the records for my own domain (a complete list of DNS record types is here):
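- A is for mapping a hostname (think domain) to the IPv4 address of the host
- MX is for use with mail exchange (which servers accept email for your domain)
- TXT is for a simple textual message. You could theoretically put a random message here. In practice, TXT records are mostly used for things like SPF email policies and for proving to services such as Google Apps that you own the domain.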
- MX 10 ASPMX.L.GOOGLE.com means Google is handling my mail. I use Google Apps for domains because I really like Gmail for all of my email needs.
- MX 20 ALT1.ASPMX.L.GOOGLE.com is a backup in case the first mail exchange server is down. Redundancy is important! The number is a priority, and lower numbers are preferred, so the server marked 10 is tried before the ones marked 20.
- MX 20 ALT2.ASPMX.L.GOOGLE.com is yet another backup. You may see several more.
- A 173.236.239.73 is the IP address that browsers connect to when visiting the website.
- SOA server: ns1.dreamhost.com means that the primary name server for my domain is ns1.dreamhost.com, which is run by my web host. You can often work out who hosts other websites from their DNS records, for example their name servers or who owns the IP address in their A record.
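Here is a sketch of how a sending mail server would order MX records like the ones above, lowest number first. The `(type, value)` tuples are a simplified, hypothetical representation of the records, not the output of any particular DNS tool.

```python
def mail_servers_in_order(records):
    """Return MX hosts in the order a sending mail server tries them.

    Each record is a (record_type, value) pair, e.g. ("MX", "10 ASPMX.L.GOOGLE.com").
    """
    mx = []
    for record_type, value in records:
        if record_type == "MX":
            priority, host = value.split()
            mx.append((int(priority), host))
    # Lower priority numbers come first. Ties are shown alphabetically here;
    # real mail servers may choose among equal-priority hosts at random.
    return [host for priority, host in sorted(mx)]
```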
PHP/MySQL Requirements for WordPress
When you are picking out a web server to host your website, you’ll probably want support for certain web-based programming languages and databases (PHP, Perl, Python, and MySQL, to name a few). Even if you don’t plan on writing your own custom code immediately, it is likely that you or someone on your team will want to expand your website’s capabilities in the future. If you plan on using a Content Management System such as WordPress, Drupal, Plone, or any other, your web server will need to support the languages that the CMS is built on. WordPress requires PHP and MySQL, whereas Plone requires Python.
A good web host will proudly list all of the web languages and services that they support, as well as the current version. For example, the web host Dreamhost currently supports PHP 5 and MySQL 5 and the current WordPress version requires PHP 5.2.0 or greater and MySQL 5.
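Checking a host’s versions against a requirement like “PHP 5.2.0 or greater” is just a numeric comparison. Comparing version strings as plain text gets it wrong, because “5.10” sorts before “5.9”. This hypothetical helper compares them part by part and assumes simple dotted version numbers such as `5.3.10`.

```python
def version_tuple(version: str) -> tuple:
    """'5.3.10' -> (5, 3, 10), so versions compare as numbers, not text."""
    return tuple(int(part) for part in version.split("."))


def meets_minimum(installed: str, required: str) -> bool:
    """True if `installed` is the same as or newer than `required`."""
    have, need = version_tuple(installed), version_tuple(required)
    width = max(len(have), len(need))
    # Pad with zeros so "5.2" and "5.2.0" count as the same version.
    have += (0,) * (width - len(have))
    need += (0,) * (width - len(need))
    return have >= need
```

For example, `meets_minimum("5.3.10", "5.2.0")` is `True`, while `meets_minimum("5.1.6", "5.2.0")` is `False`.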
Conclusion
Hopefully at this point you have seen a very broad overview of how to make a website starting from nothing. As discussed, there are multiple ways of accomplishing each task. As you work on building your website you’ll discover what works best for your needs. Personally I built my first website using Adobe Dreamweaver (a WYSIWYG editor) and by following the book Dreamweaver CS3: The Missing Manual by David McFarland. My second and third websites were both built in Notepad++ and uploaded via FileZilla. For those I wrote a custom PHP backend interacting with a homebuilt MySQL database. I found the book PHP 6 and MySQL 5 by Larry Ullman to be very helpful for those two projects. My last three websites have all been WordPress based, and I am currently learning how to write my own custom themes. You’ll soon realize that each approach towards building a website has its own set of pros and cons.
I would encourage you to do some of your own research in finding a decent web host and domain name registrar. Find a company that you trust and is transparent about their uptime. See if you can find any real customer reviews (I found a lot of fake reviews and fake websites built just for recommending Company X or Company Y). After doing lots of research my personal choice was Dreamhost. If you sign up with them through that link, I will receive a referral bonus (thank you!). I have been a customer of Dreamhost for about 3 years and have been very happy with them. However, do your own research! They are a great host for me, but your needs may be different.
Lastly I’d like to add that this is my first tutorial. Any feedback, criticism, experiences, or opinions are welcome and encouraged. Leave a comment below!