Why You Should Care About Your URLs (and how to!)
I’ve noticed something in user testing, and in my personal life observing friends (including non-tech-professionals) using the internet: they visually scan the URL of the page they are on all the time.
People read and use URLs.
URLs are part of every page on the web. They are a handle, a pointer, a reference, a micro help file, and a guide.
URLs are intuitive. Every user of the web has shared a URL. Every user knows how to load a page in their browser if they know the URL for that page.
URLs show the user’s location within a given information hierarchy. URLs offer and reinforce a taxonomy for a body of information.
Accomplish Two Things with Your URLs
1) Design your URLs to be user-friendly
2) Design your URLs to rank well in search engines
Anatomy of the URL
http://fun.example.com/hello-world/?necessary=false
[ http:// ] Protocol: which application-layer protocol your browser should use. HTTP is the most common, HTTPS is its secure variant, and you may have seen FTP before, which is another one. The only relevance to this conversation is that in certain sensitive situations, like a banking or money-management application, it is important to demonstrate to the user that the connection is secure.
[ fun ] Subdomain: it is possible to have many sites/apps all running on the same domain. ‘www’ is the most popular subdomain (though arguably/unofficially deprecated). A subdomain can map to a component of your top-level identity (jobs.example.com) or to a specific product or feature (search.example.com). Subdomains are also used for versions of applications (beta.example.com).
[ example.com ] Domain: should represent your organization or brand.
[ com ] Top-level domain: represents the type of organization or brand. Full details here. .com = commercial, .mil = military, .gov = government, etc. Country-coded top-level domains (ccTLDs) denote the regional/national affiliation of the organization or brand: example.co.uk = UK commercial entity.
[ /hello-world/ ] Path: the top-down information hierarchy, including directories and ultimately the atom of content the user is viewing. We will be discussing the path in excruciating detail, so don’t worry.
[ ?necessary=false ] Query string: one or more key-value pairs that the server allegedly requires to process the request accurately and exhaustively. Sorry engineers, these suck. If query strings can be avoided, they ought to be.
1) They lengthen URLs and hinder human reading of URLs unnecessarily. By unnecessarily I mean without adding any value to the user or to search crawlers. Making an engineer’s life easier is not your concern, and frankly if you define beautiful URLs and your engineer(s) can’t implement them easily, get new engineers.
2) They confuse Google, more than you might expect. Actually, this applies to all search engines. Search engines, for all their wisdom, suck at distinguishing between two pages that have the same copy but different URLs. Since web content is so often generated dynamically from query strings, the engines are not able to ignore them, and of course engines key off of URLs because they represent the atomic level at which results are served. All engines do is say URL-X is better than URL-Y for any given search query. There are ways you can defend against duplicate-content penalties, notably page exclusions in robots.txt and, more realistically, the ‘canonical tag’ (a link element with rel="canonical" in the page head), but nothing is foolproof. Abstain.
3) In some cases, query strings are visible when they shouldn’t be, simply because the developer was a noob and chose to process input from GET requests instead of POST requests. Either this makes sense to you or it doesn’t; if it doesn’t, no worries, you probably don’t care. Just refer to bullet 2).
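Going back to the anatomy above, the same breakdown can be reproduced in a few lines of code. This is only a sketch using Python’s standard urllib.parse module; the describe_url function and its dictionary keys are names I made up for illustration, and the "last two labels are the domain" rule is a naive assumption that breaks on ccTLD setups like example.co.uk.

```python
from urllib.parse import urlsplit, parse_qs


def describe_url(url):
    """Split a URL into the pieces named in the anatomy above."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    labels = host.split(".")
    # Naive assumption: the registered domain is the last two labels.
    # This is wrong for hosts like example.co.uk; a real implementation
    # would consult the Public Suffix List instead.
    return {
        "protocol": parts.scheme,
        "subdomain": ".".join(labels[:-2]),
        "domain": ".".join(labels[-2:]),
        "tld": labels[-1],
        "path": parts.path,
        "query": parse_qs(parts.query),
    }


if __name__ == "__main__":
    info = describe_url("http://fun.example.com/hello-world/?necessary=false")
    for name, value in info.items():
        print(f"{name}: {value}")
```

Note that the query string comes back as a dictionary of lists, because a key may legally appear more than once.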
Best Practices for User-Friendly URLs
URLs should be static and canonical. For any given page/resource on your site, there should be one unique URL (see above). People bookmark and share URLs, and so any given URL should never disappear. If you migrate content from one CMS to another, be sure to either redirect (hopefully to prettier URLs, using 301s) or keep the existing links without change.
As mentioned before, avoid query strings/parameters. They’re ugly, period. They mean nothing to the user and render the useful information in the URL opaque.
NEVER allow technology to infiltrate the URL. If a technology ‘owns’ the resource pattern, it better be a kickass approach. Rails has reworked its default routing a few times and still has room for improvement, but even a system that provides an off-the-shelf option allows for infinite customization. If a developer tells you “that’s how Rails does it”, find a file called ‘routes.rb’, delete it in the middle of the night, and make them write a new one by hand, to your specifications. For Java, my pet peeve is seeing ‘/servlet/’ or similar in a URL. What does this accomplish?
Remember the context of a user reading a URL, and think creatively about meeting your users’ needs! If they are seeing a resource that requires them to be logged in, and that refers to their settings or some other personal information, what about starting the path with ‘/my/’? That’s just one example; taking the user into account will cause you to think very creatively.
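One way to picture the "never let a URL disappear" advice from the first recommendation above is a lookup table from retired URLs to their canonical replacements. This is a minimal sketch, not how any particular CMS or framework does it: the paths, the LEGACY_REDIRECTS table, and the resolve function are all hypothetical, and a real site would more likely configure this in its web server or router.

```python
# Hypothetical mapping from old, ugly request targets to the one
# canonical URL that replaces each of them.
LEGACY_REDIRECTS = {
    "/index.php?page_id=42": "/hello-world/",
    "/servlet/ArticleView?id=7": "/articles/why-urls-matter/",
}


def resolve(request_target):
    """Return (status, location) for an incoming request target.

    Known legacy targets get a permanent (301) redirect to their
    canonical URL. Everything else is passed through with a 200,
    which is a simplification: this sketch does not model 404s.
    """
    new_location = LEGACY_REDIRECTS.get(request_target)
    if new_location is not None:
        return 301, new_location
    return 200, request_target


if __name__ == "__main__":
    print(resolve("/index.php?page_id=42"))
    print(resolve("/hello-world/"))
```

The 301 matters because it tells browsers and crawlers that the move is permanent, so old bookmarks and shared links keep working.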
Beyond these high-level recommendations, the best way to dive into lower-level topics is to go through some case studies, so here we go!
Case Studies: User-Friendliness of URLs for User Generated Content
Flickr Photo: http://www.flickr.com/photos/liang_chen/424178669/
Pretty durn aight (good). You have the domain, which is always on the same subdomain across Flickr, helping SEO and providing a consistent user experience. The path is where the magic is happening: we start with the type of resource (photos), then narrow down the possible content by specifying the photographer/user (liang_chen), and end with a unique identifier for the actual photo (424178669).
Flickr Recommendation: http://www.flickr.com/photos/liang_chen/jamiroquai-at-scala-london/
The only thing I’d change is swapping the photo ID out for a slug. People share URLs!! Imagine getting the first URL in an email or IM, and then imagine getting the second one. Personally, I would be far more likely to check out the second URL, since it tells me immediately that there is something on the destination page that I care about (Jamiroquai). The first URL looks like someone could just as likely be sending me a picture of their friend’s cat. Cats for me? Not so much. Secret Jamiroquai shows? Sick.
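Generating a slug like the one in the recommendation is not much work. Here is a rough sketch using only Python’s standard library; the slugify function name and the photo title passed to it are my own assumptions, not anything Flickr exposes, and a production version would also need to handle two photos with the same title.

```python
import re
import unicodedata


def slugify(title):
    """Turn a human-readable title into a lowercase, hyphenated slug."""
    # Fold accented characters to their closest ASCII equivalents
    # and drop anything that has no ASCII form.
    ascii_text = (
        unicodedata.normalize("NFKD", title)
        .encode("ascii", "ignore")
        .decode("ascii")
    )
    # Collapse every run of non-alphanumeric characters into one hyphen.
    slug = re.sub(r"[^a-z0-9]+", "-", ascii_text.lower())
    # Trim hyphens left over from leading or trailing punctuation.
    return slug.strip("-")


if __name__ == "__main__":
    # Hypothetical photo title, chosen to match the recommended URL.
    print(slugify("Jamiroquai at Scala, London"))
```

For the title above this yields jamiroquai-at-scala-london, the last path segment of the recommended URL.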
YouTube Video: http://www.youtube.com/watch?v=GLbRvcbkAwE&playnext_from=TL&videos=jWCEx4D8qtg&feature=rec-LGOUT-exp_fresh%2Bdiv-1r-2-HM
Fucking shoot me.
YouTube Recommendation: http://www.youtube.com/google-developers/google-io-2010-google-storage-for-developers/GLbRvcbkAwE
Tell me if you want to take time from your work day and view the first video. No idea? How about the second? I rest my case. I’m even throwing the developers a bone here by including the unique identifier at the end of the URL, if they still want to grab it and key off a system-wide unique token, but they ought to be clever enough not to have to.
We’re following a hierarchy here!
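To see how little of the first URL actually identifies the video, here is a sketch that throws away every query parameter except the ID. It uses Python’s standard urllib.parse; the function name and the ESSENTIAL_PARAMS set are assumptions of mine for illustration, and this is a smaller step than the path-based URL recommended above, not a substitute for it.

```python
from urllib.parse import urlsplit, parse_qsl, urlencode

# Assumption for this sketch: only "v" (the video ID) is needed to
# locate the content; everything else is tracking or playlist state.
ESSENTIAL_PARAMS = frozenset({"v"})


def strip_nonessential_params(url, keep=ESSENTIAL_PARAMS):
    """Return the URL with only the query parameters named in `keep`."""
    parts = urlsplit(url)
    kept = [(key, value) for key, value in parse_qsl(parts.query) if key in keep]
    # Rebuild the URL with the reduced query string; scheme, host,
    # path, and fragment are left untouched.
    return parts._replace(query=urlencode(kept)).geturl()


if __name__ == "__main__":
    noisy = (
        "http://www.youtube.com/watch?v=GLbRvcbkAwE&playnext_from=TL"
        "&videos=jWCEx4D8qtg&feature=rec-LGOUT-exp_fresh%2Bdiv-1r-2-HM"
    )
    print(strip_nonessential_params(noisy))
```

Everything after the first parameter disappears, and the link still points at the same video.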
Slideshare: http://www.slideshare.net/mackinaw/how-to-build-an-unsuccessful-startup-presentation
Well flick my tits and call me Barry, it’s perfect.
Slideshare (bonus win): http://www.slideshare.net/mackinaw
Looks like the Slideshare folks took my advice from the YouTube example. The author’s profile page sits one level above the piece of content he made, and why shouldn’t it? Nicely done.
I hope this is useful and consistent information that will help you construct beautiful and helpful URLs for your users. Of course URLs also have a part to play in ranking well in search engines, so we might wonder how these two possibly competing goals relate. Let’s explore SEO-friendly URLs.
Best Practices for SEO-Friendly URLs
Let’s assume you have identified which keywords you wish to rank well for, so that the goal, from a URL-format perspective, is simply deciding where to place those keywords to achieve lift. If you aren’t sure what you are trying to rank for, take a step back and dig into the SEO literature and some keyword analysis, as well as an analysis of SEO vis-a-vis your target market/audience.
SEOmoz’s Search Engine Ranking Factors lists URL-related factors as the #3, #9, #10, and #11 most important ranking factors. We can go through each factor to see how we can leverage URL formatting to improve ranking.
Keyword Use in Domain Name
Well, it either is or it isn’t, right? If you are starting a site fresh, and there is a common word across your highest-traffic keywords, consider including it in the domain, since this is a big one for natural ranking.
Keyword Use in Subdomain Name
Now, remember that engines split authority by subdomain, unless they have a darn good reason to share the love across subdomains of a given domain. What this means is that you might not want to do something extreme like photo-name.example.com. That said, if there are ways of bucketing your efforts in a way that allows you to leverage keywords in subdomain, knock yourself out. By and large, however, keyword use in subdomain is just like keyword use in domain name in terms of strategy, with domain presence simply being much stronger as a predictive factor.
Keyword Use in Page Name
See below
Keyword Use in Page Folder
How are these different? I don’t think they are, and I don’t think that engines parse the part of the URL between the host and the query string into named components, though they almost certainly split it by directory/folder level (‘/’).
So now what it comes down to is something like: Keyword Use in Path. Great. Use keywords in your path. From what I’ve heard, the closer you place the keywords to the root (host), the more weight they get. I believe this, as other ranking factors seem to take the left-to-right, top-to-bottom, important-content-comes-first perspective.
Keyword Stuffing in URLs
This once worked. Now it sucks. Give it up, or don’t bother starting.
Conclusion
This conversation could go on practically forever. My goal is to get you thinking about URLs, even a little bit, as part of the larger user experience. In the end, SEO-friendly URLs and user-friendly URLs probably end up looking pretty similar. My guess is that focusing entirely on user-friendly URLs will give you extremely defensible SEO-friendly URLs, and that focusing on SEO can be hazardous to your health and the overall user experience of your site. The search engines are built to read content like people would, and to make smart guesses about what’s relevant, high-quality content and what isn’t.
One last note: if you want to rank well, one important factor is volume and quality of incoming links. Providing lovely, readable, simple URLs might just end up helping you gain in-links, as well as encouraging people to click those links - just as users rely on the URL location bar more than you think, they also have a habit of peeking to see where they will land if they click that mysterious link, so give them a clue where they’re headed.
And yes, I wish I could remove /post/123456/ from this post’s URL, oh so badly.