Part 4: Activity plan
To work strategically with your web analytics, you or someone in the web team should set up an activity plan that clearly takes you from the current state to a desired state. This part of the book covers a range of examples of activities and metrics you can choose to include in your own activity plan.
It is the hygiene factors on the website that create the prerequisites for a valuable website. Starting with all of them at once is unwise; they take time. Look through them and select five to ten to carry out (and master) during the coming year. It is better to set a goal you can achieve.
Hygiene factors
The point of trying to live up to the hygiene factors in this part is to have some measurable way of relating to the technicalities that affect visitors' experience. Furthermore, they are a way to try to ensure that the website does not deteriorate over time, but rather continuously improves with each update.
The list of activities you have selected can supplement the test protocols that system developers already have. Most who build larger websites have a systematic way to verify that a web project maintains the right quality. Many have long used so-called build servers to keep control over code quality across a larger team of developers. Several of these tools support plugins that can run some checks automatically for you. So by all means do an inventory of the development architecture. If you stumble upon something for Continuous Integration63, it is worth looking for plugins. It is possible to automatically stop the deployment of new features if they do not live up to the quality requirements that have been set.
If you run a larger website, it is a good idea to take an interest in what tests the developers use to ensure the right quality is achieved. These tests can be anything from completely manual to fully automated. But be prepared that it can be a minefield. The reaction can be anything from delight that you are taking an interest to a curt suggestion that everyone should mind their own business.
Since tools and tricks for checking these factors come and go, you can turn to a page on my website where I keep a current list64 of the tools I use at the moment.
Starter pack for your activity plan
The list that follows is things you yourself can check to ensure the website works as you want. Of course, you can swap out values if you want to aim higher or lower. Select which points you think it is reasonable for every subpage on the website to live up to and discuss them with your web developer (sometimes you need to reason differently).
As you will realise when reading the list, there are things you need to add depending on how your website is intended to be used. For example, those who have chosen to go mobile first should add an activity to the list for not using or linking to PDF files on the website (they are usually unfriendly to those with small screens).
But now it is time for a bunch of suggested activities.
Implement HTTP/2 and HTTPS
Before I, or someone you meet, completely converts you to the gospel that HTTP/2 is the best thing since sliced bread, think critically about which users you are leaving behind: those whose equipment does not support the new web protocol. At the time of writing, it is mainly those on older versions of Internet Explorer who belong to this group. Before you take the step, you should know how large a proportion of your users and intended target group will benefit.
In Microsoft's world, only Internet Explorer 11 supports HTTP/2, while all other modern browsers have had it for a long time (and, unlike Internet Explorer, do not tend to live on past retirement). That the web servers you use support the protocol is not a given, but many had support already during 2016. If both the user's device and all the web servers involved support HTTP/2, it is used automatically. Furthermore, the browser remembers that the web server offered HTTP/2, so the next visit will be even faster, since the browser states already in its request that it wants to speak HTTP/2.
Daring to take the step
In many cases, there will be no additional cost or manual effort to activate HTTP/2 on your website. If you already use an external CDN (content delivery network), it is quite likely that the files there already benefit from HTTP/2. However, you may need to undo certain performance optimisations to take advantage of the new protocol. A challenge here will be how to optimise the website both for those on HTTP 1.1 and HTTP/2, since good performance practice differs; we will get into this more later.
For many years yet, HTTP 1.1 will live on in parallel with HTTP/2, at least to some extent. This is an activity we who work with web analytics need to return to: which of the systems we use can start working with version 2, and what does that mean in practice?
For those of you with a small WordPress website, this is a small measure compared to large organisations where masses of new, old and sometimes prehistoric systems cooperate to serve content to the web.
The reason HTTP/2 somewhat turns upside down the quality requirements you set on a web system running version 1.1 is that its designers have thought carefully about what today's web looks like. Among other things, the new version is better at handling multiple simultaneous files and at keeping a channel open to send files as they are needed. That security equivalent to HTTPS is included by default can be seen as a bonus.
The systems you look at first are those that send more than one file to the visitor per page view. Obviously the web server itself, but also if you have a content delivery network (CDN), separate image system, etc.
Keep this in mind when you read the rest of this part of the book – splitting content into several files is good practice with HTTP/2 but bad practice with HTTP 1.1. This means that if you optimise performance on HTTP/2's terms, you penalise those running older technology, so consider when it is worth taking the step.
This point is more about keeping up with what is expected of a modern website. At the time of writing, HTTPS is more or less standard practice for getting an extra nudge in search engine optimisation, and sooner or later HTTP/2 will be what is encouraged within SEO.
About HTTP/2
For those of you who have no idea what HTTP is, I can briefly explain that it stands for HyperText Transfer Protocol, which suggests it is the protocol for Hypertext – that is, HTML. You could say that HTTP is the secret language your computer speaks with the web service you connect to. You have surely seen it in the address bar when browsing the internet. When it instead says HTTPS, it means there is a layer of security on top of HTTP, in the form of something called TLS (Transport Layer Security).
The version of HTTP we have now was standardised in the 1990s and requires a fair amount of patching to make it work the way the web looks today. Therefore, Google started work on SPDY, which became the starting point for the now standardised HTTP/2 (the name for version 2).
What HTTP/2 offers is to make our web-based services faster, simpler and more robust. In technical jargon, it supports full multiplexing, which reduces waiting time; the goal is to roughly halve the time it takes to load a web page. Furthermore, compression is introduced for the HTTP headers, which means that the contents of extensive cookies will not drag down performance quite as much.
Those of you who have read up on HTTP can rest easy. Really, nothing of what you have learned changes; it is the same header fields and the same methods as before. Because of this, a transition to HTTP/2 is fairly undramatic.
It is a good idea to proactively investigate whether you can support HTTP/2 on your website, initially without it disrupting things for the (possibly large) proportion of visitors still on HTTP 1.1.
Now websites can push notifications
A completely new thing with HTTP/2 is that the web server itself can take the initiative to contact a client or browser. Previously, the web only gave the appearance of two-way communication. On a technical level, the web server suffers from memory loss between connections, and the client has had to remind it regularly of its existence – that is, the client constantly made requests asking whether the server had anything new to say.
That the web server can now take the initiative for this contact should also save on wasted traffic and reduce the load on computers, which means they draw a little less power. Until now this has been attempted with a technology called WebSocket, but there may be developments going forward.
Web hosting or own server?
If your website is on a web host, you will probably have to wait politely until they decide to offer this. But if it is on your own server, it may be worth checking what can be done. It is by no means urgent to make a transition to HTTP/2, but at the next overhaul of the website, it is definitely time. If the website does not yet run HTTPS, that should also be evaluated. What HTTPS shows your visitor is that their privacy is valuable, that forms that are posted cannot be snooped on between the browser and the website. And that the content on each page is secret from those who administer all the networks between the visitor and the website.
All browser manufacturers have promised to implement HTTP/2, so then it only remains for your web server to get support for you and your users to benefit from this new version of HTTP.
Ensure that tracking of site search, outbound links, error pages and downloads works
Surprisingly often when I have helped with colleagues' and clients' website statistics tools, they have not tracked which search terms are used on the website's own search function. Most have a search function today and as you can read in this book's deep dive into search analytics, it is a valuable resource.
For those thinking you can settle for keeping track of search terms people use on external search engines like Google, you have missed a completely crucial difference. Those who search on Google are open to suggestions about who among everyone can help them further, while those who search once inside your website search for what they believe exists on your website. In this way, you can in plain language find out the users' expectations and hopes. Furthermore, it is interesting to check what is searched for, as it suggests users may have difficulty finding that particular content. Perhaps it needs to be highlighted?
Not only search tracking is something you should configure in your tool; there is surely much more. The problem with not making these settings proactively is that it is not always possible to recreate them after the fact when you realise they are useful. Perhaps you start from square one.
Also make sure that tracking of outgoing links is done correctly; you want to know to whom you are driving traffic. Naturally, you are interested in knowing when users end up on error pages, and if it is possible to track even more serious errors, that can prove useful. If you offer downloads on the website, or have a lot of content uploaded, ensure that you know how it is used. External parties may be linking directly to your content without you understanding what is happening.
Sometimes you can group different types of content in your website statistics. Say that you have articles for inspiration, product pages and support pages; it can be interesting to see how they contribute to the whole.
Not accidentally leaking personal or customer data to web analytics tools or other third-party systems
Without thinking about it, more or less sensitive information can end up in systems where it does not belong. Most noteworthy is if you leak sensitive personal data to a third party, but even if personal data ends up in the wrong tool, it can pose a risk of misuse.
As soon as you collect personal data, you are also responsible for how it is handled, who has access to it, etc. In most countries there are laws regulating privacy and data protection. It may be worth doing some research on this for the market you operate in so it does not come as an unpleasant surprise later.
If you want to speak to the developers in their own language, or set requirements in your performance budget, coding standard or whatever you want to call it, the requirement is that content and forms use the POST method in HTTP. As a retired developer, I can admit that many developers are not very good at HTTP. If you put it as "you should not use GET", perhaps it becomes clearer. If more clarification is needed, mention that a posted form's fields are not meant to be displayed in plain text in the address bar.
The fact is that this type of handling also makes it a bit more cumbersome to get an overview, since you get several unique addresses for one and the same view on the website. If the cumbersomeness were not argument enough, or the fact that it is bad practice, sending personal data is also a breach of, among others, Google Analytics' terms of service. It is probably best to refrain before having your account brutally shut down for breach of contract.
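As a sketch of the kind of safeguard you can ask your developers for, the snippet below strips sensitive query parameters from a URL before it is forwarded to an analytics tool. The parameter names (email, ssn and so on) are only examples I have made up – inventory your own forms to build the real list.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Query parameters we never want forwarded to a third-party tool.
# These names are illustrative examples, not a complete list.
SENSITIVE_PARAMS = {"email", "ssn", "name", "phone"}

def scrub_url(url):
    """Return the URL with sensitive query parameters removed."""
    parts = urlsplit(url)
    kept = [(k, v) for k, v in parse_qsl(parts.query)
            if k.lower() not in SENSITIVE_PARAMS]
    return urlunsplit((parts.scheme, parts.netloc, parts.path,
                       urlencode(kept), parts.fragment))

print(scrub_url("https://example.com/thanks?email=a%40b.se&page=3"))
# The email parameter is dropped, harmless ones are kept.
```

A filter like this is a safety net, not a licence to keep using GET for personal data – the address with the personal data still ends up in server logs and browser history.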
Three or fewer stylesheets
Stylesheets is the name for CSS (Cascading Style Sheets), a solution that, believe it or not, dates from the 90s, proposed by Håkon Wium Lie and standardised by the W3C. The idea is to separate visual design from the content and structure of a web page.
It is hard to argue that a visitor to a website benefits from the stylesheet being split into many different files. For the developer who created the website, on the other hand, it can be very convenient to have multiple stylesheets.
How many CSS files do we need?
A stylesheet that resets all margins to the browser's outer edges is not uncommon (a so-called reset.css), one more for basic design, and another for common colours to follow the graphic profile. An additional CSS file for the website to follow responsive web design (which is often an addition to an existing web), then another to set colours for the local version of the website.
Then it is not uncommon for a few more to be added depending on your plugins in WordPress (or hack-happy web consultants). But let us settle for 5 CSS files to download, so we are on the safe side in a small mathematical example.
Why minimise the number of stylesheets?
The reason you want few stylesheets, CSS files that is, is because each file that needs to be downloaded has a waiting time before it starts being sent to the visitor. This is due to what the kids of today call lag. Exactly how long this waiting time is depends on at least three factors:
- The web server sending the file.
- The network between the server and the visitor.
- The visitor's own connection to the internet.
Because of this waiting time, you normally want as few files as possible to send to a visitor. The more files, the more time wasted on waiting.
Take the example above with five CSS files. In the best conceivable scenario, it will not be noticeable at all; then we coldly assume that the visitors are on a wired connection, the sun is shining, everyone except your visitors is out in some city park grilling sausages and drinking rosé wine. Your intended target group are the only ones using the internet and your server is doing well.
In a more realistic scenario, at least someone in your intended target group will be in the situation of having a questionable mobile 3G connection with response times of 0.1 seconds. That means half a second is spent waiting to find out how the content should look, what colours and fonts are used and how the right column should fall to the bottom of the page for mobile visitors. Then we have not counted the time it takes to transfer the CSS files' actual content – just the time it takes to wait for the files to start being sent.
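The back-of-envelope arithmetic above can be written out. A small Python sketch, assuming the files are fetched one after another (real browsers parallelise requests, so treat this as a worst case, not a prediction):

```python
# Latency cost of fetching files sequentially, as in the
# five-CSS-file example above.
round_trip_seconds = 0.1   # a shaky mobile 3G connection
css_files = 5

wait = css_files * round_trip_seconds
print(f"Time spent only waiting, before any CSS content arrives: {wait:.1f} s")
```

Note that this counts only the waiting time per request; transferring the actual file contents comes on top of it.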
Why does one have more than a single stylesheet?
Yes, one can actually wonder about that. If I speculate, drawing on my experience as a web developer, I would claim it is due to laziness combined with imprecise requirements from the commissioner. The blame is shifted, as if the client ought to know web development better than the web developers who were sold in the procurement as the country's foremost experts.
It is always a bit too easy to blame the commissioner. In this particular case, additional CSS files can be added for the user to download for every “quick fix” made on a website. Or if you run WordPress, you need to watch out for what extra files a plugin needs.
Dirty hacks, quick fixes, laziness, ignorance or no professional pride?
I have unfortunately experienced colleagues adding yet another CSS file to a page. When I asked why they chose to add yet another file that delays the loading of the website, I often got the answer that it was the quickest solution. If you have not agreed on some form of minimum quality level, there is unfortunately little to do but agree: dirty hacks are the easiest and fastest way – that is why they are called dirty hacks…
Three or fewer JavaScript files
Not infrequently, a lot of JavaScript is downloaded that the user will never have any use for. Often it is split into multiple files, which adds extra waiting time. As we just established, the fewer files that need to be sent to a visitor, the better.
You might ask yourself why anyone does it in this obviously wasteful way and splits the content into multiple files. Well, the simple answer is that it is easier to calculate the cost of a developer finishing quickly than all the time wasted at the users' expense.
Of course, a balancing of interests needs to happen here. That is why you need some form of yardstick for what is okay when it comes to the use of JavaScript. This particular point is about the number of JavaScript files, that you should merge them, but there is of course a limit to how large the file is allowed to be for you to gain something.
Common division of JavaScript
Not infrequently you have some different categories of JavaScript material, for example JavaScript that:
- Transforms the menu on small screens, puts in one of those hamburger menus which saves masses of space.
- Supports interaction in form fields, for example by telling which required fields the user forgot to fill in when posting.
- Comes with design frameworks, such as Bootstrap, which helps with design components like dialog boxes, how error messages should be displayed (and a lot of additional design magic you chose not to use on your own website).
- Belongs to the jQuery library, often thrown in out of habit but quite often required as a dependency by other components.
Besides these, you often have one or more jQuery-based add-ons for image galleries, page ratings and other things.
If you are not careful, you get these JavaScripts in their own separate JavaScript file each. For those who choose to try to do a good job, you are faced with which JavaScripts to combine into a single file (if not all of them). The problem is that you must make a trade-off between sending JavaScript code that will not be used by every single visitor and splitting the whole thing into multiple logical groupings. Not all your visitors will see the image gallery, for example, so not everyone needs those JavaScript files.
One thing is clear, however: the lazy version far too many run with is that every JavaScript is downloaded as its own separate file, regardless of whether it is even needed on the page that calls it.
Say for example that you have chosen to have lots of individual JavaScript files; then only those needed for a certain subpage should be downloaded. That is, JavaScript for form validation is not downloaded on a page that lacks forms.
Alternative approach: lazy loading
Lazy loading is a design pattern that involves what is needed for a page view being downloaded a bit later, or only when needed. Just as images in the footer do not need to be downloaded for the majority if only a few scroll down there, in the same way JavaScript files can be downloaded when they come into use.
The motivation for this is of course not to weigh things down more than necessary. The criticism of merging all JavaScript files is that then the user receives masses of data that is unnecessary for their particular visit.
What lazy loading presupposes is that the response time is very low, meaning it is extremely fast to transfer something from the web server to the visitor's browser. That is definitely not the case if the visitor has a wireless connection via 3G or EDGE. On the other hand, it works excellently on the 4G network and naturally on all more modern wired connections. So the question is what type of visitors your website has, and how large a proportion of them you are prepared to treat as negligible when choosing your design pattern.
As you understand, there is reason to have a certain margin, and in my world three JavaScript files is that margin, especially since without particularly great effort when building a new website it can be a single JavaScript file.
Beyond the number of JavaScript files, you ideally do not want them to load before the page can render itself; you can check this in Google PageSpeed nowadays – if it thinks you have failed, it tells you to prioritise visible content ("above the fold").
Three or fewer images for the design
Developers should use CSS instead of images to the greatest extent possible. In ancient times, when layouts were made with tables, images like blank.gif were used to push letters in from the left margin. Furthermore, you could not count on CSS working in older browsers.
“But then I'll throw in a web font, an SVG file, and a CSS sprite”
You are absolutely right, there are files that are sort-of-almost images. Web fonts are image-adjacent in the sense that they are a visual component. SVG files are basically images in text format, and CSS sprites are a technique for combining several small images into one large file. The key principle is the same: minimise the number of files the browser needs to download for the page's appearance.
Just as with CSS and JavaScript, the reason to minimise design images is about reducing waiting time for users. Each file has its own round trip of latency.
Only load what is needed, when it is needed
A general principle for good web performance is to not load things that are not needed for the current page view. The concept of lazy loading (which we touched on under JavaScript) applies equally to images, fonts, and any other resources. If an image is below the fold, there is no urgency in downloading it before the user scrolls there.
Modern HTML even has a native attribute for this: loading="lazy" on images. This tells the browser to defer loading until the image is near the viewport. For JavaScript-dependent components, the concept of loading on demand through AMD (Asynchronous Module Definition)65 or similar techniques can help.
The principle is simple: respect your users' bandwidth and time. Do not force them to download things they may never see or use.
Make sure you have access to good data for web analytics
This may seem obvious but it deserves its own heading. Ensure that you have configured your web analytics tool correctly and that data is actually being collected. It is not unheard of that tracking code was removed during a redesign or that a new subpage was launched without the analytics snippet.
Make a habit of regularly verifying that your data collection works as expected. At the very least, check that the number of tracked pages roughly matches the number of pages your search engine or CMS knows about.
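The sanity check described above can be sketched in a few lines of Python. The two page lists here are placeholders – in practice you would export the tracked pages from your analytics tool and the full page list from your CMS or sitemap:

```python
# Compare the pages analytics has seen with the pages the CMS knows about.
# Both sets below are invented example data.
tracked_pages = {"/", "/about", "/contact"}
cms_pages = {"/", "/about", "/contact", "/news", "/news/launch"}

untracked = cms_pages - tracked_pages
coverage = len(tracked_pages & cms_pages) / len(cms_pages)

print(f"Tracking coverage: {coverage:.0%}")
for page in sorted(untracked):
    print(f"Missing analytics snippet? {page}")
```

Pages that show up as untracked are prime suspects for a missing or broken analytics snippet, for example after a redesign.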
Validate according to WCAG 2.0 level AA
Since 2015, poor accessibility has been considered discrimination under Swedish law. In 2016, the EU decided that all public authorities' websites, intranets and apps must meet accessibility requirements. WCAG (Web Content Accessibility Guidelines) is the international standard for evaluating and improving web accessibility.
Level AA is the most commonly referenced level. It covers things like sufficient colour contrast, keyboard navigability, text alternatives for images, and more. If you are building a new website, WCAG compliance should be part of the acceptance criteria.
Hmm, what does this WCAG thing mean?
WCAG is built around four principles, often abbreviated POUR: Perceivable, Operable, Understandable and Robust. Each principle has guidelines, and each guideline has testable success criteria at levels A, AA, and AAA.
The University of Gothenburg was reported for discrimination66 as early as 2013 for having a learning platform that was inaccessible to visually impaired students. In 2016, the EU reached an agreement67 that public authorities' websites and apps must meet accessibility standards.
More about accessibility and testing tools
W3C has a list of tools68 for testing the accessibility of a web page. Since there are different regulations, it is worth checking that you comply with the minimum requirements where you have operations. For example, 508 Checker for not breaking US legislation.
If you have notifications, follow up on them!
It is becoming increasingly common for websites to have notifications. Either in the old-fashioned way, where the interface shows numbers or other markers indicating that there are unread or new things behind a button, or, more recently, where the website can notify you even when the browser is not visible on screen.
You who have used the professional network Linkedin.com cannot have missed that their notification system has been broken from the start. Microsoft's Yammer has also shown a marker drawing attention when there is nothing new to be seen underneath. They are surely far from alone, given that users can be connected to the same service via several simultaneous channels – app, mobile web and the web on a computer (momentarily in sleep mode in the backpack) – which makes keeping notifications in sync genuinely hard.
If your website has notifications, make sure to follow up whether it works as intended. The simplest solution is to ask your users or yourself use your service actively.
Another alternative is to design the service so you can follow up. It is possible to build in logging of the user interface. For example, whether a button has had the status “new messages” and whether there were actually only read messages after a click.
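The logging idea above can be sketched like this: record what the badge claimed and what the user actually found after clicking, then flag the false alarms. The event format here is invented purely for illustration:

```python
# Invented example log: did the badge promise something new,
# and how many unread items were actually there after the click?
events = [
    {"user": "anna", "badge_said_new": True,  "unread_after_click": 2},
    {"user": "bo",   "badge_said_new": True,  "unread_after_click": 0},  # false alarm
    {"user": "cia",  "badge_said_new": False, "unread_after_click": 0},
]

false_alarms = [e["user"] for e in events
                if e["badge_said_new"] and e["unread_after_click"] == 0]
print(f"Users shown a notification with nothing new behind it: {false_alarms}")
```

If the share of false alarms grows over time, you have caught exactly the kind of quiet deterioration this chapter is about, without having to wait for annoyed users to tell you.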
Nowadays, computers can also receive this type of notification, so we risk appearing downright incompetent if it pings everywhere while there is nothing new to see.
Important outbound links should lead to accessible websites
On the one hand, you help those of your visitors who have a disability; at the same time, you steer your website's trust and authority towards pages that share your values.
Examples of tools for checking websites' general accessibility include WAVE69, or if you want to automate the check, you can look at Tenon.io70 – you perhaps cannot count on web editors taking the time.
Page views should be under one second?
One angle on why this is important for you is search engine optimisation (so-called on-page optimisation). Google gives you a certain amount of time for their indexing. Then it is good if they manage many pages, right?
To keep up with the search giant's reasoning about web performance, it is their performance guru Ilya Grigorik you should follow. If you want to read up on technicalities, he has published his book for free online – High Performance Browser Networking71.
Better usability on the website leading to higher conversion
Another reason why a website needs to be fast is for the users' sake. They lose the feeling of flow and perceive that they are waiting at roughly one second. Then you risk losing your user before they achieved what you or they wanted from the visit.
The recommendation is to serve a page in about 0.1 seconds. Then the interaction is perceived as immediate. If it takes between 1–10 seconds, the user begins to perceive that they are waiting. Even though studies show most people can maintain concentration on what they are doing, the technology begins to irritate the user.
Not wasting the web server's resources
An unusual angle on why a website should load fast is because the web server sending the page cannot handle an unlimited number of simultaneous transfers. The actual number varies of course with how expensive a solution you have, but there is potential to lower your server costs, manage longer on existing servers or perhaps be more ready for a traffic peak during a crisis or a successful campaign.
Free tools for measuring a page's load time
If you want to make it easy for yourself, check out Google's tool PageSpeed Insights; it warns if the server's response time is too long. At the same time, Pingdom has several services for this, but most interesting is the one that measures the website's response time several times per hour. Then you get a diagram of performance over time and can also receive warnings via a mobile app.
Google PageSpeed mobile should be at least 80 out of 100
Google PageSpeed is a concrete measure of how good a website's performance is, something both Google and real users care about. One reason you might settle for Google's view is because we are in their power when it comes to SEO anyway.
Now, it may not be exactly 80 out of 100 that you should choose. But you need to choose a level, something that matches your ambition for the website's performance. Perhaps you should raise the bar and choose 90 if mobile customers are the most important to you. Or choose 50 if your website has a long way to go on mobile.
Google themselves have an opinion about what they consider a good level (my emphasis):
The PageSpeed Score ranges from 0 to 100 points. A higher score is better and a score of 85 or above indicates that the page is performing well. Please note that PageSpeed Insights is being continually improved and so the score will change as we add new rules or improve our analysis.
- Google's documentation on PageSpeed Insights72
Create a test page to evaluate changes in web design
For it to be a fair comparison from one measurement to another, you need to test under the same conditions. In other words, you must set up a representative test page on your website. That particular page is the one you run tests against to know that the results are comparable over time. What can differ is the web server's response time; otherwise the page's content in an editorial sense should largely be the same from one time to the next.
If you do it this way, you know whether changes in web design have made things better or worse. It can be that you try switching themes if you run WordPress, that your web consultants have upgraded something or possibly done new development.
How to design your test page
For your test page to be meaningful, its editorial design needs to remain practically the same over a longer period. If you are to compare the result before an upgrade of the publishing system with after, the content needs to be similar.
My suggestion is that you create a subpage solely intended for testing. That page should contain text, images and things that are normally complex for a regular page on the website. This is not where you embed a bunch of strange external services, widgets, video clips or things that neither you nor your developers have control over.
Do a baseline measurement
When you have your test page, you do a first measurement at Google PageSpeed, then you reconcile the result with everyone involved and document it somewhere (so people understand that this is what they have to live with going forward). In some suitably formal way, you should probably explain that it is not acceptable for the result to get worse.
You may well be your own counterpart, in which case a lawyer is somewhat superfluous and you have to tell yourself to shape up :)
Images should be optimised for the web
There is much to think about when decorating your website, but also when filling it with editorial content. Those who decorate are usually professionals in image handling, but not always equally professional in what applies to the web and what conditions the web sets.
Then there are editors and all users who upload material to websites. We do not always get it right, nor is it certain that the tools we are offered help us.
Sometimes you have images via a media bank, where the media bank does the work for you; sometimes you upload them manually. Now that everyone has finally bought into the idea that websites must be responsive down to small screens, yet another concern arises: small screens often have higher resolution than large ones, and can therefore benefit from more detailed images (although sharpness is not the be-all and end-all).
But small screens, in most cases mobile phones, also have the major disadvantage of a wireless and sometimes wobbly connection. This means that even images of normal resolution can feel extremely slow to download.
Image handling for the web is, in other words, harder than ever.
Images should be in the resolution at which they will be displayed
An additional snag is that most responsive websites have fluid column widths. The way almost everyone has designed it, there are images that fill the entire column's width, which means we do not know in advance exactly at what size the image will be displayed.
Take as a thought exercise that the middle column can be from 300 to 500 pixels wide. The image shown at the top of that column needs to be as wide as the column for it to look “harmonious”.
Do you then choose to upload images that are 500 or 300 pixels wide?
Let us calculate the difference. Say the image has a 2:1 ratio. Then the larger variant becomes 125,000 pixels (500 x 250). If it is instead 300 wide, the total is 45,000 pixels (300 x 150). That is a huge difference in image area and your choice will definitely affect performance.
To add insult to injury, we have the challenge of high-resolution screens. What Apple users call retina. This often means double the resolution, meaning each pixel becomes four to achieve maximum sharpness – so it does not look blurry for those accustomed to the sharpness.
The above image would thus be 500,000 pixels to be really sharp (1000 x 500). That is more than ten times as many pixels as if you choose to send the image at 300 x 150.
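The arithmetic above is easy to reproduce. A small sketch, assuming the 2:1 image ratio from the example:

```python
# Pixel counts for the column-width example above, assuming a 2:1 ratio.
def pixel_count(width, ratio=2):
    """Number of pixels in an image of the given width and width:height ratio."""
    return width * (width // ratio)

small = pixel_count(300)    # 300 x 150 = 45,000 pixels
large = pixel_count(500)    # 500 x 250 = 125,000 pixels
retina = pixel_count(1000)  # 1000 x 500 = 500,000 pixels, over ten times `small`
```

Because image area grows with the square of the width, a seemingly modest bump in width has a dramatic effect on the amount of image data.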
Do you dare send such a high-resolution image to a mobile phone that is occasionally connected via a questionable mobile network? Here you need to make a trade-off, but as I hope you have understood, this is nothing you can do without careful thought. The basic rule is deceptively simple: images should be in the resolution at which they are displayed, and they should not be scaled down with HTML or CSS.
If you are worried about this, feel free to read up on responsive images online; I also have a bit more about it in my book Webbstrategi för alla.
Images should be saved in a suitable format
The basic rule is that images should be saved in the format that allows the smallest file size in relation to maintained image quality.
Most people know the formats GIF (apparently pronounced “jiff” according to some), JPG and PNG as something you encounter on the web. These formats have their strengths and weaknesses, extremely briefly:
- GIF supports animations and transparency, but only 256 colours. Effective specifically for animations and sometimes simple illustrations, like logotypes.
- JPG is a format for photographs. Very good for all images that lack areas of a single colour – if such things occur, they tend to “bleed” and show pink and green spots if compressed.
- PNG has two variants, one at 8 bits and one at 24 bits. The 8-bit one has similarities with GIF as you can have a maximum of 256 colours. At 24 bits you also get transparency and it then supports millions of colours. PNG is most efficient as 8-bit and for illustrations, infographics, logotypes, etc.
Then there are initiatives like WebP, created in 2010, which Google, eBay and Facebook have campaigned for. They happen to display a great many images, so it is absolutely in their interest to get broad support among browsers. According to tests I have read, WebP images are up to 25% smaller than whichever of the other three formats performs best. So it is something to keep an eye on.
Images need to be saved for the web
When you have chosen the right format, an image needs to be saved optimised for the web. A “save for web” option exists in all image editing software I have encountered. If you are unsure you have chosen the right format, you can compare the other formats when saving for the web. At least Photoshop supports this by allowing you to choose format in the top right corner of the dialog box.
If you choose PNG, feel free to check whether the file becomes smaller and still looks okay as 8-bit, preferably with as few colours as possible. After a while, you get a pretty good feel for which images might work as another image format.
With JPG files you choose the level of compression, typically with a slider from 0 to 100 for how much quality you want to preserve. Be watchful of areas with the same colour or shade, and especially watchful of areas of human skin, where we seem to be extra sensitive. The person in the image can appear sick if the image has been optimised too hard.
What you will see with too hard optimisation is that detail seems to disappear and that discoloured squares, usually green and pink, appear. Cut a piece of pure white into a photo, compress hard, and you will see what I mean.
With a GIF file, you reason as with an 8-bit PNG.
Run images through lossless optimisation
Now you have already taken three steps for an image to be quick to download. When will we be done, really? Well, this is the last step in the process I myself follow.
What we now do is remove completely unnecessary data from the image – so-called lossless optimisation. The point is that the optimisation should not be detectable by the naked eye, but sometimes the files can become much smaller. Count on at least 5% being possible to shave off any image whatsoever.

- Image 63: ImageOptim shaves off up to three quarters of images' file size.
Running a Mac, I use the fantastic program ImageOptim. It works so splendidly that I drag a bunch of images (perhaps the entire web image folder) to the program's window, and all images are optimised and overwritten in their current location. After that, you upload the images again.
If you run WordPress, there is a plugin called EWWW Image Optimizer73 which can run through your entire image library and optimise everything losslessly in one go.
Do not use fonts that need to be downloaded
The advantage of using fonts already installed in users' devices is above all performance, but also predictability. You know what your typography will look like. When you load a font, you add a file to download – with the associated latency – and you get the phenomenon called FOUT (Flash of Unstyled Text)74. The user first sees the text in one font and then it suddenly switches when the custom font has loaded.
Web typography
There is a whole world of web typography for those interested. But from a performance perspective, using system fonts is the safest choice. If you want to see which fonts are available across platforms, check Font Family Reunion75.
Loading fonts via Google Fonts
One of the most common ways to load custom fonts is via Google Fonts. This is a free service where Google hosts fonts you can use. The disadvantage is that you involve a third party in the communication with your users, and there is an extra file to download. If you decide to use Google Fonts anyway, at least limit yourself to the minimum number of variants.
Follow W3C recommendations for HTML, CSS and JavaScript
Web standards are not a new idea, and following them is about ensuring your website works predictably across different browsers and devices. W3C (World Wide Web Consortium) maintains the standards for HTML and CSS, and following them is part of professional web development.
How do I know we follow web standards?
The easiest way to check is to run your pages through W3C's validators. They check for syntax errors and compliance with the standard you have declared. Having zero errors is the goal, but a handful of minor issues is not usually critical.
But what about all the shiny new things I want to use?
New features are constantly being added to web standards. Use them progressively – make sure the basic experience works without the new feature, and enhance for browsers that support it. This is the principle of progressive enhancement.
Minify front-end code
Minification means removing unnecessary characters from code – spaces, line breaks, comments – without changing functionality. It is purely about reducing file size.
Minification in practice
A minified file can be 20–40% smaller than its unminified counterpart. That difference adds up for users on slow connections. Most build tools and CMS have minification capabilities built in or available as plugins.
How to start minifying as a developer or on WordPress
For developers, tools like UglifyJS (for JavaScript) and cssnano (for CSS) are standard. For WordPress users, plugins like Autoptimize can handle this automatically. The TV series Silicon Valley81 is about a compression company, so apparently even Hollywood finds this topic exciting.
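To make concrete what minification actually does, here is a deliberately naive sketch. Real tools such as cssnano and UglifyJS handle far more cases safely; this regex version only illustrates the principle of stripping comments and collapsing whitespace:

```python
import re

def minify_css(css):
    """Naive CSS minifier sketch: strips comments and collapses whitespace.
    Only an illustration - production tools do this far more robustly."""
    css = re.sub(r"/\*.*?\*/", "", css, flags=re.S)  # remove comments
    css = re.sub(r"\s+", " ", css)                   # collapse runs of whitespace
    css = re.sub(r"\s*([{};:,])\s*", r"\1", css)     # trim around punctuation
    return css.strip()

before = """
/* header styles */
h1 {
    color: #333;
    margin: 0;
}
"""
after = minify_css(before)  # "h1{color:#333;margin:0;}"
```

Even on this tiny sample the output is a fraction of the input; on real stylesheets the savings land in the 20–40% range mentioned above.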
Send text files compressed
Compression (usually Gzip or Brotli) reduces the size of text-based files (HTML, CSS, JavaScript) as they are transferred over the network. The browser decompresses them on arrival. This is transparent to the user and typically reduces transfer size by 60–80%.
Why compress?
Because it is free performance. There is virtually no downside for modern servers and browsers. If you are not compressing text files, you are leaving easy performance gains on the table.
How do I compress in WordPress or in .NET?
In WordPress, several caching plugins enable Gzip compression. In .NET/IIS, it can be enabled through the compression settings in IIS Manager. Check your response headers for Content-Encoding: gzip to verify it is working.
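The effect is easy to demonstrate with Python's built-in gzip module. The repetitive sample text below is made up for illustration; real HTML compresses less dramatically, but 60–80% savings are still typical for text files:

```python
import gzip

# Hypothetical, deliberately repetitive page content for demonstration.
html = ("<p>" + "This sentence repeats itself. " * 200 + "</p>").encode("utf-8")
compressed = gzip.compress(html)

# The browser reverses this transparently on arrival.
saving = 1 - len(compressed) / len(html)
```

The decompressed bytes are identical to the original, which is the whole point: the user notices nothing except that the page arrives faster.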
Follow webbriktlinjer.se
For those operating in Sweden, webbriktlinjer.se (The Swedish Web Development Guidelines) is an excellent resource. It is the continuation of the “24-hour authority” initiative and provides practical guidelines for accessible and well-functioning websites.
*Gasp* Who cares?
You should. The first guideline, with the highest priority, is to follow WCAG 2.0 level AA. But there are many more guidelines covering everything from how the back button should work76 to how you handle PDFs. Even for the private sector, these guidelines contain sound advice.
File lifetime defaults to 30 days
When a browser downloads a file (image, CSS, JavaScript), it can cache it locally so it does not need to download it again on the next visit. How long the browser keeps the file is controlled by HTTP headers you set on your web server.
Which files can live long in the browser's cache?
Static files that rarely change – like your logotype, CSS files, JavaScript libraries – can be cached for a very long time. A year is not uncommon. Files that change more often, like HTML pages, should have shorter cache times or be revalidated.
Okay, but what if I desperately need to push out a new logo?
The standard trick is called cache busting: change the file name or add a version query string (e.g. logo.png?v=2). Then the browser sees it as a new file and downloads it afresh.
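A common refinement is to derive the version string from the file's content, so the URL changes exactly when the file does. A sketch, where the path and the byte strings are made up for illustration:

```python
import hashlib

def busted_url(path, content):
    """Append a short hash of the file's content as a version parameter.
    The URL changes whenever the file's bytes change, and only then."""
    digest = hashlib.md5(content).hexdigest()[:8]
    return f"{path}?v={digest}"

old = busted_url("/img/logo.png", b"old logo bytes")  # hypothetical contents
new = busted_url("/img/logo.png", b"new logo bytes")
# old != new, so browsers fetch the updated logo instead of serving their cache
```

Build tools usually do this for you, but the principle is the same: let the cache live long, and change the address when the content changes.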
Setting lifetime in WordPress, Apache and in .NET
In Apache, you use mod_expires or mod_headers. In .NET/IIS, you configure it in web.config. In WordPress, most caching plugins handle this for you. The key header is Cache-Control with a max-age value in seconds.
Sensible URLs
A good URL is readable, predictable and describes what the page is about. Users should be able to look at a URL and get a sense of where they will end up. Search engines also benefit from descriptive URLs.
What a URL should not contain
Avoid session IDs, long query strings with cryptic parameters, file extensions where unnecessary, and anything that makes the URL unpredictable. Also avoid special characters and spaces.
Uppercase or lowercase in a URL?
Stick to lowercase. URLs can be case-sensitive depending on the server, and mixing cases leads to confusion and potential duplicate content issues.
How dynamic a URL can you tolerate?
The less dynamic, the better. If your CMS generates URLs with IDs and query parameters, see if you can configure it to produce clean, readable URLs instead.
Quality indicators for a URL
A good URL is: short, descriptive, lowercase, uses hyphens to separate words, avoids query strings, and is stable over time. Every time you change a URL, you lose any link equity it has accumulated.
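Most CMSs generate such URL slugs for you, but the principle can be sketched in a few lines of Python. The normalisation rules here are simplified assumptions, not a complete implementation:

```python
import re
import unicodedata

def slugify(title):
    """Turn a page title into a short, lowercase, hyphen-separated URL slug."""
    # Strip accents and anything non-ASCII (a simplification).
    title = unicodedata.normalize("NFKD", title).encode("ascii", "ignore").decode()
    title = title.lower()
    # Replace every run of non-alphanumerics with a single hyphen.
    title = re.sub(r"[^a-z0-9]+", "-", title).strip("-")
    return title

slugify("Annual Report 2015 – Q4 Results")  # "annual-report-2015-q4-results"
```

Note that the result meets most of the quality indicators above: lowercase, hyphen-separated, no query string and readable at a glance.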
Responsive typography
Have you checked all your subpages on a mobile or other small screen? Probably not. If you do, one or two headings are sure to pop up that force you to scroll sideways to read the entire word, since Swedish (and many other languages) is built on compound words.
This problem mainly applies to headings as they are of a larger size than body text; therefore, each letter takes up more width and long words do not always fit. But do think about the accessibility aspect – some of your users are probably running with enlarged text, so what applies on your screen does not necessarily reflect everyone else's situation.
Soft hyphens for hyphenation when needed
The solution to the problem is to insert soft hyphens, which unlike regular hyphens only hyphenate a word if space requires it. The most obvious example is the main heading seen on a mobile screen, but this can just as easily appear as “elevator reading” – one word per line – if you have narrow columns in your design when it is not displayed full-screen on a desktop monitor.
A soft hyphen can be typed manually in the word where you want a possible hyphenation; just type the HTML entity &shy;, which stands for soft hyphen. For example:
<h1>Parlia&shy;mentarian</h1>
If you are unlucky, it does not work in your web system's regular text fields; then a developer needs to ensure you can write HTML even in those text fields. Often the problem is that the system tries to convert away the & character, which wrecks your soft hyphen.
Google PageSpeed Insights catches the problem in its usability check under “Size content to viewport” when text does not fit within the visible area.
Content published as structured data
Practically every website has information of a structured type, but it is not always published in a structured way. I am talking here about content that we as humans can easily identify as a calendar event – like a summer party – or a geographical place, that “Stockholm” probably refers to the Swedish capital rather than the small place in the USA.
This ability to understand what content is or refers to is something computers lack. Those of you interested in search engine optimisation have probably not missed all the talk about RDFa. It is one of the techniques for marking up content so that even a machine understands what it is.
Microdata, Schema.org and Microformats.org
Google tired of Microformats.org being so slow with updates and then released Schema.org together with Microsoft, Yahoo, and others.
If you take a look at Schema.org, you realise that almost every form of content you publish can be expressed in a structured format. One advantage is that you can get more space in Google's search results; for example, you have surely seen reviews and small calendars in Google's results list. Exactly how it is displayed is something at least Google constantly experiments with. Marking up who authored an article or blog post was something they only showed for a year before removing it.
Search engine optimisation, but not necessarily higher ranking
Among search engine optimisers, it is actually not believed that your website climbs higher in Google's results list just because you have structured data. However, in many cases you get more space in the results list, which is definitely at the expense of those listed below your page.
How much structured data is recommended depends a bit on what type of business your website has, but obviously your core business should be described structurally if possible. For example, if you list reception locations for customers, specify their geographic position, contact details, etc.
To verify that your structured data is correct, you can test with the Google Structured Data Testing Tool77.
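As an illustration, this is roughly what a calendar event looks like when expressed as JSON-LD, one of the markup syntaxes schema.org supports. The event details are of course made up:

```python
import json

# A minimal schema.org Event as JSON-LD. The field names follow schema.org's
# published vocabulary; the event itself is a hypothetical example.
event = {
    "@context": "https://schema.org",
    "@type": "Event",
    "name": "Summer party",
    "startDate": "2016-06-24T18:00",
    "location": {
        "@type": "Place",
        "name": "Stockholm",
    },
}
json_ld = '<script type="application/ld+json">' + json.dumps(event) + "</script>"
```

The resulting script tag is placed in the page's markup, where crawlers that understand schema.org can pick it up.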
References to files that cannot be found or do not work
Just because a web page loads without giving an error message does not mean all is well. Behind the scenes, errors sometimes hide in one of all the files needed for the web page to become complete, as intended.
There are several causes for these errors. The most common I have seen is that the web developers misplaced a file when transferring changes from the development environment to the production environment. That causes a 404 error on one or a few files.
It can also be a server error, i.e. something in the 500 range. On some websites, stylesheets, JavaScript and even images are generated by the web server; they are not static files on a hard drive. If an error occurs in this generation – more or less temporarily – the server both fails to send the file and produces an error.
But why is this a problem?
The reason this is a problem is connected to which file is missing or not working. Say it is the stylesheet for printing; then few will discover it but it makes printouts worse than intended.
Furthermore, all types of errors risk giving bad signals to search engines when they rank your website. Constantly having problems with the website does not inspire confidence among users.
How do I find these errors?
The advanced methods are often only available to larger organisations, but we start here. If you have access to the web server's log files, you can often get detailed insight into the server's view of its well-being. You ideally want to avoid reading these log files as text, so check what analysis tools are available – perhaps you have Event Viewer in Windows Server, some log analysis tool like Kibana or Splunk. Perhaps you can get the log files and run them into a data preparation tool like Tableau.
Inspecting log files and monitoring errors should interest everyone who is not a technophobe. It is an indicator of how a website is doing, whether it has been developed competently and whether its operations work.
You with a website on a web host
If you have a smaller website, you are often somewhat more limited in what analysis options you have. But that is nothing to be downhearted about. Your expenses for keeping the website running are negligible compared to what the above-mentioned tools cost :)
Besides the obvious approach of being attentive to oddities when you yourself act as a user on your own website, there are tools that check this for you. The easiest are the network tools in all modern browsers. You find them under each browser's developer menu. There you see all files loaded for a page view, whether they generate errors or not. It is mainly the HTTP series 400 and 500 you need to think about; if you have plenty of time, it is worth thinking about the 300 series too as it is not efficient to call files that redirect.
The weakness of that method is that you do spot checks, one page at a time and manually. Another tool that does this a bit more automated and on masses of pages on your website is the free tool Optimizr.com – with the disadvantage that you have to wait for the reports they send out from time to time.
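If you are unsure how to prioritise what the network panel shows, the rule of thumb above can be written down explicitly. A sketch:

```python
def needs_attention(status):
    """Rule of thumb for status codes seen in the browser's network panel."""
    if status >= 500:
        return "server error"      # something broke on the server side
    if status >= 400:
        return "broken reference"  # the file is missing or forbidden
    if status >= 300:
        return "redirect"          # works, but costs an extra round trip
    return "ok"

needs_attention(404)  # "broken reference"
needs_attention(503)  # "server error"
```

The 400 and 500 series are the ones to fix first; chasing down 300-series redirects is the polish that comes after.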
Wrong status codes are sent – the website is lying
It is good manners to answer honestly and concretely to questions. Unfortunately, this is not always prioritised when we let computers communicate with each other.
Practice for HTTP (the protocol that transports web pages over the internet) is that you speak openly about whether something went well (200 OK), whether the material has moved (301 redirect), disappeared (410 Gone) or similar. For the less serious, there is even the status code 418: I'm a teapot if your smart teapot needs to communicate over the internet :)
HTTP status codes for machines to understand each other
For the web to work optimally and machines to understand each other, you need to follow these established rules for communication. For the sake of non-human users, like Google's crawlers and other bots, you shall follow HTTP's status codes, period. Otherwise they cannot know if an address or page has ceased to exist, never existed or simply cannot be found. Machines cannot read and understand text – nor do they need to, if you just send the right number, the right status code.
If you for example send status code 301 or 302, bots are instructed that the address has been permanently or temporarily moved to another address. Not at all difficult, really.
The biggest deviations in this area are how one chooses to report that something went wrong. During 2015 I checked how Sweden's municipalities handled 404 error messages. The test involved evaluating whether they sent 404 as a message on an obviously incorrect address. 6.2 percent of the 290 municipalities sent “200 OK” instead of “404 Not Found”. They pretend that no error occurred.
How big a problem this is depends somewhat on what content you display alongside your 200 OK. Presenting your start page instead of an error message is a naive form of goodwill, but still quite common. You then end up with very many alternative addresses to your start page, which in an SEO context can be punished, depending on whether you believe in penalties for duplicate content (and have forgotten to set a canonical URL in the page's source code).
How to find faulty error handling
One thing you can do is manually type in an incorrect address and see what is presented. To check what status code the server gives, you can for example check in the network panel that often comes with modern desktop browsers. If you want full control over HTTP, there is the Firefox add-on Tamper Data78.
Also think about whether there are other systems contributing to the website. For example, larger organisations often have specialised systems for document management, media file management, and more, and then these also need to be tested.
The website can be crawled and indexed
A website that is not technically accessible can be exactly what you are after, but that rarely applies to a public website. There, the point is instead that everyone – humans and machines alike – should be welcome to the majority of the content.
Obstacles come in various kinds: on the one hand the soft variant, where you ask to be left alone or instruct bots not to include the page in their index; on the other, the very high barrier of requiring authorisation.
Lower barriers that can be problematic
There are several less dramatic ways to try to tell machines how you wish your website to be treated. A classic that controls the entire website's settings is to have a file called robots.txt placed in the website's root directory. In robots.txt you usually specify which folders search engines and other machines should leave alone, where the website's sitemap is placed, and more.
Whether a page should be included in a search engine's index can also be controlled in detail through the page's own metadata in the HEAD tag. There you can also place metadata that specifies the page's canonical relationship within the website, i.e. whether the page is a variant of another. If so, the other, more important page's URL is specified as the canonical URL. It is a way to have multiple addresses to potentially the same content, but to specify which the main address is.
It is also possible on links to have instructions about how machines should behave, such as that links do not need to be followed. This is done through the attribute rel="nofollow" on each link.
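Python's standard library can read robots.txt rules for you, which is handy for checking how your own file will be interpreted. The rules below are a made-up example for a hypothetical site:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents, parsed directly from lines
# (no network access needed for this kind of check).
rp = RobotFileParser()
rp.parse([
    "User-agent: *",
    "Disallow: /private/",
    "Sitemap: https://example.com/sitemap.xml",
])

rp.can_fetch("*", "https://example.com/private/report.html")  # False
rp.can_fetch("*", "https://example.com/about/")               # True
```

A quick check like this can save you from the embarrassing discovery that a stray Disallow line has been keeping search engines away from your entire website.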
Higher barriers for web access
Then there are the stricter ways to lock out machines, which is to block them or require login for access. You have surely experienced getting a login dialog when browsing the web, but you cannot always count on even getting the chance to try to log in. Sometimes you are blocked for not being within the approved network, not having the right IP address, or whatever requirements someone has set up.
Tools for discovering this problem
A handy tool is SEO Doctor79 as an add-on for Firefox. If the page cannot be indexed, a red warning symbol is shown where you have placed the add-on in your browser. Another way to do spot checks is Pingdom's Full Page Test80.
If you are looking for authorisation problems, it is among others HTTP's status codes 401 Unauthorized and 403 Forbidden you are after. It is not impossible that this is logged on your web server, but if your website is on a web host, you may not have access to this log.
Links to pages that do not exist
Broken links are among the most common quality problems on the web. They occur when pages are moved, renamed, or deleted without updating the links that point to them. Every broken link is a dead end for both users and search engines.
Start checking outbound links
Your outbound links to other websites can also break when those sites change. Regularly checking that your external links still work is a hygiene factor worth maintaining.
The ambitious and automated check
For the ambitious, there are tools that crawl your entire website and report all broken links. The SEO Toolkit82 for Windows Server can do this, as can various online services and browser extensions.
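If you want to build your own spot check, the first step is collecting all link targets from a page; each target can then be requested and its status code inspected. A minimal sketch using only Python's standard library (the sample HTML is made up):

```python
from html.parser import HTMLParser

class LinkCollector(HTMLParser):
    """Collect href targets from a page so each can be checked for 404s."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

collector = LinkCollector()
collector.feed('<p><a href="/about/">About</a> '
               '<a href="http://example.com/gone">Old</a></p>')
# collector.links == ["/about/", "http://example.com/gone"]
```

From there, requesting each collected URL and flagging anything in the 400 or 500 range gives you a homemade version of what the crawling tools do at scale.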
The server returns error messages in the 500 range
A 500-series error means the server itself has a problem. Unlike a 404 where the server simply cannot find what was requested, a 500 error means something went wrong on the server side. This can be a temporary issue or a persistent bug.
When can you expect HTTP status 500?
It can happen during deployments, when a database is overloaded, when a dependency fails, or when there is a bug in the code. Monitoring for 500 errors should be part of your routine.
Overload attack
A special case is when your server is under a DDoS (Distributed Denial of Service) attack or simply receiving more traffic than it can handle. Performance testing (load testing) beforehand can help you know your limits.
Tools for keeping track of 500 Internal Server Error
Pingdom and similar uptime monitoring services can alert you when your website returns 500 errors. Your server logs are also a primary source for investigating these issues.
Good and appropriately long page titles
The page title (the <title> tag) is one of the most important pieces of metadata on a web page. It appears in browser tabs, bookmarks, and most importantly in search engine results. A good title is clear, descriptive and of appropriate length.
Why the page title is so important
It is the first thing users see in search results and browser tabs. It should describe the page's content and be enticing enough to click. Keep it under 60–70 characters for it to display fully in search results. Have the most important keywords early in the title.
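A length check like this is trivial to automate. A sketch, using the lower end of the 60–70 character range mentioned above as the limit:

```python
def title_ok(title, limit=60):
    """Rough check: titles longer than about 60 characters risk being
    truncated in search engine results."""
    return len(title) <= limit

title_ok("Opening hours during the holidays | Example Municipality")  # True
```

Run over all your page titles, a function like this quickly surfaces the pages that need a shorter, sharper title.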
Reasonable file sizes
The total weight of a page view matters enormously for performance. A performance budget typically specifies a maximum total weight, and each component – HTML, CSS, JavaScript, images – should be kept as lean as possible.
This is when you need a performance budget
A performance budget gives you a concrete framework. If your budget is 400 KB total per page view, you need to balance how much goes to styling, scripts, images and content.
A challenge to specify acceptable speed in your performance budget
Speed depends on many factors beyond file size: server response time, network latency, rendering, etc. But file size is the one thing you have most control over as a content creator or developer.
Calculate the impact of your performance budget without moving
There are calculators and tools that let you estimate how long it takes to download a page of a certain size on various connection speeds. This can help you set realistic limits.
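The core calculation behind such tools is simple enough to sketch yourself. Note that this gives only the transfer-time floor; latency, DNS lookups and server response time all come on top:

```python
def download_seconds(kilobytes, mbit_per_s):
    """Pure transfer time for a payload of the given weight at a given
    line speed. Latency and server response time are not included."""
    bits = kilobytes * 1024 * 8
    return bits / (mbit_per_s * 1_000_000)

# A 400 KB page on a 2 Mbit/s mobile connection: ~1.6 s of pure transfer.
budget_time = download_seconds(400, 2.0)
```

Plugging your own budget and your visitors' typical connection speeds into this formula makes the trade-off concrete before you commit to a limit.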
File sizes in the design
Much of a page's weight comes from the design: CSS, JavaScript, and decorative images. These should be optimised first, before you start restricting editorial content like photographs.
No duplicate content
Duplicate content confuses search engines and dilutes your link equity. Each page should have unique content, or at least a canonical URL pointing to the primary version.
Everything should be as unique as possible
This applies to page titles, descriptions, body text and URLs. Having the same content accessible at multiple URLs is a common problem, especially with CMS-generated pages.
Get help from Google for some overview
Google Search Console can show you pages it considers duplicates. This is a good starting point for cleanup.
Tools for finding copies of material
Beyond Search Console, tools like Siteimprove and various SEO crawlers can identify duplicate content across your website.
An appropriate number of main headings
In HTML, headings are hierarchical: H1, H2, H3, etc. The H1 is the main heading and ideally each page should have exactly one. It tells both users and search engines what the page is primarily about.
OK, how do I do it then?
If your website follows the HTML5 standard, it is still recommended that you keep the number of main headings limited, preferably to one. Why? Because the page likely has an internal structure with one heading that is more important than the others, and that ranking should be apparent to search engines and other bots. Heading size is not only about typography and text size; it is also your way of weighting the heading's relevance for the page.
Are you building a start page or landing page? By all means have a main heading for each section with its own call to action, but then you should also encapsulate them in their own sections. This is not a free pass from following best practice, or having a semantic order in your code. Machines cannot read and understand the content, which also applies to those of your visitors who happen to have a disability. It also applies to me when I am tired, irritated, light-sensitive, cognitively impaired and hungover the day after Midsummer.
But do not forget to have a main heading – at least one you should have!
Besides being able to review the code manually, you can install browser extensions. The browser extension SEO Doctor for Firefox complains if you lack a main heading. SEO Doctor also complains if the web page lacks a subheading, which also does not hurt to have if you care about SEO.
Have an (appropriately long) description text
The description text is sometimes called a meta description. This is because it is placed among a web page's other overarching metadata in the code's head, what is in the <head> if you choose to view the source code behind a web page.
The description text should very briefly describe the page's purpose and content, in natural language. Every important page on your website should have this type of hidden information. Or well, it is not actually that hidden: sometimes it is displayed under the page's title on search engines' results pages. It is there to be used when needed.
Length and design of description texts
The description text should be unique. This is because you should be able to distinguish between description texts on one and the same website.
Length-wise, you should keep it under 160 characters (count spaces too) if you want everything included, but as always with text, you should have the most important first. At the same time, you should keep it above 80 characters for it to be useful. I am not aware of any lower limit for when search engines ignore the description text. It may be worth monitoring these figures regularly given the increasing flora of devices we use to access the web.
Regarding how to formulate the text, you can follow communicative principles like AIDA (Attention-Interest-Desire-Action) or KISS (Keep it simple, stupid). You obviously cannot write however you like. If you are not used to expressing yourself in writing, it is good to find a formula that works for you.
When it comes to tools, Google Search Console is probably the best. There is a view for evaluating description texts, including whether there are duplicates on a website.
Media should have alternative texts
You have surely heard that you should have alt texts for images on the web? Otherwise you make life difficult for those who cannot see. And it is not only the visually impaired who cannot see; search engines suffer from the same problem. Although software is beginning to be able to “see” what images depict, it has a long way to go before it understands content the way humans do.
You have probably figured out that this does not only apply to images. Audio and video have the same problem, and here another disability enters the picture: absent or reduced hearing. Here too, the ambition not to exclude anyone is complemented by what makes for good search engine optimisation. If the content of a video is subtitled for the deaf, the text can be translated and displayed so that even more people can access it.
We all have a disability at some point. Imagine yourself sitting in a quiet carriage on the train and you have forgotten your headphones at home. Then you are glad someone has subtitled for you.
Best practice for captioning media files
The first rule is not to caption things that are irrelevant or trivial in the context; it will only disturb. Then you should rather choose an empty alt text, which should not be confused with not specifying the alt text attribute at all. By setting for example alt="" on an image's HTML tag, you tell the recipient that there is nothing of value to describe.
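In HTML the distinction looks like this (file names are made up):

```html
<!-- Meaningful image: describe what it shows -->
<img src="ceo-speech.jpg" alt="Our CEO giving the opening speech at the 2015 annual meeting">

<!-- Decorative image: an empty alt tells assistive technology to skip it -->
<img src="divider.png" alt="">

<!-- Missing alt attribute: screen readers may fall back to the file name - avoid -->
<img src="divider.png">
```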
If you upload video to YouTube, there is a built-in captioning tool. If you run your own tool for streaming audio or video, this is something you need to consider.
When it comes to infographics, it becomes more difficult. Then you may need to provide your data as an alternative for those who cannot see. The data needs to be structured so it can be processed by machines – that is exactly what tables are for on the web. At the same time, be aware that a table cannot always convey the content if the data is too complex. In that case you can choose to caption the conclusion drawn from the data instead.
Instead of publishing charts as images, many choose to publish a table and let progressive enhancement replace the table with an illustrative chart. There is a large number of JavaScript frameworks for these needs.
An appropriate number of links per page
How many links are appropriate for a single page on your website? Well, it is hard to put an exact number on it and it is not about an absolute limit. First, perhaps we need to reason a bit about why we should care at all.
All links on a page should be absolutely necessary. Why? Because choosing among the options you provide burdens your visitor's attention. If there are masses to choose from, it is no longer obvious which option is most important in the context.
One approach is to set a limit on what is okay regardless of what type of page it is. In some tools I have used, warnings have been issued if you have over 200 links on a subpage, regardless of whether they point to pages within the website or externally.
Link juice = not everything can be most important, always
Think of each page as having 10 trust points to distribute to other pages through what it links to. If you link to 5 pages, they get 2 points each; if you link to 100, they get a tenth of a point each.
Something those interested in SEO work with is trying to balance the proportion of outbound links, i.e. links that lead away from a website. It is both a signal to the search engine and the user to what extent you refer away from your own website. Having a good balance of internal and external distribution of link juice is worth monitoring and checking how the market leaders in your niche do it. In brief: Is the point of a certain subpage to send people away, or is there internal material to offer? Follow up what users choose to do.
Many internal links indicate a messy structure, and masses of them also dilute the internal link structure. If you run WordPress, a self-inflicted belly flop is not uncommon: many spam their own blog posts with lots of tags. The more tags, the more links and associations – broad grouping is what the categories are for. Hold back, quite simply.
Using rel="nofollow" on external links
Specifically for external links, you can set the attribute nofollow. It tells search engines not to follow the link and is a way of saying you do not vouch for that link. You can at the same time ask yourself why you have a link you do not want to take responsibility for. This is a pragmatic standard that has emerged, for example, for links that arise due to user contributions on websites. It is a way not to reward spammers and an attempt to be selective with where you leave your link juice.
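In the markup it is a single attribute on the link (the address is just an example):

```html
<!-- A link from a user comment that we do not vouch for -->
<a href="http://example.com/" rel="nofollow">the commenter's website</a>
```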
Web pages need inbound links
A page that is difficult to discover as a regular user, yet quietly promoted, is a bit suspect. Perhaps you do not know whether this happens on your website too? The pages you have in your sitemap – a file primarily used by search engines – can turn out to be junk pages.
I often find examples on hierarchically built websites, something a CMS like Episerver CMS produces automatically. There you can find pages called “right column” that lack content, or “main menu” that is used as a folder in the tool for the web editor's sake. If neither your own nor anyone else's website links to a certain web page, it is definitely a signal that the page lacks value. Sometimes that is true – and if the page is meaningless, it should not be there confusing search engines and users either.
If the page is not unimportant, it is time to link to it yourself at the very least, or to try to drum up some inbound links from someone else.
Beware of internal redirect chains
Those of you who have read the book Webbstrategi för alla have probably understood how important I think it is to take care of your old or previously established addresses on the web. Over time, however, the problem can arise that one redirect points to yet another redirect. Then you waste the web server's capacity and the user's patience.
Your users are impatient – at least on the absolute majority of websites. If your web pages do not load in under 0.1 seconds, you belong to the category that could improve the flow of the perceived interaction. It takes time to redirect a user from one address to another – time that is quite unnecessary, or at least nothing a visitor benefits from.
Each redirect often takes at least a tenth of a second, but up to a second is not uncommon, depending on the latency between the visitor and the server. In extreme cases, I have personally seen response times of several seconds – then the number of redirects would be multiplied by several seconds. Time that is completely wasted for all parties.
No, this is not an absurd scenario. In the spring of 2015, my own employer had a newly launched website with only one redirect fewer than this for those who wanted to visit the start page. After sufficiently many redirects, you even start to be penalised in SEO terms. According to reports, the specific redirect from HTTP to HTTPS is not something Google penalises; for that particular redirect we seem to have a free pass83.
How do you do it then?
Make sure to keep track of which redirects exist. If you have access to logs, it is HTTP's status codes 301 and 302 you are looking for. 301 signals a permanent redirect: it tells search engines and other machines that they can forget the old address and commit to the new one. 302 signals a temporary redirect; use that distinction wisely.
Avoid more than one redirect: a redirect should never send the user on to yet another redirect. Redirect immediately to the final address, with the correct status code according to the HTTP specification.
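Counting the hops is easy once you have the chain of responses, for example reconstructed from your server logs. A sketch, not a real crawler – the function name and the tuple format are illustrative:

```python
# Count the redirects in a recorded chain of HTTP responses.
# Each hop is a (status_code, next_location) pair.
def redirect_chain_length(hops):
    """Number of 301/302 hops before the final response."""
    count = 0
    for status, _location in hops:
        if status in (301, 302):
            count += 1
        else:
            break  # final, non-redirect response reached
    return count

chain = [
    (301, "https://example.se/"),      # http -> https
    (302, "https://www.example.se/"),  # https -> www
    (200, None),                       # final page
]
print(redirect_chain_length(chain))  # -> 2, one redirect too many
```

Anything above 1 in that output is a chain worth collapsing into a single redirect to the final address.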
Do not specify keywords – unnecessarily
This question is quite controversial in some contexts. But you can at least ask yourself for whose sake you specify keywords, or meta keywords as they are sometimes called. If Google ever cared about these words, they stopped a long time ago. Today the theory is rather that the presence of keywords signals a clumsy attempt to spam search engines.
Sometimes the organisation's editorial manual says you should enter keywords. Yes, it does where I work too. There are possibly two reasons for this: one is that you have your own search engine that benefits from these words. The other, somewhat less flattering reason is that the knowledge dates from the happy days of the nineties, when it was almost standard practice to trick search engines with completely irrelevant keywords.
Your own search engine can actually benefit from these keywords. The reason this can work internally is that you have greater reason to trust your own editors; your own search engine only searches material you yourself have published.
OK, but our own search engine covers our external web…
This is where the real problem arises. If you benefit from specifying keywords for your own search engine's sake, by all means do so, but do not believe it benefits Google or the other search engines. And if you do specify keywords, for goodness' sake make sure the words are unique and relevant so you do not spam yourself.
The rule of thumb for what in external search engine optimisation is worth spending time on is that it should be complicated to cheat, and that original material should be rewarded. It is extremely easy to cheat on a massive scale with keywords. That is why they are not used by external search engines.
Appropriate depth in the website
A highly unscientific rule of thumb you can follow is that not all users have the patience to look more than three levels deep in your website to find what they are seeking. This is partly because many do not feel they have time to spend on a difficult-to-navigate website, but also because users do not know whether it is worth the effort. We cannot count on particularly many of our users being motivated enough to make the effort. If, however, we design the navigation with the user in mind, there is actually evidence that four levels work better than three, among other places in Jakob Nielsen's book Prioritizing Web Usability84.
Why this is a problem
I constantly get signals that navigation is not always carefully designed for users' sake. If you believe in fate: the same day I was editing this part of the book, an email dropped in explaining that “the web editors find their pages easily in this flat structure”. The question is whether the website's navigation is there for the web editors' sake?
Avoid more than three levels of depth
As a guideline, try to keep important content within three clicks from the start page. If users need to click more than that, evaluate whether the navigation structure can be improved.
You have limited content to convince with
Each click deeper into the website is an opportunity to lose the user. Make sure each level provides enough value and clarity to keep them moving forward.
Tools for checking navigation structure
Those who do content audits in the style that Kristina Halvorson recommends in her excellent book Content Strategy for the Web85 already have this structure in their content documentation.
If you have Matomo rather than Google Analytics, the website's depth is available as a tree structure to explore. It can take a while to explore, and remember that what you see is the structure of the addresses, not whether something is linked directly from a high level in the structure. If you run Google Analytics, you find the corresponding information under Behavior → Site Content → Content Drilldown. There you see the website's grouping, provided the URLs have directory paths in them.
Crawling tools like Screaming Frog or the SEO Toolkit can also show you the depth of your website's page structure and help identify pages buried too deep.
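Judging depth from the address structure, the way these reports do, is simple to reproduce yourself. A rough sketch – it says nothing about how the page is actually linked in the navigation:

```python
from urllib.parse import urlparse

# Depth judged purely by the URL's path segments, mirroring how
# drilldown reports group addresses. Illustrative sketch only.
def url_depth(url):
    """Number of path segments in the URL."""
    path = urlparse(url).path
    return len([segment for segment in path.split("/") if segment])

print(url_depth("https://example.se/products/tools/hammers/claw-hammer"))  # -> 4
```

Feed it your sitemap's addresses and flag everything deeper than three or four levels for review.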
Clear link texts
Link texts (anchor texts) should describe what the user will find when they click. “Click here” and “Read more” tell the user nothing about the destination. Good link texts are descriptive and make sense out of context – important for accessibility and SEO alike.
Practice with heavy files
If you link to a large file like a PDF, indicate the file type and size in the link text or nearby. Users on mobile connections deserve to know what they are about to download.
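For example (file name and size made up):

```html
<a href="/docs/annual-report-2015.pdf">Annual report 2015 (PDF, 2.4 MB)</a>
```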
Search engine optimisation and link texts
Search engines use link texts to understand what the linked page is about. Descriptive link texts therefore contribute to better indexing of your content.
Good distribution of link juice
Link juice (or link equity) is the concept that links pass authority from one page to another. A thoughtful internal linking strategy ensures that your most important pages receive the most equity. Avoid spreading it too thin across hundreds of links.
Tools for finding spilled link juice
SEO tools like Ahrefs, Moz, or Screaming Frog can help you visualise your internal link structure and find opportunities to improve the distribution of link equity.
Structural CSS included in the HTML code
Critical path CSS (or above-the-fold CSS) is the CSS needed to render what the user first sees. By inlining this CSS directly in the HTML, you eliminate a render-blocking request. The rest of the CSS can be loaded asynchronously.
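A sketch of the pattern, assuming the full stylesheet lives at /css/site.css; the preload trick shown is one common way to load the remaining CSS without blocking rendering:

```html
<head>
  <!-- Inline only the CSS needed for above-the-fold content -->
  <style>
    body { margin: 0; font-family: sans-serif; }
    .hero { min-height: 60vh; }
  </style>
  <!-- Load the full stylesheet asynchronously -->
  <link rel="preload" href="/css/site.css" as="style" onload="this.rel='stylesheet'">
  <noscript><link rel="stylesheet" href="/css/site.css"></noscript>
</head>
```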
Anecdote from March 2015: developers did not believe their eyes
When we first implemented critical path CSS at my workplace, the developers were astounded by the difference in perceived performance. The page appeared to load almost instantly, even though the total load time was barely changed. The key was that the user saw meaningful content much sooner.
Tools for critical path CSS
There is a tool called Critical Path CSS Generator86 that can extract the CSS needed for above-the-fold content. For those working with build tools, there are npm packages that automate this as part of the build process.
Do not open links in a new window
Opening links in a new window or tab is one of those things that seems helpful but actually causes problems. It breaks the back button, which is a fundamental navigation principle on the web.
New window conflicts with the accessibility guideline WCAG's success criteria
WCAG success criterion 3.2.587 and technique G20088 specify that opening new windows should only be done when necessary and that users should be warned in advance. Many users, especially those with cognitive or visual impairments, become confused when a new window opens unexpectedly.
New window is a security risk
There is a well-documented vulnerability89 with target="_blank" that can be exploited for phishing. The opened page gains access to the opener page's window object, potentially allowing it to redirect the original page to a malicious URL. This can be mitigated with rel="noopener", but the simplest solution is not to open new windows in the first place.
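If you nonetheless must open a new window, the mitigation is a single attribute:

```html
<!-- rel="noopener" severs the new page's access to window.opener -->
<a href="https://example.com/" target="_blank" rel="noopener noreferrer">External site</a>
```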
Do not publish PDF files
PDF is not only bad; the format has its merits. Among other things, it is a sensible format for archiving, specifically the variant PDF/A, but also an excellent format if you want to send something for printing to someone else. PDF means you can refer to the last line on page two and everyone is looking at the same thing. You know the content will be printed the same way.
Responsive web is not compatible with PDF files
But PDF is not a format for the web. The same challenge we addressed by building responsive websites is what makes PDF files less suitable. Orthodox responsive theory requires that you stop thinking in terms of a canvas or screen size, and that you do not assume a certain type of user. A PDF is the archetype of a canvas: the whole point is content with a fixed position within a given, measured area. A canvas, quite simply.
It is very unfortunate if you have a responsive website and users nonetheless end up on PDF files or other documents that are unsuitable for anything other than possibly downloading to a computer. This of course also applies to Word documents and everything else that is not real web.
If your website is on the public web (not an intranet), you can try formulating a search query on Google similar to the one below:
site:mysite.se filetype:pdf
Do bear in mind that on some organisations' websites, it is not the main domain that serves documents and PDF files. At my workplace, it would have been alfresco.vgregion.se as the domain.
Provide a sitemap
A sitemap is the technical equivalent of a site map – that is, a list of all pages on a website. Sitemaps are an industry standard90 developed by search engine companies like Google and Bing, but you can very well use your sitemap for other things than sending it to search engines.
Among other things, your own search engine can benefit from it, and those you collaborate with may want to monitor when new material appears on your website. The sitemap lists the addresses that exist on the website, typically with the newest on top, and you can also add a weighting for each address's relative value within the website.
The sitemap should preferably be called something predictable, like sitemap.xml, and be placed in the website's root directory. Alternatively, you specify in your robots.txt where the sitemap can be found. If you have multiple sitemaps, you need a so-called sitemap index, which is a list of the sitemaps you have. This may be needed if you have an enormous number of entries, as you are not allowed to have an infinitely large sitemap.
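A minimal sitemap following the standard can look like this (addresses, dates and weightings are made up):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://example.se/</loc>
    <lastmod>2015-06-01</lastmod>
    <priority>1.0</priority>
  </url>
  <url>
    <loc>https://example.se/about/</loc>
    <lastmod>2015-05-20</lastmod>
    <priority>0.5</priority>
  </url>
</urlset>
```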
If you run WordPress, you can use WordPress SEO to include what they call XML Sitemaps. Otherwise, in the worst case you develop the function yourself, or if the content almost never changes and you have a smaller website, you might manage to write it by hand.
You can notify Google that you have a sitemap via their Search Console; the same applies to Bing with their Webmaster Tools. There you will find out if there are any errors related to your sitemap.
Have a robots.txt and a humans.txt

- Image 66: Robots.txt on my book blog.
A robots.txt is a text file that sits at the root, the main directory, of a website. It is there to leave instructions for bots that connect to the website. Among other things, you instruct search engines this way about which parts of the website they should not index. It is of course good practice for bots to follow what is in a robots.txt, but the ill-intentioned can also pick up tips this way, so you need to be careful not to be naive.
Besides stating which parts of the website you do not want visited (so-called disallow), it is practice to specify where you have placed your sitemap. If you are not entirely Google-centric, other search engines can get help indexing your website if you list your sitemap in this way. It is, after all, a very small effort.
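A small robots.txt that blocks a directory and points out the sitemap (paths are illustrative):

```
User-agent: *
Disallow: /admin/

Sitemap: https://example.se/sitemap.xml
```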
To double-check that your robots.txt is correctly designed, you can via Google's tool Search Console call the file and have it analysed.
A humans.txt too – for your human readers
Some websites have started having a humans.txt alongside their robots.txt, to write documentation for human readers. It is the web's equivalent of how you often find a readme/readme.txt when checking out other types of software.
If you have something to say to a human reader, do it in the humans.txt file. It can be contact details, instructions for technicians or developers who check the website.
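There is no fixed format, but a humans.txt often looks something like this (all details made up):

```
/* TEAM */
Web editor: Jane Doe
Contact: webmaster [at] example.se

/* SITE */
CMS: WordPress
Standards: HTML5, CSS3
```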
Rounding off
I hope you have found something useful in this book. Feel free to also ask follow-up questions, for example via Twitter.