Link Rot
Wouter from Brain Baking inspired me to check the liveness of the links I link to from this site.
I too based my calculations on _posts instead of _site, which makes for simpler link parsing and cleanup. It also excludes the links to this site itself, since those are generated on each publish and are thus guaranteed to be up to date.
Technically they can break, but all pages are always rebuilt on publish, and the build log would show those as warnings. The exception is legacy redirect aliases, but those were already covered by the Google Webmaster Search Console Tools.
Show me the Details
_posts % grep -rhE "https?:" . > ~/links.txt
That gives me 289 raw results from 116 posts (minus this one).
After removing the cruft before and after the links by replacing .*(http.*)[\)>].* with $1, there were a few placeholder links left, such as here, here or here.
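The grep and cleanup steps above can also be combined into a single pass. The following is only a sketch of an alternative, not the method used for the numbers in this post: the function name `extract_links` and the choice of stop characters are assumptions, and it deduplicates at the same time.

```shell
# extract_links: read text on stdin, print each unique URL once.
# The function name and the stop characters are assumptions for this sketch.
extract_links() {
  # -o prints only the matching part of a line, -E enables "s?" for http/https.
  # A URL is taken to end at whitespace, ")" (Markdown link) or ">" (autolink).
  grep -oE 'https?://[^[:space:])>]+' | LC_ALL=C sort -u
}

# Example, assuming the posts are Markdown files in the current directory:
# cat ./*.md | extract_links > links.txt
```

Bare URLs at the end of a sentence keep their trailing punctuation, and URLs that themselves contain a closing parenthesis get cut short, so a manual look over the result is still worthwhile.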
First I ran curl without the follow-redirects option, to capture the redirect status codes.
for url in $(cat links.txt)
do
curl -sI -o /dev/null -w "%{url}: %{http_code}\n" "$url"
done > curl.txt
While executing this, two domains could not even be resolved or connected to: www.saltstack.com, which they forgot to redirect after the VMware acquisition, and nitter.net, for obvious reasons.
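To get an overview of such a run, the status codes in the output file can be tallied. This is a small sketch that assumes the line format written by the loop above, with the status code as the last field; the function name `tally_codes` is made up for illustration.

```shell
# tally_codes FILE: count how often each status code appears in FILE.
# Assumes one result per line with the status code as the last field.
tally_codes() {
  awk 'NF { print $NF }' "$1" |      # keep only the status code of each line
    LC_ALL=C sort | uniq -c |        # count identical codes
    awk '{ print $2 ": " $1 }'       # print as "code: count"
}

# Example:
# tally_codes curl.txt
```

Requests that never connected are reported by curl with the code 000, so they show up as their own row in the tally.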
And a second time with the redirects followed to see which links actually show “something” when clicked:
for url in $(cat links.txt)
do
curl -sIL -o /dev/null -w "%{url} -> %{url_effective}: %{http_code}\n" "$url"
done > curl.txt
Results
Of the 289 links, 276 remained after deduplication, and I got a response for 274 of them. Seven requests came back with a 405 because the servers disallow HEAD requests, but those worked after switching to GET.
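The HEAD-then-GET fallback could be automated instead of being done by hand. The sketch below is an assumption about how that might look, not what was actually run: `fetch_status` and `check_url` are hypothetical helper names, and only the curl options already used in this post (plus `-s` to silence the progress meter) are involved.

```shell
# fetch_status METHOD URL: print only the final HTTP status code.
# Hypothetical helper; follows redirects like the second loop above.
fetch_status() {
  if [ "$1" = "HEAD" ]; then
    curl -sIL -o /dev/null -w '%{http_code}' "$2"
  else
    curl -sL -o /dev/null -w '%{http_code}' "$2"
  fi
}

# check_url URL: try HEAD first, repeat with GET if the server answers 405.
check_url() {
  code=$(fetch_status HEAD "$1")
  if [ "$code" = "405" ]; then
    code=$(fetch_status GET "$1")
  fi
  printf '%s: %s\n' "$1" "$code"
}

# Example:
# while IFS= read -r url; do check_url "$url"; done < links.txt > curl.txt
```

GET downloads the full response body, which is why it only serves as the fallback here.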
With redirects followed, this makes a total of 230 working links out of 274, resulting in 16.06% link rot.
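The percentage is simply the share of non-working links among those that responded: (274 - 230) / 274 ≈ 16.06%. As a sketch, the same figure can be derived from a results file, assuming that only 2xx codes count as "working" and that the status code is the last field on each line; both are assumptions, as is the function name `link_rot`.

```shell
# link_rot FILE: print how many links work and the resulting link rot.
# Assumes one result per line with the status code as the last field,
# and treats only 2xx codes as working.
link_rot() {
  awk '
    NF { total++; if ($NF ~ /^2[0-9][0-9]$/) ok++ }
    END {
      if (total)
        printf "%d of %d working, %.2f%% link rot\n", ok, total, 100 * (total - ok) / total
    }
  ' "$1"
}

# Example:
# link_rot curl.txt
```

Lines for hosts that never responded would have to be filtered out first to match the denominator of 274 used above.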
I notice a lot of broken GitHub and Google Play Store links in there. So even if Manton Reece’s theory holds true, only the pages that were not explicitly unpublished remain there:
… of all the new web companies, there are only two that will last 100 years, still hosting our stuff at URLs that don’t change: GitHub and Automattic
But that is still far from the 38% to 66.5% mark.