Converting latin-1 To utf-8 with Python

Tonight I finally converted all the Glossary pages in my mirror of the Jargon File into Unicode (utf-8 encoding) so that they will transmit and display properly from GitHub Pages (or any other modern web server). It was a fairly trivial thing to do in the end, but I am likely to need to repeat this for other things at work, so I'm blogging it.

The Jargon File was converted into XML-DocBook and Unicode for version 4.4.0, but ESR only converted the front- and back-matter, not the Glossary entries (i.e. the actual lexicon). Those are still latin-1 (ISO-8859-1). And although the HTML rendition begins with the correct header declaring this:

<?xml version="1.0" encoding="ISO-8859-1" standalone="no"?>

The pages are actually served from catb.org as Unicode (utf-8). For instance, compare /dev/null on catb.org with my mirror of /dev/null.
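
The core of the conversion fits in a few lines of Python. This is a minimal sketch, not necessarily the script from the full post: it assumes the pages are `*.html` files under one directory (the `glossary` directory in the usage line is hypothetical), that every one of them really is latin-1, and that the only charset declaration needing an update is the XML declaration shown above. The helper names are likewise hypothetical.

```python
import re
from pathlib import Path

# Matches encoding="ISO-8859-1" (either quote style) inside an XML declaration.
XML_LATIN1 = re.compile(r'(<\?xml[^>]*?\bencoding=)(["\'])ISO-8859-1\2', re.IGNORECASE)


def fix_xml_declaration(text: str) -> str:
    """Rewrite a latin-1 XML declaration so that it declares UTF-8 instead."""
    return XML_LATIN1.sub(r"\1\2UTF-8\2", text, count=1)


def convert_file(path: Path) -> None:
    """Re-encode one file from latin-1 to UTF-8, in place."""
    # Every byte 0x00-0xFF is a valid latin-1 character, so decoding never
    # fails, which also means it cannot detect a file that is already UTF-8.
    text = path.read_bytes().decode("latin-1")
    path.write_bytes(fix_xml_declaration(text).encode("utf-8"))


def convert_tree(root: Path, pattern: str = "*.html") -> int:
    """Convert every file under root that matches pattern; return the count."""
    paths = sorted(root.rglob(pattern))
    for path in paths:
        convert_file(path)
    return len(paths)
```

Usage would be something like `convert_tree(Path("glossary"))` on a fresh copy of the files. Because any byte sequence decodes as latin-1, the conversion is not idempotent: running it twice silently double-encodes every accented character, so it should only be run once per file. Any other charset declarations, such as an HTML `meta` tag, would need the same treatment.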

Journald experiments - Testing systemd's logger: speed and buffering

I've been having good-natured arguments at work about whether it's the End of the World that we are at last switching away from Scientific Linux 6 and its System V-style init scripts, to CentOS 7, which uses systemd.

My own opinion is that systemd is pretty cuspy. It's not perfect, but nor is it some great hulking monolithic monster come to destroy the Unix Way in the Linux world. It offers many worthwhile improvements and I've enjoyed using it in openSUSE for years now. I look forward to switching away from the hair-ball of wet SysV init scripts with clumsy precedent semantics and manual service recovery.

Now, I don't want to throw my hat into the ring on the pros and cons of systemd having replaced the start-up infrastructure (and a lot of other systems besides) on Linux-based operating systems. Enough has been said already on that front, by many more experienced than I, and further argument is pointless: whichever camp you're in, you won't be convinced of the other side's point of view by now.

However, there is one argument against systemd that I'm not so sure about: journald and its past issues:

  • alleged buffering of logs, making diagnostics and debugging on time-critical services difficult or impossible
  • binary log files which can be corrupted, and then not useful thereafter (because they're binary)
  • volatile storage, so that your logs are gone when you want them the most: after an unplanned reboot
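
On the volatile-storage point: whether journald keeps logs across a reboot is governed by the `Storage=` key in the `[Journal]` section of `/etc/systemd/journald.conf`. It accepts `volatile`, `persistent`, `auto` and `none`; with the default `auto`, logs are written to disk under `/var/log/journal` only if that directory already exists, and otherwise live in memory under `/run/log/journal`. Here is a simplified check of that rule, assuming only the main config file matters (it ignores drop-ins in `journald.conf.d/` directories); `effective_storage` and `check_host` are hypothetical helper names.

```python
import configparser
from pathlib import Path


def effective_storage(conf_text: str, var_log_journal_exists: bool) -> str:
    """Resolve journald's Storage= setting to where logs actually end up.

    Simplified: reads a single config file and ignores journald.conf.d drop-ins.
    """
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(conf_text)
    # Commented-out lines such as "#Storage=auto" are ignored, so the
    # compiled-in default "auto" applies when the key is absent.
    storage = parser.get("Journal", "Storage", fallback="auto").strip().lower()
    if storage == "auto":
        return "persistent" if var_log_journal_exists else "volatile"
    return storage  # "volatile", "persistent" or "none"


def check_host() -> str:
    """Apply the rule to this machine's journald.conf."""
    conf = Path("/etc/systemd/journald.conf")
    text = conf.read_text() if conf.exists() else ""
    return effective_storage(text, Path("/var/log/journal").is_dir())
```

If `check_host()` reports `volatile`, then either creating `/var/log/journal` or setting `Storage=persistent` is what keeps the logs around for the post-reboot investigation.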

I'll be spending a few days experimenting with journald in these areas, to see if it's as bad now as it was five years ago when concerns like these were being raised.

In this post I want to look at the journald daemon and the journalctl log viewer from the point of view of output buffering: whether and where it could be occurring, and what the implications might be for a web sysop.
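
Before blaming journald for a delay, it's worth remembering that buffering can also happen inside the service itself. Python, for instance, only line-buffers `sys.stdout` when it is a terminal; under systemd a service's standard output is connected to the journal rather than a terminal, so Python block-buffers it and log lines can sit in the process until the buffer fills. Here is a minimal sketch of a hypothetical test service that rules this out by flushing every line (running the interpreter with `-u` or setting `PYTHONUNBUFFERED=1` has a similar effect):

```python
import sys
import time
from datetime import datetime, timezone


def emit(message: str, stream=None) -> str:
    """Write one timestamped line and flush it straight away."""
    stream = sys.stdout if stream is None else stream
    line = f"{datetime.now(timezone.utc).isoformat()} {message}"
    # flush=True pushes the line out of Python's own buffer immediately, so any
    # delay seen in journalctl comes from somewhere further along.
    print(line, file=stream, flush=True)
    return line


def heartbeat(count: int, interval: float = 1.0, stream=None) -> None:
    """Emit `count` numbered ticks, `interval` seconds apart."""
    for i in range(count):
        emit(f"tick {i}", stream)
        if i + 1 < count:
            time.sleep(interval)
```

Running `heartbeat(60)` from such a service and comparing each line's embedded timestamp with the one shown by `journalctl -f -o short-precise` would then show how much delay, if any, is added after the line leaves the process.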

This is a medium-long post, with about 23 minutes of terminal output recordings (in text, using asciinema) and is about 2⅓MB to download. It's also about half-an-hour's read on top of that.

Password databases: setting up password-store on a Unix computer

Having covered what pass is, why I'm using it, and the required supporting tools gnupg, git, ssh and a private git remote, it's time to go over how to put the system together.

Setting it all up on a Unix computer is fairly straightforward. Getting it onto an Android device is a bit different. So in this post I'll cover how the pieces of the system fit together, and then walk through setting it up on Unix.

Synchronising your local password-store git repository with your remote store is done a bit differently depending on whether this is the first time you're setting up the remote, or you already have a remote and wish to merge it into your new local repository. I'll cover that too.
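
The first-time and existing-remote cases differ mainly in whether the remote's history has to be merged in. This sketch only assembles and prints the `pass` commands for each case (pass `dry_run=False` to actually run them); the GPG ID, remote URL and `master` branch name are placeholders, and `setup_commands` and `run` are hypothetical helpers. When merging an existing remote, expect to resolve conflicts, for example if both sides have a `.gpg-id` file with different contents.

```python
from __future__ import annotations

import shlex
import subprocess


def setup_commands(gpg_id: str, remote_url: str, remote_exists: bool,
                   branch: str = "master") -> list[list[str]]:
    """Return, in order, the commands to create a store and link its remote."""
    cmds = [
        ["pass", "init", gpg_id],   # create the store and record the key ID
        ["pass", "git", "init"],    # turn the store into a git repository
        ["pass", "git", "remote", "add", "origin", remote_url],
    ]
    if remote_exists:
        # The remote has its own history, so merge it into the new local one.
        cmds.append(["pass", "git", "pull", "--allow-unrelated-histories",
                     "origin", branch])
    cmds.append(["pass", "git", "push", "--set-upstream", "origin", branch])
    return cmds


def run(cmds: list[list[str]], dry_run: bool = True) -> None:
    """Print each command; also execute it unless dry_run is set."""
    for cmd in cmds:
        print("$", shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)
```

For example, `run(setup_commands("me@example.com", "git@example.com:me/pass-store.git", remote_exists=True))` prints the five commands without executing anything.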

Password databases: from KeePassX to Unix password store

Passwords. We all have a lot of them to remember — most of us have too many. How do you keep track of them all?

Originally I just remembered passwords for everything, like most people. I soon found this doesn't scale past about 7 passwords and PINs. Rather than reuse the same passwords everywhere, I started to keep a secret list of passwords, but it was a pain to keep that list with me, and what if it was discovered?

After keeping my passwords in a GPG-encrypted text file for a few years, I started using a KeePassX database, and that's been pretty successful. I sync the database to my phone so that I have my passwords with me whenever I need them, but it is a little bit clunky to use.

On the recommendation of someone at work, I checked out pass, “the standard Unix password manager”. It offers all the features I've been using from KeePassX for a few years now, only with much better synchronisation based upon git+ssh.

Pass is also integrated into browsers, editors, and even a few operating systems, so it's potentially a lot less clunky and risky to use than how I was using KeePassX with passwords entered via the system clipboard.

This post reviews my password management approaches to date and gives an overview of Pass.
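
Part of what makes pass easy to integrate is how plain its storage is: each entry is a GPG-encrypted file under `~/.password-store` (or `$PASSWORD_STORE_DIR`), named after the entry, and by convention the password is the first line of the decrypted text. As an illustration only, here is a small Python sketch that lists entries the way `pass ls` names them and fetches a password by shelling out to `pass show`; the function names are hypothetical, and it assumes `pass` and a working GPG setup are installed.

```python
from __future__ import annotations

import os
import subprocess
from pathlib import Path


def store_dir() -> Path:
    """Location of the password store; pass honours PASSWORD_STORE_DIR."""
    return Path(os.environ.get("PASSWORD_STORE_DIR", Path.home() / ".password-store"))


def list_entries(root: Path) -> list[str]:
    """Entry names: paths relative to the store, minus the .gpg suffix."""
    names = []
    for path in root.rglob("*.gpg"):
        rel = path.relative_to(root)
        if ".git" in rel.parts:  # skip anything inside the git repository
            continue
        names.append(rel.with_suffix("").as_posix())
    return sorted(names)


def first_line(decrypted: str) -> str:
    """By convention the password is the first line of an entry."""
    lines = decrypted.splitlines()
    return lines[0] if lines else ""


def get_password(name: str) -> str:
    """Decrypt one entry via `pass show` and return its password line."""
    result = subprocess.run(["pass", "show", name],
                            capture_output=True, text=True, check=True)
    return first_line(result.stdout)
```

`list_entries(store_dir())` would return names such as `web/example.com`, each of which can be passed straight to `get_password`.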

Learning DVCS Workflow - 1

Tonight I learned a basic git trick that was not immediately obvious to me, but should have been, I guess.

I've been switching my Spacemacs back to the master branch to try to troubleshoot a performance issue I'm having on the Macintosh, where it just hangs occasionally. My master tracks Spacemacs master, which is still at 0.200.13. I haven't touched it in over a year, and there are some things that I wanted from my develop branch.

I want to merge in the latest version of those few files, but not everything on the branch, so a merge is not the right operation.
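
One common git idiom for this, and not necessarily the exact trick from the full post, is to check out individual paths from the other branch: `git checkout develop -- <paths>` (or, since git 2.23, `git restore --source=develop --staged --worktree -- <paths>`). That replaces just those files in the working tree and index with the branch's versions, ready to commit, without bringing anything else across. A minimal Python wrapper as a sketch; `take_files_from` and the file names in the usage line are hypothetical, and it only builds the command unless `dry_run=False`:

```python
from __future__ import annotations

import subprocess


def take_files_from(branch: str, paths: list[str], repo: str = ".",
                    dry_run: bool = True) -> list[str]:
    """Bring branch's version of paths into the current branch, staged.

    Unlike a merge, nothing else from branch comes across. Any uncommitted
    edits to those paths in the working tree are overwritten.
    """
    cmd = ["git", "-C", repo, "checkout", branch, "--", *paths]
    if not dry_run:
        subprocess.run(cmd, check=True)
    return cmd
```

For example, `take_files_from("develop", ["path/to/file.el"], dry_run=False)` followed by an ordinary `git commit` records just that file's newer version on master.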

4-bit Rules of Computing, Part 4

Here is the fifth part of my blog series expanding on my 4-bit rules of computing.

In this post: rules 7, 8, and 9, which discuss testing and debugging. They are all related in a way: they have to do with producing good-quality craft work. Because, as much as computer people like to believe that we're "engineers" or that this is "computer science", we're not really. We're craftspeople, in a profession that's still very young and still finding its roots and methods for being consistently successful.

I'm definitely not trying to pretend I'm an "engineer". For real rigour, there is much more required than a few simple rules. But these are some realistic and humble rules in the area of testing that I aim to stick to.

Learning DVCS Workflow - 0

If you take a look at the revision history for some of my projects on GitHub, you'll see that I have a fairly messy track record!

A few times I have successfully used Magit's ability to interactively stage/unstage Hunks or parts of Hunks, to make my commits cleaner and more sensible, but not always.

The real problem is: I haven't been using Branches. It's because I haven't studied how to use them effectively, and in the past they've been scary.

But for the future, and especially for my dotfiles, I'd like to be able to read through the commits and make sense of them afterwards. I'd also like to be sure when I'm committing that I don't make any unplanned changes to master and break things.

I also tend to work by myself on these projects, but I'll often go on a tangent, or start a blog post and then put off finishing it while another, different idea is developed and maybe even published first. Ideally I should be able to track these things separately.

So I need to learn: how to do revision control workflow with branches, properly.

4-bit Rules of Computing, Part 3

Here is the fourth part of my blog series expanding on my 4-bit rules of computing.

Previously in Milosophical Me: Mike was reflecting on Comments, both in source code and in social media, and had come to the conclusion that they are to be avoided, that they can be more harmful than helpful, and that source code (and people) should be allowed to speak for themselves.

There is an exception to Rule 5 (Rule 0 allows for this): doc-comments. In this post I explore what they are, how they differ from regular comments, and how to use them to assist your fellow hackers.
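
In Python, for example, the doc-comment is the docstring: a string literal at the top of a function, class or module. Unlike a `#` comment, it is kept on the object at runtime, so `help()`, `inspect.getdoc()` and documentation generators can read it. A small illustration (the `nybble` function is just a hypothetical example):

```python
import inspect


def nybble(n: int) -> int:
    """Return the low 4 bits of n (a value from 0 to 15)."""
    # A regular comment: useful to whoever reads this source file,
    # but not attached to the function object at runtime.
    return n & 0xF


print(inspect.getdoc(nybble))
```

`help(nybble)` shows the docstring too, while the `#` comment inside the body is discarded when the code is compiled and exists only in the source file.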
