Posts about code

Converting latin-1 To utf-8 with Python

Tonight I finally converted all the Glossary pages in my mirror of the Jargon File into Unicode (utf-8 encoding) so that they will transmit and display properly from GitHub Pages (or any other modern web server). It was a fairly trivial thing to do in the end, but I am likely to need to repeat this for other things at work, so I'm blogging it.

The Jargon File was converted into XML-DocBook and Unicode for version 4.4.0, but ESR only converted the front- and back-matter, not the Glossary entries (i.e. the actual lexicon). Those are still latin-1 (ISO-8859-1). And although the HTML rendition begins with the correct header declaring this:

<?xml version="1.0" encoding="ISO-8859-1" standalone="no"?>

the pages are actually served from catb.org as Unicode (utf-8). For instance, compare /dev/null on catb.org with my mirror of /dev/null.
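As a rough illustration of the kind of conversion described above, here is a minimal Python sketch. It is not the script from the full post: the function names `latin1_to_utf8` and `convert_tree`, the `*.html` glob pattern, and the choice to rewrite the XML declaration are all assumptions made for the example.

```python
from pathlib import Path


def latin1_to_utf8(data: bytes) -> bytes:
    """Re-encode ISO-8859-1 bytes as UTF-8 and update the XML declaration.

    Hypothetical helper for illustration only.
    """
    # Every byte value is a valid latin-1 character, so this decode cannot fail.
    text = data.decode("latin-1")
    # Keep the declared encoding in step with the new byte encoding.
    text = text.replace('encoding="ISO-8859-1"', 'encoding="UTF-8"', 1)
    return text.encode("utf-8")


def convert_tree(root: Path, pattern: str = "*.html") -> None:
    """Convert every matching file under root, in place (assumed layout)."""
    for path in root.rglob(pattern):
        path.write_bytes(latin1_to_utf8(path.read_bytes()))
```

Because a latin-1 decode never raises an error, running this twice over the same file would silently double-encode any non-ASCII characters, so it is worth trying it on a copy of the files first.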

Read more…

Journald experiments - Testing systemd's logger: speed and buffering

I've been having good-natured arguments at work about whether it's the End of the World that we are at last switching away from Scientific Linux 6 and its System V-style init scripts, to CentOS 7, which uses systemd.

My own opinion is that systemd is pretty cuspy. It's not perfect, but nor is it some great hulking monolithic monster come to destroy the Unix Way in the Linux world. It offers many worthwhile improvements and I've enjoyed using it in openSUSE for years now. I look forward to switching away from the hair-ball of wet SysV init scripts with clumsy precedence semantics and manual service recovery.

Now, I don't want to throw my hat into the ring on the pros and cons of systemd having replaced the start-up infrastructure (and a lot of other systems besides) on Linux-based operating systems. Enough has been said already on that front, by many more experienced than I, and further argument is pointless: whichever camp you're in, you won't be convinced of the other side's point of view by now.

However, there is one argument against systemd that I'm not so sure about: journald and its past issues:

  • alleged buffering of logs, making diagnostics and debugging on time-critical services difficult or impossible
  • binary log files which can be corrupted, and then not useful thereafter (because they're binary)
  • volatile storage, so that your logs are gone when you want them the most: after an unplanned reboot

I'll be spending a few days experimenting with journald in these areas, to see if it's as bad now as it was five years ago when concerns like these were being raised.

In this post I want to look at the journald daemon / journalctl log viewer a bit, from the point of view of buffering output, whether and where it could be occurring, and what the implications might be as a web sysop.

This is a medium-long post, with about 23 minutes of terminal output recordings (in text, using asciinema) and is about 2⅓MB to download. It's also about half-an-hour's read on top of that.
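One way to probe the buffering question is to have a process stamp each line with the time it was written, then compare that with the time the journal recorded on receipt. The sketch below is only an assumed test harness, not the method used in the full post; the function name `emit` and its parameters are made up for the example.

```python
import sys
import time


def emit(count: int = 5, interval: float = 1.0, flush: bool = True, out=sys.stdout) -> None:
    """Write `count` lines, each carrying the wall-clock time it was written."""
    for i in range(count):
        out.write(f"probe {i} emitted_at={time.time():.6f}\n")
        if flush:
            # Without this, Python block-buffers stdout when it is not a
            # terminal, so lines may reach the logger in one late burst.
            out.flush()
        time.sleep(interval)


if __name__ == "__main__":
    emit()
```

Piping this through `systemd-cat` and then reading the entries back with `journalctl -o short-precise` should show how far the journal's timestamps lag the emitted ones. Running it again with `flush=False` helps separate buffering in the program itself from any buffering in the logger.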

Read more…

4-bit Rules of Computing, Part 4

Here is the fifth part of my blog series expanding on my 4-bit rules of computing.

In this post: rules 7, 8, and 9, which discuss testing and debugging. They are all related in a way: having to do with making good-quality craft work. Because, as much as computer people like to believe that we're "engineers" or that this is "computer science", we're not really. We're craftspeople, in a profession that's still very young and finding its roots and methods in order to be consistently successful.

I'm definitely not trying to pretend I'm an "engineer". For real rigour, there is much more required than a few simple rules. But these are some realistic and humble rules in the area of testing that I aim to stick to.

Read more…

Learning DVCS Workflow - 0

If you take a look at the revision history for some of my projects on GitHub, you'll see that I have a fairly messy track record!

A few times I have successfully used Magit's ability to interactively stage/unstage Hunks or parts of Hunks, to make my commits a bit more clean and sensible, but not always.

The real problem is: I haven't been using Branches. It's because I haven't studied how to use them effectively, and in the past they've been scary.

But for the future, and especially for my dotfiles, I'd like to be able to read through the commits and make sense of them after. Also I'd like to be sure when I'm committing that I don't make any unplanned master changes and break things.

I also tend to work by myself on these projects, but I'll often go on a tangent, or start a blog post and then put off finishing it while another, different idea is developed and maybe even published first. Ideally I should be able to track these things separately.

So I need to learn: how to do revision control workflow with branches, properly.

Read more…

4-bit Rules of Computing, Part 3

Here is the fourth part of my blog series expanding on my 4-bit rules of computing.

Previously in Milosophical Me: Mike was reflecting on Comments, both in source code and in social media, and had come to the conclusion that they are to be avoided, that they can be more harmful than helpful, and that source code (and people) should be allowed to speak for themselves.

There is an exception to Rule 5 (Rule 0 allows for this): doc-comments. In this post I explore what they are, how they differ from regular comments, and how to use them to assist your fellow hackers.

Read more…

Blogging with Emacs Org Mode

My blog is really in need of some love. One of the reasons I'm not posting very frequently (apart from just not having much to say or making time to say it) is that it is such a pain to write posts in WordPress' built-in editor. I briefly considered editing in a text editor and then cutting and pasting, but usually there's a mess to clean up anyway, so it's not much fun either.

Read more…

NetBeans 6.5 and Python

NetBeans 6.5 is out! You can run it with the Nimbus look and feel too! There's also an Early Access plugin for Python. All very nice.


I recently had occasion to play with some Python at work (a small script to do some configurations, and I didn't want to do them in bash), so I took the time to get all of this set up. It's all so very easy and not worth writing about. However, the interactive debugger (which is awesome, btw) has a small issue that needs resolving. In the meantime, here's a work-around.

Read more…

Stripping tags from Ogg Vorbis files

I have a bunch of free Ogg Vorbis audio files that I've downloaded from Kahvi.org. They're great! But recently they've been including cover art within the files, which breaks Windows Media Player (it can't handle the very long tags of binhex-coded JPGs).

Since I rather like WMP's integration in Windows (keyboard shortcuts), and Amarok isn't quite ready for win32, I thought I'd find a way to strip the troublesome tags from the data files rather than change to another player.

Here's a quick-and-dirty shell hack to remove the tags from the files and get them playable by daft players such as Windows Media Player.

Read more…