Friday, October 31, 2014

CrashPlan: Our watch is stuck on 2003

CrashPlan has this cool feature where it watches for new and changed files (and directories) in real time; it immediately notices when there's something new, and schedules it for the next backup.

On Linux, this is done with inotify.  This usually requires a little bit of fiddling with sysctl(8), since the default maximum number of watches allowed per user is typically much lower than the number of entries in the backup file selection.
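A rough way to see whether the limit is likely to be too low is to compare it against the size of the tree being backed up.  The sketch below is only an estimate: it counts every file and directory as one "entry", which is an assumption about how CrashPlan uses its watches, and the function names are made up for illustration.  The /proc path is the file behind the fs.inotify.max_user_watches sysctl.

```python
import os


def count_entries(root):
    """Count directories and files under root (the root itself included)."""
    total = 0
    for _dirpath, _dirnames, filenames in os.walk(root):
        # One for the directory being visited, plus one per file inside it.
        total += 1 + len(filenames)
    return total


def read_watch_limit(path="/proc/sys/fs/inotify/max_user_watches"):
    """Return the per-user inotify watch limit, or None if it can't be read."""
    try:
        with open(path) as handle:
            return int(handle.read().strip())
    except (OSError, ValueError):
        return None


if __name__ == "__main__":
    limit = read_watch_limit()
    entries = count_entries(os.path.expanduser("~"))
    print("entries:", entries, "limit:", limit)
    if limit is not None and entries > limit:
        print("the watch limit is probably too low for this tree")
```

If the estimate exceeds the limit, raising fs.inotify.max_user_watches with sysctl(8) is the fiddling mentioned above.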

However, you may find that this feature still cannot be enabled, no matter how high you raise that maximum number.  Turns out that CrashPlan first checks the kernel version for inotify support (which was introduced in 2.6.13), and cannot parse a version number with only two components.
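For comparison, here is a minimal sketch of a version check that tolerates two-component releases.  It is not CrashPlan's code (which I haven't seen); the function name and the sample release strings are only illustrative.

```python
import re


def kernel_supports_inotify(release):
    """Return True if a uname-style release string is at least 2.6.13."""
    # Keep only the leading dotted run of digits, e.g. "3.16" from
    # "3.16-2-amd64".
    match = re.match(r"\d+(?:\.\d+)*", release)
    if match is None:
        raise ValueError("unrecognized kernel release: %r" % release)
    parts = [int(piece) for piece in match.group(0).split(".")]
    # Pad missing components with zeros, so "3.16" compares as (3, 16, 0).
    parts += [0] * (3 - len(parts))
    return tuple(parts[:3]) >= (2, 6, 13)


if __name__ == "__main__":
    for release in ("2.6.12", "2.6.13", "3.16-2-amd64"):
        print(release, kernel_supports_inotify(release))
```

Comparing tuples of integers avoids both the "how many components" problem and the string-comparison trap where "2.6.9" sorts after "2.6.13".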

Guys, seriously?  Seriously?  It's been more than three years since 3.0 was released.  This should have been fixed a long time ago, and I most certainly shouldn't have to write a goddamn uname library wrapper to make it work.

(sigh)

That being said -- once you've finally got it working, it's really cool, and well worth the effort.

Thursday, October 30, 2014

CrashPlan: Don't forget the umlaut

Good thing that I tested a full restore from my new CrashPlan backup, as I found that something was missing: all filenames containing non-ASCII characters were omitted from the backup!

It turns out that Java is to blame -- at least in part.  Filenames are, after all, strings, and Java treats them as such; any filename returned by a system call (as an array of bytes) is decoded into a String object (as an array of code points) based on the character encoding of the current locale.  The same goes in reverse: any String filename passed as argument to a system call is encoded back.  If all goes well, both operations should be exact opposites, and cancel each other; the string we give to open(2) should be byte-for-byte identical to the one we got from readdir(3).

If, however, the filename is not properly encoded according to the current locale, it may contain sequences of bytes which are invalid, and cannot be converted into a code point.  (This is typically the case with ISO-8859-1 filenames under a UTF-8 locale.)  In that case, the Unicode replacement character (U+FFFD) is used instead -- that's what it's for, after all.  Consequently, the re-encoded filename will not be identical to the original, and will refer to a (most probably) nonexistent file with a weird name.  (The effects can be perplexing at first, such as listed files not really existing.)

If the C locale is in effect (typically because $LANG -- or $LC_ALL or $LC_CTYPE -- was explicitly set to "C", or left undefined, either of which can often be the case for init scripts, or when using sudo), then only ASCII characters are allowed; any filename with non-ASCII characters (be it encoded with UTF-8 or ISO-8859-1) will definitely not work.
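The round trip is easy to reproduce outside of Java.  The sketch below uses Python's "replace" error handler as a stand-in for what the JVM does; it is an analogue under that assumption, not the actual Java code path, and round_trip is a made-up helper name.

```python
def round_trip(raw_name, encoding):
    """Decode a filename's bytes to text, then encode the text back."""
    # Undecodable bytes become U+FFFD; unencodable characters become "?".
    text = raw_name.decode(encoding, errors="replace")
    return text.encode(encoding, errors="replace")


utf8_name = "caf\u00e9".encode("utf-8")         # b'caf\xc3\xa9'
latin1_name = "caf\u00e9".encode("iso-8859-1")  # b'caf\xe9'

# A UTF-8 name under a UTF-8 locale survives unchanged.
print(round_trip(utf8_name, "utf-8") == utf8_name)      # True

# An ISO-8859-1 name under a UTF-8 locale does not: the lone 0xE9 byte
# turns into U+FFFD, which re-encodes as three different bytes.
print(round_trip(latin1_name, "utf-8"))                 # b'caf\xef\xbf\xbd'

# Under an ASCII-only codeset (the C locale), neither name survives.
print(round_trip(utf8_name, "ascii"))                   # b'caf??'
print(round_trip(latin1_name, "ascii"))                 # b'caf?'
```

In each failing case, the name handed back to open(2) is not the one readdir(3) returned, which is how a listed file ends up "not existing".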

CrashPlan actually accounts for all of this, and makes sure to set $LANG to "en_US.UTF-8" if it was previously undefined.  (It also enforces UTF-8 as the current codeset.  If your filesystem is still using a legacy encoding, welcome to the 21st century.)  This ensures that UTF-8 filenames will be properly handled.  Assuming, of course, that en_US.UTF-8 is a valid locale.

That's the catch: on a Debian system, locales are not installed as-is, but rather generated on demand (to save space).  And it's quite possible for en_US.UTF-8 to not have been generated, if another UTF-8 locale is being used in its stead.  In that case, if $LANG was left undefined, CrashPlan's default names a locale that doesn't exist; the process then falls back to the C locale, under which non-ASCII filenames cannot be handled properly.
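Whether a given locale has actually been generated can be checked by simply trying to switch to it.  This is a small sketch using Python's standard locale module; locale_is_available is a hypothetical helper name, and the result depends on the C library (glibc rejects unknown locale names, while some other libcs are more lenient).

```python
import locale


def locale_is_available(name):
    """Return True if the C library accepts `name` as an LC_CTYPE locale."""
    # Calling setlocale() without a second argument only queries the
    # current setting, so it can be restored afterwards.
    saved = locale.setlocale(locale.LC_CTYPE)
    try:
        locale.setlocale(locale.LC_CTYPE, name)
    except locale.Error:
        return False
    else:
        return True
    finally:
        locale.setlocale(locale.LC_CTYPE, saved)


if __name__ == "__main__":
    print("en_US.UTF-8 available:", locale_is_available("en_US.UTF-8"))
```

On the command line, `locale -a` lists the locales that are currently available.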

CrashPlan's fault in all this is quite simple: it does not appear to output any error or warning message in this situation.  Seems like a serious oversight to me.

Setting $LANG to the proper locale in bin/run.conf would do the trick, but according to Code42, this file will be overwritten when upgrading to a new version.  (And unlike that other bug which prevents the client from launching, this one could easily go unnoticed if reintroduced.)  It's probably best to play it safe, and just generate the damn US locale.

Problem solved.

Saturday, October 25, 2014

CrashPlan: Kicking the tires

After a lot of research and reading, I'm pretty much sold on CrashPlan.

I'm currently in the process of uploading my /home partition (only 1.9 days to go!) as part of their free trial.  (Kudos to them for not putting any cap or limit -- you can try it out as much as you want.)  I had heard reports of issues with their upload/download speed, but it's all going as smooth as butter over here.  Once I've run a successful restore dry-run, I'll be another happy customer.

My only non-encryption-related issue so far is that there doesn't seem to be an easy way to remove a single file (or folder) that has already been deleted.  (Being deleted, it no longer shows up in the File Selection list, and therefore cannot be deselected.)  Of course, with the lack of any quota, this is not that much of an issue, but it's still bugging me a bit.

(My thanks to Nelson Minar for his tips on increasing the inotify limit and turning off inbound backups.)

Monday, October 13, 2014

Another day, another segfault

$ CrashPlanDesktop
$ tail -n 18 /opt/crashplan/log/ui_output.log
#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0xc981c81d, pid=17363, tid=4136905536

I'm starting to believe there's a curse on me...

UPDATE: Well, I'll be damned; adding -Dorg.eclipse.swt.browser.DefaultType=mozilla does work.  Apparently, this is an Eclipse bug -- and here I thought that Eclipse was merely an IDE.

UPDATE 2: This issue is actually documented on CrashPlan's website.  I guess I didn't look for it hard enough.

Sunday, October 12, 2014

Reinventing the OfflineIMAP wheel

I just realized that by attempting to hack IDLE support around mbsync, I was basically reinventing OfflineIMAP.  Hurray me.

(I had actually considered OfflineIMAP when I was initially looking for an IMAP sync-er, but most of the comments out there painted it as a clunkier, buggier, unmaintained alternative.  After taking a second look, this doesn't seem to be the case, at least not in the current version.  I guess I'll have to give it a try and see for myself.)

No fate but what we make

If I'd only taken five minutes to look at the damn code, instead of spending the whole evening poking at it with gdb, I would've easily found the missing comma that was causing the segfault from my previous post.  (sigh)

Wednesday, October 8, 2014

SELECT ... FOR UPDATE on absent rows

At work today, I finally tracked down the source of a year-old bug that was causing (rare) intermittent MySQL deadlocks:
BEGIN;
SELECT i FROM t WHERE i = 42 FOR UPDATE;
(0 rows returned)
[...]
INSERT INTO t SET i = 42;
[deadlock]
Huh?  How can this deadlock -- didn't I just get a write lock before?  If not on the record itself (which didn't exist), then at least on the gap where it would be inserted, right?

Turns out that MySQL/InnoDB doesn't acquire an exclusive lock in this case.  It will get a shared lock (on something), though, preventing any concurrent INSERT for that row, but making it possible to deadlock when INSERT requests the proper exclusive lock it requires.  Hilarity ensues.
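To see why two sessions running the same sequence get stuck, here is a toy model of the situation.  It is emphatically not InnoDB's lock manager: it only assumes, as described above, that both SELECTs obtain compatible locks, and that each INSERT must then wait for every other holder to let go.  The transaction names T1 and T2 are made up.

```python
def find_deadlock(waits_for):
    """Return a cycle of transactions waiting on each other, or None."""
    for start in waits_for:
        path, current = [start], waits_for.get(start)
        # Follow the chain of "X waits for Y" until it ends or repeats.
        while current is not None and current not in path:
            path.append(current)
            current = waits_for.get(current)
        if current == start:
            return path
    return None


# Both sessions ran SELECT ... FOR UPDATE on the absent row; the locks
# they got are compatible with each other, so neither one blocked.
holders = ["T1", "T2"]

# Each INSERT then needs exclusive access, so it waits on the other holder.
waits_for = {txn: other for txn in holders
             for other in holders if other != txn}

print(find_deadlock(waits_for))        # ['T1', 'T2']
print(find_deadlock({"T1": "T2"}))     # None: T2 is not waiting on anyone
```

With a single session there is nobody to wait for, which is why the bug only shows up intermittently, when two requests for the same absent row happen to overlap.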

(Despite comments to the contrary in the bug report, I can reproduce this for any value, large or small.)

(Update: This apparently varies from one DBMS to another.)

Tuesday, October 7, 2014

I just can't escape fate

I've spent way too much time futzing around with programming and debugging these past few days.  I really need to settle down for a while, and take care of all the things that I've put aside and are now piling up.  Like, say, my monthly accounting.
$ gnucash
Segmentation fault
(sigh)

(Update:  This turned out to be even more complicated than I thought, involving GCC; filed Debian bug #764510.)

Monday, October 6, 2014

Sicker Happier

Somehow, "I've been feeling under the weather" has turned into "let's copy all my mail under IMAP, switching from my SpamAssassin setup to my provider's, replacing most of procmail with Sieve scripts, fetchmail with mbsync, and converting what little remains to maildrop, skipping Postfix entirely".

And then, "not sleeping enough and feeling much worse" morphed into "instead of running mbsync every 30 seconds, let's write a multi-threaded Python script that IDLEs on each mailbox".

This is fucked up.

(The SpamAssassin switch was worth it, though.)