elegant chaos

The Elegant Chaos Blog

Copilot Thoughts

July 05, 2021

The furore surrounding Github Copilot is interesting.

I’m no lawyer (nor do I play one on TV), but my feeling is that it may expose a flaw in the FLOSS community’s ideas about ownership of code.

If so, this is a good thing. The flaw (if it exists) has not been created by Copilot. It was already there, it just hadn’t come to light.

Is This OK?

Anyone who’s been coding for a while will have come across the situation where you’ve found some code with a license you can’t use, you’ve used the act of reading (or maybe even debugging) the code to teach yourself the solution to the underlying problem, and then you’ve written new code.

Then maybe you’ve felt uneasy and wondered if you’ve broken the rules.

Maybe all you did was cynically copy & paste and change a few variable names - in which case you probably did break the rules. Maybe though you genuinely rewrote it all from scratch. Maybe after rewriting you pretty much ended up with the same code because - well - that’s the best expression of the underlying solution to the problem you’re trying to solve.

Who Owns What, Exactly?

For any such situation, there’s going to be a blurry line. What did I copy here, and what did I create myself? The implementation? The algorithm? The expression of the algorithm in the context of the particular language I’m using? The implementation in the context of the problem I’m applying it to?

Furthermore, how is this process essentially different from the one undertaken by the author of the GPL’d code?

Can I be sure in any way that they themselves weren’t just re-expressing something that has prior art?

Granularity

To look at it another way:

For any sufficiently small fragment of code, there’s likely to be a canonical way to express it. Taken to an extreme, a single line may well be infinitely rewritable, but one formulation is probably clearer, more compact, or better meets your particular criteria than any other.

In most cases it would be self-evidently ridiculous to assert that the GPL license applied to a body of code actually applies to each line in isolation.

If the line includes variable names, function names, comments, or other incidental metadata, it could be argued that they are not directly related to the pure meaning of that line. They do have meaning and value, but probably only in the context in which the line exists.

These names can be replaced, rewritten, or even randomised; this may obfuscate the meaning of the code, but it doesn’t stop it working.

Once you get to a small enough granularity, the same line of code almost definitely exists in countless other programs, both open and closed source, GPL’d or liberally licensed. The names might be different, but the meaning of the code is the same. The machine code instructions emitted by the compiler will probably be the same.
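As a small illustration of the renaming point, here are two made-up Swift functions (both names and the example itself are my own invention, not taken from any real codebase). The logic is identical; only the incidental names differ.

```swift
// Two hypothetical functions: identical logic, different names.

// Descriptive names: the intent is easy to read.
func total(of prices: [Int]) -> Int {
    var sum = 0
    for price in prices { sum += price }
    return sum
}

// The same logic with the names stripped of meaning.
func f(_ a: [Int]) -> Int {
    var b = 0
    for c in a { b += c }
    return b
}
```

Both functions return the same result for the same input; the second is just harder to read.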

What Is Knowledge Anyway?

So what exactly are we arguing about here?

If something like Copilot is taking chunks of GPL’d source and pasting them into someone else’s program, how many contiguous lines does it have to paste before there’s a problem? Is there an arbitrary N number of lines that’s ok, where N + 1 is not ok?

If Copilot applied some natural language processing to infer the context that the lines are pasted into, and then automatically renamed the variables (or even rewrote comments) to use words appropriate to the new context, would that now be ok? Same code - different names?

If it randomised the names and stripped all comments, would that be ok?

Maybe we arrive at some formulation that states that a line is ok, but a whole function is not. Are we then allowed to apply the copying process to each line in turn? Refactor the function into multiple smaller functions and use them?

Known Unknowns

This all feels very woolly to me. The code represents a muddle of knowledge, experience, style, and algorithm.

Any assertion of the right to control how each of these things is used in isolation, or even recombined into a larger whole, feels like over-reach.

Worse, it may well be politically dangerous. If an entity can assert their right to apply copyleft to small fragments of code, doesn’t that logically mean that they are claiming ownership of the underlying meaning of those fragments?

Doesn’t that put us into territory where another entity can assert ownership of the underlying meaning of other fragments and choose to patent them or in other ways suppress their use by others? Isn’t that sort of what the Free/Libre side of the community is trying to avoid in the first place?

Nothing Is Simple

I’m not claiming any great insights here, and certainly not offering any solutions.

It just seems to me that the problem is a lot knottier than some people are making out.

It’s not self-evidently the case that what Github Copilot is doing is breaking the rules, any more than it is clear that what happens when I read someone’s GPL’d code and learn something from it is following the rules…

The Matchable Protocol

April 30, 2021

For the last few years, the default setting for all of the Swift code I write has been open source.

As a result, I’ve accumulated a vast number of Github repositories and Swift Package Manager packages.

However, I’ve been really bad at telling people that they exist!

This post is an attempt to start to fix that, by talking about one small package I’ve recently created: Matchable.

First though, some disclaimers:

Caveat Emptor

Part of the barrier to telling people about things I’ve done is the sheer time it takes to write even quite a simple post like this one.

So my first disclaimer is just to say that this post is mostly a re-hash of the README file from the Github Repository. Nothing wrong with that I think, but just to be clear…

My second disclaimer is that this is work-in-progress code from the real world.

I’ve encountered a few people who subscribe to a fundamentalist view of open source code: that it’s useless unless it is fully polished, fully tested, 100% supported and actively maintained.

I understand this point of view; we’ve all encountered code that makes great claims and turns out to be broken or mostly unfinished.

Respectfully though, those people are wrong.

Imperfect open-source code can be frustrating. However, it can also be a helpful foundation for someone else to build on, a good example of the pros and cons of a particular technique, or a useful supplier of that one crucial line you have been searching the internet for.

Aiming for perfection is setting the barrier way too high. I am as insecure as the next person when it comes to showing my workings in public. I’ve been a professional programmer for more than three decades, but I still suffer from impostor syndrome.

It’s tempting to hide away, but I’m trying to fight the urge, and I’d like to contribute in some small way to an environment where we aren’t scared to risk being wrong.

I offer up all of my open-source code in this spirit. It’s not perfect, because I am busy, and because I am still writing it. I find this code useful, and I hope someone else might. If you do find that it is fundamentally broken, please tell me why. That way I learn something.

That said…

Matchable

The Matchable protocol defines a way to compare two objects, structures or values for equality.

Unlike the Equatable protocol, Matchable works by throwing an error when it encounters a mismatch.

You can view this as an assertion of equality. For this reason, the primary method is named assertMatches.

This makes for compact code since you don’t need to write explicit return statements for every failed comparison.

It also allows the protocol to handle compound structures intelligently.

If a matching check of a structure fails on one of its members, the matchable code will wrap up the error thrown by the member, and throw another error from the structure.

Any catching code can dig down into these compound errors to cleanly report exactly where the mismatch occurred.

Usage

You can check that two values match with:

try x.assertMatches(y)

A sequence of checks can easily be performed – the first failure will throw, causing the remaining checks to be skipped:

try int1.assertMatches(int2)
try double1.assertMatches(double2)
try string1.assertMatches(string2)

A type can implement matching by conforming to the Matchable protocol, and defining the assertMatches method. Inside this method it can perform the necessary checks.

If it finds a failure, it can throw a MatchFailedError to report the mismatch.

Implementations of assertMatches are provided for most of the primitive types, and a few Foundation types (I’ve just done the ones I needed for now - pull requests gratefully received…).
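To make the throw-on-mismatch idea concrete, here is a minimal, self-contained sketch. It does not use the package itself: SketchMatchable and SketchMismatch are hypothetical stand-ins, and the real protocol’s signatures may well differ (the real methods appear to take a context parameter, for instance).

```swift
// Hypothetical stand-ins for illustration; not the package's real definitions.
struct SketchMismatch: Error, CustomStringConvertible {
    let value: String
    let other: String
    var description: String { "\(value) does not match \(other)" }
}

protocol SketchMatchable {
    /// Throws if `other` does not match `self`; returns normally otherwise.
    func assertMatches(_ other: Self) throws
}

// A default implementation for anything that is already Equatable.
extension SketchMatchable where Self: Equatable {
    func assertMatches(_ other: Self) throws {
        if self != other {
            throw SketchMismatch(value: "\(self)", other: "\(other)")
        }
    }
}

extension Int: SketchMatchable {}
extension String: SketchMatchable {}

// Runs a sequence of checks; the first mismatch throws and skips the rest.
func firstMismatch() -> String? {
    do {
        try 1.assertMatches(1)             // passes
        try "apple".assertMatches("pear")  // throws here
        try 2.assertMatches(3)             // never reached
        return nil
    } catch {
        return "\(error)"
    }
}
```

Note that there are no explicit return statements for the failed comparisons; the thrown error does that job.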

Compound Types

Although you can match primitive value types, the protocol comes into its own when performing memberwise matching of compound types (structs, objects, etc).

In this case a type can conform to the MatchableCompound protocol and define the assertContentMatches method.

This works the same way as the basic assertMatches method, except that if a check throws an error inside this method, the error will be wrapped in an outer error reporting that the whole structure failed to match.
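The wrapping behaviour can be sketched roughly like this. Again, this is a standalone illustration rather than the package’s code: LeafMismatch, CompoundMismatch, wrapping, assertEqual and describe are all hypothetical names.

```swift
// Hypothetical stand-ins for illustration; not the package's real definitions.
struct LeafMismatch: Error {
    let message: String
}

// An outer error that records which level failed and wraps the inner error.
struct CompoundMismatch: Error {
    let level: String
    let underlying: Error
}

/// Runs `checks`, wrapping any thrown error in a CompoundMismatch for `name`.
func wrapping(_ name: String, _ checks: () throws -> Void) throws {
    do {
        try checks()
    } catch {
        throw CompoundMismatch(level: name, underlying: error)
    }
}

func assertEqual<T: Equatable>(_ a: T, _ b: T) throws {
    if a != b { throw LeafMismatch(message: "\(a) does not match \(b)") }
}

struct Address {
    var city: String
    func assertMatches(_ other: Address) throws {
        try wrapping("Address") {
            try assertEqual(city, other.city)
        }
    }
}

struct Person {
    var name: String
    var address: Address
    func assertMatches(_ other: Person) throws {
        try wrapping("Person") {
            try assertEqual(name, other.name)
            try address.assertMatches(other.address)
        }
    }
}

/// Digs down through wrapped errors, returning one line per level.
func describe(_ error: Error) -> [String] {
    if let compound = error as? CompoundMismatch {
        return ["in \(compound.level)"] + describe(compound.underlying)
    }
    if let leaf = error as? LeafMismatch {
        return [leaf.message]
    }
    return ["\(error)"]
}

func example() -> [String] {
    let a = Person(name: "Sam", address: Address(city: "London"))
    let b = Person(name: "Sam", address: Address(city: "Paris"))
    do {
        try a.assertMatches(b)
        return []
    } catch {
        return describe(error)
    }
}
```

Calling example() yields one entry per level of nesting, ending with the leaf mismatch, which is the kind of trail that catching code can use to report exactly where things diverged.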

Keypaths

As a convenience, we also define a form of assertMatches which takes a key path or list of key paths, and calls assertMatches on each path of two objects in turn.

This helps to keep the amount of boilerplate code to a minimum.

Here’s an example combining keypaths and the MatchableCompound protocol. This tests the matchability of a structure that has 13 properties, and manages to do it with a minimum of boilerplate.

extension Task: MatchableCompound {
    public func assertContentMatches(_ other: Task, in context: MatchableContext) throws {
        try assertMatches([\.state], of: other)
        try assertMatches([\.name, \.icon, \.details], of: other)
        try assertMatches([\.started], of: other)
        try assertMatches([\.hasDescription, \.hasDuration, \.isScheduled], of: other)
        try assertMatches([\.duration], of: other)
        try assertMatches([\.scheduledHour, \.scheduledMinute], of: other)
        try assertMatches([\.streaks], of: other)
        try assertMatches([\.restDays], of: other)
    }
}

Note that currently if you pass a list of keys, they all have to resolve to members of the same type. Unfortunately this somewhat reduces the helpfulness of this method.
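One plausible way to see where the same-type restriction comes from is the standalone sketch below. It is my own guess at the shape of such a helper, not the package’s implementation: KeyPathMismatch, Item and the free function assertMatches(_:of:and:) are hypothetical.

```swift
// Hypothetical stand-in; the real package's signatures may differ.
struct KeyPathMismatch: Error {
    let index: Int   // position of the failing key path in the list
}

/// Compares `a` and `b` at each key path in turn, throwing on the first difference.
/// In this sketch every key path in the list shares one value type `V`,
/// because the generic parameter is fixed once per call.
func assertMatches<Root, V: Equatable>(_ paths: [KeyPath<Root, V>], of a: Root, and b: Root) throws {
    for (index, path) in paths.enumerated() {
        if a[keyPath: path] != b[keyPath: path] {
            throw KeyPathMismatch(index: index)
        }
    }
}

struct Item {
    var name: String
    var icon: String
    var count: Int
}

func failingIndex() -> Int? {
    let a = Item(name: "tea", icon: "cup", count: 1)
    let b = Item(name: "tea", icon: "mug", count: 2)
    do {
        // String-valued paths can share a list...
        try assertMatches([\Item.name, \Item.icon], of: a, and: b)
        // ...but the Int-valued path needs its own call.
        try assertMatches([\Item.count], of: a, and: b)
        return nil
    } catch let error as KeyPathMismatch {
        return error.index
    } catch {
        return nil
    }
}
```

Here the icon comparison fails first, so the Int check is never reached.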

Unit Testing

The original motivating use-case for this protocol was unit testing, where it’s often necessary to compare two instances of something, and useful to be able to identify the exact point of divergence.

Whilst I still see this as the primary use for the protocol, I have split it out into a standalone package as it may be helpful in other places.

The fact that Matchable is different from Equatable is an advantage for unit testing, as it allows both to co-exist.

In your code, you might define Equatable to only check part of a structure (a unique identifier, for example).

This is good for efficiency in production code, but no use for test code where you really do want to know if all members are equal.

In this situation you can define a thorough check with Matchable, and use that for unit testing, without interfering with the efficient implementation of Equatable.
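As a rough standalone sketch of that split (Record, Mismatch, mismatchedMember and demo are hypothetical names, and the method here is hand-rolled rather than using the package):

```swift
// Hypothetical stand-in error for illustration.
struct Mismatch: Error {
    let member: String
}

struct Record: Equatable {
    let id: Int
    var title: String
    var notes: String

    // Cheap production equality: identity only.
    static func == (lhs: Record, rhs: Record) -> Bool {
        lhs.id == rhs.id
    }

    // Thorough check for tests: every member must agree.
    func assertMatches(_ other: Record) throws {
        if id != other.id { throw Mismatch(member: "id") }
        if title != other.title { throw Mismatch(member: "title") }
        if notes != other.notes { throw Mismatch(member: "notes") }
    }
}

/// Returns the name of the first member that differs, or nil if all match.
func mismatchedMember(_ a: Record, _ b: Record) -> String? {
    do {
        try a.assertMatches(b)
        return nil
    } catch let error as Mismatch {
        return error.member
    } catch {
        return nil
    }
}

func demo() -> (equal: Bool, mismatch: String?) {
    let saved = Record(id: 7, title: "Draft", notes: "")
    let reloaded = Record(id: 7, title: "Draft", notes: "edited")
    return (saved == reloaded, mismatchedMember(saved, reloaded))
}
```

The two records compare as equal via ==, yet the thorough check still pinpoints the notes member as different.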

Initially this protocol was defined as part of my XCTestExtensions package.

That package includes some additions to XCTAssert which use Matchable to let you perform matching checks:

XCTAssert(savedModel, matches: reloadedModel)

This assert method catches any errors and presents them in a nice way by calling XCTFail, identifying the exact point of failure.

Because of the way the match-failure errors are wrapped for compound structures, the method can call XCTFail at all levels of the failure, which results in Xcode showing an error marker at all levels.

This can be helpful when tracking down a mismatch in a deeply nested structure.

Future

This is an early implementation, based on code pulled from elsewhere.

The API probably needs tweaking, and the methods definitely need documenting.

I also intend to explore the idea of using Swift’s introspection to automatically generate assertMatches for structures/classes.

In theory this should work well, but it’s possible that it will hit wrinkles.
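For what it’s worth, a very crude version of the introspection idea might look like the sketch below, using the standard library’s Mirror type. Comparing members by their textual description is purely an assumption of this sketch, chosen to keep it short; ReflectedMismatch, assertReflectedMatch and Settings are hypothetical names.

```swift
// Hypothetical stand-in error for illustration.
struct ReflectedMismatch: Error {
    let member: String
}

/// Compares two values member by member using Mirror.
/// Members are compared via their textual descriptions, which is crude
/// and only meant to show the general shape of the approach.
func assertReflectedMatch<T>(_ a: T, _ b: T) throws {
    let left = Mirror(reflecting: a).children
    let right = Mirror(reflecting: b).children
    for (l, r) in zip(left, right) {
        if String(describing: l.value) != String(describing: r.value) {
            throw ReflectedMismatch(member: l.label ?? "?")
        }
    }
}

struct Settings {
    var volume: Int
    var theme: String
}

func firstDifference() -> String? {
    let a = Settings(volume: 3, theme: "dark")
    let b = Settings(volume: 3, theme: "light")
    do {
        try assertReflectedMatch(a, b)
        return nil
    } catch let error as ReflectedMismatch {
        return error.member
    } catch {
        return nil
    }
}
```

Some of the wrinkles show up even here: zip silently stops at the shorter list of children, and a mirror of a class instance does not list inherited members directly.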

All feedback, suggestions, pull requests and bug reports gratefully received!

Vapor 4 and Session Authentication

May 01, 2020

I can only explain it as lock-down madness, but a couple of weeks ago I decided to have a little play around with Vapor.

What I wanted to do, initially, was just make a simple website that did user authentication. You could register, login, and logout. If you were logged in, it knew who you were. If you were logged out, there were things you couldn’t see.

Now I’m no web developer. Admittedly I did write a WYSIWYG HTML editor in HyperCard, in about 1994, but I’m no web developer.

Ok, I might have also written a complete CMS using HyperCard as a CGI engine for MacHTTP, also around that kind of time, but honestly, I’m no web developer.

If really pressed, I might admit to having had a job creating the first interactive shopping basket for Robert Fripp’s DGM record label’s website in about 1998¹ - a job which I had to learn Perl for² - but if that goes to prove anything, it is that I really am not a web developer.

Still, how hard could it be, right?

¹ I was working at Abbey Road at the time. Yes, that Abbey Road.
² I still feel dirty.

Random Acts of Pragmatism

March 06, 2020

I have been accused (by myself, mostly), of being a bit too much of a purist sometimes. It’s true that I do like things to have an intellectual rigour to them, but it’s mostly about being honest and clear with ourselves about what we’re doing and why. I welcome the application of common sense, and I’m fine with taking shortcuts as long as they’re consciously chosen for a good reason.

I’d like to think that I’m a pragmatist…

Github Actions and Swift

March 05, 2020

Bookish Development Diary, episode 8.

As I mentioned last time, I’ve been playing around with Github Actions, using them to build and test my Swift packages on a number of platforms.

They’re fairly easy to set up - you make a yaml file called something like Tests.yml, add it to the .github/workflows/ directory at the root of your repository, and commit.

The yaml file can contain a vast range of things, but for testing Swift it usually boils down to some fairly standard steps.

First you select which system and tool versions to build on. For the Mac, the macOS-latest image gives you the latest releases of macOS and Xcode. For Linux, there are Docker images available for Swift 5.0 and 5.1, as well as nightly builds of the latest Swift.

Then you clone your package with git.

Next you perform a build, using either swift build or xcodebuild build, depending on the platform you’re on.

Next you run some tests with swift test or xcodebuild test.

There are plenty of other things you can also do (for example posting notifications, uploading files), but a simple file that just builds & tests on the Mac might look something like this:

name: tests
on: [push, pull_request]
jobs:
  macos:
    name: MacOS
    runs-on: macOS-latest
    steps:
    - name: Checkout
      uses: actions/checkout@v1
    - name: Build
      run: swift build -v
    - name: Test
      run: swift test -v -c release

So far so good…
