var foo Type = getFoo()
var err error
var bar Type
bar, err = getBar(foo)
Isn’t the latter more explicit? Isn’t explicit better? It’ll be easier to review
because you’ll be able to see all the types.
Well, yes and no.
For one thing, between the name of the function you’re calling and the name of
the variable you’re assigning to, the type is obvious most of the time, at
least at the high level.
userID, err := store.FindUser(username)
Maybe you don’t know if userID is a UUID or some custom ID type, or even just a
numeric id… but you know what it represents, and the compiler will ensure you
don’t send it to some function that doesn’t take that type or call a method on
it that doesn’t exist.
In an editor, you’ll be able to hover over it to see the type. In a code
review, you may be able to as well, and even if not, you can rely on the
compiler to make sure it’s not being used inappropriately.
One big reason to prefer the inferred type version is that it makes refactoring
a lot easier.
If you write this code everywhere:
var foo FooType = getFoo()
doSomethingWithFoo(foo)
Now if you want to change getFoo to return a Foo2, and doSomethingWithFoo to
take a Foo2, you have to go change every place where these two functions are
called and update the explicitly declared variable type.
But if you used inference:
foo := getFoo()
doSomethingWithFoo(foo)
Now when you change both functions to use Foo2, no other code has to change. And
because it’s statically typed, we can know this is safe, because the compiler
will make sure we can’t use Foo2 inappropriately.
Does this code really care what type getFoo returns, or what type
doSomethingWithFoo takes? No, it just wants to pipe the output of one into the
other. If this shouldn’t work, the type system will stop it.
So, yes, please use the short variable declaration form. Heck, if you look at it
sideways, it even looks kinda like a gopher :=
How to “do” enums is a common problem in Go, given that it doesn’t have “real”
enums like other languages. There are basically two common ways to do it; the
first is just typed strings:
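Something like this, using the FlagName type the rest of the post refers to (the specific flag names are just examples):

type FlagName string

const (
	FooBar   FlagName = "foo-bar"
	FizzBuzz FlagName = "fizz-buzz"
)

func IsEnabled(name FlagName) bool {
	// look the flag up wherever flags are stored...
	return false
}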
The problem with this is that string literals (really, string constants) in Go
will get converted to the correct type, so you’d still be able to call
IsEnabled("foo-bar") without the compiler complaining.
A common replacement is to use numeric constants:
type FlagID int

const (
	FooBar FlagID = iota
	FizzBuzz
)

func IsEnabled(id FlagID) bool {
	// look the flag up wherever flags are stored...
	return false
}
This is nice, because it would be pretty odd to see code like IsEnabled(4).
But the problem then becomes that you can’t easily print out the name of the
enum in logs or errors.
To fix this, someone (Rob Pike?) wrote
stringer, which generates
code to print out the name of the flags… but then you have to remember to run
stringer, and it’s a bunch of (really) ugly code.
The solution to this was something I first heard suggested by Dave Cheney
(because of course it was), and is so simple and effective that I can’t believe
I had never thought of it before. Make FlagName into a very simple struct:
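Something like this:

type FlagName struct {
	name string
}

func (f FlagName) String() string {
	return f.name
}

// the flags are now variables instead of constants
var (
	FooBar   = FlagName{"foo-bar"}
	FizzBuzz = FlagName{"fizz-buzz"}
)

func IsEnabled(name FlagName) bool {
	// look the flag up wherever flags are stored...
	return false
}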
Now, you can’t call IsEnabled("nope"), because the constant string can’t be
converted into a struct, so the compiler would complain.
There’s no size difference between a string and a struct{ string } and it’s
just as easy to read as a straight string. Because of the String() method, you
can pass these values to %s etc in format strings and they’ll print out the
name with no extra code or work.
The one tiny drawback is that the globals have to be variables instead of
constants, but that’s one of those problems that really only exists in the
theoretical realm. I’ve never seen a bug caused by someone overwriting a global
variable like this that was intended to be immutable.
I’ll definitely be using this pattern in my projects going forward. I hope this
helps some folks who are looking to avoid typos and accidental bugs from
stringly typed code in Go.
Error wrapping in Go 1.13 solved a major problem gophers have struggled with since v1: how to add context to errors without obscuring the original error, so that code further up the stack can still programmatically inspect the original error. However, this did not – by itself – solve the other common problems with errors: implementation leakage and (more generally) error handling.
Fragile Error Handling
In 2016, Dave Cheney wrote a blog post that includes a section titled “Assert errors for behaviour, not type”. The gist of the section is that you don’t want code to depend on implementation-specific error types that are returned from a package’s API, because then, if the implementation ever changes, the error handling code will break. Even four and a half years later, and with 1.13’s new wrapping, this can still happen very easily.
For example, say you’re in an HTTP handler, far down the stack in your data layer. You’re trying to open a file and you get an os.ErrNotExist from os.Open. As of 1.13, you can add more context to that error without obscuring the fact that it’s an os.ErrNotExist. Cool, now the consumers of that code get a nicer error message, and if they want, they can check os.IsNotExist(err) and maybe return a 404 to the caller.
Right there, your web handler is now tied to the implementation details of how your backend, maybe 4 levels deep in the stack, stores data. If you decide to change your backend to store data in S3, and it starts returning s3.ObjectNotFound errors, your web handler won’t recognize that error, and won’t know to return 404. This is barely better than matching on the error string.
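Concretely, the handler ends up with a check along these lines (the storage call here is just for illustration):

// somewhere in the HTTP handler, far from the data layer
user, err := storage.LoadUser(id) // hypothetical data-layer call
if os.IsNotExist(err) {
	http.NotFound(w, r)
	return
}
// ... use user ...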
Dave’s Solution - Interfaces
Dave proposes creating errors that fulfill interfaces the code can check for, like this:
type notFound interface {
	NotFound() bool
}

// IsNotFound returns true if err indicates the resource doesn’t exist.
func IsNotFound(err error) bool {
	m, ok := err.(notFound)
	return ok && m.NotFound()
}
Cool, so now you can ensure a consistent API without relying on the implementation-specific type of the error. Callers just need to check for IsNotFound, which could be fulfilled by any type. The problem is, it’s missing a piece. How do you take that os.ErrNotExist and give it a NotFound() method? Well, it’s not super hard, but it is kind of annoying. You need to write this code:
// IsNotFound returns true if err indicates the resource doesn’t exist.
func IsNotFound(err error) bool {
	n, ok := err.(notFound)
	return ok && n.NotFound()
}

// MakeNotFound wraps err in an error that reports true from IsNotFound.
func MakeNotFound(err error) error {
	if err == nil {
		return nil
	}
	return notFoundErr{error: err}
}

type notFound interface {
	NotFound() bool
}

type notFoundErr struct {
	error
}

func (notFoundErr) NotFound() bool {
	return true
}

func (n notFoundErr) Unwrap() error {
	return n.error
}
So now we’re at 28 lines of code and two exported functions. Now what if you want the same for NotAuthorized or some other condition? 28 more lines and two more exported functions. Each just to add one boolean of information onto an error. And that’s the thing… this is purely used for flow control - all it needs to be is booleans.
A Better Way - Flags
At Mattel, we had been following Dave’s method for quite some time, and our errors.go file was growing large and unwieldy. I wanted to make a generic version that didn’t require so much boilerplate, but was still strongly typed, to avoid typos and differences of convention.
After thinking it over for a while, I realized it only took a slight modification of the above code to allow for the functions to take the flag they were looking for, instead of baking it into the name of the function and method. It’s of similar size and complexity to IsNotFound above, and can support expansion of the flags to check, with almost no additional work.
Here’s the code:
// ErrorFlag defines a list of flags you can set on errors.
type ErrorFlag int
const (
	NotFound ErrorFlag = iota + 1
	NotAuthorized
	// etc
)

// Flag wraps err with an error that will return true from HasFlag(err, flag).
func Flag(err error, flag ErrorFlag) error {
	if err == nil {
		return nil
	}
	return flagged{error: err, flag: flag}
}

// HasFlag reports if err has been flagged with the given flag.
func HasFlag(err error, flag ErrorFlag) bool {
	for {
		if f, ok := err.(flagged); ok && f.flag == flag {
			return true
		}
		if err = errors.Unwrap(err); err == nil {
			return false
		}
	}
}

type flagged struct {
	error
	flag ErrorFlag
}

func (f flagged) Unwrap() error {
	return f.error
}
To add a new flag, you add a single line to the list of ErrorFlags and you move on. There are only two exported functions, so the API surface is super easy to understand. It plays well with Go 1.13 error wrapping, so you can still get at the underlying error if you really need to (but you probably won’t and shouldn’t!).
Back to our example: the storage code can now keep its implementation private and flag errors from the backend by returning errors.Flag(err, errors.NotFound). Calling code can check for that with this:
if errors.HasFlag(err, errors.NotFound) {
	// handle not found
}
If the storage code changes what it’s doing and returns a different underlying error, it can still flag it with the NotFound flag, and the consuming code can go on its way without knowing or caring about the difference.
Supporting errors.Is and errors.As
This is an update in 2022: I realized that there’s an easier way to do this that properly supports errors.Is and errors.As. In Go 1.20, there will be an errors.Join function that lets you combine two errors into one, where either one will be found by errors.Is and errors.As. Until then, you can use github.com/natefinch/wrap. Then you can just define the flags as straight errors:
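Something like this (the package and flag names are just for illustration):

package flags

import "errors"

// Each flag is just a sentinel error value.
var (
	NotFound      = errors.New("not found")
	NotAuthorized = errors.New("not authorized")
)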
Then, as long as the package wraps its errors with those flags (using a package like Wrap or the upcoming errors.Join), you can check for the flag with the normal functions:
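For example, with the flags package sketched above:

if errors.Is(err, flags.NotFound) {
	// handle not found
}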
The nice thing is that you can still get the behavior of the original error because this is non-destructive wrapping. So if you need some low level detail of the underlying error, you can get it.
Indirect Coupling
Isn’t this just sentinel errors again? Well, yes, but that’s ok. In 2016, we didn’t have error wrapping, so anyone who wanted to add info to the error would obscure the original error, and then your check for err == os.ErrNotExist would fail. I believe that was the major impetus for Dave’s post. Error wrapping in Go 1.13 fixes that problem. The main problem left is tying error checks to a specific implementation, which this solves.
This solution does require both the producer and the consumer of the error to import the error flags package and use these same flags, however in most projects this is probably more of a benefit than a problem. The edges of the application code can easily check for low level errors and flag them appropriately, and then the rest of the stack can just check for flags. Mattel does this when returning errors from calling the database, for example. Keeping the flags in one spot ensures the producers and consumers agree on what flag names exist.
In theory, Dave’s proposal doesn’t require this coordination of importing the same package. However, in practice, you’d want to agree on the definition of IsNotFound, and the only way to do that with compile-time safety is to define it in a common package. This way you know no one’s going to go off and make their own IsMissing() interface that gets overlooked by your check for IsNotFound().
Choosing Flags
In my experience, there are a limited number of possible bits of data your code could care about coming back about an error. Remember, flags are only useful if you want to change the application’s behavior when you detect them. In practice, it’s not a big deal to just make a list of a handful of flags you need, and add more if you find something is missing. Chances are, you’ll think of more flags than you actually end up using in real code.
Conclusion
This solution has worked wonders for us, and really cleaned up our code of messy, leaky error handling code. Now our code that calls the database can parse those inscrutable postgres error codes right next to where they’re generated, flag the returned errors, and the http handlers way up the stack can happily just check for the NotFound flag, and return a 404 appropriately, without having to know anything about the database.
Do you do something similar? Do you have a totally different solution? I’d love to hear about it in the comments.
Pay remote workers the same as you’d pay local workers. Or vice versa if your
local workers are cheap.
That’s it. That’s the blog post.
It’s 2019, folks. Average home internet speeds are more than enough for video
conferencing and every single laptop has a built-in video camera. Conference
room video hardware has come way down in price and gone way up in quality.
Everyone collaborates via Slack and email and Jira and wikis and shared
documents in the cloud anyway. Our code is hosted in the cloud, ci/cd in the
cloud, deployed to the cloud. Why on earth would it matter where your desk is?
The truth is, it doesn’t matter. With extremely low effort, any company can hire
remote folks and have them be productive, collaborative members of a team. I
should know, I’ve done it for the last 8 years.
One thing that always comes up with remote employees is “how much should I pay
them?” I’m not exactly sure why this is even a question…. actually, yes, I am
sure. Because companies are cheap and want to pay employees as little as
possible. They are, after all, a business. So I guess the question is more
accurately asked “How can I justify paying my employees less while still getting
great talent?”
The answer is always the same - cost of living adjustments. The theory is that
you pay everyone equitably, so they all sustain the same standard of living.
i.e. you pay the person in San Francisco enough for rent and food and spending
money for a new XBox every month. You do the same for the person in rural Ohio -
rent, food, Xbox every month.
Just like a meritocracy, on its face, this sounds perfectly fair. But peek under
the surface, and it’s easily dismissed as false equivalence. Why is a SF
apartment four times the cost of the same apartment in rural Ohio? Because of
supply and demand. Because people believe the apartment in SF is worth more,
so they’re willing to pay more. Why do they believe that? Because the apartment
in SF is near awesome restaurants, easy public transportation, lots of great
similar-minded folks, etc. etc.
These are attributes of the apartment that don’t fit on a spreadsheet of square
footage, number of bedrooms, and lot size… but they have a huge effect on the
price of the home. Clearly, that is what you’re paying for when you buy a
$500k studio in SF.
So, if the house in SF is clearly more valuable than an equivalent-sized one in
rural Ohio… why should the company subsidize paying for those invisible
benefits that come with a house in SF? Would you pay someone more who lived in a
bigger house in the same city? Why not? Why is it ok for companies to subsidize
the location-based value of a home, but not the value derived from
square-footage or lot size?
To put a finer point on it… would you pay someone less who lives on the wrong
side of the tracks in the same city? That’s still location-based, isn’t it?
The thing is, the value of money isn’t actually different in SF and rural Ohio.
Buying an XBox from Amazon costs the same in both places. $3000 a month in rent
for a studio or $3000 a month in mortgage for a 4 bedroom house…. still costs
you $3000. If you live in SF, you’re saying that studio’s location is worth
$3000 a month to you. If you live in rural Ohio, you’re saying the extra
bedrooms and big backyard are worth $3000 a month to you.
…so why would you pay the person in Ohio less?
Someone on Twitter mentioned they understood paying people more who live in high
cost of living areas, but thought it would be weird to pay people less who live
in low cost of living areas…. but it’s really the exact same thing. You pay
the person in SF more, and you’re just paying everyone else less. You can’t have
it one way and not the other.
Does this mean you have to compete with Google’s salaries if your company is in
rural Ohio? Yes and no. It’s true that Google and the other big-five tech
companies pay people a lot more. But that’s true even in Silicon Valley. I’ve
interviewed at lots of SF companies that weren’t able to compete with those kind
of salaries either, but they still get to hire a lot of great talent. The big
five may have a lot of devs, but they can’t hire all the devs. And since
hiring is really hard, they don’t even get all the best devs. The big five
mostly pay a lot of money to keep the other four from poaching… i.e. they’re
really only competing with each other.
So, you might not have to compete with the Googles of the world, but you
probably do have to compete with the Salesforces, Stripes, and (previous to
acquisition) Githubs. While those companies generally pay more than some random
tech company, it’s not double or triple. It’s like 30% more. And honestly,
developers are worth that much. Basically every company in existence needs
developers, or needs to pay a service vendor for specialized software.
Hiring managers - the onus is on you to stop this predatory and unfair hiring
practice. Don’t accept it as “just the way things are”. Speak up against it.
Fight to get your remote developers the same salary and benefits your on-site
folks get. Their work is just as valuable to the company as the local folks,
paying them less is unfair, insulting, and wrong.
There’s a new Go proposal in town - try(). The gist is that it adds a builtin function try() that can wrap a call to a function that returns (a, b, c, …, error). If the error is non-nil, try returns from the enclosing function with that error, and if the error is nil, it returns the rest of the return values.
This is how it looks in code:
func doIt() (string, int, error) {
	return "Daisy", 45, io.EOF
}

func tryIt() error {
	name, age := try(doIt())
	// use name, age
	return nil
}
In the above, if doIt returns a non-nil error, tryIt will exit at the point where try is called, and will return that error.
Complications
So here’s my problem with this… it complicates the code. It adds points where your code can exit from inside the right-hand side of a statement. It can make it very easy to miss the fact that there’s an early exit statement in the code.
The above is simplistic; it could instead look like this:
func tryIt() error {
	fmt.Printf("Hi %s, happy %vth birthday!\n", try(doIt())
	// do other stuff
	return nil
}
At first blush, it would be very easy to read that code and think this function
always returns nil, and that would be wrong and it could be catastrophically
wrong.
The Old Way
In my opinion, the old way of writing the original code (below) is a lot more readable. The exit point is clearly called out by the return keyword as well as the indent. The intermediate variables make the print statement a lot more clear.
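Something like this:

func tryIt() error {
	name, age, err := doIt()
	if err != nil {
		return err
	}
	fmt.Printf("Hi %s, happy %vth birthday!\n", name, age)
	// do other stuff
	return nil
}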
Oh, and did you catch the mismatched parens on the Printf statement in the try() version of tryIt() above? Me neither the first time.
Early Returns
Writing Go code involves a LOT of returning early, more than any other popular language except maybe C or Rust. That’s the real meat of all those if err != nil statements… it’s not the if, it’s the return.
The reason early returns are so good is that once you pass that return block, you can ignore that case forever. The case where the file doesn’t exist? Past the line of os.Open’s error return, you can ignore it. It no longer exists as something you have to keep in your head.
However, with try, you now have to worry about both cases in the same line and keep that in your head. Order of operations can come into play: how much work are you actually doing before this try may kick you out of the function?
One idea per line
One of the things I have learned as a Go programmer is to eschew line density. I don’t want a whole ton of logic in one line of code. That makes it harder to understand and harder to debug. This is why I don’t care about the missing ternary operator or map and filter generics. All those do is let you jam more logic into a single line, and I don’t want that. That makes code hard to understand, and easier to misunderstand.
Try does exactly that, though. It encourages you to put a call getting data into a function that then uses that data. For simple cases, this is really nice, like field assignment:
p := Person{
	Name: try(getUserName()),
	Age:  try(getUserAge()),
}
But note how even here, we’re trying to split up the code into multiple lines, one assignment per line.
Would you ever write this code this way?
p := Person{Name: try(getUserName()), Age: try(getUserAge())}
You certainly can, but holy crap, that’s a dense line, and it takes me an order of magnitude longer to understand that line than it does the 4 lines above, even though they’re just differently formatted versions of the exact same code. But this is exactly what will be written if try becomes part of the language. Maybe not struct initialization, but what about struct initialization functions?
p := NewPerson(try(getUserName()), try(getUserAge()))
Nearly the same code. Still hard to read.
Nesting Functions
Nesting functions is bad for readability. I very rarely nest functions in my go code, and looking at other people’s go code, most other people also avoid it. Not only does try() force you to nest functions as its basic use case, but it then encourages you to use that nested function nested in some other function. So we’re going from NewPerson(name, age) to NewPerson(try(getUserName()), try(getUserAge())). And that’s a real tragedy of readability.
I have been working remotely for about 8 years now. I’ve worked at companies
that did it poorly, and companies that did it well. Let me define remote for a
minute. I mean fully remote. Like, I can count on one hand the number of times per
year I see my coworkers in person and have fingers left over.
I was the first remote employee in my division at Mattel, and I helped guide the
culture toward supporting remote employees. Mattel, for its part, has been very
supportive, and honestly did many things right without even really thinking about
them as supporting remote employees.
I have a lot of thoughts about remote work, and I’ll probably turn this into a
series of posts on the subject. For now, I’m going to start at the beginning -
hiring.
I’ve interviewed at over a dozen companies that support remote employees. How
the interviewing process goes tells me a lot about whether or not a company
really supports remote employees.
When I interviewed at Canonical, the whole interview was remote. I never saw
anyone in person until after I got my offer, and then it was just a run to the
nearest office to sign paperwork. I literally never met any of my coworkers in
person until our first offsite about three months in. And that’s totally ok.
When I interviewed at Mattel it was much the same, except I was brought on as a
contractor first, which allowed me to prove myself, and then they were happy to
just continue letting me do my thing as a full time employee 3000 miles away
from the rest of the team. I pushed my boss to hire more remote devs, and we now
have a team that is almost fully remote.
Many places I’ve interviewed want you to come onsite after some number of
interviews to “meet the team”. While this is ok, it tends to make me think those
places aren’t as fully bought into remote culture. There was no office to go
into at Canonical. Meeting the team was getting on a google hangout (and that’s
fine).
If you buy into remote culture, meeting someone in a video chat should be good
enough. After all, that’s how you’re going to interact with them 99% of the
time. Just as the whiteboard is a relic of another time, I believe the onsite
interview is a relic if the job is remote. (If the job is not remote, then I
think it’s pretty important to get a handle on how the person interacts in
person with other people… but that’s not what we’re talking about.)
I don’t code on a whiteboard at work, and I don’t code in a meeting room with
another developer at work either. And honestly, they almost never ask me to code
in that meeting room. It’s all talk and drawing architecture on a whiteboard.
Which, like, seriously, save yourself the plane ticket and hotel charge and just
let me do that over hangouts.
The problem with having me come on site is that it’s a 2 day thing. I have to
leave work early to catch a plane the night before, spend the night in a hotel,
get up mildly jetlagged, interview all day, then take a redeye home unless I
want to spend a second night in the hotel and get home at like 3 in the
afternoon the next day. If I did that for every job I interviewed for last time
I was looking, I would have had to take a full month off… it’s just not
scalable…. and it’s rough on my family.
Speaking of family, let’s talk about onboarding. Some companies will onboard you
remotely. This is great. Paperwork can be tricky, but it’s doable with a notary public
(that’s what I did for Mattel). Otherwise, going onsite for onboarding is
fine… you sign paperwork, get a company laptop, some swag, etc. All that
could be mailed out, but I get that paperwork can be tricky remote.
But that’s like… maybe 3 days if everything goes really slowly. Many places
want a week onsite for onboarding. Buh…. to do what? If there are significant
things I can only do while onsite, we’re probably going to have problems when I
go back to work at my house for months at a time. Also, aren’t many of my
coworkers remote, so won’t most of them not even be there? One place even
mentioned onboarding was two weeks onsite.
Two weeks is an eternity. I have young kids, and there is zero chance I’m going
anywhere onsite for two weeks. I work remote so I can be with my kids. I’m
sure a lot of more senior devs out there are in the same position. Making your
onboarding process long makes you much less desirable to anyone with a family.
Don’t draw it out any longer than necessary. As a mediocre white man who is used
to getting his way, I might feel comfortable asking for a reduced onsite, but I
bet many other developers who are not so privileged might not.
So, to sum up - if you really want to show you support remote developers,
instead of just saying you do, start with the interview. Make as much of your
interview process remote as possible, and then make your onboarding as painless
as possible. It’ll save you time, it’ll save your candidates time, it’ll save
the company money, and it’ll make everyone happier.
I was so happy when I discovered retool. It’s a Go tool that builds and caches Go binaries into a local directory so that your dev tools stay in sync across your team. It fixes all those problems where slight differences in binary versions produce different output and cause code churn. We use it at Mattel for our projects, because we tend to have a large number of external tools that we use for managing code generation, database migrations, release management, etc.
However, retool doesn’t work very well with modules; trying to run it with modules turned off sometimes misbehaves, and some tools just fail to compile that way.
So what to do? Well, it turns out that in the module world, retool can be replaced by a very small mage script:
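Something like this (a sketch; the target name is arbitrary, it assumes the tools list shown below, and it uses os, path/filepath, and github.com/magefile/mage/sh):

// Tools builds all the dev tools into the _tools directory, pinned to the
// versions listed in the tools slice.
func Tools() error {
	// make sure the _tools directory exists
	if err := os.MkdirAll("_tools", 0700); err != nil {
		return err
	}
	wd, err := os.Getwd()
	if err != nil {
		return err
	}
	// GOBIN controls where the go tool puts the binaries it builds
	env := map[string]string{"GOBIN": filepath.Join(wd, "_tools")}
	for _, t := range tools {
		if err := sh.RunWith(env, "go", "get", t); err != nil {
			return err
		}
	}
	return nil
}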
This code is pretty simple — it ensures the _tools directory exists (which is where retool puts its binaries as well, so I just reused that spot since our .gitignore already ignored it). Then it sets GOBIN to the _tools directory, so binaries built by the go tool will go there, and runs go get importpath@<tag|hash>. That’s it. The first time, it’ll take a while to download all the libraries it needs to build the binaries into the modules cache, but after that it’ll figure out it doesn’t need to do anything pretty quick.
Now just use the tool helper function below in your magefile to run the right versions of the binaries (and/or add _tools to your PATH if you use something like direnv).
// tool runs a command using a cached binary.
func tool(cmd string, args ...string) error {
	return sh.Run(filepath.Join("_tools", cmd), args...)
}
Now all the devs on your team will be using the same versions of their (go) dev tools, and you don’t even need a fancy third party tool to do it (aside from mage).
The list of tools then is just a simple slice of strings, thusly:
var tools = []string{
"github.com/jteeuwen/go-bindata/go-bindata@6025e8de665b31fa74ab1a66f2cddd8c0abf887e",
"github.com/golang/protobuf/protoc-gen-go@v1.3.1",
"gnorm.org/gnorm@v1.0.0",
"github.com/goreleaser/goreleaser@v0.106.0",
}
For most maintained libraries, you’ll get a nice semver release number in there, so it’s perfectly clear what you’re running (but for anything without tags, you can use a commit hash).
I’m really happy that this was as straightforward as I was hoping it would be, and it seems just as usable as retool for my use case.
func init() in Go is a weird beast. It’s the only function you can have
multiples of in the same package (yup, that’s right… give it a try). It
gets run when the package is imported. And you should never use it.
Why not? Well, there’s a few reasons. The main one is that init is only useful
for setting global state. I think it’s pretty well accepted that global state is
bad (because it’s hard to test and it makes concurrency dangerous). So, by
association init is bad, because that’s all it can do.
But wait, there’s more that makes it even worse. Init is run when a package is
imported, but when does a package get imported? If a imports b and b imports c
and b and c both have init functions, which one runs first? What if c has two
init functions in different files? You can find out, but it’s non-obvious and it
can change if you import code differently. Not knowing the order in which code
executes is bad. Normal go code executes top to bottom in a very clear and
obvious order. There’s good reason for that.
How do you test init functions? Trick question, you can’t. It’s not possible to
test the state of a package before init and then make sure the state after init
is correct. As soon as your test code runs, it imports the package and runs init
right away. Ok, maybe that’s not 100% true, you can probably do some hackery in
init to check if you’re running under go test and then not run the init logic…
but then your package isn’t set up the way it expects, and you’d have to write a
test specifically named to run first, to test init… and that’s just horrible
(and nobody does that, so it’s basically always untested code).
Ok, so there’s the reasons not to use it… now what do you do instead? If you
want state, use a struct. Instead of global variables on the package, use fields
on a struct. The package-level functions become methods, and the init function
becomes a constructor.
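For example, something like this (a sketch; the package and type names are just for illustration):

package fetch

import (
	"net/http"
	"time"
)

// Fetcher holds what used to be package-level state.
type Fetcher struct {
	client *http.Client
}

// NewFetcher replaces the init function: callers construct the state
// explicitly, so there are no globals and no hidden initialization order.
func NewFetcher() *Fetcher {
	return &Fetcher{
		client: &http.Client{Timeout: 10 * time.Second},
	}
}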
This fixes all the aforementioned problems. You get rid of global variables, so
if you have two different parts of your code using the same package, they don’t
stomp on each other’s settings etc. You can run tests without worrying that a
previous test modifies global state for a later test. It’s clear and obvious how
to test before and after a constructor gets called. And finally, there’s a clear
and normal order to the initialization of things. You don’t have to wonder what
gets called when, because it’s just normal go functions.
As a corollary… this means you shouldn’t use underscore imports either (since
they’re generally only useful for triggering init functions). These imports
(import _ "github.com/foo/db") are used for their side effects, like
registering sql/db drivers. The problem is that these are, by definition,
setting global variables, and those are bad, as we’ve said. So don’t use those
either.
Once you start writing code with structs instead of globals and init, you’ll
find your code is much easier to test, easier to use concurrently, and more
portable between applications. So, don’t use init.
…Axel Wagner mentioned on Twitter that this
looked too dogmatic, and he’s right. This is programming, there are infinite
possible programs, and thus there will always be exceptions to every rule. I
think it’s really rare that init is the right choice, and you should only come
to that decision after trying other options and ensuring you take into
consideration things like startup order, concurrent access, and testing.
Starlight wraps Google’s Go implementation of the starlark python
dialect (most notably found in the Bazel build tool).
Starlight makes it super easy for users to extend your application by writing simple python-like
scripts that interact seamlessly with your current Go code… with no boilerplate on your part.
What is Starlark?
Starlark is a subset of python that removes some of the more advanced features, but keeps the easy to read-and-write feel. For the purposes of this article, to avoid confusion between starlight (my package) and starlark (the language), I’ll be referring to the code as python (since starlark code is a subset of python code), but there are some small differences (described in the previous link).
Parser by Google
The parser and runner are maintained by Google’s Bazel team, which writes starlark-go. Starlight is
a wrapper on top of that, which makes it so much easier to use starlark-go. The problem with the
starlark-go API is that it is built more to be used for configuration, so it assumes you want to get
information out of starlark and into Go. It’s actually pretty difficult to get Go information into
a starlark script…. unless you use starlight.
Easy two-way interaction
Starlight has adapters that use reflection to automatically make any Go value usable in a starlark
script. Passing an *http.Request into a starlark script? Sure, you can do name =
r.URL.Query()["name"][0] in the python without any work on your part.
Starlight is built to just work the way you hope it’ll work. You can access any Go methods or
fields, basic types get converted back and forth seamlessly… and even though it uses reflection,
it’s not as slow as you’d think. A basic benchmark wrapping a couple values and running a starlark
script to work with them runs in a tiny fraction of a millisecond.
The great thing is that the changes made by the python code are reflected in your go objects,
just as if it had been written in Go. So, set a field on a pointer to a struct? Your go code will
see the change, no additional work needed.
100% Safe
The great thing about starlark and starlight is that the scripts are 100% safe to run. By default
they have no access to other parts of your project or system - they can’t write to disk or connect
to the internet. The only access they have to the outside is what you give them. Because of this,
it’s safe to run untrusted scripts (as long as you’re not giving them dangerous functions to run,
like os.RemoveAll). But at the same time, if you’re only running trusted scripts, you can give
them whatever you want (http.Get? Sure, why not?)
Example
Below is an example of a webserver that changes its output depending on the python script it runs. This is the full code, it’s not truncated for readability… this is all it takes.
First the go web server code. Super standard stuff, except a few lines to run starlight…
package main

import (
	"fmt"
	"log"
	"net/http"

	"github.com/starlight-go/starlight"
)

func main() {
	http.HandleFunc("/", handle)
	port := ":8080"
	fmt.Printf("running web server on http://localhost%v?name=starlight&repeat=3\n", port)
	if err := http.ListenAndServe(port, nil); err != nil {
		log.Fatal(err)
	}
}

func handle(w http.ResponseWriter, r *http.Request) {
	fmt.Println("handling request", r.URL)
	// here we define the global variables and functions we're making available
	// to the script. These will define how the script can interact with our Go
	// code and the outside world.
	globals := map[string]interface{}{
		"r":       r,
		"w":       w,
		"Fprintf": fmt.Fprintf,
	}
	_, err := starlight.Eval("handle.star", globals, nil)
	if err != nil {
		fmt.Println(err)
	}
}
And the python handle.star:
# Globals are:
# w: the http.ResponseWriter for the request
# r: the *http.Request
# Fprintf: fmt.Fprintf

# for loops and if statements need to be in functions in starlark
def main():
    # Query returns a map[string][]string
    # this gets a value from a map, with a default if it doesn't exist
    # and then takes the first value in the list.
    repeat = r.URL.Query().get("repeat", ["1"])[0]
    name = r.URL.Query().get("name", ["starlight"])[0]
    for x in range(int(repeat)):
        Fprintf(w, "hello %s\n", name)
    # we can use pythonic truthy statements on the slices returned from the map to
    # check if they're empty.
    if not r.URL.Query().get("repeat") and not r.URL.Query().get("name"):
        w.Write("\nadd ?repeat=<int>&name=<string> to the URL to customize this output\n")
        w.Write("\ntry modifying the contents of handle.star and see what happens.\n")

main()
You can run this example by running go get github.com/starlight-go/starlight and using go run
main.go in the example folder.
You can then update the python and watch the changes the next time you hit the server. This just
uses starlight.Eval, which rereads and reparses the script every time.
Caching
In a production environment, you probably want to only read a script once and parse it once. You
can do that with starlight’s Cache. This cache takes a list of directories to look in for
scripts, which it will read and parse on-demand, and then store the parsed object in memory for
later use. It also uses a cache for any load() calls the scripts use to load scripts they depend
on.
Work Ongoing
Starlight is still a work in progress, so don’t expect the API to be perfectly stable quite yet.
But it’s getting pretty close, and there shouldn’t be any earth-shattering changes; definitely
pin your imports, though. Right now it’s more about finding corner cases where the starlight wrappers don’t
work quite like you’d expect, and supporting the last few things that aren’t implemented yet (like
channels).
A question came up at the Framingham Go meetup a while back about why something
like Gradle hasn’t taken hold in the Go community. I can’t say that I know for
sure what the answer is - I don’t speak for the community - but, I have some
guesses. I think part of it is that many projects don’t need a full-fledged
build tool - for your typical Go networked server or CLI tool, a single binary
built with go build is probably fine.
For more complex builds, which may require more steps than just compile and
link, like for bundling static assets in a web server or generating code from
protobufs, for example, many people in the Go community reach for Make.
Personally, I find that unfortunate. Makefiles are clearly pretty cool for a
number of reasons (built-in CLI, dependencies, file targets). However, Make is
not Windows friendly, and it has its own language and conventions that you need
to learn on top of the oddity that is Bash scripting. Finally, it doesn’t let
you leverage the Go community’s two greatest resources - Go programmers and Go
code.
Maybe go run? Maybe not
The above is the start of a blog post I’ve had half written for two years. I
started to go on to recommend using go run make.go with a go file that does
the build for you. But in practice, this is problematic. If you want your
script to be useful for doing more than one thing, you need to implement a CLI
and subcommands. This ends up being a significant amount of work that then
obscures what the actual code is doing… and no one wants to maintain yet
another CLI just for development tasks. In addition, there’s a lot of chaff you
have to handle, like printing out errors, setting up logging etc.
The Last Straw
Last summer there were a couple questions on
r/golang about best practices for using Makefiles
with Go… and I finally decided I’d had enough.
I looked around at what existed for alternatives -
rake was the obvious pattern to follow, being
very popular in the Ruby community. pyinvoke was the
closest equivalent I saw in python. Was there something similar in Go? Well,
sort of, but not exactly. go-task is
written in Go, but tasks are actually defined in YAML. Not my
cup of tea. Mark Bates wrote grift which
has tasks written in Go, but I didn’t really like the ergonomics… I wanted
just a little more magic.
I decided that I could write a tool that behaved pretty similarly to Make, but
allowed you to write Go instead of Bash, and didn’t need any special syntax if
I did a little code parsing and generation on the fly. Thus, Mage was born.
Mage is conceptually just like Make, except you write Go instead of Bash. Of
course, there’s a little more to it than that. In Mage, like in Make, you write
targets that can be accessed via a simple CLI. In Mage, exported functions
become targets. Any of these exported functions are then runnable by running
mage <func_name> in the directory where the magefile lives, just like you’d run
make <target_name> for a make target.
What is a Magefile?
A magefile is simply a .go file with the mage build tag in it. All you need for
a magefile is this:
//+build mage
package main
Mage looks for all go files in the current directory with the mage build tag,
and compiles them all together with a generated CLI.
There are a few nice properties that result from using a build tag to mark
magefiles - one is that you can use as many files as you like named whatever you
like. Just like in normal go code, the files all work together to create a
package.
Another really nice feature is that your magefiles can live side by side with
your regular go code. Mage only builds the files with the mage tag, and your
normal go build only builds the files without the mage tag.
Targets
A function in a magefile is a target if it is exported and has a signature of
func(), func()error, func(context.Context), or
func(context.Context)error. If the target has an error return and you return
an error, Mage will automatically print out the error to its own stderr, and
exit with a non-zero error code.
Doc comments on each target become CLI docs for the magefile, doc comments on
the package become top-level help docs.
//+build mage
// Mostly this is used for building the website and some dev tasks.
package main
// Builds the website. If needed, it will compact the js as well.
func Build() error {
	// do your stuff here
	return nil
}
Running mage with no arguments (or mage -l if you have a default target
declared) will print out help text for the magefiles in the current directory.
$ mage
Mostly this is used for building the website and some dev tasks.
Targets:
  build    Builds the website.
The first sentence is used as short help text; the rest is available via mage
-h <target>:
$ mage -h build
mage build:
Builds the website. If needed, it will compact the js as well.
This makes it very easy to add a new target to your magefile with proper
documentation so others know what it’s supposed to do.
You can declare a default target to run when you run mage without a target very
easily:
var Default = Build
And just like Make, you can run multiple targets from a single command… mage
build deploy clean will do the right thing.
Dependencies
One of the great things about Make is that it lets you set up a tree of
dependencies/prerequisites that must execute and succeed before the current
target runs. This is easily done in Mage as well. The
github.com/magefile/mage/mg library has a Deps function that takes a list of
dependencies, and runs them in parallel (and any dependencies they have), and
ensures that each dependency is run exactly once and succeeds before continuing.
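For example (a sketch, with the real work elided; it uses the mg library mentioned above):

func Build() error {
	mg.Deps(Generate, Protos)
	// build the binary, etc.
	return nil
}

func Generate() error {
	mg.Deps(Protos)
	// run code generation
	return nil
}

func Protos() error {
	// compile the protobufs
	return nil
}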
In this example, build depends on generate and protos, and generate depends on
protos as well. Running build will ensure that protos runs exactly once, before
generate, and generate will run before build continues. The functions sent to
Deps don’t have to be exported targets, but do have to match the same signature
as targets have (i.e. optional context arg, and optional error return).
Shell Helpers
Running commands via os/exec.Command is cumbersome if you want to capture
outputs and return nice errors. github.com/magefile/mage/sh has helper
methods that do all that for you. Instead of errors you get from exec.Command
(e.g. “command exited with code 1”), sh uses the stderr from the command as
the error text.
Combine this with the automatic error reporting of targets, and you easily get
helpful error messages from your CLI with minimal work:
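For example (the protoc invocation here is just an illustration):

func Protos() error {
	return sh.Run("protoc", "--go_out=.", "service.proto")
}

If protoc fails, the error mage prints is protoc’s own stderr output, which is usually exactly what you want to see.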
Another nice thing about the sh package is that if you run mage with -v to
turn on verbose mode, the sh package will print out the args of what commands
it runs. In addition, mage sets up the stdlib log package to default to
discard log messages, but if you run mage with -v, the default logger will
output to stderr. This makes it trivial to turn on and off verbose logging in
your magefiles.
How it Works
Mage parses your magefiles, generates a main function in a new file (which
contains code for a generated CLI), and then shoves a compiled binary off in a
corner of your hard drive. The first time it does this for a set of magefiles,
it takes about 600ms. Using the go tool’s ability to check if a binary needs to
be rebuilt or not, further runs of the magefile avoid the compilation overhead
and only take about 300ms to execute. Any changes to the magefiles or their
dependencies cause the cached binary to be rebuilt automatically, so you’re
always running the newest correct code.
Mage is built 100% with the standard library, so you don’t need to install a
package manager or anything other than go to build it (and there are binary
releases if you just want to curl it into CI).
Conclusion
I’ve been using Mage for all my personal projects for almost a year and for
several projects at Mattel for 6 months, and I’ve been extremely happy with it.
It’s easy to understand, the code is plain old Go code, and it has just enough
helpers for the kinds of things I generally need to get done, taking all the
peripheral annoyances out of my way and letting me focus on the logic that needs
to be right.
Give it a try, file some issues if you run into anything. Pull requests more
than welcome.
There’s a new error handling design proposed here. It’s…. not great.
Handle is a new keyword that basically defines a translation that can be applied
to errors returned from the current function:
func printSum(a, b string) error {
	handle err { return fmt.Errorf("error summing %v and %v: %v", a, b, err ) }
	x := check strconv.Atoi(a)
	y := check strconv.Atoi(b)
	fmt.Println("result:", x + y)
	return nil
}
Check applies the handler and returns from the enclosing function if the error passed into it is non-nil; otherwise it returns the non-error value.
Handle, in my opinion, is kind of useless. We can already do this today with functions, thusly:
func printSum(a, b string) (err error) {
	check := func(err error) error {
		return fmt.Errorf("error summing %v and %v: %v", a, b, err )
	}
	x, err := strconv.Atoi(a)
	if err != nil {
		return check(err)
	}
	y, err := strconv.Atoi(b)
	if err != nil {
		return check(err)
	}
	fmt.Println("result:", x + y)
	return nil
}
That does literally the same thing as check and handle above.
The stated reason for adding check and handle is that too many people just write
“return err” and don’t customize the error at all, which means somewhere at the
top of your program, you get this inscrutable error from deep in the bowels of
your code, and you have no idea what it actually means.
It’s trivial to write code that does most of what check and handle do… and no
one’s doing it today (or at least, not often). So why add this complexity?
Check and handle actually make error handling worse. With the check and handle
code, there’s no required “error handling scope” after the calls to add context
to the error, log it, clean up, etc. With the current code, I always have an
if statement that I can easily slot more lines into, in order to make the error
more useful and do other things on the error path. With check, that space in
the code doesn’t exist. There’s a barrier to making that code handle errors
better - now you have to remove check and swap in an if statement. Yes,
you can add a new handle section, but that applies globally to any further
error returns in the function, not just for this one specific error. Most of
the time I want to add information about one specific error case.
So, for example, in the code above, I would want a different error message for A
failing Atoi vs. B failing Atoi…. because in real code, which one is the
problem may not be obvious if the error message just says “either A or B is a
problem”.
Yes, if err != nil { constitutes a lot of Go code. That’s ok. That’s actually
good. Error handling is extremely important. Check and handle don’t make error
handling better. I suspect they’ll actually make it worse.
A refrain I often state about changes requested for Go is that most of them
just involve avoiding an if statement or a loop. This is one of them. That’s
not a good enough reason to change the language, in my opinion.
So, I don’t really like the contracts defined
here.
They seem complicated to understand, and duplicate a lot of what interfaces
already do, but in a much clunkier fashion.
I think we can do 90% of what the design given can do, with 20% of the added
complexity.
Most of my objection comes from two things:
First the syntax, which adds “type parameters” as yet another overloaded meaning
for stuff in parentheses (we already have: argument lists, return values,
function calls, type conversion, type assertion, and grouping for order of
operations).
Second, the implicit nature of how contracts are defined by a random block of
code that is sorta like go code, but not actually go code.
Syntax
This is a generic function as declared in the contracts code:
func Print(type T)(s []T) {
	for _, v := range s {
		fmt.Println(v)
	}
}
The (type T) here defines a type parameter. In this case it doesn’t tell us
anything about the type, so it’s effectively like interface{}, except that it
magically works with slices the way we all thought interfaces should work with
slices back in the day – i.e. you can pass any slice into this, not just
[]interface{}.
Are we now going to have func(type T)(input T)(output T){}? That’s crazy.
Also, I don’t like that the type parameters precede the arguments… isn’t the
whole reason that we have Go’s unusual ordering that we acknowledge
that the name is more important than the type?
Here’s my fix… since contracts are basically like interfaces, let’s actually
use interfaces. And let’s make the contracty part last, since it’s least
important:
func Print(s []interface{}:T) {
	for _, v := range s {
		fmt.Println(v)
	}
}
So here’s the change in a nutshell. You use a real interface to define the type
of the argument. In this case it’s interface{}. This cuts out the need to
define a contract separately when we already have a way of defining an abstract
type with capabilities. The : tells the compiler that this is a parameterized
type, and T is the name given that type (though it’s not used anywhere).
More Complex Types
added this section to help remove some confusion people had with the proposal
More complex functions with multiple contract types are just as easily done:
func Map(vals []interface{}:X, f func(x X) interface{}:Y) []Y {
	ret := make([]Y, len(vals))
	for i := range vals {
		ret[i] = f(vals[i])
	}
	return ret
}
:X defines a type in this scope which is constrained by the interface that
precedes it (in this case, there’s no constraint). :Y defines a separate type…
then inside the scope you can reference those types.
Contract Definitions as Code Are Hard
Specifying contracts via example code is going to age about as well as
specifying time formats via example output. -me on Twitter
The next example in the design is
contract stringer(x T) {
	var s string = x.String()
}

func Stringify(type T stringer)(s []T) (ret []string) {
	for _, v := range s {
		ret = append(ret, v.String())
	}
	return ret
}
Wait, so we have to redefine the Stringer interface? Why? Why not just use a
Stringer interface? Also, what happens if I screw up the code, like this?
contract stringer(x T) {
	s := x.String()
}
You think the error message from that is going to be good? I don’t.
Also, this allows an arbitrarily large amount of code in contract definitions.
Much of this code could easily imply restrictions that you don’t intend, or be
more general than you expect.
contract slicer(x T) {
	s := x[0]
}
Is that a map of int to something? Or is it a slice? Is that just invalid? What
would the error message say, if so? Would it change if I put a 1 in the index?
Or -1? Or “1”?
Notably… a lot of really smart gophers who have been programming in Go for
years have difficulty defining contracts that are conceptually simple, because
there is so much implied functionality in even simple types.
Take a contract that says you can accept a string or a []byte… what do you
think it would look like?
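Something like this, maybe (a rough guess at the shape of it):

contract stringOrByte(s T) {
	// a sketch; there are several ways you might try to write this
	_ = string(s)
	_ = s[0]
	_ = len(s)
}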
If you guessed this with even your second or third try…
…then I applaud you for being better at Go than I am. And there are still
questions about whether or not this would fail for len(s) == 0 (answer: it
won’t, because it’s just type checked, not actually run… but, see what I mean
about implications?) Also, I’m not even 100% sure this is sufficient to define
everything you need. It doesn’t seem to say that you can range over the type.
It doesn’t say that indexing the value will produce a single byte.
Lack of Names and Documentation
The biggest problem with contracts defined as random blocks of code is their
lack of documentation. As above, what exactly a bit of code means in a contract
is actually quite hard to distill when you’re talking about generic types. And
then how do you talk about it? If you have your function that takes your
locally defined stringOrByte, and someone else has theirs defined as robytes,
but the contents are the same (but maybe in a different order with different
type names)… how can you figure out if they’re compatible?
They may well be compatible, but it’s non-trivial to see that they are (and if
they weren’t, you’d probably have to rely on the compiler to tell you).
Imagine for a moment if there were no io.Reader or io.Writer interfaces. How
would you talk about functions that write to a slice of bytes? Would we all
write exactly the same interface? Probably not. Look at the lack of a Logging
interface, and how that affected logging across the ecosystem. io.Reader and
io.Writer make writing and reading streams of bytes so nice because they’re
standardized, because they are discoverable. The standardization means that
everyone who writes streams of bytes uses the exact same signature, so we can
compose readers and writers trivially, and discover new ways to compose them
just by looking for the terms io.Reader and io.Writer.
Just Use Interfaces, and Make Some New Built-in Ones
My solution is to mainly just use interfaces and tag them with :T to denote
they’re a parameterized type. For contracts that don’t distill to “has a
method”, make built-in contract/interfaces that can be well-documented and
well-known. Most of the examples I’ve seen of “But how would you do X?” boil
down to “You can’t, and moreover, you probably shouldn’t”.
A lot of this boils down to “I trust the stdlib authors to define a good set of
contracts and I don’t want every random coder to throw a bunch of code in a
contract block and expect me to be able to understand it”.
I think most of the useful contracts can be defined in a small finite list that
can live in a new stdlib package, maybe called ct to keep it brief.
ct.Comparable could mean x == x. ct.Stringish could mean “string or []byte or a
named version of either”… etc.
Most of the things that fall outside of this are things that I don’t think you
should be doing. Like, “How do you make a function that can compare two
different types with ==?” Uh… don’t, that’s a bad idea.
One of the uses in the contract design is a way to say that you can convert one
thing to another. This can be useful for generic functions on strings vs []byte
or int vs int64. This could be yet another specialized interface:
package ct

// Convertible defines a type that can be converted into T.
type Convertible:T contract

// elsewhere

func ParseUint64(v ct.Convertible:uint64) {
	i, err := strconv.ParseUint(uint64(v))
}
Conclusion
The contracts design, as written, IMO, will make the language significantly
worse. Wrapping my head around what a random contract actually means for my
code is just too hard if we’re using example code as the means of definition.
Sure, it’s a clever way to ensure that only types that can be used in that way
are viable… but clever isn’t good.
One of my favorite posts about Go is Rob Napier’s Go is a Shop-Built
Jig. In it, he argues that there
are many inelegant parts to the Go language, but that they exist to make the
whole work better for actual users. This is stuff like the built-in functions
append and copy, the fact that slices and maps are generic, but nothing else is.
Little pieces are filed off here, stapled on there, because making usage easy
matters more than looking slick.
This design of contracts as written does not feel like a shop-built jig. It
feels like a combination all-in-one machine that can do anything but is so
complicated that you don’t even know how to approach it or when you should
use it vs the other tools in your shop.
I think we can make a smaller, more incremental addition to the language that
will fix a lot of the problems that many people have with Go - lack of reusable
container types, copy and paste for simple map and filter functions, etc. This
will only add a small amount of complexity to the language, while solving real
problems that people experience.
Notably, I think a lot of the problems generics solve are actually quite minor
in the scheme of major projects. Yes, I have to rewrite a filter function for
every type. But that’s a function I could have written in college and I usually
only need one or two per 20,000 lines of code (and then almost always just
strings).
So… I really don’t want to add a bunch of complexity to solve these problems.
Let’s take the most straightforward fix we can get, with the least impact on the
language. Go has been an amazing success in the last decade. Let’s move slowly
so we don’t screw that up in the next decade.
There’s a disturbing thread that pops up every once in a while where People On
The Internet say that comments are bad and the only reason you need them is
because you and/or your code aren’t good enough. I’m here to say that’s bullshit.
Code Sucks
They’re not entirely wrong… your code isn’t good enough. Neither is mine or
anyone else’s. Code sucks. You know when it sucks the most? When you haven’t
touched it in 6 months. And you look back at the code and wonder “what in the
hell was the author thinking?” (and then you git blame and it’s you… because
it’s always you).
The premise of the anti-commenters is that the only reason you need comments is
because your code isn’t “clean” enough. If it were refactored better, named
better, written better, it wouldn’t need that comment.
But of course, what is clean and obvious and well-written to you, today, while
the entire project and problem space are fully loaded in your brain… might not
be obvious to you, six months from now, or to the poor schmuck that has to debug
your code with their manager breathing down their neck because the CTO just ran
into a critical bug in prod.
Learning to look at a piece of code that you understand and figure out how
someone else might fail to understand it is a difficult skill to master. But
it is incredibly valuable… nearly as important as the
ability to write good code in the first place. In industry, almost no one codes
alone. And even if you do code alone, you’re gonna forget why you wrote some
of your code, or what exactly this gnarly piece of late night “engineering” is
doing. And someday you’re going to leave, and the person they hire to replace
you is going to have to figure out every little quirk that was in your head at
the time.
So, throwing in comments that may seem overly obvious in the moment is not a bad
thing. Sometimes it can be a huge help.
Avoiding Comments Often Makes Your Code Worse
Some people claim that if you remove comments, it makes your code better,
because you have to make your code clearer to compensate. I call BS on this as
well, because I don’t think anyone is realistically writing sub-par code and
then excusing it by slapping a comment on it (aside from // TODO: this is a
temporary hack, I'll fix it later). We all write the best code we know how,
given the various external constraints (usually time).
The problem with refactoring your code to avoid needing comments is that
it often leads to worse code, not better. The canonical example is factoring
out a complicated line of code into a function with a descriptive name. Which
sounds good, except now you’ve introduced a context switch for the person reading
the code… instead of the actual line of code, they have a function call… they
have to scroll to where the function is defined, remember and map the arguments
from the call site to the function declaration, and then map the return value
back to the call site.
In addition, the clarity of a function’s name is only applicable to very trivial
comments. Any comment that is more than a couple words cannot (or should not)
be made into a function name. Thus, you end up with… a function with a
comment above it.
Indeed, even the existence of a very short function may cause confusion and more
complicated code. If I see such a function, I may search to see where else that
function is used. If it’s only used in one place, I then have to wonder if this
is actually a general piece of code that represents global logic… (e.g.
NameToUserID) or if this function is bespoke code that relies heavily on the
specific state and implementation of its call site and may well not do the right
thing elsewhere. By breaking it out into a function, you’re in essence exposing
this implementation detail to the rest of the codebase, and this is not a
decision that should be taken lightly. Even if you know that this is not
actually a function anyone else should call, someone else will call it at some
point, even where not appropriate.
The problems with small functions are better detailed in Cindy Sridharan’s medium post.
We could dive into long variable names vs. short, but I’ll stop and just
say that you can’t save yourself by making variable names longer. Unless your
variable name is the entire comment that you’re avoiding writing, then you’re
still losing information that could have been added to the comment. And I think
we can all agree that usernameStrippedOfSpacesWithDotCSVExtension is a terrible
variable name.
I’m not trying to say that you shouldn’t strive to make your code clear and
obvious. You definitely should. It’s the hallmark of a good developer. But
code clarity is orthogonal to the existence of comments. And good comments are
also the hallmark of a good developer.
There are no bad comments
The examples of bad comments often given in these discussions are trivially
bad, and almost never encountered in code written outside of a programming 101
class.
// instantiate an error
var err error
Yes, clearly, this is not a useful comment. But at the same time, it’s not
really harmful. It’s some noise that is easily ignored when browsing the
code. I would rather see a hundred of the above comments if it means the dev
leaves in one useful comment that saves me hours of head banging on keyboard.
I’m pretty sure I’ve never read any code and said “man, this code would be so
much easier to understand if it weren’t for all these comments.” It’s nearly
100% the opposite.
In fact, I’ll even call out some code that I think is egregious in its lack of
comments - the Go standard library. While the code may be very correct and well
structured… in many cases, if you don’t have a deep understanding of what the
code is doing before you look at it, it can be a challenge to understand
why it’s doing what it’s doing. A sprinkling of comments about what the logic
is doing and why would make a lot of the Go standard library a lot easier to
read. In this I am specifically talking about comments inside the
implementation, not doc comments on exported functions in general (those are
generally pretty good).
Any comment is better than no comment
Another chestnut the anti-commenters like to bring out is a bit of wisdom that
can be illustrated with a pithy image:
Ah, hilarious, someone updated the contents and didn’t update the comment.
But, that was a problem 20 years ago, when code reviews were not (generally) a
thing. But they are a thing now. And if checking that comments match the
implementation isn’t part of your code review process, then you should probably
review your code review process.
Which is not to say that mistakes can’t be made… in fact I filed a “comment
doesn’t match implementation” bug just yesterday. The saying goes something
like “no comment is better than an incorrect comment” which sounds obviously
true, except when you realize that if there is no comment, then devs will just
guess what the code does, and probably be wrong more often than a comment would
be wrong.
Even if this does happen, and the code has changed, you still have valuable
information about what the code used to do. Chances are, the code still does
basically the same thing, just slightly differently. In this world of
versioning and backwards compatibility, how often does the same function get
drastically changed in functionality while maintaining the same name and
signature? Probably not often.
Take the bug I filed yesterday… the place where we were using the function was
calling client.SetKeepAlive(60). The comment on SetKeepAlive was
“SetKeepAlive will set the amount of time (in seconds) that the client should
wait before sending a PING request”. Cool, right? Except I noticed that
SetKeepAlive takes a time.Duration. With no unit specified, the bare 60 becomes
a time.Duration of… 60 nanoseconds. Oops. Someone had updated the function to
take a time.Duration rather than an int. Interestingly, it did still round the
duration down to the nearest second, so the comment was not incorrect per se,
it was just misleading.
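To make the pitfall concrete, here’s a sketch (client and SetKeepAlive stand in
for the real code from that bug, so treat the names as illustrative):

// the untyped constant 60 silently becomes 60 * time.Nanosecond
client.SetKeepAlive(60)
// what the caller almost certainly meant
client.SetKeepAlive(60 * time.Second)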
Why?
The most important comments are the why comments. Why is the code doing what
it’s doing? Why must the ID be less than 24 characters? Why are we hiding this
option on Linux? etc. The reason these are important is that you can’t figure
out the why by looking at the code. They document lessons learned by the devs,
outside constraints imposed by the business, other systems, etc. These comments
are invaluable, and almost impossible to capture in other ways (e.g. function
names should document what the function does, not why).
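For example, a why comment might look something like this (the constraint here
is invented purely for illustration):

// The ID ends up in a legacy fixed-width export downstream, which silently
// truncates anything longer than 24 characters. (Hypothetical constraint,
// for illustration only.)
if len(id) > 24 {
    return fmt.Errorf("id %q is too long: must be 24 characters or fewer", id)
}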
Comments that document what the code is doing are less useful, because you can
generally always figure out what the code is doing, given enough time and
effort. The code tells you what it is doing, by definition. Which is not to
say that you should never write what comments. Definitely strive to write the
clearest code you can, but comments are free, so if you think someone might
misunderstand some code or otherwise have difficulty knowing what’s going on,
throw in a comment. At least, it may save them a half hour of puzzling through
your code, at best it may save them from changing it or using it in incorrect
ways that cause bugs.
Tests
Some people think that tests serve as documentation for functions. And, in a
way, this is true. But they’re generally very low on my list of effective
documentation. Why? Well, because they have to be incredibly precise, and thus
they are verbose, and cover a narrow strip of functionality. Every test tests
exactly one specific input and one specific output. For anything other than the
most simple function, you probably need a bunch of code to set up the inputs and
construct the outputs.
For much of programming, it’s easier to describe briefly what a function does
than to write code to test what it does. Often times my tests will be multiple
times as many lines of code as the function itself… whereas the doc comment on
it may only be a few sentences.
In addition, tests only explain the what of a function. What is it supposed to
do? They don’t explain why, and why is often more important, as stated above.
You should definitely test your code, and tests can be useful in figuring out
the expected behavior of code in some edge cases… but if I have to read tests
to understand your code in general, then that’s red flag that you really need to
write more/better comments.
Conclusion
I feel like the line between what’s a useful comment and what’s not is difficult
to find (outside of trivial examples), so I’d rather people err on the
side of writing too many comments. You never know who may be reading your code
next, so do them the favor you wish was done for you… write a bunch of
comments. Keep writing comments until it feels like too many, then write a few
more. That’s probably about the right amount.
If you tell the truth, you don’t have to remember anything.
—Mark Twain
In a code review recently, I asked the author to change some of their asserts to
requires. Functions in testify’s assert package allow the test to continue,
whereas those in the require package end the test immediately. Thus, you use
require to avoid trying to continue running a test when we know it’ll be in a
bad state. (side note: don’t use an assert package, but that’s another post)
Since testify’s assert and require packages have the same interface, the
author’s solution was to simply change the import thusly:
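The change looked roughly like this, an aliased import that makes the require
package answer to the name assert (a sketch of what was in the review, not the
exact diff):

import (
    // every existing assert.Foo call now ends the test immediately
    assert "github.com/stretchr/testify/require"
)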
Bam, now all the assert.Foo calls would stop the test immediately, and we didn’t
need a big changelist changing every use of assert to require. All good,
right?
No.
Hell No.
Why? Because it makes the code lie. Anyone familiar with the testify package
understands the difference between assert and require. But we’ve now made code
that looks like an assert, but is actually a require. People who are 200
lines down in a test file may well not realize that those asserts are actually
requires. They’ll assume the test function will continue processing after an
assert fails. They’ll be wrong, and they could accidentally write incorrect
tests because of it - tests that fail with confusing error messages.
This is true in general - code must never lie. This is a cardinal sin
amongst programmers. This is an extension of the mantra that code should be
written to be read. If code looks like it’s doing one thing when it’s actually
doing something else, someone down the road will read that code and
misunderstand it, and use it or alter it in a way that causes bugs. If they’re
lucky, the bugs will be immediate and obvious. If they’re unlucky, they’ll be
subtle and only be figured out after a long debugging session and much head
banging on keyboard. That someone might be you, even if it was your code in
the first place.
If, for some reason, you have to make code that lies (to fulfill an interface or
some such), document the hell out of it. Giant yelling comments that can’t be
missed during a 2am debugging session. Because chances are, that’s when you’re
going to look at this code next, and you might forget that saveToMemory()
function actually saves to a database in AWS’s Antarctica region.
So, don’t lie. Furthermore, try not to even mislead. Humans make assumptions
all the time, it’s built into how we perceive the world. As a coder, it’s your
job to anticipate what assumptions a reader may have, and ensure that they are
not incorrect, or if they are, do your best to disabuse them of their incorrect
assumptions.
If possible, don’t resort to comments to inform the reader, but instead,
structure the code itself in such a way as to indicate it’s not going to behave
the way one might expect. For example, if your type has a Write(b []byte)
(int, error) method that is not compatible with io.Writer, consider calling it
something other than Write… because everyone seeing foo.Write is going to
assume that method works like an io.Writer. Instead, maybe call it WriteOut
or PrintOut or anything but Write.
Misleading code can be even more subtle than this. In a recent code review, the
author wrapped a single DB update in a transaction. This set off
alarm bells for me as a reviewer. As a reader, I assumed that the code must be
saving related data in multiple tables, and that’s why a transaction was needed.
Turned out, the code didn’t actually need the transaction, it was just written
that way to be consistent with some other code we had. Unfortunately, in this
case, being consistent was actually confusing… because it caused the reader to
make assumptions that were ultimately incorrect.
Do the poor sap that has to maintain your code 6 months or two years down the
road a favor - don’t lie. Try not to mislead. Because even if that poor sap
isn’t you, they still don’t deserve the 2am headache you’ll likely be
inflicting.
A question came up at Gophercon about using functions as arguments, and what to
do when you have a function that you want to use that doesn’t quite match the
signature. Here’s an example:
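Something like this (a sketch of the shapes the rest of this post assumes; the
exact signatures are a guess, not the original code):

// Hypothetical reconstruction.
type Translator func(s string) string

// RunTwice applies the Translator twice to an input string.
func RunTwice(t Translator, input string) string {
    return t(t(input))
}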
Now, what if you want to use RunTwice with a function that needs more inputs
than just a string?
func Append(orig, suffix string) string {
    return orig + suffix
}

func do() {
    orig := "awesome"
    bang := "!"
    s := RunTwice(Append(orig, )) // wait, that won't work
    fmt.Println(s)
}
The answer is the magic of closures. Closures are anonymous functions that
“close over” or save copies of all local variables so they can be used later.
You can write a closure that captures the bang, and returns a function that’ll
have the Translator signature.
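Something like this (a sketch, using the signatures assumed above):

func do() {
    orig := "awesome"
    bang := "!"
    // the anonymous function closes over bang and matches the Translator signature
    addBang := func(s string) string {
        return Append(s, bang)
    }
    s := RunTwice(addBang, orig)
    fmt.Println(s) // awesome!!
}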
Yay, that works. But it’s not reusable outside the do function. That may be
fine, it may not. If you want to do it in a reusable way (say, a lot of people
may want to adapt Append into a Translator), you can make a dedicated
function for it like this:
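A sketch of that adapter (again assuming the Translator shape above):

func AppendTranslator(suffix string) Translator {
    // the returned closure captures suffix and satisfies Translator
    return func(s string) string {
        return Append(s, suffix)
    }
}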
In AppendTranslator, we return a closure that captures the suffix, and returns a
function that, when called, will append that suffix to the string passed to the
Translator.
And now you can use AppendTranslator with RunTwice.
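For example (still using the assumed RunTwice signature):

s := RunTwice(AppendTranslator("!"), "awesome")
fmt.Println(s) // awesome!!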
January 31st 2017 was my last day at Canonical, after working for 3.5 years on
what is one of the largest open source projects written in Go -
Juju.
As of this writing, the main repo for Juju, http://github.com/juju/juju, is 3542
files, with 540,000 lines of Go code (not included in that number is 65,000
lines of comments). Counting all dependencies except the standard library, Juju
is 9523 files, holding 1,963,000 lines of Go code (not including comments, which
clock in at 331,000 lines).
These are a few of my lessons learned from my roughly 7000 hours working on this
project.
Notably, not everyone on the Juju team would agree with all of these, and the
codebase was so huge that you could work for a year and not see 2/3rds of the
codebase. So take the following with a grain of salt.
About Juju
Juju is a service orchestration tool, akin to Nomad, Kubernetes, and similar
tools. Juju consists (for the most part) of exactly two binaries: a client and
a server. The server can run in a few different modes (it used to be multiple
binaries, but they were 99% the same code, so it was easier to just make one
binary that can be shipped around). The server runs on a machine in the cloud
of your choice, and copies of the binary are installed on new machines in the
cloud so they can be controlled by the central server. The client and the
auxiliary machines talk to the main server via RPC over websockets.
Juju is a monolith. There are no microservices, everything runs in a single
binary. This actually works fairly well, since Go is so highly concurrent,
there’s no need to worry about any one goroutine blocking anything else. It
makes it convenient to have everything in the same process. You avoid
serialization and other interprocess communication overhead. It does lend
itself to making code more interdependent, and separation of concerns was not
always the highest priority. However, in the end, I think it was much easier to
develop and test a monolith than it would have been if it were a bunch of
smaller services, and proper layering of code and encapsulation can help a lot
with spaghetti code.
Package Management
Juju did not use vendoring. I think we should have, but the project was started
before any of the major vendoring tools were out there, and switching never felt
like it was worth the investment of time. Now, we did use Roger Peppe’s
godeps (not the same as godep btw) to pin
revisions. The problem is that it messes with other repos in your GOPATH,
setting them to a specific commit hash, so if you ever go to build something
else that doesn’t use vendoring, you’d be building from a non-master branch.
However, the revision pinning gave us repeatable builds (so long as no one did
anything truly heinous to their repo), and it was basically a non-issue except
that the file that holds the commit hashes was continually a point of merge
conflicts. Since it changed so often, by so many developers, it was bound to
happen that two people change the same or adjacent lines in the file. It became
such a problem I started working on an automatic resolution tool (since godeps
holds the commit date of the hash you’re pinning, you could almost always just
pick the newer hash). This is still a problem with glide and any similar tool
that stores dependency hashes in a single file. I’m not entirely sure how to
fix it.
Overall, I never felt that package management was a huge issue. It was a minor
thing in our day to day work… which is why I always thought it was weird to
read all the stories about people rejecting Go because of lack of package
management solutions. Because most third party repos maintained stable APIs for
the same repo, and we could pin our code to use a specific commit… it just was
not an issue.
Project Organization
Juju is about 80% monorepo (at github.com/juju/juju), with about 20% of the
code in separate repos (under github.com/juju). The monorepo approach has pros
and cons… It is easy to do sweeping changes across the codebase, but it also means
that it doesn’t feel like you need to maintain a stable API in
foo/bar/baz/bat/alt/special … so we didn’t. And that means that it would be
essentially insane for anyone to actually import any package from under the main
monorepo and expect it to continue to exist in any meaningful way at any future
date. Vendoring would save you, but if you ever needed to update, good luck.
The monorepo also meant that we were less careful about APIs, less careful about
separation of concerns, and the code was more interdependent than it possibly
could have been. Not to say we were careless, but I feel like things outside
the main Juju repo were held to a higher standard as far as separation of
concerns and the quality and stability of the APIs. Certainly the documentation
for external repos was better, and that might be enough of a determining factor by
itself.
The problem with external repos was package management and keeping changes
synchronized across repos. If you updated an external repo, you needed to then
check in changes to the monorepo to take advantage of that. Of course, there’s
no way to make that atomic across two github repos. And sometimes the change to
the monorepo would get blocked by code reviews or failing tests or whatever,
then you have potentially incompatible changes sitting in an external repo,
ready to trip up anyone who might decide to make their own changes to the
external repo.
The one thing I will say is that utils repos are nefarious. Many times we’d want to
backport a fix in some subpackage of our utils repo to an earlier version of
Juju, only to realize that many many other unrelated changes get pulled along
with that fix, because we have so much stuff in the same repo. Thus we’d have
to do some heinous branching and cherry picking and copypasta, and it’s bad and don’t do it.
Just say no to utils packages and repos.
Overall Simplicity
Go’s simplicity was definitely a major factor in the success of the Juju
project. Only about one third of the developers we hired had worked with Go
before. The rest were brand new. After a week, most were perfectly proficient.
The size and complexity of the product were a much bigger problem for developers
than the language itself. There were still some times when the more experienced
Go developers on the team would get questions about the best way to do X in Go,
but it was fairly rare. Contrast this to my previous job, working in C#, where I
was constantly explaining different parts of the language, or why something works
one way and not another.
This was a boon to the project in that we could hire good developers in general,
not just those who had experience in the language. And it meant that the
language was never a barrier to jumping into a new part of the code. Juju was
huge enough that no one person could know the fine details of the whole thing.
But just about anyone could jump into a part of the code and figure out what 100
or so lines of code surrounding a bug were supposed to do, and how they were
doing it (more or less). Most of the problems with learning a new part of the
code were the same as it would have been in any language - what is the architecture, how
is information passed around, what are the expectations.
Because Go has so little magic, I think this was easier than it would have
been in other languages. You don’t have the magic that other languages have
that can make seemingly simple lines of code have unexpected functionality. You
never have to ask “how does this work?”, because it’s just plain old Go code.
Which is not to say that there isn’t still a lot of complex code with a lot of
cognitive overhead and hidden expectations and preconditions… but it’s at
least not intentionally hidden behind language features that obscure the basic
workings of the code.
Testing
Test Suites
In Juju we used Gustavo Niemeyer’s gocheck to run
our tests. Gocheck’s test suite style encouraged full stack testing by reducing
the developer overhead for spinning up a full Juju server and mongo database
before each test. Once that code was written, as huge as it was, you could just
embed that “base suite” in your test suite struct, and it would automatically do
all the dirty work for you. This meant that our unit tests took almost 20
minutes to run even on a high end laptop, because they were doing so much for
each test. It also made them brittle (because they were running so much code)
and hard to understand and debug. To understand why a test was passing or
failing, you had to understand all the code that ran before the open brace of
your test function, and because it was easy to embed a suite within a suite,
there was often a LOT that ran before that open brace.
In the future, I would stick with the standard library for testing instead. I
like the fact that tests with the standard library are written just like normal
Go code, and I like how explicit the dependencies have to be. If you want to run
code at the beginning of your test, you can just put a method there… but you
have to put a method there.
time in a bottle
The time package is the bane of tests and testable code. If you have code that
times out after 30 seconds, how do you test it? Do you make a test that takes
30 seconds to run? Do the rest of the tests take 30 seconds to run if something
goes wrong? This isn’t just related to time.Sleep but time.After or
time.Ticker…. it’s all a disaster during tests. And not to mention that test
code (especially when run under -race) can go a lot slower than your code does
in production.
The cure is to mock out time… which of course is non-trivial because the time
package is just a bunch of top level functions. So everywhere that was using
the time package now needs to take your special clock interface that wraps time
and then for tests you pass in a fake clock that you can control. It took us
a long time to pull the trigger on this, and longer still to propagate the changes
throughout our code. For a long time it was a constant source of flakey tests.
Tests that would pass most of the time, but if the CI machine were slow that
day, some random test would fail. And when you have hundreds of thousands of
lines of tests, chances are SOMETHING is going to fail, and chances are it’s not
the same thing as what failed last time. Fixing flakey tests was a constant
game of whack-a-mole.
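Here’s a minimal sketch of the wrap-the-clock idea; the interface below is
illustrative, not Juju’s actual clock package:

package waiter

import (
    "errors"
    "time"
)

// Clock wraps the parts of the time package the code needs, so tests can
// substitute a controllable fake.
type Clock interface {
    Now() time.Time
    After(d time.Duration) <-chan time.Time
}

// realClock satisfies Clock by delegating to the real time package.
type realClock struct{}

func (realClock) Now() time.Time                         { return time.Now() }
func (realClock) After(d time.Duration) <-chan time.Time { return time.After(d) }

// WaitForResult takes a Clock rather than calling time.After directly, so a
// test can use a fake whose After channel it controls.
func WaitForResult(c Clock, results <-chan string) (string, error) {
    select {
    case r := <-results:
        return r, nil
    case <-c.After(30 * time.Second):
        return "", errors.New("timed out waiting for result")
    }
}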
Cross Compilation Bliss
I don’t have the exact number of combinations, but the Juju server was built to
run on Windows and Linux (Centos and Ubuntu), and across many more
architectures than just amd64, including some wacky ones like ppc64le, arm64,
and s390x.
In the beginning, Juju used gccgo for builds that the gc compiler did not
support. This was a source of a few bugs in Juju, where gccgo did something
subtly wacky. When gc was updated to support all architectures, we were very
happy to leave the extra compiler by the wayside and be able to work with just
gc.
Once we switched to gc, there were basically zero architecture-specific bugs.
This is pretty awesome, given the breadth of architectures Juju supported, and
the fact that usually the people using the wackier ones were big companies that
had a lot of leverage with Canonical.
Multi-OS Mistakes
In the beginning when we were ramping up Windows support, there were a few OS
specific bugs (we all developed on Ubuntu, and so Windows bugs often didn’t get
caught until CI ran). They basically boiled down to two common mistakes related
to filesystems.
The first was assuming forward slashes for paths in tests. So, for example, if
you know that a config file should be in the “juju” subfolder and called
“config.yml”, then your test might check that the file’s path is folder +
“/juju/config.yml” - except that on Windows it would be folder +
“\juju\config.yml”.
When making a new path, even in tests, use filepath.Join, not path.Join, and
definitely not concatenated strings and slashes. filepath.Join will do the
right thing with separators for the OS. For comparing paths, use
filepath.ToSlash to convert a path to a canonical forward-slash string that you
can then compare against.
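For example (a sketch, using the config file from above):

// builds folder/juju/config.yml on Linux and folder\juju\config.yml on Windows
cfgPath := filepath.Join(folder, "juju", "config.yml")

// when a test needs to compare against a hard-coded forward-slash path,
// normalize first
got := filepath.ToSlash(cfgPath)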
The other common mistake was for Linux developers to assume you can delete/move
a file while it’s open. This doesn’t work on Windows, because Windows locks the
file while it’s open. This often came in the form of a deferred delete (e.g.
defer os.Remove(path)) registered after the deferred file.Close(); since defers
run LIFO, the delete would fire first and try to delete the file while it was
still open. Oops. One fix is to just
always call file.Close() before doing a move or delete. Note that you can call
Close multiple times on a file, so this is safe to do even if you also have a
defer file.Close() that’ll fire at the end of the function.
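Something like this (a sketch, with error handling mostly elided):

f, err := os.Create(path)
if err != nil {
    return err
}
// safety net; calling Close again later just returns an error we can ignore
defer f.Close()

// ... write to f ...

// explicitly close before removing, so Windows releases its lock on the file
f.Close()
return os.Remove(path)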
None of these were difficult bugs, and I credit the strong cross platform
support of the stdlib for making it so easy to write cross platform code.
Error Handling
Go’s error handling has definitely been a boon to the stability of Juju. The
fact that you can tell where any specific function may fail makes it a lot
easier to write code that expects to fail and does so gracefully.
For a long time, Juju just used the standard errors package from the stdlib.
However, we felt like we really wanted more context to better trace the path of
the code that caused the error, and we thought it would be nice to keep more
detail about an error while being able to add context to it (for example,
wrapping with fmt.Errorf loses the information from the original error, like
whether it would satisfy os.IsNotExist).
A couple years ago we went about designing an errors package to capture more
context without losing the original error information. After a lot of
bikeshedding and back and forth, we consolidated our ideas in
https://github.com/juju/errors. It’s not a perfect library, and it has grown
bloated with functions over the years, but it was a good start.
The main problem is that it requires you to always call errors.Trace(err) when
returning an error to grab the current file and line number to produce a
stack-trace-like thing. These days I would choose Dave Cheney’s
github.com/pkg/errors, which grabs a stack
trace at creation time and avoids all the manual tracing. To be honest, I haven’t found
stack traces in errors to be super useful. In practice, unforeseen errors still
have enough context just from fmt.Errorf(“while doing foo: %v”, err) that you
don’t really need a stack trace most of the time. Being able to investigate
properties of the original error can sometimes come in handy, though probably
not as often as you think. If foobar.Init() returns something that’s an
os.IsNotFound, is there really anything your code can do about it? Most of the
time, no.
Stability
For a huge project, Juju is very stable (which is not to say that it didn’t have
plenty of bugs… I just mean it almost never crashed or grossly
malfunctioned). I think a lot of that comes from the language. The company
where I worked before Canonical had a million line C# codebase, and it would
crash with null reference exceptions and unhandled exceptions of various sorts
fairly often. I honestly don’t think I ever saw a nil pointer panic from
production Juju code, and only occasionally when I was doing something really
dumb in brand new code during development.
I credit this to go’s pattern of using multiple returns to indicate errors. The
foo, err := pattern and always always checking errors really makes for very
few nil pointers being passed around. Checking an error before accessing the
other variable(s) returned is a basic tenet of Go, so much so that we document
the exceptions to the rule. The extra error return value cannot be ignored or
forgotten thanks to unused variable checks at compile time. This makes the
problem of nil pointers in Go fairly well mitigated, compared to other similar
languages.
Generics
I’m going to make this section short, because, well, you know. Only once or
twice did I ever personally feel like I missed having generics while working on
Juju. I don’t remember ever doing a code review and wishing for generics for
someone else’s code. I was mostly happy not to have to grok the cognitive
complexity I’d come to be familiar with in C# with generics. Interfaces are
good enough 99% of the time. And I don’t mean interface{}. We used
interface{} rarely in Juju, and almost always it was because some sort of
serialization was going on.
Next Time
This is already a pretty long post, so I think I’ll cap it here. I have a lot
of more specific things that I can talk about… about APIs, versioning, the
database, refactoring, logging, idioms, code reviews, etc.
Writing libraries in Go is a relatively well-covered topic, I think… but I see
a lot fewer posts about writing commands. When it comes down to it, all Go code
ends up in a command. So let’s talk about it! This will be the first in a
series, since I ended up having a lot more to say than I realized.
Today I’m going to focus on basic project layout, with the aims of optimizing
for reusability and testability.
There are three unique bits about commands that influence how I structure my
code when writing a command rather than a library:
Package main
This is the only package a go program must have. However, aside from telling
the go tool to produce a binary, there’s one other unique thing about package
main - no one can import code from it. That means that any code you put in
package main can not be used directly by another project, and that makes the OSS
gods sad. Since one of the main reasons I write open source code is so that
other developers may use it, this goes directly against my desires.
There have been many times when I’ve thought “I’d love to use the logic behind X
Go binary as a part of my code”. If that logic is in package main, you can’t.
os.Exit
If you care about producing a binary that does what users expect, then you
should care about what exit code your binary exits with. The only way to do
that is to call os.Exit (or call something that calls os.Exit, like log.Fatal).
However, you can’t test a function that calls os.Exit. Why? Because calling
os.Exit during a test exits the test executable. This is quite hard to figure
out if you end up doing it by accident (which I know from personal experience).
When running tests, no tests actually fail, the tests just exit sooner than they
should, and you’re left scratching your head.
The easiest thing to do is don’t call os.Exit. Most of your code shouldn’t be
calling os.Exit anyway… someone’s going to get real mad if they import your
library and it randomly causes their application to terminate under some
conditions.
So, only call os.Exit in exactly one place, as near to the “exterior” of your
application as you can get, with minimal entry points. Speaking of which…
func main()
It’s the one function all Go commands must have. You’d think that
everyone’s func main would be different, after all, everyone’s application is
different, right? Well, it turns out, if you really want to make your code
testable and reusable, there’s really only approximately one right answer to
“what’s in your main function?”
In fact, I’ll go one step further, I think there’s only approximately one right
answer to “what’s in your package main?” and that’s this:
// command main documentation here.
package main

import (
    "os"

    "github.com/you/proj/cli"
)

func main() {
    os.Exit(cli.Run())
}
That’s it. This is approximately the most minimal code you can have in a useful
package main, thereby wasting no effort on code that others can’t reuse. We
isolated os.Exit to a single line function that is the very exterior of our
project, and effectively needs no testing.
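The layout the next few paragraphs walk through looks roughly like this (a
sketch; the file names are illustrative):

github.com/you/proj/
    LICENSE
    README.md
    main.go
    cli/
        cli.go
    run/
        run.go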
We know what’s in main.go… and in fact, main.go is the only go file in the
main package. LICENSE and README.md should be self-explanatory. (Always
use a license! Otherwise many people won’t be able to use your code.)
Now we come to the two subdirectories, run and cli.
CLI
The cli package contains the command line parsing logic. This is where you
define the UI for your binary. It contains flag parsing, arg parsing, help
text, etc.
It also contains the code that returns the exit code to func main (which gets
sent to os.Exit). Thus, you can test exit codes returned from those functions,
instead of trying to test exit codes your binary as a whole produces.
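A sketch of what that might look like (run.Do is a hypothetical entry point
into the run package, not a prescribed API):

package cli

import (
    "flag"
    "fmt"
    "os"

    "github.com/you/proj/run"
)

// Run parses the command line and returns the code that func main hands to
// os.Exit.
func Run() int {
    verbose := flag.Bool("v", false, "verbose output")
    flag.Parse()

    if err := run.Do(flag.Args(), *verbose); err != nil {
        fmt.Fprintln(os.Stderr, err)
        return 1
    }
    return 0
}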
Run
The run package contains the meat of the logic of your binary. You should write
this package as if it were a standalone library. It should be far removed from
any thoughts of CLI, flags, etc. It should take in structured data and return
errors. Pretend it might get called by some other library, or a web service, or
someone else’s binary. Make as few assumptions as possible about how it’ll be
used, just as you would a generic library.
Now, obviously, larger projects will require more than one directory. In fact,
you may want to split out your logic into a separate repo. This kind of depends
on how likely you think it’ll be that people want to reuse your logic. If you
think it’s highly likely, I recommend making the logic a separate directory. In
my mind, a separate directory for the logic shows a stronger commitment to
quality and stability than some random directory nestled deep in a repo
somewhere.
Putting it together
The cli package forms a command line frontend for the logic in the run package.
If someone else comes along, sees your binary, and wants to use the logic behind
it for a web API, they can just import the run package and use that logic
directly. Likewise, if they don’t like your CLI options, they can easily write
their own CLI parser and use it as a frontend to the run package.
This is what I mean about reusable code. I never want someone to have to hack
apart my code to get more use out of it. And the best way to do that is to
separate the UI from the logic. This is the key part. Don’t let your UI
(CLI) concepts leak into your logic. This is the best way to keep your logic
generic, and your UI manageable.
Larger Projects
This layout is good for small to medium projects. There’s a single binary that
is in the root of the repo, so it’s easier to go-get than if it’s under multiple
subdirectories. Larger projects pretty much throw everything out the window.
They may have multiple binaries, in which case they can’t all be in the root of
the repo. However, such projects usually also have custom build steps and
require more than just go-get (which I’ll talk about later).
When working on Gorram, I decided I
wanted to release it via a vanity import path. After all, that’s half the
reason I got npf.io in the first place (an idea blatantly stolen from Russ Cox’s
rsc.io).
What is a vanity import path? It is explained in the go get
documentation. If you’re not hosted on one
of the well known hosting sites (github, bitbucket, etc), go get has to figure
out how to get your code. How it does this is fairly ingenious - it performs an
http GET of the import path (first https then http) and looks for specific meta
elements in the page’s header. The header elements tell go get what type of
VCS is being used and what address to use to get the code.
The great thing about this is that it removes the dependency of your code on any
one code hosting site. If you want to move your code from github to bitbucket,
you can do that without breaking anyone.
So, the first thing you need to host your own vanity imports is something that
will respond to those GET requests with the right response. You could do
something complicated like a special web application running on a VM in the
cloud, but that costs money and needs maintenance. Since I already had a Hugo
website (running for free on github pages), I wanted to see if I could use that.
It’s a slightly more manual process, but the barrier of entry is a lot lower and
it works on any free static hosting (like github pages).
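The key is the go-import meta tag, which (per the go get documentation) looks
like this:

<meta name="go-import" content="import-prefix vcs repo-root">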
Where import-prefix is a string that matches a prefix of the import statement
used in your code, vcs is the type of source control used, and repo-root is the
root of the VCS repo where your code lives.
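For this blog, for example, the tag for the gorram repo would be:

<meta name="go-import" content="npf.io/gorram git https://github.com/natefinch/gorram">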
What’s important to note here is that these should be set this way for packages
in subdirectories as well. So, for npf.io/gorram/run, the meta tag should still
be as above, since it matches a prefix of the import path, and the root of the
repo is still github.com/natefinch/gorram. (We’ll get to how to handle
subdirectories later.)
You need a page serving that meta tag to live at the exact same place as the import
statement… that generally will mean it needs to be in the root of your domain
(I know that I, personally don’t want to see go get npf.io/code/gorram when I
could have go get npf.io/gorram).
The easiest way to do this and keep your code organized is to put all your pages
for code into a new directory under content called “code”. Then you just need
to set the “permalink” for the code type in your site’s config file thusly:
[Permalinks]
code = "/:filename/"
Then your content’s filename (minus extension) will be used as its url relative
to your site’s base URL. Following the same example as above, I have
content/code/gorram.md which will make that page now appear at npf.io/gorram.
Now, for the content. I don’t actually want to have to populate this page with
content… I’d rather people just get forwarded on to github, so that’s what
we’ll do, by using a refresh header. So here’s our template, that’ll live under layouts/code/single.html:
This will generate a page that will auto-forward anyone who hits it on to your
github account. Now, there’s one more (optional but recommended) piece - the
go-source meta header. This is only relevant to godoc.org, and tells godoc how
to link to the source code for your package (so links on godoc.org will go
straight to github and not back to your vanity url, see more details here).
Now all you need is to put a value of vanity = https://github.com/you/yourrepo
in the frontmatter of the correct page, and the template does the rest. If your
repo has multiple directories, you’ll need a page for each directory (such as
npf.io/gorram/run). This would be kind of a drag, making the whole directory
structure with content docs in each, except there’s a trick you can do here to
make that easier.
I recently landed a change in Hugo that lets you customize the rendering of
alias pages. Alias pages are pages that are mainly used to redirect people from
an old URL to the new URL of the same content. But in our case, they can serve
up the go-import and go-source meta headers for subdirectories of the main code
document. To do this, make an alias.html template in the root of your layouts
directory, and make it look like this:
Other than the stuff in the if statement, the rest is the default alias page
that Hugo creates anyway. The stuff in the if statement is basically the same
as what’s in the code template, just with an extra indirection of specifying
.Page first.
Note that this change to Hugo is in master but not in a release yet. It’ll be
in 0.18, but for now you’ll have to build master to get it.
Now, to produce pages for subpackages, you can just specify aliases in the front
matter of the original document with the alias being the import path under the
domain name:
aliases = [ "gorram/run", "gorram/cli" ]
So your entire content only needs to look like this:
+++
date = 2016-10-02T23:00:00Z
title = "Gorram"
vanity = "https://github.com/natefinch/gorram"
aliases = [
"/gorram/run",
"/gorram/cli",
]
+++
Any time you add a new subdirectory to the package, you’ll need to add a new
alias, and regenerate the site. This is unfortunately manual, but at least it’s
a trivial amount of work.
That’s it. Now go get (and godoc.org) will know how to get your code.
Note that now we can drop the error checking in NewTool because the compiler does it for us. The ToolType still works in all ways like a string, so it’s trivial to convert for printing, serialization, etc.
However, this still lets you do something which is wrong but might not always look wrong:
a := NewTool("drill")
Because of how Go constants work, this will get converted to a ToolType, even though it’s not one of the ones we have defined.
The final revision, which is the one I’d propose, removes even this possibility, by not using a string at all (it also uses a lot less memory and creates less garbage):
package tool

type ToolType int

const (
    Screwdriver ToolType = iota
    Hammer
    // ...
)

type Tool struct {
    typ ToolType
}

func NewTool(tooltype ToolType) Tool {
    return Tool{typ: tooltype}
}
This now prevents passing in a constant string that looks like it might be right. You can pass in a constant number, but NewTool(5) is a hell of a lot more obviously wrong than NewTool("drill"), IMO.
The push back I’ve heard about this is that then you have to manually write the String() function to make human-readable strings… but there are code generators that already do this for you in extremely optimized ways (see https://github.com/golang/tools/blob/master/cmd/stringer/stringer.go)
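For context, the two options being compared below are, roughly, returning the
error untouched versus wrapping it with fmt.Errorf:

// the former: pass the original error up the stack unchanged
if err != nil {
    return err
}

// the latter: add context, but replace the original error with a new one
if err != nil {
    return fmt.Errorf("can't find default config file: %v", err)
}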
The former passes the original error up the stack, but adds no context to it.
Thus, your saveConfig function may end up printing “file not found:
default.cfg” without telling the caller why it was trying to open default.cfg.
The latter allows you to add context to an error, so the above error could
become “can’t find default config file: file not found: default.cfg”.
This gives nice context to the error, but unfortunately, it creates an entirely
new error that only maintains the error string from the original. This is fine
for human-facing output, but is useless for error handling code.
If you use the former code, calling code can then use os.IsNotExist(), figure
out that it was a not found error, and create the file. Using the latter code,
the type of the error is now a different type than the one from os.Open, and
thus will not return true from os.IsNotExist. Using fmt.Errorf effectively
masks the original error from calling code (unless you do ugly string parsing -
please don’t).
Sometimes it’s good to mask the original error, if you don’t want your callers
depending on what should be an implementation detail (thus effectively making it
part of your API contract). However, lots of times you may want to give your
callers the ability to introspect your errors and act on them. This then loses
the opportunity to add context to the error, and so people calling your code
have to do some mental gymnastics (and/or look at the implementation) to
understand what an error really means.
A further problem for both these cases is that when debugging, you lose all
knowledge of where an error came from. There’s no stack trace, there’s not even
a file and line number of where the error originated. This can make debugging
errors fairly difficult, unless you’re careful to make your error messages easy
to grep for. I can’t tell you how often I’ve searched for an error formatting
string, and hoped I was guessing the format correctly.
This is just the way it is in Go, so what’s a developer to do? Why, write an
errors library that does smarter things of course! And there are a ton of these
things out there. Many add a stack trace at error creation time. Most wrap an
original error in some way, so you can add some context while keeping the
original error for checks like os.IsNotExist. At Canonical, the Juju team wrote
just such a library (actually we wrote 3 and then had them fight until only one
was standing), and the result is https://github.com/juju/errors.
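The usage being described next looks roughly like this (errors.Annotate and
errors.Cause are real juju/errors functions, but the surrounding code is a
sketch):

if err := checkDefault(); err != nil {
    return errors.Annotate(err, "can't find default config file")
}

// later, calling code can still get at the original error:
if os.IsNotExist(errors.Cause(err)) {
    // create the file
}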
This returns a new error created by the errors package which adds the given
string to the front of the original error’s message (just like
fmt.Errorf), but you can introspect it using errors.Cause(err) to access the
original error returned by checkDefault. Thus you can use
os.IsNotExist(errors.Cause(err)) and it’ll do the right thing.
However, this and every other special error library suffer from the same problem
- your library can only understand its own special errors. And no one else’s
code can understand your errors (because they won’t know to use errors.Cause
before checking the error). Now you’re back to square one - your errors are
just as opaque to third party code as if they were created by fmt.Errorf.
I don’t really have an answer to this problem. It’s inherent in the
functionality (or lack thereof) of the standard Go error type.
Obviously, if you’re writing a standalone package for many other people to use,
don’t use a third party error wrapping library. Your callers are likely not
going to be using the same library, so they won’t get use out of it, and it adds
unnecessary dependencies to your code. To decide between returning the original
error and an annotated error using fmt.Errorf is harder. It’s hard to know when
the information in the original error might be useful to your caller. On the
other hand, the additional context added by fmt.Errorf can often change an
inscrutable error into an obvious one.
If you’re writing an application where you’ll be controlling most of the
packages being written, then an errors package may make sense… but you still
run the risk of giving your custom errors to third party code that can’t
understand them. Plus, any errors library adds some complexity to the code (for
example, you always have to remember to call os.IsNotExist(errors.Cause(err))
rather than just os.IsNotExist(err)).
You have to choose one of the three options every time you return an error.
Choose carefully. Sometimes you’re going to make a choice that makes your life
more difficult down the road.
True story. The idea was this package would be a lieutenant commander (get
it?)… but I also knew I didn’t want to have to try to spell lieutenant
correctly every time I used the package. So that’s why it’s called deputy.
He’s the guy who’s not in charge, but does all the work.
Errors
At Juju, we run a lot of external processes
using os/exec. However, the default functionality of an exec.Cmd object is kind
of lacking. The most obvious problem is those “exit status 1” error returns.
Fantastic. Have you ever wished you could just have the stderr from the command
as the error text? Well, now you can, with deputy.
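The usage looks something like this (a sketch from memory, so treat the field
names as assumptions rather than deputy’s exact API):

d := deputy.Deputy{
    // turn whatever the command writes to stderr into the returned error text
    Errors: deputy.FromStderr,
}
err := d.Run(exec.Command("docker", "rm", "bar"))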
In the above code, if the command run by Deputy exits with a non-zero exit
status, deputy will capture the text output to stderr and convert that into the
error text. e.g. if the command returned exit status 1 and output “Error: No
such image or container: bar” to stderr, then the error’s Error() text would
look like “exit status 1: Error: No such image or container: bar”. Bam, the
errors from commands you run are infinitely more useful.
Logging
Another idiom we use is to pipe some of the output from a command to our logs. This can be super useful for debugging purposes. With deputy, this is again easy:
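Again, a sketch from memory of what that looks like (the StdoutLog field name
is recalled, not verified):

d := deputy.Deputy{
    // pipe each chunk of the command's stdout into our logger
    StdoutLog: func(b []byte) { log.Print(string(b)) },
}
err := d.Run(exec.Command("foo", "bar"))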
That’s it. Now every line written to stdout by the process will be piped as a
log message to your log.
Timeouts
Finally, an idiom we don’t use often enough, but should, is to add a timeout to
command execution. What happens if you run a command as part of your pipeline
and that command hangs for 30 seconds, or 30 minutes, or forever? Do you just
assume it’ll always finish in a reasonable time? Adding a timeout to running
commands requires some tricky coding with goroutines, channels, selects, and
killing the process… and deputy wraps all that up for you in a simple API:
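Roughly like this (again from memory; the timeout field in particular is an
assumption about the API):

d := deputy.Deputy{
    // give up if the command hasn't finished within 30 seconds
    Timeout: 30 * time.Second,
}
err := d.Run(exec.Command("foo", "bar"))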
In Juju, we often have code that needs to run external
executables. Testing this code is a nightmare… because you really don’t want
to run those files on the dev’s machine or the CI machine. But mocking out
os/exec is really hard. There’s no interface to replace, there’s no function to
mock out and replace. In the end, your code calls the Run method on the
exec.Cmd struct.
There’s a bunch of bad ways you can mock this out - you can write out scripts to
disk with the right name and structure their contents to write out the correct
data to stdout, stderr and return the right return code… but then you’re
writing platform-specific code in your tests, which means you need a Windows
version and a Linux version… It also means you’re writing shell scripts or
Windows batch files or whatever, instead of writing Go. And we all know that we
want our tests to be in Go, not shell scripts.
So what’s the answer? Well, it turns out, if you want to mock out exec.Command,
the best place to look is in the exec package’s tests themselves. Lo and
behold, it’s right there in the first function of exec_test.go
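The function looks roughly like this (lightly adapted to the fakeExecCommand
name used below; the pattern is the one from the stdlib’s exec tests):

func fakeExecCommand(command string, args ...string) *exec.Cmd {
    // run the test binary itself, telling it to run only TestHelperProcess
    cs := []string{"-test.run=TestHelperProcess", "--", command}
    cs = append(cs, args...)
    cmd := exec.Command(os.Args[0], cs...)
    // the helper test does nothing unless this variable is set
    cmd.Env = []string{"GO_WANT_HELPER_PROCESS=1"}
    return cmd
}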
What the heck is that doing? It’s pretty slick, so I’ll explain it.
First off, you have to understand how tests in Go work. When running go test,
the go tool compiles an executable from your code, runs it, and passes it the
flags you passed to go test. It’s that executable which actually handles the
flags and runs the tests. Thus, while your tests are running, os.Args[0] is the
name of the test executable.
This function is making an exec.Command that runs the test executable, and
passes it the flag to tell the executable just to run a single test. It then
terminates the argument list with -- and appends the command and arguments
that would have been given to exec.Command to run your command.
The end result is that when you run the exec.Cmd that is returned, it will run
the single test from this package called “TestHelperProcess” and os.Args will
contain (after the --) the command and arguments from the original call.
The environment variable is there so that the test can know to do nothing unless
that environment variable is set.
This is awesome for a few reasons:
It’s all Go code. No more needing to write shell scripts.
The code run in the executable is compiled with the rest of your test code. No more needing to worry about typos in the strings you’re writing to disk.
No need to create new files on disk - the executable is already there and runnable, by definition.
So, let’s use this in a real example to make it more clear.
In your production code, you can do something like this:
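For instance (a sketch; the docker command and the function name are just
illustrative, and the indirection through a variable is the point):

// execCommand is a package-level hook so tests can substitute fakeExecCommand.
var execCommand = exec.Command

func dockerPS() ([]byte, error) {
    cmd := execCommand("docker", "ps", "-a")
    return cmd.CombinedOutput()
}

In the test, you set execCommand = fakeExecCommand (and restore it afterward),
and implement TestHelperProcess to print whatever output the test needs.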
Of course, you can do a lot more interesting things. The environment variables
on the command that fakeExecCommand returns make a nice side channel for telling
the executable what you want it to do. I use one to tell the process to exit
with a non-zero error code, which is great for testing your error handling code.
You can see how the standard library uses its TestHelperProcess test
here.
Hopefully this will help you avoid writing really gnarly testing code (or even worse,
not testing your code at all).
I had a problem yesterday - I wanted to use the excellent godoc.org to show
coworkers the godoc for the feature I was working on. However, the feature was
on a branch of the main code in Github, and go get Does Not Work That Way™.
So, what to do? Well, I figured out a hack to make it work.
https://gopkg.in is a super handy service that lets you point go get at
branches of your repo named vN (e.g. v0, v1, etc). It also happens to work on
tags. So, we can leverage this to get godoc.org to render the godoc for our WIP
branch.
From your WIP branch, simply do
git tag v0
git push myremote v0
This creates a lightweight tag that only affects your repo (not upstream from
whence you forked).
Then point godoc.org at the gopkg.in path for your repo (e.g.
godoc.org/gopkg.in/you/yourrepo.v0). This will tell godoc to ‘go get’ your code
from gopkg.in, and gopkg.in will redirect that to your v0 tag, which is
currently on your branch. Bam, now you have godoc for your WIP branch on
godoc.org.
Later, the tag can easily be removed (and reused if needed) thusly:
git tag -d v0
git push myremote :refs/tags/v0
So, there you go, go forth and share your godoc. I find it’s a great way to get
feedback on architecture before I dive into the reeds of the implementation.
When people hear that Go only supports static linking, one of the things they
eventually realize is that they can’t have traditional plugins via dlls/libs (in
compiled languages) or scripts (in interpreted languages). However, that
doesn’t mean that you can’t have plugins. Some people suggest doing “compiled-
in” plugins - but to me, that’s not a plugin, that’s just code. Some people
suggest just running sub processes and sending messages via their CLI, but that
runs into CLI parsing issues and requires runnnig a new process for every
request. The last option people think of is using RPC to an external process,
which may also seem cumbersome, but it doesn’t have to be.
Serving up some pie
I’d like to introduce you to https://github.com/natefinch/pie - this is a Go
package which contains a toolkit for writing plugins in Go. It uses processes
external to the main program as the plugins, and communicates with them via RPC
over the plugin’s stdin and stout. Having the plugin as an external process can
actually has several benefits:
If the plugin crashes, it won’t crash your process.
The plugin is not in your process’ memory space, so it can’t do anything nasty.
The plugin can be written in any language, not just Go.
I think this last point is actually the most valuable. One of the nicest things
about Go applications is that they’re just copy-and-run. No one even needs to
know they were written in Go. With plugins as external processes, this remains
true. People wanting to extend your application can do so in the language of
their choice, so long as it supports the codec your application has chosen for
RPC.
The fact that the communication occurs over stdin and stdout means that there is
no need to worry about negotiating ports, it’s easily cross platform compatible,
and it’s very secure.
Orthogonality
Pie is written to be a very simple set of functions that help you set up
communication between your process and a plugin process. Once you make a couple
calls to pie, you then need to work out your own way to use the RPC connection
created. Pie does not attempt to be an all-in-one plugin framework, though you
could certainly use it as the basis for one.
Why is it called pie?
Because if you pronounce API like “a pie”, then all this consuming and serving
of APIs becomes a lot more palatable. Also, pies are the ultimate pluggable
interface - depending on what’s inside, you can get dinner, dessert, a snack, or
even breakfast. Plus, then I get to say that plugins in Go are as easy as…
well, you know.
Conclusion
I plan to be using pie in one of my own side projects. Take it out for a spin
in one of your projects and let me know what you think. Happy eating!
I figured I’d answer it here about Go. Luckily, Go is a very small language, so there’s not a lot of surface area to dislike. However, there’s definitely some things I wish were different. Most of these are nitpicks, thus the title.
#1 Bare Returns
func foo() (i int, err error) {
i, err = strconv.ParseInt("5")
return // wha??
}
For all that Go promotes readable and immediately understandable code, this seems like a ridiculous outlier. The way it works is that if you don’t declare what the function is returning, it’ll return the values stored in the named return variables. Which seems logical and handy, until you see a 100 line function with multiple branches and a single bare return at the bottom, with no idea what is actually getting returned.
To all gophers out there: don’t use bare returns. Ever.
#2 New
a := new(MyStruct)
New means “Create a zero value of the given type and return a pointer to it”. It’s sorta like the C++ new, which is probably why it exists. The problem is that it’s nearly useless. It’s mostly redundant with simply returning the address of a value thusly:
a := &MyStruct{}
The above is a lot easier to read, it also gives you the ability to populate the value you’re constructing (if you wish). The only time new is “useful” is if you want to initialize a pointer to a builtin (like a string or an int), because you can’t do this:
a := &int
but you can do this:
a := new(int)
Of course, you could always just do it in (gasp) two lines:
a := 0
b := &a
To all the gophers out there: don’t use new. Always use &Foo{} with structs, maps, and slices. Use the two line version for numbers and strings.
#3 Close
The close built-in function closes a channel. If the channel is already closed, close will panic. This pisses me off, because most of the time when I call close, I don’t actually care if it’s already closed. I just want to ensure that it’s closed. I’d much prefer if close returned a boolean that said whether or not it did anything, and then if I choose to panic, I can. Or, you know, not.
#4 There is no 4
That’s basically it. There’s some things I think are necessary evils, like goto and panic. There’s some things that are necessary ugliness, like the built-in functions append, make, delete, etc. I sorta wish x := range foo returned the value in x and not the index, but I get that it’s to be consistent between maps and slices, and returning the value in maps would be odd, I think.
All these are even below the level of nitpicks, though. They don’t bug me, really. I understand that everything in programming is a tradeoff, and I think the decisions made for Go were the right ones in these cases. Sometimes you need goto. Sometimes you need to panic. Making those functions built-ins rather than methods on the types means you don’t need any methods on the types, which keeps them simpler, and means they’re “just data”. It also means you don’t lose any functionality if you make new named types based on them.
So that’s my list for Go.
Postscript
Someone on the twitter discussion mentioned he couldn’t think of anything he disliked about C#, which just about made me spit my coffee across the room. I programmed in C# for ~9 years, starting out porting some 1.1 code to 2.0, and leaving as 5.0 came out. The list of features in C# as of 5.0 is gigantic. Even being a developer writing in it 40+ hours a week for 9 years, there was still stuff I had to look up to remember how it worked.
I feel like my mastery of Go after a year of side projects was about equivalent to my mastery of C# after 9 years of full time development. If we assume 1:1 correlation between time to master and size of the language, an order of magnitude sounds about right.
Obviously, not everyone hates Go. But there was a quora
question recently
about why everyone criticizes Go so much. (sorry, I don’t normally post links to
Quora, but it was the motivator for this post) Even before I saw the answers to
the question, I knew what they’d consist of:
Go is a language stuck in the 70’s.
Go ignores 40 years of programming language research.
Go is a language for blue collar (mediocre) developers.
Gophers are ok with working in Java 1.0.
Unfortunately, the answers to the questions were more concerned with explaining
why Go is “bad”, rather than why this gets under so many people’s skin.
When reading the answers I had a eureka moment, and I realized why it is. So
here’s my answer to the same question. This is why Go is so heavily criticized,
not why Go is “bad”.
There’s two awesome posts that inform my answer: Paul Graham’s
post about keeping your identity
small, and Kathy Sierra’s post about the Koolaid point. I encourage you to read those two posts, as
they’re both very informative. I hesitate to compare the horrific things that
happen to women online with the pedantry of flamewars about programming
languages, but the Koolaid Point is such a valid metaphor that I wanted to link
to the article.
Paul says
people can never have a fruitful argument about
something that’s part of their identity
i.e. the subject hits too close to home,
and their response becomes emotional rather than logical.
Kathy says
the hate wasn’t so much about the product/brand but that other people were falling for it.
i.e. they’d drunk the kool-aid.
Go is the only recent language that takes the aforementioned 40 years of
programming language research and tosses it out the window. Other new languages
at least try to keep up with the Jones - Clojure, Scala, Rust - all try to
incorporate “modern programming theory” into their design. Go actively tries
not to. There is no pattern matching, there’s no borrowing, there’s no pure
functional programming, there’s no immutable variables, there’s no option types,
there’s no exceptions, there’s no classes, there’s no generics…. there’s a lot
Go doesn’t have. And in the beginning this was enough to merely earn it scorn.
Even I am guilty of this. When I first heard about Go, I thought “What? No
exceptions? Pass.”
But then something happened - people started using it. And liking it. And
building big projects with it. This is the Koolaid-point - where people have
started to drink the Koolaid and get fooled into thinking Go is a good
language. And this is where the scorn turns into derision and attacks on the
character of the people using it.
The most vocal Go detractors are those developers who write in ML-derived
languages (Haskell, Rust, Scala, et al) who have tied their preferred
programming language into their identity. The mere existence of Go says
“your views on what makes a good programming language are wrong”. And the more
people that use and like Go, the more strongly they feel that they’re being told
their choice of programming language - and therefore their identity - is wrong.
Note that basically no one in the Go community actually says this. But the Go
philosophy of simplicity and pragmatism above all else is the polar opposite of
what those languages espouse (in which complexity in the language is ok because
it enforces correctness in the code). This is insulting to the people who tie
their identity to that language. Whenever a post on Go makes it to the front
page of Hacker News, it is an affront to everything they hold dear, and so you
get comments like Go developers are stuck in the 70’s, or is only for blue-collar devs.
So, this is why I think people are so much more vocal about their dislike of Go:
because it challenges their identity, and other people are falling for it. This
is also why these posts so often mention Google and how the language would have
died without them. Google is now the koolaid dispenser. The fact that they
are otherwise generally thought of as a very talented pool of developers means
that it is simultaneously more outrageous that they are fooling people and more
insulting that their language flies in the face of ML-derived languages.
Steve Francia asked me to help him get
Discourse deployed as a place for people to discuss
Hugo, his static site generator (which is what I use to
build this blog). If you don’t know Discourse, it’s pretty amazing forum
software with community-driven moderation, all the modern features you expect
(@mentions, SSO integration, deep email integration, realtime async updates, and
a whole lot more). What I ended up deploying is now at
discuss.gohugo.io.
I’d already played around with deploying Discourse about six months ago, so I
already had an idea of what was involved. Given that I work on
Juju as my day job, of course I decided to use Juju to
deploy Discourse for Steve. This involved writing a Juju charm which is sort
of like an install script, but with hooks for updating configuration and hooks
for interacting with other services. I’ll talk about the process of writing the
charm in a later post, but for now, all you need to know is that it follows the
official install guide for installing Discourse.
The install guide says that you can install Discourse in 30 minutes. Following
it took me a lot longer than that, due to some confusion about what the
install guide really wanted you to do, and what the install really required.
But you don’t need to know any of that to use Juju to install Discourse, and you
can get it done in 8 minutes, not 30. Here’s how:
Now, Juju does not yet have a provider for Digital Ocean, so we have to use a
plugin to get the machine created. We’re in the process of writing a provider
for Digital Ocean, so soon the plugin won’t be necessary. If you use another
cloud provider, such as AWS, Azure, HP Cloud, Joyent, or run your own Openstack
or MAAS, you can easily configure Juju to use that service, and a couple of these steps will
not be necessary. I’ll post separate steps for that later. But for now, let’s
assume you’re using Digital Ocean.
Get your Digital Ocean access info
and set the client id in an environment variable called DO_CLIENT_ID and the API
key in an environment variable called DO_API_KEY.
Juju requires access with an SSH key to the machines, so make sure you have one
set up in your Digital Ocean account.
Now, let’s create a simple configuration so juju knows where you want to deploy
your new environment.
juju init
Running juju init will create a boilerplate configuration file at
~/.juju/environments.yaml. We’ll append our digital ocean config at the bottom:
(obviously replace the region with whatever one you want)
Now, it’ll take about a minute for the machine to come up.
Discourse requires email to function, so you need an account at
mandrill, mailgun, etc. They’re free, so
don’t worry. From that account you need to get some information to properly set
up Discourse. You can do this after installing discourse, but it’s faster if
you do it before and give the configuration at deploy time. (changing settings
later will take a couple minutes while discourse reconfigures itself)
When you deploy discourse, you’re going to give it a configuration file, which
will look something like this:
The first line must be the same as the name of the service you’re deploying. By
default it’s “discourse”, so you don’t need to change it unless you’re deploying
multiple copies of discourse to the same Juju environment. And remember, this
is yaml, so those spaces at the beginning of the rest of the lines are
important.
The rest should be pretty obvious. Hostname is the domain name where your site
will be hosted. This is important, because discourse will send account
activation emails, and the links will use that hostname. Developer emails are
the email addresses of accounts that should get automatically promoted to admin
when created. The rest is email-related stuff from your mail service account.
Finally, unicorn workers should just stay 3 unless you’re deploying to a machine
with less than 2GB of RAM, in which case set it to 2.
Ok, so now that you have this file somewhere on disk, we can deploy discourse.
Don’t worry, it’s really easy. Just do this:
That’s it. If you’re deploying to a 2GB Digital Ocean droplet, it’ll take about
7 minutes.
To check on the status of the charm deployment, you can do juju status, which
will show, among other things “agent-state: pending” while the charm is being
deployed. Or, if you want to watch the logs roll by, you can do juju debug-
log.
Eventually juju status will show agent-state: started. Now grab the ip
address listed at public address: in the same output and drop that into your
browser. Bam! Welcome to Discourse.
If you ever need to change the configuration you set in the config file above,
you can do that by editing the file and doing
juju set discourse --config=/path/to/config
Or, if you just want to tweak a few values, you can do
juju set discourse foo=bar baz=bat ...
Note that every time you call juju set, it’ll take a couple minutes for
Discourse to reconfigure itself, so you don’t want to be doing this over and
over if you can hep it.
Now you’re on your own, and will have to consult the gurus at
discourse.org if you have any problems. But don’t worry, since
you deployed using Juju, which uses their official install instructions, your
discourse install is just like the ones people deploy manually (albeit with a
lot less time and trouble).
Good Luck!
Please let me know if you find any errors in this page, and I will fix them
immediately.
TOML stands for Tom’s Own Minimal Language. It is a configuration language
vaguely similar to YAML or property lists, but far, far better. But before we
get into it in detail, let’s look back at what came before.
Long Ago, In A Galaxy Far, Far Away
Since the beginning of computing, people have needed a way to configure
their software. On Linux, this generally is done in text files. For simple
configurations, good old foo = bar works pretty well. One setting per line,
name on the left, value on the right, separated by an equals. Great. But when
your configuration gets more complicated, this quickly breaks down. What if you
need a value that is more than one line? How do you indicate a value should be
parsed as a number instead of a string? How do you namespace related
configuration values so you don’t need ridiculously long names to prevent
collisions?
The Dark Ages
In the 90’s, we used XML. And it sucked. XML is verbose, it’s hard for humans
to read and write, and it still doesn’t solve a lot of the problems above (like
how to specify the type of a value). In addition, the XML spec is huge,
processing is very complicated, and all the extra features invite abuse and
overcomplication.
Enlightenment
In the mid 2000’s, JSON came to popularity as a data exchange format, and it was
so much better than XML. It had real types, it was easy for programs to
process, and you didn’t have to write a spec on what values should get processed
in what way (well, mostly). It was sigificantly less verbose than XML. But it
is a format intended for computers to read and write, not humans. It is a pain
to write by hand, and even pretty-printed, it can be hard to read and the
compact data format turns into a nested mess of curly braces. Also, JSON is not
without its problems… for example, there’s no date type, there’s no support
for comments, and all numbers are floats.
A False Start
YAML came to popularity some time after JSON as a more human-readable format,
and its key: value syntax and pretty indentation is definitely a lot easier on
the eyes than JSON’s nested curly-braces. However, YAML trades ease of reading
for difficulty in writing. Indentation as delimiters is fraught with error…
figuring out how to get multiple lines of data into any random value is an
exercise in googling and trial & error.
The YAML spec is also ridiculously long. 100% compatible parsers are very
difficult to write. Writing YAML by hand is a ridden with landmines of corner
cases where your choice of names or values happens to hit a reserved word or
special marker. It does support comments, though.
The Savior
On February 23, 2013, Tom Preston-Werner (former CEO of GitHub) made his first
commit to https://github.com/toml-lang/toml. TOML stands for Tom’s Obvious,
Minimal Language. It is a language designed for configuring software. Finally.
TOML takes inspiration from all of the above (well, except XML) and even gets
some of its syntax from Microsoft’s INI files. It is easy to write by hand and
easy to read. The spec is short and understandable by mere humans, and it’s
fairly easy for computers to parse. It supports comments, has first class
dates, and supports both integers and floats. It is generally insensitive to
whitespace, without requiring a ton of delimiters.
Let’s dive in.
The Basics
The basic form is key = value
# Comments start with hash
foo = "strings are in quotes and are always UTF8 with escape codes: \n \u00E9"
bar = """multi-line strings
use three quotes"""
baz = 'literal\strings\use\single\quotes'
bat = '''multiline\literals\use
three\quotes'''
int = 5 # integers are just numbers
float = 5.0 # floats have a decimal point with numbers on both sides
date = 2006-05-27T07:32:00Z # dates are ISO 8601 full zulu form
bool = true # good old true and false
One cool point: If the first line of a multiline string (either literal or not)
is a line return, it will be trimmed. So you can make your big blocks of text
start on the line after the name of the value and not need to worry about the
extraneous newline at the beginning of your text:
preabmle = """
We the people of the United States, in order to form a more perfect union,
establish justice, insure domestic tranquility, provide for the common defense,
promote the general welfare, and secure the blessings of liberty to ourselves
and our posterity, do ordain and establish this Constitution for the United
States of America."""
Lists
Lists (arrays) are signified with brackets and delimited with commas. Only
primitives are allowed in this form, though you may have nested lists. The
format is forgiving, ignoring whitespace and newlines, and yes, the last comma
is optional (thank you!):
I love that the format is forgiving of whitespace and that last comma. I like
that the arrays are all of a single type, but allowing mixed types of sub-arrays
bugs the heck out of me.
Now we get crazy
What’s left? In JSON there are objects, in YAML there are associative arrays…
in common parlance they are maps or dictionaries or hash tables. Named
collections of key/value pairs.
In TOML they are called tables and look like this:
# some config above
[table_name]
foo = 1
bar = 2
Foo and bar are keys in the table called table_name. Tables have to be at the
end of the config file. Why? because there’s no end delimiter. All keys under
a table declaration are associated with that table, until a new table is
declared or the end of the file. So declaring two tables looks like this:
# some config above
[table1]
foo = 1
bar = 2
[table2]
foo = 1
baz = 2
The declaration of table2 defines where table1 ends. Note that you can indent
the values if you want, or not. TOML doesn’t care.
If you want nested tables, you can do that, too. It looks like this:
nested_table is defined as a value in table1 because its name starts with
table1.. Again, the table goes until the next table definition, so baz="bat"
is a value in table1.nested_table. You can indent the nested table to make it
more obvious, but again, all whitespace is optional:
Having to retype the parent table name for each sub-table is kind of annoying,
but I do like that it is very explicit. It also means that ordering and
indenting and delimiters don’t matter. You don’t have to declare parent tables
if they’re empty, so you can do something like this:
Arrays of tables inside another table get combined in the way you’d expect, like
[[table1.array]].
TOML is very permissive here. Because all tables have very explicitly defined
parentage, the order they’re defined in doesn’t matter. You can have tables (and
entries in an array of tables) in whatever order you want. This is totally
acceptable:
[[comments]]
author = "Anonymous"
text = "Love it!"
[foo.bar.baz]
bat = "hi"
[foo.bar]
howdy = "neighbor"
[[comments]]
author = "Anonymous"
text = "Love it!"
Of course, it generally makes sense to actually order things in a more organized
fashion, but it’s nice that you can’t shoot yourself in the foot if you reorder
things “incorrectly”.
Conclusion
That’s TOML. It’s pretty awesome.
There’s a list of parsers
on the TOML page on github for pretty much whatever language you want. I
recommend BurntSushi’s for Go, since it
works just like the built-in parsers.
It is now my default configuration language for all the applications I write.
The next time you write an application that needs some configuration, take a
look at TOML. I think your users will thank you.
I obviously have a lot to talk about with Hugo, so I decided I wanted to make
this into a series of posts, and have links at the bottom of each post
automatically populated with the other posts in the series. This turned out to
be somewhat of a challenge, but doable with some effort… hopefully someone
else can learn from my work.
This now brings us to Taxonomies.
Taxonomies are basically just like tags, except that you can have any number of
different types of tags. So you might have “Tags” as a taxonomy, and thus you
can give a content tags with values of “go” and “programming”. You can also
have a taxonomy of “series” and give content a series of “Hugo 101”.
Taxonomy is sort of like relatable metadata to gather multiple pieces of content
together in a structured way… it’s almost like a minimal relational database.
Taxonomies are listed in your site’s metadata, and consist of a list of keys.
Each piece of content can specify one or more values for those keys (the Hugo
documentation calls the values “Terms”). The values are completely ad-hoc, and
don’t need to be pre-defined anywhere. Hugo automatically creates pages where
you can view all content based on Taxonomies and see how the various values are
cross-referenced against other content. This is a way to implement tags on
posts, or series of posts.
So, for my example, we add a Taxonomy to my site config called “series”. Then
in this post, the “Hugo: Beyond the Defaults” post, and the “Hugo is Friggin’
Awesome” post, I just add series = ["Hugo 101"] (note the brackets - the
values for the taxonomy are actually a list, even if you only have one value).
Now all these posts are magically related together under a taxonomy called
“series”. And Hugo automatically generates a listing for this taxonomy value
at /series/hugo-101 (the taxonomy value gets
url-ized). Any other series I make will be under a similar directory.
This is fine and dandy and pretty aweomse out of the box… but I really want to
automatically generate a list of posts in the series at the bottom of each post
in the series. This is where things get tricky, but that’s also where things
get interesting.
The examples for displaying
Taxonomies all “hard code” the
taxonomy value in the template… this works great if you know ahead of time
what value you want to display, like “all posts with tag = ‘featured’”.
However, it doesn’t work if you don’t know ahead of time what the taxonomy value
will be (like the series on the current post).
This is doable, but it’s a little more complicated.
I’ll give you a dump of the relevant portion of my post template and then talk
about how I got there:
{{ if .Params.series }}
{{ $name := index .Params.series 0 }}
<hr/>
<p><a href="" id="series"></a>This is a post in the
<b>{{$name}}</b> series.<br/>
Other posts in this series:</p>
{{ $name := $name | urlize }}
{{ $series := index .Site.Taxonomies.series $name }}
<ul class="series">
{{ range $series.Pages }}
<li>{{.Date.Format "Jan 02, 2006"}} -
<a href="{{.Permalink}}">{{.LinkTitle}}</a></li>
{{end}}
</ul>
{{end}}
So we start off defining this part of the template to only be used if the post
has a series. Right, sure, move on.
Now, the tricky part… the taxonomy values for the current page resides in the
.Params values, just like any other custom metadata you assign to the page.
Taxonomy values are always a list (so you can give things multiple tags etc),
but I know that I’ll never give something more than one series, so I can just
grab the first item from the list. To do that, I use the index function, which
is just like calling series[0] and assign it to the $name variable.
Now another tricky part… the series in the metadata is in the pretty form you
put into the metadata, but the list of Taxonomies in .Site.Taxonomies is in the
urlized form… How did I figure that out? Printf
debugging. Hugo’s auto-reloading makes it really easy to use the template
itself to figure out what’s going on with the template and the data.
When I started writing this template, I just put {{$name}} in my post template
after the line where I got the name, and I could see it rendered on webpage of
my post that the name was “Hugo 101”. Then I put {{.Site.Taxonomies.series}}
and I saw something like map[hugo-101:[{0 0xc20823e000} {0 0xc208048580} {0
0xc208372000}]] which is ugly, but it showed me that the value in the map is
“hugo-101”… and I realized it was using the urlized version, so I used the
pre-defined hugo function urlize to convert the pretty series.
And from there it’s just a matter of using index again, this time to use
$name as a key in the map of series…. .Site.Taxonomies is a map
(dictionary) of Taxonomy names (like “series”) to maps of Taxonomy values (like
“hugo-101”) to lists of pages. So, .Site.Taxonomies.series reutrns a map of
series names to lists of pages… index that by the current series names, and
bam, list of pages.
And then it’s just a matter of iterating over the pages and displaying them
nicely. And what’s great is that this is now all automatic… all old posts get
updated with links to the new posts in the series, and any new series I make,
regardless of the name, will get the nice list of posts at the bottom for that
series.
In my last post, I had deployed what is almost the most basic Hugo site
possible. The only reason it took more than 10 minutes is because I wanted to
tweak the theme. However, there were a few things that immediately annoyed me.
I didn’t like having to type hugo -t hyde all the time. Well, turns out
that’s not necessary. You can just put theme = "hyde" in your site
config, and never need to type it again. Sweet. Now to run the local server, I
can just run hugo server -w, and for final generation, I can just run hugo.
Next is that my posts were under npf.io/post/postname … which is not the end
of the world, but I really like seeing the date in post URLs, so that it’s easy
to tell if I’m looking at something really, really old. So, I went about
looking at how to do that. Turns out, it’s trivial. Hugo has a feature called
permalinks, where you can define the
format of the url for a section (a section is a top level division of your site,
denoted by a top level folder under content/). So, all you have to do is, in
your site’s config file, put some config that looks like this:
[permalinks]
post = "/:year/:month/:filename/"
code = "/:filename/"
While we’re at it, I had been putting my code in the top level content
directory, because I wanted it available at npf.io/projectname …. however
there’s no need to do that, I can put the code under the code directory and just
give it a permalink to show at the top level of the site. Bam, awesome, done.
One note: Don’t forget the slash at the end of the permalink.
But wait, this will move my “Hugo is Friggin’ Awesome” post to a different URL,
and Steve Francia already tweeted about it with the old URL. I don’t want that
url to send people to a 404 page!
Aliases to the rescue. Aliases are just
a way to make redirects from old URLs to new ones. So I just put aliases =
["/post/hugo-is-awesome/"] in the metadata at the top of that post, and now
links to there will redirect to the new location. Awesome.
Ok, so cool… except that I don’t really want the content for my blog posts
under content/post/ … I’d prefer them under content/blog, but still be of type
“post”. So let’s change that too. This is pretty easy, just rename the folder
from post to blog, and then set up an
archetype to default the metadata
under /blog/ to type = “post”. Archetypes are default metadata for a section,
so in this case, I make a file archetypes/blog.md and add type= “post” to the
archetype’s metadata, and now all my content created with hugo new
blog/foo.md will be prepopulated as type “post”. (does it matter if the type
is post vs. blog? no. But it matters to me ;)
@mlafeldt on Twitter pointed out my RSS feed was
wonky…. wait, I have an RSS feed? Yes, Hugo has that
too. There are feed XML files
automatically output for most listing directories… and the base feed for the
site is a list of recent content. So, I looked at what Hugo had made for me
(index.xml in the root output directory)… this is not too bad, but I don’t
really like the title, and it’s including my code content in the feed as well as
posts, which I don’t really want. Luckily, this is trivial to fix. The RSS xml
file is output using a Go template just like everything else in the output.
It’s trivial to adjust the template so that it only lists content of type
“post”, and tweak the feed name, etc.
I was going to write about how I got the series stuff at the bottom of this
page, but this post is long enough already, so I’ll just make that into its own
post, as the next post in the series! :)
This blog is powered by Hugo, a static site generator
written by Steve Francia (aka spf13). It is, of course, written in Go. It is
pretty similar to Jekyll, in that you write markdown, run a
little program (hugo) and html pages come out the other end in the form of a
full static site. What’s different is that Jekyll is written in ruby and is
relatively slow, and Hugo is written in Go and is super fast… only taking a
few milliseconds to render each page.
Hugo includes a webserver to serve the content, which will regenerate the site
automatically when you change your content. Your browser will update with the
changes immediately, making your development cycle for a site a very tight
loop.
The basic premise of Hugo is that your content is organized in a specific way on
purpose. Folders of content and the name of the files combine to turn into the
url at which they are hosted. For example, content/foo/bar/baz.md will be hosted
at <site>/foo/bar/baz.
Every content file has a section of metadata at the top that allows you to
specify information about the content, like the title, date, even arbitrary data
for your specific site (for example, I have lists of badges that are shown on
pages for code projects).
All the data in a content file is just that - data. Other than markdown
specifying a rough view of your page, the actual way the content is viewed is
completely separated from the data. Views are written in Go’s templating
language, which is quick to pick up and easy to use if you’ve used other
templating languages (or even if, like me, you haven’t). This lets you do
things like iterate over all the entries in a menu and print them out in a ul/li
block, or iterate over all the posts in your blog and display them on the main
page.
You can learn more about Hugo by going to its site,
which, of course, is built using Hugo.
The static content for this site is hosted on github pages at
https://github.com/natefinch/natefinch.github.io. But the static content is
relatively boring… that’s what you’re looking at in your browser right now.
What’s interesting is the code behind it. That lives in a separate repo on
github at https://github.com/natefinch/npf. This is where the markdown content
and templates live.
Here’s how I have things set up locally… all open source code on my machine
lives in my GOPATH (which is set to my HOME). So, it’s easy to find anything I
have ever downloaded. Thus, the static site lives at
$GOPATH/src/github.com/natefinch/natefinch.github.io and the markdown +
templates lives in $GOPATH/src/github.com/natefinch/npf. I created a symbolic
link under npf called public that points to the natefinch.github.io directory.
This is the directory that hugo outputs the static site to by default… that
way Hugo dumps the static content right into the correct directory for me to
commit and push to github. I just had to add public to my .gitignore so
everyone wouldn’t get confused.
Then, all I do is go to the npf directory, and run
hugo new post/urlofpost.md
hugo server --buildDrafts --watch -t hyde
That generates a new content item that’ll show up on my site under
/post/urlofpost. Then it runs the local webserver so I can watch the content by
pointing a browser at localhost:1313 on a second monitor as I edit the post in a
text editor. hyde is the name of the theme I’m using, though I have modified
it. Note that hugo will mark the content as a draft by default, so you need
–buildDrafts for it to get rendered locally, and remember to delete the draft =
true line in the page’s metadata when you’re ready to publish, or it won’t show
up on your site.
When I’m satisfied, kill the server, and run
hugo -t hyde
to generate the final site output, switch into the public directory, and
git commit -am "some new post"
That’s it. Super easy, super fast, and no muss. Coming from Blogger, this is
an amazingly better workflow with no wrestling with the WYSIWYG editor to make
it display stuff in a reasonable fashion. Plus I can write posts 100% offline
and publish them when I get back to civilization.
There’s a lot more to Hugo, and a lot more I want to do with the site, but that
will come in time and with more posts :)
This is the first post of my new blog. You may (eventually) see old posts
showing up behind here, those have been pulled in from my personal blog at
blog.natefinch.com. I’ve decided to split off my
programming posts so that people who only want to see the coding stuff don’t
have to see my personal posts, and people that only want to see my personal
stuff don’t have to get inundated with programming posts.
Right now the site is pretty basic, but I will add more features to it, such as post history etc.
I recently needed to update my npipe package, and since I want it to be production quality, that means setting up CI, so that people using my package can know it’s passing tests. Normally I’d use Travis CI or Drone.io for that, but npipe is a Windows-only Go package, and neither of the aforementioned services support running tests on Windows.
With some googling, I saw that Nathan Youngman had worked with AppVeyor to add Go support to their CI system. The example on the blog talks about making a build.cmd file in your repo to enable Go builds, but I found that you can easily set up a Go build without having to put CI-specific files in your repo.
To get started with AppVeyor, just log into their site and tell it where to get your code (I logged in with Github, and it was easy to specify what repo of mine to test). Once you choose the repo, go to the Settings page on AppVeyor for that repo. Under the Environment tab on the left, set the clone directory to C:\GOPATH\src<your import path> and set an environment variable called GOPATH to C:\GOPATH. Under the build tab, set the build type to “SCRIPT” and the script type to “CMD”, and make the contents of the script
go get -v -d -t <your import path>/…
(this will download the dependencies for your package). In the test tab, set the test type to “SCRIPT”, the script type to “CMD” and the script contents to
go test -v -cover ./…
(this will run all the tests in verbose mode and also output the test coverage).
That’s pretty much it. AppVeyor will automatically run a build on commits, like you’d expect. You can watch the progress on a console output on their page, and get a pretty little badge from the badges page. It’s free for open source projects, and seems relatively responsive from my admittedly limited experience.
This is a great boon for Go developers, so you can be sure your code builds and passes tests on Windows, with very little work to set it up. I’m probably going to add this to all my production repos, even the ones that aren’t Windows-only, to ensure my code works well on Windows as well as Linux.
BoltDB is a pure Go persistence solution that saves data to a memory mapped file. I call it a persistence solution and not a database, because the word database has a lot of baggage associated with it that doesn’t apply to bolt. And that lack of baggage is what makes bolt so awesome.
Bolt is just a Go package. There’s nothing you need to install on the system, no configuration to figure out before you can start coding, nothing. You just go get github.com/boltdb/bolt and then import “github.com/boltdb/bolt”.
All you need to fully use bolt as storage is a file name. This is fantastic from both a developer’s point of view, and a user’s point of view. I don’t know about you, but I’ve spent months of work time over my career configuring and setting up databases and debugging configuration problems, users and permissions and all the other crap you get from more traditional databases like Postgres and Mongo. There’s none of that with bolt. No users, no setup, just a file name. This is also a boon for users of your application, because they don’t have to futz with all that crap either.
Bolt is not a relational database. It’s not even a document store, though you can sort of use it that way. It’s really just a key/value store… but don’t worry if you don’t really know what that means or how you’d use that for storage. It’s super simple and it’s incredibly flexible. Let’s take a look.
Storage in bolt is divided into buckets. A bucket is simply a named collection of key/value pairs, just like Go’s map. The name of the bucket, the keys, and the values are all of type []byte. Buckets can contain other buckets, also keyed by a []byte name.
… that’s it. No, really, that’s it. Bolt is basically a bunch of nested maps. And this simplicity is what makes it so easy to use. There’s no tables to set up, no schemas, no complex querying language to struggle with. Let’s look at a bolt hello world:
// retrieve the data err = db.View(func(tx *bolt.Tx) error { bucket := tx.Bucket(world) if bucket == nil { return fmt.Errorf(“Bucket %q not found!”, world) }
val := bucket.Get(key) fmt.Println(string(val))
return nil })
if err != nil { log.Fatal(err) } }
// output: // Hello World!
I know what you’re thinking - that seems kinda long. But keep in mind, I fully handled all errors in at least a semi-proper way, and we’re doing all this:
1.) creating a database 2.) creating some structure (the “world” bucket) 3.) storing data to the structure 4.) retrieving data from the structure.
I think that’s not too bad in 54 lines of code.
So let’s look at what that example is really doing. First we call bolt.Open to get the database. This will create the file if necessary, or open it if it exists.
All reads from or writes to the bolt database must be done within a transaction. You can have as many Readers in read-only transactions at the same time as you want, but only one Writer in a writable transaction at a time (readers maintain a consistent view of the DB while writers are writing).
To begin, we call db.Update, which takes a function to which it’ll pass a bolt.Tx - bolt’s transaction object. We then create a Bucket (since all data in bolt lives in buckets), and add our key/value pair to it. After the write transaction finishes, we start a read- only transaction with DB.View, and get the values back out.
What’s great about bolt’s transaction mechanism is that it’s super simple - the scope of the function is the scope of the transaction. If the function passed to Update returns nil, all updates from the transaction are atomically stored to the database. If the function passed to Update returns an error, the transaction is rolled back. This makes bolt’s transactions completely intuitive from a Go developer’s point of view. You just exit early out of your function by returning an error as usual, and bolt Does The Right Thing. No need to worry about manually rolling back updates or anything, just return an error.
The only other basic thing you may need is to iterate over key/value pairs in a Bucket, in which case, you just call bucket.Cursor(), which returns a Cursor value, which has functions like Next(), Prev() etc that return a key/value pair and work like you’d expect.
There’s a lot more to the bolt API, but most of the rest of it is more about database statistics and some stuff for more advanced usage scenarios… but the above is all you really need to know to start storing data in a bolt database.
For a more complex application, just storing strings in the database may not be sufficient, but that’s ok, Go has your back there, too. You can easily use encoding/json or encoding/gob to serialize structs into the database, keyed by a unique name or id. This is what makes it easy for bolt to go from a key/value store to a document store - just have one bucket per document type. Again, the benefit of bolt is low barrier of entry. You don’t have to figure out a whole database schema or install anything to be able to just start dumping data to disk in a performant and manageable way.
The main drawback of bolt is that there are no queries. You can’t say “give me all foo objects with a name that starts with bar”. You could make your own index in the database and keep it up to date manually. This could be as easy as a slice of IDs serialized into an “indices” bucket for a particular query. Obviously, this is where you start getting into the realm of developing your own relational database, but if you don’t go overboard, it can be nice that all this code is just that - code. It’s not queries in some external DSL, it’s just code like you’d write for an in-memory data store.
Bolt is not for every application. You must understand your application’s needs and if bolt’s key/value style will be sufficient to fulfill those needs. If it is, I think you’ll be very happy to use such a simple data store with so little mental overhead.
[edited to clarify reader/writer relationship] Bonus Gob vs. Json benchmark for storing structs in Bolt:
Yesterday, I was trying to think of a way of automating some doc generation for my go packages. The specific task I wanted to automate was updating a badge in my package’s README to show the test coverage. What I wanted was a way to run go test -cover, parse the results, and put the result in the correct spot of my README. My first thought was to write an application that would do that for me … but then I’d have to run that instead of go test. What I realized I wanted was something that was “compatible with go test” - i.e. I want to run go test and not have to remember to run some special other command.
And that’s when it hit me: What is a test in Go? A test is a Go function that gets run when you run “go test”. Nothing says your test has to actually test anything. And nothing prevents your test from doing something permanent on your machine (in fact we usually have to bend over backwards to make sure our tests don’t do anything permanent. You can just write a test function that updates the docs for you.
I actually quite like this technique. I often have some manual tasks after updating my code - usually updating the docs in the README with changes to the API, or changing the docs to show new CLI flags, etc. And there’s one thing I always do after I update my code - and that’s run “go test”. If that also updates my docs, all the better.
Covergen is a particularly heinous example of a test that updates your docs. The heinous part is that it actually doubles the time it takes to run your tests… this is because that one test re-runs all the tests with -cover to get the coverage percent. I’m not sure I’d actually release real code that used such a thing - doubling the time it takes to run your tests just to save a few seconds of copy and paste is pretty terrible.
However, it’s a valid example of what you can do when you throw away testing convention and decide you want to write some code in a test that doesn’t actually test anything, and instead just runs some automated tasks that you want run whenever anyone runs go test. Just make sure the result is idempotent so you’re not continually causing things to look modified to version control.
I love Beyond Compare, it’s an awesome visual diff/merge tool. It’s not free, but I don’t care, because it’s awesome. However, there’s no built-in configuration for Go code, so I made one. Not sure what the venn diagram of Beyond Compare users and Go users looks like, it might be that I’m the one point of crossover, but just in case I’m not, here’s the configuration file for Beyond Compare 3 for the Go programming language: http://play.golang.org/p/G6NWE0z1GC (please forgive the abuse of the Go playground)
Just copy the text into a file and in Beyond Compare, go to Tools->Import Settings… and choose the file. Please let me know if you have any troubles or suggested improvements.
Go’s interfaces are one of it’s best features, but they’re also one of the most confusing for newbies. This post will try to give you the understanding you need to use Go’s interfaces and not get frustrated when things don’t work the way you expect. It’s a little long, but a bunch of that is just code examples.
Go’s interfaces are different than interfaces in other languages, they are implicitly fulfilled. This means that you never need to mark your type as explicitly implementing the interface (like class CFoo implements IFoo). Instead, your type just needs to have the methods defined in the interface, and the compiler does the rest.
For example:
type Walker interface {
Walk(miles int)
}
type Camel struct {
Name string
}
func (c Camel) Walk(miles int) {
fmt.Printf(“%s is walking %v miles\n”, c.Name, miles)
}
func LongWalk(w Walker) {
w.Walk(500)
w.Walk(500)
}
func main() {
c := Camel{“Bill”}
LongWalk(c)
}
// prints
// Bill is walking 500 miles.
// Bill is walking 500 miles.
Camel implements the Walker interface, because it has a method named Walk that
takes an int and doesn’t return anything. This means you can pass it into the
LongWalk function, even though you never specified that your Camel is a Walker.
In fact, Camel and Walker can live in totally different packages and never know
about one another, and this will still work if a third package decides to make a
Camel and pass it into LongWalk.
Non-Standard Continuation
This is where most tutorials stop, and where most questions and problems begin.
The problem is that you still don’t know how the interfaces actually work, and
since it’s not actually that complicated, let’s talk about that.
What actually happens when you pass Camel into LongWalk?
So, first off, you’re not passing Camel into LongWalk. You’re actually
assigning c, a value of type Camel to a value w of type Walker, and w is what
you operate on in LongWalk.
Under the covers, the Walker interface (like all interfaces), would look more or
less like this if it were in Go (the actual code is in C, so this is just a
really rough approximation that is easier to read).
type Walker struct {
type InterfaceType
data *void
}
type InterfaceType struct {
valtype *gotype
func0 *func
func1 *func
...
}
All interfaces values are just two pointers - one pointer to information about
the interface type, and one pointer to the data from the value you passed into
the interface (a void in C-like languages… this should probably be Go’s
unsafe.Pointer, but I liked the explicitness of two actual *’s in the struct to
show it’s just two pointers).
The InterfaceType contains a pointer to information about the type of the value
that you passed into the interface (valtype). It also contains pointers to the
methods that are available on the interface.
When you assign c to w, the compiler generates instructions that looks more or
less like this (it’s not actually generating Go, this is just an easier-to-read
approximation):
data := c
w := Walker{
type: &InterfaceType{
valtype: &typeof(c),
func0: &Camel.Walk
}
data: &data
}
When you assign your Camel value c to the Walker value w, the Camel type is
copied into the interface value’s Type.valtype field. The actual data in the
value of c is copied into a new place in memory, and w’s Data field points at
that memory location.
Implications of the Implementation
Now, let’s look at the implications of this code. First, interface values are
very small - just two pointers. When you assign a value to an interface, that
value gets copied once, into the interface, but after that, it’s held in a
pointer, so it doesn’t get copied again if you pass the interface around.
So now you know why you don’t need to pass around pointers to interfaces -
they’re small anyway, so you don’t have to worry about copying the memory, plus
they hold your data in a pointer, so changes to the data will travel with the
interface.
Interfaces Are Types
Let’s look at Walker again, this is important:
type Walker interface
Note that first word there: type. Interfaces are types, just like string is a
type or Camel is a type. They aren’t aliases, they’re not magic hand-waving,
they’re real types and real values which are distinct from the type and value
that gets assigned to them.
Now, let’s assume you have this function:
func LongWalkAll(walkers []Walker) {
for _, w := range walkers {
LongWalk(w)
}
}
And let’s say you have a caravan of Camels that you want to send on a long walk:
You want to pass caravan into LongWalkAll, will the compiler let you? Nope.
Why is that? Well, []Walker is a specific type, it’s a slice of values of type
Walker. It’s not shorthand for “a slice of anything that matches the Walker
interface”. It’s an actual distinct type, the way []string is different from
[]int. The Go compiler will output code to assign a single value of Camel to a
single value of Walker. That’s the only place it’ll help you out. So, with
slices, you have to do it yourself:
walkers := make([]Walker, len(caravan))
for n, c := range caravan {
walkers[n] = c
}
LongWalkAll(walkers)
However, there’s a better way if you know you’ll just need the caravan for
passing into LongWalkAll:
Note that this goes for any type which includes an interface as part of its
definition: there’s no automatic conversion of your func(Camel) into
func(Walker) or map[string]Camel into map[string]Walker. Again, they’re totally
different types, they’re not shorthand, and they’re not aliases, and they’re not
just a pattern for the compiler to match.
Interfaces and the Pointers That Satisfy Them
What if Camel’s Walk method had this signature instead?
func (c *Camel) Walk(miles int)
This line says that the type *Camel has a function called Walk. This is
important: *Camel is a type. It’s the “pointer to a Camel” type. It’s a
distinct type from (non-pointer) Camel. The part about it being a pointer is
part of its type. The Walk method is on the type *Camel. The Walk method (in
this new incarnation) is not on the type Camel. This becomes important when you
try to assign it to an interface.
c := Camel{“Bill”}
LongWalk(c)
// compiler output:
cannot use c (type Camel) as type Walker in function argument:
Camel does not implement Walker (Walk method has pointer receiver)
To pass a Camel into LongWalk now, you need to pass in a pointer to a Camel:
c := &Camel{“Bill”}
LongWalk(c)
or
c := Camel{“Bill”}
LongWalk(&c)
Note that this true even though you can still call Walk directly on Camel:
c := Camel{“Bill”}
c.Walk(500) // this works
The reason you can do that is that the Go compiler automatically converts this
line to (&c).Walk(500) for you. However, that doesn’t work for passing the
value into an interface. The reason is that the value in an interface is in a
hidden memory location, and so the compiler can’t automatically get a pointer to
that memory for you (in Go parlance, this is known as being “not addressable”).
Nil Pointers and Nil Interfaces
The interaction between nil interfaces and nil pointers is where nearly everyone
gets tripped up when they first start with Go.
Let’s say we have our Camel type with the Walk method defined on *Camel as
above, and we want to make a function that returns a Walker that is actually a
Camel (note that you don’t need a function to do this, you can just assign a
*Camel to a Walker, but the function is a good illustrative example):
func MakeWalker() Walker {
return &Camel{“Bill”}
}
w := MakeWalker()
if w != nil {
w.Walk(500) // we will hit this
}
This works fine. But now, what if we do something a little different:
func MakeWalker(c *Camel) Walker {
return c
}
var c *Camel
w := MakeWalker(c)
if w != nil {
// we’ll get in here, but why?
w.Walk(500)
}
This code will also get inside the if statement (and then panic, which we’ll
talk about in a bit) because the returned Walker value is not nil. How is that
possible, if we returned a nil pointer? Well, let’s go look back to the
instructions that get generated when we assign a value to an interface.
data := c
w := Walker{
type: &InterfaceType{
valtype: &typeof(c),
func0: &Camel.Walk
}
data: &data
}
In this case, c is a nil pointer. However, that’s a perfectly valid value to
assign to the Walker’s Data value, so it works just fine. What you return is a
non-nil Walker value, that has a pointer to a nil *Camel as its data. So, of
course, if you check w == nil, the answer is false, w is not nil… but then
inside the if statement, we try to call Camel’s walk:
And when we try to do c.Name, Go automatically turns that into (*c).Name, and
the code panics with a nil pointer dereference error.
Hopefully this makes sense, given our new understanding of how interfaces wrap
values, but then how do you account for nil pointers? Assume you want
MakeWalker to return a nil interface if it gets passed a nil Camel. You have to
explicitly assign nil to the interface:
func MakeWalker(c *Camel) Walker {
if c == nil {
return nil
}
return c
}
var c *Camel
w := MakeWalker(c)
if w != nil {
// Yay, we don’t get here!
w.Walk(500)
}
And now, finally, the code is doing what we expect. When you pass in a nil
*Camel, we return a nil interface. Here’s an alternate way to write the
function:
func MakeWalker(c *Camel) Walker {
var w Walker
if c != nil {
w = c
}
return w
}
This is slightly less optimal, but it shows the other way to get a nil
interface, which is to use the zero value for the interface, which is nil.
Note that you can have a nil pointer value that satisfies an interface. You
just need to be careful not to dereference the pointer in your methods. For
example, if *Camel’s Walk method looked like this:
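A sketch of such a nil-safe method (again assuming a Name field and using fmt just for output):

func (c *Camel) Walk(miles int) {
	if c == nil {
		fmt.Println("a nil camel walks nowhere")
		return
	}
	fmt.Printf("%s is walking %d miles\n", c.Name, miles)
}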
I hope this article helps you better understand how interfaces work, and helps
you avoid some of the common pitfalls and misconceptions newbies have about
them. If you want more information about the internals of interfaces and some
of the optimizations that I didn’t cover here, read Russ Cox’s article on Go
interfaces; I highly recommend it.
Functions in Go are first-class citizens, which means you can store a function value in a variable and call it like a regular function.
printf := fmt.Printf
printf("This will output %d line.\n", 1)
This ability can come in very handy for testing code that calls a function which is hard to properly test while testing the surrounding code. In Juju, we occasionally use function variables to allow us to stub out a difficult function during tests, in order to more easily test the code that calls it. Here’s a simplified example:
// in install/mongodb.go
package install

func SetupMongodb(path string) error {
	// suppose the code in this method modifies files in root
	// directories, mucks with the environment, etc…
	// Actions you actively don’t want to do during most tests.
}
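The post’s snippet for the calling package isn’t reproduced above; here’s a minimal sketch of what it might look like, inferred from the test below (the setup package name, the getPath helper, and Bootstrap’s body are assumptions):

// in setup/bootstrap.go
package setup

// setupMongo is a package-level function variable, so tests can swap in a stub.
var setupMongo = install.SetupMongodb

func Bootstrap() error {
	path := getPath()
	if err := setupMongo(path); err != nil {
		return err
	}
	// ... the rest of bootstrapping ...
	return nil
}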
So, suppose you want to write a test for Bootstrap, but you know SetupMongodb won’t work, because the tests don’t run with root privileges (and you don’t want to setup mongodb on the dev’s machine anyway). What can you do? This is where mocking comes in.
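The fakeSetup stub records the path it was given and returns a canned error; a minimal sketch, with field and method names taken from the test below:

// in setup/bootstrap_test.go
type fakeSetup struct {
	path string
	err  error
}

// setup has the same signature as SetupMongodb, so it can be assigned to setupMongo.
func (f *fakeSetup) setup(path string) error {
	f.path = path
	return f.err
}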
func TestBootstrap(t *testing.T) {
	f := &fakeSetup{err: errors.New("Failed!")}
	// this mocks out the function that Bootstrap() calls
	setupMongo = f.setup

	err := Bootstrap()
	if err != f.err {
		t.Errorf("Error from setupMongo not returned. Expected %v, got %v", f.err, err)
	}
	expPath := getPath()
	if f.path != expPath {
		t.Errorf("Path not correctly passed into setupMongo. Expected %q, got %q", expPath, f.path)
	}

	// and then try again with f.err == nil, you get the idea
}
Now we have full control over what happens in the setupMongo function: we can record the parameters passed into it, control what it returns, and test that Bootstrap at least uses the function’s API correctly.
Obviously, we need tests elsewhere for install.SetupMongodb to make sure it does the right thing, but those can be tests internal to the install package, which can use non-exported fields and functions to effectively test the logic that would be impossible from an external package (like the setup package). Using this mocking means that we don’t have to worry about setting up an environment that allows us to test SetupMongodb when we really only want to test Bootstrap. We can just stub out the function and test that Bootstrap does everything correctly, and trust that SetupMongodb works because it’s tested in its own package.
I started to write a blog post about how to get the most out of godoc, with examples in a repo, and then realized I could just write the whole post as godoc on the repo, so that’s what I did. Feel free to send pull requests if there’s anything you see that could be improved.
I actually learned quite a lot writing this article, by exploring all the nooks and crannies of Go’s documentation generation. Hopefully you’ll learn something too.
The Go compiler treats unused variables as a compilation error. This causes much
annoyance to some newbie Gophers, especially those used to writing in languages
that aren’t compiled, who want to play fast and loose with their code while
doing exploratory hacking.
The thing is, an unused variable is often a bug in your code, so pointing it out
early can save you a lot of heartache.
Here’s an example:
50 func Connect(name, port string) error {
51 hostport := ""
52 if port == "" {
53 hostport := makeHost(name)
54 logger.Infof("No port specified, connecting on port 8080.")
55 } else {
56 hostport := makeHostPort(name, port)
57 logger.Infof("Connecting on port %s.", port)
58 }
59 // ... use hostport down here
60 }
Where’s the bug in the above? Without the compiler error, you’d run the code
and have to figure out why hostport was always an empty string. Did we pass in
empty strings by accident? Is there a bug in makeHost and makeHostPort?
With the compiler error, it will say “53, hostport declared and not used” and
“56, hostport declared and not used”
This makes it a lot more obvious what the problem is… inside the scope of the
if statement, := declares new variables called hostport. These hide the
variable from the outer scope, thus, the outer hostport never gets modified,
which is what gets used further on in the function.
50 func Connect(name, port string) error {
51 hostport := ""
52 if port == "" {
53 hostport = makeHost(name)
54 logger.Infof("No port specified, connecting on port 8080.")
55 } else {
56 hostport = makeHostPort(name, port)
57 logger.Infof("Connecting on port %s.", port)
58 }
59 // ... use hostport down here
60 }
The above is the corrected code. It took only a few seconds to fix, thanks to
the unused variable error from the compiler. If you’d been testing this by
running it or even with unit tests… you’d probably end up spending a
non-trivial amount of time trying to figure it out. And this is just a very simple
example. This kind of problem can be a lot more elaborate and hard to find.
And that’s why the unused variable declaration error is actually a good thing.
If a value is important enough to be assigned to a variable, it’s probably a bug
if you’re not actually using that variable.
Bonus tip:
Note that if you don’t care about the variable, you can just assign it to the
empty identifier directly:
_, err := computeMyVar()
This is the normal way to avoid the compiler error in cases where a function
returns more than you need.
If you really want to silence the unused variable error and not remove the
variable for some reason, this is the way to do it:
v, err := computeMyVar()
_ = v // this counts as using the variable
Just don’t forget to clean it up before committing.
All of the above also goes for unused packages. And a similar tip for silencing
that error:
_ = fmt.Printf // this counts as using the package
Francesc Campoy recently posted about how to work on someone else’s Go repo from github. His description was correct, but I think there’s an easier way, and also one that might be slightly less confusing. Let’s say you want to work on your own branch of github.com/natefinch/gocog - here’s the easiest way to do it:
1. Fork github.com/natefinch/gocog on github
2. mkdir -p $GOPATH/src/github.com/natefinch/gocog
3. cd $GOPATH/src/github.com/natefinch/gocog
4. git clone https://github.com/YOURNAME/gocog .
5. (optional) go get github.com/natefinch/gocog
That’s it. Now you can work on the code, push/pull etc from your github repo as normal, and submit a pull request when you’re done.
go get is useful for getting code that you want to use, but it’s not very useful for getting code that you want to work on. It doesn’t set up source control. git clone does. What go get is handy for is getting the dependencies of a project, which is what step 5 does (only needed if the project relies on outside repos you don’t already have). (thanks to a post on G+ for reminding me that git clone won’t get the dependencies)
Also note, the path on disk is the same as the original repo’s URL, not your branch’s URL. That’s intentional, and it’s the key to making this work. go get is the only thing that actually cares if the repo URL is the same as the path on disk. Once the code is on disk, go build etc just expects import paths to be directories under $GOPATH. The code expects to be under $GOPATH/src/github.com/natefinch/gocog because that’s what the import statements say it should be. There’s no need to change import paths or anything wacky like that (though it does mean that you can’t have both the original version of the code and your branch coexisting in the same $GOPATH).
Note that this is actually the same procedure that you’d use to work on your own code from github, you just change step 1 to “create the repo in github”. I prefer making the repo in github first because it lets me set up the license, the readme, and the .gitignore with just a few checkboxes, though obviously that’s optional if you want to hack locally first. In that case, just make sure to set up the path under gopath where it would go if you used go get, so that go get will work correctly when you decide to push up to github. (updated to mention using go get after git clone)
This is just a collection of tips that would have saved me a lot of time if I had known about them when I was a newbie:
Build or test everything under the current directory and subdirectories:
go build ./...
go test ./...
Technically, both commands take a pattern to match the name of one or more packages, and the ... specifier is a wildcard, so you could do .../foo/... to match all packages under GOPATH with foo in their path.
Have an io.Writer that writes to an in-memory data structure:
b := &bytes.Buffer{}
thing.WriteTo(b)
Have an io.Reader read from a string (useful when you want to use a string as the input data for something):
r := strings.NewReader(myString)
thing.ReadFrom(r)
Copy data from a reader to a writer:
io.Copy(toWriter, fromReader)
Timeout waiting on a channel:
select {
case val := <-ch:
	// use val
case <-time.After(time.Second * 5):
	// timed out
}
Convert a slice of bytes to a string:
var b []byte = getData()
s := string(b)
Passing a nil pointer into an interface does not result in a nil interface:
func isNil(i interface{}) bool {
	return i == nil
}

var f *foo = nil
fmt.Println(isNil(f)) // prints false
The only way to get a nil interface is to pass the keyword nil:
var f *foo = nil
if f == nil {
	fmt.Println(isNil(nil)) // prints true
}
How to remember where the arrow goes for channels:
The arrow points in the direction of data flow, either into or out of the channel, and always points left.
The above is generalizable to anything where you have a source and destination, or reading and writing, or assigning.
Data is taken from the right and assigned to the left, just as it is with a := b. So, like io.Copy, you know that the reader (source) is on the right, the writer (destination) is on the left: io.Copy(dest, src).
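For example, here’s a small illustration of the left-pointing rule for channels, with io.Copy’s destination/source order shown alongside:

package main

import (
	"fmt"
	"io"
	"os"
	"strings"
)

func main() {
	ch := make(chan string, 1)
	ch <- "hello"    // send: data flows left, into the channel
	greeting := <-ch // receive: data flows left, out of the channel and into greeting
	fmt.Println(greeting)

	// Same rule as assignment: destination on the left, source on the right.
	io.Copy(os.Stdout, strings.NewReader("hello again\n"))
}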
If you ever think “man, someone should have made a helper function to do this!”, chances are they have, and it’s in the std lib somewhere.
I’ve been a developer at Canonical (working on Juju) for a little over 3 months, and I have to say, this is the best job I have ever had, bar none.
Let me tell you why.
1.) 100% work from home (minus ~2 one-week trips per year)
2.) Get paid to write cool open source software.
3.) Work with smart people from all over the globe.
#1 can’t be overstated. This isn’t just “flex time” or “work from home when you want to”. There is literally no office to go to for most people at Canonical. Working at home is the default. The difference is huge. My last company let us work from home as much as we wanted, but most of the company worked from San Francisco… which means when there were meetings, 90% of the people were in the room, and the rest of us were on a crappy speakerphone straining to hear and having our questions ignored. At Canonical, everyone is remote, so everyone works to make meetings and interactions work well online… and these days it’s easy with stuff like Google Hangouts and IRC and email and online bug tracking etc.
Canonical’s benefits don’t match Google’s or Facebook’s (you get the standard stuff, health insurance, 401k etc, just not the crazy stuff like caviar at lunch… unless of course you have caviar in the fridge at home). However, I’m pretty sure the salaries are pretty comparable… and Google and Facebook don’t let you work 100% from home. I’m pretty sure they barely let you work from home at all. And that is a huge quality of life issue for me. I don’t have to slog through traffic and public transportation to get to work. I just roll out of bed, make some coffee, and sit down at my desk. I get to see my family more, and I save money on transportation.
#2 makes a bigger difference than I expected. Working on open source is like entering a whole different world. I’d only worked on closed source before, and the difference is awesome. There’s purposeful openness and inclusion of the community in our development. Bug lists are public, and anyone can file one. Mailing lists are public (for the most part) and anyone can get on them. IRC channels are public, and anyone can ask questions directly to the developers. It’s a really great feeling, and puts us so much closer to the community - the people that have perhaps an even bigger stake in the products we make than we do. Not only that, but we write software for people like us. Developers. I am the target market, in most cases. And that makes it easy to get excited about the work and easy to be proud of and show off what I do.
#3 The people. I have people on my team from Germany, the UK, Malta, the UAE, Australia, and New Zealand. It’s amazing working with people of such different backgrounds. And when you don’t have to tie yourself down to hiring people within a 30 mile radius, you can afford to be more picky. Canonical doesn’t skimp on the people, either. I was surprised that nearly everyone on my team was 30+ (possibly all of them, I don’t actually know how old everyone is ;) That’s a lot of experience to have on one team, and it’s so refreshing not to have to try to train the scrappy 20-somethings to value the things that come with experience (no offense to my old colleagues, you guys were great).
Put it all together, and it’s an amazing opportunity that I am exceedingly pleased to have been given.
At the end of July, I started a new job at Canonical, the makers of Ubuntu Linux. Canonical employees mostly work from home, and use their own computer for work. Thus, I would need to switch to Ubuntu from Windows on my personal laptop. Windows has been my primary operating system for most of my 14-year career. I’ve played around with Linux on the side a few times, running a mail server on Mandrake for a while… and I’ve worked with CentOS as a server for the software at my last job… but I wouldn’t say I was comfortable spending more than a few minutes on a Linux terminal before I yearned to friggin’ click something already…. and I certainly hadn’t used it as my day-to-day machine.
Enter Ubuntu 13.04 Raring Ringtail, the latest and greatest Ubuntu release (pro-tip: the major version number is the year of release and the minor version number is the month; Canonical does two releases a year, in April and October, so they’re all .04 and .10, and the release names are alphabetical).
Installation on my 2 year old HP laptop was super easy. Pop in the CD I had burned with Ubuntu on it, and boot up… installation is fully graphical, not too different from a Windows installation. There were no problems installing, and only one cryptic prompt… do I want to use Logical Volume Management (LVM) for my drives? This is the kind of question I hate. There was no information about what in the heck LVM was, what the benefits or drawbacks are, and since it sounded like it could be a Big Deal, I wanted to make sure I didn’t pick the wrong thing and screw myself later. Luckily I could ask a friend with Linux experience… but it really could have done with a “(Recommended)” tag, and a link for more information.
After installation, a dialog pops up asking if I want to use proprietary third party drivers for my video card (Nvidia) or open source drivers. I’m given a list of several proprietary drivers and an open source driver. Again, I don’t know what the right answer is, I just want a driver that works, I don’t care if it’s proprietary or not (sorry, OSS folks, it’s true). However, trying to be a good citizen, I pick the open source one and…. well, it doesn’t work well at all. I honestly forget exactly what problems I had, but they were severe enough that I had to go figure out how to reopen that dialog and choose the Nvidia proprietary drivers.
Honestly, the most major hurdle in using Ubuntu has been getting used to having the minimize, maximize, and close buttons in the upper left of the window, instead of the upper right.
In the first week of using Ubuntu I realized something - 99% of my home use of a computer is in a web browser… the OS doesn’t matter at all. There’s actually very little I use native applications for outside of work. So, the transition was exceedingly painless. I installed Chrome, and that was it, I was back in my comfortable world of the browser.
Linux has come a long way in the decade since I last used it. It’s no longer the OS that requires you to drop into a terminal to do everyday things. There are UIs for pretty much everything that are just as easy to use as the ones in Windows, so things like configuring monitors, networking, and printers all work pretty much like they do in Windows.
So what problems did I have? Well, my scanner doesn’t work. I went to get drivers for it, and there are third party scanner drivers, but they didn’t work. But honestly, scanners are pretty touch and go in Windows, too, so I’m not terribly surprised. All my peripherals worked (monitors, mouse, keyboard, etc), and even my wireless printer worked right away. However, later on, my printer stopped working. I don’t know exactly why, I had been messing with the firewall in Linux, and so it may have been my fault. I’m talking to Canonical tech support about it, so hopefully they’ll be able to help me fix it.
Overall, I am very happy using Linux as my everyday operating system. There are very few drawbacks for me. Most Windows software has a Linux counterpart, and now even Steam games are coming to Linux, so there’s really very little reason not to make the switch if you’re interested.
I gave a talk at the Go Boston meetup last night and figured I should write it up and put it here.
The second thing everyone says when they read up on Go is “There are no generics!”.
(The first thing people say is “There are no exceptions!”)
Both are only mostly true, but we’re only going to talk about generics today.
Go has generic built-in data structures - arrays, slices, maps, and channels. You just can’t create your own new generic types, and you can’t write generic functions. So, what’s a programmer to do? Find another language?
No. Many, possibly even most, problems can be solved with the built-in data structures. You can write pretty huge applications just using maps and slices and the occasional channel. There may be a tiny bit of code duplication, but probably not much, and certainly not any tricky code.
However, there definitely are times when you need more complicated data structures. Most people writing Go solve this problem by using interface{}, the empty interface, which is basically like Object in C# or Java or void * in C/C++. It’s a thing that can hold any type… but then you need to type cast it to get at the actual type. This breaks static typing, since the compiler can’t tell if you make a mistake and pass the wrong type into something that takes an interface{}, and it can’t tell until runtime whether a cast will succeed or not.
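Here’s a small illustration (hypothetical code, not from any particular project) of the kind of mistake the compiler can’t catch once everything is an interface{}:

package main

import "fmt"

// first returns the first element; the compiler has no idea what’s inside.
func first(vals []interface{}) interface{} { return vals[0] }

func main() {
	vals := []interface{}{"a string"}
	v := first(vals)

	// The mistake only surfaces at runtime, via the type assertion.
	n, ok := v.(int)
	fmt.Println(n, ok) // 0 false
}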
So, is there any solution? Yes. The inspiration comes from the standard library’s sort package. Package sort can sort a slice of any type; it can even sort things that aren’t slices, if you’ve made your own custom data structure. How does it do that? To sort something, it must support the methods on sort.Interface. Most interesting is Less(i, j int) bool. Less returns true if the item at index i in your data structure is less than the item at index j in your data structure. Your code has to implement what “less” means… and by only using indices, sort doesn’t need to know the types of objects held in your data structure.
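As a reminder of what that looks like in practice, here’s a small sort.Interface implementation (the standard library API; the person type is just for illustration):

package main

import (
	"fmt"
	"sort"
)

type person struct {
	name string
	age  int
}

// byAge implements sort.Interface; sort only ever sees indices.
type byAge []person

func (p byAge) Len() int           { return len(p) }
func (p byAge) Less(i, j int) bool { return p[i].age < p[j].age }
func (p byAge) Swap(i, j int)      { p[i], p[j] = p[j], p[i] }

func main() {
	people := byAge{{"Ann", 40}, {"Bob", 25}, {"Cyd", 33}}
	sort.Sort(people)
	fmt.Println(people)
}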
This use of indices to blindly access data in a separate data structure is how we’ll implement our strongly typed tree. The tree structure will hold an index as its data value in each node, and the indices will index into a data structure that holds the actual objects. To make a tree of a new type, you simply implement a Compare function that the tree can use to compare the values at two indices in your data structure. You can use whatever data structure you like, probably a slice or a map, as long as you can use integers to reference values in the data structure.
In this way we separate the organization of the data from the storage of the data. The tree structure holds the organization, a slice or map (or something custom) stores the data. The indices are the generic pointers into the storage that holds the actual strongly typed values.
This does require a little code for each new tree type, just as using package sort requires a little code for each type. However, it’s only a few lines for a few functions, wrapping a tree and your data.
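A rough sketch of the idea (a hypothetical, simplified API, not the code in the repository linked below): the tree stores only int indices, and a user-supplied less function compares the values those indices refer to.

package main

import "fmt"

type node struct {
	idx         int
	left, right *node
}

// Tree organizes indices; the actual data lives elsewhere.
type Tree struct {
	less func(i, j int) bool
	root *node
}

func New(less func(i, j int) bool) *Tree { return &Tree{less: less} }

func (t *Tree) Insert(idx int) { t.root = insert(t.root, idx, t.less) }

func insert(n *node, idx int, less func(i, j int) bool) *node {
	if n == nil {
		return &node{idx: idx}
	}
	if less(idx, n.idx) {
		n.left = insert(n.left, idx, less)
	} else {
		n.right = insert(n.right, idx, less)
	}
	return n
}

// InOrder visits the stored indices in sorted order.
func (t *Tree) InOrder(visit func(idx int)) {
	var walk func(*node)
	walk = func(n *node) {
		if n == nil {
			return
		}
		walk(n.left)
		visit(n.idx)
		walk(n.right)
	}
	walk(t.root)
}

func main() {
	// The strongly typed data lives in a plain slice...
	names := []string{"delta", "alpha", "charlie", "bravo"}

	// ...and the tree only ever sees indices into it.
	t := New(func(i, j int) bool { return names[i] < names[j] })
	for i := range names {
		t.Insert(i)
	}
	t.InOrder(func(idx int) { fmt.Println(names[idx]) })
}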
You can check out an example binary search tree I wrote that uses this technique in my github account
This required only 36 lines of code to make the actual tree structure (including empty lines and comments).
In some simple benchmarks, this implementation of a tree is about 25% faster than using the same code with interface{} as the values and casting at runtime… plus it’s strongly typed.
The Go programming language is built from the ground up to implicitly encourage Go projects to be open source. If you want your project not only to contribute to open source, but to encourage other people to write open source code, Go is a great language to choose.
Let’s look at how Go does this. These first two points are overly obvious, but we should get them out of the way.
The language is open source
You can go look at the source code for the language, the compilers, and the build tools for the language. It’s a fully open source project. Even though a lot of the work is being done by Google engineers, there are hundreds of names on the list of contributors of people who are not Google employees.
The standard library is open source
Want to see high quality example code? Look at the code in the standard library. It has been carefully reviewed to be of the best quality, and in canonical Go style. Reading the standard library is a great way to learn the best ways to use and write Go.
Ok, that’s great, but what about all the code that isn’t part of Go itself?
The design of Go really shows its embrace of open source in how third party code is used in day to day projects.
Go makes it trivial to use someone else’s code in your project
Go has distributed version control built-in from the ground up. If you want to use a package from github, for example, you just specify the URL in the imports, as if it were a local package:
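Presumably something like this (fake/foo is the placeholder package used in the next paragraph):

import "github.com/fake/foo"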
You don’t have to go find and download fake/foo from github and put it in a special directory or anything. Just run “go get github.com/fake/foo”. Go will then download, build, and install the code, so that you can reference it… nicely stored in a directory defined by the URL, in this case $GOPATH/src/github.com/fake/foo. Go will even figure out what source control system is used on the other side so you don’t have to (support for git, svn, mercurial, and bazaar).
What’s even better is that the auto-download happens for anyone who calls “go get” on your code repository. No more giving long drawn-out installation instructions about getting half a dozen 3rd party libraries first. If someone wants your code, they type “go get path.to/your/code”, and Go will download your code, and any remote imports you have (like the one for github above), any remote imports that code has, etc, and then builds everything.
The fact that this is available from the command line tools that come with the language makes it the de facto standard for how all Go code is written. There’s no fragmentation in the community about how packages are stored, accessed, used, etc. This means zero overhead for using third party code, it’s as easy to use as if it were built into the Go standard library.
Sharing code is the default
Like most scripting languages (and unlike many compiled languages), using source code from another project is the default way to use third party code in Go. Go creates a monolithic executable during its build, so there are no DLLs to create and distribute in the way you often see with other compiled languages. In theory you could distribute the compiled .a files from your project for other people to link to in their project, but this is not encouraged by the tooling, and I’ve personally never seen anyone do it.
All Go code uses the same style
Have you ever gone to read the source for a project you’d like to contribute to, and had your eyes cross over at the bizarre formatting the authors used? That almost never happens with Go. Go comes with a code formatting tool called gofmt that automatically formats Go code to the same style. The use of gofmt is strongly encouraged in the Go community, and nearly everyone uses it. Most text editors have an extension to automatically format your code with gofmt on save, so you don’t even have to think about it. You never have to worry about having a poorly formatted library to work with… and in the very rare situation where you do, you can just run it through gofmt and you’re good to go.
Easy cross platform support
Go makes it easy to support multiple platforms. The tooling can create native binaries for any popular operating system from the same source on a single machine. If you need platform-specific code, it’s easy to specify code that only gets compiled for a single platform, simply by appending _<os> to a file name, e.g. path_windows.go will only be compiled for builds targeting Windows.
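For instance, a hypothetical platform-specific file might look like this (the package name and constant are made up for illustration):

// path_windows.go: compiled only when building for Windows
package mypath

// Separator is the platform’s path separator; path_linux.go, path_darwin.go,
// etc. would each provide their own value.
const Separator = `\`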
Built-in documentation and testing
Go comes with a documentation generator that produces HTML or plain text from minimally formatted comments in the code. It also comes with a standard testing package that can run unit tests, performance benchmarks, and runnable example code. Because this is all available in the standard library and with the standard tools, nearly everyone uses it… which means it’s easy to look at the documentation for any random Go package, and easy to check whether the tests pass, without having to install some third party support tool. Because it’s all standardized, several popular websites have popped up to automate generating (and hosting) the documentation for your project, and you can easily run continuous integration on your package with only a single line in the setup script - “language: go”.
Conclusion
Everything about Go encourages standardization and openness… which not only makes it possible to use other people’s code, it makes it easy to use other people’s code. I hope to see Go blossom as a language embraced by the open source community, as they discover the strengths that make it uniquely qualified for open source projects.
The best things about Go have nothing to do with the language.
Single Executable Output
Go compiles into a single executable that runs natively on the target OS. No more needing to install java, .net, mono, python, ruby, whatever. Here’s your executable, feel free to run it like a normal person. And you can target builds for any major OS (windows, linux, OSX, BSD).
One True Coding Style
GoFmt is a build tool that formats your source code in the standard Go format. No more arguing about spacing or brace matching or whatever. There is one true format, and now we can all move on… and even better, many editors integrate GoFmt so that your code can be automatically formatted whenever you save.
Integrated Testing
Testing is integrated into the language. Name a file with the suffix _test.go and it’ll only build under test. You run tests simply by running “go test” in the directory. You can also define runnable example code with output that is checked at test time. This example code is then included in the documentation (see below)… now you’ll never have examples in documentation with errors in them. Finally, you can have built-in benchmarks that are controlled by the go tool to automatically run enough iterations to get a significant result, reported as the time per operation.
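As a quick illustration, here’s a hypothetical, self-contained test file (Double is defined here only to keep the sketch complete; it would normally live in a non-test file):

// mathutil_test.go
package mathutil

import (
	"fmt"
	"testing"
)

func Double(n int) int { return n * 2 }

func TestDouble(t *testing.T) {
	if got := Double(2); got != 4 {
		t.Errorf("Double(2) = %d, want 4", got)
	}
}

// Runnable example; go test checks the Output comment.
func ExampleDouble() {
	fmt.Println(Double(3))
	// Output: 6
}

func BenchmarkDouble(b *testing.B) {
	for i := 0; i < b.N; i++ {
		Double(i)
	}
}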
Integrated Documentation
HTML documentation is built into the language. No need for ugly HTML in your source or weirdly formatted comments. Plaintext comments are turned into very legible documentation, and see above for examples that actually run and can have their output tested as a part of the tests.
DVCS
Support for distributed version control is built into the language. Want to reference code from a project on github? Just use the url of the project as the import path in your code, e.g. import “github.com/jsmith/foo” When you build your code it’ll get downloaded and built automatically.
Want to get a tool written in go? From the command line type “go get github.com/jsmith/bar” - go will download the source, build it, and install the executable in your path. Now you can run bar.
Any git, SVN, mercurial, or bazaar repository will work, but all the major public source code sites are supported out of the box - github, bitbucket, google code, and launchpad.
Other Cool Stuff
Debugging with gdb
Integrated profiling tools
Easy to define custom includes per targeted OS/architecture (a simple _windows suffix will only build when targeting Windows)
Integrated code parsers and lexers
Do you even care about the actual language anymore? I wouldn’t. But just in case:
C-like
Garbage Collected
Statically typed
…but with type inference so you’re not typing boilerplate all the time: a := "my string"
Implicit interfaces - if a type has the methods of an interface, it implements the interface
Pointers but no pointer arithmetic (thank god)
First class functions
No exceptions
…but multiple returns from a single function so you don’t have to overload return types
Everything is UTF8 (both strings and source code.. yes you can have Θ as a variable name now)
Highly performant asynchronous code that is trivial to write
A deep standard library that does most of the boring stuff for you
I recently got very enamored with Go, and decided that I needed to write a real program with it to properly get up to speed. One thing came to mind after reading a lot on the Go mailing list: a code generator.
I had worked with Ned Batchelder at a now-defunct startup, where he developed cog.py. I figured I could do something pretty similar with Go, except, I could do one better - Go generates native executables, which means you can run it without needing any specific programming framework installed, and you can run it on any major operating system. Also, I could construct it so that gocog supports any programming language embedded in the file, so long as it can be run via command line.
Gocog runs very similarly to cog.py - you give it files to look at, and it reads the files looking for specially tagged embedded code (generally in comments of the actual text). Gocog extracts the code, runs it, and rewrites the file with the output of the code embedded.
Thus you can do something like this in a file called test.html:
if you run gocog over the file, specifying python as the command to run:
gocog test.html -cmd python -args %s -ext .py
This tells gocog to extract the code from test.html into a file with the .py extension, and then run python <filename> and pipe the output back into the file.
This is what test.html looks like after running gocog:
Note that the generator code still exists in the file, so you can always rerun gocog to update the generated text.
By default gocog assumes you’re running embedded Go in the file (hey, I wrote it, I’m allowed to be biased), but you can specify any command line tool to run the code - python, ruby, perl, even compiled languages if you have a command line tool to compile and run them in a single step (I know of one for C# at least).
“Ok”, you’re saying to yourself, “but what would I really do with it?” Well, it can be really useful for reducing copy and paste or recreating boilerplate. Ned and I used it to keep a schema of properties in sync over several different projects. Someone on Golang-nuts emailed me and is using it to generate boilerplate for CGo enum properties in Go.
Gocog’s source code actually uses gocog - I embed the usage text into three different spots for documentation purposes - two in regular Go comments and one in a markdown file. I also use gocog to generate a timestamp in the code that gets displayed with the version information.
You don’t need to know Go to run Gocog, it’s just an executable that anyone can run, without any prerequisites. You can download the binaries of the latest build from the gocog wiki here: https://github.com/natefinch/gocog/wiki
Feel free to submit an issue if you find a bug or would like to request a feature.
No, not contests; Go as in golang (the programming language), and Win as in Windows.
Quick background - Recently I started writing a MUD in Go for the purposes of learning Go, and writing something that is non-trivial to code. MUDs are particularly suited to Go, since they are entirely server based, are text-based, and are highly concurrent and parallel problems (which is to say, you have a whole bunch of people doing stuff all at the same time on the server).
Anyway, after getting a pretty good prototype of the MUD up and running (which was quite fun), I started thinking about using Go for some scripty things that I want to do at work. There’s a bit of a hitch, though… the docs on working in Windows are not very good. In fact, if you look at golang.org, they’re actually non-existent. This is because the syscall package changes based on what OS you’re running on, and (not surprisingly) Google’s public golang site is not running on Windows.
So, anyway, a couple notes here on Windowy things that you (I) might want to do with Go: