Salty Infrastructure, Part One

I recently attended PyCon 2013 out in beautiful California, and as expected I absorbed quite a bit of knowledge from all of the great speakers there. Yeah, I know that just about the only thing that people remember from this recent PyCon involved crude jokes, public shaming, and a crazy Internet storm of hate. It’s unfortunate that all of that occurred, but that’s all I’m going to say on the matter. Not “choosing a side” or anything of the sort. Instead, I’d like to talk about something positive that I got out of the trip.

While I’m a backend Python developer for my current employer, in the past I’ve been a Linux system administrator. I’ve always leveraged that experience across the various jobs in my career, and I continue to find myself interested in infrastructure development, maintenance, and administration. There’s just something about that line of work that I always find incredibly appealing. This all led to me attending a great talk by Kate Heddleston titled “Chef: Automating web application infrastructure”.

Obviously, as the name implies, this talk covered Chef. It’s hard to be in the development or system administration communities and not at least be aware of Chef and/or Puppet. Now, both of these are great tools that I completely recommend for anyone to check out. I’ve played a bit with them both, but Kate mentioned another tool during her talk that wasn’t quite “mature” in her eyes. That tool was SaltStack, and it’s written in Python. I was immediately interested to see what this group was up to given that it’s relatively new and written in Python. I’ve spent a couple of weeks off and on using it to bootstrap the servers for a new side project of mine and would like to share my initial thoughts. I hope to make a couple of posts about it as I learn more and get more comfortable. For now I’ll warn that what I share here should not be seen as “Best Practices” or anything such as that.

As I started to sit down and sink my teeth into Salt, I had a pretty distinct goal in mind. I’m working on a new side project with a friend of mine, and I wanted to quickly be able to spin up new servers when needed. Frankly, scaling isn’t an issue at all for us yet, but half the fun of side projects is learning something new. I also wanted to run this on EC2. I made a quick list of requirements that my server(s) were going to need:

  • Python
  • Django
  • Virtualenv
  • nginx
  • gunicorn
  • postgresql
  • supervisord
  • misc packages (git, zip/unzip, imaging libs)

Naturally, I started with the Official Salt Walkthrough. This is a pretty great method of getting yourself rolling. After going through it, you should have yourself a “salt master” and a “salt minion”. Their names are pretty self-explanatory. The “master” is the center of control, managing keys, issuing commands, and so on. Meanwhile, minion(s) are the nodes that you’ll be configuring as web servers, database servers and so on. Of course, the “master” server can also have a “minion” running on it. In my current situation, I just have one server running, and thus it’s both a master and a minion.

Getting through the installation and initial walkthrough of Salt is pretty straightforward. I’m not going to cover that aspect right now, and instead focus on the seemingly less talked about Salt Cloud. My only real complaint so far with this tool is the documentation, but it seems to be getting some love as we speak. What they have up now in terms of documentation is better than what I recall when I first started tinkering with the tool. Still, you’ll notice that the docs I linked you to don’t have an “installation” guide. This can be handled in a couple of different ways.

The simplest way is through your operating system’s package manager. If you followed the walkthrough earlier by wgetting a script and piping it through sh then you’re already halfway there. For instance, on Ubuntu you should then be able to do:

sudo apt-get install salt-cloud

If you went a less conventional route, there’s always pip:

sudo pip install salt-cloud

I will say that you’re really best off if you install salt-cloud on whichever server you plan on being your “Salt Master”. Doing so will ensure that newly spun up nodes will automagically have their keys accepted, and they’ll be instantly a proper part of your Salt ecosystem. You certainly can spin up servers from servers other than your master, but you’ll eventually have to pop into your master node to accept the Salt keys.

After you have salt-cloud installed, you’ll want to edit /etc/salt/cloud on your Salt Master. Now the changes you make to this file will depend on your cloud provider of choice. They currently cover a variety of providers, but their documentation really only discusses AWS, Rackspace and Parallels. I’ll be discussing AWS strictly, as that’s who I use, but there aren’t many differences.

Assuming you have /etc/salt/cloud open, the first thing you’ll want to do is find the minion option in the config. You’ll want to tell it who the master is, and then you’ll also want to set the provider, like so:

provider: aws

Now we’ll want to find the AWS section in the config, which will look like so:

AWS.id: YOUR_AWS_ID
AWS.private_key: /PATH/TO/PRIVATE_KEY.pem
# Specify whether to use public or private IP for deploy script
# private_ips or public_ips
AWS.ssh_interface: public_ips 
AWS.location: us-west-2
AWS.ssh_username: ubuntu

That’s pretty much all there is to the configuration. The rest of that configuration file is mostly for configuring other cloud providers. Now we need to define a cloud profile. Open up /etc/salt/cloud.profiles in your editor of choice, and add something like the following:

base_aws:
    provider: aws
    image: ami-YOURIMAGEID
    size: Micro Instance
    ssh-user: ubuntu

Now you should be able to run the following command:

salt-cloud -p base_aws NAME_TO_GIVE_INSTANCE

Afterwards, you should have a brand new EC2 instance running. If you ran that command from your “Salt Master” then the new node should already be configured as a minion that’s ready to receive commands.
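
If you end up creating several nodes, the same command is easy to script. Here’s a minimal Python 3 sketch that only builds the command lines, one per instance, in the exact form shown above. It doesn’t run anything. The function name and the instance names are made up for illustration, and the profile name is assumed to match whatever you put in /etc/salt/cloud.profiles.

```python
import shlex


def salt_cloud_commands(profile, instance_names):
    """Build one `salt-cloud -p PROFILE NAME` command per instance name."""
    return [["salt-cloud", "-p", profile, name] for name in instance_names]


if __name__ == "__main__":
    # "web1" and "web2" are placeholder instance names.
    for command in salt_cloud_commands("base_aws", ["web1", "web2"]):
        # Print rather than execute; on the Salt Master you could hand
        # `command` to subprocess.check_call to actually create the node.
        print(" ".join(shlex.quote(part) for part in command))
```

Keeping each command as a list of arguments means you never have to worry about quoting odd instance names if you do pass them to subprocess later.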

You can find a Getting Started With AWS tutorial via Salt’s documentation, which can provide you with more fine tuning options than my bare bones tutorial provides here. I hope, in the coming weeks, to get more in depth with some of the things I’m using Salt for. For now, feel free to take a look at my Salt configurations in my GitHub Repo.


Owning Your Infrastructure

I imagine a lot of you have heard about the recent routing issues at Heroku, where RapGenius accused Heroku of changing their routing system without informing their customers. Thankfully Heroku has been really professional about this by posting an apology along with a very detailed technical review of the situation. I can’t commend them enough for owning up to the issue properly, and publicly apologizing. That said, by their own admission the degradation of performance started roughly 3 years ago. They knew what they were doing for a long time, and the communication to their customers was poor or non-existent.

I don’t feel all that bad for RapGenius though. For me, their success story says it all:

“You don’t have to plan for the future growth of your application, you can just put it up and see what happens.”

What sort of world am I living in when a statement like that makes any sense at all? You don’t have to plan for the future growth of your application? The truth is that you do need to plan. The problem is that using a Platform as a Service (PaaS) leads us to solve scaling problems by inserting coins. Outsourcing your infrastructure leaves you at least partially blind, if not completely blind, to how your infrastructure actually works. I know nobody wants to manually configure a switch or load balancer. I know that as developers we tend to care much more about our code, but without a solid and predictable infrastructure the code’s worthless.

I know it’s tempting to use services like Heroku, but I just wish people realized the consequences. If Heroku went out of business tomorrow and shut their servers down, what would you do? Do you know how to configure Postgres? Apache? Nginx? Do you really know how to run your website without their help? Would you know how to migrate your application to a dedicated server, or how much downtime would be involved? I’m sure you’re convinced such a scenario could never happen, but you’re wrong. Companies die and services fail; it’s all inevitable. I’m not saying people should perform a mass exodus from Heroku and their ilk. I’m just urging people to really understand their applications. Just like you plan for data loss, you should plan for what happens when your hosting provider fails. If you know how to configure and deploy your application manually, then you’re all set. If you rely solely on companies such as Heroku though, you’ll come to regret it eventually.

Be prepared, and don’t be ignorant about your infrastructure. Be involved and have a plan for owning your infrastructure and how to grow it in the future. Don’t be so blind as to think that paying someone is the only way your application can be scaled. You don’t even have to buy servers and a bunch of hardware. You can rent VPS servers, leverage AWS directly, or even use something like Rackspace’s Cloud. There are numerous options, and they don’t have to be expensive, so be involved. Don’t be caught without a plan when you’re mistreated by a PaaS.


Introducing Chronicler

Just a few weeks ago the company I work for, Analyte Health, open sourced one of the tools we’ve developed. We had found ourselves routinely needing a way to accurately record the history of particular objects in our systems. Now obviously, there are a bunch of existing tools that could have handled this, but none of them seemed to suit our needs especially well. All of the other tools do a fantastic job of auditing changes to an object, but none of them do an especially good job handling the changes to relationships of that object. Our very specific need was to record the changes on a ManyToMany join table. We have a permission system that we use to assign various rights to all fifty states, so we needed to track when permissions were changed on any particular state. This is where we felt that other tools fell short, and thus created chronicler.

Most of the other tools that are out there are deprecated and no longer maintained. The others we tested didn’t seem to provide the consistent behavior we felt we needed. To be fair, the various other tools out there try to be much more flexible than we were with chronicler. We had the luxury of knowing that the objects we wanted to track only had one or two places where they could be modified. Thanks to that, we opted to go the route of creating a decorator that we could slap on views that enable users to modify our objects. It works like so:

from chronicler.decorators import audits

@audits(YourModel, ['relation_set', 'another_set'], 'pk', 'incoming_pk', 'POST')
def your_update_view(request):
    # modifications to the YourModel instance go here
    pass

The first argument is the model class that we want to keep our watchful eyes on. After that, we provide a list of relations that we need to track changes across. We’ll actually end up keeping full dictionary representations of the related objects that we pass along to chronicler. Following that is the field we should use to look up the object before processing the view. The ‘incoming_pk’ argument is the key we want to inspect for a value we’ll use with the previous argument to look up our object. Finally, we finish up by telling the decorator if we should look in the GET or POST of the request object for our ‘incoming_pk’ value. Optionally, you can also pass “force=True”, which will force an AuditItem to be created even if there aren’t any detectable changes. The decorator will then take all of that information, and get to work.

The way it ends up functioning is the decorator grabs a copy of the object before the view is executed. After the view is executed, we fire a custom signal, passing along our now stale object along with request.user so we know who made the changes. From there, chronicler catches the signal and saves a JSON version of the stale object in an AuditItem object. Before we create that JSON representation though, we first verify that there are actually changes. If there aren’t any changes we move on, but as mentioned above you can force AuditItems to be created even if there aren’t any changes.
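
To make that flow a little more concrete, here’s a stripped-down Python 3 sketch of the same idea without any Django in it. This is not chronicler’s actual code: it skips the custom signal and just appends to a list, and every name in it (audits_sketch, AUDIT_LOG, STATES, update_state) is made up for illustration. It only shows the snapshot-before, compare-after, store-as-JSON pattern described above.

```python
import copy
import functools
import json

AUDIT_LOG = []  # hypothetical stand-in for saved AuditItem rows


def audits_sketch(lookup, incoming_key, force=False):
    """Store the pre-change state of an object when a view modifies it.

    `lookup` turns the incoming value into the object being edited and
    `incoming_key` is the key to read that value from the request data.
    """
    def decorator(view):
        @functools.wraps(view)
        def wrapper(request_data, user):
            obj = lookup(request_data[incoming_key])
            stale = copy.deepcopy(obj)  # snapshot before the view runs
            response = view(request_data, user)
            # Only record an entry if something changed, unless forced.
            if force or stale != obj:
                AUDIT_LOG.append(
                    {"user": user, "data": json.dumps(stale, sort_keys=True)}
                )
            return response
        return wrapper
    return decorator


# Toy "database" keyed by primary key.
STATES = {19: {"state_code": "ME", "permissions": ["read"]}}


@audits_sketch(STATES.get, "incoming_pk")
def update_state(request_data, user):
    STATES[request_data["incoming_pk"]]["permissions"].append("write")
    return "ok"


if __name__ == "__main__":
    update_state({"incoming_pk": 19}, "alice")
    print(AUDIT_LOG)
```

The deep copy matters here: the view mutates the object in place, so without it the “stale” copy would change right along with the live one and no difference would ever be detected.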

After a view is processed, and you have your freshly created AuditItem you can access it like so:

from chronicler.models import AuditItem
audit_item = AuditItem.objects.filter(content_object=your_object).latest()
print audit_item.audit_data
{u'state_id': 19, u'state_code': u'ME', u'statepermission_set': [{u'created':...

“audit_data” is a @property on AuditItem objects that returns the dictionary representation of the object. Arguably, it would have been worthwhile to use one of the various JSONField options out there, but this @property in conjunction with a TextField suited our needs. We still actively use chronicler and hope to improve it over time, so who knows, maybe we’ll make that change if it makes sense.
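
The property-over-a-text-field pattern is simple enough to show in plain Python 3. This is a sketch and not the real AuditItem model: the class name and the serialized_data attribute are hypothetical, and in the actual Django model that attribute would be the TextField.

```python
import json


class AuditItemSketch(object):
    """Plain-Python stand-in: JSON text in storage, a dict when read."""

    def __init__(self, serialized_data):
        # In a Django model this attribute would be a TextField column.
        self.serialized_data = serialized_data

    @property
    def audit_data(self):
        # Decode on every access so callers always get a dictionary.
        return json.loads(self.serialized_data)


if __name__ == "__main__":
    item = AuditItemSketch(json.dumps({"state_id": 19, "state_code": "ME"}))
    print(item.audit_data["state_code"])
```

The trade-off is that the JSON gets parsed on each access, which is fine for occasional audit lookups but is one of the things a dedicated JSONField would handle for you.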

We hope that it helps somebody out there. We can’t possibly be the only group that ran into needs such as ours. For further updates, information, and install instructions just visit the GitHub Repo!

Blogging Again

So, it’s apparently been almost two years since my last post. That seems a bit crazy really. I’m going to dedicate myself once again to blogging more. I actually do a lot of cool stuff at work these days, and do have stuff to talk about, it’s just that I haven’t been making time. I’m going to start off by trying to blog at least twice a month. Ideally though, I’d like to get to a point of blogging at least once per week, even if it’s not wildly interesting.

Sure, this might be my last post for another two years, but hopefully I stick to it. Wish me luck?

Moving on Again

I suppose I should just drop the charade about “wow, it’s been so long since I last blogged”, eh? We both know it’s been a long time since I’ve blogged. I don’t know why it has to be covered, but it does. Maybe I should even wax poetic about how I plan on blogging more in the future. I’m not making any promises about future blog posts, though I do hope I return to blogging in the near future. We’ll see. For now though, I just want to drop a quick update.

As of May 20th, I’ll no longer be working at Medtelligent/QuattroSource. I just recently reached my one year anniversary with them, but awkwardly received an offer from another company on the day of my one year anniversary. Oddly enough, I also gave a talk on django-nonrel and MongoDB that very same day at the offices of Imaginary Landscape, my previous employer. Seemed like a weird trifecta that was accomplished there. That aside, it was simply an offer I couldn’t possibly refuse. Sure, the money and benefits were right, but the company itself seems like it’ll be an amazingly good fit for me personally.

The company I speak of is Analyte Health. I don’t want to speak too much about what their business is exactly and all that jazz. You can read their site if you really want that information. It’s not that it’s not interesting, because I find their product incredibly fascinating. I just don’t think you came here for a sales pitch, and honestly, I haven’t even started there yet so I don’t think I’m exactly the guy to be making sales pitches as of yet. Of course I’ll continue to be working with Python and Django, and having a lot of fun in that space. Hopefully I’ll get to continue playing in the NoSQL space from time to time, if not I’m sure I’ll continue playing with that whole field in my spare time. It really seems like a perfect fit for me at the end of the day though. Seems like they have a great office culture, a strong product, and try to develop the right way.

Anyways, I don’t have a lot of time to do a proper writeup right now. I do just want to say that I’ve had a great time at QuattroSource. The people there are great, the work was fun, and I learned a ton while I was there. Truth be told, I wasn’t even searching for a job, and I just sorta fell into a perfect situation at Analyte. I’m grateful for the time I had at Quattro, and I’m really looking forward to continuing to grow with Analyte. Hopefully we’ll see some more writing from me in the future, but again, no promises :)