Archive for the ‘Software Development’ Category

AWS: The Complete Guide to Setting up a Simple Webserver

In short, I got sick and tired of paying $700/month for a cabinet in a datacenter and hosting my own servers. The failover was never truly redundant, and my firm is just too small to justify hosting its own infrastructure. I’ve never been a fan of hosting services because I’ve always found them too limiting. I consider myself to be a damn good systems administrator, and giving up access to router configs and root access to the machines was a nonstarter for me. I’m set in my ways.

While my experience so far has been excellent, there are a number of pitfalls that you need to be aware of, and I will be sure to address those towards the end. This guide is intended for non-technical people to do very technical things so expect a LOT of explanation.

Sign up for Amazon Web Services

First, go to aws.amazon.com and create an account. You’ll have to enter your billing info, but as long as you stay within the free tier, you won’t be charged anything.

Set up your virtual machine

Next, you’ll want to create an EC2 instance. The AWS Free Tier gives you 750 hours per month of Linux or Windows EC2 Micro Instances. Even a 31-day month has only 744 hours (31 × 24), so as long as you run only one micro instance, it should be completely free. For 99% of small business websites, this is PLENTY, though Amazon does offer a variety of larger configurations!

1. Sign in to the AWS Management Console with the credentials you just created. The link is under the dropdown in the top right.
2. Click on EC2
3. Click the button marked “Launch Instance”.
4. Click “Classic Wizard” and continue.
5. Select a 64-bit Amazon Linux AMI. For the curious, Amazon Linux is a Red Hat-style distribution similar to CentOS.
6. Set the number of instances to 1 and make sure that Micro is selected as the instance type. The availability zone shouldn’t make a huge difference unless you’re latency sensitive and know where your users are concentrated. If that isn’t you, select “No Preference” in the dropdown and continue.
7. On the next page, all of the defaults are fine, so go ahead and hit Continue again.
8. On the next page, you’ll have to give your server a name. Type it into the Value box next to Name and hit Continue.
9. To access the server from your local machine, you will need to create a key pair. This will become important later when we discuss how to connect to the server. For now, just enter a name under “Create a New Key Pair” and click “Create & Download your Key Pair”. This will download a .pem file. NOTE: THIS FILE IS IMPORTANT. Put it somewhere secure and preferably backed up, because AWS will not let you download it again.
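10. Set up the server’s firewall (its security group) so that web and administrative traffic is allowed in and everything else is blocked. Open port 22 (SSH) so you can log in with PuTTY, WinSCP, or Terminal, and port 80 (HTTP) for your website. Add port 443 (HTTPS) only if you plan to serve SSL.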
11. Click “Launch” and you’ll have to give it a few minutes while AWS runs off and builds your configuration.

While we’re waiting, I want to get one last step out of the way so that we’re on top of things for what’s next. Click “Instances” in the menu on the left to bring up a list that contains your server. Clicking on the server instance brings up some information about it at the bottom of the page. Scroll down to “Public DNS” and copy the value to the right of it. You’re going to be using this address quite a bit. If you have registered a domain that you want to point to this server, now would be the time to do it. Go to your domain registrar and paste the Public DNS value into the record you’re forwarding. I host through GoDaddy, so I went to my DNS manager and changed the “@” record to the copied value.

Now let me explain a bit about what you’ve done so far. Those with a technical background can skip to the next section.

A virtual machine is a piece of software that emulates a computer’s hardware. You can put any operating system on it and it will just run. Virtual machines are great for things like creating software testing environments, because you can very quickly set up a “brand new clean” virtual machine, perform whatever tests you need, then throw it out and start fresh. You can vary hardware configurations such as memory, processor architecture, and hard drive size relatively easily. Amazon EC2 instances are virtual machines with public IP addresses that can be accessed from the internet, so they are ideal for our purposes. So far you’ve created a virtual server and made note of its public address, possibly hooking it up to a domain if you have one.

Get Connected to your VM

I run a Mac with Parallels, so I can explain how to connect to your server from both environments. On the Windows side, I use WinSCP for file transfers and PuTTY for shell access, so go ahead and download both of those programs first. Next, dig up that .pem file that I told you was super important. You’re going to have to do a bit of work to get that .pem file to work with PuTTY, because PuTTY only accepts keys in its own .ppk format (don’t ask me why).

1. Open PuTTYgen.
2. Click Conversions -> Import key.
3. Open your .pem file. The Key pane should populate with a public key, key fingerprint, comment, and passphrase fields.
4. Change the comment from “imported-openssh-key” to whatever you named your key pair in step 9 above.
5. Pick a passphrase and click “Save private key.”
6. You’re going to store your key in a timesaver called Pageant, so go ahead and open it. When you open Pageant, you’ll just see a new icon in the system tray.
7. Right-click the new icon in the system tray and click “Add Key”.
8. Open your PPK file that you created with PuTTYgen and enter the passphrase you created.
9. Close the window
10. Open PuTTY and paste the public address you copied earlier into the Host Name input, prefixed with ec2-user@ (the default login). Then click Open.

For WinSCP, just click the “New” button and enter your hostname and login credentials as you did for PuTTY. There’s a “Private key file” option where you specify the same .ppk file that you created for PuTTY. After you’ve done this, go ahead and connect to browse your server’s filesystem.

On the mac side, I use the terminal to issue commands on the server. There’s a bit of set up involved to make it easy.
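1. Start up the Terminal application. I just go to the search bar and type “Terminal”.
2. Take the key pair file (the .pem, not the .ppk) and copy it to the hidden .ssh folder in your home directory (~/.ssh/, i.e. /Users/[your user id]/.ssh/). Then lock down its permissions with chmod 400 ~/.ssh/[your key name].pem, because SSH refuses to use a private key that other users can read.
3. Type ssh -i ~/.ssh/[your key name].pem ec2-user@[your server's host address]. The command will end up looking something like this: ssh -i ~/.ssh/mykey.pem ec2-user@ec2-00-00-00-00.compute-1.amazonaws.com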

Get services running on the VM

Now that our server is up and running and we have access to it, it’s time to issue a bunch of Linux commands. I’m going to tell you what they are, then explain.

1. Connect to your server using either PuTTY or the Mac Terminal. The default login is ec2-user.
2. Run sudo su. This switches you to the root user. The root user is allowed to do whatever it wants, whereas when you log in as ec2-user you’re highly restricted. If you’re on a PC, you can copy the command text from here and right-click in your PuTTY window to paste it.
3. yum update – yum is a package manager. It connects to a repository of the latest software available for this installation of Linux and installs (or updates) whatever software packages you tell it to. Think of it as Windows Update on steroids.
4. yum install httpd mysql mysql-server php php-mysql php-xml php-pdo php-odbc php-soap php-common php-cli php-mbstring php-bcmath php-ldap php-imap php-gd nano sysstat
httpd is also known as Apache. It’s a webserver that will ultimately serve up your HTML pages and PHP scripts. MySQL is a database server that will be required by our later installations of Drupal and WordPress. PHP is a very powerful scripting language that is required by WordPress and Drupal. It’s an interpreted language that outputs text (HTML, CSS, JavaScript, or whatever you tell it to, really) to a browser. Nano is a text editor that we’ll use when we’re not editing files in WinSCP. Sysstat provides system-monitor-style information such as CPU, memory, and disk statistics.
5. Fire up the webserver by running service httpd start
6. Fire up the database server by running service mysqld start
7. Set a password for the MySQL root user by running mysqladmin -u root password [new password], then run service mysqld restart.
8. Test everything out to make sure that it worked. Run cd /var/www/html/ and then nano index.php. Type in <?php phpinfo(); ?>, then press CTRL + O (and Enter) to save and CTRL + X to exit. Now open a web browser and navigate to your hostname (if you’ve already forwarded your domain here, go ahead and give that a shot). The page should open with detailed information about your PHP settings. Once you’ve confirmed it works, delete or rename index.php, since it advertises your server’s configuration to anyone who visits.
9. [Optional] To make your life easier from a permissions standpoint, give ec2-user ownership of the web folders. That way you can copy files into them with WinSCP without switching to root: chown -R ec2-user /var/www

Install web software

In this section, our ultimate goal is to install a CMS such as wordpress or drupal, but first we’re going to make our lives easy from a database standpoint by installing phpMyAdmin.

1. Download phpMyAdmin from phpmyadmin.net. Unzip it into a folder and copy that folder to /var/www/html/ using WinSCP (or the scp command if you’re on a Mac).
2. Change the name of the folder to something you can remember. “db_admin,” for example.
3. From a web browser, navigate to the folder you just installed. “www.example.com/db_admin,” for example.
4. PhpMyAdmin will allow you to create databases, which will be used by wordpress and/or drupal. You can create the databases now, but I would actually recommend waiting because web software will usually create the databases for you.
5. Follow the same procedure to copy WordPress and/or Drupal into their own folders. The installations themselves are pretty self-explanatory, but if you need help I might be talked into doing a step-by-step guide for each of those.

Optimize Machine

I had this really nasty issue of the virtual machine running out of memory and mysql blowing up and crashing. I made some changes to my my.cnf file that made the installation significantly more stable but I was still left feeling like I wasn’t leaving myself enough room for error, so I also created a swapfile just to put my mind at ease.

Open /etc/my.cnf in an editor. If you log in as root (sudo su, just like before), you can edit it in nano, or you can use chown to let yourself edit the file in WinSCP as ec2-user. The changes I made go under the [mysqld] section of my my.cnf file and are as follows:

# Set internal buffers, caches and stacks very low
key_buffer = 320K
table_cache = 10
sort_buffer_size = 320K
read_buffer_size = 320K
read_rnd_buffer_size = 24K
net_buffer_length = 24K
thread_stack = 320K

innodb_buffer_pool_size = 10M
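Why do such small numbers help? A common back-of-the-envelope estimate of MySQL’s worst-case memory use is the global buffers plus every connection’s per-thread buffers. The sketch below applies that rough heuristic to the values above. It is only an approximation: it ignores InnoDB overhead, the table cache, temporary tables and the like. The connection counts are also assumptions. As far as I know, 151 is the stock MySQL 5.x max_connections default; check yours with SHOW VARIABLES LIKE 'max_connections'.

```python
# Rough worst-case MySQL memory estimate (a heuristic, not MySQL's own accounting).
# Values copied from the my.cnf snippet above, expressed in KiB.
GLOBAL_BUFFERS_KIB = {
    "key_buffer": 320,
    "innodb_buffer_pool_size": 10 * 1024,  # 10M
}
PER_THREAD_BUFFERS_KIB = {
    "sort_buffer_size": 320,
    "read_buffer_size": 320,
    "read_rnd_buffer_size": 24,
    "net_buffer_length": 24,
    "thread_stack": 320,
}


def worst_case_kib(max_connections: int) -> int:
    """Global buffers plus one full set of per-thread buffers per connection."""
    per_thread = sum(PER_THREAD_BUFFERS_KIB.values())
    return sum(GLOBAL_BUFFERS_KIB.values()) + max_connections * per_thread


# Assumed connection counts; 151 is taken to be the stock default.
for conns in (10, 50, 151):
    print(conns, "connections ->", round(worst_case_kib(conns) / 1024, 1), "MiB")
```

Per-thread buffers are multiplied by the number of connections, so trimming them (or lowering max_connections) is what keeps a micro instance from running out of memory under load.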

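To add a 2 GB swapfile, issue the following commands in this exact order:

# Create a 2 GB file of zeros (2048 blocks of 1 MB)
sudo dd if=/dev/zero of=/var/swapfile bs=1M count=2048
# Only root may read or write the swapfile
sudo chmod 600 /var/swapfile
# Format the file as swap space
sudo mkswap /var/swapfile
# Register it in /etc/fstab so it is re-enabled after a reboot
echo /var/swapfile none swap defaults 0 0 | sudo tee -a /etc/fstab
# Turn on all swap listed in /etc/fstab
sudo swapon -a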
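To confirm the swapfile is active, run free -m; the swapfile shows up in its Swap row. If you’d rather check from a script, here is a minimal Python sketch that reads the SwapTotal field from /proc/meminfo. The helper names are my own and not part of any tool. Expect a value of roughly 2048 MiB after the commands above.

```python
import os


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field name: integer value}.

    Most fields are reported in kB; a few (e.g. HugePages_Total) are plain counts.
    """
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        fields = rest.split()
        if sep and fields:
            values[name.strip()] = int(fields[0])
    return values


def swap_total_mib(text):
    """Total configured swap in MiB (0 if the SwapTotal field is missing)."""
    return parse_meminfo(text).get("SwapTotal", 0) / 1024


# Only read the real file when running on a Linux box that has it.
if __name__ == "__main__" and os.path.exists("/proc/meminfo"):
    with open("/proc/meminfo") as f:
        print(f"Swap configured: {swap_total_mib(f.read()):.0f} MiB")
```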

Post-Trade Analysis of Yesterday’s AP Hack!

And How Gatekeeper Fared…
One concern that I’ve noticed amongst options traders is their reluctance to auto-quote. Auto-quoting is the use of an options model to send buy and sell orders in terms of volatility rather than outright market prices. This is useful for traders who want to buy and sell at specified volatility points across an entire option chain. They often delta hedge directly into the underlying whenever their auto-quoting orders are filled.

The fear of auto-quoting comes from slow, high-latency systems getting hit on multiple strikes whenever a “book sweep,” or rapid repricing of the underlying, occurs. Auto-quoting systems need to do up to four Black-Scholes (or insert your favorite model) calculations for every single strike they’re quoting, and then get change orders back out on the wire to the matching engine. If you’re quoting in 10 different underlyings, each with 40 live strikes, that’s up to 10 × 40 × 4 = 1,600 model evaluations per repricing. Combined with messaging overhead, automated order logic, and user interfaces, this can bring even the most powerful trading computers to their knees in fast market conditions! In other words, auto-quoting systems have historically tended to fail right when they’re needed the most!
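To make the arithmetic concrete, here is a minimal Python sketch of the closed-form Black-Scholes price (European options, no dividends) and a repricing pass over a chain. It assumes the four calculations per strike are a bid and an ask for both the call and the put, each quoted as a volatility. That layout, the strikes, and all the inputs are illustrative. This is not how Gatekeeper’s Formula strategy is implemented.

```python
import math


def norm_cdf(x: float) -> float:
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def black_scholes(spot, strike, t, rate, vol, is_call):
    """Closed-form Black-Scholes price of a European option with no dividends."""
    sqrt_t = math.sqrt(t)
    d1 = (math.log(spot / strike) + (rate + 0.5 * vol * vol) * t) / (vol * sqrt_t)
    d2 = d1 - vol * sqrt_t
    discount = math.exp(-rate * t)
    if is_call:
        return spot * norm_cdf(d1) - strike * discount * norm_cdf(d2)
    return strike * discount * norm_cdf(-d2) - spot * norm_cdf(-d1)


def reprice_chain(spot, strikes, t, rate, bid_vol, ask_vol):
    """Turn volatility quotes into prices: call bid/ask and put bid/ask per strike."""
    quotes = {}
    for k in strikes:
        quotes[k] = (
            black_scholes(spot, k, t, rate, bid_vol, True),
            black_scholes(spot, k, t, rate, ask_vol, True),
            black_scholes(spot, k, t, rate, bid_vol, False),
            black_scholes(spot, k, t, rate, ask_vol, False),
        )
    return quotes


# 10 underlyings x 40 strikes x 4 evaluations per strike.
# For brevity every underlying uses the same made-up inputs.
strikes = [1500 + 5 * i for i in range(40)]
evaluations = 0
for _ in range(10):
    chain = reprice_chain(1600.0, strikes, 0.25, 0.01, 0.18, 0.20)
    evaluations += 4 * len(chain)
print(evaluations)  # 1600
```

Each evaluation is cheap on its own. The danger the text describes comes from having to redo all of them, and then send the resulting order changes, every time the underlying moves.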

Market conditions like those created by yesterday’s false tweet from the Associated Press can be incredibly expensive in the first place, and a failure or freeze of an auto-quoting system could be catastrophic. The following is an account of my experiences using SD Gatekeeper during the crash.

First, a bit of background: I’ve been at a client site for the past week testing the next generation of tools we’re building. I sit in a small office right next to the traders who are using it. I spend most of my time babysitting algorithms and communicating with the developers about any bugs or performance issues with the software. Meanwhile, the traders are using Gatekeeper to make two-sided markets. Due to the proprietary nature of my customers’ trading and my duty to protect their trade secrets, I will not disclose which markets they were participating in. However, I can share the following pieces of data:

  • At the time of the crash, they were making two-sided markets in 6 different markets, 5 of which were severely impacted by the crash.
  • All of the auto-quoting was taking place on a client machine several milliseconds away from the CME.

Around 10:00 AM PDT, we observed a massive sell-off in the E-Mini S&P. The exact sequence of events from there is both a bit hazy to me, given how busy things got, and strictly confidential. The important point is that several minutes passed between when the crash began and when we finally pulled our orders. This sounds like the makings of a disaster, but it wasn’t.

Blue Horseshoe Loves Gatekeeper

During the entire period of heightened market activity, Gatekeeper was successful in moving its orders out of the way and NOT ONE SINGLE ORDER was filled at a bad price. While there was a brief lockup of the User Interface, the auto-quoting functionality performed flawlessly even during the lockup.

For the tech geeks who want to understand how this is possible, I’m going to go into my analysis of the log files and the subsequent stress testing and code profiling that my developers did. Casual readers may want to stop at this point, and I don’t blame you.

For the tech people:

Gatekeeper is an adapter-centric front end built from a handful of assemblies: Gatekeeper, GKAPI, Strategies, and its adapters. All of the adapters inherit from our adapter base class, which makes it easy for us (and third-party developers) to add exchange connectivity and additional execution venues. The GKAPI then normalizes all prices, orders, and business logic into our own concepts of instruments, orders, prices, and so on. Gatekeeper itself is simply a dumb UI implementation of the GKAPI. The strategies (Formula for auto-quoting, Bookstackers, and Spread Bandit) are all separate assemblies that use the GKAPI.
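The adapter-centric idea can be sketched in a few lines of Python. The class and field names below (Order, AdapterBase, ExampleVenueAdapter, and the venue message keys) are hypothetical. They illustrate the pattern of normalizing orders once and letting each adapter translate them into a venue’s own format; they are not the GKAPI’s actual types.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass(frozen=True)
class Order:
    """Hypothetical venue-neutral order, standing in for a normalized API order."""
    instrument: str
    side: str        # "BUY" or "SELL"
    price: float
    quantity: int


class AdapterBase(ABC):
    """Hypothetical base class: shared checks here, venue translation in subclasses."""

    @abstractmethod
    def to_venue(self, order: Order) -> dict:
        """Convert a normalized order into the venue's native message."""

    def submit(self, order: Order) -> dict:
        # Checks common to every venue live once, in the base class.
        if order.quantity <= 0:
            raise ValueError("quantity must be positive")
        return self.to_venue(order)


class ExampleVenueAdapter(AdapterBase):
    """Hypothetical adapter; the message field names are invented for illustration."""

    def to_venue(self, order: Order) -> dict:
        return {
            "sym": order.instrument,
            "bs": "B" if order.side == "BUY" else "S",
            "px": order.price,
            "qty": order.quantity,
        }


adapter = ExampleVenueAdapter()
print(adapter.submit(Order("ESM3", "BUY", 1575.25, 1)))
```

Adding a venue means writing one more to_venue method. Strategies and the UI keep working with Order and never see venue-specific formats.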

From a threading standpoint, we have a thread for each adapter, one for strategies, one for logging, and one for the UI. There are other background workers doing various tasks, but for the purposes of this article these are the relevant ones. The only one that became overwhelmed was the UI thread, and as we later found out, some of that wasn’t our fault.
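Giving each concern its own thread is what lets the strategies keep working while the UI falls behind. The minimal Python sketch below shows that decoupling. It uses hypothetical worker functions and a deliberately slow UI consumer fed through an unbounded queue, so the strategy thread never waits on the UI. It finishes all of its requotes while the UI still has a backlog. This illustrates the principle only and is not Gatekeeper’s threading code.

```python
import queue
import threading
import time


def strategy_worker(ticks, ui_updates, results):
    """Requote on every tick and hand display updates to the UI without waiting."""
    for tick in ticks:
        results.append(tick * 2)      # stand-in for the real requoting logic
        ui_updates.put_nowait(tick)   # unbounded queue, so this never blocks


def ui_worker(ui_updates, stop, shown):
    """Deliberately slow consumer standing in for an overloaded UI thread."""
    while not stop.is_set():
        try:
            item = ui_updates.get(timeout=0.01)
        except queue.Empty:
            continue
        time.sleep(0.05)              # simulated slow redraw
        shown.append(item)


ui_updates = queue.Queue()
stop = threading.Event()
results, shown = [], []

ui = threading.Thread(target=ui_worker, args=(ui_updates, stop, shown), daemon=True)
strategy = threading.Thread(target=strategy_worker, args=(range(1000), ui_updates, results))
ui.start()
strategy.start()
strategy.join()                       # completes even though the UI is far behind
backlog = ui_updates.qsize()
stop.set()
ui.join()
print(len(results), "requotes; UI backlog when the strategy finished:", backlog)
```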

In the form used yesterday, three supported adapters were active: XTAPI, CTS, and Risk Server. However, only the CTS adapter was being used for auto-quoting at the time of the crash.

For load-balancing purposes, CTS limits order adds and changes to 50 per second but places no limit on how many orders you can cancel. To prevent our orders from being rejected, our adapter keeps track of how many adds and changes we’ve sent in a given second. Once we begin to approach that limit, it cancels any order that “wants to change”. We then add new orders at the requested change prices at a metered rate, as the limit allows. The user basically sees a changed order, when in reality the adapter was canceling, waiting, and replacing.
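Here is a minimal Python sketch of that cancel, wait, and replace throttle. Several simplifications are my own: the class name, the one-second windows aligned to whole seconds, and throttling exactly at the limit rather than slightly before it, as our adapter does. It is not the CTS API or our adapter’s code.

```python
class ThrottledOrderSender:
    """Hypothetical throttle for a venue that caps adds + changes per second."""

    def __init__(self, max_msgs_per_sec: int = 50):
        self.max_msgs = max_msgs_per_sec
        self.window = None   # whole second the counter currently applies to
        self.used = 0        # adds + changes sent during that second
        self.pending = {}    # order_id -> price awaiting re-add (insertion-ordered)
        self.sent = []       # log of (time, action, order_id, price)

    def _send(self, now, action, order_id, price):
        if action != "CANCEL":           # cancels do not count against the cap
            self.used += 1
        self.sent.append((now, action, order_id, price))

    def _roll(self, now):
        second = int(now)
        if second != self.window:
            self.window, self.used = second, 0
            # Spend the fresh budget on queued replacements first, oldest first.
            while self.pending and self.used < self.max_msgs:
                order_id = next(iter(self.pending))
                self._send(now, "ADD", order_id, self.pending.pop(order_id))

    def change(self, now, order_id, price):
        """Request a new price for a working (or already pulled) order."""
        self._roll(now)
        if order_id in self.pending:
            self.pending[order_id] = price             # already cancelled; retarget
        elif self.used < self.max_msgs:
            self._send(now, "CHANGE", order_id, price)
        else:
            self._send(now, "CANCEL", order_id, None)  # get out of the way now
            self.pending[order_id] = price             # re-add later at new price

    def tick(self, now):
        """Call periodically so queued re-adds go out when a new second starts."""
        self._roll(now)


sender = ThrottledOrderSender(max_msgs_per_sec=50)
for order_id in range(80):          # 80 orders want new prices within one second
    sender.change(10.0, order_id, 100.0 + order_id)
sender.tick(11.0)                   # next second: the 30 pulled orders are re-added
actions = [action for _, action, _, _ in sender.sent]
print(actions.count("CHANGE"), actions.count("CANCEL"), actions.count("ADD"))  # 50 30 30
```

To throttle a little below the hard limit, as described above, compare against max_msgs minus some headroom instead of max_msgs.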

Under market conditions like yesterday’s, this essentially means that we were only slightly participating in the market (no more than 50 of our thousands of orders were active in the market at any given time).

Had we been using a DMA solution with co-location, we would have been fast enough to keep up anyway, because our formula engine, FAST decoders, and FIX engine have all been designed to accommodate rapid repricing and mass quoting. We take advantage of things like kernel bypass, scheduler optimizations, and iLink session spinning to get around the limitations of third-party APIs.


Still Using Spreadsheets to Manage Risk or Create Algos? Time to Reconsider.

I’ve been building technology for trading firms for over a decade. From proprietary desks with in-house development and security so tight that you have to check your cellphone at the door, to shared office spaces where people openly collaborate on their strategies, I’ve seen it all. I’d say that one of the most common tools I’ve seen on trading desks is Microsoft Excel.

Excel is popular because you can combine realtime data feeds from a multitude of sources with complicated mathematical formulas to do incredibly complex things in very little development time. I’ve seen Excel sheets used for everything from position analytics and reporting to realtime execution algorithms and complex order types. As a developer, I use Excel sheets to build proofs of concept constantly. Gatekeeper’s Formula Strategy, Spread Bandit, and even the Position Manager in Gatekeeper found their genesis in spreadsheets. It’s easy to use, and anyone with a basic understanding of math and logic can become a programmer. Well, almost.

Like just about anything that seems too good to be true, it is. All of the things that make Excel powerful can also make it dangerous. You can link cells together into incredibly complex chains of logic without any kind of compile checks or enforced design best practices. That means everyone who “knows enough to be dangerous” is. According to a recent MarketWatch article, 88% of Excel sheets contain errors! That isn’t surprising: in a survey of senior executives conducted by Vision Critical, only one fifth of companies had control policies for spreadsheets. Even in companies that have such policies, they go unapplied about one third of the time!

One thing that separates traders from programmers, however, is that programmers have bug elimination built into their trade. They use strongly typed languages, compile-time checks, best-practice policies built into their development methodologies, and post-development quality assurance processes before their code ever makes it into production. Software engineers have taken enough hard knocks from bugs to know that you can’t put a prototype into production, and the same should apply to spreadsheets.

So as a trader, what can you do to protect yourself from spreadsheet errors?

The first thing to do is to find places where your spreadsheets are either error-prone, rely on third party data sources, or rely on bad or changing assumptions. In those places, you need to add checks for data validity and create alerts for when it’s not.
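Here is a minimal Python sketch of what such checks might look like once they move out of cells and into code. The field names and thresholds are hypothetical: a 30-second staleness limit, a 10% price band, and a 0 to 25% interest-rate range. The checks target the failure modes this article mentions, such as an expiration that wasn’t rolled, a stale third-party feed, or an out-of-date interest rate.

```python
from dataclasses import dataclass
from datetime import date, datetime, timedelta
from typing import List, Optional


@dataclass
class PositionInputs:
    """Hypothetical inputs a risk sheet might pull from a feed or a hand-typed cell."""
    symbol: str
    expiration: date
    last_price: float
    price_time: datetime
    rate: float          # annual interest rate fed to the pricing model


def validate(inputs: PositionInputs, today: date, now: datetime,
             max_age: timedelta = timedelta(seconds=30),
             prev_close: Optional[float] = None,
             band: float = 0.10) -> List[str]:
    """Return human-readable alerts; an empty list means every check passed.

    All thresholds are illustrative defaults, not recommendations.
    """
    alerts = []
    if inputs.expiration < today:
        alerts.append(f"{inputs.symbol}: expiration {inputs.expiration} is in the past (contract rolled?)")
    if now - inputs.price_time > max_age:
        alerts.append(f"{inputs.symbol}: price is stale ({now - inputs.price_time} old)")
    if inputs.last_price <= 0:
        alerts.append(f"{inputs.symbol}: non-positive price {inputs.last_price}")
    elif prev_close and abs(inputs.last_price / prev_close - 1) > band:
        alerts.append(f"{inputs.symbol}: price moved more than {band:.0%} from previous close")
    if not 0 <= inputs.rate < 0.25:
        alerts.append(f"{inputs.symbol}: interest rate {inputs.rate} outside expected range")
    return alerts


now = datetime(2013, 6, 17, 10, 0, 0)
row = PositionInputs("ESM3", date(2013, 6, 21), 1630.0, now - timedelta(minutes=5), 0.0025)
alerts = validate(row, now.date(), now, prev_close=1625.0)
for alert in alerts:
    print("ALERT:", alert)
```

Here the five-minute-old price trips the staleness check. In a live setup, the alerts would go wherever a trader will actually see them before acting on the numbers.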

If your spreadsheets are used to enter orders into the market and/or display data that might cause a trading decision, it’s time to contract a developer to build you a custom tool, or evaluate off-the-shelf solutions. I’ve seen position greeks be completely wrong because someone didn’t update an expiration date after a contract roll or realtime price valuations become slightly off because interest rates changed and weren’t updated (okay that hasn’t happened recently, but you get the idea).

If hiring a software company sounds too expensive, remember that it’s cheaper than trading errors. Just ask JP Morgan, whose $2 billion trading loss was blamed in part on spreadsheet errors! At the very least, give this article on best practices a read, or check out the European Spreadsheet Risks Interest Group to fully understand the risks.