All posts by David Veksler

FEE.org Project Retrospective Part 3: tools for managing software projects

This is the third part in my series on the migration of fee.org to Umbraco.  This post will cover the tools and processes I used to manage this project.

Development Process Overview

Here is a high-level overview of the implementation process – some of the development steps were done in parallel:

  1. Interview stakeholders – understand their workflow and goals for the new site
  2. Perform in-depth site survey – build a sitemap to understand site structure
  3. Document types/page field survey – survey all fields (properties) used on each page and consider how they are related
  4. Template design – a combination of scraping the visual design of the old site and building static HTML for the new one
  5. Document type design – implement the document types in Umbraco based on the Page Field survey results
  6. Custom functionality – implement all the custom functionality not provided by Umbraco – payments, custom widgets, etc.
  7. Content migration – prepare a set of migration tools which would be run at the time of the switch
  8. Live release – execute the switch, including the final migration of content and media from old site to new.

Tools Overview:

  • Basecamp: used in lieu of email for informal discussions.
  • JIRA: used to manage all development tasks.
  • BitBucket (git): contains source code & database scripts.
  • TeamCity: build server used for continuous integration of the dev servers and for production releases.
  • Amazon Web Services: hosts the application, including the website, DB, email, storage, etc.
  • Hangouts/Skype: a permanent Hangout is used for ongoing discussion, with weekly video calls on Hangout and Skype for team meetings.
  • Evernote: a project notebook contains interview notes and various technical snippets I might need to refer to.
  • LastPass: password manager used to store all project credentials and share them with the team.

Project Tools in Detail:

JIRA Release Schedule:

A JIRA project is the first development artifact that I create.  It contains the development plan, all the individual dev tasks, time tracking, and links to git commits.  A project schedule is important to stakeholders, so I organize a high-level task list into pre-launch and post-launch releases:

JIRA Kanban Board

The JIRA Agile Board is how I organize my tasks.  If there is a dedicated PM who is familiar with scrum, I will use the scrum board; otherwise I will use the more flexible Kanban board.  I configured it with the standard four lanes:

JIRA Task Detail:

Pretty standard.  I link JIRA to git so the commits for each task are visible, and also to TeamCity, so that the build status for each commit is linked.  As I work on stories, I add screenshots and technical notes, for myself as much as for the tester/product owner.

BitBucket

I use a modified git-flow process – each commit is tagged with a build, releases are tagged by date, and released code is merged to master.


TeamCity

Each commit is automatically deployed to the dev site, and live releases are triggered from TeamCity as well.  I configure the Publish Web wizard in Visual Studio – this creates an msbuild configuration which I can trigger in the TeamCity build:

New Relic Monitoring

New Relic is pretty essential to running lots of websites without an ops team.  It sends alerts when there is any problem and makes it easy to identify problematic components.

 

Evernote

Evernote contains technical notes, potential third-party components, client interview notes, airplane tickets, and a lot of other information I may need to refer to:


PowerPoint

Last but not least, I write regular emails and PowerPoint reports to communicate project status to non-technical stakeholders and educate them about the development process.


Avoiding the Performance Panic Spiral of Doom

The following warning applies any time you try to fix a misbehaving system without understanding the cause of the problem, but it is especially relevant when trying to fix performance issues without knowing the cause:

The trouble started when the site began slowing to a crawl at random times. The tech team met to discuss the issue.  Having failed to extract the cause by the act of stuffing enough smart people in a room, the topic shifted to solutions.

“Let’s switch our caching from memcached to redis,” I said. The testing went well, and the change was made. A subsequent round of testing, accompanied by a dose of optimistic thinking, led to the conclusion that the issue was improved.

Everyone was happy until it was discovered that the registration system was broken: in one specific function, PHP failed to set the redis cache, causing a redirect loop. We fixed the problem, but the performance issue returned.

Following this, another dozen configuration and code changes were tried. Since we could not consistently reproduce the performance issue, it was questionable whether any of these changes helped. The only thing that became clear was that our site was becoming increasingly unstable, and we had little experience dealing with all the new components. In desperation, we decided to start over with a new server build.

The testing of the new server went OK, until I decided to throw another wrench in at the last second – switching from MySQL to AuroraDB. “AuroraDB is 5X faster and 100% wire compatible with MySQL,” according to Amazon, but it turns out that the PerconaDB client library on the server was not, the AuroraDB default parameter group is not configured properly for high query rates, and WordPress + the mysqli PHP library + AuroraDB don’t play well together.

So now we had all our existing problems, plus the challenge of configuring a new server and a new set of management tools, plus the issues of switching to a new database server. Eventually, we solved all the problems we had created, either by learning to use the new components or by reverting to old ones, but we never did figure out the cause of the performance issue, and simply patched it over with more hardware.

What’s the moral of this story?

If a website is suddenly slow, unreliable, or generally misbehaving for performance-related reasons,
DO NOT TRY TO MAKE PERFORMANCE IMPROVEMENTS WITHOUT UNDERSTANDING THE ROOT CAUSE
  •  Any performance-related change should be tested to see if it makes things better.  This is impossible without a stable site.
  • Performance-related configuration and code changes should be based on evidence – quantitative proof that the specified change will help.
  • Making changes based on hunches and Internet guides is a potentially endless process, as software like MySQL, Varnish, Nginx, etc. offers hundreds of parameters, with millions of opinions online about what is best.
  • The approach of making optimizations in the dark is a huge time drain when a quick, short-term solution is needed.  You will make many changes with unknown effectiveness, possibly falling into the dreaded Performance Panic Spiral of Doom:
    1. Try to fix a problem with guesswork without understanding the cause
    2. Break an unrelated component in the process
    3. Try a more drastic fix to fix both issues
    4. Repeat, until the site is a disaster zone
HOW TO ACTUALLY RESOLVE PERFORMANCE OUTAGES: EVIDENCE-BASED ROOT CAUSE ANALYSIS
  1. Enhance your environmental awareness (by improving your monitoring & diagnostic tools*) until you can visualize, isolate, and identify the problem.
  2. Fix the problem.

* For example, using New Relic, error logs, htop, ntop, xdebug, etc.

How to regain access to an AWS EC2 instance that you’ve lost access to

Last night I accidentally locked myself out of a production EC2 instance. Arg! Panic!

How I regained access:

1: Take a snapshot of the instance.  (Note: if you require 100% uptime, this is a good time to restore the snapshot to a new instance and switch the Elastic IP to it while you fix the issue. )
2: Launch new Ubuntu recovery instance *in the same AZ* using a key file you have access to.
3: Start and SSH to your new recovery instance.
4: Create a new volume *in the same AZ* from your snapshot
5: Attach the volume you created as device /dev/sdf to your recovery instance. (You need to attach the volume after the instance is running, because Linux may boot from the attached volume instead of the boot volume and you’ll still be locked out.)
6: On your new instance, run lsblk. You should see the default 8GB volume and the backed up volume you just attached.  (More @ AWS Support):


ubuntu@ip-172-31-3-214:~$ lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
xvda 202:0 0 8G 0 disk
└─xvda1 202:1 0 8G 0 part /
xvdf 202:80 0 100G 0 disk
└─xvdf1 202:81 0 100G 0 part

7: Mount the backed-up volume using the name from lsblk:


sudo mkdir /recovery
sudo mount /dev/xvdf1 /recovery

8: Now you can cd /recovery/home and fix your login issue.
If you lost your access key, edit /recovery/home/ubuntu/.ssh/authorized_keys.
You can copy the public key from the new Ubuntu instance that you know you have access to.  Worst case, copy the .ssh or the entire /home/ubuntu folder from the new instance to the locked-out volume.

9: Assuming you fixed your permission issue, stop the instance and detach the repaired volume.
10: Detach the old locked-out volume from your original instance and attach the repaired volume under /dev/sda1
11: Start the instance – you should have access now.  Whew.  Next time, take a snapshot before making configuration changes!

Thanks to StackOverflow for help with some tricky bits.

FEE.org Project Retrospective Part 2: Content Management Systems: Umbraco, WordPress, or Custom?

In part 1 of this series, I provided 6 time management tips for independent developers.  In this post, I will discuss why I chose Umbraco as a CMS for this project.  Part 3 will cover best practices and tools for managing self-managed projects – people management, task management, quality assurance, deployments, sales, and communications.  Part 4 will cover content migration, custom feature implementation, launch planning, performance optimization, Amazon Web Services integration, and team engagement.

WordPress, Umbraco, or Custom?

Background:

Most web developers who were around before the explosion of modern content management applications have built one themselves. I built my first CMS using Classic ASP and Access around 2002 for a university newspaper. In 2004, I designed a new CMS for Mises.org and evolved it through the latest Microsoft technologies for just over 10 years.  (More on this project here.) It was a great learning experience to grow the site from a simple article archive to a million+ visits per month. During that time, I built sites using Joomla, WordPress, and Umbraco, so I was able to appreciate the costs and benefits of building a CMS from scratch.

Running a custom CMS was fun and educational, but there were many downsides. Building a popular website is a huge investment and demands specialized expertise: extensive knowledge of search engine optimization, content workflows, user experience design, full-stack performance optimization, and much more. Software development in general is difficult, risky, and requires continuing effort to stay up to date.

Building custom software is fun, but the criterion for choosing a CMS strategy should be to minimize technical investment and leverage third-party experience as much as possible.

Selecting a CMS:

Here are the options I considered:

  1. A commercial, proprietary application with most required features built in, such as Sitefinity, SiteCore, ExpressionEngine, etc.
  2. An open-source (free) application such as WordPress, Drupal, or Umbraco.
  3. A custom CMS, written from scratch

Benefits of a commercial CMS:

  • Well supported by the vendor.
  • Has most needed features out of the box.

Disadvantages:

  • Can be expensive. The median Adobe CQ/AEM licensing cost is $450K and the median install cost is $2 million. (I had an opportunity to work on an Adobe AEM project – it’s a great tool, but I find it hard to recommend the $2 million premium over the free alternatives for most organizations.)
  • More likely to undergo major changes in future releases, such as doubling in cost, targeting a different market, discontinuing the product, etc.
  • No access to source code means core changes to functionality may be impossible.
  • Usually has a smaller development community and fewer, more expensive add-ons.

Benefits of an open-source CMS:

  • Popular apps have millions of users and strong community support
  • More likely to have add-ons which provide needed features
  • Access to source code means anything is customizable
  • The code base and features are usually stable over long periods of time, and upgrades are possible

Disadvantages:

  • Providing a full set of features often requires cobbling together plugins by different authors, of very different quality
  • The above can lead to poor performance and poor integration
  • Lack of vendor support, or perhaps expensive support by a third party

Benefits of a custom CMS:

  • A very fast and storage-efficient database and code base are possible by avoiding layers of abstraction.
  • No limits on customization
  • Potentially more secure and spam-proof, by avoiding bots and known exploits of popular applications
  • A completely proprietary implementation provides exactly the solution that a client needs – assuming (!) that they know what they need and it does not change over time.

Disadvantages:

  • Long-term cost of implementing features which already exist in a third party CMS

For the project of implementing a new CMS for fee.org, I considered the following candidates:

  1. WordPress: the most popular CMS, with 50% of the market
  2. Umbraco: an open-source .Net CMS with only about 0.1% of the market; it’s high quality, mature, and well supported by the community
  3. Sitefinity: a commercial .Net CMS with about a 1% share
  4. SiteCore: a commercial .Net CMS with about a 0.3% share
  5. ExpressionEngine: a commercial ($299 + support) PHP/MySQL CMS with about a 1.5% share
  6. Custom CMS: write the FEE.org CMS from scratch

Top two: WordPress vs Umbraco

Ultimately, it came down to WordPress vs Umbraco:

In regard to WordPress, building Liberty.me exposed me to its pros and cons.  I decided to use WordPress for Liberty.me because we needed to get a massive amount of functionality implemented quickly, and WordPress was the only platform with a development ecosystem to make that possible. It was easy to source talent and find a plugin for everything we needed, but we ended up with a bloated site whose 120+ plugins stepped all over each other and required massive optimization and many layers of caching to perform well. We use nginx, varnish, hhvm, redis, three specialized CDN providers, Amazon AuroraDB, and much more to get the site to perform acceptably.

Umbraco was a natural choice because of my successful experience building Hershey’s Kitchens on Umbraco 4.0, and the great features it has added since then.

Winner: Umbraco

  • It is a blank slate with a lot of flexibility to create great customer experiences.
  • It is a fast and stable ASP.Net platform (with queries running on the Lucene engine) that works well out of the box.
  • It is open source but commercially supported, and great free and paid components are available.
  • I’m an expert .Net/Microsoft developer but junior in PHP, so I would have had to rely on contractors much more if we went with WordPress.

Stay tuned for details of how I implemented this project in the next post…

 

FEE.org Project Retrospective Part 1: Six Secrets to Better Productivity for Independent Developers

As a software developer, I have balanced working as a developer/tech lead/architect for various large corporations with working as an independent IT consultant, building and running software for a number of small organizations.  Having the freedom to experiment in my personal work, as well as exposure to professional enterprise practices, has led to useful synergies, but balancing multiple projects has been quite difficult at times.

What follows is a series of posts with a retrospective of my latest project – porting fee.org to the Umbraco CMS – covering:

  1. Advice on personal time management for independent developers
  2. How I chose Umbraco as a CMS for this project
  3. Best-practices and tools for managing independent software projects.
  4. Content migration, custom feature implementation, launch planning, performance optimization, Amazon Web Services integration, and team engagement

I will cover the technical details of my latest project in the next post, but first, here is how I manage my time:

Speaking personally, the most difficult aspects of my work are (1) prioritizing tasks, (2) staying focused on them, and (3) work-life balance. At times in my career, I would spend only an hour or so per day working and the rest reading Wikipedia and debating in online forums. (By now, Wikipedia probably makes up over 80% of my general knowledge.)  For most technical people who don’t work for top-tier software teams, or who are much more productive than their team average, it’s quite feasible to spend only a minority of their time doing their assigned work. But that’s not a very smart way to spend one’s career.  Anyway, here are my probably overly simple solutions for a more productive life as a developer:

1: Make setting priorities the first thing you do, and write down tasks for every activity

Setting priorities is the most complex activity that the human brain engages in, and it is also what suffers the most when the brain is fatigued. We’re just not very good at setting and remembering lists of things to do (see “Your Brain at Work” by David Rock).  So whether it’s for a six-month project or a single day, I always start by writing down a list of tasks to be done and prioritizing them.

It used to be common for me to get to the office with a specific task on my mind, yet leave six hours later having worked on something entirely different. The solution is to make writing a prioritized set of tasks for the day the first thing I do every time I come to the office. I put task lists on paper or in Evernote, so I can refer to them later.

2: Use Project Tracking Software

On a professional level, I use project tracking software (JIRA) to write down all my tasks.  I have used JIRA to manage projects with 25 developers and multiple teams as well as projects where I was the sole developer – task tracking is essential for any project.  Using project tracking tools has many benefits for facilitating interaction with project stakeholders (which I will cover in part 3), but it is quite essential for me to have a prioritized list of tasks to work on.

3: Use Test-Driven Development

Following my task-driven approach, I write the list of tasks to be done directly in my code. When I started as a developer, I would write down a list of comments in a file, then implement each function below, so that each task became a comment describing what a single function did.  Now, I keep my classes small and keep the big picture in a unit test class. This (1) helps me stay focused on what needs to be done, (2) allows me to test each step in isolation, (3) helps to detect regressions in the future, and (4) provides all the usual extreme programming benefits you can read about on Wikipedia.
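As a small illustration of the idea – sketched here in Python rather than the project’s actual C#, with hypothetical names – the unit test class carries the day’s task list for a small class, so each test doubles as a statement of what needs to be done:

```python
import unittest

class SlugGenerator:
    """Small, single-purpose class: turns an article title into a URL slug."""
    def make_slug(self, title):
        # lowercase the title and join the words with dashes
        return "-".join(title.lower().split())

class SlugGeneratorTests(unittest.TestCase):
    # Each test started life as a one-line task on the to-do list;
    # together they hold the big picture for the class.
    def test_lowercases_the_title(self):
        self.assertEqual(SlugGenerator().make_slug("Hello"), "hello")

    def test_replaces_spaces_with_dashes(self):
        self.assertEqual(SlugGenerator().make_slug("Hello World"), "hello-world")

if __name__ == "__main__":
    # run the suite without exiting the interpreter
    unittest.TextTestRunner().run(
        unittest.defaultTestLoader.loadTestsFromTestCase(SlugGeneratorTests))
```

When a new requirement arrives, it first becomes a failing test on this list, then an implementation change.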

4: Write less code, spend more time on GitHub

It’s well established that developers write an average of 10 production lines of code per day. So, if I find myself doing a lot of typing, I will step away from my computer and go for a walk. My goal as a programmer is not to write the most code, but to maximize the business value I deliver with my time.

I do this by spending my development time searching GitHub, NuGet, apt-get, etc. for the best components, trying to fit them together, and building supporting tools (like unit tests) — not debugging code or building components from scratch. If I really have a unique and worthwhile module to create, it is typically either a candidate for an open-source GitHub project or some sort of highly proprietary business process/algorithm.

5: Schedule regular breaks

I have a theory that my obsessive Wikipedia and Reddit surfing was actually my brain’s cry for help at not being allowed to relax and review my work. Unfortunately, by switching from a mentally taxing production activity to a mentally taxing consumption activity, I was still not allowing my subconscious to relax and integrate.

I’ve since learned that it is important for me to walk away from my desk at least once per hour. When I’m deep in work mode, I lose the ability to assess, prioritize, or identify when I need to try another approach. I take loops around the office, or around the block, several times per day to allow my mind to relax and review the approach I am taking to the problem at hand. It is very rare for me to be stuck on a difficult problem and have a flash of insight at my desk. I need to walk away to process problems at a high and/or subconscious level.

6: Use tracking analytics to set goals, improve time management – and do billing

I use RescueTime to track ALL my computer time, both work and entertainment. I use WakaTime to track the time I spend in IDEs.  Git repositories record all the code and documents that I create.

Once a month or more, I spend several hours doing billing, which consists of opening up RescueTime, WakaTime, SourceTree (git log), and Tempo (a time-tracking product integrated with JIRA) side by side. It takes me several hours to process 30 days, but I get a very detailed account of how I spent my productive and personal time. Once I understand how I’m spending my time now, I can decide what changes I want to make in the future.  RescueTime also allows me to set personal goals for how much or how little time I want to spend on certain activities.

Below you can see my allocation for July and August, and a single day.  I use all three tools to account for my hours.  Although only a portion of my work involves writing code, RescueTime helps me understand how much time I allocate to each project.

Bitcoin Exchange Project Part 2: Order Matching Algorithm

Order Matching Algorithm Description (Rough Draft – 2013/10/16)

Summary: A currency exchange is a system for buyers and sellers to exchange different types of currency. The order matching module matches buy and sell orders, creates transactions to record the process, and updates the customers’ account balances.

Part One: Basic Order Processing

Action: A customer enters the quantity and price of an order and clicks “buy” or “sell”.

1: The website creates a record in OrderBook with Pending order status. The order is filed to be processed. [PlaceBuyBid]

Then, the Order Matching Service iterates through the list of pending orders. [public int PlaceBuyBid(int customerId, decimal quantityOfBTC, decimal pricePerBTC, DateTime? expirationDate = null) and public int PlaceSellOffer(int customerId, decimal quantityOfBTC, decimal pricePerBTC, DateTime? expirationDate = null)] For each order:

2: Re-verify the order status to ensure that it is still pending/active. If expired, the order is Cancelled.

3: Validate the order funding. [public bool ValidateOrderFunding(Order order)] The customer must have sufficient assets to cover the order. If not, the order is Suspended (it may be re-activated if funds become available later). If the order passes validation:
a: the order status is changed to Active
b: the assets needed to pay for the order are added to the Frozen balance. This prevents the customer from placing orders on more assets than he has. This feature may be removed later – we could allow spending greater than the available balance if we check the balance before processing the transaction.

4: The Order Matching Service tries to find a match for the buy or sell order. To find a match, we search all Active orders which match the specified price. [ISpecification IsMatchingOrderQuery(decimal price, int orderTypeId, int wantAssetTypeId, int offerAssetTypeId, bool? isMarketOrder)]

* If the order is a buy, we look for a price less than or equal.
* If the order is a sell, we look for a price greater than or equal.
* If it’s a market order, we find the highest (sell) or lowest (buy) price.

We sort the matches ascending for buy orders and descending for sell orders, then sort by date when prices match.  [ISpecification IsMatchingOrderQuery(decimal price, int orderTypeId, int wantAssetTypeId, int offerAssetTypeId, bool? isMarketOrder)]

5: We load the top 3 matches into memory. We take 3 matches so that, if one of the orders fails validation, we can try with the other 2 orders.

6: We compare the order and the match. This is a double check in C# – the order should already have been matched by the database query above. The orders should have matching assets ($/BTC), opposite order types (buy/sell), not be two market orders, and have matching prices (as above). [OrderComparisonResult CompareOrders(Order firstOrder, Order secondOrder)]

7: If the order comparison succeeds, we generate a Transaction to record the match [Transaction GetTransactionForTwoOrders(OrderComparisonResult comparisonResult)].
* A_Order is the buy order
* B_Order is the sell order
(A and B are used because I’m still not sure whether the ordering should be Buy/Sell, chronological, or something else)

8: If the order quantities do not match exactly, we must generate a split order with the remainder of the larger order.

9: We run [ActivateTakeProfitAndStopLossOrders(Order order)] – TODO – this should not be done here, but scheduled. (See separate post for advanced orders – take profit, stop loss, trigger, stop order, etc.)

10: We process the transaction to record the result [public Order ProcessTransaction(Transaction transaction)] (note: this module runs in a database transaction)

a: add the transaction and the split orders to the DB
b: for both orders in the transaction:

  • subtract the debit asset (customer’s balance of $ or BTC)
  • increment the credit asset (customer’s balance of $ or BTC)
  • record the commission in the commission account
  • unfreeze frozen balances
  • save changes

11: Repeat the process on each split order until there is either no remaining quantity (the entire order is fulfilled) or we run out of matching orders; the remaining split order stays open as an active order


foreach (OrderProcessResultModel n in ProcessOrder(splitOrder.OrderId))
{
    yield return n;
}

12: If the remaining quantity is 0, set the status to Completed
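The matching and splitting rules above can be sketched in miniature. This is a simplified Python illustration, not the actual C# service – the real system uses decimal types, database queries, and database transactions, and all names below are hypothetical:

```python
from dataclasses import dataclass, field
from itertools import count

_next_id = count(1)

@dataclass
class Order:
    side: str          # "buy" or "sell"
    price: float       # the real service uses decimal, not float
    quantity: float
    placed: int        # stand-in for the order's creation timestamp
    order_id: int = field(default_factory=lambda: next(_next_id))

def find_matches(order, book, limit=3):
    """Best-price-first candidates, oldest first on price ties (steps 4-5)."""
    if order.side == "buy":
        # a buy matches sells priced at or below the bid, cheapest first
        matches = [o for o in book if o.side == "sell" and o.price <= order.price]
        matches.sort(key=lambda o: (o.price, o.placed))
    else:
        # a sell matches buys priced at or above the ask, highest first
        matches = [o for o in book if o.side == "buy" and o.price >= order.price]
        matches.sort(key=lambda o: (-o.price, o.placed))
    return matches[:limit]   # top 3, in case a candidate fails validation

def process(order, book, fills):
    """Match an order against the book, splitting on partial fills (steps 8, 11)."""
    while order.quantity > 0:
        candidates = find_matches(order, book)
        if not candidates:
            book.append(order)      # no match: stays open as an Active order
            return
        match = candidates[0]
        traded = min(order.quantity, match.quantity)
        # record the "transaction" at the resting order's price
        fills.append((order.order_id, match.order_id, match.price, traded))
        order.quantity -= traded    # the remainder acts like a split order
        match.quantity -= traded
        if match.quantity == 0:
            book.remove(match)      # fully filled: Completed
```

For example, a buy for 1.0 BTC at $101 against resting sells of 1.0 @ $101 and 0.25 @ $100 fills 0.25 at $100 first (the better price), then 0.75 at $101, leaving 0.25 of the $101 sell open on the book.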

OrderBook Schema

Open Source Bitcoin Trading Engine

I am developing a Bitcoin exchange trading engine.  It’s written in C# and ASP.Net MVC4/Razor.  The UI layer is service-based, using knockout.js and jqGrid to bind to JSON web services.  Order processing is done in a service process.

Is anyone interested in learning more about the design?  Update: OK!  Graphics below updated and first detail post added.

I think it will take at least 10 posts to document this project:

Continue reading Open Source Bitcoin Trading Engine

Creating a .Net compatible code signing certificate

The purpose of this workaround is to bypass the lack of support for CNG certificates in the .Net Framework.  I used OpenSSL to convert the certificate obtained from the Certificate Authority (Verisign, Thawte, etc.) to a format supported by .Net.  This tutorial builds on the workaround in the Microsoft Connect bug report.

Ingredients:

1: Certificate text

2: Certificate imported into Windows certificate store from a root CA

3: OpenSSL

Steps:

1: Create a new file (CERT_ONLY.crt) with the certificate text (from -----BEGIN CERTIFICATE----- to -----END CERTIFICATE-----)

2: Import the certificate into the certificate store via the CA website, then export it to the file EF.pfx.  Include the private key.

3: Generate PEM file with the private key only:

openssl pkcs12 -in EF.pfx -nocerts -out EF.pem

4: Convert private key to RSA format

openssl rsa -in EF.pem -out EF_RSA.pem

5: Generate the code signing certificate

openssl pkcs12 -export -out EFnew.pfx -inkey EF_RSA.pem -in CERT_ONLY.crt

6: Delete the existing certificate from the certificate store (back it up first), then import the newly generated certificate.