Showing posts with label betfair. Show all posts

18 March 2011

Conclusions from Betfair's Outage



Niall Wass and Tony McAlister of betfair recently published a summary of betfair's 6 hour outage on 12 March 2011.  What follows is a review of their analysis.

Most of betfair's customers will have no idea what Niall and Tony are talking about.  "This [policy] should give maximum stability throughout a busy week that includes the Cheltenham Festival, cricket World Cup and Champions League football" is about the only non-tech part of the article that their customers can relate to.  However, for technologists, the post provides some tasty detail that lets us learn from others' mistakes.

The post is consistent with a growing and positive trend of tech oriented companies disclosing at least some technical detail of what happens to cause failures and what is to be done about it in the future.  Some benefits from this approach:
1. Apologize to your customers when you mess them about - always a good thing to do (Easyjet or Ryan Air - I hope you're reading this).  Even better is to offer your customers a treat - unfortunately betfair only alluded to one and didn't provide a concrete commitment.
2. Give public sector analysts some confidence that this publicly traded company isn't about to capsize with technical problems.
3. Receive broad review and possibly feedback about the failure.  Give specialist suppliers a chance to pitch to help out in potentially new and creative ways.
4. Drive internal strategy and funding processes in a direction they otherwise might not be moving.

Level of change tends to be inversely proportional to stability.  "In a normal week we make at least 15 changes to the Betfair website…".   This is a powerful lesson that many non-tech people do not understand - the more you shove change into a system, the more you tend to decrease its stability.  This statement also tips us off that betfair has not adopted the more progressive devops and continuous delivery practices for pushing change into production more safely.

The change control thinking continues with "… but we have resolved not to release any new products or features for the next seven days".  This is absolutely the right thing to do when you're having stability issues.  Shut down the change pipeline immediately to anything other than highly targeted stability improvements.  Make no delivery of new features a "benefit" to the customer (improved stability) and send a hard statement to noisy internal product managers to take a deep breath and come back next week to push their agenda.

Although betfair might not be up on their devops and continuous delivery, they have followed the recent Internet services trend of being able to selectively shut down aspects of their service to preserve other aspects:
- "we determined that we needed our website 'available' but with betting disallowed"
- "in an attempt to quickly shed load, we triggered a process to disable some of the computationally intensive features on the site"
- "several operational protections in place to limit these types of changes during peak load"

Selective service shutdown is a positive sign.  It hints that:
1. The architecture is at least somewhat component based and loosely coupled.
2. There is a strategy to prioritize and switch off services under system duress.
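
To make the idea of prioritized shutdown concrete, here is a minimal sketch of a feature registry that sheds the least important features first.  It is not betfair's mechanism; the class names, feature names and priority numbers are all hypothetical, chosen only to mirror the "website available but betting disallowed" state described in their post.

```python
from dataclasses import dataclass


@dataclass
class Feature:
    name: str
    priority: int        # hypothetical scale: lower number = more important to keep up
    enabled: bool = True


class FeatureRegistry:
    """Tracks which loosely coupled features are currently switched on."""

    def __init__(self, features):
        self._features = {f.name: f for f in features}

    def is_enabled(self, name):
        return self._features[name].enabled

    def shed_load(self, keep_priority):
        """Disable every enabled feature less important than keep_priority.

        Features are switched off least-important first; the names of the
        features that were disabled are returned in that order.
        """
        shed = []
        for feature in sorted(self._features.values(), key=lambda f: -f.priority):
            if feature.priority > keep_priority and feature.enabled:
                feature.enabled = False
                shed.append(feature.name)
        return shed


def example_registry():
    # Hypothetical features, ordered by how much we want to protect them.
    return FeatureRegistry([
        Feature("site_browsing", priority=0),
        Feature("bet_placement", priority=1),
        Feature("live_price_graphs", priority=2),
    ])
```

Calling `example_registry().shed_load(keep_priority=0)` would leave only browsing enabled.  The point of the design is that the decision about what to sacrifice is made in advance, in calm conditions, rather than improvised during the outage.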

The assertion that betfair spent several hours verifying stability before opening the site to the public suggests bravery under fire.  "We recovered the site internally around 18:00 and re-enabled betting as of 20:00 once we were certain it was stable".  There must have been intense business pressure to resume earning money once it appeared the problem was solved.  However, during a major event, you want to make sure you're back to a stable state before you reopen your services.  A system can be in a delicate state when it is first opened back up to public load levels (e.g., page, code and data reload burden), which is one reason why we still like to perform system maintenance during low use hours: the slam of customers through the doors when the website/service reopens is at least minimized.

The crux of the issue appears to be around content management, particularly web page publication.  Publishing content is tricky as there are two conditions that should be thoughtfully considered:
- Content being served while it is changing which results in "occasional broken pages caused by serving content" and here-and-gone content where content has been pushed to one server, but not another
- Inconsistency between related pieces of content (e.g., a promotional link on one page pointing to a new promotion page that hasn't been published yet)

It appears that betfair's content management system (CMS) is neither async nor real time: "Every 15 minutes, an automated process was publishing…".  Any system designed with hard time dependencies is a timebomb waiting to go off, with the trigger being increasing load: "Yesterday we hit a tipping point as the web servers reached a point where it was taking longer than 15 minutes to complete their update".  A lack of thread-safe design is another indicator, since async designs tend to enforce thread safety: "servers weren't thread-safe on certain types of content changes".  A batch, rather than on-demand, approach is also symptomatic of the same design problem: "Unfortunately, the way this was done triggered a complete recompile of every page on our site, for every user, in every locale".  The CMS is therefore likely not an async on-demand pull model but rather a batch publish model.
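
The tipping point described above, where a run takes longer than the interval that triggers it, can at least be contained with a guard that refuses to start a run while the previous one is still in progress.  The sketch below is only an illustration of that guard, not betfair's fix, and `NonOverlappingJob` is a hypothetical name.  It does not remove the underlying timer dependency; it just stops runs from piling on top of each other.

```python
import threading


class NonOverlappingJob:
    """Wraps a periodic task so a new run is skipped while one is in progress."""

    def __init__(self, task):
        self._task = task
        self._lock = threading.Lock()
        self.skipped = 0   # how many timer ticks found the previous run still busy

    def run(self):
        """Run the task unless a previous run still holds the lock.

        Returns True if the task ran, False if this tick was skipped.
        """
        if not self._lock.acquire(blocking=False):
            self.skipped += 1
            return False
        try:
            self._task()
            return True
        finally:
            self._lock.release()
```

A scheduler would call `job.run()` on every tick.  A rising `skipped` count is then a cheap early warning that the job's duration is approaching its interval, which is exactly the trend that needs to be spotted before the tipping point.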

The post concludes with a statement of what has been done to make sure the problem doesn't happen again:
1. "We've disabled the original automated job and rebuilt it to update content safely" - given the above design issues, while thread safety may have been addressed, until they address the fundamental synchronous design, I'd guess there will likely be other issues with it in the future.
2. "We've tripled the capacity of our web server farm to spread our load even more thinly" - hey, if you've got the money in the bank to do this, excellent.  However, it probably points to an underlying lack of capacity planning capability.  And of course, every one of those web servers depends on other services (app server, caches, databases, network, storage, …) - what have you done to those services by tripling demand on them?  Lots of spare capacity is great to have, but can potentially hide engineering problems.
3. "We've fixed our process for disabling features so that we won't make things worse."
4. "We've updated our operational processes and introduced a whole new raft of monitoring to spot this type of issue." - tuning monitoring, alerting, and trending system(s) after an event like this is crucial
5. "We've also isolated the underlying web server issue so that we can change our content at will without triggering the switch to single-threading"

And here are my lessons reminded and learned from the post:
- If you're having a serious problem, stop all changes that don't have to do with fixing the problem
- Selective (de)activation of loosely coupled and component services is a vital feature and design approach
- Make sure the systems are stable and strong after an event before you open the public floodgates
- Synchronous and timer based design approaches are intrinsically dangerous, especially if you're growing quickly
- Capacity planning is important, best done regularly, incrementally and organically (like most things), not in huge bangs.  One huge bang now can cause others in the future.
- Having lots of spare capacity allows you to avoid problems… for a while.  Spare capacity doesn't fix architectural issues, just delays their appearance.
- Technology is hard and technology at scale is really hard!

Niall and Tony, thanks for giving us the opportunity to learn from what happened at betfair.

11 May 2006

Betfair into the USA and betting exchange economics

Betfair’s Christian Hellmers sparked up the Betfair Plans Rumor Mill by mentioning how betfair could provide USD 50m in revenues for racetracks and horsemen (oh, and betfair as well!) by 2008 if it is allowed to operate in the USA. Let’s think about the statement a bit.
There is no doubt that betting exchanges tend to re-vitalize what might otherwise be a flagging market. But where does that re-vitalization come from? Here is some background on the economics of exchanges in more mature markets (i.e., UK) to understand what is happening.
Betting exchanges allow savvy punters to take advantage of “free market” or “perfect market” pricing. The operator’s “excessive” margin is torn out. What does this mean for operators (or operators that trade on the exchange)? It means that the operator’s over-round (profit margin) is eroded to almost zero (zero margin = perfect market, folks). If you’re betfair, this is great – you collect 2-5% commissions. If you’re the operator, you’ve been put on an extremely level playing field that only allows you to differentiate on price (approaching zero margin!).
But is a betting exchange really creating a “perfect” market for their punters? No, not really. Once you factor in the 2-5% commission for betfair, PLUS the eroded but still present margin in the exchange’s prices, the punter is in a similar position to where they were before. The punter pays a premium to betfair to wring out every drop of margin from the betfair layers.
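
A rough way to see the effect of commission on a near-zero-margin market is to recompute the over-round using the odds a winning backer actually receives.  The sketch below assumes commission is deducted from net winnings and uses made-up prices for a two-runner market; only the 2-5% commission range comes from the discussion above, and the function names are illustrative.

```python
def overround(decimal_odds):
    """Sum of implied probabilities minus 1; 0.0 means a zero-margin book."""
    return sum(1.0 / odds for odds in decimal_odds) - 1.0


def odds_after_commission(decimal_odds, commission):
    """Effective odds for a winning backer.

    Assumption: commission is taken as a fraction of net winnings only,
    so the returned stake itself is untouched.
    """
    return 1.0 + (decimal_odds - 1.0) * (1.0 - commission)


def effective_overround(decimal_odds, commission):
    """Over-round as experienced by backers once commission is paid."""
    return overround([odds_after_commission(o, commission) for o in decimal_odds])


if __name__ == "__main__":
    exchange_prices = [2.0, 2.0]   # hypothetical zero-margin two-runner market
    for rate in (0.02, 0.05):
        margin = effective_overround(exchange_prices, rate)
        print(f"commission {rate:.0%}: effective over-round {margin:.2%}")
```

Even on a market with no margin at all in the quoted prices, the commission reintroduces a margin of a similar order from the backer's point of view, which is the sense in which the market is not really "perfect".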
So what has really happened here?
Betfair has been one of three market forces driving down bookie prices. The second force is the bookies themselves competing against each other. The third force is internet tools that allow the comparison of odds between multiple operators (e.g., Betbrain, OddsChecker). One might argue that the second and third forces would force down prices to the bone as well, albeit a bone that was defined by the traditional gambling operators' cost model. Of course, betfair could drive out the additional bookie cost of odds compilation and trading, meaning that they could compete from a slightly better cost position than the bookies.
The US market, even more so than the UK market of 5 years ago when betting exchanges first appeared, is populated by fragmented monopoly operators. There is plenty of margin to be “removed” (that is, in part transferred to betfair) by the betting exchanges. Given this, it becomes a win-win for betfair and US punters (and a BIG LOSE for existing legal gambling operators in the US).
You’ll notice that Christian’s list of beneficiaries didn’t include existing US racing operators. In fact, by excluding other operators as beneficiaries, he is essentially suggesting that betfair be the (new) monopoly operator in the US market. Of course, the lead betting exchange, by virtue of the value of liquidity and p2p network effects, would naturally evolve to a monopoly position anyway.
It would be a huge coup for betfair to be able to legally (as might be made legal by US law in the future) offer US horse racing to US citizens. The US focused, so-called “peer to peer” betting exchange betbug claims they can operate legally already in the US so there is some, albeit trivial, precedent.
Betfair, at least in times past, has offered US horse racing. Is there much interest in US racing outside the US? Not much. Who is participating in those markets? I wonder. It’s not that US-based punters aren’t familiar with offshore accounts to transfer funds to and from, a common practice encouraged by the early, wild and wooly days of US offshore betting down in the Caribbean.
It’s too bad that betfair is bound (no doubt by its US venture capital investors) to play nice in the US market. Just imagine what they could do if they took the sportingbet or betonsports positions of gladly accepting US customers.
So where does that leave betfair? It is well understood from betfair that a majority of their exchange business is on racing. They’ve pushed into the Australian racing market, and signs suggest (see previous posts) that Japan is next via the Softbank deal. I would guess that betfair will continue to push on the US, quietly growing their US horse racing business by taking bets from US punters’ offshore accounts, and wait for a chance to move into the market in a bigger way.

16 March 2006

Betfair and Softbank - deal making more sense

After a little research, I can speculate a little more about betfair and Softbank since I first wrote about them on 6 March. Two things have come to light, at least for me. First, Softbank is involved with Japanese Horse Racing. Second, Japanese Horse Racing is big business. These two points lead me to some new ideas and conclusions about the deal.
Softbank does have some prior experience with Japanese horse racing. Atei.co.uk reports in Sep 2005 that “Softbank is on track to become Japan’s first Internet group to enter the online betting market” and Softbank “plans to set up an Internet portal to provide information on the races, take bets from punters and broadcast races, under a deal with the Iwate Prefecture Horse Racing Association.”
I’m not sure of the timeframe, but ketupa.net lists Softbank interests, one of which is “JaJa Entertainment - horse racing data services (70%)”
Gaijinpot.com reports in Dec 2005 that “Softbank Corp. has begun to sell betting tickets on most local horse races in Japan on the Internet and over the phone… To launch the business, an affiliate of Softbank has taken over Nippon Racing Service Ltd., a subsidiary of the Japan Local Racing Association, which sold local horse race tickets.”
Softbank themselves, in a recent quarterly report, mention the “Odds Park horse racing portal” that will offer live streaming of races and will be launched in “Spring” 2006. The report also mentions that “D-Net”, a company that provides “Internet sales of betting slips”, was acquired in Dec 2005.
Add to this the significant purchase of betfair shares, and Softbank is clearly involved in horse racing, and probably even has a few competencies in the area. So, how big is the horse racing pie in Japan to arouse the interest of Softbank?
Going back almost 10 years, “Anomaly” in Nov 1996, (from williamsinterference.com, apparently from the Washington Post) tells us: “Horse racing is more lucrative in Japan than anywhere else in the world. Eight of the top 10 prize-money horse races in the world are held in Japan. The Japan Cup paid a first-place prize of about $1.5 million - more than twice that paid to the winner of the Kentucky Derby.”
I found that initially a little hard to believe, but there’s more, and this from the equine’s mouth. The JRA (Japan’s larger racing organization) reports USD 24B in “Net Pari-Mutuel Handle” (i.e., turnover) in 2004, down from a high of USD 33B in 1997. The NAR (Japan’s smaller racing body) reports USD 3.3B turnover in 2004, down from 5.6B in 1998.
So it appears that the “horse racing in Japan is HUGE” claim above is credible. To keep things simple, let’s assume 80% (likely less) of that pari-mutuel pool is returned to punters and the other 20% (likely more) is kept for gambling operations, taxes, profits, and everything else to run the pools. 20% of (a combined) USD 27B in turnover, so that’s… USD 5.4B in gross profits. I don’t have US, UK, Hong Kong, Australia, or any other reasonable horse racing market gambling figures at my fingertips, but I suspect that those turnover and gross profit figures are pretty compelling.
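
The back-of-envelope figure above can be reproduced in a few lines.  The 2004 handle numbers are the ones quoted from the JRA and NAR; the 20% retained share is the simplifying assumption made in the text, not a published takeout rate, and the function name is just for illustration.

```python
def retained_from_handle(handles_usd_bn, retained_share):
    """Return (combined handle, amount kept by the pool operators), in USD billions."""
    combined = sum(handles_usd_bn)
    return combined, combined * retained_share


# 2004 turnover quoted above: JRA USD 24B, NAR USD 3.3B.
combined, retained = retained_from_handle([24.0, 3.3], retained_share=0.20)
print(f"combined handle: USD {combined:.1f}B, retained: USD {retained:.2f}B")
```

Without rounding the combined handle down to USD 27B, the retained amount comes out slightly above the USD 5.4B used in the text, which does not change the conclusion.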
A product like betfair also has provided some revitalization to the market, at least in the UK. Softbank may also be looking for a similar effect in the flagging Japanese racing market. Perhaps cleverly, Softbank could skip a few phases of maturity that the UK went through by jumping right to exchange betting and skipping the fixed odds (“traditional bookmaker”) step. In fact, this is a check in the judgment column for Softbank to read the writing on the wall about how traditional bookmakers are being beaten down severely by betfair (if you don’t believe that statement, look at the turnover versus profitability of racing bookmaking for your favorite 3 bookmakers for the last 5 years; don’t let the hypergrowth poker numbers hide the truth of business erosion folks!).
Softbank has also demonstrated good business acumen by getting the whole value chain lined up. They have set up streaming video of races via yahoo – all important to exchange, particularly in-running, betting. They have also set up on-line wagering for the pari-mutuels.
Do I believe that all of this will result in a 4x increase in betfair’s turnover to warrant the P/E (that I guessed at previously) offered by Softbank? I’m more a believer now than I was, but I’m still pretty skeptical. If you’re one of the lucky few that has a few shares of betfair stock, you may want to flog it. A purchase like Softbank's won’t exactly accelerate the IPO process, and this has been the only chance the stock-owning masses at betfair have been able to sell their stock in the 6 years some of them have had it. Nor have they received any dividends from the significant profits that betfair has been earning.
Conversely, betfair are practically in a monopoly position in betting exchanges, and a growing significant position in racing betting (and sports betting), and putting your money on a monopoly is usually a wise choice. It all depends on what your timeframe for return is! I guess time was up for Europ@Web, Benchmark, UBS and all the flutter VC hangers-on to get a little payback.
---
I want to touch on one other thing the media has misrepresented about this deal. Softbank will become the largest *single* shareholder of stock if Andrew and Ed “have to” (perhaps they want to?!) sell down to 15% ownership each of betfair. However, one might notice that 15% + 15% = 30%, and 30% is well more than the 23% (at most) Softbank will have. Have E&A done anything but play nice over the last 6 years? I think the E&A voting bloc will trump the Softbank one.