transmissions from a free roaming agent of kaos: scalability

14 April 2006

More on Scalability - At the Application Level

A few weeks ago I posted on scalability, in particular high level systems and software scalability (please refer to this previous post if you don't understand some of the terms in this post). I want now to briefly touch on a business's ability to scale geographically and how this applies to software applications. This post therefore contains a few more areas to use as part of evaluating potential software vendors. Also, as before it is generally about internet gambling products.

One way a business can scale up is to take its products and services from one market and sell them in another. If this kind of expansion is part of the business strategy, it is important that the applications behind those products and services facilitate it.
When Internet games/gambling businesses expand in this way, they tend to get caught out in three primary areas:

Language translation
Localization and usability
Currency

1. Language translation

For web pages, and from a customer facing perspective, language should be a simple matter to change. This process is often called localization, although it should really be called translation. Provided there is good separation between presentation and logic, it's typically easy to break all the text down into chunks, translate each chunk, then reassemble the pages.

Some languages present unique challenges such as right to left and top to bottom reading - that will be covered under Localization and usability.

This is typically more difficult for a heavy client as the text is sometimes more difficult to get at and change, and there is a heavier process for testing and distributing the resultant new heavy client. Again, providing a solid process was used to maintain text catalogs in the client, this should be fairly straightforward.
This is somewhat obvious, but on both the customer facing side AND the back office side, you should be able to effortlessly switch languages. While this isn't so important for customers, it is invaluable from a backoffice and testing side.
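To make the idea of a text catalog concrete, here is a minimal Python sketch. The catalog layout, the message keys, and the translate and missing_keys helpers are all assumptions made up for illustration, not any vendor's API. The point is only that when every visible string (including error messages) is looked up by key, switching language is a one-parameter change and gaps in a translation are easy to list.

```python
# Hypothetical text catalogs: one dictionary of message keys per language.
CATALOGS = {
    "en": {
        "welcome": "Welcome back, {name}!",
        "error.insufficient_funds": "Insufficient funds for this bet.",
    },
    "es": {
        "welcome": "¡Bienvenido de nuevo, {name}!",
        # "error.insufficient_funds" has deliberately not been translated yet.
    },
}

DEFAULT_LANGUAGE = "en"


def translate(key, language, **params):
    """Look up a message by key, falling back to the default language."""
    template = CATALOGS.get(language, {}).get(key)
    if template is None:
        template = CATALOGS[DEFAULT_LANGUAGE][key]
    return template.format(**params)


def missing_keys(language):
    """List keys in the default catalog that this language has not translated."""
    translated = CATALOGS.get(language, {})
    return sorted(k for k in CATALOGS[DEFAULT_LANGUAGE] if k not in translated)
```

Running missing_keys("es") against the sample data reports the untranslated error message, which is exactly the kind of gap a vendor's "localized" product tends to hide.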

There are several gotchas I've encountered when discussing localization with a vendor. First, they may claim to have "localized", but all they've really done is translate the customer-visible text. Second, error messages, often generated at the application layer, are missed by the translation effort. Third, translations haven't been done end-to-end, so the backoffice text has not been fully translated as well. All three of these areas must be covered for a product to even start to be considered "localized".
Therefore, another way to measure product scalability is how quickly the product can be translated, end-to-end (customer facing to back office).
A subtlety in this area is not just what customers and employees see, but also how they enter data. For example, when a customer enters their stake for a football bet, are you ready to accept numbers both in Western Arabic form (1, 2, 3, ...) and Chinese form (一, 二, 三, ...)?
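As a small illustration of the data entry point, the sketch below accepts a whole-number stake typed with digits from more than one script. It relies on Python's standard unicodedata module; the function name parse_stake_digits is an assumption for illustration. It deliberately handles only the simple case where each character is a single positional digit. Real Chinese numerals also use multiplier characters such as 十 (ten) and 百 (hundred), which need a proper parser and are rejected here.

```python
import unicodedata


def parse_stake_digits(text):
    """Convert a string of single digits, in any script, to an int.

    Each character is treated as one positional digit (0-9). Anything
    else, including multiplier characters such as 十, raises ValueError.
    """
    cleaned = text.strip()
    if not cleaned:
        raise ValueError("empty stake")
    value = 0
    for ch in cleaned:
        try:
            digit = unicodedata.numeric(ch)
        except ValueError:
            raise ValueError(f"not a digit: {ch!r}") from None
        if digit != int(digit) or not 0 <= digit <= 9:
            raise ValueError(f"not a single digit: {ch!r}")
        value = value * 10 + int(digit)
    return value
```

With this sketch, "123" and "一二三" both parse to the same integer, so the rest of the betting logic never needs to know which script the customer typed in.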
Lastly, in the area of translation, you may want to be able to set, easily change, and translate to/from your primary back office language. For example, to save money you may decide your Thai customer support team doesn't need to be bilingual. This means that all your customer facing and backoffice systems must be in Thai. However, the common corporate language may be English, and all the customer service KPI results must be viewable in both English and Thai.

2. Localization and usability

Localization is really much more than just translating text. It is about refactoring your product or service so that it is usable by your target market.

As a first pass on the road to localization, a business will often translate and offer one or more new languages.

The next step is to conduct usability studies to verify that your translations make contextual sense. This will often result in substantially different application UI ("User Interface", e.g., web page) layouts, different types of help offerings, brand/color changes, and a changed emphasis on product features.

Scalability in this area primarily means that the product's UI allows for quick and simple changes to how information is presented to your customers and staff. Are the web pages made up of components that can be shuffled about easily? Do the web pages allow for global style changes to be made?

3. Currency

There are many aspects of handling financial accounts in gambling systems. Limiting this post to scalability, a financial system is scalable if the answer to each of the following questions is yes:

Does the system support any number of different currencies?
Can a customer select their working currency of choice, that is, the currency used to display all monetary figures to that customer?
Can each discrete customer have multiple financial accounts (e.g., a credit card in USD, a bank account in GBP, and a Neteller account in EUR) each in multiple different currencies?
Can the system accept and calculate against any number of currency conversion values on a frequent (at least daily) basis?
Can the system have any number of financial accounts to represent internal operations?
Can the system easily switch between any number of currencies for back office reporting?

On the last point, a business will typically select its internal operating currency on a corporate level and work to it. However, for ad-hoc reporting, it is quite convenient to be able to easily switch between currencies for reporting.
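The currency questions above can be sketched in a few lines of Python. Everything here is an assumption for illustration: the rate table layout, the function names, and above all the rates themselves, which are made-up round numbers and not real market data. The sketch shows a customer holding accounts in several currencies, a dated rate table that can be reloaded daily, and a report total that can be switched to any currency.

```python
from decimal import Decimal

# Hypothetical, made-up rates (not real market data): units of each
# currency per 1 USD, keyed by the date the rates were loaded.
DAILY_RATES = {
    "2006-04-14": {
        "USD": Decimal("1"),
        "GBP": Decimal("0.50"),
        "EUR": Decimal("0.80"),
    },
}

CENT = Decimal("0.01")


def convert(amount, from_currency, to_currency, rate_date):
    """Convert via USD using the rates loaded for the given date."""
    rates = DAILY_RATES[rate_date]
    amount_in_usd = amount / rates[from_currency]
    return (amount_in_usd * rates[to_currency]).quantize(CENT)


def total_in(currency, accounts, rate_date):
    """Sum (currency, balance) pairs in the chosen reporting currency."""
    return sum(
        (convert(balance, account_currency, currency, rate_date)
         for account_currency, balance in accounts),
        Decimal("0"),
    )


# One customer with three accounts, each in a different currency.
accounts = [
    ("USD", Decimal("10")),
    ("GBP", Decimal("5")),
    ("EUR", Decimal("8")),
]
```

Calling total_in("USD", accounts, "2006-04-14") and then total_in("GBP", accounts, "2006-04-14") gives the same position in two reporting currencies without touching the underlying accounts. Decimal is used rather than floating point so that money amounts round predictably.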
---
The above areas are three more points you can use when evaluating gambling platform software vendors, at least if you're interested in adding a second language or currency.

30 March 2006

Tough Questions for Gambling Software Vendors - Eight Areas of System Performance Evaluation

Scalability is a question that comes up when evaluating businesses and their supporting technologies, especially in a high growth area like online gambling. This post focuses on the scalability of a business's technical systems, particularly the gaming delivery platforms used by online gambling businesses. I won't be covering non-technical aspects of business scalability in this post.

While whole books are written on the subject, the following is an abbreviated version of what to look for when evaluating the scalability of any technical system (software and supporting systems) from a high-level architectural point of view. My goal is to provide you with some practical, real-life knowledge to better evaluate software vendors flogging their betting exchange, sportsbook, poker, or casino products.

To assist the discussion, we’ll use the following general purpose view of an internet based service delivery architecture:

Tier 1: Client, e.g.:

Browser-based, perhaps with lots of Javascript (including perhaps AJAX) and/or Flash

A downloadable “heavy” client

Tier 2: Application Server, e.g.:

Web Server

AND/OR

Application Server

Tier 3: Database Server

This is a standard three tier architecture, and is a common reference tool when discussing internet-based applications. There are other interesting facets of this architecture like fault tolerance and security which I won’t be covering here.

Tier 2 can be quite complicated and sub-divide in many ways. Sometimes this subdivision results in a system called an “n-tier” or multi-tier architecture.

1. General separation of delivery framework

The delivery framework is a set of tools used to deliver a solution to a customer. It isn’t the specific application like Party Gaming’s Poker software on your PC or betfair’s browser-based trading interface. It is the toolset that companies use to build their applications.

A typical delivery framework for a website like betfair might be:

Tier 1: A standards compliant browser (e.g., Firefox or Internet Explorer) that supports Javascript and Flash

Tier 2A: Web server (e.g., Apache)

Tier 2B: J2EE compliant application server (e.g., JBoss)

Tier 2C: JMS messaging service

Tier 3: Database server (e.g., Oracle)

A typical delivery framework for a PC-based internet poker game like Party Poker might be:

Tier 1: A “heavy” client (“heavy” means it isn’t based in your browser and contains code that runs directly on your PC) written in a language like C++ using Microsoft’s application development tools and supporting functions

Tier 2: Microsoft .Net application server, including many discrete components such as player, lobby, table, and chat management

Tier 3: Microsoft SQL Server

A system can more easily scale when each of these tiers can be separated. I emphasize “can be” as for cost reasons you may put application and database server software on the same hardware when you start out, but it is a well-understood and simple migration process to pull them apart at a later time when you need and can afford greater performance.

At a practical level, for Internet gambling, major software components (e.g., application server and database) should be separated out into multiple hardware platforms, even when you’re just starting out.

As a buyer evaluating systems, the thing you look for here is the use of a fairly standard delivery framework, and not some cobbled together proprietary Frankenstein framework that will be difficult to support.

2. Functional partitioning of system components

System components are created by developers in the context of the delivery frameworks they have chosen. Functional partitioning of one component from another is useful because different components can be run separately on their own hardware allowing for greater scalability.

To evaluate partitioning, logically divide up the various components involved in using the product.

Consider a poker system. Does the same component in the system support both player handling AND table play? Intuitively, managing player logins and the logic of a poker game around a table are two very different activities and could be separated.

A classical example of this is a backoffice reporting system that is used by an operator to report on game platform activity. The reporting should be completely separated from the systems that deliver customer game play, so that heavy reporting activity doesn’t jeopardize the customer’s game play by slowing it down.

The relative goodness of this area is all about how well the supplier designed the architecture of their product. Some suppliers, who evolved from a two-guys-and-their-dog software effort compounded with a lack of experience (or benefit of hindsight anyway), really fail in this area. Look for “monolithic” architectures (that is, all the logical components you might guess should be in the system are all balled into one big entity that handles middle tier responsibilities) in Tier 2 as a sign of trouble. Ever stay awake at night wondering why your poker network solution tops out at 6000 players? It’s probably in this area, and unfortunately, there isn’t much you (or your supplier) can do about it without a massive rewrite of the software to fix the fundamental design flaws.

Sometimes complex components are difficult to subdivide. For example, consider a large multi-table poker tournament that has 10,000 registered players. The logic required to create the seating between one round and the next is probably difficult to subdivide. In that case it may be best to have a specialty component whose sole job is to take the results of one round and then create the player and table structure for the next round, and that’s it.
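To show how narrow such a specialty component can be, here is a minimal Python sketch of one. The function name seat_next_round and the seating policy (shuffle the survivors, then deal them round-robin onto the fewest tables that will hold them) are assumptions for illustration only. A real tournament would apply its own rebalancing rules, for example moving as few players as possible between rounds.

```python
import random


def seat_next_round(surviving_players, seats_per_table, rng=None):
    """Take the survivors of one round and return the tables for the next.

    Returns a list of tables, each a list of players. Table sizes differ
    by at most one player.
    """
    if rng is None:
        rng = random.Random()
    players = list(surviving_players)
    rng.shuffle(players)
    # Ceiling division: the fewest tables that can hold everyone.
    table_count = -(-len(players) // seats_per_table)
    tables = [[] for _ in range(table_count)]
    for index, player in enumerate(players):
        tables[index % table_count].append(player)
    return tables
```

Because the function takes a list in and hands a list of tables back, and keeps nothing in between, it can run on its own hardware without touching the components that handle live table play.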

3. Caching

Computing takes time. One part of a system asks another part of the system to do something, and then waits to get the result. Caching is the act of keeping that result around so you don’t have to take the time to reproduce it over and over again.

Caching is a key part of a high performance architecture, and the good news is that the delivery framework being used often provides various types of caching essentially for free. Caching is also the kind of thing that can sometimes be retrofit into a system to get a later performance boost without a lot of effort.

To understand caching, you need to understand how frequently what you might cache changes and how important absolute accuracy is. You’d be surprised – accuracy often isn’t that important.

Let’s use the example of a sportsbook homepage that shows a list of football matches and prices. On the backoffice side you have odds being changed through a set of automated rules in the system and human traders watching the market. Now imagine that there are 20 users every second browsing to that home page to check the odds. Should the system have to go all the way to the database to find the price to display to each of those 20 users? Definitely not. The system should generate that list of matches and prices once every (for example) 5 seconds and let all users that request the data during those 5 seconds see the same cached list.
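The homepage example can be sketched as a tiny time-based cache in Python. The class name TimedCache and its parameters are assumptions for illustration; real delivery frameworks ship their own caching layers. The loader stands in for the expensive trip to the database, and ttl_seconds is the "once every 5 seconds" from the example.

```python
import time


class TimedCache:
    """Cache one computed value and rebuild it at most once per TTL."""

    def __init__(self, loader, ttl_seconds, clock=time.monotonic):
        self._loader = loader          # the expensive call, e.g. a database query
        self._ttl = ttl_seconds
        self._clock = clock            # injectable so the cache can be tested
        self._value = None
        self._expires_at = None

    def get(self):
        now = self._clock()
        if self._expires_at is None or now >= self._expires_at:
            # First request, or the cached copy is too old: rebuild it.
            self._value = self._loader()
            self._expires_at = now + self._ttl
        return self._value
```

With 20 requests a second and a 5 second lifetime, roughly 100 requests are served for every one trip to the database. This sketch is not thread-safe; a production version would need a lock around the rebuild.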

An extreme example of this is the betting and display activity right before the off of the Grand National on betfair. Betfair is probably processing 100s of transactions each second, and the prices and amounts available at a given price are changing perhaps every 20 milliseconds (20/1000 of a second). What you see when you hit refresh on that page isn’t a true representation of the market at that moment; it’s just an approximation.

At a practical level, things typically cached are whole web pages, parts of web pages, data sets extracted from the database, and derived data sets, as calculated in Tier 2.

Another use of caching on a global scale is a service like Akamai. This service keeps copies of your content geographically close to your customers so that they have (at least the appearance of) faster page load times. That way if your servers are in Costa Rica and your customers are in Russia, many aspects of your web pages can load from Akamai’s servers in Russia, which will be a lot faster than your servers in Costa Rica.

4. Message passing

Message passing is the act of one component giving one or more (other) components some information asynchronously. Asynchronous information transfer means that the component giving the information doesn’t waste time waiting around for the information to be delivered. It relies on a delivery framework (e.g., JMS, the Java Message Service) to make sure that the receiving component(s) get the information.

The great thing here is that the information sender and receiver can be completely separate from each other allowing them to (potentially) sit on different hardware platforms, or at least putting a multi-processor server to better use.

Consider a poker system. A component decides that the ace of spades will come up as the River card to be seen by 9 players around a poker table. The component sends that message out to each of the 9 player's heavy clients via the messaging service and then continues processing without waiting around to see the information delivered to all 9. The message service handles the actual delivery process.
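The River card example can be sketched with Python's standard queue and threading modules. This is only an in-process stand-in for a real messaging service such as JMS, and the function names are assumptions for illustration. What it shows is the shape of the idea: the sender drops nine messages into an outbox and carries on, while a separate worker does the actual delivery.

```python
import queue
import threading


def start_delivery_worker(outbox, deliver):
    """Start a background thread that delivers messages from the outbox."""
    def run():
        while True:
            message = outbox.get()
            if message is None:        # sentinel: stop the worker
                break
            player_id, payload = message
            deliver(player_id, payload)

    worker = threading.Thread(target=run, daemon=True)
    worker.start()
    return worker


def broadcast(outbox, player_ids, payload):
    """Queue one message per player and return without waiting for delivery."""
    for player_id in player_ids:
        outbox.put((player_id, payload))
```

A table component would call broadcast(outbox, range(1, 10), "river: ace of spades") and move straight on to the next piece of game logic. A real messaging service adds what this sketch lacks: delivery across machines, persistence, and retries.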

Message passing has long been a critical component of high performance architectures. When evaluating a software provider, particularly for products like poker and betting exchanges where there is a lot of player interaction, you should carefully establish whether message passing has been implemented.

5. Stateless

For our purposes, the state of something is a description of its current status. When computing, state takes time and resources to maintain. When something is stateless, it doesn’t remember anything between one request and the next.

Imagine 10,000 customers, each running a poker client on their PC. Each player is distinct from the others, and each is about to do one of a number of different things (e.g., fold, call, or raise!). If the software on the customer’s PC keeps track of the customer’s state, it keeps track of exactly one customer’s state. If the poker server keeps track of the players’ state, it has to keep track of 10,000 different players’ states. Clearly keeping the state with the client is much more scalable.

The most common example of this is web browsing. The system that supplies you with the page you just requested can only serve you the page you request. It doesn’t remember the previous page you were on. It is stateless. It is up to your web browser (typically working in unison with an application server behind the web server) to pass state information to the web server, so the server knows what you want.
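Here is a minimal Python sketch of a stateless request handler, in which the client carries its own state and hands it back with every request. The token format, the secret key, and the function names are assumptions for illustration. Because a gambling server cannot simply trust whatever the client sends, the state is signed with an HMAC so that tampering is detected; the server still remembers nothing between requests.

```python
import base64
import hashlib
import hmac
import json

SECRET_KEY = b"demo-only-secret"  # hypothetical; a real key would be kept private


def encode_state(state):
    """Pack the state into a signed token that the client holds."""
    body = base64.urlsafe_b64encode(json.dumps(state, sort_keys=True).encode("utf-8"))
    signature = hmac.new(SECRET_KEY, body, hashlib.sha256).hexdigest()
    return body.decode("ascii") + "." + signature


def decode_state(token):
    """Verify the signature and unpack the state, or raise ValueError."""
    body, _, signature = token.rpartition(".")
    expected = hmac.new(SECRET_KEY, body.encode("ascii"), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(signature, expected):
        raise ValueError("state token has been tampered with")
    return json.loads(base64.urlsafe_b64decode(body))


def handle_request(token, action):
    """Serve one request using only what the client sent with it."""
    state = decode_state(token)
    if action == "next_page":
        state["page"] += 1
    return encode_state(state)
```

Since handle_request depends only on its arguments, any copy of the server on any machine can answer any request, which is what makes statelessness such a help for clustering.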

Beyond resource handling scalability, statelessness also helps enable #8 below.

When evaluating a system, you can quickly identify potential performance bottlenecks by understanding where state is maintained.

6. Resource pooling

Resource pooling is the act of pre-creating a set of resources that are used by many components, particularly when there are many more components wanting resources (but not at the same time) than there are resources. The resources are allocated from the resource pool, used, returned to the resource pool, and then recycled. The resources aren’t created and then destroyed, only to be created again when needed. The resources are pre-created at system startup (and perhaps on demand) and then (re)used as needed.

An example of resource pooling is something called connection pooling. When an application server (Tier 2) needs data from the database (Tier 3), it has to establish a connection (like a two-way pipe) to the database. Connection setup has a cost in terms of computing time and resources on both the application server and database sides. To reduce costs, most application and database servers maintain a ready pool of database connections, just waiting to be used. Once the application server has made its request to and received its data from the database and is ready to move on, it returns the connection to the pool so that it can be re-used again later (likely for a completely different data request).
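A connection pool can be sketched in a few lines of Python. The ConnectionPool class is an assumption for illustration, and SQLite (from the standard library) stands in for the database tier; real application servers and database drivers provide their own, far more robust pools. The sketch pre-creates a fixed number of connections, lends one out per request, and takes it back afterwards.

```python
import queue
import sqlite3
from contextlib import contextmanager


class ConnectionPool:
    """Pre-create a fixed set of connections and lend them out for reuse."""

    def __init__(self, size, connect):
        self._idle = queue.Queue(maxsize=size)
        for _ in range(size):
            self._idle.put(connect())   # setup cost is paid once, at startup

    @contextmanager
    def connection(self, timeout=None):
        conn = self._idle.get(timeout=timeout)  # wait if all are in use
        try:
            yield conn
        finally:
            self._idle.put(conn)                # return it; never destroy it


def open_connection():
    # SQLite in memory is only a stand-in for a real Tier 3 database.
    return sqlite3.connect(":memory:")
```

A request handler would write "with pool.connection() as conn:" around its query. Five requests against a pool of two connections create only two connections in total. Note that SQLite connections are tied to the thread that created them by default, so a multi-threaded version would need a different driver or configuration.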

At a practical level, delivery frameworks handle common points of resource pooling, and very little is required on a software developer’s part to utilize them. In order to evaluate how well the vendor’s system makes use of resource pooling, you have to dig pretty far into the architecture, which likely won’t be very practical. However, if you do identify a component that is logically very complex to set up and is frequently used, it’s worth understanding whether that component is set up and discarded over and over again, or is part of a resource pool.

7. APIs, services and loose coupling of components

Loose coupling of components means that each component doesn’t know much about or share much with other components. It is an architectural way of thinking about how to separate and sometimes replicate data and functionality between components to maximize scalability.

Loose coupling makes use of and is a consequence of implementing most of the performance techniques covered above. Conversely, an architectural imperative of loose coupling would drive you to use most of the above techniques.

The reason why I’ve separated this performance technique out is to highlight the use of APIs (Application Programming Interfaces) to access loosely coupled services. An API is (hopefully!) a well understood way of accessing a service that is being provided by one component to another component. For example, in the online gambling world, connecting to a third party payment gateway like Neteller is an important business enabler. Neteller provides a very well defined way (API) to access Neteller funds transfer services. Neteller doesn’t know much about your customers and your gambling platform doesn’t know anything about the banking network under Neteller. If the two components (your gambling platform and Neteller) are loosely coupled, it should be a simple matter to unplug Neteller and plug in Firepay (at least from a technical, but not necessarily a commercial deal point of view!).
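The "unplug one gateway, plug in another" idea can be sketched in Python as follows. None of this reflects the real payment providers' APIs: the PaymentGateway interface, the two fake gateway classes, and the Cashier are all hypothetical names invented for illustration. The point is that the platform side depends only on a small agreed interface, never on a particular provider.

```python
from decimal import Decimal
from typing import Protocol


class PaymentGateway(Protocol):
    """The only thing the platform knows about any payment provider."""

    def deposit(self, customer_id: str, amount: Decimal, currency: str) -> str:
        ...


class FakeGatewayA:
    """Hypothetical stand-in for one provider; not a real API."""

    def deposit(self, customer_id, amount, currency):
        return f"A:{customer_id}:{amount}:{currency}"


class FakeGatewayB:
    """Hypothetical stand-in for a second provider; not a real API."""

    def deposit(self, customer_id, amount, currency):
        return f"B:{customer_id}:{amount}:{currency}"


class Cashier:
    """Platform component that takes deposits through whichever gateway it is given."""

    def __init__(self, gateway: PaymentGateway):
        self._gateway = gateway

    def take_deposit(self, customer_id, amount, currency):
        if amount <= 0:
            raise ValueError("deposit must be positive")
        return self._gateway.deposit(customer_id, amount, currency)
```

Swapping providers is then a one-line change where the Cashier is constructed, for example Cashier(FakeGatewayB()) instead of Cashier(FakeGatewayA()), with no change to the Cashier itself.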

At a practical level, third parties like Neteller that want to make it easy to use their services do a good job with APIs and their documentation. They provide examples in various computing languages so developers can almost just cut and paste in the necessary components to access the services.

On the other hand, your primary gambling platform supplier may not be too keen on having you hook to other parties, so they might make it difficult or impossible for you to do so by not exposing or documenting their APIs. Also, some gambling platforms may be so poorly designed (especially in area #2 above) they can’t expose an API because there simply isn’t one.

Another telltale sign of loose coupling and API design is how easily you can get to the data you see within the application. That is, is the presentation of the data only loosely coupled with the production of that data? If the presentation layer (e.g., how the web page is coded to display the data) can be easily changed to display the same core chunk of data in different ways, the data is probably loosely coupled from the presentation.

8. Clustering (horizontal scalability)

Clustering is the ability to take one component and create multiple copies of it, potentially across multiple hardware platforms. Each copy of the component is equal to all the other copies – they are all peers. When these peers are together in a common pool and can provide services equally, they are considered to be clustered. Horizontal scalability means that I can keep adding in peers (e.g., additional hardware) to increase the performance of that peer group.

Like loose coupling, clustering is another derivative of some of the areas above.

The clustering concept is similar to resource pooling (#6 above). However, with resource pooling, you’re trying to avoid the costs of repetitiously creating and destroying components. With clustering, you’re creating a variable size pool of components to match your performance requirements.

We’ll consider two practical aspects of clustering – system level and component level.

At a system level, you want to be able to create a cluster of database, application, and web servers. As your performance needs go up, you can add a second, third, and so on server. Each of these servers can run on a new hardware platform. This can be tricky in the database tier, but is certainly well understood in the application and web tiers.

Things get interesting and complex at a component level. Consider an extremely popular football match in betfair, a few minutes before the start of the game. A good design suggests that there should be a cluster of components that handles markets, and a single “market” component (of a cluster of market components) can be dedicated to a single hot market. An even higher performance model would be a stateless one where a cluster of market component peers can handle any market. It would be a design disaster for one component to have to handle all markets.
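The two component-level designs can be contrasted in a short Python sketch. All class names here are assumptions for illustration, and the routing rules are deliberately simplistic. In the first design each market is pinned to one worker, so a hot market is limited by that single worker. In the second, the workers are stateless peers and requests for the hot market are spread across all of them.

```python
import itertools
import zlib


class MarketWorker:
    """One peer in the cluster; records the bets it was asked to handle."""

    def __init__(self, name):
        self.name = name
        self.handled = []

    def place_bet(self, market_id, stake):
        self.handled.append((market_id, stake))
        return self.name


class PartitionedCluster:
    """Each market is pinned to one worker, chosen by hashing the market id."""

    def __init__(self, workers):
        self._workers = list(workers)

    def route(self, market_id):
        index = zlib.crc32(market_id.encode("utf-8")) % len(self._workers)
        return self._workers[index]


class StatelessCluster:
    """Any worker can serve any market; requests rotate round-robin."""

    def __init__(self, workers):
        self._next_worker = itertools.cycle(list(workers))

    def route(self, market_id):
        return next(self._next_worker)
```

In the stateless design, adding a peer means adding another MarketWorker to the list, which is horizontal scalability in miniature. It only works if the workers hold no per-market state themselves, which is where the statelessness discussed in #5 comes in.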

---

It is quite possible that as an operator evaluating your current or potential future online gambling software you will never get to ask these questions. Vendors will cry "proprietary", "competitive advantage" and other bollocks. They will ask you not to look at the man behind the curtain. This will change over time as the market becomes more competitive and using good design practices and high performance frameworks becomes a competitive advantage.

Good luck!

Why this Blog?

I write a lot. I write to get my head around a subject. I write about technology I figure out and use. I write "how we're going to do things" in email and documents to provide advice, guidance, policy and leadership for IT. At some point I realized that most of what I wrote was not proprietary and that I was repeating myself as new people joined the team and similar situations came up again. So while my postings are mostly just common sense, writing them does help me figure things out and gives me a stock set of thoughts and "how-to" for future reference, and maybe someone else will find value in them as well.

Views expressed on this website are my own and may or may not reflect the views of my employer.

I reserve rights to all content appearing in this blog if I'm the one that wrote it.