With the advent of Cloud Computing, there has been a resurgence of distributed computing terminologies and technologies, each claiming to represent and embody a particular aspect of the Cloud. This blog post will focus on clarifying and compartmentalizing these aspects and, with any luck, help developers design their cloud application’s architecture.
Unlike design paradigms such as Object-Oriented design, which use a top-down approach to describe architecture, we will begin with a bottom-up discussion: http://en.wikipedia.org/wiki/Top-down_and_bottom-up_design. The reason to start here is that the bottom is where things begin to get connected and increasingly less cloudy. As we go up the stack, components go from “physically” connected to “logically” connected to “virtually” connected. Depending on what is being built, some aspects can be ignored and others put aside. Yet each must be understood to get the most out of the Cloud, and together they should provide the knowledge needed to build a successful cloud application.
Let’s first look at one of today’s more popular terms used in conjunction with cloud computing, the word “fabric”:
- Switching Fabric - Hardware/Network services: Circuit/Packet, Fibre Channel, InfiniBand
- Computing Fabric - Loosely coupled: Parallel Processing Nodes, Networking, and Storage
- Data Fabric - Distributed data: Cache, Message Bus, Replication, Query/Notification
- Application Fabric - Collection of integrated software services: Azure, AppEngine, vFabric
The first three terms share the trait of describing remotely interconnected components which, seen from a distance, look like a woven grid of edges and vertices; or, if you will, a threaded carpet on which services and/or applications reside. The last term describes a tapestry of software/APIs interwoven together, providing a cohesive set of services used to frame business and web applications.
Now there are other terms that can also embody cloud computing (note the shameless plugs):
- Grid (i.e. loosely coupled)
- Cluster (i.e. tightly coupled)
- Space-Based (e.g. txObject)
- Peer-to-Peer (e.g. Yakkle Connect)
- Distributed Operating System (e.g. VirtualIron’s 1st product)
But I want to focus on the term “fabric.” In my opinion, this word is key to describing the future of computing, and particularly cloud computing. What each fabric term embodies is the idea of “woven,” or blended/united, components. In other words, seamless and natural, not forced. Another connotation of the word fabric is “interconnected,” as in a structured web.
By contrast, terms like “Grid,” “Cluster,” and “Peer-to-Peer” traditionally imply “Complex, Awkward, Bounded,” all of which are anti-cloud principles; hence the resurgence of the cloud term “fabric.” Now, I am not saying that every product that places itself in one or more of the fabric categories meets this fabric definition. What I am saying is that there need to be standards on which to base good architectural decisions, and defining what a “Cloud Fabric” must embody is certainly a good place to start.
Switching Fabrics are the basis for most of the world’s telecommunications, the internet and large intranets, and cloud offerings, and for the most part they embody my definition of fabric. However, if you are not building a cloud and/or fabric, you should not be dealing with or worrying about this layer.
This is not to say that Quality of Service (QoS), Service-Level Agreements (SLAs), and Reliability (i.e. Five-Nines) are not important attributes of a fabric. It’s just that a cloud application should not be interacting with the switching fabric to provision and/or connect computing resources. All of this is the responsibility of your cloud/fabric provider.
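To put “Five-Nines” in concrete terms, here is a small Python calculation of the maximum downtime per year that a given number of nines allows. It assumes a 365-day year; real SLAs often define their own measurement window (for example, monthly), so treat the numbers as illustrative.

```python
# Allowed downtime per year for an availability target expressed in "nines".
# Assumption: a 365-day year; an actual SLA may measure over a different window.
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes


def availability(nines: int) -> float:
    """Availability fraction for a number of nines, e.g. 5 -> 0.99999."""
    return 1 - 10 ** (-nines)


def downtime_minutes_per_year(nines: int) -> float:
    """Maximum downtime, in minutes per year, that still meets the target."""
    return MINUTES_PER_YEAR * 10 ** (-nines)


for n in range(2, 6):
    print(f"{availability(n):.3%} -> {downtime_minutes_per_year(n):,.1f} min/year")
```

Five nines leaves roughly 5.26 minutes of downtime a year, which is why meeting it is a job for the provider’s infrastructure rather than for application code.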
So if you look at Juniper’s recent Stratus Project (a.k.a. QFabric) announcements and are wondering how you would use this for your application, the answer is simple: You don’t. However, if you have friends that work for or are in charge of cloud infrastructure, send it their way.
Computing Fabrics are one part grid (i.e. loose), one part cluster (i.e. tight), and one part math (i.e. graph theory), with a touch of cloud (i.e. elastic). At first read, this would seem to be a great place to start a cloudy application. The short answer is: absolutely not. If, however, you have expertise in distributed, parallel, and scientific/financial applications, then I could say yes. But I still might say “no” if you have never physically set up and/or maintained a computing grid.
Now, there are really smart people with very good reasons to design their applications at this layer, but I would suggest leaving resource allocation and elasticity to the cloud professionals and focusing your architecture on solving your complex mathematical/financial (i.e. highly computational) problems.
There are several wonderful papers on designing computing fabrics and building applications on top of them, and numerous successful, even profitable (e.g. Wall Street), applications have been built this way. But to reiterate, this is by no means an average cloud application.
Data Fabrics, on the other hand, are a good place to start. I am not saying they are the only place, but a first-rate data fabric can do so much for an application and, in the right environment (e.g. a cloud provider), should be the cornerstone of your architecture.
As any programmer knows, data and behavior (with a simple user interface) are all one needs to create an application. And if the data and behavior scale, for the most part so does the user interface. But scaling out an application’s data does not mean you have a data fabric. For instance, if a data service makes information available in a distributed fashion (e.g. Memcached) but does not include the following, it is not a data fabric:
- Natural integration and hitless evolution
- Seamless scale up/down and in/out elasticity
- Query and Notification programming interfaces
- Aspect Injection: QoS, SLA, RBAC, Reliability, Persistence, etc…
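The query and notification interfaces in the third bullet are the easiest of these to picture in code. The following Python sketch is a hypothetical, single-process `InMemoryDataFabric` (not any real product’s API) showing what a query interface (find entries by predicate) and a notification interface (get called back when matching entries change) might look like. A real data fabric would also partition, replicate, and persist the entries, which this sketch deliberately leaves out.

```python
from typing import Any, Callable, Dict, List, Tuple

Predicate = Callable[[Any], bool]
Listener = Callable[[str, str, Any], None]  # called as listener(event, key, value)


class InMemoryDataFabric:
    """Hypothetical single-process stand-in for a data fabric client API.

    Models only the query and notification interfaces; a real fabric would
    also partition, replicate, and persist entries across nodes.
    """

    def __init__(self) -> None:
        self._entries: Dict[str, Any] = {}
        self._subscriptions: List[Tuple[Predicate, Listener]] = []

    def put(self, key: str, value: Any) -> None:
        event = "update" if key in self._entries else "insert"
        self._entries[key] = value
        self._notify(event, key, value)

    def remove(self, key: str) -> None:
        value = self._entries.pop(key)
        self._notify("remove", key, value)

    def query(self, predicate: Predicate) -> Dict[str, Any]:
        """Query interface: return every entry whose value satisfies the predicate."""
        return {k: v for k, v in self._entries.items() if predicate(v)}

    def subscribe(self, predicate: Predicate, listener: Listener) -> None:
        """Notification interface: invoke listener on changes to matching entries."""
        self._subscriptions.append((predicate, listener))

    def _notify(self, event: str, key: str, value: Any) -> None:
        for predicate, listener in self._subscriptions:
            if predicate(value):
                listener(event, key, value)


# Usage: watch for large orders, then query them.
fabric = InMemoryDataFabric()
events: List[Tuple[str, str]] = []
fabric.subscribe(lambda order: order["total"] > 100,
                 lambda event, key, value: events.append((event, key)))
fabric.put("order-1", {"total": 250})
fabric.put("order-2", {"total": 40})
print(fabric.query(lambda order: order["total"] > 100))  # {'order-1': {'total': 250}}
print(events)  # [('insert', 'order-1')]
```

Notifications here are delivered synchronously inside `put()`; a distributed fabric would more likely deliver them asynchronously from whichever node owns the entry.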
The sheer number of technologies and design patterns at this layer is an indication of how important it is to find a good data fabric. With the advent of standards such as SQL, ODBC, JDBC, ADO.NET, XML, XQuery, XPath, JPA, etc., hundreds of proprietary and open-source solutions have been developed; but be careful in choosing “standard” solutions, since each comes with varying degrees of deviation (a topic for another blog post).
Though many of these technologies have characteristics you want in a data fabric (for instance, persisting and/or querying data, a.k.a. CRUD), they are habitually geared towards manual/standalone application deployments (i.e. not cloudy). As for scaling, none directly fits the bill, let alone my requirement of hitless upgrades.
There are, however, several solutions designed to address the scale limitation. For instance, a distributed memory cache is a common way to solve this problem, with some requiring explicit data-access interaction from the application and others working implicitly. Other solutions scale out the actual persistence layer, though typically these are expensive technologies (even in the cloud: Database-as-a-Service).
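Memcached servers are independent of one another; the client usually decides which server holds a key, and consistent hashing is one common way to make that choice. The Python sketch below combines a consistent-hash ring with the explicit cache-aside pattern: the application checks the cache, and on a miss loads from the backing store and populates the cache. The class names are hypothetical, plain dicts stand in for cache servers, and `load` is an assumed function for reading the backing store.

```python
import bisect
import hashlib
from typing import Callable, Dict, List


class HashRing:
    """Consistent-hash ring mapping keys to cache nodes (virtual nodes smooth the load)."""

    def __init__(self, nodes: List[str], replicas: int = 100) -> None:
        self._ring: List[int] = []         # sorted hash positions
        self._owners: Dict[int, str] = {}  # hash position -> node name
        for node in nodes:
            for i in range(replicas):
                h = self._hash(f"{node}#{i}")
                self._owners[h] = node
                bisect.insort(self._ring, h)

    @staticmethod
    def _hash(text: str) -> int:
        return int.from_bytes(hashlib.md5(text.encode()).digest()[:8], "big")

    def node_for(self, key: str) -> str:
        # First ring position clockwise from the key's hash, wrapping around.
        idx = bisect.bisect(self._ring, self._hash(key)) % len(self._ring)
        return self._owners[self._ring[idx]]


class CacheAside:
    """Explicit cache-aside reads: check the owning cache node, fall back to the store."""

    def __init__(self, nodes: List[str], load: Callable[[str], str]) -> None:
        self._ring = HashRing(nodes)
        self._caches: Dict[str, Dict[str, str]] = {n: {} for n in nodes}  # stand-in servers
        self._load = load  # hypothetical loader for the slow backing store
        self.misses = 0

    def get(self, key: str) -> str:
        cache = self._caches[self._ring.node_for(key)]
        if key not in cache:
            self.misses += 1
            cache[key] = self._load(key)  # populate the cache on a miss
        return cache[key]


reads = CacheAside(["cache-a", "cache-b", "cache-c"], load=lambda k: f"row for {k}")
reads.get("user:42")
reads.get("user:42")
print(reads.misses)  # 1
```

The ring is the design choice that matters for elasticity: removing (or adding) a node only remaps the keys that node owned, instead of reshuffling nearly every key as a simple `hash(key) % len(nodes)` scheme would.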
Often a combination of caching, persistence, and replication is used to address scale. But be forewarned: this is complex and difficult stuff to implement, and I recommend going with your cloud provider’s Database-as-a-Service (DaaS) solution.
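To give a sense of why combining caching, persistence, and replication is hard, here is a deliberately small Python sketch of a write-through cache in front of replicated storage with a write quorum. `Replica` and `WriteThroughStore` are hypothetical names; the replicas are in-memory dicts, and failure is reduced to an `up` flag. Even so, a single partial failure leaves the replicas disagreeing.

```python
from typing import Dict, List, Optional


class Replica:
    """In-memory stand-in for one persistent replica; `up` simulates availability."""

    def __init__(self) -> None:
        self.rows: Dict[str, str] = {}
        self.up = True

    def write(self, key: str, value: str) -> bool:
        if self.up:
            self.rows[key] = value
        return self.up


class WriteThroughStore:
    """Caching + persistence + replication: a write must reach `write_quorum` replicas."""

    def __init__(self, replicas: List[Replica], write_quorum: int) -> None:
        self.replicas = replicas
        self.write_quorum = write_quorum
        self.cache: Dict[str, str] = {}

    def put(self, key: str, value: str) -> bool:
        acks = sum(r.write(key, value) for r in self.replicas)
        if acks < self.write_quorum:
            # Quorum not reached: drop the cached copy. Replicas that accepted the
            # write now disagree with the rest; repairing that is omitted here.
            self.cache.pop(key, None)
            return False
        self.cache[key] = value  # write-through: cache holds the acknowledged value
        return True

    def get(self, key: str) -> Optional[str]:
        if key in self.cache:
            return self.cache[key]
        # Fall back to the first available replica, which may hold an unconfirmed write.
        for r in self.replicas:
            if r.up and key in r.rows:
                return r.rows[key]
        return None


replicas = [Replica(), Replica(), Replica()]
store = WriteThroughStore(replicas, write_quorum=2)
print(store.put("sku-1", "10 units"))  # True: 3 of 3 replicas acknowledged
replicas[1].up = replicas[2].up = False
print(store.put("sku-1", "9 units"))   # False: only 1 of 3 acknowledged
print(store.get("sku-1"))              # prints 9 units, a value the quorum never confirmed
```

That unconfirmed read is the kind of repair problem (read repair, anti-entropy, rollback) a provider’s DaaS offering is meant to take off your hands.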
In truth, the only solutions that even come close to what I would call a data fabric are the DaaS offerings. Some support popular data access technologies (with great limitations), while others are designed from the ground up to meet specific cloud application requirements.
Since this post’s goal is to outline fabric terms and technology, I will write a more specific post on the Data Fabric, covering features discussed here that are not yet part of today’s DaaS solutions. Until then, and before starting your application, you should continue researching data access technologies: http://en.wikipedia.org/wiki/Data_access_layer. Depending on your choice, just as much time should be spent choosing a backing-store database: http://en.wikipedia.org/wiki/Database. There are many data access choices, so choose wisely. A bad decision here can kill a project.
Application Fabric is a relatively new term and deviates considerably from the communication aspects of the network fabrics (switching, computing, and data) we just finished discussing. But I like this derivation, because its goal is to provide a holistic approach to software design, development, and deployment (i.e. cloudy principles); in other words, the life cycle of an application in the cloud.
Another goal is to take the services applications typically need and group them into one cohesive package. For instance, the following are the minimum services such a fabric should provide:
- Application Framework(s)
- Development Environment (IDE)
- Automated Deployment/Life Cycle
- Data Fabric or at least a distributed data access layer
- Database-as-a-Service or at least a scalable database layer
Offerings leading this effort include Microsoft’s Azure, Google’s AppEngine, VMware’s vFabric, and Heroku (now part of Salesforce), to name a few. Each solution has its advantages, but the major disadvantage in my mind is “lock-in.”
Yet in today’s business environment, time-to-market continues to be the major driving force when building an application, so I am not sure lock-in is the biggest issue. However, these solutions actually have two levels of lock-in, the application fabric framework and the cloud platform, with Heroku the one exception. But now that Heroku is owned by Salesforce, I am not sure it will continue offering a “lock-in free” feature set.
In the end, though, choosing an application framework (regardless of whether it supports an integrated service fabric) is a really wise choice for so many reasons: http://en.wikipedia.org/wiki/Comparison_of_web_application_frameworks.
