Tuesday, March 15, 2011

Cloud Fabric Terminologies and Technologies

With the advent of Cloud Computing, there has been a resurgence of distributed computing terminologies and technologies, each claiming to represent and embody a particular aspect of the Cloud. This blog will focus on clarifying and compartmentalizing these aspects, and with any luck, help developers design their cloud application’s architecture.

Unlike design paradigms such as Object-Oriented, which use a top-down approach to describe architecture, we will begin with a bottom-up discussion: http://en.wikipedia.org/wiki/Top-down_and_bottom-up_design. The reason to start here is that the bottom is where things begin to get connected and increasingly less cloudy. As we go up the stack, components go from “physically” connected to “logically” connected to “virtually” connected. Depending on what is being built, some aspects can be ignored and others put aside. Yet each must be understood to get the most out of the Cloud, and together they should provide the knowledge needed to build a successful cloud application.

Let’s first enumerate the uses of one of today’s more popular terms in cloud computing, the word “Fabric”:
  • Switching Fabric - Hardware/Network services: Circuit/Packet, FibreChannel, InfiniBand
  • Computing Fabric - Loosely coupled: Parallel Processing Nodes, Networking, and Storage
  • Data Fabric - Distributed data: Cache, Message Bus, Replication, Query/Notification
  • Application Fabric - Collection of integrated software services: Azure, AppEngine, vFabric
The first three terms share the trait of describing remotely interconnected components that, when seen from a distance, look like a woven grid of edges and vertices; or, if you will, a threaded carpet where services and/or applications reside. The last term describes a tapestry of software/API(s) interwoven together, providing a cohesive set of services used to frame business and web applications.

Now there are other terms that can also embody cloud computing (note the shameless plugs):
  • Grid (i.e. loosely coupled)
  • Cluster (i.e. tightly coupled)
  • Space-Based (e.g. txObject)
  • Peer-to-Peer (e.g. Yakkle Connect)
  • Distributed Operating System (e.g. VirtualIron’s 1st product)
But I want to focus on the term “fabric.” In my opinion this word is key to describing the future of computing and particularly cloud computing. What each fabric term embodies is the idea of “woven,” or blended/united, components. In other words, seamless and natural, not forced. Another connotation of the word fabric is “interconnected,” as in a structure or web.

In contrast, terms like “Grid, Cluster, Peer-to-Peer” traditionally imply “Complex, Awkward, Bounded,” all of which are anti-cloud principles, hence the resurgence of the cloud term “fabric.” Now I am not saying that products that place themselves in one or more of the fabric categories will meet this fabric definition. What I am saying is that there need to be standards on which one can base good architectural decisions, and defining what a “Cloud Fabric” must embody is certainly a good place to start.

Switching Fabrics are the basis for most of the world’s telecommunication, internet/big-intranets, and cloud offerings, and for the most part embody my definition of fabric. However, if you are not building a cloud and/or fabric, you should not be dealing with or worrying about such a layer.

This is not to say that Quality-of-Service (QoS), Service-Level-Agreements (SLA), and Reliability (i.e. Five-Nines) are not important attributes of a fabric. It’s just that a cloud application should not be interacting with the switch fabric to provision and/or connect computing resources. All this is the responsibility of your cloud/fabric provider.

So if you look at Juniper’s recent Stratus Project (a.k.a. QFabric) announcements and are wondering how you would use this for your application, the answer is simple: You don’t. However, if you have friends that work for or are in charge of cloud infrastructure, send it their way.

Computing Fabrics are one part grid (i.e. loose), one part cluster (i.e. tight) and one part math (i.e. graph theory), with a touch of cloud (i.e. elastic). Now at first read, this would seem to be a great place to start a cloudy application. The short answer is: Absolutely not. If, however, you have expertise in distributed, parallel, and scientific/financial applications, then I could say yes. But I still might say “no” if you have not physically set up and/or maintained a computing grid.

Now, there are really smart people with very good reasons to design their applications at this layer, but I would suggest leaving the resource allocation and elasticity to the cloud professionals and just focusing your architecture on solving your complex mathematical/financial (i.e. highly computational) calculations.

There are several wonderful papers on designing computing fabrics and building applications on top of them. And there have been numerous successful, as well as profitable (e.g. Wall Street), applications built this way. But to reiterate, this is by no means an average cloud application.

Data Fabrics, on the other hand, are a good place to start. I am not saying they are the only place, but a first-rate data fabric can do so much for an application and, in the right environment (e.g. cloud provider), should be the cornerstone of your architecture.

As any programmer knows, data and behavior (with a simple user-interface) are all one needs to create an application. And if the data and behavior scale, for the most part, so does the user-interface. But scaling out an application’s data does not mean you have a data fabric. For instance, if a data service makes information available in a distributed manner (e.g. Memcached) but does not include the following… it is not a data fabric:
  • Natural integration and hitless evolution
  • Seamless scale up/down and in/out elasticity
  • Query and Notification programming interfaces
  • Aspect Injection: QoS, SLA, RBAC, Reliability, Persistence, etc…
There are so many technologies and design patterns at this layer, which is an indication of the importance of finding a good data fabric. And with the advent of standards such as SQL, ODBC, JDBC, ADO.NET, XML, XQuery, XPath, JPA, etc… hundreds of proprietary and open-source solutions have been developed; but be careful in choosing “standard” solutions, since each comes with varying degrees of deviation (topic for another blog).
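
To make the “Query and Notification programming interfaces” item above a little more concrete, here is a minimal single-process sketch of what such an interface might look like from the application's side. Every name in it (ToyDataFabric, put, query, subscribe) is hypothetical and invented for illustration; it is not the API of any product mentioned in this blog, and it ignores distribution, elasticity, and persistence entirely.

```python
from typing import Any, Callable, Dict, List, Tuple

# Hypothetical types for illustration only.
Predicate = Callable[[str, Any], bool]   # decides whether an entry matches
Listener = Callable[[str, Any], None]    # called when a matching entry is written


class ToyDataFabric:
    """Single-process stand-in for a data fabric's programming interface."""

    def __init__(self) -> None:
        self._data: Dict[str, Any] = {}
        self._subscriptions: List[Tuple[Predicate, Listener]] = []

    def put(self, key: str, value: Any) -> None:
        """Store a value, then notify every subscriber whose predicate matches."""
        self._data[key] = value
        for predicate, listener in self._subscriptions:
            if predicate(key, value):
                listener(key, value)

    def query(self, predicate: Predicate) -> Dict[str, Any]:
        """Return all stored entries that satisfy the predicate."""
        return {k: v for k, v in self._data.items() if predicate(k, v)}

    def subscribe(self, predicate: Predicate, listener: Listener) -> None:
        """Register interest in future writes that satisfy the predicate."""
        self._subscriptions.append((predicate, listener))


def is_east(key: str, value: Any) -> bool:
    """Example predicate: match entries whose region is 'east'."""
    return value["region"] == "east"


fabric = ToyDataFabric()
notified: List[str] = []

# Notification: ask to be told about writes that match a predicate.
fabric.subscribe(is_east, lambda key, value: notified.append(key))

fabric.put("node-1", {"region": "east"})
fabric.put("node-2", {"region": "west"})

# Query: ask for the entries that match right now.
east_nodes = fabric.query(is_east)
```

The point of the sketch is only that the same predicate serves both the pull side (query) and the push side (notification); a plain distributed cache gives you neither.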

Many of these technologies have characteristics you want in a data fabric, for instance persisting and/or querying of data (a.k.a. CRUD), but they are habitually geared towards manual/standalone application deployments (i.e. not cloudy). As for scaling, none directly fit the bill, let alone my application requirement of hitless upgrades.

There are, however, several solutions designed to address the scale limitation. For instance, using a distributed memory cache is a common way to solve this problem; some caches require explicit data access interaction from the application, while others work implicitly. Other solutions scale out the actual persistence layer, though typically these are expensive technologies (even in the cloud: Database-as-a-Service).
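
As a rough illustration of the explicit versus implicit distinction, the sketch below shows the two styles side by side. It makes several assumptions: a plain Python dict stands in for the distributed cache, SlowStore is a made-up stand-in for the persistence layer, and the names get_explicit and ReadThroughCache are hypothetical. The patterns themselves are commonly called cache-aside (explicit) and read-through (implicit).

```python
from typing import Callable, Dict, Optional


class SlowStore:
    """Made-up stand-in for a persistence layer; counts how often it is read."""

    def __init__(self, rows: Dict[str, str]) -> None:
        self._rows = dict(rows)
        self.reads = 0

    def load(self, key: str) -> Optional[str]:
        self.reads += 1
        return self._rows.get(key)


def get_explicit(cache: Dict[str, str], store: SlowStore, key: str) -> Optional[str]:
    """Explicit (cache-aside): the application checks the cache itself,
    falls back to the store on a miss, and populates the cache."""
    if key in cache:
        return cache[key]
    value = store.load(key)
    if value is not None:
        cache[key] = value
    return value


class ReadThroughCache:
    """Implicit (read-through): callers only ever call get(); the cache
    decides when to go to the backing store."""

    def __init__(self, loader: Callable[[str], Optional[str]]) -> None:
        self._loader = loader
        self._entries: Dict[str, str] = {}

    def get(self, key: str) -> Optional[str]:
        if key not in self._entries:
            value = self._loader(key)
            if value is None:
                return None  # misses are not cached in this sketch
            self._entries[key] = value
        return self._entries[key]


store = SlowStore({"user:1": "alice"})

# Explicit style: cache handling lives in application code.
app_cache: Dict[str, str] = {}
first = get_explicit(app_cache, store, "user:1")    # goes to the store
second = get_explicit(app_cache, store, "user:1")   # served from the cache

# Implicit style: the application never sees the store at all.
read_through = ReadThroughCache(store.load)
third = read_through.get("user:1")
```

In the explicit style every call site has to repeat the check-load-populate dance, which is where bugs creep in; the implicit style hides that behind one interface, at the cost of the cache needing to know how to load data.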

Often a combination of caching, persistence, and replication is used to address scale. But be forewarned, this is complex and difficult stuff to implement, and I recommend that you go with your cloud provider's Database-as-a-Service (DaaS) solution.

In truth, the only solutions that even get close to what I would call a data fabric are the DaaS offerings. Some support popular data access technologies (with great limitations) while others are designed from the ground up to meet specific cloud application requirements.

Since this blog’s goal is to outline fabric terms and technology, I will be writing a more specific blog on the Data Fabric, covering features discussed here that have yet to become a part of today’s DaaS solutions. Until then, and prior to starting your application, you should continue doing research into choosing a data access technology: http://en.wikipedia.org/wiki/Data_access_layer. And depending on your choice, as much time should be spent choosing a backing store database: http://en.wikipedia.org/wiki/Database. There are many data access choices… so choose wisely. A bad decision here can kill a project.

Application Fabric is a relatively new term, and it deviates quite a bit from the communication aspects of the network fabrics (switch, compute, and data) we just finished talking about. But I like this derivation. I like it because its goal is to provide a holistic approach to software design, development, and deployment (i.e. cloudy principles); in other words, the life cycle of an application in the cloud.

Another goal is to take the services applications typically need and group them into one cohesive package. For instance, the following are the minimum services such a fabric should provide:
  • Application Framework(s)
  • Development Environment (IDE)
  • Automated Deployment/Life Cycle
  • Data Fabric or at least a distributed data access layer
  • Database-as-a-Service or at least a scalable database layer
Leading this effort are Microsoft’s Azure, Google’s AppEngine, VMware’s vFabric, and Heroku (now part of Salesforce), to name a few. Each solution has its advantages, but the major disadvantage in my mind is the “lock-in.”

Yet, in today’s business environment, time-to-market continues to be the major driving force when building an application, so I am not sure lock-in is the biggest issue. However, these solutions actually have two levels of lock-in: the application fabric framework as well as the cloud platform, with the one exception of Heroku. But now that Heroku is owned by Salesforce, I am not sure they will continue a “lock-in free” feature set.

But in the end, choosing an application framework (regardless of whether it supports an integrated service fabric) is a really wise choice for so many reasons: http://en.wikipedia.org/wiki/Comparison_of_web_application_frameworks.

Wednesday, February 2, 2011

Future Proof your App: Cloudify your Architecture

It has been one year since my last blog, where I indicated I was working on a “Cloud Application Framework” called jStratus; with StratusDB (a cloud data fabric) being the first layer I would release to the open-source community. So let's get caught up…

Since then, many things have happened. First, I left my role as Chief Architect at Akiban Technologies (the cloud database company) to start a new venture called CloudTree. While Akiban has a wonderful idea, technology and team, I realized I wanted to go for it and make the data fabric a reality.

So I contacted my Telecom buddies (all industry leaders, one being my mentor) and confessed what I wanted to do. To my great surprise, they were in the planning stages of building a telecom company in the cloud and in need of such a “fabric” to scale their platform.

But before I get too far into that, let's recap how we “all” got here… As you know, there has been much ado about “the cloud” over the last five years. Pioneered by companies such as Amazon, Salesforce, Google and Virtual Iron (shameless plug), the cloud has continued to evolve and morph into the juggernaut we see today.

Even Microsoft had to jump into the game with all that “to the cloud” marketing. And yes - this once mighty desktop company is moving its applications to the cloud too (thanks Ozzie). But for me, the final realization that the naysayers’ days were numbered was when “anti-cloud” Larry Ellison purchased Virtual Iron to begin the cloudification of his customers’ applications…

However, this blog (as well as future ones) will not be about the big boys and their cloudy applications, but about you and yours. We’ll dive into what the cloud means to today’s non-cloudy applications, as well as what tomorrow’s applications will look like.

Many generalizations will be made throughout these blogs to get points across, but I hope in the end you will better understand what is required both to migrate your applications to a cloud architecture and to design new cloudy ones.

So where do we start? Well it depends. It depends on your type of application. There are four major types:
  • Web
  • Desktop
  • Enterprise
  • Mobile / Telecom
As you can see, there is a wide range of differences with respect to an application’s audience, expectation, and environment. In the past, each type was designed, developed, and executed quite differently. Yet today’s cloud has made promises that the future of applications is cloudy.

But to be honest, one type has been running completely cloudy since its inception. Telecom has been creating distributed computing environments for its applications since the beginning, drawing the earliest cloud diagrams to describe them; well, at least for as long as I have been in the business, which I got into back in 1991.

The difference is that they were private and not designed for general use. And to be clear, Telecom built large-scale distributed (i.e. cloudy) systems decades before Google and Twitter were even a glimmer in their founders’ eyes.

Web applications, on the other hand, started out hosted locally, then remotely, and, with the advent of companies such as Amazon, Rackspace, and Softlayer, cloudy (well, virtualized at least). But if your web application is hosted by a cloud provider’s virtual infrastructure, is it really cloudy? I think not.

If your application knows about the operating system, server, network, and/or storage (even if virtualized), it is an old-school hosted web app. If, however, it is running on a cloud provider’s PaaS infrastructure, then your application “is” cloudy. But before going into what is cloudy or not, let's continue describing the last two application types: Desktop and Enterprise.

Desktop applications are typically the furthest from cloudification: installed locally, run locally, with content stored locally. Now, in recent years, there have been some hybrids like Skype, where the application is installed and run locally, but all content is stored elsewhere. So if you run another Skype instance on another computer, this content is re-synced to your new location. But with the recently publicized network outages, I don’t see Skype as very cloudy; well, at least, not by my definition of cloudy.

And finally, enterprise applications: most are run like traditional desktop apps, though on dedicated servers, networks, and storage. However, big institutions have been building private computing environments for quite a while now, and many of their applications are built to run on them. Telecom and Enterprise clouds are similar in the unfortunate fact that both are proprietary; they differ in the scale of Telecom and the diversity of Enterprise applications.

So where does this leave us?

Stay tuned...

Saturday, February 6, 2010

Cloudy with a chance of Ownership

This is my first official blog and, being a developer by trade, passion, and well… passion, I thought I’d discuss the journey engineers, programmers, hackers, or whatever you call yourself, might go through with respect to ownership of your software.

For me, it began ten years ago, with technology I owned and had used at several telecom companies. Thinking I would not utilize it in my next endeavor, I decided to sell. Though I had one very interested party, in the end it seemed like too much work (i.e. lawyers); and with another startup underway and a baby on the way, I knew I had enough on my plate. This was what led me to Open Source…

Yet I still wanted “value” from the five years of work I put into it. At the time, I was a huge fan of the Python language as well as a contributor and was quite familiar with the Open Source paradigm and community. My software was a C++ and Python application framework, and here is where it hit me. I could create my own community by open sourcing the software too. And that’s just what I did: http://www.txobject.org.

But how does one open source software and what does it mean anyway? After dealing with lawyers while selling the framework technology (e.g. indemnification), my biggest concern was to protect myself from liability. Now, I am no lawyer, and at the time I had no real understanding of software law or Open Source licensing. What I did know was that the GNU General Public License (GPL) was the most popular Open Source license. And after a quick read of its liability definition, I was sold. GPL it would be.

I began open sourcing around the turn of the century, before the onslaught of Open Source projects exploded onto the scene. To get the word out, I used sites such as http://freshmeat.net and http://sourceforge.net, each pioneering the community experience.

For me, this was the heyday of Open Source software, a renaissance if you will. I received tons of feedback and lots of downloads. It was a wonderful experience, both inside and outside the office. However, I was repeatedly asked by the community to provide a license other than GPL (e.g. BSD, Apache, etc…).

If you know anything about application frameworks, you know that users of such software need to link against it. At the time, most software was proprietary. This created a problem. Developers wanted to use txObject ATK, but they could not, because the GPL requires software built on it to fall under its “constraints”. Still, I became a follower, believer, disciple or whatever you want to call it, of the Free Software Foundation (FSF) movement and decided to stick by my original licensing decision.

Then came my next project idea: the http://www.myoodb.org application framework. I decided to continue showing my loyalty to the GPL faith. Well, sort of. To show conviction, the code would be licensed under GPL; however, to encourage a larger adoption of my work, I went with a dual license. Lesser GPL (LGPL) would govern the linkage. Now in my mind, this made a lot of sense. I was keeping the faith and allowing for broad adoption of my software. Yet, the community was still at best concerned and at worst just confused about this licensing scheme. And again, I felt like I did not get it right. But an interesting thing happened on the way to the theater…

A company (which shall remain nameless) bought my database technology. The agreement precludes me from working on projects like MyOODB, but since the software is licensed under GPL, its legacy continues to this day. In summary, sticking with GPL didn’t remove “value” but actually preserved it. But my journey does not end here.

With my next project, which is a Cloud Application Framework, the GPL way of thinking no longer holds water. The GPL’s obligations are triggered by distribution, and software running in the cloud is never actually delivered to the consumer, so nothing is redistributed. So, in my journey the next question is: Will StratusDB (i.e. Cloud Data Fabric) be GPL or GNU Affero General Public License (AGPL)?

Stay tuned...