Tag Archives: Database

Large data volumes and the Force.com Data Architecture

A few weeks back I shared a salesforce.com whitepaper that gives the high-level details of the architecture underlying the Force.com platform. For those looking to learn more about the database underpinning the platform, there is even more information in another whitepaper titled “Best Practices for Deployments with Large Data Volumes” [PDF, 629KB]. As someone who does a fair amount of writing, I can confidently say this is one of the best salesforce.com whitepapers I’ve ever read. The content is great, but the format is what makes it.

The Virtual Database Structure

The primary purpose of the document is to give architects options and approaches for dealing with orgs that have, or will have – you guessed it – large amounts of data. As a byproduct, though, it ends up revealing information about the virtual database shared by all tenants. Salesforce.com has literally built a database within a database (databasception?!) with its own optimising features such as divisions, virtual indexes and skinny tables!

In short, if you ever have to deal with large amounts of data on the Force.com platform, or if you’re just the inquisitive type, I bet you’ll learn at least one new thing from this great whitepaper!

Monolithic Databases are dead

The NoSQL revolution is ongoing.

It is now commonplace to say that the hegemony of the monolithic relational database system is ending.

ORM, Object-Relational Mapping, was the last attempt to perpetuate that pattern born in the 70’s. ORM gives the software developer the sweet feeling of interacting directly with objects. In the Ruby on Rails world, Active Record was a wonderful implementation, imposing its conventional relational approach on all web developers. In the “Java world”, Hibernate still has a massive user base, but there is plenty of controversy regarding its performance and its ability to handle some complex situations.

Long story short, relational databases persist data. That is true of all storage systems, but relational databases also structure the way we represent and organize our data model. They oblige us to think in terms of principles such as cardinality, entity relationships, foreign keys and all the typical normal-form concepts of the Entity Relationship Model. Furthermore, it is well known that performance can be an issue when mixing ORM and relational databases. You want to make a query with 15 joins to display a deeply structured data set with a few conditions and a limit? You should rethink it.

Since the emergence of NoSQL 3 years ago as a new paradigm, people have started to rethink the said relational database hegemony. Keep in mind that NoSQL stands for “Not Only SQL” and not “NO SQL”. In day-to-day business deliveries, we observe more and more people introducing key-value stores such as Redis or Memcached into their architecture. Some go further and replace the usual Oracle, SQL Server, MySQL or PostgreSQL with a document-oriented database like MongoDB or CouchDB.
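To make the contrast between the relational and document models concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions: the author/post schema, the sample data and the function names are hypothetical, SQLite stands in for a relational database, and a plain dict of JSON strings stands in for a document store such as MongoDB or CouchDB.

```python
import json
import sqlite3

# Relational model: two normalized tables linked by a foreign key.
# (Hypothetical schema, in-memory SQLite used as a stand-in RDBMS.)
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE author (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE post (
        id INTEGER PRIMARY KEY,
        author_id INTEGER NOT NULL REFERENCES author(id),
        title TEXT NOT NULL
    );
    INSERT INTO author (id, name) VALUES (1, 'Ada');
    INSERT INTO post (id, author_id, title) VALUES (10, 1, 'Hello');
    INSERT INTO post (id, author_id, title) VALUES (11, 1, 'World');
""")


def titles_relational(author_name):
    """Rebuild an author's post titles with a join across both tables."""
    rows = conn.execute(
        "SELECT post.title FROM post "
        "JOIN author ON author.id = post.author_id "
        "WHERE author.name = ? ORDER BY post.id",
        (author_name,),
    ).fetchall()
    return [title for (title,) in rows]


# Document model: the same data kept as one nested document per author.
# A dict of JSON strings is an assumption standing in for a document store.
documents = {
    "author:1": json.dumps(
        {"name": "Ada", "posts": [{"title": "Hello"}, {"title": "World"}]}
    )
}


def titles_document(key):
    """Read the nested document back in one lookup, with no join."""
    doc = json.loads(documents[key])
    return [post["title"] for post in doc["posts"]]
```

The relational version needs one join per level of nesting, which is where a deeply structured data set starts to hurt. The document version reads the whole structure in a single lookup, at the cost of duplicating data if posts are shared between documents.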

With Cassandra and the trend toward Hadoop and Big Data, the range of database choices grew to include column-oriented databases for massive data manipulation. That phenomenon will not end anytime soon; new approaches keep emerging. We have, for example, the on-demand social database Salesforce Database.com and the trendy graph-oriented Neo4j.

With all those possibilities on offer, each with its own advantages, we should re-evaluate how we decide to persist data. Each database solves a specific problem. No one of them is better than the others across the board, but an intelligent mix of them could offer a good trade-off. That is the idea behind the polyglot persistence principle.

Consequently, a problem we will often face in the near future is: how do we integrate polyglot databases, ensure communication between them, and consolidate access to them? ORM has blurred the boundaries that previously existed between the business logic layer and the persistence layer, and between coding languages and query languages. What will replace ORM when we have to use 3 or 4 protocols to request and store data?
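One possible answer, offered only as a sketch and not as an established pattern from any of the products named here, is a small facade that hides each store behind a single interface. Everything in the example is hypothetical: the `UserRepository` class, its method names and the `users` table are made up for illustration, in-memory SQLite plays the relational database, and a plain dict plays a key-value store such as Redis.

```python
import sqlite3


class UserRepository:
    """Hypothetical facade: callers never see which store holds what."""

    def __init__(self):
        # Durable, structured data goes to a relational store
        # (in-memory SQLite is a stand-in for illustration).
        self._sql = sqlite3.connect(":memory:")
        self._sql.execute(
            "CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT NOT NULL)"
        )
        # Short-lived session data goes to a key-value store
        # (a dict is an assumption standing in for Redis or Memcached).
        self._sessions = {}

    def add_user(self, user_id, email):
        # The connection context manager commits on success.
        with self._sql:
            self._sql.execute(
                "INSERT INTO users (id, email) VALUES (?, ?)", (user_id, email)
            )

    def email_of(self, user_id):
        row = self._sql.execute(
            "SELECT email FROM users WHERE id = ?", (user_id,)
        ).fetchone()
        return row[0] if row else None

    def start_session(self, token, user_id):
        self._sessions[token] = user_id

    def email_for_session(self, token):
        # Crosses both stores: key-value lookup first, then the SQL query.
        user_id = self._sessions.get(token)
        return None if user_id is None else self.email_of(user_id)
```

A caller would write `repo = UserRepository()`, then `repo.add_user(1, "a@example.com")` and `repo.start_session("t", 1)`, and `repo.email_for_session("t")` resolves the email without the caller knowing two protocols were involved. The trade-off is that consistency between the stores becomes the facade's responsibility.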

To extend this introduction to polyglot persistence and get a quick overview of the various data models I have mentioned, I recommend reading the summarized comparison by Kristof Kovacs. If you want to go deeper, buy the new book “Seven Databases in Seven Weeks” by E. Redmond and J. R. Wilson (still in beta).
