<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom"><title>Bharadwaj's Blog</title><link href="https://bharath12345.github.io/" rel="alternate"/><link href="https://bharath12345.github.io/feeds/all.atom.xml" rel="self"/><id>https://bharath12345.github.io/</id><updated>2020-01-01T00:00:00-05:00</updated><entry><title>Application Developer View: PostgreSQL vs. MySQL</title><link href="https://bharath12345.github.io/posts/application-developer-view-postgresql-vs-mysql/" rel="alternate"/><published>2020-01-01T00:00:00-05:00</published><updated>2020-01-01T00:00:00-05:00</updated><author><name>Bharadwaj</name></author><id>tag:bharath12345.github.io,2020-01-01:/posts/application-developer-view-postgresql-vs-mysql/</id><summary type="html">&lt;p&gt;I reluctantly started to write this post some 6 months ago. As an application developer my knowledge of the internals of DBMS design was (and still is) very limited. It is one thing to work with a DBMS at &lt;em&gt;development&lt;/em&gt; and quite another to keep it running as part of …&lt;/p&gt;</summary><content type="html">&lt;p&gt;I reluctantly started to write this post some 6 months ago. As an application developer my knowledge of the internals of DBMS design was (and still is) very limited. It is one thing to work with a DBMS at &lt;em&gt;development&lt;/em&gt; and quite another to keep it running as part of &lt;em&gt;IT Operations&lt;/em&gt;. My motivation here is to share a few specific ideas with fellow application developers. The attempt is to make a value judgement of the two systems from a development standpoint and steer clear of a value judgement in the &lt;em&gt;deployed&lt;/em&gt; scenario. After all, DBMS systems are probably at the heart of more Apps vs. Ops debates than anything else. &lt;/p&gt;
&lt;div class="toc"&gt;&lt;span class="toctitle"&gt;Table of Contents&lt;/span&gt;&lt;ul&gt;
&lt;li&gt;&lt;a href="#the-null-problem"&gt;The 'Null' Problem&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#object-relational-database-system"&gt;Object Relational Database System!&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#choice-of-data-types-and-storage"&gt;Choice Of Data Types and Storage&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#performance"&gt;Performance&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#philosophical-difference-that-influences-technology"&gt;Philosophical difference that influences technology&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#epilogue"&gt;Epilogue&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#quoting-from-the-references"&gt;Quoting from the references&lt;/a&gt;&lt;ul&gt;
&lt;li&gt;&lt;a href="#there-is-no-reason-at-all-to-use-mysql-mariadb-mysql-founder-michael-widenius"&gt;There is no reason at all to use MySQL: MariaDB, MySQL founder Michael Widenius&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
&lt;p&gt;Now, to say it simply (at the cost of barbs from some of my good friends who I know to be excellent operations engineers for MySQL) - &lt;strong&gt;PostgreSQL leads MySQL. And by some distance.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Apart from reading about the internals and playing with both systems I felt a need to speak to whomever I could in the developer community and ask for the reasons behind the choice of DBMS in their projects. In the last 6 months I could speak to just about eight such people in different projects. Almost all were from medium to small companies doing web applications (but some of these projects were themselves quite large). After speaking to these people there is one thing that I cannot but share - the answer to &lt;em&gt;Why MySQL&lt;/em&gt; from all who had chosen it was - &lt;em&gt;"Unfortunately, MySQL had already been chosen by the time I got involved"&lt;/em&gt;. Of the eight, six had been running projects for 2-3 years, of which three had chosen MySQL. The rest had all opted for PostgreSQL.&lt;/p&gt;
&lt;p&gt;When I told a colleague about writing this article he smiled and asked a polite, &lt;em&gt;Why?&lt;/em&gt; After all, the web is filled with such articles, mostly written by expert database admins. There are fewer articles from the &lt;em&gt;application programmer&lt;/em&gt; point-of-view. I can think of two reasons why there are not many programmers dissecting this -&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;em&gt;Developers&lt;/em&gt; find it difficult to talk on this topic, on which the &lt;em&gt;Operations&lt;/em&gt; folk have strong opinions. In many projects of the DevOps kind the decision to pick the database is the prerogative of the &lt;em&gt;Operations&lt;/em&gt; folk rather than the &lt;em&gt;Developer&lt;/em&gt; folk&lt;/li&gt;
&lt;li&gt;From a developer perspective, the PostgreSQL vs. MySQL debate is a non-starter. PostgreSQL wins. And wins quite early (you will know the &lt;em&gt;why&lt;/em&gt; by the end of this post)  &lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;But before delving deeper into the comparison it's good to set the application context -&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Enterprise Applications. By this, I mean the application has more moving parts than a typical web-stack. The number of tables could stretch into hundreds. Data is collected from myriad sources in real-time&lt;/li&gt;
&lt;li&gt;Read-write ratio varies vastly across tables. Database needs to support 90% (and upwards) read-only tables and also tables with much higher write than read, say 60% (and upwards)&lt;/li&gt;
&lt;li&gt;Many thousand transactions per second&lt;/li&gt;
&lt;li&gt;Hundreds of stored procedures&lt;/li&gt;
&lt;li&gt;Automating migrations, upgrades and sharding&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Given that the topic is vast and both systems are widely used, it's probably a good idea to start by pointing to some of the good references for comparison from the wild web -&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;a href="http://www.wikivs.com/wiki/MySQL_vs_PostgreSQL"&gt;MySQL vs. PostgreSQL&lt;/a&gt; - recent and continuously updated. Readers would do well to read the articles in the links section (on last read, I did not find a single article talking glowingly about MySQL in comparison to PostgreSQL) &lt;/li&gt;
&lt;li&gt;A couple of very good articles comparing these two by Robert Haas (the next two links)&lt;/li&gt;
&lt;li&gt;&lt;a href="http://rhaas.blogspot.in/2010/11/MySQL-vs-postgresql-part-1-table.html"&gt;Table Organization&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="http://rhaas.blogspot.in/2011/02/mysql-vs-postgresql-part-2-vacuum-vs.html"&gt;Vacuum vs. Purge&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="http://www.databasejournal.com/features/postgresql/article.php/3288951/PostgreSQL-vs-MySQL-Which-is-better.htm"&gt;PostgreSQL vs MySQL: Which is better?&lt;/a&gt; - This article is 10 years old. Still a good read&lt;/li&gt;
&lt;li&gt;&lt;a href="http://sql-info.de/MySQL/gotchas.html"&gt;MySQL Gotchas&lt;/a&gt; and &lt;a href="http://sql-info.de/postgresql/postgres-gotchas.html"&gt;PostgreSQL Gotchas&lt;/a&gt;. Just stare at the size of these two lists for some time even if you don't read them. They tell a story&lt;/li&gt;
&lt;li&gt;&lt;a href="http://wiki.postgresql.org/wiki/Why_PostgreSQL_Instead_of_MySQL:_Comparing_Reliability_and_Speed_in_2007"&gt;Comparing Reliability and Speed&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="http://www.slideshare.net/techdude/postgres-vs-MySQL-presentation"&gt;A Comparison of Enterprise Suitability - PostgreSQL is Suited Better&lt;/a&gt; - though MyISAM focused, this comparison is with enterprise products in purview and is 5 years old (2008). Since then, the gap between PostgreSQL and MySQL has only widened in favour of PostgreSQL despite InnoDB&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;I plan and hope not to repeat anything that is already said in these articles. And agreeing with the many writers of these articles, I don't see any point in doing performance benchmark comparisons between these two database systems. But I do want to point the interested readers to the &lt;em&gt;&lt;a href="http://www.muktware.com/2013/05/there-is-no-reason-at-all-to-use-MySQL-mariadb-MySQL-founder-michael-widenius/4298"&gt;political&lt;/a&gt;&lt;/em&gt; aspects in this comparison (I have quoted from this interview at the end of this article). MySQL has been acquired by Oracle. It's only natural to have concerns about the future roadmap of MySQL, since questions of ownership like these affect technology deeply...&lt;/p&gt;
&lt;p&gt;Moving on to the specifics...&lt;/p&gt;
&lt;h3 id="the-null-problem"&gt;The 'Null' Problem&lt;/h3&gt;
&lt;p&gt;The biggest accusation one can make against any RDBMS is that it is not careful with data integrity. MySQL is notorious for its inability to handle Null with many data types. Its effort to accommodate query mistakes is what ruins MySQL. For example - &lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;MySQL will insert empty strings for text fields that have a not-null constraint. This happens if you forget to mention a field during the insert or if you somehow end up inserting a blank value ('') for a field. It goes ahead with the insert in both these cases. Irrespective of whether we use an ORM, direct JDBC or some other kind of wrapper, there simply is no way to gracefully handle this problem. PostgreSQL won't do such a thing&lt;/li&gt;
&lt;li&gt;Non-null timestamps end up getting all zero value dates. If you push a NULL as date, it defaults to current time!&lt;/li&gt;
&lt;li&gt;With decimal numbers, if you are not careful with precision and scale, then, on an insert MySQL will &lt;em&gt;change the data&lt;/em&gt; to fit the column constraints. Of course it's necessary to be careful when playing with data, but the problem here is that a change in precision (a column constraint) should in no way change the data as MySQL does. This kind of problem is just plain horror. Just refer to the MySQL gotchas site to get a clear understanding of this problem. Postgres raises an error instead of silently storing an out-of-range value&lt;/li&gt;
&lt;li&gt;While writing functions, MySQL does not throw graceful exceptions for divide by zero. It just returns a plain NULL all the time!&lt;/li&gt;
&lt;li&gt;In MySQL set a text field length to &lt;em&gt;X&lt;/em&gt; and insert a string which is &lt;em&gt;2X&lt;/em&gt; in length... MySQL will just promptly truncate the extra &lt;em&gt;X&lt;/em&gt;. Now, for god's sake - the length X was a &lt;em&gt;constraint&lt;/em&gt;. On trying to insert longer strings, we expect MySQL to throw errors... not play with our data...&lt;/li&gt;
&lt;li&gt;MySQL has no idea about dates. Try inserting 31st Feb and it will promptly comply inserting crap&lt;/li&gt;
&lt;li&gt;MySQL will allow inserting of strings to decimal columns, sometimes storing it as 0 and sometimes as NULL&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;These problems are by no means all that there is to be said about MySQL's SQL compliance. MySQL takes liberties with user supplied constraints in many more situations. And this aspect creates massive problems for developers on both the &lt;em&gt;correctness&lt;/em&gt; and &lt;em&gt;performance&lt;/em&gt; fronts.&lt;/p&gt;
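&lt;p&gt;To make the contrast concrete, here is a minimal sketch of how PostgreSQL reacts to the same kinds of input. The &lt;em&gt;customer&lt;/em&gt; table and its columns are hypothetical, made up only for this illustration. Every statement after the CREATE TABLE is rejected with an error and nothing is stored. How MySQL reacts to the equivalent statements depends on the SQL mode the server runs with -&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;-- Hypothetical table, used only for illustration
CREATE TABLE customer (
  name    varchar(5)    NOT NULL,
  joined  date          NOT NULL,
  balance numeric(6, 2) NOT NULL
);

-- PostgreSQL rejects each of these statements with an error
INSERT INTO customer (joined, balance) VALUES (&amp;#39;2013-01-31&amp;#39;, 10.00);  -- name left out: not-null violation
INSERT INTO customer VALUES (&amp;#39;much too long&amp;#39;, &amp;#39;2013-01-31&amp;#39;, 10.00); -- longer than varchar(5)
INSERT INTO customer VALUES (&amp;#39;asha&amp;#39;, &amp;#39;2013-02-31&amp;#39;, 10.00);          -- 31st Feb is not a date
INSERT INTO customer VALUES (&amp;#39;asha&amp;#39;, &amp;#39;2013-01-31&amp;#39;, &amp;#39;abc&amp;#39;);          -- a string is not a number
INSERT INTO customer VALUES (&amp;#39;asha&amp;#39;, &amp;#39;2013-01-31&amp;#39;, 123456.789);     -- does not fit numeric(6, 2)
SELECT 1 / 0;                                                        -- division by zero
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;The application gets an exception it can handle at the point of the mistake, rather than discovering altered data much later.&lt;/p&gt;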
&lt;h3 id="object-relational-database-system"&gt;Object Relational Database System!&lt;/h3&gt;
&lt;p&gt;PostgreSQL calls itself an &lt;em&gt;Object Relational Database System&lt;/em&gt;. This is so because it brings with it many ideas that lend themselves very well to the OOPS modelled world (that developers are so used to). And this paradigm fits enterprise data models and requirements quite well. Let me state three specific features - &lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Logical Partitioning&lt;/li&gt;
&lt;li&gt;Windowing Functions&lt;/li&gt;
&lt;li&gt;Table Inheritance&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Each of these features can be quite critical with the ever increasing data that needs to be handled in today's world. It takes some reading to understand each one but it is well worth the effort. On the other side I fail to find any feature that MySQL brings that may be absent from PostgreSQL (think about it - that's a very big assertion I make!). &lt;/p&gt;
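&lt;p&gt;As a small taste of the second item, here is a sketch of a windowing function in PostgreSQL. The &lt;em&gt;employee&lt;/em&gt; table and its rows are hypothetical, invented only for this example. Unlike GROUP BY, the window keeps every row and adds the per-department rank and average alongside it -&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;-- Hypothetical table, used only for illustration
CREATE TABLE employee (name varchar(50), dept varchar(50), salary int);

INSERT INTO employee (name, dept, salary) VALUES
  (&amp;#39;asha&amp;#39;,   &amp;#39;apps&amp;#39;, 120),
  (&amp;#39;binu&amp;#39;,   &amp;#39;apps&amp;#39;, 100),
  (&amp;#39;chitra&amp;#39;, &amp;#39;ops&amp;#39;,  110);

-- rank() restarts within every department; no rows are collapsed
SELECT name, dept, salary,
       rank()      OVER (PARTITION BY dept ORDER BY salary DESC) AS rank_in_dept,
       avg(salary) OVER (PARTITION BY dept)                      AS dept_avg
FROM employee;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Without window functions the same report typically needs a self-join or a correlated sub-query per derived column.&lt;/p&gt;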
&lt;p&gt;To illustrate the point further let me describe one of my favourite features - &lt;em&gt;table inheritance&lt;/em&gt; with an example. The statements below create tables where the column &lt;em&gt;name&lt;/em&gt; belongs to the &lt;em&gt;base&lt;/em&gt; table (shape) and columns like edge, radius belong to the &lt;em&gt;derived&lt;/em&gt; tables. This model closely resembles how data is modelled in OOPS. After running these SQL statements, querying each table returns the following number of records -&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;shape - 4 records&lt;/li&gt;
&lt;li&gt;square, circle, rectangle tables - 1 record each!  &lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;SQL Statements -&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="n"&gt;CREATE&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;TABLE&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;shape&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;varchar&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;50&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="n"&gt;CREATE&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;TABLE&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;square&lt;/span&gt;&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;edge&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nb nb-Type"&gt;int&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt;     &lt;/span&gt;&lt;span class="n"&gt;INHERITS&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;shape&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="n"&gt;CREATE&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;TABLE&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;circle&lt;/span&gt;&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;radius&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nb nb-Type"&gt;int&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt;   &lt;/span&gt;&lt;span class="n"&gt;INHERITS&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;shape&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="n"&gt;CREATE&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;TABLE&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;rectangle&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;w&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nb nb-Type"&gt;int&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;h&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nb nb-Type"&gt;int&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;INHERITS&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;shape&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="n"&gt;INSERT&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;into&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;shape&lt;/span&gt;&lt;span class="w"&gt;     &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt;         &lt;/span&gt;&lt;span class="n"&gt;VALUES&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;random&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="n"&gt;INSERT&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;into&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;square&lt;/span&gt;&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;edge&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt;   &lt;/span&gt;&lt;span class="n"&gt;VALUES&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;square&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="n"&gt;INSERT&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;into&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;circle&lt;/span&gt;&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;radius&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;VALUES&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;circle&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="n"&gt;INSERT&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;into&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;rectangle&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;w&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;h&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt;   &lt;/span&gt;&lt;span class="n"&gt;VALUES&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;rectangle&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Like 'INHERITS' there is also a 'NO INHERIT' (used with ALTER TABLE) to detach a table from its parent, so tables can be mixed in with precision. And more importantly, table partitioning in Postgres has traditionally been built on top of inheritance. So, not only does inheritance give the programmer flexibility in data modelling, it also leads to less duplication, and thus can help improve performance! Without inheritance, engineers are forced to do multiple table joins and filters (many times going as far as boolean &lt;em&gt;marker&lt;/em&gt; columns) - which sounds like over-engineering from an OOP developer's standpoint. Thinking about it, the non-object oriented SQL design adds overhead for the SQL optimiser, makes indexing overhead higher and leads to many more such misses.&lt;/p&gt;
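&lt;p&gt;Continuing with the shape tables created above, the following sketch shows how the record counts come about and what detaching a child table does. It assumes the four INSERT statements above have each been run exactly once -&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;-- Reading from the base table also returns rows stored in the derived tables
SELECT count(*) FROM shape;        -- 4

-- ONLY restricts the query to rows stored in shape itself
SELECT count(*) FROM ONLY shape;   -- 1

-- tableoid shows which table each row physically lives in
SELECT tableoid::regclass AS stored_in, name FROM shape;

-- Detach circle from its parent; its rows no longer show up in queries on shape
ALTER TABLE circle NO INHERIT shape;
SELECT count(*) FROM shape;        -- 3
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;The derived tables keep their own storage throughout; the base table only offers a combined view over them.&lt;/p&gt;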
&lt;h3 id="choice-of-data-types-and-storage"&gt;Choice Of Data Types and Storage&lt;/h3&gt;
&lt;p&gt;MySQL has far fewer data types than PostgreSQL. Adding new data types to MySQL is non-trivial, error-prone work even for experienced professionals. Compared to this, PostgreSQL offers a proverbial goldmine of data-types for designers to choose from. Here are some aspects of data-types that really make PostgreSQL stand out vis-a-vis MySQL -&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Data types for Dates - A massive choice to choose from for specific usecases&lt;/li&gt;
&lt;li&gt;Data types for IPv4, IPv6, MAC, Inet address&lt;/li&gt;
&lt;li&gt;Data types for Arrays, JSON, UUID, XML with features like search within Arrays using indexes and where clauses&lt;/li&gt;
&lt;li&gt;Data types for floating point numerics - rounding errors can be eliminated to a much larger extent with the massive choice available in this area &lt;/li&gt;
&lt;li&gt;Infinity, -Infinity, NaN as values for numeric data types - in MySQL one has no way of modelling these. Modelling these as nulls often leads to programming complexity and errors&lt;/li&gt;
&lt;li&gt;ORM tools often convert the 'String' datatype to nvarchar(max) which kills performance on MySQL. Inserting multibyte characters (say Japanese) into varchar fields completely corrupts data (no database exception thrown!). Sometimes it is not sufficient to just change the column type to nvarchar when trying to store multibyte characters. Even the insert statements need a prefix (an application level code change if you are using JDBC). PostgreSQL uses UTF8 encoding by default. There are no varchar/nvarchar problems. Everything simply works!&lt;/li&gt;
&lt;li&gt;Adding constraints to complex types like dates is made extremely simple with embedded functions. No such thing is possible in MySQL. Special keywords like 'today', 'tomorrow', 'yesterday', 'allballs' etc lend readability to the code&lt;/li&gt;
&lt;li&gt;All strings are default UTF-8 encoded&lt;/li&gt;
&lt;li&gt;Serial and other sequences - leads to very fast ID key finding and incrementing&lt;/li&gt;
&lt;li&gt;Data type for Money!&lt;/li&gt;
&lt;li&gt;Index even functions (expression indexes)&lt;/li&gt;
&lt;li&gt;Automatic Data Compress by Default&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Why are data-types important? Modelling precisely leads to less data stored. When performance becomes important, squeezing out the maximum requires optimised storage... because finally, things in the DB schema are going to end up in RAM caches, and larger datatypes mean more space taken up in RAM. A RAM cache that is used less conservatively will bring down the performance of the application more than anything else. &lt;/p&gt;
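&lt;p&gt;A few of the data types and features from the list above can be put together in one short sketch. The &lt;em&gt;device&lt;/em&gt; table, its columns and the index names are hypothetical and exist only for this illustration -&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;-- Hypothetical table, used only for illustration
CREATE TABLE device (
  id      serial PRIMARY KEY,            -- sequence-backed key
  ext_id  uuid,
  name    text,
  addr    inet,                          -- holds IPv4 or IPv6 addresses
  mac     macaddr,
  tags    text[],                        -- array column
  reading double precision,              -- accepts NaN and +/- Infinity
  seen_on date CHECK (extract(year FROM seen_on) &amp;gt;= 2000)
);

INSERT INTO device (ext_id, name, addr, mac, tags, reading, seen_on) VALUES
  (&amp;#39;a0eebc99-9c0b-4ef8-bb6d-6bb9bd380a11&amp;#39;, &amp;#39;Sensor-1&amp;#39;, &amp;#39;192.168.1.10&amp;#39;,
   &amp;#39;08:00:2b:01:02:03&amp;#39;, ARRAY[&amp;#39;edge&amp;#39;, &amp;#39;lab&amp;#39;], &amp;#39;NaN&amp;#39;, &amp;#39;2013-06-01&amp;#39;);

-- Index on the result of a function, and a GIN index for searching inside the array
CREATE INDEX device_name_lower_idx ON device (lower(name));
CREATE INDEX device_tags_idx       ON device USING gin (tags);

SELECT name FROM device WHERE lower(name) = &amp;#39;sensor-1&amp;#39;;               -- can use the expression index
SELECT name FROM device WHERE tags @&amp;gt; ARRAY[&amp;#39;lab&amp;#39;];                   -- array contains &amp;#39;lab&amp;#39;
SELECT name FROM device WHERE addr &amp;lt;&amp;lt; inet &amp;#39;192.168.1.0/24&amp;#39;;          -- address inside the subnet
SELECT &amp;#39;today&amp;#39;::date, &amp;#39;tomorrow&amp;#39;::date, &amp;#39;allballs&amp;#39;::time;   -- special date/time keywords
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Each column here says what it holds, so the database validates addresses, UUIDs and dates on the way in instead of the application doing so with string checks.&lt;/p&gt;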
&lt;h3 id="performance"&gt;Performance&lt;/h3&gt;
&lt;p&gt;Comparing performance of PostgreSQL and MySQL (InnoDB) is a loaded question. The references I have spelt out earlier have links to many scholarly articles that articulate the subtle differences in the MVCC implementation of both. Both provide row locking, page locking, along with read/write lock separation. After digging into the details picking one of these two on the basis of &lt;em&gt;performance&lt;/em&gt; comes back to the nature of the application that is being built. Designers should pay attention to three critical questions and answer them sufficiently before making a choice -&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Read/Write characteristics of the application&lt;/li&gt;
&lt;li&gt;Concurrent access characteristics of various tables&lt;/li&gt;
&lt;li&gt;Cost of dirty reads, non-repeatable reads, phantom reads etc&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;These are not easy questions to answer. The performance area is complex enough, and if the concurrent write requirements of an application are extreme then moving away totally from SQL to NoSQL is a better option than trying to split hairs over RDBMS engines. A move to NoSQL brings massive freedom to design around write and concurrent access problems (along with massive responsibility to handle things correctly!). So, choosing MySQL over PostgreSQL due to some notion of higher performance, without concrete answers to the above posers, will in all probability lead to a disaster-in-waiting.&lt;/p&gt;
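&lt;p&gt;The third question above, the cost of the different read anomalies, maps directly onto the transaction isolation level an application asks for. Below is a minimal sketch in PostgreSQL, using a hypothetical &lt;em&gt;account&lt;/em&gt; table. Note that PostgreSQL never allows dirty reads: asking for READ UNCOMMITTED gives the same behaviour as READ COMMITTED -&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;-- Hypothetical table, used only for illustration
CREATE TABLE account (id int PRIMARY KEY, balance numeric(12, 2) NOT NULL);
INSERT INTO account VALUES (1, 100.00), (2, 50.00);

-- REPEATABLE READ: both reads see the same snapshot,
-- even if another session commits a change in between
BEGIN;
SET TRANSACTION ISOLATION LEVEL REPEATABLE READ;
SELECT balance FROM account WHERE id = 1;
SELECT sum(balance) FROM account;
COMMIT;

-- SERIALIZABLE: the strictest level; the application must be ready to
-- retry a transaction that is aborted with a serialization failure
BEGIN ISOLATION LEVEL SERIALIZABLE;
UPDATE account SET balance = balance - 10 WHERE id = 1;
UPDATE account SET balance = balance + 10 WHERE id = 2;
COMMIT;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Stricter levels trade some concurrency for fewer anomalies, which is why the cost of each anomaly needs an answer per table and per use case before any engine is picked on &lt;em&gt;performance&lt;/em&gt; grounds.&lt;/p&gt;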
&lt;h3 id="philosophical-difference-that-influences-technology"&gt;Philosophical difference that influences technology&lt;/h3&gt;
&lt;p&gt;Some experts have pointed out a subtle but important philosophical difference between MySQL and PostgreSQL that impacts their core technological offering. MySQL is a &lt;em&gt;product&lt;/em&gt; while PostgreSQL is a &lt;em&gt;project&lt;/em&gt;. MySQL has been a product since its inception and has been sold multiple times over by the different companies that have owned it. Due to the &lt;em&gt;product&lt;/em&gt; definition and ownership, large scale code corrections have been fewer with MySQL. This philosophical difference is what is behind the fact that MySQL is still at v5.x while PostgreSQL is at v9.x. This difference also leads to a design where MySQL separates the storage engine from SQL parsing (and many different storage engines can be chosen), while PostgreSQL integrates the whole stack top-to-bottom. The folks behind PostgreSQL are driven to bring the progress in database technology to the fingertips of developers and admins. That's why PostgreSQL has made larger course corrections in its evolution (leading to a bigger version number, v9).&lt;/p&gt;
&lt;h3 id="epilogue"&gt;Epilogue&lt;/h3&gt;
&lt;p&gt;I have a hypothesis. MySQL is more popular in applications developed using Ruby, PHP, Perl or Python. Just like Microsoft's SQL-Server is the default database if you are building a C# application. This is so because of the community and peer group effect. And also because there are many tools and much expertise within the ecosystem if you choose a popular stack. But the most popular language to develop &lt;em&gt;enterprise&lt;/em&gt; applications is Java. And I personally get more fond of Scala with every passing day. So the hypothesis is, for JVM developers the community/peer-group effect alone does not make a case for MySQL. So the choice needs to be based more on technological pros and cons.&lt;/p&gt;
&lt;h3 id="quoting-from-the-references"&gt;Quoting from the references&lt;/h3&gt;
&lt;h5 id="there-is-no-reason-at-all-to-use-mysql-mariadb-mysql-founder-michael-widenius"&gt;There is no reason at all to use MySQL: MariaDB, MySQL founder Michael Widenius&lt;/h5&gt;
&lt;p&gt;What Oracle is doing wrong (visit the &lt;a href="http://www.muktware.com/2013/05/there-is-no-reason-at-all-to-use-MySQL-mariadb-MySQL-founder-michael-widenius/4298"&gt;website&lt;/a&gt; to find the reference for each point)&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;New ‘enterprise’ extensions in MySQL are closed source&lt;/li&gt;
&lt;li&gt;The bugs database is not public anymore&lt;/li&gt;
&lt;li&gt;The MySQL public repositories are not anymore actively updated.&lt;/li&gt;
&lt;li&gt;Security problems are not communicated nor addressed quickly (This is making     Linux distributions very annoyed with Oracle)&lt;/li&gt;
&lt;li&gt;Instead of fixing bugs, Oracle is removing features:&lt;/li&gt;
&lt;li&gt;New code in MySQL 5.5 doesn’t have test cases anymore.&lt;/li&gt;
&lt;li&gt;Some of the new code is surprisingly good by Oracle, but unfortunately the quality varies and a notable part needs to be rewritten before we can include it in MariaDB&lt;/li&gt;
&lt;li&gt;And, probably worst of all, it’s impossible for the community to work with the MySQL developers at Oracle.&lt;/li&gt;
&lt;li&gt;Oracle doesn’t accept patches&lt;/li&gt;
&lt;li&gt;There is no public roadmap&lt;/li&gt;
&lt;li&gt;There is no way to discuss with MySQL developers how to implement things or how the current code works&lt;/li&gt;
&lt;/ul&gt;</content><category term="posts"/></entry><entry><title>Functional Conference: Random Notes...</title><link href="https://bharath12345.github.io/posts/functional-conference-random-notes/" rel="alternate"/><published>2014-10-17T00:00:00-04:00</published><updated>2014-10-17T00:00:00-04:00</updated><author><name>Bharadwaj</name></author><id>tag:bharath12345.github.io,2014-10-17:/posts/functional-conference-random-notes/</id><summary type="html">&lt;p&gt;The first '&lt;a href="http://functionalconf.com/"&gt;Functional Conference&lt;/a&gt;' happened in Bangalore between Oct 9-11. I had been keenly looking forward to it. This is a quick post on the sessions I attended and the conference itself. As the lineup of speakers and topics shaped up in the buildup to the conference on their website …&lt;/p&gt;</summary><content type="html">&lt;p&gt;The first '&lt;a href="http://functionalconf.com/"&gt;Functional Conference&lt;/a&gt;' happened in Bangalore between Oct 9-11. I had been keenly looking forward to it. This is a quick post on the sessions I attended and the conference itself. As the lineup of speakers and topics shaped up in the buildup to the conference on their website, it heightened my expectations. As a younger engineer I have gone through the cycle of expecting too much from conferences and thus not being able to learn sufficiently from that which was on offer. Time has had a mellowing effect... I find it much better to keep an open mind and try to absorb all that is on offer. And then, a little later, retain only that which is useful/pertinent. With that mindset and approach I found 'Functional Conference' a very fulfilling technology experience - plenty of technical richness to absorb and sufficient ideas to retain for long.&lt;/p&gt;
&lt;h4 id="day-1-session-1-the-keynote-by-venkat-subramaniam"&gt;Day 1, Session 1: The Keynote, by Venkat Subramaniam&lt;/h4&gt;
&lt;p&gt;Venkat is as fabulous a speaker/presenter as he is a writer/thinker. The theme of his keynote was an elaboration on the idea of &lt;strong&gt;mainstream&lt;/strong&gt;. Why did it take many centuries for heliocentricity to gain acceptance over the &lt;em&gt;mainstream&lt;/em&gt; idea of geocentricity? Why did it take many centuries for well meaning doctors to accept the existence of microbial &lt;em&gt;germs&lt;/em&gt; as the cause of diseases over other widely held &lt;em&gt;mainstream&lt;/em&gt; theories? &lt;strong&gt;Mainstream&lt;/strong&gt; in the world of programming is OOP in the style of Java and C++. They may not be false idols after all. However, it is sad indeed that the &lt;em&gt;non-mainstream&lt;/em&gt; is generally not even introduced in colleges, and that software engineers have gone through long careers without even a basic understanding of other programming approaches. Venkat drew the attention of the audience to the fact that things were nevertheless changing. Maybe it took a long incubation for the heliocentric idea to gain ground... but once the right ideas, even if &lt;em&gt;non-mainstream&lt;/em&gt;, gain a foothold, there is no turning back. Maybe functional programming has had an 80 year incubation! After all it took 22 years for even OOPS to become &lt;em&gt;mainstream&lt;/em&gt;. But things are changing (lambdas in Java!) and will never be the same again! &lt;/p&gt;
&lt;p&gt;Two answers by Venkat in the post-session Q&amp;amp;A struck a chord with me. &lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The first was a question on his favourite language... after all he had written books and applications in so many! Venkat responded by saying he treated languages as tools, say like vehicles. We sometimes use a car and sometimes a flight, don't we!? So no &lt;em&gt;favorites&lt;/em&gt;. &lt;/li&gt;
&lt;li&gt;The second was actually a counter-question by Venkat - Do languages shape new thought or new thoughts shape languages? There is enough material in terms of academic research to tell us that this stream runs both ways! &lt;/li&gt;
&lt;/ul&gt;
&lt;h4 id="day-1-session-2-haskell-for-everyday-programmers-by-venkat-subramaniam"&gt;Day 1, Session 2: Haskell for everyday programmers, by Venkat Subramaniam&lt;/h4&gt;
&lt;p&gt;I was split between going for the Haskell session or the parallel Elm session. Since my work has been more and more away from UI, I chose Haskell. However, later on, heard great feedback on the Elm session by other folks at the conf. Now waiting for the slides of that session to be up to check it out.&lt;/p&gt;
&lt;p&gt;The Haskell session was a runaway hit with Venkat giving a quick intro to the many aspects of the language using the ghci REPL. The key ideas learnt/relearnt were:  &lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Polymorphic Types&lt;/li&gt;
&lt;li&gt;Functional Purity (no Haskell speaker can ever skip this!): functions cannot have side effects. And purity always means thread-safety!&lt;/li&gt;
&lt;li&gt;Memoization: the massive performance gain that can come one's way thanks to functional purity&lt;/li&gt;
&lt;li&gt;Order of program evaluation: Normal vs. Applicative&lt;/li&gt;
&lt;li&gt;Expressions vs. Statements: statements promote mutability that one cannot escape. Expressions do the opposite&lt;/li&gt;
&lt;/ul&gt;
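&lt;p&gt;To make the purity/memoization link concrete, here is a minimal sketch in Scala (not Haskell, and not code from the session). The names &lt;em&gt;memoize&lt;/em&gt;, &lt;em&gt;slowSquare&lt;/em&gt; and &lt;em&gt;fastSquare&lt;/em&gt; are made up for illustration, and the call counter exists only to show the cache at work. Caching is safe here only because the wrapped function always returns the same result for the same argument; note that the mutable map used as the cache is itself not thread-safe.&lt;/p&gt;
&lt;pre&gt;
import scala.collection.mutable

object MemoSketch {
  // Hypothetical helper: wraps a pure function with a cache keyed by its argument.
  def memoize[A, B](f: A =&gt; B): A =&gt; B = {
    val cache = mutable.Map.empty[A, B]
    a =&gt; cache.getOrElseUpdate(a, f(a))
  }

  // Counts how often the underlying function really runs (demo only).
  var calls = 0

  val slowSquare: Int =&gt; Int = n =&gt; { calls += 1; n * n }
  val fastSquare: Int =&gt; Int = memoize(slowSquare)
}
&lt;/pre&gt;
&lt;p&gt;Calling MemoSketch.fastSquare twice with the same argument runs slowSquare only once; the second answer comes from the cache.&lt;/p&gt;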
&lt;h4 id="day-1-session-3-functional-programming-in-large-scale-data-processing-by-kishore-nallan"&gt;Day 1, Session 3: &lt;a href="https://speakerdeck.com/kishore/applying-functional-programming-principles-to-large-scale-data-processing"&gt;Functional programming in large scale data processing&lt;/a&gt;, by Kishore Nallan&lt;/h4&gt;
&lt;p&gt;My day job is programming in Scala to build a &lt;em&gt;large scale data processing&lt;/em&gt; platform. So choosing this session from a fellow traveler was natural. Kishore described the journey at Indix to build a web-scale product catalog by crawling and indexing the internet, and the story behind their adoption of the Lambda Architecture as propounded by Nathan Marz. Kishore spoke of the benefits of using a log-structured database as the first port of storage rather than a &lt;em&gt;continuously mutating&lt;/em&gt; RDBMS or column store. Indix is a big Hadoop shop with continuous jobs to persist data, aggregate it and run both routine and ad-hoc queries. It was a fascinating talk giving a peek into what must be a very exciting product to develop.&lt;/p&gt;
&lt;h4 id="day-1-session-4-compile-your-own-cloud-with-mirageos-by-thomas-gazagnaire"&gt;Day 1, Session 4: &lt;a href="http://decks.openmirage.org/functionalconf14#/"&gt;Compile your own cloud with MirageOS&lt;/a&gt;, by Thomas Gazagnaire&lt;/h4&gt;
&lt;p&gt;Unikernels are specialized OS kernels that are written in a high-level language and act as individual software components. A full application (or appliance) consists of a set of running unikernels working together as a distributed system. MirageOS is written in the OCaml (http://ocaml.org) language and emits unikernels that run on the Xen hypervisor. One may ask - what's the main advantage of unikernels? Unikernels win by allowing applications to access hardware resources directly without having to make repeated privilege transitions to move data between user space and kernel space. Unikernels are being attempted for more languages than just OCaml. There is HaLVM for Haskell, LING for Erlang, OSv for the JVM and maybe more. This introduction to unikernels and perspective on virtualization was superlative and I wish I could have absorbed more.&lt;/p&gt;
&lt;h4 id="day-1-session-5-property-based-testing-for-functional-domain-models-by-debasish-ghosh"&gt;Day 1, Session 5: &lt;a href="http://www.slideshare.net/debasishg/property-based-testing"&gt;Property based testing for functional domain models&lt;/a&gt;, by Debasish Ghosh&lt;/h4&gt;
&lt;p&gt;I have been an avid reader of Debasish Ghosh's blog and books. They are rich both in theoretical arguments and practical advice. I was thus looking forward to this session. Debasish introduced ScalaCheck/QuickCheck to the audience. Since I have used ScalaCheck before, the tool itself was not new. However, the theoretical underpinnings of property based testing were a big takeaway. To quote a few statements from the session that will stay with me - &lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;"To test any sufficiently complex domain model with xUnit based testing will mean some corner cases will be missed". And unit-testing, automated-testing is all about catching those corner cases&lt;/li&gt;
&lt;li&gt;"Paramatricity tests more conditions than unit test suites ever will - Edward Knett"&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The talk also included an intro to dependent types and parametric polymorphism. One key takeaway of attending conferences is coming to know of new books - and after this session "Theorems for free!" by Phil Wadler got added to my ToRead list.&lt;/p&gt;
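&lt;p&gt;For readers who have not seen property based testing, the idea can be approximated with nothing but the standard library. The sketch below is deliberately NOT the ScalaCheck API: &lt;em&gt;forAllLists&lt;/em&gt; is a hypothetical mini-runner that throws randomly generated lists at a property, and it has none of the shrinking or generator machinery a real tool provides.&lt;/p&gt;
&lt;pre&gt;
import scala.util.Random

object PropertySketch {
  // Hypothetical mini-runner: checks `prop` against `runs` random lists of integers.
  def forAllLists(runs: Int, seed: Long)(prop: List[Int] =&gt; Boolean): Boolean = {
    val rnd = new Random(seed)
    (1 to runs).forall { _ =&gt;
      // List length in 0..19, elements in -500..499
      val xs = List.fill(rnd.nextInt(20))(rnd.nextInt(1000) - 500)
      prop(xs)
    }
  }

  // Property: reversing a list twice gives back the original list.
  val reverseTwice: Boolean = forAllLists(100, 42L)(xs =&gt; xs.reverse.reverse == xs)

  // Property: sorting is idempotent.
  val sortIdempotent: Boolean = forAllLists(100, 42L)(xs =&gt; xs.sorted.sorted == xs.sorted)
}
&lt;/pre&gt;
&lt;p&gt;The point is that one states a law that must hold for &lt;em&gt;every&lt;/em&gt; input and lets the machine hunt for the corner case, instead of hand-picking a few examples.&lt;/p&gt;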
&lt;h4 id="straddling-sessions-session-6-clojurescript-om-and-code-jugalbandi-session-7-functional-groovy-and-learning-from-haskell"&gt;Straddling sessions - Session 6: ClojureScript &amp;amp; Om, and Code Jugalbandi. Session 7: Functional Groovy, and Learning from Haskell&lt;/h4&gt;
&lt;p&gt;Earlier in the day, Naresh Jain, the chief organiser of the conference, had advised the attendees to use what he called "the law of two feet" - the law asks attendees to get the most out of the conference by walking over to another session, even midway, if required. Unable to decide which session to stay in for the last two slots, I decided to use this law!&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Om is a library for ClojureScript programmers. Vagmi provided a &lt;a href="http://www.slideshare.net/vagmi/pragmatic-functional-programming-in-the-js-land-with-clojurescript-and-om"&gt;breezy intro&lt;/a&gt; to why ClojureScript makes React.js faster. And finally, why/how Om makes ClojureScript faster, by giving an example of DOM diffing and the shouldComponentUpdate() call&lt;/li&gt;
&lt;li&gt;&lt;a href="http://www.codejugalbandi.org"&gt;Code Jugalbandi&lt;/a&gt; was a very interesting act between two programmers (playing the roles of &lt;em&gt;Brahma&lt;/em&gt; and &lt;em&gt;Krishna&lt;/em&gt;) to showcase interesting features like currying and pattern matching across languages. It was like a breath of fresh air to the otherwise usual way of sessions at a conference &lt;/li&gt;
&lt;li&gt;Groovy is a dynamic language on the JVM. Since I have never programmed in Groovy I was pleasantly surprised with its many capabilities in functional programming showcased by Naresha&lt;/li&gt;
&lt;li&gt;The Haskell experience session was filled with anecdotes that the speaker, Aditya Godbole, recounted from his workplace (and elsewhere) in trying to bring in healthy-code practice&lt;/li&gt;
&lt;/ul&gt;
&lt;h4 id="day-2-session-1-the-role-of-fear-in-language-adoption-by-bruce-tate"&gt;Day 2, Session 1: The role of Fear in language adoption, by Bruce Tate&lt;/h4&gt;
&lt;p&gt;Bruce Tate's book "Seven Languages in Seven Weeks" was probably the first book I bought on my new Kindle. I was looking forward to hearing from this Guru of Java and Ruby. The title added to the curiosity. Bruce kicked off the talk making some very thought-provoking observations -&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Fear and &lt;em&gt;Discovery&lt;/em&gt; are intertwined&lt;/li&gt;
&lt;li&gt;When fear lurks within the team, &lt;em&gt;code commits&lt;/em&gt; go down!&lt;/li&gt;
&lt;li&gt;Bruce went on to draw a parallel with Geoff Moore's celebrated 'Technology Adoption Curve' and called for the audience to think of language adoption on similar lines. Languages &lt;em&gt;die&lt;/em&gt; in the chasm...&lt;/li&gt;
&lt;li&gt;Just as frequency of technology waves frustrates businessmen, the frequency of language waves frustrates programmers (how many new languages did I come across just in 2014 so far - Wolfram, Swift, Hack... and I can go on...)&lt;/li&gt;
&lt;li&gt;Just like 'behaviour change' apps have a difficult time in adoption, languages with significantly different syntax lose out in the mass programmer market. Java's success due to its almost identical syntax to C is NOT incidental...&lt;/li&gt;
&lt;li&gt;And then there are language adoption curves and &lt;em&gt;language paradigm&lt;/em&gt; adoption curves. The curves for Procedural, OOPS and Functional are much deeper, wider and steeper&lt;/li&gt;
&lt;li&gt;What are the &lt;em&gt;fears&lt;/em&gt; for different consumers of languages:&lt;ul&gt;
&lt;li&gt;Paralysing Fear: Jobs, Cost&lt;/li&gt;
&lt;li&gt;Motivating Fear: Concurrency, Multi-core, Time-to-market&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Now here is a question to all career programmers - how many more years do we want to program in the C/C++ style syntax??   &lt;/li&gt;
&lt;li&gt;The next wave of language adoption will &lt;em&gt;NOT&lt;/em&gt; be a big massive wave like Java. It will instead be a tsunami of many smaller waves composed of the Scalas, Clojures, Rubys, Swifts, OCamls and many many more...&lt;/li&gt;
&lt;/ul&gt;
&lt;h4 id="day-2-session-2-functional-programming-using-dyalog-by-morten-kromberg"&gt;Day 2, Session 2: Functional programming using Dyalog, by Morten Kromberg&lt;/h4&gt;
&lt;p&gt;&lt;a href="http://tryapl.org/"&gt;APL&lt;/a&gt;. I had never heard of it. The name of Ken Iverson sounded familiar and in league with Alonzo Church, Alan Turing, Haskell Curry et al. But nothing more...&lt;/p&gt;
&lt;p&gt;And what an eye-opener it was! If there was an award for the most mind-blowing session, then this one won the 1st, 2nd and 3rd places!&lt;/p&gt;
&lt;p&gt;The APL language is beyond words that this humble writer can conjure up. It takes the idea of &lt;em&gt;functional&lt;/em&gt; programming to an all new level is all I can say (and yet it's not logic programming like Prolog or the like).&lt;/p&gt;
&lt;p&gt;However the story of Dyalog as a company and its business came across as no less astonishing. To be in the &lt;em&gt;software development&lt;/em&gt; business for over 40 years, going through the many industry upheavals (mainframes -&amp;gt; unix/windows -&amp;gt; cloud) and solving some of the most difficult problems in all streams of engineering and yet remaining totally unknown to most! Morten Kromberg, the CTO of Dyalog, had an interesting observation to share - "All engineers take to APL easily except the software engineers". Now that says it all about our computer science education system. &lt;/p&gt;
&lt;h4 id="day-2-session-3-monads-you-already-use-by-tejas-dinkar"&gt;Day 2, Session 3: &lt;a href="https://speakerdeck.com/gja/lightning-monads-you-already-use-without-knowing-it"&gt;Monads you already use&lt;/a&gt;, by Tejas Dinkar&lt;/h4&gt;
&lt;p&gt;Next was a lightning talk by Tejas giving (yet another!) perspective on Monads. In a delightfully constructed talk Tejas presented the idea through a box analogy, thereby trying to simplify the understanding of monads for list, IO etc. There is probably not a functional programmer on the planet who has watched/read a video/blog on Monads without scratching his head in disbelief. It takes a certain bravado to attempt presenting the topic to a roomful of programmers at the &lt;em&gt;functional&lt;/em&gt; conference! And Tejas did a splendid job of it.  &lt;/p&gt;
&lt;h4 id="day-2-session-4-purely-functional-data-structures-demystified-by-mohit-thatte"&gt;Day 2, Session 4: &lt;a href="http://www.slideshare.net/mohitthatte/purely-functional-data-structures-demystified"&gt;Purely functional data structures demystified&lt;/a&gt;, by Mohit Thatte&lt;/h4&gt;
&lt;p&gt;Mohit's talk was based on Chris Okasaki's famed work on the topic "&lt;a href="http://www.cs.cmu.edu/~rwh/theses/okasaki.pdf"&gt;Purely Functional Data Structures&lt;/a&gt;". This is a deep topic. I have tried to read Okasaki's work and have found it hard to get past the first few chapters. It is in the same league as SICP. I was curious about how justice could be done to such a voluminous work in one session! The first challenge was to formulate the problem definition - Mohit kept the audience glued as he demystified why it is tough to build a &lt;em&gt;correct&lt;/em&gt;, &lt;em&gt;fully functional&lt;/em&gt; and &lt;em&gt;performant&lt;/em&gt; implementation of common ADTs like Queue, List, Map etc. The next challenge was to explain the complex idea of &lt;strong&gt;structural sharing&lt;/strong&gt;. The whole idea of &lt;em&gt;persistence&lt;/em&gt; in data structures, and their &lt;em&gt;performance&lt;/em&gt;, is after all derived from &lt;em&gt;structural sharing&lt;/em&gt;. One has to really attempt implementing common data-structures with &lt;em&gt;structural-sharing&lt;/em&gt; to start comprehending the complexity that it introduces and the power behind the idea. That it becomes extremely complex to come up with even simple sounding list implementations once the &lt;em&gt;no side effects&lt;/em&gt; barrier is introduced is also an idea that needs real hands-on work to grasp. Mohit's session was a fine effort that jogged my memories of my few late night battle-losses with Okasaki and the illuminating world of &lt;em&gt;persistent&lt;/em&gt; data-structures!&lt;/p&gt;
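&lt;p&gt;The simplest place to see structural sharing is Scala's own immutable List, so here is a tiny sketch using it (my illustration, not an example from the talk or from Okasaki). Prepending creates one new cell whose tail is the old list; nothing is copied, and the old list stays valid. The &lt;em&gt;eq&lt;/em&gt; checks compare references, not contents.&lt;/p&gt;
&lt;pre&gt;
object SharingSketch {
  val base: List[Int] = List(2, 1)

  // Each "update" builds one new head cell and reuses `base` as its tail.
  val left: List[Int] = 10 :: base
  val right: List[Int] = 20 :: base

  // Reference equality: both new lists share the very same tail object.
  val sharedLeft: Boolean = left.tail eq base
  val sharedRight: Boolean = right.tail eq base
}
&lt;/pre&gt;
&lt;p&gt;Prepending is the easy case. The hard part that Okasaki's work deals with is keeping the same sharing and good performance for queues, maps and updates in the middle of a structure.&lt;/p&gt;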
&lt;h4 id="day-2-session-5-demystify-functional-jargons-by-mushtaq-ahmed"&gt;Day 2, Session 5: Demystify functional jargons, by Mushtaq Ahmed&lt;/h4&gt;
&lt;p&gt;Scala has multiple library APIs that are very well suited for certain design usecases. In this talk Mushtaq covered, with examples, the when-to, why-to and how-to use the &lt;a href="https://github.com/scala/async"&gt;Async&lt;/a&gt;, &lt;a href="http://www.scala-lang.org/files/archive/nightly/docs/library/index.html#scala.concurrent.Await$"&gt;Await&lt;/a&gt;, &lt;a href="http://docs.scala-lang.org/overviews/core/futures.html"&gt;Blocking/Future/Promise&lt;/a&gt;. In addition to these &lt;em&gt;in-house&lt;/em&gt; Scala utilities he also demonstrated the usecase for &lt;a href="https://github.com/ReactiveX/RxJava/wiki/Observable"&gt;Observables&lt;/a&gt; in the context of streams. He demonstrated an example of a simple application built using these to capture, filter and search on tweets.   &lt;/p&gt;
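&lt;p&gt;As a reminder of how these pieces fit together, here is a minimal sketch using only the standard library (it is not the tweet application from the talk, and it leaves out scala-async and Observables). The object and method names are made up for illustration. A Future is a value that will arrive later, a Promise is the writable end of a Future, and Await blocks the calling thread, which is best kept to the very edge of a program.&lt;/p&gt;
&lt;pre&gt;
import scala.concurrent.{Await, Future, Promise}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object FutureSketch {
  def run(): (Int, String) = {
    // A computation scheduled on the global execution context, then transformed.
    val doubled: Future[Int] = Future(21).map(_ * 2)

    // A Promise is completed by hand; its Future sees the value.
    val promise = Promise[String]()
    promise.success("done")

    // Combine both results without blocking...
    val both: Future[(Int, String)] = doubled.zip(promise.future)

    // ...and block only once, at the end, with a timeout.
    Await.result(both, 5.seconds)
  }
}
&lt;/pre&gt;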
&lt;h4 id="day-2-session-6-object-functional-programming-beautiful-unification-or-kitchen-sink-by-rahul-goma-phulore"&gt;Day 2, Session 6: Object-functional programming: Beautiful unification or kitchen sink, by Rahul Goma Phulore&lt;/h4&gt;
&lt;p&gt;This was a talk I was eagerly looking forward to. The reason was very specific - just days before this conference a couple of close friends (who were former colleagues) engaged me in a long-winding discussion on building &lt;em&gt;large&lt;/em&gt; applications in Scala. Like all languages Scala has its pros and cons. However opinions stand divided by a wide(ening?) chasm...&lt;/p&gt;
&lt;p&gt;I was looking forward to Rahul to give me some new perspectives. In a technically-engaging talk Rahul deconstructed the myth of Scala being a kitchen-sink. The very start of the talk was made intriguing when he proceeded to ask the audience 3 questions - (a) how many see the future in a purely OOPS world? (b) how many see the future in a purely functional world? (c) how many see the future in a hybrid of both? Almost no hands went up for question (a). A few enthusiastic hands went up for question (b). But almost &lt;strong&gt;ALL&lt;/strong&gt; hands went up for question (c). That was significant food for thought in itself. &lt;/p&gt;
&lt;p&gt;Scala has been called the "Grand unification of all programming languages". It has also been called "Vegetarian ham in chicken flavor"... arguments like these have split the programming world into tribes of believers/unbelievers without significantly adding to the knowledge/understanding of either group. Coming from a workplace where we use Scala predominantly I can testify to this from experience.&lt;/p&gt;
&lt;p&gt;But it pays great dividends to dive in a little deeper to understand just &lt;em&gt;how&lt;/em&gt; Scala provides this &lt;em&gt;grand-unification&lt;/em&gt;. That's where real illumination is. How can functions be first-class objects? How does pattern-matching happen under the hood? What is the idea behind algebraic data types? How can merely adding the keyword &lt;em&gt;sealed&lt;/em&gt; enable exhaustiveness checking, so that many more mistakes are caught by the compiler? How do mixins work? Rahul had it all covered.&lt;/p&gt;
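&lt;p&gt;The &lt;em&gt;sealed&lt;/em&gt; question is easy to try out. Below is a minimal sketch of an algebraic data type (the Shape, Circle and Rect names are mine, not from the talk). Because the trait is sealed, all its subtypes must live in the same file, so the compiler knows the full set of cases and can warn when a match leaves one out.&lt;/p&gt;
&lt;pre&gt;
object SealedSketch {
  // A small algebraic data type: a Shape is either a Circle or a Rect.
  sealed trait Shape
  final case class Circle(radius: Double) extends Shape
  final case class Rect(width: Double, height: Double) extends Shape

  def area(s: Shape): Double = s match {
    case Circle(r)  =&gt; math.Pi * r * r
    case Rect(w, h) =&gt; w * h
    // Deleting either case above makes scalac warn that the
    // match may not be exhaustive (a warning, not an error, by default).
  }
}
&lt;/pre&gt;
&lt;p&gt;Without the sealed keyword the compiler has to assume new subtypes may appear elsewhere, and the missing case would only show up as a MatchError at runtime.&lt;/p&gt;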
&lt;h4 id="day-2-session-7-methodologies-mathematics-and-the-metalinguistic-implications-of-swift-by-daniel-steinberg"&gt;Day 2, Session 7: Methodologies, Mathematics, and the Metalinguistic Implications of Swift, by Daniel Steinberg&lt;/h4&gt;
&lt;p&gt;How do we learn? Did we really learn programming when we first read a programming book? In my case, the first programming book I came across was, probably, 'C Programming' K&amp;amp;R. And I did not learn a thing even after months of reading and even typing in the code from the first few chapters!&lt;/p&gt;
&lt;p&gt;So, how do we learn? Let's leave programming for a moment. How did we learn math? How did we learn geometry? Were we able to &lt;em&gt;see&lt;/em&gt; the problems and solutions? For example, did the area of a triangle always just mean half multiplied by base multiplied by height, so much so that we proceeded to find the area of a triangle with sides {2,3,5}, or could we &lt;em&gt;see&lt;/em&gt; why/how the triangle's area was so?&lt;/p&gt;
&lt;p&gt;Daniel is the author of multiple books in the iOS Apps world. In earlier life, he had been a math teacher at high school. And he came across as a fabulous father to his young, learnful kids.&lt;/p&gt;
&lt;p&gt;Daniel urged us to think about how we &lt;em&gt;learn&lt;/em&gt;. And thereby also think about how we &lt;em&gt;teach&lt;/em&gt;. We don't learn by knowing the rules. We don't learn when someone tells us a definition of something. We learn by realising things bit-by-bit. We learn by building small things and making them bigger. We learn by looking at things built by others. We learn slowly. We learn more by observation than by anything else. And Daniel taught this to a roomful of adults. By showing us Donald Duck cartoons and explaining the Pythagorean theorem to us. I became a dad just a few weeks ago - and I can only thank Daniel for this talk. I learnt immensely.  &lt;/p&gt;
&lt;h4 id="epilogue"&gt;Epilogue&lt;/h4&gt;
&lt;p&gt;In between the sessions I had a chance to meet and talk with Kishore Nallan, Debasish Ghosh, Venkat Subramaniam and many others. Except for ThoughtWorks I don't think the conference had many representatives from the established big companies - which in itself was a boon as I got to hear and know so much that is happening in the startup/small-company world (that smaller companies are riding the wave of newer technologies must come as no surprise). The value of a conference does not lie just in the sessions but also in realising a few things like these in between. I also ran into some of my old acquaintances and it was worth every second of my time to come to know of their new technical endeavours and the fate/trajectory of those that we shared long ago!&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Functional Conference was a wonderful conference. Am glad that I was there.&lt;/strong&gt;&lt;/p&gt;</content><category term="posts"/></entry><entry><title>Experiments with XML XPath libraries on JVM</title><link href="https://bharath12345.github.io/posts/experiments-with-xml-xpath-libraries-on-jvm/" rel="alternate"/><published>2014-06-28T00:00:00-04:00</published><updated>2014-06-28T00:00:00-04:00</updated><author><name>Bharadwaj</name></author><id>tag:bharath12345.github.io,2014-06-28:/posts/experiments-with-xml-xpath-libraries-on-jvm/</id><summary type="html">&lt;p&gt;These days I mostly program in Scala. Few weeks ago I ran into a problem to search for data within fairly large XMLs. XPath and XQuery are the standard technologies to query XML's. JVM programmers have a choice of multiple libraries to choose from when it comes to XPath. One …&lt;/p&gt;</summary><content type="html">&lt;p&gt;These days I mostly program in Scala. Few weeks ago I ran into a problem to search for data within fairly large XMLs. XPath and XQuery are the standard technologies to query XML's. JVM programmers have a choice of multiple libraries to choose from when it comes to XPath. One constraint in my problem was that the program to crunch these XML was a long-running one. So, apart from trying to make the search fast I had to make sure that the CPU/memory requirements were sane. On submitting a XPath search if a library forked many hundred threads, broke the XML into many hundred stubs thus consuming every single ounce of CPU/RAM at disposal on my machine, then it was simply a no-go. Even if such a library turned out to be an order of magnitude faster than the rest.&lt;/p&gt;
&lt;div class="toc"&gt;&lt;span class="toctitle"&gt;Table of Contents&lt;/span&gt;&lt;ul&gt;
&lt;li&gt;&lt;a href="#results-tabulated"&gt;Results Tabulated&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#jvisualvm-graphs"&gt;JVisualVM Graphs&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#code"&gt;Code&lt;/a&gt;&lt;ul&gt;
&lt;li&gt;&lt;a href="#javaxxpath"&gt;javax.xpath&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#saxon"&gt;Saxon&lt;/a&gt;&lt;ul&gt;
&lt;li&gt;&lt;a href="#vtd"&gt;VTD&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#scala"&gt;Scala&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#epilogue"&gt;Epilogue&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
&lt;p&gt;A look at the XML-XPath JVM library landscape made me shortlist the following for a quick investigation - &lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://github.com/scala/scala-xml"&gt;scala.xml&lt;/a&gt; - Scala's built-in parser&lt;/li&gt;
&lt;li&gt;&lt;a href="http://docs.oracle.com/javase/7/docs/api/javax/xml/xpath/package-summary.html"&gt;javax.xml.xpath&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="http://saxon.sourceforge.net/saxon7.7/api-guide.html"&gt;net.sf.saxon&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="http://vtd-xml.sourceforge.net/"&gt;vtd-xml&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This post is a work-in-progress and I will refrain from drawing conclusions. As and when I find more, I shall add. Some passing reader may find the numbers helpful for some other cause in the wild.&lt;/p&gt;
&lt;p&gt;Now, the environment details -&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The approximate size of the XMLs I used was ~70MB. That does not make them very large, but the complexity of the structure can be the &lt;em&gt;dark&lt;/em&gt; variable in XML processing. Even a 5MB XML with small elements, recursive lookups etc (those that people refer to as an XML &lt;em&gt;database&lt;/em&gt;) can be much harder to search within than a 500MB one which has a straight simple flow (say like Log4J XML logs). The XML I used was neither as complex as a &lt;em&gt;database&lt;/em&gt; nor as simple as a &lt;em&gt;log&lt;/em&gt;. It was more like &lt;em&gt;configuration&lt;/em&gt; XML files (more complex than a Tomcat web.xml but similar) with fairly deep nesting&lt;/li&gt;
&lt;li&gt;All numbers are means over a run of 30 iterations. They should be treated as ballpark figures&lt;/li&gt;
&lt;li&gt;Tests were run on my 4core 8GB Mac OSX Mavericks&lt;/li&gt;
&lt;li&gt;Java version "1.7.0_51". Scala version "2.11.0"&lt;/li&gt;
&lt;li&gt;No cpu/memory hungry process running on the system while running the test. It was just a text editor, console, test application and operating system services after a fresh reboot&lt;/li&gt;
&lt;li&gt;Tests tried with 4 big buckets of Xmx setting - 512M, 1024M, 2048M, 4096M&lt;/li&gt;
&lt;li&gt;All numbers and screen captures are from jvisualvm. I wanted to use jstat but got a little lazy&lt;/li&gt;
&lt;/ul&gt;
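&lt;p&gt;For anyone wanting to reproduce the "mean over 30 iterations" style of measurement, a minimal timing helper could look like the sketch below. &lt;em&gt;meanMillis&lt;/em&gt; is a hypothetical name and this is not the exact harness behind the numbers in this post; it measures wall-clock time only and does nothing about JIT warm-up or GC pauses, which is one more reason to read such numbers as ballparks.&lt;/p&gt;
&lt;pre&gt;
object TimingSketch {
  // Hypothetical helper: mean wall-clock milliseconds of `body` over `iterations` runs.
  def meanMillis(iterations: Int)(body: =&gt; Unit): Double = {
    val samples = (1 to iterations).map { _ =&gt;
      val t0 = System.nanoTime()
      body // by-name parameter: re-evaluated on every iteration
      (System.nanoTime() - t0) / 1e6
    }
    samples.sum / samples.size
  }
}
&lt;/pre&gt;
&lt;p&gt;Usage would be along the lines of TimingSketch.meanMillis(30) { evalXml() }, with evalXml standing for whatever parse-and-query step is being measured.&lt;/p&gt;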
&lt;p&gt;One important consideration while choosing an XML library is the API. But that is project specific and I leave it out of this comparison.&lt;/p&gt;
&lt;h3 id="results-tabulated"&gt;Results Tabulated&lt;/h3&gt;
&lt;table class="table table-striped table-bordered table-hover table-condensed"&gt;
    &lt;thead&gt;
        &lt;tr&gt;
            &lt;th colspan="11" class="text-center"&gt;Xmx512m&lt;/th&gt;
        &lt;/tr&gt;
        &lt;tr&gt;
            &lt;td&gt;&amp;nbsp;&lt;/td&gt;
            &lt;td&gt;&lt;strong&gt;Time Taken&lt;/strong&gt;&lt;/td&gt;
            &lt;td&gt;&lt;strong&gt;App CPU Usage&lt;/strong&gt;&lt;/td&gt;
            &lt;td&gt;&lt;strong&gt;GC CPU Usage&lt;/strong&gt;&lt;/td&gt;
            &lt;td&gt;&lt;strong&gt;App Heap Size&lt;/strong&gt;&lt;/td&gt;
            &lt;td&gt;&lt;strong&gt;Heap Used&lt;/strong&gt;&lt;/td&gt;
            &lt;td&gt;&lt;strong&gt;Eden collection count/time spent&lt;/strong&gt;&lt;/td&gt;
            &lt;td&gt;&lt;strong&gt;Old Gen collection count/time spent&lt;/strong&gt;&lt;/td&gt;
            &lt;td&gt;&lt;strong&gt;Eden pattern&lt;/strong&gt;&lt;/td&gt;
            &lt;td&gt;&lt;strong&gt;Survivor pattern&lt;/strong&gt;&lt;/td&gt;
            &lt;td&gt;&lt;strong&gt;Old Gen pattern&lt;/strong&gt;&lt;/td&gt;
        &lt;/tr&gt;
    &lt;/thead&gt;
    &lt;tbody&gt;
        &lt;tr&gt;
            &lt;td&gt;&lt;strong&gt;scala.xml&lt;/strong&gt;&lt;/td&gt;
            &lt;td&gt;240s&lt;/td&gt;
            &lt;td&gt;70-80%&lt;/td&gt;
            &lt;td&gt;20%&lt;/td&gt;
            &lt;td&gt;512M&lt;/td&gt;
            &lt;td&gt;250-300M&lt;/td&gt;
            &lt;td&gt;359/15.2s&lt;/td&gt;
            &lt;td&gt;303/3m18s&lt;/td&gt;
            &lt;td&gt;either 0M or 170M&lt;/td&gt;
            &lt;td&gt;not much usage&lt;/td&gt;
            &lt;td&gt;between 170-340M&lt;/td&gt;
        &lt;/tr&gt;
        &lt;tr&gt;
            &lt;td&gt;&lt;strong&gt;javax.xml.xpath&lt;/strong&gt;&lt;/td&gt;
            &lt;td colspan="10" class="text-center"&gt;does not complete&lt;/td&gt;
        &lt;/tr&gt;
        &lt;tr&gt;
            &lt;td&gt;&lt;strong&gt;net.sf.saxon.xpath&lt;/strong&gt;&lt;/td&gt;
            &lt;td&gt;67s&lt;/td&gt;
            &lt;td&gt;60-80%&lt;/td&gt;
            &lt;td&gt;20%&lt;/td&gt;
            &lt;td&gt;512M&lt;/td&gt;
            &lt;td&gt;250-300M&lt;/td&gt;
            &lt;td&gt;162/6.2s&lt;/td&gt;
            &lt;td&gt;123/39.3s&lt;/td&gt;
            &lt;td&gt;0-170M tall spikes&lt;/td&gt;
            &lt;td&gt;consistent use of 57M * 2&lt;/td&gt;
            &lt;td&gt;stepwise between 0-340M&lt;/td&gt;
        &lt;/tr&gt;
        &lt;tr&gt;
            &lt;td&gt;&lt;strong&gt;vtd.xml&lt;/strong&gt;&lt;/td&gt;
            &lt;td&gt;11s&lt;/td&gt;
            &lt;td&gt;26%&lt;/td&gt;
            &lt;td&gt;0.10%&lt;/td&gt;
            &lt;td&gt;500M&lt;/td&gt;
            &lt;td&gt;150-250M&lt;/td&gt;
            &lt;td&gt;13/138ms&lt;/td&gt;
            &lt;td&gt;9/262ms&lt;/td&gt;
            &lt;td&gt;between 100-170M&lt;/td&gt;
            &lt;td&gt;very little and infrequent&lt;/td&gt;
            &lt;td&gt;between 80-240M&lt;/td&gt;
        &lt;/tr&gt;
    &lt;/tbody&gt;
    &lt;thead&gt;
        &lt;tr&gt;
            &lt;th colspan="11" class="text-center"&gt;Xmx1024m&lt;/th&gt;
        &lt;/tr&gt;
    &lt;/thead&gt;
    &lt;tbody&gt;
        &lt;tr&gt;
            &lt;td&gt;&lt;strong&gt;scala.xml&lt;/strong&gt;&lt;/td&gt;
            &lt;td&gt;85s&lt;/td&gt;
            &lt;td&gt;70-80%&lt;/td&gt;
            &lt;td&gt;20%&lt;/td&gt;
            &lt;td&gt;1G&lt;/td&gt;
            &lt;td&gt;250-500M&lt;/td&gt;
            &lt;td&gt;299/36s&lt;/td&gt;
            &lt;td&gt;38/14s&lt;/td&gt;
            &lt;td&gt;0-340M tall spikes&lt;/td&gt;
            &lt;td&gt;100M consistent&lt;/td&gt;
            &lt;td&gt;80-600M neat triangles&lt;/td&gt;
        &lt;/tr&gt;
        &lt;tr&gt;
            &lt;td&gt;&lt;strong&gt;javax.xml.xpath&lt;/strong&gt;&lt;/td&gt;
            &lt;td&gt;57s&lt;/td&gt;
            &lt;td&gt;50-70%&lt;/td&gt;
            &lt;td&gt;10-20%&lt;/td&gt;
            &lt;td&gt;1G&lt;/td&gt;
            &lt;td&gt;250-500M&lt;/td&gt;
            &lt;td&gt;197/14s&lt;/td&gt;
            &lt;td&gt;34/15s&lt;/td&gt;
            &lt;td&gt;0-340M tall spikes&lt;/td&gt;
            &lt;td&gt;100M consistent&lt;/td&gt;
            &lt;td&gt;200-600M neat triangles&lt;/td&gt;
        &lt;/tr&gt;
        &lt;tr&gt;
            &lt;td&gt;&lt;strong&gt;net.sf.saxon.xpath&lt;/strong&gt;&lt;/td&gt;
            &lt;td&gt;49s&lt;/td&gt;
            &lt;td&gt;50-70%&lt;/td&gt;
            &lt;td&gt;10-20%&lt;/td&gt;
            &lt;td&gt;1G&lt;/td&gt;
            &lt;td&gt;250-500M&lt;/td&gt;
            &lt;td&gt;110/12s&lt;/td&gt;
            &lt;td&gt;34/15s&lt;/td&gt;
            &lt;td&gt;0-340M tall spikes&lt;/td&gt;
            &lt;td&gt;100M consistent&lt;/td&gt;
            &lt;td&gt;200-600M neat triangles&lt;/td&gt;
        &lt;/tr&gt;
        &lt;tr&gt;
            &lt;td&gt;&lt;strong&gt;vtd.xml&lt;/strong&gt;&lt;/td&gt;
            &lt;td&gt;11s&lt;/td&gt;
            &lt;td&gt;30%&lt;/td&gt;
            &lt;td&gt;1-2%&lt;/td&gt;
            &lt;td&gt;300-800M&lt;/td&gt;
            &lt;td&gt;200-700M&lt;/td&gt;
            &lt;td&gt;11/66ms&lt;/td&gt;
            &lt;td&gt;6/204ms&lt;/td&gt;
            &lt;td&gt;200-300M&lt;/td&gt;
            &lt;td&gt;10M&lt;/td&gt;
            &lt;td&gt;400-600M&lt;/td&gt;
        &lt;/tr&gt;
    &lt;/tbody&gt;
    &lt;thead&gt;
        &lt;tr&gt;
            &lt;th colspan="11" class="text-center"&gt;Xmx2048m&lt;/th&gt;
        &lt;/tr&gt;
    &lt;/thead&gt;
    &lt;tbody&gt;
        &lt;tr&gt;
            &lt;td&gt;&lt;strong&gt;scala.xml&lt;/strong&gt;&lt;/td&gt;
            &lt;td&gt;70s&lt;/td&gt;
            &lt;td&gt;70-80%&lt;/td&gt;
            &lt;td&gt;10-20%&lt;/td&gt;
            &lt;td&gt;2G&lt;/td&gt;
            &lt;td&gt;0.5-1G&lt;/td&gt;
            &lt;td&gt;154/27s&lt;/td&gt;
            &lt;td&gt;26/21s&lt;/td&gt;
            &lt;td&gt;0-680M tall spikes&lt;/td&gt;
            &lt;td&gt;100M consistent&lt;/td&gt;
            &lt;td&gt;200M-1G neat triangles&lt;/td&gt;
        &lt;/tr&gt;
        &lt;tr&gt;
            &lt;td&gt;&lt;strong&gt;javax.xml.xpath&lt;/strong&gt;&lt;/td&gt;
            &lt;td&gt;59s&lt;/td&gt;
            &lt;td&gt;40-70%&lt;/td&gt;
            &lt;td&gt;10-20%&lt;/td&gt;
            &lt;td&gt;2G&lt;/td&gt;
            &lt;td&gt;0.5-1G&lt;/td&gt;
            &lt;td&gt;105/14s&lt;/td&gt;
            &lt;td&gt;23/17s&lt;/td&gt;
            &lt;td&gt;0-680M tall spikes&lt;/td&gt;
            &lt;td&gt;100M consistent&lt;/td&gt;
            &lt;td&gt;0.3-1.1G&lt;/td&gt;
        &lt;/tr&gt;
        &lt;tr&gt;
            &lt;td&gt;&lt;strong&gt;net.sf.saxon.xpath&lt;/strong&gt;&lt;/td&gt;
            &lt;td&gt;39s&lt;/td&gt;
            &lt;td&gt;40-70%&lt;/td&gt;
            &lt;td&gt;10-20%&lt;/td&gt;
            &lt;td&gt;2G&lt;/td&gt;
            &lt;td&gt;0.5-1G&lt;/td&gt;
            &lt;td&gt;69/10s&lt;/td&gt;
            &lt;td&gt;18/8s&lt;/td&gt;
            &lt;td&gt;0-680M tall spikes&lt;/td&gt;
            &lt;td&gt;200M consistent&lt;/td&gt;
            &lt;td&gt;300-600M&lt;/td&gt;
        &lt;/tr&gt;
        &lt;tr&gt;
            &lt;td&gt;&lt;strong&gt;vtd.xml&lt;/strong&gt;&lt;/td&gt;
            &lt;td&gt;11s&lt;/td&gt;
            &lt;td&gt;26%&lt;/td&gt;
            &lt;td&gt;0%&lt;/td&gt;
            &lt;td&gt;0.5-1.25G&lt;/td&gt;
            &lt;td&gt;0.5-1.25G&lt;/td&gt;
            &lt;td&gt;14/190ms&lt;/td&gt;
            &lt;td&gt;6/272ms&lt;/td&gt;
            &lt;td&gt;600M consistent&lt;/td&gt;
            &lt;td&gt;200M&lt;/td&gt;
            &lt;td&gt;1.3G no pattern&lt;/td&gt;
        &lt;/tr&gt;
    &lt;/tbody&gt;
&lt;/table&gt;

&lt;h3 id="jvisualvm-graphs"&gt;JVisualVM Graphs&lt;/h3&gt;
&lt;table class="table table-bordered"&gt;
    &lt;tr&gt;
        &lt;td&gt;&lt;h5&gt;javax.xpath CPU and Memory&lt;/h5&gt;&lt;img src="/images/xml.xpath/javax.xml/javax.xml.xpath 1G.png" alt="javax.xpath CPU and Memory 1G" class="img-fluid"&gt;&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
        &lt;td&gt;&lt;h5&gt;javax.xpath GC&lt;/h5&gt;&lt;img src="/images/xml.xpath/javax.xml/GC javax.xml.xpath 1G.png" alt="javax.xpath GC 1G" class="img-fluid"&gt;&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
        &lt;td&gt;&lt;h5&gt;Saxon CPU and Memory&lt;/h5&gt;&lt;img src="/images/xml.xpath/saxon/net.sf.saxon.xpath 1G.png" alt="Saxon CPU and Memory 1G" class="img-fluid"&gt;&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
        &lt;td&gt;&lt;h5&gt;Saxon GC&lt;/h5&gt;&lt;img src="/images/xml.xpath/saxon/GC net.sf.saxon.xpath 1G.png" alt="Saxon GC 1G" class="img-fluid"&gt;&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
        &lt;td&gt;&lt;h5&gt;VTD CPU and Memory&lt;/h5&gt;&lt;img src="/images/xml.xpath/vtd/vtd.xml 1G.png" alt="VTD CPU and Memory 1G" class="img-fluid"&gt;&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
        &lt;td&gt;&lt;h5&gt;VTD GC&lt;/h5&gt;&lt;img src="/images/xml.xpath/vtd/GC vtd.xml 1G.png" alt="VTD GC 1G" class="img-fluid"&gt;&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
        &lt;td&gt;&lt;h5&gt;Scala XML Xpath CPU and Memory&lt;/h5&gt;&lt;img src="/images/xml.xpath/scala.xml/scala.xml 1G.png" alt="Scala XML CPU and Memory 1G" class="img-fluid"&gt;&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
        &lt;td&gt;&lt;h5&gt;Scala XML GC&lt;/h5&gt;&lt;img src="/images/xml.xpath/scala.xml/GC scala.xml 1G.png" alt="Scala XML GC 1G" class="img-fluid"&gt;&lt;/td&gt;
    &lt;/tr&gt;
&lt;/table&gt;

&lt;h3 id="code"&gt;Code&lt;/h3&gt;
&lt;h4 id="javaxxpath"&gt;javax.xpath&lt;/h4&gt;
&lt;pre&gt;
import org.w3c.dom.Document;
import java.io.IOException;
import org.xml.sax.SAXException;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import java.io.FileInputStream;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import java.util._;
import javax.xml.xpath._
import org.w3c.dom.NodeList

object Main extends App {
    try {
        val builderFactory: DocumentBuilderFactory = DocumentBuilderFactory.newInstance();
        val builder: DocumentBuilder = builderFactory.newDocumentBuilder(); 
        val xPath: XPath =  XPathFactory.newInstance().newXPath();
        println((new Date()).toString)

        val compexp = xPath.compile("/mycompany/MyResourceSet/MyResource/MyResourceList/MyResource[@displayName='Dummy']")
        def evalXml() = {
            val document: Document = builder.parse(new FileInputStream("sample.xml"));

            val node = compexp.evaluate(document, XPathConstants.NODESET)
            node match {
                case n: NodeList =&gt; println(n + " at " + (new Date()).toString + " len = " + n.getLength())
                case _ =&gt; println("typecast to NodeList failed")
            }
        }

        val t1 = System.currentTimeMillis
        val i = 30

        // "until" runs exactly i iterations, so dividing by i below gives a true average
        for(j &lt;- 0 until i)
            evalXml();
        println((new Date()).toString())
        val t2 = System.currentTimeMillis
        println("avg time = " + (t2 - t1)/i)

    } catch {
        case e: Exception=&gt; e.printStackTrace();
    }
}
&lt;/pre&gt;
&lt;h4 id="saxon"&gt;Saxon&lt;/h4&gt;
&lt;pre&gt;
import java.io._;
import java.util._;
import org.w3c.dom.NodeList;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathFactory;
import javax.xml.xpath.XPathExpression;
import net.sf.saxon.xpath.XPathEvaluator;
import net.sf.saxon.xpath.XPathFactoryImpl;
import org.w3c.dom.Document;
import javax.xml.xpath.XPathConstants;

object SaxonEx extends App {

    val builderFactory: DocumentBuilderFactory = DocumentBuilderFactory.newInstance();
    val builder: DocumentBuilder = builderFactory.newDocumentBuilder(); 

    val factory = new XPathFactoryImpl();
    val xc = factory.newXPath();
    val xpathCompiler: XPathEvaluator = xc.asInstanceOf[XPathEvaluator];

    val xstring = "//mycompany/MyResourceSet/MyResource/MyResourceList/MyResource[@displayName='dummy']"
    val expr: XPathExpression  = xpathCompiler.compile(xstring);

    println("running SaxonEx:" + (new Date()).toString)

    def evalXml() = {
        val document: Document = builder.parse(new FileInputStream("sample.xml"));

        val node = expr.evaluate(document, XPathConstants.NODESET);
        node match {
                case n: NodeList =&gt; println(n + " at " + (new Date()).toString + " len = " + n.getLength())
                case _ =&gt; println("typecast to NodeList failed")
        }       
    }

    val t1 = System.currentTimeMillis
    val i = 30

    // "until" runs exactly i iterations, so dividing by i below gives a true average
    for(j &lt;- 0 until i)
        evalXml();
    val t2 = System.currentTimeMillis
    println("avg time = " + (t2 - t1)/i)
    println((new Date()).toString())
}
&lt;/pre&gt;
&lt;h4 id="vtd"&gt;VTD&lt;/h4&gt;
&lt;pre&gt;
import com.ximpleware._;
import com.ximpleware.xpath._;
import java.util._;

object vtd extends App {

    val vg: VTDGen = new VTDGen();

    def loopvtd = {
        vg.parseFile("sample.xml", false);
        val vn:VTDNav = vg.getNav();
        val ap:AutoPilot = new AutoPilot(vn);
        ap.selectXPath("/mycompany/MyResourceSet/MyResource/MyResourceList/MyResource[@displayName='dummy']");
        val x = ap.evalXPath()
        if(x != -1) println("eval returned " + x)
        else println("eval failed")

        val value: Int = vn.getText();
        if (value != -1) {
            val title:String = vn.toNormalizedString(value);
            println(title);
        }
    }

    val t1 = System.currentTimeMillis
    val i = 30

    // "until" runs exactly i iterations, so dividing by i below gives a true average
    for(j &lt;- 0 until i)
        loopvtd

    println((new Date()).toString())
    val t2 = System.currentTimeMillis
    println("avg time = " + (t2 - t1)/i)

}
&lt;/pre&gt;
&lt;h4 id="scala"&gt;Scala&lt;/h4&gt;
&lt;pre&gt;
#!/bin/sh
exec scala "$0" "$@"
!#

import scala.xml
import scala.xml._
import java.util._

def findout(filename: String) = {
    val xf = xml.XML.loadFile(filename)
    val cec = (xf \\ "MyResource" filter ( _ \"@displayName" contains Text("Dummy")))
}

println((new Date()).toString())
val t1 = System.currentTimeMillis
val i = 30
// "until" runs exactly i iterations, so dividing by i below gives a true average
for(j &lt;- 0 until i) {
    findout("sample.xml")
    println(s"iteration $j")
}
println((new Date()).toString())
val t2 = System.currentTimeMillis
println("avg time = " + (t2 - t1)/i)
&lt;/pre&gt;
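&lt;p&gt;All four listings above repeat the same hand-written timing loop. As a minimal sketch of how that loop could be factored out, the helper below runs a block a fixed number of times after a few untimed warm-up runs and returns the mean wall-clock time. The names &lt;code&gt;Timing&lt;/code&gt;, &lt;code&gt;averageMillis&lt;/code&gt; and &lt;code&gt;TimingDemo&lt;/code&gt; are made up for illustration and this is not the code used to produce the numbers in the table.&lt;/p&gt;
&lt;pre&gt;
object Timing {
    // Runs `body` `warmup` times untimed, then exactly `runs` times timed,
    // and returns the mean wall-clock time per timed run in milliseconds.
    def averageMillis(runs: Int, warmup: Int = 0)(body: =&gt; Unit): Double = {
        require(runs &gt; 0, "runs must be positive")
        (0 until warmup).foreach(_ =&gt; body)
        val start = System.nanoTime()
        (0 until runs).foreach(_ =&gt; body)
        (System.nanoTime() - start) / 1e6 / runs
    }
}

object TimingDemo extends App {
    var calls = 0
    // The block here only counts invocations; in the listings above it would be evalXml() or loopvtd
    val avg = Timing.averageMillis(runs = 30, warmup = 3) { calls += 1 }
    // calls ends at 33: 3 warm-up runs plus 30 timed runs
    println("calls = " + calls + ", avg time = " + avg + " ms")
}
&lt;/pre&gt;
&lt;p&gt;Warm-up runs matter on the JVM because the first few iterations include class loading and JIT compilation, which would otherwise inflate the average.&lt;/p&gt;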

&lt;h3 id="epilogue"&gt;Epilogue&lt;/h3&gt;
&lt;p&gt;VTD comes across as the fastest XPath implementation of all, with Saxon next. The standard-library XPath implementations for Java and Scala are much slower. The Scala implementation is not XPath at all and can only be called &lt;em&gt;XPath like&lt;/em&gt;. The code is too simplistic to infer much from the CPU/memory graphs. I have tweaked the code to get a little better inference and intuition, and an interested programmer might do the same to get a better idea.&lt;/p&gt;</content><category term="posts"/></entry><entry><title>Running Machine-Learning Assignments on my Laptop's Spark Cluster</title><link href="https://bharath12345.github.io/posts/run-machine-learning-assignments-on-a-laptops-spark-cluster/" rel="alternate"/><published>2014-05-12T00:00:00-04:00</published><updated>2014-05-12T00:00:00-04:00</updated><author><name>Bharadwaj</name></author><id>tag:bharath12345.github.io,2014-05-12:/posts/run-machine-learning-assignments-on-a-laptops-spark-cluster/</id><summary type="html">&lt;p&gt;The latest offering of Coursera's popular course on Machine Learning by Andrew Ng started in the first week of March. The course uses GNU Octave (or MATLAB) to solve the assignments. Apart from trying to solve the problems in Octave, I decided to solve the assignments in the programming …&lt;/p&gt;</summary><content type="html">&lt;p&gt;The latest offering of Coursera's popular course on Machine Learning by Andrew Ng started in the first week of March. The course uses GNU Octave (or MATLAB) to solve the assignments. Apart from trying to solve the problems in Octave, I decided to solve the assignments in the programming language of my choice - Scala. This is a quick post on the why's and the how's.&lt;/p&gt;
&lt;h3 id="why-scala-spark-and-distributed"&gt;Why Scala, Spark and Distributed?&lt;/h3&gt;
&lt;p&gt;As computing and the web have grown, the volume of data to process has grown much larger. Processing these large volumes requires two things - (1) horizontal scalability/distribution and (2) efficient use of multicore compute. Languages like R and Octave are not built for either - that is, writing programs in them that run on a cluster and efficiently use all the CPU/RAM is infeasible. They are good only for smaller datasets and for proofs of concept (POCs) on large production-like datasets. Large datasets and continuous data streams require software design and programming in compiled languages like C, the JVM-based languages, Haskell etc. My preference is the JVM-based languages. In the world of the JVM, there are multiple open source frameworks that provide a platform to write statistical computing algorithms that run on a cluster, for example -&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://spark.apache.org"&gt;Apache Spark&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://mahout.apache.org/"&gt;Apache Mahout&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="http://0xdata.com/"&gt;H2O&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;I chose Spark. Spark is written in Scala, and its MLlib library includes many of the popular/simpler ML algorithms. Spark can run on a cluster manager such as Mesos and read its data from HDFS. It started as a research project at the AMPLab at UC Berkeley and is now an Apache project.&lt;/p&gt;
&lt;h3 id="spinning-up-a-spark-cluster-with-vagrant-and-docker"&gt;Spinning up a Spark Cluster with Vagrant and Docker&lt;/h3&gt;
&lt;p&gt;Running a cluster on a laptop first requires that the laptop be computationally well powered. I use a Mac OS X machine with 4 CPUs &amp;amp; 8GB of RAM, and I would suggest this as the minimum configuration.&lt;/p&gt;
&lt;p&gt;The second requirement is to separate the virtual machines that form the cluster from the system that runs them. I have found Vagrant to be a superb tool for running configurable virtual machines that can be shared with ease. &lt;a href="https://vagrantcloud.com/"&gt;Vagrant&lt;/a&gt; uses VirtualBox. I created a VM with Ubuntu 14.04 Trusty and allocated 2 CPUs and 4GB of RAM to it exclusively on my laptop. The next idea is to run multiple lightweight VMs on this Ubuntu machine using &lt;a href="https://www.docker.io/"&gt;Docker&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Now why Docker? - If Vagrant provides a heavyweight VM abstraction, then Docker provides a lightweight one. The idea is to run multiple Docker-based lightweight Linux VMs (containers) on this Vagrant Ubuntu VM - this is because a Spark cluster needs multiple nodes such as a master, workers and a namenode (for HDFS). One can run Docker directly on the native machine using something like a TinyCore Linux OS. The steps to do so can be found on Docker's website. However, it is better to avoid that and rely on Vagrant instead. There are a couple of reasons for this -&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Tiny Core Linux's contents are not persisted across reboots. Since we would be coding on these VMs, a loss of contents is scary&lt;/li&gt;
&lt;li&gt;Allocating CPU/RAM to multiple nodes running directly on a laptop is unclean. It's not easy to achieve this CPU/RAM distribution in Docker either (along with shared folder support). &lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Vagrant really comes in handy to alleviate these shortcomings. Further, it is super easy to suspend a Vagrant VM, and the whole cluster's state will be persisted as-is... I can't think of anything more &lt;em&gt;cool&lt;/em&gt; than that on the planet! &lt;/p&gt;
&lt;h3 id="the-steps"&gt;The Steps&lt;/h3&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;My Vagrantfile is a small one. It uses Ubuntu 14.04 Trusty and allocates 4GB of RAM and 2 CPU cores exclusively -&lt;/p&gt;
&lt;p&gt;&lt;code&gt;VAGRANTFILE_API_VERSION = "2"
Vagrant.configure(VAGRANTFILE_API_VERSION) do |config|
  config.vm.box = "shrink0r/ubuntu-trusty-server-x64"
  config.vm.provider "virtualbox" do |v|
    v.memory = 4096
    v.cpus = 2
  end
end&lt;/code&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;SSH into the Vagrant box and clone &lt;a href="https://github.com/amplab/docker-scripts"&gt;this repository&lt;/a&gt; from the AMPLab to get going with the next Docker step&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;Post cloning, run the command - &lt;code&gt;sudo ./deploy/deploy.sh -i amplab/spark:0.9.0 -w 1&lt;/code&gt; to bring up the cluster with 1 worker. We don't need more than 1 worker on our simulated cluster. Even with only one worker, there will be 4 nodes in this cluster (master, worker, namenode, domain name server). Expect this command to take quite some time to complete&lt;/li&gt;
&lt;li&gt;The next necessary step is to configure name resolution. The nameserver IP to put in /etc/resolv.conf is shown at the end of the console output of the command run in step 3&lt;/li&gt;
&lt;li&gt;Follow the steps on the GitHub page of the AMPLab docker-scripts repo to make sure that a Scala shell can be attached and the example run&lt;/li&gt;
&lt;li&gt;The next step is to download Hadoop and place it in the Vagrant system. Hadoop is required to interact with HDFS (we need a client). I used Hadoop v1.2.1&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;The first assignment of the Machine Learning course uses a text file (ex1data1.txt) as data. The idea now is to place this on HDFS and run Spark linear regression on it. The HDFS in the AMPLab cluster is created by a user called 'hdfs', so we need a user with the same name on the Vagrant client system (this is a hack). So create a new user... my interaction -&lt;/p&gt;
&lt;p&gt;&lt;code&gt;$ sudo useradd -m hdfs
$ sudo passwd hdfs
Enter new UNIX password: 
Retype new UNIX password: 
passwd: password updated successfully
$ su hdfs
Password: 
$ whoami
hdfs
$&lt;/code&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Next, transfer ex1data1.txt from the local filesystem to HDFS. Use the &lt;em&gt;hadoop&lt;/em&gt; program from the downloaded Hadoop bundle (it's in &lt;em&gt;bin&lt;/em&gt;) to talk to HDFS -&lt;/p&gt;
&lt;p&gt;&lt;code&gt;hdfs@packer-virtualbox-iso:/vagrant$ hadoop-1.2.1/bin/hadoop fs -fs hdfs://master:9000 -mkdir /bharath
hdfs@packer-virtualbox-iso:/vagrant$ hadoop-1.2.1/bin/hadoop fs -fs hdfs://master:9000 -put /vagrant/data/ex1data1.txt /bharath
hdfs@packer-virtualbox-iso:/vagrant$ hadoop-1.2.1/bin/hadoop fs -fs hdfs://master:9000 -ls /
Found 2 items
drwxr-xr-x   - hdfs supergroup          0 2014-05-17 17:07 /bharath
drwxr-xr-x   - hdfs supergroup          0 2014-05-17 12:06 /user
hdfs@packer-virtualbox-iso:/vagrant$ hadoop-1.2.1/bin/hadoop fs -fs hdfs://master:9000 -ls /bharath
Found 1 items
-rw-r--r--   3 hdfs supergroup       1359 2014-05-17 17:07 /bharath/ex1data1.txt&lt;/code&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;So by now, we have a working Spark cluster and have placed our data on its HDFS. The next step is to write a Spark application. My application is called &lt;em&gt;sparkling&lt;/em&gt; and it's on GitHub &lt;a href="https://github.com/bharath12345/sparkling"&gt;here&lt;/a&gt;. You may clone the repository onto your Vagrant Ubuntu box. You will need &lt;em&gt;sbt&lt;/em&gt; to compile this project, and the compilation could take some time. So far I have written just 2 programs in this project. There is one called "Test.scala" which does a simple line count of the file placed on HDFS in the previous step. You may want to run this; it should print a count of 92 mixed in with a lot of Java logging output. If this program works then you can run the other program, "LocalFileLinearRegression.scala". The command to run these from the sbt prompt is, quite simply -&lt;/p&gt;
&lt;p&gt;&lt;code&gt;&gt; run-main in.bharathwrites.sparkling.LocalFileLinearRegression&lt;/code&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Instead of running the client program from the &lt;em&gt;sbt&lt;/em&gt; prompt one can build a fat JAR using the &lt;em&gt;assembly&lt;/em&gt; plugin. Doing so, one can run the client program from the command-line using the well known &lt;code&gt;java -cp jar-name main-class&lt;/code&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
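&lt;p&gt;To give a feel for the kind of program the steps above lead up to, here is a minimal sketch of a line count over the file placed on HDFS. It is not the code from the &lt;em&gt;sparkling&lt;/em&gt; repository: the object name, the application name and the Spark master URL &lt;code&gt;spark://master:7077&lt;/code&gt; are assumptions for illustration, while the HDFS path is the one created in the steps above. It needs the Spark core library on the classpath.&lt;/p&gt;
&lt;pre&gt;
import org.apache.spark.{SparkConf, SparkContext}

object LineCountSketch {
    def main(args: Array[String]): Unit = {
        val conf = new SparkConf()
            .setAppName("line-count-sketch")      // hypothetical application name
            .setMaster("spark://master:7077")     // assumed master URL, adjust to your cluster
        val sc = new SparkContext(conf)
        try {
            // Read the file uploaded earlier and count its lines across the cluster
            val lines = sc.textFile("hdfs://master:9000/bharath/ex1data1.txt")
            println("line count = " + lines.count())
        } finally {
            sc.stop()
        }
    }
}
&lt;/pre&gt;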
&lt;p&gt;PS: My project on GitHub (sparkling) is my playground for learning Spark, and I will keep modifying the code in the coming days. So you may want to read through the code once to check that it makes sense... as I have a tendency to check in intermediate, non-working code too! :)&lt;/p&gt;</content><category term="posts"/></entry><entry><title>Code Retreat</title><link href="https://bharath12345.github.io/posts/code-retreat/" rel="alternate"/><published>2014-04-14T00:00:00-04:00</published><updated>2014-04-14T00:00:00-04:00</updated><author><name>Bharadwaj</name></author><id>tag:bharath12345.github.io,2014-04-14:/posts/code-retreat/</id><summary type="html">&lt;p&gt;Last Saturday I happened to go to my first code retreat. Hosted by &lt;a href="http://www.multunus.com/"&gt;Multunus&lt;/a&gt; and conducted by the master programmer &lt;a href="https://twitter.com/venkat_s"&gt;Venkat Subramaniam&lt;/a&gt;, it was a very learnful experience. Here's a sneak-peek of the event from my eyes. &lt;/p&gt;
&lt;h3 id="the-format"&gt;The Format&lt;/h3&gt;
&lt;p&gt;Code retreats are day long coding fests. Programmers take a crack …&lt;/p&gt;</summary><content type="html">&lt;p&gt;Last Saturday I happened to go to my first code retreat. Hosted by &lt;a href="http://www.multunus.com/"&gt;Multunus&lt;/a&gt; and conducted by the master programmer &lt;a href="https://twitter.com/venkat_s"&gt;Venkat Subramaniam&lt;/a&gt;, it was a very learnful experience. Here's a sneak-peek of the event from my eyes. &lt;/p&gt;
&lt;h3 id="the-format"&gt;The Format&lt;/h3&gt;
&lt;p&gt;Code retreats are day-long coding fests. Programmers take a crack at solving &lt;a href="http://en.wikipedia.org/wiki/Conway's_Game_of_Life"&gt;Conway's Game of Life&lt;/a&gt; in multiple short sessions. Each session lasts 45 minutes. Developers work in pairs (miniature &lt;a href="http://en.wikipedia.org/wiki/Extreme_programming"&gt;extreme programming&lt;/a&gt; sessions). Every session starts with a clean slate, that is NO previous code at all. &lt;a href="http://en.wikipedia.org/wiki/Test-driven_development"&gt;TDD&lt;/a&gt; is encouraged. After every 45 minute session there is a 15 minute standup where everyone gets to share their experience and the things they learnt. And at the end of the standup the group may decide to impose some constraints for the next session... like not-using-classes, no-global-state, no-returns-from-any-methods etc. The choice of programming language, frameworks and design is all left to the pair. This meetup had close to 25 programmers who kept up the hacking the whole day.&lt;/p&gt;
&lt;p&gt;Generally the 45 minute time slot is insufficient to solve the puzzle along with writing tests, especially if one is new to the puzzle, the chosen language or the design idea. However, the learning is not in solving the puzzle itself... it lies in the experience of trying to do so within the setting. &lt;/p&gt;
&lt;p&gt;One can read more about the format and structure of Code Retreat at the community website - &lt;a href="http://coderetreat.org/"&gt;http://coderetreat.org&lt;/a&gt;&lt;/p&gt;
&lt;h3 id="my-experience"&gt;My Experience&lt;/h3&gt;
&lt;p&gt;In 5 sessions across the day, I, along with my partner, programmed in 4 languages - Scala, JavaScript, Python and Ruby! Here is a brief on each of my sessions...&lt;/p&gt;
&lt;p&gt;My first session was in Scala. My partner and I both had prior experience with the language. And thankfully my partner had some perspective and prior experience with the problem too. In a quick discussion we zeroed in on a subset of the problem we wanted to tackle (neighbour-finding in Game of Life) and agreed on a simple design. We leapt into coding, starting with the tests. The test part straightaway exposed my poor knowledge of ScalaTest. Later, while writing the code, we got stuck trying to write object equality in Scala - something I could do with my eyes closed in Java. That was the second exposé. We programmed on my partner's system and he used Gradle - which was a first for me.&lt;/p&gt;
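&lt;p&gt;In hindsight, both things we struggled with have a short answer in Scala. The sketch below is not the code we wrote that day; the names &lt;code&gt;Cell&lt;/code&gt;, &lt;code&gt;neighbours&lt;/code&gt; and &lt;code&gt;NeighbourDemo&lt;/code&gt; are made up for illustration. A case class gets value-based equality for free, and the eight neighbours of a cell fall out of a small for-comprehension.&lt;/p&gt;
&lt;pre&gt;
// A case class gets structural equals/hashCode generated by the compiler
final case class Cell(x: Int, y: Int) {
    // The eight cells surrounding this one (the cell itself is excluded)
    def neighbours: Set[Cell] =
        (for {
            dx &lt;- -1 to 1
            dy &lt;- -1 to 1
            if dx != 0 || dy != 0
        } yield Cell(x + dx, y + dy)).toSet
}

object NeighbourDemo extends App {
    val c = Cell(1, 1)
    assert(Cell(1, 1) == c)              // equality by value, not by reference
    assert(c.neighbours.size == 8)
    assert(!c.neighbours.contains(c))
    println("neighbours of " + c + ": " + c.neighbours.size)
}
&lt;/pre&gt;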
&lt;p&gt;JavaScript was common lingo with my next partner in the second session. Neither of us had used any testing frameworks in JS ever. That straightaway was an exposé! Since we did not have time to learn a new framework we decided to hand-code the tests. I don't remember the code now but we wrote some code that seemed to solve something. And along the way I re-discovered the right way to write inner functions in JavaScript.&lt;/p&gt;
&lt;p&gt;The next session had me pairing with a guy who wanted a sneak-peek into Scala (he had been a Java guy). This time we used my laptop to code. In order to give him a quick flavour of the language I decided to not use any IDE's and used a text-editor. I wanted to stay clear of ScalaTest and Scala's build system - so as to give my partner a good look at some mass of code and not distract him with externalities. We wrote code + tests in a single Scala script file. While coding I had this nagging realisation that I had not understood the problem well enough and wanted some &lt;em&gt;thinking-time&lt;/em&gt; away from coding and just thinking about the puzzle. Lunch was approaching... so I decided to think hard and understand the problem better during lunchtime :-)&lt;/p&gt;
&lt;p&gt;Post-lunch I wanted to approach the problem differently from both a design and a language perspective. Was delighted to find a partner who could code in Python. This guy had put some good think-time into the problem too, and the approach we came up with was very refreshing for me! He coded away in Python, which I found easy to follow, and we wrote a lot of code. We were quite close to solving one of the situations in the game when the time ran out. Writing so much code to solve the problem meant that the &lt;em&gt;test&lt;/em&gt; code was minimal. However I was beginning to appreciate the complexity and the immense freedom of design that this simple-looking problem posed to us.&lt;/p&gt;
&lt;p&gt;There were a lot of Ruby developers in the group and I was very curious to see this language. So I chose a Ruby-man for the last session. This guy was much younger than me but surprised me with his thorough approach to TDD. He insisted on evolving the test code almost simultaneously with the main code, and I must admit it initially frustrated me. I would have wanted to write a blob of test code and then a blob of solution code, and keep alternating, giving good time to each. But my partner, who was the one coding in Ruby, would not have any of it. He requested that we keep shifting gears between the test code and the solution code every few lines. I knew he had a good point in what he was suggesting and asked if he could really program like this in his work time. His answer in the affirmative was very convincing and I silently appreciated the young programmer's discipline.&lt;/p&gt;
&lt;h3 id="epilogue"&gt;Epilogue&lt;/h3&gt;
&lt;p&gt;It's not every day that I force myself to think about good programming problems, and this was an excellent one. That itself made me immensely happy. I was split between a &lt;em&gt;bottom-up&lt;/em&gt; and a &lt;em&gt;top-down&lt;/em&gt; approach to the problem - which had me thinking about an aspect of design that I had not thought about for some time. I don't remember the last time I did pair programming - it's a fabulous thing and I want to grab every opportunity of doing so in future. The code retreat certainly pushed me out of my comfort zone, and that's a great way to learn. Having not solved the problem nags me. One of these days I will sit down to write code that solves Game_Of_Life, to at least some extent, using a few approaches that are brewing in my head. Finally, I recommend Code Retreat to all. Try to get yourself a booking the next time it is happening somewhere nearby. &lt;/p&gt;
&lt;p&gt;Last but not the least - heard some wonderful thoughts by Venkat on the endeavour of software development. And saw a wonderful startup in Multunus at work. The day was a delight - a &lt;strong&gt;Big Thank You&lt;/strong&gt; to all my partners and all those who made the event possible.&lt;/p&gt;</content><category term="posts"/></entry><entry><title>Play2 Application on Wildfly: Why and How</title><link href="https://bharath12345.github.io/posts/play2-on-jboss-wildfly/" rel="alternate"/><published>2014-03-13T00:00:00-04:00</published><updated>2014-03-13T00:00:00-04:00</updated><author><name>Bharadwaj</name></author><id>tag:bharath12345.github.io,2014-03-13:/posts/play2-on-jboss-wildfly/</id><summary type="html">&lt;p&gt;&lt;a href="http://en.wikipedia.org/wiki/Java_Platform,_Enterprise_Edition"&gt;JavaEE&lt;/a&gt; (v5 and v6) has a commanding presence in both marketshare and (developer) mindshare in the enterprise software world. The specifications are well thought-out, battle-tested and highly relied upon. I started using JavaEE (v5) way back in 2007 with JBoss 4.x. The latest release, &lt;a href="http://www.oracle.com/technetwork/java/javaee/tech/index.html"&gt;JavaEE-7&lt;/a&gt;, which was released close …&lt;/p&gt;</summary><content type="html">&lt;p&gt;&lt;a href="http://en.wikipedia.org/wiki/Java_Platform,_Enterprise_Edition"&gt;JavaEE&lt;/a&gt; (v5 and v6) has a commanding presence in both marketshare and (developer) mindshare in the enterprise software world. The specifications are well thought-out, battle-tested and highly relied upon. I started using JavaEE (v5) way back in 2007 with JBoss 4.x. The latest release, &lt;a href="http://www.oracle.com/technetwork/java/javaee/tech/index.html"&gt;JavaEE-7&lt;/a&gt;, which was released close to a year ago, brings with it a lot of worthy changes to the specs and implementations. To bring myself up to speed on it I went through a few books and attended a conference (JUDCon, Bangalore). 
But I have also been coding and acquainting myself with Typesafe's Scala &lt;em&gt;&lt;a href="https://typesafe.com/platform"&gt;reactive&lt;/a&gt;&lt;/em&gt; stack. These two stacks are bound to compete with each other more and more in the coming days. However, I feel they can be used in complementary ways within an application when carefully designed. The competition and challenge to JavaEE-7 stems from two tough requirements -&lt;/p&gt;
&lt;div class="toc"&gt;&lt;span class="toctitle"&gt;Table of Contents&lt;/span&gt;&lt;ul&gt;
&lt;li&gt;&lt;a href="#web-tier-in-javaee-the-loose-brick"&gt;Web Tier In JavaEE - The Loose Brick?&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#play-framework"&gt;Play Framework&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#howto-play2-on-wildfly-to-interop-with-javaee"&gt;HOWTO - Play2 on Wildfly to interop with JavaEE&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
&lt;ol&gt;
&lt;li&gt;Horizontal scalability&lt;/li&gt;
&lt;li&gt;Near real-time persist/process/view of ever increasing data volumes&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;The JavaEE stack is broadly split into 3 tiers - web, business and persistence. JSF (broadly, including expression-lang, JSTL, JSP and Servlets) is the technology of choice (per the specs) in the web tier. And JSF, to me, seems the most vulnerable to &lt;em&gt;not&lt;/em&gt; being able to rise to the two challenges mentioned above. JSF does feel like the &lt;em&gt;loose brick&lt;/em&gt; in the JavaEE stack. And it feels ever more so after spending some time with the &lt;a href="http://www.playframework.com/"&gt;Play Framework&lt;/a&gt;!&lt;/p&gt;
&lt;hr&gt;

&lt;h3 id="web-tier-in-javaee-the-loose-brick"&gt;Web Tier In JavaEE - &lt;em&gt;The Loose Brick?&lt;/em&gt;&lt;/h3&gt;
&lt;p&gt;This was a recent tweet by Peter Thomas -&lt;/p&gt;
&lt;p&gt;&lt;a style="float: left; padding-right: 2em" href="https://twitter.com/ptrthomas/statuses/428460021265887232"&gt;&lt;img alt="image" src="/images/peter%20thomas%20tweet.png"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;br/&gt;
&lt;br/&gt;&lt;/p&gt;
&lt;p&gt;Now to a quick primer on frameworks in JavaEE's web tier. I like to sort the web-tier frameworks of JavaEE applications into 3 groups, based on the broad goals of each library. A quick note on each of these... &lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Component Frameworks&lt;/strong&gt;:
&lt;ul&gt;
&lt;li&gt;Component frameworks like JSF are suited for parts of the application with lot of forms and CRUD operations. JSF is an aggregate of multiple technical pieces which include facelets, expression language, jstl, converters, listeners, validators etc.  JSF helps build composable UI components with server side validation in a scalable way&lt;/li&gt;
&lt;li&gt;It abstracts away a lot of &lt;em&gt;state&lt;/em&gt; information in its stack which is not good for building UI components that serve a lot of &lt;em&gt;read only&lt;/em&gt; and &lt;em&gt;voluminous&lt;/em&gt; data. Tasks like JSON transformation within JSF are not efficient at scale &lt;/li&gt;
&lt;li&gt;Now, rarely do programmers get completely satisfied with the component library within JSF. So they use the richer component frameworks (above and beyond JSF) like Apache Wicket and Tapestry. And for dynamic pages with lot of AJAX there are frameworks like RichFaces and PrimeFaces which provide features atop JSF&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Action Frameworks&lt;/strong&gt;: For &lt;em&gt;read-only&lt;/em&gt; and voluminous data handling atop servlets &lt;em&gt;action&lt;/em&gt; frameworks are preferred which explicitly tie to the HTTP request/response cycle. Action frameworks typically implement the famous MVC pattern for clear separation of concerns. So applications tend to use frameworks like Struts, SpringMVC etc&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Standalone, Proprietary Frameworks&lt;/strong&gt;: These are the ones that are unbelievably beautiful for quick-small projects and unbelievably ugly for large ones. Technologies like JSP, GWT, Dart et al. These are just &lt;em&gt;pure evil&lt;/em&gt; from enterprise products perspective&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;&lt;br/&gt;
&lt;br/&gt;
&lt;br/&gt;&lt;/p&gt;
&lt;hr&gt;

&lt;h3 id="play-framework"&gt;Play Framework&lt;/h3&gt;
&lt;p&gt;Having done quite a bit of programming in JSP and JSF, I found Play to be a breath of fresh air. Web application programmers must spend some time reading and understanding &lt;a href="http://guillaumebort.tumblr.com/post/558830013/why-there-is-no-servlets-in-play"&gt;this blog&lt;/a&gt; by Guillaume Bort on the reasons behind the decision to not write yet another framework atop Java's HttpServlet. My experience with Play comes from building a lookalike of my blog with it (hosted on Heroku &lt;a href="http://bharathplays.herokuapp.com"&gt;here&lt;/a&gt;). I have built my blog on NodeJS and Ruby on Rails as well - and honestly, it took much less time to build it with Play. But the more important question is whether enterprise web tiers should be programmed with Play. Is Play up to the mark for projects of enterprise scale and complexity? My answer is a thumping YES!!&lt;/p&gt;
&lt;p&gt;Let me list the specific features that I found especially useful and important -&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Scala templating with compile time type safety - I have found UI composition to be very intuitive and much simpler than JSF (JSF's composability really feels like a mess when compared with Play!)&lt;/li&gt;
&lt;li&gt;Crisp way to program with Futures and Async to handle action endpoints &lt;/li&gt;
&lt;li&gt;Websockets - Futures, Async - much better API for streaming data than the WebSocket Spec in JavaEE&lt;/li&gt;
&lt;li&gt;Stateless. Easy to use with Akka&lt;/li&gt;
&lt;li&gt;Marshalling/unmarshalling of JSON data without reflection which provides huge performance improvement&lt;/li&gt;
&lt;li&gt;Explicit, clean server side routing methodology (what a mess this is in JavaEE where programmers often mix annotations, xml and sometimes also bring in client-side routing unnecessarily) &lt;/li&gt;
&lt;li&gt;No server side sessions at all! Sessions in Play are all made available through cookies and HTTP headers. So no server side context to worry about &lt;/li&gt;
&lt;li&gt;Built-in build-time JavaScript compilation&lt;/li&gt;
&lt;li&gt;&lt;a href="http://www.webjars.org/"&gt;WebJars&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Less verbose Scala code. The joy of composable functional programming&lt;/li&gt;
&lt;li&gt;Hot deployment during development&lt;/li&gt;
&lt;li&gt;Netty underneath - performance not an issue&lt;/li&gt;
&lt;li&gt;Cloud deployment ready (Heroku, Cloudbees support it)&lt;/li&gt;
&lt;/ul&gt;
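&lt;p&gt;To make the Futures/Async point concrete, here is a minimal sketch of an asynchronous action endpoint, assuming the Play 2.2 Scala API. The controller name &lt;code&gt;Reports&lt;/code&gt; and the helper &lt;code&gt;loadReport&lt;/code&gt; are hypothetical and stand in for any slow lookup; this is an illustration, not code from my application.&lt;/p&gt;
&lt;pre&gt;
import play.api.mvc._
import play.api.libs.concurrent.Execution.Implicits.defaultContext
import scala.concurrent.Future

object Reports extends Controller {

    // Hypothetical slow lookup; stands in for a database or remote service call
    private def loadReport(id: Long): Future[String] = Future { "report-" + id }

    // Action.async takes a block that returns a Future of a result, so the
    // request thread is not blocked while the lookup is in flight
    def show(id: Long) = Action.async {
        loadReport(id).map(body =&gt; Ok(body))
    }
}
&lt;/pre&gt;
&lt;p&gt;The endpoint reads like ordinary code: the Future is mapped into an HTTP response and Play takes care of completing the request when the value arrives.&lt;/p&gt;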
&lt;p&gt;&lt;a href="http://www.slideshare.net/brikis98/composable-and-streamable-play-apps"&gt;This one&lt;/a&gt; is an excellent presentation (and &lt;a href="https://github.com/brikis98/ping-play"&gt;code&lt;/a&gt;) by Yevgeniy Brikman of LinkedIn (LinkedIn uses Play! for multiple web apps in its stack). The title is apt - building web apps that are composable and streamable. More and more, enterprise applications have UI requirements of the kind described here. And building these using JSF/Java would be too complex a web project and IMHO not worth the trouble!    &lt;/p&gt;
&lt;h3 id="howto-play2-on-wildfly-to-interop-with-javaee"&gt;HOWTO - Play2 on Wildfly to interop with JavaEE&lt;/h3&gt;
&lt;p&gt;For very good reasons enterprise applications are generally hosted on application containers, and application containers mostly come with a built-in servlet container for the web frontend. Since Play2 is not Servlet based, does that mean it gets vetoed straightaway for enterprise applications? Not necessarily. If engineers have a little chutzpah, the gap can be bridged. Here is how I was able to host my Play2 application on JBoss-Wildfly -&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;The plugin to study and use for the task is the &lt;a href="https://github.com/dlecan/play2-war-plugin"&gt;Play2War&lt;/a&gt;. Play2War builds a WAR out of a Play2 application. I was then able to deploy this WAR of my Play app on Wildfly and get it to work&lt;/li&gt;
&lt;li&gt;After following the usage/configuration instructions from Play2War's GitHub page, the first thing to do is to configure the &lt;em&gt;excludes&lt;/em&gt;. A number of JARs that get packaged by Play2War clash with JBoss's modules and thus need to be excluded. Here is a quick list of such JARs from my project -
 &lt;br&gt;
&lt;table class="table table-striped table-bordered table-condensed"&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th&gt;Artifact&lt;/th&gt;
      &lt;th&gt;GroupID&lt;/th&gt;
      &lt;th&gt;Version In Play v2.2.1&lt;/th&gt;
      &lt;th&gt;Version In Wildfly v8.0.0-Final&lt;/th&gt;
      &lt;th&gt;Newer&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td&gt;Google Guava&lt;/td&gt;
      &lt;td&gt;com.google.guava&lt;/td&gt;
      &lt;td&gt;14.0.1&lt;/td&gt;
      &lt;td&gt;16.0.1&lt;/td&gt;
      &lt;td&gt;Wildfly&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;Jackson Core, Annotations and Databind&lt;/td&gt;
      &lt;td&gt;com.fasterxml.jackson.core.jackson* &lt;/td&gt;
      &lt;td&gt;v2.2.2&lt;/td&gt;
      &lt;td&gt;v2.3.0&lt;/td&gt;
      &lt;td&gt;Wildfly&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;H2 Database&lt;/td&gt;
      &lt;td&gt;com.h2database.h2*&lt;/td&gt;
      &lt;td&gt;v1.3.172&lt;/td&gt;
      &lt;td&gt;v1.3.173&lt;/td&gt;
      &lt;td&gt;Wildfly&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;Apache Commons Codec&lt;/td&gt;
      &lt;td&gt;org.apache.commons.codec&lt;/td&gt;
      &lt;td&gt;v1.6&lt;/td&gt;
      &lt;td&gt;v1.9&lt;/td&gt;
      &lt;td&gt;Wildfly&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;Apache Commons IO&lt;/td&gt;
      &lt;td&gt;org.apache.commons.io&lt;/td&gt;
      &lt;td&gt;v1.3.2&lt;/td&gt;
      &lt;td&gt;v2.4&lt;/td&gt;
      &lt;td&gt;Wildfly&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;Netty&lt;/td&gt;
      &lt;td&gt;io.netty&lt;/td&gt;
      &lt;td&gt;v3.7.0&lt;/td&gt;
      &lt;td&gt;v4.0.15&lt;/td&gt;
      &lt;td&gt;Wildfly&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;Hibernate Commons Annotations&lt;/td&gt;
      &lt;td&gt;org.hibernate&lt;/td&gt;
      &lt;td&gt;v4.0.2&lt;/td&gt;
      &lt;td&gt;v4.0.4&lt;/td&gt;
      &lt;td&gt;Wildfly&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;Hibernate Core, Entity Manager&lt;/td&gt;
      &lt;td&gt;org.hibernate&lt;/td&gt;
      &lt;td&gt;v4.2.3&lt;/td&gt;
      &lt;td&gt;v4.3.1&lt;/td&gt;
      &lt;td&gt;Wildfly&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;Hibernate Validator&lt;/td&gt;
      &lt;td&gt;org.hibernate&lt;/td&gt;
      &lt;td&gt;v5.0.1&lt;/td&gt;
      &lt;td&gt;v5.0.3&lt;/td&gt;
      &lt;td&gt;Wildfly&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;Javassist&lt;/td&gt;
      &lt;td&gt;org.javassist&lt;/td&gt;
      &lt;td&gt;v3.18.0&lt;/td&gt;
      &lt;td&gt;v3.18.1&lt;/td&gt;
      &lt;td&gt;Wildfly&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;JBoss Logging&lt;/td&gt;
      &lt;td&gt;org.jboss.logging&lt;/td&gt;
      &lt;td&gt;v3.1.1&lt;/td&gt;
      &lt;td&gt;v3.1.4&lt;/td&gt;
      &lt;td&gt;Wildfly&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;JBoss Transaction&lt;/td&gt;
      &lt;td&gt;javax.transaction.api&lt;/td&gt;
      &lt;td&gt;v1.0.0&lt;/td&gt;
      &lt;td&gt;v1.0.1&lt;/td&gt;
      &lt;td&gt;Wildfly&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;Yaml&lt;/td&gt;
      &lt;td&gt;org.yaml.snakeyaml&lt;/td&gt;
      &lt;td&gt;v1.12&lt;/td&gt;
      &lt;td&gt;v1.13&lt;/td&gt;
      &lt;td&gt;Wildfly&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;      
      &lt;td&gt;Antlr&lt;/td&gt;
      &lt;td&gt;org.antlr&lt;/td&gt;
      &lt;td&gt;v2.7.7&lt;/td&gt;
      &lt;td&gt;v2.7.7&lt;/td&gt;
      &lt;td&gt;Same&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;dom4j&lt;/td&gt;
      &lt;td&gt;org.dom4j&lt;/td&gt;
      &lt;td&gt;v1.6.1&lt;/td&gt;
      &lt;td&gt;v1.6.1&lt;/td&gt;
      &lt;td&gt;Same&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;Postgres&lt;/td&gt;
      &lt;td&gt;org.postgres&lt;/td&gt;
      &lt;td&gt;v9.1-901&lt;/td&gt;
      &lt;td&gt;v9.1-901&lt;/td&gt;
      &lt;td&gt;Same&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;javax.validation&lt;/td&gt;
      &lt;td&gt;&lt;/td&gt;
      &lt;td&gt;v1.1.0&lt;/td&gt;
      &lt;td&gt;v1.1.0&lt;/td&gt;
      &lt;td&gt;Same&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;Joda time&lt;/td&gt;
      &lt;td&gt;org.joda.time&lt;/td&gt;
      &lt;td&gt;v2.2&lt;/td&gt;
      &lt;td&gt;v1.6.2&lt;/td&gt;
      &lt;td&gt;Play&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;Apache Commons Lang&lt;/td&gt;
      &lt;td&gt;org.apache.commons.lang&lt;/td&gt;
      &lt;td&gt;v3.1&lt;/td&gt;
      &lt;td&gt;v2.6&lt;/td&gt;
      &lt;td&gt;Play&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;HttpCore&lt;/td&gt;
      &lt;td&gt;org.apache.httpcomponents.&lt;/td&gt;
      &lt;td&gt;v4.3.1&lt;/td&gt;
      &lt;td&gt;v4.2.1&lt;/td&gt;
      &lt;td&gt;Play&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;HttpClient&lt;/td&gt;
      &lt;td&gt;org.apache.httpcomponents.&lt;/td&gt;
      &lt;td&gt;v4.3.2&lt;/td&gt;
      &lt;td&gt;v4.2.1&lt;/td&gt;
      &lt;td&gt;Play&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;Hibernate JPA&lt;/td&gt;
      &lt;td&gt;javax.persistence.api&lt;/td&gt;
      &lt;td&gt;v1.0.1&lt;/td&gt;
      &lt;td&gt;v1.0.0&lt;/td&gt;
      &lt;td&gt;Play&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;Asm&lt;/td&gt;
      &lt;td&gt;asm.asm&lt;/td&gt;
      &lt;td&gt;v4.1&lt;/td&gt;
      &lt;td&gt;v3.3.1&lt;/td&gt;
      &lt;td&gt;Play&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;jcl-over-slf4j&lt;/td&gt;
      &lt;td&gt;org.slf4j&lt;/td&gt;
      &lt;td&gt;v1.7.5&lt;/td&gt;
      &lt;td&gt;v1.7.2&lt;/td&gt;
      &lt;td&gt;Play&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;jul-to-slf4j&lt;/td&gt;
      &lt;td&gt;org.jboss.logging&lt;/td&gt;
      &lt;td&gt;v1.7.5&lt;/td&gt;
      &lt;td&gt;v1.0.1&lt;/td&gt;
      &lt;td&gt;Play&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;slf4j-api&lt;/td&gt;
      &lt;td&gt;org.slf4j&lt;/td&gt;
      &lt;td&gt;v1.7.5&lt;/td&gt;
      &lt;td&gt;v1.7.2&lt;/td&gt;
      &lt;td&gt;Play&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;Xerces&lt;/td&gt;
      &lt;td&gt;org.apache.xerces&lt;/td&gt;
      &lt;td&gt;v2.11&lt;/td&gt;
      &lt;td&gt;v2.9.1&lt;/td&gt;
      &lt;td&gt;Play&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
   &lt;/table&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;One can see from the above table that the versions of many artefacts are newer in Wildfly. I decided to use the newer Wildfly versions and exclude these artefacts from the WAR generated by Play2War. So I included this filtering statement in my project's &lt;em&gt;build.sbt&lt;/em&gt; file -&lt;/p&gt;
&lt;pre&gt;Play2WarKeys.filteredArtifacts ++= Seq(
  ("com.google.guava", "guava"),
  ("com.google.code.findbugs", "findbugs"),
  ("com.fasterxml.jackson.core","jackson-annotations"),
  ("com.fasterxml.jackson.core","jackson-core"),
  ("com.fasterxml.jackson.core","jackson-databind"),
  ("com.fasterxml","classmate"),
  ("commons-codec","commons-codec"),
  ("commons-io","commons-io"),
  ("org.hibernate","hibernate-commons-annotations"),
  ("org.hibernate","hibernate-core"),
  ("org.hibernate","hibernate-entitymanager"),
  ("org.hibernate","hibernate-validator"),
  ("org.hibernate.common","hibernate-commons-annotations"),
  ("org.hibernate.javax.persistence","hibernate-jpa-2.0-api"),
  ("javax.validation","validation-api"),
  ("javax.persistence","persistence-api"),
  ("javax.transaction","transaction-api"),
  ("org.jboss.spec.javax.transaction","jboss-transaction-api_1.1_spec"),
  ("org.jboss.logging","jboss-logging"),
  ("org.jboss.logmanager", "log4j-jboss-logmanager"),
  ("org.springframework","spring-beans"),
  ("org.springframework","spring-context"),
  ("org.springframework","spring-core"),
  ("postgresql","postgresql"),
  ("org.javassist","javassist"),
  ("org.yaml","snakeyaml"),
  ("antlr","antlr"),
  ("com.h2database","h2"),
  ("dom4j","dom4j"),
  ("tyrex","tyrex")
  //("com.jolbox", "bonecp"),
  //("io.netty","netty"),
)&lt;/pre&gt;

&lt;ol start="3"&gt;
&lt;li&gt;The next thing to do is to add a jboss-deployment-structure.xml, where one can specify the newer Wildfly modules of these artefacts to be used for the deployment. This deployment descriptor should be created at the following path in the Play2 project -
 &lt;br&gt;
&lt;pre&gt;
   app/
   conf/
   project/
   war/
   |--WEB-INF
         |--jboss-deployment-structure.xml
   &lt;/pre&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;I used the following setting in this xml -
 &lt;br&gt;
&lt;pre&gt;
    &amp;lt;jboss-deployment-structure&gt;
      &amp;lt;deployment&gt;
        &amp;lt;dependencies&gt;
          &amp;lt;module name="com.google.guava"/&gt;
          &amp;lt;module name="com.fasterxml.jackson.core.jackson-annotations"/&gt;
          &amp;lt;module name="com.fasterxml.jackson.core.jackson-core"/&gt;
          &amp;lt;module name="com.fasterxml.jackson.core.jackson-databind"/&gt;
          &amp;lt;!--module name="com.h2database.h2"/--&gt;
          &amp;lt;module name="org.apache.commons.codec"/&gt;
          &amp;lt;module name="org.apache.commons.io"/&gt;
          &amp;lt;!--module name="io.netty"/--&gt;
          &amp;lt;module name="org.hibernate.commons-annotations"/&gt;
          &amp;lt;module name="org.hibernate"/&gt;
          &amp;lt;module name="org.javassist"/&gt;
          &amp;lt;module name="org.jboss.logging"/&gt;
          &amp;lt;module name="org.yaml.snakeyaml"/&gt;
          &amp;lt;module name="org.antlr"/&gt;
          &amp;lt;module name="org.dom4j"/&gt;
          &amp;lt;module name="org.postgres"/&gt;
          &amp;lt;module name="javax.validation.api"/&gt;
          &amp;lt;module name="javax.persistence.api"/&gt;
          &amp;lt;module name="javax.transaction.api"/&gt;
          &amp;lt;module name="org.glassfish.javaeetutorial.helloservice-api"/&gt;
          &amp;lt;module name="org.jboss.log4j.logmanager"/&gt;
        &amp;lt;/dependencies&gt;
      &amp;lt;/deployment&gt;
    &amp;lt;/jboss-deployment-structure&gt;
  &lt;/pre&gt;
&lt;/p&gt;
&lt;ol start="4"&gt;
&lt;li&gt;I wanted to use Hibernate in my Play project, and I wanted to interact with EJBs and the ActiveMQ messaging service in Wildfly. Firstly, this is very much possible. To use Hibernate, one has to create the persistence.xml in the following structure -
 &lt;br&gt;
&lt;pre&gt;
   war
   |--WEB-INF
         |--classes
               |--META-INF
                    |--persistence.xml
   &lt;/pre&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Since the WAR will be deployed in Wildfly, make sure to read the persistence related docs from its wiki.&lt;/p&gt;
&lt;ol start="5"&gt;
&lt;li&gt;Logging - Play2War's GitHub wiki has a separate section on configuring logging with Wildfly. Make sure to read that. It basically asks for this dependency to be included in the build.sbt -
 &lt;br&gt;
&lt;pre&gt;
   "com.github.play2war.ext"   %% "redirect-playlogger"     % "1.0.1"
   &lt;/pre&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;ol start="6"&gt;
&lt;li&gt;With this configuration the WAR built by Play2War for my Play2 project was around 50MB in size. One has to use JNDI lookups to access the EJBs in the container. The looked up EJBs can be cached in a Scala object to avoid repeated lookups&lt;/li&gt;
&lt;li&gt;One shortcoming that I realised while doing this work was that &lt;strong&gt;websockets will not work&lt;/strong&gt; in this setup. Play2 uses Netty as its HTTP server and Wildfly uses Undertow. The websocket implementation in Wildfly (per the WebSocket 1.0 spec) could be closely tied to Undertow - but I have not read Wildfly's code to say so with certainty. Or maybe, if one can make Wildfly use Netty instead of Undertow as the underlying HTTP server, the websocket communication provided by Play2 would become naturally available. Anyway, this is one shortcoming one has to put up with if one takes this route&lt;/li&gt;
&lt;/ol&gt;
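&lt;p&gt;The JNDI lookup-and-cache idea from the steps above can be sketched roughly as follows. This is only a minimal illustration and not the code from my project: the names &lt;em&gt;LookupCache&lt;/em&gt; and &lt;em&gt;EjbCache&lt;/em&gt; are made up for this example, and the JNDI name and bean interface in the usage note are hypothetical. It assumes the default &lt;em&gt;InitialContext&lt;/em&gt; provided by the container is enough to resolve the beans -&lt;/p&gt;
&lt;pre&gt;
import javax.naming.InitialContext
import scala.collection.concurrent.TrieMap

// Caches whatever `resolve` returns, keyed by JNDI name (hypothetical helper).
class LookupCache(resolve: String =&gt; AnyRef) {
  private val cache = TrieMap.empty[String, AnyRef]

  // The first call for a name goes to `resolve`; later calls are served from the map.
  // Under a race two threads may both resolve the same name, which is harmless here.
  def lookup[T](jndiName: String): T =
    cache.getOrElseUpdate(jndiName, resolve(jndiName)).asInstanceOf[T]
}

// Inside the container, resolve names through the default JNDI context.
object EjbCache extends LookupCache(name =&gt; new InitialContext().lookup(name))
&lt;/pre&gt;
&lt;p&gt;A controller could then call something like &lt;em&gt;EjbCache.lookup[GreeterRemote]("java:global/myapp/GreeterBean")&lt;/em&gt;, where both the interface and the JNDI name are placeholders for whatever the EJB module actually exposes. Keeping the resolver as a constructor argument also makes the cache easy to exercise outside the container.&lt;/p&gt;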
&lt;p&gt;&lt;em&gt;The Final Word&lt;/em&gt; - I really feel Play2 will be a really good fit even in the JavaEE stack a few years down the line when some more bridges will appear around it to make it easily compatible to JavaEE application servers. It can be done even now as I found out. However one should take this plunge very cautiously (at least when using Wildfly). But to me, this is the right way to go... and the right way is generally never easy!&lt;/p&gt;</content><category term="posts"/></entry><entry><title>My Scala Application: Real-time Twitter Volume Grapher For Indian Elections 2014</title><link href="https://bharath12345.github.io/posts/my-scala-application---twitter-volume-grapher-for-indian-election-personalities/" rel="alternate"/><published>2014-02-27T00:00:00-05:00</published><updated>2014-02-27T00:00:00-05:00</updated><author><name>Bharadwaj</name></author><id>tag:bharath12345.github.io,2014-02-27:/posts/my-scala-application---twitter-volume-grapher-for-indian-election-personalities/</id><summary type="html">&lt;p&gt;The Indian general elections are around the corner. For software engineers, this time around, there is data to play with and try to predict the outcome. Among all, the data from the social media giants - Twitter and Facebook - is easily accessible for analysis. Though social media may not be the …&lt;/p&gt;</summary><content type="html">&lt;p&gt;The Indian general elections are around the corner. For software engineers, this time around, there is data to play with and try to predict the outcome. Among all, the data from the social media giants - Twitter and Facebook - is easily accessible for analysis. Though social media may not be the right barometer to judge voter sentiments in a country as big and diverse as India, it is nonetheless a very tempting datasource for anyone curious. 
So a couple of days ago I decided to do a small project - to simply &lt;em&gt;chart&lt;/em&gt; the volume of tweets containing strings like &lt;em&gt;"modi", "rahul", "kejri" and "india"&lt;/em&gt;. I thought just a graph of volumes by itself would be interesting to see. So here I present the v1.0 of my &lt;strong&gt;Indian-general-elections-social-media-tracker!&lt;/strong&gt;&lt;/p&gt;
&lt;div class="toc"&gt;&lt;span class="toctitle"&gt;Table of Contents&lt;/span&gt;&lt;ul&gt;
&lt;li&gt;&lt;a href="#the-application"&gt;The Application&lt;/a&gt;&lt;ul&gt;
&lt;li&gt;&lt;a href="#3-seconds-tweet-aggregate-grapher"&gt;3 Seconds Tweet Aggregate Grapher&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#30-seconds-tweet-aggregate-grapher"&gt;30 Seconds Tweet Aggregate Grapher&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#5-minutes-tweet-aggregate-grapher"&gt;5 Minutes Tweet Aggregate Grapher&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#30-minutes-tweet-aggregate-grapher"&gt;30 Minutes Tweet Aggregate Grapher&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#3-hours-tweet-aggregate-grapher"&gt;3 Hours Tweet Aggregate Grapher&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#design-code-and-logic"&gt;Design, Code and Logic&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#websocket-addendum"&gt;WebSocket Addendum&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#final-note"&gt;Final Note&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
&lt;h3 id="the-application"&gt;The Application&lt;/h3&gt;
&lt;p&gt;The application has 5 different dashboards, each with its own URL. Each of these 5 dashboards has 4 graphs - one for each string (india/modi/rahul/kejri). Here is a quick summary of each dashboard - &lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;h5 id="3-seconds-tweet-aggregate-grapher"&gt;3 Seconds Tweet Aggregate Grapher&lt;/h5&gt;
&lt;/li&gt;
&lt;li&gt;URL: &lt;a href="http://bharathplays.herokuapp.com/twitter/elections/0"&gt;http://bharathplays.herokuapp.com/twitter/elections/0&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Dashboard Sample Image: &lt;br&gt;
       &lt;a href="/images/twitterdashboard/my%20twitter%20dashboard%202014-02-27%2019-05-58.png"&gt;&lt;img alt="image" src="/images/twitterdashboard/my%20twitter%20dashboard%202014-02-27%2019-05-58.png" width="430" height="238"&gt;&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Details: &lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;In this graph, a new data-point gets created every 3 seconds. It appears as a dot on the chart. A mouse-over on the dot shows the exact time and value of the data-point&lt;/li&gt;
&lt;li&gt;The title of each of the four graphs shows the string the graph is for. For example, the line chart for the string &lt;em&gt;rahul&lt;/em&gt; has the title &lt;strong&gt;Twitter Trends Graph for rahul&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;x-axis is time. y-axis is number-of-tweets&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;h5 id="30-seconds-tweet-aggregate-grapher"&gt;30 Seconds Tweet Aggregate Grapher&lt;/h5&gt;
&lt;/li&gt;
&lt;li&gt;URL: &lt;a href="http://bharathplays.herokuapp.com/twitter/elections/1"&gt;http://bharathplays.herokuapp.com/twitter/elections/1&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Dashboard Sample Image: &lt;br&gt;
       &lt;a href="/images/twitterdashboard/my%20twitter%20dashboard%202014-02-27%2019-06-30.png"&gt;&lt;img alt="image" src="/images/twitterdashboard/my%20twitter%20dashboard%202014-02-27%2019-06-30.png" width="430" height="238"&gt;&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Details: &lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;In this graph a new data-point gets created on the chart every 30 seconds&lt;/li&gt;
&lt;li&gt;Refer to the details of 3-seconds chart (above) for other info&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;h5 id="5-minutes-tweet-aggregate-grapher"&gt;5 Minutes Tweet Aggregate Grapher&lt;/h5&gt;
&lt;/li&gt;
&lt;li&gt;URL: &lt;a href="http://bharathplays.herokuapp.com/twitter/elections/2"&gt;http://bharathplays.herokuapp.com/twitter/elections/2&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Dashboard Sample Image:
       &lt;a href="/images/twitterdashboard/my%20twitter%20dashboard%202014-02-27%2019-27-42.png"&gt;&lt;img alt="image" src="/images/twitterdashboard/my%20twitter%20dashboard%202014-02-27%2019-27-42.png" width="430" height="238"&gt;&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Details: &lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;In this graph a new data-point gets created on the chart every 5 minutes (300 seconds)&lt;/li&gt;
&lt;li&gt;Refer to the details of 3-seconds chart (first one above) for other info&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;h5 id="30-minutes-tweet-aggregate-grapher"&gt;30 Minutes Tweet Aggregate Grapher&lt;/h5&gt;
&lt;/li&gt;
&lt;li&gt;URL: &lt;a href="http://bharathplays.herokuapp.com/twitter/elections/3"&gt;http://bharathplays.herokuapp.com/twitter/elections/3&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Dashboard Sample Image:
       &lt;a href="/images/twitterdashboard/my%20twitter%20dashboard%202014-02-27%2023-20-24.png"&gt;&lt;img alt="image" src="/images/twitterdashboard/my%20twitter%20dashboard%202014-02-27%2023-20-24.png" width="430" height="238"&gt;&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Details: &lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;In this graph a new data-point gets created on the chart every 30 minutes (1800 seconds)&lt;/li&gt;
&lt;li&gt;Refer to the details of 3-seconds chart (first one above) for other info&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;h5 id="3-hours-tweet-aggregate-grapher"&gt;3 Hours Tweet Aggregate Grapher&lt;/h5&gt;
&lt;/li&gt;
&lt;li&gt;URL: &lt;a href="http://bharathplays.herokuapp.com/twitter/elections/4"&gt;http://bharathplays.herokuapp.com/twitter/elections/4&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Dashboard Sample Image: [TBD]&lt;/li&gt;
&lt;li&gt;Details: &lt;ul&gt;
&lt;li&gt;In this graph a new data-point gets created on the chart every 3 hours (10800 seconds)&lt;/li&gt;
&lt;li&gt;Refer to the details of 3-seconds chart (first one above) for other info&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
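&lt;p&gt;The five aggregation windows can be pictured with a small sketch like the one below. It is only an illustration of the bucketing idea and not the code from the repository (the real application aggregates as tweets stream in). The object and function names are invented for this example, and it assumes the last URL segment (0 to 4) indexes the windows in the order listed above -&lt;/p&gt;
&lt;pre&gt;
object Windows {
  // Dashboard index (the last URL segment) mapped to its window length in seconds.
  val windowSeconds = Vector(3, 30, 300, 1800, 10800)

  // Start of the window (in epoch seconds) that a tweet timestamp falls into.
  def windowStart(epochSeconds: Long, dashboard: Int): Long = {
    val w = windowSeconds(dashboard)
    epochSeconds - (epochSeconds % w)
  }

  // Number of tweets per window, ordered by time: one entry per dot on the chart.
  def countPerWindow(timestamps: Seq[Long], dashboard: Int): Seq[(Long, Int)] =
    timestamps
      .groupBy(t =&gt; windowStart(t, dashboard))
      .map { case (start, ts) =&gt; (start, ts.size) }
      .toSeq
      .sortBy(_._1)
}
&lt;/pre&gt;
&lt;p&gt;For example, with the 3-second dashboard, tweets arriving at seconds 0, 1, 2, 3, 4 and 31 fall into the windows starting at 0, 3 and 30, with 3, 2 and 1 tweets respectively.&lt;/p&gt;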
&lt;h3 id="design-code-and-logic"&gt;Design, Code and Logic&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;The code is on GitHub &lt;a href="https://github.com/bharath12345/playing"&gt;here&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;It uses &lt;a href="http://www.playframework.com/"&gt;Play! Framework's&lt;/a&gt; capabilities for all UI work which includes templates, URL-routing and WebSockets communication&lt;/li&gt;
&lt;li&gt;To bind to Twitter's stream firehose it uses &lt;a href="http://spray.io/"&gt;Spray.IO&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;All the code is in Scala. It does &lt;em&gt;not&lt;/em&gt; manage threads directly and instead uses the actor model of concurrency via &lt;a href="http://akka.io/"&gt;Akka&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Play's capability to do &lt;a href="http://www.playframework.com/documentation/2.2.x/api/scala/index.html#play.api.mvc.WebSocket$"&gt;async&lt;/a&gt; websocket concurrent &lt;a href="http://www.playframework.com/documentation/2.2.x/api/scala/index.html#play.api.libs.iteratee.Concurrent$"&gt;broadcast&lt;/a&gt; is leveraged&lt;/li&gt;
&lt;li&gt;It connects to the &lt;a href="https://dev.twitter.com/docs/api/1.1/post/statuses/filter"&gt;filter&lt;/a&gt; API, one of Twitter's many streaming APIs&lt;/li&gt;
&lt;li&gt;The part of code which connects to Twitter's streaming APIs and retrieves individual tweets is an extension of &lt;a href="http://www.cakesolutions.net/teamblogs/2013/12/08/streaming-twitter-api-in-akka-and-spray/"&gt;this&lt;/a&gt; example by &lt;a href="https://twitter.com/honzam399"&gt;Jan Machacek&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;This application does NOT do sentiment analysis but is a brute-force volume grapher. So a higher count for a candidate does not imply positive popularity but just that his name is trending more. &lt;/li&gt;
&lt;li&gt;A keyword string, say &lt;em&gt;Rahul&lt;/em&gt;, might appear anywhere in the tweet. That is, it could be anywhere in the text, or may be a part of the hashtag, or may even be part of a user handle that appears in the tweet&lt;/li&gt;
&lt;li&gt;While trying to find a match, I convert the whole tweet to lowercase and then use Java's String &lt;em&gt;contains()&lt;/em&gt; to find a match. So the matching is case insensitive - the count for string &lt;em&gt;Rahul&lt;/em&gt; includes those for &lt;em&gt;rahul&lt;/em&gt;, &lt;em&gt;raHul&lt;/em&gt;, &lt;em&gt;rahuL&lt;/em&gt; etc. The matching string could also be a substring. So &lt;em&gt;modi&lt;/em&gt; will match &lt;em&gt;modified&lt;/em&gt;, &lt;em&gt;moditva&lt;/em&gt;, &lt;em&gt;amodi&lt;/em&gt;, etc. However I took a dump of over 1000 tweets to see how many of those captured did NOT belong to Indian elections - only to find that almost 80% of all tweets captured did concern these candidates and hence elections (you have to take my word on that!)&lt;/li&gt;
&lt;li&gt;One of the main motivations behind the design is to keep it lightweight. Twitter data is voluminous, as can be seen from the counts. So the challenge is to serve a huge number of web clients while also processing the incoming data. By &lt;em&gt;removing&lt;/em&gt; the HTTP request-response loop for each update of the graph, a potentially big saving is achieved. Further, data for all 4 graphs in the dashboard is &lt;em&gt;multiplexed&lt;/em&gt; over a single WebSocket channel. So every browser client has a single WebSocket channel to the server. This again is a big saving: if AJAX were used, updating each of the 4 graphs would have required one client-to-server call each, which gets very expensive as the number of browser clients increases&lt;/li&gt;
&lt;li&gt;Actors are a beautiful message-passing abstraction which makes erstwhile tasks like managing threads and pools redundant. Please refer to the Akka documentation to learn about this paradigm of programming&lt;/li&gt;
&lt;li&gt;The whole application is hosted on Heroku. Heroku allows hosting of Play 2.0 applications and also provides WebSocket support. So running this application costs me nothing! :-)&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;The design flow essence is captured in this image - &lt;/p&gt;
&lt;p&gt;&lt;a href="/images/twitterdashboard/algo.png"&gt;&lt;img alt="image" src="/images/twitterdashboard/algo.png" width="1000" height="200"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
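&lt;p&gt;The matching rule described in the list above (lowercase the tweet, then do a plain substring check) can be sketched as below. This is a simplified illustration and not the code from the repository. The object and function names are made up for this example -&lt;/p&gt;
&lt;pre&gt;
object KeywordMatch {
  val keywords = Seq("india", "modi", "rahul", "kejri")

  // Lowercase once, then do a plain substring search: "Modified" also counts for "modi".
  def matches(tweet: String): Seq[String] = {
    val text = tweet.toLowerCase
    keywords.filter(k =&gt; text.contains(k))
  }

  // Tally how many tweets mention each keyword; one tweet can count for several keywords.
  def tally(tweets: Seq[String]): Map[String, Int] =
    keywords.map(k =&gt; k -&gt; tweets.count(t =&gt; matches(t).contains(k))).toMap
}
&lt;/pre&gt;
&lt;p&gt;The substring check is what keeps this brute-force: it is cheap per tweet, at the cost of false positives such as &lt;em&gt;modified&lt;/em&gt;. A word-boundary match would be stricter but would also miss hashtags and handles that embed the name.&lt;/p&gt;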
&lt;h3 id="websocket-addendum"&gt;WebSocket Addendum&lt;/h3&gt;
&lt;p&gt;The graph may not appear if you are behind a proxy that does not tunnel WebSockets (as on some office networks) or if a firewall blocks WebSockets. If you run into either of these issues, you could use your &lt;strong&gt;mobile device&lt;/strong&gt; to see the dashboard. Here is a screenshot from my Android Samsung S2 on my home Wifi. I also checked that the graphing works fairly well on my Airtel 2G network (the dots in the image below are some mess-up by the mobile screenshot tool)&lt;/p&gt;
&lt;p&gt;&lt;a href="/images/twitterdashboard/2014_02_27_19.51.35.png"&gt;&lt;img alt="image" src="/images/twitterdashboard/2014_02_27_19.51.35.png" width="220" height="200"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h3 id="final-note"&gt;Final Note&lt;/h3&gt;
&lt;p&gt;These graphs are just volumetric. I plan to do some simple sentiment analysis next. However, by looking at the graphs and tweets behind them, it is heartening to see the order of popularity of each string. &lt;em&gt;India&lt;/em&gt; is most popular among the four but next comes &lt;em&gt;Modi&lt;/em&gt; and it is generally not far behind. &lt;em&gt;Rahul&lt;/em&gt; seems to appear more than &lt;em&gt;Kejri&lt;/em&gt; but both these strings trail a long way behind &lt;em&gt;Modi&lt;/em&gt;. With me being a diehard Sri Narendra Modi supporter, these graphs and numbers certainly make me happy and hopefully bode well for the good times to come for my country :-) &lt;/p&gt;
&lt;h3 id="laws-and-theorems"&gt;Laws and Theorems&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;&lt;a href="http://en.wikipedia.org/wiki/Amdahl's_law"&gt;Amdahl's Law&lt;/a&gt;&lt;/strong&gt;: used to find the maximum expected improvement to an overall system when only part of the system is improved&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;&lt;a href="http://en.wikipedia.org/wiki/Metcalfe's_law"&gt;Metcalfe's Law&lt;/a&gt;&lt;/strong&gt;: the value of a network is proportional to the square of the number of connected users of the system (n&lt;sup&gt;2&lt;/sup&gt;)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;&lt;a href="http://en.wikipedia.org/wiki/Conway's_law"&gt;Conway's Law&lt;/a&gt;&lt;/strong&gt;: Any piece of software reflects the organizational structure that produced it&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;&lt;a href="http://en.wikipedia.org/wiki/Grosch%27s_law"&gt;Grosch's Law&lt;/a&gt;&lt;/strong&gt;: Computer performance increases as the square of the cost. If computer A costs twice as much as computer B, you should expect computer A to be four times as fast as computer B&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;&lt;a href="http://en.wikipedia.org/wiki/Little's_law"&gt;Little's Law&lt;/a&gt;&lt;/strong&gt;: The long-term average number of customers in a stable system L is equal to the long-term average effective arrival rate, λ, multiplied by the (Palm‑)average time a customer spends in the system, W; or expressed algebraically: L = λW.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;&lt;a href="http://en.wikipedia.org/wiki/Gustafson%27s_law"&gt;Gustafson's Law&lt;/a&gt;&lt;/strong&gt;: Any sufficiently large problem can be efficiently parallelized&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;&lt;a href="http://www.netlingo.com/word/gilders-law.php"&gt;Gilder's Law&lt;/a&gt;&lt;/strong&gt;: Bandwidth grows at least three times faster than computer power&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;&lt;a href="http://en.wikipedia.org/wiki/Mooers%27_law"&gt;Mooers' Law&lt;/a&gt;&lt;/strong&gt;: An information retrieval system will not be used if it is more painful for the user to have information than not to have it&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;&lt;a href="http://en.wikipedia.org/wiki/Parkinson%27s_law"&gt;Parkinson's Law&lt;/a&gt;&lt;/strong&gt;: Programs expand to fill all available memory&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;&lt;a href="http://en.wikipedia.org/wiki/Miller%27s_law"&gt;Miller's Law&lt;/a&gt;&lt;/strong&gt;: All discussions of incremental updates to Bugzilla will eventually trend towards proposals for large scale redesigns or feature additions or replacements for Bugzilla&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;&lt;a href="http://en.wikipedia.org/wiki/Wirth%27s_law"&gt;Wirth's Law&lt;/a&gt;&lt;/strong&gt;: Software gets slower faster than hardware gets faster&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;&lt;a href="http://en.wikipedia.org/wiki/Brooks%27_law"&gt;Brooks' Law&lt;/a&gt;&lt;/strong&gt;: Adding manpower to a late software project makes it later&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;&lt;a href="http://en.wikipedia.org/wiki/Greenspun%27s_tenth_rule"&gt;Greenspun's Tenth Rule&lt;/a&gt;&lt;/strong&gt;: Any sufficiently complicated C or Fortran program contains an ad hoc, informally specified, bug-ridden, slow implementation of half of Common Lisp&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;&lt;a href="http://www.slideshare.net/pieterh/fosdem-2011-0mq"&gt;Hintjens' Law of Concurrency&lt;/a&gt;&lt;/strong&gt;: &lt;em&gt;e = mc&lt;sup&gt;2&lt;/sup&gt;&lt;/em&gt;, where e = effort, m = mass of code, c = colliding threads&lt;/li&gt;
&lt;/ul&gt;
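&lt;p&gt;Two of the laws above are simple enough to put into numbers. The sketch below uses the standard textbook forms of Amdahl's Law (speedup = 1 / ((1 - p) + p / s), where p is the fraction of work that is improved and s is the factor by which it is improved) and Little's Law (L = λW). The object and function names are made up for this example -&lt;/p&gt;
&lt;pre&gt;
object Laws {
  // Amdahl's Law: overall speedup when a fraction p of the work is sped up by a factor s.
  def amdahlSpeedup(p: Double, s: Double): Double =
    1.0 / ((1.0 - p) + p / s)

  // Little's Law: average number in the system = arrival rate * average time in the system.
  def littleL(arrivalRate: Double, timeInSystem: Double): Double =
    arrivalRate * timeInSystem
}
&lt;/pre&gt;
&lt;p&gt;So doubling the speed of half of a program gives an overall speedup of only about 1.33, and a service receiving 10 requests per second that each spend 0.5 seconds in the system holds 5 requests on average.&lt;/p&gt;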
&lt;h3 id="aphorisms-and-quotes"&gt;Aphorisms and Quotes&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;Carnegie Mellon's Professor Richard Pattis's &lt;a href="https://www.cs.cmu.edu/~pattis/quotations.html"&gt;collection&lt;/a&gt;  &lt;/li&gt;
&lt;li&gt;Yale's tribute to Alan Perlis - &lt;a href="http://www.cs.yale.edu/quotes.html"&gt;Epigrams in Computing&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Edsger Dijkstra's, &lt;a href="http://www.cs.uofs.edu/~mccloske/dijkstra_quotes.html"&gt;Quotes&lt;/a&gt; &lt;/li&gt;
&lt;li&gt;David Wiseman's, &lt;a href="http://www.csd.uwo.ca/~magi/personal/humour/Computer_Audience/The%20Laws%20of%20Computing.html"&gt;Laws of Computing&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Fred Brooks's, &lt;a href="http://courses.cs.vt.edu/~cs1104/HLL/Brooks.html"&gt;Quotes&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Don Knuth's, &lt;a href="http://en.wikiquote.org/wiki/Donald_Knuth"&gt;Quotes&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Tony Hoare's, &lt;a href="https://www.goodreads.com/author/quotes/266154.C_A_R_Hoare"&gt;Quotes&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Quotes &lt;a href="http://www.paulgraham.com/quo.html"&gt;listed&lt;/a&gt; on Paul Graham's site&lt;/li&gt;
&lt;li&gt;Murphy's &lt;a href="http://www.murphys-laws.com/murphy/murphy-computer.html"&gt;Computer Laws&lt;/a&gt; &lt;/li&gt;
&lt;/ul&gt;</content><category term="posts"/></entry><entry><title>Streaming Twitter On Play + Spray Scala App</title><link href="https://bharath12345.github.io/posts/streaming-twitter-on-play--spray-scala-app/" rel="alternate"/><published>2014-01-29T00:00:00-05:00</published><updated>2014-01-29T00:00:00-05:00</updated><author><name>Bharadwaj</name></author><id>tag:bharath12345.github.io,2014-01-29:/posts/streaming-twitter-on-play--spray-scala-app/</id><summary type="html">&lt;p&gt;Working on &lt;a href="http://bharathwrites.in/posts/scala-projects-in-the-making/#scalog"&gt;Scalog&lt;/a&gt; I decided to write a quick program couple of days ago to see the super trending twitter hashtag - #ArnabVsRahul. I initially tried to follow the hashtag on TweetDeck but found that the arrival rate of new tweets simply did not allow me to read. Wanted a way …&lt;/p&gt;</summary><content type="html">&lt;p&gt;Working on &lt;a href="http://bharathwrites.in/posts/scala-projects-in-the-making/#scalog"&gt;Scalog&lt;/a&gt; I decided to write a quick program couple of days ago to see the super trending twitter hashtag - #ArnabVsRahul. I initially tried to follow the hashtag on TweetDeck but found that the arrival rate of new tweets simply did not allow me to read. Wanted a way to read the tweets page-by-page with each page reloading when I refresh. So wrote a program to do so - A twitter stream listener! And yesterday pushed the code to GitHub and this is a quick post on it. The code itself can be found &lt;a href="https://github.com/bharath12345/playing"&gt;here&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;I am building Scalog on the &lt;a href="http://www.playframework.com/"&gt;Play2&lt;/a&gt;! framework in Scala. The blog is hosted on Heroku and can be accessed at &lt;a href="http://bharathplays.herokuapp.com/"&gt;http://bharathplays.herokuapp.com/&lt;/a&gt;. The blog itself is just a Scala replica of my Jekyll and NodeJS blogs; there is nothing special in the blog part.&lt;/p&gt;
&lt;p&gt;To listen to the tweets I use &lt;a href="http://spray.io/"&gt;Spray's&lt;/a&gt; HTTP actor listeners. The Spray HTTP client connects to Twitter's stream service URL and waits for the chunked responses on a persistent connection. Every new tweet arrives as a chunk. I simply push the tweets to a ByteArrayStream and read it later in Play's &lt;a href="http://www.playframework.com/documentation/2.2.x/ScalaStream"&gt;streaming&lt;/a&gt; to send it to a requesting browser.&lt;/p&gt;
&lt;p&gt;My twitter streamer can be accessed by using this link stub - http://bharathplays.herokuapp.com/twitter/go/ followed by the term to search. For example, to read tweets with ArnabVsRahul, use the URL - &lt;a href="http://bharathplays.herokuapp.com/twitter/go/ArnabVsRahul"&gt;http://bharathplays.herokuapp.com/twitter/go/ArnabVsRahul&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;The URL needs to be refreshed once for the streaming to actually begin. The first access returns no data, just as a check. Not all search strings will necessarily produce results, so it's better to search for permanently high trending strings like "india" in case you see no data.&lt;/p&gt;
&lt;p&gt;&lt;a href="http://www.cakesolutions.net/teamblogs/2013/12/08/streaming-twitter-api-in-akka-and-spray/"&gt;This blog&lt;/a&gt; came in handy while trying to understand how to stream twitter data using Spray's HTTP client's capabilities.&lt;/p&gt;</content><category term="posts"/></entry><entry><title>My Scala Projects In The Making</title><link href="https://bharath12345.github.io/posts/scala-projects-in-the-making/" rel="alternate"/><published>2013-12-25T00:00:00-05:00</published><updated>2013-12-25T00:00:00-05:00</updated><author><name>Bharadwaj</name></author><id>tag:bharath12345.github.io,2013-12-25:/posts/scala-projects-in-the-making/</id><summary type="html">&lt;p&gt;Over the last year I often heard my friends say the era of &lt;a href="http://en.wikipedia.org/wiki/Massive_open_online_course"&gt;MOOC&lt;/a&gt; was truly upon us. It was only on taking up couple of Coursera courses did I realise it fully. They have been eye-opening many times over (and extremely rigorous). Would particularly recommend these two to anyone …&lt;/p&gt;</summary><content type="html">&lt;p&gt;Over the last year I often heard my friends say the era of &lt;a href="http://en.wikipedia.org/wiki/Massive_open_online_course"&gt;MOOC&lt;/a&gt; was truly upon us. It was only on taking up couple of Coursera courses did I realise it fully. They have been eye-opening many times over (and extremely rigorous). Would particularly recommend these two to anyone wanting to understand programming for the multicore, realtime, big-data world -&lt;/p&gt;
&lt;div class="toc"&gt;&lt;span class="toctitle"&gt;Table of Contents&lt;/span&gt;&lt;ul&gt;
&lt;li&gt;&lt;a href="#gbridge"&gt;GBridge&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#scalog"&gt;ScaLog&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#webflow"&gt;WebFlow&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://class.coursera.org/progfun-003"&gt;Functional Programming Principles In Scala&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://class.coursera.org/reactive-001"&gt;Principles Of Reactive Programming&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;I have been trudging on and off with Scala for the latter half of 2013, writing many small programs to understand the core concepts. But doing these two courses has put me on very firm footing. The courses had me working on 12 solid assignments, and not a single one of them took me less than a couple of days. The assignments cover a lot of ground, including -&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Composing non-trivial higher order functions&lt;/li&gt;
&lt;li&gt;Mixing object oriented with functional&lt;/li&gt;
&lt;li&gt;Usage of Scala collections along with language built-ins like pattern-matching&lt;/li&gt;
&lt;li&gt;Testing with ScalaTest and ScalaCheck&lt;/li&gt;
&lt;li&gt;Using RxJava and Observables on a non-trivial data-set&lt;/li&gt;
&lt;li&gt;Using Akka for Actor based concurrency&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Inspired by these assignments, I have been working on a few of my own ideas. Specifically, three projects (all of which are still in their infancy). However, as I head out with family for a vacation to usher in the new year, I thought of writing this as a post-it on my web-wall. One of my new year resolutions is to invest more time and energy into these projects.&lt;/p&gt;
&lt;hr&gt;

&lt;h3 id="gbridge"&gt;GBridge&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Project Goal&lt;/strong&gt;: Data bridge between &lt;a href="http://ganglia.info/"&gt;Ganglia&lt;/a&gt; (gmond) and &lt;a href="http://zeromq.org/"&gt;ZeroMQ&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;The Why&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;em&gt;Why Ganglia?&lt;/em&gt; Because it is (probably) the world's most popular open-source data collection tool for large data centres&lt;/li&gt;
&lt;li&gt;&lt;em&gt;Why ZeroMQ?&lt;/em&gt; Because it is (probably) the world's most popular open-source data-bus for high volumes, with APIs in most programming languages&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Specifics&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;Ganglia's &lt;em&gt;gmond&lt;/em&gt; agent responds with cluster wide metric health on TCP in XML. GBridge polls this data&lt;/li&gt;
&lt;li&gt;GBridge can collect data from multiple clusters and &lt;em&gt;any&lt;/em&gt; or &lt;em&gt;random&lt;/em&gt; host within the cluster&lt;/li&gt;
&lt;li&gt;GBridge is optimised for minimum polling of &lt;em&gt;gmond&lt;/em&gt;&lt;/li&gt;
&lt;li&gt;Each metric is published only once (and as a separate message) per polling cycle on ZeroMQ&lt;/li&gt;
&lt;li&gt;Each metric is published as JSON&lt;/li&gt;
&lt;li&gt;Use actor based concurrency and futures for polling multiple gmond nodes, parsing response and publishing on ZeroMQ&lt;/li&gt;
&lt;li&gt;Completely in Scala&lt;/li&gt;
&lt;li&gt;Graceful degradation on load. Support distribution, automatic recovery on errors and failover&lt;/li&gt;
&lt;li&gt;Going ahead support &lt;a href="http://collectd.org/"&gt;Collectd&lt;/a&gt; on data ingress side. Support writing to &lt;a href="http://opentsdb.net/"&gt;OpenTSDB&lt;/a&gt; on the data egress side&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Code Status&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/bharath12345/gBridge"&gt;Coded&lt;/a&gt; the data collection, parsing and publish to ZeroMQ&lt;/li&gt;
&lt;li&gt;Tested only for small loads&lt;/li&gt;
&lt;li&gt;Very little unit test code&lt;/li&gt;
&lt;li&gt;Yet to design for distribution, recovery and failover&lt;/li&gt;
&lt;/ul&gt;
&lt;hr&gt;

&lt;h3 id="scalog"&gt;ScaLog&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Project Goal&lt;/strong&gt;: &lt;a href="http://jekyllrb.com/"&gt;Jekyll&lt;/a&gt; or &lt;a href="http://jsantell.github.io/poet/"&gt;PoetJS&lt;/a&gt; like markdown based static blogger in Scala&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;The Why&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;Scala lends itself better to server-side coding. Learn and implement a full-stack web application in Scala&lt;/li&gt;
&lt;li&gt;For larger blogs, features like full text search can be much faster in Scala than Ruby or JavaScript&lt;/li&gt;
&lt;li&gt;Apart from a human-readable HTML interface, also provide a machine-readable RESTful interface&lt;/li&gt;
&lt;li&gt;Option to store the markdown in flat files on the server side or source it from an RDBMS (PostgreSQL)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Specifics&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;ScaLog uses &lt;a href="http://spray.io/"&gt;Spray&lt;/a&gt; for HTTP Server side (for both RESTful interface and HTML pages)&lt;/li&gt;
&lt;li&gt;ScaLog uses &lt;a href="https://github.com/sirthias/pegdown"&gt;pegdown&lt;/a&gt; for markdown processing &lt;/li&gt;
&lt;li&gt;ScaLog uses &lt;a href="http://slick.typesafe.com/"&gt;Slick&lt;/a&gt; to read and write to RDBMS from Scala (ORM like)&lt;/li&gt;
&lt;li&gt;Cloud platforms for applications like Heroku are the main target for deployment &lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Code Status&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;The &lt;a href="https://github.com/bharath12345/myspray"&gt;code&lt;/a&gt; for CRUD (post/get/put/delete) operations for the blog with RESTful URLs is complete up to proof-of-concept&lt;/li&gt;
&lt;li&gt;The code for CRUD at the database layer also done&lt;/li&gt;
&lt;li&gt;pegdown parsing of markdown complete&lt;/li&gt;
&lt;li&gt;Work needed to easily extend the URLs, support UI templating and much more&lt;/li&gt;
&lt;/ul&gt;
&lt;hr&gt;

&lt;h3 id="webflow"&gt;WebFlow&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Project Goal&lt;/strong&gt;: &lt;a href="http://en.wikipedia.org/wiki/NetFlow"&gt;NetFlow&lt;/a&gt; like UDP export of ingress-egress data at Web-Servers &lt;/li&gt;
&lt;li&gt;&lt;strong&gt;The Why&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;Gone are the &lt;a href="http://www.kegel.com/c10k.html"&gt;C10K&lt;/a&gt; problems. We are now in the world of &lt;a href="http://c10m.robertgraham.com/p/manifesto.html"&gt;C10M&lt;/a&gt; and beyond. With such high volume of connections, to account for all the request-responses hitting the web-servers, it is not sufficient to use polling based (JMX like) or log based mechanisms. Dictionary export mechanisms are valid contenders when the volumes are so large&lt;/li&gt;
&lt;li&gt;All the good reasons of why Netflow/Sflow are wonderful methods for volume accounting (at high volumes) at switch/router level &lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Specifics&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;Plug-in for Jetty/Netty/Spray/Servlet containers&lt;/li&gt;
&lt;li&gt;Completely Scala. Akka actor based&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Code Status&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;Design done. Yet to start coding&lt;/li&gt;
&lt;/ul&gt;</content><category term="posts"/></entry><entry><title>Why Learn Scala?</title><link href="https://bharath12345.github.io/posts/why-learn-scala/" rel="alternate"/><published>2013-12-11T00:00:00-05:00</published><updated>2013-12-11T00:00:00-05:00</updated><author><name>Bharadwaj</name></author><id>tag:bharath12345.github.io,2013-12-11:/posts/why-learn-scala/</id><summary type="html">&lt;p&gt;It was a long time ago that I read this masterpiece by the software engineering guru, Peter Norvig - &lt;a href="http://norvig.com/21-days.html#answers"&gt;Teach Yourself Programming In Ten Years&lt;/a&gt;. Peter advises wannabe programmers to learn at least half a dozen programming languages. Taking stock of myself earlier this year I realised having terribly missed out …&lt;/p&gt;</summary><content type="html">&lt;p&gt;It was a long time ago that I read this masterpiece by the software engineering guru, Peter Norvig - &lt;a href="http://norvig.com/21-days.html#answers"&gt;Teach Yourself Programming In Ten Years&lt;/a&gt;. Peter advises wannabe programmers to learn at least half a dozen programming languages. Taking stock of myself earlier this year I realised having terribly missed out. In my decade long career, I have worked deeply in only 4 languages - C++, Java, JavaScript and Perl. And &lt;em&gt;none&lt;/em&gt; of them strongly functional per Peter's advise (functional JavaScript hasn't come to me yet). This led me to pose two questions to myself - &lt;/p&gt;
&lt;div class="toc"&gt;&lt;span class="toctitle"&gt;Table of Contents&lt;/span&gt;&lt;ul&gt;
&lt;li&gt;&lt;a href="#scala-is-a-great-mix"&gt;Scala Is A Great Mix&lt;/a&gt;&lt;ul&gt;
&lt;li&gt;&lt;a href="#haskell-and-scala"&gt;Haskell and Scala&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#erlang-and-scala"&gt;Erlang and Scala&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#c-java-and-scala"&gt;C#, Java and Scala&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#scala-ecosystem"&gt;Scala Ecosystem&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#unlearning-and-relearning-programming"&gt;Unlearning and Relearning Programming&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
&lt;ul&gt;
&lt;li&gt;Why do I need another language? &lt;/li&gt;
&lt;li&gt;If I have to pick one, then, which one?&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The answer to the first question came to me rather quickly. At that time I was exploring what was new with JVM-7, and what was expected of Java and JVM-8. JVM-7 with its &lt;em&gt;invoke dynamic&lt;/em&gt; and Java-8 with &lt;em&gt;lambdas&lt;/em&gt; were clearly pointing the finger in a certain direction. I realised the designers of the JVM had started to embrace polyglot and functional programming. Digging deeper, the reasons for this move were easy to see. Java's issues with type-safety, lack of immutable collections (in the JDK), rampant usage of shared mutability etc., were beginning to weigh heavy. The distributed, multicore, big-data, realtime computing world was making Java look a little too verbose, justifying the need to look for alternatives.&lt;/p&gt;
&lt;p&gt;Surprisingly, the second question turned out to be the tougher of the two. The choice essentially was between Groovy, Scala and Clojure. I chose Scala. My pre-learning decision has been richly rewarded by what I have learnt after taking the plunge into Scala. Even as I continue to make the (sometimes) steep climb, this is a small, humble attempt to articulate the amazing things I have learnt. This write-up is a little too theoretical. For &lt;em&gt;show-me-the-code&lt;/em&gt; types, I will soon write about a &lt;em&gt;not-so-small&lt;/em&gt; 3-tier (DB &amp;lt;=&amp;gt; Biz &amp;lt;=&amp;gt; UI) application I have built entirely with Scala. &lt;/p&gt;
&lt;p&gt;In this post I give three broad reasons for &lt;em&gt;Why Learn Scala&lt;/em&gt; -&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Scala is a great mix. It imbibes some of the best features of other popular, successful languages&lt;/li&gt;
&lt;li&gt;Scala ecosystem of frameworks/libraries is big, mature, created by some great people from academia/industry and very well documented and supported&lt;/li&gt;
&lt;li&gt;Some features of Scala that have made me a more thoughtful, better programmer&lt;/li&gt;
&lt;/ol&gt;
&lt;hr&gt;

&lt;h3 id="scala-is-a-great-mix"&gt;Scala Is A Great Mix&lt;/h3&gt;
&lt;p&gt;Scala brings many original ideas with it for a Java programmer. New ideas like the implementation of persistent data-structures on the JVM, mixing object-oriented with functional, deconstructing objects with pattern matching and many refreshing ideas to lessen code verbosity. But given its academic roots, these were expected. What's nice is that Scala also brings with it some of the best features from at least 4 other popular, well designed programming languages -&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Haskell&lt;/li&gt;
&lt;li&gt;Erlang&lt;/li&gt;
&lt;li&gt;C#&lt;/li&gt;
&lt;li&gt;Java&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Now to quickly dig into what it brings from each of these.&lt;/p&gt;
&lt;h4 id="haskell-and-scala"&gt;Haskell and Scala&lt;/h4&gt;
&lt;p&gt;These are two interesting Hammer Principle surveys -&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="http://hammerprinciple.com/therighttool/statements/learning-this-language-improved-my-ability-as-a-pr"&gt;Learning This Language Improved My Ability As A Programmer&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="http://hammerprinciple.com/therighttool/statements/learning-this-language-significantly-changed-how-i"&gt;Learning This Language Significantly Changed How I Use Other Languages&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Haskell tops both these lists. Scala, at its very core, incorporates a lot of Haskell's good features into itself. Here is a short quick list -&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Type Inference&lt;/li&gt;
&lt;li&gt;First Class Functions&lt;/li&gt;
&lt;li&gt;Currying&lt;/li&gt;
&lt;li&gt;Lazy Evaluation&lt;/li&gt;
&lt;li&gt;List Comprehensions&lt;/li&gt;
&lt;li&gt;Immutability&lt;/li&gt;
&lt;li&gt;Algebraic Data Types&lt;/li&gt;
&lt;li&gt;Higher-Kinded Types&lt;/li&gt;
&lt;li&gt;Monads&lt;/li&gt;
&lt;/ol&gt;
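&lt;p&gt;A few of these fit in a handful of lines. The following is only a minimal sketch using the Scala standard library; the names &lt;code&gt;HaskellFlavours&lt;/code&gt;, &lt;code&gt;Shape&lt;/code&gt;, &lt;code&gt;area&lt;/code&gt; and &lt;code&gt;scale&lt;/code&gt; are made up for illustration and do not come from any real project.&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-scala"&gt;object HaskellFlavours {
  // Algebraic data type: a sealed trait with a closed set of case classes
  sealed trait Shape
  case class Circle(radius: Double) extends Shape
  case class Rect(width: Double, height: Double) extends Shape

  // Pattern matching deconstructs each case of the data type
  def area(s: Shape): Double = s match {
    case Circle(r)  =&amp;gt; math.Pi * r * r
    case Rect(w, h) =&amp;gt; w * h
  }

  // Currying: parameters are supplied one list at a time
  def scale(factor: Int)(x: Int): Int = factor * x

  def main(args: Array[String]): Unit = {
    // Type inference and immutability: numbers is an immutable List[Int]
    val numbers = List(1, 2, 3, 4, 5)

    // First-class function stored in a value
    val square = (x: Int) =&amp;gt; x * x

    // Supplying only the first parameter list of the curried function
    val twice: Int =&amp;gt; Int = scale(2)

    // Lazy evaluation: the block runs only when expensive is first read
    lazy val expensive = { println("computing..."); 42 }

    // For-comprehension, Scala's counterpart of a list comprehension
    val evenSquares = for (n &amp;lt;- numbers if n % 2 == 0) yield square(n)

    println(evenSquares)            // List(4, 16)
    println(twice(21))              // 42
    println(area(Rect(2.0, 3.0)))   // 6.0
    println(expensive)              // prints "computing..." first, then 42
  }
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Monads are left out of the sketch on purpose; the for-comprehension above is the same syntax Scala uses to chain monadic types such as Option.&lt;/p&gt;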
&lt;p&gt;One question that begs an answer - If Haskell is so good, then why not use Haskell itself? Why go for Scala? If that's an option then I would definitely encourage the reader to go ahead. But to those like me who love and trust the JVM, want interoperability with Java for its ecosystem of libraries and have an overarching need for platform independence, Scala is a welcome choice.&lt;/p&gt;
&lt;h4 id="erlang-and-scala"&gt;Erlang and Scala&lt;/h4&gt;
&lt;p&gt;&lt;a href="http://highscalability.com/blog/2013/11/8/stuff-the-internet-says-on-scalability-for-november-8th-2013.html?SSLoginOk=true"&gt;WhatsApp&lt;/a&gt; gets more messages than Twitter. WhatsApp is built on Erlang, and that's for a reason. To handle as many messages as WhatsApp does, you need a massively concurrent application. To run a massively concurrent application, you need a lot of parallel execution. The Actor based method for concurrency popularised by Erlang is built for such a use case, and it is backed by solid theory and research. Erlang actors are processes, but not OS threads: an Erlang process is very lightweight and is scheduled by the Erlang VM, so Erlang applications commonly run tens of thousands of processes or more. On the JVM, the natural unit of concurrency is the OS thread, and threads are a scarce resource on commodity hardware (Erlang does not always run on commodity hardware). In a distributed, horizontally scaling setup the constraint on the number of threads can be quite strict. The developers of Scala have thus provided two types of Actors: thread-based and event-based. Thread-based actors execute in heavyweight OS threads. They never block each other, but they don’t scale to more than a few thousand actors per VM. Event-based actors are simple objects. They are very lightweight, and, like Erlang processes, you can spawn millions of them on a modern commodity machine. The difference with Erlang processes is that within each OS thread, event-based actors execute sequentially without preemptive scheduling. This makes it possible for an event-based actor to block its OS thread for a long period of time (perhaps indefinitely).&lt;/p&gt;
&lt;p&gt;If one is looking to engineer a highly concurrent application on the JVM, then Scala's Actor model provides a compelling option for designing such a system. I encourage readers to watch the many videos/talks by the architects of the Scala Actor model (like Jonas Bonér and Roland Kuhn) to get a more thorough understanding of it. Scala's Akka library with its Actor model is a great effort to bring the best of Erlang's proven concurrency model to JVM engineers.&lt;/p&gt;
&lt;h4 id="c-java-and-scala"&gt;C#, Java and Scala&lt;/h4&gt;
&lt;p&gt;Scala has taken a lot of good things from C# and Java, especially in the syntax area. The syntax seems to have been designed especially keeping the Java programmers in mind, all the while trying to reduce the verbosity. One of the very interesting features that seems to have been inspired by C# is &lt;a href="http://www.artima.com/pins1ed/implicit-conversions-and-parameters.html"&gt;Implicits&lt;/a&gt;. They provide a means to extend libraries, help in type conversion etc. &lt;/p&gt;
&lt;hr&gt;

&lt;h3 id="scala-ecosystem"&gt;Scala Ecosystem&lt;/h3&gt;
&lt;p&gt;Scala has had its set of woes in this area. There has been quite some furore over backward compatibility of Scala's native libraries and other frameworks over the last few releases 2.7 &amp;gt; 2.8 &amp;gt; 2.9 &amp;gt; 2.10 (present). The Actors model has been written multiple times over - once as native scala.actors, once as part of the Lift library, and finally as part of Akka. However, having started coding with Scala many months ago and having worked on the latest releases of many of these libraries, I have found them to be no different from their counterparts in the Java world in documentation, community support etc. One great joy is actually the existence of many options in every area of the language, which I illustrate in the list below. &lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Concurrency, Event Management, ESB&lt;/strong&gt;
    a. &lt;a href="https://github.com/akka/akka"&gt;Akka&lt;/a&gt; - actor based concurrency model
    b. &lt;a href="https://github.com/eligosource/eventsourced"&gt;Eventsourced&lt;/a&gt; - persistence, recovery, redelivery of messages&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Data structures&lt;/strong&gt;
    a. &lt;a href="https://github.com/scalaz/scalaz"&gt;Scalaz&lt;/a&gt; -  data Structures for functional programming
    b. &lt;a href="https://github.com/Netflix/RxJava"&gt;RxJava&lt;/a&gt; - composing asynchronous and event-based programs using observable sequences (not written in Scala but, probably, better used with Scala than Java from code hygiene POV!)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Build and Testing&lt;/strong&gt;
    a. &lt;a href="https://github.com/rickynils/scalacheck"&gt;ScalaCheck&lt;/a&gt; - testing framework with probably no Java equivalent (at least that I know of)
    b. &lt;a href="https://github.com/sbt/sbt"&gt;SBT&lt;/a&gt; - more concise than Maven. No XML crap - build instructions as Scala DSL&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Object Relational Mapping&lt;/strong&gt;
    a. &lt;a href="https://github.com/slick/slick"&gt;Slick&lt;/a&gt; - Database access&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Distributed Big Data Tasks&lt;/strong&gt;
    a. &lt;a href="https://github.com/twitter/finagle"&gt;Finagle&lt;/a&gt; - Fault tolerant, protocol agnostic RPC system
    b. &lt;a href="https://github.com/twitter/scalding"&gt;Scalding&lt;/a&gt; - MapReduce for Scala
    c. &lt;a href="https://github.com/twitter/summingbird"&gt;SummingBird&lt;/a&gt; - Streaming, continuous, real-time MapReduce on top of Scalding or Storm&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Web Development&lt;/strong&gt;
    a. &lt;a href="http://liftweb.net/"&gt;Lift&lt;/a&gt;
    b. &lt;a href="http://www.playframework.com/"&gt;Play!&lt;/a&gt; 
    c. &lt;a href="http://spray.io/"&gt;Spray.IO&lt;/a&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;These are just a few of the popular libraries in some of the more frequently programmed areas. There are many more options for an interested programmer in each area. For example, there are more than 10 web development frameworks in native Scala. And then there are libraries in other areas like machine-learning etc.&lt;/p&gt;
&lt;hr&gt;

&lt;h3 id="unlearning-and-relearning-programming"&gt;Unlearning and Relearning Programming&lt;/h3&gt;
&lt;p&gt;For those coming from Java, with no functional programming background, Scala can have a steep learning curve. But it is well worth the effort. To me, apart from exposure to many totally new concepts, Scala has helped in getting more firmly grounded in the fundamentals of structure and interpretation of computer programs. It has helped me realise the many things I need to &lt;em&gt;unlearn&lt;/em&gt; to become a better programmer! If a passing reader finds this claim interesting, here is a quick list of things I feel I am better at programming now...&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Immutability: tradeoffs in using the C/Java style innocuous looking &lt;em&gt;for&lt;/em&gt; loop; Utility (and at times necessity) of immutable collections (which do not exist in Oracle's Java JDK)&lt;/li&gt;
&lt;li&gt;Type safety: strengths of Java/JVM style strict typing; &lt;a href="http://code.stephenmorley.org/articles/java-generics-type-erasure/"&gt;Problems&lt;/a&gt; in Java's type safety offering&lt;/li&gt;
&lt;li&gt;Inheritance: a better understanding of covariance and contra-variance &lt;/li&gt;
&lt;li&gt;Rethinking code verbosity by composing higher order functions, partial functions etc (less code often translates to fewer bugs)&lt;/li&gt;
&lt;li&gt;A better way to alleviate null-checks using Options&lt;/li&gt;
&lt;li&gt;Dependency Injection without annotations or XMLs&lt;/li&gt;
&lt;li&gt;Things can be better than using &lt;em&gt;static&lt;/em&gt; classes, methods, variables&lt;/li&gt;
&lt;li&gt;Closures and Mixins are possible on the JVM too (until now, I had thought of these only from the JavaScript perspective)&lt;/li&gt;
&lt;li&gt;Using &lt;em&gt;Map&lt;/em&gt; when I needed &lt;em&gt;Tuple&lt;/em&gt; was not exactly a bright idea&lt;/li&gt;
&lt;li&gt;I can do so much more when I can write code that my build system understands... looking for Maven plugins need not be a way of life...&lt;/li&gt;
&lt;/ol&gt;
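&lt;p&gt;Two of the items above, Options and Tuples, are easy to show in code. This is a minimal sketch with made-up names (&lt;code&gt;OptionsAndTuples&lt;/code&gt;, &lt;code&gt;capitals&lt;/code&gt;, &lt;code&gt;describe&lt;/code&gt;, &lt;code&gt;minMax&lt;/code&gt;) and sample data, assuming nothing beyond the Scala standard library.&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-scala"&gt;object OptionsAndTuples {
  // Hypothetical lookup table, used only for illustration
  val capitals = Map("India" -&amp;gt; "New Delhi", "France" -&amp;gt; "Paris")

  // Map.get returns an Option[String], so there is no null to check
  def describe(country: String): String =
    capitals.get(country)
      .map(city =&amp;gt; s"$country: $city")
      .getOrElse(s"$country: unknown")

  // A tuple carries a fixed pair of values without a Map or a holder class
  def minMax(xs: List[Int]): Option[(Int, Int)] =
    if (xs.isEmpty) None else Some((xs.min, xs.max))

  def main(args: Array[String]): Unit = {
    println(describe("India"))   // India: New Delhi
    println(describe("Mars"))    // Mars: unknown
    minMax(List(3, 1, 4)) match {
      case Some((lo, hi)) =&amp;gt; println(s"min=$lo max=$hi")  // min=1 max=4
      case None           =&amp;gt; println("empty list")
    }
  }
}
&lt;/code&gt;&lt;/pre&gt;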
&lt;p&gt;... and I can go on and on!! &lt;/p&gt;</content><category term="posts"/></entry><entry><title>Programming Is Hard To Manage</title><link href="https://bharath12345.github.io/posts/programming-is-hard-to-manage/" rel="alternate"/><published>2013-11-26T00:00:00-05:00</published><updated>2013-11-26T00:00:00-05:00</updated><author><name>Bharadwaj</name></author><id>tag:bharath12345.github.io,2013-11-26:/posts/programming-is-hard-to-manage/</id><summary type="html">&lt;p&gt;Couple of recent incidents triggered me to write this one. Few weeks ago, I met an old friend. A fellow software industry man. But unlike me, a people manager. As we shared our experiences in software development, my friend picked on my recently acquired MBA. Give me something to read …&lt;/p&gt;</summary><content type="html">&lt;p&gt;Couple of recent incidents triggered me to write this one. Few weeks ago, I met an old friend. A fellow software industry man. But unlike me, a people manager. As we shared our experiences in software development, my friend picked on my recently acquired MBA. Give me something to read, my friend demanded. I promised him this blog.&lt;/p&gt;
&lt;div class="toc"&gt;&lt;span class="toctitle"&gt;Table of Contents&lt;/span&gt;&lt;ul&gt;
&lt;li&gt;&lt;a href="#understanding-software-development"&gt;Understanding Software Development&lt;/a&gt;&lt;ul&gt;
&lt;li&gt;&lt;a href="#group-level"&gt;Group Level&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#individual-level"&gt;Individual level&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#why-study"&gt;Why Study?&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#a-quick-roundup"&gt;A Quick Roundup...&lt;/a&gt;&lt;ul&gt;
&lt;li&gt;&lt;a href="#1the-mythical-man-month"&gt;1.The Mythical Man Month&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#2-adrenaline-junkies-and-template-zombies"&gt;2. Adrenaline Junkies and Template Zombies&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#3-the-inmates-are-running-the-asylum"&gt;3. The Inmates Are Running The Asylum&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#4-hackers-and-painters"&gt;4. Hackers And Painters&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#5-dreaming-in-code"&gt;5. Dreaming In Code&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#6-beautiful-code"&gt;6. Beautiful Code&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#7-the-productive-programmer"&gt;7. The Productive Programmer&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
&lt;p&gt;The second incident. A month ago I logged into my (almost) discarded yahoo-mail. The only valuable thing in that old mailbox is a folder with a few musings from my early years as a software engineer. I opened the folder out of curiosity. One of the notes was titled &lt;em&gt;Bewilderment&lt;/em&gt;. It was a &lt;em&gt;list&lt;/em&gt; of processes, decisions and people's actions that were totally counter-intuitive to me. At the end of the piece I had advised myself to study psychology to understand things!&lt;/p&gt;
&lt;hr&gt;

&lt;h3 id="understanding-software-development"&gt;Understanding Software Development&lt;/h3&gt;
&lt;p&gt;Over the years I have searched, read and re-read books which could broaden my understanding of this wonderful enterprise that we call software development. I broadly categorise these books into two groups: &lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Those explaining team behaviour, challenges &lt;/li&gt;
&lt;li&gt;Those that throw light on individual behaviour and advise improvement. &lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;And here is a short list of titles that I would highly recommend in these two categories - &lt;/p&gt;
&lt;h4 id="group-level"&gt;Group Level&lt;/h4&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;The Mythical Man Month&lt;/strong&gt; &lt;em&gt;by Dr. Frederick Brooks Jr&lt;/em&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Adrenaline Junkies and Template Zombies&lt;/strong&gt; &lt;em&gt;by Tom DeMarco, Tim Lister et al&lt;/em&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;The Inmates Are Running The Asylum&lt;/strong&gt; &lt;em&gt;by Alan Cooper&lt;/em&gt; &lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;&lt;img src="/images/programmingIsHard/team.jpg" alt="Drawing" style="width: 500px;"/&gt;&lt;/p&gt;
&lt;h4 id="individual-level"&gt;Individual level&lt;/h4&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Hackers and Painters&lt;/strong&gt; &lt;em&gt;by Paul Graham&lt;/em&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Dreaming In Code&lt;/strong&gt; &lt;em&gt;by Scott Rosenberg&lt;/em&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Beautiful Code&lt;/strong&gt; &lt;em&gt;articles by Brian Kernighan, Charles Petzold, Douglas Crockford, Jeffrey Dean, Sanjay Ghemawat and many more super programmers&lt;/em&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;The Productive Programmer&lt;/strong&gt; &lt;em&gt;by Neal Ford&lt;/em&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;&lt;img src="/images/programmingIsHard/individual.jpg" alt="Drawing" style="width: 500px;"/&gt;&lt;/p&gt;
&lt;hr&gt;

&lt;h3 id="why-study"&gt;Why Study?&lt;/h3&gt;
&lt;p&gt;When it comes to studying project management, programmers and managers alike do not necessarily get excited about reading books. The arguments I have heard include...&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Busy programmers and managers have enough on their hands to do... why add more? &lt;/li&gt;
&lt;li&gt;Don't we all learn by &lt;em&gt;doing&lt;/em&gt; things? This can be learnt only by &lt;em&gt;doing&lt;/em&gt;&lt;/li&gt;
&lt;li&gt;Technology has changed (whatever that means!) &lt;/li&gt;
&lt;li&gt;Software is an odd-ball industry - too &lt;em&gt;new&lt;/em&gt; for theoretical dissection &lt;/li&gt;
&lt;li&gt;Software is too fast-paced and full of change for a scientific analysis&lt;/li&gt;
&lt;li&gt;Better to spend time on technical books from career perspective (whatever that means!)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Deep down, I believe two things are absolutely critical for furthering the human cause - &lt;strong&gt;Books&lt;/strong&gt; and &lt;strong&gt;Institutions&lt;/strong&gt;. Neither should be undermined at the cost of the other. And both are mutually dependent. A purposeful life would be one spent in either or both of these endeavours. &lt;/p&gt;
&lt;p&gt;To the arguments against studying these books, all I can do is offer a humble suggestion based on what I have essentially learnt from them. These books, most importantly, have helped me to &lt;em&gt;articulate&lt;/em&gt; the difficult situations I have found myself in during software projects. Both to myself and to others. How many of us can really explain our office scenario at home? Or to our friends in 5 minutes? But it is the self-articulation that is probably &lt;em&gt;far more important&lt;/em&gt;. As project-people we often sense a pattern when things are going wrong (or right). The gut feeling. But it is difficult to understand why our gut says what it does. Let me give an example from my career. My first two jobs were at large companies with thousands of employees. Each day was a routine - an hour's drive to the office, clear-cut tasks, well-funded projects and large teams. My contributions often felt small and inconsequential. But that was not really the case, as my managers often pointed out. So what was it that sometimes made me uneasy? The phrase that articulates that feeling most accurately is &lt;em&gt;'Template Zombie'&lt;/em&gt;.&lt;/p&gt;
&lt;hr&gt;

&lt;h3 id="a-quick-roundup"&gt;A Quick Roundup...&lt;/h3&gt;
&lt;p&gt;There is absolutely no way to give a quick and dirty summary of any of these books.  I hold each one very highly and dearly. Worth reading multiple times. So all I do here is to share &lt;strong&gt;when&lt;/strong&gt; you might want to read each one.&lt;/p&gt;
&lt;h5 id="1the-mythical-man-month"&gt;1. The Mythical Man Month&lt;/h5&gt;
&lt;p&gt;This is a brilliant book for anyone aspiring to a lifelong career in software (like me!). Fred Brooks charts the complete territory - from programming languages to organization to design-think. And he provides superbly constructed scientific arguments for all his propositions. Much of my bewilderment about the workings of a big organisation was answered in this book. &lt;/p&gt;
&lt;p&gt;Sample &lt;strong&gt;Brooks's law&lt;/strong&gt;: Adding manpower to a late software project makes it later.&lt;/p&gt;
&lt;h5 id="2-adrenaline-junkies-and-template-zombies"&gt;2. Adrenaline Junkies and Template Zombies&lt;/h5&gt;
&lt;p&gt;Are you making a switch from a big company to a startup? Or vice versa? From a big team to a small one? Or vice versa? If so, reading this book is highly advised.&lt;/p&gt;
&lt;h5 id="3-the-inmates-are-running-the-asylum"&gt;3. The Inmates Are Running The Asylum&lt;/h5&gt;
&lt;p&gt;Design issues? Conflicts at workplace? Politics? You will find some delightful answers here.&lt;/p&gt;
&lt;h5 id="4-hackers-and-painters"&gt;4. Hackers And Painters&lt;/h5&gt;
&lt;p&gt;Long long ago, in my first year at work, a close friend who was an excellent mentor and a superb programmer told me something that I will never forget. Mimicking Amitabh Bachchan he said &lt;em&gt;'duniya mein sirf doh tarah ke log hote hain... ek jo programming kar paate hain... our dusre woh jo programming nahin kar paate hain...'&lt;/em&gt; (there are only two kinds of people in this world... those who can program... and those who cannot...). This is a superb book if you feel like delving into that one!&lt;/p&gt;
&lt;h5 id="5-dreaming-in-code"&gt;5. Dreaming In Code&lt;/h5&gt;
&lt;p&gt;This delightful book is every bit worth carrying to a vacation.&lt;/p&gt;
&lt;h5 id="6-beautiful-code"&gt;6. Beautiful Code&lt;/h5&gt;
&lt;p&gt;An ideal present to a promising programmer. Such fascinating projects and such industrious engineers. Inspirational.&lt;/p&gt;
&lt;h5 id="7-the-productive-programmer"&gt;7. The Productive Programmer&lt;/h5&gt;
&lt;p&gt;Programming fast is a real skill. Programming productively and fast is an even greater skill. This is a nice self-help book for all programmers who aspire to do that.&lt;/p&gt;</content><category term="posts"/></entry><entry><title>Concurrency on the JVM</title><link href="https://bharath12345.github.io/posts/concurrency-on-the-jvm/" rel="alternate"/><published>2013-11-14T00:00:00-05:00</published><updated>2013-11-14T00:00:00-05:00</updated><author><name>Bharadwaj</name></author><id>tag:bharath12345.github.io,2013-11-14:/posts/concurrency-on-the-jvm/</id><summary type="html">&lt;p&gt;Over the last few months I amused myself with an interesting pursuit. I spoke to a large number of people on the aspect of concurrency. I spoke to ex-colleagues. I spoke to engineers, architects at hackathons/meetups. And I interviewed a large number of senior engineers for a job at …&lt;/p&gt;</summary><content type="html">&lt;p&gt;Over the last few months I amused myself with an interesting pursuit. I spoke to a large number of people on the aspect of concurrency. I spoke to ex-colleagues. I spoke to engineers, architects at hackathons/meetups. And I interviewed a large number of senior engineers for a job at my company. I spoke to them about building a highly-concurrent, high-volume, real-time data-aggregation engine. I gave examples of easy, textbookish projects to drive home the requirements. Like a stock trading platform with 1000s of users, 1000s of stocks and 100s of stock-exchanges. Or an IPL ticket &lt;em&gt;bidding&lt;/em&gt; site with 1000s of users, many seating categories, many venues etc. And this small article is about my perspectives at the end of it. &lt;/p&gt;
&lt;hr&gt;

&lt;h4 id="programming-concurrency-building-frustrations-globally"&gt;Programming Concurrency... Building Frustrations Globally&lt;/h4&gt;
&lt;p&gt;The engineering challenge involved in building high volume concurrent applications should not be underestimated. Some &lt;em&gt;veterans&lt;/em&gt; I spoke to suggested that such problems, and multiple solution approaches to them, have existed for ages. Shamelessly, and often at the cost of personal repute (or of embarrassing someone), I counter-questioned them to tell me these &lt;em&gt;various&lt;/em&gt; approaches. Essentially, they all boiled down to just two - &lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Handling all data in a single thread due to the fear of complexity&lt;/strong&gt;: On hearing this, I would often ask: do you really intend to exercise just one &lt;em&gt;core&lt;/em&gt; of your upcoming quad-core, 64-processor server? At that, they would move on to the 2nd option, which is...&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Build the beast with thread-pools, locks and synchronisation blocks&lt;/strong&gt;: Let's use java.util.concurrent and laugh our way to the ATM, some suggested... and I would say watch out... you could end up in a jail, a mental asylum or a bankruptcy proceeding before reaching that ATM!&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;While interviewing candidates for the job I have been on the lookout for engineers who have a few perspectives &lt;em&gt;other than&lt;/em&gt; just these two. A small fraction, when pushed, uttered something to the effect of &lt;em&gt;event-based, SEDA&lt;/em&gt; approaches. But when questioned further, these event-based approaches often end up folding into one of the above two. I find this rather sad. &lt;/p&gt;
&lt;p&gt;Let me clarify, I do &lt;strong&gt;not&lt;/strong&gt; think that the above two approaches are fundamentally wrong. But knowing just two is clearly insufficient. &lt;/p&gt;
&lt;p&gt;I also ran into an interesting few who had dabbled with NodeJS and were clearly smitten. Smitten with the action and enthusiasm of engineers in that world rather than with any hard technological breakthroughs. In this article I will not talk about NodeJS. I have briefly written about it earlier on this blog &lt;a href="http://bharathwrites.in/posts/the-bleeding-edge-of-an-application/"&gt;here&lt;/a&gt;. I dearly hope that those who suggested NodeJS did so out of their own naiveté and my hard nudges... and do not truly believe in the NodeJS performance hyperbole!&lt;/p&gt;
&lt;hr&gt;

&lt;h4 id="not-java-but-the-jvm"&gt;Not Java... But the JVM!&lt;/h4&gt;
&lt;p&gt;Java engineers need to graduate to becoming JVM engineers. They need to internalise forever the fact that the JVM has been a far bigger innovation than Java as a language. And when Java engineers do that, they will realise that the so-called competitors like NodeJS are non-starters. One book that I highly recommend to those who wish to make this graduation is the super revealing 'Programming Concurrency on the JVM' by &lt;a href="https://twitter.com/venkat_s"&gt;Dr. Venkat Subramaniam&lt;/a&gt;. I read this book some time ago. Back then, I was just beginning to find my reasons to learn Scala/Clojure. Reading it filled me with the energy to know more about the JVM internals and the new world of concurrency programming. &lt;/p&gt;
&lt;p&gt;Nietzsche once said "He who has a why to live can bear almost any how". As a programmer, my why has been concurrency, multi-core, big-data and high-performance. And Dr. Venkat gives a few how's!&lt;/p&gt;
&lt;p&gt;Broadly, the book is divided into three architectural approaches that one could take to build a concurrent application on the JVM, which are -&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;java.util.concurrent&lt;/strong&gt; with thread-pools, synchronization blocks, locks, fork-join etc &lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Software Transactional Memory&lt;/strong&gt; - being made popular by Clojure. &lt;a href="http://stackoverflow.com/questions/209751/any-real-world-experience-using-software-transactional-memory"&gt;This&lt;/a&gt; StackOverflow thread on real world adoption is instructive. And &lt;a href="http://www.cs.rochester.edu/~sandhya/papers/usenix_login_09.pdf"&gt;this&lt;/a&gt; paper gives the reader an excellent understanding from both Hardware and Software Transactional Memory perspectives&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Actor Based Concurrency&lt;/strong&gt; - being made popular by Scala and Akka. One just needs to visit the Typesafe website to know about the rapid adoption of this model&lt;/li&gt;
&lt;/ol&gt;
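&lt;p&gt;To make the first approach a little more concrete, here is a minimal sketch of the thread-pool style. It is not taken from the book: the class name &lt;code&gt;PoolSum&lt;/code&gt; and the toy job of adding up the numbers from 1 to n are my own assumptions, chosen only for illustration. Each task accumulates into a local variable and touches the shared &lt;code&gt;LongAdder&lt;/code&gt; exactly once, which keeps the shared mutability as small as I could make it.&lt;/p&gt;

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical demo class: sums 1..n by splitting the range across a fixed thread pool.
class PoolSum {

    // Adds up the integers from 1 to n using the given number of tasks.
    static long sum(int n, int tasks) {
        ExecutorService pool = Executors.newFixedThreadPool(tasks);
        LongAdder total = new LongAdder();      // thread-safe accumulator, no explicit lock
        int chunk = (n + tasks - 1) / tasks;    // ceiling division so every number is covered

        for (int t = 0; t != tasks; t++) {
            final int from = t * chunk + 1;
            final int to = Math.min(n, from + chunk - 1);
            pool.execute(() -> {
                long local = 0;                 // each task works on its own local variable
                for (int i = from; to >= i; i++) {
                    local += i;
                }
                total.add(local);               // one shared write per task
            });
        }

        pool.shutdown();                        // no new tasks; queued ones still run
        try {
            if (!pool.awaitTermination(1, TimeUnit.MINUTES)) {
                throw new IllegalStateException("tasks did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return total.sum();
    }

    public static void main(String[] args) {
        System.out.println(sum(1_000_000, 4));
    }
}
```

&lt;p&gt;Even in this tiny case there is a pool to size, a range to partition, a shutdown to remember and an interruption to handle. That bookkeeping is what grows painful once real shared state enters the picture.&lt;/p&gt;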
&lt;hr&gt;

&lt;h4 id="programming-concurrency-on-the-jvm-dr-venkat-subramaniam"&gt;Programming Concurrency on the JVM - Dr. Venkat Subramaniam&lt;/h4&gt;
&lt;p&gt;Dr. Venkat drives home the following points to those who wish to develop concurrent applications -&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;The three options available to the designers&lt;ul&gt;
&lt;li&gt;Shared mutability... the &lt;em&gt;pure evil&lt;/em&gt; option&lt;/li&gt;
&lt;li&gt;Isolated mutability&lt;/li&gt;
&lt;li&gt;Pure immutability &lt;br&gt;
&lt;br /&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Introduction to the world of &lt;em&gt;persistent&lt;/em&gt; data structures. &lt;a href="http://cstheory.stackexchange.com/questions/1539/whats-new-in-purely-functional-data-structures-since-okasaki"&gt;Here&lt;/a&gt; is a mind blowing thread on recent innovations in functional data structures. Many are persistent.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;A quick intro to the world of modern JDK concurrency mechanisms. It is in this part of the book that I found a certain treatment of the subject of concurrency that I was sorely missing. Applications have multiple 'needs' that drive the concurrency requirement. Broadly these needs can be divided into three parts -&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;High Network I/O Intensity (large network I/O requirements lead to concurrency designs)&lt;/li&gt;
&lt;li&gt;High Disk I/O Intensity&lt;/li&gt;
&lt;li&gt;Large compute problems which can be broken down to smaller pieces... divide and conquer... which leads to concurrent designs &lt;br&gt;
&lt;br /&gt;  &lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;The numerous code examples in the book showcase two things -&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Increasing complexity of code in certain approaches&lt;/li&gt;
&lt;li&gt;The time-to-compute or efficiency differential by comparing the different approaches
&lt;br /&gt;
&lt;br /&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;I would love to quote a few sentences from the STM chapter of the book...&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;(1) We’ve been led down the path of the imperative style of programming with mutable state for so long that it’s very hard to see alternatives to synchronization, but there are.&lt;/p&gt;
&lt;p&gt;(2) OOP didn’t quite turn out to be what Alan Kay had in mind when he coined the term. His vision was primarily message passing, and he wanted to get rid of data. Somewhere along the way, OO languages started down the path of data hiding through Abstract Data Types (ADTs), binding data with procedure or combining state and behavior.    &lt;/p&gt;
&lt;p&gt;(3) In the real world, the state does not change; the identity does&lt;/p&gt;
&lt;/blockquote&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;But STM is not a silver bullet for all concurrent applications. The author clearly says - STM is suitable for concurrent reads and infrequent to reasonably frequent write collisions to the same data.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Actors are a pure message passing model. Each actor has a built-in message queue. The actor library allows multiple actors to send messages concurrently. The senders are nonblocking by default. Although multiple actors may be active at any time, only one thread is active in an actor at any instant. The main drawback of this model, in my opinion, is that designing message passing systems with proper interleaving is not an easy art - it requires deep design thinking. &lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
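&lt;p&gt;The actor idea from the last point can be sketched in plain Java. Let me be clear that this is a toy of my own and not the Akka API or anything from the book: the class &lt;code&gt;CounterActor&lt;/code&gt; is hypothetical, and I am assuming that the queue inside a single-threaded executor is a fair stand-in for an actor's mailbox. Senders never block, and the mutable &lt;code&gt;count&lt;/code&gt; is only ever touched by the one mailbox thread, so it needs no lock.&lt;/p&gt;

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical toy actor (NOT the Akka API): shows isolated mutability only.
class CounterActor {
    // The queue inside the single-threaded executor plays the role of the mailbox.
    private final ExecutorService mailbox = Executors.newSingleThreadExecutor();
    private long count = 0;   // mutable state, only ever touched by the mailbox thread

    // Non-blocking send: the caller enqueues the message and returns at once.
    void increment() {
        mailbox.execute(() -> count++);
    }

    // "Ask" message: the mailbox thread writes the reply, a latch hands it back.
    long get() {
        long[] reply = new long[1];
        CountDownLatch done = new CountDownLatch(1);
        mailbox.execute(() -> {
            reply[0] = count;
            done.countDown();
        });
        try {
            done.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return reply[0];
    }

    // Lets the mailbox thread exit once the queued messages are processed.
    void stop() {
        mailbox.shutdown();
    }

    // Several sender threads fire messages concurrently; returns the final count.
    static long demo(int senders, int perSender) {
        CounterActor actor = new CounterActor();
        Thread[] threads = new Thread[senders];
        for (int s = 0; s != senders; s++) {
            threads[s] = new Thread(() -> {
                for (int i = 0; i != perSender; i++) {
                    actor.increment();
                }
            });
            threads[s].start();
        }
        try {
            for (Thread t : threads) {
                t.join();
            }
            return actor.get();   // queued after every increment, so it sees them all
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            actor.stop();
        }
    }

    public static void main(String[] args) {
        System.out.println(demo(4, 1000));
    }
}
```

&lt;p&gt;With 4 senders and 1000 messages each, the count comes out as 4000 on every run, without a single lock around the counter. The hard part that this toy hides is exactly the one mentioned above: deciding which messages exist and in what order they may interleave.&lt;/p&gt;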
&lt;h4 id="epilogue"&gt;Epilogue&lt;/h4&gt;
&lt;p&gt;Programming concurrency is hard. Any which way. When confronted with such requirements and problems, the vocabulary used by the engineers and architects to make good design choices and find the right hires is extremely critical. I was recently following some discussions on Y Combinator's Hacker News on the suitability of Scala/Clojure to develop enterprise applications using such new ideas for concurrency and many other things. And I found this comment, though a little harshly worded, to be food for thought...&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;The only thing enterprise business managers want is a language that can dumb down the art of programming to a level a programmers can be managed like assembly line workers. And that is what Java does exactly, an IDE that can make a novice and expert work at the same levels of productivity, extremely verbose code that gives an illusion of people building something big(even if its down right trivial). And most importantly programming effort can be accounted like a couple of least important replaceable folks down the hierarchy doing some assembling reusable units of material. Change this scenario, a good technology with merit makes programmers very important and makes managers look like desk clerks. Enterprise Managers don't care a least about type systems, lambdas, or traits or whatever. Most managers don't have a remote clue what those things are. Can the technology enable them to manage herds of programmers dumbed down enough to be managed like sheep? That is all they care.&lt;/p&gt;
&lt;/blockquote&gt;</content><category term="posts"/></entry><entry><title>Folding it the right way</title><link href="https://bharath12345.github.io/posts/folding-it-the-right-way/" rel="alternate"/><published>2013-10-31T00:00:00-04:00</published><updated>2013-10-31T00:00:00-04:00</updated><author><name>Bharadwaj</name></author><id>tag:bharath12345.github.io,2013-10-31:/posts/folding-it-the-right-way/</id><summary type="html">&lt;p&gt;I have been dabbling with Scala for a few months now. And one of the things that strikes me about functional programming is the beauty of the finished code. It sometimes gives me a feeling of being just the right mix of art and science! Gone are the dirty null …&lt;/p&gt;</summary><content type="html">&lt;p&gt;I have been dabbling with Scala for a few months now. And one of the things that strikes me about functional programming is the beauty of the finished code. It sometimes gives me a feeling of being just the right mix of art and science! Gone are the dirty null/empty checking &lt;em&gt;if statements&lt;/em&gt;. Gone are the dumb variety &lt;em&gt;for/while loops&lt;/em&gt;. I haven't progressed far enough to be using actors but the very thought that variables in my program are &lt;em&gt;not getting mutated&lt;/em&gt; while being thrashed across many cores and caches is enough to sometimes give me a high!&lt;/p&gt;
&lt;p&gt;But this blog is about something else. I just wanted to write about a small piece of code as an example of the beauty, expressiveness and correctness of the functional style. I ran into this problem as part of my Scala Coursera course. First of all, neither the problem nor the solution is mine. After writing a lot of imperative style ugly code to solve the problem I got fed up with myself and searched for a better way to do it. A more functional way. Here I just explain the problem and the solution.&lt;/p&gt;
&lt;p&gt;Firstly, the problem - &lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Write a method to compute all the subsets of a list of tuples. For example, given this tuple list &lt;code&gt;List(('a', 2), ('b', 2))&lt;/code&gt; the list of all subsets is:&lt;/strong&gt;&lt;/p&gt;
&lt;pre&gt;
List(
  List(),
  List(('a', 1)),
  List(('a', 2)),
  List(('b', 1)),
  List(('a', 1), ('b', 1)),
  List(('a', 2), ('b', 1)),
  List(('b', 2)),
  List(('a', 1), ('b', 2)),
  List(('a', 2), ('b', 2))
)
&lt;/pre&gt;

&lt;p&gt;Now I request you to please try solving this. It really is not very tough. Crack open your IDE and try it in the imperative style of programming. Use whatever data-structures and algorithms you like.&lt;/p&gt;
&lt;p&gt;Yes, you will be able to crack it, after maybe some pain. But after you are done, give that code you wrote a hard stare. And a hard stare to the functional equivalent below. It is inevitable that you will realise, how fat our coding has grown on the unhealthy monotonous diet of pure imperative thinking all the time...&lt;/p&gt;
&lt;pre&gt;
1.  def combinations(occurrences: List[(Char, Int)]): List[List[(Char, Int)]] = 
2.      (occurrences foldRight List[List[(Char, Int)]](Nil)) 
3.      { case ((ch,tm), acc) =&gt; 
4.          {
5.              acc ++ ( for { 
6.                      comb &lt;- acc; 
7.                      n &lt;- 1 to tm 
8.                      } yield (ch, n) :: comb 
9.                  )
10.         } 
11.     }
&lt;/pre&gt;

&lt;p&gt;So, there you have it. About 10 lines of thin code in all its glory. Now let me get under the skin of it to show what really is happening here...&lt;/p&gt;
&lt;p&gt;First of all, Scala has the concept of tuples that helps in having cleaner data structures for problems like these. Secondly, this code uses foldRight, which is a curried higher-order function. If you don't know about currying, that is okay. It just means that foldRight takes its arguments in two separate lists: first the initial accumulator, and then a &lt;em&gt;passed&lt;/em&gt; function that gets applied to every item in the data-structure. The function &lt;em&gt;passed&lt;/em&gt; in this case is the one that starts with the curly brace on line#3. Thirdly, this piece of code uses multiple anonymous functions.&lt;/p&gt;
&lt;p&gt;Let me describe the execution flow step-by-step -&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;The Scala foldRight method applies the passed function to the data items in reverse order. So, on passing the list &lt;code&gt;List(('a', 2), ('b', 2))&lt;/code&gt;, the first data item to be used for processing is &lt;code&gt;('b', 2)&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;foldRight takes an initial accumulator. In this case it is the list built around the &lt;code&gt;Nil&lt;/code&gt; passed in line#2, that is &lt;code&gt;List(Nil)&lt;/code&gt;: a list whose only element is the empty list&lt;/li&gt;
&lt;li&gt;So the initial value of parameters on line#3 are: &lt;ul&gt;
&lt;li&gt;ch = 'b'&lt;/li&gt;
&lt;li&gt;tm = 2&lt;/li&gt;
&lt;li&gt;acc = &lt;code&gt;List(List())&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;The for expression &lt;em&gt;yields&lt;/em&gt; two single-element lists on being executed, one per &lt;em&gt;tuple&lt;/em&gt;: &lt;code&gt;List((b,1))&lt;/code&gt; and &lt;code&gt;List((b,2))&lt;/code&gt;. These two are appended to the accumulator and we have the result after the first pass of the data structure as &lt;code&gt;List(List(), List((b,1)), List((b,2)))&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;In the second pass, the data-item from our occurrences list being processed is &lt;code&gt;('a', 2)&lt;/code&gt;. So the value of parameters this time on line#3 are:&lt;ul&gt;
&lt;li&gt;ch = 'a'&lt;/li&gt;
&lt;li&gt;tm = 2&lt;/li&gt;
&lt;li&gt;acc = &lt;code&gt;List(List(), List((b,1)), List((b,2)))&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;It's in this second pass that things really get interesting. The for expression yields all 6 remaining subsets, which are &lt;code&gt;List(List((a,1)), List((a,2)), List((a,1), (b,1)), List((a,2), (b,1)), List((a,1), (b,2)), List((a,2), (b,2)))&lt;/code&gt; in this single pass! It will take a little bit of mind bending to understand how this happens... but it's definitely worth the effort... just reading it made my day!&lt;/li&gt;
&lt;/ol&gt;
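&lt;p&gt;For those more at home in Java, the two passes described above can be sketched roughly as follows. This is my own simplification and not a translation the course provides: the class name &lt;code&gt;Subsets&lt;/code&gt; is hypothetical, and to keep it short each subset is written as a plain string such as &lt;code&gt;a1 b2&lt;/code&gt; instead of a list of tuples. The outer loop walks the input from the right like foldRight does, and the two inner loops mirror the two generators of the for expression.&lt;/p&gt;

```java
// Hypothetical Java rendering of the fold; each subset is a string such as "a1 b2".
class Subsets {

    // chars[i] may be used 1..counts[i] times; returns every subset, the empty one included.
    static String[] combinations(char[] chars, int[] counts) {
        String[] acc = { "" };                          // initial accumulator: only the empty subset
        for (int i = chars.length - 1; i >= 0; i--) {   // like foldRight: start from the last item
            int tm = counts[i];
            String[] next = new String[acc.length * (1 + tm)];
            System.arraycopy(acc, 0, next, 0, acc.length);   // keep everything found so far
            int k = acc.length;
            for (String comb : acc) {                   // same nesting as the for expression
                for (int n = 1; tm >= n; n++) {
                    String head = "" + chars[i] + n;    // plays the role of the tuple (ch, n)
                    next[k++] = comb.isEmpty() ? head : head + " " + comb;
                }
            }
            acc = next;                                 // output of this pass feeds the next one
        }
        return acc;
    }

    public static void main(String[] args) {
        for (String s : combinations(new char[] { 'a', 'b' }, new int[] { 2, 2 })) {
            System.out.println("[" + s + "]");
        }
    }
}
```

&lt;p&gt;For the example input this produces 9 subsets: the 3 from the first pass followed by the 6 added in the second. Notice how much index bookkeeping the Java version needs for what the Scala version says in a single expression.&lt;/p&gt;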
&lt;p&gt;Now coming to the great thing about this program - performance! Compare the number of passes on the data-structures that this piece of code has taken to the imperative code. The first thing to really digest is that this is not some algorithm trickery. Now that you &lt;em&gt;know&lt;/em&gt; the algorithm in the &lt;em&gt;functional&lt;/em&gt; programming style, try doing it in the imperative style. Firstly, the code will not look this concise. Secondly, most of us will simply not be able to do it right.&lt;/p&gt;
&lt;p&gt;But the best part - the input data structure is immutable and so are all intermediate ones. The benefit? This piece of code will &lt;strong&gt;not fail if some other thread of execution changes the input variable when this piece of code is executing&lt;/strong&gt;! (that is, the &lt;code&gt;occurrences: List[(Char, Int)]&lt;/code&gt; data structure). Why? Because it is impossible to change the input data structure! It is born immutable. It will live immutable. And it will die immutable. Nothing ever can come in its way!&lt;/p&gt;
&lt;p&gt;Unfortunately, this algorithm implementation is such that output from the first iteration gets fed into the second iteration. So two parallel cores cannot be running it simultaneously. However, with all intermediate data structures being 100% immutable, it is not difficult to imagine other problems/algorithms which do not have this constraint, thus using up more cores at once and built for distribution and performance! I hope you share my wow(!) about this piece of code and functional programming in this case.&lt;/p&gt;</content><category term="posts"/></entry><entry><title>Algorithms Course-1 With Prof Sedgewick on Coursera</title><link href="https://bharath12345.github.io/posts/algorithms-course-i-with-prof-sidgewick-on-coursera/" rel="alternate"/><published>2013-10-08T00:00:00-04:00</published><updated>2013-10-08T00:00:00-04:00</updated><author><name>Bharadwaj</name></author><id>tag:bharath12345.github.io,2013-10-08:/posts/algorithms-course-i-with-prof-sidgewick-on-coursera/</id><summary type="html">&lt;p&gt;I did my engineering in electronics and communication systems. But my very first job was in software development. Having not studied theory of computing, databases, compilers and even algorithms/data-structures as part of my graduation I went on to self-study these. However, deep down, I have felt the need for more …&lt;/p&gt;</summary><content type="html">&lt;p&gt;I did my engineering in electronics and communication systems. But my very first job was in software development. Having not studied theory of computing, databases, compilers and even algorithms/data-structures as part of my graduation I went on to self-study these. However, deep down, I have felt the need for more structured education. I don't remember when I first heard of &lt;a href="https://www.coursera.org"&gt;Coursera&lt;/a&gt;. 
But my early tryst with online education had been dismal (at my previous employer they would make me go through mandatory online trainings… and those used to absolutely suck). So even as I kept track of the courses offered on Coursera since early this year, I did not enroll. A couple of months ago I decided to give it a serious try… and I enrolled myself for the first course on Algorithms by Professor Robert Sedgewick. I finished my final exam on the course yesterday. And it feels great to be done with all tests and programming assignments. The course was structured in the undergraduate training way… which is exactly what I wanted. The learning has been enormous. Anyone who has spent a decade in software development like me would know MergeSort and QuickSort anyway… but the scientific treatment of the subject both in the videos and the textbook gives me a sense of closure. And by the way, I think algorithms and data-structures is a field which a practicing engineer has to seriously brush up on, once every few years, just to keep up…&lt;/p&gt;
&lt;div class="toc"&gt;&lt;span class="toctitle"&gt;Table of Contents&lt;/span&gt;&lt;ul&gt;
&lt;li&gt;&lt;a href="#general-questions-on-the-study-of-algorithms"&gt;General questions on the study of Algorithms&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#sorting"&gt;Sorting&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#searching"&gt;Searching&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
&lt;p&gt;Like the few book reviews that I have done before on my blog, this is a quick refresher for myself on all that I have studied. It's not complete or thorough. And I hope there are no factual errors. So if a passing reader finds anything here useful, it makes me glad… &lt;/p&gt;
&lt;hr&gt;

&lt;h4 id="general-questions-on-the-study-of-algorithms"&gt;General questions on the study of Algorithms&lt;/h4&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Why study Algorithms and Data Structures? Why are they important?&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Computers, no matter how powerful, have space and time constraints. Poorly thought through implementations for computing problems can take years to compute even when computing resources are massive. For example -
&lt;p align="center"&gt;
&lt;img alt="image" src="/images/algorithms/timecompare.png"&gt;
&lt;/p&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Why learn, re-learn algorithms?&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The primary reason to do so is that we are faced, all too often, with completely new computing environments (hardware and software) with new features that old implementations may not use to best advantage&lt;/li&gt;
&lt;li&gt;As a professional, it is a crime to use tools without thoroughly understanding them. So as Java programmers, to use HashMap and TreeSet without the knowledge of the underlying resource utilisation and performance impact is…&lt;/li&gt;
&lt;li&gt;Intellectually satisfying&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;How do you measure how long will your program take to run?&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Thousands of repeated runs to find the mean and standard-deviation&lt;/li&gt;
&lt;li&gt;Run it for different quantities of input data 'N' - find the mean and std-dev for each N after thousands of runs&lt;/li&gt;
&lt;li&gt;Find a relationship between N and time-taken by plotting on a graph - is the graph linear? hyperbolic? logarithmic? &lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Why measure how long programs take to run?&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Knowing the order of growth of the running time of an algorithm provides precisely the information that you need to understand limitations on the size of the problems that you can solve. Developing such understanding is the most important reason to study performance.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;What are big-O and big-Omega notations? Why are they needed?&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;big-O is for the upper bound. big-Omega is for the lower bound. (There is also big-Theta, which is a slightly more involved idea.) The running times of a great many programs depend only on a small subset of their instructions - so when the running times of algorithms are proportional to squares (N&lt;sup&gt;2&lt;/sup&gt;) or cubes (N&lt;sup&gt;3&lt;/sup&gt;) or exponentials (2&lt;sup&gt;N&lt;/sup&gt;) of the input data count (N), we know that these algorithms will not scale for large inputs. Only when running times of algorithms are linear (N), linearithmic (NlogN), logarithmic (logN) or constant can they be expected to scale for large inputs.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Then why is big-O not useful for predicting performance or for comparing algorithms?&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;The primary reason is that it describes only an upper bound on the running time. Actual performance might be much better. The running time of an algorithm might be both O(N&lt;sup&gt;2&lt;/sup&gt;) and ~ a N log N. As a result, it cannot be used to justify tests like our doubling ratio test (see Proposition C on page 193). &lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;What is the base of log when we are talking about complexities of algorithms? Why?&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Base-2. In terms of Big-O, the base doesn't matter because the change of base formula implies that it's only a constant factor difference. That is, logarithms to base 10, base 2 or base e can be converted to any other base by multiplying with a constant. The critical thing to understand is that logarithms (of any base) increase slowly with the increase of N. However, observe this table of log values… (with respect to complexity of algorithms, the value of N can never be fractional or negative anyway...)
&lt;p align="center"&gt;
&lt;img alt="image" src="/images/algorithms/log.png"&gt;
&lt;/p&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;What does Java Arrays.sort() implement?&lt;/strong&gt;&lt;/p&gt;
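&lt;p&gt;For arrays of objects: Mergesort till Java6, TimSort from Java7 onwards. Arrays of primitives are sorted with a tuned quicksort instead (dual-pivot from Java7 onwards).&lt;/p&gt;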
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Order of growth graph?&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Here is the log-log plot (both size(N) x-axis and time(T) y-axis are in logarithms)
&lt;p align="center"&gt;
&lt;img alt="image" src="/images/algorithms/orderofgrowth.png"&gt;
&lt;/p&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Example of each -&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;constant time - assignment statement&lt;/li&gt;
&lt;li&gt;logarithmic - binary search&lt;/li&gt;
&lt;li&gt;linear - find the maximum value&lt;/li&gt;
&lt;li&gt;linearithmic - merge sort&lt;/li&gt;
&lt;li&gt;quadratic - double for/while loop&lt;/li&gt;
&lt;li&gt;cubic - triple for/while loop&lt;/li&gt;
&lt;li&gt;exponential - brute force search&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Why develop faster algorithms?&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Faster algorithms help us to address larger problems&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Why study memory utilisation of Java programs?&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;If you have 1GB of memory on your computer (1 billion bytes), you cannot fit more than about 250 million int values (4 bytes each) or 125 million double values (8 bytes each) in memory at any one time.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;How many bytes in memory are required to store a reference to a Java Object?&lt;/strong&gt;&lt;/p&gt;
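&lt;p&gt;4 bytes on a 32-bit JVM and 8 bytes on a 64-bit JVM (a 64-bit HotSpot JVM with compressed references, the default for heaps below roughly 32GB, uses 4 bytes).&lt;/p&gt;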
&lt;/li&gt;
&lt;/ol&gt;
&lt;hr&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;h4 id="sorting"&gt;Sorting&lt;/h4&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;In Java what do you have to do to be able to sort an array of a custom object type?&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;The class of the object should implement Comparable (alternatively, a Comparator can be passed to the sort method).&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Performance of selection sort&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;About N&lt;sup&gt;2&lt;/sup&gt;/2 compares and N exchanges to sort an array of length N.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;About selection sort&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;It takes about as long to run selection sort on an array that is already in order, or on an array with all keys equal, as it does on a randomly ordered array!&lt;/li&gt;
&lt;li&gt;Data movement is minimal&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Performance of insertion sort&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Insertion sort uses N&lt;sup&gt;2&lt;/sup&gt;/4 compares and N&lt;sup&gt;2&lt;/sup&gt;/4 exchanges to sort a randomly ordered array of length N with distinct keys, on the average. The worst case is N&lt;sup&gt;2&lt;/sup&gt;/2 compares and N&lt;sup&gt;2&lt;/sup&gt;/2 exchanges and the best case is N-1 compares and 0 exchanges.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Performance of merge sort&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Top-down and bottom-up mergesort use between ½NlogN and NlogN compares to sort any array of length N. Top-down mergesort uses at most 6NlogN array accesses to sort an array of length N. The primary drawback of mergesort is that it requires extra space proportional to N.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Lower bound for compare-based sorting algorithms&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Compare-based algorithms make their decisions about items only on the basis of comparing keys. A compare-based algorithm can do an arbitrary amount of computation between compares, but cannot get any information about a key except by comparing it with another one. No compare-based sorting algorithm can guarantee to sort N items with fewer than log(N!) ~ NlogN compares.&lt;/p&gt;
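&lt;p&gt;To get a feel for the log(N!) bound, here is a small sketch that adds up log2(1) + log2(2) + ... + log2(N) and prints it next to N*log2(N). The class and method names are made up for this illustration.&lt;/p&gt;

```java
import java.util.stream.IntStream;

class SortLowerBound {
    static double log2(int x) {
        return Math.log(x) / Math.log(2);
    }

    // log2(N!) = log2(1) + log2(2) + ... + log2(N)
    static double log2Factorial(int n) {
        return IntStream.rangeClosed(1, n).mapToDouble(SortLowerBound::log2).sum();
    }

    public static void main(String[] args) {
        for (int n : new int[] {8, 1_000, 1_000_000}) {
            System.out.printf("N=%d log2(N!)=%.0f N*log2(N)=%.0f%n",
                    n, log2Factorial(n), n * log2(n));
        }
    }
}
```

&lt;p&gt;For N = 8 the sum is about 15.3, so no compare-based sort can guarantee to finish 8 items in fewer than 16 compares. As N grows, the two printed columns move closer together in relative terms.&lt;/p&gt;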
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Performance of quick sort&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;The quicksort algorithm’s desirable features are that it is in-place (uses only a small auxiliary stack) and uses ~ 2N ln N compares (about 1.39NlogN in base 2) and one-sixth that many exchanges on the average to sort an array of length N with distinct keys.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;What's the problem statement for priority queues?&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;em&gt;insert&lt;/em&gt; and &lt;em&gt;remove the maximum&lt;/em&gt; have to be fast. The goal is fast insertion into, and fast access to the largest items of, a potentially unbounded stream of data points. Binary heaps provide the data structure to implement logarithmic time insert and remove-max. (Java natively provides a &lt;a href="http://stackoverflow.com/questions/683041/java-how-do-i-use-a-priorityqueue"&gt;PriorityQueue&lt;/a&gt; implementation as part of collections)&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;What is a binary heap?&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;In a binary heap, the keys are stored in an &lt;em&gt;array&lt;/em&gt; such that each key is guaranteed to be larger than (or equal to) the keys at two other specific positions. In turn, each of those keys must be larger than (or equal to) two additional keys, and so forth. The largest key in a heap-ordered binary tree is found at the root. Generally binary heaps are stored sequentially within an array by putting the nodes in level order, with the root at position 1, its children at positions 2 and 3, their children in positions 4, 5, 6, and 7, and so on.&lt;/p&gt;
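&lt;p&gt;The level-order layout boils down to simple index arithmetic. The sketch below assumes the root is at index 1 with index 0 unused, as described above. The helper names and the sample keys are illustrative only.&lt;/p&gt;

```java
class HeapIndex {
    // Level-order layout with the root at index 1 (index 0 is left unused)
    static int parent(int k) { return k / 2; }
    static int leftChild(int k) { return 2 * k; }
    static int rightChild(int k) { return 2 * k + 1; }

    public static void main(String[] args) {
        // A heap-ordered array: every key is at least as large as its two children
        String[] heap = {null, "T", "S", "R", "P", "N", "O", "A"};
        int k = 2; // the node holding "S"
        System.out.println("parent of " + heap[k] + " is " + heap[parent(k)]);
        System.out.println("children of " + heap[k] + " are "
                + heap[leftChild(k)] + " and " + heap[rightChild(k)]);
    }
}
```

&lt;p&gt;Because moving up or down the tree is just a division or multiplication by two, no explicit links between nodes are needed.&lt;/p&gt;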
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Performance of priority queues with binary heaps?&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;In an N-key priority queue, the heap algorithms require no more than 1 + log N compares for insert and no more than 2logN compares for remove the maximum.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Performance of heap sort&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Heapsort is significant because it is the only method that is optimal (within a constant factor) in its use of both time and space—it is guaranteed to use ~2NlogN compares and constant extra space in the worst case. When space is very tight (for example, in an embedded system or on a low-cost mobile device) it is popular because it can be implemented with just a few dozen lines (even in machine code) while still providing optimal performance. However, it is rarely used in typical applications on modern systems because it has poor &lt;em&gt;cache&lt;/em&gt; (processor cache) performance: array entries are rarely compared with nearby array entries, so the number of cache misses is far higher than for quicksort, mergesort, and even shellsort, where most compares are with nearby entries.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Application of PriorityQueue&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;TopN&lt;/strong&gt; by some particular order of prioritization. If you are looking for the top ten entries among a billion items, do you really want to sort a billion-entry array? With a priority queue, you can do it with a ten-entry priority queue.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;When to use Java Comparable and when the Comparator?&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Implementing Comparable means implementing the compareTo method, which is supposed to define the &lt;em&gt;natural ordering&lt;/em&gt; of objects of a certain type.
There are many applications where we want to use different orders for the objects that we are sorting, depending on the situation. The Java Comparator interface allows us to build multiple orders for a single class. Its one abstract method, compare(), compares two objects. If we have a data type that implements this interface, we can pass a Comparator to sort(). In typical applications, items have multiple instance variables that might need to serve as sort keys. The Comparator mechanism is precisely what we need to allow this flexibility.&lt;/p&gt;
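&lt;p&gt;As a minimal sketch of multiple orders over one type: the Employee record below is hypothetical and exists only for this example (records need Java 16 or later). Each Comparator passed to Arrays.sort picks a different sort key.&lt;/p&gt;

```java
import java.util.Arrays;
import java.util.Comparator;

// Hypothetical data type, used only for illustration (records need Java 16+)
record Employee(String name, int age) {}

class ComparatorDemo {
    // Returns a copy of the input sorted by name
    static Employee[] byName(Employee[] staff) {
        Employee[] copy = staff.clone();
        Arrays.sort(copy, Comparator.comparing(Employee::name));
        return copy;
    }

    // Returns a copy sorted by age, with ties broken by name
    static Employee[] byAgeThenName(Employee[] staff) {
        Employee[] copy = staff.clone();
        Arrays.sort(copy, Comparator.comparingInt(Employee::age).thenComparing(Employee::name));
        return copy;
    }

    public static void main(String[] args) {
        Employee[] staff = {
            new Employee("Meera", 41), new Employee("Arun", 29), new Employee("Kiran", 29)
        };
        System.out.println(Arrays.toString(byName(staff)));
        System.out.println(Arrays.toString(byAgeThenName(staff)));
    }
}
```

&lt;p&gt;The record itself knows nothing about ordering. The orders live outside the class, which is what makes it easy to add another one later.&lt;/p&gt;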
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Can comparators be used with PriorityQueues as well?&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Yes. See &lt;a href="http://stackoverflow.com/questions/683041/java-how-do-i-use-a-priorityqueue"&gt;http://stackoverflow.com/questions/683041/java-how-do-i-use-a-priorityqueue&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;When is a sorting method stable?&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;If it preserves the relative order of equal keys in the array. Read the beautiful example on page 341&lt;/p&gt;
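&lt;p&gt;A small sketch of why stability matters, using a hypothetical Order record (Java 16 or later). It relies on Arrays.sort for object arrays, which the Java documentation guarantees to be stable: after sorting by hour and then by city, orders within each city are still in hour order.&lt;/p&gt;

```java
import java.util.Arrays;
import java.util.Comparator;

// Hypothetical record, used only for illustration (records need Java 16+)
record Order(String city, int hour) {}

class StabilityDemo {
    // Sorts by hour first, then by city. Because the second sort is stable,
    // orders from the same city keep the hour order from the first sort.
    static Order[] byCityKeepingHourOrder(Order[] orders) {
        Order[] copy = orders.clone();
        Arrays.sort(copy, Comparator.comparingInt(Order::hour));
        Arrays.sort(copy, Comparator.comparing(Order::city));
        return copy;
    }

    public static void main(String[] args) {
        Order[] orders = {
            new Order("Pune", 11), new Order("Delhi", 9),
            new Order("Pune", 8), new Order("Delhi", 14)
        };
        for (Order o : byCityKeepingHourOrder(orders)) {
            System.out.println(o);
        }
    }
}
```

&lt;p&gt;With an unstable sort, the second pass would be free to shuffle the two Delhi orders (or the two Pune orders) relative to each other.&lt;/p&gt;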
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Which sorting algorithms are stable?&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Of the algorithms covered here, only insertion sort and mergesort are stable. Selection sort, shellsort, quicksort and heapsort are not.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;hr&gt;
&lt;h4 id="searching"&gt;Searching&lt;/h4&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Popular data-structures to hold symbol tables?&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Binary search trees, red-black trees and hash tables.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Performance of brute force sequential search (unordered arrays or linked lists)&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Search misses and insertions in an (unordered) linked-list symbol table having N key-value pairs both require N compares, and search hits require N compares in the worst case. In particular, inserting N distinct keys into an initially empty linked-list symbol table uses ~N&lt;sup&gt;2&lt;/sup&gt;/2 compares. One useful measure is the total cost of searching for all of the keys in the table, divided by N. For sequential search this is ~N/2.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Performance of binary search for symbol tables&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Binary search in an ordered array with N keys uses no more than logN + 1 compares for a search (successful or unsuccessful). But inserting a new key into an ordered array of size N uses ~ 2N array accesses in the worst case, so inserting N keys into an initially empty table uses ~ N&lt;sup&gt;2&lt;/sup&gt; array accesses in the worst case.&lt;/p&gt;
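&lt;p&gt;To see how small the logN + 1 bound is in practice, the sketch below computes floor(log2(N)) + 1 from the bit length of N. The class and method names are illustrative, and N is assumed to be at least 1.&lt;/p&gt;

```java
class BinarySearchCost {
    // floor(log2(n)) + 1, which equals the number of bits needed to write n (n at least 1)
    static int maxCompares(int n) {
        return Integer.SIZE - Integer.numberOfLeadingZeros(n);
    }

    public static void main(String[] args) {
        for (int n : new int[] {1_000, 1_000_000, 1_000_000_000}) {
            System.out.println(n + " keys: at most " + maxCompares(n) + " compares");
        }
    }
}
```

&lt;p&gt;A million keys need at most 20 compares and a billion keys at most 30, which is why the search side of an ordered array is so attractive even though insertion is expensive.&lt;/p&gt;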
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Performance of BST (binary search trees) for symbol tables&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Search hits in a BST built from N random keys require ~ 1.39logN compares, on the average. Insertions and search misses in a BST built from N random keys require ~ 1.39logN compares, on the average.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Shortcoming of BST&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;The running times of algorithms on binary search trees depend on the shapes of the trees, which, in turn, depend on the order in which keys are inserted. In the best case, a tree with N nodes could be perfectly balanced, with ~ logN nodes between the root and each null link. In the worst case there could be N nodes on the search path. So, to optimise, keys can be deliberately inserted in random order to tilt towards the average-case search performance.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Performance of 2-3 Search trees&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Search and insert operations in a 2-3 tree with N keys are guaranteed to visit at most logN nodes.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Performance of Red-Black BST&lt;/strong&gt;
    &lt;p align="center"&gt;
    &lt;img alt="image" src="/images/algorithms/symbolperf.png"&gt;
    &lt;/p&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Why use hashing?&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;To be able to handle more complicated keys (custom objects, strings)&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;What are the two popular approaches to hash collision resolution?&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Separate chaining - bag of items for each hash key&lt;/li&gt;
&lt;li&gt;Linear probing - a form of &lt;a href="http://en.wikipedia.org/wiki/Hash_table#Open_addressing"&gt;Open addressing&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;In Java, which is faster - HashSet or TreeSet? What is the usecase for each?&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;HashSet&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Almost constant time performance for the basic operations (add, remove, contains and size) due to the use of hash functions&lt;/li&gt;
&lt;li&gt;does not guarantee that the order of elements will remain constant over time&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;TreeSet&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;guarantees log(n) time cost for the basic operations (add, remove and contains)&lt;/li&gt;
&lt;li&gt;guarantees that the elements of the set will be sorted (in ascending natural order, or in the order specified by you via its constructor)&lt;/li&gt;
&lt;li&gt;offers a few handy methods to deal with the ordered set like first(), last(), headSet(), and tailSet() etc&lt;/li&gt;
&lt;li&gt;Internally backed by a TreeMap, which is a red-black tree implementation&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Common features&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Being sets, both offer duplicate-free collection of elements&lt;/li&gt;
&lt;li&gt;It is generally faster to add elements to the HashSet and then convert the collection to a TreeSet for a duplicate-free sorted traversal&lt;/li&gt;
&lt;li&gt;Neither of these implementations is synchronized&lt;/li&gt;
&lt;li&gt;Java also has a LinkedHashSet - look it up to know about it more&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;In Java, which is faster - HashMap or TreeMap? What is the usecase for each?&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Along similar lines as HashSet vs. TreeSet. HashMap hashes the keys (using hashCode and equals). TreeMap uses a red-black tree internally. HashMap is more time efficient. TreeMap is more space efficient. TreeMap keeps its keys sorted, in an order that can also be specified with a comparator at construction time. HashMaps have no internal ordering. One should use HashMap for fast lookup and TreeMap for sorted iteration. HashMap allows one null key and any number of null values. HashMap doesn't allow duplicate keys. HashMap iteration performance depends on the &lt;em&gt;initial capacity&lt;/em&gt; and &lt;em&gt;load factor&lt;/em&gt; that can be passed during construction - TreeMap offers no such iteration performance tunables. &lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Why is order not maintained in Hash* collection implementations?&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;The whole point of hashing is to uniformly disperse the keys, so any order in the keys is lost when hashing.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;In Java, what is the rule with implementing hashCode?&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;If hashCode's are equal then objects may or may not be equal&lt;/li&gt;
&lt;li&gt;If hashCode's are not-equal the objects are not equal&lt;/li&gt;
&lt;/ul&gt;
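&lt;p&gt;The first rule is easy to see with two short strings that happen to collide. This is a minimal sketch with an illustrative class name. The hash values follow from the documented String.hashCode formula.&lt;/p&gt;

```java
class HashCollision {
    // True when two objects share a hash code, whether or not they are equal
    static boolean sameHash(Object a, Object b) {
        return a.hashCode() == b.hashCode();
    }

    public static void main(String[] args) {
        String a = "Aa";
        String b = "BB";
        System.out.println("same hashCode: " + sameHash(a, b)); // both hash to 2112
        System.out.println("equal:         " + a.equals(b));
    }
}
```

&lt;p&gt;The two strings share a hash code yet are not equal, so a hash-based collection still has to call equals() to tell them apart.&lt;/p&gt;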
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;In Java, what kind of collision resolution scheme is implemented for HashMap and Hashtable?&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Both use separate chaining. Google guava libraries have some implementations for linear probing&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Space usage of BST vs. separate chaining vs. linear probing?&lt;/strong&gt;
    &lt;p align="center"&gt;
    &lt;img alt="image" src="/images/algorithms/space.png"&gt;
    &lt;/p&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Performance of hashing vis-a-vis trees?&lt;/strong&gt;
    &lt;p align="center"&gt;
    &lt;img alt="image" src="/images/algorithms/hashperf.png"&gt;
    &lt;/p&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;What would be a good data-structure to use for counting all people within an income range (say 10k to 20k) in an age group (say 25 to 35 years) among a million people?&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Kd-trees because of the easy 2-dimensional split (at least one should say some kind of tree). Though Kd-trees can be used for n-dimensional searches very well too&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;</content><category term="posts"/></entry><entry><title>Weekend well spent with JSFoo &amp; NodeJS</title><link href="https://bharath12345.github.io/posts/weekend-well-spent-with-jsfoo-nodejs/" rel="alternate"/><published>2013-09-24T00:00:00-04:00</published><updated>2013-09-24T00:00:00-04:00</updated><author><name>Bharadwaj</name></author><id>tag:bharath12345.github.io,2013-09-24:/posts/weekend-well-spent-with-jsfoo-nodejs/</id><summary type="html">&lt;p&gt;Had been to the wonderful JavaScript conference &lt;a href="https://funnel.hasgeek.com/jsfoo2013/"&gt;JSFoo&lt;/a&gt; last week. The tremendous enthusiasm in the web development community for server side JavaScript was all at display. Personally, I have spent a lot of time coding visualizations with JavaScript. However only recently did I write some tidbits of code with NodeJS …&lt;/p&gt;</summary><content type="html">&lt;p&gt;Had been to the wonderful JavaScript conference &lt;a href="https://funnel.hasgeek.com/jsfoo2013/"&gt;JSFoo&lt;/a&gt; last week. The tremendous enthusiasm in the web development community for server side JavaScript was all at display. Personally, I have spent a lot of time coding visualizations with JavaScript. However only recently did I write some tidbits of code with NodeJS. And I hadn't spent any time properly studying it. The conference has spurred me to do better. I started reading NodeJS and working on a small project to create my first non-trivial NodeJS application (which I shall share in this blog). But those details are for a little later… let me start with JSFoo…&lt;/p&gt;
&lt;div class="toc"&gt;&lt;span class="toctitle"&gt;Table of Contents&lt;/span&gt;&lt;ul&gt;
&lt;li&gt;&lt;a href="#jsfoo"&gt;JSFoo&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#the-making-of-two-blogs"&gt;The making of two blogs…&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#bharathwritesin"&gt;bharathwrites.in&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#bharathblogsin"&gt;bharathblogs.in&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
&lt;h4 id="jsfoo"&gt;JSFoo&lt;/h4&gt;
&lt;p&gt;Being a bit of a computer science fundamentalist, I have looked upon those doing web development with a little skepticism. Do they really get data-structures? Complexity of algorithms? Well, those questions were put to rest for good by the many speakers at the conference. NodeJS might not be in production in a big way as of now. But there is no doubting the quality of people behind it. The design and frameworks are still in the making… but quality people from the developer community are lapping it up. And the industry is not to be left behind… Microsoft and Adobe were among those sponsoring the event - Microsoft (with its .Net dream fading slowly) was busy showing off IE10 while Adobe seems to be on its way to burying Flash and Flex with investments to build open-source JavaScript frameworks… &lt;/p&gt;
&lt;p&gt;For me the conference started on a beautiful note. It had to do with the Mozilla foundation. It was awesome of Mozilla to have brought in such a wonderful contingent for the meet. My first love has always been Firefox. I do all my development on Firefox. Coming to know that Chris Heilmann was among those in the hall made me smile within myself.&lt;/p&gt;
&lt;p&gt;Now coming to the few talks that will stay with me…&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href="http://robertnyman.com/"&gt;Robert Nyman&lt;/a&gt; was among the first speakers. He spoke on the upcoming FirefoxOS for mobile devices. I had my first brush with FirefoxOS at the Wikipedia hackathon and have spent some time with it. Android definitely needs another open-source competitor. With a JavaScript API platform, one hopes, FirefoxOS will catch on with the larger community of web developers. The next step for smartphones is to be able to support 1000s of lightweight apps. I hope that race gets kickstarted with FirefoxOS (there is already a nice '&lt;em&gt;search app&lt;/em&gt;' facility in FirefoxOS which tells me they have their marker in the right direction!)&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;I was really looking forward to listening to &lt;a href="http://christianheilmann.com/"&gt;Christian Heilmann&lt;/a&gt;. And his choice of topic did not let me down. In his talk he urged web developers to study HTML5. Developers continue to use shims and jQuery plugins unnecessarily - the features they look for have made their way into the specs and should be available by default (the full screen API is a case in point). Browsers are claiming HTML5 support without fully implementing the specs - and in this situation it becomes the job of developers to pound on the doors of the browser vendors (file bugs) if any part of the spec is unimplemented or glossed over. Personally, let me admit - I have never read a book on HTML5 (for that matter I don't remember if I have ever read any book on HTML at all). If someone had suggested reading a book on HTML5 before this talk I would have responded by saying that I find the W3C resources on the web quite sufficient. But now, after listening to Chris, I know why my thinking is wrong.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;One of the talks that blew my mind was that by &lt;a href="http://www.nileshtrivedi.com/"&gt;Nilesh Trivedi&lt;/a&gt; on Interactive Physics Simulations. I would have to watch the video of Nilesh's talk many times over to grasp all that he said. And to build the application that he has without using any pre-built frameworks is absolutely astounding! If you are a C/Java programmer with a liking for theory of computing like me, then, there you have it - there are people like Nilesh in the JavaScript world! &lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;The two workshops that I went to were both superbly conducted. &lt;a href="http://bharani.herokuapp.com/"&gt;Bharani Muthukumarswamy&lt;/a&gt; introduced me to MeteorJS (and made me promise to myself to try it soon). And in the other workshop &lt;a href="https://github.com/panbhag"&gt;Pankaj Bhageria&lt;/a&gt; made me construct the server side of a JavaScript app step-by-step. Both made me code. And I enjoyed it thoroughly.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Other good talks included the one by Om Shankar on WebRTC, Offline Apps by Manan Bharara, Persona based authentication system (newly being brought by Mozilla) by Francois Marier and the preview of developer tools for the upcoming IE10 by &lt;a href="http://blogorama.nerdworks.in/"&gt;Rajasekharan Vengalil&lt;/a&gt;.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Before I end my note on JSFoo I must express my &lt;em&gt;Thanks&lt;/em&gt; to HasGeek. I had been to the Fifth Elephant a few months ago and now JSFoo. I must congratulate them for filling what was a definite need among the developer community. Yes, we now have www.meetup.com and other hackathons happening ever more regularly. But the Indian software community, and developers in particular, need more interaction. I have come to learn about so many wonderful small companies and people through these two conferences that I have lost count. &lt;em&gt;Thank you HasGeek!&lt;/em&gt;&lt;/p&gt;
&lt;hr&gt;

&lt;h4 id="the-making-of-two-blogs"&gt;The making of two blogs…&lt;/h4&gt;
&lt;p&gt;A decade ago I had a blogspot blog. I used it for a couple of years. Then I got tired of it and created a new one on WordPress. But I never felt like writing anything of substance on it. With my hacker-like attitude (even back then), I always detested the way these blogs looked, the URL itself and many such things. Buying web-server space for hosting a blog felt plain wrong. So a year ago when I came to know of GitHub Pages I decided to try it as soon as I could. GitHub is the ideal platform for engineers to blog. Jekyll is super easy to learn. And for those with version control in their bloodstream and daydreams, Git feels so nice. So I bought my domain name (for less than Rs. 200!) and got started a couple of months ago. And though the blog is not close to what I want it to look like, it still feels so much better than blogspot…&lt;/p&gt;
&lt;p&gt;But then that was till last week. One thing that I did not like with GitHub Pages was Ruby. I don't know Ruby. And I have no inclination to learn it. So when I had to understand Gems and Rake it did not feel good. When I got a couple of error emails from GitHub saying that the blog build had failed, it felt worse (though the problems themselves were trivial to fix)…&lt;/p&gt;
&lt;p&gt;I knew I could use Heroku and host the blog as a NodeJS application while simultaneously putting it on GitHub. That would give me &lt;strong&gt;server side control&lt;/strong&gt;. And a SQL database! And a NoSql database!! So last weekend, while attending the conference, I let the urge take over. I started chipping away at my first NodeJS blog app… it is in fairly good shape now… and so… I am happy to present - &lt;a href="http://bharathblogs.in"&gt;http://bharathblogs.in&lt;/a&gt;!!&lt;/p&gt;
&lt;p&gt;So what sense does it make to have two blogs? None. So what am I going to do? Keep both! Well, the domain name costs next to nothing. (And I like to build backup plans with my applications!)… The thing is, I have built both and the code is almost identical. So why dismantle either anyway...? &lt;/p&gt;
&lt;p&gt;Now here is a quick primer on the how and what of building both of these...&lt;/p&gt;
&lt;hr&gt;

&lt;h4 id="bharathwritesin"&gt;bharathwrites.in&lt;/h4&gt;
&lt;p&gt;The components -&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;GitHub pages for hosting&lt;/li&gt;
&lt;li&gt;Jekyll for static blog generation&lt;/li&gt;
&lt;li&gt;Grunt for JS minify (see the Gruntfile.js for complete list of tasks)&lt;/li&gt;
&lt;li&gt;Twitter Bootstrap, FontAwesome for the blog's look and feel&lt;/li&gt;
&lt;li&gt;Posts use various JavaScript frameworks like Dojo, jQuery, Angular, D3, Stack etc&lt;/li&gt;
&lt;/ul&gt;
&lt;hr&gt;

&lt;h4 id="bharathblogsin"&gt;bharathblogs.in&lt;/h4&gt;
&lt;p&gt;The components -&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Heroku for hosting&lt;/li&gt;
&lt;li&gt;GitHub for version control&lt;/li&gt;
&lt;li&gt;NodeJS as the server side platform&lt;/li&gt;
&lt;li&gt;&lt;a href="http://jsantell.github.io/poet"&gt;Poet&lt;/a&gt; as the blogging framework&lt;/li&gt;
&lt;li&gt;Grunt for JS minify (see the Gruntfile.js for complete list of tasks)&lt;/li&gt;
&lt;li&gt;Twitter Bootstrap, FontAwesome for the blog's look and feel (etc)&lt;/li&gt;
&lt;li&gt;Posts use various JavaScript frameworks like Dojo, jQuery, Angular, D3, Stack etc&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Now I plan to slowly add more functionality to the server-side of my NodeJS blog app. Probably scrape a few RESTful data sources on the web that is of interest to me and hopefully that of my visitors. Start using the Heroku provided MongoDB. And so on… one sad thing is Heroku does not support WebSockets… Else I had couple of interesting ideas for that one. (And one of these days I will probably swap bharathwrites.in to be hosted from Heroku and bharathblogs.in from GitHub pages… want to hack on my NodeJS blog a lot and I like the bharathwrites.in url better) &lt;/p&gt;</content><category term="posts"/></entry><entry><title>The Bleeding Edge Of A Web Application...</title><link href="https://bharath12345.github.io/posts/the-bleeding-edge-of-an-application/" rel="alternate"/><published>2013-09-11T00:00:00-04:00</published><updated>2013-09-11T00:00:00-04:00</updated><author><name>Bharadwaj</name></author><id>tag:bharath12345.github.io,2013-09-11:/posts/the-bleeding-edge-of-an-application/</id><summary type="html">&lt;p&gt;Most web applications have the well-known 3-tiered structure - WebTier &amp;gt; ApplicationTier &amp;gt; DataTier. Both WebTier and ApplicationTier have the web-layer to parse the incoming HTTP requests. Its in the WebTier that one deploy's load-balancing L4-routers like Apache/Nginx or Netscaler like appliances. HTTP requests are forwarded by the WebTier to the ApplicationTier …&lt;/p&gt;</summary><content type="html">&lt;p&gt;Most web applications have the well-known 3-tiered structure - WebTier &amp;gt; ApplicationTier &amp;gt; DataTier. Both WebTier and ApplicationTier have the web-layer to parse the incoming HTTP requests. Its in the WebTier that one deploy's load-balancing L4-routers like Apache/Nginx or Netscaler like appliances. HTTP requests are forwarded by the WebTier to the ApplicationTier which is generally served by a much bigger farm of servers. Web-layer in the ApplicationTier is the focus of this blog. 
It's a challenging area of software development for the following reasons and more - &lt;/p&gt;
&lt;div class="toc"&gt;&lt;span class="toctitle"&gt;Table of Contents&lt;/span&gt;&lt;ul&gt;
&lt;li&gt;&lt;a href="#1-quantifying-the-bleeding-edge"&gt;1. Quantifying the 'Bleeding Edge'&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#2-why-is-it-hard"&gt;2. Why Is It Hard?&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#3-software-development-of-web-applications"&gt;3. Software Development Of Web Applications&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#4-jvm-based-web-apps"&gt;4. JVM Based Web Apps&lt;/a&gt;&lt;ul&gt;
&lt;li&gt;&lt;a href="#i-servlet-specification-frameworks"&gt;(i) Servlet Specification Frameworks&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#ii-mvc-frameworks"&gt;(ii) MVC Frameworks&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#iii-asynchronous-event-driven-frameworks"&gt;(iii) Asynchronous Event-Driven Frameworks&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#5-nodejs-javascript-on-the-server-side"&gt;5. NodeJS - JavaScript on the server side&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#6-ruby-and-php"&gt;6. Ruby and PHP&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#7-what-to-use-for-my-project"&gt;7. What to use for my project?&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
&lt;ul&gt;
&lt;li&gt;Huge volume of requests, with &lt;em&gt;read&lt;/em&gt; requests generally surpassing &lt;em&gt;write&lt;/em&gt; by an order of magnitude or so&lt;/li&gt;
&lt;li&gt;Change. Website content and web-service APIs both change very often&lt;/li&gt;
&lt;li&gt;Variety of consumers. People read/write to the web. And so do other software applications&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Being a Java and JavaScript developer, my interest has been in the emergent software stacks in these two languages. To understand their &lt;em&gt;raison d'être&lt;/em&gt;. For that, I start by taking a look at the numbers (HTTP requests) at some of the popular websites. Then move on to some of the core technical problems. And compare some of the competing software stacks.&lt;/p&gt;
&lt;p&gt;But before discussing the web-layer in the ApplicationTier it is instructive to look at the pure WebTier itself. &lt;a href="http://news.netcraft.com/archives/2013/09/05/september-2013-web-server-survey.html"&gt;Netcraft's September 2013 Web Server Survey&lt;/a&gt; is a good place to start. All the top web-servers are C/C++ based. For those unfamiliar with actual web application deployments, these web-servers are not used to host the applications themselves. They serve static pages and act as L4-routers, firewalls and load-balancers. They are placed at the very gate of modern web-shops and all requests go through them. These tasks are well defined, so it makes sense to develop them in native languages for brute speed.&lt;/p&gt;
&lt;hr&gt;

&lt;h4 id="1-quantifying-the-bleeding-edge"&gt;1. Quantifying the 'Bleeding Edge'&lt;/h4&gt;
&lt;p&gt;Here are the numbers from recently published articles on Twitter, WhatsApp and Facebook. Others like Google, Wikipedia, Amazon and Skype cannot be far behind. &lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Twitter: 300K requests per second (RPS) for reading and 6000 RPS for writing - &lt;a href="http://highscalability.com/blog/2013/7/8/the-architecture-twitter-uses-to-deal-with-150m-active-users.html"&gt;source1&lt;/a&gt;, &lt;a href="https://blog.twitter.com/2013/new-tweets-per-second-record-and-how"&gt;source2&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;WhatsApp: 10 billion requests sent and received in one day - &lt;a href="http://thenextweb.com/mobile/2013/06/13/whatsapp-is-now-processing-a-record-27-billion-messages-per-day/"&gt;source&lt;/a&gt; &lt;/li&gt;
&lt;li&gt;Facebook: 12 million HTTP requests per second - &lt;a href="http://www.datadoghq.com/2013/07/the-best-of-velocity-and-devopsdays-2013-part-ii/"&gt;source&lt;/a&gt; &lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;(&lt;em&gt;All these articles are quite recent&lt;/em&gt;)&lt;/p&gt;
&lt;hr&gt;

&lt;h4 id="2-why-is-it-hard"&gt;2. Why Is It Hard?&lt;/h4&gt;
&lt;p&gt;Two good resources to start understanding why these scales are hard on software development in ApplicationTier are -&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;C10K problem by &lt;a href="http://www.kegel.com/c10k.html"&gt;Kegel&lt;/a&gt; and &lt;a href="http://bulk.fefe.de/scalable-networking.pdf"&gt;Felix von Leitner&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;C10M problem by &lt;a href="http://c10m.robertgraham.com/p/manifesto.html"&gt;Robert Graham&lt;/a&gt;. And &lt;a href="http://www.youtube.com/watch?v=D09jdbS6oSI"&gt;this video&lt;/a&gt; by him is very instructive &lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;But let me state the problem(s) simply. The reasons why it is hard to handle HTTP requests are -&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Forking a process&lt;/strong&gt;: is too expensive a compute operation to perform every time a request arrives&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Forking a thread&lt;/strong&gt;: is less expensive on compute. But writing multi-threaded applications for multi-core systems is very tough (and &lt;em&gt;actually&lt;/em&gt; creating a new thread is not inexpensive at all)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Use thread pools&lt;/strong&gt;: It just shifts the bottleneck. Once you have a thread-pool, each thread has to do a select() or poll() to find the next nonblocking socket ready for IO. But doing a select() or poll() on a huge array of open socket descriptors is extremely inefficient at the kernel level (check out the deep analysis of the C10K problem in the links mentioned above)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;The Event driven model&lt;/strong&gt;: requires a paradigm shift in thinking and designing applications from the bottom up. The best way to start grasping the idea is to read &lt;a href="http://www.reactivemanifesto.org"&gt;The Reactive Manifesto&lt;/a&gt;. This model is not very different from the SEDA architecture. Reactive applications are a very fine idea and one of the reasons why I delved into this subject in the first place…&lt;/li&gt;
&lt;/ol&gt;
&lt;hr&gt;

&lt;h4 id="3-software-development-of-web-applications"&gt;3. Software Development Of Web Applications&lt;/h4&gt;
&lt;p&gt;My current view is that, broadly, there are 3 different language families to develop web applications on server and client sides -&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Server&lt;/strong&gt;: JVM based, compiled and statically typed; &lt;strong&gt;Client&lt;/strong&gt;: JavaScript&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Server&lt;/strong&gt;: Ruby/PHP, interpreted and dynamically typed; &lt;strong&gt;Client&lt;/strong&gt;: JavaScript&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Server&lt;/strong&gt;: NodeJS, interpreted and dynamically typed; &lt;strong&gt;Client&lt;/strong&gt;: JavaScript&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Thus, on the server side, the choice is between JVM (polyglot), Ruby/PHP and NodeJS. &lt;/p&gt;
&lt;h4 id="4-jvm-based-web-apps"&gt;4. JVM Based Web Apps&lt;/h4&gt;
&lt;p&gt;The web-layer in JVM world is filled with 3 types of frameworks - &lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Frameworks that support the &lt;em&gt;servlet specification&lt;/em&gt; (latest one is 3.0) &lt;/li&gt;
&lt;li&gt;MVC frameworks &lt;/li&gt;
&lt;li&gt;Asynchronous event-driven frameworks based on Netty&lt;/li&gt;
&lt;/ol&gt;
&lt;h5 id="i-servlet-specification-frameworks"&gt;(i) Servlet Specification Frameworks&lt;/h5&gt;
&lt;p&gt;These include Tomcat and Jetty. What is the main motivator for the servlet spec? It is to manage state information that does not exist in the stateless HTTP protocol. &lt;a href="http://docs.oracle.com/javaee/6/api/javax/servlet/http/HttpServletRequest.html"&gt;HttpServletRequest&lt;/a&gt; provides the getSession() API, which returns an HttpSession object: a container that holds attributes for a single transaction spread across multiple HTTP request/responses. Apart from this central feature of sessions, the servlet API also defines the interfaces that a servlet container has to adhere to in order to provide concurrent request processing in a sandbox environment. So is there a drawback in this idea? Yes, there is. The very idea of &lt;em&gt;state&lt;/em&gt; brings down the performance of these containers. That is the reason why developing highly performant RESTful APIs using servlet containers is a bad idea. In a &lt;a href="http://en.wikipedia.org/wiki/Representational_state_transfer"&gt;RESTful&lt;/a&gt; design, the client is expected to maintain the &lt;em&gt;state&lt;/em&gt;, if required. Servlet containers can be tuned for statelessness, but then, that goes against one of the fundamental ideas of the spec. And my readings tell me that these frameworks don't become highly performant on turning off the statefulness.&lt;/p&gt;
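&lt;p&gt;The difference between server-held and client-held state can be sketched in a few lines. This is a language-neutral illustration written in JavaScript, not the servlet API: the names sessions, addToCartStateful and addToCartStateless are hypothetical, and the shopping cart is just an assumed example of a transaction spread across several requests.&lt;/p&gt;

```javascript
// Stateful style: the server remembers each client's cart between requests.
var sessions = {}; // hypothetical in-memory session store, keyed by session id

function addToCartStateful(sessionId, item) {
    if (!sessions[sessionId]) {
        sessions[sessionId] = { cart: [] };
    }
    sessions[sessionId].cart.push(item);
    return sessions[sessionId].cart;
}

// Stateless style: the client sends its current cart with every request,
// and the server returns the new cart without remembering anything.
function addToCartStateless(cart, item) {
    return cart.concat([item]);
}
```

&lt;p&gt;In the stateless version nothing has to be looked up or kept alive on the server between requests, which is the property the RESTful style relies on.&lt;/p&gt;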
&lt;h5 id="ii-mvc-frameworks"&gt;(ii) MVC Frameworks&lt;/h5&gt;
&lt;p&gt;These include Spring MVC, Struts, Tapestry, Wicket etc. I have used two of these - &lt;a href="http://struts.apache.org/"&gt;Struts2&lt;/a&gt; and &lt;a href="http://wicket.apache.org/"&gt;Wicket&lt;/a&gt; - in building applications that have seen deployment. The fundamental motivation for these frameworks is ease-of-development (annotations etc), clean separation of concerns (MVC design pattern), a lot of goodies (like templating etc) and integration with other JavaEE stacks (Struts2-Spring integration etc). &lt;/p&gt;
&lt;h5 id="iii-asynchronous-event-driven-frameworks"&gt;(iii) Asynchronous Event-Driven Frameworks&lt;/h5&gt;
&lt;p&gt;And now I come to the most interesting area of Java web application development: &lt;a href="http://netty.io/"&gt;Netty&lt;/a&gt; based frameworks like &lt;a href="http://www.playframework.com/"&gt;Play!&lt;/a&gt; and &lt;a href="http://vertx.io/"&gt;Vert.x&lt;/a&gt;. These frameworks do not comply with the servlet specification. They use Netty underneath for asynchronous event based handling of HTTP requests (I cover &lt;em&gt;what-the-hell-is-asynchronous-event-driven&lt;/em&gt; in the NodeJS section below). Netty is stateless, making the server side fast and efficient. The frameworks on top are built to match the ease-of-dev and richness offered by the likes of Struts and Tapestry. They also offer APIs for client-side statefulness. &lt;strong&gt;So these frameworks are an effort to mix high performance with ease-of-dev.&lt;/strong&gt; But moving to event-based and asynchronous thinking is not straightforward. It needs a mind shift akin to the transition to object-oriented programming. However, the promise they hold is to be able to build web applications that defy &lt;a href="http://en.wikipedia.org/wiki/Amdahl's_law"&gt;Amdahl's law&lt;/a&gt;. If you are a new shop with bright Java engineers wanting to build a highly scalable web-application, then these are the frameworks you should start exploring first...&lt;/p&gt;
&lt;hr&gt;

&lt;h4 id="5-nodejs-javascript-on-the-server-side"&gt;5. NodeJS - JavaScript on the server side&lt;/h4&gt;
&lt;p&gt;The &lt;a href="https://github.com/joyent/node/wiki/Projects,-Applications,-and-Companies-Using-Node"&gt;list&lt;/a&gt; of companies and websites powered by NodeJS is growing longer by the day. However NodeJS is still a newbie. Why would somebody want to use NodeJS? NodeJS makes two very interesting promises -&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;End-to-end JavaScript shop for your web application&lt;/li&gt;
&lt;li&gt;High performance through event-based asynchronous model&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The first promise is easy to understand. Any good web application requires a team of good designers and client-side programmers. If the programming language on both client-side and server-side is the same, then it reduces the risk of investment in diverse technologies and brings down the barriers between teams and to moving people between them.&lt;/p&gt;
&lt;p&gt;The second promise of performance is more interesting. Is NodeJS as fast as the Java based async frameworks? &lt;a href="http://www.cubrid.org/blog/dev-platform/inside-vertx-comparison-with-nodejs/"&gt;This&lt;/a&gt; blog presents an excellent comparison. It goes to show that NodeJS is no match for the JVM based frameworks. It's difficult to beat the JVM!&lt;/p&gt;
&lt;p&gt;But moving ahead of comparisons, let me dwell a little more on the aspect of performance promised by the event-driven asynchronous frameworks in general. The hype around such frameworks is increasing day-by-day and is grounded on firm theoretical foundations. So how exactly does async and event-driven help? NodeJS provides a good base to explore since one cannot do anything but asynchronous event-based HTTP processing with NodeJS! Let us study this code fragment for a while (it comes from &lt;a href="http://shop.oreilly.com/product/0636920024606.do"&gt;this&lt;/a&gt; excellent book on NodeJS by O'Reilly) -&lt;/p&gt;
&lt;pre&gt;
        // load http module
        var http = require('http');
        var fs = require('fs');

        // create http server
        http.createServer(function (req, res) {

            // open and read in a file
            fs.readFile('textfile.txt', 'utf8', function(err, data) {
                res.writeHead(200, {'Content-Type': 'text/plain'});
                if (err) {
                    res.write('Could not find or open file for reading\n');
                } else {
                    // if no error, write file to client
                    res.write(data);
                }
                res.end();
            });

        }).listen(8124, function() {
            console.log('bound to port 8124');
        });

        console.log('Server running on 8124/');
&lt;/pre&gt;

&lt;p&gt;The following aspects need to be understood -&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The two instances of asynchronous behaviour - one for http I/O and the other for file I/O&lt;/li&gt;
&lt;li&gt;This program never blocks. NEVER.&lt;/li&gt;
&lt;li&gt;Multiple types of events are emitted - when a request arrives, when a file I/O request completes - and these events are consumed in a single giant event loop within the NodeJS framework&lt;/li&gt;
&lt;li&gt;Compute-heavy algorithms (say, of N-squared complexity) should not be attempted synchronously: they take up all of the processing core's bandwidth and bring the whole system to a halt. So, such event-based asynchronous processing is best suited for applications that can be broken down into multiple stages, like a SEDA architecture&lt;/li&gt;
&lt;li&gt;The application itself acts as one giant event-producing and event-consuming engine which should be seen as single-threaded and bound to a single core. To make use of &lt;a href="http://stackoverflow.com/questions/2387724/node-js-on-multi-core-machines"&gt;multiple cores&lt;/a&gt;, multiple instances of NodeJS can be run on the same system&lt;/li&gt;
&lt;/ul&gt;
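&lt;p&gt;The point about compute-heavy work can be sketched as follows. This is a minimal illustration and not code from the book: the function name sumOfSquaresInChunks and the chunk size are hypothetical. It splits a long-running loop into slices and hands control back to the event loop between slices using setImmediate, so that pending I/O events get a chance to be handled.&lt;/p&gt;

```javascript
// Hypothetical helper: sums i*i for i in [0, n) without monopolising the event loop.
// Assumes n is a non-negative integer and chunkSize is a positive integer.
function sumOfSquaresInChunks(n, chunkSize, done) {
    var total = 0;
    var next = 0;

    function runChunk() {
        // Process at most chunkSize items in this turn of the event loop
        var end = Math.min(next + chunkSize, n);
        while (next !== end) {
            total += next * next;
            next += 1;
        }
        if (next === n) {
            // All slices processed: hand the result to the callback
            done(total);
        } else {
            // Yield so that queued I/O events are handled before the next slice
            setImmediate(runChunk);
        }
    }

    runChunk();
}

sumOfSquaresInChunks(10, 3, function (total) {
    console.log('sum of squares:', total);
});
```

&lt;p&gt;The total amount of work is unchanged; only the longest uninterrupted stretch on the single thread shrinks to one slice.&lt;/p&gt;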
&lt;p&gt;NodeJS has found tremendous traction with the developer community. I am heading to &lt;a href="https://jsfoo.in/2013/"&gt;JSFoo in Bangalore&lt;/a&gt; next week, and one look at the funnel would tell you that every second session has something to do with NodeJS. And NodeJS has a plethora of MVC frameworks which are maturing fast. A sample - &lt;a href="http://expressjs.com/"&gt;Express&lt;/a&gt;, &lt;a href="http://geddyjs.org/"&gt;Geddy&lt;/a&gt;, &lt;a href="http://flatironjs.org/"&gt;FlatironJS&lt;/a&gt;, &lt;a href="http://emberjs.com/"&gt;EmberJS&lt;/a&gt; - these are definitely poised to give MVC frameworks in Ruby and PHP a run for their money in simplicity, performance and features. &lt;/p&gt;
&lt;hr&gt;

&lt;h4 id="6-ruby-and-php"&gt;6. Ruby and PHP&lt;/h4&gt;
&lt;p&gt;With JVM based frameworks occupying one end of the spectrum offering high-performance + maintainability and NodeJS based frameworks at the other end with simplicity + low-cost + ease-of-dev, how much middle ground is left for PHP and Ruby? I am not an expert in either of these two, so I will stay away from making predictions. One thing that is in favour of PHP/Ruby is that both are &lt;em&gt;proven&lt;/em&gt; in large production applications while reactive Java frameworks and NodeJS are still not. How long will this status last? Will NodeJS and Java reactive frameworks take away a chunk of web applications that would otherwise have been Ruby/PHP's? Or will the web applications playing field get expanded with the entry of these new players, creating room for all?&lt;/p&gt;
&lt;hr&gt;

&lt;h4 id="7-what-to-use-for-my-project"&gt;7. What to use for my project?&lt;/h4&gt;
&lt;p&gt;I round off this post with some guidance, albeit reluctantly. Apart from the usual suspects of time-to-market, capex-opex investment, engineering-skill and requirements-complexity that make project delivery complex, I propose that a few more criteria come into play when it comes to web applications -&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Is the web application part of a packaged-product (bundled in a CDROM) or delivered as part of the general web or SaaS?&lt;/li&gt;
&lt;li&gt;Is the web application intra-enterprise or for open-internet usage?&lt;/li&gt;
&lt;li&gt;Is the web application mainly for human consumption or accessed by other software services?&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;I leave those aspects to the good judgement of the readers. I would definitely like to get feedback from those who disagree with my guidance below. The idea of writing this guidance is to paint broad strokes… Exceptions among projects/people always exist!&lt;/p&gt;
&lt;div class="bs-docs-grid" id="dev"&gt;
    &lt;div class="row show-grid"&gt;
        &lt;div class="col-md-2 right alignCenter"&gt;&lt;h5&gt;JVM Based&lt;/h5&gt;&lt;/div&gt;
        &lt;div class="col-md-10 left"&gt;
            &lt;ol&gt;
                &lt;li&gt;Performance: The JVM is the world's best VM in performance. Ruby/PHP/NodeJS are interpreted and don't come close in performance (doing anything in JRuby is, in my view, simply a bad idea). Facebook created HipHop for PHP to make it scale - this counts as an exception. Twitter and LinkedIn shifted from Ruby to Scala (which is JVM based) and achieved higher performance numbers. One can find umpteen examples like this…&lt;/li&gt;
                &lt;li&gt;Development Time: Java and other JVM languages are slower to develop in compared to Ruby/PHP/NodeJS. And that's the reason why frameworks like Play! are trying hard to sell themselves as suited for fast development&lt;/li&gt;
                &lt;li&gt;Cost: Java developers are more expensive&lt;/li&gt;
                &lt;li&gt;Suited for: Large web applications. Enterprise products. Mission critical applications&lt;/li&gt;
            &lt;/ol&gt;
        &lt;/div&gt;
    &lt;/div&gt;
    &lt;div class="row show-grid"&gt;
        &lt;div class="col-md-2 right alignCenter"&gt;&lt;h5&gt;Ruby, PHP&lt;/h5&gt;&lt;/div&gt;
        &lt;div class="col-md-10 left"&gt;
            &lt;ol&gt;
                &lt;li&gt;Performance: Definitely not bad&lt;/li&gt;
                &lt;li&gt;Development Time: Fast&lt;/li&gt;
                &lt;li&gt;Cost: Medium&lt;/li&gt;
                &lt;li&gt;Suited for: Medium sized projects&lt;/li&gt;
            &lt;/ol&gt;
        &lt;/div&gt;
    &lt;/div&gt;
    &lt;div class="row show-grid"&gt;
        &lt;div class="col-md-2 right alignCenter"&gt;&lt;h5&gt;NodeJS&lt;/h5&gt;&lt;/div&gt;
        &lt;div class="col-md-10 left"&gt;
            &lt;ol&gt;
                &lt;li&gt;Performance: The jury is still out. Does the Google V8 engine challenge and beat PHP/Ruby? It will never be able to match the JVM though.&lt;/li&gt;
                &lt;li&gt;Development Time: Fast&lt;/li&gt;
                &lt;li&gt;Cost: Low. Since the whole application is built on a single language stack, the server-side developers and client-side developers can co-work&lt;/li&gt;
                &lt;li&gt;Suited for: Smaller chatty applications&lt;/li&gt;
            &lt;/ol&gt;
        &lt;/div&gt;
    &lt;/div&gt;
&lt;/div&gt;</content><category term="posts"/></entry><entry><title>Application Topology Graphs - Usecase, Different Product Offerings, Prototype Using D3 and jsPlumb</title><link href="https://bharath12345.github.io/posts/topology-graphs-with-d3-and-jsplumb/" rel="alternate"/><published>2013-09-01T00:00:00-04:00</published><updated>2013-09-01T00:00:00-04:00</updated><author><name>Bharadwaj</name></author><id>tag:bharath12345.github.io,2013-09-01:/posts/topology-graphs-with-d3-and-jsplumb/</id><summary type="html">&lt;p&gt;Graph depictions are common for problems like computer networks, social networks etc. Sometime ago, I came across the use-case of graphs for software application topologies. This post covers the few things I discovered on the topic of application topologies and their graphical representation.&lt;/p&gt;
&lt;div class="toc"&gt;&lt;span class="toctitle"&gt;Table of Contents&lt;/span&gt;&lt;ul&gt;
&lt;li&gt;&lt;a href="#usecase"&gt;Usecase&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#prototype"&gt;Prototype&lt;/a&gt;&lt;ul&gt;
&lt;li&gt;&lt;a href="#about-these-graphs"&gt;About these …&lt;/a&gt;&lt;/li&gt;&lt;/ul&gt;&lt;/li&gt;&lt;/ul&gt;&lt;/div&gt;</summary><content type="html">&lt;p&gt;Graph depictions are common for problems like computer networks, social networks etc. Sometime ago, I came across the use-case of graphs for software application topologies. This post covers the few things I discovered on the topic of application topologies and their graphical representation.&lt;/p&gt;
&lt;div class="toc"&gt;&lt;span class="toctitle"&gt;Table of Contents&lt;/span&gt;&lt;ul&gt;
&lt;li&gt;&lt;a href="#usecase"&gt;Usecase&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#prototype"&gt;Prototype&lt;/a&gt;&lt;ul&gt;
&lt;li&gt;&lt;a href="#about-these-graphs"&gt;About these graphs&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#comparison"&gt;Comparison&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#apm-products"&gt;APM Products&lt;/a&gt;&lt;ul&gt;
&lt;li&gt;&lt;a href="#apm-product-screenshots"&gt;APM Product Screenshots&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#concluding-thoughts"&gt;Concluding Thoughts&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
&lt;h3 id="usecase"&gt;Usecase&lt;/h3&gt;
&lt;p&gt;One aspect that makes application topologies a challenge is that they are &lt;em&gt;logical&lt;/em&gt; and not &lt;em&gt;physical&lt;/em&gt;. That is, the boundaries of a distributed application are difficult to define. In web companies/banks it is usual to find one &lt;em&gt;application-owner&lt;/em&gt; responsible just for the database system while many others are responsible for the applications that make use of the database. The database thus becomes a shared application with multiple owners/users. From the point of view of graphical representation of applications in such an enterprise, the representation of a running application thus becomes 'logical' - one user might want to see his application topology include the database while another may not. The database-system application owner might want to see graphs that show just the clustered databases with their external interfaces and/or graphs which include the in-enterprise applications. Thus, depending upon the organization hierarchies, different application components may be required to be grouped differently (both hardware resource components like servers and software components like application servers). Inter and intra application views are required. And different users and user groups may require different &lt;em&gt;layer-transitions&lt;/em&gt; starting with a view of the application they own - both drilling-in and drilling-out - through the maze of applications and their constituents. Many Application Performance Management (APM) products claim to provide such graphical views. I take a look at the application graph views of popular APM vendors like AppDynamics and OpTier in a later section of this post.&lt;/p&gt;
&lt;h3 id="prototype"&gt;Prototype&lt;/h3&gt;
&lt;p&gt;Before getting too deep into thinking about application graphs I decided to develop a prototype for such graph representations. Unsurprisingly, after just a few hours of looking through the world of JavaScript I discovered multiple libraries capable of good graph rendering. &lt;a href="http://stackoverflow.com/questions/7034/graph-visualization-code-in-javascript"&gt;This&lt;/a&gt; StackOverflow thread is useful. One can buy good commercial graph rendering libraries like &lt;a href="http://www.yworks.com"&gt;yFiles&lt;/a&gt; or use free open-source ones like &lt;a href="http://www.graphdracula.net"&gt;GraphDracula&lt;/a&gt;, mxGraph etc. But the libraries that I was most impressed with were &lt;a href="http://www.d3js.org"&gt;D3&lt;/a&gt; and &lt;a href="http://www.jsplumbtoolkit.com"&gt;jsPlumb&lt;/a&gt;. I have played with D3 for over a year now and it is &lt;em&gt;the&lt;/em&gt; most exciting JavaScript library for me on this planet! The very paradigm of data-modeling and programming for D3 is enlightening and it provides for extremely vivid and smooth graphical representations of all kinds. And coming to jsPlumb, just a visit to the website is good enough to excite any programmer about its potential. So I got cracking with D3 and jsPlumb. Below are the two graph prototypes I came up with (it did not take much effort to code these using help from existing code available on the web). The code for these prototypes is available on my GitHub repository too. I have used &lt;a href="http://dojotoolkit.org/"&gt;Dojo&lt;/a&gt; to modularise the code (the AMD way) and to draw up the containers.&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Note: The interactive D3 and jsPlumb graph prototypes that were originally embedded here require JavaScript libraries that are not yet available in this version of the site. The descriptions and comparisons below provide details about these prototypes.&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;&amp;nbsp;&lt;/p&gt;
&lt;h4 id="about-these-graphs"&gt;About these graphs&lt;/h4&gt;
&lt;div class="bs-docs-grid" id="about"&gt;
    &lt;div class="row show-grid"&gt;
        &lt;div class="col-md-6 left alignCenter"&gt;&lt;h5&gt;D3 Graph&lt;/h5&gt;&lt;/div&gt;
        &lt;div class="col-md-6 right alignCenter"&gt;&lt;h5&gt;jsPlumb Graph&lt;/h5&gt;&lt;/div&gt;
    &lt;/div&gt;
    &lt;div class="row show-grid"&gt;
        &lt;div class="col-md-6 left"&gt;
            &lt;ol&gt;
                &lt;li&gt;This is more like an inter-application and application-group view&lt;/li&gt;
                &lt;li&gt;The two different types of icons stand for single-applications (orange text) and application-groups (blue text)&lt;/li&gt;
                &lt;li&gt;The graph can be dragged and zoomed. To drag/pan click on the graph and drag it. To zoom use the mouse scroller&lt;/li&gt;
                &lt;li&gt;The graph actually represents a set of interconnected applications and application-groups&lt;/li&gt;
                &lt;li&gt;With D3 it is not very difficult to add hover effect on nodes and links atop such a graph. It is not difficult to add color effects to application nodes and edges to signify status&lt;/li&gt;
                &lt;li&gt;The icons, text and links are all SVG - so they scale beautifully on zooming&lt;/li&gt;
                &lt;li&gt;Every refresh of the page leads to a re-rendering of the graph in a different way. This is so because the graph is rendered using the &lt;a href="http://bl.ocks.org/mbostock/4062045"&gt;D3 force-directed graph&lt;/a&gt; layout. The position of nodes and edges is not fixed but computed by the algorithm each time the page is rendered, for the specified gravity, distance and charge configurations (this prototype does not do a thorough job of getting the nitty-gritty of a D3 force layout right for the best possible rendering within the coordinates of a box. Thoughtful tuning of parameters should make the graph good for all form factors and far better than I show here)&lt;/li&gt;
            &lt;/ol&gt;
        &lt;/div&gt;
        &lt;div class="col-md-6 right"&gt;
            &lt;ol&gt;
                &lt;li&gt;This is more like an intra-application view&lt;/li&gt;
                &lt;li&gt;This depicts a typical web-application with its 3-tiers: web-layer, app-layer and datasource (database, external etc)&lt;/li&gt;
                &lt;li&gt;jsPlumb provides many different types of connectors and endpoints. After playing with the options for a while I have left the connections to look like 'Z' simply because it looked nice to me! (the more appropriate links would probably be straight lines, but this is just a playful prototype!). I have chosen the source endpoint of the connections to have a blue dot. The connections have an arrow on top (there are many choices for such settings)&lt;/li&gt;
                &lt;li&gt;Mouse-over the links to see the color change from yellow to blue - this is just using a simple css setting&lt;/li&gt;
                &lt;li&gt;To differentiate the 3 layers, I have internally used Dojo TitlePanes. I have a liking for their neat rendering&lt;/li&gt;
                &lt;li&gt;The icons are SVG. I did not try to implement zoom, pan or node/link movement. They are very much doable, though non-trivial&lt;/li&gt;
            &lt;/ol&gt;
        &lt;/div&gt;
    &lt;/div&gt;
&lt;/div&gt;

&lt;h4 id="comparison"&gt;Comparison&lt;/h4&gt;
&lt;div class="bs-docs-grid" id="comparison"&gt;
    &lt;div class="row show-grid"&gt;
        &lt;div class="col-md-2 first"&gt;&lt;/div&gt;
        &lt;div class="col-md-5 left alignCenter"&gt;&lt;h5&gt;D3&lt;/h5&gt;&lt;/div&gt;
        &lt;div class="col-md-5 right alignCenter"&gt;&lt;h5&gt;jsPlumb&lt;/h5&gt;&lt;/div&gt;
    &lt;/div&gt;
    &lt;div class="row show-grid"&gt;
        &lt;div class="col-md-2 first"&gt;&lt;h5&gt;Scalability&lt;/h5&gt;&lt;/div&gt;
        &lt;div class="col-md-5 left"&gt;D3 is built for scalability of visual components. Hundreds or even thousands of nodes and edges can be quickly created/updated/removed and the visualizations render and transition really fast (I did a quick scale test of close to 5000 nodes and a few hundred thousand edges - one has to really see to believe how fast the rendering is)&lt;/div&gt;
        &lt;div class="col-md-5 right"&gt;jsPlumb is much slower than D3 in rendering. However that does not mean jsPlumb is slow - D3 is simply too fast!&lt;/div&gt;
    &lt;/div&gt;
    &lt;div class="row show-grid"&gt;
        &lt;div class="col-md-2 first"&gt;&lt;h5&gt;Layouts: Force etc&lt;/h5&gt;&lt;/div&gt;
        &lt;div class="col-md-5 left"&gt;D3 has a pre-built &lt;a href="https://github.com/mbostock/d3/wiki/Force-Layout"&gt;force layout&lt;/a&gt; visualization with many options. A force directed graph works beautifully when the real-estate available for rendering is dynamic along with a (probable) huge number of nodes and edges. The graph layout optimizes itself (per gravity/distance/charge settings) to provide the best possible view&lt;/div&gt;
        &lt;div class="col-md-5 right"&gt;jsPlumb provides endpoints and connectors. One can use the facilities to build a force directed graph but such rendering algorithms are not provided OOTB (coding a force layout algorithm is not trivial). However if the number of edges and nodes is known, is not very huge and falls into a clean pattern (like the 3-layers in the above graph), jsPlumb can be used to create very neat layouts&lt;/div&gt;
    &lt;/div&gt;
    &lt;div class="row show-grid"&gt;
        &lt;div class="col-md-2 first"&gt;&lt;h5&gt;Visual Beauty&lt;/h5&gt;&lt;/div&gt;
        &lt;div class="col-md-5 left"&gt;Requires programming. One can search for and look up umpteen amazing D3 visualizations, including many that are graphs. One can use SVG for scalable zooming. However, building a beautiful graph framework for a product with D3 will require some work&lt;/div&gt;
        &lt;div class="col-md-5 right"&gt;Even the default setting can produce excellent looking graphs. Building better looking graphs (with fewer elements) should be considerably easier with jsPlumb&lt;/div&gt;
    &lt;/div&gt;
    &lt;div class="row show-grid"&gt;
        &lt;div class="col-md-2 first"&gt;&lt;h5&gt;Development Simplicity&lt;/h5&gt;&lt;/div&gt;
        &lt;div class="col-md-5 left"&gt;D3 takes some learning. The paradigm of create/update/destroy of elements along with modelling of json data for a particular library function can be complex. But once the mind gets used to the paradigm one realizes its power and simplicity. Compared to all the JS visualization frameworks that I have used (Dojo, jQuery, Raphael, mootools, YUI, Google toolkit, FusionCharts etc) D3 is in a class of its own. Once you get hooked to creating charts/visuals the D3 way, I bet you won't go near anything else!&lt;/div&gt;
        &lt;div class="col-md-5 right"&gt;jsPlumb is truly simple. As it is a well thought out, well written and well documented library, one can start building working graphs in less than a day (which would be quite a challenge for a D3 newbie to accomplish)&lt;/div&gt;
    &lt;/div&gt;
    &lt;div class="row show-grid"&gt;
        &lt;div class="col-md-2 first"&gt;&lt;h5&gt;Rendering Speed&lt;/h5&gt;&lt;/div&gt;
        &lt;div class="col-md-5 left"&gt;No other JS framework that I have come across comes even in the vicinity of D3 in speed and performance. D3 is a class act.&lt;/div&gt;
        &lt;div class="col-md-5 right"&gt;Definitely not slow&lt;/div&gt;
    &lt;/div&gt;
    &lt;div class="row show-grid"&gt;
        &lt;div class="col-md-2 first"&gt;&lt;h5&gt;Layer transitions, Panning, Zoom&lt;/h5&gt;&lt;/div&gt;
        &lt;div class="col-md-5 left"&gt;D3 is built for zoom and pan like functionality from the bottom up. The transitions are smooth, fast and just work&lt;/div&gt;
        &lt;div class="col-md-5 right"&gt;Requires some doing&lt;/div&gt;
    &lt;/div&gt;
    &lt;div class="row show-grid"&gt;
        &lt;div class="col-md-2 first"&gt;&lt;h5&gt;Project Liveness, Community, Future roadmap&lt;/h5&gt;&lt;/div&gt;
        &lt;div class="col-md-5 left"&gt;Super active. It's one of the most cloned projects in the JS world on GitHub. There is a large community of users and questions are quickly answered on StackOverflow, Google groups etc. With such strong foundations, I don't see the momentum behind D3 slowing down in the near future&lt;/div&gt;
        &lt;div class="col-md-5 right"&gt;Not as hot as D3 but nevertheless very popular. It enjoys a fairly large community of users and, in the tradition of jQuery plugins, one can easily read, understand and tweak the library's code, which seems straightforward for good developers on demanding projects&lt;/div&gt;
    &lt;/div&gt;
&lt;/div&gt;

&lt;h3 id="apm-products"&gt;APM Products&lt;/h3&gt;
&lt;p&gt;Along with trying to understand application topologies and design this prototype I had a look at the offerings of some of the APM vendors. All top vendors advertise topology graphs but their offerings seem very limited, with a lot of constraints on both configuration and usage. One can see the screenshots from these products below. &lt;/p&gt;
&lt;p&gt;After looking at the existing offerings and my study, here is a dump of features that would be required for anyone attempting the challenge of application topology views -&lt;/p&gt;
&lt;ol&gt;
    &lt;li&gt;The nodes in the topology are representative of the hardware/software components. The links in the topology are representative of transactions. Corresponding status by colouring is much required&lt;/li&gt;
    &lt;li&gt;Multi-level Application Groups are needed&lt;/li&gt;
    &lt;li&gt;Heterogeneous groups of applications and application-groups with customizable drill-throughs makes a lot of sense&lt;/li&gt;
    &lt;li&gt;Generally, in actual deployments the n:n mapping between Application and Application-group is 'soft' or 'tag-like'. Application ownership and deployment structure often keep changing. Users thus want to easily create new application-groups and add/remove applications from existing groups (all the time). This calls for a very flexible model, the kind of which is not to be seen in existing product offerings&lt;/li&gt;
    &lt;li&gt;Different users would want to see application topologies with different applications and groups in them. Since the whole idea of Application Topology is logical and per a particular user's world-view (and not something physical), a user would have multiple topology views with some applications and groups present in many. For example, a user could define -
        &lt;ol&gt;
            &lt;li&gt;Topology Layer 'A' with 2 applications - 'CRM', 'Core' - and 2 application groups - 'InternalBusinessApps', 'InternalOperationsApps'&lt;/li&gt;
            &lt;li&gt;Topology Layer 'B' with 1 application - 'Core' and 4 application groups - 'InternalBusinessApps', 'InternalOperationsApps', 'CustomerFacingApp', 'CriticalInterfacingApps'&lt;/li&gt;
            &lt;li&gt;So now, between Layer 'A' and Layer 'B' there is one overlapping application and 2 overlapping application-groups&lt;/li&gt;
        &lt;/ol&gt;
     &lt;/li&gt;
    &lt;li&gt;Transactions, both intra- and inter-application, are typically HTTP(S), TCP, web-services, RMI/RPC etc&lt;/li&gt;
    &lt;li&gt;Users may require links in different layers to have a configurable set of transactions mapped on them. Going back to the earlier example of layers 'A' and 'B' - the link between CRM and InternalBusinessApps in layer-A can be configured to show the status per a configured set of Transactions, say TxA and TxB. While the link between the same CRM and InternalBusinessApps in layer-B can be configured to show the status per TxB and TxC&lt;/li&gt;
    &lt;li&gt;Users may require nodes in different layers to have a configurable set of hardware/software components mapped on them. Going back to the earlier example of layers 'A' and 'B' - the node for CRM in layer-A can be configured to show the status per a configured set of Components, say ServerA and DatabaseB. While the same CRM in layer-B can be configured to show the status per DatabaseB and AppServerC&lt;/li&gt;
    &lt;li&gt;Once a user defines multiple layers of topology, they need to stitch the transitions between them. This transition stitching is a very complex requirement. Apart from being configurable, it requires a default that shows a topology layer of all the individual application constituents of an Application Group&lt;/li&gt;
&lt;/ol&gt;

&lt;h4 id="apm-product-screenshots"&gt;APM Product Screenshots&lt;/h4&gt;
&lt;div class="row"&gt;
    &lt;div class="col-md-6"&gt;
        &lt;h5&gt;AppDynamics&lt;/h5&gt;
        &lt;img src="/images/topograph/appdynamics.jpg" alt="AppDynamics Application Topology" class="img-fluid"&gt;
    &lt;/div&gt;
    &lt;div class="col-md-6"&gt;
        &lt;h5&gt;OpTier&lt;/h5&gt;
        &lt;img src="/images/topograph/optier.jpg" alt="OpTier Application Topology" class="img-fluid"&gt;
    &lt;/div&gt;
&lt;/div&gt;
&lt;div class="row mt-3"&gt;
    &lt;div class="col-md-6"&gt;
        &lt;h5&gt;Correlsense&lt;/h5&gt;
        &lt;img src="/images/topograph/correlsense.png" alt="Correlsense Application Topology" class="img-fluid"&gt;
    &lt;/div&gt;
    &lt;div class="col-md-6"&gt;
        &lt;h5&gt;IBM&lt;/h5&gt;
        &lt;img src="/images/topograph/ibm.jpg" alt="IBM Application Topology" class="img-fluid"&gt;
    &lt;/div&gt;
&lt;/div&gt;
&lt;div class="row mt-3"&gt;
    &lt;div class="col-md-6"&gt;
        &lt;h5&gt;ExtraHop&lt;/h5&gt;
        &lt;img src="/images/topograph/extrahop.gif" alt="ExtraHop Application Topology" class="img-fluid"&gt;
    &lt;/div&gt;
&lt;/div&gt;

&lt;p&gt;&amp;nbsp;&lt;/p&gt;
&lt;h3 id="concluding-thoughts"&gt;Concluding Thoughts&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;Application topology is a wonderful emerging playground for those interested in graph representations. It's very fluid, with many use cases and user expectations. Applications in enterprises are getting more and more distributed, with more moving parts and complexity (while, probably, computer networks in enterprises are progressively getting simpler thanks to bigger routers and switches!), thus making the problem of graphing them very exciting and challenging&lt;/li&gt;
&lt;li&gt;Open-source, liberally licensed JavaScript graphing toolkits like D3 and jsPlumb have really come of age and can be used deep and wide in software products. Sufficiently interested and skilled programmers can do as good a job with these libraries as is possible with commercial packages like yFiles&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&amp;nbsp;&lt;/p&gt;</content><category term="posts"/></entry><entry><title>Effective Java</title><link href="https://bharath12345.github.io/posts/effective-java/" rel="alternate"/><published>2013-08-22T00:00:00-04:00</published><updated>2013-08-22T00:00:00-04:00</updated><author><name>Bharadwaj</name></author><id>tag:bharath12345.github.io,2013-08-22:/posts/effective-java/</id><summary type="html">&lt;p&gt;I read this beautifully written article a few days ago - "&lt;a href="https://medium.com/lessons-learned/80ba19c55883"&gt;I will not do your tech interview&lt;/a&gt;". I can't agree more with the author. Every single time I have had to give/take a technical interview, more than the sense of being inadequately prepared I feel like carrying an inexplicable …&lt;/p&gt;</summary><content type="html">&lt;p&gt;I read this beautifully written article a few days ago - "&lt;a href="https://medium.com/lessons-learned/80ba19c55883"&gt;I will not do your tech interview&lt;/a&gt;". I can't agree more with the author. Every single time I have had to give/take a technical interview, more than the sense of being inadequately prepared I feel like carrying an inexplicable psychological burden. And I have met no one who does not fear what Ellis beautifully calls as - &lt;em&gt;"bear-trap of a stupid brainteaser"&lt;/em&gt; :-).&lt;/p&gt;
&lt;div class="toc"&gt;&lt;span class="toctitle"&gt;Table of Contents&lt;/span&gt;&lt;ul&gt;
&lt;li&gt;&lt;a href="#creating-and-destorying-objects"&gt;Creating and Destroying Objects&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#the-java-methods-common-to-all-objects"&gt;The Java methods common to all objects&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#classes-and-interfaces"&gt;Classes and Interfaces&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#generics"&gt;Generics&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#enums-and-annotations"&gt;Enums and Annotations&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#methods"&gt;Methods&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#general-programming"&gt;General Programming&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#exceptions"&gt;Exceptions&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#concurrency"&gt;Concurrency&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#serialization"&gt;Serialization&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#end-node"&gt;End Node&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
&lt;p&gt;In the years to come, the internet is definitely going to give more and more relief to competent engineers. Having a GitHub repository with a dump of one's pet technology prototypes, having a StackOverflow point score, well-articulated tweets and maybe even a well-written technology blog (read &lt;a href="http://nathanmarz.com/blog/break-into-silicon-valley-with-a-blog-1.html"&gt;this&lt;/a&gt; by Nathan Marz) will pay dividends to engineers continuously at work to sharpen their axe…&lt;/p&gt;
&lt;p&gt;But then, my current reality is a reality. And I have to take technical interviews as part of my job. And hiring the right people is so much more important for a small company - many times it is the only differentiator between success and failure of the company itself. So with the jobs being dished out being so important, technical interviews are not supposed to be easy. Both for the interviewee and the interviewer. Pressed into the interviewing job, I felt the need to brush up my fundamentals. This post is from my re-read of Joshua Bloch's classic - "&lt;a href="http://www.amazon.com/Effective-Java-Edition-Joshua-Bloch/dp/0321356683"&gt;Effective Java&lt;/a&gt;" - from an interviewer's perspective… trying to quickly refresh the elementary concepts for myself. It ain't coherent or complete… I will keep adding stuff to this post over time as I realise what questions really make the cut. There are plenty of &lt;em&gt;interview-questions&lt;/em&gt; blogs and books out there - but I felt, instead of quizzing a candidate on some corner case of the JVM or language (which many times the interviewer himself might have realised just hours before the interview), it would be more honest/ethical on my part to quiz on what are well-known and real-world areas of programming for an aspiring engineer - and 'Effective Java' is precisely the guide for such a setting…&lt;/p&gt;
&lt;p&gt;I am now planning to write a few more posts like this in the days to come… one surely on Design Patterns by the GoF. Maybe one on JavaScript's good parts per Douglas Crockford. And time permitting, a few more… &lt;/p&gt;
&lt;h4 id="creating-and-destorying-objects"&gt;Creating and Destroying Objects&lt;/h4&gt;
&lt;div class="bs-docs-grid"&gt;
    &lt;div class="row show-grid"&gt;
        &lt;div class="col-md-2 left"&gt;Consider static factory methods instead of constructors&lt;/div&gt;
        &lt;div class="col-md-10 right"&gt;Similar to the Flyweight pattern. Common names: valueOf, of, getInstance, newInstance, getType, newType&lt;/div&gt;
    &lt;/div&gt;
    &lt;div class="row show-grid"&gt;
        &lt;div class="col-md-2 left"&gt;Consider a builder when faced with many constructor parameters&lt;/div&gt;
        &lt;div class="col-md-10 right"&gt;Telescoping constructors are hard to read and write. The JavaBeans (setter-based) alternative can leave the object in an inconsistent state partway through construction. A builder avoids both problems&lt;/div&gt;
    &lt;/div&gt;
    &lt;div class="row show-grid"&gt;
        &lt;div class="col-md-2 left"&gt;Enforce the singleton property with a private constructor or an enum type&lt;/div&gt;
        &lt;div class="col-md-10 right"&gt;If the singleton is serializable, all instance fields should be transient and the class should provide a readResolve() method; otherwise every deserialization creates a new object&lt;/div&gt;
    &lt;/div&gt;
    &lt;div class="row show-grid"&gt;
        &lt;div class="col-md-12"&gt;Enforce non-instantiability with a private constructor&lt;/div&gt;
    &lt;/div&gt;
    &lt;div class="row show-grid"&gt;
        &lt;div class="col-md-2 right"&gt;Avoid creating unnecessary objects&lt;/div&gt;
        &lt;div class="col-md-10 left"&gt;A statement like this in a for loop can lead to a huge number of unnecessary objects being created -
            &lt;code&gt;String s = new String("stringette");&lt;/code&gt;
            The improved version is simply the following:
            &lt;code&gt;String s = "stringette";&lt;/code&gt;
            This version uses a single String instance, rather than creating a new one each time it is executed. Furthermore, it is guaranteed that the object will be reused by any other code running in the same virtual machine that happens to contain the same string literal.
            The static factory method &lt;code&gt;Boolean.valueOf(String)&lt;/code&gt; is almost always preferable to the constructor &lt;code&gt;Boolean(String)&lt;/code&gt;&lt;/div&gt;
    &lt;/div&gt;
    &lt;div class="row show-grid"&gt;
        &lt;div class="col-md-2 right"&gt;Eliminate obsolete object references&lt;/div&gt;
        &lt;div class="col-md-10 left"&gt;Spot the memory leak in this program?
            &lt;pre&gt;
            public class Stack {
                   private Object[] elements;
                   private int size = 0;
                   private static final int DEFAULT_INITIAL_CAPACITY = 16;
                   public Stack() {
                       elements = new Object[DEFAULT_INITIAL_CAPACITY];
                   }
                   public void push(Object e) {
                       ensureCapacity();
                       elements[size++] = e;
                   }
                   public Object pop() {
                       if (size == 0)
                           throw new EmptyStackException();
                       return elements[--size];
                   }
                   /**
                    * Ensure space for at least one more element, roughly
                    * doubling the capacity each time the array needs to grow.
                    */
                   private void ensureCapacity() {
                       if (elements.length == size)
                           elements = Arrays.copyOf(elements, 2 * size + 1);
                   }
            }&lt;/pre&gt;
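            &lt;p&gt;For reference, one way to plug the leak is to null out the popped slot so that the array no longer holds an obsolete reference. A minimal sketch, assuming a hypothetical class name &lt;code&gt;LeakFreeStack&lt;/code&gt; and an extra &lt;code&gt;size()&lt;/code&gt; accessor that the original does not have:&lt;/p&gt;

```java
import java.util.Arrays;
import java.util.EmptyStackException;

// Hypothetical name; same design as the Stack above, with the leak plugged.
final class LeakFreeStack {
    private static final int DEFAULT_INITIAL_CAPACITY = 16;
    private Object[] elements = new Object[DEFAULT_INITIAL_CAPACITY];
    private int size = 0;

    void push(Object e) {
        ensureCapacity();
        elements[size++] = e;
    }

    Object pop() {
        if (size == 0) {
            throw new EmptyStackException();
        }
        Object result = elements[--size];
        elements[size] = null; // drop the obsolete reference so the GC can reclaim it
        return result;
    }

    // Illustrative accessor, not part of the original class.
    int size() {
        return size;
    }

    private void ensureCapacity() {
        if (elements.length == size) {
            elements = Arrays.copyOf(elements, 2 * size + 1);
        }
    }
}
```

            &lt;p&gt;The interview follow-up is usually why the garbage collector cannot do this by itself: the array is still reachable, so everything it points to is too.&lt;/p&gt;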
        &lt;/div&gt;
    &lt;/div&gt;
    &lt;div class="row show-grid"&gt;
        &lt;div class="col-md-2 left"&gt;Avoid finalizers&lt;/div&gt;
        &lt;div class="col-md-10 right"&gt;What is a finalizer? Is it always called by the GC? Is there a performance penalty to using a finalizer? Why?&lt;/div&gt;
    &lt;/div&gt;
&lt;/div&gt;

&lt;h4 id="the-java-methods-common-to-all-objects"&gt;The Java methods common to all objects&lt;/h4&gt;
&lt;div class="bs-docs-grid"&gt;
    &lt;div class="row show-grid"&gt;
        &lt;div class="col-md-2 right"&gt;Obey the general contract when overriding equals()&lt;/div&gt;
        &lt;div class="col-md-10 left"&gt;
            &lt;h5&gt;1. When do you override equals()?&lt;/h5&gt;
            When a class has a notion of logical equality that differs from mere object identity, and a superclass has not already overridden equals to implement the desired behavior.
            &lt;h5&gt;2. What are the main rules that you would follow to implement equals()?&lt;/h5&gt;
            &lt;ul class="list-group"&gt;
                &lt;li class="list-group-item"&gt;Use == to check for same reference&lt;/li&gt;
                &lt;li class="list-group-item"&gt;Use instanceof to check if the argument is of the correct type&lt;/li&gt;
                &lt;li class="list-group-item"&gt;Match all significant fields of the two objects&lt;/li&gt;
                &lt;li class="list-group-item"&gt;Reflexive? Symmetric? Transitive? Consistent?&lt;/li&gt;
                &lt;li class="list-group-item"&gt;Always override hashCode() as well&lt;/li&gt;
            &lt;/ul&gt;
        &lt;/div&gt;
    &lt;/div&gt;
    &lt;div class="row show-grid"&gt;
        &lt;div class="col-md-2 right"&gt;Always override hashCode() when you override equals()&lt;/div&gt;
        &lt;div class="col-md-10 left"&gt;
            &lt;h6&gt;1. If two objects are equal according to the equals(Object) method, then calling the hashCode method on each of the two objects must produce the same integer result.&lt;/h6&gt;
            &lt;h6&gt;2. It is not required that if two objects are unequal according to the equals(Object) method, then calling the hashCode method on each of the two objects must produce distinct integer results. However, the programmer should be aware that producing distinct integer results for unequal objects may improve the performance of hash tables.&lt;/h6&gt;
            &lt;h6&gt;3. How will you compute the hashCode()? Do not be tempted to exclude significant parts of an object from the hash code computation to improve performance&lt;/h6&gt;
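            &lt;p&gt;A minimal sketch of the equals()/hashCode() recipe from the two items above, assuming a hypothetical two-field &lt;code&gt;Point&lt;/code&gt; class. It is one reasonable way to satisfy the contract, not the only one:&lt;/p&gt;

```java
import java.util.Objects;

// Hypothetical value class used only to illustrate the recipe.
final class Point {
    private final int x;
    private final int y;

    Point(int x, int y) {
        this.x = x;
        this.y = y;
    }

    @Override
    public boolean equals(Object o) {
        if (o == this) {
            return true;               // same reference
        }
        if (!(o instanceof Point)) {
            return false;              // wrong type; also covers null
        }
        Point other = (Point) o;
        if (x != other.x) {
            return false;              // compare every significant field
        }
        return y == other.y;
    }

    @Override
    public int hashCode() {
        return Objects.hash(x, y);     // built from the same fields equals() compares
    }
}
```

            &lt;p&gt;Because hashCode() uses exactly the fields that equals() compares, equal points always land in the same hash bucket.&lt;/p&gt;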
        &lt;/div&gt;
    &lt;/div&gt;
    &lt;div class="row show-grid"&gt;
        &lt;div class="col-md-12"&gt;Always override toString()&lt;/div&gt;
    &lt;/div&gt;
    &lt;div class="row show-grid"&gt;
        &lt;div class="col-md-2 right"&gt;Override clone() judiciously&lt;/div&gt;
        &lt;div class="col-md-10 left"&gt;
        &lt;h5&gt;1. Does Cloneable interface have a clone() method? Why not?&lt;/h5&gt;
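        No. Cloneable is a marker (mixin) interface that declares no methods. It only changes the behaviour of Object's clone() method (which is protected), and that is the method that is supposed to be used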
        &lt;h5&gt;2. How does Java Object's clone() method work?&lt;/h5&gt;
        If a class implements Cloneable, Object’s clone method returns a field-by-field copy of the object; otherwise it throws CloneNotSupportedException
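        &lt;h5&gt;3. What are the 3 rules for implementing Cloneable?&lt;/h5&gt;
            &lt;ul class="list-group"&gt;
                &lt;li class="list-group-item"&gt;a. x.clone() != x&lt;/li&gt;
                &lt;li class="list-group-item"&gt;b. x.clone().getClass() == x.getClass()&lt;/li&gt;
                &lt;li class="list-group-item"&gt;c. x.clone().equals(x)&lt;/li&gt;
            &lt;/ul&gt;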
        &lt;h5&gt;4. How to clone properly?&lt;/h5&gt;
        All classes that implement Cloneable should override clone with a public method whose return type is the class itself. This method should first call super.clone and then fix any fields that need to be fixed. Typically, this means copying any mutable objects that comprise the internal “deep structure” of the object being cloned, and replacing the clone’s references to these objects with references to the copies. While these internal copies can generally be made by calling clone recursively, this is not always the best approach. If the class contains only primitive fields or references to immutable objects, then it is probably the case that no fields need to be fixed.
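        &lt;p&gt;The recipe can be sketched as follows, assuming a hypothetical &lt;code&gt;IntStack&lt;/code&gt; class whose only mutable "deep structure" is its backing array:&lt;/p&gt;

```java
import java.util.Arrays;

// Hypothetical class illustrating the clone() recipe.
final class IntStack implements Cloneable {
    private int[] elements = new int[4];
    private int size = 0;

    void push(int value) {
        if (size == elements.length) {
            elements = Arrays.copyOf(elements, 2 * size + 1);
        }
        elements[size++] = value;
    }

    int peek() {
        return elements[size - 1];
    }

    @Override
    public IntStack clone() {                          // public, returns the class itself
        try {
            IntStack copy = (IntStack) super.clone();  // field-by-field copy
            copy.elements = elements.clone();          // fix the shared mutable array
            return copy;
        } catch (CloneNotSupportedException e) {
            throw new AssertionError(e);               // unreachable: the class is Cloneable
        }
    }
}
```

        &lt;p&gt;Without the &lt;code&gt;elements.clone()&lt;/code&gt; line the original and the copy would share one array, and a push on either could corrupt the other.&lt;/p&gt;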
        &lt;h5&gt;5. How come interfaces like Cloneable and Serializable have no methods? Why do they exist at all then? How does JVM use them?&lt;/h5&gt;
        They are marker interfaces: they declare no methods and only signal to the JRE/JVM that a class opts in to a capability, so that it can act on their presence. Refer to http://en.wikipedia.org/wiki/Marker_interface_pattern. Serializable is the standard example: a class implements this interface to indicate that its non-transient data members can be written to an ObjectOutputStream. The ObjectOutputStream private method writeObject0() contains a series of instanceof tests to determine writeability, one of which looks for the Serializable interface. If all of these tests fail, the method throws a NotSerializableException.
        The serialVersionUID and any custom readObject/writeObject methods are accessed via reflection.
        &lt;/div&gt;
    &lt;/div&gt;
    &lt;div class="row show-grid"&gt;
        &lt;div class="col-md-2 right"&gt;Consider implementing Comparable&lt;/div&gt;
        &lt;div class="col-md-10 left"&gt;
            &lt;h5&gt;1. What is the use of the Comparable interface?&lt;/h5&gt;
        It helps in sorting when there is a natural order among the objects
            &lt;h5&gt;2. What's the difference between interfaces like Comparable and those like Cloneable/Serializable?&lt;/h5&gt;
        &lt;/div&gt;
    &lt;/div&gt;
&lt;/div&gt;

&lt;h4 id="classes-and-interfaces"&gt;Classes and Interfaces&lt;/h4&gt;
&lt;div class="bs-docs-grid"&gt;
    &lt;div class="row show-grid"&gt;
        &lt;div class="col-md-2 right"&gt;Minimize the accessibility of classes and members&lt;/div&gt;
        &lt;div class="col-md-10 left"&gt;What is package-private? How do you declare it?
        A package-private member is accessible from any class in the package where it is declared. Technically known as default access, this is the access level you get if no access modifier is specified.&lt;/div&gt;
    &lt;/div&gt;
    &lt;div class="row show-grid"&gt;
        &lt;div class="col-md-12"&gt;In public classes, use accessor methods, not public fields&lt;/div&gt;
    &lt;/div&gt;
    &lt;div class="row show-grid"&gt;
        &lt;div class="col-md-2 right"&gt;Minimize mutability&lt;/div&gt;
        &lt;div class="col-md-10 left"&gt;
&lt;h5&gt;1. Is it good or bad to minimize mutability? Why?&lt;/h5&gt;
     Good. Immutable objects are automatically thread-safe, so no synchronization or locking is required
&lt;h5&gt;2. How would you make an object immutable?&lt;/h5&gt;
&lt;ul class="list-group"&gt;
    &lt;li class="list-group-item"&gt;No mutators - no setters&lt;/li&gt;
    &lt;li class="list-group-item"&gt;Class can't be extended - class should be marked final&lt;/li&gt;
    &lt;li class="list-group-item"&gt;Make all fields final&lt;/li&gt;
    &lt;li class="list-group-item"&gt;Make all fields private&lt;/li&gt;
    &lt;li class="list-group-item"&gt;Ensure exclusive access to any mutable components&lt;/li&gt;
    &lt;li class="list-group-item"&gt;Getters should return a new copy of any mutable component, never the field itself&lt;/li&gt;
&lt;/ul&gt;
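&lt;p&gt;The checklist above can be sketched as follows. &lt;code&gt;Period&lt;/code&gt; and its accessor names are assumptions made for illustration, and &lt;code&gt;java.util.Date&lt;/code&gt; simply stands in for any mutable component:&lt;/p&gt;

```java
import java.util.Date;

// Hypothetical example class; Date stands in for any mutable component.
final class Period {                       // final: cannot be extended
    private final Date start;              // private and final fields
    private final Date end;

    Period(Date start, Date end) {
        // Defensive copies: the caller keeps no handle on our internals.
        this.start = new Date(start.getTime());
        this.end = new Date(end.getTime());
        if (this.start.compareTo(this.end) > 0) {
            throw new IllegalArgumentException(this.start + " is after " + this.end);
        }
    }

    Date start() {
        return new Date(start.getTime());  // hand out a copy, never the field itself
    }

    Date end() {
        return new Date(end.getTime());
    }
}
```

&lt;p&gt;Mutating either the arguments after construction or the values returned by the accessors leaves the &lt;code&gt;Period&lt;/code&gt; unchanged.&lt;/p&gt;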
&lt;/div&gt;
    &lt;/div&gt;
    &lt;div class="row show-grid"&gt;
        &lt;div class="col-md-12"&gt;Favor composition over inheritance&lt;/div&gt;
    &lt;/div&gt;
    &lt;div class="row show-grid"&gt;
        &lt;div class="col-md-12"&gt;Design and document for inheritance or else prohibit it&lt;/div&gt;
    &lt;/div&gt;
    &lt;div class="row show-grid"&gt;
        &lt;div class="col-md-2 right"&gt;Prefer interfaces to abstract classes&lt;/div&gt;
        &lt;div class="col-md-10 left"&gt;
            &lt;h5&gt;Why are interfaces better than abstract classes?&lt;/h5&gt;
            &lt;ul class="list-group"&gt;
                &lt;li class="list-group-item"&gt;Existing classes can be easily retrofitted to implement a new interface&lt;/li&gt;
                &lt;li class="list-group-item"&gt;Interfaces are ideal for defining mixins&lt;/li&gt;
                &lt;li class="list-group-item"&gt;Interfaces allow the construction of nonhierarchical type frameworks.&lt;/li&gt;
                &lt;li class="list-group-item"&gt;Interfaces enable safe, powerful functionality enhancements&lt;/li&gt;
                &lt;li class="list-group-item"&gt;You can combine the virtues of interfaces and abstract classes by providing an abstract skeletal implementation class to go with each nontrivial interface that you export&lt;/li&gt;
            &lt;/ul&gt;
        &lt;/div&gt;
    &lt;/div&gt;
    &lt;div class="row show-grid"&gt;
        &lt;div class="col-md-2 right"&gt;Use interfaces only to define types&lt;/div&gt;
        &lt;div class="col-md-10 left"&gt;
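            &lt;h5&gt;Is a constants-only interface a good programming pattern?&lt;/h5&gt;
            No. Interfaces should be used only to define types; a constants-only interface leaks an implementation detail into the class's exported API.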
        &lt;/div&gt;
    &lt;/div&gt;
    &lt;div class="row show-grid"&gt;
        &lt;div class="col-md-12"&gt;Prefer class hierarchies to tagged classes&lt;/div&gt;
    &lt;/div&gt;
    &lt;div class="row show-grid"&gt;
        &lt;div class="col-md-12"&gt;Use function objects to represent strategies&lt;/div&gt;
    &lt;/div&gt;
    &lt;div class="row show-grid"&gt;
        &lt;div class="col-md-2 right"&gt;Favor static member classes over nonstatic&lt;/div&gt;
        &lt;div class="col-md-10 left"&gt;
&lt;h5&gt;1. What are the 4 kinds of nested classes?&lt;/h5&gt;
            &lt;ul class="list-group"&gt;
                &lt;li class="list-group-item"&gt;a. static member classes&lt;/li&gt;
                &lt;li class="list-group-item"&gt;b. nonstatic member classes&lt;/li&gt;
                &lt;li class="list-group-item"&gt;c. anonymous classes&lt;/li&gt;
                &lt;li class="list-group-item"&gt;d. local classes&lt;/li&gt;
            &lt;/ul&gt;

&lt;h5&gt;2. When will you make a nested class static?&lt;/h5&gt;
If an instance of a nested class can exist in isolation from an instance of its enclosing class, then the nested class must be a static member class: it is impossible to create an instance of a nonstatic member class without an enclosing instance. If you declare a member class that does not require access to an enclosing instance, always put the static modifier in its declaration

&lt;h5&gt;3. Why would one prefer static classes?&lt;/h5&gt;
The association between a nonstatic member class instance and its enclosing instance is established when the former is created; it cannot be modified thereafter. Storing this reference costs time and space, and can result in the enclosing instance being retained when it would otherwise be eligible for garbage collection
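&lt;p&gt;As an illustration, here is a sketch with hypothetical class names in which the nested &lt;code&gt;Node&lt;/code&gt; never touches its enclosing instance and is therefore declared static:&lt;/p&gt;

```java
import java.util.NoSuchElementException;

// Hypothetical classes used only to illustrate the point above.
final class LinkedStack {
    // static: a Node never needs its enclosing LinkedStack, so it carries
    // no hidden reference to one.
    private static final class Node {
        final Object value;
        final Node next;

        Node(Object value, Node next) {
            this.value = value;
            this.next = next;
        }
    }

    private Node head;

    void push(Object value) {
        head = new Node(value, head);
    }

    Object pop() {
        if (head == null) {
            throw new NoSuchElementException("stack is empty");
        }
        Object value = head.value;
        head = head.next;
        return value;
    }
}
```

&lt;p&gt;Dropping the &lt;code&gt;static&lt;/code&gt; modifier would still compile, but every node would then hold an extra reference to the stack that created it.&lt;/p&gt;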
        &lt;/div&gt;
    &lt;/div&gt;
&lt;/div&gt;

&lt;h4 id="generics"&gt;Generics&lt;/h4&gt;
&lt;div class="bs-docs-grid"&gt;
    &lt;div class="row show-grid"&gt;
        &lt;div class="col-md-2 right"&gt;Don't use raw types in new code&lt;/div&gt;
        &lt;div class="col-md-10 left"&gt;
&lt;h5&gt;1. What is the problem with doing &lt;code&gt;private final Collection stamps = ... ;&lt;/code&gt;&lt;/h5&gt;
        Loss of compile time type safety
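&lt;h5&gt;2. Is List&amp;lt;String&amp;gt;.class legal? What will it give me?&lt;/h5&gt;
        It is not legal. Generic type information is erased at runtime, so only the raw class literal List.class exists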
        &lt;/div&gt;
    &lt;/div&gt;
    &lt;div class="row show-grid"&gt;
        &lt;div class="col-md-2 right"&gt;Eliminate unchecked warnings&lt;/div&gt;
        &lt;div class="col-md-10 left"&gt;
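&lt;h5&gt;How do you eliminate an unchecked warning?&lt;/h5&gt;
        First try to eliminate the cause of the warning. If that is not possible and you can prove that the code is typesafe, suppress the warning with an @SuppressWarnings("unchecked") annotation. Always use the SuppressWarnings annotation on the smallest scope possible.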
        &lt;/div&gt;
    &lt;/div&gt;
    &lt;div class="row show-grid"&gt;
        &lt;div class="col-md-2 right"&gt;Prefer lists to arrays&lt;/div&gt;
        &lt;div class="col-md-10 left"&gt;
&lt;h5&gt;1. If Sub is a subtype of Super, then is the array Sub[] a subtype of Super[]?&lt;/h5&gt;
Yes. Arrays are covariant. Lists are invariant
&lt;h5&gt;2. So which one is better? And why?&lt;/h5&gt;
Lists are better. Arrays are reified. This means that arrays know and enforce their element types at runtime. Generics, by contrast, are implemented by erasure. This means that they enforce their type constraints only at compile time and discard (or erase) their element type information at runtime.
&lt;h5&gt;3. Test question -&lt;/h5&gt;
This code fragment is legal but fails at runtime! -
                  &lt;pre&gt;
                  Object[] objectArray = new Long[1];
                  objectArray[0] = "I don't fit in"; // Throws ArrayStoreException&lt;/pre&gt;
                  But this one won't compile at all! -
                  &lt;pre&gt;
                  List&amp;lt;Object&amp;gt; ol = new ArrayList&amp;lt;Long&amp;gt;(); // Incompatible types
                  ol.add("I don't fit in");&lt;/pre&gt;
&lt;h5&gt;4. Are these legal?&lt;/h5&gt;
&lt;pre&gt;new List&amp;lt;E&amp;gt;[]
new List&amp;lt;String&amp;gt;[]
new E[]&lt;/pre&gt;
No. It is illegal to create an array of a generic type, a parameterized type, or a type parameter. Types such as E, List&amp;lt;E&amp;gt;, and List&amp;lt;String&amp;gt; are technically known as non-reifiable types. Intuitively speaking, a non-reifiable type is one whose runtime representation contains less information than its compile-time representation.
        &lt;/div&gt;
    &lt;/div&gt;
    &lt;div class="row show-grid"&gt;
        &lt;div class="col-md-2 right"&gt;Favor generic types&lt;/div&gt;
        &lt;div class="col-md-10 left"&gt;Which of these is better and why?
        &lt;pre&gt;
        public class Stack {
            private Object[] elements;
            public void push(Object e) {
            }
            public Object pop() {
            }
        }&lt;/pre&gt;
        or
        &lt;pre&gt;
        public class Stack&amp;lt;E&amp;gt; {
            private E[] elements;
            public void push(E e) {
            }
            public E pop() {
            }
        }&lt;/pre&gt;
        &lt;/div&gt;
    &lt;/div&gt;
    &lt;div class="row show-grid"&gt;
        &lt;div class="col-md-2 right"&gt;Favor generic methods&lt;/div&gt;
        &lt;div class="col-md-10 left"&gt;Which of these is better and why?
        &lt;code&gt;public static Set union(Set s1, Set s2)&lt;/code&gt;
        or
        &lt;code&gt;public static &amp;lt;E&amp;gt; Set&amp;lt;E&amp;gt; union(Set&amp;lt;E&amp;gt; s1, Set&amp;lt;E&amp;gt; s2)&lt;/code&gt;
        &lt;/div&gt;
    &lt;/div&gt;
    &lt;div class="row show-grid"&gt;
        &lt;div class="col-md-2 right"&gt;Use bounded wildcards to increase API flexibility&lt;/div&gt;
        &lt;div class="col-md-10 left"&gt;What is the PECS rule or Get-and-Put principle?
        Bounded wildcards can be of two types -
        &lt;code&gt;X&amp;lt;? extends E&amp;gt;&lt;/code&gt;
        or
        &lt;code&gt;Y&amp;lt;? super E&amp;gt;&lt;/code&gt;
        PECS stands for producer-extends, consumer-super.
        In other words, if a parameterized type represents a T producer, use &amp;lt;? extends T&amp;gt;; if it represents a T consumer, use &amp;lt;? super T&amp;gt;.
        &lt;/div&gt;
    &lt;/div&gt;
    &lt;div class="row show-grid"&gt;
        &lt;div class="col-md-12"&gt;Consider typesafe heterogeneous containers&lt;/div&gt;
    &lt;/div&gt;
&lt;/div&gt;

&lt;h4 id="enums-and-annotations"&gt;Enums and Annotations&lt;/h4&gt;
&lt;div class="bs-docs-grid"&gt;
    &lt;div class="row show-grid"&gt;
        &lt;div class="col-md-2 right"&gt;Use enums instead of int constants&lt;/div&gt;
        &lt;div class="col-md-10 left"&gt;
&lt;h5&gt;1. Does enum extend Java Object?&lt;/h5&gt;
Yes. Enums provide high-quality implementations of all the Object methods

&lt;h5&gt;2. Which interfaces do enums implement?&lt;/h5&gt;
They implement Comparable and Serializable, and their serialized form is designed to withstand most changes to the enum type.

&lt;h5&gt;3. How would you associate data with enums?&lt;/h5&gt;
To associate data with enum constants, declare instance fields and write a constructor that takes the data and stores it in the fields. Enums are by their nature immutable, so all fields should be final

&lt;h5&gt;4. How would you associate a different behavior with every enum constant?&lt;/h5&gt;
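Declare an abstract method such as apply() in the enum type and override it in a constant-specific class body for each constant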
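&lt;p&gt;A short sketch covering questions 3 and 4 together, assuming a hypothetical &lt;code&gt;Operation&lt;/code&gt; enum: each constant carries data (its symbol) through a constructor and supplies its own &lt;code&gt;apply()&lt;/code&gt; behaviour.&lt;/p&gt;

```java
// Hypothetical enum: each constant carries data and its own behaviour.
enum Operation {
    PLUS("+") {
        @Override
        double apply(double x, double y) {
            return x + y;
        }
    },
    MINUS("-") {
        @Override
        double apply(double x, double y) {
            return x - y;
        }
    },
    TIMES("*") {
        @Override
        double apply(double x, double y) {
            return x * y;
        }
    };

    private final String symbol;   // data associated with the constant

    Operation(String symbol) {
        this.symbol = symbol;
    }

    String symbol() {
        return symbol;
    }

    // Constant-specific behaviour: every constant must supply an implementation.
    abstract double apply(double x, double y);
}
```

&lt;p&gt;Because &lt;code&gt;apply()&lt;/code&gt; is abstract, forgetting to implement it for a newly added constant is a compile-time error rather than a runtime surprise.&lt;/p&gt;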

&lt;/div&gt;
    &lt;/div&gt;
    &lt;div class="row show-grid"&gt;
        &lt;div class="col-md-2 left"&gt;Use instance fields instead of ordinals&lt;/div&gt;
        &lt;div class="col-md-10 right"&gt;
&lt;h5&gt;Is using ordinals a bad idea? If so, what is the option?&lt;/h5&gt;
        Use instance fields
        &lt;/div&gt;
    &lt;/div&gt;
    &lt;div class="row show-grid"&gt;
        &lt;div class="col-md-2 right"&gt;Use EnumSet instead of bit fields&lt;/div&gt;
        &lt;div class="col-md-10 left"&gt;What's the use case for EnumSet?
        Instead of bit fields which look ugly like this &lt;code&gt;text.applyStyles(STYLE_BOLD | STYLE_ITALIC);&lt;/code&gt;
        one can do this -
        &lt;code&gt;text.applyStyles(EnumSet.of(Style.BOLD, Style.ITALIC));&lt;/code&gt;
        &lt;/div&gt;
    &lt;/div&gt;
    &lt;div class="row show-grid"&gt;
        &lt;div class="col-md-2 left"&gt;Use EnumMap instead of ordinal indexing&lt;/div&gt;
        &lt;div class="col-md-10 right"&gt;It is rarely appropriate to use ordinals to index arrays: use EnumMap instead&lt;/div&gt;
    &lt;/div&gt;
    &lt;div class="row show-grid"&gt;
        &lt;div class="col-md-12"&gt;Emulate extensible enums with interfaces&lt;/div&gt;
    &lt;/div&gt;
    &lt;div class="row show-grid"&gt;
        &lt;div class="col-md-2 right"&gt;Prefer annotations to naming patterns&lt;/div&gt;
        &lt;div class="col-md-10 left"&gt;
&lt;h5&gt;1. Any use case you can think of for custom annotations?&lt;/h5&gt;
The JUnit testing framework originally required its users to designate test methods by beginning their names with the characters 'test'. An annotation such as @Test is a more robust replacement for that naming pattern
&lt;h5&gt;2. Which annotation do you use most?&lt;/h5&gt;
@Override, @Deprecated, @SuppressWarnings
        &lt;/div&gt;
    &lt;/div&gt;
    &lt;div class="row show-grid"&gt;
        &lt;div class="col-md-2 right"&gt;Consistently use the Override annotation&lt;/div&gt;
        &lt;div class="col-md-10 left"&gt;
&lt;h5&gt;What is @Override for?&lt;/h5&gt;
        It indicates that the annotated method declaration overrides a declaration in a supertype, and lets the compiler flag the method if it does not
        &lt;/div&gt;
    &lt;/div&gt;
    &lt;div class="row show-grid"&gt;
        &lt;div class="col-md-12"&gt;Use marker interfaces to define types&lt;/div&gt;
    &lt;/div&gt;
&lt;/div&gt;

&lt;h4 id="methods"&gt;Methods&lt;/h4&gt;
&lt;div class="bs-docs-grid"&gt;
    &lt;div class="row show-grid"&gt;
        &lt;div class="col-md-12"&gt;Check parameters for validity&lt;/div&gt;
    &lt;/div&gt;
    &lt;div class="row show-grid"&gt;
        &lt;div class="col-md-12"&gt;Make defensive copies when needed&lt;/div&gt;
    &lt;/div&gt;
    &lt;div class="row show-grid"&gt;
        &lt;div class="col-md-2 right"&gt;Design method signatures carefully&lt;/div&gt;
        &lt;div class="col-md-10 left"&gt;
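&lt;h5&gt;Which is the better method parameter type, Map or HashMap, and why?&lt;/h5&gt;
        Map is. For parameter types, favor interfaces over classes so that callers can pass in any Map implementation. This is super basic.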
        &lt;/div&gt;
    &lt;/div&gt;
    &lt;div class="row show-grid"&gt;
        &lt;div class="col-md-12"&gt;Use overloading judiciously&lt;/div&gt;
    &lt;/div&gt;
    &lt;div class="row show-grid"&gt;
        &lt;div class="col-md-12"&gt;Use varargs judiciously&lt;/div&gt;
    &lt;/div&gt;
    &lt;div class="row show-grid"&gt;
        &lt;div class="col-md-2 right"&gt;Return empty arrays or collections, not nulls&lt;/div&gt;
        &lt;div class="col-md-10 left"&gt;
&lt;h5&gt;What is better - returning null or empty collections?&lt;/h5&gt;
        Empty Collections
        &lt;/div&gt;
    &lt;/div&gt;
    &lt;div class="row show-grid"&gt;
        &lt;div class="col-md-12"&gt;Write doc comments for all exposed API elements&lt;/div&gt;
    &lt;/div&gt;
&lt;/div&gt;

&lt;h4 id="general-programming"&gt;General Programming&lt;/h4&gt;
&lt;div class="bs-docs-grid"&gt;
    &lt;div class="row show-grid"&gt;
        &lt;div class="col-md-12"&gt;Minimize the scope of local variables&lt;/div&gt;
    &lt;/div&gt;
    &lt;div class="row show-grid"&gt;
        &lt;div class="col-md-12"&gt;Prefer foreach loops to traditional for loops&lt;/div&gt;
    &lt;/div&gt;
    &lt;div class="row show-grid"&gt;
        &lt;div class="col-md-12"&gt;Know and use the libraries&lt;/div&gt;
    &lt;/div&gt;
    &lt;div class="row show-grid"&gt;
        &lt;div class="col-md-12"&gt;Avoid float and double if exact answers are required&lt;/div&gt;
    &lt;/div&gt;
    &lt;div class="row show-grid"&gt;
        &lt;div class="col-md-2 right"&gt;Prefer primitives to boxed primitives&lt;/div&gt;
        &lt;div class="col-md-10 left"&gt;
&lt;h5&gt;What makes the performance of this program bad?&lt;/h5&gt;
            &lt;pre&gt;
            public static void main(String[] args) {
                Long sum = 0L;
                for (long i = 0; i &lt; Integer.MAX_VALUE; i++) {
                    sum += i;
                }
                System.out.println(sum);
            }&lt;/pre&gt;
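&lt;p&gt;The culprit is the declaration of sum as a boxed Long: every addition unboxes the value, adds, and boxes a new Long. A minimal sketch of the comparison follows. The class and method names are made up for illustration, and the loop counts down over a smaller range than the program above, purely to keep the demo quick.&lt;/p&gt;

```java
class BoxedSum {
    // Hypothetical helper: sums 0 .. limit-1 with a primitive accumulator.
    static long sumPrimitive(long limit) {
        long sum = 0L; // primitive long: no object is created per addition
        for (long i = limit - 1; i >= 0; i--) {
            sum += i;
        }
        return sum;
    }

    // Hypothetical helper: same sum, but with the boxed accumulator from the question.
    static Long sumBoxed(long limit) {
        Long sum = 0L; // boxed Long: each += unboxes, adds, and boxes again
        for (long i = limit - 1; i >= 0; i--) {
            sum += i;
        }
        return sum;
    }

    public static void main(String[] args) {
        long limit = 1_000_000L; // assumption: far smaller than Integer.MAX_VALUE
        System.out.println(sumPrimitive(limit));
        System.out.println(sumBoxed(limit));
    }
}
```

&lt;p&gt;Both methods return the same value; only the amount of garbage differs.&lt;/p&gt;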
        &lt;/div&gt;
    &lt;/div&gt;
    &lt;div class="row show-grid"&gt;
        &lt;div class="col-md-12"&gt;Avoid strings when other types are more appropriate&lt;/div&gt;
    &lt;/div&gt;
    &lt;div class="row show-grid"&gt;
        &lt;div class="col-md-2 right"&gt;Beware the performance of string concatenation&lt;/div&gt;
        &lt;div class="col-md-10 left"&gt;
&lt;h5&gt;1. Before 1.5, for string concatenation StringBuffer was preferred - what is it now?&lt;/h5&gt;
        StringBuilder
&lt;h5&gt;2. What is the difference between StringBuilder and StringBuffer?&lt;/h5&gt;
        StringBuilder is unsynchronized, which makes it faster. A single instance must not be shared between threads without external synchronization
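&lt;p&gt;A small sketch of the usual pattern: one StringBuilder, confined to one method (and so to one thread), appended to in a loop. The class and method names are hypothetical.&lt;/p&gt;

```java
class NumberedList {
    // Hypothetical helper: builds one string from many pieces with a single builder.
    static String numbered(String[] items) {
        StringBuilder sb = new StringBuilder();
        int n = 1;
        for (String item : items) {
            // append returns the builder itself, so calls can be chained
            sb.append(n++).append(". ").append(item).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(numbered(new String[] {"alpha", "beta"}));
    }
}
```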
        &lt;/div&gt;
    &lt;/div&gt;
    &lt;div class="row show-grid"&gt;
        &lt;div class="col-md-2 right"&gt;Refer to objects by their interfaces&lt;/div&gt;
        &lt;div class="col-md-10 left"&gt;
&lt;h5&gt;Which one is better and why?&lt;/h5&gt;
        &lt;pre&gt;
        List&lt;Subscriber&gt; subscribers = new ArrayList&lt;Subscriber&gt;();
        ArrayList&lt;Subscriber&gt; subscribers = new ArrayList&lt;Subscriber&gt;();&lt;/pre&gt;
        &lt;/div&gt;
    &lt;/div&gt;
    &lt;div class="row show-grid"&gt;
        &lt;div class="col-md-2 right"&gt;Prefer interfaces to reflection&lt;/div&gt;
        &lt;div class="col-md-10 left"&gt;
&lt;h5&gt;Reflection allows one class to use another, even if the latter class did not exist when the former was compiled. So what are the problems using it?&lt;/h5&gt;
    &lt;ul class="list-group"&gt;
        &lt;li class="list-group-item"&gt;You lose all the benefits of compile-time type checking, including exception checking&lt;/li&gt;
        &lt;li class="list-group-item"&gt;The code required to perform reflective access is clumsy and verbose&lt;/li&gt;
        &lt;li class="list-group-item"&gt;Performance suffers&lt;/li&gt;
    &lt;/ul&gt;
        &lt;/div&gt;
    &lt;/div&gt;
    &lt;div class="row show-grid"&gt;
        &lt;div class="col-md-12"&gt;Use native methods judiciously&lt;/div&gt;
    &lt;/div&gt;
    &lt;div class="row show-grid"&gt;
        &lt;div class="col-md-12"&gt;Optimize judiciously&lt;/div&gt;
    &lt;/div&gt;
    &lt;div class="row show-grid"&gt;
        &lt;div class="col-md-12"&gt;Adhere to generally accepted naming conventions&lt;/div&gt;
    &lt;/div&gt;
&lt;/div&gt;

&lt;h4 id="exceptions"&gt;Exceptions&lt;/h4&gt;
&lt;div class="bs-docs-grid"&gt;
    &lt;div class="row show-grid"&gt;
        &lt;div class="col-md-12"&gt;Use exceptions only for exceptional conditions&lt;/div&gt;
    &lt;/div&gt;
    &lt;div class="row show-grid"&gt;
        &lt;div class="col-md-2 right"&gt;Use checked exceptions for recoverable conditions and runtime exceptions for programming errors&lt;/div&gt;
        &lt;div class="col-md-10 left"&gt;
&lt;h5&gt;1. What are the different types of exceptions?&lt;/h5&gt;
&lt;ul class="list-group"&gt;
    &lt;li class="list-group-item"&gt;Checked exceptions&lt;/li&gt;
    &lt;li class="list-group-item"&gt;Unchecked exceptions - runtime exceptions and errors&lt;/li&gt;
&lt;/ul&gt;
&lt;h5&gt;2. When would you code for checked exceptions?&lt;/h5&gt;
        When the caller can reasonably be expected to recover
&lt;h5&gt;3. When would you throw a runtime exception?&lt;/h5&gt;
        To indicate a programming error, such as a precondition violation, where the program is as good as dead
&lt;h5&gt;4. When would you throw an error?&lt;/h5&gt;
         There is a strong convention that errors are reserved for use by the JVM to indicate resource deficiencies, invariant failures, or other conditions that make it impossible to continue execution. Given the almost universal acceptance of this convention, it’s best not to implement any new Error subclasses. Therefore, all of the unchecked throwables you implement should subclass RuntimeException
        &lt;/div&gt;
    &lt;/div&gt;
    &lt;div class="row show-grid"&gt;
        &lt;div class="col-md-2 right"&gt;Avoid unnecessary use of checked exceptions&lt;/div&gt;
        &lt;div class="col-md-10 left"&gt;
&lt;h5&gt;Tell me the exceptions you know and when you would use them&lt;/h5&gt;
&lt;ul class="list-group"&gt;
    &lt;li class="list-group-item"&gt;IllegalArgumentException - argument aint right&lt;/li&gt;
    &lt;li class="list-group-item"&gt;IllegalStateException - calling a method on an object before it is properly initialized&lt;/li&gt;
    &lt;li class="list-group-item"&gt;NullPointerException - someone invokes a method on a null object&lt;/li&gt;
    &lt;li class="list-group-item"&gt;ConcurrentModificationException - if a object designed to be used by a single thread is being concurrently modified&lt;/li&gt;
    &lt;li class="list-group-item"&gt;IndexOutOfBoundException - accessing array beyond its data length&lt;/li&gt;
    &lt;li class="list-group-item"&gt;UnsupportedOperationException - object does not support a method&lt;/li&gt;
&lt;/ul&gt;
        &lt;/div&gt;
    &lt;/div&gt;
    &lt;div class="row show-grid"&gt;
        &lt;div class="col-md-12"&gt;Favor the use of standard exceptions&lt;/div&gt;
    &lt;/div&gt;
    &lt;div class="row show-grid"&gt;
        &lt;div class="col-md-12"&gt;Throw exceptions appropriate to the abstraction&lt;/div&gt;
    &lt;/div&gt;
    &lt;div class="row show-grid"&gt;
        &lt;div class="col-md-12"&gt;Document all exceptions thrown by each method&lt;/div&gt;
    &lt;/div&gt;
    &lt;div class="row show-grid"&gt;
        &lt;div class="col-md-12"&gt;Include failure-capture information in detail messages&lt;/div&gt;
    &lt;/div&gt;
    &lt;div class="row show-grid"&gt;
        &lt;div class="col-md-12"&gt;Strive for failure atomicity&lt;/div&gt;
    &lt;/div&gt;
    &lt;div class="row show-grid"&gt;
        &lt;div class="col-md-12"&gt;Dont ignore exceptions&lt;/div&gt;
    &lt;/div&gt;
&lt;/div&gt;

&lt;h4 id="concurrency"&gt;Concurrency&lt;/h4&gt;
&lt;div class="bs-docs-grid"&gt;
    &lt;div class="row show-grid"&gt;
        &lt;div class="col-md-2 right"&gt;Synchronize access to shared mutable data&lt;/div&gt;
        &lt;div class="col-md-10 left"&gt;
&lt;h5&gt;1. Is writing of all primitive data types atomic in Java?&lt;/h5&gt;
        No. Reading or writing a variable is atomic unless the variable is a non-volatile long or double
&lt;h5&gt;2. How long would you expect this program to run?&lt;/h5&gt;
        &lt;pre&gt;
        public class StopThread {
               private static boolean stopRequested;
               public static void main(String[] args)
                       throws InterruptedException {
                   Thread backgroundThread = new Thread(new Runnable() {
                       public void run() {
                           int i = 0;
                           while (!stopRequested)
                                i++;
                       }
                   });
                   backgroundThread.start();
                   TimeUnit.SECONDS.sleep(1);
                   stopRequested = true;
               }
        }&lt;/pre&gt;
        Probably forever. Without synchronization the VM is free to apply an optimization called hoisting, transforming this code:
        &lt;pre&gt;
        while (!stopRequested)
            i++;&lt;/pre&gt;
        into this code:
        &lt;pre&gt;
        if (!stopRequested)
            while (true)
                i++;&lt;/pre&gt;
        How would you correct this?
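&lt;p&gt;One way to correct it, sketched below, is to declare the flag volatile so that the write made by the main thread is guaranteed to become visible to the background thread. Synchronized accessor methods for the flag would work as well. The class name and the runAndStop helper are made up for this sketch.&lt;/p&gt;

```java
class StopThreadFixed {
    // volatile makes a write by one thread visible to later reads by another
    private static volatile boolean stopRequested;

    // Hypothetical helper: starts the spinner, requests a stop, reports whether it ended.
    static boolean runAndStop(long sleepMillis) {
        stopRequested = false;
        Thread backgroundThread = new Thread(() -> {
            long i = 0;
            while (!stopRequested) {
                i++;
            }
        });
        backgroundThread.start();
        try {
            Thread.sleep(sleepMillis);
            stopRequested = true;
            backgroundThread.join(5000); // wait up to five seconds for the loop to exit
        } catch (InterruptedException e) {
            stopRequested = true;
            Thread.currentThread().interrupt();
        }
        return !backgroundThread.isAlive();
    }

    public static void main(String[] args) {
        System.out.println("stopped: " + runAndStop(1000));
    }
}
```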

&lt;h5&gt;3. Is this program thread safe? Can generateSerialNumber() be called from multiple threads safely?&lt;/h5&gt;
        &lt;pre&gt;
        private static volatile int nextSerialNumber = 0;
        public static int generateSerialNumber() {
            return nextSerialNumber++;
        }&lt;/pre&gt;
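&lt;p&gt;It is not safe. The ++ operator reads the field and then writes back a new value, which is two steps, so two threads can read the same value and hand out the same serial number even though the field is volatile. One possible fix, sketched here, swaps the int for an AtomicInteger (a synchronized method would also do). The issuedBy helper is hypothetical and exists only to exercise the generator from several threads.&lt;/p&gt;

```java
import java.util.Arrays;
import java.util.concurrent.atomic.AtomicInteger;

class SerialNumbers {
    private static final AtomicInteger nextSerialNumber = new AtomicInteger();

    static int generateSerialNumber() {
        // read-and-increment happens as one atomic step
        return nextSerialNumber.getAndIncrement();
    }

    // Hypothetical check: how many numbers were handed out by the given threads.
    static int issuedBy(int threads, int perThread) {
        int before = nextSerialNumber.get();
        Runnable task = () -> {
            for (int n = perThread; n > 0; n--) {
                generateSerialNumber();
            }
        };
        Thread[] workers = new Thread[threads];
        Arrays.setAll(workers, k -> new Thread(task));
        for (Thread worker : workers) {
            worker.start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return nextSerialNumber.get() - before;
    }

    public static void main(String[] args) {
        System.out.println(issuedBy(4, 10_000));
    }
}
```

&lt;p&gt;With the atomic counter no increment is lost, so the count always equals threads times perThread.&lt;/p&gt;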

&lt;h5&gt;4. What are the 4 factors that need trade-off when writing multi-threaded concurrent programs?&lt;/h5&gt;
        Safety, Liveness, Efficiency, Reusability

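&lt;h5&gt;5. What's the tradeoff between Safety and Liveness?&lt;/h5&gt;
        Safety: nothing bad happens.
        Liveness: something good eventually happens.
        More locking helps safety but can cost liveness (deadlock, starvation); less locking does the reverse.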

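&lt;h5&gt;6. What is reentrancy? Is Java reentrant?&lt;/h5&gt;
        Reentrancy means a thread that already holds a lock can acquire it again without blocking on itself. Yes - Java's intrinsic (synchronized) locks are reentrant, and so is ReentrantLock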
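&lt;p&gt;A tiny sketch of what that means in practice: a synchronized method calling another synchronized method on the same object. If intrinsic locks were not reentrant, the inner call would deadlock. The class and method names are made up.&lt;/p&gt;

```java
class ReentrantCounter {
    private int depth;

    synchronized int outer() {
        depth++;
        // the current thread already holds this object's lock and takes it again here
        return inner();
    }

    synchronized int inner() {
        return ++depth;
    }

    public static void main(String[] args) {
        System.out.println(new ReentrantCounter().outer());
    }
}
```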

&lt;h5&gt;7. What's the difference between ArrayList and CopyOnWriteArrayList?&lt;/h5&gt;
        CopyOnWriteArrayList is a variant of ArrayList in which all write operations are implemented by making a fresh copy of the entire underlying array. Because the internal array is never modified, iteration requires no locking and is very fast. For most uses, the performance of CopyOnWriteArrayList would be atrocious, but it’s perfect for observer lists, which are rarely modified and often traversed.
        &lt;/div&gt;
    &lt;/div&gt;
    &lt;div class="row show-grid"&gt;
        &lt;div class="col-md-12"&gt;Avoid excessive synchronization&lt;/div&gt;
    &lt;/div&gt;
    &lt;div class="row show-grid"&gt;
        &lt;div class="col-md-2 right"&gt;Prefer executors and tasks to threads&lt;/div&gt;
        &lt;div class="col-md-10 left"&gt;
&lt;h5&gt;1. In the post Java 1.5 world, the use of 'Thread' is probably not a good idea due to the availability of new functionality in java.util.concurrent - what are they?&lt;/h5&gt;
                Executors and tasks
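&lt;p&gt;A minimal sketch of the executor style: tasks are handed to a pool and the pool owns the threads. The pool size of 4, the ten second timeout and the runTasks helper are arbitrary choices for illustration.&lt;/p&gt;

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class ExecutorDemo {
    // Hypothetical helper: submits taskCount trivial tasks and returns how many completed.
    static int runTasks(int taskCount) {
        AtomicInteger completed = new AtomicInteger();
        ExecutorService pool = Executors.newFixedThreadPool(4);
        for (int n = taskCount; n > 0; n--) {
            pool.execute(completed::incrementAndGet); // the task, not a Thread
        }
        pool.shutdown(); // accept no new tasks, let queued ones finish
        try {
            pool.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return completed.get();
    }

    public static void main(String[] args) {
        System.out.println(runTasks(100));
    }
}
```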
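&lt;h5&gt;2. There are some data structures designed in Java collections specifically for concurrent usage - what are they and how do they work?&lt;/h5&gt;
        ConcurrentHashMap, CopyOnWriteArrayList and the BlockingQueue implementations, among others. They manage their own synchronization internally, so callers do not need to lock around them
&lt;h5&gt;3. Why is it a bad idea to rely on Thread.yield or Java's thread priorities API?&lt;/h5&gt;
        Neither is portable: both are only hints to the scheduler, and their effect varies between JVM implementations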
      &lt;/div&gt;
    &lt;/div&gt;
    &lt;div class="row show-grid"&gt;
        &lt;div class="col-md-12"&gt;Prefer concurrency utilities to wait and notify&lt;/div&gt;
    &lt;/div&gt;
    &lt;div class="row show-grid"&gt;
        &lt;div class="col-md-12"&gt;Document thread safety&lt;/div&gt;
    &lt;/div&gt;
    &lt;div class="row show-grid"&gt;
        &lt;div class="col-md-12"&gt;Use lazy initialization judiciously&lt;/div&gt;
    &lt;/div&gt;
    &lt;div class="row show-grid"&gt;
        &lt;div class="col-md-12"&gt;Dont depend on the thread scheduler&lt;/div&gt;
    &lt;/div&gt;
    &lt;div class="row show-grid"&gt;
        &lt;div class="col-md-12"&gt;Avoid thread groups&lt;/div&gt;
    &lt;/div&gt;
&lt;/div&gt;

&lt;h4 id="serialization"&gt;Serialization&lt;/h4&gt;
&lt;div class="bs-docs-grid"&gt;
    &lt;div class="row show-grid"&gt;
        &lt;div class="col-md-2 right"&gt;Implement serializable judiciously&lt;/div&gt;
        &lt;div class="col-md-10 left"&gt;
&lt;h5&gt;1. What is serialVersionUID?&lt;/h5&gt;
 Every serializable class has a unique identification number associated with it. If you do not specify this number explicitly by declaring a static final long field named serialVersionUID, the system automatically generates it at runtime by applying a complex procedure to the class. The automatically generated value is affected by the class’s name, the names of the interfaces it implements, and all of its public and protected members. If you change any of these things in any way, for example, by adding a trivial convenience method, the automatically generated serial version UID changes. If you fail to declare an explicit serial version UID, compatibility will be broken, resulting in an InvalidClassException at runtime. If no serial version UID is provided, an expensive computation is required to generate one at runtime. If you ever want to make a new version of a class that is incompatible with existing versions, merely change the value in the serial version UID declaration.
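&lt;p&gt;As a sketch, this is what an explicit declaration looks like on a small class, together with a round trip through Java's default serialization. The Subscriber class, its single field and the roundTrip helper are assumptions made up for the example.&lt;/p&gt;

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

class Subscriber implements Serializable {
    // Explicit version: unchanged when, say, a convenience method is added later.
    private static final long serialVersionUID = 1L;

    private final String name;

    Subscriber(String name) {
        this.name = name;
    }

    String getName() {
        return name;
    }

    // Hypothetical helper: writes the object to a byte array and reads it back.
    static Subscriber roundTrip(Subscriber original) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(original);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (Subscriber) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip(new Subscriber("alice")).getName());
    }
}
```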

&lt;h5&gt;2. Why should a class be made to implement Serializable with caution?&lt;/h5&gt;
 A major cost of implementing Serializable is that it decreases the flexibility to change a class’s implementation once it has been released.
        &lt;/div&gt;
    &lt;/div&gt;
    &lt;div class="row show-grid"&gt;
        &lt;div class="col-md-2 right"&gt;Consider using a custom serialized form&lt;/div&gt;
        &lt;div class="col-md-10 left"&gt;
&lt;h5&gt;How good is Java's ObjectStream based Serialization? When would you implement your own custom serialized form?&lt;/h5&gt;
        The default serialized form of an object is a reasonably efficient encoding. The default serialized form is likely to be appropriate if an object’s physical representation is identical to its logical content. Drawbacks - it can be excessive in space consumption, it is not very fast, and it permanently ties the exported API to the current internal representation
        &lt;/div&gt;
    &lt;/div&gt;
    &lt;div class="row show-grid"&gt;
        &lt;div class="col-md-12"&gt;Write readObject methods defensively&lt;/div&gt;
    &lt;/div&gt;
    &lt;div class="row show-grid"&gt;
        &lt;div class="col-md-12"&gt;For instance control prefer enum types than readResolve&lt;/div&gt;
    &lt;/div&gt;
    &lt;div class="row show-grid"&gt;
        &lt;div class="col-md-12"&gt;Consider serialization proxies instead of serialized instances&lt;/div&gt;
    &lt;/div&gt;
&lt;/div&gt;

&lt;h4 id="end-node"&gt;End Node&lt;/h4&gt;</content><category term="posts"/></entry><entry><title>System and Application health - Is there a data collection challenge at the DC?</title><link href="https://bharath12345.github.io/posts/is-there-a-collection-challenge-at-the-data-center/" rel="alternate"/><published>2013-08-06T00:00:00-04:00</published><updated>2013-08-06T00:00:00-04:00</updated><author><name>Bharadwaj</name></author><id>tag:bharath12345.github.io,2013-08-06:/posts/is-there-a-collection-challenge-at-the-data-center/</id><summary type="html">&lt;p&gt;My champion-hacker friend Sumanth and I spent a little time few weeks ago digging to know if there was a data collection challenge for system and application health metrics at a typical small data center. Here is the little that we discovered...&lt;/p&gt;
&lt;div class="toc"&gt;&lt;span class="toctitle"&gt;Table of Contents&lt;/span&gt;&lt;ul&gt;
&lt;li&gt;&lt;a href="#usecase"&gt;Usecase&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#it-resources"&gt;IT Resources&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#quantum-of-data-to-collect"&gt;Quantum of …&lt;/a&gt;&lt;/li&gt;&lt;/ul&gt;&lt;/div&gt;</summary><content type="html">&lt;p&gt;My champion-hacker friend Sumanth and I spent a little time few weeks ago digging to know if there was a data collection challenge for system and application health metrics at a typical small data center. Here is the little that we discovered...&lt;/p&gt;
&lt;div class="toc"&gt;&lt;span class="toctitle"&gt;Table of Contents&lt;/span&gt;&lt;ul&gt;
&lt;li&gt;&lt;a href="#usecase"&gt;Usecase&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#it-resources"&gt;IT Resources&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#quantum-of-data-to-collect"&gt;Quantum of data to collect&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#the-developers-view"&gt;The Developer's View&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#is-there-a-data-collection-challenge"&gt;Is there a Data Collection 'Challenge'?&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#with-little-more-scale"&gt;With little more scale&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#with-ganglia"&gt;With Ganglia&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
&lt;p&gt;But the self questioning was preceded by a phase where I came to know about open source monitoring tools like &lt;a href="http://ganglia.sourceforge.net/"&gt;Ganglia&lt;/a&gt; and discovered that they were widely deployed. Ganglia in particular is very active as a development project and uses &lt;a href="http://en.wikipedia.org/wiki/Multicast"&gt;Multicast&lt;/a&gt; for data transmission. It caches data at all nodes within a cluster. The collection station has to communicate to only one (any) node within a cluster. I asked myself what could be the rationale behind this design. And as I looked more deeply into the world of Ganglia I discovered the amazing attention to detail - to optimise data and time without compromising accuracy or extensibility. For those wishing to understand Ganglia, &lt;a href="http://shop.oreilly.com/product/0636920025573.do"&gt;this book&lt;/a&gt; is a must read. The chapter on case-studies in this book makes for a truly fascinating read. &lt;a href="http://www.ittc.ku.edu/~niehaus/classes/750-s07/documents/ganglia-parallel-computing.pdf"&gt;This paper&lt;/a&gt; also provides a very good intro.&lt;/p&gt;
&lt;p&gt;And then there is SNMP. SNMP was built for monitoring. One small deficiency is that not all metrics are available through SNMP. In this blog I decided to keep SNMP aside and do the analysis. I present the numbers first and my inferences later. &lt;/p&gt;
&lt;h4 id="usecase"&gt;Usecase&lt;/h4&gt;
&lt;p&gt;Let me take the example of a data center for a small eCommerce startup, say, called "WebTraveller". Now, a little detail about this company -&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;WebTraveller is a travel portal on the lines of Expedia but with a target market of few small-and-medium-sized enterprises in its region&lt;/li&gt;
&lt;li&gt;WebTraveller decided to build its web application using &lt;em&gt;Ruby&lt;/em&gt; (which is going strong as No.2 on GitHub top languages - &lt;a href="https://github.com/languages"&gt;https://github.com/languages&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;For its many static pages and load-balancing, the IT folk at WebTraveller decided to do the time-tested thingy - use &lt;em&gt;Apache HTTPD or Nginx&lt;/em&gt;&lt;/li&gt;
&lt;li&gt;WebTraveller has to interface with travel data providers (airlines, bus companies et al), payment gateways, advertisers etc. Let us assume a simple idealistic world where all this data comes through superbly designed RESTful interfaces. So, the IT folk at WebTraveller decided to publish/subscribe to this RESTful external data interface through a &lt;em&gt;Java application&lt;/em&gt;&lt;/li&gt;
&lt;li&gt;WebTraveller uses MySQL or Postgres as its database to store user info etc&lt;/li&gt;
&lt;li&gt;Analytics is important for WebTraveller to run promotions, tune resources per demand/supply and present forecasts/state-of-business to investors - so, as a policy 10% of all IT resources are ear-marked for 'analytics'&lt;/li&gt;
&lt;li&gt;And as a policy, no more than 5% of IT resources should be consumed by monitoring and management tools - these are overheads and should be kept to a minimum after all &lt;/li&gt;
&lt;/ul&gt;
&lt;h4 id="it-resources"&gt;IT Resources&lt;/h4&gt;
&lt;p&gt;Now, how many IT resources will WebTraveller require? Since I am the de facto CIO of WebTraveller and it is my first CIO job, I decide to start with a nice whole number - say 100 servers (okay, I hear you, all on cloud). Now, here is the split up of what these 100 servers are going to do…&lt;/p&gt;
&lt;table class="table table-bordered table-striped table-condensed bs-docs-grid"&gt;
    &lt;tr&gt;
        &lt;td&gt;1&lt;/td&gt;
        &lt;td&gt;Servers running WebTraveller's Ruby based dynamic Web-Application&lt;/td&gt;
        &lt;td&gt;25&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
        &lt;td&gt;2&lt;/td&gt;
        &lt;td&gt;Servers serving WebTraveller's static pages and load-balancing (httpd/nginx)&lt;/td&gt;
        &lt;td&gt;15&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
        &lt;td&gt;3&lt;/td&gt;
        &lt;td&gt;Servers running WebTraveller Java interface with its data providers&lt;/td&gt;
        &lt;td&gt;25&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
        &lt;td&gt;4&lt;/td&gt;
        &lt;td&gt;WebTraveller database servers&lt;/td&gt;
        &lt;td&gt;20&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
        &lt;td&gt;5&lt;/td&gt;
        &lt;td&gt;IT analytics&lt;/td&gt;
        &lt;td&gt;10&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
        &lt;td&gt;6&lt;/td&gt;
        &lt;td&gt;IT management/monitoring&lt;/td&gt;
        &lt;td&gt;5&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
        &lt;td&gt;-&lt;/td&gt;
        &lt;td&gt;Total Servers&lt;/td&gt;
        &lt;td&gt;100&lt;/td&gt;
    &lt;/tr&gt;
&lt;/table&gt;

&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;&lt;em&gt;(How much am I off the mark in these assumptions? If it's horrific, then please let me know and I promise to re-do this blog)&lt;/em&gt;&lt;/p&gt;
&lt;h4 id="quantum-of-data-to-collect"&gt;Quantum of data to collect&lt;/h4&gt;
&lt;p&gt;Being the CIO, I want to understand how my IT is coping. So I need data. Data on server utilisation, database metrics, web-server metrics etc. The industry calls these various metrics KPIs - Key Performance Indicators. So KPI it will be. How many KPIs do I need to collect for each type of IT resource?&lt;/p&gt;
&lt;table class="table table-bordered table-striped table-condensed bs-docs-grid"&gt;
    &lt;tr&gt;
        &lt;td&gt;#&lt;/td&gt;
        &lt;td&gt;KPI Type&lt;/td&gt;
        &lt;td&gt;Approximate Number of KPIs per instance&lt;/td&gt;
        &lt;td&gt;Number of Instances (from the above table)&lt;/td&gt;
        &lt;td&gt;Total KPIs to collect&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
        &lt;td&gt;1&lt;/td&gt;
        &lt;td&gt;Operating system level KPIs - CPU, RAM, open sockets, HDD usage, network card stats etc&lt;/td&gt;
        &lt;td&gt;5&lt;/td&gt;
        &lt;td&gt;100&lt;/td&gt;
        &lt;td&gt;500&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
        &lt;td&gt;2&lt;/td&gt;
        &lt;td&gt;Ruby web-app KPIs&lt;/td&gt;
        &lt;td&gt;10&lt;/td&gt;
        &lt;td&gt;25&lt;/td&gt;
        &lt;td&gt;250&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
        &lt;td&gt;3&lt;/td&gt;
        &lt;td&gt;Ruby web-app runs on the Rails server. KPIs that speak Rails health&lt;/td&gt;
        &lt;td&gt;10&lt;/td&gt;
        &lt;td&gt;25&lt;/td&gt;
        &lt;td&gt;250&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
        &lt;td&gt;4&lt;/td&gt;
        &lt;td&gt;Java web-app KPIs&lt;/td&gt;
        &lt;td&gt;10&lt;/td&gt;
        &lt;td&gt;25&lt;/td&gt;
        &lt;td&gt;250&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
        &lt;td&gt;5&lt;/td&gt;
        &lt;td&gt;Java web-apps use a JVM and app-server (JBoss/Glassfish/Tomcat). KPIs that speak Java platform health&lt;/td&gt;
        &lt;td&gt;10&lt;/td&gt;
        &lt;td&gt;25&lt;/td&gt;
        &lt;td&gt;250&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
        &lt;td&gt;6&lt;/td&gt;
        &lt;td&gt;HTTPD or NGINX KPIs&lt;/td&gt;
        &lt;td&gt;10&lt;/td&gt;
        &lt;td&gt;15&lt;/td&gt;
        &lt;td&gt;150&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
        &lt;td&gt;7&lt;/td&gt;
        &lt;td&gt;Database Server KPIs&lt;/td&gt;
        &lt;td&gt;20&lt;/td&gt;
        &lt;td&gt;20&lt;/td&gt;
        &lt;td&gt;400&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
        &lt;td&gt;8&lt;/td&gt;
        &lt;td&gt;KPIs from the Analytics system (say running Hadoop)&lt;/td&gt;
        &lt;td&gt;10&lt;/td&gt;
        &lt;td&gt;10&lt;/td&gt;
        &lt;td&gt;100&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
        &lt;td&gt;-&lt;/td&gt;
        &lt;td&gt;Total&lt;/td&gt;
        &lt;td&gt;-&lt;/td&gt;
        &lt;td&gt;-&lt;/td&gt;
        &lt;td&gt;2150&lt;/td&gt;
    &lt;/tr&gt;
&lt;/table&gt;

&lt;p&gt;So, the approximate total number of KPIs to collect is 2150, which is an average of about 21 KPIs per server across the 100 servers of WebTraveller. Now, how frequently do we want to collect this data? I as the CIO of WebTraveller want my IT to be really AGILE - which means I don't want to miss any data (especially in its initial days!). And I also want to keep it SIMPLE. So I ask my monitoring team to collect all these KPIs &lt;em&gt;every minute&lt;/em&gt;.  &lt;/p&gt;
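&lt;p&gt;For anyone who wants to redo the arithmetic with their own numbers, here is a small sketch that recomputes the total from the rows of the table above. The class name and the array layout are my own; the figures are the ones in the table.&lt;/p&gt;

```java
class KpiTotals {
    // One row per KPI type from the table: {KPIs per instance, number of instances}
    static final int[][] ROWS = {
        {5, 100},  // operating system
        {10, 25},  // Ruby web-app
        {10, 25},  // Rails server
        {10, 25},  // Java web-app
        {10, 25},  // JVM and app-server
        {10, 15},  // HTTPD or NGINX
        {20, 20},  // database servers
        {10, 10},  // analytics
    };

    static int totalKpis() {
        int total = 0;
        for (int[] row : ROWS) {
            total += row[0] * row[1];
        }
        return total;
    }

    static double averagePerServer(int servers) {
        return (double) totalKpis() / servers;
    }

    public static void main(String[] args) {
        System.out.println(totalKpis());           // 2150
        System.out.println(averagePerServer(100)); // 21.5
    }
}
```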
&lt;h4 id="the-developers-view"&gt;The Developer's View&lt;/h4&gt;
&lt;p&gt;'Mr. Bean' is a developer in WebTraveller's IT team. Mr. Bean's task is cut out - he has to develop the monitoring app that collects 2150 metrics every minute by polling. Being a seasoned developer, he knows for sure that to collect so many KPIs he needs to code a 'multi-threaded' application. So Bean decides to do some estimation. How many threads will his application need to capture 2150 KPIs every minute?&lt;/p&gt;
&lt;p&gt;First of all, what are the different methods that exist to capture these KPIs from a remote server? Here are the necessary few -&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;JMX to collect from Java applications&lt;/li&gt;
&lt;li&gt;JDBC to collect from the databases themselves&lt;/li&gt;
&lt;li&gt;RPC/RMI or SSH based log-monitoring mechanism to retrieve data from the Ruby part&lt;/li&gt;
&lt;li&gt;RPC/RMI or SSH based log-monitoring to retrieve data from HTTPD/NGINX&lt;/li&gt;
&lt;li&gt;Server level stats through remote SSH&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Mr. Bean calculates the response-time for various collection methods - &lt;/p&gt;
&lt;table class="table table-bordered table-striped table-condensed bs-docs-grid"&gt;
    &lt;tr&gt;
        &lt;td&gt;#&lt;/td&gt;
        &lt;td&gt;Collection Method&lt;/td&gt;
        &lt;td&gt;Observation&lt;/td&gt;
        &lt;td&gt;Mean time to collect a set of KPIs from one instance&lt;/td&gt;
        &lt;td&gt;Num of servers that can be covered in 1 minute in a single thread&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
        &lt;td&gt;1&lt;/td&gt;
        &lt;td&gt;SSH&lt;/td&gt;
        &lt;td&gt;SSH involves two types of time - (1) time taken for connection establishment and teardown (2) multiple commands need to be run on the remote shell, data collated and retrieved&lt;/td&gt;
        &lt;td&gt;15 seconds&lt;/td&gt;
        &lt;td&gt;60/15 =&gt; 4 servers&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
        &lt;td&gt;2&lt;/td&gt;
        &lt;td&gt;JMX&lt;/td&gt;
        &lt;td&gt;Multiple JMX attributes can be retrieved at once. But here again there are the 2 phases of connection and retrieval&lt;/td&gt;
        &lt;td&gt;15 seconds&lt;/td&gt;
        &lt;td&gt;60/15 =&gt; 4 servers&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
        &lt;td&gt;3&lt;/td&gt;
        &lt;td&gt;JDBC&lt;/td&gt;
        &lt;td&gt;Single JDBC session can get a lot of metrics&lt;/td&gt;
        &lt;td&gt;Assume 20 seconds to retrieve all 20 database server KPIs of one instance&lt;/td&gt;
        &lt;td&gt;60/20 =&gt; 3 servers&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
        &lt;td&gt;4&lt;/td&gt;
        &lt;td&gt;RPC or RMI&lt;/td&gt;
        &lt;td&gt;I am not sure whether multiple variables can be retrieved in a single session. Assuming it's possible...&lt;/td&gt;
        &lt;td&gt;15 seconds&lt;/td&gt;
        &lt;td&gt;60/15 =&gt; 4 servers&lt;/td&gt;
    &lt;/tr&gt;
&lt;/table&gt;

&lt;p&gt;With this understanding, Mr. Bean decides on which collection technology to use for each class of KPIs (in table below). Also, Mr. Bean wants to know the number of threads his application may have to run. Mr. Bean knows that ideally, he would want to do asynchronous collection for each of these - that is, start a request in Thread-A and retrieve the data from Thread-B when it arrives - there are many libraries that provide such asynchronous capabilities for each of SSH, RPC, RMI, JMX, JDBC etc. However, asynchronous communication does not lead to a conservative number of threads - a thread gets forked whenever data arrives. For the most conservative number of threads, a select-and-poll based method is most appropriate. The big deficiency of the select-and-poll approach, however, is that data collection within time boundaries becomes tougher. There is no guarantee that the above mentioned mean times will always hold good. And also, the data that arrives is distributed wildly on the temporal scale. &lt;/p&gt;
&lt;p&gt;So, Mr. Bean calculates the number of threads that his application will end-up with if he takes either of the approaches -&lt;/p&gt;
&lt;table class="table table-bordered table-striped table-condensed bs-docs-grid"&gt;
    &lt;tr&gt;
        &lt;td&gt;#&lt;/td&gt;
        &lt;td&gt;KPI Type&lt;/td&gt;
        &lt;td&gt;Data Collection Technology&lt;/td&gt;
        &lt;td&gt;Number of Instances (from the above table)&lt;/td&gt;
        &lt;td&gt;Number of threads for select-and-poll approach&lt;/td&gt;
        &lt;td&gt;Number of threads for asynchronous approach&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
        &lt;td&gt;1&lt;/td&gt;
        &lt;td&gt;Operating system level KPIs - CPU, RAM, open sockets, HDD usage, network card stats etc&lt;/td&gt;
        &lt;td&gt;SSH&lt;/td&gt;
        &lt;td&gt;100&lt;/td&gt;
        &lt;td&gt;(100 servers/4 servers per min per thread) =&gt; 25 threads&lt;/td&gt;
        &lt;td&gt;100&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
        &lt;td&gt;2&lt;/td&gt;
        &lt;td&gt;Ruby web-app KPIs&lt;/td&gt;
        &lt;td&gt;RPC or RMI&lt;/td&gt;
        &lt;td&gt;25&lt;/td&gt;
        &lt;td&gt;(25/4) =&gt; 6.25 (assuming fractions in num threads is possible!)&lt;/td&gt;
        &lt;td&gt;25&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
        &lt;td&gt;3&lt;/td&gt;
        &lt;td&gt;Ruby web-app runs on the Rails server. KPIs that speak Rails health&lt;/td&gt;
        &lt;td&gt;RPC or RMI&lt;/td&gt;
        &lt;td&gt;25&lt;/td&gt;
        &lt;td&gt;(25/4) =&gt; 6.25&lt;/td&gt;
        &lt;td&gt;25&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
        &lt;td&gt;4&lt;/td&gt;
        &lt;td&gt;Java web-app KPIs&lt;/td&gt;
        &lt;td&gt;JMX&lt;/td&gt;
        &lt;td&gt;25&lt;/td&gt;
        &lt;td&gt;(25/4) =&gt; 6.25&lt;/td&gt;
        &lt;td&gt;25&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
        &lt;td&gt;5&lt;/td&gt;
        &lt;td&gt;Java web-apps use a JVM and app-server (JBoss/Glassfish/Tomcat). KPIs that speak Java platform health&lt;/td&gt;
        &lt;td&gt;JMX&lt;/td&gt;
        &lt;td&gt;25&lt;/td&gt;
        &lt;td&gt;(25/4) =&gt; 6.25&lt;/td&gt;
        &lt;td&gt;25&lt;/td&gt;     
    &lt;/tr&gt;
    &lt;tr&gt;
        &lt;td&gt;6&lt;/td&gt;
        &lt;td&gt;HTTPD or NGINX KPIs&lt;/td&gt;
        &lt;td&gt;SSH&lt;/td&gt;
        &lt;td&gt;15&lt;/td&gt;
        &lt;td&gt;(15/4) =&gt; 4&lt;/td&gt;
        &lt;td&gt;15&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
        &lt;td&gt;7&lt;/td&gt;
        &lt;td&gt;Database Server KPIs&lt;/td&gt;
        &lt;td&gt;JDBC&lt;/td&gt;
        &lt;td&gt;20&lt;/td&gt;
        &lt;td&gt;(20/3) =&gt; 7&lt;/td&gt;
        &lt;td&gt;20&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
        &lt;td&gt;8&lt;/td&gt;
        &lt;td&gt;KPIs from the Analytics system (say running Hadoop)&lt;/td&gt;
        &lt;td&gt;SSH&lt;/td&gt;
        &lt;td&gt;10&lt;/td&gt;
        &lt;td&gt;(10/4) =&gt; 3&lt;/td&gt;
        &lt;td&gt;10&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
        &lt;td&gt;&lt;/td&gt;
        &lt;td&gt;Total&lt;/td&gt;
        &lt;td&gt;-&lt;/td&gt;
        &lt;td&gt;-&lt;/td&gt;
        &lt;td&gt;64 threads&lt;/td&gt;
        &lt;td&gt;245 threads&lt;/td&gt;
    &lt;/tr&gt;
&lt;/table&gt;
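&lt;p&gt;The same estimate as a small sketch, in case you want to plug in different numbers. The class name and array layout are my own. One difference from the table: threads cannot be fractional, so this version rounds every row up, which gives 67 polling threads instead of the 64 obtained by keeping the 6.25s.&lt;/p&gt;

```java
class ThreadEstimate {
    // One row per KPI type: {instances, servers one polling thread can cover per minute}
    static final int[][] ROWS = {
        {100, 4}, // operating system, SSH
        {25, 4},  // Ruby web-app, RPC or RMI
        {25, 4},  // Rails server, RPC or RMI
        {25, 4},  // Java web-app, JMX
        {25, 4},  // JVM and app-server, JMX
        {15, 4},  // HTTPD or NGINX, SSH
        {20, 3},  // database servers, JDBC
        {10, 4},  // analytics, SSH
    };

    // Select-and-poll: each thread works through its share of servers within the minute.
    static int pollingThreads() {
        int threads = 0;
        for (int[] row : ROWS) {
            threads += (row[0] + row[1] - 1) / row[1]; // integer ceiling of instances / coverage
        }
        return threads;
    }

    // Asynchronous: one thread per instance per KPI type.
    static int asyncThreads() {
        int threads = 0;
        for (int[] row : ROWS) {
            threads += row[0];
        }
        return threads;
    }

    public static void main(String[] args) {
        System.out.println(pollingThreads()); // 67
        System.out.println(asyncThreads());   // 245
    }
}
```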

&lt;p&gt;So the number of threads needed to gather information from the 100 server deployment at WebTraveller is approximately between 60 and 250. The following factors are pertinent - &lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Usage of async libraries would provide a better temporal distribution and fault-safety &lt;/li&gt;
&lt;li&gt;With a select-and-poll approach, the 64 threads will be active all the time. With the asynchronous approach, 245 threads are forked every minute and (hopefully) end well before the minute boundary&lt;/li&gt;
&lt;li&gt;The number of socket descriptors required will have one-to-one correspondence with number of threads (in this case). So 64 sockets will be open at any point of time by the polling approach, while up to 245 sockets could be open at any point of time by the asynchronous approach&lt;/li&gt;
&lt;li&gt;One can always mix-and-match between polling and asynchronous data collection for different types. For example JMX can be collected in an asynchronous way while SSH can be collected by the polling methods&lt;/li&gt;
&lt;/ul&gt;
&lt;h4 id="is-there-a-data-collection-challenge"&gt;Is there a Data Collection 'Challenge'?&lt;/h4&gt;
&lt;p&gt;The numbers say that on average 1 to 2 threads/sockets are required to collect data from each instance. This does not sound like much in WebTraveller's case, but one needs to pay attention to the following details - &lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;I have assumed that all data points are to be collected per minute. It could very well be that data is required at a much more granular level for certain metrics - say every 5 seconds. In which case, the number of threads and sockets would simply go up 12 times!&lt;/li&gt;
&lt;li&gt;The average number of KPIs per server, at 21, is a very conservative estimate. In most production environments this number will be at least double, and generally much higher &lt;/li&gt;
&lt;li&gt;I have not considered the challenge (if there is one) on the persistence side of things - how easy is it to store all this data in an RDBMS (or NoSQL!) and design queries for real-time?&lt;/li&gt;
&lt;li&gt;And these numbers need to be coupled with the natural challenges of data collection, which are - &lt;ul&gt;
&lt;li&gt;Horizontal scalability&lt;/li&gt;
&lt;li&gt;Fault tolerance&lt;/li&gt;
&lt;li&gt;More accurate temporal distribution&lt;/li&gt;
&lt;li&gt;Tiered architecture induces delay in real-time collection and storage&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h4 id="with-little-more-scale"&gt;With a little more scale&lt;/h4&gt;
&lt;p&gt;The situation changes considerably if we consider a data-center with 3000 servers. Even with a linear extrapolation, it would involve collection of about 65,000 data-points every minute. And in excess of 10,000 threads and sockets. &lt;/p&gt;
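&lt;p&gt;The extrapolation can be checked with a few lines of arithmetic. This is only a minimal sketch in Python, assuming purely linear scaling from the 100-server figures used earlier (21 KPIs per server, 64 to 245 threads); the function name and its parameters are made up for illustration. Pure linear scaling of the thread totals gives roughly 1,900 to 7,400, and the upper figure passes 10,000 as soon as sampling is twice as frequent.&lt;/p&gt;

```python
def extrapolate(servers, base_servers=100, kpis_per_server=21,
                base_threads=(64, 245), samples_per_minute=1):
    """Linearly scale the 100-server estimates to a larger deployment."""
    factor = servers / base_servers
    data_points = int(servers * kpis_per_server * samples_per_minute)
    # (polling threads, asynchronous threads), scaled by size and sampling rate
    threads = tuple(int(t * factor * samples_per_minute) for t in base_threads)
    return data_points, threads


points, (polling, asynchronous) = extrapolate(3000)
print(points, polling, asynchronous)

# Assumption: sampling twice a minute simply doubles every figure.
print(extrapolate(3000, samples_per_minute=2))
```

&lt;p&gt;The same function can be called with samples_per_minute=12 to see what 5-second granularity would cost.&lt;/p&gt;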
&lt;h4 id="with-ganglia"&gt;With Ganglia&lt;/h4&gt;
&lt;p&gt;In WebTraveller's case, Mr. Bean could potentially do one other thing. He could use Ganglia to collect the data. Each of the 5 functional groups (from the first table above) in WebTraveller's data-center could be configured as separate Ganglia clusters. This leads to -&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The data collector having to communicate with only 5 servers instead of 100 - because Ganglia stores the data collected in each cluster at all the nodes&lt;/li&gt;
&lt;li&gt;In each cluster, Ganglia collects data through whichever collection technology the constituent servers support (JMX/JDBC etc). Ganglia can use UDP multicast, so only 5 threads are required to collect ALL the data. Well, now, that's one huge optimisation, isn't it? As an aside, however, the Ganglia clients will have to be extended to collect from diverse sources and pre-installed on all servers. Yet, the huge saving in monitoring cost is visible in a straightforward way... &lt;/li&gt;
&lt;li&gt;Total linear scalability - even if the cluster sizes go up ten times, the load on the management server does not increase at all. The payload from each cluster might go up - but that is not much cost in collection&lt;/li&gt;
&lt;li&gt;Fine-grained polling - with Ganglia, polling at very short intervals within each cluster does not increase the load on the monitoring server. The monitoring server can continue to receive data on minute boundaries after all&lt;/li&gt;
&lt;li&gt;The positive effects of local storage and fault tolerance that a Ganglia based monitoring can provide&lt;/li&gt;
&lt;/ul&gt;</content><category term="posts"/></entry><entry><title>Real Time Dashboard with Camel, ActiveMQ &amp; Dojo... On JBoss7 and using JMS &amp; WebSocket</title><link href="https://bharath12345.github.io/posts/real-time-dashboard-with-camel-activemq-dojo-on-jboss-using-jms--websocket/" rel="alternate"/><published>2013-08-01T00:00:00-04:00</published><updated>2013-08-01T00:00:00-04:00</updated><author><name>Bharadwaj</name></author><id>tag:bharath12345.github.io,2013-08-01:/posts/real-time-dashboard-with-camel-activemq-dojo-on-jboss-using-jms--websocket/</id><summary type="html">&lt;p&gt;I have built real-time 'stock-ticker' like dashboards. There are many ways to build them. Few months ago I had the opportunity to design one freshly again for an enterprise product. I did a quick sweep at the different technology stacks that can be used to build a highly scalable (design …&lt;/p&gt;</summary><content type="html">&lt;p&gt;I have built real-time 'stock-ticker' like dashboards. There are many ways to build them. Few months ago I had the opportunity to design one freshly again for an enterprise product. I did a quick sweep at the different technology stacks that can be used to build a highly scalable (design/code and performance scalability) real-time dashboard. There are many technologies for real-time in the browser (like BlazeDS) that are either outdated or on their way out. I came across this very interesting &lt;a href="http://fusesource.com/apache-camel-conference-2012/videos/camelone-2012-charles-moulliard-video/"&gt;presentation&lt;/a&gt;, &lt;a href="https://github.com/FuseByExample/websocket-activemq-camel"&gt;code&lt;/a&gt; and &lt;a href="http://cmoulliard.blogspot.in/2012_04_01_archive.html"&gt;blog&lt;/a&gt; by Charles Moulliard which I found to be a very exciting design. So I sat down to extend what Charles had done to suit my usecase. 
I would recommend &lt;a href="http://www.amazon.com/The-Definitive-Guide-HTML5-WebSocket/dp/1430247401"&gt;this nice book&lt;/a&gt; by Apress as a good introduction to the subject of WebSockets. But before getting to the real usecase and seeing why one would use Camel or ActiveMQ, here is a quick primer on the different techniques one could use to build a real-time dashboard.&lt;/p&gt;
&lt;div class="toc"&gt;&lt;span class="toctitle"&gt;Table of Contents&lt;/span&gt;&lt;ul&gt;
&lt;li&gt;&lt;a href="#primer-of-different-techniques"&gt;Primer of Different Techniques&lt;/a&gt;&lt;ul&gt;
&lt;li&gt;&lt;a href="#1-polling-based"&gt;1. Polling Based&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#2-stateful-and-restful"&gt;2. Stateful and RESTful&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#3-comet"&gt;3. Comet&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#4-websocket"&gt;4. WebSocket&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#usecase"&gt;Usecase&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#design"&gt;Design&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#the-whys"&gt;The Why's?&lt;/a&gt;&lt;ul&gt;
&lt;li&gt;&lt;a href="#1-why-apache-camel"&gt;1. Why Apache Camel?&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#2-why-activemq-and-not-camels-native-jms-implementation"&gt;2. Why ActiveMQ and not Camel's native JMS implementation?&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#3-why-websocket"&gt;3. Why WebSocket?&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#4-why-jboss7"&gt;4. Why JBoss7?&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#how-to-use-and-results"&gt;How to use and results&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#my-conclusion"&gt;My Conclusion!!&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
&lt;h3 id="primer-of-different-techniques"&gt;Primer of Different Techniques&lt;/h3&gt;
&lt;h4 id="1-polling-based"&gt;1. Polling Based&lt;/h4&gt;
&lt;p&gt;Ajax requires a client-side request to get data to the browser. So the simplest solution is to build a client-side, timer-based poller, perhaps using JavaScript timers like setInterval or setTimeout (or wrappers from libraries). &lt;/p&gt;
&lt;table class="table table-bordered table-striped table-condensed bs-docs-grid"&gt;
    &lt;tr&gt;
        &lt;td&gt;Pro&lt;/td&gt;
        &lt;td&gt;Con&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
        &lt;td&gt;Simplicity&lt;/td&gt;
        &lt;td&gt;If the data being polled is growing or is large, performance degrades continuously because all of it is fetched and rendered each time. If the 'real-time' SLAs call for changes to be shown quickly (&lt; 5 seconds), then continuous polling from the client starts to weigh heavily on the data source. It could lead to a large number of SQL queries running continuously. Above all, this would simply not scale if you expect a large number of users and/or a large number of real-time data types.&lt;/td&gt;
    &lt;/tr&gt;
&lt;/table&gt;
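&lt;p&gt;To make the cost of naive polling concrete, here is a small simulation in Python (chosen for brevity; in a browser the timer would be setInterval). The DataSource class and its fetch_all method are hypothetical stand-ins for a server endpoint, not part of any real API. Every tick re-fetches the whole data set, so the rows transferred grow much faster than the data itself.&lt;/p&gt;

```python
class DataSource:
    """Hypothetical server-side store that gains one row per tick."""

    def __init__(self):
        self.rows = []

    def tick(self):
        self.rows.append({"id": len(self.rows), "value": "alert"})

    def fetch_all(self):
        # A naive poller asks for everything, every time.
        return list(self.rows)


def simulate_polling(ticks):
    """Return (rows on the server, total rows sent to the client)."""
    source = DataSource()
    transferred = 0
    for _ in range(ticks):
        source.tick()                           # new data arrives on the server
        transferred += len(source.fetch_all())  # client polls on its timer
    return len(source.rows), transferred


rows, transferred = simulate_polling(10)
print(rows, transferred)
```

&lt;p&gt;With one new row per tick the transfer volume grows quadratically with the number of polls, which is the degradation described in the table.&lt;/p&gt;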

&lt;h4 id="2-stateful-and-restful"&gt;2. Stateful and RESTful&lt;/h4&gt;
&lt;p&gt;Maintain state at either the server or the client to reduce both what is queried and how much is transmitted. There are two options -&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Client side stateful&lt;/li&gt;
&lt;li&gt;Server side stateful&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;But &lt;a href="http://en.wikipedia.org/wiki/Representational_state_transfer#Constraints"&gt;REST mandates&lt;/a&gt; the following 2 constraints -&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="n"&gt;Stateless&lt;/span&gt;
&lt;span class="n"&gt;The&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="err"&gt;–&lt;/span&gt;&lt;span class="n"&gt;server&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;communication&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;is&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;further&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;constrained&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;by&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;no&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;being&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;stored&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;on&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;the&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;server&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;between&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;requests&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;Each&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;from&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;any&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;contains&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;all&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;of&lt;/span&gt;&lt;span 
class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;the&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;information&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;necessary&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;to&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;service&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;the&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="ow"&gt;and&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;any&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;session&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;is&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;held&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="ow"&gt;in&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;the&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;

&lt;span class="n"&gt;Cacheable&lt;/span&gt;
&lt;span class="n"&gt;As&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;on&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;the&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;World&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;Wide&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;Web&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;clients&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;can&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;cache&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;responses&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;Responses&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;must&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;therefore&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;implicitly&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="ow"&gt;or&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;explicitly&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;define&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;themselves&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;as&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;cacheable&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="ow"&gt;or&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="ow"&gt;not&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;to&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span 
class="n"&gt;prevent&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;clients&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;reusing&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;stale&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="ow"&gt;or&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;inappropriate&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="ow"&gt;in&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;to&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;further&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;requests&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;Well&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;managed&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;caching&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;partially&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="ow"&gt;or&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;completely&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;eliminates&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;some&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="err"&gt;–&lt;/span&gt;&lt;span class="n"&gt;server&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;interactions&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;further&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;improving&lt;/span&gt;&lt;span class="w"&gt; 
&lt;/span&gt;&lt;span class="n"&gt;scalability&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="ow"&gt;and&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;performance&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;I am no expert in RESTful design. But I know for sure that many implementations (especially those which have &lt;em&gt;streaming&lt;/em&gt; in their name) relax the stateless-server constraint. So, statefulness can go thus -&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Client-side stateful&lt;/strong&gt;: Client asks for only the incremental. For example a timestamp based method could be adopted by the client to get the incrementals (by doing so the timestamp becomes the 'state'). There are some wonderful JavaScript frameworks that make state maintenance possible. One can use &lt;a href="http://backbonejs.org"&gt;BackboneJS&lt;/a&gt; or Dojo's &lt;a href="http://dojotoolkit.org/reference-guide/1.9/dojo/store/Observable.html"&gt;Observable&lt;/a&gt; pattern to build a store in the browser and update the UI only on the incremental changes. Combined with RESTful HTTP APIs on the server-side, one can build robust applications&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Server-side stateful&lt;/strong&gt;: The server can respond with &lt;em&gt;only&lt;/em&gt; the incremental when a request from the same client arrives. Server-side HTTP APIs publish incremental data of different types and with different filters. A session handshake or client subscription is required before the start (the server has to maintain state for each client).&lt;/li&gt;
&lt;/ul&gt;
&lt;table class="table table-bordered table-striped table-condensed bs-docs-grid"&gt;
    &lt;tr&gt;
        &lt;td&gt;Pro&lt;/td&gt;
        &lt;td&gt;Con&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
        &lt;td&gt;Since only the incremental 'delta' is in transit and re-rendered on the UI, these methods scale in performance. They are well suited for web applications where 3rd party developers could be using your data feed to build user interfaces or other real-time services.&lt;/td&gt;
        &lt;td&gt;Maintaining state can quickly become very complex. Multiple types of data, with different incrementals, can lead to a 'cache-mess': many caches, some of them really big. User actions like filtering add considerable complexity to the underlying infra. And despite only incrementals being in transit, it is still a request-response system, making tight SLAs (&lt;5 seconds refresh rate) quite a challenge.&lt;/td&gt;
    &lt;/tr&gt;
&lt;/table&gt;
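&lt;p&gt;The client-side stateful idea can be sketched in a few lines of Python. This is an illustration under assumptions, not code from any framework: EventStore and Client are hypothetical names, and a sequence number stands in for the timestamp so that the example stays deterministic. The only state the client keeps is the last sequence number it has seen.&lt;/p&gt;

```python
class EventStore:
    """Hypothetical server-side store; each event carries a sequence number."""

    def __init__(self):
        self.events = []

    def add(self, payload):
        self.events.append({"seq": len(self.events) + 1, "payload": payload})

    def since(self, last_seq):
        # Sequence numbers are 1-based and contiguous, so slicing from
        # last_seq returns exactly the events the client has not seen.
        return self.events[last_seq:]


class Client:
    """Keeps the last sequence number seen; that number is the 'state'."""

    def __init__(self, store):
        self.store = store
        self.last_seq = 0
        self.seen = []

    def poll(self):
        delta = self.store.since(self.last_seq)
        if delta:
            self.last_seq = delta[-1]["seq"]
            self.seen.extend(e["payload"] for e in delta)
        return len(delta)  # number of new events in this response


store = EventStore()
client = Client(store)
store.add("cpu high")
store.add("disk full")
first = client.poll()
store.add("cpu normal")
second = client.poll()
third = client.poll()
print(first, second, third)
```

&lt;p&gt;Each poll returns only what changed since the previous one, so an idle poll costs an empty response rather than a full re-fetch.&lt;/p&gt;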

&lt;h4 id="3-comet"&gt;3. Comet&lt;/h4&gt;
&lt;p&gt;Comet, Reverse-Ajax et al. are hacks and not solutions. The idea is that the browser makes an Ajax request to the server, which is kept open until the server has new data to send to the browser. Once the server has the event it wants to send, it sends it on this already open channel. And soon after getting a response the browser initiates a new long polling request in order to obtain subsequent events. Multiple frameworks exist to accomplish the job from both server and client side. But the technology is riddled with bugs, browser incompatibilities and is a total mess.&lt;/p&gt;
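&lt;p&gt;The long-polling cycle described above can be imitated with a blocking queue. This is a rough sketch in Python, assuming an in-process queue in place of the open HTTP request; long_poll and client_loop are illustrative names only. The client blocks until the 'server' has an event, consumes it, and immediately waits again.&lt;/p&gt;

```python
import queue
import threading


def long_poll(events, timeout):
    """Block until the server has an event, or give up after `timeout` seconds."""
    try:
        return events.get(timeout=timeout)
    except queue.Empty:
        return None  # nothing happened; the client simply asks again


def client_loop(events, expected, timeout=1.0):
    received = []
    while len(received) != expected:
        event = long_poll(events, timeout)  # the request stays open while waiting
        if event is not None:
            received.append(event)          # response arrived; loop re-issues it
    return received


events = queue.Queue()
# The 'server' produces two events from another thread.
producer = threading.Thread(
    target=lambda: [events.put(e) for e in ("alert-1", "alert-2")])
producer.start()
received = client_loop(events, expected=2)
producer.join()
print(received)
```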
&lt;h4 id="4-websocket"&gt;4. WebSocket&lt;/h4&gt;
&lt;p&gt;WebSocket is a new protocol. It sets up a full-duplex communication channel between client and server, starting from an ordinary HTTP(S) request. The client's HTTP request carries an "Upgrade" header set to &lt;em&gt;websocket&lt;/em&gt; and a "Connection" header set to &lt;em&gt;Upgrade&lt;/em&gt;. All modern browsers support this through the JavaScript WebSocket API. So the question boils down to - what's the best way to handle these upgrade requests on the server side? There are upcoming frameworks like &lt;a href="https://github.com/Atmosphere/atmosphere"&gt;Atmosphere&lt;/a&gt; which interoperate with popular existing server and client frameworks, promising easy adoption.&lt;/p&gt;
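&lt;p&gt;For a feel of what handling an upgrade request involves, here is a minimal sketch in Python of the server side of the opening handshake as defined in RFC 6455. Besides the two headers mentioned above, the client sends a Sec-WebSocket-Key; the server answers with status 101 (Switching Protocols) and a Sec-WebSocket-Accept value derived from that key. The function names are my own, and a real server would of course also have to parse the request and frame the messages that follow.&lt;/p&gt;

```python
import base64
import hashlib

# Fixed GUID defined by RFC 6455 for the opening handshake.
WEBSOCKET_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"


def accept_key(sec_websocket_key):
    """Compute the Sec-WebSocket-Accept value for a client's key."""
    raw = (sec_websocket_key + WEBSOCKET_GUID).encode("ascii")
    return base64.b64encode(hashlib.sha1(raw).digest()).decode("ascii")


def handshake_response(request_headers):
    """Return response headers for a valid upgrade request, else None."""
    headers = {k.lower(): v for k, v in request_headers.items()}
    key = headers.get("sec-websocket-key")
    if key is None:
        return None
    if headers.get("upgrade", "").lower() != "websocket":
        return None
    if "upgrade" not in headers.get("connection", "").lower():
        return None
    return {
        "Upgrade": "websocket",
        "Connection": "Upgrade",
        "Sec-WebSocket-Accept": accept_key(key),
    }


response = handshake_response({
    "Upgrade": "websocket",
    "Connection": "Upgrade",
    "Sec-WebSocket-Key": "dGhlIHNhbXBsZSBub25jZQ==",  # sample key from the RFC
})
print(response["Sec-WebSocket-Accept"])
```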
&lt;h3 id="usecase"&gt;Usecase&lt;/h3&gt;
&lt;p&gt;A real time &lt;em&gt;alerts&lt;/em&gt; dashboard. In any monitoring/management/analytics system, events go through multiple stages before getting transformed into an alert needing to be displayed for concerned users. Event pipelines come in many types and JMS is not uncommon. The usecase here is of such a system where event processors pick events to evaluate and filter. The SLAs for critical alerts can be very small time-periods depending on the domain.&lt;/p&gt;
&lt;h3 id="design"&gt;Design&lt;/h3&gt;
&lt;p&gt;The image below shows the 5 components of the implementation of my usecase. The code is posted on GitHub &lt;a href="https://github.com/bharath12345/RealTimeDashboard"&gt;here&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;1. AsyncHttpClient&lt;/strong&gt;: This is just a data feed. In most data-center scenarios the data feed to IT management/analytics/monitoring services is separated by a firewall. I use &lt;a href="http://www.ning.com/code/2010/03/introducing-nings-asynchronous-http-client-library/"&gt;Ning HTTP client&lt;/a&gt; - its default provider is built on the superb Netty NIO framework and it works well with JBoss. For the prototype's sake, I have taken the data itself to be just the HTTP headers. It could equally be anything from the payload. And it could come from other types of sources like SNMP etc.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;2. AsyncHttpServer&lt;/strong&gt;: Camel provides a Jetty NIO2 based Async Server implementation. I use that to receive the client connections and pick the data (http headers in my case). &lt;/p&gt;
&lt;p&gt;&lt;strong&gt;3. JMS Broker&lt;/strong&gt;: I use ActiveMQ. JBoss packages HornetQ natively. But ActiveMQ is by far the most popular JMS broker on planet earth. &lt;/p&gt;
&lt;p&gt;&lt;strong&gt;4. Multiple JMS Topics&lt;/strong&gt;: The data receiver can publish the received data into a chosen JMS topic (depending on the data received). The first publish is of a Serializable Java POJO. The receiver on this JMS topic picks up the POJO, transforms it to JSON and publishes it to a different set of JMS topics just for JSON (this is not shown in the image below but can be seen in the code).&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;5. Camel JMS to WebSocket Route&lt;/strong&gt;: Camel route is used to pick data from the JSON JMS topic and post it to both - WebSocket &amp;amp; log file - together. A final JSON level transformation can be applied in this stage if need be.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;6. JavaScript UI&lt;/strong&gt;: A JavaScript WebSocket() connects and waits for JSON messages to appear. Received messages are shown in a grid (Dojo's GridX actually)&lt;/p&gt;
&lt;p&gt;&lt;img alt="image" src="/images/camel-websocket/camel%20jms%20websocket.png"&gt;&lt;/p&gt;
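&lt;p&gt;The flow through the stages above can be mimicked without any broker at all. The following is a toy sketch in Python, not the Camel/ActiveMQ code from the repository: Topic is a hypothetical in-memory stand-in for a JMS topic, the topic names are invented, and two plain lists play the roles of the WebSocket and the log file.&lt;/p&gt;

```python
import json


class Topic:
    """Hypothetical in-memory stand-in for a JMS topic."""

    def __init__(self, name):
        self.name = name
        self.subscribers = []

    def subscribe(self, callback):
        self.subscribers.append(callback)

    def publish(self, message):
        for callback in self.subscribers:
            callback(message)


pojo_topic = Topic("alerts.pojo")
json_topic = Topic("alerts.json")
websocket_out, log_out = [], []

# Stage 4: transform the object to JSON and republish on the JSON topic.
pojo_topic.subscribe(
    lambda obj: json_topic.publish(json.dumps(obj, sort_keys=True)))
# Stage 5: the route fans the JSON out to both the WebSocket and the log.
json_topic.subscribe(websocket_out.append)
json_topic.subscribe(log_out.append)

# Stages 1-2: a received HTTP request, reduced to its headers.
pojo_topic.publish({"Host": "example.org", "User-Agent": "demo"})
print(websocket_out)
```

&lt;p&gt;Keeping the object topic and the JSON topic separate means other consumers can subscribe to either form without touching the dashboard route.&lt;/p&gt;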
&lt;h3 id="the-whys"&gt;The Why's?&lt;/h3&gt;
&lt;h4 id="1-why-apache-camel"&gt;1. Why Apache Camel?&lt;/h4&gt;
&lt;p&gt;(1) I wanted to learn Camel (2) Apache Camel is brilliant for plumbing purposes between modules/services within an enterprise product. The number of supported components is dizzying. Despite the heavy sounding ESB word being thrown around with it I have found it quite easy to grasp and it just works like a charm!  &lt;/p&gt;
&lt;h4 id="2-why-activemq-and-not-camels-native-jms-implementation"&gt;2. Why ActiveMQ and not Camel's native JMS implementation?&lt;/h4&gt;
&lt;p&gt;One of my dear friends, &lt;a href="http://in.linkedin.com/in/sumanthn83"&gt;Sumanth&lt;/a&gt;, pointed out this rather subtle note on performance in Camel's JMS page.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="nx"&gt;http&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="c1"&gt;//camel.apache.org/jms.html&lt;/span&gt;

&lt;span class="nx"&gt;The&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;JMS&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;component&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;reuses&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;Spring&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="err"&gt;&amp;#39;&lt;/span&gt;&lt;span class="nx"&gt;s&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;JmsTemplate&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;for&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;sending&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;This&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;is&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;not&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;ideal&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;for&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;use&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;in&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;a&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;non&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="nx"&gt;J2EE&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;container&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;and&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;typically&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;requires&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span 
class="nx"&gt;some&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;caching&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;in&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;the&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;JMS&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;provider&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;to&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;avoid&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;poor&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;performance&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;If&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;you&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;intend&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;to&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;use&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;Apache&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;ActiveMQ&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;as&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;your&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;Message&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;Broker&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;which&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;is&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;a&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;good&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span 
class="kd"&gt;choice&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;as&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;ActiveMQ&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;rocks&lt;/span&gt;&lt;span class="err"&gt;…&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Further to this, I am slowly developing an aversion to everything Spring. I opine that it is better to avoid Spring in any new development project of scale. And Camel JMS is based on Spring. So better to use ActiveMQ directly.&lt;/p&gt;
&lt;h4 id="3-why-websocket"&gt;3. Why WebSocket?&lt;/h4&gt;
&lt;p&gt;Experts in RESTful design like &lt;a href="http://bill.burkecentral.com/2012/02/28/web-sockets-a-disaster-in-waiting/"&gt;Bill Burke&lt;/a&gt; denounce WebSockets sharply. There are &lt;a href="http://www.infoq.com/news/2012/02/websockets-rest"&gt;others&lt;/a&gt; who welcome it anyway. Personally, I like the idea of a full duplex channel on top of HTTP. I don't think WebSockets are a good way for companies and applications to expose their data and services - which is exactly the usecase for REST. WebSockets fit quite beautifully within enterprise products/applications where services are consumed internally between modules/known-applications and are deployed in a distributed setup where they cross multiple DMZs. Along with the upcoming &lt;a href="http://apiux.com/2013/07/23/http2-0-initial-draft-released/"&gt;draft of HTTP 2.0&lt;/a&gt; which will hopefully support -&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Binary&lt;/li&gt;
&lt;li&gt;Connections remain open so long as user stays on the page&lt;/li&gt;
&lt;li&gt;Multiple open streams &lt;/li&gt;
&lt;li&gt;Priorities&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;having WebSockets will make HTTP a dependable channel for realtime!&lt;/p&gt;
&lt;h4 id="4-why-jboss7"&gt;4. Why JBoss7?&lt;/h4&gt;
&lt;p&gt;In the world of open source Java, JBoss is simply the best application container around. I used &lt;a href="http://wildfly.org"&gt;WildFly&lt;/a&gt; 8.0 Alpha3 for this prototype. &lt;/p&gt;
&lt;h3 id="how-to-use-and-results"&gt;How to use and results&lt;/h3&gt;
&lt;ol&gt;
&lt;li&gt;A "mvn clean install" would build the EAR which should be deployed in JBoss 7+&lt;/li&gt;
&lt;li&gt;From the JBoss JMX Console, use the firePostRequests() operation to send HTTP client side requests (com.bharath.http.client)&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;&lt;img alt="image" src="/images/camel-websocket/jmx.png"&gt;&lt;/p&gt;
&lt;p&gt;The snapshot of the dashboard UI -&lt;/p&gt;
&lt;p&gt;&lt;img alt="image" src="/images/camel-websocket/dashboard.png"&gt;&lt;/p&gt;
&lt;h3 id="my-conclusion"&gt;My Conclusion!!&lt;/h3&gt;
&lt;p&gt;Asynchronous processing by pushing to multiple JMS topics when combined with Apache Camel's routing and WebSocket capabilities can provide for building a truly fast and efficient events/alerts pipeline for a realtime alerts dashboard&lt;/p&gt;</content><category term="posts"/></entry><entry><title>Java and JVM 7: Slides from a quick talk</title><link href="https://bharath12345.github.io/posts/java-and-jvm-7-slides-from-a-quick-talk/" rel="alternate"/><published>2013-07-31T00:00:00-04:00</published><updated>2013-07-31T00:00:00-04:00</updated><author><name>Bharadwaj</name></author><id>tag:bharath12345.github.io,2013-07-31:/posts/java-and-jvm-7-slides-from-a-quick-talk/</id><content type="html">&lt;p&gt;Doing Java early in the morning makes for a good day. Got up early today to put together a few slides for a talk to developer folk. Not comprehensive. May not be very accurate even. And too much opinionated. If you don't mind my ego you may click &lt;a href="http://bharathwrites.in/pages/slides/java7.html"&gt;here&lt;/a&gt;.&lt;/p&gt;</content><category term="posts"/></entry><entry><title>Build Dojo 1.7/1.8/1.9 with Maven</title><link href="https://bharath12345.github.io/posts/build-dojo-1819-with-maven/" rel="alternate"/><published>2013-07-18T00:00:00-04:00</published><updated>2013-07-18T00:00:00-04:00</updated><author><name>Bharadwaj</name></author><id>tag:bharath12345.github.io,2013-07-18:/posts/build-dojo-1819-with-maven/</id><summary type="html">&lt;p&gt;I have been a Dojo user for many years now. Also use many JavaScript libraries (jQuery, backbone, bootstrap, D3, highsoft) all the while but Dojo is what I really love. I would not embark on any "professional" development work without being armed with Dojo. But I rest my opinions and …&lt;/p&gt;</summary><content type="html">&lt;p&gt;I have been a Dojo user for many years now. Also use many JavaScript libraries (jQuery, backbone, bootstrap, D3, highsoft) all the while but Dojo is what I really love. 
I would not embark on any "professional" development work without being armed with Dojo. But I will reserve my opinions and comparisons of different JS libraries for a different post. Here the goal is to "build" Dojo. After all, every professional project should build its JS - compilers like Google Closure can find bugs, obfuscate the code and ultimately make execution faster.&lt;/p&gt;
&lt;div class="toc"&gt;&lt;span class="toctitle"&gt;Table of Contents&lt;/span&gt;&lt;ul&gt;
&lt;li&gt;&lt;a href="#task-1-installing-dojo-in-maven-repository-and-unpack-task"&gt;Task 1: Installing Dojo in Maven Repository and Unpack Task&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#task-2-move-dojo-sources"&gt;Task 2: Move Dojo sources&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#task-3-build-dojo"&gt;Task 3: Build Dojo&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#task-4-the-dojo-profile"&gt;Task 4: The Dojo Profile&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#task-5-clean-the-uncompressed-javascript"&gt;Task 5: Clean the Uncompressed JavaScript&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#task-6-copy-other-javascript-libraries"&gt;Task 6: Copy Other JavaScript libraries&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#task-7-a-fast-build-profile"&gt;Task 7: A fast build profile&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
&lt;p&gt;I am still mainly a Java programmer (the enterprise products I have built are predominantly in Java… my time splits roughly 70/30 between Java and JavaScript), so I am used to Maven as my primary build tool. And Maven I shall use to build Dojo.&lt;/p&gt;
&lt;p&gt;Folks who have not tried to build Dojo should probably start off by reading these two articles -&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="http://dojotoolkit.org/documentation/tutorials/1.9/build/"&gt;Creating Builds&lt;/a&gt; from Dojo documentation&lt;/li&gt;
&lt;li&gt;&lt;a href="http://www.mahieu.org/?p=3"&gt;Creating custom Dojo builds in Maven&lt;/a&gt; &lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The second article is very good but slightly dated. Here is what I propose to add to it -&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Use dojo v1.9 (v1.8 and v1.7 with AMD should also work perfectly)&lt;/li&gt;
&lt;li&gt;I use WebStorm as my JavaScript IDE. It has excellent contextual support, including for Dojo. However, it needs the Dojo sources to sit at a stable, referenceable path that it can index. Once the indexes are built, typing a "." after an object shows the list of methods and variables belonging to that object. This is extremely useful for fast development&lt;/li&gt;
&lt;li&gt;Dojo builds are slow. A typical build, from source download to unzip to compile to building the WAR, can take anywhere from 5 to 15 minutes. This can be painful and needs to be made faster&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Now, here is the how…&lt;/p&gt;
&lt;h4 id="task-1-installing-dojo-in-maven-repository-and-unpack-task"&gt;Task 1: Installing Dojo in Maven Repository and Unpack Task&lt;/h4&gt;
&lt;p&gt;This is no different from Steps 1 &amp;amp; 2 in the Mahieu blog. The unzipped sources are placed in src/main/js of my Maven hierarchy. I don't rename this directory.&lt;/p&gt;
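&lt;p&gt;For readers who have not seen those steps, here is a minimal sketch of what they amount to: the Dojo source ZIP is installed once into the local repository with install:install-file, and the maven-dependency-plugin then unpacks it. The coordinates (org.dojotoolkit / dojo-src), the execution id, the version number and the js-dir property are placeholders chosen for illustration and may not match the ones used in the Mahieu blog -&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&amp;lt;!-- One-time install of the downloaded ZIP (placeholder coordinates, example version):
     mvn install:install-file -Dfile=dojo-release-1.9.1-src.zip -DgroupId=org.dojotoolkit
         -DartifactId=dojo-src -Dversion=1.9.1 -Dpackaging=zip --&amp;gt;
&amp;lt;plugin&amp;gt;
    &amp;lt;artifactId&amp;gt;maven-dependency-plugin&amp;lt;/artifactId&amp;gt;
    &amp;lt;executions&amp;gt;
        &amp;lt;execution&amp;gt;
            &amp;lt;id&amp;gt;unpack-dojo&amp;lt;/id&amp;gt;
            &amp;lt;phase&amp;gt;generate-sources&amp;lt;/phase&amp;gt;
            &amp;lt;goals&amp;gt;
                &amp;lt;goal&amp;gt;unpack&amp;lt;/goal&amp;gt;
            &amp;lt;/goals&amp;gt;
            &amp;lt;configuration&amp;gt;
                &amp;lt;artifactItems&amp;gt;
                    &amp;lt;artifactItem&amp;gt;
                        &amp;lt;!-- must match the coordinates used at install time --&amp;gt;
                        &amp;lt;groupId&amp;gt;org.dojotoolkit&amp;lt;/groupId&amp;gt;
                        &amp;lt;artifactId&amp;gt;dojo-src&amp;lt;/artifactId&amp;gt;
                        &amp;lt;version&amp;gt;${dojo.version}&amp;lt;/version&amp;gt;
                        &amp;lt;type&amp;gt;zip&amp;lt;/type&amp;gt;
                    &amp;lt;/artifactItem&amp;gt;
                &amp;lt;/artifactItems&amp;gt;
                &amp;lt;!-- the top-level folder of the ZIP is created under this directory --&amp;gt;
                &amp;lt;outputDirectory&amp;gt;${js-dir}&amp;lt;/outputDirectory&amp;gt;
            &amp;lt;/configuration&amp;gt;
        &amp;lt;/execution&amp;gt;
    &amp;lt;/executions&amp;gt;
&amp;lt;/plugin&amp;gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;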
&lt;h4 id="task-2-move-dojo-sources"&gt;Task 2: Move Dojo sources&lt;/h4&gt;
&lt;p&gt;The unpack task unzips the Dojo sources into the "src/main/js/dojo-release-${dojo.version}-src" directory. This is okay but not good for repeated builds. I would like a structure like the one shown in the picture below - all my JS libraries directly under src/main/js. &lt;/p&gt;
&lt;p&gt;&lt;img alt="image" src="https://raw.github.com/bharath12345/bharath12345.github.io/master/images/dojo%20blog/dojo%20blog%20structure.png"&gt; &lt;/p&gt;
&lt;p&gt;This structure helps in one major way - it lets my WebStorm IDE index the JS. The Dojo JS is always in "src/main/js" along with the other libraries, and WebStorm understands this very well!&lt;/p&gt;
&lt;p&gt;I use antrun for its ability to run &lt;strong&gt;parallel copy tasks&lt;/strong&gt; - parallelism helps in making the build much faster. And I &lt;strong&gt;delete&lt;/strong&gt; the original unzipped directory at the end.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="nt"&gt;&amp;lt;plugin&amp;gt;&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nt"&gt;&amp;lt;artifactId&amp;gt;&lt;/span&gt;maven-antrun-plugin&lt;span class="nt"&gt;&amp;lt;/artifactId&amp;gt;&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nt"&gt;&amp;lt;executions&amp;gt;&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="nt"&gt;&amp;lt;execution&amp;gt;&lt;/span&gt;
&lt;span class="w"&gt;            &lt;/span&gt;&lt;span class="nt"&gt;&amp;lt;id&amp;gt;&lt;/span&gt;Copy&lt;span class="w"&gt; &lt;/span&gt;Dojo&lt;span class="nt"&gt;&amp;lt;/id&amp;gt;&lt;/span&gt;
&lt;span class="w"&gt;            &lt;/span&gt;&lt;span class="nt"&gt;&amp;lt;configuration&amp;gt;&lt;/span&gt;
&lt;span class="w"&gt;                &lt;/span&gt;&lt;span class="nt"&gt;&amp;lt;tasks&amp;gt;&lt;/span&gt;
&lt;span class="w"&gt;                    &lt;/span&gt;&lt;span class="nt"&gt;&amp;lt;parallel&amp;gt;&lt;/span&gt;
&lt;span class="w"&gt;                        &lt;/span&gt;&lt;span class="nt"&gt;&amp;lt;copy&lt;/span&gt; &lt;span class="na"&gt;todir=&lt;/span&gt;&lt;span class="s"&gt;&amp;quot;&lt;/span&gt;&lt;span class="cp"&gt;${&lt;/span&gt;&lt;span class="n"&gt;js&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="nb"&gt;dir&lt;/span&gt;&lt;span class="cp"&gt;}&lt;/span&gt;&lt;span class="s"&gt;/&amp;quot;&lt;/span&gt; &lt;span class="na"&gt;failonerror=&lt;/span&gt;&lt;span class="s"&gt;&amp;quot;false&amp;quot;&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&lt;/span&gt;
&lt;span class="w"&gt;                            &lt;/span&gt;&lt;span class="nt"&gt;&amp;lt;fileset&lt;/span&gt; &lt;span class="na"&gt;dir=&lt;/span&gt;&lt;span class="s"&gt;&amp;quot;&lt;/span&gt;&lt;span class="cp"&gt;${&lt;/span&gt;&lt;span class="n"&gt;dojoSrc&lt;/span&gt;&lt;span class="cp"&gt;}&lt;/span&gt;&lt;span class="s"&gt;&amp;quot;&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&lt;/span&gt;
&lt;span class="w"&gt;                                &lt;/span&gt;&lt;span class="nt"&gt;&amp;lt;include&lt;/span&gt; &lt;span class="na"&gt;name=&lt;/span&gt;&lt;span class="s"&gt;&amp;quot;dijit/&amp;quot;/&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&lt;/span&gt;
&lt;span class="w"&gt;                           &lt;/span&gt;&lt;span class="nt"&gt;&amp;lt;/fileset&amp;gt;&lt;/span&gt;
&lt;span class="w"&gt;                       &lt;/span&gt;&lt;span class="nt"&gt;&amp;lt;/copy&amp;gt;&lt;/span&gt;
&lt;span class="w"&gt;                       &lt;/span&gt;&lt;span class="nt"&gt;&amp;lt;copy&lt;/span&gt; &lt;span class="na"&gt;todir=&lt;/span&gt;&lt;span class="s"&gt;&amp;quot;&lt;/span&gt;&lt;span class="cp"&gt;${&lt;/span&gt;&lt;span class="n"&gt;js&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="nb"&gt;dir&lt;/span&gt;&lt;span class="cp"&gt;}&lt;/span&gt;&lt;span class="s"&gt;/&amp;quot;&lt;/span&gt; &lt;span class="na"&gt;failonerror=&lt;/span&gt;&lt;span class="s"&gt;&amp;quot;false&amp;quot;&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&lt;/span&gt;
&lt;span class="w"&gt;                            &lt;/span&gt;&lt;span class="nt"&gt;&amp;lt;fileset&lt;/span&gt; &lt;span class="na"&gt;dir=&lt;/span&gt;&lt;span class="s"&gt;&amp;quot;&lt;/span&gt;&lt;span class="cp"&gt;${&lt;/span&gt;&lt;span class="n"&gt;dojoSrc&lt;/span&gt;&lt;span class="cp"&gt;}&lt;/span&gt;&lt;span class="s"&gt;&amp;quot;&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&lt;/span&gt;
&lt;span class="w"&gt;                                &lt;/span&gt;&lt;span class="nt"&gt;&amp;lt;include&lt;/span&gt; &lt;span class="na"&gt;name=&lt;/span&gt;&lt;span class="s"&gt;&amp;quot;dojox/&amp;quot;/&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&lt;/span&gt;
&lt;span class="w"&gt;                           &lt;/span&gt;&lt;span class="nt"&gt;&amp;lt;/fileset&amp;gt;&lt;/span&gt;
&lt;span class="w"&gt;                       &lt;/span&gt;&lt;span class="nt"&gt;&amp;lt;/copy&amp;gt;&lt;/span&gt;
&lt;span class="w"&gt;                       &lt;/span&gt;&lt;span class="nt"&gt;&amp;lt;copy&lt;/span&gt; &lt;span class="na"&gt;todir=&lt;/span&gt;&lt;span class="s"&gt;&amp;quot;&lt;/span&gt;&lt;span class="cp"&gt;${&lt;/span&gt;&lt;span class="n"&gt;js&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="nb"&gt;dir&lt;/span&gt;&lt;span class="cp"&gt;}&lt;/span&gt;&lt;span class="s"&gt;/&amp;quot;&lt;/span&gt; &lt;span class="na"&gt;failonerror=&lt;/span&gt;&lt;span class="s"&gt;&amp;quot;false&amp;quot;&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&lt;/span&gt;
&lt;span class="w"&gt;                            &lt;/span&gt;&lt;span class="nt"&gt;&amp;lt;fileset&lt;/span&gt; &lt;span class="na"&gt;dir=&lt;/span&gt;&lt;span class="s"&gt;&amp;quot;&lt;/span&gt;&lt;span class="cp"&gt;${&lt;/span&gt;&lt;span class="n"&gt;dojoSrc&lt;/span&gt;&lt;span class="cp"&gt;}&lt;/span&gt;&lt;span class="s"&gt;&amp;quot;&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&lt;/span&gt;
&lt;span class="w"&gt;                                &lt;/span&gt;&lt;span class="nt"&gt;&amp;lt;include&lt;/span&gt; &lt;span class="na"&gt;name=&lt;/span&gt;&lt;span class="s"&gt;&amp;quot;dojo/&amp;quot;/&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&lt;/span&gt;
&lt;span class="w"&gt;                           &lt;/span&gt;&lt;span class="nt"&gt;&amp;lt;/fileset&amp;gt;&lt;/span&gt;
&lt;span class="w"&gt;                       &lt;/span&gt;&lt;span class="nt"&gt;&amp;lt;/copy&amp;gt;&lt;/span&gt;
&lt;span class="w"&gt;                       &lt;/span&gt;&lt;span class="nt"&gt;&amp;lt;copy&lt;/span&gt; &lt;span class="na"&gt;todir=&lt;/span&gt;&lt;span class="s"&gt;&amp;quot;&lt;/span&gt;&lt;span class="cp"&gt;${&lt;/span&gt;&lt;span class="n"&gt;js&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="nb"&gt;dir&lt;/span&gt;&lt;span class="cp"&gt;}&lt;/span&gt;&lt;span class="s"&gt;/&amp;quot;&lt;/span&gt; &lt;span class="na"&gt;failonerror=&lt;/span&gt;&lt;span class="s"&gt;&amp;quot;false&amp;quot;&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&lt;/span&gt;
&lt;span class="w"&gt;                            &lt;/span&gt;&lt;span class="nt"&gt;&amp;lt;fileset&lt;/span&gt; &lt;span class="na"&gt;dir=&lt;/span&gt;&lt;span class="s"&gt;&amp;quot;&lt;/span&gt;&lt;span class="cp"&gt;${&lt;/span&gt;&lt;span class="n"&gt;dojoSrc&lt;/span&gt;&lt;span class="cp"&gt;}&lt;/span&gt;&lt;span class="s"&gt;&amp;quot;&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&lt;/span&gt;
&lt;span class="w"&gt;                                &lt;/span&gt;&lt;span class="nt"&gt;&amp;lt;include&lt;/span&gt; &lt;span class="na"&gt;name=&lt;/span&gt;&lt;span class="s"&gt;&amp;quot;util/&amp;quot;/&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&lt;/span&gt;
&lt;span class="w"&gt;                           &lt;/span&gt;&lt;span class="nt"&gt;&amp;lt;/fileset&amp;gt;&lt;/span&gt;
&lt;span class="w"&gt;                       &lt;/span&gt;&lt;span class="nt"&gt;&amp;lt;/copy&amp;gt;&lt;/span&gt;
&lt;span class="w"&gt;                   &lt;/span&gt;&lt;span class="nt"&gt;&amp;lt;/parallel&amp;gt;&lt;/span&gt;
&lt;span class="w"&gt;                   &lt;/span&gt;&lt;span class="nt"&gt;&amp;lt;delete&lt;/span&gt; &lt;span class="na"&gt;dir=&lt;/span&gt;&lt;span class="s"&gt;&amp;quot;&lt;/span&gt;&lt;span class="cp"&gt;${&lt;/span&gt;&lt;span class="n"&gt;dojoSrc&lt;/span&gt;&lt;span class="cp"&gt;}&lt;/span&gt;&lt;span class="s"&gt;&amp;quot;&lt;/span&gt; &lt;span class="na"&gt;quiet=&lt;/span&gt;&lt;span class="s"&gt;&amp;quot;true&amp;quot;/&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&lt;/span&gt;
&lt;span class="w"&gt;               &lt;/span&gt;&lt;span class="nt"&gt;&amp;lt;/tasks&amp;gt;&lt;/span&gt;
&lt;span class="w"&gt;            &lt;/span&gt;&lt;span class="nt"&gt;&amp;lt;/configuration&amp;gt;&lt;/span&gt;
&lt;span class="w"&gt;            &lt;/span&gt;&lt;span class="nt"&gt;&amp;lt;phase&amp;gt;&lt;/span&gt;process-sources&lt;span class="nt"&gt;&amp;lt;/phase&amp;gt;&lt;/span&gt;
&lt;span class="w"&gt;            &lt;/span&gt;&lt;span class="nt"&gt;&amp;lt;goals&amp;gt;&lt;/span&gt;
&lt;span class="w"&gt;                &lt;/span&gt;&lt;span class="nt"&gt;&amp;lt;goal&amp;gt;&lt;/span&gt;run&lt;span class="nt"&gt;&amp;lt;/goal&amp;gt;&lt;/span&gt;
&lt;span class="w"&gt;            &lt;/span&gt;&lt;span class="nt"&gt;&amp;lt;/goals&amp;gt;&lt;/span&gt;
&lt;span class="w"&gt;       &lt;/span&gt;&lt;span class="nt"&gt;&amp;lt;/execution&amp;gt;&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nt"&gt;&amp;lt;/executions&amp;gt;&lt;/span&gt;
&lt;span class="nt"&gt;&amp;lt;/plugin&amp;gt;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
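&lt;p&gt;The stub above relies on two properties, js-dir and dojoSrc. A minimal sketch of how they could be defined follows; the values are assumptions derived from the directory names mentioned in this post, and the version number is only an example -&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&amp;lt;properties&amp;gt;
    &amp;lt;!-- example version; use the release that was installed in Task 1 --&amp;gt;
    &amp;lt;dojo.version&amp;gt;1.9.1&amp;lt;/dojo.version&amp;gt;
    &amp;lt;!-- all JS libraries live here so that the IDE can index them --&amp;gt;
    &amp;lt;js-dir&amp;gt;${basedir}/src/main/js&amp;lt;/js-dir&amp;gt;
    &amp;lt;!-- the directory created by the unpack task --&amp;gt;
    &amp;lt;dojoSrc&amp;gt;${js-dir}/dojo-release-${dojo.version}-src&amp;lt;/dojoSrc&amp;gt;
&amp;lt;/properties&amp;gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;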

&lt;h4 id="task-3-build-dojo"&gt;Task 3: Build Dojo&lt;/h4&gt;
&lt;p&gt;For this, I again use the antrun plugin. This build leads to the creation of dojo/dijit/dojox directories under src/main/js.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="nt"&gt;&amp;lt;plugin&amp;gt;&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nt"&gt;&amp;lt;artifactId&amp;gt;&lt;/span&gt;maven-antrun-plugin&lt;span class="nt"&gt;&amp;lt;/artifactId&amp;gt;&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nt"&gt;&amp;lt;executions&amp;gt;&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="nt"&gt;&amp;lt;execution&amp;gt;&lt;/span&gt;
&lt;span class="w"&gt;            &lt;/span&gt;&lt;span class="nt"&gt;&amp;lt;id&amp;gt;&lt;/span&gt;AppsOne&lt;span class="w"&gt; &lt;/span&gt;dojo&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="cp"&gt;${&lt;/span&gt;&lt;span class="n"&gt;dojo&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;version&lt;/span&gt;&lt;span class="cp"&gt;}&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;Custom&lt;span class="w"&gt; &lt;/span&gt;Build&lt;span class="nt"&gt;&amp;lt;/id&amp;gt;&lt;/span&gt;
&lt;span class="w"&gt;            &lt;/span&gt;&lt;span class="nt"&gt;&amp;lt;phase&amp;gt;&lt;/span&gt;compile&lt;span class="nt"&gt;&amp;lt;/phase&amp;gt;&lt;/span&gt;
&lt;span class="w"&gt;            &lt;/span&gt;&lt;span class="nt"&gt;&amp;lt;configuration&amp;gt;&lt;/span&gt;
&lt;span class="w"&gt;                &lt;/span&gt;&lt;span class="nt"&gt;&amp;lt;tasks&amp;gt;&lt;/span&gt;
&lt;span class="w"&gt;                    &lt;/span&gt;&lt;span class="nt"&gt;&amp;lt;parallel&amp;gt;&lt;/span&gt;
&lt;span class="w"&gt;                        &lt;/span&gt;&lt;span class="nt"&gt;&amp;lt;java&lt;/span&gt; &lt;span class="na"&gt;classname=&lt;/span&gt;&lt;span class="s"&gt;&amp;quot;org.mozilla.javascript.tools.shell.Main&amp;quot;&lt;/span&gt;
                    &lt;span class="na"&gt;fork=&lt;/span&gt;&lt;span class="s"&gt;&amp;quot;true&amp;quot;&lt;/span&gt; &lt;span class="na"&gt;maxmemory=&lt;/span&gt;&lt;span class="s"&gt;&amp;quot;512m&amp;quot;&lt;/span&gt; &lt;span class="na"&gt;failonerror=&lt;/span&gt;&lt;span class="s"&gt;&amp;quot;false&amp;quot;&lt;/span&gt;
                    &lt;span class="na"&gt;classpath=&lt;/span&gt;&lt;span class="s"&gt;&amp;quot;&lt;/span&gt;&lt;span class="cp"&gt;${&lt;/span&gt;&lt;span class="n"&gt;shrinksafe&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="nb"&gt;dir&lt;/span&gt;&lt;span class="cp"&gt;}&lt;/span&gt;&lt;span class="s"&gt;/js.jar&lt;/span&gt;&lt;span class="cp"&gt;${&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;separator&lt;/span&gt;&lt;span class="cp"&gt;}${&lt;/span&gt;&lt;span class="n"&gt;closure&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="nb"&gt;dir&lt;/span&gt;&lt;span class="cp"&gt;}&lt;/span&gt;&lt;span class="s"&gt;/compiler.jar&lt;/span&gt;&lt;span class="cp"&gt;${&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;separator&lt;/span&gt;&lt;span class="cp"&gt;}${&lt;/span&gt;&lt;span class="n"&gt;shrinksafe&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="nb"&gt;dir&lt;/span&gt;&lt;span class="cp"&gt;}&lt;/span&gt;&lt;span class="s"&gt;/shrinksafe.jar&amp;quot;&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&lt;/span&gt;
&lt;span class="w"&gt;                            &lt;/span&gt;&lt;span class="nt"&gt;&amp;lt;arg&lt;/span&gt; &lt;span class="na"&gt;value=&lt;/span&gt;&lt;span class="s"&gt;&amp;quot;&lt;/span&gt;&lt;span class="cp"&gt;${&lt;/span&gt;&lt;span class="n"&gt;js&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="nb"&gt;dir&lt;/span&gt;&lt;span class="cp"&gt;}&lt;/span&gt;&lt;span class="s"&gt;/dojo/dojo.js&amp;quot;/&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&lt;/span&gt;
&lt;span class="w"&gt;                            &lt;/span&gt;&lt;span class="nt"&gt;&amp;lt;arg&lt;/span&gt; &lt;span class="na"&gt;value=&lt;/span&gt;&lt;span class="s"&gt;&amp;quot;baseUrl=&lt;/span&gt;&lt;span class="cp"&gt;${&lt;/span&gt;&lt;span class="n"&gt;js&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="nb"&gt;dir&lt;/span&gt;&lt;span class="cp"&gt;}&lt;/span&gt;&lt;span class="s"&gt;/dojo&amp;quot;/&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&lt;/span&gt;
&lt;span class="w"&gt;                            &lt;/span&gt;&lt;span class="nt"&gt;&amp;lt;arg&lt;/span&gt; &lt;span class="na"&gt;value=&lt;/span&gt;&lt;span class="s"&gt;&amp;quot;load=build&amp;quot;/&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&lt;/span&gt;
&lt;span class="w"&gt;                            &lt;/span&gt;&lt;span class="nt"&gt;&amp;lt;arg&lt;/span&gt; &lt;span class="na"&gt;line=&lt;/span&gt;&lt;span class="s"&gt;&amp;quot;--profile &lt;/span&gt;&lt;span class="cp"&gt;${&lt;/span&gt;&lt;span class="n"&gt;basedir&lt;/span&gt;&lt;span class="cp"&gt;}&lt;/span&gt;&lt;span class="s"&gt;/dashboard.profile.js&amp;quot;/&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&lt;/span&gt;
&lt;span class="w"&gt;                            &lt;/span&gt;&lt;span class="nt"&gt;&amp;lt;arg&lt;/span&gt; &lt;span class="na"&gt;value=&lt;/span&gt;&lt;span class="s"&gt;&amp;quot;--release&amp;quot;/&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&lt;/span&gt;
&lt;span class="w"&gt;                        &lt;/span&gt;&lt;span class="nt"&gt;&amp;lt;/java&amp;gt;&lt;/span&gt;
&lt;span class="w"&gt;                    &lt;/span&gt;&lt;span class="nt"&gt;&amp;lt;/parallel&amp;gt;&lt;/span&gt;
&lt;span class="w"&gt;                &lt;/span&gt;&lt;span class="nt"&gt;&amp;lt;/tasks&amp;gt;&lt;/span&gt;
&lt;span class="w"&gt;            &lt;/span&gt;&lt;span class="nt"&gt;&amp;lt;/configuration&amp;gt;&lt;/span&gt;
&lt;span class="w"&gt;            &lt;/span&gt;&lt;span class="nt"&gt;&amp;lt;goals&amp;gt;&lt;/span&gt;
&lt;span class="w"&gt;                &lt;/span&gt;&lt;span class="nt"&gt;&amp;lt;goal&amp;gt;&lt;/span&gt;run&lt;span class="nt"&gt;&amp;lt;/goal&amp;gt;&lt;/span&gt;
&lt;span class="w"&gt;            &lt;/span&gt;&lt;span class="nt"&gt;&amp;lt;/goals&amp;gt;&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="nt"&gt;&amp;lt;/execution&amp;gt;&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nt"&gt;&amp;lt;/executions&amp;gt;&lt;/span&gt;
&lt;span class="nt"&gt;&amp;lt;/plugin&amp;gt;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
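&lt;p&gt;The shrinksafe-dir and closure-dir properties in the classpath point at JARs that ship with the Dojo source release under util, which is why util is copied along in Task 2. Assuming the usual layout of the source release, they could be defined roughly as below; please verify both paths against your own unpacked copy -&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&amp;lt;properties&amp;gt;
    &amp;lt;!-- assumed location of js.jar and shrinksafe.jar --&amp;gt;
    &amp;lt;shrinksafe-dir&amp;gt;${js-dir}/util/shrinksafe&amp;lt;/shrinksafe-dir&amp;gt;
    &amp;lt;!-- assumed location of the Closure compiler.jar --&amp;gt;
    &amp;lt;closure-dir&amp;gt;${js-dir}/util/closureCompiler&amp;lt;/closure-dir&amp;gt;
&amp;lt;/properties&amp;gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;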

&lt;h4 id="task-4-the-dojo-profile"&gt;Task 4: The Dojo Profile&lt;/h4&gt;
&lt;p&gt;&lt;a href="https://github.com/bharath12345/uiDashboard/blob/master/uiJS/dashboard.profile.js"&gt;This is the link&lt;/a&gt; to the profile script I use. It is heavily commented to help the reader follow along. The Dojo build can be tuned in many ways through options in the profile. My profile specifies the following -&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;I name my JS project as "Dashboard" - so I want the built artifacts to be in the target/dashboard directory&lt;/li&gt;
&lt;li&gt;Use the closure compiler&lt;/li&gt;
&lt;li&gt;I use both dgrid and gridx in my project along with their dependencies (xstyle, dbind, put-selector) - so those have to be included&lt;/li&gt;
&lt;li&gt;Include my project's JS - which lives in the "dashboard" directory and is AMD-compliant&lt;/li&gt;
&lt;li&gt;Finally I want to see less verbose prints on my console - so I set the logging level to SEVERE&lt;/li&gt;
&lt;/ul&gt;
&lt;h4 id="task-5-clean-the-uncompressed-javascript"&gt;Task 5: Clean the Uncompressed JavaScript&lt;/h4&gt;
&lt;p&gt;The Dojo build generates minified JS. In the process it retains the original JS files but renames them to have "uncompressed" in their filenames. This is useful for debugging purposes. But surely, we don't want these uncompressed JS files to be part of the built WAR. They increase the size of the WAR (at least doubling it - taking it well above 50MB!). So, a task to remove these uncompressed JS files from the target directory is required. This Maven stub does just that -&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;&amp;lt;plugin&amp;gt;&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nt"&gt;&amp;lt;artifactId&amp;gt;&lt;/span&gt;maven-clean-plugin&lt;span class="nt"&gt;&amp;lt;/artifactId&amp;gt;&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nt"&gt;&amp;lt;version&amp;gt;&lt;/span&gt;2.5&lt;span class="nt"&gt;&amp;lt;/version&amp;gt;&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nt"&gt;&amp;lt;executions&amp;gt;&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="nt"&gt;&amp;lt;execution&amp;gt;&lt;/span&gt;
&lt;span class="w"&gt;            &lt;/span&gt;&lt;span class="nt"&gt;&amp;lt;id&amp;gt;&lt;/span&gt;clean-js&lt;span class="nt"&gt;&amp;lt;/id&amp;gt;&lt;/span&gt;
&lt;span class="w"&gt;            &lt;/span&gt;&lt;span class="nt"&gt;&amp;lt;phase&amp;gt;&lt;/span&gt;prepare-package&lt;span class="nt"&gt;&amp;lt;/phase&amp;gt;&lt;/span&gt;
&lt;span class="w"&gt;            &lt;/span&gt;&lt;span class="nt"&gt;&amp;lt;goals&amp;gt;&lt;/span&gt;
&lt;span class="w"&gt;                &lt;/span&gt;&lt;span class="nt"&gt;&amp;lt;goal&amp;gt;&lt;/span&gt;clean&lt;span class="nt"&gt;&amp;lt;/goal&amp;gt;&lt;/span&gt;
&lt;span class="w"&gt;            &lt;/span&gt;&lt;span class="nt"&gt;&amp;lt;/goals&amp;gt;&lt;/span&gt;
&lt;span class="w"&gt;            &lt;/span&gt;&lt;span class="nt"&gt;&amp;lt;configuration&amp;gt;&lt;/span&gt;
&lt;span class="w"&gt;            &lt;/span&gt;&lt;span class="nt"&gt;&amp;lt;filesets&amp;gt;&lt;/span&gt;
&lt;span class="w"&gt;                &lt;/span&gt;&lt;span class="nt"&gt;&amp;lt;fileset&amp;gt;&lt;/span&gt;
&lt;span class="w"&gt;                    &lt;/span&gt;&lt;span class="nt"&gt;&amp;lt;directory&amp;gt;&lt;/span&gt;&lt;span class="cp"&gt;${&lt;/span&gt;&lt;span class="n"&gt;release&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="nb"&gt;dir&lt;/span&gt;&lt;span class="cp"&gt;}&lt;/span&gt;/dojo&lt;span class="nt"&gt;&amp;lt;/directory&amp;gt;&lt;/span&gt;
&lt;span class="w"&gt;                    &lt;/span&gt;&lt;span class="nt"&gt;&amp;lt;includes&amp;gt;&lt;/span&gt;
&lt;span class="w"&gt;                        &lt;/span&gt;&lt;span class="nt"&gt;&amp;lt;include&amp;gt;&lt;/span&gt;**/*uncompressed.js&lt;span class="nt"&gt;&amp;lt;/include&amp;gt;&lt;/span&gt;
&lt;span class="w"&gt;                    &lt;/span&gt;&lt;span class="nt"&gt;&amp;lt;/includes&amp;gt;&lt;/span&gt;
&lt;span class="w"&gt;                    &lt;/span&gt;&lt;span class="nt"&gt;&amp;lt;followSymlinks&amp;gt;&lt;/span&gt;true&lt;span class="nt"&gt;&amp;lt;/followSymlinks&amp;gt;&lt;/span&gt;
&lt;span class="w"&gt;               &lt;/span&gt;&lt;span class="nt"&gt;&amp;lt;/fileset&amp;gt;&lt;/span&gt;
&lt;span class="w"&gt;            &lt;/span&gt;&lt;span class="nt"&gt;&amp;lt;/configuration&amp;gt;&lt;/span&gt;
&lt;span class="w"&gt;         &lt;/span&gt;&lt;span class="nt"&gt;&amp;lt;/execution&amp;gt;&lt;/span&gt;
&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="nt"&gt;&amp;lt;/executions&amp;gt;&lt;/span&gt;
&lt;span class="nt"&gt;&amp;lt;/plugin&amp;gt;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;h4 id="task-6-copy-other-javascript-libraries"&gt;Task 6: Copy Other JavaScript libraries&lt;/h4&gt;
&lt;p&gt;By now, the "target/dashboard/js" has all the dojo sources along with project specific built in it. The next task is to copy other JS library dependencies. In my project, I typically use D3, jQuery and jsPlumb. So here is I copy them into this directory into maven's target by stub's like these -&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="nt"&gt;&amp;lt;plugin&amp;gt;&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nt"&gt;&amp;lt;artifactId&amp;gt;&lt;/span&gt;maven-resources-plugin&lt;span class="nt"&gt;&amp;lt;/artifactId&amp;gt;&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nt"&gt;&amp;lt;version&amp;gt;&lt;/span&gt;2.6&lt;span class="nt"&gt;&amp;lt;/version&amp;gt;&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nt"&gt;&amp;lt;executions&amp;gt;&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="nt"&gt;&amp;lt;execution&amp;gt;&lt;/span&gt;
&lt;span class="w"&gt;            &lt;/span&gt;&lt;span class="nt"&gt;&amp;lt;id&amp;gt;&lt;/span&gt;copy-d3&lt;span class="nt"&gt;&amp;lt;/id&amp;gt;&lt;/span&gt;
&lt;span class="w"&gt;            &lt;/span&gt;&lt;span class="nt"&gt;&amp;lt;phase&amp;gt;&lt;/span&gt;process-resources&lt;span class="nt"&gt;&amp;lt;/phase&amp;gt;&lt;/span&gt;
&lt;span class="w"&gt;            &lt;/span&gt;&lt;span class="nt"&gt;&amp;lt;goals&amp;gt;&lt;/span&gt;
&lt;span class="w"&gt;                &lt;/span&gt;&lt;span class="nt"&gt;&amp;lt;goal&amp;gt;&lt;/span&gt;copy-resources&lt;span class="nt"&gt;&amp;lt;/goal&amp;gt;&lt;/span&gt;
&lt;span class="w"&gt;            &lt;/span&gt;&lt;span class="nt"&gt;&amp;lt;/goals&amp;gt;&lt;/span&gt;
&lt;span class="w"&gt;            &lt;/span&gt;&lt;span class="nt"&gt;&amp;lt;configuration&amp;gt;&lt;/span&gt;
&lt;span class="w"&gt;                &lt;/span&gt;&lt;span class="nt"&gt;&amp;lt;outputDirectory&amp;gt;&lt;/span&gt;&lt;span class="cp"&gt;${&lt;/span&gt;&lt;span class="n"&gt;gui&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;target&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;gui&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;location&lt;/span&gt;&lt;span class="cp"&gt;}&lt;/span&gt;/js/d3&lt;span class="nt"&gt;&amp;lt;/outputDirectory&amp;gt;&lt;/span&gt;
&lt;span class="w"&gt;               &lt;/span&gt;&lt;span class="nt"&gt;&amp;lt;resources&amp;gt;&lt;/span&gt;
&lt;span class="w"&gt;                    &lt;/span&gt;&lt;span class="nt"&gt;&amp;lt;resource&amp;gt;&lt;/span&gt;
&lt;span class="w"&gt;                        &lt;/span&gt;&lt;span class="nt"&gt;&amp;lt;directory&amp;gt;&lt;/span&gt;&lt;span class="cp"&gt;${&lt;/span&gt;&lt;span class="n"&gt;js&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="nb"&gt;dir&lt;/span&gt;&lt;span class="cp"&gt;}&lt;/span&gt;/d3&lt;span class="nt"&gt;&amp;lt;/directory&amp;gt;&lt;/span&gt;
&lt;span class="w"&gt;                   &lt;/span&gt;&lt;span class="nt"&gt;&amp;lt;/resource&amp;gt;&lt;/span&gt;
&lt;span class="w"&gt;               &lt;/span&gt;&lt;span class="nt"&gt;&amp;lt;/resources&amp;gt;&lt;/span&gt;
&lt;span class="w"&gt;            &lt;/span&gt;&lt;span class="nt"&gt;&amp;lt;/configuration&amp;gt;&lt;/span&gt;
&lt;span class="w"&gt;          &lt;/span&gt;&lt;span class="nt"&gt;&amp;lt;/execution&amp;gt;&lt;/span&gt;
&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="nt"&gt;&amp;lt;/executions&amp;gt;&lt;/span&gt;
&lt;span class="nt"&gt;&amp;lt;/plugin&amp;gt;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
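&lt;p&gt;Each additional library gets its own execution inside the same plugin declaration. As a sketch, an execution for jQuery could sit next to copy-d3 like this; the execution id and the jquery directory name are assumptions about the source layout -&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&amp;lt;execution&amp;gt;
    &amp;lt;id&amp;gt;copy-jquery&amp;lt;/id&amp;gt;
    &amp;lt;phase&amp;gt;process-resources&amp;lt;/phase&amp;gt;
    &amp;lt;goals&amp;gt;
        &amp;lt;goal&amp;gt;copy-resources&amp;lt;/goal&amp;gt;
    &amp;lt;/goals&amp;gt;
    &amp;lt;configuration&amp;gt;
        &amp;lt;!-- same target root as copy-d3, one sub-directory per library --&amp;gt;
        &amp;lt;outputDirectory&amp;gt;${gui.target.gui.location}/js/jquery&amp;lt;/outputDirectory&amp;gt;
        &amp;lt;resources&amp;gt;
            &amp;lt;resource&amp;gt;
                &amp;lt;!-- assumed location of the library under src/main/js --&amp;gt;
                &amp;lt;directory&amp;gt;${js-dir}/jquery&amp;lt;/directory&amp;gt;
            &amp;lt;/resource&amp;gt;
        &amp;lt;/resources&amp;gt;
    &amp;lt;/configuration&amp;gt;
&amp;lt;/execution&amp;gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;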

&lt;h4 id="task-7-a-fast-build-profile"&gt;Task 7: A fast build profile&lt;/h4&gt;
&lt;p&gt;The Dojo ZIP is upwards of 35MB in size with thousands of files. Downloading, unarchiving and moving it around is a heavy-duty operation that is painfully slow. This makes a Maven profile for faster builds absolutely necessary. This profile does the following -&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Assumes the presence of unarchived dojo bundle in the source tree under "src/main/js"&lt;/li&gt;
&lt;li&gt;It thus does none of the unarchiving or file movements and starts off directly with a closure build&lt;/li&gt;
&lt;li&gt;Does not delete the dojo/dijit/dojox directories from under src/main/js after the build is complete&lt;/li&gt;
&lt;/ul&gt;
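&lt;p&gt;One way to arrange this, shown only as a sketch, is to keep the slow steps in a profile that is active by default and to add an empty profile for the fast path. Maven switches off an activeByDefault profile as soon as another profile from the same POM is selected on the command line. The profile ids "full" and "fast" are hypothetical names, and the layout in my actual pom.xml may differ -&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&amp;lt;profiles&amp;gt;
    &amp;lt;!-- full build: active unless another profile is selected --&amp;gt;
    &amp;lt;profile&amp;gt;
        &amp;lt;id&amp;gt;full&amp;lt;/id&amp;gt;
        &amp;lt;activation&amp;gt;
            &amp;lt;activeByDefault&amp;gt;true&amp;lt;/activeByDefault&amp;gt;
        &amp;lt;/activation&amp;gt;
        &amp;lt;build&amp;gt;
            &amp;lt;plugins&amp;gt;
                &amp;lt;!-- the unpack (Task 1) and move (Task 2) executions go here,
                     together with any clean-up of src/main/js after the build --&amp;gt;
            &amp;lt;/plugins&amp;gt;
        &amp;lt;/build&amp;gt;
    &amp;lt;/profile&amp;gt;
    &amp;lt;!-- fast build: selected with mvn -Pfast package --&amp;gt;
    &amp;lt;profile&amp;gt;
        &amp;lt;id&amp;gt;fast&amp;lt;/id&amp;gt;
        &amp;lt;!-- deliberately empty: only the executions declared in the main build section run,
             that is the Dojo build (Task 3), the clean-up (Task 5) and the copies (Task 6) --&amp;gt;
    &amp;lt;/profile&amp;gt;
&amp;lt;/profiles&amp;gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;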
&lt;p&gt;Readers can refer to this &lt;a href="https://github.com/bharath12345/uiDashboard/blob/master/uiJS/pom.xml"&gt;pom.xml&lt;/a&gt; from one of my projects on GitHub. It has all that I have described above. Ping me if you run into any issues using my code, understanding my blog or anything else. Thanks for reading!&lt;/p&gt;</content><category term="posts"/><category term="javascript"/><category term="dojo"/></entry><entry><title>Few days with Apache Cassandra</title><link href="https://bharath12345.github.io/posts/few-days-with-apache-cassandra/" rel="alternate"/><published>2013-07-11T00:00:00-04:00</published><updated>2013-07-11T00:00:00-04:00</updated><author><name>Bharadwaj</name></author><id>tag:bharath12345.github.io,2013-07-11:/posts/few-days-with-apache-cassandra/</id><summary type="html">&lt;p&gt;A few years ago I was a product developer at a big software (but non-database) company. We were writing the v2 of a new product after a fairly successful development round of v1. For everything OLTP, we used the wonderful open-source database - Postgres. But by v2, we had new, high-volume data …&lt;/p&gt;</summary><content type="html">&lt;p&gt;A few years ago I was a product developer at a big software (but non-database) company. We were writing the v2 of a new product after a fairly successful development round of v1. For everything OLTP, we used the wonderful open-source database - Postgres. But by v2, we had new, high-volume data like NetFlow coming in. This would have intensely tested Postgres's scalability and read/write performance. And we had some data warehousing and OLAP requirements too. A hard look at our queries told us that column-stores would be a great fit. Looking back, the choices for a new product that had to store and query massive data volumes boiled down to these few options -&lt;/p&gt;
&lt;div class="toc"&gt;&lt;span class="toctitle"&gt;Table of Contents&lt;/span&gt;&lt;ul&gt;
&lt;li&gt;&lt;a href="#a-simple-usecase"&gt;A Simple Usecase&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#data-volumes"&gt;Data Volumes&lt;/a&gt;&lt;ul&gt;
&lt;li&gt;&lt;a href="#fine-grained-data"&gt;Fine-grained Data&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#coarse-grained-data"&gt;Coarse-grained Data&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#adding-it-all-up"&gt;Adding it all up!&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#before-we-start-data-modeling"&gt;Before we start data modeling...&lt;/a&gt;&lt;ul&gt;
&lt;li&gt;&lt;a href="#data-access-methods-in-cassandra"&gt;Data Access methods in Cassandra&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#supercolumns"&gt;SuperColumns&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#denormalization-and-data-modeling-by-queries"&gt;Denormalization and Data Modeling by Queries&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#code-itself"&gt;Code Itself&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#data-modeling"&gt;Data Modeling&lt;/a&gt;&lt;ul&gt;
&lt;li&gt;&lt;a href="#keyspace-configuration"&gt;Keyspace Configuration&lt;/a&gt;&lt;ul&gt;
&lt;li&gt;&lt;a href="#for-jvm-method-metrics"&gt;For JVM Method metrics&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#for-jvm-wide-statistics"&gt;For JVM wide statistics&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#column-families-in-jvmmethodmetrics-keyspace"&gt;Column Families in JvmMethodMetrics KEYSPACE&lt;/a&gt;&lt;ul&gt;
&lt;li&gt;&lt;a href="#raw-trend-query-tables"&gt;Raw Trend Query Tables&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#trend-query-roll-up-tables"&gt;Trend Query Roll-up Tables&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#topn-query-tables"&gt;TopN Query Tables&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#column-families-in-jvmmetricsraw-keyspace"&gt;Column Families in JvmMetrics KEYSPACE&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#query-code"&gt;Query Code&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#readwrite-performance"&gt;Read/Write Performance&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#conclusion"&gt;Conclusion&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#reading-recommendations"&gt;Reading Recommendations&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
&lt;ul&gt;
&lt;li&gt;Throw more hardware: Tell the needy customer to invest more in hardware. But no one knew how much more hardware it would take to nail it&lt;/li&gt;
&lt;li&gt;Tune, Shard, Rebuild, Redeploy: Invest in tuning our software and database for specific queries. Shard, re-model and/or do whatever could be done by the development and implementation teams around what we had&lt;/li&gt;
&lt;li&gt;Use Oracle&lt;ul&gt;
&lt;li&gt;This did not make good business sense for a big product company - tying itself deep into Oracle&lt;/li&gt;
&lt;li&gt;The CTO and architects did not think Oracle could handle the data volumes anyway (in fact, none of the engineers who understood the problem thought it would!)&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Use column-stores like Sybase IQ or Vertica&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The fact was, there were no open-source, reliable, horizontally scalable column-stores or parallel DBMSs to consider.&lt;/p&gt;
&lt;p&gt;Times have improved. We now have Cassandra, HBase, Hypertable and others (MongoDB, CouchDB and the like are document stores that need less modeling; the context here is schema-full data with rich data-type support).&lt;/p&gt;
&lt;p&gt;So, I decided to try and understand Cassandra. I wanted to answer a simple question - if I were to re-live the product development scenario described above, would I choose Cassandra? In this article I talk about my experiment with Cassandra. I chose a very specific use-case to illustrate what I found - monitoring JVM metrics in a small data center.&lt;/p&gt;
&lt;h4 id="a-simple-usecase"&gt;A Simple Usecase&lt;/h4&gt;
&lt;ul&gt;
&lt;li&gt;A web company running 50 JVMs. The JVMs could be Apache Tomcat servlet containers hosting the application&lt;/li&gt;
&lt;li&gt;Each Tomcat instance hosts 50 URLs and thereby, say, 50 front-ending servlet classes, each extending HttpServlet&lt;/li&gt;
&lt;li&gt;Method metrics are collected on these servlets (through logs, bytecode instrumentation or aspects). Specifically, just 2 method-level metrics are collected - number of invocations and time spent&lt;/li&gt;
&lt;li&gt;The idea is to analyze the metrics for insights - how should the servlet servers be deployed? Are there any hotspots and, if so, where? Which URL (object) is accessed most/least? At what times? What are the trends? And so on…&lt;/li&gt;
&lt;li&gt;Along with monitoring these specific servlet methods, also keep a tab on overall application health: the number of active threads in all JVMs, various JVM memory parameters, a few MBean stats, etc.&lt;/li&gt;
&lt;li&gt;Minimum data view granularity requirements -&lt;ul&gt;
&lt;li&gt;Last 30 days  - per-minute, per-hour, per-day, per-week, per-month&lt;/li&gt;
&lt;li&gt;Last 60 days  - per-hour, per-day, per-week, per-month&lt;/li&gt;
&lt;li&gt;Last 180 days - per-day, per-week, per-month&lt;/li&gt;
&lt;li&gt;Last 360 days - per-week (52 weeks), per-month&lt;/li&gt;
&lt;li&gt;Last 720 days - per-month (24 months)&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;User primarily requires 'trend' and 'topN' charts. Examples -&lt;ul&gt;
&lt;li&gt;Chart of Top-10 most invoked servlets in last 2 months at per-hour granularity&lt;/li&gt;
&lt;li&gt;Trend of three specific servlets' response-times {max, min, avg, 1st and 3rd quartile} over the last 6 months, plotted per day&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;User also wants JVM-wide statistics like active threads, memory stats and datasource stats - all following the same granularities as above. Let's suppose that these add up to 6 separate metrics in all.&lt;/li&gt;
&lt;li&gt;From the querying perspective, let's say we have only 2 users in our IT Operations team who will be actively querying this data.&lt;/li&gt;
&lt;/ul&gt;
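&lt;p&gt;The granularity requirements above amount to a small lookup: given how far back a chart has to go, which is the finest resolution that must still be available? Below is a minimal Python sketch of that rule. The names RETENTION_TIERS and finest_granularity are made up for this illustration and are not part of the prototype.&lt;/p&gt;

```python
# Retention tiers from the requirements list above:
# (look-back window in days, finest granularity that must be available).
RETENTION_TIERS = [
    (30, "per-minute"),
    (60, "per-hour"),
    (180, "per-day"),
    (360, "per-week"),
    (720, "per-month"),
]


def finest_granularity(lookback_days):
    """Return the finest granularity available for a look-back window."""
    for max_days, granularity in RETENTION_TIERS:
        if max_days >= lookback_days:
            return granularity
    raise ValueError("no data is retained beyond 720 days")


if __name__ == "__main__":
    for days in (7, 60, 180, 365):
        print(days, "days:", finest_granularity(days))
```

&lt;p&gt;With these tiers, the two example charts above (2 months per-hour, 6 months per-day) sit exactly at the finest granularity their windows allow.&lt;/p&gt;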
&lt;h4 id="data-volumes"&gt;Data Volumes&lt;/h4&gt;
&lt;h6 id="fine-grained-data"&gt;Fine-grained Data&lt;/h6&gt;
&lt;ul&gt;
&lt;li&gt;JVM Method data: &lt;ul&gt;
&lt;li&gt;50 JVMs * 50 Methods * 24 Hours in a day * 60 minutes per hour * 2 metric-types = 7.2 Million data-points per day. &lt;/li&gt;
&lt;li&gt;7.2 Million * 30 = 216 Million data points per month&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;JVM-wide stats: &lt;ul&gt;
&lt;li&gt;50 JVMs * 24 Hours * 60 minutes * 6 metric-types = 432K data points per day &lt;/li&gt;
&lt;li&gt;432K * 30 = 12.96 Million per month&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h6 id="coarse-grained-data"&gt;Coarse-grained Data&lt;/h6&gt;
&lt;ul&gt;
&lt;li&gt;This corresponds to roll-ups. Hourly, Daily, Weekly and Monthly.&lt;/li&gt;
&lt;li&gt;Hourly rollup for last 60 days&lt;ul&gt;
&lt;li&gt;JVM method data: 50 JVMs * 50 Methods * 24 Hours * 60 days * 2 metric-types = 7.2 Million data points over last 60 days. Or, 120K data points per day&lt;/li&gt;
&lt;li&gt;JVM-wide stats: 50 JVMs * 24 Hours * 60 days * 6 metric-types = 432K data points over last 60 days. Or, 7.2K data points per day&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Daily rollup for last 180 days&lt;ul&gt;
&lt;li&gt;JVM Method data: 50 JVMs * 50 Methods * 180 days * 2 metric-types = 900K data points in 180 days. Or, 5K data points per day&lt;/li&gt;
&lt;li&gt;JVM-wide stats: 50 JVMs * 180 days * 6 metric-types = 54K data points in 180 days. Or, 300 data points per day&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Weekly rollup for last 52 weeks&lt;ul&gt;
&lt;li&gt;JVM Method data: 50 JVMs * 50 Methods * 52 weeks * 2 metric-types = 260K data points over last 52 weeks. Or, 5K data points per week. Or, 700 data points per day&lt;/li&gt;
&lt;li&gt;JVM-wide stats: 50 JVMs * 52 weeks * 6 metric-types = 15.6K data points over last 52 weeks. Or, 300 data points per week. Or, 40 data points per day&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Monthly rollup for last 24 months&lt;ul&gt;
&lt;li&gt;JVM Method data: 50 JVMs * 50 Methods * 24 months * 2 metric-types = 120K data points for last 24 months. Or, 5K data points per month. Or, 170 data points per day&lt;/li&gt;
&lt;li&gt;JVM-wide stats: 50 JVMs * 24 months * 6 metric-types = 7.2K data points for last 24 months. Or, 300 data points per month. Or, 10 data points per day&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h6 id="adding-it-all-up"&gt;Adding it all up!&lt;/h6&gt;
&lt;p&gt;Number of data points collected PER DAY -&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;JVM Method data:&lt;ul&gt;
&lt;li&gt;Fine grained minute data points = 7.2 Million&lt;/li&gt;
&lt;li&gt;Hourly rollup = 120K&lt;/li&gt;
&lt;li&gt;Daily rollup = 5K&lt;/li&gt;
&lt;li&gt;Weekly rollup = 700&lt;/li&gt;
&lt;li&gt;Monthly rollup = 170&lt;/li&gt;
&lt;li&gt;Total (approx) = 7.32 Million&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;JVM-wide stats:&lt;ul&gt;
&lt;li&gt;Fine grained minute data points = 432K&lt;/li&gt;
&lt;li&gt;Hourly rollup = 7.2K&lt;/li&gt;
&lt;li&gt;Daily rollup = 300&lt;/li&gt;
&lt;li&gt;Weekly rollup = 40&lt;/li&gt;
&lt;li&gt;Monthly rollup = 10&lt;/li&gt;
&lt;li&gt;Total (approx) = 440K&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Total of totals = 7.76 Million data points per day. Or, 320K data points per hour. Or, 5400 data points per minute. Or, 90 data points per second&lt;/li&gt;
&lt;/ul&gt;
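&lt;p&gt;For anyone who wants to re-run the arithmetic with different numbers, here is a small Python sketch of the per-day totals. It uses the same approximations as the lists above (a 7-day week and a 30-day month); the constant and function names are invented for this illustration.&lt;/p&gt;

```python
JVMS = 50
METHODS_PER_JVM = 50
METHOD_METRICS = 2        # invocations and time spent
JVM_WIDE_METRICS = 6
MINUTES_PER_DAY = 24 * 60
SECONDS_PER_DAY = 24 * 60 * 60


def points_per_day(series_count):
    """Data points written per day for series_count independent series."""
    minute = series_count * MINUTES_PER_DAY   # fine-grained data
    hourly = series_count * 24                # hourly roll-up
    daily = series_count                      # daily roll-up
    weekly = series_count / 7                 # weekly roll-up, averaged per day
    monthly = series_count / 30               # monthly roll-up, averaged per day
    return minute + hourly + daily + weekly + monthly


method_series = JVMS * METHODS_PER_JVM * METHOD_METRICS
jvm_wide_series = JVMS * JVM_WIDE_METRICS
total = points_per_day(method_series) + points_per_day(jvm_wide_series)

if __name__ == "__main__":
    print("JVM method data per day:", round(points_per_day(method_series)))
    print("JVM-wide stats per day:", round(points_per_day(jvm_wide_series)))
    print("data points per second:", round(total / SECONDS_PER_DAY))
```

&lt;p&gt;Changing JVMS or METHODS_PER_JVM gives a quick feel for how the write rate scales with the size of the data center.&lt;/p&gt;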
&lt;p&gt;There are a couple of VERY IMPORTANT things to realize before going further -&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;In the DBMS world, multiple data points can fit into a single row. So, 90 data points per second translates to fewer than 90 row inserts per second. How many fewer depends on the data modeling&lt;/li&gt;
&lt;li&gt;The temporal distribution of inserts is not even. The hourly roll-up kicks in at the end of each hour. Daily roll-up at the end-of-day and so on (not considering the timezone adjustments required for roll-ups)&lt;/li&gt;
&lt;/ul&gt;
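&lt;p&gt;Both points can be made concrete with a rough Python sketch. It rests on two assumptions that are mine and not measurements: all metrics sampled at the same instant for the same method (or the same JVM) share one row, and the hourly roll-up is written in the same minute as the regular inserts.&lt;/p&gt;

```python
JVMS = 50
METHODS_PER_JVM = 50
METHOD_METRICS = 2
JVM_WIDE_METRICS = 6

# Assumption: metrics sampled together for one method (or one JVM)
# are stored in a single row.
method_rows_per_minute = JVMS * METHODS_PER_JVM   # 2 metrics share each row
jvm_rows_per_minute = JVMS                        # 6 metrics share each row

rows_per_minute = method_rows_per_minute + jvm_rows_per_minute
points_per_minute = (method_rows_per_minute * METHOD_METRICS
                     + jvm_rows_per_minute * JVM_WIDE_METRICS)

# Assumption: at the top of each hour the hourly roll-up adds one extra
# row per method and per JVM on top of the regular per-minute inserts.
hourly_rollup_rows = method_rows_per_minute + jvm_rows_per_minute

if __name__ == "__main__":
    print("steady state:", rows_per_minute, "rows per minute carrying",
          points_per_minute, "data points")
    print("top of the hour:", rows_per_minute + hourly_rollup_rows,
          "rows in that minute")
```

&lt;p&gt;Under these assumptions the row count is roughly half the data-point count, and the insert load doubles for one minute every hour.&lt;/p&gt;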
&lt;p&gt;Small-data problem? It's just a prototype!&lt;/p&gt;
&lt;h4 id="before-we-start-data-modeling"&gt;Before we start data modeling...&lt;/h4&gt;
&lt;h6 id="data-access-methods-in-cassandra"&gt;Data Access methods in Cassandra&lt;/h6&gt;
&lt;p&gt;Predominantly, there are three ways to interact with Cassandra - Hector, Astyanax and CQL. Cassandra exposes a Thrift API, and the Hector and Astyanax client libraries use it to talk to the DBMS. CQL3 offers a newer, SQL-like interface. This &lt;a href="http://www.slideshare.net/jericevans/cql-sql-in-cassandra"&gt;slidedeck&lt;/a&gt; by Eric Evans, the main committer of this piece, compares CQL3 performance with the Thrift API. Take your pick! In this prototype, I use CQL3.&lt;/p&gt;
&lt;h6 id="supercolumns"&gt;SuperColumns&lt;/h6&gt;
&lt;p&gt;Recent articles and blogs suggest that supercolumns are a bad design and will go away in future releases of Cassandra. So I use composite keys rather than supercolumns to model the data.&lt;/p&gt;
&lt;h6 id="denormalization-and-data-modeling-by-queries"&gt;Denormalization and Data Modeling by Queries&lt;/h6&gt;
&lt;p&gt;One of the central ideas in column-stores is to model data per the queries expected. The other is to denormalize, that is, to store multiple copies of the data if required. Both ideas have strong theoretical backing. Let me state just two arguments -&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;DB schema per query requirements - One of the gurus of database design, Professor Stonebraker has suggested that in enterprise applications OLTP queries are well known in advance, few in number, and do not change often. Refer to &lt;a href="http://cs-www.cs.yale.edu/homes/dna/papers/vldb07hstore.pdf"&gt;this paper&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;Denormalization - RDBMS belongs to the era when storage was expensive. It's not so anymore. CPUs are far more expensive (in both ways - CapEx and OpEx). And DB queries take CPU cycles. And a waiting user could have tangible/intangible revenue implications for web companies. All put together, model the database sparsely and denormalized. Store multiple versions and replicas of data. Do anything to make queries faster!&lt;/li&gt;
&lt;/ul&gt;
&lt;h6 id="code-itself"&gt;Code Itself&lt;/h6&gt;
&lt;p&gt;The JBoss7-based implementation of this prototype can be found in my GitHub &lt;a href="https://github.com/bharath12345/JvmDataStorage"&gt;repository&lt;/a&gt;. You will find a couple of MBeans - JvmMethodMetricsDAO and JvmMethodIdNameDAO - which have the persist() and find() methods. The procedure to use this is -&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Build the artifact using Maven - 'mvn clean install' at the top-level directory&lt;/li&gt;
&lt;li&gt;Deploy the jim-ear.ear in JBoss's standalone/deployments&lt;/li&gt;
&lt;li&gt;Start JBoss's jconsole and you should be able to see these MBeans in the jconsole UI&lt;/li&gt;
&lt;/ol&gt;
&lt;h4 id="data-modeling"&gt;Data Modeling&lt;/h4&gt;
&lt;p&gt;Here are a few of the broad guidelines I set and followed -&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;One keyspace for each of the two types of data (JVM methods and JVM-wide stats). Each keyspace holds raw (fine-grained) and roll-up data&lt;/li&gt;
&lt;li&gt;As few strings as possible in the stores&lt;/li&gt;
&lt;li&gt;Keep row-key and column-key string names small&lt;/li&gt;
&lt;li&gt;Many data items like JVM_ID will need a mapping table to map JVM-Name to a UUID&lt;/li&gt;
&lt;li&gt;Row Key -&lt;ul&gt;
&lt;li&gt;For fine-grained, per-minute data, row key is a combination of JVM_ID and date (20130628 for 28th June 2013)&lt;/li&gt;
&lt;li&gt;All roll-up tables have JVM_ID as the row key&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Columns for roll-up data&lt;ul&gt;
&lt;li&gt;Hourly  Roll-up: 60 days,  2 months  =&amp;gt; 24 * 60 = 1440 columns&lt;/li&gt;
&lt;li&gt;Daily   Roll-up: 180 days, 6 months  =&amp;gt; 180 columns&lt;/li&gt;
&lt;li&gt;Weekly  Roll-up: 350 days, 50 weeks  =&amp;gt; 50  columns&lt;/li&gt;
&lt;li&gt;Monthly Roll-up: 720 days, 24 months =&amp;gt; 24  columns&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Cassandra has this superb concept of tombstones and automatic data cleanup: set a TTL on an insert and the data expires on its own once the TTL elapses. TTL is set in seconds, and I used the following settings in this prototype -&lt;ul&gt;
&lt;li&gt;Raw:             30  days =&amp;gt; 30 * 24 * 60 * 60  =&amp;gt; 2,592,000&lt;/li&gt;
&lt;li&gt;Hourly  Roll-up: 60  days =&amp;gt; 2 * 2,592,000      =&amp;gt; 5,184,000&lt;/li&gt;
&lt;li&gt;Daily   Roll-up: 180 days =&amp;gt; 3 * 5,184,000      =&amp;gt; 15,552,000&lt;/li&gt;
&lt;li&gt;Weekly  Roll-up: 350 days =&amp;gt; 350 * 24 * 60 * 60 =&amp;gt; 30,240,000&lt;/li&gt;
&lt;li&gt;Monthly Roll-up: 720 days =&amp;gt; 4 * 15,552,000     =&amp;gt; 62,208,000&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
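&lt;p&gt;The TTL values and the raw-table row key are easy to derive in code instead of hard-coding them. The Python sketch below is only an illustration; RETENTION_DAYS, TTL_SECONDS and raw_row_key are hypothetical names and do not come from the prototype's repository.&lt;/p&gt;

```python
from datetime import date

SECONDS_PER_DAY = 24 * 60 * 60

# Retention per table type, in days, as listed above.
RETENTION_DAYS = {
    "raw": 30,
    "hourly": 60,
    "daily": 180,
    "weekly": 350,
    "monthly": 720,
}

# TTL values in seconds, one per table type.
TTL_SECONDS = {name: days * SECONDS_PER_DAY
               for name, days in RETENTION_DAYS.items()}


def raw_row_key(jvm_id, day):
    """Row key parts for per-minute data: JVM id plus a yyyymmdd date string."""
    return jvm_id, day.strftime("%Y%m%d")


if __name__ == "__main__":
    for name, ttl in TTL_SECONDS.items():
        print(name, ttl)
    print(raw_row_key(7, date(2013, 6, 28)))
```

&lt;p&gt;Keeping retention in days in one place makes it harder for the TTL numbers and the column counts above to drift apart.&lt;/p&gt;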
&lt;h5 id="keyspace-configuration"&gt;Keyspace Configuration&lt;/h5&gt;
&lt;h6 id="for-jvm-method-metrics"&gt;For JVM Method metrics&lt;/h6&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="nx"&gt;CREATE&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;KEYSPACE&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;JvmMethodMetrics&lt;/span&gt;&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nx"&gt;WITH&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;replication&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="err"&gt;&amp;#39;&lt;/span&gt;&lt;span class="kd"&gt;class&lt;/span&gt;&lt;span class="err"&gt;&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;&amp;#39;&lt;/span&gt;&lt;span class="nx"&gt;SimpleStrategy&lt;/span&gt;&lt;span class="err"&gt;&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;&amp;#39;&lt;/span&gt;&lt;span class="nx"&gt;replication_factor&lt;/span&gt;&lt;span class="err"&gt;&amp;#39;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;};&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;h6 id="for-jvm-wide-statistics"&gt;For JVM wide statistics&lt;/h6&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="nx"&gt;CREATE&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;KEYSPACE&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;JvmMetrics&lt;/span&gt;&lt;span class="w"&gt;          &lt;/span&gt;&lt;span class="nx"&gt;WITH&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;replication&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="err"&gt;&amp;#39;&lt;/span&gt;&lt;span class="kd"&gt;class&lt;/span&gt;&lt;span class="err"&gt;&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;&amp;#39;&lt;/span&gt;&lt;span class="nx"&gt;SimpleStrategy&lt;/span&gt;&lt;span class="err"&gt;&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;&amp;#39;&lt;/span&gt;&lt;span class="nx"&gt;replication_factor&lt;/span&gt;&lt;span class="err"&gt;&amp;#39;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;};&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;h5 id="column-families-in-jvmmethodmetrics-keyspace"&gt;Column Families in JvmMethodMetrics KEYSPACE&lt;/h5&gt;
&lt;h6 id="raw-trend-query-tables"&gt;Raw Trend Query Tables&lt;/h6&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="n"&gt;CREATE&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;TABLE&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;JvmMethodIdNameMap&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="n"&gt;jvm_id&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nb nb-Type"&gt;int&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="n"&gt;method_id&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nb nb-Type"&gt;int&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="n"&gt;method_name&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;varchar&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="n"&gt;PRIMARY&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;KEY&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;jvm_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;method_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="n"&gt;CREATE&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;INDEX&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;jvm_method_name&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;ON&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;JvmMethodIdNameMap&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;method_name&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="n"&gt;CREATE&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;TABLE&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;JvmMethodMetricsRaw&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="n"&gt;jvm_id&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nb nb-Type"&gt;int&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="n"&gt;date&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;varchar&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="n"&gt;day_time&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nb nb-Type"&gt;int&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="n"&gt;method_id&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nb nb-Type"&gt;int&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="n"&gt;invocations&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;bigint&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="n"&gt;response_time&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nb nb-Type"&gt;float&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
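&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="n"&gt;PRIMARY&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;KEY&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="n"&gt;jvm_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;date&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;day_time&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;method_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;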
&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="n"&gt;CREATE&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;INDEX&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;jvm_method_id&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;ON&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;JvmMethodMetricsRaw&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;method_id&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;h6 id="trend-query-roll-up-tables"&gt;Trend Query Roll-up Tables&lt;/h6&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;-- jvm_id is the row key; the time bucket and method_id are clustering
-- columns, so every (bucket, method) pair gets its own cells in the JVM's row.
CREATE TABLE JvmMethodMetricsHourly (
    jvm_id int,
    hour int,
    method_id int,
    invocations bigint,
    response_time float,
    PRIMARY KEY (jvm_id, hour, method_id)
);

CREATE TABLE JvmMethodMetricsDaily (
    jvm_id int,
    day int,
    method_id int,
    invocations bigint,
    response_time float,
    PRIMARY KEY (jvm_id, day, method_id)
);

CREATE TABLE JvmMethodMetricsWeekly (
    jvm_id int,
    week int,
    method_id int,
    invocations bigint,
    response_time float,
    PRIMARY KEY (jvm_id, week, method_id)
);

CREATE TABLE JvmMethodMetricsMonthly (
    jvm_id int,
    month int,
    method_id int,
    invocations bigint,
    response_time float,
    PRIMARY KEY (jvm_id, month, method_id)
);
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;h6 id="topn-query-tables"&gt;TopN Query Tables&lt;/h6&gt;
&lt;p&gt;Data in these tables is kept sorted from maximum to minimum (by response-time or invocation count).&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="n"&gt;CREATE&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;TABLE&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;JvmMethodTopNHourly&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="n"&gt;jvm_id&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nb nb-Type"&gt;int&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="n"&gt;hour&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nb nb-Type"&gt;int&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="n"&gt;method_id_type&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;varchar&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="o"&gt;//&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;Example&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="n"&gt;_RT&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&amp;gt;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;for&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;method&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;103&lt;/span&gt;&lt;span class="n"&gt;_INV&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&amp;gt;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;for&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;method&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;103&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;invocation&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;count&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="n"&gt;response_time_map&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;map&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nb nb-Type"&gt;float&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="n"&gt;invocation_count_map&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;map&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;bigint&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="n"&gt;PRIMARY&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;KEY&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;jvm_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;hour&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="n"&gt;CREATE&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;TABLE&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;JvmMethodTopNDaily&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="n"&gt;jvm_id&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nb nb-Type"&gt;int&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="n"&gt;day&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nb nb-Type"&gt;int&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="n"&gt;method_id_type&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;varchar&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="n"&gt;response_time_map&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;map&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nb nb-Type"&gt;float&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="n"&gt;invocation_count_map&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;map&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;bigint&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="n"&gt;PRIMARY&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;KEY&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;jvm_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;day&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="n"&gt;CREATE&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;TABLE&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;JvmMethodTopNWeekly&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="n"&gt;jvm_id&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nb nb-Type"&gt;int&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="n"&gt;week&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nb nb-Type"&gt;int&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="n"&gt;method_id_type&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;varchar&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="n"&gt;response_time_map&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;map&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nb nb-Type"&gt;float&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="n"&gt;invocation_count_map&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;map&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;bigint&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="n"&gt;PRIMARY&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;KEY&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;jvm_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;week&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="n"&gt;CREATE&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;TABLE&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;JvmMethodTopNMonthly&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="n"&gt;jvm_id&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nb nb-Type"&gt;int&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="n"&gt;month&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nb nb-Type"&gt;int&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="n"&gt;method_id_type&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;varchar&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="n"&gt;response_time_map&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;map&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nb nb-Type"&gt;float&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="n"&gt;invocation_count_map&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;map&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;bigint&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="n"&gt;PRIMARY&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;KEY&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;jvm_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;month&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;h5 id="column-families-in-jvmmetricsraw-keyspace"&gt;Column Families in JvmMetricsRaw KEYSPACE&lt;/h5&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="n"&gt;CREATE&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;TABLE&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;JvmMetricsRaw&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="n"&gt;jvm_id&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nb nb-Type"&gt;int&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="n"&gt;date&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;varchar&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="n"&gt;day_time&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nb nb-Type"&gt;int&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;          &lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="n"&gt;total_live_threads&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nb nb-Type"&gt;int&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;

&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="n"&gt;mem_heap&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;set&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;bigint&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;             &lt;/span&gt;&lt;span class="o"&gt;//&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;points&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;committed&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nb"&gt;max&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;used&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="n"&gt;mem_nonheap&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;set&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;bigint&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;

&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="n"&gt;ds_freepool&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;map&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nb nb-Type"&gt;int&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;bigint&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="c1"&gt;// key is datasource_id, value is the free pool of threads&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="n"&gt;ds_usetime&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;map&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nb nb-Type"&gt;int&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;bigint&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="c1"&gt;// key is datasource_id, value is avg query time over 1 min&lt;/span&gt;

&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="n"&gt;PRIMARY&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;KEY&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;jvm_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;date&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;h4 id="query-code"&gt;Query Code&lt;/h4&gt;
&lt;p&gt;The DataStax Java driver for CQL3 ships a &lt;a href="http://www.datastax.com/documentation/developer/java-driver/1.0/index.html#java-driver/reference/queryBuilder_r.html"&gt;QueryBuilder&lt;/a&gt; utility that offers some basic features. Refer to the QueryBuilder JavaDocs for more info. I was able to build simple 'select' queries with different 'where' clauses on time and IDs without much effort. I would recommend that users extend the QueryBuilder in their DAO layer to provide model-specific functionality and catch errors. The prototype offers an Entity/DAO model which can be easily understood by those familiar with JPA/Hibernate. (However, I am not a fan of the many ORM frameworks that are coming up for Cassandra. Knowledge of 'entity' modeling is critical to solving the performance problems that Cassandra sets out to handle. Using a Cassandra ORM framework would mean less knowledge of the data model and consequently less performant queries. Stay away from them!)&lt;/p&gt;
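&lt;p&gt;To make the DAO-layer recommendation above concrete, here is a minimal sketch of a model-specific query helper for the JvmMetricsRaw table. It is an illustration under assumptions, not the prototype's code: the class name JvmMetricsRawQueries and the method selectByJvmAndDate are hypothetical, and the helper builds a plain CQL string instead of calling the driver's QueryBuilder so that it runs without the driver on the classpath. The point it shows is catching a mistyped column name in the DAO layer before the query ever reaches Cassandra.&lt;/p&gt;

```java
import java.util.Arrays;

/** Hypothetical DAO-side helper for illustration; not part of the DataStax driver. */
class JvmMetricsRawQueries {

    // Column names copied from the JvmMetricsRaw table definition above.
    private static final String[] COLUMNS = {
        "jvm_id", "date", "day_time", "total_live_threads",
        "mem_heap", "mem_nonheap", "ds_freepool", "ds_usetime"
    };

    /**
     * Builds a parameterised SELECT for one JVM on one date.
     * Rejects an empty column list and any column the table does not define.
     */
    static String selectByJvmAndDate(String... columns) {
        if (columns.length == 0) {
            throw new IllegalArgumentException("at least one column is required");
        }
        for (String column : columns) {
            if (!Arrays.asList(COLUMNS).contains(column)) {
                throw new IllegalArgumentException("unknown column: " + column);
            }
        }
        // Both primary key columns are bound later, so the text can be prepared once.
        return "SELECT " + String.join(", ", columns)
             + " FROM JvmMetricsRaw WHERE jvm_id = ? AND date = ?";
    }

    public static void main(String[] args) {
        System.out.println(selectByJvmAndDate("day_time", "total_live_threads"));
    }
}
```

&lt;p&gt;The returned string keeps bind markers for jvm_id and date, so a caller would hand it to the driver as a prepared statement and bind the two key values at execution time. A real DAO would more likely wrap the driver's QueryBuilder in the same way; the validation step is the part worth keeping.&lt;/p&gt;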
&lt;h4 id="readwrite-performance"&gt;Read/Write Performance&lt;/h4&gt;
&lt;p&gt;After modeling and unit testing, I ran the application on my laptop (MacBook Pro, 2.9GHz, 8GB RAM). Since my laptop is not an ideal performance test environment (I have multiple applications running, and no tuning of Cassandra or JBoss), I see no point in publishing the numbers or charts. However, I was able to 'write' literally millions of records per minute and read them back. Since I run MySQL on my laptop as well, one thing I can vouch for is that Cassandra's write performance is far ahead of what I would have expected from my out-of-the-box MySQL.&lt;/p&gt;
&lt;h4 id="conclusion"&gt;Conclusion&lt;/h4&gt;
&lt;p&gt;Cassandra has come a long way from the 0.8 days. I did not come across any bugs while working on my prototype. CQL3 and data modeling were a breeze, and there is a plethora of resources on this topic on the web. I would certainly recommend Cassandra for those looking to get a quick hang of NoSQL and column stores. If you are planning to use Cassandra as part of your application and have done the due diligence on the performance side, then let me assure you: programming with Cassandra should not take any more time than using an ORM framework like JPA/Hibernate. And if you are like me, wanting to write a prototype, you should be able to go from zero to running in a single working week. Ping me if you run into any issues using my code, understanding my blog or anything else. Thanks for reading!&lt;/p&gt;
&lt;h4 id="reading-recommendations"&gt;Reading Recommendations&lt;/h4&gt;
&lt;ul&gt;
&lt;li&gt;Good introduction on the subject - &lt;a href="http://shop.oreilly.com/product/0636920010852.do"&gt;O'Reilly's Cassandra Definitive Guide&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Data Modeling - &lt;a href="http://www.ebaytechblog.com/2012/07/16/cassandra-data-modeling-best-practices-part-1/"&gt;this&lt;/a&gt; wonderful blog by Jay Patel from eBay&lt;/li&gt;
&lt;li&gt;Performance comparisons - &lt;a href="http://www.datastax.com/dev/blog/2012-in-review-performance"&gt;this&lt;/a&gt; article really nails it (pay attention to the chart!)&lt;/li&gt;
&lt;/ul&gt;</content><category term="posts"/><category term="Cassandra"/><category term="Java"/><category term="CQL"/><category term="Database"/></entry><entry><title>The Visual Display of Quantitative Information</title><link href="https://bharath12345.github.io/posts/the-visual-display-of-quantitative-information/" rel="alternate"/><published>2013-07-10T00:00:00-04:00</published><updated>2013-07-10T00:00:00-04:00</updated><author><name>Bharadwaj</name></author><id>tag:bharath12345.github.io,2013-07-10:/posts/the-visual-display-of-quantitative-information/</id><summary type="html">&lt;p&gt;For the last couple of years I have been searching for theories of data visualization, to educate myself on the fundamentals. My search has taken me to many books and blogs, but none as remarkable as Edward Tufte's &lt;a href="http://www.amazon.com/The-Visual-Display-Quantitative-Information/dp/0961392142"&gt;book&lt;/a&gt;, the seminal work on the subject. This is a short refresher of …&lt;/p&gt;</summary><content type="html">&lt;p&gt;For the last couple of years I have been searching for theories of data visualization, to educate myself on the fundamentals. My search has taken me to many books and blogs, but none as remarkable as Edward Tufte's &lt;a href="http://www.amazon.com/The-Visual-Display-Quantitative-Information/dp/0961392142"&gt;book&lt;/a&gt;, the seminal work on the subject. This is a short refresher of the core concepts. Even though I write for myself, it may be of some use to a passing busy programmer.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Graphical Excellence: that which gives the viewer the greatest number of ideas in the shortest time with the least ink in the smallest space&lt;/li&gt;
&lt;li&gt;Graphical excellence is nearly always multivariate. Charts depicting behavior of two variables with respect to each other are always more insightful than simple time-series or progression graphs&lt;/li&gt;
&lt;li&gt;'Graphical Integrity' reigns supreme. Beware of distortions. The representation of numbers, as physically measured on the surface of the graphic itself, should be directly proportional to the numerical quantities represented (as an aside, &lt;a href="http://www.amazon.com/How-Lie-Statistics-Darrell-Huff/dp/0393310728"&gt;this book&lt;/a&gt; might be a good read on distortions!)&lt;/li&gt;
&lt;li&gt;The number of information-carrying (variable) dimensions depicted should not exceed the number of dimensions in the data. Beware of area charts depicting single-variable variations&lt;/li&gt;
&lt;li&gt;Maximize the data-ink ratio, within reason. Erase non-data-ink, within reason. Revise. Rethink.&lt;/li&gt;
&lt;li&gt;Moiré vibrations in statistical charts are chartjunk. Gridlines, often, are chartjunk. 3D, often, is chartjunk. More than 3 colors are, often, chartjunk. Pie charts are always chartjunk. Easy graphing software is leading to more chartjunk and more amazing chartjunk&lt;/li&gt;
&lt;li&gt;Awesome examples of clarity by revision -  redesigning boxplots, barcharts and my personal favorite - the super intuitive dot-dash plot combining marginal distribution with a bivariate distribution!&lt;/li&gt;
&lt;li&gt;Use coordinates and axes with thought - maximize data-ink&lt;/li&gt;
&lt;li&gt;Organize and order the flow of graphical information presented to the eye - charts should make intelligent use of what is known about the cognitive abilities of the human brain&lt;/li&gt;
&lt;li&gt;Balance and optimize data density = (number of data entries)/(area of the graphic). Try to maximize it; otherwise shrink the graphic&lt;/li&gt;
&lt;li&gt;Pay attention to line weights&lt;/li&gt;
&lt;li&gt;Curious case of the &lt;a href="http://en.wikipedia.org/wiki/Golden_rectangle"&gt;Golden Rectangle&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;And, finally, it was John Tukey who once said - there is no data that can be displayed in a pie chart, that cannot be displayed BETTER in some other type of chart.&lt;/li&gt;
&lt;/ul&gt;</content><category term="posts"/><category term="visualization"/></entry></feed>