Wednesday, December 14, 2016

Cassandra:nodetool usage

Nodetool is a OOTB tool from Cassandra.  It provides various options to manage the keyspace.  There were three useful options I came across:

1) I had to copy the keyspace from one server to another server.  Since the source is from multiple servers, it is essential to cleanup the data after all the db and its related files are copied.  For that, I have to use nodetool.  Nodetool usually resides in <cassandra-path>/bin.  To cleanup, give the following command:

            nodetool -h <hostname of the cassandra> cleanup

Based on the amount of data to be cleaned, it will take some time.To track the progress, you can use:

            nodetool compactionstats

This will print the following:

pending tasks: 1
          compaction type        keyspace   column family       completed           total      unit  progress
                  Cleanup             <keyspace-name>      <column-family-name>      1667500496      2004577146     bytes    83.18%

Note: the total column will vary based on the size that is being calculated in real time.

2) To get the statistics of all the keyspaces. Especially to get a rough idea of how much keys per column family are being used:

           nodetool -h <hostname> cfstats


Friday, December 9, 2016

MSSQL: How to transform a set of rows from one table to another?

My requirement was to transfer a specific set of rows from one table to another table and both have the exact schema bit-by-bit.  In MSSQL, follow these steps:

1) First get the column list from the destination table.  Use:

SELECT SUBSTRING(
    (SELECT ', ' + QUOTENAME(COLUMN_NAME)
        FROM INFORMATION_SCHEMA.COLUMNS
        WHERE TABLE_NAME = '<destination-table-name>'
        ORDER BY ORDINAL_POSITION
        FOR XML path('')),
    3,
    200000);

2) If the destination table has the ID column (primary key), then you need to turn on the identity_insert option:
   
     set identity_insert <destination-table> on;

3) Then you insert...into...select query.  The select can have any valid condition to get the specific set of data.  A typical query can be like this:

insert into emails (<destination-table-column-list-from the query in step 1) SELECT <source-table-column-names>
  FROM <source-table> where <any condition that will load only those records>;

4) Don't forget to turn off the identity_insert option you changed in step 2):
 
    set identity_insert dbo.emails off;

And that is all...