How To Install Apache Cassandra in CentOS

Apache Cassandra Installation Steps And CQL Command Examples

By Rudhra Sivam

1. Introduction

Apache Cassandra is an open-source, distributed NoSQL database with a wide-column store that can handle massive amounts of data across many commodity servers with no single point of failure. It was originally developed at Facebook, is now maintained as a project of the Apache Software Foundation, and is written in Java. In this article, we will go through the step-by-step process to install Cassandra on CentOS 7 Linux.

2. Pre-requisites

Run all of the commands given below as root or with sudo.

2.1. Install Python 2.7

On CentOS 7, Python 2.7 comes pre-installed. If it's missing for some reason, you can use the following command to install it:

# yum -y install python
# python --version
Python 2.7.5

2.2. Install Java

Use the commands below to install the latest build of Java 8 (OpenJDK) and verify the installation.

# yum install java-1.8.0-openjdk-devel
# java -version

Sample output:

openjdk version "1.8.0_312"
OpenJDK Runtime Environment (build 1.8.0_312-b07)
OpenJDK 64-Bit Server VM (build 25.312-b07, mixed mode)
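
Before moving on, both prerequisites can be checked in one go. The following is a minimal sketch; the helper name require_cmd is made up for this example and is not part of any package.

```shell
#!/bin/sh
# Report whether a command is available on PATH.
# Returns 0 if found, 1 otherwise. (require_cmd is a hypothetical helper.)
require_cmd() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "found: $1"
        return 0
    fi
    echo "missing: $1" >&2
    return 1
}

# Check the two prerequisites from this section without aborting on a miss.
for cmd in python java; do
    require_cmd "$cmd" || echo "install $cmd before continuing" >&2
done
```

The loop only reports what is missing; it does not install anything.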

3. How to install Cassandra

First, let us add the Cassandra repository. To do so, create a file named cassandra.repo under /etc/yum.repos.d/ directory:

# vi /etc/yum.repos.d/cassandra.repo

Add the following lines in it:

name=Apache Cassandra

Press ESC key and type :wq to save the file and close it.
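
The repository definition shown above appears to be cut short. As a rough sketch, a complete file for the Cassandra 4.0 series might look like the one written by the function below. The section name, baseurl and gpgkey values are assumptions based on the Apache Cassandra RPM installation instructions, not values taken from this article, so check them against the official download page before using them. The function name write_cassandra_repo is also made up for this example.

```shell
#!/bin/sh
# Write a Cassandra yum repository definition to the given path.
# ASSUMPTION: the [cassandra] section name, baseurl and gpgkey follow the
# Apache Cassandra 4.0 RPM install instructions; verify them before use.
write_cassandra_repo() {
    cat > "$1" <<'EOF'
[cassandra]
name=Apache Cassandra
baseurl=https://redirect.apache.org/dist/cassandra/redhat/40x/
gpgcheck=1
repo_gpgcheck=1
gpgkey=https://downloads.apache.org/cassandra/KEYS
EOF
}

# Write to a scratch file first so the result can be reviewed.
repo_tmp=$(mktemp)
write_cassandra_repo "$repo_tmp"
cat "$repo_tmp"
```

Once the contents look right, the same function can be pointed at /etc/yum.repos.d/cassandra.repo as root.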

Verify that the Cassandra repository has been added. The command below lists the enabled repositories:

# yum repolist

After adding the repository, run the following command to install Cassandra in your CentOS system:

# yum -y install cassandra

Enable and start Cassandra service:

# systemctl enable cassandra
cassandra.service is not a native service, redirecting to /sbin/chkconfig.
Executing /sbin/chkconfig cassandra on
# systemctl start cassandra

Check the status of the Cassandra service:

# systemctl status cassandra
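
Cassandra may need some time after the service starts before it answers requests. A small retry helper can wait for it. This is a sketch only: wait_until is a hypothetical function, and it assumes that nodetool status exits with a non-zero code while the node is not yet reachable.

```shell
#!/bin/sh
# Retry a command until it succeeds or the attempts run out.
# usage: wait_until <attempts> <delay_seconds> <command> [args...]
# (wait_until is a hypothetical helper written for this example.)
wait_until() {
    attempts=$1
    delay=$2
    shift 2
    i=1
    while [ "$i" -le "$attempts" ]; do
        # Success of the wrapped command ends the wait immediately.
        if "$@" >/dev/null 2>&1; then
            return 0
        fi
        i=$((i + 1))
        sleep "$delay"
    done
    return 1
}

# Example call, commented out so the sketch has no side effects:
# wait_until 30 5 nodetool status && echo "Cassandra is answering"
```

The attempt count and delay in the commented example are arbitrary; adjust them to the speed of your machine.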

Use the command below to view details of the cluster, such as its state, load and host IDs:

# nodetool status

Sample output:

Datacenter: datacenter1
|/ State=Normal/Leaving/Joining/Moving
--  Address    Load       Tokens  Owns (effective)  Host ID                               Rack
UN  69.08 KiB  16      100.0%            bf2df7a9-54bc-41c9-8c6c-0b9322d10e71  rack1

In the output,

  • UN - Up & Normal
  • Address - IP Address of Node
  • Load - The amount of file system data under the Cassandra data directory, excluding the contents of the snapshots subdirectory. It is updated every 90 seconds.
  • Tokens - The number of tokens that have been assigned to the node.
  • Owns - The percentage of the data owned by the node, including replicas. For example, a node can hold 33% of the ring but display 100% if the replication factor is 3.
  • Host ID - The unique ID (a UUID) assigned to the node.
  • Rack - The rack in which the node is located.
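
The status column described above is easy to check from a script. The sketch below counts the nodes reported as UN. The function name count_up_normal is made up for this example, and the address 127.0.0.1 in the sample input is a placeholder, not a value from this article.

```shell
#!/bin/sh
# Count nodes reported as "UN" (Up and Normal) in `nodetool status`
# output read from standard input. (count_up_normal is a hypothetical helper.)
count_up_normal() {
    awk '$1 == "UN" { n++ } END { print n + 0 }'
}

# Sample input modelled on the output shown above; the address
# 127.0.0.1 is a placeholder.
sample_status='Datacenter: datacenter1
--  Address    Load       Tokens  Owns (effective)  Host ID                               Rack
UN  127.0.0.1  69.08 KiB  16      100.0%            bf2df7a9-54bc-41c9-8c6c-0b9322d10e71  rack1'

printf '%s\n' "$sample_status" | count_up_normal
```

On a live system, the same function can read the real output with: nodetool status | count_up_normal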

4. Cqlsh – CLI for Cassandra

cqlsh is a command-line interface for interacting with Cassandra using CQL (Cassandra Query Language). It's included in every Cassandra package and can be found alongside the cassandra executable in the bin/ directory. cqlsh is implemented with the Python native protocol driver and connects to a single node.

To launch cqlsh, run:

# cqlsh

Sample output:

Connected to Test Cluster at
[cqlsh 6.0.0 | Cassandra 4.0.1 | CQL spec 3.4.5 | Native protocol v5]
Use HELP for help.

5. CQL Sample commands

5.1. Create Key Space

In Cassandra, a keyspace serves as a data container, similar to a database in relational database management systems (RDBMS).

cqlsh> CREATE KEYSPACE IF NOT EXISTS OsTechNix WITH REPLICATION = { 'class' : 'NetworkTopologyStrategy', 'datacenter1' : 3 };

Check the keyspaces in the system using the command below.

cqlsh> SELECT * FROM system_schema.keyspaces;

To show all keyspaces, run:

cqlsh> desc keyspaces;

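All the keyspaces on the cluster will be listed. Note that the unquoted name OsTechNix has been folded to lowercase, because CQL treats unquoted identifiers as case-insensitive: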

ostechnix  system_auth         system_schema  system_views
system     system_distributed  system_traces  system_virtual_schema

5.2. Create table and insert sample data

You can use the CREATE TABLE statement, defining a data type for each column as you would in an RDBMS. Data is stored in CQL tables as rows of columns, much like SQL tables.

You must define a primary key along with the other columns when creating a table. The example below creates a table.

cqlsh> CREATE TABLE ostechnix.sample_table ( id UUID PRIMARY KEY, name text, birthday timestamp, nationality text, weight text, height text );

Use the INSERT statement to insert sample data into the table ostechnix.sample_table that we created above. In the example below, two records are added to the table.

cqlsh> INSERT INTO ostechnix.sample_table (id, name, nationality) VALUES (5b6962dd-3f90-4c93-8f61-eabfa4a803e2, 'KARTHICK', 'Indian');
cqlsh> INSERT INTO ostechnix.sample_table (id, name, nationality, weight) VALUES (5b6962dd-3f90-4c93-8f61-eabfa4a804e3, 'MOHAN', 'Indian', '85');
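
The statements above can also be run without an interactive session by saving them to a file and passing it to cqlsh with the -f option. The sketch below writes the file to a temporary location. The IF NOT EXISTS clause is an addition for this example so that the script can be re-run, and the cqlsh call is skipped when cqlsh is not installed.

```shell
#!/bin/sh
# Collect the table definition and the two inserts into one CQL script.
cql_file=$(mktemp)
cat > "$cql_file" <<'EOF'
CREATE TABLE IF NOT EXISTS ostechnix.sample_table (
    id UUID PRIMARY KEY,
    name text,
    birthday timestamp,
    nationality text,
    weight text,
    height text
);
INSERT INTO ostechnix.sample_table (id, name, nationality) VALUES (5b6962dd-3f90-4c93-8f61-eabfa4a803e2, 'KARTHICK', 'Indian');
INSERT INTO ostechnix.sample_table (id, name, nationality, weight) VALUES (5b6962dd-3f90-4c93-8f61-eabfa4a804e3, 'MOHAN', 'Indian', '85');
EOF

# Run the script only when cqlsh is installed; otherwise just report the path.
if command -v cqlsh >/dev/null 2>&1; then
    cqlsh -f "$cql_file" || echo "cqlsh could not run $cql_file" >&2
else
    echo "cqlsh not found; script saved to $cql_file"
fi
```

Keeping the statements in a file makes it easier to repeat the same setup on another node or after a reinstall.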

5.3. Querying the table

Use the SELECT statement to return one or more rows from a table.

cqlsh> SELECT * FROM ostechnix.sample_table;

Here, * returns all columns of every row in the table. Next, try filtering on a column that is not part of the primary key:

cqlsh> SELECT * FROM ostechnix.sample_table WHERE weight = '85';

Sample output:

InvalidRequest: Error from server: code=2200 [Invalid query] message="Cannot execute this query as it might involve data filtering and thus may have unpredictable performance. If you want to execute this query despite the performance unpredictability, use ALLOW FILTERING"

Cassandra rejects a query that filters on a column that is neither part of the primary key nor indexed, and issues the above error suggesting 'ALLOW FILTERING'.

The reason for this error is that Cassandra cannot identify the node that holds the required rows unless the complete partition key is included in the WHERE clause. Without it, Cassandra would have to scan the entire dataset on every node to be sure it has found all matching rows. Adding ALLOW FILTERING tells Cassandra to run the query anyway:

cqlsh> SELECT * FROM ostechnix.sample_table WHERE weight = '85' ALLOW FILTERING;
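
ALLOW FILTERING is fine for a two-row test table, but there are ways to avoid the full scan. The sketch below shows two options as an illustration: a lookup by the full primary key, and a secondary index on the weight column. The index name sample_table_weight_idx and the helper run_cql are made up for this example. The statements are passed to cqlsh with its -e option, and are only printed when cqlsh is not installed.

```shell
#!/bin/sh
# Option 1: look up one row by its full primary key, so no filtering is needed.
# The UUID is the id of the 'MOHAN' row inserted earlier.
by_key="SELECT * FROM ostechnix.sample_table WHERE id = 5b6962dd-3f90-4c93-8f61-eabfa4a804e3;"

# Option 2: create a secondary index on weight, after which the weight
# query no longer needs ALLOW FILTERING. The index name is hypothetical.
by_index="CREATE INDEX IF NOT EXISTS sample_table_weight_idx ON ostechnix.sample_table (weight);"

# Run one CQL statement through cqlsh, or print it if cqlsh is missing.
run_cql() {
    if command -v cqlsh >/dev/null 2>&1; then
        cqlsh -e "$1" || echo "cqlsh could not run: $1" >&2
    else
        echo "cqlsh not found; would run: $1"
    fi
}

run_cql "$by_key"
# To try the index instead: run_cql "$by_index"
```

A secondary index has its own costs, so for larger tables it is usually better to design the primary key around the queries you need to run.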

6. Summary

In this article, we went through the Cassandra installation procedure and a few sample CQL commands. We will take a deeper dive into Cassandra operations in upcoming articles.

