Install Cassandra In CentOS Linux

1. Introduction

Apache Cassandra is an open-source, distributed NoSQL database built on a wide-column store. It can handle massive amounts of data across many commodity servers with no single point of failure. Cassandra was originally developed at Facebook, is now maintained by the Apache Software Foundation, and is written in Java. In this article, we will go through the step-by-step process of installing Cassandra on CentOS 7 Linux.

2. Pre-requisites

All commands given below should be run as the root user or with sudo.

2.1. Install Python 2.7

On CentOS 7, Python 2.7 comes pre-installed. If it's missing for some reason, you can use the following command to install it:

# yum -y install python

# python --version
Python 2.7.5

2.2. Install Java

Use the following commands to install the latest version of Java 8 and verify the installation.

# yum install java-1.8.0-openjdk-devel

# java -version

Sample output:

openjdk version "1.8.0_312"
OpenJDK Runtime Environment (build 1.8.0_312-b07)
OpenJDK 64-Bit Server VM (build 25.312-b07, mixed mode)
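
The version check above can also be scripted. The following is a minimal sketch, assuming the version string has the same form as the sample output; the function name is_java8 is hypothetical and not part of Java or Cassandra.

```shell
# Hypothetical helper: reads `java -version` output on stdin and succeeds
# only if it reports a 1.8 (Java 8) runtime.
is_java8() {
    grep -q 'version "1\.8\.'
}
```

Because java -version writes to standard error, redirect it before piping:

# java -version 2>&1 | is_java8 && echo "Java 8 found"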

3. How to install Cassandra

First, let us add the Cassandra repository. To do so, create a file named cassandra.repo in the /etc/yum.repos.d/ directory:

# vi /etc/yum.repos.d/cassandra.repo

Add the following lines to it:

[cassandra]
name=Apache Cassandra
baseurl=https://www.apache.org/dist/cassandra/redhat/40x/
gpgcheck=1
repo_gpgcheck=1
gpgkey=https://www.apache.org/dist/cassandra/KEYS

Press ESC key and type :wq to save the file and close it.
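
If you would rather not use an interactive editor, for example in a provisioning script, the same file can be written with a here-document. This is a minimal sketch: the function name write_cassandra_repo is hypothetical, and it takes the target path as its only argument.

```shell
# Hypothetical helper: writes the repository definition shown above to the
# path given as the first argument, replacing any existing file.
write_cassandra_repo() {
    cat > "$1" <<'EOF'
[cassandra]
name=Apache Cassandra
baseurl=https://www.apache.org/dist/cassandra/redhat/40x/
gpgcheck=1
repo_gpgcheck=1
gpgkey=https://www.apache.org/dist/cassandra/KEYS
EOF
}
```

To create the repository file with it, run:

# write_cassandra_repo /etc/yum.repos.d/cassandra.repo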

Verify that the Cassandra repository has been added. The following command lists the enabled repositories:

# yum repolist

After adding the repository, run the following command to install Cassandra on your CentOS system:

# yum -y install cassandra

Enable and start the Cassandra service:

# systemctl enable cassandra
cassandra.service is not a native service, redirecting to /sbin/chkconfig.
Executing /sbin/chkconfig cassandra on

# systemctl start cassandra

Check the status of Cassandra:

# systemctl status cassandra
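
Cassandra may need a little time after the service starts before it accepts connections, so a command issued immediately afterwards can fail. A small retry loop is one way to wait for it. This is a sketch only: the function name retry and its argument order are hypothetical, and it assumes the command you pass exits with a non-zero status while the node is still unreachable.

```shell
# Hypothetical helper: retry ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Runs COMMAND until it succeeds or until it has failed ATTEMPTS times.
# Returns 0 on the first success and 1 if every attempt failed.
retry() {
    attempts=$1
    delay=$2
    shift 2
    n=1
    while ! "$@"; do
        if [ "$n" -ge "$attempts" ]; then
            return 1
        fi
        n=$((n + 1))
        sleep "$delay"
    done
    return 0
}
```

For example, to try the nodetool status command (described next) up to 30 times, 5 seconds apart:

# retry 30 5 nodetool status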

Use the following command to get details of the cluster, such as its state, load, and host IDs:

# nodetool status

Sample output:

Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address    Load       Tokens  Owns (effective)  Host ID                               Rack
UN  127.0.0.1  69.08 KiB  16      100.0%            bf2df7a9-54bc-41c9-8c6c-0b9322d10e71  rack1

In the output:

UN - Up and Normal (the status and state of the node).
Address - The IP address of the node.
Load - The amount of file system data under the Cassandra data directory, excluding all content in the snapshots subdirectory. It is updated every 90 seconds.
Tokens - The number of tokens assigned to the node.
Owns - The percentage of data the node owns, including replicas. For example, a node can hold 33% of the ring but display 100% if the replication factor is 3.
Host ID - The unique network ID (a UUID) of the node.
Rack - The rack in which the node is located.
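
The two-letter code in the first column is easy to check from a script. The following sketch assumes the output layout shown in the sample above; the function name nodes_not_up_normal is hypothetical.

```shell
# Hypothetical helper: reads `nodetool status` output on stdin and prints the
# address of every node whose code is not UN. Exits with status 1 if at least
# one such node was found, and 0 otherwise.
nodes_not_up_normal() {
    awk '
        # Node lines start with a two-letter code: U or D, then N, L, J or M.
        $1 ~ /^[UD][NLJM]$/ {
            if ($1 != "UN") { print $2; bad = 1 }
        }
        END { exit bad + 0 }
    '
}
```

Pipe the nodetool output into it:

# nodetool status | nodes_not_up_normal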

4. Cqlsh – CLI for Cassandra

cqlsh is a command-line interface for interacting with Cassandra using CQL (Cassandra Query Language). It's included in every Cassandra package and can be found alongside the cassandra executable in the bin/ directory. cqlsh is implemented with the Python native protocol driver and connects to a single node.

To launch cqlsh, run:

# cqlsh

Sample output:

Connected to Test Cluster at 127.0.0.1:9042
[cqlsh 6.0.0 | Cassandra 4.0.1 | CQL spec 3.4.5 | Native protocol v5]
Use HELP for help.
cqlsh>

5. CQL Sample commands

5.1. Create Key Space

In Cassandra, a keyspace serves as a data container, similar to a database in relational database management systems (RDBMS).

cqlsh> CREATE KEYSPACE IF NOT EXISTS OsTechNix WITH REPLICATION = { 'class' : 'NetworkTopologyStrategy', 'datacenter1' : 3 };
cqlsh>

Check the keyspaces in the system using the following command:

cqlsh> SELECT * FROM system_schema.keyspaces;

To show all keyspaces, run:

cqlsh> desc keyspaces;

All the keyspaces on the cluster will be listed. Note that the unquoted name OsTechNix was stored in lowercase:

ostechnix  system_auth         system_schema  system_views
system     system_distributed  system_traces  system_virtual_schema

5.2. Create table and insert sample data

You can use the CREATE TABLE statement to define a table, specifying a data type for each column as you would in an RDBMS. Data is stored in CQL tables as rows of columns, much like in SQL.

You must define a primary key along with the other columns when creating a table. The following example creates a table:

cqlsh> CREATE TABLE ostechnix.sample_table ( id UUID PRIMARY KEY, name text, birthday timestamp, nationality text, weight text, height text );
cqlsh>

Use the INSERT statement to insert sample data into the ostechnix.sample_table table that we created above. In the following example, two records are added to the table.

cqlsh> INSERT INTO ostechnix.sample_table (id, name, nationality) VALUES (5b6962dd-3f90-4c93-8f61-eabfa4a803e2, 'KARTHICK', 'Indian');

cqlsh> INSERT INTO ostechnix.sample_table (id, name, nationality, weight) VALUES (5b6962dd-3f90-4c93-8f61-eabfa4a804e3, 'MOHAN', 'Indian', '85');
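
The id values above are typed by hand. When inserting from a shell script, you can let Cassandra generate the key with the built-in CQL function uuid() instead. The following is a minimal sketch that builds such a statement; the function name insert_person_cql is hypothetical, and it only handles the name and nationality columns.

```shell
# Hypothetical helper: insert_person_cql NAME NATIONALITY
# Prints an INSERT statement for ostechnix.sample_table. The id column is
# filled by the CQL function uuid(), which generates a random UUID.
insert_person_cql() {
    # Double any single quote so the values stay valid CQL string literals.
    name=$(printf '%s' "$1" | sed "s/'/''/g")
    nationality=$(printf '%s' "$2" | sed "s/'/''/g")
    printf "INSERT INTO ostechnix.sample_table (id, name, nationality) VALUES (uuid(), '%s', '%s');\n" "$name" "$nationality"
}
```

The statement can then be passed to cqlsh with its -e option, which executes the given CQL and exits:

# cqlsh -e "$(insert_person_cql 'KARTHICK' 'Indian')"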

5.3. Querying the table

Use the SELECT statement to return one or more rows from a table.

cqlsh> SELECT * FROM ostechnix.sample_table;

Here, * returns all columns of every row in the table. Next, try to filter on the weight column, which is not part of the primary key:

cqlsh> SELECT * FROM ostechnix.sample_table WHERE weight = '85';

Sample output:

InvalidRequest: Error from server: code=2200 [Invalid query] message="Cannot execute this query as it might involve data filtering and thus may have unpredictable performance. If you want to execute this query despite the performance unpredictability, use ALLOW FILTERING"
cqlsh>

Cassandra refuses to run a query whose WHERE clause restricts a column that is neither part of the primary key nor indexed, and issues the above error suggesting 'ALLOW FILTERING'.

The reason for this error is that Cassandra cannot identify the node that holds the required rows unless the complete partition key is included in the WHERE clause. As a result, Cassandra would have to scan the entire dataset on every node to be sure it has found all the relevant data. If you accept that cost, for example on a small test table, append ALLOW FILTERING to the query:

cqlsh> SELECT * FROM ostechnix.sample_table WHERE weight = '85' ALLOW FILTERING;
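
For comparison, a query that restricts the primary key column id needs no ALLOW FILTERING, because Cassandra can locate the partition directly. The sketch below builds such a query from a shell script; the function name select_by_id_cql is hypothetical, and the format check is only a basic sanity test of the argument.

```shell
# Hypothetical helper: select_by_id_cql UUID
# Prints a SELECT that looks up one row of ostechnix.sample_table by its
# primary key. Returns 1 without printing a query if the argument does not
# look like a UUID.
select_by_id_cql() {
    if ! printf '%s\n' "$1" | grep -Eq '^[0-9a-fA-F]{8}(-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12}$'; then
        echo "not a UUID: $1" >&2
        return 1
    fi
    printf 'SELECT * FROM ostechnix.sample_table WHERE id = %s;\n' "$1"
}
```

For example, to fetch the second record inserted earlier:

# cqlsh -e "$(select_by_id_cql 5b6962dd-3f90-4c93-8f61-eabfa4a804e3)"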

6. Summary

In this article, we have gone through the Cassandra installation procedure and a few sample CQL commands. We will take a deeper dive into Cassandra operations in upcoming articles.

Rudhra Sivam
