1. Introduction
Apache Cassandra is an open-source, distributed NoSQL database of the wide-column store type, designed to handle massive amounts of data across many commodity servers with no single point of failure. It was originally developed at Facebook, is now maintained as a project of the Apache Software Foundation, and is written in Java. In this article, we will go through the step-by-step process of installing Cassandra on CentOS 7 Linux.
2. Pre-requisites
All commands given below should be run as the root user or with sudo.
2.1. Install Python 2.7
On CentOS 7, Python 2.7 comes pre-installed. If it's missing for some reason, you can use the following command to install it:
# yum -y install python
# python --version
Python 2.7.5
2.2. Install Java
Use the commands below to install the latest version of Java 8 and verify the installation.
# yum install java-1.8.0-openjdk-devel
# java -version
Sample output:
openjdk version "1.8.0_312"
OpenJDK Runtime Environment (build 1.8.0_312-b07)
OpenJDK 64-Bit Server VM (build 25.312-b07, mixed mode)
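If you want to check the Java version from a script rather than by eye, the banner above can be parsed. The following is a minimal sketch, assuming a Python 3 interpreter is available (the article itself only installs Python 2.7); the function name java_major_version is a hypothetical helper, not part of any Cassandra tooling.

```python
import re


def java_major_version(banner: str) -> int:
    """Return the major Java version found in `java -version` output."""
    match = re.search(r'version "(\d+)(?:\.(\d+))?', banner)
    if match is None:
        raise ValueError("no version string found in banner")
    first, second = match.group(1), match.group(2)
    # Java 8 and earlier report "1.8.0_312"; later releases report "11.0.13".
    if first == "1" and second is not None:
        return int(second)
    return int(first)


sample = (
    'openjdk version "1.8.0_312"\n'
    "OpenJDK Runtime Environment (build 1.8.0_312-b07)\n"
)
print(java_major_version(sample))  # 8
```

Note that java -version writes its banner to standard error, so a script that captures it would need to read stderr rather than stdout.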
3. How to install Cassandra
First, let us add the Cassandra repository. To do so, create a file named cassandra.repo under the /etc/yum.repos.d/ directory:
# vi /etc/yum.repos.d/cassandra.repo
Add the following lines to it:
[cassandra]
name=Apache Cassandra
baseurl=https://www.apache.org/dist/cassandra/redhat/40x/
gpgcheck=1
repo_gpgcheck=1
gpgkey=https://www.apache.org/dist/cassandra/KEYS
Press the ESC key and type :wq to save the file and close it.
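If you prefer not to edit the file interactively, the same content can be generated by a script. This is a sketch only, assuming Python 3; render_cassandra_repo is a hypothetical helper name, and the values are copied from the repository definition above.

```python
import configparser
import io


def render_cassandra_repo() -> str:
    """Build the text of cassandra.repo from the settings shown above."""
    config = configparser.ConfigParser(interpolation=None)
    config["cassandra"] = {
        "name": "Apache Cassandra",
        "baseurl": "https://www.apache.org/dist/cassandra/redhat/40x/",
        "gpgcheck": "1",
        "repo_gpgcheck": "1",
        "gpgkey": "https://www.apache.org/dist/cassandra/KEYS",
    }
    buffer = io.StringIO()
    # Write "key=value" without spaces, matching the hand-written file.
    config.write(buffer, space_around_delimiters=False)
    return buffer.getvalue()


if __name__ == "__main__":
    print(render_cassandra_repo(), end="")
```

Saved under a name of your choice (say, make_repo.py, a hypothetical filename), its output could be redirected as root into /etc/yum.repos.d/cassandra.repo.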
Verify that the Cassandra repository has been added. The command below lists the enabled repositories:
# yum repolist
After adding the repository, run the following command to install Cassandra on your CentOS system:
# yum -y install cassandra
Enable and start the Cassandra service:
# systemctl enable cassandra
cassandra.service is not a native service, redirecting to /sbin/chkconfig.
Executing /sbin/chkconfig cassandra on
# systemctl start cassandra
Check the status of the Cassandra service:
# systemctl status cassandra
Use the command below to view cluster details such as each node's state, load, and ID:
# nodetool status
Sample output:
Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address    Load       Tokens  Owns (effective)  Host ID                               Rack
UN  127.0.0.1  69.08 KiB  16      100.0%            bf2df7a9-54bc-41c9-8c6c-0b9322d10e71  rack1
In the output,
- UN - Up & Normal
- Address - IP Address of Node
- Load - The amount of file system data under the Cassandra data directory, excluding all content in the snapshots subdirectory. It is updated every 90 seconds.
- Tokens - The number of tokens that have been assigned to the node.
- Owns - The percentage of data the node owns, including replicas. For example, a node can own 33% of the ring but display 100% if the replication factor is 3.
- Host ID - The unique identifier (UUID) assigned to the node.
- Rack - The rack in which the node resides.
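The two-letter code at the start of each node line is the part most worth checking automatically. The following is a minimal sketch, assuming Python 3, that scans text in the format shown above and reports any node that is not UN. The function name unhealthy_nodes is hypothetical, and the second node in the sample (127.0.0.2 and its Host ID) is invented for illustration.

```python
import re

# A node line starts with Status (U/D) and State (N/L/J/M), then the address.
NODE_LINE = re.compile(r"^([UD])([NLJM])\s+(\S+)")

# Sample text; the 127.0.0.2 line is a made-up example of a down node.
SAMPLE = """\
Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address    Load       Tokens  Owns (effective)  Host ID                               Rack
UN  127.0.0.1  69.08 KiB  16      100.0%            bf2df7a9-54bc-41c9-8c6c-0b9322d10e71  rack1
DN  127.0.0.2  70.12 KiB  16      100.0%            00000000-0000-0000-0000-000000000002  rack1
"""


def unhealthy_nodes(status_output: str) -> list:
    """Return (address, code) pairs for nodes whose code is not UN."""
    problems = []
    for line in status_output.splitlines():
        match = NODE_LINE.match(line.strip())
        if match is None:
            continue  # header, separator, or legend line
        code = match.group(1) + match.group(2)
        if code != "UN":
            problems.append((match.group(3), code))
    return problems


print(unhealthy_nodes(SAMPLE))  # [('127.0.0.2', 'DN')]
```

In practice the text would come from running nodetool status and capturing its output; the sketch deliberately ignores the other columns.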
4. Cqlsh – CLI for Cassandra
cqlsh is a command-line interface for interacting with Cassandra using CQL (Cassandra Query Language). It is included in every Cassandra package and can be found alongside the cassandra executable in the bin/ directory. cqlsh is implemented with the Python native protocol driver and connects to a single node.
To launch cqlsh, run:
# cqlsh
Sample output:
Connected to Test Cluster at 127.0.0.1:9042
[cqlsh 6.0.0 | Cassandra 4.0.1 | CQL spec 3.4.5 | Native protocol v5]
Use HELP for help.
cqlsh>
5. CQL Sample commands
5.1. Create Key Space
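In Cassandra, a keyspace serves as a data container, similar to a database in relational database management systems (RDBMS). The example below creates a keyspace named OsTechNix; because unquoted identifiers in CQL are case-insensitive, it is stored in lowercase as ostechnix.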
cqlsh> CREATE KEYSPACE IF NOT EXISTS OsTechNix WITH REPLICATION = { 'class' : 'NetworkTopologyStrategy', 'datacenter1' : 3 };
cqlsh>
Check the keyspaces in the system using the command below:
cqlsh> SELECT * FROM system_schema.keyspaces;
To list only the keyspace names, run:
cqlsh> desc keyspaces;
All the keyspaces on the cluster will be listed:
ostechnix system_auth system_schema system_views system system_distributed system_traces system_virtual_schema
5.2. Create table and insert sample data
You can use the CREATE TABLE statement to define a table, specifying a data type for each column as you would in an RDBMS. Data is stored in CQL tables as rows of columns, much like in SQL.
You must define a primary key along with the other columns when creating a table. Follow the example below:
cqlsh> CREATE TABLE ostechnix.sample_table ( id UUID PRIMARY KEY, name text, birthday timestamp, nationality text, weight text, height text );
cqlsh>
Use the INSERT statement to insert sample data into the ostechnix.sample_table table that we created above. In the example below, two records are added to the table.
cqlsh> INSERT INTO ostechnix.sample_table (id, name, nationality) VALUES (5b6962dd-3f90-4c93-8f61-eabfa4a803e2, 'KARTHICK', 'Indian');
cqlsh> INSERT INTO ostechnix.sample_table (id, name, nationality, weight) VALUES (5b6962dd-3f90-4c93-8f61-eabfa4a804e3, 'MOHAN', 'Indian', '85');
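The id values above are hand-written UUIDs. If you need fresh ones, a script can generate them and print a statement ready to paste into cqlsh. This is a sketch only, assuming Python 3; cql_quote and insert_statement are hypothetical helper names, and building statements as strings like this is only appropriate for trusted values typed by you.

```python
import uuid


def cql_quote(text: str) -> str:
    """Quote a string literal for CQL; a single quote is escaped by doubling."""
    return "'" + text.replace("'", "''") + "'"


def insert_statement(name: str, nationality: str) -> str:
    """Build an INSERT for ostechnix.sample_table with a random (v4) UUID."""
    row_id = uuid.uuid4()
    return (
        "INSERT INTO ostechnix.sample_table (id, name, nationality) "
        f"VALUES ({row_id}, {cql_quote(name)}, {cql_quote(nationality)});"
    )


print(insert_statement("KARTHICK", "Indian"))
```

Each run prints a different id, so the output is not reproduced here. CQL also provides a built-in uuid() function that can be used directly in the VALUES list if you do not need to know the id in advance.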
5.3. Querying the table
Use the SELECT statement to return one or more rows from a table.
cqlsh> SELECT * FROM ostechnix.sample_table;
Here, * returns all columns of every row in the table. Now try to filter on the weight column, which is not part of the primary key:
cqlsh> SELECT * FROM ostechnix.sample_table WHERE weight = '85';
Sample output:
InvalidRequest: Error from server: code=2200 [Invalid query] message="Cannot execute this query as it might involve data filtering and thus may have unpredictable performance. If you want to execute this query despite the performance unpredictability, use ALLOW FILTERING"
cqlsh>
Cassandra refuses to run a query whose WHERE clause restricts a column that is neither part of the primary key nor indexed, and returns the above error suggesting the use of ALLOW FILTERING.
The reason for this error is that Cassandra cannot identify the node that holds the required rows unless the complete partition key is included in the WHERE clause. As a result, it would have to scan the entire dataset on each node to be sure it has found all the relevant data. If you accept that cost, append ALLOW FILTERING to the query:
cqlsh> SELECT * FROM ostechnix.sample_table WHERE weight = '85' ALLOW FILTERING;
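The difference between a partition-key lookup and a filtered scan can be illustrated with a toy model. The sketch below, assuming Python 3, is not how Cassandra is implemented: ToyCluster is a hypothetical class, and the modulo placement rule is a stand-in for Cassandra's real partitioner. It only shows why one query touches a single node while the other must visit all of them.

```python
import uuid


class ToyCluster:
    """Toy model: each row lives on a node chosen from its partition key."""

    def __init__(self, node_count: int):
        self.nodes = [dict() for _ in range(node_count)]

    def _node_for(self, row_id: uuid.UUID) -> int:
        # Stand-in for the partitioner; not Cassandra's real token algorithm.
        return row_id.int % len(self.nodes)

    def insert(self, row_id: uuid.UUID, row: dict) -> None:
        self.nodes[self._node_for(row_id)][row_id] = row

    def get_by_id(self, row_id: uuid.UUID):
        """Partition-key lookup: exactly one node is consulted."""
        node = self.nodes[self._node_for(row_id)]
        return node.get(row_id), 1

    def filter_by(self, column: str, value: str):
        """Non-key filter: every row on every node must be inspected."""
        matches, nodes_scanned = [], 0
        for node in self.nodes:
            nodes_scanned += 1
            matches.extend(r for r in node.values() if r.get(column) == value)
        return matches, nodes_scanned


cluster = ToyCluster(node_count=3)
karthick = uuid.UUID("5b6962dd-3f90-4c93-8f61-eabfa4a803e2")
mohan = uuid.UUID("5b6962dd-3f90-4c93-8f61-eabfa4a804e3")
cluster.insert(karthick, {"name": "KARTHICK", "nationality": "Indian"})
cluster.insert(mohan, {"name": "MOHAN", "nationality": "Indian", "weight": "85"})

row, touched = cluster.get_by_id(mohan)
rows, scanned = cluster.filter_by("weight", "85")
print(touched, scanned)  # 1 3
```

With three toy nodes, the lookup by id consults one node while the filter on weight visits all three, which is the cost that ALLOW FILTERING asks you to accept explicitly.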
6. Summary
In this article, we went through the Cassandra installation procedure and a few sample CQL commands. We will take a deeper dive into Cassandra operations in upcoming articles.