Are you working on a writing project and need to keep track of your character and word counts? Or maybe you're a developer who wants to add file analysis capabilities to your Python scripts? Either way, you can use a simple Python script to quickly get the character and word counts for any text file.
The basic idea is straightforward. You open the file, read its contents, and then use Python's built-in functions to count the characters and words.
In this article, we'll walk through how to create a simple Python script to count both words and characters in a text file. We'll also show how to handle newline characters, so you can choose whether to include them in your character count.
Let us get started.
Counting Characters
Character counting in Python involves counting every character in the text, including spaces, punctuation, and (optionally) newline characters. The len() function makes it easy to count characters.
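A quick check in the interpreter shows what len() counts. The sample strings below are arbitrary. One caveat: len() counts Unicode code points, so an accented letter stored as a base letter plus a combining accent counts as two characters even though it displays as one. If your files may contain such text, one option is to normalize to NFC with the standard unicodedata module before counting.

```python
import unicodedata

# len() counts every character: letters, spaces, punctuation and newlines.
line = "Hi, there!\n"
print(len(line))  # 11

# len() counts Unicode code points, not what a reader sees as one letter.
composed = "caf\u00e9"     # 'é' stored as one code point (U+00E9)
decomposed = "cafe\u0301"  # 'e' plus a combining acute accent (U+0301)
print(len(composed), len(decomposed))  # 4 5

# NFC normalization merges the pair into a single code point.
print(len(unicodedata.normalize("NFC", decomposed)))  # 4
```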
Example 1: Including Newlines
To get the total character count, including newline characters, you can use the len() function on the file contents:
with open('file.txt', 'r') as file:
    contents = file.read()
    char_count = len(contents)
This will give you the full character count, counting each newline character as a single character.
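Why does each line break count as one character even in files created on Windows, where lines end in \r\n? In text mode, Python's open() uses universal newlines by default, so \r\n (and a lone \r) is translated to \n while reading. Passing newline='' turns that translation off. The sketch below writes a small temporary file to show the difference; the file contents are made up for the demonstration, and UTF-8 encoding is an assumption.

```python
import os
import tempfile

# Create a small file with Windows-style (\r\n) line endings.
with tempfile.NamedTemporaryFile("wb", delete=False, suffix=".txt") as tmp:
    tmp.write(b"one\r\ntwo\r\n")
    path = tmp.name

try:
    # Default text mode: \r\n is translated to \n while reading.
    with open(path, "r", encoding="utf-8") as file:
        translated = file.read()

    # newline='': line endings are returned exactly as stored on disk.
    with open(path, "r", encoding="utf-8", newline="") as file:
        raw = file.read()

    print(repr(translated), len(translated))  # 'one\ntwo\n' 8
    print(repr(raw), len(raw))                # 'one\r\ntwo\r\n' 10
finally:
    os.remove(path)
```

With the default translation, a Windows line ending adds one character to the count; with newline='' it adds two.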
Example 2: Excluding Newlines
If you want to exclude the newline characters from the count, you can use the replace() method to remove them before getting the length:
with open('file.txt', 'r') as file:
    contents = file.read()
    char_count = len(contents.replace('\n', ''))
Now the character count excludes the newline characters; spaces, tabs, and punctuation are still counted.
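One limitation of replace('\n', ''): it removes only \n. If the text still contains Windows line endings (for example, because the file was opened with newline='' or the string came from somewhere other than a text-mode read), each \r stays in the count. A sketch that drops every line terminator is to add up the lengths of the lines returned by str.splitlines(); the sample string is arbitrary.

```python
text = "one\r\ntwo\r\n"  # Windows line endings kept as-is

# replace() removes only '\n', so each '\r' is still counted.
print(len(text.replace("\n", "")))  # 8

# splitlines() strips all line terminators (\n, \r\n, \r and a few others).
print(sum(len(line) for line in text.splitlines()))  # 6
```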
Counting Words
Counting the number of words in a text file is just as simple. Python's split() method breaks a string into a list of words based on whitespace. By counting the length of this list, you can easily find the number of words in the text.
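It matters that split() is called with no arguments. In that form it treats any run of whitespace as a single separator and ignores leading and trailing whitespace. Splitting on a single space character instead produces empty strings that would inflate the word count. The sample string below is arbitrary.

```python
text = "  Hello   world\n"

# No argument: runs of spaces, tabs and newlines act as one separator.
print(text.split())     # ['Hello', 'world']

# Explicit ' ': every single space is a separator, so empty strings appear
# and the trailing newline stays attached to the last word.
print(text.split(" "))  # ['', '', 'Hello', '', '', 'world\n']

print(len(text.split()), len(text.split(" ")))  # 2 6
```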
To get the word count, split the file contents on whitespace (spaces, tabs, newlines) and count the resulting list of words:
with open('file.txt', 'r') as file:
    contents = file.read()
    word_count = len(contents.split())
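This will give you the total number of words in the file. Here a "word" is any run of non-whitespace characters, so a hyphenated term such as "well-known" counts as one word.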
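Reading the whole file with read() is fine for typical documents. For very large files you may prefer to process one line at a time, so the entire file never sits in memory. Because a newline is whitespace, no word can span two lines, so summing the per-line counts gives the same totals as the read() approach. This is a sketch that assumes UTF-8 text; count_words_chars_streaming is a hypothetical name.

```python
def count_words_chars_streaming(file_path):
    """Count words and characters one line at a time.

    Each line keeps its trailing newline, so newline characters are
    included in the character count, matching len(file.read()).
    """
    word_count = 0
    char_count = 0
    with open(file_path, "r", encoding="utf-8") as file:
        for line in file:  # the file object yields one line per iteration
            word_count += len(line.split())
            char_count += len(line)
    return word_count, char_count
```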
Let's combine these snippets into a single Python script that displays the total word and character counts for a given file.
Complete Python Script to Count Characters and Words in a Text File
The complete Python script that displays both the character and word counts for a given text file is available on our GitHub Gist page:
#!/usr/bin/env python

# ------------------------------------------------------------------
# Script Name: txtcwcount.py
# Description: A Python Script to Count Characters and Words
#              in a Plain Text File.
# Website:     https://gist.github.com/ostechnix
# Version:     1.0
# Usage:       python txtcwcount.py filename
# ------------------------------------------------------------------

import sys


def count_words_chars(file_path):
    with open(file_path, 'r') as file:
        contents = file.read()
        # Words are runs of non-whitespace characters.
        word_count = len(contents.split())
        # Every character is counted, including spaces and newlines.
        char_count = len(contents)
    return word_count, char_count


def main():
    if len(sys.argv) < 2:
        print("Usage: python txtcwcount.py <file_path>")
        sys.exit(1)

    file_path = sys.argv[1]
    word_count, char_count = count_words_chars(file_path)

    print("=" * 50)
    print(f"File: {file_path}")
    print(f"Character Count: {char_count}")
    print(f"Word Count: {word_count}")
    print("=" * 50)


if __name__ == "__main__":
    main()
To use this script, either download the txtcwcount.py file from our GitHub Gist page, or copy the code above, save it to a file (e.g., txtcwcount.py), and run it from the command line, passing the file path as an argument:
python txtcwcount.py file.txt
The script will then display the character count (including newlines) and the word count for the specified file.
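Sample Output:
==================================================
File: file.txt
Character Count: 23
Word Count: 6
==================================================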
As you can see, there are 23 characters and 6 words in the file.txt file.
Let us verify the content of the file.txt file:
Count the characters (including spaces and newlines) and the words, and check that the totals match the script's output.
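Counting by hand gets error-prone for anything longer than a line or two. The same arithmetic can be checked on a short string whose contents you know; the two-line sample below is hypothetical and is not the article's file.txt. It also shows that the counts with and without newlines differ by exactly the number of newline characters in the text.

```python
# Hypothetical two-line sample, not the article's file.txt.
text = "The quick brown fox\njumps high\n"

with_newlines = len(text)                       # every character
without_newlines = len(text.replace("\n", ""))  # newlines removed
words = len(text.split())                       # whitespace-separated words

print(with_newlines, without_newlines, words)  # 31 29 6

# Removing newlines lowers the count by exactly the number of newlines.
print(with_newlines - without_newlines == text.count("\n"))  # True
```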
As stated already, to exclude the newline characters from the character count, we can use the len(contents.replace('\n', '')) approach instead of just len(contents).
The key change is in the count_words_chars() function in the script:
char_count = len(contents.replace('\n', ''))
This replaces all newline characters (\n) with an empty string, effectively removing them from the character count calculation.
The output will now look like this:
==================================================
File: file.txt
Character Count: 21
Word Count: 6
==================================================
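The script above always includes newlines, so switching behavior means editing the code. One way to make it a command-line choice is Python's standard argparse module. The sketch below is our own variant, not the version on the gist: the --exclude-newlines flag name is an assumption, and it also reports unreadable files with a short message instead of a traceback. Note that the word count is taken before newlines are removed; deleting them first would glue the last word of one line to the first word of the next.

```python
import argparse
import sys


def count_words_chars(file_path, include_newlines=True):
    """Return (word_count, char_count) for the text file at file_path."""
    with open(file_path, "r") as file:
        contents = file.read()
    # Count words first: removing newlines would join words across lines.
    word_count = len(contents.split())
    if not include_newlines:
        contents = contents.replace("\n", "")
    char_count = len(contents)
    return word_count, char_count


def main(argv=None):
    parser = argparse.ArgumentParser(
        description="Count characters and words in a plain text file.")
    parser.add_argument("file_path", help="path to the text file")
    parser.add_argument("--exclude-newlines", action="store_true",
                        help="do not count newline characters")
    args = parser.parse_args(argv)  # argv=None means use sys.argv[1:]

    try:
        word_count, char_count = count_words_chars(
            args.file_path, include_newlines=not args.exclude_newlines)
    except OSError as err:
        # Missing file, a directory instead of a file, permission denied, ...
        print(f"Error: {err}", file=sys.stderr)
        return 1

    print("=" * 50)
    print(f"File: {args.file_path}")
    print(f"Character Count: {char_count}")
    print(f"Word Count: {word_count}")
    print("=" * 50)
    return 0
```

To run it as a script, save it (for example as txtcwcount.py) and finish the file with the usual if __name__ == "__main__": guard calling sys.exit(main()). Then python txtcwcount.py --exclude-newlines file.txt prints the newline-free count, and leaving the flag off keeps the default behavior.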
The Default Approach: Counting Characters Including Newlines
In general, tools that count characters include newline characters by default: Python's len() counts them, and so does the wc -m command on Unix-like systems.
The rationale behind this is that newline characters are part of the file's contents and should be accounted for when determining the total character count. They represent important information about the structure and formatting of the text, and excluding them from the character count would not provide the full picture.
However, there are valid use cases where you might want to exclude the newline characters from the character count, especially when the newline characters are not the primary focus of the analysis. For example, if you're interested in the visual length of the text content, excluding the newline characters might be more appropriate.
In this tutorial, I showed examples of both including and excluding newline characters in the character count. This gives you the flexibility to choose the approach that best suits your needs. If you don't have a specific requirement to exclude the newline characters, the default and more common approach would be to include them in the character count.
Conclusion
This Python script makes it easy to count the number of words and characters in a text file, with the option to include or exclude newline characters based on your needs.
Whether you're a writer, a developer, or someone who needs to analyze text files, this script can be a valuable tool in your toolkit. Give it a try and let me know if you have any questions!