In Python, all strings are sequences of Unicode characters. There is no such thing as a "UTF-8 encoded Python string" or a "CP-1252 encoded Python string," so asking "Is this string UTF-8?" is not a meaningful question. UTF-8 is one way of encoding characters as a sequence of bytes. If you want to convert a string into a byte sequence with a specific character encoding, Python helps you do that. Similarly, if you want to convert a byte sequence into a string, Python assists there as well. Bytes are not characters; bytes are bytes. Characters are an abstraction, and a string is a sequence of these abstractions.
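The difference between characters and bytes can be seen in a minimal sketch. The word and the variable names are illustrative assumptions, not part of the original text:

```python
# Illustrative word; "\u00e9" is the character "é"
word = "caf\u00e9"

utf8_bytes = word.encode("utf-8")      # "é" becomes two bytes in UTF-8
latin1_bytes = word.encode("latin-1")  # "é" becomes one byte in Latin-1

print(len(word))          # 4 characters
print(len(utf8_bytes))    # 5 bytes
print(len(latin1_bytes))  # 4 bytes
print(utf8_bytes.decode("utf-8") == word)  # True
```

The string has the same four characters in every case; only the byte representation depends on the encoding.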

To create a string, enclose it in quotes. Python strings can be defined using single quotes (') or double quotes (").

s = 'Dublin'
c = "Ireland"

print(s, c) # Dublin Ireland

Strings enclosed in triple quotes can span multiple lines.

print('''this is Dublin
it is in Ireland
''')
# this is Dublin
# it is in Ireland

Triple quotes are often used to represent multi-line strings and docstrings.

print('''Lorem ipsum dolor sit amet
Test
''')

Function docstrings are also strings. They can span multiple lines, so triple quotes are used to start and end the string.

def x():
    '''
    Test docstring
    :return: 
    '''

def y():
    """
    Test
    :return:
    """

print(x.__doc__)
print(y.__doc__)

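String Interpolation with f-strings

F-strings (formatted string literals, available since Python 3.6) are the most concise way to perform string interpolation. Prefix the string with f and place expressions inside curly braces.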

city = "Dublin"
country = "Republic of Ireland"

print(f"{city} is the capital of {country}")

# Output:
# Dublin is the capital of Republic of Ireland

With f-strings, you can perform calculations or call methods inline.

a = 32
b = "dublin"

print(f"Number is {a * 2}") # Number is 64
print(f"Formatted name is {b.capitalize()}") # Formatted name is Dublin
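F-strings also accept a format specifier after a colon and a conversion flag such as !r. The following is a small sketch with made-up values, not an exhaustive list of options:

```python
price = 12000.314
city = "Dublin"

formatted_price = f"{price:,.2f}"  # thousands separator, two decimals
padded_city = f"{city:>10}"        # right-aligned in a field of width 10
debug_city = f"{city!r}"           # repr() of the value, quotes included

print(formatted_price)  # 12,000.31
print(padded_city)      #     Dublin
print(debug_city)       # 'Dublin'
```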

String Formatting

Python supports formatting values into strings. This allows embedding values into strings using placeholders.

(city, country) = "Dublin", "Republic of Ireland"

# We can directly send values to placeholders
print("{0} is in {1}".format(city, country)) # Dublin is in Republic of Ireland

# We can give them a name
print("{ci} is in {co}".format(ci=city, co=country)) # Dublin is in Republic of Ireland

# We can also use indexes
cities = ['Dublin', 'London']
print("{0[0]} is the capital of Ireland. {0[1]} is the capital of England".format(cities)) 
# Dublin is the capital of Ireland. London is the capital of England

You can format numbers using format specifiers.

print("{0:.2f}".format(10000.12)) # 10000.12

print("{:,}".format(1228312)) # 1,228,312

print("{0:,.2f}".format(12000.31)) # 12,000.31
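A few more format specifiers, sketched with illustrative values. The same specifiers work in f-strings:

```python
percent = "{:.1%}".format(0.256)   # percentage with one decimal
zero_padded = "{:05d}".format(42)  # pad with zeros to width 5
centered = "{:^9}".format("abc")   # center in a field of width 9
hexadecimal = "{:x}".format(255)   # lowercase hexadecimal

print(percent)      # 25.6%
print(zero_padded)  # 00042
print(centered)     #    abc
print(hexadecimal)  # ff
```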

Access a Character

You can access individual characters using indexing and a sequence of characters using slicing. Indexing starts at 0, and negative indexes count from the end of the string. Accessing an index outside the valid range raises an IndexError. The index must be an integer; using a float or another type raises a TypeError.

c = "Dublin"
print(c[0]) # D
print(c[-1]) # n
print(c[1:3]) # ub
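Slices also take an optional step, and unlike indexing they do not raise an error when the bounds exceed the string's length. A short sketch (the variable names are illustrative):

```python
c = "Dublin"

every_second = c[::2]  # step of 2
reversed_c = c[::-1]   # negative step walks backwards
clipped = c[2:100]     # slice bounds are clipped, no error

print(every_second)  # Dbi
print(reversed_c)    # nilbuD
print(clipped)       # blin

try:
    c[10]
except IndexError as error:
    message = str(error)
    print(message)  # string index out of range
```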

Deleting or Changing a Character

Strings are immutable, meaning their elements cannot be modified after assignment. However, you can assign a new string to the same name.

c = "Dublin"
c[0] = "D" # Raises an error: TypeError: 'str' object does not support item assignment

You cannot delete or remove individual characters in a string.

c = "Dublin"

del c[0] # Raises an error: TypeError: 'str' object doesn't support item deletion

However, you can delete the name that refers to the string using the del statement. After that, the name is no longer defined.

c = "New York"

del c

print(c) # NameError: name 'c' is not defined
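Because strings are immutable, the usual way to "change" or "remove" a character is to build a new string. A minimal sketch using slicing and replace(), with illustrative variable names:

```python
c = "dublin"

capitalized = "D" + c[1:]       # new string with the first character replaced
without_first = c[1:]           # new string without the first character
without_b = c.replace("b", "")  # new string without any "b"

print(capitalized)    # Dublin
print(without_first)  # ublin
print(without_b)      # dulin
print(c)              # dublin (the original is unchanged)
```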

Concatenation of Two or More Strings

Concatenation means joining two or more strings into a single string.

In Python, the + operator is used for concatenation. Placing two string literals next to each other also concatenates them.

c = "Tokyo"
x = "Japan"

print(c + " is in " + x) # Tokyo is in Japan
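Adjacent-literal concatenation works only for literals, not for variables. When joining many pieces, str.join() is a common alternative to repeated +. A brief sketch with illustrative names:

```python
# Adjacent string literals are merged into one string
greeting = "Tokyo " "is in " "Japan"

# join() builds one string from a list of pieces
parts = ["Tokyo", "is", "in", "Japan"]
sentence = " ".join(parts)

print(greeting)  # Tokyo is in Japan
print(sentence)  # Tokyo is in Japan
```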

* Operator

The * operator can be used to repeat a string a specified number of times.

print("*" * 10) # **********
print('-' * 15) # ---------------

% Operator

The % operator provides a way to format strings using placeholders.

(city, country) = "Paris", "France"

print("%s is in %s" % (city, country)) # Paris is in France
print("The city is %s " % city) # The city is Paris 

s = "Hello %s, %s" % (city, country)

print(s) # Hello Paris, France

Iterating through a String

You can iterate over a string using a for loop.

c = "Python"

for i in c:
    print(i)

Alternatively, you can iterate over it by using indexes:

c = "Python"

for i in range(len(c)):
    print(c[i])
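When both the index and the character are needed, the built-in enumerate() is usually preferred over range(len(...)). A small sketch:

```python
c = "Python"

for index, char in enumerate(c):
    print(index, char)  # 0 P, then 1 y, and so on

pairs = list(enumerate(c))
print(pairs[:2])  # [(0, 'P'), (1, 'y')]
```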

String Membership Tests

You can check if a substring exists within a string using the in keyword.

c = "Python"

print("yt" in c) # True
print("nx" in c) # False
print("Xyt" not in c) # True

Escape Sequences

If you want to print the following text: He said, "What's there?", using both single and double quotes directly will cause a SyntaxError since the text itself contains both types of quotes.

print("He said, "What's there?"") # SyntaxError

You can solve this issue by using triple quotes or escape sequences.

print('''He said, "What's there?"''') # He said, "What's there?"

print('He said, "What\'s there?"') # He said, "What's there?"

print("He said, \"What's there?\"") # He said, "What's there?"
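Escape sequences are not limited to quotes. A short sketch of a few other common ones, with illustrative values:

```python
lines = "first\nsecond"  # \n is a newline
columns = "name\tvalue"  # \t is a tab
backslash = "C:\\temp"   # \\ is a single backslash

print(lines)
# first
# second
print(len(lines))      # 12 (the newline counts as one character)
print(backslash)       # C:\temp
print(len(backslash))  # 7
```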

Encoding and Decoding

In Python, strings are Unicode by default, which means they represent text as a sequence of Unicode characters. However, sometimes you may need to work with encoded representations of strings, such as when reading or writing files, sending data over a network, or interacting with APIs that require specific encodings. Encoding means converting a string into a sequence of bytes, while decoding is converting bytes back into a string.

To convert a string into bytes, use the .encode() method with the desired encoding format (e.g., utf-8, ascii, etc.). If no encoding is specified, utf-8 is used by default.

text = "Python"

encoded_text = text.encode("utf-8")

print(encoded_text)  # Output: b'Python'
print(type(encoded_text))  # Output: <class 'bytes'>

To convert a byte sequence back into a string, use the .decode() method with the matching encoding.

text = "Python"

encoded_text = text.encode("utf-8")

decoded_text = encoded_text.decode("utf-8")

print(decoded_text)  # Output: Python
print(type(decoded_text))  # Output: <class 'str'>
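With non-ASCII text, the choice of encoding starts to matter. The sketch below uses an illustrative city name and shows what happens when the target encoding cannot represent a character, and when bytes are decoded with the wrong encoding:

```python
text = "Z\u00fcrich"  # "Zürich"

utf8 = text.encode("utf-8")
print(utf8)  # b'Z\xc3\xbcrich'

# ASCII cannot represent "ü", so a strict encode fails
try:
    text.encode("ascii")
    ascii_failed = False
except UnicodeEncodeError:
    ascii_failed = True
print(ascii_failed)  # True

# errors="replace" substitutes "?" for unencodable characters
replaced = text.encode("ascii", errors="replace")
print(replaced)  # b'Z?rich'

# Decoding with a mismatched encoding may not fail, but gives wrong text
wrong = utf8.decode("latin-1")
print(wrong == text)  # False
```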

Raw Strings

In Python, strings often contain special characters or escape sequences (e.g., \n for a newline or \t for a tab). If you need to include backslashes or prevent escape sequences from being interpreted, raw strings provide a convenient solution.

A raw string is prefixed with r or R. In a raw string, backslashes (\) are treated as literal characters rather than escape characters.

Raw strings are commonly used for regular expressions and Windows-style file paths, as they eliminate the need to escape backslashes.

# Regular string
path = "C:\\Users\\Name\\Documents"
print(path)  # Output: C:\Users\Name\Documents

# Raw string
raw_path = r"C:\Users\Name\Documents"
print(raw_path)  # Output: C:\Users\Name\Documents

# Raw string as a regular expression pattern
import re

pattern = r"\d+"  # Matches one or more digits
result = re.findall(pattern, "123 apples and 456 oranges")
print(result)  # Output: ['123', '456']
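The effect of the r prefix is easiest to see by comparing lengths. A minimal sketch:

```python
normal = "\n"  # one character: a newline
raw = r"\n"    # two characters: a backslash and the letter n

print(len(normal))    # 1
print(len(raw))       # 2
print(raw == "\\n")   # True
```

One caveat: a raw string literal cannot end with a single backslash, so a path with a trailing backslash still needs a regular string or concatenation.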

Methods

Python provides many methods for working with strings. Below, you can find the most common string methods along with examples of their usage.

capitalize()

Returns a string with the first character capitalized and the rest lowercased.

x = "i like python"

print(x.capitalize()) # I like python

center(width[, fillchar])

Returns a string of a specified width, centered and padded with the specified fill character.

x = ' python '

print(x.center(30, '*')) # *********** python ***********

count(sub[, start[, end]])

Returns the number of non-overlapping occurrences of a substring in the string. Optional start and end parameters define the range to search.

txt = "I love Python programming and reading about Python language."

print(txt.count("Python"))  # Output: 2
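A short sketch of the start parameter and of how overlapping matches are treated. The second string is an illustrative assumption:

```python
txt = "I love Python programming and reading about Python language."

from_index_10 = txt.count("Python", 10)  # skips the first "Python" at index 7
overlapping = "aaaa".count("aa")         # matches do not overlap

print(from_index_10)  # 1
print(overlapping)    # 2
```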

endswith(suffix)

Checks if the string ends with the specified suffix. Returns True if it does, otherwise False.

file = "app.py"

print(file.endswith(".py"))  # True
print(file.endswith(".html"))  # False
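endswith() also accepts a tuple of suffixes and returns True if any of them matches. A brief sketch with a hypothetical file name:

```python
file = "photo.jpeg"
image_suffixes = (".png", ".jpg", ".jpeg")

is_image = file.endswith(image_suffixes)
is_script = file.endswith((".py", ".sh"))

print(is_image)   # True
print(is_script)  # False
```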

find(sub[, start[, end]])

Returns the lowest index of the substring if found. If not found, returns -1. Optional start and end parameters define the range to search.

txt = "My favorite language is Python!"

print(txt.find("Python"))  # 24
print(txt.find("Java"))  # -1
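The start parameter makes it possible to locate every occurrence, not only the first. One possible sketch, using an illustrative sentence and a loop that stops when find() returns -1:

```python
txt = "Python is my favorite programming language and I am learning Python now."

positions = []
start = 0
while True:
    index = txt.find("Python", start)
    if index == -1:  # no further occurrence
        break
    positions.append(index)
    start = index + 1  # continue searching after this match

print(positions)  # [0, 61]
```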

index(sub[, start[, end]])

Returns the index of the substring if found. If not found, raises a ValueError. Optional start and end parameters define the range to search.

txt = "My favorite language is Python!"

print(txt.index("Python"))  # 24
print(txt.index("Java"))  # ValueError: substring not found

join(iterable)

Joins the elements of an iterable into a single string, separated by the string.

print(",".join(("PHP", "Java", "Python"))) # PHP,Java,Python

lower()

Returns a copy of the string with all uppercase characters converted to lowercase.

txt = "I love Python programming!"

print(txt.lower()) # i love python programming!

lstrip([chars])

Returns a copy of the string with the specified characters removed from the beginning. The chars argument is treated as a set of characters to strip, not as a prefix. If no argument is provided, it removes leading whitespace.

txt = "    Python"
txt2 = "----Python"

print(txt.lstrip()) # Python
print(txt2.lstrip("-")) # Python
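Since the argument is a set of characters, lstrip() can remove more than intended. The sketch below uses an illustrative host name and contrasts it with removeprefix(), which assumes Python 3.9 or later:

```python
url = "www.wikipedia.org"

stripped = url.lstrip("w.")             # strips every leading "w" and "."
without_prefix = url.removeprefix("www.")  # removes the exact prefix only

print(stripped)        # ikipedia.org
print(without_prefix)  # wikipedia.org
```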

replace(old, new[, count])

Returns a copy of the string with all occurrences of a substring (old) replaced with another substring (new). If the optional count is given, only the first count occurrences are replaced.

txt = "I love XX programming"

print(txt.replace("XX", "Python")) # I love Python programming
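replace() accepts an optional third argument that limits the number of replacements. A minimal sketch with an illustrative string:

```python
txt = "a-b-c-d"

all_replaced = txt.replace("-", "+")       # every occurrence
first_two = txt.replace("-", "+", 2)       # only the first two

print(all_replaced)  # a+b+c+d
print(first_two)     # a+b+c-d
```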

rfind(sub[, start[, end]])

Returns the highest index of the substring if found. If not found, returns -1.

txt = "Python is my favorite programming language and I am learning Python now."

print(txt.find("Python"))  # 0
print(txt.rfind("Python"))  # 61

rindex(sub[, start[, end]])

Returns the highest index of the substring if found. If not found, raises a ValueError.

txt = "Python is my favorite programming language and I am learning Python now."

print(txt.rindex("Python"))  # 61
print(txt.rindex("Pythonx"))  # ValueError: substring not found

rstrip([chars])

Returns a copy of the string with the specified characters removed from the end. If no argument is provided, it removes trailing whitespace.

txt = "Python-----"

print(txt.rstrip('-')) # Python

split(sep=None, maxsplit=-1)

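Splits the string into a list of substrings using the specified separator (sep). If sep is omitted or None, the string is split on runs of whitespace. maxsplit defines the maximum number of splits; the default of -1 means no limit.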

languages = "Python, Java, C++, JavaScript, Ruby"

print(languages.split(", ")) # ['Python', 'Java', 'C++', 'JavaScript', 'Ruby']
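A short sketch of maxsplit and of the difference between the default whitespace splitting and an explicit separator. The extra strings are illustrative:

```python
languages = "Python, Java, C++, JavaScript, Ruby"

limited = languages.split(", ", 2)       # at most two splits
by_whitespace = "  a   b \n c ".split()  # runs of whitespace, no empty items
by_comma = "a,,b".split(",")             # explicit separator keeps empty items

print(limited)        # ['Python', 'Java', 'C++, JavaScript, Ruby']
print(by_whitespace)  # ['a', 'b', 'c']
print(by_comma)       # ['a', '', 'b']
```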

startswith(prefix)

Checks if the string starts with the specified prefix. Returns True if it does, otherwise False.

name = "Python"

print(name.startswith("P"))  # True
print(name.startswith("n"))  # False

strip([chars])

Returns a copy of the string with the specified characters removed from both ends. If no argument is provided, it removes leading and trailing whitespace.

txt = "-----Python-----"

print(txt.strip('-')) # Python

title()

Returns a title-cased version of the string, where the first character of each word is uppercase.

txt = "i love python programming!"

print(txt.title()) # I Love Python Programming!
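title() treats any run of letters as a word, so apostrophes produce a perhaps unexpected result. The sketch below uses an illustrative sentence and compares it with string.capwords() from the standard library, which splits on whitespace only:

```python
import string

txt = "they're learning python"

titled = txt.title()              # the letters after the apostrophe start a new "word"
capworded = string.capwords(txt)  # capitalizes whitespace-separated words

print(titled)     # They'Re Learning Python
print(capworded)  # They're Learning Python
```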

upper()

Returns a copy of the string with all lowercase characters converted to uppercase.

txt = "i love python programming!"

print(txt.upper()) # I LOVE PYTHON PROGRAMMING!

Conclusion

Strings are a key part of Python and are used in almost every application, from processing text to managing data and files. In this article, we covered the basics of creating, formatting, and manipulating strings, as well as more advanced topics like encoding, decoding, and raw strings. With these tools, you’ll have a strong foundation for working with text in Python.

To deepen your knowledge, you can explore the official Python string documentation for a complete list of methods and features. If you’re interested in formatting strings specifically, the f-string documentation provides detailed guidance. With practice and continued learning, you’ll be able to handle any text-related task efficiently.

PUBLISHED 26 January 2025