Chapter 16 - Strings and Runes

Strings and runes are fundamental components of programming, especially in languages that deal with textual data and character encoding. A string is essentially a sequence of characters, while a rune represents an individual character, often associated with Unicode encoding. While strings are ubiquitous and used to handle text as a collection of characters, runes are essential for dealing with the specifics of encoding and decoding characters, especially in multilingual environments.

This article explores what strings and runes are, their differences, and how they are used in Python, PHP, Go, C++, and Zig. We'll also discuss common operations and techniques for handling both.

What Strings Represent in Programming

Strings are sequences of characters used to represent text. Each character in a string can correspond to a letter, digit, symbol, or whitespace. Internally, strings are stored as arrays of bytes. These bytes are encoded in a specific character encoding, such as UTF-8 or ASCII. Most modern programming languages use Unicode-compatible encodings, enabling strings to represent characters from various languages and symbol sets.

Characteristics of Strings

Immutable or Mutable: Some languages, like Python and Go, treat strings as immutable, meaning they cannot be changed after creation. Others, like PHP, allow mutable string manipulation.
Encodings: Strings are tightly coupled with encodings, determining how they represent different characters.
Concatenation: Strings can be joined together, often using operators like + or methods.
Methods and Utilities: Languages often provide built-in functions to manipulate strings (e.g., slicing, searching, replacing).

Example Use of Strings

Displaying messages to the user.
Parsing and processing user input.
Representing structured data in formats like JSON or XML.

What Runes Represent in Programming

A rune is a term commonly associated with Go but applies to any system dealing with Unicode characters. In Unicode, characters are represented as code points, which are integers corresponding to specific symbols. A rune typically represents one of these code points.

Characteristics of Runes

Single Characters: Runes are used for working with individual characters within a string.
Unicode Awareness: They help manage Unicode characters, which may span multiple bytes in encodings like UTF-8.
Precision: They provide a precise way to handle multilingual text.

Runes are especially important when dealing with languages where characters are not necessarily single bytes, such as Chinese or emoji symbols.

Strings and Runes in Python

Strings in Python

Python strings are immutable sequences of Unicode characters. They are defined using single, double, or triple quotes:

# String examples
text = "Hello, World!"
multi_line = '''This is
a multiline string.'''

Python provides powerful methods to manipulate strings:

# String operations
print(len(text))        # Get string length
print(text.upper())     # Convert to uppercase
print(text[0:5])        # Slicing

Runes in Python

Python does not have a specific "rune" type, but you can work with individual Unicode characters using strings or the ord() and chr() functions:

# Unicode character
char = 'a'
print(ord(char))       # Get Unicode code point (97)
print(chr(97))         # Get character from code point ('a')

Strings and Runes in PHP

Strings in PHP

PHP strings are mutable and can be defined using single or double quotes:

// String examples
$text = "Hello, World!";
$multiLine = 'This is
a multiline string.';

PHP provides a rich set of functions for string manipulation:

// String operations
echo strlen($text);       // Get string length
echo strtoupper($text);   // Convert to uppercase
echo substr($text, 0, 5); // Slicing

Runes in PHP

PHP does not have a direct rune type, but it can handle Unicode characters using the mb_* functions:

// Unicode character
$char = 'a';
echo ord($char);         // Get ASCII value (97)
echo chr(97);            // Get character ('a')

Strings and Runes in Go

Strings in Go

In Go, strings are immutable sequences of bytes encoded in UTF-8:

// String example
text := "Hello, World!"
fmt.Println(len(text))   // String length
fmt.Println(text[:5])    // Slicing

Runes in Go

Runes in Go are aliases for the int32 type, representing Unicode code points. They are useful when working with multilingual text:

// Rune example
r := 'a'                // Rune literal
fmt.Println(r)          // Unicode code point (97)
fmt.Printf("%c", r)    // Character ('a')

Strings and Runes in C++

Strings in C++

C++ offers the std::string class for handling strings. Strings can also be represented using C-style character arrays:

#include <iostream>
#include <string>

// String example
std::string text = "Hello, World!";
std::cout << text.length() << std::endl; // String length

Runes in C++

C++ handles Unicode characters using the char32_t type:

// Unicode character
char32_t rune = U'a';
std::cout << rune << std::endl;          // Unicode code point (97)

Strings and Runes in Zig

Strings in Zig

Zig strings are slices of bytes:

const std = @import("std");

pub fn main() void {
    var text: []const u8 = "Hello, World!";
    std.debug.print("{}\n", .{text});
}

Runes in Zig

Zig does not have a dedicated rune type but can work with Unicode code points:

const std = @import("std");

pub fn main() void {
    const rune: u32 = 'a';
    std.debug.print("{d}\n", .{rune});
}

Practical Applications of Strings and Runes

Parsing Text: Strings are essential for reading and processing textual data, such as configuration files or web requests.
Localization: Runes help ensure that applications work seamlessly with languages worldwide.
Data Serialization: Strings are critical for encoding and decoding structured data formats like JSON or XML.
Character Validation: Runes can validate and transform text at a granular level, such as handling emojis or special symbols.

Conclusion

Strings and runes are indispensable for handling text in programming. While strings focus on sequences of characters, runes allow precise control over individual characters, especially in multilingual contexts. By understanding how to work with these constructs across different languages, developers can create robust, text-aware applications.

Last modified: 26 January 2025