class String

Overview

A String represents an immutable sequence of UTF-8 characters.

A String is typically created with a string literal, enclosing UTF-8 characters in double quotes:

"hello world"

A backslash can be used to denote some characters inside the string:

"\"" # double quote
"\\" # backslash
"\e" # escape
"\f" # form feed
"\n" # newline
"\r" # carriage return
"\t" # tab
"\v" # vertical tab

You can use a backslash followed by at most three digits to denote a code point written in octal:

"\101" # == "A"
"\123" # == "S"
"\12"  # == "\n"
"\1"   # string with one character with code point 1

You can use a backslash followed by a u and four hexadecimal digits to denote a Unicode codepoint:

"\u0041" # == "A"

Or you can use curly braces and specify up to six hexadecimal digits (0 to 10FFFF):

"\u{41}" # == "A"

A string can span multiple lines:

"hello
      world" # same as "hello\n      world"

Note that in the above example trailing and leading spaces, as well as newlines, end up in the resulting string. To avoid this, you can split a string into multiple lines by joining multiple literals with a backslash:

"hello " \
"world, " \
"no newlines" # same as "hello world, no newlines"

Alternatively, a backslash followed by a newline can be inserted inside the string literal:

"hello \
     world, \
     no newlines" # same as "hello world, no newlines"

In this case, leading whitespace is not included in the resulting string.

If you need to write a string that has many double quotes, parentheses, or similar characters, you can use alternative literals:

# Supports double quotes and nested parentheses
%(hello ("world")) # same as "hello (\"world\")"

# Supports double quotes and nested brackets
%[hello ["world"]] # same as "hello [\"world\"]"

# Supports double quotes and nested curlies
%{hello {"world"}} # same as "hello {\"world\"}"

# Supports double quotes and nested angles
%<hello <"world">> # same as "hello <\"world\">"

To create a String with embedded expressions, you can use string interpolation:

a = 1
b = 2
"sum = #{a + b}"        # "sum = 3"

This ends up invoking Object#to_s(IO) on each expression enclosed by #{...}.

If you need to dynamically build a string, use String#build or StringIO.
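
As a minimal sketch of both approaches, the following builds text once with interpolation and once with String.build. The join_words helper is hypothetical and exists only for this illustration.

```crystal
# Interpolation suits short, fixed templates.
a = 1
b = 2
sum_text = "sum = #{a + b}" #=> "sum = 3"

# join_words is a hypothetical helper, not part of String.
# String.build suits output that is assembled piece by piece.
def join_words(words)
  String.build do |io|
    words.each_with_index do |word, i|
      io << ", " if i > 0
      io << word
    end
  end
end

list_text = join_words(["a", "b", "c"]) #=> "a, b, c"
```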

Superclass hierarchy

Object
Reference
String

Included Modules

Comparable(self)

Class Method Detail

def self.build(capacity = 64, &block)

Builds a String by creating a String::Builder with the given initial capacity, yielding it to the block and finally getting a String out of it. The String::Builder automatically resizes as needed.

str = String.build do |str|
  str << "hello "
  str << 1
end
str #=> "hello 1"

def self.new(capacity, &block)

Creates a new String by allocating a buffer (Pointer(UInt8)) with the given capacity, then yielding that buffer. The block must return a tuple with the bytesize and length (UTF-8 codepoints count) of the String. If the returned length is zero, the UTF-8 codepoints count will be lazily computed.

This method is unsafe: the bytesize returned by the block must be less than or equal to the capacity given to this String. In the future this method might check that the returned bytesize is less than or equal to the capacity, making it a safe method.

If you need to build a String where the maximum capacity is unknown, use String#build.

str = String.new(4) do |buffer|
  buffer[0] = 'a'.ord.to_u8
  buffer[1] = 'b'.ord.to_u8
  {2, 2}
end
str #=> "ab"

Note: if the buffer doesn't end up denoting a valid UTF-8 sequence, this method still succeeds. However, when iterating it or indexing it, an InvalidByteSequenceError will be raised.


def self.new(pull : JSON::PullParser)

def self.new(chars : Pointer(UInt8), bytesize, length = 0)

Creates a new String from a pointer, indicating its bytesize count and, optionally, the UTF-8 codepoints count (length). Bytes will be copied from the pointer.

If the given length is zero, the amount of UTF-8 codepoints will be lazily computed when needed.

ptr = Pointer.malloc(4) { |i| ('a'.ord + i).to_u8 }
String.new(ptr, 2) #=> "ab"

Note: if the chars don't denote a valid UTF-8 sequence, this method still succeeds. However, when iterating it or indexing it, an InvalidByteSequenceError will be raised.


def self.new(slice : Slice(UInt8))

Creates a String from the given slice. Bytes will be copied from the slice.

This method is always safe to call, and the resulting string will have the contents and length of the slice.

slice = Slice.new(4) { |i| ('a'.ord + i).to_u8 }
String.new(slice) #=> "abcd"

Note: if the slice doesn't denote a valid UTF-8 sequence, this method still succeeds. However, when iterating it or indexing it, an InvalidByteSequenceError will be raised.


def self.new(chars : Pointer(UInt8))

Creates a String from a pointer. Bytes will be copied from the pointer.

This method is unsafe: the pointer must point to data that eventually contains a zero byte that indicates the end of the string. Otherwise, the result of this method is undefined and might cause a segmentation fault.

This method is typically used in C bindings, where you get a char* from a library and the library guarantees that this pointer eventually has an ending zero byte.

ptr = Pointer.malloc(5) { |i| i == 4 ? 0_u8 : ('a'.ord + i).to_u8 }
String.new(ptr) #=> "abcd"

Note: if the chars don't denote a valid UTF-8 sequence, this method still succeeds. However, when iterating it or indexing it, an InvalidByteSequenceError will be raised.


Instance Method Detail

def %(other)

def *(times : Int)

def +(other : self)

def +(char : Char)

def <=>(other : self)

def ==(other : self)

def =~(other)

def =~(regex : Regex)

def [](range : Range(Int, Int))

Returns a substring by using a Range's begin and end as character indices. Indices can be negative to start counting from the end of the string.

Raises IndexError if the range's start is not in range.

"hello"[0..2]  # "hel"
"hello"[0...2] # "he"
"hello"[1..-1]  # "ello"
"hello"[1...-1]  # "ell"

def [](index : Int)

Returns the Char at the given index, or raises IndexError if out of bounds.

Negative indices can be used to start counting from the end of the string.

"hello"[0]  # 'h'
"hello"[1]  # 'e'
"hello"[-1] # 'o'
"hello"[-2] # 'l'
"hello"[5]  # raises IndexError

def [](str : String)

def [](start : Int, count : Int)

Returns a substring of count characters, starting at character index start.

The start argument can be negative to start counting from the end of the string.

Raises IndexError if start isn't in range.

Raises ArgumentError if count is negative.


def [](regex : Regex)

def [](regex : Regex, group)

def []?(str : String)

def []?(regex : Regex)

def []?(regex : Regex, group)

def []?(index : Int)

def ascii_only?

def at(index : Int, &block)

def at(index : Int)

def byte_at(index)

def byte_at(index, &block)

def byte_at?(index)

def byte_index(string : String, offset = 0)

def byte_index(byte : Int, offset = 0)

def byte_slice(start : Int)

def byte_slice(start : Int, count : Int)

def bytes

Returns this string's bytes as an Array(UInt8).

"hello".bytes          #=> [104, 101, 108, 108, 111]
"你好".bytes           #=> [228, 189, 160, 229, 165, 189]

def bytesize

Returns the number of bytes in this string.

"hello".bytesize         #=> 5
"你好".bytesize          #=> 6
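
The byte count and the character count differ as soon as a string contains non-ASCII characters. The sketch below compares them using only methods listed on this page; the variable names are illustrative.

```crystal
ascii = "hello"
cjk = "你好"

ascii.bytesize #=> 5 (one byte per character)
cjk.bytesize   #=> 6 (three bytes per character in UTF-8)
cjk.chars      #=> ['你', '好']
cjk.codepoints #=> [20320, 22909]
```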

def camelcase

def capitalize

def char_at(index)

def char_index_to_byte_index(index)

Returns the byte index of a char index, or nil if out of bounds.

"hello".char_index_to_byte_index(1)     #=> 1
"こんにちは".char_index_to_byte_index(1) #=> 3

def chars

Returns an array of all characters in the string.

"ab☃".chars #=> ['a', 'b', '☃']

def chomp(string : String)

def chomp(char : Char)

def chomp

def chop

Returns a new String with the last character removed. If the string ends with \r\n, both characters are removed. Applying chop to an empty string returns an empty string.

"string\r\n".chop   #=> "string"
"string\n\r".chop   #=> "string\n"
"string\n".chop     #=> "string"
"string".chop       #=> "strin"
"x".chop.chop       #=> ""

See also: #chomp


def codepoint_at(index)

def codepoints

Returns an array of the codepoints that make up the string. See Char#ord.

"ab☃".codepoints #=> [97, 98, 9731]

def count(*sets)

Sets should be a list of strings following the rules described at Char#in_set?. Returns the number of characters in this string that match the given set.


def count(&block)

Yields each char in this string to the block and returns the number of times the block returned a truthy value.

"aabbcc".count {|c| ['a', 'b'].includes?(c) } #=> 4

def count(other : Char)

Counts the occurrences of other in this string.

"aabbcc".count('a') #=> 2

def cstr

def delete(&block)

Yields each char in this string to the block. Returns a new string with all characters for which the block returned a truthy value removed.

"aabbcc".delete {|c| ['a', 'b'].includes?(c) } #=> "cc"

def delete(char : Char)

Returns a new string with all occurrences of char removed.

"aabbcc".delete('b') #=> "aacc"

def delete(*sets)

Sets should be a list of strings following the rules described at Char#in_set?. Returns a new string with all characters that match the given set removed.

"aabbccdd".delete("a-c") #=> "dd"

def downcase

def dump

def dump(io)

def dump_unquoted

def dump_unquoted(io)

def each_byte(&block)

Yields each byte in the string to the block.

"ab☃".each_byte do |byte|
  byte #=> 97, 98, 226, 152, 131
end

def each_byte

Returns an iterator over each byte in the string.

bytes = "ab☃".each_byte
bytes.next #=> 97
bytes.next #=> 98
bytes.next #=> 226
bytes.next #=> 152
bytes.next #=> 131

def each_char(&block)

Yields each character in the string to the block.

"ab☃".each_char do |char|
  char #=> 'a', 'b', '☃'
end
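
As a small illustration of iterating by character, this sketch counts the vowels in a string. It assumes String#includes?(Char) behaves as its name suggests; the variable names are illustrative.

```crystal
vowels = 0
"hello world".each_char do |char|
  # Count the character if it is one of the five lowercase vowels.
  vowels += 1 if "aeiou".includes?(char)
end
vowels #=> 3
```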

def each_char

Returns an iterator over each character in the string.

chars = "ab☃".each_char
chars.next #=> 'a'
chars.next #=> 'b'
chars.next #=> '☃'

def each_char_with_index(&block)

Yields each character and its index in the string to the block.

"ab☃".each_char_with_index do |char, index|
  char  #=> 'a', 'b', '☃'
  index #=>  0,   1,   2
end

def each_codepoint

Returns an iterator for each codepoint. See Char#ord

codepoints = "ab☃".each_codepoint
codepoints.next #=> 97
codepoints.next #=> 98
codepoints.next #=> 9731

def each_codepoint(&block)

Yields each codepoint to the block. See Char#ord

"ab☃".each_codepoint do |codepoint|
  codepoint #=> 97, 98, 9731
end

def each_line(&block)

def each_line

def empty?

def ends_with?(str : String)

def ends_with?(char : Char)

def gsub(pattern : Regex, replacement)

Returns a string where all occurrences of the given pattern are replaced with the given replacement.

"hello".gsub(/[aeiou]/, '*') #=> "h*ll*"

def gsub(hash : Hash(Char, _))

Returns a string where all chars in the given hash are replaced by the corresponding hash values.

"hello".gsub({'e' => 'a', 'l' => 'd'}) #=> "haddo"

def gsub(string : String, replacement)

Returns a string where all occurrences of the given string are replaced with the given replacement.

"hello yellow".gsub("ll", "dd") #=> "heddo yeddow"

def gsub(pattern : Regex, &block)

Returns a string where all occurrences of the given pattern are replaced by the block's return value.

"hello".gsub(/./) {|s| s[0].ord.to_s + ' '} #=> "104 101 108 108 111 "

def gsub(string : String, &block)

Returns a string where all occurrences of the given string are replaced with the block's value.

"hello yellow".gsub("ll") { "dd" } #=> "heddo yeddow"

def gsub(&block : Char -> _)

Returns a new string where each character yielded to the given block is replaced by the block's return value.

"hello".gsub { |x| (x.ord + 1).chr } #=> "ifmmp"
"hello".gsub { "hi" } #=> "hihihihihi"

def gsub(char : Char, replacement)

Returns a string where all occurrences of the given char are replaced with the given replacement.

"hello".gsub('l', "lo") #=> "heloloo"
"hello world".gsub('o', 'a') #=> "hella warld"

def gsub(pattern : Regex, hash : Hash(String, _))

Returns a string where all occurrences of the given pattern are replaced using a hash of replacements. If the hash contains the matched text as a key, the corresponding value is used as the replacement. Otherwise the match is not included in the returned string.

# "he" and "l" are matched and replaced,
# but "o" is not and so is not included
"hello".gsub(/(he|l|o)/, {"he": "ha", "l": "la"}) #=> "halala"

def hash

Returns a hash based on this string’s length and content.

See also Object#hash.


def includes?(c : Char)

def includes?(str : String)

def index(c : String, offset = 0)

def index(c : Char, offset = 0)

def inspect(io)

def inspect_unquoted(io)

def inspect_unquoted

def length

Returns the number of Unicode codepoints in this string.

"hello".length         #=> 5
"你好".length          #=> 2

def lines

def ljust(len, char = ' ' : Char)

def lstrip

def match(regex : Regex, pos = 0, &block)

def match(regex : Regex, pos = 0)

def reverse

def rindex(c : Char, offset = length - 1)

def rindex(c : String, offset = length - c.length)

def rjust(len, char = ' ' : Char)

def rstrip

def scan(pattern : String)

def scan(pattern : Regex)

def scan(pattern : String, &block)

def scan(pattern : Regex, &block)

def size

def split(separator : String, limit = nil)

def split(separator : Regex, limit = nil)

def split(limit = nil : Int32 | Nil)

def split(separator : Char, limit = nil)

def squeeze

Returns a new string with each run of a repeated character replaced by a single instance.

"a       bbb".squeeze #=> "a b"

def squeeze(*sets : String)

Sets should be a list of strings following the rules described at Char#in_set?. Returns a new string with all runs of the same character replaced by one instance, if they match the given set.

If no set is given, all characters are matched.

"aaabbbcccddd".squeeze("b-d") #=> "aaabcd"
"a       bbb".squeeze #=> "a b"

def squeeze(char : Char)

Returns a new string, with all runs of char replaced by one instance.

"a    bbb".squeeze(' ') #=> "a bbb"
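
One possible use, sketched here, is normalizing whitespace by combining squeeze(' ') with #strip. The variable names are illustrative.

```crystal
raw = "  too    many   spaces "

# Collapse each run of spaces to one, then trim both ends.
tidy = raw.squeeze(' ').strip
tidy #=> "too many spaces"
```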

def squeeze(&block)

Yields each char in this string to the block. Returns a new string in which each run of a repeated character is replaced by a single instance, but only for characters for which the block returned a truthy value.

"aaabbbccc".squeeze {|c| ['a', 'b'].includes?(c) } #=> "abccc"
"aaabbbccc".squeeze {|c| ['a', 'c'].includes?(c) } #=> "abbbc"

def starts_with?(str : String)

def starts_with?(char : Char)

def strip

def succ

Returns the successor of the string. The successor is calculated by incrementing characters starting from the rightmost alphanumeric (or the rightmost character if there are no alphanumerics) in the string. Incrementing a digit always results in another digit, and incrementing a letter results in another letter of the same case.

If the increment generates a “carry”, the character to the left of it is incremented. This process repeats until there is no carry, adding an additional character if necessary.

"abcd".succ        #=> "abce"
"THX1138".succ     #=> "THX1139"
"((koala))".succ   #=> "((koalb))"
"1999zzz".succ     #=> "2000aaa"
"ZZZ9999".succ     #=> "AAAA0000"
"***".succ         #=> "**+"
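
The carry rule can be seen by calling succ repeatedly. This sketch generates three consecutive labels starting after "az"; the variable names are illustrative.

```crystal
label = "az"
labels = [] of String

3.times do
  # "az" carries into the left character and becomes "ba".
  label = label.succ
  labels << label
end

labels #=> ["ba", "bb", "bc"]
```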

def to_big_i(base = 10)

def to_f

Returns the result of interpreting leading characters in this string as a floating point number (Float64). Extraneous characters past the end of a valid number are ignored. If there is not a valid number at the start of the string, 0.0 is returned. This method never raises an exception.

"123.45e1".to_f        #=> 1234.5
"45.67 degrees".to_f   #=> 45.67
"thx1138".to_f         #=> 0.0

def to_f32

Returns the result of interpreting leading characters in this string as a floating point number (Float32). Extraneous characters past the end of a valid number are ignored. If there is not a valid number at the start of the string, 0.0 is returned. This method never raises an exception.

See #to_f.


def to_f64

Same as #to_f.


def to_i(base = 10, whitespace = true, underscore = false, prefix = false, strict = true, &block)

Same as #to_i, but returns the block's value if there is not a valid number at the start of this string, or if the resulting integer doesn't fit an Int32.

"12345".to_i { 0 }       #=> 12345
"hello".to_i { 0 }       #=> 0
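
The block form is convenient for supplying a default value. In this sketch, parse_port and its default of 8080 are hypothetical and serve only to illustrate the fallback.

```crystal
# parse_port is a hypothetical helper, not part of String.
# It returns the parsed number, or the default when parsing fails.
def parse_port(text, default = 8080)
  text.to_i { default }
end

parse_port("3000") #=> 3000
parse_port("http") #=> 8080
```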

def to_i(base = 10, whitespace = true, underscore = false, prefix = false, strict = true)

Returns the result of interpreting leading characters in this string as an integer in the given base (between 2 and 36).

If there is not a valid number at the start of this string, or if the resulting integer doesn't fit an Int32, an ArgumentError is raised.

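Options:

whitespace: if true, leading and trailing whitespace is allowed
underscore: if true, underscores in the number are allowed
prefix: if true, a prefix such as "0x" overrides the base
strict: if true, extraneous characters past the end of the number are disallowed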

"12345".to_i                             #=> 12345
"0a".to_i                                #=> raises
"hello".to_i                             #=> raises
"0a".to_i(16)                            #=> 10
"1100101".to_i(2)                        #=> 101
"1100101".to_i(8)                        #=> 294977
"1100101".to_i(10)                       #=> 1100101
"1100101".to_i(base: 16)                 #=> 17826049

"12_345".to_i                            #=> raises
"12_345".to_i(underscore: true)          #=> 12345

"  12345  ".to_i                         #=> 12345
"  12345  ".to_i(whitespace: false)      #=> raises

"0x123abc".to_i                          #=> raises
"0x123abc".to_i(prefix: true)            #=> 1194684

"99 red balloons".to_i                   #=> raises
"99 red balloons".to_i(strict: false)    #=> 99

def to_i16(base = 10, whitespace = true, underscore = false, prefix = false, strict = true, &block)

Same as #to_i but returns an Int16 or the block's value.


def to_i16(base = 10, whitespace = true, underscore = false, prefix = false, strict = true) : Int16

Same as #to_i but returns an Int16.


def to_i16?(base = 10, whitespace = true, underscore = false, prefix = false, strict = true) : Int16 | Nil

Same as #to_i but returns an Int16 or nil.


def to_i32(base = 10, whitespace = true, underscore = false, prefix = false, strict = true) : Int32

Same as #to_i.


def to_i32(base = 10, whitespace = true, underscore = false, prefix = false, strict = true, &block)

Same as #to_i.


def to_i32?(base = 10, whitespace = true, underscore = false, prefix = false, strict = true) : Int32 | Nil

Same as #to_i.


def to_i64(base = 10, whitespace = true, underscore = false, prefix = false, strict = true) : Int64

Same as #to_i but returns an Int64.


def to_i64(base = 10, whitespace = true, underscore = false, prefix = false, strict = true, &block)

Same as #to_i but returns an Int64 or the block's value.


def to_i64?(base = 10, whitespace = true, underscore = false, prefix = false, strict = true) : Int64 | Nil

Same as #to_i but returns an Int64 or nil.


def to_i8(base = 10, whitespace = true, underscore = false, prefix = false, strict = true) : Int8

Same as #to_i but returns an Int8.


def to_i8(base = 10, whitespace = true, underscore = false, prefix = false, strict = true, &block)

Same as #to_i but returns an Int8 or the block's value.


def to_i8?(base = 10, whitespace = true, underscore = false, prefix = false, strict = true) : Int8 | Nil

Same as #to_i but returns an Int8 or nil.


def to_i?(base = 10, whitespace = true, underscore = false, prefix = false, strict = true)

Same as #to_i, but returns nil if there is not a valid number at the start of this string, or if the resulting integer doesn't fit an Int32.

"12345".to_i?            #=> 12345
"99 red balloons".to_i?  #=> nil
"0a".to_i?               #=> nil
"hello".to_i?            #=> nil
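
Because the nil-returning form never raises, it combines well with filtering. The sketch below keeps only the inputs that parse under the default options (strict parsing, surrounding whitespace allowed); the input values are made up for illustration.

```crystal
inputs = ["42", "4 2", "forty-two", "  7  "]

# to_i? returns nil for unparsable input, and compact_map drops the nils.
numbers = inputs.compact_map { |text| text.to_i? }
numbers #=> [42, 7]
```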

def to_json(io)

def to_s(io)

def to_s

def to_slice

def to_u16(base = 10, whitespace = true, underscore = false, prefix = false, strict = true, &block)

Same as #to_i but returns an UInt16 or the block's value.


def to_u16(base = 10, whitespace = true, underscore = false, prefix = false, strict = true) : UInt16

Same as #to_i but returns an UInt16.


def to_u16?(base = 10, whitespace = true, underscore = false, prefix = false, strict = true) : UInt16 | Nil

Same as #to_i but returns an UInt16 or nil.


def to_u32(base = 10, whitespace = true, underscore = false, prefix = false, strict = true) : UInt32

Same as #to_i but returns an UInt32.


def to_u32(base = 10, whitespace = true, underscore = false, prefix = false, strict = true, &block)

Same as #to_i but returns an UInt32 or the block's value.


def to_u32?(base = 10, whitespace = true, underscore = false, prefix = false, strict = true) : UInt32 | Nil

Same as #to_i but returns an UInt32 or nil.


def to_u64(base = 10, whitespace = true, underscore = false, prefix = false, strict = true) : UInt64

Same as #to_i but returns an UInt64.


def to_u64(base = 10, whitespace = true, underscore = false, prefix = false, strict = true, &block)

Same as #to_i but returns an UInt64 or the block's value.


def to_u64?(base = 10, whitespace = true, underscore = false, prefix = false, strict = true) : UInt64 | Nil

Same as #to_i but returns an UInt64 or nil.


def to_u8(base = 10, whitespace = true, underscore = false, prefix = false, strict = true, &block)

Same as #to_i but returns an UInt8 or the block's value.


def to_u8(base = 10, whitespace = true, underscore = false, prefix = false, strict = true) : UInt8

Same as #to_i but returns an UInt8.


def to_u8?(base = 10, whitespace = true, underscore = false, prefix = false, strict = true) : UInt8 | Nil

Same as #to_i but returns an UInt8 or nil.


def to_unsafe

def tr(from : String, to : String)

def underscore

def unsafe_byte_at(index)

def unsafe_byte_slice(byte_offset, count)

def unsafe_byte_slice(byte_offset)

def upcase

Macro Detail

macro gen_to_(method, max_positive = nil, max_negative = nil)

:nodoc