A String represents an immutable sequence of UTF-8 characters.
A String is typically created with a string literal, enclosing UTF-8 characters in double quotes:
"hello world"
A backslash can be used to denote some characters inside the string:
"\"" # double quote
"\\" # backslash
"\e" # escape
"\f" # form feed
"\n" # newline
"\r" # carriage return
"\t" # tab
"\v" # vertical tab
You can use a backslash followed by at most three digits to denote a code point written in octal:
"\101" # == "A"
"\123" # == "S"
"\12" # == "\n"
"\1" # string with one character with code point 1
You can use a backslash followed by a u and four hexadecimal characters to denote a Unicode code point:
"\u0041" # == "A"
Or you can use curly braces and specify up to six hexadecimal digits (0 to 10FFFF):
"\u{41}" # == "A"
A string can span multiple lines:
"hello
      world" # same as "hello\n      world"
Note that in the above example trailing and leading spaces, as well as newlines, end up in the resulting string. To avoid this, you can split a string into multiple lines by joining multiple literals with a backslash:
"hello " \
"world, " \
"no newlines" # same as "hello world, no newlines"
Alternatively, a backslash followed by a newline can be inserted inside the string literal:
"hello \
world, \
no newlines" # same as "hello world, no newlines"
In this case, leading whitespace is not included in the resulting string.
If you need to write a string that has many double quotes, parentheses, or similar characters, you can use alternative literals:
# Supports double quotes and nested parentheses
%(hello ("world")) # same as "hello (\"world\")"
# Supports double quotes and nested brackets
%[hello ["world"]] # same as "hello [\"world\"]"
# Supports double quotes and nested curlies
%{hello {"world"}} # same as "hello {\"world\"}"
# Supports double quotes and nested angles
%<hello <"world">> # same as "hello <\"world\">"
To create a String with embedded expressions, you can use string interpolation:
a = 1
b = 2
"sum = #{a + b}" # "sum = 3"
This ends up invoking Object#to_s(IO) on each expression enclosed by #{...}.
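As a minimal sketch of how interpolation relies on to_s(IO), the example below defines a hypothetical Point struct (not part of the standard library, introduced only for illustration) and overrides to_s(io) so that interpolating a Point appends a custom representation:

```crystal
# Point is a hypothetical type used only for this illustration.
struct Point
  def initialize(@x : Int32, @y : Int32)
  end

  # Interpolation appends the value to the string through this method.
  def to_s(io : IO) : Nil
    io << "(" << @x << ", " << @y << ")"
  end
end

point = Point.new(1, 2)
message = "point = #{point}"
puts message # prints: point = (1, 2)
```

Writing to the given IO avoids allocating an intermediate String for each interpolated expression.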
If you need to dynamically build a string, use String#build or StringIO.
Builds a String by creating a String::Builder with the given initial capacity, yielding it to the block and finally getting a String out of it.
Creates a new String from a pointer, indicating its bytesize count and, optionally, the UTF-8 codepoints count (length).
Creates a String from the given slice.
Creates a String from a pointer.
Returns a substring by using a Range's begin and end as character indices.
Returns the Char at the given index, or raises IndexError if out of bounds.
Returns a substring starting from the start character of length #count.
Returns the number of bytes in this string.
Returns the byte index of a char index, or nil if out of bounds.
Returns an array of all characters in the string.
Returns a new String with the last character removed.
Returns an array of the codepoints that make the string.
Sets should be a list of strings following the rules described at Char#in_set?.
Yields each char in this string to the block, returns the number of times the block returned a truthy value.
Counts the occurrences of other in this string.
Yields each char in this string to the block.
Returns a new string with all occurrences of char removed.
Sets should be a list of strings following the rules described at Char#in_set?.
Yields each byte in the string to the block.
Returns an iterator over each byte in the string.
Yields each character in the string to the block.
Returns an iterator over each character in the string.
Yields each character and its index in the string to the block.
Returns an iterator for each codepoint.
Yields each codepoint to the block.
Returns a string where all occurrences of the given pattern are replaced with the given replacement.
Returns a string where all chars in the given hash are replaced by the corresponding hash values.
Returns a string where all occurrences of the given string are replaced with the given replacement.
Returns a string where all occurrences of the given pattern are replaced by the block's return value.
Returns a string where all occurrences of the given string are replaced with the block's value.
Returns a new string where each character yielded to the given block is replaced by the block's return value.
Returns a string where all occurrences of the given char are replaced with the given replacement.
Returns a string where all occurrences of the given pattern are replaced with a hash of replacements.
Returns a hash based on this string’s length and content.
Returns the number of unicode codepoints in this string.
Returns a new string with every character removed that is the same as the one before it.
Sets should be a list of strings following the rules described at Char#in_set?.
Returns a new string, with all runs of char replaced by one instance.
Yields each char in this string to the block.
Returns the successor of the string.
Returns the result of interpreting leading characters in this string as a floating point number (Float64).
Returns the result of interpreting leading characters in this string as a floating point number (Float32).
Same as #to_f.
Same as #to_i, but returns the block's value if there is not a valid number at the start of this string, or if the resulting integer doesn't fit an Int32.
Returns the result of interpreting leading characters in this string as an integer base base (between 2 and 36).
Same as #to_i but returns an Int16 or the block's value.
Same as #to_i but returns an Int16.
Same as #to_i but returns an Int16 or nil.
Same as #to_i.
Same as #to_i.
Same as #to_i.
Same as #to_i but returns an Int64.
Same as #to_i but returns an Int64 or the block's value.
Same as #to_i but returns an Int64 or nil.
Same as #to_i but returns an Int8.
Same as #to_i but returns an Int8 or the block's value.
Same as #to_i but returns an Int8 or nil.
Same as #to_i, but returns nil if there is not a valid number at the start of this string, or if the resulting integer doesn't fit an Int32.
Same as #to_i but returns an UInt16 or the block's value.
Same as #to_i but returns an UInt16.
Same as #to_i but returns an UInt16 or nil.
Same as #to_i but returns an UInt32.
Same as #to_i but returns an UInt32 or the block's value.
Same as #to_i but returns an UInt32 or nil.
Same as #to_i but returns an UInt64.
Same as #to_i but returns an UInt64 or the block's value.
Same as #to_i but returns an UInt64 or nil.
Same as #to_i but returns an UInt8 or the block's value.
Same as #to_i but returns an UInt8.
Same as #to_i but returns an UInt8 or nil.
Builds a String by creating a String::Builder with the given initial capacity, yielding
it to the block and finally getting a String out of it. The String::Builder automatically
resizes as needed.
str = String.build do |str|
str << "hello "
str << 1
end
str #=> "hello 1"
Creates a new String by allocating a buffer (Pointer(UInt8)) with the given capacity, then
yielding that buffer. The block must return a tuple with the bytesize and length
(UTF-8 codepoints count) of the String. If the returned length is zero, the UTF-8 codepoints
count will be lazily computed.
This method is unsafe: the bytesize returned by the block must be less than or equal to the capacity given to this String. In the future this method might check that the returned bytesize is less than or equal to the capacity, making it a safe method.
If you need to build a String where the maximum capacity is unknown, use String#build.
str = String.new(4) do |buffer|
buffer[0] = 'a'.ord.to_u8
buffer[1] = 'b'.ord.to_u8
{2, 2}
end
str #=> "ab"
Note: if the buffer doesn't end up denoting a valid UTF-8 sequence, this method still succeeds.
However, when iterating it or indexing it, an InvalidByteSequenceError will be raised.
Creates a new String from a pointer, indicating its bytesize count and, optionally, the UTF-8 codepoints count (length). Bytes will be copied from the pointer.
If the given length is zero, the amount of UTF-8 codepoints will be lazily computed when needed.
ptr = Pointer.malloc(4) { |i| ('a'.ord + i).to_u8 }
String.new(ptr, 2) #=> "ab"
Note: if the chars don't denote a valid UTF-8 sequence, this method still succeeds.
However, when iterating it or indexing it, an InvalidByteSequenceError will be raised.
Creates a String from the given slice. Bytes will be copied from the slice.
This method is always safe to call, and the resulting string will have the contents and length of the slice.
slice = Slice.new(4) { |i| ('a'.ord + i).to_u8 }
String.new(slice) #=> "abcd"
Note: if the slice doesn't denote a valid UTF-8 sequence, this method still succeeds.
However, when iterating it or indexing it, an InvalidByteSequenceError will be raised.
Creates a String from a pointer. Bytes will be copied from the pointer.
This method is unsafe: the pointer must point to data that eventually contains a zero byte that indicates the end of the string. Otherwise, the result of this method is undefined and might cause a segmentation fault.
This method is typically used in C bindings, where you get a char* from a
library and the library guarantees that this pointer eventually has an
ending zero byte.
ptr = Pointer.malloc(5) { |i| i == 4 ? 0 : ('a'.ord + i).to_u8 }
String.new(ptr) #=> "abcd"
Note: if the chars don't denote a valid UTF-8 sequence, this method still succeeds.
However, when iterating it or indexing it, an InvalidByteSequenceError will be raised.
Returns a substring by using a Range's begin and end as character indices. Indices can be negative to start counting from the end of the string.
Raises IndexError if the range's start is not in range.
"hello"[0..2] # "hel"
"hello"[0...2] # "he"
"hello"[1..-1] # "ello"
"hello"[1...-1] # "ell"
Returns the Char at the given index, or raises IndexError if out of bounds.
Negative indices can be used to start counting from the end of the string.
"hello"[0] # 'h'
"hello"[1] # 'e'
"hello"[-1] # 'o'
"hello"[-2] # 'l'
"hello"[5] # raises IndexError
Returns a substring starting from the start character
of length #count.
The start argument can be negative to start counting
from the end of the string.
Raises IndexError if start isn't in range.
Raises ArgumentError if #count is negative.
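The start and count form described above has no example, so here is a short sketch. The variable names are illustrative only, and the out-of-range start of 10 is chosen arbitrarily to show the IndexError case:

```crystal
word = "hello"

first = word[1, 3]         # three characters starting at index 1
tail = word[-3, 2]         # negative start counts from the end (index 2)
wide = "こんにちは"[1, 2]  # indices count characters, not bytes

puts first # prints: ell
puts tail  # prints: ll
puts wide  # prints: んに

# A start that lies outside the string raises IndexError.
out_of_range = false
begin
  word[10, 1]
rescue IndexError
  out_of_range = true
end
puts out_of_range # prints: true
```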
Returns the number of bytes in this string.
"hello".bytesize #=> 5
"你好".bytesize #=> 6
Returns the byte index of a char index, or nil if out of bounds.
"hello".char_index_to_byte_index(1) #=> 1
"こんにちは".char_index_to_byte_index(1) #=> 3
Returns an array of all characters in the string.
"ab☃".chars #=> ['a', 'b', '☃']
Returns a new String with the last character removed.
If the string ends with \r\n, both characters are removed.
Applying chop to an empty string returns an empty string.
"string\r\n".chop #=> "string"
"string\n\r".chop #=> "string\n"
"string\n".chop #=> "string"
"string".chop #=> "strin"
"x".chop.chop #=> ""
See also: #chomp
Returns an array of the codepoints that make the string. See Char#ord.
"ab☃".codepoints #=> [97, 98, 9731]
Sets should be a list of strings following the rules described at Char#in_set?. Returns the number of characters in this string that match the given set.
Yields each char in this string to the block, returns the number of times the block returned a truthy value.
"aabbcc".count {|c| ['a', 'b'].includes?(c) } #=> 4
Counts the occurrences of other in this string.
"aabbcc".count('a') #=> 2
Yields each char in this string to the block. Returns a new string with all characters for which the block returned a truthy value removed.
"aabbcc".delete {|c| ['a', 'b'].includes?(c) } #=> "cc"
Returns a new string with all occurrences of char removed.
"aabbcc".delete('b') #=> "aacc"
Sets should be a list of strings following the rules described at Char#in_set?. Returns a new string with all characters that match the given set removed.
"aabbccdd".delete("a-c") #=> "dd"
Yields each byte in the string to the block.
"ab☃".each_byte do |byte|
byte #=> 97, 98, 226, 152, 131
end
Returns an iterator over each byte in the string.
bytes = "ab☃".each_byte
bytes.next #=> 97
bytes.next #=> 98
bytes.next #=> 226
bytes.next #=> 152
bytes.next #=> 131
Yields each character in the string to the block.
"ab☃".each_char do |char|
char #=> 'a', 'b', '☃'
end
Returns an iterator over each character in the string.
chars = "ab☃".each_char
chars.next #=> 'a'
chars.next #=> 'b'
chars.next #=> '☃'
Yields each character and its index in the string to the block.
"ab☃".each_char_with_index do |char, index|
char #=> 'a', 'b', '☃'
index #=> 0, 1, 2
end
Returns an iterator for each codepoint. See Char#ord.
codepoints = "ab☃".each_codepoint
codepoints.next #=> 97
codepoints.next #=> 98
codepoints.next #=> 9731
Yields each codepoint to the block. See Char#ord.
"ab☃".each_codepoint do |codepoint|
codepoint #=> 97, 98, 9731
end
Returns a string where all occurrences of the given pattern are replaced with the given replacement.
"hello".gsub(/[aeiou]/, '*') #=> "h*ll*"
Returns a string where all chars in the given hash are replaced by the corresponding hash values.
"hello".gsub({'e' => 'a', 'l' => 'd'}) #=> "haddo"
Returns a string where all occurrences of the given string are replaced with the given replacement.
"hello yellow".gsub("ll", "dd") #=> "heddo yeddow"
Returns a string where all occurrences of the given pattern are replaced by the block's return value.
"hello".gsub(/./) {|s| s[0].ord.to_s + ' '} #=> "104 101 108 108 111 "
Returns a string where all occurrences of the given string are replaced with the block's value.
"hello yellow".gsub("ll") { "dd" } #=> "heddo yeddow"
Returns a new string where each character yielded to the given block is replaced by the block's return value.
"hello".gsub { |x| (x.ord + 1).chr } #=> "ifmmp"
"hello".gsub { "hi" } #=> "hihihihihi"
Returns a string where all occurrences of the given char are replaced with the given replacement.
"hello".gsub('l', "lo") #=> "heloloo"
"hello world".gsub('o', 'a') #=> "hella warld"
Returns a string where all occurrences of the given pattern are replaced with a hash of replacements. If the hash contains the matched pattern, the corresponding value is used as a replacement. Otherwise the match is not included in the returned string.
# "he" and "l" are matched and replaced,
# but "o" is not and so is not included
"hello".gsub(/(he|l|o)/, {"he": "ha", "l": "la"}) #=> "halala"
Returns a hash based on this string’s length and content.
See also Object#hash.
Returns the number of unicode codepoints in this string.
"hello".length #=> 5
"你好".length #=> 2
Returns a new string with every character removed that is the same as the one before it.
"a    bbb".squeeze #=> "a b"
Sets should be a list of strings following the rules described at Char#in_set?. Returns a new string with all runs of the same character replaced by one instance, if they match the given set.
If no set is given, all characters are matched.
"aaabbbcccddd".squeeze("b-d") #=> "aaabcd"
"a    bbb".squeeze #=> "a b"
Returns a new string, with all runs of char replaced by one instance.
"a    bbb".squeeze(' ') #=> "a bbb"
Yields each char in this string to the block. Returns a new string with every character removed that is the same as the one before it and for which the given block returned a truthy value.
"aaabbbccc".squeeze {|c| ['a', 'b'].includes?(c) } #=> "abccc"
"aaabbbccc".squeeze {|c| ['a', 'c'].includes?(c) } #=> "abbbc"
Returns the successor of the string. The successor is calculated by incrementing characters starting from the rightmost alphanumeric (or the rightmost character if there are no alphanumerics) in the string. Incrementing a digit always results in another digit, and incrementing a letter results in another letter of the same case.
If the increment generates a “carry”, the character to the left of it is incremented. This process repeats until there is no carry, adding an additional character if necessary.
"abcd".succ #=> "abce"
"THX1138".succ #=> "THX1139"
"((koala))".succ #=> "((koalb))"
"1999zzz".succ #=> "2000aaa"
"ZZZ9999".succ #=> "AAAA0000"
"***".succ #=> "**+"
Returns the result of interpreting leading characters in this string as a floating point number (Float64).
Extraneous characters past the end of a valid number are ignored. If there is not a valid number at the start of the string, 0.0 is returned.
"123.45e1".to_f #=> 1234.5
"45.67 degrees".to_f #=> 45.67
"thx1138".to_f #=> 0.0
Same as #to_f.
Same as #to_i, but returns the block's value if there is not a valid number at the start
of this string, or if the resulting integer doesn't fit an Int32.
"12345".to_i { 0 } #=> 12345
"hello".to_i { 0 } #=> 0
Returns the result of interpreting leading characters in this string as an integer base base (between 2 and 36).
If there is not a valid number at the start of this string, or if the resulting integer doesn't fit an Int32, an ArgumentError is raised.
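Options:
whitespace: if true, leading and trailing whitespace is allowed
underscore: if true, underscores in numbers are allowed
prefix: if true, a prefix such as "0x" overrides the base
strict: if true, extraneous characters past the end of the number are disallowed
"12345".to_i #=> 12345
"0a".to_i #=> raises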
"hello".to_i #=> raises
"0a".to_i(16) #=> 10
"1100101".to_i(2) #=> 101
"1100101".to_i(8) #=> 294977
"1100101".to_i(10) #=> 1100101
"1100101".to_i(base: 16) #=> 17826049
"12_345".to_i #=> raises
"12_345".to_i(underscore: true) #=> 12345
" 12345 ".to_i #=> 12345
" 12345 ".to_i(whitepsace: false) #=> raises
"0x123abc".to_i #=> raises
"0x123abc".to_i(prefix: true) #=> 1194684
"99 red balloons".to_i #=> raises
"99 red balloons".to_i(strict: false) #=> 99Same as #to_i but returns an Int16 or the block's value.
Same as #to_i but returns an Int16.
Same as #to_i but returns an Int16 or nil.
Same as #to_i.
Same as #to_i.
Same as #to_i.
Same as #to_i but returns an Int64.
Same as #to_i but returns an Int64 or the block's value.
Same as #to_i but returns an Int64 or nil.
Same as #to_i but returns an Int8.
Same as #to_i but returns an Int8 or the block's value.
Same as #to_i but returns an Int8 or nil.
Same as #to_i, but returns nil if there is not a valid number at the start
of this string, or if the resulting integer doesn't fit an Int32.
"12345".to_i? #=> 12345
"99 red balloons".to_i? #=> 99
"0a".to_i? #=> 0
"hello".to_i? #=> nilSame as #to_i but returns an UInt16 or the block's value.
Same as #to_i but returns an UInt16.
Same as #to_i but returns an UInt16 or nil.
Same as #to_i but returns an UInt32.
Same as #to_i but returns an UInt32 or the block's value.
Same as #to_i but returns an UInt32 or nil.
Same as #to_i but returns an UInt64.
Same as #to_i but returns an UInt64 or the block's value.
Same as #to_i but returns an UInt64 or nil.
Same as #to_i but returns an UInt8 or the block's value.
Same as #to_i but returns an UInt8.
Same as #to_i but returns an UInt8 or nil.
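The sized conversions listed above come in three forms: raising, nil-returning, and block-based. The sketch below illustrates each one. The input values are chosen only to land just inside or outside each type's range, and the variable names are illustrative:

```crystal
small = "127".to_i8             # 127 is the largest Int8, so this succeeds
too_big = "128".to_i8?          # does not fit an Int8, so nil is returned
fallback = "300".to_u8 { 0_u8 } # does not fit a UInt8, so the block's value is used
big = "9000000000".to_i64       # too large for an Int32, fine for an Int64

puts small          # prints: 127
puts too_big.nil?   # prints: true
puts fallback       # prints: 0
puts big            # prints: 9000000000
```

The nil-returning and block forms are convenient for untrusted input, where raising an ArgumentError for every malformed value would be awkward.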