Accessing Values in Strings: 'Hello World!' "Python Programming"
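The two string values quoted in the heading above suggest an example along the following lines. This is a minimal sketch; the variable names var1 and var2 are assumptions and do not come from the text.

```python
# var1 and var2 are illustrative names; the values are the ones quoted in the heading.
var1 = 'Hello World!'
var2 = "Python Programming"

first_char = var1[0]   # a single character, selected by index
middle = var2[1:5]     # a substring, selected by a slice (index 5 is excluded)

print("var1[0]: ", first_char)
print("var2[1:5]: ", middle)
```

Square brackets with one index return a single character, and a start:end pair returns the characters from start up to, but not including, end.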
Updating Strings
You can "update" an existing string by (re)assigning a variable to another string. The new value
can be related to its previous value or be a completely different string altogether. Strings are
immutable, so the assignment binds the variable to a new string rather than changing the old one.
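A short sketch of such an update, assuming a sample variable named greeting (the name and values are illustrative only):

```python
greeting = 'Hello World!'

# Build a new string from a slice of the old value and rebind the name to it.
greeting = greeting[:6] + 'Python'

print("Updated String: ", greeting)
```

The original 'Hello World!' object is not modified; the name simply refers to the newly built string afterwards.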
The examples below assume a = 'Hello' and b = 'Python', the values implied by the results shown.
Operator  Description and example
+         Concatenation: adds values on either side of the operator. a + b gives HelloPython
[]        Slice: gives the character at the given index. a[1] gives e
[:]       Range slice: gives the characters in the given range. a[1:4] gives ell
in        Membership: returns True if a character exists in the given string. 'H' in a gives True
not in    Membership: returns True if a character does not exist in the given string. 'M' not in a gives True
r/R       Raw string: suppresses the special meaning of escape characters. The syntax for raw
          strings is exactly the same as for normal strings, except for the raw string operator,
          the letter "r", which precedes the quotation marks. The "r" can be lowercase (r) or
          uppercase (R) and must be placed immediately before the first quote mark.
          print(r'\n') prints \n and print(R'\n') prints \n
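The operators above can be tried out directly. The sketch below assumes a = 'Hello' and b = 'Python', which matches the results quoted in the table; the result variable names are illustrative.

```python
a = 'Hello'
b = 'Python'

concatenated = a + b      # 'HelloPython'
char_at_1 = a[1]          # 'e'
span = a[1:4]             # 'ell'
has_h = 'H' in a          # True
lacks_m = 'M' not in a    # True
raw = r'\n'               # two characters: a backslash and the letter n

print(concatenated, char_at_1, span, has_h, lacks_m)
print(raw, len(raw))
```

Note that the membership operators return the Boolean values True and False, not 1 and 0.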
Sr.No  Format symbol  Conversion
1      %c             character
2      %s             string conversion via str() prior to formatting
3      %i             signed decimal integer
4      %d             signed decimal integer
5      %u             unsigned decimal integer (obsolete; behaves exactly like %d)
6      %o             octal integer
7      %x             hexadecimal integer (lowercase letters)
8      %X             hexadecimal integer (UPPERcase letters)
9      %e             exponential notation (with lowercase 'e')
10     %E             exponential notation (with UPPERcase 'E')
11     %f             floating point real number
12     %g             the shorter of %f and %e
13     %G             the shorter of %f and %E
Other supported symbols and functionality are listed in the following table −
Sr.No  Symbol  Functionality
1      *       argument specifies width or precision
2      -       left justification
3      +       display the sign
4      <sp>    leave a blank space before a positive number
5      #       add the octal leading '0o' or hexadecimal leading '0x' or '0X', depending on whether 'o', 'x' or 'X' was used
6      0       pad from left with zeros (instead of spaces)
7      %       '%%' leaves you with a single literal '%'
8      (var)   mapping variable (dictionary arguments)
9      m.n.    m is the minimum total width and n is the number of digits to display after the decimal point (if applicable)
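The format symbols and modifiers listed above combine with the % operator as in the following sketch. The sample values (name, weight and the numbers) are assumptions chosen only for illustration.

```python
name = 'Zara'     # sample values, not taken from the text
weight = 21

line = "My name is %s and weight is %d kg!" % (name, weight)

padded = "%05d" % 42            # zero padding        -> '00042'
signed = "%+d" % 42             # explicit sign       -> '+42'
hex_form = "%#x" % 255          # hexadecimal prefix  -> '0xff'
octal_form = "%#o" % 8          # octal prefix        -> '0o10'
fixed = "%8.3f" % 3.14159       # width 8, 3 decimals -> '   3.142'
left = "%-6s|" % "ab"           # left justification  -> 'ab    |'
percent = "%d%%" % 50           # literal percent     -> '50%'
star = "%*d" % (5, 42)          # width from argument -> '   42'
from_dict = "%(lang)s %(ver)d" % {"lang": "Python", "ver": 3}

print(line)
print(padded, signed, hex_form, octal_form, fixed, left, percent, star, from_dict)
```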
Triple Quotes
Python's triple quotes come to the rescue by allowing strings to span multiple lines, including
verbatim NEWLINEs, TABs, and any other special characters.
The syntax for triple quotes consists of three consecutive single or double quotes.
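A small sketch of a triple-quoted string, with sample text and variable names invented for illustration:

```python
poem = """first line
second line\twith a tab"""

# The line break typed inside the literal is stored as a newline character.
lines = poem.split('\n')

print(poem)
print(lines)
```

The newline typed between the two lines becomes part of the string, while \t is still interpreted as an escape sequence because this is not a raw string.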
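In a normal string literal, a backslash starts an escape sequence, so a literal backslash is written as \\ −
#!/usr/bin/python3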
print('C:\\nowhere')
When the above code is executed, it produces the following result −
C:\nowhere
Now let's make use of a raw string. We put the expression in r'expression' as follows −
#!/usr/bin/python3
print(r'C:\\nowhere')
When the above code is executed, it produces the following result −
C:\\nowhere
Unicode String
In Python 3, all strings are Unicode. In Python 2, strings were stored internally as 8-bit
ASCII, so a 'u' prefix was required to make a string Unicode. The prefix is no longer necessary.
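As a brief illustration (the sample text is an assumption), the u prefix is still accepted in Python 3 but makes no difference, and bytes appear only when a string is explicitly encoded:

```python
text = 'Héllo, wörld'
same = u'Héllo, wörld'             # the u prefix is allowed but changes nothing

encoded = text.encode('utf-8')     # str -> bytes
decoded = encoded.decode('utf-8')  # bytes -> str

print(type(text).__name__, type(encoded).__name__)
print(len(text), len(encoded))     # the two accented letters take two bytes each
```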
Built-in String Methods
Python includes the following built-in methods to manipulate strings −
1 capitalize()
Returns a copy of the string with its first character capitalized and the rest lowercased.
2 center(width, fillchar)
Returns a string padded with fillchar with the original string centered to a total of width columns.
count(str, beg=0, end=len(string))
Counts how many times str occurs in string, or in a substring of string if starting index beg and ending
index end are given.
encode(encoding='UTF-8', errors='strict')
Returns an encoded version of the string as bytes; on error, the default is to raise a UnicodeError (a
subclass of ValueError) unless errors is given as 'ignore' or 'replace'.
endswith(suffix, beg=0, end=len(string))
Determines if string, or a substring of string (if starting index beg and ending index end are given), ends
with suffix; returns True if so and False otherwise.
7 expandtabs(tabsize = 8)
Expands tabs in string to multiple spaces; defaults to 8 spaces per tab if tabsize is not provided.
find(str, beg=0, end=len(string))
Determines if str occurs in string, or in a substring of string if starting index beg and ending index end
are given; returns the index if found and -1 otherwise.
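10 isalnum()
Returns True if string has at least 1 character and all characters are alphanumeric, and False otherwise.
11 isalpha()
Returns True if string has at least 1 character and all characters are alphabetic, and False otherwise.
12 isdigit()
Returns True if string has at least 1 character and all characters are digits, and False otherwise.
13 islower()
Returns True if string has at least 1 cased character and all cased characters are in lowercase, and False
otherwise.
14 isnumeric()
Returns True if string has at least 1 character and all characters are numeric, and False otherwise.
15 isspace()
Returns True if string has at least 1 character and all characters are whitespace, and False otherwise.
16 istitle()
Returns True if string is properly "titlecased", and False otherwise.
17 isupper()
Returns True if string has at least one cased character and all cased characters are in uppercase, and False
otherwise.
18 join(seq)
Concatenates the strings in sequence seq into one string, using string as the separator; every element
of seq must itself be a string.
19 len(string)
Returns the length of the string. This is a built-in function, not a string method.
20 ljust(width[, fillchar])
Returns a string padded with fillchar (a space by default) with the original string left-justified to a total
of width columns.
21 lower()
Converts all uppercase letters in string to lowercase.
22 lstrip()
Removes all leading whitespace in string.
23 maketrans()
Returns a translation table to be used with translate().
24 max(str)
Returns the character with the highest code point in str. This is a built-in function.
25 min(str)
Returns the character with the lowest code point in str. This is a built-in function.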
29 rjust(width[, fillchar])
Returns a string padded with fillchar (a space by default) with the original string right-justified to a total
of width columns.
30 rstrip()
Removes all trailing whitespace of string.
31 split(sep=None, maxsplit=-1)
Splits string at the delimiter sep (runs of whitespace if not provided) and returns a list of substrings;
performs at most maxsplit splits if given.
32 splitlines(keepends=False)
Splits string at line boundaries and returns a list of lines, with the line breaks removed unless keepends
is true.
33 startswith(str, beg=0,end=len(string))
Determines if string, or a substring of string (if starting index beg and ending index end are given), starts
with substring str; returns True if so and False otherwise.
34 strip([chars])
Performs both lstrip() and rstrip() on string.
35 swapcase()
Inverts the case of all letters in string.
36 title()
Returns a "titlecased" version of string, that is, all words begin with uppercase and the rest are lowercase.
37 translate(table)
Translates string according to the translation table, typically built with maketrans(); characters mapped
to None are removed.
38 upper()
Converts all lowercase letters in string to uppercase.
39 zfill (width)
Returns the original string left-padded with zeros to a total of width characters; intended for numbers,
zfill() retains any sign given (less one zero).
40 isdecimal()
Returns True if string has at least 1 character and all characters are decimal characters, and False
otherwise.
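A handful of the methods listed above in action. The sample string and the variable names are assumptions chosen for illustration.

```python
s = '  hello, python world  '

trimmed = s.strip()                  # 'hello, python world'
title = trimmed.title()              # 'Hello, Python World'
count_o = trimmed.count('o')         # 3
pos = trimmed.find('python')         # 7
missing = trimmed.find('java')       # -1 when the substring is absent
words = trimmed.split()              # ['hello,', 'python', 'world']
joined = '-'.join(words)             # 'hello,-python-world'
centered = 'hi'.center(6, '*')       # '**hi**'
padded = '-42'.zfill(6)              # '-00042'

# maketrans() builds the table that translate() applies.
table = str.maketrans('lo', '10')
translated = trimmed.translate(table)   # 'he110, pyth0n w0r1d'

print(title, count_o, pos, missing)
print(words, joined, centered, padded, translated)
```

Every method returns a new string or value; the original string s is left unchanged.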
whitespace
A string containing all ASCII characters that are deemed whitespace (the string.whitespace
constant): space, tab, linefeed, carriage return, formfeed, and vertical tab. In Python 3 its
contents do not depend on the active locale.
You should not rebind these attributes; the effects of doing so are undefined, since
other parts of the Python library may rely on them.
textwrap.wrap(text, width=70, **kwargs): This function wraps the input paragraph such that each line
in the paragraph is at most width characters long. The wrap method returns a list of output lines. The
returned list is empty if the wrapped output has no content. Default width is taken as 70.
import textwrap
value = """This function wraps the input paragraph such that each line
in the paragraph is at most width characters long. The wrap method
returns a list of output lines. The returned list
is empty if the wrapped
output has no content."""
# Wrap the text at 50 columns, the width used by the wrapper elsewhere in these notes.
wrapper = textwrap.TextWrapper(width=50)
word_list = wrapper.wrap(text=value)
# Print each line.
for element in word_list:
    print(element)
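import textwrap
value = """This function returns the answer as STRING and not LIST."""
# fill() wraps the text and joins the resulting lines into a single string.
wrapper = textwrap.TextWrapper(width=50)
string = wrapper.fill(text=value)
print(string)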
textwrap.dedent(text): This function removes any common leading whitespace from every line
in the input text. This allows docstrings or embedded multi-line strings to line up with the left edge of
the display while still being written in indented form in the source code.
import textwrap
s = '''\
    hello
      world
    '''
print(repr(s))     # prints '    hello\n      world\n    '
text = textwrap.dedent(s)
print(repr(text))  # prints 'hello\n  world\n'
textwrap.shorten(text, width, **kwargs): This function truncates the input string so that it fits
within the given width. First, all whitespace in the string is collapsed by replacing each run of
whitespace with a single space. If the result fits within width, it is returned; otherwise, enough
words are dropped from the end so that the remaining words plus the placeholder fit within width.
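import textwrap
sample_text = """This function wraps the input paragraph such that each line
in the paragraph is at most width characters long. The wrap method
returns a list of output lines. The returned list
is empty if the wrapped
output has no content."""
wrapper = textwrap.TextWrapper(width=50)
dedented_text = textwrap.dedent(text=sample_text)
original = wrapper.fill(text=dedented_text)
print('Original:\n')
print(original)
# Shorten to at most 100 characters (an assumed width; the original value is not shown),
# then re-wrap the shortened text for display.
shortened = textwrap.shorten(text=original, width=100)
shortened_wrapped = wrapper.fill(text=shortened)
print('\nShortened:\n')
print(shortened_wrapped)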
textwrap.indent(text, prefix, predicate=None): This function is used to add the given prefix to the
beginning of the selected lines of the text. The predicate argument can be used to control which lines are
indented.
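import textwrap
s = 'hello\n\n \nworld'
# By default, lines that consist only of whitespace are not indented.
s1 = textwrap.indent(text=s, prefix=' ')
print(s1)
print("\n")
# A predicate that always returns True adds the prefix to every line, including blank ones.
s2 = textwrap.indent(text=s, prefix='+ ', predicate=lambda line: True)
print(s2)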
NOTE:
A regular expression is a special sequence of characters that helps you match or find
other strings or sets of strings, using a specialized syntax held in a pattern. Regular
expressions are widely used in the UNIX world.
The Python module re provides full support for Perl-like regular expressions in Python.
The re module raises the exception re.error if an error occurs while compiling or using
a regular expression.
We will cover two important functions that are used to handle regular
expressions. But a small thing first: various characters have a special
meaning when they are used in a regular expression. To avoid any confusion
while dealing with regular expressions, we will use raw strings, written as r'expression'.
The match Function
This function attempts to match the RE pattern at the beginning of string, with optional flags.
Here is the syntax for this function −
re.match(pattern, string, flags=0)
Here is the description of the parameters −
Sr.No  Parameter  Description
1      pattern    This is the regular expression to be matched.
2      string     This is the string that is searched to match the pattern at the beginning of the string.
3      flags      You can specify different flags using bitwise OR (|). These are modifiers, which are listed in the table below.
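The re.match function returns a match object on success and None on failure. The following
match object methods retrieve the matched text −
Sr.No  Match object method  Description
1      group(num=0)         This method returns the entire match (or specific subgroup num)
2      groups()             This method returns all matching subgroups in a tuple (empty if there weren't any)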
Example
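#!/usr/bin/python3
import re
# Sample input and pattern, assumed here for illustration because the originals are missing.
line = "Cats are smarter than dogs"
matchObj = re.match(r'(.*) are (.*?) .*', line, re.M | re.I)
if matchObj:
    print("matchObj.group() : ", matchObj.group())
    print("matchObj.group(1) : ", matchObj.group(1))
    print("matchObj.group(2) : ", matchObj.group(2))
else:
    print("No match!!")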
The search Function
This function searches for the first occurrence of the RE pattern within string, with
optional flags.
Here is the syntax for this function −
re.search(pattern, string, flags=0)
Here is the description of the parameters −
Sr.No  Parameter  Description
1      pattern    This is the regular expression to be matched.
2      string     This is the string that is searched to match the pattern anywhere in the string.
3      flags      You can specify different flags using bitwise OR (|). These are modifiers, which are listed in the table below.
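The re.search function returns a match object on success and None on failure. The following
match object methods retrieve the matched text −
Sr.No  Match object method  Description
1      group(num=0)         This method returns the entire match (or specific subgroup num)
2      groups()             This method returns all matching subgroups in a tuple (empty if there weren't any)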
Example
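#!/usr/bin/python3
import re
# Sample input and pattern, assumed here for illustration because the originals are missing.
line = "Cats are smarter than dogs"
searchObj = re.search(r'(.*) are (.*?) .*', line, re.M | re.I)
if searchObj:
    print("searchObj.group() : ", searchObj.group())
    print("searchObj.group(1) : ", searchObj.group(1))
    print("searchObj.group(2) : ", searchObj.group(2))
else:
    print("Nothing found!!")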
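The parameter descriptions above say that match looks for the pattern only at the beginning of the string, while search looks anywhere in it. The following sketch contrasts the two; the sample sentence and variable names are assumptions.

```python
import re

line = "Cats are smarter than dogs"   # sample sentence chosen for illustration

# match() succeeds only if the pattern matches at position 0.
match_result = re.match(r'dogs', line, re.M | re.I)

# search() scans the whole string for the first occurrence.
search_result = re.search(r'dogs', line, re.M | re.I)

print("match  :", match_result.group() if match_result else "No match!!")
print("search :", search_result.group() if search_result else "Nothing found!!")
```

Here match returns None because the string does not start with "dogs", whereas search finds the word near the end of the sentence.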
Sr.No  Modifier  Description
1      re.I      Performs case-insensitive matching.
2      re.L      Interprets words according to the current locale. This interpretation affects the alphabetic group (\w and \W), as well as word boundary behavior (\b and \B). In Python 3 it can be used only with bytes patterns.
3      re.M      Makes $ match the end of a line (not just the end of the string) and makes ^ match the start of any line (not just the start of the string).
4      re.S      Makes a period (dot) match any character, including a newline.
5      re.U      Interprets letters according to the Unicode character set. This flag affects the behavior of \w, \W, \b, \B. In Python 3 this is already the default for str patterns.
6      re.X      Permits "cuter" regular expression syntax. It ignores whitespace (except inside a set [] or when escaped by a backslash) and treats unescaped # as a comment marker.
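The effect of several of these modifiers can be seen in a short sketch. The sample text, the phone-number-like pattern and the variable names are assumptions made for illustration.

```python
import re

text = "first line\nSecond LINE"

ci = re.findall(r'line', text, re.I)          # case-insensitive: ['line', 'LINE']
starts = re.findall(r'^\w+', text, re.M)      # ^ matches at each line start: ['first', 'Second']

no_dotall = re.search(r'line.Second', text)         # None: '.' does not match the newline
dotall = re.search(r'line.Second', text, re.S)      # matches: '.' now matches the newline

# re.X allows whitespace and comments inside the pattern.
verbose = re.compile(r"""
    (\d{3})   # first group: three digits
    -         # a literal hyphen
    (\d{4})   # second group: four digits
""", re.X)
parts = verbose.match('555-1234').groups()

print(ci, starts, no_dotall is None, dotall is not None, parts)
```

Flags can be combined with bitwise OR, for example re.I | re.M.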
Sr.No  Pattern   Description
1      ^         Matches beginning of line.
2      $         Matches end of line.
3      .         Matches any single character except newline. Using the s option (re.S) allows it to match newline as well.
4      [...]     Matches any single character in brackets.
5      [^...]    Matches any single character not in brackets.
6      re*       Matches 0 or more occurrences of preceding expression.
7      re+       Matches 1 or more occurrences of preceding expression.
8      re?       Matches 0 or 1 occurrence of preceding expression.
9      re{n}     Matches exactly n occurrences of preceding expression.
10     re{n,}    Matches n or more occurrences of preceding expression.
11     re{n,m}   Matches at least n and at most m occurrences of preceding expression.
12     a|b       Matches either a or b.
13     (re)      Groups regular expressions and remembers matched text.
14     (?imx)      Turns on the i, m, or x options for the regular expression. In Python the flags apply to the whole pattern and should be placed at its start.
15     (?-imx)     Turns off the i, m, or x options. Python supports this only in the scoped form (?-imx:re).
16     (?:re)      Groups regular expressions without remembering matched text.
17     (?imx:re)   Temporarily toggles on i, m, or x options within parentheses.
18     (?-imx:re)  Temporarily toggles off i, m, or x options within parentheses.
19     (?#...)     Comment.
20     (?=re)      Positive lookahead: specifies a position using a pattern without consuming any characters.
21     (?!re)      Negative lookahead: specifies a position using pattern negation without consuming any characters.
22     (?>re)      Matches independent pattern without backtracking (atomic group; Python 3.11 and later).
23     \w            Matches word characters.
24     \W            Matches nonword characters.
25     \s            Matches whitespace. Equivalent to [ \t\n\r\f\v].
26     \S            Matches nonwhitespace.
27     \d            Matches digits. Equivalent to [0-9] for ASCII text.
28     \D            Matches nondigits.
29     \A            Matches beginning of string.
30     \Z            Matches end of string. In Python it matches only at the very end; use $ to also match just before a trailing newline.
31     \z            Matches end of string. Not available in older versions of Python's re module, where \Z serves this purpose.
32     \G            Matches point where last match finished. Not supported by Python's re module.
33     \b            Matches word boundaries when outside brackets. Matches backspace (0x08) when inside brackets.
34     \B            Matches nonword boundaries.
35     \n, \t, etc.  Matches newlines, carriage returns, tabs, etc.
36     \1...\9       Matches nth grouped subexpression.
37     \10           Matches nth grouped subexpression if it matched already. Otherwise refers to the octal representation of a character code.
Sr.No  Example  Description
1      python   Match "python".
Character classes
Sr.No  Example       Description
1      [Pp]ython     Match "Python" or "python"
2      rub[ye]       Match "ruby" or "rube"
3      [aeiou]       Match any one lowercase vowel
4      [0-9]         Match any digit; same as [0123456789]
5      [a-z]         Match any lowercase ASCII letter
6      [A-Z]         Match any uppercase ASCII letter
7      [a-zA-Z0-9]   Match any of the above
8      [^aeiou]      Match anything other than a lowercase vowel
9      [^0-9]        Match anything other than a digit
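Sr.No  Example  Description
1      .        Match any character except newline
2      \d       Match a digit: [0-9]
3      \D       Match a nondigit: [^0-9]
4      \s       Match a whitespace character: [ \t\r\n\f\v]
5      \S       Match nonwhitespace: [^ \t\r\n\f\v]
6      \w       Match a single word character: [A-Za-z0-9_]
7      \W       Match a nonword character: [^A-Za-z0-9_]
The bracketed equivalents describe ASCII text; for Python 3 str patterns, \d, \s and \w also match the corresponding Unicode characters unless re.ASCII is given.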
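The character classes above can be checked with re.findall and re.sub. The sample strings and variable names below are assumptions made for illustration.

```python
import re

sample = "Python 3 and python2, ruby or rube"

names = re.findall(r'[Pp]ython', sample)         # ['Python', 'python']
rub = re.findall(r'rub[ye]', sample)             # ['ruby', 'rube']
digits = re.findall(r'\d', sample)               # ['3', '2']
vowels = re.sub(r'[^aeiou]', '', 'regular')      # keeps only lowercase vowels: 'eua'
words = re.findall(r'\w+', 'a_b c-d')            # ['a_b', 'c', 'd']

print(names, rub, digits, vowels, words)
```

The underscore counts as a word character, which is why 'a_b' stays together while 'c-d' is split at the hyphen.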
Repetition Cases
Sr.No  Example   Description
1      ruby?     Match "rub" or "ruby": the y is optional
2      ruby*     Match "rub" plus 0 or more ys
3      ruby+     Match "rub" plus 1 or more ys
4      \d{3}     Match exactly 3 digits
5      \d{3,}    Match 3 or more digits
6      \d{3,5}   Match 3, 4, or 5 digits
Nongreedy repetition
This matches the smallest number of repetitions −
Sr.No  Example  Description
1      <.*>     Greedy repetition: matches "<python>perl>" in "<python>perl>"
2      <.*?>    Nongreedy: matches "<python>" in "<python>perl>"
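The difference between greedy and nongreedy repetition can be seen on the sample string from the table. The variable names are illustrative.

```python
import re

text = "<python>perl>"

greedy = re.match(r'<.*>', text).group()    # runs to the last '>': '<python>perl>'
lazy = re.match(r'<.*?>', text).group()     # stops at the first '>': '<python>'

print(greedy)
print(lazy)
```

Adding ? after a quantifier (*?, +?, ??, {n,m}?) makes it match as few characters as possible.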
Sr.No  Example             Description
1      \D\d+               No group: + repeats \d
2      (\D\d)+             Grouped: + repeats \D\d pair
3      ([Pp]ython(, )?)+   Match "Python", "Python, python, python", etc.
Backreferences
This matches a previously matched group again −
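Sr.No  Example              Description
1      ([Pp])ython&\1ails   Match python&pails or Python&Pails
2      (['"])[^\1]*\1       Single or double-quoted string. \1 matches whatever the 1st group matched. \2 matches whatever the 2nd group matched, etc.
In Python, \1 inside a character class is read as the character with octal code 1 rather than as a backreference, so (['"]).*?\1 is the more dependable way to write the second pattern.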
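A sketch of both backreference patterns follows. The quoted-string pattern is written here as (['"]).*?\1, a variant chosen for this example rather than the form shown in the table, and the test strings are assumptions.

```python
import re

pair = re.compile(r'([Pp])ython&\1ails')
ok_lower = pair.fullmatch('python&pails')    # matches: \1 repeats the lowercase p
ok_upper = pair.fullmatch('Python&Pails')    # matches: \1 repeats the uppercase P
mixed = pair.fullmatch('Python&pails')       # None: \1 must equal what group 1 captured

# The closing quote must be the same character as the opening quote.
quoted = re.compile(r'''(['"]).*?\1''')
found = quoted.search('say "hi there" now').group()
unbalanced = quoted.search('"unterminated\'')

print(bool(ok_lower), bool(ok_upper), mixed, found, unbalanced)
```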
Alternatives
Sr.No  Example         Description
1      python|perl     Match "python" or "perl"
2      rub(y|le)       Match "ruby" or "ruble"
3      Python(!+|\?)   "Python" followed by one or more ! or one ?
Anchors
Anchors specify the position at which a match must occur.
Sr.No  Example       Description
1      ^Python       Match "Python" at the start of a string or internal line
2      Python$       Match "Python" at the end of a string or line
3      \APython      Match "Python" at the start of a string
4      Python\Z      Match "Python" at the end of a string
5      \bPython\b    Match "Python" at a word boundary
6      \brub\B       \B is nonword boundary: match "rub" in "rube" and "ruby" but not alone
7      Python(?=!)   Match "Python", if followed by an exclamation point.
8      Python(?!!)   Match "Python", if not followed by an exclamation point.
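The anchors and lookaheads above can be verified by looking at where each pattern matches. The sample strings and variable names are assumptions made for illustration.

```python
import re

text = "Python!\nI like Python"

at_start = [m.start() for m in re.finditer(r'^Python', text)]        # [0]
at_end = [m.start() for m in re.finditer(r'Python$', text)]          # [15]
followed = [m.start() for m in re.finditer(r'Python(?=!)', text)]    # [0]
not_followed = [m.start() for m in re.finditer(r'Python(?!!)', text)]  # [15]

# \b requires a word boundary; \B requires the absence of one.
whole_word = re.search(r'\bPython\b', 'Pythonic Python').start()     # 9
rub_inside = [m.start() for m in re.finditer(r'\brub\B', 'rub rube ruby')]  # [4, 9]

print(at_start, at_end, followed, not_followed, whole_word, rub_inside)
```

The lookahead patterns match only the word "Python" itself; the exclamation point is checked but not included in the match.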
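Sr.No  Example        Description
1      R(?#comment)   Matches "R". All the rest is a comment
2      R(?i)uby       Case-insensitive while matching "uby". Python 3.11 and later reject inline flags that are not at the start of the pattern; use the scoped form in the next row.
3      R(?i:uby)      Same as above
4      rub(?:y|le)    Group only without creating \1 backreference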
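A short sketch of the scoped flag, the non-capturing group and the inline comment from the table above. The test words and variable names are assumptions.

```python
import re

# Only "uby" is matched case-insensitively; the leading R stays case-sensitive.
scoped = re.compile(r'R(?i:uby)')
hits = [bool(scoped.fullmatch(word)) for word in ('Ruby', 'RUBY', 'ruby')]

# (?:...) groups the alternatives without creating a numbered group.
non_capturing = re.match(r'rub(?:y|le)', 'ruble').groups()
capturing = re.match(r'rub(y|le)', 'ruble').groups()

# Everything inside (?#...) is ignored by the regex engine.
commented = re.match(r'R(?#the rest is a comment)', 'R').group()

print(hits, non_capturing, capturing, commented)
```

Non-capturing groups are useful when parentheses are needed only for grouping, since they keep the numbering of the remaining groups unchanged.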