20 Common Python String Operations: How Many Have You Mastered?

Author: 用戶007, 2025-11-18 09:08:53

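Strings are among the most frequently used data types in Python, yet many programmers rely on only a small subset of the available string methods. This article walks through 20 common, efficient string techniques that can make everyday code noticeably faster and cleaner.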

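I. Basic String Searching and Replacing (5 operations)

1. find() vs index(): locating a substring

These two methods look equivalent, but they differ in one key way.

text = "Python is awesome, Python is powerful"

# find(): returns the index if found, -1 if not (no exception)
pos1 = text.find("Python")
print(pos1)  # Output: 0

pos2 = text.find("Java")
print(pos2)  # Output: -1 (not found)

# index(): returns the index if found, raises an exception if not
pos3 = text.index("Python")
print(pos3)  # Output: 0

try:
    pos4 = text.index("Java")
except ValueError as e:
    print(e)  # Output: substring not found

# Find the second occurrence
second_pos = text.find("Python", 1)  # start searching at index 1
print(second_pos)  # Output: 19

Key differences:

find() returns -1 when nothing is found, so no exception handling is needed
index() raises ValueError when nothing is found, so the caller must handle it

Best practice: use find() when a missing substring is an expected case, and always compare the result with -1 before using it, because -1 is also a valid index (the last character). Use index() when a missing substring should be treated as an error.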
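
The start argument of find() can also be used to collect every occurrence, not just the first or second. The helper below is a minimal sketch; the name find_all is hypothetical and not part of the standard library.

```python
def find_all(text: str, sub: str) -> list[int]:
    """Return the start index of every non-overlapping occurrence of sub."""
    if not sub:
        return []
    positions = []
    start = 0
    while True:
        pos = text.find(sub, start)
        if pos == -1:          # -1 means "no more matches"
            break
        positions.append(pos)
        start = pos + len(sub)  # continue after the current match
    return positions


print(find_all("Python is awesome, Python is powerful", "Python"))  # [0, 19]
```

Checking for -1 explicitly keeps the loop from treating "not found" as a valid index.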

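2. replace(): replacing a substring

text = "hello world, hello python"

# Basic replacement: replaces every match
result1 = text.replace("hello", "hi")
print(result1)
# Output: hi world, hi python

# Limit the count: replace only the first n matches
result2 = text.replace("hello", "hi", 1)  # only the first match
print(result2)
# Output: hi world, hello python

# replace() is case-sensitive
text2 = "Hello world, hello python"
result3 = text2.replace("hello", "hi")
print(result3)
# Output: Hello world, hi python (the capitalized "Hello" is not replaced)

Performance pitfall:

# ❌ Inefficient: each replace() call scans the whole string again
text = "abc" * 1000000  # 3 million characters
for old, new in [("a", "x"), ("b", "y"), ("c", "z")]:
    text = text.replace(old, new)

# ✅ Better for several single-character replacements: one pass with translate()
# (for a single replacement, a plain replace() is fine)
text = "abc" * 1000000
result = text.translate(str.maketrans("abc", "xyz"))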

3. count(): counting occurrences of a substring

text = "the quick brown fox jumps over the lazy dog"

# Basic count
count1 = text.count("the")
print(count1)  # Output: 2

# Count within a range (from index 5 up to, but not including, index 40)
count2 = text.count("the", 5, 40)
print(count2)  # Output: 1 (the second "the" starts at index 31)

# Count how often each character occurs
stats = {}
for char in text:
    if char != ' ':
        stats[char] = stats.get(char, 0) + 1

print(stats)
# Output: {'t': 2, 'h': 2, 'e': 3, ...}

# A more convenient approach: Counter
from collections import Counter
char_count = Counter(text.replace(" ", ""))
print(char_count.most_common(3))  # the 3 most common characters

Production-style usage:

# Count how often keywords appear in a log
log_text = """
ERROR: Database connection failed
WARNING: Memory usage high
ERROR: Timeout error
INFO: Server restarted
ERROR: Authentication failed
"""

error_count = log_text.count("ERROR")
warning_count = log_text.count("WARNING")
print(f"Errors: {error_count}, warnings: {warning_count}")
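
One detail worth knowing: count() only counts non-overlapping matches. The sketch below contrasts it with a regex lookahead that also counts overlapping ones; count_overlapping is a hypothetical helper name used only for illustration.

```python
import re


def count_overlapping(text: str, sub: str) -> int:
    """Count occurrences of sub in text, including overlapping ones."""
    # A zero-width lookahead matches at every position where sub starts,
    # without consuming characters, so overlapping matches are all found.
    return len(re.findall(f"(?={re.escape(sub)})", text))


print("aaaa".count("aa"))             # 2 (non-overlapping)
print(count_overlapping("aaaa", "aa"))  # 3 (overlapping)
```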

4. startswith() and endswith(): prefix and suffix checks

filename = "document.pdf"
url = "https://www.example.com"

# Check the suffix (a tuple tests several candidates at once)
if filename.endswith((".pdf", ".doc", ".docx")):
    print("This is a document file")

# Check the prefix
if url.startswith(("http://", "https://")):
    print("This is a URL")

# Practical use: filtering files
import os

def get_python_files(directory):
    """Return the names of all Python files in a directory."""
    python_files = []
    for file in os.listdir(directory):
        if file.endswith('.py'):
            python_files.append(file)
    return python_files

# A more Pythonic version
def get_python_files_v2(directory):
    """Same result, written as a list comprehension."""
    return [f for f in os.listdir(directory) if f.endswith('.py')]

Style comparison:

# ❌ Verbose: one call per suffix
if filename.endswith('.pdf') or filename.endswith('.doc'):
    pass

# ✅ Better: a single call with a tuple of suffixes (shorter, and avoids repeated method calls)
if filename.endswith(('.pdf', '.doc')):
    pass

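5. strip() / lstrip() / rstrip(): removing whitespace

text = "   hello world   \n"

# strip(): remove whitespace from both ends
result1 = text.strip()
print(f"'{result1}'")  # Output: 'hello world'

# lstrip(): remove from the left end only
result2 = text.lstrip()
print(f"'{result2}'")  # Output: 'hello world   \n' (the trailing newline is kept and printed as a line break)

# rstrip(): remove from the right end only
result3 = text.rstrip()
print(f"'{result3}'")  # Output: '   hello world'

# ⚠️ Key pitfall: strip() does not remove just one character!
text2 = "---hello---"
print(text2.strip("-"))  # Output: hello (every consecutive - at both ends is removed)

# Custom set of characters to remove
text3 = "xxxhelloyyy"
print(text3.strip("xy"))  # Output: hello
print(text3.strip("xyhel"))  # Output: o (any character in the set is removed from both ends)

# Practical use: cleaning CSV data
csv_line = " 張三 , 25 , Beijing \n"
fields = [f.strip() for f in csv_line.split(',')]
print(fields)
# Output: ['張三', '25', 'Beijing']

# Handling user input
user_input = input("Enter your name: ").strip()
# Surrounding whitespace is removed, which avoids inconsistent data

Common mistake:

# ❌ Wrong: strip() removes a set of characters, not a substring
text = "hello"
print(text.strip("lo"))  # Output: he (the trailing "llo" is removed, not just "lo")

# ✅ Correct: to remove a specific prefix string, check for it and slice
text = "hello"
if text.startswith("he"):
    text = text[len("he"):]
print(text)  # Output: llo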
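
The difference between stripping a character set and removing a prefix is easy to miss because lstrip() sometimes gives the expected result by accident. The sketch below uses a hypothetical strip_prefix helper; on Python 3.9 and later the built-in str.removeprefix() and str.removesuffix() methods cover the same need.

```python
def strip_prefix(text: str, prefix: str) -> str:
    """Remove prefix once if text starts with it; otherwise return text unchanged."""
    if prefix and text.startswith(prefix):
        return text[len(prefix):]
    return text


# lstrip("w.") removes any leading run of 'w' and '.' characters
print("www.example.com".lstrip("w."))      # example.com (right only by luck)
print("www.web.example.com".lstrip("w."))  # eb.example.com (too much removed)

# Removing the prefix as a whole string gives the intended result
print(strip_prefix("www.web.example.com", "www."))  # web.example.com
```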

II. Advanced Splitting and Joining (3 operations)

6. split() and rsplit(): the art of splitting strings

# Basic split
text = "apple,banana,cherry,date"

parts1 = text.split(",")
print(parts1)
# Output: ['apple', 'banana', 'cherry', 'date']

# Limit the number of splits
parts2 = text.split(",", 2)  # split at most 2 times
print(parts2)
# Output: ['apple', 'banana', 'cherry,date']

# rsplit(): split starting from the right
parts3 = text.rsplit(",", 2)  # split at most 2 times from the right
print(parts3)
# Output: ['apple,banana', 'cherry', 'date']

# Splitting on several delimiters (with a regular expression)
import re
text2 = "apple, banana; cherry: date"
parts4 = re.split(r'[,;:]', text2)
print(parts4)
# Output: ['apple', ' banana', ' cherry', ' date']

# Practical use 1: parsing a URL (for real code, urllib.parse is more robust)
url = "https://www.example.com/path/to/resource?key=value&foo=bar"
protocol, rest = url.split("://", 1)
domain, rest = rest.split("/", 1)
path, query = rest.split("?", 1)
print(f"Protocol: {protocol}, domain: {domain}, path: {path}, query: {query}")

# Practical use 2: parsing a CSV row
csv_line = 'John,"Smith, Jr.",30,New York'
# A plain split(",") breaks the quoted field, so use the csv module
import csv
reader = csv.reader([csv_line])
fields = next(reader)
print(fields)
# Output: ['John', 'Smith, Jr.', '30', 'New York']

Performance comparison:

# ❌ Less efficient: splits the entire string even though one field is needed
text = "a:b:c:d:e"
parts = text.split(":")
result = parts[2]  # the 3rd element

# ✅ More efficient: stop splitting once the needed field is isolated
result = text.split(":", 3)[2]
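
A related detail is that split() with no argument behaves differently from split(" "). The short sketch below shows the difference on a line with irregular whitespace.

```python
line = "  alpha   beta\tgamma \n"

# With an explicit separator, every single space is a boundary,
# so runs of spaces produce empty strings in the result.
explicit = line.split(" ")
print(explicit)

# With no argument, any run of whitespace (spaces, tabs, newlines) counts
# as one separator, and leading/trailing whitespace is ignored.
default = line.split()
print(default)  # ['alpha', 'beta', 'gamma']
```

For user input or log lines with uneven spacing, the no-argument form is usually the one you want.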

7. join(): joining strings

# Basic join
words = ["hello", "world", "python"]
result1 = " ".join(words)
print(result1)  # Output: hello world python

# Joining numbers (they must be converted to str first)
numbers = [1, 2, 3, 4, 5]
result2 = "-".join(str(n) for n in numbers)
print(result2)  # Output: 1-2-3-4-5

# Practical use 1: building a SQL IN clause
# (acceptable here only because ids are integers; for untrusted values use parameterized queries)
ids = [1, 2, 3, 4, 5]
sql = f"SELECT * FROM users WHERE id IN ({','.join(map(str, ids))})"
print(sql)

# Practical use 2: building a URL path
path_parts = ["api", "v1", "users", "123"]
path = "/" + "/".join(path_parts)
print(path)  # Output: /api/v1/users/123

# Practical use 3: building a CSV row (use the csv module if fields may contain commas)
data = ["張三", 25, "Beijing", "zhangsan@example.com"]
csv_line = ",".join(map(str, data))
print(csv_line)

# ⚠️ Performance pitfall: avoid + for joining many strings in a loop
# ❌ Avoid (each step may create a new string, up to O(n²) overall)
result = ""
for word in words:
    result = result + " " + word

# ✅ Prefer (a single join, O(n))
result = " ".join(words)

Comparison on a larger data set:

import time

# Generate 10000 strings
data = ["word"] * 10000

# Concatenate with + (slower)
start = time.time()
result = ""
for word in data:
    result += word + ","
time1 = time.time() - start

# Use join (faster)
start = time.time()
result = ",".join(data)
time2 = time.time() - start

print(f"+ approach: {time1:.4f}s, join approach: {time2:.4f}s")
# join() is typically much faster; the exact ratio depends on the interpreter
# (CPython optimizes some += cases) and on the data size

8. partition() and rpartition(): three-way splitting

# partition(): split into three parts at the first separator
text = "name=John;age=30;city=NYC"

head, sep, tail = text.partition(";")
print(f"Head: {head}, separator: {sep}, tail: {tail}")
# Output: Head: name=John, separator: ;, tail: age=30;city=NYC

# Practical use: parsing key=value text
def parse_key_value(text):
    key, sep, value = text.partition("=")
    # The value is None when the text contains no "="
    return key.strip(), (value.strip() if sep else None)

result = parse_key_value("timeout = 3000")
print(result)  # Output: ('timeout', '3000')

# rpartition(): split at the last separator
head, sep, tail = text.rpartition(";")
print(f"Head: {head}, separator: {sep}, tail: {tail}")
# Output: Head: name=John;age=30, separator: ;, tail: city=NYC

# Practical use: getting a file extension
def get_file_info(filename):
    name, sep, ext = filename.rpartition(".")
    if not sep:  # no dot: rpartition() puts the whole string in the last slot
        return filename, ""
    return name, ext

print(get_file_info("document.pdf"))  # Output: ('document', 'pdf')
print(get_file_info("archive.tar.gz"))  # Output: ('archive.tar', 'gz')
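
Combining split() and partition() gives a compact parser for the "name=John;age=30;city=NYC" style of text used above. This is a minimal sketch; parse_pairs and its parameter names are hypothetical.

```python
def parse_pairs(text: str, item_sep: str = ";", kv_sep: str = "=") -> dict[str, str]:
    """Parse 'k1=v1;k2=v2' style text into a dict of stripped strings."""
    result = {}
    for item in text.split(item_sep):
        # partition() splits only at the first kv_sep, so values may contain it
        key, sep, value = item.partition(kv_sep)
        if not sep:
            continue  # skip fragments that have no key/value separator
        result[key.strip()] = value.strip()
    return result


print(parse_pairs("name=John;age=30;city=NYC"))
# {'name': 'John', 'age': '30', 'city': 'NYC'}
print(parse_pairs("url=https://example.com/?a=1"))
# {'url': 'https://example.com/?a=1'}
```

Because partition() always returns three parts, the unpacking never fails, even on malformed items.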

III. Formatting and Conversion (4 operations)

9. format() and f-strings: the evolution of string formatting

name = "張三"
age = 25
salary = 15000.5

# Style 1: % formatting (the oldest style, still supported)
result1 = "Name: %s, age: %d, salary: %.2f" % (name, age, salary)

# Style 2: the format() method (works on all Python 3 versions)
result2 = "Name: {}, age: {}, salary: {:.2f}".format(name, age, salary)

# Style 3: f-string (Python 3.6+, recommended)
result3 = f"Name: {name}, age: {age}, salary: {salary:.2f}"

print(result3)
# Output: Name: 張三, age: 25, salary: 15000.50

# f-strings can evaluate expressions directly
print(f"Next year's salary: {salary * 1.1:.2f}")  # Output: Next year's salary: 16500.55

# Alignment and padding
numbers = [1, 12, 123, 1234]
for num in numbers:
    print(f"Number:{num:>5}")
# Output:
# Number:    1
# Number:   12
# Number:  123
# Number: 1234

# Base conversion
num = 255
print(f"Decimal: {num}, hex: {num:x}, binary: {num:b}")
# Output: Decimal: 255, hex: ff, binary: 11111111

# Percentage format
rate = 0.8567
print(f"Completion: {rate:.2%}")  # Output: Completion: 85.67%

# Thousands separator (the "," format option; "_" is also available from Python 3.6)
large_num = 1234567890
print(f"Large number: {large_num:,}")  # Output: Large number: 1,234,567,890

Performance comparison:

import time

name = "Python"
age = 10

# Compare the three approaches
iterations = 1000000

# % formatting
start = time.time()
for _ in range(iterations):
    result = "%s is %d years old" % (name, age)
time1 = time.time() - start

# format() method
start = time.time()
for _ in range(iterations):
    result = "{} is {} years old".format(name, age)
time2 = time.time() - start

# f-string
start = time.time()
for _ in range(iterations):
    result = f"{name} is {age} years old"
time3 = time.time() - start

print(f"% formatting: {time1:.3f}s")
print(f"format(): {time2:.3f}s")
print(f"f-string: {time3:.3f}s")
# Typical result: the f-string is fastest; the order of the other two varies by Python version
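
Hand-rolled time.time() loops are sensitive to noise, so for micro-benchmarks like this the standard timeit module is usually a better fit. The sketch below repeats the same comparison with timeit.repeat(); the iteration counts are arbitrary assumptions, and no particular timings are implied.

```python
import timeit

name, age = "Python", 10

# Each candidate builds the same string in a different way
candidates = {
    "%-formatting": lambda: "%s is %d years old" % (name, age),
    "str.format()": lambda: "{} is {} years old".format(name, age),
    "f-string": lambda: f"{name} is {age} years old",
}

results = {}
for label, func in candidates.items():
    # repeat() returns one total time per run; the minimum is the least noisy
    results[label] = min(timeit.repeat(func, number=20_000, repeat=3))

for label, seconds in results.items():
    print(f"{label:<14}{seconds:.4f}s")
```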

10. upper() / lower() / title() / swapcase(): case conversion

text = "Hello World Python"

# All uppercase
print(text.upper())  # Output: HELLO WORLD PYTHON

# All lowercase
print(text.lower())  # Output: hello world python

# Title case (first letter of each word capitalized)
print(text.title())  # Output: Hello World Python

# Swap case
print(text.swapcase())  # Output: hELLO wORLD pYTHON

# capitalize(): first letter uppercase, the rest lowercase
print(text.capitalize())  # Output: Hello world python

# Practical use 1: normalizing user input
user_email = input("Enter your email: ").strip().lower()
# Prevents problems caused by differences in letter case

# Practical use 2: generating a URL slug
def slugify(text):
    """Convert text to a simple URL-friendly form (handles spaces only)."""
    return text.lower().replace(" ", "-")

print(slugify("Hello World Python"))  # Output: hello-world-python

# Practical use 3: checking password complexity
def check_password_strength(password):
    has_upper = any(c.isupper() for c in password)
    has_lower = any(c.islower() for c in password)
    has_digit = any(c.isdigit() for c in password)
    return len(password) >= 8 and has_upper and has_lower and has_digit

print(check_password_strength("Secure123"))  # Output: True
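
One caveat with title(): it capitalizes the letter after every non-letter character, including apostrophes and dots. The sketch below compares it with string.capwords() from the standard library, which only splits on whitespace.

```python
import string

text = "they're using python's str.title()"

# title() restarts capitalization after any non-letter character
titled = text.title()
print(titled)    # They'Re Using Python'S Str.Title()

# capwords() splits on whitespace and capitalizes each word once
capped = string.capwords(text)
print(capped)    # They're Using Python's Str.title()
```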

11. isdigit() / isalpha() / isalnum(): character checks

# Check whether all characters are digits
print("12345".isdigit())  # Output: True
print("123a5".isdigit())  # Output: False

# Check whether all characters are letters
print("hello".isalpha())  # Output: True
print("hello123".isalpha())  # Output: False

# Check whether all characters are letters or digits
print("hello123".isalnum())  # Output: True
print("hello-123".isalnum())  # Output: False

# Check whether all characters are whitespace
print("   ".isspace())  # Output: True

# Check whether the string is a valid identifier (variable name)
print("var_name".isidentifier())  # Output: True
print("123var".isidentifier())  # Output: False

# Check whether all cased characters are uppercase / lowercase
print("HELLO".isupper())  # Output: True
print("hello".islower())  # Output: True

# Practical use 1: validating user input
def validate_username(username):
    if len(username) < 3 or len(username) > 20:
        return False, "Username must be 3-20 characters long"
    if not username[0].isalpha():
        return False, "Username must start with a letter"
    if not username.replace("_", "").isalnum():
        return False, "Username may only contain letters, digits, and underscores"
    return True, "Username is valid"

print(validate_username("user_123"))  # Output: (True, 'Username is valid')
print(validate_username("123user"))   # Output: (False, 'Username must start with a letter')

# Practical use 2: detecting the kind of data
def detect_type(value_str):
    """Classify what kind of data a string represents."""
    if value_str.isdigit():
        return "integer"
    elif value_str.isalpha():
        return "alphabetic"
    elif value_str.isalnum():
        return "alphanumeric"
    else:
        return "other"

print(detect_type("123"))  # Output: integer
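
isdigit() is a character-class test, not a check that int() will succeed: it rejects signs and surrounding whitespace, and it accepts characters such as superscript digits that int() refuses. When the goal is "can this be parsed as an integer", trying the conversion is more reliable. A minimal sketch, with parse_int as a hypothetical helper name:

```python
def parse_int(value: str):
    """Return int(value), or None if value is not a valid integer literal."""
    try:
        return int(value)
    except ValueError:
        return None


# Columns: the sample, what isdigit() says, what int() actually accepts
for sample in ["123", "-5", " 42 ", "²", "12.5"]:
    print(repr(sample), sample.isdigit(), parse_int(sample))
```

Here "-5" and " 42 " fail isdigit() but parse fine, while "²" passes isdigit() but cannot be parsed.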

12. zfill() and center(): padding and centering

# zfill(): pad with zeros on the left
num_str = "123"
print(num_str.zfill(5))  # Output: 00123

# Practical use 1: generating an order number
def generate_order_id(order_num):
    return f"ORD{order_num:0>6d}"

print(generate_order_id(123))  # Output: ORD000123

# center(): center the text (pads both sides)
text = "Python"
print(text.center(15))  # Output: "     Python    "
print(text.center(15, "*"))  # Output: "*****Python****"

# ljust() and rjust(): left-align and right-align
print(text.ljust(15, "-"))  # Output: Python---------
print(text.rjust(15, "-"))  # Output: ---------Python

# Practical use 2: printing a table
def print_table(rows):
    """Print a table with centered cells."""
    for row in rows:
        print("|".join(cell.center(15) for cell in row))

rows = [
    ["Name", "Age", "City"],
    ["張三", "25", "Beijing"],
    ["李四", "30", "Shanghai"],
]
print_table(rows)
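
zfill() differs from rjust(width, "0") in one useful way: it keeps a leading sign in front of the padding. A short sketch:

```python
for value in ["42", "-42", "+42"]:
    # zfill() inserts zeros after the sign; rjust() pads blindly on the left
    print(value.zfill(6), value.rjust(6, "0"))
# 000042 000042
# -00042 000-42
# +00042 000+42
```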

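IV. Regular Expressions and Advanced Operations (8 operations)

13. Regular expression basics: match() / search() / findall()

import re

# match(): matches only at the start of the string
text = "Python 3.9"
if re.match(r"Python", text):
    print("Match succeeded")

# search(): looks for a match anywhere in the string
if re.search(r"\d+\.\d+", text):
    print("Found a version number")

# findall(): returns every match
emails = "contact us at john@example.com or jane@test.org"
found = re.findall(r"\b[\w.-]+@[\w.-]+\.\w+\b", emails)
print(found)
# Output: ['john@example.com', 'jane@test.org']

# With a capture group, findall() returns the group contents
text = "Price: $99.99, Tax: $7.50"
matches = re.findall(r"\$(\d+\.\d+)", text)
print(matches)
# Output: ['99.99', '7.50']

# Practical use 1: extracting phone numbers
def extract_phone_numbers(text):
    """Extract phone numbers from text as (area, prefix, line) tuples."""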
    pattern = r"\b(?:\+?1[-.\s]?)?\(?([0-9]{3})\)?[-.\s]?([0-9]{3})[-.\s]?([0-9]{4})\b"
    return re.findall(pattern, text)

text = "Call me at 123-456-7890 or (098) 765 4321"
print(extract_phone_numbers(text))

# Practical use 2: URL extraction
def extract_urls(text):
    """Extract all URLs from the text."""
    pattern = r"https?://(?:www\.)?[-a-zA-Z0-9@:%._\+~#=]{1,256}\.[a-zA-Z0-9()]{1,6}\b(?:[-a-zA-Z0-9()@:%_\+.~#?&//=]*)"
    return re.findall(pattern, text)

text = "Visit https://www.example.com or http://test.org for more info"
print(extract_urls(text))

14. sub() and subn(): regex replacement

import re

# sub(): replace every match
# (the replacement is literal text: \1 inserts the captured number, "*7" is not evaluated)
text = "The price is $99.99 and tax is $7.50"
result = re.sub(r"\$(\d+\.\d+)", r"￥\1*7", text)
print(result)
# Output: The price is ￥99.99*7 and tax is ￥7.50*7

# subn(): replace and also return the number of replacements
text = "apple, apple, apple"
result, count = re.subn(r"apple", "orange", text)
print(f"Replaced {count} occurrences")
print(result)

# Dynamic replacement with a function
def replace_func(match):
    """Increase the price by 10%."""
    price = float(match.group(1))
    return f"${price * 1.1:.2f}"

text = "Item 1: $100, Item 2: $50"
result = re.sub(r"\$(\d+(?:\.\d+)?)", replace_func, text)
print(result)
# Output: Item 1: $110.00, Item 2: $55.00

# Practical use 1: converting a date format
def convert_date_format(text):
    """Convert 2024-01-15 to 15/01/2024."""
    pattern = r"(\d{4})-(\d{2})-(\d{2})"
    return re.sub(pattern, r"\3/\2/\1", text)

print(convert_date_format("Today is 2024-01-15"))
# Output: Today is 15/01/2024

# Practical use 2: removing HTML tags
def remove_html_tags(text):
    """Strip tags from simple HTML (a rough approach, not a full HTML parser)."""
    return re.sub(r"<[^>]+>", "", text)

html = "<p>Hello <b>World</b></p>"
print(remove_html_tags(html))
# Output: Hello World
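
The regex above works for simple markup but leaves character references such as &lt; untouched. As an alternative sketch, the standard library's html.parser module can collect text nodes and decode entities; the TextExtractor class and strip_tags function are hypothetical names for illustration.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect only the text nodes of an HTML fragment."""

    def __init__(self):
        super().__init__()  # convert_charrefs=True decodes entities like &lt;
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def strip_tags(html: str) -> str:
    parser = TextExtractor()
    parser.feed(html)
    parser.close()  # flush any buffered text
    return "".join(parser.parts)


print(strip_tags("<p>Hello <b>World</b></p>"))  # Hello World
print(strip_tags("<p>1 &lt; 2</p>"))            # 1 < 2
```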

15. compile(): precompiling regular expressions (performance optimization)

import re

# ❌ Less efficient: every re.match() call has to look the pattern up in re's internal cache
def validate_email_slow(email):
    matched = False
    for _ in range(1000):
        matched = re.match(r"^[\w\.-]+@[\w\.-]+\.\w+$", email) is not None
    return matched

# ✅ Better: compile once and reuse the pattern object
email_pattern = re.compile(r"^[\w\.-]+@[\w\.-]+\.\w+$")

def validate_email_fast(email):
    matched = False
    for _ in range(1000):
        matched = email_pattern.match(email) is not None
    return matched

# Performance comparison
import time

email = "user@example.com"

start = time.time()
validate_email_slow(email)
time1 = time.time() - start

start = time.time()
validate_email_fast(email)
time2 = time.time() - start

print(f"Not precompiled: {time1:.4f}s, precompiled: {time2:.4f}s")
# Precompiling is usually somewhat faster in tight loops; the gain comes from
# skipping the cache lookup, so measure it on your own workload

# Practical use: a validator class
class Validator:
    """Validator built on precompiled regular expressions."""
    EMAIL_PATTERN = re.compile(r"^[\w\.-]+@[\w\.-]+\.\w+$")
    PHONE_PATTERN = re.compile(r"^\d{10,11}$")
    URL_PATTERN = re.compile(r"^https?://")

    @classmethod
    def is_valid_email(cls, email):
        return cls.EMAIL_PATTERN.match(email) is not None

    @classmethod
    def is_valid_phone(cls, phone):
        return cls.PHONE_PATTERN.match(phone) is not None

    @classmethod
    def is_valid_url(cls, url):
        return cls.URL_PATTERN.match(url) is not None

print(Validator.is_valid_email("user@example.com"))  # True
print(Validator.is_valid_phone("13800138000"))  # True
print(Validator.is_valid_url("https://example.com"))  # True
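
When validating whole strings, note that $ also matches just before a trailing newline, so a pattern anchored with ^...$ accepts input that ends in "\n". The compiled pattern's fullmatch() method avoids that. A small sketch, with DIGITS as an illustrative pattern:

```python
import re

DIGITS = re.compile(r"\d+")

# "$" matches at the very end OR just before a final newline
print(bool(re.match(r"^\d+$", "12345\n")))  # True

# fullmatch() requires the entire string to match the pattern
print(bool(DIGITS.fullmatch("12345\n")))    # False
print(bool(DIGITS.fullmatch("12345")))      # True
```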

16. translate(): efficient character replacement

# Build a translation table
translation_table = str.maketrans("aeiou", "12345")
text = "hello world"
result = text.translate(translation_table)
print(result)
# Output: h2ll4 w4rld

# Delete specific characters
delete_table = str.maketrans("", "", "aeiou")
text = "hello world"
result = text.translate(delete_table)
print(result)
# Output: hll wrld

# Practical use 1: removing punctuation
import string
text = "Hello, World! How are you?"
remove_punctuation = str.maketrans("", "", string.punctuation)
result = text.translate(remove_punctuation)
print(result)
# Output: Hello World How are you

# Practical use 2: digits to Chinese numerals
chinese_map = str.maketrans("0123456789", "零一二三四五六七八九")
text = "My phone is 13800138000"
result = text.translate(chinese_map)
print(result)
# Output: My phone is 一三八零零一三八零零零

# Performance comparison: translate vs replace
import time

text = "hello world" * 10000
iterations = 10000

# Approach 1: chained replace()
start = time.time()
for _ in range(iterations):
    result = text.replace("o", "0").replace("e", "3")
time1 = time.time() - start

# Approach 2: translate()
trans_table = str.maketrans("oe", "03")
start = time.time()
for _ in range(iterations):
    result = text.translate(trans_table)
time2 = time.time() - start

print(f"replace approach: {time1:.4f}s, translate approach: {time2:.4f}s")
# translate() tends to pull ahead as the number of replaced characters grows;
# with only one or two replacements the results can go either way

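17. expandtabs(): handling tab characters

# Convert tabs to spaces
text = "name\tage\tcity\nJohn\t25\tNYC"
print(text.expandtabs(15))
# Prints an aligned table (each column is 15 characters wide)

# Practical use: normalizing indentation in log text
log_text = "Error:\t\tConnection failed\nWarning:\t\tMemory high"
formatted = log_text.expandtabs(20)
print(formatted)

# The argument is the tab size: tab stops fall at every 10th column here
text = "Line1\tColumn1\nLine2\tColumn2"
print(text.expandtabs(10))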
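
expandtabs() does not replace each tab with a fixed number of spaces; it pads up to the next tab stop, so the number of spaces depends on the current column. The sketch below makes the padding visible with repr().

```python
row = "a\tbc\tdef"
expanded = row.expandtabs(4)
# "a" ends at column 1, so 3 spaces reach column 4;
# "bc" ends at column 6, so 2 spaces reach column 8
print(repr(expanded))  # 'a   bc  def'

# A field longer than the tab size simply runs on to the following stop
long_field = "abcde\tx".expandtabs(4)
print(repr(long_field))  # 'abcde   x'
```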

18. encode() and decode(): converting between character encodings

# Encoding: str → bytes
text = "Hello 世界 🌍"

# Encode as UTF-8
encoded_utf8 = text.encode("utf-8")
print(encoded_utf8)
# Output: b'Hello \xe4\xb8\x96\xe7\x95\x8c \xf0\x9f\x8c\x8d'

# Encode as GB2312 (Simplified Chinese); the emoji cannot be encoded and is dropped
encoded_gb = text.encode("gb2312", errors="ignore")
print(encoded_gb)

# Decoding: bytes → str
decoded = encoded_utf8.decode("utf-8")
print(decoded)
# Output: Hello 世界 🌍

# Handling encoding errors
text = "測試"
try:
    # Encoding as ASCII fails for non-ASCII characters
    encoded = text.encode("ascii")
except UnicodeEncodeError as e:
    print(f"Encoding error: {e}")

# Error handling strategies
# 'strict': raise an error on characters that cannot be encoded (default)
# 'ignore': drop characters that cannot be encoded
# 'replace': substitute ? for each character that cannot be encoded
# 'xmlcharrefreplace': substitute an XML character reference

text = "Hello 世界"
print(text.encode("ascii", errors="ignore"))
# Output: b'Hello '

print(text.encode("ascii", errors="replace"))
# Output: b'Hello ??'

print(text.encode("ascii", errors="xmlcharrefreplace"))
# Output: b'Hello &#19990;&#30028;'

# Practical use 1: dealing with unknown file encodings
def safe_read_file(filepath):
    """Read a text file, trying a short list of likely encodings in order."""
    # UTF-8 first, because it rejects most non-UTF-8 data; GBK covers GB2312,
    # and plain ASCII is already valid UTF-8
    encodings = ["utf-8", "gbk"]
    for encoding in encodings:
        try:
            with open(filepath, "r", encoding=encoding) as f:
                return f.read()
        except UnicodeDecodeError:
            continue
    raise ValueError("Unable to read file: unknown encoding")

# Practical use 2: handling network data
import json
json_str = '{"name":"張三","age":25}'
json_bytes = json_str.encode("utf-8")
decoded_str = json_bytes.decode("utf-8")
data = json.loads(decoded_str)
print(data)
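
It helps to keep in mind that len() of a str counts code points while len() of bytes counts bytes, and that slicing bytes can cut a multi-byte character in half. A minimal sketch of both points:

```python
text = "Hello 世界"
data = text.encode("utf-8")

print(len(text))  # 8 code points
print(len(data))  # 12 bytes: 6 ASCII bytes + 3 bytes per Chinese character

# Cutting the bytes in the middle of a character breaks strict decoding
chunk = data[:7]
try:
    chunk.decode("utf-8")
    truncated_ok = True
except UnicodeDecodeError:
    truncated_ok = False
print(truncated_ok)  # False

# errors="ignore" silently drops the incomplete character
recovered = chunk.decode("utf-8", errors="ignore")
print(repr(recovered))  # 'Hello '
```

This is why network or file data should be decoded only after complete chunks have been assembled, or with an incremental decoder.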

19. Advanced uses of ljust() / rjust() / center()

# Basic usage
text = "Python"
print(text.ljust(15, "-"))  # Output: Python---------
print(text.rjust(15, "-"))  # Output: ---------Python
print(text.center(15, "-")) # Output: -----Python----

# Practical use 1: a text progress bar
def progress_bar(percent, width=20):
    """Build a text progress bar."""
    filled = int(width * percent / 100)
    bar = "█" * filled + "░" * (width - filled)
    return f"[{bar}] {percent}%"

for i in range(0, 101, 10):
    print(progress_bar(i))

# Practical use 2: aligned, table-like output
def print_aligned_table(data):
    """Print a table with left-aligned columns."""
    # Compute the maximum width of each column
    max_widths = [max(len(str(row[i])) for row in data)
                  for i in range(len(data[0]))]

    for row in data:
        aligned_row = [str(cell).ljust(width)
                       for cell, width in zip(row, max_widths)]
        print(" | ".join(aligned_row))

data = [
    ["Name", "Age", "City"],
    ["張三", "25", "Beijing"],
    ["李四的昵稱", "30", "Shanghai"],
]
print_aligned_table(data)

# Practical use 3: tidier log output
def format_log_message(level, message):
    """Format a log message with a fixed-width level column."""
    level_str = f"[{level}]".ljust(10)
    return f"{level_str} {message}"

print(format_log_message("INFO", "Server started"))
print(format_log_message("ERROR", "Connection failed"))
print(format_log_message("WARNING", "Memory usage high"))
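
ljust(), rjust(), and center() count characters, not terminal columns, so tables containing Chinese text can look misaligned because most terminals draw those characters two columns wide. The sketch below approximates display width with unicodedata.east_asian_width(); display_width and ljust_display are hypothetical helpers, and the approach does not handle combining marks or emoji sequences.

```python
import unicodedata


def display_width(text: str) -> int:
    """Approximate terminal width: wide and fullwidth characters count as 2."""
    return sum(2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
               for ch in text)


def ljust_display(text: str, width: int, fill: str = " ") -> str:
    """Like str.ljust(), but pads according to display width."""
    return text + fill * max(0, width - display_width(text))


for name, city in [("張三", "Beijing"), ("Alice", "Shanghai")]:
    print(ljust_display(name, 8) + "| " + city)
```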

20. casefold(): aggressive case folding

# casefold(): a more aggressive lowercase conversion than lower()
# Intended for caseless matching across languages

text = "ß"  # German letter
print(text.lower())    # Output: ß (unchanged)
print(text.casefold()) # Output: ss (converted to two s characters)

# Practical use 1: case-insensitive string comparison
def case_insensitive_compare(str1, str2):
    """Case-insensitive comparison (including international characters)."""
    return str1.casefold() == str2.casefold()

print(case_insensitive_compare("Straße", "STRASSE"))  # Output: True
print(case_insensitive_compare("hello", "HELLO"))  # Output: True

# Practical use 2: search
def search_case_insensitive(text, query):
    """Case-insensitive search."""
    return query.casefold() in text.casefold()

print(search_case_insensitive("Hello World", "hello"))  # Output: True
print(search_case_insensitive("Naïve", "naive"))  # Output: False (casefold() does not remove accents)

# Performance comparison: casefold vs lower
import time

text = ("Hello World Python " * 1000).casefold()
query = "world"

iterations = 100000

# Using lower()
start = time.time()
for _ in range(iterations):
    query.lower() in text
time1 = time.time() - start

# Using casefold()
start = time.time()
for _ in range(iterations):
    query.casefold() in text
time2 = time.time() - start

print(f"lower(): {time1:.4f}s, casefold(): {time2:.4f}s")
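
casefold() handles case only; accented letters such as ï stay distinct from their base letters. If a search should also ignore accents, one common approach is to decompose the text with unicodedata.normalize() and drop the combining marks. A minimal sketch, with fold_accents as a hypothetical helper name:

```python
import unicodedata


def fold_accents(text: str) -> str:
    """Casefold the text and drop combining marks (for example, ï becomes i)."""
    decomposed = unicodedata.normalize("NFKD", text.casefold())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


print("naive" in "Naïve".casefold())                    # False
print(fold_accents("naive") in fold_accents("Naïve"))   # True
print(fold_accents("Straße"))                           # strasse
```

Whether dropping accents is appropriate depends on the language; in some languages the accented and unaccented letters are different words.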

V. Putting It Together: Complete Data Processing Flows

Case study 1: parsing and validating user data

import re
from collections import defaultdict

# Compile the email pattern once, outside the loop
EMAIL_PATTERN = re.compile(r"^[\w\.-]+@[\w\.-]+\.\w+$")

def parse_and_validate_user_data(csv_data):
    """
    Parse and validate user data in CSV format.

    Input format:
    name,email,phone,age
    張三,zhangsan@example.com,13800138000,25
    李四,lisi@test.org,15900139000,30
    """

    lines = csv_data.strip().split("\n")
    headers = [h.strip() for h in lines[0].split(",")]

    users = []
    errors = []

    for i, line in enumerate(lines[1:], start=2):
        fields = [f.strip() for f in line.split(",")]

        if len(fields) != len(headers):
            errors.append(f"Line {i}: field count mismatch")
            continue

        user = dict(zip(headers, fields))

        # Validate the email
        if not EMAIL_PATTERN.match(user["email"]):
            errors.append(f"Line {i}: invalid email format - {user['email']}")
            continue

        # Validate the phone number
        if not user["phone"].isdigit() or len(user["phone"]) != 11:
            errors.append(f"Line {i}: invalid phone format - {user['phone']}")
            continue

        # Validate the age
        try:
            age = int(user["age"])
            if not 18 <= age <= 100:
                errors.append(f"Line {i}: age must be between 18 and 100")
                continue
        except ValueError:
            errors.append(f"Line {i}: age must be a number - {user['age']}")
            continue

        user["age"] = age
        users.append(user)

    return {
        "valid_users": users,
        "errors": errors,
        "summary": f"Succeeded: {len(users)}, failed: {len(errors)}"
    }

# Usage example
csv_data = """
name,email,phone,age
張三,zhangsan@example.com,13800138000,25
李四,invalid-email,15900139000,30
王五,wangwu@test.org,159001390,35
趙六,zhaoliu@test.org,18600136000,120
"""

result = parse_and_validate_user_data(csv_data)
print(result["summary"])
for error in result["errors"]:
    print(f"  ✗ {error}")
for user in result["valid_users"]:
    print(f"  ✓ {user['name']} - {user['email']}")

Case study 2: log analysis and statistics

import re
from collections import Counter

def analyze_log_file(log_text):
    """
    Analyze log text and extract key information.

    Log format:
    [2024-01-15 10:30:45] INFO: Server started
    [2024-01-15 10:30:50] ERROR: Connection failed
    """

    # Define the log line pattern
    log_pattern = re.compile(
        r"\[(?P<timestamp>.*?)\]\s+(?P<level>\w+):\s+(?P<message>.*)"
    )

    logs = []
    level_count = Counter()

    for line in log_text.strip().split("\n"):
        match = log_pattern.match(line)
        if not match:
            continue

        log_entry = match.groupdict()
        logs.append(log_entry)
        level_count[log_entry["level"]] += 1

    # Collect the error entries
    errors = [log for log in logs if log["level"] == "ERROR"]

    # Summary statistics
    return {
        "total_logs": len(logs),
        "level_distribution": dict(level_count),
        "errors": errors,
        "error_count": len(errors),
        "error_types": Counter(e["message"].split(":")[0] for e in errors)
    }

# Usage example
log_text = """
[2024-01-15 10:30:45] INFO: Server started
[2024-01-15 10:30:50] ERROR: Connection failed
[2024-01-15 10:31:00] WARNING: Memory usage high
[2024-01-15 10:31:05] ERROR: Connection failed
[2024-01-15 10:31:10] INFO: Request processed
"""

result = analyze_log_file(log_text)
print(f"Total log entries: {result['total_logs']}")
print(f"Level distribution: {result['level_distribution']}")
print(f"Error count: {result['error_count']}")
print(f"Error types: {result['error_types']}")

Case study 3: URL parsing and cleanup

import re
from collections import Counter
from urllib.parse import urlparse, parse_qs

def analyze_urls(url_list):
    """
    Analyze and clean a list of URLs.
    """
    
    url_pattern = re.compile(
        r"https?://(?:www\.)?[-a-zA-Z0-9@:%._\+~#=]{1,256}\.[a-zA-Z0-9()]{1,6}\b(?:[-a-zA-Z0-9()@:%_\+.~#?&//=]*)"
    )
    
    valid_urls = []
    domains = Counter()
    
    for url in url_list:
        # Skip entries that do not look like a URL
        url = url.strip()
        if not url_pattern.match(url):
            continue

        # Parse the URL
        parsed = urlparse(url)
        domain = parsed.netloc.replace("www.", "")
        domains[domain] += 1

        # Parse the query parameters
        params = parse_qs(parsed.query)
        
        valid_urls.append({
            "url": url,
            "domain": domain,
            "path": parsed.path,
            "params": params
        })
    
    return {
        "total_urls": len(valid_urls),
        "unique_domains": len(domains),
        "top_domains": domains.most_common(5),
        "urls": valid_urls
    }

# Usage example
urls = [
    "https://www.example.com/path?key=value",
    "http://test.org/api/users?id=123&type=admin",
    "invalid-url",
    "https://github.com/repository"
]

result = analyze_urls(urls)
print(f"Valid URLs: {result['total_urls']}")
print(f"Unique domains: {result['unique_domains']}")
print(f"Top domains: {result['top_domains']}")

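VI. Performance Optimization Summary

Scenario 1: large-scale string concatenation

# ❌ Avoid (can degrade to O(n²), because each += may copy the whole string)
result = ""
for i in range(10000):
    result += f"Item {i}, "

# ✅ Prefer (O(n))
result = ", ".join(f"Item {i}" for i in range(10000))

# The gain grows with the number of pieces; CPython optimizes some += cases, so measure on your own data

Scenario 2: multiple replacement operations

# ❌ Avoid (every replace() scans the whole string)
text = "a" * 1000000
for char in "abcdefg":
    text = text.replace(char, "x")

# ✅ Prefer (translate() handles all characters in one pass)
trans = str.maketrans("abcdefg", "xxxxxxx")
text = text.translate(trans)

# The more characters are replaced, the bigger the advantage of a single pass

Scenario 3: frequent regex matching

# ❌ Avoid (the pattern is looked up in re's cache on every call)
import re
emails = ["user@example.com", "invalid-email"]  # sample data
for email in emails:
    if re.match(r"^[\w\.-]+@[\w\.-]+\.\w+$", email):
        pass

# ✅ Prefer (precompiled)
pattern = re.compile(r"^[\w\.-]+@[\w\.-]+\.\w+$")
for email in emails:
    if pattern.match(email):
        pass

# The improvement is modest per call but adds up in hot loops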

VII. Quick Reference for the 20 Operations

No.	Operation	Purpose	Complexity	Usage frequency
1	find() / index()	Find a substring	O(n*m)	★★★★★
2	replace()	Replace a substring	O(n*m)	★★★★★
3	count()	Count occurrences	O(n)	★★★★
4	startswith/endswith	Prefix and suffix checks	O(m)	★★★★★
5	strip()	Remove whitespace	O(n)	★★★★★
6	split()	Split a string	O(n)	★★★★★
7	join()	Join strings	O(n)	★★★★★
8	partition()	Three-way split	O(n)	★★★
9	format / f-string	String formatting	O(n)	★★★★★
10	upper/lower/title	Case conversion	O(n)	★★★★
11	isdigit/isalpha	Character checks	O(n)	★★★★
12	zfill / center	Padding and centering	O(n)	★★★
13	match / search	Regex matching	O(n*m)	★★★★★
14	findall	Find all matches	O(n*m)	★★★★★
15	sub / subn	Regex replacement	O(n*m)	★★★★★
16	compile	Precompile a regex	O(m)	★★★★
17	translate	Character mapping	O(n)	★★★
18	expandtabs	Tab handling	O(n)	★
19	encode / decode	Encoding conversion	O(n)	★★★★
20	casefold	Aggressive lowercasing	O(n)	★★

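VIII. Best Practice Recommendations

✅ Do the following:

Use f-strings: concise, readable, and usually the fastest formatting option
Build strings with join(): avoid + inside loops that combine many pieces
Precompile regular expressions: worthwhile when the same pattern is matched many times
Use strip(): clean up user input
Pick the right check method: isdigit(), isalpha(), and so on
Use translate(): efficient for many single-character replacements in one pass
Standardize the encoding: prefer UTF-8
Validate input: always validate external input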

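IX. Summary

These 20 string operations cover the large majority of everyday string handling in Python. The key points:

Basic operations (1-5): the foundation of all string processing
Efficient operations (6-7): split and join are central to performance
Validation operations (11): safeguard data quality
Regular expressions (13-16): the tool of choice for complex matching
Performance optimization (translate, compile): essential for large-scale data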

Editor: 趙寧寧 | Source: Python數智工坊
