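20 Common Python String Operations: How Many Do You Know?
Strings are the data type Python programmers work with most, yet many of us rely on only a small handful of string methods. This article walks through 20 frequently used string operations, with notes on common pitfalls and on performance. Knowing them well makes string-handling code shorter, clearer, and, in hot paths, noticeably faster.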

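I. Basic Searching and Replacing (5 operations)
1. find() vs index(): locating a substring
These two methods look identical, but they differ in one important way.
text = "Python is awesome, Python is powerful"
# find(): returns the index if found, -1 otherwise (no exception)
pos1 = text.find("Python")
print(pos1)  # Output: 0
pos2 = text.find("Java")
print(pos2)  # Output: -1 (not found)
# index(): returns the index if found, raises ValueError otherwise
pos3 = text.index("Python")
print(pos3)  # Output: 0
try:
    pos4 = text.index("Java")
except ValueError as e:
    print(e)  # Output: substring not found
# Find the second occurrence: start searching just after the first one
second_pos = text.find("Python", pos1 + 1)
print(second_pos)  # Output: 19
Key differences:
- find() returns -1 when the substring is missing, so no exception handling is needed.
- index() raises ValueError when the substring is missing, so you must catch it.
Best practice: use find() when a missing substring is an expected case you want to test with a simple comparison. Use index() when the substring must be present, so that a missing one fails loudly instead of silently producing -1 (which, if used as an index, refers to the last character).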
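Because find() takes a start position, it is easy to loop over every occurrence. Here is a minimal sketch. The helper name find_all is our own and is not a str method. It reports non-overlapping matches.

```python
def find_all(text, sub):
    """Return the start index of every non-overlapping occurrence of sub."""
    if not sub:
        raise ValueError("sub must be a non-empty string")
    positions = []
    start = text.find(sub)
    while start != -1:
        positions.append(start)
        # Resume after this match so that matches do not overlap
        start = text.find(sub, start + len(sub))
    return positions

text = "Python is awesome, Python is powerful"
print(find_all(text, "Python"))  # [0, 19]
print(find_all(text, "Java"))    # []
```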
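2. replace(): replacing substrings
text = "hello world, hello python"
# Basic replacement: replaces every match
result1 = text.replace("hello", "hi")
print(result1)
# Output: hi world, hi python
# Limit the number of replacements: only the first n, counted from the left
result2 = text.replace("hello", "hi", 1)  # replace only the first one
print(result2)
# Output: hi world, hello python
# Matching is case-sensitive
text2 = "Hello world, hello python"
result3 = text2.replace("hello", "hi")
print(result3)
# Output: Hello world, hi python (the capitalized "Hello" is not replaced)
Performance trap:
# ❌ Less efficient: each replace() call scans the whole string again
text = "abc" * 1000000
for old, new in [("a", "x"), ("b", "y"), ("c", "z")]:
    text = text.replace(old, new)  # one full pass per pair
# ✅ Often faster: translate() applies the whole mapping in a single pass
text = "abc" * 1000000
result = text.translate(str.maketrans("abc", "xyz"))
# Benchmark on your own data: for one or two replacements, replace() is often just as fast
3. count(): counting occurrences of a substring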
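text = "the quick brown fox jumps over the lazy dog"
# Basic counting
count1 = text.count("the")
print(count1)  # Output: 2
# Count within a slice: text[5:30]
count2 = text.count("the", 5, 30)
print(count2)  # Output: 0 (the second "the" starts at index 31, outside the range)
count3 = text.count("the", 5)  # from index 5 to the end
print(count3)  # Output: 1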
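count() counts non-overlapping matches, scanning from left to right, so "aaaa".count("aa") is 2, not 3. If you need overlapping matches, one option is a regex lookahead. Here is a minimal sketch; count_overlapping is a helper name of our own.

```python
import re

def count_overlapping(text, sub):
    """Count occurrences of sub, allowing matches to overlap."""
    # A zero-width lookahead matches at every position where sub starts
    # without consuming characters, so overlapping matches are all found.
    return len(re.findall(f"(?={re.escape(sub)})", text))

print("aaaa".count("aa"))               # 2 (non-overlapping)
print(count_overlapping("aaaa", "aa"))  # 3 (matches at positions 0, 1, 2)
```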
# Count how often each character appears (ignoring spaces)
stats = {}
for char in text:
    if char != ' ':
        stats[char] = stats.get(char, 0) + 1
print(stats)
# Output: {'t': 2, 'h': 2, 'e': 3, ...}
# More concise: collections.Counter
from collections import Counter
char_count = Counter(text.replace(" ", ""))
print(char_count.most_common(3))  # The 3 most common characters: [('o', 4), ('e', 3), ('t', 2)]
Production usage:
# Count keyword occurrences in a log
log_text = """
ERROR: Database connection failed
WARNING: Memory usage high
ERROR: Timeout error
INFO: Server restarted
ERROR: Authentication failed
"""
error_count = log_text.count("ERROR")
warning_count = log_text.count("WARNING")
print(f"Errors: {error_count}, warnings: {warning_count}")  # Output: Errors: 3, warnings: 1
4. startswith() and endswith(): checking prefixes and suffixes
filename = "document.pdf"
url = "https://www.example.com"
# Check the suffix (a tuple checks several suffixes at once)
if filename.endswith((".pdf", ".doc", ".docx")):
    print("This is a document file")
# Check the prefix
if url.startswith(("http://", "https://")):
    print("This is a web URL")
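Both methods are case-sensitive, so "REPORT.PDF".endswith(".pdf") is False. A common fix is to normalize the case first. Below is a minimal sketch; the helper name and the extension list are example choices of our own.

```python
DOC_EXTENSIONS = (".pdf", ".doc", ".docx")  # example list; adjust as needed

def is_document(filename):
    """Case-insensitive suffix check: normalize case, then pass a tuple."""
    return filename.lower().endswith(DOC_EXTENSIONS)

print("REPORT.PDF".endswith(".pdf"))  # False
print(is_document("REPORT.PDF"))      # True
print(is_document("notes.txt"))       # False
```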
# Practical use: filtering files
import os
def get_python_files(directory):
    """Return all Python files in a directory."""
    python_files = []
    for file in os.listdir(directory):
        if file.endswith('.py'):
            python_files.append(file)
    return python_files
# A more Pythonic version
def get_python_files_v2(directory):
    """Same result, written as a list comprehension."""
    return [f for f in os.listdir(directory) if f.endswith('.py')]
Performance comparison:
# ❌ Verbose: one method call per suffix
if filename.endswith('.pdf') or filename.endswith('.doc'):
    pass
# ✅ Better: a single call with a tuple of suffixes (shorter, and usually a little faster)
if filename.endswith(('.pdf', '.doc')):
    pass
5. strip() / lstrip() / rstrip(): removing whitespace
text = " hello world \n"
# strip(): remove whitespace from both ends
result1 = text.strip()
print(repr(result1))  # Output: 'hello world'
# lstrip(): left end only
result2 = text.lstrip()
print(repr(result2))  # Output: 'hello world \n'
# rstrip(): right end only
result3 = text.rstrip()
print(repr(result3))  # Output: ' hello world'
# ⚠️ Key pitfall: it does not remove just one character!
text2 = "---hello---"
print(text2.strip("-"))  # Output: hello (every leading and trailing - is removed)
# Custom set of characters to remove
text3 = "xxxhelloyyy"
print(text3.strip("xy"))  # Output: hello
print(text3.strip("xyhel"))  # Output: o (any end character in the set is removed, in any order)
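strip() only touches the two ends of a string; whitespace inside it is left alone. To also collapse runs of internal whitespace, a common idiom is split() with no arguments followed by join(). Here is a minimal sketch; normalize_spaces is our own helper name.

```python
def normalize_spaces(s):
    """Trim both ends and collapse every internal whitespace run to one space."""
    # split() with no argument splits on runs of any whitespace and
    # discards empty strings, so leading/trailing whitespace vanishes too.
    return " ".join(s.split())

raw = "  hello \t  world \n"
print(repr(raw.strip()))            # 'hello \t  world'
print(repr(normalize_spaces(raw)))  # 'hello world'
```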
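# In practice: cleaning CSV fields
csv_line = " Zhang San , 25 , Beijing \n"
fields = [f.strip() for f in csv_line.split(',')]
print(fields)
# Output: ['Zhang San', '25', 'Beijing']
# Handling user input
user_input = input("Please enter your name: ").strip()
# Stripping stray whitespace avoids inconsistent data
Common mistake:
# ❌ Wrong: strip() removes a *set* of characters, not a substring
text = "hello"
print(text.strip("lo"))  # Output: he (not "hel")
# ✅ Correct: to remove a specific suffix (or prefix) string
print(text.removesuffix("lo"))  # Output: hel (Python 3.9+; removeprefix() is the counterpart)
# On older versions, check and slice:
if text.endswith("lo"):
    text = text[:-2]
II. Advanced Splitting and Joining (4 operations)
6. split() and rsplit(): the art of splitting strings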
# Basic splitting
text = "apple,banana,cherry,date"
parts1 = text.split(",")
print(parts1)
# Output: ['apple', 'banana', 'cherry', 'date']
# Limit the number of splits
parts2 = text.split(",", 2)  # split at most 2 times
print(parts2)
# Output: ['apple', 'banana', 'cherry,date']
# rsplit(): split starting from the right
parts3 = text.rsplit(",", 2)  # split at most 2 times, from the right
print(parts3)
# Output: ['apple,banana', 'cherry', 'date']
# Splitting on several delimiters (with a regular expression)
import re
text2 = "apple, banana; cherry: date"
parts4 = re.split(r'[,;:]', text2)
print(parts4)
# Output: ['apple', ' banana', ' cherry', ' date'] (use r'[,;:]\s*' to drop the spaces as well)
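A related detail: split() with no argument and split(" ") behave differently. With no argument, any run of whitespace counts as a single separator, and leading or trailing whitespace is ignored. With an explicit separator, every occurrence splits the string, so consecutive separators produce empty strings. A quick demonstration:

```python
line = "  alpha   beta\tgamma "

# No argument: split on runs of any whitespace, drop empty strings
print(line.split())     # ['alpha', 'beta', 'gamma']

# Explicit " ": every single space is a separator, so empty strings appear,
# and the tab is not treated as a separator at all
print(line.split(" "))  # ['', '', 'alpha', '', '', 'beta\tgamma', '']
```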
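# In practice 1: parsing a URL
url = "https://www.example.com/path/to/resource?key=value&foo=bar"
protocol, rest = url.split("://", 1)
domain, rest = rest.split("/", 1)
path, query = rest.split("?", 1)
print(f"Protocol: {protocol}, domain: {domain}, path: {path}, query: {query}")
# Output: Protocol: https, domain: www.example.com, path: path/to/resource, query: key=value&foo=bar
# (This breaks if the URL has no path or no query; for real URLs prefer urllib.parse.urlparse)
# In practice 2: parsing a CSV line
csv_line = 'John,"Smith, Jr.",30,New York'
# A plain split(",") would break the quoted field apart, so use the csv module
import csv
reader = csv.reader([csv_line])
fields = next(reader)
print(fields)
# Output: ['John', 'Smith, Jr.', '30', 'New York']
Performance comparison:
# ❌ Wasteful when you only need an early field: splits the whole string
text = "a:b:c:d:e"
parts = text.split(":")
result = parts[2]  # the 3rd element
# ✅ Split only as far as needed with maxsplit
result = text.split(":", 3)[2]  # ['a', 'b', 'c', 'd:e'] -> 'c'
7. join(): concatenating strings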
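# Basic joining
words = ["hello", "world", "python"]
result1 = " ".join(words)
print(result1)  # Output: hello world python
# Joining numbers (they must be converted to str first)
numbers = [1, 2, 3, 4, 5]
result2 = "-".join(str(n) for n in numbers)
print(result2)  # Output: 1-2-3-4-5
# In practice 1: building a SQL IN clause
ids = [1, 2, 3, 4, 5]
sql = f"SELECT * FROM users WHERE id IN ({','.join(map(str, ids))})"
print(sql)  # Output: SELECT * FROM users WHERE id IN (1,2,3,4,5)
# Only safe for trusted integers; for user-supplied values, use your database driver's parameterized queries
# In practice 2: building a URL path
path_parts = ["api", "v1", "users", "123"]
path = "/" + "/".join(path_parts)
print(path)  # Output: /api/v1/users/123
# In practice 3: building a CSV line
data = ["Zhang San", 25, "Beijing", "zhangsan@example.com"]
csv_line = ",".join(map(str, data))
print(csv_line)  # Output: Zhang San,25,Beijing,zhangsan@example.com
# ⚠️ Performance trap: don't build long strings with + in a loop
# ❌ Less efficient: each + may create a new string, up to O(n²) overall
result = ""
for word in words:
    result = result + " " + word
# ✅ Better: one join, O(n) overall
result = " ".join(words)
Large-scale comparison:
import time
# 10,000 strings
data = ["word"] * 10000
# Joining with + (slower)
start = time.perf_counter()
result = ""
for word in data:
    result += word + ","
time1 = time.perf_counter() - start
# Joining with join() (faster)
start = time.perf_counter()
result = ",".join(data)
time2 = time.perf_counter() - start
print(f"+: {time1:.4f}s, join: {time2:.4f}s")
# Timings depend on the machine and Python version. CPython can sometimes optimize +=
# in place, so the gap varies, but join() is usually the clear winner.
8. partition() and rpartition(): splitting into three parts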
# partition(): split at the first separator into three parts
text = "name=John;age=30;city=NYC"
head, sep, tail = text.partition(";")
print(f"Before: {head}, separator: {sep}, after: {tail}")
# Output: Before: name=John, separator: ;, after: age=30;city=NYC
# In practice: parsing key=value text
def parse_key_value(text):
    key, sep, value = text.partition("=")
    # If "=" is missing, sep is "" and the value is reported as None
    return key.strip(), (value.strip() if sep else None)
result = parse_key_value("timeout = 3000")
print(result)  # Output: ('timeout', '3000')
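Why partition() rather than split("=", 1)? When the separator is missing, split() returns a one-element list, so unpacking it into two names raises ValueError. partition() always returns a 3-tuple. A small comparison (the sample line is our own):

```python
line = "verbose"  # no "=" in this line

# split() yields a 1-element list here, so unpacking into two names fails
try:
    key, value = line.split("=", 1)
except ValueError as e:
    print("split failed:", e)

# partition() always returns exactly three parts
key, sep, value = line.partition("=")
print((key, sep, value))  # ('verbose', '', '')
```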
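# rpartition(): split at the last separator
head, sep, tail = text.rpartition(";")
print(f"Before: {head}, separator: {sep}, after: {tail}")
# Output: Before: name=John;age=30, separator: ;, after: city=NYC
# In practice: getting a file extension
def get_file_info(filename):
    name, sep, ext = filename.rpartition(".")
    if not sep:  # no dot: rpartition returns ('', '', filename)
        return filename, ""
    return name, ext
print(get_file_info("document.pdf"))  # Output: ('document', 'pdf')
print(get_file_info("archive.tar.gz"))  # Output: ('archive.tar', 'gz')
print(get_file_info("README"))  # Output: ('README', '')
III. Formatting and Conversion (5 operations)
9. format() and f-strings: the evolution of string formatting
name = "Zhang San"
age = 25
salary = 15000.5
# Style 1: % formatting (the oldest style)
result1 = "Name: %s, age: %d, salary: %.2f" % (name, age, salary)
# Style 2: str.format() (also works on older Python versions)
result2 = "Name: {}, age: {}, salary: {:.2f}".format(name, age, salary)
# Style 3: f-string (Python 3.6+, recommended)
result3 = f"Name: {name}, age: {age}, salary: {salary:.2f}"
print(result3)
# Output: Name: Zhang San, age: 25, salary: 15000.50
# f-strings can evaluate arbitrary expressions inline
print(f"Next year's salary: {salary * 1.1:.2f}")  # Output: Next year's salary: 16500.55
# Alignment and padding
numbers = [1, 12, 123, 1234]
for num in numbers:
    print(f"Number: {num:>5}")
# Output:
# Number:     1
# Number:    12
# Number:   123
# Number:  1234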
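The width and precision in a format spec can themselves come from variables, by nesting braces inside the spec. This is handy when the column width is only known at runtime. A minimal sketch (the data and variable names are examples of our own):

```python
items = [("apple", 1.5), ("watermelon", 12.25), ("fig", 0.8)]

# Compute the column width from the data, then nest it inside the spec
width = max(len(name) for name, _ in items)
precision = 2

for name, price in items:
    # {name:<{width}} left-aligns to `width` characters;
    # {price:>8.{precision}f} right-aligns to 8 characters with `precision` decimals
    print(f"{name:<{width}} | {price:>8.{precision}f}")
```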
# Base conversion
num = 255
print(f"Decimal: {num}, hex: {num:x}, binary: {num:b}")
# Output: Decimal: 255, hex: ff, binary: 11111111
# Percentages
rate = 0.8567
print(f"Completion: {rate:.2%}")  # Output: Completion: 85.67%
# Thousands separator (',' groups digits; '_' works too)
large_num = 1234567890
print(f"Big number: {large_num:,}")  # Output: Big number: 1,234,567,890
Performance comparison:
import time
name = "Python"
age = 10
# Compare the three styles
iterations = 1000000
# % formatting
start = time.perf_counter()
for _ in range(iterations):
    result = "%s is %d years old" % (name, age)
time1 = time.perf_counter() - start
# str.format()
start = time.perf_counter()
for _ in range(iterations):
    result = "{} is {} years old".format(name, age)
time2 = time.perf_counter() - start
# f-string
start = time.perf_counter()
for _ in range(iterations):
    result = f"{name} is {age} years old"
time3 = time.perf_counter() - start
print(f"% formatting: {time1:.3f}s")
print(f"format(): {time2:.3f}s")
print(f"f-string: {time3:.3f}s")
# f-strings are typically the fastest; the relative order of % and format() varies by Python version
10. upper() / lower() / title() / swapcase(): changing case
text = "Hello World Python"
# All uppercase
print(text.upper())  # Output: HELLO WORLD PYTHON
# All lowercase
print(text.lower())  # Output: hello world python
# Title case (first letter of each word capitalized)
print(text.title())  # Output: Hello World Python
# Swap case
print(text.swapcase())  # Output: hELLO wORLD pYTHON
# capitalize(): first character uppercase, the rest lowercase
print(text.capitalize())  # Output: Hello world python
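One caveat with title(): it treats any non-letter as a word boundary, so apostrophes produce surprising results. The standard library's string.capwords() splits on whitespace only, which often gives the intended result for English text. A short comparison (the sample phrase is our own):

```python
import string

phrase = "they're bill's friends from the UK"

# title() capitalizes the letter after every apostrophe
# and lowercases the rest of each word ("UK" becomes "Uk")
print(phrase.title())           # They'Re Bill'S Friends From The Uk

# capwords() splits on whitespace, applies capitalize() to each word, rejoins with spaces
print(string.capwords(phrase))  # They're Bill's Friends From The Uk
```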
# In practice 1: normalizing user input
user_email = input("Please enter your email: ").strip().lower()
# Prevents mismatches caused by letter case
# In practice 2: generating a URL slug
def slugify(text):
    """Convert text into a simple URL-friendly form (spaces become hyphens)."""
    return text.lower().replace(" ", "-")
print(slugify("Hello World Python"))  # Output: hello-world-python
# In practice 3: checking password strength
def check_password_strength(password):
    has_upper = any(c.isupper() for c in password)
    has_lower = any(c.islower() for c in password)
    has_digit = any(c.isdigit() for c in password)
    return len(password) >= 8 and has_upper and has_lower and has_digit
print(check_password_strength("Secure123"))  # Output: True
11. isdigit() / isalpha() / isalnum(): checking character classes
# All digits?
print("12345".isdigit())  # Output: True
print("123a5".isdigit())  # Output: False
# All letters?
print("hello".isalpha())  # Output: True
print("hello123".isalpha())  # Output: False
# All letters or digits?
print("hello123".isalnum())  # Output: True
print("hello-123".isalnum())  # Output: False
# All whitespace?
print(" ".isspace())  # Output: True
# A valid identifier (variable name)?
print("var_name".isidentifier())  # Output: True
print("123var".isidentifier())  # Output: False
# All uppercase / all lowercase?
print("HELLO".isupper())  # Output: True
print("hello".islower())  # Output: True
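A caution before using isdigit() to validate numbers. It accepts some characters that int() rejects, such as the superscript "²". It also rejects valid integers that carry a sign, such as "-5". isdecimal() is the stricter check, and for real input validation, simply attempting the conversion is often the most reliable test. A minimal sketch; is_int is our own helper name:

```python
def is_int(s):
    """Return True if int() can parse s (a sign and surrounding whitespace are allowed)."""
    try:
        int(s)
    except ValueError:
        return False
    return True

print("²".isdigit(), "²".isdecimal(), is_int("²"))     # True False False
print("-5".isdigit(), "-5".isdecimal(), is_int("-5"))  # False False True
```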
# In practice 1: validating a username
def validate_username(username):
    if len(username) < 3 or len(username) > 20:
        return False, "Username must be 3-20 characters long"
    if not username[0].isalpha():
        return False, "Username must start with a letter"
    if not username.replace("_", "").isalnum():
        return False, "Username may contain only letters, digits, and underscores"
    return True, "Username is valid"
print(validate_username("user_123"))  # Output: (True, 'Username is valid')
print(validate_username("123user"))  # Output: (False, 'Username must start with a letter')
# Note: isalpha()/isalnum() accept any Unicode letter (e.g. Chinese characters), not only ASCII
# In practice 2: guessing what a string represents
def detect_type(value_str):
    """Roughly classify the kind of value a string represents."""
    if value_str.isdigit():
        return "integer"
    elif value_str.isalpha():
        return "string"
    elif value_str.isalnum():
        return "mixed"
    else:
        return "other"
print(detect_type("123"))  # Output: integer
12. zfill() and center(): padding and centering
# zfill(): pad on the left with zeros
num_str = "123"
print(num_str.zfill(5))  # Output: 00123
# In practice 1: generating an order number
def generate_order_id(order_num):
    return f"ORD{order_num:0>6d}"
print(generate_order_id(123))  # Output: ORD000123
# center(): center the text, padding both sides
text = "Python"
print(repr(text.center(15)))  # Output: '     Python    '
print(text.center(15, "*"))  # Output: *****Python****
# ljust() and rjust(): left- and right-justify
print(text.ljust(15, "-"))  # Output: Python---------
print(text.rjust(15, "-"))  # Output: ---------Python
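zfill() and rjust(width, "0") differ for negative numbers. zfill() inserts the zeros after a leading sign, while rjust() simply pads on the left. For integers, the format spec 0Nd handles the sign the same way zfill() does:

```python
print("-42".zfill(5))       # -0042
print("-42".rjust(5, "0"))  # 00-42 (the sign ends up in the middle)
print(f"{-42:05d}")         # -0042
```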
# In practice 2: printing a table
def print_table(rows):
    """Print a table with each cell centered in 15 columns."""
    for row in rows:
        print("|".join(cell.center(15) for cell in row))
rows = [
    ["Name", "Age", "City"],
    ["Zhang San", "25", "Beijing"],
    ["Li Si", "30", "Shanghai"],
]
print_table(rows)
IV. Regular Expressions and Advanced Operations (6 operations)
13. Regex basics: match() / search() / findall()
import re
# match(): matches only at the beginning of the string
text = "Python 3.9"
if re.match(r"Python", text):
    print("Match succeeded")
# search(): looks anywhere in the string
if re.search(r"\d+\.\d+", text):
    print("Found a version number")
# findall(): returns every match
emails = "contact us at john@example.com or jane@test.org"
found = re.findall(r"\b[\w.-]+@[\w.-]+\.\w+\b", emails)
print(found)
# Output: ['john@example.com', 'jane@test.org']
# With a capturing group, findall() returns only the group's text
text = "Price: $99.99, Tax: $7.50"
matches = re.findall(r"\$(\d+\.\d+)", text)
print(matches)
# Output: ['99.99', '7.50']
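When a pattern has several groups, findall() returns tuples, which are easy to mix up. Named groups with re.finditer() keep each field labeled. Here is a minimal sketch on the same price text; the group names are our own.

```python
import re

text = "Price: $99.99, Tax: $7.50"
pattern = re.compile(r"(?P<label>\w+): \$(?P<amount>\d+\.\d+)")

# finditer() yields Match objects; groupdict() maps group names to matched text
for m in pattern.finditer(text):
    print(m.group("label"), float(m.group("amount")))

records = [m.groupdict() for m in pattern.finditer(text)]
print(records)
# [{'label': 'Price', 'amount': '99.99'}, {'label': 'Tax', 'amount': '7.50'}]
```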
# In practice 1: extracting (North American style) phone numbers
def extract_phone_numbers(text):
    """Extract phone numbers; returns one (area, prefix, line) tuple per number."""
    pattern = r"\b(?:\+?1[-.\s]?)?\(?([0-9]{3})\)?[-.\s]?([0-9]{3})[-.\s]?([0-9]{4})\b"
    return re.findall(pattern, text)
text = "Call me at 123-456-7890 or (098) 765 4321"
print(extract_phone_numbers(text))
# Output: [('123', '456', '7890'), ('098', '765', '4321')]
# In practice 2: extracting URLs
def extract_urls(text):
    """Extract every http(s) URL from text."""
    pattern = r"https?://(?:www\.)?[-a-zA-Z0-9@:%._\+~#=]{1,256}\.[a-zA-Z0-9()]{1,6}\b(?:[-a-zA-Z0-9()@:%_\+.~#?&//=]*)"
    return re.findall(pattern, text)
text = "Visit https://www.example.com or http://test.org for more info"
print(extract_urls(text))
# Output: ['https://www.example.com', 'http://test.org']
14. sub() and subn(): regex replacement
import re
# sub(): replace every match; \1 inserts group 1 (the replacement is plain text, no arithmetic happens)
text = "The price is $99.99 and tax is $7.50"
result = re.sub(r"\$(\d+\.\d+)", r"¥\1*7", text)
print(result)
# Output: The price is ¥99.99*7 and tax is ¥7.50*7
# subn(): replace and also return the number of replacements
text = "apple, apple, apple"
result, count = re.subn(r"apple", "orange", text)
print(f"Replaced {count} occurrences")  # Output: Replaced 3 occurrences
print(result)  # Output: orange, orange, orange
# Dynamic replacement with a function
def replace_func(match):
    """Raise the matched price by 10%."""
    price = float(match.group(1))
    return f"${price * 1.1:.2f}"
text = "Item 1: $100, Item 2: $50"
result = re.sub(r"\$(\d+(?:\.\d+)?)", replace_func, text)
print(result)
# Output: Item 1: $110.00, Item 2: $55.00
# In practice 1: converting date formats
def convert_date_format(text):
    """Convert 2024-01-15 into 15/01/2024."""
    pattern = r"(\d{4})-(\d{2})-(\d{2})"
    return re.sub(pattern, r"\3/\2/\1", text)
print(convert_date_format("Today is 2024-01-15"))
# Output: Today is 15/01/2024
# In practice 2: stripping HTML tags
def remove_html_tags(text):
    """Extract plain text from simple HTML (fine for snippets; use an HTML parser for real pages)."""
    return re.sub(r"<[^>]+>", "", text)
html = "<p>Hello <b>World</b></p>"
print(remove_html_tags(html))
# Output: Hello World
15. compile(): precompiling regular expressions (performance)
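import re
import time
EMAIL_REGEX = r"^[\w\.-]+@[\w\.-]+\.\w+$"
# ❌ Pattern passed as a string on every call (re must look it up in its internal cache each time)
def validate_email_slow(email):
    return re.match(EMAIL_REGEX, email) is not None
# ✅ Compile once and reuse the pattern object
email_pattern = re.compile(EMAIL_REGEX)
def validate_email_fast(email):
    return email_pattern.match(email) is not None
# Performance comparison: call each version many times
email = "user@example.com"
iterations = 100000
start = time.perf_counter()
for _ in range(iterations):
    validate_email_slow(email)
time1 = time.perf_counter() - start
start = time.perf_counter()
for _ in range(iterations):
    validate_email_fast(email)
time2 = time.perf_counter() - start
print(f"Not precompiled: {time1:.4f}s, precompiled: {time2:.4f}s")
# The re module caches recently used patterns, so the gain is mostly the saved cache lookup.
# It is usually modest and varies by Python version, so measure.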
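A related pitfall with validation patterns like this one is that $ also matches just before a trailing newline. A pattern anchored with ^...$ therefore accepts "user@example.com\n". re.fullmatch() (Python 3.4+) requires the whole string to match, which avoids this. A quick check:

```python
import re

anchored = re.compile(r"^[\w\.-]+@[\w\.-]+\.\w+$")
unanchored = re.compile(r"[\w\.-]+@[\w\.-]+\.\w+")

value = "user@example.com\n"  # e.g. a line read from a file, newline not stripped

print(anchored.match(value) is not None)        # True: $ matched before the final \n
print(unanchored.fullmatch(value) is not None)  # False: fullmatch must consume everything
```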
# In practice: a validator class built on precompiled patterns
class Validator:
    """Validator that uses precompiled regular expressions."""
    EMAIL_PATTERN = re.compile(r"^[\w\.-]+@[\w\.-]+\.\w+$")
    PHONE_PATTERN = re.compile(r"^\d{10,11}$")
    URL_PATTERN = re.compile(r"^https?://")
    @classmethod
    def is_valid_email(cls, email):
        return cls.EMAIL_PATTERN.match(email) is not None
    @classmethod
    def is_valid_phone(cls, phone):
        return cls.PHONE_PATTERN.match(phone) is not None
    @classmethod
    def is_valid_url(cls, url):
        return cls.URL_PATTERN.match(url) is not None
print(Validator.is_valid_email("user@example.com"))  # True
print(Validator.is_valid_phone("13800138000"))  # True
print(Validator.is_valid_url("https://example.com"))  # True
16. translate(): efficient character replacement
# Build a translation table
translation_table = str.maketrans("aeiou", "12345")
text = "hello world"
result = text.translate(translation_table)
print(result)
# Output: h2ll4 w4rld
# Delete characters (the third argument of maketrans)
delete_table = str.maketrans("", "", "aeiou")
text = "hello world"
result = text.translate(delete_table)
print(result)
# Output: hll wrld
# In practice 1: removing punctuation
import string
text = "Hello, World! How are you?"
remove_punctuation = str.maketrans("", "", string.punctuation)
result = text.translate(remove_punctuation)
print(result)
# Output: Hello World How are you
# In practice 2: converting digits to Chinese numerals
chinese_map = str.maketrans("0123456789", "零一二三四五六七八九")
text = "My phone is 13800138000"
result = text.translate(chinese_map)
print(result)
# Output: My phone is 一三八零零一三八零零零
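translate() is not just a faster chain of replace() calls. It maps every character in one pass, so the mappings do not interfere with each other. Chained replace() calls run one after another, which matters when the mappings overlap, as in swapping two characters. The sample strings below are our own:

```python
s = "abba"

# Chained replace(): the second call also rewrites the b's produced by the first
print(s.replace("a", "b").replace("b", "a"))  # aaaa

# translate(): each original character is mapped exactly once
print(s.translate(str.maketrans("ab", "ba")))  # baab

# maketrans() also accepts a dict; values may be multi-character strings or None (delete)
table = str.maketrans({"&": "&amp;", "<": "&lt;", "\n": None})
print("a<b & c\n".translate(table))  # a&lt;b &amp; c
```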
# Performance comparison: translate vs replace
import time
text = "hello world" * 10000
iterations = 1000
# Approach 1: chained replace()
start = time.perf_counter()
for _ in range(iterations):
    result = text.replace("o", "0").replace("e", "3")
time1 = time.perf_counter() - start
# Approach 2: translate()
trans_table = str.maketrans("oe", "03")
start = time.perf_counter()
for _ in range(iterations):
    result = text.translate(trans_table)
time2 = time.perf_counter() - start
print(f"replace: {time1:.4f}s, translate: {time2:.4f}s")
# The winner depends on the data and on how many characters you map; measure on your own input
17. expandtabs(): handling tab characters
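# Convert tabs to spaces (tab stops every 15 columns)
text = "name\tage\tcity\nJohn\t25\tNYC"
print(text.expandtabs(15))
# Prints an aligned table
# In practice: normalizing indentation in log output
log_text = "Error:\t\tConnection failed\nWarning:\t\tMemory high"
formatted = log_text.expandtabs(20)
print(formatted)
# Tab stops every 10 columns (the column count restarts after each newline)
text = "Line1\tColumn1\nLine2\tColumn2"
print(text.expandtabs(10))
18. encode() and decode(): converting between str and bytes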
# Encoding: str → bytes
text = "Hello 世界 🌍"
# Encode as UTF-8
encoded_utf8 = text.encode("utf-8")
print(encoded_utf8)
# Output: b'Hello \xe4\xb8\x96\xe7\x95\x8c \xf0\x9f\x8c\x8d'
# Encode as GB2312 (Simplified Chinese); the emoji has no GB2312 form, so errors="ignore" drops it
encoded_gb = text.encode("gb2312", errors="ignore")
print(encoded_gb)
# Decoding: bytes → str
decoded = encoded_utf8.decode("utf-8")
print(decoded)
# Output: Hello 世界 🌍
# Handling encoding errors
text = "測試"
try:
    # ASCII cannot represent these characters, so this fails
    encoded = text.encode("ascii")
except UnicodeEncodeError as e:
    print(f"Encoding error: {e}")
# Error-handling strategies (the errors argument):
# 'strict': raise on unencodable characters (the default)
# 'ignore': drop unencodable characters
# 'replace': substitute ? for each unencodable character
# 'xmlcharrefreplace': substitute an XML character reference
text = "Hello 世界"
print(text.encode("ascii", errors="ignore"))
# Output: b'Hello '
print(text.encode("ascii", errors="replace"))
# Output: b'Hello ??'
print(text.encode("ascii", errors="xmlcharrefreplace"))
# Output: b'Hello &#19990;&#30028;'
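UTF-8 uses a variable number of bytes per character, so len() of a str and len() of its encoded bytes differ. Cutting bytes at an arbitrary position can also split a character in half. One simple way to truncate text to a byte budget is to slice the bytes and decode with errors="ignore", which drops a trailing partial character. A minimal sketch; truncate_utf8 is our own helper name:

```python
def truncate_utf8(text, max_bytes):
    """Truncate text so that its UTF-8 encoding fits in max_bytes bytes."""
    data = text.encode("utf-8")[:max_bytes]
    # A cut in the middle of a multi-byte character leaves an incomplete
    # sequence at the end; errors="ignore" discards it instead of raising.
    return data.decode("utf-8", errors="ignore")

s = "Hello 世界"
print(len(s), len(s.encode("utf-8")))  # 8 12
print(repr(truncate_utf8(s, 8)))        # 'Hello ' (8 bytes would split 世)
print(repr(truncate_utf8(s, 9)))        # 'Hello 世'
```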
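# In practice 1: reading files whose encoding is unknown
def safe_read_file(filepath):
    """Try a few likely encodings in turn and return the first successful decode."""
    # ascii is a subset of utf-8 and gb2312 a subset of gbk, so neither needs its own attempt.
    # Caveat: gbk can "successfully" decode bytes that were really in another encoding.
    encodings = ["utf-8", "gbk"]
    for encoding in encodings:
        try:
            with open(filepath, "r", encoding=encoding) as f:
                return f.read()
        except UnicodeDecodeError:
            continue
    raise ValueError("Could not read the file: unknown encoding")
# In practice 2: handling network data
import json
json_str = '{"name":"張三","age":25}'
json_bytes = json_str.encode("utf-8")  # what you typically receive over the network
decoded_str = json_bytes.decode("utf-8")
data = json.loads(decoded_str)  # json.loads() also accepts UTF-8 bytes directly (Python 3.6+)
print(data)  # Output: {'name': '張三', 'age': 25}
19. Advanced uses of ljust() / rjust() / center()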
# Basic usage
text = "Python"
print(text.ljust(15, "-"))  # Output: Python---------
print(text.rjust(15, "-"))  # Output: ---------Python
print(text.center(15, "-"))  # Output: -----Python----
# In practice 1: a text progress bar
def progress_bar(percent, width=20):
    """Build a text progress bar."""
    filled = int(width * percent / 100)
    bar = "█" * filled + "░" * (width - filled)
    return f"[{bar}] {percent}%"
for i in range(0, 101, 10):
    print(progress_bar(i))
# In practice 2: aligned, table-like output
def print_aligned_table(data):
    """Print a table whose columns are padded to their widest cell."""
    # Compute each column's maximum width
    max_widths = [max(len(str(row[i])) for row in data)
                  for i in range(len(data[0]))]
    for row in data:
        aligned_row = [str(cell).ljust(width)
                       for cell, width in zip(row, max_widths)]
        print(" | ".join(aligned_row))
data = [
    ["Name", "Age", "City"],
    ["Zhang San", "25", "Beijing"],
    ["Li Si's nickname", "30", "Shanghai"],
]
print_aligned_table(data)
# In practice 3: prettier log output
def format_log_message(level, message):
    """Format a log message with a fixed-width level column."""
    level_str = f"[{level}]".ljust(10)
    return f"{level_str} {message}"
print(format_log_message("INFO", "Server started"))
print(format_log_message("ERROR", "Connection failed"))
print(format_log_message("WARNING", "Memory usage high"))
20. casefold(): aggressive case folding
# casefold(): a more aggressive form of lower()
# Designed for caseless matching across languages
text = "ß"  # German sharp s
print(text.lower())  # Output: ß (unchanged)
print(text.casefold())  # Output: ss (becomes two s's)
# In practice 1: case-insensitive comparison
def case_insensitive_compare(str1, str2):
    """Compare strings ignoring case (including non-English letters)."""
    return str1.casefold() == str2.casefold()
print(case_insensitive_compare("Straße", "STRASSE"))  # Output: True
print(case_insensitive_compare("hello", "HELLO"))  # Output: True
# In practice 2: search
def search_case_insensitive(text, query):
    """Case-insensitive substring search."""
    return query.casefold() in text.casefold()
print(search_case_insensitive("Hello World", "hello"))  # Output: True
print(search_case_insensitive("Naïve", "naive"))  # Output: False (casefold() ignores case, not accents)
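As the last line shows, casefold() does not remove accents. If you also want accent-insensitive matching, one common approach is to decompose characters with unicodedata.normalize("NFKD") and then drop the combining marks. This is a minimal sketch; the helper names are our own. Whether stripping accents is acceptable depends on the language, so treat it as an assumption about your data.

```python
import unicodedata

def strip_accents(s):
    """Decompose characters, then drop combining marks (accents, diaereses, ...)."""
    decomposed = unicodedata.normalize("NFKD", s)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

def search_loose(text, query):
    """Case- and accent-insensitive substring search."""
    return strip_accents(query).casefold() in strip_accents(text).casefold()

print(search_loose("Naïve", "naive"))          # True
print(search_loose("Crème Brûlée", "BRULEE"))  # True
```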
# Performance comparison: casefold vs lower
import time
text = ("Hello World Python " * 1000).casefold()
query = "world"
iterations = 100000
# Using lower()
start = time.perf_counter()
for _ in range(iterations):
    query.lower() in text
time1 = time.perf_counter() - start
# Using casefold()
start = time.perf_counter()
for _ in range(iterations):
    query.casefold() in text
time2 = time.perf_counter() - start
print(f"lower(): {time1:.4f}s, casefold(): {time2:.4f}s")
# The difference is typically small; prefer casefold() when correctness for non-English text matters
V. Putting It All Together: Complete Data-Processing Workflows
Example 1: parsing and validating user data
import re
# Compile the email pattern once, outside the loop
EMAIL_PATTERN = re.compile(r"^[\w\.-]+@[\w\.-]+\.\w+$")
def parse_and_validate_user_data(csv_data):
    """
    Parse and validate user records in simple CSV form.
    Expected input:
    name,email,phone,age
    Zhang San,zhangsan@example.com,13800138000,25
    Li Si,lisi@test.org,15900139000,30
    Fields are split with plain split(","), so quoted fields containing commas are not supported.
    """
    lines = csv_data.strip().split("\n")
    headers = [h.strip() for h in lines[0].split(",")]
    users = []
    errors = []
    for i, line in enumerate(lines[1:], start=2):
        fields = [f.strip() for f in line.split(",")]
        if len(fields) != len(headers):
            errors.append(f"Line {i}: wrong number of fields")
            continue
        user = dict(zip(headers, fields))
        # Validate the email address
        if not EMAIL_PATTERN.match(user["email"]):
            errors.append(f"Line {i}: invalid email - {user['email']}")
            continue
        # Validate the phone number (exactly 11 digits)
        if not user["phone"].isdigit() or len(user["phone"]) != 11:
            errors.append(f"Line {i}: invalid phone number - {user['phone']}")
            continue
        # Validate the age
        try:
            age = int(user["age"])
        except ValueError:
            errors.append(f"Line {i}: age must be a number - {user['age']}")
            continue
        if not 18 <= age <= 100:
            errors.append(f"Line {i}: age must be between 18 and 100")
            continue
        user["age"] = age
        users.append(user)
    return {
        "valid_users": users,
        "errors": errors,
        "summary": f"Valid: {len(users)}, invalid: {len(errors)}"
    }
# Usage
csv_data = """
name,email,phone,age
Zhang San,zhangsan@example.com,13800138000,25
Li Si,invalid-email,15900139000,30
Wang Wu,wangwu@test.org,159001390,35
Zhao Liu,zhaoliu@test.org,18600136000,120
"""
result = parse_and_validate_user_data(csv_data)
print(result["summary"])  # Output: Valid: 1, invalid: 3
for error in result["errors"]:
    print(f"  ✗ {error}")
for user in result["valid_users"]:
    print(f"  ✓ {user['name']} - {user['email']}")
Example 2: log analysis and statistics
import re
from collections import Counter
def analyze_log_file(log_text):
    """
    Analyze log text and extract key information.
    Expected log format:
    [2024-01-15 10:30:45] INFO: Server started
    [2024-01-15 10:30:50] ERROR: Connection failed
    """
    # Pattern for one log line, with named groups
    log_pattern = re.compile(
        r"\[(?P<timestamp>.*?)\]\s+(?P<level>\w+):\s+(?P<message>.*)"
    )
    logs = []
    level_count = Counter()
    for line in log_text.strip().split("\n"):
        match = log_pattern.match(line)
        if not match:
            continue
        log_entry = match.groupdict()
        logs.append(log_entry)
        level_count[log_entry["level"]] += 1
    # Collect the error entries
    errors = [log for log in logs if log["level"] == "ERROR"]
    # Summary statistics
    return {
        "total_logs": len(logs),
        "level_distribution": dict(level_count),
        "errors": errors,
        "error_count": len(errors),
        # Error "type" = message text before the first colon (the whole message if there is none)
        "error_types": Counter(e["message"].split(":")[0] for e in errors)
    }
# Usage
log_text = """
[2024-01-15 10:30:45] INFO: Server started
[2024-01-15 10:30:50] ERROR: Connection failed
[2024-01-15 10:31:00] WARNING: Memory usage high
[2024-01-15 10:31:05] ERROR: Connection failed
[2024-01-15 10:31:10] INFO: Request processed
"""
result = analyze_log_file(log_text)
print(f"Total log entries: {result['total_logs']}")  # Output: Total log entries: 5
print(f"Level distribution: {result['level_distribution']}")  # Output: Level distribution: {'INFO': 2, 'ERROR': 2, 'WARNING': 1}
print(f"Errors: {result['error_count']}")  # Output: Errors: 2
print(f"Error types: {result['error_types']}")  # Output: Error types: Counter({'Connection failed': 2})
Example 3: URL parsing and cleanup
import re
from collections import Counter
from urllib.parse import urlparse, parse_qs
def analyze_urls(url_list):
    """
    Analyze and clean up a list of URLs.
    """
    url_pattern = re.compile(
        r"https?://(?:www\.)?[-a-zA-Z0-9@:%._\+~#=]{1,256}\.[a-zA-Z0-9()]{1,6}\b(?:[-a-zA-Z0-9()@:%_\+.~#?&//=]*)"
    )
    valid_urls = []
    domains = Counter()
    for url in url_list:
        # Skip anything that does not look like an http(s) URL
        url = url.strip()
        if not url_pattern.match(url):
            continue
        # Parse the URL
        parsed = urlparse(url)
        domain = parsed.netloc.removeprefix("www.")  # Python 3.9+; strips only a leading "www."
        domains[domain] += 1
        # Parse the query string into a dict of lists
        params = parse_qs(parsed.query)
        valid_urls.append({
            "url": url,
            "domain": domain,
            "path": parsed.path,
            "params": params
        })
    return {
        "total_urls": len(valid_urls),
        "unique_domains": len(domains),
        "top_domains": domains.most_common(5),
        "urls": valid_urls
    }
# Usage
urls = [
    "https://www.example.com/path?key=value",
    "http://test.org/api/users?id=123&type=admin",
    "invalid-url",
    "https://github.com/repository"
]
result = analyze_urls(urls)
print(f"Valid URLs: {result['total_urls']}")  # Output: Valid URLs: 3
print(f"Unique domains: {result['unique_domains']}")  # Output: Unique domains: 3
print(f"Most common domains: {result['top_domains']}")
VI. Performance Optimization Summary
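Scenario 1: concatenating many strings
# ❌ Avoid: repeated += can cost O(n²) overall
result = ""
for i in range(10000):
    result += f"Item {i}, "
# ✅ Better: a single join, O(n)
result = ", ".join(f"Item {i}" for i in range(10000))
# The speedup grows with the number of pieces; measure on your own workload
Scenario 2: many replacements
# ❌ Avoid: every replace() call is another full pass over the string
text = "abcdefg" * 100000
result = text
for char in "abcdefg":
    result = result.replace(char, "x")
# ✅ Better: translate() applies all the mappings in one pass
trans = str.maketrans("abcdefg", "xxxxxxx")
result = text.translate(trans)
# The gain depends on the data and on the number of characters mapped
Scenario 3: frequent regex matching
import re
emails = ["user@example.com", "not-an-email", "admin@test.org"]  # sample data
# ❌ Avoid: passing the pattern string on every call (re looks it up in its cache each time)
for email in emails:
    if re.match(r"^[\w\.-]+@[\w\.-]+\.\w+$", email):
        pass
# ✅ Better: compile once
pattern = re.compile(r"^[\w\.-]+@[\w\.-]+\.\w+$")
for email in emails:
    if pattern.match(email):
        pass
# Usually a modest speedup, because re already caches compiled patterns
VII. Cheat Sheet: the 20 Operations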
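| # | Operation | Purpose | Time complexity (rough) | How often used |
|---|---|---|---|---|
| 1 | find() / index() | Find a substring | O(n·m) worst case | ★★★★★ |
| 2 | replace() | Replace substrings | O(n·m) worst case | ★★★★★ |
| 3 | count() | Count occurrences | O(n·m) worst case | ★★★★ |
| 4 | startswith() / endswith() | Check prefix/suffix | O(m) | ★★★★★ |
| 5 | strip() | Remove whitespace | O(n) | ★★★★★ |
| 6 | split() | Split a string | O(n) | ★★★★★ |
| 7 | join() | Join strings | O(n) | ★★★★★ |
| 8 | partition() | Split into three parts | O(n) | ★★★ |
| 9 | format() / f-string | Format strings | O(n) | ★★★★★ |
| 10 | upper() / lower() / title() | Change case | O(n) | ★★★★ |
| 11 | isdigit() / isalpha() | Check character classes | O(n) | ★★★★ |
| 12 | zfill() / center() | Pad and center | O(n) | ★★★ |
| 13 | match() / search() | Regex matching | Pattern-dependent | ★★★★★ |
| 14 | findall() | Find all regex matches | Pattern-dependent | ★★★★★ |
| 15 | sub() / subn() | Regex replacement | Pattern-dependent | ★★★★★ |
| 16 | compile() | Precompile a regex | One-time cost | ★★★★ |
| 17 | translate() | Map characters | O(n) | ★★★ |
| 18 | expandtabs() | Expand tabs | O(n) | ★ |
| 19 | encode() / decode() | Convert encodings | O(n) | ★★★★ |
| 20 | casefold() | Aggressive lowercasing | O(n) | ★★ |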
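VIII. Best-Practice Recommendations
✅ Do this:
- Use f-strings: modern, readable, and usually the fastest formatting option
- Build strings with join(): avoid += in loops when concatenating many pieces
- Precompile regexes: worthwhile when a pattern is matched many times
- Use strip(): clean up user input
- Pick the right check method: isdigit(), isalpha(), and so on
- Use translate(): one efficient pass for many character-level replacements
- Standardize on one encoding: prefer UTF-8
- Validate input: always validate data that comes from outside your program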
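IX. Summary
These 20 string operations cover most everyday string-handling tasks in Python. The key points:
- Basic operations (1-5): the foundation of all string processing
- Efficient operations (6-7): split() and join() are central to performance
- Validation (11): keeps data quality high
- Regular expressions (13-16): the tool for complex matching
- Performance tools (translate(), compile()): essential for large-scale data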