主题:[原创]按关键字分割文本的代码
按关键字分割文本
如果一篇文章中有许多章节,可使用这种方式(但最多不超过999章节)。请先输入关键字。关键
字必须为三个字,且第一个字必须是章节的首字(例如“第”字),第二个字必须是中文小写数字或阿拉
伯数字(半角或全角的数字均可),第三个字必须是章节的末字(例如“章”或“回”或“节”等)。下
面的例子都是合法的关键字:
第1章、第1回、&1节 (中间是半角数字)
第1章、第1回、&1节 (中间是全角数字)
第一章、第一回、&一节 (中间是中文小写数字)
分割出来的文件保存在原文本文件所在的文件夹,并且以相应的数字为文件名,如:“第一章”保存
的文件名为“001.txt”,“第二章”保存的文件名为“002.txt”,等等。
由于代码中使用了正则表达式,所以在编程阶段必须选中“工程→引用→Microsoft VBScript
Regular Expressions 5.5”
代码如下(假设欲分割的文本已经读出在Text1文本框):
Sub 文本分割(FileName As String) 'FileName是欲分割的txt文件的全路径文件名
Dim z1 As String, z2 As String, ST As String
Dim J1 As Long, J2 As Long, count As Long, mark As Integer
Dim BJ1 As Boolean, BJ2 As Boolean
Dim regEx
Dim mas As MatchCollection
Dim ma As Match
z1 = InputBox$("请输入关键字"): If Len(z1) <> 3 Then MsgBox "关键字必须为三个字!": Exit Sub
Select Case Asc(Mid$(z1, 2, 1))
Case 48 To 57: ST = "+[0123456789]{1,3}": mark = 1
Case -23632 To -23623: ST = "+[0123456789]{1,3}": mark = 2
Case Else: ST = "+[零一二三四五六七八九十百]{1,5}": mark = 3
End Select
z2 = Left$(z1, 1): z1 = Right$(z1, 1): z1 = z2 & ST & z1 & "+"
z2 = Left$(FileName, InStrRev(FileName, "\")) '获取文件路径
Set regEx = New RegExp '建立规范表达式
regEx.Pattern = z1 '设置模式
regEx.IgnoreCase = True '不区分大小写
regEx.Global = True '全局搜索
Set mas = regEx.Execute(Text1.Text) '搜索
If mas.count Then
Screen.MousePointer = 11
For Each ma In mas '循环匹配集合
count = count + 1: z1 = Mid$(ma.Value, 2, ma.Length - 2)
Select Case mark
Case 2: z1 = StrConv(z1, 8)
Case 3: z1 = 获取章节数字(z1)
End Select
z1 = Format(z1, "000")
If BJ1 Then J2 = ma.FirstIndex + 1: GoSub 100 Else BJ1 = True: ST = z1: J1 = ma.FirstIndex + 1
If count = mas.count Then BJ2 = True: GoSub 100: Screen.MousePointer = 0: Set regEx = Nothing: MsgBox "分割并保存完毕"
Next
Else: MsgBox "非法的关键字"
End If
Exit Sub
100
Open z2 & ST & ".txt" For Output As #1
If Not BJ2 Then Print #1, Mid$(Text1.Text, J1, J2 - J1) Else Print #1, Mid$(Text1.Text, J1)
Close #1
J1 = J2: ST = z1
Return
End Sub
Function 获取章节数字(z As String) As String
Dim N As Integer, k As Integer, i As Integer, NumA() As Integer
k = Len(z)
ReDim NumA(1 To k) As Integer
For i = 1 To k
Select Case Mid$(z, i, 1)
Case "零", "0": NumA(i) = 0
Case "一": NumA(i) = 1
Case "二": NumA(i) = 2
Case "三": NumA(i) = 3
Case "四": NumA(i) = 4
Case "五": NumA(i) = 5
Case "六": NumA(i) = 6
Case "七": NumA(i) = 7
Case "八": NumA(i) = 8
Case "九": NumA(i) = 9
Case "十": NumA(i) = 10
Case "百": NumA(i) = 100
End Select
Next
For i = k To 1 Step -1
If NumA(i) > 9 And i > 1 Then
N = N + NumA(i - 1) * NumA(i): i = i - 1
Else
N = N + NumA(i)
End If
Next
获取章节数字 = N
End Function
如果一篇文章中有许多章节,可使用这种方式(但最多不超过999章节)。请先输入关键字。关键
字必须为三个字,且第一个字必须是章节的首字(例如“第”字),第二个字必须是中文小写数字或阿拉
伯数字(半角或全角的数字均可),第三个字必须是章节的末字(例如“章”或“回”或“节”等)。下
面的例子都是合法的关键字:
第1章、第1回、&1节 (中间是半角数字)
第1章、第1回、&1节 (中间是全角数字)
第一章、第一回、&一节 (中间是中文小写数字)
分割出来的文件保存在原文本文件所在的文件夹,并且以相应的数字为文件名,如:“第一章”保存
的文件名为“001.txt”,“第二章”保存的文件名为“002.txt”,等等。
由于代码中使用了正则表达式,所以在编程阶段必须选中“工程→引用→Microsoft VBScript
Regular Expressions 5.5”
代码如下(假设欲分割的文本已经读出在Text1文本框):
Sub 文本分割(FileName As String) 'FileName是欲分割的txt文件的全路径文件名
Dim z1 As String, z2 As String, ST As String
Dim J1 As Long, J2 As Long, count As Long, mark As Integer
Dim BJ1 As Boolean, BJ2 As Boolean
Dim regEx
Dim mas As MatchCollection
Dim ma As Match
z1 = InputBox$("请输入关键字"): If Len(z1) <> 3 Then MsgBox "关键字必须为三个字!": Exit Sub
Select Case Asc(Mid$(z1, 2, 1))
Case 48 To 57: ST = "+[0123456789]{1,3}": mark = 1
Case -23632 To -23623: ST = "+[0123456789]{1,3}": mark = 2
Case Else: ST = "+[零一二三四五六七八九十百]{1,5}": mark = 3
End Select
z2 = Left$(z1, 1): z1 = Right$(z1, 1): z1 = z2 & ST & z1 & "+"
z2 = Left$(FileName, InStrRev(FileName, "\")) '获取文件路径
Set regEx = New RegExp '建立规范表达式
regEx.Pattern = z1 '设置模式
regEx.IgnoreCase = True '不区分大小写
regEx.Global = True '全局搜索
Set mas = regEx.Execute(Text1.Text) '搜索
If mas.count Then
Screen.MousePointer = 11
For Each ma In mas '循环匹配集合
count = count + 1: z1 = Mid$(ma.Value, 2, ma.Length - 2)
Select Case mark
Case 2: z1 = StrConv(z1, 8)
Case 3: z1 = 获取章节数字(z1)
End Select
z1 = Format(z1, "000")
If BJ1 Then J2 = ma.FirstIndex + 1: GoSub 100 Else BJ1 = True: ST = z1: J1 = ma.FirstIndex + 1
If count = mas.count Then BJ2 = True: GoSub 100: Screen.MousePointer = 0: Set regEx = Nothing: MsgBox "分割并保存完毕"
Next
Else: MsgBox "非法的关键字"
End If
Exit Sub
100
Open z2 & ST & ".txt" For Output As #1
If Not BJ2 Then Print #1, Mid$(Text1.Text, J1, J2 - J1) Else Print #1, Mid$(Text1.Text, J1)
Close #1
J1 = J2: ST = z1
Return
End Sub
Function 获取章节数字(z As String) As String
Dim N As Integer, k As Integer, i As Integer, NumA() As Integer
k = Len(z)
ReDim NumA(1 To k) As Integer
For i = 1 To k
Select Case Mid$(z, i, 1)
Case "零", "0": NumA(i) = 0
Case "一": NumA(i) = 1
Case "二": NumA(i) = 2
Case "三": NumA(i) = 3
Case "四": NumA(i) = 4
Case "五": NumA(i) = 5
Case "六": NumA(i) = 6
Case "七": NumA(i) = 7
Case "八": NumA(i) = 8
Case "九": NumA(i) = 9
Case "十": NumA(i) = 10
Case "百": NumA(i) = 100
End Select
Next
For i = k To 1 Step -1
If NumA(i) > 9 And i > 1 Then
N = N + NumA(i - 1) * NumA(i): i = i - 1
Else
N = N + NumA(i)
End If
Next
获取章节数字 = N
End Function