I'm doing some web scraping, and this is the format of the data:
Sr.No. Course_Code Course_Name Credit Grade Attendance_Grade
The actual string I receive has the following form:
1 CA727 PRINCIPLES OF COMPILER DESIGN 3 A M
I'm interested in Course_Code, Course_Name, and Grade. In this example, the values would be:
Course_Code : CA727
Course_Name : PRINCIPLES OF COMPILER DESIGN
Grade : A
Solution
Let's use Ruby's named captures and a self-describing regular expression!
course_line = /
  ^                     # Starting at the front of the string
  (?<SrNo>\d+)          # Capture one or more digits; call the result "SrNo"
  \s+                   # Eat some whitespace
  (?<Code>\S+)          # Capture all the non-whitespace you can; call it "Code"
  \s+                   # Eat some whitespace
  (?<Name>.+\S)         # Capture as much as you can
                        # (while letting the rest of the regex still work)
                        # Make sure you end with a non-whitespace character.
                        # Call this "Name"
  \s+                   # Eat some whitespace
  (?<Credit>\S+)        # Capture all the non-whitespace you can; call it "Credit"
  \s+                   # Eat some whitespace
  (?<Grade>\S+)         # Capture all the non-whitespace you can; call it "Grade"
  \s+                   # Eat some whitespace
  (?<Attendance>\S+)    # Capture all the non-whitespace; call it "Attendance"
  $                     # Make sure that we're at the end of the line now
/x

str = "1 CA727 PRINCIPLES OF COMPILER DESIGN 3 A M"
parts = str.match(course_line)

puts "
Course Code: #{parts['Code']}
Course Name: #{parts['Name']}
Grade: #{parts['Grade']}".strip

#=> Course Code: CA727
#=> Course Name: PRINCIPLES OF COMPILER DESIGN
#=> Grade: A
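Scraped pages usually contain lines that are not course rows, such as the header row from the question, and `match` returns `nil` for those. Here is a minimal sketch of one way to handle that. It assumes the same pattern written more compactly; the names `COURSE_LINE`, `parse_course_line`, and `rows` are made up for illustration and are not part of the original answer.

```ruby
# Compact restatement of the pattern above (hypothetical constant name).
COURSE_LINE = /
  ^(?<SrNo>\d+)\s+
  (?<Code>\S+)\s+
  (?<Name>.+\S)\s+
  (?<Credit>\S+)\s+
  (?<Grade>\S+)\s+
  (?<Attendance>\S+)$
/x

# Hypothetical helper: returns a hash of the three fields of interest,
# or nil when the line does not look like a course row.
def parse_course_line(line)
  parts = COURSE_LINE.match(line.strip)
  return nil unless parts

  { code: parts[:Code], name: parts[:Name], grade: parts[:Grade] }
end

# The header row and the sample row, both taken from the question.
rows = [
  "Sr.No. Course_Code Course_Name Credit Grade Attendance_Grade",
  "1 CA727 PRINCIPLES OF COMPILER DESIGN 3 A M"
]

# compact drops the nil produced by the header row.
courses = rows.map { |row| parse_course_line(row) }.compact

courses.each { |c| puts "#{c[:code]} | #{c[:name]} | #{c[:grade]}" }
#=> CA727 | PRINCIPLES OF COMPILER DESIGN | A
```

The header row is skipped because it does not start with digits, so the `SrNo` group cannot match. Checking for `nil` before indexing into the result avoids a `NoMethodError` on lines like that.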