Processing XML with Regular Expressions

<tr>
<td>5345454354</td><td>2010-3-29 13:48:33</td><td>周杰伦</td>
</tr>
<tr>
<td>6565465466</td><td>2010-3-29 15:34:38</td><td>张学友</td>
</tr>
<tr>
<td>6546546546</td><td>2010-3-30 19:30:50</td><td>刘德华</td>
</tr>
<tr>
<td>9875646545</td><td>2010-3-31 2:20:58</td><td>郭富城</td>
</tr>
<tr>
<td>7868768768</td><td>2010-3-31 8:03:11</td><td>梁朝伟</td>
</tr>
<tr>
<td>1434444446 </td><td>2010-3-31 8:45:52</td><td>习近平</td>
</tr>
<tr>
<td>7665466666</td><td>2010-3-31 18:00:46</td><td>李长春</td>
</tr>

To extract the content between the <td> and </td> tags, the expression can be built up as follows:

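Expression         Description

(?<=Expression)    Positive lookbehind: the text immediately to the left of the current position must match Expression

(?<!Expression)    Negative lookbehind: the text immediately to the left of the current position must not match Expression

(?=Expression)     Positive lookahead: the text immediately to the right of the current position must match Expression

(?!Expression)     Negative lookahead: the text immediately to the right of the current position must not match Expression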

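(?is)(?<=<td>).+?(?=</td>)

(?is)        Mode modifiers: i ignores case; s enables single-line mode, so that . also matches carriage returns and line feeds.
(?<=<td>)    Positive lookbehind: the match must be immediately preceded by <td>, but <td> itself is not consumed, so the result does not include <td>.
.+?          One or more of any character, matched lazily: after each character it consumes, the engine tries the rest of the expression, and it takes one more character only if that attempt fails.
(?=</td>)    Positive lookahead: the match must be immediately followed by </td>, but </td> itself is not consumed, so the result does not include </td>.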
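As a minimal sketch of how this expression behaves in Java, the class below applies it to one of the sample rows. The class name `TdCellDemo` and the method `cells` are hypothetical names chosen for illustration; only `java.util.regex` is used.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; not part of the original article.
final class TdCellDemo {
    // (?is): case-insensitive, and . also matches line breaks.
    // The lookbehind and lookahead keep <td> and </td> out of the match itself.
    private static final Pattern TD_CONTENT =
            Pattern.compile("(?is)(?<=<td>).+?(?=</td>)");

    // Returns the trimmed text of every <td>...</td> cell found in html.
    static List<String> cells(String html) {
        List<String> result = new ArrayList<>();
        Matcher m = TD_CONTENT.matcher(html);
        while (m.find()) {
            result.add(m.group().trim());
        }
        return result;
    }

    public static void main(String[] args) {
        String row = "<tr>\n<td>5345454354</td><td>2010-3-29 13:48:33</td><td>周杰伦</td>\n</tr>";
        System.out.println(cells(row)); // prints [5345454354, 2010-3-29 13:48:33, 周杰伦]
    }
}
```

Two limits of the pattern are worth keeping in mind: because `.+?` needs at least one character, an empty cell such as `<td></td>` makes the match run on into the next cell, and a tag with attributes (for example `<td class="x">`) is not recognized by the lookbehind.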

Is extracting XML content with a regex 50 times faster than dom4j?

long t1 = System.nanoTime();
String str = "<xml><ToUserName><![CDATA[gh_520f99dff7cc]]></ToUserName><FromUserName><![CDATA[oBAMOs3aZB0dkbILsBR1wksbmli4]]></FromUserName><CreateTime>1416900555</CreateTime><MsgType><![CDATA[event]]></MsgType><Event><![CDATA[MASSSENDJOBFINISH]]></Event><MsgID>2348714844</MsgID><Status><![CDATA[send success]]></Status><TotalCount>1</TotalCount><FilterCount>1</FilterCount><SentCount>1</SentCount><ErrorCount>0</ErrorCount></xml>";
// Document doc = null;
// try {
// doc = DocumentHelper.parseText(str);
// } catch (DocumentException e) {
// log.error("Failed to parse mass-send XML: "+e.getMessage(),e);
// }
// 
// Element root = doc.getRootElement();
// String msgid = root.elementTextTrim("MsgID");
// String Status = root.elementTextTrim("Status");
// String TotalCount = root.elementTextTrim("TotalCount");
// String FilterCount = root.elementTextTrim("FilterCount");
// String SentCount = root.elementTextTrim("SentCount");
// String ErrorCount = root.elementTextTrim("ErrorCount");
            String msgid = RegExp.getString(str,"(?<=<MsgID>)[\\s\\S]*?(?=</MsgID>)").trim();
            String Status = RegExp.getString(str,"(?<=<Status><!\\[CDATA\\[)[\\s\\S]*?(?=\\]\\]></Status>)")
                .trim();
            String TotalCount = RegExp.getString(str,"(?<=<TotalCount>)[\\s\\S]*?(?=</TotalCount>)")
                .trim();
            String FilterCount = RegExp.getString(str,"(?<=<FilterCount>)[\\s\\S]*?(?=</FilterCount>)")
                .trim();
            String SentCount = RegExp.getString(str,"(?<=<SentCount>)[\\s\\S]*?(?=</SentCount>)")
                .trim();
            String ErrorCount = RegExp.getString(str,"(?<=<ErrorCount>)[\\s\\S]*?(?=</ErrorCount>)")
                .trim();
            long t2 = System.nanoTime();
            log.info(t2-t1);
            log.info((t2-t1)*0.000001);
            log.info(msgid+","+Status+","+TotalCount+","+FilterCount+","+SentCount+","+ErrorCount);
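
The lookaround patterns above need a different expression depending on whether an element is wrapped in CDATA. One possible alternative, sketched here under the assumption that elements carry no attributes and are not nested, uses a capturing group with an optional CDATA wrapper. `XmlField` and `text` are hypothetical names, not part of the original code.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; a sketch, not a replacement for a real XML parser.
final class XmlField {
    // Returns the trimmed text of the first <tag>...</tag> element,
    // unwrapping <![CDATA[...]]> if present, or "" when the tag is absent.
    static String text(String xml, String tag) {
        String t = Pattern.quote(tag);
        Pattern p = Pattern.compile(
                "<" + t + ">\\s*(?:<!\\[CDATA\\[(.*?)\\]\\]>|([^<]*))\\s*</" + t + ">",
                Pattern.DOTALL);
        Matcher m = p.matcher(xml);
        if (!m.find()) {
            return "";
        }
        // Group 1 is set for CDATA content, group 2 for plain text.
        String value = m.group(1) != null ? m.group(1) : m.group(2);
        return value.trim();
    }

    public static void main(String[] args) {
        String xml = "<xml><MsgID>2348714844</MsgID>"
                + "<Status><![CDATA[send success]]></Status></xml>";
        System.out.println(text(xml, "MsgID"));  // prints 2348714844
        System.out.println(text(xml, "Status")); // prints send success
    }
}
```

On the timing itself: a single `System.nanoTime()` measurement of one run also includes class loading, pattern compilation and JIT warm-up, so a comparison against dom4j is more meaningful when both variants are run many times after a warm-up phase.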

Regex helper code

import java.util.ArrayList;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegExp {
    public static ArrayList<String> getStrs(String source,String regex) {
        Pattern p = Pattern.compile(regex);
        Matcher m = p.matcher(source);
        ArrayList<String> list = new ArrayList<>();

        while (m.find()) {
            list.add(source.substring(m.start(),m.end()));
        }

        return list;
    }

    public static String getString(String source,String regex) {
        ArrayList<String> list = getStrs(source,regex);

        if (list.size() > 0) {
            return list.get(0);
        }

        return "";
    }

    public static ArrayList<String> getStrs(String source,String beginStr,String endStr,boolean isLong) {
        if (isLong) {
            return getStrs(source,"(?<=" + replay(beginStr) + ")[\\s\\S]*(?=" + replay(endStr) +
                ")");
        }

        return getStrs(source,"(?<=" + replay(beginStr) + ")[\\s\\S]*?(?=" + replay(endStr) +
            ")");
    }

    public static String getString(String source,String beginStr,String endStr,boolean isLong) {
        if (isLong) {
            return getString(source,"(?<=" + replay(beginStr) + ")[\\s\\S]*(?=" + replay(endStr) +
                ")");
        }

        return getString(source,"(?<=" + replay(beginStr) + ")[\\s\\S]*?(?=" + replay(endStr) +
            ")");
    }

    private static String replay(String source) {
        String result = "";
        result = source.replace("\\","\\\\");
        // Chain on result, not source, so the backslash escaping above is kept
        result = result.replace(".","\\.");
        result = result.replace("(","\\(");
        result = result.replace(")","\\)");
        result = result.replace("[","\\[");
        result = result.replace("]","\\]");
        result = result.replace("{","\\{");
        result = result.replace("}","\\}");
        result = result.replace("$","\\$");
        result = result.replace("?","\\?");
        result = result.replace("&","\\&");
        result = result.replace("*","\\*");
        result = result.replace("!","\\!");
        result = result.replace("^","\\^");
        result = result.replace("+","\\+");
        result = result.replace("#","\\#");

        return result;
    }
}
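
The hand-written `replay` method escapes regex metacharacters one by one. As a hedged alternative sketch, the JDK's `Pattern.quote` can quote the delimiters as literals in one call. `QuoteDemo` and `between` are hypothetical names used only for this illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class showing Pattern.quote in place of manual escaping.
final class QuoteDemo {
    // Returns the first text found between begin and end, or "" if none.
    // greedy = true extends the match to the last occurrence of end.
    static String between(String source, String begin, String end, boolean greedy) {
        String body = greedy ? "[\\s\\S]*" : "[\\s\\S]*?";
        Pattern p = Pattern.compile(
                "(?<=" + Pattern.quote(begin) + ")" + body + "(?=" + Pattern.quote(end) + ")");
        Matcher m = p.matcher(source);
        return m.find() ? m.group() : "";
    }

    public static void main(String[] args) {
        System.out.println(between("a[1]b[2]", "[", "]", false)); // prints 1
        System.out.println(between("a[1]b[2]", "[", "]", true));  // prints 1]b[2
    }
}
```

Because `Pattern.quote` treats the whole delimiter as a literal, characters that `replay` does not list, such as `|`, are handled as well.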
