Issue
I've spent the whole working day trying to do this, but I haven't got a result yet. Here is what I want to do:
I have a badly formatted text file containing hundreds of lines like this one:
2012-02-21 05:16:47,205 ERROR - No KPI mapping found for kpi 'stoerungbeheben_moduleaccess_triage_1',
I want to find all strings between ' and ' (so that one result is stoerungbeheben_moduleaccess_triage_1) and write them to another .txt file.
The quoted text differs from line to line, though some values repeat.
I've tried filterLine and a regex pattern, but it doesn't work.
Could you give me a hint on how to do that?
Kind regards
Collin
Solution
The following Groovy script produces the desired result (it does not write to a file, but I believe you can easily add that):
def regex = "[0-9]+-[^']+'([^']+)'[^\r\n]*\r?\n?"
def source = """
2012-02-21 05:16:47,205 ERROR - No KPI mapping found for kpi 'stoerungbeh_¤eben_moduleaccess_triage_1',
2012-02-21 05:16:47,205 ERROR - No KPI mapping found for kpi 'otherbeheben_üü'
2012-02-21 05:16:47,205 ERROR - No KPI mapping found for kpi 'stoerungbeheben_moduleaccess_triage_1',
2012-02-21 05:16:47,205 ERROR - No KPI mapping found for kpi 'thirdhbeheben_äÄ_moduleaccess_triage_1'
2012-02-21 05:16:47,205 ERROR - No KPI mapping found for kpi 'stoerungbeheben_mo&%duleaccess_triage_1',
"""
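// Compile the pattern once, then walk over every match in the source text
java.util.regex.Pattern p = java.util.regex.Pattern.compile(regex)
java.util.regex.Matcher m = p.matcher(source)
while (m.find()) {
    // group(1) is the text captured between the two apostrophes
    println(m.group(1))
}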
yields:
stoerungbeh_¤eben_moduleaccess_triage_1
otherbeheben_üü
stoerungbeheben_moduleaccess_triage_1
thirdhbeheben_äÄ_moduleaccess_triage_1
stoerungbeheben_mo&%duleaccess_triage_1
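The script above leaves writing to a file as an exercise. A minimal Java sketch of the whole task could reuse the same pattern; it assumes the log is UTF-8 encoded (the sample contains umlauts) and small enough to read into memory at once. The class name KpiExtractor and the default file names input.log and kpis.txt are hypothetical. Java is used here because the Groovy answer already relies on java.util.regex.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class KpiExtractor {
    // Same pattern as the Groovy answer. In a Java string literal "\r" and "\n"
    // become real CR/LF characters, which the regex engine matches literally.
    private static final Pattern KPI_LINE =
            Pattern.compile("[0-9]+-[^']+'([^']+)'[^\r\n]*\r?\n?");

    /** Returns every value captured between the apostrophes, in input order. */
    static List<String> extract(String text) {
        List<String> result = new ArrayList<>();
        Matcher m = KPI_LINE.matcher(text);
        while (m.find()) {
            result.add(m.group(1));
        }
        return result;
    }

    /** Reads the whole log as UTF-8 and writes one captured value per line. */
    static void extractFile(Path in, Path out) throws IOException {
        String text = new String(Files.readAllBytes(in), StandardCharsets.UTF_8);
        Files.write(out, extract(text), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical default file names; pass real paths as arguments instead.
        Path in = Paths.get(args.length > 0 ? args[0] : "input.log");
        Path out = Paths.get(args.length > 1 ? args[1] : "kpis.txt");
        extractFile(in, out);
    }
}
```

Run it as java KpiExtractor input.log kpis.txt. Since some values repeat, collecting into a java.util.LinkedHashSet instead of an ArrayList would keep only distinct values while preserving the order in which they first appear.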
EDIT: The explanation of the pattern would have been too long for a comment, so I added it to the answer:
The Wikipedia article has a fairly comprehensive table of regex metacharacters: http://en.wikipedia.org/wiki/Regular_expression#Examples. IMO the best way to learn and understand regexes is to write and run lots of them against various arbitrary strings.
The pattern is far from optimal, but here's an explanation of [0-9]+-[^']+'([^']+)'[^\r\n]*\r?\n?:
[0-9]+- => match 1 or more digits from 0 to 9 (that is what the + sign means), then stop at the hyphen (example: 2012-). Anchoring each match on the date this way handles the case where there is no newline or the entry is on the last line.
[^']+' => match 1 or more characters that are not an apostrophe, then stop at the apostrophe (example: 02-21 05:16:47,205 ERROR - No KPI mapping found for kpi ').
([^']+)' => match and capture 1 or more characters that are not an apostrophe, then stop at the apostrophe (example: stoerungbeheben_moduleaccess_triage_1', of which the captured part in parentheses is stoerungbeheben_moduleaccess_triage_1).
[^\r\n]* => match 0 or more characters that are not a carriage return (\r) or a newline (\n) (example: ,).
\r? => match a carriage return if one exists.
\n? => match a newline if one exists.
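In the spirit of running regexes against arbitrary strings, the Java sketch below (the class and method names are made up for illustration) prints each whole match next to its captured group, with \r and \n made visible. The input has one entry ending in a Windows line ending and a final entry with no line ending at all, which is what the leading [0-9]+- and the trailing [^\r\n]*\r?\n? are meant to cope with.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PatternWalkthrough {
    // The pattern from the answer, unchanged
    static final Pattern P = Pattern.compile("[0-9]+-[^']+'([^']+)'[^\r\n]*\r?\n?");

    /** Replaces CR and LF with the visible sequences \r and \n. */
    static String visible(String s) {
        return s.replace("\r", "\\r").replace("\n", "\\n");
    }

    /** Returns "whole match | captured group" for every match in the text. */
    static List<String> describe(String text) {
        List<String> out = new ArrayList<>();
        Matcher m = P.matcher(text);
        while (m.find()) {
            out.add(visible(m.group(0)) + " | " + m.group(1));
        }
        return out;
    }

    public static void main(String[] args) {
        // First entry ends with CRLF; the last entry has no line ending
        String text = "2012-02-21 05:16:47,205 ERROR - No KPI mapping found for kpi 'first_1',\r\n"
                + "2012-02-21 05:16:47,205 ERROR - No KPI mapping found for kpi 'second_2'";
        for (String line : describe(text)) {
            System.out.println(line);
        }
    }
}
```

The first printed match ends in ',\r\n | first_1', showing that [^\r\n]* took the comma and \r?\n? consumed the Windows line ending; the second ends in 'second_2' | second_2, showing that both optional tail parts simply match nothing on the last line.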
Answered By - heikkim