Home » SQL & PL/SQL » SQL & PL/SQL » RegEX data parsing (12.1)
RegEX data parsing [message #686652] |
Thu, 10 November 2022 10:36 |
|
lott42_gmail
Messages: 146 Registered: June 2010
|
Senior Member |
|
|
I'm not good with Reg Ex parsing but here's one way I accomplished my task. Is there an easier way to parse data within a string?
--
--1: Start: Need to find either the string "ROOT" + "any 1 char" + "CAUSE:" OR "ROOT" + "any 1 char" + "CAUSE" + "any 1 char" + "ASSESSMENT:"
--2: End: Then find the first number+period+left paren. ex: "3.(" after #1. There's a few more possible "End" positions like 2.( OR 2. ( OR 2). OR 2) . OR 2. )
--3: Data: Parse the data between #1 and #2(if NO #2 "End") was found then get rest of string
--
--Sample data
WITH data AS (
SELECT 1 id, '1.(U) ROOT CAUSE: Root cause data1. other data' str FROM DUAL union all
SELECT 2 id, '1.(U) ROOT*CAUSE ASSESSMENT: Root cause data2 3.(U) other data' str FROM DUAL UNION ALL
SELECT 3 id, '1.(U) ROOT-CAUSE: Root cause data3 2.(U) other data' str FROM DUAL UNION ALL
SELECT 9 id, '1.(U) TEST STRING: Root cause data9 5.(U)' str FROM DUAL
)
SELECT parsed_data.str,
--#3 Parse the data between #1 and #2
TRIM (
SUBSTR (
str,
start_pos,
CASE
--if NO end anchor was found, then get the rest of input string
WHEN end_numb_of_chars = 0 THEN LENGTH (str)
ELSE end_numb_of_chars - 1
END)) root_parsed_data,
parsed_data.start_pos,
parsed_data.end_numb_of_chars
FROM (SELECT data_location.*,
root_data_start_pos + root_data_str_len start_pos,
--#2 Find the first number+period+left paren after #1
REGEXP_INSTR (
SUBSTR (str,
--start looking after "root" data was found
root_data_start_pos + root_data_str_len,
--grab the rest of the string
LENGTH (str)),
--Need to find "end" position anchor
--2.( or 2. ( or 2). or 2) . or 2. )
'\d+\.\(|\d+\. \(|\d+\(\.|\d+\)\.') end_numb_of_chars
FROM (SELECT d.*,
--#1: look for either "ROOT CAUSE" or "ROOT CAUSE ASSESSMENT"
REGEXP_INSTR (
str,
'ROOT.{1}CAUSE:|ROOT.{1}CAUSE.{1}ASSESSMENT:')
root_data_start_pos,
--#1: look for either "ROOT CAUSE" or "ROOT CAUSE ASSESSMENT". get length of string
CASE
WHEN REGEXP_INSTR (str, 'ROOT.{1}CAUSE:') > 0
THEN
LENGTH ('ROOT CAUSE:')
WHEN REGEXP_INSTR (str,
'ROOT.{1}CAUSE.{1}ASSESSMENT:') >
0
THEN
LENGTH ('ROOT CAUSE ASSESSMENT:')
END
root_data_str_len
FROM data d) data_location) parsed_data
Expected output
STR ROOT_DATA_PARSED
1.(U) ROOT CAUSE: Root cause data1. other data Root cause data1. other data
1.(U) ROOT*CAUSE ASSESSMENT: Root cause data2 3.(U) other data Root cause data2
1.(U) ROOT-CAUSE: Root cause data3 2.(U) other data Root cause data3
1.(U) TEST STRING: Root cause data9 5.(U)
|
|
|
Re: RegEX data parsing [message #686654 is a reply to message #686652] |
Fri, 11 November 2022 05:49 |
Solomon Yakobson
Messages: 3286 Registered: January 2010 Location: Connecticut, USA
|
Senior Member |
|
|
WITH data AS (
SELECT 1 id, '1.(U) ROOT CAUSE: Root cause data1. other data' str FROM DUAL union all
SELECT 2 id, '1.(U) ROOT*CAUSE ASSESSMENT: Root cause data2 3.(U) other data' str FROM DUAL UNION ALL
SELECT 3 id, '1.(U) ROOT-CAUSE: Root cause data3 2.(U) other data' str FROM DUAL UNION ALL
SELECT 9 id, '1.(U) TEST STRING: Root cause data9 5.(U)' str FROM DUAL
)
SELECT STR,
REGEXP_SUBSTR(
STR,
'ROOT.CAUSE(.ASSESSMENT)?:(.+?)((\d+\.\()|$)',
1,
1,
NULL,
2
) ROOT_DATA_PARSED
FROM DATA
/
STR ROOT_DATA_PARSED
---------------------------------------------------------------- -----------------------------
1.(U) ROOT CAUSE: Root cause data1. other data Root cause data1. other data
1.(U) ROOT*CAUSE ASSESSMENT: Root cause data2 3.(U) other data Root cause data2
1.(U) ROOT-CAUSE: Root cause data3 2.(U) other data Root cause data3
1.(U) TEST STRING: Root cause data9 5.(U)
SQL>
SY.
[Updated on: Fri, 11 November 2022 05:50] Report message to a moderator
|
|
|
|
Re: RegEX data parsing [message #686658 is a reply to message #686657] |
Sun, 13 November 2022 12:20 |
|
lott42_gmail
Messages: 146 Registered: June 2010
|
Senior Member |
|
|
Solomon, I introduced a Line Feed for ID:3 in the STR value
WITH data AS (
SELECT 1 id, '1.(U) ROOT CAUSE: Root cause data1. other data' str FROM DUAL union all
SELECT 2 id, '1.(U) ROOT*CAUSE ASSESSMENT: Root cause data2 3.(U) other data' str FROM DUAL UNION ALL
SELECT 3 id, '1.(U) ROOT-CAUSE: Root
cause data3 2.(U) other data' str FROM DUAL UNION ALL
SELECT 9 id, '1.(U) TEST STRING: Root cause data9 5.(U)' str FROM DUAL
)
SELECT STR,
-- DUMP(str) str_dump,
REGEXP_SUBSTR(
STR,
-- 'ROOT.CAUSE(.ASSESSMENT)?:(.+?)((\d+\.\()|$)',
-- 'ROOT.CAUSE(.ASSESSMENT)?:(^\s*|\s*$)((\d+\.\()|$)',
'ROOT.CAUSE(.ASSESSMENT)?:((.|\n)*)((\d+\.\()|$)',
1,
1,
NULL,
2
) ROOT_DATA_PARSED
FROM DATA
I'd still like to parse the data even if it contains a CR/LF character in there like
I tried a few variations of where I think the regex should be modified to allow for these line terminators but to no avail...
|
|
|
Re: RegEX data parsing [message #686660 is a reply to message #686658] |
Mon, 14 November 2022 07:47 |
Solomon Yakobson
Messages: 3286 Registered: January 2010 Location: Connecticut, USA
|
Senior Member |
|
|
You should start reading documentation:
WITH data AS (
SELECT 1 id, '1.(U) ROOT CAUSE: Root cause data1. other data' str FROM DUAL union all
SELECT 2 id, '1.(U) ROOT*CAUSE ASSESSMENT: Root cause data2 3.(U) other data' str FROM DUAL UNION ALL
SELECT 3 id, '1.(U) ROOT-CAUSE: Root
cause data3 2.(U) other data' str FROM DUAL UNION ALL
SELECT 9 id, '1.(U) TEST STRING: Root cause data9 5.(U)' str FROM DUAL
)
SELECT ID,
STR,
REGEXP_SUBSTR(
STR,
'ROOT.CAUSE(.ASSESSMENT)?:(.+?)((\d+\.\()|$)',
1,
1,
'n',
2
) ROOT_DATA_PARSED
FROM DATA
/
ID STR ROOT_DATA_PARSED
---------- ------------------------------------------------------------------- ------------------------------
1 1.(U) ROOT CAUSE: Root cause data1. other data Root cause data1. other data
2 1.(U) ROOT*CAUSE ASSESSMENT: Root cause data2 3.(U) other data Root cause data2
3 1.(U) ROOT-CAUSE: Root Root
cause data3 2.(U) other data cause data3
9 1.(U) TEST STRING: Root cause data9 5.(U)
SQL>
SY.
|
|
|
Re: RegEX data parsing [message #686661 is a reply to message #686660] |
Mon, 14 November 2022 11:48 |
|
lott42_gmail
Messages: 146 Registered: June 2010
|
Senior Member |
|
|
Thanks again Solomon!! I still need a lot of practice with RegEx that I assumed that I needed to modify the expression. Never occurred to me that "regexp_modifier" option was an issue. Yes, I still have a lot of learning to do but I do appreciate your help plus reading other posts to help with my learning curve.
|
|
|
Goto Forum:
Current Time: Sat Sep 28 19:29:56 CDT 2024
|