{"id":632,"date":"2020-09-07T18:06:57","date_gmt":"2020-09-07T12:06:57","guid":{"rendered":"https:\/\/mellowhost.com\/blog\/?p=632"},"modified":"2020-09-08T17:03:27","modified_gmt":"2020-09-08T11:03:27","slug":"how-to-get-the-string-part-word-text-within-brackets-in-python","status":"publish","type":"post","link":"https:\/\/mellowhost.com\/blog\/how-to-get-the-string-part-word-text-within-brackets-in-python.html","title":{"rendered":"How to get the string\/part\/word\/text within brackets in Python using Regex"},"content":{"rendered":"\n<p><strong>PROBLEM DEFINITION<\/strong><\/p>\n\n\n\n<p>Suppose you have a string like the following:<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">[lagril] L.A. Girl Pro. Setting HD Matte Finish Spray<\/pre>\n\n\n\n<p>While scanning the line, you want to extract the word inside the brackets, &#8216;lagril&#8217;. How do you do that?<\/p>\n\n\n\n<p><strong>GETTING TEXT WITHIN BRACKETS USING REGEX IN PYTHON<\/strong><\/p>\n\n\n\n<p>This is a common string extraction problem in software engineering, and it is usually solved with regular expressions. Let&#8217;s build the regular expression first, using regex101.com.<\/p>\n\n\n\n<p>We need to match text that starts with a &#8216;[&#8217; bracket and ends with a &#8216;]&#8217; bracket, with an alphanumeric word in between, made of lowercase letters, uppercase letters, or digits, and zero or more characters long. 
So, this should be as simple as the following:<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">\\[[A-Za-z0-9]*\\]<\/pre>\n\n\n\n<p>This pattern matches a bracketed word anywhere in a sentence or larger string, brackets included. To grab only the text inside the brackets, capture it in a group. A group in regex is written with () parentheses, without a backslash in front. So the regex becomes the following:<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">\\[([A-Za-z0-9]*)\\]<\/pre>\n\n\n\n<p>This puts the text inside the brackets in group 1. Now, how do you read group 1 from the match? Let&#8217;s dive into Python now:<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"python\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\"># let's import the regular expression module first\nimport re\n\n# our string\ntxt = '[lagril] L.A. Girl Pro. Setting HD Matte Finish Spray'\n\n# our regex search is as follows:\nx = re.search(r\"\\[([A-Za-z0-9]*)\\]\", txt)\n\n# we know this will put the inner text in group 1. The match object returned by re.search has a method called 'group()' that returns the text captured by a group. 
You may use it as follows:\n\nprint(x.group(1)) # prints lagril\n<\/pre>\n\n\n\n<p><\/p>\n","protected":false},"excerpt":{"rendered":"<p>PROBLEM DEFINITION For example, you have a string like the following: While you are scanning the line, you would like to extract the following word from it &#8216;lagril&#8217;, which you are interested in. How to do that? GETTING TEXT WITHIN BRACKETS USING REGEX IN PYTHON Our problem falls into a common string extraction problem we &hellip; <a href=\"https:\/\/mellowhost.com\/blog\/how-to-get-the-string-part-word-text-within-brackets-in-python.html\" class=\"more-link\">Continue reading<span class=\"screen-reader-text\"> &#8220;How to get the string\/part\/word\/text within brackets in Python using Regex&#8221;<\/span><\/a><\/p>\n","protected":false},"author":3,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":[],"categories":[271,293,454],"tags":[296,455,472],"_links":{"self":[{"href":"https:\/\/mellowhost.com\/blog\/wp-json\/wp\/v2\/posts\/632"}],"collection":[{"href":"https:\/\/mellowhost.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/mellowhost.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/mellowhost.com\/blog\/wp-json\/wp\/v2\/users\/3"}],"replies":[{"embeddable":true,"href":"https:\/\/mellowhost.com\/blog\/wp-json\/wp\/v2\/comments?post=632"}],"version-history":[{"count":2,"href":"https:\/\/mellowhost.com\/blog\/wp-json\/wp\/v2\/posts\/632\/revisions"}],"predecessor-version":[{"id":636,"href":"https:\/\/mellowhost.com\/blog\/wp-json\/wp\/v2\/posts\/632\/revisions\/636"}],"wp:attachment":[{"href":"https:\/\/mellowhost.com\/blog\/wp-json\/wp\/v2\/media?parent=632"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/mellowhost.com\/blog\/wp-json\/wp\/v2\/categories?post=632"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/mellowhost.com\/blog\/wp-json\/wp\/v2\/tags?post=63
2"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}