Apr 27, 2015

Python: split a string on multiple delimiters

I think this is basic, but it is important to get the idea. The simplest approaches are the split() method on a string and the re library.

Python String Split


a = "ini budi, iwan dan ibu budi. Selain itu ini juga ada 'madu' budi." 
a.split(' ')
['ini', 'budi,', 'iwan', 'dan', 'ibu', 'budi.', 'Selain', 'itu', 'ini', 'juga', 'ada', "'madu'", 'budi.']


Well, it just splits on a space, or on any other delimiter passed as the argument to split().
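As a small illustration of how the argument changes the result, here is a sketch with a shorter sample sentence (the variable names are just for this example). It compares split() with no argument, split() with a different delimiter, and the optional maxsplit argument:

```python
text = "ini budi, iwan dan ibu budi."

# No argument: split on runs of whitespace and drop empty strings.
by_whitespace = text.split()

# A different delimiter: split on the comma-plus-space sequence.
by_comma = text.split(", ")

# maxsplit limits how many splits are performed (here: at most 2).
first_two = text.split(" ", 2)

# With an explicit ' ' delimiter, repeated spaces produce empty strings.
double_space = "a  b".split(" ")

print(by_whitespace)
print(by_comma)
print(first_two)
print(double_space)
```

Note the last case: split(' ') and split() are not the same thing once the text contains repeated spaces.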

Python split and keep the delimiters

Sometimes you need to process the text and then merge it back together afterwards. For that, keep the delimiters by wrapping the pattern in a capturing group:

import re
re.split(r'(\W)', a)
['ini', ' ', 'budi', ',', '', ' ', 'iwan', ' ', 'dan', ' ', 'ibu', ' ', 'budi', '.', '', ' ', 'Selain', ' ', 'itu', ' ', 'ini', ' ', 'juga', ' ', 'ada', ' ', '', "'", 'madu', "'", '', ' ', 'budi', '.', '']
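To show the "process, then merge again" idea, here is a minimal sketch. The helper name transform_words is made up for this example. It relies on the fact that, with a single capturing group, re.split alternates between text pieces (even positions) and delimiters (odd positions), so joining everything back with ''.join rebuilds the sentence:

```python
import re

def transform_words(text, func):
    """Apply func to each word and reassemble the text with its delimiters.

    Hypothetical helper, written only for this illustration.
    """
    # The capturing group makes re.split keep every delimiter in the result.
    parts = re.split(r'(\W)', text)
    # Even indices hold the words (possibly ''), odd indices the delimiters.
    return ''.join(func(p) if i % 2 == 0 else p for i, p in enumerate(parts))

a = "ini budi, iwan dan ibu budi."
print(transform_words(a, str.upper))

# Joining the untouched pieces gives back the original string.
print(''.join(re.split(r'(\W)', a)) == a)
```

Only the words go through func, so the spaces and punctuation come back exactly as they were.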


Python split on two or more delimiters


This is a different case: the text can be split on more than one delimiter. You can use Kennet's function:

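def tsplit(string, delimiters):
    """Behaves like str.split but supports multiple delimiters."""

    delimiters = tuple(delimiters)
    stack = [string]

    for delimiter in delimiters:
        # The list is modified while it is being enumerated. That works here
        # because a piece already split on this delimiter splits to itself.
        for i, substring in enumerate(stack):
            substack = substring.split(delimiter)
            # Replace the piece at position i with its sub-pieces, in order.
            stack.pop(i)
            for j, _substring in enumerate(substack):
                stack.insert(i+j, _substring)

    return stack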



tsplit(a, (',', '/', '-',' ','.','\''))
['ini', 'budi', '', 'iwan', 'dan', 'ibu', 'budi', '', 'Selain', 'itu', 'ini', 'juga', 'ada', '', 'madu', '', 'budi', '']
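The same kind of split can also be sketched with re, by building one pattern out of the delimiters. This is an alternative, not the function above, and the name split_any is made up for this example. It assumes every delimiter is a non-empty string; re.escape makes characters such as '.' match literally:

```python
import re

def split_any(text, delimiters):
    """Split text on any of the given delimiter strings.

    Hypothetical helper; assumes every delimiter is a non-empty string.
    """
    # Try longer delimiters first so that '--' wins over '-' where both fit.
    ordered = sorted(delimiters, key=len, reverse=True)
    # re.escape turns each delimiter into a literal pattern.
    pattern = '|'.join(re.escape(d) for d in ordered)
    return re.split(pattern, text)

a = "ini budi, iwan dan ibu budi. Selain itu ini juga ada 'madu' budi."
print(split_any(a, (',', '/', '-', ' ', '.', "'")))
print(split_any("a--b-c", ('-', '--')))
```

For the single-character delimiters used here, the first call should give the same list as tsplit, empty strings included.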

 
Other related search terms, such as:

python split multiple delimiters
python split string into list delimiter
python split on multiple characters
python split on different characters
python substring delimiter
python split separator
python split include delimiter
python split two characters
python convert string to raw
regex find text between two strings

can mostly be handled with re, as simply as this:

re.split(r'\W+', a)
['ini', 'budi', 'iwan', 'dan', 'ibu', 'budi', 'Selain', 'itu', 'ini', 'juga', 'ada', 'madu', 'budi', '']
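The result above ends with an empty string because the sentence finishes with a delimiter. If that is not wanted, here are two possible ways around it, as a small sketch (the variable names are just for this example):

```python
import re

a = "ini budi, iwan dan ibu budi. Selain itu ini juga ada 'madu' budi."

# Option 1: filter out the empty strings left by re.split.
words = [w for w in re.split(r'\W+', a) if w]

# Option 2: match the words directly instead of splitting on the gaps.
words_found = re.findall(r'\w+', a)

print(words)
print(words == words_found)
```

Both options give the same list of words for this sentence, without the trailing ''.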


