How to remove punctuation from a string in Python

2 simple ways to remove punctuation from strings in Python

Before diving into how to remove punctuation, let's first look at the situations in which we need to do it.

When we get data values from an API or through web scraping, we often run into JSON files whose values contain alphanumeric strings along with punctuation. This leaves the strings somewhat messy.

These are the cases where we need to strip or remove the punctuation from a string.

Let's now look at the different ways to remove punctuation from a string.

1. Remove punctuation using the regular expression library.

Python provides the re library for working with all kinds of regular expressions.

To work with regular expressions, we need to import the re library.

We will use re.sub(pattern, replacement, original_string) to remove punctuation.

  1. pattern: the punctuation marks or the expression pattern we want to replace.
  2. replacement: the string that will replace the pattern.
  3. original_string: the string in which the replacements are made.

Here, we used re.sub() to replace the punctuation with the replacement string ' ', i.e. a space.
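A minimal sketch of this step. The pattern [^\w\s] (any character that is neither a word character nor whitespace) is an assumption, since the original pattern is not shown, and the sample string is hypothetical:

```python
import re

text = "Hello, world! How's it going?"
# Replace every character that is neither a word character nor whitespace with a space
cleaned = re.sub(r"[^\w\s]", " ", text)
print(cleaned)
```

Because the replacement is a space, punctuation next to an existing space leaves a double space, and the apostrophe splits How's into two tokens; " ".join(cleaned.split()) collapses the extra whitespace if needed.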

2. Using a Python loop to remove punctuation.

Python loops can also be used to remove punctuation from a string, as shown below:
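A sketch of the loop approach, assuming the standard string.punctuation constant as the set of characters to remove (the sample string is hypothetical):

```python
import string

text = "Hello, world! How's it going?"
# Build a new string, skipping every character found in string.punctuation
result = ""
for ch in text:
    if ch not in string.punctuation:
        result += ch
print(result)
```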

How to Remove Punctuation from Text in Python

A practical example of how to remove punctuation from text efficiently in Python

George Pipis

Geek Culture


In NLP projects, we often remove punctuation from text. However, we should be careful when doing so, because depending on the project, punctuation can be very important, for example in sentiment analysis. Let's look at some examples:
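A minimal sketch of removing all punctuation with str.translate(); the tweet-like sample text is hypothetical. Note how the emoticon :) and the emphatic !!! disappear, which is exactly the kind of signal sentiment analysis may need:

```python
import string

text = "I love #Python!!! It's awesome :)"
# Build a table that deletes every ASCII punctuation character
no_punct = text.translate(str.maketrans("", "", string.punctuation))
print(no_punct)
```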

Another way to do that is the following:
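One possible alternative, sketched with a regular expression that deletes anything that is not a word character or whitespace (same hypothetical sample text):

```python
import re

text = "I love #Python!!! It's awesome :)"
# Delete every character that is not a word character or whitespace
no_punct = re.sub(r"[^\w\s]", "", text)
print(no_punct)
```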

Awesome, we managed to remove all punctuation. But what if we want to keep some of them, like the hashtag?

Remove some Punctuation and Keep some others

Let’s see how we can keep some punctuation. First, let’s get all the punctuation.

The above gives us the punctuation characters to build a regular expression from. Let's remove all of them except the hashtag.
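One way to do this, sketched under the assumption that we build a regex character class from string.punctuation minus the # character (sample text hypothetical):

```python
import re
import string

text = "I love #Python!!! It's awesome :)"
# Every punctuation character except the hashtag symbol
to_remove = string.punctuation.replace("#", "")
# re.escape makes characters such as ] \ ^ - safe inside a character class
pattern = f"[{re.escape(to_remove)}]"
cleaned = re.sub(pattern, "", text)
print(cleaned)
```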

Best way to strip punctuation from a string

For Python 3, use the following code:
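A minimal sketch of the Python 3 form, using str.maketrans() with three arguments (the third lists the characters to delete) on a hypothetical sample string:

```python
import string

s = "string. With. Punctuation?"
# maketrans('', '', chars): the third argument lists characters to delete
table = str.maketrans("", "", string.punctuation)
print(s.translate(table))
```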

It performs raw string operations in C with a lookup table; there's not much that will beat that short of writing your own C code.

If speed isn't a concern, another option is:
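A sketch of that pure-Python alternative, using a set for fast membership tests (sample string hypothetical):

```python
import string

s = "string. With. Punctuation?"
exclude = set(string.punctuation)  # set membership tests are O(1) on average
result = "".join(ch for ch in s if ch not in exclude)
print(result)
```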

This is faster than calling s.replace() for each character, but it won't perform as well as non-pure-Python approaches such as regexes or str.translate(), as the timings below show. For this type of problem, doing the work at as low a level as possible pays off.
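The original timing code isn't shown. A minimal sketch of such a comparison with timeit, using hypothetical function names and input; absolute numbers depend on the machine and Python version:

```python
import re
import string
import timeit

s = "string. With. Punctuation?" * 1000  # hypothetical test input
exclude = set(string.punctuation)
regex = re.compile(f"[{re.escape(string.punctuation)}]")
table = str.maketrans("", "", string.punctuation)

def by_set(text):
    return "".join(ch for ch in text if ch not in exclude)

def by_regex(text):
    return regex.sub("", text)

def by_translate(text):
    return text.translate(table)

for fn in (by_set, by_regex, by_translate):
    seconds = timeit.timeit(lambda: fn(s), number=100)
    print(f"{fn.__name__}: {seconds:.4f}s")
```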

This gives the following results:


Regular expressions are simple enough, if you know them.

For convenience, here is a summary of how to strip punctuation from a string in both Python 2 and Python 3. Please refer to the other answers for a detailed description.

Python 2

Python 3


Not necessarily simpler, but a different way, if you are more familiar with the re family.
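A sketch of that regex approach, escaping string.punctuation with re.escape() so that characters like ] and \ are safe inside the character class (sample string hypothetical):

```python
import re
import string

s = "string. With. Punctuation?"
# A character class containing every (escaped) ASCII punctuation character
pattern = "[%s]" % re.escape(string.punctuation)
print(re.sub(pattern, "", s))
```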


string.punctuation is ASCII only! A more correct (but also much slower) way is to use the unicodedata module:
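A minimal sketch using unicodedata.category(), which returns a two-letter Unicode general category; all punctuation categories start with "P". The sample text is hypothetical, and string.punctuation would miss its «, », … and ¿:

```python
import unicodedata

def strip_punctuation(text: str) -> str:
    # Drop characters whose category starts with "P" (Pc, Pd, Ps, Pe, Pi, Pf, Po)
    return "".join(ch for ch in text if not unicodedata.category(ch).startswith("P"))

print(strip_punctuation("«Bonjour», dit-il… ¿Qué?"))
```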

You can generalize and strip other types of characters as well:
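A sketch of that generalization, with a hypothetical strip_categories() helper that takes a tuple of category prefixes. Adding "S" also removes symbols such as $, + and €:

```python
import unicodedata

def strip_categories(text: str, prefixes: tuple[str, ...] = ("P", "S")) -> str:
    # str.startswith accepts a tuple, so any listed category prefix matches
    return "".join(ch for ch in text if not unicodedata.category(ch).startswith(prefixes))

print(strip_categories("Price: $5 + €3 = 8!"))
```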

It will also strip characters like *+§$, which may or may not be "punctuation" depending on one's point of view.

Some of these characters, such as + and $, are not part of the Unicode punctuation category. You need to test for the Symbol category as well.

I usually use something like this:

For Python 3 str or Python 2 unicode values, str.translate() only takes a dictionary; codepoints (integers) are looked up in that mapping and anything mapped to None is removed.

To remove (some?) punctuation then, use:
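Following that description, a sketch (the variable name remove_punct_map and the sample string are illustrative):

```python
import string

s = "string. With. Punctuation?"
# Map each punctuation code point to None; translate() deletes those characters
remove_punct_map = dict.fromkeys(map(ord, string.punctuation))
print(s.translate(remove_punct_map))
```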

The dict.fromkeys() class method makes it trivial to create the mapping, setting all values to None based on the sequence of keys.

To remove all punctuation, not just ASCII punctuation, your table needs to be a little bigger; see J.F. Sebastian’s answer (Python 3 version):
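A sketch of that idea using the standard sys and unicodedata modules: map every code point in a Unicode punctuation category (P*) to None. Scanning all code points takes a noticeable moment, so build the table once and reuse it:

```python
import sys
import unicodedata

# Map every code point whose general category starts with "P" to None
tbl = dict.fromkeys(
    i for i in range(sys.maxunicode + 1)
    if unicodedata.category(chr(i)).startswith("P")
)

print("«Bonjour», dit-il… ¿Qué?".translate(tbl))
```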

string.punctuation misses loads of punctuation marks that are commonly used in the real world. How about a solution that works for non-ASCII punctuation?

Personally, I believe this is the best way to remove punctuation from a string in Python because:

  • It removes all Unicode punctuation
  • It’s easily modifiable, e.g. you can remove the \p{S} if you want to remove punctuation but keep symbols like $.
  • You can get really specific about what you want to keep and what you want to remove; for example, \p{Pd} will only remove dashes.
  • This regex also normalizes whitespace. It maps tabs, carriage returns, and other oddities to nice, single spaces.
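The answer above relies on the third-party regex module, and its pattern isn't fully preserved here. As a stdlib-only approximation (an assumption, not the answer's code), the sketch below treats the Unicode general-category classes C, P, S and Z as separators, replacing them with spaces and then normalizing whitespace:

```python
import unicodedata

# Major category classes treated as separators: Other (C), Punctuation (P),
# Symbol (S), and Separator (Z). Tabs and newlines are Cc; spaces are Zs.
SEPARATOR_CLASSES = {"C", "P", "S", "Z"}

def strip_non_text(text: str) -> str:
    # Replace each matching character with a space...
    spaced = "".join(
        " " if unicodedata.category(ch)[0] in SEPARATOR_CLASSES else ch for ch in text
    )
    # ...then collapse whitespace runs into single spaces and trim the ends
    return " ".join(spaced.split())

print(strip_non_text("<Üäöß>  Hello (world). •••• "))
```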

This uses Unicode character properties, which you can read more about on Wikipedia.


This line actually does not work: remove = regex.compile(ur'[\p|\p|\p|\p|\p]+', regex.UNICODE)

I haven't seen this answer yet. Just use a regex; it removes all characters except word characters (\w), digit characters (\d), and whitespace characters (\s):
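A sketch of that regex on a hypothetical sample string. Note that \d is redundant, since \w already matches digits, and that \w also matches the underscore, so _ survives:

```python
import re

s = "string. With. Punctuation? (and_underscores)"
# Delete runs of characters that are not word characters, digits, or whitespace
print(re.sub(r"[^\w\d\s]+", "", s))
```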


This might not be the best solution, but this is how I did it.


Here’s a one-liner for Python 3.5:
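The original one-liner isn't shown. One possible Python 3 one-liner, assuming it builds the table with str.maketrans() from a dictionary (sample string hypothetical):

```python
import string

s = "string. With. Punctuation?"
# str.maketrans also accepts a single dict; characters mapped to None are deleted
print(s.translate(str.maketrans({ch: None for ch in string.punctuation})))
```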


Here is a function I wrote. It’s not very efficient, but it is simple and you can add or remove any punctuation that you desire:
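The original function isn't shown; this is a hypothetical sketch in the same spirit, where the punctuation parameter holds the characters to remove and can be edited freely. The default set is an assumption (close to, but not the same as, string.punctuation):

```python
def remove_punctuation(text: str, punctuation: str = "!()-[]{};:'\"\\,<>./?@#$%^&*_~") -> str:
    # Keep each character unless it appears in the editable punctuation string
    result = ""
    for ch in text:
        if ch not in punctuation:
            result += ch
    return result

print(remove_punctuation("string. With. Punctuation?"))
print(remove_punctuation("a-b.c", punctuation="."))  # only remove periods
```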


Just as an update, I rewrote @Brian's example in Python 3 and changed it to move the regex compile step inside the function. My idea was to time every single step needed to make the function work. Perhaps you are using distributed computing, can't share a regex object between your workers, and need an re.compile step at each worker. I was also curious to time two different implementations of maketrans in Python 3.

I also added another method that uses a set, taking advantage of the intersection function to reduce the number of iterations.

This is the complete code:
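The complete code isn't reproduced here. A condensed sketch of the comparison it describes might look like this; the function names and input are hypothetical:

```python
import re
import string
import timeit

s = "string. With. Punctuation?" * 1000  # hypothetical benchmark input

def regex_compile_inside(text):
    # Compile inside the function, as a worker that can't share a pattern object would
    pattern = re.compile(f"[{re.escape(string.punctuation)}]")
    return pattern.sub("", text)

def maketrans_three_args(text):
    # str.maketrans(from, to, delete): the third argument lists characters to delete
    return text.translate(str.maketrans("", "", string.punctuation))

def maketrans_dict(text):
    # str.maketrans(dict): characters mapped to None are deleted
    return text.translate(str.maketrans(dict.fromkeys(string.punctuation)))

def set_intersection(text):
    # Only loop over punctuation characters that actually occur in the text
    for ch in set(text).intersection(string.punctuation):
        text = text.replace(ch, "")
    return text

for fn in (regex_compile_inside, maketrans_three_args, maketrans_dict, set_intersection):
    print(f"{fn.__name__}: {timeit.timeit(lambda: fn(s), number=200):.4f}s")
```

Note that the re module keeps an internal cache of compiled patterns, so calling re.compile() repeatedly with the same pattern is cheaper than a fresh compilation each time.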

These are my results:


A one-liner might be helpful in not very strict cases:
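One possible one-liner in that spirit (the original isn't shown, so this is an assumption): keep only characters that are alphanumeric or whitespace. It is "not very strict" in that it also drops symbols such as $ and keeps any Unicode letters:

```python
s = "string. With. Punctuation?"
print("".join(ch for ch in s if ch.isalnum() or ch.isspace()))
```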


Here’s a solution without regex.

  • Replace punctuation with spaces
  • Replace multiple spaces between words with a single space
  • Remove any leading or trailing spaces with strip()
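The original code isn't shown; here is a minimal sketch that follows the three steps listed above, with a hypothetical function name clean():

```python
import string

def clean(text: str) -> str:
    # 1. Replace each punctuation character with a space (both strings must be equal length)
    table = str.maketrans(string.punctuation, " " * len(string.punctuation))
    spaced = text.translate(table)
    # 2. Replace runs of spaces with a single space
    while "  " in spaced:
        spaced = spaced.replace("  ", " ")
    # 3. Remove leading and trailing spaces
    return spaced.strip()

print(clean("Hello,world!  How's it going?"))
```

Replacing with spaces rather than deleting keeps words such as Hello,world from being glued together.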


Why doesn't anyone use this?


I was looking for a really simple solution. Here's what I got:

Here's another easy way to do it using a regex:
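The original snippet isn't shown; as one hypothetical regex variant, you can extract the runs of word characters instead of deleting punctuation. Note that this also collapses all whitespace into single spaces:

```python
import re

s = "string. With. Punctuation?"
# Extract runs of word characters and rejoin them with single spaces
print(" ".join(re.findall(r"\w+", s)))
```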


The question does not have a lot of specifics, so the approach I took is to come up with a solution with the simplest interpretation of the problem: just remove the punctuation.

Python: Remove Punctuation from a String (3 Different Ways!)


In this tutorial, you’ll learn how to use Python to remove punctuation from a string. You’ll learn how to strip punctuation from a Python string using the str.translate() method, the str.replace() method, the popular regular expression library re , and, finally, using for-loops.

Being able to work with and manipulate strings is an essential skill for any Pythonista. Strings you find on the internet or in your files will often require quite a bit of work before you can analyze them. One task you'll often encounter is using Python to remove punctuation from a string.

The Quick Answer: Use .translate() for the fastest performance


Use Python to Remove Punctuation from a String with Translate

One of the easiest ways to remove punctuation from a string in Python is to use the str.translate() method. The translate() method takes a translation table, which we'll create using the str.maketrans() method.

Let's take a look at how we can use the .translate() method to remove punctuation from a string in Python. To do this, we'll import the built-in string module, which provides a punctuation attribute.
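A minimal sketch with a hypothetical sample string:

```python
import string

a_string = "Welcome! Let's remove: punctuation, please."
# Third argument of str.maketrans: characters to map to None (delete)
new_string = a_string.translate(str.maketrans("", "", string.punctuation))
print(new_string)
```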

The .maketrans() method here takes three arguments: the first two are empty strings, and the third is the string of punctuation characters we want to remove. This tells the method to map every punctuation character to None, which deletes it.


What is Python’s string.punctuation?

Python comes with a built-in module, string, which includes the constant string.punctuation containing all ASCII punctuation characters. Because the module is part of the standard library, you don't need to install it.

In case you’re curious about what punctuation is included in the string.punctuation , let’s have a quick look:
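A quick look, with the printed value shown as a comment:

```python
import string

print(string.punctuation)
# !"#$%&'()*+,-./:;<=>?@[\]^_`{|}~
print(len(string.punctuation))
# 32
```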

Use Python to Strip Punctuation from a String with Regular Expressions (regex)

The Python regular expression library, re , feels like it can do just about anything – including stripping punctuation from a string!

Regular expressions are great because they come with a number of helpful character classes that let us select different types of characters. For example, [\w\s] matches word characters or whitespace. We can select the opposite (i.e., anything that isn't a word character or whitespace) by placing the ^ character at the start of the class: [^\w\s]. In our case, this selects punctuation (note that \w includes the underscore, so _ is not removed).

Let’s see how we can use regex to remove punctuation in Python:
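A minimal sketch, using a hypothetical sample string:

```python
import re

a_string = "Welcome! Let's remove: punctuation, please."
# [^\w\s] matches any character that is not a word character or whitespace
new_string = re.sub(r"[^\w\s]", "", a_string)
print(new_string)
```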

This is a great approach that looks for anything that isn’t an alphanumeric character or whitespace, and replaces it with a blank string, thereby removing it.

Use Python to Remove Punctuation from a String with str.replace

The str.replace() method makes easy work of replacing a single character. For example, if you wanted to only replace a single punctuation character, this would be a simple, straightforward solution.

Let's say you only wanted to remove the ! character from our string; we could use the str.replace() method to accomplish this. Let's take a look at how:
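A minimal sketch with a hypothetical sample string:

```python
a_string = "Welcome! Let's remove: punctuation, please."
# Replace only the exclamation mark with an empty string
new_string = a_string.replace("!", "")
print(new_string)
```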

What we've done here is call the .replace() method on our string. The first parameter is the string to replace, which in this case is our ! character. The second parameter is what to replace it with, which in this case is an empty string.

In the next example, you'll learn how to use a for-loop to remove all punctuation from a string.

Use Python to Strip Punctuation from a String using a for-loop

In the previous section of the tutorial, you learned how to use the str.replace() method to remove a single punctuation character. In this section, we’ll repeat this example, but use a for-loop to be able to remove every punctuation character.

Let’s see how we can do this in Python:
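A minimal sketch, looping over string.punctuation with a hypothetical sample string:

```python
import string

a_string = "Welcome! Let's remove: punctuation, please."
for ch in string.punctuation:
    # Reassign to the same variable so each removal builds on the previous one
    a_string = a_string.replace(ch, "")
print(a_string)
```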

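Note that we overwrite our original string variable on each iteration. If we instead assigned each result to a new variable computed from the original string, every iteration would start over from the original, and only the last replacement would stick.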

Now that you’ve learned a number of methods, let’s see which of these methods is the fastest.

What is the fastest way to strip a Python String from Punctuation?

In this tutorial, you’ve learned three different methods to remove punctuation from a string in Python. Let’s see which of these methods is the fastest.

For this test, we created a string that’s over 1,000,000,000 characters long and removed all punctuation from a string using Python.

Let’s take a look at the results:


The str.translate() method is the fastest way to remove punctuation from a string in Python – sometimes up to 40 times faster!

Of course, speed isn't everything, but code that runs significantly slower will often lead to a poorer user experience.
