R : Select or Remove Columns from Data Frame
This article explains how to select or remove columns (variables) from a data frame in R. There are multiple ways to select or delete a column.
The following code creates a sample data frame that is used for demonstration.
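The original listing is not reproduced here, so the following is only a minimal stand-in. The name mydata, the column y and all values are assumptions, chosen so that the later examples (which refer to columns a, x and z) have something to run against.

```r
# Hypothetical sample data; names and values are illustrative only
mydata <- data.frame(
  a = 1:5,
  x = c(2.5, 3.1, 4.7, 5.0, 6.2),
  y = letters[1:5],
  z = c(TRUE, FALSE, TRUE, TRUE, FALSE)
)
str(mydata)
```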
R : Remove column by name
In base R there are multiple ways to delete columns by name.
Method I : subset() function
The easiest way to remove columns is the subset() function. In the code below, we tell R to drop variables x and z. The '-' sign indicates dropping variables. Note that the variable names must NOT be quoted when using subset().
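A minimal sketch of this call, assuming a data frame named mydata with columns a, x, y and z (the data frame itself is made up for illustration):

```r
# Hypothetical data frame for illustration
mydata <- data.frame(a = 1:3, x = 4:6, y = 7:9, z = 10:12)

# Drop x and z; the names are unquoted inside select
df <- subset(mydata, select = -c(x, z))
names(df)
```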
Method II : ! sign
In this method, we create a character vector named drop that stores the column names x and z. We then tell R to select all variables except those named in drop. The names() function returns all the column names and the '!' sign indicates negation.
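This approach might look as follows. The data frame mydata and its values are assumptions made for the example; only the vector name drop comes from the text.

```r
# Hypothetical data frame for illustration
mydata <- data.frame(a = 1:3, x = 4:6, y = 7:9, z = 10:12)

# Column names to remove, stored as quoted strings
drop <- c("x", "z")

# Keep every column whose name is NOT in drop
df <- mydata[, !(names(mydata) %in% drop), drop = FALSE]
names(df)
```

Adding drop = FALSE inside the brackets is a precaution: it keeps the result a data frame even when only one column survives.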
R : Remove columns by column index numbers
It's easier to remove columns by position. All you need to do is give the column index numbers. In the following code, we tell R to delete the first, third and fourth columns. The minus sign drops variables.
In this case, we tell R to keep only the variables in the second and fourth positions.
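Both cases, dropping by position and keeping by position, can be sketched on a hypothetical four-column data frame (mydata and its values are assumptions):

```r
# Hypothetical data frame for illustration
mydata <- data.frame(a = 1:3, x = 4:6, y = 7:9, z = 10:12)

# Negative indexes drop the 1st, 3rd and 4th columns
dropped <- mydata[-c(1, 3, 4)]

# Positive indexes keep only the 2nd and 4th columns
kept <- mydata[c(2, 4)]

names(dropped)
names(kept)
```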
Select or Delete columns with dplyr package
Remove Columns by Name Pattern
The same logic applies to a whole word if you want to find columns containing it. In the example below, we keep the columns whose names contain C_A and store them in a new data frame.
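One way to do this in base R is grepl() on the column names. The data frame and column names below are invented for illustration; only the pattern C_A comes from the text.

```r
# Hypothetical data frame; INC_A and SPEC_A both contain "C_A"
mydata <- data.frame(ID = 1:3, INC_A = 4:6, INC_B = 7:9, SPEC_A = 10:12)

# fixed = TRUE treats the pattern as literal text, not a regular expression
has_pattern <- grepl("C_A", names(mydata), fixed = TRUE)

keep_df <- mydata[, has_pattern, drop = FALSE]   # columns containing C_A
drop_df <- mydata[, !has_pattern, drop = FALSE]  # all other columns
names(keep_df)
```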
The following program automates selecting or deleting columns from a data frame.
To keep variables 'a' and 'x', use the code below. Setting drop = 0 keeps the variables specified in the parameter "cols". The parameter "data" refers to the input data frame, "cols" to the variables you want to keep or remove, and "newdata" to the output data frame.
To drop variables, use the code below. Setting drop = 1 removes the variables listed in the second parameter of the function.
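The original program is not shown here, so the helper below is a hypothetical reconstruction rather than the author's code. The function name keep_drop is invented, cols is assumed to be a character vector, and the result is returned instead of being written to a "newdata" parameter.

```r
# Hypothetical helper: keep (drop = 0) or remove (drop = 1) the named columns
keep_drop <- function(data, cols, drop = 0) {
  stopifnot(is.data.frame(data), all(cols %in% names(data)))
  selected <- names(data) %in% cols
  if (drop == 1) {
    selected <- !selected  # invert the selection to remove instead of keep
  }
  data[, selected, drop = FALSE]
}

# Hypothetical data frame for illustration
mydata <- data.frame(a = 1:3, x = 4:6, y = 7:9, z = 10:12)

kept <- keep_drop(mydata, cols = c("a", "x"), drop = 0)
removed <- keep_drop(mydata, cols = c("a", "x"), drop = 1)
```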
How to Remove Columns from a Data Frame in R (With Examples)
The easiest way to remove columns from a data frame in R is to use the subset() function, which uses the following basic syntax:
The following examples show how to use this function in practice with the following data frame:
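Neither the syntax line nor the data frame is reproduced in this text, so the following is a stand-in. The column names var1 to var4 and all values are assumptions.

```r
# General form (unquoted column names after the minus sign):
#   new_df <- subset(df, select = -c(col1, col2))

# Hypothetical data frame for the examples
df <- data.frame(
  var1 = c(1, 3, 2, 9, 5),
  var2 = c(7, 7, 8, 3, 2),
  var3 = c(3, 3, 6, 6, 8),
  var4 = c(1, 1, 2, 8, 9)
)
df
```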
Example 1: Remove Columns by Name
The following code shows how to remove columns from a data frame by name:
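A sketch of removal by name with subset(), using a hypothetical data frame with columns var1 to var4:

```r
# Hypothetical data frame for illustration
df <- data.frame(var1 = 1:3, var2 = 4:6, var3 = 7:9, var4 = 10:12)

# Remove var1 and var4 by name
new_df <- subset(df, select = -c(var1, var4))
names(new_df)
```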
Example 2: Remove Columns by Index
The following code shows how to remove columns from a data frame by index:
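The same call also accepts positions instead of names. A sketch on a hypothetical data frame:

```r
# Hypothetical data frame for illustration
df <- data.frame(var1 = 1:3, var2 = 4:6, var3 = 7:9, var4 = 10:12)

# Remove the columns at positions 1 and 4
new_df <- subset(df, select = -c(1, 4))
names(new_df)
```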
Example 3: Remove Columns in a List
The following code shows how to remove columns from a data frame that belong to a specific list:
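Here the "list" is assumed to be a character vector of column names; the vector name remove_cols and the data frame are invented for the example.

```r
# Hypothetical data frame for illustration
df <- data.frame(var1 = 1:3, var2 = 4:6, var3 = 7:9, var4 = 10:12)

# Names of the columns to remove
remove_cols <- c("var1", "var4")

# Keep only the columns whose names are not in remove_cols
new_df <- subset(df, select = !(names(df) %in% remove_cols))
names(new_df)
```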
Example 4: Remove Columns in a Range
The following code shows how to remove columns from a data frame in a specific range:
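A range of positions can be removed in one step. A sketch on a hypothetical data frame:

```r
# Hypothetical data frame for illustration
df <- data.frame(var1 = 1:3, var2 = 4:6, var3 = 7:9, var4 = 10:12)

# Remove columns 1 through 3
new_df <- subset(df, select = -c(1:3))
names(new_df)
```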
Remove an entire column from a data.frame in R
Does anyone know how to remove an entire column from a data.frame in R? For example if I am given this data.frame:
and I want to remove the 2nd column.
You can set it to NULL.
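The data frame from the question is not shown here, so the one below is a made-up stand-in (the name Data, its columns and its values are all assumptions). Assigning NULL to the second column removes it:

```r
# Hypothetical stand-in for the data frame in the question
Data <- data.frame(
  chr = c("chr1", "chr1", "chr2"),
  genome = c("hg19", "hg19", "hg19"),
  region = c("intron", "exon", "intron")
)

# Setting a column to NULL deletes it in place
Data$genome <- NULL
names(Data)
```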
As pointed out in the comments, here are some other possibilities:
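A few equivalent ways to drop the second column by position, sketched on a hypothetical data frame (Data and its contents are assumptions):

```r
# Hypothetical data frame for illustration
Data <- data.frame(chr = c("chr1", "chr2"), genome = c("hg19", "hg19"),
                   region = c("intron", "exon"))

d1 <- Data; d1[2] <- NULL      # list-style assignment
d2 <- Data; d2[[2]] <- NULL    # same effect with double brackets
d3 <- Data[, -2]               # matrix-style negative index
d4 <- Data[-2]                 # list-style negative index
```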
You can remove multiple columns via:
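A sketch of removing several columns at once by assigning list(NULL), again on a hypothetical data frame:

```r
# Hypothetical data frame for illustration
Data <- data.frame(chr = c("chr1", "chr2"), genome = c("hg19", "hg19"),
                   region = c("intron", "exon"))

# Remove the first two columns in one assignment
Data[1:2] <- list(NULL)
names(Data)
```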
Be careful with matrix-subsetting though, as you can end up with a vector:
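The pitfall can be seen on a hypothetical three-column data frame: when matrix-style subsetting leaves a single column, R simplifies the result to a vector unless drop = FALSE is given.

```r
# Hypothetical data frame for illustration
Data <- data.frame(chr = c("chr1", "chr2"), genome = c("hg19", "hg19"),
                   region = c("intron", "exon"))

as_vector <- Data[, -(2:3)]               # simplified to a plain vector
as_frame <- Data[, -(2:3), drop = FALSE]  # still a data.frame

is.data.frame(as_vector)
is.data.frame(as_frame)
```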
To remove one or more columns by name, when the column names are known (as opposed to being determined at run-time), I like the subset() syntax. E.g. for the data-frame
to remove just the a column you could do
and to remove the b and d columns you could do
You can remove all columns between d and b with:
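The three subset() calls described in this answer might look like this. The data frame df with columns a to d is an assumption based on the column names mentioned; its values are invented.

```r
# Hypothetical data frame with the column names used in the answer
df <- data.frame(a = 1:3, b = 4:6, c = 7:9, d = 10:12)

no_a <- subset(df, select = -a)         # remove just column a
no_bd <- subset(df, select = -c(b, d))  # remove columns b and d
no_range <- subset(df, select = -(d:b)) # remove every column from b to d
```

Inside select, column names stand for their positions, which is why a range such as d:b works.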
As I said above, this syntax works only when the column names are known. It won’t work when say the column names are determined programmatically (i.e. assigned to a variable). I’ll reproduce this Warning from the ?subset documentation:
Warning:
This is a convenience function intended for use interactively. For programming it is better to use the standard subsetting functions like '[', and in particular the non-standard evaluation of argument 'subset' can have unanticipated consequences.
How to Remove Column in R?
To remove a single column or multiple columns in R DataFrame use square bracket notation [] or use functions from third-party packages like dplyr. There are several ways to remove columns or variables from the R DataFrame (data.frame).
1. Prepare the Data
Let’s create an R DataFrame, run these examples and explore the output. If you already have data in CSV you can easily import CSV files to R DataFrame. Also, refer to Import Excel File into R.
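The data frame used by the original article is not included in this text. The stand-in below uses the column names the later sections mention (pages, name, chapters, price); the first column id and all values are assumptions.

```r
# Hypothetical data frame; id and all values are illustrative only
df <- data.frame(
  id = c(11, 22, 33),
  pages = c(32, 45, 33),
  name = c("spark", "python", "R"),
  chapters = c(76, 86, 11),
  price = c(144, 553, 321)
)
df
```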
2. Remove Column using R Base Functions
By using R base function subset() or square bracket notation you can remove single or multiple columns by index/name from the R DataFrame.
2.1 Remove Column by Index
First, let's use the R base bracket notation df[] to remove a column by index. This notation uses the syntax df[, columns] to select columns in R, and to remove columns you use the - (negative) operator.
The following example removes the second column by Index from the R DataFrame.
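A sketch of this on a hypothetical data frame (the column id and all values are assumptions):

```r
# Hypothetical data frame for illustration
df <- data.frame(id = 1:3, pages = c(32, 45, 33), name = c("spark", "python", "R"),
                 chapters = c(76, 86, 11), price = c(144, 553, 321))

# Remove the second column (pages) by index
df2 <- df[, -2]
names(df2)
```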
2.2 Remove Columns by Range
This notation also supports selecting columns by range, and with the negative operator it removes columns by range. The following example removes all columns at indexes 2 through 4, that is, the columns pages, name, and chapters.
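A sketch of range removal on a hypothetical data frame (the column id and all values are assumptions):

```r
# Hypothetical data frame for illustration
df <- data.frame(id = 1:3, pages = c(32, 45, 33), name = c("spark", "python", "R"),
                 chapters = c(76, 86, 11), price = c(144, 553, 321))

# Remove columns 2 through 4; the parentheses apply the minus to the whole range
df2 <- df[, -(2:4)]
names(df2)
```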
2.3 Remove Multiple Columns
Use a vector to specify the column indexes you want to remove from the R data frame. The following example removes multiple columns with indexes 2 and 3.
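A sketch with a vector of indexes, on a hypothetical data frame (the column id and all values are assumptions):

```r
# Hypothetical data frame for illustration
df <- data.frame(id = 1:3, pages = c(32, 45, 33), name = c("spark", "python", "R"),
                 chapters = c(76, 86, 11), price = c(144, 553, 321))

# Remove the columns at positions 2 and 3
df2 <- df[, -c(2, 3)]
names(df2)
```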
2.4 Remove Columns From List
You can also use the column names from the list to remove them from the R data frame. Here I am using names() function which returns all column names and checks if a name is present in the list using %in% operator.
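A sketch of the names() and %in% combination. The data frame is hypothetical, and the choice of which columns to remove is arbitrary:

```r
# Hypothetical data frame for illustration
df <- data.frame(id = 1:3, pages = c(32, 45, 33), name = c("spark", "python", "R"),
                 chapters = c(76, 86, 11), price = c(144, 553, 321))

# Columns to remove, given by name
to_remove <- c("pages", "name")

# Keep columns whose names are not in to_remove
df2 <- df[, !(names(df) %in% to_remove), drop = FALSE]
names(df2)
```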
2.5 By using subset() Function
By using the R base function subset() you can remove columns by name from the data frame. This function takes the data frame as its first argument and, in its select argument, the columns you want to remove.
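A sketch of the subset() form on a hypothetical data frame; the columns removed here are chosen arbitrarily:

```r
# Hypothetical data frame for illustration
df <- data.frame(id = 1:3, pages = c(32, 45, 33), name = c("spark", "python", "R"),
                 chapters = c(76, 86, 11), price = c(144, 553, 321))

# Remove pages and name; names are unquoted inside select
df2 <- subset(df, select = -c(pages, name))
names(df2)
```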
Yields the same output as above.
3. Remove Columns by using dplyr Functions
In this section, I use functions from the dplyr package to remove columns from an R data frame. dplyr is an R package that provides a grammar of data manipulation: a set of commonly used verbs that help analysts solve the most common data manipulation tasks. To use it, first install it with install.packages('dplyr') and then load it with library(dplyr).
3.1 Remove Column by Matching
The dplyr select() function selects columns, and negating its arguments removes them instead. All verbs in the dplyr package take a data.frame as their first argument. With dplyr we mostly use the infix operator %>% from magrittr, which passes the left-hand side of the operator as the first argument of the right-hand side.
For example, x %>% f(y) is converted into f(x, y), so the result from the left-hand side is "piped" into the right-hand side. This pipe lets you write multiple operations that read left to right.
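A sketch of select() with negation, assuming dplyr is installed. The data frame is hypothetical and the removed columns are chosen arbitrarily:

```r
library(dplyr)

# Hypothetical data frame for illustration
df <- data.frame(id = 1:3, pages = c(32, 45, 33), name = c("spark", "python", "R"),
                 chapters = c(76, 86, 11), price = c(144, 553, 321))

# Piped form: df becomes the first argument of select()
df2 <- df %>% select(-c(pages, name))

# Equivalent call without the pipe
df3 <- select(df, -c(pages, name))
```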
3.2 Remove Variables By Name Range
The same function can also be used to remove variables by name range.
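A sketch of removing a range of columns by name, assuming dplyr is installed (the data frame is hypothetical):

```r
library(dplyr)

# Hypothetical data frame for illustration
df <- data.frame(id = 1:3, pages = c(32, 45, 33), name = c("spark", "python", "R"),
                 chapters = c(76, 86, 11), price = c(144, 553, 321))

# Remove every column from name through chapters
df2 <- df %>% select(-(name:chapters))
names(df2)
```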
3.3 Remove Variables using contains
Use -contains() to ignore columns whose names contain a given text. The following example removes the column chapters because its name contains the text apt. contains() also accepts a vector of strings to match.
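A sketch of -contains(), assuming dplyr is installed. The data frame is hypothetical, and the second pattern "ric" is invented to show the vector form:

```r
library(dplyr)

# Hypothetical data frame for illustration
df <- data.frame(id = 1:3, pages = c(32, 45, 33), name = c("spark", "python", "R"),
                 chapters = c(76, 86, 11), price = c(144, 553, 321))

# Remove columns whose names contain "apt" (only chapters)
df2 <- df %>% select(-contains("apt"))

# A character vector removes columns matching any of the strings
df3 <- df %>% select(-contains(c("apt", "ric")))
```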
3.4 Remove Column starts with
Use -starts_with() to ignore columns that start with a text. The following example removes the column chapters as it starts with character c.
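A sketch of -starts_with(), assuming dplyr is installed (the data frame is hypothetical):

```r
library(dplyr)

# Hypothetical data frame for illustration
df <- data.frame(id = 1:3, pages = c(32, 45, 33), name = c("spark", "python", "R"),
                 chapters = c(76, 86, 11), price = c(144, 553, 321))

# Remove columns whose names start with "c" (only chapters)
df2 <- df %>% select(-starts_with("c"))
names(df2)
```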
3.5 Remove Column ends with
Similarly, use -ends_with() to remove variables whose names end with a given text. The following example removes the name and price columns because they end with the letter e.
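A sketch of -ends_with(), assuming dplyr is installed (the data frame is hypothetical):

```r
library(dplyr)

# Hypothetical data frame for illustration
df <- data.frame(id = 1:3, pages = c(32, 45, 33), name = c("spark", "python", "R"),
                 chapters = c(76, 86, 11), price = c(144, 553, 321))

# Remove columns whose names end with "e" (name and price)
df2 <- df %>% select(-ends_with("e"))
names(df2)
```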
3.6 Remove Columns if They Exist
Finally, use the one_of() function to remove columns from the data frame only when they exist. If a column is not found, one_of() issues a warning and removes the columns that do exist.
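A sketch of this behaviour, assuming dplyr is installed. The data frame and the name missing_col are invented. The second call uses any_of(), the newer tidyselect helper that supersedes one_of() in recent releases and skips missing names without a warning; it is shown as an alternative, not as what the original article used.

```r
library(dplyr)

# Hypothetical data frame for illustration
df <- data.frame(id = 1:3, pages = c(32, 45, 33), name = c("spark", "python", "R"),
                 chapters = c(76, 86, 11), price = c(144, 553, 321))

# one_of() warns about the unknown "missing_col" but still removes pages
df2 <- df %>% select(-one_of("pages", "missing_col"))

# any_of() silently ignores names that are not present
df3 <- df %>% select(-any_of(c("pages", "missing_col")))
```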