How Do You Filter Your Data By Certain Values In A Column In Plot In R

  • Plotting with ggplot2
  • Boxplot
  • Faceting
  • barplot
  • Plotting time series data
  • Resources for going farther
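    # Load the plotting and data-manipulation packages used in this lesson
    library(ggplot2)
    library(dplyr)

    # Download the survey data (the "data" directory must already exist)
    download.file("http://datacarpentry.github.io/dc_zurich/data/portal_data_joined.csv",
                  "data/portal_data_joined.csv")
    surveys_complete <- read.csv(file = "data/portal_data_joined.csv")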

In this lesson, we will be using functions from the ggplot2 package to create plots. There are plotting capabilities that come with R, but ggplot2 provides a consistent and powerful interface that allows you to produce high quality graphics quickly, allowing an efficient exploration of your datasets. The functions in base R have different strengths, and are useful if you are trying to draw very specific plots, in particular if they are plots that are not representations of statistical graphics.

Plotting with ggplot2

With ggplot, plots are built step by step in layers. This layering system is based on the idea that statistical graphics are mappings from data to aesthetic attributes (colour, shape, size) of geometric objects (points, lines, bars). The plot may also contain statistical transformations of the data, and is drawn on a specific coordinate system. Faceting can be used to generate the same plot for different subsets of the dataset.

To build a ggplot we need to:

  • bind the plot to a specific data frame
  • define aesthetics (aes), which map variables in the data to axes on the plot or to plotting size, shape, color, etc.
  • add geoms: graphical representations of the data in the plot (points, lines, bars). To add a geom to the plot, use the + operator:

    ggplot(surveys_complete, aes(x = weight, y = hindfoot_length)) +
        geom_point()
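The three steps above can be sketched on a tiny, made-up data frame. The toy data frame and its values are hypothetical stand-ins for surveys_complete; only the column names follow the lesson. Storing the plot in a variable shows that each + adds one layer to the same object.

```r
library(ggplot2)

# Hypothetical stand-in for surveys_complete (values invented for illustration)
toy <- data.frame(
  weight          = c(120, 44, 50, 130, 8, 42),
  hindfoot_length = c(50, 36, 35, 52, 15, 37)
)

# Steps 1 and 2: bind the plot to a data frame and map columns to the axes
p <- ggplot(toy, aes(x = weight, y = hindfoot_length))

# Step 3: add a geom; the result is still a ggplot object, now with one layer
p_points <- p + geom_point()
```

Typing p_points at the console (or calling print(p_points)) draws the plot; p on its own draws only the empty axes.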

We can reduce over-plotting by adding some jitter:

    ggplot(surveys_complete, aes(x = weight, y = hindfoot_length)) +
        geom_point(position = position_jitter())

We can add additional aesthetic values according to other properties of our dataset. For instance, we can color points differently depending on the species:

    ggplot(surveys_complete,
           aes(x = weight, y = hindfoot_length, colour = species_id)) +
        geom_point(position = position_jitter())

We can make the points more transparent so we can assess the overplotting:

    ggplot(surveys_complete,
           aes(x = weight, y = hindfoot_length, colour = species_id)) +
        geom_point(alpha = 0.3, position = position_jitter())

Just like we did for the species_id and the colors, we can do the same using a different shape for each plot_id:

    ggplot(surveys_complete,
           aes(x = weight, y = hindfoot_length,
               colour = species_id, shape = as.factor(plot_id))) +
        geom_point(alpha = 0.3, position = position_jitter())

ggplot2 also allows you to calculate some statistics directly, here a linear model fit for each species:

    ggplot(surveys_complete,
           aes(x = weight, y = hindfoot_length, colour = species_id)) +
        geom_point(alpha = 0.3, position = position_jitter()) +
        stat_smooth(method = "lm")

To plot only one species, filter the data frame on species_id before passing it to ggplot:

    surveys_complete %>%
        filter(species_id == "DS") %>%
        ggplot(aes(x = weight, y = hindfoot_length, colour = species_id)) +
        geom_point(alpha = 0.3, position = position_jitter()) +
        stat_smooth(method = "lm")
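Filtering on several values of a column works the same way. A minimal sketch, assuming a small made-up data frame (toy is hypothetical; only the column names follow the lesson): the %in% operator keeps rows whose species_id matches any value in a vector, and base R's subset() selects the same rows without dplyr.

```r
library(dplyr)
library(ggplot2)

# Hypothetical stand-in for surveys_complete (values invented for illustration)
toy <- data.frame(
  species_id      = c("DS", "DM", "DO", "DS", "PF", "DM"),
  weight          = c(120, 44, 50, 130, 8, 42),
  hindfoot_length = c(50, 36, 35, 52, 15, 37)
)

wanted <- c("DS", "DM")

# dplyr: keep only rows whose species_id is one of the wanted values
kept_dplyr <- toy %>% filter(species_id %in% wanted)

# Base R equivalent
kept_base <- subset(toy, species_id %in% wanted)

# Plot only the filtered rows
p <- ggplot(kept_dplyr,
            aes(x = weight, y = hindfoot_length, colour = species_id)) +
  geom_point()
```

Use == for a single value and %in% for several; comparing a column with == against a vector of values recycles the vector element by element and can miss rows.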

The same subset can be taken with base R's subset() inside the ggplot() call:

    ## ggplot(subset(surveys_complete, species_id == "DS"),
    ##        aes(x = weight, y = hindfoot_length, colour = species_id)) +
    ##     geom_point(alpha = 0.3, position = position_jitter()) +
    ##     stat_smooth(method = "lm")
    ggplot(subset(surveys_complete, species_id == "DS"),
           aes(x = weight, y = hindfoot_length, colour = species_id)) +
        geom_point(alpha = 0.3, position = position_jitter()) +
        stat_smooth(method = "lm") +
        ylim(c(0, 60))

Using ylim subsets the data to be represented, so points outside the limits are removed before the regression lines are fitted:

    ggplot(surveys_complete,
           aes(x = weight, y = hindfoot_length, colour = species_id)) +
        geom_point(alpha = 0.3, position = position_jitter()) +
        stat_smooth(method = "lm") +
        ylim(c(40, 60))

while setting limits with coord_cartesian acts as a magnifying glass, zooming in without discarding any data:

    ggplot(surveys_complete,
           aes(x = weight, y = hindfoot_length, color = species_id)) +
        geom_point(alpha = 0.3, position = position_jitter()) +
        stat_smooth(method = "lm") +
        coord_cartesian(ylim = c(40, 60))
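The difference matters most for layers that compute statistics, such as stat_smooth(). A rough base R illustration with made-up numbers: limiting the y scale is comparable to fitting the model on only the rows inside the limits, whereas coord_cartesian() leaves the fit on all rows and only changes the visible window. The toy data and the limits of 20 to 70 are assumptions chosen for illustration.

```r
# Made-up, deliberately curved data so the two fits differ
toy <- data.frame(x = 1:10)
toy$y <- toy$x^2

# What a linear smoother sees under coord_cartesian(): every row
fit_all <- lm(y ~ x, data = toy)

# What it sees under ylim(c(20, 70)): only rows whose y lies inside the limits
inside  <- subset(toy, y >= 20 & y <= 70)
fit_sub <- lm(y ~ x, data = inside)

slope_all <- unname(coef(fit_all)["x"])
slope_sub <- unname(coef(fit_sub)["x"])
```

Here the slope is 11 when all rows are used and 13 when only the four rows inside the limits are used, so the two approaches would draw different regression lines over the same window.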

Boxplot

Visualising the distribution of weight within each species.

    ggplot(subset(surveys_complete, !is.na(weight)),
           aes(x = species_id, y = weight)) +
        geom_boxplot()
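Each box summarises the quartiles of one group, and rows with a missing weight cannot contribute to that summary, which is why they are removed first. A small base R sketch of the same idea, using a made-up data frame (toy is hypothetical):

```r
# Hypothetical stand-in for surveys_complete (values invented for illustration)
toy <- data.frame(
  species_id = c("DS", "DM", "DS", "DM", "DS"),
  weight     = c(120, 44, 130, 42, NA)
)

# Drop rows with a missing weight, as the plot call does
toy_clean <- subset(toy, !is.na(weight))

# Median weight per species: the line drawn inside each box
medians <- tapply(toy_clean$weight, toy_clean$species_id, median)
```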

By adding points to the boxplot, we can see individual measurements and the abundance of measurements.

    ggplot(subset(surveys_complete, !is.na(weight)),
           aes(x = species_id, y = weight)) +
        geom_point(alpha = 0.3, color = "tomato", position = "jitter") +
        geom_boxplot(alpha = 0) +
        coord_flip()

Challenge

  • Create a boxplot for hindfoot_length, and change the color of the points.
  • Replace the boxplot by a violin plot
  • Add the layer coord_flip()

Faceting

    ggplot(subset(surveys_complete, !is.na(weight)),
           aes(species_id, weight)) +
        geom_point(alpha = 0.3, color = "tomato", position = "jitter") +
        geom_boxplot(alpha = 0) +
        coord_flip() +
        facet_wrap( ~ sex)

Challenge

  • Modify the data frame so we only look at males and females
  • Change the colors, so points for males and females are different
  • Change the data frame to only plot three species of your choosing

    ggplot(subset(surveys_complete,
                  species_id %in% c("DO", "DM", "DS") & sex %in% c("F", "M")),
           aes(x = sex, y = weight, colour = interaction(sex, species_id))) +
        facet_wrap( ~ species_id) +
        geom_point(alpha = 0.3, position = "jitter") +
        geom_boxplot(alpha = 0, colour = "black")

barplot

    # Number of records per species
    ggplot(surveys_complete, aes(species_id)) +
        geom_bar()

    # Mean weight per species, computed before plotting
    surveys_complete %>%
        filter(!is.na(weight)) %>%
        group_by(species_id) %>%
        summarize(mean_weight = mean(weight)) %>%
        ggplot(aes(x = species_id, y = mean_weight)) +
        geom_bar(stat = "identity")
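The two calls differ in who does the counting. With no y aesthetic, geom_bar() counts the rows in each group itself; with stat = "identity" it draws bar heights that were already computed. A minimal sketch on a made-up data frame (toy is hypothetical), which also uses geom_col(), ggplot2's shorthand for geom_bar(stat = "identity"):

```r
library(dplyr)
library(ggplot2)

# Hypothetical stand-in for surveys_complete (values invented for illustration)
toy <- data.frame(
  species_id = c("DS", "DM", "DO", "DS", "DM"),
  weight     = c(120, 44, 50, 130, 42)
)

# Bar heights computed by ggplot2: one bar per species, height = number of rows
p_counts <- ggplot(toy, aes(species_id)) +
  geom_bar()

# Bar heights computed beforehand: mean weight per species
mean_weights <- toy %>%
  group_by(species_id) %>%
  summarize(mean_weight = mean(weight))

p_means <- ggplot(mean_weights, aes(x = species_id, y = mean_weight)) +
  geom_col()
```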

Challenge

Repeat the same thing on the hindfoot length instead of the weight

Plotting time series data

Let's calculate the number of counts per year for each species. To do that we need to group the data first and count the records within each group.

    yearly_counts <- surveys_complete %>%
        group_by(year, species_id) %>%
        summarise(count = n())
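A minimal sketch of the grouping step on a made-up data frame (toy is hypothetical). It also shows dplyr's count(), a shorthand for the group_by() plus summarise(n()) pattern; the .groups argument assumes dplyr 1.0 or later.

```r
library(dplyr)

# Hypothetical stand-in for surveys_complete (values invented for illustration)
toy <- data.frame(
  year       = c(1990, 1990, 1990, 1991, 1991),
  species_id = c("DS", "DS", "DM", "DS", "DM")
)

# One row per year/species combination, with the number of records in each
counts_long <- toy %>%
  group_by(year, species_id) %>%
  summarise(count = n(), .groups = "drop")

# Shorthand producing the same counts
counts_short <- toy %>% count(year, species_id, name = "count")
```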

Timelapse data can be visualised as a line plot with years on the x axis and counts on the y axis.

    ggplot(yearly_counts, aes(x = year, y = count)) +
        geom_bar(stat = "identity")

This plots the data for all the species together. We need to tell ggplot to split the graphed data by species_id:

    ggplot(yearly_counts, aes(x = year, y = count, group = species_id)) +
        geom_line()

We will be able to distinguish species in the plot if we add colors.

    ggplot(yearly_counts,
           aes(x = year, y = count, group = species_id, color = species_id)) +
        geom_line()

Challenge

  1. Draw the yearly counts for the species DO, DS, DM
  2. Draw the yearly counts for each plot type
  3. Draw the yearly counts for all taxa but Rodents
  4. Draw the yearly counts for species that have been captured more than 2000 times over the course of the surveys (hard)
  5. Draw the yearly counts for the species that have been captured at least 300 times in one year (difficult)

    # Yearly counts for all taxa but rodents
    surveys_complete %>%
        group_by(taxa, year) %>%
        tally() %>%
        filter(taxa != "Rodent") %>%
        ggplot(aes(x = year, y = n, group = taxa, color = taxa)) +
        geom_line()

    # Yearly counts for each plot type
    surveys_complete %>%
        group_by(plot_type, year) %>%
        tally() %>%
        ggplot(aes(x = year, y = n, group = plot_type, color = plot_type)) +
        geom_line()

    ### Easy
    yearly_counts %>%
        filter(species_id %in% c("DO", "DS", "DM")) %>%
        ggplot(aes(x = year, y = count, group = species_id, colour = species_id)) +
        geom_line()

    ### Difficult
    # Species captured more than 2000 times over all surveys
    sp_totals <- surveys_complete %>%
        group_by(species_id) %>%
        summarise(count = n()) %>%
        filter(count > 2000) %>%
        .$species_id

    yearly_counts %>%
        filter(species_id %in% sp_totals) %>%
        ggplot(aes(x = year, y = count, group = species_id, color = species_id)) +
        geom_line()

    ### More difficult
    # Species with at least 300 captures in any single year
    sp_250 <- yearly_counts %>%
        filter(count >= 300) %>%
        ungroup() %>%
        select(species_id) %>%
        unique()

    yearly_counts %>%
        filter(species_id %in% sp_250$species_id) %>%
        ggplot(aes(x = year, y = count, group = species_id, color = species_id)) +
        geom_line()
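The last answer could also be written as a single grouped filter, without building the intermediate list of species. This is a sketch under assumptions, not the lesson's own solution: toy_counts is a made-up stand-in for yearly_counts, and a threshold of 3 replaces 300 so the example stays small.

```r
library(dplyr)

# Hypothetical stand-in for yearly_counts (values invented for illustration)
toy_counts <- data.frame(
  year       = c(1990, 1991, 1990, 1991),
  species_id = c("DS", "DS", "DM", "DM"),
  count      = c(2, 5, 1, 2)
)

# Keep every row of a species if that species reached the threshold in any year
frequent <- toy_counts %>%
  group_by(species_id) %>%
  filter(any(count >= 3)) %>%
  ungroup()
```

Because the filter is evaluated once per species, all years of a qualifying species are kept, including the years in which it fell below the threshold.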

Source: https://datacarpentry.org/dc_zurich/R-ecology/05-visualisation-ggplot2

Posted by: miercirmly1939.blogspot.com
