How do I count treated and untreated in R?

I'm trying to learn R again and am trying to count the total number of genes that are "treated" and "untreated" with dex in the Bioconductor airway dataset (https://bioconductor.org/packages/release/data/experiment/html/airway.html).

I'm trying:

airway$dex=='trted'
#[1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE

and it's not working.

2 answers

  • answered 2021-11-27 01:01 Adele

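    Use the sum() function to count the TRUE values. Compare against the actual factor level, which is 'trt'; the string 'trted' matches nothing, so every comparison is FALSE:

    sum(airway$dex == 'trt')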
    
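    Comparing a factor against a label that is not one of its levels returns FALSE for every element, which is the all-FALSE output in the question. A minimal base-R sketch of that check follows. It uses a stand-in factor named dex that copies the eight values str(airway) reports for the dex column; it is an illustration, not the real object, so it runs without any Bioconductor packages.

    ```r
    # Stand-in for airway$dex (assumption: values copied from the str() output)
    dex <- factor(c("untrt", "trt", "untrt", "trt", "untrt", "trt", "untrt", "trt"))

    levels(dex)          # inspect which labels actually exist
    sum(dex == "trted")  # 0: no element carries this label
    sum(dex == "trt")    # 4
    table(dex)           # counts for every level in one call
    ```

    table() is often the more convenient choice here because it reports both groups at once and makes a mistyped label obvious.
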

  • answered 2021-11-27 01:11 IRTFM

    After installing that package I performed the following actions at my console (including all output):

    > library(airway)
    Loading required package: SummarizedExperiment
    Loading required package: MatrixGenerics
    Loading required package: matrixStats
    
    Attaching package: ‘matrixStats’
    
    The following object is masked from ‘package:dplyr’:
    
        count
    
    
    Attaching package: ‘MatrixGenerics’
    
    The following objects are masked from ‘package:matrixStats’:
    
        colAlls, colAnyNAs, colAnys, colAvgsPerRowSet, colCollapse, colCounts, colCummaxs, colCummins,
        colCumprods, colCumsums, colDiffs, colIQRDiffs, colIQRs, colLogSumExps, colMadDiffs, colMads, colMaxs,
        colMeans2, colMedians, colMins, colOrderStats, colProds, colQuantiles, colRanges, colRanks, colSdDiffs,
        colSds, colSums2, colTabulates, colVarDiffs, colVars, colWeightedMads, colWeightedMeans,
        colWeightedMedians, colWeightedSds, colWeightedVars, rowAlls, rowAnyNAs, rowAnys, rowAvgsPerColSet,
        rowCollapse, rowCounts, rowCummaxs, rowCummins, rowCumprods, rowCumsums, rowDiffs, rowIQRDiffs,
        rowIQRs, rowLogSumExps, rowMadDiffs, rowMads, rowMaxs, rowMeans2, rowMedians, rowMins, rowOrderStats,
        rowProds, rowQuantiles, rowRanges, rowRanks, rowSdDiffs, rowSds, rowSums2, rowTabulates, rowVarDiffs,
        rowVars, rowWeightedMads, rowWeightedMeans, rowWeightedMedians, rowWeightedSds, rowWeightedVars
    
    Loading required package: GenomicRanges
    Loading required package: stats4
    Loading required package: BiocGenerics
    Loading required package: parallel
    
    Attaching package: ‘BiocGenerics’
    
    The following objects are masked from ‘package:parallel’:
    
        clusterApply, clusterApplyLB, clusterCall, clusterEvalQ, clusterExport, clusterMap, parApply,
        parCapply, parLapply, parLapplyLB, parRapply, parSapply, parSapplyLB
    
    The following objects are masked from ‘package:bit64’:
    
        match, order, rank
    
    The following objects are masked from ‘package:dplyr’:
    
        combine, intersect, setdiff, union
    
    The following objects are masked from ‘package:stats’:
    
        IQR, mad, sd, var, xtabs
    
    The following objects are masked from ‘package:base’:
    
        anyDuplicated, append, as.data.frame, basename, cbind, colnames, dirname, do.call, duplicated, eval,
        evalq, Filter, Find, get, grep, grepl, intersect, is.unsorted, lapply, Map, mapply, match, mget, order,
        paste, pmax, pmax.int, pmin, pmin.int, Position, rank, rbind, Reduce, rownames, sapply, setdiff, sort,
        table, tapply, union, unique, unsplit, which.max, which.min
    
    Loading required package: S4Vectors
    
    Attaching package: ‘S4Vectors’
    
    The following object is masked from ‘package:Matrix’:
    
        expand
    
    The following objects are masked from ‘package:data.table’:
    
        first, second
    
    The following objects are masked from ‘package:tidygraph’:
    
        active, rename
    
    The following object is masked from ‘package:tidyr’:
    
        expand
    
    The following objects are masked from ‘package:dplyr’:
    
        first, rename
    
    The following object is masked from ‘package:base’:
    
        expand.grid
    
    Loading required package: IRanges
    
    Attaching package: ‘IRanges’
    
    The following object is masked from ‘package:data.table’:
    
        shift
    
    The following object is masked from ‘package:nlme’:
    
        collapse
    
    The following object is masked from ‘package:tidygraph’:
    
        slice
    
    The following object is masked from ‘package:purrr’:
    
        reduce
    
    The following objects are masked from ‘package:dplyr’:
    
        collapse, desc, slice
    
    Loading required package: GenomeInfoDb
    Loading required package: Biobase
    Welcome to Bioconductor
    
        Vignettes contain introductory material; view with 'browseVignettes()'. To cite Bioconductor, see
        'citation("Biobase")', and for packages 'citation("pkgname")'.
    
    
    Attaching package: ‘Biobase’
    
    The following object is masked from ‘package:MatrixGenerics’:
    
        rowMedians
    
    The following objects are masked from ‘package:matrixStats’:
    
        anyMissing, rowMedians
    
    The following object is masked from ‘package:bit64’:
    
        cache
    
    
    Attaching package: ‘SummarizedExperiment’
    
    The following object is masked from ‘package:SeuratObject’:
    
        Assays
    
    The following object is masked from ‘package:Seurat’:
    
        Assays
    

    I looked at the help page

    > help(package = "airway")
    

    So after reading that I thought the airway dataset might be accessible, but no:

    > str(airway)
    Error in str(airway) : object 'airway' not found
    

    So I tried loading it with the data function. No error was reported, so I looked at its structure:

    > data(airway)
    > str(airway)
    Formal class 'RangedSummarizedExperiment' [package "SummarizedExperiment"] with 6 slots
      ..@ rowRanges      :Formal class 'GRangesList' [package "GenomicRanges"] with 3 slots
      .. .. ..@ elementMetadata:Formal class 'DataFrame' [package "IRanges"] with 6 slots
      .. .. .. .. ..@ rownames       : NULL
      .. .. .. .. ..@ nrows          : int 64102
      .. .. .. .. ..@ listData       : Named list()
      .. .. .. .. ..@ elementType    : chr "ANY"
      .. .. .. .. ..@ elementMetadata: NULL
      .. .. .. .. ..@ metadata       : list()
      .. .. ..@ elementType    : chr "GRanges"
      .. .. ..@ metadata       :List of 1
      .. .. .. ..$ genomeInfo:List of 20
      .. .. .. .. ..$ Db type                                 : chr "TranscriptDb"
      .. .. .. .. ..$ Supporting package                      : chr "GenomicFeatures"
      .. .. .. .. ..$ Data source                             : chr "BioMart"
      .. .. .. .. ..$ Organism                                : chr "Homo sapiens"
      .. .. .. .. ..$ Resource URL                            : chr "www.biomart.org:80"
      .. .. .. .. ..$ BioMart database                        : chr "ensembl"
      .. .. .. .. ..$ BioMart database version                : chr "ENSEMBL GENES 75 (SANGER UK)"
      .. .. .. .. ..$ BioMart dataset                         : chr "hsapiens_gene_ensembl"
      .. .. .. .. ..$ BioMart dataset description             : chr "Homo sapiens genes (GRCh37.p13)"
      .. .. .. .. ..$ BioMart dataset version                 : chr "GRCh37.p13"
      .. .. .. .. ..$ Full dataset                            : chr "yes"
      .. .. .. .. ..$ miRBase build ID                        : chr NA
      .. .. .. .. ..$ transcript_nrow                         : chr "215647"
      .. .. .. .. ..$ exon_nrow                               : chr "745593"
      .. .. .. .. ..$ cds_nrow                                : chr "537555"
      .. .. .. .. ..$ Db created by                           : chr "GenomicFeatures package from Bioconductor"
      .. .. .. .. ..$ Creation time                           : chr "2014-07-10 14:55:55 -0400 (Thu, 10 Jul 2014)"
      .. .. .. .. ..$ GenomicFeatures version at creation time: chr "1.17.9"
      .. .. .. .. ..$ RSQLite version at creation time        : chr "0.11.4"
      .. .. .. .. ..$ DBSCHEMAVERSION                         : chr "1.0"
      ..@ colData        :Formal class 'DataFrame' [package "IRanges"] with 6 slots
      .. .. ..@ rownames       : chr [1:8] "SRR1039508" "SRR1039509" "SRR1039512" "SRR1039513" ...
      .. .. ..@ nrows          : int 8
      .. .. ..@ listData       :List of 9
      .. .. .. ..$ SampleName: Factor w/ 8 levels "GSM1275862","GSM1275863",..: 1 2 3 4 5 6 7 8
      .. .. .. ..$ cell      : Factor w/ 4 levels "N052611","N061011",..: 4 4 1 1 3 3 2 2
      .. .. .. ..$ dex       : Factor w/ 2 levels "trt","untrt": 2 1 2 1 2 1 2 1
      .. .. .. ..$ albut     : Factor w/ 1 level "untrt": 1 1 1 1 1 1 1 1
      .. .. .. ..$ Run       : Factor w/ 8 levels "SRR1039508","SRR1039509",..: 1 2 3 4 5 6 7 8
      .. .. .. ..$ avgLength : int [1:8] 126 126 126 87 120 126 101 98
      .. .. .. ..$ Experiment: Factor w/ 8 levels "SRX384345","SRX384346",..: 1 2 3 4 5 6 7 8
      .. .. .. ..$ Sample    : Factor w/ 8 levels "SRS508567","SRS508568",..: 2 1 3 4 5 6 7 8
      .. .. .. ..$ BioSample : Factor w/ 8 levels "SAMN02422669",..: 1 4 6 2 7 3 8 5
      .. .. ..@ elementType    : chr "ANY"
      .. .. ..@ elementMetadata: NULL
      .. .. ..@ metadata       : list()
      ..@ assays         :Reference class 'ShallowSimpleListAssays' [package "GenomicRanges"] with 1 field
      .. ..$ data:Formal class 'SimpleList' [package "IRanges"] with 4 slots
      .. .. .. ..@ listData       :List of 1
      .. .. .. .. ..$ counts: int [1:64102, 1:8] 679 0 467 260 60 0 3251 1433 519 394 ...
      .. .. .. ..@ elementType    : chr "ANY"
      .. .. .. ..@ elementMetadata: NULL
      .. .. .. ..@ metadata       : list()
      .. ..and 12 methods.
      ..@ NAMES          : NULL
      ..@ elementMetadata:Formal class 'DataFrame' [package "S4Vectors"] with 6 slots
      .. .. ..@ rownames       : NULL
      .. .. ..@ nrows          : int 64102
      .. .. ..@ listData       : Named list()
      .. .. ..@ elementType    : chr "ANY"
      .. .. ..@ elementMetadata: NULL
      .. .. ..@ metadata       : list()
      ..@ metadata       :List of 1
      .. ..$ :Formal class 'MIAME' [package "Biobase"] with 13 slots
      .. .. .. ..@ name             : chr "Himes BE"
      .. .. .. ..@ lab              : chr NA
      .. .. .. ..@ contact          : chr ""
      .. .. .. ..@ title            : chr "RNA-Seq transcriptome profiling identifies CRISPLD2 as a glucocorticoid responsive gene that modulates cytokine"| __truncated__
      .. .. .. ..@ abstract         : chr "Asthma is a chronic inflammatory respiratory disease that affects over 300 million people worldwide. Glucocorti"| __truncated__
      .. .. .. ..@ url              : chr "http://www.ncbi.nlm.nih.gov/pubmed/24926665"
      .. .. .. ..@ pubMedIds        : chr "24926665"
      .. .. .. ..@ samples          : list()
      .. .. .. ..@ hybridizations   : list()
      .. .. .. ..@ normControls     : list()
      .. .. .. ..@ preprocessing    : list()
      .. .. .. ..@ other            : list()
      .. .. .. ..@ .__classVersion__:Formal class 'Versions' [package "Biobase"] with 1 slot
      .. .. .. .. .. ..@ .Data:List of 2
      .. .. .. .. .. .. ..$ : int [1:3] 1 0 0
      .. .. .. .. .. .. ..$ : int [1:3] 1 1 0
    

    Scanning through that list of S4 structured data I saw this line:

          .. .. .. ..$ dex       : Factor w/ 2 levels "trt","untrt": 2 1 2 1 2 1 2 1
    

    So the dex items do have "trt" and "untrt" as values, but that "column" is located somewhat deeper in the RangedSummarizedExperiment structure. There might be a specific function, whose name I do not know, to pull values out of such structures, but we now have enough information to answer the question (or hack an answer together). Follow the names and operators in that nested list backward to its origin, using the S4 slot operator "@" where appropriate and "$" where not:

    sum(airway@colData@listData$dex == "trt")
    #[1] 4
    
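    For what it's worth, SummarizedExperiment provides colData() as the accessor for the per-sample table, which avoids reaching into slots with "@". A short sketch, assuming the airway package is installed (the variable name dex is just a local copy for illustration):

    ```r
    library(airway)   # also attaches SummarizedExperiment
    data(airway)

    # colData() returns the per-sample DataFrame; dex is one of its columns
    dex <- colData(airway)$dex

    table(dex)         # counts for both levels, trt and untrt
    sum(dex == "trt")  # same count as the slot-based expression
    ```

    Note that colData has one row per sample (8 here), so these are counts of samples, not genes; the genes are the 64102 rows of the counts matrix.
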
