API Documentation

class ICGC_data_parser.SSM_Reader(*args, **kwargs)

Reader class for the aggregate file of simple somatic mutations (SSM) distributed with the International Cancer Genome Consortium (ICGC) Data Releases.

Example:

>>> reader = SSM_Reader(filename='data/ssm_sample.vcf')

>>> for record in reader.parse(filters=['BRCA-EU']):
...    print(record.ID, record.CHROM, record.POS)
MU66865518 1 100141201
MU65487875 1 100160548
MU66281118 1 100638179
MU66254120 1 101352655
    ...
iter_lines(filters=None)

Iterate through the file’s raw lines, filtering out the ones not matching the regular expressions given.
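The filtering idea can be sketched without the library. The helper `filter_lines` below is hypothetical and not part of `ICGC_data_parser`; it assumes a line is kept only when every pattern matches, which the description above does not specify. The sample lines are made up for illustration.

```python
import re

def filter_lines(lines, filters=None):
    """Yield only the lines matched by every regular expression in `filters`.

    Assumption: all patterns must match; the library may behave differently.
    """
    patterns = [re.compile(f) for f in (filters or [])]
    for line in lines:
        # With no filters, all() over an empty list is True, so every line passes.
        if all(p.search(line) for p in patterns):
            yield line

# Made-up lines, not real ICGC records.
sample = [
    "1\t100\tMU_A\tproject=BRCA-EU",
    "1\t200\tMU_B\tproject=OTHER",
]
kept = list(filter_lines(sample, filters=["BRCA-EU"]))
```

Because the helper is a generator, lines are filtered lazily, which matters for files too large to hold in memory.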

next_array(strict_whitespace=False)

Fetch the next line, split into fields.

If strict_whitespace is True, then split on tabs rather than whitespace. This allows for fields with spaces in them.
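The difference between the two splitting modes is easiest to see on a line whose last field contains a space. The function `split_fields` is a hypothetical stand-in written for this illustration, not the library's implementation, and the input line is made up.

```python
def split_fields(line, strict_whitespace=False):
    """Split a raw line into fields (illustrative stand-in for next_array)."""
    line = line.rstrip("\r\n")
    if strict_whitespace:
        # Split on tabs only, so spaces inside a field are preserved.
        return line.split("\t")
    # Split on any run of whitespace, which also breaks fields at spaces.
    return line.split()

# Made-up line: four tab-separated fields, the last one containing a space.
line = "1\t100\tMU_A\tsome gene\n"
loose = split_fields(line)
strict = split_fields(line, strict_whitespace=True)
```

Here `loose` ends up with five fields because `some gene` is split in two, while `strict` keeps the four tab-separated fields intact.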

next_line()

Fetch the next raw line from the file.

parse(filters=None)

Iterate through the records of the file, filtering out the lines that do not match the regular expressions given.

Example:

>>> reader = SSM_Reader(filename='data/ssm_sample.vcf')

>>> for record in reader.parse(filters=['BRCA-EU']):
...    print(record.ID)
MU66865518
MU65487875
MU66281118
MU66254120
    ...
push_line(line)

Puts line back into the buffer so that it is the next line parsed.
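A push-back buffer of this kind can be sketched as follows. The class `PushbackLines` is hypothetical and only mirrors the method names `next_line` and `push_line` documented here; how the library stores its buffer internally is an assumption.

```python
class PushbackLines:
    """Line source that lets the caller un-read a line (illustrative only)."""

    def __init__(self, lines):
        self._lines = iter(lines)
        self._buffer = []  # pushed-back lines, most recent last

    def next_line(self):
        # Pushed-back lines take priority over the underlying source.
        if self._buffer:
            return self._buffer.pop()
        return next(self._lines)

    def push_line(self, line):
        self._buffer.append(line)

# Peek at the first line, then put it back so it is read again.
reader = PushbackLines(["##header", "data line"])
first = reader.next_line()
reader.push_line(first)
```

This pattern is handy when a parser has to read one line too many to detect where a header ends and then needs to hand that line back.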

subfield_parser(sf_name, sep='|')

Get a parser for the items of the subfield.

Useful to parse the CONSEQUENCE and OCCURRENCE subfields of the INFO field.

Example:

>>> reader = SSM_Reader(filename='data/ssm_sample.vcf')

>>> CONSEQUENCE = reader.subfield_parser('CONSEQUENCE')

>>> for record in reader.parse(filters=['BRCA-EU']):
...    # Which genes are affected?
...    print(CONSEQUENCE(record)[0].gene_symbol)
SLC27A3
GATAD2B
TPM3
SHE
ADAM15
  ...
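To show what such a parser does with the separator, here is a minimal standalone sketch. The factory `make_subfield_parser` is hypothetical and not the library's API. Of the field names, only `gene_symbol` appears in the example above; `consequence_type` and the item strings are assumptions made up for illustration.

```python
from collections import namedtuple

def make_subfield_parser(field_names, sep="|"):
    """Return a function that turns raw 'a|b|...' items into named tuples."""
    Item = namedtuple("Item", field_names)

    def parse(raw_items):
        # Each raw item must contain exactly len(field_names) parts.
        return [Item(*raw.split(sep)) for raw in raw_items]

    return parse

# Hypothetical two-field layout; the real subfield has its own field list.
parse_consequence = make_subfield_parser(["gene_symbol", "consequence_type"])
items = parse_consequence(["GENE_A|type_x", "GENE_B|type_y"])
first_gene = items[0].gene_symbol
```

Named tuples give attribute access such as `items[0].gene_symbol`, matching the access style used in the example above.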