API Documentation
- class ICGC_data_parser.SSM_Reader(*args, **kwargs)
Reader class for the International Cancer Genome Consortium aggregate file of simple somatic mutations from the Data Releases.
Example:
>>> from ICGC_data_parser import SSM_Reader
>>> reader = SSM_Reader(filename='data/ssm_sample.vcf')
>>> for record in reader.parse(filters=['BRCA-EU']):
...     print(record.ID, record.CHROM, record.POS)
MU66865518 1 100141201
MU65487875 1 100160548
MU66281118 1 100638179
MU66254120 1 101352655
...
- iter_lines(filters=None)
Iterate over the file's raw lines, skipping those that do not match the given regular expressions.
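The filtering described above can be pictured with a minimal, standalone sketch. It does not use SSM_Reader: the helper name iter_matching_lines is hypothetical, the sample lines are made up, and the sketch assumes that a line is kept only when every expression matches somewhere in it, which the description does not spell out.

```python
import re


def iter_matching_lines(lines, filters=None):
    """Yield the lines in which every regular expression finds a match.

    Hypothetical helper, not part of ICGC_data_parser. Assumes that
    "matching" means re.search succeeds and that all filters must match.
    """
    patterns = [re.compile(f) for f in (filters or [])]
    for line in lines:
        # With no filters, all() over an empty list is True: keep every line.
        if all(p.search(line) for p in patterns):
            yield line


# Made-up lines standing in for raw file text.
sample_lines = [
    "1\t100\tID_A\tG\tA\tOCCURRENCE=BRCA-EU|1|100",
    "1\t200\tID_B\tC\tT\tOCCURRENCE=LIRI-JP|2|200",
]

kept = list(iter_matching_lines(sample_lines, filters=["BRCA-EU"]))
# kept holds only the first sample line
```

Filtering on the raw text before any parsing is cheap, since lines that fail the filters never need to be split into fields.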
- next_array(strict_whitespace=False)
Fetch the next line, split into fields.
If strict_whitespace is True, split on tabs rather than on any whitespace. This allows fields to contain spaces.
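The difference between the two splitting modes can be shown with plain string methods. This is a sketch of the behaviour described above, not the library's code: split_fields is a hypothetical stand-in and the sample line is made up.

```python
def split_fields(line, strict_whitespace=False):
    """Split one line of text into fields.

    Hypothetical stand-in for the splitting step described above.
    """
    line = line.rstrip("\r\n")
    if strict_whitespace:
        # Tabs only: spaces inside a field are preserved.
        return line.split("\t")
    # Any run of whitespace (tabs or spaces) separates fields.
    return line.split()


line = "1\t100\tID_A\tsome note\n"

loose = split_fields(line)
# → ['1', '100', 'ID_A', 'some', 'note']

strict = split_fields(line, strict_whitespace=True)
# → ['1', '100', 'ID_A', 'some note']
```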
- parse(filters=None)
Iterate over the records of the file, skipping the lines that do not match the given regular expressions.
Example:
>>> from ICGC_data_parser import SSM_Reader
>>> reader = SSM_Reader(filename='data/ssm_sample.vcf')
>>> for record in reader.parse(filters=['BRCA-EU']):
...     print(record.ID)
MU66865518
MU65487875
MU66281118
MU66254120
...
- subfield_parser(sf_name, sep='|')
Get a parser for the items of the subfield.
Useful to parse the CONSEQUENCE and OCCURRENCE subfields of the INFO field.
Example:
>>> from ICGC_data_parser import SSM_Reader
>>> reader = SSM_Reader(filename='data/ssm_sample.vcf')
>>> CONSEQUENCE = reader.subfield_parser('CONSEQUENCE')
>>> for record in reader.parse(filters=['BRCA-EU']):
...     # Which genes are affected?
...     print(CONSEQUENCE(record)[0].gene_symbol)
SLC27A3
GATAD2B
TPM3
SHE
ADAM15
...
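One way to picture what such a parser does is to split each item on the separator and wrap the pieces in a named tuple, so that attributes like gene_symbol become available. The sketch below is an illustration under assumptions, not the library's implementation: make_subfield_parser is a hypothetical name, the field list other than gene_symbol is assumed, the sample items are made up, and the parser here takes a list of strings where the real one takes a record.

```python
from collections import namedtuple


def make_subfield_parser(sf_name, field_names, sep="|"):
    """Build a function that turns subfield items into named tuples.

    Hypothetical sketch: a real reader would presumably take the field
    names from the file header rather than from an argument.
    """
    Item = namedtuple(sf_name, field_names)

    def parse(items):
        # Each item is one sep-delimited string, e.g. "GENE_A|ID_1|...".
        return [Item(*item.split(sep)) for item in items]

    return parse


# Assumed field names, for illustration only.
parse_consequence = make_subfield_parser(
    "CONSEQUENCE", ["gene_symbol", "gene_affected", "consequence_type"]
)

# Made-up items, not taken from any data release.
items = ["GENE_A|ID_1|missense_variant", "GENE_B|ID_2|intron_variant"]
parsed = parse_consequence(items)
first_gene = parsed[0].gene_symbol
# → 'GENE_A'
```

Returning named tuples keeps the access pattern of the example above (indexing an item, then reading an attribute) without requiring callers to remember field positions.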