SAM
Header section
- The header section is not mandatory, but most NGS software requires it.
- It contains information about five main topics:
  - alignment file: format version, sort order;
  - reference sequence(s): e.g. name, length, species, URL;
  - read group: sequencing lane, sample, sequencing center, library, etc.;
  - program: aligner name and version, parameters used for the alignment;
  - custom comment(s).
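- Each line of the header section starts with '@' followed by a two-letter record type code: @HD (alignment file), @SQ (reference sequence), @RG (read group), @PG (program) and @CO (comment).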
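To make the header layout concrete, here is a minimal Python sketch that groups header lines by their two-letter record type code. It assumes a plain-text (uncompressed) SAM file with well-formed TAG:VALUE fields; the function name `parse_sam_header` and the example values are hypothetical illustrations, not part of any library.

```python
from collections import defaultdict


def parse_sam_header(lines):
    """Group SAM header lines by their two-letter record type code.

    Returns a dict mapping e.g. "SQ" to a list of records. Each record is a
    dict of TAG -> value, except @CO lines, whose free text is kept as-is.
    """
    header = defaultdict(list)
    for line in lines:
        if not line.startswith("@"):
            break  # header lines come first; the alignment section starts here
        fields = line.rstrip("\n").split("\t")
        record_type = fields[0][1:]  # drop the leading '@'
        if record_type == "CO":
            # A comment line holds free text rather than TAG:VALUE pairs
            header["CO"].append("\t".join(fields[1:]))
            continue
        # Split each field on the first colon only, because values such as
        # URLs can themselves contain colons
        record = dict(field.split(":", 1) for field in fields[1:])
        header[record_type].append(record)
    return dict(header)


# Hypothetical example input: a few header lines followed by one alignment line
example = [
    "@HD\tVN:1.6\tSO:coordinate",
    "@SQ\tSN:chr1\tLN:1000",
    "@RG\tID:lane1\tSM:sample1\tLB:lib1",
    "@PG\tID:aligner\tPN:example_aligner\tVN:1.0",
    "@CO\tany free-text comment",
    "read1\t0\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tIIII",
]
print(parse_sam_header(example)["SQ"])
```

Stopping at the first line that does not start with '@' relies on the header preceding all alignment lines, which is how SAM files are laid out.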
Alignment section
- Every aligned read (and, optionally, unmapped reads) is represented by one line consisting of tab-delimited fields (columns).
- If a read maps to more than one location, each mapping gets its own line in the SAM file.
- There are 11 mandatory fields in each row:
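  - QNAME: read name
  - FLAG: bitwise flag encoding information about the read, e.g. mapped/unmapped, paired/unpaired, mapped to the forward/reverse strand; a “flag decoder” tool can translate the number into these properties
  - RNAME: reference sequence name
  - POS: 1-based leftmost position of the mapped read on the reference sequence
  - MAPQ: mapping quality (Phred-scaled probability that the mapping position is wrong; 255 if unavailable)
  - CIGAR: CIGAR string, a compact description of the alignment (matches/mismatches, insertions, deletions, clipping)
  - RNEXT: reference sequence name of the mate (for paired data; '=' if identical to RNAME)
  - PNEXT: position of the mate (for paired data)
  - TLEN: observed template length, i.e. the span of the read pair on the reference (for paired data)
  - SEQ: nucleotide sequence of the read
  - QUAL: per-base quality of the read (Phred scores, ASCII-encoded with an offset of 33)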
- There are also several optional fields; for these, see the format specification.
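The mandatory columns can be illustrated with a small Python sketch that splits one alignment line into its 11 fields, decodes the bitwise FLAG and breaks a CIGAR string into operations. It assumes well-formed, uncompressed input; the function names (`parse_alignment`, `decode_flag`, `parse_cigar`, `reference_end`) and the short bit labels are hypothetical paraphrases chosen for illustration, not an existing API. For real work a dedicated library such as pysam is the more usual choice.

```python
import re

# Names of the 11 mandatory SAM columns, in order
MANDATORY_FIELDS = ("qname", "flag", "rname", "pos", "mapq", "cigar",
                    "rnext", "pnext", "tlen", "seq", "qual")

# The 12 FLAG bits defined in the SAM specification (short labels are ours)
FLAG_BITS = {
    0x1: "paired",
    0x2: "properly paired",
    0x4: "unmapped",
    0x8: "mate unmapped",
    0x10: "reverse strand",
    0x20: "mate reverse strand",
    0x40: "first in pair",
    0x80: "second in pair",
    0x100: "secondary alignment",
    0x200: "QC fail",
    0x400: "duplicate",
    0x800: "supplementary alignment",
}

# CIGAR operations that consume bases of the reference sequence
REF_CONSUMING = set("MDN=X")


def parse_alignment(line):
    """Split one alignment line into a dict of its 11 mandatory fields."""
    columns = line.rstrip("\n").split("\t")
    record = dict(zip(MANDATORY_FIELDS, columns[:11]))
    for key in ("flag", "pos", "mapq", "pnext", "tlen"):
        record[key] = int(record[key])
    record["optional"] = columns[11:]  # TAG:TYPE:VALUE strings, left unparsed
    return record


def decode_flag(flag):
    """Return the labels of all bits set in a FLAG value, lowest bit first."""
    return [name for bit, name in FLAG_BITS.items() if flag & bit]


def parse_cigar(cigar):
    """Split a CIGAR string such as '3S5M1I2M' into (length, op) pairs."""
    if cigar == "*":  # CIGAR unavailable, e.g. for unmapped reads
        return []
    return [(int(n), op) for n, op in re.findall(r"(\d+)([MIDNSHP=X])", cigar)]


def reference_end(pos, cigar):
    """1-based inclusive end position of an alignment on the reference."""
    span = sum(n for n, op in parse_cigar(cigar) if op in REF_CONSUMING)
    return pos + span - 1


# Hypothetical alignment line of the first read of a pair
rec = parse_alignment(
    "read1\t99\tchr1\t100\t60\t3S5M1I2M\t=\t300\t311\tACGTACGTACG\tIIIIIIIIIII\tNM:i:1"
)
print(decode_flag(rec["flag"]))
print(parse_cigar(rec["cigar"]), reference_end(rec["pos"], rec["cigar"]))
```

Here FLAG 99 decodes to paired, properly paired, mate on the reverse strand and first in pair, and only the M operations of the CIGAR consume reference bases, so the alignment spans positions 100 to 106.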