ChIP-seq ambiguous tag mapper


ChIP-seq ambiguous tag mapper - a Gibbs sampling algorithm for the mapping of ambiguous ChIP-seq sequence tags. A collaboration with the Lunyak laboratory at the Buck Institute for Age Research.




  • - Scripts used to map ambiguous tags by Gibbs sampling
  • - Script used to format the output from Bowtie
  • - Script used to organize unique tags using output from

Sample Data

  • - Files with the real genomic sites for testing
  • - Files with the short sequence tags
  • - Formatted output from Bowtie and files for unique tags. These files are the input for


To run our ambiguous tag mapping program, first map the tags to the genome with Bowtie, then re-format the Bowtie output before running our script (see Documentation and/or parts A-E below).

Alternatively, to test our program you can use the test data we provide and run the program directly (see Documentation and/or parts D and E below).

Notice: the scripts were changed and updated on February 16, 2011. Please use the updated version.

A. Run the Bowtie program to obtain the initial mapping of sequence tags.

B. Format the output from the Bowtie program:
1. The output file from the Bowtie program needs to be formatted in order to run “”.
2. Run the script “” to change the format.
3. Command line:

 perl -p directory -i Bowtie_output_file -o output.bed

 -p:  directory where the Bowtie output file (initial mapping of tags) is located;
 -i:  the Bowtie output file (initial mapping of tags), including both unique and ambiguous tags;
 -o:  the name of the output file. The default name is “mapping_result.bed”.

4. The resulting output file is in this format: “tag_id chr>position>strand,chr>position>strand,…”
HWI-EAS229_75_30DY0AAXX:4:1:0:1282/1 chr18>6452262>+,
HWI-EAS229_75_30DY0AAXX:4:1:0:1282/2 chr18>6452351>+,chr4>66122359>-,
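
To make the formatted layout concrete, here is a minimal Python sketch that reads one line of this format and separates unique tags from ambiguous ones. It is not part of the distributed Perl scripts; the names `Hit`, `parse_mapping_line` and `is_ambiguous` are hypothetical, and the sketch assumes the tag id and the hit list are separated by whitespace, as in the two example lines above.

```python
from typing import List, NamedTuple, Tuple


class Hit(NamedTuple):
    """One candidate genomic location for a tag (hypothetical helper type)."""
    chrom: str
    position: int
    strand: str


def parse_mapping_line(line: str) -> Tuple[str, List[Hit]]:
    """Split a 'tag_id chr>position>strand,...' line into tag id and hits."""
    # Assumption: tag id and hit list are separated by whitespace (space or tab).
    tag_id, hit_field = line.split(None, 1)
    hits = []
    for item in hit_field.strip().split(","):
        if not item:  # skip the empty piece left by the trailing comma
            continue
        chrom, position, strand = item.split(">")
        hits.append(Hit(chrom, int(position), strand))
    return tag_id, hits


def is_ambiguous(hits: List[Hit]) -> bool:
    """A tag with more than one candidate location is ambiguous."""
    return len(hits) > 1


example = "HWI-EAS229_75_30DY0AAXX:4:1:0:1282/2 chr18>6452351>+,chr4>66122359>-,"
tag, candidate_hits = parse_mapping_line(example)
print(tag, len(candidate_hits), is_ambiguous(candidate_hits))
```

In this reading, the first example line above is a unique tag (one hit) and the second is an ambiguous tag (two hits).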

C. Organize unique tags:
1. In order to set the parameters for the algorithm, the unique tags need to be organized.
2. Run “” to organize the unique tags.
3. Command line:

 perl -p directory -i formatted_mapping_file -o output.bed -l region_length

 -p:  directory where the formatted mapping file is located
 -i:  the formatted mapping file generated by “”
 -o:  name of output file, the default name is "unique_screen_result.bed"
 -l:  length of adjacent region for co-located tags, the default value is 147

4. The resulting output file has this format:
“chr position_start position_end unique_tag_count”
chr1 795260 795406 226
chr1 830067 830213 166
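
As an illustration of how this file can be consumed, the following Python sketch parses the region lines and looks up the unique tag count covering a given position. The names `UniqueRegion`, `parse_region_line` and `unique_count_at` are hypothetical and not part of the distributed scripts. The sketch assumes that both coordinates are inclusive, which is an inference from the example lines: 795406 - 795260 + 1 = 147, matching the default value of -l.

```python
from typing import List, NamedTuple


class UniqueRegion(NamedTuple):
    """One line of the unique-tag file (hypothetical helper type)."""
    chrom: str
    start: int
    end: int
    count: int


def parse_region_line(line: str) -> UniqueRegion:
    """Parse a 'chr position_start position_end unique_tag_count' line."""
    chrom, start, end, count = line.split()
    return UniqueRegion(chrom, int(start), int(end), int(count))


def unique_count_at(regions: List[UniqueRegion], chrom: str, position: int) -> int:
    """Return the unique tag count of the first region covering the position.

    Assumption: start and end are both inclusive. Returns 0 if no region
    covers the position. A linear scan is enough for a small illustration.
    """
    for region in regions:
        if region.chrom == chrom and region.start <= position <= region.end:
            return region.count
    return 0


lines = ["chr1 795260 795406 226", "chr1 830067 830213 166"]
regions = [parse_region_line(text) for text in lines]
print(unique_count_at(regions, "chr1", 795300))
```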

D. Run "":
1. After the preparation through the steps above, run “” to apply the Gibbs sampling method to assign
each ambiguous tag to a specific genomic site.
2. Command line:

 perl -p path -f mapping_result.file -u unique_mapping.file -o output.bed -l region_length 
-r maximal_tag_number -a ambiguous_confidence -m iteration_number

 -p:  the directory where the files (formatted mapping file & the file for unique tags) are located.
 -f:  the formatted mapping file generated by “”.
 -u:  the file for unique tags generated by “”.
 -o:  name of the output file.
 -l:  the length of the adjacent region for co-located tags. The default value is 147.
 -r:  the maximal tag count used to construct the likelihood table. The default value is 50.
 -a:  the relative confidence of ambiguous tags. The default value is 0.2.
 -m:  the number of iterations. The default value is 5.
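
To give a feel for what a Gibbs sampling assignment of ambiguous tags looks like, here is a toy Python sketch. It is not the algorithm implemented in the Perl script: it does not build the likelihood table controlled by -r, and the weighting formula below is an assumption chosen for illustration. In this toy, each ambiguous tag is repeatedly re-assigned to one of its candidate sites with probability proportional to the unique tag count at that site plus the current number of other ambiguous tags there, down-weighted by a factor loosely modelled on -a. The function name `gibbs_assign` and all parameter names are hypothetical.

```python
import random
from collections import Counter
from typing import Dict, Hashable, List


def gibbs_assign(candidates: Dict[str, List[Hashable]],
                 unique_counts: Dict[Hashable, int],
                 ambiguous_confidence: float = 0.2,
                 iterations: int = 5,
                 seed: int = 0) -> Dict[str, Hashable]:
    """Toy Gibbs-style assignment of each ambiguous tag to one candidate site.

    candidates maps tag id -> list of candidate sites (any hashable key).
    unique_counts maps site -> number of unique tags near that site.
    """
    rng = random.Random(seed)
    # Start from a uniformly random assignment.
    assignment = {tag: rng.choice(sites) for tag, sites in candidates.items()}
    ambiguous_counts = Counter(assignment.values())

    for _ in range(iterations):
        for tag, sites in candidates.items():
            # Remove this tag from the counts before resampling its site.
            ambiguous_counts[assignment[tag]] -= 1
            # Illustrative weight (an assumption, not the authors' likelihood):
            # pseudo-count of 1, unique tags at full weight, other ambiguous
            # tags at reduced weight.
            weights = [
                1.0 + unique_counts.get(site, 0)
                + ambiguous_confidence * ambiguous_counts[site]
                for site in sites
            ]
            new_site = rng.choices(sites, weights=weights, k=1)[0]
            assignment[tag] = new_site
            ambiguous_counts[new_site] += 1
    return assignment


demo = gibbs_assign(
    {"tag_a": [("chr18", 6452351), ("chr4", 66122359)]},
    {("chr18", 6452351): 226},
)
print(demo)
```

Because the sampler is stochastic, results depend on the random seed; a site supported by many unique tags is simply chosen far more often than one with none.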

E. Sample Data:

  1. The sample data are the libraries we used to evaluate the performance of the algorithm.
  2. “”: files with the real genomic sites in the benchmarks. It contains 2 files.
    “benchmark_8lib.bed” is the benchmark for the 8 smaller sequence libraries and “benchmark_biglib.bed” is for the bigger library.
  3. “”: files with the short sequence tags. There are 9 files inside the folder. “tag_lib*.fastq” are tags for the 8 smaller
    libraries respectively. “tag_biglib.fastq” is the file with tags for the bigger library.
  4. “”: contains
    1. “mapping_result_lib*.bed” & “mapping_result_biglib.bed”: the formatted files (generated by “”) with initial mapping results from Bowtie program;
    2. “unique_screen_result_lib*.bed” & “unique_screen_result_biglib.bed”: the files for unique tags (generated by “”).
  5. In order to test the algorithm, files in “” are enough. If you want to make sure the initial mappings are correct, you can run Bowtie on files in “” and then run “” and “”.
  6. Files in the “” can be compared against the final map of tags to evaluate performance.
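
One simple way to score a final tag map against a benchmark is the fraction of tags assigned to their real site. The Python sketch below assumes both the final map and the benchmark have already been loaded into dictionaries from tag id to site; the benchmark file layout is not described here, so the loading step is left out, and the name `assignment_accuracy` is hypothetical.

```python
from typing import Dict, Hashable


def assignment_accuracy(assigned: Dict[str, Hashable],
                        truth: Dict[str, Hashable]) -> float:
    """Fraction of tags present in both maps whose assigned site is the true one."""
    shared = [tag for tag in assigned if tag in truth]
    if not shared:
        return 0.0  # nothing to compare
    correct = sum(1 for tag in shared if assigned[tag] == truth[tag])
    return correct / len(shared)


# Made-up example values, for illustration only.
assigned_sites = {"tag_a": ("chr18", 6452351), "tag_b": ("chr4", 100)}
true_sites = {"tag_a": ("chr18", 6452351), "tag_b": ("chr4", 200)}
print(assignment_accuracy(assigned_sites, true_sites))
```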