Transcription factor search tools




















Search corresponding promoters for your sequence(s): To activate the mapping, simply check the box in this field. You must also use one of the sequence input options. Choose an organism from the drop-down list. If you submitted a gene, all promoters of this gene are extracted directly from the ElDorado genome database. If you submitted sequences, they are mapped against the selected genome in the ElDorado database. The promoters of all transcripts with at least one exon identical to one of the mapped exons match your query.

On the search result page, you may select the promoters for analysis with MatInspector. Some notes on promoter finding: Each promoter can have additional annotation. Depending on the selected MatInspector library, a form with more parameters to fill in will appear.

Note: Certain parameter settings require that the Matrix library is automatically reset, disregarding your selection. The assignment of genes to transcription factors in MatBase depends on the ElDorado database version, i.

The ElDorado version however, is taken into account for the literature-based lines of evidence. Literature analysis is available for the following organisms: all vertebrates, both yeasts, fruitfly and thale cress. This means that, if the literature-based lines of evidence are available, the matrix library which corresponds to the current ElDorado database must be selected.

In particular, this is required if you use the gene input option; the input gene is from one of the organisms listed above; your matrix selection contains a matrix from a matrix group corresponding to this organism i. A selected subset of matrices can also be saved in a personal directory and can be retrieved via the "use previously defined matrix subsets" option. There is a difference between matrix family subsets and individual matrix subsets.

All matrices in a family have the same odd length and have an anchor position assigned, which is the center position of the matrix. This ensures that matrices of a family match at exactly the same position.

If matrix families are selected, MatInspector will only list the best match from a family for each site. Otherwise (individual matrices selected), different but closely related matrices might match at the same position on the sequence (example). The maximum core similarity of 1. Only matches that contain the "core sequence" of the matrix with a score higher than the core similarity are listed in the output. Increasing the core similarity will miss matches that have one or more mismatches in the core region but have a high similarity to the rest of the matrix. This should only be done to enhance the performance of MatInspector.
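The family behaviour described above can be sketched as a simple reduction step. This is only an illustration under stated assumptions, not MatInspector's implementation: the `Match` record, its field names, and the choice of matrix similarity as the ranking score are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Match:
    family: str        # matrix family label (hypothetical)
    matrix: str        # individual matrix name (hypothetical)
    anchor: int        # anchor position of the match on the sequence
    strand: str        # "+" or "-"
    matrix_sim: float  # matrix similarity of this match


def best_per_family(matches):
    """Keep the highest-scoring match for each (family, anchor, strand) site."""
    best = {}
    for m in matches:
        key = (m.family, m.anchor, m.strand)
        # Replace the stored match only if this one scores strictly higher.
        if key not in best or m.matrix_sim > best[key].matrix_sim:
            best[key] = m
    return sorted(best.values(), key=lambda m: (m.anchor, m.family, m.strand))
```

Because all matrices of a family share an anchor position, grouping on the anchor is enough to detect that two related matrices hit the same site.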

Decreasing the core similarity while retaining the same matrix similarity might give a few more matches in the output that have more mismatches in the core region of the matrix. A perfect match to the matrix gets a score of 1. Mismatches in highly conserved positions of the matrix decrease the matrix similarity more than mismatches in less conserved regions.
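To illustrate why a mismatch at a highly conserved position costs more than one at a weakly conserved position, here is a simplified, conservation-weighted score. It is an assumption-laden sketch: the weighting by information content, the toy matrix, and the function names are illustrative choices, and the algorithm details page remains the authoritative description of how MatInspector computes core and matrix similarity.

```python
import math


def conservation(column):
    """Information content (in bits) of one column of base frequencies."""
    return 2.0 + sum(p * math.log2(p) for p in column.values() if p > 0)


def matrix_similarity(matrix, site):
    """Conservation-weighted score of `site`, scaled so a perfect match is 1."""
    if len(site) != len(matrix):
        raise ValueError("site and matrix must have the same length")
    weights = [conservation(col) for col in matrix]
    achieved = sum(w * col.get(base, 0.0)
                   for w, col, base in zip(weights, matrix, site))
    best = sum(w * max(col.values()) for w, col in zip(weights, matrix))
    if best == 0:
        raise ValueError("matrix carries no information")
    return achieved / best


# Toy frequency matrix (hypothetical): position 1 is fully conserved,
# position 3 is only weakly conserved.
TOY_MATRIX = [
    {"A": 1.0, "C": 0.0, "G": 0.0, "T": 0.0},
    {"A": 0.5, "C": 0.5, "G": 0.0, "T": 0.0},
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
]
```

With this toy matrix, `matrix_similarity(TOY_MATRIX, "CAA")` (mismatch at the fully conserved position) scores lower than `matrix_similarity(TOY_MATRIX, "AAC")` (mismatch at the weakly conserved position).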

Increasing the matrix similarity will find fewer matches in your sequence, and might miss matches that do have a "mismatch" compared to the matrix. The matrix similarity is correlated to the re-value of a matrix: A matrix with a high re-value will find more matches, even with a high matrix similarity, than a well-defined matrix (low re-value). Since there are binding sites that are biologically quite "loosely" defined, a high re-value is not necessarily a sign of a "bad" matrix description. A very low re-value might even be a sign of a description that is too strict.

Optimized matrix similarity: Thresholds that minimize false positives for each individual matrix are supplied with our library and can be selected from the pull-down menu (example). There is a limit for the computation of the lines of evidence. For database searches, or if the combined length of all input sequences is above 1 million base pairs, the lines of evidence are not available.

Statistics: Depending on this option, the output can be reduced to contain only the match summary table (no graphics, no match details). For database searches it can be interesting to view only the statistics and not the result list, as the number of matches listed in detail is limited. By default, the graphics are shown for your search results. For database searches, or if there are too many sequences (more than 50), this option is not available.

Also, if a sequence exceeds the maximum length for the map, the map is omitted in the output. For example: If you know the position of the transcription start site (or any position you want to use as the "zero-point" of your sequence), MatInspector can give all matches relative to this position: if a TSP at a known position is given, just enter the corresponding value as an offset!

Warning: Please use this option only for analyses which can be performed in a short time. If the analysis takes longer than the timeout of the webserver, the connection will be terminated and you will receive an error message e.

In this case, the results will not be available; please restart the analysis using the option "Send the URL of the result to" below. The results will be available for a limited time on our server.

For details of how long your results will be kept please see the result-email. After that period they will be deleted unless protected in the project management! For details on the algorithm or how core and matrix similarity are calculated, please see the algorithm details.

The analysis is terminated if the maximum number of matches is found, as a larger number of matches would result in a huge output file that may crash your browser when displayed. Here is an example output for a matrix search using the vertebrate group of matrices: The features of the interactive graphics, e.

Click the symbol to sort the matches by the number of available lines of evidence. See a detailed description of additional lines of evidence for a transcription factor binding site. Sequence color code: For displaying the sequences which match a matrix in the MatInspector library, the following code is used:

Note: Interactive handling of the result tables is deactivated if there are more than 50 sequences, or if the number of matches in one sequence or in all sequences in the output exceeds its limit. Toggling the visibility of table columns: It is possible to hide complete columns of the table. Note that there are some columns hidden by default, i. Hint: If the column "Additional lines of evidence" is available, sort the matches by Evidence to display the most relevant matches at the top of the table.

Filtering rows: The textfields resp. In a textfield, just type a search term.

For the enrichment plots, if a given transcription factor has binding sites in n_S out of N_S search regions and n_B out of N_B background regions, then the plotted values are derived from these counts. By default, ten TFs are shown, but the user can add or remove TFs from the image. There are also options to filter the TFs displayed according to the scan stringency or enrichment P-value, for intuitive exploration of the data.

The scan and enrichment algorithms produce a graphical display of the TFBS locations on the sequences. There are many options to edit the images, including adjusting the deficit and P-value thresholds for displaying TFBSs, selecting or removing TFs to be viewed, editing the colour scheme for TFs and rearranging the order of the sequences.

Promoters or other regulatory regions can be re-arranged or removed and the colour of each TF can be customised for the production of figures. The enrichment analysis produces an additional interactive plot that displays the fold enrichment, average abundance and P-value associated with all TFs (Fig 3).

The images created can be saved as publication-quality files and the binding site data and enrichment statistics can be saved as text files for subsequent analysis using additional tools. The data are derived from the proportion of regions bound for each TF, which is the number of bound regions divided by the total number of regions.

The plot shows the enrichment ratio of proportion bound and average log proportion bound. Underlying data are provided in S1 Data. The multi-threading capabilities of Java are used to take advantage of all available computer processors, significantly improving analysis speeds (S1 Fig); alternatively, CiiiDER can also be restricted to use only a certain number of processors.
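The quantities named above can be sketched from the four counts n_S, N_S, n_B and N_B. This is a minimal illustration, not CiiiDER's code: the base-2 logarithm, the use of a one-sided Fisher's exact test for the P-value, and all function names are assumptions made for the example.

```python
import math


def fisher_one_sided(n_s, N_s, n_b, N_b):
    """P(at least n_s bound search regions) under a hypergeometric null.

    Assumption: a one-sided Fisher's exact test stands in for the
    enrichment P-value; the text does not specify the test used.
    """
    bound = n_s + n_b            # regions with a site, search + background
    total = N_s + N_b            # all regions
    upper = min(bound, N_s)
    tail = sum(math.comb(bound, k) * math.comb(total - bound, N_s - k)
               for k in range(n_s, upper + 1))
    return tail / math.comb(total, N_s)


def enrichment_stats(n_s, N_s, n_b, N_b):
    """Proportion bound, log2 enrichment ratio and average log proportion."""
    if n_s == 0 or n_b == 0:
        raise ValueError("log ratio is undefined when a proportion is zero")
    p_s = n_s / N_s              # proportion of search regions bound
    p_b = n_b / N_b              # proportion of background regions bound
    return {
        "proportion_search": p_s,
        "proportion_background": p_b,
        "log2_enrichment": math.log2(p_s / p_b),
        "avg_log2_proportion": 0.5 * (math.log2(p_s) + math.log2(p_b)),
        "p_value": fisher_one_sided(n_s, N_s, n_b, N_b),
    }
```

For instance, `enrichment_stats(8, 10, 2, 10)` describes a TF with sites in 8 of 10 search regions and 2 of 10 background regions. How zero counts should be handled (here an error) is a design choice; a pseudocount would be an alternative.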

Enrichment of transcription factors is also displayed using interactive HTML plots generated using the Plotly JavaScript library [18]. CiiiDER is available for download as a JAR file with supporting files; other software dependencies do not need to be installed. Genomes and associated GLM files are also available for extracting promoter sequences using gene names or Ensembl IDs.

The program and documentation are available from www. These were selected because they were available in narrow peak format with peak max values, to give the highest probability of focusing on the true binding site, and because matching high-quality TRANSFAC TF models were available.

Sequences corresponding to 50 bases either side of the maximum signal of each ChIP-seq peak were obtained. This length was chosen to allow sufficient sequence to identify TFBSs, while minimising extraneous sequence. Backgrounds were produced by using base genomic sequences 10, bases away from the peak, ensuring that none of the background sequences overlapped with surrounding peaks.
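The window extraction described above can be sketched for narrowPeak input, where column 10 holds the 0-based offset of the peak summit from the start coordinate. This is an illustration only: the function name is hypothetical, and returning a half-open, BED-style interval of 2 x flank bases is an assumption, since the text does not say whether the summit base itself is counted separately.

```python
def summit_window(narrowpeak_line, flank=50):
    """Return (chrom, start, end) spanning `flank` bases either side of the summit.

    Coordinates are 0-based and half-open, as in BED. Column 10 of a
    narrowPeak record is the summit offset from `start` (-1 if not called).
    """
    fields = narrowpeak_line.rstrip("\n").split("\t")
    if len(fields) < 10:
        raise ValueError("expected a 10-column narrowPeak record")
    chrom = fields[0]
    start = int(fields[1])
    peak_offset = int(fields[9])
    if peak_offset < 0:
        raise ValueError("no summit called for this peak")
    summit = start + peak_offset
    # Clamp at zero so windows near the chromosome start stay valid.
    return chrom, max(0, summit - flank), summit + flank
```

The resulting intervals could then be passed to any tool that extracts sequence for BED coordinates.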

CiiiDER scans were performed using deficits of 0. Clover analyses were performed with default values. Robust multiarray averaging (RMA)-normalised microarray data from Bidwell et al. The query gene list consisted of significantly down-regulated genes that were defined as interferon-inducible in mouse using the Interferome v2.

CiiiDER analyses were performed on the promoter sequences using deficits of 0. A background containing 2, human protein-coding genes was used for enrichment. To perform an enrichment analysis for a gene set of interest, it is important to choose an appropriate background [27]. Comparing an experimentally derived, co-expressed gene list to a genome-wide background may lead to the enrichment of some TFs that are not specifically related to the experiment.

For example, if the query were promoters of genes showing a significant change in expression following a particular treatment, then an appropriate background might be promoters of genes that were expressed in the appropriate cell or tissue type, but showed no significant response to the stimulation [9].
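One way to derive such a query and background from a differential expression table is sketched below. The record fields (`name`, `expr`, `padj`) and both thresholds are hypothetical placeholders, not values taken from the study.

```python
def split_query_background(genes, min_expr=1.0, alpha=0.05):
    """Split expressed genes into responders (query) and non-responders (background).

    `genes` is an iterable of dicts with hypothetical keys:
      name - gene identifier
      expr - expression level in the relevant cell or tissue type
      padj - adjusted P-value for the response to the treatment
    """
    query, background = [], []
    for g in genes:
        if g["expr"] < min_expr:
            continue  # not expressed in this system: excluded from both sets
        if g["padj"] < alpha:
            query.append(g["name"])
        else:
            background.append(g["name"])
    return query, background
```

Excluding unexpressed genes from the background keeps the comparison within the set of genes that could have responded in the experimental system.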

The ability of CiiiDER to predict key regulatory TFs was demonstrated by reanalysing a published study of the regulation of the immune system in breast cancer [23].

Bidwell et al. A set of approximately 3, genes were down-regulated in the metastasised cells relative to the primary tumour, of which were determined to be interferon-regulated genes IRGs from the Interferome v1.

In that study, Clover was used to show that these genes were enriched for Irf7 binding sites. The role of Irf7 was confirmed by showing an increase in interferon signalling and a reduction of tumour metastases following restoration of Irf7 expression in the tumour cells in the bone metastasis model.

We reanalysed the normalised expression data from this experiment to create a list of IRGs down-regulated in metastases using the updated Interferome v2.

It is often difficult to accurately distinguish between TFBSs of TFs belonging to a family, since their binding site preferences can be very similar. In this case, cross-referencing with the published expression data revealed that Irf7 was the most significantly suppressed IRF-family TF in metastases, which added supporting evidence to its role.

Since gene orthologues often retain similar functions throughout evolution and maintain a similar method of regulation [29], CiiiDER could potentially be used to examine phylogenetic conservation, through prediction of enriched TFs, and by creating visualisations to help distinguish patterns in TFBSs. This identified a great number of potential TFBSs for hundreds of TFs (see Fig 2), many of which are likely to be false positives, which makes it difficult to identify likely candidate transcriptional regulators.

The top ten over-represented TFs that occurred in at least half of the promoters were selected for display (Fig 4). These are the most significant TFs that are predicted in all promoters, whereas other top significant TFs do not show the same consistent pattern. The results of the enrichment algorithm, displaying the ten most significantly enriched TFs present in at least half of the promoters.
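The selection rule described above (over-represented, present in at least half of the promoters, ranked by significance) can be sketched as a filter over per-TF statistics. The dictionary keys and the function name are hypothetical, and ranking by P-value is an assumption about how "top" is defined.

```python
def top_enriched(tf_stats, n_promoters, top_n=10, min_fraction=0.5):
    """Return the names of the most significant over-represented TFs.

    `tf_stats` is a list of dicts with hypothetical keys:
      tf                     - transcription factor name
      log2_enrichment        - positive when over-represented in the query
      p_value                - enrichment P-value
      n_promoters_with_site  - number of query promoters with a predicted site
    """
    kept = [
        t for t in tf_stats
        if t["log2_enrichment"] > 0
        and t["n_promoters_with_site"] >= min_fraction * n_promoters
    ]
    kept.sort(key=lambda t: t["p_value"])  # most significant first
    return [t["tf"] for t in kept[:top_n]]
```

Applying the coverage filter before ranking means a highly significant TF found in only a few promoters does not displace one that is consistently present.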

Underlying data are provided in S3 Data. The combination of enrichment analysis and effective visualisation can allow rapid identification of TFBSs that are phylogenetically and spatially conserved. This gives greater support when choosing candidate TFs that are most likely to be involved in regulatory elements. The power of CiiiDER analyses can be increased by linking the results to other data.

As with the breast cancer example, it is worth considering all members of a TF family when choosing TFs for further validation. TFBS enrichment results may be assessed in the context of gene expression data to determine which TFs are detectable or have altered expression levels in the experimental system of interest. It is user-friendly and produces quality visual outputs to assist researchers to uncover signalling pathways and their controlling TFs in a wide variety of biological contexts.

The program, user manual and example data are available at www. Gene sets were loaded into the GUI and promoters were obtained (A), TFBSs were predicted across the query promoters (B) and collated (C), background sites were predicted (D), the enrichment calculation was performed (E) and the final graphical outputs were created (F).

The site prediction steps take advantage of multiple computer processors. The maximum memory usage was 4. Measurements were made on an iMac with four i7 4. The curves represent the ratio of true binding sites predicted against the number of false binding sites predicted.

Build: hg19, mm9. Alignment: multiz46way, multiz30way. Acknowledgement: Development of motifmap is made possible by funding from the National Science Foundation and the National Institutes of Health.

Select species track and click next. Select species alignment and click next. Enter a gene name (official gene symbol) or refSeq transcript id and click next.


