▼ Public Database integrated in iPCD
(1) Genetic variation & mutation
1. TCGA: A publicly funded project that aims to catalogue and discover major cancer-causing genomic alterations to create a comprehensive "atlas" of cancer genomic profiles (Hutter, et al., 2018).
2. ICGC: To obtain a comprehensive description of genomic, transcriptomic and epigenomic changes in 50 different tumor types and/or subtypes which are of clinical and societal importance across the globe (Zhang, et al., 2019).
3. COSMIC: The world's largest and most comprehensive resource for exploring the impact of somatic mutations in human cancer (Tate, et al., 2019).
4. dbSNP: The NCBI database of genetic variation (Sherry, et al., 2001).
5. IntOGen: Integration and data mining of multidimensional oncogenomic data (Rubio-Perez, et al., 2015).
6. MIMP: Characterizes genetic variants such as cancer mutations that specifically alter kinase-binding sites in proteins (Wagih, et al., 2015).
7. VarCards: An integrated genetic and clinical database for coding variants in the human genome (Li, et al., 2018).
(2) Functional annotation
1. DrLLPS: Contained 437,887 known and computationally detected LLPS-associated proteins in 164 eukaryotic species (Ning, et al., 2020).
2. iEKPD: Contained 197,348 phosphorylation regulators, including 109,912 protein kinases, 23,294 protein phosphatases and 68,748 PPBD-containing proteins in 164 eukaryotic species (Guo, et al., 2019).
3. iUUCD: Contained 136,512 UB/UBL regulators, including 1,230 E1s, 5,636 E2s, 93,343 E3s, 9,548 DUBs, 30,173 UBDs and 11,099 ULDs in 148 eukaryotic species (Zhou, et al., 2018).
4. WERAM: Collected over 580 experimentally identified histone regulators from eight model organisms (Xu, et al., 2017).
5. AnimalTFDB 3.0: A comprehensive resource for annotation and prediction of animal transcription factors (Hu, et al., 2019).
6. PlantTFDB 4.0: Plant Transcription Factor Database (Jin, et al., 2017).
7. HAMAP: High-quality Automated and Manual Annotation of Proteins (Pedruzzi, et al., 2015).
8. neXtProt: Now has proteomics data for over 85% of the human proteins, as well as new tools tailored to the proteomics community (Gaudet, et al., 2017).
9. CGDB: Containing ~73,000 circadian-related genes in 68 animals, 39 plants and 41 fungi (Li, et al., 2017).
10. MultitaskProtDB-II: A database of multitasking/moonlighting proteins (Franco-Serrano, et al., 2018).
11. MoonDB 2.0: An updated database of extreme multifunctional and moonlighting proteins (Ribeiro, et al., 2019).
12. CORUM: The comprehensive resource of mammalian protein complexes (Giurgiu, et al., 2019).
13. CellMarker: A manually curated resource of cell markers in human and mouse (Zhang, et al., 2019).
14. GPCRdb: A database of G protein-coupled receptors (GPCRs) (Pándy-Szekeres, et al., 2018).
15. EuRBPDB: A total of 311,571 RNA binding proteins with RNA binding domains and 3,651 non-canonical RNA binding proteins without known RNA binding domains in 162 eukaryotic species (Liao, et al., 2020).
(3) Structural annotation
1. PDB: Contains 41,599 distinct protein sequences, 36,830 structures of human sequences and 9,465 nucleic acid containing structures (Burley, et al., 2019).
2. SCOP2: A prototype of a new structural classification of proteins (Andreeva, et al., 2014).
3. IUPred: Web server for the prediction of intrinsically unstructured regions of proteins based on estimated energy content (Dosztányi, et al., 2005).
4. DisProt: The Database of Protein Disorder provides manually curated annotations of intrinsically disordered proteins (Hatos, et al., 2020).
5. DNAproDB: An expanded database and web-based tool for structural analysis of DNA-protein complexes that contains 95% of all available DNA-protein complexes (Sagendorf, et al., 2020).
(4) Physicochemical property
1. AAindex: A database of numerical indices representing various physicochemical and biochemical properties of amino acids and pairs of amino acids (Kawashima, et al., 2008).
2. Compute pI/Mw: A tool that computes the theoretical pI (isoelectric point) and Mw (molecular weight) for a list of UniProt Knowledgebase (Swiss-Prot or TrEMBL) entries or for user-entered sequences (Wilkins, et al., 1999).
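As a rough illustration of what a theoretical Mw and pI calculation involves, the sketch below sums average residue masses and finds the pH at which the net charge is zero by bisection. It is not the Compute pI/Mw implementation: the function names are hypothetical, and the pKa table is an illustrative set chosen for this example, so the pI it returns may differ from the tool's output.

```python
# Average residue masses in Da (amino acid minus one water molecule).
RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.0153  # one water is added back for the free N- and C-termini

# ASSUMPTION: illustrative pKa values, not the set used by Compute pI/Mw.
PKA_POS = {"Nterm": 8.6, "K": 10.8, "R": 12.5, "H": 6.5}
PKA_NEG = {"Cterm": 3.6, "D": 3.9, "E": 4.1, "C": 8.5, "Y": 10.1}


def molecular_weight(seq):
    """Average molecular weight (Da) of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in seq) + WATER


def net_charge(seq, ph):
    """Net charge at a given pH (Henderson-Hasselbalch, independent groups)."""
    charge = 1.0 / (1.0 + 10 ** (ph - PKA_POS["Nterm"]))
    charge -= 1.0 / (1.0 + 10 ** (PKA_NEG["Cterm"] - ph))
    for aa in seq:
        if aa in PKA_POS:
            charge += 1.0 / (1.0 + 10 ** (ph - PKA_POS[aa]))
        elif aa in PKA_NEG:
            charge -= 1.0 / (1.0 + 10 ** (PKA_NEG[aa] - ph))
    return charge


def isoelectric_point(seq):
    """pH at which the net charge is zero, located by bisection on [0, 14]."""
    low, high = 0.0, 14.0
    for _ in range(60):
        mid = (low + high) / 2.0
        if net_charge(seq, mid) > 0:
            low = mid   # still positively charged: pI lies at higher pH
        else:
            high = mid
    return (low + high) / 2.0
```

Bisection works here because the net charge decreases monotonically as pH rises, so there is exactly one zero crossing between pH 0 and pH 14.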
(5) Functional domain
1. Pfam: A widely used database of protein families, containing 14,831 manually curated entries in the current release (El-Gebali, et al., 2019).
2. PROSITE: Consists of documentation entries describing protein domains, families and functional sites, as well as associated patterns and profiles to identify them (Sigrist, et al., 2013).
3. InterPro: Provides functional analysis of proteins by classifying them into families and predicting domains and important sites (Mitchell, et al., 2019).
4. PIRSF: Reflects evolutionary relationships of full-length proteins and domains (Nikolskaya, et al., 2007).
5. PRINTS: A collection of diagnostic protein family 'fingerprints' (Attwood, et al., 2012).
(6) Post-translational modification
1. EPSD: Contained 1,616,804 experimentally identified p-sites in 209,326 phosphoproteins from 68 eukaryotic species (Lin, et al., 2021).
2. PLMD: Integrated 284,780 modification events in 53,501 proteins across 176 eukaryotes and prokaryotes for up to 20 types of protein lysine modifications (Xu, et al., 2017).
3. dbPTM: A total of 908,917 experimentally verified PTM sites (Huang, et al., 2019).
4. PhosphoSitePlus: A knowledgebase dedicated to mammalian post-translational modifications (PTMs), contains over 330,000 non-redundant PTMs, including phospho, acetyl, ubiquityl and methyl groups (Hornbeck, et al., 2015).
5. iPTMnet: Contains more than 654 500 unique PTM sites in over 62 100 proteins, along with more than 1200 PTM enzymes and over 24 300 PTM enzyme-substrate-site relations (Huang, et al., 2018).
6. HPRD: The human protein reference database that contains a lot of annotations including PTMs (Goel, et al., 2012).
(7) Disease-associated information
1. ClinVar: A public archive of reports of the relationships among human variations and phenotypes with supporting evidence (Landrum, et al., 2018).
2. GWASdb: Collected 2,479 unique publications from PubMed and other resources, generated a total of 252,530 unique TASs, mapped 1,610 GWAS traits to 501 Human Phenotype Ontology (HPO) terms, 435 Disease Ontology (DO) terms and 228 Disease Ontology Lite (DOLite) terms (Li, et al., 2016).
3. SNPdbe: A database and web interface designed to fill the annotation gap left by the high cost of experimental testing for functional significance of protein variants (Schaefer, et al., 2012).
4. ActiveDriverDB: Human disease mutations and genome variation in post-translational modification sites of proteins (Krassowski, et al., 2018).
5. BioMuta: An integrated sequence feature database, provides a framework for automated and manual curation and integration of cancer-related sequence features so that they can be used in NGS analysis pipelines (Dingerdissen, et al., 2018).
6. Kin-Driver: A human kinase database with driver mutations (Simonetti, et al., 2014).
7. OMIM: A comprehensive, authoritative and timely research resource of curated descriptions of human genes and phenotypes and the relationships between them (Amberger, et al., 2019).
8. PTMD: A Database of Human Disease-associated Post-translational Modifications (Xu, et al., 2018).
9. MSDD: miRNA SNP Disease Database (Yue, et al., 2018).
10. DisGeNET: A platform integrating information on human disease-associated genes and variants (Piñero, et al., 2017).
11. DiseaseEnhancer: A resource of human disease-associated enhancer catalog (Zhang, et al., 2018).
(8) Protein-protein interaction
1. IID: A major replacement of the I2D interaction database, with larger PPI networks (a total of 1,566,043 PPIs among 68,831 proteins) (Kotlyar, et al., 2019).
2. BioGRID: Contains records for 1 598 688 biological interactions manually annotated from 55 809 publications for 71 species (Oughtred, et al., 2019).
3. iRefIndex: A consolidated protein interaction database with provenance (Razick, et al., 2008).
4. PINA: Including multiple collections of interaction modules identified by different clustering approaches from the whole network of protein interactions ('interactome') for six model organisms (Cowley, et al., 2012).
5. HINT: High-quality protein interactomes and their applications in understanding human disease (Das, et al., 2012).
6. Mentha: A resource for browsing integrated protein-interaction networks (Calderone, et al., 2013).
7. inBio Map™: Contains >500,000 protein-protein interactions and has been used for functional interpretation of >4,700 cancer genomes and genes involved in autism (Li, et al., 2017).
8. STRING: A database of known and predicted protein-protein interactions, covers 9,643,763 proteins from 2,031 organisms (Szklarczyk, et al., 2019).
9. TIMBAL: A database holding molecules of molecular weight <1200 Daltons that modulate protein–protein interactions (Higueruelo, et al., 2013).
(9) Drug-target relation
1. TTD: Contains 2,025 targets, including 364 successful, 286 clinical trial, 44 discontinued and 1,331 research targets, 17,816 drugs, including 1,540 approved, 1,423 clinical trial, 14,853 experimental drugs and 3,681 multi-target agents (Li, et al., 2018).
2. DrugBank: Contains 9,591 drug entries including 2,037 FDA-approved small molecule drugs, 241 FDA-approved biotech (protein/peptide) drugs, 96 nutraceuticals and over 6,000 experimental drugs (Wishart, et al., 2018).
3. GtoPdb: Providing pharmacological, chemical, genetic, functional and pathophysiological data on the targets of approved and experimental drugs (Harding, et al., 2018).
4. ADReCS-Target: Provides comprehensive information for illustrating ADRs caused by drug interactions with protein, gene and genetic variation (Zhang, et al., 2007).
5. ECOdrug: A database connecting drugs and conservation of their targets across species (Verbruggen, et al., 2018).
6. DGIdb 3.0: Consolidates, organizes and presents drug-gene interactions and gene druggability information from papers, databases and web resources (Cotto, et al., 2018).
7. CTD: A free resource that provides manually curated information on chemical, gene, phenotype, and disease relationships to advance understanding of the effect of environmental exposures on human health (Grondin, et al., 2019).
8. DrugCentral: A drug information resource (Ursu, et al., 2019).
(10) Orthologous information
1. InParanoid: Provides a user interface to orthologs inferred by the InParanoid algorithm (Sonnhammer, et al., 2015).
2. OMA: A leading resource to relate genes across many species from all of life (Altenhoff, et al., 2018).
3. OrthoDB: A comprehensive catalog of orthologs, genes inherited by extant species from a single gene in their last common ancestor (Kriventseva, et al., 2019).
(11) Biological pathway
1. KEGG: A database resource for understanding high-level functions and utilities of the biological system, such as the cell, the organism and the ecosystem, from molecular-level information, especially large-scale molecular datasets generated by genome sequencing and other high-throughput experimental technologies (Kanehisa, et al., 2019).
2. SignaLink: An integrated resource to analyze signaling pathway cross-talks, transcription factors, miRNAs and regulatory enzymes (Fazekas, et al., 2013).
3. PathBank: A new, comprehensive, visually rich pathway database containing more than 110 000 machine-readable pathways found in 10 model organisms (Wishart, et al., 2020).
4. Reactome: Provides molecular details of signal transduction, transport, DNA replication, metabolism, and other cellular processes as an ordered network of molecular transformations (an extended version of a classic metabolic map) in a single consistent data model (Fabregat, et al., 2018).
(12) Transcriptional regulator
1. TRRUST: An expanded reference database of human and mouse transcriptional regulatory interactions (Han, et al., 2018).
2. HEDD: An integrated human enhancer disease database (Wang, et al., 2018).
3. DroID: A comprehensive, integrated resource for protein, transcription factor, RNA and gene interactions for Drosophila (Murali, et al., 2011).
4. YTRP: A repository of yeast transcriptional regulatory pathways (TRPs) that aims to find the TRP information for TF-gene regulatory pairs identified by TF perturbation experiments (TFPEs) (Yang, et al., 2014).
5. RegNetwork: Gene regulatory networks for human and mouse by collecting the documented regulatory interactions among TFs, miRNAs and target genes (Liu, et al., 2015).
6. TCGA: Contains the DNA methylation data of 37 cancer types (Hutter, et al., 2018).
(13) mRNA expression
1. TCGA: The Cancer Genome Atlas (TCGA) has generated comprehensive, multi-dimensional maps of the key genomic changes in 33 types of cancer. The TCGA dataset, 2.5 petabytes of data describing tumor tissue and matched normal tissues from more than 11,000 patients, is publicly available and has been used widely by the research community (Hutter, et al., 2018).
2. ICGC: The Data Portal currently contains data from 24 cancer projects, and consists of 3478 genomes and 13 cancer types and subtypes (Zhang, et al., 2019).
3. COSMIC: Describes 2 002 811 coding point mutations in over one million tumor samples and across most human genes (Tate, et al., 2019).
4. ArrayExpress: An archive of functional genomics data (Athar, et al., 2019).
5. GXD: The mouse Gene Expression Database (Smith, et al., 2019).
6. BioXpress: A curated gene expression and disease association database (Dingerdissen, et al., 2018).
7. TissGDB: Tissue-specific Gene DataBase in cancer (Kim, et al., 2018).
8. FFGED: The filamentous fungal gene expression database (Zhang, et al., 2010).
9. TISSUES 2.0: An integrative web resource on mammalian tissue expression (Palasca, et al., 2018).
(14) Protein expression/Proteomics
1. The Human Protein Atlas (HPA): 11,200 unique proteins corresponding to over 50% of all human protein-encoding genes have been analysed (Uhlen, et al., 2017).
2. Human Proteome Map (HPM): Including 30 histologically normal human samples, resulted in identification of proteins encoded by 17,294 genes (Kim, et al., 2014).
(15) Subcellular localization
1. NLSdb: Nuclear Localization Signals (Bernhofer, et al., 2018).
2. COMPARTMENTS: Unification and visualization of protein subcellular localization evidence (Binder, et al., 2014).
3. Membranome: The Membranome 2.0 database contains 2129 TM α-helical homodimers of bitopic proteins from six species (Lomize, et al., 2018).
4. Translocatome: Translocating human proteins (Mendik, et al., 2019).
(16) DNA & RNA Element
1. circBase: A database for circular RNAs (Glažar, et al., 2014).
2. TransCirc: Contains information of >300 000 circRNAs together with multi-omics evidence from published literatures to support circRNA translations (Huang, et al., 2021).
3. TargetScan: Predicts biological targets of miRNAs by searching for the presence of conserved 8mer, 7mer, and 6mer sites that match the seed region of each miRNA (Agarwal, et al., 2015).
4. miRWalk 3.0: The new version of miRWalk stores predicted data obtained with a machine learning algorithm, including experimentally verified miRNA-target interactions (Sticht, et al., 2018).
5. miRcode: Provides "whole transcriptome" human microRNA target predictions based on the comprehensive GENCODE gene annotation, including >10,000 long non-coding RNA genes (Jeggari, et al., 2012).
6. SEA 3.0: Consisting of 164 545 super-enhancers in 11 species (Chen, et al., 2020).
7. miRNAMap: Genomic maps of microRNA genes and their target genes in mammalian genomes (Hsu, et al., 2006).
8. miRecords: A resource for animal miRNA-target interactions that consists of two components (Xiao, et al., 2008).
9. miRTarBase: A resource for experimentally validated microRNA-target interactions (Chou, et al., 2018).
10. RNAInter: Integrates >41 million RNA-associated interactions across 154 species (Lin, et al., 2020).
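
The seed-site search described for TargetScan in the list above can be sketched in a few lines. The example below assumes the commonly used site definitions (6mer: pairing to miRNA nucleotides 2-7; 7mer-m8: additional pairing at nucleotide 8; 7mer-A1: an A opposite miRNA nucleotide 1; 8mer: both). The function names are hypothetical, and the sketch ignores conservation and context scoring, so it is not a substitute for TargetScan itself.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna):
    """Reverse complement of an RNA string (Watson-Crick pairs only)."""
    return "".join(COMPLEMENT[base] for base in reversed(rna))


def classify_seed_sites(mirna, utr):
    """Return (position, site_type) for every seed match in a 3' UTR.

    Both sequences are read 5'->3'; DNA input (T) is converted to RNA (U).
    Position is the 0-based index of the 6-nt core match in the UTR.
    """
    mirna = mirna.upper().replace("T", "U")
    utr = utr.upper().replace("T", "U")

    core = reverse_complement(mirna[1:7])  # pairs with miRNA nt 2-7
    m8_base = COMPLEMENT[mirna[7]]         # UTR base that pairs with miRNA nt 8

    sites = []
    start = utr.find(core)
    while start != -1:
        end = start + len(core)
        # The nt-8 match sits immediately 5' of the core in the UTR.
        has_m8 = start > 0 and utr[start - 1] == m8_base
        # The A1 position sits immediately 3' of the core in the UTR.
        has_a1 = end < len(utr) and utr[end] == "A"

        if has_m8 and has_a1:
            site_type = "8mer"
        elif has_m8:
            site_type = "7mer-m8"
        elif has_a1:
            site_type = "7mer-A1"
        else:
            site_type = "6mer"
        sites.append((start, site_type))
        start = utr.find(core, start + 1)
    return sites


# Example with the let-7a sequence and two toy UTR fragments.
LET7A = "UGAGGUAGUAGGUUGUAUAGUU"
print(classify_seed_sites(LET7A, "AAACUACCUCAGGG"))  # [(4, '8mer')]
print(classify_seed_sites(LET7A, "GGUACCUCGG"))      # [(2, '6mer')]
```

The core match is searched first and then extended in both directions, so each site is reported once under its most specific type.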