Code & Data


Adeft

Adeft (Acromine based Disambiguation of Entities From Text context) is a utility for building models that disambiguate acronyms and other abbreviations of biological terms in the scientific literature. It uses an implementation of the Acromine algorithm, developed by NaCTeM at the University of Manchester, to identify possible longform expansions for shortforms in a text corpus. Users can then train models that disambiguate shortforms based on their text context. A growing number of pretrained disambiguation models are publicly available for download through Adeft.
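To make the idea of finding longform candidates for a shortform concrete, here is a minimal sketch in plain Python. It is not Adeft's API and not the Acromine scoring procedure: it only counts the word sequences that directly precede a parenthesized shortform, which is the kind of evidence such methods start from. The function name `candidate_longforms`, the `max_words` parameter, and the example sentences are all hypothetical.

```python
import re
from collections import Counter


def candidate_longforms(texts, shortform, max_words=4):
    """Count word sequences that directly precede '(shortform)' in the texts.

    Toy illustration only (hypothetical helper, not part of Adeft): every
    suffix of the preceding words is counted as a candidate longform.
    """
    # Up to max_words whitespace-separated tokens followed by "(shortform)".
    pattern = re.compile(
        r"((?:[\w-]+\s+){1,%d})\(%s\)" % (max_words, re.escape(shortform))
    )
    counts = Counter()
    for text in texts:
        for match in pattern.finditer(text):
            words = match.group(1).split()
            # Count each suffix: "reticulum", "endoplasmic reticulum", ...
            for n in range(1, len(words) + 1):
                counts[" ".join(words[-n:]).lower()] += 1
    return counts


# Made-up example sentences in which "ER" has two different meanings.
example_texts = [
    "Stress in the endoplasmic reticulum (ER) triggers signaling.",
    "The estrogen receptor (ER) binds estradiol.",
    "Proteins fold in the endoplasmic reticulum (ER).",
]
for phrase, count in candidate_longforms(example_texts, "ER").most_common(5):
    print(count, phrase)
```

A real pipeline would go on to score these candidates and then train a classifier on the surrounding text; this sketch stops at candidate counting.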

External References

Adeft: Acromine-based Disambiguation of Entities from Text with applications to the biomedical literature.

Steppi A, Gyori BM, Bachman JA.

Journal of Open Source Software. 2020. 5(45):1708. doi: 10.21105/joss.01708.


ProteinNet

ProteinNet is a standardized data set for machine learning of protein structure. It provides protein sequences, structures (secondary and tertiary), multiple sequence alignments (MSAs), position-specific scoring matrices (PSSMs), and standardized training / validation / test splits. ProteinNet builds on the biennial CASP assessments, which carry out blind predictions of recently solved but publicly unavailable protein structures, to provide test sets that push the frontiers of computational methodology. It is organized as a series of data sets, spanning CASP 7 through 12 (covering a ten-year period), to provide a range of data set sizes that enable assessment of new methods in relatively data-poor and data-rich regimes.

External References

ProteinNet: a standardized data set for machine learning of protein structure.

AlQuraishi M.

BMC Bioinformatics. 2019. 20(1):311. doi: 10.1186/s12859-019-2932-0. PMID: 31185886.

GR Calculator

The Growth Rate inhibition (GR) Calculator is an open source set of Python, R and on-line tools for quantifying the responses of cancer cells to drugs in a manner that corrects for the confounding effects of variable cell proliferation rates. Response metrics computed from GR data include GR50 and GRmax and are direct analogues of familiar IC50 and Emax response measures.
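As an illustration of how the correction for proliferation rate works, the sketch below computes a single GR value from cell counts using the definition given in Hafner et al. (2016), GR(c) = 2^(log2(x(c)/x0) / log2(x_ctrl/x0)) - 1, where x(c) is the treated cell count at the end of the assay, x_ctrl is the untreated control count at the same time, and x0 is the count at the time of treatment. The function name `gr_value` and its arguments are hypothetical and are not the interface of the GR Calculator packages.

```python
import math


def gr_value(treated, control, initial):
    """Return the GR value for one drug concentration.

    treated: cell count at the end of the assay under drug treatment
    control: cell count at the end of the assay in the untreated control
    initial: cell count at the time of treatment
    (Hypothetical helper for illustration, not the GR Calculator API.)
    """
    if min(treated, control, initial) <= 0:
        raise ValueError("cell counts must be positive")
    if control == initial:
        raise ValueError("control did not grow; GR value is undefined")
    # Ratio of the treated growth rate to the control growth rate.
    rate_ratio = math.log2(treated / initial) / math.log2(control / initial)
    return 2 ** rate_ratio - 1


# Control cells quadruple from 100 to 400 during the assay.
print(gr_value(400, 400, 100))  # treated cells grow like the control
print(gr_value(100, 400, 100))  # treated cells do not grow at all
print(gr_value(50, 400, 100))   # treated cells are lost
```

With this definition a GR value of 1 means no drug effect, 0 means complete growth arrest (cytostasis), and negative values indicate net cell loss. GR50 is then the concentration at which a fitted GR curve crosses 0.5.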

External References

Growth rate inhibition metrics correct for confounders in measuring sensitivity to cancer drugs.

Hafner M, Niepel M, Chung M, Sorger PK.

Nat Methods. 2016. 13(6):521-7. doi: 10.1038/nmeth.3853. PMID: 27135972.

Alternative drug sensitivity metrics improve preclinical cancer pharmacogenomics.

Hafner M, Niepel M, Sorger PK.

Nat Biotechnol. 2017. 35(6):500-502. doi: 10.1038/nbt.3882. PMID: 28591115.


ASHLAR

ASHLAR (Alignment by Simultaneous Harmonization of Layer/Adjacency Registration) is an open source Python package that stitches together successive microscopy image tiles to generate a single, seamless image. ASHLAR also registers images from different fluorescent channels with high accuracy.

Dye Drop

Reliable high-throughput imaging of cells grown in multi-well plates is complicated by the loss of cells during staining and wash steps. The Dye Drop method uses a series of increasingly dense solutions to prevent cell loss. Dye Drop software consists of Python tools for determining the viability and cell cycle states of cells before and after drug treatment.

Terms of use
    • The CCSP provides all its content, tools, and data on an “as is” basis, without warranty or representation of any kind, express or implied. If you use code, data or other content from this website, you must accept the terms on this page.
    • We reserve the right to modify these terms at any time. Unless otherwise specified, text, images and data on the site are provided under a Creative Commons Attribution ShareAlike (CC BY-SA) license. Scientific manuscripts carry the licenses of their publishers.
    • The CCSP aims to provide timely public access to all relevant data, software, and tools in accordance with the data release policy of the NCI Cancer Systems Biology Consortium.
    • Note that data released before publication are potentially subject to confounders, batch effects, and other errors that may not have been identified or controlled. While the CCSP works hard to ensure the reproducibility of its results, use of pre-publication data entails extra risks.
    • Specialized reagents (e.g. cell lines, plasmids etc.) are generally made available via standard repositories (e.g. Addgene). In other cases, contact us at the address below. Reagents that are not in repositories will usually require an MTA prior to distribution. We do not redistribute materials that originated with commercial vendors or outside research groups; we will not respond to requests about such materials.
    • Users are expected to acknowledge the following in all oral or written presentations, disclosures, or publications of the data or analyses provided by the CCSP.
      • For unpublished data found on this site:
        The Harvard Medical School CCSP and the funding source that supported the work: NIH grant U54-CA225088 (e.g. “[x] data were provided by the Harvard Medical School Center for Cancer Systems Pharmacology, funded by NIH grant U54-CA225088.”)
      • For published data:
        The normal rules of scientific citation apply. Whenever feasible we use open source licenses for our manuscripts and data. Data are also routinely deposited in GEO, PRIDE, and other repositories. Large-scale imaging data remain hard to share; please contact us if you need access.
      • For software:
        Software is generally released under an MIT Open Source license. See for details.


If you have specific questions or comments about HMS CCSP data, software, and tools, contact us at