Class MDGenerator

  • Direct Known Subclasses:
    BCUTGenerator, CFGenerator, ECFPGenerator, PFGenerator, RFGenerator, ShapeGenerator

    @PublicAPI
    public abstract class MDGenerator
    extends Object
    Base class for all kinds of MolecularDescriptor generators. It serves two purposes: (1) it defines the interface for all generator classes (that is, which methods must be implemented), and (2) it implements functions that gather statistical data on the descriptors generated, together with retrieval functions for these statistics.
    Since:
    JChem 2.1
    • Field Detail

      • createStatistics

        protected boolean createStatistics
        Indicates whether statistical data is to be gathered during generation.
      • molCount

        protected int molCount
        Variables in which the statistical data is collected.
      • minNonEmptyPercent

        protected float minNonEmptyPercent
      • minNonEmptyId

        protected int minNonEmptyId
      • maxNonEmptyPercent

        protected float maxNonEmptyPercent
      • maxNonEmptyId

        protected int maxNonEmptyId
      • sumNonEmptyPercent

        protected float sumNonEmptyPercent
      • freqCount

        protected int[] freqCount
      • density

        protected int[] density
    • Constructor Detail

      • MDGenerator

        public MDGenerator()
        Creates an object.
    • Method Detail

      • generate

        public abstract String[] generate​(Molecule m,
                                          MolecularDescriptor d)
                                   throws MDGeneratorException
        Generates the molecular descriptor for the given molecule. The MolecularDescriptor provided is updated (thus it has to be allocated and initialized by the client of this class).
        Parameters:
        m - molecule for which the descriptor is created
        d - the generated descriptor
        Returns:
        names of tags (properties) added
        Throws:
        MDGeneratorException - in the case of any failures to generate the descriptor
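The calling convention described above (the caller allocates the descriptor, the generator fills it in and returns the names of any added tags) can be illustrated with a small self-contained sketch. Everything in it is a hypothetical stand-in, not the JChem API: the `Descriptor` and `Generator` types, the toy `CharCountGenerator`, and the use of a plain `String` in place of a `Molecule` are assumptions made only to show the pattern.

```java
import java.util.Arrays;

// Hypothetical stand-ins that only mirror the calling convention; not the JChem classes.
class GenerateContractSketch {

    // Stand-in for a descriptor: a fixed-length array of cells allocated by the caller.
    static class Descriptor {
        final int[] cells;

        Descriptor(int length) {
            cells = new int[length];
        }
    }

    // Stand-in for a generator: fills the caller's descriptor and returns the added tag names.
    abstract static class Generator {
        abstract String[] generate(String molecule, Descriptor d);
    }

    // Toy generator: counts each character into the cell at (character code modulo length).
    static class CharCountGenerator extends Generator {
        @Override
        String[] generate(String molecule, Descriptor d) {
            Arrays.fill(d.cells, 0); // overwrite whatever the descriptor held before
            for (char c : molecule.toCharArray()) {
                d.cells[c % d.cells.length]++;
            }
            return new String[0]; // this toy generator adds no tags
        }
    }

    public static void main(String[] args) {
        Descriptor d = new Descriptor(8);            // allocated by the client
        Generator g = new CharCountGenerator();
        String[] tags = g.generate("CCO", d);        // the descriptor is updated in place
        System.out.println(Arrays.toString(d.cells) + " tags=" + tags.length);
    }
}
```

The point of the pattern is that the generator never creates the descriptor itself, so the client controls its length and lifetime.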
      • setCreateStatistics

        public void setCreateStatistics​(boolean createStatistics)
        Toggles the create statistics flag.
        Parameters:
        createStatistics - new value for the create statistics flag
        Since:
        JChem 2.1
      • updateStatistics

        protected void updateStatistics​(MolecularDescriptor d)
        Updates the statistics gathered on the generated descriptors.
        Parameters:
        d - newly generated MolecularDescriptor
        Since:
        JChem 2.1
      • calcFreqCount

        protected int calcFreqCount​(MolecularDescriptor d)
        Calculates the absolute frequency count of each cell and stores the counts in freqCount[]. Also returns the number of non-zero cells in the descriptor.
        Parameters:
        d - descriptor in which non-zero cells should be counted
        Returns:
        number of non-zero cells
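The counting step described above can be sketched as follows. This is a minimal illustration and not the library's implementation: the descriptor is modeled as a plain `int[]` of cell values, and the class name `FreqCountSketch` and the two-argument signature are hypothetical.

```java
import java.util.Arrays;

// Hypothetical stand-in; not part of the JChem API.
class FreqCountSketch {

    // Increments freqCount[i] for every non-zero cell i and returns the number of non-zero cells.
    // Assumes freqCount is at least as long as cells.
    static int calcFreqCount(int[] cells, int[] freqCount) {
        int nonZero = 0;
        for (int i = 0; i < cells.length; i++) {
            if (cells[i] != 0) {
                freqCount[i]++;
                nonZero++;
            }
        }
        return nonZero;
    }

    public static void main(String[] args) {
        int[] freqCount = new int[4];
        int first = calcFreqCount(new int[] {1, 0, 3, 0}, freqCount);
        int second = calcFreqCount(new int[] {0, 0, 2, 5}, freqCount);
        System.out.println(first + " " + second + " " + Arrays.toString(freqCount));
    }
}
```

Note that the count is per descriptor, not per cell value: a cell holding 3 contributes 1 to its frequency count, the same as a cell holding 1.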
      • getMoleculeCount

        public int getMoleculeCount()
        Gets the number of molecules processed (that is, the number of descriptors generated) since the initialization of the object.
        Returns:
        number of molecules processed
      • getAverageNonZeroRatio

        public float getAverageNonZeroRatio()
        Gets the average percentage of cells that have a non-zero value, taking into account all descriptors generated since the initialization of the generator.
        Returns:
        relative number of bits set in descriptors
      • getMaximumBitRatio

        public float getMaximumBitRatio()
        Gets the maximum percentage of non-zero cells in descriptors generated.
        Returns:
        maximum bits set, relative to descriptor length
      • getBrightestMolId

        public int getBrightestMolId()
        Gets the id of the molecule whose descriptor had the maximum number of non-zero cells among all descriptors generated since the initialization of the generator object.
        Returns:
        unique molecule identifier (a consecutive index from zero)
      • getMinimumBitRatio

        public float getMinimumBitRatio()
        Gets the minimum percentage of non-zero cells in descriptors generated.
        Returns:
        minimum bits set, relative to descriptor length
      • getDarkestMolId

        public int getDarkestMolId()
        Gets the id of the molecule whose descriptor had the minimum number of non-zero cells among all descriptors generated since the initialization of the generator object.
        Returns:
        unique molecule identifier (a consecutive index from zero)
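One way the running statistics behind the molecule count, the average, minimum and maximum ratios, and the brightest and darkest molecule ids could be maintained is sketched below. It is an illustration under stated assumptions, not the library's code: the class `RatioStatsSketch` and its `update` method are hypothetical, the field names merely echo the protected fields listed above, the molecule id is taken to be the zero-based order of processing, and ties are resolved in favor of the molecule seen first.

```java
// Hypothetical stand-in for the bookkeeping; not part of the JChem API.
class RatioStatsSketch {
    private int molCount = 0;
    private float minNonEmptyPercent = Float.MAX_VALUE;
    private int minNonEmptyId = -1;
    private float maxNonEmptyPercent = -1f;
    private int maxNonEmptyId = -1;
    private float sumNonEmptyPercent = 0f;

    // Records the percentage of non-zero cells of one more descriptor.
    void update(float nonEmptyPercent) {
        int id = molCount; // assumed: ids are consecutive indices starting at zero
        if (nonEmptyPercent < minNonEmptyPercent) {
            minNonEmptyPercent = nonEmptyPercent;
            minNonEmptyId = id;
        }
        if (nonEmptyPercent > maxNonEmptyPercent) {
            maxNonEmptyPercent = nonEmptyPercent;
            maxNonEmptyId = id;
        }
        sumNonEmptyPercent += nonEmptyPercent;
        molCount++;
    }

    int getMoleculeCount() { return molCount; }

    // Returns 0 when nothing has been processed yet, to avoid dividing by zero.
    float getAverageNonZeroRatio() { return molCount == 0 ? 0f : sumNonEmptyPercent / molCount; }

    float getMaximumBitRatio() { return maxNonEmptyPercent; }
    float getMinimumBitRatio() { return minNonEmptyPercent; }
    int getBrightestMolId() { return maxNonEmptyId; }
    int getDarkestMolId() { return minNonEmptyId; }

    public static void main(String[] args) {
        RatioStatsSketch stats = new RatioStatsSketch();
        stats.update(10f);
        stats.update(50f);
        stats.update(30f);
        System.out.println("average=" + stats.getAverageNonZeroRatio()
                + " brightest=" + stats.getBrightestMolId()
                + " darkest=" + stats.getDarkestMolId());
    }
}
```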
      • getDensityCounts

        public int[] getDensityCounts()
        Gets the array of bit densities. The array can be indexed from 0 to 10. Index i holds the number of descriptors in which the ratio of non-zero cells is between 10 * i and 10 * i + 10 percent.
        Returns:
        array of density counts
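The binning described above can be sketched as a histogram over ten-percent intervals. The rule used here, index = floor(percent / 10), is an assumption that is merely consistent with the stated 0 to 10 index range (a fully populated descriptor at 100 percent lands in index 10); the class `DensitySketch` and its method are hypothetical, and how the library treats values exactly on a bin boundary is not stated in this documentation.

```java
import java.util.Arrays;

// Hypothetical stand-in; not part of the JChem API.
class DensitySketch {

    // Counts how many percentages fall into each ten-percent bin; inputs are expected in 0..100.
    static int[] densityCounts(float[] nonZeroPercents) {
        int[] density = new int[11]; // indices 0..10
        for (float percent : nonZeroPercents) {
            int bin = (int) (percent / 10f);        // assumed rule: floor(percent / 10)
            bin = Math.max(0, Math.min(10, bin));   // clamp out-of-range input into a valid index
            density[bin]++;
        }
        return density;
    }

    public static void main(String[] args) {
        int[] density = densityCounts(new float[] {5f, 10f, 19.9f, 100f});
        System.out.println(Arrays.toString(density));
    }
}
```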
      • getFrequencyCounts

        public int[] getFrequencyCounts()
        Gets the absolute frequency count array for all descriptors generated. Each element of the array stores the number of descriptors in which the corresponding cell had a non-zero value.
        Returns:
        per-cell frequency count array