Class MDSet


  • @PublicAPI
    public class MDSet
    extends Object
    MDSet combines several MolecularDescriptor objects into one entity. The purpose of this class is to allow dissimilarity calculations to be performed on several MolecularDescriptors simultaneously. Combining descriptors improves on the predictive power of the individual descriptors and is more efficient than processing them one by one.
    MDSet objects can be compared against each other by dissimilarity metrics. The dissimilarity coefficient is obtained as the weighted sum of the dissimilarity coefficients from the pairwise comparison of corresponding components. The weights are stored in an MDSetParameters object, which this class aggregates.
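The weighted sum described above can be illustrated with a minimal, self-contained sketch. It is not the JChem implementation: the class WeightedDissimilaritySketch and the method weightedSum are hypothetical names, and the sketch assumes that the per-component dissimilarity values and their weights are already available as plain float arrays. How user defined values enter the sum is not modeled here.

```java
// Hypothetical stand-in for illustration only; not part of the JChem API.
final class WeightedDissimilaritySketch {

    /**
     * Combines per-component dissimilarity values into one coefficient
     * as the sum over i of weights[i] * componentDissims[i].
     */
    static float weightedSum(float[] componentDissims, float[] weights) {
        if (componentDissims.length != weights.length) {
            throw new IllegalArgumentException("one weight is required per component");
        }
        float sum = 0f;
        for (int i = 0; i < componentDissims.length; i++) {
            sum += weights[i] * componentDissims[i];
        }
        return sum;
    }

    public static void main(String[] args) {
        // Three components compared pairwise, each with its own weight.
        float[] dissims = {0.25f, 0.5f, 1.0f};
        float[] weights = {2.0f, 1.0f, 0.5f};
        System.out.println(weightedSum(dissims, weights)); // prints 1.5
    }
}
```

Because the components are ordered, the i-th weight always applies to the comparison of the i-th component of both objects.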
    MDSet instances are associated with (and calculated from) molecular structures. The connection between the original Molecule and its MDSet objects is preserved by the unique identifier of the molecule, which is also stored in the MDSet object.
    Besides MolecularDescriptor components, an MDSet object can hold an arbitrary number of external, user defined float values. Typically, these are calculated by third party software and stored in SDfile tags or database columns. These values are used in dissimilarity calculations but they are never modified.
    Remark: the term Set is slightly misleading since components constituting the MDSet are ordered. Tuple or Record would be more appropriate though probably quite unusual in a cheminformatics context.
    Since:
    JChem 2.0
    • Field Detail

      • dissim

        public float dissim
        Dissimilarity measured against another set.
    • Constructor Detail

      • MDSet

        public MDSet()
        Creates an empty MDSet object. It can be initialized by calling setSize( int nComponents ) and setParameters( final MDSetParameters params ).
      • MDSet

        public MDSet​(MDSet c)
        Copy constructor. Creates an identical object, in which components are cloned, but parameters are not cloned.
        Parameters:
        c - the MDSet object to be copied
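The copy semantics described above (components cloned, parameters shared) can be sketched with stand-in types. Everything in this sketch is hypothetical: Params, Component and DescriptorSet only play the roles of MDSetParameters, MolecularDescriptor and MDSet, and their fields are assumptions made for illustration.

```java
// Hypothetical stand-ins for illustration only; not part of the JChem API.
final class CopySemanticsSketch {

    /** Plays the role of the parameters object (for example, weights). */
    static final class Params {
        float[] weights;
        Params(float[] weights) { this.weights = weights; }
    }

    /** Plays the role of a single descriptor component. */
    static final class Component {
        float[] values;
        Component(float[] values) { this.values = values; }
        Component copy() { return new Component(values.clone()); }
    }

    /** Plays the role of the descriptor set. */
    static final class DescriptorSet {
        Component[] components;
        Params params;

        DescriptorSet(Component[] components, Params params) {
            this.components = components;
            this.params = params;
        }

        /** Copy constructor: each component is copied, the parameters reference is shared. */
        DescriptorSet(DescriptorSet other) {
            this.components = new Component[other.components.length];
            for (int i = 0; i < other.components.length; i++) {
                this.components[i] = other.components[i].copy();
            }
            this.params = other.params; // same object, not a clone
        }
    }
}
```

Under these assumptions, changing a component of the copy leaves the original untouched, while a change made through the shared parameters object is visible from both.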
      • MDSet

        public MDSet​(int nComponents)
        Creates an empty MDSet object capable of storing a given number of MolecularDescriptor components. Components should be added by addDescriptor( MolecularDescriptor descriptor ) or setDescriptor( int componentIndex, MolecularDescriptor md ).
        Parameters:
        nComponents - number of components in the MDSet object
      • MDSet

        public MDSet​(int nComponents,
                     int nUserData)
        Creates an empty MDSet object capable of storing a given number of MolecularDescriptor components and a given number of user defined (external) data values. Components should be added by addDescriptor( MolecularDescriptor descriptor ) or setDescriptor( int componentIndex, MolecularDescriptor md ).
        Parameters:
        nComponents - number of components in the MDSet object
        nUserData - number of further floating point values
    • Method Detail

      • newInstance

        public static MDSet newInstance​(String[] componentTypes)
        Gets a new MDSet instance constituted of the specified components. MDSetParameters are set to default.
        Parameters:
        componentTypes - type names of the components
        Returns:
        a new object
      • newInstance

        public static MDSet newInstance​(String[] componentTypes,
                                        String[] params)
        Gets a new MDSet instance constituted of the specified components. Components are parametrized with the given parameter settings.
        Parameters:
        componentTypes - type names of the components
        params - parameter strings
        Returns:
        a new object; or null, if the required class could not be instantiated
      • newInstance

        public static MDSet newInstance​(String[] componentTypes,
                                        File[] params)
        Gets a new MDSet instance constituted of the specified components. Components are parametrized from the given parameter files.
        Parameters:
        componentTypes - type names of the components
        params - parameter files
        Returns:
        a new object; or null, if the required class could not be instantiated
      • clone

        public Object clone()
        Clones the object.
        Overrides:
        clone in class Object
        Returns:
        a new, identical MDSet instance
      • setSize

        public void setSize​(int nComponents,
                            int nUserData)
        Sets the number of MolecularDescriptor components and the number of user defined (external) data in the MDSet.
        Parameters:
        nComponents - number of components in the MDSet object
        nUserData - number of further floating point values
      • setSize

        public void setSize​(int nComponents)
        Sets the number of MolecularDescriptor components in the MDSet.
        Parameters:
        nComponents - number of components in the MDSet object
      • setId

        public void setId​(int id)
        Sets the unique internal identifier of the MDSet object.
        Parameters:
        id - unique identifier
      • getId

        public int getId()
        Gets the identifier of the MDSet.
        Returns:
        the identifier
      • setNaturalId

        public void setNaturalId​(String id)
        Sets the natural identifier of the MDSet object. This identifier is taken from the source Molecule (from an SDfile tag).
        Parameters:
        id - natural identifier
      • getNaturalId

        public String getNaturalId()
        Gets the natural identifier of the source Molecule of the MDSet.
        Returns:
        the identifier
      • setParameters

        public void setParameters​(MDSetParameters params)
        Sets the parameters of the MDSet. Note that this has no effect on the parameters of the individual MolecularDescriptor components in the MDSet.
        Parameters:
        params - new parameters for this MDSet.
      • getParameters

        public MDSetParameters getParameters()
        Gets the current parameter settings.
        Returns:
        the parameters of the MDSet
      • addDescriptor

        public void addDescriptor​(MolecularDescriptor descriptor)
        Appends the next component to the MDSet object.
        Parameters:
        descriptor - the next component of the MDSet
      • setDescriptors

        public void setDescriptors​(MolecularDescriptor[] descriptors)
        Sets all components of the MDSet.
        Parameters:
        descriptors - MDSet components, they are not cloned
      • setDescriptor

        public void setDescriptor​(int componentIndex,
                                  MolecularDescriptor md)
        Sets a given component of the MDSet.
        Parameters:
        componentIndex - index of the component to be set
        md - the MolecularDescriptor to store as the specified component
      • size

        public int size()
        Gets the number of components constituting the MDSet.
        Returns:
        number of components
      • getDescriptor

        public MolecularDescriptor getDescriptor​(int index)
        Gets a specified component of the MDSet.
        Parameters:
        index - component index
        Returns:
        the selected component
      • generate

        public void generate​(Molecule mol)
                      throws MDGeneratorException
        Generates the MDSet from the given molecular structure.
        Parameters:
        mol - the molecule to generate from.
        Throws:
        MDGeneratorException - if one of the components could not be generated
      • getDissimilarity

        public float getDissimilarity​(MDSet other)
        Calculates the dissimilarity between two MDSet objects. The dissimilarity value is the weighted sum of the component-wise dissimilarity values.
        Parameters:
        other - the MDSet object to which this one is compared
        Returns:
        the dissimilarity coefficient calculated
      • getLowerBound

        public float getLowerBound​(Object o)
        Gives a lower bound estimate for the value that getDissimilarity would return for the same object. This method is implemented because the Clusterable interface requires it.
        Parameters:
        o - the MDSet object to which this one is compared. Its type is Object in order to implement the Clusterable interface.
        Returns:
        the lower bound estimate of the dissimilarity coefficient
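This page does not say how callers use the lower bound. One common use of such a bound, shown here purely as an assumption, is to skip the full dissimilarity calculation whenever the bound already rules a candidate out. The sketch below is self-contained and does not use the JChem API: Item, Point and nearest are hypothetical names, plain 2D points stand in for descriptor sets, Euclidean distance stands in for the dissimilarity, and the difference in the x coordinate serves as a cheap bound that never exceeds it.

```java
// Hypothetical stand-ins for illustration only; not part of the JChem API.
final class LowerBoundPruningSketch {

    interface Item {
        float dissimilarity(Item other); // exact, assumed to be expensive
        float lowerBound(Item other);    // cheap, never larger than the exact value
    }

    /** A 2D point; the x difference never exceeds the Euclidean distance. */
    static final class Point implements Item {
        static int exactCalls = 0; // counts exact evaluations, for demonstration
        final float x;
        final float y;

        Point(float x, float y) { this.x = x; this.y = y; }

        public float dissimilarity(Item other) {
            Point p = (Point) other;
            exactCalls++;
            return (float) Math.hypot(x - p.x, y - p.y);
        }

        public float lowerBound(Item other) {
            Point p = (Point) other;
            return Math.abs(x - p.x);
        }
    }

    /** Returns the index of the candidate closest to the query, or -1 if there is none. */
    static int nearest(Item query, Item[] candidates) {
        int best = -1;
        float bestDissim = Float.POSITIVE_INFINITY;
        for (int i = 0; i < candidates.length; i++) {
            // If even the lower bound cannot beat the best value so far, skip the exact call.
            if (query.lowerBound(candidates[i]) >= bestDissim) {
                continue;
            }
            float d = query.dissimilarity(candidates[i]);
            if (d < bestDissim) {
                bestDissim = d;
                best = i;
            }
        }
        return best;
    }
}
```

The pruning is only safe if the bound really is a lower bound: a value that could exceed the true dissimilarity might cause the closest candidate to be skipped.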
      • setUserData

        @Deprecated
        public void setUserData​(float[] userData)
        Deprecated.
        since 2.3
        Sets all user defined float values in the MDSet.
        Parameters:
        userData - user defined floating point data values
      • setUserData

        @Deprecated
        public void setUserData​(int dataIndex,
                                float userData)
        Deprecated.
        since 2.3
        Sets a given user defined float value in the MDSet.
        Parameters:
        dataIndex - index of the data value to be set
        userData - user defined floating point data value
      • getUserData

        @Deprecated
        public float getUserData​(int index)
        Deprecated.
        since 2.3
        Gets the value of a user defined data component.
        Parameters:
        index - data component index
        Returns:
        value of user defined data component
      • getUserData

        @Deprecated
        public float[] getUserData()
        Deprecated.
        since 2.3
        Gets the values of all user defined data components.
        Returns:
        array of values of user defined data components