-
Wigle, D., Jurisica, I., N. Radulovich, M. Pintilie, J. Rossant, N. Liu, C. Lu, J. Woodgett, I. Seiden, M. Johnston, S. Keshavjee, G. Darling, T. Winton, B. Breitkreutz, P. Jorgenson, M. Tyers, F. A. Shepherd,
M.S. Tsao.
Molecular profiling of non-small cell lung cancer and correlation
with disease-free survival
Cancer Research, 62(11): In press. 2002.
[available upon request]
Recent studies have suggested that information from gene expression profiles could be used to develop molecular classifications of cancer. We hypothesized that expression levels of specific genes in operative specimens could be correlated to recurrence risk in non-small cell lung cancer (NSCLC). We performed expression profiling using 19.2K cDNA microarrays on tumor specimens from a total of 39 NSCLC patients with known clinical follow-up information. Statistical analysis and clustering approaches were used to determine patterns of gene expression segregating with clinical outcome. The results provide evidence that molecular subtyping of NSCLC can identify distinct profiles of gene expression correlating with disease-free survival.
-
Sultan, M., Wigle, D., Cumbaa, C., Maziarz, M., Glasgow, J., M.-S. Tsao, Jurisica, I.
Binary tree-structured vector quantization approach to clustering and
visualizing microarray data.
Bioinformatics. Special Issue of ISMB'02, 18(Suppl. 1): In press. 2002.
[available upon request]
Motivation: With the increasing number of gene expression
databases, the need for more powerful analysis and
visualization tools is growing. Many techniques have successfully
been applied to unravel latent similarities among
genes and/or experiments. Most of the current systems for
microarray data analysis use statistical methods, hierarchical
clustering, self-organizing maps, support vector machines,
or k-means clustering to organize genes or experiments
into meaningful groups. Without prior explicit bias
almost all of these clustering methods applied to gene expression
data not only produce different results, but may
also produce clusters with little or no biological relevance.
Of these methods, agglomerative hierarchical clustering
has been the most widely applied, although many limitations
have been identified.
Results: Starting with a systematic comparison of the underlying
theories behind clustering approaches, we have
devised a technique that combines tree-structured vector
quantization and partitive k-means clustering (BTSVQ).
This hybrid technique has revealed clinically relevant clusters
in three large publicly available data sets. In contrast
to existing systems, our approach is less sensitive to data
preprocessing and data normalization. In addition, the
clustering results produced by the technique have strong
similarities to those of self-organizing maps (SOMs). We
discuss the advantages and the mathematical reasoning
behind our approach.
Availability: The BTSVQ system is implemented in Matlab
R12 using the SOM toolbox for the visualization and
preprocessing of the data (http://www.cis.hut.fi/projects/somtoolbox/).
BTSVQ is available for non-commercial use
(http://www.uhnres.utoronto.ca/ta3/BTSVQ).
Contact: ij@uhnres.utoronto.ca
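As a rough illustration of the idea named in the Results paragraph above, the sketch below grows a binary tree by repeatedly splitting a set of expression vectors with k-means (k = 2). It is not the authors' Matlab implementation: the function names (`two_means`, `build_tree`, `leaves`), the farthest-point initialization, the `min_size` stopping rule, and the toy data are all assumptions made for this example.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def two_means(vectors, iterations=20):
    """Partition vectors into two groups with plain k-means (k = 2).

    Assumed initialization: the first vector and the vector farthest from it.
    """
    first = vectors[0]
    second = max(vectors, key=lambda v: sq_dist(v, first))
    centroids = [first, second]
    groups = ([], [])
    for _ in range(iterations):
        groups = ([], [])
        for v in vectors:
            nearest = 0 if sq_dist(v, centroids[0]) <= sq_dist(v, centroids[1]) else 1
            groups[nearest].append(v)
        if not groups[0] or not groups[1]:
            break  # degenerate split, e.g. all vectors identical
        centroids = [mean(groups[0]), mean(groups[1])]
    return groups

def build_tree(vectors, min_size=2):
    """Recursively split until a node holds at most min_size vectors."""
    node = {"centroid": mean(vectors), "size": len(vectors), "children": []}
    if len(vectors) > min_size:
        left, right = two_means(vectors)
        if left and right:
            node["children"] = [build_tree(left, min_size),
                                build_tree(right, min_size)]
    return node

def leaves(node):
    """Collect the leaf nodes, which play the role of the final clusters."""
    if not node["children"]:
        return [node]
    return [leaf for child in node["children"] for leaf in leaves(child)]

# Toy "expression profiles": two well-separated groups of three vectors each.
profiles = [[0.1, 0.2], [0.0, 0.1], [0.2, 0.1],
            [5.0, 5.1], [5.2, 4.9], [4.9, 5.0]]
tree = build_tree(profiles, min_size=3)
cluster_sizes = sorted(leaf["size"] for leaf in leaves(tree))
```

Each internal node stores the centroid of the vectors below it, so the tree can be cut at any depth to obtain a coarser or finer grouping.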
-
Luft, J., Wolfley, J., Jurisica, I., Glasgow, J., Fortier, S., DeTitta, G.T.
Macromolecular crystallization in a high throughput laboratory - the search phase.
Journal of Crystal Growth,
232: 591-595, 2001.
[available upon request]
Macromolecular crystallization efforts are frequently divided into a search phase, during which approximate
conditions are sought, and an optimization phase, when the approximate conditions are optimized to yield crystals of
sufficient quality for diffraction work. Faced with the possibility that, on a yearly basis, many hundreds of proteins
might be generated, both in our laboratories and at the laboratories of our collaborators, we have recently designed and
commissioned a high throughput robotics lab designed for the search phase. The lab is capable of setting up and
photographically evaluating over 60,000 microbatch crystallization experiments per week. In the first four months of
operation we have set up crystallization experiments for more than one hundred proteins.
-
Jurisica, I., Rogers, P., Glasgow, J., Collins, R., Wolfley, J., Luft, J., DeTitta, G.T. Improving Objectivity and Scalability in Protein Crystallization:
Integrating Image Analysis With Knowledge Discovery.
IEEE Intelligent Systems Journal,
Special issue on Intelligent Systems in Biology,
November/December, pp. 26-34, 2001.
[available upon request]
This paper describes issues related to
integrating image analysis techniques with knowledge discovery and
case-based reasoning.
Although the work is applicable to a number of problem
domains, here we focus on
the problem of analyzing and classifying outcomes of
protein crystallization experiments in high-throughput structural genomics.
We apply the fast Fourier transform to analyze image
content in order to extract important features of the spectrum.
A combination
of these features is used to classify crystallization experiments' outcomes.
Although humans can analyze images more flexibly, a
computational approach makes the process scalable and more objective.
We evaluate the classification process and present results on how
the automatically-extracted features
can be combined to discover important crystallographic knowledge.
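To make the notion of a spectral feature concrete, here is a minimal sketch that computes a two-dimensional Fourier spectrum of a tiny grayscale image and derives one feature from it. It is only an illustration under stated assumptions: a naive discrete Fourier transform stands in for the fast Fourier transform (it yields the same spectrum, just more slowly), and the feature `ac_fraction` (the share of spectral power outside the zero-frequency term) is a hypothetical choice, not one of the features used in the paper.

```python
import cmath

def dft2(image):
    """Naive 2-D discrete Fourier transform of a small image (list of rows)."""
    rows, cols = len(image), len(image[0])
    spectrum = []
    for u in range(rows):
        out_row = []
        for v in range(cols):
            total = 0j
            for y in range(rows):
                for x in range(cols):
                    angle = -2 * cmath.pi * (u * y / rows + v * x / cols)
                    total += image[y][x] * cmath.exp(1j * angle)
            out_row.append(total)
        spectrum.append(out_row)
    return spectrum

def spectral_features(image):
    """Return the zero-frequency power and the fraction of power elsewhere."""
    spectrum = dft2(image)
    power = [[abs(c) ** 2 for c in row] for row in spectrum]
    total = sum(sum(row) for row in power)
    dc_power = power[0][0]
    ac_fraction = (total - dc_power) / total if total else 0.0
    return {"dc_power": dc_power, "ac_fraction": ac_fraction}

# A featureless image (hypothetical stand-in for a clear drop) ...
flat = [[1, 1, 1, 1] for _ in range(4)]
# ... and an image with a sharp vertical edge (some structure is present).
edge = [[0, 0, 1, 1] for _ in range(4)]

flat_features = spectral_features(flat)
edge_features = spectral_features(edge)
```

The featureless image puts essentially all of its power in the zero-frequency term, while the edge image spreads half of it over other frequencies, which is the kind of difference a classifier could pick up on.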
-
Wigle, D., Rossant, J., Jurisica, I.
Mining mouse microarray data,
Genome Biology, 2(7): 1019.1-1019.4, 2001.
[available upon request]
Microarrays of mouse genes are now available from several sources, and they have so far given new insights into gene expression in embryonic development, regions of the brain and during apoptosis. Microarray data posted on the internet can be reanalyzed to study a range of questions.
-
Jurisica, I., Rogers, P., Glasgow, J., Fortier, S., Luft, J., Wolfley, J., Bianca, M., Weeks, D., DeTitta, G.T. Intelligent Decision Support for Protein Crystal Growth. IBM Systems Journal, Special issue on Deep
Computing for Life Sciences, 40(2): 394-409, 2001.
[available upon request] or at the IBM Systems Journal page in
HTML and PDF formats
Genomic projects are producing hundreds of proteins a year for structural analysis. The
challenge of the research described in this paper is to remove crystal growth experiments as
a rate-limiting step in the enterprise of structure determination of proteins. We meet this
challenge by combining a high-throughput crystallization setup and evaluation in the wet lab
with a sophisticated algorithmic analysis of the outcomes in the computer lab. Furthermore,
we apply techniques from knowledge management and artificial intelligence to develop an
automated system that assists expert crystallographers in planning and evaluating novel
crystal growth experiments. Fundamental to our computational approach to crystallization is
a comprehensive information repository for crystal growth experiments. This stored
information will be used to discover general rules or principles underlying the growth
process for crystals, as well as to guide the reasoning algorithm for planning experiments.
The paper reports on the preliminary results in the wet lab and computation lab
respectively. We define the problem, propose an architecture for intelligent decision support
in the crystallization domain, and report on the status of the individual components of the
architecture.
-
Jurisica, I., J. Glasgow, and J. Mylopoulos. Incremental
Iterative Retrieval and Browsing for Efficient Conversational CBR Systems.
International
Journal of Applied Intelligence. 12(3): 251-268, 2000.
[available upon request]
A case base is a repository of past experiences that can be used for
problem solving. Given a new problem, expressed in the form of a query,
the case base is browsed in search of "similar" or "relevant" cases.
Conversational case-based reasoning (CBR) systems generally support user
interaction during case retrieval and adaptation. Here we focus on case
retrieval where users initiate problem solving by entering a partial problem
description. During an interactive CBR session, a user may submit additional
queries to provide a "focus of attention". These queries may be
obtained by relaxing or restricting the constraints specified for a prior
query. Thus, case retrieval involves the iterative evaluation of a series
of queries against the case base, where each query in the series is obtained
by restricting or relaxing the preceding query.
This paper considers alternative
approaches for implementing iterative browsing in conversational CBR systems.
First, we discuss a naive algorithm, which evaluates each query independent
of earlier evaluations. Second, we introduce an incremental algorithm,
which reuses the results of past query evaluations to minimize the computation
required for subsequent queries. In particular, the paper proposes an efficient
algorithm for case base browsing and retrieval using database techniques
for incremental view maintenance. In addition, the paper evaluates the
performance of the proposed algorithm with respect to alternative approaches
considering two perspectives: (i) experimental efficiency evaluation using
diverse application domains, and (ii) scalability evaluation using the
performance model of the proposed system.
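The contrast between the naive and the incremental evaluation described above can be sketched as follows. This is a simplified illustration, not the algorithm from the paper: the query model (each attribute mapped to a set of allowed values), the class `IncrementalRetriever`, its `checked` counter, and the toy case base are assumptions. The key observation it relies on is that restricting a query can only shrink the previous result, while relaxing it can only grow it.

```python
def matches(case, query):
    """A case matches when every constrained attribute has an allowed value."""
    return all(case.get(attr) in allowed for attr, allowed in query.items())

def naive_retrieve(case_base, query):
    """Naive approach: evaluate the query against every case from scratch."""
    return [c for c in case_base if matches(c, query)]

class IncrementalRetriever:
    """Reuses the previous result when the query is restricted or relaxed."""

    def __init__(self, case_base, query):
        self.case_base = case_base
        self.query = {attr: set(allowed) for attr, allowed in query.items()}
        self.hits = [i for i, c in enumerate(case_base) if matches(c, self.query)]
        self.checked = len(case_base)  # cases examined by the last operation

    def results(self):
        return [self.case_base[i] for i in self.hits]

    def restrict(self, attr, allowed):
        """Tighten one constraint: only current hits can still match."""
        allowed = set(allowed)
        if attr in self.query:
            allowed &= self.query[attr]
        self.query[attr] = allowed
        self.checked = len(self.hits)
        self.hits = [i for i in self.hits
                     if self.case_base[i].get(attr) in allowed]
        return self.results()

    def relax(self, attr, extra):
        """Loosen one constraint: current hits stay, only misses are re-tested."""
        if attr not in self.query:
            self.checked = 0  # attribute was unconstrained, nothing changes
            return self.results()
        self.query[attr] = self.query[attr] | set(extra)
        known = set(self.hits)
        misses = [i for i in range(len(self.case_base)) if i not in known]
        self.checked = len(misses)
        new_hits = {i for i in misses if matches(self.case_base[i], self.query)}
        self.hits = sorted(known | new_hits)
        return self.results()

# Toy case base with made-up attributes.
cases = [
    {"id": 1, "size": "small", "colour": "red"},
    {"id": 2, "size": "small", "colour": "blue"},
    {"id": 3, "size": "large", "colour": "red"},
    {"id": 4, "size": "medium", "colour": "red"},
]
retriever = IncrementalRetriever(cases, {"colour": {"red"}})
initial_ids = [c["id"] for c in retriever.results()]
restricted_ids = [c["id"] for c in retriever.restrict("size", {"small", "medium"})]
checked_on_restrict = retriever.checked
relaxed_ids = [c["id"] for c in retriever.relax("colour", {"blue"})]
checked_on_relax = retriever.checked
```

In this toy session the restriction examines three cases and the relaxation two, whereas the naive approach would examine all four each time; the results agree with `naive_retrieve` on the final query.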
-
Jurisica, I. and Glasgow, J. (1997). Improving performance of case-based
classification using context-based relevance. International Journal
of Artificial Intelligence Tools. Special Issue of IEEE ITCAI-96 Best Papers.
6(4):511-536.
IJAIT'97.ps.Z
Classification involves associating instances with particular classes
by maximizing intra-class similarities and minimizing inter-class similarities.
Thus, the way similarity among instances is measured is crucial for the
success of the system. In case-based reasoning, it is assumed that similar
problems have similar solutions. The case-based approach to classification
is founded on retrieving cases from the case base that are similar to a
given problem, and associating the problem with the class containing the
most similar cases.
Similarity-based retrieval tools can advantageously be used in building
flexible retrieval and classification systems. Case-based classification
uses previously classified instances to label unknown instances with proper
classes. Classification accuracy is affected by the retrieval process --
the more relevant the instances used for classification, the greater the
accuracy.
The paper presents a novel approach to case-based classification. The
algorithm is based on a notion of similarity assessment and was developed
for supporting flexible retrieval of relevant information. Case similarity
is assessed with respect to a given context that defines constraints for
matching. Context relaxation and restriction is used for controlling the
classification accuracy. The validity of the proposed approach is tested
on real-world domains, and the system's performance, in terms of accuracy
and scalability, is compared to that of other machine learning algorithms.
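A minimal sketch of classification driven by a context that is relaxed when too few cases match is given below. It assumes a context is a mapping from attributes to sets of allowed values, that relaxation simply drops constraints in a caller-supplied order, and that the class is chosen by majority vote; the names `in_context` and `classify`, the `min_cases` threshold, and the toy data are hypothetical and are not taken from the paper.

```python
from collections import Counter

def in_context(case, context):
    """A case is relevant when it satisfies every constraint of the context."""
    return all(case[attr] in allowed for attr, allowed in context.items())

def classify(case_base, context, relax_order, min_cases=2):
    """Label a problem by majority vote over the cases matching its context.

    If fewer than min_cases match, constraints are dropped one at a time in
    relax_order (an assumed relaxation policy) and retrieval is repeated.
    """
    context = dict(context)
    pending = list(relax_order)
    while True:
        retrieved = [c for c in case_base if in_context(c, context)]
        if len(retrieved) >= min_cases or not pending:
            break
        context.pop(pending.pop(0), None)
    if not retrieved:
        return None, context
    label = Counter(c["label"] for c in retrieved).most_common(1)[0][0]
    return label, context

# Toy case base with made-up attributes and labels.
case_base = [
    {"a": 1, "b": "x", "label": "pos"},
    {"a": 1, "b": "y", "label": "pos"},
    {"a": 1, "b": "y", "label": "pos"},
    {"a": 2, "b": "x", "label": "neg"},
]
label, used_context = classify(case_base, {"a": {1}, "b": {"x"}}, relax_order=["b"])
```

Here the full context retrieves a single case, so the constraint on `b` is relaxed and the vote is taken over the three cases that share `a = 1`.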
-
Jurisica, I., Rogers, P., Glasgow, J., Fortier, S., Luft,
J., Bianca, D., DeTitta, G.T.
Image-Feature Extraction for Protein Crystallization:
Integrating Image Analysis and Case-Based Reasoning
Thirteenth Annual Conference on Innovative Applications
of Artificial Intelligence (IAAI-2001),
Seattle, WA, 2001, p. 73-80.
This paper describes issues related to
integrating image analysis techniques into case-based reasoning.
Although the approach is generic, a high-throughput protein
crystallization problem is used as an example.
Our solution to the crystallization problem is to store outcomes
of experiments as images, extract important image features, and use them
to automatically recognize different crystallization outcomes.
Subsequently, we use the outcomes of image classification to perform
case-based planning of crystallization experiments for new proteins.
Knowledge-discovery techniques are used to
extract general principles for crystallization.
Such principles are applicable to the adaptation phase of case-based reasoning.
The motivation for automated image-feature extraction is twofold:
(1) the human interpretation/analysis of image content is subjective, and
(2) many problem domains require reasoning with large databases of
uninterpreted images. In this paper we present the design and implementation
of our integrated system, as well as some
preliminary experimental results.
-
Jurisica, I. (2000). Building better decision-support systems by using
knowledge discovery.
Annual Conference of the American Society for Information Science,
Chicago, IL, p. 281-291.
-
Luft, J. R., J. Wolfley, M. Bianca, D. Weeks, I. Jurisica, P. Rogers, J.
Glasgow, S. Fortier, G. T. DeTitta. (2000). Gearing up for structural genomics:
The challenge of hundreds of proteins and hundreds of thousands of crystallization
experiments per year.
The Annual Conference of the American Crystallography
Association (ACA'2000), Saint Paul, MN.
Structural genomics projects promise to produce hundreds of proteins
a year for structural analysis. The challenge to crystal growers
is to make some other step in the structural biology enterprise rate-limiting.
Our approach is to combine high throughput (HTP) crystallization setup
and evaluation in the wet lab with sophisticated algorithmic analyses of
the HTP outcomes in the computer lab for the purposes of recipe prediction.
In the wet lab we now have the capacity to prepare and evaluate the
results of over sixty thousand (61.4K) crystallization experiments a workweek.
Each is a microbatch experiment conducted under paraffin oil. Pipetting
is performed with robots outfitted with 96 or 384 syringes and XYZ translation
stages. High density (1536 well) micro-assay plates hold the experiments.
1536 crystallization cocktails, covering a wide range of crystallizing
agents, have been prepared. Current pipetting protocols allow us
to deploy 200 nanoL droplets of protein solution and crystallization cocktails
(total drop size 400 nanoL). Once a micro-assay plate is prepared
with paraffin oil and crystallization cocktails it is possible to set protein
solution into the wells in less than five minutes, allowing us to work
quickly with unstable proteins. Current total protein requirements
are being assessed, but are likely to be in the 10 mg range. After
setup, plates are placed on a computer-controlled XY table with micron positioning
accuracy. The plates are translated under a megapixel digital camera
where images are captured by a framegrabber. The XY table can accommodate
28 plates (43K experiments) at a time and the camera can record 43K images
in approximately twelve hours.
In the computer lab the images are analyzed automatically to determine
the outcomes of the crystallization experiments. We are developing
a standard vocabulary of outcomes that will describe the results:
clear drop, amorphous precipitate, phase separation, microcrystals, crystals,
and uncertain outcome. These outcomes, recorded as a function of
time, are the cornerstone of a crystallization database that will contain
physical information about individual proteins as well as results of crystallization
experiments with those proteins. Using case-based reasoning algorithms
we will identify patterns of similar properties and crystallization outcomes
relating two or more proteins in the database. Our hypothesis is
that, given a quantitative measure of “similarity” between proteins, recipes
successfully employed for one protein will be useful starting points for
crystallization experiments with similar proteins. Future work will
center upon the most predictive measures of “similarity”.
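The capacity figures quoted in the abstract above can be checked with a few lines of arithmetic. The plate, well, and droplet numbers come from the abstract; reading "61.4K" as exactly 61,440 experiments is an assumption, as are the variable names.

```python
WELLS_PER_PLATE = 1536          # high-density micro-assay plate
PLATES_ON_TABLE = 28            # capacity of the XY table
IMAGING_HOURS = 12              # "approximately twelve hours"
PROTEIN_DROP_NL = 200           # protein solution per well, in nanolitres
WEEKLY_EXPERIMENTS = 61_440     # assumed exact value behind "61.4K"

# 28 plates of 1536 wells give the quoted "43K experiments" per table load.
experiments_per_load = WELLS_PER_PLATE * PLATES_ON_TABLE

# Imaging one table load in about twelve hours means about one image per second.
seconds_per_image = IMAGING_HOURS * 3600 / experiments_per_load

# Under the assumption above, the weekly capacity corresponds to 40 plates.
plates_per_week = WEEKLY_EXPERIMENTS / WELLS_PER_PLATE

# Protein solution needed to fill one 1536-well plate, in microlitres.
protein_per_plate_ul = WELLS_PER_PLATE * PROTEIN_DROP_NL / 1000
```

So one table load is 43,008 experiments, imaged at roughly one per second, and a single plate consumes about 307 microlitres of protein solution.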
-
Luft, J. R., J. Wolfley, M. Bianca, D. Weeks, I. Jurisica, P. Rogers, J.
Glasgow, S. Fortier, G. T. DeTitta. (2000). Gearing Up for ~40K Crystallization
Experiments a Day: Meeting The Needs of HTP Structural Proteomics Projects.
Eighth
International Conference on the Crystallization of Biological Macromolecules
(ICCBM-8), Sandestin, Florida.
The medical potential of the various genome projects now underway will
be realized when we know not only the sequences of the amino acids coded
in open reading frames but also what these ORFs represent, both structurally
and functionally. Structural proteomics will challenge us to grow
more and better crystals for diffraction studies. Our labs are involved
in two major aspects of that work: getting the techniques and equipment
in place to do large-scale, high-throughput crystallization experiments, and
assembling the expertise to make sense of all the data that will come from
those experiments.
-
Jurisica, I. (2000). Knowledge Organization by Systematic Knowledge Management
and Discovery. International Conference of the International Society
of Knowledge Organization (ISKO 6), Toronto, Ontario, p. 366-371.
We need to use dynamic knowledge organization
approaches in order to facilitate effective access and use of domain knowledge.
Although there are many approaches to knowledge organization available,
it is a challenge to systematically organize evolving domains, because
it is not feasible to rely only on humans to create relationships among
individual knowledge sources. Additional problems arise because knowledge
may not be consistently and completely described, and quality control may
not always be in place in distributed knowledge environments. In this article
we describe a generic approach to knowledge organization by using systematic
knowledge management and applying knowledge-discovery techniques. We use
a case-based reasoning system, called TA3, as a core component for knowledge
management. Application of the symbolic knowledge-discovery component of TA3
supports three main tasks: system optimization, knowledge evolution and
evidence creation. To explain the advantages of this approach, we use our experience
from biomedical domains.
-
Jurisica, I. and Glasgow, J. (2000). Extending case-based reasoning by
discovering and using image features in IVF. ACM Symposium on Applied
Computing (SAC'2000) Villa Olmo, Como, Italy, p. 52-59.
This paper describes the application of automated
image analysis to evaluate morphology and developmental features of oocytes
and embryos in the domain of in vitro fertilization (IVF). Although
humans can analyze images more flexibly, computer vision techniques make
the process more objective and precise. We propose to use computer-based
morphometry to precisely and objectively identify developmental features
of oocytes and embryos. Extracted morphological information can be linked
with symbolic information to better predict pregnancy outcome and suggest
further medical procedures. Recognized features can then be used to support
case-based reasoning and knowledge discovery. The combination of image
analysis techniques and case-based reasoning can thus serve as: (1) a feature
extraction technique; (2) an indexing approach; and (3) an analysis tool.
A combination of symbolic and image information can then be used to identify
morphological features of oocytes and embryos that are vital for successful
IVF. Extracting image features and analyzing them helps to perform knowledge
discovery from images.
-
Jurisica, I, J. Mylopoulos, E. Yu. (1999) Using Ontologies for Knowledge
Management: A Computational Perspective. Annual Conference of the American
Society for Information Science, Washington, DC, p. 482-496.
Knowledge management research focuses on the
development of concepts, methods, and tools supporting the management of
human knowledge. To further this objective, researchers are studying the
way organizations, groups and individuals use knowledge in the performance
of daily tasks. They are also developing computer-based tools and techniques
to support the acquisition, representation, organization, retrieval, analysis
and evolution of knowledge in its many forms. The main objective of this
paper is to survey some of the primitive concepts that have been used in
computer science for the representation of knowledge and summarize some
of their advantages and drawbacks. A secondary objective is to relate these
techniques to information sciences theory and practice.
Several research areas within computer science
have developed techniques for representing knowledge so that it can be
accessed and used by humans and software systems alike. In particular,
Artificial Intelligence (AI) has developed techniques for representing
knowledge so that it can be exploited by intelligent systems. Databases
have focused on techniques that allow the representation and management
of large amounts of simple knowledge, using as vehicles relational databases
and related technologies. Software Engineering and Information Systems
have developed elaborate techniques for capturing knowledge that relates
to the requirements, design decisions and rationale for a software system.
We characterize all these techniques in terms of the primitive concepts
they offer for representing knowledge within a given class of applications.
-
Dayani-Fard, H. and Jurisica, I. (1998) Reverse Engineering by Mining Dynamic
Repositories. In Working Conference on Reverse Engineering (WCRE'98),
Honolulu, Hawaii.
This paper presents some preliminary results
on applying information retrieval and knowledge-mining techniques to reverse
engineering of legacy systems. In order to support a dynamic environment,
we take an approach of integrating lightweight tools. Instead of forcing
a user to use a fixed environment, our approach provides a basic information
repository, which manages information extracted from the documentation
and source code. The system stores this information in a graph structure
and supports navigation through the repository, as well as modification of its
structure and annotation. Preliminary evaluation of the proposed approach
on a small software system is encouraging.
-
Jurisica, I. (1998) Asynchronous Telemedicine: A Case-Based Reasoning Approach
to Knowledge Sharing. In Information Technology in Community Health (ITCH*98)
Conference, Victoria, BC.
The health care industry faces constant demands
to improve quality, extend services, and reduce cost. Telemedicine satisfies
these demands by supporting distant consultations. In addition, knowledge-based
systems may augment current synchronous telemedicine applications by storing
and managing medical experience over time. By providing timely and efficient
access to the knowledge repository, knowledge-based systems help to distribute
experience, standardize procedures, lower cost, and increase quality of
health care services. This facilitates asynchronous telemedicine.
Our previous experience from using a case-based
reasoning system to support specialists in the in vitro fertilization domain
shows that this paradigm is suitable for building medical knowledge repositories
for knowledge sharing. We propose to extend the system to support tele-consultations:
(1) between specialists (rare medical cases); (2) between general practitioners
and specialists (standard practices); and (3) between health care professionals
and patients (generic medical information). This will help to standardize
patient examination and treatment practices. In addition, physicians will
be able to share experience via a remote knowledge repository.
This paper focuses on extensions for specialists.
We show how case-based reasoning can support evidence-based medicine, remote
consultations, and improve knowledge sharing and domain understanding.
-
Mylopoulos, J, Jurisica, I. and Yu, E. (1998) Computational mechanisms
for knowledge organization. In 5th International Conference
of the International Society of Knowledge Organization (ISKO 5),
pages 125-132, Lille, France. ISKO'98.ps.Z
This paper reviews several knowledge organization
techniques used in Computer Science, in areas such as Artificial Intelligence,
Databases and Software Engineering. Some of these computational mechanisms
may assist in the organization and management of immense digital information
resources. At the same time, the paper notes an increasing need for computer-based
information systems to operate in open networked environments. This need
requires knowledge organization principles, which are flexible and can
be used with informally expressed knowledge. We expect to find such knowledge
organization techniques in Library and Information Sciences, and hope to
integrate them with the computational techniques described in this paper.
-
Jurisica, I. and Glasgow, J. (1998). An efficient approach to iterative
browsing and retrieval for case-based reasoning. Editors Angel Pasqual del
Pobil, Jose Mira and Moonis Ali, Lecture Notes in Computer Science, IEA/AIE*98,
pages 535-546, Springer-Verlag. IEA/AIE'98.ps.Z
A case base is a repository of past experiences
that can be used for problem solving. Given a new problem, expressed in
the form of a query, the case base is browsed in search of "similar" or
"relevant" cases. One way to perform this search involves the iterative
evaluation of a series of queries against the case base, where each query
in the series is obtained by restricting or relaxing the preceding query.
The paper considers alternative approaches for
implementing iterative browsing in case-based reasoning systems, including
a naive algorithm, which evaluates each query independent of earlier evaluations,
and an incremental algorithm, which reuses the results of past query evaluations
to minimize the computation required for subsequent queries. In particular,
the paper proposes an efficient algorithm for case base browsing and retrieval
using database techniques for view maintenance. In addition, the paper
evaluates the performance of the proposed algorithm with respect to alternative
approaches considering two perspectives: (1) experimental efficiency evaluation
using diverse application domains, and (2) scalability evaluation using
the performance model of the proposed system.
-
Jurisica, I. and Nixon, B. (1998) Building quality into case-based reasoning
systems. Lecture Notes in Computer Science, CAiSE*98, pages
363-380, Springer-Verlag. CAiSE'98.ps.Z
Complex decision-support information systems
for diverse domains need advanced facilities, such as knowledge repositories,
reasoning systems, and modeling for processing interrelated information.
System development must satisfy functional requirements, but must also
systematically meet global quality factors, such as performance, confidentiality
and accuracy, called non-functional requirements (NFRs).
Case-based reasoning (CBR) systems, an important
class of decision support systems, require a design process that systematically
produces high-quality applications. Beyond satisfying basic functional
requirements for CBR, it is important to meet global quality factors, such
as performance and confidentiality, called non-functional requirements
(NFRs). This paper presents a goal-oriented, knowledge-based approach for
aiding decision support system development and usage, namely, it proposes
an approach for dealing with non-functional requirements (NFRs) for CBR
systems. We show how quality can be built into a CBR system, using the
"QualityCBR" approach, which integrates existing work on CBR and NFRs.
We illustrate the use of the approach in a complex medical domain – in
vitro fertilization [C8]. In this domain, a CBR system is used for:
(1) suggesting hormonal therapy for in-vitro fertilization patients,
(2) predicting the probability of successful pregnancy, and (3) interactively
determining important patient characteristics that can improve pregnancy
rate. The QualityCBR approach is used to address important NFRs, such as
performance, accuracy and confidentiality.
-
Jurisica, I. Similarity-Based retrieval for diverse Bookshelf software
repository users. In IBM CASCON Conference, pages 224-235, Toronto,
Canada, 1997. CASCON'97.ps.Z
The paper presents a similarity-based retrieval
framework for a software repository that aids the process of maintaining,
understanding, and migrating legacy software systems. Designing a software
repository involves three issues: (1) information content; (2) information
representation; and (3) strategies for accessing repository artifacts.
Given the architecture of a Bookshelf software repository, we extend the
retrieval system to support imprecise queries, iterative browsing, and
diverse users. Because of repository size, complexity of queries and relations
among artifacts, we take a performance approach to support a scalable implementation.
We propose a retrieval system that uses numeric and semantically rich context-based
similarity [J1]. Efficient iterative browsing is based on an incremental
query evaluation algorithm from database management systems. Explicitly
defined context supports various retrieval strategies and diverse user
models.
-
Jurisica, I. and Gupta, K. Knowledge-based systems for decision support
in healthcare. In Digital Knowledge Conference II, Toronto, 1997.
This paper introduces a generic approach to knowledge-based
decision-support in medicine. We review problems present in medical domains
and introduce available solutions. We describe a case-based reasoning system
called SpotLight and discuss its advantages when applied to complex medical
domains, namely in vitro fertilization and nephrology.
-
Jurisica, I. and Glasgow, J. A case-based reasoning approach to learning
control. In 5th International Conference on Data and Knowledge
Systems for Manufacturing and Engineering, DKSME-96, Phoenix, AZ, 1996.
DKSME'96.ps.Z
-
Jurisica, I. and Glasgow, J. Case-Based Classification Using Similarity-Based
Retrieval. In 8th IEEE International Conference on Tools
with Artificial Intelligence, Toulouse, France, p. 410-419. TAI'96.ps.Z
-
Jurisica, I. TA3: Case-Based Intelligent Retrieval and Advisory Tool. ACM
Conference on Society and the Future of Computing. Durango, CO, 1995.
-
Jurisica, I. and Shapiro, H. Case-based reasoning system applied as an
advisor for IVF practitioners. 51st Annual Conference of the
American Society for Reproductive Medicine, Seattle, WA, 1995. ASRM'95.ps.Z
-
Greiner, R. and Jurisica, I. A statistical approach to solving the EBL
utility problem. In Proc. of National Conference on AI, AAAI-92,
pages 241-247, San Jose, CA, 1992. AAAI'92.ps.Z
Similarity plays a central role in theories
of human problem solving and thus is important for artificial intelligence
research. Although there are different approaches to similarity assessment,
the underlying idea is to classify information according to some features,
so that we can use it in similar situations. Depending on the application
domain, the task at hand, and user preferences, the relevance of individual
features may vary, and so will the similarity of the concepts they represent.
It is paramount to know what affects feature relevance and how to represent
such information explicitly.
The objective of this thesis is to improve case-based
reasoning by: (1) achieving better accuracy during classification; (2)
retrieving cases that are more relevant to a given problem; and (3) obtaining
scalability with respect to case base size, case and query complexity.
We achieve this goal by introducing a new theory of similarity-based retrieval
that uses variable-context similarity assessment, and by defining an efficient
iterative retrieval algorithm that employs ideas of incremental view maintenance
algorithms from database management systems. Context is a parameter of
similarity that specifies what attributes are involved in similarity assessment
between cases, and what set of values may be considered for these attributes.
It defines which aspects of a case are important in a particular situation.
We also define a set of operations, namely relaxation and restriction,
which make it possible to control the relevance of retrieved cases.
We evaluate competence, scalability and algorithmic
complexity of a prototype system on diverse real-world domains. We show
how the proposed similarity measure supports flexible computation by trading
off the accuracy or precision of the computation process for time and space
resources. In addition, the case representation used supports case base
organization so that cases similar in a given context can be grouped into
clusters. This representation also lends itself to attribute-oriented discovery,
a technique that finds relevant attributes and their values. The discovery
process improves the representation by grouping together relevant attributes, removing
unneeded ones, or adding essential ones. Performance evaluation shows how
the discovery process improves the system's competence. Iterative retrieval
of cases is efficiently handled by the adoption of incremental view maintenance
algorithms from database management systems. Performance evaluation shows
that this approach improves efficiency of case retrieval and thus helps
to achieve system scalability with respect to case base size, case representation
and query complexity.
Jurisica, I. (1993). Query Optimization for Knowledge
Base Management Systems: A Machine Learning Approach. MSc thesis, Department
of Computer Science, University of Toronto, Toronto, Ontario.
This thesis proposes new machine learning applications
to optimize queries in knowledge base management systems. In particular,
an explanation-based machine learning algorithm is adopted, extended and
tested. The algorithm, called PALO (Probably Approximately Locally Optimal),
is a general model of a learning system and is directly applicable to a
variety of systems as a speedup learning module. The algorithm is based
on the theoretical work of Valiant [Valiant-CACM84] and uses statistical
information to produce a close approximation of a locally optimal search
strategy. Some additions are made to the original version of the algorithm,
to solve a broader range of problems. In addition, the termination condition
in the algorithm is changed in order to make it run faster without any
degradation of its performance. The learning module is implemented and its
integration into an architecture of a knowledge base management system
is shown. The proposed optimization technique is tested with real and artificial
examples to establish its effectiveness.