Computer Science Research Day

August 26, 2005

Photos of the day

Morning sessions (Votey 367):

9:00-10:00am. Keynote: Professor Jason H. Moore (Dartmouth)

Title: Computational Human Genetics

10:00-10:30am. Coffee break

10:30-12:00pm. Graduate student presentations

Click on a talk to see its title and abstract. All 5-minute talks advertise the corresponding posters at lunchtime. Please visit the student posters!

  1. Josh Payne (20 minutes)
  2. David Van Horn (20 minutes)
  3. Mohammed Al-Kateb (5 minutes)
  4. Dan He (5 minutes)
  5. AJ Qiang Jing (5 minutes)
  6. Bill Biyu Liang (5 minutes)
  7. Katerina Simonova (5 minutes)
  8. Nagi Basha (5 minutes)

12:00-1:30pm. Lunch (provided by dept) & Posters (Votey 351 and 322)

All graduate student research posters

Afternoon sessions (Votey 367):

1:30-3:00pm. Graduate student presentations

  1. Like Gao (20 minutes)
  2. Peter Chapin (20 minutes)
  3. Helen Yu (20 minutes)
  4. Tri Tran (20 minutes)
  5. Yan Zhang (20 minutes)

3:10-3:30pm. Coffee break

3:30-5:00pm. Discussion session (Votey 322):

Xindong Wu (lead, 30 minutes):
"Where we are and what we would like to be, in terms of our research profiles and long-term planning"

Byung Lee and Maggie Eppstein (lead, 30 minutes):
How to promote research collaborations (both within the Department and across campus)

Chris Skalka (lead, 30 minutes):
Publications in high-impact journals vs first-rate conferences

Sean Wang (lead, if any time is left): Free discussion.

Talk titles and abstracts:

Computational Human Genetics

Jason H. Moore, Ph.D.
Frank Lane Research Scholar in Computational Genetics
Associate Professor of Genetics
Dartmouth-Hitchcock Medical Center

Abstract:
An important goal of human genetics is to understand the role of genes in predicting the initiation, progression, and severity of common diseases such as breast cancer and hypertension. Will one or more of the 30,000 genes in the human genome predict your risk of having a heart attack above and beyond information about your diet and exercise patterns? This is an exciting time in human genetics because we now have access to technologies that make it feasible to measure thousands of DNA sequence variations across the human genome. The challenge for geneticists is to sift through the massive amounts of information to determine which specific DNA sequence variations play an important role in disease susceptibility. The purpose of this talk is to communicate the important role that computer science will play in turning data into clinically useful knowledge.

Network Analysis of Mating Interactions in Evolving Artificial Systems

Josh Payne, CS PhD Student

Abstract: In recent years, the analysis of complex systems as networks has received an increasing amount of attention. It has been shown that systems as seemingly disparate as the Internet, the interactions of chemicals within the cell, email correspondences, metabolic networks, the World Wide Web, neural networks, and human sexual interactions all possess similar underlying topological characteristics. That such dissimilar systems possess such similar inherent properties provides insight into the mechanisms by which complex systems are created and the rules by which they are governed. To gain a better understanding of the evolutionary dynamics of evolving artificial systems, we propose to model the mating interactions of population-based evolving systems as a network. This talk presents network analysis in the context of evolutionary algorithms and provides preliminary results depicting the emergence of scale-free mating topologies under high selection pressure. The implications of the emergence of such a mating structure will be discussed, as will the future directions of this research.
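As a concrete illustration of the proposed modeling step, the following minimal Python sketch runs a toy genetic algorithm under strong tournament selection, records mating events as network edges, and prints the degree histogram of the resulting mating network. The GA, its parameters, and the representation are invented for the example and are not the talk's experimental setup.

    # Minimal sketch (not the authors' code): record mating interactions in a
    # toy GA and inspect the degree distribution of the resulting network.
    import random
    from collections import Counter

    def tournament(pop, fitness, k):
        """Select one parent index by k-tournament (higher fitness wins)."""
        return max(random.sample(range(len(pop)), k), key=lambda i: fitness[i])

    random.seed(0)
    pop = [random.random() for _ in range(100)]   # toy genomes: a single real
    fitness = pop[:]                              # fitness = genome value
    k = 7                                         # high selection pressure
    edges = Counter()                             # mating network edges

    for _ in range(50):                           # 50 generations of matings
        for _ in range(len(pop)):
            a, b = tournament(pop, fitness, k), tournament(pop, fitness, k)
            if a != b:
                edges[frozenset((a, b))] += 1

    # Degree of each node = number of distinct mating partners.
    degree = Counter()
    for pair in edges:
        for node in pair:
            degree[node] += 1

    # A heavy-tailed degree histogram suggests a scale-free-like topology.
    hist = Counter(degree.values())
    for d in sorted(hist):
        print(f"degree {d}: {hist[d]} nodes")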

Algorithmic Trace Effect Analysis: Context Based Security in Programming Languages

David Van Horn, CS MS Student

Trace effect analysis is a programming language analysis for ensuring correct and secure behavior of software with respect to so-called temporal properties of programs. Temporal properties express the well-ordering of program events, such as when files are opened or when resources are obtained. By enforcing program event order specifications, we can ensure that files are opened before being read, or that privilege activation occurs before privileged resources are acquired. In general, the ability to enforce temporal properties in programs allows a wide variety of safety and security guarantees to be made about software.

Further, this analysis is performed automatically at compile time, i.e., it takes place before the program executes, leading to earlier detection of errors. A consequence of such compile-time (also known as static) reasoning is that if a program is deemed correct by the analysis, it will never violate its specification at run time, ensuring safe program behavior in all possible circumstances of program execution.

Common security mechanisms such as stack inspection, found in the Java programming language and .NET framework, are enforced via run time inspection of the sequence of events occurring up to the point of the inspection. These mechanisms are formalizable within trace effect analysis and thus can be enforced at compile time, implying all run time inspections are superfluous and can be eliminated. Any run time overhead associated with enforcing the security mechanism can therefore also safely be eliminated.

Our analysis consists of a program logic (a system of axioms and deduction rules) such that if a program has a derivable logical judgment, the program meets its temporal specification. Additionally we have defined an algorithm for discovering such a judgment for a program. This algorithm has been proved sound, which is to say any judgment found by the algorithm is a valid logical judgment and represents a proof that the program meets its specification. This proof discovery procedure has been implemented for a foundational programming language with notions of atomic events, temporal assertions, and computational traces (sequences of events).
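For readers unfamiliar with temporal properties, the sketch below dynamically checks the "files are opened before being read" property on a finished event trace; the point of trace effect analysis is to establish the same guarantee statically, at compile time. The event names and the checker itself are illustrative only.

    # Minimal dynamic sketch of the kind of temporal property the static
    # analysis enforces: "a file must be opened before it is read."

    def satisfies_open_before_read(trace):
        """Return True iff every 'read' event is preceded by an 'open'."""
        opened = False
        for event in trace:
            if event == "open":
                opened = True
            elif event == "read" and not opened:
                return False   # temporal specification violated
            elif event == "close":
                opened = False
        return True

    print(satisfies_open_before_read(["open", "read", "close"]))  # True
    print(satisfies_open_before_read(["read", "open"]))           # False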

Approximate Query Result Distribution over Data Streams: A Load Shedding Case

Mohammed Al-Kateb, CS PhD student

We consider the problem of approximate join processing over data streams. In this context, we address the problem of load shedding: dropping a portion of incoming tuples when system resources (processing power, memory buffer size, or both) are insufficient. In our model, the objective of load shedding is to preserve the frequency distribution of the query result as much as possible when memory buffer size is limited. We show that this optimization problem is NP-hard and formulate it as a binary quadratic programming (BQP) problem, which is also well known to be NP-hard. Several heuristic approaches have been proposed to alleviate the computational complexity of BQP. We use an approach called the memetic algorithm, also known as the genetic local search (GLS) algorithm, which combines a genetic algorithm with a local search algorithm. In many cases, memetic algorithms have been shown to be capable of finding near-optimal solutions to difficult combinatorial optimization problems.
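As an illustration of the memetic (genetic plus local search) technique in general, and not of the load-shedding formulation itself, here is a minimal Python sketch that attacks a toy BQP instance (maximize x^T Q x over binary x) with one-point crossover, mutation, and bit-flip hill climbing; the instance and all parameters are made up.

    # Minimal memetic-algorithm sketch for a toy BQP: maximize x^T Q x,
    # x in {0,1}^n. Illustrative only; Q and all parameters are invented.
    import random
    random.seed(1)

    n = 16
    Q = [[random.uniform(-1, 1) for _ in range(n)] for _ in range(n)]

    def value(x):
        """Objective of the toy BQP."""
        return sum(Q[i][j] * x[i] * x[j] for i in range(n) for j in range(n))

    def local_search(x):
        """Bit-flip hill climbing: the 'local' half of the memetic algorithm."""
        cur, improved = value(x), True
        while improved:
            improved = False
            for i in range(n):
                x[i] ^= 1
                v = value(x)
                if v > cur:
                    cur, improved = v, True
                else:
                    x[i] ^= 1          # undo a non-improving flip
        return x

    pop = [local_search([random.randint(0, 1) for _ in range(n)])
           for _ in range(20)]
    for _ in range(30):                # the 'genetic' half
        pop.sort(key=value, reverse=True)
        parents, children = pop[:10], []
        for _ in range(10):
            a, b = random.sample(parents, 2)
            cut = random.randrange(1, n)
            child = a[:cut] + b[cut:]                # one-point crossover
            if random.random() < 0.2:
                child[random.randrange(n)] ^= 1      # mutation
            children.append(local_search(child))     # refine every offspring
        pop = parents + children
    print("best objective found:", value(max(pop, key=value)))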

Ontology-based feature transformation

Dan He, CS PhD Student

For an ordinary classification problem, a large training set is needed so that nearly all important features are covered. For the small-training-set problem, no current feature selection method achieves good classification accuracy, because there is simply not enough information about the features. Here we propose a method based on a domain ontology of the feature set. We use the hierarchical structure of the domain ontology to better estimate the frequencies of uncovered features and thereby achieve better classification results.
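A minimal sketch of the back-off idea, under our own illustrative assumptions (the toy ontology, the observed counts, and the sibling-averaging rule are not from the talk): the frequency of a feature unseen in the training set is estimated from its siblings or ancestors in the ontology.

    # Estimate the frequency of an unseen feature from related features
    # in a toy domain ontology. All names and counts are illustrative.
    parent = {            # toy ontology: child -> parent concept
        "sedan": "car", "suv": "car",
        "car": "vehicle", "bicycle": "vehicle",
    }
    children = {}
    for c, p in parent.items():
        children.setdefault(p, []).append(c)

    observed = {"sedan": 12, "bicycle": 3}   # counts in a small training set

    def estimate(feature):
        """Back off to siblings (via the parent) when a feature is unseen."""
        if feature in observed:
            return observed[feature]
        p = parent.get(feature)
        if p is None:
            return 0
        sibs = [observed[s] for s in children.get(p, []) if s in observed]
        # use the sibling average as a smoothed estimate, else recurse upward
        return sum(sibs) / len(sibs) if sibs else estimate(p)

    print(estimate("suv"))   # unseen; estimated from its sibling "sedan"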

Energy-effective Target Tracking in Binary Sensor Networks

AJ Qiang Jing, CS PhD Student

Target tracking is a very important application of sensor networks, and there are various types of sensor networks for target tracking. The simplest model among them is the binary sensor network, in which each sensor may generate as little as one bit of information: whether an object is approaching the sensor or moving away from it.

In previous work, a probabilistic model was used to describe the moving object, and the particle filtering method was used to estimate the object's trajectory in the sensor field. The proposed tracking algorithm gives accurate predictions about the object's direction of motion and, when proximity sensors that report whether the object is in their immediate neighborhood are available, determines the location of the moving object as well.

The concern of energy conservation arises in binary sensor networks as well. In fact, among all the sensors in the field, only a small number are useful to the tracking algorithm at any given time. Our proposed algorithm focuses on selectively turning on the sensors that provide the maximum information about the motion of the object, while turning off the sensors that provide no useful information. Thus, without compromising much tracking accuracy, we can extend the network lifetime considerably by conserving the energy of inactive sensors.
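To make the binary-sensor setting concrete, the following one-dimensional Python sketch runs a particle filter on +1/-1 "approaching/receding" readings; the motion model, noise levels, and sensor layout are invented for the example and are not the algorithm evaluated in this work.

    # Minimal 1-D particle filter on binary (approaching/receding) readings.
    import random
    random.seed(2)

    sensors = [0.0, 5.0, 10.0]            # sensor positions on a line
    true_x, true_v = 1.0, 0.8             # true object state

    def reading(sensor, x, v):
        """+1 if the object is approaching the sensor, else -1."""
        return 1 if (sensor - x) * v > 0 else -1

    # Particles are (position, velocity) guesses.
    particles = [(random.uniform(0, 10), random.uniform(-1, 1))
                 for _ in range(500)]

    for step in range(10):
        true_x += true_v
        obs = [reading(s, true_x, true_v) for s in sensors]
        # Predict: move each particle, adding process noise.
        particles = [(x + v + random.gauss(0, 0.1), v + random.gauss(0, 0.05))
                     for x, v in particles]
        # Weight: particles consistent with more binary readings score higher
        # (small smoothing term keeps the total weight positive).
        weights = [1e-3 + sum(reading(s, x, v) == o
                              for s, o in zip(sensors, obs))
                   for x, v in particles]
        # Resample in proportion to weight.
        particles = random.choices(particles, weights=weights,
                                   k=len(particles))

    est_x = sum(x for x, _ in particles) / len(particles)
    print(f"true position {true_x:.2f}, estimate {est_x:.2f}")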

A Predictive QoS Control Strategy for Wireless Sensor Networks

Bill Biyu Liang, CS PhD Student

The number of active sensors in a wireless sensor network has been proposed as a measure, albeit limited, of quality of service (QoS), for it dictates the spatial resolution of the sensed parameters. In very large sensor network applications, the number of sensor nodes deployed may exceed the number required to meet the resolution requirement. Herein we propose a method, dubbed predictive QoS control (PQC), to manage the number of active sensors in such an over-deployed network. The strategy is shown to obtain near-Bernoulli lifetime and variance performance, with the added benefit of not requiring the network to know the total number of sensors available. This benefit is especially relevant in networks where sensors are prone to failure due not only to energy exhaustion but also to environmental factors, and in networks where nodes are replenished over time. The method also has the advantage that only transmitting sensors need to listen for QoS control information, enabling inactive sensors to operate at extremely low power levels.
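The following toy sketch conveys the flavor of steering the number of active sensors toward a target without knowing the population size; the multiplicative probability update it uses is an illustrative assumption, not the PQC strategy itself.

    # Toy controller: adjust each sensor's transmission probability from
    # the observed active count alone. Update rule is an assumption.
    import random
    random.seed(3)

    n_alive = 1000        # unknown to the controller
    target = 100          # desired number of active sensors (the QoS level)
    p = 0.5               # transmission probability, broadcast to all sensors

    for round_ in range(12):
        active = sum(random.random() < p for _ in range(n_alive))
        print(f"round {round_}: active={active}, p={p:.3f}")
        # Controller update needs only the observed count, not n_alive.
        p = min(1.0, p * target / max(active, 1))
        n_alive -= random.randint(0, 5)   # sensors die from failures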

Pairwise key establishment in distributed sensor networks

Ekaterina Simonova, CS PhD Student

Due to their relatively low cost, distributed sensor networks are gaining popularity for civilian and military purposes. Their wide use in hostile environments for information collection, together with the vulnerability of nodes to physical capture, imposes strict requirements on the security and authentication of communications within the network. Pairwise key establishment is a fundamental security service that forms the basis for other security services. Limited computational and power resources make traditional asymmetric cryptography impractical, as expensive computations would exhaust power resources and expose the network to denial-of-service attacks. Special key predistribution techniques are therefore being developed to establish secure channels between nodes.
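As one classic example from this family of techniques (and not necessarily the scheme developed in this work), the sketch below implements Eschenauer-Gligor-style random key predistribution: each node stores a random subset of a large key pool, and two nodes can form a pairwise key whenever their subsets intersect.

    # Random key predistribution sketch; pool and ring sizes are invented.
    import random
    random.seed(4)

    POOL_SIZE, RING_SIZE, NODES = 1000, 50, 20
    pool = range(POOL_SIZE)
    key_ring = {n: set(random.sample(pool, RING_SIZE)) for n in range(NODES)}

    def shared_key(a, b):
        """Return a common key id if the two nodes share one, else None."""
        common = key_ring[a] & key_ring[b]
        return min(common) if common else None

    linked = sum(shared_key(a, b) is not None
                 for a in range(NODES) for b in range(a + 1, NODES))
    total = NODES * (NODES - 1) // 2
    print(f"{linked}/{total} node pairs can establish a direct pairwise key")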

Intelligent Agents Approach to the On-line Management and Control of Transportation Networks

Nagi Basha, CS PhD Student

In recent years, there has been a concerted effort to take advantage of advances in communications, electronics, and information technology to improve the efficiency and safety of transportation systems. Within the transportation community, this effort is generally referred to as the Intelligent Transportation Systems (ITS) program. From an efficiency standpoint, ITS offers an alternative approach to meeting the challenges of increasing travel demand: rather than physically constructing additional transportation system capacity, it attempts to optimize the utilization of existing capacity. At the core of several ITS applications is the notion of using information about the current status of the transportation network (obtained in real time through in-road sensors, closed-circuit TV cameras, incident reports, etc.) to derive control strategies aimed at maximizing the use of existing capacity. These control strategies could take the form of route guidance recommendations that steer drivers away from congested segments, ramp metering strategies that control the traffic volume entering a freeway network, and adaptive signal control strategies that change traffic signal plans in real time to better accommodate observed volumes.

Goal: Develop an intelligent agent that learns by itself, from its interaction with the real world, to provide route guidance for motorists.
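A minimal Q-learning sketch of this goal on a toy road network, where link costs are travel times and the agent learns a route purely by interaction; the network, rewards, and parameters are all invented for the example.

    # Toy Q-learning route guidance. Network and parameters are made up.
    import random
    random.seed(5)

    # Road network: node -> {neighbor: travel time in minutes}.
    roads = {
        "A": {"B": 5, "C": 2},
        "B": {"D": 4},
        "C": {"B": 1, "D": 7},
        "D": {},
    }
    GOAL = "D"
    Q = {(n, m): 0.0 for n in roads for m in roads[n]}
    alpha, gamma, eps = 0.5, 0.95, 0.2

    for episode in range(500):
        node = "A"
        while node != GOAL:
            nbrs = list(roads[node])
            # epsilon-greedy: mostly follow the best link, sometimes explore
            move = (random.choice(nbrs) if random.random() < eps
                    else max(nbrs, key=lambda m: Q[(node, m)]))
            reward = -roads[node][move]       # cost = negative travel time
            future = max((Q[(move, m)] for m in roads[move]), default=0.0)
            Q[(node, move)] += alpha * (reward + gamma * future
                                        - Q[(node, move)])
            node = move

    # Greedy route after learning: A -> C -> B -> D (7 minutes total).
    node, route = "A", ["A"]
    while node != GOAL:
        node = max(roads[node], key=lambda m: Q[(node, m)])
        route.append(node)
    print(" -> ".join(route))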

Feature Selection for Building Cost-Effective Data Stream Classifiers

Like Gao, CS PhD Student

A stream classifier is a decision model that assigns a class label to a data stream based on the continuously arriving data. As in any classification problem, various features of the stream can be used. Some features may be highly relevant to the classification task but quite costly to obtain, while others may be less relevant but cheaper to acquire. Cost-based classification has been studied to deal with features of differing costs. However, a situation unique to data streams is that some (inexpensive) features may become more and more relevant with the passing of time, while the time needed to reach a decision may itself be considered a cost. A challenge is how to balance these different costs when building classifiers. To this end, this paper uses a feature selection strategy that extends the traditional Relief algorithm. The idea is to estimate the classification cost of each feature and order all the features by a score that combines both the cost estimate and the classification relevance. A classifier is then built with the selected features using a traditional classification method. Experimental results show that classifiers constructed with this strategy are indeed cost-effective.
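A minimal sketch in the spirit of the strategy described above: compute Relief-style relevance weights on toy data, then rank features by relevance per unit cost. The data, the acquisition costs, and the particular relevance/cost combination are our illustrative assumptions, not the paper's exact extension.

    # Relief-style relevance combined with feature cost. Illustrative only.
    import random
    random.seed(6)

    # Toy dataset: feature 0 is informative, feature 1 is noise.
    data = [([random.gauss(y, 0.5), random.random()], y)
            for y in (0, 1) * 50]
    costs = [5.0, 1.0]        # e.g. acquisition cost per feature
    m = len(data)

    def dist(a, b):
        return sum((u - v) ** 2 for u, v in zip(a, b))

    relevance = [0.0, 0.0]
    for x, y in data:
        same = [(u, v) for u, v in data if v == y and u is not x]
        diff = [(u, v) for u, v in data if v != y]
        hit = min(same, key=lambda p: dist(x, p[0]))[0]   # nearest same-class
        miss = min(diff, key=lambda p: dist(x, p[0]))[0]  # nearest other
        for f in range(2):
            relevance[f] += (abs(x[f] - miss[f]) - abs(x[f] - hit[f])) / m

    # Combine relevance and cost; here, simply relevance per unit cost.
    scores = [r / c for r, c in zip(relevance, costs)]
    print("relevance:", relevance, "cost-adjusted scores:", scores)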

Risk Assessment in Distributed Authorization

Peter Chapin, CS PhD Student

In a distributed environment, resource owners do not know the identities of users. As a result, authorization decisions must be based on a chain of credentials that proves the user complies with the resource owner's access policy. Numerous "trust management" systems have been developed to address this. In these systems, individual credentials are either valid or invalid. We have extended an existing trust management system to allow risks to be associated with credentials. We develop a formal semantics for our extension and describe an authorization algorithm that avoids wasteful searching yet remains sound with respect to the semantics.
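One natural way to picture risk-aware authorization (our illustration, not necessarily the formal semantics developed here) is as a search over a credential graph whose edges carry risk values; the sketch below finds the least-risk delegation chain with a Dijkstra-style search, assuming risks combine additively.

    # Least-risk credential chain search. Graph and risks are invented.
    import heapq

    # credential graph: issuer -> [(subject, risk)]
    credentials = {
        "Owner": [("UnivA", 0.1), ("PartnerB", 0.4)],
        "UnivA": [("Alice", 0.2)],
        "PartnerB": [("Alice", 0.1)],
    }

    def least_risk_chain(start, user):
        """Return (total risk, chain) of the cheapest credential path."""
        frontier = [(0.0, start, [start])]
        seen = set()
        while frontier:
            risk, node, chain = heapq.heappop(frontier)
            if node == user:
                return risk, chain
            if node in seen:
                continue
            seen.add(node)
            for subject, r in credentials.get(node, []):
                heapq.heappush(frontier, (risk + r, subject, chain + [subject]))
        return None

    # ~0.3 total risk via ['Owner', 'UnivA', 'Alice']
    print(least_risk_chain("Owner", "Alice"))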

Pattern Matching with Wildcards and Length Constraints

Helen Yu He, CS PhD Student

In this talk, we introduce our NSF proposal on a unique problem: pattern matching with wildcards and length constraints. Given a pattern P and a text T, a substring S of T is a matching string of P if (1) the number of wildcards between every two consecutive pattern letters in S and (2) the length of S are both bounded by the user's specifications. We have designed a preliminary algorithm, SAIL, that solves this problem in an on-line fashion. Since the original problem is complicated, we will also discuss our short-term and long-term plans for solving it.
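The following sketch checks the matching condition itself by brute force (it is not the SAIL algorithm): it enumerates occurrences whose per-gap wildcard counts and overall span satisfy user-specified bounds. The inputs and bounds are illustrative.

    # Brute-force matcher for the gap/length-constrained condition.
    def find_matches(text, pattern, gap, length):
        """Yield index tuples matching `pattern` with gap=(lo,hi) wildcards
        allowed between consecutive letters and total span in length=(lo,hi)."""
        def extend(prefix):
            i = len(prefix)
            if i == len(pattern):
                span = prefix[-1] - prefix[0] + 1
                if length[0] <= span <= length[1]:
                    yield tuple(prefix)
                return
            lo = prefix[-1] + 1 + gap[0] if prefix else 0
            hi = prefix[-1] + 1 + gap[1] if prefix else len(text) - 1
            for j in range(lo, min(hi, len(text) - 1) + 1):
                if text[j] == pattern[i]:
                    yield from extend(prefix + [j])
        yield from extend([])

    # 'a' then 'b' with 0-2 wildcards between, total span 2-4 characters.
    print(list(find_matches("axxbab", "ab", gap=(0, 2), length=(2, 4))))
    # -> [(0, 3), (4, 5)]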

Efficient Processing of Window-based Aggregation Join Queries over Data Streams using Query Transformations

Tri Tran, CS PhD Student

Our research concerns the problem of efficiently processing window-based aggregation join queries over data streams. These queries involve both join and aggregation operations, and handling the two in combination allows for better optimization. Our focus is on query transformations of the original query execution plans. For this, we first introduce new stream operators: aggregation set update and aggregation set join. Then, we propose query transformation rules based on the idea of performing aggregation before joins, so that join input cardinalities may be reduced while the operators support continuous incremental processing. Through experiments, we confirm the efficiency advantage of transformed query execution plans over the original ones. Our ongoing work considers the optimization problem in cases of insufficient memory space and computational resources. We define two error metrics as objective functions and formulate optimization problems for different cases of resource usage. Since the problem involving multi-way joins among multiple streams is recognized as intractable, we propose heuristics to find the system parameters.
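A minimal sketch of the aggregate-before-join idea on two toy streams, maintained incrementally in the spirit of the aggregation set update and aggregation set join operators; the schemas, query, and update rules are our illustrative simplifications, not the paper's operators.

    # Aggregate-before-join, maintained incrementally on two toy streams.
    from collections import defaultdict

    r_sum = defaultdict(float)   # aggregation set: key -> SUM(R.val) so far
    s_cnt = defaultdict(int)     # key -> number of S tuples so far

    def result():
        """Current answer of: SELECT key, SUM(R.val)
           FROM R JOIN S USING(key) GROUP BY key."""
        # each S tuple with the same key duplicates R's contribution
        return {k: r_sum[k] * s_cnt[k] for k in r_sum if s_cnt[k]}

    def on_r(key, val):          # aggregation-set update on R arrival
        r_sum[key] += val

    def on_s(key):               # aggregation-set join on S arrival
        s_cnt[key] += 1

    for key, val in [("a", 1.0), ("b", 2.0), ("a", 3.0)]:
        on_r(key, val)
    on_s("a")
    print(result())   # {'a': 4.0}: pre-aggregated R joined with S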

ACE: An Aggressive Classifier Ensemble with Error Detection, Correction and Cleansing

Yan Zhang, CS PhD Student

Given a dataset with deceptive information in it, locating the suspicious instances and attributes so as to benefit further data mining procedures is a very important issue. The essential goal is to reduce the impact of noise and to enhance learners built from deception-corrupted data. In this work, we provide a novel framework that unifies error detection, correction, and data cleansing to build an aggressive classifier ensemble for effective learning from noisy data. Experimental comparisons demonstrate that such an aggressive classifier ensemble is superior to a model built from the original noisy data, and is more reliable in enhancing learning accuracy from noisy data sources than data correction or cleansing efforts alone.
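A minimal sketch of the error-detection step as we understand it: instances whose labels a majority of independently trained classifiers dispute are flagged as suspicious and could then be corrected or cleansed. The bootstrap nearest-centroid classifiers here stand in for the real ensemble members; the data and injected label noise are invented.

    # Flag suspicious instances via ensemble disagreement. Illustrative only.
    import random
    random.seed(7)

    # 1-D toy data: class 0 near 0.0, class 1 near 1.0, two flipped labels.
    data = [(random.gauss(c, 0.2), c) for c in (0, 1) * 30]
    data[0] = (data[0][0], 1 - data[0][1])    # inject label noise
    data[1] = (data[1][0], 1 - data[1][1])

    def train(sample):
        """Nearest-centroid classifier from a bootstrap sample."""
        cent = {c: sum(x for x, y in sample if y == c) /
                   max(1, sum(1 for _, y in sample if y == c))
                for c in (0, 1)}
        return lambda x: min(cent, key=lambda c: abs(x - cent[c]))

    ensemble = [train(random.choices(data, k=len(data))) for _ in range(15)]

    suspicious = [i for i, (x, y) in enumerate(data)
                  if sum(clf(x) != y for clf in ensemble) > len(ensemble) // 2]
    print("flagged instances:", suspicious)   # should include the flipped ones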