Course Project Ideas
Project groups will consist of two students. Ideally, you would choose
a project that has a bearing on your dissertation research. We provide
some project ideas below; you are encouraged to discuss these and other
ideas with the instructor.
- Naming and location
- Structured p2p systems such as OceanStore and PAST achieve good
storage location distribution using distributed hashes. However, the
hashes might not behave as expected (distributing uniformly across
the key space) in a real deployment. In our lab, William Acosta is
collecting traces that sample the kinds of objects available on
Gnutella networks. An interesting challenge is to use the characteristics
of these object names as the basis for choosing good hashing functions
that distribute objects/nodes uniformly across the key space.
- Notre Dame has a number of desktops available across the campus. These
machines can collectively provide a large distributed store built from
their unused local storage. Choose the level of reliability that you can
provide in order to coalesce this storage into a single global store. You
would choose the specific naming scheme, replication ratios etc. For
example, consider a store with no replication whose global name space is
the union of the individual name spaces: if machine A provides /dir1/1 and
machine B provides /dir1/2 and /dir2/1, then the global store's name space
will be /dir1/{1,2} and /dir2/1. You can implement such a store using
application-level file servers (WebDAV, for example). Also check out
the unionfs paper.
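The union name space example above (machines A and B) can be sketched in a
few lines. This is only an illustrative sketch, assuming each machine simply
reports a list of the paths it provides; the function names union_namespace
and list_directory are hypothetical and are not taken from unionfs or WebDAV.

```python
from collections import defaultdict


def union_namespace(exports):
    """Merge per-machine path lists into one global name space.

    `exports` maps a machine name to the paths it provides (an assumed
    input format). The result maps each global path to the machines that
    hold it; with no replication every path has exactly one owner.
    """
    owners = defaultdict(list)
    for machine, paths in exports.items():
        for path in paths:
            owners[path].append(machine)
    return dict(owners)


def list_directory(namespace, directory):
    """Return the sorted entry names directly under `directory`."""
    prefix = directory.rstrip("/") + "/"
    entries = set()
    for path in namespace:
        if path.startswith(prefix):
            # Keep only the first path component below the directory.
            entries.add(path[len(prefix):].split("/", 1)[0])
    return sorted(entries)


# The example from the text: machine A and machine B.
exports = {"A": ["/dir1/1"], "B": ["/dir1/2", "/dir2/1"]}
namespace = union_namespace(exports)
print(list_directory(namespace, "/dir1"))  # ['1', '2']
print(list_directory(namespace, "/dir2"))  # ['1']
print(namespace["/dir1/2"])                # ['B']
```

Once replication is added, a path may have several owners, and the project
would need to decide which copy a lookup returns.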
- Peer-to-peer storage
- Extending the scenario of using the desktops to provide global storage,
an interesting challenge is to explore the nature of replication
in this system. Also, how would a compilation on this global store
behave (where are the newly generated objects stored)?
- Imagine a p2p store that allows dynamic queries in the name space.
For example, the user can create a directory with "mkdir <query:name=Surendar>",
whose contents would be the objects that match the query.
An interesting challenge is to design the parameters of such a system.
- Imagine a store running on mobile devices (such as laptops) that
allows users to create/access objects that are available in the local
neighborhood. As users move between locations, some of these
links become invalid. The system can choose to search for these
objects and move them closer to the user. An object will not
be accessible while it is being moved, even though its name
might still show up in the name space.
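As a starting point for the dynamic-query idea above, the following sketch
treats a directory name of the form <query:key=value> as a filter over object
metadata. Only the <query:name=Surendar> syntax comes from the text; the
metadata dictionary, the function names, and the exact-match semantics are
assumptions made for illustration.

```python
import re

# Matches directory names such as "<query:name=Surendar>".
QUERY_DIR = re.compile(r"^<query:(\w+)=(.*)>$")


def parse_query_dir(dirname):
    """Return (key, value) for a query directory name, or None otherwise."""
    match = QUERY_DIR.match(dirname)
    if match is None:
        return None
    return match.group(1), match.group(2)


def list_query_dir(dirname, objects):
    """List the objects whose metadata satisfies the directory's query.

    `objects` maps an object name to a metadata dict (hypothetical format).
    The listing is recomputed on every call, so it changes as objects
    appear in or disappear from the store.
    """
    parsed = parse_query_dir(dirname)
    if parsed is None:
        raise ValueError("not a query directory: " + dirname)
    key, value = parsed
    return sorted(name for name, attrs in objects.items()
                  if attrs.get(key) == value)


# Hypothetical store contents.
objects = {
    "notes.txt": {"name": "Surendar"},
    "paper.pdf": {"name": "Alice"},
    "slides.ppt": {"name": "Surendar"},
}
print(list_query_dir("<query:name=Surendar>", objects))
# ['notes.txt', 'slides.ppt']
```

The design parameters mentioned above would include how expressive the query
language is and how fresh a listing must be when peers join and leave.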
- Consistency and Replication
- In the context of the desktop store, what is the consistency model
for replication? Farsite dealt with similar issues.
- Storage management
- Given the large number of desktops at Notre Dame, how would you make
management of this storage easy (in particular, management of stale data
created by a node leaving the store for extended periods of time)?
- Security
- Distribution adds new challenges in providing secure storage. Mechanisms
for distributed authorization and authentication are hard/cumbersome.
However, relaxing some security guarantees is expected to make the
system more scalable.
Traditional storage performs strong authentication
and authorization up front before allowing an object to be
stored. Once an object is stored in the system, it is then assumed
to be valid. In the context of peer-to-peer/sensor storage (with
a large number of storage components), an interesting challenge is
to allow any entity to store objects into the store. The problem
then is to make sure that you can say something about the stored
objects.
An interesting research problem is to provide some restrictions on how
objects are stored (they have to be signed) and migrated (each entity
must countersign), in order to achieve as much flexibility as possible in
allowing anyone to store objects while still allowing for good auditability.
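The sign-on-store, countersign-on-migrate restriction described above can be
sketched as a custody chain. This is a minimal sketch under loud assumptions:
it uses HMAC with per-entity secret keys from the Python standard library as
a stand-in for real public-key signatures, and the entity names, keys, and
function names are all hypothetical.

```python
import hashlib
import hmac

# Hypothetical per-entity secret keys. A real system would use public-key
# signatures so that an auditor does not need the signers' secrets.
KEYS = {"alice": b"alice-key", "node1": b"node1-key", "node2": b"node2-key"}


def sign(entity, message):
    """Stand-in signature: HMAC-SHA256 of `message` under the entity's key."""
    return hmac.new(KEYS[entity], message, hashlib.sha256).hexdigest()


def store(entity, data):
    """Store an object; the author signs the hash of its contents."""
    digest = hashlib.sha256(data).hexdigest()
    return {"data": data, "chain": [(entity, sign(entity, digest.encode()))]}


def migrate(obj, entity):
    """Countersign: the new holder signs the previous signature."""
    _, prev_sig = obj["chain"][-1]
    obj["chain"].append((entity, sign(entity, prev_sig.encode())))


def audit(obj):
    """Recompute every link; True only if the whole chain is intact."""
    expected_input = hashlib.sha256(obj["data"]).hexdigest()
    for entity, sig in obj["chain"]:
        if not hmac.compare_digest(sig, sign(entity, expected_input.encode())):
            return False
        expected_input = sig  # the next entity signed this signature
    return True


obj = store("alice", b"sensor reading 42")
migrate(obj, "node1")
migrate(obj, "node2")
print([entity for entity, _ in obj["chain"]])  # ['alice', 'node1', 'node2']
print(audit(obj))                              # True
```

The chain records who stored the object and every entity it passed through,
which is the kind of statement about stored objects that the audit needs.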
- Energy Management
- Disk drives heat up from continuous operation. In a sensor-type
store where the disks may be deployed in the field (away from
temperature-controlled rooms), can we design a store that uses multiple
disks and distributes the blocks in such a fashion as to manage the
temperature of each individual disk? Deciding which disk should store a
block is relatively easy under such a policy; the challenge is in choosing
the placement that will maximize the throughput for reading the objects in
the future.
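The "easy" half of the problem above, choosing a disk for each written block,
might look like the following sketch. The thermal model is a deliberately
crude assumption (each write adds a fixed amount of heat and every disk cools
by a fixed amount per step); the function name and the parameters
heat_per_write and cooling are hypothetical.

```python
def place_blocks(num_blocks, num_disks, heat_per_write=1.0, cooling=0.25):
    """Write each block to the currently coolest disk.

    Returns the list of chosen disk indices and the final temperatures.
    Temperatures are in arbitrary units above ambient (assumed model).
    """
    temps = [0.0] * num_disks
    placement = []
    for _ in range(num_blocks):
        # Coolest disk wins; ties go to the lowest index.
        disk = min(range(num_disks), key=lambda d: temps[d])
        placement.append(disk)
        temps[disk] += heat_per_write
        # Every disk cools a little after each step, never below ambient.
        temps = [max(0.0, t - cooling) for t in temps]
    return placement, temps


placement, temps = place_blocks(num_blocks=6, num_disks=3)
print(placement)  # [0, 1, 2, 0, 1, 2]
print(temps)      # [0.5, 0.75, 1.0]
```

This greedy policy ignores the harder question raised above: blocks of one
object may end up on disks that are too hot to read from together, so a
project would need a placement that also accounts for future read throughput.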
Available Resources
For your course projects, you can use: