Course Project Ideas

Project groups will consist of two students. Ideally, you would choose a project that has a bearing on your dissertation research. We provide some project ideas below; you are encouraged to discuss these and other ideas with the instructor.

  • Naming and location

    • Structured p2p systems such as OceanStore and PAST distribute storage locations well by using distributed hashes. However, the hashes might not behave as expected (that is, distribute uniformly across the key space) in a real deployment. In our lab, William Acosta is collecting traces that sample the kinds of objects available on Gnutella networks. An interesting challenge is to use the characteristics of these object names as the basis for choosing hash functions that distribute objects/nodes uniformly.
    • Notre Dame has a number of desktops available across the campus. These machines can collectively provide a large distributed storage from their unused local storage. Choose the level of reliability that you can provide in order to coalesce this storage into a single global storage. You would choose the specific naming scheme, replication ratios, etc. One example is a storage with no replication whose global name space is the union of the individual name spaces: if machine A provides /dir1/1 and machine B provides /dir1/2 and /dir2/1, then the global storage's name space will be /dir1/{1,2} and /dir2/1. You can implement such a storage using application-level file servers (WebDAV, for example). Also check out the unionfs paper.
  • Peer-to-peer storage
    • Extending the scenario of using the desktops to provide global storage, an interesting challenge is to explore the nature of replication in this system. Also, how would a compilation on this global store behave (where are the newly generated objects stored)?
    • Imagine a p2p storage which allows dynamic queries in the name space. For example, the user can create a directory with "mkdir <query:name=Surendar>". An interesting challenge is to design the parameters of such a system.
    • Imagine a storage running on mobile devices (such as laptops) that allows users to create/access objects that are available in the local neighborhood. As users move between locations, some of these links become invalid. The system can choose to search for these objects and move them closer to the user. An object will not be accessible while it is being moved, even though its name might still show up in the name space.
  • Consistency and Replication
    • In the context of the desktop store, what is the consistency model for replication? Farsite dealt with similar issues.
  • Storage management
    • Given the large number of desktops at Notre Dame, how would you make this storage easy to manage? One example is dealing with the stale data created when a node leaves the storage for an extended period of time.
  • Security
    • Distribution adds new challenges in providing a secure storage. Mechanisms for distributed authorization and authentication are hard/cumbersome. However, reducing some security guarantees is expected to make the system scalable.
      Traditional storage performs strong authentication and authorization up front, before allowing an object to be stored. Once an object is stored in the system, however, it is expected to be valid. In the context of peer-to-peer/sensor storage (with a large number of storage components), an interesting challenge is to allow any entity to store objects into the storage. The problem then is to make sure that you can say something about the stored objects. An interesting research problem is to place some restrictions on how objects are stored (they have to be signed) and migrated (each entity must countersign), in order to allow anyone to store objects while still providing good auditability.
  • Energy Management
    • Disk drives heat up from continuous operation. In a sensor-type storage where the disks may be deployed in the field (away from temperature-controlled rooms), can we design a storage that uses multiple disks and distributes the blocks so as to manage the temperature of each individual disk? Deciding which disk should store a block is relatively easy under such a policy; the challenge is in choosing a placement that will maximize the throughput for reading the objects in the future.
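
For the hashing idea under Naming and location, one possible starting point is to measure how evenly a candidate hash function spreads a set of object names over a fixed number of buckets. The sketch below is only an illustration: the synthetic names stand in for the Gnutella traces (which are not reproduced here), the helper names (bucket_counts, chi_square, sha1_int, byte_sum), the bucket count of 64, and the choice of a chi-square statistic are all assumptions rather than part of the project description.

```python
import hashlib
import zlib
from collections import Counter


def bucket_counts(names, hash_fn, num_buckets):
    """Count how many names each bucket receives under hash_fn."""
    counts = Counter(hash_fn(name.encode("utf-8")) % num_buckets for name in names)
    return [counts.get(b, 0) for b in range(num_buckets)]


def chi_square(counts):
    """Chi-square statistic against a uniform spread; 0 means perfectly even."""
    expected = sum(counts) / len(counts)
    return sum((c - expected) ** 2 / expected for c in counts)


def sha1_int(data):
    """First 8 bytes of SHA-1, read as an unsigned integer."""
    return int.from_bytes(hashlib.sha1(data).digest()[:8], "big")


def byte_sum(data):
    """Deliberately weak hash: the sum of the bytes in the name."""
    return sum(data)


# Synthetic stand-in for real object names; replace with names from a trace.
names = [f"artist{i % 50} - track{i}.mp3" for i in range(5000)]

for label, fn in [("sha1", sha1_int), ("crc32", zlib.crc32), ("byte_sum", byte_sum)]:
    counts = bucket_counts(names, fn, 64)
    print(f"{label:9s} chi-square = {chi_square(counts):.1f}")
```

A lower statistic indicates a more even spread. Running the same comparison on real trace names would show whether a hash that looks uniform on synthetic data stays uniform on the names users actually choose.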

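The union name space example under Naming and location (machine A provides /dir1/1, machine B provides /dir1/2 and /dir2/1) can be sketched in a few lines. This is a minimal illustration and not a proposed design: the function name union_namespace and the choice to record every machine that provides a path are assumptions. How to resolve a path exported by more than one machine is exactly the kind of naming decision the project leaves open.

```python
from collections import defaultdict


def union_namespace(exports):
    """Merge per-machine name spaces into one global name space.

    exports maps a machine name to the list of paths it provides.
    Returns a dict mapping each global path to the sorted list of
    machines that provide it.
    """
    providers = defaultdict(set)
    for machine, paths in exports.items():
        for path in paths:
            providers[path].add(machine)
    return {path: sorted(machines) for path, machines in sorted(providers.items())}


# The example from the text: no replication, two machines.
exports = {
    "A": ["/dir1/1"],
    "B": ["/dir1/2", "/dir2/1"],
}

global_ns = union_namespace(exports)
for path, machines in global_ns.items():
    print(path, "->", ", ".join(machines))
```

With no replication each path maps to exactly one machine. A path that appears on several machines shows up with several providers, which is where a replication or conflict policy would have to step in.
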
Available Resources

For your course projects, you can use:

  • Educational Experimental Systems Lab:

  • PlanetLab machines
    • Access to wide-area resources through PlanetLab
  • Hydra storage bricks
    • You will also have access to our own experimental storage bricks. Each brick runs a VIA 1.2 GHz processor and sports 250 GB or 1 TB
Surendar Chandra
Last modified: 08/24/2005 19:12