Wednesday, January 19, 2011

Data Storage

One of the fundamental challenges of Abundant is storing bug data.  Several competing requirements must all be met for the storage format to serve every need.

The data format must:

Be Human Readable
In order to best integrate with version control, we want the data format to be text based (as opposed to in some sort of database) so that users can see the bugs and track changes directly in the version control system.  Furthermore, version control works best with text content, as it is easier to compare changes automatically.

Be Able to Store Metadata
A lesson learned from developing b is that metadata (assigned-to, bug status, etc.) needs to stick with the actual data like issue description and comments.  In b, metadata was stored separately in order to ease caching, but I believe it makes more sense to keep cached data, which exists only for speed, separate from all the actual data.  The format therefore needs to hold structured, machine-readable data while remaining human readable.

Be Fast to Parse
We want this to be scalable and that means fast access to any bug.  This can largely be done with untracked cache files, but assuming each bug is stored in its own file, each file should be fast to display without caching.  Listing, browsing, and filtering can be assisted with some sort of caching or indexing mechanism that will remain invisible to the user.
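As a rough illustration of the caching idea, here is a minimal sketch in Python.  It assumes one JSON file per bug and an untracked cache file; the name refresh_index and the fields title, status, and mtime are hypothetical, not part of b or Abundant.  Only files whose modification time differs from the cached value are parsed again, so listing and filtering do not need to read every bug.

```python
import json
import os


def refresh_index(bug_dir, cache_path):
    """Return {bug_id: summary}, re-reading only bug files that changed.

    Assumption: each bug lives in bug_dir as <bug_id>.json, and cache_path
    points at an untracked file that is safe to delete at any time.
    """
    try:
        with open(cache_path, encoding="utf-8") as f:
            cached = json.load(f)
    except (OSError, ValueError):
        cached = {}  # missing or corrupt cache: rebuild from scratch

    index = {}
    for name in sorted(os.listdir(bug_dir)):
        if not name.endswith(".json"):
            continue
        path = os.path.join(bug_dir, name)
        mtime = os.path.getmtime(path)
        bug_id = name[:-len(".json")]
        entry = cached.get(bug_id)
        if entry is None or entry["mtime"] != mtime:
            # Cache miss or stale entry: parse the bug file itself.
            with open(path, encoding="utf-8") as f:
                bug = json.load(f)
            entry = {
                "title": bug.get("title", ""),
                "status": bug.get("status", "open"),
                "mtime": mtime,
            }
        index[bug_id] = entry

    with open(cache_path, "w", encoding="utf-8") as f:
        json.dump(index, f)
    return index
```

Because the index is derived entirely from the bug files, it never needs to be committed, and deleting it only costs one full rescan.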


These factors affect what formats we can use.  Some options include:

Plain Text / Custom Format
A plain text file is the most human readable, but the least computer parseable.  This is what was used for b: text in the file was split into sections denoted by titles in square brackets, any section that wasn't empty was displayed, and metadata was stored in a separate file.
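To make the bracketed-section idea concrete, here is a small Python sketch of how such a file could be parsed.  It is not b's actual parser; the function names and the section titles in the usage example are made up for illustration.

```python
import re

# A section title is a line consisting only of text in square brackets.
SECTION_RE = re.compile(r"^\[(?P<title>[^\]]+)\]\s*$")


def parse_sections(text):
    """Split text into {title: body}, keeping sections in file order."""
    sections = {}
    current = None
    for line in text.splitlines():
        match = SECTION_RE.match(line)
        if match:
            current = match.group("title").strip()
            sections[current] = []
        elif current is not None:
            sections[current].append(line)
        # Lines before the first title are ignored in this sketch.
    return {title: "\n".join(lines).strip() for title, lines in sections.items()}


def non_empty_sections(text):
    """Return only the sections that have content, i.e. those worth displaying."""
    return {title: body for title, body in parse_sections(text).items() if body}
```

For example, given a file containing "[Details]", "[Expected]", and "[Comments]" titles where only the first and last have text under them, non_empty_sections returns just those two.  Note that nothing here is structured beyond the title, which is why metadata had to live elsewhere.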

XML / JSON / Other Standard Data Format
A structured text file which will contain both user content and metadata in one file per bug.  The primary advantage is this will make working with the data very easy, as there exist powerful parsers for standard data formats in both Python and most other languages.  This seems like a better option than a custom format.

A potential limitation of this method (and to some extent the one above) is in how Mercurial stores changes.  More details can be found in this email thread.
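As a sketch of what one-file-per-bug could look like with a standard format, here is a Python example using JSON from the standard library.  The field names (title, status, assigned_to, description, comments) are assumptions for illustration, not a settled schema.  Writing with indentation and sorted keys puts each field on its own line in a stable order, which should keep textual diffs small when a single field changes.

```python
import json


def serialize_bug(bug):
    """Render a bug as indented JSON with sorted keys and a trailing newline."""
    return json.dumps(bug, indent=2, sort_keys=True, ensure_ascii=False) + "\n"


def parse_bug(text):
    """Parse the text of a bug file back into a dictionary."""
    return json.loads(text)


# Hypothetical bug record: metadata and user content live in the same file.
example_bug = {
    "title": "Crash on startup",
    "status": "open",
    "assigned_to": "alice",
    "description": "The program exits immediately when launched.",
    "comments": [
        {"author": "bob", "text": "Also happens on Linux."},
    ],
}
```

Calling parse_bug(serialize_bug(example_bug)) gives back the original dictionary, and the serialized text remains readable in a plain editor or a version control diff.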

A File-based Database
Another alternative is using a database system that works well with version control.  This might be close to ideal, as it would (presumably) work flawlessly with Mercurial's change tracking, and also be efficient.  The problem with this method is that, although it seems fairly likely to me that such a tool exists, I don't know of one.


These are all the challenges and options that come to mind.  What are your thoughts?

Thursday, December 16, 2010

Proposal

Well, I have done a truly terrible job keeping this blog up to date over the last semester, but I've done a lot of work and finalized what I will be working on, so I give to you my proposal for Abundant, a distributed issue tracker!

The Proposal:


The Presentation - Scribd seems to not like the font I used for some reason, click the 'Download' link below if it looks bad:


Unfortunately, apparently Google doesn't support uploading anything other than images and videos to blogger, so the links above point to Scribd and potentially could break in the future.  Hopefully not.

Thursday, September 2, 2010

System.out.println("Hello world");

Hello all!

This blog is intended to be a place for me to chronicle the work I will be doing over the next school year developing my senior project in the Computer Science department at Willamette University. With apologies to non-technical readers, this blog is likely to be somewhat dense and is not intended for a general public readership.

At this point, I am exploring different potential ideas for what I could be working on. I have two ideas so far, neither of which is fully formed, but both would be enjoyable, interesting, and challenging.

Developing an Operating System and Multi-Threaded Virtual Machine
My first idea, which I have had since sophomore year when I took CS353 and developed the PC231, a simple virtual machine to develop assembly code on, is to expand upon the PC231 in the following ways:
  • Improve the architecture (larger word size, two's complement, more hardware commands, etc.) - potentially/probably trying to replicate a more common architecture
  • Develop a limited kernel to handle memory management and execution control
  • Develop a rudimentary operating system to allow the user to execute arbitrary commands
  • Develop a virtual file system to integrate with the virtual machine
This would be a fairly new field for me - outside of CS353 I have never done any kind of assembly code work or low-level programming. Exploring this area would be interesting; however, my limited experience and knowledge make it difficult to predict whether this is a reasonable goal and whether the amount of work is feasible.

Develop A Lightweight Issue Tracker for Use With Distributed Revision Control Systems
Over the summer, I developed an extension for Mercurial, a distributed revision control system. This extension is a lightweight, command-line bug tracker called b. I would like to develop this into a freestanding tool that can be used anywhere, with few system requirements, no setup or configuration necessary, and a minimal footprint. Specifically, I intend to do the following:
  • Remove Mercurial dependency and turn b into a standalone program
  • Enable it to integrate cleanly with the major distributed version control tools (Mercurial, Git, Bazaar)
  • Develop an internal web server to provide a web interface to view and manage issues
  • Develop plugins for major IDEs (Eclipse, others if time permits)
  • Develop a graphical front-end for non-command line use
The different facets of this project would allow me to explore many fields at once, ranging from Python programming to web development to software management concepts. This would be an interesting and valuable project, both personally - because I would love to use such a tool - and potentially for others who might find it useful as well.


And these are my two ideas at this point. Comments, questions, and alternative ideas are welcome!