Thread Closed 
Week 5: Codebases Millions of Lines of Code
09-28-2014, 10:51 PM
Post: #11
RE: Week 5: Codebases Millions of Lines of Code
The layout of this chart is very easy to follow, which I'm sure increases its ability to shock my friends' parents on Facebook. If I ignore the subjects and just look at the chart's numbers, it reminds me of an old book that helps kids visualize how “big” the number one million is.

The chart compares software that interested people will likely have heard of, but in a way that I believe is misleading. For example, when they show the lines of code count for Apache OpenOffice, I wonder if they're including a standalone Java Virtual Machine in that count or not. If so, that wouldn't be a fair comparison with something like the Linux kernel. That would be like comparing the parts count for an engine with the parts count for a vehicle, which would include an engine and many other discrete components. It's possible that the healthcare.gov lines of code count includes SQL Server, Windows Server, Internet Information Services, and all kinds of other complex software that an enterprise website can be built on. Am I to believe that healthcare.gov is made up of 500,000,000 lines of business logic?

The labeling inconsistency is something CS 5630 has made me more sensitive to. Some of the subtitles of the subjects are tiny descriptions, some are dates, and some are missing. Are the items with no descriptions ones that simply speak for themselves or something?
09-28-2014, 11:09 PM (This post was last modified: 09-28-2014 11:14 PM by lediaev.)
Post: #12
RE: Week 5: Codebases Millions of Lines of Code
I really like this visualization. I too was confused by the % on the arcs, but it makes sense once you see that the arcs connect versions of the same program over time. I thought the healthcare.gov part was very funny. Certainly, the size of a program is not necessarily proportional to its usefulness. I was also a bit shocked to see that a government program of 60 million lines of code was abandoned. I don't want to wonder how much money they wasted. I also wonder why some programs are so large. Why exactly does a car need 10 million lines of code? We are definitely missing information, but I don't think it's very practical to include it for every item. As rfinn mentioned, how exactly are the sizes being measured? It's true that if I write a program and use a standard library, the size of that library should not be included. Overall, though, I really like this vis. I like the layout, the colors, the extra thin white bars, and even the arcs are alright once we know what they mean. The only real problem I found was that it is very long. You need to scroll down to see it all.
09-29-2014, 02:46 AM
Post: #13
RE: Week 5: Codebases Millions of Lines of Code
This is a very interesting visualization. It is essentially a bar chart with the vertical axis representing different software (item axis) and the horizontal axis representing lines of code (data axis), with the bars arranged in ascending order. However, the author creatively chopped the data axis into segments and stacked them vertically. On one hand, this greatly reduces the space needed for the visualization and uses that space effectively (imagine how large a square it would take if no adjustment were made). On the other hand, we are more accustomed to scrolling vertically than horizontally, and many mice have a scroll wheel that makes vertical scrolling very easy. This technique is also applied to the last few large bars.
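
As a rough illustration of the chopped data axis described above, here is a minimal Python sketch. The function name `wrap_bar` and the row capacity of one million lines are assumptions made for this example, not details taken from the chart itself.

```python
def wrap_bar(value, row_capacity):
    """Split one long bar into row-sized segments, listed top to bottom.

    `value` is the total the bar represents (e.g. lines of code) and
    `row_capacity` is how much one row of the chart can hold.
    Both names are hypothetical, chosen only for this sketch.
    """
    if row_capacity <= 0:
        raise ValueError("row_capacity must be positive")
    full_rows, remainder = divmod(value, row_capacity)
    segments = [row_capacity] * full_rows
    if remainder:
        segments.append(remainder)
    return segments


# A 2.5-million-line codebase drawn on rows that each hold 1 million lines
# becomes two full rows plus one half-length row.
print(wrap_bar(2_500_000, 1_000_000))
```

The viewer then has to add the segments back together to read the total, which is the cost of saving horizontal space this way.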

The author also used arcs to connect different versions of the same software. The arcs are encoded with size, in particular thickness, to represent the ratio of the lines of code of the new version relative to the old version.
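
A small sketch of the ratio behind such an arc, in Python. The linear mapping from ratio to thickness, the `base_px` parameter, and the example line counts are all assumptions for illustration; the chart's actual mapping is not stated in this thread.

```python
def growth_percent(old_lines, new_lines):
    """Size of the new version as a percentage of the old version."""
    if old_lines <= 0:
        raise ValueError("old_lines must be positive")
    return new_lines / old_lines * 100.0


def arc_thickness(old_lines, new_lines, base_px=2.0):
    """Hypothetical encoding: thickness grows linearly with the ratio."""
    if old_lines <= 0:
        raise ValueError("old_lines must be positive")
    return base_px * new_lines / old_lines


# Made-up numbers: a program that grew from 1 million to 2.5 million lines.
print(growth_percent(1_000_000, 2_500_000))
print(arc_thickness(1_000_000, 2_500_000))
```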

However, this visualization is still too large and spans multiple pages. This makes it hard to get the needed information easily. I constantly scroll up and down to look for something. For example, the legend for the color encoding of the types of software, at the top right corner, is not very useful once we scroll down. We could probably have a floating legend instead.
09-29-2014, 05:01 PM
Post: #14
RE: Week 5: Codebases Millions of Lines of Code
I agree with the comment about the interesting use of the y-axis. Showing the evolution between older & newer versions of the same software, & the general growth in their number of lines as they evolved (using the light grey arching lines), is an interesting choice. The "a million lines of code" divider being approximately the same color at first glance is a bit distracting, & I'm not sure it is really necessary. The "fine lines" are also a nice touch, & they clarify the amount of code in each codebase more than they add chart junk. The lack of rescaling (aside from the switch from thousands of lines to millions) helps showcase just how large the apparent size of the healthcare.gov codebase is. The books with the multipliers, as well as the bacteria & mouse genomes, do not fit the general theme of codebases & should be omitted.
10-02-2014, 09:15 PM (This post was last modified: 10-02-2014 09:19 PM by holtvg.)
Post: #15
RE: Week 5: Codebases Millions of Lines of Code
I think this visualization does an OK job of showing the amount of code in various programs running in different systems in our world, while creating some eye-catching details with its variation in the color of the bars. I did, however, find the quantity scale confusing. Initially it seemed to run in both the x and y directions, but on closer examination it is actually separated into multiple categories, with multiple subscales in those categories, from hundreds of thousands to millions of thousands of lines of code, going in the y direction, which could confuse the viewer. In addition, the multiple scales make it hard for me to compare visually how much bigger each quantity in a category is than the previous one; it seems there's too much data being crammed in.

I also thought the gray semicircles going out were confusing and had a hard time making sense of them, such as the one labeled “organism”, and I think the data could be represented differently there. The last thing was the same bar appearing in more than one row, which makes it hard to sum up the amount for that item, as in the instances of “mouse” and “apparent size of healthcare.gov website”. I understand this is also a space issue, but I believe the scale can be optimized so that each item has just one bar, rather than bars split across rows, and still accommodates the larger data. In the end it did convey its point, which is to give the viewer an idea of the vast differences in scale across multiple categories, but it's hard to make quantitative comparisons from it.