I’ve rewritten my first little data visualisation so that it properly splits the text file into arrays that I can actually use. Once I got that out of the way, I was able to print the name and gender of any artist by using a number between 1 and 15995. This screenshot shows some text displaying on screen. A bit of a breakthrough for me, I will admit.
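For anyone curious what that splitting might look like, here is a minimal sketch in plain Java rather than my actual sketch code. It assumes one artist per line with tab-separated fields in the order name, gender, IRN, artistIdentifer; the class name `ArtistTable` and the sample rows are made up for illustration. In Processing itself the lines would more likely come from `loadStrings()`.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical layout: one artist per line, tab-separated as
// name, gender, IRN, artistIdentifer. The real file may differ.
class ArtistTable {
    final String[] names;
    final String[] genders;
    final String[] irns;

    ArtistTable(List<String> lines) {
        int n = lines.size();
        names = new String[n];
        genders = new String[n];
        irns = new String[n];
        for (int i = 0; i < n; i++) {
            // The -1 limit keeps empty fields, so a missing gender stays as "".
            String[] cols = lines.get(i).split("\t", -1);
            names[i] = cols.length > 0 ? cols[0].trim() : "";
            genders[i] = cols.length > 1 ? cols[1].trim() : "";
            irns[i] = cols.length > 2 ? cols[2].trim() : "";
        }
    }

    // Look up an artist by a 1-based row number (1 to the number of lines).
    String describe(int row) {
        return names[row - 1] + " (" + genders[row - 1] + ")";
    }

    public static void main(String[] args) {
        // Made-up sample rows, not real collection data.
        List<String> sample = Arrays.asList(
            "Jane Doe\tFemale\t101\tA1",
            "Acme Press\tCompany\t102\tA2");
        ArtistTable table = new ArtistTable(sample);
        System.out.println(table.describe(2));
    }
}
```

Keeping the columns in parallel arrays means any row number pulls out the matching name and gender together, which is all the on-screen text needs.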
The next step was to recreate my initial attempt using this new code. Previously I just looped through the text file and added 1 to the diameter every time a match was made. I knew that this wasn’t a great way to do it, and when I tried it anyway it didn’t work, because it didn’t like the string being matched with a character. Luckily Mitchell was able to point me towards the .equals() method, which worked a treat. Now it loops through the file, and every time there is a match it adds 1 to a counter variable. These counts are displayed on the left of the screen (in order: male, female, company, other). The counts are far too big to draw directly as pixel sizes, so I’ve divided each one by 20 before drawing the circles.
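The counting step boils down to something like the sketch below, again in plain Java and under assumptions: the labels "Male", "Female" and "Company" are guesses at how the file spells each gender, and `GenderTally`, `count` and `diameter` are names invented for this example.

```java
class GenderTally {
    // Returns counts in the order shown on screen: male, female, company, other.
    static int[] count(String[] genders) {
        int[] counts = new int[4];
        for (String g : genders) {
            // equals() compares the characters of two strings;
            // == only checks whether both refer to the same object.
            if ("Male".equals(g)) {
                counts[0]++;
            } else if ("Female".equals(g)) {
                counts[1]++;
            } else if ("Company".equals(g)) {
                counts[2]++;
            } else {
                counts[3]++; // anything else, including a blank gender
            }
        }
        return counts;
    }

    // Scale a raw count down to a circle diameter in pixels.
    static float diameter(int count, float divisor) {
        return count / divisor;
    }

    public static void main(String[] args) {
        String[] sample = {"Male", "Male", "Female", "Company", ""};
        int[] counts = count(sample);
        for (int c : counts) {
            System.out.println(c + " -> diameter " + diameter(c, 20f));
        }
    }
}
```

One thing to keep in mind with this approach: because the count sets the diameter, a circle's area grows with the square of the count, so larger groups look more dominant than they really are. Taking the square root of the count before scaling would make the areas proportional instead.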
We can see that there are substantially more males than any other artists. But then it gets a little murky… which lines represent which genders? And I still want some interactivity.
Over the last few weeks I’ve been working my way through Ben Fry’s Visualizing Data book, slowly coding up the examples and thinking about how or what I will be able to achieve. The problem was that I didn’t feel I was getting anywhere. Sure, I was entering all the code, but I struggled to see the relevance it would have to me. So I decided to jump right in and attempt my first data visualisation with data from the Prints and Printmaking databases. First, though, I had to extract the data…
I have a backup of the database from MSSQL in .bak format. Of course I can’t import this into MySQL, so I’ve had to install Microsoft SQL Server Management Studio on one of our computers. After playing around I was able to import my backup file and see how the multiple databases fit together and work. It was complicated. Originally I thought there were 6 databases, but there are actually 29! They are all connected in some way, but trying to match those connections is proving to be a not so fun task.
I started querying the databases in different ways, trying to match artists to works through various numbers (IRN, UniqueArtistIdentifier, etc). This took some time and the results were less than rewarding. So after some playing around I instead generated a simple text file with each artist’s name, gender, IRN, and artistIdentifer.
And so the really fun work started, it was time to play with some data!
I asked myself the simple question: Which gender has a higher representation in the collection?
So I set about trying to bend examples from the book to work with the file I had. Basically I wanted to draw some circles to represent the number of artists that were male, female, a company or other. I knew what I wanted to do, but actually achieving it was surprisingly hard. (It was my first attempt, though.) After about an hour I had produced this:
Yep, that’s it, some green circles across the top of the screen. It didn’t even really use the data I wanted it to, but it was something, and it proved I was able to parse and mine the text file, which was a real relief. After a few more hours and many attempts, I was able to get to this stage:
4 circles of completely random sizes, which don’t represent how big the data set is. Nonetheless they do show the prominence of male artists compared to female, and the big black bulk of unknown gender artists.
This simple visualisation made me realise that data visualisation has another important function: it allows us to see how much data is missing. The black circle is about the same size as the one for males, which means that a lot of artist entries don’t have a gender assigned to them. In my next attempt, I’ll try to display the IRN (unique ID of the artist) or the name of each artist, which will help identify which artists are missing that data.
This first attempt is small and so simple, but I am pleased I could produce it (even with the swearing) in a few hours. It’ll be a steep hill to climb…