In [97]:
import pandas as pd
import altair as alt
artefacts_df = pd.read_csv("artefacts-overview-stats.csv")

artefacts_non_nature_df = artefacts_df.loc[artefacts_df['topic'] != 'nature']
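
The filter above keeps every row whose `topic` is not `'nature'`. A minimal sketch on toy data (the institution names and counts are invented for illustration and are not taken from the CSV) shows one side effect worth knowing about: rows with a missing topic are kept as well.

```python
import pandas as pd

# Toy data; institution names and counts are made up for illustration.
toy_df = pd.DataFrame({
    "institution": ["Museum A", "Museum B", "Library C", "Archive D"],
    "topic": ["nature", "art", "literature", None],
    "artefact_count": [80_000_000, 2_000_000, 170_000_000, 11_000_000],
})

# Same style of filter as above: keep every row whose topic is not 'nature'.
toy_non_nature_df = toy_df.loc[toy_df["topic"] != "nature"]

# A missing topic compares as "not equal" to 'nature', so that row is kept.
print(toy_non_nature_df["institution"].tolist())
```

If rows without a topic should be dropped too, the filter would need an extra `notna()` condition.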


## Artefacts 


This is a doomed attempt to summarise the number of artefacts (see below for definitions) held in UK public collections. Doomed because, to quote the British Library annual report 2023/2024, 'in the absence of a consensus about what constitutes a single item it is not possible to reach a definitive statement of the size of the collection'. On top of that is the question of what counts as an artefact: are all museum objects, all items held in a library (every copy of Joyce's Ulysses, even one printed last year?) and any item in an archive to be counted? Should heritage buildings be counted as well, and is a castle one item on a par with a manuscript or painting?

Regardless, since the TaNC report also tried to set out counts of the national collection, this is an attempt to visualise those counts further.

Definitions (about which we can no doubt all politely argue forever at some agreeable venue someday):

  * artefacts - the physical *thing* considered to be cultural heritage
  * catalogued record - the inventory-level digital catalogue record for the artefact (for museums, libraries and archives) or collection of artefacts (for archives)
  * enriched record - the richer catalogued record for the item (for museums, libraries and archives) or collection of artefacts (for archives). I'm not even going to try to define what richer means - does fully catalogued mean nothing more can ever be said?
  * digitised record - a catalogued record (enriched or otherwise) plus an image (or other media) of the artefact (for museum, library and archive items)

In these overview pages we give various breakdowns of the stats for each of the above and try to tease out
a summary of the state of this in UK cultural heritage collections. The pages are then organised as follows:

Summary
  * Overview (artefacts, catalogued records, objects, library, archive)
  * Sub collections records (granular version of above)
  * Sector level records (library, museum, archive)
  * Institution level records (per institution pages)

(we are not breaking collections down into sub collections for artefacts, as that gets too complex and
is not really comparable, so it is not useful for data visualisation)
    
Progress (i.e. showing the difference between current and previous figures)
  * Overview Progress - tracking change over time
  * Sub collections progress
  * Sector Progress
  * Institution level progress
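
As a rough sketch of what "difference between current and previous" could look like in practice, two snapshots can be merged on `institution` and subtracted. Everything here is an assumption for illustration: the two snapshot DataFrames, their names and their numbers are hypothetical, and only the `institution` and `artefact_count` column names come from the data used in this notebook.

```python
import pandas as pd

# Hypothetical snapshots; names and numbers are illustrative only.
previous_df = pd.DataFrame({
    "institution": ["Museum A", "Library B"],
    "artefact_count": [1_000_000, 5_000_000],
})
current_df = pd.DataFrame({
    "institution": ["Museum A", "Library B", "Archive C"],
    "artefact_count": [1_200_000, 5_000_000, 300_000],
})

# Outer merge so institutions present in only one snapshot are kept.
progress_df = previous_df.merge(
    current_df, on="institution", how="outer", suffixes=("_previous", "_current")
)

# The change is NaN where an institution is missing from either snapshot.
progress_df["change"] = (
    progress_df["artefact_count_current"] - progress_df["artefact_count_previous"]
)
print(progress_df)
```

An outer merge is used so that an institution appearing for the first time shows up with a missing change rather than silently disappearing.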

Uncertainty Level

  * Level 1 - unavoidable inherent uncertainty
  * Level 2 - uncertainty of number of artefacts
  * Level 3 - uncertainty of cataloguing type
  * Level 4 - uncertainty of number of records
  * Level 5 - uncertainty of number of published records

## Artefacts by topic

In [91]:
title = alt.TitleParams('Collection Size (Artefacts) by topic', anchor='middle')
alt.Chart(artefacts_df, title=title).mark_bar().encode(
    alt.Color('institution:N', sort='descending', legend=alt.Legend(orient='bottom',columns=5)),
    column='precision:O',
    x='artefact_count:Q',
    tooltip=['institution', 'artefact_count'],
    y='topic:N',
).properties(width=900).resolve_scale(x='independent').configure(numberFormat='.2s')

## Artefacts by sector

Showing the distribution of artefacts (a cultural heritage item of any type) on different topics held in each sector.

In [89]:
title = alt.TitleParams('Artefacts by sector', anchor='middle')
alt.Chart(artefacts_df,title=title).mark_bar().encode(
    alt.Color('institution:N', legend=alt.Legend(orient='bottom',columns=4)),
    x='artefact_count:Q',
    tooltip=['institution', 'artefact_count'],
    y='topic:N',
    column='sector'
).properties(width=225).resolve_scale(x='independent').configure(numberFormat='.2s')
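
The totals behind a chart like this can also be read as a table. The following is a minimal sketch on toy data, reusing the `sector`, `topic` and `artefact_count` column names from the chart; the rows themselves are invented and the sector values are assumed to be `museum`, `library` and `archive`.

```python
import pandas as pd

# Toy data; rows are made up, only the column names match the notebook.
toy_df = pd.DataFrame({
    "institution": ["Museum A", "Museum B", "Library C", "Archive D"],
    "sector": ["museum", "museum", "library", "archive"],
    "topic": ["art", "nature", "literature", "history"],
    "artefact_count": [2_000_000, 80_000_000, 170_000_000, 11_000_000],
})

# Sum artefact counts per sector and topic, largest first.
sector_topic_totals = (
    toy_df.groupby(["sector", "topic"], as_index=False)["artefact_count"]
    .sum()
    .sort_values("artefact_count", ascending=False)
)
print(sector_topic_totals)
```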

# Uncertainty of level 2 - would fall if any institution was certain about artefacts (by 1/num_institutions)
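
One possible reading of the note above is that level 2 uncertainty is the share of institutions that are not certain of their artefact count, so each institution that becomes certain lowers it by 1/num_institutions. The sketch below works through that arithmetic. The `count_is_certain` flag and the `level_2_uncertainty` helper are hypothetical and are not part of the CSV or of any defined method here.

```python
import pandas as pd

# Hypothetical data: the 'count_is_certain' flag is an assumption for
# illustration and is not a column of the CSV used in this notebook.
toy_df = pd.DataFrame({
    "institution": ["Museum A", "Museum B", "Library C", "Archive D"],
    "count_is_certain": [False, False, False, False],
})

def level_2_uncertainty(df: pd.DataFrame) -> float:
    """Share of institutions that are not certain of their artefact count."""
    per_institution = df.groupby("institution")["count_is_certain"].all()
    return float((~per_institution).mean())

before = level_2_uncertainty(toy_df)

# One institution becomes certain of its count.
toy_df.loc[toy_df["institution"] == "Museum A", "count_is_certain"] = True
after = level_2_uncertainty(toy_df)

# With four institutions, the drop is 1/4.
print(before, after, before - after)
```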

## Artefacts by sector (excluding specimen collections)

Showing the distribution of artefacts (a cultural heritage item of any type)
on different topics held in each sector, excluding nature collections.

In [90]:
title = alt.TitleParams('Collection Size (Artefacts) by topic and sector (without Nature collections)', anchor='middle')
alt.Chart(artefacts_non_nature_df,title=title).mark_bar().encode(
    alt.Color('institution:N', legend=alt.Legend(orient='bottom',columns=4)),
    x='artefact_count:Q',
    tooltip=['institution', 'artefact_count'],
    y='topic:N',
    column='sector'
).properties(width=225).resolve_scale(x='independent').configure(numberFormat='.2s')

# Uncertainty of level 2 - would fall if any institution was certain about artefacts (by 1/num_institutions)

## Artefacts by institution type

In [98]:
title = alt.TitleParams('Artefacts by institution type', anchor='middle')
alt.Chart(artefacts_df,title=title).mark_bar().encode(
    alt.Color('institution:N', sort='descending', legend=alt.Legend(orient='bottom',columns=4,symbolLimit=80)),
    column='precision:O',
    tooltip=['institution', 'artefact_count'],
    x='artefact_count:Q',
    y='type:N',
).properties(width=900).resolve_scale(x='independent').configure(numberFormat='.2s')

## Artefacts by institution type (without Natural History)

In [87]:
title = alt.TitleParams('Artefacts by institution type (without Nature Collections)', anchor='middle')
alt.Chart(artefacts_non_nature_df,title=title).mark_bar().encode(
    alt.Color('institution:N', sort='descending', legend=alt.Legend(orient='bottom',columns=4,symbolLimit=80)),
    column='precision:O',
    tooltip=['institution', 'artefact_count'],
    x='artefact_count:Q',
    y='type:N',
).properties(width=900).resolve_scale(x='independent').configure(numberFormat='.2s')

## Artefacts by institution and collection type

In [95]:
import altair as alt
title = alt.TitleParams('Artefacts per Institution by collection type', anchor='middle')
alt.Chart(artefacts_df, title=title, width=125).mark_bar().encode(
    alt.Color('institution:N', legend=alt.Legend(orient='bottom',columns=4)),
    alt.X('artefact_count', axis=alt.Axis(orient='top')),
    y='institution',
    column='type',
    tooltip=['artefact_count', 'institution'],
    order='artefact_count:Q'
).configure(numberFormat='.2s').resolve_scale(x='independent')

## Artefacts per institution and collection type (without natural history)

In [78]:
import altair as alt

artefacts_non_nature_df = artefacts_df.loc[artefacts_df['topic'] != 'nature']

title = alt.TitleParams('Artefacts per Institution (by collection type, without specimen collections)', anchor='middle')
alt.Chart(artefacts_non_nature_df, title=title, width=125).mark_bar().encode(
    alt.Color('institution:N', legend=alt.Legend(orient='bottom',columns=4)),
    alt.X('artefact_count', axis=alt.Axis(orient='top')),
    y='institution',
    column='type',
    tooltip=['institution', 'artefact_count'],
    order='artefact_count:Q'
).configure(numberFormat='.2s').resolve_scale(x='independent')