This section describes how to import GMD data into non-relational and relational databases.
Importing into a Non-Relational Database
Given the size of the metadata that needs to be processed, Gracenote recommends breakdown the ingestion line-by-line instead of trying to process the full dataset altogether. Below is an example of how the data can be imported into a non-relational database, such as Mongo DB. This is sample code to be used for guidance and is not a supported Gracenote product and will need to be tailored by the client depending on their needs.
Sample Code: Importing MongoDB
Note: This example imports data into collections, taking approximately three hours to complete ingestion.
Initial Import
Code
# coding: utf-8# Python 3# This is sample code to be used for guidance and is not a supported Gracenote product# This script imports GMD data from their raw files into a MongoDBimport gzipimport jsonimport ijson.backends.yajl2_cffi as ijsonfrom pymongo import MongoClientimport tqdm# Variables# dir_name = the name of the directory that contains the GMD JSON filesdir_name = '20180526'# location = the path to the directory that contains the GMD JSON files aka the 'dir_name' locationlocation = '/Users/username/Desktop/clientname/'# db_name = the name you would like to use for your databasedb_name = 'GMD_clientname'# Enter your MongoDB connection detailsmongo_db = MongoClient('127.0.0.1', connect=False)[db_name]print('\nConnected to the DB...')# A function to import data into a given Mongo DB collectiondef parse_gmd(file_path, iterator_key, unique_id_key, collection_name):items = []with gzip.open(file_path) as f:for item in tqdm.tqdm(ijson.items(f, iterator_key)):item['_id'] = item[unique_id_key]items.append(item)if len(items) > 50000:try:mongo_db[collection_name].insert_many(items, ordered=False)except Exception as e:print(e)items = []if items:try:mongo_db[collection_name].insert_many(items, ordered=False)except Exception as e:print(e)items = []print('\nImporting Artist Data...')parse_gmd(file_path= f"{location}{dir_name}/gracenote_artist_{dir_name}.json.gz",iterator_key="artists.item",unique_id_key='artistID',collection_name='artists')print('Finished importing the Artist Data')print('\nImporting Album Master Data...')parse_gmd(file_path= f"{location}{dir_name}/gracenote_album_master_{dir_name}.json.gz",iterator_key="masters.item",unique_id_key='albumMasterID',collection_name='album_master')print('Finished importing the Album Master Data')print('\nImporting Album Edition Data...')parse_gmd(file_path= f"{location}{dir_name}/gracenote_album_edition_{dir_name}.json.gz",iterator_key="editions.item",unique_id_key='albumEditionID',collection_name='album_edition')print('Finished importing the Album Edition Data')print('\nImporting Song Data...')parse_gmd(file_path= f"{location}{dir_name}/gracenote_song_{dir_name}.json.gz",iterator_key="songs.item",unique_id_key='songID',collection_name='song')print('Finished importing the Song Data')print('\nImporting Recording Data...')parse_gmd(file_path= f"{location}{dir_name}/gracenote_recording_{dir_name}.json.gz",iterator_key="recordings.item",unique_id_key='recordingID',collection_name='recording')print('Finished importing the Recording Data')print('\nAll Finished!')
Importing into a Relational Database
GMD delivers metadata in 6 to 7 files depending on the package clients are licensed (Artists, Album Master, Album Editions, Recordings, Songs, Descriptors and Hierarchy). Each file can be further broken down into smaller tables for easy querying and ingestion. Below is an example of how the data can be organized into smaller tables in a relational database (this example uses SQLite) which can be used as reference. This is sample code to be used for guidance and is not a supported Gracenote product and will need to be tailored by the client depending on their needs.
Sample Code: Import into SQLite
Sample schema to split the JSON files received in FTP server into smaller tables.
Sample indexing script based on the sample schema mentioned above.
Breaking down the JSON files further in a Relational Database (SQLite)
Schema
Code
CREATE TABLE artist (id int, name text, object_type text, sort_name text, disambiguation text);CREATE TABLE artist_descriptor (id int, type char(4), descriptor_id int, weight int);CREATE TABLE artist_name (id int, type text, name text, language text);CREATE TABLE artist_image (id int, type char(4), size text, width int, height int, value text);CREATE TABLE artist_text (id int, type char(4), value text);CREATE TABLE artist_xid (id int, type char(4), source text, value text);CREATE TABLE artist_related (id int, related_type char(32), ord int, related_id int, related_name text);CREATE TABLE artist_pop (id int, region char(4), score int, type text);CREATE TABLE master (id int, title text, artist_id int, year text, preferred_edition_id int, release_type int);CREATE TABLE edition (id int, title text, artist_id int, year text, master_id int, release_type int, track_count int);CREATE TABLE edition_track (id int, ord int, recording_id int, title text, artist_id int);CREATE TABLE edition_xid (id int, type text, value text);CREATE TABLE recording (id int, title text, artist_id int, year text, song_id int);CREATE TABLE recording_xid (id int, type text, value text);CREATE TABLE recording_descriptor (id int, type char(4), descriptor_id int, weight int);CREATE TABLE recording_pop (id int, region char(4), score int, type text);CREATE TABLE song (id int, title text, preferred_recording_id int, artist_id int);CREATE TABLE descriptor (type char(4), id int, value text);CREATE TABLE hierarchy (hierarchy_id int, hierarchy_name char(32));CREATE TABLE hierarchy_node (node_id int, type char(16), level int, parent_node_id int);CREATE TABLE hierarchy_local (node_id int, language_id int, script_id int, name text);CREATE TABLE hierarchy_mapped_descriptor (node_id int, mapped_descriptor_id int);
Indexing
Code
CREATE INDEX idx_artist_id on artist(id);CREATE INDEX idx_artist_name on artist(name);CREATE INDEX idx_artist_name_id on artist_name(id);CREATE INDEX idx_artist_name_type on artist_name(type);CREATE INDEX idx_artist_name_name on artist_name(name);CREATE INDEX idx_artist_related_id on artist_related(id);CREATE INDEX idx_artist_pop on artist_pop(id);CREATE INDEX idx_master_id on master(id);CREATE INDEX idx_master_title on master(title);CREATE INDEX idx_master_preferred_edition_id on master(preferred_edition_id);CREATE INDEX idx_master_artist_id on master(artist_id);CREATE INDEX idx_edition_id on edition(id);CREATE INDEX idx_edition_master_id on edition(master_id);CREATE INDEX idx_edition_track_id on edition_track(id);CREATE INDEX idx_edition_track_recording_id on edition_track(recording_id);CREATE INDEX idx_edition_xid_id on edition_xid(id);CREATE INDEX idx_recording_id on recording(id);CREATE INDEX idx_recording_song_id on recording(song_id);CREATE INDEX idx_recording_title on recording(title);CREATE INDEX idx_recording_xid_id on recording_xid(id);CREATE INDEX idx_recording_pop on recording_pop(id);CREATE INDEX idx_recording_descriptor_id on recording_descriptor(id);CREATE INDEX idx_song_id on song(id);CREATE INDEX idx_song_title on song(title);CREATE INDEX idx_song_preferred_recording_id on song(preferred_recording_id);CREATE INDEX idx_descriptor_id on descriptor(id);CREATE INDEX idx_descriptor_value on descriptor(value);CREATE INDEX idx_hierarchy_node on hierarchy_node(node_id);CREATE INDEX idx_hierarchy_local on hierarchy_local(node_id);CREATE INDEX idx_hierarchy_mapped_descriptor on hierarchy_mapped_descriptor(node_id);