<languages/> [[File:<span lang="en" dir="ltr" class="mw-content-ltr">DPI.jpg</span>|thumb|<span lang="en" dir="ltr" class="mw-content-ltr">500x500px</span>]] <div lang="en" dir="ltr" class="mw-content-ltr"> == Introduction == </div> <div lang="en" dir="ltr" class="mw-content-ltr"> <big>Now that we have completed the planning and organizing stage and come out the other side safely armed with the '''[[Special:MyLanguage/Planning and Organizing#General Plan|General Plan]]''', the table of the archive’s structure, descriptions of the material, and a decision on software and storage media for the '''[[Special:MyLanguage/Glossary of Key Terms and Concepts#Digital Archiving System|Digital Archiving System]]''', we are prepared for the next stage. This is where the actual magic happens: the creation of our digital archive.</big> </div> <div lang="en" dir="ltr" class="mw-content-ltr"> Along with the great promise it brings, this stage is also the most dynamic and complex, as well as the most resource-heavy, expertise-driven, and technologically demanding for the organization. </div> <div lang="en" dir="ltr" class="mw-content-ltr"> Our goal at this stage is to process and prepare all selected material—both physical and '''[[Special:MyLanguage/Glossary of Key Terms and Concepts#Born-digital|born-digital]]'''—and to make it digital preservation-ready. This means that by the end of this stage, we will have the material prepared with respect to all necessary technical and archival requirements for transfer into our newly selected Digital Archiving System. This includes a series of actions using software and other technological tools that need to be applied to our selected source material to be able to properly archive it and preserve it long-term. </div> <div lang="en" dir="ltr" class="mw-content-ltr"> Additionally, if we are working to digitally preserve source material that is wholly or partially physical, this stage includes a major pre-step: digitization. 
</div> <div lang="en" dir="ltr" class="mw-content-ltr"> == Digitization == </div> <div lang="en" dir="ltr" class="mw-content-ltr"> Through the process of '''[[Special:MyLanguage/Glossary of Key Terms and Concepts#Digitization|digitization]]''', we create digital copies, or “[[Special:MyLanguage/Glossary of Key Terms and Concepts#Digital Surrogates|surrogates]],” of original physical items. These digital copies are then processed as digital archival objects, preserved, and made accessible. We will, therefore, be focusing on the preservation of these digital copies rather than the original physical items. Consult [[Special:MyLanguage/Addendum II|Addendum II]] for further guidance. </div> <div lang="en" dir="ltr" class="mw-content-ltr"> There are different types of physical objects we might want to digitize that can be stored on a variety of media. They include, for example, text, photographs, drawings, maps, video, audio, and other types of content stored on paper, audio cassettes, 16 mm tape, or any other physical or '''[[Special:MyLanguage/Glossary of Key Terms and Concepts#Analogue Document|analog]]''' storage media. </div> [[File:CSOs-in-Digital-Archiving-Toolkit-6x9-EN-final-print (KEY WORDS WIKI) Page 072.jpg|center|thumb|<span lang="en" dir="ltr" class="mw-content-ltr">450x450px</span>]] <div lang="en" dir="ltr" class="mw-content-ltr"> They could also include objects such as pieces of clothing, banners, personal belongings, etc. </div> <div lang="en" dir="ltr" class="mw-content-ltr"> Clearly, the type of material we need to digitize will define both major and specific decisions to be made in the process—and each organization will make them in line with its goals and capacities. However, general elements of the process also need to be addressed in all digitization projects. This chapter outlines those elements of digitization that are relevant to the process regardless of the material's type, content, or storage media. 
</div> [[File:<span lang="en" dir="ltr" class="mw-content-ltr">BREAKING NEWS!.png</span>|left|<span lang="en" dir="ltr" class="mw-content-ltr">85x85px</span>]] <div lang="en" dir="ltr" class="mw-content-ltr"> {| class="wikitable" |+ !BREAKING News: In-House Digitization May Cost More Than Outsourcing. |- |''If the organization's capabilities are insufficient for the requirements of the digitization process, a decision to hire an external company for the project must be considered. Doing so may determine the success or failure of the program. Initiating digitization with inadequate preparation, resources, and capacities could produce more costs than results, with little or no long-term value. On the other hand, a quality-assured, well-planned, and executed outsourcing option could save substantial time and effort. Hence, in-house digitization, with the different costs it involves, may sometimes cost the organization more than outsourcing the work externally.'' |} </div> <div lang="en" dir="ltr" class="mw-content-ltr"> Digitization is a major, demanding archival project in and of itself and requires due attention, careful planning, and dedicated implementation. Since we are looking at digitization as part of a larger process of building a digital archive, we have already discussed some of the issues involved, mostly regarding the first few stages of the process. An overview of the digitization process is outlined in Figures 9a and 9b. </div> <div lang="en" dir="ltr" class="mw-content-ltr"> {| class="wikitable" |+Figure 9a. Overview of stages and actions in the digitization process !1. Planning </div> <div lang="en" dir="ltr" class="mw-content-ltr"> General: goal, outcomes, timeframe, resources. </div> <div lang="en" dir="ltr" class="mw-content-ltr"> Logistical and organizational: workflow, </div> <div lang="en" dir="ltr" class="mw-content-ltr"> conditions, space, naming, equipment, metadata. 
</div> <div lang="en" dir="ltr" class="mw-content-ltr"> Archival and technological requirements: </div> <div lang="en" dir="ltr" class="mw-content-ltr"> quality, format, file naming, equipment & </div> <div lang="en" dir="ltr" class="mw-content-ltr"> metadata. </div> <div lang="en" dir="ltr" class="mw-content-ltr"> Planning for preservation of original physical </div> <div lang="en" dir="ltr" class="mw-content-ltr"> items. !2. Preparing Material </div> <div lang="en" dir="ltr" class="mw-content-ltr"> Creating an inventory of physical material. </div> <div lang="en" dir="ltr" class="mw-content-ltr"> Review of material and selection of material </div> <div lang="en" dir="ltr" class="mw-content-ltr"> for digitization. </div> <div lang="en" dir="ltr" class="mw-content-ltr"> Description of material. </div> <div lang="en" dir="ltr" class="mw-content-ltr"> Preparing physical items for digitization. </div> <div lang="en" dir="ltr" class="mw-content-ltr"> |} </div> <div lang="en" dir="ltr" class="mw-content-ltr"> {| class="wikitable" |+Figure 9b. Overview of stages and actions in the digitization process !3. Preparing Data/Tech </div> <div lang="en" dir="ltr" class="mw-content-ltr"> * Defining digitization requirements, file </div> <div lang="en" dir="ltr" class="mw-content-ltr"> naming, format selection, standard of </div> <div lang="en" dir="ltr" class="mw-content-ltr"> quality, collection of metadata. </div> <div lang="en" dir="ltr" class="mw-content-ltr"> * Obtaining and installing digitization </div> <div lang="en" dir="ltr" class="mw-content-ltr"> equipment, software, storage media. </div> <div lang="en" dir="ltr" class="mw-content-ltr"> * Setting up equipment to meet digitization </div> <div lang="en" dir="ltr" class="mw-content-ltr"> requirements, testing, fine-tuning. </div> <div lang="en" dir="ltr" class="mw-content-ltr"> !4. 
Implementation </div> <div lang="en" dir="ltr" class="mw-content-ltr"> * Preparation of material </div> <div lang="en" dir="ltr" class="mw-content-ltr"> * Process scheduling </div> <div lang="en" dir="ltr" class="mw-content-ltr"> * Digitization </div> <div lang="en" dir="ltr" class="mw-content-ltr"> * Quality control </div> <div lang="en" dir="ltr" class="mw-content-ltr"> * Post-processing and '''[[Special:MyLanguage/Glossary of Key Terms and Concepts#Optical Character Recognition (OCR) Software|OCR]]''' </div> <div lang="en" dir="ltr" class="mw-content-ltr"> * Storage and backup </div> <div lang="en" dir="ltr" class="mw-content-ltr"> |} </div> <div lang="en" dir="ltr" class="mw-content-ltr"> In previous chapters, we discussed the development of a General Plan, the creation of an '''[[Special:MyLanguage/Planning and Organizing#Identification Inventory|Inventory]]''', and the selection and description of the material—which are also the first steps of the digitization process. Hence, having already covered the first two, we can pick up the digitization process at the beginning of the third stage by preparing archival and technological elements. </div> [[File:<span lang="en" dir="ltr" class="mw-content-ltr">BREAKING NEWS!.png</span>|left|<span lang="en" dir="ltr" class="mw-content-ltr">85x85px</span>]] <div lang="en" dir="ltr" class="mw-content-ltr"> {| class="wikitable" |+ !BREAKING News: Digitization Can Be Done on a Small Scale and With a Modest Budget. |- |''Small-scale digitization projects need to be adjusted to fit modest capacities and resources. Generally, that means there may be only one or two persons tasked with performing all the steps of the digitization process on one computer and with limited resources. The process is certainly less efficient, less reliable, and slower under those conditions, but it is doable and—whenever other options are not available—it is highly recommendable. 
Any digitizing work you can conduct can be highly significant, especially if the material is fragile and prone to deterioration.'' |} </div> <div lang="en" dir="ltr" class="mw-content-ltr"> === Specifying a Naming Convention for Digitized Files === </div> <div lang="en" dir="ltr" class="mw-content-ltr"> For a digital file intended for archiving and preservation, a name is not just a name. It is also a very important descriptor of that particular item, which should contain information that allows us to identify what the item is and what it contains so we can locate it in the archive and properly manage and preserve it. Therefore, an important element of specifications for digitization is the development and application of a consistent set of '''[[Special:MyLanguage/Glossary of Key Terms and Concepts#Archival rules|rules]]''', a so-called “naming convention” for digital surrogates we create from physical items. </div> <div lang="en" dir="ltr" class="mw-content-ltr"> There are no universal rules for file naming, and each organization needs to develop its own naming convention that best suits its archival needs. However, the name of a digital surrogate should always provide a reference, a connection between itself and the physical item from which it was created through digitization. In principle, a '''[[Special:MyLanguage/Glossary of Key Terms and Concepts#Digital File Name|file name]]''' should contain several components that identify it, for example, its unique identifying number, its date of creation, a reference to its content, series, subseries, or folder it is a part of. </div> <div lang="en" dir="ltr" class="mw-content-ltr"> We should also bear in mind that these file names primarily need to be processed and understood by the software we will use for managing our digital archive. Hence, our primary concern in naming files is to apply a convention that will enable our Digital Archiving System to correctly identify the file and use its information. 
However, many also consider it good practice to include a descriptive component in a file name that humans can understand as well, for example, a reference to its title or content. </div> <div lang="en" dir="ltr" class="mw-content-ltr"> While, as mentioned, there are no strict instructions for developing a naming convention, we can nevertheless identify some basic recommendations, as outlined in Figure 10. </div> <div lang="en" dir="ltr" class="mw-content-ltr">
{| class="wikitable"
|+Figure 10. Recommendations for a file naming convention
!General
!Identifiers
!Standards
|-
|Use a reasonable number of components for a file name. </div> <div lang="en" dir="ltr" class="mw-content-ltr"> Names should be as short as possible, so use abbreviations. </div> <div lang="en" dir="ltr" class="mw-content-ltr"> Be consistent in the application of the file naming convention and do not allow for exceptions. </div> <div lang="en" dir="ltr" class="mw-content-ltr">
|Include key identifiers as components of a file name </div> <div lang="en" dir="ltr" class="mw-content-ltr"> (e.g., the identifying number of the item). </div> <div lang="en" dir="ltr" class="mw-content-ltr"> Include descriptive components such as date, title, or reference to its content. </div> <div lang="en" dir="ltr" class="mw-content-ltr">
|Use only English alphabet letters (a–z), numbers (0–9), dash (-), and underscore (_). </div> <div lang="en" dir="ltr" class="mw-content-ltr"> Dates should be entered in the ISO 8601 standard format (i.e., yyyy-mm-dd). </div> <div lang="en" dir="ltr" class="mw-content-ltr">
|} </div> <div lang="en" dir="ltr" class="mw-content-ltr">
=== Specifying File Formats and Quality ===
</div> <div lang="en" dir="ltr" class="mw-content-ltr"> In addition to the file name of a digital surrogate, its digital format and the standard of quality to which it will be digitized also need to be specified before the process can begin in earnest. 
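Before we turn to formats, the naming recommendations in Figure 10 can be made concrete with a short script. The following is only a minimal sketch: the component order (series, item number, date, short title), the five-digit item number, and the function names are assumptions chosen for illustration, and each organization would substitute its own convention.

```python
import re
from datetime import date

# Hypothetical convention (an assumption, not a standard):
#   <series>_<5-digit item number>_<yyyy-mm-dd>_<short-title>.<extension>
CONVENTION = re.compile(r"[a-z0-9-]+_\d{5}_\d{4}-\d{2}-\d{2}_[a-z0-9-]+\.[a-z0-9]+")


def slug(text: str, max_len: int = 30) -> str:
    """Lowercase the text, keep only a-z and 0-9, and join words with dashes.

    Characters outside a-z and 0-9 are dropped, so accented letters should be
    transliterated before calling this function.
    """
    words = re.findall(r"[a-z0-9]+", text.lower())
    return "-".join(words)[:max_len].rstrip("-")


def build_file_name(series: str, item_no: int, created: date, title: str, ext: str) -> str:
    """Assemble a file name from its components, with the date in ISO 8601 form."""
    parts = [slug(series), f"{item_no:05d}", created.isoformat(), slug(title)]
    if not all(parts):
        raise ValueError("every component needs at least one a-z or 0-9 character")
    return "_".join(parts) + "." + ext.lower()


def follows_convention(file_name: str) -> bool:
    """Check whether an existing file name matches the hypothetical convention."""
    return CONVENTION.fullmatch(file_name) is not None


name = build_file_name("Testimonies", 42, date(1998, 3, 7), "Interview with witness A", "TIF")
print(name)
print(follows_convention(name), follows_convention("Scan 1.tif"))
```

Generating names from the inventory in this way, rather than typing them by hand, is one way to keep the convention consistent and free of exceptions.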
</div> <div lang="en" dir="ltr" class="mw-content-ltr"> Since the same type of file (such as documents, photographs, or video) can be stored in different digital formats, we must specify which formats we will use for the digital surrogates created from our physical items. </div> <div lang="en" dir="ltr" class="mw-content-ltr"> Given that we are digitizing material for long-term preservation, it is important that we select formats that will allow their proper viewing and use by new generations of software. To prevent our digitized files from becoming obsolete, we should choose formats that are robust and resilient to change over time. </div> <div lang="en" dir="ltr" class="mw-content-ltr"> This means we should look for formats that meet the necessary standards, are well-established, and are widely used with substantial and positive user feedback. The formats we select should also allow us to add information and metadata to the files and have stable support, commercially or through an open-source community. </div> <div lang="en" dir="ltr" class="mw-content-ltr"> Clearly, we will be considering different sets of formats depending on the type of items we are digitizing: documents, photographs, video, etc. The scope of format options can be overwhelming, and there is no universally ideal solution for each type of digitized content. The selection, again, depends on the specific needs and circumstances of the archive. Nevertheless, some formats have proven highly robust and resilient to change. Figure 11 provides an overview of such formats for the most frequently digitized types of physical items: documents, pictures, audio, and video. </div> <div lang="en" dir="ltr" class="mw-content-ltr">
{| class="wikitable"
|+Figure 11. Overview of robust digital formats for digitization of different types of physical items.
!Physical Item Type
!Robust Digital File Format
|-
|Documents
|PDF
|-
|Photographs
|RAW or TIF
|-
|Slides and negatives
|RAW or TIF
|-
|Audio
|WAV
|-
|Video
|MP4
|} </div> <div lang="en" dir="ltr" class="mw-content-ltr">
=== Specifying Quality Standard(s) for Digitized Files ===
</div> <div lang="en" dir="ltr" class="mw-content-ltr"> An important element of the specifications for the digitization process is the quality standard to which we want and need to digitize our physical items. This is usually referred to as the “resolution” of a digitized document, photograph, or video. A higher-resolution digital surrogate allows for a better user experience and wider possibilities for its use and is, overall, a better copy of the original than a lower-resolution file. However, higher resolution also means that the digital surrogate will have a larger file size and will, therefore, take up more space on our storage media. </div> [[File:CSOs-in-Digital-Archiving-Toolkit-6x9-EN-final-print (KEY WORDS WIKI) Page 078.jpg|thumb|<span lang="en" dir="ltr" class="mw-content-ltr">Image shared by FAMDEGUA, GIJTR partner organization in Guatemala.</span>]] <div lang="en" dir="ltr" class="mw-content-ltr"> Therefore, in specifying the resolution of the digital surrogates we will create, we need to weigh the requirements for their quality standard against the demand it creates in terms of digital storage space for our archive. </div> <div lang="en" dir="ltr" class="mw-content-ltr"> As human rights organizations working with unique and invaluable material, we can easily be tempted to digitize all our material in the highest available resolution to ensure the best possible quality of digital surrogates. However, this would be neither feasible nor sustainable, as it would create immense difficulties in storing, processing, and preserving such files long-term. Therefore, organizations must set digitization quality specifications in line with their goals and capacities. 
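To make this trade-off concrete, the sketch below estimates the uncompressed size of a scanned page at two resolutions. It is an illustration under stated assumptions only: an A4 page (about 8.27 by 11.69 inches), 24-bit color, and no compression. The function name is hypothetical, and real files in compressed formats will usually be smaller.

```python
def uncompressed_bytes(width_in: float, height_in: float, dpi: int, bits_per_pixel: int = 24) -> int:
    """Estimate the uncompressed size of a scan: pixels across x pixels down x bytes per pixel."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    return width_px * height_px * bits_per_pixel // 8


# Assumed page size: A4, roughly 8.27 x 11.69 inches.
A4 = (8.27, 11.69)

for dpi in (300, 600):
    size = uncompressed_bytes(*A4, dpi)
    per_page_mb = size / 1e6
    per_collection_gb = size * 10_000 / 1e9  # hypothetical collection of 10,000 pages
    print(f"{dpi} DPI: {per_page_mb:.1f} MB per page, {per_collection_gb:.0f} GB per 10,000 pages")
```

Under these assumptions a page takes roughly 26 MB at 300 DPI and roughly 104 MB at 600 DPI: doubling the resolution quadruples the storage needed, because the number of pixels grows in both dimensions.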
As a guide, Figure 12 provides an overview of what are often considered minimal and optimal resolution quality levels for digitization of different types of physical items. </div> <div lang="en" dir="ltr" class="mw-content-ltr">
{| class="wikitable"
|+Figure 12. Overview of minimal and optimal resolution quality levels for digitization of different types of physical items.
!Item Type
!Minimal Quality
!Optimal Quality
|-
|Documents
|300 DPI
|600 DPI
|-
|Photographs
|600 DPI
|1,200+ DPI
|-
|Slides and negatives
|1,200 DPI
|2,400+ DPI
|-
|Audio
|16-bit and 44.1 kHz
|24-bit and 96 kHz
|-
|Video
|1080p or 2 megapixels
|2K+ or 4 megapixels
|} </div> <div lang="en" dir="ltr" class="mw-content-ltr">
=== Metadata: Descriptions of Digitized Files ===
</div> <div lang="en" dir="ltr" class="mw-content-ltr"> In the section dealing with the planning and organization of a digital archive, we discussed the important process of describing the archival material according to several of its relevant attributes and creating a connection between those '''[[Special:MyLanguage/Glossary of Key Terms and Concepts#Description of Archival Material|descriptions]]''' and the material by recording them in a table. This is necessary, as it allows us to later search for, locate, and identify items and item groups based on those descriptions and properly manage, preserve, and use the archival material. The same principle applies to digital surrogates. </div> <div lang="en" dir="ltr" class="mw-content-ltr"> After digitization, the digital files we create from the physical originals will become the items in our digital archive. Hence, they also need to be described and have their descriptions attached to them so they can later be found, accessed, and preserved. </div> <div lang="en" dir="ltr" class="mw-content-ltr"> These linked descriptions of archival items are known as “metadata,” or data about data. 
</div> <div lang="en" dir="ltr" class="mw-content-ltr"> In the process of digitization, it is essential that relevant metadata is collected and attached to the digital surrogates we create. This is because, without its attached metadata, a digital surrogate becomes meaningless and unusable, as we might be unable to find or identify it or understand what it is, its context, history, creator, or where it belongs in the archive. </div> <div lang="en" dir="ltr" class="mw-content-ltr"> Most of the metadata we need to preserve is linked to the digital archival files it describes and is created and captured by the software tools we use to digitize, manage, and archive the data. This includes basic metadata (e.g., date of creation/digitization) as well as very technical types of metadata, such as those on the validity or integrity of digital files. Because the software tools capture this metadata for us, concrete technical solutions for capturing and preserving different types of metadata are discussed further in the manual. Our main concern here is selecting which metadata types we want and need to record and preserve in our digital archival files. </div> <div lang="en" dir="ltr" class="mw-content-ltr"> Compared with physical originals, digital surrogates require and allow for a whole range of additional metadata to be collected. This includes metadata such as technical specifications of an archival digital file and information about its creation and any further digital action taken on it. For CSOs working with human rights material, such technical metadata is important for preserving and maintaining a digital surrogate's credibility and establishing the '''[[Special:MyLanguage/Glossary of Key Terms and Concepts#Chain of custody|chain of custody]]'''. </div> <div lang="en" dir="ltr" class="mw-content-ltr"> A wide variety of types of metadata could be collected about digital surrogates both during and after the digitization process. 
Based on their purpose and function, the most common types are summarized in Figure 13.
{| class="wikitable"
|+Figure 13. Types of Metadata
!Descriptive & Structural
!Administrative & Preservation
!Technical
|-
|Descriptive metadata gives details about a digital record and its content to make it easier to find. </div> <div lang="en" dir="ltr" class="mw-content-ltr"> Structural metadata provides information about the internal structure of a digital file, including information like page, section, or index. </div> <div lang="en" dir="ltr" class="mw-content-ltr">
|Administrative metadata refers to information about the management of a digital record, such as who created it or who can access it. </div> <div lang="en" dir="ltr" class="mw-content-ltr"> Preservation metadata supports the use of digital records in the future; it includes information about what software or hardware is needed to open and use a digital file. </div> <div lang="en" dir="ltr" class="mw-content-ltr">
|Technical metadata, rather than being created for the purposes of archiving, is often captured automatically through the software or hardware used to create a digital record. For example, a digital camera automatically captures information about each photo it takes and embeds this information in the file itself.
|} </div> <div lang="en" dir="ltr" class="mw-content-ltr"> Selecting the metadata for any given digitization project will depend on its context and circumstances: an organization’s resources and capacities, the type of material, its intended applications, types of '''[[Special:MyLanguage/Glossary of Key Terms and Concepts#Access Plan|access]]''', and user needs, among others. </div> <div lang="en" dir="ltr" class="mw-content-ltr"> Existing metadata standards and specific, tested, and widely used metadata profiles and sets provide guidance through the maze of numerous metadata types and formats. 
However, there are now so many different metadata standards and sets developed and proposed by different organizations that their sheer number creates an obstacle to identifying those we want and need to use. </div> <div lang="en" dir="ltr" class="mw-content-ltr"> A good place to start is with the so-called “[https://www.dublincore.org/specifications/dublin-core/dces/ '''Dublin Core Metadata Element Set'''].” Dublin Core is a widely applied set of 15 properties or elements for describing digital files. These elements are often considered a standard set of metadata that is applied almost regardless of the type of archival material, the archive's theme, or the type of software used in the Digital Archiving System. Further, for preservation purposes, the so-called PREMIS metadata standard provides a useful reference and guidance ([https://www.loc.gov/standards/premis/ '''PREMIS: Preservation Metadata Maintenance Activity (Library of Congress)''']). </div> <div lang="en" dir="ltr" class="mw-content-ltr"> Whatever set of metadata we select for our collection, there is another set of decisions that we need to make about it to complete our digitization specifications. These include questions such as: Where will the metadata be stored? How will it be captured? When in the process do we capture it? </div> <div lang="en" dir="ltr" class="mw-content-ltr"> Making decisions related to these questions before the digitization process will provide us with a plan for standardized and consistent collection and structuring of metadata throughout the digitization process. This is important to make our metadata “interoperable,” which means structuring and formatting it in a way that allows it to be read and used by different computer systems. 
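
As an illustration of what an interoperable record can look like, the following is a minimal sketch in Python that writes a handful of Dublin Core elements as XML. The element names and the namespace are those of the Dublin Core Metadata Element Set; the wrapper element <code>metadata</code>, the function name, and the sample values are assumptions made for this example only, and a real Digital Archiving System will prescribe its own import format.

```python
import xml.etree.ElementTree as ET

# Namespace of the 15-element Dublin Core Metadata Element Set.
DC_NS = "http://purl.org/dc/elements/1.1/"
DC_ELEMENTS = (
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
)

# Ask ElementTree to write the namespace with the conventional "dc" prefix.
ET.register_namespace("dc", DC_NS)


def dublin_core_xml(record: dict) -> str:
    """Serialize a flat {element: value} record as Dublin Core XML.

    The root element name "metadata" is a hypothetical wrapper, not part
    of the Dublin Core specification.
    """
    unknown = set(record) - set(DC_ELEMENTS)
    if unknown:
        raise ValueError(f"Not Dublin Core elements: {sorted(unknown)}")
    root = ET.Element("metadata")
    # Emit elements in the fixed order above so every record looks alike.
    for name in DC_ELEMENTS:
        if name in record:
            child = ET.SubElement(root, f"{{{DC_NS}}}{name}")
            child.text = str(record[name])
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Sample values are invented for illustration.
    print(dublin_core_xml({
        "title": "Witness statement, page 1",
        "identifier": "ABC_S01_000123",
        "date": "1998-05-04",
        "format": "image/tiff",
    }))
```

Rejecting unknown element names at this point is a deliberate choice in the sketch: it is easier to catch an inconsistent field name during digitization than during ingest.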
</div> <div lang="en" dir="ltr" class="mw-content-ltr"> Making our metadata interoperable will save us significant time and resources (as well as headaches) later in the process, not least in the next step when we need to ingest and make operable that metadata, along with the digital surrogate files to which it is linked, in our Digital Archiving System. These issues related to the processing of digital files and their metadata will be discussed in more detail in the upcoming section, where we look at how our entire material—digitized and '''[[Special:MyLanguage/Glossary of Key Terms and Concepts#Born-digital|born-digital]]'''—needs to be prepared for ingest into our Digital Archiving System. </div> <div lang="en" dir="ltr" class="mw-content-ltr"> === Selection, Set-Up, and Testing of Equipment, Software, Hardware, and Storage Media === </div> <div lang="en" dir="ltr" class="mw-content-ltr"> This manual cannot recommend specific digitization equipment, software, or storage media or how to set up and optimize it. Such advice would necessarily be too generic for the requirements of any concrete project, and it would also be likely to become obsolete quickly. </div> [[File:CSOs-in-Digital-Archiving-Toolkit-6x9-EN-final-print (KEY WORDS WIKI) Page 083.jpg|thumb|<span lang="en" dir="ltr" class="mw-content-ltr">Image shared by CONAVIGUA, GIJTR partner organization in Guatemala.</span>]] <div lang="en" dir="ltr" class="mw-content-ltr"> However, we should mention three elements that need to guide our decisions in selecting the technology we use for digitization: characteristics of the material, an organization’s capacities and resources, and the archive’s needs and requirements. </div> <div lang="en" dir="ltr" class="mw-content-ltr"> First, the equipment we select and how it will be set up and fine-tuned depends on the material we digitize: type, format, state of preservation, size/length of the originals, and quantity. 
Fragile material, for example, will require more refined and sensitive equipment and setup, while large quantities of material will require a solution for quick processing. </div> <div lang="en" dir="ltr" class="mw-content-ltr"> Further, our decisions will be dictated by our resources in terms of time, expertise, staff, space, and finances. Each of these aspects will set limits on what can be a feasible solution for our project. </div> [[File:<span lang="en" dir="ltr" class="mw-content-ltr">BREAKING NEWS!.png</span>|left|<span lang="en" dir="ltr" class="mw-content-ltr">85x85px</span>]] <div lang="en" dir="ltr" class="mw-content-ltr"> {| class="wikitable" |+ !BREAKING News: More Expensive Equipment Can Bring Down Overall Digitization Costs |- |We should be mindful that although digitization can be done on a different range of budgets, it is important to look at total costs of a project rather than one-off costs separately, such as the cost of a piece of equipment. Total project costs should include staff wages, equipment, time, etc. More expensive equipment that processes items more quickly, for example, could save us much more than it costs if we also calculate staff time and wages. |} </div> <div lang="en" dir="ltr" class="mw-content-ltr"> Finally, and most importantly, the needs of our archive and its future users, as well as the modes of planned use for the materials we are digitizing, should define the minimal and optimal requirements of the equipment. </div> <div lang="en" dir="ltr" class="mw-content-ltr"> For hardware and software, regardless of the type of material (documents, photographs, video, or other), the requirement will be to provide digital surrogates of desired quality in adequate formats and capture the selected metadata. 
In terms of storage media, the most important aspects to be considered are reliability (resilience to data loss), durability (usability over a longer time period), and scalability (potential to expand the data storage space as required). </div> <div lang="en" dir="ltr" class="mw-content-ltr"> Once we have selected and obtained our equipment, we need to install it and set it up in line with our digitization requirements. If this is not done properly, even the right equipment will not yield the required results. Hence, if an organization does not have internal expertise, external assistance would be advisable at this point. </div> <div lang="en" dir="ltr" class="mw-content-ltr"> This is especially true given that the setup and its fine-tuning are not a one-off activity, as the process requires repeated testing and iterative changes before the required result is achieved. The testing process should include a sample of different groups of materials and involve the entire process of an item’s digitization (i.e., the digitization workflow). </div> [[File:CSOs-in-Digital-Archiving-Toolkit-6x9-EN-final-print (KEY WORDS WIKI) Page 084.jpg|thumb|<span lang="en" dir="ltr" class="mw-content-ltr">Image shared by CONAVIGUA, GIJTR partner organization in Guatemala.</span>]] <div lang="en" dir="ltr" class="mw-content-ltr"> === Implementation: Digitization Workflow === </div> <div lang="en" dir="ltr" class="mw-content-ltr"> The final stage of digitization is the implementation of all the different elements that we have been planning, deciding on, and devising in the previous stages. Digitization is a complex process, but if all of its parts and functions are planned and designed well in advance, its implementation will be streamlined and fruitful. 
</div> <div lang="en" dir="ltr" class="mw-content-ltr"> That is why, in putting all elements together, we should develop a detailed '''[[Special:MyLanguage/Glossary of Key Terms and Concepts#Digitization Workflow|digitization workflow]]''', which should include all its actions and operations—from reviewing and preparing physical items and workspace to completing the workflow through storing the created digital surrogates and making backup copies. </div> <div lang="en" dir="ltr" class="mw-content-ltr"> Each digitization project will have its own unique workflow and specific sequence of digitization actions and operations. Further, some activities, such as quality control, will be repeated at different stages of the process, while others will be executed simultaneously or in parallel. Although specific actions and their sequence are tailored to each concrete project, we can identify the key elements required in any digitization workflow: preparations, process scheduling, digitization, quality control, post-processing, and storage and backup. </div> <div lang="en" dir="ltr" class="mw-content-ltr"> === Preparation of Material, Protocols, and Workspace === </div> <div lang="en" dir="ltr" class="mw-content-ltr"> The digitization process begins in earnest by ensuring a clean and appropriate workspace, allowing enough area for work with physical materials as well as for digitizing equipment and a computer. Assuming that fragile or otherwise compromised material has already been removed, we can proceed to clean our physical material and remove any added items, such as paper clips or staples on documents. </div> <div lang="en" dir="ltr" class="mw-content-ltr"> Information and relevant digitization specifications about file naming, file resolution, and format, plus any metadata to be recorded, should be on hand and well-organized. 
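
One way to keep these specifications on hand is to write them down in a form that software can also check. The sketch below, in Python, records a hypothetical specification and tests an item against it. The naming pattern, the allowed extensions, and the 300 ppi minimum are invented for this example and are not recommendations of this manual.

```python
import re
from dataclasses import dataclass
from pathlib import PurePath


@dataclass(frozen=True)
class DigitizationSpec:
    """Project-specific digitization requirements (all values hypothetical)."""
    name_pattern: str          # regular expression the file name stem must match
    allowed_extensions: tuple  # lower-case extensions, including the dot
    min_resolution_ppi: int    # lowest acceptable scan resolution


# Example specification: names like "ABC_S01_000123.tif"
# (organization code, series number, running item number).
SPEC = DigitizationSpec(
    name_pattern=r"[A-Z]{2,6}_S\d{2}_\d{6}",
    allowed_extensions=(".tif", ".tiff"),
    min_resolution_ppi=300,
)


def check_item(filename: str, resolution_ppi: int, spec: DigitizationSpec = SPEC) -> list:
    """Return a list of human-readable problems; an empty list means the item passes."""
    problems = []
    path = PurePath(filename)
    if not re.fullmatch(spec.name_pattern, path.stem):
        problems.append(f"file name '{path.stem}' does not follow the naming convention")
    if path.suffix.lower() not in spec.allowed_extensions:
        problems.append(f"format '{path.suffix}' is not an approved format")
    if resolution_ppi < spec.min_resolution_ppi:
        problems.append(f"resolution {resolution_ppi} ppi is below {spec.min_resolution_ppi} ppi")
    return problems


if __name__ == "__main__":
    print(check_item("ABC_S01_000123.tif", 400))
    print(check_item("scan 1.jpg", 150))
```

The same specification object can later be reused during quality review, so that the rules applied at the scanner and the rules applied at the end of the process cannot drift apart.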
</div> <div lang="en" dir="ltr" class="mw-content-ltr"> === Process Scheduling === </div> <div lang="en" dir="ltr" class="mw-content-ltr"> As part of the workflow, it is essential to schedule the entire process clearly: to determine, document, and then strictly apply an exact sequence of operations to be performed during the digitization process. The scheduling should include buffer time for unexpected events. </div> [[File:!RESOURCE!.png|left|<span lang="en" dir="ltr" class="mw-content-ltr">85x85px</span>]] <div lang="en" dir="ltr" class="mw-content-ltr">
{| class="wikitable"
|+
!Resource alert!
|-
|Excellent examples of digitization workflows and scheduling for organizations dealing with the preservation of cultural heritage material are provided in “[https://www.digitizationguidelines.gov/guidelines '''Technical Guidelines for Digitizing Cultural Heritage Materials'''],” issued by the U.S. Federal Agencies Digital Guidelines Initiative.
|}
</div> <div lang="en" dir="ltr" class="mw-content-ltr"> === Digitization Processing === </div> <div lang="en" dir="ltr" class="mw-content-ltr"> The process of digitization itself will clearly be very different depending on the type, volume, content, and other characteristics of the material. Paper documents and photographs can be scanned reasonably quickly, while '''[[Special:MyLanguage/Glossary of Key Terms and Concepts#Analogue Document|analog]]''' audio and video will need to be digitized in real time. Artwork and historical documents will require a different scanning set-up and different specifications than an administrative document. </div> <div lang="en" dir="ltr" class="mw-content-ltr"> Regardless of the differences, a good practice at the start of each digitization session is to digitize a reference item (document, photograph, short sample audio or video) and review the result against the specifications as a form of ad hoc quality control. 
In case of any discrepancy from the digitization specifications, equipment can be checked and its set-up fine-tuned. This will help avoid wasting entire sessions of work due to equipment or set-up issues. </div> <div lang="en" dir="ltr" class="mw-content-ltr"> === Post-processing === </div> <div lang="en" dir="ltr" class="mw-content-ltr"> Post-digitization processing of digital surrogates includes making slight corrections to a file to adjust it to a certain standard or specific project specification. This could include actions such as increasing the sharpness of sound in a video file or brightness of an image on a document. </div> <div lang="en" dir="ltr" class="mw-content-ltr"> Post-processing might sometimes also include creation of secondary, derivative copies of the file. These are created for specific purposes such as providing access or producing high-quality reproductions, and also for creating fully searchable documents from originally non-searchable image files through the application of '''[[Special:MyLanguage/Glossary of Key Terms and Concepts#Optical Character Recognition (OCR) Software|Optical Character Recognition]]''' (OCR) software. In essence, by running OCR software on our scanned image of a document, we add a layer of text onto that image file so other software can read it, which makes the document fully searchable. This is essential for making human rights archives more accessible and visible, which is often a key purpose of their digitization. Given the importance of the application of OCR technology in creating fully searchable text files from our digital surrogate image files, in [[Special:MyLanguage/Addendum IV|Addendum IV]] we provide a set of recommendations regarding its use. 
</div> <div lang="en" dir="ltr" class="mw-content-ltr"> === Quality Review === </div> <div lang="en" dir="ltr" class="mw-content-ltr"> There are two elements to digitization quality control, and both can and should be implemented at multiple points in the process schedule (i.e., both during and after digitization, as well as at regular intervals over the course of the project). The first element relates to ensuring that all physical items intended for digitization have indeed been digitized. This can be done automatically by comparing the two sets of data for physical items and their surrogates; however, this should also be accompanied by a sample manual check to ensure that digital surrogates properly correspond to their physical originals. </div> <div lang="en" dir="ltr" class="mw-content-ltr"> The second element of quality review is ensuring that the digitization specifications have all been met: that the digital surrogates are created in the right format and quality, with correct filenames, and that the selected metadata has been captured. Here again we will need to use a combination of manual and automated quality review, which is supported by software tools and applications such as “[https://jhove.openpreservation.org/ '''JHOVE'''].” </div> <div lang="en" dir="ltr" class="mw-content-ltr"> === Storing Digitization Products === </div> [[File:CSOs-in-Digital-Archiving-Toolkit-6x9-EN-final-print (KEY WORDS WIKI) Page 089.jpg|thumb|<span lang="en" dir="ltr" class="mw-content-ltr">Image shared by ASOMOVIDINQ, GIJTR partner organization in Guatemala.</span>]] <div lang="en" dir="ltr" class="mw-content-ltr"> At the end of the process, we need to temporarily store the products of digitization on one or more storage media until they are prepared and ingested into the Digital Archiving System. 
The end-result of the process should be one or more digital surrogates of the original, which are often referred to as “master files.” These are stored in a file directory structure created for this purpose. </div> <div lang="en" dir="ltr" class="mw-content-ltr"> Master files are the best-quality files we produce through digitization and are intended to be preserved long-term without loss of any essential features. The number of master files we will create will depend on the content of the originals and the planned uses of the digital surrogate. </div> <div lang="en" dir="ltr" class="mw-content-ltr"> In addition to master files, we can also produce a number of secondary files, often called “access” or “service files.” These files are created from the master file and optimized for the intended use (e.g., for web or for research). </div> <div lang="en" dir="ltr" class="mw-content-ltr"> For organizations working with documentation on human rights abuses, it is especially important to note that these derivative files are used for the creation of files with fully searchable textual content through OCR. The usual practice is for only master files to be stored for preservation purposes. However, given the importance of the OCR—and therefore fully searchable versions of documents—for human rights archives, it is advisable to also create and store two such readable files, one as an access copy and the other for preservation purposes. The same applies for the master files, as we should create at least two backup copies and store them on two separate storage media whenever possible. 
</div> <div lang="en" dir="ltr" class="mw-content-ltr"> == Preservation and Preparation for Ingest == </div> <div lang="en" dir="ltr" class="mw-content-ltr"> <blockquote> </div> <div lang="en" dir="ltr" class="mw-content-ltr"> <big>We are now fully in the digital archival world.</big> </div> <div lang="en" dir="ltr" class="mw-content-ltr"> <big>All our material is now in a digital form.</big> </div> <div lang="en" dir="ltr" class="mw-content-ltr"> <big>We also have a digital archival repository—in the form of a Digital Archiving System.</big> </div> <div lang="en" dir="ltr" class="mw-content-ltr"> </blockquote> </div> <div lang="en" dir="ltr" class="mw-content-ltr"> To complete the process of creating a digital archive, we now need to employ a set of software-based '''[[Special:MyLanguage/Glossary of Key Terms and Concepts#Archival Techniques|digital archiving techniques]]''' on both our digitized and [[Special:MyLanguage/Glossary of Key Terms and Concepts#Born-digital|born-digital]] material. This is necessary to prepare it for ingest and long-term preservation in the Digital Archiving System. We also need to set up and prepare our Digital Archiving System itself—its databases and software tools and applications—to properly receive, store, and preserve our digital archival material. </div> <div lang="en" dir="ltr" class="mw-content-ltr"> To do that, we first need to review our basic archiving tools—the '''[[Special:MyLanguage/Glossary of Key Terms and Concepts#Archival structure table|archival structure table]]''' and descriptions of material—which in this digital archiving world will take the form of databases and text files containing file directories, metadata, and data documentation. 
It is therefore necessary to clarify two key concepts that are uniquely important for digital archiving, metadata and data documentation, since both are needed to understand how our digital archival content is organized, described, related, managed, and used within a Digital Archiving System. </div> <div lang="en" dir="ltr" class="mw-content-ltr"> === What Are Metadata and Data Documentation? === </div> <div lang="en" dir="ltr" class="mw-content-ltr"> Metadata is data about data: information about the digital archival content. It is stored in a structured form suitable for software processing. Metadata is essentially equivalent to archival descriptions of digital content. Indeed, the descriptions of our content that we made in the previous stage will now, in the Digital Archiving System, become metadata, thereby adding to other types of metadata such as system-generated technical metadata or metadata on an item’s access history. Metadata is therefore necessary for the goals of long-term preservation and access, as it allows us to maintain the integrity, quality, and usability of content. </div> <div lang="en" dir="ltr" class="mw-content-ltr"> '''[[Special:MyLanguage/Glossary of Key Terms and Concepts#Data Documentation|Data documentation]]''' provides information about the ''context'' of our data, our digital archival content. It is often provided in a textual or other human-readable form. Data documentation supplements metadata and provides information that enables others to use the archival content. For example, if we conduct a survey of victims and are preserving their filled-in questionnaires as our digital archival data, we should also preserve related data documentation (e.g., a document detailing the survey design and methodology). Given that data documentation is also “data about data,” it could also be seen as a specific type of metadata, one which provides context and is recorded in a human-friendly format. 
</div> <div lang="en" dir="ltr" class="mw-content-ltr"> === Preparing Metadata and Data Documentation === </div> [[File:CSOs-in-Digital-Archiving-Toolkit-6x9-EN-final-print (KEY WORDS WIKI) Page 091.jpg|thumb|<span lang="en" dir="ltr" class="mw-content-ltr">Image shared by CCJ, GIJTR partner organization in Colombia.</span>]] <div lang="en" dir="ltr" class="mw-content-ltr"> While our digital files are safely stored and backed up on storage media awaiting ingest and archiving in the digital information system, we need to turn our attention to some housekeeping duties. They involve preparing our metadata and data documentation for the upcoming process to ensure the smooth ingest and proper archiving of files. </div> <div lang="en" dir="ltr" class="mw-content-ltr"> This involves having a clear and well-organized record of data documentation and metadata thus far in the process—what they contain and how they relate to one another. This includes tables/databases with lists (or directories) of file names, the files’ metadata, and data documentation. Throughout previous chapters, we described how these documents are developed or generated through planning, inventory creation, review, selection, organization, description, and digitization of material. As a result, at this point in the process, we should have the following metadata and data documentation created: </div> <div lang="en" dir="ltr" class="mw-content-ltr"> A) This document started its life as Identification Inventory and then, through processes of organization and description, grew into the Table of Archive’s Structure. It contains metadata on the archive’s structure, grouping of files in series, subseries, and folders, and additional descriptive and technical metadata we selected to put into it. </div> <div lang="en" dir="ltr" class="mw-content-ltr"> B) As a result of the digitization process, we have produced databases in which we recorded each digital surrogate we produced and the selected metadata about it. 
</div> <div lang="en" dir="ltr" class="mw-content-ltr"> Further, digitizing equipment and software also generated additional databases with metadata we selected to capture technical attributes of the digital surrogates and/or history of actions on them throughout the digitization process. </div> <div lang="en" dir="ltr" class="mw-content-ltr"> Finally, we also might have produced text documents containing data documentation, information about the context of the digital surrogates we created, or the digitization process itself. This will allow others to understand how our data can be interpreted or used. </div> <div lang="en" dir="ltr" class="mw-content-ltr"> C) A database of '''[[Special:MyLanguage/Glossary of Key Terms and Concepts#Born-digital|born-digital]]''' files for preservation with their basic metadata will either already exist or be easily created using simple software tools such as “DROID” or “IngestList.” </div> <div lang="en" dir="ltr" class="mw-content-ltr"> D) There might be additional pre-existing tables/databases or text files containing metadata and/or data documentation about certain item groups or the entire collection. </div> <div lang="en" dir="ltr" class="mw-content-ltr"> In order for our digital content, metadata, and data documentation to be properly ingested into the Digital Archiving System, we need to provide the system software with instructions on what these documents are and how they relate to each other. In this way, the system can, for example, correctly attach metadata in one database to the items metadata describes that are listed in a different database, and then to data documentation providing information about the given items’ context. </div> <div lang="en" dir="ltr" class="mw-content-ltr"> As part of the preparations, we might also need to manually divide, merge, or combine some of our tables/databases to transform them into a more appropriate format. 
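
The kind of relation the system needs to know about can be sketched in a few lines of Python: two tables are joined on a shared item identifier, and anything that cannot be matched is reported rather than silently dropped. The column name <code>identifier</code>, the function name, and the sample rows are assumptions for this example; real tables will have their own key fields.

```python
def link_tables(structure_rows, technical_rows, key="identifier"):
    """Join two metadata tables (lists of dicts) on a shared key column.

    Returns three values:
      merged  - one combined row per item found in both tables
      missing - keys in the structure table with no technical metadata
      orphans - keys in the technical table unknown to the structure table
    """
    technical_by_id = {row[key]: row for row in technical_rows}
    merged, missing = [], []
    for row in structure_rows:
        extra = technical_by_id.get(row[key])
        if extra is None:
            missing.append(row[key])
        else:
            # Later keys win on a name clash, so technical values override.
            merged.append({**row, **extra})
    known = {row[key] for row in structure_rows}
    orphans = sorted(set(technical_by_id) - known)
    return merged, missing, orphans


if __name__ == "__main__":
    # Invented sample rows standing in for tables A) and B) above.
    structure = [
        {"identifier": "ABC_S01_000001", "series": "Testimonies"},
        {"identifier": "ABC_S01_000002", "series": "Testimonies"},
    ]
    technical = [
        {"identifier": "ABC_S01_000001", "format": "image/tiff"},
        {"identifier": "ABC_S01_000009", "format": "image/tiff"},
    ]
    merged, missing, orphans = link_tables(structure, technical)
    print(merged, missing, orphans)
```

The two "leftover" lists are the useful part of the sketch: they show where our map of metadata and data documentation still has gaps before ingest begins.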
</div> <div lang="en" dir="ltr" class="mw-content-ltr"> The exact steps we will need to take to prepare our metadata and data documentation, and the way we will input information about their inter-relations into the Digital Archiving System, will depend on the characteristics of the archive and the system itself. </div> <div lang="en" dir="ltr" class="mw-content-ltr"> Yet, regardless of these specifics, we will always need to have a clear overview, a map, or a scheme of our metadata and data documentation and how they are related before we can begin with the ingest. </div> <div lang="en" dir="ltr" class="mw-content-ltr"> === Preservation and Preparation of Data for Archiving === </div> <div lang="en" dir="ltr" class="mw-content-ltr"> We can now move on to the preservation actions and preparation of our digital data for ingest and archiving. </div> <div lang="en" dir="ltr" class="mw-content-ltr"> ==== Cleaning ==== </div> <div lang="en" dir="ltr" class="mw-content-ltr"> The first thing we should always do before working with digital data intended for preservation is perform an '''[[Special:MyLanguage/Glossary of Key Terms and Concepts#Antivirus|antivirus scan]]''' by connecting the storage media to a previously scanned computer that is not connected to any local network or to the internet. </div> <div lang="en" dir="ltr" class="mw-content-ltr"> ==== Backup ==== </div> <div lang="en" dir="ltr" class="mw-content-ltr"> Then comes the backup. At the end of the digitization process, we have already created backups of the digital surrogates’ master files. If we have not yet done the same for the '''[[Special:MyLanguage/Glossary of Key Terms and Concepts#Born-digital|born-digital]]''' data, we should back it up now by producing two copies and storing them on separate storage media and, if possible, at two different locations. 
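
A minimal sketch of this step in Python is shown below. It copies one file to two or more destination folders and compares each copy with the original byte by byte. The function name and the folder names are hypothetical, and in practice each destination would sit on a different storage medium.

```python
import filecmp
import shutil
from pathlib import Path


def back_up(source, destinations):
    """Copy `source` into each destination folder and verify every copy.

    Raises ValueError if fewer than two distinct destinations are given,
    and OSError if a copy does not match the original.
    """
    source = Path(source)
    folders = [Path(d) for d in destinations]
    if len({folder.resolve() for folder in folders}) < 2:
        raise ValueError("at least two separate backup destinations are required")
    copies = []
    for folder in folders:
        folder.mkdir(parents=True, exist_ok=True)
        target = folder / source.name
        shutil.copy2(source, target)  # copy2 also keeps file timestamps
        # shallow=False forces a comparison of the actual file contents.
        if not filecmp.cmp(source, target, shallow=False):
            raise OSError(f"backup copy differs from original: {target}")
        copies.append(target)
    return copies


if __name__ == "__main__":
    # Hypothetical mount points of two external drives.
    print(back_up("report.pdf", ["/media/backup_a", "/media/backup_b"]))
```

Comparing contents immediately after copying catches a failed transfer while the original is still at hand.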
</div> <div lang="en" dir="ltr" class="mw-content-ltr"> ==== File Naming ==== </div> <div lang="en" dir="ltr" class="mw-content-ltr"> While our digital surrogates’ files have already been named in line with the naming convention we developed and adopted, our born-digital files might still have their original names. We must therefore apply our naming convention to the born-digital files and name them accordingly. Their names will then contain the same components—identification, description, technical, or other—as those we selected and used for the digital surrogates in a way that was described in the digitization chapter. There are reasonably simple and easy-to-use software tools that can perform this task of renaming our digital files automatically within the parameters we set for it, such as “Rename Master” and “File Renamer Basic.” </div> <div lang="en" dir="ltr" class="mw-content-ltr"> ==== Metadata ==== </div> <div lang="en" dir="ltr" class="mw-content-ltr"> In the previous section, we took stock of metadata and data documentation we collected thus far in the process. As explained there, we will need to ingest our metadata in a specific, fixed format that is recognizable by our Digital Archiving System. This specific format of metadata will be based on the metadata standard we selected to implement earlier in the process, and that we now need to apply for ingest of data into our Digital Archiving System. </div> <div lang="en" dir="ltr" class="mw-content-ltr"> If, as advised in this manual, in the planning phase, we have already made a decision on the standard we will apply for metadata collection and implemented it through description and digitization phases, then our metadata will already have been gathered in accordance with that standard. Therefore, we should be able to arrange and prepare it for ingest in accordance with the system-recognizable format by making only basic technical arrangements or mapping our metadata to the standard. 
For example, in the digitization section, we mentioned that the so-called “Dublin Core” basic metadata standard is supported by most digital archiving software. Hence, if we applied this standard for the collection of metadata from the beginning, and we selected the software that supports it, we would now be able to translate the collected metadata into the format our Digital Archiving System can recognize and properly ingest. </div> <div lang="en" dir="ltr" class="mw-content-ltr"> === Preservation of Metadata === </div> <div lang="en" dir="ltr" class="mw-content-ltr"> In the earlier discussion of metadata and the importance of its proper collection and management, we mentioned the key role it has for long-term preservation of digital archival data. </div> <div lang="en" dir="ltr" class="mw-content-ltr"> This becomes even more relevant at this point in the process, with the preparation for ingest and long-term preservation of our material. Before we ingest and archive our data, we need to make sure that we capture the metadata that will allow our digital material to be adequately preserved, to keep its authenticity, and to remain usable in the future. To understand which essential set of metadata we need to capture to preserve our invaluable data, we will need to get to know our digital files and their formats a bit better, including things such as our files’ validity, quality, and '''[[Special:MyLanguage/Glossary of Key Terms and Concepts#Fixity|fixity]]'''. </div> <div lang="en" dir="ltr" class="mw-content-ltr"> === Identifying and Converting File Formats === </div> <div lang="en" dir="ltr" class="mw-content-ltr"> Back in the digitization process, we established the need to store our digital material in file formats that are appropriate for long-term preservation. Primarily, these are formats that have a wide user/support community and are proven to be resilient to change over time. 
Many preservation formats are also “lossless,” meaning they store data without discarding any information, as opposed to “lossy” formats, which discard some information to reduce file size and lose further quality each time a file is re-saved. </div> <div lang="en" dir="ltr" class="mw-content-ltr"> Our digitized material has already been stored in appropriate preservation formats through digitization, and now we need to make sure the same is true of our born-digital material. </div> <div lang="en" dir="ltr" class="mw-content-ltr"> We first need to identify the format of our '''[[Special:MyLanguage/Glossary of Key Terms and Concepts#Born-digital|born-digital]]''' files, which we can do with the assistance of specialized software, such as “DROID” or “Siegfried,” that allows us to automatically identify the format of batches of our digital files. We will then convert those files that we determine need to be put into a different, preservation-appropriate format. Specialized software for conversion of files to different formats can be very useful in this process. Such software is often format-specific: for example, “Audio/Video to WAV Converter” converts audio and video files to WAV format, while “CDS Convert” allows conversion of documents, presentations, and images between different software formats. </div> [[File: <span lang="en" dir="ltr" class="mw-content-ltr">!TIP!.png</span>|left| <span lang="en" dir="ltr" class="mw-content-ltr">85x85px</span>]] <div lang="en" dir="ltr" class="mw-content-ltr">
{| class="wikitable"
|+
!'''The Importance of Using Proper Preservation Formats'''
|-
|Lossless formats, as a rule, also produce larger files. Hence, for large collections and small organizations, such as CSOs, this can represent a challenge in terms of additional storage capacities they may require. However, this manual advises against making compromises with the selection of file formats, as use of proper preservation formats is essential for all following preservation actions and the success of the process as a whole.
|} </div> <div lang="en" dir="ltr" class="mw-content-ltr"> === Validating Files === </div> <div lang="en" dir="ltr" class="mw-content-ltr"> The next step in preparing our digital content for proper preservation in the Digital Archiving System is validation of our files, that is, establishing that they really are what we think they are. </div> <div lang="en" dir="ltr" class="mw-content-ltr"> In essence, through file format validation, we check whether a file conforms to its file format specification: the standard that a specific file format, such as JPEG, DOC, or TIFF, must follow. As an illustration, file format validation could be compared to the inspection of boxes or folders in a physical archive to ensure they are not damaged, since otherwise items could fall out or be damaged. </div> <div lang="en" dir="ltr" class="mw-content-ltr"> In digital archiving, file format validation is particularly important for long-term preservation and access, for a number of reasons. Files with formats that are not valid are difficult to manage over time, especially when a file needs to be converted or migrated. Moreover, access might become difficult or impossible, as files with nonconforming formats become more difficult to open and use over time. Finally, files that are not valid will be more difficult, if not impossible, for future software to render properly. </div> <div lang="en" dir="ltr" class="mw-content-ltr"> Of course, we do not manually inspect whether a file format conforms to its specifications; there is software available to perform that function and to identify and create reports on the files that are found not to be valid. We already mentioned one such software tool, JHOVE, in the section on quality review at the end of the digitization process, but there are also other tools, most of which are specialized for a certain group of formats. 
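
To give a sense of what such tools look at, the Python sketch below performs only the very first step: it reads the opening bytes of a file (its “signature”) and checks whether they agree with the file extension. This is a simplification for illustration and not how JHOVE or any other named tool is implemented; full validation examines the whole file against the format specification. The function names and the small signature table are assumptions of this example.

```python
from pathlib import Path

# Well-known leading byte sequences for a few common formats.
SIGNATURES = {
    "jpeg": (b"\xff\xd8\xff",),
    "png": (b"\x89PNG\r\n\x1a\n",),
    "pdf": (b"%PDF-",),
    "tiff": (b"II*\x00", b"MM\x00*"),  # little-endian and big-endian TIFF
}

# Which format each file extension claims to be.
EXTENSIONS = {
    ".jpg": "jpeg", ".jpeg": "jpeg",
    ".png": "png",
    ".pdf": "pdf",
    ".tif": "tiff", ".tiff": "tiff",
}


def sniff_format(path):
    """Return the format name suggested by the file's first bytes, or None."""
    with open(path, "rb") as handle:
        head = handle.read(16)
    for name, magics in SIGNATURES.items():
        if any(head.startswith(magic) for magic in magics):
            return name
    return None


def extension_matches_content(path):
    """True only if the extension is known and agrees with the signature."""
    expected = EXTENSIONS.get(Path(path).suffix.lower())
    return expected is not None and sniff_format(path) == expected


if __name__ == "__main__":
    import sys
    for name in sys.argv[1:]:
        print(name, extension_matches_content(name))
```

A file that fails even this coarse check (for example, a PDF saved with a .jpg extension) is a clear candidate for closer inspection with a dedicated validation tool.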
</div> [[File: <span lang="en" dir="ltr" class="mw-content-ltr">!TIP!.png</span>|left| <span lang="en" dir="ltr" class="mw-content-ltr">85x85px</span>]] <div lang="en" dir="ltr" class="mw-content-ltr"> {| class="wikitable" |+ !'''Preservation Actions Should Immediately Follow Digitization''' |- |File format validation and other preservation actions, along with the quality control procedures, should be performed immediately at the end of the digitization process, either as an alternative or in addition to conducting them as part of the preparations for ingest, depending on a project’s specific needs and workflow. |} </div> <div lang="en" dir="ltr" class="mw-content-ltr"> === Fixity === </div> <div lang="en" dir="ltr" class="mw-content-ltr"> Fixity, a crucial element of the long-term preservation of files as well as of maintaining their integrity, authenticity, and usability, means a state of being unchanged or permanent. In essence, fixity allows us to determine whether a file has been altered or corrupted over time and to track and record any such changes. </div> <div lang="en" dir="ltr" class="mw-content-ltr"> To be able to do this, we record the initial state of a file before ingest by taking its “digital fingerprint.” In practice, fixity software reads the file’s content and computes from it an alphanumeric code, a “checksum.” This checksum, much like a human fingerprint, is effectively unique to that file and will not change as long as the file itself does not change. The checksum for a file will be recorded as part of its metadata so we can always perform the same fixity check and establish whether the file’s checksum has changed, that is, whether the file has changed. 
Recording this type of preservation metadata is crucial for confirming and establishing a digital item's “chain of custody.” </div> <div lang="en" dir="ltr" class="mw-content-ltr"> In addition to allowing us to establish any changes to a file that have occurred over time, fixity is also useful when we are migrating files between different storage media, units, or digital repositories. It is highly advisable to apply a fixity check after each such file transfer to detect any changes that might have occurred in the course of the file '''[[Maintenance: Preservation, Development and Migrations#Active maintenance: Migration|migration]]'''. </div> <div lang="en" dir="ltr" class="mw-content-ltr"> Further, fixity will allow us to verify that any copies of a file we create for backup are complete and correct. A fixity checksum can also be given to other potential file users so they are able to verify that they have received the correct file. There is a range of software that can perform fixity checks, such as “Checksum” and “Exact.File,” just to name a few. </div> <div lang="en" dir="ltr" class="mw-content-ltr"> === Quality Control === </div> <div lang="en" dir="ltr" class="mw-content-ltr"> Many things can go wrong with digital files as they are created, managed, and stored before they reach the point of ingest. During digitization, due to an error or a virus, files can be damaged, made incomplete, or reduced in quality. It is therefore good practice to perform as comprehensive a quality check of all our digital files as possible before their ingest and archiving. A whole set of tools is available that perform either a single quality control action or a group of them. Some examples include NARA’s File Analyzer and Metadata Harvester, which has a range of functions, or, at the other end of the spectrum, the highly specialized “Fingerdet,” which helps detect fingerprints on digitized items. 
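One check that is straightforward to automate before ingest is the fixity check described in the previous section. The following is a minimal sketch in Python, assuming SHA-256 as the checksum algorithm and a simple in-memory manifest (a mapping from relative file path to checksum). The function names and the manifest layout are illustrative assumptions, not the behaviour of any tool named in this manual.

```python
import hashlib
from pathlib import Path


def sha256_checksum(path, chunk_size=1024 * 1024):
    """Compute a file's SHA-256 checksum, reading it in chunks to limit memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def record_fixity(folder):
    """Build a manifest mapping each file's relative path to its checksum."""
    root = Path(folder)
    return {
        item.relative_to(root).as_posix(): sha256_checksum(item)
        for item in sorted(root.rglob("*"))
        if item.is_file()
    }


def verify_fixity(folder, manifest):
    """Compare a folder against a manifest; report missing or changed files."""
    root = Path(folder)
    problems = {}
    for name, expected in manifest.items():
        target = root / name
        if not target.is_file():
            problems[name] = "missing"
        elif sha256_checksum(target) != expected:
            problems[name] = "changed"
    return problems
```

In use, `record_fixity` would be run once before ingest and its result stored with the preservation metadata; `verify_fixity` can then be run against the original folder, a migrated copy, or a backup, and an empty result means every recorded file is present and unchanged. Note that this sketch does not report files that were added after the manifest was recorded.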
</div> <div lang="en" dir="ltr" class="mw-content-ltr"> === Removing Duplicates and Weeding Files === </div> <div lang="en" dir="ltr" class="mw-content-ltr"> While we are at it, we should use this opportunity to clean up our files a bit. Over the course of collecting, organizing, copying, and temporarily storing our digital files, it is likely that we will have created duplicates, or that folders contain hidden files or files that do not belong in them. Having duplicates and other unwanted files in our collection can create confusion, in addition to unnecessarily taking up space in our storage. It is therefore a good practice to remove them before ingest. Depending on the size of the collection, this could be a very time-consuming and error-prone task if performed manually. Luckily, there are software tools that can do this for us efficiently and reliably. Examples of dedicated tools for this purpose include “FolderMatch” and “CloneSpy.” </div> <div lang="en" dir="ltr" class="mw-content-ltr"> === Metadata on Private, Sensitive, Confidential, or Copyrighted Data === </div> <div lang="en" dir="ltr" class="mw-content-ltr"> Given the importance of data safety and security when archiving material related to human rights violations, it is highly advisable that, at this point, before the content is ingested, we make an additional review of the material with respect to privacy, sensitivity, confidentiality, and copyrights. </div> <div lang="en" dir="ltr" class="mw-content-ltr"> During the description processes, we should have already identified groups of materials or even single items that contain personal or sensitive information. Now we need to make sure all relevant metadata about such material is collected and appropriately linked to the items. 
Depending on the material and the archive’s access policy, it might be useful, or even necessary, to add further metadata here, specifically metadata that provides instructions for the material’s future management with regard to copyright, protection, or restricted access. </div> <div lang="en" dir="ltr" class="mw-content-ltr"> Conveniently, standards and software have been developed to provide assistance in that process. </div> <div lang="en" dir="ltr" class="mw-content-ltr"> === Standards === </div> <div lang="en" dir="ltr" class="mw-content-ltr"> Standards for metadata selection, collection, and use often include a full range of preservation metadata. Application of such metadata standards supports the preservation of digital items and ensures their long-term usability. A range of standards has been developed for handling preservation and metadata in general. As such a wide choice of options can make it hard to see a clear way forward, we recommend that an organization use the “Preservation Metadata Implementation Strategies” (PREMIS) standard as a starting point. </div> [[File: <span lang="en" dir="ltr" class="mw-content-ltr">!RESOURCE!.png</span>|left| <span lang="en" dir="ltr" class="mw-content-ltr">85x85px</span>]] <div lang="en" dir="ltr" class="mw-content-ltr"> {| class="wikitable" |+ !Resource Recommendation! |- |PREMIS has achieved the status of being the accepted international standard for preservation metadata. Both a strength and a limitation of the PREMIS standard is that it must be tailored to meet the requirements of the specific context; it is not an off-the-shelf solution that an archive can simply apply directly to its data. Some of PREMIS’s elements might not be relevant, and an organization may find that additional information beyond what is defined by the PREMIS standard is needed to support its requirements. 
|} </div> <div lang="en" dir="ltr" class="mw-content-ltr"> It should be noted that different metadata standards will often be integrated, or at least compatible, with the software we use for metadata collection and management functions. </div> <div lang="en" dir="ltr" class="mw-content-ltr"> === Software Tools === </div> <div lang="en" dir="ltr" class="mw-content-ltr"> Thus far in this chapter we have mentioned examples of different software solutions that can perform specific preservation metadata collection and management functions, such as file identification, conversion, validity, and fixity checks. Such tools will indeed sometimes be designed to perform just one specific, or a group of similar, functions. However, these individual tools are also often used together as a more wide-ranging software solution, which can provide a full scope of preservation and metadata-related functions. Moreover, such multifunctional tools for metadata are then incorporated into comprehensive software solutions that can manage the entire process of digital archiving within a given Digital Archiving System. </div> <div lang="en" dir="ltr" class="mw-content-ltr"> In the planning section of this manual, where we discuss the selection of a software solution for our Digital Archiving System, we consider whether the option we choose has integrated support for the selected metadata standard, as well as all the necessary software tools to collect and manage preservation metadata to our archive’s requirements. At that point, we could opt for an enterprise solution that provides an all-in-one option with all necessary standards and tools integrated into it. But an alternative would be to build a solution that meets our needs by using different, interoperable software, with each performing one of the preservation functions. 
</div> <div lang="en" dir="ltr" class="mw-content-ltr"> This stage of preparing data for ingest and capturing preservation metadata highlights the importance of our selection of the digital archiving software and the effect it has on the technologies and software tools we can and need to use. The specific software tools we will apply in this phase, as well as later on, will therefore depend fully on the type of solution we select for our digital archiving software. </div> [[File: <span lang="en" dir="ltr" class="mw-content-ltr">!TIP!.png</span>|left| <span lang="en" dir="ltr" class="mw-content-ltr">85x85px</span>]] <div lang="en" dir="ltr" class="mw-content-ltr"> {| class="wikitable" |+ !Digital Forensics |- |If working with older data storage formats or digital material of unclear origin and features, especially when a history of the material and a “chain of custody” need to be established, a promising area of development is '''[[Special:MyLanguage/Glossary of Key Terms and Concepts#Digital Forensics|digital forensics]]''', which provides benefits in addressing digital authenticity, accountability, and accessibility. This forensic technology can make it possible to identify privacy issues, establish a chain of custody for provenance, employ write protection for capture and transfer, and detect forgery or manipulation. It can also extract and mine relevant metadata and content, enable efficient indexing and searching by curators, and facilitate audit control and granular access privileges. Digital forensic technologies vary greatly in their capability, cost, and complexity, with certain equipment ranging from free to expensive. Some techniques are very straightforward to use, while others have to be applied with great care and sophistication. There is an increasingly rich set of open source forensic tools (e.g., “BitCurator”) that are free to obtain and use. 
|} </div> <div lang="en" dir="ltr" class="mw-content-ltr"> === Preparing the Digital Archiving System === </div> <div lang="en" dir="ltr" class="mw-content-ltr"> Set-up and preparation of our digital archival system for its first ingest of digital files is a complex process that requires time, effort, patience, and reasonably advanced IT knowledge and skills. </div> <div lang="en" dir="ltr" class="mw-content-ltr"> Digital Archiving Systems cannot simply be installed and immediately used, as we do with standard commercial software. This is because any Digital Archiving System needs to be “instructed” on each and every aspect of its operations. Based on our requirements, we need to set the parameters in the system, create or design databases within it, create links between data and metadata, etc. Providing these “instructions” to our software might require anything from simply filling an electronic form or choosing an option from a drop-down menu to needing to use computer coding and other advanced IT skills. </div> [[File:CSOs-in-Digital-Archiving-Toolkit-6x9-EN-final-print (KEY WORDS WIKI) Page 101.jpg|center|thumb|866x866px|<span lang="en" dir="ltr" class="mw-content-ltr">Image shared by AVIPA, GIJTR partner organization in Guinea.</span>]] <div lang="en" dir="ltr" class="mw-content-ltr"> The amount of time and expertise needed depends on the type of software solution selected for the Digital Archiving System. The rule of thumb we applied to the selection of software applies here as well. Commercial solutions will be simpler for both set-up and use, but will likely offer fewer options for adaptation. Open-source solutions will mainly require more IT expertise and time—but can provide more suitable and tailored solutions. 
</div> <div lang="en" dir="ltr" class="mw-content-ltr"> == Ingest == </div> <div lang="en" dir="ltr" class="mw-content-ltr"> This is the sweet spot, where the entire effort and process conducted so far comes together and results in the creation of our archive. </div> <div lang="en" dir="ltr" class="mw-content-ltr"> However, we should not imagine that we can just click a button, go have a tea, and return to see all our data, metadata, and data documentation ingested and properly connected to each other. Rather, the ingest process will need to be performed in parts, by transferring the material group by group over a period of time. In the process, we will also likely encounter errors, discover incorrect specifications in the system, or run into similar issues that will need to be addressed; the system will then need to be fine-tuned and the ingest repeated. </div> <div lang="en" dir="ltr" class="mw-content-ltr"> After ingesting each group of material, we should produce at least one archival master copy of each item, at least two backup copies, and any derivative working copies we might need. </div> <div lang="en" dir="ltr" class="mw-content-ltr"> Backup copies should be created and stored in line with the best practice rules described earlier (i.e., create multiple copies on two different storage media technologies and store them at different locations). </div> <div lang="en" dir="ltr" class="mw-content-ltr"> As a final step, we need to perform the same preservation actions we applied to our content in preparation for ingest. This includes scanning the material as well as all backup copies with '''[[Special:MyLanguage/Glossary of Key Terms and Concepts#Antivirus|antivirus]]''' software and checking each file’s fixity, validity, and quality. </div> <div lang="en" dir="ltr" class="mw-content-ltr"> If we have covered the basics so far and ensured that all the elements have been prepared, the process should be successful. 
We should now be able to enjoy the fruits of our work: our precious material, previously scattered around the office and in storage units and basements, has been turned into a digital archive. </div> <div lang="en" dir="ltr" class="mw-content-ltr"> In the next step, we will make sure our archive's goals are also achieved: that it preserves our material safely and for a long time, and that it provides the widest possible access to its content. </div> <div lang="en" dir="ltr" class="mw-content-ltr"> '''06:00''' </div>