Purpose – After almost three centuries of employing western educational approaches, many African societies are still characterized by low western literacy rates, civil conflicts, and underdevelopment. It is obvious that these western educational paradigms, which are not indigenous to Africans, have done relatively little good for Africans. Thus, the purpose of this paper is to argue that the salvation for Africans hinges upon employing indigenous African educational paradigms, which can be subsumed under the rubric of ubuntugogy, which the authors define as the art and science of teaching and learning undergirded by humanity toward others.
Design/methodology/approach – Ubuntugogy thus transcends pedagogy (the art and science of teaching), andragogy (the art and science of helping adults learn), ergonagy (the art and science of helping people learn to work), and heutagogy (the study of self-determined learning). Many great African minds, realizing the debilitating effects of the western educational systems that have been forced upon Africans, have called for different approaches.
Findings – One of the biggest challenges for studying and teaching about Africa in Africa at the higher education level, however, is the paucity of published material. Automated generation of metadata is one way of mining massive data sets to compensate for this shortcoming.
Originality/value – Thus, the authors address the following major research question in this paper: What is automated generation of metadata and how can the technique be employed from an African-centered perspective? After addressing this question, conclusions and recommendations are offered.
1. Introduction
While a great deal of attention has been paid to the “digital divide” within developed countries and between those countries and developing ones, most Africans do not even have the luxury of access to books, periodicals, radio, and television channels, which is precisely why information and communication technology (ICT) is so important to Africa. ICT has the potential to have a positive impact on Africa's development. So, how can Africans transform that potential into reality? And how can Africans access that technology? Without access, that technology cannot do much for Africans – hence the central importance of access to digital technology.
Digital technology here refers mainly to the newest ICT, particularly the internet. There are, of course, other more widely available forms of ICT, such as radio and telephones. But the generally abysmal state of networks of every kind on the continent makes it difficult to fully utilize the development potential of even this technology. Africa's electrical grid is grossly inadequate, resulting in irregular or non-existent electrical supplies. The biggest problem is that in many countries, significant power distribution networks are non-existent in rural areas.
Africa's phone systems are spotty and often rely on antiquated equipment, and progress is hamstrung by bureaucracy and, in most instances, state-owned monopolies. But African governments have the power to alter these circumstances and, gradually, some are doing so. The signs of progress are remarkable. A few years ago, only a couple of countries had internet access. Today, all 54 countries and territories in Africa have permanent connections, and there is also rapidly growing public access provided by phone shops, schools, police stations, clinics, and hotels.
Although Africa is becoming increasingly connected, access to the internet is progressing at a limited pace. Of the 770 million people in Africa, only about one in every 150, or approximately 5.5 million people in total, now uses the internet. There is roughly one internet user for every 200 people, compared to a world average of one user for every 15 people, and a North American and European average of about one in every two people.
An internet or e-mail connection in Africa usually supports a range of three to five users. The number of dial-up internet subscribers now stands at over 1.3 million, up from around one million at the end of 2000. Of these, North Africa accounts for about 280,000 subscribers and South Africa accounts for 750,000 (Lusaka Information Dispatch, 2003). Kenya now has more than 100,000 subscribers and some 250 cyber cafes across the country (BBC, 2002). The widespread penetration of the internet in Africa is still largely confined to the major cities, where only a minority of the total population lives. Most of the continent's capital cities now have more than one internet service provider (ISP); and in early 2001, there were about 575 public ISPs across the continent. Usage of the internet in Africa is still considered a privilege for a few individuals and most people have never used it (Lusaka Information Dispatch, 2003).
In Zambia, for example, there are now about five ISPs, which include Zamnet, Microlink, Coppernet, Uunet, and Zambia Telecommunication Service, which is government owned. Most people in Lusaka go to internet cafes to check their e-mail rather than to surf the internet to conduct research (Lusaka Information Dispatch, 2003).
Indeed, ICT can play a substantial role in improving access to all forms of education (formal schooling, adult literacy, and vocational educational training) and in strengthening the economic and democratic institutions in African countries. It can also help to address the major issue of this paper: one of the biggest challenges for studying and teaching about Africa in Africa at the higher education level is the paucity of published material. We suggest in this paper that automated generation of metadata is one way of mining massive data sets to compensate for this shortcoming. Thus, this paper addresses the following major research question: What is automated generation of metadata and how can the technique be employed from an African-centric perspective? After addressing this question, conclusions and recommendations are offered.
As can be seen from Table I, despite the fact that the available data are a bit dated, not much has changed since 2000. Even though the number of internet users in Africa has grown exponentially since 2000, the continent is still far behind the rest of the world. Mobile cellular telephones are becoming the major mode of communication in the region, although radio is still the most prevalent means of reaching the masses. It is therefore not surprising that the Economic Commission for Africa (2003) has lamented that “the media in Africa are, with notable exceptions, far from being a promoter of the information society in Africa.”
Despite the preceding challenge, with the attendant barriers, there are still hopeful signs on the horizon for Africa in terms of internet access. We hope to examine these issues in another paper.
2. Mining massive data sets
The capabilities of generating and collecting data, observed Alshameri (2006), have been increasing rapidly. The computerization of many business and government transactions, together with the attendant advances in data collection tools, he added, has provided huge amounts of data. Millions of databases have been employed in business management, government administration, scientific and engineering management, and many other applications. This explosive growth in data and databases has generated an urgent need for new techniques and tools that can intelligently and automatically transform the processed data into useful information and knowledge (Chen et al., 1996). This paper explores the nature of data mining and how it can be used in doing research on African issues.
Data mining is the task of discovering interesting patterns from large amounts of data where the data can be stored in databases, data warehouses, or other information repositories. It is a young interdisciplinary field, drawing upon such areas as database systems, data warehousing, statistics, machine learning, data visualization, information retrieval, and high-performance computing. Other contributing areas include neural networks, pattern recognition, spatial data analysis, image databases, signal processing, and many application fields, such as business, economics, and bioinformatics.
Data mining denotes a process of non-trivial extraction of implicit, previously unknown and potentially useful information (such as knowledge rules, constraints, regularities) from data in databases. The information and knowledge gained can be used for applications ranging from business management, production control, and market analysis to engineering design and scientific exploration.
There are also many other concepts appearing in the literature that carry similar or slightly different definitions, such as knowledge mining from databases, knowledge extraction, data archaeology, data dredging, data analysis, etc. Through knowledge discovery in databases, interesting knowledge, regularities, or high-level information can be extracted from the relevant sets of data in databases and investigated from different angles, thereby serving as rich and reliable sources for knowledge generation and verification. Mining information and knowledge from large databases has been recognized by many researchers as a key research topic in database systems and machine learning, and by many industrial companies as an important area with an opportunity for major revenue generation. The discovered knowledge can be applied to information management, query processing, decision making, process control, and many other applications. Researchers in many different fields, including database systems, knowledge-based systems, artificial intelligence, machine learning, knowledge acquisition, statistics, spatial databases, and data visualization, have shown great interest in data mining. Furthermore, several emerging applications in information-providing services, such as on-line services and the World Wide Web, also call for various data mining techniques to better understand user behavior in order to ameliorate the service provided and to increase business opportunities.

Recent years have witnessed an explosion in the amount of digitally stored data, the rate at which data are being generated, and the diversity of disciplines relying upon the availability of stored data. Massive data sets are increasingly important in a wide range of applications, including observational sciences, product marketing, and the monitoring and operations of large systems. Massive data sets are collected routinely in a variety of settings in astrophysics, particle physics, genetic sequencing, geographical information systems, weather prediction, medical applications, telecommunications, sensors, government databases, and credit card transactions. The nature of these data is not limited to a few esoteric fields but extends, arguably, to the entire gamut of human intellectual pursuits, ranging from images on Web pages to exabytes (∼10^18 bytes) of astronomical data from sky surveys (Hambrusch et al., 2003).
There are different areas which provide for the use of data mining. The following are some examples:
(1) Astronomy and astrophysics have long used data mining techniques such as statistics that aid in the careful interpretation of observations that are an integral part of astronomy. The data being collected from astronomical surveys are now being measured in terabytes (∼10^12 bytes), because of the new technology of the telescopes and detectors. These data sets can be easily stored and analyzed by high-performance computers (Grossman et al., 2001).
(2) Biology, chemistry, and medicine – bioinformatics, chemical informatics, and medical informatics are all areas where data mining techniques have been used for a while and are increasingly gaining acceptance. In bioinformatics, which is a bridge between biology and information technology, the focus is on the computational analysis of gene sequences (Cannataro et al., 2004). In the chemical sciences, the information overload problem is becoming staggering as well, with the Chemical Abstracts Service adding about 700,000 new compounds to its database each year. Chemistry data are usually obtained either by experimentation or by computer simulation. In medicine, image mining is applied to the analysis of images from mammograms, MRI scans, ultrasounds, DNA micro-arrays, and X-rays for tasks such as identifying tumors, retrieving images with similar characteristics, detecting changes, and supporting genomics.
(3) Earth sciences, climate modeling, and remote sensing are replete with data mining opportunities. They cover a broad range of topics, including climate modeling and analysis, atmospheric sciences, geographical information systems, and remote sensing.
(4) Computer vision and robotics are characterized by a substantial overlap. There are several ways in which the two fields can benefit each other. For example, computer vision applications can benefit from the accurate machine learning algorithms developed in data mining, while the extensive work done in image analysis and fuzzy logic for computer vision and robotics can be used in data mining as well, especially for applications involving images (Kamath, 2001).
(5) Engineering – with sensors and computers becoming ubiquitous and powerful, and engineering problems becoming more complex, there is a greater focus on gaining a better understanding of these problems through experiments and simulations. As a result, large amounts of data are being generated, providing an ideal opportunity for the use of data mining techniques in areas such as structural mechanics, computational fluid dynamics, material science, and the semi-conductor industry.
(6) Financial data analysis – most banks and other financial institutions offer a wide variety of banking services such as checking, savings, and business and individual customer transactions. Added to that are credit services like business mortgages and investment services such as mutual funds. Some also offer insurance and stock investment services (Han and Kamber, 2001).
(7) Security and surveillance comprise another active area for data mining methodologies. They include applications such as fingerprint and retinal identification, human face recognition, and character recognition in order to identify people and their signatures for access, law enforcement, or surveillance purposes.
3. Requirements and challenges of mining massive data
That mining massive data to study and teach about Africa is imperative and portends challenges is hardly a matter of dispute. For instance, a casual Google search of the word Africa yielded 1,640,000,000 results in 0.21 seconds. Imagine how much larger the results would be if the word were combined with other terms. Many areas of business, government, education, international relations, etc., pertaining to Africa deal with vast amounts of data. Significant amounts of these data, which are now available on the internet, need to be translated into meaningful knowledge. Many web site owners use different software packages to store and publish their data, and many professionals use different software packages as well to make sense of the data. Yet, besides South Africa, which has invested significantly in the area of data mining, most African countries have overlooked this valuable resource that would give them a lot more control over their data. Since the data are often vast and noisy – that is, imprecise and complex in structure – data mining is a solution (for more on this, see e.g. Hart, 2006).
As stated above, there are, of course, challenges that confront data mining. The following are some of the challenges:
(1) Handling of different types of high-dimensional data. Since there are many kinds of data and databases used in different applications, one may expect that a knowledge discovery system should be able to perform effective data mining on different kinds of data.
(2) Efficiency and scalability of data mining algorithms. With the increasing size of data, there is a growing appreciation for algorithms that are scalable. To effectively extract information from a huge amount of data in databases, the knowledge discovery algorithms must be efficient and scalable to large databases.
(3) Usefulness, certainty, and expressiveness of data mining results. Scientific data, especially data from observations and experiments, are noisy. Removing the noise from data, without affecting the signal, is a challenging problem in massive data sets.
(4) Building reliable and accurate models and expressing the results. Different kinds of knowledge can be discovered from a large amount of data. These discovered kinds of knowledge can be examined from different views and presented in different forms.
(5) Mining distributed data. The widely available local and wide-area computer networks, including the internet, connect many sources of data, and form huge distributed heterogeneous databases, such as the text data that are distributed across various web servers or astronomy data that are distributed as part of a virtual observatory.
(6) Protection of privacy and data security. When data can be viewed from many different angles and at different abstraction levels, it can threaten the goal of ensuring data security and guarding against the invasion of privacy (Chen et al., 1996). It is important to study when knowledge discovered may lead to an invasion of privacy and what security measures can be developed to prevent the disclosure of sensitive information.
(7) Size and type of data. Science data sets range from moderate to massive, with the largest being measured in terabytes. As more complex simulations are performed and observations over long periods at higher resolution are conducted, the data will grow to the petabyte range. Data mining infrastructure should support the rapidly increasing data volume and the variety of data formats that are used in the scientific domain.
(8) Data visualization. The complexity and noise of massive data affect data visualization. Scientific data are collected from various sources using different sensors. Data visualization is needed to use all available data to enhance an analysis. Unfortunately, a difficult problem may emerge when data are collected at different resolutions, using different wavelengths, under different conditions, with different sensors (Kamath, 2001).
4. Mining African massive data
4.1 Mining spatial databases
The study and development of data mining algorithms for spatial databases are motivated by the large amount of data collected through remote sensing, medical equipment, and other instruments. Managing and analyzing spatial data became an important issue due to the growth of the applications that deal with geo-reference data. A spatial database stores a large amount of space-related data, such as maps, pre-processed remote sensing, or medical imaging data. Spatial databases have many features distinguishing them from relational databases. They carry topological and/or distance information, usually organized by sophisticated, multidimensional spatial indexing structures that are accessed by spatial data access methods and often require spatial reasoning, geometric computation, and spatial knowledge representation techniques. Another difference is the query language that is employed to access spatial data. The complexity of the spatial data type is another important feature (Palacio et al., 2003).
The explosive growth in data and databases used in business management, government administration, and scientific data analysis has created the need for tools that can automatically transform the processed data into useful information and knowledge. Spatial data mining is a subfield of data mining that deals with the extraction of implicit knowledge, spatial relationships, or other interesting patterns not explicitly stored in spatial databases (Koperski et al., 1998). Such mining demands an integration of data mining with spatial database technologies. It can be used for understanding spatial data, discovering spatial relationships and relationships between spatial and non-spatial data, constructing spatial knowledge databases, reorganizing spatial databases, and optimizing spatial queries. It is expected to have wide applications in geographic imaging, navigation, traffic control, environmental studies, and many other areas where spatial data are employed (Han and Kamber, 2001).
A crucial challenge to spatial data mining is the exploration of efficient spatial data mining techniques due to the huge amount of spatial data and the complexity of spatial data types and spatial access methods. Challenges in spatial data mining arise from several issues. First, classical data mining is designed to process numbers and categories, whereas spatial data are more complex and include extended objects such as points, lines, and polygons. Second, while classical data mining works with explicit inputs, spatial predicates and attributes are often implicit. Third, classical data mining treats each input independently of other inputs, while spatial patterns often exhibit continuity and high autocorrelation among nearby features (Shekhar et al., 2002).
Fayyad et al. (1996) used decision tree methods to classify images of stellar objects to detect stars and galaxies. About three terabytes of sky images were analyzed. Similar to the mining of association rules in transactional and relational databases, spatial association rules can be mined in spatial databases. A spatial association describes the spatial and non-spatial properties which are typical for the target objects but not for the whole database (Ester et al., 2000). Koperski et al. (1998) introduced spatial association rules that describe associations between objects based on spatial neighborhood relations. An example can be the following:
is_a(X, african_country) ∧ receives(X, western_aid) → is_highly_corrupt(X)   (0.5%, 90%)
This rule states that 90 percent of African countries receiving western aid are also highly corrupt, and 0.5 percent of the data belongs to such a case.
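To make the (support, confidence) notation above concrete, the following minimal Python sketch computes the two percentages for a rule of this form from a small, entirely hypothetical table of country records; the field names and data are invented for illustration, not drawn from any real mining run.

```python
# Minimal sketch (not from the paper): support and confidence of a rule of the form
#   receives(X, western_aid) -> is_highly_corrupt(X)
# computed over a small, made-up table of country records.

from dataclasses import dataclass

@dataclass
class CountryRecord:
    name: str
    receives_western_aid: bool
    highly_corrupt: bool

def rule_support_confidence(records):
    """Return (support, confidence) for the rule antecedent -> consequent."""
    total = len(records)
    antecedent = [r for r in records if r.receives_western_aid]
    both = [r for r in antecedent if r.highly_corrupt]
    support = len(both) / total if total else 0.0                      # share of all records matching both sides
    confidence = len(both) / len(antecedent) if antecedent else 0.0    # share of antecedent matches that also match the consequent
    return support, confidence

if __name__ == "__main__":
    sample = [  # entirely illustrative data
        CountryRecord("A", True, True),
        CountryRecord("B", True, False),
        CountryRecord("C", False, False),
        CountryRecord("D", True, True),
    ]
    s, c = rule_support_confidence(sample)
    print(f"support = {s:.2%}, confidence = {c:.2%}")
```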
Visualizing large spatial data sets became an important issue due to the rapidly growing volume of spatial data sets, which makes it difficult for a human to browse such data sets. Shekhar et al. (2002) have constructed a web-based visualization software package for observing the summarization of spatial patterns and temporal trends. The visualization software will help users gain insight and enhance their understanding of the large data.
The need for mining spatial databases to learn and teach about Africa is evident in the launching of the Spatial Data Infrastructure-Africa (SDI-Africa), a newsletter geared toward people interested in geographic information systems (GIS), remote sensing, and data management issues in Africa. The newsletter seeks to raise awareness and provide useful information to strengthen SDI initiatives in the various African countries and aid the synchronization of regional activities (SDI-Africa, 2007, p. 1).
Also, there is the GIS Lounge, which hosts both commercial and free data sources covering the African continent. It lists cadastral, environmental, and cultural resources such as the Africa Data Dissemination Service, Africover, the Safari 2000 Project, SAHIMS GIS Data, Geographic Data for Africa, the National Land Cover Data Set, and sources for local land use/land cover GIS data (GIS Lounge, 2012, p. 1).
In addition, there is the UN-Water/Africa spatial data initiative. The initiative, which was formerly called the Interagency Group for Water in Africa, comprises many United Nations agencies. The initiative was launched in 1992 to coordinate and harmonize water activities in Africa. Members of the initiative meet routinely to review progress, exchange information, and plan follow-up activities (UN-Water/Africa, 2012, p. 1).
Furthermore, in 1996, the South African government developed the Spatial Development Initiative (SDI) methodology as an integrated planning tool aimed at promoting investment in the various parts of the country that were underdeveloped but had potential for growth. The methodology is geared toward a process whereby the public sector develops or facilitates conditions suitable for private sector investment and Public-Private-Community Partnerships. The initiative is driven by regional economic development and integration imperatives made pressing by globalization (Maputo Corridor Logistics Initiative, 2012, p. 1).
As Christian Rogerson shows in his paper titled “Spatial Development Initiatives in Southern Africa: The Maputo Development Corridor,” SDI initiatives are becoming a critical aspect of planning for reconstruction in post-apartheid South and Southern Africa. According to Rogerson, the initiatives mark a fundamental break with the trajectories and initiatives for economic spatial planning of the apartheid regimes. He examines the record and development impact of the SDI initiatives through the lens of the best-known example, the Maputo Development Corridor or Maputo SDI. He finds that the cross-border activities of this particular SDI make it an important case study in understanding the recent shifts toward a greater regional Southern African economy. He therefore argues that the Maputo SDI represents one illustration of the construction or configuration of a new regionalism in Southern Africa's development (Rogerson, 2001, p. 324).
Finally, in their paper titled “A Review of the Status of Spatial Data Infrastructure Implementation in Africa” (2010), Prestige Makanga and Julian Smit point out that while governments across the globe are realizing the value of National Spatial Data Infrastructures (NSDIs) and are making major investments in launching NSDIs, African governments are doing so at a snail's pace. The authors then present an assessment of the status of NSDI activities in 29 African countries representing all five regions of the continent. They show that, generally, formal NSDI activities in most of the countries surveyed are still in their infancy. They then recommend possible steps that can be undertaken to foster SDI implementation on the continent and highlight potential areas for further SDI research.
4.2 Mining text databases
Text databases consist of large collections of documents from various sources, such as news articles, research papers, books, digital libraries, e-mail messages, and web pages. Text databases are rapidly growing due to the increasing amount of information available in electronic forms, such as electronic publications, e-mail, CD-ROMs, and the World Wide Web (which also can be considered as a huge interconnected dynamic text and multimedia database).
Data stored in most text databases are semi-structured data in that they are neither completely unstructured nor completely structured. For example, a document may contain a few structured fields, such as a title, author's name(s), publication date, length, category, etc., and also contain some largely unstructured text components, such as an abstract and contents.
Traditional information retrieval techniques have become inadequate for the increasingly vast amounts of text data (Han and Kamber, 2001). Typically, only a small fraction of the many available documents will be relevant to a given individual user. Without knowing what could be in the documents, it is difficult to formulate effective queries for extracting and analyzing useful information from the data. Users need tools to compare different documents, rank the importance and relevance of the documents, or find patterns and trends across multiple documents. Thus, text mining has become an increasingly popular and essential theme in data mining.
Information retrieval is a field that has been developing in parallel with database systems for many years. Unlike the field of database systems, however, which has focussed on query and transaction processing of structured data, information retrieval is concerned with the organization and retrieval of information from a large number of text-based documents. A typical information retrieval problem is to locate relevant documents based on user input, such as keywords or example documents. This type of information retrieval system includes online library catalog systems and online document management systems (Berry et al., 1999).
It is vital to know how accurate or correct a text retrieval system is in retrieving documents based on a query. The set of documents relevant to a query can be called “{Relevant},” whereas the set of documents retrieved is denoted as “{Retrieved}.” The set of documents that are both relevant and retrieved is denoted as “{Relevant} ∩ {Retrieved}.” There are two basic measures for assessing the quality of a retrieval system: precision and recall (Berry et al., 1999).
The precision of a system is the ratio of the number of relevant documents retrieved to the total number of documents retrieved. In other words, it is the percentage of retrieved documents that are in fact relevant to the query – i.e. the correct response. Precision can be represented as follows:
precision = |{Relevant} ∩ {Retrieved}| / |{Retrieved}|
The recall of a system is the ratio of the number of relevant documents retrieved to the total number of relevant documents in the collection. Stated differently, it is the percentage of documents that are relevant to the query and were retrieved. Recall can be represented the following way:
recall = |{Relevant} ∩ {Retrieved}| / |{Relevant}|
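As an illustration of these two measures, the short Python sketch below computes precision and recall for a single query from the {Relevant} and {Retrieved} sets defined above; the document identifiers are hypothetical.

```python
# Minimal sketch: precision and recall for one query, following the
# {Relevant} and {Retrieved} set definitions above. Document IDs are made up.

def precision_recall(relevant: set, retrieved: set) -> tuple[float, float]:
    overlap = relevant & retrieved                                  # {Relevant} ∩ {Retrieved}
    precision = len(overlap) / len(retrieved) if retrieved else 0.0
    recall = len(overlap) / len(relevant) if relevant else 0.0
    return precision, recall

relevant = {"doc1", "doc4", "doc7", "doc9"}    # documents actually relevant to the query
retrieved = {"doc1", "doc2", "doc4", "doc5"}   # documents the system returned
p, r = precision_recall(relevant, retrieved)
print(f"precision = {p:.2f}, recall = {r:.2f}")  # precision = 0.50, recall = 0.50
```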
4.3 Mining remote sensing data
The data volumes of remote sensing are rapidly growing. National Aeronautics and Space Administration's (NASA) Earth Observing System program alone produces massive data products at total rates of more than 1.5 terabytes per day (King and Greenstone, 1999). Applications and products of Earth observing and remote sensing technologies have been shown to be crucial to global social, economic, and environmental well-being (Yang et al., 2001).
Several information systems have been developed for data ordering purposes to help scientists search massive remotely sensed databases, find data of interest to them, and then order the selected data sets or subsets. These systems face the challenges of the rapidly growing volumes of data, since the traditional method, in which a user downloads data and uses local tools to study the data residing on a local storage system, is no longer adequate. To find interesting data, scientists need an effective and efficient way to search through the data. Metadata are provided in a database to support data searching by commonly used criteria such as spatial coverage, temporal coverage, spatial resolution, and temporal resolution.
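A minimal sketch of such metadata-driven searching is given below, assuming a toy catalog whose records carry the coverage criteria just listed (bounding box, temporal coverage, and spatial resolution); the field names and values are illustrative assumptions only, not any particular system's schema.

```python
# Illustrative sketch only: filtering remote sensing data set descriptions by
# spatial and temporal coverage, the kind of metadata search described above.
# The metadata fields and values are hypothetical.

from datetime import date

catalog = [
    {"id": "scene-001", "bbox": (-10.0, 5.0, 10.0, 20.0),   # (min_lon, min_lat, max_lon, max_lat)
     "start": date(2011, 1, 1), "end": date(2011, 1, 31), "resolution_m": 30},
    {"id": "scene-002", "bbox": (25.0, -35.0, 35.0, -25.0),
     "start": date(2011, 6, 1), "end": date(2011, 6, 30), "resolution_m": 250},
]

def covers(bbox, lon, lat):
    """True if the bounding box contains the point of interest."""
    min_lon, min_lat, max_lon, max_lat = bbox
    return min_lon <= lon <= max_lon and min_lat <= lat <= max_lat

def search(catalog, lon, lat, start, end, max_resolution_m):
    """Return data sets whose spatial coverage contains the point, whose temporal
    coverage overlaps [start, end], and whose resolution is at least as fine as requested."""
    return [m for m in catalog
            if covers(m["bbox"], lon, lat)
            and m["start"] <= end and m["end"] >= start
            and m["resolution_m"] <= max_resolution_m]

print(search(catalog, lon=0.0, lat=12.0,
             start=date(2011, 1, 10), end=date(2011, 2, 10),
             max_resolution_m=100))   # -> matches scene-001 only
```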
The need for mining remote sensing data is captured by a group of scholars who collaborated on a book titled Recent Advances in Remote Sensing and GIS in Sub-Sahara Africa (2012) edited by Courage Kamusoko et al. The contributors demonstrate that the use of remote sensing data and analytical techniques such as GIS is imperative for the study of an extensive area because topographic maps at a scale of 1:50,000 or larger are not available for detailed mapping on the ground. They also show that basic socio-economic and physical data, such as census data, environmental data, and infrastructure data, are either lacking for many African countries or not kept updated for modeling analyses.
The United States Agency for International Development (USAID) saw the need for mining remote sensing data pertaining to Africa when it requested that the United States Geological Survey Center for Earth Resources Observation and Science undertake a review of current and potential capabilities at regional remote sensing centers across Africa and to employ remote sensing applications for societal benefit. The USAID also requested an evaluation of the utility and appropriateness of a web-based data, information, and decision support system portal such as a SERVIR model for Africa, just as it is being used for Mesoamerica (United States Geological Survey, 2012).
As Aida Opoku-Mensah, Chairperson of the African Association of Remote Sensing of the Environment, remarked during her welcome address at the organization's eighth annual conference convened in Addis Ababa, Ethiopia from October 25 to 29, 2010, Africa faces major pressing issues dealing with climate change impact, water scarcity, energy shortage, environmental stresses, and food crises that affect citizens, business, and the community at large. She noted that efforts were being made by African countries to work out coordinated strategies and policies on matters dealing with the environment and sustainable development. She also emphasized that, among all these challenges, the constitution of coherent, seamless, and up-to-date spatially enabled information is a significant precondition for setting up coordinated policy and strategy (African Association of Remote Sensing of the Environment, 2010).
In South Africa, the use of remote sensing and satellite imaging is coming of age, as they are being employed widely in the country to develop sustainable agricultural and natural resource management tools to confront climate change. Remote sensing is proving to be more successful in South Africa than in other African countries for two reasons. First, South Africa has the required telecommunications network infrastructure to support remote sensing; second, its higher internet capacity means that information can be transmitted consistently and in a timely manner (Integrated Regional Information Networks, 2012).
4.4 Mining astronomical data
Astronomy has become an immensely data-rich field, with numerous digital sky surveys across a range of wavelengths and many terabytes of pixels and billions of detected sources, often with tens of measured parameters for each object. The problem with astronomical databases is not only their very large size; the variable quality of the data and the nature of astronomical objects, with their very wide dynamic range in apparent luminosity and size, present additional challenges. These great changes in astronomical data enable scientists to map the universe systematically and in a panchromatic manner. Scientists can study the galaxy and the large-scale structure in the universe statistically and discover unusual or new types of astronomical objects and phenomena (Brunner et al., 2002).
The imperative for mining astronomical data for learning and teaching about Africa is captured by the South African Virtual Observatory, which on November 24, 2011, on behalf of the South African Astronomical Observatory and the Center for High Performance Computing, announced the release of the Smithsonian Astrophysical Observatory/NASA Astrophysics Data System mirror in Cape Town, South Africa. The mirror site (defined as an exact copy of another remote site on the World Wide Web) is the first of its kind hosting an important astrophysics data archive in Africa (South African Virtual Observatory, 2011).
In fact, as the African Astronomical Society (AAS) noted in 2010, the call for a Pan-African professional society of astronomers to mine astronomical data went back several years before that. According to the organization, Peter Martinez of South Africa and Pius Okeke of Nigeria published articles on ways to develop astronomy in Africa, and the latter called for the formation of a Pan-African style AAS. Regional professional astronomical societies had been formed in both West Africa and East Africa, North Africans had organized professional astronomy organizations, and South Africa had been engaged in the astronomy field for a long time. But it was at the 2010 launch of the African Physical Society in Dakar, Senegal that a number of astronomers from across Africa and its Diaspora decided to form the AAS. Following the meeting, Okeke wrote a whitepaper on the formation and the structure of the AAS that was widely disseminated amongst African astronomers (STEMconnector, 2010).
4.5 Mining bioinformatics data
Bioinformatics is described by Cannataro et al. (2004) as a bridge between the life sciences and computer science. It has also been described by Barker and Thornton (2004) as a cross-disciplinary field in which biologists, computer scientists, chemists, and mathematicians work together, each bringing a unique point of view. The term bioinformatics has a range of interpretations, but the core activities of bioinformatics are widely acknowledged: storage, organization, retrieval, and analysis of biological data obtained by experiments or by querying databases.
The increasing volume of biological data collected in recent years has prompted increasing demand for bioinformatics tools for genomic and proteomic data analysis (the proteome being the set of proteins encoded by the genome; proteomics seeks to define models for representing and analyzing the structure of the proteins contained in each cell).
Mining bioinformatics data to learn and teach about Africa is a need that is very well known to the Human Heredity and Health in Africa (H3Africa) project. As the first ever Pan-African research program on disease and DNA, H3Africa was birthed in Ethiopia in 2011 by scientists from across the continent. The project seeks to mine bioinformatics data to help unravel how Africa's genes deal with illnesses such as tuberculosis, heart disease, and sleeping sickness (South African Bioinformatics Institute, 2011).
There also is the African Society for Bioinformatics and Computational Biology (ASBCB), a non-profit professional organization that seeks to advance bioinformatics and computational biology on the continent. It serves as an international forum and resource for the development of competence and expertise in the field. Through liaison and cooperation with other similar international organizations, ASBCB promotes the African standing in bioinformatics and computational biology in the world (African Society for Bioinformatics and Computational Biology, 2012).
Non-African institutions and laboratories have also shown a keen interest in the development of bioinformatics programs across Africa. For example, in 2005, the Conference on the Bioinformatics of African Pathogens and Disease Vectors convened in Nairobi, Kenya attracted five French institutions and laboratories. Participants from Africa came from institutions and laboratories in Burkina Faso, Ivory Coast, Kenya, Mali, Morocco, Nigeria, South Africa, The Sudan, and Tunisia (Lefort, 2005).
5. Research methodology
The following is a discussion of the proposed approach for mining massive data sets for studying Africa from an African-centric perspective. It depends on the METANET concept: a heterogeneous collection of scientific databases envisioned as a national and international digital data library which would be available via the internet. We consider a heterogeneous collection of massive databases such as remote sensing data and text data. The discussion is divided into two separate, but interrelated, subsections: the automated generation of metadata and the query and search of the metadata.
Through METANET, Data Documentation Initiative (DDI) and OpenSurvey methodologies can be used to collect data in areas where the technological infrastructure is less developed and less consistent. DDI allows researchers to use XML-based tools, built on open standards, to access extensive machine-readable textual descriptions of past surveys and make them more readily available. OpenSurvey will make it possible for researchers to use survey software and open source software to generate data. The common tools the researcher can use through OpenSurvey include AskML, an XML-based metadata standard for survey instruments, and TabsML, a standard used to produce and access crosstab reports.
Also, following Allert et al. (2004), humanistic approaches such as ubuntugogy can provide unique benefits across a region. This is because these approaches allow a researcher to treat self-reflexive subjects, their contexts, and their personalities as part of the analysis.
5.1 Automated generation of metadata
In general, it is assumed there are metadata that describe file and variable type and organization, but that have minimal information on the scientific content of the data. For example, the 2010 National Science Foundation (NSF) report states that common data formats are necessary for the wider accessibility of survey data. Future data collections by the American National Election Studies (ANES), the General Social Survey (GSS), and the Panel Study of Income Dynamics (PSID) must meet common standards for machine readability (National Science Foundation (NSF), 2010, p. 10). The 2007 NSF report describes the current modes of dissemination as confusing and badly outdated (National Science Foundation (NSF), 2007, p. 7).

In their raw form, a data set and its metadata have minimal usability. For example, a satellite-based remote sensing platform will produce thousands of image data sets in the same file form based on the same instruments over the same geographic regions. However, only the image data sets with certain patterns in the image will be of interest to the scientist. Without additional metadata about the content, the social scientist would have to scan all of these images, a daunting prospect for terabyte data sets. This challenge is noted in the NSF 2010 report: “Some data sets lend themselves to broad distribution to the media, policy analysts, and the lay public. These data sets tend to focus on the dissemination of aggregate-level data or completely anonymized public-use data” (NSF, 2010, p. 12).

Thus a strategy for making the data usable is to link the data set to digital objects that are used to index the data set. The search operation for a particular structure in a data set then becomes a simple indexing operation on the digital objects linked to the data set. The idea is to link digital objects with scientific meaning to the data set at hand – in this case, the ANES, GSS, and PSID. The digital objects become part of the searchable metadata associated with the data set. It should be said that the goal of creating digital objects reflecting the scientific content of the data is not to replace the judgment of the scientist, but to narrow the scope of the data sets that the scientist must consider. It is quite possible that some of the patterns found by the automated methods will be inappropriate. Metadata in and of itself is of limited use to scientists; however, when combined with digital objects, metadata becomes an invaluable tool.
The key element is to automate the process of creating digital objects with scientific meaning to be linked to the data set. The digital objects will essentially be named patterns found in the data sets. The concept is to have a background process, launched either by the database owner or, more likely, via an applet created by the virtual data center (e.g. a VDADC), examining databases available on the data-web and searching within data sets for recognizable patterns. When a pattern is found in a particular data set, the digital object corresponding to that pattern is made part of the metadata associated with that data set. Pointers would also be added to that metadata, pointing to the metadata associated with other distributed databases containing the same pattern.
This metadata will be located in the virtual data center, and through this metadata, distributed databases will be linked. This linking is to be done on the fly as data are accumulated in the database. On existing databases, the background process would run as compute cycles become available. Because the database is dynamic, the background process would always be running, adding metadata dynamically. In fact, the PSID database is itself a dynamic metadata resource (NSF, 2007, p. 10). It permits the user to construct dynamic codebooks that reflect the actual data downloaded and meet data management standards likely to remain applicable well into the future (NSF, 2010, p. 13).
In order to link photos and metadata, each photo will be associated with patterns in the data set. These named patterns will be coupled with a photo, and these data will be shared and updated from the virtual data center.
Patterns to be searched for are to be generated by at least three different methods: empirical or statistical patterns, model-based patterns, and patterns found by clustering algorithms. By empirical or statistical patterns in the data, we mean patterns that have been observed over a long period of time and that may be thought to have some underlying statistical structure. Such a pattern might be speculative, and the social scientist would like to have additional verification of it. Certain weather patterns, such as hurricanes in late summers in the subtropical zones, or certain protein patterns in DNA sequencing are examples of empirical or statistical patterns. The NSF (2010) report gives an example: the Social Explorer provides a model for giving lay users access to summary analyses and to a tool to create their own data visualizations, especially empirical patterns with geospatial features that are easily represented in maps (NSF, 2010, p. 15). The NSF 2007 report shows that a remarkable number of time series tapping numerous aspects of Americans' attitudes and behavior patterns have accumulated during the last 30 years, and this treasure trove of data has been mined by thousands of scholars who have produced thousands of publications as a result (NSF, 2007, p. 50).

Model-based patterns are clearly predictive and would be of interest if verified in real data. Statistical, empirical, and model-based patterns all originate with the scientists and have some intellectual imperative behind them. The patterns found by clustering methods, by contrast, are found by purely automated techniques and may or may not have scientific significance. The idea is to flag for the social scientist unusual patterns that bear further investigation. Statistical clustering methods have received considerable attention, and extremely effective recursive, non-parametric methods might be employed to accomplish this task. For instance, as pointed out in the NSF 2010 report, the clustered sample design led to the need to help researchers correct for design effects in their estimation (NSF, 2010, p. 39). In sum, patterns will be identified using empirical or statistical patterns, model-based patterns, and patterns found by clustering algorithms.
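As a rough illustration of the third, purely automated route, the sketch below clusters simple per-image statistics and attaches the resulting cluster label to each data set's metadata as a named pattern. It assumes numpy and scikit-learn are available; the feature choice, record structure, and pattern names are our own illustrative assumptions rather than the system described above.

```python
# A minimal, hypothetical sketch of clustering-based metadata generation:
# cluster crude image statistics and record the cluster id as a "pattern"
# (digital object) tag in each data set's metadata record.

import numpy as np
from sklearn.cluster import KMeans

def image_features(image: np.ndarray) -> np.ndarray:
    """Crude per-image features: mean intensity and intensity variance."""
    return np.array([image.mean(), image.var()])

def tag_with_patterns(images, metadata_records, n_patterns=3, seed=0):
    """Cluster image features and append the cluster id as a pattern tag
    to each image's metadata record."""
    features = np.vstack([image_features(img) for img in images])
    labels = KMeans(n_clusters=n_patterns, n_init=10, random_state=seed).fit_predict(features)
    for record, label in zip(metadata_records, labels):
        record.setdefault("patterns", []).append(f"cluster-{label}")
    return metadata_records

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic "images" drawn from three intensity regimes, purely for illustration.
    images = [rng.normal(loc=mu, scale=1.0, size=(64, 64)) for mu in (0, 0, 5, 5, 10, 10)]
    records = [{"id": f"img-{i}"} for i in range(len(images))]
    for rec in tag_with_patterns(images, records):
        print(rec["id"], rec["patterns"])
```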
5.2 Query and search
The idea of the automated creation of metadata is to develop metadata that reflects the scientific content of the data sets within the database rather than just data structure information. The locus of the metadata is the virtual data center. The end user would see only the virtual data center. The original metadata, resident in the actual data centers, would be reproduced in the virtual center. However, that original metadata would be augmented by metadata collected by the automated creation procedures mentioned above, by pointers used to link related data sets in distributed databases, and by metadata collected in the process of interacting with system users.
The end user will see the metadata in the virtual data center, but the data will be entered from several actual data centers.
The general desideratum for the scientist is to be able to start with a comparatively vague question that can be sharpened as the scientist interacts with the system. For example, “give me data about pollution in the Chesapeake Bay” might be an initial query that would possibly be sharpened to something like “give me data about nitrate and ammonia concentrations in the Chesapeake Bay within 8 miles of the entry of waters from the Potomac River into the Bay.” For example, in the NSF 2007 report, the content of the core and how it has evolved over time should be made clearer to all potential users of the GSS. The GSS should sketch out the core set of items, but the core should be allowed to evolve over time via the interaction of the GSS Board, GSS PIs, and the user community (NSF, 2007, p. 6). Clearly, even the second query is comparatively vague. It may be that data are accessible from several distributed databases for this type of query through the following logic. Nitrates and ammonia support algae growth, which responds to infrared. Therefore, an image data set in visible light available on one database may be compared with an image data set taken in infrared by a different instrument and available on a second database in order to detect a high infrared-to-visible light intensity ratio. This ratio is indicative of robust algae growth, which in turn indicates excess nitrate and ammonia concentrations. This aspect is important because, as stated in the NSF 2010 report, the ANES, GSS, and PSID data are distributed to researchers worldwide and via several web sites (NSF, 2010, pp. 56, 87). This excess ratio would be a statistical or possibly model-driven pattern, already established by the automated generation of metadata mechanism discussed earlier. Thus the retrieval process consists not only of a browser mechanism for requesting data when the user has a precise query, but should also support an expert system query capability that will help the scientist reformulate a vague question in a form that may be submitted more precisely.
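The following sketch illustrates, under stated assumptions, the kind of cross-database pattern check just described: given a visible-light image from one database and a co-registered infrared image from another, it flags pixels whose infrared-to-visible ratio exceeds a threshold. The threshold, the synthetic images, and the assumption of co-registration are all hypothetical.

```python
# Hedged sketch of the ratio pattern described above: flag regions whose
# infrared-to-visible intensity ratio is high (a proxy for algae growth).
# Arrays, threshold value, and registration are assumed for illustration.

import numpy as np

def high_ir_ratio_mask(visible: np.ndarray, infrared: np.ndarray,
                       ratio_threshold: float = 2.0) -> np.ndarray:
    """Return a boolean mask of pixels where infrared / visible exceeds the threshold.
    Both images are assumed to be co-registered and on comparable intensity scales."""
    eps = 1e-6                          # avoid division by zero
    ratio = infrared / (visible + eps)
    return ratio > ratio_threshold

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    visible = rng.uniform(0.1, 1.0, size=(100, 100))                 # synthetic visible-light scene
    infrared = visible * rng.uniform(0.5, 3.0, size=(100, 100))      # synthetic infrared scene
    mask = high_ir_ratio_mask(visible, infrared)
    print(f"{mask.mean():.1%} of pixels flagged as possible algae growth")
```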
Query and search would contain four major elements:
(1) client browser;
(2) expert system for query refinement;
(3) search engine; and
(4) reporting mechanism.
The first and last are relatively straightforward.
The system will have a browser that will enable the end user to request precise information. This system should also aid the end user in reformulating vague requests. It is expected that, after some time, end users will depend less and less on the aid or help function.
5.2.1 Client browser
The client browser would be a piece of software running on the scientist's client machine. The client machine is likely to be a PC or a workstation. This component is straightforward. The idea is to have a GUI interface that would allow the user to interact with a more powerful server in the virtual data center. The client software is essentially analogous to the myriad of browsers available for the World Wide Web.
The browser will be software utilized on the end users' different platforms and operating systems: PCs, UNIX, Linux, etc.
5.2.2 Expert system for query refinement
There are two basic scenarios for the interaction of the scientist with the server: first, the scientist knows precisely the location and type of data he desires, and second, he knows generally the type of question he would like to ask, but has little information about the nature of the databases with which he hopes to interact. The first scenario is comparatively straightforward, but the expert system would still be employed to keep a record of the nature of the query. The idea is to use the queries as a tool in the refinement of the search process. The second scenario, however, is the more complex. The approach is to match a vague query formulated by the scientist to one or more of the digital objects discovered in the automated-generation-of-metadata phase. The expert system would initially be given rules devised by discipline experts for performing this match. Given an inquiry, the expert system would attempt to match the query to one or more digital objects (patterns). It would provide the scientist with an opportunity to confirm the match or to refine the query. This interplay would continue until the scientist is satisfied with the proposed matches. The expert system would then engage the search engine in order to synthesize the appropriate data sets. The expert system would also take advantage of the interaction with the scientist to form a new rule for matching the original query to the digital objects developed in the refinement process.
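A toy sketch of the first step of this match-confirm-refine loop is shown below: a vague query is matched to named patterns (digital objects) by simple keyword overlap, and a sharpened reformulation scores more strongly. The pattern names and descriptions are invented; a real expert system would use discipline-expert rules rather than raw word overlap.

```python
# A toy sketch (not the authors' system) of matching a vague query to named
# patterns by keyword overlap. Pattern names and descriptions are hypothetical.

PATTERNS = {
    "algae-bloom": "high infrared to visible ratio indicating algae growth nitrate ammonia",
    "drought-stress": "low vegetation index prolonged dry season rainfall deficit",
    "urban-growth": "increase in built-up area night lights expansion",
}

def match_patterns(query: str, patterns=PATTERNS):
    """Rank patterns by the number of query words appearing in their descriptions."""
    words = set(query.lower().split())
    scores = {name: len(words & set(desc.split())) for name, desc in patterns.items()}
    return sorted(((score, name) for name, score in scores.items() if score > 0), reverse=True)

print(match_patterns("data about pollution nitrate ammonia in the bay"))
# A sharper reformulation scores the same pattern more strongly:
print(match_patterns("nitrate ammonia concentrations algae growth infrared"))
```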
If a query is not specific, the query will be matched to a pattern of data. The end user can accept or reject the data or refine the search. For example, the specific life course points identified in the NSF 2010 report require a specific query (NSF, 2010, p. 59). Also, specific queries are needed to address the needs of policymakers and those who are browsing for new data sources, seeking summary analytic information, or wanting to download specific variables quickly (NSF, 2007, p. 45).
Thus there are two aspects: one is the refinement of the precision of an individual search and the other is the refinement of the search process. Both aspects have the same goal; one is tactical and the other is strategic. The refinement would be greatly aided by the active involvement of the scientist. The scientist would be informed how his particular query was resolved; this allows him to reformulate the query efficiently. The log files of these iterative queries would be processed automatically to inspect the query trees and possibly improve their structure. The end user is informed how the query problem was resolved, thus teaching him how a proper query is made.
Also there are two other considerations of interest. First, other experts not necessarily associated with ANES, GSS, and PSID may have examined certain data sets and have commentary in either informal annotations or in the refereed scientific literature. These commentaries should form part of the metadata associated with the data set. Part of the expert system should provide an annotation mechanism that allows users to attach commentary or library references (particularly digital library references) as metadata. Obviously, such annotations may be self-serving and potentially unreliable. However, the idea is to alert the social scientist to information that may be of use. User derived metadata would be considered secondary metadata.
The other consideration is to provide a mechanism for indicating reliability of data. This would be attached to a data set as metadata, but may in fact be derived from the original metadata. For example, a particular data collection instrument may be known to have a high variability. Thus any data set that is collected by this instrument, no matter where in the database it occurs, should have as part of the attached metadata an appropriate caveat. Thus the concept of automated collection of metadata should have a capability to not only examine the basic data for patterns, but also examine the metadata itself and based on collateral information such as just mentioned, be able to generate additional metadata.
5.2.3 Search engine
As indicated above, large-scale scientific information systems will likely be distributed in nature and contain not only the basic data but also both structured metadata (e.g. sensor type, sensor number, measurement date) and unstructured metadata (e.g. a text-based description of the data). In fact, the ANES project is structured so that it outlasts any particular project directors.
Thus, comprehensive project documentation is available to access and evaluate the methodology (NSF, 2010, p. 65). These systems will typically have multiple main repository sites that together will house a major portion of the data as well as some smaller sites, virtual data centers, containing the remainder of the data. Clearly, given the volume of the data, particularly within the main servers, high performance engines that integrate the processing of the structured and unstructured data would be required to support desired response rates for user requests.
Both database management systems (DBMS) and information retrieval systems provide some functionality to maintain data. DBMS allow users to store unstructured data as binary large objects (BLOB) and information retrieval systems allow users to enter structured data in zoned fields. However, DBMS offer only a limited query language for values that occur in BLOB attributes. Similarly, information retrieval systems lack robust functionality for zoned fields. Additionally, information retrieval systems traditionally lack efficient parallel algorithms. Using a relational database approach to information retrieval allows for parallel processing since almost all commercially available parallel engines support some relational DBMS. An inverted index may be modeled as a relation. This treats information retrieval as an application of a DBMS. Using this approach, it is possible to implement a variety of information retrieval functionality and achieve good run-time performance. Users can issue complex queries including both structured data and text.
The key hypothesis is that the use of a relational DBMS to model an inverted index will allow users to query both structured data and text via standard SQL. This approach lets users employ any relational DBMS that supports standard SQL; it allows the implementation of traditional information retrieval functionality such as Boolean retrieval, proximity searches, and relevance ranking, as well as non-traditional approaches based on data fusion and machine learning techniques; and it takes advantage of current parallel DBMS implementations so that acceptable run-time performance can be obtained by increasing the number of processors applied to the problem. One of the future challenges for data access and dissemination pointed out in the NSF 2010 report is the issue of multi-level data sets. Another issue is the addition of more contextual data (NSF, 2010, p. 78).
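Returning to the key hypothesis, the sketch below illustrates it on a toy scale: an inverted index is stored as an ordinary relation alongside a structured document table, and a single standard SQL statement combines a text condition with structured conditions. It uses an in-memory SQLite database, and the table and column names, documents, and terms are illustrative assumptions.

```python
# Minimal sketch of modeling an inverted index as a relation so that text and
# structured metadata can be queried together with standard SQL. Schema and
# sample rows are hypothetical; SQLite is used only for illustration.

import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
CREATE TABLE documents (doc_id INTEGER PRIMARY KEY, title TEXT, year INTEGER, region TEXT);
CREATE TABLE postings  (term TEXT, doc_id INTEGER, term_freq INTEGER);  -- inverted index as a relation
""")
cur.executemany("INSERT INTO documents VALUES (?, ?, ?, ?)", [
    (1, "Remote sensing of the Sahel", 2010, "West Africa"),
    (2, "Census microdata quality",    2008, "East Africa"),
    (3, "Sahel rainfall time series",  2012, "West Africa"),
])
cur.executemany("INSERT INTO postings VALUES (?, ?, ?)", [
    ("sahel", 1, 4), ("rainfall", 1, 1), ("census", 2, 6), ("sahel", 3, 2), ("rainfall", 3, 5),
])

# Combined structured + text query: documents mentioning "sahel", from West Africa,
# published in 2010 or later, ranked by term frequency.
cur.execute("""
SELECT d.title, p.term_freq
FROM postings p JOIN documents d ON d.doc_id = p.doc_id
WHERE p.term = 'sahel' AND d.region = 'West Africa' AND d.year >= 2010
ORDER BY p.term_freq DESC
""")
print(cur.fetchall())   # [('Remote sensing of the Sahel', 4), ('Sahel rainfall time series', 2)]
conn.close()
```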
5.2.4 Reporting mechanism
The basic idea is not only to retrieve data sets appropriate to the needs of the scientist, but also to scale down the potentially large databases the scientist must consider. That is, the scientist would consider megabytes instead of terabytes of data. The search and retrieval process may still result in a massive amount of data. The reporting mechanism would thus initially report the nature and magnitude of the data sets to be retrieved. If the scientist agrees that the scale is appropriate to his/her needs, the data will be delivered via FTP or a similar mechanism to his/her local client machine or to another server where he/she wants the synthesized data to be stored.
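A minimal sketch of such a reporting step follows: the candidate result set is first summarized (number of data sets and total size), and delivery is triggered only after the scientist confirms that the scale is acceptable. The record structure and the delivery stub are assumptions; an operational system would hand the file list to an FTP or similar transfer service.

```python
# Sketch of the reporting step: summarize the magnitude of a candidate result
# set before any bytes are shipped to the scientist.
# The record structure and the delivery stub are illustrative assumptions.

def summarize(result_sets):
    """Report the nature and magnitude of the data sets to be retrieved."""
    total_bytes = sum(r["size_bytes"] for r in result_sets)
    print(f"{len(result_sets)} data set(s), {total_bytes / 1e6:.1f} MB in total:")
    for r in result_sets:
        print(f"  - {r['name']}: {r['size_bytes'] / 1e6:.1f} MB ({r['kind']})")
    return total_bytes


def deliver(result_sets, destination):
    """Placeholder for the actual transfer (FTP or a similar mechanism)."""
    for r in result_sets:
        print(f"transferring {r['name']} to {destination} ...")


if __name__ == "__main__":
    candidates = [
        {"name": "ndvi_2013_sahel.nc", "size_bytes": 48_000_000, "kind": "remote sensing"},
        {"name": "rainfall_stations.csv", "size_bytes": 2_500_000, "kind": "in situ"},
    ]
    summarize(candidates)
    if input("Proceed with delivery? [y/N] ").strip().lower() == "y":
        deliver(candidates, "ftp://example.org/incoming")
```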
6. Implementation and lessons from relevant work
To help scientists search massive databases and find data of interest to them, a good information system should be developed for data-ordering purposes. The system should perform searches on the descriptive information of the scientific data sets, or metadata, such as the main purpose of the data sets, the spatial and temporal coverage, the production time, the quality of the data sets, and the main features of the data sets.
Scientists want to have an idea of what the data look like before ordering them, since metadata searching alone cannot meet all scientific queries. Thus, content-based searching or browsing and preliminary analysis of data based on their actual values will be inevitable in these application contexts. One of the most common content-based queries is to find large enough spatial regions over which the geophysical parameter values fall into certain intervals given a specific observation time. The query result could be used for ordering data as well as for defining features associated with scientific concepts.
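A minimal sketch of this content-based query, assuming the data for one observation time are held as a two-dimensional grid of geophysical parameter values: cells falling inside the interval are masked, connected regions are labelled, and only regions above a minimum size are returned. The grid, interval, and size threshold are illustrative.

```python
import numpy as np
from scipy import ndimage

# Sketch of a content-based query: find sufficiently large spatial regions
# over which a geophysical parameter falls inside a given interval at one
# observation time.  The grid, interval, and size threshold are illustrative.

def large_regions_in_interval(grid, low, high, min_cells):
    """Return (labels, ids) for connected regions where low <= value <= high
    and the region covers at least min_cells grid cells."""
    mask = (grid >= low) & (grid <= high)
    labels, n_regions = ndimage.label(mask)     # 4-connected region labelling
    sizes = np.bincount(labels.ravel())         # cells per label (label 0 = background)
    keep = [lab for lab in range(1, n_regions + 1) if sizes[lab] >= min_cells]
    return labels, keep


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    sst = 20 + 10 * rng.random((200, 200))      # toy sea-surface temperature grid
    labels, region_ids = large_regions_in_interval(sst, 27.0, 29.0, min_cells=50)
    print(f"{len(region_ids)} region(s) of at least 50 cells found")
```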
For researchers of African topics to be able to maximize the utility of this content-based query technique, there must exist a web-based prototype through which they can demonstrate the idea of interest. The prototype must deal with different types of massive databases, with special attention being given to the following and other aspects that are unique to Africa:
(1) African languages with words encompassing diacritical marks (dead and alive) (a minimal normalization sketch follows this list);
(2) western colonial languages (dead and alive);
(3) other languages such as Arabic, Russian, Hebrew, Chinese, etc.;
(4) use of desktop software such as Microsoft Word or Corel WordPerfect to type words with diacritical marks and then copy and paste them into internet search lines; and
(5) copying text into online translation sites and translating it into the target language.
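A minimal sketch of the normalization mentioned in item (1): each word is decomposed with Unicode NFD and its combining marks are stripped, so that text pasted from a word processor with diacritics can still match a plain search line. Because diacritics are often meaning-bearing in African orthographies, a realistic index would retain both the original and the folded form; the example word is illustrative.

```python
import unicodedata

# Sketch: diacritic-insensitive matching for search lines.  Words are
# decomposed (Unicode NFD) and combining marks are stripped; a realistic index
# would store both the original and the folded form, since diacritics are
# often meaning-bearing in African orthographies.

def fold_diacritics(text: str) -> str:
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def index_terms(word: str) -> set:
    """Index both the original spelling and its folded form."""
    return {word.casefold(), fold_diacritics(word).casefold()}


if __name__ == "__main__":
    print(fold_diacritics("Yorùbá"))          # -> "Yoruba"
    print("yoruba" in index_terms("Yorùbá"))  # -> True
```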
The following paragraphs provide lessons from relevant work.
The underlying approach must be pluridisciplinary, which involves the use of open and resource-based techniques available in the actual situation. It has, therefore, to draw upon the indigenous knowledge materials available in the locality and make maximum use of them. Indigenous languages are, therefore, at the center of the effective use of this methodology.
What all this suggests is that the researcher must revisit the indigenous techniques that take into consideration the epistemological, cosmological, and methodological challenges. Hence, the researcher must be culture-specific and knowledge-source-specific in his/her orientation. Thus, the process of redefining the boundaries between the different disciplines in our thought process is the same as that of reclaiming, reordering and, in some cases, reconnecting those ways of knowing, which were submerged, subverted, hidden, or driven underground by colonialism and slavery. The research should, therefore, reflect the daily dealings of society and the challenges of the daily lives of the people. Toward this end, at least the following six questions should guide pluridisciplinary research:
(1) How can the research increase indigenous knowledge in the general body of global human development?
(2) How can the research create linkages between the sources of indigenous knowledge and the centers of learning on the continent and the Diaspora?
(3) How can centers of research in the communities ensure that these communities become “research societies”?
(4) How can the research be linked to the production needs of the communities?
(5) How can the research help to ensure that science and technology are generated in relevant ways to address problems of the rural communities where the majority of the people live and that this is done in indigenous languages?
(6) How can the research help to reduce the gap between the elite and the communities from which they come by ensuring that the research results are available to everyone and that such knowledge is drawn from the communities? (for more on this approach, see Bangura, 2005).
The prototype system will allow scientists to make queries against disparate types of databases.
For instance, queries on remote sensing data can focus on the features observed in images. Those features may be environmental or artificial features which consist of points, lines, or areas. Recognizing features is the key to interpretation and information extraction. Images differ in their features, such as tone, shape, size, pattern, texture, shadow, association, etc.
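One way a prototype might expose such features to queries is to record a small feature descriptor per image and filter on it. The sketch below is illustrative only; the catalogue and its fields (tone, texture, point/line/area features) are hypothetical.

```python
# Sketch: filtering an image catalogue by recorded features.  The catalogue
# and its fields are hypothetical, intended only to show how feature-based
# retrieval of remote sensing images could be expressed.

CATALOGUE = [
    {"image_id": "img-001", "tone": "dark", "texture": "regular", "features": {"lines", "areas"}},
    {"image_id": "img-002", "tone": "light", "texture": "mottled", "features": {"points"}},
    {"image_id": "img-003", "tone": "dark", "texture": "regular", "features": {"lines"}},
]


def find_images(catalogue, *, tone=None, texture=None, must_have=frozenset()):
    """Return image ids whose recorded features satisfy all given criteria."""
    hits = []
    for rec in catalogue:
        if tone and rec["tone"] != tone:
            continue
        if texture and rec["texture"] != texture:
            continue
        if not set(must_have) <= rec["features"]:
            continue
        hits.append(rec["image_id"])
    return hits


if __name__ == "__main__":
    # e.g. dark, regularly textured images containing linear features
    print(find_images(CATALOGUE, tone="dark", texture="regular", must_have={"lines"}))
```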
Other features of the images that also should be taken into consideration include the percentage of water, green land, cloud forms, snow, and so on. The prototype system will help scientists to retrieve images that contain different features; the system should be able to handle complex queries. This calls for some knowledge of African fractals, which have been defined as self-similar patterns – i.e. patterns that repeat themselves on an ever-diminishing scale (Bangura, 2000b, p. 7).
As Ron Eglash (1999) has demonstrated, first, traditional African settlements typically show repetition of similar patterns at ever-diminishing scales: circles of circles of circular dwellings, rectangular walls enclosing ever-smaller rectangles, and streets in which broad avenues branch down to tiny footpaths with striking geometric repetition. He easily identified the fractal structure when he compared aerial views of African villages and cities with corresponding fractal graphics simulations. To estimate the fractal dimension of a spatial pattern, Eglash used several different approaches. In the case of Mokoulek, for instance, analyzed from a black-and-white architectural diagram, a two-dimensional version of the ruler-size versus length plot was employed. For the aerial photo of Labbazanga, however, an image in shades of gray, a Fourier transform was used. Nonetheless, according to Eglash, we cannot just assume that African fractals show an understanding of fractal geometry, nor can we dismiss that possibility. Thus, he insisted that we listen to what the designers and users of these structures have to say about it. This is because what may appear to be an unconscious or accidental pattern might actually have an intentional mathematical component.
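Eglash's ruler-size versus length plots and Fourier estimates are not reproduced here, but the closely related box-counting method conveys how a fractal dimension can be estimated from a black-and-white diagram: count the boxes of side s that contain part of the pattern and fit the slope of log N(s) against log(1/s). The binary test pattern below is a toy stand-in.

```python
import numpy as np

# Sketch: a box-counting estimate of fractal dimension for a binary image,
# a standard technique related to (but not identical with) the ruler-size
# versus length plots described above.  The test pattern is a toy stand-in.

def box_count_dimension(image, box_sizes=(2, 4, 8, 16, 32)):
    """Estimate fractal dimension as the slope of log N(s) versus log(1/s)."""
    counts = []
    for s in box_sizes:
        h, w = image.shape
        occupied = 0
        for i in range(0, h, s):
            for j in range(0, w, s):
                if image[i:i + s, j:j + s].any():
                    occupied += 1
        counts.append(occupied)
    slope, _ = np.polyfit(np.log(1.0 / np.array(box_sizes)), np.log(counts), 1)
    return slope


if __name__ == "__main__":
    # Toy pattern: a thin line should yield a dimension close to 1,
    # a filled square a dimension close to 2.
    img = np.zeros((128, 128), dtype=bool)
    img[64, :] = True
    print(f"estimated dimension: {box_count_dimension(img):.2f}")
```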
Second, as Eglash examined African designs and knowledge systems, five essential components (recursion, scaling, self-similarity, infinity, and fractional dimension) kept him on track of what does or does not match fractal geometry. Since scaling and self-similarity are descriptive characteristics, his first step was to look for the properties in African designs. Once he established that theme, he then asked whether or not these concepts had been intentionally applied, and started to look for the other three essential components. He found the clearest illustrations of indigenous self-similar designs in African architecture.
The examples of scaling designs Eglash provided vary greatly in purpose, pattern, and method.
As he explained, while it is not difficult to invent explanations based on unconscious social forces – for example, the flexibility in conforming designs to material surfaces as expressions of social flexibility – he did not believe that any such explanation can account for its diversity. He found that from optimization engineering, to modeling organic life, to mapping between different spatial structures, African artisans had developed a wide range of tools, techniques, and design practices based on the conscious application of scaling geometry. Thus, for example, instead of using the Koch curve to generate the branching fractals used to model the lungs and acacia tree, Eglash used passive lines that are just carried through the iterations without change, in addition to active lines that create a growing tip by the usual recursive replacement.
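The distinction between active and passive lines can be sketched as a simple string-rewriting system; the rule set below is an illustrative assumption, not Eglash's actual branching rule. "A" marks an active growing tip that is recursively replaced at each iteration, while "P" marks a passive line that is carried through unchanged.

```python
# Sketch: recursive replacement with active and passive lines.  'A' is an
# active growing tip, rewritten at every iteration; 'P' is a passive line,
# carried through unchanged.  The rewriting rule is an illustrative assumption.

RULES = {
    "A": "P[A]A",   # an active tip lays down a passive segment and branches
    "P": "P",       # passive lines persist unchanged
}


def iterate(axiom: str, steps: int) -> str:
    s = axiom
    for _ in range(steps):
        s = "".join(RULES.get(ch, ch) for ch in s)
    return s


if __name__ == "__main__":
    for n in range(4):
        print(n, iterate("A", n))
    # 0 A
    # 1 P[A]A
    # 2 P[P[A]A]P[A]A
    # 3 P[P[P[A]A]P[A]A]P[P[A]A]P[A]A
```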
For the text database, the prototype system must consider polysemy and synonymy problems in the queries. Polysemy means a word having multiple meanings: e.g. “order,” “loyalty,” and “ally.” Synonymy means multiple words having the same meaning: e.g. “jungle” and “forest,” “tribe” and “ethnic-group,” “language” and “dialect,” “tradition” and “primitive,” “corruption” and “lobbying.” The collected documents will be placed into categories depending on the documents’ subjects. Scientists can search those documents and retrieve only the ones related to queries of interest. Scientists can search via words or terms, and then retrieve documents in the same category or from different categories as long as they are related to the words or terms in which the scientists are interested.
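A minimal sketch of the synonymy side of this requirement: query terms are expanded through a small synonym table before matching against categorized documents, so that a search for “forest” also retrieves documents indexed under “jungle.” The synonym table and toy documents are illustrative; resolving polysemous words would additionally require sense disambiguation, which is not attempted here.

```python
# Sketch: synonym expansion over a categorized document collection.  The
# synonym table and the toy documents are illustrative; disambiguating
# polysemous words (e.g. "order") would require additional context and is
# not attempted here.

SYNONYMS = {
    "forest": {"forest", "jungle"},
    "jungle": {"forest", "jungle"},
    "ethnic group": {"ethnic group", "tribe"},
    "tribe": {"ethnic group", "tribe"},
    "language": {"language", "dialect"},
    "dialect": {"language", "dialect"},
}

DOCUMENTS = [
    {"doc_id": 1, "category": "environment", "terms": {"jungle", "rainfall"}},
    {"doc_id": 2, "category": "linguistics", "terms": {"dialect", "orthography"}},
    {"doc_id": 3, "category": "environment", "terms": {"savanna", "drought"}},
]


def search(query_terms, category=None):
    """Return ids of documents matching any expanded query term,
    optionally restricted to one category."""
    expanded = set()
    for term in query_terms:
        expanded |= SYNONYMS.get(term, {term})
    return [d["doc_id"] for d in DOCUMENTS
            if expanded & d["terms"]
            and (category is None or d["category"] == category)]


if __name__ == "__main__":
    print(search({"forest"}))                          # finds the "jungle" document
    print(search({"language"}, category="linguistics"))
```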
7. Conclusions and recommendations
The main concept of automated metadata generation is to create a digital object and link it to the data set, making the data usable and, at the same time, making it easy to search for particular structures in the data set. Researchers should consider scalability when working with a massive data set.
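As a minimal illustration of that linkage, the sketch below creates a small digital object (a metadata record carrying an identifier and a checksum of the data set) and writes it as a sidecar file next to the data; the identifier scheme and field names are assumptions made for illustration.

```python
import hashlib
import json
import uuid
from pathlib import Path

# Sketch: creating a digital object (a metadata record with an identifier and
# a checksum) and linking it to a data set as a JSON sidecar file.  The
# identifier scheme and field names are illustrative assumptions.

def create_metadata_object(data_path: Path, description: str) -> Path:
    digest = hashlib.sha256(data_path.read_bytes()).hexdigest()
    record = {
        "identifier": f"urn:uuid:{uuid.uuid4()}",
        "links_to": data_path.name,
        "sha256": digest,
        "description": description,
    }
    sidecar = data_path.parent / (data_path.name + ".meta.json")
    sidecar.write_text(json.dumps(record, indent=2))
    return sidecar


if __name__ == "__main__":
    data = Path("rainfall_2013.csv")
    data.write_text("station,month,mm\nkano,06,112\n")   # toy data set
    print(create_metadata_object(data, "Toy monthly rainfall extract"))
```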
Data mining techniques and visualization must play a pivotal role in retrieving substantive electronic data to study and teach about African phenomena in order to discover unexpected correlations and causal relationships, and understand structures and patterns in massive data. In light of all these possibilities, it is imperative that there be on-the-ground commitment on the part of implementers, as well as university and government authorities, in order to achieve sustainable ICT in Africa. Only through their participation will the internet transform the classroom, change the nature of learning and teaching, and change information seeking, organizing, and using behavior.
Indeed, as Bangura (2005) has suggested, the provision of education in Africa must employ ubuntugogy (which we define as the art and science of teaching and learning undergirded by humanity toward others) to serve as both a given and a task or desideratum for educating students. Ubuntugogy is undoubtedly part and parcel of the cultural heritage of Africans. Nonetheless, it clearly needs to be revitalized in the hearts and minds of some Africans. Although compassion, warmth, understanding, caring, sharing, humanness, etc., are underscored by all the major world orientations, ubuntu serves as a distinctly African rationale for these ways of relating to others. The concept of ubuntu gives a distinctly African meaning to, and a reason or motivation for, a positive attitude toward the other. In light of the calls for an African Renaissance, ubuntugogy urges Africans to be true to their promotion of peaceful relations and conflict resolution, educational and other developmental aspirations. We ought never to falsify the cultural reality (life, art, literature) which is the goal of the student's study. Thus, we would have to oppose all sorts of simplified or supposedly simplified approaches and stress instead the methods which will achieve the best possible access to real life, language, and philosophy.
As Bangura explicates, at least three major tenets of ubuntu can be delineated. The first major tenet of ubuntu rests upon its religiosity. While western humanism tends to underestimate or even deny the importance of religious beliefs, ubuntu, or African Humanism is decidedly religious. For the westerner, the maxim, “A person is a person through other persons,” has no obvious religious connotations. S/he will probably think it is nothing more than a general appeal to treat others with respect and decency. However, in African tradition, this maxim has a deeply religious meaning. The person one is to become “through other persons” is, ultimately, an ancestor. By the same token, these “other persons” include ancestors. Ancestors are extended family. Dying is an ultimate homecoming. Not only must the living and the dead share with and care for one another, but the living and the dead depend on one another (Bangura, 2005).
Bangura points out that this religious tenet is congruent with the daily experience of most Africans. For example, at a calabash, an African ritual that involves drinking of African beer, a little bit of it is poured on the ground for consumption by ancestors. Many Africans also employ ancestors as mediators between them and God. In African societies, there is an inextricable bond between humans, ancestors, and the Supreme Being. Therefore, ubuntu inevitably implies a deep respect and regard for religious beliefs and practices (Bangura, 2005).
The second major tenet of ubuntu, according to Bangura, hinges upon its consensus building. African traditional culture has an almost infinite capacity for the pursuit of consensus and reconciliation. African-style democracy operates in the form of (sometimes extremely lengthy) discussions. Although there may be a hierarchy of importance among the speakers, every person gets an equal chance to speak up until some kind of an agreement, consensus, or group cohesion is reached. This important aim is expressed by words like simunye (“we are one”: i.e. “unity is strength”) and slogans like “an injury to one is an injury to all” (Bangura, 2005).
Bangura notes that the desire to agree within the context of ubuntu safeguards the rights and opinions of individuals and minorities to enforce group solidarity. In essence, ubuntu requires an authentic respect for human/individual rights and related values, and an honest appreciation of differences (Bangura, 2005).
The third major tenet of ubuntu, Bangura notes, rests upon dialogue, with its particularity, individuality and historicality. Ubuntu inspires us to expose ourselves to others, to encounter the differences of their humanness in order to inform and enrich our own. Thus understood, umuntu ngumuntu ngabantu translates as “To be human is to affirm one's humanity by recognizing the humanity of others in its infinite variety of content and form.” This translation of ubuntu highlights the respect for particularity, individuality and historicality, without which a true African educational paradigm cannot reemerge (Bangura, 2005).
Furthermore, according to Bangura, the ubuntu respect for the particularities of the beliefs and practices of others is especially emphasized by the following striking translation of umuntu ngumuntu ngabantu: “A human being through (the otherness of) other human beings.” Ubuntu dictates that, if we are to be human, we need to recognize the genuine otherness of our fellow humans. In other words, we need to acknowledge the diversity of languages, histories, values and customs, all of which make up a society (Bangura, 2005).
Bangura maintains that ubuntu's respect for the particularity of the other is closely aligned to its respect for individuality. But the individuality which ubuntu respects is not the Cartesian type. Instead, ubuntu directly contradicts the Cartesian conception of individuality in terms of which the individual or self can be conceived without thereby necessarily conceiving the other. The Cartesian individual exists prior to, or separately and independently from, the rest of the community or society. The rest of society is nothing but an added extra to a pre-existent and self-sufficient being. This “modernistic” and “atomistic” conception of individuality underlies both individualism and collectivism. Individualism exaggerates the seemingly solitary aspects of human existence to the detriment of communal aspects. Collectivism makes the same mistake on a larger scale. For the collectivist, society comprises a bunch of separately existing, solitary (i.e. detached) individuals (Bangura, 2005).
Contrastingly, Bangura argues, ubuntu defines the individual in terms of his/her relationship with others. Accordingly, individuals only exist in their relationships with others; and as these relationships change, so do the characters of the individuals. In this context, the word “individual” signifies a plurality of personalities corresponding to the multiplicity of relationships in which the individual in question stands. Being an individual by definition means “being-with-others.” “With-others” is not an additive to a pre-existent and self-sufficient being; instead, both this being (the self) and the others find themselves in a whole wherein they are already related. This is all somewhat boggling for the Cartesian mind, whose conception of individuality must now move from solitary to solidarity, from independence to interdependence, from individuality vis-à-vis community to individuality à la community (Bangura, 2005).
In the west, according to Bangura, individualism often translates into rugged competition. Individual interest is the modus vivendi, and society or others are regarded as a means to individual ends. This is in stark contrast to the African preference for co-operation, group work or shosholoza (“work as one”: i.e. team work). The stokvels in South Africa are estimated to number approximately 800,000. Stokvels are joint undertakings or collective enterprises, such as savings clubs, burial societies and other co-operatives. The stokvel economy might be described as capitalism with sazi (humanness), or, put differently, a socialist form of capitalism. Making a profit is important, but never if it involves the exploitation of others. Profits are equally shared. Thus, stokvels are based on the ubuntu “extended family system”: i.e. all involved should be considered as brothers and sisters, members of the same family (Bangura, 2005).
Indeed, for Bangura, the ubuntu conception of individuality may seem contradictory, since it claims that the self or individual is constituted by its relations with others. But if this is the case, then what are the relations between? Can persons and personal relations really be equally primordial? African thought addresses this (apparent) contradiction in the idea of seriti: i.e. an energy, power or force which both makes us ourselves and unites us in personal interaction with others. This idea allows us to see the self and others as “equiprimordial” or as aspects of the same universal field of force. This distinctive African inclination toward collectivism and collective sense of responsibility does not negate individualism. It merely discourages the notion that the individual should take precedence over community (Bangura, 2005).
Consequently, Bangura points out, an oppressive communalism constitutes a derailment, an abuse of ubuntu. True ubuntu incorporates dialogue: i.e. it intertwines both relation and distance. It preserves the other in his/her otherness, in his/her uniqueness, without letting him/her slip into the distance (Bangura, 2005).
Thus, according to Bangura, the emphasis on the “ongoingness” of the contact and interaction with others on which the African subjectivity feeds suggests a final important ingredient of the “mutual exposure” mandated by ubuntu: i.e. respecting the historicality of the other. This means respecting his/her dynamic nature or process nature. The flexibility of the other is well noted in ubuntu. In other words, for the African humanist, life is without absolutes. An ubuntu perception of the other is never fixed or rigidly closed; rather, it is adjustable or open-ended. It allows the other to be, to become. It acknowledges the irreducibility of the other: i.e. it never reduces the other to any specific characteristic, conduct, or function. This underscores the concept of ubuntu which denotes both a state of being and one of becoming. As a process of self-realization through others, it simultaneously enriches the self-realization of others (Bangura, 2005).
Table I. Latest data on internet usage in Africa compared to the rest of the world [Table omitted. See PDF]
© Emerald Group Publishing Limited 2014
