[
    {
        "content": "Data Solution Offering” [20]. The methodology addresses handling big data in terms of useful permutations of data sources, complexity in interrelationships, and difficulty in deleting (or modifying) individual records [21]. \nVarious studies since 2012 have shown that a multiple-layer architecture is one option for addressing the issues that big data presents. A distributed parallel architecture distributes data across multiple servers; these parallel execution environments can dramatically improve data processing speeds. This type of architecture inserts data into a parallel DBMS, which implements the use of MapReduce and Hadoop frameworks. This type of framework aims to make the processing power transparent to the end user by using a front-end application server [22]. \nBig data analytics for manufacturing applications is marketed as a 5C architecture (connection, conversion, cyber, cognition, and configuration) [23]. \nA data lake allows an organization to shift its focus from centralized control to a shared model in order to respond to the changing dynamics of information management. This enables quick segregation of data into the data lake, thereby reducing the overhead time [24, 25]. \nBig data has increased the demand for information management specialists so much that Software AG, Oracle Corporation, IBM, Microsoft, SAP, EMC, HP and Dell have spent more than $15 billion on software firms specializing in data management and analytics. In 2010, this industry was worth more than $100 billion and was growing at almost 10% a year, about twice as fast as the software business as a whole [5]. \nDeveloped economies increasingly use data-intensive technologies. There are 4.6 billion mobile-phone subscriptions worldwide, and between 1 billion and 2 billion people accessing the Internet [5]. 
Between 1990 and 2005, more than 1 billion people worldwide entered the middle class, which means more people became literate, which in turn led to information growth. The world’s effective capacity to exchange information through telecommunication networks was 281 petabytes in 1986, 471 petabytes in 1993, 2.2 exabytes in 2000 and 65 exabytes in 2007 [10], and predictions put the amount of Internet traffic at 667 exabytes annually by 2014 [5]. According to one estimate, one-third of the globally stored information is in the form of alphanumeric text and still image data [26], which is the format most useful for most big data applications. This also shows the potential of as yet unused data (i.e., data in the form of video and audio content). \nLet us consider the main sources of big data. \nNetworking Ways of Communication Between People on the Planet \nWith the development of modern means of communication between people, such as mobile communications, the Internet and social networks, the volume of data generated by people is growing like an avalanche. According to analytical studies by the International Labor Organization of the United Nations, these changes have the following features: \n\n– In 1984 the number of Internet devices in the world was about 1,000; now it has reached 15 billion, about 2.5 per inhabitant of the planet; \n– In 1900, the amount of human knowledge doubled every 100 years. Now, due to global “digitalization”, it doubles every 2 years, and the volume of new data produced by mankind is growing at the same rate; \n– For university students, this means that the newest knowledge they receive in the first year of study is already obsolete by the third year; \n– These changes are no longer linear in time. According to the UN, they are exponential, which is why the new digital world is called exponential. \nInternet of Things (IoT) \nBig data and the IoT work in conjunction. 
Data extracted from IoT devices provides a mapping of device interconnectivity. Such mappings have been used by the media industry, companies and governments to more accurately target their audience and increase media efficiency. IoT is also increasingly adopted as a means of gathering sensory data, and this sensory data has been used in medical [27] and manufacturing [28] contexts. \nKevin Ashton, the digital innovation expert who is credited with coining the term [29], defines the Internet of Things in this quote: “If we had computers that knew everything there was to know about things—using data they gathered without any help from us—we would be able to track and count everything, and greatly reduce waste, loss and cost. We would know when things needed replacing, repairing or recalling, and whether they were fresh or past their best.” \nInformation Technology \nEspecially since 2015, big data has come to prominence within business operations as a tool to help employees work more efficiently and to streamline the collection and distribution of Information Technology (IT). The use of big data to resolve IT and data collection issues within an enterprise is called IT Operations Analytics (ITOA) [30]. By applying big data principles to the concepts of machine intelligence and deep computing, IT departments can predict potential issues and move to provide solutions before the problems even happen [30]. During this period, ITOA businesses also began to play a major role in systems management by offering platforms that brought individual data silos together and generated insights from the whole of the system rather than from isolated pockets of data.",
        "chapter": "Introduction",
        "section": "Networking Ways of Communication Between People on the Planet",
        "subsection": "N/A",
        "subsubsection": "N/A"
    },
    {
        "content": "– In 1984 the number of Internet devices in the world was about 1,000; now it has reached 15 billion, about 2.5 per inhabitant of the planet; \n– In 1900, the amount of human knowledge doubled every 100 years. Now, due to global “digitalization”, it doubles every 2 years, and the volume of new data produced by mankind is growing at the same rate; \n– For university students, this means that the newest knowledge they receive in the first year of study is already obsolete by the third year; \n– These changes are no longer linear in time. According to the UN, they are exponential, which is why the new digital world is called exponential. \nInternet of Things (IoT) \nBig data and the IoT work in conjunction. Data extracted from IoT devices provides a mapping of device interconnectivity. Such mappings have been used by the media industry, companies and governments to more accurately target their audience and increase media efficiency. IoT is also increasingly adopted as a means of gathering sensory data, and this sensory data has been used in medical [27] and manufacturing [28] contexts. \nKevin Ashton, the digital innovation expert who is credited with coining the term [29], defines the Internet of Things in this quote: “If we had computers that knew everything there was to know about things—using data they gathered without any help from us—we would be able to track and count everything, and greatly reduce waste, loss and cost. We would know when things needed replacing, repairing or recalling, and whether they were fresh or past their best.” \nInformation Technology \nEspecially since 2015, big data has come to prominence within business operations as a tool to help employees work more efficiently and to streamline the collection and distribution of Information Technology (IT). The use of big data to resolve IT and data collection issues within an enterprise is called IT Operations Analytics (ITOA) [30]. 
By applying big data principles to the concepts of machine intelligence and deep computing, IT departments can predict potential issues and move to provide solutions before the problems even happen [30]. During this period, ITOA businesses also began to play a major role in systems management by offering platforms that brought individual data silos together and generated insights from the whole of the system rather than from isolated pockets of data.",
        "chapter": "Introduction",
        "section": "Internet of Things (IoT)",
        "subsection": "N/A",
        "subsubsection": "N/A"
    },
    {
        "content": "– In 1984 the number of Internet devices in the world was about 1,000; now it has reached 15 billion, about 2.5 per inhabitant of the planet; \n– In 1900, the amount of human knowledge doubled every 100 years. Now, due to global “digitalization”, it doubles every 2 years, and the volume of new data produced by mankind is growing at the same rate; \n– For university students, this means that the newest knowledge they receive in the first year of study is already obsolete by the third year; \n– These changes are no longer linear in time. According to the UN, they are exponential, which is why the new digital world is called exponential. \nInternet of Things (IoT) \nBig data and the IoT work in conjunction. Data extracted from IoT devices provides a mapping of device interconnectivity. Such mappings have been used by the media industry, companies and governments to more accurately target their audience and increase media efficiency. IoT is also increasingly adopted as a means of gathering sensory data, and this sensory data has been used in medical [27] and manufacturing [28] contexts. \nKevin Ashton, the digital innovation expert who is credited with coining the term [29], defines the Internet of Things in this quote: “If we had computers that knew everything there was to know about things—using data they gathered without any help from us—we would be able to track and count everything, and greatly reduce waste, loss and cost. We would know when things needed replacing, repairing or recalling, and whether they were fresh or past their best.” \nInformation Technology \nEspecially since 2015, big data has come to prominence within business operations as a tool to help employees work more efficiently and to streamline the collection and distribution of Information Technology (IT). The use of big data to resolve IT and data collection issues within an enterprise is called IT Operations Analytics (ITOA) [30]. 
By applying big data principles to the concepts of machine intelligence and deep computing, IT departments can predict potential issues and move to provide solutions before the problems even happen [30]. During this period, ITOA businesses also began to play a major role in systems management by offering platforms that brought individual data silos together and generated insights from the whole of the system rather than from isolated pockets of data. \n• Walmart handles more than 1 million customer transactions every hour, which are imported into databases estimated to contain more than 2.5 petabytes (2560 terabytes) of data, the equivalent of 167 times the information contained in all the books in the US Library of Congress. Windermere Real Estate uses location information from nearly 100 million drivers to help new home buyers determine their typical drive times to and from work throughout various times of the day [31].   \n• FICO Card Detection System protects accounts worldwide [32]. \nScience \n• The Large Hadron Collider experiments represent about 150 million sensors delivering data 40 million times per second. There are nearly 600 million collisions per second. After filtering and refraining from recording more than 99.99995% [33] of these streams, there are 100 collisions of interest per second [34–36]. \nAs a result, although less than 0.001% of the sensor stream data is retained, the data flow from all four LHC experiments represents an annual rate of 25 petabytes before replication (as of 2012), which becomes nearly 200 petabytes after replication. \nIf all sensor data were recorded at the LHC, the data flow would be extremely hard to work with. It would exceed an annual rate of 150 million petabytes, or nearly 500 exabytes per day, before replication. To put the number in perspective, this is equivalent to 500 quintillion (5 × 10^20) bytes per day, almost 200 times more than all the other sources in the world combined. 
\n• The Square Kilometre Array is a radio telescope built of thousands of antennas. It is expected to be operational by 2024. Collectively, these antennas are expected to gather 14 exabytes and store one petabyte per day [37, 38]. It is considered one of the most ambitious scientific projects ever undertaken [39]. When the Sloan Digital Sky Survey (SDSS) began to collect astronomical data in 2000, it amassed more in its first few weeks than all the data previously collected in the history of astronomy. Continuing at a rate of about 200 GB per night, SDSS has amassed more than 140 terabytes of information [40]. When the Large Synoptic Survey Telescope, successor to SDSS, comes online in 2020, its designers expect it to acquire that amount of data every five days [5]. Decoding the human genome originally took 10 years; now it can be achieved in less than a day. DNA sequencers have cut the sequencing cost by a factor of 10,000 in the last ten years, a reduction 100 times greater than the one predicted by Moore’s Law [5]. The NASA Center for Climate Simulation (NCCS) stores 32 petabytes of climate observations and simulations on the Discover supercomputing cluster [41, 42]. Google’s DNAStack compiles and organizes genetic data from DNA samples collected around the world to identify diseases and other medical defects. These fast and",
        "chapter": "Introduction",
        "section": "Information Technology",
        "subsection": "N/A",
        "subsubsection": "N/A"
    },
    {
        "content": "• Walmart handles more than 1 million customer transactions every hour, which are imported into databases estimated to contain more than 2.5 petabytes (2560 terabytes) of data, the equivalent of 167 times the information contained in all the books in the US Library of Congress. Windermere Real Estate uses location information from nearly 100 million drivers to help new home buyers determine their typical drive times to and from work throughout various times of the day [31].   \n• FICO Card Detection System protects accounts worldwide [32]. \nScience \n• The Large Hadron Collider experiments represent about 150 million sensors delivering data 40 million times per second. There are nearly 600 million collisions per second. After filtering and refraining from recording more than 99.99995% [33] of these streams, there are 100 collisions of interest per second [34–36]. \nAs a result, although less than 0.001% of the sensor stream data is retained, the data flow from all four LHC experiments represents an annual rate of 25 petabytes before replication (as of 2012), which becomes nearly 200 petabytes after replication. \nIf all sensor data were recorded at the LHC, the data flow would be extremely hard to work with. It would exceed an annual rate of 150 million petabytes, or nearly 500 exabytes per day, before replication. To put the number in perspective, this is equivalent to 500 quintillion (5 × 10^20) bytes per day, almost 200 times more than all the other sources in the world combined. \n• The Square Kilometre Array is a radio telescope built of thousands of antennas. It is expected to be operational by 2024. Collectively, these antennas are expected to gather 14 exabytes and store one petabyte per day [37, 38]. It is considered one of the most ambitious scientific projects ever undertaken [39]. 
When the Sloan Digital Sky Survey (SDSS) began to collect astronomical data in 2000, it amassed more in its first few weeks than all the data previously collected in the history of astronomy. Continuing at a rate of about 200 GB per night, SDSS has amassed more than 140 terabytes of information [40]. When the Large Synoptic Survey Telescope, successor to SDSS, comes online in 2020, its designers expect it to acquire that amount of data every five days [5]. Decoding the human genome originally took 10 years; now it can be achieved in less than a day. DNA sequencers have cut the sequencing cost by a factor of 10,000 in the last ten years, a reduction 100 times greater than the one predicted by Moore’s Law [5]. The NASA Center for Climate Simulation (NCCS) stores 32 petabytes of climate observations and simulations on the Discover supercomputing cluster [41, 42]. Google’s DNAStack compiles and organizes genetic data from DNA samples collected around the world to identify diseases and other medical defects. These fast and exact calculations eliminate any ‘friction points,’ or human errors that could be made by one of the numerous science and biology experts working with the DNA. DNAStack, a part of Google Genomics, allows scientists to use the vast resources of Google’s search servers to instantly scale social experiments that would usually take years [43, 44]. 23andMe’s DNA database contains the genetic information of over 1,000,000 people worldwide [45]. The company has explored selling the “anonymous aggregated genetic data” to other researchers and pharmaceutical companies for research purposes if patients give their consent [46–50]. Ahmad Hariri, professor of psychology and neuroscience at Duke University, who has been using 23andMe in his research since 2009, states that the most important aspect of the company’s new service is that it makes genetic research accessible and relatively cheap for scientists [51]. 
A study that identified 15 genome sites linked to depression in 23andMe’s database led to a surge in requests to access the repository, with 23andMe fielding nearly 20 requests to access the depression data in the 2 weeks after publication of the paper [52]. Computational Fluid Dynamics (CFD) and hydrodynamic turbulence research generate massive datasets. The Johns Hopkins Turbulence Databases (JHTDB) contains over 350 terabytes of spatiotemporal fields from direct numerical simulations of various turbulent flows. Such data have been difficult to share using traditional methods such as downloading flat simulation output files. The data within JHTDB can be accessed using “virtual sensors” with various access modes, ranging from direct web-browser queries and access through Matlab, Python, Fortran and C programs executing on clients’ platforms, to cutout services for downloading raw data. The data have been used in over 150 scientific publications. \nTechnology \n• eBay.com uses two data warehouses of 7.5 petabytes and 40 PB, as well as a 40 PB Hadoop cluster, for search, consumer recommendations, and merchandising [53].   \n• Amazon.com handles millions of back-end operations every day, as well as queries from more than half a million third-party sellers. The core technology that keeps Amazon running is Linux-based, and as of 2005 it had the world’s three largest Linux databases, with capacities of 7.8 TB, 18.5 TB, and 24.7 TB [54].   \n• Facebook handles 50 billion photos from its user base [55].   \n• Since August 2012, Google has been handling roughly 100 billion searches per month [56]. \nIn March 2012, the White House announced a national “Big Data Initiative” that consisted of six Federal departments and agencies committing more than $200 million to big data research projects [57].",
        "chapter": "Introduction",
        "section": "Science",
        "subsection": "N/A",
        "subsubsection": "N/A"
    },
    {
        "content": "exact calculations eliminate any ‘friction points,’ or human errors that could be made by one of the numerous science and biology experts working with the DNA. DNAStack, a part of Google Genomics, allows scientists to use the vast resources of Google’s search servers to instantly scale social experiments that would usually take years [43, 44]. 23andMe’s DNA database contains the genetic information of over 1,000,000 people worldwide [45]. The company has explored selling the “anonymous aggregated genetic data” to other researchers and pharmaceutical companies for research purposes if patients give their consent [46–50]. Ahmad Hariri, professor of psychology and neuroscience at Duke University, who has been using 23andMe in his research since 2009, states that the most important aspect of the company’s new service is that it makes genetic research accessible and relatively cheap for scientists [51]. A study that identified 15 genome sites linked to depression in 23andMe’s database led to a surge in requests to access the repository, with 23andMe fielding nearly 20 requests to access the depression data in the 2 weeks after publication of the paper [52]. Computational Fluid Dynamics (CFD) and hydrodynamic turbulence research generate massive datasets. The Johns Hopkins Turbulence Databases (JHTDB) contains over 350 terabytes of spatiotemporal fields from direct numerical simulations of various turbulent flows. Such data have been difficult to share using traditional methods such as downloading flat simulation output files. The data within JHTDB can be accessed using “virtual sensors” with various access modes, ranging from direct web-browser queries and access through Matlab, Python, Fortran and C programs executing on clients’ platforms, to cutout services for downloading raw data. The data have been used in over 150 scientific publications. 
\nTechnology \n• eBay.com uses two data warehouses of 7.5 petabytes and 40 PB, as well as a 40 PB Hadoop cluster, for search, consumer recommendations, and merchandising [53].   \n• Amazon.com handles millions of back-end operations every day, as well as queries from more than half a million third-party sellers. The core technology that keeps Amazon running is Linux-based, and as of 2005 it had the world’s three largest Linux databases, with capacities of 7.8 TB, 18.5 TB, and 24.7 TB [54].   \n• Facebook handles 50 billion photos from its user base [55].   \n• Since August 2012, Google has been handling roughly 100 billion searches per month [56]. \nIn March 2012, the White House announced a national “Big Data Initiative” that consisted of six Federal departments and agencies committing more than $200 million to big data research projects [57]. \nThe initiative included a National Science Foundation “Expeditions in Computing” grant of $10 million over 5 years to the AMPLab [58] at the University of California, Berkeley [59]. The AMPLab also received funds from DARPA and over a dozen industrial sponsors, and it uses big data to attack a wide range of problems, from predicting traffic congestion [60] to fighting cancer [61]. \nThe White House Big Data Initiative also included a commitment by the Department of Energy to provide $25 million in funding over 5 years to establish the Scalable Data Management, Analysis and Visualization (SDAV) Institute [62], led by the Energy Department’s Lawrence Berkeley National Laboratory. The SDAV Institute aims to bring together the expertise of six national laboratories and seven universities to develop new tools to help scientists manage and visualize data on the Department’s supercomputers. \nThe U.S. state of Massachusetts announced the Massachusetts Big Data Initiative in May 2012, which provides funding from the state government and private companies to a variety of research institutions [63]. 
The Massachusetts Institute of Technology hosts the Intel Science and Technology Center for Big Data in the MIT Computer Science and Artificial Intelligence Laboratory, combining government, corporate, and institutional funding and research efforts [64]. \nThe European Commission is funding the 2-year-long Big Data Public Private Forum through its Seventh Framework Program to engage companies, academics and other stakeholders in discussing big data issues. The project aims to define a strategy in terms of research and innovation to guide supporting actions from the European Commission in the successful implementation of the big data economy. Outcomes of this project will be used as input for Horizon 2020, its next framework program [65]. \nFacing the challenges of BD, the development and implementation of adequate methods, techniques and software for BD analysis (BD Mining) are extremely important. It is worth noting that conventional Data Mining methods and techniques are not adequate for this goal. \nSeveral fruitful approaches have been developed to deal with high-dimensional and large volumes of data. One of the most widely used is clustering. Clustering divides a large data set into several groups of similar objects and replaces each whole group by one representative object, the center of the cluster. To date, many cluster analysis algorithms have been developed. However, when new data arrive as a data stream, the problem of clustering in real-time mode arises, and it demands new efficient methods and algorithms. \nAnother approach to reducing the volume of a data set is hierarchy. Hierarchical organization of data structures the initial data set into several subordinate levels, which makes it possible to classify objects by their feature sets and to easily find a sought object or a small group of objects by their features. 
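The streaming clustering problem mentioned above can be made concrete with a minimal sketch of sequential (online) k-means in Python. It assumes Euclidean distance, a fixed number of clusters k, and seeding from the first k arriving points; each new point is assigned to its nearest center, which is then moved to the running mean of its members, so each group is summarized by one representative object without storing the stream. This is a standard textbook technique chosen for illustration, not one of the crisp, fuzzy or possibilistic methods developed in this book, and the names OnlineKMeans and nearest are hypothetical.

```python
import math
import random


def nearest(centers, x):
    # Index of the center closest to x (Euclidean distance).
    return min(range(len(centers)), key=lambda j: math.dist(centers[j], x))


class OnlineKMeans:
    # Sequential (MacQueen-style) k-means: every arriving point moves only
    # the center it is assigned to, so past points never need to be stored.

    def __init__(self, k):
        self.k = k
        self.centers = []  # one representative point per cluster
        self.counts = []   # number of points absorbed by each center

    def update(self, x):
        x = [float(v) for v in x]
        if len(self.centers) < self.k:
            # The first k arrivals seed the clusters.
            self.centers.append(x)
            self.counts.append(1)
            return len(self.centers) - 1
        j = nearest(self.centers, x)
        self.counts[j] += 1
        eta = 1.0 / self.counts[j]  # step 1/n keeps the center equal to the running mean
        self.centers[j] = [c + eta * (v - c) for c, v in zip(self.centers[j], x)]
        return j


# Usage on a synthetic stream alternating between two blobs
# centered at (0, 0) and (10, 10); the data are illustrative only.
random.seed(0)
model = OnlineKMeans(k=2)
for i in range(1000):
    mu = 0.0 if i % 2 == 0 else 10.0
    model.update((random.gauss(mu, 1.0), random.gauss(mu, 1.0)))
print([[round(v, 2) for v in c] for c in model.centers], model.counts)
```

Each update costs time proportional to k times the dimension, and memory does not grow with the length of the stream, which is what makes this style of update suitable for real-time operation; its known weakness is sensitivity to the initial seeds, which the alternating synthetic stream in the sketch sidesteps.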
\nIn recent years, driven by the demand for fast processing, prediction and classification of huge volumes of data, deep learning networks with a large number of neuron layers have been developed. Their appearance has raised a new problem: developing efficient methods for training such networks. One novel approach to fast deep learning is the application of the so-called Group Method of Data Handling (GMDH), which provides very efficient tools for reducing dimensionality. \nThe present book deals with some problems of BD analysis and considers and investigates both conventional Data Mining tools and novel efficient methods and tools developed for this goal. \nIn Chap. 1, methods of cluster analysis are considered. Crisp and fuzzy clustering methods are described and analyzed. New efficient possibilistic clustering methods, including robust clustering methods that work under a high noise level, are considered. Special attention is paid to the development of new clustering methods that operate on data streams in online mode. Examples of applying clustering methods to some practical problems are presented. \nChapter 2 is devoted to the analysis and training of deep learning (DL) networks and their application to solving some BD tasks. First, the structure and conventional training methods of DL networks are considered; then the vanishing gradient problem that arises during training is considered, together with several ways of preventing it (so-called methods of regularization). \nThe main attention in this chapter is paid to the development and presentation of so-called hybrid GMDH-neo-fuzzy networks for solving computational intelligence tasks with BD. This new class of fuzzy neural networks (FNN) turned out to be an efficient tool for overcoming high dimensionality. The chapter presents several types of hybrid GMDH-FNNs and their application to real problems of prediction, classification and control. \nChapter 3 deals with classification problems. 
The FNN NEFClass is considered as an efficient classification tool under BD conditions. The structure and training algorithms of FNN NEFClass are presented and analyzed. Applications of FNN NEFClass to medical image analysis and recognition in medical diagnostics problems are presented. \nConvolutional neural networks (CNN) are known to be new efficient tools for image processing and recognition. CNNs are applied to find informative image features, which are then fed into a multilayer perceptron for further classification. \nThe chapter describes a new hybrid CNN-FNN system for image recognition, in which a CNN is used to find image features while FNN NEFClass is used for further classification. The suggested hybrid network is investigated and compared with known CNN systems on the practical problem of breast cancer recognition using the standard BreakHis data set. \nChapter 4 of the book is devoted to the intellectual analysis of big historical data, with the purpose of recognizing the laws of the origin and development of global systemic conflicts and of analyzing the causes leading to these conflicts. Approaches to the recognition of C-waves of global systemic conflicts through big historical data have been generalized and formalized, and a general concept for describing and interpreting these waves has been proposed. Based on intellectual analysis of big data, the conflicts that have taken place from 750 B.C. up to now have been analyzed and their general pattern has been revealed. On this basis, an attempt has been made to foresee the next global conflict, called the conflict of the 21st century, and its nature and main characteristics have been analyzed. 
\nHypotheses about a metric relation between global periodic processes, namely between the sequence of 11-year cycles of solar activity, the so-called Kondratieff cycles of global economic development, and the process of evolutionary structuration of the family of C-waves of global systemic conflicts, have been formulated. \nThe problem of predicting these processes in the 21st century by using a metric approach was considered. Possible scenarios for the development of the conflict of the 21st century have been constructed and analyzed. Ideas aimed at avoiding undesirable consequences for humanity in the case of full or partial implementation of the predicted scenarios are proposed. \nOn the whole, this chapter is a remarkable example of the application and development of the general ideas and paradigms of Data Mining to the detection of hidden laws in the evolution of the world economy and global conflicts, and to their systemic analysis. \nReferences \n1. D. Laney, 3D data management: controlling data volume, velocity and variety. META Group Res. Note 6(70) (2001)   \n2. P.B. Goes, Design science research in top information systems journals. MIS Q. Manag. Inf. Syst. 38(1) (2014)   \n3. B. Marr, Big data: The 5 Vs everyone must know (6 March 2014)   \n4. D. Boyd, K. Crawford, Six provocations for big data, in Social Science Research Network: A Decade in Internet Time: Symposium on the Dynamics of the Internet and Society, https://doi.org/10.2139/ssrn.1926431 (21 Sept 2011)   \n5. Data, data everywhere. The Economist. Retrieved 9 Dec 2012 (25 Feb 2010)   \n6. Community cleverness required. Nature 455(7209), 1 (4 Sept 2008), https://doi.org/10.1038/455001a   \n7. O.J. Reichman, M.B. Jones, M.P. Schildhauer, Challenges and opportunities of open data in ecology. Science 331(6018), 703–705 (2011), https://doi.org/10.1126/science.1197962   \n8. J. Hellerstein, Parallel programming in the age of big data. Gigaom Blog (9 Nov 2008)   \n9. T. Segaran, J. 
Hammerbacher, Beautiful Data: The Stories Behind Elegant Data Solutions (O’Reilly Media, 2009), p. 257. ISBN: 978-0-596-15711-1   \n10. M. Hilbert, P. López, The world’s technological capacity to store, communicate, and compute information. Science 332(6025), 60–65 (2011), https://doi.org/10.1126/science.1200970   \n11. IBM What is big data?—Bringing big data to the enterprise. www.ibm.com. Retrieved 26 Aug 2013   \n12. M. Sh. Hajirahimova, A.S. Aliyeva, About big data measurement methodologies and indicators. Int. J. Mod. Educ. Comput. Sci. 9(10), 1–9. https://doi.org/10.5815/ijmecs.2017.10.01   \n13. D. Reinsel, J. Gantz, J. Rydning, Data Age 2025: The Evolution of Data to Life-Critical (International Data Corporation, Framingham). Retrieved 2 Nov 2017 (13 April 2017)   \n14. Oracle and FSN, Mastering big data: CFO strategies to transform insight into opportunity, December 2012   \n15. A. Jacobs, The pathologies of big data. ACMQueue (6 July 2009)   \n16. R. Magoulas, B. Lorica, Introduction to big data. Release 2.0 (11) (O’Reilly Media, Sebastopol) (Feb 2009)   \n17. Survey: biggest databases approach 30 terabytes. Eweek.com. Retrieved 8 Oct 2017   \n18. J. Bertolucci, Hadoop: from experiment to leading big data platform. Information Week. Retrieved on 14 Nov 2013   \n19. J. Webster, MapReduce: simplified data processing on large clusters. Search Storage. Retrieved on 25 Mar 2013 (2004)   \n20. Big data solution offering. MIKE2.0. Retrieved 8 Dec 2013   \n21. Big data definition. MIKE2.0. Retrieved 9 Mar 2013   \n22. C. Boja, A. Pocovnicu, L. Bătăgan, Distributed parallel architecture for big data. Informatica Economica 16(2), 116–127 (2012)   \n23. IMS_CPS—IMS Center. Imscenter.net. Retrieved 16 June 2016   \n24. Solving key business challenges with a big data lake. Hcltech.com. Retrieved 8 Oct 2017 (Aug 2014)   \n25. Method for testing the fault tolerance of MapReduce frameworks. Computer Networks (2015)   \n26. M. Hilbert, P. 
López, The world’s technological capacity to store, communicate, and compute information. Science 332(6025), 60–65 (2011), https://doi.org/10.1126/science.1200970   \n27. M. Hilbert, What is the content of the world’s technologically mediated information and communication capacity: how much text, image, audio, and video? Inf. Soc. 30(2), 127–143 (2014), https://doi.org/10.1080/01972243.2013.873748   \n28. QuiO named innovation champion of the Accenture healthtech innovation challenge. Businesswire.com. Retrieved 8 Oct 2017 (10 Jan 2017)   \n29. A software platform for operational technology innovation. Predix.com. Retrieved 8 Oct 2017   \n30. That internet of things thing   \n31. R. Solnik, The time has come: analytics delivers for IT operations. Data Cent. J. Retrieved 21 June 2016   \n32. N. Wingfield, Predicting commutes more accurately for would-be home buyers—NYTimes.com. Bits.blogs.nytimes.com. Retrieved 21 July 2013 (12 March 2013)   \n33. FICO® Falcon® Fraud Manager. Fico.com. Retrieved 21 July 2013   \n34. D. Alexandru, Prof. cds.cern.ch. CERN. Retrieved 24 March 2015   \n35. LHC Brochure, English version. A presentation of the largest and the most powerful particle accelerator in the world, the Large Hadron Collider (LHC), which started up in 2008. Its role, characteristics, technologies, etc. are explained for the general public. CERN-Brochure-2010-006-Eng. LHC Brochure, English version. CERN. Retrieved 20 Jan 2013   \n36. LHC Guide, English version. A collection of facts and figures about the Large Hadron Collider (LHC) in the form of questions and answers. CERN-Brochure-2008-001-Eng. LHC Guide, English version. CERN. Retrieved 20 Jan 2013   \n37. G. Brumfiel, High-energy physics: Down the petabyte highway. Nature 469, 282–283, https://doi.org/10.1038/469282a (19 Jan 2011)   \n38. IBM Research—Zurich. Zurich.ibm.com. Retrieved 8 Oct 2017   \n39. Future telescope array drives development of Exabyte processing. Ars Technica.
Retrieved 15 April 2015   \n40. Australia’s bid for the square kilometre array—an insider’s perspective. The Conversation. Retrieved 27 Sept 2016 (1 Feb 2012)   \n41. P. Delort, OECD ICCP technology foresight forum, 2012. Oecd.org. Retrieved 8 Oct 2017   \n42. NASA—NASA Goddard Introduces the NASA Center for Climate Simulation. Nasa.gov. Retrieved 13 April 2016   \n43. P. Webster, Supercomputing the climate: NASA’s big data mission. CSC World. Computer Sciences Corporation. Archived from the original on 4 January 2013. Retrieved 18 Jan 2013   \n44. These six great neuroscience ideas could make the leap from lab to market. The Globe and Mail. Retrieved 1 Oct 2016 (20 Nov 2014)   \n45. DNAstack tackles massive, complex DNA datasets with Google Genomics. Google Cloud Platform. Retrieved 1 Oct 2016   \n46. 23andMe—Ancestry. 23andme.com. Retrieved 29 Dec 2016   \n47. A. Potenza, 23andMe wants researchers to use its kits, in a bid to expand its collection of genetic data. The Verge. Retrieved 29 Dec 2016 (13 July 2016)   \n48. This startup will sequence your DNA, so you can contribute to medical research. Fast Company. Retrieved 29 Dec 2016 (23 Dec 2016)   \n49. C. Seife, 23andMe is terrifying, but not for the reasons the FDA thinks. Scientific American. Retrieved 29 Dec 2016   \n50. A. Zaleski, This biotech start-up is betting your genes will yield the next wonder drug. CNBC. Retrieved 29 Dec 2016 (22 June 2016)   \n51. A. Regalado, How 23andMe turned your DNA into a \\$1 billion drug discovery machine. MIT Technology Review. Retrieved 29 Dec 2016   \n52. 23andMe reports jump in requests for data in wake of Pfizer depression study | FierceBiotech. fiercebiotech.com. Retrieved 29 Dec 2016   \n53. L. Tay, Inside eBay’s 90PB data warehouse. ITNews. Retrieved 12 Feb 2016   \n54. J. Layton, Amazon technology. Money.howstuffworks.com. Retrieved 5 March 2013   \n55. Scaling Facebook to 500 million users and beyond. Facebook.com. Retrieved 21 July 2013   \n56.
Google still doing at least 1 trillion searches per year. Search Engine Land. Retrieved 15 April 2015 (16 Jan 2015)   \n57. Obama administration unveils “big data” initiative: announces \\$200 million in new R&D investments. The White House. Archived from the original (PDF) on 1 Nov 2012   \n58. AMPLab at the University of California, Berkeley. Amplab.cs.berkeley.edu. Retrieved 5 March 2013   \n59. NSF leads federal efforts in big data. National Science Foundation (NSF). 29 March 2012   \n60. T. Hunter, T. Moldovan, M. Zaharia, J. Ma, M. Franklin, P. Abbeel, A. Bayen, Scaling the mobile millennium system in the cloud (October 2011)   \n61. D. Patterson, Computer scientists may have what it takes to help cure cancer. The New York Times (5 Dec 2011)   \n62. Secretary Chu announces new institute to help scientists improve massive data set research on DOE supercomputers. energy.gov   \n63. Governor Patrick announces new initiative to strengthen Massachusetts’ position as a world leader in big data. Commonwealth of Massachusetts   \n64. Big Data @ CSAIL. Bigdata.csail.mit.edu. Retrieved 5 March 2013 (22 Feb 2013)   \n65. Big data public private forum. Cordis.europa.eu. Archived from the original on 20 May 2013. Retrieved 5 March 2013 (1 Sept 2012) \n\nChapter 1 The Cluster Analysis in Big Data Mining \n1.1 Introduction \nClustering methods are powerful tools for reducing the volume of big data (BD) warehouses. Clustering splits the initial big data set into several groups of similar objects, using various distance metrics to measure similarity or difference, and then replaces each group by its most representative object, located at the cluster center. In this chapter, different clustering methods and techniques are considered, and their applications to the solution of practical problems are presented. \nThe term cluster analysis (first introduced by Tryon in 1939) actually covers a set of various algorithms of classification without a teacher, i.e., unsupervised classification [1].
The general question asked by researchers in many areas is how to organize observed data into meaningful structures, i.e., how to develop a taxonomy. \nClustering is applied in a wide variety of areas. For example, in medicine, the clustering of diseases, of their treatments, or of their symptoms leads to widely used taxonomies. In psychiatry, the correct diagnosis of clusters of symptoms, such as paranoia, schizophrenia, etc., is decisive for successful therapy. In archeology, researchers use cluster analysis to build taxonomies of stone tools, funeral objects, etc. Broad applications of cluster analysis in market research are well known. In general, whenever “mountains” of information have to be classified into groups suitable for further processing, cluster analysis is very useful and effective. In recent years, cluster analysis has been widely used as one of the principal methods of intelligent data analysis (Data Mining). \nThe purpose of this chapter is to consider modern methods of cluster analysis: crisp methods (the C-means method, Ward’s method, the nearest neighbor method, the farthest neighbor method), as well as fuzzy, robust probabilistic, and possibilistic clustering methods. \nNumerous results of experimental studies of fuzzy cluster analysis methods are presented in Sect. 1.9; among them is the problem of clustering UN countries by indicators of sustainable development.",
        "chapter": "Introduction",
        "section": "Technology",
        "subsection": "N/A",
        "subsubsection": "N/A"
    },
    {
        "content": "Chapter 1 The Cluster Analysis in Big Data Mining \n1.1 Introduction \nClustering methods are powerful tools for reducing the volume of big data (BD) warehouses. Clustering splits the initial big data set into several groups of similar objects, using various distance metrics to measure similarity or difference, and then replaces each group by its most representative object, located at the cluster center. In this chapter, different clustering methods and techniques are considered, and their applications to the solution of practical problems are presented. \nThe term cluster analysis (first introduced by Tryon in 1939) actually covers a set of various algorithms of classification without a teacher, i.e., unsupervised classification [1]. The general question asked by researchers in many areas is how to organize observed data into meaningful structures, i.e., how to develop a taxonomy. \nClustering is applied in a wide variety of areas. For example, in medicine, the clustering of diseases, of their treatments, or of their symptoms leads to widely used taxonomies. In psychiatry, the correct diagnosis of clusters of symptoms, such as paranoia, schizophrenia, etc., is decisive for successful therapy. In archeology, researchers use cluster analysis to build taxonomies of stone tools, funeral objects, etc. Broad applications of cluster analysis in market research are well known. In general, whenever “mountains” of information have to be classified into groups suitable for further processing, cluster analysis is very useful and effective. In recent years, cluster analysis has been widely used as one of the principal methods of intelligent data analysis (Data Mining). 
\nThe purpose of this chapter is to consider modern methods of cluster analysis: crisp methods (the C-means method, Ward’s method, the nearest neighbor method, the farthest neighbor method), as well as fuzzy, robust probabilistic, and possibilistic clustering methods. \nNumerous results of experimental studies of fuzzy cluster analysis methods are presented in Sect. 1.9; among them is the problem of clustering UN countries by indicators of sustainable development. \n1.2 Cluster Analysis, Problem Definition. Criteria of Quality and Metrics \nLet a set of observations $X = \\{X_i\\}, i = \\overline{1, n}$ be given, where $X_i = \\{x_{ij}\\}, j = \\overline{1, N}$. It is required to divide the set $X$ into $K$ non-intersecting subsets (clusters) $S_1, \\dots, S_K$ so as to provide an extremum of some criterion (quality functional), that is, \nto find $S = (S_1, \\dots, S_K)$ such that $f(S) \\to \\min$ (or $\\max$). \nDifferent types of splitting criteria (functionals) are possible. Note that this task is closely connected with the definition of a metric in the feature space. \nConsider the most widely used functionals of splitting quality [2]: \n1. The splitting coefficient $F$, defined as follows: \n$$F = \\frac{1}{n} \\sum_{i=1}^{n} \\sum_{j=1}^{K} w_{ij}^{2},$$ \nwhere $w_{ij} \\in [0, 1]$ is the degree of membership of the $i$-th object in the $j$-th cluster, $n$ is the number of objects, and $K$ is the number of clusters. The range of $F$ is $F \\in [1/K, 1]$. \n2. Non-fuzziness index: \n$$NFI = \\frac{K F - 1}{K - 1},$$ \nwhere $K$ is the number of classes (clusters) and $F$ is the splitting coefficient. \n3. Entropy of splitting: \n$$H = -\\frac{1}{n} \\sum_{i=1}^{n} \\sum_{j=1}^{K} w_{ij} \\ln w_{ij}.$$ \n4. The normalized entropy of splitting: \nwhere $n$ is the number of points. \n5. The modified entropy: \n6. Second functional of Roubens:",
        "chapter": "1 The Cluster Analysis in Big Data Mining",
        "section": "1.1 Introduction",
        "subsection": "N/A",
        "subsubsection": "N/A"
    },
    {
        "content": "1.2 Cluster Analysis, Problem Definition. Criteria of Quality and Metrics \nLet a set of observations $X = \\{X_i\\}, i = \\overline{1, n}$ be given, where $X_i = \\{x_{ij}\\}, j = \\overline{1, N}$. It is required to divide the set $X$ into $K$ non-intersecting subsets (clusters) $S_1, \\dots, S_K$ so as to provide an extremum of some criterion (quality functional), that is, \nto find $S = (S_1, \\dots, S_K)$ such that $f(S) \\to \\min$ (or $\\max$). \nDifferent types of splitting criteria (functionals) are possible. Note that this task is closely connected with the definition of a metric in the feature space. \nConsider the most widely used functionals of splitting quality [2]: \n1. The splitting coefficient $F$, defined as follows: \n$$F = \\frac{1}{n} \\sum_{i=1}^{n} \\sum_{j=1}^{K} w_{ij}^{2},$$ \nwhere $w_{ij} \\in [0, 1]$ is the degree of membership of the $i$-th object in the $j$-th cluster, $n$ is the number of objects, and $K$ is the number of clusters. The range of $F$ is $F \\in [1/K, 1]$. \n2. Non-fuzziness index: \n$$NFI = \\frac{K F - 1}{K - 1},$$ \nwhere $K$ is the number of classes (clusters) and $F$ is the splitting coefficient. \n3. Entropy of splitting: \n$$H = -\\frac{1}{n} \\sum_{i=1}^{n} \\sum_{j=1}^{K} w_{ij} \\ln w_{ij}.$$ \n4. The normalized entropy of splitting: \nwhere $n$ is the number of points. \n5. The modified entropy: \n6. Second functional of Roubens: \n7. Third functional of Roubens (second non-fuzziness index): \nSince the initial information is given in the form of the matrix $X$, a metric has to be chosen. The choice of metric is the most important factor influencing the results of cluster analysis. Depending on the type of features, various distance measures (metrics) are used. \nLet $X_i$ and $X_k$ be two samples in the $N$-dimensional feature space. \nThe main clustering metrics are given in Table 1.1. \nThere is a large number of clustering algorithms that use various metrics and splitting criteria. 
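As a rough illustration of the splitting-quality functionals listed above, the following Python sketch computes them from an n × K membership matrix w. The splitting coefficient, the non-fuzziness index and the entropy of splitting use their standard forms; because the formulas for the normalized and the modified entropy are not reproduced in the text, the sketch assumes the common variants H / (1 − K/n) and H / ln K, which may differ from the definitions in [2]. The function name partition_quality is hypothetical.

```python
import math


def partition_quality(w):
    # w: n x K membership matrix (list of rows); w[i][j] is the degree of
    # membership of object i in cluster j, and every row sums to 1.
    # Assumes K >= 2 and n > K so that the normalizations are defined.
    n, k = len(w), len(w[0])
    # 1. Splitting coefficient F = (1/n) * sum of squared memberships, in [1/K, 1]
    f = sum(x * x for row in w for x in row) / n
    # 2. Non-fuzziness index: rescales F from [1/K, 1] to [0, 1]
    nfi = (k * f - 1) / (k - 1)
    # 3. Entropy of splitting; zero memberships contribute 0 (0 * ln 0 = 0)
    h = -sum(x * math.log(x) for row in w for x in row if x > 0) / n
    # 4. and 5. Assumed normalizations (common variants, not taken from the text)
    h_norm = h / (1 - k / n)
    h_mod = h / math.log(k)
    return {'F': f, 'NFI': nfi, 'H': h, 'H_norm': h_norm, 'H_mod': h_mod}


crisp = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]  # every object fully in one cluster
uniform = [[0.5, 0.5]] * 4                    # maximally fuzzy partition, K = 2
print(partition_quality(crisp))
print(partition_quality(uniform))
```

A crisp partition gives F = 1, NFI = 1 and H = 0, while equal memberships 1/K give F = 1/K, NFI = 0 and H = ln K, which agrees with the stated range F ∈ [1/K, 1].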
\n1.3 Classification of Algorithms of Cluster Analysis \nWhen performing clustering, it is important to know how many clusters the initial sample contains. It is assumed that clustering has to reveal natural local groupings of objects. Therefore, the number of clusters is a parameter that often significantly complicates an algorithm if it is assumed to be unknown, and significantly influences the quality of the result if it is known. \nThe problem of choosing the number of clusters is highly nontrivial. Suffice it to say that obtaining a satisfactory theoretical solution often requires making very strong assumptions in advance about the properties of some family of distributions. But what assumptions can one make when, especially at the beginning of a study, practically nothing is known about the data? Therefore, clustering algorithms are usually constructed as a way of searching over the number of clusters and determining its optimal value in the course of the search. \nThe number of methods for splitting a set of objects into clusters is quite large. All of them can be subdivided into hierarchical and non-hierarchical ones. \nIn non-hierarchical algorithms, the operation and the stopping conditions often need to be regulated in advance by a large number of parameters, which is sometimes difficult, especially at the initial stage of an investigation. But such algorithms achieve greater flexibility in varying the clustering, and the number of clusters is usually specified. In non-hierarchical algorithms, a clustering criterion is given, and it is optimized by splitting the initial sample or set into clusters. \nOn the other hand, when objects are characterized by a large number of features (parameters), the task of grouping the features becomes important. The initial information is contained in a square matrix of feature interconnections, in particular, in a correlation matrix.
The successful solution of the grouping task rests on the informal hypothesis that a small number of hidden factors determine the structure of interconnections between features. \nIn hierarchical algorithms, one effectively refuses to fix the number of clusters in advance and builds a complete tree of nested clusters (a so-called dendrogram). The number of clusters is then determined from assumptions that, in principle, are not related to the operation of the algorithm, for example, from the dynamics of the threshold for splitting (merging) clusters. The difficulties of such algorithms are well studied: the choice of measures of proximity between clusters, the problem of index inversions in dendrograms, and the inflexibility of hierarchical classifications, which is sometimes undesirable. Nevertheless, representing a clustering as a dendrogram gives the most complete picture of the cluster structure. \nHierarchical algorithms are associated with the construction of dendrograms and are divided into: \n1. agglomerative algorithms, characterized by the consecutive merging of initial elements and the corresponding reduction of the number of clusters (building clusters from the bottom up);",
        "chapter": "1 The Cluster Analysis in Big Data Mining",
        "section": "1.2 Cluster Analysis, Problem Definition. Criteria of Quality and Metrics",
        "subsection": "N/A",
        "subsubsection": "N/A"
    },
    {
        "content": "2. divisive (divisional) algorithms, in which the number of clusters increases starting from a single cluster, so that a sequence of splitting groups is constructed (building clusters from the top down). \n1.3.1 Hierarchical Algorithms. Agglomerative Algorithms \nAt the first step, each of the $m$ objects of the initial set $I$ forms a separate cluster, so the whole set is represented as a set of $m$ one-element clusters. \nAt the next step, the two clusters closest to each other (for example, $c_p$ and $c_q$) are chosen and merged into one joint cluster. The new set then consists of $m - 1$ clusters. \nRepeating this process, we obtain step by step consecutive sets consisting of $(m - 2)$, $(m - 3)$, $(m - 4)$, etc. clusters. \nAt the end of the procedure, a single cluster consisting of all $m$ objects and coinciding with the initial set $I$ is obtained. \nDifferent metrics can be chosen to determine the distance between clusters; depending on this choice, algorithms with various properties are obtained. \nThere are several methods of recalculating distances from the old distance values of the merged clusters; they differ in the coefficients of the formula \n$$d(r, s) = \\alpha_p d(p, s) + \\alpha_q d(q, s) + \\beta d(p, q) + \\gamma \\left| d(p, s) - d(q, s) \\right|.$$ \nIf clusters $p$ and $q$ are merged into one cluster $r$ and the distance from the new cluster to some cluster $s$ has to be calculated, the choice of method depends on how the distance between clusters is defined; the methods differ in the values of the coefficients $\\alpha_p, \\alpha_q, \\beta, \\gamma$. \nThe coefficients $\\alpha_p, \\alpha_q, \\beta, \\gamma$ for recalculating distances between clusters are given in Table 1.2. \n1.3.2 Divisional Algorithms \nUnlike agglomerative algorithms, divisive cluster algorithms at the first step represent the whole set of elements $I$ as a single cluster. At each step of the algorithm, one of the existing clusters is recursively divided into two child clusters. Thus, clusters are iteratively formed from the top down.
This approach is described in the cluster analysis literature in less detail than agglomerative algorithms. It is applied when the whole set of objects has to be divided into a relatively small number of clusters.",
        "chapter": "1 The Cluster Analysis in Big Data Mining",
        "section": "1.3 Classification of Algorithms of Cluster Analysis",
        "subsection": "1.3.1 Hierarchical Algorithms. Agglomerative Algorithms",
        "subsubsection": "N/A"
    },
    {
        "content": "2. divisive (divisional) algorithms, in which the number of clusters increases starting from a single cluster, so that a sequence of splitting groups is constructed (building clusters from the top down). \n1.3.1 Hierarchical Algorithms. Agglomerative Algorithms \nAt the first step, each of the $m$ objects of the initial set $I$ forms a separate cluster, so the whole set is represented as a set of $m$ one-element clusters. \nAt the next step, the two clusters closest to each other (for example, $c_p$ and $c_q$) are chosen and merged into one joint cluster. The new set then consists of $m - 1$ clusters. \nRepeating this process, we obtain step by step consecutive sets consisting of $(m - 2)$, $(m - 3)$, $(m - 4)$, etc. clusters. \nAt the end of the procedure, a single cluster consisting of all $m$ objects and coinciding with the initial set $I$ is obtained. \nDifferent metrics can be chosen to determine the distance between clusters; depending on this choice, algorithms with various properties are obtained. \nThere are several methods of recalculating distances from the old distance values of the merged clusters; they differ in the coefficients of the formula \n$$d(r, s) = \\alpha_p d(p, s) + \\alpha_q d(q, s) + \\beta d(p, q) + \\gamma \\left| d(p, s) - d(q, s) \\right|.$$ \nIf clusters $p$ and $q$ are merged into one cluster $r$ and the distance from the new cluster to some cluster $s$ has to be calculated, the choice of method depends on how the distance between clusters is defined; the methods differ in the values of the coefficients $\\alpha_p, \\alpha_q, \\beta, \\gamma$. \nThe coefficients $\\alpha_p, \\alpha_q, \\beta, \\gamma$ for recalculating distances between clusters are given in Table 1.2. \n1.3.2 Divisional Algorithms \nUnlike agglomerative algorithms, divisive cluster algorithms at the first step represent the whole set of elements $I$ as a single cluster. At each step of the algorithm, one of the existing clusters is recursively divided into two child clusters. Thus, clusters are iteratively formed from the top down.
This approach is described in the cluster analysis literature in less detail than agglomerative algorithms. It is applied when the whole set of objects has to be divided into a relatively small number of clusters. \n\nOne of the first divisive algorithms was proposed by Macnaughton-Smith in 1965 [2]. \nAt the first step, all elements are placed in one cluster $C_1 = I$. \nThen the element whose average distance to the other elements of this cluster is the greatest is selected. The average value can be calculated, for example, by the formula \n$$\\bar{d}_i = \\frac{1}{m - 1} \\sum_{j \\in C_1, j \\neq i} d_{ij},$$ \nwhere $d_{ij}$ is the distance between elements $i$ and $j$. \nThe chosen element is removed from cluster $C_1$ and becomes the first member of the second cluster $C_2$. \nAt each subsequent step, the element of $C_1$ for which the difference between its average distance to the other elements remaining in $C_1$ and its average distance to the elements of $C_2$ is the greatest is transferred to $C_2$. Elements are transferred from $C_1$ to $C_2$ until all these differences become negative, i.e., as long as $C_1$ contains elements that are closer to the elements of $C_2$ than to the other elements of $C_1$. \n\nAs a result, one cluster is divided into two child clusters, which will be split at the next level of the hierarchy. At each subsequent level, the division procedure is applied to one of the clusters obtained at the previous level. The cluster to be split can be chosen in different ways. \nIn 1990, Kaufman and Rousseeuw suggested choosing at each level the cluster with the greatest diameter for splitting, where the diameter is calculated by the formula [2] \n$$\\operatorname{diam}(C) = \\max_{i, j \\in C} d_{ij}.$$ \nRecursive division of clusters continues until either all clusters become singletons (i.e., consist of one object) or all members of each cluster have zero dissimilarity from each other. \n1.3.3 Not Hierarchical Algorithms \nAlgorithms based on searching for a splitting of a data set into clusters (groups) have become very popular for solving clustering problems.
Partitioning algorithms are used in many tasks owing to their advantages. These algorithms try to group data into clusters so that the criterion function of the splitting algorithm reaches an extremum (minimum). We consider three main clustering algorithms based on splitting methods. These algorithms use the following basic concepts: \n• the training set (input data set) $M$ on which the splitting is based; \n• the distance metric \n$$d_A(m_j, c^{(i)}) = \\left\\| m_j - c^{(i)} \\right\\|_A = \\sqrt{(m_j - c^{(i)})^{T} A \\, (m_j - c^{(i)})},$$ \nwhere the matrix $A$ defines the way the distance is calculated; for example, for the identity matrix the Euclidean distance is obtained; \n• the vector of cluster centers $C$; \n• the splitting matrix $U$; \n• the goal function $J = J(M, d, C, U)$; \n• a set of restrictions. \nDescription of $K$-means Algorithm \nThe basic definitions and concepts of this algorithm are as follows: \n• the training set $M = \\{m_j\\}_{j=1}^{d}$, where $d$ is the number of data points (vectors); \n• the distance metric calculated by formula (1.6); \n• the vector of cluster centers $C = \\{c^{(i)}\\}_{i=1}^{c}$",
        "chapter": "1 The Cluster Analysis in Big Data Mining",
        "section": "1.3 Classification of Algorithms of Cluster Analysis",
        "subsection": "1.3.2 Divisional Algorithms",
        "subsubsection": "N/A"
    },
    {
        "content": "As a result, one cluster is divided into two child clusters, which will be split at the next level of the hierarchy. At each subsequent level, the division procedure is applied to one of the clusters obtained at the previous level. The cluster to be split can be chosen in different ways. \nIn 1990, Kaufman and Rousseeuw suggested choosing at each level the cluster with the greatest diameter for splitting, where the diameter is calculated by the formula [2] \n$$\\operatorname{diam}(C) = \\max_{i, j \\in C} d_{ij}.$$ \nRecursive division of clusters continues until either all clusters become singletons (i.e., consist of one object) or all members of each cluster have zero dissimilarity from each other. \n1.3.3 Not Hierarchical Algorithms \nAlgorithms based on searching for a splitting of a data set into clusters (groups) have become very popular for solving clustering problems. Partitioning algorithms are used in many tasks owing to their advantages. These algorithms try to group data into clusters so that the criterion function of the splitting algorithm reaches an extremum (minimum). We consider three main clustering algorithms based on splitting methods. These algorithms use the following basic concepts: \n• the training set (input data set) $M$ on which the splitting is based; \n• the distance metric \n$$d_A(m_j, c^{(i)}) = \\left\\| m_j - c^{(i)} \\right\\|_A = \\sqrt{(m_j - c^{(i)})^{T} A \\, (m_j - c^{(i)})},$$ \nwhere the matrix $A$ defines the way the distance is calculated; for example, for the identity matrix the Euclidean distance is obtained; \n• the vector of cluster centers $C$; \n• the splitting matrix $U$; \n• the goal function $J = J(M, d, C, U)$; \n• a set of restrictions. 
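The distance with matrix A mentioned above is usually written as d_A(x, c) = sqrt((x − c)^T A (x − c)). Assuming that this is the form of formula (1.6), the Python sketch below evaluates it for an arbitrary symmetric positive definite A: with A equal to the identity matrix it reduces to the Euclidean distance, and with A equal to the inverse covariance matrix of the data it gives the Mahalanobis distance. The helper name a_norm_distance is illustrative only.

```python
import math


def a_norm_distance(x, c, a):
    # d_A(x, c) = sqrt((x - c)^T A (x - c)); A is assumed symmetric positive definite
    diff = [xi - ci for xi, ci in zip(x, c)]
    n = len(diff)
    # Quadratic form: sum over i, j of diff[i] * A[i][j] * diff[j]
    q = sum(diff[i] * a[i][j] * diff[j] for i in range(n) for j in range(n))
    return math.sqrt(q)


identity = [[1.0, 0.0], [0.0, 1.0]]  # A = I gives the Euclidean distance
weighted = [[4.0, 0.0], [0.0, 1.0]]  # a diagonal A weights the features differently

print(a_norm_distance([3.0, 4.0], [0.0, 0.0], identity))
print(a_norm_distance([3.0, 4.0], [0.0, 0.0], weighted))
```

For x = (3, 4) and c = (0, 0), the identity matrix gives 5, while A = diag(4, 1), which weights the first feature more heavily, gives sqrt(52) ≈ 7.21.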
\nDescription of $K$-means Algorithm \nThe basic definitions and concepts of this algorithm are as follows: \n• the training set $M = \\{m_j\\}_{j=1}^{d}$, where $d$ is the number of data points (vectors); \n• the distance metric calculated by formula (1.6); \n• the vector of cluster centers $C = \\{c^{(i)}\\}_{i=1}^{c}$, where $c^{(i)}$ is the center of the $i$-th cluster; \n• the splitting matrix $U = \\{u_{ij}\\}$, where $u_{ij} = 1$ if the data vector $m_j$ belongs to the $i$-th cluster and $u_{ij} = 0$ otherwise; \n• the objective function \n$$J(M, d, C, U) = \\sum_{i=1}^{c} \\sum_{j=1}^{d} u_{ij} \\, d_A^{2}(m_j, c^{(i)});$$ \n• the set of restrictions \n$$u_{ij} \\in \\{0, 1\\}; \\quad \\sum_{i=1}^{c} u_{ij} = 1, \\; j = 1, \\dots, d; \\quad 0 < \\sum_{j=1}^{d} u_{ij} < d, \\; i = 1, \\dots, c,$$ \nwhich means that each data vector belongs to exactly one cluster and does not belong to the others, and that each cluster contains at least one point but fewer than the total number of points. \nStructurally, the algorithm is the following iterative procedure [1]. \nStep 1. Initialize the splitting (for example, randomly), choose the accuracy value $\\delta$ (used in the stopping condition of the algorithm), and set the iteration number $l = 0$. \nStep 2. Determine the cluster centers from the current splitting matrix $U^{(l)}$ by the formula \n$$c^{(i)} = \\frac{\\sum_{j=1}^{d} u_{ij} m_j}{\\sum_{j=1}^{d} u_{ij}}, \\quad i = 1, \\dots, c.$$ \nStep 3. Update the splitting matrix so as to minimize the sum of squared errors: assign each vector $m_j$ to the cluster with the nearest center, i.e., set $u_{ij} = 1$ for $i = \\arg\\min_{k} d_A(m_j, c^{(k)})$ and $u_{ij} = 0$ otherwise; denote the new matrix by $U^{(l+1)}$. \nStep 4. Check the condition $\\left\\| U^{(l+1)} - U^{(l)} \\right\\| < \\delta$. If it is satisfied, stop; otherwise, set $l = l + 1$ and go to Step 2. \nThe main shortcoming of this algorithm, owing to the discrete nature of the elements of the splitting matrix, is the large size of the splitting space. One way to overcome this shortcoming is to choose the elements of the splitting matrix from the unit interval. That is, the membership of a data element in a cluster is defined by a membership function: a data element can belong to several clusters with various degrees of membership. In this case we arrive at the problem of fuzzy clustering. This approach is embodied in the fuzzy clustering algorithm known as the fuzzy K-means method (Fuzzy C-Means). 
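Steps 1–4 of the crisp K-means procedure can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the Euclidean distance (A = identity) is used, the splitting matrix U is stored as a list of cluster labels (u_ij = 1 exactly when point j has label i), and because U is binary the stopping test of step 4 is implemented as “U did not change”. The names hard_c_means and sq_dist, and the rule that an emptied cluster keeps its previous center, are choices made for this example, not part of the original text.

```python
import random


def sq_dist(p, q):
    # Squared Euclidean distance, i.e. the squared A-norm distance with A = identity
    return sum((a - b) ** 2 for a, b in zip(p, q))


def hard_c_means(points, c, max_iter=100, seed=0):
    d = len(points)  # number of data points, as in the text
    if not 1 <= c <= d:
        raise ValueError('need 1 <= c <= number of points')
    rng = random.Random(seed)
    # Step 1: random initial splitting in which every cluster gets at least one point
    labels = list(range(c)) + [rng.randrange(c) for _ in range(d - c)]
    rng.shuffle(labels)
    centers = [None] * c
    for _ in range(max_iter):
        # Step 2: each center is the mean of the points currently assigned to it
        for i in range(c):
            members = [p for p, lab in zip(points, labels) if lab == i]
            if members:  # an emptied cluster keeps its previous center
                centers[i] = [sum(col) / len(members) for col in zip(*members)]
        # Step 3: assign every point to its nearest center (u_ij = 1 for that i only)
        new_labels = [min(range(c), key=lambda i: sq_dist(p, centers[i])) for p in points]
        # Step 4: labels encode the 0/1 matrix U, so unchanged labels mean U is stable
        if new_labels == labels:
            break
        labels = new_labels
    return labels, centers


pts = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
labels, centers = hard_c_means(pts, 2)
print(labels, centers)
```

On two well-separated groups such as the six points above, the procedure converges in a few iterations to the natural split, with each center at the mean of its group.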
\n\n1.4 Fuzzy C-Means Method \nConsider a self-organizing neural network whose training is performed without a teacher. The self-organization algorithm assigns a vector $x$ to the corresponding data cluster, represented by its center, using competitive learning. \nThe basic form of the self-organization algorithm makes it possible to find the exact positions of the centers of the relevant groups (clusters) into which the multidimensional data space is split. These centers can then be used as initial values in a hybrid training algorithm for fuzzy neural networks (FNNs), which considerably accelerates training and, according to [3], guarantees convergence to a global minimum. \n1.4.1 Algorithm of Fuzzy C-Means \nAssume that the network contains $m$ fuzzy neurons with centers at the points $c_j$ $(j = 1, 2, \\dots, m)$. The initial values of these centers can be chosen randomly from the ranges of admissible values of the corresponding components of the training vectors $x_k$ $(k = 1, 2, \\dots, N)$. Let the fuzzification function be given in the form of the generalized Gaussian function expressed by formula (1.8). \nAn input vector $x_k$ belongs to the various groups represented by the centers $c_j$ with degrees $w_{kj}$, where $0 < w_{kj} < 1$, and its total degree of membership in all groups is obviously equal to 1. Therefore \n$$\\sum_{j=1}^{m} w_{kj} = 1 \\qquad (1.13)$$ \nfor all $k = 1, 2, \\dots, N$. \nThe error function corresponding to this representation can be defined as the sum of the individual errors of membership in the centers $c_j$, taking into account the degree of fuzziness $\\beta$. Therefore [4]",
        "chapter": "1 The Cluster Analysis in Big Data Mining",
        "section": "1.3 Classification of Algorithms of Cluster Analysis",
        "subsection": "1.3.3 Not Hierarchical Algorithms",
        "subsubsection": "N/A"
    },
    {
        "content": "1.4 Fuzzy C-Means Method \nConsider a self-organizing neural network whose training is performed without a teacher. The self-organization algorithm assigns a vector $x$ to the corresponding data cluster, represented by its center, using competitive learning. \nThe basic form of the self-organization algorithm makes it possible to find the exact positions of the centers of the relevant groups (clusters) into which the multidimensional data space is split. These centers can then be used as initial values in a hybrid training algorithm for fuzzy neural networks (FNNs), which considerably accelerates training and, according to [3], guarantees convergence to a global minimum. \n1.4.1 Algorithm of Fuzzy C-Means \nAssume that the network contains $m$ fuzzy neurons with centers at the points $c_j$ $(j = 1, 2, \\dots, m)$. The initial values of these centers can be chosen randomly from the ranges of admissible values of the corresponding components of the training vectors $x_k$ $(k = 1, 2, \\dots, N)$. Let the fuzzification function be given in the form of the generalized Gaussian function expressed by formula (1.8). \nAn input vector $x_k$ belongs to the various groups represented by the centers $c_j$ with degrees $w_{kj}$, where $0 < w_{kj} < 1$, and its total degree of membership in all groups is obviously equal to 1. Therefore \n$$\\sum_{j=1}^{m} w_{kj} = 1 \\qquad (1.13)$$ \nfor all $k = 1, 2, \\dots, N$. \nThe error function corresponding to this representation can be defined as the sum of the individual errors of membership in the centers $c_j$, taking into account the degree of fuzziness $\\beta$. Therefore [4] \n$$E = \\sum_{j=1}^{m} \\sum_{k=1}^{N} w_{kj}^{\\beta} \\left\\| c_j - x_k \\right\\|^{2}, \\qquad (1.14)$$ \nwhere $\\beta$ is a weighting exponent that takes values in the interval $(1, \\infty)$. 
The goal of self-organizing training is to select centers $c_j$ such that, for the whole set of training vectors $x_k$, the minimum of function (1.14) is attained while conditions (1.13) are satisfied. Thus, this is a problem of minimizing the nonlinear function (1.14) subject to $N$ constraints of type (1.13). Its solution can be reduced to minimizing the Lagrange function [4] \n$$L_E = \\sum_{j=1}^{m} \\sum_{k=1}^{N} w_{kj}^{\\beta} \\left\\| c_j - x_k \\right\\|^{2} + \\sum_{k=1}^{N} \\lambda_k \\left( \\sum_{j=1}^{m} w_{kj} - 1 \\right), \\qquad (1.15)$$ \nwhere $\\lambda_k$ $(k = 1, 2, \\dots, N)$ are Lagrange multipliers. It can be shown that the solution of problem (1.15) can be written in the form \n$$c_j = \\frac{\\sum_{k=1}^{N} w_{kj}^{\\beta} x_k}{\\sum_{k=1}^{N} w_{kj}^{\\beta}}, \\qquad (1.16)$$ \n$$w_{kj} = \\frac{1}{\\sum_{i=1}^{m} \\left( d_{kj} / d_{ki} \\right)^{2/(\\beta - 1)}}, \\qquad (1.17)$$ \nwhere $d_{kj}$ is the Euclidean distance between the center $c_j$ and the vector $x_k$, $d_{kj} = \\left\\| c_j - x_k \\right\\|$. Since the exact values of the centers $c_j$ are not known at the beginning of the process, the training algorithm has to be iterative. It can be formulated as follows: \n1. Randomly initialize the coefficients $w_{kj}$, choosing their values from the interval [0, 1] so that condition (1.13) is satisfied.   \n2. Determine the $m$ centers $c_j$ according to (1.16).   \n3. Calculate the value of the error function according to (1.14). If it falls below the established threshold, or if its reduction relative to the previous iteration is negligible, stop the calculations; the last values of the centers represent the required solution. Otherwise, go to step 4.   \n4. Calculate new values of $w_{kj}$ by formula (1.17) and go to step 2. \nThis procedure is called the fuzzy self-organization C-means algorithm. \nRepeating the iterative procedure leads to a minimum of function $E$, which is not necessarily the global minimum. The quality of the centers found, as estimated by the value of the error function $E$, depends essentially on the preliminary selection of the values $w_{kj}$ and the centers $c_j$.
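The four-step fuzzy C-means procedure above can be sketched in pure Python as follows. The sketch assumes the standard forms of the update rules: each center c_j is the mean of the vectors x_k weighted by w_kj^β, and each membership is w_kj = 1 / Σ_i (d_kj / d_ki)^(2/(β − 1)). It is illustrative rather than the book’s implementation: the function name, the default β = 2, the tolerance eps used for the “negligible reduction” test of step 3, and the rule that a vector coinciding with a center receives full membership in it are all assumptions.

```python
import random


def fuzzy_c_means(x, m, beta=2.0, eps=1e-6, max_iter=300, seed=0):
    # x: list of N training vectors; m: number of centers; beta > 1: fuzziness exponent
    rng = random.Random(seed)
    n, dim = len(x), len(x[0])
    # Step 1: random memberships, each row normalized so that it sums to 1
    w = []
    for _ in range(n):
        row = [rng.uniform(0.1, 1.0) for _ in range(m)]
        total = sum(row)
        w.append([v / total for v in row])
    prev_e = float('inf')
    centers = []
    for _ in range(max_iter):
        # Step 2: centers as means of x_k weighted by w_kj ** beta
        centers = []
        for j in range(m):
            wb = [w[k][j] ** beta for k in range(n)]
            s = sum(wb)
            centers.append([sum(wb[k] * x[k][t] for k in range(n)) / s for t in range(dim)])
        # Squared Euclidean distances d_kj ** 2 between x_k and c_j
        d2 = [[sum((x[k][t] - centers[j][t]) ** 2 for t in range(dim)) for j in range(m)]
              for k in range(n)]
        # Step 3: error function E; stop when its reduction becomes negligible
        e = sum(w[k][j] ** beta * d2[k][j] for k in range(n) for j in range(m))
        if prev_e - e < eps:
            break
        prev_e = e
        # Step 4: w_kj = 1 / sum_i (d_kj / d_ki) ** (2 / (beta - 1)), then back to step 2
        for k in range(n):
            zeros = [j for j in range(m) if d2[k][j] == 0.0]
            if zeros:  # x_k coincides with a center: give it full membership there
                w[k] = [1.0 if j == zeros[0] else 0.0 for j in range(m)]
            else:
                w[k] = [1.0 / sum((d2[k][j] / d2[k][i]) ** (1.0 / (beta - 1)) for i in range(m))
                        for j in range(m)]
    return centers, w


pts = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]]
centers, w = fuzzy_c_means(pts, 2)
print(centers)
```

Because the result depends on the random initial memberships, different seeds can lead to different local minima of E, which is why a good initialization of the centers matters.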
The best placement of the centers is one in which they settle in the regions containing the greatest number of the presented vectors $x_j$; with such a choice, the centers represent the data vectors $x_j$ with the smallest total error. \nTherefore, the iterative procedure for computing the optimal values of the centers has to be preceded by their initialization. The best-known initialization algorithms are the peak grouping and differential grouping of data. \n\n1.4.2 Definition of Initial Location of the Centers of Clusters \nAlgorithm of Peak Grouping \nThe peak grouping algorithm was proposed by Yager and Filev [4, 5]. \nFor $N$ input vectors, a special grid that evenly covers the space of these vectors is constructed. The nodes of this grid are considered as potential centers $\\vartheta$, and the peak function is calculated for each of them: \nwhere $\\sigma$ is a constant selected separately for each specific task. \nThe value $m(\\vartheta)$ is regarded as an estimate of the height of the peak function. It is proportional to the number of vectors $x_j$ that fall in the vicinity of the potential center $\\vartheta$. A large value of $m(\\vartheta)$ indicates that the center $\\vartheta$ lies in the region where the greatest number of the vectors $\\{x_k\\}$ is concentrated. \nThe coefficient $\\sigma$ has only a slight influence on the final proportions between $m(\\vartheta)$ and $\\vartheta$. \nAfter the values $m(\\vartheta)$ have been calculated for all potential centers, the first center $c_1$ is selected as the one with the greatest value of $m(\\vartheta)$. To choose the following centers, it is necessary to exclude $c_1$ and the nodes located in close proximity to $c_1$. \nThis can be done by redefining the peak function: a Gaussian function centered at the point $c_1$ is subtracted from it. 
Denoting this new function by $m_{new}(\\vartheta)$, we obtain: \nNote that this function equals zero at the point $c_1$. \nThe same procedure is then repeated to find the next center $c_2$, and so on. \nThe following centers $c_2, c_3, \\dots$ are found sequentially from the modified values $m_{new}(\\vartheta)$, which are obtained by excluding the nearest neighbors of the center found at the previous stage. The process ends when all the centers have been located. \nThe peak grouping method is effective only when the dimension of the vectors $x$ is not too large; otherwise, the number of potential centers (grid nodes) grows explosively. \nAlgorithm of Differential Grouping \nThe differential grouping algorithm is a modification of the previous algorithm in which the vectors $x_j$ themselves are considered as the potential centers $\\vartheta$. In this case the peak function $D(x_i)$ takes the form [5]:",
        "chapter": "1 The Cluster Analysis in Big Data Mining",
        "section": "1.4 Fuzzy C-Means Method",
        "subsection": "1.4.1 Algorithm of Fuzzy C-Means",
        "subsubsection": "N/A"
    },
    {
        "content": "1.4.2 Definition of Initial Location of the Centers of Clusters \nAlgorithm of Peak Grouping \nThe peak grouping algorithm was proposed by Yager and Filev [4, 5]. \nFor $N$ input vectors, a special grid that evenly covers the space of these vectors is constructed. The nodes of this grid are considered as potential centers $\\vartheta$, and the peak function is calculated for each of them: \nwhere $\\sigma$ is a constant selected separately for each specific task. \nThe value $m(\\vartheta)$ is regarded as an estimate of the height of the peak function. It is proportional to the number of vectors $x_j$ that fall in the vicinity of the potential center $\\vartheta$. A large value of $m(\\vartheta)$ indicates that the center $\\vartheta$ lies in the region where the greatest number of the vectors $\\{x_k\\}$ is concentrated. \nThe coefficient $\\sigma$ has only a slight influence on the final proportions between $m(\\vartheta)$ and $\\vartheta$. \nAfter the values $m(\\vartheta)$ have been calculated for all potential centers, the first center $c_1$ is selected as the one with the greatest value of $m(\\vartheta)$. To choose the following centers, it is necessary to exclude $c_1$ and the nodes located in close proximity to $c_1$. \nThis can be done by redefining the peak function: a Gaussian function centered at the point $c_1$ is subtracted from it. Denoting this new function by $m_{new}(\\vartheta)$, we obtain: \nNote that this function equals zero at the point $c_1$. \nThe same procedure is then repeated to find the next center $c_2$, and so on. \nThe following centers $c_2, c_3, \\dots$ are found sequentially from the modified values $m_{new}(\\vartheta)$, which are obtained by excluding the nearest neighbors of the center found at the previous stage. The process ends when all the centers have been located. 
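Peak grouping on a regular grid can be sketched as below. The peak function itself is not reproduced in the text, so the sketch assumes a Gaussian form, m(ϑ) = Σ_k exp(−‖ϑ − x_k‖² / (2σ²)), and a subtraction of m(c_1) times a Gaussian centred at the chosen center, which makes the new function zero there as stated above. The helper `mountain_centers` and its parameters `grid_step` and `sigma` are hypothetical.

```python
import itertools
import math


def mountain_centers(data, n_centers, grid_step=0.5, sigma=1.0):
    # Hypothetical helper: peak grouping with grid nodes as candidate centers.
    dim = len(data[0])
    axes = []
    for i in range(dim):
        lo = min(x[i] for x in data)
        hi = max(x[i] for x in data)
        count = int(round((hi - lo) / grid_step)) + 1
        axes.append([lo + t * grid_step for t in range(count)])
    nodes = list(itertools.product(*axes))  # size grows as count ** dim

    def gauss(a, b):
        return math.exp(-sum((ai - bi) ** 2 for ai, bi in zip(a, b)) / (2.0 * sigma ** 2))

    # Height of the peak function: large where many vectors lie near the node.
    heights = [sum(gauss(v, x) for x in data) for v in nodes]
    centers = []
    for _ in range(n_centers):
        best = max(range(len(nodes)), key=lambda t: heights[t])
        c, peak = nodes[best], heights[best]
        centers.append(c)
        # Subtract a Gaussian centred at c, so the new function is zero at c.
        heights = [h - peak * gauss(v, c) for h, v in zip(heights, nodes)]
    return centers


blobs = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0),
         (10.0, 10.0), (10.0, 11.0), (11.0, 10.0), (11.0, 11.0)]
print(sorted(mountain_centers(blobs, 2)))  # [(0.5, 0.5), (10.5, 10.5)]
```

The node list grows as the number of grid points per axis raised to the dimension, which is the explosive growth that limits this method to low-dimensional data.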
\nThe peak grouping method is effective only when the dimension of the vectors $x$ is not too large; otherwise, the number of potential centers (grid nodes) grows explosively. \nAlgorithm of Differential Grouping \nThe differential grouping algorithm is a modification of the previous algorithm in which the vectors $x_j$ themselves are considered as the potential centers $\\vartheta$. In this case the peak function $D(x_i)$ takes the form [5]: \nwhere the coefficient $r_a$ defines the radius of the neighborhood sphere: only the vectors $x_j$ lying inside this sphere significantly influence the value $D(x_i)$. \nWhen the density of points near $x_i$ is high, the value $D(x_i)$ is large. After the peak function has been calculated for each point $x_i$, the vector $x$ with the greatest density measure $D(x)$ is found. This point becomes the first center $c_1$. \nThe next center $c_2$ is chosen after excluding the previous center and all points lying in its vicinity. \nAs in the previous case, the peak function is redefined as \nIn the new definition of the function $D$, the coefficient $r_b$ denotes the new value of the constant that sets the neighborhood sphere of the next center. Usually the condition $r_b \\ge r_a$ is used. \nAfter the peak function has been modified, a new point $x$ for which $D_{new}(x_i) \\to \\max$ is searched for; it becomes the new center. \nThe search for the next center is resumed after excluding all the points already selected. Initialization ends when all the centers required by the initial conditions have been fixed. 
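Differential grouping admits a similar sketch, with the data points themselves as candidate centers. The density measure is not reproduced in the text, so the sketch assumes the common subtractive-clustering form D(x_i) = Σ_j exp(−‖x_i − x_j‖² / (r_a/2)²) and the analogous subtraction with radius r_b; the default r_b = 1.5 r_a (which satisfies r_b ≥ r_a), the helper name and the toy points are assumptions.

```python
import math


def subtractive_centers(data, n_centers, ra=1.0, rb=None):
    # Hypothetical helper: differential grouping, data points are the candidates.
    rb = 1.5 * ra if rb is None else rb  # assumed default, keeps rb >= ra
    alpha = 4.0 / ra ** 2  # equals 1 / (ra / 2) ** 2
    gamma = 4.0 / rb ** 2

    def sqdist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

    # Density measure D(x_i): large when many points lie within about ra of x_i.
    density = [sum(math.exp(-alpha * sqdist(xi, xj)) for xj in data) for xi in data]
    centers = []
    for _ in range(n_centers):
        best = max(range(len(data)), key=lambda i: density[i])
        c, peak = data[best], density[best]
        centers.append(c)
        # Redefine D: remove the influence of c and its neighbourhood of radius about rb.
        density = [d - peak * math.exp(-gamma * sqdist(x, c)) for d, x in zip(density, data)]
    return centers


points = [(0.0, 0.0), (0.1, 0.0), (-0.1, 0.0), (0.0, 0.1), (0.0, -0.1),
          (5.0, 5.0), (5.1, 5.0), (4.9, 5.0), (5.0, 5.1), (5.0, 4.9)]
print(sorted(subtractive_centers(points, 2)))  # [(0.0, 0.0), (5.0, 5.0)]
```

Since only the N data points are evaluated, the cost grows with N² rather than with the number of grid nodes, which is why this variant copes better with higher dimensions.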
\n1.5 Gustafson-Kessel’s Fuzzy Cluster Analysis Algorithm \nIn the classical fuzzy C-means algorithm, the elements of the error function $E$ are obtained using the ordinary Euclidean distance between a vector $x$ and the cluster center $c$: \nWith this metric, the set of points equidistant from the center is a sphere with the same scale on all axes. But if the data form groups whose shape differs from spherical, or if the scales of individual vector coordinates differ strongly, this metric becomes inadequate. In this case, the quality of clustering can be increased considerably by an improved version of the self-organization algorithm, called the Gustafson-Kessel algorithm [3, 4]. \nThe main change to the basic fuzzy C-means algorithm is the introduction of a scaling matrix $A$ into the distance formula. With such scaling, the distance between the center $c$ and a vector $x$ is defined by the formula:",
        "chapter": "1 The Cluster Analysis in Big Data Mining",
        "section": "1.4 Fuzzy C-Means Method",
        "subsection": "1.4.2 Definition of Initial Location of the Centers of Clusters",
        "subsubsection": "N/A"
    },
    {
        "content": "A positive-definite matrix, that is, a matrix whose eigenvalues are all real and positive, is usually used for scaling. \nSimilar to the basic C-means algorithm, the training goal of the Gustafson-Kessel algorithm is to place the centers so that the criterion $E$ is minimized: \n1.5.1 Description of Gustafson-Kessel Algorithm \n1. Perform the initial placement of the centers in the data space. Create an elementary form of the scaling matrix $A$ (for example, the identity matrix). \n2. Compute the matrix of membership coefficients of all vectors $x$ to the centers by the formula: \n3. Calculate the new placement of the centers by the formula: \n4. Form the covariance matrix for each center: \n5. Calculate a new scaling matrix for each $j$-th center by the formula: \n6. If the last changes of the centers and covariance matrices are small relative to their previous values (do not exceed the set thresholds), finish the iterative process; otherwise, go to step 2. \n1.6 Adaptive Robust Clustering Algorithms \n1.6.1 Possibilistic Clustering Algorithm \nThe major drawbacks of the probabilistic approach (the fuzzy C-means algorithm) are connected with constraints (1.13). In the simplest case of two clusters $(m = 2)$, it is easy to see that an observation $x_k$ belonging equally to both clusters and an observation $x_p$ belonging to neither of them may have the same membership levels $w_{k,1} = w_{k,2} = w_{p,1} = w_{p,2} = 0.5$. \nSince this decreases the accuracy of classification, it led to the possibilistic approach to fuzzy classification [5]. In the possibilistic clustering algorithm, the goal function has the form \nwhere the scalar parameter $\\mu_j > 0$ determines the distance at which the membership level takes the value 0.5; that is, if $d^2(x_k, c_j) = \\mu_j$, then $w_{k,j} = 0.5$. 
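The two-cluster situation described above is easy to reproduce numerically. The sketch compares probabilistic memberships of the form (1.17) with possibilistic ones, assuming the usual possibilistic form w = 1 / (1 + (d²/μ_j)^(1/(β − 1))), which gives exactly 0.5 when d² = μ_j; the centers, the value μ_j = 1 and the function names are hypothetical.

```python
def probabilistic_memberships(d2, beta=2.0):
    # Form of Eq. (1.17) with squared distances; the values always sum to 1.
    return [1.0 / sum((dj / di) ** (1.0 / (beta - 1.0)) for di in d2) for dj in d2]


def possibilistic_memberships(d2, mu, beta=2.0):
    # Assumed possibilistic form: equals 0.5 when the squared distance equals mu_j.
    return [1.0 / (1.0 + (dj / mj) ** (1.0 / (beta - 1.0))) for dj, mj in zip(d2, mu)]


def sq_dist(a, b):
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


centers = [(-1.0, 0.0), (1.0, 0.0)]
mu = [1.0, 1.0]
x_k = (0.0, 0.0)   # lies midway between the two clusters
x_p = (0.0, 10.0)  # lies far from both clusters
for name, x in (('x_k', x_k), ('x_p', x_p)):
    d2 = [sq_dist(x, c) for c in centers]
    print(name, probabilistic_memberships(d2), possibilistic_memberships(d2, mu))
```

Both points receive 0.5 and 0.5 from the probabilistic formula, whereas the possibilistic memberships of the distant point x_p drop to 1/102, about 0.0098, because they are no longer forced to sum to 1.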
\nMinimizing (1.27) with respect to $w_{k,j}$, $c_j$, $\\mu_j$ gives the explicit solution \nIt can be seen that the possibilistic and probabilistic algorithms are very similar and pass into one another when expression (1.27) is replaced by formula (1.15), and vice versa. A common disadvantage of the algorithms considered is their computational complexity and inability to work in real time. Algorithm (1.15)–(1.17) begins with an initial (normalized random) partition matrix $W^0$. From its values, the initial set of prototypes $c_j^0$ is calculated, which is then used to calculate a new matrix $W^1$. The procedure continues, producing the sequence of solutions $c_j^1, W^2, \\dots, W^t, c_j^t, W^{t+1}$, and so on, until the difference $\\lVert W^{t+1} - W^t \\rVert$ becomes less than a preassigned threshold $\\varepsilon$. \nConsequently, the entire available data sample is processed repeatedly. \nThe solution obtained by a probabilistic algorithm is recommended as the initial condition for the possibilistic algorithm (1.28)–(1.30) [5]. The distance parameter $\\mu_j$ is initialized according to (1.30) from the results of the probabilistic algorithm.",
        "chapter": "1 The Cluster Analysis in Big Data Mining",
        "section": "1.5 Gustavson-Kessel’s Fuzzy Cluster Analysis Algorithm",
        "subsection": "1.5.1 Description of Gustavson-Kessel Algorithm",
        "subsubsection": "N/A"
    },
    {
        "content": "1.6 Adaptive Robust Clustering Algorithms \n1.6.1 Possibilistic Clustering Algorithm \nThe major drawbacks of the probabilistic approach (the fuzzy C-means algorithm) are connected with constraints (1.13). In the simplest case of two clusters $(m = 2)$, it is easy to see that an observation $x_k$ belonging equally to both clusters and an observation $x_p$ belonging to neither of them may have the same membership levels $w_{k,1} = w_{k,2} = w_{p,1} = w_{p,2} = 0.5$. \nSince this decreases the accuracy of classification, it led to the possibilistic approach to fuzzy classification [5]. In the possibilistic clustering algorithm, the goal function has the form \nwhere the scalar parameter $\\mu_j > 0$ determines the distance at which the membership level takes the value 0.5; that is, if $d^2(x_k, c_j) = \\mu_j$, then $w_{k,j} = 0.5$. \nMinimizing (1.27) with respect to $w_{k,j}$, $c_j$, $\\mu_j$ gives the explicit solution \nIt can be seen that the possibilistic and probabilistic algorithms are very similar and pass into one another when expression (1.27) is replaced by formula (1.15), and vice versa. A common disadvantage of the algorithms considered is their computational complexity and inability to work in real time. Algorithm (1.15)–(1.17) begins with an initial (normalized random) partition matrix $W^0$. From its values, the initial set of prototypes $c_j^0$ is calculated, which is then used to calculate a new matrix $W^1$. The procedure continues, producing the sequence of solutions $c_j^1, W^2, \\dots, W^t, c_j^t, W^{t+1}$, and so on, until the difference $\\lVert W^{t+1} - W^t \\rVert$ becomes less than a preassigned threshold $\\varepsilon$. \nConsequently, the entire available data sample is processed repeatedly. 
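The alternating scheme just described can be sketched for the possibilistic case, started from probabilistic memberships. Equations (1.28) to (1.30) are not reproduced in the text, so the sketch assumes their usual forms: memberships 1 / (1 + (d²/μ_j)^(1/(β − 1))), centers as w^β-weighted means, and μ_j as the w^β-weighted mean squared distance computed once from the probabilistic start. The helper name, the initial centers and the threshold are hypothetical, and the change of W is measured by its largest entry.

```python
def sq_dist(a, b):
    # Squared Euclidean distance with a tiny floor to avoid division by zero.
    return max(sum((ai - bi) ** 2 for ai, bi in zip(a, b)), 1e-12)


def possibilistic_c_means(data, centers, beta=2.0, eps=1e-6, max_iter=200):
    # Hypothetical helper: batch possibilistic clustering from given initial centers.
    n, m, dim = len(data), len(centers), len(data[0])
    p = 1.0 / (beta - 1.0)
    d2 = [[sq_dist(x, c) for c in centers] for x in data]
    # Probabilistic start (form of Eq. 1.17): each row sums to 1.
    w = [[1.0 / sum((row[j] / row[i]) ** p for i in range(m)) for j in range(m)] for row in d2]
    # Distance parameters mu_j: w ** beta weighted mean squared distance (assumed form of 1.30).
    mu = [sum(w[k][j] ** beta * d2[k][j] for k in range(n)) / sum(w[k][j] ** beta for k in range(n))
          for j in range(m)]
    for _ in range(max_iter):
        # Possibilistic memberships: no constraint that a row sums to 1.
        new_w = [[1.0 / (1.0 + (row[j] / mu[j]) ** p) for j in range(m)] for row in d2]
        centers = []
        for j in range(m):
            wb = [new_w[k][j] ** beta for k in range(n)]
            s = sum(wb)
            centers.append(tuple(sum(wb[k] * data[k][i] for k in range(n)) / s for i in range(dim)))
        d2 = [[sq_dist(x, c) for c in centers] for x in data]
        change = max(abs(a - b) for r_new, r_old in zip(new_w, w) for a, b in zip(r_new, r_old))
        w = new_w
        if change < eps:
            break
    return centers, w, mu


blobs = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0),
         (10.0, 10.0), (10.0, 11.0), (11.0, 10.0), (11.0, 11.0)]
centers, w, mu = possibilistic_c_means(blobs, [(1.0, 1.5), (9.0, 9.0)])
print(sorted(centers))  # close to (0.5, 0.5) and (10.5, 10.5)
```

Each pass re-reads the whole sample, which is exactly the batch cost that motivates the recurrent algorithms of Section 1.6.2.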
\nThe solution obtained by a probabilistic algorithm is recommended as the initial condition for the possibilistic algorithm (1.28)–(1.30) [5]. The distance parameter $\\mu_j$ is initialized according to (1.30) from the results of the probabilistic algorithm. \n1.6.2 Recurrent Fuzzy Clustering Algorithms \nAnalysis of (1.15) shows that, to calculate the membership levels $w_{k,j}$, a local modification of the Lagrangian (1.15) can be used instead: \nOptimization of expression (1.31) by the Arrow-Hurwicz-Uzawa procedure leads to the algorithm \nProcedure (1.32), (1.33) is close to the Chung-Lee learning algorithm and, for $\\beta = 2$, coincides with the Park-Dagher gradient clustering procedure [6]: \nWithin the framework of the possibilistic approach, the local criterion takes the form \nand the result of its optimization has the form \nwhere the distance parameter $\\mu_j$ is initialized according to (1.30). \nIn this case, $N$ in Eq. (1.30) is the size of the data set used for initialization. \nIn the quadratic case, algorithm (1.37), (1.38) turns into a rather simple procedure, and the optimization result has the form",
        "chapter": "1 The Cluster Analysis in Big Data Mining",
        "section": "1.6 Adaptive Robust Clustering Algorithms",
        "subsection": "1.6.1 Possibilistic Clustering Algorithm",
        "subsubsection": "N/A"
    },
    {
        "content": "1.6.2 Recurrent Fuzzy Clustering Algorithms \nAnalysis of (1.15) shows that, to calculate the membership levels $w_{k,j}$, a local modification of the Lagrangian (1.15) can be used instead: \nOptimization of expression (1.31) by the Arrow-Hurwicz-Uzawa procedure leads to the algorithm \nProcedure (1.32), (1.33) is close to the Chung-Lee learning algorithm and, for $\\beta = 2$, coincides with the Park-Dagher gradient clustering procedure [6]: \nWithin the framework of the possibilistic approach, the local criterion takes the form \nand the result of its optimization has the form \nwhere the distance parameter $\\mu_j$ is initialized according to (1.30). \nIn this case, $N$ in Eq. (1.30) is the size of the data set used for initialization. \nIn the quadratic case, algorithm (1.37), (1.38) turns into a rather simple procedure, and the optimization result has the form \nwhere $\\mu_j$ is the distance parameter initialized from the results of probabilistic clustering (for example, by the fuzzy C-means algorithm (1.15)–(1.17)) according to the equation: \n1.6.3 Robust Adaptive Algorithms of Probabilistic Fuzzy Clustering \nThe clustering methods considered above can effectively solve the classification problem when the clusters intersect substantially; however, they assume that the data within each cluster are located compactly enough, without sharp (abnormal) outliers. \nIt should be noted, however, that real data are usually distorted by outliers, whose share, according to some estimates [7], reaches 20%, so it is not always correct to speak of a compact placement of the data. \nFor this reason, considerable attention has recently been paid to the fuzzy cluster analysis of data whose density distribution differs from the normal one by the presence of “heavy tails” [8, 9]. 
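The effect of such outliers on a quadratic criterion can be seen on a single cluster. With all memberships equal to 1, the prototype that minimizes a sum of squared distances is the sample mean, while the prototype that minimizes a sum of absolute distances is the median. The data are synthetic and purely illustrative: 80 regular observations near zero plus 20 gross outliers, i.e. the 20% share quoted above.

```python
import statistics

# Synthetic one-dimensional cluster: 80 regular points from -0.40 to 0.39.
regular = [0.01 * (i - 40) for i in range(80)]
outliers = [20.0] * 20  # 20% share of gross outliers
sample = regular + outliers

quadratic_prototype = statistics.mean(sample)   # minimizes the sum of (x - c) ** 2
absolute_prototype = statistics.median(sample)  # minimizes the sum of |x - c|
print(round(quadratic_prototype, 3), round(absolute_prototype, 3))  # 3.996 0.095
```

The quadratic prototype is dragged about 4 units toward the outliers, while the absolute-distance prototype moves by only about 0.1.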
\nRobust Recursive Algorithm for Probabilistic Fuzzy Clustering \nAfter the components of the feature vectors are standardized so that all source vectors belong to the unit hypercube $[0, 1]^n$, the objective function is constructed \nunder the constraints \nHere $D(x_k, c_j)$ is the distance between $x_k$ and $c_j$ in the adopted metric. The result of clustering is the $N \\times m$ matrix $W = \\{w_{k,j}\\}$, called the “matrix of fuzzy",
        "chapter": "1 The Cluster Analysis in Big Data Mining",
        "section": "1.6 Adaptive Robust Clustering Algorithms",
        "subsection": "1.6.2 Recurrent Fuzzy Clustering Algorithms",
        "subsubsection": "N/A"
    },
    {
        "content": "where $\\mu_j$ is the distance parameter initialized from the results of probabilistic clustering (for example, by the fuzzy C-means algorithm (1.15)–(1.17)) according to the equation: \n1.6.3 Robust Adaptive Algorithms of Probabilistic Fuzzy Clustering \nThe clustering methods considered above can effectively solve the classification problem when the clusters intersect substantially; however, they assume that the data within each cluster are located compactly enough, without sharp (abnormal) outliers. \nIt should be noted, however, that real data are usually distorted by outliers, whose share, according to some estimates [7], reaches 20%, so it is not always correct to speak of a compact placement of the data. \nFor this reason, considerable attention has recently been paid to the fuzzy cluster analysis of data whose density distribution differs from the normal one by the presence of “heavy tails” [8, 9]. \nRobust Recursive Algorithm for Probabilistic Fuzzy Clustering \nAfter the components of the feature vectors are standardized so that all source vectors belong to the unit hypercube $[0, 1]^n$, the objective function is constructed \nunder the constraints \nHere $D(x_k, c_j)$ is the distance between $x_k$ and $c_j$ in the adopted metric. The result of clustering is the $N \\times m$ matrix $W = \\{w_{k,j}\\}$, called the “matrix of fuzzy decomposition.” Typically, the Minkowski metric $L^p$ is applied as the distance function $D(x_k, c_j)$: \nwhere $x_{k,i}$, $c_{j,i}$ are the $i$-th components of the $(n \\times 1)$ vectors $x_k$, $c_j$, respectively. \nEstimates based on quadratic objective functions are optimal when the data belong to the class of distributions with finite variance, the best-known member of which is the Gaussian. 
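For reference, the Minkowski distance fits in a few lines. Its formula is not reproduced in the text, so the sketch assumes the usual form D = (Σ_i |x_ki − c_ji|^p)^(1/p); p = 2 gives the Euclidean distance used by fuzzy C-means and p = 1 the city-block distance. The helper name and the example vectors are hypothetical.

```python
def minkowski(x, c, p=2.0):
    # Assumed L^p form: (sum_i |x_i - c_i| ** p) ** (1 / p), with p >= 1.
    return sum(abs(xi - ci) ** p for xi, ci in zip(x, c)) ** (1.0 / p)


x_k, c_j = (0.0, 0.0), (3.0, 4.0)
for p in (1.0, 2.0, 4.0):
    print(p, minkowski(x_k, c_j, p))
```

For the vectors (0, 0) and (3, 4) this prints 7.0 for p = 1 and 5.0 for p = 2; as p grows, the largest coordinate difference increasingly dominates the distance.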
\nVarying the parameter $p$ makes it possible to improve the robustness of clustering procedures; however, the quality of the estimates is determined by the type of data distribution. Thus, estimates with $p = 1$ are optimal for the Laplace distribution of the data, but their construction involves great computational expense. A quite realistic model is the class of approximately normal distributions [9]. \nApproximately normal distributions are mixtures of a Gaussian density and some arbitrary density that distorts the normal distribution with outliers. The optimal objective function in this case is quadratic-linear: it is quadratic near the minimum and tends to a linear one as the distance from the minimum grows. \nThe most prominent representative of the approximately normal distribution densities is \nwhere $c_i$, $s_i$ are parameters determining the center and the width of the distribution. \nThis function resembles a Gaussian in the vicinity of the center but has heavier tails. The distribution (1.46) is associated with the objective function \nwhere the parameter $\\beta_i$ defines the steepness of this function; in the vicinity of the minimum this function is very close to quadratic, and it tends to a linear one as $x$ grows. \nIt is also interesting that the derivative of this function \nis a standard activation function of artificial neural networks. Using the following structure as a metric \nit is possible to introduce the objective function of robust classification [9] \nand the corresponding Lagrangian \nwhere $\\lambda_k$ is an undetermined Lagrange multiplier ensuring fulfillment of constraints (1.43), (1.44). The saddle point of the Lagrangian (1.51) can be found by solving the Kuhn-Tucker system of equations \nSolutions of the first and second equations lead to the well-known results \nBut the third equation \nevidently has no analytic solution. The solution of Eq.
(1.54) can be obtained using a local modification of the Lagrangian and a recurrent fuzzy clustering algorithm. \nSearching for the local saddle point of the Lagrangian \nusing the Arrow-Hurwicz-Uzawa procedure leads to the algorithm \nwhere $\\eta_k$ is the learning rate parameter and $c_{k,j,i}$ is the $i$-th component of the $j$-th prototype calculated at the $k$-th step. \nDespite its low computational complexity, algorithm (1.56) has the disadvantage inherent to all probabilistic clustering algorithms. \n1.7 Robust Recursive Algorithm of Possibilistic Fuzzy Clustering for Big Data \nWhen the data sample is big (Big Data) and the data enter the system sequentially (e.g., time series), recursive algorithms of possibilistic fuzzy clustering may be used. For possibilistic fuzzy clustering algorithms, the criterion is the following expression \nMinimization of (1.57) with respect to the parameters $w_{k,j}$, $c_j$ and $\\mu_j$ leads to the system of equations \nThe solution of the first two equations of (1.58) leads to the well-known result \nwhile the third one \nfully corresponds to (1.54). \nIntroducing the local modification of (1.57)",
        "chapter": "1 The Cluster Analysis in Big Data Mining",
        "section": "1.6 Adaptive Robust Clustering Algorithms",
        "subsection": "1.6.3 Robust Adaptive Algorithms of Probabilistic Fuzzy Clustering",
        "subsubsection": "N/A"
    },
    {
        "content": "where $\\eta_k$ is the learning rate parameter and $c_{k,j,i}$ is the $i$-th component of the $j$-th prototype calculated at the $k$-th step. \nDespite its low computational complexity, algorithm (1.56) has the disadvantage inherent to all probabilistic clustering algorithms. \n1.7 Robust Recursive Algorithm of Possibilistic Fuzzy Clustering for Big Data \nWhen the data sample is big (Big Data) and the data enter the system sequentially (e.g., time series), recursive algorithms of possibilistic fuzzy clustering may be used. For possibilistic fuzzy clustering algorithms, the criterion is the following expression \nMinimization of (1.57) with respect to the parameters $w_{k,j}$, $c_j$ and $\\mu_j$ leads to the system of equations \nThe solution of the first two equations of (1.58) leads to the well-known result \nwhile the third one \nfully corresponds to (1.54). \nIntroducing the local modification of (1.57) \nand optimizing it, we obtain: \nwhere the distance parameter $\\mu_{k,j}$ may be determined according to the second equation of system (1.59) for $k$ observations rather than for the entire sample volume $N$. \nIt should be noted that the last equations of systems (1.52) and (1.58) are identical and are determined only by the choice of metric. This makes it possible to use any metric suitable for a particular case: the metric determines only the prototype tuning procedure, while the equation for calculating the weights remains the same. \nThe robust recursive methods considered may be used both in batch mode and in online mode. In the latter case, the observation number $k$ represents discrete time. \nExperiments with data repositories distorted by abnormal outliers have shown high efficiency of the proposed algorithms in processing information given in the form of “object-property” tables [7, 8] and in the form of time series [10]. 
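A minimal online sketch of such a robust recursive procedure is given below. Because the formulas are not reproduced in the text, it rests on stated assumptions: the metric is the log-cosh function Σ_i β_i ln cosh((x_i − c_i)/β_i), whose derivative tanh is the activation function mentioned earlier; memberships take the probabilistic form w_j ∝ D_j^(1/(1 − β)); and each prototype moves by an Arrow-Hurwicz-type gradient step with learning rate η. As noted above, a possibilistic variant would change only the membership line. The stream, the outlier and all names are hypothetical.

```python
import math


def log_cosh(u):
    # Numerically stable ln(cosh(u)) = |u| + ln(1 + exp(-2|u|)) - ln(2).
    a = abs(u)
    return a + math.log1p(math.exp(-2.0 * a)) - math.log(2.0)


def robust_distance(x, c, steep):
    # Assumed robust metric: close to quadratic near c, close to linear far from c.
    return sum(b * log_cosh((xi - ci) / b) for xi, ci, b in zip(x, c, steep))


def robust_online_step(x, protos, steep, beta=2.0, eta=0.01):
    # One recursive step for observation x; the step count k acts as discrete time.
    dists = [robust_distance(x, c, steep) + 1e-12 for c in protos]
    powered = [d ** (1.0 / (1.0 - beta)) for d in dists]
    total = sum(powered)
    weights = [v / total for v in powered]  # probabilistic memberships, sum to 1
    for j, c in enumerate(protos):
        g = eta * weights[j] ** beta
        # tanh bounds the pull of any single observation, however far away it is.
        protos[j] = [ci + g * math.tanh((xi - ci) / b) for xi, ci, b in zip(x, c, steep)]
    return weights


protos = [[1.0, 1.0], [4.0, 4.0]]
steep = [1.0, 1.0]  # unit steepness for both components (assumption)
offsets = [(0.1, 0.0), (-0.1, 0.0), (0.0, 0.1), (0.0, -0.1)]
for k in range(4000):
    dx, dy = offsets[(k // 2) % 4]
    if k % 20 == 19:
        x = (100.0, 100.0)  # hypothetical gross outlier, 5% of the stream
    elif k % 2 == 0:
        x = (dx, dy)  # cluster around (0, 0)
    else:
        x = (5.0 + dx, 5.0 + dy)  # cluster around (5, 5)
    robust_online_step(x, protos, steep)
print(protos)  # both prototypes stay near (0, 0) and (5, 5)
```

Because the tanh factor never exceeds 1 in magnitude, the point at (100, 100) can shift a prototype by at most η per component at each step, whereas with a quadratic metric its pull would grow with its distance.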
\nIn particular, the problem of classifying a specially generated artificial sample containing three clusters of three-dimensional data was considered; the observations are marked with the symbols “o”, “x” and “+” [9] (see Fig. 1.1). The points in each cluster are distributed according to the Laplace density, which has “heavy tails”: \nwhere $\\sigma$ and $c$ are the width and center, respectively. \nThe sample includes 9000 observations (3000 in each cluster) and is divided into training (7200 cases) and testing (1800 cases) subsamples [10, 11]. \nIt should be noted that some observations lie very far from the cluster centers (Fig. 1.1a), while the cluster prototypes are located in the central region of the data, as shown in Fig. 1.1b. To find the correct prototypes, the clustering algorithm must be insensitive to outliers. \nFor all the algorithms compared, the experiment was performed as follows. At the beginning of training, the sample was clustered by the corresponding algorithm and the cluster prototypes were found. \nThen the training and testing samples were classified according to the results of clustering. The memberships of the observations to each cluster during classification were calculated by Eqs. (1.17), (1.56) or (1.62), depending on the type of clustering algorithm. The cluster to which an observation belongs with the maximum membership degree defines the class of this observation. Classification and training were performed in the online mode of receiving observations, with $\\beta = 2$, $\\beta_1 = \\beta_2 = \\beta_3 = 1$, $\\eta(k) = 0.01$. The results are shown in Table 1.3 [9]. \nIn Fig.
1.2 it is easy to see that the cluster centers (prototypes) produced by Bezdek’s fuzzy C-means algorithm are shifted from the visual centers of the clusters because of the “heavy tails” of the data distribution density, in contrast to the robust methods with objective functions (1.56) and (1.62), which locate the prototypes more precisely; this is confirmed by their smaller classification error (see Table 1.3). \nThe continuous growth of successful applications of computational intelligence (CI) technologies in data analysis confirms the versatility of this approach. At the same time, the real problems that arise in processing very large databases (Big Data) complicate the use of existing algorithms and tools, which must be improved to meet the challenges of real-time data mining using the paradigms of CI and soft computing. \n1.8 Application of Fuzzy Clustering Methods in the Problems of Automatic Classification \nExample 1.1 Classification of the UN countries. The UN Millennium Indicators used are presented in Table 1.4.",
        "chapter": "1 The Cluster Analysis in Big Data Mining",
        "section": "1.7 Robust Recursive Algorithm of Possibilistic Fuzzy Clustering for Big Data",
        "subsection": "N/A",
        "subsubsection": "N/A"
    },
    {
        "content": "In Fig. 1.2 it is easy to see that the cluster centers (prototypes) produced by Bezdek’s fuzzy C-means algorithm are shifted from the visual centers of the clusters because of the “heavy tails” of the data distribution density, in contrast to the robust methods with objective functions (1.56) and (1.62), which locate the prototypes more precisely; this is confirmed by their smaller classification error (see Table 1.3). \nThe continuous growth of successful applications of computational intelligence (CI) technologies in data analysis confirms the versatility of this approach. At the same time, the real problems that arise in processing very large databases (Big Data) complicate the use of existing algorithms and tools, which must be improved to meet the challenges of real-time data mining using the paradigms of CI and soft computing. \n1.8 Application of Fuzzy Clustering Methods in the Problems of Automatic Classification \nExample 1.1 Classification of the UN countries. The UN Millennium Indicators used are presented in Table 1.4. \nIn this experiment, the countries of the United Nations had to be clustered into 4 clusters by the above indicators. Applying the Gustafson-Kessel clustering algorithm gave the cluster centers presented in Table 1.5. \nThe matrix of membership coefficients to the different clusters (membership functions) is presented in Table 1.6. \nAs can be seen from the table, the first cluster contains countries with relatively high values of all indicators (compared to the other countries in the sample). These are the CIS countries, the countries of Eastern and Western Europe, the USA, Canada, the Balkans and Latin America. \n\nThe second cluster contains countries with smaller values of the indicators: the countries of North Africa and the Middle East. 
This cluster has the lowest level of gender equality. \nThe third cluster contains the poorest countries, with the lowest levels of literacy and a low level of gender equality; these are mainly African countries. \nThe fourth cluster contains poor countries with the most unfavorable conditions for the growth of children. \nExample 1.2 Classification of the United Nations countries by sustainable development indicators. \nThe fuzzy C-means clustering method was investigated on the sustainable development indicators of the United Nations countries, using the data of the World Data Center in Ukraine (WDC). \nThe following indices were taken as sustainable development indicators: \n– GINI: the GINI index; \n– Ihd: the index of health status; \n– Iql: the standard of living index; \n– Isd: the index of sustainable development. \nThe differential grouping algorithm was applied for the initial placement of the centers. Clustering was carried out for different numbers of clusters, $K = 3, 4, 5$. Besides the value of the optimized criterion, the quality of partitioning is evaluated by the Xie-Beni indicator: \nwhere $d_{av}$ is the average intra-cluster distance and $D_{av}$ is the average inter-cluster distance. \nThis indicator should be minimized. \nExperiment 1. $K = 3$ (Tables 1.7 and 1.8) \nLet us analyze the results. The first cluster contains the countries with the highest values of all parameters: the countries of Western Europe, as well as some others, namely Australia, Austria, Belgium, Great Britain, Hungary, Denmark, Iceland, Ireland, Israel, Italy, Cyprus, Latvia, Lithuania, Luxembourg, Netherlands, New Zealand, Norway, Poland, Portugal, USA, Slovakia, Slovenia, Croatia, Czech Republic, Sweden, Switzerland, Uruguay. \nThe second cluster contains countries with an average value of the GINI index and the minimum values of all other indicators. 
These are the countries of Africa and South-East Asia, including Bangladesh, Egypt, Zambia, Zimbabwe, India, Indonesia, Cambodia, Cameroon, Kyrgyzstan, Nicaragua, Niger, Pakistan, Uganda, Senegal, Tajikistan, Tanzania, and others. \nThe third cluster contains countries with average values of all indicators and a small value of the GINI index. It includes the CIS countries, Latin America and some of the most developed countries of Asia and Africa, namely Armenia, Albania, Algeria, Argentina, Brazil, Bolivia, Bulgaria, Bosnia and Herzegovina, Venezuela, Honduras, Guatemala, Georgia, Jordan, Kazakhstan, China, Costa Rica, Colombia, Mexico, Moldova, Peru, Paraguay, Russian Federation, Trinidad and Tobago, Tunisia, Turkey, Ukraine, Chile, South Africa, Jamaica. \nExperiment 2. $K = 4$ (Table 1.9) \nIt is interesting to analyze how the clusters change in the transition from $K = 3$ to $K = 4$. \nThe countries with the greatest values of all indicators fall into the first cluster; its composition practically did not change. The second cluster contains countries with the minimum value of the GINI index and average values of all other indicators. These are the countries of Latin America: Argentina, Brazil, Panama, Paraguay, Peru, Uruguay, etc. \nThe countries with the minimum values of all indicators except the GINI index fall into the third cluster. These are countries from the second cluster of the previous clustering at $K = 3$, namely Bangladesh, Benin, Zambia, Zimbabwe, India, Cambodia, Cameroon, Kenya, Mozambique, Nepal, Pakistan, Senegal, Tajikistan, Tanzania, Uzbekistan. \nThe countries with average values of all indicators fall into the fourth cluster. These are countries from the third cluster of the previous clustering, namely Venezuela, Vietnam, Ukraine, the Russian Federation, Azerbaijan, Georgia, Indonesia, Jordan, Kyrgyzstan, Sri Lanka. 
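The indicator used in this example is described above only in words. The sketch below follows that description, taking d_av as the mean distance from each object to its nearest center and D_av as the mean distance between pairs of centers; for comparison it also computes the index in the form published by Xie and Beni, the w²-weighted sum of squared distances divided by N times the smallest squared distance between centers, with memberships of the form (1.17) for β = 2. Both readings, the helper names and the toy data are assumptions; in both, smaller values mean a better partition.

```python
import itertools
import math


def ratio_index(data, centers):
    # Verbal reading: average intra-cluster distance over average inter-cluster distance.
    d_av = sum(min(math.dist(x, c) for c in centers) for x in data) / len(data)
    pairs = list(itertools.combinations(centers, 2))
    D_av = sum(math.dist(a, b) for a, b in pairs) / len(pairs)
    return d_av / D_av


def xie_beni(data, centers):
    # Sum of w ** 2 * d ** 2 divided by N times the minimal squared center separation.
    total = 0.0
    for x in data:
        d2 = [max(math.dist(x, c) ** 2, 1e-12) for c in centers]
        # Memberships of the (1.17) form with beta = 2.
        w = [1.0 / sum(dj / di for di in d2) for dj in d2]
        total += sum(wj ** 2 * dj for wj, dj in zip(w, d2))
    sep = min(math.dist(a, b) ** 2 for a, b in itertools.combinations(centers, 2))
    return total / (len(data) * sep)


blobs = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0),
         (10.0, 10.0), (10.0, 11.0), (11.0, 10.0), (11.0, 11.0)]
good = [(0.5, 0.5), (10.5, 10.5)]
poor = [(5.0, 5.0), (6.0, 6.0)]
print(ratio_index(blobs, good), ratio_index(blobs, poor))
print(xie_beni(blobs, good), xie_beni(blobs, poor))
```

For the well-placed centers, ratio_index returns 0.05 up to rounding, since every object lies at distance √0.5 from its center and the two centers are √200 apart; both indices are far larger for the poorly placed centers.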
The value of the Xie-Beni index decreased from 0.438 to 0.39492. \nExperiment 3. $K = 5$ \nFor this experiment we present only the average values of the cluster centers (see below). \nConsider the dependence of the Xie-Beni index on the number of clusters K (Fig. 1.3). \nAs the chart shows, the Xie-Beni index decreases significantly for K from 2 to 4 and changes only slightly afterwards. Therefore, the optimal number of clusters lies in the vicinity of $K = 4$. \nDetermination of the Number of Clusters in Cluster Analysis \nThe main drawback of most clustering methods, including the FCM and Gustafson-Kessel methods, is that the number of clusters should be given a priori. However, it is usually unknown to experts, and clustering quality criteria such as the Xie-Beni index and Dunn’s index (DI) tend to decrease monotonically with the number of clusters K. Therefore, they cannot be used directly to determine the optimal value $K_{opt}$. \nIn practice, the following approach may be used to determine a proper number of clusters. Let the clustering criterion be $E = \\sum_{j=1}^{m} \\sum_{k=1}^{N} w_{kj}^{\\beta} \\lVert c_j - x_k \\rVert^{2}$, where m = K is the number of clusters, N is the number of objects $x_k$, $c_j$ are the cluster centers, $w_{kj}$ are the membership degrees and β is the fuzzification exponent. Solve the clustering problem with criterion E for different values of K and find the optimal values $E^{*}(K)$. Stop as soon as the condition $\\Delta E(K) \\le \\varepsilon$ or $\\Delta E(K) / E(K) \\le \\delta$ holds, where ε and δ are accepted thresholds. Usually δ may be chosen in the range $[0.1, 0.2]$. \n1.9 Conclusions \nCluster analysis includes a set of different classification algorithms. 
In general, whenever it is necessary to divide “mountains” of information into groups suitable for further processing, cluster analysis is very useful and effective. Cluster analysis is needed for classifying information; it can also be used to structure the variables and to find out which variables should be combined first and which should be considered separately. \nA great advantage of cluster analysis is that it allows objects to be split not only by one parameter but by a whole set of attributes. In addition, unlike most mathematical and statistical methods, cluster analysis does not impose any restrictions on the form of the objects considered and can handle a variety of raw data of almost arbitrary nature. This is important, for example, when the indicators are diverse in kind and traditional econometric approaches cannot be used.",
        "chapter": "1 The Cluster Analysis in Big Data Mining",
        "section": "1.8 Application of Fuzzy Clustering Methods in the Problems of Automatic Classification",
        "subsection": "N/A",
        "subsubsection": "N/A"
    },
    {
        "content": "should be given a priori. But usually it’s unknown for experts and the criteria of clustering quality such as Hi-Beni indicator and Dunn’s Index (DI) are monotonously decrease with number of clusters K. Therefore they can’t be used directly for determining optimal value of $mathrm { K _ { o p t } }$ . \nFor determining the proper number of clusters in practice may be used the following approach. Assume the criterion of clustering be $begin{array} { r } { E = sum _ { j = 1 } ^ { m } sum _ { k = 1 } ^ { N } w _ { k j } ^ { beta } Big | c _ { j } - x _ { k } Big | ^ { 2 } } end{array}$ \nSolve the clustering problem with criterion E with different $mathbf { k }$ a\tnd find $boldsymbol { mathrm { E } } ^ { * } ( boldsymbol { mathbf { k } } )$ . \nWhen the following condition   \n$Delta mathrm { E } ( mathbf { k } ) le dot { varepsilon }$ or $Delta mathrm { E } ( mathrm { K } ) / mathrm { E } ( mathrm { K } ) leq delta .$ ,   \nholds where d and έ are accepted thresholds then stop. Usually value d may be chosen as follows $ S in [ 0 . 1 mathrm { - } 0 . 2 ]$ . \n1.9 Conclusions \nCluster analysis includes a set of different classification algorithms. In general, whenever it is necessary to classify the “mountains” of information to suitable for further processing groups, cluster analysis is very useful and effective. Cluster analysis is needed for the classification of information, it can be used in a certain way to structure the variables and to find out which variables should be combined in the first place, and which should be considered separately. \nA great advantage of the cluster analysis is that it allows to split the objects not only by one parameter but by a set of attributes as well. In addition, cluster analysis unlike most mathematical and statistical methods do not impose any restrictions on the form of these objects, and allows to treat a variety of raw data of almost arbitrary nature. 
This is important, for example, when the indicators are diverse in kind and traditional econometric approaches cannot be used. \nLike any other method, cluster analysis has certain disadvantages and limitations. In particular, the composition and the number of clusters depend on the partitioning criteria selected. Reducing the original data set to a more compact form may introduce some distortion, and the characteristics of individual objects may be lost when they are replaced by the parameters of the cluster center. \nThe main disadvantage of the fuzzy clustering methods considered here, fuzzy C-means and Gustafson-Kessel, is that they can only be used when the number of clusters K is known. Usually, however, the number of clusters is unknown, and visual inspection simply fails in the multidimensional case. \nReferences \n1. B. Durant, G. Smith, Cluster Analysis (Statistica, Moscow, 1987), 289 pp. (in Russian) \n2. V. Dyuk, A. Samoilenko, Data Mining (Peter Publication, Saint Petersburg, 2001), 366 pp. (in Russian) \n3. Yu.P. Zaychenko, Fundamentals of Intellectual Systems Design (Slovo, Kiev, 2004), 352 pp. (in Russian) \n4. Yu.P. Zaychenko, Fuzzy Models and Methods in Intellectual Systems (Slovo, Kiev, 2008), 354 pp. \n5. R.R. Yager, D.P. Filev, Approximate clustering via the mountain method. IEEE Trans. Syst. Man Cybern. 24, 1279–1284 (1994) \n6. R. Krishnapuram, J. Keller, Fuzzy and possibilistic clustering methods for computer vision. IEEE Trans. Fuzzy Syst. 1, 98–110 (1993) \n7. D.C. Park, I. Dagher, Gradient based fuzzy C-means (GBFCM) algorithm, in Proceedings of the IEEE International Conference on Neural Networks (1994), pp. 1626–1631 \n8. Ye. Bodyanskiy, Ye. Gorshkov, I. Kokshenev, V. Kolodyazhniy, Robust recursive fuzzy clustering algorithms, in Proceedings of the East West Fuzzy Colloquium 2005 (HS Zittau/Goerlitz, 2005), pp. 301–308 \n9. Ye. 
Bodyanskiy, Ye. Gorshkov, I. Kokshenev, V. Kolodyazhniy, Outlier resistant recursive fuzzy clustering algorithm, in Computational Intelligence: Theory and Applications, ed. by B. Reusch. Advances in Soft Computing, vol. 38 (Springer, Berlin, Heidelberg, 2006), pp. 647–652 \n10. Ye. Bodyanskiy, Computational Intelligence Techniques for Data Analysis. Lecture Notes in Informatics, vol. P-72 (GI, Bonn, 2005), pp. 15–36 \n11. Ye. Bodyanskiy, Ye. Gorshkov, I. Kokshenev, V. Kolodyazhniy, O. Shilo, Robust recursive fuzzy clustering-based segmentation of biomedical time series, in Proceedings of the 2006 International Symposium on Evolving Fuzzy Systems, Lancaster, UK (2006), pp. 101–105",
        "chapter": "1 The Cluster Analysis in Big Data Mining",
        "section": "1.9 Conclusions",
        "subsection": "N/A",
        "subsubsection": "N/A"
    },
    {
        "content": "As any other method, cluster analysis has certain disadvantages and limitations: in particular, the content and the number of clusters depend on the criteria selected for partition. For the reduction of the original data set to a more compact form there may be some distortion, and characteristics of individual objects may be lost by replacing them with the characteristics of parameters of the cluster center. \nThe main disadvantage of the considered methods of fuzzy clustering C-means and Gustavson-Kessel is that they can only be used when the number of clusters K is known. But usually, the number of clusters is unknown, and visual observations in the multidimensional case simply don’t lead to a success. \nReferences \n1. B. Durant, G. Smith, Cluster Analysis (Statistica, Moscow, 1987), 289 pp. (in Russian)   \n2. V. Dyuk, A. Samoilenko, Data Mining (Peter Publication, Saint-Petersburg, 2001), 366 pp. (in Russian)   \n3. Yu.P. Zaychenko, Fundamentals of Intellectual Systems Design (Kiev-Publishing house “Slovo”, 2004), 352 pp. (in Russian)   \n4. Yu.P. Zaychenko, Fuzzy Models and Methods in Intellectual Systems (Kiev-Publishing House “Slovo”, 2008), 354 pp.   \n5. R.R. Yager, D.P. Filev, Approximate clustering via the mountain method. IEEE Trans. Syst. Man Cybern 24, 1279–1284 (1994)   \n6. R. Krishnapuram, J. Keller, Fuzzy and possibilistic clustering methods for computer vision. IEEE Trans. Fuzzy Syst. 1, 98–110 (1993)   \n7. D.C. Park, I. Dagher, Gradient based fuzzy C-means (GBFCM) algorithm, in Proceedings of the IEEE International Conference On Neural Networks (1984), pp. 1626–1631   \n8. Ye. Bodyanskiy, Ye. Gorshkov, I. Kokshenev, V. Kolodyazhniy, Robust recursive fuzzy clustering algorithms, in Proceedings of the East West Fuzzy Colloquium 2005 (HS, Zittau/ Goerlitz, 2005), pp. 301–308   \n9. Ye. Bodyanskiy, Ye. Gorshkov, I. Kokshenev, V. 
Kolodyazhniy, Outlier resistant recursive fuzzy clustering algorithm, in Computational Intelligence: Theory and Applications, ed. by B. Reusch. Advances in Soft Computing, vol. 38 (Springer, Berlin, Heidelberg, 2006), pp. 647–652 \n10. Ye. Bodyanskiy, Computational Intelligence Techniques for Data Analysis. Lecture Notes in Informatics, vol. P-72 (GI, Bonn, 2005), pp. 15–36 \n11. Ye. Bodyanskiy, Ye. Gorshkov, I. Kokshenev, V. Kolodyazhniy, O. Shilo, Robust recursive fuzzy clustering-based segmentation of biomedical time series, in Proceedings of the 2006 International Symposium on Evolving Fuzzy Systems, Lancaster, UK (2006), pp. 101–105 \nChapter 2 Deep Neural Networks and Hybrid GMDH-Neuro-fuzzy Networks in Big Data Analysis \n2.1 Introduction \nDeep networks are among the modern and efficient tools for big data analytics [1–4]. The theory and practice of machine learning are now going through a real “deep revolution” inspired by the successful application of deep learning networks, which represent the third generation of neural networks. Compared with the classic (second-generation) neural networks of the 1980s and 1990s, new training paradigms made it possible to overcome several problems that had hindered the successful application of traditional neural networks. Neural networks trained with deep learning algorithms not only surpassed the best alternative approaches in accuracy but in some cases displayed an understanding of the meaning of the input information (in image recognition, text analysis and other problems). \nThe most successful industrial computer vision and speech recognition systems are built on deep networks, and IT giants such as Apple, Google and Facebook have created large research teams dealing with deep learning. The term “deep network” means a big neural network with many hidden layers of neurons [1, 2]. Deep learning is a set of methods and techniques for training complex neural networks (NN) with many layers. 
For such networks, the traditional training algorithms developed for conventional NN became inadequate because of several drawbacks, in particular the vanishing and exploding gradient problem of the backpropagation algorithm [3, 4]. Therefore, the large dimensions of modern neural networks used for 3-D image recognition and automatic speech recognition demanded the development of new efficient training methods, called deep learning. \nHowever, the most serious drawback of deep learning networks is the problem of determining their proper structure, that is, how to choose an adequate number of layers. \nTo date, the choice of the number of layers of a deep network (DN) is based on the knowledge and experience of an expert and remains an art. An adequate solution to this problem is connected with a new class of hybrid neural networks, the so-called",
        "chapter": "1 The Cluster Analysis in Big Data Mining",
        "section": "References",
        "subsection": "N/A",
        "subsubsection": "N/A"
    },
    {
        "content": "Chapter 2 Deep Neural Networks and Hybrid GMDH-Neuro-fuzzy Networks in Big Data Analysis \n2.1 Introduction \nOne of the modern and efficient tools for big data analytics are deep networks [1–4]. At present time theory and practice of machine learning live over real “deep revolution” inspired by successful application of deep learning networks which represent the third generation of neural networks. In difference from classic neuron networks (second generation) 80–90 years of 20-th century new training paradigms allowed to get rid of some problems which hindered successful application of traditional neural networks. Neural networks trained with deep learning algorithms not only overcame by accuracy best alternative approaches but in some cases displayed understanding of sense of input information (in image recognition, text analysis and other problems). \nThe most successful industrial systems of computer vision and speech recognition are built on deep networks and giants of IT-industry such as Apple, Google, Facebook created large research teams dealing with deep learning. Term “deep network” means big neural network with many hidden layers of neurons [1, 2]. Deep learning represents a set of methods and techniques for training complex neural networks (NN) with many layers. For such networks traditional machine learning algorithms developed for conventional NN had become inadequate due to some drawbacks in particular problem of decay and explosion of gradient in back propagation algorithm [3, 4]. Therefore large dimensions of modern neural networks with applications for 3-D images recognition and automatic speech recognition demanded development of new efficient training methods called deep learning. \nBut the most serious drawback of deep learning networks is a problem of determination of its proper structure, how to choose adequate number of their layers. 
\nTo date, the choice of the number of layers of a deep network (DN) is based on the knowledge and experience of an expert and remains an art. An adequate solution to this problem is connected with a new class of hybrid neural networks, the so-called \nGMDH-neo-fuzzy networks, which combine the Group Method of Data Handling (GMDH), a self-organization method, with fuzzy neural networks. Owing to the principle of self-organization and the small number of tuned parameters, GMDH makes it possible to simplify and accelerate the training of DN. In Sects. 2.8–2.10 of this chapter, several variants of this class of hybrid networks are considered, and GMDH-based algorithms for synthesizing their structure are proposed and analyzed. Training algorithms for hybrid deep networks are free from the vanishing and exploding gradient problem; moreover, GMDH makes it possible to reduce the dimensionality of DN training, accelerate its convergence and thereby solve some problems of big data (BD). \n2.2 Autoassociators. Autoencoders \nThe implementation of deep learning has led to the development of special learning structures based on so-called autoassociators [3]. \nThe main task of an autoassociator is to reproduce the input vector (pattern) at its output as accurately as possible. \nThe first autoassociator (AA) was the neocognitron proposed by Fukushima. Its schema is presented in Fig. 2.1. \nThere are two types of AA: generating and synthesizing ones. \nRestricted Boltzmann machines (RBM) are used as the first type, and autoencoders (AE) as the second.",
        "chapter": "2 Deep Neural Networks and Hybrid GMDH-Neuro-fuzzy Networks in Big Data Analysis",
        "section": "2.1 Introduction",
        "subsection": "N/A",
        "subsubsection": "N/A"
    },
    {
        "content": "GMDH-neo-fuzzy networks representing a combination of self-organization method GMDH and fuzzy neural networks. Due to principle of self-organization and small number of tuning parameters GMDH enables to simplify and accelerate the training of DN. In this chapter in the Sects. 2.8–2.10 several variants of this class hybrid networks are considered and algorithms of their structure synthesis based on GMDH are suggested and analyzed. Training algorithms for hybrid deep networks are free from problem of gradient vanishing or explosion and besides the application of GMDH enables to reduce dimensionality of training DN and accelerate the convergence of training DN and by this solve some problems of BD. \n2.2 Autoassociators. Autoencoders \nImplementation of deep learning has led to development of the special learning structure based on application of so-called autoassociators [3]. \nThe main task of autoassociator is to obtain at the output the most accurate mapping of the input vector (pattern). \nThe first autoassociator (AA) was neo-cognitron suggested by Fukushima. Its schema is presented in Fig. 2.1. \nThere are exist two types of AA: generating and synthesizing ones. \nAs the first type are used restricted Boltzmann Machine, (RBM), as the second type—autoencoders (AE) are used. \nAutoencoder \nOne of the first deep learning algorithms is auto-encoder. It’s an algorithm of non-supervised learning whose output vector equals to input vector [5]. One of the most spread auto-encoder architectures is feedforward neural network containing input, hidden and output layers. \nUnlike perceptron output autoencoder layer has the same number of neurons as the input layer. The data at the input layer are compressed and restored so the hidden features are retrieved. \nThe goal of autoencoder is to attain that NN output to be maximal close to input vector. 
To make the solution of this problem non-trivial, special constraints are imposed on the network topology: \n(1) the number of hidden-layer neurons should be less than the number of input neurons; \n(2) the number of inactive neurons in the hidden layer should significantly exceed the number of active neurons. \nThe first constraint makes it possible to compress the data while the input signal is transferred to the network output. Such compression is possible if the data contain hidden interconnections, i.e. correlations among features. The second constraint, the requirement of a large number of inactive neurons, makes it possible to obtain non-trivial results even when the number of hidden-layer neurons exceeds the dimensionality of the input data. In other words, the goal of an autoencoder is to obtain the most significant features. \nLet us consider a neuron to be active if its activation is close to one, and inactive if its activation is close to zero. These constraints force the autoencoder to search for correlations and generalizations in the input data and to compress them. \nIn this way the network automatically learns to extract general features of the input data, which are encoded in the network weights. To achieve this, a sparsity parameter ρ, a small value such as ρ = 0.05, is introduced for the neurons of the hidden layer. The mean activation of hidden neuron j over the m training examples, \n$\\hat{\\rho}_j = \\frac{1}{m} \\sum_{i=1}^{m} a_j(x^{(i)})$, \nwhere $a_j(x^{(i)})$ is the activation of neuron j for example $x^{(i)}$, should be as close as possible to ρ: \n$\\hat{\\rho}_j \\approx \\rho$. \nIntroduce the penalty function \n$\\sum_{j=1}^{s} \\mathrm{KL}(\\rho \\,\\|\\, \\hat{\\rho}_j)$, \nwhere s is the number of hidden neurons and \n$\\mathrm{KL}(\\rho \\,\\|\\, \\hat{\\rho}_j) = \\rho \\log \\frac{\\rho}{\\hat{\\rho}_j} + (1 - \\rho) \\log \\frac{1 - \\rho}{1 - \\hat{\\rho}_j}$. \nA remarkable property of the penalty function is its simple derivative: \n$\\frac{\\partial \\mathrm{KL}(\\rho \\,\\|\\, \\hat{\\rho}_j)}{\\partial \\hat{\\rho}_j} = -\\frac{\\rho}{\\hat{\\rho}_j} + \\frac{1 - \\rho}{1 - \\hat{\\rho}_j}$. \nAn example of an autoencoder is presented in Fig. 2.2. The autoencoder tries to learn the function $h(x) = x$; in other words, it seeks an approximation of this function such that the network output equals the input vector. 
To make the solution of this problem non-trivial, the number of hidden-layer neurons should be less than the dimensionality of the input data (see Fig. 2.2). \nThis makes it possible to compress the data while the input signal is transferred to the output. For example, if the input vector is the set of brightness levels of a 10 × 10 pixel image (100 features in all) and the hidden layer has 50 neurons, the network is forced to learn to compress the image. \nIndeed, the requirement $h(x) = x$ means that the output layer must restore the 100 pixels of the initial image from the activation levels of the 50 hidden neurons. Such compression is possible if the features contain hidden interconnections and correlations and, in general, if the data have some structure. In this respect, an autoencoder closely resembles principal component analysis (PCA), since both reduce the dimensionality of the input data. \nLater, once the idea of sparsity had been formulated, the so-called sparse autoencoder appeared and became widely used [5, 6]. A sparse autoencoder is an autoencoder whose number of hidden neurons is much greater than the dimensionality of the input vector. Sparse activation means that the number of inactive neurons in the hidden layer significantly exceeds the number of active ones. Informally, a neuron is considered active if the value of its transfer function is close to 1; for an inactive neuron the value should be close to 0 if the sigmoid transfer function is used (or close to −1 for tanh). \nA variant called the denoising autoencoder also exists [5]. It is the same autoencoder, but its training is specific: randomly distorted data (several input values are set to 0) are fed to the input, while the undistorted values are used for comparison with the output. In this way the autoencoder is compelled to restore distorted input data (Fig. 2.3). 
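The sparsity penalty and the denoising training described above can be sketched for a single hidden layer. This is a minimal pure-Python illustration, assuming sigmoid units, a squared reconstruction error and plain per-example gradient descent; the class and function names, the toy data and the learning rate are hypothetical choices, not taken from the source. Setting noise above 0 feeds masked inputs while comparing the output with the clean pattern, and setting beta above 0 adds the gradient of the KL sparsity penalty to the hidden-layer deltas.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def kl_penalty(rho, rho_hat):
    # KL(rho || rho_hat) for one hidden unit; equals 0 when rho_hat == rho.
    return (rho * math.log(rho / rho_hat)
            + (1 - rho) * math.log((1 - rho) / (1 - rho_hat)))


def kl_grad(rho, rho_hat):
    # Derivative of the penalty with respect to rho_hat.
    return -rho / rho_hat + (1 - rho) / (1 - rho_hat)


def corrupt(x, p, rng):
    # Masking noise for the denoising variant: each input set to 0 with probability p.
    return [0.0 if rng.random() < p else xi for xi in x]


class Autoencoder:
    # One hidden layer, sigmoid units, squared reconstruction error.
    def __init__(self, n_in, n_hidden, seed=0):
        self.rng = random.Random(seed)
        r = self.rng
        self.W1 = [[r.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.W2 = [[r.uniform(-0.5, 0.5) for _ in range(n_hidden)] for _ in range(n_in)]
        self.b2 = [0.0] * n_in

    def encode(self, x):
        return [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
                for row, b in zip(self.W1, self.b1)]

    def decode(self, h):
        return [sigmoid(sum(w * hj for w, hj in zip(row, h)) + b)
                for row, b in zip(self.W2, self.b2)]

    def error(self, data):
        # Mean squared reconstruction error on clean inputs.
        total = 0.0
        for x in data:
            y = self.decode(self.encode(x))
            total += sum((yi - xi) ** 2 for yi, xi in zip(y, x))
        return total / len(data)

    def train(self, data, epochs=2000, lr=0.5, noise=0.0, rho=0.05, beta=0.0):
        # noise > 0 gives a denoising autoencoder; beta > 0 adds the sparsity penalty.
        n_h = len(self.b1)
        for _ in range(epochs):
            hs = [self.encode(x) for x in data]
            # Mean activation of each hidden unit, clipped away from 0 and 1.
            rho_hat = [min(max(sum(h[j] for h in hs) / len(data), 1e-6), 1 - 1e-6)
                       for j in range(n_h)]
            for x in data:
                x_in = corrupt(x, noise, self.rng) if noise > 0 else x
                h = self.encode(x_in)
                y = self.decode(h)
                # Output deltas: compare with the clean input x, not x_in.
                d_out = [(yi - xi) * yi * (1 - yi) for yi, xi in zip(y, x)]
                d_hid = [(sum(self.W2[k][j] * d_out[k] for k in range(len(d_out)))
                          + beta * kl_grad(rho, rho_hat[j])) * h[j] * (1 - h[j])
                         for j in range(n_h)]
                for k, dk in enumerate(d_out):
                    for j in range(n_h):
                        self.W2[k][j] -= lr * dk * h[j]
                    self.b2[k] -= lr * dk
                for j, dj in enumerate(d_hid):
                    for i, xi in enumerate(x_in):
                        self.W1[j][i] -= lr * dj * xi
                    self.b1[j] -= lr * dj


# Hypothetical toy data: inputs come in redundant pairs, so 2 hidden units suffice.
data = [[1.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 1.0],
        [1.0, 1.0, 1.0, 1.0], [0.0, 0.0, 0.0, 0.0]]

ae = Autoencoder(4, 2, seed=1)
before = ae.error(data)
ae.train(data, noise=0.25)  # denoising training
after = ae.error(data)
print(round(before, 4), round(after, 4))
```

For a sparse, overcomplete variant one could build, for example, an Autoencoder with 4 inputs and 8 hidden units and call train with beta = 0.1; the penalty then pushes the mean hidden activations toward rho.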
\nArtificial feedforward neural networks (ANN) with many layers are poorly trained by conventional methods, which work well for ANN with few hidden layers, because of the vanishing gradient problem [4]: the farther a layer is from the output, the smaller the norm of its gradient. \nThis problem may be solved by correctly chosen initial weights, which then do not need to change significantly during training. \n2.3 Boltzmann Machines (BM) \n2.3.1 Energetic Models \nBoltzmann machines are a special form of log-linear Markov random field (MRF), i.e. their energy function is linear in its parameters. Therefore, let us first consider energy-based models (EBM). An EBM associates a scalar energy with each configuration of variables. Training consists in modifying the energy function so that its form acquires the desired properties; for example, we would like desirable configurations to have low energy. Energy-based probabilistic models define the probability distribution as follows: \n$p(x) = \\frac{e^{-E(x)}}{Z}$",
        "chapter": "2 Deep Neural Networks and Hybrid GMDH-Neuro-fuzzy Networks in Big Data Analysis",
        "section": "2.2 Autoassociators. Autoencoders",
        "subsection": "N/A",
        "subsubsection": "N/A"
    },
    {
        "content": "Later as sparsity idea has been stated so-called the sparse Autoencoder appeared and got wide application [5, 6]. Sparse autoencoder is an autoencoder with number of hidden neurons much greater than the dimensionality of input vector. Sparse activation means that the number of non-active neurons in the hidden layer exceeds significantly the number of active ones. If describe sparsity informal then a neuron is considered active if its transfer function is about 1. If the sigmoidal transfer function is used then for non-active neuron its value should be close to 0 (for tanh—close to $^ { - 1 }$ ). \nThere is a variant of autoencoder called denoising autoencoder [5]. It’s the same autoencoder but its training is specific. While training randomly distorted data (several input values are changed to 0) is fed into input. By this for comparison with output are shown non-distorted values. In this way autoencoder is compelled to restore distorted input data (Fig. 2.3). \nArtificial feed-forward neural networks (ANN) with large number of layers are badly trained by conventional methods which are good for ANN with small number of hidden layers due to the problem of decaying gradient [4], the farther is layer from output the less are the values of gradient norm. \nThis problem may be solved by correctly chosen initial weights. In this case it doesn’t need to change them significantly during the training process. \n2.3 Boltzmann Machines (BM) \n2.3.1 Energetic Models \nBoltzmann machines represent a special form of log-linear Markov’s field (MRF), i.e. its energy function is linear by parameters. Therefore let’s consider first energy-based models (EBM). EBM connect scalar energy with each configuration of variables. The training corresponds to modification of energy function so that its form obtain the desired properties. For example, we would like that the desired configurations have low energy. 
Energy-based probabilistic models define the probability distribution as follows: \n$p(x) = \\frac{e^{-E(x)}}{Z}$ \nThe normalizing factor $Z = \\sum_{x} e^{-E(x)}$ is called the partition function (statistical sum), by analogy with physical systems. \nAn energy-based model can be trained by stochastic gradient descent on the empirical negative log-likelihood of the data. \nAs for logistic regression, we first define the log-likelihood function and then the loss function as the negative log-likelihood. \n2.3.2 Restricted Boltzmann Machine (RBM) \nThe history of the RBM began with recurrent neural networks (RNN), i.e. networks with feedback, which are difficult to train. Therefore, researchers began to devise more restricted recurrent models to which simpler training algorithms could be applied. One such model was the Hopfield network; Hopfield also introduced the concept of energy by comparing neurodynamics with thermodynamics. \nThe next step was the ordinary Boltzmann machine, which differs from the Hopfield network in its stochastic nature; its neurons are divided into two groups, describing hidden and visible states. \nThe restricted Boltzmann machine differs from the ordinary one in that there are no connections among neurons of the same layer (similar to hidden Markov models). \nIn Fig. 2.4 the structure of the RBM is presented. \nFig. 2.4 RBM structure \nThe key property of this model is that, for a given state of one group of neurons, the states of the neurons in the other group are independent of each other. Let us now consider some theoretical results in which this property plays a key role. \nRBM interpretation. RBMs are interpreted similarly to hidden Markov models. They have a layer of states that we can observe (visible neurons) and a layer of states",
        "chapter": "2 Deep Neural Networks and Hybrid GMDH-Neuro-fuzzy Networks in Big Data Analysis",
        "section": "2.3 Boltzmann Machines (BM)",
        "subsection": "2.3.1 Energetic Models",
        "subsubsection": "N/A"
    },
    {
        "content": "Normalizing multiplier $Z$ is called statistical sum by analogy with physical systems \nEnergy-based model may be explored by using stochastic gradient descent at the empirical negative-logarithmic probability function of data. \nAs for logistic regression we first determine logarithmic-likelihood function and then loss function as negative logarithmic-likelihood function. \n2.3.2 Restricted Boltzmann Machine (RBM) \nThe history of developing RBM begun from recurrent neural (RNN). Representing the networks with backfeed which are difficult to train. Therefore scientists started to invent more restricted recurrent models for which more simple training algorithms may be applied. One of such models was Hopfield network, Hopfield introduced also energy concept after comparing neurodynamics with thermodynamics. \nThe next step was usual Boltzmann machines which differ from Hopfield network by stochastic nature and its neurons are divided into two groups: which describe hidden and visible states. \nThe restricted Boltzmann machines differs from usual one that there are no connections among neurons of the same layer (similar to hidden Markov models). \nIn Fig. 2.4 the structure of RBM is presented. \nThe property of this model is that at given state of one group of neurons the states of another group of neurons would be independent each of other. Now consider some theoretical results wherein this property plays a key role. \nRBM interpretation. RBM are interpreted like hidden Markov models. They have a layer of states which we can observe (visible neurons) and a layer of states \nFig. 2.4 RBM structure \nwhich are hidden and we can’t see them (hidden neuron). But we can make probabilistic inference concerning hidden states basing on visible ones. 
After training such a model, we can also draw conclusions about the visible states from the hidden ones (using Bayes’ theorem) and thereby generate data from the probability distribution on which the model was trained. \nTherefore, the RBM training goal can be formulated as follows: the model parameters must be tuned so that the restored vector is as close as possible to the original one. \nBy the restored vector we mean the vector obtained by probabilistic inference from the visible states. \nRBM Algorithm \nOften we do not want to observe the instance x completely, or we want to introduce some unobserved variables to increase the expressive power of the model. So let us consider the visible part of the model, denoted by x, and the hidden part, denoted by h. Then we can write: \n$p(x) = \\sum_{h} p(x, h) = \\frac{1}{Z} \\sum_{h} e^{-E(x, h)}$. \nThe energy function $E(v, h)$ of a restricted Boltzmann machine is defined as \n$E(v, h) = -b^{T} v - c^{T} h - h^{T} W v$, \nwhere W are the weights connecting visible and hidden neurons, and b, c are the biases of the visible and hidden layers, respectively. \nThis translates directly into the following formula for the free energy: \n$F(v) = -b^{T} v - \\sum_{i} \\log(1 + e^{c_i + W_i v})$, \nwhere $W_i$ is the i-th row of W. \nOwing to the specific RBM structure, the visible and hidden neurons are conditionally independent of each other given the other layer. Using this property, we can write: \n$p(h \\mid v) = \\prod_{i} p(h_i \\mid v)$, $p(v \\mid h) = \\prod_{j} p(v_j \\mid h)$. \nThe network consists of stochastic neurons taking the values 0 or 1 ($v_j, h_i \\in \\{0, 1\\}$). From formulas (2.10) and (2.11) we obtain the probabilistic version of the usual neuron activation: \n$p(h = 1 \\mid v) = \\sigma(W v + b_h)$, \nwhere v is the neuron input, W is the weight matrix, $b_h$ is the bias and $\\sigma(x)$ is the sigmoid function. \nThis is the basic variant for binary inputs (Bernoulli-Bernoulli RBM); there are also modifications for real-valued inputs (Gaussian-Bernoulli RBM, etc.). \nThe RBM algorithm runs as follows. \n1. Set the initial values of the input variables: $V := X$. \n2. 
Compute the probabilities $p_h$ of the states of the second-layer (hidden) neurons: $p_h = \\sigma(W v + b_h)$, where W is the weight matrix, $b_h$ is the bias vector of the second layer and σ is the activation function (sigmoid). \nStore the old values of the input neurons: $V' := V$. \n3. Determine the states h of the second-layer neurons by assigning each neuron state 1 with probability $p_h$ (and 0 otherwise). \n4. Compute the probabilities $p_v$ of the states of the first-layer neurons: $p_v = \\sigma(W^{T} h + b_v)$, where $b_v$ is the bias vector of the first layer and σ is the activation function (sigmoid), and assign each neuron state 1 with probability $p_v$ (or 0 with probability $1 - p_v$). \n5. If $v \\ne v'$, repeat from step 2; otherwise go to the next step. \n6. Output the result v. \n7. End. \n2.4 Training Method Contrastive Divergence (CD) \nThe RBM training algorithm is called contrastive divergence; it is a modified gradient descent. The likelihood function L is used as the objective, and we search for its maximum. For given parameter values $(W, b_v, b_h)$, the likelihood of a pattern v is \n$L(W, b_v, b_h \\mid v) = p(v) = \\frac{1}{Z} \\sum_{h} e^{-E(v, h)}$. \nFor simplicity of computation, we will use its logarithm:",
        "chapter": "2 Deep Neural Networks and Hybrid GMDH-Neuro-fuzzy Networks in Big Data Analysis",
        "section": "2.3 Boltzmann Machines (BM)",
        "subsection": "2.3.2 Restricted Boltzmann Machine (RBM)",
        "subsubsection": "N/A"
    },
    {
        "content": "where $\\mu$ is the so-called momentum parameter, $\\varepsilon$ is the learning rate, and $\\Delta W$, $\\Delta b_v$, $\\Delta b_h$ are the parameter changes at the previous iteration. \nAs the stopping criterion we use the MSE between the input and the output of the RBM, $E(v_0, v_k)$; this value should decrease to the preset threshold $E_{min}$. \nThe training algorithm consists of the following steps: \n1. Initialize (with zeros) the weight matrix W and the bias vectors $b_v$, $b_h$;   \n2. Choose a random mini-batch X from the whole training set;   \n3. For all examples in the mini-batch, assign the initial values of the first layer: v := X;   \n4. Execute k cycles in the network and determine the initial and final states of the layers, $v_0, h_0$ and $v_k, h_k$ (where k is a parameter);   \n5. Compute the gradient according to (2.18) and adjust the weights by (2.18);   \n6. Calculate the network MSE E;   \n7. If $E < E_{min}$, go to 8; otherwise go to 2;   \n8. End. \n2.4.1 Training Algorithm Contrastive Divergence (CD-k) \nThis algorithm was developed by Professor Hinton in 2002, and it is notable for its simplicity. The main idea is that the mathematical expectations are replaced by specific sampled values, obtained by Gibbs sampling. \nThe CD-k algorithm runs as follows: \n1. The states of the visible neurons are set equal to the input pattern;   \n2. The probabilities of the hidden-layer neuron states are calculated;   \n3. Each hidden-layer neuron is assigned state 1 with probability equal to its current probability;   \n4. The probabilities of the visible-layer states are determined from the states of the hidden layer;   \n5. If the current iteration number is less than k, return to step 2;   \n6. The probabilities of the hidden-layer neuron states are obtained. The operation of the algorithm is presented in Fig. 2.5.   
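As an illustration only, the CD-k training loop with momentum listed above can be sketched in NumPy. The names gibbs_chain and train_rbm and all default hyperparameter values are hypothetical. Unlike step 1 of the training algorithm, the weights here start from small random values rather than zeros (an assumption made so that hidden units do not remain identical), and the stopping error is computed from the visible probabilities at step k.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gibbs_chain(v0, W, b_v, b_h, k, rng):
    '''Start from the data v0 and run k >= 1 steps of block Gibbs sampling.'''
    ph0 = sigmoid(v0 @ W + b_h)                      # p(h = 1 | v0)
    h = (rng.random(ph0.shape) < ph0).astype(float)  # sample hidden states
    for _ in range(k):
        pv = sigmoid(h @ W.T + b_v)                  # p(v = 1 | h)
        v = (rng.random(pv.shape) < pv).astype(float)
        ph = sigmoid(v @ W + b_h)                    # p(h = 1 | v)
        h = (rng.random(ph.shape) < ph).astype(float)
    return ph0, v, pv, ph                            # statistics for v_0 and v_k

def train_rbm(X, n_hidden, k=1, eps=0.1, mu=0.5, e_min=0.05,
              batch_size=10, max_iter=3000, seed=0):
    '''CD-k with momentum; stops once the reconstruction MSE drops below e_min.'''
    rng = np.random.default_rng(seed)
    n_visible = X.shape[1]
    # Assumption: small random weights instead of zeros to break symmetry.
    W = 0.01 * rng.standard_normal((n_visible, n_hidden))
    b_v, b_h = np.zeros(n_visible), np.zeros(n_hidden)
    dW, db_v, db_h = np.zeros_like(W), np.zeros_like(b_v), np.zeros_like(b_h)
    err = np.inf
    for _ in range(max_iter):
        idx = rng.choice(len(X), size=min(batch_size, len(X)), replace=False)
        v0 = X[idx]                                  # random mini-batch
        ph0, vk, pvk, phk = gibbs_chain(v0, W, b_v, b_h, k, rng)
        n = len(v0)
        # Expectations replaced by sample averages: data term minus model term.
        grad_W = (v0.T @ ph0 - vk.T @ phk) / n
        grad_bv = (v0 - vk).mean(axis=0)
        grad_bh = (ph0 - phk).mean(axis=0)
        # Momentum: new change = mu * previous change + eps * gradient.
        dW = mu * dW + eps * grad_W
        db_v = mu * db_v + eps * grad_bv
        db_h = mu * db_h + eps * grad_bh
        W, b_v, b_h = W + dW, b_v + db_v, b_h + db_h
        err = np.mean((v0 - pvk) ** 2)               # reconstruction MSE at step k
        if err < e_min:
            break
    return W, b_v, b_h, err
```

Setting k = 1 gives the widely used CD-1 variant; a larger k runs a longer Gibbs chain per update.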
\nThe longer the sampling (the larger k), the more accurate the CD algorithm. \n\n2.4.2 Example \nConsider an implementation of the model presented above. First, several images of Latin letters are stored in memory. Then similar, distorted patterns are shown to the system, and the original patterns must be restored from them. The training set is presented below: \n[Figure: training set, Latin letters A to Z] \nResults of the algorithm:",
        "chapter": "2 Deep Neural Networks and Hybrid GMDH-Neuro-fuzzy Networks in Big Data Analysis",
        "section": "2.4 Training Method Contrastive Divergence (CD)",
        "subsection": "2.4.1 Training Algorithm Contrastive Divergence (CD-k)",
        "subsubsection": "N/A"
    },
    {
        "content": "2.4.2 Example \nConsider an implementation of the model presented above. First, several images of Latin letters are stored in memory. Then similar, distorted patterns are shown to the system, and the original patterns must be restored from them. The training set is presented below: \n[Figure: training set, Latin letters A to Z] \nResults of the algorithm: \nThe training-error curves are presented in Fig. 2.6a, b, and the weight maps of the hidden layer in Fig. 2.7. \n[Figure: restored letter patterns] \n2.5 Stacked Autoassociators Networks \n2.5.1 Stacked Autoencoder (SAE) \nTo extract high-level abstractions from the input set, autoassociators are stacked into a network. Fig. 2.8 shows the structure of a stacked autoencoder; as a whole it represents a deep learning network whose weights are initialized by the stacked autoencoder. \n2.5.2 Stacked RBM \nFig. 2.9 shows the structure of a stacked restricted Boltzmann machine (SRBM) together with a neural network; together they represent a deep neural network whose weights are initialized by the SRBM.",
        "chapter": "2 Deep Neural Networks and Hybrid GMDH-Neuro-fuzzy Networks in Big Data Analysis",
        "section": "2.4 Training Method Contrastive Divergence (CD)",
        "subsection": "2.4.2 Example",
        "subsubsection": "N/A"
    },
    {
        "content": "The training-error curves are presented in Fig. 2.6a, b, and the weight maps of the hidden layer in Fig. 2.7. \n[Figure: restored letter patterns] \n2.5 Stacked Autoassociators Networks \n2.5.1 Stacked Autoencoder (SAE) \nTo extract high-level abstractions from the input set, autoassociators are stacked into a network. Fig. 2.8 shows the structure of a stacked autoencoder; as a whole it represents a deep learning network whose weights are initialized by the stacked autoencoder. \n2.5.2 Stacked RBM \nFig. 2.9 shows the structure of a stacked restricted Boltzmann machine (SRBM) together with a neural network; together they represent a deep neural network whose weights are initialized by the SRBM.",
        "chapter": "2 Deep Neural Networks and Hybrid GMDH-Neuro-fuzzy Networks in Big Data Analysis",
        "section": "2.5 Stacked Autoassociators Networks",
        "subsection": "2.5.1 Stacked Autoencoder (SAE)",
        "subsubsection": "N/A"
    },
    {
        "content": "The training-error curves are presented in Fig. 2.6a, b, and the weight maps of the hidden layer in Fig. 2.7. \n[Figure: restored letter patterns] \n2.5 Stacked Autoassociators Networks \n2.5.1 Stacked Autoencoder (SAE) \nTo extract high-level abstractions from the input set, autoassociators are stacked into a network. Fig. 2.8 shows the structure of a stacked autoencoder; as a whole it represents a deep learning network whose weights are initialized by the stacked autoencoder. \n2.5.2 Stacked RBM \nFig. 2.9 shows the structure of a stacked restricted Boltzmann machine (SRBM) together with a neural network; together they represent a deep neural network whose weights are initialized by the SRBM. \nThe structures of deep networks are drawn this way to emphasize that information is extracted upward (from bottom to top). \n2.6 Deep Networks Learning \nThe process of training deep networks is split into two stages [5–7]: \n1. Pretraining;   \n2. Fine-tuning of the weights. 
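A minimal NumPy sketch of the pretraining stage, following the layer-by-layer procedure of Sect. 2.6.1. It assumes sigmoid units, a separate (untied) decoder for each autoencoder and plain batch gradient descent on the squared reconstruction error; the names train_autoencoder, pretrain and encode are hypothetical. The supervised fine-tuning stage is not shown: the returned (W, b) pairs would simply initialize the hidden layers of an MLP that is then trained by any gradient method.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train_autoencoder(X, n_hidden, lr=0.5, epochs=500, seed=0):
    '''Autoencoder X -> H -> X_hat for one pair of layers (squared error, batch GD).'''
    rng = np.random.default_rng(seed)
    n, n_in = X.shape
    W1 = 0.1 * rng.standard_normal((n_in, n_hidden))   # layers i -> i+1 (kept)
    b1 = np.zeros(n_hidden)
    W2 = 0.1 * rng.standard_normal((n_hidden, n_in))   # auxiliary output layer
    b2 = np.zeros(n_in)
    losses = []
    for _ in range(epochs):
        H = sigmoid(X @ W1 + b1)
        X_hat = sigmoid(H @ W2 + b2)
        err = X_hat - X
        losses.append(float(np.mean(err ** 2)))
        # Backpropagate the squared reconstruction error through both layers.
        d_out = err * X_hat * (1.0 - X_hat)
        d_hid = (d_out @ W2.T) * H * (1.0 - H)
        W2 -= lr * (H.T @ d_out) / n
        b2 -= lr * d_out.mean(axis=0)
        W1 -= lr * (X.T @ d_hid) / n
        b1 -= lr * d_hid.mean(axis=0)
    return W1, b1, losses

def pretrain(X0, hidden_sizes, **kwargs):
    '''Greedy layer-wise pretraining; returns (W, b) for every hidden layer.'''
    params, X = [], X0
    for n_hidden in hidden_sizes:
        W, b, _ = train_autoencoder(X, n_hidden, **kwargs)  # decoder is discarded
        params.append((W, b))
        X = sigmoid(X @ W + b)   # data set X_{i+1} for the next autoencoder
    return params

def encode(X, params):
    '''Forward pass through the pretrained stack (initial MLP hidden layers).'''
    for W, b in params:
        X = sigmoid(X @ W + b)
    return X
```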
\n2.6.1 Deep Network Pretraining \nAt the first stage, an auto-associative network (SAE or SRBM) is trained without supervision on an array of unlabeled data, after which the neurons of the MLP hidden layers are initialized with the weights obtained by this training. Fig. 2.9 shows this process of training and transfer. After the first AE/RBM has been trained, the outputs of its hidden-layer neurons become the inputs of the second one, and so on. In this way, increasingly general information about the structure of the data (lines, contours, etc.) is extracted. \nLet us consider the pretraining procedure in more detail. We take pairs of neighboring layers of the deep network, beginning with the first layer, and construct an autoencoder from each pair by adding an output layer identical to the input one. This is repeated sequentially for all network layers. The procedure may be described as follows. \n1. Load a training data set $X_0$;   \n2. Determine the network parameters: the number of layers N and their sizes;   \n3. Set the current layer number i = 0;   \n4. Build an autoencoder for layers i, i + 1;   \n5. Train the autoencoder on the set $X_i$;   \n6. Remove the auxiliary (output) layer of the autoencoder;   \n7. Preserve the connection weights of layers i, i + 1;   \n8. If there are still pairs of layers to be processed (i < N − 2), go to the next step; otherwise go to step 10;   \n9. Generate the data set $X_{i+1}$ for the next autoencoder by propagating the data set $X_i$ through layers i, i + 1; set i := i + 1 and go to step 4;   \n10. End of work. \nAfter this procedure, the network is trained as a whole by one of the gradient methods. \nBesides, for deep networks with more than three hidden layers, G. Hinton suggested performing fine-tuning in two stages as well: first only the two upper layers are trained, and only after that the whole network. It is worth noting that with unsupervised learning SRBM gives less stable results than SAE.",
        "chapter": "2 Deep Neural Networks and Hybrid GMDH-Neuro-fuzzy Networks in Big Data Analysis",
        "section": "2.5 Stacked Autoassociators Networks",
        "subsection": "2.5.2 Stacked RBM",
        "subsubsection": "N/A"
    },
    {
        "content": "The structures of deep networks are drawn this way to emphasize that information is extracted upward (from bottom to top). \n2.6 Deep Networks Learning \nThe process of training deep networks is split into two stages [5–7]: \n1. Pretraining;   \n2. Fine-tuning of the weights. \n2.6.1 Deep Network Pretraining \nAt the first stage, an auto-associative network (SAE or SRBM) is trained without supervision on an array of unlabeled data, after which the neurons of the MLP hidden layers are initialized with the weights obtained by this training. Fig. 2.9 shows this process of training and transfer. After the first AE/RBM has been trained, the outputs of its hidden-layer neurons become the inputs of the second one, and so on. In this way, increasingly general information about the structure of the data (lines, contours, etc.) is extracted. \nLet us consider the pretraining procedure in more detail. We take pairs of neighboring layers of the deep network, beginning with the first layer, and construct an autoencoder from each pair by adding an output layer identical to the input one. This is repeated sequentially for all network layers. The procedure may be described as follows. \n1. Load a training data set $X_0$;   \n2. Determine the network parameters: the number of layers N and their sizes;   \n3. Set the current layer number i = 0;   \n4. Build an autoencoder for layers i, i + 1;   \n5. Train the autoencoder on the set $X_i$;   \n6. Remove the auxiliary (output) layer of the autoencoder;   \n7. Preserve the connection weights of layers i, i + 1;   \n8. If there are still pairs of layers to be processed (i < N − 2), go to the next step; otherwise go to step 10;   \n9. Generate the data set $X_{i+1}$ for the next autoencoder by propagating the data set $X_i$ through layers i, i + 1; set i := i + 1 and go to step 4;   \n10. End of work. \nAfter this procedure, the network is trained as a whole by one of the gradient methods. \nBesides, for deep networks with more than three hidden layers, G. Hinton suggested performing fine-tuning in two stages as well: first only the two upper layers are trained, and only after that the whole network. It is worth noting that with unsupervised learning SRBM gives less stable results than SAE. \n2.6.2 Fine-Tuning \nAt the second stage, the MLP weights are fine-tuned (supervised training) by known methods. It has been found in practice that such initialization places the weights of the MLP hidden-layer neurons close to the region of the global minimum, so the subsequent fine-tuning takes very little time. Fine-tuning is the process of making small changes to the weights to improve or optimize the results; as a rule, it is aimed at increasing the efficiency of the process. Fine-tuning may be performed by a number of methods, depending on the process being optimized, including first-order gradient methods and second-order methods (Newton and quasi-Newton methods), among others. \n2.7 Deep Learning Regularization \nIn neural network training there are two types of errors: the training error $\\varepsilon_{tr}$ and the generalization error $\\varepsilon_{gen}$. The training error is the error on the training sample, while the generalization error is the error on the test sample. Both errors are functions of the number of training iterations n and behave differently: $\\varepsilon_{tr}$ decreases monotonically with n, while $\\varepsilon_{gen}$ first decreases, reaches a minimum, and then begins to rise as n increases (this phenomenon is called overfitting). The goal of training is to minimize the generalization error. \nRegularization is any modification of the training algorithm aimed at decreasing the generalization error at the expense of some increase in the training error. Regularization is one of the central problems in machine learning, rivaled in importance only by optimization. \nAccording to the No Free Lunch theorem, there is no best machine learning algorithm and, in particular, no best regularization method. \nInstead, we need to choose a form of regularization that fits the problem to be solved well. The philosophy of deep learning as a whole is that a wide range of problems (such as all intelligent tasks) can be solved efficiently using general forms (methods) of regularization. Let us consider the most popular regularization methods and their models. \n2.7.1 $L_p$-Regularization of Linear Regression \nConsider the classic linear regression model",
        "chapter": "2 Deep Neural Networks and Hybrid GMDH-Neuro-fuzzy Networks in Big Data Analysis",
        "section": "2.6 Deep Networks Learning",
        "subsection": "2.6.1 Deep Network Pretraining",
        "subsubsection": "N/A"
    },
    {
        "content": "2.6.2 Fine-Tuning \nAt the second stage, the MLP weights are fine-tuned (supervised training) by known methods. It has been found in practice that such initialization places the weights of the MLP hidden-layer neurons close to the region of the global minimum, so the subsequent fine-tuning takes very little time. Fine-tuning is the process of making small changes to the weights to improve or optimize the results; as a rule, it is aimed at increasing the efficiency of the process. Fine-tuning may be performed by a number of methods, depending on the process being optimized, including first-order gradient methods and second-order methods (Newton and quasi-Newton methods), among others. \n2.7 Deep Learning Regularization \nIn neural network training there are two types of errors: the training error $\\varepsilon_{tr}$ and the generalization error $\\varepsilon_{gen}$. The training error is the error on the training sample, while the generalization error is the error on the test sample. Both errors are functions of the number of training iterations n and behave differently: $\\varepsilon_{tr}$ decreases monotonically with n, while $\\varepsilon_{gen}$ first decreases, reaches a minimum, and then begins to rise as n increases (this phenomenon is called overfitting). The goal of training is to minimize the generalization error. \nRegularization is any modification of the training algorithm aimed at decreasing the generalization error at the expense of some increase in the training error. Regularization is one of the central problems in machine learning, rivaled in importance only by optimization. \nAccording to the No Free Lunch theorem, there is no best machine learning algorithm and, in particular, no best regularization method. \nInstead, we need to choose a form of regularization that fits the problem to be solved well. The philosophy of deep learning as a whole is that a wide range of problems (such as all intelligent tasks) can be solved efficiently using general forms (methods) of regularization. Let us consider the most popular regularization methods and their models. \n2.7.1 $L_p$-Regularization of Linear Regression \nConsider the classic linear regression model",
        "chapter": "2 Deep Neural Networks and Hybrid GMDH-Neuro-fuzzy Networks in Big Data Analysis",
        "section": "2.6 Deep Networks Learning",
        "subsection": "2.6.2 Fine-Tuning",
        "subsubsection": "N/A"
    },
    {
        "content": "2.6.2 Fine-Tuning \nAt the second stage, the MLP weights are fine-tuned (supervised training) by known methods. It has been found in practice that such initialization places the weights of the MLP hidden-layer neurons close to the region of the global minimum, so the subsequent fine-tuning takes very little time. Fine-tuning is the process of making small changes to the weights to improve or optimize the results; as a rule, it is aimed at increasing the efficiency of the process. Fine-tuning may be performed by a number of methods, depending on the process being optimized, including first-order gradient methods and second-order methods (Newton and quasi-Newton methods), among others. \n2.7 Deep Learning Regularization \nIn neural network training there are two types of errors: the training error $\\varepsilon_{tr}$ and the generalization error $\\varepsilon_{gen}$. The training error is the error on the training sample, while the generalization error is the error on the test sample. Both errors are functions of the number of training iterations n and behave differently: $\\varepsilon_{tr}$ decreases monotonically with n, while $\\varepsilon_{gen}$ first decreases, reaches a minimum, and then begins to rise as n increases (this phenomenon is called overfitting). The goal of training is to minimize the generalization error. \nRegularization is any modification of the training algorithm aimed at decreasing the generalization error at the expense of some increase in the training error. Regularization is one of the central problems in machine learning, rivaled in importance only by optimization. \nAccording to the No Free Lunch theorem, there is no best machine learning algorithm and, in particular, no best regularization method. \nInstead, we need to choose a form of regularization that fits the problem to be solved well. The philosophy of deep learning as a whole is that a wide range of problems (such as all intelligent tasks) can be solved efficiently using general forms (methods) of regularization. Let us consider the most popular regularization methods and their models. \n2.7.1 $L_p$-Regularization of Linear Regression \nConsider the classic linear regression model \nSearching for the weights w by maximizing the likelihood of the sample under this model is equivalent to the least squares method (LSM): \nwhere $x_i \\in \\mathbb{R}^{N}$ is the vector of values of the i-th feature for all objects in the sample $X = [x_1, \\ldots, x_d]$. Note that the notation $x_i$ introduced here differs from the standard one, in which $x_i$ denotes the i-th object of the sample. Here and below the sample is assumed to be normalized. \nProblem (2.21) has a simple geometric interpretation: finding the projection of the vector t onto the hyperplane spanned by the vectors $x_1, x_2, \\ldots, x_d$ (see Fig. 2.10). This problem can be solved analytically: \nThe solution for w corresponds to the pseudo-solution of the system of linear equations $Xw = t$. \nTo prevent overfitting of linear regression, the variability of the solution must be constrained. This can be done by imposing a constraint on the norm of the weight vector w: \nTraditionally, instead of solving problem (2.23), one considers the optimization of the following regularized functional: \nIt is easy to show that optimization problems (2.23), (2.24) and (2.25) are equivalent under the condition $p \\geq 1$, i.e. when all the considered functions are convex. \nThen, by a variant of the Kuhn–Tucker theorem for convex functions, a necessary and sufficient condition for the existence of a solution $\\hat{w}$ of problem (2.23), (2.24) is the existence of $\\lambda \\geq 0$ for which the following conditions hold: \n1. 
Principle of minimum: $L(\\hat{w}; \\lambda) = \\min_{w} L(w; \\lambda)$, that is, $\\nabla_{w} L(\\hat{w}; \\lambda) = 0$;   \n2. Complementary slackness condition: $\\lambda (\\|w\\|_{L_p}^{p} - b) = 0$. \nNote that the sufficiency of conditions 1 and 2 also requires the so-called “Slater condition”, i.e. the existence of a point w such that $\\|w\\|_{L_p}^{p} < b$. \nClearly, this condition holds when b > 0. \nOptimization problem (2.25) is equivalent to condition 1. Consider condition 2. It is equivalent to one of two events occurring: $\\lambda = 0$ or $\\|w\\|_{L_p}^{p} = b$. \nIf $\\lambda = 0$, the optimal point $\\hat{w}$ lies inside the region $\\|w\\|_{L_p}^{p} < b$. Consequently, the constraint $\\|w\\|_{L_p}^{p} \\leq b$ is inactive, and optimization problem (2.23) becomes an unconstrained problem, equivalent to problem (2.25) with $\\lambda = 0$. \nLet $\\lambda > 0$ and $\\|w\\|_{L_p}^{p} = b$. This constraint is easy to satisfy in problem (2.25): simply take b equal to the norm of the vector w that is optimal for problem (2.25). \nConsider the optimal solution of problem (2.23), (2.24) for different p. It can be shown that for $p \\leq 1$ the optimal solution has the sparsity property, i.e. some of the weights are exactly zero. For p > 1, exactly zero weights in the optimal solution are practically impossible. The case p = 1 is special, because the optimized functional (2.25) is then convex and the optimal solution is sparse. \nThe method of fitting the weights of linear regression by solving problem (2.24) or (2.25) with the $L_1$ norm is called LASSO (Least Absolute Shrinkage and Selection Operator). 
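To illustrate the sparsity property, the sketch below compares the L2 penalty (ridge regression, solved in closed form) with the L1 penalty (LASSO) for the penalized form (2.25), assuming that functional is written as the squared residual norm plus lambda times the p-th power of the Lp norm. The LASSO solver is cyclic coordinate descent with soft thresholding, a standard technique swapped in here for illustration rather than a method taken from the text; the function names are hypothetical.

```python
import numpy as np

def ridge(X, t, lam):
    '''Minimize ||Xw - t||^2 + lam * ||w||_2^2 in closed form.'''
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ t)

def soft_threshold(z, gamma):
    '''Shrink z toward zero by gamma; returns exactly 0 when |z| <= gamma.'''
    return np.sign(z) * max(abs(z) - gamma, 0.0)

def lasso(X, t, lam, n_iter=200):
    '''Minimize ||Xw - t||^2 + lam * ||w||_1 by cyclic coordinate descent.'''
    d = X.shape[1]
    w = np.zeros(d)
    col_sq = (X ** 2).sum(axis=0)
    for _ in range(n_iter):
        for j in range(d):
            r_j = t - X @ w + X[:, j] * w[j]      # residual without feature j
            # Exact minimizer over w_j: the threshold is lam / 2 for this scaling.
            w[j] = soft_threshold(X[:, j] @ r_j, lam / 2) / col_sq[j]
    return w
```

On normalized data with only a few informative features, the L1 solution typically contains exact zeros, while the L2 solution merely shrinks all weights toward zero.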
\n2.7.2 Early Stopping \nEarly stopping splits the training process into actual training and validation. Instead of training the network for a fixed number of iterations, we train it until its performance on the validation data begins to deteriorate. In effect, this prevents the network from",
        "chapter": "2 Deep Neural Networks and Hybrid GMDH-Neuro-fuzzy Networks in Big Data Analysis",
        "section": "2.7 Deep Learning Regularization",
        "subsection": "2.7.1 Lp-Regularization of Linear Regression",
        "subsubsection": "N/A"
    },
    {
        "content": "It is easy to show that optimization problems (2.23), (2.24) and (2.25) are equivalent under the condition $p \\geq 1$, i.e. when all the considered functions are convex. \nThen, by a variant of the Kuhn–Tucker theorem for convex functions, a necessary and sufficient condition for the existence of a solution $\\hat{w}$ of problem (2.23), (2.24) is the existence of $\\lambda \\geq 0$ for which the following conditions hold: \n1. Principle of minimum: $L(\\hat{w}; \\lambda) = \\min_{w} L(w; \\lambda)$, that is, $\\nabla_{w} L(\\hat{w}; \\lambda) = 0$;   \n2. Complementary slackness condition: $\\lambda (\\|w\\|_{L_p}^{p} - b) = 0$. \nNote that the sufficiency of conditions 1 and 2 also requires the so-called “Slater condition”, i.e. the existence of a point w such that $\\|w\\|_{L_p}^{p} < b$. \nClearly, this condition holds when b > 0. \nOptimization problem (2.25) is equivalent to condition 1. Consider condition 2. It is equivalent to one of two events occurring: $\\lambda = 0$ or $\\|w\\|_{L_p}^{p} = b$. \nIf $\\lambda = 0$, the optimal point $\\hat{w}$ lies inside the region $\\|w\\|_{L_p}^{p} < b$. Consequently, the constraint $\\|w\\|_{L_p}^{p} \\leq b$ is inactive, and optimization problem (2.23) becomes an unconstrained problem, equivalent to problem (2.25) with $\\lambda = 0$. \nLet $\\lambda > 0$ and $\\|w\\|_{L_p}^{p} = b$. This constraint is easy to satisfy in problem (2.25): simply take b equal to the norm of the vector w that is optimal for problem (2.25). \nConsider the optimal solution of problem (2.23), (2.24) for different p. It can be shown that for $p \\leq 1$ the optimal solution has the sparsity property, i.e. some of the weights are exactly zero. For p > 1, exactly zero weights in the optimal solution are practically impossible. The case p = 1 is special, because the optimized functional (2.25) is then convex and the optimal solution is sparse. \nThe method of fitting the weights of linear regression by solving problem (2.24) or (2.25) with the $L_1$ norm is called LASSO (Least Absolute Shrinkage and Selection Operator). \n2.7.2 Early Stopping \nEarly stopping splits the training process into actual training and validation. Instead of training the network for a fixed number of iterations, we train it until its performance on the validation data begins to deteriorate. In effect, this prevents the network from simply memorizing the patterns. Fig. 2.11 shows two possible stopping points: \nFig. 2.12 shows the performance and the degree of overfitting after stopping at these points (a, b): \nRegularization penalizes the network for using an overly complex structure. Complexity here is measured by the network size and the magnitude of its weights, and the penalty is implemented by adding a term that depends on them to the loss function: \nwhere n is the number of weights in the neural network. \nThe parameters $\\alpha$ and $\\beta$ control the level beyond which underfitting or overfitting occurs. Suitable values for them can be found by optimization or Bayesian analysis (Fig. 2.13). \n2.7.3 Dropout \nThe main idea of Dropout is to train, instead of one DNN, an ensemble of several DNNs and then to average the obtained results [3, 4]. \nThe networks for training are obtained by excluding (dropping out) neurons from the network with probability p, so that the probability that a neuron remains in the network is $q = 1 - p$. “Dropout” of a neuron means that it returns 0 for any input data. 
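The following NumPy sketch illustrates the dropout mask and two ways of compensating for it: direct Dropout, which rescales activations by q at the test stage, and inverted Dropout, which rescales them by 1/q during training. Here p is the drop probability, as in the text; the function names are hypothetical.

```python
import numpy as np

def dropout_train(a, p, rng, inverted=True):
    '''Apply dropout to activations a during training.

    Each unit is kept with probability q = 1 - p (mask value 1)
    and dropped with probability p (mask value 0).
    '''
    q = 1.0 - p
    mask = (rng.random(a.shape) < q).astype(a.dtype)
    # Inverted Dropout rescales by 1/q now, so the test stage needs no change.
    return a * mask / q if inverted else a * mask

def dropout_test(a, p, inverted=True):
    '''Test-stage output: direct Dropout multiplies by q, inverted leaves a as is.'''
    return a if inverted else a * (1.0 - p)
```

In both variants the expected training-stage output of a unit equals its test-stage output; the inverted form simply moves the rescaling into training.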
\nExcluded neurons do not contribute to the training process at any stage of the backpropagation algorithm; therefore dropping out even one neuron is equivalent to training a new neural network. \nThe dropout probabilities of all neurons are equal. This means the following. Given that \n• $h(x) = xW + b$ is a linear projection of the input vector x from the $d_i$-dimensional space onto the $d_h$-dimensional space of output variables, and $a(h)$ is the activation function, \nthe application of Dropout to this projection at the training stage can be presented as a modified activation function:",
        "chapter": "2 Deep Neural Networks and Hybrid GMDH-Neuro-fuzzy Networks in Big Data Analysis",
        "section": "2.7 Deep Learning Regularization",
        "subsection": "2.7.2 Early Stopping",
        "subsubsection": "N/A"
    },
    {
        "content": "2.7.3 Dropout \nThe main idea of Dropout is instead of training one DNN to train an ensemble of several DNN and then to average the obtained results [3, 4]. \nNetworks for training are obtained by excluding from a network (dropping out) neurons with probability p, so that the probability that neuron will remain in the network is equal ${ mathsf { q } } = 1 - { mathsf { p } }$ . “Dropout” of a neuron means that under any input data it return value 0. \nExcluded neurons don’t contribute in training process at all stages of algorithm backpropagation; therefore dropout even one neuron is equivalent to training new neural network. \nThe probabilities of dropout each of neurons are equal. It means the following. Using conditions, that: \n• $mathbf { h } ( mathbf { x } ) = mathbf { x W } + mathbf { b }$ is linear projection of input vector X in the space of dimension $mathrm { d } _ { mathrm { i } }$ on $mathbf { d } _ { mathrm { h } }$ -dimensional space of output variables; $mathrm { a ( h ) }$ is activation function, \nthe application of Dropout to this projection at the training stage is possible to present as a modified activation function: \nwhere $D = ( x 1 , x 2 , ldots x _ { d h } ) - mathrm { d _ { h } } .$ -dimensional vector of random variables $X _ { i }$ , distributed by Bernoulli law. \nThen $X _ { i }$ has the following probability distribution: \nwhere $mathbf { k }$ are all possible output values. \nIt’s evident that this random variable ideally matches to Dropout procedure, applied to one neuron. Indeed, a neuron is switched off with probability $p = P ( k = 1 )$ , otherwise it remains switched on. Consider the application of Dropout to $i$ -th neuron: \nwhere $P ( X _ { i } = 0 ) = p$ : \nAs at the training stage a neuron remains switched on with probability $q$ , at the test stage we need emulate the behavior of ensemble of neurons which used was at the training stage. 
For that it was suggested at the test stage to multiply activation function at a coefficient $q$ . So, we have \nIt’s possible to use other approach—so-called back Dropout. In this case we multiply activation function at the proper coefficient not at the test stage but at the training stage. This coefficient is equal to the inverse value of probability that neuron remains in a network switched on: $begin{array} { r } { frac { 1 } { 1 - p } = frac { 1 } { q } , } end{array}$ \nIn this case output of the ith hidden neuron is equal: \nIn the case of direct Dropout we are compelled to change a neural network for testing as without multiply $q$ a neuron will return the signal higher than those which \nnext neurons are waiting to get: therefore implementation of inverse Dropout is used more often. \n2.7.4 Bagging (Ensemble Method) \nOne of ways to get ensemble of networks is application training by different training samples which are obtained as a result of random process so called bagging. Bagging (short for bootstrap aggregating)—is a method of decrease of generalization error by aggregating several models [4]. The idea lies in that to train several different models separately and then all models vote at output at test sample. This is an example of general strategy of machine learning called averaging model. \nMethods using this strategy are known as ensemble methods. The cause to use averaging lies that usually different models make different errors in test sample. Consider for instance a set of $mathbf { k }$ regression models. Assume that each model make error $epsilon _ { i }$ for each pattern i obtained from multivariate normal distribution with variances $E big [ epsilon _ { i } ^ { 2 } big ] = nu$ and covariance $E big [ epsilon _ { i } epsilon _ { j } big ] = c$ : \nThen the averaged prediction error made by whole ensemble is equal $mathfrak { s } - frac { 1 } { k } sum _ { i } c _ { i }$ . 
The expected squared error of the ensemble predictor is: \n$E[(\\frac{1}{k} \\sum_i \\epsilon_i)^2] = \\frac{1}{k^2} E[\\sum_i (\\epsilon_i^2 + \\sum_{j \\neq i} \\epsilon_i \\epsilon_j)] = \\frac{1}{k} v + \\frac{k - 1}{k} c$. \nIn the case when the errors are perfectly correlated and $c = v$, the MSE equals $v$, so model averaging does not help at all. In the case when the errors of different models are uncorrelated and $c = 0$, the MSE of the ensemble is only $\\frac{1}{k} v$. \nThis means that, with uncorrelated errors, the expected squared error of the ensemble decreases in inverse proportion to the ensemble size. In other words, on average the ensemble performs at least as well as any of its members, and if the members make independent errors, the ensemble performs significantly better than its members. \n2.8 Cascade Neo-fuzzy Neural Networks Structure Synthesis and Learning with Application of GMDH \nIntroduction \nIn recent years the problem of forecasting stock prices and market indexes has become of great importance, and various approaches have been applied to solve it. The most promising forecasting methods for markets are neural networks, especially fuzzy neural networks, and the GMDH. It was proved earlier that neural networks are universal approximators [4]; they also have some remarkable properties, such as parallel processing of information, the ability to work with incomplete and noisy input data, and the ability to learn to achieve the desired response (output).",
        "chapter": "2 Deep Neural Networks and Hybrid GMDH-Neuro-fuzzy Networks in Big Data Analysis",
        "section": "2.7 Deep Learning Regularization",
        "subsection": "2.7.3 Dropout",
        "subsubsection": "N/A"
    },
    {
        "content": "next neurons are waiting to get: therefore implementation of inverse Dropout is used more often. \n2.7.4 Bagging (Ensemble Method) \nOne of ways to get ensemble of networks is application training by different training samples which are obtained as a result of random process so called bagging. Bagging (short for bootstrap aggregating)—is a method of decrease of generalization error by aggregating several models [4]. The idea lies in that to train several different models separately and then all models vote at output at test sample. This is an example of general strategy of machine learning called averaging model. \nMethods using this strategy are known as ensemble methods. The cause to use averaging lies that usually different models make different errors in test sample. Consider for instance a set of $mathbf { k }$ regression models. Assume that each model make error $epsilon _ { i }$ for each pattern i obtained from multivariate normal distribution with variances $E big [ epsilon _ { i } ^ { 2 } big ] = nu$ and covariance $E big [ epsilon _ { i } epsilon _ { j } big ] = c$ : \nThen the averaged prediction error made by whole ensemble is equal $mathfrak { s } - frac { 1 } { k } sum _ { i } c _ { i }$ . And mean squared error of ensemble is: \nIn the case when all errors completely correlated and $mathrm { { boldsymbol { c } } = mathrm { { boldsymbol { v } } } }$ , MSE is equal to $mathbf { v }$ , therefore the averaging of models doesn’t help at all. But in case when errors of different models are non-correlated $mathrm { ~ c ~ } = 0$ , MSE of ensemble is equal ${ frac { 1 } { k } } nu$ : \nThis means that MSE of ensemble linearly decreases with the size of ensemble. In other words in average the ensemble will behave at least not worse as any of its members and if all members make independent errors the ensemble will behave much better than its members. 
\n2.8 Cascade Neo-fuzzy Neural Networks Structure Synthesis and Learning with Application of GMDH \nIntroduction \nIn recent years the problem of forecasting stock prices and market indexes has become of great importance, and various approaches have been applied to solve it. The most promising forecasting methods for markets are neural networks, especially fuzzy neural networks, and the GMDH. It was proved earlier that neural networks are universal approximators [4]; they also have some remarkable properties, such as parallel processing of information, the ability to work with incomplete and noisy input data, and the ability to learn to achieve the desired response (output).",
        "chapter": "2 Deep Neural Networks and Hybrid GMDH-Neuro-fuzzy Networks in Big Data Analysis",
        "section": "2.7 Deep Learning Regularization",
        "subsection": "2.7.4 Bagging (Ensemble Method)",
        "subsubsection": "N/A"
    },
    {
        "content": "The GMDH, from the other side, uses the principle of self-organization that allows to construct an optimal structure of the forecasting model during the algorithm operation [8–12]. It’s very promising to combine advantages of these both approaches for the solution of the problem—constructing an efficient model for the financial markets forecasting under BD conditions. \nIn the following presentation synthesis algorithm of the Neo-Fuzzy deep network using the GMDH is considered and its application for financial processes forecasting at stock markets is described. Experimental investigations of the efficiency of the proposed approach and its comparison with application of Neo-Fuzzy Neural Network with constant architecture are also presented. \n2.8.1 The Neo-fuzzy Neuron \nThe architecture of the neo-fuzzy neuron (NFN) was proposed by Takeshi Yamakawa and co-authors in [13–15]. The authors of the NFN admit among its most important advantages, the high rate of learning, computational simplicity, the possibility of finding the global minimum of the learning criterion in real time and also that it is characterized by fuzzy linguistic “if-then” rules. The neo-fuzzy neuron is a nonlinear multi-input single-output system shown in Fig. 2.14. \nIt realizes the following mapping: \nwhere $x _ { i }$ is the $i cdot$ -th input $( i = 1 , 2 , . . . , n )$ , $hat { y }$ is a system output. Structural blocks of neo-fuzzy neuron are nonlinear synapses $mathrm { N S _ { i } }$ which perform transformation of $i$ -th input signal in the from \nand realize fuzzy inference \nwhere $x _ { j i }$ is a fuzzy set which membership function is $mu _ { j i } , w _ { j i }$ is a singleton (synaptic weight) in consequent. As it can be readily seen nonlinear synapse in fact realizes Takagi-Sugeno fuzzy inference of zero order [16, 17]. 
\nConventionally, the membership functions $\\mu_{ji}(x_i)$ in the antecedent are complementary triangular functions, as shown in Fig. 2.15. \nFor preliminarily normalized input variables $x_i$ (usually $0 \\le x_i \\le 1$), the membership functions can be expressed in the form: \n$\\mu_{ji}(x_i) = \\frac{x_i - c_{j-1,i}}{c_{ji} - c_{j-1,i}}$ for $x_i \\in [c_{j-1,i}, c_{ji}]$, $\\mu_{ji}(x_i) = \\frac{c_{j+1,i} - x_i}{c_{j+1,i} - c_{ji}}$ for $x_i \\in [c_{ji}, c_{j+1,i}]$, and $\\mu_{ji}(x_i) = 0$ otherwise, \nwhere $c_{ji}$ are arbitrarily selected centers of the corresponding membership functions. Usually they are equally spaced on the interval [0, 1], which simplifies the fuzzy inference process: an input signal $x_i$ activates only two neighboring membership functions simultaneously, and the sum of the grades of these two membership functions equals unity (the so-called Ruspini partitioning), i.e. \n$\\mu_{ji}(x_i) + \\mu_{j+1,i}(x_i) = 1$. \nThus, the fuzzy inference result produced by the center-of-gravity defuzzification method can be given in the very simple form \n$f_i(x_i) = w_{ji} \\mu_{ji}(x_i) + w_{j+1,i} \\mu_{j+1,i}(x_i)$. \nBy summing up $f_i(x_i)$, the output $\\hat{y}$ of Eq. (2.1) is produced. \nIt should be noted that triangular membership functions provide only piecewise-linear approximation, which in most cases can reduce the accuracy of the obtained results. To minimize this negative effect, the number of membership functions can be increased. However, this increases the number of synaptic weights, so both the complexity of the architecture and the time required for its learning grow as well. \nTo avoid this disadvantage, we propose to use the cubic-spline membership functions (2.32), which can be written in the following form: \nand are shown in Fig. 2.16. \nThe cubic-spline membership functions (2.32) satisfy all requirements of the Ruspini partitioning (2.31), which considerably simplifies the fuzzy inference process. 
Moreover, using cubic-spline membership functions provides smooth polynomial approximation instead of piecewise-linear approximation and makes it possible to model significantly nonlinear, non-stationary signals and processes with high quality. \nWhen a vector signal $x(k) = (x_1(k), x_2(k), ..., x_n(k))^T$ (here $k = 1, 2, ...$ is discrete time) is fed to the input of the neo-fuzzy neuron, the output of this neuron is determined both by the membership functions $\\mu_{ji}(x_i(k))$ and by the tunable synaptic weights $w_{ji}(k - 1)$ obtained at the previous step: \n$\\hat{y}(k) = \\sum_{i=1}^{n} \\sum_{j=1}^{h} w_{ji}(k - 1) \\mu_{ji}(x_i(k))$, \nso the neo-fuzzy neuron contains $hn$ synaptic weights that must be determined. \n2.8.2 The Neo-fuzzy Neuron Learning Algorithm \nThe learning criterion (goal function) is the standard local quadratic error function: \n$E(k) = \\frac{1}{2} (y(k) - \\hat{y}(k))^2$. \nIt is minimized via the conventional gradient stepwise algorithm, and as a result the following weight update procedure is obtained: \n$w_{ji}(k) = w_{ji}(k - 1) + \\eta (y(k) - \\hat{y}(k)) \\mu_{ji}(x_i(k))$",
        "chapter": "2 Deep Neural Networks and Hybrid GMDH-Neuro-fuzzy Networks in Big Data Analysis",
        "section": "2.8 Cascade Neo-fuzzy Neural Networks Structure Synthesis and Learning with Application of GMDH",
        "subsection": "2.8.1 The Neo-fuzzy Neuron",
        "subsubsection": "N/A"
    },
    {
        "content": "The cubic-spline activation functions (2.32) satisfy all requirements of the Ruspini partitioning (2.31) and it is considerably contributes to simplify the fuzzy inference process. On the other hand, usage of the cubic spline activation functions provides smooth polynomial approximation instead of piecewise-linear approximation and makes possible to perform a high quality modeling of significantly nonlinear non-stationary signals and processes. \nWhen a vector signal $x ( k ) = ( x _ { 1 } ( k ) , x _ { 2 } ( k ) , . . . , x _ { n } ( k ) ) ^ { T }$ (here $k = 1 , 2 , ldots$ is a discrete time) is fed to the input of the neo-fuzzy neuron, the output of this neuron is determined by both the membership functions $mu _ { j i } ( x _ { i } ( boldsymbol { k } ) )$ and tunable synaptic weights $w _ { j i } ( k - 1 )$ , which have been obtained at the previous training epoch: \nand thereby neo-fuzzy neuron contains $h  ^ { * } n$ synaptic weights which should be determined. \n2.8.2 The Neo-fuzzy Neuron Learning Algorithm \nThe learning criterion (goal function) is the standard local quadratic error function: \nIt is minimized via the conventional gradient stepwise algorithm. And as a result the following weight update procedure is obtained: \nwhere $y ( k )$ is the target value of the output, $eta$ is the scalar learning rate parameter which determines the speed of convergence and is chosen empirically. \nFor the purpose of increasing training speed Kaczmarz-Widrow-Hoff optimal one-step algorithm [10, 11] is applied \nwhere \n$( h n ) times 1$ vectors, generated by the corresponding variables, and its exponentially weighted modification \nwhich possesses both smoothing and filtering properties. \nIn case we have priori defined data set training process can be performed in a batch mode for one epoch using conventional least squares estimation. The neo-fuzzy neuron can be used as an elementary node of the architecture called the Neo-Fuzzy Neural Network. 
\n2.8.3 The Neo-fuzzy Neural Network and Its Architecture Optimization Using the Group Method of Data Handling \nThe Neo-Fuzzy Neural Network is a multilayer feedforward architecture that consists of neo-fuzzy neurons. A 3-layer Neo-Fuzzy Neural Network [11] with $n$ inputs and $m$ outputs is shown in Fig. 2.17. \nThis architecture coincides completely with the structure of the 3-layer perceptron, except that neo-fuzzy neurons are used as elementary nodes instead of Rosenblatt perceptrons. Therefore, backpropagation algorithms must be used to adjust the weight coefficients of such an architecture. As is generally known, such algorithms are computationally complex and operate slowly, especially in deep neural networks with many layers.",
        "chapter": "2 Deep Neural Networks and Hybrid GMDH-Neuro-fuzzy Networks in Big Data Analysis",
        "section": "2.8 Cascade Neo-fuzzy Neural Networks Structure Synthesis and Learning with Application of GMDH",
        "subsection": "2.8.2 The Neo-fuzzy Neuron Learning Algorithm",
        "subsubsection": "N/A"
    },
    {
        "content": "where $y ( k )$ is the target value of the output, $eta$ is the scalar learning rate parameter which determines the speed of convergence and is chosen empirically. \nFor the purpose of increasing training speed Kaczmarz-Widrow-Hoff optimal one-step algorithm [10, 11] is applied \nwhere \n$( h n ) times 1$ vectors, generated by the corresponding variables, and its exponentially weighted modification \nwhich possesses both smoothing and filtering properties. \nIn case we have priori defined data set training process can be performed in a batch mode for one epoch using conventional least squares estimation. The neo-fuzzy neuron can be used as an elementary node of the architecture called the Neo-Fuzzy Neural Network. \n2.8.3 The Neo-fuzzy Neural Network and Its Architecture Optimization Using the Group Method of Data Handling \nThe Neo-Fuzzy Neural Network is a multilayer feedforward architecture that consists of neo-fuzzy neurons. 3-layers Neo-Fuzzy Neural Network [11] with $n$ inputs and m outputs is shown of Fig. 2.17. \nGiven architecture is completely coincide with the structure of the 3-layer perceptron, except that the neo-fuzzy neurons are used here as an elementary nodes instead of Rosenblatt perceptrons. Therefore, for the adjustment of the weight coefficients of such architecture it is necessary to use backpropagation algorithms. As it generally known, such algorithms are quite complex from the computational point of view and they operate slowly especially in Deep Neural networks with many layers. \n\nIf we use neo-fuzzy neurons that have only two inputs, the GMDH can be applied for the synthesis of the Neo-Fuzzy Neural Network with optimal architecture. \nThe main idea of the GMDH algorithm lay in successive synthesis of the neuron layers until the external criterion begins to increase. 
\nAlgorithm description [11]: \n(1) Form pairs from the outputs of the neo-fuzzy neurons of the current layer (at the first iteration, the set of input signals is used). Each pair is fed to the corresponding neo-fuzzy neuron. \n(2) Using the learning subsample, adjust the synaptic weight coefficients of each neo-fuzzy neuron. \n(3) Using the test subsample, calculate the value of the external criterion (regularity) for each neo-fuzzy neuron: \n$\\delta_p^{[s]} = \\frac{1}{N_{test}} \\sum_{i=1}^{N_{test}} (y(i) - \\hat{y}_p^{[s]}(i))^2$, (2.35) \nwhere $N_{test}$ is the size of the test subsample, $s$ is the layer number, $p = 1, ..., n_s$ is the neuron number in the current layer, and $\\hat{y}_p^{[s]}(i)$ is the response of the $p$-th neuron of the $s$-th layer to the $i$-th input vector. \n(4) Find the minimal value of the external criterion over all neo-fuzzy neurons of the current layer: \n$\\varepsilon^{[s]} = \\min_p \\delta_p^{[s]}$. \nCheck the condition \n$\\varepsilon^{[s]} > \\varepsilon^{[s - 1]}$, (2.36) \nwhere $\\varepsilon^{[s]}$ and $\\varepsilon^{[s - 1]}$ are the criterion values for the best neurons of the $s$-th and $(s - 1)$-th layers, respectively. If condition (2.36) holds, return to the previous layer and find its best neuron, i.e. the one with the minimal value of criterion (2.35). Otherwise, select the $F$ best neurons according to the value of criterion (2.35) and go to step 1 to construct the next layer of neurons. \n(5) Determine the final structure of the network. Moving backward from the best neuron of the $(m - 1)$-th layer along the input connections and passing successively through all the layers of neurons, keep in the final structure only those neurons that are used in the next layer. \nWhen the GMDH stops, the final optimal structure of the Neo-Fuzzy Neural Network has been synthesized. As can be readily seen, we obtain not only the optimal structure but also a trained neural network that is ready to process new data. 
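Steps (1)-(5) can be sketched in Python as follows. This is a simplified illustration under stated assumptions and not the code used in [11]. Each node is a hypothetical two-input neo-fuzzy neuron with h = 5 complementary triangular membership functions per input. Each node is trained online with the Kaczmarz-Widrow-Hoff rule for a fixed number of epochs. Node inputs are clipped to [0, 1], because outputs of hidden nodes may leave that interval. The freedom of choice F (`freedom`), the epoch count, and all class and function names are illustrative choices.

```python
import itertools
import random


def memberships(x, h):
    # h complementary triangular MFs, centers equally spaced on [0, 1];
    # inputs are clipped to [0, 1] (assumption: hidden outputs may leave it)
    x = min(max(x, 0.0), 1.0)
    step = 1.0 / (h - 1)
    mu = [0.0] * h
    j = min(int(x / step), h - 2)
    right = (x - j * step) / step
    mu[j], mu[j + 1] = 1.0 - right, right
    return mu


class TwoInputNFN:
    # Hypothetical two-input neo-fuzzy node with 2*h synaptic weights
    def __init__(self, a, b, h=5):
        self.a, self.b, self.h = a, b, h          # indices of the two input signals
        self.w = [0.0] * (2 * h)

    def _mu(self, row):
        return memberships(row[self.a], self.h) + memberships(row[self.b], self.h)

    def predict(self, row):
        return sum(w * m for w, m in zip(self.w, self._mu(row)))

    def fit(self, rows, ys, epochs=20):
        # Kaczmarz-Widrow-Hoff one-step rule applied sample by sample
        for _ in range(epochs):
            for row, y in zip(rows, ys):
                mu = self._mu(row)
                e = y - sum(w * m for w, m in zip(self.w, mu))
                norm2 = sum(m * m for m in mu)
                self.w = [w + e * m / norm2 for w, m in zip(self.w, mu)]
        return self


def regularity(node, rows, ys):
    # External criterion (2.35): mean squared error on the test subsample
    return sum((y - node.predict(r)) ** 2 for r, y in zip(rows, ys)) / len(ys)


def gmdh_synthesize(train_x, train_y, test_x, test_y, freedom=4, max_layers=10):
    layers, history = [], []
    tr, te = train_x, test_x
    for _ in range(max_layers):
        scored = []
        for a, b in itertools.combinations(range(len(tr[0])), 2):   # step 1: pairs
            node = TwoInputNFN(a, b).fit(tr, train_y)               # step 2: learning subsample
            scored.append((regularity(node, te, test_y), node))     # step 3: test subsample
        scored.sort(key=lambda t: t[0])
        best = scored[0][0]                                         # step 4: minimal criterion
        if history and best > history[-1]:
            break                          # condition (2.36): keep the previous layer
        selected = [node for _, node in scored[:freedom]]
        layers.append(selected)
        history.append(best)
        if len(selected) < 2:
            break                          # no pairs can be formed for another layer
        # outputs of the F best nodes become the input signals of the next layer
        tr = [[node.predict(r) for node in selected] for r in tr]
        te = [[node.predict(r) for node in selected] for r in te]
    return layers, history


def final_structure(layers):
    # Step 5: walk back from the best node of the last layer and keep only
    # the nodes whose outputs feed the next layer (node indices per layer)
    keep = [{0}]
    for s in range(len(layers) - 1, 0, -1):
        used = set()
        for idx in keep[0]:
            used.update((layers[s][idx].a, layers[s][idx].b))
        keep.insert(0, used)
    return keep


def network_predict(layers, row):
    signals = row
    for selected in layers:
        signals = [node.predict(signals) for node in selected]
    return signals[0]                      # output of the best node of the last layer


if __name__ == '__main__':
    rng = random.Random(1)
    data = [[rng.random() for _ in range(4)] for _ in range(90)]
    target = [0.5 * (x[0] * x[1] + x[2] ** 2) for x in data]   # toy target in [0, 1]
    layers, history = gmdh_synthesize(data[:60], target[:60], data[60:], target[60:])
    print('layers kept:', len(layers), 'criterion per layer:', history)
    print('final structure:', final_structure(layers))
```

Because every node is trained only on its own layer's inputs, no gradient has to be propagated through the network. This is the layer-by-layer property the text relies on. In practice h, the number of epochs, and F would be tuned, and the clipping could be replaced by renormalizing each layer's outputs.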
\nOne of the most important advantages of applying the GMDH to synthesize deep neural network architectures is the capability to use simple but very fast learning procedures for adjusting the neo-fuzzy neuron weights, because the network is trained layer by layer. \n2.8.4 The Experimental Investigations of Forecasting with Neo-fuzzy Neural Network \nExperimental investigations of the neo-fuzzy neural network in a forecasting problem were carried out [11]. The goal was to forecast the RTS index on the basis of the current stock prices of leading Russian companies. \nInput data: daily stock prices and the value of the RTS index in the period from 5 February to 5 May 2009. \nThe output is the RTS index on the next day. \nThe sample size was 100 values. \nThe forecast criteria were the following: \n1. mean squared error (MSE); \n2. mean absolute percentage error (MAPE).",
        "chapter": "2 Deep Neural Networks and Hybrid GMDH-Neuro-fuzzy Networks in Big Data Analysis",
        "section": "2.8 Cascade Neo-fuzzy Neural Networks Structure Synthesis and Learning with Application of GMDH",
        "subsection": "2.8.3 The Neo-fuzzy Neural Network and Its Architecture Optimization Using the Group Method of Data Handling",
        "subsubsection": "N/A"
    },
    {
        "content": "(4) Find the minimal value of the external criteria for all neo-fuzzy neurons of the current layer \nCheck the condition \nwhere $varepsilon ^ { [ s ] } , varepsilon ^ { [ s - 1 ] }$ are the criterion values for the best neurons of the and s-th and $( s - 1 )$ -th layers correspondingly. If the condition (2.36) is true then return to the previous layer and find the best neuron that has minimal value of the criterion (2.35). Otherwise, select $F$ best neurons according to the criterion (2.35) value and go to the step 1 to construct the next layer of neurons. \n(5) Determine the final structure of the network. Moving backward from the best neuron of the $( mathsf { m } - 1 )$ -th layer along the input connections and passing successively all the layers of neurons, preserve in the final structure only such neurons that are used in the next layer. \nAfter the GMDH stops it can be said that the final optimal structure of the Neo-Fuzzy Neural Network is synthesized. As it can be readily seen we obtain not only optimal structure, but also trained neural network that is ready to process new data. \nOne of the most important advantages of GMDH application for the Deep neural networks architecture synthesis is a capability to use simple but very quick learning procedures for the neo-fuzzy neuron weights adjustment because network is trained layer-by-layer. \n2.8.4 The Experimental Investigations of Forecasting with Neo-fuzzy Neural Network \nThe experimental investigations of neo-fuzzy neural network in the problem of forecasting were carried out [11]. The goal contained in RTS index forecasting on the base of current stock prices of the leading Russian companies. \nInput data: daily stock prices and the value of RTS index in the period from 5 of February till 5 of May 2009. \nThe output is RTS index on the next day. \nSample size was 100 values. \nForecast criteria were the following: \n1. mean squared error (MSE);   \n2. 
mean absolute percentage error (MAPE). \nTypes of experiments for the neo-fuzzy neural network: \n(1) variation of the learning/test sample ratio: 25:75, 50:50, 75:25; \n(2) variation of the number of layers: 1, 3, 5; \n(3) variation of the number of iterations: 1000, 10,000, 100,000; \n(4) variation of the number of points to be forecasted: 1, 3, 5; \n(5) variation of the maximal error (the stopping condition): 0.01 and 0.09. \nSome of the obtained experimental results are presented below. \nExperiment A) learning/test ratio 75:25, MSE = 0.050158. The results are presented in Fig. 2.18. \nExperiment B) learning/test ratio 50:50, MSE = 0.053562. \nExperiment C) learning/test ratio 25:75, MSE = 0.068489. The results are presented in Fig. 2.19. \nExperiment Type 2. Variation of the Number of Layers \nThe algorithm performance was compared for 1, 3, 5 and 7 layers while forecasting 1 point ahead with a learning/test ratio of 75:25. \nExperiment A) 1 layer, MSE = 0.04662. The results are presented in Fig. 2.20. \nExperiment B) 3 layers, MSE = 0.0381. \nExperiment C) 5 layers, MSE = 0.0446. \nExperiment D) 7 layers, MSE = 0.0544. \nExperiment Type 3. Variation of the Number of Iterations: 1000, 10,000, 100,000 \nExperiment B) 10,000 iterations, MSE = 0.0575. \nExperiment C) 100,000 iterations, MSE = 0.0525. \nExperiment Type 4. Variation of the Number of Forecasted Points \nThe forecasting accuracy of the algorithm was compared for 1, 3 and 5 forecasted points with a learning/test ratio of 75:25. \nExperiment A) 1 forecasted point, MSE = 0.0495. \nExperiment B) 3 forecasted points, MSE = 0.4469. \nExperiment C) 5 forecasted points, MSE = 1.0418. \nConclusions on Experimental Results \nAfter a series of experiments with the neo-fuzzy neural network of full structure and of optimal structure constructed by the GMDH, the results presented in Table 2.1 were obtained. \nThe best results are highlighted in grey. As can be readily seen, the Neo-Fuzzy Neural Network with the optimal structure constructed by the GMDH gives better results than the conventional network with full structure (full network). \nThis may be explained by the use of the self-organization mechanism to construct a network that is not full. At the same time, this approach has a disadvantage: its rate of convergence is slower than that of the full network. Taking into account the better criterion values, however, this disadvantage may be neglected. \nFor a better assessment of the suggested approach, the forecasting errors obtained in the experiments are presented in Figs. 2.21 and 2.22. These are the charts of the MAPE obtained by the neo-fuzzy neural network constructed by the GMDH. \nAs we can see, when forecasting 1 point ahead we obtain rather high precision: the MAPE is less than 15%. As the number of forecasted points increases, the accuracy drops: the error lies in the range 15–45%. \nAnalyzing the presented curves, we conclude that for the Neo-Fuzzy Neural Network with one hidden layer the error is also not high, but it is not uniformly distributed and may exceed 30%. For 5 hidden layers the MAPE increases and may reach 35%, and with 7 layers it reaches 60%. Thus, the maximal precision is obtained with 3 hidden layers. \nBesides, in the course of the experimental investigations, the optimal parameters of the algorithms for the full and the GMDH-constructed neo-fuzzy networks were found [11]: \n• The ideal ratio of learning and test samples is 75%:25%. \n• The best number of layers is 3. \n• The best result is obtained at 100,000 iterations. \n• The best result is obtained with 1 forecasted point. 
• The best result is obtained with a maximal error (threshold for stopping the algorithm) of 0.01. \nSolving the Classification Problem Using the Neo-Fuzzy Neural Network \nWe applied the proposed Neo-Fuzzy Neural Network synthesized by the GMDH to the ‘breast cancer in Wisconsin’ benchmark classification problem [11]. \nA dataset containing 699 points was used for this purpose (ftp://cs.wisc.edu/math-prog/cpo-dataset/machine-learn/cancer/cancer1/datacum). Sixteen points had missing parameter values, so they were eliminated from the dataset, and the remaining 683 points were split into a training set of 478 points (70%) and a test set of 205 points (30%). \nEach point has a 9-dimensional feature vector and 1 class parameter to be determined, which identifies whether the examined patient has a benign or a malignant tumor. The feature values were normalized to the interval [-1, 1]. \nFor comparison, the same classification problem was solved using the conventional Neo-Fuzzy Neural Network with a full 3-layer structure: 10 NFNs in the first layer, 5 in the second, and 1 output NFN. The obtained classification results can be found in Table 2.2. \nWhen the output signal falls within the range [0.3, 0.7], the classification is less likely to be correct. We counted such samples and marked them as points outside the ‘belief zone’. \nThe Neo-Fuzzy Neural Network with the architecture synthesized by the GMDH shows very good classification results and considerably exceeds the full network in classification quality, especially on the test set. This can be explained by the fact that the full network is a more complex model and, as is generally known, model complexity leads to a loss of generalization and therefore to lower classification accuracy. The GMDH makes it possible to synthesize an optimal structure that neglects insignificant inputs. \nFig. 2.23 shows the architecture of the Neo-Fuzzy Neural Network constructed by the GMDH. It is considerably simpler than the full network, yet it achieves higher classification quality. \n2.9 Evolving GMDH-Neuro-fuzzy Network with Small Number of Tuning Parameters \nIntroduction \nNowadays artificial neural networks (ANNs) and neuro-fuzzy systems (NFSs) are widely used for solving various Data Mining tasks, presented either in the form of “object-property” tables or in the form of multidimensional time series, often produced by stochastic or chaotic non-stationary nonlinear systems. The advantages of these computational intelligence systems derive, first of all, from their universal approximating capabilities, their ability to learn, and the transparency and interpretability of the results (in the case of NFSs).",
        "chapter": "2 Deep Neural Networks and Hybrid GMDH-Neuro-fuzzy Networks in Big Data Analysis",
        "section": "2.8 Cascade Neo-fuzzy Neural Networks Structure Synthesis and Learning with Application of GMDH",
        "subsection": "2.8.4 The Experimental Investigations of Forecasting with Neo-fuzzy Neural Network",
        "subsubsection": "N/A"
    },
    {
        "content": "Conventionally “learning” is defined as a process of adjusting synaptic weights using an optimization of a given learning criterion. The quality of this process can be significantly improved by adjusting not only its synaptic weights but also the architecture of the ANNs and NFSs. This idea is the foundation of evolving computational intelligence systems (ECIS), that are used more widely in the recent years [18, 19]. \nIt should be noticed that the multilayered neuro-fuzzy systems of TSK- or ANFIS-type [16, 17, 20–22] are the base of the majority of the known ECIS. \nAt the same time, speaking of evolving systems, we should mention the Group Method of Data Handling (GMDH) [12, 23–26], which is a powerful approach of the information processing system of self-organization. It can synthesize sufficiently simple and effective computational architectures. It is clear that this approach attracted the attention of the computational intelligence experts. The GMDH-neural networks having active neurons [26–28], N-adalines [29], R-neurons [30, 31], Q-neurons [3] as nodes were developed; in the area integrating fuzzy GMDH [32] and neural networks the GMDH-neuro-fuzzy systems [31, 33] and GMDH-neo-fuzzy systems (see previous section) [11] were developed; GMDH-wavelet-neuro-fuzzy systems [8, 10, 34] and GMDH-fuzzy-spiking neural network [9] were also elaborated. \nThese systems demonstrated their efficiency in solving a wide range of tasks, however they lost the main advantages of the original GMDH, namely small number of tuning parameters in each node. It should be noted that initially elementary regression models with two inputs and three estimated coefficients were developed on the basis of GMDH. These properties of GMDH are especially important for deep neural networks with multiple hidden layers. 
\nDue to this problem, it seems reasonable to develop a GMDH system that combines the advantages of the traditional GMDH and of hybrid computational intelligence systems and that is trained with simple learning procedures used in regression analysis and linear identification theory. The developed approach to solving this problem is considered below. \n2.9.1 Evolving GMDH-Neuro-fuzzy System Architecture \nThe architecture of the evolving GMDH system is shown in Fig. 2.24. The $(n \\times 1)$-dimensional vector of input signals $x = (x_1, x_2, ..., x_n)^T$ is fed to the input layer of the system. This signal is then fed to the first hidden layer, which contains $n_1 = C_n^2$ nodes (neurons), each of which has only two inputs. At the outputs of the nodes $N^{[1]}$ of the first hidden layer, the output signals $\\hat{y}_l^{[1]}$, $l = 1, 2, ..., 0.5n(n - 1) = C_n^2$, are formed. These signals are then fed to the selection block of the first hidden layer $SB^{[1]}$, which selects among the output signals $\\hat{y}_l^{[1]}$ the $n_1^*$ best signals ($n_1^* \\le n$, where $n_1^* = F$ is the so-called freedom of choice), i.e. the most precise ones by the accepted criterion (mostly the mean squared error $\\sigma^2_{y_l^{[1]}}$). \nFrom these $n_1^*$ best outputs of the first hidden layer $\\hat{y}_l^{[1]*}$, $n_2$ pairwise combinations $\\hat{y}_l^{[1]*}, \\hat{y}_p^{[1]*}$ are formed and fed to the nodes $N^{[2]}$ of the second hidden layer. The output signals of this layer $\\hat{y}_l^{[2]}$ are fed to the selection block $SB^{[2]}$, which selects the $F$ best neurons by accuracy (e.g. 
by $\\sigma^2_{y_l^{[2]}}$) if the best signal of the second layer is better than the best signal of the first hidden layer $\\hat{y}_1^{[1]*}$. The other hidden layers form their signals in the same way as the second hidden layer. The system evolution process continues until the best signal of the selection block $SB^{[s+1]}$ becomes worse than the best signal of the previous ($s$-th) layer, that is, $\\sigma^2_{y_l^{[s+1]}} > \\sigma^2_{y_l^{[s]}}$. Then we return to the previous layer and choose its best node neuron $N^{[s]}$ to form the system output signal $\\hat{y}^{[s]}$. \nIt should be stressed that, thanks to the GMDH algorithm, we obtain not only an optimal network structure but also a well-trained network. Besides, since training is performed sequentially, layer by layer, the problems of high dimensionality and of vanishing or exploding gradients do not arise. This is very important for deep learning networks. \nAs already mentioned, different types of neurons can be used as nodes of GMDH systems, e.g. N-Adalines [29], active [26, 27, 35], R- [30, 36], Q- [34], spiking [9], wavelet [8, 10, 34], and neo-fuzzy neurons [11], as well as other similar units of computational intelligence systems that have the required approximating capabilities and learning capacities. However, the main advantage of the original GMDH may then be lost, namely the ability to work with small training sets (short samples). Therefore, a neuro-fuzzy network with a small number of tuning parameters is considered in the next section. \n2.9.2 Neuro-fuzzy Network with Small Number of Tuning Parameters as a Node of GMDH-System \nLet us consider the node architecture shown in Fig. 2.25 and proposed as a neuron of the suggested evolving GMDH system. This architecture is in fact a Wang-Mendel neuro-fuzzy system [22, 37] with only two inputs $x_i$ and $x_j$ and one output $\\hat{y}_l$. 
To the node input a two-dimensional vector of signals $x ( k ) = left( x _ { i } ( k ) , x _ { j } ( k ) right) ^ { T }$ is fed, where $k = 1 , 2 , . . . , N$ is either the observation number in training set or the current discrete time.",
        "chapter": "2 Deep Neural Networks and Hybrid GMDH-Neuro-fuzzy Networks in Big Data Analysis",
        "section": "2.9 Evolving GMDH-Neuro-fuzzy Network with Small Number of Tuning Parameters",
        "subsection": "2.9.1 Evolving GMDH-Neuro-fuzzy System Architecture",
        "subsubsection": "N/A"
    },
    {
        "content": "The first layer of a node contains $2h$ membership functions $\\mu_{pi}(x_i(k))$, $\\mu_{pj}(x_j(k))$, $p = 1, 2, \\ldots, h$, and performs the fuzzification of the input variables. Bell-shaped functions with non-strictly local receptive support are usually used as membership functions, since they avoid “gaps” in the fuzzified space when scatter partitioning of the input space is used [35]. Gaussians are usually used as the membership functions of the first layer \nwhere $c_{pi}, c_{pj}$ are parameters that define the centers of the membership functions and $\\sigma_i, \\sigma_j$ are the width parameters of these functions. The second layer aggregates the membership levels. It consists of $h$ multiplication units and forms two-dimensional radial basis activation functions \nand for Gaussians with equal widths $\\sigma_i = \\sigma_j = \\sigma$ we can write \n(here $\\boldsymbol{c}_p = (c_{pi}, c_{pj})^T$), i.e. the elements of the first and second layers process the input signal in the same way as the R-neurons of radial basis function neural networks. 
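The node computation can be sketched in NumPy as follows. This is a minimal illustration, not the authors' code: it assumes Gaussian membership functions of the form exp(-(x - c)²/(2σ²)) with a common width σ (the chapter's formula is not reproduced in this text), h rules in which rule p multiplies μ_pi by μ_pj, and a normalized output ŷ = Σ w_p φ_p / Σ φ_p, anticipating the weighting, summation and normalization layers described next. All function names are hypothetical. Because ŷ is linear in the weights w_p, they are fitted here by batch least squares.

```python
import numpy as np


def rule_activations(X, centers, sigma):
    # Layers 1-2: Gaussian memberships on each input, multiplied rule by rule.
    # X: (N, 2) samples (x_i, x_j); centers: (h, 2) rule centers c_p; sigma: common width.
    mu_i = np.exp(-((X[:, [0]] - centers[:, 0]) ** 2) / (2.0 * sigma ** 2))
    mu_j = np.exp(-((X[:, [1]] - centers[:, 1]) ** 2) / (2.0 * sigma ** 2))
    return mu_i * mu_j  # (N, h); equals exp(-||x - c_p||^2 / (2 sigma^2))


def normalized_basis(X, centers, sigma):
    # Layers 4-5 without the weights: phi_p / sum_q phi_q for every sample.
    phi = rule_activations(X, centers, sigma)
    return phi / phi.sum(axis=1, keepdims=True)


def node_output(X, weights, centers, sigma):
    # Layer 3 weights each activation; layers 4-5 sum and normalize.
    return normalized_basis(X, centers, sigma) @ weights


def fit_node_weights(X, y, centers, sigma):
    # The output is linear in the weights, so batch least squares applies.
    B = normalized_basis(X, centers, sigma)
    weights, *_ = np.linalg.lstsq(B, y, rcond=None)
    return weights


if __name__ == '__main__':
    rng = np.random.default_rng(0)
    X = rng.uniform(-1.0, 1.0, size=(40, 2))
    y = np.sin(X[:, 0]) + 0.5 * X[:, 1]
    # h = 3 rules; these centers are an arbitrary illustrative choice.
    centers = np.array([[-0.8, 0.5], [0.0, -0.5], [0.8, 0.5]])
    w = fit_node_weights(X, y, centers, sigma=0.7)
    mse = np.mean((node_output(X, w, centers, 0.7) - y) ** 2)
    print('weights:', w, 'training MSE:', mse)
```

With h = 3 the node has exactly three tunable weights, the same count as a linear GMDH partial description with three coefficients. Inputs far from every center would make all activations underflow, so in practice the centers should cover the input range.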
\nThe third layer consists of the synaptic weights, which are adjusted during the learning process. The outputs of this layer are the values \nThe fourth layer is formed by two summation units and computes the sums of the output signals of the second and third hidden layers \nFinally, the fifth layer of the neuron performs normalization, which forms the node output signal $\\hat{y}_l$: \nIt is easy to see that the node implements a nonlinear mapping of the input signals to the output signal, like a normalized radial basis function neural network; however, this neuro-fuzzy system contains a significantly smaller number $h$ of adjustable parameters than such a neural network. \nUsing the notation introduced above and writing the transformation in each node of the standard GMDH in the form \nwhich contains three unknown parameters, it is easy to see that with three membership functions on each input of the proposed node ($h = 3$) we obtain the same number, three, of synaptic weights to be adjusted. \nIn the simplest case, these synaptic weights can be estimated with the conventional least squares method (LSM) traditionally used in the GMDH. If the entire training set is available, the LSM can be used in its batch form \n(here $y(k)$ is the external reference signal). If training samples are fed sequentially in online mode, the recurrent form of the LSM is used \n2.9.3 Computational Experiments \nThe efficiency of the proposed approach was demonstrated on a stock exchange forecasting problem. \nExperimental investigations of stock price forecasting were carried out. The RTS index in 2013, with a time step of one week, was chosen as the forecasted variable. The stock prices of leading companies were used as external regressors (inputs). The total sample of 55 points was used in the search for the optimal partial description in the GMDH. At each layer the 6 best models were selected (freedom of choice $\\mathrm{F} = 6$). The mathematical model had the general form $y = f(x_1, x_2, x_3, x_4)$. \nMAPE and RMSE were used as the quality criteria of the obtained models. The plots of the real and simulated values of the RTS index are presented in Fig. 2.26 for $N = 2$ (here $N$ is the number of fuzzy inputs). The results of the experiments are presented in Table 2.3, and the plots of the criteria values in Figs. 2.27 and 2.28. As one can see, as the number of inputs increases, the error values first decrease and then begin to grow. Thus, by increasing the number of inputs until the error starts to grow, we can obtain the optimal number of inputs (Figs. 2.27 and 2.28).",
        "chapter": "2 Deep Neural Networks and Hybrid GMDH-Neuro-fuzzy Networks in Big Data Analysis",
        "section": "2.9 Evolving GMDH-Neuro-fuzzy Network with Small Number of Tuning Parameters",
        "subsection": "2.9.2 Neuro-fuzzy Network with Small Number of Tuning Parameters as a Node of GMDH-System",
        "subsubsection": "N/A"
    },
    {
        "content": "For comparison, models based on the classical GMDH with linear partial descriptions and on a cascade neuro-fuzzy network were constructed, with the following parameters: \n– classical GMDH: training sample size 50%, freedom of choice: the 6 best models; \n– cascade neuro-fuzzy network: different numbers of inputs. \nThe simulation results for the classical GMDH are presented in Fig. 2.29. The MAPE value is 0.09845 and the RMSE value is 15.1446. 
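The classical GMDH used in this comparison can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: each node is a linear partial description w0 + w1·a + w2·b fitted by least squares on the training half, nodes are ranked by their mean squared error on the other half, the F best are kept (as in the selection blocks SB), and growth stops when the best node of a new layer is worse than that of the previous layer. The function names and the synthetic data are hypothetical, and MAPE is returned as a fraction.

```python
import itertools

import numpy as np


def fit_partial(a, b, y):
    # Linear partial description y ~ w0 + w1*a + w2*b, fitted by least squares.
    A = np.column_stack([np.ones_like(a), a, b])
    w, *_ = np.linalg.lstsq(A, y, rcond=None)
    return w


def predict_partial(w, a, b):
    return w[0] + w[1] * a + w[2] * b


def mape(y_true, y_pred):
    # Mean absolute percentage error, returned as a fraction.
    return float(np.mean(np.abs((y_true - y_pred) / y_true)))


def rmse(y_true, y_pred):
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


def gmdh_fit(X_tr, y_tr, X_val, y_val, freedom=6, max_layers=10):
    # Returns (best validation MSE, number of layers kept, validation output).
    sig_tr, sig_val = X_tr, X_val
    best_mse, best_pred, n_layers = np.inf, None, 0
    for _ in range(max_layers):
        if sig_tr.shape[1] < 2:
            break  # fewer than two signals: no pairs left to combine
        nodes = []
        for i, j in itertools.combinations(range(sig_tr.shape[1]), 2):
            w = fit_partial(sig_tr[:, i], sig_tr[:, j], y_tr)
            pred = predict_partial(w, sig_val[:, i], sig_val[:, j])
            nodes.append((float(np.mean((pred - y_val) ** 2)), i, j, w))
        nodes.sort(key=lambda node: node[0])
        selected = nodes[:freedom]  # selection block: keep the F best nodes
        layer_mse, i0, j0, w0 = selected[0]
        if layer_mse > best_mse:
            break  # the new layer is worse: keep the previous layer's best node
        best_mse, n_layers = layer_mse, n_layers + 1
        best_pred = predict_partial(w0, sig_val[:, i0], sig_val[:, j0])
        # Outputs of the selected nodes become the next layer's input signals.
        sig_tr = np.column_stack([predict_partial(w, sig_tr[:, i], sig_tr[:, j]) for _, i, j, w in selected])
        sig_val = np.column_stack([predict_partial(w, sig_val[:, i], sig_val[:, j]) for _, i, j, w in selected])
    return best_mse, n_layers, best_pred


if __name__ == '__main__':
    rng = np.random.default_rng(0)
    X = rng.normal(size=(56, 4))
    y = 10.0 + 2.0 * X[:, 0] - X[:, 1] + 0.5 * X[:, 2] * X[:, 3] + 0.1 * rng.normal(size=56)
    half = len(y) // 2  # 50% training sample, as in the setting above
    mse, layers, pred = gmdh_fit(X[:half], y[:half], X[half:], y[half:], freedom=6)
    print('layers:', layers, 'MAPE:', mape(y[half:], pred), 'RMSE:', rmse(y[half:], pred))
```

Replacing fit_partial and predict_partial with the neuro-fuzzy node fit gives the GMDH-neuro-fuzzy variant; the selection and stopping logic stays the same.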
\nNow let us construct the model using a full cascade neuro-fuzzy network with different numbers of inputs. The MAPE values for the GMDH-neuro-fuzzy network and for the full cascade neuro-fuzzy network are presented in Table 2.4. \nThe MAPE plots for these networks are presented in Fig. 2.30. As one can see, the GMDH-neuro-fuzzy network showed much better results than the full cascade neuro-fuzzy network, owing to its better network structure. The GMDH-neuro-fuzzy network also outperformed the classical GMDH: the MAPE value for the classical GMDH is 0.09845, while the best GMDH-neuro-fuzzy model has a MAPE value of 0.039496. \nFurther experiments were carried out in which several past output values were added to the inputs. The other model parameters were unchanged. The model has the form $y(k) = f(x_1(k), x_2(k), x_3(k), y(k-1), y(k-2))$. \nThe number of inputs is 5. The MAPE value is 0.02040 and the RMSE value is 3.59614. As one can see, adding past values of the RTS index to the inputs increased the model quality. \nLet us consider the prediction quality of the GMDH-neuro-fuzzy model on another sample. The stock prices of Microsoft Corp. from 01.11.14 to 29.12.14 were used as the input sample. The sample size is 64 points, and the model is constructed using 62 points. The forecast is made 4 steps ahead; the first two steps are checked against the available data. An autoregression model with 5 lags is used. As a result, we obtained a GMDH-neuro-fuzzy network with 6 fuzzy inputs. The results are presented in Tables 2.5 and 2.6. \nAs one can see, the GMDH-neuro-fuzzy network gives a more accurate forecast than the classical GMDH and the cascade neuro-fuzzy network. Its MAPE does not exceed 1% (0.32% and 0.34% when forecasting 1 and 2 steps ahead, respectively). \nAs the final experiment, let us compare the training times of the GMDH-neuro-fuzzy model and the full cascade model. Table 2.7 presents the training time in seconds for the GMDH-neuro-fuzzy network and the full cascade neuro-fuzzy network. Microsoft stock prices for the period from 01.11.14 to 29.12.14 were used as the initial sample; the sample size is 64 points. \nConclusion \nIn this section, elementary neuro-fuzzy networks with scatter partitioning of the input space and a small number of tuning parameters are proposed as nodes of the GMDH-system. The system architecture can evolve in online mode as the synaptic weights of the proposed neuro-fuzzy nodes-neurons are adjusted. The distinguishing feature of the proposed approach is its ability to work with very small training sets. \nExperimental investigations of the neuro-fuzzy network were carried out on the problem of stock price forecasting. Analysis of the results led to the following conclusions: \n– varying the number of inputs of the GMDH-neuro-fuzzy network influences the model quality: as the number of inputs increases, the error first decreases and then begins to grow, which makes it possible to choose the optimal number of inputs; \n– applying the proposed approach to search for the optimal structure decreases the training time and increases the forecasting quality of the model compared with the full-cascade deep fuzzy network. \n2.10 A Deep GMDH System Based on the Extended Neo-fuzzy Neuron and Its Training \nIntroduction \nIn the last few years, evolving intelligent systems have become widespread for handling all kinds of dynamic modeling and training requirements in real-world (online) applications, especially given the growing role of dynamic data, sequential video analysis, and web mining. This demand is driven by the growing dynamics and complexity of current problems and by the increasing volumes of databases, so that traditional batch training can no longer be performed within a reasonable time and with tolerable accuracy [1–4]. Evolving incremental learning systems should process huge amounts of data, analyze them rapidly, and extract features on the fly. Since the data change continually, these systems must be able to adapt their topology.",
        "chapter": "2 Deep Neural Networks and Hybrid GMDH-Neuro-fuzzy Networks in Big Data Analysis",
        "section": "2.9 Evolving GMDH-Neuro-fuzzy Network with Small Number of Tuning Parameters",
        "subsection": "2.9.3 Computational Experiments",
        "subsubsection": "N/A"
    },
    {
        "content": "2.10.1.1 The Extended Neo-fuzzy Neuron \nA model of the extended NFN was proposed in [40] as a further development of the ordinary neo-fuzzy neuron introduced by Yamakawa, Miki and Uchino [13–15]. \nThe traditional neo-fuzzy neuron is a MISO (multiple inputs, single output) nonlinear system that implements the mapping \nwhere $x_i$ is the $i$-th component of the $n$-dimensional input vector $x = (x_1, \\ldots, x_i, \\ldots, x_n)^{\\mathrm{T}} \\in R^n$ and $\\hat{y}$ is the scalar output of the neo-fuzzy neuron. In its usual form, the NFN consists of nonlinear synapses $NS_i$, each of which transforms the $i$-th input component $x_i$ into \nwhere $h$ is the number of membership functions, $w_{li}$ is the $l$-th synaptic weight in the $i$-th nonlinear synapse, $l = 1, 2, \\ldots, h$, $i = 1, 2, \\ldots, n$, and $\\mu_{li}(x_i)$ is the $l$-th membership function in the $i$-th nonlinear synapse, which performs the fuzzification of the crisp input $x_i$. Thus, the mapping implemented by the NFN can be written as \nThe NFN implements fuzzy inference rules of the form \nwhich means that each synapse implements zero-order Takagi–Sugeno fuzzy inference [16, 17]. \nAs mentioned above, the NFN synapse $NS_i$ implements only zero-order Takagi–Sugeno inference, which yields the simplest Wang–Mendel neuro-fuzzy system [41, 42]. It therefore seems reasonable to extend the approximating capabilities of this computational node by introducing a special element called the “extended nonlinear synapse” ($ENS_i$) [40], and to develop the “extended neo-fuzzy neuron” (ENFN), which contains $ENS_i$ units instead of conventional synapses $NS_i$. 
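The difference between the two kinds of synapse can be illustrated with a short NumPy sketch. It assumes Gaussian membership functions (the type used for the first layer in this section) and writes the ordinary synapse as f_i(x_i) = Σ_l w_li μ_li(x_i). For the extended synapse it assumes that the consequent of each rule is a polynomial of order p in x_i, an assumption consistent with the (p + 1)hn tunable weights and the p-th order Takagi–Sugeno inference stated below; the chapter's own equations are not reproduced here, and the function names are hypothetical.

```python
import numpy as np


def memberships(x_i, centers, sigma):
    # Gaussian membership levels mu_li(x_i), l = 1..h (assumed form).
    return np.exp(-((x_i - centers) ** 2) / (2.0 * sigma ** 2))


def nfn_synapse(x_i, weights, centers, sigma):
    # Ordinary synapse NS_i: f_i(x_i) = sum_l w_li * mu_li(x_i)
    # (zero-order Takagi-Sugeno consequents); weights has shape (h,).
    return float(memberships(x_i, centers, sigma) @ weights)


def enfn_synapse(x_i, weights, centers, sigma):
    # Extended synapse ENS_i: the consequent of rule l is a polynomial of order p
    # in x_i; weights has shape (h, p + 1), row l holding w_li^0 ... w_li^p.
    p = weights.shape[1] - 1
    powers = x_i ** np.arange(p + 1)  # (1, x_i, ..., x_i^p)
    return float(memberships(x_i, centers, sigma) @ (weights @ powers))


def enfn_output(x, weights, centers, sigma):
    # ENFN output: sum of the n extended synapse outputs.
    # x: (n,); weights: (n, h, p + 1); centers: (n, h).
    return sum(enfn_synapse(x[i], weights[i], centers[i], sigma) for i in range(len(x)))


if __name__ == '__main__':
    n, h, p = 3, 4, 2
    rng = np.random.default_rng(0)
    centers = np.tile(np.linspace(-1.0, 1.0, h), (n, 1))
    weights = rng.normal(size=(n, h, p + 1))
    x = np.array([0.2, -0.5, 0.9])
    print('tunable weights:', weights.size)  # (p + 1) * h * n = 36
    print('ENFN output:', enfn_output(x, weights, centers, sigma=0.5))
```

With p = 0 the extended synapse reduces to the ordinary one, and every weight still enters the output linearly, which is what allows least squares tuning of the weights.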
\nIntroducing the additional notation \nwe can present these relations in the following compact form \nwhere $\\tilde{\\mu}(x) = (\\tilde{\\mu}_1^T(x_1), \\ldots, \\tilde{\\mu}_i^T(x_i), \\ldots, \\tilde{\\mu}_n^T(x_n))^{\\mathrm{T}}$. \nIt is easy to see that the ENFN contains $(p+1)hn$ parameters (synaptic weights) to be adapted, and that the fuzzy inference implemented by each $ENS_i$ is \nwhich corresponds to $p$-th order Takagi–Sugeno inference. \nThe ENFN architecture is simple compared with that of a conventional neuro-fuzzy system. The architectures of the extended neo-fuzzy neuron and the extended neo-fuzzy synapse are shown in Figs. 2.32 and 2.33. \nScatter partitioning of the input space [21] can cause “gaps” to appear in the fuzzified space. To avoid this problem, bell-shaped functions with non-strictly local receptive support can be used as membership functions. Gaussians are most often used as the membership functions of the first layer \nwhere $c_{li}(k)$ is the parameter that defines the center of the membership function and $\\sigma_{li}(k)$ is its width parameter. \n2.10.2 The Adjustment Procedures for All Parameters of the System \nSince the output signal $\\hat{y}_s^{[1]}(k)$ of every system node depends linearly on the tunable synaptic weights $w_{li}$, either the standard least squares method or its recurrent form can be used to tune them. If the training data are non-stationary, the exponentially weighted recurrent least squares algorithm can be applied to adjust the weights:",
        "chapter": "2 Deep Neural Networks and Hybrid GMDH-Neuro-fuzzy Networks in Big Data Analysis",
        "section": "2.10 A Deep GMDH System Based on the Extended Neo-fuzzy Neuron and Its Training",
        "subsection": "2.10.1 An Architecture of the Deep GMDH Neuro-fuzzy System",
        "subsubsection": "2.10.1.1 The Extended Neo-fuzzy Neuron"
    },
    {
        "content": "(where $0 < \\alpha \\leq 1$ is a forgetting factor and $y(k)$ is the reference signal) or the exponentially weighted gradient learning procedure \nThe centers and widths of the membership functions can be tuned by gradient procedures that minimize the learning criterion \nin the form \nwhere $r = 1, 2, \\ldots, h$; $\\eta_c$ and $\\eta_\\sigma$ are the learning rates for the centers and the widths, respectively; and $\\tilde{\\sigma}_{ri}^2(k) = -0.5\\,\\sigma_{ri}^{-2}(k)$. From the previous expressions we obtain \nFollowing on from (2.50), the derivatives $\\frac{\\partial f_i(x_i(k))}{\\partial c_{ri}}$ and $\\frac{\\partial f_i(x_i(k))}{\\partial \\tilde{\\sigma}_{ri}^2}$ can be written as \nBased on (2.45), the derivatives $\\frac{\\partial \\mu_{ri}(x_i(k))}{\\partial c_{ri}}$ and $\\frac{\\partial \\mu_{ri}(x_i(k))}{\\partial \\tilde{\\sigma}_{ri}^2}$ can be written as \nIn this way, all the parameters of the system nodes (the synaptic weights, and the centers and widths of the membership functions) can be adjusted. 
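Both adjustment procedures can be illustrated with a short NumPy sketch under stated assumptions. The weight update is one common textbook form of exponentially weighted recursive least squares for a node output that is linear in its weights, ŷ(k) = wᵀφ(k), where φ(k) stacks the membership-based regressors. The center and width update is one gradient step on E = ½e² for a single ordinary neo-fuzzy synapse, using the reparametrization σ̃² = -0.5σ⁻² given above, so that μ = exp(σ̃²(x - c)²). The chapter's own update equations are not reproduced in this text, and the class and function names, the initialization P(0) = δI and the clipping that keeps σ̃² negative are hypothetical choices.

```python
import numpy as np


class EWRLS:
    # Exponentially weighted recursive least squares for y(k) ~ w^T phi(k).
    def __init__(self, n_params, alpha=0.98, delta=1e4):
        if not 0.0 < alpha <= 1.0:
            raise ValueError('forgetting factor must satisfy 0 < alpha <= 1')
        self.alpha = alpha
        self.w = np.zeros(n_params)
        self.P = delta * np.eye(n_params)  # large initial covariance P(0) = delta * I

    def update(self, phi, y):
        # A priori error, gain vector, then weight and covariance updates.
        P_phi = self.P @ phi
        gain = P_phi / (self.alpha + phi @ P_phi)
        error = y - self.w @ phi
        self.w = self.w + gain * error
        self.P = (self.P - np.outer(gain, P_phi)) / self.alpha
        return error


def mf(x, c, s):
    # Gaussian membership with s = tilde-sigma^2 = -0.5 / sigma^2: exp(s * (x - c)^2).
    return np.exp(s * (x - c) ** 2)


def gradient_step(x, y, w, c, s, eta_c=0.05, eta_s=0.05):
    # One step minimizing E = 0.5 * e^2, e = y - sum_r w_r * mu_r(x),
    # for the centers c_r and width parameters s_r of one neo-fuzzy synapse.
    mu = mf(x, c, s)
    error = y - w @ mu
    dmu_dc = -2.0 * s * (x - c) * mu  # d mu_r / d c_r
    dmu_ds = (x - c) ** 2 * mu        # d mu_r / d s_r
    c_new = c + eta_c * error * w * dmu_dc
    # Keep s negative so that every membership function stays bell-shaped.
    s_new = np.minimum(s + eta_s * error * w * dmu_ds, -1e-6)
    return c_new, s_new, error


if __name__ == '__main__':
    rng = np.random.default_rng(0)
    est = EWRLS(n_params=3, alpha=0.95)
    true_w = np.array([1.0, -2.0, 0.5])
    for k in range(400):
        if k == 200:
            true_w = np.array([-1.0, 0.0, 2.0])  # abrupt drift: non-stationary data
        phi = rng.normal(size=3)
        est.update(phi, true_w @ phi)
    print('EWRLS estimate after drift:', est.w)
```

With α < 1 old samples are discounted geometrically, which is what lets the estimate follow the abrupt change in the demo; α = 1 recovers the ordinary recurrent least squares method.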
Concerning the successive layers, the parameters of their nodes are usually tuned in the same way as those of the nodes in the first hidden layer. \nIt is worth noting that the inputs of the $s$-th layer are pairwise combinations of the signals $\\hat{y}_l^{[s-1]*}, \\hat{y}_p^{[s-1]*}$ formed by the selection block $SB^{[s-1]}$. The reference signal $y(k)$ is the same for all blocks of the evolving system. The algorithm runs until the stopping criterion is met, i.e. until the MSE of the best node of the current layer $s$ starts to rise. The best neuron of the previous layer then determines the optimal deep network structure. \n2.10.3 An Experimental Study \nThe Darwin sea level pressure data set was taken from the Data Market repository to demonstrate the advantages of the proposed deep GMDH system and its learning procedures. This data set is mainly used for the prediction of non-stationary signals. It contains monthly sea level pressure values for a period of more than a century (1882–1998), 1400 observations in total. The system used 1100 observations for training and 300 observations for testing. To estimate the efficiency of the proposed neuro-fuzzy system, a multilayer perceptron, a radial basis function neural network, and ANFIS were also applied to the same task. The results were evaluated by the MSE criterion, and Table 2.8 compares the performance of the systems. The proposed deep GMDH system showed good results on the prediction task. Notably, its training time was short compared with the other systems, while its forecasting results were the best for this data set. Figure 2.34 shows a fragment of the learning process.",
        "chapter": "2 Deep Neural Networks and Hybrid GMDH-Neuro-fuzzy Networks in Big Data Analysis",
        "section": "2.10 A Deep GMDH System Based on the Extended Neo-fuzzy Neuron and Its Training",
        "subsection": "2.10.2 The Adjustment Procedures for All Parameters of the System",
        "subsubsection": "N/A"
    },
    {
        "content": "Conclusion \nIn this chapter, a new class of neural networks, deep networks, is considered, and their learning algorithms are presented and discussed. Encoders-decoders, restricted Boltzmann machines (RBM), and stacked RBMs are used to implement deep learning. The main problems of deep learning, vanishing and exploding gradients, are considered, and methods for solving them are presented and discussed. \nA new approach to deep learning, based on applying GMDH to the synthesis and training of neuro-fuzzy networks, is proposed and developed in this chapter. \nThe deep evolving neuro-fuzzy system presented here does not require large volumes of data for training. The hybrid system is based on both the Group Method of Data Handling and the concept of evolving systems, which makes it possible to determine both the optimal parameter values and the best structure in each specific case. Adjusting parameters in parallel can increase the speed of data processing. The system architecture can evolve in online mode as the synaptic weights, centers, and width parameters of the proposed neuro-fuzzy nodes are tuned. This approach makes it possible to overcome some problems of Big Data dimensionality in practical tasks of forecasting, classification, and pattern recognition. \nReferences \n1. G. Hinton, S. Osindero, Y.-W. Teh, A fast learning algorithm for deep belief nets. Neural Comput. 18(7), 1527–1554 (2006)   \n2. I. Goodfellow, Y. Bengio, A. Courville, Deep Learning (MIT Press, 2016)   \n3. Y. LeCun, Y. Bengio, G. Hinton, Deep learning. Nature 521, 436–444 (2015)   \n4. J. Schmidhuber, Deep learning in neural networks: an overview. Neural Netw. 61, 85–117 (2015)   \n5. E. Lughofer, Evolving Fuzzy Systems – Methodologies, Advanced Concepts and Applications (Springer, Berlin, 2011)   \n6. Z. Hu, Y.V.
Bodyanskiy, O.K. Tyshchenko, A cascade deep neuro-fuzzy system for high-dimensional online possibilistic fuzzy clustering, in Proceedings of the XI-th International Scientific and Technical Conference “Computer Science and Information Technologies” (CSIT 2016) (2016), pp. 119–122. https://doi.org/10.1109/stc-csit.2016.7589884   \n7. P. Angelov, D. Filev, N. Kasabov, Evolving Intelligent Systems: Methodology and Applications (Wiley, 2010)   \n8. Y.V. Bodyanskiy, O.A. Vynokurova, A.I. Dolotov, Self-learning cascade spiking neural network for fuzzy clustering based on group method of data handling. J. Autom. Inform. Sci. 45(3), 23–33 (2013)   \n9. Y. Bodyanskiy, O. Vynokurova, A. Dolotov, O. Kharchenko, Wavelet-neuro-fuzzy network structure optimization using GMDH for the solving forecasting tasks, in Proceedings of the 4th International Conference on Inductive Modelling ICIM 2013, Kyiv (2013), pp. 61–67   \n10. Y. Bodyanskiy, O. Vynokurova, N. Teslenko, Cascade GMDH-wavelet-neuro-fuzzy network, in Proceedings of the 4th International Workshop on Inductive Modeling «IWIM 2011», Kyiv, Ukraine (2011), pp. 22–30   \n11. Y. Bodyanskiy, Y. Zaychenko, E. Pavlikovskaya, M. Samarina, Y. Viktorov, The neo-fuzzy neural network structure optimization using the GMDH for the solving forecasting and classification problems, in Proceedings of the International Workshop on Inductive Modeling, Krynica, Poland (2009), pp. 77–89   \n12. A.G. Ivakhnenko, Heuristic self-organization in problems of engineering cybernetics. Automatica 6(2), 207–219 (1970)   \n13. T. Yamakawa, E. Uchino, T. Miki, H. Kusanagi, A neo fuzzy neuron and its applications to system identification and prediction of the system behavior, in Proceedings of the 2nd International Conference on Fuzzy Logic and Neural Networks (1992), pp. 477–483   \n14. E. Uchino, T.
Yamakawa, Soft computing based signal prediction, restoration and filtering, in Intelligent Hybrid Systems: Fuzzy Logic, Neural Networks and Genetic Algorithms (Kluwer Academic Publishers, Boston, 1997), pp. 331–349   \n15. T. Miki, T. Yamakawa, Analog implementation of neo-fuzzy neuron and its on-board learning, in Computational Intelligence and Applications (WSES Press, Piraeus, 1999), pp. 144–149   \n16. M. Sugeno, G.T. Kang, Structure identification of fuzzy model. Fuzzy Sets Syst. 28, 15–33 (1988)   \n17. T. Takagi, M. Sugeno, Fuzzy identification of systems and its applications to modeling and control. IEEE Trans. Syst. Man Cybern. 15, 116–132 (1985)   \n18. N. Kasabov, Evolving Connectionist Systems (Springer, London, 2003)   \n19. E. Lughofer, Evolving Fuzzy Systems – Methodologies, Advanced Concepts and Applications (Springer, Berlin, 2011)   \n20. R.J.-S. Jang, ANFIS: adaptive-network-based fuzzy inference systems. IEEE Trans. Syst. Man Cybern. 23, 665–685 (1993)   \n21. R.J.-S. Jang, C.-T. Sun, E. Mizutani, Neuro-Fuzzy and Soft Computing: A Computational Approach to Learning and Machine Intelligence (Prentice Hall, Upper Saddle River, 1997)   \n22. S. Osowski, Sieci neuronowe do przetwarzania informacji (Oficyna Wydawnicza Politechniki Warszawskiej, Warszawa, 2006)   \n23. A.G. Ivakhnenko, Long-Term Forecasting and Control of Complex Systems (Technica, Kiev, 1975)   \n24. A.G. Ivakhnenko, Polynomial theory of complex systems. IEEE Trans. Syst. Man Cybern. 1(4), 364–378 (1971)   \n25. A.G. Ivakhnenko, Self-Learning Systems of Recognition and Automatic Control (Technica, Kiev, 1969)   \n26. A.G. Ivakhnenko, D. Wuensch, G.A. Ivakhnenko, Inductive sorting-out GMDH algorithms with polynomial complexity for active neurons of neural networks. Neural Netw. 2, 1169–1173 (1999)   \n27. A.G. Ivakhnenko, G.A. Ivakhnenko, J.A. Mueller, Self-organization of the neural networks with active neurons. Pattern Recognit. Image Anal. 4(2), 177–188 (1994)   \n28. G.A.
Ivakhnenko, Self-organization of neuronet with active neurons for effects of nuclear test explosions forecasting. Syst. Anal. Model. Simul. 20, 107–116 (1995)   \n29. K.S. Narendra, K. Parthasarathy, Identification and control of dynamical systems using neural networks. IEEE Trans. Neural Netw. 1, 4–26 (1990)   \n30. T. Kondo, Identification of radial basis function networks by using revised GMDH-type neural networks with a feedback loop, in Proceedings of the SICE Annual Conference, Tokyo, Japan (2002), pp. 2882–2887   \n31. T. Ohtani, Automatic variable selection in RBF network and its application to neurofuzzy GMDH, in Proceedings of the Fourth International Conference on Knowledge-Based Intelligent Engineering Systems and Allied Technologies, vol. 2 (2000), pp. 840–843   \n32. Yu. Zaychenko, The fuzzy group method of data handling and its application for economical processes forecasting. Sci. Inq. 7(1), 83–96 (2006)   \n33. T. Ohtani, H. Ichihashi, T. Miyoshi, K. Nagasaka, Y. Kanaumi, Structural learning of neurofuzzy GMDH with Minkowski norm, in Proceedings of the 1998 Second International Conference on Knowledge-Based Intelligent Electronic Systems, vol. 2 (1998), pp. 100–107   \n34. Y. Bodyanskiy, O. Vynokurova, I. Pliss, Hybrid GMDH-neural network of computational intelligence, in Proceedings of the 3rd International Workshop on Inductive Modeling, Krynica, Poland (2009), pp. 100–107   \n35. A.G. Ivakhnenko, V.S. Stepashko, Disturbance Tolerance of Modeling (Naukova Dumka, Kiev, 1985)   \n36. Y. Bodyanskiy, N. Teslenko, P. Grimm, Hybrid evolving neural network using kernel activation functions, in Proceedings 17th Zittau East-West Fuzzy Colloquium, Zittau/Goerlitz, HS (2010), pp. 39–46   \n37. D.T. Pham, X. Liu, Neural Networks for Identification, Prediction and Control (Springer, London, 1995)   \n38. A. Bifet, Adaptive Stream Mining: Pattern Learning and Mining from Evolving Data Streams (IOS Press, Amsterdam, 2010)   \n39. C.C. 
Aggarwal, Data Streams: Models and Algorithms (Advances in Database Systems) (Springer, New York, 2007)   \n40. Y. Bodyanskiy, O. Tyshchenko, D. Kopaliani, An extended neo-fuzzy neuron and its adaptive learning algorithm. Int. J. Intell. Syst. Appl. (IJISA) 7(2), 21–26 (2015)   \n41. L.-X. Wang, Adaptive Fuzzy Systems and Control. Design and Statistical Analysis (Prentice Hall, Upper Saddle River, 1994)   \n42. L.-X. Wang, J.M. Mendel, Fuzzy basis functions, universal approximation, and orthogonal least-squares learning. IEEE Trans. Neural Netw. 3(5), 807–814 (1992)",
        "chapter": "2 Deep Neural Networks and Hybrid GMDH-Neuro-fuzzy Networks in Big Data Analysis",
        "section": "2.10 A Deep GMDH System Based on the Extended Neo-fuzzy Neuron and Its Training",
        "subsection": "2.10.3 An Experimental Study",
        "subsubsection": "N/A"
    },
    {
        "content": "References \n1. G. Hinton, S. Osindero, Y.-W. Teh, A fast learning algorithm for deep belief nets. Neural Comput. 18(7), 1527–1554 (2006)   \n2. I. Goodfellow, Y. Bengio, A. Courville, Deep Learning (MIT Press, 2016)   \n3. Y. LeCun, Y. Bengio, G. Hinton, Deep learning. Nature 521, 436–444 (2015)   \n4. J. Schmidhuber, Deep learning in neural networks: an overview. Neural Netw. 61, 85–117 (2015)   \n5. E. Lughofer, Evolving Fuzzy Systems—Methodologies, Advanced Concepts and Applications (Springer, Berlin, 2011)   \n6. Z. Hu, Y.V. Bodyanskiy, O.K. Tyshchenko, A cascade deep neuro-fuzzy system for high-dimensional online possibilistic fuzzy clustering, in Proceedings of the XI-th International Scientific and Technical Conference “Computer Science and Information Technologies” (CSIT 2016) (2016), pp. 119–122. https://doi.org/10.1109/stc-csit.2016.7589884   \n7. P. Angelov, D. Filev, N. Kasabov, Evolving Intelligent Systems: Methodology and Applications (Wiley, 2010)   \n8. Y.V. Bodyanskiy, O.A. Vynokurova, A.I. Dolotov, Self-learning cascade spiking neural network for fuzzy clustering based on group method of data handling. J. Autom. Inform. Sci. 45(3), 23–33 (2013)   \n9. Y. Bodyanskiy, O. Vynokurova, A. Dolotov, O. Kharchenko, Wavelet-neuro-fuzzy network structure optimization using GMDH for the solving forecasting tasks, in Proceedings of the 4th International Conference on Inductive Modelling ICIM 2013, Kyiv (2013), pp. 61–67   \n10. Y. Bodyanskiy, O. Vynokurova, N. Teslenko, Cascade GMDH-wavelet-neuro-fuzzy network, in Proceedings of the 4th International Workshop on Inductive Modeling IWIM 2011, Kyiv, Ukraine (2011), pp. 22–30   \n11. Y. Bodyanskiy, Y. Zaychenko, E. Pavlikovskaya, M. Samarina, Y. Viktorov, The neo-fuzzy neural network structure optimization using the GMDH for the solving forecasting and classification problems, in Proceedings of the International Workshop on Inductive Modeling, Krynica, Poland (2009), pp. 77–89   \n12. A.G.
Ivakhnenko, Heuristic self-organization in problems of engineering cybernetics. Automatica 6(2), 207–219 (1970)   \n13. T. Yamakawa, E. Uchino, T. Miki, H. Kusanagi, A neo fuzzy neuron and its applications to system identification and prediction of the system behavior, in Proceedings of the 2nd International Conference on Fuzzy Logic and Neural Networks (1992), pp. 477–483   \n14. E. Uchino, T. Yamakawa, Soft computing based signal prediction, restoration and filtering, in Intelligent Hybrid Systems: Fuzzy Logic, Neural Networks and Genetic Algorithms (Kluwer Academic Publishers, Boston, 1997), pp. 331–349   \n15. T. Miki, T. Yamakawa, Analog implementation of neo-fuzzy neuron and its on-board learning, in Computational Intelligence and Applications (WSES Press, Piraeus, 1999), pp. 144–149   \n16. M. Sugeno, G.T. Kang, Structure identification of fuzzy model. Fuzzy Sets Syst. 28, 15–33 (1988)   \n17. T. Takagi, M. Sugeno, Fuzzy identification of systems and its applications to modeling and control. IEEE Trans. Syst. Man Cybern. 15, 116–132 (1985)   \n18. N. Kasabov, Evolving Connectionist Systems (Springer, London, 2003)   \n19. E. Lughofer, Evolving Fuzzy Systems—Methodologies, Advanced Concepts and Applications (Springer, Berlin, 2011)   \n20. J.-S.R. Jang, ANFIS: adaptive-network-based fuzzy inference systems. IEEE Trans. Syst. Man Cybern. 23, 665–685 (1993)   \n21. J.-S.R. Jang, C.-T. Sun, E. Mizutani, Neuro-Fuzzy and Soft Computing: A Computational Approach to Learning and Machine Intelligence (Prentice Hall, Upper Saddle River, 1997)   \n22. S. Osowski, Sieci neuronowe do przetwarzania informacji (Oficyna Wydawnicza Politechniki Warszawskiej, Warszawa, 2006)   \n23. A.G. Ivakhnenko, Long-Term Forecasting and Control of Complex Systems (Technica, Kiev, 1975)   \n24. A.G. Ivakhnenko, Polynomial theory of complex systems. IEEE Trans. Syst. Man Cybern. 1(4), 364–378 (1971)   \n25. A.G.
Ivakhnenko, Self-Learning Systems of Recognition and Automatic Control (Technica, Kiev, 1969)   \n26. A.G. Ivakhnenko, D. Wuensch, G.A. Ivakhnenko, Inductive sorting-out GMDH algorithms with polynomial complexity for active neurons of neural networks. Neural Netw. 2, 1169–1173 (1999)   \n27. A.G. Ivakhnenko, G.A. Ivakhnenko, J.A. Mueller, Self-organization of the neural networks with active neurons. Pattern Recognit. Image Anal. 4(2), 177–188 (1994)   \n28. G.A. Ivakhnenko, Self-organization of neuronet with active neurons for effects of nuclear test explosions forecasting. Syst. Anal. Model. Simul. 20, 107–116 (1995)   \n29. K.S. Narendra, K. Parthasarathy, Identification and control of dynamical systems using neural networks. IEEE Trans. Neural Netw. 1, 4–26 (1990)   \n30. T. Kondo, Identification of radial basis function networks by using revised GMDH-type neural networks with a feedback loop, in Proceedings of the SICE Annual Conference, Tokyo, Japan (2002), pp. 2882–2887   \n31. T. Ohtani, Automatic variable selection in RBF network and its application to neurofuzzy GMDH, in Proceedings of the Fourth International Conference on Knowledge-Based Intelligent Engineering Systems and Allied Technologies, vol. 2 (2000), pp. 840–843   \n32. Yu. Zaychenko, The fuzzy group method of data handling and its application for economical processes forecasting. Sci. Inq. 7(1), 83–96 (2006)   \n33. T. Ohtani, H. Ichihashi, T. Miyoshi, K. Nagasaka, Y. Kanaumi, Structural learning of neurofuzzy GMDH with Minkowski norm, in Proceedings of the 1998 Second International Conference on Knowledge-Based Intelligent Electronic Systems, vol. 2 (1998), pp. 100–107   \n34. Y. Bodyanskiy, O. Vynokurova, I. Pliss, Hybrid GMDH-neural network of computational intelligence, in Proceedings of the 3rd International Workshop on Inductive Modeling, Krynica, Poland (2009), pp. 100–107   \n35. A.G. Ivakhnenko, V.S. Stepashko, Disturbance Tolerance of Modeling (Naukova Dumka, Kiev, 1985)   \n36. Y.
Bodyanskiy, N. Teslenko, P. Grimm, Hybrid evolving neural network using kernel activation functions, in Proceedings 17th Zittau East-West Fuzzy Colloquium, Zittau/Goerlitz, HS (2010), pp. 39–46   \n37. D.T. Pham, X. Liu, Neural Networks for Identification, Prediction and Control (Springer, London, 1995)   \n38. A. Bifet, Adaptive Stream Mining: Pattern Learning and Mining from Evolving Data Streams (IOS Press, Amsterdam, 2010)   \n39. C.C. Aggarwal, Data Streams: Models and Algorithms (Advances in Database Systems) (Springer, New York, 2007)   \n40. Y. Bodyanskiy, O. Tyshchenko, D. Kopaliani, An extended neo-fuzzy neuron and its adaptive learning algorithm. Int. J. Intell. Syst. Appl. (IJISA) 7(2), 21–26 (2015)   \n41. L.-X. Wang, Adaptive Fuzzy Systems and Control. Design and Statistical Analysis (Prentice Hall, Upper Saddle River, 1994)   \n42. L.-X. Wang, J.M. Mendel, Fuzzy basis functions, universal approximation, and orthogonal least-squares learning. IEEE Trans. Neural Netw. 3(5), 807–814 (1992)",
        "chapter": "2 Deep Neural Networks and Hybrid GMDH-Neuro-fuzzy Networks in Big Data Analysis",
        "section": "References",
        "subsection": "N/A",
        "subsubsection": "N/A"
    },
    {
        "content": "Chapter 3 Pattern Recognition in Big Data Analysis \n3.1 Introduction \nData classification and pattern recognition constitute one of the most widely used classes of problems in data mining. To date, many methods and algorithms have been developed for pattern recognition in various fields of science and technology. Most modern classification methods can be divided into the following classes: \n(1) methods based on statistical decision theory, including Bayesian methods that use conditional probability distributions;   \n(2) methods of discriminant analysis, including the well-known SVM method and its derivatives;   \n(3) algebraic and linguistic methods;   \n(4) neural networks;   \n(5) fuzzy logic systems and fuzzy neural networks (FNN);   \n(6) special methods. \nHowever, for classification problems with big data (BD) it is extremely important to develop new adequate methods, or to further improve existing ones, that take into account the high dimensionality of BD warehouses. Most of them use various approaches and algorithms of dimensionality reduction, e.g. the principal component method (PCM) and similar techniques. \nAnother constructive approach to BD dimensionality reduction is the hierarchical organization of data. \nIn this chapter, a classification method based on FNN is considered, and some algorithms for reducing the dimensionality of classification problems are presented and discussed. In Sect. 3.2 the FNN NEFClass is considered; its architecture and training algorithm are presented and investigated. In Sect. 3.3 the modified FNN NEFClass-M, which is free of some drawbacks of the basic FNN NEFClass, is described, and its training algorithms are analyzed. \nIn Sect. 3.5 the application of the FNN NEFClass-M to the recognition of optical images obtained with a multispectral system is presented and analyzed. In Sect. 3.6 the use of FNN for classifying medical images of the uterus in the problem of express diagnostics is considered. \nIn Sect. 3.7 the hybrid CNN-FNN network, which was suggested for recognizing medical images of breast tumors in the problem of medical diagnostics, is considered. Experimental results for the suggested approach are presented, and its practical implementation for medical image classification is described. The reduction of feature dimensionality in this problem is also considered: the principal component method was suggested for its solution, and its efficiency was estimated. The suggested approach may be used for solving BD classification problems.",
        "chapter": "3 Pattern Recognition in Big Data Analysis",
        "section": "3.1 Introduction",
        "subsection": "N/A",
        "subsubsection": "N/A"
    },
    {
        "content": "3.2 FNN NEFClass. Architecture, Properties, the Algorithms of Learning of Base Rules and Membership Functions \nClassification is one of the most relevant areas of application of computational intelligence systems. Various approaches and methods have been suggested for solving it; among the popular ones are solutions that combine neural networks and fuzzy inference systems. One such solution is the system NEFClass (NEuro-Fuzzy CLASSifier), which is based on the generalized architecture of the fuzzy perceptron and was suggested by D. Nauck and R. Kruse [1–3]. \nBoth the original and the modified NEFClass models are derived from the general model of the fuzzy perceptron [4]. The purpose of the model is to derive fuzzy rules from a set of data that can be divided into several non-overlapping classes. Fuzziness arises from imperfect or incomplete measurement of the properties of the objects to be classified. \nFuzzy rules describing expert information have the following form: \nif $x_1$ is $\\mu_{1i}$ and $x_2$ is $\\mu_{2i}$ and … and $x_n$ is $\\mu_{ni}$, then the pattern $(x_1, x_2, \\ldots, x_n)$ belongs to class $i$, \nwhere $\\mu_{1i}, \\ldots, \\mu_{ni}$ are the membership functions (MF) of fuzzy sets. \nThe goal of NEFClass is to determine these rules as well as the parameters of the membership functions of the fuzzy sets. It is assumed that the intersection of two different classes is empty. \nThe system NEFClass has a three-layer feedforward architecture (see Fig. 3.1). The first layer $U_1$ contains the input neurons, to which the input patterns are fed; these neurons usually pass the input values unchanged. The hidden layer $U_2$ contains the fuzzy rules, and the third layer $U_3$ consists of the output neurons (classifiers). \nThe activations of the rule neurons and of the output-layer neurons for a pattern $p$ are calculated as follows: \n$$a_R^{(p)} = \\min_{x \\in U_1} \\{ W(x, R)(a_x^{(p)}) \\}, \\qquad a_c^{(p)} = \\sum_{R \\in U_2} W(R, c) \\cdot a_R^{(p)},$$ \nor alternatively \n$$a_c^{(p)} = \\max_{R \\in U_2} \\{ W(R, c) \\cdot a_R^{(p)} \\},$$ \nwhere $W(x, R)$ is the fuzzy weight of the connection between input neuron $x$ and rule neuron $R$, and $W(R, c)$ is the weight of the connection between rule neuron $R$ and output neuron $c$. Instead of the minimum and maximum operations, other t-norms and t-conorms, respectively, can be used [1]. \nThe rule base approximates an unknown function $\\varphi(x)$ that describes the classification task, such that $c_i = 1$ and $c_j = 0$ ($j = 1, \\ldots, m$, $j \\neq i$) if pattern $x$ belongs to class $C_i$. \nEvery fuzzy set is labeled with a linguistic term such as «large», «small», «medium», etc. The fuzzy sets and linguistic rules approximate the classifying function and determine the output of the system NEFClass. They are obtained from a sample by learning. For every linguistic value (for example, «$x_1$ is positive and large») there must be only one representation as a fuzzy set. \n\nLearning in the System NEFClass \nThe system NEFClass can be built on partial knowledge about patterns.
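A minimal Python sketch may help connect the min/max activation scheme described above with the rule-generation stage that follows. It assumes inputs scaled to [0, 1], q evenly spaced triangular fuzzy sets per input, the minimum for rule activations and the maximum for class outputs, class labels given as indices rather than one-hot vectors, and the “simple” rule selection (keep the first k_max rules). The function names (`tri`, `make_partition`, `best_set`, `generate_rules`, `classify`) and the toy data are hypothetical; this is not the authors' NEFClass implementation.

```python
from typing import List, Sequence, Tuple

Tri = Tuple[float, float, float]  # (left foot a, peak b, right foot c)
Rule = Tuple[Tuple[int, ...], int]  # (fuzzy-set index per input, class index)


def tri(x: float, mf: Tri) -> float:
    # Triangular membership degree of x; assumes a < b < c.
    a, b, c = mf
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)


def make_partition(q: int) -> List[Tri]:
    # q evenly spaced triangular fuzzy sets over inputs scaled to [0, 1] (q >= 2).
    h = 1.0 / (q - 1)
    return [(j * h - h, j * h, j * h + h) for j in range(q)]


def best_set(x: float, part: List[Tri]) -> int:
    # Step 2: index of the fuzzy set with the largest membership for this input value.
    return max(range(len(part)), key=lambda j: tri(x, part[j]))


def generate_rules(samples, q: int, k_max: int) -> List[Rule]:
    # Stage 1 with simple rule learning: one rule per new antecedent, at most k_max rules.
    part = make_partition(q)
    rules: List[Rule] = []
    for p, cls in samples:
        if len(rules) >= k_max:
            break
        ante = tuple(best_set(x, part) for x in p)
        if all(ante != r[0] for r in rules):  # step 3: no rule with this antecedent yet
            rules.append((ante, cls))
    return rules


def classify(p: Sequence[float], rules: List[Rule], part: List[Tri], m: int):
    # Rule activation = min over antecedent memberships; class output = max over its rules.
    out = [0.0] * m
    for ante, cls in rules:
        act = min(tri(x, part[j]) for x, j in zip(p, ante))
        out[cls] = max(out[cls], act)
    return max(range(m), key=lambda i: out[i]), out


samples = [((0.1, 0.2), 0), ((0.15, 0.1), 0), ((0.9, 0.8), 1),
           ((0.85, 0.95), 1), ((0.5, 0.5), 2)]
part = make_partition(3)
rules = generate_rules(samples, q=3, k_max=10)
pred, out = classify((0.05, 0.1), rules, part, m=3)
print(rules, pred)
```

With three fuzzy sets per input, the three toy clusters yield three rules, and the pattern (0.05, 0.1) is assigned to class 0. Choosing `k_max` smaller than the number of distinct antecedents simply truncates the rule list in presentation order, which is the order dependence that NEFClass-M later addresses by shuffling.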
A user must define the initial number of fuzzy sets for each object feature (the number of terms) and set the value $k_{\\max}$, the maximal number of rule nodes that can be created in the hidden layer. Triangular MFs are used for learning. \nConsider a NEFClass system with $n$ input neurons $x_1, \\ldots, x_n$, $k$ ($k \\leq k_{\\max}$) rule neurons and $m$ output neurons $c_1, \\ldots, c_m$. A learning sample of patterns $L = \\{(p_1, t_1), \\ldots, (p_s, t_s)\\}$ is also given, each element of which consists of an input pattern $p \\in \\mathbb{R}^n$ and a desired pattern $t \\in \\{0, 1\\}^m$. \nThe learning algorithm consists of two stages. \nStage 1. Generation of the rule base. \nThe first stage, whose purpose is to create the rule neurons of the system NEFClass, consists of the following steps [1–3]: \n1. Choose the next pattern $(p, t)$ from the sample $L$.   \n2. For every input neuron $x_i \\in U_1$ find the membership function $\\mu_{j_i}^{(i)}$ such that \n$$\\mu_{j_i}^{(i)}(p_i) = \\max_{j} \\{ \\mu_j^{(i)}(p_i) \\},$$ \nwhere $p_i$ is the $i$-th component of pattern $p$, fed to input neuron $x_i$.   \n3. If the number of rule nodes $k$ is less than $k_{\\max}$ and there is no rule node $R$ such that \n$$W(x_1, R) = \\mu_{j_1}^{(1)}, \\ldots, W(x_n, R) = \\mu_{j_n}^{(n)},$$ \nthen create such a node, connect it with the output node $c_i$ for which $t_i = 1$, connect it with all input neurons, and assign the corresponding weights $\\mu_{j_i}^{(i)}$ to these connections.   \n4. If there are still unprocessed patterns in $L$ and $k < k_{\\max}$, go to step 1 and continue learning with the next pattern; otherwise stop.   \n5. Determine the rule base by one of three procedures: \na. “Simple” rule learning: keep only the first $k$ rules (stop creating rules once $k = k_{\\max}$ rules have been created). \nb. “Best” rule learning: process the patterns in $L$ and accumulate the activation of every rule neuron for every class of patterns fed into the system NEFClass. If rule neuron $R$ shows a greater accumulated activation for class $C_j$ than for class $C_R$, which was initially specified for this rule, then change the consequent of rule $R$ from $C_R$ to $C_j$, i.e. connect $R$ with the output neuron $c_j$. Then continue processing the patterns in $L$ and calculate for every rule neuron the value \n$$V_R = \\sum_{p \\in L} a_R^{(p)} \\cdot e_p,$$ \nwhere \n$$e_p = \\begin{cases} 1, & \\text{if pattern } p \\text{ belongs to class } C_R, \\\\ -1, & \\text{otherwise}. \\end{cases}$$ \nKeep the $k$ rule neurons with the greatest values of $V_R$ and delete the other rule neurons from the system NEFClass. \nc. “Best for every class” rule learning: proceed as in the previous case, but for each class $C_j$ keep only the best $[k/m]$ rules whose consequents relate to class $C_j$ (where $[x]$ is the integer part of $x$). \nLearning of Fuzzy Sets MF \nStage 2 \nIn the second stage, the parameters of the membership functions (MF) of the fuzzy sets are learned. The supervised learning algorithm of the system NEFClass adapts the MFs of the fuzzy sets. The algorithm cycles through all learning patterns of the sample $L$, executing the following steps until one of the stop criteria is fulfilled [1–3]. \nSteps: \n1. Choose the next pattern $(p, t)$ from the sample $L$, feed it into the FNN NEFClass and determine the output vector $c$.   \n2. For every output neuron $c_i$ calculate the value \n$$\\delta_{c_i} = t_i - a_{c_i},$$ \nwhere $t_i$ is the desired output and $a_{c_i}$ is the actual output of neuron $c_i$.   \n3. For every rule neuron $R$ with output $a_R > 0$ execute: \na. determine the value \n$$\\delta_R = a_R (1 - a_R) \\sum_{c \\in U_3} W(R, c) \\cdot \\delta_c;$$ \nb. find $x'$ such that \n$$W(x', R)(a_{x'}) = \\min_{x \\in U_1} \\{ W(x, R)(a_x) \\};$$ \nc. for the fuzzy set $W(x', R)$ with parameters $a$, $b$, $c$, determine the shifts $\\Delta_a$, $\\Delta_b$, $\\Delta_c$ of its MF parameters using the learning rate $\\sigma > 0$: \n$$\\Delta_b = \\sigma \\cdot \\delta_R \\cdot (c - a) \\cdot \\operatorname{sgn}(a_{x'} - b),$$ \n$$\\Delta_a = -\\sigma \\cdot \\delta_R \\cdot (c - a) + \\Delta_b,$$ \n$$\\Delta_c = \\sigma \\cdot \\delta_R \\cdot (c - a) + \\Delta_b,$$ \nand apply these changes to $W(x', R)$; \nd. calculate the rule error: \nEnd of iteration. Repeat the iterations described above until a stop condition is fulfilled. Possible stop criteria include, for example: \n1. The error has not decreased during $n$ iterations.   \n2. Learning stops after a defined error value (desirably close to zero) is reached. \n3.3 Analysis NEFClass Properties. The Modified System NEFClassM \nThe FNN NEFClass has several obvious advantages that distinguish it from other classification systems. The most important are ease of implementation, fast learning algorithms and, most importantly, high accuracy of data classification, at the level of the best systems in this area. However, the basic system NEFClass has some shortcomings: \n1. the formulas used for parameter learning are empirical in nature;   \n2. it is not clear how to choose the learning rate parameter $\\sigma$ in the learning algorithm. \nThese shortcomings were therefore eliminated in a modification of the basic system, the so-called system NEFClass-M (modified), developed in [5].",
        "chapter": "3 Pattern Recognition in Big Data Analysis",
        "section": "3.2 FNN NEFClass. Architecture, Properties, the Algorithms of Learning of Base Rules and Membership Functions",
        "subsection": "N/A",
        "subsubsection": "N/A"
    },
    {
        "content": "Randomization and careful selection of the learning rate constant $\\sigma$ are inherent properties of the system NEFClass-M. These properties were designed to mitigate the impact of some shortcomings of the original model and have made it possible to achieve a significant improvement in classification quality. \nRandomization. Because of the nature of the “simple” rule base learning algorithm and of the fuzzy set learning algorithm, the outcome of training with these algorithms depends strongly on the order in which the samples are presented in the learning sample. If, for example, the samples are sorted by class, the system will classify the patterns of one class better and the patterns of the other class substantially worse. Ideally, the patterns in the training sample should be randomly shuffled to avoid this negative effect. \nThe implementation of the system NEFClass-M [5] avoids this problem by randomizing the order of patterns in the learning sample after it is loaded. Moreover, this randomization is repeated before each iteration of the learning algorithm. As further experiments have shown, this makes it possible to achieve more stable and often better classification results that do not depend on the order in which the user submitted the patterns of the learning sample. \nChoice of learning rate. The fuzzy set learning algorithm of NEFClass uses the learning rate parameter $\\sigma$. As experiments carried out during the development of NEFClass-M have shown, this parameter plays a vital role in the success of training. \nThe experiments showed that, other parameters being equal, for each specific training task there exists a certain value of $\\sigma$ that ensures a minimum percentage of misclassification after training. Unfortunately, it is very difficult to obtain an analytical expression for the optimal value of this parameter, because the NEFClass learning algorithm as a whole is empirical; however, by trial and error it was found that for many tasks the optimal value of $\\sigma$ lies in the narrow range [0.06, 0.1], and in particular it may be set to 0.07. This value has been adopted in the program that implements the modified model NEFClass-M [5]. \n3.3.1 The Modified Model NEFCLASS \nConsider the basic shortcomings of the NEFClass learning algorithm. \nThe analysis of the drawbacks of NEFClass has shown that their principal cause lies mostly in the empirical learning algorithm of fuzzy sets. Therefore, a natural approach to correcting the situation was to replace the empirical learning algorithm with a rigorous optimization algorithm, with all the ensuing consequences for the network architecture and algorithms. \nBoth the original and the modified NEFClass models are based on the architecture of a fuzzy perceptron [1, 5, 6]. The architectural differences between the original and the modified model lie in the form of the membership functions of the fuzzy sets, in the t-norm used to calculate the activations of the rule neurons, and in the aggregating function (t-conorm) that determines the activation of the output neurons. \n\nThe application of numerical optimization methods requires the membership functions of the fuzzy sets to be differentiable, a condition that triangular membership functions do not satisfy. Therefore, the modified model uses Gaussian membership functions, described as \nThis membership function is defined by two parameters, $a$ and $b$. The requirement of differentiability also dictates the choice of the t-norm (intersection) for calculating rule neuron activations: the system NEFClass uses the minimum for this operation, whereas the modified system NEFClass-M uses the product of the corresponding values.
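To illustrate why differentiability matters for the modified model, here is a small Python sketch that computes a rule activation under the product t-norm with Gaussian memberships and checks its analytic gradient with respect to the MF parameters against central finite differences. The Gaussian form exp(-((x - a)/b)^2), with center a and width b, is an assumption made for illustration (the parameterization used in [5] may differ), and all function names and sample values are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def gauss(x: float, a: float, b: float) -> float:
    # Assumed Gaussian MF with center a and width b (differentiable everywhere).
    return math.exp(-((x - a) / b) ** 2)


def rule_activation(xs: Sequence[float], params) -> float:
    # Product t-norm over the antecedent memberships, as in NEFClass-M.
    act = 1.0
    for x, (a, b) in zip(xs, params):
        act *= gauss(x, a, b)
    return act


def rule_gradient(xs, params) -> List[Tuple[float, float]]:
    # Analytic d(act)/da_i and d(act)/db_i. With u = (x - a)/b:
    # d mu/da = mu * 2u/b and d mu/db = mu * 2u^2/b, so the product rule gives act * (...).
    act = rule_activation(xs, params)
    grads = []
    for x, (a, b) in zip(xs, params):
        u = (x - a) / b
        grads.append((act * 2 * u / b, act * 2 * u * u / b))
    return grads


def numeric_gradient(xs, params, eps: float = 1e-6) -> List[Tuple[float, float]]:
    # Central finite differences, used only to check the analytic gradient.
    grads = []
    for i in range(len(params)):
        pair = []
        for k in range(2):
            up = [list(p) for p in params]
            dn = [list(p) for p in params]
            up[i][k] += eps
            dn[i][k] -= eps
            pair.append((rule_activation(xs, up) - rule_activation(xs, dn)) / (2 * eps))
        grads.append((pair[0], pair[1]))
    return grads


xs = [0.3, 0.7]
params = [(0.2, 0.5), (0.9, 0.4)]
max_err = max(
    abs(g - n)
    for ga, na in zip(rule_gradient(xs, params), numeric_gradient(xs, params))
    for g, n in zip(ga, na)
)
print(f'max |analytic - numeric| = {max_err:.2e}')
```

The two gradients agree to within finite-difference error. With the minimum t-norm of the original NEFClass, only the smallest membership would receive a gradient and the activation is not differentiable at ties, which is why the product is preferred when a gradient method is used.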
\nFinally, the kind of aggregate function (t-conorm) for modified model is limited only by the weighted sum. The reason consists in the fact that the maximum function which is used in the original system also does not satisfy the condition of differentiability. The main change is obviously relates to a learning algorithm of fuzzy sets. The objective function in the modified system NEFClass is minimization of the mean squared error on the training sample by analogy with the classical (clear) neural networks: \nwhere the N—number of patterns in the training sample, $a _ { c } ^ { ( p ) }$ is an activation vector of neurons in the output layer for the next training sample $p$ , $a _ { c } ^ { ( p ) * }$ is a target value of this vector for the pattern p. The components of the target vector for the pattern $boldsymbol { mathrm { ~ p ~ } }$ are equal: \nwhere $j$ is a index of the true class to which this pattern p belongs, $i$ is classification of pattern $boldsymbol { mathrm { ~ p ~ } }$ by NEFClass. The argument of numerical optimization aimed at reducing MSE for the training set is the aggregate vector of parameters $a$ and $b$ of FNN. As a specific training method can be used any method unconstrained optimization such as the gradient method or the conjugate gradient method, these both methods were implemented in this investigation. \n3.4 Experimental Studies. Comparative Analysis of FNN NEFClass and NEFClass-M in Classification Problems \nExperiments were conducted on the classification of the two sets of data IRIS and WBC [5, 6]. Selection of IRIS and WBC test kits was dictated by two considerations: firstly, these sets can be considered standard for classification problems, and secondly, in the original works of authors NEFCLASS model was tested on these data sets [1–3]. This allows to compare the results of the base system NEFCLASS with a modified NEFCLASS_M and estimate the effect of introduced improvements. 
\nIRIS Data Set \nIRIS set contains 150 samples belonging to three different classes (Iris Setosa, Iris Versicolour, and Iris Virginica), 50 samples of each class. Each sample is characterized by four properties. IRIS is the only one set by classification simplicity for which even a simple strategy of rules selection gives good results. \nIn the first experiment, in a modified model NEFClass-M “simple” rules learning algorithm was used, and their number was limited to 10 with 3 fuzzy sets per variable (all other parameters were set to the default values). As a result, the system has created 10 rules and achieved only 4 classification errors of the 150 (i.e. $9 7 . 3 %$ correct) patterns. \nThe best result, which was managed to achieve with the “simple” rules learning algorithm is three rules with two essential variables, $x _ { 3 }$ and $x _ { 4 }$ , and the same order of misclassification (4 errors) [5]: \nR1: IF (any, any, large, large) THEN Class 3   \nR2: IF (any, any, medium, medium) THEN Class 2   \nR3: IF (any, any, small, small) THEN Class 1 \nThe same result was achieved for the “better” and “best in class” rules learning algorithms. However, for the last two algorithms it’s possible further reduction in the number of fuzzy sets for variable $x _ { 3 }$ and $x _ { 4 }$ under the following rules (6 erroneous classification): \nR1: IF (any, any, small, small) THEN Class 1   \nR2: IF (any, any, large, small) THEN Class 2   \nR3: IF (any, any, large, large) THEN Class 3 \nThe authors model NEFCLASS obtained the similar results, except that in their experiments, they used three fuzzy sets (linguistic values) for $x _ { 3 }$ and $x _ { 4 }$ [1, 2]. Thus, for a set of data IRIS it was managed to achieve better results than in the original works—exclusively simple set rules of two variables with only two decomposing sets for each variable. \nDataset WBC \nThe next test sample for classification was standard data sample Wisconsin Breast Cancer (WBC). 
When processing sample Wisconsin Breast Cancer using system",
        "chapter": "3 Pattern Recognition in Big Data Analysis",
        "section": "3.3 Analysis NEFClass Properties. The Modified System NEFClassM",
        "subsection": "3.3.1 The Modified Model NEFCLASS",
        "subsubsection": "N/A"
    },
    {
        "content": "3.4 Experimental Studies. Comparative Analysis of FNN NEFClass and NEFClass-M in Classification Problems \nExperiments were conducted on the classification of the two sets of data IRIS and WBC [5, 6]. Selection of IRIS and WBC test kits was dictated by two considerations: firstly, these sets can be considered standard for classification problems, and secondly, in the original works of authors NEFCLASS model was tested on these data sets [1–3]. This allows to compare the results of the base system NEFCLASS with a modified NEFCLASS_M and estimate the effect of introduced improvements. \nIRIS Data Set \nIRIS set contains 150 samples belonging to three different classes (Iris Setosa, Iris Versicolour, and Iris Virginica), 50 samples of each class. Each sample is characterized by four properties. IRIS is the only one set by classification simplicity for which even a simple strategy of rules selection gives good results. \nIn the first experiment, in a modified model NEFClass-M “simple” rules learning algorithm was used, and their number was limited to 10 with 3 fuzzy sets per variable (all other parameters were set to the default values). As a result, the system has created 10 rules and achieved only 4 classification errors of the 150 (i.e. $9 7 . 3 %$ correct) patterns. \nThe best result, which was managed to achieve with the “simple” rules learning algorithm is three rules with two essential variables, $x _ { 3 }$ and $x _ { 4 }$ , and the same order of misclassification (4 errors) [5]: \nR1: IF (any, any, large, large) THEN Class 3   \nR2: IF (any, any, medium, medium) THEN Class 2   \nR3: IF (any, any, small, small) THEN Class 1 \nThe same result was achieved for the “better” and “best in class” rules learning algorithms. 
However, for the last two algorithms a further reduction of the number of fuzzy sets for variables $x_3$ and $x_4$ is possible, giving the following rules (6 misclassifications): \nR1: IF (any, any, small, small) THEN Class 1   \nR2: IF (any, any, large, small) THEN Class 2   \nR3: IF (any, any, large, large) THEN Class 3 \nThe authors of NEFCLASS obtained similar results, except that their experiments used three fuzzy sets (linguistic values) for $x_3$ and $x_4$ [1, 2]. Thus, for the IRIS data set a better result than in the original works was achieved: a simple rule base over only two variables with just two fuzzy sets per variable. \nWBC Data Set \nThe next test sample was the standard Wisconsin Breast Cancer (WBC) data set. Processing WBC with NEFClass-M gave interesting results that did not always coincide with those of the basic NEFCLASS model. \nFollowing the experiments of the NEFCLASS authors [1, 2], the system was trained with the “best in class” rule base learning algorithm (three fuzzy sets per variable) and a maximum of 4 rules. NEFClass-M misclassified 28 of 663 patterns (95.7% correct) [7]. Notably, with similar parameters the basic NEFCLASS model achieved only 80.4% correct classification (135 misclassifications). \nThis significant advantage of NEFClass-M can be explained by the modifications that distinguish it from the basic NEFCLASS model: the use of randomization, the choice of the learning rate, and the application of a numerical optimization algorithm (the gradient method) for learning the membership functions. 
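To make the rule bases above concrete, here is a minimal Python sketch of how NEFCLASS-style rules can be evaluated and learned. It assumes three triangular fuzzy sets (small, medium, large) spread evenly over each variable's range, evaluates a rule by the minimum of its antecedent memberships, scores each class by its strongest rule, and selects rules in a simplified 'best per class' manner. The helper names (tri, memberships, rule_activation, classify, learn_rules), the partition and the performance measure are illustrative assumptions, not the NEFClass-M implementation used in the experiments.

```python
# Hypothetical fuzzy partition: three triangular sets per input variable.
LABELS = ['small', 'medium', 'large']


def tri(x, a, b, c):
    # Triangular membership function with support (a, c) and peak at b.
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)


def memberships(x, lo, hi):
    # Memberships of x in small/medium/large, centred at lo, the midpoint and hi.
    w = (hi - lo) / 2.0
    centers = [lo, lo + w, hi]
    return {lab: tri(x, c - w, c, c + w) for lab, c in zip(LABELS, centers)}


def rule_activation(antecedent, mus):
    # Minimum over the antecedent; 'any' means the variable is ignored.
    degrees = [mus[i][lab] for i, lab in enumerate(antecedent) if lab != 'any']
    return min(degrees) if degrees else 1.0


def classify(x, rules, ranges):
    # Each class is scored by its most strongly activated rule (maximum).
    mus = [memberships(v, lo, hi) for v, (lo, hi) in zip(x, ranges)]
    scores = {}
    for antecedent, cls in rules:
        act = rule_activation(antecedent, mus)
        scores[cls] = max(scores.get(cls, 0.0), act)
    return max(scores, key=scores.get)


def learn_rules(X, y, ranges, k_max):
    # Each pattern proposes the antecedent made of its best-matching fuzzy sets.
    mus_all = [[memberships(v, lo, hi) for v, (lo, hi) in zip(x, ranges)] for x in X]
    antecedents = list(dict.fromkeys(
        tuple(max(m, key=m.get) for m in mus) for mus in mus_all))
    candidates = []
    for ante in antecedents:
        # Sum the rule's activation per class over the whole training set.
        per_cls = {}
        for mus, cls in zip(mus_all, y):
            per_cls[cls] = per_cls.get(cls, 0.0) + rule_activation(ante, mus)
        cons = max(per_cls, key=per_cls.get)
        # Performance: support for the consequent minus support for other classes.
        perf = 2 * per_cls[cons] - sum(per_cls.values())
        candidates.append((perf, ante, cons))
    # Simplified 'best per class': keep the k_max // n_classes best rules of each class.
    classes = sorted(set(y))
    per_class = max(1, k_max // len(classes))
    rules = []
    for c in classes:
        best = sorted((t for t in candidates if t[2] == c), key=lambda t: t[0], reverse=True)
        rules += [(ante, cons) for _, ante, cons in best[:per_class]]
    return rules


# Usage with the three-rule IRIS base R1-R3 (only x3 and x4 are used).
RANGES = [(4.3, 7.9), (2.0, 4.4), (1.0, 6.9), (0.1, 2.5)]  # min/max of the IRIS features
IRIS_RULES = [(('any', 'any', 'large', 'large'), 3),
              (('any', 'any', 'medium', 'medium'), 2),
              (('any', 'any', 'small', 'small'), 1)]
X = [[5.1, 3.5, 1.4, 0.2], [5.9, 3.0, 4.2, 1.3], [6.3, 3.3, 6.0, 2.5]]  # illustrative patterns
y = [1, 2, 3]
print([classify(x, IRIS_RULES, RANGES) for x in X])  # [1, 2, 3]
print(len(learn_rules(X, y, RANGES, k_max=3)))  # 3
```

Using the minimum for rule activation and the maximum over rules mirrors the usual NEFCLASS inference; a real rule learner would also handle ties and classes that propose no antecedent.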
\nThe best result obtained for the WBC data set is a rule base of 8 rules with five essential variables, $x_1, x_2, x_4, x_6$ and $x_9$ (19 misclassifications) [5]: \nR1: IF (small, small, any, small, any, small, any, any, small) THEN Class 1   \nR2: IF (small, small, any, large, any, small, any, any, small) THEN Class 1   \nR3: IF (small, small, any, small, any, small, any, any, large) THEN Class 1   \nR4: IF (large, large, any, small, any, large, any, any, small) THEN Class 2   \nR5: IF (large, large, any, large, any, small, any, any, small) THEN Class 2   \nR6: IF (small, large, any, small, any, large, any, any, small) THEN Class 2   \nR7: IF (large, small, any, small, any, small, any, any, small) THEN Class 2   \nR8: IF (large, small, any, small, any, small, any, any, large) THEN Class 2 \nComparable results (24 misclassifications) were obtained with a maximum of 2 rules (“best in class”) using all variables except $x_5$ and $x_7$: \nR1: IF (small, small, small, small, any, small, any, small, small) THEN Class 1   \nR2: IF (large, large, large, small, any, large, any, large, small) THEN Class 2 \nThus, the results obtained by NEFClass-M are superior to those of the basic NEFCLASS model both in the number of rules and significant variables and in classification accuracy. This confirms the efficiency of the modifications made to NEFClass: randomization, the correct choice of the learning rate, and the application of numerical optimization algorithms. \n3.5 Application of NEFClass in the Problem of Objects Recognition at Electro-Optical Images \nUsing a multispectral electro-optical system operating in three ranges (red, green and blue), images of the ocean and the coastal surface were obtained. The task was to recognize objects in the form of geometric shapes on the water surface and on the sand [8, 9]. 
For this purpose, taking into account the complexity of the problem and the high noise level, it was suggested to use fuzzy neural networks, in particular NEFClass. To train the FNN NEFClass, several learning algorithms were developed (gradient, conjugate gradient and genetic), and their efficiency was investigated and compared with the basic training algorithm of NEFClass [1, 2].",
        "chapter": "3 Pattern Recognition in Big Data Analysis",
        "section": "3.4 Experimental Studies. Comparative Analysis of FNN NEFClass and NEFClass-M in Classification Problems",
        "subsection": "N/A",
        "subsubsection": "N/A"
    },
    {
        "content": "3.5.1 Gradient Learning Algorithm for NEFClass \nIn the first stage of the algorithm (rule base learning), the first phase of the basic NEFClass algorithm is used. The second stage uses a gradient algorithm for training the feedforward neural network, described below [5, 8]. \nLet the training criterion of the fuzzy neural network with 3 layers (one hidden layer) be as follows: \nwhere $t_i$ is the desired value of the i-th output of the neural network and $NET_i(W)$ is the actual value of the i-th network output for the weight matrix $W$. \nLet the activation function of the hidden-layer neurons (rule neurons) be: \nwhere $\\mu_{ji}(x)$ is a membership function of Gaussian form: \nand let the activation function of the output-layer neurons be the weighted sum: \nor the maximum function: \nConsider the gradient learning algorithm of the fuzzy perceptron. \n1. Let $W(n)$ be the current value of the weight matrix. The algorithm has the following form: \nwhere $\\gamma_n$ is the step size at the n-th iteration and $\\nabla_W e(W(n))$ is the gradient of the criterion (3.12); moving against it reduces the criterion. \n2. At each iteration, we first train (adjust) the input weights $W$, which depend on the parameters $a$ and $b$ (see expression (3.14)): \nwhere $\\gamma'_{n+1}$ is the step size for parameter $b$. \n3. We find (train) the output weights: \n4. Set $n := n + 1$ and go to the next iteration. \nThe gradient method was the first learning algorithm proposed; it is easy to implement but has the following disadvantages: \n1. it converges slowly;   \n2. it finds only a local extremum. \nConjugate Gradient Method for the System NEFClass \nThe conjugate gradient algorithm, like the more general algorithm of conjugate directions, came into use in optimization because for a wide class of problems it converges to the optimal solution in a finite number of steps. 
It is described in [9] and is not repeated here. \n3.5.2 Genetic Method for Training System NEFClass \nConsider the implementation of a genetic algorithm to train NEFClass. This is a global optimization algorithm. It uses the following mechanisms [9]: \n1. crossing-over of parent pairs and generation of descendants;   \n2. mutation (random perturbation);   \n3. natural selection of the best individuals (selection). \nThe purpose of training is to minimize the mean square error: \nwhere $M$ is the number of classes, $t_k$ is the desired classification, $NET_k(W)$ is the classification result of NEFClass, $W = [W_I, W_O]$, $W_I = \\{w_{ij}^I\\}$ are the input weights and $W_O = \\{w_{ij}^O\\}$ are the output weights. Each individual (specimen) is described by the corresponding weight vector $W$. Set an initial population of $N$ individuals $[W_1(0), \\ldots, W_i(0), \\ldots, W_N(0)]$. Calculate the fitness index (FI), which evaluates the quality of recognition: \nwhere $C$ is a constant. \nThe next step is crossing of parental pairs. A probabilistic mechanism is used to select parents. Let $P_i$ be the probability of selecting the i-th parent:",
        "chapter": "3 Pattern Recognition in Big Data Analysis",
        "section": "3.5 Application of NEFClass in the Problem of Objects Recognition at Electro-Optical Images",
        "subsection": "3.5.1 Gradient Learning Algorithm for NEFClass",
        "subsubsection": "N/A"
    },
    {
        "content": "The gradient method was the first learning algorithm proposed; it is easy to implement but has the following disadvantages: \n1. it converges slowly;   \n2. it finds only a local extremum. \nConjugate Gradient Method for the System NEFClass \nThe conjugate gradient algorithm, like the more general algorithm of conjugate directions, came into use in optimization because for a wide class of problems it converges to the optimal solution in a finite number of steps. It is described in [9] and is not repeated here. \n3.5.2 Genetic Method for Training System NEFClass \nConsider the implementation of a genetic algorithm to train NEFClass. This is a global optimization algorithm. It uses the following mechanisms [9]: \n1. crossing-over of parent pairs and generation of descendants;   \n2. mutation (random perturbation);   \n3. natural selection of the best individuals (selection). \nThe purpose of training is to minimize the mean square error: \nwhere $M$ is the number of classes, $t_k$ is the desired classification, $NET_k(W)$ is the classification result of NEFClass, $W = [W_I, W_O]$, $W_I = \\{w_{ij}^I\\}$ are the input weights and $W_O = \\{w_{ij}^O\\}$ are the output weights. Each individual (specimen) is described by the corresponding weight vector $W$. Set an initial population of $N$ individuals $[W_1(0), \\ldots, W_i(0), \\ldots, W_N(0)]$. Calculate the fitness index (FI), which evaluates the quality of recognition: \nwhere $C$ is a constant. \nThe next step is crossing of parental pairs. A probabilistic mechanism is used to select parents. Let $P_i$ be the probability of selecting the i-th parent: \nThen the selected pairs are crossed. \nDifferent crossing mechanisms can be applied. 
For example, the first offspring takes the even components of the first parent's vector and the odd components of the other parent's vector, and the second offspring takes the opposite: \nwhere $W_i = [w_{ij}]_{j=1,\\ldots,R}$ and $m \\leq R/2$. \nChoose $N/2$ pairs of parents and generate $N$ descendants. \nAfter the offspring are generated, mutation acts on the new population: \nwhere $a = \\mathrm{const} \\in [-1, +1]$; \n$\\xi(n) = a e^{-\\alpha n}$, where $\\alpha$ is the decay rate of the mutation; \n$\\alpha$ is selected randomly from the interval [0, 1]. \nAfter mutation, a selection procedure is performed on the population to choose the “fittest” individuals. Different selection mechanisms may be used: \n1. complete replacement of the old population by the new one;   \n2. selection of the best $N$ of all $N_{par} + N_{ch}$ existing individuals (parents and children) by the criterion of maximum FI. \nCrossing, mutation and selection complete the current iteration. The iterations are repeated until one of the stopping criteria is fulfilled. \n3.5.3 Experiments on Objects Recognition on Optical Images \nFor image processing, the electro-optical imaging system ENVI was used, in particular its ability to map, that is, to combine the images of the check points obtained from the different spectral cameras [8]. This makes it possible to obtain a multispectral image. Fig. 3.2 shows the initial data for mapping.",
        "chapter": "3 Pattern Recognition in Big Data Analysis",
        "section": "3.5 Application of NEFClass in the Problem of Objects Recognition at Electro-Optical Images",
        "subsection": "3.5.2 Genetic Method for Training System NEFClass",
        "subsubsection": "N/A"
    },
    {
        "content": "Then the selected pairs are crossed. \nDifferent crossing mechanisms can be applied. For example, the first offspring takes the even components of the first parent's vector and the odd components of the other parent's vector, and the second offspring takes the opposite: \nwhere $W_i = [w_{ij}]_{j=1,\\ldots,R}$ and $m \\leq R/2$. \nChoose $N/2$ pairs of parents and generate $N$ descendants. \nAfter the offspring are generated, mutation acts on the new population: \nwhere $a = \\mathrm{const} \\in [-1, +1]$; \n$\\xi(n) = a e^{-\\alpha n}$, where $\\alpha$ is the decay rate of the mutation; \n$\\alpha$ is selected randomly from the interval [0, 1]. \nAfter mutation, a selection procedure is performed on the population to choose the “fittest” individuals. Different selection mechanisms may be used: \n1. complete replacement of the old population by the new one;   \n2. selection of the best $N$ of all $N_{par} + N_{ch}$ existing individuals (parents and children) by the criterion of maximum FI. \nCrossing, mutation and selection complete the current iteration. The iterations are repeated until one of the stopping criteria is fulfilled. \n3.5.3 Experiments on Objects Recognition on Optical Images \nFor image processing, the electro-optical imaging system ENVI was used, in particular its ability to map, that is, to combine the images of the check points obtained from the different spectral cameras [8]. This makes it possible to obtain a multispectral image. Fig. 3.2 shows the initial data for mapping. \n\nAfter 15 control points are selected in the images of the different spectra (this function is not automated), the images are merged into the so-called multispectral cube. The result is shown in Fig. 3.3. \nThe images contained nine different types of surface that had to be classified. For analysis and processing, so-called ROIs (Regions of Interest) were marked on the images. 
Homogeneous regions were determined on the image, for example sand, water, foam, red target, white target and so on. The result of this detection can be seen in Fig. 3.4. \nNext, the processing system was used to obtain the mean value and the variance of each selected region. The obtained data were then tabulated. \nThese data characterize nine classes of surface areas [8]: \n• white target; red target; green target; blue target; yellow target; foam; water; dry sand; wet sand. \nFor classification of the objects it was suggested to use the FNN NEFClass-M [8]. The nine surface types correspond to nine output nodes of NEFClass-M. \nFour features are used to classify the kinds of surfaces: \nthe brightness in the red spectrum (RS);   \nthe brightness in the blue spectrum (BS);   \nthe brightness in the green spectrum (GS);   \nthe brightness in the infrared spectrum (IS). \nThe data set contains 99 patterns in total, 11 for each class. \nThe main statistical characteristics of the data set obtained by the multispectral system «Mantis» are presented in Tables 3.1 and 3.2 [8]. \nTo explore the effectiveness of various learning algorithms for electro-optical image recognition with NEFClass, a software kit named NEFClass-BGCGG (Basic, Gradient, Conjugate Gradient, Genetic) was developed [8, 9]. \nFurther experiments were carried out with the NEFClass-BGCGG software kit. Following the basic principle of model investigation, only one parameter was changed in each experiment. Of the 99 available patterns, 54 served as the training sample and the other 45 were used for testing. The basic parameters of the simulation algorithm were set to their starting values (see Table 3.3): \nDuring training, 15 rules were generated; they are presented in Table 3.4. \nThe dependence of the training quality on the number of rules generated in the first stage was investigated. 
For an objective assessment, the trained system was evaluated on the test sample. For this purpose the number of rules was varied from 9 to 14. The results are shown in Table 3.5. \nThis result is natural: the more rules, the better the classification on the test sample. \nWe also investigated the effect of the number of terms per feature on the classification quality. A comparison is given in Table 3.6. \nA very interesting result was obtained in this series of experiments [8]. \nTable 3.6 shows that there is an optimal number of terms for describing the data during training. When the number of terms exceeds this value, the number of misclassified samples increases; that is, the error grows as the model becomes more complex. \nThe system was then trained with the classical algorithm using the optimal number of terms for each feature. The resulting membership functions for each feature are shown in Fig. 3.5. \nThe total sum of squared errors was 2.852081 and there were no misclassifications on the training set, while on the test sample the MSE was 4.6252, which is a reasonably good result. \nExperiments with the gradient algorithm. The results are shown in Fig. 3.6 (membership functions of the fuzzy sets for each of the four features). \nThe error at the end of training was 2.042015, slightly better than for the classical method. On testing, the MSE was 3.786005 and the share of misclassifications was 4%. \nNext, automatic adjustment of the step size for the MF parameters was enabled, that is, the “golden section” algorithm was used to optimize the step value. The results are shown in Fig. 3.7. \nThe same experiments were carried out with the conjugate gradient algorithm. The results are shown in Fig. 3.8. \nThen the golden section method was added to this training algorithm. The results can be seen in Fig. 3.9. 
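The gradient training and the golden-section step adjustment described above can be illustrated with a minimal Python sketch. It assumes Gaussian memberships exp(-((x - a)/b)^2) with centre a and width b, rule activation as the product of memberships (chosen because it is differentiable), a weighted-sum output layer and the criterion e = 1/2 * sum of (t_k - NET_k)^2 over all patterns. Unlike Section 3.5.1, which adjusts a, b and the output weights in separate steps with their own step sizes, this simplified sketch takes one batch step along the full antigradient and chooses its length by golden-section search. All function names and the toy data are hypothetical.

```python
import math


def forward(x, A, B, W):
    # Rule layer: product of Gaussians exp(-((x_i - a)/b)^2), i.e. exp of minus the sum.
    o = [math.exp(-sum(((xi - a) / b) ** 2 for xi, a, b in zip(x, Ar, Br)))
         for Ar, Br in zip(A, B)]
    # Output layer: NET_k = sum_r W[r][k] * o_r (weighted sum).
    net = [sum(o[r] * W[r][k] for r in range(len(o))) for k in range(len(W[0]))]
    return o, net


def loss(X, T, A, B, W):
    # Assumed criterion: 1/2 * sum over patterns and outputs of (t_k - NET_k)^2.
    total = 0.0
    for x, t in zip(X, T):
        net = forward(x, A, B, W)[1]
        total += 0.5 * sum((tk - nk) ** 2 for tk, nk in zip(t, net))
    return total


def gradient(X, T, A, B, W):
    # Analytic partial derivatives of the criterion w.r.t. a, b and W.
    gA = [[0.0] * len(r) for r in A]
    gB = [[0.0] * len(r) for r in B]
    gW = [[0.0] * len(r) for r in W]
    for x, t in zip(X, T):
        o, net = forward(x, A, B, W)
        err = [tk - nk for tk, nk in zip(t, net)]
        for r in range(len(A)):
            delta = o[r] * sum(err[k] * W[r][k] for k in range(len(err)))
            for i, xi in enumerate(x):
                a, b = A[r][i], B[r][i]
                gA[r][i] -= delta * 2 * (xi - a) / b ** 2
                gB[r][i] -= delta * 2 * (xi - a) ** 2 / b ** 3
            for k in range(len(err)):
                gW[r][k] -= err[k] * o[r]
    return gA, gB, gW


def step(P, G, gamma):
    # Move every parameter matrix in P = [A, B, W] against its gradient.
    return [[[p - gamma * g for p, g in zip(pr, gr)] for pr, gr in zip(Pm, Gm)]
            for Pm, Gm in zip(P, G)]


def golden_section(phi, lo, hi, tol=1e-4):
    # Minimise phi on [lo, hi], assuming it is unimodal there.
    r = (math.sqrt(5) - 1) / 2
    c, d = hi - r * (hi - lo), lo + r * (hi - lo)
    while hi - lo > tol:
        if phi(c) < phi(d):
            hi, d = d, c
            c = hi - r * (hi - lo)
        else:
            lo, c = c, d
            d = lo + r * (hi - lo)
    return (lo + hi) / 2


def train(X, T, A, B, W, epochs=50, gamma_max=1.0):
    P = [A, B, W]
    for _ in range(epochs):
        G = gradient(X, T, *P)

        def phi(g):
            return loss(X, T, *step(P, G, g))

        gamma = golden_section(phi, 0.0, gamma_max)
        if phi(gamma) < phi(0.0):  # accept the step only if it lowers the error
            P = step(P, G, gamma)
    return P


# Usage on toy two-class data (hypothetical values).
X = [[0.1, 0.2], [0.2, 0.1], [0.8, 0.9], [0.9, 0.8]]
T = [[1, 0], [1, 0], [0, 1], [0, 1]]
A = [[0.3, 0.3], [0.7, 0.7]]  # rule centres
B = [[0.3, 0.3], [0.3, 0.3]]  # rule widths
W = [[0.5, 0.5], [0.5, 0.5]]  # output weights
A2, B2, W2 = train(X, T, A, B, W)
print(loss(X, T, A, B, W), loss(X, T, A2, B2, W2))  # the second value is smaller
```

The acceptance test after the line search keeps the error non-increasing even when the unimodality assumption of the golden-section method does not hold along the search direction.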
\nFinally, experiments were carried out with a genetic algorithm using different MFs, triangular and Gaussian [8]. \nThe results of learning with the different algorithms are presented in the comparative charts (Fig. 3.10) and Table 3.7. Note that on the training sample all algorithms achieved excellent results by the criterion of the percentage of misclassification: for every algorithm this criterion is zero. However, the results on the test sample were worse: at least two samples were misclassified, and the sum of squared errors (MSE) increased for every learning algorithm without exception. For ease of comparison, the number of iterations (epochs) was limited to 50. \nAs can be seen, the results are satisfactory: the level of correct classification on the test sample is 96%. These results may be improved by forming a more representative sample. \nThe curves in Fig. 3.10 clearly show that the conjugate gradient method has the fastest convergence. It is followed by the genetic algorithm with Gaussian membership functions, then the gradient method, then the classical algorithm used in NEFClass; the least effective is the genetic method with triangular membership functions. \nHowever, the MSE criterion by which the curves were plotted reflects classification quality only ambiguously. An important criterion for evaluating the efficiency of the methods is the number of misclassified samples. Table 3.7 shows that all algorithms give the same results with respect to this criterion. \n3.6 Recognition of Images in Medical Diagnostics Using Fuzzy Neural Networks \nIntroduction \nAn important application area of pattern recognition systems is the classification of optical medical images and medical diagnostics, especially recognition of the state of human organ tissue and early detection of possible cancer. 
One such task is the analysis and diagnostics of the cervical epithelium state using optical images obtained with a colposcope (colposcopy is the examination of the mucous membrane of the cervix under additional lighting and optical magnification with a colposcope) [10]. As a result of colposcopy, the doctor obtains magnified images already divided into classes of diseases. The problem of classifying the cervical epithelium state using colposcope images was considered in [10, 11], where crisp Back Propagation neural networks, neural networks with radial basis functions (RBFNN) and cascade RBFNN were applied and their efficiency was investigated. The goal of this section is to investigate the fuzzy neural network NEFClass for recognizing the state of the cervical epithelium in medical diagnostics and to compare its efficiency with that of a conventional RBF network.",
        "chapter": "3 Pattern Recognition in Big Data Analysis",
        "section": "3.5 Application of NEFClass in the Problem of Objects Recognition at Electro-Optical Images",
        "subsection": "3.5.3 Experiments on Objects Recognition on Optical Images",
        "subsubsection": "N/A"
    },
    {
        "content": "3.6.1 Problem Statement \nThe problem consists in classifying medical images obtained with special medical tools: computed tomography, magnetic resonance tomography, colposcope, etc. \nIn medical images, the values of the RGB color model form the components of the input vector, and based on this information the class of the object must be determined. The classifier thus assigns an object to one of the classes according to a certain partition of the N-dimensional input space, whose dimension equals the number of vector components. \nTo solve the problem of cervical epithelium state analysis and diagnostics from optical images, the NEFClass network with Gaussian membership functions was suggested. \n3.6.2 Training of NEFClass System \nA NEFClass system can be constructed from partial knowledge about the samples. The user has to define the number of initial fuzzy sets for each object feature and set the value $k_{max}$, the maximum number of rule nodes that can be created in the hidden layer. Gaussian membership functions and a gradient algorithm for training the fuzzy sets are used. \nConsider the stages of the recognition process. \n1. Work with data. Construct a database of examples characteristic of the task. Split the whole data set into training and test sets in the following ratios: \ntraining 50%, test 50%;   \ntraining 60%, test 40%;   \ntraining 70%, test 30%;   \ntraining 80%, test 20%;",
        "chapter": "3 Pattern Recognition in Big Data Analysis",
        "section": "3.6 Recognition of Images in Medical Diagnostics Using Fuzzy Neural Networks",
        "subsection": "3.6.1 Problem Statement",
        "subsubsection": "N/A"
    },
    {
        "content": "3.6.1 Problem Statement \nThe problem consists in classifying medical images obtained with special medical tools: computed tomography, magnetic resonance tomography, colposcope, etc. \nIn medical images, the values of the RGB color model form the components of the input vector, and based on this information the class of the object must be determined. The classifier thus assigns an object to one of the classes according to a certain partition of the N-dimensional input space, whose dimension equals the number of vector components. \nTo solve the problem of cervical epithelium state analysis and diagnostics from optical images, the NEFClass network with Gaussian membership functions was suggested. \n3.6.2 Training of NEFClass System \nA NEFClass system can be constructed from partial knowledge about the samples. The user has to define the number of initial fuzzy sets for each object feature and set the value $k_{max}$, the maximum number of rule nodes that can be created in the hidden layer. Gaussian membership functions and a gradient algorithm for training the fuzzy sets are used. \nConsider the stages of the recognition process. \n1. Work with data. Construct a database of examples characteristic of the task. Split the whole data set into training and test sets in the following ratios: \ntraining 50%, test 50%;   \ntraining 60%, test 40%;   \ntraining 70%, test 30%;   \ntraining 80%, test 20%; \ntraining 90%, test 10%; \n2. Preliminary processing. Choose a system of features characteristic of the task and transform the data appropriately before feeding them to the network inputs. Ideally, the resulting sample space should be linearly separable. Medical images of the following benign processes are used as input data: \ninflammatory processes in the form of branching vessels;   \ncervical erosion;   \ntraumatic deformation;   \nlarge cervical ectropion;   
\nsmall cervical ectropion. \nEach of these diseases is represented by a number of features to be classified by the neural network; examples are shown in Figs. 3.11, 3.12, 3.13, 3.14, 3.15 and 3.16. \n3. Designing, training and assessing the quality of the network. At this stage the number of rules, the number of fuzzy sets and the percentage ratio of training and test samples are determined.   \n4. Choosing the network training algorithm. The gradient method was used as the training algorithm. At this stage it is necessary to specify the accuracy, the step sizes for all variables and the number of iterations.   \n5. Application and diagnosis. At the last stage we obtain the result of applying the NEFClass neural network to the medical diagnostics problem. We observe the decomposition of the images into the RGB color scheme and the class to which each sample originally belonged. We also obtain the recognition result, that is, the class to which the sample is assigned after training. The number of misclassifications and the average error on the sample are determined. \n3.6.3 Experimental Investigations \nThe experimental investigations were carried out on real images of the cervix uteri. The sample consisted of 70 elements covering 5 disease classes. In the experiments, the training/test sample ratio, the number of fuzzy sets of the linguistic variables and the number of rules were varied. \nThe classification results on the training and test samples for various training/test ratios and numbers of fuzzy sets are presented in Table 3.8 [12]. \nFigures 3.17, 3.18, 3.19 and 3.20 show the mean squared error and the misclassification percentage (MAPE) as functions of the training/test ratio for different numbers of fuzzy sets per variable (feature). \nThe next step was to determine how the results change when the number of rules is varied. For each number of fuzzy sets (3, 6, 7 and 11), a training/test sample ratio was used. 
It should be noted that there is a number of rules beyond which neither the classification of the samples nor the mean square error changes. The results are shown in Table 3.9 [12]. \nThe efficiency of the fuzzy neural network NEFClass was compared with that of an RBF neural network. The RBF results are shown in Table 3.10. \nConclusions \n1. The problem of recognizing objects in medical images for medical diagnostics is considered. The investigations were performed on images of the cervix uteri obtained with a colposcope: 70 images covering 5 disease classes were selected.   \n2. The fuzzy neural network NEFClass and the non-fuzzy RBF neural network were used for classification. Experiments were carried out with training/test ratios of 50/50, 60/40, 70/30, 80/20 and 90/10.",
        "chapter": "3 Pattern Recognition in Big Data Analysis",
        "section": "3.6 Recognition of Images in Medical Diagnostics Using Fuzzy Neural Networks",
        "subsection": "3.6.2 Training of NEFClass System",
        "subsubsection": "N/A"
    },
    {
        "content": "3.6.3 Experimental Investigations \nThe experimental investigations were carried out on real images of the cervix uteri. The sample consisted of 70 elements covering 5 disease classes. In the experiments, the training/test sample ratio, the number of fuzzy sets of the linguistic variables and the number of rules were varied. \nThe classification results on the training and test samples for various training/test ratios and numbers of fuzzy sets are presented in Table 3.8 [12]. \nFigures 3.17, 3.18, 3.19 and 3.20 show the mean squared error and the misclassification percentage (MAPE) as functions of the training/test ratio for different numbers of fuzzy sets per variable (feature). \nThe next step was to determine how the results change when the number of rules is varied. For each number of fuzzy sets (3, 6, 7 and 11), a training/test sample ratio was used. It should be noted that there is a number of rules beyond which neither the classification of the samples nor the mean square error changes. The results are shown in Table 3.9 [12]. \nThe efficiency of the fuzzy neural network NEFClass was compared with that of an RBF neural network. The RBF results are shown in Table 3.10. \nConclusions \n1. The problem of recognizing objects in medical images for medical diagnostics is considered. The investigations were performed on images of the cervix uteri obtained with a colposcope: 70 images covering 5 disease classes were selected.   \n2. The fuzzy neural network NEFClass and the non-fuzzy RBF neural network were used for classification. Experiments were carried out with training/test ratios of 50/50, 60/40, 70/30, 80/20 and 90/10. \nIn the experiments with NEFClass, the number of fuzzy sets was 3, 6, 7 and 11 and the number of rules was 50; the MSE (training and testing) was calculated for each sample. 
The best result was obtained for the 90/10 ratio: with 6 fuzzy sets all patterns were classified correctly, while with 11 sets 6 patterns were classified correctly and 1 incorrectly. The worst results were obtained with 3 and 7 sets. \n3. When the number of rules was varied, it was found that there is an optimal number of rules beyond which the recognition error on the sample does not change.   \n4. The experiments with the non-fuzzy RBF neural network showed that its best result was obtained for the 90/10 training/test ratio, with a classification error of 14.3%. The results of the fuzzy neural network proved to be much better than those of the RBFN. In addition, NEFClass allows the number of fuzzy sets and the number of rules to be changed. \n3.7 Medical Images of Breast Tumors Diagnostics with Application of Hybrid CNN–FNN Networks \n3.7.1 State-of-Art Problem Analysis \nIn medical diagnostics, a substantial part of the problem is extracting features for further processing and choosing a method for classifying them. With the development and wide dissemination of decision-support systems, the demands on training algorithms are increasing. Reliability and ease of use affect the speed and quality of decision-making, which is very important for express medical diagnostics. The advantages of medical diagnostic systems are speed, automation and stable operation, which make them very convenient tools for express medical diagnostics. Although medical informatics is young (no more than 30 years old), information technologies as a whole are rapidly penetrating various spheres of medicine and health care (family medicine, insurance medicine, building a unified information space, integration into the European medical space, etc.). \nNowadays, information technologies are used in practice at every stage of diagnostics. 
The main goals of automated medical systems are to extend the range of practical tasks that can be solved with computer aid and to raise the level of intelligent decision support for doctors, in particular in express diagnostics based on processing and analysis of medical images of human tissue obtained from different sources (MRT, CT, etc.). \nCancer is now a major health care problem all over the world. \nAccording to data of the IARC (International Agency for Research on Cancer), 8.2 million cancer deaths were registered in 2012, and 27 million new cases are expected by 2030 [13]. Among the different types of cancer, breast cancer ranks second in occurrence among women. Moreover, its mortality is very high compared with other cancers [14]. \nDespite the progress achieved in diagnostic technologies, the final diagnosis of breast cancer, including tumor classification, is still made by pathologists through visual analysis of histological patterns under a microscope. Recent achievements in image processing and machine learning make it possible to build computer-aided detection and diagnosis systems (CAD/CADx) that can help pathologists make a correct diagnosis and speed up their work. Classification of histopathology images into patterns corresponding to cancerous and non-cancerous tissue states is often the primary goal of image analysis systems for automatic cancer diagnostics. The main difficulty of such systems is that they deal with complex histopathological patterns.",
        "chapter": "3 Pattern Recognition in Big Data Analysis",
        "section": "3.6 Recognition of Images in Medical Diagnostics Using Fuzzy Neural Networks",
        "subsection": "3.6.3 Experimental Investigations",
        "subsubsection": "N/A"
    },
    {
        "content": "3. Varying the number of rules showed that there is an optimal number of rules beyond which the recognition error on the sample no longer changes.   \n4. Experiments with the non-fuzzy RBF neural network showed that its best result was obtained for the 90/10 training/test sample ratio, with a classification error of 14.3%. The fuzzy neural network performed much better than the RBFN. In addition, NefClass FNN allows the number of fuzzy sets and the number of rules to be changed. \n3.7 Medical Images of Breast Tumors Diagnostics with Application of Hybrid CNN–FNN Networks \n3.7.1 State-of-Art Problem Analysis \nIn medical diagnostics, a substantial part of the problem consists of extracting features for further processing and choosing a method to classify them. As decision-support systems develop and spread, the demands on their training algorithms increase. Reliability and ease of use affect the speed and quality of decision-making, which is very important for express medical diagnostics. Speed, automation and stable operation make medical diagnostic systems very convenient tools for express diagnostics. Although medical informatics is a young field, no more than 30 years old, information technologies as a whole are rapidly penetrating various areas of medicine and health care (family medicine, insurance medicine, building a unified information space, integration into the European medical space, etc.). \nToday, information technologies are used in practice at every stage of diagnostics. 
The main goals of automated medical systems are to extend the range of practical tasks that can be solved with the aid of computers and to raise the level of intelligent decision support for doctors, in particular during express diagnostics based on the processing and analysis of medical images of human tissue obtained from different sources (MRI, CT, etc.). \nCancer is now a major health care problem all over the world. \nAccording to the IARC (International Agency for Research on Cancer), 8.2 million cancer deaths were registered in 2012, and 27 million new cases are expected by 2030 [13]. Among the different types of cancer, breast cancer ranks second in occurrence in women, and its mortality is very high compared with other cancers [14]. \nDespite the progress achieved in diagnostic technologies, the final diagnosis of breast cancer, including tumor classification, is still made by pathologists through visual analysis of histological samples under a microscope. Recent advances in image processing and machine learning make it possible to build computer-aided detection and diagnosis systems (CAD/CADx) that can help pathologists make a correct diagnosis and speed up their work. Classifying histopathological images into patterns that correspond to cancerous and non-cancerous tissue states is often the primary goal of image analysis systems for automatic cancer diagnostics. The main difficulty for such systems is that they must deal with complex histopathological patterns. \nTo date, several models and methods have been developed for breast cancer detection using various machine learning algorithms. With AI methods such as neural networks and SVM, diagnostic accuracies from 76 to 94% were attained on a dataset of 92 images. \nZhang et al. [15] proposed a cascade approach. 
At the first level of the cascade, the classifiers filter out the easy cases (those that evidently do not pass the test), and the remaining cases are passed to the second level, which uses a more complex classification system, and so on. This method was applied to a database of 361 images from the Israel Institute of Technology and achieved an accuracy of 97%. \nMost recent papers on breast cancer classification focus on whole-slide images (WSI) [16–19]. However, the wide adoption of WSI and other forms of digital pathology faces obstacles such as the high cost of implementation, insufficient throughput for the large number of clinical procedures, intrinsic technological problems, unresolved regulatory questions and resistance from pathologists. Until now, most work on the histological analysis of breast cancer has been performed on small datasets. A dataset of 7909 breast images obtained from 82 patients [19] represents an improvement. In that study, the authors evaluated various texture descriptors and classifiers and achieved accuracies from 82 to 85%. \nThe results presented in [19] suggest that texture descriptors can offer a good solution for image processing. The alternative to texture descriptors, considered and developed in the present research, is the application of CNNs to medical image processing and diagnostics. CNNs have been shown to outperform conventional texture descriptors. Moreover, the traditional descriptor-based approach to feature detection demands considerable effort and a high level of expert knowledge, and it is usually specific to each task, which prevents its direct application to other similar tasks. \nThe CNN, first developed by LeCun [21], is now widely applied and achieves high results in various image recognition problems involving both microscopic and macroscopic textures. 
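\nThe cascade scheme attributed to Zhang et al. [15] above can be illustrated with a minimal, generic Python sketch. It is not the random-subspace ensemble of [15]: the stage functions cheap_stage and complex_stage, their decision rules and the 0.6 confidence threshold are hypothetical stand-ins. Every stage except the last decides only the cases it is confident about and defers the rest; the last, most complex stage always decides.

```python
from typing import Callable, List, Sequence, Tuple

# A stage maps a feature vector to (label, confidence in [0, 1]).
Stage = Callable[[Sequence[float]], Tuple[str, float]]


def cascade_predict(x: Sequence[float], stages: List[Stage],
                    thresholds: List[float]) -> Tuple[str, int]:
    # Early, cheap stages decide only confident (easy) cases and defer the rest
    for level, (stage, threshold) in enumerate(zip(stages[:-1], thresholds)):
        label, confidence = stage(x)
        if confidence >= threshold:
            return label, level
    # The last, most complex stage always decides
    label, _ = stages[-1](x)
    return label, len(stages) - 1


def cheap_stage(x: Sequence[float]) -> Tuple[str, float]:
    # Hypothetical first level: mean intensity only, unsure near 0.5
    m = sum(x) / len(x)
    return ('malignant' if m > 0.5 else 'benign'), abs(m - 0.5) * 2


def complex_stage(x: Sequence[float]) -> Tuple[str, float]:
    # Hypothetical stand-in for a slower, more accurate classifier
    score = 0.7 * max(x) + 0.3 * sum(x) / len(x)
    return ('malignant' if score > 0.5 else 'benign'), 1.0


stages = [cheap_stage, complex_stage]
print(cascade_predict([0.9, 0.95, 1.0], stages, [0.6]))  # ('malignant', 0)
print(cascade_predict([0.4, 0.6, 0.5], stages, [0.6]))   # ('malignant', 1)
```

Keeping the expensive stage away from easy cases lowers the average cost per image, while hard cases still reach the strongest classifier; the thresholds control this trade-off.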
\nThe experiments with the BreaKHis dataset presented in [19] show that a CNN achieves better results than the best results of other models trained with the alternative approach based on texture descriptors. However, the best results may be attained by combining a CNN with other models. \nTherefore, in our research we propose to use a CNN for feature detection in medical images of breast tissue and develop a hybrid CNN-FNN classification system, in which the CNN extracts informative image features and the FNN NEFClass classifies the detected tumors into two classes: benign and malignant. \nThe main goal of the following sections is to present and investigate algorithmic and software tools for fast analysis of breast tissue images, detection of tumors and their classification as benign or malignant. This will enable express analysis of images and raise the quality of medical diagnostics. \n3.7.2 Data Set Description \nThe BreaKHis dataset [19] contains microscopic biopsy images of benign and malignant breast tumors. The images were collected in a clinical study from January 2014 to December 2014. \nBreaKHis consists of 7909 clinically representative microscopic images of breast tumors obtained from 82 patients at different magnification factors (40×, 100×, 200× and 400×). \nAll patients examined at the R&D medical laboratory during this period with a clinical indication of breast cancer were invited to take part in the study. All data were anonymized. The samples were generated from breast biopsy slides stained with hematoxylin and eosin (HE). The samples were collected by surgical biopsy, prepared for histological study and labeled by pathologists of the R&D laboratory. The main goal of the preparation was to preserve the original tissue structure and molecular composition, so that the tissue can be observed with an optical microscope. 
For the study, the samples were cut into sections 3 μm thick. The final diagnosis of each case was made by experienced pathologists and confirmed by complementary tests such as immunohistochemistry (IHC). \nDigitized images of breast tissue were obtained with an Olympus BX-50 microscope with a 3.3× relay lens coupled to a Samsung SCC-131AN digital camera. The images were acquired in the three-channel RGB true-color space (24-bit color depth, 8 bits per color channel) at magnification factors of 40×, 100×, 200× and 400×. \nFigures 3.21, 3.22, 3.23 and 3.24 present four images at magnification factors of (a) 40×, (b) 100×, (c) 200× and (d) 400×, obtained from a single breast tumor slide containing a malignant tumor (breast cancer). The highlighted rectangle (added manually for illustration) marks the region of interest (ROI) selected by the pathologist.",
        "chapter": "3 Pattern Recognition in Big Data Analysis",
        "section": "3.7 Medical Images of Breast Tumors Diagnostics with Application of Hybrid CNN–FNN Networks",
        "subsection": "3.7.1 State-of-Art Problem Analysis",
        "subsubsection": "N/A"
    },
    {
        "content": "The main goal of the following sections is to present and investigate algorithmic and software tools for fast analysis of breast tissue images, detection of tumors and their classification as benign or malignant. This will enable express analysis of images and raise the quality of medical diagnostics. \n3.7.2 Data Set Description \nThe BreaKHis dataset [19] contains microscopic biopsy images of benign and malignant breast tumors. The images were collected in a clinical study from January 2014 to December 2014. \nBreaKHis consists of 7909 clinically representative microscopic images of breast tumors obtained from 82 patients at different magnification factors (40×, 100×, 200× and 400×). \nAll patients examined at the R&D medical laboratory during this period with a clinical indication of breast cancer were invited to take part in the study. All data were anonymized. The samples were generated from breast biopsy slides stained with hematoxylin and eosin (HE). The samples were collected by surgical biopsy, prepared for histological study and labeled by pathologists of the R&D laboratory. The main goal of the preparation was to preserve the original tissue structure and molecular composition, so that the tissue can be observed with an optical microscope. For the study, the samples were cut into sections 3 μm thick. The final diagnosis of each case was made by experienced pathologists and confirmed by complementary tests such as immunohistochemistry (IHC). \nDigitized images of breast tissue were obtained with an Olympus BX-50 microscope with a 3.3× relay lens coupled to a Samsung SCC-131AN digital camera. The images were acquired in the three-channel RGB true-color space (24-bit color depth, 8 bits per color channel) at magnification factors of 40×, 100×, 200× and 400×. \nFigures 3.21, 3.22, 3.23 and 3.24 present four images at magnification factors of (a) 40×, (b) 100×, (c) 200× and (d) 400×, obtained from a single breast tumor slide containing a malignant tumor (breast cancer). The highlighted rectangle (added manually for illustration) marks the region of interest (ROI) selected by the pathologist. 
\nTo date, the BreaKHis dataset consists of 7909 images divided into benign and malignant tumors (Fig. 3.25). \nTable 3.11 presents the distribution of images by class [19]. \n3.7.3 Convolutional Neural Networks Brief Description \nA CNN is a state-of-the-art method that has been widely used for image processing. A CNN model can extract global features hierarchically while exploiting local connectivity and weight sharing. It consists of the following layers [20, 21]. \n• Convolutional Layer: The convolutional layer is the main working component of a CNN model and largely determines its behavior. A kernel (filter), which is basically an n × n matrix, slides successively over all the pixels and extracts information from them.",
        "chapter": "3 Pattern Recognition in Big Data Analysis",
        "section": "3.7 Medical Images of Breast Tumors Diagnostics with Application of Hybrid CNN–FNN Networks",
        "subsection": "3.7.2 Data Set Description",
        "subsubsection": "N/A"
    },
    {
        "content": "To date, the BreaKHis dataset consists of 7909 images divided into benign and malignant tumors (Fig. 3.25). \nTable 3.11 presents the distribution of images by class [19]. \n3.7.3 Convolutional Neural Networks Brief Description \nA CNN is a state-of-the-art method that has been widely used for image processing. A CNN model can extract global features hierarchically while exploiting local connectivity and weight sharing. It consists of the following layers [20, 21]. \n• Convolutional Layer: The convolutional layer is the main working component of a CNN model and largely determines its behavior. A kernel (filter), which is basically an n × n matrix, slides successively over all the pixels and extracts information from them. \nStride and Padding: The stride determines the number of pixels a kernel moves in one step; conventionally, the stride is set to 1. Figure 3.26a shows a 5 × 5 input data matrix scanned with a 3 × 3 kernel. The light-green image shows the output with stride 1, and the green image shows the output with stride 2. With a 3 × 3 kernel and stride 1, the convolved output is a 3 × 3 matrix, whereas with stride 2 it is 2 × 2. Likewise, a 5 × 5 kernel applied to the same input with stride 1 produces a 1 × 1 output. Thus, the output size depends on both the stride and the kernel size. To overcome this issue, extra rows and columns containing zeros can be added at the borders of the matrix; adding such rows and columns of zero values is known as zero padding. \nFor example, Fig. 3.26b shows how two extra rows have been added to the original 5 × 5 matrix, one at the top and one at the bottom. 
Similarly, two extra columns have been added, one at the beginning and one at the end of the original 5 × 5 matrix. The olive-green image of Fig. 3.26b shows the convolved image obtained with a 3 × 3 kernel, stride 1 and zero padding of size 1. The convolved image is also a 5 × 5 matrix, the same size as the original data. Thus, by adding the proper amount of zero padding, we preserve the spatial size and reduce the loss of information at the border. \n• Nonlinearity: Without a nonlinear activation, each NN layer produces a linear output, and a composition of linear functions is again linear. Because of this, stacking more layers behaves the same as a single layer. To overcome this issue, nonlinear activation functions such as the Rectified Linear Unit (ReLU), Leaky ReLU, tanh and sigmoid have been introduced to make the output nonlinear. \n• Pooling Operation: A CNN model produces a large amount of feature information. To reduce the feature dimensionality, a down-sampling method called pooling is applied. Well-known pooling methods include \n– Max Pooling, \n– Average Pooling. \nFor our analysis, we used the Max Pooling operation, which selects the maximum value within each patch. \n• Drop-Out: An overtrained model performs very poorly on the test dataset; this is known as over-fitting. Over-fitting can be controlled by randomly removing some of the neurons from the network during training, which is known as Drop-Out (considered in detail in Chapter 2). \n• Decision Layer: A decision layer is introduced at the end of a CNN model to make the classification decision. Normally, a Softmax layer or an SVM layer is used for this purpose. The Softmax layer applies a normalized exponential function, and the loss function for data classification is calculated from its output. 
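\nThe layer operations listed above can be illustrated with a minimal NumPy sketch, assuming square single-channel inputs and kernels. Convolution is implemented as cross-correlation (no kernel flip), as is usual in CNNs; the output side length follows the standard relation (n + 2p − k) // s + 1 for input size n, kernel size k, stride s and zero padding p, which reproduces the 5 × 5 examples of Fig. 3.26. The decision step compares the per-class cross-entropy losses −ln p_d of the softmax output. All function names are illustrative and do not belong to any library.

```python
import numpy as np


def conv_output_size(n, k, stride=1, padding=0):
    # Output side length for an n x n input and a k x k kernel
    return (n + 2 * padding - k) // stride + 1


def conv2d(x, kernel, stride=1, padding=0):
    # Add zero rows/columns on every border, then slide the kernel
    # (cross-correlation, as in CNN layers; the kernel is not flipped)
    x = np.pad(x, padding, mode='constant', constant_values=0)
    k = kernel.shape[0]
    n_out = conv_output_size(x.shape[0], k, stride)
    out = np.empty((n_out, n_out))
    for i in range(n_out):
        for j in range(n_out):
            r, c = i * stride, j * stride
            out[i, j] = np.sum(x[r:r + k, c:c + k] * kernel)
    return out


def relu(x):
    # Rectified Linear Unit: negative values become zero
    return np.maximum(x, 0.0)


def max_pool2d(x, size=2, stride=2):
    # Keep the maximum of each size x size patch
    n_out = conv_output_size(x.shape[0], size, stride)
    out = np.empty((n_out, n_out))
    for i in range(n_out):
        for j in range(n_out):
            r, c = i * stride, j * stride
            out[i, j] = x[r:r + size, c:c + size].max()
    return out


def softmax(z):
    # Normalized exponential; subtracting the maximum avoids overflow
    e = np.exp(z - np.max(z))
    return e / e.sum()


def decide(logits, labels=('benign', 'malignant')):
    # Per-class cross-entropy loss L_d = -ln(p_d); the smaller loss wins,
    # which is the same as choosing the most probable class
    losses = -np.log(softmax(logits))
    return labels[int(np.argmin(losses))]


# Reproduce the output sizes discussed for Fig. 3.26
x = np.arange(25, dtype=float).reshape(5, 5)
print(conv2d(x, np.ones((3, 3))).shape)             # (3, 3)
print(conv2d(x, np.ones((3, 3)), stride=2).shape)   # (2, 2)
print(conv2d(x, np.ones((5, 5))).shape)             # (1, 1)
print(conv2d(x, np.ones((3, 3)), padding=1).shape)  # (5, 5)
print(decide(np.array([2.0, 0.5])))                 # benign
```

The explicit loops favour readability over speed; practical CNN libraries implement the same operations with vectorized or GPU kernels and also handle multiple channels and batches.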
\nFigure 3.27 shows the workflow of a generalized CNN model that can be used for image classification. A CNN model must have at least one dense layer immediately before the decision layer. \nUtilizing the Softmax layer, the output of the end layer for class d can be represented as \n$Ȳ_d = exp(Y_d) / Σ_{i=1}^{m} exp(Y_i)$, \nwhere $Y_d$ is the value of the dth neuron of the end layer, computed from the outputs $σ(k^{end-1})$ of the preceding layer. Here, $k^{end-1}$ represents the kth neuron at the $(end-1)$th layer, and σ represents the nonlinear function. For binary classification, the number of classes is $m = 2$. Let $d = 1$ represent the Benign class and $d = 2$ the Malignant class. The cross-entropy loss of $Ȳ_d$ can be calculated as \n$L_d = -ln(Ȳ_d)$. \nAs this is a two-class classification problem, only the values $L_1$ and $L_2$ are possible, and the output is benign when $L_1 < L_2$ and malignant otherwise. \n3.7.4 CNN Model for Image Classification \nFigure 3.28 presents the architecture of VGG16, which was used in our work as the detector of informative features. Three training algorithms were applied: stochastic gradient descent (SGD), basin hopping [22] and differential evolution. \nIn our research, the FNN NEFClass was proposed as the classifier of the extracted features. \nThe next section presents the classification results of the proposed hybrid CNN-NEFClass network and compares them with results obtained by other researchers who used SVM, Random Forest and other classification methods. \n3.7.5 Experimental Investigations and Results Analysis \nAs already mentioned, a pretrained CNN VGG16 was used in our investigation, and transfer learning was applied for this purpose. As its name suggests, transfer learning means transferring the knowledge obtained while training one CNN to another neural network that solves a similar or related problem.",
        "chapter": "3 Pattern Recognition in Big Data Analysis",
        "section": "3.7 Medical Images of Breast Tumors Diagnostics with Application of Hybrid CNN–FNN Networks",
        "subsection": "3.7.3 Convolutional Neural Networks Brief Description",
        "subsubsection": "N/A"
    },
    {
        "content": "As this is a two-class classification problem, only the values $L_1$ and $L_2$ are possible, and the output is benign when $L_1 < L_2$ and malignant otherwise. \n3.7.4 CNN Model for Image Classification \nFigure 3.28 presents the architecture of VGG16, which was used in our work as the detector of informative features. Three training algorithms were applied: stochastic gradient descent (SGD), basin hopping [22] and differential evolution. \nIn our research, the FNN NEFClass was proposed as the classifier of the extracted features. \nThe next section presents the classification results of the proposed hybrid CNN-NEFClass network and compares them with results obtained by other researchers who used SVM, Random Forest and other classification methods. \n3.7.5 Experimental Investigations and Results Analysis \nAs already mentioned, a pretrained CNN VGG16 was used in our investigation, and transfer learning was applied for this purpose. As its name suggests, transfer learning means transferring the knowledge obtained while training one CNN to another neural network that solves a similar or related problem.",
        "chapter": "3 Pattern Recognition in Big Data Analysis",
        "section": "3.7 Medical Images of Breast Tumors Diagnostics with Application of Hybrid CNN–FNN Networks",
        "subsection": "3.7.4 CNN Model for Image Classification",
        "subsubsection": "N/A"
    },
    {
        "content": "As this is a two-class classification problem, only the values $L_1$ and $L_2$ are possible, and the output is benign when $L_1 < L_2$ and malignant otherwise. \n3.7.4 CNN Model for Image Classification \nFigure 3.28 presents the architecture of VGG16, which was used in our work as the detector of informative features. Three training algorithms were applied: stochastic gradient descent (SGD), basin hopping [22] and differential evolution. \nIn our research, the FNN NEFClass was proposed as the classifier of the extracted features. \nThe next section presents the classification results of the proposed hybrid CNN-NEFClass network and compares them with results obtained by other researchers who used SVM, Random Forest and other classification methods. \n3.7.5 Experimental Investigations and Results Analysis \nAs already mentioned, a pretrained CNN VGG16 was used in our investigation, and transfer learning was applied for this purpose. As its name suggests, transfer learning means transferring the knowledge obtained while training one CNN to another neural network that solves a similar or related problem. \nThere are two main transfer learning scenarios: \n(1) Feature extraction. The last fully connected layer is removed, and the rest of the CNN is used as a feature extractor for the new dataset.   \n(2) Fine-tuning. The new dataset is used to further train (fine-tune) the previously pretrained neural network. \nIn our research, CNN VGG16 was used to extract features from medical images of breast tumors. The extracted features were then fed as input data to the FNN NEFClass described in the previous section. Three algorithms were used to train the FNN: basin hopping [22], stochastic gradient descent and differential evolution. 
\nA series of experiments was carried out, and the results were compared with the works of predecessors [24]. Tables 3.12 and 3.13 present the classification results for different parameters. The whole sample was divided into training and test subsamples in the ratio 80%/20%. \nFrom Table 3.12, one can readily see that beyond 6 fuzzy sets per variable and 6 rules the accuracy no longer increases, while the training complexity grows. \nAs follows from the table, the best parameter values for two classes are 4 fuzzy sets per variable and 6 rules. For comparison, consider the results of previous works obtained with different classifiers on the same problem [23] (see Table 3.13). \nIn the first experiment, we varied the number of linguistic terms and rules to determine the best parameter values [24]. As Table 3.13 shows, FNN NEFClass gives better results than the previous classifiers, SVM and Random Forest [23]. \nIn our work, three algorithms were applied to train FNN NEFClass: basin hopping, stochastic gradient descent and differential evolution. Basin hopping and stochastic gradient descent gave approximately equal results, which may indicate that they reached the true optimum, whereas differential evolution gave much worse training results. \nNote that in this problem the number of features extracted by CNN VGG16 was very large: 4096. It was therefore decided to reduce the number of features by the principal component method [25]. Table 3.14 presents the results of this reduction. \nTable 3.14 shows that reduction to 250 principal components is the most acceptable, since the training complexity grows approximately in proportion to the number of input features. \nDue to time constraints, the subsequent experiments were performed using only the images with the 100× magnification factor (2081 images). 
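\nThe two steps discussed above, reduction of the 4096 CNN features by principal components and derivative-free training of the classifier, can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: PCA is computed from the singular value decomposition of the centred training data; the classifier is a hypothetical simplified fuzzy classifier (one rule per class, one Gaussian fuzzy set per class and feature, min t-norm) rather than NEFClass itself, which learns its rule base and typically uses triangular fuzzy sets; and the optimizer is a plain DE/rand/1/bin differential evolution written from scratch. Basin hopping and stochastic gradient descent are not shown, and all function names and data are hypothetical.

```python
import numpy as np


def pca_fit(X, n_components):
    # Centre the data; rows of vt are the principal axes, largest variance first
    mean = X.mean(axis=0)
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    # At most min(samples, features) axes exist, so n_components is capped there
    return mean, vt[:n_components]


def pca_transform(X, mean, components):
    # Project the centred samples onto the retained principal axes
    return (X - mean) @ components.T


def rule_activations(X, centers, width=1.0):
    # Hypothetical fuzzy classifier: one rule per class, one Gaussian fuzzy set
    # per class and feature; the antecedents are combined with the min t-norm
    mu = np.exp(-((X[:, None, :] - centers[None, :, :]) / width) ** 2)
    return mu.min(axis=2)  # shape (samples, classes)


def fuzzy_loss(params, X, y, n_classes):
    # Cross-entropy of the normalised rule activations (smooth training target)
    act = rule_activations(X, params.reshape(n_classes, -1))
    p = act / (act.sum(axis=1, keepdims=True) + 1e-12)
    return -np.mean(np.log(p[np.arange(len(y)), y] + 1e-12))


def differential_evolution(f, bounds, pop_size=20, F=0.8, CR=0.9,
                           generations=100, seed=0):
    # Classic DE/rand/1/bin: no gradients, only evaluations of f
    rng = np.random.default_rng(seed)
    lo, hi = bounds[:, 0], bounds[:, 1]
    dim = len(bounds)
    pop = lo + rng.random((pop_size, dim)) * (hi - lo)
    fit = np.array([f(p) for p in pop])
    for _ in range(generations):
        for i in range(pop_size):
            others = [j for j in range(pop_size) if j != i]
            a, b, c = pop[rng.choice(others, 3, replace=False)]
            mutant = np.clip(a + F * (b - c), lo, hi)
            cross = rng.random(dim) < CR
            cross[rng.integers(dim)] = True  # take at least one gene from the mutant
            trial = np.where(cross, mutant, pop[i])
            f_trial = f(trial)
            if f_trial <= fit[i]:  # greedy one-to-one selection
                pop[i], fit[i] = trial, f_trial
    best = int(np.argmin(fit))
    return pop[best], fit[best]


# Usage on synthetic data only (not the BreaKHis features)
rng = np.random.default_rng(1)
features = rng.normal(size=(400, 4096))       # stand-in for VGG16 outputs
mean, comps = pca_fit(features[:320], 250)    # fit on the 80% training part
reduced = pca_transform(features, mean, comps)
print(reduced.shape)                          # (400, 250)

X2 = np.vstack([rng.normal(0.0, 0.5, (30, 2)), rng.normal(3.0, 0.5, (30, 2))])
y2 = np.repeat([0, 1], 30)
bounds = np.array([[-1.0, 4.0]] * 4)          # 2 classes x 2 feature centres
best, best_loss = differential_evolution(
    lambda p: fuzzy_loss(p, X2, y2, 2), bounds)
pred = rule_activations(X2, best.reshape(2, -1)).argmax(axis=1)
print('training accuracy:', (pred == y2).mean())
```

\nFitting the projection on the training part only and reusing it for the test part keeps the 80%/20% evaluation unbiased; the number of retained components cannot exceed the number of training samples, which is why the synthetic example uses 320 training rows for 250 components. \n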
Table 3.15 presents the classification accuracy for different parameters. \nTable 3.16 presents the classification accuracy versus the number of features. It shows that such feature reduction lowers the accuracy by only a few percent while substantially cutting the training time. \nIn particular, accuracy drops as the number of features decreases, but only insignificantly: by 3–5% when 100 and 250 features are compared. Compared with the full set of 4096 features, reducing the number of features about 20-fold lowers the accuracy by 2–3% on average. \nThis conclusion speaks in favour of using the principal component method (PCM) to reduce the dimension of medical image classification problems. \nConclusion \n1. The problem of analyzing medical images of breast tissue and classifying detected tumors into two classes, benign and malignant, is considered and discussed.   \n2. For pattern recognition of breast tumors, a hybrid CNN-FNN network is proposed, in which CNN VGG16 extracts informative features and FNN NEFClass classifies the detected tumors.   \n3. Basin hopping, stochastic gradient descent and differential evolution were proposed for training FNN NEFClass, and their efficiency was investigated.   \n4. The proposed hybrid CNN-FNN network was investigated experimentally on the classification of real breast tumor images from the BreaKHis dataset.   \n5. The classification accuracy of the proposed hybrid CNN-FNN network was compared with known works based on the SVM and Random Forest classifiers, which confirmed the efficiency of the proposed approach.   \n6. Reducing the number of features in medical image classification with the PCM was investigated, and its efficiency for big data (BD) classification problems was explored. \nReferences \n1. D. Nauck, R.
Kruse, Generating classification rules with the neuro-fuzzy system NEFCLASS, in Proceedings of the Biennial Conference of the North American Fuzzy Information Processing Society (NAFIPS’96), Berkeley (1996)   \n2. D. Nauck, R. Kruse, New learning strategies for NEFCLASS, in Proceedings of the Seventh International Fuzzy Systems Association World Congress IFSA’97, vol. IV (Academia Prague, 1997), pp. 50–55   \n3. D. Nauck, R. Kruse, What are neuro-fuzzy classifiers?, in Proceedings of the Seventh International Fuzzy Systems Association World Congress IFSA’97, vol. IV (Academia Prague, 1997), pp. 228–233   \n4. D. Nauck, Building neural fuzzy controllers with NEFCON-I, in Fuzzy Systems in Computer Science, Artificial Intelligence, ed. by Rudolf Kruse, Jorg Gebhardt, Rainer Palm (Vieweg, Wiesbaden, 1994), pp. 141–151   \n5. Yu.P. Zaychenko, F. Sevaee, A.V. Matsak, Fuzzy neural networks for economic data classification, in Vestnik of National Technical University of Ukraine “KPI”, section Informatic, Control and Computer Engineering, vol. 42 (2004), pp. 121–133 (in Russian)   \n6. Yu.P. Zaychenko, Fuzzy Models and Methods in Intellectual Systems (Publishing House “Slovo”, Kiev, 2008), 354 pp.   \n7. A. Krizhevsky, I. Sutskever, G.E. Hinton, ImageNet classification with deep convolutional neural networks, in Proceedings of 26th Annual Conference on Neural Information Processing Systems 2012 (NIPS), ed. by P.L. Bartlett, F.C.N. Pereira, C.J.C. Burges, L. Bottou, K.Q. Weinberger, Dec 2012, pp. 1106–1114, http://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks   \n8. Yu.P. Zaychenko, I.M. Petrosyuk, M.S. Jaroshenko, The investigations of fuzzy neural networks in the problems of electro-optical images recognition, in System Research and Information Technologies, no. 4 (2009), pp. 61–76 (in Russian)   \n9. M. Zgurovsky, Yu.
Zaychenko, The Fundamentals of Computational Intelligence: System Approach (Springer International Publishing AG, Switzerland, 2016), 308 pp.   \n10. E.N. Malyshevskaya, Analysis of the use of neural networks for the diagnosis of cervical cancer from multispectral images, in System Research and Information Technologies, no. 2 (2010), pp. 64–71 (in Russian)   \n11. K. Malyshevska, The analysis of neural networks’ performance for medical image classification. Int. J. Inf. Content Process. 1(2), 194–199 (2014)   \n12. Y. Zaychenko, V. Huskova, Recognition of objects on optical images in medical diagnostics using fuzzy neural network NEFClass. Int. J. Inf. Models Anal. 4(1), 13–22 (2015)   \n13. P. Boyle, B. Levin (eds.), World Cancer Report 2012 (IARC, Lyon, 2012), http://www.iarc.fr/en/publications/pdfs-online/wcr/2008/wcr_2012.pdf   \n14. S.R. Lakhani, I.O. Ellis, S. Schnitt, P. Tan, M. van de Vijver, WHO Classification of Tumours of the Breast, 4th edn. (WHO Press, Lyon, 2012)   \n15. Y. Zhang, B. Zhang, F. Coenen, W. Lu, Breast cancer diagnosis from biopsy images with highly reliable random subspace classifier ensembles. Mach. Vis. Appl. 24(7), 1405–1420 (2013)   \n16. Y. Zhang, B. Zhang, F. Coenen, J. Xiau, W. Lu, One-class kernel subspace ensemble for medical image classification. EURASIP J. Adv. Signal Process. 2014(17), 1–13 (2014)   \n17. S. Doyle, S. Agner, A. Madabhushi, M. Feldman, J. Tomaszewski, Automated grading of breast cancer histopathology using spectral clustering with textural and architectural image features, in Proceedings of the 5th IEEE International Symposium on Biomedical Imaging (ISBI): From Nano to Macro, vol. 61 (IEEE, 2008), pp. 496–499   \n18. A.J. Evans, E.A. Krupinski, R.S. Weinstein, L. Pantanowitz, 2014 American Telemedicine Association clinical guidelines for telepathology: another important step in support of increased adoption of telepathology for patient care. J. Pathol. Inform. 6 (2015)   \n19. F. Spanhol, L.S.
Oliveira, C. Petitjean, L. Heutte, A dataset for breast cancer histopathological image classification. IEEE Trans. Biomed. Eng. (2016)   \n20. A. Krizhevsky, I. Sutskever, G.E. Hinton, ImageNet classification with deep convolutional neural networks, in Advances in Neural Information Processing Systems, vol. 25 (2012), pp. 1097–1105   \n21. Y. LeCun, B. Boser, J.S. Denker, D. Henderson, R.E. Howard, W. Hubbard, L.D. Jackel, Backpropagation applied to handwritten zip code recognition. Neural Comput. 1(4), 541–551 (1989)   \n22. B. Olson, I. Hashmi, K. Molloy, A. Shehu, Basin hopping as a general and versatile optimization framework for the characterization of biological macromolecules. Adv. Artif. Intell. 2012, Article ID 674832 (2012)   \n23. A. Singh, H. Mansourifar, H. Bilgrami, N. Makkar, T. Shah, Classifying Biological Images Using Pre-trained CNNs, https://docs.google.com/document/d/1H7xVK7nwXcv11CYh7hl5F6pM0m218FQloAXQODP-Hsg/edit?usp=sharing   \n24. Yu. Zaychenko, G. Hamidov, I. Varga, Medical images of breast tumors diagnostics with application of hybrid CNN–FNN network, in System Analysis and Information Technologies, no. 4 (2018)   \n25. N. Jindal, V. Kumar, Enhanced face recognition algorithm using PCA with artificial neural networks. Int. J. Adv. Res. Comput. Sci. Softw. Eng. 3, 864–872 (2013)",
        "chapter": "3 Pattern Recognition in Big Data Analysis",
        "section": "3.7 Medical Images of Breast Tumors Diagnostics with Application of Hybrid CNN–FNN Networks",
        "subsection": "3.7.5 Experimental Investigations and Results Analysis",
        "subsubsection": "N/A"
    },
    {
        "content": "From this table, one can readily see that accuracy drops as the number of features decreases, but only insignificantly: by 3–5% when 100 and 250 features are compared. Compared with the full set of 4096 features, reducing the number of features about 20-fold lowers the accuracy by 2–3% on average. \nThis conclusion speaks in favour of using the principal component method (PCM) to reduce the dimension of medical image classification problems. \nConclusion \n1. The problem of analyzing medical images of breast tissue and classifying detected tumors into two classes, benign and malignant, is considered and discussed.   \n2. For pattern recognition of breast tumors, a hybrid CNN-FNN network is proposed, in which CNN VGG16 extracts informative features and FNN NEFClass classifies the detected tumors.   \n3. Basin hopping, stochastic gradient descent and differential evolution were proposed for training FNN NEFClass, and their efficiency was investigated.   \n4. The proposed hybrid CNN-FNN network was investigated experimentally on the classification of real breast tumor images from the BreaKHis dataset.   \n5. The classification accuracy of the proposed hybrid CNN-FNN network was compared with known works based on the SVM and Random Forest classifiers, which confirmed the efficiency of the proposed approach.   \n6. Reducing the number of features in medical image classification with the PCM was investigated, and its efficiency for big data (BD) classification problems was explored. \nReferences \n1. D. Nauck, R. Kruse, Generating classification rules with the neuro-fuzzy system NEFCLASS, in Proceedings of the Biennial Conference of the North American Fuzzy Information Processing Society (NAFIPS’96), Berkeley (1996)   \n2. D. Nauck, R.
Kruse, New learning strategies for NEFCLASS, in Proceedings of the Seventh International Fuzzy Systems Association World Congress IFSA’97, vol. IV (Academia Prague, 1997), pp. 50–55   \n3. D. Nauck, R. Kruse, What are neuro-fuzzy classifiers?, in Proceedings of the Seventh International Fuzzy Systems Association World Congress IFSA’97, vol. IV (Academia Prague, 1997), pp. 228–233   \n4. D. Nauck, Building neural fuzzy controllers with NEFCON-I, in Fuzzy Systems in Computer Science, Artificial Intelligence, ed. by Rudolf Kruse, Jorg Gebhardt, Rainer Palm (Vieweg, Wiesbaden, 1994), pp. 141–151   \n5. Yu.P. Zaychenko, F. Sevaee, A.V. Matsak, Fuzzy neural networks for economic data classification, in Vestnik of National Technical University of Ukraine “KPI”, section Informatic, Control and Computer Engineering, vol. 42 (2004), pp. 121–133 (in Russian)   \n6. Yu.P. Zaychenko, Fuzzy Models and Methods in Intellectual Systems (Kiev-Publishing House “Slovo”, 2008) 354 pp.   \n7. A. Krizhevsky, I. Sutskever, G.E. Hinton, ImageNet classification with deep convolutional neural networks, in Proceedings of 26th Annual Conference on Neural Information Processing Systems 2012 (NIPS), ed. by P.L. Bartlett, F.C.N. Pereira, C.J.C. Burges, L. Bottou, K.Q. Weinberger, Dec 2012, pp. 1106–1114, http://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks   \n8. Yu.P. Zaychenko, I.M. Petrosyuk, M.S. Jaroshenko, The investigations of fuzzy neural networks in the problems of electro-optical images recognition, in System Research and Information Technologies № 4 (2009), pp. 61–76 (in Russian)   \n9. M. Zgurovsky, Yu. Zaychenko, The Fundamentals of Computational Intelligence: System Approach (Springer International Publishing AG, Switzerland, 2016), 308 pp.   \n10. E.N. Malyshevskaya, Analysis of the use of neural networks for the diagnosis of cervical cancer from multispectral images, in System Research and Information Technologies № 2 (2010), pp. 64–71 (in Russian)   \n11. K. Malyshevska, The analysis of neural networks’ performance for medical image classification. Int. J. Inf. Content Process. 1(2), 194–199 (2014)   \n12. Y. Zaychenko, V. Huskova, Recognition of objects on optical images in medical diagnostics using fuzzy neural network NEFClass. Int. J. Inf. Models Anal. 4(1), 13–22 (2015)   \n13. P. Boyle, B. Levin (eds.), World Cancer Report 2012 (IARC, Lyon, 2012), http://www.iarc.fr/en/publications/pdfs-online/wcr/2008/wcr_2012.pdf   \n14. S.R. Lakhani, I.O. Ellis, S. Schnitt, P. Tan, M. van de Vijver, WHO Classification of Tumours of the Breast, 4th edn. (WHO Press, Lyon, 2012)   \n15. Y. Zhang, B. Zhang, F. Coenen, W. Lu, Breast cancer diagnosis from biopsy images with highly reliable random subspace classifier ensembles. Mach. Vis. Appl. 24(7), 1405–1420 (2013)   \n16. Y. Zhang, B. Zhang, F. Coenen, J. Xiau, W. Lu, One-class kernel subspace ensemble for medical image classification. EURASIP J. Adv. Signal Process. 2014(17), 1–13 (2014)   \n17. S. Doyle, S. Agner, A. Madabhushi, M. Feldman, J. Tomaszewski, Automated grading of breast cancer histopathology using spectral clustering with textural and architectural image features, in Proceedings of the 5th IEEE International Symposium on Biomedical Imaging (ISBI): From Nano to Macro, vol. 61 (IEEE, 2008), pp. 496–499   \n18. A.J. Evans, E.A. Krupinski, R.S. Weinstein, L. Pantanowitz, 2014 American Telemedicine Association clinical guidelines for telepathology: another important step in support of increased adoption of telepathology for patient care. J. Pathol. Inform. 6 (2015)   \n19. F. Spanhol, L.S. Oliveira, C. Petitjean, L. Heutte, A dataset for breast cancer histopathological image classification. IEEE Trans. Biomed. Eng. (2016)   \n20. A. Krizhevsky, I. Sutskever, G.E. Hinton, ImageNet classification with deep convolutional neural networks, in Advances in Neural Information Processing Systems, vol. 25 (2012), pp. 1097–1105   \n21. Y.
LeCun, B. Boser, J.S. Denker, D. Henderson, R.E. Howard, W. Hubbard, L.D. Jackel, Backpropagation applied to handwritten zip code recognition. Neural Comput. 1(4), 541–551 (1989)   \n22. B. Olson, I. Hashmi, K. Molloy, A. Shehu, Basin hopping as a general and versatile optimization framework for the characterization of biological macromolecules. Adv. Artif. Intell. 2012(Article ID 674832) (2012)   \n23. A. Singh, H. Mansourifar, H. Bilgrami, N. Makkar, T. Shah, Classifying Biological Images Using Pre-trained CNNs, https://docs.google.com/document/d/1H7xVK7nwXcv11CYh7hl5F6pM0m218FQloAXQODP-Hsg/edit?usp=sharing   \n24. Yu. Zaychenko, G. Hamidov, I. Varga, Medical images of breast tumors diagnostics with application of hybrid CNN–FNN network in System Analysis and Information Technologies, № 4 (2018)   \n25. N. Jindal, V. Kumar, Enhanced face recognition algorithm using PCA with artificial neural networks. Int. J. Adv. Res. Comput. Sci. Softw. Eng. 3, 864–872 (2013)",
        "chapter": "3 Pattern Recognition in Big Data Analysis",
        "section": "References",
        "subsection": "N/A",
        "subsubsection": "N/A"
    },
    {
        "content": "Chapter 4 Intellectual Analysis of Systemic World Conflicts and Global Forecast for the 21st Century \n4.1 Introduction \nData on global conflicts that have taken place from 750 B.C. to the present are analyzed, and their general pattern is revealed. An attempt is made to foresee the next global conflict, called the conflict of the 21st century, and its nature and main characteristics are analyzed. The main global threats are listed, and their impact on five groups of countries is determined using cluster analysis. \nConsidering the evolutionary development of civilization as a holistic process determined by the harmonious interaction of its components, the patterns of Kondratieff cycles in the development of the global economy and of C-waves of global systemic conflicts are compared, and an attempt is made to predict these processes in the 21st century using a metric approach. \nThe next part of the study is based on the Fibonacci pattern of global systemic conflicts (C-waves), which allowed us to formulate hypotheses about a metric relation between two global periodic processes: the sequence of 11-year cycles of solar activity and the evolutionary structurization of the family of C-waves of global systemic conflicts, which cover large and super-large time intervals and have a variable structural configuration. \nThe structural analysis of $C_n$-waves of global systemic conflicts is performed on the basis of their empirical sequence, and metric approaches are proposed to study and forecast these processes. Global systemic conflicts and the great Kondratieff waves of the development of the world economy are proved to correspond to a number of additional conditions, namely, to the modern concept of the acceleration of historical time, to the law of structural harmony, and to global forecasts for the 21st century.
\nBayesian belief networks are used to establish qualitative causal relations between global threats and indicators of sustainable development. A method of belief network synthesis and a method of generalizing the final results are proposed. This made it possible to obtain a holistic understanding of the effects of global threats on the sustainable development of countries and regions of the world. \n\nApproaches to recognizing C-waves of global systemic conflicts from big historical data are generalized and formalized, and a general concept for describing and interpreting these waves is proposed. Special attention is paid to the class of big C-waves, which cover super-long time intervals and whose pattern is invariant to the evolution of the nature of global conflicts. An attempt is also made to predict these processes in the 21st century using a metric approach. Possible scenarios for the development of the conflict of the 21st century are constructed and analyzed.",
        "chapter": "4 Intellectual Analysis of Systemic World Conflicts and Global Forecast for the 21st Century",
        "section": "4.1 Introduction",
        "subsection": "N/A",
        "subsubsection": "N/A"
    },
    {
        "content": "4.2 Identifying the Regularity of the Emergence of Systemic World Conflicts, Based on the Analysis of Big Historical Data \nAn analysis of the complete list of global conflicts [1] that have occurred from 2500 B.C. to the present shows that, until the 7th century B.C., these conflicts did not follow any regular pattern and resembled a random process such as white noise. This is confirmed by historical evidence that constant conflict was a natural form of existence at the early stages of the development of human civilization. A periodic pattern in the series of global conflicts can be revealed only after higher forms of social organization appear. This periodic pattern was revealed and studied in [2]. With this pattern, it becomes possible to foresee the next systemic conflict, to analyze the set of threats giving rise to it, to determine the effect of these threats on its course, and to construct scenarios of the possible development of society during and after the conflict. Pessimistic predictions are necessary from the scientific point of view in order to avoid possible negative outcomes, although naturally everyone would prefer such predictions to be wrong. \n4.2.1 Fibonacci Pattern of the Emergence of Systemic World Conflicts \nThe paper [1] analyzes the series of global conflicts over a period from 705 B.C.
to the present with the following time quantization: \nThe number of global conflicts for each quantization interval $Δ_n$ has been determined as the arithmetic mean of the number of all conflicts in this interval. For $Δ_9 = 5$ years ($n_{min} = 9$), six consecutive evolutionary groups (waves) $C_n$, $n = 1, 2, …, 6$, of global conflicts ($C_n$-waves) became apparent (Fig. 4.1). \n\n(Let $C_7$ be the predicted wave, the essence of which will be revealed later.) \nThese waves are periodic (Table 4.1) and have the following characteristic features: \n(i) The life of each $C_n$-wave consists of five sequential evolutionary phases (stages) $C_{n,i}$, $i = 1, …, 5$: $C_{n,1}$ (origin) → $C_{n,2}$ (growth) → $C_{n,3}$ (culmination) → $C_{n,4}$ (decrease) → $C_{n,5}$ (decay).   \n(ii) The life duration $T(C_n)$ of each subsequent $C_n$-wave is uniquely determined by the life durations of the two previous waves, namely, $T(C_n) = T(C_{n-2}) - T(C_{n-1})$.   \n(iii) The conflict intensity $I(C_n) = N(C_n)/T(C_n)$ of the $C_n$-waves, $n = 1, 2, …, 6$, where $N(C_n)$ is the number of conflicts that form the $C_n$-wave, increases, $I(C_{n+1}) > I(C_n)$, owing to the technological progress of mankind. \nGlobal conflicts defined by these features are called $C_n$-waves of global systemic conflicts, or simply $C_n$-waves. As we see, six $C_n$-waves can be identified over the period from 705 B.C. to the present. \nTable 4.2 presents the ratios of the durations of consecutive waves, which vary around the golden section (1.618). \nLet us represent the sequence $T(C_n)$, $n = 1, …, 7$ (Table 4.2) as a series \nwhere $k_c = 85$ years is the greatest common divisor of all the life durations $T(C_n)$: \nThe number series: \nis a sequence of Fibonacci numbers, where $1^*$ pertains to the predicted wave $C_7$. \nThe conflict intensity $I^*(C_n)$ depends on the level of technological progress of society and increases hyperbolically in time (Fig. 4.2): \nwhence the intensity of the seventh (predicted) wave follows: \nSince the six members of the sequence $T(C_1), …, T(C_6)$ obey the law of variation of elements in the Fibonacci series, the paper [2] advances the hypothesis that it is this pattern that describes the course of global systemic conflicts. Hence, the seventh (predicted) element of the sequence should be $T(C_7) = T(C_5) - T(C_6) = 1 · k_c ≈ 85$ years. We will call this seventh wave of global systemic conflicts ($C_7$) the conflict of the 21st century. It covers the time range 2010–2096 with the following probable phases: \n2010s (origin);   \nbeginning of the 2020s to the end of the 2040s (growth);   \n2050s (culmination, $I^*(C_7) > 16$);   \nbeginning of the 2060s to the end of the 2070s (decrease);   \n2080s (decay). \nThus, the revealed patterns describe the course of global systemic conflicts in terms of the durations $T(C_n)$ of these conflicts, their intensity $I^*(C_n)$, and Fibonacci numbers $F_s$ [2]. 
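Feature (ii) and the Fibonacci representation of the wave durations can be checked with a short Python sketch. Table 4.2 is not reproduced in this text, so the durations below are an assumption made only for illustration: each $T(C_n)$ is taken as $k_c = 85$ years times a Fibonacci number, in descending order and ending in 1, 1*, which agrees with $T(C_7) = T(C_5) - T(C_6) = 1 · k_c$ but is not the authors' data. The function names are hypothetical.

```python
# Sketch of the Fibonacci pattern of C_n-wave durations.
# ASSUMPTION: durations are k_c times descending Fibonacci numbers
# (..., 3, 2, 1, 1*); these are illustrative values, not Table 4.2.

K_C = 85  # years, greatest common divisor of the wave durations


def next_duration(t_prev2: int, t_prev1: int) -> int:
    # Feature (ii): T(C_n) = T(C_{n-2}) - T(C_{n-1})
    return t_prev2 - t_prev1


def assumed_durations(k_c: int, count: int) -> list[int]:
    # Build the Fibonacci numbers 1, 1, 2, 3, ... and reverse them, so the
    # earliest wave is the longest and the last two entries are 1, 1*.
    fib = [1, 1]
    while len(fib) < count:
        fib.append(fib[-1] + fib[-2])
    return [k_c * f for f in reversed(fib[:count])]


durations = assumed_durations(K_C, 7)  # C_1 ... C_7
print(durations)  # [1105, 680, 425, 255, 170, 85, 85]

# The recurrence reproduces the predicted seventh wave from C_5 and C_6.
print(next_duration(durations[4], durations[5]))  # 85

# Ratios T(C_n)/T(C_{n+1}) for the six observed waves.
ratios = [round(a / b, 3) for a, b in zip(durations, durations[1:6])]
print(ratios)  # [1.625, 1.6, 1.667, 1.5, 2.0]
```

With these assumed values the recurrence returns 85 years for the seventh wave, and the consecutive ratios oscillate around 1.618; the actual ratios should be read from Table 4.2.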
",
        "chapter": "4 Intellectual Analysis of Systemic World Conflicts and Global Forecast for the 21st Century",
        "section": "4.2 Identifying the Regularity of the Emergence of Systemic World Conflicts, Based on the Analysis of Big Historical Data",
        "subsection": "4.2.1 Fibonacci Pattern of the Emergence of Systemic World Conflicts",
        "subsubsection": "N/A"
    },
    {
        "content": "Thus, the revealed patterns describe the course of global systemic conflicts in terms of the durations $T(C_n)$ of these conflicts, their intensity $I^*(C_n)$, and Fibonacci numbers $F_s$ [2]. \n4.2.2 Conflict of the 21st Century and Analysis of Its Nature \nLet us take as the top 12 global threats to sustainable development those identified at the beginning of the 21st century by such recognized international organizations as the United Nations Organization (UNO), the World Health Organization (WHO), the World Economic Forum, Transparency International, the Global Footprint Network, the International Energy Agency, the World Resources Institute, the British Petroleum company, and others. Analyzing each threat makes it possible to determine the vulnerability of different countries of the world to the influence of these aggregated threats. Let us analyze each of the 12 global threats separately. \nThreat 1. Global Decrease in Energy Security (ES) \nIn the first half of the 21st century, one of the main critical challenges for mankind is the rapid depletion of the fossil fuel resources extracted from the Earth’s interior, together with the growing consumption of these resources, first of all by large developing countries. In the early 2030s, the consumption and production curves of oil-based energy will cross [3]. In other words, the production–consumption balance of oil-based energy will change from positive to negative. Similar crossovers will occur in the production–consumption balances of gas-based energy in the early 2040s and of energy generated from uranium-235 in the 2050s–2060s (Fig. 4.3). 
\nThus, until mankind invents energy sources that can fully replace fossil fuels and nuclear energy, the energy security of individual countries and of the world as a whole will decrease. To estimate quantitatively the energy security of different countries of the world, let us introduce the Energy Security Index (ES), calculated by the formula: \nwhere: \n• $ES ∈ [0; 1]$; {countries} is the set of countries studied;   \n• Exhaustables is the component that characterizes the dynamics of resource depletion;   \n• Renewables is the component that characterizes the use of renewable sources in the national energy sector;   \n• NuclearR, CoalR, OilR, GasR are the resources of uranium-235, coal, oil, and gas (Nation Master, n.d.);   \n• Renewables Used is the share of renewable energy produced and consumed by the country (from water, sun, wind, geothermal heat, biomass, and waste incineration), as a percentage of total energy consumption [4]. To evaluate the reduction in the fossil fuel reserves of various countries in the subsequent simulation, we use the indicator “Consumption of traditional fuels as a percentage of the total energy needs of the country” [4]. \nThreat 2. The Imbalance Between Biological Capacity of the Earth and Human Needs in Biosphere (FB) \nIn early 2018, the world’s population reached 7.6 billion people living on a total area of 510,072,000 km². The daily growth of the Earth’s population exceeds 162 thousand people [5]. According to arithmetic extrapolation, the Earth’s population will reach 9.75 billion people by 2050. Hence this threat is related to the fact that the Earth will be inhabited by more people than it can sustain on the basis of its present natural resources.
Japanese experts believe that the real problems for mankind will be connected with catastrophic shortages of water, energy, and food, which can cause new conflicts on Earth [6]. \nNature can satisfy human requirements for economic activity only as long as this activity remains within the renewable capacity of the biosphere in the populated part of the planet. Calculating the ecologically disturbed area (Ecological Footprint) [5] makes it possible to establish a limit showing whether the ecological demands of the world economy remain within or exceed the ability of the biosphere (Biocapacity) to supply people with goods and services. This limit helps people, organizations, and governments to create strategies, set goals, and monitor progress toward the requirements of sustainable development. \n\nThe ecologically disturbed territory (Ecological Footprint) is the area needed to sustain the present population at the current level of consumption, technological development, and efficiency of natural resource use. Its unit of measurement is the global hectare, i.e. a hectare with the world-average biological productivity. The most substantial components of the Ecological Footprint are the land used for food production, forest area, biofuel, the ocean (sea) area used for fishing, and, most importantly, the land area needed to support the plants that absorb the CO₂ emitted by burning fossil fuels. \nThe Ecological Footprint takes into account that in the world economy people use resources and ecological services from all over the world. Thus, the indicator for a country may exceed its actual biological capacity. Accordingly, the Ecological Footprint of a country expresses the extent of its consumption and its global impact on the environment. 
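The comparison between the Ecological Footprint and Biocapacity described above reduces, per capita, to a difference whose sign gives an ecological reserve or deficit. A minimal Python sketch follows, using the 2017 global averages quoted below (biocapacity 1.8 and footprint 2.3 global hectares per capita); the function name is hypothetical, and a country-level calculation would need that country’s own per-capita figures.

```python
def ecological_balance(biocapacity_gha: float, footprint_gha: float) -> float:
    # Reserve (> 0) or deficit (< 0) in global hectares per capita;
    # rounded to two decimals to absorb floating-point noise.
    return round(biocapacity_gha - footprint_gha, 2)


balance = ecological_balance(1.8, 2.3)
status = 'reserve' if balance >= 0 else 'deficit'
print(balance, status)  # -0.5 deficit
```

The sign convention matches the reserve (+) or deficit (−) indicator used for this threat.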
\nThe same methodology can be used to calculate, in the same units, the biological capacity of the Earth, i.e. the biological productivity of its territory. In 2017 the biological capacity of the Earth was approximately 11.2 billion global hectares, or 1.8 global hectares per capita (non-human species were not considered). Today the human demand on the biosphere, i.e. the global Ecological Footprint, is 18.1 billion global hectares, or 2.3 global hectares per capita. Hence the global Ecological Footprint now exceeds the biological capacity of the Earth by 0.5 global hectares per capita. This means that the vital resources of the planet are being depleted faster than nature can renew them (Fig. 4.4). \nThis threat is strongly correlated with changes in the demographic structure of the planet’s population. For example, according to the Human Development Report 2017, the largest population growth over the next 50 years is expected in the poorest regions of the world: the population of Africa will double, that of Latin America and the Caribbean will grow 1.5-fold, while that of Europe will decrease to 0.8 of its present level [4]. Another essential threat is the uncontrolled growth of the urban population in underdeveloped countries. By 2050 it will have doubled, approaching 10 billion people. This will intensify transport, ecological, and social problems and lead to increased crime and other consequences of chaotic urbanization. \nAn important tendency of the coming decades is a rapid change in the structure of religious groups of the Earth’s population. Thus, from 1980 to 2015 the share of Muslims was expected to increase from 16.5% to 30%, the share of Christians to decrease from 33.3% to 31%, the share of Hindus to decrease from 13.3% to 10%, and the share of Buddhists to decrease from 6.3% to 5%. The share of representatives of other religious groups was also expected to decrease, from 31.1% to 25% (Japan Vision 2050.
Principles of Strategic Science and Technology Policy Toward 2020. Science Council of Japan, 2005). These changes will make it necessary to seek new forms of tolerant coexistence of people on Earth. \nTo estimate the growing threats connected with the imbalance between the biological capacity of the Earth and human demand on the biosphere, in view of the changing demographic structure of the world, we use the indicator of ecological reserve (“+”) or deficit (“−”) in global hectares per capita for each country (Global Footprint Network, http://www.footprintnetwork.org/en/index.php/GFN/) [6]. \nThreat 3. Growing Inequality Between People and Countries on the Earth (GINI) \nAccording to the World Bank, the income gap between the richest and poorest countries was 44:1 in 1973 and is now 72:1. The three richest persons own capital that exceeds the property of 47 poor countries, and the 475 richest people hold capital that exceeds the property of half of mankind. The ratio between the richest fifth of the world population and the poorest fifth has reached 1:75. \nThe benefits of civilization remain inaccessible to the poorest group, whose members live on less than two dollars a day: 700 million of them live in Asia, 400 million in Africa, and 150 million in Latin America. The gap in the standard of living between the richest and poorest groups has increased almost tenfold over the last twenty years. This threat is particularly hazardous with respect to the increasing number of conflicts in the world, the growth of corruption, terrorism, and crime, environmental degradation, and the deterioration of education and medical services. \nTo assess quantitatively the inequality in the distribution of economic and social benefits for each of the countries under study, we use the Gini index [7], which reflects these characteristics. \nThreat 4.
The Spread of Global Diseases (GD) \nThe World Health Organization considers such diseases as cancer, ischemic heart disease, cerebrovascular disease (stroke), respiratory diseases, diarrhea, AIDS, tuberculosis, malaria, and diabetes to be the most dangerous for mankind, as they not only have severe consequences but can also spread globally [8]. \n\nOver the next 20 years, a significant increase in mortality from all noninfectious global diseases and a decrease in mortality from AIDS, tuberculosis, and malaria are expected (Fig. 4.5). Ischemic heart disease, cerebrovascular disease, lung cancer, and diabetes will become the main global diseases during this period. At the same time, total mortality from tobacco consumption will increase from 5.8 million people in 2009 to 8.3 million in 2030. Thus, tobacco is expected to kill 50% more people than AIDS, and tobacco consumption will account for 10% of total human mortality on Earth. \nAccording to UNAIDS, the number of HIV-infected people on Earth increased from 36.9 million in 2004 to 45 million in 2015. This general tendency (with minor oscillations) is observed in all regions of the world. \nDespite progress in tuberculosis control, eight million new cases occur annually in the world, causing two million deaths. In countries with a high prevalence of HIV/AIDS, the number of tuberculosis cases has increased 3–4 times over the last 15 years (www.who.int/gb), 80% of them in Africa, South-East Asia, and the Western Pacific. \nMalaria is traditionally most widespread in Africa and Latin America. Over the last five years, morbidity has increased 2–3 times in Afghanistan, Ghana, Papua New Guinea, Pakistan, and Uganda and 30 times in Mozambique and the Democratic Republic of the Congo; an alarming 70-fold increase in morbidity has been observed in Mali.
In other countries where malaria occurs, the number of cases has varied within ±50% over the last five years. \nThe spread of global diseases (GD) is measured by the total number of people (millions per year) who died from these diseases. For the subsequent simulation, we take data on these diseases from the World Health Organization [8]. \nThreat 5. Information Gap (IG) \nThe Information Gap is formed by two determinants of the modern information society: \n1. Humanity is constantly generating gigantic volumes of new data and information. Their total volume will reach 35 ZB by 2020 (1 ZB = 10²¹ B). At the same time, mankind is capable of comprehending, systematizing, processing, and documenting significantly smaller volumes of new data and information (only 15 ZB by 2020). Thus, by 2020 up to 20 ZB of unexamined and unprocessed information will accumulate, a sort of “information black hole.” This information uncertainty explains unpredicted and unforeseen phenomena (the Fukushima catastrophe in 2011 caused by an unpredicted earthquake and tsunami, the disintegration process of the European Union triggered in 2016 by the British referendum, and many others). This component of the Information Gap will be measured using the ICT Development Index (IDI, http://www.itu.int/net4/ITU-D/idi/2016/). \n2. The vulnerability of a country, a territory, or the whole world to cyber attacks. This component of the IG will be measured using the Global Cybersecurity Index (GCI, http://www.itu.int/en/ITU-D/Cybersecurity/Pages/GCI-2017.aspx). \nBoth the IDI and GCI indexes are compiled annually by the International Telecommunication Union. The resulting IG index is calculated by the formula: \nThreat 6. Corruption Perception (CP) \nCorruption is the biggest obstacle to the economic and social development of society. It endangers any change.
Corruption has become not only one of the main causes of poverty but also an obstacle to overcoming it. Although corruption has existed for a long time, it spread more widely with globalization at the end of the 20th and the beginning of the 21st centuries. \nCorruption in one country has a negative impact on the development of other countries, and countries with a high level of corruption are not limited to the Third World. Liberalization in the former socialist countries was accompanied by unprecedented abuses of office in the 1990s; the Financial Times proclaimed 1995 “the year of corruption”. In the following years, the phenomenon spread to almost all countries of the world, and corruption acquired a global, international character. \nProsperity has not proved to be a prerequisite for eliminating corruption. The analysis of long-term trends by the international organization Transparency International showed that over the last 15 years the level of corruption has decreased in such countries as Estonia, Colombia, and Bulgaria, whereas it has grown in such developed countries as Canada, the USA, and Ireland. Risk factors such as the opacity of state authorities, the excessive influence of individual oligarchic groups, and violations in the financing of political parties exist in both poor and rich countries, and, unfortunately, the tendencies toward growing corruption are similar. \n\nThe structure of corruption usually differs from country to country. \nTo estimate the influence of corruption on the socio-economic and cultural development of the world’s countries, we use the Corruption Perceptions Index published by the international organization Transparency International [9] (https://www.transparency.org/country). \nThreat 7. 
Limited Access to Drinking Water (WA) \nAccording to the World Health Organization (WHO) and UNICEF [10], the world faces the threat of reduced access to drinking (potable) water and to sanitation facilities. One-fifth of humanity (1.4 billion people) has no access to drinking water, and 2.4 billion people lack even minimal sanitation facilities. For this reason, the UN General Assembly proclaimed 2003 the International Year of Freshwater and the period 2005–2015, starting from World Water Day (22 March 2005), the International Decade for Action “Water for Life”. \nThe situation is especially severe in urban areas of underdeveloped countries, where rapid population growth quickly aggravates the problem. These factors especially affect children’s health. According to WHO estimates, 1.6 million children under five die annually (on average, 4,500 children per day) from unsafe water and a lack of proper hygiene. As the world’s population grows, especially in underdeveloped countries, the struggle for control over the remaining drinking water resources will intensify, giving rise to the next global threat to humanity, one that grows over time. \nLimited access to drinking water will be estimated as the inverse of the indicator of access to drinking water [10]. \nThreat 8. Global Warming (GW) \nGlobal warming is the gradual increase in the average annual temperature of the Earth and the World Ocean. According to the conclusions of the Intergovernmental Panel on Climate Change (IPCC) 
and the National Academies of Sciences of the Group of Eight (G8) countries [11], since the end of the 19th century the average temperature of the Earth has risen by 1 °C, and “the major part of warming observed during the last 50 years has been caused by human activities”, primarily by emissions of greenhouse gases such as carbon dioxide (CO₂) and methane (CH₄). \nEstimates obtained with climate models and cited by the Intergovernmental Panel on Climate Change show that the average temperature of the Earth may increase by one to several °C (in different regions of the world or on global average) between 1990 and 2080. The warming is expected to cause other climate changes, such as a rise in the level of the World Ocean by 0.1–5 m (probably within 30–40 years), the appearance of new viruses, and changes in atmospheric precipitation and its distribution. \nThis may increase the frequency of natural disasters such as floods, droughts, and hurricanes, reduce harvests of agricultural crops, and lead to the emergence of new epidemic diseases and the extinction of many biological species. The struggle for control over shrinking natural resources may intensify not only between countries but also between individual population groups, causing new global conflicts. \nIt should be recognized that the overall contribution of carbon dioxide emissions to global warming is much larger than that of methane. Therefore, the danger of global warming can be estimated by the amount of CO₂ emissions in metric tons [5, 11]. \nThreat 9. State Fragility (SF) \nAfter the end of the Cold War and the collapse of the Soviet Union (1991), the world entered an era of dramatic new geopolitical processes. The following 18 years were marked by the rapid growth of globalization. 
The revolution in information and communication technologies has made world politics more transparent and has increased the extent to which changes in one region affect other parts of the planet. These new qualities of the globalized world made it clear that the new geopolitical system is full of unstable, failing, and weak states. The weakening of the restraining mechanisms characteristic of the bipolar world and the sharpening conflict between the fundamental values of different countries caused a new wave of confrontation, terrorism, violence, territorial claims, and uneven development. \nThe uncontrolled spread of nuclear, chemical, and biological weapons and the revival of nuclear power in such an unstable, unbalanced world significantly increase the threat to sustainable development and the global security of mankind. \nUnder such conditions, stabilizing world development becomes possible through international cooperation, investment, and support for weak countries and regions of the planet, guided by new paradigms of “harmonious coexistence” or a “tolerant, peaceful world”. To pursue such a global stabilizing policy, recognized international organizations and research centers have, since the beginning of this century, been developing analytical instruments for estimating emerging world development trends. The first attempt to monitor global development trends was the “Peace and Conflict” report series, whose first issue was published by the University of Maryland (USA) in 2001. Reports on global development trends have also been published in Spain, Canada, Germany, and other countries. \nThe ultimate aim of these new analytical instruments was to estimate the capacity of different countries in such important dimensions as conflict, governance, and economic and social development. 
Among these instruments are the index of peace-building capacity from this report series, the Worldwide Governance Indicators developed by the World Bank, and the Failed States Index developed by The Fund for Peace. \nTo quantify this threat to sustainable development, our research uses the Fragile States Index (FSI) produced by The Fund for Peace (FFP) (http://ffp.statesindex.org). \nThis index is based on twelve indicators that cover a wide range of state failure risk elements, such as extensive corruption and criminal behavior, inability to collect taxes or otherwise draw on citizen support, large-scale involuntary dislocation of the population, sharp economic decline, group-based inequality, institutionalized persecution or discrimination, severe demographic pressures, brain drain, and environmental decay. Data on these indicators are given in [12]. \nThreat 10. Natural Disasters (ND) \nNatural disasters depend less directly on human activity than the other threats discussed above. However, given the reports of international organizations on climate change (World Economic Forum, 2010–2017), we cannot claim that humans play no role in the dynamics of natural disasters. \nExperts of the UN and the World Data Center for “Geoinformatics and Sustainable Development” (http://wdc.org.ua/en) identified six major types of natural disasters (in decreasing order of danger): droughts, floods, hurricanes, extreme temperatures, earthquakes, and tsunamis (http://www.un.org/russian/ga/undp/). \nThe index is calculated as follows: \n1. The total number of people affected by natural disasters in a given country and year is calculated: \n2. This total number of people affected (Disasters Affected) is then divided by the population of the country in the given year: \n3. 
The resulting values are then normalized with the logistic norm: \nwhere M[·] and s[·] are the approximate mean and standard deviation, respectively, across all countries for the given year. \nBecause the consequences of natural disasters usually affect a country for a long time and fade only gradually, the final value of the vulnerability index to natural disasters is defined as an exponentially weighted moving average (EWMA) with smoothing factor α = 0.25. \nThe value of α was chosen by experts on the basis of an estimate of the average duration and severity of the impact of disasters on a country. For computational convenience, only the last $T_{max} = 25$ significant years are considered. The total weight of the discarded older observations then amounts to $ε = e^{T_{max} ln(1 - α)} = (1 - α)^{T_{max}} = 0.75^{25} ≈ 0.00075 < 10^{-3}$. \nTo quantify the vulnerability of the world’s countries to natural disasters, an index of vulnerability to natural cataclysms was developed. It is calculated from the data of the International Disaster Database (http://www.emdat.be/) and the Centre for Research on the Epidemiology of Disasters (http://www.cred.be/) of the World Health Organization. Using this methodology, the values of the vulnerability index to natural disasters were calculated for the countries of the world for 1995–2017. \nThreat 11. Conflict Intensity (CI) \nThe next global threat is the growing number of conflicts in the world, both within individual countries and between sovereign states and groups of states. In our research, we distinguish interstate, intrastate, substate, and transstate conflicts. Whereas interstate conflicts involve only internationally recognized state actors, intrastate conflicts involve both state and non-state actors. 
Substate conflicts are carried out solely among non-state actors. Transstate conflicts involve both state and non-state actors and meet the criteria of political conflict for at least two sovereign states. \nEach conflict is characterized by its intensity. We distinguish five intensity levels: dispute, non-violent crisis, violent crisis, limited war, and war. Each level is assigned the following number of points: \ndispute: 1 point; \nnon-violent crisis: 2 points; \nviolent crisis: 3 points; \nlimited war: 4 points; \nwar: 5 points. \n\nThe last three levels constitute the category of violent conflicts, in contrast to non-violent conflicts (dispute and non-violent crisis). Whereas a dispute is a political conflict carried out without resorting to violence, in a non-violent crisis one of the actors threatens to use violence. This includes violence against objects without the risk of harming persons, refusal to surrender arms, pointing weapon systems at each other, and sanctions. \nQuantitative data on conflict intensity are taken from the Heidelberg Institute for International Conflict Research (http://www.hiik.de/en/konfliktbarometer/pdf/ConflictBarometer_2016.pdf) [4]. \nThreat 12. Proliferation (NI) \n“Proliferation” is a global threat associated with nuclear war, terrorism, and the growing total number of weapons. Nonproliferation, which aims to prevent these, is a complex concept declared by many states and an integral part of modern policy. The level of this threat is the inverse of the Nonproliferation Index, which defines the degree of military nonproliferation and covers four policy categories: \n1. Demilitarization or disarmament; \n2. Scientific Research; \n3. State’s Development; \n4. Level of Nonproliferation for Neighbor States. 
\nEach of these categories is formed using one or two levels of indicators. These indicators, as well as the methodology for calculating the Nonproliferation Index, were developed by the World Data Center for Geoinformatics and Sustainable Development [4]. \nThe Nonproliferation Index is represented by a hierarchical discrete model showing the factors of direct and indirect influence. It reflects the level of nonproliferation in terms of a state’s capacity to adhere to the nonproliferation concept in a broad sense. Let us consider each of these policy categories. \nThe first is Disarmament. The importance of this category is underlined by Albert Einstein’s words: “I do not know with what weapons World War III will be fought, but World War IV will be fought with sticks and stones.” Therefore, it is necessary to control the use of weapons and to reduce their number. Military expenditures can also describe this category. Because this category is very specific, the hypothesized set of indicators has to be refined according to the availability of data in open sources or of indirect information. The category includes information on nuclear and uranium production, since the possibility of nuclear weapons being created must be kept in mind. \nNext, the influence of each indicator on disarmament must be determined. Since disarmament is a process that changes over time, a difference quotient (the discrete analogue of a derivative) is probably the best way to describe these changes. Therefore, for all non-binary indicators we can use the formula: \n\nThis formula captures the dynamics of each process. \nNote that the final indicator-selection algorithm consists of two stages: finding all possible candidate indicators and selecting the required group of indicators by mathematical methods. In this way, we avoid relying on expert opinion, within the limits of openly available information. The same algorithm is applied to the other categories. \nThe second category is Scientific Research. 
It has three main directions: Education, Science, and Innovations: \n• The education indicators describe the quality of education and the level of knowledge through education funding and student enrollment. Participation in Olympiads can show the level of knowledge in comparison with other countries. \n• The second direction, Science, includes such indicators as Scientific and technical journal articles, Researchers in R&D (per million people), and Research and development expenditure (% of GDP). \n• The last direction, Innovations, includes the indicators Patent applications, residents and High-technology exports (current US dollars). \nAs a result, nine indicators represent the Scientific Research category. \nThe third category is State’s Development. It includes the following indicators: \nLife expectancy at birth (years); \nGross national income (GNI) per capita (PPP, US dollars); \nInflation, GDP deflator (annual %); \nEnergy use (kg of oil equivalent per capita); \nSectoral structure of the economy. \nThe last category is Level of Nonproliferation in Neighbor States. This category reflects the risk of being drawn into the activities of other states. The first indicator of this group is the conflict barometer for neighboring states, calculated by the Heidelberg Institute for International Conflict Research [4] and published in its annual reports. A quantitative measure of historical relations between countries is also needed: if a country has historically had conflicts with its neighbors, this hinders nonproliferation. This indicator could be calculated as follows: for each neighboring state it takes the value 0 or 1, and the sum of these values represents the historical factor of relations. The time horizon must be limited, for example, to recent history (from the 20th century onward). This indicator may seem to overlap partly with the previous one, but the conflict barometer should describe only the current situation. 
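The two indicator computations described above for the Disarmament and Neighbor States categories can be sketched as follows. The text does not reproduce its difference-quotient formula, so this sketch assumes the simplest form, the change between consecutive observations divided by the number of elapsed years. The function names and sample values are hypothetical illustrations, not the authors’ implementation.

```python
from typing import Dict, List, Sequence


def difference_quotient(values: Sequence[float], years: Sequence[int]) -> List[float]:
    # Assumed form: (x_k - x_(k-1)) / (year_k - year_(k-1)) for consecutive observations,
    # describing how a non-binary indicator changes over time.
    if len(values) != len(years):
        raise ValueError('values and years must have the same length')
    return [
        (values[k] - values[k - 1]) / (years[k] - years[k - 1])
        for k in range(1, len(values))
    ]


def historical_conflict_factor(conflict_flags: Dict[str, int]) -> int:
    # Each neighboring state contributes 1 if it had a conflict with the country within
    # the chosen horizon (e.g. since the start of the 20th century), otherwise 0.
    # The sum is the historical factor of relations; a higher sum hinders nonproliferation.
    for neighbor, flag in conflict_flags.items():
        if flag not in (0, 1):
            raise ValueError(f'flag for {neighbor} must be 0 or 1')
    return sum(conflict_flags.values())


# Hypothetical data: an indicator observed in three years, and three neighbors
print(difference_quotient([2.1, 2.0, 1.8], [2014, 2015, 2016]))
print(historical_conflict_factor({'A': 1, 'B': 0, 'C': 1}))
```

As the text specifies, binary indicators are left out of the quotient. This section does not detail how the two results are then combined into the Nonproliferation Index.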
",
        "chapter": "4 Intellectual Analysis of Systemic World Conflicts and Global Forecast for the 21st Century",
        "section": "4.2 Identifying the Regularity of the Emergence of Systemic World Conflicts, Based on the Analysis of Big Historical Data",
        "subsection": "4.2.2 Conflict of the 21st Century and Analysis of Its Nature",
        "subsubsection": "N/A"
    },
    {
        "content": "4.2.3 Modeling the Total Impact of the Aggregate of 12 Global Threats on Different Countries and Groups of Countries \nLet us determine the vulnerability of different countries and groups of countries to the combined impact of the 12 major threats discussed in Sect. 4.2.2. Quantitative data on each of the 12 threats are obtained from the global databases specified in the descriptions of these threats in Sect. 4.2.2. To identify groups of countries with similar vulnerability to the 12 main threats, we use Ward’s hierarchical clustering algorithm [4] (http://www.cse.iitb.ac.in/dbms/Data/Courses/CS632/1999/clustering/dbms.html). \nLet us associate each country j with a vector $Tr_J$ whose elements characterize the degree of manifestation of the corresponding 12 threats (Sect. 4.2.2), presented in Table 4.3. \nMost of the initial data on each threat (Table 4.3) are taken from the World Data Center “Geoinformatics and Sustainable Development” (http://wdc.org.ua/en) [4]. Because the components of vector $Tr_J$ are measured in different units, have different physical meanings, and vary over different ranges, they have been normalized to the range (0, 1), where 0 corresponds to the minimum and 1 to the maximum value of the threat. \nThe normalization is performed as follows. If higher values of threat $X^i$ correspond to a better state, the indicator values are logistically normalized according to the formula: \nwhere the parameters $a$ and $b$ are the mean and the standard deviation over the set of countries under analysis. 
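The normalization described above can be sketched as follows, together with the security index $I_{sec}$ that the text builds from it in the next paragraph (a Minkowski norm with $p = 3$ of $s_i = 1 - t_i$). Formula (4.23) is not reproduced here, so the sketch makes two assumptions. First, it uses the standard logistic function 1/(1 + exp(−(x − a)/b)), with a the mean and b the population standard deviation across countries. Second, it takes the “inverse” value to be the complement 1 − value. Which indicators use the complement follows the rule stated in the text. The function names and example values are hypothetical.

```python
import math
from statistics import mean, pstdev
from typing import List, Sequence


def logistic_normalize(values: Sequence[float], invert: bool = False) -> List[float]:
    # Assumed logistic form: 1 / (1 + exp(-(x - a) / b)), where a is the mean and
    # b the population standard deviation of the indicator across all countries.
    a = mean(values)
    b = pstdev(values)
    if b == 0:
        raise ValueError('zero standard deviation: logistic normalization is undefined')
    normalized = [1.0 / (1.0 + math.exp(-(x - a) / b)) for x in values]
    # Assumed meaning of the inverse value: the complement 1 - value.
    return [1.0 - v for v in normalized] if invert else normalized


def minkowski_norm(vector: Sequence[float], p: float = 3.0) -> float:
    # ||v||_p = (sum_i |v_i|^p)^(1/p)
    return sum(abs(v) ** p for v in vector) ** (1.0 / p)


def security_index(threats: Sequence[float], p: float = 3.0) -> float:
    # threats: normalized values t_i in (0, 1) for one country, 0 = minimal threat.
    # I_sec is the Minkowski norm of s_i = 1 - t_i, so a larger I_sec means the
    # country is more remote from the set of threats.
    return minkowski_norm([1.0 - t for t in threats], p)


# Hypothetical normalized values of the 12 threats for one country
example = [0.2, 0.4, 0.1, 0.3, 0.5, 0.2, 0.1, 0.6, 0.3, 0.2, 0.4, 0.1]
print(round(security_index(example), 3))
```

With 12 threats, $I_{sec}$ lies between 0 (every threat at its maximum) and $12^{1/3} ≈ 2.29$ (no threats at all). This is why the most secure clusters have the largest values.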
\nOtherwise, when higher values of threat $X^i$ correspond to a worse state, we use the inverse of the value calculated by formula (4.23): \nAfter this normalization, we obtain the normalized vector $(Tr_J)^{∘}$. \nFor each country j, we compute the Minkowski norm $||Tr_J||$ with $p = 3$ of the vector of normalized threats. The security index $I_{sec}$ of each country is defined as the Minkowski norm, with the same parameter $p = 3$, of the vector $S_j = (s_i^j)$, where $s_i^j = 1 - t_i^j$, $i = 1, …, n$, and $t_i^j$ is the normalized value of threat i for country j: \n$I_{sec} = ||S_j||_3 = (Σ_{i=1}^{n} |s_i^j|^3)^{1/3}$. \nWe call $I_{sec}$ the degree of remoteness of the jth country from the action of the set of 12 threats described in Sect. 4.2.2. \nBased on the calculated norms of the threat vector $||Tr_J||$ for each country j, we introduce an order relation between clusters of countries (Table 4.4): \nTable 4.4 shows that Cluster 1 comprises the countries most successful from the security standpoint, whose degree of remoteness from the set of 12 global threats is the greatest in the sense of (4.12). Conversely, Cluster 5 comprises the most vulnerable countries, whose degree of remoteness from the set of 12 global threats is the smallest. \nBased on the data in Table 4.4, Fig. 4.6 illustrates the security levels of different countries and regions of the world. \nAs Table 4.5 shows, the ten leading countries share a high $I_{sec}$ and a low level of threats. 
For example, the group leaders Canada, Finland, and Australia have the best indicators in the group. However, some of their coefficients, such as biodiversity balance (FB), energy security (ES), and global warming (GW), are above average. \nThe G-7 countries are characterized by a high level of national security and therefore low vulnerability to the impact of the 12 global threats (Table 4.6). \nThe BRICS countries (Table 4.7) are characterized by an average level of security (except for South Africa) and a high level of threats. Notably, China, the Russian Federation, and India have very high global warming (GW) coefficients. Brazil, the group leader, has high vulnerability to natural disasters (ND) and high personal income inequality (GINI). South Africa has the lowest security level and the highest average level of threats in the group, notably vulnerability to global diseases (GD) and personal income inequality (GINI). \nSix countries in the results of the cluster analysis (Table 4.4) have a high conflict intensity indicator, caused by armed conflicts on their territory (Table 4.8). \nIn total, experts of the Heidelberg Institute for International Conflict Research counted 226 violent conflicts in the world in 2016 [4], of which 18 were classified as wars and 20 as limited wars. \nWars: \n– Sub-Saharan Africa: Nigeria (farmers – pastoralists); Nigeria, Cameroon, Chad, Niger (Boko Haram); Somalia, Kenya (al-Shabaab); South Sudan (inter-communal violence); South Sudan (SPLM/A-in-Opposition); Sudan (Darfur); Sudan (SPLM/A-North/Southern Kordofan, Blue Nile). \n– Middle East and Maghreb: Afghanistan (Taliban et al.); Syria, Iraq et al. (IS); Libya (opposition); Syria (inter-opposition violence); Syria (opposition); Turkey (PKK, TAK); Yemen, Saudi Arabia (al-Houthi); Yemen (AQAP, Ansar al-Sharia). \n– Asia and Oceania: Pakistan (Islamist militant groups). 
\n– The Americas: Mexico (drug cartels).   \n– Europe: Ukraine (Donbas). \nLimited wars: \nSub-Saharan Africa: Central African Republic (Anti-Balaka—ex-Séléka); DR Congo (ADF); DR Congo (Bantu—Batwa); DR Congo (Mayi-Mayi et al.); DR Congo, Rwanda (FDLR); Nigeria (northerners—southerners); Sudan (inter-communal violence).   \nMiddle East and Maghreb: Egypt (Islamist groups/Sinai Peninsula); Turkey (opposition).   \nAsia and Oceania: India (Naxalites); Myanmar (KIA, KIO/Kachin State); Myanmar (Rohingya); Myanmar (TNLA/Shan State); Pakistan–India; Philippines (BIFM, BIFF—MILF, government). \nThe Americas: Brazil (drug-trafficking organizations); Colombia (ELN); Colombia (inter-cartel violence, neo-paramilitary groups, left-wing militants); El Salvador (Maras); Mexico (inter-cartel violence, paramilitary groups). ",
        "chapter": "4 Intellectual Analysis of Systemic World Conflicts and Global Forecast for the 21st Century",
        "section": "4.2 Identifying the Regularity of the Emergence of Systemic World Conflicts, Based on the Analysis of Big Historical Data",
        "subsection": "4.2.3 Modeling the Total Impact of the Aggregate of 12 Global Threats on Different Countries and Groups of Countries",
        "subsubsection": "N/A"
    },
    {
        "content": "4.2.4 Conclusions \n1. Based on the intellectual analysis of big historical data on global conflicts from 705 B.C. to the present, a regularity in their occurrence has been determined. It is shown that the sequence of life cycles of systemic world conflicts follows the law of the Fibonacci series, and that the intensity of these conflicts, depending on the level of technological development of society, grows according to a hyperbolic law. Using this regularity, we attempt to foresee the upcoming world conflict, called “the conflict of the 21st century”, and analyze its nature and principal characteristics: duration, main phases, and intensity. \n2. A set of 12 basic global threats that generate “the conflict of the 21st century” has been described. Using cluster analysis, we identified the impact of these threats on different countries of the world and on large groups of countries (civilizations) united by common cultural features. Assumptions were made about possible scenarios of world development during “the conflict of the 21st century” and after its end. \n4.3 Interrelation Between Periodic Processes in the Global Economy and Systemic World Conflicts \nGiven the rapid development of the global economic crisis and the aggravation of global conflicts, one of the major challenges for modern science is to produce scientifically justified “metric” express forecasts of social development for the near and far future. The role of scientific forecasts and predictions should not be exaggerated, since they are conditional and limited, especially when the process being analyzed passes into the so-called “blow-up mode” [13]. 
However, the reliability of a forecast increases considerably if it “resonates” with other global or local trends, hypotheses, and patterns. In our study, such additional patterns are: \n• modern hypotheses that historical time accelerates as scientific and technological progress develops [13]; \n• the cyclical nature of economic development [14]; \n• the tendency of economic cycles to shorten as scientific and technological progress develops [15, 16].",
        "chapter": "4 Intellectual Analysis of Systemic World Conflicts and Global Forecast for the 21st Century",
        "section": "4.2 Identifying the Regularity of the Emergence of Systemic World Conflicts, Based on the Analysis of Big Historical Data",
        "subsection": "4.2.4 Conclusions",
        "subsubsection": "N/A"
    },
    {
        "content": "The most significant are four types of economic cycles [14]: \nthe Kitchin inventory cycle of 3–5 years;   \nthe Juglar fixed-investment cycle of 7–11 years;   \nthe Kuznets infrastructural investment cycle of 15–25 years;   \nthe Kondratiev wave or long technological cycle of 45–60 years. \nIn our study, we regard Kondratiev’s economic cycles as the most fundamental, since they characterize not only economic but also social and political processes in society. \nProceeding from the above and treating the evolutionary development of civilization as a holistic process determined by the harmonious interaction of its components, we compare the patterns of Kondratieff cycles in the development of the global economy with the $C_n$-waves of global systemic conflicts identified in Sect. 4.2, and attempt to predict the course of these periodic processes in the 21st century. \n4.3.1 Periodicity of Global Systemic Conflicts and Economic Processes \nIn the previous section (Sect. 4.2), we revealed the Fibonacci pattern of systemic world conflicts (Table 4.1), expressed in terms of the duration $T(C_n)$ of these conflicts (4.1–4.4) and their intensity $I^{*}(C_n)$ (4.5–4.6). \nAs mentioned above, the cyclical behavior of the economy is reflected in Kondratieff cycles (K-cycles), discovered 80 years ago by the outstanding Russian economist Nikolai Kondratiev [14, 15]. Over the last two centuries, such cycles, with periods of 40–60 years (Table 4.9), have fully agreed with the actual development of the economy. Figure 4.7 illustrates the course of K-cycles from the first half of the last century to the present time. 
\nAnalysis of these cycles shows that the Great Depression in the USA, during the downwave of the third K-cycle, is the illustrative episode of the last century: it began at the end of the 1920s, led to the dollar default of 1933, and, through the Second World War, essentially rearranged the world order. The next long crisis of the global economy, on the downwave of the fourth K-cycle, began in the late 1960s and early 1970s, led to the dollar default of 1971 and the oil crisis of 1973–1975, passed into the deep economic crisis known as stagflation, and resulted, from the mid-1980s to the early 1990s, in the disintegration of the Soviet Union, the reconfiguration of the world, and its transition to a unipolar model. \n\nOf great importance are Kondratieff’s conclusions (confirmed by the analysis of historical evidence) that the upwave periods of Kondratieff cycles are usually accompanied by more serious social disruptions (revolutions, wars, etc.) than the downwave periods [14, 15]. Hence, Kondratieff cycles characterize not only economic but also socio-political dynamics. \nAn analysis of these phenomena reveals an interrelation between two cyclic processes: the development of the global economy, and the occurrence and course of global systemic conflicts. Today, mankind stands at the transition from the downwave of the fifth Kondratieff cycle to the upwave of the sixth K-cycle, which corresponds to the transition from the global economic crisis to the next economic upswing. \n4.3.2 Analysis of the Relationship Between Systemic World Conflicts and the Global Economy \nDespite numerous attempts to establish a law governing these cyclic processes, no pattern in the time variation of the duration of full K-cycles has been scientifically substantiated, which complicates drawing up efficient “metric” forecasts of social development for the near and far future. 
As a rule, these studies addressed the internal nature of Kondratieff cycles. For example, a well-known hypothesis holds that the duration of K-cycles decreases with scientific and technological progress [17, 18].",
        "chapter": "4 Intellectual Analysis of Systemic World Conflicts and Global Forecast for the 21st Century",
        "section": "4.3 Interrelation Between Periodic Processes in the Global Economy and Systemic World Conflicts",
        "subsection": "4.3.1 Periodicity of Global Systemic Conflicts and Economic Processes",
        "subsubsection": "N/A"
    },
    {
        "content": "Of great importance are Kondratieff’s conclusions (confirmed by the analysis of historical evidence) that the upwave periods of Kondratieff cycles are usually accompanied by more serious social disruptions (revolutions, wars, etc.) than the downwave periods [14, 15]. Hence, Kondratieff cycles characterize not only economic but also socio-political dynamics. \nAn analysis of these phenomena reveals an interrelation between two cyclic processes: the development of the global economy, and the occurrence and course of global systemic conflicts. Today, mankind stands at the transition from the downwave of the fifth Kondratieff cycle to the upwave of the sixth K-cycle, which corresponds to the transition from the global economic crisis to the next economic upswing. \n4.3.2 Analysis of the Relationship Between Systemic World Conflicts and the Global Economy \nDespite numerous attempts to establish a law governing these cyclic processes, no pattern in the time variation of the duration of full K-cycles has been scientifically substantiated, which complicates drawing up efficient “metric” forecasts of social development for the near and far future. As a rule, these studies addressed the internal nature of Kondratieff cycles. For example, a well-known hypothesis holds that the duration of K-cycles decreases with scientific and technological progress [17, 18]. \nKondratieff and his disciples emphasized that the patterns in the cyclic dynamics of the economy and society are essentially probabilistic. Depending on the indicator (as well as on the country or region), K-cycles are more or less pronounced. The results of analyzing long waves depend on the metrics and the system of indicators underlying the global historical pulsations and trends. 
\nIn this section we propose a new approach to revealing how the duration of full K-cycles varies in time; it synchronizes the development of K-cycles with an external “metric” process, namely, the course of global systemic conflicts [16]. \nBasic Assumption. In what follows, we assume that there is one more Kondratieff downwave, lasting about 28–30 years (1750/55–1779/85), which precedes the first upwave (1779/85–1810/17) identified by Kondratieff [14, 15]. \nThis assumption can be substantiated by a number of objective considerations; we mention the most important ones. \nFirst, a falling wave of about 28 years (if it exists) agrees well with the fact that the next, rising wave has approximately the same duration, i.e., there is metric conformity in the sequence of down- and upwaves. \nSecond, as the distinguished Austrian-American economist J. Schumpeter asserted, there is a whole series of Kondratieff cycles. Schumpeter’s conclusions were based on his “innovation theory of business” [19], which he used in the 1930s to develop a “Kondratieff cyclic paradigm” and an innovation concept of “long waves.” \nThird, Kondratieff waves should not be regarded merely as a form of cyclic economic dynamics; they are historical cycles that span the structure of the whole society. It is in this sense that Braudel [20], a well-known historian of the 20th century, considered Kondratieff cycles: he related them to the secular trend of society and dated the appearance of such cycles several centuries back. If we combine two processes, the secular trend and Kondratieff cycles, we can hear the “music” of the long-term conjuncture, sounding in two voices. \nContrary to the conventional view, the cycles Kondratieff speaks of appeared on the European stage not in 1779 but several centuries earlier. 
Superimposed on the rise or decline of the secular trend, Kondratieff cycles amplified or softened it [20]. \nModified Sequence of Kondratieff Cycles. Let us adopt the basic assumption and generate a new sequence of Kondratieff cycles $\\{K_n\\}_{n \\geq 1}$ (Table 4.6) from the conventional chronology [15–18]. In what follows, we call the sequence $\\{K_n\\}_{n \\geq 1}$ the modified sequence of Kondratieff cycles (MSKC). \nNote that while each term of the conventional sequence $\\{K_n^0\\}_{n \\geq 1}$ of Kondratieff cycles (Table 4.6) is defined by the pair (upwave, downwave), the associated Kondratieff cycles in the modified sequence $\\{K_n\\}_{n \\geq 1}$ are defined by the inverse pair (downwave, upwave). \nSince the sequence of $C$-waves of global systemic conflicts (see Table 4.1) and the modified sequence of $K$-cycles of development of the global economy (see Table 4.10) are considered interdependent components of the holistic process of development of global society, let us superimpose the curves of these processes on a unified time scale from 1750 to 2008 (Fig. 4.8). Note that adjacent waves of global conflicts $C_n$ actually overlap for some time (see Table 4.1), so the specific junction dates of waves $C_4$ and $C_5$ (1750), $C_5$ and $C_6$ (1920), and $C_6$ and $C_7$ (2008) are determined as averaged instants of time. \n\nSuperimposing these two processes on a common time axis reveals a pattern, which we formulate as the following principles. \n1. Quantization Principle. 
The time intervals $\\Delta(C_n)$, $n \\geq 5$, on which the wave $C_n$ passes through the five phases of evolution, (origin) → (growth) → (culmination) → (decrease) → (decay), contain an integer number $n_k(\\Delta(C_n))$ of full K-cycles of the MSKC $\\{K_n\\}_{n \\geq 1}$. \n2. Monotonicity Principle. The average duration $T_k(\\Delta(C_n))$ of one full K-cycle of the MSKC $\\{K_n\\}_{n \\geq 1}$ on the time intervals $\\Delta(C_n)$ decreases substantially as $n$ grows. \nDenote by $G(C_k; \\{K_n\\}_{n \\geq 1})$ the group (quantum) of K-cycles singled out of the MSKC $\\{K_n\\}_{n \\geq 1}$ by the C-wave $C_k$. Then \n$T_k(\\Delta(C_k)) = \\frac{1}{n_k(\\Delta(C_k))} \\sum_{K_j \\in G(C_k; \\{K_n\\}_{n \\geq 1})} T(K_j)$, \nwhere $T(K_j)$ is the duration of one full Kondratieff cycle $K_j$. \nIn this case, \nThe revealed pattern allows us to formulate a hypothesis about the probable next step of quantization, on the basis of which the next group $G(C_7; \\{K_n\\}_{n \\geq 1})$ of K-cycles, associated with the seventh wave, can be distinguished in the MSKC $\\{K_n\\}_{n \\geq 1}$. To this end, let us formulate the following hypothesis. \nMain Hypothesis. Since the development of the global economy and the course of global systemic conflicts are interdependent components of the same process of evolutionary development of a globalized society, the coordination of these processes on the time intervals $\\Delta(C_5)$ and $\\Delta(C_6)$, in the sense of obeying the quantization and monotonicity principles, also holds on the time interval $\\Delta(C_7)$. \nBased on the main hypothesis, we can predict the course (in a metric sense) of K-cycles in the 21st century, namely: \na. the time interval $\\Delta(C_7)$ contains at least two full cycles of the MSKC $\\{K_n\\}_{n \\geq 1}$; \nb. 
the average duration $T_k(\\Delta(C_7))$ of one full K-cycle on the time interval $\\Delta(C_7)$ is much shorter than $T_k(\\Delta(C_6)) = 43.5$ years. \nHence, two cases are possible, corresponding to two scenarios of Kondratieff cycles in the 21st century. \nScenario A. The time interval 2008–2092 contains two full Kondratieff cycles (Fig. 4.9a). In this case, $T_k(\\Delta(C_7)) = 42.5$ years. \nScenario B. The time interval 2008–2092 contains three full Kondratieff cycles (Fig. 4.9b). In this case, $T_k(\\Delta(C_7)) \\approx 28.3$ years. \nThe main argument for scenario A is the commonly accepted average duration of one full K-cycle, 40 to 60 years [15, 16]. However, stronger arguments can be given in favor of scenario B. \nFirst, under scenario A the monotonicity principle holds only nominally, since $T_k(\\Delta(C_7)) = 42.5$ years and $T_k(\\Delta(C_6)) = 43.5$ years can be considered approximately equal, given the errors in the time “joints” of the processes on the interval from 1750 to 2092. \n\nSecond, the results of some modern studies of global evolutionary processes (such as the concept of the acceleration of historical time [21] and the hypothesis that the duration of Kondratieff cycles tends to decrease with scientific and technological progress [17, 18]) may indirectly confirm the priority of scenario B. 
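The scenario averages above come from simple division, and the same division shows how weakly scenario A satisfies the monotonicity principle. A minimal Python sketch follows. It assumes that an interval of years is counted inclusively, as with the integer set Δ(m; n) defined later in Sect. 4.4.1, so that 2008–2092 spans 85 years. This counting convention is our reading, chosen because it reproduces the quoted 42.5 and 28.3 years. The helper names mes and average_cycle are illustrative and are not part of the original model.

```python
# Sketch: average K-cycle length on Delta(C_7) under scenarios A and B.
T_K_C6 = 43.5  # T_k(Delta(C_6)) in years, as quoted in the text


def mes(m, n):
    # Number of integer years in Delta(m; n) = {k : m <= k <= n}.
    return n - m + 1


def average_cycle(m, n, cycles):
    # Mean length of one full K-cycle when `cycles` complete cycles fill Delta(m; n).
    return mes(m, n) / cycles


for name, cycles in (('A', 2), ('B', 3)):
    t_k = average_cycle(2008, 2092, cycles)
    drop = (T_K_C6 - t_k) / T_K_C6  # relative decrease versus T_k(Delta(C_6))
    print(f'Scenario {name}: {t_k:.1f} years per cycle, {drop:.0%} below T_k(Delta(C_6))')
```

Under this convention, scenario A comes out only about 2% below T_k(Δ(C_6)) = 43.5 years, while scenario B is about 35% below it. This is the quantitative content of the first argument in favor of scenario B.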
\nIf scenario B takes place, then most probably the durations $T(K_6)$, $T(K_7)$, and $T(K_8)$ of the predicted K-cycles $K_6$, $K_7$, and $K_8$ will be related as follows: \nCertainly, to substantiate the choice of the most reliable relationship among $T(K_6)$, $T(K_7)$, and $T(K_8)$, additional integrated studies are needed that take into account the dynamics of the key components of the global evolution of civilization, such as the rapid depletion of the Earth’s energy resources, the changing demographic structure of the world, growing social inequality among people and countries, global climate change, natural disasters, etc. It is important to establish a relationship between the time quantum $k_c$ of the life of $C$-waves and the average duration of one full cycle of the modified sequence of Kondratieff cycles. Since $k_c \\approx 85$ years [2], \n$k_c \\approx 2T_k(\\Delta(C_5) \\cup \\Delta(C_6) \\cup \\Delta(C_7))$, and the sequence $\\{T(C_n)\\}$, $n = 1, 2, \\ldots, 7$ (Table 4.2), can be represented as the following series: \nThis yields a Fibonacci dependence of the life durations of all waves $C_n$ on the average duration of one full cycle of the modified sequence of Kondratieff cycles over the time interval from 1750 to 2092. 
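The Fibonacci series for T(C_n) referred to above is missing from this text. As an illustration only, the sketch below assumes the dependence has the form T(C_n) ≈ F·k_c, with k_c = 85 years and the Fibonacci multipliers taken in descending order (13, 8, 5, 3, 2, 1, 1 for n = 1, …, 7). These multipliers are our assumption, not values read from Table 4.2. The sketch walks backwards from 2092 to obtain the start year of each wave. It also evaluates what k_c ≈ 2T_k implies for the interval 1750–2092. The function fibonacci_desc is a hypothetical helper.

```python
# Illustrative sketch: Fibonacci multiples of the C-wave time quantum k_c.
K_C = 85  # k_c in years, as quoted from [2]


def fibonacci_desc(count):
    # First `count` Fibonacci numbers (1, 1, 2, 3, ...), returned largest first.
    seq = [1, 1]
    while len(seq) < count:
        seq.append(seq[-1] + seq[-2])
    return seq[:count][::-1]


# ASSUMPTION (not taken from Table 4.2): T(C_n) is roughly F * k_c with
# multipliers 13, 8, 5, 3, 2, 1, 1 for n = 1..7.
multipliers = fibonacci_desc(7)
durations = {n: f * K_C for n, f in enumerate(multipliers, start=1)}

# Walk backwards from the assumed end of C_7 (2092) to get each wave's start year.
starts = {}
end = 2092
for n in range(7, 0, -1):
    starts[n] = end - durations[n]
    end = starts[n]

for n in range(1, 8):
    print(f'C_{n}: {starts[n]} to {starts[n] + durations[n]} ({durations[n]} years)')

# k_c ~ 2 * T_k implies the mean MSKC cycle length and the cycle count in 1750-2092.
t_k = K_C / 2
cycles_1750_2092 = (2092 - 1750 + 1) / t_k
print(f'T_k = {t_k} years, about {cycles_1750_2092:.1f} full cycles in 1750-2092')
```

Under this assumption, the computed junctions (1752, 1922, 2007) fall within two years of the junction dates used in the text (1750, 1920, and 2008). In addition, k_c ≈ 2T_k gives T_k = 42.5 years, which corresponds to about eight full MSKC cycles between 1750 and 2092.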
\nFinally, the above pattern confirms the hypothesis that the duration of Kondratieff cycles tends to decrease with scientific and technological progress [17, 18], with the following refinement: the hypothesis holds not for the sequence $\\{T(K_n)\\}_{n \\geq 1}$ generated by $\\{K_n\\}_{n \\geq 1}$, but for the sequence $\\{T_k(\\Delta(C_m))\\}_{m \\geq 5}$ generated by the sequence of groups (quanta) of K-cycles $\\{G(C_m; \\{K_n\\}_{n \\geq 1})\\}_{m \\geq 5}$. \n4.3.3 Conclusions \n1. Treating the evolutionary development of civilization as a holistic process determined by the harmonious interaction of its components, we have compared the patterns of the sequence of Kondratieff cycles in the development of the global economy and of the C-waves of global systemic conflicts, and have attempted to predict the course of these interconnected processes in the 21st century using a metric approach.   \n2. The results of the analysis allow us to conclude that the 21st century will most probably see three K-cycles with an average duration of about 30 years per full cycle, which is much shorter than the average duration of the previous five Kondratieff cycles ($\\approx 50$ years). This may be due to technological progress and the new technological pattern being formed, which cannot yet be investigated at the present stage of human development.   \n3. We have revealed the interrelation, and established a Fibonacci dependence, between the time quantum $k_c$ of the life cycles of C-waves of global systemic conflicts and the average duration of one full cycle of the modified sequence of Kondratieff cycles on the time interval from 1750 to 2092.   \n4. The results of the study confirm the refined hypothesis that the duration of Kondratieff cycles tends to decrease with scientific and technological progress [17, 18]. 
The revealed synchronization of the development of the global economy and the course of global systemic conflicts can be interpreted as indirect confirmation of the adequacy of the models of Kondratieff cycles [15, 16] and C-waves [2].",
        "chapter": "4 Intellectual Analysis of Systemic World Conflicts and Global Forecast for the 21st Century",
        "section": "4.3 Interrelation Between Periodic Processes in the Global Economy and Systemic World Conflicts",
        "subsection": "4.3.2 Analysis of the Relationship Between Systemic World Conflicts and the Global Economy",
        "subsubsection": "N/A"
    },
    {
        "content": "This yields a Fibonacci dependence of the life durations of all waves $C_n$ on the average duration of one full cycle of the modified sequence of Kondratieff cycles over the time interval from 1750 to 2092. \nFinally, the above pattern confirms the hypothesis that the duration of Kondratieff cycles tends to decrease with scientific and technological progress [17, 18], with the following refinement: the hypothesis holds not for the sequence $\\{T(K_n)\\}_{n \\geq 1}$ generated by $\\{K_n\\}_{n \\geq 1}$, but for the sequence $\\{T_k(\\Delta(C_m))\\}_{m \\geq 5}$ generated by the sequence of groups (quanta) of K-cycles $\\{G(C_m; \\{K_n\\}_{n \\geq 1})\\}_{m \\geq 5}$. \n4.3.3 Conclusions \n1. Treating the evolutionary development of civilization as a holistic process determined by the harmonious interaction of its components, we have compared the patterns of the sequence of Kondratieff cycles in the development of the global economy and of the C-waves of global systemic conflicts, and have attempted to predict the course of these interconnected processes in the 21st century using a metric approach.   \n2. The results of the analysis allow us to conclude that the 21st century will most probably see three K-cycles with an average duration of about 30 years per full cycle, which is much shorter than the average duration of the previous five Kondratieff cycles ($\\approx 50$ years). This may be due to technological progress and the new technological pattern being formed, which cannot yet be investigated at the present stage of human development.   \n3. We have revealed the interrelation, and established a Fibonacci dependence, between the time quantum $k_c$ of the life cycles of C-waves of global systemic conflicts and the average duration of one full cycle of the modified sequence of Kondratieff cycles on the time interval from 1750 to 2092.   \n4. 
The results of the study confirm the refined hypothesis that the duration of Kondratieff cycles tends to decrease with scientific and technological progress [17, 18]. The revealed synchronization of the development of the global economy and the course of global systemic conflicts can be interpreted as indirect confirmation of the adequacy of the models of Kondratieff cycles [15, 16] and C-waves [2]. \n4.4 Metric Aspects of Periodic Processes in Economy and Society \nThe interrelations and principles of development of various processes in nature and society are discussed in many publications [19, 22]. The paper [2] reveals the pattern of global system conflicts on the basis of a dynamic model of so-called $C$-waves whose underlying metric is the golden section. The paper [20] compares the patterns of the sequence of great Kondratieff cycles in the development of the global economy with those of the $C$-waves of global system conflicts and attempts to predict these periodic processes for the XXIst century. \nThe principles revealed for global system conflicts and great Kondratieff cycles become much more reliable if they agree with some additional external conditions (concepts, principles, hypotheses) and if the conclusions drawn from them are consistent, or “resonate,” with the conclusions of other independent studies. \nWe consider the evolution of civilization as a holistic process resulting from the harmonious interaction of its components and substantiate the conformity of the principles revealed in [2] to the following additional conditions: \nthe law of structural harmony [21];   \nthe modern concept of the acceleration of historical time [23, 24];   \nthe concept of great Kondratieff cycles [15, 16];   \nglobal forecasts for the XXIst century [15, 17, 18, 22, 25–28]. 
\nBy studying the empirical sequence of the periodicity of global conflicts, we perform a structural analysis of the $C_n$-waves ($n = \\overline{1,6}$) identified in [2] and propose metric approaches to the analysis and prediction of some global civilization processes. \n4.4.1 Initial Definitions \nLet us introduce some concepts and definitions: \n• in what follows, the Fibonacci pattern of global system conflicts substantiated in [2] is called the $F$-pattern; $C_W$ is the set of all global conflicts (according to [22]) from 705 BC to the present;   \n• $N_{WC}(t)$ is the number of global conflicts $c \\in C_W$ in the year $t$;   \n• $\\Delta(m; n) \\triangleq \\{k : (m \\leq k \\leq n) \\land (k \\in \\mathbb{Z})\\}$, $m, n \\in \\mathbb{Z}$, where $\\mathbb{Z}$ is the set of integers; $\\{N_{WC}(t)\\}_{t \\in \\Delta(-750; \\mathrm{now})}$ is the empirical sequence of the periodicity of global conflicts $C_W$ (WC-sequence for short); $I_{WC}(\\Delta(m; n)) = \\frac{1}{\\mathrm{mes}\\,\\Delta(m; n)} \\sum_{t \\in \\Delta(m; n)} N_{WC}(t)$ is the intensity of the WC-sequence on the set (time interval) $\\Delta(m; n)$;",
        "chapter": "4 Intellectual Analysis of Systemic World Conflicts and Global Forecast for the 21st Century",
        "section": "4.3 Interrelation Between Periodic Processes in the Global Economy and Systemic World Conflicts",
        "subsection": "4.3.3 Conclusions",
        "subsubsection": "N/A"
    },
    {
        "content": "4.4 Metric Aspects of Periodic Processes in Economy and Society \nThe interrelations and principles of development of various processes in nature and society are discussed in many publications [19, 22]. The paper [2] reveals the pattern of global system conflicts on the basis of a dynamic model of so-called $C$-waves whose underlying metric is the golden section. The paper [20] compares the patterns of the sequence of great Kondratieff cycles in the development of the global economy with those of the $C$-waves of global system conflicts and attempts to predict these periodic processes for the XXIst century. \nThe principles revealed for global system conflicts and great Kondratieff cycles become much more reliable if they agree with some additional external conditions (concepts, principles, hypotheses) and if the conclusions drawn from them are consistent, or “resonate,” with the conclusions of other independent studies. \nWe consider the evolution of civilization as a holistic process resulting from the harmonious interaction of its components and substantiate the conformity of the principles revealed in [2] to the following additional conditions: \nthe law of structural harmony [21];   \nthe modern concept of the acceleration of historical time [23, 24];   \nthe concept of great Kondratieff cycles [15, 16];   \nglobal forecasts for the XXIst century [15, 17, 18, 22, 25–28]. \nBy studying the empirical sequence of the periodicity of global conflicts, we perform a structural analysis of the $C_n$-waves ($n = \\overline{1,6}$) identified in [2] and propose metric approaches to the analysis and prediction of some global civilization processes. 
\n4.4.1 Initial Definitions \nLet us introduce some concepts and definitions: \n• in what follows, the Fibonacci pattern of global system conflicts substantiated in [2] is called the $F$-pattern; $C_W$ is the set of all global conflicts (according to [22]) from 705 BC to the present;   \n• $N_{WC}(t)$ is the number of global conflicts $c \\in C_W$ in the year $t$;   \n• $\\Delta(m; n) \\triangleq \\{k : (m \\leq k \\leq n) \\land (k \\in \\mathbb{Z})\\}$, $m, n \\in \\mathbb{Z}$, where $\\mathbb{Z}$ is the set of integers; $\\{N_{WC}(t)\\}_{t \\in \\Delta(-750; \\mathrm{now})}$ is the empirical sequence of the periodicity of global conflicts $C_W$ (WC-sequence for short); $I_{WC}(\\Delta(m; n)) = \\frac{1}{\\mathrm{mes}\\,\\Delta(m; n)} \\sum_{t \\in \\Delta(m; n)} N_{WC}(t)$ is the intensity of the WC-sequence on the set (time interval) $\\Delta(m; n)$; \n• a partition $\\omega(\\Delta(m; n))$ of the set $\\Delta(m; n)$ is a sequence of sets $\\{\\Delta(m_s; n_s)\\}_{s = \\overline{1,M}}$ that satisfies the following conditions: \nwe say that the WC-sequence generates a local wave $LW_C(\\Delta(m; n))$ of global conflicts on the set (time interval) $\\Delta(m; n)$ if there exists a partition $\\omega(\\Delta(m; n)) = \\{\\Delta(m_s; n_s)\\}_{s = \\overline{1,5}}$ such that \n• the quantity $\\mathrm{mes}\\,\\Delta(m; n)$ determines the duration of the life cycle of the wave $LW_C(\\Delta(m; n))$, and the time intervals $\\Delta(m_s; n_s)$, $s = \\overline{1,5}$, are the durations of the corresponding phases (stages) $f_s(LW_C(\\Delta(m; n)))$: origin $\\Delta(m_1; n_1)$; growth $\\Delta(m_2; n_2)$; culmination $\\Delta(m_3; n_3)$; decrease $\\Delta(m_4; n_4)$; and decay $\\Delta(m_5; n_5)$;   \n• $I(LW_C(\\Delta(m; n))) \\equiv I_{WC}(\\Delta(m; n))$ is the intensity of the local wave $LW_C(\\Delta(m; n))$ of global conflicts;   \n• $I(f_s(LW_C(\\Delta(m; n)))) \\equiv I_{WC}(\\Delta(m_s; n_s))$ is the intensity of the phase $f_s(LW_C(\\Delta(m; n)))$ of the local wave $LW_C(\\Delta(m; n))$ of global conflicts. \n4.4.2 Structural Analysis of Global System Conflicts \nTable 4.11 shows the partition $\\omega(\\Delta(-750; \\mathrm{now})) = \\{\\Delta(\\alpha_n; \\beta_n)\\}_{n = \\overline{1,6}}$ of the time interval $\\Delta(-750; \\mathrm{now})$ defined in [2].",
        "chapter": "4 Intellectual Analysis of Systemic World Conflicts and Global Forecast for the 21st Century",
        "section": "4.4 Metric Aspects of Periodic Processes in Economy and Society",
        "subsection": "4.4.1 Initial Definitions",
        "subsubsection": "N/A"
    },
    {
        "content": "• a partition $\\omega(\\Delta(m; n))$ of the set $\\Delta(m; n)$ is a sequence of sets $\\{\\Delta(m_s; n_s)\\}_{s = \\overline{1,M}}$ that satisfies the following conditions: \nwe say that the WC-sequence generates a local wave $LW_C(\\Delta(m; n))$ of global conflicts on the set (time interval) $\\Delta(m; n)$ if there exists a partition $\\omega(\\Delta(m; n)) = \\{\\Delta(m_s; n_s)\\}_{s = \\overline{1,5}}$ such that \n• the quantity $\\mathrm{mes}\\,\\Delta(m; n)$ determines the duration of the life cycle of the wave $LW_C(\\Delta(m; n))$, and the time intervals $\\Delta(m_s; n_s)$, $s = \\overline{1,5}$, are the durations of the corresponding phases (stages) $f_s(LW_C(\\Delta(m; n)))$: origin $\\Delta(m_1; n_1)$; growth $\\Delta(m_2; n_2)$; culmination $\\Delta(m_3; n_3)$; decrease $\\Delta(m_4; n_4)$; and decay $\\Delta(m_5; n_5)$;   \n• $I(LW_C(\\Delta(m; n))) \\equiv I_{WC}(\\Delta(m; n))$ is the intensity of the local wave $LW_C(\\Delta(m; n))$ of global conflicts;   \n• $I(f_s(LW_C(\\Delta(m; n)))) \\equiv I_{WC}(\\Delta(m_s; n_s))$ is the intensity of the phase $f_s(LW_C(\\Delta(m; n)))$ of the local wave $LW_C(\\Delta(m; n))$ of global conflicts. \n4.4.2 Structural Analysis of Global System Conflicts \nTable 4.11 shows the partition $\\omega(\\Delta(-750; \\mathrm{now})) = \\{\\Delta(\\alpha_n; \\beta_n)\\}_{n = \\overline{1,6}}$ of the time interval $\\Delta(-750; \\mathrm{now})$ defined in [2]. 
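The intensity and local-wave definitions of Sect. 4.4.1 can be made concrete with a small Python sketch. The text does not preserve the conditions on a partition or the criterion for a local wave, so both are assumptions here:

- A partition is taken to be a run of consecutive, non-overlapping year blocks covering Δ(m; n).
- A local wave is taken to be a five-block partition whose phase intensities rise up to the culmination and fall afterwards.

The functions delta, intensity, is_partition and is_local_wave are hypothetical. The yearly counts in N_WC are invented for illustration and are not data from Tables 4.11 or 4.12.

```python
# Toy sketch of the Sect. 4.4.1 definitions; the conflict counts are invented.


def delta(m, n):
    # Delta(m; n) = {k in Z : m <= k <= n}, so mes Delta(m; n) = n - m + 1.
    return range(m, n + 1)


def intensity(n_wc, m, n):
    # I_WC(Delta(m; n)): conflicts per year averaged over Delta(m; n);
    # years missing from n_wc count as zero conflicts.
    years = delta(m, n)
    return sum(n_wc.get(t, 0) for t in years) / len(years)


def is_partition(m, n, parts):
    # ASSUMED conditions: non-empty, consecutive, non-overlapping blocks covering Delta(m; n).
    if not parts or parts[0][0] != m or parts[-1][1] != n:
        return False
    if any(a > b for a, b in parts):
        return False
    return all(parts[i][1] + 1 == parts[i + 1][0] for i in range(len(parts) - 1))


def is_local_wave(n_wc, m, n, parts):
    # ASSUMED wave criterion: five phases whose intensities rise up to the
    # culmination (third phase) and fall afterwards.
    if len(parts) != 5 or not is_partition(m, n, parts):
        return False
    levels = [intensity(n_wc, a, b) for a, b in parts]
    return levels[0] < levels[1] < levels[2] > levels[3] > levels[4]


# Invented yearly counts N_WC(t) for t = 1..10 and a five-phase partition.
N_WC = {1: 0, 2: 1, 3: 1, 4: 2, 5: 3, 6: 3, 7: 2, 8: 1, 9: 1, 10: 0}
PHASES = [(1, 2), (3, 4), (5, 6), (7, 8), (9, 10)]

print(intensity(N_WC, 1, 10))                      # 1.4
print([intensity(N_WC, a, b) for a, b in PHASES])  # [0.5, 1.5, 3.0, 1.5, 0.5]
print(is_local_wave(N_WC, 1, 10, PHASES))          # True
```

Because intensity divides by the number of years in each block, phases of different lengths can be compared directly. This matches the role of mes Δ(m; n) in the definition of I_WC.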
\nTable 4.12 summarizes the results of the structural analysis and the metric characteristics of the six $C_n$-waves of global system conflicts. In view of the results of the structural analysis, the WC-sequence generates six successive local waves of global conflicts on the time interval $\\Delta(-750; 2007)$: \n$LW_C(\\Delta(\\alpha_n; \\beta_n))$, $n = \\overline{1,6}$, \nwhich represent a global cyclic civilization process [22] with a decreasing period (see Table 4.11) and increasing intensity (see Table 4.12). Since the sequence of time intervals $\\{\\Delta(\\alpha_n; \\beta_n)\\}_{n = \\overline{1,6}}$ determines the corresponding sequence of life cycles of $C_n$-waves of global system conflicts [2], $C_n \\equiv LW_C(\\Delta(\\alpha_n; \\beta_n))$, $n = \\overline{1,6}$. Figure 4.10 exemplifies the structural analysis of the time sweep of the WC-sequence on the time interval $\\Delta(\\alpha_5; \\beta_5)$, and Fig. 4.11 illustrates another feature of the pattern of global conflicts, namely, the strict hierarchy of $C_n$-waves with respect to the intensities of their phases: \nThe composite portrait of the intensity distribution of the phases $f_{n,i}$, $n = \\overline{1,6}$, $i = \\overline{1,5}$, of $C_n$-waves of global system conflicts (Fig. 4.12) clearly shows the wave dynamics of the pattern of global system conflicts. \n4.4.3 Confirmation of the F-Pattern by Other Independent Studies \nAny revealed pattern is much more reliable if it agrees with some additional external conditions (concepts, principles, hypotheses, etc.) and if the conclusions obtained on its basis coincide or “resonate” with the conclusions of other independent studies. Therefore, we discuss the conclusions of some independent studies that confirm the reliability of the $F$-pattern.",
        "chapter": "4 Intellectual Analysis of Systemic World Conflicts and Global Forecast for the 21st Century",
        "section": "4.4 Metric Aspects of Periodic Processes in Economy and Society",
        "subsection": "4.4.2 Structural Analysis of Global System Conflicts",
        "subsubsection": "N/A"
    },
    {
        "content": "4.4.3 Confirmation of the F-Pattern by Other Independent Studies \nAny revealed pattern is much more reliable if it agrees with some additional external conditions (concepts, principles, hypotheses, etc.) and if the conclusions obtained on its basis coincide or “resonate” with the conclusions of other independent studies. Therefore, we discuss the conclusions of some independent studies that confirm the reliability of the $F$-pattern. \n\nThe first argument is the golden section present in the structure of $C$-waves. Indeed, according to the law of structural harmony [21], in any self-organizing system each operating mode, which varies with the structural states of the system, is associated with its own time scale. This time scale is related to a certain invariant of the generalized golden section, which characterizes the stationary state of the system. \nThe second argument: the universal effect of the acceleration of historical time [23, 24], as an integral part of the holistic evolutionary development of civilization, is manifested in the $F$-pattern, since the durations of the life cycles of $C$-waves of global system conflicts successively decrease. \nThe third argument is based on the prognostic properties of the $F$-pattern. As follows from Table 4.7, the Fibonacci sequence $\\{F_S\\}$ degenerates for $C_n$-waves with $n > 6$. \nThis raises a natural question: what will happen to civilization after 2092, in particular in the XXIInd century? Probably, the final cycle of an evolutionary chain will begin: \nThis question was addressed by Vernadsky [22] and Moiseev [25], outstanding scientists of the last century. They independently proposed that if mankind does not radically change its global behavior, by the middle of the XXIst century the environment will degrade to the point that mankind will cease to exist. 
These conclusions were made for the unchanged paradigm of human existence, namely self-serving behavior. If mankind changes its global existence paradigm to, for example, harmonious coexistence on the Earth, it will continue its mission on the planet, and the pattern of global conflicts revealed for the previous paradigm, which corresponds to the Fibonacci sequence, will no longer be valid for the new paradigm. \nThus, according to the forecast based on the $F$-pattern, the XXIst century is a special, critical phase in the evolution of our civilization. Moreover, the proposed model allows not only general conceptual conclusions but also the prediction and evaluation of the metric characteristics of possible stages of the evolutionary development of civilization in the XXIst century. \nThe fourth argument is based on the synchronism of two periodic processes: $C$-waves of global system conflicts and K-cycles of the development of the global economy, which are interdependent components of the unified holistic development of global society. The fundamental property of global society is the cyclic development of its economy. This property is manifested in the great Kondratieff cycles (K-cycles) discovered 80 years ago by Nikolai Kondratieff (Kondratiev), an outstanding Russian economist [15, 16]. Over the last two centuries, such cycles, with periods of 40–60 years, have been in complete agreement with the real development of the economy. \nThe paper [20] interrelates the pattern of global system conflicts and the development of the global economy. Overlaying these two processes on a common time axis in [20] reveals their synchronism, which can be formulated as the following two principles: \n• Quantization Principle. 
The time intervals $\\Delta(C_n)$, $n \\geq 5$, over which the wave $C_n$ passes through the five phases of evolution (ORIGIN) > (GROWTH) > (CULMINATION) > (DECREASE) > (DECAY), contain an integer number $n_k(\\Delta(C_n))$ of complete K-cycles; \n• Monotonicity Principle. The average duration $T_k(\\Delta(C_n))$ of one complete K-cycle on the time intervals $\\Delta(C_n)$ substantially decreases as $n$ grows. \n4.4.4 F-Principle as the Basis of a Metric Study of Global Civilization Processes \nAs the global economic crisis rapidly spreads and global conflicts sharply intensify, quick “metric” forecasts become especially valuable. As the results of [20] show, the $F$-pattern may be an important element in developing a scientifically grounded toolkit and methodology for the analysis of global civilization processes. \nSince the development of the global economy and the course of global system conflicts are interdependent components of the evolutionary development of a globalized society, we may assume that the synchronism of these processes on the time intervals $\\Delta(C_5)$ and $\\Delta(C_6)$, with respect to the quantization and monotonicity principles, also holds on the time interval $\\Delta(C_7)$. This yields, in particular, the Fibonacci dependence of the lifecycle durations of $C$-waves of global system conflicts on the average duration of great Kondratieff cycles from 1750 to 2092. \nAs shown in [20], two scenarios for great Kondratieff cycles are possible in the 21st century: \n• Scenario A. The period from 2008 to 2092 includes two complete Kondratieff cycles, 43.5 years each on average; \n• Scenario B. The period from 2008 to 2092 includes three complete Kondratieff cycles, 28.3 years each on average. \nStronger arguments can be made in favor of scenario B. 
This is indirectly supported by the results of several modern studies of global evolution processes, notably the concept of the acceleration of historical time [23] and the hypothesis that the duration of great K-cycles tends to decrease with scientific and technical progress [17, 18].",
        "chapter": "4 Intellectual Analysis of Systemic World Conflicts and Global Forecast for the 21st Century",
        "section": "4.4 Metric Aspects of Periodic Processes in Economy and Society",
        "subsection": "4.4.3 Confirmation of the F-Pattern by Other Independent Studies",
        "subsubsection": "N/A"
    },
    {
        "content": "The paper [20] interrelates the pattern of global system conflicts and the development of the global economy. Overlaying these two processes on a common time axis in [20] reveals their synchronism, which can be formulated as the following two principles: \n• Quantization Principle. The time intervals $\\Delta(C_n)$, $n \\geq 5$, over which the wave $C_n$ passes through the five phases of evolution (ORIGIN) > (GROWTH) > (CULMINATION) > (DECREASE) > (DECAY), contain an integer number $n_k(\\Delta(C_n))$ of complete K-cycles; \n• Monotonicity Principle. The average duration $T_k(\\Delta(C_n))$ of one complete K-cycle on the time intervals $\\Delta(C_n)$ substantially decreases as $n$ grows. \n4.4.4 F-Principle as the Basis of a Metric Study of Global Civilization Processes \nAs the global economic crisis rapidly spreads and global conflicts sharply intensify, quick “metric” forecasts become especially valuable. As the results of [20] show, the $F$-pattern may be an important element in developing a scientifically grounded toolkit and methodology for the analysis of global civilization processes. \nSince the development of the global economy and the course of global system conflicts are interdependent components of the evolutionary development of a globalized society, we may assume that the synchronism of these processes on the time intervals $\\Delta(C_5)$ and $\\Delta(C_6)$, with respect to the quantization and monotonicity principles, also holds on the time interval $\\Delta(C_7)$. This yields, in particular, the Fibonacci dependence of the lifecycle durations of $C$-waves of global system conflicts on the average duration of great Kondratieff cycles from 1750 to 2092. \nAs shown in [20], two scenarios for great Kondratieff cycles are possible in the 21st century: \n• Scenario A. 
The period from 2008 to 2092 includes two complete Kondratieff cycles, 43.5 years each on average; \n• Scenario B. The period from 2008 to 2092 includes three complete Kondratieff cycles, 28.3 years each on average. \nStronger arguments can be made in favor of scenario B. This is indirectly supported by the results of several modern studies of global evolution processes, notably the concept of the acceleration of historical time [23] and the hypothesis that the duration of great K-cycles tends to decrease with scientific and technical progress [17, 18]. \n4.4.5 Conclusions \n1. Based on the evolutionary development of civilization, considered as a holistic process with harmonious interaction of its various components, it has been substantiated that the $F$-pattern satisfies a number of additional conditions, namely: \n• the law of structural harmony; \n• the modern concept of the acceleration of historical time; \n• the concept of great Kondratieff cycles; \n• global forecasts for the XXIst century as a special, critical phase of the development of civilization. \n2. As a result of the structural analysis of the time-base sweep of the WC-sequences on the time interval $\\Delta(-750;\\mathrm{Nowadays})$, all the metric characteristics of $C_n$-waves, $n = \\overline{1,6}$, have been established.   \n3. A new feature of the dynamics of $C_n$-waves of global system conflicts has been revealed, namely, the strict hierarchy of $C_n$-waves with respect to the intensities of their phases.   \n4. The metric forecast of the manifestation of great Kondratieff cycles in the 21st century based on the $F$-pattern has been considered as an example. \n4.5 Big Solar Spiral of Stirring up Global Systemic Conflicts \nIn Sect. 4.2 and in paper [2], the hypothetical Fibonacci pattern of global systemic conflicts is analyzed on the basis of the dynamic model of so-called C-waves with an underlying golden-ratio metric. 
The studies [20, 29] consider the evolutionary development of civilization as an integral process resulting from the harmonic interference of its various components and justify the correspondence of this pattern to a number of additional conditions, namely: \n– the existence of an interrelation between global systemic conflicts and Kondratiev cycles of economic conjuncture (Sect. 4.3) [20];   \n– the law of structural harmony [15, 21];   \n– the modern concept of the acceleration of historical time [23]; \n– global forecasts for the XXIst century as a special critical phase of the development of civilization [30]. \nBased on the analysis of the empirical sequence of the frequencies of global conflicts, a structural analysis of C-waves was carried out and metric approaches to the analysis and forecasting of some global civilizational processes were considered. Note that over the centuries, in step with the development of civilization, the nature of global conflicts has also changed. This became especially noticeable at the end of the XXth and in the first decades of the XXIst centuries, when mankind passed to the “information society”, where terms such as information wars, cyber wars, hybrid wars, psychotropic weapons, etc. have already become customary.",
        "chapter": "4 Intellectual Analysis of Systemic World Conflicts and Global Forecast for the 21st Century",
        "section": "4.4 Metric Aspects of Periodic Processes in Economy and Society",
        "subsection": "4.4.4 F-Principle as the Basis of a Metric Study of Global Civilization Processes",
        "subsubsection": "N/A"
    },
    {
        "content": "4.4.5 Conclusions \n1. Based on the evolutionary development of civilization, considered as a holistic process with harmonious interaction of its various components, it has been substantiated that the $F$-pattern satisfies a number of additional conditions, namely: \n• the law of structural harmony; \n• the modern concept of the acceleration of historical time; \n• the concept of great Kondratieff cycles; \n• global forecasts for the XXIst century as a special, critical phase of the development of civilization. \n2. As a result of the structural analysis of the time-base sweep of the WC-sequences on the time interval $\\Delta(-750;\\mathrm{Nowadays})$, all the metric characteristics of $C_n$-waves, $n = \\overline{1,6}$, have been established.   \n3. A new feature of the dynamics of $C_n$-waves of global system conflicts has been revealed, namely, the strict hierarchy of $C_n$-waves with respect to the intensities of their phases.   \n4. The metric forecast of the manifestation of great Kondratieff cycles in the 21st century based on the $F$-pattern has been considered as an example. \n4.5 Big Solar Spiral of Stirring up Global Systemic Conflicts \nIn Sect. 4.2 and in paper [2], the hypothetical Fibonacci pattern of global systemic conflicts is analyzed on the basis of the dynamic model of so-called C-waves with an underlying golden-ratio metric. The studies [20, 29] consider the evolutionary development of civilization as an integral process resulting from the harmonic interference of its various components and justify the correspondence of this pattern to a number of additional conditions, namely: 
\n– the existence of an interrelation between global systemic conflicts and Kondratiev cycles of economic conjuncture (Sect. 4.3) [20];   \n– the law of structural harmony [15, 21];   \n– the modern concept of the acceleration of historical time [23]; \n– global forecasts for the XXIst century as a special critical phase of the development of civilization [30]. \nBased on the analysis of the empirical sequence of the frequencies of global conflicts, a structural analysis of C-waves was carried out and metric approaches to the analysis and forecasting of some global civilizational processes were considered. Note that over the centuries, in step with the development of civilization, the nature of global conflicts has also changed. This became especially noticeable at the end of the XXth and in the first decades of the XXIst centuries, when mankind passed to the “information society”, where terms such as information wars, cyber wars, hybrid wars, psychotropic weapons, etc. have already become customary.",
        "chapter": "4 Intellectual Analysis of Systemic World Conflicts and Global Forecast for the 21st Century",
        "section": "4.4 Metric Aspects of Periodic Processes in Economy and Society",
        "subsection": "4.4.5 Conclusions",
        "subsubsection": "N/A"
    },
    {
        "content": "4.5.1 Synchronous Variation of Solar Activity and Formation of C-Waves of Global Systemic Conflicts \nThe 11-year cycle of solar activity, with an average duration of 11.1 years, is described by the Schwabe–Wolf law [33]. For the quantitative determination of solar activity, the annually averaged Wolf numbers [33] published by the Zurich observatory since 1849 are most often used. A number according to the Zurich indexing is assigned to each observed 11-year cycle of solar activity. Number one is assigned to the cycle that began in 1755, and the current cycle, which began in 2008–2009, has the number 24. \nAll observations of sunspots are summarized, and the monthly and annual average values of the Wolf numbers are determined, at the Solar Influences Data Analysis Center (Belgium) [34]. \nAn important statistical property of the series of Wolf numbers is the relation between the amplitude and the phase of cycles: the longer the current cycle, the smaller the amplitude of the next cycle [28]. The solar cycle is asymmetric with respect to the maximum of solar activity: the growth phase ($\\approx 4.6$ years) is shorter than the decay phase ($\\approx 6.5$ years) [33]. \nTable 4.13 lists the 11-year cycles of solar activity recorded since 1755 [28, 35, 36]. Zurich cycles Nos. 1–23 and their parameters are illustrated in Fig. 4.13 by a radar chart, where $\\tau_{max}^{(k)}$ is the year of maximum solar activity in the Zurich cycle $W_{SA}^{(k)}$ and $k$ is the number of the Zurich cycle ($k = \\overline{1,23}$). 
\n\nLet us introduce some definitions: $Z_{SA}\\{\\overline{SW}\\}$ are the Zurich Schwabe–Wolf cycles (Zurich cycles, Schwabe–Wolf Z-cycles), corresponding to the Zurich numbers $k \\in I(1;23)$; $R_{SA}\\{\\overline{SW}\\}$ are the recovered Schwabe–Wolf cycles (recovered cycles, Schwabe–Wolf R-cycles), corresponding to the numbers $k \\in I(-\\infty;0)$; $P_{SA}\\{\\overline{SW}\\}$ are the predictable Schwabe–Wolf cycles (predictable cycles, Schwabe–Wolf P-cycles), corresponding to the Zurich numbers $k \\in I(24;+\\infty)$, where \nThe constant \nis called the Schwabe–Wolf solar metric (the Schwabe–Wolf metric). Note that this metric, as a stable external performance criterion of various global dynamic processes in the interrelated system “the Sun–the Earth”, allows us to refine some metric parameters of these processes. \nIn particular, the parameters of $C_k$-waves can be adjusted on the basis of the following facts: \n– on the basis of scientific observations over the last three centuries, the value of 11.1 years is established as a stable arithmetic mean of Schwabe–Wolf cycle periods; \n– the considerable lengths of the periods $T(C_k)$ of $C_k$-waves, $k \\in I(1;4)$, of global systemic conflicts (260 years $< T(C_k) < 1200$ years) [2, 20, 29] allow us to assume that on the time intervals $\\Delta(C_1)$, $\\Delta(C_2)$, $\\Delta(C_3)$, and $\\Delta(C_4)$ determining the life cycles of these waves, the value of 11.1 years as the arithmetic mean of the periods of Schwabe–Wolf cycles will manifest itself even more explicitly. 
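The three cycle families defined above differ only in the range of the cycle number k. A minimal Python sketch of that classification follows. The constants come from the text; approximate_start() is a hypothetical helper that assumes a strictly uniform 11.1-year period (the uniform spacing the text applies to recovered R-cycles), so it only approximates real start years: for cycle 24 it gives about 2010.3, whereas the observed start was 2008–2009.

```python
# Sketch: map a Schwabe–Wolf cycle number k to the R/Z/P families defined in the text.
# approximate_start() is a hypothetical helper that assumes every cycle lasts
# exactly the 11.1-year mean period, so it only approximates real start years.

MEAN_PERIOD = 11.1         # mean Schwabe–Wolf cycle period in years (from the text)
FIRST_ZURICH_START = 1755  # start year of Zurich cycle No. 1 (from the text)
LAST_ZURICH = 23           # Zurich numbers 1..23 form the Z family


def cycle_family(k: int) -> str:
    # R-cycles: k in I(-inf; 0); Z-cycles: k in I(1; 23); P-cycles: k in I(24; +inf)
    if k <= 0:
        return 'R'
    if k <= LAST_ZURICH:
        return 'Z'
    return 'P'


def approximate_start(k: int) -> float:
    # Uniform spacing anchored at cycle 1; real cycle lengths vary around the mean.
    return FIRST_ZURICH_START + (k - 1) * MEAN_PERIOD


for k in (-1, 0, 1, 23, 24):
    print(k, cycle_family(k), round(approximate_start(k), 1))
```

The drift of one to two years at cycle 24 illustrates why 11.1 years is treated as a mean rather than a fixed period.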
\nThe «reconstruction» of $C_k$-waves has allowed us, in particular, to solve the important problem of determining the “time glueing” of the intervals $\\Delta(C_k)$ of the manifestation of $C_k$-waves, $k \\in I(1;4)$, based on a stable external criterion, the Schwabe–Wolf metric. \nThe correction error was 1.23%, with the hierarchical order of $C_k$-waves, $k \\in I(1;7)$, completely retained [31]. Table 4.14 shows the results of the correction of the intervals $\\Delta(C_k)$ of the manifestation of $C_k$-waves for $k \\in I(3;7)$. Here, $[\\alpha_k;\\beta_k] = \\Delta(C_k)$ is the time interval of the manifestation of the $C_k$-wave [2, 29], and $[\\alpha_k^*;\\beta_k^*] = \\Delta^*(C_k)$ is the interval of manifestation of the $C_k$-wave modified by the Schwabe–Wolf metric. \nThis procedure also allows us to arrange the Schwabe–Wolf R-cycles uniformly, with a period of 11.1 years, on the time intervals $\\Delta^*(C_k)$, $k \\in I(1;4)$, and, based on the stable external criterion, to specify the values of the universal time quantum $k_C$ of global systemic conflicts [29] and the mean value $\\bar{T}_{KC}$ of the duration of Kondratiev cycles of the modified sequence of K-cycles [20]: \nThe ordered set of Schwabe–Wolf solar cycles \nis called an ensemble ($H_W^{(k)}$-ensemble) stirring up the $C_k$-wave of global systemic conflicts, where $\\Delta^*(W) = [\\tau_1;\\tau_2]$ is the time interval of the manifestation of the cycle $W \\in C_{SA}\\{\\overline{SW}\\}$: \nRemark 1 By the stirring up of a $C_k$-wave of global systemic conflicts by an $H_W^{(k)}$-ensemble (briefly, $H_W^{(k)} \\odot C_k$) we mean the process of active systemic 
influence of the sequence of Schwabe–Wolf solar cycles constituting this ensemble on the process of evolutionary structurization of the $C_k$-wave, $k \\in I(1;7)$. \nRemark 2 By the stirring up of the family of $\\{C_k\\}_{k \\in I(1;7)}$-waves of global systemic conflicts by the sequence of ensembles $H_W^{(1)}, H_W^{(2)}, \\ldots, H_W^{(7)}$ of Schwabe–Wolf cycles (briefly, $(H_W^{(1)} \\odot C_1) \\mapsto (H_W^{(2)} \\odot C_2) \\mapsto \\ldots \\mapsto (H_W^{(7)} \\odot C_7)$) we mean the process of active systemic influence of the sequence of solar cycles constituting these ensembles on the process of evolutionary formation of the sequence of $C_k$-waves as an integral structure. \nFigure 4.14 shows the alignment of two processes on the time axis: the sequence of Schwabe–Wolf cycles $H_W = \\{W_{SA}^{(n)}\\}_{n \\in I(0;15)}$ and the sequence of empirical frequencies $N_W$ of global systemic conflicts [2, 23]. The stirring up of the $C_5$-wave of global systemic conflicts by the $H_W^{(5)}$-ensemble is illustrated. The cycle with the number 0 is added to the Zurich cycles $Z_{SA}\\{\\overline{SW}\\}$ as the original one, taking into account the corrected glueing of the time intervals $\\Delta(C_k)$, $k \\in I(1;7)$: \nFigure 4.15 illustrates the “stirring up” of the $C_6$-wave of global systemic conflicts, manifested in the 20th century, by the $H_W^{(6)}$-ensemble [29]. \nBased on the aforesaid and the results from Sect. 4.2, we may state that the chain of stirring up the sequence of $C$-waves \nwhere \nimplements the process of systemic evolutionary structurization of the family of $\\{C_k\\}_{k \\in I(1;7)}$-waves of global systemic conflicts. 
\nThe sequence of numbers \ncorresponds to the fragment of the Fibonacci sequence \nfor the $C_7$-, $C_6$-, …, $C_1$-waves, namely \nwhere \nis the global Schwabe–Wolf constant (global CSW-constant) of the stirring up of the family of $\\{C_k\\}_{k \\in I(1;7)}$-waves of global systemic conflicts by $H_W^{(k)}$-ensembles of Schwabe–Wolf solar cycles. \nLet us introduce the following notation: $\\pi_{SWC}(C)$ is the process of evolutionary structurization of the family of $C_k$-waves of global systemic conflicts, $k \\in I(1;7)$; $\\pi_{SA}(\\overline{SW})$ is the global process of variation in solar activity in the context of the manifestation of Schwabe–Wolf cycles; $\\pi_{GE}(KC)$ is the world economy development process in the context of the manifestation of Kondratiev cycles. \nLet us formulate the supposed metric relationship between the global processes $\\pi_{SA}(\\overline{SW})$ and $\\pi_{SWC}(C)$ as the following hypothesis. \nHypothesis of (SA-WC)-synchrony. Each $C_k$-wave, $k \\in I(1;7)$, of global systemic conflicts contains an integer number of complete Schwabe–Wolf solar cycles \nwhere $h_C^{(SW)}$ is the global CSW-constant and $F(C_k)$ is the Fibonacci number corresponding to the $C_k$-wave. \nConsidering (4.16–4.18), the hypothesis of (SA-WC)-synchrony, and the results from Sect. 4.2, let us formulate a hypothesis about a metric interrelation among three global synchronous processes: \nHypothesis of (SA-WC-GE)-interrelation. The following relation holds: \nwhere $k_C$ is the universal time slot of global systemic conflicts, $h_C^{(SW)}$ is the global CSW-constant, $\\mu_{SA}^{(SW)}$ is the Schwabe–Wolf solar metric, and $\\bar{T}_{KC}$ is the average duration of one Kondratiev cycle. 
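The (SA-WC)-synchrony hypothesis states that each wave holds $h_C^{(SW)}$ times a Fibonacci number of complete Schwabe–Wolf cycles. Combined with $N_6 = N_7 = h_C^{(SW)}$ and $N_k = N_{k+1} + N_{k+2}$, as stated for Fig. 4.16, the Fibonacci weights run from F(C_7) = F(C_6) = 1 up to F(C_1) = 13. A minimal Python sketch of that bookkeeping follows; the value of the global CSW-constant is left as a parameter, and the number used in the usage line is a placeholder, not a value taken from the text.

```python
# Sketch of the (SA-WC)-synchrony hypothesis: N(C_k) = h * F(C_k), where h is
# the global CSW-constant. Its value is a parameter here, not an assumed number.


def fibonacci_weights() -> dict[int, int]:
    # F(C_7) = F(C_6) = 1 and F(C_k) = F(C_{k+1}) + F(C_{k+2}) for k = 5, ..., 1
    f = {7: 1, 6: 1}
    for k in range(5, 0, -1):
        f[k] = f[k + 1] + f[k + 2]
    return f


def cycles_per_wave(h: int) -> dict[int, int]:
    # Number N_k of complete Schwabe–Wolf cycles in the ensemble H_W^(k)
    if h < 1:
        raise ValueError('h must be a positive integer')
    return {k: h * w for k, w in fibonacci_weights().items()}


h_example = 3  # placeholder only; not a value taken from the text
n = cycles_per_wave(h_example)
print([n[k] for k in range(1, 8)])
```

Because every N_k is an integer multiple of h, the integer-count requirement of the hypothesis holds for any positive integer h.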
\n4.5.2 Visualization of the Process of “Stirring Up” of the Family of $\\{C_k\\}_{k \\in I(1;7)}$-Waves of Global Systemic Conflicts \nFigure 4.16 shows the big Solar spiral of the process of “stirring up” of the family of $\\{C_k\\}_{k \\in I(1;7)}$-waves of global systemic conflicts by the sequence of ensembles of the",
        "chapter": "4 Intellectual Analysis of Systemic World Conflicts and Global Forecast for the 21st Century",
        "section": "4.5 Big Solar Spiral of Stirring up Global Systemic Conflicts",
        "subsection": "4.5.1 Synchronous Variation of Solar Activity and Formation of C-Waves of Global Systemic Conflicts",
        "subsubsection": "N/A"
    },
    {
        "content": "Hypothesis of (SA-WC-GE)-interrelation. The following relation holds: \nwhere $k_C$ is the universal time slot of global systemic conflicts, $h_C^{(SW)}$ is the global CSW-constant, $\\mu_{SA}^{(SW)}$ is the Schwabe–Wolf solar metric, and $\\bar{T}_{KC}$ is the average duration of one Kondratiev cycle. \n4.5.2 Visualization of the Process of “Stirring Up” of the Family of $\\{C_k\\}_{k \\in I(1;7)}$-Waves of Global Systemic Conflicts \nFigure 4.16 shows the big Solar spiral of the process of “stirring up” of the family of $\\{C_k\\}_{k \\in I(1;7)}$-waves of global systemic conflicts by the sequence of ensembles of the Schwabe–Wolf cycles $H_W^{(1)}, H_W^{(2)}, \\ldots, H_W^{(7)}$ on the time interval from 840 BC to 2097 AD. The main parameters of the process are shown, as well as the structural properties of the waves of global systemic conflicts: the strict hierarchy of $C_k$-waves with respect to the intensities $I(\\varphi_{k,i})$ of the phases $\\varphi_{k,i}$, $i \\in I(1;5)$, of their evolutionary development; $T_k = T(C_k)$ is the duration of the life cycle of the $C_k$-wave, $k \\in I(1;7)$; $N_k$ is the number of Zurich cycles in the ensemble $H_W^{(k)}$, $k \\in I(1;7)$; $N_k = N_{k+1} + N_{k+2}$ and $T_k = T_{k+1} + T_{k+2}$ for $k \\in I(1;5)$; $N_6 = N_7 = h_C^{(SW)}$; and $T_6 = T_7 = k_C$. 
\nTaking into account the hyperbolic growth of the intensities of $C_k$-waves, $k \\in I(1;7)$ [2], Fig. 4.17 also shows the hyperbolic Solar spiral of the process of “stirring up” of the family of $\\{C_k\\}_{k \\in I(1;7)}$-waves of global systemic conflicts by the sequence of ensembles $H_W^{(1)}, H_W^{(2)}, \\ldots, H_W^{(7)}$ on the same time interval. Here, in years, $T(C_7) = 89$, $T(C_6) = 89$, $T(C_5) = 178$, $T(C_4) = 267$, $T(C_3) = 445$, $T(C_2) = 712$, and $T(C_1) = 1157$. \n4.5.3 Local “Stirring Up” by $H_W^{(k)}$-Ensemble of Schwabe–Wolf Solar Cycles of Evolution Phases of $C_k$-Wave of Global Systemic Conflicts \nWe have formulated the hypotheses and analyzed the metric aspects of the process of “stirring up” of the family of $\\{C_k\\}_{k \\in I(1;7)}$-waves of global systemic conflicts by the sequence of ensembles of Schwabe–Wolf cycles $H_W^{(1)}, H_W^{(2)}, \\ldots, H_W^{(7)}$, which promotes the systemic formation of a global configuration of $C_k$-waves as an integrated structure on super-large time intervals.",
        "chapter": "4 Intellectual Analysis of Systemic World Conflicts and Global Forecast for the 21st Century",
        "section": "4.5 Big Solar Spiral of Stirring up Global Systemic Conflicts",
        "subsection": "4.5.2 Visualization of the Process of “Stirring Up” of the Family of \\left\\{ {{\\varvec C}_{{\\varvec K}} } \\right\\}_{{{\\varvec K} \\in {\\varvec I}\\left( {1;7} \\right)}} -Waves of Global Systemic Conflicts",
        "subsubsection": "N/A"
    },
    {
        "content": "Schwabe–Wolf cycles $H_W^{(1)}, H_W^{(2)}, \\ldots, H_W^{(7)}$ on the time interval from 840 BC to 2097 AD. The main parameters of the process are shown, as well as the structural properties of the waves of global systemic conflicts: the strict hierarchy of $C_k$-waves with respect to the intensities $I(\\varphi_{k,i})$ of the phases $\\varphi_{k,i}$, $i \\in I(1;5)$, of their evolutionary development; $T_k = T(C_k)$ is the duration of the life cycle of the $C_k$-wave, $k \\in I(1;7)$; $N_k$ is the number of Zurich cycles in the ensemble $H_W^{(k)}$, $k \\in I(1;7)$; $N_k = N_{k+1} + N_{k+2}$ and $T_k = T_{k+1} + T_{k+2}$ for $k \\in I(1;5)$; $N_6 = N_7 = h_C^{(SW)}$; and $T_6 = T_7 = k_C$. \nTaking into account the hyperbolic growth of the intensities of $C_k$-waves, $k \\in I(1;7)$ [2], Fig. 4.17 also shows the hyperbolic Solar spiral of the process of “stirring up” of the family of $\\{C_k\\}_{k \\in I(1;7)}$-waves of global systemic conflicts by the sequence of ensembles $H_W^{(1)}, H_W^{(2)}, \\ldots, H_W^{(7)}$ on the same time interval. Here, in years, $T(C_7) = 89$, $T(C_6) = 89$, $T(C_5) = 178$, $T(C_4) = 267$, $T(C_3) = 445$, $T(C_2) = 712$, and $T(C_1) = 1157$. 
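The lifecycle durations listed above follow from $T_6 = T_7 = k_C$ and $T_k = T_{k+1} + T_{k+2}$. A short Python check rebuilds them and compares their sum with the length of the plotted interval from 840 BC to 2097 AD. It assumes $k_C = 89$ years, as implied by $T(C_7) = 89$, and treats 840 BC as year -840, without a year-zero correction.

```python
# Sketch: rebuild the lifecycle durations T(C_k) from T_6 = T_7 = k_C and
# T_k = T_{k+1} + T_{k+2}, then compare their sum with the plotted span.

K_C = 89  # universal time quantum k_C in years, equal to T(C_7) = T(C_6) in the text


def wave_durations(k_c: int = K_C) -> dict[int, int]:
    t = {7: k_c, 6: k_c}
    for k in range(5, 0, -1):
        t[k] = t[k + 1] + t[k + 2]
    return t


durations = wave_durations()
total = sum(durations.values())
# Span from 840 BC to 2097 AD, with 840 BC taken as -840 (no year-zero correction)
span = 2097 - (-840)
print(durations[1], total, span)  # 1157 2937 2937
```

The sum matches the span exactly, and the last duration, 89 years, equals the length of the interval from 2008 to 2097 used for the seventh wave.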
\n4.5.3 Local “Stirring Up” by $H_W^{(k)}$-Ensemble of Schwabe–Wolf Solar Cycles of Evolution Phases of $C_k$-Wave of Global Systemic Conflicts \nWe have formulated the hypotheses and analyzed the metric aspects of the process of “stirring up” of the family of $\\{C_k\\}_{k \\in I(1;7)}$-waves of global systemic conflicts by the sequence of ensembles of Schwabe–Wolf cycles $H_W^{(1)}, H_W^{(2)}, \\ldots, H_W^{(7)}$, which promotes the systemic formation of a global configuration of $C_k$-waves as an integrated structure on super-large time intervals. \nAn important separate problem is to investigate the patterns of formation of the inner configuration of each individual $C_k$-wave, $k \\in I(1;7)$, which manifests itself on a smaller time interval. Such a configuration is defined by the local hierarchy of the intensities $I(\\varphi_{k,i})$ of its evolution phases $\\varphi_{k,i}$, $i \\in I(1;5)$. Knowing the mechanisms of formation of such local structures is especially important for scenario analysis of the development of global civilization processes in the short term. \nFigure 4.18 illustrates the cycle $W_{SA}^{(23)}$, manifested at the “Decay” phase of the $C_6$-wave of global systemic conflicts (the C-wave of the XXth century), which was completed in 2007. We used the results of statistical observations carried out from December 2008 to April 2014 [37] to represent the parameters of the new solar cycle $W_{SA}^{(24)} \\in H_W^{(7)}$, which stirs up the first phase (Origin) of the forecasted (final) $C_7$-wave of global systemic conflicts (the C-wave of the XXIst century). The years of the active Sun (2013–2014) are highlighted. 
\nAnalyzing the time interval 2013–2014 as a period of the active Sun, we may state that it was characterized by considerable social disruptions in different regions of the world: Syria, Crimea, South-East Ukraine, Iran, and Iraq are characteristic examples of the release of accumulated social energy of major population groups of the Earth at the initial phase of the seventh systemic global conflict. \n4.5.4 Scenarios “XXI–2K” and “XXI–3K” of Global Civilizational Processes During the Seventh Systemic Global Conflict \nLet us consider possible scenarios for the manifestation of two Kondratiev cycles (scenario “XXI–2K”) or three such cycles (scenario “XXI–3K”) [20] during the seventh systemic global conflict, based on aligning three synchronous (forecasted) processes on the time interval $\\Delta^*(C_7)$ (from 2008 to 2097). \nFigures 4.19 and 4.20 show the two possible scenarios “XXI–2K” and “XXI–3K” of the development of global civilizational processes in the XXIst century. Taking into account the patterns presented above and relying on the results of [2, 20, 29], we present the results of the metric scenario analysis in Tables 4.15 and 4.16. 
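Under the quantization principle, an interval that contains an integer number of complete K-cycles has an average cycle length equal to its length divided by that number. A minimal Python sketch applies this plain arithmetic to the interval from 2008 to 2097 for both scenarios. It assumes the cycles exactly fill the interval and does not reproduce the values in Tables 4.15 and 4.16; the function name is illustrative only.

```python
# Sketch: under the quantization principle, an interval holding an integer
# number of complete K-cycles has an average cycle length of length / count.


def average_k_cycle(start: int, end: int, n_cycles: int) -> float:
    # Assumes the n_cycles complete K-cycles exactly fill [start; end]
    if n_cycles < 1:
        raise ValueError('n_cycles must be at least 1')
    return (end - start) / n_cycles


# The seventh-wave interval runs from 2008 to 2097 in the text
for name, n in (('XXI-2K', 2), ('XXI-3K', 3)):
    print(name, round(average_k_cycle(2008, 2097, n), 1))
```

Run as written, this prints about 44.5 years for XXI-2K and about 29.7 years for XXI-3K; the monotonicity principle would then require these averages to be shorter than those of the preceding waves.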
\nFor brevity, we use the following notation: $\\tau_{max}(W_{SA}^{(k)})$ and $\\tau_{min}(W_{SA}^{(k)})$ are, respectively, the years of maximum and minimum activity of the Schwabe–Wolf Zurich cycle $W_{SA}^{(k)}$; $V^{+}(\\bar{K})$ and $V^{-}(\\bar{K})$ are, respectively, the ascending and descending half-waves of the Kondratiev cycle $\\bar{K}$; $t_{max}(\\bar{K})$ and $t_{min}(\\bar{K})$ are, respectively, the years of maximum and minimum conjuncture $K_{GE}$ for the Kondratiev cycle $\\bar{K}$; the notation $A \\approx B$ means that points $A$ and $B$ are rather close on the numerical axis; $t(O)$, $t(G)$, and $t(U)$ are the conventional instants of time after which (according to scientific forecasts) the amounts of oil, gas, and uranium consumed in the world, respectively, will exceed their production. \nWe assume that scenario XXI-3K is more likely in the 21st century than scenario XXI-2K. We give two arguments in favor of this assumption: \n1. According to the hypothesis of the acceleration of historical time [21], all processes in the 21st century will proceed faster than in previous centuries.   \n2. Changes in the modern world are no longer linear in time (Fig. 4.21a). As defined by the UN Summit on Sustainable Development of 2015 and the Davos Summit of 2015, these changes are exponential (Fig. 4.21b), and the new digital world is accordingly called exponential, where $a$, $b$, and $k$ are the constants of global society growth. \n4.5.5 Conclusions \n1. In this section we have formulated the hypotheses about the presence of a metric relationship between the sequence of 11-year Schwabe–Wolf cycles of solar",
        "chapter": "4 Intellectual Analysis of Systemic World Conflicts and Global Forecast for the 21st Century",
        "section": "4.5 Big Solar Spiral of Stirring up Global Systemic Conflicts",
        "subsection": "4.5.3 Local “Stirring Up” by {\\varvec H}_{{\\varvec W}}^{{\\left( {\\varvec K} \\right)}} -Ensemble of Schwabe–Wolf Solar Cycles of Evolution Phases of {\\varvec C}_{{\\varvec k}} -Wave of Global Systemic Conflicts",
        "subsubsection": "N/A"
    },
    {
        "content": "4.5.4 Scenarios “XXI–2k” and “XXI–3k” of Global Civilizational Processes During the Seventh Systemic Global Conflict \nLet us consider possible scenarios of the manifestation of two Kondratiev cycles during the seventh systemic global conflict (scenario “XXI–2K”) and three such cycles (scenario “XXI–3K”) [20]. Based on the alignment on the time interval ${ boldsymbol { Delta } } ^ { * } ( C _ { 7 } )$ (from 2008 till 2097) of three synchronous (forecasted) processes. \nFigures 4.19 and 4.20 show two possible scenarios “XXI–2K” and “XXI–3K” of the developments of global civilizational processes in the XXIst century. Taking into account the patterns presented above and leaning upon the results from [2, 20, 29], we present the results of the metric scenario analysis in Tables 4.15 and 4.16. \nFor brevity sake, we use the following notation: $tau _ { m a x } bigg ( W _ { S A } ^ { ( k ) } bigg )$ and smin $tau _ { m i n } left( W _ { S A } ^ { left( k right) } right)$ are respectively the years of the maximum and minimum activity of the Schwabe–Wolf Zurich cycle ${ W } _ { S A } ^ { ( k ) }$ ; $V ^ { + } left( bar { K } right)$ and $V ^ { - } ( bar { K } )$ respectively the ascending and descending half-waves of the Kondratiev cycle $bar { K }$ ; $t _ { m a x } ( bar { K } )$ and $t _ { m i n } ( bar { K } )$ are respectively the years of maximum and minimum conjuncture $K _ { G E }$ for the Kondratiev cycle $bar { K }$ ; notation $A approx B$ means that points $A$ and $B$ are rather close on the numerical axis; $t$ $( O ) , t ( G )$ and $t ( U )$ are conventional instants of time since which (according to scientific forecast) the amount of oil, gas, and uranium consumed in the world, respectively, will exceed their production. \nWe will assume that the scenario XXI-3K is more likely in the 21st century compared to the scenario XXI-2K. In favor of this assumption, we give two arguments: \n1. 
According to the hypothesis of the acceleration of historical time [21], all processes in the 21st century will proceed faster than in previous centuries.   \n2. Changes in the modern world are no longer linear in time (Fig. 4.21a). As defined by the UN Summit on Sustainable Development of 2015 and Davos Summit of 2015, these changes are exponential (Fig. 4.21b), and the new digital world is accordingly called exponential, where a, $mathbf { b }$ and $mathbf { k }$ are the constants of a global society growth. \n4.5.5 Conclusions \n1. In this section we have formulated the hypotheses about the presence of a metric relationship between the sequence of 11-year Schwabe–Wolf cycles of solar",
        "chapter": "4 Intellectual Analysis of Systemic World Conflicts and Global Forecast for the 21st Century",
        "section": "4.5 Big Solar Spiral of Stirring up Global Systemic Conflicts",
        "subsection": "4.5.4 Scenarios “XXI–2k” and “XXI–3k” of Global Civilizational Processes During the Seventh Systemic Global Conflict",
        "subsubsection": "N/A"
    },
    {
        "content": "4.5.4 Scenarios “XXI–2k” and “XXI–3k” of Global Civilizational Processes During the Seventh Systemic Global Conflict \nLet us consider possible scenarios of the manifestation of two Kondratiev cycles during the seventh systemic global conflict (scenario “XXI–2K”) and three such cycles (scenario “XXI–3K”) [20]. Based on the alignment on the time interval ${ boldsymbol { Delta } } ^ { * } ( C _ { 7 } )$ (from 2008 till 2097) of three synchronous (forecasted) processes. \nFigures 4.19 and 4.20 show two possible scenarios “XXI–2K” and “XXI–3K” of the developments of global civilizational processes in the XXIst century. Taking into account the patterns presented above and leaning upon the results from [2, 20, 29], we present the results of the metric scenario analysis in Tables 4.15 and 4.16. \nFor brevity sake, we use the following notation: $tau _ { m a x } bigg ( W _ { S A } ^ { ( k ) } bigg )$ and smin $tau _ { m i n } left( W _ { S A } ^ { left( k right) } right)$ are respectively the years of the maximum and minimum activity of the Schwabe–Wolf Zurich cycle ${ W } _ { S A } ^ { ( k ) }$ ; $V ^ { + } left( bar { K } right)$ and $V ^ { - } ( bar { K } )$ respectively the ascending and descending half-waves of the Kondratiev cycle $bar { K }$ ; $t _ { m a x } ( bar { K } )$ and $t _ { m i n } ( bar { K } )$ are respectively the years of maximum and minimum conjuncture $K _ { G E }$ for the Kondratiev cycle $bar { K }$ ; notation $A approx B$ means that points $A$ and $B$ are rather close on the numerical axis; $t$ $( O ) , t ( G )$ and $t ( U )$ are conventional instants of time since which (according to scientific forecast) the amount of oil, gas, and uranium consumed in the world, respectively, will exceed their production. \nWe will assume that the scenario XXI-3K is more likely in the 21st century compared to the scenario XXI-2K. In favor of this assumption, we give two arguments: \n1. 
According to the hypothesis of the acceleration of historical time [21], all processes in the 21st century will proceed faster than in previous centuries.   \n2. Changes in the modern world are no longer linear in time (Fig. 4.21a). As defined by the UN Summit on Sustainable Development of 2015 and Davos Summit of 2015, these changes are exponential (Fig. 4.21b), and the new digital world is accordingly called exponential, where a, $mathbf { b }$ and $mathbf { k }$ are the constants of a global society growth. \n4.5.5 Conclusions \n1. In this section we have formulated the hypotheses about the presence of a metric relationship between the sequence of 11-year Schwabe–Wolf cycles of solar \nScenario 1 N C-wave W HW-ensemble of Schwabe-Wolf cycles πsA (SW) W（24 ）W（2 8 W2 福 嘴嘴 福 茶 N W(31) 2013 2024 2035 2045 205 2068 2079 2090   \n300 300 0 2 28 0 256 years KGE K-cycles forecast 元GE （KC) K n 1 K   \n300 X Q 0 t 0 LOIL URANIUM years J O Phases of C-wave(forecast）)πsw(C)   \n30 7.1 7.2 7.3 7.4 $Phi _ { 7 , 5 }$ Phase name ORIGIN GROWTH CULMINATION DECREASE Ω7 Ω7,2 Ω73 Ω74 Ω7.5 Scenario N C，-wave W H-ensemble of Schwabe-Wolf cycles πA (Sw) W（27) W（24） 25 8 W(26) 武 发 城 W(29) W(30） 安 SA W(31) SAA PD 2013 2024 2035 2045 57 2068 2079 2090 t   \n300 0 0 0 20 HGE TGE k（） K K   \n30 交 厦 t G 8 GAS years OIL URANIUM ！ g(7 Phasesof C-wave(forecast）πsw(C)   \n00 $Phi _ { 7 , 1 }$ 7.2 7.3 7.4 7.5 Phase name ORIGIN GROWTH CULMINATION DECREASE DECAY Ω7 Ω7.2 Ω7.3 74 Ω7.5 \n\nactivity and the process of evolutionary structurization of the family of $C$ -waves of global systemic conflicts enveloping large and super-large time intervals and having unstable “time configuration.” This relationship can be considered, in particular, as one more proof of the F-pattern of civilizational processes, leaning upon the global external criterion. \n2. 
Within the framework of these hypotheses, we have derived a formula relating the main metric characteristics of three global periodic processes: the Schwabe–Wolf cycles of solar activity, the $C$-waves of global systemic conflicts, and the Kondratiev cycles of global economic development. \n3. As a visual illustration of the revealed patterns, we have constructed the big and hyperbolic “Solar spirals” of the “stirring up” of the family of waves of global systemic conflicts by the sequence of ensembles of Schwabe–Wolf cycles over the interval from 840 BC to 2097 AD. \n4. We have considered how the local configuration of an individual $C_k$-wave of global systemic conflicts is formed by the internal hierarchy of the intensities of its evolution phases. We have presented the current parameters of the new, 24th Schwabe–Wolf Zurich solar cycle, which “stirs up” the first phase (“Origin”) of the predicted $C_7$-wave, and we have singled out 2013 and 2014 as years of an active Sun. \n5. We have constructed two possible scenarios, XXI–2K and XXI–3K, for the development of global civilizational processes during the seventh (final) systemic global conflict in the 21st century, and we have used the revealed patterns to formulate the main features of these scenarios and to define their metric characteristics. \n4.6 Influence of Global Threats on the Sustainable Development of Countries and Regions of the World \nThe study presented in this section is based on the concept of “sustainable development,” which further develops Vernadskij’s teaching on the noosphere [22]. At the turn of the century, this teaching was shown, both theoretically and practically, to be a necessary platform for the three-dimensional concept of ecological, social, and economic sustainable development [4]. 
\nThe economic approach is based on the optimal use of limited resources and on the application of nature-, energy-, and material-saving technologies to create a flow of gross income that at least preserves (does not reduce) the total capital (physical, natural, or human) used to generate this income. \nFrom the ecological point of view, sustainable development aims to ensure the integrity of both biological and physical natural systems, as well as their viability, on which the global stability of the entire biosphere depends. What becomes especially important is the ability of such systems to renew themselves and adapt to various changes, rather than preserving biological diversity in a fixed static state or allowing it to degrade and be lost. \nThe social component is aimed at human development, at preserving the stability of social and cultural systems, and at reducing the number of conflicts in society. A human being should become not the object but the subject of development, participating in shaping his or her own life activities, in making and implementing decisions, and in monitoring their implementation. To meet these requirements, it is important to distribute wealth fairly among people, to respect pluralism of opinions and tolerance in human relationships, and to preserve cultural capital and its diversity, including, first of all, the heritage of non-dominant cultures. \nThe systemic coordination and balancing of these three components is an extremely difficult task. In particular, the interconnection of the social and ecological components requires preserving the equal rights of present and future generations to use natural resources. The interaction of the social and economic components requires an equal and fair distribution of material wealth among people and assistance to the poor. Finally, the relationship between the environmental and economic components requires estimating the cost of anthropogenic impacts on the environment. \nIn this study, the Sustainable Development Gauging Matrix (SDGM) [4], covering the three components above, is proposed, and with its help sustainable development processes are modeled globally for a large group of countries in terms of the quality and security of human life. The present study develops the investigations in [38, 39], which give the theoretical substantiation and computer modeling of the influence of systemic global conflicts on the sustainable development of countries and regions of the world in the global context. A distinctive feature of those investigations is the analysis of linear dependences between the levels of individual threats and integrated indicators of the quality and security of human life.",
        "chapter": "4 Intellectual Analysis of Systemic World Conflicts and Global Forecast for the 21st Century",
        "section": "4.5 Big Solar Spiral of Stirring up Global Systemic Conflicts",
        "subsection": "4.5.5 Conclusions",
        "subsubsection": "N/A"
    },
    {
        "content": "Also in this section is analyzed the influence of 12 global threats on the sustainable development at the qualitative level with the use of Bayesian Belief Networks (BBNs) based on the theory of causality [40] with allowance for linear and nonlinear dependences. \nIn such problems, the use of a BBN that is a graphic model of probabilistic interrelations on a set of variables together with the apparatus of mathematical statistics provides a number of advantages, namely, it makes it possible to reveal causal relationships between different variables and, hence, to facilitate the understanding of complicated phenomena and processes such as sustainable development. Moreover, a BBN possessing both causal and probabilistic semantics is a convenient means for joint representation of expert knowledge determining causal relationships and instrumentally obtained statistical data (measurements, observations, and computations). \n4.6.1 The Methodology of Sustainable Development Evaluation in Terms of Quality and Security of the Human Life \nThe important issue in the process of implementation of the concept of sustainable development is the formation of the measurement system (Matrix) for the quantitative and qualitative assessment of this complicated procedure. \nThe process of sustainable development will be characterized according to two main components: security $( C _ { s l } )$ and quality $( C _ { q l } )$ of the human life (4.19). 
\nUnder this concept, the generalized measure (index) of sustainable development can be represented by the quaternion $Q$: \nThe quaternion $Q$ includes an imaginary scalar part $j\\,w_{sl}C_{sl}$, which describes the security of human life, and a real scalar part, the projection of the norm of the radius vector $\\vec{C}_{ql}$ onto the ideal vector with coordinates (1; 1; 1), which describes the quality of human life in three dimensions: economic $(I_{ec})$, ecological $(I_e)$, and socio-institutional $(I_s)$. We also set $w_{sl} = 1/\\sqrt[3]{12}$ and $w_{ql} = 1/\\sqrt{3}$. Under this condition, $j$ takes the value of the real unit for a normal, regular state of society at $C_{sl} > 0$ and the value of the imaginary unit when a society enters a state of conflict ($C_{sl} = 0$). \nThe security-of-human-life component $C_{sl} = I_{sec} = \\lVert \\overrightarrow{S_j} \\rVert = \\left( \\sum_{i=1}^{n} \\left( s_i^j \\right)^p \\right)^{1/p}$ is examined in detail in Sect. 4.2.3 and is represented by the set of threats (Table 4.3) and formula (4.12). Therefore, below we examine in more detail the quality-of-human-life component $C_{ql}(I_{ec}, I_e, I_s)$. \nSustainable development estimation methodology in the context of the quality of human life. 
For every country, the Euclidean norm of the radius vector of human life quality $\\vec{C}_{ql}$ is given by \n$$\\lVert \\vec{C}_{ql} \\rVert = \\sqrt{I_{ec}^2 + I_e^2 + I_s^2}.$$ \nIn this case, the indicators and policy categories that form the quality-of-human-life component $C_{ql}(I_{ec}, I_e, I_s)$ are calculated as a weighted total: \n$$I_i = \\sum_{j=1}^{n} w_j x_{i,j}, \\quad i = \\overline{1, m}, \\qquad (4.21)$$ \nwhere $I_i$ is the value of an indicator or policy category for the $i$th country ($m$ is the number of countries), $w_j$ is the weight of the $j$th component of the index $I$ ($n$ is the number of components), and $x_{i,j}$ is the value of the $j$th component for the $i$th country. \nThis representation of integrated indices (indicators and policy categories) requires that the components $x_{i,j}$ in formula (4.21) be dimensionless and vary within the same range. \nSince all data, indicators, and indices included in the model are measured in different physical units, may be interpreted differently, and vary within different ranges, they were brought to a standard form so that all of them vary within the range from 0 to 1. Formulas (4.10)–(4.11) can be used for this normalization. \nThis normalization makes it possible to calculate each of the indices $I_{ec}$, $I_e$, $I_s$ and, from them, the components with appropriate weighting coefficients. The quantitative value of human life quality can then be identified as the projection of the norm of this vector onto the ideal vector with coordinates (1; 1; 1) (Fig.
4.22): \n$$C_{ql} = \\lVert \\vec{C}_{ql} \\rVert \\cos\\alpha.$$ \nThe deviation angle $\\alpha$ of the radius vector $\\vec{C}_{ql}$ from the ideal vector (1; 1; 1) is estimated from the values of the dimensions $I_{ec}$, $I_e$, $I_s$ as follows: \n$$\\alpha = \\arccos \\frac{I_{ec} + I_e + I_s}{\\sqrt{3}\\,\\lVert \\vec{C}_{ql} \\rVert}.$$ \nThus, the projection of the norm of the radius vector $\\vec{C}_{ql}$ onto the ideal vector (1; 1; 1) characterizes the quality of human life, while the orientation of $\\vec{C}_{ql}$ in the coordinate system $(I_{ec}, I_e, I_s)$ characterizes the “harmonization” level of sustainable development. Note that as the angle $\\alpha$ approaches 0, the harmonization level of sustainable development increases: a vector $\\vec{C}_{ql}$ equidistant from each of the coordinate axes $I_{ec}$, $I_e$, $I_s$ corresponds to the highest harmonization of sustainable development. If the vector approaches one of these axes, this indicates that the corresponding dimension is being developed as a priority while the other two are neglected. Let $G = 1 - \\alpha$ denote the harmonization level of sustainable development; harmonization is highest when $G$ approaches 1 and lowest when $G$ approaches 0. \nSince human life quality and security are studied with different methods and different sets of initial data, it is worth analyzing them separately, in three stages. At the first stage, we analyze the quality of human life as one component of sustainable development. At the second stage, we investigate the security of human life as the other component. At the third stage, we calculate the aggregate value of the Sustainable Development Index from the two components and investigate this index. 
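\nThe quality-side computation above, together with the security component $C_{sl}$ from Sect. 4.2.3, can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the SDGM implementation: formulas (4.10)–(4.11) are not reproduced in this section, so the sketch assumes plain min–max scaling to [0, 1]; the exponent p = 3 of the security norm and all function names are hypothetical choices made for the example.

```python
import math


def min_max(values, higher_is_better=True):
    # Assumed form of (4.10)-(4.11): rescale raw values to [0, 1].
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    scaled = [(v - lo) / (hi - lo) for v in values]
    # For indicators where larger raw values are worse, flip the scale.
    return scaled if higher_is_better else [1.0 - s for s in scaled]


def weighted_total(components, weights):
    # Weighted total I_i = sum_j w_j * x_ij for one country, cf. (4.21).
    return sum(w * x for w, x in zip(weights, components))


def quality_of_life(i_ec, i_e, i_s):
    # Euclidean norm of the radius vector C_ql = (I_ec, I_e, I_s).
    norm = math.sqrt(i_ec ** 2 + i_e ** 2 + i_s ** 2)
    if norm == 0:
        raise ValueError('all three dimensions are zero; angle undefined')
    # Cosine of the angle between C_ql and the ideal direction (1, 1, 1).
    cos_alpha = (i_ec + i_e + i_s) / (math.sqrt(3) * norm)
    alpha = math.acos(min(1.0, cos_alpha))  # clamp floating-point overshoot
    projection = norm * cos_alpha  # quality of life: projection onto (1, 1, 1)
    harmonization = 1.0 - alpha  # G = 1 - alpha, with alpha in radians
    return projection, alpha, harmonization


def security(threats, p=3):
    # Minkowski p-norm of the normalized threat vector: (sum s_i^p)^(1/p).
    return sum(s ** p for s in threats) ** (1.0 / p)


if __name__ == '__main__':
    # Hypothetical country with balanced dimensions: alpha near 0, G near 1.
    print(quality_of_life(0.6, 0.6, 0.6))
    # Hypothetical country developing only the economy: alpha is about 0.955 rad.
    print(quality_of_life(1.0, 0.0, 0.0))
```

Because all three dimensions lie in [0, 1], cos α is at least 1/√3 for any non-zero vector, so in this sketch α never exceeds arccos(1/√3) ≈ 0.955 rad and G = 1 − α stays between roughly 0.045 and 1.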
\n\nTo study the life-quality component of sustainable development, it is necessary to select data that characterize each of the three dimensions of sustainable development in the most appropriate way. These data must meet an important requirement: they must be produced annually, on a continuing basis, by respected and recognized international organizations. \nThus, the life-quality component of sustainable development $C_{ql}$ and the harmonization level of sustainable development $G = 1 - \\alpha$ are calculated from their constituents $I_{ec}$, $I_e$, $I_s$. Given the above requirements for the initial data, the value of each dimension $I_{ec}$, $I_e$, $I_s$ is calculated from five global indices widely used in international practice (Table 4.17) and produced annually by recognized international organizations. Let us consider each of them. \nThe Economic Dimension Index $(I_{ec})$ is formed by aggregating two global indices (Table 4.17). \n1. The Global Competitiveness Index $(I_c)$ was created by the World Economic Forum. This index is estimated annually for 139 economies of the world and published in the “Global Competitiveness Report” (World Economic Forum, www.weforum.org). To reduce the correlation between the components of quality of life, we constructed our own index based on the original one. \nIt is formed from three groups of indicators: (1) basic requirements (Basic requirements); (2) efficiency stimulants (Efficiency stimulants); and (3) innovation (Innovation). \nThe first group includes two complex categories of economic policy: Infrastructure and Macroeconomic Environment. 
The second group consists of five policy categories: market size; level of financial market development; technological readiness; labor market efficiency; and effectiveness of goods and services. The third group involves three indicators: patent applications; technicians in R&D; and charges for the use of intellectual property (payments). \n2. The Index of Economic Freedom $(I_{ef})$ was created by the Heritage Foundation (www.heritage.org/index/). This index is formed from twelve indicators: Property rights; Government integrity; Judicial effectiveness; Government spending; Tax burden; Fiscal health; Business freedom; Labor freedom; Monetary freedom; Trade freedom; Investment freedom; Financial freedom. \nThe Ecological Dimension Index $(I_e)$ is estimated with the Environmental Performance Index (EPI) (Yale Center for Environmental Law & Policy, www.epi.yale.edu). This index is produced by the Yale Center for Environmental Law and Policy together with Columbia University (USA) for 163 countries of the world. \nThe EPI is calculated by aggregation: it is formed from two top-level categories of environmental policy (Environmental health, the sanitary state of the environment, and Ecosystem vitality, the ability of ecosystems to sustain themselves), nine medium-level ecological indicators, and 14 low-level indicators. \nThis index and its indicators characterize the ability of each country to protect its environment both at present and in the long term, based on the availability of a national environmental system, the ability to withstand environmental impacts and to reduce human dependence on them, the social and institutional resources of the country for meeting environmental challenges, the possibility of global monitoring of the environmental state of the country, etc. 
Moreover, they can serve as a powerful tool for analytically grounded decision-making that also covers the social and economic dimensions of the country’s sustainable development. \nThe Social Dimension Index $(I_s)$ is formed from six indices: Health, wellness and basic needs; Education; Personal rights and freedom; Personal safety; Corruption perception; Social infrastructure. These indices are built from the following indicators: cost of living, leisure and culture, economic state of the country, environmental state of the country, human freedom, human health, state of infrastructure, life risks and safety, poverty factors, unemployment level, health-care activities, gender conditions in the country, and others. \nTable 4.18 shows the groups of policy categories and indicators used for global modeling of sustainable development processes. \nAs shown in Tables 4.17 and 4.18, the life-quality component of sustainable development $C_{ql}$ and its harmonization degree $G = 1 - \\alpha$ were determined using 73 indicators. \nBased on a description of the relations between the different policy categories and indicators, reduced to a common computational platform, the mathematical SDGM model was developed; its structure is shown in Fig. 4.23. \nIt was taken into account that all data, indicators, and indices included in the model (Fig. 4.23) are measured in different physical units, may be interpreted differently, and vary within different ranges. Therefore, they were normalized to vary within the range from 0 to 1. In this case, the worst values of these indicators correspond to numerical values close to 1. This normalization makes it possible to calculate each index $I_{ec}$, $I_e$, $I_s$ and the component $C_{ql}$ from their components with appropriate weight coefficients. 
In turn, the weight coefficients in the formula for the life-quality component of sustainable development $C_{ql}$ are chosen so that the economic, ecological, and social dimensions contribute equally in the coordinate system $(I_{ec}, I_e, I_s)$. \nTherefore, the SDGM model makes it possible to calculate the life-quality component of sustainable development $C_{ql}$ and the harmonization degree of this development $G = 1 - \\alpha$ for every country of the world for which data on the global indices and indicators are available (Table 4.18). ",
        "chapter": "4 Intellectual Analysis of Systemic World Conflicts and Global Forecast for the 21st Century",
        "section": "4.6 Influence of Global Threats on the Sustainable Development of Countries and Regions of the World",
        "subsection": "4.6.1 The Methodology of Sustainable Development Evaluation in Terms of Quality and Security of the Human Life",
        "subsubsection": "N/A"
    },
    {
        "content": "Table 4.18 shows the groups of policy categories and indicators used for global modeling of sustainable development processes. \nAs it is shown in Tables 4.17 and 4.18, life quality component of sustainable development $C _ { q l }$ and its harmonization degree $G = 1 - alpha$ were determined with the usage of 73 indicators. \nOn the basis of description of relations between different categories of policy and indicators reduced to common calculating platform, the mathematical SDGM model was developed, the structure of which is presented in Fig. 4.23. \nIt was taken into account that all data, indicators and indexes included into model (Fig. 4.23) are measured with the help of different physical quantities, may be interpreted differently and change within different ranges. That is why they were normalized for their changes to occur within range from 0 to 1. In this case the worst values of mentioned indicators conform to numeral values close to 1. Such normalization gives the opportunity to calculate every index $I _ { e c } , I _ { e } , $ $I _ { s }$ and component $C _ { q l }$ through their components with appropriate weight coefficients. In their turn the weight coefficients in the formula of calculation of life quality component of sustainable development $C _ { q l }$ are selected in order to give the possibility to provide equal values of economic, ecological and social dimension in the coordinate system $( I _ { e c } , I _ { e } , I _ { s } )$ . \nTherefore, the SDGM model gives the possibility to calculate life quality component of sustainable development $C _ { q l }$ and harmonization degree of this development ${ pmb G = { bf 1 } - pmb alpha }$ for every country of the world for which data about global indexes and indicators exist (Table 4.18). 
\n4.6.2 Some Basic Definitions and Concepts \nThis investigation is devoted to the determination of causal relations between the mentioned threats (Table 4.3) and indicators of sustainable development (4.12), (4.19–4.22) at a qualitative level. On this basis, a holistic interpretation of processes of sustainable development of countries and regions of the world is developed and also the vulnerability of this development to the influence of the collection of the mentioned threats is estimated. \nLet’s introduce some basic definitions and concepts: \n1. We consider the collection of global threats (Table 4.3) that exert influence on the sustainable development of countries and regions of the world [39]. Initial quantitative data on indicators of sustainable development, on each of these threats, and also on gross domestic products (GDPs) of countries that will be used for the construction of BBNs are presented in [39]. Since these data vary within different ranges and have different physical dimensions, we will use their normalized values: \n(continued) \nwhere ${ overline { { X _ { j } } } } = { frac { sum _ { i = 1 } ^ { n } x _ { i , j } } { n } }$ is the average value of an indicator of sustainable development, a threat, and a GDP; $mathfrak { n }$ is the number of rows being analyzed, and \nis the standard deviation of a variable $X _ { j }$ \nData normalized in this way have zero mean and unit variance. Data for indicators of sustainable development, global threats, and also $mathbf { G D P _ { s } }$ for countries of the world in 2016 are taken from [4] and presented in Table 4.19. \n2. A Bayesian Belief Networks (BBNs) is a directed acyclic graph in which each vertex is associated with a discrete random quantity $X _ { i } , i = { overline { { 1 , n } } }$ , assuming values $x _ { i } ^ { j } , j = overline { { 1 , m _ { i } } }$ and arcs determine causal relations between random quantities. 
The vertices of this graph are associated with tables of conditional probabilities calculated by the Bayes formula \n$$P(a \\mid b) = \\frac{P(b \\mid a)\\,P(a)}{P(b)}, \\qquad (4.26)$$ \nwhere $a$ and $b$ are random events; $P(a)$ and $P(b)$ are the probabilities of occurrence of the events $a$ and $b$; and $P(b \\mid a)$ and $P(a \\mid b)$ are, respectively, the probability of occurrence of $b$ given that $a$ has occurred and the probability of occurrence of $a$ given that $b$ has occurred. \n3. Using the terminology of hypotheses and evidence, we denote by $H$ the event that a given hypothesis is true and by $E$ the event that a definite testimony (evidence) bearing on this hypothesis has arrived. Then formula (4.26) can be rewritten in the form \n$$P(H \\mid E) = \\frac{P(E \\mid H)\\,P(H)}{P(E)}. \\qquad (4.27)$$ \nRelationship (4.27) relates a hypothesis to the evidence and also relates the observed evidence to a hypothesis that has not yet been confirmed. This interpretation also presumes that the a priori probability of the hypothesis $P(H)$ is fixed before some fact is observed or manifests itself. \nThe arrival of new evidence of the form $E_i^j: X_i = x_i^j$, $i = \\overline{1, n}$, $j = \\overline{1, m_i}$, in a BBN leads to the assignment of the a posteriori probability (4.27) to each hypothesis of the form $H_i^j: X_i = x_i^j$, $i = \\overline{1, n}$, $j = \\overline{1, m_i}$; this probability determines the degree of belief in the hypothesis [41]. \n4. To synthesize a BBN from the data on the threats presented above, two problems must be solved. The first problem is the selection of significant variables and the definition of causal relations on their set; as a rule, it is solved by involving experts in the field of threat analysis. 
The second is the formation of the tables of conditional probabilities associated with the vertices of the BBN graph; it can be solved by computing conditional probabilities from the available experimental data on threats. Note that if the experimental data on threats are given on interval scales [42], they should first be discretized, for example, by $k$-means clustering [43]. In essence, this stage performs the passage from quantitative estimates to qualitative ones. \n5. Once a BBN has been constructed, specifying a threshold value for the degree of belief makes it possible to determine the set of confirmed hypotheses for various collections of evidence. Generalizing these data, we obtain a qualitative characterization of the relations between the threats $X_i$, $i = \\overline{1,n}$. \nA distinctive feature of the model under consideration is that including insignificant threats and the causal relations between them leads to a significant growth in the dimension of the model. For example, specifying the tables of conditional probabilities for a model in which every indicator of sustainable development depends on all 12 global threats, with $m_i = 3$, $i = \\overline{1,n}$, requires $7 \\cdot 3^{12} = 3{,}720{,}087$ real numbers. Hence, the number of vertices and arcs of the BBN should be reduced so that only the essential variables and the relations between them are preserved. One possible way to overcome this “dimensionality” problem is statistical dependency analysis, using methods for estimating correlations [44] or calculating entropy [45]. In particular, correlation analysis yields an estimate of the linear dependence between variables and determines the parameters of a linear model. 
The calculation of conditional entropy can also reveal nonlinear dependences but, in this case, it gives no information on the form of such a dependence. \n4.6.3 Synthesis of Topologies of BBNs \nLet the information entropy of a discrete random quantity $X_i$, $i = \\overline{1,n}$ (a threat, an indicator of sustainable development, or the GDP), which can take the values $x_i^j$, $j = \\overline{1,m_i}$, be computed by the formula \n$H(X_i) = -\\sum_{j=1}^{m_i} P(X_i = x_i^j) \\log P(X_i = x_i^j)$",
        "chapter": "4 Intellectual Analysis of Systemic World Conflicts and Global Forecast for the 21st Century",
        "section": "4.6 Influence of Global Threats on the Sustainable Development of Countries and Regions of the World",
        "subsection": "4.6.2 Some Basic Definitions and Concepts",
        "subsubsection": "N/A"
    },
    {
        "content": "When new evidence of the form $E_i^j: X_i = x_i^j$, $i = \\overline{1,n}$, $j = \\overline{1,m_i}$, is entered into a BBN, each hypothesis of the form $H_i^j: X_i = x_i^j$, $i = \\overline{1,n}$, $j = \\overline{1,m_i}$, is assigned the a posteriori probability (4.27), which determines the degree of belief in this hypothesis [41]. \n4. Two problems must be solved to synthesize a BBN from the threat data presented above. The first is the selection of significant variables and the definition of causal relations among them; as a rule, it is solved by involving experts in threat analysis. The second is the formation of the tables of conditional probabilities associated with the vertices of the BBN graph; it can be solved by computing conditional probabilities from the available experimental data on threats. Note that if the experimental data on threats are given on interval scales [42], they should first be discretized, for example, by $k$-means clustering [43]. In essence, this stage performs the passage from quantitative estimates to qualitative ones. \n5. Once a BBN has been constructed, specifying a threshold value for the degree of belief makes it possible to determine the set of confirmed hypotheses for various collections of evidence. Generalizing these data, we obtain a qualitative characterization of the relations between the threats $X_i$, $i = \\overline{1,n}$. \nA distinctive feature of the model under consideration is that including insignificant threats and the causal relations between them leads to a significant growth in the dimension of the model. 
For example, specifying the tables of conditional probabilities for a model in which every indicator of sustainable development depends on all 12 global threats, with $m_i = 3$, $i = \\overline{1,n}$, requires $7 \\cdot 3^{12} = 3{,}720{,}087$ real numbers. Hence, the number of vertices and arcs of the BBN should be reduced so that only the essential variables and the relations between them are preserved. One possible way to overcome this “dimensionality” problem is statistical dependency analysis, using methods for estimating correlations [44] or calculating entropy [45]. In particular, correlation analysis yields an estimate of the linear dependence between variables and determines the parameters of a linear model. The calculation of conditional entropy can also reveal nonlinear dependences but, in this case, it gives no information on the form of such a dependence. \n4.6.3 Synthesis of Topologies of BBNs \nLet the information entropy of a discrete random quantity $X_i$, $i = \\overline{1,n}$ (a threat, an indicator of sustainable development, or the GDP), which can take the values $x_i^j$, $j = \\overline{1,m_i}$, be computed by the formula \n$H(X_i) = -\\sum_{j=1}^{m_i} P(X_i = x_i^j) \\log P(X_i = x_i^j)$, \nand let it serve as an averaged quantitative estimate of the uncertainty (unexpectedness) of the events in which the variable $X_i$ takes the values $x_i^j$. 
Removing this uncertainty yields information; that is, the intrinsic information on a variable $X_i$ [46] is specified by the formula \n$I(X_i) = H(X_i)$. \nLet the upper-bound estimate of intrinsic information [47] \n$I(X_i) \\le \\log m_i$ \nalso be known (equality is reached when $P(X_i = x_i^j) = 1/m_i$, $i = \\overline{1,n}$, $j = \\overline{1,m_i}$). Then, using relationships (4.28) and (4.29), the measure of the specific informativeness of a variable can be defined as follows: \n$I_s(X_i) = \\dfrac{I(X_i)}{\\log m_i}$. \nWe present the values of $I_s$ computed by formula (4.30) for the collection of variables from Table 4.14. In particular, the specific informativeness of the variable GDP equals 0.64. For the other variables we have Is = 1.00, Iql = 1.00, SF = 1.00, Q = 0.99, IG = 0.99, CP = 0.98, Iec = 0.97, GINI = 0.97, GD = 0.95, Ie = 0.95, NI = 0.93, Isec = 0.80, CI = 0.80, BB = 0.77, ND = 0.72, WA = 0.71, ES = 0.25, and GW = 0.19. Here, the variables ES (global decrease in energy security) and ND (vulnerability to natural disasters) can be excluded from consideration since they are poorly informative. \nTo characterize quantitatively the mutual influence between the variables $X_i$ and $X_k$, $i = \\overline{1,n}$, $k = \\overline{1,n}$, we use the concept of mutual information [46] \n$I(X_i; X_k) = H(X_i) - H(X_i \\mid X_k)$, \nwhere $H(X_i \\mid X_k) = H(X_i, X_k) - H(X_k)$ is the conditional entropy, calculated with the help of the formulas of conditional probabilities and relationship (4.28). 
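The entropy-based quantities above can be computed directly from discretized data. The Python sketch below is only an illustration, not the authors' implementation: it assumes base-2 logarithms, probabilities estimated as empirical frequencies of the levels H, M, L, intrinsic information equal to H(X), and specific informativeness equal to H(X) divided by log2(m). The function names and the sample data are hypothetical.

```python
import math
from collections import Counter


def entropy(values):
    # Shannon entropy H(X) = -sum_j P(x_j) * log2 P(x_j),
    # with P estimated from the empirical frequencies of the levels
    n = len(values)
    counts = Counter(values)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def specific_informativeness(values, m):
    # Assumed form: intrinsic information H(X) divided by its upper bound
    # log2(m), which is reached when all m levels are equally likely
    return entropy(values) / math.log2(m)


def conditional_entropy(x, y):
    # H(X | Y) = H(X, Y) - H(Y), the joint entropy is taken over pairs
    return entropy(list(zip(x, y))) - entropy(y)


def mutual_information(x, y):
    # I(X; Y) = H(X) - H(X | Y)
    return entropy(x) - conditional_entropy(x, y)


# Hypothetical discretized levels (H, M, L) for six countries
gdp = ['H', 'H', 'M', 'M', 'L', 'L']
iql = ['H', 'H', 'M', 'M', 'L', 'L']
es = ['L', 'L', 'L', 'L', 'L', 'M']

print(round(specific_informativeness(gdp, 3), 2))  # 1.0 (uniform over 3 levels)
print(round(specific_informativeness(es, 3), 2))   # 0.41 (strongly skewed)
print(round(mutual_information(gdp, iql), 3))      # 1.585 (identical variables)
```

A skewed variable such as the hypothetical es scores low, which mirrors the way poorly informative variables are dropped in the text; the actual values for Table 4.19 depend on the discretization and are not reproduced here.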
\nAccording to [46], mutual information is a statistical function of two random quantities that determines the amount of information contained in one random quantity, $X_i$, about another, $X_k$. Mutual information has the following properties: it is symmetric, $I(X_i; X_k) = I(X_k; X_i)$; the mutual information of independent variables is zero, since for them $H(X_i \\mid X_k) = H(X_i)$ and hence $I(X_i; X_k) = H(X_i) - H(X_i \\mid X_k) = H(X_i) - H(X_i) = 0$; and the mutual information $I(X_i; X_i)$ equals the intrinsic information of the variable, $I(X_i; X_i) = H(X_i)$. \nThe upper limit of mutual information is also known: \n$I(X_i; X_k) \\le \\min\\{H(X_i), H(X_k)\\}$. \nUsing the formula for mutual information (4.31) and its upper limit (4.32), the specific mutual informativeness of the variables $X_i$ and $X_k$, $i = \\overline{1,n}$, $k = \\overline{1,n}$, can be found as follows: \n$I_s(X_i; X_k) = \\dfrac{I(X_i; X_k)}{\\min\\{H(X_i), H(X_k)\\}}$. \nClearly, $I_s(X_i; X_i) = 1$, and if the variables are independent, then $I_s(X_i; X_k) = 0$. \nBased on the data of Table 4.19 and using formula (4.33), the values of the specific mutual informativeness $I_s(X_i; X_k)$ were calculated for the GDP level, the indicators of sustainable development, and the threats (Table 4.20). \nIf a threshold value of $I_s(X_i; X_k)$ is specified (here, $I_s(X_i; X_k) \\ge 0.75$), the essential dependences between global threats and indicators of sustainable development can be singled out (in Table 4.20, they are shown in bold). \nAs can be seen from Table 4.20, the variables BB, GD, ND, NI, WA, and GINI have less influence on the indicators of sustainable development. The indirect influence of these variables is determined similarly, by calculating the mutual information between the threats. 
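The normalization and thresholding step can be sketched the same way. The Python code below is a hedged illustration, not the procedure used to build Table 4.20: it assumes that the upper limit of mutual information is min(H(X_i), H(X_k)), which gives I_s(X_i; X_i) = 1 and I_s = 0 for independent variables, and it keeps every pair of variables whose specific mutual informativeness reaches the threshold 0.75. The function names and the sample data are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def entropy(values):
    # Shannon entropy (base 2) from empirical frequencies
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def mutual_information(x, y):
    # I(X; Y) = H(X) + H(Y) - H(X, Y), equivalent to H(X) - H(X | Y)
    return entropy(x) + entropy(y) - entropy(list(zip(x, y)))


def specific_mutual_informativeness(x, y):
    # Assumed normalization by the upper limit min(H(X), H(Y))
    bound = min(entropy(x), entropy(y))
    if bound == 0:
        return 0.0  # a constant variable shares no information
    return mutual_information(x, y) / bound


def essential_pairs(data, threshold=0.75):
    # data maps a variable name to its list of discretized levels;
    # returns the pairs whose score reaches the threshold
    pairs = []
    for a, b in combinations(sorted(data), 2):
        score = specific_mutual_informativeness(data[a], data[b])
        if score >= threshold:
            pairs.append((a, b, round(score, 2)))
    return pairs


# Hypothetical discretized data for three variables over six countries
data = {
    'GDP': ['H', 'H', 'M', 'M', 'L', 'L'],
    'Iql': ['H', 'H', 'M', 'M', 'L', 'L'],
    'CP': ['H', 'L', 'H', 'L', 'H', 'L'],
}
print(essential_pairs(data))  # [('GDP', 'Iql', 1.0)]
```

Raising or lowering the threshold trades the size of the resulting BBN against the number of dependences it preserves, which is the aim of the dimensionality reduction discussed above.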
\nThe BBN topology synthesized in this manner, intended for the conceptual analysis and modeling of the influence of global threats on the sustainable development of countries and regions of the world, is represented by the block diagram in Fig. 4.24. \n4.6.4 Modelling the Influence of Global Threats on the Sustainable Development of Countries and Regions of the World with the Use of BBNs \nWe perform the computer modeling of the influence of global threats in several stages. \n1. Discretization of Initial Data. Since only discrete variables can be used in a BBN model, we reduce the data of Table 4.19 to three discretization levels: high (H), average (M), and low (L). We also introduce the value unknown (U) for variables whose values are not known.",
        "chapter": "4 Intellectual Analysis of Systemic World Conflicts and Global Forecast for the 21st Century",
        "section": "4.6 Influence of Global Threats on the Sustainable Development of Countries and Regions of the World",
        "subsection": "4.6.3 Synthesis of Topologies of BBNs",
        "subsubsection": "N/A"
    },
    {
        "content": "$I(X_i; X_k) = H(X_i) - H(X_i \\mid X_k) = H(X_i) - H(X_i) = 0$; and the mutual information $I(X_i; X_i)$ equals the intrinsic information of the variable, $I(X_i; X_i) = H(X_i)$. \nThe upper limit of mutual information is also known: \n$I(X_i; X_k) \\le \\min\\{H(X_i), H(X_k)\\}$. \nUsing the formula for mutual information (4.31) and its upper limit (4.32), the specific mutual informativeness of the variables $X_i$ and $X_k$, $i = \\overline{1,n}$, $k = \\overline{1,n}$, can be found as follows: \n$I_s(X_i; X_k) = \\dfrac{I(X_i; X_k)}{\\min\\{H(X_i), H(X_k)\\}}$. \nClearly, $I_s(X_i; X_i) = 1$, and if the variables are independent, then $I_s(X_i; X_k) = 0$. \nBased on the data of Table 4.19 and using formula (4.33), the values of the specific mutual informativeness $I_s(X_i; X_k)$ were calculated for the GDP level, the indicators of sustainable development, and the threats (Table 4.20). \nIf a threshold value of $I_s(X_i; X_k)$ is specified (here, $I_s(X_i; X_k) \\ge 0.75$), the essential dependences between global threats and indicators of sustainable development can be singled out (in Table 4.20, they are shown in bold). \nAs can be seen from Table 4.20, the variables BB, GD, ND, NI, WA, and GINI have less influence on the indicators of sustainable development. The indirect influence of these variables is determined similarly, by calculating the mutual information between the threats. \nThe BBN topology synthesized in this manner, intended for the conceptual analysis and modeling of the influence of global threats on the sustainable development of countries and regions of the world, is represented by the block diagram in Fig. 4.24. \n4.6.4 Modelling the Influence of Global Threats on the Sustainable Development of Countries and Regions of the World with the Use of BBNs \nWe perform the computer modeling of the influence of global threats in several stages. \n1. 
Discretization of Initial Data. Since only discrete variables can be used in a BBN model, we reduce the data of Table 4.19 to three discretization levels: high (H), average (M), and low (L). We also introduce the value unknown (U) for variables whose values are not known. \n2. Construction of a Bayesian Belief Network. To construct and parametrically adjust such a model, we use the GeNIe 2.0 system [48], which is designed for building and simulating Bayesian networks. Fig. 4.25 shows the BBN constructed in GeNIe 2.0. Such a BBN makes it possible to estimate the degree of belief in hypotheses about the influence of various threats on the indicators of sustainable development of countries and regions of the world and about the causal relations between these variables. \nFor example, suppose the evidence $E_1: GDP = H$ is established, i.e., $P(GDP = H) = 1$. As a result, the degree of belief in the hypotheses $H_1: (Q = H)$, $H_2: (Iql = H)$, $H_3: (Isec = H)$, $H_4: (Ie = H)$, $H_5: (Iec = H)$, and $H_6: (Is = H)$ is very high and amounts to 1.00, whereas for the hypothesis $H_7: (CP = L)$ the degree of belief equals only 0.25. Thus, countries with a high GDP always have high values of all indicators of sustainable development, but this says nothing about the level of threats such as CP. \n3. Modeling of a BBN. In modeling a BBN, we set the task of testing hypotheses of the following form: «If the value (evalue) of some model variable (evar) is known, what will be the expected value (hvalue) of another variable (hvar)?» \nFor this hypothesis we have: \nTo test the formulated hypotheses, the SMILE library [48] was used, and the BBN was simulated with an exhaustive search over evidence. \nAs a result, we obtained a collection of results with a total length of $3^{16}$ = 43,046,721 rows. 
Clearly, a semantic interpretation of results of this length is practically impossible. \nTherefore, the data must be reduced and formally generalized. \nTo reduce the data, we estimated the probabilities of type I ($\\alpha$) and type II ($\\beta$) errors and removed the rows with high $\\alpha$ and low power $(1 - \\beta)$. \n4. Formal Generalization of Results of Modeling. To generalize the obtained results, we apply the set-theoretic approach [49], according to which a generalization of facts given by their specifications is obtained through set-theoretic operations over these specifications. With the evidence $E_i^j: X_i = x_i^j$ we associate the Boolean function \nwhere $P_t$ is a given belief threshold. \nWe also define $f(E_i^U: X_i = U)$ as the conjunction of the following functions: \nThen the following conjunction corresponds to a collection of evidence $e = E_1, E_2, \\ldots, E_n$: \nFor a hypothesis $H$ justified on a set of collections of evidence $\\{e_j\\}$, $j = \\overline{1,m}$, we have \nApplying the rule of implication and the Quine–McCluskey covering method [50] to this expression, one can obtain a minimal set covering all the collections of evidence for which the hypothesis $H$ is justified. \n4.6.5 Interpretation of the Generalized Results of Modeling \nTable 4.21 generalizes the results of modeling in the form of minimal sets of collections of evidence that justify the respective hypotheses.",
        "chapter": "4 Intellectual Analysis of Systemic World Conflicts and Global Forecast for the 21st Century",
        "section": "4.6 Influence of Global Threats on the Sustainable Development of Countries and Regions of the World",
        "subsection": "4.6.4 Modelling the Influence of Global Threats on the Sustainable Development of Countries and Regions of the World with the Use of BBNs",
        "subsubsection": "N/A"
    },
    {
        "content": "For a hypothesis $H$ justified on a set of collections of evidence $\\{e_j\\}$, $j = \\overline{1,m}$, we have \nApplying the rule of implication and the Quine–McCluskey covering method [50] to this expression, one can obtain a minimal set covering all the collections of evidence for which the hypothesis $H$ is justified. \n4.6.5 Interpretation of the Generalized Results of Modeling \nTable 4.21 generalizes the results of modeling in the form of minimal sets of collections of evidence that justify the respective hypotheses. \nAnalysis of the simulation results in Table 4.21 (the evidence is shown in bold) allows us to assess the expected levels of the indicators of sustainable development and the degree of manifestation of threats under uncertainty, when the values of some indicators are unknown. \nFor example, the analysis of lines 1–15, which give the hypotheses under the condition that one of the indicators of sustainable development is known, shows that these indicators are closely interrelated. Thus, a high level of GDP is always accompanied by high levels of all indicators of sustainable development (line 1); a low level of Iql is accompanied by low levels of GDP and Is and also determines a low level of Q (line 5); a high level of Iec is accompanied by high levels of Ie, Is, and Isec and determines high levels of Q and Iql (line 9); if Ie is known to be low, then GDP, Q, and Iql will also be low (line 10); knowing that Is is low allows us to say that GDP and Iql are also low (line 11); a high level of Is is accompanied by high levels of all indicators of sustainable development except those directly related to economic development, i.e., 
GDP and Iec (line 12); a low level of Isec is accompanied by low levels of GDP, Q, and Is, as well as a low level of the ND threat (line 13). \nIn general, knowing the levels of the indicators of sustainable development does not make it possible to predict the levels of the threats (lines 1–15). \nThe second part of the table (lines 16–28) gives the hypotheses under the condition that the level of one of the threats is known. The analysis of these lines shows that the threats are interrelated: some of them influence the indicators of sustainable development directly, and others indirectly. \nFor example, a low level of CI is accompanied by low levels of ND and WA (line 16); if the level of CP is known to be low, then the levels of the ND, WA, and SF threats will also be low, which corresponds to high levels of the Iec and Isec indicators (line 18); a high level of the CP threat is accompanied by a high level of the CI threat (line 20); a high level of the GD threat corresponds to high levels of the NI and SF threats (line 21); a low level of NI indicates that the threats CP, IC, WA, and SF are also low and that the sustainability indicators Iec and Isec will be high (line 24); a low level of the SF threat corresponds to low levels of the IC, ND, and WA threats, as well as a high level of the Isec indicator. \n4.6.6 Visualization of Data on Indicators of Sustainable Development for Countries and Regions of the World \nIn this section, we present in tabular and visual form the relations between the levels of vulnerability of countries and regions of the world to global threats and the indicators of sustainable development in the global context. Data on the indicators of sustainable development, global threats, safety levels, and GDPs of the countries of the world in 2016 are taken from [4], ordered according to the cluster analysis method (4.13), and presented in Table 4.22.",
        "chapter": "4 Intellectual Analysis of Systemic World Conflicts and Global Forecast for the 21st Century",
        "section": "4.6 Influence of Global Threats on the Sustainable Development of Countries and Regions of the World",
        "subsection": "4.6.5 Interpretation of the Generalized Results of Modeling",
        "subsubsection": "N/A"
    },
    {
        "content": "Analysis of the simulation results in Table 4.21 (the evidence is shown in bold) allows us to assess the expected levels of the indicators of sustainable development and the degree of manifestation of threats under uncertainty, when the values of some indicators are unknown. \nFor example, the analysis of lines 1–15, which give the hypotheses under the condition that one of the indicators of sustainable development is known, shows that these indicators are closely interrelated. Thus, a high level of GDP is always accompanied by high levels of all indicators of sustainable development (line 1); a low level of Iql is accompanied by low levels of GDP and Is and also determines a low level of Q (line 5); a high level of Iec is accompanied by high levels of Ie, Is, and Isec and determines high levels of Q and Iql (line 9); if Ie is known to be low, then GDP, Q, and Iql will also be low (line 10); knowing that Is is low allows us to say that GDP and Iql are also low (line 11); a high level of Is is accompanied by high levels of all indicators of sustainable development except those directly related to economic development, i.e., GDP and Iec (line 12); a low level of Isec is accompanied by low levels of GDP, Q, and Is, as well as a low level of the ND threat (line 13). \nIn general, knowing the levels of the indicators of sustainable development does not make it possible to predict the levels of the threats (lines 1–15). \nThe second part of the table (lines 16–28) gives the hypotheses under the condition that the level of one of the threats is known. The analysis of these lines shows that the threats are interrelated: some of them influence the indicators of sustainable development directly, and others indirectly. 
\nFor example, a low level of CI is accompanied by low levels of ND and WA (line 16); if the level of CP is known to be low, then the levels of the ND, WA, and SF threats will also be low, which corresponds to high levels of the Iec and Isec indicators (line 18); a high level of the CP threat is accompanied by a high level of the CI threat (line 20); a high level of the GD threat corresponds to high levels of the NI and SF threats (line 21); a low level of NI indicates that the threats CP, IC, WA, and SF are also low and that the sustainability indicators Iec and Isec will be high (line 24); a low level of the SF threat corresponds to low levels of the IC, ND, and WA threats, as well as a high level of the Isec indicator. \n4.6.6 Visualization of Data on Indicators of Sustainable Development for Countries and Regions of the World \nIn this section, we present in tabular and visual form the relations between the levels of vulnerability of countries and regions of the world to global threats and the indicators of sustainable development in the global context. Data on the indicators of sustainable development, global threats, safety levels, and GDPs of the countries of the world in 2016 are taken from [4], ordered according to the cluster analysis method (4.13), and presented in Table 4.22. \nBased on the data presented in Table 4.22, Fig. 4.26a, b illustrate the indicators of sustainable development and people’s quality of life for different countries and regions of the world. \n4.6.7 Conclusions \n1. This section proposes a tested methodology for applying BBNs to the qualitative analysis of dependences and the establishment of causal relations between the levels of vulnerability of countries and regions of the world to global threats and the indicators of sustainable development in the global context. \n2. 
Measures of the specific informativeness of a variable and the specific mutual informativeness of variables are introduced; they can be used to select the essential variables and to define the causal relations between them when synthesizing BBN topologies. It is shown that the influence of threats such as vulnerability to natural disasters, the balance between the biological productivity of a territory and its total consumption, income inequality, and state instability on the level of sustainable development of countries of the world is less essential than that of other global threats. Therefore, these threats were excluded from consideration. \n3. BBNs were simulated to test $3^{16}$ hypotheses. The significance level $\\alpha$ and the power $(1 - \\beta)$ were used as criteria for data reduction. A method for generalizing the modeling results is proposed that constructs a Boolean function for a set of collections of evidence, which makes it possible to use the methods of Boolean algebra to obtain a minimal set of collections of evidence justifying each hypothesis. \n4. Analysis of the modeling results leads to the conclusion that the indicators of sustainable development are closely interrelated, but knowledge of their values gives no grounds for expecting any specific levels of threats. On the other hand, the threats are also interconnected: some have a direct impact on the indicators of sustainable development, and others an indirect one. In certain cases, knowing the values of some threats determines the levels of the indicators of sustainable development. 
\n4.7 The General Concept of the Periodic Systemic World Conflicts \nInvestigating the global evolutionary development of civilization as a complex, integral, self-organizing system requires taking into account a number of interrelated processes and factors of various natures, among which global conflicts occupy one of the central places. Finding and constructing general models that adequately describe the regularities of world conflicts remains one of the most important unsolved problems facing science. Despite numerous attempts, no adequate, scientifically justified metric toolkit has yet been proposed for the global forecasting of the development of world conflicts; such a toolkit is especially needed now that global civilization has entered the 21st century as a special critical phase of its development [18, 26, 30].",
        "chapter": "4 Intellectual Analysis of Systemic World Conflicts and Global Forecast for the 21st Century",
        "section": "4.6 Influence of Global Threats on the Sustainable Development of Countries and Regions of the World",
        "subsection": "4.6.6 Visualization of Data on Indicators of Sustainable Development for Countries and Regions of the World",
        "subsubsection": "N/A"
    },
    {
        "content": "Based on the data presented in Table 4.22, Fig. 4.26a, b illustrate the indicators of sustainable development and people’s quality of life for different countries and regions of the world. \n4.6.7 Conclusions \n1. This section proposes a tested methodology for applying BBNs to the qualitative analysis of dependences and the establishment of causal relations between the levels of vulnerability of countries and regions of the world to global threats and the indicators of sustainable development in the global context. \n2. Measures of the specific informativeness of a variable and the specific mutual informativeness of variables are introduced; they can be used to select the essential variables and to define the causal relations between them when synthesizing BBN topologies. It is shown that the influence of threats such as vulnerability to natural disasters, the balance between the biological productivity of a territory and its total consumption, income inequality, and state instability on the level of sustainable development of countries of the world is less essential than that of other global threats. Therefore, these threats were excluded from consideration. \n3. BBNs were simulated to test $3^{16}$ hypotheses. The significance level $\\alpha$ and the power $(1 - \\beta)$ were used as criteria for data reduction. A method for generalizing the modeling results is proposed that constructs a Boolean function for a set of collections of evidence, which makes it possible to use the methods of Boolean algebra to obtain a minimal set of collections of evidence justifying each hypothesis. \n4. Analysis of the modeling results leads to the conclusion that the indicators of sustainable development are closely interrelated, but knowledge of their values gives no grounds for expecting any specific levels of threats. 
On the other hand, the threats are also interconnected: some have a direct impact on the indicators of sustainable development, and others an indirect one. In certain cases, knowing the values of some threats determines the levels of the indicators of sustainable development. \n4.7 The General Concept of the Periodic Systemic World Conflicts \nInvestigating the global evolutionary development of civilization as a complex, integral, self-organizing system requires taking into account a number of interrelated processes and factors of various natures, among which global conflicts occupy one of the central places. Finding and constructing general models that adequately describe the regularities of world conflicts remains one of the most important unsolved problems facing science. Despite numerous attempts, no adequate, scientifically justified metric toolkit has yet been proposed for the global forecasting of the development of world conflicts; such a toolkit is especially needed now that global civilization has entered the 21st century as a special critical phase of its development [18, 26, 30].",
        "chapter": "4 Intellectual Analysis of Systemic World Conflicts and Global Forecast for the 21st Century",
        "section": "4.6 Influence of Global Threats on the Sustainable Development of Countries and Regions of the World",
        "subsection": "4.6.7 Conclusions",
        "subsubsection": "N/A"
    },
    {
        "content": "In the context of finding a solution to this problem, and based on a systemic generalization of the results of analyzing extensive empirical material [1, 51, 52], the studies [2, 20, 29] revealed and justified a hypothetical pattern of systemic world conflicts. This pattern rests on the so-called dynamic model of $C$-waves, which is built on the “golden section” metric. \nIn this section we propose a general concept of $C$-waves of systemic world conflicts by generalizing and formalizing the approaches considered in [2, 20, 29]. We analyze the class of $C$-waves that envelop super-long time intervals and show that the pattern of big $C$-waves is invariant with respect to the evolution of the nature of world conflicts. \n4.7.1 Some Concepts and Definitions \nLet $W_c(m, n)$ be the set of all world conflicts that occurred from year $m$ to year $n$, where $m, n \\in Z$ and $Z$ is the set of integers. \nIn what follows, years of the Common Era (AD) correspond to positive values of $m$ and $n$, and years before Christ (BC) correspond to negative ones. \nLet us associate every year $s$ of the time interval $\\mathcal{I}(m, n) \\triangleq [m, n] \\cap Z$ with the following group of world conflicts: \nwhere \nBasic definition. We say that the sequence of groups of world conflicts \ngenerates, on the time interval $\\mathcal{I}(m, n)$, the family \nof waves of Systemic World Conflicts (SWC), briefly SWC-waves or $C$-waves, if there exists a block matrix \nfor which the following conditions hold: \nwhere \nThe elements of the matrices $\\mathcal{I} = \\|\\mathcal{I}_{k,i}\\|_{k=\\overline{1,N},\\, i=\\overline{1,5}}$, $\\chi = \\|\\tau_{k,i}\\|_{k=\\overline{1,N},\\, i=\\overline{1,5}}$, and $\\mathcal{E} = \\|\\mathcal{E}_{k,i}\\|_{k=\\overline{1,N},\\, i=\\overline{1,5}}$ define the following parameters of the set $\\mathcal{M}_c(m, n)$ of SWC-waves: 
\n(1) $\\mathcal{I}_{k,0}$ is the time interval of the life cycle of the $C_k$-wave, $k = \\overline{1,N}$; \n(2) $\\mathcal{I}_{k,i}$ is the time interval of the $i$th phase $f_{k,i}$ of the $C_k$-wave, $k = \\overline{1,N}$, $i = \\overline{1,5}$, namely, \n(3) $\\tau_{k,0}$ is the life-cycle duration of the $C_k$-wave, $k = \\overline{1,N}$; \n(4) $\\tau_{k,i}$ is the duration of the phase $f_{k,i}$ of the $C_k$-wave, $k = \\overline{1,N}$, $i = \\overline{1,5}$; \n(5) $\\mathcal{E}_{k,0}$ is the power of the $C_k$-wave, $k = \\overline{1,N}$; \n(6) $\\mathcal{E}_{k,i}$ is the power of the phase $f_{k,i}$ of the $C_k$-wave, $k = \\overline{1,N}$, $i = \\overline{1,5}$. 
\nDefinition 1 We call $\\mathcal{I}$ the interval matrix, $\\chi$ the chronometric matrix, and $\\mathcal{E}$ the energy matrix of the evolutionary structurization process $\\pi_{swc}^{es}(\\mathcal{L}_c(m, n))$ of the family $\\mathcal{M}_c(m, n)$ of systemic world conflict waves. 
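The definitions above can be mirrored in a small data structure. The Python sketch below is illustrative only. The names `BlockMatrix` and `duration` are hypothetical. The sketch assumes that a duration $\\tau$ is the number of integer years in the corresponding interval, $|[m, n] \\cap Z| = n - m + 1$, which the text does not state explicitly. The conditions linking the three matrices are not reproduced here and are not checked.

```python
from dataclasses import dataclass
from typing import List, Tuple

# A year interval (m, n) stands for the integer years [m, n]; BC years are negative.
Interval = Tuple[int, int]


def duration(interval: Interval) -> int:
    # Assumed reading: tau = number of integer years in [m, n], i.e. n - m + 1
    m, n = interval
    if n < m:
        raise ValueError('empty interval')
    return n - m + 1


@dataclass
class BlockMatrix:
    # Row k - 1 describes wave C_k; column 0 is the whole life cycle,
    # columns 1..5 are the phases f_{k,i}
    intervals: List[List[Interval]]  # interval matrix I = ||I_{k,i}||
    powers: List[List[float]]        # energy matrix E = ||E_{k,i}||

    def __post_init__(self) -> None:
        if len(self.intervals) != len(self.powers):
            raise ValueError('I and E must have the same number of rows')
        for row_i, row_e in zip(self.intervals, self.powers):
            if len(row_i) != 6 or len(row_e) != 6:
                raise ValueError('each row needs a life-cycle column and five phase columns')

    @property
    def dimension(self) -> int:
        # dim((SWC)_alpha) = N, the number of waves (rows)
        return len(self.intervals)

    def chronometric(self) -> List[List[int]]:
        # Chronometric matrix chi = ||tau_{k,i}||, derived from I under the reading above
        return [[duration(iv) for iv in row] for row in self.intervals]


# Toy one-wave example (hypothetical numbers, not taken from the text)
toy = BlockMatrix(
    intervals=[[(0, 9), (0, 1), (2, 3), (4, 5), (6, 7), (8, 9)]],
    powers=[[1.0, 0.2, 0.5, 1.0, 0.5, 0.2]],
)
print(toy.dimension, toy.chronometric())  # 1 [[10, 2, 2, 2, 2, 2]]
```

The sketch stores only $\\mathcal{I}$ and $\\mathcal{E}$ and derives $\\chi$ from them. Under the assumed duration convention, this keeps the three blocks of $\\alpha$ consistent by construction.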
\nDefinition 2 Let the family $\\mathcal{M}_c(m, n)$ of SWC-waves be a uniform complete space-time object defined by the block matrix $\\alpha$. We call it the $\\alpha$-ensemble of SWC-waves of systemic world conflicts, briefly, the $(SWC)_\\alpha$-ensemble. \nLet us represent it as follows: \nwhere the local “systemic merge” operator combines the “fragments” $C_{k,i}$ and $C_{k,i+1}$ of the $C_k$-wave that correspond to the phases $f_{k,i}$ and $f_{k,i+1}$, $k = \\overline{1,N}$, $i = \\overline{1,4}$, and the global system merge operator $\\mathsf{G}$ combines the waves $C_k$ and $C_{k+1}$, $k = \\overline{1,N-1}$. We call the number $\\dim((SWC)_\\alpha) \\equiv N$ the dimension of the ensemble $(SWC)_\\alpha$, and the time interval $\\mathcal{I}(m, n)$ the interval of manifestation of the $(SWC)_\\alpha$-ensemble. \n4.7.2 Geometric Images of $C_k$-Waves and Ensemble of $(SWC)_\\alpha$-Waves of Systemic World Conflicts \nLet us consider the following geometric images (phase portraits) of $C_k$-waves and of the ensemble of $(SWC)_\\alpha$-waves of systemic world conflicts: \n1. The geometric image of the $C_k$-wave, $k = \\overline{1,N}$, can be represented as the graph (Fig. 4.27a) of the following step function: \n(where $\\mathcal{E}_{k,i}$ is defined by (4.39), (4.40)), or as the graph of some continuous function $\\bar{h}_{C_k}(t)$ that approximates the step function $h_{C_k}(t)$ on the interval $[m_k, n_k]$ (Fig. 4.27b).",
        "chapter": "4 Intellectual Analysis of Systemic World Conflicts and Global Forecast for the 21st Century",
        "section": "4.7 The General Concept of the Periodic Systemic World Conflicts",
        "subsection": "4.7.1 Some Concepts and Definitions",
        "subsubsection": "N/A"
    },
    {
        "content": "The number $\\dim((SWC)_\\alpha) \\equiv N$ is called the dimension of the ensemble $(SWC)_\\alpha$, and the time interval $\\mathcal{I}(m, n)$ is called the interval of manifestation of the $(SWC)_\\alpha$-ensemble. \n4.7.2 Geometric Images of $C_k$-Waves and Ensemble of $(SWC)_\\alpha$-Waves of Systemic World Conflicts \nLet us consider the following geometric images (phase portraits) of $C_k$-waves and of the ensemble of $(SWC)_\\alpha$-waves of systemic world conflicts: \n1. The geometric image of the $C_k$-wave, $k = \\overline{1,N}$, can be represented as the graph (Fig. 4.27a) of the following step function: \n(where $\\mathcal{E}_{k,i}$ is defined by (4.39), (4.40)), or as the graph of some continuous function $\\bar{h}_{C_k}(t)$ that approximates the step function $h_{C_k}(t)$ on the interval $[m_k, n_k]$ (Fig. 4.27b). \n2. The geometric image of the ensemble of $(SWC)_\\alpha$-waves of systemic world conflicts (the “collective phase portrait” of the family $\\mathcal{M}_c(m, n)$ of $C_k$-waves, $k = \\overline{1,N}$) can be represented as the graph (Fig. 4.28a) of the following step function: \n(where the function $h_{C_k}(t)$ is defined by (4.46)), or as the graph of some continuous function $\\bar{h}_C(t)$ that approximates the step function $h_C(t)$ on the interval $[m, n]$ (Fig. 4.28b). \n4.7.3 Significant Features of SWC-Concept \nWe regard the correspondence of the SWC-concept under study to the structural harmony principle as its main key feature (F1) [29]. 
\nLet \nThen, according to (4.38) and (4.41), the variation in the life-cycle duration $T(C_k)$ along the sequence of $C_k$-waves, $k = \\overline{1,N}$, obeys the following principle: \nwhere $F_{N-k+1}$ is a number from the Fibonacci sequence. This indicates that the golden section is present in the structure of the $(SWC)_\\alpha$-ensemble. According to the structural harmony principle [23], in any self-organizing system each operating mode has its own special time scale, which varies with the structural states of the system. This time scale is “tied” to a certain invariant of the generalized golden section, which characterizes the steady state of the system. \nWe call the constant $\\kappa_c$ the universal time metric quantum of the life cycles of the $C_k$-waves of the $(SWC)_\\alpha$-ensemble. In what follows, we call the Fibonacci regularity (4.38), (4.41), (4.49), (4.50) of the development of systemic world conflicts the F-regularity. \nThe second key feature (F2) of the concept is that the proposed dynamic model of $C$-waves reveals the universal effect of the acceleration of historical time [19, 24, 27], since, according to (4.38) and (4.41), a successive reduction of the",
        "chapter": "4 Intellectual Analysis of Systemic World Conflicts and Global Forecast for the 21st Century",
        "section": "4.7 The General Concept of the Periodic Systemic World Conflicts",
        "subsection": "4.7.2 Geometric Images of C_{{\\rm K}}^{{}} -Waves and Ensemble of \\left( {SWC} \\right)_{\\alpha } -Waves of Systemic World Conflicts",
        "subsubsection": "N/A"
    },
    {
        "content": "(where the function $h_{C_k}(t)$ is defined by (4.46)), or as the graph of some continuous function $\\bar{h}_C(t)$ that approximates the step function $h_C(t)$ on the interval $[m, n]$ (Fig. 4.28b). \n4.7.3 Significant Features of SWC-Concept \nWe regard the correspondence of the SWC-concept under study to the structural harmony principle as its main key feature (F1) [29]. \nLet \nThen, according to (4.38) and (4.41), the variation in the life-cycle duration $T(C_k)$ along the sequence of $C_k$-waves, $k = \\overline{1,N}$, obeys the following principle: \nwhere $F_{N-k+1}$ is a number from the Fibonacci sequence. This indicates that the golden section is present in the structure of the $(SWC)_\\alpha$-ensemble. According to the structural harmony principle [23], in any self-organizing system each operating mode has its own special time scale, which varies with the structural states of the system. This time scale is “tied” to a certain invariant of the generalized golden section, which characterizes the steady state of the system. \nWe call the constant $\\kappa_c$ the universal time metric quantum of the life cycles of the $C_k$-waves of the $(SWC)_\\alpha$-ensemble. In what follows, we call the Fibonacci regularity (4.38), (4.41), (4.49), (4.50) of the development of systemic world conflicts the F-regularity. \nThe second key feature (F2) of the concept is that the proposed dynamic model of $C$-waves reveals the universal effect of the acceleration of historical time [19, 24, 27], since, according to (4.38) and (4.41), a successive reduction of the duration of the life cycles of $C$-waves of systemic world conflicts takes place as an essential component of the integral evolutionary development of civilization. 
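The following Python sketch makes the F-regularity and the acceleration effect concrete. It assumes that the principle, whose formula is missing above, has the form $\\tau_{k,0} = \\kappa_c F_{N-k+1}$, $k = \\overline{1,N}$, with $F_1 = F_2 = 1$. This form and the helper names are assumptions for illustration. The sketch lists the life-cycle durations in units of $\\kappa_c$ for $N = 7$, together with the ratios of consecutive durations.

```python
from typing import List


def fibonacci(n: int) -> List[int]:
    # [F_1, ..., F_n] with F_1 = F_2 = 1 and F_j = F_{j-1} + F_{j-2}
    seq = [1, 1]
    while len(seq) < n:
        seq.append(seq[-1] + seq[-2])
    return seq[:n]


def life_cycle_durations(kappa: float, n_waves: int) -> List[float]:
    # Assumed form of the F-regularity: tau_{k,0} = kappa * F_{N-k+1}, k = 1, ..., N
    fib = fibonacci(n_waves)
    return [kappa * fib[n_waves - k] for k in range(1, n_waves + 1)]


units = life_cycle_durations(1.0, 7)  # durations in units of kappa_c
ratios = [a / b for a, b in zip(units, units[1:])]
print(units)                          # [13.0, 8.0, 5.0, 3.0, 2.0, 1.0, 1.0]
print([round(x, 3) for x in ratios])  # [1.625, 1.6, 1.667, 1.5, 2.0, 1.0]
```

Each wave is no longer than the one before it, which is the reduction described in F2. The early ratios lie near the golden section $\\varphi = (1 + \\sqrt{5})/2 \\approx 1.618$, while the last ones do not, because small Fibonacci numbers approximate $\\varphi$ poorly.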
\nThe third important property (F3) of $C_k$-waves is their strict hierarchy with respect to the power $\\mathcal{E}_{k,i}$ of the phases $f_{k,i}$, $k = \\overline{1,N}$, $i = \\overline{1,5}$, of their evolutionary development (Fig. 4.29). \nFeatures F1–F3, together with (4.34–4.48) and Figs. 4.27 and 4.28, imply that the family $\\mathcal{M}_c(m, n)$ of $C_k$-waves forms a sequence of irregular, self-similar, local objects. This, in turn, allows us to consider the $(SWC)_\\alpha$-ensemble of waves of systemic world conflicts as a global, integral, self-organizing space–time object of fractal nature. \n4.7.4 Correlation of Processes of Evolutionary Development of Civilization $\\Pi_C^{Ed}$ and Development of C-Waves of Systemic World Conflicts $\\pi_{swc}^{es}(\\mathcal{L}_c(m, n))$ \nLet $\\mathbf{M}$ denote civilization as an integral, open, dynamic, self-organizing system, and let $\\Omega(\\mathbf{M})$ be the set of its possible states. Let also $\\Psi: \\mathcal{I}_\\infty \\mapsto \\Omega(\\mathbf{M})$, where $\\mathcal{I}_\\infty \\equiv \\{\\mathcal{I}(m, n)\\}_{m, n \\in Z}$, be the mapping generated by the process $\\pi_C^{ed}$ of the natural evolutionary development of the system $\\mathbf{M}$. \nDefinition 3 We call $\\Psi[\\mathcal{I}(m_k, n_k)] \\equiv \\Omega(C_k) = \\Omega_k$, $k = \\overline{1,N}$, the $(\\mathcal{F}, k)$-state of the system $\\mathbf{M}$, and we call $\\Psi[\\mathcal{I}(m_{k,i}, n_{k,i})] \\equiv \\Omega(C_{k,i}) = \\Omega_{k,i}$, $k = \\overline{1,N}$, $i = \\overline{1,5}$, the $(\\mathcal{F}, k, i)$-state of the system $\\mathbf{M}$. 
\nSince $F_{N-k+1}$ does not exist for $k > N$ (formally, $F_{N-k+1} \\in \\varnothing$), the sequence of Fibonacci numbers $\\{F_{N-k+1}\\}_{k \\in \\mathbb{N}}$ degenerates for $k > N$. Therefore, the F-regularity revealed on the time interval $\\mathcal{I}(m, n)$ no longer holds for the development of systemic world conflicts at times $t > n$. \nFigures 4.29 and 4.30 show the diagrams $D_g$, $D_{loc}^{(k)}$, and $D_{loc}$, which illustrate the correlation between the process $\\pi_C^{ed}$ of the evolutionary development of civilization and the process $\\pi_{swc}^{es}(\\mathcal{L}_c(m, n))$ of the evolutionary structurization of the sequence of $C_k$-waves of systemic world conflicts, in the global and local contexts (Fig. 4.31).",
        "chapter": "4 Intellectual Analysis of Systemic World Conflicts and Global Forecast for the 21st Century",
        "section": "4.7 The General Concept of the Periodic Systemic World Conflicts",
        "subsection": "4.7.3 Significant Features of SWC-Concept",
        "subsubsection": "N/A"
    },
    {
        "content": "A successive reduction of the duration of the life cycles of $C$-waves of systemic world conflicts takes place as an essential component of the integral evolutionary development of civilization. \nThe third important property (F3) of $C_k$-waves is their strict hierarchy with respect to the power $\\mathcal{E}_{k,i}$ of the phases $f_{k,i}$, $k = \\overline{1,N}$, $i = \\overline{1,5}$, of their evolutionary development (Fig. 4.29). \nFeatures F1–F3, together with (4.34–4.48) and Figs. 4.27 and 4.28, imply that the family $\\mathcal{M}_c(m, n)$ of $C_k$-waves forms a sequence of irregular, self-similar, local objects. This, in turn, allows us to consider the $(SWC)_\\alpha$-ensemble of waves of systemic world conflicts as a global, integral, self-organizing space–time object of fractal nature. \n4.7.4 Correlation of Processes of Evolutionary Development of Civilization $\\Pi_C^{Ed}$ and Development of C-Waves of Systemic World Conflicts $\\pi_{swc}^{es}(\\mathcal{L}_c(m, n))$ \nLet $\\mathbf{M}$ denote civilization as an integral, open, dynamic, self-organizing system, and let $\\Omega(\\mathbf{M})$ be the set of its possible states. Let also $\\Psi: \\mathcal{I}_\\infty \\mapsto \\Omega(\\mathbf{M})$, where $\\mathcal{I}_\\infty \\equiv \\{\\mathcal{I}(m, n)\\}_{m, n \\in Z}$, be the mapping generated by the process $\\pi_C^{ed}$ of the natural evolutionary development of the system $\\mathbf{M}$. 
\nDefinition 3 We call $\\Psi[\\mathcal{I}(m_k, n_k)] \\equiv \\Omega(C_k) = \\Omega_k$, $k = \\overline{1,N}$, the $(\\mathcal{F}, k)$-state of the system $\\mathbf{M}$, and we call $\\Psi[\\mathcal{I}(m_{k,i}, n_{k,i})] \\equiv \\Omega(C_{k,i}) = \\Omega_{k,i}$, $k = \\overline{1,N}$, $i = \\overline{1,5}$, the $(\\mathcal{F}, k, i)$-state of the system $\\mathbf{M}$. \nSince $F_{N-k+1}$ does not exist for $k > N$ (formally, $F_{N-k+1} \\in \\varnothing$), the sequence of Fibonacci numbers $\\{F_{N-k+1}\\}_{k \\in \\mathbb{N}}$ degenerates for $k > N$. Therefore, the F-regularity revealed on the time interval $\\mathcal{I}(m, n)$ no longer holds for the development of systemic world conflicts at times $t > n$. \nFigures 4.29 and 4.30 show the diagrams $D_g$, $D_{loc}^{(k)}$, and $D_{loc}$, which illustrate the correlation between the process $\\pi_C^{ed}$ of the evolutionary development of civilization and the process $\\pi_{swc}^{es}(\\mathcal{L}_c(m, n))$ of the evolutionary structurization of the sequence of $C_k$-waves of systemic world conflicts, in the global and local contexts (Fig. 4.31). \n\n4.7.5 The Problem of Identification (Recognition) of $C$-Waves of Systemic World Conflicts for Big Historical Data \nLet us use the above definitions and notation to consider the general solution scheme for the problem of identifying $C$-waves of systemic world conflicts from big historical data. \nThe First Stage. Statistical analysis of historical data and determination of the set $W_c(r, l)$ of all world conflicts that took place from year $r$ to year $l$. By world conflicts we mean conflicts that claimed no fewer than 1000 lives according to available information sources. \nThe Second Stage. 
Generating the sequence $\\{\\mathcal{L}_{wc}^{(s)}\\}_{s \\in \\mathcal{I}(r, l)}$ of groups of world conflicts (4.34). \nThe Third Stage. Solving the problem of the existence, on the chosen time intervals $\\mathcal{I}(m, n)$, of block matrices of the form $\\alpha = [\\mathcal{I}\\ \\chi\\ \\mathcal{E}]$ that satisfy conditions (4.38–4.45), and developing algorithms to construct such matrices. \nWe distinguish two essentially different cases. \nCase 1 Assume that the required block matrix $\\alpha$ has been constructed on the time interval $\\mathcal{I}(m, n)$, where $r \\le m < n \\le l$. This means that the family $\\mathcal{M}_c(m, n)$ of identified $C_k$-waves, $k = \\overline{1,N}$, “completely falls within” the initial time interval $\\mathcal{I}(r, l)$. In this case, the family $\\mathcal{M}_c(m, n)$ was “revealed” and “remained” in the historical past as a completed integral object. \nHere, for the process of evolutionary structurization of the $(SWC)_\\alpha$-ensemble of waves of systemic world conflicts on the time interval $\\mathcal{I}(m, n)$, we call the block matrix $\\alpha$ an empirical matrix, $\\mathcal{I}$ an empirical interval matrix, $\\chi$ an empirical chronometric matrix, and $\\mathcal{E}$ an empirical energy matrix. 
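The first two stages can be sketched as simple data processing. The Python sketch below rests on several assumptions, all hypothetical:

- the record type `Conflict` and the helpers `world_conflicts` and `groups_by_year`;
- the rule that a conflict belongs to $W_c(r, l)$ when it overlaps $[r, l]$;
- the rule that the group for year $s$ consists of the conflicts active in that year.

The defining formula (4.34) for the groups is not reproduced in the text, and the toy records are not historical data.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Conflict:
    name: str
    start: int   # first year of the conflict (BC years negative)
    end: int     # last year of the conflict
    deaths: int  # estimated number of lives claimed, per the available sources


def world_conflicts(conflicts: List[Conflict], r: int, l: int,
                    min_deaths: int = 1000) -> List[Conflict]:
    # First Stage: W_c(r, l); keeps conflicts that overlap [r, l] (assumption)
    # and claimed no fewer than min_deaths lives
    return [c for c in conflicts
            if c.deaths >= min_deaths and c.start <= l and c.end >= r]


def groups_by_year(conflicts: List[Conflict], r: int, l: int) -> Dict[int, List[str]]:
    # Second Stage (assumed grouping rule): year s in I(r, l) -> conflicts active in year s
    groups: Dict[int, List[str]] = defaultdict(list)
    for c in conflicts:
        for s in range(max(c.start, r), min(c.end, l) + 1):
            groups[s].append(c.name)
    return dict(groups)


# Hypothetical toy records, not historical data
data = [Conflict('A', 1900, 1902, 5000), Conflict('B', 1901, 1901, 200),
        Conflict('C', 1902, 1903, 1500)]
wc = world_conflicts(data, 1900, 1903)
print([c.name for c in wc])             # ['A', 'C']
print(groups_by_year(wc, 1900, 1903))   # {1900: ['A'], 1901: ['A'], 1902: ['A', 'C'], 1903: ['C']}
```

Only the 1000-lives threshold comes from the text. A faithful implementation would replace the grouping rule with the one defined by (4.34).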
\nCase 2 Suppose that, on the considered time interval $\\mathcal{I}(r, l)$, only the fragment $\\{C_1 \\mathsf{G} C_2 \\mathsf{G} \\dots \\mathsf{G} C_M\\}$ of some integral ensemble $(SWC)_\\alpha = \\{C_1 \\mathsf{G} C_2 \\mathsf{G} \\dots \\mathsf{G} C_N\\}$ has been “revealed” and “completely falls” within it. Suppose also that this fragment is identified by the first $M$ rows of some block matrix $\\alpha$ of the form (4.34), namely, by the elements $\\mathcal{I}_{k,i}$, $\\tau_{k,i}$, $\\mathcal{E}_{k,i}$, $k = \\overline{1,M}$, $i = \\overline{1,5}$, constructed on the basis of empirical material. Naturally, the number $M$ should be large enough to support the hypothesis that the F-regularity manifests itself for the first $M$ waves of the $(SWC)_\\alpha$-ensemble. \nThe “missing” fragment $\\{C_{M+1}, C_{M+2}, \\dots, C_N\\}$ of the ensemble is hypothetically defined by the unknown elements $\\mathcal{I}_{k,i}$, $\\tau_{k,i}$, and $\\mathcal{E}_{k,i}$ for $M < k \\le N$, $i = \\overline{1,5}$. It can be “restored” according to (4.38–4.45) by extrapolating the revealed F-regularity to $M < k \\le N$, $i = \\overline{1,5}$. To improve and correct the values of $\\mathcal{I}_{k,i}$, $\\tau_{k,i}$, and $\\mathcal{E}_{k,i}$ for $M < k \\le N$, $i = \\overline{1,5}$, we can use additional information and factors of various kinds from adjacent scientific fields (Fig. 4.32). In this case, we",
        "chapter": "4 Intellectual Analysis of Systemic World Conflicts and Global Forecast for the 21st Century",
        "section": "4.7 The General Concept of the Periodic Systemic World Conflicts",
        "subsection": "4.7.4 Correlation of Processes of Evolutionary Development of Civilization \\varPi_{C}^{Ed} and Development of C-Waves of Systemic World Conflicts \\pi_{swc}^{es} \\left( {{{\\cal L}}_{c} \\left( {m,n} \\right)} \\right) ",
        "subsubsection": "N/A"
    },
    {
        "content": "4.7.5 The Problem of Identification (Recognition) of $C$-Waves of Systemic World Conflicts for Big Historical Data \nLet us use the above definitions and notation to consider the general solution scheme for the problem of identifying $C$-waves of systemic world conflicts from big historical data. \nThe First Stage. Statistical analysis of historical data and determination of the set $W_c(r, l)$ of all world conflicts that took place from year $r$ to year $l$. By world conflicts we mean conflicts that claimed no fewer than 1000 lives according to available information sources. \nThe Second Stage. Generating the sequence $\\{\\mathcal{L}_{wc}^{(s)}\\}_{s \\in \\mathcal{I}(r, l)}$ of groups of world conflicts (4.34). \nThe Third Stage. Solving the problem of the existence, on the chosen time intervals $\\mathcal{I}(m, n)$, of block matrices of the form $\\alpha = [\\mathcal{I}\\ \\chi\\ \\mathcal{E}]$ that satisfy conditions (4.38–4.45), and developing algorithms to construct such matrices. \nWe distinguish two essentially different cases. \nCase 1 Assume that the required block matrix $\\alpha$ has been constructed on the time interval $\\mathcal{I}(m, n)$, where $r \\le m < n \\le l$. This means that the family $\\mathcal{M}_c(m, n)$ of identified $C_k$-waves, $k = \\overline{1,N}$, “completely falls within” the initial time interval $\\mathcal{I}(r, l)$. In this case, the family $\\mathcal{M}_c(m, n)$ was “revealed” and “remained” in the historical past as a completed integral object. 
\nHere, for the process of evolutionary structurization of the $(SWC)_\\alpha$-ensemble of waves of systemic world conflicts on the time interval $\\mathcal{I}(m, n)$, we call the block matrix $\\alpha$ an empirical matrix, $\\mathcal{I}$ an empirical interval matrix, $\\chi$ an empirical chronometric matrix, and $\\mathcal{E}$ an empirical energy matrix. \nCase 2 Suppose that, on the considered time interval $\\mathcal{I}(r, l)$, only the fragment $\\{C_1 \\mathsf{G} C_2 \\mathsf{G} \\dots \\mathsf{G} C_M\\}$ of some integral ensemble $(SWC)_\\alpha = \\{C_1 \\mathsf{G} C_2 \\mathsf{G} \\dots \\mathsf{G} C_N\\}$ has been “revealed” and “completely falls” within it. Suppose also that this fragment is identified by the first $M$ rows of some block matrix $\\alpha$ of the form (4.34), namely, by the elements $\\mathcal{I}_{k,i}$, $\\tau_{k,i}$, $\\mathcal{E}_{k,i}$, $k = \\overline{1,M}$, $i = \\overline{1,5}$, constructed on the basis of empirical material. Naturally, the number $M$ should be large enough to support the hypothesis that the F-regularity manifests itself for the first $M$ waves of the $(SWC)_\\alpha$-ensemble. \nThe “missing” fragment $\\{C_{M+1}, C_{M+2}, \\dots, C_N\\}$ of the ensemble is hypothetically defined by the unknown elements $\\mathcal{I}_{k,i}$, $\\tau_{k,i}$, and $\\mathcal{E}_{k,i}$ for $M < k \\le N$, $i = \\overline{1,5}$. It can be “restored” according to (4.38–4.45) by extrapolating the revealed F-regularity to $M < k \\le N$, $i = \\overline{1,5}$. 
To improve and correct the values of $\\mathcal{I}_{k,i}$, $\\tau_{k,i}$, and $\\mathcal{E}_{k,i}$ for $M < k \\le N$, $i = \\overline{1,5}$, we can use additional information and factors of various kinds from adjacent scientific fields (Fig. 4.32). In this case, we speak of the hypothetical F-regularity of the development of systemic world conflicts on the time interval $\\mathcal{I}(m, n)$. For the process of evolutionary structurization of the $(SWC)_\\alpha$-ensemble on the time interval $\\mathcal{I}(m, n)$, we call the block matrix $\\alpha$ the hypothetical matrix, $\\mathcal{I}$ the hypothetical interval matrix, $\\chi$ the hypothetical chronometric matrix, and $\\mathcal{E}$ the hypothetical energy matrix. \n[Fig. 4.32: block structure of (a) the interval matrix $\\mathcal{I}$, (b) the chronometric matrix $\\chi$, and (c) the energy matrix $\\mathcal{E}$. Rows $C_1, \\dots, C_M$: determining empirical values for $k = \\overline{1,M}$, $i = \\overline{1,5}$, and establishing the empirical F-pattern. Rows $C_{M+1}, \\dots, C_N$: extrapolation, determining hypothetical values of $\\mathcal{I}_{k,i}$, $\\tau_{k,i}$, $\\mathcal{E}_{k,i}$ for $k = \\overline{M+1,N}$, $i = \\overline{1,5}$, subject to additional conditions.] \n4.7.6 Big C-Waves of Systemic World Conflicts \nAccording to the basic definition, various classes (families) of $C$-waves of systemic world conflicts can hypothetically exist on different time intervals. As follows from [25, 53], the most interesting of them are the classes of $C$-waves that envelop super-long time intervals and whose F-regularity of development is invariant with respect to the evolution of the nature of world conflicts. In what follows, we call such $C$-waves big waves of systemic world conflicts (briefly, big $C$-waves). Knowledge of the structural parameters of big $C$-waves plays an important role in developing new metric approaches to predicting global periodic civilization processes of various kinds [1, 2, 53]. 
\nBased on a systematic generalization of the analysis of the empirical sequence of world conflicts that took place from 2500 BC to 2007 AD [4–6], the studies [7–9] identify the ensemble $(SWC)_\\alpha = \\{C_1 \\mathsf{G} C_2 \\mathsf{G} \\dots \\mathsf{G} C_7\\}$ of big $C$-waves of world conflicts (of dimension $\\dim((SWC)_\\alpha) = 7$). This ensemble occupies the super-long time interval $\\mathcal{I}(-750; 2092)$ of approximately 3000 years, which envelops various epochs of the development of civilization. Figure 4.33 illustrates the invariance of the F-regularity of the development of the identified $C_k$-waves with respect to the evolution of the nature of world conflicts.",
        "chapter": "4 Intellectual Analysis of Systemic World Conflicts and Global Forecast for the 21st Century",
        "section": "4.7 The General Concept of the Periodic Systemic World Conflicts",
        "subsection": "4.7.5 The Problem of Identification (Recognition) of C-Waves of Systemic World Conflicts for Big Historical Data",
        "subsubsection": "N/A"
    },
    {
        "content": "4.7.6 Big C-Waves of Systemic World Conflicts \nAccording to the basic definition, various classes (families) of $C$-waves of systemic world conflicts can hypothetically exist on different time intervals. As follows from [25, 53], the most interesting of them are the classes of $C$-waves that envelop super-long time intervals and whose F-regularity of development is invariant with respect to the evolution of the nature of world conflicts. In what follows, we call such $C$-waves big waves of systemic world conflicts (briefly, big $C$-waves). Knowledge of the structural parameters of big $C$-waves plays an important role in developing new metric approaches to predicting global periodic civilization processes of various kinds [1, 2, 53]. \nBased on a systematic generalization of the analysis of the empirical sequence of world conflicts that took place from 2500 BC to 2007 AD [4–6], the studies [7–9] identify the ensemble $(SWC)_\\alpha = \\{C_1 \\mathsf{G} C_2 \\mathsf{G} \\dots \\mathsf{G} C_7\\}$ of big $C$-waves of world conflicts (of dimension $\\dim((SWC)_\\alpha) = 7$). This ensemble occupies the super-long time interval $\\mathcal{I}(-750; 2092)$ of approximately 3000 years, which envelops various epochs of the development of civilization. Figure 4.33 illustrates the invariance of the F-regularity of the development of the identified $C_k$-waves with respect to the evolution of the nature of world conflicts. \n\nOn the time interval $\\mathcal{I}(-750; 2007)$, six “exhibited” $C_k$-waves of systemic world conflicts were identified. 
The structural parameters of the seventh, final (predicted) wave $C_7$ are found by extrapolating the F-regularity revealed for the fragment $\\{C_1 \\mathsf{G} C_2 \\mathsf{G} C_3 \\mathsf{G} C_4 \\mathsf{G} C_5 \\mathsf{G} C_6\\}$ of the $(SWC)_\\alpha$-ensemble of waves of systemic world conflicts. \nTo refine the predicted values of the structural parameters of wave $C_7$, we used additional information from various adjacent scientific fields. For example, we took into account the influence of 12 global threats that can “heat up” the global world conflict generated by the wave $C_7$, called the “Conflict of the 21st century” (Table 4.3). \nWe also took into account the possible influence of some other special local factors on the values of the structural parameters of the predicted $C_7$-wave. Some of them are presented in Fig. 4.34 for the energy safety (ES) threat, where $t_{HK}^{*}$ is the Horner–Kapitsa singularity point [21] and $t_N^{*}$ is the Newton singularity point. We also specified critical time intervals related to the exhaustion of the traditional energy resources of the Earth: oil (O), gas (G), and uranium (U). \nWe have found the values of the elements $\\mathcal{I}_{k,i}$, $\\tau_{k,i}$, $\\mathcal{E}_{k,i}$, $k = \\overline{1,7}$, $i = \\overline{1,5}$, of the hypothetical block matrix $\\alpha = [\\mathcal{I}\\ \\chi\\ \\mathcal{E}]$ of the process of evolutionary structurization of the $(SWC)_\\alpha$-ensemble $\\{C_1 \\mathsf{G} C_2 \\mathsf{G} \\dots \\mathsf{G} C_7\\}$ of big $C_k$-waves of systemic world conflicts. 
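The sketch below is a rough, hypothetical check of the extrapolation step (Case 2 with $M = 6$, $N = 7$). It fits the quantum $\\kappa_c$ by ordinary least squares to the life-cycle intervals $\\mathcal{I}_{k,0}$ of the six exhibited waves given in this section. It then restores the interval of $C_7$. It rests on these assumptions:

- the principle has the form $\\tau_{k,0} = \\kappa_c F_{N-k+1}$;
- durations are counted as $n - m + 1$;
- the least-squares fit and the names `fit_kappa` and `extrapolate` are illustrative choices, not the authors' procedure.

The authors' procedure also draws on the additional factors described above.

```python
from typing import List, Tuple

# Life-cycle intervals I_{k,0} of the exhibited waves C_1..C_6, as given in Sect. 4.7.6
EXHIBITED = [(-705, 401), (402, 1074), (1075, 1497), (1498, 1749), (1750, 1919), (1920, 2007)]
N = 7  # dimension of the (SWC)_alpha ensemble


def fibonacci(n: int) -> List[int]:
    # [F_1, ..., F_n] with F_1 = F_2 = 1
    seq = [1, 1]
    while len(seq) < n:
        seq.append(seq[-1] + seq[-2])
    return seq[:n]


def fit_kappa(intervals: List[Tuple[int, int]], n_waves: int) -> float:
    # Least-squares estimate of kappa in tau_{k,0} = kappa * F_{N-k+1} (assumed model)
    fib = fibonacci(n_waves)
    taus = [end - start + 1 for start, end in intervals]  # year 0 is not skipped: a simplification
    weights = [fib[n_waves - k] for k in range(1, len(intervals) + 1)]
    return sum(t * w for t, w in zip(taus, weights)) / sum(w * w for w in weights)


def extrapolate(intervals: List[Tuple[int, int]],
                n_waves: int) -> Tuple[float, List[Tuple[int, int]]]:
    # Case 2: restore the missing waves C_{M+1}..C_N by continuing the F-regularity
    kappa = fit_kappa(intervals, n_waves)
    fib = fibonacci(n_waves)
    start = intervals[-1][1] + 1
    predicted = []
    for k in range(len(intervals) + 1, n_waves + 1):
        length = round(kappa * fib[n_waves - k])
        predicted.append((start, start + length - 1))
        start += length
    return kappa, predicted


kappa, predicted = extrapolate(EXHIBITED, N)
print(round(kappa, 2), predicted)  # 84.83 [(2008, 2092)]
```

Under these assumptions, the fit gives $\\kappa_c \\approx 84.8$ years. The restored seventh wave then occupies the years 2008 to 2092, which coincides with the interval $\\mathcal{I}_{7,0}$ stated in the text.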
\nElements $\\mathcal{T}_{k,i}$, $k = \\overline{1,7}$, $i = \\overline{1,5}$, of the hypothetical interval matrix $\\mathcal{T}$ of the process of evolutionary structurization of the sequence of $C_k$-waves on the time interval $I(-750; 2092)$ are as follows: 
\n$C_1$: $\\mathcal{T}_{1,0} = I(-750; 401)$, $\\mathcal{T}_{1,1} = I(-750; -500)$, $\\mathcal{T}_{1,2} = I(-499; -335)$, $\\mathcal{T}_{1,3} = I(-334; -63)$; 
\n$C_2$: $\\mathcal{T}_{2,0} = I(402; 1074)$, $\\mathcal{T}_{2,1} = I(402; 681)$, $\\mathcal{T}_{2,2} = I(682; 826)$, $\\mathcal{T}_{2,3} = I(827; 970)$; 
\n$C_3$: $\\mathcal{T}_{3,0} = I(1075; 1497)$, $\\mathcal{T}_{3,1} = I(1075; 1146)$, $\\mathcal{T}_{3,2} = I(1147; 1207)$, $\\mathcal{T}_{3,3} = I(1208; 1281)$, $\\mathcal{T}_{3,4} = I(1282; 1436)$, $\\mathcal{T}_{3,5} = I(1437; 1497)$; 
\n$C_4$: $\\mathcal{T}_{4,0} = I(1498; 1749)$, $\\mathcal{T}_{4,1} = I(1498; 1566)$, $\\mathcal{T}_{4,2} = I(1567; 1638)$, $\\mathcal{T}_{4,3} = I(1639; 1660)$; 
\n$C_5$: $\\mathcal{T}_{5,0} = I(1750; 1919)$, $\\mathcal{T}_{5,1} = I(1750; 1788)$, $\\mathcal{T}_{5,2} = I(1789; 1800)$, $\\mathcal{T}_{5,3} = I(1801; 1819)$; 
\n$C_6$: $\\mathcal{T}_{6,0} = I(1920; 2007)$, $\\mathcal{T}_{6,3} = I(1989; 1996)$; 
\n$C_7$: $\\mathcal{T}_{7,0} = I(2008; 2092)$, $\\mathcal{T}_{7,3} = I(2048; 2060)$. 
\nElements $\\tau_{k,i}$, $k = \\overline{1,7}$, $i = \\overline{1,5}$, of the hypothetical chronometric matrix $\\boldsymbol{\\tau}$ of the process of evolutionary structurization of the sequence of $C_k$-waves on the time interval $I(-750; 2092)$ are as follows: 
\nElements $\\mathcal{E}_{k,i}$, $k = \\overline{1,7}$, $i = \\overline{1,5}$, of the hypothetical energy matrix $\\mathcal{E}$ of the process of evolutionary structurization of the sequence of $C_k$-waves on the time interval $I(-750; 2092)$ are as follows: 
\nA key feature of the identified family of big $C_k$-waves, $k = \\overline{1,7}$, of systemic world conflicts is that the most powerful, final (predicted) $C_7$-wave falls entirely within the 21st century and that, according to different independent sources [2, 24, 26, 27, 30, 54–58], the peak of the “System tsunami of the 21st century”, or “New phase passage”, lies in its middle. 
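\nThe F-regularity can be illustrated numerically. The sketch below is a hedged illustration, not the authors’ method: it assumes that the regularity means the duration of each wave $C_k$ is roughly proportional to the Fibonacci number $F_{8-k}$ (13, 8, 5, 3, 2, 1, 1 for $k = 1, \\ldots, 7$), takes the first wave to begin in 750 B.C. (the start of the interval $I(-750; 2092)$), and measures the duration of $I(a; b)$ simply as $b - a$. The names WAVES, fibonacci and duration_ratios are hypothetical.

```python
# Minimal sketch: compare the durations of the big C_k-waves with Fibonacci numbers F(8 - k).
# Assumption (not stated as a formula in the text): duration(C_k) is roughly proportional to F(8 - k).

# Wave intervals T_{k,0} = I(a; b) as (start year, end year); negative years are B.C.
# Simplification: durations are plain differences end - start, with no year-zero correction.
WAVES = [(-750, 401), (402, 1074), (1075, 1497), (1498, 1749),
         (1750, 1919), (1920, 2007), (2008, 2092)]


def fibonacci(n):
    # Standard indexing: F(0) = 0, F(1) = F(2) = 1, F(3) = 2, ...
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a


def duration_ratios(waves):
    # For k = 1..7 return (k, duration in years, F(8 - k), years per Fibonacci unit).
    rows = []
    for k, (start, end) in enumerate(waves, start=1):
        length = end - start
        fib = fibonacci(8 - k)
        rows.append((k, length, fib, length / fib))
    return rows


if __name__ == '__main__':
    for k, length, fib, ratio in duration_ratios(WAVES):
        print(f'C{k}: {length} years, F(8-{k}) = {fib}, years per unit = {ratio:.1f}')
```

With these intervals the quotient (duration divided by $F_{8-k}$) stays between about 84 and 89 years for all seven waves, which is consistent with the proportionality assumed above. For $k = 8$ the divisor would be $F_0 = 0$, which is one way to read the statement that the Fibonacci sequence degenerates beyond $k = 7$. \n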
If the trends formed in the previous phases of history persist, these shocks will hypothetically lead mankind to another phase of development, with a combination of technological progress and a big war [58] as the most probable scenario. 
\nTo establish the relationship between the results of the studies obtained in the previous sections, namely: 
\n– the regularity of the emergence of systemic world conflicts (Sect. 4.2); 
\n– the interrelation between the periodic processes in the global economy and systemic world conflicts (Sect. 4.3); 
\n– the relationship between the sequence of 11-year Schwabe–Wolf cycles of solar activity and the family of C-waves of global systemic conflicts (Sect. 4.5); 
\n– the impact of global threats on the sustainable development of countries and regions of the world (Sect. 4.6), 
\nwe present the generalized profiles of the three most secure countries in the world (Canada, Fig. 4.35; Finland, Fig. 4.36; Australia, Fig. 4.37), the two countries with the largest nuclear potential (USA, Fig. 4.38; Russia, Fig. 4.39), and the three countries with the lowest level of national security (Angola, Fig. 4.40; Kenya, Fig. 4.41; Mozambique, Fig. 4.42). 
\nFig. 4.35 Generalized profile of Canada (Sustainable Development Index, Quality of Life, Safety of Life) 
\nFig. 4.36 Generalized profile of Finland (Sustainable Development Index, Quality of Life, Safety of Life) 
\nFig. 4.37 Generalized profile of Australia (Sustainable Development Index, Quality of Life, Safety of Life) 
\nFig. 4.38 Generalized profile of the United States (Sustainable Development Index, Quality of Life, Safety of Life) 
\nFig. 4.39 Generalized profile of the Russian Federation (Sustainable Development Index, Quality of Life, Safety of Life) 
\nFig. 4.40 Generalized profile of Angola (Sustainable Development Index, Quality of Life, Safety of Life) 
\nFig. 4.41 Generalized profile of Kenya (Sustainable Development Index, Quality of Life, Safety of Life) 
\nFig. 4.42 Generalized profile of Mozambique (Sustainable Development Index, Quality of Life, Safety of Life) 
",
        "chapter": "4 Intellectual Analysis of Systemic World Conflicts and Global Forecast for the 21st Century",
        "section": "4.7 The General Concept of the Periodic Systemic World Conflicts",
        "subsection": "4.7.6 Big C -Waves of Systemic World Conflicts",
        "subsubsection": "N/A"
    },
    {
        "content": "4.8 Conclusions \n1. Approaches to recognizing C-waves of global systemic conflicts from big historical data have been generalized and formalized, and a general concept for describing and interpreting these waves has been proposed. On the basis of an intellectual analysis of big data, the conflicts that have taken place from 750 B.C. up to now have been analyzed and their general pattern has been revealed. An attempt has been made to foresee the next global conflict, called the conflict of the 21st century, and its nature and main characteristics have been analyzed. 
\n2. Hypotheses have been formulated for a metric relation between global periodic processes, namely the sequence of 11-year cycles of solar activity, the so-called Kondratieff cycles of the development of the global economy, and the process of evolutionary structuration of the family of C-waves of global systemic conflicts. An attempt has also been made to predict these processes in the 21st century using a metric approach. 
\n3. Possible scenarios for the development of the conflict of the 21st century have been constructed and analyzed. This analysis led to the following conclusions: 
\n3:1. Since the sequence of Fibonacci numbers $\\{F_{8-k}\\}_{k \\in \\mathbb{N}}$ associated with the big $C_k$-waves, $k = \\overline{1,7}$, degenerates for $k > 7$, the revealed F-regularity fails on the time interval $t > 2092$. Natural questions therefore arise: What does the 21st century have in store for civilization? What is the nature of the final state of civilization as a system? What should happen to world civilization after 2092, in particular in the 22nd century? Is the final cycle of some global evolutionary chain of human development perhaps beginning? 
\n3:2. 
An answer to these questions can be found in the studies of two outstanding scientists of the last century, Vernadskiy [22] and Moisejev [25]. Independently of each other, they formulated very similar ideas: if mankind does not radically change its behavior on a planetary scale (using its mind and its labor for self-destruction), conditions under which people cannot exist may arise in the middle of the 21st century. These conclusions were made for the paradigm that has persisted throughout human history, “unlimited and increasing consumption”, and for the technosphere (the set of technological lifestyles) unfriendly to human habitation that developed in the 19th and beginning of the 21st centuries. 
\n3:3. If mankind can change the paradigm of its behavior on a planetary scale, for example, to “harmonic coexistence”, and radically transform the technosphere into a “nature-like” one (friendly to the human environment and based on the convergence of nano-, bio-, information, cognitive, and socio-humanitarian technologies [53]), then the regularity revealed for the previous paradigm of the development of systemic world conflicts, which corresponds to the Fibonacci sequence, will no longer hold under the new paradigm, allowing mankind to continue its mission on the Earth. 
\nReferences 
\n1. List of Wars. http://en.wikipedia.org/wiki/List_of_wars 
\n2. M.Z. Zgurovsky, V.V. Yasinsky, Revealing regularities of the course of global system conflicts. Syst. Res. Inf. Technol. 2, 7–18 (2007) 
\n3. H. Scheer, Energy is a driving force for our civilization. http://www.folkecenter.dk/en/articles/HScheer_aburja.htm 
\n4. World Data Center «Geoinformatics and Sustainable Development». http://wdc.org.ua/en 
\n5. GeoHive. http://www.xist.org/earth/population1.aspx 
\n6. Global Footprint Network. http://www.footprintnetwork.org/en/index.php/GFN/ 
\n7. UN and CIA Combined List—Income Ratios and Gini Indices. https://en.wikipedia.org/wiki/List_of_countries_by_income_equality 
\n8. Health Statistics and Health Information Systems, World Health Organization. http://www.who.int/healthinfo/statistics/programme/en/index.html 
\n9. Corruption Perceptions Index, Transparency International. https://www.transparency.org/country (2017) 
\n10. Water for Life: Making it Happen. WHO/UNICEF Joint Monitoring Report. http://www.who.int/water_sanitation_health/monitoring/jmp2005/en/index.html (2005) 
\n11. UNICEF Joint Monitoring Programme for Water Supply and Sanitation (n.d.). Water for life: making it happen. http://www.who.int/entity/water_sanitation_health/waterforlife.pdf 
\n12. M.G. Marshall, Global report on conflict, governance and state fragility, in Foreign Policy Bulletin, ed. by M.G. Marshall, B.R. Cole. http://www.systemicpeace.org/GlobalReport2008.pdf (2008) 
\n13. S.P. Kapitsa, S.P. Kurdyumov, G.G. Malinetskii, Synergetics and Predictions (Synergetics: From Past to Future) (Editorial USSR, Moscow, 2003). (in Russian) 
\n14. N.D. Kondratieff, The Major Cycles of the Conjecture and Prediction Theory (Ekonomika, Moscow, 2002). (in Russian) 
\n15. N.D. Kondratieff, Economic Dynamics Problem (Ekonomika, Moscow, 1989). (in Russian) 
\n16. M.Z. Zgurovskii, General pattern of global system conflicts and global threats of the 21st century. Cybern. Syst. Anal. 43(5), 687–695 (2007) 
\n17. Yu.V. Yakovets, Predicting Cycles and Crises (MFK, Moscow, 2000). (in Russian) 
\n18. Yu.V. Yakovets, Cycles and crises in the 21st century: a civilization approach, in Transgender Jubilee Science Conference RAEN, MFK, Moscow (2000) 
\n19. J. Schumpeter, Business Cycles, vols. 1, 2 (McGraw-Hill, New York, 1939) 
\n20. F. Braudel, Civilisation matérielle, économie et capitalisme. XVe–XVIIIe siècle, vol. 3, Le temps du monde (1979) 
\n21. S.P. Kapitsa, On the “acceleration of the historical time”. Novaya Noveish. Istoriya 6, 3–16 (2004) 
\n22. V.I. Vernadskii, A few words on the noosphere. Uspekhi Sovrem. Biologii 18(2) (1944) 
\n23. M.Z. Zgurovsky, Metric aspects of periodic processes in economy and society. Cybern. Syst. Anal. 46(2), 167–172 (2010) 
\n24. M.Z. Zgurovsky, Interrelation between Kondratieff cycles and global systemic conflicts. Cybern. Syst. Anal. 45(5), 742–749 (2009) 
\n25. N.N. Moiseyev, Save mankind on the Earth. Ekologiya i Zhizn 1, 11–13 (2000) 
\n26. E.M. Soroko, Golden Sections, Systems Self-Organization and Evolution Processes: An Introduction to the General Theory of Systems Harmony (KomKniga, Moscow, 2006). (in Russian) 
\n27. Yu.V. Yakovets, Forecasting of Cycles and Crises (MFK, Moscow, 2000). (in Russian) 
\n28. I.M. D’yakonov, Pathways of History: From Ancient Human to Nowadays (Vostoch. Lit., Moscow, 1994). (in Russian) 
\n29. Yu.I. Vitinskii, I. Kopetskii, G.V. Kuklin, Statistics of Sunspots Activity (Nauka, Moscow, 1986). (in Russian) 
\n30. Sun Influences Data Analysis Center, Belgium. http://sidc.oma.be/sunspot-data/ (2013) 
\n31. D.H. Hathaway, The solar cycle. Living Rev. Solar Phys. 7(1), 1–65 (2010) 
\n32. R.P. Kane, Some implications using the group sunspot number reconstruction. Sol. Phys. 205(2), 383–401 (2002) 
\n33. Did you say the Sun has spots?, Space Today. http://www.spacetoday.org/SolSys/Sun/Sunspots.html (2005) 
\n34. A. Phillips, Solar cycle 24 begins. Science@NASA (2008) 
\n35. S.P. Kapitsa, Phenomenological theory of the growth of Earth’s population. UFN 166, 63–80 (1996) 
\n36. A.P. Nazaretyan, Civilization Crises in the Context of Universal History (Mir, Moscow, 2004). (in Russian) 
\n37. A.D. Panov, Crisis of a planetary cycle of the universal history. Vselennaya, Prostranstvo, Vremya 2, 28–34 (2004) 
\n38. Analysis of sustainable development: Global and regional contexts, in M.Z. Zgurovsky (sci. adv.), International Council for Science (ICSU), Part 1, Global Modeling of Processes of Sustainable Development in the Context of Quality and Safety of Life of People, NTUU «KPI», Kyiv (2009) 
\n39. T.N. Pomerantseva, A.A. Boldak, Multivariate statistical analysis of the influence of global threats on the security of countries of the world. Cybern. Syst. Anal. 2, 200–210 (2010) 
\n40. J. Pearl, Causality: Models, Reasoning, and Inference, 2nd edn. (Cambridge University Press, Cambridge, 2009) 
\n41. R.J. Larsen, M.L. Marx, An Introduction to Mathematical Statistics and Its Applications, 4th edn. (Pearson, N.Y., 2006) 
\n42. P. Velleman, L. Wilkinson, Nominal, ordinal, interval, and ratio typologies are misleading. Am. Stat. 47, 65–73 (1993) 
\n43. J.A. Hartigan, M.A. Wong, A $k$-means clustering algorithm. Appl. Stat. 28, 100–108 (1979) 
\n44. S.A. Aivazyan, I.S. Enyukov, L.D. Meshalkin, Applied Statistics: Investigation of Dependences (Financy and Statistika, Moscow, 1985). (in Russian) 
\n45. A.N. Kolmogorov, Three approaches to the definition of the concept ‘quantity of information’. Probl. Peredachi Inf. 1(1), 3–11 (1965) 
\n46. E.M. Gabidulin, N.I. Pilipchuk, Lectures on Information Theory (MFTI, Moscow, 2007). (in Russian) 
\n47. C.E. Shannon, Works on the Theory of Information and Cybernetics [Russian translation] (Izd. Inostr. Lit., Moscow, 2002) 
\n48. The GeNIe (Graphical Network Interface) Software Package. http://genie.sis.pitt.edu/about.html 
\n49. A.A. Boldak, M.V. Nevdashchenko, Mathematical apparatus for formalization of models used in designing information systems. Visnyk KPI Ser. Inform. Control Comput. Eng. 47, 332–345 (2007) 
\n50. A.A. Markov, Elements of Mathematical Logic (MGU, Moscow, 1984). (in Russian) 
\n51. I.V. Rezko, History of Wars and Conflicts, vol. 1. Compiler I.V. Rezko (Harvest Ltd, Minsk, 1997) 
\n52. I.V. Rezko, History of Wars and Conflicts, vol. 2. Compiler I.V. Rezko (Harvest Ltd, Minsk, 1997) 
\n53. M.V. Kovalchuk, Science and Life: My Convergence, Vol. 1: Autobiographical Sketches: Science Educational and Conceptual Articles (Academkniga, Moscow, 2011). (in Russian) 
\n54. N.N. Taleb, Antifragile: Things That Gain from Disorder (KoLibri, Azbuka-Atticus, Moscow, 2012, 2014). (in Russian) 
\n55. G.D. Snooks, The Dynamic Society: Exploring the Sources of Global Change, vol. xvii (Routledge, London, 1996), 491 pp. 
\n56. A.D. Panov, Scaling Law of the Biological Evolution and the Hypothesis of the Self-Consistent Galaxy Origin of Life (COSPAR. Published by Elsevier Ltd., 2005), pp. 220–225 
\n57. R. Kurzweil, The Singularity Is Near: When Humans Transcend Biology (Viking, 2005), 652 pp. 
\n58. S. Karelov, Big War is Imminent. https://medium.com/@sergey_57776/ (2017)",
        "chapter": "4 Intellectual Analysis of Systemic World Conflicts and Global Forecast for the 21st Century",
        "section": "4.8 Conclusions",
        "subsection": "N/A",
        "subsubsection": "N/A"
    }
]