Knowledge-Based Programming for the Cybersecurity Solution

Received: August 7, 2018 Revised: October 22, 2018 Accepted: November 25, 2018

Abstract: Introduction: The problem of cyberattacks reduces to the unwanted infiltration of software through latent vulnerable access points. There are several approaches to protection here. First, unknown or improper system states can be detected through their characterization (using neural nets and/or symbolic codes); execution is then interrupted to run benchmarks and observe whether they produce the states they should. If not, the execution can be rewound to the last successful benchmark, all states restored, and rerun.


INTRODUCTION
Software that is complex enough to be capable of self-reference cannot be proven valid [1]. It similarly follows that complex software is inherently subject to cyberattack. It is just a matter of degree.
The classic approach to detecting and preventing cyberattack is to interrupt executing software at random points and run benchmark programs, whose (intermediary) output is known. If any such runs deviate from the expected output, it [...] effectiveness of cyberattacks. The generation of KBSE (knowledge-based software engineering) is a spiral development process. Thus, cyber protection requires domain understanding. One solution is distributed software synthesis. Here, one expert compiler calls another in a graph representation of invocation patterns. Notice that algorithmic synthesis depends upon reuse at the highest or most abstract level. This effectively serves as a highest-level programming language with the side-benefit of providing relatively cyber-secure output codes as previously described.
It follows that the science and particularly the engineering of cyber-secure systems is a consequence of building expert compilers. Such compilers will need to create functions (i.e., functional programming) to ensure interchangeability among the various functions. Global variables, other than those connected with blackboards and nonmonotonic reasoning, are notoriously difficult to compile. That is the reason why the scope of synthesized code and variables needs to be local as a practical matter.
One way to write code, which writes code, is to build a distributed networked expert compiler, which synthesizes some space of functional programs. The first code to be synthesized consists of rules for synthesizing code. Here, design decisions can be generalized to provide more options for the synthesized rules. This is a form of bootstrapping for cracking the knowledge acquisition bottleneck [7].
The way to write meta-rules is to group rules into segments, which are applied by other rules [8]. Segments are cohesive; and, as a consequence, rules and groups of rules can be grouped by chance on the basis of synthesizing code, which functions in satisfaction of sets of I/O constraints. While the validation of arbitrary code can be achieved by expert systems of arbitrary complexity, functional code is amenable to I/O testing by making use of orthogonal sets of I/O pairs. A pair of I/O pairs is said to be orthogonal with respect to a software function if the two I/O tests do not share the same execution path trace. In practice, such pairings of I/O tests need only be relatively orthogonal as a cost-saving measure. For example, an orthogonal set of sort program tests is {( [...] The key to identifying relatively orthogonal test vectors is that the set of them is random in the recursively enumerable sense [9]. That is, the test vectors should not be functionally related to each other. Validated functions can be further composed and similarly tested for novel functionality.
So far, we have seen that the problem of providing practical cybersecurity is tied to the problem of automatic programming. This problem, in turn, is tied to the problem of knowledge acquisition.
Knowledge acquisition goes beyond neural networks. Indeed, effective knowledge acquisition by humans involves transformational analogy, which is something that neural networks cannot do [10,11]. This then is the domain for symbolic AI, since neural networks inherently cannot perform basic "modus ponens".
The key to successful KBSE is having as much knowledge in as many representational formalisms [12] as possible available to bear. In theory, one needs as high a density of knowledge as possible, which cannot be bounded [13].
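The orthogonality test described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the execution path trace is approximated by the sequence of comparison outcomes inside an instrumented insertion sort, and the names traced_insertion_sort and are_orthogonal are hypothetical, not taken from the paper.

```python
def traced_insertion_sort(items):
    """Sort a list and record every comparison outcome as a path trace."""
    result = list(items)
    trace = []
    for i in range(1, len(result)):
        j = i
        while j > 0:
            swap = result[j - 1] > result[j]
            trace.append(swap)  # each branch decision extends the trace
            if not swap:
                break
            result[j - 1], result[j] = result[j], result[j - 1]
            j -= 1
    return result, tuple(trace)


def are_orthogonal(test_a, test_b):
    """Treat two I/O tests as orthogonal when their path traces differ."""
    (in_a, out_a), (in_b, out_b) = test_a, test_b
    sorted_a, trace_a = traced_insertion_sort(in_a)
    sorted_b, trace_b = traced_insertion_sort(in_b)
    if sorted_a != out_a or sorted_b != out_b:
        raise ValueError("an I/O test is not satisfied by the function")
    return trace_a != trace_b
```

Under this reading, the tests ((3 2 1) (1 2 3)) and ((1 2 3) (1 2 3)) exercise different paths, whereas two already-sorted inputs of the same length follow the same path and add little.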

ON RANDOMIZATION
The density of knowledge can be increased, in theory, by applying knowledge to itself, as previously indicated. One can generalize data and generalize generalizations. When this leads to relatively incompressible levels, the applicable term is "randomization" [9]. For example, numerical sequences are said to be "random" because no more concise generating algorithm than the sequence itself is known.
Random knowledge has the property that the converse (i.e., symmetric knowledge) follows from the application of one random knowledge function to another, including itself. For example, the add-one function increases its numerical input by one. The mod-two function returns true just in case its numerical input is even. The composition of the mod-two and add-one functions returns true just in case its numerical input is odd. This is a symmetric function, which is defined by a composition of two relatively random (i.e., orthogonal) functions. Notice that this is the way to expand the knowledge base. This expansion, in turn, increases the space of functions, which can be automatically programmed. This space includes many non-deterministic variants too, which provide benchmarked security via semantic randomization. Relatively random functions are recursively enumerable because there is no known composition, which can express them [9].
The next question might be, "which is to be preferred, an expert compiler or a case-based compiler for KBSE?" The answer is that it follows from randomization theory that both are required [13]. Cases serve as a repository for experiential knowledge. They can be generalized into rules through the application of cases and/or rules once their domain is understood. This is not necessarily achievable in practice. However, the chance of a successful generalization is maximized through the use of networked coherent systems. Coherent systems are relatively random. The point is that such systems of systems need to be engineered not only to advance the goals of KBSE, but those of cybersecurity as well. Knowledge spawning knowledge may sound simple enough, but it represents perhaps the most complex project ever conceived.
It is well-known that the representation of knowledge and data determines what can be learned [12]. Neural networks can map inputs to outputs, but they can neither deduce nor induce new knowledge. That requires symbolic approaches, including deduction, induction, abduction, and fuzzy pattern matching. In particular, symbolic approaches are compatible with grammatical inference [14,15], which is tied to programming language definition. All these methods and more are related through randomization [9,13]. Indeed, it is postulated that deficiencies in today's neural networks [10,11] will be ameliorated through the redefinition of neural networks using randomization theory [13].
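The add-one and mod-two example can be written out directly. The following is a minimal Python sketch; the helper compose and the name is_odd are introduced here for illustration only.

```python
def add_one(n):
    """Increase the numerical input by one."""
    return n + 1


def mod_two(n):
    """Return True just in case the numerical input is even."""
    return n % 2 == 0


def compose(outer, inner):
    """Build the function x -> outer(inner(x))."""
    return lambda x: outer(inner(x))


# Composing two relatively random functions yields a symmetric one:
# mod_two(add_one(n)) is True exactly when n is odd.
is_odd = compose(mod_two, add_one)
```

Neither function mentions oddness, yet their composition tests for it, which is the sense in which applying knowledge to knowledge expands the knowledge base.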

EXAMPLE OF CYBERSECURE SOFTWARE SYNTHESIS
The problem is to detect a cyberattack when it happens and recover from a cyberattack while it happens. Software needs to be subdivided into components, which map a set of input vectors to a non-deterministic set of stochastic output vectors. Components are defined in terms of other components, which are defined by rules (Fig. 1). The behavior of a set of Boolean components or a sequence of procedural components is not unique. Thus, it is possible to synthesize a diverse set of components, which provides the desired security for an arbitrary I/O characterization.

Justification for I/O Characterization of Software Components
It is acknowledged that there is software that cannot be sufficiently characterized by a non-deterministic stochastic I/O mapping. For example, a component might draw a picture. Here, a knowledge-based and/or neural system may be applied to rank the quality of the component. In a sense, mapping input to desired output(s) is universal; it is just that intermediate evaluation code is sometimes needed. Thus, while we will not address such complexities in this paper, it is to be understood that the methodology advanced herein is completely compatible with them. In fact, it may be used to define the intermediate knowledge-based evaluation systems.
Another point of contention pertains to the use of empirical testing instead of, or in combination with, denotational or axiomatic semantics for program validation. The recursive Unsolvability of the Equivalence Problem [1] proves that in the general case it is impossible to prove that two arbitrary programs compute the same function. Moreover, approaches to program validation, based on computational semantics, have proven to be unacceptably difficult to apply in practice. There can be no theoretical method for ensuring absolute validity once a program grows to a level of complexity to be capable of self-reference [1,13,16,17].
It follows that program validation is properly based on empirical testing, the goal of which is to cover a maximal number of execution paths using a minimal number of test cases. This is none other than randomization [9,13]. Of course, there is no need to achieve the absolute minimum here; a minimum relative to the search time required to find the test cases will suffice. In a large enough system of systems, the methodology advanced herein may be applied to the generation of relatively random test cases. Randomization serves to maximize reuse. Reuse is perhaps the best real-world technique for exposing and thus minimizing the occurrence of program bugs.

Random-Basis Testing
Each component, saved in the database, is associated with one or more I/O test vector pairings that serve to map a random input vector to correct non-deterministic output vectors. The underpinning principle is that test vectors, which have been sufficiently randomized, are relatively incompressible. For example, consider the synthesis of a sort function using LISP (Fig. 2). There are some extraneous details, such as knowing when a particular sequence will lead to a stack overflow; but these are easily resolved using an allowed execution time parameter. Impressive programs have been so synthesized, supporting the component-based concept. Notice that components can be written at any scale from primitive statements to complex functions. Given only so much allocated search time, the system will either discover a solution or report back with failure. This is in keeping with the recursive Unsolvability of the Halting Problem [1,13].
Consider such I/O constraints as (((3 2 1) (1 2 3)) ((3 1 2) (1 2 3))). That is, when (3 2 1) is input to the sort function, it is required to output (1 2 3). Similarly, when (3 1 2) is input to it, it is required to output the same. Clearly, there is little value in using a test set such as (((1) (1)) (( 2 [...]
The purpose served by case I/O exemplar vectors, such as illustrated above, is to filter out synthesized functions, which do not satisfy the specified mapping constraints. A superset of such functions can thus be filtered down to a non-deterministic set of functions. The I/O test vectors must be mutually random so as to maximize the number of distinct execution paths tested. This helps to ensure that the selected function(s) will perform, as desired, on unspecified input vectors. Moreover, inasmuch as every function is synthesized as a random instance of its parent schema, semantically equivalent non-deterministic functions are likely to be syntactically distinct. Almost certainly, they will be syntactically distinct in the context of other synthesized and composed functions. This is sufficient to ensure cyber security.
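The filtering role of the I/O exemplar vectors can be illustrated with a small Python sketch. The candidate functions below are hand-written stand-ins for synthesized schema instances (an assumption; the paper synthesizes them in LISP), and the names satisfies and filter_candidates are hypothetical.

```python
# The I/O constraints (((3 2 1) (1 2 3)) ((3 1 2) (1 2 3))) from the text.
IO_CONSTRAINTS = [([3, 2, 1], [1, 2, 3]), ([3, 1, 2], [1, 2, 3])]


def identity(xs):
    return list(xs)


def reverse(xs):
    return list(reversed(xs))


def builtin_sort(xs):
    return sorted(xs)


def selection_sort(xs):
    remaining, ordered = list(xs), []
    while remaining:
        smallest = min(remaining)
        remaining.remove(smallest)
        ordered.append(smallest)
    return ordered


def satisfies(candidate, constraints):
    """True when the candidate maps every input to its required output."""
    return all(candidate(list(inp)) == out for inp, out in constraints)


def filter_candidates(candidates, constraints):
    """Keep only the candidates that satisfy all I/O constraints."""
    return [c for c in candidates if satisfies(c, constraints)]


survivors = filter_candidates(
    [identity, reverse, builtin_sort, selection_sort], IO_CONSTRAINTS
)
```

Here reverse passes the first test but fails the second, which shows why a single test vector is not enough; the two survivors are semantically equivalent on the tests yet syntactically distinct.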
One problem, which occurs in the synthesis of large complex functions, is that the candidate instantiation space is too large to permit tractable testing. One solution to this problem follows from the inverse triangle inequality. This means that one can effect an exponential reduction in the search space for functional instances by moving the articulation points into distinct functions, which can always be done. This moves the increase in complexity for the moved articulation point from the product of possible instances for that articulation point to the sum of possible instances for that articulation point. When this is carried out for several articulation points, in the same function, the cumulative savings in the size of the search space is clearly exponential. Moreover, the search space for function synthesis can be further reduced by independently synthesizing each function in a composition of functions and then separately validating the composition. Another technique for saving time is to evaluate each salient set of I/O constraints to the point of failure, if any. Just as is the case with "or" and "and" functions, there is no need to evaluate past the point of failure. Given that the vast majority of synthesized functions will be failures, this one strategy can collectively save considerable time. Another very important approach to delimiting search is to employ search control heuristics. For example, in Pascal, every end keyword needs to be preceded by an unmatched begin keyword. This knowledge, and knowledge like it, can be applied to delimit the members of the matching set articulation point. Search control heuristics enable the scaling of the methodology. This is a key concept. It should be noted how semantic randomization depends, in no small part, on knowledge insertion (i.e., unlike syntactic randomization). As Francis Bacon once stated, "knowledge is power"; in this case, power to thwart otherwise successful cyber-attacks.
A problem is that the test set is relatively symmetric or compressible into a compact generating function. A fixed-point or random test set is required instead, and the use of such relatively random test sets is called random-basis testing [18]. While the need for functional decomposition remains, under random-basis testing, the complexity for the designer is shifted from writing code to writing search schema and relatively random tests. For example, such a test set here is (((1) (1)) ((2 1) (1 2)) ((3 1 2) (1 2 3)) ((1 2 3) (1 2 3))). Many similar ones exist. One may also want to constrain the complexity of any synthesized component (e.g., Insertion Sort, Quicksort, et al.). This can be accomplished through the inclusion of temporal constraints on the I/O behavior (i.e., relative to the executing hardware and competing software components) [19].
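The product-versus-sum argument and the stop-at-first-failure strategy can both be made concrete with a short Python sketch. It assumes that each articulation point, once moved into its own function, can be tested independently against its own I/O constraints; the function names are illustrative only.

```python
from math import prod


def joint_search_space(alternatives):
    """Candidates to test when all articulation points share one function."""
    return prod(alternatives)


def factored_search_space(alternatives):
    """Candidates to test when each articulation point is its own function."""
    return sum(alternatives)


def first_failure(candidate, constraints):
    """Evaluate I/O constraints in order and stop at the first failure.

    Returns the index of the failing constraint, or None if all pass.
    """
    for index, (inp, out) in enumerate(constraints):
        if candidate(list(inp)) != out:
            return index  # no need to evaluate past the point of failure
    return None
```

With four articulation points of ten alternatives each, the joint space holds 10,000 candidates while the factored space holds 40, and the gap widens exponentially with each additional articulation point.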

Component Definition
There are two categories of components: Boolean components, which return True or False, and procedural components, which compute all other functions and can post and/or retract information to/from a blackboard. There are two blackboards: a local blackboard, which is only accessible to local component functions and procedures as well as those invoked by them, and a global blackboard, which is accessible to all component functions and procedures. The blackboards dynamically augment the input vectors to provide further context.
All components are composed of rules, each of which consists of one or a conjunction of two or more Boolean components, which imply one or a sequence of two or more procedural components, including global and local RETRACT and POST. Given an input vector and corresponding output vector(s), the rule base comprising the component must map the former to the latter at least tolerance percent of the time. The default tolerance is 100 percent. Transformation may also favor the fastest component on the same I/O characterization. Notice that greater diversification comes at an allowance for less optimization.
Cyberattacks are likely to affect one LISP construct (e.g., null S) and not another (e.g., nil) or vice versa. Given the plethora of such constructs, which comprise a typical function, some versions of syntax will remain unscathed by a cyberattack, while others will not. This is always detectable; and, processing can be rewound and continued. It follows that automatic programming enables cybersecurity; and, cybersecurity serves to advance automatic programming. It seems as though nature has provided us with a constructive consequence supporting the plethora of cyberattacks! That is, they serve to advance technologies for automatic programming, which is applied to ameliorate the cybersecurity problem.
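A rule of this form, together with the tolerance check, might be sketched in Python as follows. This is an illustration under assumptions the text does not state: the first rule whose antecedent holds is the one that fires, an input matched by no rule passes through unchanged, and blackboards are omitted. All names are hypothetical.

```python
def make_rule(conditions, actions):
    """A conjunction of Boolean components implying a procedural sequence."""
    def rule(vector):
        if not all(condition(vector) for condition in conditions):
            return None  # antecedent not satisfied; the rule does not fire
        for action in actions:
            vector = action(vector)
        return vector
    return rule


def run_component(rules, vector):
    """Fire the first applicable rule (assumed conflict resolution)."""
    for rule in rules:
        result = rule(list(vector))
        if result is not None:
            return result
    return list(vector)


def meets_tolerance(rules, io_pairs, tolerance=100.0):
    """Check that inputs map to an allowed output at least tolerance percent."""
    hits = sum(1 for inp, allowed in io_pairs
               if run_component(rules, inp) in allowed)
    return 100.0 * hits / len(io_pairs) >= tolerance


def is_pair(v):
    return len(v) == 2


def out_of_order(v):
    return v[0] > v[1]


def swap(v):
    return [v[1], v[0]]


ORDER_PAIR = [make_rule([is_pair, out_of_order], [swap])]
IO_PAIRS = [([2, 1], [[1, 2]]), ([1, 2], [[1, 2]]), ([3, 3], [[3, 3]])]
```

Each input is paired with a list of allowed outputs so that a non-deterministic component can be accepted; lowering the tolerance below 100 percent admits components that miss some of the pairs.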

Component Synthesis
A library of universal primitive and macro components is supplied and evolved. There are three ways that these are retrieved. The first is by name. The second is by mapping an input vector closer, by some definition (e.g., the 2-norm et al.), to a desired non-deterministic output vector (i.e., hill climbing: non-contracting transformations reducing the distance to a goal state with each substitution). The third is just by mapping the input vector using contracting and non-contracting transformations (i.e., Type 0 transformation). Hill climbing and Type 0 transformation may be combined and occur simultaneously until interrupted. The former accelerates reaching a desired output state, while the latter gets the system off of non-global hills.
Macro components are evolved by chance. They comprise a Very High Level Language (VHLL). For example, a macro component for predicting what crops to sow will no doubt invoke a macro component for predicting the weather. Similarly, a macro component for planning a vacation will likewise invoke the same macro component for predicting the weather (i.e., reuse) [20].
Test vectors are stored with each indexed component to facilitate the programmer in their creation and diversification as well as with the overall understanding of the component's function. While increasing the number of software tests is generally important, a domain-specific goal is to generate mutually random ordered pairs [18]. Components in satisfaction of their I/O test vectors are valid by definition. Non-deterministic outputs are not stochastically defined for testing as it would be difficult to know these numbers as well as inefficient to run such quantitative tests.
As software gets more complex, one might logically expect the number of components to grow with it. Actually, the exact opposite is true. Engineers are required to obtain tighter integration among components in an effort to address cost, reliability, and packaging considerations, so they are constantly working to decrease the number of software components but deliver an ever-expanding range of capabilities. Thus, macro components have great utility. Such randomizations have an attendant advantage in that their use, including that of their constituent components, implies their increased testing by virtue of their falling on a greater number of execution paths [9,13,19]. The goal here is to cover the maximum number of execution paths using the relatively fewest I/O tests (i.e., random-basis testing [18]).
The maximum number of components in a rule, as well as the maximum number of rules in a component, is determined based on the speed, number of parallel processors for any fixed hardware capability, and the complexity of processing the I/O vectors. It is assumed that macro components will make use of parallel/distributed processors to avoid a significant slowdown. Components that are not hierarchical are quite amenable to parallel synthesis and testing.
Components may not recursively (e.g., in a daisy chain) invoke themselves. This is checked at definition time through the use of an acyclic stack of generated calls. Searches for component maps are ordered from primitive components to a maximal depth of composition, which is defined in the I/O library. This is performed to maximize speed of discovery. The components satisfying the supplied mapping characterizations are recursively enumerable.
Software engineers can supply external knowledge, which is captured for the specification of components. Components are defined using a generalized language based on disjunction. This is because it is easier to specify alternatives (i.e., schemas) in satisfaction of I/O constraints than to specify single instances (e.g., A|B → C than A → C | B → C; or, A → B|C than A → B | A → C). Moreover, such an approach facilitates the automatic re-programming of component definitions in response to the use of similar I/O constraints. The idea is to let the CPU assume more of the selection task by running a specified number of rule alternates against the specified I/O constraints. This off-loads the mundane work to the machine and frees the software engineer in proportion to the processing speed of the machine. Here, the software engineer is freed to work at the conceptual level, while the machine is enabled to work at the detailed level. Each is liberated to do what it does best. The number of (macro) Boolean components, (macro) procedural components, and alternate candidate rules is determined by the ply of each and the processing speed of the machine. Notice that the task of programming component rules is thus proportionately relaxed. Programming is not necessarily eliminated; rather, it is moved to ever-higher levels. This is randomization [9]. Furthermore, component-type rule-based languages have the advantage of being self-documenting (e.g., IF "Root-Problem" THEN "Newton-Iterative-Method"). Novel and efficient development environments can be designed to support the pragmatics of such programming.
Each run may synthesize semantically equivalent (i.e., within the limits defined by the I/O test vectors), but syntactically distinct functions (e.g., see the alternative definitions for MYSORT at the bottom of Fig. 2). Similar diversified components are captured in transformation rules. Thus, initially diversified components are synthesized entirely by chance, which of course can be very slow. Chance synthesis is a continual on-going process, which is necessary to maintain genetic diversity. But, once transformation rules are synthesized, they are applied to constituent component rules to create diversified components with great rapidity. The 3-2-1 skew may be applied to favor the use of recently acquired or fired transformation rules. It uses a logical move-to-the-head ordered search based upon temporal locality [21]. The acquisition of new components leads to the acquisition of new transforms. Note that if the system sits idle for long, it enters dream mode via the 3-2-1 skew. That is, it progressively incorporates less recently acquired/fired transforms in the search for diversified components.
Transformation rules can be set to minimize space and/or maximize speed and in so doing generalize/optimize. Such optimizations are also in keeping with Occam's razor, which states that in selecting among competing explanations of apparent equal validity, the simplest is to be preferred. If, after each such transformation, the progressively outer components do not properly map their I/O characterization vectors, then it can only be because the pair of components comprising the transformation rule is not semantically equivalent. In this case, the transformation is undone and the transformation rule and its substituted component are expunged (i.e., since it has an unknown deleterious I/O behavior). This allows for a proper version to be subsequently re-synthesized. Components having more-specific redundant rules have those rules expunged.
Convergence upon correct components and thus correct transforms is assured. This is superior to just using multiple analogies as it provides practical (i.e., to the limits of the supplied test vectors) absolute verification at potentially multiple component levels. Such validation is not in contradiction with the Incompleteness Theorem as the test vectors are always finite as is the allowed runtime [17].
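The second retrieval mode, hill climbing under the 2-norm, can be sketched as a greedy loop in Python. The component library of two swap transformations is a toy assumption made for illustration, as are the names distance and hill_climb; Type 0 transformation, which would be needed to escape a non-global hill, is not implemented here.

```python
import math


def distance(a, b):
    """Euclidean (2-norm) distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def hill_climb(start, goal, components, max_steps=100):
    """Repeatedly apply the component that most reduces distance to the goal."""
    current, goal = tuple(start), tuple(goal)
    path = []
    for _ in range(max_steps):
        if current == goal:
            break
        best_name, best_vector = None, current
        for name, transform in components.items():
            candidate = tuple(transform(current))
            if distance(candidate, goal) < distance(best_vector, goal):
                best_name, best_vector = name, candidate
        if best_name is None:
            break  # stuck on a non-global hill; no component helps
        path.append(best_name)
        current = best_vector
    return current, path


# Toy library of primitive components (hypothetical).
COMPONENTS = {
    "swap_first_two": lambda v: (v[1], v[0], v[2]),
    "swap_last_two": lambda v: (v[0], v[2], v[1]),
}
```

Starting from (3, 2, 1) with goal (1, 2, 3), each substitution strictly reduces the distance, and the sequence of component names returned serves as the synthesized composition.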
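A definition-time check of this kind could look like the following Python sketch, which walks the invocation graph while keeping a stack of the calls generated so far. The dictionary representation of component definitions and the name find_cycle are assumptions made for illustration.

```python
def find_cycle(calls, start):
    """Return a call chain that revisits a component, or None if acyclic.

    calls maps each component name to the names of components it invokes.
    """
    stack = []        # the chain of calls generated so far
    cleared = set()   # components already shown to be free of cycles

    def visit(name):
        if name in stack:
            return stack[stack.index(name):] + [name]
        if name in cleared:
            return None
        stack.append(name)
        for callee in calls.get(name, ()):
            cycle = visit(callee)
            if cycle is not None:
                return cycle
        stack.pop()
        cleared.add(name)
        return None

    return visit(start)
```

A definition would be rejected whenever find_cycle returns a chain, whether the component invokes itself directly or through a daisy chain of other components; shared reuse of one component by several callers is not a cycle and is accepted.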
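The text does not define the 3-2-1 skew, so the following Python sketch rests on an explicit assumption: the transform at position i (counting from the head) in a list of n is selected with weight n - i, giving relative weights 3, 2, 1 for a list of three. Move-to-the-head ordering is then the standard list update. All names are hypothetical.

```python
import random


def skew_weights(n):
    """Selection probabilities favoring the head (assumed 3-2-1 reading)."""
    total = n * (n + 1) // 2
    return [(n - i) / total for i in range(n)]


def pick_transform(transforms, rng=random):
    """Choose a transform, favoring those nearest the head of the list."""
    weights = skew_weights(len(transforms))
    return rng.choices(transforms, weights=weights, k=1)[0]


def move_to_head(transforms, fired):
    """Move a just-acquired or just-fired transform to the head of the list."""
    transforms.remove(fired)
    transforms.insert(0, fired)
```

Because the weights never reach zero, less recently fired transforms are still tried occasionally, which is consistent with the progressive widening described for dream mode.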

Non Monotonic Rules
Non monotonic rules are secondary rules, which condition the firing of primary rules. They have the advantage of being highly reusable, facilitating the specification of complex components. Reuse is a tenet of randomization theory [13]. Both local and global blackboards utilize posting and retraction protocols. The scope of a local blackboard is limited to the originating component and all components invoked by it. For example,
{Laces: Pull untied laces, Tie: Make bow} → GRETRACT: (Foot-ware: Shoes are untied); GPOST: (Foot-ware: Shoes are tied)
The order of the predefined global and local RETRACT and POST procedures is, akin to all procedural sequences, immutable.
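The shoe-tying rule can be mimicked with a small blackboard class in Python. The class, its method names, and the reading of the left-hand side as a set of facts that must all be present in the current context are assumptions made for this sketch; only the RETRACT-then-POST order is taken from the rule itself.

```python
class Blackboard:
    """A minimal blackboard holding sets of facts under topic headings."""

    def __init__(self):
        self.facts = {}

    def post(self, topic, fact):
        self.facts.setdefault(topic, set()).add(fact)

    def retract(self, topic, fact):
        self.facts.get(topic, set()).discard(fact)

    def holds(self, topic, fact):
        return fact in self.facts.get(topic, set())


def tie_shoes_rule(context, global_bb):
    """Fire GRETRACT then GPOST when both antecedents are in the context."""
    needed = {("Laces", "Pull untied laces"), ("Tie", "Make bow")}
    if not needed <= set(context):
        return False
    global_bb.retract("Foot-ware", "Shoes are untied")  # GRETRACT first
    global_bb.post("Foot-ware", "Shoes are tied")       # then GPOST
    return True
```

A primary rule elsewhere in the system could then be conditioned on global_bb.holds("Foot-ware", "Shoes are tied"), which is how a secondary rule of this kind gates the firing of primary rules.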

Component Redundancy and Diversification
The pattern-matching search known as backtracking can iteratively expand the leftmost node, or the rightmost node on Open [22]. Results here are not identical, but are statistically equivalent. If one component is provided with one expansion search parameter, the other component must be provided with the same search parameter, or the resultant dual-component search will have some breadth-first, rather than strictly depth-first characteristics. This will change the semantics resulting from the use of large search spaces. Clearly, components need to be transformed with due regard for subtle context to preserve their aggregate semantics. These semantic differences become apparent on input vectors, which are outside of those used for I/O definition. Their use can result in erroneous communications via the local and/or global blackboards. The system of systems, described in the technical approach below, evolves such context-sensitive components and their transformations.
NNCSs can potentially provide exponentially more security than can a multi-compiler by finding multiple paths from start to goal states [2,22]. Under syntactic differentiation, achieving the same results implies computing the same component semantics. Under transformational equivalence, one need not compute the same exact component semantics; only ones that achieve the same results in the context of other components. Given sufficiently large problem spaces and sufficient computational power, exponential increases in cyber security can thus be had. Syntactic differentiation can at best provide only linear increases in cybersecurity. Thus, our methodology offers far greater security against cyberattacks than can conventional approaches [3].
The transformational process converges on the synthesis of syntactically distinct components, which are, to the limits of testing, semantically equivalent. Such components can be verified to be free from attack if their I/O synthesis behavior is within the specified tolerance. Even so, multiple "semantically equivalent" components may compute different output vectors on the same, previously untested input vectors. Here, diversity enables the use of multiple functional analogies by counting the number of diverse components yielding the same output vector. It also allows for a count of the approximate number of recursively enumerable distinct paths leading to the synthesis of each component. These multiple analogies of derivation, when combined with multiple functional analogies, provide a relative validity metric for voting the novel output vectors. These solution vectors are very important because they evidence the system capability for learning to properly generalize by way of exploiting redundancy (i.e., in both function and derivation). Furthermore, having multiple derivations provides stochastic non-deterministic probabilities. This lies at the root of human imagination and knowledge.
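Counting the diverse components that yield the same output vector amounts to a vote, which can be sketched in Python as follows. Only the functional-analogy count is modeled; the derivation count is omitted. The three components, including a deliberately flawed variant that agrees with the others on duplicate-free inputs, are illustrative assumptions.

```python
from collections import Counter


def sort_a(xs):
    return sorted(xs)


def sort_b(xs):
    remaining, ordered = list(xs), []
    while remaining:
        smallest = min(remaining)
        remaining.remove(smallest)
        ordered.append(smallest)
    return ordered


def sort_c(xs):
    # Equivalent to the others on duplicate-free tests, but drops duplicates.
    return sorted(set(xs))


def vote(components, input_vector):
    """Return the most common output and the fraction of components agreeing."""
    outputs = Counter(tuple(c(list(input_vector))) for c in components)
    winner, count = outputs.most_common(1)[0]
    return list(winner), count / len(components)
```

On the tested input (3 1 2) all three components agree, while on the previously untested input (2 1 2) the vote selects (1 2 2) with two of three components in agreement, giving a relative validity score for the novel output.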

SUMMARY AND CONCLUSION
The goal of this paper was to show that cybersecurity problems can be seen to be a blessing in disguise. They are a blessing because their solution will entail advancing automatic functional programming through KBSE. Non-deterministic synthesized functions allow for the benchmarking of proper behavior; and, this allows for the detection of and recovery from cyberattacks. KBSE implies the synthesis of code as a last step and the synthesis (generalization) of knowledge as an intermediate step. This implies that cybersecurity is a function of scale because semantic randomization is dependent upon a maximal quantity of most-general knowledge, which is arrived at through spiral development across a network of coherent expert compilers. Such networks are self-amplifying over time. It follows that cybersecurity does not have a static solution. Automation will eventually provide a cure for the cyber-maladies it brought on. Table 1 presents some other traditional methods and randomization-based methods for cybersecurity and recounts their major advantage(s) favoring their use. Unlike all of these methods, semantic randomization allows cybersecurity to follow schema-based automatic programming for cost minimization as a function of scale, as illustrated in Fig. 2. Scale cannot be addressed by parallel processors alone. There comes a point where the inclusion of good search control heuristics becomes essential. Semantic randomization has been applied to actual naval problems [23-25]. The performance of the system (i.e., the inferential error rate) is tied to the size of the transformational base as well as the processing power/time allocated in conjunction with the schema-definition language. This is unbounded, by any non-trivial metric, because the Kolmogorov complexity of computational intelligence is unbounded [26]. References [27-61] contain supporting material, which supplements many of the topics covered in this paper.

Table 1. A comparison of transformation-based cyber security methods.
Behavioral Distance: A way to defend against mimicry attacks by using a comparison between the behaviors of two diverse processes running the same input.
Variant Approaches: If the same input is supplied to a set of diversified variants of the same code, then the cyberattack will succeed on at most one variant.
Semantic Randomization: A way to apply schema-based program synthesis to automatically create semantically equivalent syntactically distinct algorithmic variants of the same code, where a cyberattack will succeed on at most one variant. Picks up where the use of distinct compilers leaves off. May be used in combination with the above methods for enhanced cybersecurity.