M&M Task 3: Brains and Computers: Chaudhary (2016): Brain-Computer Interfaces for Communication and Rehabilitation


M&M TASK 3: BRAINS AND COMPUTERS

BCI

CHAUDHARY (2016): BRAIN–COMPUTER INTERFACES FOR COMMUNICATION AND REHABILITATION
• Brain–computer interfaces (BCIs) use brain activity to control external devices, thereby enabling
severely disabled patients to interact with the environment
• Can be invasive or non-invasive; the main non-invasive types are EEG and near-infrared spectroscopy
o EEG-based BCIs have enabled some paralyzed patients to communicate
o Near-infrared spectroscopy combined with a classical conditioning paradigm is the only
successful approach for complete locked-in syndrome
o The combo of EEG-based BCIs with behavioural physiotherapy is a feasible option for
rehabilitation in stroke
▪ → The approach is to induce neuroplasticity and restore lost function after stroke
• Assistive BCIs → designed to enable paralyzed patients to communicate or control external robotic
devices, such as prosthetics; rehabilitative BCIs are designed to facilitate recovery of neural function

CONNECTIONISM

THAGARD
CHAPTER 1: APPROACHES TO COGNITIVE SCIENCE

Studying the Mind

• Aim of cognitive science: explain how people think


• This will be useful in many dimensions: education, engineering, computers, politics
• Cog science proposes that people have mental procedures that operate on mental representations to
produce thought and action
o Different kinds of mental representations, such as rules and concepts, foster different kinds of
mental procedures

Beginnings

• Plato: the most important knowledge comes from ideas such as virtue that people know innately,
independently of sense experience
• Descartes + Leibniz: knowledge can be gained just by thinking and reasoning = rationalism
• Aristotle: knowledge = rules that are learned from experience = empiricism
• Kant: rationalism + empiricism = knowledge depends on both sense experience and the innate
capacities of the mind
• Then experimental psychology came (Wundt), but then was overtaken by behaviourism (Watson)
• I skipped the rest cuz it seems irrelevant

Methods in Cognitive Science

• Primary method: experimentation with human participants


o They need a theoretical framework: do this by forming and testing computational models
intended to be analogous to mental operations
• Researchers have created computational models that simulate aspects of human performance
• Linguists: identify grammatical principles that provide the basic structure of human languages
o Identification takes place by noticing subtle differences between grammatical and
ungrammatical utterances
• Inserting electrodes
• Magnetic resonance and positron emission scanning devices
• Observing patients with lesioned brains
• Cognitive anthropology: how thought works in different cultural settings
o Main method: ethnography = living and interacting with members of a culture to a sufficient
extent that their social and cognitive systems become apparent
• Philosophy: no distinct method
o Important: deals with fundamental issues that underlie the experimental and computational
approaches to the mind
o Deals with mind and body relation, and also methodological questions such as the nature of
explanations found in cog science
o Deals with normative questions about how people should think
• Cog science = combo of psychology, AI, linguistics, neuroscience, anthropology, and philosophy

The Computational-Representational Understanding of Mind (CRUM)

• Thinking can be best understood in terms of representational structures in the mind and
computational procedures that operate on those structures
• CRUM might be wrong lol → might be inadequate to explain fundamental facts about the mind
• But it’s the most successful approach thus far at explaining the mind
• CRUM assumes that the mind has mental representations like data structures, and computational
procedures like algorithms
• CRUM: mind <-> brain <-> computation
• Thinking arises from applying computational procedures to mental representations

Theories, Models, and Programs

• Cognitive theory postulates a set of representational structures and a set of processes that operate on
these structures
• A computational model makes these structures and processes more precise by interpreting them by
analogy with computer programs that consist of data structures and algorithms
• Vague ideas about representations can be supplemented by precise computational ideas about data
structures, and mental processes can be defined algorithmically
• To test this model, it must be implemented in a software program in a programming language
• This program may run on a variety of hardware platforms (PCs, Macs…) or it may be designed for a
specific kind of hardware that has many processes working in parallel
• 3 stages of the development of cognitive theories: discovery, modification, and evaluation
• The running program can contribute to evaluation of the model and theory in 3 ways:
1. Shows that the postulated representations and processes are computationally realizable
2. In order to show not only the computational realizability of a theory but also its psychological
plausibility, the program can be applied qualitatively to various examples of thinking
3. To show a much more detailed fit between the theory and human thinking, the program can
be used quantitively to generate detailed predictions about human thinking that can be
compared with the results of psychological experiments
Evaluating approaches to mental representations

• You can use these criteria to evaluate 6 different approaches to mental representation: logic, rules,
concepts, images, cases, and connections (artificial neural networks)
• Representational power = how much info a particular kind of representation can express
• Evaluate computational power of an approach in terms of how it accounts for 3 important kinds of high-level
thinking:
1. Problem solving = 3 kinds of problem solving to be explained: planning, decision making, and
explanation
- Planning requires a reasoner to get from an initial state to a goal state → find a
successful sequence of actions
- Decision making: choose the best plan among other possible alternatives
- Explanation requires figuring out why something happened
2. Intelligence = learn from experience to be better → so a mental representation must have
sufficient computational power to explain how people learn
3. A general cognitive theory must account for human language use
- 3 aspects of language use need to be explained:
1) People’s ability to comprehend language
2) Their ability to produce utterances
3) Children’s universal ability to learn language
• Psychological plausibility: requires accounting not just for the qualitative capacities of humans but also
for the quantitative results of psychological experiments concerning these capacities
• Neurological techniques: scanning techniques that can identify where and when in the brain certain
cognitive tasks are performed
o Important part of reflection on the operations of the mind
• Practical applicability: what knowledge representation has to tell us about
o Education: should be able to increase understanding of learning, and how to teach better
o Design: how to make computer interfaces that people like to use, should benefit from an
understanding of how people are thinking when they perform such tasks
o Intelligent systems: stand alone experts or as tools to support human decisions
o Mental illness: understanding and treatment

CHAPTER 7: CONNCECTIONS

• Cajal: brain consists of discrete cells


• Connectionist research: computational modelling inspired by the neuronal structure of the brain
o Emphasises importance of connections among simple neuronlike structures
o Discussed in terms of neural networks or parallel distributed processing (PDP)
• 2 models focused on in this paper:
1. Local representations in which neuronlike structures are given an identifiable interpretation in
terms of specifiable concepts or propositions
2. Distributed representations in networks that learn how to represent concepts or propositions
in more complex ways that distribute meaning over complexes of neuronlike structures
o Both can be used to perform parallel constraint satisfaction = simultaneously satisfying as many
of the competing constraints as possible to reach the best overall interpretation
• Explicit models of parallel constraint satisfaction were first developed for computer vision
o Marr and Poggio: cooperative algorithm for stereoscopic vision
▪ How does the brain match the two different images from both eyes?
▪ Matching is governed by several constraints involving how points in one image can
be put into correspondence with points in another
▪ They proposed using a parallel, interconnected network of processors in which the
interconnections represented the constraints

1. Representational power

• In local connectionist networks, the units have specifiable interpretation such as particular concepts or
propositions
• The activation unit can be interpreted as a judgement about the applicability of a concept or the truth
of a proposition
• Links can be one way or symmetric
• Links are either excitatory or inhibitory
• Distributed representations
o Feedforward network where info flows upward through the network
o Bottom row: input, top row: output
o Information is distributed all over, i.e. concepts are spread across many units in the network
• Recurrent networks: activation from the output units feed back into the input units
• Links between units suffice for representing simple associations, but lack the representational power to
capture more complex kinds of rules, ex: "for any x, if there is a y such that y is a computer and x likes y, then x
is a geek" → difficult to represent in connectionist networks
o But can try by using synchrony to link units that represent associated elements: a unit or
package of units that represents the x that does the liking can be made to fire with the same
temporal pattern as the x that likes computers
o Or another way to represent relational info is to use vectors = list of numbers that can be
understood as the firing rates of groups of neurons
▪ Vectors can be distinguished between agents (the ones that do the liking) and objects
(the ones that are liked)
▪ They can be combined to represent complex relational info needed for analogical
reasoning
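The notes do not spell out a binding scheme, so here is a minimal sketch of one common illustration: tensor-product binding of role and filler vectors. Every name, vector dimension, and the "Mary likes computers" example are invented for illustration, not taken from Thagard.

```python
import numpy as np

rng = np.random.default_rng(0)
DIM = 32  # dimensionality of each vector; arbitrary choice

# Random vectors standing in for roles and fillers (all names are made up).
roles   = {"agent": rng.normal(size=DIM), "object": rng.normal(size=DIM)}
fillers = {"Mary": rng.normal(size=DIM), "computers": rng.normal(size=DIM)}

# "Mary likes computers": bind each filler to its role with an outer product,
# then sum the bindings into a single distributed representation of the fact.
likes_fact = (np.outer(roles["agent"], fillers["Mary"])
              + np.outer(roles["object"], fillers["computers"]))

# Approximate unbinding: projecting the fact back through a role vector
# recovers (noisily) whichever filler played that role.
recovered = roles["agent"] @ likes_fact
for name, vec in fillers.items():
    cos = recovered @ vec / (np.linalg.norm(recovered) * np.linalg.norm(vec))
    print(f"{name:10s} similarity {cos:+.2f}")   # "Mary" should score highest
```

The point of the sketch is only that role–filler structure can live in lists of numbers, which is what lets vector representations carry the relational info needed for analogical reasoning.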

2. Computational power

A. Problem solving

• Parallel network
1. Concepts such as outgoing are represented by units
2. Positive internal constraints are represented by excitatory connections: if 2 concepts are
related by a positive constraint, then the units representing the elements should be linked by
an excitatory link
3. Negative internal constraints are represented by inhibitory connections: if 2 concepts are
related by a negative constraint, then the units representing the elements should be linked by
an inhibitory link
4. An external constraint can be captured by linking units representing elements that satisfy the
external constraint to a special unit that affects the units to which it is linked either positively
(excitatory links) or negatively (inhibitory links)
• The neural network computes by spreading activation between units that are linked to each other
o Unit with excitatory link to an active unit → gain activation
o Unit with inhibitory link to an active unit → decreased activation
• Relaxation = constraints can be satisfied in parallel by repeatedly passing activation among all units,
until after some number of cycles of activity all units have reached stable activation levels
o Adjusting the activation of all units based on the units to which they are connected until all
units have stable high or low activations
• Settling = achieving stability
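A minimal sketch of relaxation in a toy local network. The concept names, link weights, and the exact update rule are illustrative assumptions, not taken from the chapter; the structure (excitatory/inhibitory links, a clamped special unit for the external constraint, cycling until activations stabilise) follows the description above.

```python
import numpy as np

# Toy local network: each unit stands for one concept or hypothesis.
units = ["outgoing", "shy", "likes-parties", "evidence"]
idx = {name: i for i, name in enumerate(units)}

# Symmetric weights: positive = excitatory link, negative = inhibitory link.
# The "evidence" unit implements an external constraint favouring "outgoing".
W = np.array([
    [ 0.0, -0.6,  0.5,  0.4],   # outgoing
    [-0.6,  0.0, -0.5,  0.0],   # shy
    [ 0.5, -0.5,  0.0,  0.0],   # likes-parties
    [ 0.4,  0.0,  0.0,  0.0],   # evidence (clamped on)
])

a = np.zeros(len(units))
a[idx["evidence"]] = 1.0

# Relaxation: repeatedly pass activation along the links, in parallel,
# until every unit settles at a stable high or low activation.
for cycle in range(500):
    new_a = np.clip(a + 0.1 * (W @ a), -1.0, 1.0)
    new_a[idx["evidence"]] = 1.0            # external constraint stays clamped
    if np.max(np.abs(new_a - a)) < 1e-4:    # the network has "settled"
        a = new_a
        break
    a = new_a

for name in units:
    print(f"{name:14s} {a[idx[name]]:+.2f}")  # e.g. outgoing high, shy low
```

Running it, the units linked to the evidence unit by excitatory paths end up with high stable activation and the inhibited unit ends up low, which is the "settling" the bullet above describes.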

i. Planning

• Touretzky and Hinton: rule-based system that uses distributed representations


o Treats the process of matching the IF part of a rule as a kind of parallel constraint satisfaction
o However, the resulting system can match only clauses with simple predicates, not relations
• Nelson, Thagard, and Hardy use local representations to implement rule matching and analogy
application as parallel constraint satisfaction
o Resulting system models plan construction

ii. Decision

• Elements of a decision are various actions and goals
• The positive internal constraints come from
facilitation realisations: if an action facilitates a
goal, then the action and goal go together
• The negative internal constraints come from
incompatibility relations: when 2 actions can’t be
performed or satisfied together
• External constraint comes from goal priority:
some goals are inherently desirable, providing a
positive constraint
• Analogies are useful → but depend on parallel constraint satisfaction

iii. Explanation

• = activation of prototypes encoded in distributed networks
o Inference to the best explanation = activation of the most appropriate prototype
• This has been modelled via a theory of explanatory
coherence
o Units representing pieces of evidence are linked to
a special evidence unit that activates them, and
activation spreads out to other units
o There is an inhibitory link connecting the units
representing the competing hypothesis
o Choice of the best explanation can involve not only the evidence for particular hypotheses, but
also why those hypotheses might be true

B. Learning

• Learning can take place by
a) Adding new units
b) Changing the weights on the links between units → most researched
• Ex of weight learning: Hebb → what fires together, wires together
o This type of learning is unsupervised = doesn’t need a teacher
• Most common kind of learning in feedforward networks with distributed representations:
backpropagation = a generalization of the delta rule to multi-layered feedforward networks, made
possible by using the chain rule to iteratively compute gradients for each layer
o Ex: after training, the network should be able to classify students: given a set of features
activated in the input layer, it should activate an appropriate stereotype at the output layer
o Can train the network by repeatedly adjusting the weights that connect the different units (a
minimal sketch of such a training loop follows at the end of this section)

o They also can identify statistical associations between input and output features that are more
subtle than rules
o Limitations:
▪ Requires a supervisor to say whether an error has been made
▪ Tends to be slow, requiring 100s and 1000s of examples to train a simple network
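The sketch referred to above: the two weight-learning schemes from this section, a Hebbian update and a tiny backpropagation loop. The task (XOR), layer sizes, learning rates, and iteration counts are arbitrary assumptions standing in for the notes' "features → stereotype" example.

```python
import numpy as np

rng = np.random.default_rng(1)

def hebb(w, pre, post, lr=0.1):
    """Unsupervised Hebbian update: 'what fires together, wires together'."""
    return w + lr * np.outer(post, pre)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Toy supervised task (XOR) standing in for the student-classification example.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
T = np.array([[0], [1], [1], [0]], dtype=float)

HIDDEN = 8
W1 = rng.normal(scale=0.5, size=(2, HIDDEN)); b1 = np.zeros(HIDDEN)
W2 = rng.normal(scale=0.5, size=(HIDDEN, 1)); b2 = np.zeros(1)
lr = 1.0

for epoch in range(10_000):
    # Forward pass: activation flows from the input layer to the output layer.
    H = sigmoid(X @ W1 + b1)
    Y = sigmoid(H @ W2 + b2)

    # Backward pass: the delta rule generalized by the chain rule, so the
    # error signal at the output is propagated back to earlier weights.
    dY = (Y - T) * Y * (1 - Y)
    dH = (dY @ W2.T) * H * (1 - H)

    W2 -= lr * H.T @ dY; b2 -= lr * dY.sum(axis=0)
    W1 -= lr * X.T @ dH; b1 -= lr * dH.sum(axis=0)

print(np.round(sigmoid(sigmoid(X @ W1 + b1) @ W2 + b2), 2))  # aims for 0,1,1,0

# The Hebbian rule, by contrast, needs no teacher: a weight grows wherever the
# pre- and post-synaptic units are active at the same time.
print(hebb(np.zeros((2, 2)), pre=np.array([1.0, 0.0]), post=np.array([1.0, 1.0])))
```

The loop also illustrates the limitation noted above: even this tiny network needs thousands of passes over the examples, and a supervisor (the targets T) to say when an error has been made.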

C. Language

• Interconnected units can represent hypotheses about what letters are present and about what words
are present, and relaxing the network can pick the best
overall interpretation
• Construction integration model of discourse
comprehension: meaning is determined by activation
flow in the network
o Which interpretation gets activated depends
on how input info will affect the various units
and links
• Children have a tendency to wrongly form past tenses (ex: goed, hitted)
o McClelland showed how a connectionist network can be trained to reproduce the children’s
error using distributed representations rather than rules
o But there is a debate (which I can't be fucked to write down but I don't think it matters so it's
chill)

3. Psychological plausibility

• Connectionist models have furnished explanations of many psychological phenomena


o Ex: how the duration of context letters affects the perceptibility of a word, and also explains
various speech perception phenomena like temporal effects
o Ex: why some children are quick to learn some things, but slow to learn in others
• They have also suggested new ones
o Ex: purpose of an analogy has an effect on analogical mapping
o Ex: how people form impressions of other people
• Backpropagation techniques have also simulated many psychological processes
o Ex: used in a way that simulates many aspects of human performance → how words vary in
processing difficulty, how novel items are pronounced, and how people make the transition
from beginning to skilled reading

4. Neurological plausibility

• Real neurons are much more complex than the units in an artificial network, which merely pass
activation to each other
• They also have neurotransmitters, and also undergo changes in synaptic and non-synaptic properties
• Each artificial unit = representing a neuronal group, a complex of neurons that work together to play a
processing role
• Local networks use symmetric links between units, but the synapses connecting real neurons are one-way
o Neurons do have neural pathways that allow them to influence each other in both directions
o A given neuron has excitatory links to other neurons or inhibitory links to other neurons, but not a mixture
• Actual neural networks have the feedforward character of backpropagation networks
o But there is no known neurological mechanism by which the same pathways that feed
activation forward can also be used to propagate error correction backward
• → despite all this shit, it’s still a good analogy

5. Practical applicability

• Education: reading → to read, you need to work out which letters make up which words and simultaneously take
into account meaning and context
o Reading is parallel constraint satisfaction → the constraints simultaneously involve spelling and
meaning and context
• Design = parallel constraint satisfaction
o Ex: architect’s design for a building must take into account numerous constraints: cost,
purpose of building, surroundings, aesthetic
• Engineering → ex: training networks to recognise bombs, underwater objects, and handwriting

ACTIVATION FUNCTIONS
ACTIVATION FUNCTIONS IN NEURAL NETWORKS
Activation function? It’s just a function that you use to get the output of a node

Why we use Activation functions with Neural Networks?

• It is used to determine the output of a neural network, e.g. yes or no
o It maps the resulting values in between 0 to 1 or -1 to 1 etc. (depending upon the function)
• The Activation Functions can be basically divided into 2 types:
1. Linear Activation Function
2. Non-linear Activation Functions
- Derivative or Differential: Change in y-axis w.r.t. change in x-axis → also known as slope
- Monotonic function: A function which is either entirely non-increasing or non-decreasing

Types

1. Sigmoid or Logistic Activation Function


• Looks like an S curve
• It exists between 0 and 1 → 0 = neuron not firing, 1 = neuron
firing
• Especially used for models where we have to predict the
probability as an output
• Its differentiable and monotonic

2. Tanh or hyperbolic tangent Activation Function


• Like a sigmoid but better
• Range is from (-1 to 1)
• Is also sigmoidal (s - shaped)
• Advantage: negative inputs will be mapped strongly negative and the zero inputs will be mapped near
zero
• The function is differentiable
• The function is monotonic while its derivative is not monotonic
• The tanh function is mainly used classification between two classes

→ both are used in feedforward nets

3. ReLU (Rectified Linear Unit) Activation Function

• Most used
• Half-rectified (from bottom) → f(z) is zero when z is less than zero, and f(z) is equal to z when z is above or
equal to zero
• Range: 0 – infinity
• The function and its derivative both are monotonic
• Disadvantage: all the negative values become zero immediately which decreases the ability of the
model to fit or train from the data properly
4. Leaky ReLU
• Attempt to solve the dying ReLU problem
• The leak helps to increase the range of the ReLU function → usually, the value of a is 0.01 or so
• When a is not fixed at 0.01 but chosen randomly, it is called Randomized ReLU
• Range: -infinity to infinity
• Both Leaky and Randomized ReLU functions are monotonic in nature, and their derivatives are also
monotonic

Why derivative/differentiation is used? When updating the curve, to know in which direction and how much to
change or update the curve depending upon the slope
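A compact sketch of the four functions above and their derivatives, using the conventional formulas (sigmoid σ(x) = 1/(1+e^-x), tanh, max(0, x), and the leaky variant with the a = 0.01 mentioned above); the NumPy implementation itself is mine, not from either article.

```python
import numpy as np

def sigmoid(x):
    """Logistic S-curve: squashes any input into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))

def tanh(x):
    """Hyperbolic tangent: S-shaped like the sigmoid, but ranges over (-1, 1)."""
    return np.tanh(x)

def relu(x):
    """Rectified linear unit: 0 for negative inputs, the input itself otherwise."""
    return np.maximum(0.0, x)

def leaky_relu(x, a=0.01):
    """Leaky ReLU: lets a small slope `a` through for negative inputs."""
    return np.where(x > 0, x, a * x)

# Derivatives (slopes), needed when updating weights by gradient descent.
def d_sigmoid(x):
    s = sigmoid(x)
    return s * (1.0 - s)

def d_tanh(x):
    return 1.0 - np.tanh(x) ** 2

def d_relu(x):
    return (x > 0).astype(float)

def d_leaky_relu(x, a=0.01):
    return np.where(x > 0, 1.0, a)

z = np.linspace(-3, 3, 7)
for f in (sigmoid, tanh, relu, leaky_relu):
    print(f"{f.__name__:11s}", np.round(f(z), 2))
```

Evaluating each function on the same inputs makes the ranges in the notes visible: (0, 1) for sigmoid, (-1, 1) for tanh, and zeroed (or nearly zeroed) negatives for the two ReLU variants.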

ACTIVATION FUNCTIONS AND ITS TYPES-WHICH IS BETTER?


What is an activation function?

• It is the key non-linear ingredient of an artificial neural network (A-NN), a very powerful but
complicated machine learning technique that mimics a human brain and how it functions
• Activation functions are really important for an A-NN to learn and make sense of something really
complicated: the non-linear, complex functional mappings between the inputs and the response
variable
• They introduce non-linear properties to our Network
• Main purpose: convert an input signal of a node in an
A-NN to an output signal
o That output signal now is used as a input in the next layer in the stack
• Specifically, in A-NN we do the sum of products of inputs(X) and their corresponding Weights(W) and
apply an Activation function f(x) to it to get the output of that layer and feed it as an input to the next
layer
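In symbols, a node's output is the activation function applied to the weighted sum of its inputs (the bias term b is a standard extra parameter that the notes above do not mention explicitly):

$$\text{output} = f\Big(\sum_i W_i X_i + b\Big)$$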

Why can’t we do it without activating the input signal?

• If we do not apply an activation function, then the output signal would simply be a linear function
• A Neural Network without Activation function would simply be a Linear regression Model, which has
limited power and does not perform well most of the time
• We want our Neural Network to not just learn and compute a linear function but something more
complicated than that.
• Also without activation function our Neural network would not be able to learn and model other
complicated kinds of data such as images, videos, audio, speech etc

→ We need to apply an activation function f(x) so as to make the network more powerful and add ability to it to
learn something complex and complicated form data and represent non-linear complex arbitrary functional
mappings between inputs and outputs

→ Hence using a nonlinear activation, we are able to generate non-linear mappings from inputs to outputs

→ An activation function should be differentiable = we can find the slope of the curve at any point

- We need this in order to perform the backpropagation optimization strategy: propagating
backwards in the network to compute gradients of Error (loss) with respect to Weights, and then accordingly
optimizing the weights using Gradient Descent or any other optimization technique to reduce the Error (lol what)

Most popular types

1. Sigmoid or Logistic
2. Tanh — Hyperbolic tangent
3. ReLu -Rectified linear units

GRACEFUL DEGRADATION

CONNECTIONISM: AN INTRODUCTION
• Graceful degradation is the property of a network whose performance progressively becomes worse as
the number of its randomly destroyed units or connections increases
• The alternative property might be called catastrophic degradation = the property of a system whose
performance plummets to zero when even a single component of the system is destroyed
• That the performance of biological neural networks degrades gradually -- "gracefully" -- not
catastrophically, is another reason why artificial neural networks are more accurate information
processing models than classical ones
o After all, altering even a single rule in a classical computer model tends to bring the computer
implementing the damaged program to a "crashing" halt
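One rough way to see this property in an artificial network: train a small net, then zero out a growing fraction of its connections at random and watch accuracy fall off gradually instead of crashing. The task, architecture, training details, and damage procedure below are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Toy task: decide whether the sum of ten inputs is positive.
X = rng.normal(size=(200, 10))
T = (X.sum(axis=1) > 0).astype(float).reshape(-1, 1)

# Train a small one-hidden-layer network with plain gradient descent.
W1 = rng.normal(scale=0.3, size=(10, 16)); b1 = np.zeros(16)
W2 = rng.normal(scale=0.3, size=(16, 1));  b2 = np.zeros(1)
for _ in range(3000):
    H = sigmoid(X @ W1 + b1)
    Y = sigmoid(H @ W2 + b2)
    dY = (Y - T) * Y * (1 - Y)
    dH = (dY @ W2.T) * H * (1 - H)
    W2 -= 0.5 * H.T @ dY / len(X); b2 -= 0.5 * dY.mean(axis=0)
    W1 -= 0.5 * X.T @ dH / len(X); b1 -= 0.5 * dH.mean(axis=0)

def accuracy(W1, W2):
    pred = sigmoid(sigmoid(X @ W1 + b1) @ W2 + b2) > 0.5
    return float((pred == (T > 0.5)).mean())

# "Lesion" the network: zero a growing fraction of connections at random.
# Performance should decline gradually (gracefully) rather than collapse.
for frac in (0.0, 0.2, 0.4, 0.6, 0.8):
    m1 = rng.random(W1.shape) >= frac
    m2 = rng.random(W2.shape) >= frac
    print(f"damaged {frac:.0%} of links -> accuracy {accuracy(W1 * m1, W2 * m2):.2f}")
```

The contrast with a classical rule-based program is the point: deleting a random line of code tends to break the whole thing, whereas knocking out random connections here only degrades performance step by step.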

TECHTARGET
• Graceful degradation = ability of a computer, machine, electronic system or network to maintain
limited functionality even when a large portion of it has been destroyed or rendered inoperative
• The purpose: to prevent catastrophic failure. Ideally, even the simultaneous loss of multiple
components does not cause downtime in a system with this feature
• The operating efficiency or speed declines gradually as an increasing number of components fail
• Graceful degradation is an outgrowth of effective fault management, which is the component of
network management concerned with detecting, isolating and resolving problems
• In Web site design, the term refers to the judicious implementation of new or sophisticated features to
ensure that most Internet users can effectively interact with pages on the site

DELTA RULE

TECHOPEDIA
Definition?

The Delta rule in machine learning and neural network environments is a specific type of backpropagation that
helps to refine connectionist ML/AI networks, making connections between inputs and outputs with layers of
artificial neurons

Explanation

• In general, backpropagation has to do with recalculating input weights for artificial neurons using a
gradient method
o Delta learning does this using the difference between a target activation and an actual
obtained activation
o Using a linear activation function, network connections are adjusted
• Another way to explain the Delta rule is that it uses an error function to perform gradient descent
learning:
o Essentially in comparing an actual output with a targeted output, the technology tries to find a
match
o If there is not a match = the program makes changes
o The actual implementation of the Delta rule is going to vary according to the network and its
composition → but by employing a linear activation function, the Delta rule can be useful in
refining some types of neural network systems with particular flavours of backpropagation

WIKIPEDIA
• Delta rule is a gradient descent learning rule for updating the weights of the inputs to artificial neurons
in a single-layer neural network
• It is a special case of the more general backpropagation algorithm. For a neuron j with activation
function g(x), the delta rule for j's ith weight wji is given by:
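$$\Delta w_{ji} = \alpha \,(t_j - y_j)\, g'(h_j)\, x_i$$

where α is the learning rate, t_j is the target output, y_j = g(h_j) is the actual output, h_j = Σ_i x_i w_ji is the neuron's weighted input, and x_i is the i-th input. This is the standard single-layer case of the gradient step that backpropagation (see the Learning section above) extends to multiple layers.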
