THROUGH THE SIG LOOKING GLASS
Benchmark Report | 2023
softwareimprovementgroup.com
FOREWORD
Here it is, our fifth SIG Benchmark Report on the state of technology. We started this tradition five years ago, producing fascinating results for the digital community annually. This year is no different. We have great things to share with you, with thorough data analysis behind it to substantiate our findings. I would like to thank the entire team who worked on this great piece of work. I hope you enjoy reading it as much as we did creating it.
But first and foremost, digest the results, and put them into action. Some of our findings are quite positive and can be seen as a compliment to the digital world: we do see that, on average, build quality is rising. However, in all honesty, most of our findings are less favorable and require your immediate action. Read the section of your choice or review the entire report: every section contains actionable findings.
There are many things to list here, but some really stand out. In no particular order, here are a few things you can learn more about in the report:
NEW SIG ARCHITECTURE QUALITY MODEL PINPOINTS COST, RISK, AND SLOWDOWN FACTORS (p. 19-32)
Did you know that great software architecture needs a great alignment with the organizational and social aspects of your teams? As well as a solid design on top of the right technological choices? With our new model, it is possible to measure and control these aspects. In terms of resolution speed, good architecture quality can mean a factor of two times faster than poor quality.
AI AND BIG DATA SYSTEMS PLAGUED BY POOR CODING (p. 33-40)
AI is all the rage, also in the domain of enterprise software. Are we seeing a next generation of smart systems that are being properly engineered and coded? Or are we looking at a proverbial gold rush? It's mostly the latter, although exceptions do exist. We crunched the data on the AI and big data systems in our benchmark to show you the differences. What we are often seeing in AI systems is a lack of test code, a lack of abstraction, and an overall maintainability that scores below the benchmark.
OPEN SOURCE REMAINS A WIDE-OPEN BACKDOOR (p. 41-56)
Open source brings great productivity when building systems. Yet each month, we are seeing that 50% of enterprise software systems are vulnerable due to security issues in open source libraries; business critical systems just as much. Further, the fix speeds of vulnerabilities still leave a lot to be desired. Legislative changes such as the US Cybersecurity Strategy and the EU Cyber Resilience Act will soon require software producers to have zero vulnerabilities. Is your team ready?

FIRST DIGITAL SKILLS BENCHMARK SHOWS POOR JOB ALIGNMENT (p. 57-64)
Every IT organization is on the hunt for the right people with the right competencies. How to find them, train them, and keep them? EXIN and SIG are assessing thousands of IT professionals globally to find out what they are good at. Surprisingly, many are in positions where they need upskilling to compete better in their current roles. At the same time, demanding positions such as enterprise architecture and leadership roles are truly becoming skill hubs. Those few sought-after professionals are in a great position to move to their next roles, and the challenge will be to keep them on.

As you can see, there is so much to share, so please read our report and let us know what you think.
1. SOFTWARE BUILD QUALITY
Major differentiator between industries and technologies
Magiel Bruntink / Pepijn van de Kamp / Benedetta Lavarone
[Figure: Looking at enterprise software from both inside and outside: the outside-in view (functionality) versus the inside-out view (the software system itself).]

[Chart: Global build quality as measured by SIG, including 100K person-years worth of enterprise software; weighted mean rating in stars.]
Looking back at the past three years, we observe that the average
Modularity rating (purple line) is in gradual decline, in contrast to the
other build quality properties. The implication is that the architecture
aspect of build quality needs more attention to prevent further decline.
For this reason, SIG created a new Architecture Quality model: the first
outcomes of that are presented in the next chapter.
1 https://www.softwareimprovementgroup.com/software-analysis
2. Software build quality across industries and technology stacks
Software industry sectors
Let’s start by slicing the SIG Build Quality benchmark by software
industry sectors. Most enterprise software producers have a clearly
defined target industry like Banking or Retail, for instance. We
further include Government as a broad category of different kinds of
governmental and regulatory responsibilities. Furthermore, the category
Software & Computer Services includes clients that are active across
many different industries or are focused on clientele in the software industry itself.
Our 2023 top 10 ranking of software industry sectors can be seen in the
following table. The Delta column indicates ranking changes compared
to 2022. Overall, the ranking remains rather stable, with a position swap
between first and second place, where Energy, Oil & Gas companies are
again leading the pack. Government systems gain a place at the expense
of Insurance. This year’s newcomer Health Care enters at position #8,
slightly below the market average of 3.0 stars.
In 2020 we published our industry sector ranking for the first time. Back then, the margin between #1 and #10 was about 0.51 stars, while we are now looking at a difference of 0.64 stars. The top sectors are gaining stars, rather than the lower performers losing them. The rates at which legacy
technologies can be phased out play a major role in these trends. Trailing
industries should therefore increase their actions toward modernization to
avoid being disrupted by newcomers.
# | Delta | Top 10 software industry sectors 2020–2022 | Score
1 | ↑ | Energy, Oil & Gas | 3.40
2 | ↓ | Industrial Transportation | 3.34
3 |   | Banking | 3.33
4 | ↑ | Government | 3.25
5 | ↓ | Insurance | 3.22
6 |   | Financial Services | 3.05
7 |   | Software & Computer Services | 2.98
8 | new | Health Care | 2.92
9 | ↑ | Telecommunications | 2.92
10 | ↓ | Retail | 2.76
• Scores range between 0.5 and 5.5 stars in the SIG Maintainability
Model, calculated as mean maintainability weighted by systems’
volume, for each system’s most recent measurement.
• Industry sectors or technology stacks have at least 50 systems across
at least 10 clients.
• The table only shows the top 10 ranked industry sectors.
• For industry sectors, the deltas are rank position changes since the
2022 Benchmark Report.
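To make the aggregation concrete, here is a minimal sketch of the volume-weighted mean rating described above. It is our illustration, not SIG's published code:

# Minimal sketch (our illustration, not SIG's published code) of the
# aggregation above: mean maintainability weighted by system volume,
# using each system's most recent measurement.
from dataclasses import dataclass

@dataclass
class System:
    rating: float  # maintainability in stars, 0.5-5.5
    volume: float  # system volume, e.g. in person-months

def weighted_mean_rating(systems: list[System]) -> float:
    total_volume = sum(s.volume for s in systems)
    return sum(s.rating * s.volume for s in systems) / total_volume

# Example: one large legacy system dominates two small, healthy ones.
sector = [System(4.0, 10), System(3.5, 15), System(2.5, 120)]
print(f"{weighted_mean_rating(sector):.2f} stars")  # ~2.71, not the plain mean of 3.33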
# | Top 10 software technology stacks 2020–2022 | Score
1 | Web/Templating | 3.40
2 | BPM/Middleware | 3.33
3 | Low Code | 3.22
4 | Modern general purpose | 3.18
5 | Configuration | 2.94
6 | Scripting | 2.84
7 | Embedded/System | 2.73
8 | Packaged Solution Customization | 2.57
9 | Legacy 3GL/4GL | 2.45
10 | Database | 2.45
For the Low Code category, we want to share the following additional
analysis. Looking at Component Entanglement, one of the underlying
metrics for architectural quality, we see a stark difference with competing
modern general-purpose languages. Component Entanglement is
a measure of overall layering and adherence to good architectural
principles.
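SIG's exact Component Entanglement formula is not spelled out here, but the intuition can be sketched: in a well-layered system, dependencies point one way, while entangled components depend on each other mutually. A toy illustration (our sketch, with hypothetical component names):

# Illustrative only: SIG's Component Entanglement metric is not public.
# The intuition, penalizing cyclic rather than layered dependencies, can
# be sketched as the share of dependent component pairs that are mutual.
from itertools import combinations

def entanglement(deps: dict[str, set[str]]) -> float:
    """Fraction of dependent component pairs that depend on each other mutually."""
    pairs = [(a, b) for a, b in combinations(deps, 2)
             if b in deps[a] or a in deps[b]]
    cyclic = [(a, b) for a, b in pairs if b in deps[a] and a in deps[b]]
    return len(cyclic) / len(pairs) if pairs else 0.0

# A layered system: ui -> logic -> data, dependencies point one way.
layered = {"ui": {"logic"}, "logic": {"data"}, "data": set()}
# An entangled system: every pair of components is mutually dependent.
tangled = {"ui": {"logic", "data"}, "logic": {"ui", "data"}, "data": {"ui", "logic"}}
print(entanglement(layered), entanglement(tangled))  # 0.0 vs 1.0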
[Chart: Component Entanglement rating (stars) versus system volume in person-months (1 to 1,000, log scale), for Low Code versus modern general-purpose tech stacks.]
Key Finding: Low code systems are more entangled, especially at larger system sizes. This makes them harder to maintain, compared to systems built in modern general-purpose languages.
Key Findings:
• Software systems in different life-cycle phases have very different growth and change characteristics, impacting estimation by an order of magnitude.
• While systems are in Evolution, they typically grow at 10% per year and have a change rate of 47% per year.
• In Maintenance, the growth of a code base stagnates while existing code is still changed at a typical rate of 15% per year.
Software lifecycle management within organizations with large software
portfolios is hard to do efficiently. We recommend tracking high-level
KPIs to indicate whether the expected maintenance activities take place
according to the expected life-cycle phase.
Before diving in, we need to define the typical life-cycle phases of
enterprise software systems:
• Initial development
The Initial Development phase starts with the first code being written
and typically ends when the software is considered both stable
and feature-rich enough to be rolled out to the full target group of
users. In this phase, the software is typically written by one or more
dedicated development teams. In this phase, rapid growth of new code volume and a large number of changes to the existing code are expected.
• Evolution
After going into production, typically evolutionary activities take place:
addressing feedback from users on existing features, adding more
features to the software, and working on non-functional aspects of
the software (e.g. increasing scalability) as the user base grows. The
foundation of the software is now in place. In this phase, the software
is typically under development by one or more dedicated teams.
• Maintenance
In the Maintenance phase, the code base is typically brought under
the responsibility of a development team that maintains multiple code
bases (no dedicated team). In this phase, the ability of this team to
make changes to the existing software depends on the degree to
which knowledge of the code base is still available in the team, as
well as the quality of the code base, documentation, and integrity of
the architecture. Typical activities that are performed in this phase
are handling small change requests, bug fixes, and keeping the
underlying software libraries, frameworks, and other infrastructure
components up to date.
• Decommissioning
In this phase, it is time to execute the change activities needed for the sunset of the software system. Functionality and users are migrated to other systems. No major changes are typically made to the code base at this stage, other than changes needed for phasing out specific functionality or for keeping the software in a safe and secure state (e.g. patching security vulnerabilities).
• End of life
In this phase, the software system is switched off and no more
changes are made to the software. The code base and related
artifacts are safely archived.
Let’s have a look at our data on two high-level KPIs that can help with tracking evolution and maintenance activities:
[Chart: Yearly growth and change rate of enterprise software systems per life-cycle phase, based on analysis of code changes to 500 enterprise software systems between 2020 and 2022; the medians are listed in the table below.]
The data obviously shows variance. These numbers should be used as initial guidance, and a more detailed SIG analysis may be required to adequately forecast a specific system in a portfolio.
Life-cycle phase | Yearly Growth (Low / Median / High) | Yearly Change (Low / Median / High)
Initial Development | 8% / 52% / 215% | 29% / 168% / 562%
Evolution | 0% / 10% / 33% | 10% / 47% / 149%
Maintenance | -3% / 0% / 8% | 3% / 15% / 53%
Decommissioning & End of life | -1% / 0% / 2% | 0% / 3% / 16%
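Such KPIs are simple to compute from version-control statistics. A minimal sketch (our illustration; SIG's exact definitions may differ, for instance in how churn is counted):

# Illustration (not SIG's exact method): yearly growth and change rate
# of a code base, derived from version-control statistics over one year.
def yearly_growth(loc_start: int, loc_end: int) -> float:
    """Net growth relative to the code base size at the start of the year."""
    return (loc_end - loc_start) / loc_start

def yearly_change(loc_start: int, lines_added: int, lines_deleted: int) -> float:
    """Churn (added plus deleted lines) relative to the starting size."""
    return (lines_added + lines_deleted) / loc_start

# A system in Evolution: 100K LOC grows to 110K, with 35K lines added
# and 25K deleted over the year.
print(f"growth: {yearly_growth(100_000, 110_000):.0%}")        # 10%
print(f"change: {yearly_change(100_000, 35_000, 25_000):.0%}") # 60%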
It’s clear that the software life-cycle phases have a major impact on yearly growth and change rates. What are the implications? We see two major recommendations.
[Figure: Maintainability measurement is our tool to determine software build quality: system-level measurements translate to an overall rating of technical quality.]
2. NEW SIG ARCHITECTURE QUALITY MODEL
Pinpoints cost, risk, and slowdown factors
Dennis Bijlsma / Lodewijk Bergmans

2 https://www.oneadvanced.com/trends-report/2020-21/
[Figure: The SIG Architecture Quality model: Architecture Quality at the center, surrounded by its six aspects: Structure, Knowledge, Communication, Evolution, Data Access, and Technology Stack.]
Each architecture aspect influences the degree to which the system can be easily changed or extended. The six aspects are Structure, Knowledge, Communication, Evolution, Data Access, and Technology Stack.
So are microservices just a fad that is already dying out, or are these principles here to stay? If we look at SIG’s benchmark data, we see that microservice architectures became mainstream around 2017, causing a significant increase in the average number of components per system that is still visible to this day. This means the trend towards systems that consist of many small components is both widespread and showing no signs of slowing down.
[Chart: Relation between architecture quality (stars) and issue resolution time in days.]
Key Finding: Systems with 4-star Architecture Quality resolve issues two times faster than 2-star systems.
2. Measure maintainability and socio-technical architecture to determine technical strategy
Historically, most attention on legacy systems has focused on the
functional, operational, and technical challenges surrounding such systems.
However, socio-technical architecture is increasingly a focus area, being
named in a report by Ardoq3 as the #1 trend for enterprise architects.
SIG considers maintainability to be the foundation for ensuring agility and flexibility. So how does the SIG maintainability rating relate to the socio-technical software architecture ratings?
[Chart: Maintainability versus Architecture Quality, with quadrant labels such as Refactor and Retain.]
This chart shows the maintainability and architecture ratings for all
systems where SIG evaluated both aspects in 2022. The x-axis depicts
a system’s maintainability rating, while the y-axis depicts the socio-
technical architecture rating. The size of each dot represents the size of
each system. The colors are used to easily identify each quadrant.
3 https://content.ardoq.com/enterprise-architecture-trends-infographic
There are many systems with poor maintainability but acceptable socio-technical architecture, but there are also many highly maintainable systems that have poor architecture. This
means that, depending on which quadrant your system is located in, you
can decide to focus on specific quality aspects first.
The size of the dots also reveals that most of the systems in the ‘retain’ category are smaller systems, which is generally also a trend for highly maintainable systems. For architecture quality, the size of the systems matters much less: there are many mid- to large-sized systems with high architecture quality, even though we see that the largest systems are in the renovate/retire category.
Key Finding: Our architecture benchmark confirms that larger systems often suffer in quality, but not always: it is indeed possible to create big systems at high architectural quality. High architecture quality allows systems to be refactored more easily and kept maintainable.
In the next section, we investigate which factors most strongly influence
architecture quality and which are the most relevant areas to focus on to
ensure good architecture quality and an evolvable system.
3. Coupling and knowledge are the main challenges for socio-technical architecture
In the previous section, we explained the challenges that organizations
face in their socio-technical architecture, especially when it comes to
modernizing legacy systems. But what factors contribute most to these
challenges?
[Chart: What aspects distinguish the high performers on architecture quality? The top five properties: Knowledge Distribution, Component Coupling, Component Freshness, Communication Centralization, and Data Coupling.]
The above diagram shows the top five properties where high architecture
quality systems outperform the medium quality systems: it shows how
the average high performer compares to the average system in the
benchmark for each of the properties. More precisely, each bar shows the
delta between the median rating of the high performers and the median
rating of the mid-performers in the benchmark. A significant delta for a
given property indicates that, for most systems, there is significant room
for improvement, especially for that property.
Key Finding: Knowledge Distribution, Component Coupling, and Communication Centralization are the big factors of high-quality architecture in the SIG architecture quality benchmark: get these right to increase architecture quality.
The system properties with the most significant deltas are Knowledge Distribution, Component Coupling, and Communication Centralization. These properties cover very different aspects of the architecture. Communication Centralization and Component Coupling cover technical coupling: the dependencies between different parts of the code. More dependencies, spread more widely across the code, make the code harder to change and cause changes to one component to ripple through to other components (often impacting additional development teams).
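To illustrate why this matters, consider the "ripple set" of a change: every component that transitively depends on the changed one. A small sketch (ours, with hypothetical component names):

# Illustration: why coupling makes change expensive. Given a map from
# each component to its dependents, the ripple set of a change is every
# component reached by following depends-on edges in reverse.
def ripple_set(dependents: dict[str, set[str]], changed: str) -> set[str]:
    """All components transitively impacted by a change to `changed`."""
    impacted: set[str] = set()
    frontier = [changed]
    while frontier:
        for dep in dependents.get(frontier.pop(), set()):
            if dep not in impacted:
                impacted.add(dep)
                frontier.append(dep)
    return impacted

# billing depends on auth; ui depends on both. Changing auth ripples to both.
dependents = {"auth": {"billing", "ui"}, "billing": {"ui"}}
print(ripple_set(dependents, "auth"))  # {'billing', 'ui'}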
4 Chuanqi Wang, Yanhui Li, Lin Chen, Wenchin Huang, Yuming Zhou, Baowen Xu, Examining the
effects of developer familiarity on bug fixing, Journal of Systems and Software, Volume 169, 2020
[Chart: Survey results: what do you see as the main challenge in modernizing your current software landscape?]
The SIG architecture quality model aims to bring better insights into the ability of an architecture to evolve swiftly. The underlying metrics of the model offer suggestions on which properties of the system can (or need to) be improved.
1. Define architecture principles, not rules
2. Capture the rationale for architecture decisions
3. Address architecture in an incremental fashion
Obviously, some sort of feedback loop is still needed to make sure the principles are actually applied in practice. This is where SIG’s architecture quality model can help: low ratings can indicate that a principle is not followed, or that a new principle needs to be defined.
5 https://www.softwareimprovementgroup.com/using-architecture-decision-records-to-guide-your-architecture-choices/
SIG recommended practice:
ADDRESS ARCHITECTURE IN AN INCREMENTAL FASHION.
Addressing technical debt at code level is often done using small, incremental refactorings. Such an approach leads to lower risk, as the scope of changes is smaller. In many organizations, architecture changes do not follow this agile approach and tend to be structured into “projects” where large changes are made over a period of time. The system properties in the SIG Architecture Quality Model map directly to architecture modernization techniques that can be applied in this incremental way.
3. AI AND BIG DATA SYSTEMS PLAGUED BY POOR CODING
Rob van der Veer / Asma Oualmakran
Data analysis of our benchmark shows that AI/big data systems are significantly less maintainable than other systems: 73% of AI/big data systems score below the benchmark average. Their average maintainability rating of 2.7 stars is significantly lower than the average of other systems6. This is mostly caused by the quality properties Unit Size and Unit Complexity. AI/big data systems are, on average, in the bottom 5% of the industry regarding Unit Size (long blocks of code), and in the bottom 25% for Unit Complexity.
Fortunately, there are also AI/big data systems with high maintainability,
as our benchmark clearly shows. This demonstrates that it is, thankfully,
not impossible to build maintainable AI/big data systems.
6 A t-test rejected the null hypothesis that there is no significant difference between the maintainability of AI systems and the general SIG benchmark.
[Chart: Maintainability ratings of AI/big data systems versus the overall SIG Benchmark.]
Key Finding: In the typical AI/big data system, 1.5% of the code is test code, whereas the benchmark is 43% test code.
Key Finding: On average, AI/big data systems have significantly lower maintainability, mostly caused by long and complex blocks of code accompanied by a very low amount of test code. The result is that AI/big data systems tend to be difficult and costly to change, extend, and integrate, with a high risk of making mistakes. Furthermore, this can severely hinder transferring AI/big data systems to another team.
What could be causing these long and complex code units? Typically, such issues are the result of unfocused code (having more than one responsibility) and a lack of abstraction: useful pieces of code are not isolated into separate units. Instead, they are copy-pasted throughout the code. Without exception, it is our experience that AI/big data code suffers from these problems.
GREATEST(
  IIF(ISNULL(i_RS_VLD_FM_DT),
      TO_DATE(v_LOGC_RSVD_VAL_UNKNOWN, 'YYYY-MM-DD HH24:MI:SS'),
      i_RS_VLD_FM_DT),
  IIF(ISNULL(i_RS_VLD_FM_DT_fauit),
      TO_DATE(v_LOGC_RSVD_VAL_UNKNOWN, 'YYYY-MM-DD HH24:MI:SS'),
      i_RS_VLD_FM_DT_fauit),
  IIF(ISNULL(i_RS_VLD_FM_DT_xref_sol),
      TO_DATE(v_LOGC_RSVD_VAL_UNKNOWN, 'YYYY-MM-DD HH24:MI:SS'),
      i_RS_VLD_FM_DT_xref_sol))
The above code snippet can be refactored into the more abstract and readable code below. The conditions for each parameter are abstracted into the function MakeValidDate. This single improvement has a significant impact on maintainability, as repeated functionality is simplified, centralized, and made testable.
GREATEST( MakeValidDate(i_RS_VLD_FM_DT),
          MakeValidDate(i_RS_VLD_FM_DT_fauit),
          MakeValidDate(i_RS_VLD_FM_DT_xref_sol))
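For readers unfamiliar with this ETL dialect, the same refactoring pattern can be expressed in Python. This is our hypothetical analogue; the sentinel value is an assumption:

# Hypothetical Python analogue of the refactoring above: the repeated
# "replace a missing date with a sentinel" logic moves into one small,
# testable helper instead of being copy-pasted per parameter.
from datetime import datetime
from typing import Optional

# Assumed sentinel for unknown dates, playing the role of v_LOGC_RSVD_VAL_UNKNOWN.
UNKNOWN_DATE = datetime(1900, 1, 1)

def make_valid_date(value: Optional[datetime]) -> datetime:
    """Return the date itself, or the 'unknown' sentinel when it is missing."""
    return value if value is not None else UNKNOWN_DATE

def latest_valid(*dates: Optional[datetime]) -> datetime:
    """Python analogue of GREATEST(MakeValidDate(...), ...)."""
    return max(make_valid_date(d) for d in dates)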
Key Finding: Important causes for maintainability issues in AI/big data systems are code having multiple responsibilities and the lack of abstractions.
What could be the root cause for the AI/big data maintainability issues?
1. Lab programming: most data science work is aimed at a single experiment, trying things in one shot or solving an analytical problem ad hoc, not necessarily with the intention of delivering something that will run in production for a long time. The problem is that once things work, there is no real incentive for the data scientist to refactor and improve code quality. After all, there are no tests, so changing code risks breaking it without noticing.
2. Data science education programs typically focus more on data
science and less on software engineering best practices.
3. Traditionally, data science development tools have lacked support for software engineering best practices.
a. R and Jupyter notebooks, for example, are based on the paradigm of a step-by-step, one-shot approach, which is suitable for experiments but not for maintainable software.
b. Some data science languages lack powerful abstraction and testing mechanisms.
4. The SQL pattern is often the standard paradigm for data
preparation. This pattern comes down to working with datasets that
are joined and contain many consecutive operations on many fields at
the same time. In AI/big data this represents a large part of the work
(75-90%7), and it has its maintainability challenges - for which the
solutions are often unknown to data scientists. Data scientists find
this the least enjoyable part of the work8 and the most difficult9.
7 (Microsoft 2019)
8 “Cleaning Big Data: Most Time-Consuming, Least Enjoyable Data Science Task, Survey Says”, Forbes, Gil Press, March 2016
9 Biggest difficulty: “Software Engineering for Machine Learning: A Case Study”, 2019 IEEE/ACM 41st International Conference on Software Engineering: Software Engineering in Practice (ICSE-SEIP), Amershi et al., Microsoft
Part of the reason why unit testing is lacking in AI/big data systems is that the engineers rely on integration tests, which can be done elegantly for this type of system by measuring the correctness of the AI model. If the model performs badly, this can be caused by some issue in the pipeline. The problem with this approach is twofold.
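By contrast, a unit test pins a defect to a single data-preparation step. A minimal sketch (our illustration; the function and test are hypothetical):

# Illustration: a unit test for one data-preparation step. Unlike an
# end-to-end "is model accuracy still OK?" check, a failure here points
# directly at the broken step.
def impute_missing_ages(ages: list, default: float = 30.0) -> list:
    """Replace missing (None) ages with a default value."""
    return [a if a is not None else default for a in ages]

def test_impute_missing_ages():
    assert impute_missing_ages([21.0, None, 35.0]) == [21.0, 30.0, 35.0]
    assert impute_missing_ages([]) == []  # edge case: empty input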
Key Finding: Maintainability issues in AI/big data systems have a root cause in the way data scientists tend to work, given their focus on experiments, their education, their tools, and the fact that data preparation is the dominant part of their work.
4. VULNERABLE OPEN SOURCE REMAINS A WIDE-OPEN BACKDOOR
One year on, we revisited our core findings to provide a fresh view of the state of affairs. Spoiler: we’re not yet where we need to be, not by a long shot. There may be some recent hints of gradual improvement, but it’s more accurate to say that we are still in the middle of a vulnerability pandemic.
Key Finding: 50 to 60% of enterprise software systems have a vulnerable open source dependency each month. Around 30% have a critically vulnerable dependency. Business critical systems are only marginally less exposed than less critical systems.
Let’s have a good look at our extended data and distill answers to the key questions.
[Chart: Monthly percentage of vulnerable systems, Jul '21 through Jan '23, broken down by severity (Low, Medium, High, Critical).]
Carving our data to highlight the class of systems that are deemed to be Business Critical by their owners reveals the following trends:

[Chart: Monthly vulnerable systems (%), Oct '21 through Apr '23, for business critical versus other systems.]
It appears that marking systems as business critical, for example by enterprise architects or higher management, has no significant impact on the exposure to critical vulnerabilities. In our data, we observe much the same rates of critical vulnerabilities across the years 2021 and 2022 for the business critical and the less critical systems. It seems there is a significant disconnect between the business owners and the technical owners: a gap that urgently needs to be bridged.
Below we list the top 10 most-seen dependencies that had a critical vulnerability in 2022. These are popular libraries that enjoy a lot of attention among developers and hackers alike. When vulnerabilities become known in these libraries, teams do well to update to safer versions on short notice.
# | Critically vulnerable dependency | Language | Ecosystem | % systems that used it
1 | FasterXML Jackson | Java | Maven | 9.9%
2 | Spring Framework (SpringShell) | Java | Maven | 9.8%
3 | OWASP HTML Sanitizer | Java | Maven | 7.8%
4 | Spring Framework | Java | Maven | 7.8%
5 | Log4net | .NET | NuGet | 7.4%
6 | Log4j 1.2 | Java | Maven | 7.0%
7 | .NET Core | .NET | NuGet | 6.7%
8 | Commons Text | Java | Jar | 5.6%
9 | PDFBox | Java | Jar | 5.1%
10 | PostgreSQL | Java | Maven | 4.6%
The top 10 list for 2022 indeed features exclusively the common modern programming technologies in enterprise software: Java and .NET. The JSON handling library Jackson is one of the most used libraries in Java and at the same time the one for which the most vulnerabilities have been reported in previous years. Close to 10% of systems we analyzed in 2022 used a critically vulnerable version of Jackson. A very close second is the popular Spring Framework, with an exploit known as SpringShell (following the naming of the late 2021 Log4Shell incident).
Our top 10 list consists of vulnerable libraries that are commonly used directly, or sometimes in an indirect and less visible way. In those cases, a vulnerable library is only used through another library that is directly and visibly used within a code base. In particular, the logging libraries are commonly pulled in without much fanfare, with the risk of introducing vulnerabilities that go unnoticed until exploited.
14 https://blog.checkpoint.com/2023/01/05/38-increase-in-2022-global-cyberattacks/
15 https://www.enisa.europa.eu/news/enisa-news/understanding-the-increase-in-supply-chain-security-attacks
16 As shown in our Benchmark Report 2022, page 23
2. Reduce risk in open source usage by addressing build quality
Open source ecosystem hosting vendors such as Sonatype provide yearly updates on the state of their managed ecosystems. Over 2022, they reported that 14% of downloaded libraries were vulnerable. Often that’s because the downloaded version was outdated and, while known to be vulnerable, was still available for download. They also report that 6 out of 7 vulnerable libraries are downloaded as indirect dependencies.
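Checking whether a specific dependency version has known vulnerabilities can be done against public databases. A minimal sketch using the public OSV.dev query API (our illustration; SIG's own analysis in Sigrid works differently):

# Sketch: look up one dependency version in the public OSV.dev
# vulnerability database and return the published vulnerability IDs.
import json
import urllib.request

def known_vulnerabilities(ecosystem: str, name: str, version: str) -> list[str]:
    """Query OSV.dev for vulnerabilities affecting this exact version."""
    query = {"version": version, "package": {"name": name, "ecosystem": ecosystem}}
    request = urllib.request.Request(
        "https://api.osv.dev/v1/query",
        data=json.dumps(query).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return [v["id"] for v in json.load(response).get("vulns", [])]

# Example: Log4j 1.2 (entry 6 in the table above) is end-of-life and
# has multiple published vulnerabilities.
print(known_vulnerabilities("Maven", "log4j:log4j", "1.2.17"))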
Turning to our data, we observe an overall more positive picture. The following graph shows the percentage of vulnerable libraries used in enterprise software observed by SIG. The good news is that the overall rate is well below the Sonatype-reported 14%, in particular for the critically severe ones (at just 1-2%). So, the enterprise software systems we observe with Sigrid are using fewer vulnerable versions than the software community at large.
[Chart: Vulnerable libraries across the enterprise software industry, based on 106K dependencies seen in 2022, broken down by severity (Critical, High, Medium, Low).]
16 https://www.softwareimprovementgroup.com/software-analysis/
17 Libraries of 1-star Maintainability are not yet available in the dataset
Key Finding: Compared to recommended 4-star Maintainability, 2-star Java and Python libraries have 2x more risk of having vulnerable versions. At the same time, exceptional quality 5-star libraries show only a quarter of the risk.
The SIG Maintainability measurement is often correlated with other best
software development practices. To name just a few core practices:
automated unit testing, modern (security) code review, and the use of
automated tools for continuous integration and deployment. One reason
for the correlation is that such best practices are hard or impossible to
implement in code bases of low quality. Such code bases resist being
understood, tested, and changed.
[Chart: Time-to-update versus maintainability, tracking 326K dependencies from 18 ecosystems in 3,500 client systems. The y-axis shows the percentage of dependencies updated over time, grouped by client system maintainability (2 to 5 stars).]
The plot above shows longitudinal data for the time-to-update in days since a dependency was introduced. In short, the higher the graph, the quicker dependencies were updated to newer versions. It’s clear to see that 5- and 4-star code wins the race, while 3-star code lags behind and never really catches up. 2-star code is often older and generally uses a stagnant set of libraries. Given that the libraries themselves tend to evolve quickly, and vulnerabilities are discovered at an increasing pace, stagnation of update speed is a genuine concern.
Establish a duty of care based on current security standards
In order for the software industry to prevent countless lawsuits and fines,
it is time to start taking responsibility by building security in from the start.
Many organizations are working on this but struggle for different reasons,
including that there is no clear and shared duty of care in the industry.
When is software secure enough?
SIG is making efforts to urge the software industry to action. The right incentives could be provided by the right legislation. However, requiring software makers to adhere to strict security standards could reduce the freedom to innovate. Instead, we would like to see the industry showcase how secure their products are, so that buyers can decide. The responsibility of ensuring secure software then moves to the buyers, provided that the software producers have done their job regarding assurance.
18 https://www.opencre.org/
19 https://www.commoncriteriaportal.org/
20 https://www.enisa.europa.eu/publications/cybersecurity-certification-eucc-candidate-scheme
As an example of an effective security prescription, we would like to
point out The Update Framework (TUF)’s threat analysis21. By providing
transparency on the perceived threats and implemented mitigations,
together with external audit reports, the users of TUF can implement the
product more securely within their own scope of responsibility.
21 https://theupdateframework.io/security/
7. Instruct teams on security requirements with a combination of
training, coding guidelines, and continuous knowledge exchange. To
deal with a large number of requirements, it is essential to implement
a process to attach the relevant instructions and verifications to
individual tasks (e.g. stories22).
22 https://owaspsamm.org/guidance/agile/
PROJECT HIGHLIGHT
[Figure: SCRAMBLE, the Smart Code Review Assistance Module Blending Leading Expertise: it combines a threat/weakness/mitigation taxonomy, machine learning from expert decisions, expert review tactics, correlated SAST/DAST findings, hotspot detection, and context-based verification guidance.]
The current prototype is already being used in client assignments and has led to a substantial increase in efficiency, quality of work, consistency, and the availability of expert knowledge to a larger group. Its current users are at SIG, but we are also taking steps to involve users at clients and partners.
This builds on the idea that we also promote through our Sigrid platform: allowing organizations to apply a risk-driven, cost-based way of dealing with security in their business context, instead of the bottom-up approach of having to deal with thousands of tool findings that need to be fixed.
5. FIRST DIGITAL SKILLS BENCHMARK SHOWS POOR JOB ALIGNMENT
Wouter Knigge / Edward Song / Magiel Bruntink / Xander Schrijen
At the end of 2022, Astride launched. Since then, over 5,500 people in over 180 countries have taken its Digital Skills Assessment, which has given professionals insight into how they relate to their current and potential future job roles, and has given us a unique view of global digital skills across industries. Our major finding is that digital skill requirements are very poorly met: the average skill gap for the vast majority of roles is more than 50%. There is significant room for upskilling, to put it mildly.
1. Job market alert: a desperate need for digital skills
We are in an era of constant disruption, and the IT and Software industry
is leading the charge. Over the past three decades, technological
advancements have significantly accelerated, reshaping the ways
we communicate and connect. Consider the smartphone revolution,
which transformed and mobilized our means of communication. This
phenomenon is not an isolated incident. Disruptive leaps in technology,
such as the robotization of the manufacturing sector, have consistently replaced earlier methods at an ever-increasing pace.
However, is the workforce keeping up? Over the past decade, discussions
surrounding the "Talent Shortage" in IT have persisted. The rate of
change does not correspond with the number of skilled professionals
entering the job market. For example, the software engineering industry's
shortage of technical personnel is growing at an alarming rate, with
the security niche leading the pack23. Moreover, the existing workforce
requires continuous upskilling to remain competitive and relevant.
• Employer’s perspective: are people equipped for the jobs they are
currently in? No, on average, there is a high skill gap for practitioners
in their current jobs, with only a few exceptions.
• Candidate’s perspective: what jobs are easy to get into, and which
have good follow-up opportunities? The data show plenty of job entry
opportunities for the Astride participants, in particular for those with
specialized and demanding jobs.
23 https://www.forrester.com/report/the-security-skills-shortage-takes-its-toll-on-organizations/RES178724
24 https://gaper.io/tech-talent-shortage/
2. Astride – The digital skills compass for
tomorrow’s learning journey
Achieving equilibrium in the digital skills marketplace necessitates aligning
supply and demand. Both candidates and employers require insight into
their respective positions and potential actions:
[Figure: Example Astride result. Current job role: Chief Information Officer (CIO), 27% match. Best matches: Solution Designer (71%), Data Scientist (64%), ICT Operations Manager (64%), Project Manager (63%), Digital Educator (60%).]
25 https://esco.ec.europa.eu/en/about-esco/escopedia/escopedia/european-e-competence-framework-e-cf
26 https://www.exin.com/astride-by-exin/
3. Insights from the Astride Benchmark on the current job market
Drawing from the anonymized self-assessment data, we can examine
the current trends in the global job market. In addition to the skill
assessments, EXIN collects information on participants' country of
residence and industry, allowing for contextual analysis. Hence, our
dataset can be utilized to combine findings into insights, as represented
below.
This Benchmark Report provides a few key findings out of a deeper
analysis that will soon become available as an EXIN whitepaper. Stay
tuned on EXIN’s LinkedIn27 for an update on that.
Key Finding: 5,500 IT practitioners from across the globe have used Astride so far.
• 180 out of 195 countries are represented.
• The most common industry is IT/Software, but many more specific industries are represented.
• Most respondents indicated “Project Manager” as their job role.
27 https://www.linkedin.com/school/exin/
Key Finding: Except for people in the jobs of Digital transformation leader and Information security manager, the average Astride score for job roles shows a skill gap of at least 50%. The average Developer has a 75% gap with having all skills for the job.
For each job role assessed, the graph shows the average Astride score (dark blue line) inside the middle 50% of scores (gray box). The maximum Astride score a participant can get for their current job is 80 points. While there are no doubt skilled individuals in all of the assessed jobs, this analysis focuses on the average cases to allow organizations to reflect on recruitment, training, and retention policy.

Looking at the graph, it’s obvious that people in most job roles score far below 40 points on average, implying a skill gap of more than 50%. The only exceptions are people in the jobs of Digital transformation leader and Information security manager. The average Developer even has a 75% gap with having all skills for the job, with an average Astride score of about 20.
[Chart: Average Astride job-skill match score per job role.]
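The skill-gap arithmetic behind these statements is straightforward; a small sketch of our reading of the numbers:

# Sketch of the skill-gap arithmetic above (our reading of the data):
# the maximum job-match score is 80 points, so the gap is 1 - score / 80.
def skill_gap(astride_score: float, max_score: float = 80.0) -> float:
    """Gap between current skills and full job-skill coverage."""
    return 1.0 - astride_score / max_score

print(f"{skill_gap(40):.0%}")  # 50%: the threshold most roles score below
print(f"{skill_gap(20):.0%}")  # 75%: the average Developer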
The Astride data reveal that the top-3 missing skills for Developers are
the following:
1. Component Integration (level 2),
2. Testing (level 2),
3. Documentation Production (level 3).
The days of any one individual staying in the same role for many years are gone. From individuals to corporate employers and government agencies, job market mobility is essential28.
Astride assesses what skills people currently possess and what the
requirements for each job are. The data can also tell whether the people
in each job have skills that also apply to other jobs. For instance, if you
are in a data science role but have more development skills than the
average developer, then a developer role could be a good follow-up
opportunity. The next graph reflects the overall job mobility picture that
follows from the Astride data:
28 https://www.gartner.com/en/newsroom/press-releases/2020-02-27-gartner-says-hr-leaders-must-build-a-robust-strategy-
[Chart: Job mobility opportunities based on Astride skill data.]
Key Finding: Enterprise Architect and Digital Transformation Leader jobs are the most densely skilled: on average, they out-skill people in most other jobs.
FINAL THOUGHTS
Now that we are at the end of our fifth SIG Benchmark Report, it is clear
that the digital world needs to get its act into gear, everybody. The issues
we revealed are major issues requiring a concerted effort to resolve.
There is a lot more underlying data we can share to build your own
specific case, and we are more than happy to assist.
In our report, we shared the most important findings. There is something for everybody to act upon: upskilling your own or your team’s digital skills, making the promise of low code work, or ensuring that your great AI project does not become a legacy nightmare. Or making sure open source is not the open door it seems to be: in fact, today it’s a whole warehouse of open doors, so big that it’s scary.
So, the digital world is progressing fast, a lot is happening, and there is
still a lot to be fixed.
In this report, we gave you the guidelines to focus your short-term efforts
and drive your long-term plans. Let us know where we can help with our
data, insights, and technology.
We’re at the end of the report with nothing more to read but all the more
to do!
Thanks for your attention and focus. We loved writing this and hope you
enjoyed reading it, and most importantly, we hope it is of value to you.
About Software Improvement Group
www.softwareimprovementgroup.com
[email protected]
COLOPHON
SIG Benchmark Report | 2023
Enterprise software through the SIG looking glass
Authors
• Asma Oualmakran, SIG
• Benedetta Lavarone, SIG
• Chushu Gao, SIG
• Dennis Bijlsma, SIG
• Edward Song, EXIN
• Lodewijk Bergmans, SIG
• Luc Brandts, SIG
• Magiel Bruntink, SIG
• Miroslav Zivkovic, SIG
• Pepijn van de Kamp, SIG
• Rob van der Veer, SIG
• Xander Schrijen, SIG
• Wouter Knigge, EXIN
Legal Notice
This document may be part of a written agreement between Software
Improvement Group (SIG) and its customer, in which case the terms and
conditions of that agreement apply hereto. In the event that this document was
provided by SIG without any reference to a written agreement with SIG, to the
maximum extent permitted by applicable law this document and its contents are
provided as general information ‘as-is’ only, which may not be accurate, correct
and/or complete and SIG shall not be responsible for any damage or loss of any
nature related thereto.
©2023 Software Improvement Group. All rights reserved. No part of this book
may be reproduced or used in any manner without the prior written permission of
the copyright owner, except for the use of brief quotations in a book review.