Tech Report
BELAGAVI, KARNATAKA.
Submitted by:
CERTIFICATE
This is to certify that the Technical Seminar report entitled
Signature of HOD
ACKNOWLEDGMENT
Credit for the successful completion of this technical seminar goes to the people who
were a consistent source of knowledge and who offered timely suggestions and
instructions.
First of all, I wish to express my earnest thanks and affectionate respect to my project guide,
Mr. Sandeep B (B.E., M.Tech), Department of Computer Science & Engineering, who has been
my motivator and source of inspiration.
I would like to thank our beloved coordinators, Dr. Ravindra S and Dr. Sankhya N
Nayak, Department of Computer Science & Engineering, for their support.
I would like to thank our beloved Professor and Head, Dr. Jalesh Kumar, Department
of Computer Science & Engineering, for allowing me to take up this technical seminar.
I am very grateful to our respected principal, Dr. Y. Vijay Kumar, for his
encouragement and for providing an excellent working environment in our college.
Finally, I thank our teaching and non-teaching staff, classmates and all who have
helped us directly or indirectly in the successful completion of this technical seminar, and I
would like to thank JNNCE College for providing a stage to show my talent.
Supritha Patil Ml
4JN21CS411
TABLE OF CONTENTS
Abstract
Acknowledgment
Contents
List of Figures
List of Tables
1 INTRODUCTION
2 LITERATURE SURVEY
3 PROPOSED METHODS
4 RESULTS
5 APPLICATIONS
6 CONCLUSION
LIST OF FIGURES
Figure Title Page No.
Empirical Study for Open Source Libraries in Automotive Software Systems
CHAPTER 1
INTRODUCTION
1.1 Introduction of the project
existing studies that aim to provide such information. Instead, related researchers have
investigated the open-source software ecosystem in a more general setting.
CHAPTER 2
LITERATURE SURVEY
A literature survey relates the proposed work to prior research and helps identify
errors and drawbacks in the methods previously used to solve the problem.
Authors: Boyang Du, Sarah Azimi, Annarita Moramarco and Davide Sabena. Year: 2022.
Description:
Advantage:
This article proposes a new testing infrastructure that unifies different platforms
targeting different stages of development, taking advantage of existing CI tools from
software development. The testing infrastructure can generate test cases targeting
different systems, taking into account the features available on each platform, and
gathers test results to provide feedback to designers. It reduces cost in terms
of time.
Disadvantage:
No resolution has been provided for the Virtual Platform and System, since
both approaches adopt a simulation engine that cannot be compared with system
emulation.
Advantage:
Disadvantage:
Some of the repositories in the PyGitHub-based data were not available in GHTorrent.
The study might also have missed the different tiers of suppliers to car
makers, since there is no straightforward way to identify suppliers from GitHub
metadata.
Description:
Advantage:
It is advantageous to provide not only the libraries themselves but also best
practices for their usage and management. This can encompass offering standardized
installation or import procedures for the libraries and ensuring their compatibility with
component detection tools. The overarching goal is to facilitate easier open-source
management for C/C++ developers.
Disadvantage:
Advantage:
The system as a whole offers a better perspective when it comes to analyzing a
project's dependencies. It shows the age of the dependencies and the
vulnerabilities each dependency has, and it can also help in understanding the structure of a
system simply by seeing its dependencies and where they are used. This approach
makes it possible to allocate resources and effort more efficiently, ensuring that the most
pressing library-related issues are tackled promptly. Therefore, this approach can make
an important contribution to evaluating the quality of software systems.
Disadvantage:
Description:
This article is a ‘paper’ (most likely, it will not reach the reader on paper, but as a
digital PDF). By the time it reaches a broader readership with a formal citation attached,
it will have passed peer review and be part of a referenceable collection of proceedings
of the ICSE 2023 Software Engineering in Society Track. This form and workflow have been
the traditional template for communicating scientific outcomes, where getting papers
accepted at prestigious venues has traditionally been treated as the major indicator of
academic achievement. Academic research has been operating under scarcity, both in job
security and in research funding. As a consequence, (not) getting major publications
accepted and sufficiently cited has great career consequences. Still, research communities
have long acknowledged that scientific contributions extend well beyond a paper, and
proposals for open science have emerged, including ventures into open access, open and
FAIR (Findable, Accessible, Interoperable, Reusable) data, and open-source software.
Advantage:
More specifically, considering empirical scientific insights in the broad sense (i.e.,
insights requiring empirical observation of phenomena, often expressed in the form of
data measurements), the article argues that making these insights more open will require
infrastructure and quality assurance mechanisms similar to those needed to develop
complex open-source software artifacts.
Disadvantage:
Still, open access is only one aspect of open science, and insights and methods
reported in a paper may not be trivially reproducible or replicable, either because
common specifications are not sufficiently detailed or because claims may be outright
false.
CHAPTER 3
PROPOSED METHODS
The selected firmware must either represent a complete system image or be an update
package aimed at upgrading the entire firmware system. If no such firmware exists,
we select multiple firmware packages so that together they cover the
entire system. This criterion avoids selecting firmware that contains only partial
information about the system.
The selected firmware should originate from different automotive manufacturers,
rather than being limited to a single one. This approach ensures that the firmware comes
from different development teams, which may use different open source libraries.
The selected firmware should come from different parts of the automotive system.
As an automotive vehicle may integrate multiple firmware systems, this criterion ensures
that our selection includes firmware from various parts with different kinds of open
source libraries.
Figure 3.1 illustrates the workflow for our data extraction process. Beginning with a firmware
file, the first step is decompression or unzipping using appropriate archive tools,
which yields first-level binary files. These binary files include known file types,
unknown types, and archive files. To further recover and extract binary files with
recognized formats, we employ dedicated analysis tools. For unknown binaries, we use
binwalk to extract potential files, and for archive files, we use archive tools to decompress
them. This process yields second-level binary files. By applying this procedure iteratively,
we continue until we reach binary files that cannot be decompressed further. These files
are the final output of our data extraction process.
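The iterative procedure above amounts to a worklist loop that keeps unpacking files until only leaves remain. The Python sketch below illustrates that loop only; `extract_all` and the `unpack` callback are hypothetical names, and `unpack` stands in for the archive tools, file-system recovery and binwalk runs described in the text, returning the paths it extracted (or an empty list when a file cannot be decompressed further).

```python
from collections import deque
from typing import Callable, Iterable, List


def extract_all(firmware: str, unpack: Callable[[str], Iterable[str]]) -> List[str]:
    """Unpack level by level until only non-decompressible files remain.

    `unpack` is a hypothetical hook: it wraps whatever tool handles the
    given file and returns the paths of the files it extracted.
    """
    leaves: List[str] = []
    queue = deque([firmware])
    seen = {firmware}  # avoid re-processing a path reported twice
    while queue:
        current = queue.popleft()
        children = list(unpack(current))
        if not children:
            # Nothing more to decompress: part of the final output.
            leaves.append(current)
            continue
        for child in children:
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return leaves


if __name__ == "__main__":
    # Toy firmware tree standing in for real tool output (illustrative only).
    tree = {
        "fw.zip": ["rootfs.img", "update.tar"],
        "rootfs.img": ["lib/libc.so", "blob.bin"],
        "update.tar": ["app.bin"],
        "blob.bin": ["inner.gz"],
        "inner.gz": ["libz.so"],
    }
    print(extract_all("fw.zip", lambda path: tree.get(path, [])))
    # prints ['lib/libc.so', 'app.bin', 'libz.so']
```

Tracking already-seen paths is a robustness choice: if a tool reports the same output twice, the loop does not process it again.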
For instance, if the magic header corresponds to a well-known file type such as
PNG (Portable Network Graphics) or JPEG (Joint Photographic Experts Group), it
indicates that the file is an image and thus irrelevant to our objective of extracting
open-source libraries, so such files are filtered out from further consideration.
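A magic-header check of this kind can be sketched as a lookup of leading bytes. The table below is an assumption covering only a few well-known signatures (PNG, JPEG, ELF, ZIP, gzip and little-endian SquashFS); `classify` and `keep_for_analysis` are hypothetical helpers, not part of any tool used in the study.

```python
# Leading "magic" bytes for a few common formats (not an exhaustive list).
MAGIC_SIGNATURES = [
    (b"\x89PNG\r\n\x1a\n", "png"),
    (b"\xff\xd8\xff", "jpeg"),
    (b"\x7fELF", "elf"),
    (b"PK\x03\x04", "zip"),
    (b"\x1f\x8b", "gzip"),
    (b"hsqs", "squashfs"),  # little-endian SquashFS superblock magic
]

# Image files are irrelevant to library extraction, as described in the text.
IRRELEVANT_KINDS = {"png", "jpeg"}


def classify(header: bytes) -> str:
    """Map the first bytes of a file to a coarse file kind."""
    for magic, kind in MAGIC_SIGNATURES:
        if header.startswith(magic):
            return kind
    return "unknown"


def keep_for_analysis(header: bytes) -> bool:
    """Drop files whose magic header marks them as images."""
    return classify(header) not in IRRELEVANT_KINDS
```

In practice the header bytes would come from reading the first few bytes of each extracted file; a real pipeline would more likely rely on a fuller signature database such as libmagic, which powers the `file` command.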
For files identified as file-system images, such as .img files, we recover the file system
and then extract all files residing within it. Note that file systems are a distinct form
of compression and cannot always be unpacked with standard decompression tools, so
specialized techniques are applied to recover the system and extract its contents.
Lastly, files of unknown type, typically binary or data files, undergo an additional step
in which the binwalk tool is used as the default method to determine whether they contain
any smaller files of interest. Running binwalk on these files can reveal embedded files or
additional layers of compression within them. This approach enables comprehensive
exploration of the binary and data files so that no relevant open-source libraries are
overlooked during extraction.
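The type-specific handling described above can be summarized as a dispatch from file kind to extraction command. In this sketch only binwalk comes from the text; the choice of `unsquashfs` for SquashFS images, `unzip` and `tar` for archives, and the function name `extraction_command` are assumptions for illustration. The function only builds argument lists, so it can be inspected without running any tool.

```python
from typing import List, Optional


def extraction_command(path: str, kind: str, out_dir: str) -> Optional[List[str]]:
    """Return an argv list for unpacking `path`, or None if nothing to unpack.

    Tool choices other than binwalk are illustrative assumptions.
    """
    if kind == "squashfs":
        # File-system image: rebuild the directory tree rather than "decompress" it.
        return ["unsquashfs", "-d", out_dir, path]
    if kind == "zip":
        return ["unzip", "-o", path, "-d", out_dir]
    if kind == "tar":
        return ["tar", "-xf", path, "-C", out_dir]
    if kind == "unknown":
        # Default: let binwalk scan for embedded files and extract them.
        return ["binwalk", "-e", "-C", out_dir, path]
    # ELF binaries, images, etc. are final outputs, not containers.
    return None
```

Each non-`None` result could be passed to `subprocess.run(cmd, check=True)` with a fresh output directory per input file; other file-system types (for example ext4 images) would need their own recovery tools.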
In the final stage, after decompressing the binary program, the main
focus is on detecting the open-source libraries within the decompressed files. Drawing
inspiration from existing approaches and industry practices, we introduce an automatic
component detection algorithm to identify the majority of the open-source
components. This algorithm combines three types of feature matching: library name,
meta-info, and string matching. In addition, we employ clustering rules to identify
additional components that exhibit similar attributes.
Binary file names often provide valuable clues about the real names of the
libraries. As our first strategy, we leverage this feature to identify libraries within the
automotive system. However, file names are sometimes deliberately altered, rendering
this approach ineffective. Errors also occur when the file name does not match the
library name. For example, the cryptography and secure communication library OpenSSL
may be distributed under the file name libcrypto, and name matching cannot link these
two names. To address this challenge, we incorporate additional strategies to verify the
actual names of the libraries.
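A minimal sketch of the name-matching strategy, assuming a set of known project names and a hand-maintained alias table (both hypothetical inputs, as is the function name `match_by_name`), might strip the version suffix and the `lib` prefix before looking the name up:

```python
from typing import Dict, Optional, Set


def match_by_name(filename: str, known: Set[str], aliases: Dict[str, str]) -> Optional[str]:
    """Guess a library's project name from its binary file name (no directory).

    `known` is a set of project names; `aliases` maps shipped names to
    projects, e.g. "crypto" -> "openssl". Both are assumed inputs.
    """
    stem = filename.lower().split(".", 1)[0]  # "libcrypto.so.1.1" -> "libcrypto"
    candidates = [stem]
    if stem.startswith("lib"):
        candidates.append(stem[3:])  # also try without the "lib" prefix
    for name in candidates:
        if name in known:
            return name
        if name in aliases:
            return aliases[name]
    return None  # leave it to meta-info or string matching
```

Here `libcrypto.so.1.1` resolves to `openssl` only through the alias table, and that alias is itself ambiguous, since LibreSSL also ships a `libcrypto`. A name the lookup cannot resolve, such as `libpng16.so.16` when only `libpng` is known, returns `None` and is left to the meta-info and string strategies.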
Meta-info is a special type of feature that is usually present in open-source binaries. The
file header of a binary file generally contains metadata, including the magic number, file
type, and file version. Extracting this metadata requires reading the file header. Note that
the file header format differs across types of binary files, so interpreting the header
requires knowing the specific file type. For instance, in ELF (Executable and Linkable
Format) files, the file header contains significant metadata such as the file type, the
target machine architecture, and the entry point address.
The information in the ELF file header can be read with the readelf tool (e.g., readelf -h).
This matching approach fails when the given binary file does not contain the meta-info
header. Binary files imported from standard package managers will contain the header;
otherwise, other matching algorithms must be used to determine the library name.
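To make the header fields concrete, the sketch below decodes the ELF identification bytes, file type, target machine and entry point using only Python's `struct` module, roughly the subset of information that `readelf -h` prints. The function name `parse_elf_header` is hypothetical, and the lookup tables cover only a few common values (an assumption); unknown codes are returned in hex.

```python
import struct
from typing import Dict, Union

ELF_MAGIC = b"\x7fELF"
E_TYPES = {1: "REL", 2: "EXEC", 3: "DYN", 4: "CORE"}
E_MACHINES = {0x03: "x86", 0x28: "ARM", 0x3E: "x86-64", 0xB7: "AArch64"}


def parse_elf_header(data: bytes) -> Dict[str, Union[int, str]]:
    """Decode the ELF header fields mentioned in the text."""
    if len(data) < 16 or data[:4] != ELF_MAGIC:
        raise ValueError("not an ELF file")
    # e_ident[4]: 1 = 32-bit, 2 = 64-bit; e_ident[5]: 1 = little-, 2 = big-endian
    ei_class, ei_data = data[4], data[5]
    if ei_class not in (1, 2) or ei_data not in (1, 2):
        raise ValueError("unsupported ELF class or byte order")
    endian = "<" if ei_data == 1 else ">"
    # e_type, e_machine, e_version, e_entry follow e_ident at offset 16;
    # e_entry is 4 bytes wide in ELFCLASS32 and 8 bytes in ELFCLASS64.
    fmt = endian + ("HHII" if ei_class == 1 else "HHIQ")
    if len(data) < 16 + struct.calcsize(fmt):
        raise ValueError("truncated ELF header")
    e_type, e_machine, _version, e_entry = struct.unpack_from(fmt, data, 16)
    return {
        "class": 32 if ei_class == 1 else 64,
        "type": E_TYPES.get(e_type, hex(e_type)),
        "machine": E_MACHINES.get(e_machine, hex(e_machine)),
        "entry": e_entry,
    }
```

For a real file, reading the first 64 bytes is enough, since the 64-bit ELF header is 64 bytes long. Note that these fields describe the binary, not the library it belongs to, which is why the text treats missing or uninformative headers as a case for other matching algorithms.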
Strings in software binaries are typically invariants that remain unchanged during
compilation from source code. To exploit this characteristic, we collect and
extract strings from the source code of various projects and then use them to detect
libraries in binary form. We employ the strings command to extract strings from the target
binary files and compare them against the strings collected from the source code of
specific libraries. In our experiments, we set a threshold of 10%: if 10% of the strings in
the binary file match those from a particular library, we consider the binary file to contain
that library. This approach enables us to identify libraries present in the binary firmware
with a significant degree of confidence. Errors can arise for two reasons.
First, although the 10% threshold is chosen to minimize false positives and false
negatives, more than 10% of a binary's strings may still match a library that it does not
actually contain. Second, a library may import other libraries as dependencies, which
brings those libraries' strings into its source code. Matching the strings of these other
libraries will produce false positives if their share is relatively high. Therefore,
after the string matching algorithm, we employ human experts to verify the correctness of
the library predictions.
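The string-matching step can be sketched as follows, assuming library signatures are already available as sets of strings collected from source code. `extract_strings` is only a rough stand-in for the `strings` command (printable ASCII runs of at least four characters, the command's default minimum length), and the ratio follows the text: the share of the binary's strings that also occur in a library, compared against the 10% threshold. All function names are hypothetical.

```python
import re
from typing import Dict, List, Set

# Printable ASCII runs of at least 4 characters, mirroring the default
# minimum length of the `strings` command.
_PRINTABLE_RUN = re.compile(rb"[\x20-\x7e]{4,}")


def extract_strings(blob: bytes) -> Set[str]:
    """Rough approximation of running `strings` on a binary."""
    return {m.group().decode("ascii") for m in _PRINTABLE_RUN.finditer(blob)}


def match_ratio(binary_strings: Set[str], library_strings: Set[str]) -> float:
    """Fraction of the binary's strings that also occur in a library's source."""
    if not binary_strings:
        return 0.0
    return len(binary_strings & library_strings) / len(binary_strings)


def detect_libraries(
    blob: bytes, signatures: Dict[str, Set[str]], threshold: float = 0.10
) -> List[str]:
    """Report every library whose strings cover at least `threshold` of the binary."""
    found = extract_strings(blob)
    return sorted(
        name
        for name, lib_strings in signatures.items()
        if match_ratio(found, lib_strings) >= threshold
    )
```

Any library reported this way would still go to human review, since, as noted above, shared dependencies and coincidental overlap can push an unrelated library over the threshold.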
After completing all three steps of open-source library detection, we have identified
the libraries that can be detected through feature matching. However, this method of
detection relies on knowing the features of all libraries beforehand and cannot recognize
libraries whose features are unknown. We therefore apply clustering rules to identify
additional and new libraries within the binary firmware.
The rules are as follows:
The file must be in the ELF format (e.g., .so, .bin, or a binary file without an extension).
The file should reside in the same folder as the already detected binary library.
The file should adhere to the same format as the already detected binary library.
The file should follow a specific naming convention, which can be expressed using
regular expressions.
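The four clustering rules can be expressed as a single predicate over a candidate file. This is a sketch under two assumptions not stated in the text: a file's "format" is taken to be its first suffix (so `libssl.so.1.1` and `libfoo.so.2` both count as `.so`), and the naming convention is supplied as a regular expression by the analyst. `is_cluster_candidate` and `file_format` are hypothetical helpers.

```python
import re
from pathlib import PurePosixPath
from typing import Iterable

ELF_MAGIC = b"\x7fELF"


def file_format(name: str) -> str:
    """Assumed notion of 'format': the first suffix, e.g. '.so' for libssl.so.1.1."""
    suffixes = PurePosixPath(name).suffixes
    return suffixes[0] if suffixes else ""


def is_cluster_candidate(
    path: str, header: bytes, detected: Iterable[str], name_pattern: str
) -> bool:
    """Apply the four clustering rules to one not-yet-detected file."""
    candidate = PurePosixPath(path)
    # Rule 1: the file must be ELF, whatever its extension.
    if not header.startswith(ELF_MAGIC):
        return False
    # Rule 2: keep only detected libraries in the same folder.
    neighbours = [
        PurePosixPath(d) for d in detected if PurePosixPath(d).parent == candidate.parent
    ]
    # Rule 3: the candidate must share the format of at least one such neighbour
    # (this also fails when there are no neighbours at all).
    if not any(file_format(n.name) == file_format(candidate.name) for n in neighbours):
        return False
    # Rule 4: its name must follow the expected naming convention.
    return re.fullmatch(name_pattern, candidate.name) is not None
```

Files accepted by such a predicate are only candidates; in the study, three researchers still reviewed them manually.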
To identify additional libraries that exhibit characteristics similar to the
detected ones, we apply the aforementioned rules and engage three software
engineering researchers to collect them manually from the binaries. In total, we
collected 4092 open-source libraries from 676825 decompressed files, which are
used in further analysis.
1) Telematics Box (TBOX):
The TBOX facilitates communication and data exchange between vehicles and the
Internet. Through TBOX connectivity, smart connected cars can access various telematics
services and applications, including real-time traffic information, navigation services,
remote payments, smart home integration, and entertainment applications. Acting as a
bridge between the vehicle and cloud services, the TBOX enables interaction and
integration with the external world.
2) In-Vehicle Infotainment (IVI):
interact with other vehicle systems, such as the vehicle control unit, to achieve a highly
integrated vehicle electrical architecture and enable intelligent and connected features. IVI
typically participates in vehicle integration as an individual component.
3) In-Vehicle Gateway:
The In-Vehicle Gateway serves as a bridge between the internal and external
networks of a vehicle. It manages data flow within the vehicle’s internal network and
facilitates communication between the vehicle and the external Internet. The In-Vehicle
Gateway collects data from various internal systems and sensors, performs processing and
aggregation, and distributes the data, making it accessible and usable by different systems
and external cloud platforms.
4) Advanced Driver Assistance Systems (ADAS):
The ADAS in intelligent connected vehicles integrates sensors, data processing, and
control algorithms to enable autonomous driving under specific conditions. Its purpose is
to provide advanced automated driving features, alleviate driver burden, and enhance
driving safety and convenience. The functionalities of ADAS include environment
perception and modeling, localization and route planning, autonomous driving decision-
making and vehicle control, and status monitoring and fault handling. ADAS utilizes sensors
such as lidar, cameras, and mm-wave radar to perceive and monitor road conditions,
vehicles, pedestrians, and obstacles in real time.
5) Roadside Unit (RSU):
The RSU is the fundamental unit and primary deployment device in road network
construction, responsible for communication and interaction between vehicles
and road infrastructure. The main functions of the RSU include road information
interaction, intelligent traffic warning, and traffic management and optimization.
At the same time, the RSU collects data from multiple vehicles for traffic flow analysis and
prediction, optimizing signal control and traffic dispatching to improve road efficiency and
reduce congestion. Through communication with vehicles, the RSU offers functions such
as traffic information exchange, traffic safety support, and traffic management and
optimization, providing essential support for the development of intelligent connected
vehicle systems and intelligent transportation systems.
Figure 4.2 shows the architecture of the automotive system and the locations where
open-source packages are found. The top-level structure of the automotive system is
segmented into distinct domains, delineating the boundaries of its core functionalities.
By assessing component placement within the sample firmware, we identify three key
domains: the autonomous driving domain, the intelligent cockpit domain, and the vehicle control
domain. Seamlessly interlinked via an in-vehicle Ethernet connection, this architecture
CHAPTER 4
RESULTS
In this chapter, we summarize the lessons learned from our findings on automotive
open-source library management. Firstly, for developers of software component detection
tools, it is essential to recognize the significant differences between automotive libraries
and commonly used libraries. Consequently, it is crucial to collect signatures specifically
tailored to the automotive domain. While some industrial tools claim to include libraries for
automobiles, our experiments revealed that none of them successfully detected libraries in
real-world cars. Therefore, developers must thoroughly investigate the scope of these
libraries and incorporate them into their databases.
Thirdly, for developers and management teams of open source libraries, given their
widespread use, it is advantageous to provide not only the libraries themselves but also best
practices for their usage and management. This can encompass offering standardized
installation or import procedures for the libraries and ensuring their compatibility with
component detection tools. The overarching goal is to facilitate easier open-source
management for C/C++ developers.
CHAPTER 5
APPLICATION
5.1 Commonly-used open source software identification
In this section, we outline the procedure for identifying commonly used open-source
software within the ecosystem. Defining what constitutes "commonly used" presents a
significant challenge. To address this, we propose three criteria for assessing libraries
and provide a detailed process for collecting the list of open-source libraries based on
each of these three criteria. With this approach, we aim to establish a robust and
representative dataset of widely adopted open-source software components.
To initiate our data collection process, we explore various online platforms that
curate lists of significant and foundational open-source packages. These platforms
serve as valuable references for identifying commonly used libraries within the
software engineering community. Our approach involves compiling a list of websites
that feature popular open-source packages. We then use web crawling to extract the
names of these libraries, which form the basis of our data collection. By leveraging
these reputable online resources, we aim for a comprehensive and representative
dataset of widely adopted open-source libraries.
The third aspect we consider is the presence of Common Vulnerabilities and Exposures
(CVEs) within open-source libraries. CVEs are publicly known software security
bugs that can be exploited by malicious actors to target software systems. However,
finding such vulnerabilities is a challenging task that requires significant effort.
Consequently, adversaries tend to focus their efforts on popular software that holds a
large market share and can cause substantial losses when compromised. In essence,
open-source projects with CVEs are often those that are widely used by others.
4) Data consolidation:
Once we have collected the open-source lists from the three aforementioned
aspects, we merge them into a single unified list and eliminate any duplicate entries.
When we encounter libraries with similar but not identical names, two software
engineering researchers perform a manual verification process to determine whether
the open-source libraries in question refer to the same project. Where a match is
confirmed, we remove one of the duplicate entries so that there is no redundancy
within our dataset.
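The merge-and-deduplicate step can be sketched as below. Name normalization (lower-casing and dropping punctuation) and the 0.8 similarity cut-off computed with `difflib.SequenceMatcher` are assumptions chosen for illustration; pairs above the cut-off are not merged automatically but returned for the manual verification described above. `consolidate` and `normalize` are hypothetical helpers.

```python
import re
from difflib import SequenceMatcher
from itertools import combinations
from typing import Iterable, List, Tuple


def normalize(name: str) -> str:
    """Lower-case a library name and drop non-alphanumeric characters."""
    return re.sub(r"[^a-z0-9]", "", name.lower())


def consolidate(
    *sources: Iterable[str], similarity: float = 0.8
) -> Tuple[List[str], List[Tuple[str, str]]]:
    """Merge several name lists, drop exact duplicates, flag near-duplicates."""
    merged = {}
    for source in sources:
        for name in source:
            # Keep the first spelling seen for each normalized name.
            merged.setdefault(normalize(name), name)
    unique = sorted(merged.values(), key=str.lower)
    # Similar but not identical names go to manual review, not auto-merge.
    review = [
        (a, b)
        for a, b in combinations(unique, 2)
        if SequenceMatcher(None, normalize(a), normalize(b)).ratio() >= similarity
    ]
    return unique, review
```

For example, "OpenSSL", "openssl" and "Open-SSL" collapse into a single entry, whereas "libpng" and "libpng16" are only flagged for a reviewer to decide, which keeps the final judgement with the researchers as in the text.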
CHAPTER 6
CONCLUSION
In conclusion, this report presents a comprehensive empirical study on the use of
open-source libraries within automotive ecosystems. By collecting and analyzing 10
firmware samples and 4092 libraries, we offer insights into the overall software
architecture of automotive systems. Furthermore, we investigate the distribution
patterns of open-source libraries in this domain and compare them with those found in
general-purpose software. Surprisingly, our findings reveal that a significant portion
(61.15%) of automotive libraries are distinct from the libraries commonly used in general
software. Finally, we analyze security issues associated with the use of
these libraries and provide actionable recommendations for improving open-source
library management across all user categories. Through this research, we aim to
enhance understanding and facilitate effective use of open-source libraries in the
automotive context.
REFERENCES
[1]. B. Du, S. Azimi, A. Moramarco, D. Sabena, F. Parisi and L. Sterpone, "An
Automated Continuous Integration Multitest Platform for Automotive Systems," in
IEEE Systems Journal, vol. 16, no. 2, pp. 2495-2506, June 2022.
[2]. S. Kochanthara, Y. Dajsuren, L. Cleophas and M. van den Brand, "Painting the
Landscape of Automotive Software in GitHub," 2022 IEEE/ACM 19th International
Conference on Mining Software Repositories (MSR), Pittsburgh, PA, USA, 2022.
[3]. Y. Zhang, Y. Ning, C. Ma, L. Yu and Z. Guo, "Empirical Study for Open Source
Libraries in Automotive Software Systems," in IEEE Access, vol. 11, pp. 123717-123728,
2023.
[4]. A. Molin, A. M. Riviş and R. Marinescu, "Assessing the Real Impact of Open-
Source Components in Software Systems," in IEEE Access, vol. 11, pp. 111226-
111237, 2023.
[6]. W. Tang, Z. Xu, C. Liu, J. Wu, S. Yang, Y. Li, P. Luo and Y. Liu, "Towards
Understanding Third-Party Library Dependency in C/C++ Ecosystem," in Proc. 37th
IEEE/ACM Int. Conf. Automated Software Eng., Oct. 2022, pp. 1-12.