Report 2023
Data Science
However, the results from these various filters presented a new challenge. Each filter produced
different results, making them difficult to analyze. A key issue was the lack of a method to
validate the outcomes of these filters beyond visual inspection. This raised concerns about
whether the filters were merely removing noise or also valuable information. The absence of a
'pure', noise-free reference curve compounded the difficulty in validating the filtered time series.
It's important to note that no single filter is perfect; each one tends to remove different types of
information.
Given these challenges, we decided to use the raw data without applying any filters. We
concluded that it's preferable to deal with the noise in the data rather than risk losing valuable
information. The reason for not having a perfect or "golden" filter lies in the inherent complexity
and variability of real-world data. Filters, while useful for removing specific types of noise, can
also inadvertently remove critical information. The effectiveness of these filters varies greatly
with different datasets, each of which can have unique characteristics.
The primary goal was to identify and eliminate these anomalies within the AI framework before
conveying this knowledge to other teams, such as the Hardware (HW) team. This step was
crucial because any irregularity in the raw data could significantly impact the outcomes of
machine learning algorithms.
It's crucial to emphasize that these developments represent an ongoing process of refinement.
The implementation timeline for these improvements is fluid, adapting as new anomalies are
detected. These are promptly addressed through newly developed, ad-hoc data validation
techniques. Focusing on the finger module, which is the most advanced at the time of this
report, we have implemented specific data validation techniques.
These techniques are part of our continuous effort to enhance the accuracy and reliability of our
data analysis, ensuring that the information passed on to subsequent stages and teams is as
precise and error-free as possible.
● Gauge Data Range: After normalization, gauge data is required to fall within the range
of -10 to 5. This check is implemented but currently disabled, because applying it filters
out too many cases. The range needs to be revised in collaboration with the Hardware (HW)
team before this validation can be put into production.
● Consistent Measurement Length: All measurements, including delta timestamp, vertical,
tangential, radial accelerations, and gauge, must be of the same length. This
consistency is vital for ensuring data integrity and accuracy in analysis.
● Minimum Data Points: The combined measurements need to total more than 28,000
points across all trips. This threshold ensures a sufficient data volume for robust analysis
and model training.
● Defined Number of Trips: The data set must comprise exactly six trips, with an equal
distribution of three left and three right trips. This balance is crucial for maintaining
consistency and reliability in the data.
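The checks that are currently active can be sketched as follows. This is a minimal illustration, not the production code: the `Trip` structure, the channel names, and the way points are counted (every sample of every channel) are assumptions made for the example, and the length check is assumed to apply within each trip.

```python
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical channel names; the real field names are not given in this report.
CHANNELS = ("delta_timestamp", "vertical", "tangential", "radial", "gauge")
MIN_TOTAL_POINTS = 28_000   # threshold quoted above
TRIPS_PER_DIRECTION = 3     # three left + three right = six trips


@dataclass
class Trip:
    direction: str                    # "left" or "right"
    channels: Dict[str, List[float]]  # channel name -> samples


def validate_measurement(trips: List[Trip]) -> List[str]:
    """Return a list of problems; an empty list means the measurement is valid."""
    problems = []

    # Defined number of trips: exactly three left and three right.
    for direction in ("left", "right"):
        count = sum(1 for trip in trips if trip.direction == direction)
        if count != TRIPS_PER_DIRECTION:
            problems.append(
                f"expected {TRIPS_PER_DIRECTION} {direction} trips, got {count}"
            )
    unknown = [t.direction for t in trips if t.direction not in ("left", "right")]
    if unknown:
        problems.append(f"unknown trip directions: {unknown}")

    # Consistent measurement length (assumed to be checked per trip).
    total_points = 0
    for index, trip in enumerate(trips):
        lengths = {name: len(trip.channels.get(name, [])) for name in CHANNELS}
        if len(set(lengths.values())) != 1:
            problems.append(f"trip {index}: channel lengths differ: {lengths}")
        total_points += sum(lengths.values())

    # Minimum data points (assumption: all channels of all trips are counted).
    if total_points <= MIN_TOTAL_POINTS:
        problems.append(
            f"only {total_points} points, need more than {MIN_TOTAL_POINTS}"
        )
    return problems
```

Returning a list of problems instead of a single boolean makes it easier to report to the HW team which rule rejected a measurement.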
With these implementations, we aim to ensure that the data utilized by the AI Team is valid and
reliable. These validation techniques are critical in maintaining the integrity of our data analysis
and ensuring that the results are based on accurate and consistent information. The continuous
improvement and adaptation of these data validation techniques are integral to the success of
our AI-driven analysis and the overall project.
The utilization and significance of finger data can be understood through Figure 1, which
illustrates the behavior of the finger during a single trip over a textile surface. This trip can be
distinctly divided into three phases:
- Static Motor Phase: Initially, readings are taken while the motor is stationary. These
measurements are crucial as they serve as a baseline for filtering noise in the
subsequent phases of the trip.
- Acceleration Phase: This phase involves the finger accelerating but, currently, the data
from this phase is being disregarded.
- Constant Speed Phase: The final phase, where the finger moves at a constant speed,
provides the primary data we use for analysis.
In addition to the existing approaches to enhancing raw data, an important development comes
from Pavel's analysis. In his experiment, he sampled a cosine signal. For our sensor (finger,
version 1), a sample is taken every ~600ms. However, due to some errors related to
MicroPython, some values are skipped, so the sampling interval is not consistently ~600ms.
Pavel wanted to test the influence of two effects: the noise that is inherent to the system
and the skipped measurements. To this end, he simulated exactly this: he randomly skipped
values to make the sampling interval inconsistent, and he added random noise to the signal.
The main idea was to test whether the Fast Fourier Transform (FFT) analysis is affected by
the random noise, by the inconsistent sampling interval, or by both. After briefly discussing
the results with the AI Team, he concluded that the inconsistent sampling interval has the
largest effect. To overcome this, we propose interpolating the missing values in order to
restore the full, evenly sampled sequence.
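A minimal sketch of this kind of experiment is shown below. It is not Pavel's original code: the test frequency, the 10% skip probability, the noise level, the use of linear interpolation, and the interpretation of the sampling period as 0.6 s are all assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

dt = 0.6       # nominal sampling period in seconds (assumed from "~600ms")
n = 512        # number of nominal samples (hypothetical)
freq = 0.2     # test frequency in Hz (hypothetical, below Nyquist 1 / (2 * dt))

t_full = np.arange(n) * dt
clean = np.cos(2 * np.pi * freq * t_full)
noisy = clean + rng.normal(scale=0.05, size=n)   # additive random noise

# Randomly skip about 10% of the samples to mimic missed readings.
keep = rng.random(n) > 0.1
keep[0] = keep[-1] = True          # keep the end points so the time span is unchanged
t_obs, x_obs = t_full[keep], noisy[keep]

# Repair: interpolate linearly back onto the uniform time grid.
x_interp = np.interp(t_full, t_obs, x_obs)


def dominant_frequency(x, dt):
    """Frequency (Hz) of the largest FFT magnitude, assuming uniform spacing dt."""
    spectrum = np.abs(np.fft.rfft(x - x.mean()))
    freqs = np.fft.rfftfreq(len(x), d=dt)
    return freqs[np.argmax(spectrum)]


# Treating the surviving samples as if they were evenly spaced distorts the
# frequency axis; the interpolated sequence keeps the peak near `freq`.
print("naive       :", dominant_frequency(x_obs, dt))
print("interpolated:", dominant_frequency(x_interp, dt))
```

Interpolation needs the true timestamps of the surviving samples, which in our case would come from accumulating the delta timestamp channel.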
It is worth mentioning that the FFT fundamentally assumes a constant sampling interval, so
even small deviations from it can make the analysis inconsistent. Although we understand
that such deviations occur only very rarely, they should be taken into account in the new
version of the finger.
Furthermore, this experimental setup can also be used to test other features that are already
developed, such as the quality of the signal filters. However, it is important to mention that
we do not know the exact source of the noise that we collect in reality, which means that the
noise might not be random. Consider the following example of a delta timestamp sequence: 600,
560, 640, 1200, 600, 600, 1200, 600, 600, 600, 600, 600, 600, 600, 600, 600. Here the
irregularities are concentrated at the beginning of the signal, so they are not completely
random. Nonetheless, random noise is a good starting point.
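As an illustration, the skipped readings in the example sequence can be located directly from the delta timestamps. The rule used here (compare each delta with the median and round the ratio) is an assumption for the sketch, not a description of an existing tool.

```python
import numpy as np

delta_ts = np.array([600, 560, 640, 1200, 600, 600, 1200, 600,
                     600, 600, 600, 600, 600, 600, 600, 600])

# The median is a robust estimate of the nominal sampling step.
nominal = np.median(delta_ts)

# Number of readings presumably skipped before each sample (0 = none).
skipped = np.rint(delta_ts / nominal).astype(int) - 1
gap_positions = np.flatnonzero(skipped > 0)

print(nominal)         # 600.0
print(gap_positions)   # [3 6]
```

Small jitter such as 560 or 640 rounds to a ratio of 1 and is not flagged, whereas the two 1200 entries are each reported as one missing reading.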
Figure 1. Behavior of the finger when performing a trip. 3 different phases can be recognized,
the first part is when the finger is stopped and we get 100 of measures. These are intended to
be used for filtering noise. The second part is the acceleration step that is currently being
discarded. Finally, the part that we use is the one at constant speed.
In the past year, we have developed two major methodologies or algorithms for deriving a
feature vector, though it's important to clarify that the initial approach has been discontinued due
to its limitations.
The first methodology involved the application of the Fast Fourier Transform (FFT) to the
constant speed part of the gauge data from the initial left-right trip. This FFT output was used as
the feature vector. For dimensionality reduction and to facilitate analysis, we employed Principal
Component Analysis (PCA) to project this data into a 3D space. However, this method had
significant drawbacks. Primarily, the FFT results were highly sensitive to any abnormalities in
the gauge data, leading to substantial variability. Moreover, this approach did not leverage
accelerometer data, which we later realized was a crucial source of information.
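For reference, the discontinued pipeline can be sketched roughly as follows. The data here is synthetic and the PCA is implemented directly with an SVD; the number of fabrics and the segment length are hypothetical, and all segments are assumed to have the same length so that their spectra are comparable.

```python
import numpy as np


def fft_feature(gauge):
    """Magnitude spectrum of a mean-removed gauge segment."""
    gauge = np.asarray(gauge, dtype=float)
    return np.abs(np.fft.rfft(gauge - gauge.mean()))


def pca_project(features, n_components=3):
    """Project the rows of `features` onto their first principal components."""
    centered = features - features.mean(axis=0)
    # Rows of vt are the principal directions, ordered by explained variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T


rng = np.random.default_rng(0)
segments = rng.normal(size=(20, 1024))                     # 20 synthetic gauge segments
features = np.stack([fft_feature(s) for s in segments])    # shape (20, 513)
projected = pca_project(features)                          # shape (20, 3)
```

Because every FFT bin enters the feature vector directly, a single abnormal gauge reading changes many coordinates at once, which is consistent with the sensitivity described above.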
Due to these challenges, we have since moved away from this initial methodology. Our focus
has shifted to developing more comprehensive methodologies that incorporate a wider array of
data, including accelerometer readings, to construct a more accurate and reliable feature vector.
Several approaches were tested before settling on the current solution, as graphically depicted
in Figure 2. This figure illustrates how we utilized the barycenter of all gauge trips for both left
and right trips, the mean of these trips, and the trip that lies in the middle. Despite experimenting
with these various methods, our findings indicated no significant differences in the results.
Consequently, we decided to use data from just one trip in our analysis. Simplifying to a single
trip allows us to reduce this complexity while maintaining the effectiveness of our analysis. This
decision reflects our ongoing efforts to streamline our processes and focus on techniques that
offer the most utility and efficiency in our data analysis.
Figure 2. Graphical representation of the different approaches followed in the first iteration of
the feature vector of the finger.
The second algorithm, which is currently in use for our 3D visualization, is inspired by the
methodology presented in the paper titled "Humanoid Identification of Fabric Material Properties
by Vibration Spectrum Analysis". In this paper, two measures are derived from the
accelerometer data. Specifically, they calculate the integral of the Fast Fourier Transform (FFT)
of the accelerometer data, denoted as S(FFT), and the integral of the square of the FFT,
referred to as S(FFT2).
S(FFT) frequency spectral integral: This is calculated as the integral of the magnitude
spectrum from the FFT analysis across a chosen frequency band. It serves as an indicator of
the total vibration energy within that frequency range, akin to the cumulative "strength" of the
vibration. The calculation involves summing the magnitudes at each frequency bin, multiplied by
the bin width, to approximate the integral.
Physical Interpretation: This metric essentially captures the overall vibration energy
generated when interacting with the fabric. In practical terms, S(FFT) can be linked to
the "feel" of the fabric as it relates to the combination of various tactile sensations such
as smoothness, roughness, and texture. A higher S(FFT) value might indicate a fabric
with more pronounced texture or roughness that generates stronger vibrations when
touched or rubbed. It reflects the aggregate of all frequency components in the vibration,
providing a sense of the total tactile stimulation.
Textile Applications: This measure can be particularly useful in quality control, where a
consistent tactile feel is important. It can help in differentiating fabrics based on their
overall tactile feedback, identifying those that are smoother or rougher to the touch.
S(FFT2) power energy of the vibration: This is the integral of the square of the FFT
magnitude spectrum. The squared magnitude spectrum is, up to a normalization factor, the
power spectral density of the signal, that is, the power present at each frequency component.
Integrating it over the chosen band therefore quantifies the total power, or energy, that
the vibration carries in that band.
Physical Interpretation: S(FFT2) emphasizes the power at each frequency, giving more
weight to dominant frequency components. Physically, this can correlate with the specific
features of a fabric's texture that produce strong, resonant vibrations. It could be
indicative of certain weave patterns, thread density, or surface treatments that result in
prominent tactile features.
Textile Applications: This measure can be critical in textile design and innovation, where
specific tactile characteristics are desired. It can aid in analyzing how particular weaving
patterns or material compositions contribute to the unique tactile feedback of a fabric.
S(FFT2) can be especially useful in the development of textiles where certain tactile
sensations are targeted, such as in high-performance sportswear or luxury fabrics.
To comprehensively understand a fabric's tactile characteristics, two distinct metrics are
employed. The first component of our feature vector is the integral of the amplitude of the
accelerometer, which is calculated using the Fast Fourier Transform (FFT) of the
accelerometer's x, y, and vertical axes. This is represented as S(FFT) in the paper and provides
an overview of the tactile energy of a fabric. The second component, S(FFT2), involves the use
of the squared FFT (FFT^2) of the same accelerometer data. This metric delves deeper into the
fabric's tactile aspects, especially highlighting dominant tactile features.
Additionally, the third component of our feature vector is the energy of the gauge, which is
calculated similarly to the energy of the accelerometer using FFT^2 of the gauge data. This
addition aims to enrich the understanding of a fabric's tactile properties, contributing to a more
nuanced classification, design, and quality assessment of textiles.
The paper defines S(FFT) as the integral of FFT(f), and S(FFT2) as the integral of the
squared FFT(f). The decision to use FFT^2 for the gauge is aligned with this approach, aiming
to capture a detailed and comprehensive profile of the tactile properties of the fabrics under
study.
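The three components described above could be computed along these lines. This is a sketch under stated assumptions: the sampling rate `fs` is a hypothetical parameter, the FFT is left unnormalized, the mean is removed before the transform, and the per-axis values are simply summed.

```python
import numpy as np


def spectral_integrals(x, fs, band=None):
    """Approximate S(FFT) and S(FFT2) as Riemann sums over the FFT bins."""
    x = np.asarray(x, dtype=float)
    magnitude = np.abs(np.fft.rfft(x - x.mean()))
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    bin_width = fs / len(x)
    if band is not None:                       # optional (low, high) band in Hz
        low, high = band
        magnitude = magnitude[(freqs >= low) & (freqs <= high)]
    s_fft = float(np.sum(magnitude) * bin_width)
    s_fft2 = float(np.sum(magnitude ** 2) * bin_width)
    return s_fft, s_fft2


def feature_vector(acc_axes, gauge, fs):
    """[accelerometer S(FFT), accelerometer S(FFT2), gauge S(FFT2)]."""
    per_axis = [spectral_integrals(axis, fs) for axis in acc_axes]
    acc_s_fft = sum(s1 for s1, _ in per_axis)      # summed over the three axes
    acc_s_fft2 = sum(s2 for _, s2 in per_axis)
    _, gauge_energy = spectral_integrals(gauge, fs)
    return np.array([acc_s_fft, acc_s_fft2, gauge_energy])
```

Doubling the amplitude of a signal doubles S(FFT) but quadruples S(FFT2), which is why the second measure gives more weight to dominant frequency components.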
It is important to note that, although the AI Team has more in-depth insights about how to
leverage the rest of the trips, we have not addressed this topic yet.
Similarly, we have developed tools so that the HW team can plot raw data without relying on
Sensor App. In this way, they can easily analyze their tests.
During an accuracy check of our results and methodologies, we encountered an error in the code
responsible for processing accelerometer data. This error had been affecting the accuracy of
our previous analyses. Following a comprehensive investigation, we pinpointed the root cause
of the issue and successfully implemented a correction to guarantee precise data processing.
In addition to correcting the code error, we have introduced a new standardization procedure for
accelerometer data. This involves normalizing the measurements from each axis (x, y, z) and
subtracting the mean. This standardization is crucial for eliminating biases and variations in the
data, leading to more reliable and consistent analysis.
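A minimal sketch of such a per-axis standardization is given below. The report does not state which normalization is applied, so dividing by the standard deviation of each axis is an assumption made for this example.

```python
import numpy as np


def standardize_axes(acc, eps=1e-12):
    """Standardize accelerometer data of shape (n_samples, 3), one column per axis."""
    acc = np.asarray(acc, dtype=float)
    centered = acc - acc.mean(axis=0)            # subtract the per-axis mean
    scale = centered.std(axis=0)                 # per-axis spread (assumed normalizer)
    return centered / np.maximum(scale, eps)     # eps guards against constant axes
```

After this step, a constant offset such as gravity on the vertical axis no longer dominates the comparison between axes.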
With the corrected code and new standardization method in place, our immediate next step is to
regenerate the results using the updated version of the finger sensor. The regenerated results
will be meticulously compared and validated against the findings presented in the paper
"Humanoid Identification of Fabric Material Properties by Vibration Spectrum Analysis". This
comparison is vital to ensure that our methodologies align with established research and
produce comparable outcomes.
Another significant area of exploration is the methodology used for calculating integrals in three
dimensions. Currently, our approach involves calculating the integral separately for each axis (x,
y, z) and then summing these values. However, we are now investigating whether alternative
methods exist that could offer a more integrated or holistic approach to this calculation. Such
methodologies could potentially provide a more nuanced understanding of the fabric's tactile
properties by considering the interplay between different axes.
Moreover, this method, while effective in depicting the overall tactile sensation through the
integral of the FFT values, does not account for the distribution of these frequencies. To
enhance our analysis, it is necessary to examine the frequency distribution more closely. This
additional analysis will allow us to distinguish between fabrics that exhibit high roughness with a
uniform pattern, and those with low roughness but varied patterns. Understanding this
distinction is crucial for a more nuanced and accurate differentiation of fabric textures.
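One possible way to describe the frequency distribution, shown here only as an illustrative sketch, is to compute the spectral centroid and spread of the magnitude spectrum. These two descriptors are our suggestion for the example and are not taken from the referenced paper; `fs` is a hypothetical sampling rate.

```python
import numpy as np


def spectral_centroid_and_spread(x, fs):
    """Magnitude-weighted mean frequency and standard deviation around it (Hz)."""
    x = np.asarray(x, dtype=float)
    magnitude = np.abs(np.fft.rfft(x - x.mean()))
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    total = magnitude.sum()
    if total == 0:                       # constant signal: no spectral content
        return 0.0, 0.0
    weights = magnitude / total
    centroid = float(np.sum(freqs * weights))
    spread = float(np.sqrt(np.sum(weights * (freqs - centroid) ** 2)))
    return centroid, spread
```

Two signals with the same S(FFT) can then still be told apart: a single dominant frequency gives a spread close to zero, while energy distributed over several frequencies gives a larger spread.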
1.2.6. Summary
1.3. Pressure data
The press sensor has always been an easier data source to analyze due to its simplicity.
Nonetheless, the information it provides is very useful, and analyzing it properly can
contribute substantially to the project. However, the press has had noticeable mechanical
errors during this time. In particular, the press sensor got completely or partially stuck
many times. As a result, we obtained either flat curves at the maximum possible value, or
press curves with strange behaviors, such as more than one hill or a curve that oscillates
while increasing and then decreasing. As we did with the previous module, we needed to
validate the incoming data before continuing with its analysis. To this end, we also
developed plotting and analysis tools that are included in the senstile-ai-dataprocessing
package.
First, we test that a curve has exactly one hill. In simplified terms, we compute a sum over
a sliding window that moves along the curve; at each step, we check whether the sum is
increasing or decreasing, and an unexpected decrease marks the curve as invalid. Second, we
test that the read values lie within the range from -50 to 500. Third, we check that the
maximum value (the peak) lies between 450 and 500. Finally, we implemented a technique
called findpeaks, which behaves similarly to our first approach but is better at handling
increasing or decreasing curves that have minor oscillations. In other words, it is a more
robust technique that better filters out the cases shown in Figure 4.
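The first three checks can be sketched as below. This is a simplified illustration and not the code in senstile-ai-dataprocessing: the window size, the tolerance used to ignore minor oscillations, and the exact single-hill rule (the windowed sum rises and then changes direction exactly once) are assumptions.

```python
import numpy as np

VALUE_RANGE = (-50.0, 500.0)   # allowed range of the readings
PEAK_RANGE = (450.0, 500.0)    # allowed range of the maximum value


def has_single_hill(curve, window=25, tolerance=1.0):
    """True if the sliding-window sum rises to one maximum and then only falls."""
    curve = np.asarray(curve, dtype=float)
    if len(curve) <= window:
        return False
    sums = np.convolve(curve, np.ones(window), mode="valid")
    steps = np.diff(sums)
    # Ignore steps smaller than `tolerance` so minor jitter does not count.
    direction = np.sign(steps[np.abs(steps) > tolerance])
    if direction.size == 0:              # e.g. a stuck, completely flat curve
        return False
    changes = np.count_nonzero(np.diff(direction))
    return bool(direction[0] > 0 and changes == 1)


def validate_press_curve(curve):
    """Return a list of problems; an empty list means the curve passed."""
    curve = np.asarray(curve, dtype=float)
    if curve.size == 0:
        return ["empty curve"]
    problems = []
    if curve.min() < VALUE_RANGE[0] or curve.max() > VALUE_RANGE[1]:
        problems.append("value out of range")
    if not PEAK_RANGE[0] <= curve.max() <= PEAK_RANGE[1]:
        problems.append("peak out of range")
    if not has_single_hill(curve):
        problems.append("not a single hill")
    return problems
```

A flat curve stuck at the maximum value passes the two range checks but fails the single-hill check, which is why the checks are combined.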
These methods have been tested, and we are confident that the press curves pass robust data
validation before they are analyzed. As a result, the data analyzed by the AI Team has
passed all of these quality checks.
Figure 3. Examples of different press curves obtained with the pressure sensor.
Figure 4. Example of a bad press curve on the left and a nice pressure curve on the right.
First, we obtain the feature vector. A feature vector is a vector of 5000 elements in which
the peak of the press curve lies in the middle. This resolves 2 main issues that we encounter
in the pressure curves. On the one hand, the curves are not equal in length: one textile may
require many more points to reach the maximum pressure resistance, while others need fewer,
which makes the time series harder to analyze. On the other hand, once the curves are
aligned, we can compute a Euclidean distance between the feature vectors and avoid using the
Dynamic Time Warping (DTW) distance. To understand this, consider Figure 5. On the left, the
curves are not aligned, so computing the Euclidean distance between the two would not be a
fair comparison. In particular, when the blue curve is at its peak, that is, at 100% of its
press resistance, we would be comparing that value against 75% of the pressure resistance of
the orange curve. Aligning the curves minimizes this effect. Although we know that it does
not solve the problem completely, we think that it is a very good approach: in our tests, we
did not find any major differences between this technique and the use of the DTW distance.
To obtain more accurate distances we could rely on the DTW distance, but its computational
cost is about 3 times higher.
Figure 5. Graphical explanation of why the alignment matters when comparing 2 pressure
curves. Left shows 2 unaligned curves while on the right an aligned version of those curves is
shown.
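The alignment can be sketched as follows. The padding value used outside the measured curve is not specified in this report, so zero padding is an assumption here, as are the function names.

```python
import numpy as np

VECTOR_LENGTH = 5000   # feature vector size used for the press curves


def align_to_peak(curve, length=VECTOR_LENGTH, fill_value=0.0):
    """Place `curve` in a fixed-length vector with its maximum at length // 2."""
    curve = np.asarray(curve, dtype=float)
    center = length // 2
    peak = int(np.argmax(curve))
    out = np.full(length, fill_value)
    # Sample s of the curve goes to position start + s of the output.
    start = center - peak
    src_lo = max(0, -start)                       # clip samples left of the vector
    src_hi = min(len(curve), length - start)      # clip samples right of the vector
    out[start + src_lo:start + src_hi] = curve[src_lo:src_hi]
    return out


def euclidean_distance(a, b):
    """Euclidean distance between two aligned feature vectors."""
    return float(np.linalg.norm(np.asarray(a) - np.asarray(b)))
```

Once both curves are centered on their peaks, the distance compares peak against peak, which is the situation shown on the right of Figure 5.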
Once we obtain the feature vectors, we use them to compute the projections. As in the finger
module, the projections are engineered features that represent statistical aspects of the
curves. Only 3 are needed for a 3D projection, but we implemented up to 7. Our approach was
then to select the best 3 for a 3D representation, as shown in Fabrik Hub. These features
are as follows:
To select just 3 from the list, we performed a correlation study. The idea is to select those
that are least correlated with each other. The results showed that, although some pairs have
noticeable correlation, in most cases the features can be considered approximately
uncorrelated, and hence any 3 are a good choice for a 3D view. Note that performing a PCA
here would also provide poor results for the same reason: PCA compresses data by exploiting
correlation, and there is little correlation between the engineered features.
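A correlation study of this kind could look like the sketch below. The selection rule (pick the triple whose largest absolute pairwise Pearson correlation is smallest) is an assumption for illustration; with 7 features there are only 35 triples, so an exhaustive search is cheap.

```python
import itertools

import numpy as np


def least_correlated_subset(features, k=3):
    """features: array (n_samples, n_features).

    Returns the indices of the k columns whose largest absolute pairwise
    Pearson correlation is smallest, together with that correlation.
    """
    corr = np.abs(np.corrcoef(features, rowvar=False))
    best, best_score = None, np.inf
    for combo in itertools.combinations(range(corr.shape[0]), k):
        score = max(corr[i, j] for i, j in itertools.combinations(combo, 2))
        if score < best_score:
            best, best_score = combo, score
    return best, float(best_score)
```

If the best score is close to that of most other triples, the choice of features matters little, which matches the observation above.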
1.3.3. Summary
1.4. Infra-red images
The IR data source is the last one from which we obtained data, and we are still performing
an in-depth analysis of its quality. This module has considerable potential that we are not
yet exploiting, largely because of the lack of time to dedicate to it in the last two months.
Nevertheless, we have come forward with two feature extractions. First, we obtain a normal
surface of the textile. Second, we compute the feature vector using a feature extraction
technique called ImageBind.
To keep a historical record of the developments of the IR sensor, we describe how we
initially worked with this data, which paths have been discarded, and which have been
followed.
In our initial approach, we leveraged a standard ResNet50 neural network for feature extraction.
However, since ResNet50 is trained for image classification among 1000 classes, not aligning
with our textile dataset, we removed its last layer. This process aimed to retain the network's
ability to extract essential features, although originally geared towards predicting those 1000
classes. Despite common success using pretrained networks for different tasks, including ours,
the results were unsatisfactory when utilizing ResNet50, VGG16, and other ResNet variants
without the classification layer. Consequently, recognizing the need for a more tailored solution,
we attempted to construct an Autoencoder. Unfortunately, due to the limited number of
instances, even the simplest Autoencoder architectures didn't yield the desired outcomes.
Consequently, we shifted our focus to conventional computer vision techniques, leading to the
development of the MR-ELBP and WAVELET methods.
The Median Robust Extended Local Binary Pattern (MR-ELBP) enhances texture analysis by
addressing noise issues in the Local Binary Pattern (LBP) algorithm. Incorporating a median
operation, MR-ELBP mitigates the impact of outliers, making it ideal for scenarios with noisy
image data. By calculating LBP on the image, replacing the central pixel intensity with the
median of its local neighborhood, and then computing the extended LBP, MR-ELBP generates a
feature vector robustly representing local texture patterns. This descriptor proves valuable for
tasks such as texture classification and object recognition in challenging environments.
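The core idea (apply a median operation, then compute binary patterns) can be illustrated with the heavily simplified sketch below. It is not the full MR-ELBP descriptor, which combines several pattern types at multiple radii; it only applies a 3x3 median filter followed by a basic 8-neighbour LBP histogram, and it expects an image of at least 3x3 pixels.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def median_filter3(image):
    """3x3 median filter with edge replication."""
    padded = np.pad(image, 1, mode="edge")
    windows = sliding_window_view(padded, (3, 3))
    return np.median(windows, axis=(-2, -1))


def lbp_histogram(image):
    """Normalized 256-bin histogram of basic LBP codes on a median-filtered image."""
    smooth = median_filter3(np.asarray(image, dtype=float))
    h, w = smooth.shape
    center = smooth[1:-1, 1:-1]
    # Neighbour offsets in clockwise order, starting at the top-left pixel.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]
    codes = np.zeros(center.shape, dtype=np.uint8)
    for bit, (dy, dx) in enumerate(offsets):
        neighbour = smooth[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx]
        # Each neighbour contributes one distinct bit, so adding equals OR-ing.
        codes += (neighbour >= center).astype(np.uint8) * np.uint8(1 << bit)
    hist = np.bincount(codes.ravel(), minlength=256).astype(float)
    return hist / hist.sum()
```

The resulting 256-element histogram is the kind of fixed-length texture descriptor that can be concatenated with other feature vectors.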
Meanwhile, the Wavelet-based Image Analysis method employs wavelet transforms for robust
and multi-scale feature extraction, particularly adept at analyzing complex image structures.
Decomposing images into different frequency bands captures both fine and coarse details.
Utilizing wavelet coefficients as a feature vector encodes information about the image's texture
and structural components across various scales. This approach excels in applications like
image denoising, compression, and pattern recognition, offering a versatile solution for tasks
demanding comprehensive image understanding and representation.
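A minimal multi-scale decomposition can be sketched with a Haar wavelet written directly in NumPy. The choice of the Haar wavelet, the averaging normalization, the number of levels, and the use of sub-band energies instead of raw coefficients are all assumptions made to keep the example short; the image must be at least 2^levels pixels on each side.

```python
import numpy as np


def haar_dwt2(image):
    """One level of a 2-D Haar decomposition (averaging normalization).

    Returns (approximation, row_detail, column_detail, diagonal_detail).
    """
    img = np.asarray(image, dtype=float)
    img = img[: img.shape[0] // 2 * 2, : img.shape[1] // 2 * 2]   # crop to even size
    a, b = img[0::2, 0::2], img[0::2, 1::2]
    c, d = img[1::2, 0::2], img[1::2, 1::2]
    approx = (a + b + c + d) / 4.0
    row_detail = (a + b - c - d) / 4.0      # difference between the two rows
    col_detail = (a - b + c - d) / 4.0      # difference between the two columns
    diag_detail = (a - b - c + d) / 4.0
    return approx, row_detail, col_detail, diag_detail


def wavelet_features(image, levels=3):
    """Mean energy of each detail sub-band per level, plus the final approximation."""
    features = []
    current = np.asarray(image, dtype=float)
    for _ in range(levels):
        current, *details = haar_dwt2(current)
        features.extend(float(np.mean(band ** 2)) for band in details)
    features.append(float(np.mean(current ** 2)))
    return np.array(features)
```

Fine textures show up in the first-level detail energies, while coarser structures appear at deeper levels.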
In conclusion, our experimentation reveals that the combined utilization of the MR-ELBP and
Wavelet-based Image Analysis vectors yields superior results compared to their individual
applications. This synergy prompted us to adopt the hybrid approach, referred to as
MRELBP+WAVELET.
Upon transitioning to production, we opted to deploy this model in a Google Cloud Function,
and in the process we discovered the ImageBind model in the Vertex AI Model Garden. ImageBind
is a neural network developed to create embeddings of different data sources such as text,
sound and images. Since this model is built specifically to create embeddings and feature
vectors from these data sources, we decided to test it, also because of its user-friendly
interface and seamless integration with Google Cloud services.
The initial trials with the ImageBind algorithm showcased notable efficiency, raising
considerations for its potential adoption. Consequently, a comprehensive evaluation of the
previously presented methodologies is imperative to determine the most effective algorithm for
our specific application. This assessment will guide us in selecting the optimal model for
deployment, ensuring the continued success of our image analysis endeavors.
When we saw the NPL-PS framework, we realized that we could not only obtain the normal
surfaces but also retrieve a 3D mesh of the object. This would also allow us to obtain the
height of the threads in the images, which could lead to significant improvements.
However, after investing a considerable amount of time, we are still unable to obtain the
3D mesh, although we can get the normal surfaces under this framework. The main problem lies
in the Global Blending step, where something appears to go wrong in the Partial Least
Squares minimization.
Figure 7. Photometric Stereo setup. In the left part, the classical photometric stereo is shown.
The light sources are assumed to be far away from the object under study. In the right part, the
object is close to the light sources. The latter is called Near Point Lighting Photometric Stereo.
A more in-depth look at what NPL-PS does is as follows. First, a plane image is initialized
and cut into small facets. A facet is a quadrilateral composed of 4 vertices,
$f_{i,j} = \{v_{i,j},\ v_{i+1,j},\ v_{i+1,j+1},\ v_{i,j+1}\}$. Therefore, the image is cut
into many squares with a side length of 1 pixel, and a matrix-like shape is obtained. Then,
the following 2 steps are repeated iteratively until convergence.
The local shaping step corresponds to calculating the normal vector to each of the facets,
given the different light sources, and images. For each light source, there is one image.
Afterwards, the idea is to move the height of these vertices, so that the face ends perpendicular
to the normal vector. This is done for every single facet in the mesh.
In the global blending step, the idea is to smooth those facets so that they form a connected
surface. This is done by minimizing the square difference between the shared vertices of the
different facets. This workflow is shown in the next schema:
The normal surface is obtained in the very first step of the iteration, and it is the one
that we are using.
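To illustrate how a normal can be estimated from several images taken under different lights, the sketch below solves the classical photometric stereo problem (left part of Figure 7) by least squares. It assumes distant light sources, a Lambertian surface and no shadows, so it is a simplification of the near point lighting model that NPL-PS actually uses; the function name and array layout are hypothetical.

```python
import numpy as np


def estimate_normals(images, light_dirs):
    """Classical Lambertian photometric stereo.

    images:     array (k, h, w), one intensity image per light source
    light_dirs: array (k, 3), unit direction towards each light source
    Returns unit normals of shape (h, w, 3) and the albedo of shape (h, w).
    """
    images = np.asarray(images, dtype=float)
    light_dirs = np.asarray(light_dirs, dtype=float)
    k, h, w = images.shape
    intensities = images.reshape(k, -1)                     # (k, h*w)
    # Solve light_dirs @ g = intensities per pixel, where g = albedo * normal.
    g, *_ = np.linalg.lstsq(light_dirs, intensities, rcond=None)
    albedo = np.linalg.norm(g, axis=0)
    normals = g / np.maximum(albedo, 1e-12)
    return normals.T.reshape(h, w, 3), albedo.reshape(h, w)
```

At least three non-coplanar light directions are needed for the system to have a unique solution; additional lights make the estimate more robust to noise.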
1.5. Micro images
In this project, we focus on obtaining and analyzing micro images from textiles. The primary
objective is to accurately recover and analyze the threading and textile structure of these
materials. We begin by scanning textiles to obtain detailed micro images. These images serve
as the foundation for our subsequent analysis.
To process the data obtained from these micro images, we employ a combination of advanced
techniques:
- MRELBP+Wavelet Method: This method is utilized to extract significant features from the
micro images.
MRELBP is an advanced image analysis technique used for texture analysis. It's an
extension of the Local Binary Pattern (LBP) method, which is widely used for pattern
recognition and image processing. MRELBP improves upon the basic LBP by enhancing
its robustness and effectiveness in capturing detailed textures in images. It does this by
considering the median values in local image patches and extending the binary patterns
to encode more complex texture information. This makes MRELBP particularly effective
in scenarios where high levels of detail and subtle texture variations are important, such
as in textile analysis.
The Wavelet technique involves the use of wavelet transforms, a mathematical tool used
for signal processing and image analysis. Wavelet transforms decompose an image into
a set of wavelet coefficients, representing different frequency components of the image.
This technique is particularly adept at capturing both frequency and spatial information,
making it useful for analyzing textures and structures within an image. It's capable of
revealing details that might be missed by other methods, especially in images with
varying scales of texture or patterns.
It's noteworthy that these same algorithms - MRELBP and Wavelet techniques - were
employed by Fathima in her initial analysis.
- ImageBind Method: We also use this approach to process the data and extract relevant
features. ImageBind is a neural network based tool within Vertex AI. Its primary function
is to generate embeddings of an image. An embedding, in the context of machine
learning and particularly in image processing, is a representation of an image in a
lower-dimensional space. The process works by feeding an image into the neural
network, which then outputs a vector (the embedding). This vector captures the essential
features and characteristics of the image. These features can include aspects like
shapes, textures, colors, and other significant patterns present in the image. ImageBind
itself was developed by Meta AI and relies on transformer-based encoders rather than
convolutional neural networks (CNNs). Within the Vertex AI environment, it can be used
for efficient and scalable processing of image data.
The vectors obtained from the above methods are then subjected to dimensionality reduction.
We use two primary techniques for this purpose:
- t-SNE, or t-distributed stochastic neighbor embedding, is a dimensionality reduction
technique commonly used in data visualization. It excels at preserving the local
relationships between data points in high-dimensional spaces, making it effective for
revealing clusters and patterns within complex datasets. t-SNE works by modeling the
similarity between pairs of data points in the high-dimensional space and then optimally
placing them in a lower-dimensional space, such as 2D or 3D, where the similarities are
still preserved. This method is particularly useful for visualizing intricate structures and
relationships in micro images, helping to highlight the threading and textile structure in a
more interpretable manner.
- UMAP, or Uniform Manifold Approximation and Projection, is another dimensionality
reduction technique designed for visualizing complex datasets. Similar to t-SNE, UMAP
aims to preserve the local relationships between data points while also maintaining a
more globally accurate representation of the data's structure. UMAP works by
constructing a topological representation of the high-dimensional space, which helps
capture both local and global patterns more efficiently than traditional methods. By doing
so, UMAP can provide insightful visualizations of the micro images' features, aiding in
the interpretation of threading and textile structures in a simplified manner.
To handle and process this data efficiently, we have developed the 'senstile-ai-micro' Python
package. This custom software package is tailored to meet the specific needs of our textile
imaging and analysis project. We recognize the need to develop robust data validation
techniques. These techniques are essential to ensure the accuracy and validity of the images
we process, and to prevent the analysis of any invalid data.
There are significant considerations to be mindful of when using t-SNE and UMAP as feature
extractors, despite their effectiveness in data visualization:
- Primary design for visualization: When contemplating the application of UMAP or t-SNE
for tasks beyond mere visualization, such as for clustering and distance computation, it's
essential to understand their inherent limitations. This understanding is vital because, in
scenarios where t-SNE or UMAP are used for clustering and calculating distances, the
reduced vector generated by these techniques must be employed as the feature vector.
However, since these methods were primarily designed for visualization, they might not
always maintain the exact geometric relationships or distances in the reduced space as
they were in the high-dimensional space.
- Computational efficiency concerns: Handling large datasets with t-SNE and UMAP can
be computationally demanding. This presents a challenge in scenarios where
near-real-time processing is required, as these techniques may not be the most efficient
choices for feature extraction.
- Risk of losing information: The dimensionality reduction process, while optimized for
visualization, might omit critical information from the original data. Subtle patterns or
nuances important for in-depth analysis, such as in textile threading and structure
recovery, could be lost. This loss of detail is a significant drawback for applications
requiring a comprehensive understanding of underlying structures.
- Concept drift: The addition of more images can lead to a phenomenon known as
"concept drift," where the underlying patterns and structures in the data evolve over time.
Consequently, the representations obtained from the original subset may no longer
accurately reflect the relationships within the expanded dataset. Concept drift poses a
significant concern when using t-SNE and UMAP as feature extractors, as the evolving
nature of the data can alter the positions of data points in the reduced space. This
means that insights derived from the initial visualizations may become outdated or less
relevant as the dataset grows. It emphasizes the need for continuous monitoring and
adaptation of the visualization models to accommodate the dynamic nature of the
incoming data. Furthermore, concept drift highlights the importance of considering the
temporal aspect when applying dimensionality reduction techniques for feature
extraction. Strategies such as periodic retraining of the models or incorporating
mechanisms to adapt to concept drift become crucial to maintaining the relevance and
accuracy of the visual representations over time. In situations where concept drift is a
significant concern, alternative approaches, such as online and incremental
dimensionality reduction methods, may be explored to better handle the evolving nature
of the dataset and ensure the stability of the extracted features across different temporal
snapshots.
- Sensitivity to hyperparameters: The performance of t-SNE and UMAP can vary significantly
based on hyperparameter settings. This sensitivity introduces a level of
unpredictability and subjectivity, potentially impacting the consistency and reproducibility
of results across different scenarios.
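One way to quantify the first concern in the list above (distances not being preserved) is to
compare the ordering of pairwise distances before and after the reduction. The sketch below is
an assumption about how such a check could look, not a description of what the AI Team
implemented; the function names and the toy "embedding" (keeping only the first two columns)
are hypothetical.

```python
import numpy as np

def pairwise_distances(X):
    """Euclidean distances between all pairs of rows, as a flat vector."""
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    upper = np.triu_indices(len(X), k=1)
    return dist[upper]

def rank(values):
    """Ordinal ranks of the values (ties broken arbitrarily)."""
    order = np.argsort(values)
    ranks = np.empty(len(values), dtype=float)
    ranks[order] = np.arange(len(values))
    return ranks

def distance_preservation(X_high, X_low):
    """Rank correlation between pairwise distances in the original space and
    in the reduced space. A value of 1.0 means the ordering is fully preserved."""
    d_high = rank(pairwise_distances(X_high))
    d_low = rank(pairwise_distances(X_low))
    return float(np.corrcoef(d_high, d_low)[0, 1])

rng = np.random.default_rng(0)
X = rng.normal(size=(60, 10))      # stand-in for high-dimensional features
X_reduced = X[:, :2]               # crude 2-D reduction, for illustration only

score_identity = distance_preservation(X, X)
score_reduced = distance_preservation(X, X_reduced)
print(score_identity, score_reduced)
```

The same function can be applied to a t-SNE or UMAP output in place of `X_reduced` to see how
much of the distance ordering survives the reduction.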
In summary, while t-SNE and UMAP are valuable for visualization, caution is needed when
using them as feature extractors due to their limitations in preserving accurate distances,
computational demands, potential loss of information, concept drift, and sensitivity
to hyperparameters. Therefore, we still need to validate this approach in a workflow that is
expected to grow, in particular as new clients request our services. This is represented in
Figure 8.
Figure 8. Evolution of the visualizations over time. When more data is added, the visualization
can change drastically and, hence, so can the feature vector.
1.5.1. Summary
However, once the groups are obtained, there are statistical methods that we have already used
to assess the best number of clusters. These assessment methods differ depending on the
clustering algorithm that produced the output.
Hierarchical clustering is a method that organizes data into a tree-like structure of clusters.
Although it can be used with any distance measure, it is typically used with the Euclidean
distance. Initially, each data point is treated as a separate cluster, and subsequent iterations
merge clusters based on their similarity. This process results in a dendrogram, visually depicting
relationships within the data. The benefits of hierarchical clustering include its versatility and the
ability to reveal hierarchical structures in the data. An example of a dendrogram is shown in
Figure 9. As can be seen, the red line is placed at an arbitrary height, which yields 3
clusters. Placing it at a different height would yield a different number of clusters.
Figure 9. Example of a dendrogram. The x-axis shows the instances, which in our scope are
(Article Code, Side) tuples. The y-axis shows the average distance between clusters. The red
line marks where we opted to cut, which yields 3 clusters.
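The effect of the cut height on the number of clusters can be reproduced with SciPy. This is a
minimal sketch on synthetic data (three well-separated groups); the data, the cut heights and
the helper name `n_clusters_at` are assumptions chosen for the example.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Synthetic data: three well-separated groups of 20 points each.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
X = np.vstack([c + rng.normal(scale=0.5, size=(20, 2)) for c in centers])

# Agglomerative clustering with average linkage and Euclidean distance.
Z = linkage(X, method="average", metric="euclidean")

def n_clusters_at(height):
    """Number of clusters obtained when the dendrogram is cut at `height`."""
    labels = fcluster(Z, t=height, criterion="distance")
    return len(np.unique(labels))

low_cut = n_clusters_at(5.0)
mid_cut = n_clusters_at(11.0)
high_cut = n_clusters_at(20.0)
print(low_cut, mid_cut, high_cut)
```

Moving the cut upwards merges clusters, which is why the number of clusters depends on where the
line is drawn.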
K-Means is a partitioning algorithm that classifies data into 'K' clusters based on similarity. The
process involves iteratively refining cluster assignments until convergence. Its efficiency,
simplicity, and scalability make it suitable for large datasets. However, K-Means requires the
number of clusters 'K' to be defined in advance and is sensitive to the initial choice of cluster
centroids. Additionally, it assumes clusters are spherical and equally sized. Furthermore, since
the centroids are initialized randomly, the algorithm is stochastic rather than deterministic: the
solution can vary from one run to another. Therefore, several trials need to be run with
different initializations, keeping the best one, to increase the chance that the solution is
optimal, or nearly optimal.
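The idea of running several trials and keeping the best one can be sketched as follows. This is
a plain NumPy version of Lloyd's algorithm written for illustration; the function names, the
number of restarts and the synthetic data are assumptions, and a library implementation (for
example scikit-learn's `KMeans` with its `n_init` parameter) would normally be preferred.

```python
import numpy as np

def kmeans_once(X, k, rng, n_iter=100):
    """One run of Lloyd's algorithm starting from k randomly chosen data points."""
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Move each centroid to the mean of its points (keep it if it has none).
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    labels = dists.argmin(axis=1)
    # Inertia: sum of squared distances of each point to its assigned centroid.
    inertia = float((dists[np.arange(len(X)), labels] ** 2).sum())
    return labels, centroids, inertia

def kmeans_best_of(X, k, n_restarts=10, seed=0):
    """Run K-Means several times and keep the run with the lowest inertia."""
    rng = np.random.default_rng(seed)
    runs = [kmeans_once(X, k, rng) for _ in range(n_restarts)]
    best = min(runs, key=lambda run: run[2])
    return best, [run[2] for run in runs]

rng = np.random.default_rng(1)
centers = np.array([[0.0, 0.0], [8.0, 0.0], [0.0, 8.0]])
X = np.vstack([c + rng.normal(size=(50, 2)) for c in centers])

(best_labels, best_centroids, best_inertia), all_inertias = kmeans_best_of(X, k=3)
print(best_inertia, sorted(all_inertias))
```

Printing the inertia of every run shows whether different initializations ended in different
solutions; the run with the lowest inertia is the one that is kept.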
Gaussian Mixture Models (GMM) is a probabilistic model that assumes the presence of a
mixture of several Gaussian distributions in the data. Unlike some other clustering algorithms,
GMM provides a soft assignment for each data point, expressing the likelihood of it belonging to
each cluster. This flexibility allows GMM to capture complex patterns and structures within the
data.
The underlying idea behind GMM is to represent the dataset as a weighted sum of Gaussian
distributions, each characterized by its mean, covariance matrix, and weight. The
Expectation-Maximization (EM) algorithm is employed to iteratively estimate these parameters.
During the Expectation step, the algorithm calculates the probability of each data point
belonging to each cluster based on the current parameter estimates. In the Maximization step,
the parameters are updated to maximize the likelihood of the observed data given the current
cluster assignments.
One notable strength of GMM is its ability to model clusters with different shapes and
orientations, making it particularly suitable for datasets where clusters exhibit varying levels of
complexity. Additionally, GMM provides a natural way to handle overlapping clusters, as it
assigns probabilities rather than rigid memberships.
However, GMM has its drawbacks. It is computationally more demanding than simpler
algorithms like K-Means, and the performance can be sensitive to the initial choice of
parameters. Careful consideration and tuning are required to ensure optimal results.
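The soft assignment described above can be illustrated with scikit-learn's `GaussianMixture`,
which runs the EM algorithm internally. The two overlapping synthetic groups and the choice of
2 components are assumptions made for this sketch.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Two overlapping synthetic groups (hypothetical data).
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(loc=[0.0, 0.0], scale=1.0, size=(100, 2)),
    rng.normal(loc=[3.0, 0.0], scale=1.0, size=(100, 2)),
])

# Fit a 2-component mixture with full covariance matrices via EM.
gmm = GaussianMixture(n_components=2, covariance_type="full", random_state=0)
gmm.fit(X)

# Soft assignment: one probability per component for every point.
responsibilities = gmm.predict_proba(X)
# Hard assignment: the most probable component for every point.
hard_labels = gmm.predict(X)

print(responsibilities[:3])
```

Points lying between the two groups receive probabilities close to 0.5 for each component,
which is the information a hard assignment such as K-Means discards.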
Despite its strengths, HDBSCAN is not without its challenges. While it generally demonstrates
robustness to parameter choices, some sensitivity may persist, particularly when dealing with
datasets featuring varying densities or irregularly shaped clusters. Additionally, the algorithm's
performance may face limitations on high-dimensional data, necessitating thoughtful
consideration of dimensionality reduction techniques. Clusters with similar densities may pose
difficulties, potentially leading to the merging of adjacent clusters. Nonetheless, HDBSCAN's
unique combination of density-based clustering, hierarchical clustering, and automatic cluster
detection makes it a valuable tool for data scientists tackling clustering tasks in diverse and
complex datasets.
Local Outlier Factor (LOF) is not a clustering technique but a scoring algorithm. It assigns a
score to each point, from which outliers can be identified. We implemented it to find areas
with very low density, so that abnormal data can be found and analyzed. LOF identifies outliers
by comparing the local density of a data point to that of its neighbors. Points with
significantly lower density than their neighbors are considered outliers. LOF is effective at
identifying local outliers and does not assume a specific shape for clusters. However, it is
sensitive to the choice of the neighborhood size and may face challenges with global outliers.
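A minimal sketch of LOF scoring with scikit-learn is shown below. The synthetic data (one dense
group plus a single distant point) and the neighborhood size of 20 are assumptions for the
example, not the settings used by the AI Team.

```python
import numpy as np
from sklearn.neighbors import LocalOutlierFactor

# One dense group of 100 points plus a single point far away from it.
rng = np.random.default_rng(0)
inliers = rng.normal(size=(100, 3))
outlier = np.array([[8.0, 8.0, 8.0]])
X = np.vstack([inliers, outlier])

# n_neighbors sets the neighborhood size used to estimate local density.
lof = LocalOutlierFactor(n_neighbors=20)
labels = lof.fit_predict(X)              # -1 for outliers, 1 for inliers

# scikit-learn stores the negated LOF; flip the sign so that larger = more anomalous.
scores = -lof.negative_outlier_factor_
most_anomalous = int(np.argmax(scores))

print(most_anomalous, scores[most_anomalous])
```

Scores close to 1 indicate a density similar to that of the neighbors, while clearly larger
values indicate points in low-density areas.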
As mentioned at the beginning, and having described the clustering algorithms that we have
tested so far, we now turn to how to obtain the best grouping. Note that the clusters depend
on the data that we have: the clustering algorithms only find similarities and groups, and
the source data is never modified during clustering.
BIC (Bayesian Information Criterion) is a score that measures the balance between the model fit
(likelihood) and its complexity (number of parameters). To properly understand this tool, we
need to think about how a mixture of Gaussians works. In a nutshell, a mixture is a weighted
sum of Gaussian distributions, or components, each with its own mean and variance, and each
component represents one cluster. A mixture model with 3 components therefore models 3
clusters, and the number of components can be increased up to the number of points in the
database, in which case each instance belongs to its own cluster. Nonetheless, there must be a
balance between how well the GMM fits the data and how many components were needed to achieve
that fit. Hence, we compute the BIC, which penalizes the likelihood of the model with respect
to the data according to the number of parameters.
By computing the BIC for different cluster configurations, one can identify the optimal number of
clusters by selecting the configuration that minimizes the BIC value. The BIC's ability to account
for both the fidelity of the model to the data and the simplicity of the model ensures a nuanced
and principled approach to determining the most suitable number of clusters, thereby enriching
the analytical depth of the clustering report.
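The selection procedure can be sketched with scikit-learn, whose `GaussianMixture.bic` method
computes BIC = p ln(n) - 2 ln(L), where p is the number of free parameters, n the number of
samples and L the maximized likelihood. The synthetic data with three groups and the candidate
range of 1 to 6 components are assumptions for this example.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Synthetic data with three well-separated groups (hypothetical).
rng = np.random.default_rng(0)
centers = [[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]]
X = np.vstack([rng.normal(loc=c, scale=1.0, size=(150, 2)) for c in centers])

# Fit one mixture per candidate number of components and record its BIC.
bic_by_k = {}
for k in range(1, 7):
    gmm = GaussianMixture(n_components=k, n_init=5, random_state=0).fit(X)
    bic_by_k[k] = gmm.bic(X)

# The preferred configuration is the one with the lowest BIC.
best_k = min(bic_by_k, key=bic_by_k.get)
print(best_k, bic_by_k)
```

Too few components fit the data poorly, while additional components beyond the true structure
improve the likelihood only slightly and are penalized for their extra parameters.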
1.6.1. Summary
2. Data Engineer
● Rest API
○ V1
○ V2
● Kafka
○ V1
○ V2
● Separated Demo environment
● Deployment of Google Cloud Functions
● Data Storage and Management
○ Milvus
○ MongoDB
○ Google Bucket
● Package development for all AI modules
○ Milvus connector
○ Rest API
○ senstile-ai-finger
○ senstile-ai-press
○ senstile-ai-micro
● Document all our developments
● Fabrik Demo WebApp for investors day
Other tasks
● Scan textiles with sensor
● Scan textiles with Spectrometers
3. Results and discussion
Following the approach 4-2-1
One of the use cases of the company consists of finding textiles similar to a given one.
However, the word "similar" admits different interpretations, some of which may be more
suitable in certain scenarios than others. Therefore, the main goal of the AI Team, discussed
on the 19th of October, 2023, was to define the 4-2-1 approach. The 4-2-1 project is based on
the 4 modules that we have (finger, press, micro and IR) and on their combinations, named
properties. It is a hierarchical, or cascade, approach: in the second level, the models are
combined so that 2 properties are obtained. Finally, these 2 properties, or the first 4
modules, are combined to obtain the final digital footprint, which is also used to find similar
textiles in general. A graphical representation can be seen in Figure 9.
At the moment of writing this report, the current development is still on the 1st level, referring to
the 4 modules separately. The main aim is to know if the current developments in the different
modules lead us in the right direction. In other words, we want to test whether the features
extracted with the different modules provide meaningful information and, hence, whether our
feature extractors are valid. So far, the preliminary results are promising, but they still
need a considerable amount of research.
3.1 Finger
Regarding the finger module, the feature vector consists of 3 variables. Hence, the results can
be shown in a 3D plot, which helps to understand them. We believe that the feature extraction
works and that the results are promising. To reach this conclusion, we performed clustering
with a variety of techniques and manually verified the results on a set of textiles. As can be
seen in Figure 10, also shown at the Investors Day demo, the current solution looks promising.
Figure 10. Grouping of textiles based on the finger data. Taken from the Investors Day
presentation. The x-axis is S(FFT) and the y-axis is S(FFT²).
The following images are extracted from the mentioned paper, Humanoid Identification of Fabric
Material Properties by Vibration Spectrum Analysis. The conditions of our finger are very
similar to the ones given there (velocity is about 20 mm/s and the weight of the head is
about 10 grams, i.e. about 100 mN).
How can we understand that, assuming that the X axis is S(FFT) and the Y axis is S(FFT²), we
are getting values so far from the values obtained in the paper?
In the paper the S(FFT²) maximum is about 80,000 (8×10⁴); here it is about 2,000,000,000
(2×10⁹), which is about 25,000 times bigger than in the reference paper. This can't be correct,
or at least, we need to find a physical explanation for this.
Also, in the paper the S(FFT) maximum is about 2,500 (2.5×10³), but here it is about 80,000,
which is about 32 times bigger. This can't be correct either, or at least, we need to find a
physical explanation for this.
Please make sure we have all this understood; for this, make sure we have clarity on the
intermediate steps. Also, please extend this to what we see in the 3D view in TensorBoard. Are
we also so far off in scale there?
It is important to note 2 main concerns. Firstly, the clustering only shows what is already
evident in the 3D projection plot: if we visually see 4 groups, the clustering needs to
match these groups. Secondly, we have already detected some cases with errors in the raw data.
These errors strongly affect the results and need to be revised in order to keep improving
them.
3.2 Press
Considering the press module, the feature vector consists of a vector of length 5000 whose
peak is centered in the middle. The projection is based on 7 engineered features: the
position of the peak in the raw data, the climbing slope of the curve, the kurtosis and the
skewness of the curve, the area under the curve, the wrinkleness of the curve, and the
residuals of two linear regressions, the first fitted from the first value to the peak and the
second from the peak to the last value of the curve. Since we have 7 values, a dimensionality
reduction technique such as PCA is applied in order to obtain a 3D plot. The richest variables
are those that do not depend on others¹. Since these variables are largely independent of one
another, the 3 components kept by PCA cannot capture a high proportion of the variance. This
means that the projection is not going to be very accurate.
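The relationship between feature dependence and the variance captured by 3 principal components
can be checked numerically. The sketch below uses random stand-in data rather than the real
press features, and the function name `explained_variance_ratio` is chosen for the example; it
computes PCA through a singular value decomposition of the standardized features.

```python
import numpy as np

def explained_variance_ratio(features):
    """Share of the total variance carried by each principal component."""
    standardized = (features - features.mean(axis=0)) / features.std(axis=0)
    singular_values = np.linalg.svd(standardized, compute_uv=False)
    variances = singular_values ** 2
    return variances / variances.sum()

rng = np.random.default_rng(0)

# Case 1: 3 independent features plus a 4th that is a linear combination of them.
independent = rng.normal(size=(200, 3))
redundant = independent @ np.array([[1.0], [2.0], [-1.0]])
ratios_redundant = explained_variance_ratio(np.hstack([independent, redundant]))
kept_redundant = float(ratios_redundant[:3].sum())

# Case 2: 7 independent features, as a stand-in for 7 unrelated engineered features.
ratios_independent = explained_variance_ratio(rng.normal(size=(200, 7)))
kept_independent = float(ratios_independent[:3].sum())

print(kept_redundant, kept_independent)
```

In the first case 3 components retain essentially all the variance because the 4th feature adds
nothing new; in the second case a large share of the variance is lost, which is the situation
described above for the 3D projection.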
As with the finger, we believe that the results obtained are promising in this regard also,
as can be seen in Figure 11. After analyzing the 7 engineered variables and the clustering
performed over them, a manual search showed that the textiles grouped together are indeed
similar. The experiment was based on the perceived softness and compressibility of the
textiles. We have also noted that the vast majority of our textiles are flat, and hence the
press does not provide much information when analyzing them. However, we correctly recover and
categorize the textiles based on their compressibility.
Figure 11. Press curves of a consistency test. The results are very similar. The curves
correspond to article code 391, side 1.
¹ Imagine that you have features of a room, and among others, you have length, width and
height. You also have area and volume. These two are computed from the previous ones (they are
products of them). They do not provide any new information; the information is already
contained in the first 3.
3.3 Micro
Analyzing the results from the micro module, we have obtained a feature vector of length 1024
using the ImageBind neural network. This neural network is a feature extractor for images,
text, and audio. As in the press module, the projection is obtained using a dimensionality
reduction technique. PCA shows a biased visual representation of the textiles. However,
after running the UMAP dimensionality reduction technique and then performing clustering,
we can see that the results match to some extent the top-level labels² provided by Senstile.
When writing this report, we had analyzed 1487 textiles (counting side 1 and side 2
as different textiles); the label distribution, shown in Table 1, has only 2 major
categories. Our clustering solution, with the UMAP features used as the feature vector,
recovers these 2 main classes. However, it is unable to recognize the minority classes due to
the scarce number of instances.
Category Count Percentage Predictions
Lace/net 7 0,47 -
Table 1. Distribution of labels provided by Senstile considering the dataset of micro images
used when this report was written.
3.4 IR
Finally, for the infrared module we followed an approach very similar to the one used in
the micro module. We used the ImageBind feature extractor and performed clustering. The
results in this case, with the UMAP feature vectors, are not as promising. The label
distribution is shown in Table 2. This table cannot be used to assess the accuracy of the
methodology: the Predictions column states how many instances were assigned to each class, not
whether those assignments are correct. To provide some insight in this regard, we can say that
the 2 majority classes are being correctly classified, whereas the other 3 are being
misclassified entirely. This is potentially caused by the lack of instances and the unbalanced
distribution.
² These labels are: knitted, woven, hairy, lace/net and non-woven.
Category Count Percentage Predictions
Hairy 85 37,66 84
Lace/net 2 0,20 -
Table 2. Distribution of labels provided by Senstile considering the dataset of IR images used
when this report was written. For woven, the predictions fall in 2 separate clusters, so both
are summed in the table.
3.5 Conclusion
Considering the Data Science part, the AI team has tested a large number of different
techniques and has brought forward numerous solutions to address the problem of finding the
most similar textiles to a given sample. The main approach has been to use an unsupervised
classification schema due to the lack of labeled examples. Nevertheless, how to validate our
results is still an open question, for which we have some potential solutions. These are
listed in the following lines.
In order to analyze the output of the machine learning models that we have developed, we must
rely either on the inspection of the textiles or on statistical measures of the clusters we
obtain. Regarding the first option, we suggest selecting a subset of textiles and analyzing
them with respect to what each module can read. We strongly believe that a module is able to
distinguish a range of classes that differs largely from the labels Senstile has provided so
far.
The statistical analysis of the results relies on criteria that we have already analyzed. Among
the metrics already computed, we describe here the most intuitive one, the Silhouette
coefficient. It is based on a simple concept: how compact a cluster is, and how far away it is
from other clusters. The coefficient is computed for every instance and then averaged into a
single metric. This is how we have validated our clustering results so far. However, we still
need to test our solutions against a ground truth.
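For reference, the Silhouette coefficient of an instance is s = (b - a) / max(a, b), where a is
its mean distance to the other members of its own cluster and b is its mean distance to the
nearest other cluster. The NumPy sketch below follows that definition on synthetic data; the
function name and data are assumptions for the example, and scikit-learn's `silhouette_score`
provides an equivalent library implementation.

```python
import numpy as np

def silhouette_samples(X, labels):
    """Silhouette value of every instance (requires at least 2 clusters)."""
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    unique = np.unique(labels)
    scores = np.zeros(len(X))
    for i in range(len(X)):
        own = labels == labels[i]
        n_own = own.sum()
        if n_own == 1:
            continue  # convention: a singleton cluster gets a silhouette of 0
        # a: mean distance to the other members of the same cluster.
        a = dist[i, own].sum() / (n_own - 1)
        # b: mean distance to the closest other cluster.
        b = min(dist[i, labels == other].mean()
                for other in unique if other != labels[i])
        scores[i] = (b - a) / max(a, b)
    return scores

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(scale=0.5, size=(30, 2)),
               rng.normal(scale=0.5, size=(30, 2)) + [10.0, 0.0]])

good_labels = np.repeat([0, 1], 30)   # labels that follow the two groups
bad_labels = np.tile([0, 1], 30)      # alternating labels that ignore them

good_score = float(silhouette_samples(X, good_labels).mean())
bad_score = float(silhouette_samples(X, bad_labels).mean())
print(good_score, bad_score)
```

Values close to 1 indicate compact, well-separated clusters; values near 0 or below indicate
that the grouping does not reflect the structure of the data.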
The AI Team thinks that the results obtained from each data module need to be analyzed
separately. This can be done in 2 different ways.
On the one hand, we can select a set of textiles and arrange them based on what each module
can read. In other words, cluster the textiles manually, without looking at the proposed
clusters or at the labels provided by Senstile. Afterwards, compare the manually obtained
clusters against the output of the AI Team's clustering algorithms. Note that we might
encounter more or fewer categories than the Senstile labels provide, and this is essential for
the forthcoming analysis.
On the other hand, we can select a subset from the clustering output and look for the reason
why the textiles are separated into different groups. If the separation makes sense, we mark
it as valid.
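For the first of the two options, the agreement between a manual grouping and an algorithmic one
can be summarized with the adjusted Rand index, which does not require the two groupings to use
the same label names or the same number of groups. This metric is a suggestion rather than
something the report states was used; the label values below are hypothetical.

```python
from collections import Counter
from math import comb

def adjusted_rand_index(labels_a, labels_b):
    """Agreement between two groupings of the same items.
    1.0 means identical groupings; values near 0 mean chance-level agreement."""
    n = len(labels_a)
    # Pairs of items placed together in both groupings.
    together_both = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    together_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    together_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = together_a * together_b / comb(n, 2)
    maximum = (together_a + together_b) / 2
    if maximum == expected:
        return 1.0
    return (together_both - expected) / (maximum - expected)

# Hypothetical manual grouping of six textiles and two algorithm outputs.
manual = ["soft", "soft", "soft", "stiff", "stiff", "stiff"]
algorithm_same = [2, 2, 2, 0, 0, 0]      # same groups, different label names
algorithm_other = [0, 0, 1, 1, 2, 2]     # three groups that only partly agree

ari_same = adjusted_rand_index(manual, algorithm_same)
ari_other = adjusted_rand_index(manual, algorithm_other)
print(ari_same, ari_other)
```

Because the index only looks at which items are grouped together, it remains usable when the
manual inspection yields more or fewer categories than the clustering algorithm.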
Only then can we move forward to tackle the 2nd level. We need a strong foundation for what is
coming.
Last but not least, after reviewing the work of past AI Teams, the current AI Team considers
that it is far from comparable to the present approach and solutions. In particular,
addressing the problem in an unsupervised way leads to very different paths and very different
conclusions.