Towards Large-scale Empirical Assessment of Web APIs Evolution
Fabio Di Lauro, Souhaila Serbout, Cesare Pautasso
Hello everyone,
I'm Fabio Di Lauro, and today we will present our paper, titled "Towards Large-scale
Empirical Assessment of Web APIs Evolution",
written by me and my colleague Souhaila Serbout under the supervision of Prof. Cesare Pautasso.
Web Application Programming
Interfaces
Remotely access services over HTTP(S)
Speed up development with third party API services
Reuse software written in different languages
1. Web Application Programming Interfaces, also known as Web APIs, are used to remotely access
software services over the HTTP and HTTPS protocols.
2. They make it possible to build complex applications very quickly by using third-party data sources
and reusing software delivered as a service.
3. Another advantage of using APIs is that you can reuse software written in different languages
without having to translate it into the language you are currently using for a project.
APIs evolution:
Add
Update
Delete
Mitigation techniques:
Preview of new experimental versions
Support one or more versions at the same time
APIs often evolve during their lifetime, for different reasons. These changes can have a minor or
severe impact on clients, potentially breaking their functionality. The impact of a change depends
on the type of operation the developers want to introduce in the system.
For example, adding features usually has no impact on clients but, on the other hand, editing or,
worse, removing implemented features could break the expected client behaviour.
To mitigate this, service owners can use different techniques to guarantee the stability of their
offerings. For example, they can reveal a preview of a new experimental API version only to selected
clients, or they can choose to support one or more versions of an API at the same time, possibly
introducing a deadline for switching off the older one.
OpenAPI 3.x or Swagger 2.x
OpenAPI Specifications: https://spec.openapis.org/oas/v3.1.0
In our work we focused on API projects in which the Web interfaces are described using the OpenAPI
specification language.
The OpenAPI Specification was originally based on the Swagger Specification, donated by SmartBear
Software, and is now maintained by the OpenAPI Initiative which,
as an open governance structure under the Linux Foundation, is focused on creating, evolving and
promoting a vendor-neutral description format.
GitHub crawler (December 2020)
4,682 open-source API descriptions (Swagger 2.x || OpenAPI
3.x)
34,638 commits (13,786 commits with valid content)
55% of the APIs have more than 1 commit
We developed a piece of software, hereinafter called the crawler, that is responsible for checking
for new OpenAPI specification files appearing on GitHub, downloading them and keeping their
histories up to date in our database. In December 2020 the crawler performed a complete GitHub scan,
discovering more than 4 thousand open-source API descriptions in both Swagger and OpenAPI
format and more than 34 thousand commits, which represent their histories; slightly less than 40%
of these commits contain valid specification files.
We also discovered that 55% of the collected APIs have more than 1 commit.
We also retrieved some metadata associated with each artifact, such as the commit timestamp,
the version identifier and the API title.
Yearly distribution of commits
In this slide we can observe how popular the Swagger and OpenAPI standards have become over the last 7 years.
Their presence on GitHub increased 82-fold over the period from 2015 to 2020.
Our
goal
Assess the feasibility of using API
descriptions mined from GitHub to study how Web APIs evolve over long periods of
time.
Do the API artifacts have a sufficiently long history?
What kind of changes can be detected by
analyzing basic metadata associated with the artifacts?
How stable are API artifacts over time?
Do APIs tend to grow or shrink over time?
Our goal is to assess the feasibility of using the collected API descriptions to study how
these Web APIs evolve over time.
We aim to answer the following research questions:
1. Do the API artifacts have a sufficiently long history?
2. What kind of changes can be detected by analyzing basic metadata associated with the artifacts?
3. How stable are API artifacts over time?
4. Do APIs tend to grow or shrink over time?
To answer these questions, I will hand over to my colleague Souhaila, who will present our initial findings.
API Size = Number of Operations
A simple metric that counts how many operations are published by the API
SOUHAILA
In order to answer the previous questions we must first define the metric chosen for our study.
We chose a simple yet meaningful metric, the API size, which is basically the
count of operations published by each API.
The histogram reports the API size distribution over all commits. As you can see, a few
hundred commits don't contain any operations, and the size metric follows
an exponential distribution with a tail that reaches up to 357 operations.
OpenAPI-to-Tree is an API visualization tool developed by Souhaila Serbout
https://openapi-to-tree.herokuapp.com/
Size (Number of Operations) = 4
Here you can see a very simple example. On the left there is a simple OpenAPI specification file in
which you can observe two path entries: /init and /handle.
The /handle path is reachable using three different HTTP verbs, specifically GET, POST and
DELETE.
Using the previous definition, here we can count 4 different operations
described, so our size is exactly 4.
To show this example I used OpenAPI-to-Tree, a tool developed by our colleague Souhaila Serbout,
which is available at this web address. This tool is particularly
useful if you need to visualize an API description and interact with it
dynamically.
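To make the size definition above concrete, here is a minimal sketch of how the operations in a parsed description could be counted. It is not the code used in the study: the function name `api_size` and the example document are hypothetical, and the HTTP verb chosen for /init is an assumption, since the slide only tells us that /init contributes one operation.

```python
# HTTP methods that OpenAPI allows as operation keys inside a path item.
HTTP_METHODS = {"get", "put", "post", "delete", "options", "head", "patch", "trace"}


def api_size(spec: dict) -> int:
    """Count operations (path + HTTP method pairs) in a parsed API description."""
    size = 0
    for path_item in (spec.get("paths") or {}).values():
        if not isinstance(path_item, dict):
            continue  # skip malformed path items
        # Keys such as "parameters" or "summary" are not operations.
        size += sum(
            1 for key in path_item
            if isinstance(key, str) and key.lower() in HTTP_METHODS
        )
    return size


# Hypothetical document mirroring the slide: /init has one operation
# (the verb is assumed), /handle has GET, POST and DELETE.
example_spec = {
    "openapi": "3.0.0",
    "paths": {
        "/init": {"post": {}},
        "/handle": {"get": {}, "post": {}, "delete": {}, "parameters": []},
    },
}

print(api_size(example_spec))  # 4
```

Counting only the method keys keeps the metric independent of path-level fields shared by all operations of a path.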
API evolution examples
Given the large number of APIs collected, in the next slides I want to show you just a few API
cases with different evolution histories.
OpenStorage SDK
In the OpenStorage case study we can note how the version is increased in most of
the commits and how the developers have chosen to increment the version numbers essentially using
only the minor identifier, apparently without introducing any major update to the code.
This is a perfect example of a growing API with a short release cycle.
We can indeed observe a small but positive speed value of 0.30 operations per day.
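The talk does not spell out how the speed value is computed. One plausible reading, and this is only an assumption, is the net change in size between the first and last commit divided by the number of days between them. A small sketch under that assumption, with a hypothetical `growth_speed` helper and made-up history data:

```python
from datetime import datetime, timedelta


def growth_speed(history):
    """Net size change per day between the first and the last commit.

    history: list of (commit_timestamp, api_size) tuples, in any order.
    Assumption: speed = (last size - first size) / elapsed days.
    """
    if not history:
        return 0.0
    ordered = sorted(history)
    first_time, first_size = ordered[0]
    last_time, last_size = ordered[-1]
    days = (last_time - first_time).total_seconds() / 86400
    if days == 0:
        return 0.0  # a single commit (or same-instant commits) has no speed
    return (last_size - first_size) / days


# Made-up history: 10 operations growing to 40 over 100 days.
start = datetime(2020, 1, 1)
history = [
    (start, 10),
    (start + timedelta(days=50), 25),
    (start + timedelta(days=100), 40),
]
print(round(growth_speed(history), 2))  # 0.3
```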
Dockstore API
This particular example has a commit where 192 operations were deleted at once. Then it slowly
grew over the next 254 days, adding 85 more operations. As you can see, on day 394 there
was a commit that added 122 operations to the API and, after that, we can observe
minimal variations in its size, with a variation of 5 operations from day 394 to day 574.
While during the first half of its history there are no changes in versioning metadata, after
day 279 the developers start to change versions more frequently.
This could be a perfect example of the introduction of breaking changes, probably caused by a code
refactoring activity.
Kubermatic API
In the Kubermatic example we can observe an API which both grows and shrinks over its history of 199
commits spanning more
than 2 years, eventually more than tripling its initial size. It grows rapidly, with an average speed
of 1.77 operations/day.
I don't have time to go over the history of all the other 873 APIs we have collected,
so let's look at all of them together.
RQ1: Dataset overview
Commit History of APIs with more than 10 commits, sorted by number of
commits
In this figure we can see the entire history of all the APIs we have collected that have more than
10 commits, where each dot represents a commit.
Its horizontal position shows which API changed, while the vertical position
represents when the change occurred, relative to the time of the initial commit for the
corresponding API.
The colors highlight how many commits have been found for each API.
We can see that for some APIs there are commits spanning more than four years and that there
are 280 APIs which have more than 10 commits.
Age
How old are the APIs?
We define the age of an API as the time interval between the first and last
commit in its history. In the figure we can observe the age distribution of the APIs in our
collection.
Most APIs have an age of zero, which indicates that those APIs have only 1 commit in their history.
On the other hand, there are also some APIs whose history spans up to
5 years, which makes them potential subjects for further study. We also noted that for many APIs the
specification files are pushed only once and, after that,
they are no longer touched.
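The age definition above translates directly into a few lines of code. This is only an illustrative sketch: `api_age` is a hypothetical helper, and the commit timestamps are made up.

```python
from datetime import datetime, timedelta


def api_age(commit_times):
    """Age of an API: time between the first and the last commit in its history."""
    if not commit_times:
        return timedelta(0)
    return max(commit_times) - min(commit_times)


# An API with a single commit has an age of zero.
single = [datetime(2020, 5, 1, 12, 0)]
print(api_age(single))  # 0:00:00

# Made-up history spanning exactly one non-leap year.
longer = [datetime(2018, 3, 1), datetime(2018, 9, 15), datetime(2019, 3, 1)]
print(api_age(longer).days)  # 365
```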
RQ2: API Versioning Metadata
What kind of changes can be detected from versioning metadata?
It is clear that the number of versions is bound by the number of commits since every
observable change of the version identifier requires a new commit to store the
updated API specification, but commits do not always match version increments.
Different API developers follow different versioning practices, ranging from version identifiers incremented every
other commit to API histories with only a few explicitly identified versions over hundreds of commits.
We also found examples in which the title of the API itself would change, while the OAS document used to describe
it would remain the same.
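As an illustration of how version identifiers can be compared across commits, the sketch below counts how often the version changes between consecutive commits and labels each change as major, minor or patch. It assumes identifiers roughly follow a MAJOR.MINOR.PATCH pattern, which many real APIs do not, and all names (`parse_version`, `bump_kind`, `summarize_bumps`) are hypothetical.

```python
from collections import Counter


def parse_version(version):
    """Parse 'MAJOR.MINOR.PATCH' (optionally prefixed with 'v'); None if not numeric."""
    parts = version.strip().lstrip("vV").split(".")
    try:
        numbers = [int(part) for part in parts[:3]]
    except ValueError:
        return None
    return tuple(numbers + [0] * (3 - len(numbers)))


def bump_kind(old, new):
    """Label the change between two version identifiers."""
    if old == new:
        return "none"
    a, b = parse_version(old), parse_version(new)
    if a is None or b is None:
        return "other"  # identifier changed but is not numeric
    for name, x, y in zip(("major", "minor", "patch"), a, b):
        if x != y:
            return name
    return "none"  # e.g. "1.0" vs "1.0.0"


def summarize_bumps(versions):
    """Count version changes over the version identifiers of consecutive commits."""
    kinds = (bump_kind(old, new) for old, new in zip(versions, versions[1:]))
    return Counter(kind for kind in kinds if kind != "none")


# Made-up history: 5 commits, but only 2 version changes.
versions = ["1.0.0", "1.0.0", "1.1.0", "1.1.0", "2.0.0"]
print(dict(summarize_bumps(versions)))  # {'minor': 1, 'major': 1}
```

The example shows the point made above: the number of version changes can be much smaller than the number of commits.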
RQ3: API stability
Do APIs change less as they age?
How likely is an API to change? To find out, for every commit, we plotted the age of the API on the X axis against the
time since the previous commit on the Y axis. We can see that the interval between commits remains stable as the age of the API increases.
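The points of such a plot can be derived from the commit timestamps alone. The following is a sketch with a hypothetical `stability_points` helper and made-up data, not the script that produced the figure:

```python
from datetime import datetime, timedelta

SECONDS_PER_DAY = 86400


def stability_points(commit_times):
    """Return (age_in_days, days_since_previous_commit) for every commit after the first."""
    ordered = sorted(commit_times)
    if not ordered:
        return []
    first = ordered[0]
    return [
        (
            (current - first).total_seconds() / SECONDS_PER_DAY,    # X: age of the API
            (current - previous).total_seconds() / SECONDS_PER_DAY,  # Y: interval
        )
        for previous, current in zip(ordered, ordered[1:])
    ]


# Made-up history with commits on days 0, 10, 15 and 45.
start = datetime(2020, 1, 1)
commits = [start + timedelta(days=d) for d in (0, 10, 15, 45)]
print(stability_points(commits))  # [(10.0, 10.0), (15.0, 5.0), (45.0, 30.0)]
```

If older APIs changed less often, the Y values would tend to grow with X; a flat cloud of points suggests that the likelihood of change does not depend on age.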
RQ4: API Size Evolution
How many APIs with more than 2 commits grow or shrink their size?
If we classify APIs in terms of whether they grow, shrink, or simply do not change, we
obtain the groups shown in the tables. The first table counts the APIs
that have grown larger or smaller over their entire history. Here we can see that 6% of
them shrink, 44% remain stable and 50% of the APIs grow.
If we also consider changes occurring at every
commit, as shown in the table on the right, we can see that 42% of the APIs keep a constant size in
all of their commits.
Moreover, there are only 17 APIs whose size changes in intermediate commits but
ends up the same as the initial one. Furthermore, 16% of the APIs have a
history in which some commits increase their size while others reduce it.
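The two classifications can be sketched as follows, given the sequence of sizes of an API across its commits. The category labels and function names are hypothetical and chosen for illustration; they are not necessarily the ones used in the tables.

```python
def classify_overall(sizes):
    """Compare the final size with the initial one."""
    if sizes[-1] > sizes[0]:
        return "grow"
    if sizes[-1] < sizes[0]:
        return "shrink"
    return "stable"


def classify_per_commit(sizes):
    """Look at the size change introduced by every single commit."""
    deltas = [after - before for before, after in zip(sizes, sizes[1:])]
    grows = any(delta > 0 for delta in deltas)
    shrinks = any(delta < 0 for delta in deltas)
    if grows and shrinks:
        return "mixed"
    if grows:
        return "only grows"
    if shrinks:
        return "only shrinks"
    return "constant"


# Made-up size histories.
print(classify_overall([4, 6, 9]), "/", classify_per_commit([4, 6, 9]))        # grow / only grows
print(classify_overall([4, 7, 4]), "/", classify_per_commit([4, 7, 4]))        # stable / mixed
print(classify_overall([5, 5, 5]), "/", classify_per_commit([5, 5, 5]))        # stable / constant
```

The second example is the case mentioned above: the size changes in intermediate commits but ends up equal to the initial one, so it is "stable" overall even though it is not constant.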
In summary
Do the API artifacts have a sufficiently long history?
Yes (875 APIs with a history of more than 2 commits)
What kind of changes can be detected by
analyzing basic metadata associated with the artifacts?
Commits do not always match version increments
How stable are API artifacts over time?
Likelihood of change does not depend on age
Do APIs tend to grow or shrink over time?
More APIs became larger, but some did shrink
A1) Thanks to the growing adoption of the OpenAPI specification language, a sample of 875 APIs with a
history of more than 2 commits can be found on GitHub. We also found many more API description
artifacts, 1322 to be precise, without a
commit history of significant length. While these are still interesting to analyze, it remains to be
seen whether developers pushed only a
single commit because their APIs were committed only when stable, or whether we have crawled
repositories of projects which never went beyond the first commit.
A2) Commits do not always match version increments
A3) Likelihood of change does not depend on age
A4) Regarding the content of API evolution, we have focused on one possible API size metric, which has
allowed us to detect changes for
more than half of the APIs collected. We will certainly need a more in-depth
analysis of API artifacts to detect and measure changes beyond their size, for example to spot
changes in parameters, responses and schema definitions.
By only looking at the size we could show that the majority of the APIs in our
sample which changed their size have a tendency to grow larger over time.
Voyager
speed = 0.00 operations/day
There is also this example, which we think is important to highlight.
Voyager is an API with 40 commits, across which 16 different versions are distributed.
However, we can notice that its size remains the same during its whole evolution. This case is
an example where a more detailed
metric is required to detect changes. In fact, we observed that 23 commits in its history contain
schema definition changes, 6 contain changes to path parameter definitions, and 2 are related to
response modifications. Furthermore, in its history
developers pushed 6 major and 8 minor version upgrades, always without changing the number of operations.
This means that our work is not finished.
Future work
Can these artifacts be used to trace, measure,
and classify changes on APIs structures during their
lifetime?
Define more metrics
Track repository relationships and forks
...
As future work we plan to:
1) investigate further how these artifacts can be used to trace, measure,
and classify changes to API structures during their lifetime;
2) define heuristics for classifying the changes using more metrics, such as HTTP methods, paths,
query parameters and properties of response objects;
3) establish a non-linear, partial order relationship between
the artifacts, which may undergo forks and merges across different repositories;
4) and, well, we are open to suggestions.
References
Di Lauro F., Serbout S., Pautasso C. "Towards Large-scale Empirical Assessment of Web APIs
Evolution", Proc. of the 21st International Conference on Web Engineering (ICWE 2021), Biarritz,
France, May 2021
Lübke, D., Zimmermann, O., Pautasso, C., Zdun, U., Stocker, M.: Interface evolution patterns —
balancing compatibility and flexibility across microservices lifecycles. In: Proc. 24th European
Conference on Pattern Languages of Programs
(EuroPLoP 2019). ACM (2019)
OpenAPI 3.1.0 specification. https://spec.openapis.org/oas/v3.1.0 ,
accessed: 2021-04-20
OpenAPI-to-tree. https://openapi-to-tree.herokuapp.com/ ,
accessed: 2021-04-21
Prance. https://pypi.org/project/prance/ ,
accessed: 2021-04-20