{"id":33714,"date":"2026-05-04T05:00:00","date_gmt":"2026-05-04T03:00:00","guid":{"rendered":"https:\/\/sii.pl\/blog\/?p=33714"},"modified":"2026-04-30T14:43:49","modified_gmt":"2026-04-30T12:43:49","slug":"what-does-the-code-from-sii-testing-lab-look-like-repositories-deep-dive","status":"publish","type":"post","link":"https:\/\/sii.pl\/blog\/en\/what-does-the-code-from-sii-testing-lab-look-like-repositories-deep-dive\/","title":{"rendered":"What does the code from Sii Testing Lab look like? Repositories deep-dive"},"content":{"rendered":"\n<p>In the <a href=\"https:\/\/sii.pl\/wp-content\/uploads\/2026\/04\/PL_SWM-30407-CC-Testing-Hackaton-testingowy-Testing-Lab-AI-Edition-1.pdf\" target=\"_blank\" rel=\"noopener\" title=\"\">Sii Testing Lab<\/a> report, we described seven key conclusions from our Hackathon on the role of AI in test automation. The publication generated significant interest, and the number of questions about technical details we received from you exceeded our expectations. That is precisely why we decided to prepare this commentary.<\/p>\n\n\n\n<p>Although in the report we confirmed that AI increases efficiency, that the tester&#8217;s experience matters, and that the models have their limitations, those analyses (by their very nature) remained at the level of generalized conclusions.<\/p>\n\n\n\n<p><strong>This article is a step further. We go one level lower: into the code, into specific implementation decisions, and into moments where the boundary between the tool&#8217;s capabilities and the engineer&#8217;s critical thinking becomes clearly visible.<\/strong> To focus on what matters, on patterns and code quality rather than on rankings, I discuss the repositories anonymously. 
I care about drawing universal lessons, not about evaluating the work of individual teams.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\"><strong>Let&#8217;s first recall what we evaluated<\/strong><\/h2>\n\n\n\n<p>Each repository was evaluated against eight quality criteria:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>K1<\/strong> \u2013 Alignment with business goals and requirement coverage.<\/li>\n\n\n\n<li><strong>K2<\/strong> \u2013 Test data and state preparation.<\/li>\n\n\n\n<li><strong>K3<\/strong> \u2013 Solution stability.<\/li>\n\n\n\n<li><strong>K4<\/strong> \u2013 Quality of selectors and locators.<\/li>\n\n\n\n<li><strong>K5<\/strong> \u2013 Test architecture and patterns.<\/li>\n\n\n\n<li><strong>K6<\/strong> \u2013 Assertion quality.<\/li>\n\n\n\n<li><strong>K7<\/strong> \u2013 Diagnostics and error handling.<\/li>\n\n\n\n<li><strong>K8<\/strong> \u2013 Engineering quality.<\/li>\n<\/ul>\n\n\n\n<p>Each repository was evaluated on a five-point scale; however, these scores should be interpreted through the specifics of the study. We were comparing two approaches:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>teams conducting automation without using AI (Oldschool),<\/li>\n\n\n\n<li>teams conducting automation with the use of AI (AI).<\/li>\n<\/ul>\n\n\n\n<p>Both groups had only 6 hours of work, which determined the scoring approach. The jury deliberately turned a blind eye to unfinished modules, prioritizing engineering maturity and the quality of what was actually delivered. It is worth noting, however, that for the AI group, which, thanks to the technology, worked at a much faster pace, the bar was set correspondingly higher.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\"><strong>The Oldschool Group: Decent work in difficult conditions<\/strong><\/h2>\n\n\n\n<p>Before we go into code details, an important contextual note: <strong>six hours is a genuinely short time to build anything from scratch<\/strong>. 
Decisions that look like simplifications were often conscious compromises resulting from time pressure rather than from a lack of knowledge or skills.<\/p>\n\n\n\n<p>With this in mind, let&#8217;s look at what the Oldschool teams delivered.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\"><strong>The tests pass, but what do they verify?<\/strong><\/h3>\n\n\n\n<p>In the Oldschool repositories, one recurring pattern dominates: <strong>a sequence of UI steps ending with an assertion checking element visibility<\/strong>. The tests work; they cover the main user paths: login, cart, checkout. At first glance, everything looks correct.<\/p>\n\n\n\n<p>The problem reveals itself when we ask the question: What is actually being verified here?<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: xml; title: ; notranslate\" title=\"\">\n\/\/ A typical test from the Oldschool group*\npublic void OrderPlacement() {\n    UserIsLoggedIn();\n    UserHasProductsAddedToCart();\n    _productPage.OpenBasket();\n    _productPage.ProceedWithOrder();\n    _addressFormPage.ClickContinue();\n    \/\/ ...\n    Assert.That(_orderConfirmationPage.ConfirmationMessage);\n}\n<\/pre><\/div>\n\n\n<p><em>* Example 1:1<\/em><\/p>\n\n\n\n<p>The test confirms that <strong>the process is completed, but it does not confirm whether it was completed correctly.<\/strong> Verification of the price, of applied discounts, and of correct tax calculation is missing. 
The tests check the flow, not the business logic.<\/p>\n\n\n\n<p>How might the same test look if written with business value in mind?<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: xml; title: ; notranslate\" title=\"\">\n\/\/\/ A test verifying business logic*\n&#x5B;Test]\npublic void Should_Add_One_Product_To_Basket() {\n    CreatedProduct product = PrestashopTestDataService.CreateProductWithQuantity();\n    Driver.GoToUrl(Urls.Product(product.Url));\n    At&lt;ProductDetailsPage&gt;(x =&gt; x.AddToCart());\n    At&lt;CartPopupPage&gt;(x =&gt; {\n        x.ProductName.Should().Be(product.Name);\n        x.ProductQuantity.Should().Be(&quot;1&quot;);\n    });\n}\n<\/pre><\/div>\n\n\n<p><em>* .Be(&#8220;1&#8221;) used as a simplification to present the concept<\/em><\/p>\n\n\n\n<p>The difference is fundamental: instead of checking whether a success message appeared, we verify specific domain values: the product name and the quantity.<\/p>\n\n\n\n<p><strong>It is worth noting a positive exception in the other direction<\/strong>: one of the Oldschool teams wrote parameterized tests with negative cases, something no AI team did. 
The example shows awareness that it is worth testing more than just the happy path:<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: xml; title: ; notranslate\" title=\"\">\n\/\/\/ Negative tests with parameterization  (Oldschool)*\n@ParameterizedTest\n@MethodSource(&quot;provideIncorrectUsers&quot;)\npublic void verifyAccountRegistrationWithoutMandatoryFields(\n        User user, String description) {\n    registrationLoginSteps.verifyUserIsNotLoggedIn();\n}\n\nprivate static Stream&lt;Arguments&gt; provideIncorrectUsers() {\n    return Stream.of(\n        Arguments.of(DataProvider.getUserWithoutMandatoryField(), &quot;...&quot;),\n        Arguments.of(DataProvider.getUserWithIncorrectEmail(), &quot;...&quot;),\n        Arguments.of(DataProvider.getUserWithTooShortPassword(), &quot;...&quot;),\n        Arguments.of(DataProvider.getUserWithAlreadyRegisteredEmail(), &quot;...&quot;)\n    );\n}\n<\/pre><\/div>\n\n\n<p><em>* Example 1:1<\/em><\/p>\n\n\n\n<p>This is an interesting example of thinking about what can go wrong. AI focused mainly on the happy path. <strong>An old testing instinct that the language models have not picked up.<\/strong><\/p>\n\n\n\n<h3 class=\"wp-block-heading\"><strong>Interface instead of API: A conscious compromise, but with consequences<\/strong><\/h3>\n\n\n\n<p>In most Oldschool repositories, the data setup is based on the UI. This is understandable given the limited time since API integration requires additional infrastructure. 
However, the cost of this choice is visible in the code:<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: xml; title: ; notranslate\" title=\"\">\n\/\/\/ Setup via UI (quick to write, fragile to maintain)*\n&#x5B;SetUp]\npublic void Setup() {\n    UserIsLoggedIn();\n    _loggedUser.IsUserLoggedIn();\n    UserHasProductsAddedToCart(); \/\/ navigates through UI to a product\n}\n<\/pre><\/div>\n\n\n<p><em>* Example 1:1<\/em><\/p>\n\n\n\n<p>Every change in the admin interface can trigger a domino effect, taking down not a single test but the entire setup. The tests are coupled with each other and difficult to isolate. Under time pressure, such a compromise is forgivable, but in a production project it would have to be addressed eventually.<\/p>\n\n\n\n<p>The decision to cover this layer was made by two of the five Oldschool teams. These teams, despite the limited time, decided to invest in API-based data setup in parallel with work on the UI layer.<\/p>\n\n\n\n<p>In practice, this meant a division of specialization, where one person developed the UI layer while another built the API-based setup. <strong>The result was clearly higher quality and stability of the test data<\/strong>. Importantly, such a decision was not the easiest path, since it required more coordination, especially around integrating changes and ensuring consistency of implementation. Choosing stability over implementation speed is a conscious decision that best speaks to the craftsmanship of those teams.<\/p>\n\n\n\n<p>It is proof that engineering responsibility and an understanding of the risks won out over time pressure.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\"><strong>Stability and selectors: works locally, but in CI?<\/strong><\/h3>\n\n\n\n<p>In the K3 and K4 areas, a hidden risk emerges. 
Some teams use hardcoded waits instead of intelligent <em>waits<\/em>:<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: xml; title: ; notranslate\" title=\"\">\n\/\/ Hardcoded wait (a typical source of flakiness)*\nThread.Sleep(1000);\nvar continueButton = DriverProvider.Driver.FindElement(\n    By.Name(&quot;confirmDeliveryOption&quot;));\ncontinueButton.Click();\n\n\/\/ Suggested: a better approach (an explicit wait for a specific state)\nnew WebDriverWait(DriverProvider.Driver, TimeSpan.FromSeconds(10))\n    .Until(d =&gt; d.FindElement(By.Name(&quot;confirmDeliveryOption&quot;)))\n    .Click();\n\n<\/pre><\/div>\n\n\n<p><em>* Example 1:1<\/em><\/p>\n\n\n\n<p>Similarly with selectors. Instead of stable test attributes, short, structurally dependent <em>selectors<\/em> appear:<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: xml; title: ; notranslate\" title=\"\">\n\/\/ Fragile selector, dependency on the button text*\nprivate IWebElement cartButton =&gt; DriverProvider.Driver\n    .FindElement(By.XPath(&quot;\/\/a&#x5B;@href=&#039;\/\/145.239.29.97\/cart?action=show&#039;]&quot;));\n\nprivate IWebElement continueShopping =&gt; DriverProvider.Driver\n    .FindElement(By.XPath(&quot;\/\/button&#x5B;@data-dismiss=&#039;modal&#039; and contains(.,&#039;Continue shopping&#039;)]&quot;));\n\n<\/pre><\/div>\n\n\n<p><em>* Example 1:1<\/em><\/p>\n\n\n\n<p>The tests work locally. The question of how they behave under small UI changes or in a slower CI environment remains open. In this case, the selector should be tied to the element&#8217;s intent (what it does), not to its representation (how it looks or what it says). A change of interface language, environment, or layout should not break the test.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Page Objects speak the language of clicks, not of business<\/h3>\n\n\n\n<p>The architecture in the Oldschool group is correct: Page Objects appear, the code is readable, and the structure makes sense. 
However, the Page Objects mostly play the role of technical wrappers, not of a domain layer:<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: xml; title: ; notranslate\" title=\"\">\n\/\/ Wrapper without domain logic*\n@Step(&quot;Click &#039;Continue&#039; button in Address section&quot;)\npublic void clickContinueButtonInAddressSection() {\n    addressContinueButton.click();\n}\n\n@Step(&quot;Click &#039;Place order&#039; button&quot;)\npublic void clickPlaceOrderButton() {\n    placeOrderButton.click();\n}\n\n\/\/ Suggested: a domain method combining steps and verifying the outcome\npublic Order placeOrder(Product product) {\n    clickContinueButtonInAddressSection();\n    clickContinueButtonInShippingMethodSection();\n    selectCashPaymentRadioButton();\n    acceptTermsAndConditions();\n    clickPlaceOrderButton();\n    return orderConfirmationPage.getOrderDetails(); \/\/ returns a domain object\n}\n<\/pre><\/div>\n\n\n<p><em>* Example 1:1<\/em><\/p>\n\n\n\n<p>In the first case, the test describes how to click. In the second, it checks how the system behaves, which fundamentally affects the test&#8217;s value. A UI change (for example, a new step in checkout) requires a change in only one place rather than across all the tests.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\"><strong>Assertions: A false sense of security<\/strong><\/h3>\n\n\n\n<p>The most recurring problem in the Oldschool repositories is the quality of assertions. 
What dominates are constructs checking visibility or truthiness instead of specific business values:<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: xml; title: ; notranslate\" title=\"\">\n\/\/ An assertion checking only visibility*\nAssert.That(_orderConfirmationPage.ConfirmationMessage);\n\n\/\/ Suggested: an assertion checking a domain value\nupdated.StockAvailable.Quantity.Should().Be(updatedQuantity);\nupdated.StockAvailable.ProductId.Should().Be(productId);\n\n<\/pre><\/div>\n\n\n<p><em>* Example 1:1<\/em><\/p>\n\n\n\n<p>The effect is paradoxical: the tests pass, they look correct, but they do not protect the system against regression, <strong>giving a false sense of security baked into the pipeline.<\/strong><\/p>\n\n\n\n<p>To show the full spectrum of the Oldschool group: at the other end of the scale, there is a repository that contained a single file, an unmodified template generated by Playwright CLI, testing the Playwright.dev page rather than PrestaShop:<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: xml; title: ; notranslate\" title=\"\">\n\/\/ The only test in the repository, default template from playwright.dev*\npublic async Task HomepageHasPlaywrightInTitleAndGetStartedLink()\n{\n    await Page.GotoAsync(&quot;https:\/\/playwright.dev&quot;);  \/\/ &lt;-- not PrestaShop\n    await Expect(Page).ToHaveTitleAsync(new Regex(&quot;Playwright&quot;));\n    var getStarted = Page.Locator(&quot;text=Get Started&quot;);\n    await getStarted.ClickAsync();\n    await Expect(Page).ToHaveURLAsync(new Regex(&quot;.*intro&quot;));\n}\n<\/pre><\/div>\n\n\n<p><em><em>* Example 1:1, with a shortened method name without affecting the outcome<\/em><\/em><\/p>\n\n\n\n<p>It may seem that this is a mean-spirited observation, but it is one of the examples our report talks about: <strong>without AI, some teams burn a day on a single technical problem<\/strong>. 
Someone got stuck on the environment setup, and six hours disappeared. Two teams in the same group, with scores of 4.3 and 2.9: that is the real spread of results in the Oldschool group.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\"><strong>The exception that ruled the entire group<\/strong><\/h3>\n\n\n\n<p>With all these observations in mind, it is worth pointing out that in the Oldschool group, a difference appeared that was far from minimal. One repository (with a score of 4.3\/5) <strong>was not only the best in the Oldschool group, but the best result in the entire study, above all the AI teams<\/strong>.<\/p>\n\n\n\n<p>This repository stood out at every level:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>an architecture of five separate projects (Core, Api, Ui, TestSupport, Tests),<\/li>\n\n\n\n<li>a dedicated data service,<\/li>\n\n\n\n<li>full cleanup after each test,<\/li>\n\n\n\n<li>and zero secrets in the repository.<\/li>\n<\/ul>\n\n\n\n<p>We will return to it in detail when we analyze the patterns.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\"><strong>Did the presence of an architect make a difference?<\/strong><\/h3>\n\n\n\n<p>In both groups, the distribution of forces was the same: four Regular+Regular teams and one with a person of significantly higher seniority.<\/p>\n\n\n\n<p>In the Oldschool group, the difference was noticeable. The repository with the architect stood out with a more careful structure, well-thought-out abstractions, and engineering discipline visible in every file. You could feel that someone in the team knew how it should look.<\/p>\n\n\n\n<p>But here, too, a key observation appears: experience alone, without AI support, was not enough to change the fundamental approach. The time pressure of the six-hour Hackathon pushed everyone towards similar compromises.<\/p>\n\n\n\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\">\n<p><em><em>Experience improves the quality of implementation. 
Yet AI causes experience to start defining the entire outcome.<\/em><\/em><\/p>\n<\/blockquote>\n\n\n\n<h2 class=\"wp-block-heading\"><strong>The AI Group: A different approach, the same pitfalls (and one paradox)<\/strong><\/h2>\n\n\n\n<h3 class=\"wp-block-heading\"><strong>Structure from the very first commit<\/strong><\/h3>\n\n\n\n<p>When we open the repositories of the AI teams, one thing is striking: a high level of structure from the very beginning. The tests are better named, logically grouped, and immediately embedded in the architecture. There is no &#8220;let&#8217;s start with anything, we&#8217;ll clean up later&#8221; stage.<\/p>\n\n\n\n<p>This is visible in how the base classes look, how the fixtures are organized, and how consistently the patterns are applied. The comparison is striking:<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: xml; title: ; notranslate\" title=\"\">\n\/\/ Oldschool (test starts with UI details)*\npublic void OrderPlacement() {\n    var url = Config&#x5B;&quot;BaseUrl&quot;] + &quot;home-accessories\/7-mug.html&quot;;\n    DriverProvider.Driver.Navigate().GoToUrl(url);\n    _productPage.AddProductToCart();\n    Thread.Sleep(1000);\n    var continueButton = DriverProvider.Driver.FindElement(\n        By.Name(&quot;confirmDeliveryOption&quot;));\n    \/\/ ...\n\n<\/pre><\/div>\n\n\n<p><em><em>* Example: a composition of two methods to present the concept<\/em><\/em><\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: xml; title: ; notranslate\" title=\"\">\n\/\/ AI (test starts with the goal)*\n@Test\n@Description(&quot;End-to-end order: add product, fill checkout, pay, verify&quot;)\nvoid shouldPlaceOrder_whenPayingByBankWire() {\n    open(UIRoutes.HOME);\n    homePage.clickProductByName(EXISTING_PRODUCT);\n    productPage.clickAddToCart();\n    \/\/ ...\n    assertThat(orderConfirmationPage.getOrderReference())\n        .as(&quot;Order reference should be present&quot;)\n 
       .isNotBlank();\n}\n\n<\/pre><\/div>\n\n\n<p><em><em>* Example 1:1 (shortened)<\/em><\/em><\/p>\n\n\n\n<h3 class=\"wp-block-heading\">AI as an architecture accelerator<\/h3>\n\n\n\n<p>The biggest jump in quality in the AI group is visible in architecture (K5). Service layers, conscious fixtures, and separation of responsibilities appear; things that were the exception in Oldschool <strong>here become the standard<\/strong>:<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: xml; title: ; notranslate\" title=\"\">\n\n\/\/ A fixture with a full lifecycle, typical of the AI group*\npublic class HybridFixture : TracingFixture {\n    protected UserFactory UserFactory;\n    protected AuthService AuthService;\n\n    &#x5B;SetUp]\n    public async Task SetUpHybridContext() {\n        ApiFactory = new PrestaShopApiFactory(AdminApiContext);\n        UserFactory = new UserFactory(ApiFactory);\n        AuthService = new AuthService(Playwright);\n    }\n\n    &#x5B;TearDown]\n    public async Task TearDownHybridContext() {\n        await ApiFactory.CleanupAsync();\n        await AdminApiContext.DisposeAsync();\n    }\n}\n<\/pre><\/div>\n\n\n<p><em><em>* Example 1:1 (shortened)<\/em><\/em><\/p>\n\n\n\n<p>AI allows teams to very quickly reach a level that normally requires time and many design iterations. It shortens the path from &#8220;works on my machine&#8221; to &#8220;is defined&#8221;.<\/p>\n\n\n\n<p><strong>But here the first problem arises as well<\/strong>. Part of this architecture is formally correct yet functionally shallow, generating structures that look professional but, without a conscious human-in-the-loop, do not serve any specific purpose.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\"><strong>Assertions: The same Achilles heel as in Oldschool<\/strong><\/h3>\n\n\n\n<p>Despite the better structure, one problem remains almost unchanged: the depth of business validation. 
Even in the best AI repositories, the assertions are often too generic:<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: xml; title: ; notranslate\" title=\"\">\n\/\/ An assertion confirming that something exists*\nassertNotNull(actual.getId(), &quot;Customer ID should not be null&quot;);\nassertNotNull(actual.getId(), &quot;New customer ID should be returned&quot;);\n\n\/\/ Suggested: an assertion confirming that it works correctly\nassertThat(orderConfirmationPage.getOrderReference())\n    .isNotBlank();\nassertThat(displayedPrice).isEqualByComparingTo(expectedGrossPrice);\n\n<\/pre><\/div>\n\n\n<p><em><em>* Example 1:1<\/em><\/em><\/p>\n\n\n\n<p>One of the AI teams went further and built a dedicated PriceUtils class for verifying gross values after applying VAT. This is a rare example of an assertion at the level of business logic in this group.<\/p>\n\n\n\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\">\n<p><em><strong><em>AI improves the way tests are written, but it does not enforce what the tests should be checking.<\/em><\/strong><\/em><\/p>\n<\/blockquote>\n\n\n\n<h3 class=\"wp-block-heading\"><strong>Test data and a surprise with secrets<\/strong><\/h3>\n\n\n\n<p>In the area of test data (K2), the AI teams take a step forward: API setup appears, along with thinking about isolation and unique data generated by a factory. But the consistency tends to break: in one file, we see a model approach, while a few files away, a UI-based setup emerges.<\/p>\n\n\n\n<p><strong>There is also an area where the AI group did worse than Oldschool<\/strong>: secrets management. Two AI repositories had an API key directly in configuration files <em>committed to the repo<\/em>. 
One of them additionally contained the comment:<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: xml; title: ; notranslate\" title=\"\">\n\/\/ An illustrative example of secrets*\n# Ideally, this should be stored in secrets\n# but let&#039;s keep it here for simplification\napi.key=&lt;API key would go here&gt;\nadmin.password=&lt;password would go here&gt;\n\n<\/pre><\/div>\n\n\n<p><em><em>* Example 1:1 (with sensitive data removed)<\/em><\/em><\/p>\n\n\n\n<p>The Oldschool teams managed this better; the absence of secrets in the repo was the norm there (years of experience?). <strong>This shows that AI accelerates implementation but does not replace engineering awareness in every area.<\/strong><\/p>\n\n\n\n<h2 class=\"wp-block-heading\"><strong>Four patterns in the AI group<\/strong><\/h2>\n\n\n\n<p>Analyzing the five AI repositories, we can clearly see that they do not form a uniform group. Four different patterns emerge, which I would name:<\/p>\n\n\n\n<p><strong>1. &#8220;Test factory&#8221;<\/strong>: An impressive number of tests (close to 200), hybrid API+UI tests, and <em>fixtures<\/em> with <em>cleanup<\/em>. The largest volume in the entire study. Weakness: secrets in the repo and at times shallow assertions (ToContainTextAsync instead of value verification), as well as API tests limited to status code 200.<\/p>\n\n\n\n<p><strong>2. &#8220;Solid craftsman&#8221;<\/strong>: 37 tests, Allure, a dedicated PriceUtils class for VAT verification, assertions on specific values. The best quality average in the AI group. No API for data setup, registration is done through the UI, which is a conscious simplification.<\/p>\n\n\n\n<p><strong>Side note: <\/strong>This team struggled with managing the dynamic-button selectors mentioned in the original report. This burned through part of the allocated time, which required some sacrifices in coverage. 
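<\/p>

<p>The dynamic selectors mentioned above can be illustrated even outside the browser. A minimal, self-contained sketch (the markup and attribute names are hypothetical; it uses the JDK&#8217;s built-in XPath engine) shows why a locator tied to a stable test attribute survives changes that break a locator tied to a generated id or to visible text:<\/p>

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;

public class SelectorStability {
    // Two builds of the same (hypothetical) page: the id is regenerated and the
    // label is translated, but the test attribute stays stable.
    static final String BUILD_1 =
        "<form><button id='btn-4821' data-testid='add-to-cart'>Add to cart</button></form>";
    static final String BUILD_2 =
        "<form><button id='btn-9377' data-testid='add-to-cart'>Dodaj do koszyka</button></form>";

    // Counts how many nodes an XPath locator matches in the given markup.
    static int matches(String expr, String markup) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new ByteArrayInputStream(markup.getBytes(StandardCharsets.UTF_8)));
        XPath xp = XPathFactory.newInstance().newXPath();
        return ((Number) xp.evaluate("count(" + expr + ")", doc,
            XPathConstants.NUMBER)).intValue();
    }

    public static void main(String[] args) throws Exception {
        // Fragile: bound to a generated id, found only in the first build.
        System.out.println(matches("//button[@id='btn-4821']", BUILD_1)); // 1
        System.out.println(matches("//button[@id='btn-4821']", BUILD_2)); // 0
        // Intent-based: survives both the id change and the translation.
        System.out.println(matches("//button[@data-testid='add-to-cart']", BUILD_2)); // 1
    }
}
```

<p>The same reasoning applies in Selenium or Playwright: a locator that encodes what the element is for stays valid while ids, layout, and translations churn around it.<\/p>

<p>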
They were not alone in this; this pattern recurred frequently in the AI teams.<\/p>\n\n\n\n<p><strong>3. &#8220;Wide reach&#8221;<\/strong>: 60 API tests covering coupons, currencies, promotions, and the catalog. A wide range, but in places the assertions operate on raw JSON instead of on deserialized objects.<\/p>\n\n\n\n<p><strong>Example: <\/strong>instead of deserializing the response and checking an object property, the test compares a fragment of a raw string:<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: xml; title: ; notranslate\" title=\"\">\n\/\/ Assertion on raw JSON (fragile and unreadable)*\nAssert.That(body, Does.Contain(&quot;\\&quot;gift_product\\&quot;:\\&quot;11\\&quot;&quot;).Or.Contain(&quot;\\&quot;gift_product\\&quot;:11&quot;),\nAssert.That(body, Does.Contain(&quot;\\&quot;minimum_amount\\&quot;:\\&quot;80&quot;).Or.Contain(&quot;\\&quot;minimum_amount\\&quot;:80&quot;),\nAssert.That(body, Does.Contain(&quot;\\&quot;active\\&quot;:\\&quot;1\\&quot;&quot;).Or.Contain(&quot;\\&quot;active\\&quot;:1&quot;),\n\n\/\/ Suggested: deserialization and assertion on a property\nvar coupon = JsonSerializer.Deserialize&lt;CartRule&gt;(body);\nAssert.That(coupon.Active, Is.True);\nAssert.That(coupon.ReductionAmount, Is.EqualTo(5.0m));\n\n<\/pre><\/div>\n\n\n<p><em><em>* Example 1:1<\/em><\/em><\/p>\n\n\n\n<p>Such an assertion will pass even if the JSON format changes, because it is looking for a pattern in a string, not verifying a domain value. Interestingly, the example shows that the team was aware enough of the problem that in many places they add <em>.Or.Contain(&#8230;)<\/em> with an alternative version without quotation marks. This does not solve the problem, but it shows that someone realized the fragility.<\/p>\n\n\n\n<p><strong>4. &#8220;Documentation instead of code&#8221;<\/strong>: Impressive artifacts: test plans in .claude\/, self-evaluation against each criterion K1-K8, detailed scenarios TC-001 to TC-030. 
But there were 5 tests in the repository. AI did an excellent job at planning; unfortunately, time ran out before the team got to implementation.<\/p>\n\n\n\n<p><strong>Note: <\/strong>this is not a failed pattern but a warning for the entire industry. AI was so good at planning that the team spent most of its time building a map instead of going into the field. It is an interesting phenomenon where AI generates planning artifacts so convincing that implementation takes a back seat.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\"><strong>The paradox of this study: The best repository comes from the Oldschool group<\/strong><\/h2>\n\n\n\n<p>Here we arrive at the most important discovery of the entire analysis, and at the moment where simple narratives about AI must give way to facts.<\/p>\n\n\n\n<p><strong>The best repository in the entire study (with a score of 4.3\/5) comes from the Oldschool group<\/strong>. It beat all five AI repositories. And it is exactly what the AI section described as an unattainable model: <strong>API as the data foundation, hybrid tests, full <em>cleanup<\/em>, and domain validation<\/strong>.<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: xml; title: ; notranslate\" title=\"\">\n\/\/ Model API data setup from the Oldschool group*\npublic CreatedProduct CreateProductWithQuantity(\n    int quantity = TestDataConstants.Product.DefaultAvailableQuantity) {\n    CreateProductRequest request = CreateProductRequestFactory.CreateValid(settings);\n    ProductEnvelope response = productsApi.CreateProduct(request);\n    var productId = response.Product.Id\n        ?? throw new InvalidOperationException(&quot;Product id not returned&quot;);\n    cleanupTracker.Track(productId, productsApi);  \/\/ automatic cleanup\n    stockAvailablesApi.UpdateQuantity(stockAvailableId, quantity);\n    return new CreatedProduct { Request = request, Response = response };\n}\n<\/pre><\/div>\n\n\n<p><em><em>* Example, a composition of methods to present the concept<\/em><\/em><\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: xml; title: ; notranslate\" title=\"\">\n\/\/ Domain-value assertions from the Oldschool group*\nAt&lt;CartPopupPage&gt;(x =&gt; {\n    x.ProductName.Should().Be(product.Name);\n    x.ProductQuantity.Should().Be(&quot;1&quot;);\n});\n\nupdated.StockAvailable.Quantity.Should().Be(updatedQuantity);\nupdated.StockAvailable.ProductId.Should().Be(productId);\n\n<\/pre><\/div>\n\n\n<p><em><em>* Example, a composition of methods to present the concept<\/em><\/em><\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: xml; title: ; notranslate\" title=\"\">\n\/\/ ResourceCleanupTracker, automatic cleanup after tests*\npublic class ResourceCleanupTracker {\n    public void Cleanup() {\n        foreach (var item in _resources.DistinctBy(x =&gt; (x.Id, x.Api)).ToList())\n            TryDelete(() =&gt; item.Api.DeleteById(item.Id));\n    }\n}\n<\/pre><\/div>\n\n\n<p><em><em>* Example 1:1, with a simplification using var item, only a stylistic difference<\/em><\/em><\/p>\n\n\n\n<p>Five separate projects (Core, Api, Ui, TestSupport, Tests), a dedicated data generator with a GUID in every email, configuration through User Secrets without a single secret in the repo, and automatic screenshots on failures.<\/p>\n\n\n\n<p>This is the description of the best repository in the entire study (written without AI), by a team with an architect, in six hours. 
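<\/p>

<p>The data generator itself is not reproduced in the excerpts, so the following is only a sketch of the GUID-in-every-email idea (the class and method names are hypothetical): every registration gets a globally unique address, which removes &#8220;email already registered&#8221; collisions and makes parallel runs safe.<\/p>

```java
import java.util.UUID;

// Hypothetical sketch of a unique-test-data generator; the repository's real
// implementation is not shown in this article.
public class TestDataGenerator {
    // Every call yields a unique, clearly test-owned email address.
    public static String uniqueEmail(String prefix) {
        return prefix + "+" + UUID.randomUUID() + "@example.test";
    }

    public static void main(String[] args) {
        String first = uniqueEmail("qa");
        String second = uniqueEmail("qa");
        System.out.println(first);
        // Two generated users can never collide on email.
        System.out.println(first.equals(second)); // false
    }
}
```

<p>Combined with the cleanup tracker shown earlier, data created this way is both isolated and disposable.<\/p>

<p>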
From this, we can draw the following conclusion:<\/p>\n\n\n\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\">\n<p><em><strong><em>AI raises the entry level for everyone, but the top still belongs to experience.<\/em><\/strong><\/em><\/p>\n<\/blockquote>\n\n\n\n<p>It is worth pausing on a comparison that emerges from the data: 196 tests, 3.6\/5 vs 8 tests, 4.3\/5. The article describes this, <strong>but it is worth saying it plainly<\/strong>: 8 tests that verify specific domain values, clean up after themselves, and isolate data are probably more valuable in production than 196 tests with shallow assertions. <strong>A large number of tests without quality gives a more misleading sense of security than a small number of good tests<\/strong>. This is the conclusion that managers and tech leaders should take from this study: <strong>coverage is not a counter; it is a quality criterion<\/strong>.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\"><strong>Where experience really matters<\/strong><\/h2>\n\n\n\n<p>Now the main conclusion regarding the role of experience emerges. It should be noted, however, that it is more subtle than the things heard in the media, namely, &#8220;AI replaces experience&#8221; or &#8220;experience always wins&#8221;.<\/p>\n\n\n\n<p>In the Oldschool group, the difference between Regular+Regular teams and Architect+Junior was visible, but the time pressure flattened the results. Everyone was heading in a similar direction, and the time-related compromises were similar.<\/p>\n\n\n\n<p>In the AI group, this difference became decisive. AI removed the technical barriers: syntax, framework setup, and basic patterns. It turned out that when these barriers disappear, what remains are pure architectural decisions (proportional to experience).<\/p>\n\n\n\n<h2 class=\"wp-block-heading\"><strong>Three areas where experience is decisive<\/strong><\/h2>\n\n\n\n<p><strong>1. 
What we test, not how we test<\/strong>: Most teams in both Oldschool and AI stopped at verifying the visibility of messages. The best repositories from both groups went further: they verified specific domain values, the correctness of VAT calculations, and the isolation of carts between users.<\/p>\n\n\n\n<p>This shows not a difference in the ability to write code, but conscious thinking about what these tests are for in the first place.<\/p>\n\n\n\n<p><strong>2. Test data strategy<\/strong>: Every team that made the conscious decision to use the API as the setup layer immediately jumped up a level in quality. It is not just about speed; it is about test isolation, deterministic data, and the possibility of parallel execution. Regardless of team composition, this is an effect of experience earned through hard practice (it does not come from prompting AI, but from understanding why data isolation in tests matters).<\/p>\n\n\n\n<p><strong>3. Systems thinking versus a battery of tests<\/strong>: Most repositories are a battery of tests. The best are testing systems. The difference lies in whether the architecture allows scaling, whether cleanup is automatic, and whether layers of responsibility are separated. AI can help quickly build individual elements of such a system, but the decision that the system should exist at all is a human architectural decision.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\"><strong>What AI really does well (and where it falls short)<\/strong><\/h2>\n\n\n\n<p>To leave no doubt: AI brings huge, measurable value. In our study, the difference in the number of tests was striking. The AI groups delivered between 5 and almost 200 tests; Oldschool delivered between 5 and 8. 
The shortening of the time needed to reach a satisfactory level is real.<\/p>\n\n\n\n<p><strong>AI really helps with:<\/strong><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Quick entry to a level that Oldschool reaches only after many iterations.<\/li>\n\n\n\n<li>Reducing implementation errors and antipatterns.<\/li>\n\n\n\n<li>Speeding up the resolution of technical blockers.<\/li>\n\n\n\n<li>Building the project structure from the very first <em>commit.<\/em><\/li>\n\n\n\n<li>Generating a large number of test scenarios in a short time.<\/li>\n<\/ul>\n\n\n\n<p><strong>AI consistently falls short on:<\/strong><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Depth of business validation: it will not ask whether you checked the VAT.<\/li>\n\n\n\n<li>Data strategy: it will not decide that the setup should be done through the API.<\/li>\n\n\n\n<li>Security: it will not stop you from committing secrets to the <em>repo.<\/em><\/li>\n\n\n\n<li>Systems thinking: it generates tests, not testing systems.<\/li>\n\n\n\n<li>Pragmatism: it often generates &#8220;pretty&#8221; code without real value.<\/li>\n\n\n\n<li>Negative tests: it focuses on the happy path, omitting edge cases and incorrect data.<\/li>\n<\/ul>\n\n\n<div class=\"nsw-o-blogersii-banner\">\n            <picture>\n            <source srcset=\"https:\/\/sii.pl\/blog\/wp-content\/uploads\/2026\/04\/Blog-Testing-Lab-Desktop_.jpg\" media=\"(min-width: 992px)\" >\n            <source srcset=\"https:\/\/sii.pl\/blog\/wp-content\/uploads\/2026\/04\/Blog-Testing-Lab-Mob_.jpg\" media=\"(min-width: 300px)\" >            <img decoding=\"async\" src=\"https:\/\/sii.pl\/blog\/wp-content\/uploads\/2026\/04\/Blog-Testing-Lab-Desktop_.jpg\" alt=\"\"  class=\"\"  >\n        <\/picture>\n        <div class=\"cnt\">\n                    <div class=\"nsw-m-title-block -h3 -invert  -has-title-margin-bottom-0 -has-title-font-weight-bold\">\n                                <h2 class=\"nsw-m-title-block__title\">Testing &#038; QA<\/h2>\n           
     <\/div>\n                            <p class=\"has-nsw-p-4-font-size has-invert-color\">\n                Ensure the quality, performance, and security of your software with our testing and test automation services.\n            <\/p>\n                            <a  href=\"https:\/\/sii.pl\/en\/what-we-offer\/testing-qa\/\" class=\"nsw-a-button -ghost -banner-button\"   >\n        <span>Check our offer<\/span>\n    <\/a>\n            <\/div>\n<\/div>\n\n\n\n<h2 class=\"wp-block-heading\"><strong>The most important conclusion<\/strong><\/h2>\n\n\n\n<p>This study had one clear paradox: the repository that everyone treated as a model for the AI group, with API setup, a hybrid approach, cleanup, and domain validation, <strong>belonged to the Oldschool group<\/strong>.<\/p>\n\n\n\n<p>This paradox tells us something important. AI dramatically lowers the entry level. It makes it possible for anyone to quickly write structured, correct code. This is a real change.<\/p>\n\n\n\n<p>But at the same time, when the technical barriers disappear, what becomes more apparent is what has always determined quality: understanding the domain, architectural thinking, knowing what is worth testing and why.<\/p>\n\n\n\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\">\n<p><em><strong><em>AI significantly improves the way we write tests, but it does not improve what we test.<\/em><\/strong><\/em><\/p>\n<\/blockquote>\n\n\n\n<p>For teams, this translates into specific conclusions:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>If you do not use AI<\/strong>, you lose productivity. 
The pace of work will be slower and the number of delivered scenarios lower.<\/li>\n\n\n\n<li><strong>If you use AI without changing your approach<\/strong>, you will have prettier code, but the tests will still check the flow rather than the business logic.<\/li>\n\n\n\n<li><strong>If you combine AI with architectural experience<\/strong>, you can build a testing system that actually protects the product.<\/li>\n<\/ul>\n\n\n\n<p>The greatest risk after this Hackathon is the simple assumption: &#8220;since we have AI, quality will improve on its own&#8221;. It will not. AI does not know your domain, does not understand your system, and will not make decisions for you about what is worth verifying.<\/p>\n\n\n\n<p>It can help you reach a solution faster. But it is you who must know where you are going.<\/p>\n\n\n\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\">\n<p><em><strong><em>AI lowers the entry level, but at the same time, it raises the importance of experience at the decision level.<\/em><\/strong><\/em><\/p>\n<\/blockquote>\n\n\n\n<p>This study shows us the present day, and a level of maturity that some teams have not yet reached.<\/p>\n\n\n\n<p>Test automation without the use of AI has become a slower approach, increasingly difficult to justify. Just as manual tests did not disappear after the arrival of automation (but changed their role and status), so classic automation is becoming the new manual approach. Still needed, still valuable in the right contexts, but it is ceasing to be the default choice for someone who wants to work more efficiently.<\/p>\n\n\n\n<p>The question, then, is not &#8220;is it worth using AI in automation?&#8221; but rather &#8220;how quickly is your team able to change the way it thinks about this work?&#8221;, because the tools are already here. 
They are waiting for experienced experts who will know what to build with them.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>In the Sii Testing Lab report, we described seven key conclusions from our Hackathon on the role of AI in &hellip; <a class=\"continued-btn\" 
href=\"https:\/\/sii.pl\/blog\/en\/what-does-the-code-from-sii-testing-lab-look-like-repositories-deep-dive\/\">Continued<\/a><\/p>\n","protected":false},"author":177,"featured_media":33712,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"_editorskit_title_hidden":false,"_editorskit_reading_time":0,"_editorskit_is_block_options_detached":false,"_editorskit_block_options_position":"{}","inline_featured_image":false,"footnotes":""},"categories":[1320],"tags":[10265,2787,2198,1501,1459],"class_list":["post-33714","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-hard-development","tag-hackathon-en","tag-testng-en","tag-case-study-en","tag-artifiical-intelligence-en","tag-test-automation-en"],"acf":[],"aioseo_notices":[],"republish_history":[],"featured_media_url":"https:\/\/sii.pl\/blog\/wp-content\/uploads\/2026\/04\/Hackathon-1.jpg","category_names":["Hard development"],"_links":{"self":[{"href":"https:\/\/sii.pl\/blog\/en\/wp-json\/wp\/v2\/posts\/33714"}],"collection":[{"href":"https:\/\/sii.pl\/blog\/en\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/sii.pl\/blog\/en\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/sii.pl\/blog\/en\/wp-json\/wp\/v2\/users\/177"}],"replies":[{"embeddable":true,"href":"https:\/\/sii.pl\/blog\/en\/wp-json\/wp\/v2\/comments?post=33714"}],"version-history":[{"count":1,"href":"https:\/\/sii.pl\/blog\/en\/wp-json\/wp\/v2\/posts\/33714\/revisions"}],"predecessor-version":[{"id":33716,"href":"https:\/\/sii.pl\/blog\/en\/wp-json\/wp\/v2\/posts\/33714\/revisions\/33716"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/sii.pl\/blog\/en\/wp-json\/wp\/v2\/media\/33712"}],"wp:attachment":[{"href":"https:\/\/sii.pl\/blog\/en\/wp-json\/wp\/v2\/media?parent=33714"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/sii.pl\/blog\/en\/wp-json\/wp\/v2\/categories?post=
33714"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/sii.pl\/blog\/en\/wp-json\/wp\/v2\/tags?post=33714"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}