Luigi Sambolino

AZ-104: Azure Administrator Certification Guide

2026-01-19T00:00:00+00:00

As I study for the AZ-104 (Azure Administrator) certification, I’m documenting key concepts and knowledge to share with others on the same learning journey.

Overview

The AZ-104 certification validates the skills needed to manage Azure subscriptions, secure identities, administer the infrastructure, and manage data and applications.

Topics Covered

1. Manage Azure Identities and Governance

Manage Microsoft Entra ID (formerly Azure AD) users and groups
Manage access with RBAC
Manage subscriptions and governance

Azure Subscription Basics

Definition: A subscription is a logical container for Azure resources and the billing boundary for their consumption.

Microsoft Entra ID (formerly Azure Active Directory)

Note: Azure Active Directory is now called Microsoft Entra ID. Microsoft Entra ID is Microsoft’s cloud-based identity and access management service; Microsoft Entra is the name of the wider product family it belongs to.

Tenant: A dedicated instance of Microsoft Entra ID that an organization receives when it signs up for a Microsoft cloud service. It represents the organization and contains all of its users, groups, applications, and other identity-related resources. Each tenant has a unique domain name and is completely isolated from other tenants.

Licenses: Subscription-based entitlements that grant users access to specific Microsoft cloud services and features. Examples: Microsoft 365 E3/E5 and Office 365. Each license provides access to a bundle of services and must be assigned to users before they can use those services.

Entra ID Attributes: Properties or characteristics associated with user objects in Microsoft Entra ID. These include information such as:

Department
Job title
Office location
Employee type (contractor, full-time, etc.)
Custom attributes

In short, attributes are metadata about users that can be used for automation, filtering, and policy application.

Dynamic security groups are groups in Microsoft Entra ID where membership is automatically determined and updated based on user or device attributes, rather than being manually managed. The group membership is controlled by rules (queries) that evaluate attributes, and users or devices are automatically added or removed as their attributes change.
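
To make the idea of rule-driven membership concrete, here is a minimal Python sketch. It is only a simplified model, not how Microsoft Entra ID is implemented: the `User` class, the `matches` and `evaluate_membership` helpers, and the dictionary-based rule are all hypothetical names made up for illustration. The rule loosely mirrors a real membership rule such as `user.department -eq "Sales"`.

```python
from dataclasses import dataclass, field


@dataclass
class User:
    name: str
    # Hypothetical attribute bag standing in for Entra ID user properties.
    attributes: dict = field(default_factory=dict)


def matches(user, rule):
    """Return True when every attribute in the rule equals the user's value."""
    return all(user.attributes.get(attr) == expected
               for attr, expected in rule.items())


def evaluate_membership(users, rule):
    """Recompute the full member set from current attributes (no manual edits)."""
    return {u.name for u in users if matches(u, rule)}


users = [
    User("alice", {"department": "Sales", "employeeType": "full-time"}),
    User("bob", {"department": "Engineering", "employeeType": "contractor"}),
]

# Loosely mirrors the membership rule: user.department -eq "Sales"
rule = {"department": "Sales"}

members = evaluate_membership(users, rule)
print(sorted(members))  # ['alice']

# When an attribute changes, re-evaluation adds the user automatically.
users[1].attributes["department"] = "Sales"
members_after = evaluate_membership(users, rule)
print(sorted(members_after))  # ['alice', 'bob']
```

The point of the sketch is that nobody edits the member list directly: membership is always derived from attributes, so changing Bob's department is enough to move him into the group on the next evaluation.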

2. Implement and Manage Storage

Create and configure storage accounts
Manage data in Azure Storage
Configure Azure Files and Azure Blob Storage

3. Deploy and Manage Compute Resources

Create and configure virtual machines
Automate deployment using Azure Resource Manager templates
Create and configure containers
Create and configure App Service

4. Configure and Manage Virtual Networking

Implement virtual networks
Configure network security groups and application security groups
- NSG Assignment Scope: Network Security Groups can be associated with subnets, network interfaces, or both, giving you fine-grained control over where security rules apply. You have two main approaches:
  - Subnet-Level NSG: Assign an NSG to an entire subnet, and the access control list (ACL) rules automatically apply to all virtual machine instances within that subnet. This is efficient for managing consistent security policies across multiple VMs.
  - Network Interface-Level NSG: Assign an NSG to a specific virtual machine’s network interface for granular, VM-specific security control. This approach allows different security policies for individual machines, even within the same subnet.
- NSG Port Configuration: To allow Remote Desktop (RDP) and HTTPS traffic, configure inbound security rules that allow TCP port 3389 (RDP) and TCP port 443 (HTTPS).
- Application Security Groups (ASG): When you have multiple subnets (e.g., 4 subnets with 10 VMs each) and need to allow inbound traffic over TCP 8080 to specific VMs only (e.g., 2 VMs per subnet), use Application Security Groups. An ASG lets you group the network interfaces of multiple virtual machines and then use that group as the source or destination in an NSG rule. Important: All network interfaces in an ASG must be in the same virtual network. Associate the NSG with each subnet and use the ASG in the rule to target only the VMs that need access, rather than managing individual IP addresses.
- NSG Rule Priority: Rule priority is critical in NSGs. Rules are processed in ascending priority order (valid values are 100 to 4096), and processing stops at the first rule that matches the traffic. Example scenario: NSG1 applies to VM1 and VM2 and has these outbound security rules:
  - Rule1 (Priority: 900) - BlockInternet - Port: 80, Protocol: TCP, Action: Deny
  - Rule2 (Priority: 1000) - AllowInternet - Port: 80, Protocol: TCP, Action: Allow
  In this case, Rule1 (priority 900) is evaluated first and blocks internet access on port 80, so Rule2 is never reached. To ensure internet access on port 80 is allowed for VM1, you must change the priority of Rule2 to a number lower than 900 (e.g., 800 or 850) so it is evaluated before the blocking rule. Remember: Lower priority number = higher precedence.
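
The priority scenario above can be sketched in a few lines of Python. This is a simplified model under stated assumptions, not the Azure implementation: the `Rule` class and `evaluate` function are hypothetical, matching is reduced to port and protocol, and the default rules that every NSG carries are left out. Note that Azure names the two rule actions Allow and Deny.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Rule:
    name: str
    priority: int  # lower number = evaluated first
    port: int
    protocol: str
    action: str    # "Allow" or "Deny"


def evaluate(rules, port, protocol) -> Optional[Rule]:
    """Return the first matching rule in ascending priority order."""
    for rule in sorted(rules, key=lambda r: r.priority):
        if rule.port == port and rule.protocol == protocol:
            return rule  # first match wins; later rules are not consulted
    return None


rules = [
    Rule("BlockInternet", 900, 80, "TCP", "Deny"),
    Rule("AllowInternet", 1000, 80, "TCP", "Allow"),
]
before = evaluate(rules, 80, "TCP")

# Move AllowInternet ahead of the blocking rule by lowering its priority number.
rules_fixed = [rules[0], Rule("AllowInternet", 800, 80, "TCP", "Allow")]
after = evaluate(rules_fixed, 80, "TCP")

print(before.name, before.action)  # BlockInternet Deny
print(after.name, after.action)    # AllowInternet Allow
```

Because evaluation stops at the first match, the only change needed is the priority number; the blocking rule can stay in place.
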
Configure Azure DNS
Configure Azure ExpressRoute
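User-Defined Routes (UDR): If you need to ensure all network traffic passes through a virtual machine named VM1 (a network inspection appliance), you need to configure a user-defined route. Azure automatically creates a route table for each subnet on an Azure virtual network and fills it with system default routes. Custom routes let you override that default routing behavior: create a route table, add a route whose next hop type is Virtual appliance and whose next hop address is VM1's private IP, and associate the route table with the subnets whose traffic should be inspected. IP forwarding must also be enabled on VM1's network interface so that it can forward traffic not addressed to itself.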
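
As a rough illustration of how a user-defined route overrides a system route, the Python sketch below models route selection with the standard `ipaddress` module. It is a simplified model built on assumptions: the `Route` class, the `select_route` function, and the address ranges are hypothetical, and BGP routes are ignored. It follows the documented selection order: the longest matching prefix wins, and when prefixes tie, a user-defined route beats a system route.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    prefix: str
    next_hop: str
    source: str  # "user" (UDR) or "system" (Azure default route)


def select_route(routes, destination):
    """Pick the route for a destination: longest prefix, then user over system."""
    dest = ipaddress.ip_address(destination)
    candidates = [r for r in routes if dest in ipaddress.ip_network(r.prefix)]
    return max(
        candidates,
        key=lambda r: (ipaddress.ip_network(r.prefix).prefixlen, r.source == "user"),
    )


# Hypothetical address space: the virtual network is 10.0.0.0/16.
routes = [
    Route("10.0.0.0/16", "VirtualNetwork", "system"),
    Route("0.0.0.0/0", "Internet", "system"),
    Route("0.0.0.0/0", "VirtualAppliance", "user"),  # UDR pointing at VM1
]

internet_hop = select_route(routes, "8.8.8.8").next_hop
intra_vnet_hop = select_route(routes, "10.0.1.5").next_hop
print(internet_hop)    # VirtualAppliance
print(intra_vnet_hop)  # VirtualNetwork

# To inspect traffic inside the virtual network too, add a UDR for its prefix.
routes_all = routes + [Route("10.0.0.0/16", "VirtualAppliance", "user")]
intra_vnet_hop_after = select_route(routes_all, "10.0.1.5").next_hop
print(intra_vnet_hop_after)  # VirtualAppliance
```

The second lookup shows a common pitfall: a 0.0.0.0/0 route alone does not capture traffic between subnets, because the more specific system route for the virtual network still wins.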

5. Monitor and Maintain Azure Resources

Monitor resources using Azure Monitor
Implement backup and recovery
Manage Azure updates

Key Resources

Study Progress

Module 1: Identity
Module 2: Governance and Compliance
Module 3: Azure Administration
Practice Exams

This post will be updated as I progress through the certification study materials.

TLS Expiration in Kubernetes: A Time Bomb That Causes Critical Outages

2024-09-10T17:30:12+00:00

Picture the scene: a routine day in our technical operations suddenly turns into a high-stakes race against time. Our development team is in the middle of critical deployments when it hits a wall: an unexpected service outage that threatens to derail its workflow. What follows is a story of troubleshooting, teamwork, and a few surprising revelations.

The Unexpected Call

It all started with an urgent call from our development team. They were facing a severe disruption to their deployments, and it was clear that something had gone wrong in our pre-production (staging) Kubernetes clusters. As the initial confusion settled, we knew we had to dig deep to get to the heart of the problem.

Identifying the Culprits

Expired certificates. The first clue came from an unexpected source: expired TLS certificates. In the Kubernetes world, these certificates are the unsung heroes of secure communication. When they expire, the system's entire web of trust falters. We quickly realized this was not a minor hiccup but a significant obstacle. First, we ran kubeadm certs check-expiration:

CERTIFICATE                EXPIRES                  RESIDUAL TIME   CERTIFICATE AUTHORITY   EXTERNALLY MANAGED
admin.conf                 Dec 30, 2020 23:36 UTC   364d                                    no
apiserver                  Dec 30, 2020 23:36 UTC   364d            ca                      no
apiserver-etcd-client      Dec 30, 2020 23:36 UTC   364d            etcd-ca                 no
apiserver-kubelet-client   Dec 30, 2020 23:36 UTC   364d            ca                      no
controller-manager.conf    Dec 30, 2020 23:36 UTC   364d                                    no
etcd-healthcheck-client    Dec 30, 2020 23:36 UTC   364d            etcd-ca                 no
etcd-peer                  Dec 30, 2020 23:36 UTC   364d            etcd-ca                 no
etcd-server                Dec 30, 2020 23:36 UTC   364d            etcd-ca                 no
front-proxy-client         Dec 30, 2020 23:36 UTC   364d            front-proxy-ca          no
scheduler.conf             Dec 30, 2020 23:36 UTC   364d                                    no

CERTIFICATE AUTHORITY   EXPIRES                  RESIDUAL TIME   EXTERNALLY MANAGED
ca                      Dec 28, 2029 23:36 UTC   9y              no
etcd-ca                 Dec 28, 2029 23:36 UTC   9y              no
front-proxy-ca          Dec 28, 2029 23:36 UTC   9y              no

The web of dependencies. As we peeled back the layers of the problem, we uncovered a complex web of dependencies between services. Kubernetes services rely on one another in intricate ways, and understanding those connections was essential.
The messy configuration. We also ran into the apiserver's YAML configuration, which was a mess of obsolete and unnecessary lines. It was like finding clutter in a once-organized workspace: a distraction that had to be cleared away to see the problem clearly.
Certificate renewal. The apiserver certificate had expired after the default validity period (1 year). With the root causes identified, we set to work renewing the expired certificates, making sure the new ones were correctly issued and integrated.
Configuration cleanup. Next, we tackled the messy apiserver YAML file. We removed the obsolete settings, simplifying and streamlining the setup.
Testing and verification. With the changes in place, we ran a series of tests to make sure everything worked smoothly. Each passing test confirmed that we had restored normal operations and resolved the outages.

Reflections on the Journey

This incident was a powerful reminder of the importance of proactive certificate management and of understanding the intricate dependencies within a Kubernetes cluster. It was a demanding experience, but one that ultimately strengthened our approach and our processes.

Going forward, we will keep sharing our journey and our insights, refining our practices to prevent future outages. Stay tuned for more updates and lessons learned from our ongoing adventures in the world of Kubernetes.

Conclusion

Managing Kubernetes clusters goes beyond deployment and scaling: it demands meticulous attention to the security and hygiene of the infrastructure. The unexpected outage caused by expired TLS certificates taught us valuable lessons about the importance of proactive certificate management and regular cluster audits.

By strengthening our monitoring processes and simplifying our configurations, we not only resolved this critical issue but also hardened our environment for future challenges. This experience is a reminder that in cloud-native ecosystems, even seemingly minor components such as certificates can be time bombs waiting to go off.

Staying vigilant and prepared will be essential as we continue to navigate the complex but rewarding Kubernetes landscape.

Kubernetes TLS Expiration: Critical Impact on Service Availability

2024-09-10T17:30:12+00:00

During routine operations, our pre-production Kubernetes cluster experienced a significant service disruption. This incident highlighted a critical infrastructure vulnerability: expired TLS certificates. This post documents the incident, root cause analysis, and remediation steps taken to prevent future occurrences.

Incident Overview

Our development team reported deployment failures affecting the pre-production environment. Initial investigation traced the failures to the TLS certificates used inside the Kubernetes cluster infrastructure.

Root Cause Analysis

The investigation identified several contributing factors:

Expired TLS Certificates. The primary cause was the expiration of TLS certificates used for cluster component communication. In Kubernetes, these certificates authenticate communication between critical system components. When certificates expire, authentication failures cascade throughout the cluster, preventing normal operations. Running kubeadm certs check-expiration revealed:

CERTIFICATE                EXPIRES                  RESIDUAL TIME   CERTIFICATE AUTHORITY   EXTERNALLY MANAGED
admin.conf                 Dec 30, 2020 23:36 UTC   364d                                    no
apiserver                  Dec 30, 2020 23:36 UTC   364d            ca                      no
apiserver-etcd-client      Dec 30, 2020 23:36 UTC   364d            etcd-ca                 no
apiserver-kubelet-client   Dec 30, 2020 23:36 UTC   364d            ca                      no
controller-manager.conf    Dec 30, 2020 23:36 UTC   364d                                    no
etcd-healthcheck-client    Dec 30, 2020 23:36 UTC   364d            etcd-ca                 no
etcd-peer                  Dec 30, 2020 23:36 UTC   364d            etcd-ca                 no
etcd-server                Dec 30, 2020 23:36 UTC   364d            etcd-ca                 no
front-proxy-client         Dec 30, 2020 23:36 UTC   364d            front-proxy-ca          no
scheduler.conf             Dec 30, 2020 23:36 UTC   364d                                    no

CERTIFICATE AUTHORITY   EXPIRES                  RESIDUAL TIME   EXTERNALLY MANAGED
ca                      Dec 28, 2029 23:36 UTC   9y              no
etcd-ca                 Dec 28, 2029 23:36 UTC   9y              no
front-proxy-ca          Dec 28, 2029 23:36 UTC   9y              no

Inter-Service Dependencies. Kubernetes cluster components have strict interdependencies for secure communication. An expired certificate breaks the authentication chain, so components can no longer talk to each other and failures cascade across multiple systems.
Configuration Complexity. The apiserver YAML configuration contained obsolete parameters and unnecessary configurations that complicated troubleshooting efforts and delayed root cause identification.

Remediation

Certificate Renewal. All expired certificates were renewed using the standard Kubernetes certificate renewal procedures. The apiserver certificate, which had expired after the default one-year validity period, was regenerated and redeployed.
Configuration Optimization. The apiserver YAML configuration was reviewed and cleaned, removing deprecated parameters and unnecessary entries to improve maintainability and reduce future troubleshooting complexity.
Validation and Testing. Comprehensive testing was performed to verify that all cluster components successfully established secure communication and that normal operations were restored.

Key Takeaways

This incident underscores several important operational principles:

Proactive Certificate Management: Automated certificate rotation and monitoring are essential for maintaining cluster stability
Dependency Awareness: Understanding inter-component dependencies is critical for effective incident response
Configuration Management: Clean, well-organized configurations reduce troubleshooting time and improve operational visibility
Regular Audits: Periodic reviews of cluster certificates and configurations can prevent similar incidents
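
As one possible starting point for the proactive monitoring described above, the following Python sketch parses rows in the table format shown earlier and flags certificates that expire within a given window. It is a sketch under assumptions, not the tooling used in this incident: `parse_expirations`, `expiring_within`, and `SAMPLE` are hypothetical names, and the parser assumes the exact "Dec 30, 2020 23:36 UTC" date layout seen in the output above.

```python
from datetime import datetime, timedelta, timezone

# Sample rows copied from the table format shown in this post.
SAMPLE = """\
CERTIFICATE                EXPIRES                  RESIDUAL TIME   CERTIFICATE AUTHORITY   EXTERNALLY MANAGED
apiserver                  Dec 30, 2020 23:36 UTC   364d            ca                      no
etcd-server                Dec 30, 2020 23:36 UTC   364d            etcd-ca                 no
ca                      Dec 28, 2029 23:36 UTC   9y              no
"""


def parse_expirations(text):
    """Map each certificate name to its expiry time (UTC); skip other lines."""
    result = {}
    for line in text.splitlines():
        parts = line.split()
        # A data row is: name, then "Dec 30, 2020 23:36 UTC" (5 tokens).
        if len(parts) < 6 or parts[5] != "UTC":
            continue
        try:
            expires = datetime.strptime(" ".join(parts[1:5]), "%b %d, %Y %H:%M")
        except ValueError:
            continue
        result[parts[0]] = expires.replace(tzinfo=timezone.utc)
    return result


def expiring_within(expirations, now, days):
    """Return sorted names of certificates expiring within `days` of `now`."""
    limit = timedelta(days=days)
    return sorted(name for name, exp in expirations.items() if exp - now < limit)


expirations = parse_expirations(SAMPLE)
now = datetime(2020, 12, 1, tzinfo=timezone.utc)  # fixed date for a repeatable example
at_risk = expiring_within(expirations, now, days=30)
print(at_risk)  # ['apiserver', 'etcd-server']
```

In practice the text would come from running the check-expiration command on a control-plane node, and the result would feed an alert rather than a print statement. A fixed `now` is used here only so the example is reproducible.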

Conclusion

Certificate management is a foundational aspect of Kubernetes cluster operations. While TLS certificates are often overlooked in favor of more visible infrastructure components, their expiration can have cascading effects on system availability. By implementing automated certificate lifecycle management and maintaining proper monitoring practices, organizations can significantly reduce the risk of certificate-related outages. This incident has led to the implementation of enhanced monitoring and automated certificate rotation to prevent future disruptions.