Monday, April 13, 2026

Automating Troubleshooting in Kognitos

What if a support investigation could start with a vague prompt and end with a deterministic, repeatable workflow?

That’s what I tested in Kognitos.

I started with a simple request:

Look at Sentry logs and replay for jerome@kognitos.com and tell me what this user was trying to do. Add the summary activity, error, and next steps.

From that one high-level instruction, Kognitos created an SOP that could investigate the issue step by step. Behind the scenes, it also generated SPY code so the workflow could move from an exploratory AI-driven draft into deterministic automation.

That shift is the interesting part. This was not just “AI gives me an answer.” It was “AI builds a troubleshooting workflow I can test, refine, and operationalize.”



What the SOP does

The workflow takes a user email, then:

  • Pulls Sentry session replays from the last 24 hours
  • Summarizes session behavior, pages visited, and frontend errors
  • Identifies top transactions to understand what the user spent time doing
  • Extracts the workspace ID from replay URLs
  • Uses that workspace ID to query SigNoz for backend warnings and errors
  • Produces a human-readable activity summary, error analysis, and prioritized next steps

In other words, it connects frontend behavior with backend signals and turns scattered telemetry into a support-ready narrative.
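To make that correlation step concrete, here is a minimal Python sketch of the same idea, assuming the replay URLs and backend log records have already been fetched. The workspace-ID pattern and the field names are illustrative assumptions, not the actual Sentry or SigNoz schema, and the real SOP runs inside Kognitos rather than as standalone Python.

import re
from collections import Counter

# Assumed URL shape for illustration only; adjust to the real replay URL format.
WORKSPACE_RE = re.compile(r"/workspace/([A-Za-z0-9]+)")

def extract_workspace_ids(replay_urls):
    """Pull workspace IDs out of replay URLs so backend logs can be scoped to the user."""
    ids = set()
    for url in replay_urls:
        match = WORKSPACE_RE.search(url)
        if match:
            ids.add(match.group(1))
    return ids

def summarize_backend_logs(log_records):
    """Group error/warn logs by service and message to surface the top failure modes."""
    buckets = Counter(
        (rec["service"], rec["level"], rec["message"])
        for rec in log_records
        if rec["level"] in ("error", "warn")
    )
    return buckets.most_common()  # most frequent failure modes first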

SOP:



Why this matters

Support and engineering teams often spend too much time doing mechanical investigation work:

  • Watch a replay
  • Check logs
  • Infer intent
  • Correlate timestamps
  • Guess which backend errors matter
  • Write up next steps

This SOP compresses that process into something far more systematic.

Instead of manually stitching together Sentry, SigNoz, and product context, Kognitos assembled a workflow that did the correlation automatically and produced a usable report.

What the system found

For workspace rVYWaOvLl49ZGeDix0mhG, the automation surfaced 13 backend error/warn logs, all from kognitos.quill.

The failures fell into a few clear categories:

1. Automation creation failures

The most serious issue was a backend defect in Grimoire:

create_automation - Failed to create automation in Grimoire: No active draft - call begin_draft() first

This directly blocked automation creation.

2. SPY code generation errors

The agent generated invalid SPY in several places:

  • Function definitions without required annotations
  • An invalid url type
  • Unsupported imports

These were code generation problems, not user mistakes.

3. File lifecycle issues

Some files were referenced before they existed in tmp/, which caused file-not-found failures.

4. Input ordering problems

The agent attempted to save a default input before the automation had defined any inputs with read_input().

5. Security policy violations

Some bash commands using pipes and redirection were blocked by policy.

The real takeaway

The most valuable outcome was not just the list of errors. It was the system’s ability to separate symptoms from root causes and prioritize next actions.

Here’s the ordered view:

  • P0: Fix the Grimoire draft lifecycle bug blocking create_automation
  • P1: Fix Quill code generation so it produces valid SPY
  • P1: Fix tool ordering so inputs are only set after they exist
  • P2: Improve error messaging and file lifecycle handling

That is what good troubleshooting should do: not just describe what broke, but identify what matters first.

From AI chat to reliable automation

What I like most about this flow is that it starts creatively and ends deterministically.

I can begin with an open-ended prompt, let the system assemble the investigation logic, test it in draft mode, and then promote it into a repeatable automation with observable runs.

That is the bridge between generative AI and operational reliability.

The runtime behavior itself worked once the defects were understood. The main blockers were not the idea of the automation, but the platform and codegen issues uncovered during execution.

And that is exactly why this kind of workflow is useful: it doesn’t just solve support problems, it helps expose product and platform gaps that teams can actually fix.

Tuesday, December 17, 2024

Mindful Software: Building Agentic Automations using GenAI

Today, software development and automation are painful. The software or automation team has to handle almost 95% of the process up front, covering every corner case and the tribal knowledge accumulated over the years. If the developer misses anything, it comes back as a bug, and only the software engineer or automation developer can fix it by adding the missing corner case. On top of fixing the code, each fix has to go through the entire lengthy software development life cycle of change management, QA, and deployment to sandbox and production.

With Kognitos, our customers develop the basic logic for their processes using English syntax and improve its accuracy over time by adding learnings. A learning could be a simple one-liner, new logic to address a corner case, or a new document type. Thus, we create a way to capture tribal knowledge methodically and keep the records forever.

Neuroplasticity: Kognitos's method is not new; it mirrors how our human brains are designed. Babies are born with relatively few neural connections and, in their first few years, learn a great deal from their surroundings and from the more developed humans around them.



With Kognitos, when the system encounters a new condition or situation, it raises an exception and waits for the process owner to review the exception in plain English. The business operations team then provides guidance on how to address the new situation. Until the exception is handled, the process does not consume any compute resources.

Kognitos supports multiple learnings for similar exceptions, and GenAI guides the system to the best context-based learning for the current situation or document. Ref: https://caff-ai-nate.blogspot.com/2024/03/vector-databases.html
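The mechanics can be pictured roughly like this. This is a conceptual Python sketch, not Kognitos's implementation: unhandled situations raise an exception, a reviewer supplies a learning once, and matching learnings are applied automatically afterwards. The situation keys, handlers, and exact-match lookup are simplifications; as noted above, Kognitos actually uses GenAI to pick the best context-based learning.

class NeedsGuidance(Exception):
    """Raised when a step hits a situation with no matching learning."""

learnings = {}  # situation key -> handler supplied by the process owner

def handle(situation, payload):
    if situation in learnings:
        return learnings[situation](payload)   # apply the stored learning
    raise NeedsGuidance(situation)             # pause and wait for human guidance

# A reviewer resolves the exception once; the learning is then reused from that point on.
learnings["missing_po_number"] = lambda doc: doc | {"po_number": "UNKNOWN"}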

Unlearning an obsolete condition:

It is as easy as deleting the Learning from the UI instead of having to rewrite the entire automation.

Example:

The main process was activated through an email to "CarDealer---Customer-Service-PUBLISHED-to-review-an-email-7jxxxxx@sb.kognitos.com." As you can read below, we ask ChatGPT to classify the email concisely and then send a reply accordingly.

get the email body as the email text

ask koncierge
  the openai model is "gpt-4o"
  the task is "Review {the email text} and {the email subject}and classify the email based on the following rules: For any email inquiring about when a new vehicle will be delivered, the output should be 'Vehicle Delivery Updates'. For any email about fuel for their car, the output should be 'EV Card Issues'. For any email with mileage questions, the output should be 'Mina'. For any email where the sender is stating that they have been in an accident or their vehicle has been damaged, the output should be 'Please refer abcd.sharepoint.com/dealing-in-an-accident'. For any email inquiring about service or repairs, the output should be 'Please refer abcd.sharepoint.com/how-to-service'. Be concise."
get the above as the output

split the email sender with
  the delimiter is "@"
get the above as the email values
get the first email value
get that as the username

send an email to the email sender where
  the subject is "RE: {the email subject}"
  the message is "Dear {the username},<br><br>Thank you for reaching out.<br><br>Based on the content of your message, I have determined that you should...<br><br><br>{the output}  <br><br><br>Thank you again for your inquiry. Please feel free to reach out with any further inquiries.<br><br>Cheers,<br>Kognitos<br><br><br>From: {the email sender}<br>Date: {the email date}<br>Subject: {the email subject}<br><br>{the email body}"

As we can see, we forgot to add a condition for what happens if the required information is missing from the email.

Our excellent car salesperson can answer the exception, and that answer is reused in similar situations (a learning).

What is Agentic AI?
Agentic AI is a type of AI-driven automation that allows AI agents to perform complex tasks independently and to adapt to changing situations. It can analyze data, recognize patterns, and make decisions without human intervention.

This is a foray into an Agentic AI solution for your automation. As a Kognitos automation develops, all these exceptions can be used for learning without the pitfalls of current GenAI (hallucinations and unpredictable outcomes). These human interventions, i.e., exceptions, can be learned. Thus, the Kognitos process created for automation becomes more agentic as the written process and the LLMs evolve.

Our product has all the elements of Agentic AI, except that we still require minimal human intervention when a new situation is encountered while implementing a Kognitos process. As the system learns from these exceptions, it will eventually be trained to act agentically, and as LLM models evolve it will be able to generate new processes with minimal human input.

Nonetheless, we are enhancing the process development lifecycle through the SDLC feature. Stay tuned for more updates.

Watch this demo to understand how our platform interacts with SAP - https://www.kognitos.com/resources/videos/extracting-information-from-sap-sales-order-with-kognitos/ 

Thursday, September 12, 2024

Kognitos: Your AI-Powered Automation System


Introduction:

Kognitos is a revolutionary business automation platform that harnesses the power of Generative AI (GenAI) to streamline your workflows. Our intuitive, English-based interface empowers you to create and manage automation without complex coding.

Analogy:

It is like a human learning a new skill, like how most of us learn to drive.

  1. Read the driver's manual (connect to Kognitos Books)

  2. Learn to drive the car with a learner's license (Playground testing)

  3. Pass the driver's test (move it to a process)

  4. Learn while driving on new roads and in new conditions, e.g., a new signal on the road (exception handling)

  5. Drive with less effort and zero accidents

Key Features:

  • Natural Language Automation: Describe your desired automation in plain English using our innovative FlexGrammar syntax.
  • Serverless Infrastructure: Leverage the efficiency and scalability of serverless architecture to reduce costs and complexity.
  • Third-Party Integrations: Seamlessly connect to your favorite tools and applications through our extensive library of integrations.
  • Patented Exception Handling: Kognitos learns from exceptions, adapting to new scenarios and avoiding costly downtime. 
  • Continuous Learning: Our platform improves its understanding of your unique processes over time through manual exception handling, ensuring optimal performance.

How It Works:

Architecture 



  1. Create Automation: Use our user-friendly interface to define your automation in plain English.
  2. Test and Refine: Experiment with your automation in the playground to ensure it meets your specific needs.
  3. Deploy and Scale: Promote your playground to a process. These processes/automation can be triggered via email or scheduled. 
  4. Continuous Improvement: Kognitos learns from exceptions and adapts to new scenarios based on input, ensuring your automation remains effective forever.

Benefits:

  • Increased Efficiency: Automate repetitive tasks, allowing your team to focus on higher-value work.
  • Reduced Errors: Minimize human error and ensure accuracy in your processes.
  • Faster Time-to-Value: Quickly implement automation without extensive technical coding expertise.
  • Scalability: Easily adapt your automation to changing business needs.
  • Cost Savings: Leverage the efficiency of a serverless SaaS platform.

Example Automation:

  • Sample 1 (with our UI preview): Extract a document related to a vendor and translate it into different languages
  • Sample 2 (connect to an external app):

    Find the Name in the email body
    Get the above as the lead name
    Find the Title in the email body
    Get the above as the lead title
    Connect to Salesforce
    create a lead in Salesforce with
       the lead status is "New"
       the last name is the lead name
       the title is the lead title
       the lifecycle stage is "marketingqualifiedlead"



  • Sample 3 (with GPT prompts):

    process each file as follows
        get the file as a scanned document
        get the document's lines
        ask koncierge
            the task is "{the lines} \n-----\n You will be provided with a questionnaire. Find the following information in the document: telephone number, e-mail, nationality. Print the telephone number, e-mail, nationality. No explanation necessary."
            the openai model is "gpt-4o"
            the rules are "do not include any explanation", "do not include any description", "in the case that a value is not found, just print 'value not in the document' for it. Do not ask for further guidance", "make sure the output is ONLY a json list of rows"
            the response format is "table"
        create a table from the above answer
Conclusion:

Thanks for reading this blog. Kognitos is moving automation code to English, where domain experts in accounting, finance, and HR can write and manage their day-to-day automation tasks with minimal help from IT/programming experts.

If you need more information about how Kognitos can help with your workflow/business automation, please visit http://www.kognitos.com.

Please also read the Kognitos blogs from our CEO and other top-notch industry leaders: http://www.kognitos.com/blog.

Wednesday, March 27, 2024

Embeddings and Vector Databases - creating a long-term memory




What Are Vector Databases? - The intelligent memory of GenAI


While traditional databases store data in rows and columns, a vector database stores data as mathematical vectors. Each piece of data is represented as a point in high-dimensional space, with hundreds or thousands of dimensions. This allows very sophisticated relationships between data points to be captured.


Searching and analyzing vector databases relies on vector mathematics and similarity calculations. By comparing vector positions, highly relevant results can be returned, even if there are no exact keyword matches.
Vector databases index and store vector embeddings for fast retrieval and similarity search at interactive speeds, along with capabilities like CRUD (create, read, update, and delete) operations, horizontal scaling, and serverless deployment.

Why Are Vector Databases Important for AI?


Vector databases are ideal for managing and extracting insights from the enormous datasets required to train modern AI models.

In the midst of the GenAI revolution, efficient data processing is crucial not only for GenAI but also for semantic search. Both rely on vector embeddings, a data representation that carries the semantic information AI needs to build understanding and maintain a long-term memory it can draw on when executing complex tasks.

Embeddings/Tokens

LLMs generate embeddings with many attributes or features, each dimension capturing something essential about patterns and relationships in the data, which makes the representation challenging to manage.

That is why we need a specialized database to handle this data type. Vector databases like Pinecone meet this need by offering optimized storage and querying capabilities for embeddings. They combine the capabilities of a traditional database (which standalone vector indexes lack) with a specialization in vector embeddings (which traditional scalar-based databases lack).

Embeddings (arrays of numbers) represent data: words and images are transformed into numerical vectors that capture their essence. For example, the words "puppy" and "dog" will have similar embeddings, with vectors close to each other. These embeddings are stored in the vector DB.
Puppy = [0.3, 0.5, 0.9, 0.8, 0.4, ...]
Dog = [0.1, 0.51, 0.6, 0.2, 0.8, ...]
The actual numbers depend on the ML algorithm and model.

If you can convert a text, sentence, or image into a vector, you can compare vectors to detect and rank matches using measures such as cosine similarity for semantic similarity.
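As a quick illustration, here is a minimal Python sketch of cosine similarity applied to the toy "puppy"/"dog" vectors above. The numbers are made up, as in the text; real embeddings have hundreds or thousands of dimensions.

import math

def cosine_similarity(a, b):
    """Cosine of the angle between two embedding vectors; values near 1.0 mean similar meaning."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

puppy = [0.3, 0.5, 0.9, 0.8, 0.4]   # toy numbers, as in the text
dog = [0.1, 0.51, 0.6, 0.2, 0.8]
print(cosine_similarity(puppy, dog))  # prints roughly 0.8, i.e. fairly similar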

OpenAI’s text embeddings measure the relatedness of text strings. Embeddings are commonly used for:

  • Search (where results are ranked by relevance to a query string)
  • Clustering (where text strings are grouped by similarity)
  • Recommendations (where items with related text strings are recommended)
  • Anomaly detection (where outliers with little relatedness are identified)
  • Diversity measurement (where similarity distributions are analyzed)
  • Classification (where text strings are classified by their most similar label)

Embedding models: GloVe, OpenAI, Word2Vec
Vector DBs: Pinecone, Milvus, pgvector, Weaviate

Here is how to create an embedding for the text "food" via an OpenAI model:

curl https://api.openai.com/v1/embeddings \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "input": "food",
    "model": "text-embedding-ada-002",
    "encoding_format": "float"
  }'
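The same request can be made from Python. This is a small sketch using the OpenAI Python SDK (v1+), assuming OPENAI_API_KEY is set in the environment:

from openai import OpenAI

client = OpenAI()  # picks up OPENAI_API_KEY from the environment
resp = client.embeddings.create(model="text-embedding-ada-002", input="food")
embedding = resp.data[0].embedding  # list of floats (1536 dimensions for ada-002)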


More details (credits):

1. https://platform.openai.com/docs/api-reference/embeddings

2. A good video course that explains the theory as well as setting up a vector DB

3. https://www.youtube.com/watch?v=ySus5ZS0b94

Right-size the vector DB:

Setting up vector stores introduces new challenges. For example, correctly partitioning a large dataset that cannot fit entirely in RAM in vector stores like Milvus is not easy.
- Under-partitioning can result in some queries taking up too much RAM and bringing the service down.
- RAG responsiveness depends significantly on reducing the number of probes required to find relevant documents, so avoid over-partitioning as well.

The Road Ahead

As GenAI moves into mainstream applications, vector databases' role will only grow. Their ability to organize and structure knowledge in a format tailored for AI aligns with the needs of next-gen generative models. 


Combining vector databases and transformers allows GenAI to understand language meaning rather than just keywords. This next-generation AI capability, powered by vector math, is what delivers natural, intelligent conversations.







Friday, March 15, 2024

Data for AI - Storage, ETL, Prepare, Clean and update the data


Taking your good data to AI



The most commonly used phrases

  • Garbage in, Garbage Out 
  • Bad input produces bad output 
  • Output can be only as good as input. 

Soon: Ethically Sourced, Organically Raised, Grass Fed Data at a Higher Price.

If we properly source and manage the data, LLMs will be trained on correct data, causing fewer hallucinations. Unlearning specific segments of an LLM will be one of the significant facets of GenAI in the future.

Teaching kids the wrong things is worse than not teaching them at all.

https://www.theverge.com/2016/3/24/11297050/tay-microsoft-chatbot-racist


Why do we need to be careful about source data?

1. Incorrect Information: This could lead to the AI providing answers that are disruptive. We need to be especially careful when it prescribes steps for a problem, since a wrong answer could lead to severe complications.
2. PII and Secure Data: Inadvertently sharing one client's secure, private data with another client. Data classification and desensitization using GenAI to preprocess data before it is used by AI is becoming a significant business proposition, and there are quite a few startups in this space.
3. Feeding data driven by an agenda: IMHO, we all know about the Gemini fiasco, which produced results that were not truthful because the truth hurts or is not politically correct.
4. Proprietary/Copyrighted Data: How do we monetize and attribute proprietary research data to the correct author and content creator, to prevent plagiarism and reward the inventor? This would be another area for new startups.
5. Using publicly available data has its downsides as well.
"Generative AI copyright battles have been brewing for over a year, and many stakeholders, from authors, photographers and artists to lawyers, politicians, regulators and enterprise companies, want to know what data trained Sora and other models — and examine whether they really were publicly available, properly licensed, etc."

The legal side is a big part of this, but let us review the technical side.

Here are some thoughts on data - types of data, storing, accessing, cleaning, preparing and updating the data

1) Structured Data: Structured data fits neatly into data tables and includes discrete data types such as numbers, short text, and dates.
2) Unstructured Data: Unstructured data, such as audio and video files and large text documents, doesn't fit neatly into a data table because of its size or nature.
3) How to store the data: fast-storage vendors like VAST and Pure are seeing their stocks rise as demand for low-latency storage increases.
4) Sourcing the data without latency: primary data accessed by business applications can't be used for observability via AI insights/analytics, because doing so would impact the performance of the production applications. Backup data can't be used for analytics either, as it is generally a few days old and the answers would be stale. Databricks and Snowflake are pioneers in warehouse, data lake, and lakehouse technologies, with ETL pipelines using Apache Spark to manage both structured and unstructured data and the ability to run CPU-intensive queries on that data. This helps replicate the data almost immediately for LLM training and analytics purposes.
5) Preparing the data for AI - 
     a) Improve the data quality, 
     b) integrate multiple data sources - Data integration can help you access and analyze more data, enrich your data with additional attributes, and reduce data silos and inconsistencies. ETL with data sync can help. Databricks is helpful for this.
     c) Data labelling: To label your data, you can use tools and techniques such as data annotation, classification, segmentation, and verification.
     d) Data augmentation can help with data scarcity, reduce bias, and improve data generalization and robustness.
     e) Data Governance: Data governance involves defining and implementing policies, processes, roles, and metrics to manage your data throughout its lifecycle. It can help you ensure that your data quality, integration, labelling, augmentation, and privacy are aligned with your AI objectives, standards, and best practices. You can use frameworks and platforms such as data strategy, stewardship, catalogue, and lineage to establish your data governance. 

6) Desensitizing the data for AI: To protect data privacy, you can use tools and techniques such as data encryption, anonymization, consent management, and auditing (see the sketch after this list).
7) Data management with proper Authentication/Authorization (IAM): Store and isolate the data per user. Support multitenancy and reduce cross-pollination of data without driving up cost; having one LLM per client would be an expensive proposition.
Secure-minded design to protect the data: a tier structure of LLMs (general, domain-specific, and private) to protect the data, or RAG/grounding with hashed metadata embeddings in a vector DB.
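As a concrete illustration of point 6, here is a minimal Python sketch that pseudonymizes obvious PII (emails and phone numbers) before text is handed to an LLM. The regexes and salting scheme are simplistic assumptions for illustration, not a production-grade approach.

import hashlib
import re

# Naive patterns for illustration; real PII detection needs much broader coverage.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{7,}\d")

def pseudonymize(text, salt="demo-salt"):
    """Replace emails and phone numbers with stable hashed tokens before LLM processing."""
    def _token(match):
        digest = hashlib.sha256((salt + match.group(0)).encode()).hexdigest()[:10]
        return f"<PII-{digest}>"
    # Mask phone numbers first, then emails, so neither pattern disturbs the other's tokens.
    return EMAIL_RE.sub(_token, PHONE_RE.sub(_token, text))

print(pseudonymize("Reach Jane at jane@example.com or +1 408 555 0100."))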



Wednesday, March 13, 2024

Product improvements using GenAI - Serviceability and Usability

Serviceability is the last thing on the minds of most product managers and developers. 90% of product users have little time to play around with various configurations to make the product work, as it is only one of many products they manage. Nobody has time to read the docs.

Product managers like to say their product is as intuitive as Apple's, but they still need to provide guardrails and alerts in a way that makes the product self-serviceable or self-healing.

In this blog, we will review how GenAI can improve a product's usability and serviceability, and how product managers can make the product suitable for LLMs to learn quickly.

- Make the product and its documents GenAI-ready

- Have the product use GenAI principles to self-heal and become more usable/serviceable, with LLMs for proactive monitoring and self-healing


GenAI Ready Product:

  1.  Logs:

Logs generated by the product should have a clear structure, making it easy for LLMs to train on these logs, with any PII data easily identifiable (a structured-logging sketch follows the pattern below).

Error: Timestamp: Message in clear English

Info: Timestamp: Message in clear English

All the processes in your product should follow a similar pattern.
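For instance, one way to get that "clear structure" is one JSON object per log line. This is a hedged sketch using Python's standard logging module; the field names and logger name are illustrative, not a prescribed schema.

import json
import logging

class JsonFormatter(logging.Formatter):
    """Emit one JSON object per log line so downstream LLMs and parsers get a stable schema."""
    def format(self, record):
        return json.dumps({
            "level": record.levelname,
            "timestamp": self.formatTime(record),
            "process": record.name,
            "message": record.getMessage(),  # plain-English message, no embedded PII
        })

handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
logger = logging.getLogger("my-product.backup")  # illustrative process name
logger.addHandler(handler)
logger.setLevel(logging.INFO)

logger.error("Snapshot job failed: target volume not reachable")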

  2.  GuardRails LLM

Guide the customer to an optimal solution rather than allowing them to shoot themselves in the foot. These guardrails can be backed by LLMs (product-specific LLMs running within the product, with a smaller footprint, acting as a well-trained product user).

For example, do not allow customers to install new software if the system is already short on storage or memory.

  3.  Customer pattern learner LLM

This LLM can sit in the product or run in the SaaS backend to understand the customer's usage and use cases and provide solutions. It can also alert the customer if an anomaly is spotted.

For example, customers using an older code version with a bug affecting their specific use case can be alerted to upgrade (a version recommender).

  4. Utilizing LLMs for insights/analytics and file-walker algorithms (backup vendors and others that browse files to identify patterns can use GenAI tech such as LLMs and vector DBs).

Convert existing ML-based analytics to LLM-based analytics.

  5. Prompt Engineering: Simplify the UI experience for the customer; the current UX/UI can remain for advanced users.

Example: The prompt could be "Identify current bottlenecks and suggest a solution." The response might report that a CPU bottleneck was identified because there were too many zombie processes, and the system should also identify those processes and kill them.

Prompt example for backup software: "Show me the current job that protects VMware-SQL-Server-5" or "Protect SQL-Server6" (the system provides the steps or configures it automatically).

6. If you are shipping hardware with your product, LPU/GPU-ready hardware may be the future; alternatively, you can ship the call-home data to GPU clusters in Amazon to run insights and analytics.

7. Better Product APIs to interact with LLMs: LLMs should be able to connect to the data source, log in to the product, and automatically change the product's configuration as per the prompt. This will help with AI-powered automation. There is a new development in this area called AI-APIs. AI APIs take things a step further by using machine learning and natural language processing to understand requests, generate relevant responses, and complete tasks.
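As a rough illustration of that idea (not an existing "AI API" standard), here is a hedged Python sketch in which the LLM returns a structured action and a thin dispatcher maps it onto the product's API. The ProductAPI class, the action names, and the allow-list are hypothetical.

import json

# Hypothetical product client; in reality this would be your product's REST/SDK layer.
class ProductAPI:
    def protect_vm(self, vm_name, policy):
        print(f"Configuring protection for {vm_name} with policy {policy}")

ALLOWED_ACTIONS = {"protect_vm"}  # guardrail: only expose a vetted set of operations

def dispatch(llm_response_json, api):
    """Map an LLM's structured action request onto a real product API call."""
    action = json.loads(llm_response_json)
    if action["name"] not in ALLOWED_ACTIONS:
        raise ValueError(f"Action {action['name']} is not permitted")
    return getattr(api, action["name"])(**action["arguments"])

# e.g. the LLM, prompted with "Protect SQL-Server6", might return:
dispatch('{"name": "protect_vm", "arguments": {"vm_name": "SQL-Server6", "policy": "gold"}}',
         ProductAPI())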

Documents suitable for GenAI:

If the product vendor generates the product documents, they should be structured so they can be parsed by AI. Reinforced/supervised training is useful for verifying that the AI can produce clear, concise, and correct answers for a specific vendor software version, without hallucinations or contradictions, and that it can translate correctly into multiple languages.

LLMs are good at reading and summarizing documents. A quick test run of various prompts against LLMs trained on the new docs for every version/white paper will improve confidence.


LLMs for Proactive monitoring and self-healing:

Model 1: The product sends the call-home data to the SaaS-based vendor monitoring system. If the logs/alerts meet the AI-readiness requirements above, this data is AI-ready and spends less time in the data cleaning/prep phase.

Model 2: An on-prem master collects data from multiple points (IoT devices, clusters, nodes, servers) and looks for anomalies.
  • Pros: Secure; quick identification; trained on local data (useful, for example, for cameras monitoring a break-in)
  • Cons: Requires a local admin and upgrades; limited processing power
LLMs can analyze this data to:
  • Identify an anomaly and its corresponding fingerprint, and provide a solution or apply it if one exists.
  • Walk through the logs/alerts to identify new issues and alert the respective teams (Engineering/Field Notice/CSMs), creating a draft field-notice document.
  • Identify the blast radius of a given fingerprint.
  • Act as live monitors of your product, fixing the issue or creating docs or scripts to resolve it.
  • Scan these logs much faster than current log parsers/file walkers.
Examples:
  An LLM monitors the logs for FATAL failures, reviews whether each is a known or unknown issue, and triggers an appropriate action (see the sketch below).
  An LLM monitoring video input starts tracking the person who broke the glass, based on the sound of shattering glass.
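Here is a minimal Python sketch of the first example, assuming a plain-text log stream and a hand-maintained fingerprint table. The patterns, KB references, and actions are illustrative assumptions; in practice an LLM would do the matching and drafting rather than fixed regexes.

import re

# Illustrative fingerprints: pattern -> known-issue remedy
KNOWN_ISSUES = {
    r"FATAL .*out of memory": "Apply KB-1234: raise the worker memory limit",
    r"FATAL .*license expired": "Apply KB-5678: renew the license key",
}

def triage(log_lines):
    """Flag FATAL lines and decide whether each matches a known issue or needs escalation."""
    actions = []
    for line in log_lines:
        if "FATAL" not in line:
            continue
        for pattern, remedy in KNOWN_ISSUES.items():
            if re.search(pattern, line):
                actions.append((line, remedy))
                break
        else:
            actions.append((line, "Unknown issue: escalate to engineering with full context"))
    return actions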





Monday, March 11, 2024

Coming attractions in this blog space

Here is what you can expect in this space: I plan to write at least one blog monthly, if not more frequently. I appreciate your support in providing feedback and sharing this blog.

Let us learn together and make the world better with GenAI.


Index:

  • Review of Startups in this space

