<?xml version="1.0" encoding="UTF-8"?>
<rss  xmlns:atom="http://www.w3.org/2005/Atom" 
      xmlns:media="http://search.yahoo.com/mrss/" 
      xmlns:content="http://purl.org/rss/1.0/modules/content/" 
      xmlns:dc="http://purl.org/dc/elements/1.1/" 
      version="2.0">
<channel>
<title>iSRL</title>
<link>https://isrl.in/pub/</link>
<atom:link href="https://isrl.in/pub/index.xml" rel="self" type="application/rss+xml"/>
<description>iSRL research publications — datasets, reports, and methods notes on Indian food ingredient identity, regulatory analysis, and the EMF Framework.</description>
<image>
<url>https://isrl.in/assets/og-image.png</url>
<title>iSRL</title>
<link>https://isrl.in/pub/</link>
<height>76</height>
<width>144</width>
</image>
<generator>quarto-1.9.38</generator>
<lastBuildDate>Thu, 16 Apr 2026 00:00:00 GMT</lastBuildDate>
<item>
  <title>Food Allergens in India: Evidence, Regulation, and the State of Current Knowledge</title>
  <dc:creator>Lalitha A R</dc:creator>
  <dc:creator>PA Mahesh</dc:creator>
  <link>https://isrl.in/pub/2026-04-r-allergen/</link>
  <description><![CDATA[ 




<section id="what-an-allergen-is" class="level2 page-columns page-full" data-number="1">
<h2 data-number="1" class="anchored" data-anchor-id="what-an-allergen-is"><span class="header-section-number">1</span> What an allergen is</h2>
<p>The immune system is a classification system. It encounters proteins, assesses them, and files them — safe or hostile. For most people, most of the time, the filing is accurate.</p>
<p>In some people, an ordinary food protein gets filed as hostile.<sup>1</sup> The protein itself is unchanged — digestible, stable, present in millions of meals daily. But the immune system has produced IgE antibodies against it, and every subsequent encounter triggers a response: urticaria, angioedema, anaphylaxis.<sup>2</sup> The classification is the allergy.</p>
<div class="no-row-height column-margin column-container"><div id="fn1"><p><sup>1</sup>&nbsp;The production of IgE antibodies to a food protein is called sensitisation. Sensitisation is not the same as allergy: many people carry these antibodies without ever having a reaction. What converts sensitisation into clinical allergy is not fully understood.</p></div><div id="fn2"><p><sup>2</sup>&nbsp;Urticaria is hives. Angioedema is swelling, typically of the lips, tongue, or throat. Anaphylaxis is a systemic reaction (blood pressure drops, airways narrow) that can be fatal within minutes without treatment.</p></div><div id="fn3"><p><sup>3</sup>&nbsp;Pepsin, the main digestive enzyme in the stomach, breaks most proteins into fragments too small for the immune system to recognise. Proteins that resist this, known as pepsin-stable proteins, arrive in the gut intact, where immune cells encounter them directly.</p></div><div id="fn4"><p><sup>4</sup>&nbsp;Cross-reactivity is why someone allergic to one tree nut may react to others, or why sensitisation to a grass pollen can produce symptoms on eating certain fruits. The immune system is recognising a shared structural pattern, not the specific food.</p></div></div><p>Proteins that survive gastric digestion intact reach the immune system in a form it can respond to.<sup>3</sup> Proteins that are heat-stable remain recognisable after cooking. Proteins that share structural features across species allow sensitisation to one food to produce reactivity to others never directly encountered, a phenomenon called cross-reactivity.<sup>4</sup></p>
<p>Which proteins a population’s immune systems tend to misfile varies by geography, diet, and a set of environmental factors that are still being characterised. In the United States, peanut allergy affects roughly 1–2% of the population. In a large systematic study of Indian children, peanut sensitisation was 6.3% by serum-specific IgE. Probable clinical peanut allergy in the same cohort was approximately 0.03%.<sup>5</sup></p>
<div class="no-row-height column-margin column-container"><div id="fn5"><p><sup>5</sup>&nbsp;Serum-specific IgE measures IgE antibodies to a particular food protein in a blood sample. A positive result means the immune system has been exposed to that protein and produced antibodies — it does not mean the person will react if they eat the food. “Probable food allergy” in the EuroPrevall study required both a positive IgE or skin test and reported symptoms within two hours of eating the food. Neither is the same as a confirmed challenge test.</p></div></div><p>That gap — between a population that carries the antibodies and a population that develops the disease — runs through the Indian food allergy literature consistently. The sections that follow document what produced it, what it means, and where the evidence currently stands.</p>
<hr>
</section>
<section id="how-food-allergy-is-measured" class="level2 page-columns page-full" data-number="2">
<h2 data-number="2" class="anchored" data-anchor-id="how-food-allergy-is-measured"><span class="header-section-number">2</span> How food allergy is measured</h2>
<section id="the-diagnostic-hierarchy" class="level3 page-columns page-full" data-number="2.1">
<h3 data-number="2.1" class="anchored" data-anchor-id="the-diagnostic-hierarchy"><span class="header-section-number">2.1</span> The diagnostic hierarchy</h3>
<p>Four methods appear in the Indian literature reviewed here, each measuring something different.</p>
<p><strong>Skin prick test (SPT)</strong> introduces a small amount of allergen extract into the skin surface; a raised wheal above a defined threshold indicates sensitisation.<sup>6</sup> SPT is fast and inexpensive. In the Indian context, the absence of standardised local allergen extracts limits its comparability across studies — most Indian studies use commercial extracts developed for other populations, or in-house preparations with variable protein content <span class="citation" data-cites="krishna2020">(Krishna et al. 2020)</span>.</p>
<div class="no-row-height column-margin column-container"><div id="fn6"><p><sup>6</sup>&nbsp;A wheal is a raised, itchy bump at the test site, like a small insect bite. The standard threshold is a wheal diameter 3 mm or more above the negative control. Bigger wheals suggest stronger sensitisation, but again, sensitisation is not allergy.</p></div></div><p><strong>Serum-specific IgE (sIgE)</strong> measures circulating IgE antibodies to a specific food protein via blood test. It has the same diagnostic limitation as SPT: a positive result indicates sensitisation, not confirmed clinical allergy.</p>
<p><strong>Oral food challenge (OFC)</strong> requires the patient to eat the food under clinical observation, with symptoms recorded. It is the closest available proxy for real-world exposure, but it is resource-intensive and not widely available outside specialist centres.<sup>7</sup></p>
<div class="no-row-height column-margin column-container"><div id="fn7"><p><sup>7</sup>&nbsp;An oral food challenge typically happens in a clinical setting over several hours: the patient eats increasing amounts of the food at intervals while a clinician monitors for reactions. Because reactions can be severe, emergency treatment needs to be immediately available. This is why the absence of adrenaline auto-injectors in India until recently made challenges structurally difficult to conduct safely.</p></div></div><p><strong>Double-blind placebo-controlled food challenge (DBPCFC)</strong> is the diagnostic gold standard: both patient and clinician are blinded to whether the test substance or a placebo is being administered. It gives the highest diagnostic confidence and is used almost exclusively in specialist centres and research protocols.</p>
</section>
<section id="why-these-tests-give-different-numbers" class="level3 page-columns page-full" data-number="2.2">
<h3 data-number="2.2" class="anchored" data-anchor-id="why-these-tests-give-different-numbers"><span class="header-section-number">2.2</span> Why these tests give different numbers</h3>
<p>A single population, tested by different methods, will produce different numbers. In the EuroPrevall-INCO study — 5,677 children aged 7–10 years across schools in Mysore and Bengaluru — sIgE detected sensitisation in 19.1% of children while SPT detected sensitisation in 4.48% of the same children <span class="citation" data-cites="mahesh2023b">(<span class="nocase">Mahesh et al.</span> 2023)</span>.<sup>8</sup></p>
<div class="no-row-height column-margin column-container"><div id="fn8"><p><sup>8</sup>&nbsp;The EuroPrevall-INCO study is the largest systematic food allergy dataset from India. It tested children from schools, not clinics — which means it captures a cross-section of the child population rather than children who had already presented with suspected allergies. This makes its prevalence figures more representative of the general population than most other Indian studies.</p></div><div id="fn9"><p><sup>9</sup>&nbsp;This matters because the 0.14% figure is the one most often cited as India’s food allergy prevalence. It is the best available estimate from the best available study — but it is not a confirmed allergy rate. A confirmed rate from DBPCFC data would likely be lower. How much lower is not known, because the challenge data does not exist at population scale.</p></div></div><p>Neither figure represents confirmed food allergy. The “probable food allergy” figure from the same study — 0.14% for Indian children — used a specific operational definition: reported symptoms within two hours of eating a food, combined with a positive sIgE or SPT to that food <span class="citation" data-cites="mahesh2023b leung2024">(<span class="nocase">Mahesh et al.</span> 2023; <span class="nocase">Leung et al.</span> 2024)</span>. This is not OFC-confirmed. It is a structured symptom-report combined with immunological evidence, which consistently overestimates confirmed allergy relative to DBPCFC.<sup>9</sup></p>
<p>DBPCFC-confirmed allergy exists for only three foods in the India-specific data available: rice (6 of 16 patients tested confirmed, Delhi tertiary referral centre), black gram (4 of 14 confirmed, same centre), and chickpea (31 of 41 SPT-positive patients confirmed on challenge, Bombay allergy clinic) <span class="citation" data-cites="mahesh2023b krishna2020">(<span class="nocase">Mahesh et al.</span> 2023; Krishna et al. 2020)</span>.<sup>10</sup></p>
<div class="no-row-height column-margin column-container"><div id="fn10"><p><sup>10</sup>&nbsp;These are small numbers from single centres. A confirmed rate of 6 out of 16 for rice means six people in one Delhi clinic reacted to rice under controlled conditions — it does not mean 37.5% of Indians with rice sensitisation have clinical rice allergy. The value of this data is that it exists at all, not that it is generalisable.</p></div></div></section>
<section id="what-this-means-for-reading-the-numbers-in-this-review" class="level3" data-number="2.3">
<h3 data-number="2.3" class="anchored" data-anchor-id="what-this-means-for-reading-the-numbers-in-this-review"><span class="header-section-number">2.3</span> What this means for reading the numbers in this review</h3>
<p>The figures that appear most frequently in Indian food allergy literature are sensitisation rates and probable food allergy estimates, not confirmed allergy rates. When numbers are cited in §3, the method used to generate them is stated each time. Sensitisation rates are not treated as equivalent to clinical allergy rates; the 136-fold gap in the EuroPrevall India data is the clearest signal that this distinction matters.</p>
<hr>
</section>
</section>
<section id="food-allergens-in-india-what-the-literature-documents" class="level2 page-columns page-full" data-number="3">
<h2 data-number="3" class="anchored" data-anchor-id="food-allergens-in-india-what-the-literature-documents"><span class="header-section-number">3</span> Food allergens in India: what the literature documents</h2>
<section id="the-europrevall-inco-study" class="level3 page-columns page-full" data-number="3.1">
<h3 data-number="3.1" class="anchored" data-anchor-id="the-europrevall-inco-study"><span class="header-section-number">3.1</span> The EuroPrevall-INCO study</h3>
<p>The EuroPrevall-INCO study enrolled 5,677 children aged 7–10 years across schools in Mysore and Bengaluru and tested each child against a 25-food panel <span class="citation" data-cites="mahesh2023b">(<span class="nocase">Mahesh et al.</span> 2023)</span>.</p>
<p>Sensitisation rates by sIgE in children: shrimp 10.5%, sesame 8.0%, wheat 6.7%, peanut 6.3%. SPT sensitisation was lower overall — 4.48% aggregate versus 19.1% by sIgE — with jackfruit (2.46%) and cow’s milk (1.35%) leading by SPT <span class="citation" data-cites="mahesh2023b">(<span class="nocase">Mahesh et al.</span> 2023)</span>.<sup>11</sup></p>
<div class="no-row-height column-margin column-container"><div id="fn11"><p><sup>11</sup>&nbsp;The difference between sIgE and SPT results for the same foods in the same children reflects the different things each test measures, and the different thresholds each uses to call a result positive. Neither is wrong — they are measuring different aspects of the same immune response.</p></div></div><p>Probable food allergy in children was 0.14% overall. The leading foods in the probable food allergy subset were cow’s milk (0.5% of that subset) and apple (0.5%), with egg at 0.05% and eggplant at 0.04% <span class="citation" data-cites="mahesh2023b">(<span class="nocase">Mahesh et al.</span> 2023)</span>.</p>
<p>For adults across two Karnataka cities the picture shifts: 26.5% sensitisation and 1.2% probable food allergy, with legumes, prawn, eggplant, milk, and egg as the leading allergens <span class="citation" data-cites="mahesh2023b">(<span class="nocase">Mahesh et al.</span> 2023)</span>.<sup>12</sup></p>
<div class="no-row-height column-margin column-container"><div id="fn12"><p><sup>12</sup>&nbsp;The adult figures being higher than the child figures is consistent with cumulative exposure over time — more years of eating means more opportunity for sensitisation to develop. Whether this reflects a genuine increase in allergy with age or a cohort effect — older adults having grown up under different dietary and environmental conditions — is not established in the available data.</p></div></div><p>Both study sites are urban Karnataka. The EuroPrevall-INCO data does not cover North India, Northeast India, rural populations, or coastal communities <span class="citation" data-cites="leung2024">(<span class="nocase">Leung et al.</span> 2024)</span>.</p>
</section>
<section id="clinic-and-community-studies" class="level3 page-columns page-full" data-number="3.2">
<h3 data-number="3.2" class="anchored" data-anchor-id="clinic-and-community-studies"><span class="header-section-number">3.2</span> Clinic and community studies</h3>
<p>Beyond EuroPrevall, <span class="citation" data-cites="krishna2020">Krishna et al. (2020)</span> compile 13 individual Indian allergy studies conducted between 2001 and 2019, covering Bombay, Delhi, Mysore, Bengaluru, Lucknow, and Kolkata. These are largely clinic-based cohorts: patients presenting to allergy clinics rather than general population samples.<sup>13</sup> Sensitisation rates from these studies are higher than in population-based studies and are not representative of background prevalence in the general population.</p>
<div class="no-row-height column-margin column-container"><div id="fn13"><p><sup>13</sup>&nbsp;Think of it this way: if you want to know how common headaches are in a city, surveying people in a neurology clinic will give you a much higher number than surveying people on the street. Both numbers are real — they are just answering different questions. Clinic-based studies tell you what allergens appear in people who have already sought care for a suspected reaction, not how common those allergens are in the population at large.</p></div></div><p>The table below summarises key findings from those studies alongside FSSAI mandatory status for each allergen.</p>
<div id="tbl-clinic-studies" class="quarto-float quarto-figure quarto-figure-center anchored">
<figure class="quarto-float quarto-float-tbl figure">
<figcaption class="quarto-float-caption-top quarto-float-caption quarto-float-tbl" id="tbl-clinic-studies-caption-0ceaefa1-69ba-4598-a22c-09a6ac19f8ca">
Table&nbsp;1: Key allergens from Indian clinic and community studies (as reported in <span class="citation" data-cites="krishna2020">Krishna et al. 2020</span>)
</figcaption>
<div aria-describedby="tbl-clinic-studies-caption-0ceaefa1-69ba-4598-a22c-09a6ac19f8ca">
<table class="caption-top table">
<colgroup>
<col style="width: 20%">
<col style="width: 20%">
<col style="width: 20%">
<col style="width: 20%">
<col style="width: 20%">
</colgroup>
<thead>
<tr class="header">
<th>Allergen</th>
<th>Sensitisation range</th>
<th>Method</th>
<th>DBPCFC/OFC data</th>
<th>In FSSAI 2020?</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td>Black gram (<em>Vigna mungo</em>)</td>
<td>5.9–10.1%</td>
<td>SPT, sIgE</td>
<td>4/14 DBPCFC confirmed</td>
<td>No</td>
</tr>
<tr class="even">
<td>Rice (<em>Oryza sativa</em>)</td>
<td>6.2–12.1%</td>
<td>SPT, sIgE</td>
<td>6/16 DBPCFC confirmed</td>
<td>No</td>
</tr>
<tr class="odd">
<td>Lentil (<em>Lens culinaris</em>)</td>
<td>5.5–9.7%</td>
<td>SPT (N=216–1,860)</td>
<td>None available</td>
<td>No</td>
</tr>
<tr class="even">
<td>Prawn</td>
<td>10.3–53.5%</td>
<td>SPT, sIgE</td>
<td>—</td>
<td>Yes (crustaceans)</td>
</tr>
<tr class="odd">
<td>Eggplant (<em>Solanum melongena</em>)</td>
<td>4.3–9.2% SPT; 0.8% sIgE community</td>
<td>SPT; sIgE</td>
<td>None available</td>
<td>No</td>
</tr>
<tr class="even">
<td>Egg</td>
<td>6.9–34.9%</td>
<td>SPT, sIgE</td>
<td>—</td>
<td>Yes</td>
</tr>
<tr class="odd">
<td>Banana</td>
<td>3.6–40.6%</td>
<td>SPT</td>
<td>None available</td>
<td>No</td>
</tr>
<tr class="even">
<td>Wheat</td>
<td>6.7–11.93%</td>
<td>SPT, sIgE</td>
<td>—</td>
<td>Yes (gluten cereals)</td>
</tr>
<tr class="odd">
<td>Chickpea (<em>Cicer arietinum</em>)</td>
<td>SPT positive 41/1,400</td>
<td>SPT (N=1,400 clinic)</td>
<td>31/41 DBPCFC confirmed</td>
<td>No</td>
</tr>
<tr class="even">
<td>Red gram / pigeon pea (<em>Cajanus cajan</em>)</td>
<td>12.6%</td>
<td>sIgE (Karnataka N=2,219)</td>
<td>None available</td>
<td>No</td>
</tr>
<tr class="odd">
<td>Green gram (<em>Vigna radiata</em>)</td>
<td>12.5%</td>
<td>sIgE (Karnataka N=2,219)</td>
<td>None available</td>
<td>No</td>
</tr>
</tbody>
</table>
</div>
</figure>
</div>
<p>Note on eggplant: stored eggplant accumulates histamine at levels that can produce false-positive SPT results. Sensitisation figures for eggplant based on SPT should be interpreted with this confound in mind <span class="citation" data-cites="bhattacharya2018">(Bhattacharya et al. 2018)</span>.<sup>14</sup></p>
<div class="no-row-height column-margin column-container"><div id="fn14"><p><sup>14</sup>&nbsp;Histamine is the same compound the body releases during an allergic reaction, which is why antihistamines treat allergy symptoms. When stored eggplant already contains elevated histamine, introducing it into a skin prick test can trigger a wheal response that looks like sensitisation but is actually a direct chemical reaction to the histamine, not an IgE-mediated immune response. The SPT result is positive; the underlying mechanism is different.</p></div><div id="fn15"><p><sup>15</sup>&nbsp;Urban children consistently showing higher sensitisation than rural children from the same region (same genetic background, different environment) is one of the signals researchers use to argue that environment, not genetics, drives much of the variation in food allergy rates. What specifically differs between urban and rural environments in ways that affect allergy development is an open question.</p></div></div><p>An urban–rural gradient is visible in the available data. In Karnataka schools, sensitisation to prawn was 17.7% urban versus 5.7% rural; peanut 19.6% versus 10.4%; fish 17.7% versus 5.7% (Gobinaath et al.&nbsp;2018, as reported in <span class="citation" data-cites="krishna2020">Krishna et al. 2020</span>).<sup>15</sup></p>
</section>
<section id="molecular-characterisation-of-india-specific-allergens" class="level3 page-columns page-full" data-number="3.3">
<h3 data-number="3.3" class="anchored" data-anchor-id="molecular-characterisation-of-india-specific-allergens"><span class="header-section-number">3.3</span> Molecular characterisation of India-specific allergens</h3>
<p><span class="citation" data-cites="bhattacharya2018">Bhattacharya et al. (2018)</span> provide molecular-level data on allergens characterised specifically from Indian clinical populations. The primary food allergen categories identified in Indian patients are legumes, prawn, eggplant, milk, and egg.</p>
<p>India’s only IUIS-registered food allergen is Pen i 1, the tropomyosin of <em>Penaeus indicus</em> (Indian white prawn) <span class="citation" data-cites="bhattacharya2018">(Bhattacharya et al. 2018)</span>.<sup>16</sup></p>
<div class="no-row-height column-margin column-container"><div id="fn16"><p><sup>16</sup>&nbsp;The IUIS (International Union of Immunological Societies) maintains the official registry of characterised allergens: proteins that have been isolated, sequenced, and confirmed as allergenic through clinical data. Registration means the protein has been formally identified and named as an allergen. That India has only one registered food allergen is a measure of how much molecular characterisation work remains, not of how few allergenic foods exist.</p></div><div id="fn17"><p><sup>17</sup>&nbsp;A protein that resists pepsin digestion for 15 minutes arrives in the gut largely intact, meaning the immune system encounters the full protein rather than fragments. Fragments are generally less likely to trigger a response because the immune system recognises the whole structure, not the parts. Pepsin stability is one of the properties the FAO/WHO uses to assess whether a novel protein is likely to be allergenic.</p></div></div><p><strong>Black gram (<em>Vigna mungo</em>)</strong>: A 28-kDa glycoprotein (Vig m) was isolated and shown to resist pepsin digestion for at least 15 minutes, a property associated with higher clinical relevance for IgE-mediated reactions. Sequence homology to a rho-specific inhibitor in peanut was identified, providing a structural basis for observed cross-reactivity <span class="citation" data-cites="bhattacharya2018">(Bhattacharya et al. 2018)</span>.<sup>17</sup></p>
<p><strong>Chickpea (<em>Cicer arietinum</em>)</strong>: A 26-kDa albumin fraction was characterised and found to cross-react with peanut IgE, which is relevant because peanut is the best-documented IgE-mediated food allergen globally <span class="citation" data-cites="bhattacharya2018">(Bhattacharya et al. 2018)</span>.</p>
<p><strong>Kidney bean (<em>Phaseolus vulgaris</em>)</strong>: A 31-kDa phytohemagglutinin was found stable to pepsin digestion and reported to sensitise approximately 22% of Delhi food-allergic patients tested <span class="citation" data-cites="bhattacharya2018">(Bhattacharya et al. 2018)</span>.</p>
<p><strong>Rice (<em>Oryza sativa</em>)</strong>: A 24-kDa chitinase was identified as the major allergen; approximately 12% of food-allergic patients in the study were SPT positive <span class="citation" data-cites="bhattacharya2018">(Bhattacharya et al. 2018)</span>.</p>
<p><strong>Eggplant (<em>Solanum melongena</em>)</strong>: A lipid transfer protein (LTP) was characterised in the peel and seeds. LTPs are heat-stable and digestion-resistant, giving them higher clinical relevance than heat-labile proteins. The histamine confound in SPT testing for eggplant does not affect the molecular characterisation, but does affect interpretation of sensitisation rates <span class="citation" data-cites="bhattacharya2018">(Bhattacharya et al. 2018)</span>.<sup>18</sup></p>
<div class="no-row-height column-margin column-container"><div id="fn18"><p><sup>18</sup>&nbsp;Lipid transfer proteins are small plant proteins whose biological role is moving lipids (fats) across cell membranes. They are found across many plant foods and are one of the main drivers of cross-reactivity between plant allergens. Because they are heat-stable, they remain allergenic in cooked food, which makes them clinically more significant than proteins that denature under heat.</p></div><div id="fn19"><p><sup>19</sup>&nbsp;This distinction matters for processed food labelling specifically. A product containing cooked mackerel retains its allergenic proteins intact. A product containing cooked hilsha may carry reduced, though not necessarily zero, allergenic risk. The current FSSAI declaration requirement covers fish as a category and does not distinguish by heat stability.</p></div></div><p><strong>Fish</strong>: Heat-stable allergens were characterised in bhetki (<em>Lates calcarifer</em>) and mackerel (<em>Rastrelliger kanagurta</em>); heat-labile allergens in hilsha (<em>Tenualosa ilisha</em>) and pomfret (<em>Pampus argenteus</em>). Cooking does not eliminate allergenic risk from bhetki or mackerel <span class="citation" data-cites="bhattacharya2018">(Bhattacharya et al. 2018)</span>.<sup>19</sup></p>
<p><strong>Legumes as a class</strong>: allergen proteins from legumes retain IgE reactivity after gastric digestion <span class="citation" data-cites="bhattacharya2018">(Bhattacharya et al. 2018)</span>, which means the pepsin-stability argument applies across the entire legume complex, not only to individual characterised proteins.</p>
<p><span class="citation" data-cites="milana2025"><span class="nocase">Milana et al.</span> (2025)</span> provide additional cross-reactivity data for the Indian legume complex. Mung bean LTPs share greater than 60% sequence homology with LTPs from lentil, bean, peanut, strawberry, and apple. Black gram (Vig m) cross-reacts with faba bean, lentil, lima bean, and pea. Black gram is also linked to Pollen Food Allergy Syndrome with <em>Prosopis juliflora</em>, a tree species prevalent across urban India <span class="citation" data-cites="milana2025">(<span class="nocase">Milana et al.</span> 2025)</span>.<sup>20</sup></p>
<div class="no-row-height column-margin column-container"><div id="fn20"><p><sup>20</sup>&nbsp;Pollen Food Allergy Syndrome (PFAS) is a cross-reactive condition where sensitisation to a pollen triggers oral symptoms — tingling, mild swelling — on eating certain raw foods. The immune system is recognising a structural similarity between the pollen protein and a food protein. In India, <em>Prosopis juliflora</em> is a widespread urban tree; people sensitised to its pollen may develop oral symptoms to black gram through this pathway rather than through direct sensitisation to the legume.</p></div></div><p><strong>Red gram / pigeon pea (<em>Cajanus cajan</em>)</strong>: Novel allergens including β-conglycinin and vicilin homologues have been identified via Indian patient sera <span class="citation" data-cites="bhattacharya2018">(Bhattacharya et al. 2018)</span>. Sensitisation data from a Karnataka population study (N=2,219) reported 12.6% sIgE positive <span class="citation" data-cites="krishna2020">(Krishna et al. 2020)</span>.</p>
</section>
<section id="the-sensitisation-reactivity-gap" class="level3 page-columns page-full" data-number="3.4">
<h3 data-number="3.4" class="anchored" data-anchor-id="the-sensitisation-reactivity-gap"><span class="header-section-number">3.4</span> The sensitisation-reactivity gap</h3>
<p>In EuroPrevall India, 19.1% of children tested positive for at least one food by sIgE; 0.14% had probable food allergy, a 136-fold gap <span class="citation" data-cites="mahesh2023b krishna2020">(<span class="nocase">Mahesh et al.</span> 2023; Krishna et al. 2020)</span>. For peanut specifically, sensitisation was 6.3% by sIgE; probable peanut allergy was approximately 0.03%, a gap of roughly 200-fold <span class="citation" data-cites="krishna2020">(Krishna et al. 2020)</span>. In Western populations, peanut allergy prevalence is typically cited at 1–2%, which is within an order of magnitude of the 6.3% Indian sensitisation rate rather than two orders of magnitude below it.</p>
<p>Several protective factors have been proposed: longer breastfeeding, vaginal delivery, diverse legume exposure from early life, gut microbiome composition, and enteric helminthiasis <span class="citation" data-cites="mahesh2023b">(<span class="nocase">Mahesh et al.</span> 2023)</span>.<sup>21</sup> None has been confirmed as causal; they are epidemiological associations observed in parallel with the gap.</p>
<div class="no-row-height column-margin column-container"><div id="fn21"><p><sup>21</sup>&nbsp;Enteric helminthiasis means intestinal worm infections, which are more common in lower-income settings. This may seem counterintuitive as a protective factor, but the hypothesis is that parasitic infections shift the immune system toward a particular response profile (Th2-dominant) that may reduce clinical reactivity to food allergens. As India urbanises and sanitation improves, helminthiasis rates fall, and the gap may narrow as a result.</p></div></div><p>The urbanisation signal is indirect evidence for the protective factor hypothesis. Children born in Hong Kong to mainland Chinese parents are approximately four times more likely to develop food sensitisation than mainland-born children, despite a shared genetic background <span class="citation" data-cites="leung2024">(<span class="nocase">Leung et al.</span> 2024)</span>. In Indian data, urban children consistently show higher sensitisation to prawn, peanut, fish, and milk than rural children in the same regional studies <span class="citation" data-cites="krishna2020">(Krishna et al. 2020)</span>.</p>
<p>This matters for any classification that uses sensitisation data as a proxy for clinical relevance. For most foods in §3.2, sensitisation rates are the only data available. The gap documented here is the reason those rates cannot be read directly as clinical allergy burden.</p>
</section>
<section id="why-the-evidence-base-is-limited" class="level3 page-columns page-full" data-number="3.5">
<h3 data-number="3.5" class="anchored" data-anchor-id="why-the-evidence-base-is-limited"><span class="header-section-number">3.5</span> Why the evidence base is limited</h3>
<p>The constraints on Indian food allergy research are structural, not incidental. The researchers working in this field document them explicitly <span class="citation" data-cites="krishna2020 devdas2018 mahesh2023b">(Krishna et al. 2020; Devdas et al. 2018; <span class="nocase">Mahesh et al.</span> 2023)</span>.</p>
<p>Most sensitisation data comes from allergy clinic patients, not general population cohorts. Patients attending allergy clinics are a selected population — higher pre-test probability of sensitisation than the general population. Rates from these studies are expected to exceed true population prevalence.</p>
<p>Standardised allergen extracts for SPT are not available in India. “High quality allergen extracts for skin tests and adrenaline auto-injectors are currently not available in India” <span class="citation" data-cites="krishna2020">(Krishna et al. 2020)</span>. Results vary across laboratories and cannot be pooled directly.<sup>22</sup></p>
<div class="no-row-height column-margin column-container"><div id="fn22"><p><sup>22</sup>&nbsp;When two labs test for sensitisation to the same food using different extracts — different protein concentrations, different preparation methods — a positive result in one lab is not directly comparable to a positive result in the other. This is why sensitisation rates for the same food vary considerably across Indian studies, and why ranges rather than single figures are reported throughout this review.</p></div></div><p>Systematic data is largely from Karnataka and Delhi. Northeast India, rural India, coastal fishing communities, and tribal populations are essentially absent from the available literature.</p>
<p>DBPCFC-confirmed data exists for three foods only — rice, black gram, chickpea — each from single-centre clinic cohorts. “Very few studies in India have confirmed food allergy with a challenge procedure” <span class="citation" data-cites="mahesh2023b">(<span class="nocase">Mahesh et al.</span> 2023)</span>. As recently as 2018, adrenaline auto-injectors were not available in India <span class="citation" data-cites="devdas2018">(Devdas et al. 2018)</span>, which limited the ability to conduct challenges safely.<sup>23</sup></p>
<div class="no-row-height column-margin column-container"><div id="fn23"><p><sup>23</sup>&nbsp;A food challenge carries the risk of triggering the very reaction it is testing for. Conducting one safely requires having emergency treatment — adrenaline — immediately available. Without it, a challenge that triggers anaphylaxis cannot be managed. This is the direct structural link between the absence of adrenaline auto-injectors and the absence of challenge data in the Indian literature.</p></div></div><p>Evidence generated in high-income Western countries is not directly applicable to India. Diagnostic thresholds, allergen panels, and reference ranges require validation for Indian populations before they can be used <span class="citation" data-cites="krishna2020">(Krishna et al. 2020)</span>.</p>
<p>These are the conditions that shaped the evidence. They explain why the literature looks the way it does, and they are the frame within which every figure in §3.1–3.3 should be read.</p>
</section>
</section>
<section id="the-fssai-mandatory-list" class="level2 page-columns page-full" data-number="4">
<h2 data-number="4" class="anchored" data-anchor-id="the-fssai-mandatory-list"><span class="header-section-number">4</span> The FSSAI mandatory list</h2>
<section id="regulatory-text" class="level3 page-columns page-full" data-number="4.1">
<h3 data-number="4.1" class="anchored" data-anchor-id="regulatory-text"><span class="header-section-number">4.1</span> Regulatory text</h3>
<p>Regulation 5(14) of the Food Safety and Standards (Labelling and Display) Regulations, 2020 (Version III, operationalised 1 July 2022) requires that packaged food manufacturers declare the presence of the following allergen groups on the product label <span class="citation" data-cites="fssai2020">(Food Safety and Standards Authority of India 2022)</span>:</p>
<ol type="1">
<li>Cereals containing gluten (wheat, rye, barley, oats, spelt, and their hybridised strains)</li>
<li>Crustaceans</li>
<li>Milk</li>
<li>Eggs</li>
<li>Fish</li>
<li>Peanuts and tree nuts</li>
<li>Soybeans</li>
<li>Sulphites at concentrations of 10 mg/kg or more</li>
</ol>
<p>Exemptions include: oils derived from listed ingredients; distilled alcoholic beverages; raw agricultural commodities; and specific wheat-derived processing aids where gluten content is ≤20 mg/kg <span class="citation" data-cites="fssai2020">(Food Safety and Standards Authority of India 2022)</span>.<sup>24</sup></p>
<div class="no-row-height column-margin column-container"><div id="fn24"><p><sup>24</sup>&nbsp;These exemptions exist because highly refined oils derived from allergenic sources (peanut oil, soy oil) typically contain little to no residual protein, and protein is what the immune system reacts to. The exemption is not blanket; cold-pressed or unrefined oils may retain protein and are treated differently.</p></div></div><p>“May contain” declarations for cross-contamination risk are permitted but not required.</p>
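<p>As an illustration only, the declaration rule described in this section can be sketched in Python. The group identifiers, the <code>Ingredient</code> class, the <code>exempt</code> flag, and the function name are assumptions made for this sketch and do not come from the regulation. Whether an ingredient qualifies for an exemption is left to the caller, since the exemptions are not blanket.</p>

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical identifiers for the eight groups listed above.
FSSAI_GROUPS = (
    "gluten_cereals", "crustaceans", "milk", "eggs",
    "fish", "peanuts_tree_nuts", "soybeans", "sulphites",
)
SULPHITE_THRESHOLD_MG_PER_KG = 10.0  # declare at 10 mg/kg or more


@dataclass(frozen=True)
class Ingredient:
    name: str
    allergen_group: Optional[str] = None  # one of FSSAI_GROUPS, or None
    exempt: bool = False                  # e.g. a highly refined oil (caller decides)


def required_declarations(ingredients, sulphite_mg_per_kg=0.0):
    """Return the allergen groups a label would need to declare, in list order."""
    declared = set()
    for ingredient in ingredients:
        if ingredient.allergen_group is not None and not ingredient.exempt:
            declared.add(ingredient.allergen_group)
    # Sulphites are triggered by concentration, not by ingredient identity.
    if sulphite_mg_per_kg >= SULPHITE_THRESHOLD_MG_PER_KG:
        declared.add("sulphites")
    return [group for group in FSSAI_GROUPS if group in declared]


recipe = [
    Ingredient("wheat flour", "gluten_cereals"),
    Ingredient("refined soy oil", "soybeans", exempt=True),
    Ingredient("sugar"),
]
print(required_declarations(recipe, sulphite_mg_per_kg=12.0))
```

<p>The sketch treats sulphites separately from the other seven groups because the rule for sulphites depends on a measured concentration rather than on the presence of an ingredient.</p>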
</section>
<section id="international-basis" class="level3 page-columns page-full" data-number="4.2">
<h3 data-number="4.2" class="anchored" data-anchor-id="international-basis"><span class="header-section-number">4.2</span> International basis</h3>
<p>The FSSAI 2020 list maps directly to the Codex Alimentarius General Standard for the Labelling of Prepackaged Foods (CXS 1-1985) as it stood prior to the 2024 revision. India adopted the Codex list as the scientific baseline for its allergen labelling framework, consistent with the WTO Sanitary and Phytosanitary Agreement’s treatment of Codex standards as the international reference <span class="citation" data-cites="codex2024">(Codex Alimentarius Commission 2024)</span>.<sup>25</sup></p>
<div class="no-row-height column-margin column-container"><div id="fn25"><p><sup>25</sup>&nbsp;The WTO SPS Agreement encourages member countries to base food safety regulations on international standards — primarily Codex — rather than developing independent national standards for each regulated substance. Adopting the Codex allergen list is therefore not a shortcut; it is the standard approach for WTO member states. The question is what happens when the Codex list diverges from national-specific evidence, which is what §4.4 examines.</p></div></div><p>The Codex 2024 revision made two changes relevant here: sesame was added as a mandatory declaration allergen, and soy was reclassified from mandatory to recommended, reflecting lower confirmed soy allergy prevalence in large population studies relative to other listed allergens <span class="citation" data-cites="codex2024">(Codex Alimentarius Commission 2024)</span>. The 2024 revision also introduced a requirement for visual distinction of allergen declarations from surrounding label text. Whether FSSAI will align with these changes is not known at the time of writing.</p>
<p>The table below places the FSSAI list in international context.</p>
<div id="tbl-regulatory-comparison" class="quarto-float quarto-figure quarto-figure-center anchored">
<figure class="quarto-float quarto-float-tbl figure">
<figcaption class="quarto-float-caption-top quarto-float-caption quarto-float-tbl" id="tbl-regulatory-comparison-caption-0ceaefa1-69ba-4598-a22c-09a6ac19f8ca">
Table&nbsp;2: FSSAI allergen list in international context
</figcaption>
<div aria-describedby="tbl-regulatory-comparison-caption-0ceaefa1-69ba-4598-a22c-09a6ac19f8ca">
<table class="caption-top table">
<colgroup>
<col style="width: 16%">
<col style="width: 16%">
<col style="width: 16%">
<col style="width: 16%">
<col style="width: 16%">
<col style="width: 16%">
</colgroup>
<thead>
<tr class="header">
<th>Allergen</th>
<th>FSSAI 2020</th>
<th>Codex pre-2024</th>
<th>Codex 2024</th>
<th>EU (Big 14)</th>
<th>US (Big 9)</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td>Gluten-containing cereals</td>
<td>Mandatory</td>
<td>Mandatory</td>
<td>Mandatory</td>
<td>Mandatory</td>
<td>Mandatory (wheat only)</td>
</tr>
<tr class="even">
<td>Crustaceans</td>
<td>Mandatory</td>
<td>Mandatory</td>
<td>Mandatory</td>
<td>Mandatory</td>
<td>Mandatory</td>
</tr>
<tr class="odd">
<td>Milk</td>
<td>Mandatory</td>
<td>Mandatory</td>
<td>Mandatory</td>
<td>Mandatory</td>
<td>Mandatory</td>
</tr>
<tr class="even">
<td>Egg</td>
<td>Mandatory</td>
<td>Mandatory</td>
<td>Mandatory</td>
<td>Mandatory</td>
<td>Mandatory</td>
</tr>
<tr class="odd">
<td>Fish</td>
<td>Mandatory</td>
<td>Mandatory</td>
<td>Mandatory</td>
<td>Mandatory</td>
<td>Mandatory</td>
</tr>
<tr class="even">
<td>Peanuts</td>
<td>Mandatory (with tree nuts)</td>
<td>Mandatory</td>
<td>Mandatory</td>
<td>Mandatory</td>
<td>Mandatory</td>
</tr>
<tr class="odd">
<td>Tree nuts</td>
<td>Mandatory (with peanuts)</td>
<td>Mandatory</td>
<td>Mandatory</td>
<td>Mandatory</td>
<td>Mandatory</td>
</tr>
<tr class="even">
<td>Soybeans</td>
<td>Mandatory</td>
<td>Mandatory</td>
<td>Recommended</td>
<td>Mandatory</td>
<td>Mandatory</td>
</tr>
<tr class="odd">
<td>Sulphites ≥10 mg/kg</td>
<td>Mandatory</td>
<td>Mandatory</td>
<td>Mandatory</td>
<td>Mandatory</td>
<td>—</td>
</tr>
<tr class="even">
<td>Sesame</td>
<td>Not listed</td>
<td>Not listed</td>
<td>Mandatory</td>
<td>Mandatory</td>
<td>Mandatory (added 2023)</td>
</tr>
<tr class="odd">
<td>Lupin</td>
<td>Not listed</td>
<td>Not listed</td>
<td>—</td>
<td>Mandatory</td>
<td>—</td>
</tr>
<tr class="even">
<td>Molluscs</td>
<td>Not listed</td>
<td>Not listed</td>
<td>—</td>
<td>Mandatory</td>
<td>—</td>
</tr>
<tr class="odd">
<td>Celery</td>
<td>Not listed</td>
<td>Not listed</td>
<td>—</td>
<td>Mandatory</td>
<td>—</td>
</tr>
<tr class="even">
<td>Mustard</td>
<td>Not listed</td>
<td>Not listed</td>
<td>—</td>
<td>Mandatory</td>
<td>—</td>
</tr>
</tbody>
</table>
</div>
</figure>
</div>
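<p>The comparison in Table 2 can also be expressed as set differences, which makes the divergences explicit. The following is a minimal sketch: the sets are transcribed from the "Mandatory" cells of the table, and the string labels and the helper name <code>not_mandated_by</code> are shorthand invented for this example.</p>

```python
# Mandatory-declaration sets transcribed from Table 2.
# The string labels are this sketch's own shorthand, not regulatory terms.
CORE = {"gluten cereals", "crustaceans", "milk", "egg", "fish",
        "peanuts", "tree nuts", "sulphites"}

MANDATORY = {
    "FSSAI 2020": CORE | {"soybeans"},
    "Codex pre-2024": CORE | {"soybeans"},
    "Codex 2024": CORE | {"sesame"},  # soybeans listed as recommended
    "EU": CORE | {"soybeans", "sesame", "lupin", "molluscs", "celery", "mustard"},
    # US: no sulphite entry in the table; gluten cereals means wheat only
    "US": (CORE - {"sulphites"}) | {"soybeans", "sesame"},
}


def not_mandated_by(base, other):
    """Allergens mandatory under `other` but not under `base`, sorted."""
    return sorted(MANDATORY[other] - MANDATORY[base])


for other in ("Codex 2024", "EU", "US"):
    print(other, "adds:", not_mandated_by("FSSAI 2020", other))
```

<p>Reversing the arguments, as in <code>not_mandated_by("Codex 2024", "FSSAI 2020")</code>, shows what FSSAI mandates that the 2024 Codex revision no longer does.</p>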
</section>
<section id="where-the-regulation-and-the-literature-converge" class="level3" data-number="4.3">
<h3 data-number="4.3" class="anchored" data-anchor-id="where-the-regulation-and-the-literature-converge"><span class="header-section-number">4.3</span> Where the regulation and the literature converge</h3>
<p><strong>Crustaceans</strong> show the strongest alignment between regulation and Indian evidence. Prawn tropomyosin (Pen i 1, <em>Penaeus indicus</em>) is India’s only IUIS-registered food allergen <span class="citation" data-cites="bhattacharya2018">(Bhattacharya et al. 2018)</span>. Sensitisation data is available from at least four independent Indian studies, with rates ranging from 10.3% to 53.5% depending on study type and population <span class="citation" data-cites="krishna2020">(Krishna et al. 2020)</span>.</p>
<p><strong>Milk, egg, and fish</strong> all appear in FSSAI with supporting Indian sensitisation data. Milk sensitisation was 1.35–20.5% across reviewed studies; probable food allergy to milk in children was 0.5% of the EuroPrevall India probable food allergy subset <span class="citation" data-cites="mahesh2023b">(<span class="nocase">Mahesh et al.</span> 2023)</span>. Egg sensitisation was 6.9–34.9% in clinic-based studies; probable egg allergy was 0.05% in children <span class="citation" data-cites="mahesh2023b">(<span class="nocase">Mahesh et al.</span> 2023)</span>. Fish allergens characterised in India include heat-stable proteins in bhetki and mackerel <span class="citation" data-cites="bhattacharya2018">(Bhattacharya et al. 2018)</span>.</p>
<p><strong>Peanuts</strong>: sIgE sensitisation 6.3–19.6% in Indian data <span class="citation" data-cites="mahesh2023b krishna2020">(<span class="nocase">Mahesh et al.</span> 2023; Krishna et al. 2020)</span>; probable food allergy approximately 0.03% — the sensitisation-reactivity gap at its most pronounced.</p>
<p><strong>Wheat (gluten cereals)</strong>: sIgE sensitisation 6.7–11.93%; probable food allergy 0–0.02% in EuroPrevall India <span class="citation" data-cites="mahesh2023b">(<span class="nocase">Mahesh et al.</span> 2023)</span>.</p>
</section>
<section id="where-the-regulation-and-the-literature-diverge" class="level3 page-columns page-full" data-number="4.4">
<h3 data-number="4.4" class="anchored" data-anchor-id="where-the-regulation-and-the-literature-diverge"><span class="header-section-number">4.4</span> Where the regulation and the literature diverge</h3>
<p>Several foods with documented Indian sensitisation data are absent from the FSSAI list. For a regulatory body setting mandatory labelling requirements, the relevant standard is confirmed clinical allergy burden — and for most of these foods, the DBPCFC data to establish that burden does not exist in India-specific form.<sup>26</sup></p>
<div class="no-row-height column-margin column-container"><div id="fn26"><p><sup>26</sup>&nbsp;Mandatory labelling requirements carry legal and commercial consequences for manufacturers. Setting that bar requires a level of confirmed evidence — ideally challenge-confirmed allergy at population scale — that is higher than what is needed to flag a food as potentially relevant in a research taxonomy. The absence of a food from the FSSAI list does not mean it is not allergenic; it means the evidentiary bar for a legal mandate has not been cleared.</p></div></div><p>The foods where the evidence is most developed:</p>
<p><strong>Sesame</strong>: sensitisation 8.0% in EuroPrevall India children — higher than peanut (6.3%) <span class="citation" data-cites="mahesh2023b">(<span class="nocase">Mahesh et al.</span> 2023)</span>. Codex 2024 has since added sesame as mandatory, joining the US (since 2023) and the EU <span class="citation" data-cites="codex2024">(Codex Alimentarius Commission 2024)</span>. The evidence position for sesame has materially changed since the FSSAI 2020 list was written.</p>
<p><strong>Rice, black gram, and chickpea</strong> each have Indian DBPCFC data: rice (6 of 16 confirmed), black gram (4 of 14 confirmed), chickpea (31 of 41 confirmed) <span class="citation" data-cites="mahesh2023b krishna2020">(<span class="nocase">Mahesh et al.</span> 2023; Krishna et al. 2020)</span>. These are small single-centre cohorts and the only India-specific challenge data that exists for any food not on the FSSAI list.</p>
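<p>To give a sense of how much uncertainty cohorts of this size carry, the sketch below computes a Wilson score interval for each proportion. It assumes the counts can be read as confirmed reactions out of patients challenged and treats each cohort as a simple binomial sample, which is a simplification for clinic-selected patients. The function name and the choice of the Wilson method are this example's own, not the source studies'.</p>

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (about 95% for z=1.96)."""
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half_width, centre + half_width


# Confirmed / challenged counts as reported in the paragraph above.
cohorts = {"rice": (6, 16), "black gram": (4, 14), "chickpea": (31, 41)}

for food, (confirmed, challenged) in cohorts.items():
    low, high = wilson_interval(confirmed, challenged)
    share = confirmed / challenged
    print(f"{food}: {confirmed}/{challenged} = {share:.1%}, "
          f"interval {low:.1%} to {high:.1%}")
```

<p>On counts this small the intervals come out wide, which is the quantitative sense in which these cohorts are small. The intervals describe confirmation among challenged clinic patients, not population prevalence.</p>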
<p><strong>The Indian legume complex</strong> — pigeon pea, kidney bean, lentil, green gram — has documented sensitisation, characterised allergen proteins, and pepsin-stable fractions <span class="citation" data-cites="bhattacharya2018 krishna2020">(Bhattacharya et al. 2018; Krishna et al. 2020)</span>. No OFC data is available. Legume proteins as a class retain IgE reactivity after gastric digestion <span class="citation" data-cites="bhattacharya2018">(Bhattacharya et al. 2018)</span>, and the cross-reactive epitopes across this complex mean primary sensitisation to one legume may carry risk across others.</p>
<p><strong>Eggplant</strong> is named among the five primary Indian food allergens by <span class="citation" data-cites="bhattacharya2018">Bhattacharya et al. (2018)</span>, appears in the adult allergen profile of EuroPrevall <span class="citation" data-cites="mahesh2023b">(<span class="nocase">Mahesh et al.</span> 2023)</span>, and has a characterised LTP. SPT-based sensitisation figures carry the histamine confound noted in §3.2; the molecular evidence does not.</p>
<p><strong>Mustard</strong> has no India-specific clinical allergy data in the reviewed literature but is mandatory in the EU, widely used in Indian cooking as both oil and spice, and subject to ongoing FAO/WHO threshold assessment.</p>
<hr>
</section>
<section id="the-regulatory-process-as-a-constraint-on-list-updates" class="level3" data-number="4.5">
<h3 data-number="4.5" class="anchored" data-anchor-id="the-regulatory-process-as-a-constraint-on-list-updates"><span class="header-section-number">4.5</span> The regulatory process as a constraint on list updates</h3>
<p>Adding a food to a mandatory allergen declaration list is not a scientific decision alone. It is a regulatory action with legal, commercial, and administrative consequences, and the process that produces it reflects that.</p>
<p>A mandatory declaration requires that manufacturers identify the allergen across their entire supply chain, verify its presence or absence in every product, update labels, and retrain procurement and production staff. For large manufacturers with complex ingredient sourcing, this is a substantial operational exercise. For small manufacturers, which constitute a significant portion of India’s packaged food sector, it can determine whether compliance is feasible at all. Regulators setting a new mandatory requirement are not only making a safety call; they are also setting a compliance burden, and the timeline and scope of that burden are part of the decision.</p>
<p>Enforcement is a parallel constraint. A mandatory declaration is only as useful as the regulator’s ability to verify it. For allergens with standardised, widely available testing methods, enforcement is tractable. For allergens where testing methodology is not yet standardised — or where reference materials are not available in India — a mandatory declaration creates a requirement that inspection infrastructure cannot yet reliably verify. Regulators have reason to wait until enforcement is feasible before making a requirement mandatory rather than recommended.</p>
<p>The evidentiary standard for a mandatory declaration is also necessarily higher than for a research classification. A regulation that mandates disclosure of an allergen on the basis of sensitisation data alone — without challenge-confirmed allergy at meaningful scale — risks requiring declarations for foods that carry negligible clinical risk in practice, which dilutes the signal value of the mandatory list for consumers and manufacturers alike. The regulatory instinct to wait for confirmed data before acting is not conservatism for its own sake; it is the same instinct that makes the list meaningful when it does require something.</p>
<p>International alignment adds a further dimension. India’s participation in Codex and its WTO commitments create a shared interest in regulatory coherence across borders — both for consumer protection and for the practical functioning of food trade. Moving significantly ahead of or behind international standards has consequences that extend beyond the immediate safety question. The Codex process itself reflects this: the 2024 revision that added sesame and reclassified soy took years of evidence review and member state consultation before it was adopted. That pace is not a failure of urgency — it is what thorough cross-jurisdictional alignment requires.</p>
<p>None of this means the FSSAI list is final. It means the list reflects what was possible to establish, mandate, and enforce at the time it was written, under the conditions that existed. The divergence between the list and the clinical literature documented in §4.4 is not a gap that went unnoticed — it is a gap that the regulatory process has not yet closed, for reasons that are themselves part of the record.</p>
</section>
</section>
<section id="limitations" class="level2 page-columns page-full" data-number="5">
<h2 data-number="5" class="anchored" data-anchor-id="limitations"><span class="header-section-number">5</span> Limitations</h2>
<p>The limitations of this review are the limitations of the underlying evidence base.</p>
<p><strong>Geographic coverage</strong>: All EuroPrevall-INCO data comes from Mysore and Bengaluru. Most clinic-based studies are from Delhi or Kolkata. No systematic food allergy data from Northeast India, rural India, coastal fishing communities, tribal populations, or most of North India is available in the reviewed literature.<sup>27</sup></p>
<div class="no-row-height column-margin column-container"><div id="fn27"><p><sup>27</sup>&nbsp;India’s dietary diversity means that allergen exposure varies considerably by region: a coastal community in Kerala will have systematically different fish and shellfish exposure than an inland population in Rajasthan. A sensitisation pattern that holds in urban Karnataka may not hold elsewhere. The geographic concentration of the available data is not a minor caveat; it means the review describes what is known about food allergy in a specific part of India, not India as a whole.</p></div></div><p><strong>Study design</strong>: Most sensitisation data comes from allergy clinic patients, not general population cohorts. Patients attending allergy clinics are a selected population with a higher pre-test probability of sensitisation than the general population, so rates from these studies are expected to exceed true population prevalence.</p>
<p><strong>Diagnostic method</strong>: The 0.14% (children) and 1.2% (adults) figures use the EuroPrevall probable food allergy definition: reported symptoms within two hours combined with positive sIgE or SPT. This is not an OFC-confirmed diagnosis. DBPCFC-confirmed data exists only for rice (6/16), black gram (4/14), and chickpea (31/41), each from single-centre clinic cohorts.</p>
<p><strong>Eggplant confound</strong>: Stored eggplant accumulates histamine at levels that can produce false-positive SPT results. Eggplant sensitisation rates from SPT-based studies should be interpreted with this confound in mind <span class="citation" data-cites="bhattacharya2018">(Bhattacharya et al. 2018)</span>. The LTP characterisation from <span class="citation" data-cites="bhattacharya2018">Bhattacharya et al. (2018)</span> provides molecular evidence independent of SPT.</p>
<p><strong>Cross-reactivity vs primary sensitisation</strong>: Some sensitisation data, particularly for legumes, may reflect cross-reactivity with a primary sensitiser rather than independent sensitisation to the food tested. The Indian legume complex has documented cross-reactive epitopes <span class="citation" data-cites="bhattacharya2018 milana2025">(Bhattacharya et al. 2018; <span class="nocase">Milana et al.</span> 2025)</span>. A patient with primary sensitisation to black gram may test positive for lentil, pea, and faba bean without independent primary sensitisation to those foods.<sup>28</sup></p>
<div class="no-row-height column-margin column-container"><div id="fn28"><p><sup>28</sup>&nbsp;This complicates the interpretation of sensitisation rates for individual legumes. If a significant fraction of lentil-positive results in Indian clinic studies reflect cross-reactivity with black gram rather than primary lentil sensitisation, the true lentil-specific sensitisation rate would be lower than reported. Disentangling primary sensitisation from cross-reactivity requires molecular testing — component-resolved diagnostics — which is not available in most Indian clinical settings.</p></div></div><p><strong>Trajectory uncertainty</strong>: India’s food allergy landscape is not static. Urbanisation is consistently associated with higher food allergy rates in Asia-Pacific data <span class="citation" data-cites="leung2024">(<span class="nocase">Leung et al.</span> 2024)</span>, and India is urbanising rapidly. Current prevalence figures from 2006–2020 studies may not reflect the position in five or ten years. The allergen list derived from this review reflects evidence available through early 2026.</p>
<p><strong>FSSAI update status</strong>: Whether FSSAI intends to align with the Codex 2024 revision is not known at the time of writing.</p>
<hr>
</section>
<section id="an-extended-allergen-recognition-list-for-indian-food-systems" class="level2 page-columns page-full" data-number="6">
<h2 data-number="6" class="anchored" data-anchor-id="an-extended-allergen-recognition-list-for-indian-food-systems"><span class="header-section-number">6</span> An extended allergen recognition list for Indian food systems</h2>
<p>A mandatory labelling regulation and a research classification are doing different things. A regulation sets a legal threshold — what manufacturers must declare, with consequences for non-compliance. A classification organises information for researchers, analysts, and developers working with ingredient data. A higher evidentiary bar is appropriate for a legal mandate than for a classification flag.<sup>29</sup></p>
<div class="no-row-height column-margin column-container"><div id="fn29"><p><sup>29</sup>&nbsp;This distinction matters because it explains why the list below includes foods that FSSAI does not mandate. The question being asked is different: not “what is confirmed enough to require by law” but “what is documented enough in Indian-specific evidence to warrant flagging as allergen-relevant.” The two questions have different answers.</p></div></div><p>The list has three tiers.</p>
<section id="tier-1-fssai-core-8" class="level3" data-number="6.1">
<h3 data-number="6.1" class="anchored" data-anchor-id="tier-1-fssai-core-8"><span class="header-section-number">6.1</span> Tier 1 — FSSAI core 8</h3>
<p>These eight allergen groups are mandatory declarations under FSSAI Regulation 5(14) <span class="citation" data-cites="fssai2020">(Food Safety and Standards Authority of India 2022)</span>. They are adopted here unchanged.</p>
<div id="tbl-tier1" class="quarto-float quarto-figure quarto-figure-center anchored">
<figure class="quarto-float quarto-float-tbl figure">
<figcaption class="quarto-float-caption-top quarto-float-caption quarto-float-tbl" id="tbl-tier1-caption-0ceaefa1-69ba-4598-a22c-09a6ac19f8ca">
Table&nbsp;3: IFID Tier 1 — FSSAI core 8
</figcaption>
<div aria-describedby="tbl-tier1-caption-0ceaefa1-69ba-4598-a22c-09a6ac19f8ca">
<table class="caption-top table">
<colgroup>
<col style="width: 25%">
<col style="width: 25%">
<col style="width: 25%">
<col style="width: 25%">
</colgroup>
<thead>
<tr class="header">
<th>#</th>
<th>Allergen group</th>
<th>FSSAI reference</th>
<th>Indian evidence summary</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td>1</td>
<td>Gluten-containing cereals</td>
<td>Reg 5(14)(i)</td>
<td>Wheat: 6.7–11.93% sIgE; 0–0.02% probable FA in children</td>
</tr>
<tr class="even">
<td>2</td>
<td>Crustaceans</td>
<td>Reg 5(14)(ii)</td>
<td>Prawn: 10.3–53.5% sensitisation; Pen i 1 India’s only IUIS-registered food allergen</td>
</tr>
<tr class="odd">
<td>3</td>
<td>Milk</td>
<td>Reg 5(14)(iii)</td>
<td>1.35–20.5% sensitisation; 0.5% probable FA in children’s probable FA subset</td>
</tr>
<tr class="even">
<td>4</td>
<td>Egg</td>
<td>Reg 5(14)(iv)</td>
<td>6.9–34.9% sensitisation; 0.05% probable FA in children</td>
</tr>
<tr class="odd">
<td>5</td>
<td>Fish</td>
<td>Reg 5(14)(v)</td>
<td>Heat-stable allergens in bhetki and mackerel; heat-labile in hilsha and pomfret</td>
</tr>
<tr class="even">
<td>6</td>
<td>Peanuts and tree nuts</td>
<td>Reg 5(14)(vi)</td>
<td>Peanut: 6.3–19.6% sensitisation; ~0.03% probable FA</td>
</tr>
<tr class="odd">
<td>7</td>
<td>Soybeans</td>
<td>Reg 5(14)(vii)</td>
<td>Limited India-specific data; Codex 2024 reclassified as recommended</td>
</tr>
<tr class="even">
<td>8</td>
<td>Sulphites (≥10 mg/kg)</td>
<td>Reg 5(14)(viii)</td>
<td>Chemical sensitivity; not a protein allergen</td>
</tr>
</tbody>
</table>
</div>
</figure>
</div>
</section>
<section id="tier-2-literature-additions" class="level3" data-number="6.2">
<h3 data-number="6.2" class="anchored" data-anchor-id="tier-2-literature-additions"><span class="header-section-number">6.2</span> Tier 2 — Literature additions</h3>
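<p>These nine allergen groups are absent from FSSAI 2020. Eight have supporting evidence from Indian clinical or epidemiological literature; mustard is included on the strength of its mandatory status in the EU and its wide use in Indian cooking, without India-specific clinical data. The type and strength of evidence are noted for each.</p>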
<div id="tbl-tier2" class="quarto-float quarto-figure quarto-figure-center anchored">
<figure class="quarto-float quarto-float-tbl figure">
<figcaption class="quarto-float-caption-top quarto-float-caption quarto-float-tbl" id="tbl-tier2-caption-0ceaefa1-69ba-4598-a22c-09a6ac19f8ca">
Table&nbsp;4: Extended allergen recognition list — Tier 2
</figcaption>
<div aria-describedby="tbl-tier2-caption-0ceaefa1-69ba-4598-a22c-09a6ac19f8ca">
<table class="caption-top table">
<colgroup>
<col style="width: 25%">
<col style="width: 25%">
<col style="width: 25%">
<col style="width: 25%">
</colgroup>
<thead>
<tr class="header">
<th>#</th>
<th>Allergen</th>
<th>Evidence</th>
<th>Sources</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td>9</td>
<td>Sesame</td>
<td>8.0% sIgE in EuroPrevall India children; Codex 2024 added as mandatory; US and EU both include sesame</td>
<td><span class="citation" data-cites="mahesh2023b codex2024">(<span class="nocase">Mahesh et al.</span> 2023; Codex Alimentarius Commission 2024)</span></td>
</tr>
<tr class="even">
<td>10</td>
<td>Black gram (<em>Vigna mungo</em>)</td>
<td>DBPCFC 4 of 14 confirmed; 28-kDa Vig m; resistant to pepsin digestion; cross-reacts with lentil, faba bean, lima bean, pea</td>
<td><span class="citation" data-cites="krishna2020 bhattacharya2018 milana2025">(Krishna et al. 2020; Bhattacharya et al. 2018; <span class="nocase">Milana et al.</span> 2025)</span></td>
</tr>
<tr class="odd">
<td>11</td>
<td>Chickpea (<em>Cicer arietinum</em>)</td>
<td>DBPCFC 31 of 41 confirmed; anaphylaxis documented; 26-kDa albumin cross-reacts with peanut IgE</td>
<td><span class="citation" data-cites="devdas2018 krishna2020 bhattacharya2018">(Devdas et al. 2018; Krishna et al. 2020; Bhattacharya et al. 2018)</span></td>
</tr>
<tr class="even">
<td>12</td>
<td>Kidney bean (<em>Phaseolus vulgaris</em>)</td>
<td>22% sensitisation in Delhi food-allergic population; 31-kDa allergen stable to pepsin; cross-reacts with peanut, black gram, lentil, pea</td>
<td><span class="citation" data-cites="bhattacharya2018">(Bhattacharya et al. 2018)</span></td>
</tr>
<tr class="odd">
<td>13</td>
<td>Lentil (<em>Lens culinaris</em>)</td>
<td>5.5–9.7% sensitisation (Delhi, N=216–1,860); cross-reacts with black gram, kidney bean, pea</td>
<td><span class="citation" data-cites="krishna2020 milana2025">(Krishna et al. 2020; <span class="nocase">Milana et al.</span> 2025)</span></td>
</tr>
<tr class="even">
<td>14</td>
<td>Rice (<em>Oryza sativa</em>)</td>
<td>DBPCFC 6 of 16 confirmed; 12% SPT positive in food-allergic population; 24-kDa chitinase as major allergen</td>
<td><span class="citation" data-cites="bhattacharya2018 mahesh2023b">(Bhattacharya et al. 2018; <span class="nocase">Mahesh et al.</span> 2023)</span></td>
</tr>
<tr class="odd">
<td>15</td>
<td>Eggplant (<em>Solanum melongena</em>)</td>
<td>Named among five primary Indian food allergens; 4.3% SPT-confirmed sensitisation in a community study (N=741); LTP in peel and seeds; SPT figures carry histamine confound</td>
<td><span class="citation" data-cites="bhattacharya2018 krishna2020">(Bhattacharya et al. 2018; Krishna et al. 2020)</span></td>
</tr>
<tr class="even">
<td>16</td>
<td>Mustard (<em>Brassica spp.</em>)</td>
<td>Mandatory in EU; widely used in Indian cooking; FAO/WHO threshold assessment ongoing; no India-specific clinical data in reviewed literature</td>
<td><span class="citation" data-cites="fssai2020 codex2024">(Food Safety and Standards Authority of India 2022; Codex Alimentarius Commission 2024)</span></td>
</tr>
<tr class="odd">
<td>17</td>
<td>Pigeon pea / red gram (<em>Cajanus cajan</em>)</td>
<td>Novel allergens identified via Indian patient sera; 12.6% sIgE in Karnataka population study (N=2,219)</td>
<td><span class="citation" data-cites="bhattacharya2018 krishna2020">(Bhattacharya et al. 2018; Krishna et al. 2020)</span></td>
</tr>
</tbody>
</table>
</div>
</figure>
</div>
</section>
<section id="tier-3-flagged-insufficient-evidence-for-inclusion" class="level3" data-number="6.3">
<h3 data-number="6.3" class="anchored" data-anchor-id="tier-3-flagged-insufficient-evidence-for-inclusion"><span class="header-section-number">6.3</span> Tier 3 — Flagged; insufficient evidence for inclusion</h3>
<p>These foods have some Indian relevance but insufficient evidence to include in Tier 1 or Tier 2. Documented here for transparency and future review.</p>
<div id="tbl-tier3" class="quarto-float quarto-figure quarto-figure-center anchored">
<figure class="quarto-float quarto-float-tbl figure">
<figcaption class="quarto-float-caption-top quarto-float-caption quarto-float-tbl" id="tbl-tier3-caption-0ceaefa1-69ba-4598-a22c-09a6ac19f8ca">
Table&nbsp;5: Extended allergen recognition list — Tier 3
</figcaption>
<div aria-describedby="tbl-tier3-caption-0ceaefa1-69ba-4598-a22c-09a6ac19f8ca">
<table class="caption-top table">
<colgroup>
<col style="width: 33%">
<col style="width: 33%">
<col style="width: 33%">
</colgroup>
<thead>
<tr class="header">
<th>Allergen</th>
<th>Available evidence</th>
<th>What is missing</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td>Mung bean (<em>Vigna radiata</em>)</td>
<td>IUIS allergens Vig r 1 to Vig r 6 characterised; 12.5% sIgE in one Karnataka study; LTPs cross-reactive with peanut, soy, lentil, strawberry, apple, peach <span class="citation" data-cites="milana2025">(<span class="nocase">Milana et al.</span> 2025)</span></td>
<td>India-specific OFC or DBPCFC data; cross-reactivity with black gram may explain observed sensitisation</td>
</tr>
<tr class="even">
<td>Banana</td>
<td>3.6–40.6% sensitisation range across Indian studies</td>
<td>OFC data; wide range suggests heterogeneous testing and possible cross-reactivity</td>
</tr>
<tr class="odd">
<td>Betel leaf (<em>Piper betle</em>)</td>
<td>Widely used in Indian food culture; reported as an exposure of concern in community settings</td>
<td>No molecular characterisation or clinical allergy data</td>
</tr>
</tbody>
</table>
</div>
</figure>
</div>
<hr>
<!-- To be added after finalising -->
<!-- ## Appendix A: Author contributions and AI assistance disclosure {#sec-appendix-a}

### Lalitha A R

- Identified the research question and directed the analysis
- Selected and verified all source material against primary PDFs
- Wrote the manuscript plan specifying structure, section contents, 
  citation rules, and tone constraints before drafting began
- Reviewed the full draft for accuracy before finalising
- Verified all numbers and claims across original sources

### PA Mahesh

- Reviewed the clinical and epidemiological claims against field expertise
- Provided access to restricted source material
- Verified the framing of structural constraints on Indian allergy research

### Claude (Anthropic, claude-sonnet-4-6)

- Drafted all prose in §1–§6 and the abstract, following the approved plan
- Generated `references.bib` from paper front-matter provided in the plan
- Formatted all tables from data specified in the plan

### Where the line is

The plan specified section structure, which numbers to cite and from which 
papers, the contents of all three tiers, and tone constraints. Claude executed 
that plan. Claude did not select papers, read PDFs, make decisions about tier 
contents, or verify numbers against source documents. The analytical judgements 
in this paper are Lalitha's, documented in the plan before drafting began.

--- -->
</section>
</section>
<section id="references" class="level2" data-number="7">
<h2 data-number="7" class="anchored" data-anchor-id="references"><span class="header-section-number">7</span> References</h2>
<div id="refs" class="references csl-bib-body hanging-indent">
<div id="ref-bhattacharya2018" class="csl-entry">
Bhattacharya, Kashinath, Gaurab Sircar, Angira Dasgupta, and Swati Gupta Bhattacharya. 2018. <span>“Spectrum of Allergens and Allergen Biology in India.”</span> <em>International Archives of Allergy and Immunology</em> 177 (3): 219–37. <a href="https://doi.org/10.1159/000490805">https://doi.org/10.1159/000490805</a>.
</div>
<div id="ref-codex2024" class="csl-entry">
Codex Alimentarius Commission. 2024. <em>General Standard for the Labelling of Pre-Packaged Foods (CXS 1-1985)</em>. Joint FAO/WHO Food Standards Programme.
</div>
<div id="ref-devdas2018" class="csl-entry">
Devdas, Jaidev M., Christopher Mckie, Adam T. Fox, and Vinod H. Ratageri. 2018. <span>“Food Allergy in Children: An Overview.”</span> <em>Indian Journal of Pediatrics</em> 85: 369–74. <a href="https://doi.org/10.1007/s12098-017-2535-6">https://doi.org/10.1007/s12098-017-2535-6</a>.
</div>
<div id="ref-fssai2020" class="csl-entry">
Food Safety and Standards Authority of India. 2022. <em>Food Safety and Standards (Labelling and Display) Regulations, 2020</em>. Compendium. <a href="https://www.fssai.gov.in/upload/uploadfiles/files/Compendium_Labelling_Display_30_06_2022.pdf" class="uri">https://www.fssai.gov.in/upload/uploadfiles/files/Compendium_Labelling_Display_30_06_2022.pdf</a>.
</div>
<div id="ref-krishna2020" class="csl-entry">
Krishna, Mamidipudi Thirumala, Saibal Moitra, Padukudru Anand Mahesh, Vinay Mehta, Pudupakkam Vedanthan, and Devasahayam Jesudas Christopher. 2020. <span>“An Appraisal of Allergic Disorders in India and an Urgent Call for Action.”</span> <em>World Allergy Organization Journal</em> 13 (7): 100446. <a href="https://doi.org/10.1016/j.waojou.2020.100446">https://doi.org/10.1016/j.waojou.2020.100446</a>.
</div>
<div id="ref-leung2024" class="csl-entry">
<span class="nocase">Leung, Agnes Sze-yin, Punchama Pacharn, Sirinoot Tangvalelerd, et al.</span> 2024. <span>“Food Allergy in a Changing Dietary Landscape: A Focus on the Asia Pacific Region.”</span> <em>Pediatric Allergy and Immunology</em> 35 (8): e14211. <a href="https://doi.org/10.1111/pai.14211">https://doi.org/10.1111/pai.14211</a>.
</div>
<div id="ref-mahesh2023b" class="csl-entry">
<span class="nocase">Mahesh, Padukudru Anand et al.</span> 2023. <span>“Allergic Diseases in India – Prevalence, Risk Factors and Current Challenges.”</span> <em>Clinical &amp; Experimental Allergy</em> 53 (3): 276–94. <a href="https://doi.org/10.1111/cea.14239">https://doi.org/10.1111/cea.14239</a>.
</div>
<div id="ref-milana2025" class="csl-entry">
<span class="nocase">Milana, Matilde et al.</span> 2025. <span>“A Review of the Toxicological Effects and Allergenic Potential of Emerging Alternative Protein Sources.”</span> <em>Comprehensive Reviews in Food Science and Food Safety</em> 24: e70123. <a href="https://doi.org/10.1111/1541-4337.70123">https://doi.org/10.1111/1541-4337.70123</a>.
</div>
</div>


</section>


<div id="quarto-appendix" class="default"><section class="quarto-appendix-contents" id="quarto-reuse"><h2 class="anchored quarto-appendix-heading">Reuse</h2><div class="quarto-appendix-contents"><div><a rel="license" href="https://creativecommons.org/licenses/by/4.0/">CC BY 4.0</a></div></div></section></div> ]]></description>
  <guid>https://isrl.in/pub/2026-04-r-allergen/</guid>
  <pubDate>Thu, 16 Apr 2026 00:00:00 GMT</pubDate>
</item>
<item>
  <title>The Coordinator Problem: Connector Hub Architecture as a Design Principle for Domain-Blind Integration in AI Systems</title>
  <dc:creator>Lalitha A R</dc:creator>
  <dc:creator>Claude (Anthropic)</dc:creator>
  <link>https://isrl.in/pub/2026-04-r-neurocon/</link>
  <description><![CDATA[ 




<section id="the-problem" class="level2" data-number="1">
<h2 data-number="1" class="anchored" data-anchor-id="the-problem"><span class="header-section-number">1</span> The Problem</h2>
<p>Current large language models are built on a single architectural assumption: that the best path to cross-domain reasoning is to expose one model to all domains simultaneously during training, and let integration emerge from the resulting parameter space. The assumption is productive. Models trained this way do transfer across domains; they do apply concepts from one field to problems in another; they do find patterns that transcend domain boundaries. The assumption has earned its place.</p>
<p>This paper does not argue that the assumption is wrong. It argues that the brain solved the same problem differently, that the brain’s solution has been empirically characterised in some detail, and that taking it seriously as a design principle opens experimental directions that current architectures do not explore.</p>
<p>The brain does not train a single substrate across all domains. It maintains domain-specific processing modules and coordinates them through a distinct class of regions whose defining property is precisely that they are <em>not</em> domain-specific. These connector hub regions manage the integration problem without holding the domain content. The architecture is separable: specialisation happens in one place, coordination happens in another, and the two are functionally distinct.</p>
<p>The question this paper poses is narrow: is there a meaningful AI architecture that reflects this separation? And if there is, what would it need to do that existing approaches do not already do?</p>
<hr>
</section>
<section id="background" class="level2" data-number="2">
<h2 data-number="2" class="anchored" data-anchor-id="background"><span class="header-section-number">2</span> Background</h2>
<section id="what-has-been-established-in-the-domain-generalist-paradigm" class="level3" data-number="2.1">
<h3 data-number="2.1" class="anchored" data-anchor-id="what-has-been-established-in-the-domain-generalist-paradigm"><span class="header-section-number">2.1</span> What has been established in the domain-generalist paradigm</h3>
<p>The large language model approach treats language modelling over a broad training corpus as the mechanism by which domain knowledge is acquired and cross-domain transfer is enabled. The model learns domain-specific patterns — the vocabulary, the relational structures, the typical inferential moves of a domain — by exposure to enough text from that domain. It learns cross-domain transfer by exposure to text that itself crosses domains: scientific writing that borrows from adjacent fields, interdisciplinary papers, analogical explanations, and so on.</p>
<p>The result is a model that holds domain knowledge and coordination capacity in the same parameter space. When the model encounters a problem, it does not route to a specialist; it retrieves from a generalised substrate that contains everything at once. Retrieval-augmented generation <span class="citation" data-cites="lewis2020retrieval">(Lewis et al. 2021)</span> and fine-tuning extend this by adding domain specificity as a correction applied after training: the base model is a generalist, and specialisation is layered on. Mixture-of-experts architectures <span class="citation" data-cites="shazeer2017outrageously">(Shazeer et al. 2017)</span> pursue a different efficiency: within a single model, a gating network routes each token to a subset of parameter experts. This reduces inference cost without changing the epistemic structure — all experts are trained jointly under the same loss, in the same model, on the same data distribution.</p>
<p>None of these approaches separates coordination from domain expertise at the architectural level. The coordinating function — whatever the model does when it integrates across domains — is distributed throughout the same weights that hold domain content.</p>
</section>
<section id="what-the-brain-does-instead" class="level3" data-number="2.2">
<h3 data-number="2.2" class="anchored" data-anchor-id="what-the-brain-does-instead"><span class="header-section-number">2.2</span> What the brain does instead</h3>
<p>Functional neuroimaging research has documented a different structure. The human brain is not organised as a single generalised processor. It is organised as a set of discrete functional modules — each densely interconnected internally, each performing a domain-specific cognitive function — coordinated by a distinct class of regions that do not themselves perform domain-specific computation.</p>
<p>Bertolero, Yeo, and D’Esposito <span class="citation" data-cites="bertolero2015modular">(2015)</span> established this architecture empirically across 9,208 experiments and 77 cognitive tasks in the BrainMap database. Using resting-state fMRI and graph-theoretic network analysis, they identified 14 distinct functional modules with strong spatial correspondence to known cognitive functions. They then measured activity at different types of nodes across all tasks. Local nodes within modules — provincial hubs — did not increase activity as more cognitive functions were engaged. Their computational load remained constant regardless of task complexity. Connector nodes — regions with high participation coefficients, meaning their connections were distributed evenly across many modules rather than concentrated within any one — showed a different pattern entirely. Their activity increased proportionally to the number of modules engaged in a task.</p>
<p>This finding has a specific implication. Connector nodes are not doing more of what the domain modules are doing when tasks get more complex. They are doing something else entirely: managing the integration load that increases when many modules must work together, while preserving the autonomy of each module’s function. The modules stay modules; the connector nodes handle the coordination between them.</p>
<p>Bertolero et al. <span class="citation" data-cites="bertolero2018mechanistic">(2018)</span> extended this to a mechanistic account. Connector hubs do not merely route information between modules; they actively tune the connectivity of their neighbours, reorganising which modules are more or less connected based on current task demands. Individuals with more diversely connected hubs and more modular brain networks show higher cognitive performance across all tasks — not on any specific task, but across the board. The diversity of hub connectivity predicts general integration capacity.</p>
<p>The architectural principle that emerges from this literature is not that specialisation and integration are in tension. It is that they are structurally separable and mutually reinforcing: more modular domain processing combined with more capable coordination produces better outcomes than either alone <span class="citation" data-cites="menon2024pfc">(Menon and D’Esposito 2022)</span>.</p>
<hr>
</section>
</section>
<section id="the-architecture-in-detail" class="level2" data-number="3">
<h2 data-number="3" class="anchored" data-anchor-id="the-architecture-in-detail"><span class="header-section-number">3</span> The Architecture in Detail</h2>
<section id="module-autonomy-is-not-isolation" class="level3" data-number="3.1">
<h3 data-number="3.1" class="anchored" data-anchor-id="module-autonomy-is-not-isolation"><span class="header-section-number">3.1</span> Module autonomy is not isolation</h3>
<p>A clarification matters here. Saying that domain modules process information autonomously does not mean they are isolated from one another. The brain is not a collection of silos that occasionally exchange messages. It is a network in which modules maintain dense internal connectivity while connector hub regions manage cross-module communication selectively, based on task demands.</p>
<p>The key property of connector hubs is the participation coefficient <span class="citation" data-cites="sporns2016modular">(Sporns and Betzel 2016)</span>: the degree to which a node’s connections are distributed evenly across modules rather than concentrated within one. A node with a high participation coefficient is well-connected to many modules. It has access to what each module is doing. But it does not perform any module’s function. It is neither a domain specialist nor a blank generalist. It occupies a structurally distinct role: a node that can reach across module boundaries without being defined by any of them.</p>
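<p>To make the measure concrete, the sketch below computes the participation coefficient under its standard graph-theoretic definition, P = 1 − Σ<sub>m</sub> (k<sub>m</sub> / k)<sup>2</sup>, where k is a node’s degree and k<sub>m</sub> is the number of its edges that land in module m. It is an illustration only, not the analysis pipeline of the cited studies: it assumes an unweighted, undirected graph, and the function name <code>participation_coefficient</code>, the toy graph, and the module labels are all hypothetical.</p>

```python
from collections import defaultdict


def participation_coefficient(edges, module_of):
    """Return {node: P}, where P = 1 - sum over modules of (k_m / k) ** 2."""
    # per_module[node][module] counts the edges of a node that land in each module
    per_module = defaultdict(lambda: defaultdict(int))
    for a, b in edges:
        per_module[a][module_of[b]] += 1
        per_module[b][module_of[a]] += 1

    result = {}
    for node, counts in per_module.items():
        degree = sum(counts.values())
        result[node] = 1.0 - sum((k / degree) ** 2 for k in counts.values())
    return result


# Hypothetical toy graph: two triangles (modules A and B) bridged by node "h"
module_of = {
    "a1": "A", "a2": "A", "a3": "A", "h": "A",
    "b1": "B", "b2": "B", "b3": "B",
}
edges = [
    ("a1", "a2"), ("a2", "a3"), ("a1", "a3"),
    ("b1", "b2"), ("b2", "b3"), ("b1", "b3"),
    ("h", "a1"), ("h", "b1"),
]

scores = participation_coefficient(edges, module_of)
for node in sorted(scores):
    print(node, round(scores[node], 3))
```

<p>In this toy graph the bridging node <code>h</code> splits its two edges evenly across both modules and scores 0.5, the maximum possible with two modules, while <code>a2</code>, whose edges all stay inside module A, scores 0.</p>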
<p>Gordon et al. <span class="citation" data-cites="gordon2018three">(2018)</span> refined this picture further, showing that connector hubs are not a single category. They identified three distinct sets, each with a different task-activation profile: one deactivates across tasks, one activates during all tasks, and one activates specifically during tasks that require configuring input, transformation, and output processes. This differentiation within the coordinator role is relevant because it suggests that coordination is itself a structured function: not a homogeneous relay, but a set of subtypes performing distinct integrative operations.</p>
</section>
<section id="what-connector-hubs-actually-compute" class="level3" data-number="3.2">
<h3 data-number="3.2" class="anchored" data-anchor-id="what-connector-hubs-actually-compute"><span class="header-section-number">3.2</span> What connector hubs actually compute</h3>
<p>The literature on connector hub function does not describe these regions as performing structural isomorphism detection — comparing problem shapes across domains and flagging when a problem in one domain has the same relational structure as a solved problem in another. That function is not what the connector hub literature directly documents.</p>
<p>What it documents is routing and tuning: managing which modules are active, how strongly they communicate, and how that connectivity pattern shifts as task demands change. The connector hub’s documented computational role is coordination in the sense of network configuration, not in the sense of cross-domain analogy.</p>
<p>The analogical reasoning literature, however, sits adjacent and is worth examining. A meta-analysis of 27 neuroimaging studies on analogical reasoning found that the left rostrolateral prefrontal cortex (rlPFC) is the region most consistently activated across all analogical reasoning tasks, regardless of whether the domain is semantic or visuospatial <span class="citation" data-cites="hobeika2016analogical">(Hobeika et al. 2016)</span>. The rlPFC is domain-general for analogy. Lesions to the left rlPFC impair analogical reasoning across domains. And the rlPFC is anatomically located within the connector hub regions identified by the modular brain architecture literature.</p>
<p>This anatomical overlap does not establish that connector hubs <em>are</em> cross-domain analogy engines. It establishes something more modest: the architectural conditions that define connector hubs — high participation coefficient, domain-distributed connectivity, low domain specificity — are the same conditions under which cross-domain relational comparison is supported. Whether a coordination layer trained into this architectural role would develop the capacity for structural similarity detection across domains is an open question. The brain architecture suggests it is not an implausible one.</p>
<p>Gentner’s structure-mapping theory <span class="citation" data-cites="gentner1983structure">(Gentner 1983)</span> provides the theoretical vocabulary for what this function would be. Analogy, in the structure-mapping framework, depends on finding relational correspondences between domains — not surface similarity between objects, but systematic similarity between the roles objects play within a relational structure. The function is domain-blind by definition: the same relational structure can exist in two entirely different content domains, and detecting it requires abstracting away from domain content. A coordinator trained to detect such correspondences would not need to know what a domain is about; it would need to know what shape a problem has.</p>
<hr>
</section>
</section>
<section id="the-distinction-from-existing-approaches" class="level2" data-number="4">
<h2 data-number="4" class="anchored" data-anchor-id="the-distinction-from-existing-approaches"><span class="header-section-number">4</span> The Distinction from Existing Approaches</h2>
<section id="mixture-of-experts" class="level3" data-number="4.1">
<h3 data-number="4.1" class="anchored" data-anchor-id="mixture-of-experts"><span class="header-section-number">4.1</span> Mixture of experts</h3>
<p>Mixture-of-experts architectures <span class="citation" data-cites="shazeer2017outrageously cai2024moe">(Shazeer et al. 2017; <span class="nocase">Cai and colleagues</span> 2025)</span> are the closest existing analogue to the proposed architecture. MoE models contain multiple sub-networks (experts), with a gating mechanism routing each token to a small subset of experts during inference. This achieves computational efficiency — not all parameters are activated for every input — and produces a form of functional specialisation within the model.</p>
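<p>The routing step described above can be sketched as top-k gating. This is a simplified illustration, not the implementation from the cited papers: it leaves out the noise term and load-balancing losses that practical systems typically add, it takes the gate scores as given rather than computing them from the token with a learned layer, and its names (<code>top_k_gate</code>, <code>moe_forward</code>) and toy scalar experts are hypothetical.</p>

```python
import math


def top_k_gate(logits, k):
    """Pick the k highest-scoring experts and softmax-normalise their weights."""
    chosen = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    peak = max(logits[i] for i in chosen)  # subtract the peak for numerical stability
    exps = {i: math.exp(logits[i] - peak) for i in chosen}
    total = sum(exps.values())
    return {i: e / total for i, e in exps.items()}


def moe_forward(x, experts, gate_logits, k=2):
    """Run only the selected experts and mix their outputs by gate weight."""
    weights = top_k_gate(gate_logits, k)
    return sum(w * experts[i](x) for i, w in weights.items())


# Hypothetical scalar "experts"; in a real model each would be a sub-network
experts = [lambda x: x + 1, lambda x: 2 * x, lambda x: -x, lambda x: x * x]
gate_logits = [0.1, 2.0, -1.0, 2.0]  # assumed output of a gating layer for one token

output = moe_forward(3.0, experts, gate_logits, k=2)
print(output)
```

<p>Only the two selected experts run for this input; the other two are never evaluated, which is where the inference saving comes from.</p>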
<p>The distinction from the proposed architecture is epistemic rather than computational. In MoE, all experts are trained jointly under the same loss, within the same model, on the same data distribution. The gating network is trained simultaneously with the experts; there is no separation between the coordinator’s training objective and the specialists’ training objective. The experts are not domain-native in the sense of having been developed to hold a specific domain’s knowledge independently of the generalist training regime. They are weight-level subnetworks within a single model that have developed different activation patterns through joint training.</p>
<p>The connector hub analogy is structurally different. Domain-native modules, in the brain, are not trained jointly with the connector hubs under a shared loss. They develop through domain-specific experience and exposure; connector hubs develop separately. The proposed AI architecture would reflect this separation: domain-native specialist models trained on domain-specific corpora, and a coordination layer trained separately — potentially on a different objective entirely, concerned with structural relationships across domains rather than domain content.</p>
</section>
<section id="retrieval-augmented-generation" class="level3" data-number="4.2">
<h3 data-number="4.2" class="anchored" data-anchor-id="retrieval-augmented-generation"><span class="header-section-number">4.2</span> Retrieval-augmented generation</h3>
<p>Retrieval-augmented generation <span class="citation" data-cites="lewis2020retrieval">(Lewis et al. 2021)</span> adds domain specificity to a generalist model by retrieving relevant documents at inference time and including them in the context window. This is a post-training correction: the base model remains a generalist; specialisation is supplied externally.</p>
<p>The proposed architecture differs in that domain specialists are not corrections applied to a generalist. They are the primary domain processors. The coordinator does not have domain knowledge that gets topped up by retrieval; it does not hold domain knowledge in the first place. The separation is architectural, not a retrieval strategy.</p>
</section>
<section id="current-multi-agent-systems" class="level3" data-number="4.3">
<h3 data-number="4.3" class="anchored" data-anchor-id="current-multi-agent-systems"><span class="header-section-number">4.3</span> Current multi-agent systems</h3>
<p>Multi-agent systems <span class="citation" data-cites="xiao2024dual">(<span class="nocase">Xiao and colleagues</span> 2024)</span> distribute tasks across multiple models and coordinate their outputs through an orchestrator. This is the existing approach closest in spirit to the proposed architecture, and it shares the structural separation the brain’s architecture exhibits. The limitation documented in current production deployments is coordination overhead: as the number of specialists increases, the coordination tax — communication overhead, latency, context management — grows faster than the benefit <span class="citation" data-cites="oreilly2026multiagent">(Königstein 2026)</span>. The orchestrator in most deployed systems is not a model with a distinct training objective for coordination; it is a generalist model given a coordination role through prompting. The coordination capacity is borrowed from the generalist’s general capability, not developed as a distinct function.</p>
<p>The proposed architecture asks whether a coordinator trained specifically for structural integration — with its own training objective, on a corpus of cross-domain relational correspondences rather than on domain content — would perform differently from a prompted generalist acting as coordinator. The brain’s architecture suggests these are different things. Whether they produce different outcomes in AI systems is the experimental question.</p>
<hr>
</section>
</section>
<section id="convergent-evidence-from-organisational-theory" class="level2" data-number="5">
<h2 data-number="5" class="anchored" data-anchor-id="convergent-evidence-from-organisational-theory"><span class="header-section-number">5</span> Convergent Evidence from Organisational Theory</h2>
<p>The structural separation of coordination from domain expertise is not a new idea. It has been independently arrived at in human knowledge systems across several fields.</p>
<p>Lawrence and Lorsch <span class="citation" data-cites="lawrence1967differentiation">(1967)</span> formalised it in organisational theory as the tension between <em>differentiation</em> — the development of specialised subunits with their own goals, time horizons, and epistemic norms — and <em>integration</em> — the coordination of differentiated subunits toward shared outcomes. Their empirical finding was that high-performing organisations in complex environments achieved both: more differentiated than low performers <em>and</em> more integrated. The integrator role in their framework is structurally analogous to the connector hub: a person or unit that coordinates across specialist domains without being a domain specialist, whose effectiveness depends on being trusted by all parties rather than being expert in any one domain.</p>
<p>The T-shaped manager concept <span class="citation" data-cites="guest1991tshaped johnson1978tshaped">(Guest 1991; Johnson 1978)</span> formalises the same principle at the individual level. The vertical bar represents deep domain expertise; the horizontal bar represents the boundary-crossing competencies that enable coordination across specialisms. The horizontal bar is not generalisation in the sense of knowing everything at shallow depth. It is coordination capacity: the ability to integrate across domains without being defined by any of them. The T-shaped manager does not perform the specialist’s function; they create the conditions under which specialists can work together.</p>
<p>In legal practice, large firms have independently evolved an analogous structure. Complex multi-practice matters are handled by coordinating partners who assemble and route between domain specialists — IP attorneys, tax attorneys, litigation specialists — without needing deep expertise in each practice area. The coordinating partner’s role is not to do the specialist’s work but to understand which specialist is needed when, and to translate across the epistemic boundaries between practice groups. The domain specialists remain autonomous; the coordinator holds the integration function.</p>
<p>None of these analogies constitutes proof. They constitute convergent independent discovery of the same structural principle in systems facing the same problem: how to achieve coordination across domain-specialist components without collapsing the specialisation that makes the components useful.</p>
<hr>
</section>
<section id="what-the-coordinator-would-need-to-do" class="level2" data-number="6">
<h2 data-number="6" class="anchored" data-anchor-id="what-the-coordinator-would-need-to-do"><span class="header-section-number">6</span> What the Coordinator Would Need to Do</h2>
<p>The proposed architecture separates into two components with different requirements.</p>
<p><strong>Domain-native specialist models</strong> are trained on domain-specific corpora, with training objectives appropriate to their domain. Their epistemic authority is domain-bounded. They do not need to know what other specialists know; they need to produce high-quality domain-specific outputs when queried. The TRM result <span class="citation" data-cites="trm2026lessis">(Jolicoeur-Martineau 2025)</span>, in which a 7M-parameter model achieved competitive performance on structured reasoning tasks, suggests that small, domain-native models may be sufficient for specialist functions that current generalist models handle with far more parameters.</p>
<p><strong>The coordination layer</strong> is the novel component. Its training objective is not domain content. It is structural: learning to represent problems in terms of their relational structure, to route queries to appropriate specialists, and — potentially — to detect when a problem in one domain shares relational structure with a problem another specialist has encountered. This last function is not a given. It is a hypothesis about what a coordination layer trained at the architectural level of connector hubs might develop. The rlPFC literature suggests the conditions for such a function are present in the analogous brain architecture; whether those conditions can be reproduced in a trained model is an empirical question.</p>
<p>The training corpus for such a coordinator is not obvious. One tractable direction is the history of science: interdisciplinary papers that explicitly transfer frameworks across domains, analogical explanations in scientific pedagogy, and cross-domain problem-solving literature document the function the coordinator would need to perform. Whether a model trained on this corpus would generalise to novel cross-domain structural correspondences rather than memorising the surface forms of known analogies is an open methodological question.</p>
<hr>
</section>
<section id="scope-and-limitations" class="level2" data-number="7">
<h2 data-number="7" class="anchored" data-anchor-id="scope-and-limitations"><span class="header-section-number">7</span> Scope and Limitations</h2>
<section id="what-this-paper-does-not-claim" class="level3" data-number="7.1">
<h3 data-number="7.1" class="anchored" data-anchor-id="what-this-paper-does-not-claim"><span class="header-section-number">7.1</span> What this paper does not claim</h3>
<p>This paper does not claim that the proposed architecture would outperform current large language models on any benchmark. The claim is architectural and conceptual: that the separation of coordination from domain expertise is a structural principle documented in the brain and independently discovered in human organisational systems, and that AI architecture has not yet explored it at the level the brain implements it.</p>
<p>This paper does not propose an implementation. The training objective for the coordination layer, the mechanism by which specialists and coordinator communicate, the representation format for structural similarity, and the evaluation framework for coordination quality are all open engineering questions. Scoping them is outside the range of what a conceptual paper can usefully do.</p>
<p>This paper does not argue that domain-native training produces better specialists than generalised training in all cases. There are domains where generalised training produces specialists that match or exceed domain-native fine-tuning. The architectural argument is not about which approach produces better specialists; it is about whether the coordination function is better served by a dedicated coordinator with its own training objective than by a generalist model acting as coordinator.</p>
</section>
<section id="what-remains-open" class="level3" data-number="7.2">
<h3 data-number="7.2" class="anchored" data-anchor-id="what-remains-open"><span class="header-section-number">7.2</span> What remains open</h3>
<p>The most significant open question is the training objective for the coordinator. The brain’s connector hubs develop their function through experience in a system where domain modules are already developing their functions simultaneously. A training objective that reproduces this developmental condition in a supervised setting does not yet exist.</p>
<p>The evaluation question is similarly open. Current benchmarks evaluate domain performance. Cross-domain transfer is typically evaluated by measuring performance on domain B after training on domain A. Neither evaluates coordination quality directly — the capacity of a coordinator to route appropriately, integrate across specialists, and detect structural correspondences across domain boundaries. Developing such an evaluation framework may be a precondition for testing the architecture.</p>
<hr>
</section>
</section>
<section id="authorship-note" class="level2" data-number="8">
<h2 data-number="8" class="anchored" data-anchor-id="authorship-note"><span class="header-section-number">8</span> Authorship Note</h2>
<p>Lalitha A R identified the architectural parallel between connector hub function and the proposed AI coordination layer, formulated the question of whether a domain-blind coordinator trained separately from domain specialists would behave differently from a prompted generalist acting as coordinator, developed the cross-domain isomorphism detection hypothesis as an extension of the connector hub analogy, and directed the search for convergent parallels in organisational theory and legal practice.</p>
<p>Claude searched the neuroscience literature, confirmed the rlPFC analogical reasoning literature as the relevant adjacent body of work, located and verified the organisational theory and law firm parallels on Lalitha’s direction, built the paper table and bibliography, and drafted this paper from the resulting materials. The core architectural question, the isomorphism extension, and the cross-domain framing instinct are Lalitha’s. The literature retrieval, synthesis, and written draft are Claude’s.</p>
<hr>
</section>
<section id="references" class="level2" data-number="9">
<h2 data-number="9" class="anchored" data-anchor-id="references"><span class="header-section-number">9</span> References</h2>
<div id="refs" class="references csl-bib-body hanging-indent">
<div id="ref-bertolero2018mechanistic" class="csl-entry">
Bertolero, Maxwell A., B. T. Thomas Yeo, Danielle S. Bassett, and Mark D’Esposito. 2018. <span>“A Mechanistic Model of Connector Hubs, Modularity and Cognition.”</span> <em>Nature Neuroscience</em> 21: 1127–35. <a href="https://doi.org/10.1038/s41593-018-0157-3">https://doi.org/10.1038/s41593-018-0157-3</a>.
</div>
<div id="ref-bertolero2015modular" class="csl-entry">
Bertolero, Maxwell A., B. T. Thomas Yeo, and Mark D’Esposito. 2015. <span>“The Modular and Integrative Functional Architecture of the Human Brain.”</span> <em>Proceedings of the National Academy of Sciences</em> 112 (49): E6798–807. <a href="https://doi.org/10.1073/pnas.1510619112">https://doi.org/10.1073/pnas.1510619112</a>.
</div>
<div id="ref-cai2024moe" class="csl-entry">
<span class="nocase">Cai, Weilin, and colleagues</span>. 2025. <span>“A Comprehensive Survey of Mixture-of-Experts: Algorithms, Theory, and Applications.”</span> <em>arXiv Preprint</em>. <a href="https://arxiv.org/html/2503.07137v1">https://arxiv.org/html/2503.07137v1</a>.
</div>
<div id="ref-gentner1983structure" class="csl-entry">
Gentner, Dedre. 1983. <span>“Structure-Mapping: A Theoretical Framework for Analogy.”</span> <em>Cognitive Science</em> 7 (2): 155–70. <a href="https://doi.org/10.1207/s15516709cog0702_3">https://doi.org/10.1207/s15516709cog0702_3</a>.
</div>
<div id="ref-gordon2018three" class="csl-entry">
Gordon, Evan M., Charles J. Lynch, Caterina Gratton, et al. 2018. <span>“Three Distinct Sets of Connector Hubs Integrate Human Brain Function.”</span> <em>Cell Reports</em> 24 (7): 1687–96. <a href="https://doi.org/10.1016/j.celrep.2018.07.050">https://doi.org/10.1016/j.celrep.2018.07.050</a>.
</div>
<div id="ref-guest1991tshaped" class="csl-entry">
Guest, David. 1991. <span>“The Hunt Is on for the Renaissance Man of Computing.”</span> <em>The Independent</em>.
</div>
<div id="ref-hobeika2016analogical" class="csl-entry">
Hobeika, Luc, Cassandre Diard-Detoeuf, Béatrice Garcin, Richard Levy, and Emmanuelle Volle. 2016. <span>“General and Specialized Brain Correlates for Analogical Reasoning: A Meta-Analysis of Functional Imaging Studies.”</span> <em>Human Brain Mapping</em> 37 (5): 1953–69. <a href="https://doi.org/10.1002/hbm.23149">https://doi.org/10.1002/hbm.23149</a>.
</div>
<div id="ref-johnson1978tshaped" class="csl-entry">
Johnson, Denis. 1978. <span>“T-Shaped Manager.”</span> <em>IEEE Engineering Management Review</em>.
</div>
<div id="ref-trm2026lessis" class="csl-entry">
Jolicoeur-Martineau, Alexia. 2025. <em>Less Is More: Recursive Reasoning with Tiny Networks</em>. <a href="https://arxiv.org/abs/2510.04871">https://arxiv.org/abs/2510.04871</a>.
</div>
<div id="ref-oreilly2026multiagent" class="csl-entry">
Königstein, Nicole. 2026. <span>“Designing Effective Multi-Agent Architectures.”</span> O’Reilly Radar. <a href="https://www.oreilly.com/radar/designing-effective-multi-agent-architectures/">https://www.oreilly.com/radar/designing-effective-multi-agent-architectures/</a>.
</div>
<div id="ref-lawrence1967differentiation" class="csl-entry">
Lawrence, Paul R., and Jay W. Lorsch. 1967. <span>“Differentiation and Integration in Complex Organizations.”</span> <em>Administrative Science Quarterly</em> 12 (1): 1–47. <a href="https://doi.org/10.2307/2391211">https://doi.org/10.2307/2391211</a>.
</div>
<div id="ref-lewis2020retrieval" class="csl-entry">
Lewis, Patrick, Ethan Perez, Aleksandra Piktus, et al. 2021. <em>Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks</em>. <a href="https://arxiv.org/abs/2005.11401">https://arxiv.org/abs/2005.11401</a>.
</div>
<div id="ref-menon2024pfc" class="csl-entry">
Menon, Vinod, and Mark D’Esposito. 2022. <span>“The Role of <span>PFC</span> Networks in Cognitive Control and Executive Function.”</span> <em>Nature Reviews Neuroscience</em> 23: 535–55. <a href="https://doi.org/10.1038/s41583-022-00580-9">https://doi.org/10.1038/s41583-022-00580-9</a>.
</div>
<div id="ref-shazeer2017outrageously" class="csl-entry">
Shazeer, Noam, Azalia Mirhoseini, Krzysztof Maziarz, et al. 2017. <span>“Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer.”</span> <em>arXiv Preprint</em>. <a href="https://arxiv.org/abs/1701.06538">https://arxiv.org/abs/1701.06538</a>.
</div>
<div id="ref-sporns2016modular" class="csl-entry">
Sporns, Olaf, and Richard F. Betzel. 2016. <span>“Modular Brain Networks.”</span> <em>Annual Review of Psychology</em> 67: 613–40. <a href="https://doi.org/10.1146/annurev-psych-122414-033634">https://doi.org/10.1146/annurev-psych-122414-033634</a>.
</div>
<div id="ref-xiao2024dual" class="csl-entry">
<span class="nocase">Xiao, Lianmin, and colleagues</span>. 2024. <span>“Optimizing Generative <span>AI</span> Networking: A Dual Perspective with Multi-Agent Systems and Mixture of Experts.”</span> <em>arXiv Preprint</em>. <a href="https://arxiv.org/abs/2405.12472">https://arxiv.org/abs/2405.12472</a>.
</div>
</div>


</section>

<div id="quarto-appendix" class="default"><section class="quarto-appendix-contents" id="quarto-reuse"><h2 class="anchored quarto-appendix-heading">Reuse</h2><div class="quarto-appendix-contents"><div><a rel="license" href="https://creativecommons.org/licenses/by/4.0/">CC BY 4.0</a></div></div></section></div> ]]></description>
  <guid>https://isrl.in/pub/2026-04-r-neurocon/</guid>
  <pubDate>Fri, 10 Apr 2026 00:00:00 GMT</pubDate>
</item>
<item>
  <title>When the Means Become the End: Instrumental-Terminal Goal Inversion in Large Language Models</title>
  <dc:creator>Lalitha A R</dc:creator>
  <dc:creator>Claude (Anthropic)</dc:creator>
  <link>https://isrl.in/pub/2026-04-r-itgi/</link>
  <description><![CDATA[ 




<section id="the-problem" class="level2" data-number="1">
<h2 data-number="1" class="anchored" data-anchor-id="the-problem"><span class="header-section-number">1</span> The Problem</h2>
<p>A researcher shares a paper and asks an LLM to synthesize its relevance into a lab log. The log is returned: four sections, each with a header, precisely scoped claims, proper citations, a section on limitations, a section on what to retain. Every structural element of a log is present. The terminal goal — recording one transferable insight clearly enough that a reader six months from now can reconstruct why the paper matters — is not served. The log is a completed artifact that does not do its job.</p>
<p>This failure is not hallucination. The claims are accurate. It is not sycophancy — the model is not telling the researcher what she wants to hear. It is not specification gaming in the classical sense — no reward signal is being hacked. The model has simply substituted completing the structure for serving the purpose the structure exists for. The instrumental goal has displaced the terminal goal.</p>
<p>This paper argues that this failure mode is: (a) systematic and predictable, not idiosyncratic; (b) structurally identical to a well-documented phenomenon in organizational sociology; (c) the inverse of how humans respond to the same constraints; and (d) quantifiable through a specific experimental design.</p>
<hr>
</section>
<section id="background-and-motivation" class="level2" data-number="2">
<h2 data-number="2" class="anchored" data-anchor-id="background-and-motivation"><span class="header-section-number">2</span> Background and Motivation</h2>
<section id="what-has-been-named-and-what-has-not" class="level3" data-number="2.1">
<h3 data-number="2.1" class="anchored" data-anchor-id="what-has-been-named-and-what-has-not"><span class="header-section-number">2.1</span> What has been named, and what has not</h3>
<p>The LLM alignment literature has developed a family of related concepts for failures at the boundary of intent and execution.</p>
<p><strong>Specification gaming</strong> occurs when a model achieves the literal specification of an objective without achieving the intended outcome <span class="citation" data-cites="amodei2016concrete">(Amodei et al. 2016)</span>. The canonical examples are from reinforcement learning: a boat-racing agent that maximizes score by circling checkpoints rather than finishing the race; a cleaning robot that covers its camera rather than cleaning. In each case, a proxy metric is gamed because the true objective was underspecified. The model finds an unintended solution to the stated objective.</p>
<p><strong>Reward hacking</strong> is the broader category: exploiting flaws or blind spots in the reward model to achieve high proxy reward without satisfying human intent <span class="citation" data-cites="denison2024sycophancy">(<span class="nocase">Denison et al.</span> 2024)</span>. Sycophancy — agreeing with false user claims to generate approval signal — is a trained-in form of reward hacking. Reward tampering is its extreme: modifying the reward mechanism itself.</p>
<p><strong>Goal misgeneralization</strong> occurs when a model pursues a proxy goal that correlated with the intended goal during training but diverges from it in deployment, particularly under distributional shift <span class="citation" data-cites="langosco2022goal">(Langosco di Langosco et al. 2022)</span>. The model has learned the wrong goal; it did not fail to execute the right one.</p>
<p>None of these concepts describes the failure in the opening example. In that case:</p>
<ul>
<li>The objective is not underspecified. The researcher stated what she wanted.</li>
<li>No reward signal is being gamed. There is no approval-seeking behavior.</li>
<li>There is no distributional shift. The task is squarely in-distribution for the model.</li>
<li>The model has not learned the wrong goal. It has correctly identified the immediate task.</li>
</ul>
<p>What has happened is different: the model has treated the task’s instrumental structure — the log format, the section conventions, the expected length and scope — as the thing to be optimized, and in doing so has lost track of the terminal goal the task was supposed to serve. The artifact is complete. The purpose is not.</p>
</section>
<section id="the-cross-domain-precedent" class="level3" data-number="2.2">
<h3 data-number="2.2" class="anchored" data-anchor-id="the-cross-domain-precedent"><span class="header-section-number">2.2</span> The cross-domain precedent</h3>
<p>This failure mode has a precise name in organizational sociology. Merton (1940) described it in bureaucracies: rules designed as means to an end become ends in themselves through a process he called <strong>goal displacement</strong>. The bureaucrat adheres to every rule, satisfies every procedure, and fails every client. Merton called the extreme case “the bureaucratic virtuoso, who never forgets a single rule binding his action and hence is unable to assist many of his clients” <span class="citation" data-cites="merton1940bureaucratic">(Merton 1940, 563)</span>.</p>
<p>The psychological mechanism Merton cited was Allport’s (1937) <strong>functional autonomy of motives</strong> — the principle that instrumental behaviors can become self-sustaining and motivationally independent from their original purpose. “What was once a means becomes an end in itself” <span class="citation" data-cites="allport1937personality">(Allport 1937)</span>. Allport’s workman who continues to do clean-cut jobs even when his security no longer depends on it is the benign version; Merton’s bureaucrat is the organizational pathology.</p>
<p>The measurement-science parallel is Goodhart’s Law (1975): when a measure becomes a target, it ceases to be a good measure <span class="citation" data-cites="goodhart1975problems">(Goodhart 1975)</span>. Campbell (1979) stated the same principle from social science: the more a quantitative indicator is used for social decision-making, the more it distorts the process it was meant to monitor <span class="citation" data-cites="campbell1979assessing">(Campbell 1979)</span>. In each case: a proxy for a goal displaces the goal.</p>
<p>What connects these traditions is a shared structural logic. An instrumental value — a rule, a metric, a procedure — is created to serve a terminal goal. Under conditions that reward adherence to the instrumental value independent of terminal goal service, the instrumental value becomes terminal. The original goal disappears from view.</p>
<hr>
</section>
</section>
<section id="the-theoretical-claim" class="level2" data-number="3">
<h2 data-number="3" class="anchored" data-anchor-id="the-theoretical-claim"><span class="header-section-number">3</span> The Theoretical Claim</h2>
<section id="instrumental-terminal-goal-inversion-defined" class="level3" data-number="3.1">
<h3 data-number="3.1" class="anchored" data-anchor-id="instrumental-terminal-goal-inversion-defined"><span class="header-section-number">3.1</span> Instrumental-terminal goal inversion defined</h3>
<p>We define <strong>instrumental-terminal goal inversion</strong> (ITGI) as follows:</p>
<blockquote class="blockquote">
<p>Given a task with an explicit terminal goal <img src="https://latex.codecogs.com/png.latex?G_T"> and a set of instrumental constraints <img src="https://latex.codecogs.com/png.latex?C%20=%20%5C%7Bc_1,%20c_2,%20%5Cldots,%20c_n%5C%7D"> specified to serve <img src="https://latex.codecogs.com/png.latex?G_T">, ITGI occurs when a model’s output satisfies <img src="https://latex.codecogs.com/png.latex?C"> while failing to serve <img src="https://latex.codecogs.com/png.latex?G_T">, and this failure is attributable to the model treating satisfaction of <img src="https://latex.codecogs.com/png.latex?C"> as sufficient for task completion.</p>
</blockquote>
<p>The key conditions distinguishing ITGI from related phenomena:</p>
<ol type="1">
<li><img src="https://latex.codecogs.com/png.latex?G_T"> is stated in the prompt, not merely implied or inferable from reward signal.</li>
<li><img src="https://latex.codecogs.com/png.latex?C"> is explicitly specified (format requirements, section structure, output length, schema).</li>
<li>The model’s output satisfies all or most elements of <img src="https://latex.codecogs.com/png.latex?C">.</li>
<li>The model’s output does not serve <img src="https://latex.codecogs.com/png.latex?G_T"> — specifically, a reader with only the output cannot accomplish what <img src="https://latex.codecogs.com/png.latex?G_T"> required.</li>
<li>The failure is not attributable to factual error, hallucination, or task misunderstanding.</li>
</ol>
<p>This distinguishes ITGI from specification gaming (where <img src="https://latex.codecogs.com/png.latex?G_T"> is underspecified), from sycophancy (where the model is optimizing for approval), and from goal misgeneralization (where distributional shift causes a wrong goal to be pursued).</p>
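<p>The five conditions above can be read as a single conjunctive test. The following is a minimal sketch, not an evaluation instrument: the <code>TaskOutcome</code> fields are hypothetical rater judgments, and the 0.8 cut-off for “all or most elements” is an assumption chosen only for illustration.</p>

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskOutcome:
    """One evaluated model output; every field is a hypothetical rater judgment."""
    goal_stated_in_prompt: bool     # condition 1: G_T appears in the prompt
    constraints_explicit: bool      # condition 2: C is explicitly specified
    constraint_satisfaction: float  # condition 3: fraction of C satisfied, 0.0 to 1.0
    serves_terminal_goal: bool      # condition 4: a reader can do what G_T required
    has_other_failure: bool         # condition 5: factual error, hallucination, misunderstanding


def is_itgi(outcome: TaskOutcome, min_satisfaction: float = 0.8) -> bool:
    """Return True only when all five distinguishing conditions hold.

    min_satisfaction is an assumed threshold for "all or most elements of C".
    """
    return (
        outcome.goal_stated_in_prompt
        and outcome.constraints_explicit
        and outcome.constraint_satisfaction >= min_satisfaction
        and not outcome.serves_terminal_goal
        and not outcome.has_other_failure
    )


# The lab-log example: goal stated, structure specified and fully met,
# purpose not served, no factual errors.
lab_log = TaskOutcome(True, True, 1.0, False, False)
print(is_itgi(lab_log))  # True
```

<p>An output that fails the terminal goal because of a hallucinated claim would set <code>has_other_failure</code> and fall outside the definition, which is the separation from neighbouring failure modes that the conditions are meant to enforce.</p>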
</section>
<section id="the-structural-specification-hypothesis" class="level3" data-number="3.2">
<h3 data-number="3.2" class="anchored" data-anchor-id="the-structural-specification-hypothesis"><span class="header-section-number">3.2</span> The structural specification hypothesis</h3>
<p>ITGI is not merely possible; we hypothesize it is <strong>monotonically increasing in structural specification density</strong>. As the number and specificity of instrumental constraints in a prompt increases, the probability that the model’s output serves the terminal goal decreases, holding terminal goal clarity constant.</p>
<p>This is the counterintuitive claim. For humans, constraints serve as scaffolding — they reduce cognitive overhead allocated to the <em>how</em>, freeing attention for the <em>why</em>. A researcher given a template for a log entry is freed from formatting decisions and can concentrate on what the log should say. The constraints help.</p>
<p>For LLMs, the hypothesis is that the inverse holds: each additional constraint is an additional optimization target, and as constraint density increases, the model’s attention allocation shifts from <img src="https://latex.codecogs.com/png.latex?G_T"> to <img src="https://latex.codecogs.com/png.latex?C">. The constraints crowd out the goal.</p>
<p>The empirical support for this direction comes from Sridhar et al.&nbsp;(2023), whose ASH (Actor-Summarizer-Hierarchical) prompting work on web navigation demonstrated that when a single LLM prompt must simultaneously process raw environmental observations and predict the next action, performance degrades sharply on long-horizon tasks. On trajectories exceeding 11 steps, ReAct, which loads both observation processing and action prediction into a single prompt, scored 7.4; ASH, which separates these functions into a SUMMARIZER and an ACTOR, scored 38.2 <span class="citation" data-cites="sridhar2023hierarchical">(Sridhar et al. 2023)</span>. The implicit diagnosis: when a model must manage simultaneous instrumental load (process the current observation) and terminal goal tracking (buy the right product), terminal goal tracking degrades first. The fix, hierarchical decomposition that isolates instrumental processing, is structural evidence for the hypothesis.</p>
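<p>The monotonicity hypothesis stated at the start of this section has a simple operational form: order the conditions by specification density and check that the share of outputs failing the terminal goal never falls. The sketch below assumes every output already satisfies its constraints; the function names and the rater judgments are hypothetical.</p>

```python
def goal_failure_rate(served_flags):
    """Share of outputs that did not serve the terminal goal (True means served)."""
    return 1 - sum(served_flags) / len(served_flags)


def is_monotone_increasing(rates):
    """True when each rate is at least as large as the one before it."""
    return all(later >= earlier for earlier, later in zip(rates, rates[1:]))


# Hypothetical rater judgments, ordered from thin to dense specification.
served_by_density = [
    [True, True, True, False],    # thin
    [True, True, False, False],   # moderate
    [True, False, False, False],  # dense
]
rates = [goal_failure_rate(flags) for flags in served_by_density]
print(rates)                          # [0.25, 0.5, 0.75]
print(is_monotone_increasing(rates))  # True
```

<p>A real test would need confidence intervals around each rate; the check here only shows what pattern the hypothesis predicts.</p>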
</section>
<section id="the-inverse-human-pattern" class="level3" data-number="3.3">
<h3 data-number="3.3" class="anchored" data-anchor-id="the-inverse-human-pattern"><span class="header-section-number">3.3</span> The inverse human pattern</h3>
<p>The human behavioral literature on intention establishes the baseline against which ITGI is the inversion. The intention-action gap — the well-documented failure of humans to execute their stated intentions — shows that humans hold terminal goals but frequently fail on the instrumental side <span class="citation" data-cites="sheeran2002intention sheeran2016gap">(Sheeran 2002; Sheeran and Webb 2016)</span>. Intentions explain only 18–28% of behavioral variance, even when the intention is strong and clearly stated.</p>
<p>The LLM failure runs in the opposite direction. Models execute the instrumental structure reliably and completely. What they fail to maintain is the terminal goal. Humans fail at doing; LLMs fail at purposing.</p>
<p>This inversion is not merely a rhetorical point. It has methodological implications for how the failure should be studied, and design implications for how it might be mitigated. Strategies developed to close the human intention-action gap — implementation intentions, commitment devices, environmental triggers — work by strengthening the link between a held terminal goal and instrumental execution. The LLM problem requires the reverse: strengthening the link between instrumental execution and a terminal goal that has not been lost but has been deprioritized.</p>
<hr>
</section>
</section>
<section id="cross-domain-synthesis" class="level2" data-number="4">
<h2 data-number="4" class="anchored" data-anchor-id="cross-domain-synthesis"><span class="header-section-number">4</span> Cross-Domain Synthesis</h2>
<section id="the-common-structure" class="level3" data-number="4.1">
<h3 data-number="4.1" class="anchored" data-anchor-id="the-common-structure"><span class="header-section-number">4.1</span> The common structure</h3>
<p>Across the organizational sociology, measurement science, and LLM literatures, a single structural pattern recurs:</p>
<ol type="1">
<li>A <strong>terminal goal</strong> exists: serve the client, measure economic health, help the researcher.</li>
<li>An <strong>instrumental proxy</strong> is created to serve the terminal goal: follow the rules, track the money supply, complete the artifact.</li>
<li>Under conditions where adherence to the proxy is rewarded independent of terminal goal service, the proxy <strong>displaces</strong> the terminal goal.</li>
<li>The agent — bureaucrat, central bank, LLM — then optimizes the proxy while failing the original goal.</li>
</ol>
<p>What varies across domains is the <em>mechanism</em> of displacement:</p>
<ul>
<li>In bureaucracies, displacement is driven by <strong>incentive structures</strong>: career advancement depends on rule compliance, not client outcomes.</li>
<li>In measurement systems, displacement is driven by <strong>optimization pressure</strong>: when a metric becomes a target, actors game it.</li>
<li>In LLMs, displacement is driven by <strong>attention allocation during inference</strong>: satisfying explicit constraints is a local, verifiable task; serving a terminal goal requires maintaining a non-local purpose across the response.</li>
</ul>
<p>The LLM mechanism is distinct from the human mechanisms in an important way. Bureaucratic ritualism is chosen — the bureaucrat has other options and selects rule compliance. Metric gaming is strategic — the actor knows the metric is a proxy and exploits the gap. LLM ITGI is neither chosen nor strategic. The model does not know it has displaced the terminal goal. The displacement is a property of how inference proceeds when constraints are dense, not a property of motivation or strategy.</p>
</section>
<section id="allports-functional-autonomy-as-the-deepest-analog" class="level3" data-number="4.2">
<h3 data-number="4.2" class="anchored" data-anchor-id="allports-functional-autonomy-as-the-deepest-analog"><span class="header-section-number">4.2</span> Allport’s functional autonomy as the deepest analog</h3>
<p>Allport’s functional autonomy <span class="citation" data-cites="allport1937personality">(Allport 1937)</span> is the closest structural analog to LLM ITGI — and also the most illuminating difference. Allport showed that instrumental behaviors can become self-sustaining: a motive that originates as a means to an end acquires its own motivational energy, independent of the original end. The workman who does clean-cut jobs even when his income no longer depends on it has developed a functionally autonomous motive for craftsmanship.</p>
<p>In humans, this is generally adaptive: functionally autonomous motives allow complex behaviors to persist without continuous reference to their original justification. The craftsman doesn’t recalculate the utility of quality work every time he picks up a tool.</p>
<p>In LLMs, the analog fails to generalize adaptively. There is no “persistence of a motive” because there is no motive in the psychological sense. What exists instead is a training distribution that rewards well-formed artifacts, and an inference-time process that generates the most plausible completion of a prompt that already contains an elaborate structure. The structure predicts its own completion. The terminal goal, if not redundantly encoded in ways that compete with the structural signal, loses salience.</p>
</section>
<section id="goodhart-as-the-measurement-science-frame" class="level3" data-number="4.3">
<h3 data-number="4.3" class="anchored" data-anchor-id="goodhart-as-the-measurement-science-frame"><span class="header-section-number">4.3</span> Goodhart as the measurement-science frame</h3>
<p>Manheim and Garrabrant (2018) distinguish four variants of Goodhart’s Law: regressional (the proxy correlates imperfectly with the goal), extremal (the proxy diverges from the goal at extreme optimization), causal (optimizing the proxy changes the underlying relationship), and adversarial (an agent exploits the gap between proxy and goal) <span class="citation" data-cites="manheim2018categorizing">(Manheim and Garrabrant 2018)</span>.</p>
<p>LLM ITGI most closely resembles the <strong>regressional variant</strong>: the proxy (artifact completion) correlates with the terminal goal (task purpose) under normal conditions but diverges when structural specification is dense. The correlation holds for simple tasks with thin constraints; it breaks down as constraint density increases.</p>
<p>This framing is useful because it predicts where ITGI will be most severe: tasks with elaborate templates, multi-section output requirements, rigid format constraints, and complex schemas. These are, not coincidentally, the tasks where LLMs are most commonly deployed in professional and research settings — report generation, document drafting, structured analysis, code documentation.</p>
<hr>
</section>
</section>
<section id="experimental-framework" class="level2" data-number="5">
<h2 data-number="5" class="anchored" data-anchor-id="experimental-framework"><span class="header-section-number">5</span> Experimental Framework</h2>
<section id="what-needs-to-be-shown" class="level3" data-number="5.1">
<h3 data-number="5.1" class="anchored" data-anchor-id="what-needs-to-be-shown"><span class="header-section-number">5.1</span> What needs to be shown</h3>
<p>Three empirical claims require testing:</p>
<ol type="1">
<li><strong>Existence</strong>: ITGI occurs at inference time in current LLMs — outputs that satisfy structural constraints while failing terminal goals.</li>
<li><strong>Monotonicity</strong>: ITGI increases as structural specification density increases, holding terminal goal clarity constant.</li>
<li><strong>Asymmetry</strong>: The relationship between structural specification and ITGI is different for LLMs than for humans performing the same tasks.</li>
</ol>
</section>
<section id="core-design" class="level3" data-number="5.2">
<h3 data-number="5.2" class="anchored" data-anchor-id="core-design"><span class="header-section-number">5.2</span> Core design</h3>
<p><strong>Task pairs with separable terminal and instrumental goals.</strong> The key design requirement is that <img src="https://latex.codecogs.com/png.latex?G_T"> and <img src="https://latex.codecogs.com/png.latex?C"> can be independently evaluated. Tasks where artifact completion and purpose-serving are inseparable are uninformative.</p>
<p>Suitable task types:</p>
<ul>
<li><em>Synthesis tasks</em>: Summarize this paper in a way that helps a reader decide whether to read it. Instrumental: produce a summary of appropriate length and scope. Terminal: enable the decision.</li>
<li><em>Advisory tasks</em>: Draft a note explaining this finding to a non-specialist audience. Instrumental: produce a note in the specified format. Terminal: the reader understands the finding.</li>
<li><em>Selection tasks</em>: Write a log entry for this source that captures what is relevant to Project X. Instrumental: produce a log entry. Terminal: a future researcher can use it without reading the source.</li>
</ul>
<p><strong>Specification density as the independent variable.</strong> Three conditions:</p>
<ul>
<li><em>Condition A (thin)</em>: Terminal goal stated only. No format, length, or section requirements.</li>
<li><em>Condition B (moderate)</em>: Terminal goal stated plus moderate structure (suggested sections, approximate length).</li>
<li><em>Condition C (dense)</em>: Terminal goal stated plus elaborate structure (required section headers, word count constraints, mandatory elements).</li>
</ul>
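<p>The three conditions differ only in how much instrumental structure follows an identical terminal goal. A minimal sketch of how such prompts could be assembled is given below; the constraint wording, word counts, and section names are invented for illustration and are not part of the proposed protocol.</p>

```python
TERMINAL_GOAL = (
    "Summarize this paper in a way that helps a reader decide whether to read it."
)

# Hypothetical constraint wording for each density level.
CONSTRAINTS = {
    "A": [],
    "B": [
        "Suggested sections: contribution, method, relevance.",
        "Aim for roughly 200 words.",
    ],
    "C": [
        "Use exactly these headers: Contribution, Method, Findings, Limitations, Relevance.",
        "Write 40 to 60 words under each header.",
        "Include at least one citation and one stated limitation.",
    ],
}


def build_prompt(condition: str) -> str:
    """Place the same terminal goal first, then the condition's constraints."""
    lines = [TERMINAL_GOAL] + CONSTRAINTS[condition]
    return "\n".join(lines)


prompts = {name: build_prompt(name) for name in CONSTRAINTS}
for name, prompt in prompts.items():
    print(name, len(prompt.splitlines()), "line(s)")
```

<p>Keeping the terminal goal string byte-identical across conditions is one way to hold terminal goal clarity constant while density varies.</p>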
<p><strong>Measurement of terminal goal service.</strong> The challenge is avoiding subjective evaluation. Three approaches, in order of increasing defensibility:</p>
<ol type="1">
<li><strong>Downstream task completion</strong>: Give readers only the output and ask them to accomplish what <img src="https://latex.codecogs.com/png.latex?G_T"> required (make the decision, explain the finding to someone else, use the log entry without the source). Measure success rate.</li>
<li><strong>Counterfactual completeness</strong>: Have domain experts identify the 3–5 elements an output <em>must</em> contain to serve <img src="https://latex.codecogs.com/png.latex?G_T">. Score presence/absence. ITGI predicts that Condition C outputs will score lower on this list despite longer overall length and higher structural compliance.</li>
<li><strong>Truncation sensitivity</strong>: Progressively shorten outputs from the end. Measure at what point <img src="https://latex.codecogs.com/png.latex?G_T">-relevant content disappears vs.&nbsp;at what point structural completeness fails. ITGI predicts these diverge, with <img src="https://latex.codecogs.com/png.latex?G_T"> content concentrated early and structural completion content concentrated late.</li>
</ol>
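<p>Counterfactual completeness and truncation sensitivity can share one scoring routine. The sketch below is an assumption-laden stand-in: it uses case-insensitive substring matching where the design calls for expert judgment, truncates by whole paragraphs, and runs on an invented output and element list.</p>

```python
def completeness_score(output: str, required_elements: list[str]) -> float:
    """Fraction of expert-listed elements found in the output.

    Substring matching is a crude placeholder for expert presence/absence scoring.
    """
    text = output.lower()
    found = sum(1 for element in required_elements if element.lower() in text)
    return found / len(required_elements)


def truncation_profile(paragraphs: list[str], required_elements: list[str]) -> list[float]:
    """Completeness after keeping only the first k paragraphs, for k = 1..n."""
    return [
        completeness_score(" ".join(paragraphs[:k]), required_elements)
        for k in range(1, len(paragraphs) + 1)
    ]


# Hypothetical four-paragraph output and hypothetical expert list.
paragraphs = [
    "Key insight: splitting the work into stages helps on long tasks.",
    "Method: one stage condenses the input before the next stage acts.",
    "Limitations: none noted here.",
    "What to retain: see above.",
]
required = ["splitting the work", "long tasks", "condenses the input"]

profile = truncation_profile(paragraphs, required)
print(profile)
```

<p>In this invented case the profile reaches 1.0 after the second paragraph, so the last two paragraphs add structural completeness without adding goal-relevant content. That is the divergence the truncation measure is designed to expose.</p>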
<p><strong>Human baseline.</strong> The asymmetry claim (Claim 3) requires human participants completing the same tasks under the same three conditions. If ITGI is real and inverted from human behavior, Condition C outputs from humans should be <em>more</em> purpose-serving than their Condition A outputs, while Condition C outputs from LLMs should be <em>less</em> purpose-serving than their Condition A outputs.</p>
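<p>The asymmetry prediction is an interaction between agent type and condition, which can be summarized as a difference of differences. The sketch below assumes a single purpose-serving rate per agent and condition; the function names and all rates are hypothetical.</p>

```python
def density_effect(rate_thin: float, rate_dense: float) -> float:
    """Change in purpose-serving rate from Condition A to Condition C."""
    return rate_dense - rate_thin


def asymmetry_contrast(human: dict, llm: dict) -> float:
    """Difference-in-differences: human density effect minus LLM density effect."""
    return density_effect(human["A"], human["C"]) - density_effect(llm["A"], llm["C"])


# Hypothetical purpose-serving rates.
human_rates = {"A": 0.50, "C": 0.75}
llm_rates = {"A": 0.75, "C": 0.25}

print(density_effect(human_rates["A"], human_rates["C"]))  # 0.25
print(density_effect(llm_rates["A"], llm_rates["C"]))      # -0.5
print(asymmetry_contrast(human_rates, llm_rates))          # 0.75
```

<p>The prediction is a positive human effect, a negative LLM effect, and therefore a positive contrast. A contrast near zero would count against Claim 3 even if ITGI exists in LLMs.</p>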
<hr>
</section>
</section>
<section id="scope-and-limitations" class="level2" data-number="6">
<h2 data-number="6" class="anchored" data-anchor-id="scope-and-limitations"><span class="header-section-number">6</span> Scope and Limitations</h2>
<section id="what-this-thesis-does-not-claim" class="level3" data-number="6.1">
<h3 data-number="6.1" class="anchored" data-anchor-id="what-this-thesis-does-not-claim"><span class="header-section-number">6.1</span> What this thesis does not claim</h3>
<p>ITGI is not claimed to be:</p>
<ul>
<li>The dominant failure mode of LLMs, or more common than hallucination, sycophancy, or factual error.</li>
<li>Present in all structured tasks. Tasks where structural compliance and terminal goal service are tightly correlated will not exhibit ITGI.</li>
<li>A property of current models specifically. Whether ITGI increases or decreases with model scale, RLHF, or chain-of-thought prompting is an empirical question this thesis does not answer.</li>
<li>A training-time phenomenon. The claim is about inference-time behavior given well-formed prompts.</li>
</ul>
</section>
<section id="confounds-requiring-control" class="level3" data-number="6.2">
<h3 data-number="6.2" class="anchored" data-anchor-id="confounds-requiring-control"><span class="header-section-number">6.2</span> Confounds requiring control</h3>
<ul>
<li><strong>Task difficulty</strong>: More structurally complex tasks may simply be harder, producing lower overall quality independently of ITGI.</li>
<li><strong>Length bias</strong>: Condition C prompts produce longer outputs, and longer outputs may dilute the concentration of <img src="https://latex.codecogs.com/png.latex?G_T">-relevant content without reflecting goal displacement.</li>
<li><strong>Model-specific behavior</strong>: Different models may show different ITGI rates. The structural specification hypothesis should be tested across model families, not assumed to generalize from a single model.</li>
</ul>
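<p>One way to keep the length confound separate from goal displacement is to record both whether the required content is present and how concentrated it is. The sketch below is one possible operationalization under stated assumptions, not a validated procedure: the field names, the labels, and the decision rule are all hypothetical.</p>

```python
def relevant_density(relevant_words: int, total_words: int) -> float:
    """Share of an output's words that experts marked as G_T-relevant."""
    return relevant_words / total_words


def classify_change(thin: dict, dense: dict) -> str:
    """Compare a thin-condition and a dense-condition output for the same task.

    Each dict holds hypothetical fields: 'completeness' (0.0 to 1.0),
    'relevant_words', and 'total_words'.
    """
    if dense["completeness"] < thin["completeness"]:
        return "candidate displacement"  # required content is missing
    thin_density = relevant_density(thin["relevant_words"], thin["total_words"])
    dense_density = relevant_density(dense["relevant_words"], dense["total_words"])
    if dense_density < thin_density:
        return "dilution only"           # same content, more surrounding text
    return "no change"


thin_output = {"completeness": 1.0, "relevant_words": 60, "total_words": 120}
dense_output = {"completeness": 1.0, "relevant_words": 60, "total_words": 400}
print(classify_change(thin_output, dense_output))  # dilution only
```

<p>Under this rule, only a loss of required content counts toward ITGI; a longer output that still contains everything is attributed to length.</p>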
<hr>
</section>
</section>
<section id="authorship-note" class="level2" data-number="7">
<h2 data-number="7" class="anchored" data-anchor-id="authorship-note"><span class="header-section-number">7</span> Authorship Note</h2>
<p>Lalitha A R identified the phenomenon from a specific instance — a log entry that satisfied its structural requirements while failing its purpose — and connected it to an earlier observation documented in the iSRL GitHub discussion (isrl-research/discussions/10). She searched for analogues in the behavioral science literature, found The Decision Lab’s treatment of the intention-action gap, and identified the inversion: that the LLM failure runs opposite to the human failure.</p>
<p>Claude searched the academic literature, confirmed Merton’s goal displacement as the relevant organizational sociology tradition, identified Allport and Goodhart as the upstream sources, proposed the structural specification hypothesis as the testable form of the claim, and drafted this paper from the resulting papertable. The core observation, the inversion framing, and the cross-domain question are Lalitha’s. The literature mapping, experimental design, and written synthesis are Claude’s.</p>
<hr>
</section>
<section id="references" class="level2" data-number="8">
<h2 data-number="8" class="anchored" data-anchor-id="references"><span class="header-section-number">8</span> References</h2>
<div id="refs" class="references csl-bib-body hanging-indent">
<div id="ref-allport1937personality" class="csl-entry">
Allport, Gordon W. 1937. <em>Personality: A Psychological Interpretation</em>. Holt.
</div>
<div id="ref-amodei2016concrete" class="csl-entry">
Amodei, Dario, Chris Olah, Jacob Steinhardt, Paul Christiano, John Schulman, and Dan Mané. 2016. <em>Concrete Problems in <span>AI</span> Safety</em>. <a href="https://arxiv.org/abs/1606.06565">https://arxiv.org/abs/1606.06565</a>.
</div>
<div id="ref-campbell1979assessing" class="csl-entry">
Campbell, Donald T. 1979. <span>“Assessing the Impact of Planned Social Change.”</span> <em>Evaluation and Program Planning</em> 2 (1): 67–90. <a href="https://doi.org/10.1016/0149-7189(79)90048-X">https://doi.org/10.1016/0149-7189(79)90048-X</a>.
</div>
<div id="ref-denison2024sycophancy" class="csl-entry">
<span class="nocase">Denison, Carson, Monte MacDiarmid, Fazl Barez, et al.</span> 2024. <em>Sycophancy to Subterfuge: Investigating Reward Tampering in Large Language Models</em>. <a href="https://arxiv.org/abs/2406.10162">https://arxiv.org/abs/2406.10162</a>.
</div>
<div id="ref-goodhart1975problems" class="csl-entry">
Goodhart, Charles A. E. 1975. <span>“Problems of Monetary Management: The <span>U.K.</span> Experience.”</span> In <em>Papers in Monetary Economics</em>. Reserve Bank of Australia.
</div>
<div id="ref-langosco2022goal" class="csl-entry">
Langosco di Langosco, Lauro, Jack Koch, Lee D. Sharkey, Jacob Pfau, and David Krueger. 2022. <span>“Goal Misgeneralization in Deep Reinforcement Learning.”</span> <em>Proceedings of the 39th International Conference on Machine Learning</em>, Proceedings of Machine Learning Research, vol. 162: 12004–19.
</div>
<div id="ref-manheim2018categorizing" class="csl-entry">
Manheim, David, and Scott Garrabrant. 2018. <em>Categorizing Variants of <span class="nocase">Goodhart’s Law</span></em>. <a href="https://arxiv.org/abs/1803.04585">https://arxiv.org/abs/1803.04585</a>.
</div>
<div id="ref-merton1940bureaucratic" class="csl-entry">
Merton, Robert K. 1940. <span>“Bureaucratic Structure and Personality.”</span> <em>Social Forces</em> 18 (4): 560–68. <a href="https://doi.org/10.2307/2570634">https://doi.org/10.2307/2570634</a>.
</div>
<div id="ref-sheeran2002intention" class="csl-entry">
Sheeran, Paschal. 2002. <span>“Intention–Behavior Relations: A Conceptual and Empirical Review.”</span> <em>European Review of Social Psychology</em> 12 (1): 1–36. <a href="https://doi.org/10.1080/14792772143000003">https://doi.org/10.1080/14792772143000003</a>.
</div>
<div id="ref-sheeran2016gap" class="csl-entry">
Sheeran, Paschal, and Thomas L. Webb. 2016. <span>“The Intention–Behavior Gap.”</span> <em>Social and Personality Psychology Compass</em> 10 (9): 503–18. <a href="https://doi.org/10.1111/spc3.12265">https://doi.org/10.1111/spc3.12265</a>.
</div>
<div id="ref-sridhar2023hierarchical" class="csl-entry">
Sridhar, Abishek, Robert Lo, Frank F. Xu, Hao Zhu, and Shuyan Zhou. 2023. <em>Hierarchical Prompting Assists Large Language Model on Web Navigation</em>. <a href="https://arxiv.org/abs/2305.14257">https://arxiv.org/abs/2305.14257</a>.
</div>
</div>


  
</section>

 ]]></description>
  <guid>https://isrl.in/pub/2026-04-r-itgi/</guid>
  <pubDate>Thu, 02 Apr 2026 00:00:00 GMT</pubDate>
</item>
<item>
  <title>Constrained AI-Assisted Sampling for Fragmented Textual Spaces: A Framework for Data Collection Where No Ground Truth Exists</title>
  <dc:creator>Lalitha A R</dc:creator>
  <link>https://isrl.in/pub/2026-04-m-caas/</link>
  <description><![CDATA[ 




<script>
document.addEventListener('DOMContentLoaded', function() {
  var meta = document.querySelector('#title-block-header .quarto-title-meta');
  if (!meta) return;
  meta.insertAdjacentHTML('beforeend', '<div><div class="quarto-title-meta-heading">Contributors</div><div class="quarto-title-meta-contents"><p class="author" style="margin:0 0 0.1em 0;">Hitha Sunil</p><p style="font-size:0.82em;color:#555;margin:0 0 0.5em 0;font-style:italic;">Typesetting</p></div></div>');
});
</script>
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "ScholarlyArticle",
  "name": "Constrained AI-Assisted Sampling for Fragmented Textual Spaces: A Framework for Data Collection Where No Ground Truth Exists",
  "identifier": "iSRL-26-04-M-CAAS",
  "description": "A framework for constrained AI-assisted sampling in fragmented textual spaces where no ground truth exists and standard survey or ETL assumptions fail — developed for collecting Indian packaged food label data.",
  "creativeWorkStatus": "Draft",
  "license": "https://creativecommons.org/licenses/by/4.0/",
  "url": "https://isrl.in/pub/2026-04-m-caas/",
  "author": {
    "@type": "Person",
    "name": "Lalitha A R",
    "identifier": "https://orcid.org/0009-0001-7466-3531",
    "sameAs": "https://orcid.org/0009-0001-7466-3531",
    "email": "lalithaar.research@gmail.com"
  },
  "publisher": {
    "@type": "ResearchOrganization",
    "name": "iSRL",
    "url": "https://isrl.in"
  }
}
</script>
<section id="abstract" class="level2" data-number="0.1">
<h2 data-number="0.1" class="anchored" data-anchor-id="abstract"><span class="header-section-number">0.1</span> Abstract</h2>
<p>Standard data collection methods begin with one of two assumptions. Survey sampling assumes a population you can enumerate: you know the frame, you draw from it, you account for non-response. ETL pipelines assume a schema you can target: you know what fields exist, what types they carry, what cleaning they require. Both assumptions hold comfortably in well-documented domains.</p>
<p>They do not hold in fragmented textual spaces.</p>
</section>
<section id="sec-problem" class="level1" data-number="1">
<h1 data-number="1"><span class="header-section-number">1</span> The Problem This Solves</h1>
<p>Standard data collection methods begin with one of two assumptions. Survey sampling assumes a population you can enumerate: you know the frame, you draw from it, you account for non-response. ETL pipelines assume a schema you can target: you know what fields exist, what types they carry, what cleaning they require. Both assumptions hold comfortably in well-documented domains.</p>
<p>They do not hold in fragmented textual spaces.</p>
<p>A fragmented textual space is not simply messy data. It is a domain where the information exists — recorded somewhere, in some form — but is distributed across unstructured sources with no shared vocabulary, no authoritative lexicon, and variation patterns that automated similarity measures cannot reliably navigate. The Indian packaged food label space is one example: the same ingredient appears as <code>maida</code>, <code>refined wheat flour</code>, and <code>all-purpose flour</code> across different brands, while <code>palm oil</code> and <code>palmite</code> look similar but are functionally distinct. A global news archive is another: 2.6 million flood events are embedded in articles across 80 languages, with relative time references, imprecise location language, and no standardised event schema.</p>
<p>In both cases, the data exists. The challenge is not absence but structure: extracting something queryable from something that was written for human reading in a specific context, not for machine consumption across contexts.</p>
<p>Traditional approaches to this problem either require labeled training data (which does not exist when you are building the first dataset in a domain) or rely on similarity thresholds (which fail when high-similarity strings are functionally distinct and low-similarity strings are synonymous). CAAS is neither. It uses a language model as a constrained retrieval and parsing tool — not a knowledge source — and builds the validation methodology around the cost structure of the errors it produces.</p>
</section>
<section id="sec-constrained-parser" class="level1" data-number="2">
<h1 data-number="2"><span class="header-section-number">2</span> Why AI as a Constrained Parser, Not a Generator</h1>
<p>The distinction that defines CAAS is what the model is being asked to do.</p>
<p>An unconstrained language model asked for ingredient information about a product it cannot find will often return plausible-sounding ingredients inferred from the product category. Asked what floods occurred in Mumbai last Tuesday, it will approximate. This behaviour — helpfulness in the face of absence — is the default and it is catastrophic for data collection. A fabricated entry looks identical to a real one. It corrupts the dataset invisibly, without a flag, without a gap that signals something is wrong.</p>
<p>CAAS uses the model differently. The model is given a retrieval task with a defined source list, a structured output schema, and an explicit instruction: if the information is not present in the permitted sources, return a designated failure token. It is not asked to know. It is asked to fetch and parse, with explicit failure as a first-class output.</p>
<p>The practical implementation has three components. Temperature is set to 0, which makes the model select the highest-probability token at each step, so identical input produces near-identical output (Section&nbsp;6 describes where this guarantee weakens). Sources are whitelisted: the model searches only pre-specified domains in a defined priority order. Failure is standardised: <code>DATA_NOT_FOUND</code> (or its equivalent) is the required output when all sources are exhausted, not an approximation and not an empty string.</p>
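<p>A minimal sketch of the retrieval contract described above, assuming Python. The <code>fetch</code> callable stands in for whatever constrained model or search call an implementation makes at temperature 0. It, the function name <code>retrieve</code>, and the record layout are hypothetical and not part of the documented pipeline. Only the <code>DATA_NOT_FOUND</code> token comes from the text.</p>

```python
from typing import Callable, Optional

FAILURE_TOKEN = "DATA_NOT_FOUND"  # the designated failure output from the text


def retrieve(product: str,
             sources: list[str],
             fetch: Callable[[str, str], Optional[str]]) -> dict:
    """Try each whitelisted source in priority order and never approximate.

    `fetch(product, source)` is a hypothetical stand-in for the constrained
    model call; it returns the extracted text, or None if the source has none.
    """
    for source in sources:  # list order is the priority order
        text = fetch(product, source)
        if text is not None and text.strip():
            return {"product": product, "source": source, "value": text.strip()}
    # All permitted sources exhausted: return the token, not a guess or "".
    return {"product": product, "source": None, "value": FAILURE_TOKEN}
```

<p>Keeping the failure token as an ordinary return value, rather than an exception or an empty string, means a gap is stored in the dataset in the same shape as a hit and can be counted later.</p>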
<p>The result is a system with two modes: it found the data and returned it, or it did not find the data and said so. Both modes are informative. The first populates the dataset. The second marks a gap that can be addressed through additional collection or acknowledged as a limitation. Neither mode silently fabricates.</p>
</section>
<section id="sec-error-cost" class="level1" data-number="3">
<h1 data-number="3"><span class="header-section-number">3</span> The Cost of Error Correction</h1>
<p>The strongest argument for CAAS is not its precision. It is its error economics.</p>
<p>In traditional physical sampling — a blood test, a field survey, a clinical measurement — a wrong sample means repeating the physical act. The cost of error correction is the cost of the original collection: the clinician’s time, the travel, the reagent. This makes high accuracy a hard requirement before you can afford to act on the data.</p>
<p>In constrained AI-assisted sampling over existing textual data, a wrong extraction means a refetch. The source data already exists. The text is already on a server somewhere. Correcting an extraction error costs one additional API call and a human review of one record. The marginal cost is low.</p>
<p>This asymmetry changes what accuracy level is sufficient. A 99% accurate physical sample with 1% requiring full re-collection is a serious problem. A 99% accurate AI extraction with 1% requiring a refetch is, in most contexts, acceptable — provided the 1% is identifiable. The validation methodology in CAAS is designed to make errors identifiable: statistical sampling establishes a confidence interval on the error rate, iterative audit converges on systematic error patterns, and explicit failure tokens mark the known gaps.</p>
<p>The framework does not claim that AI extraction is as accurate as careful manual collection. It claims that for many fragmented textual spaces, constrained AI extraction at documented accuracy levels is more useful than no dataset, more honest than an approximated one, and more recoverable when wrong than a physical sampling error.</p>
</section>
<section id="sec-framework" class="level1" data-number="4">
<h1 data-number="4"><span class="header-section-number">4</span> The Framework</h1>
<p>CAAS is not a fixed pipeline. It is a set of decisions that any implementation in a fragmented textual space will need to make, with evidence from two implementations on what those decisions should be and why.</p>
<section id="sec-atomic" class="level2" data-number="4.1">
<h2 data-number="4.1" class="anchored" data-anchor-id="sec-atomic"><span class="header-section-number">4.1</span> One Atomic Operation Per API Call</h2>
<p>Passing a full document or a large batch to the model and asking it to extract everything produces degraded constraint adherence as the model’s attention distributes across multiple tasks simultaneously. In both implementations documented here, constraint violations — approximations instead of explicit failures, formatting inconsistencies, missed boundary cases — increased measurably as batch size grew beyond a threshold.</p>
<p>The solution is decomposition. Each API call handles one atomic operation: retrieve the ingredient list for this specific product, or extract the location and timing of this specific flood event from this specific article. The operation is defined narrowly enough that the model can apply the full constraint set reliably.</p>
<p>In the ingredient extraction implementation, the threshold was empirically established at 6 SKUs per batch. Batches above 10 showed measurable constraint violations. Below 6, quality was equivalent but throughput was lower than necessary. The optimal batch size is domain-specific and should be tested rather than assumed.</p>
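<p>The calibration step can be sketched as a small sweep, assuming Python. The function <code>largest_safe_batch</code> and the <code>violation_rate</code> callable are hypothetical names, and the rates in <code>example_rates</code> are invented for illustration. They are not measurements from either implementation.</p>

```python
from typing import Callable, Iterable, Optional


def largest_safe_batch(candidates: Iterable[int],
                       violation_rate: Callable[[int], float],
                       tolerance: float = 0.0) -> Optional[int]:
    """Return the largest candidate size reached before violations exceed tolerance."""
    best = None
    for size in sorted(candidates):
        if violation_rate(size) > tolerance:
            break  # stop at the first size that degrades constraint adherence
        best = size
    return best


# Invented rates for illustration only; these are not measurements from the study.
example_rates = {2: 0.0, 4: 0.0, 6: 0.0, 12: 0.04, 24: 0.11}
chosen = largest_safe_batch(example_rates, example_rates.get)  # 6 for these rates
```

<p>In practice <code>violation_rate</code> would run a pilot batch at each size and count outputs that break the constraint set, such as approximations returned in place of the failure token.</p>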
</section>
<section id="sec-explicit-failure" class="level2" data-number="4.2">
<h2 data-number="4.2" class="anchored" data-anchor-id="sec-explicit-failure"><span class="header-section-number">4.2</span> Explicit Failure Over Approximation</h2>
<p>This decision is described in Section&nbsp;2 and is the single most important constraint in the framework. The system instruction must be unambiguous: when data is absent from permitted sources, return the designated failure token. Do not infer. Do not approximate based on similar cases. Do not fill the gap.</p>
<p>In the ingredient extraction implementation, the system instruction read: <em>“If ingredient list not found in whitelisted domains, return DATA_NOT_FOUND. DO NOT infer typical ingredients from product category. DO NOT approximate based on similar products.”</em></p>
<p>Of 1,000 products attempted, 104 returned persistent <code>DATA_NOT_FOUND</code> across two passes. These 104 were excluded from the corpus. The exclusion is a feature: those products either had no verifiable online ingredient list or were no longer in active distribution. The pipeline returned a clean gap rather than 104 fabricated entries that would have required expensive downstream correction.</p>
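<p>The two-pass handling of failures can be sketched as follows, assuming Python. <code>run_passes</code> and <code>retrieve_one</code> are hypothetical names. The sketch only shows the bookkeeping: failures are retried, and whatever still fails after the last pass is reported as a gap, never filled.</p>

```python
from typing import Callable

FAILURE_TOKEN = "DATA_NOT_FOUND"


def run_passes(products: list[str],
               retrieve_one: Callable[[str], str],
               passes: int = 2) -> tuple[dict, list[str]]:
    """Retry failures for a fixed number of passes, then report the gaps."""
    found = {}
    pending = list(products)
    for _ in range(passes):
        still_missing = []
        for product in pending:
            value = retrieve_one(product)
            if value == FAILURE_TOKEN:
                still_missing.append(product)
            else:
                found[product] = value
        pending = still_missing
    return found, pending  # `pending` holds the persistent failures
```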
<p>In the flood extraction implementation, the equivalent constraint was classification: the model was required to distinguish between reports of actual past floods and articles discussing future warnings or policy — returning nothing for the latter rather than extracting a plausible but incorrect event record.</p>
</section>
<section id="sec-batch-size" class="level2" data-number="4.3">
<h2 data-number="4.3" class="anchored" data-anchor-id="sec-batch-size"><span class="header-section-number">4.3</span> Batch Size as a Quality Variable</h2>
<p>Batch size interacts with constraint adherence in a consistent pattern across both implementations. This is not primarily a cost or speed consideration. It is a quality variable that should be calibrated empirically for each domain and each stage of the pipeline.</p>
<p>In artifact removal and semantic decomposition stages of ingredient processing, batch size was set inversely to string complexity: short strings in batches of 40, complex multi-bracket strings one at a time. The same principle applies in news extraction: article complexity and length affect how reliably the model applies its classification and extraction constraints.</p>
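<p>A sketch of complexity-aware batching, assuming Python. The sizes 40 and 1 come from the text. The rule that two or more opening brackets marks a string as complex is an assumption made for this example, since the text does not define the complexity measure.</p>

```python
def batch_size_for(s: str) -> int:
    """Hypothetical complexity rule: two or more opening brackets means complex."""
    opening = sum(s.count(ch) for ch in "([{")
    return 1 if opening >= 2 else 40


def make_batches(strings: list[str]) -> list[list[str]]:
    """Group simple strings 40 at a time; send complex strings one at a time."""
    simple = [s for s in strings if batch_size_for(s) == 40]
    complex_strings = [s for s in strings if batch_size_for(s) == 1]
    batches = [simple[i:i + 40] for i in range(0, len(simple), 40)]
    batches += [[s] for s in complex_strings]
    return batches
```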
<p>Test a range before committing to a batch size. The optimal value is not predictable from first principles.</p>
</section>
<section id="sec-audit" class="level2" data-number="4.4">
<h2 data-number="4.4" class="anchored" data-anchor-id="sec-audit"><span class="header-section-number">4.4</span> Iterative Human-in-the-Loop Audit</h2>
<p>Statistical validation establishes a confidence interval on the overall error rate. Iterative audit addresses systematic error patterns — categories of errors that recur and can be corrected in bulk.</p>
<p>The audit process runs as follows. A first model receives a sample of the extracted strings and identifies error types present. A second model receives the full extraction and flags instances of those specific error types. Human review resolves the flagged cases. Corrections are applied. The cycle repeats until the first model identifies no new error types.</p>
<p>In the ingredient extraction implementation, this converged in four iterations. The pattern across iterations was: 16.7% flagged in iteration 1, 7.1% in iteration 2, edge cases only in iteration 3, zero new error types in iteration 4. The edge cases in iteration 3 were boundary decisions — <code>gluten</code> classified as a grain or a protein, <code>spirulina</code> as an additive or a botanical — that required domain judgment rather than extraction correction. These were held for the classification framework stage, not resolved as cleaning errors.</p>
<p>Convergence does not mean zero errors. It means no new systematic error types are detectable. The residual error rate is quantified by the statistical sampling step.</p>
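<p>The audit cycle can be sketched as a loop, assuming Python. The three callables are hypothetical stand-ins: <code>find_error_types</code> for the first model (assumed to draw its own sample), <code>flag</code> for the second model, and <code>review</code> for the human step that applies corrections. The loop stops when no new error type is reported, which matches the convergence criterion above and does not imply zero residual errors.</p>

```python
from typing import Callable


def iterative_audit(records: list[str],
                    find_error_types: Callable[[list[str]], set],
                    flag: Callable[[list[str], set], list[int]],
                    review: Callable[[list[str], list[int]], list[str]],
                    max_rounds: int = 10) -> tuple[list[str], list[int]]:
    """Repeat detect, flag, review until no new error types are reported."""
    known_types = set()
    flagged_per_round = []
    for _ in range(max_rounds):
        new_types = set(find_error_types(records)) - known_types
        if not new_types:
            break  # convergence: nothing new, which is not the same as zero errors
        known_types |= new_types
        flagged = flag(records, new_types)
        records = review(records, flagged)  # human-resolved corrections applied
        flagged_per_round.append(len(flagged))
    return records, flagged_per_round
```

<p>The returned per-round counts give the same kind of trace as the flagged percentages reported for each iteration.</p>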
</section>
<section id="sec-validation" class="level2" data-number="4.5">
<h2 data-number="4.5" class="anchored" data-anchor-id="sec-validation"><span class="header-section-number">4.5</span> Statistical Validation with Finite Population Correction</h2>
<p>Complete manual validation is not feasible at scale. Statistical sampling with a confidence interval is.</p>
<p>For a population of size <img src="https://latex.codecogs.com/png.latex?N">, desired confidence level <img src="https://latex.codecogs.com/png.latex?1%20-%20%5Calpha">, and margin of error <img src="https://latex.codecogs.com/png.latex?%5Cdelta">, the required sample size with finite population correction is:</p>
<p><img src="https://latex.codecogs.com/png.latex?%0An%20=%20%5Cfrac%7Bz_%7B%5Calpha/2%7D%5E2%20%5Ccdot%20p(1-p)%7D%7B%5Cdelta%5E2%7D%20%5Ccdot%20%5Cfrac%7BN%7D%7BN%20-%201%20+%20%5Cdfrac%7Bz_%7B%5Calpha/2%7D%5E2%20%5Ccdot%20p(1-p)%7D%7B%5Cdelta%5E2%7D%7D%0A"></p>
<p>Using conservative <img src="https://latex.codecogs.com/png.latex?p%20=%200.5"> (maximum variance), <img src="https://latex.codecogs.com/png.latex?%5Calpha%20=%200.05">, <img src="https://latex.codecogs.com/png.latex?%5Cdelta%20=%200.05">, a population of approximately 2,000 requires a sample of about 323. For the ingredient extraction corpus, 90 extractions from 896 were audited manually. One error was identified (1.1% of the audited sample): the model merged content from two adjacent sections of a product page. The 95% confidence interval on the population error rate, with finite population correction applied, places the upper bound below 3.6%. Stated as accuracy: the point estimate is 98.9%, and the lower bound at 95% confidence is above 96.4%.</p>
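<p>The formula can be checked numerically with a short sketch, assuming Python. The function names are hypothetical. The interval used in <code>error_rate_upper_bound</code> is a normal-approximation (Wald) interval with finite population correction. That choice is an assumption, since the text does not name the interval it used, and with a single observed error the approximation is rough: an exact binomial interval would give a wider bound.</p>

```python
import math
from statistics import NormalDist


def required_sample_size(N: int, alpha: float = 0.05,
                         delta: float = 0.05, p: float = 0.5) -> int:
    """Sample size with finite population correction, rounded up."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    n0 = z ** 2 * p * (1 - p) / delta ** 2      # infinite-population size
    return math.ceil(n0 * N / (N - 1 + n0))     # finite population correction


def error_rate_upper_bound(errors: int, n: int, N: int,
                           alpha: float = 0.05) -> float:
    """Upper end of a Wald interval on the error rate, with FPC applied."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    p_hat = errors / n
    fpc = math.sqrt((N - n) / (N - 1))
    return p_hat + z * math.sqrt(p_hat * (1 - p_hat) / n) * fpc


n_needed = required_sample_size(2000)          # 323
upper = error_rate_upper_bound(1, 90, 896)     # roughly 0.032
```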
<p>Audit allocation should be risk-stratified: concentrate effort on high-risk subsets (very short strings that may be truncations, very long strings that may be insufficiently decomposed, low-confidence extractions) while maintaining a random component for unbiased population coverage.</p>
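<p>One way to sketch risk-stratified allocation, assuming Python. The even split between random and high-risk draws (<code>random_share=0.5</code>) and the length cut-offs in <code>short_or_long</code> are hypothetical values chosen for the example. The sketch also assumes the records are unique strings.</p>

```python
import random
from typing import Callable


def allocate_audit(records: list[str], budget: int,
                   is_high_risk: Callable[[str], bool],
                   random_share: float = 0.5,
                   seed: int = 0) -> tuple[list[str], list[str]]:
    """Split an audit budget between a random draw and a high-risk draw."""
    rng = random.Random(seed)
    n_random = min(round(budget * random_share), len(records))
    random_part = rng.sample(records, n_random)
    already = set(random_part)
    risky_pool = [r for r in records if is_high_risk(r) and r not in already]
    n_risky = min(budget - n_random, len(risky_pool))
    return random_part, rng.sample(risky_pool, n_risky)


def short_or_long(s: str) -> bool:
    """Hypothetical risk rule: possible truncations or under-decomposed strings."""
    return len(s) > 120 or 4 > len(s)
```

<p>Only the random part supports an unbiased estimate of the population error rate. The high-risk part is for finding errors, so the two should be reported separately.</p>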
</section>
</section>
<section id="sec-two-domains" class="level1" data-number="5">
<h1 data-number="5"><span class="header-section-number">5</span> Two Domains, Same Architecture</h1>
<p>The primary evidence that CAAS generalises is not theoretical. It is that two independent implementations, in different domains, by different teams, working on different problems, arrived at the same architectural decisions.</p>
<section id="sec-case1" class="level2" data-number="5.1">
<h2 data-number="5.1" class="anchored" data-anchor-id="sec-case1"><span class="header-section-number">5.1</span> Case Study 1: Indian Packaged Food Ingredient Vocabulary</h2>
<p><strong>The problem.</strong> No reference layer exists that maps the names Indian food labels use to shared ingredient identities. The same substance appears as <code>maida</code>, <code>refined wheat flour</code>, and <code>all-purpose flour</code>. Standard similarity measures would merge <code>palm oil</code> and <code>palmite</code>, which are functionally distinct, while missing the equivalence of <code>besan flour</code> and <code>chana dal</code>, which are the same ingredient in different language registers. No ground truth lexicon exists to train a supervised system against.</p>
<p><strong>The implementation.</strong> 1,000 products were selected across 42 companies and 153 brands from verified Indian market listings. Ingredient lists were retrieved from whitelisted domains (brand official website, Amazon India, BigBasket, Blinkit) at temperature 0, with <code>DATA_NOT_FOUND</code> required when all sources were exhausted. Retrieved strings were parsed using a structure-aware algorithm that splits on commas only at nesting depth zero, preserving compound ingredient relationships. Each string then went through a single-purpose artifact removal pass (removing percentages and marketing text, preserving INS codes and preparation specifications) and a semantic decomposition pass with context propagation. The process ran at 6 SKUs per batch for retrieval and scaled inversely with string complexity for subsequent stages.</p>
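<p>The depth-zero comma split can be sketched as follows, assuming Python. This is an illustration of the technique named above, not the project's parser. The function name <code>split_top_level</code> and the choice of bracket characters are assumptions.</p>

```python
def split_top_level(s: str) -> list[str]:
    """Split on commas only where bracket nesting depth is zero."""
    parts, depth, start = [], 0, 0
    for i, ch in enumerate(s):
        if ch in "([{":
            depth += 1
        elif ch in ")]}":
            depth = max(depth - 1, 0)  # tolerate a stray closing bracket
        elif ch == "," and depth == 0:
            parts.append(s[start:i].strip())
            start = i + 1
    parts.append(s[start:].strip())
    return [p for p in parts if p]


label = "Wheat flour, Edible vegetable oil (palm oil, sesame oil), Salt"
items = split_top_level(label)
# ['Wheat flour', 'Edible vegetable oil (palm oil, sesame oil)', 'Salt']
```

<p>A plain split on every comma would break the compound ingredient into two fragments and lose the link between the oil blend and its components.</p>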
<p><strong>Results.</strong> 896 of 1,000 products extracted successfully (89.6%). 104 returned persistent <code>DATA_NOT_FOUND</code>. The sampling pipeline produced 1,987 unique variant strings. Combined with ingredient strings from OpenFoodFacts filtered to rows with a verifiable Indian product name and passed through the same pipeline, the final corpus after iterative audit is 2,291 unique ingredient variant strings. Audit of 90 extractions identified 1 error (1.1% of the audited sample). The full methodology is documented in <strong><span class="citation" data-cites="ifidSamplingCorpus2026">(R. 2026)</span></strong>.</p>
</section>
<section id="sec-case2" class="level2" data-number="5.2">
<h2 data-number="5.2" class="anchored" data-anchor-id="sec-case2"><span class="header-section-number">5.2</span> Case Study 2: Global Flash Flood Historical Record</h2>
<p><strong>The problem.</strong> Hydro-meteorological hazards like flash floods lack a standardised global observation infrastructure. Existing archives capture large, long-lasting events but miss localised and fast-moving floods. The Global Disaster Alert and Coordination System (GDACS) holds approximately 10,000 records, orders of magnitude fewer than what AI-based forecasting models require for training and validation. The historical record exists, embedded in news archives across 80 languages, but has never been extracted at scale.</p>
<p><strong>The implementation.</strong> Google’s Groundsource framework analysed news reports where flooding was a primary subject, standardised text into English via translation, and used Gemini to apply three constrained extraction tasks: classification (distinguishing actual past flood events from articles about future warnings or policy), temporal reasoning (anchoring relative date references against publication dates), and spatial precision (mapping location references to standardised geographic polygons). The model was not asked to know where floods occurred. It was asked to read a specific article and extract specific structured fields, with explicit verification criteria for each field rather than open-ended generation <span class="citation" data-cites="groundsource2026">(Mayo et al. 2026)</span>.</p>
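<p>To make the temporal anchoring task concrete, here is a toy rule-based sketch, assuming Python. It is not how Groundsource works, since that system uses Gemini for this step. The function <code>anchor</code> and the handful of phrases it recognises are hypothetical. The point is only that a relative phrase resolves against the publication date, and that an unrecognised phrase returns an explicit failure.</p>

```python
from datetime import date, timedelta
from typing import Optional

WEEKDAYS = ["monday", "tuesday", "wednesday", "thursday",
            "friday", "saturday", "sunday"]


def anchor(reference: str, published: date) -> Optional[date]:
    """Resolve a relative date phrase against the article's publication date."""
    ref = reference.strip().lower()
    if ref == "today":
        return published
    if ref == "yesterday":
        return published - timedelta(days=1)
    if ref.startswith("last ") and ref[5:] in WEEKDAYS:
        # Days back to the most recent such weekday strictly before publication.
        back = (published.weekday() - WEEKDAYS.index(ref[5:])) % 7 or 7
        return published - timedelta(days=back)
    return None  # unresolved phrases fail explicitly instead of being guessed
```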
<p><strong>Results.</strong> 2.6 million historical flood events extracted, spanning more than 150 countries from 2000 to present. Manual review found 60% of extracted events accurate in both location and timing; 82% accurate enough for practical research use. Spatiotemporal matching against GDACS records for 2020–2026 shows Groundsource captured between 85% and 100% of severe events in that reference set, alongside large numbers of smaller localised events the reference set missed entirely.</p>
</section>
<section id="sec-convergence" class="level2" data-number="5.3">
<h2 data-number="5.3" class="anchored" data-anchor-id="sec-convergence"><span class="header-section-number">5.3</span> What the Convergence Shows</h2>
<p>Neither implementation was designed with the other in mind. The decisions they share — constrain the model’s role to retrieval and parsing, require explicit failure for absent data, calibrate batch size empirically, validate statistically — emerged independently from the same underlying problem: how to collect structured data from a space where the information exists but no ground truth organises it.</p>
<p>The table below shows the architectural correspondence.</p>
<table class="caption-top table">
<caption>Architectural decisions across two independent CAAS implementations.</caption>
<colgroup>
<col style="width: 33%">
<col style="width: 33%">
<col style="width: 33%">
</colgroup>
<thead>
<tr class="header">
<th><strong>Decision</strong></th>
<th><strong>Ingredient vocabulary</strong></th>
<th><strong>Flood record</strong></th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td>Model role</td>
<td>Retrieval and parsing only</td>
<td>Classification, temporal anchoring, spatial extraction</td>
</tr>
<tr class="even">
<td>Source constraint</td>
<td>Whitelisted domains in priority order</td>
<td>News reports where flooding is primary subject</td>
</tr>
<tr class="odd">
<td>Failure handling</td>
<td><code>DATA_NOT_FOUND</code> token</td>
<td>Explicit classification criteria; non-flood articles return nothing</td>
</tr>
<tr class="even">
<td>Batch calibration</td>
<td>6 SKUs per batch (empirical)</td>
<td>Per-article processing with complexity-aware handling</td>
</tr>
<tr class="odd">
<td>Validation</td>
<td>Statistical sampling + iterative audit</td>
<td>Manual review sample; spatiotemporal matching against reference archive</td>
</tr>
<tr class="even">
<td>Accuracy result</td>
<td>98.9% point estimate; error rate below 3.6% at 95% confidence</td>
<td>82% practically useful; 85–100% severe event recall</td>
</tr>
</tbody>
</table>
<p>The accuracy figures are not directly comparable: the domains define error differently, and the flood implementation targets a harder extraction problem (temporal and spatial reasoning from prose) than ingredient retrieval from structured label text. What is comparable is the architecture: the same architectural decisions, applied to the same class of problem, producing usable datasets in spaces where no dataset previously existed.</p>
</section>
</section>
<section id="sec-limitations" class="level1" data-number="6">
<h1 data-number="6"><span class="header-section-number">6</span> What This Does Not Guarantee</h1>
<p>Temperature 0 reduces output variation but does not eliminate it. API version changes, infrastructure differences, and floating-point non-determinism across hardware can produce different outputs for identical inputs across sessions. The reproducibility guarantee is strong within a session and weaker across time. Any implementation should log the model version and API configuration used, and treat re-runs after infrastructure changes as requiring re-validation.</p>
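<p>The logging advice above can be sketched as a small run manifest, assuming Python. The field names and the functions <code>run_manifest</code> and <code>needs_revalidation</code> are hypothetical. The model string should be whatever exact version identifier the provider reports.</p>

```python
import hashlib
from datetime import datetime, timezone


def run_manifest(model: str, temperature: float,
                 system_prompt: str, sources: list[str]) -> dict:
    """Record what is needed to tell whether a re-run is comparable."""
    return {
        "model": model,                      # exact version string from the provider
        "temperature": temperature,
        "prompt_sha256": hashlib.sha256(system_prompt.encode("utf-8")).hexdigest(),
        "sources": list(sources),            # whitelist in priority order
        "logged_at": datetime.now(timezone.utc).isoformat(),
    }


def needs_revalidation(old: dict, new: dict) -> bool:
    """Any change other than the timestamp calls for re-validation."""
    keys = ("model", "temperature", "prompt_sha256", "sources")
    return any(old[k] != new[k] for k in keys)
```

<p>Hashing the system instruction keeps the manifest small while still detecting any edit to the constraint wording.</p>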
<p>The framework does not remove the need for domain judgment. In the ingredient implementation, boundary cases — whether <code>gluten</code> belongs in grains or proteins, whether <code>spirulina</code> is an additive or a botanical — were not resolvable through cleaning. They required a classification framework with explicit criteria for how those categories are defined. CAAS reduces the volume of decisions that require human judgment. It does not eliminate the decisions themselves.</p>
<p>The error rates documented here are domain-specific. A 1.1% sample error rate for ingredient extraction from structured label text on retail websites is not a prediction for other domains. Text that is more ambiguous, sources that are less reliable, or extraction tasks that require more complex reasoning will produce higher error rates. The validation methodology applies regardless: establish the error rate empirically, state it with a confidence interval, document what was done about systematic errors.</p>
</section>
<section id="sec-applicability" class="level1" data-number="7">
<h1 data-number="7"><span class="header-section-number">7</span> Where This Applies</h1>
<p>CAAS is appropriate when four conditions hold simultaneously.</p>
<p>First, the target information exists in retrievable textual form. The framework cannot collect data that was never recorded. It can only structure data that exists but is unstructured.</p>
<p>Second, no authoritative reference organises the domain. If a canonical lexicon or schema exists, use it. CAAS is for when you are building the first one.</p>
<p>Third, domain-specific variation makes automated similarity measures unreliable. If standard fuzzy matching at reasonable thresholds produces acceptable results, that is simpler and should be preferred. CAAS is for when the variation patterns require something that can read context.</p>
<p>Fourth, the cost of error correction is low relative to the cost of not having the data. In safety-critical applications where downstream decisions are irreversible, the accuracy requirements may be higher than CAAS can reliably achieve without prohibitive validation cost. In research contexts where the dataset is a starting point for further analysis and errors are correctable, the asymmetry holds.</p>
<p>Both case studies satisfy all four conditions. The ingredient vocabulary space has no authoritative Indian lexicon, variation patterns that defeat similarity measures, and corrections that cost a refetch. The flood archive space has no global sensor network, event descriptions embedded in prose across 80 languages, and corrections that cost a re-extraction from an article that remains available.</p>
<section id="acknowledgements" class="level2" data-number="7.1">
<h2 data-number="7.1" class="anchored" data-anchor-id="acknowledgements"><span class="header-section-number">7.1</span> Acknowledgements</h2>
<p>My deepest gratitude to Mr.&nbsp;Krishna, whose constancy forms the foundation upon which all my work, including this, quietly rests. Salutations to the Goddess who dwells in all beings in the form of intelligence. I bow to her again and again.</p>
<p>This report was prepared as part of the Indian Food Informatics Data (IFID) project at the Interdisciplinary Systems Research Lab (iSRL).</p>
</section>
<section id="statements-and-declarations" class="level2" data-number="7.2">
<h2 data-number="7.2" class="anchored" data-anchor-id="statements-and-declarations"><span class="header-section-number">7.2</span> Statements and Declarations</h2>
<section id="funding-declaration" class="level3" data-number="7.2.1">
<h3 data-number="7.2.1" class="anchored" data-anchor-id="funding-declaration"><span class="header-section-number">7.2.1</span> Funding Declaration</h3>
<p>No funding was received to assist with the preparation of this manuscript.</p>
</section>
<section id="author-contribution" class="level3" data-number="7.2.2">
<h3 data-number="7.2.2" class="anchored" data-anchor-id="author-contribution"><span class="header-section-number">7.2.2</span> Author Contribution</h3>
<p>L.A.R. was responsible for all aspects of this report, including conceptualization, methodology, writing the original draft, and review and editing.</p>
</section>
<section id="competing-interests" class="level3" data-number="7.2.3">
<h3 data-number="7.2.3" class="anchored" data-anchor-id="competing-interests"><span class="header-section-number">7.2.3</span> Competing Interests</h3>
<p>The author declares no competing interests.</p>
</section>
</section>
<section id="references" class="level2 unnumbered">
<h2 class="unnumbered anchored" data-anchor-id="references">References</h2>
<div id="refs" class="references csl-bib-body hanging-indent">
<div id="ref-ifidSamplingCorpus2026" class="csl-entry">
R., L. A. 2026. <em>IFID Sampling Corpus — Placeholder, Fill with Zenodo DOI</em>. Interdisciplinary Systems Research Lab (iSRL).
</div>
<div id="ref-groundsource2026" class="csl-entry">
Rotem Mayo, Moral Bootbool, Oleg Zlydenko. 2026. <span>“Groundsource: A Dataset of Flood Events from News.”</span> March. <a href="https://doi.org/10.31223/X5RR2K">https://doi.org/10.31223/X5RR2K</a>.
</div>
</div>


  
</section>
</section>

<div id="quarto-appendix" class="default"><section class="quarto-appendix-contents" id="quarto-reuse"><h2 class="anchored quarto-appendix-heading">Reuse</h2><div class="quarto-appendix-contents"><div><a rel="license" href="https://creativecommons.org/licenses/by/4.0/">CC BY 4.0</a></div></div></section><section class="quarto-appendix-contents" id="quarto-citation"><h2 class="anchored quarto-appendix-heading">Citation</h2><div><div class="quarto-appendix-secondary-label">BibTeX citation:</div><pre class="sourceCode code-with-copy quarto-appendix-bibtex"><code class="sourceCode bibtex">@report{a_r2026,
  author = {A R, Lalitha},
  publisher = {iSRL},
  title = {Constrained {AI-Assisted} {Sampling} for {Fragmented}
    {Textual} {Spaces:} {A} {Framework} for {Data} {Collection} {Where}
    {No} {Ground} {Truth} {Exists}},
  number = {iSRL-26-04-M-CAAS},
  date = {2026-04-01},
  url = {https://isrl.in/pub/2026-04-m-caas/},
  doi = {10.5281/zenodo.[record-id]},
  langid = {en}
}
</code></pre><div class="quarto-appendix-secondary-label">For attribution, please cite this work as:</div><div id="ref-a_r2026" class="csl-entry quarto-appendix-citeas">
A R, Lalitha. 2026. <em>Constrained AI-Assisted Sampling for Fragmented
Textual Spaces: A Framework for Data Collection Where No Ground Truth
Exists</em>. iSRL-26-04-M-CAAS. iSRL. <a href="https://doi.org/10.5281/zenodo.[record-id]">https://doi.org/10.5281/zenodo.[record-id]</a>.
</div></div></section></div> ]]></description>
  <guid>https://isrl.in/pub/2026-04-m-caas/</guid>
  <pubDate>Wed, 01 Apr 2026 00:00:00 GMT</pubDate>
</item>
<item>
  <title>Data Acquisition and Ingredient Extraction: Building a Vocabulary of What India’s Packaged Food Labels Actually Say</title>
  <dc:creator>Lalitha A R</dc:creator>
  <link>https://isrl.in/pub/2026-04-r-variants/</link>
  <description><![CDATA[ 




<script>
document.addEventListener('DOMContentLoaded', function() {
  var meta = document.querySelector('#title-block-header .quarto-title-meta');
  if (!meta) return;
  meta.insertAdjacentHTML('beforeend', '<div><div class="quarto-title-meta-heading">Contributors</div><div class="quarto-title-meta-contents"><p class="author" style="margin:0 0 0.1em 0;">Subrat Sethi</p><p style="font-size:0.82em;color:#555;margin:0 0 0.5em 0;font-style:italic;">SKU Verification</p><p class="author" style="margin:0 0 0.1em 0;">Purnendu Shukla</p><p style="font-size:0.82em;color:#555;margin:0 0 0.5em 0;font-style:italic;"></p></div></div><div><div class="quarto-title-meta-heading">Reviewers</div><div class="quarto-title-meta-contents"><p class="author" style="margin:0 0 0.1em 0;">Radhakrishna MV</p><p style="font-size:0.82em;color:#555;margin:0 0 0.5em 0;font-style:italic;">Contributor, Open Food Facts India</p></div></div>');
});
</script>
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "ScholarlyArticle",
  "name": "Data Acquisition and Ingredient Extraction: Building a Vocabulary of What India's Packaged Food Labels Actually Say",
  "identifier": "iSRL-26-04-R-Data",
  "description": "Documents the data acquisition methodology and ingredient extraction process used to build a vocabulary of what India's packaged food labels actually say — from raw label text to structured ingredient strings across 896 SKUs and the Open Food Facts India dataset.",
  "creativeWorkStatus": "Draft",
  "license": "https://creativecommons.org/licenses/by/4.0/",
  "url": "https://isrl.in/pub/2026-04-r-variants/",
  "author": {
    "@type": "Person",
    "name": "Lalitha A R",
    "identifier": "https://orcid.org/0009-0001-7466-3531",
    "sameAs": "https://orcid.org/0009-0001-7466-3531",
    "email": "lalithaar.research@gmail.com"
  },
  "publisher": {
    "@type": "ResearchOrganization",
    "name": "iSRL",
    "url": "https://isrl.in"
  }
}
</script>
<section id="the-question-that-starts-everything" class="level2" data-number="1">
<h2 data-number="1" class="anchored" data-anchor-id="the-question-that-starts-everything"><span class="header-section-number">1</span> The Question That Starts Everything</h2>
<p>A computer cannot tell you whether rice is healthier than Maggi. Not because the comparison is philosophically difficult, but because the infrastructure required to answer it does not exist.</p>
<p>To answer the question, the system needs to know what is in both products. To know what is in them, it needs ingredient data. To use ingredient data, it needs to know that “maida” and “refined wheat flour” refer to the same thing — and that “palm oil” and “palmite” do not, even though automated similarity measures would score them close. To know that, it needs a stable reference layer that maps the names labels actually use to the identities they actually mean.</p>
<p>That reference layer does not exist for Indian packaged food. This report documents the first step toward building it: a collection of ingredient variant strings extracted from commercial Indian food labels, captured as they appear, without flattening the diversity that makes them what they are.</p>
</section>
<section id="why-the-diversity-is-not-the-problem" class="level2" data-number="2">
<h2 data-number="2" class="anchored" data-anchor-id="why-the-diversity-is-not-the-problem"><span class="header-section-number">2</span> Why the Diversity Is Not the Problem</h2>
<p>A label from a major Indian snack brand might read:</p>
<blockquote class="blockquote">
<p>Seasoning Mix {Iodised Salt, Chilli Powder (1.1%), #Spices &amp; Condiments, Onion, Maltodextrin, Wheat Flour, Milk Solids, Black Salt, Tomato Powder [Tomato Paste, Anticaking Agent (INS 551)], Refined Sugar, Hydrolyzed Vegetable Protein, Acidity Regulators (INS 296, INS 330, INS 334), Garlic, Anticaking Agent (INS 551), Flavour Enhancers (INS 627, INS 631)} And Iodised Salt.</p>
</blockquote>
<p>This is not poorly formatted data. This is a brand communicating ingredient relationships to consumers across India’s 22 scheduled languages and hundreds of regional contexts, within the structure that FSSAI’s Food Safety and Standards (Labelling and Display) Regulations, 2020 require. The nested brackets encode functional relationships: “Acidity Regulators” governs three INS codes as a category. “Tomato Powder” contains both a base ingredient and an additive. A Tamil-speaking consumer and a Hindi-speaking consumer both need to read this label correctly. The formatting serves them.</p>
<p>The goal of this project is not to make that label simpler. It is to build the layer underneath it that makes it machine-queryable — without asking ITC, or any other brand, to change a word.</p>
<p>A substrate is the layer that makes other things possible to build. Concrete is a substrate: you do not live in concrete, you live in the building the concrete made possible. The substrate does not care what the building looks like. IFID — Indian Food Informatics Data — is being built as that layer for ingredient identity. Tamil names stay Tamil. INS codes stay in their FSSAI-specified format. The nested bracket structure a brand uses to communicate to its consumers stays exactly as designed. The substrate sits underneath and makes them interoperable: queryable as the same ingredient when that is what you need, distinguishable as different expressions when that matters.</p>
<p>Coordination without convergence. That is the specific goal.</p>
</section>
<section id="the-wall-and-who-is-already-working-on-it" class="level2" data-number="3">
<h2 data-number="3" class="anchored" data-anchor-id="the-wall-and-who-is-already-working-on-it"><span class="header-section-number">3</span> The Wall, and Who Is Already Working on It</h2>
<p>Everyone who works with Indian packaged food data hits the same wall from a different direction.</p>
<p>The nutritionist has fifty product samples and is spending half her time cleaning label data before she can begin her actual analysis. The e-commerce platform has the same ingredient listed seventeen different ways across seventeen brands and cannot build a consistent product catalogue. The compliance team is manually reconciling ingredient declarations across FSSAI requirements, retailer formats, and export documentation — separately, every time. The researcher who could build a tool to flag allergen risks has found there is no labelled dataset to train on.</p>
<p>None of these people are doing it wrong. The wall is not their failure. The wall is that no shared ingredient identity layer exists.</p>
<p>The most serious open effort to build one globally is Open Food Facts (OFF). OFF has documented food products across dozens of countries through crowdsourced contributions. The scale of that work is significant and the intent is the same as this project’s: make food data open, structured, and usable. The gap in Indian product coverage that this report documents is not a gap in OFF’s effort. It is a direct reflection of how fragmented and underdocumented the Indian packaged food space actually is, which is precisely what makes the problem worth working on and what makes collaboration across efforts like these necessary.</p>
</section>
<section id="two-sources-one-problem" class="level2" data-number="4">
<h2 data-number="4" class="anchored" data-anchor-id="two-sources-one-problem"><span class="header-section-number">4</span> Two Sources, One Problem</h2>
<p>Building the ingredient vocabulary required two separate collection strategies, for the same underlying reason: no single existing source reliably answers whether a product is a current, shelf-available Indian packaged food with a verifiable ingredient list.</p>
<section id="why-off-could-not-be-the-only-source" class="level3" data-number="4.1">
<h3 data-number="4.1" class="anchored" data-anchor-id="why-off-could-not-be-the-only-source"><span class="header-section-number">4.1</span> Why OFF Could Not Be the Only Source</h3>
<p>OFF contains thousands of English ingredient lists for products with Indian brand names. Those lists are valuable. But the dataset structure does not reliably distinguish a product currently on Indian supermarket shelves from an imported variant, an export formulation, or a historical listing no longer in distribution.</p>
<p>For the purpose of this corpus — documenting what Indian consumers actually encounter today — that distinction matters. An ingredient list attached to a product that is not in the Indian market does not reflect the vocabulary Indian food systems use.</p>
<p>The null rates in the OFF data confirm the scale of the gap. Of 19,748 rows in the raw export (captured 2 February 2026):</p>
<ul>
<li>Only 4,104 pass a minimum filter: brand present, English product name present, English ingredient text present. That is 20.78 percent.</li>
<li><code>ingredients_text_en</code> is the only ingredient column with coverage above 1 percent. All 29 other language columns combined add 69 rows to that count.</li>
<li>6,905 rows have both a brand identifier and an English product name — the product exists, it has a name — but no ingredient text in any language. The gap is specifically at the ingredient field.</li>
<li>The four core macronutrient fields (energy, fat, protein, carbohydrates) have null rates between 65.61 and 66.00 percent across the full dataset.</li>
</ul>
<table class="caption-top table">
<thead>
<tr class="header">
<th>Field</th>
<th style="text-align: right;">Non-null</th>
<th style="text-align: right;">Null %</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td>energy_value</td>
<td style="text-align: right;">6,792</td>
<td style="text-align: right;">65.61</td>
</tr>
<tr class="even">
<td>fat_value</td>
<td style="text-align: right;">6,715</td>
<td style="text-align: right;">66.00</td>
</tr>
<tr class="odd">
<td>proteins_value</td>
<td style="text-align: right;">6,737</td>
<td style="text-align: right;">65.89</td>
</tr>
<tr class="even">
<td>carbohydrates_value</td>
<td style="text-align: right;">6,759</td>
<td style="text-align: right;">65.77</td>
</tr>
</tbody>
</table>
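<p>The percentages above follow directly from the non-null counts and the 19,748-row total. A minimal Python sketch of that arithmetic, using only figures quoted in this section; the function names <code>null_rate</code> and <code>pass_rate</code> are illustrative and not part of any project code:</p>

```python
TOTAL_ROWS = 19_748  # rows in the raw OFF export, as reported above

# Non-null counts per macronutrient field, copied from the table.
NON_NULL = {
    "energy_value": 6_792,
    "fat_value": 6_715,
    "proteins_value": 6_737,
    "carbohydrates_value": 6_759,
}


def null_rate(non_null: int, total: int = TOTAL_ROWS) -> float:
    """Percentage of rows in which a field is empty."""
    return 100.0 * (total - non_null) / total


def pass_rate(passing: int, total: int = TOTAL_ROWS) -> float:
    """Percentage of rows that survive a filter."""
    return 100.0 * passing / total


if __name__ == "__main__":
    for field, count in NON_NULL.items():
        print(f"{field}: {null_rate(count):.2f}% null")
    print(f"minimum filter: {pass_rate(4_104):.2f}% pass")
```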
<p>These numbers are not a criticism. They are a measurement of the space. The gap in documented Indian food data is real, it is large, and it exists because the underlying ecosystem is genuinely fragmented — not because anyone has failed to document it well enough.</p>
<p>What OFF does have — the 4,104 rows with a brand, a product name, and an English ingredient list — is usable for this project. The product name provides the minimum anchor needed to verify the product is Indian. Those rows were taken through the same constrained parsing pipeline described below, and their ingredient strings added to the vocabulary set. The filter that kept them is described in the claims document.</p>
</section>
<section id="why-direct-sampling-was-necessary" class="level3" data-number="4.2">
<h3 data-number="4.2" class="anchored" data-anchor-id="why-direct-sampling-was-necessary"><span class="header-section-number">4.2</span> Why Direct Sampling Was Necessary</h3>
<p>For a reliable picture of what is currently on Indian shelves, the corpus needed to be collected directly. The methodology: select products from companies with documented Indian market presence, retrieve ingredient lists from verifiable online sources, extract and parse.</p>
<p><strong>Company and product selection.</strong> Forty-two companies were selected based on market presence across major packaged food categories: snacks, beverages, staples, dairy, condiments. Within each company, selection moved through sub-brands (ITC’s portfolio spans Aashirvaad, Sunfeast, Bingo, and YiPPee, each with a different ingredient vocabulary) and then to individual SKUs meeting four criteria:</p>
<ol type="1">
<li>Ingredient list traceable on a whitelisted domain</li>
<li>Product available in the Indian market, not an export or international variant</li>
<li>Specificity to a single SKU, not a product range — “Aashirvaad Turmeric Powder 200g” not “Aashirvaad Spices”</li>
<li>One representative retained per formulation across pack sizes</li>
</ol>
<p>The third criterion produced the most rejections. References like “Cadbury Chocolates” or “Aashirvaad Spices” denote product families, not individual items with specific ingredient lists. Every such reference required disambiguation before it could enter the corpus.</p>
<p>After validation: 1,000 SKUs across 42 companies, 153 brands, 8 macro-categories.</p>
</section>
</section>
<section id="why-standard-automated-parsing-fails-here" class="level2" data-number="5">
<h2 data-number="5" class="anchored" data-anchor-id="why-standard-automated-parsing-fails-here"><span class="header-section-number">5</span> Why Standard Automated Parsing Fails Here</h2>
<p>Before describing what the pipeline does, it is worth being precise about why standard approaches do not work for this specific problem.</p>
<p>Palm oil and palmite are chemically and functionally distinct ingredients. An automated similarity measure — edit distance, embedding cosine similarity, fuzzy matching — would score them as near-identical. Acting on that score would silently corrupt the vocabulary.</p>
<p>Besan flour and chana dal are the same pulse in different preparation states: besan is the flour milled from chana dal. A similarity measure that does not carry cultural and linguistic knowledge would treat them as unrelated.</p>
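<p>Both failure modes can be reproduced with a character-level similarity measure. The sketch below uses <code>difflib.SequenceMatcher</code> from the Python standard library as a stand-in for fuzzy matching in general. It is an illustration only: it is not the measure any particular pipeline uses, and the 0.6 merge threshold is an arbitrary assumption.</p>

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Character-level similarity in [0, 1], case-insensitive."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


# Hypothetical merge threshold, chosen only for illustration.
MERGE_THRESHOLD = 0.6

PAIRS = [
    ("Palm Oil", "Palmite"),   # distinct ingredients, similar spelling
    ("Besan", "Chana Dal"),    # related ingredients, dissimilar spelling
]

if __name__ == "__main__":
    for left, right in PAIRS:
        score = similarity(left, right)
        verdict = "merge" if score >= MERGE_THRESHOLD else "keep apart"
        print(f"{left!r} vs {right!r}: {score:.2f} -> {verdict}")
```

<p>On these inputs the string score ranks the distinct pair above the related pair, which is the opposite of what the vocabulary needs.</p>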
<p>These are not edge cases. They are representative of how Indian food labelling works: regional names, transliterations, preparation-state variants, and INS codes all coexist on the same label, as do acronyms (such as FOS or TBHQ) and British/American spelling variants. Sometimes two strings refer to the same thing; sometimes they refer to things that are genuinely distinct. Standard clustering and normalisation algorithms cannot reliably navigate this space. The cost of a silent error, whether a wrong merge or a missed distinction, propagates forward into every analysis built on the vocabulary.</p>
<p>The approach used here trades throughput for verifiability: one atomic operation per API call, constrained to prevent approximation, with explicit failure when the data is not there.</p>
</section>
<section id="the-extraction-pipeline" class="level2" data-number="6">
<h2 data-number="6" class="anchored" data-anchor-id="the-extraction-pipeline"><span class="header-section-number">6</span> The Extraction Pipeline</h2>
<section id="constrained-retrieval" class="level3" data-number="6.1">
<h3 data-number="6.1" class="anchored" data-anchor-id="constrained-retrieval"><span class="header-section-number">6.1</span> Constrained Retrieval</h3>
<p>The model retrieved ingredient lists from whitelisted domains only, in priority order: brand official website, then Amazon India, BigBasket, Blinkit. If no source returned the ingredient list, the output was <code>DATA_NOT_FOUND</code>. The instruction was explicit: do not infer typical ingredients from product category, do not approximate based on similar products.</p>
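<p>The retrieval rule can be read as a priority-ordered lookup with an explicit failure value. The sketch below rests on several assumptions: the domain strings, the <code>Fetcher</code> callable, and the function name are hypothetical, and the actual retrieval was performed by a model under instruction rather than by code like this.</p>

```python
from typing import Callable, Optional

DATA_NOT_FOUND = "DATA_NOT_FOUND"

# Priority order described in the text. The exact domain strings are
# assumptions made for this sketch.
WHITELIST = [
    "brand-official-site",
    "amazon.in",
    "bigbasket.com",
    "blinkit.com",
]

# A fetcher takes (sku, domain) and returns the ingredient text found on
# that domain, or None when the page carries no ingredient list.
Fetcher = Callable[[str, str], Optional[str]]


def retrieve_ingredients(sku: str, fetch: Fetcher) -> str:
    """Return the first ingredient list found, never a guess."""
    for domain in WHITELIST:
        text = fetch(sku, domain)
        if text and text.strip():
            return text.strip()
    # No inference from category or similar products: report the gap.
    return DATA_NOT_FOUND


if __name__ == "__main__":
    # Toy in-memory stand-in for real product pages.
    pages = {("Example Chips 50g", "bigbasket.com"): "Potato, Palm Oil, Salt"}
    lookup = lambda sku, domain: pages.get((sku, domain))
    print(retrieve_ingredients("Example Chips 50g", lookup))
    print(retrieve_ingredients("Unknown Product", lookup))
```

<p>Returning a sentinel instead of raising or guessing keeps every gap visible in the output, so a later pass can retry exactly those SKUs.</p>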
<p>Temperature was set to 0. This means the model selects the highest-probability token at each step, so identical input should produce identical output, apart from any residual nondeterminism in the serving infrastructure. The practical effect: if you run the same extraction twice against the same page content, you can expect the same result. Validation becomes tractable. Variation introduced by sampling, which is one route to fabrication, is removed.</p>
<p>Batch size was tested across 1 to 20 SKUs per call. At batch sizes above 10, constraint violations increased measurably — the model began returning approximations instead of <code>DATA_NOT_FOUND</code> for products it could not find, and formatting inconsistencies appeared. Six SKUs per batch produced the best balance of throughput and constraint adherence.</p>
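<p>Grouping SKUs into fixed-size batches is a small piece of bookkeeping. A minimal sketch, assuming SKUs are held as a list of strings; the helper name <code>batches</code> and the placeholder identifiers are hypothetical:</p>

```python
from typing import Iterator, List, Sequence

BATCH_SIZE = 6  # the setting the text reports as the best balance


def batches(items: Sequence[str], size: int = BATCH_SIZE) -> Iterator[List[str]]:
    """Yield consecutive groups of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


if __name__ == "__main__":
    skus = [f"sku-{n:04d}" for n in range(1000)]  # placeholder identifiers
    groups = list(batches(skus))
    print(len(groups), len(groups[-1]))
```

<p>At six per call, 1,000 SKUs split into 166 full batches and one final batch of four.</p>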
<p>Results across 1,000 SKUs:</p>
<ul>
<li>First pass: 871 successful extractions (87.1%), 129 <code>DATA_NOT_FOUND</code></li>
<li>Second pass on the 129 failures: 25 additional extractions, 104 persistent failures</li>
<li>Final corpus: 896 extracted (89.6%), 104 excluded</li>
</ul>
<p>The 104 persistent failures validate that the constraint held. Those products either had no verifiable online ingredient list or were no longer in active distribution. The pipeline returned an explicit gap rather than a filled approximation. An explicit gap can be addressed later. A fabricated entry corrupts the vocabulary invisibly.</p>
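<p>Manual audit of 90 extractions from the 896: 1 error identified (the model merged content from two adjacent sections of a product page). Within the audited sample that is 1 in 90 (1.1 percent). Counted against the full corpus, the single known error is 1 in 896 (0.11 percent), which is a lower bound because the other 806 extractions were not audited.</p>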
</section>
<section id="structure-preserving-parsing" class="level3" data-number="6.2">
<h3 data-number="6.2" class="anchored" data-anchor-id="structure-preserving-parsing"><span class="header-section-number">6.2</span> Structure-Preserving Parsing</h3>
<p>The 896 extracted ingredient lists were not fed to the pipeline as whole strings. Each list went through parsing as a discrete operation, because the structure of Indian food labels encodes relationships that naive splitting destroys.</p>
<p>Consider what happens when a comma-splitter treats every comma equally:</p>
<p><strong>Input:</strong> <code>Acidity Regulators (INS 296, INS 330, INS 334)</code></p>
<p><strong>Naive output:</strong></p>
<ul>
<li><code>Acidity Regulators (INS 296</code></li>
<li><code>INS 330</code></li>
<li><code>INS 334)</code></li>
</ul>
<p>The functional context — that INS 296, 330, and 334 are all acidity regulators — is gone. The fragments <code>INS 330</code> and <code>INS 334)</code> have no meaning without it.</p>
<p>The structure-aware parser tracks nesting depth. It splits on commas only at depth zero — the root level. Everything inside brackets is treated as a unit until the brackets close. Applied to the same input:</p>
<p><strong>Structure-aware output:</strong> <code>Acidity Regulators (INS 296, INS 330, INS 334)</code> — intact, ready for decomposition with context preserved.</p>
<p>896 ingredient lists → 2,926 parsed strings with functional relationships intact.</p>
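<p>The depth-tracking rule described above can be sketched in a few lines of Python. This is an illustrative reconstruction from the description, not the project’s parser: the function name is hypothetical, and the sketch counts bracket depth without checking that each closing bracket matches the type of its opener.</p>

```python
from typing import List

OPENERS = "([{"
CLOSERS = ")]}"


def split_top_level(label: str) -> List[str]:
    """Split an ingredient list on commas that sit outside all brackets."""
    parts: List[str] = []
    current: List[str] = []
    depth = 0
    for ch in label:
        if ch in OPENERS:
            depth += 1
        elif ch in CLOSERS:
            depth = max(depth - 1, 0)  # tolerate a stray closing bracket
        if ch == "," and depth == 0:
            # Root-level comma: close the current ingredient string.
            parts.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    parts.append("".join(current).strip())
    return [p for p in parts if p]


if __name__ == "__main__":
    sample = "Garlic, Acidity Regulators (INS 296, INS 330, INS 334), Onion"
    for part in split_top_level(sample):
        print(part)
```

<p>Commas inside any bracket pair never trigger a split, so a nested group such as <code>Tomato Powder [Tomato Paste, Anticaking Agent (INS 551)]</code> also comes back as a single string.</p>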
</section>
<section id="artifact-removal" class="level3" data-number="6.3">
<h3 data-number="6.3" class="anchored" data-anchor-id="artifact-removal"><span class="header-section-number">6.3</span> Artifact Removal</h3>
<p>Each of the 2,926 strings went through a single-purpose cleaning pass: remove presentation artifacts, preserve identity information.</p>
<p>Removed: percentage values (<code>55.7%</code> — quantity, not identity), marketing text (<code>BINGO!</code>, <code>NEW!</code>), usage annotations (<code>#Used As Natural Flavouring Agent</code>).</p>
<p>Preserved: INS codes and E-numbers (regulatory identifiers), preparation specifications (<code>Salt (Iodised)</code> — the bracketed term distinguishes a specific variety), functional classifications (<code>Acidity Regulator</code>, <code>Emulsifier</code>).</p>
<p>The distinction matters because it is not always obvious. <code>55.7%</code> is presentation — removing it loses nothing about what the ingredient is. <code>(Iodised)</code> is identity — removing it collapses iodised salt and table salt into the same entry, which is wrong.</p>
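<p>The mechanical part of this distinction, removing quantities while leaving other bracketed terms alone, can be sketched with a regular expression. This covers percentages only. Marketing text and usage annotations need judgement that a pattern like this does not encode, and the names below are hypothetical.</p>

```python
import re

# A percentage such as "55.7%" or "1.1 %", optionally wrapped in brackets.
PERCENT = re.compile(r"\(\s*\d+(?:\.\d+)?\s*%\s*\)|\d+(?:\.\d+)?\s*%")
EXTRA_SPACE = re.compile(r"\s{2,}")


def strip_percentages(ingredient: str) -> str:
    """Remove quantity markers while leaving every other bracket intact."""
    cleaned = PERCENT.sub("", ingredient)
    return EXTRA_SPACE.sub(" ", cleaned).strip()


if __name__ == "__main__":
    for raw in ["Chilli Powder (1.1%)", "Potato 55.7%", "Salt (Iodised)"]:
        print(repr(raw), "->", repr(strip_percentages(raw)))
```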
<p>Batch sizes for this stage were set inversely to string complexity: short strings processed in batches of 40, complex multi-bracket strings processed one at a time. Attention dilution at scale produces the same constraint violations as in the retrieval stage.</p>
</section>
<section id="semantic-decomposition" class="level3" data-number="6.4">
<h3 data-number="6.4" class="anchored" data-anchor-id="semantic-decomposition"><span class="header-section-number">6.4</span> Semantic Decomposition</h3>
<p>After cleaning, compound structures were decomposed with context propagation. Each atomic operation took one compound and returned its components, with the functional classification carried forward to each:</p>
<p><strong>Input:</strong> <code>Flavour Enhancers (INS 627, INS 631)</code><br>
<strong>Output:</strong> <code>Flavour Enhancer INS 627</code>, <code>Flavour Enhancer INS 631</code></p>
<p><strong>Input:</strong> <code>Stabilizing &amp; Emulsifying Agents (412, 410, 407, 471, 466)</code><br>
<strong>Output:</strong> <code>Stabilizer INS 412</code>, <code>Stabilizer INS 410</code>, <code>Stabilizer INS 407</code>, <code>Emulsifier INS 471</code>, <code>Stabilizer INS 466</code></p>
<p><strong>Input:</strong> <code>Black Pepper Powder, Ginger Powder, Clove Powder</code><br>
<strong>Output:</strong> unchanged — already atomic</p>
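<p>For the simplest case, one functional class governing a list of INS codes, context propagation can be sketched as follows. This is an assumption-heavy illustration: the regular expressions, the naive singularisation, and the function names are hypothetical. The sketch does not attempt the mixed stabiliser and emulsifier example, where each code needs its own class assigned from knowledge of what that code is.</p>

```python
import re
from typing import List

# "Label (item, item, ...)" with no nested brackets inside the group.
COMPOUND = re.compile(r"^(?P<label>[^()]+?)\s*\((?P<body>[^()]*)\)$")
# "INS 627", "ins627" or a bare "627".
INS_CODE = re.compile(r"^(?:INS\s*)?(?P<code>\d{3,4}[a-z]?)$", re.IGNORECASE)


def singular(label: str) -> str:
    """Naive singularisation: 'Flavour Enhancers' -> 'Flavour Enhancer'."""
    if label.endswith("s") and not label.endswith("ss"):
        return label[:-1]
    return label


def decompose(compound: str) -> List[str]:
    """Carry one functional class forward to each INS code it governs."""
    text = compound.strip()
    match = COMPOUND.match(text)
    if not match:
        return [text]  # no bracket group: treat as atomic
    label = singular(match.group("label").strip())
    out: List[str] = []
    for item in match.group("body").split(","):
        code = INS_CODE.match(item.strip())
        if not code:
            return [text]  # not a pure code list: leave intact
        out.append(f"{label} INS {code.group('code')}")
    return out


if __name__ == "__main__":
    print(decompose("Flavour Enhancers (INS 627, INS 631)"))
    print(decompose("Salt (Iodised)"))
```

<p>Anything that is not a pure code list is returned untouched, so a term like <code>Salt (Iodised)</code> keeps its identity-bearing bracket.</p>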
<p>2,926 cleaned strings → 3,452 decomposed ingredient mentions → 1,987 unique variants after deduplication across all 896 products from the sampling pipeline.</p>
<p>The full transformation for one product (Bingo Original Style, ITC Ltd.) produced 21 ingredient mentions, including:</p>
<blockquote class="blockquote">
<p>‘Black Salt’, ‘Chilli’, ‘Citric Acid (INS 330)’, ‘Disodium Guanylate (INS 627)’, ‘Disodium Inosinate (INS 631)’, ‘Garlic’, ‘Hydrolyzed Vegetable Protein’, ‘Maida’, ‘Malic Acid (INS 296)’, ‘Maltodextrin’, ‘Milk Solids’, ‘Onion’, ‘Palm Oil’, ‘Potato’, ‘Salt’, ‘Silicon Dioxide (INS 551)’, ‘Spices and Condiments’, ‘Sugar’, ‘Tartaric Acid (INS 334)’, ‘Tomato’</p>
</blockquote>
</section>
</section>
<section id="combining-the-two-sources" class="level2" data-number="7">
<h2 data-number="7" class="anchored" data-anchor-id="combining-the-two-sources"><span class="header-section-number">7</span> Combining the Two Sources</h2>
<p>The sampling pipeline produced 1,987 unique variant strings from 896 directly collected products. The OFF pipeline — 4,104 rows filtered to those with a verifiable product name, passed through the same constrained parsing stages — contributed an additional set of ingredient strings from a different cross-section of the label space.</p>
<p>Combined and deduplicated across both sources, then cleaned through multiple iterative audit rounds (documented in Appendix A), the final corpus contains <strong>2,291 unique ingredient variant strings</strong>.</p>
<p>These are not errors to correct or synonyms to collapse. They are documentation of how ingredient identity is expressed across Indian commercial food labels. The same ingredient appears in multiple forms because Indian food labelling reflects genuine linguistic and cultural diversity:</p>
<ul>
<li><code>Chilli</code> / <code>Chili</code> / <code>Chillies</code> — orthographic variants, all in use</li>
<li><code>Maida</code> / <code>Refined Wheat Flour</code> / <code>All-Purpose Flour</code> — the same ingredient across language registers</li>
<li><code>Onion Powder</code> / <code>Dried Onion</code> / <code>Dehydrated Onion</code> — preparation-state variants</li>
<li><code>INS 330</code> / <code>Citric Acid</code> / <code>Acidity Regulator INS 330</code> — the same compound at different levels of regulatory specificity</li>
<li><code>Iodised Salt</code> / <code>Salt (Iodised)</code> — formatting alternatives for the same distinction, one that plain <code>Table Salt</code> does not carry</li>
</ul>
<p>Each of these variants appears on labels that consumers read, regulators review, and supply chains track. The infrastructure this project is building needs to work with all of them — not by picking one as canonical and discarding the rest, but by organising them so that a query for any one returns the right set.</p>
<p>The Tamil name on a label stays Tamil. The INS code stays in its FSSAI format. The regional cultivar name stays as the brand printed it. The substrate underneath makes them queryable as the same ingredient when that is what the question requires.</p>
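<p>One way to picture “a query for any one returns the right set” is an index that groups surface forms under an identity key without rewriting them. The sketch below is purely illustrative. The groupings reuse two examples from the list above, the identity keys are invented for the example, and the actual mapping logic is outside the scope of this report.</p>

```python
from collections import defaultdict
from typing import Dict, Set

# Hypothetical identity keys, for illustration only.
VARIANT_TO_IDENTITY: Dict[str, str] = {
    "Maida": "refined-wheat-flour",
    "Refined Wheat Flour": "refined-wheat-flour",
    "All-Purpose Flour": "refined-wheat-flour",
    "INS 330": "citric-acid",
    "Citric Acid": "citric-acid",
    "Acidity Regulator INS 330": "citric-acid",
}


def build_index(mapping: Dict[str, str]) -> Dict[str, Set[str]]:
    """Group surface forms under their identity key, unchanged."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for variant, identity in mapping.items():
        index[identity].add(variant)
    return dict(index)


def equivalents(variant: str, mapping: Dict[str, str]) -> Set[str]:
    """All label strings that share an identity with `variant`."""
    identity = mapping.get(variant)
    if identity is None:
        return {variant}  # unmapped strings stand alone
    return build_index(mapping)[identity]


if __name__ == "__main__":
    print(sorted(equivalents("Maida", VARIANT_TO_IDENTITY)))
    print(sorted(equivalents("Palmite", VARIANT_TO_IDENTITY)))
```

<p>No variant is promoted to canonical here: the label strings are stored as printed, and only the key that links them is new.</p>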
</section>
<section id="what-this-corpus-makes-possible" class="level2 page-columns page-full" data-number="8">
<h2 data-number="8" class="anchored" data-anchor-id="what-this-corpus-makes-possible"><span class="header-section-number">8</span> What This Corpus Makes Possible</h2>
<p>The output of this report is an open dataset:</p>
<ul>
<li>Ingredient variant strings extracted from OFF data, filtered to rows with a verifiable Indian product name, cleaned through the same pipeline<sup>1</sup></li>
</ul>
<div class="no-row-height column-margin column-container"><div id="fn1"><p><sup>1</sup>&nbsp;Release of 896 SKUs with verified ingredient lists is withheld to adhere to the stakeholder protection principles as discussed in <a href="https://github.com/isrl-research/sandbox-research/discussions/20">iSRL-26-XX-G-Protection: Data Governance Principles — Protecting Every Stakeholder in the IFID Ecosystem #20</a> and <a href="https://github.com/isrl-research/sandbox-research/discussions/20">iSRL-26-XX-G-Access: Access Architecture — Tiered Data Access for the IFID API #21</a>.</p></div></div><p>Combined: a documented vocabulary of 2,291 unique ingredient expressions from Indian packaged food labels, with extraction methodology, constraint architecture, and quality validation documented in full.</p>
<p>The next question the corpus raises is: which of these 2,291 variants refer to the same ingredient, and by what logic? <code>Maida</code> and <code>Refined Wheat Flour</code> are the same substance. <code>Palm Oil</code> and <code>Palmite</code> are not, despite surface similarity. <code>Besan flour</code> and <code>chana dal</code> are related but distinct in preparation state. Answering that question requires a classification framework capable of handling identity, equivalence, and subset relationships across a space where standard similarity measures are unreliable.</p>
<p>That framework — the EMF Model (Energy, Matter, Function) — is defined in A R (2026). Further progress on the mapping problem is deferred to future reports.</p>
</section>
<section id="claims-and-verification" class="level2" data-number="9">
<h2 data-number="9" class="anchored" data-anchor-id="claims-and-verification"><span class="header-section-number">9</span> Claims and Verification</h2>
<p>All numerical claims in this report are independently verifiable against the source datasets. The full claims list with evidence per claim is available at</p>
<section id="claims" class="level3" data-number="9.1">
<h3 data-number="9.1" class="anchored" data-anchor-id="claims"><span class="header-section-number">9.1</span> Claims</h3>
<table class="caption-top table">
<colgroup>
<col style="width: 50%">
<col style="width: 50%">
</colgroup>
<thead>
<tr class="header">
<th>ID</th>
<th>Claim</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td>OFF.C.01</td>
<td>Of 19,748 rows in the raw Open Food Facts export, 4,104 pass the minimum filter (brand, product name in English, ingredient text in English), a pass rate of 20.78 percent.</td>
</tr>
<tr class="even">
<td>OFF.C.02</td>
<td>ingredients_text_en is the only ingredient column with coverage above 1 percent. It has 4,592 non-null rows (23.25 percent). All 29 other language columns combined add 69 additional rows.</td>
</tr>
<tr class="odd">
<td>OFF.C.03</td>
<td>6,905 rows have both a brand identifier and an English product name but no ingredient text in any language. The data gap is at the ingredient field, not at product identity.</td>
</tr>
<tr class="even">
<td>OFF.C.04</td>
<td>The four core macronutrient fields have null rates between 65.61 and 66.00 percent across all 19,748 rows: energy_value 65.61 percent (6,792 non-null), fat_value 66.00 percent (6,715 non-null), proteins_value 65.89 percent (6,737 non-null), carbohydrates_value 65.77 percent (6,759 non-null).</td>
</tr>
<tr class="odd">
<td>OFF.C.05</td>
<td>The three Hindi language columns have the following non-null counts across 19,748 rows: product_name_hi 111, ingredients_text_hi 11, generic_name_hi 2.</td>
</tr>
<tr class="even">
<td>OFF.C.06</td>
<td>Replacing product_name_en OR generic_name_en with product_name_en alone as a filter condition reduces the output from 4,105 rows to 4,104 rows. generic_name_en contributes one unique row.</td>
</tr>
<tr class="odd">
<td>OFF.C.07</td>
<td>The raw dataset has 486 columns. The filtered dataset retains 4 columns: product_name_en, brands, brands_tags, and ingredients_text_en.</td>
</tr>
<tr class="even">
<td>SAMP.C.01</td>
<td>The sampling corpus spans 42 companies, 153 consumer-facing brands, and 896 SKUs across 8 product macro-categories and 30 sub-categories.</td>
</tr>
<tr class="odd">
<td>SAMP.C.02</td>
<td>The five highest-SKU companies — Tata Consumer Products (104), Amul / GCMMF (82), Haldiram’s (68), Hindustan Unilever (67), and ITC Ltd (65) — account for 386 SKUs, or 43.1 percent of the 896-SKU corpus.</td>
</tr>
<tr class="even">
<td>SAMP.C.03</td>
<td>SKU distribution across eight macro-categories derived from top-3 category fields per company: beverages (200), sweets and desserts (176), staples and spices (174), ready to eat and ready to cook (100), snacks and namkeen (67), pantry and condiments (47), health and wellness (43), dairy and breakfast (30). These sum to 837 of 896 total SKUs; the remaining 59 fall into sub-categories not captured in the top-3 field per company.</td>
</tr>
<tr class="odd">
<td>SAMP.C.04</td>
<td>Of 1,000 SKUs submitted for extraction, 871 returned successful ingredient lists on first pass (87.1 percent). A second-pass retry on the 129 failures yielded 25 additional extractions (2.5 percent). Final corpus: 896 successful extractions (89.6 percent). 104 SKUs returned DATA_NOT_FOUND across both passes and are excluded.</td>
</tr>
<tr class="even">
<td>SAMP.C.05</td>
<td>Manual audit of 90 extractions from the 896-SKU corpus identified 1 hallucination instance. Rate within the audited sample: 1 in 90 (1.11 percent). Expressed against the full corpus: 1 in 896 (0.11 percent).</td>
</tr>
<tr class="odd">
<td>SAMP.C.06</td>
<td>Four SKU validation criteria were applied before extraction: (1) ingredient list traceable within whitelisted domains; (2) product available in the Indian market, not an export or international variant; (3) specificity to a single SKU, not a product range; (4) one representative retained per formulation across pack sizes.</td>
</tr>
<tr class="even">
<td>SAMP.C.07</td>
<td>Extraction operated under five constraints: (1) temperature = 0; (2) domain whitelist: brand official website, Amazon India, BigBasket, Blinkit, in priority order; (3) DATA_NOT_FOUND returned when all sources exhausted; (4) JSON output schema enforced with four required fields (product_name, ingredient_list, source_url, confidence); (5) brand official website given precedence over retailer listings on conflict.</td>
</tr>
<tr class="odd">
<td>SAMP.C.08</td>
<td>Batch sizes from 1 to 20 SKUs per API call were tested. 6 SKUs per batch was identified as optimal. Batches exceeding 10 SKUs produced measurably increased constraint violations including inappropriate DATA_NOT_FOUND omissions and formatting inconsistencies.</td>
</tr>
</tbody>
</table>
</section>
<section id="evidence-per-claim" class="level3" data-number="9.2">
<h3 data-number="9.2" class="anchored" data-anchor-id="evidence-per-claim"><span class="header-section-number">9.2</span> Evidence Per Claim</h3>
<section id="off.c.01" class="level4" data-number="9.2.1">
<h4 data-number="9.2.1" class="anchored" data-anchor-id="off.c.01"><span class="header-section-number">9.2.1</span> OFF.C.01</h4>
<p>Raw row count: 19,748. Filter applied: brands OR brands_tags non-empty, AND product_name_en non-empty, AND ingredients_text_en non-empty. Rows passing all three conditions: 4,104. Pass rate: 20.78 percent. Rows removed: 15,644 (79.22 percent).</p>
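<p>The filter logic and the arithmetic above can be sketched as follows. This is a minimal illustration, not the project's actual script: the column names come from the report, while the helper names and the three sample rows are hypothetical.</p>

```python
# Minimal sketch of the OFF.C.01 filter. Column names come from the report;
# the helper names and sample rows are hypothetical.

def non_empty(value):
    """True when a cell holds text other than whitespace."""
    return value is not None and str(value).strip() != ""

def passes_minimum_filter(row):
    """(brands OR brands_tags) AND product_name_en AND ingredients_text_en."""
    has_brand = non_empty(row.get("brands")) or non_empty(row.get("brands_tags"))
    return (
        has_brand
        and non_empty(row.get("product_name_en"))
        and non_empty(row.get("ingredients_text_en"))
    )

sample_rows = [
    {"brands": "BrandA", "brands_tags": "", "product_name_en": "Biscuit",
     "ingredients_text_en": "Maida, sugar"},
    {"brands": "", "brands_tags": "brand-b", "product_name_en": "Juice",
     "ingredients_text_en": ""},          # fails: no ingredient text
    {"brands": "", "brands_tags": "", "product_name_en": "Chips",
     "ingredients_text_en": "Potato, oil"},  # fails: no brand identifier
]
kept = [row for row in sample_rows if passes_minimum_filter(row)]

# Arithmetic check using the counts reported in this section.
raw_rows, passing_rows = 19748, 4104
pass_rate = round(100 * passing_rows / raw_rows, 2)
removed_rows = raw_rows - passing_rows
```

<p>Treating whitespace-only cells as empty is an assumption of this sketch; the report does not state how such cells were handled.</p>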
</section>
<section id="off.c.02" class="level4" data-number="9.2.2">
<h4 data-number="9.2.2" class="anchored" data-anchor-id="off.c.02"><span class="header-section-number">9.2.2</span> OFF.C.02</h4>
<table class="caption-top table">
<colgroup>
<col style="width: 42%">
<col style="width: 57%">
</colgroup>
<thead>
<tr class="header">
<th>Column</th>
<th style="text-align: right;">Non-null rows</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td>ingredients_text_en</td>
<td style="text-align: right;">4,592</td>
</tr>
<tr class="even">
<td>ingredients_text_fr</td>
<td style="text-align: right;">94</td>
</tr>
<tr class="odd">
<td>ingredients_text_de</td>
<td style="text-align: right;">15</td>
</tr>
<tr class="even">
<td>ingredients_text_hi</td>
<td style="text-align: right;">11</td>
</tr>
<tr class="odd">
<td>Rows added by all 29 non-English language columns combined (de-duplicated against English)</td>
<td style="text-align: right;">69</td>
</tr>
</tbody>
</table>
<p>Pooling all 30 ingredient language columns yields 4,661 rows with any ingredient text, against 4,592 for English alone.</p>
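<p>One way to read "pooling" is to count the rows where at least one ingredient column holds text, then subtract the English-only count. The sketch below assumes that reading; the sample rows and helper names are hypothetical, and only the <code>ingredients_text_</code> column prefix is taken from the report.</p>

```python
# Hypothetical rows; only the column-name pattern comes from the report.
rows = [
    {"ingredients_text_en": "Sugar, cocoa", "ingredients_text_fr": "Sucre, cacao"},
    {"ingredients_text_en": "", "ingredients_text_fr": "Farine de ble"},
    {"ingredients_text_en": None, "ingredients_text_hi": None},
]

def has_text(value):
    """True when a cell holds text other than whitespace."""
    return value is not None and str(value).strip() != ""

def ingredient_columns(row):
    """All per-language ingredient columns present on this row."""
    return [name for name in row if name.startswith("ingredients_text_")]

english_rows = sum(1 for r in rows if has_text(r.get("ingredients_text_en")))
pooled_rows = sum(
    1 for r in rows if any(has_text(r[c]) for c in ingredient_columns(r))
)
added_by_other_languages = pooled_rows - english_rows

# The same subtraction on the reported counts.
reported_added = 4661 - 4592
```

<p>A row with both English and French text is counted once in the pooled figure, which is why per-column counts such as 94 for French cannot simply be added to the English total.</p>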
</section>
<section id="off.c.03" class="level4" data-number="9.2.3">
<h4 data-number="9.2.3" class="anchored" data-anchor-id="off.c.03"><span class="header-section-number">9.2.3</span> OFF.C.03</h4>
<p>Rows passing (brands OR brands_tags) AND product_name_en: 11,009. Of these, rows also passing ingredients_text_en: 4,104. Rows with brand and name but no ingredient text: 6,905.</p>
</section>
<section id="off.c.04" class="level4" data-number="9.2.4">
<h4 data-number="9.2.4" class="anchored" data-anchor-id="off.c.04"><span class="header-section-number">9.2.4</span> OFF.C.04</h4>
<table class="caption-top table">
<thead>
<tr class="header">
<th>Field</th>
<th style="text-align: right;">Null</th>
<th style="text-align: right;">Non-null</th>
<th style="text-align: right;">Null %</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td>energy_value</td>
<td style="text-align: right;">12,956</td>
<td style="text-align: right;">6,792</td>
<td style="text-align: right;">65.61</td>
</tr>
<tr class="even">
<td>fat_value</td>
<td style="text-align: right;">13,033</td>
<td style="text-align: right;">6,715</td>
<td style="text-align: right;">66.00</td>
</tr>
<tr class="odd">
<td>proteins_value</td>
<td style="text-align: right;">13,011</td>
<td style="text-align: right;">6,737</td>
<td style="text-align: right;">65.89</td>
</tr>
<tr class="even">
<td>carbohydrates_value</td>
<td style="text-align: right;">12,989</td>
<td style="text-align: right;">6,759</td>
<td style="text-align: right;">65.77</td>
</tr>
</tbody>
</table>
<p>Computed on the full 19,748-row dataset.</p>
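<p>The null percentages in the table can be rechecked from the non-null counts alone. A short sketch, using only the figures reported above (the function name is illustrative):</p>

```python
TOTAL_ROWS = 19748

# Non-null counts as reported in the table above.
non_null_counts = {
    "energy_value": 6792,
    "fat_value": 6715,
    "proteins_value": 6737,
    "carbohydrates_value": 6759,
}

def null_rate(non_null, total):
    """Percentage of rows with no value, rounded to two decimals."""
    return round(100 * (total - non_null) / total, 2)

null_rates = {
    field: null_rate(count, TOTAL_ROWS)
    for field, count in non_null_counts.items()
}
```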
</section>
<section id="off.c.05" class="level4" data-number="9.2.5">
<h4 data-number="9.2.5" class="anchored" data-anchor-id="off.c.05"><span class="header-section-number">9.2.5</span> OFF.C.05</h4>
<table class="caption-top table">
<thead>
<tr class="header">
<th>Column</th>
<th style="text-align: right;">Non-null</th>
<th style="text-align: right;">Null</th>
<th style="text-align: right;">Null %</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td>product_name_hi</td>
<td style="text-align: right;">111</td>
<td style="text-align: right;">19,637</td>
<td style="text-align: right;">99.44</td>
</tr>
<tr class="even">
<td>ingredients_text_hi</td>
<td style="text-align: right;">11</td>
<td style="text-align: right;">19,737</td>
<td style="text-align: right;">99.94</td>
</tr>
<tr class="odd">
<td>generic_name_hi</td>
<td style="text-align: right;">2</td>
<td style="text-align: right;">19,746</td>
<td style="text-align: right;">99.99</td>
</tr>
</tbody>
</table>
<p>Computed on the full 19,748-row dataset.</p>
</section>
<section id="off.c.06" class="level4" data-number="9.2.6">
<h4 data-number="9.2.6" class="anchored" data-anchor-id="off.c.06"><span class="header-section-number">9.2.6</span> OFF.C.06</h4>
<table class="caption-top table">
<colgroup>
<col style="width: 30%">
<col style="width: 30%">
<col style="width: 40%">
</colgroup>
<thead>
<tr class="header">
<th>Filter</th>
<th>Condition</th>
<th style="text-align: right;">Result</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td>Filter A</td>
<td>(brands OR brands_tags) AND (product_name_en OR generic_name_en) AND ingredients_text_en</td>
<td style="text-align: right;">4,105 rows</td>
</tr>
<tr class="even">
<td>Filter B</td>
<td>(brands OR brands_tags) AND product_name_en AND ingredients_text_en</td>
<td style="text-align: right;">4,104 rows</td>
</tr>
</tbody>
</table>
<p>Difference: 1 row. That row had generic_name_en populated and product_name_en empty. In the 4,105-row set, generic_name_en has 451 non-null values (10.99 percent non-null, 89.01 percent null).</p>
</section>
<section id="off.c.07" class="level4" data-number="9.2.7">
<h4 data-number="9.2.7" class="anchored" data-anchor-id="off.c.07"><span class="header-section-number">9.2.7</span> OFF.C.07</h4>
<p>Raw column count: 486. Columns retained after filter: product_name_en, brands, brands_tags, ingredients_text_en. Column count in working dataset: 4. The 482 removed columns include all non-English name and ingredient variants, all nutrient sub-fields, environmental scores, packaging fields, and contributor metadata.</p>
</section>
<section id="samp.c.01" class="level4" data-number="9.2.8">
<h4 data-number="9.2.8" class="anchored" data-anchor-id="samp.c.01"><span class="header-section-number">9.2.8</span> SAMP.C.01</h4>
<p>Roster file header: Total SKUs: 896 | Brands: 153 | Companies: 42 | Parent cats: 8 | Sub-cats: 30.</p>
</section>
<section id="samp.c.02" class="level4" data-number="9.2.9">
<h4 data-number="9.2.9" class="anchored" data-anchor-id="samp.c.02"><span class="header-section-number">9.2.9</span> SAMP.C.02</h4>
<table class="caption-top table">
<thead>
<tr class="header">
<th>Company</th>
<th style="text-align: right;">SKUs</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td>Tata Consumer Products</td>
<td style="text-align: right;">104</td>
</tr>
<tr class="even">
<td>Amul / GCMMF</td>
<td style="text-align: right;">82</td>
</tr>
<tr class="odd">
<td>Haldiram’s</td>
<td style="text-align: right;">68</td>
</tr>
<tr class="even">
<td>Hindustan Unilever</td>
<td style="text-align: right;">67</td>
</tr>
<tr class="odd">
<td>ITC Ltd</td>
<td style="text-align: right;">65</td>
</tr>
<tr class="even">
<td><strong>Total (top 5)</strong></td>
<td style="text-align: right;"><strong>386</strong></td>
</tr>
</tbody>
</table>
<p>386 / 896 = 43.1 percent of corpus.</p>
</section>
<section id="samp.c.03" class="level4" data-number="9.2.10">
<h4 data-number="9.2.10" class="anchored" data-anchor-id="samp.c.03"><span class="header-section-number">9.2.10</span> SAMP.C.03</h4>
<p>Summed from top-3 category fields across all 42 company rows in roster. Sum: 837. Difference from 896: 59 SKUs assigned to sub-categories below each company’s top three.</p>
</section>
<section id="samp.c.04" class="level4" data-number="9.2.11">
<h4 data-number="9.2.11" class="anchored" data-anchor-id="samp.c.04"><span class="header-section-number">9.2.11</span> SAMP.C.04</h4>
<p>First pass: 871 extracted, 129 DATA_NOT_FOUND. Second pass on 129: 25 additional, 104 persistent DATA_NOT_FOUND. Total extracted: 896. Total excluded: 104. Pass rate: 896 / 1000 = 89.6 percent.</p>
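<p>The two-pass procedure can be sketched as a retry over failures only. The <code>extract</code> callable, the SKU identifiers, and the bare-string sentinel are assumptions made for illustration; the report does not publish the extraction script.</p>

```python
DATA_NOT_FOUND = "DATA_NOT_FOUND"

def run_two_pass(skus, extract):
    """Call extract on every SKU, then retry only the failures once."""
    results = {sku: extract(sku, attempt=1) for sku in skus}
    failed = [sku for sku, out in results.items() if out == DATA_NOT_FOUND]
    for sku in failed:
        results[sku] = extract(sku, attempt=2)
    extracted = {s: o for s, o in results.items() if o != DATA_NOT_FOUND}
    excluded = [s for s, o in results.items() if o == DATA_NOT_FOUND]
    return extracted, excluded

def fake_extract(sku, attempt):
    """Hypothetical stand-in for the real API call."""
    if sku == "sku-c":
        return DATA_NOT_FOUND            # never found
    if sku == "sku-b" and attempt == 1:
        return DATA_NOT_FOUND            # found only on retry
    return ["ingredient"]

extracted, excluded = run_two_pass(["sku-a", "sku-b", "sku-c"], fake_extract)

# The reported funnel, recomputed from its parts.
submitted, first_pass_ok, second_pass_ok = 1000, 871, 25
final_corpus = first_pass_ok + second_pass_ok
persistent_failures = submitted - final_corpus
final_rate = round(100 * final_corpus / submitted, 1)
```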
</section>
<section id="samp.c.05" class="level4" data-number="9.2.12">
<h4 data-number="9.2.12" class="anchored" data-anchor-id="samp.c.05"><span class="header-section-number">9.2.12</span> SAMP.C.05</h4>
<p>Audit sample: 90 SKUs. Errors found: 1 (model merged content from multiple webpage sections). Rate within the audited sample: 1 / 90 = 0.0111. Expressed against the full corpus: 1 / 896 = 0.0011.</p>
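<p>The choice of denominator changes the reported rate by roughly a factor of ten, so it helps to compute both. A small sketch with the figures above; the variable names are illustrative:</p>

```python
audited = 90        # extractions manually checked
errors_found = 1    # hallucination instances found in the audit
corpus_size = 896   # all successful extractions

# Observed error rate among the extractions that were actually inspected.
rate_in_sample = errors_found / audited

# Known errors as a share of the whole corpus, most of which was not inspected.
rate_vs_corpus = errors_found / corpus_size
```

<p>The first figure describes what the audit observed. The second counts only the known instance and says nothing about the 806 extractions that were not audited.</p>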
</section>
<section id="samp.c.06" class="level4" data-number="9.2.13">
<h4 data-number="9.2.13" class="anchored" data-anchor-id="samp.c.06"><span class="header-section-number">9.2.13</span> SAMP.C.06</h4>
<p>Four criteria applied at SKU selection stage. Documented rejection categories: product range references requiring disambiguation to individual SKU (e.g., “Aashirvaad Spices” to “Aashirvaad Turmeric Powder 200g”); products not in Indian market distribution.</p>
</section>
<section id="samp.c.07" class="level4" data-number="9.2.14">
<h4 data-number="9.2.14" class="anchored" data-anchor-id="samp.c.07"><span class="header-section-number">9.2.14</span> SAMP.C.07</h4>
<p>Five constraints applied uniformly to all 1,000 attempted SKUs. System instruction for DATA_NOT_FOUND: “If ingredient list not found in whitelisted domains, return DATA_NOT_FOUND. DO NOT infer typical ingredients from product category. DO NOT approximate based on similar products.”</p>
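<p>Constraints (3) and (4) lend themselves to a mechanical check on each response. The sketch below is one possible validator, not the project's code. The four field names are those listed in SAMP.C.07; representing the failure case as the bare string <code>DATA_NOT_FOUND</code> is an assumption.</p>

```python
import json

REQUIRED_FIELDS = ("product_name", "ingredient_list", "source_url", "confidence")
DATA_NOT_FOUND = "DATA_NOT_FOUND"  # assumed to arrive as a bare string

def validate_response(raw):
    """Return ('not_found', None), ('ok', record) or ('invalid', reason)."""
    if raw.strip() == DATA_NOT_FOUND:
        return "not_found", None
    try:
        record = json.loads(raw)
    except json.JSONDecodeError as exc:
        return "invalid", f"not JSON: {exc.msg}"
    if not isinstance(record, dict):
        return "invalid", "top-level value is not an object"
    missing = [name for name in REQUIRED_FIELDS if name not in record]
    if missing:
        return "invalid", "missing fields: " + ", ".join(missing)
    return "ok", record
```

<p>Keeping "not found" separate from "invalid" preserves the distinction the system instruction draws between an explicit failure and a malformed answer.</p>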
</section>
<section id="samp.c.08" class="level4" data-number="9.2.15">
<h4 data-number="9.2.15" class="anchored" data-anchor-id="samp.c.08"><span class="header-section-number">9.2.15</span> SAMP.C.08</h4>
<p>Batch sizes 1–20 tested during extraction development. Optimal: 6 SKUs per batch. Violations at &gt;10 SKUs: DATA_NOT_FOUND omissions and formatting inconsistencies.</p>
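<p>Splitting the SKU list into fixed-size batches is straightforward. A minimal sketch, assuming the 1,000 attempted SKUs are held in a list; the identifiers and the function name are hypothetical.</p>

```python
def make_batches(skus, batch_size=6):
    """Split a list of SKUs into consecutive batches of at most batch_size."""
    if not batch_size >= 1:
        raise ValueError("batch_size must be at least 1")
    return [skus[i:i + batch_size] for i in range(0, len(skus), batch_size)]

# Hypothetical identifiers standing in for the attempted SKUs.
skus = [f"sku-{n:04d}" for n in range(1000)]
batches = make_batches(skus)
```

<p>With 1,000 SKUs and a batch size of 6, this gives 166 full batches and one final batch of 4.</p>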
</section>
</section>
</section>
<section id="appendix-a-sample-cleaning-rounds" class="level2 unnumbered">
<h2 class="unnumbered anchored" data-anchor-id="appendix-a-sample-cleaning-rounds">Appendix A: Sample Cleaning Rounds</h2>
<p>The iterative cleaning process that produced the final 2,291-variant set from the combined corpus was, by design, not logged round by round. The changes involved (correcting a transliteration typo, removing a fragment like <code>dried-powder</code> that parsed as an ingredient but was a formatting artifact, deciding whether <code>monohydrate</code> belonged in the corpus at all) were too granular and numerous to document individually without the log itself becoming unmanageable.</p>
<p>What follows is a representative excerpt from the audit scripts used during this process. It shows what the review actually looked like: automated flagging, human decision at each boundary case, iterative convergence toward a clean set.</p>
<p><strong>One audit pass — executive summary:</strong></p>
<pre><code>=============================================
      AI AUDIT EXECUTIVE SUMMARY
=============================================
Total Entries Audited : 709

APPROVED               :  623 (87.9%)
MODIFIED               :   51 (7.2%)
INVALID                :   35 (4.9%)
=============================================</code></pre>
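<p>The percentages in a summary like the one above follow directly from the status counts. The sketch below rebuilds the tally from the reported totals; the per-entry decision list is reconstructed for illustration and is not the actual audit log.</p>

```python
from collections import Counter

# Reconstructed from the reported totals, not from the real audit log.
decisions = ["APPROVED"] * 623 + ["MODIFIED"] * 51 + ["INVALID"] * 35

counts = Counter(decisions)
total = sum(counts.values())

# status -> (count, percentage rounded to one decimal)
summary = {
    status: (count, round(100 * count / total, 1))
    for status, count in counts.items()
}
```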
<p><strong>Entries flagged as INVALID — strings that were not ingredients:</strong></p>
<pre><code>atlantic · center-filling · cfu · chips · compound · dessert
dried-powder · dry · energy · flakes · food-additives · lubrication
moisture · monohydrate · mononitrate · only · pizza · plant-base
powder-mix · preservative · protein · savouries · test · toppings
vegetable · vegetable-mix · ...</code></pre>
<p><strong>Interactive kill review — the monohydrate decision:</strong></p>
<p>The boundary cases required a human in the loop. <code>monohydrate</code> and <code>mononitrate</code> are not ingredients — they are suffixes that appear on ingredient labels (as in <code>thiamine mononitrate</code>) but carry no identity when extracted alone. They were saved on first pass, then removed on second review:</p>
<pre><code>KILL 'monohydrate'? (y/n): n
Saving 'monohydrate'...

 Surgery Complete. Your files are now 'Steel'. 

[second pass]

KILL 'monohydrate'? (y/n): y
Executing 'monohydrate'...

 Surgery Complete. Your files are now 'Steel'. </code></pre>
<p><strong>Reclassification pass — where the judgments were not straightforward:</strong></p>
<pre><code>ITEM: fish
FROM: Additives &amp; Functional  →  TO: Proteins &amp; Meats
Accept? y 

ITEM: gluten
FROM: Staples (Grains/Dals)  →  TO: Proteins &amp; Meats
Accept? n   Added to manual review.

ITEM: spirulina
FROM: Additives &amp; Functional  →  TO: Fruits, Veg &amp; Botanicals
Accept? n   Added to manual review.

ITEM: fava-bean-protein
FROM: Additives &amp; Functional  →  TO: Proteins &amp; Meats
Accept? n   Added to manual review.</code></pre>
<p><code>gluten</code>, <code>spirulina</code>, and <code>fava-bean-protein</code> are examples where the automated reclassification suggestion was defensible but not settled — each sits at a category boundary that requires a classification framework to resolve, not a cleaning pass. They were held for the mapping stage.</p>
<p><strong>Final state after all cleaning rounds:</strong></p>
<pre><code>==================================================
     FINAL MONOGRAPH DATA SUMMARY
==================================================
Total Raw Variants (TSV)    : 46,635
Unique Canonical Units      : 662
==================================================</code></pre>
<p>The 46,635 raw variants and 662 canonical units shown here are from the OFF monograph specifically — a separate but parallel cleaning process applied to the OFF-derived strings. The 2,291 figure reported in the main body is the combined and deduplicated variant set from both sources, prior to canonical mapping. These are different counts at different stages of the pipeline and are not in conflict.</p>
</section>
<section id="contributors" class="level2 unnumbered">
<h2 class="unnumbered anchored" data-anchor-id="contributors">Contributors</h2>
<p><strong>Lalitha A R</strong> — Conceptualization, methodology, data curation, formal analysis, writing (original draft), writing (review and editing).</p>
<p><strong>Subrat Sethi</strong> — Data curation: SKU verification (200 SKUs).</p>
<p><strong>Purnendu Shukla</strong> — Software: API script execution for ingredient data retrieval.</p>
<p><strong>Radhakrishna MV</strong> <em>(Contributor, Open Food Facts India)</em> — Writing (review and editing): manuscript review for accurate and respectful representation of the Open Food Facts dataset and contributor ecosystem.</p>
</section>
<section id="acknowledgements" class="level2 unnumbered">
<h2 class="unnumbered anchored" data-anchor-id="acknowledgements">Acknowledgements</h2>
<p>My deepest gratitude to Mr.&nbsp;Krishna, whose constancy forms the foundation upon which all my work, including this, quietly rests. Salutations to the Goddess who dwells in all beings in the form of intelligence. I bow to her again and again.</p>
<p>We are deeply grateful to all contributors to the Open Food Facts (OFF) dataset, one of the core sources on which our efforts build. Thank you for all that you do. This report was prepared as part of the Indian Food Informatics Data (IFID) project at the Interdisciplinary Systems Research Lab (iSRL).</p>
</section>
<section id="references" class="level2 unnumbered">
<h2 class="unnumbered anchored" data-anchor-id="references">References</h2>
<div id="refs" class="references csl-bib-body hanging-indent">
<div id="ref-arIdentityTransformationFunction2026" class="csl-entry">
Lalitha, A. R. 2026. <em><span class="nocase">Identity, Transformation, and Function: A Tri-Axial Model for the Classification of Food Ingredient Identity</span></em>. Interdisciplinary Systems Research Lab. <a href="https://doi.org/10.5281/zenodo.18714526">https://doi.org/10.5281/zenodo.18714526</a>.
</div>
</div>


</section>


<div id="quarto-appendix" class="default"><section class="quarto-appendix-contents" id="quarto-reuse"><h2 class="anchored quarto-appendix-heading">Reuse</h2><div class="quarto-appendix-contents"><div><a rel="license" href="https://creativecommons.org/licenses/by/4.0/">CC BY 4.0</a></div></div></section><section class="quarto-appendix-contents" id="quarto-citation"><h2 class="anchored quarto-appendix-heading">Citation</h2><div><div class="quarto-appendix-secondary-label">BibTeX citation:</div><pre class="sourceCode code-with-copy quarto-appendix-bibtex"><code class="sourceCode bibtex">@report{a_r2026,
  author = {A R, Lalitha},
  publisher = {iSRL},
  title = {Data {Acquisition} and {Ingredient} {Extraction:} {Building}
    a {Vocabulary} of {What} {India’s} {Packaged} {Food} {Labels}
    {Actually} {Say}},
  number = {iSRL-26-04-R-Variants},
  date = {2026-03-01},
  url = {https://isrl.in/pub/2026-04-r-variants/},
  langid = {en},
  abstract = {Indian packaged food labels do not share a common
    ingredient vocabulary. The same substance appears under regional
    names, transliterations, INS (International Numbering System) codes,
    and brand-specific terms -\/-\/- sometimes across labels from the
    same manufacturer. No reference layer exists that maps these
    expressions to shared identities. This report documents the
    construction of a first ingredient variant corpus from two sources:
    896 directly sampled products collected from verified Indian market
    listings, and English ingredient strings from OpenFoodFacts filtered
    to rows with a traceable Indian product name. Both sets were
    processed through a constrained parsing pipeline -\/-\/- one atomic
    operation per API call, temperature set to 0, explicit failure
    rather than approximation when data was unavailable. After combining
    both sources and iterative cleaning, the corpus contains 2,291
    unique ingredient variant strings. These variants are not noise to
    eliminate. They are documentation of how ingredient identity is
    expressed in practice across Indian commercial food labels. The
    question of which variants refer to the same ingredient -\/-\/- and
    by what logic -\/-\/- is addressed in EMF Model
    {[}@arIdentityTransformationFunction2026{]}.}
}
</code></pre><div class="quarto-appendix-secondary-label">For attribution, please cite this work as:</div><div id="ref-a_r2026" class="csl-entry quarto-appendix-citeas">
A R, Lalitha. 2026. <em>Data Acquisition and Ingredient Extraction:
Building a Vocabulary of What India’s Packaged Food Labels Actually Say
</em>. iSRL-26-04-R-Variants. iSRL. <a href="https://isrl.in/pub/2026-04-r-variants/">https://isrl.in/pub/2026-04-r-variants/</a>.
</div></div></section></div> ]]></description>
  <guid>https://isrl.in/pub/2026-04-r-variants/</guid>
  <pubDate>Sun, 01 Mar 2026 00:00:00 GMT</pubDate>
</item>
<item>
  <title>Regulatory Texts and Case Law as Ground Truth in Emerging Domains</title>
  <dc:creator>Lalitha A R</dc:creator>
  <link>https://isrl.in/pub/2026-02-m-groundtruth/</link>
  <description><![CDATA[ 




<script>
document.addEventListener('DOMContentLoaded', function() {
  var meta = document.querySelector('#title-block-header .quarto-title-meta');
  if (!meta) return;
  meta.insertAdjacentHTML('beforeend', '<div><div class="quarto-title-meta-heading">Contributors</div><div class="quarto-title-meta-contents"><p class="author" style="margin:0 0 0.1em 0;">Hitha Sunil</p><p style="font-size:0.82em;color:#555;margin:0 0 0.5em 0;font-style:italic;">Typesetting</p></div></div>');
});
</script>
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "ScholarlyArticle",
  "name": "Regulatory Texts and Case Law as Ground Truth in Emerging Domains",
  "@id": "https://doi.org/10.5281/zenodo.18741725",
  "identifier": [
    "https://doi.org/10.5281/zenodo.18741725",
    "iSRL-26-02-M-GroundTruth"
  ],
  "description": "Argues that enacted law and judicial decisions constitute a practically available, historically grounded source of ground truth in emerging domains. Introduces regulatory delta analysis — examining how legislation and case law shift across time — to surface friction points and coordination constraints.",
  "datePublished": "2026-02-23",
  "license": "https://creativecommons.org/licenses/by/4.0/",
  "url": "https://isrl.in/pub/2026-02-m-groundtruth/",
  "author": {
    "@type": "Person",
    "name": "Lalitha A R",
    "identifier": "https://orcid.org/0009-0001-7466-3531",
    "sameAs": "https://orcid.org/0009-0001-7466-3531",
    "email": "lalithaar.research@gmail.com"
  },
  "publisher": {
    "@type": "ResearchOrganization",
    "name": "iSRL",
    "url": "https://isrl.in"
  }
}
</script>
<div class="abstract">
<p>In domains where no unified academic theory has yet consolidated, practitioners face a calibration problem: against what baseline does one evaluate a classification framework, a taxonomy, or a data model? This note argues that enacted law and judicial decisions constitute a practically available, historically grounded source of ground truth for such domains. We describe an approach—regulatory delta analysis—that examines how legislation and case law shift across time, with the aim of surfacing friction points, coordination patterns, and the constraints under which each version of a rule was written. The approach is not a critique of legislative bodies or courts; it is a method for reading the record they have already produced. Limitations are considered throughout. The note is grounded in applied work from the Indian Food Informatics Data (IFID) project but the reasoning is intended to transfer to other domains with similar structural properties.</p>
</div>
<section id="definitions" class="level1" data-number="1">
<h1 data-number="1"><span class="header-section-number">1</span> Definitions</h1>
<p>The terms below are used precisely throughout this note.</p>
<p><strong>Regulatory text.</strong> A statute, rule, or formal regulation enacted by a legislative or administrative body with binding legal authority. In the Indian context, examples include the Food Safety and Standards Act, 2006, and the Food Safety and Standards (Labelling and Display) Regulations, 2020.</p>
<p><strong>Case law.</strong> The body of judicial decisions—judgments handed down by courts and tribunals—that interpret, apply, and sometimes settle conflicts between regulatory texts. Case law acquires authority through the doctrine of precedent and, in many common law systems including India’s, through formal hierarchical citation obligations.</p>
<p><strong>Regulatory delta.</strong> The set of meaningful changes between two enacted versions of a regulatory text: additions, deletions, scope expansions, definitional tightenings, and any shifts in the underlying policy stance. A delta is not a simple diff; it is an interpretation of what changed and, where the record permits, why.</p>
<p><strong>Friction point.</strong> A location in the regulatory record—a contested statutory term, a split between regulators, a gap exploited by litigation—where the interests or interpretations of two or more actors have visibly come into tension. Friction points are distinct from errors; they are structurally revealing.</p>
<p><strong>Ground truth (working definition).</strong> For the purposes of this note, a ground truth is a reference against which the outputs of an analytical model can be tested. It does not imply perfect correctness; it implies that the reference has been produced by a process that is independent of and prior to the analysis being evaluated.</p>
</section>
<section id="the-calibration-problem-in-emerging-domains" class="level1" data-number="2">
<h1 data-number="2"><span class="header-section-number">2</span> The Calibration Problem in Emerging Domains</h1>
<p>Research in well-consolidated fields benefits from a body of replication, meta-analysis, and theoretical synthesis that can serve as a reference. A new model or framework can be tested against this existing record. The food systems informatics domain does not yet have this infrastructure in India. The field is recent, the data are fragmented, and the constructs—what counts as the same ingredient, how processing changes identity, how regional naming should relate to regulatory naming—remain unsettled.</p>
<p>A comparable situation arises in any domain that is technically complex, involves multiple institutional actors with partly overlapping mandates, and has developed faster than the academic literature has been able to consolidate it. Infrastructure regulation, environmental classification, digital governance, and traditional medicine systems all share these properties to varying degrees.</p>
<p>In such conditions, one cannot simply look to an established canon of empirical findings to validate a new framework. The question then is: what stable, legible, independently produced record exists against which one can calibrate?</p>
<p>We propose that the regulatory and legal record fills this role, with specific properties and specific limitations that are developed in the sections that follow.</p>
</section>
<section id="why-law-and-case-law-can-function-as-ground-truth" class="level1" data-number="3">
<h1 data-number="3"><span class="header-section-number">3</span> Why Law and Case Law Can Function as Ground Truth</h1>
<p>Regulatory texts and judicial decisions share several properties that make them useful as a reference in calibration contexts.</p>
<p><strong>They are the result of adversarial processes.</strong> Legislation is typically produced after consultation, lobbying, expert review, and political negotiation. Judicial decisions are produced after argument by opposed parties, subject to appeal, and written to justify a conclusion against the best available contrary reading. Both processes are imperfect, but both are designed to surface objections. The record they produce has, in a meaningful sense, survived challenge.</p>
<p><strong>They name conflicts directly.</strong> Academic literature tends to report consensus or to frame disagreement theoretically. Case law reports disagreement factually: here are two parties, here is what they disputed, here is which interpretation prevailed and why. The friction is the content of the document rather than a subtext to be inferred.</p>
<p><strong>They are time-stamped.</strong> Each regulatory text and each judgment carries a date. This makes it possible to sequence the record chronologically and ask which understanding of a concept was operative when.</p>
<p><strong>They are publicly accessible.</strong> In India, statutory instruments are notified in the Gazette of India. Judgments of the Supreme Court and High Courts are published in official reporters and on government databases. The record is, in principle, reachable by any researcher.</p>
<p><strong>They have institutional authority within their domain.</strong> A food safety regulation issued under the Food Safety and Standards Act, 2006 is, for practical purposes, the definition of the relevant concept for the actors it governs (manufacturers, importers, inspection officers), regardless of whether a food scientist would agree with it. When building systems that must operate in that legal environment, the legal definition is not one input among many; it is a constraint.</p>
<p>None of these properties make the legal record infallible. Section 6 addresses limitations. But they are sufficient to make law a productive starting point when other reference points are unavailable.</p>
</section>
<section id="regulatory-delta-analysis-as-method" class="level1" data-number="4">
<h1 data-number="4"><span class="header-section-number">4</span> Regulatory Delta Analysis as Method</h1>
<p>A regulatory delta is not simply a list of changes between two versions of a statute or rule. It is an interpretation of those changes in light of the context that produced them.</p>
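<p>The distinction can be made concrete. A mechanical diff yields only the raw list of changes; the delta begins where each change is given context. The sketch below uses Python's standard <code>difflib</code> for the first step. The clause wording is invented for illustration and is not quoted from any regulation, and the <code>context</code> field is a hypothetical placeholder for the analyst's reading.</p>

```python
import difflib

# Invented clause wording for illustration; not quoted from any regulation.
earlier = [
    "Every package shall carry a list of ingredients.",
    "The name of the food shall appear on the label.",
]
later = [
    "Every package shall carry a list of ingredients in descending order by weight.",
    "The name of the food shall appear on the label.",
    "Allergens shall be declared separately.",
]

# Step 1: the mechanical diff. Keep added and removed lines, and drop the
# file headers ("---", "+++") and hunk markers ("@@").
raw_changes = [
    line
    for line in difflib.unified_diff(earlier, later, lineterm="", n=0)
    if line.startswith(("+", "-")) and not line.startswith(("+++", "---"))
]

# Step 2: the delta proper. Each change awaits an interpretation that the
# diff cannot supply: what prompted it, and what the earlier text could
# not have anticipated.
delta_notes = [{"change": line, "context": None} for line in raw_changes]
```

<p>Every <code>context</code> value starts empty on purpose: filling it requires the drafting history, incident record, and case law that the surrounding sections describe.</p>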
<section id="reading-changes-in-context" class="level2" data-number="4.1">
<h2 data-number="4.1" class="anchored" data-anchor-id="reading-changes-in-context"><span class="header-section-number">4.1</span> Reading changes in context</h2>
<p>Laws are written under constraints. The constraints include the state of the relevant industry at the time of drafting, the administrative capacity available to enforce the rule, the incidents that made a regulatory response necessary, and the political feasibility of various options. A later version of a rule almost always looks more precise, more comprehensive, or more technically sophisticated than an earlier one—not because earlier drafters were careless, but because they were working with less data, less precedent, and a less developed field.</p>
<p>The appropriate stance when reading a delta is that of a scribe rather than a judge: the task is to document what changed, to note what the earlier version could not have anticipated, and to ask what new information or new pressure made the change necessary. This is a different question from asking whether the earlier rule was wrong.</p>
<p>Applied to the Indian food labelling context, the transition from the Food Safety and Standards (Packaging and Labelling) Regulations, 2011 to the Food Safety and Standards (Labelling and Display) Regulations, 2020 illustrates this clearly <span class="citation" data-cites="FSSAI_RegulatoryDelta">(Vukka and Lalitha 2026)</span>. The 2011 regulations were drafted at a point when industrial food processing in India was still expanding rapidly and digital traceability tools were not yet available to enforcement bodies. The 2020 regulations tightened allergen declarations, gave warnings a more structured form, clarified which elements belong on the principal display panel, and prescribed naming conventions with greater specificity. Each of these additions corresponds to a domain that developed, was observed, and was then addressed. The delta reveals an institution processing experience and updating its instrument accordingly.</p>
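<p>One way to keep the scribe's stance explicit is to record each change together with its context, rather than as a bare diff. The sketch below is illustrative only: the <code>DeltaEntry</code> class and its field names are assumptions introduced here, not part of the cited report, and the example values paraphrase the paragraph above.</p>

```python
from dataclasses import dataclass, field


@dataclass
class DeltaEntry:
    """One observed change between two versions of a rule (hypothetical schema)."""
    dimension: str            # which aspect of the rule changed
    earlier_instrument: str   # name of the earlier regulation
    later_instrument: str     # name of the later regulation
    change: str               # what changed, stated descriptively
    # Pressures or new information that made the change necessary
    context: list[str] = field(default_factory=list)

    def is_interpreted(self) -> bool:
        # An entry without context is only a diff, not yet an interpretation.
        return bool(self.context)


entry = DeltaEntry(
    dimension="allergen declarations",
    earlier_instrument="Packaging and Labelling Regulations, 2011",
    later_instrument="Labelling and Display Regulations, 2020",
    change="declarations tightened",
    context=["industrial food processing was still expanding when the 2011 rules were drafted"],
)
bare_diff = DeltaEntry(
    dimension="naming conventions",
    earlier_instrument="Packaging and Labelling Regulations, 2011",
    later_instrument="Labelling and Display Regulations, 2020",
    change="prescribed with greater specificity",
)

print(entry.is_interpreted(), bare_diff.is_interpreted())
```

<p>Keeping <code>context</code> as a separate field makes it easy to list the entries that still need the contextual reading this section describes.</p>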
</section>
<section id="identifying-friction-points-through-case-law" class="level2" data-number="4.2">
<h2 data-number="4.2" class="anchored" data-anchor-id="identifying-friction-points-through-case-law"><span class="header-section-number">4.2</span> Identifying friction points through case law</h2>
<p>Where the regulatory text leaves ambiguity, the courts resolve it—and in doing so, produce a record of where the ambiguity was, who held which interpretation, and which reading eventually prevailed. This makes case law particularly useful for identifying friction points that would not be visible from the statutory text alone.</p>
<p>The Supreme Court of India’s January 2026 judgment in <em>Commissioner of Customs (Import) v. M/s Welkin Foods</em> illustrates this <span class="citation" data-cites="lalitha_2026_supreme_court">(Lalitha 2026)</span>. The case concerned whether imported aluminium shelving should be classified as an agricultural machine part or as an aluminium structure. The legal question was narrow, but the Court’s reasoning established a hierarchy for resolving classification disputes in which statutory technical definitions take precedence over common commercial understanding. This hierarchy had been contested in earlier decisions and was now settled. For a food informatics system that must align with Indian classification practice, this judgment is a material constraint—one that would not have been legible from reading only the statutory text.</p>
</section>
<section id="mapping-coordination-across-institutional-actors" class="level2" data-number="4.3">
<h2 data-number="4.3" class="anchored" data-anchor-id="mapping-coordination-across-institutional-actors"><span class="header-section-number">4.3</span> Mapping coordination across institutional actors</h2>
<p>A regulatory domain typically involves multiple bodies with overlapping but non-identical mandates. In Indian food systems, the relevant actors include the Food Safety and Standards Authority of India (FSSAI), the Directorate General of Foreign Trade (DGFT), the Central Board of Indirect Taxes and Customs (CBIC), and the courts. These actors do not always interpret the domain identically, and their instruments do not always align.</p>
<p>The legal record makes these relationships visible. Where one body’s definition conflicts with another’s, there will typically be a judgment or a regulatory amendment that resolves the conflict, defers it, or acknowledges it. Mapping these interactions across time reveals not just what the current rule is, but how the current rule came to be and which pressures it is still absorbing.</p>
</section>
</section>
<section id="what-this-approach-surfaces" class="level1" data-number="5">
<h1 data-number="5"><span class="header-section-number">5</span> What This Approach Surfaces</h1>
<p>Regulatory delta analysis, applied systematically, tends to surface four types of information that are difficult to obtain from other sources.</p>
<p><strong>Constraint archaeology.</strong> Earlier versions of rules encode the constraints that were operative when they were written. Identifying these constraints, and asking whether they are still valid, can reveal where a regulatory framework still rests on an assumption that may no longer hold.</p>
<p><strong>Coordination mechanisms.</strong> When two bodies with overlapping mandates produce consistent rules over time, it is worth asking how that consistency is achieved. The legal record will often contain evidence of formal coordination mechanisms, mutual referencing, or the adoption of shared definitions.</p>
<p><strong>Friction without resolution.</strong> Not all conflicts in the legal record are resolved. Some cases are settled before judgment. Some regulatory ambiguities are explicitly deferred. These unresolved tensions are as informative as the settled ones: they mark the places where the system has not yet stabilised.</p>
<p><strong>Bias documentation.</strong> Legal instruments are written by people operating in institutional contexts, and they reflect the concerns, categories, and blind spots of those contexts. A regulatory text that focuses on industrial food and does not address traditional preparations is not neutral; it reflects what was legible and politically salient at the time of drafting. Noting these asymmetries is part of using the legal record honestly.</p>
</section>
<section id="limitations" class="level1" data-number="6">
<h1 data-number="6"><span class="header-section-number">6</span> Limitations</h1>
<p>The approach described here has several limitations that must be kept in mind.</p>
<p><strong>Law is not science.</strong> A court may settle a dispute by choosing an interpretation that is administratively convenient or politically feasible rather than technically accurate. A regulatory definition may persist after the scientific understanding of the relevant phenomenon has moved on. The legal record reflects what was decided, not necessarily what was correct.</p>
<p><strong>Unenforced rules are not reliable evidence.</strong> A statute that exists on paper but is not enforced tells us something about legislative intent but little about actual practice. The gap between enacted law and operational practice can be substantial.</p>
<p><strong>The record is not complete.</strong> Not all decisions are published. Not all conflicts result in litigation. Regulatory negotiations that produce an amended rule may leave no public trace of the original disagreement. The legal record samples the domain rather than covers it.</p>
<p><strong>Jurisdiction specificity.</strong> The regulatory architecture of one country or regulatory system is not directly transferable to another. Insights from Indian food law are not automatically generalisable to food systems in other jurisdictions, though the structural properties of the method may transfer.</p>
<p><strong>Temporal lag.</strong> Legislation and litigation are slow. The legal record may be significantly behind the current state of the domain, particularly in fast-moving technical fields. Using the legal record as ground truth requires acknowledging that it may be calibrated to a version of the domain that no longer obtains.</p>
<p>These limitations do not disqualify the approach. They specify the conditions under which it is and is not useful, and they indicate the supplementary sources—field research, domain expert consultation, technical audits—that should accompany it.</p>
</section>
<section id="relationship-to-other-sources-and-methods" class="level1" data-number="7">
<h1 data-number="7"><span class="header-section-number">7</span> Relationship to Other Sources and Methods</h1>
<p>This note does not argue that the legal record should replace other methods of establishing ground truth. It argues that the legal record is an underused source that has specific properties making it productive in specific conditions: emerging domains, multi-actor regulatory environments, and contexts where the gap between legal definition and operational practice is itself an object of study.</p>
<p>The approach is most useful in combination with domain expert consultation, which can identify where the legal record is silent or misleading; with empirical data collection, which can reveal practice that diverges from legal prescription; and with traditional academic literature, which provides theoretical frameworks for interpreting what the legal record contains.</p>
<p>Academic literature is not deprecated here. The claim is narrower: in a domain where the academic literature is still being assembled, the legal record is available now, has been produced by adversarial processes, and carries authority for the actors whose behaviour one is trying to understand or model. It is a reasonable place to start.</p>
</section>
<section id="closing-remarks" class="level1" data-number="8">
<h1 data-number="8"><span class="header-section-number">8</span> Closing Remarks</h1>
<p>Regulatory systems change. The record of that change is publicly available, time-stamped, and produced by institutions that have observed the domain, absorbed feedback from it, and updated their instruments accordingly. Reading that record carefully is not an alternative to original research—it is a form of original research, and one that takes the accumulated work of regulatory bodies and courts seriously as evidence rather than setting it aside in favour of sources that may be more recent but less tested.</p>
<p>The goal is not to judge who was right and who was wrong in any given dispute, or whether a regulatory body made the best possible decision with the information available. The goal is to understand where the system has been, where it has strained, and what that history reveals about where it currently stands. That is a question that the legal record is unusually well placed to answer.</p>
</section>
<section id="acknowledgments" class="level1 unnumbered">
<h1 class="unnumbered">Acknowledgments</h1>
<p>My deepest gratitude to Mr.&nbsp;Krishna, whose constancy forms the foundation upon which all my work, including this, quietly rests.</p>
<p>Salutations to the Goddess who dwells in all beings in the form of intelligence. I bow to her again and again.</p>
<p>This note draws on methods developed in the course of the Indian Food Informatics Data (IFID) project at iSRL. The author thanks the researchers whose applied work surfaced the need for explicit methodological documentation.</p>
<section id="references" class="level2 unnumbered">
<h2 class="unnumbered anchored" data-anchor-id="references">References</h2>
<div id="refs" class="references csl-bib-body hanging-indent">
<div id="ref-lalitha_2026_supreme_court" class="csl-entry">
Lalitha, A. R. 2026. <em><span class="nocase">Indian Supreme Court Defines Hierarchical Classification for Food Products: Overruling Common Parlance Precedents</span></em>. Interdisciplinary Systems Research Lab.
</div>
<div id="ref-FSSAI_RegulatoryDelta" class="csl-entry">
Vukka, S. N., and A. R. Lalitha. 2026. <em><span class="nocase">Regulatory Delta of Food Labelling Laws in India: A Comparative Analysis of the FSSAI 2011 and 2020 Regulations</span></em>. Indian Food Informatics Data (IFID) Project, Interdisciplinary Systems Research Lab. <a href="https://doi.org/10.5281/zenodo.18710428">https://doi.org/10.5281/zenodo.18710428</a>.
</div>
</div>


</section>
</section>

<div id="quarto-appendix" class="default"><section class="quarto-appendix-contents" id="quarto-reuse"><h2 class="anchored quarto-appendix-heading">Reuse</h2><div class="quarto-appendix-contents"><div><a rel="license" href="https://creativecommons.org/licenses/by/4.0/">CC BY 4.0</a></div></div></section><section class="quarto-appendix-contents" id="quarto-citation"><h2 class="anchored quarto-appendix-heading">Citation</h2><div><div class="quarto-appendix-secondary-label">BibTeX citation:</div><pre class="sourceCode code-with-copy quarto-appendix-bibtex"><code class="sourceCode bibtex">@report{a_r2026,
  author = {A R, Lalitha},
  publisher = {iSRL},
  title = {Regulatory {Texts} and {Case} {Law} as {Ground} {Truth} in
    {Emerging} {Domains}},
  number = {iSRL-26-02-M-GroundTruth},
  date = {2026-02-23},
  url = {https://isrl.in/pub/2026-02-m-groundtruth/},
  doi = {10.5281/zenodo.18741725},
  langid = {en}
}
</code></pre><div class="quarto-appendix-secondary-label">For attribution, please cite this work as:</div><div id="ref-a_r2026" class="csl-entry quarto-appendix-citeas">
A R, Lalitha. 2026. <em>Regulatory Texts and Case Law as Ground Truth in
Emerging Domains</em>. iSRL-26-02-M-GroundTruth. iSRL. <a href="https://doi.org/10.5281/zenodo.18741725">https://doi.org/10.5281/zenodo.18741725</a>.
</div></div></section></div> ]]></description>
  <guid>https://isrl.in/pub/2026-02-m-groundtruth/</guid>
  <pubDate>Mon, 23 Feb 2026 00:00:00 GMT</pubDate>
</item>
<item>
  <title>Regulatory Delta of Food Labelling Laws in India: A Comparative Analysis of the FSSAI 2011 and 2020 Regulations</title>
  <dc:creator>Sai Nikhil Vukka</dc:creator>
  <dc:creator>Lalitha A R</dc:creator>
  <link>https://isrl.in/pub/2026-02-r-regdelta/</link>
  <description><![CDATA[ 




<script>
document.addEventListener('DOMContentLoaded', function() {
  var meta = document.querySelector('#title-block-header .quarto-title-meta');
  if (!meta) return;
  meta.insertAdjacentHTML('beforeend', '<div><div class="quarto-title-meta-heading">Contributors</div><div class="quarto-title-meta-contents"><p class="author" style="margin:0 0 0.1em 0;">Hitha Sunil</p><p style="font-size:0.82em;color:#555;margin:0 0 0.5em 0;font-style:italic;">Typesetting</p></div></div>');
});
</script>
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "ScholarlyArticle",
  "name": "Regulatory Delta of Food Labelling Laws in India: A Comparative Analysis of the FSSAI 2011 and 2020 Regulations",
  "@id": "https://doi.org/10.5281/zenodo.18719394",
  "identifier": [
    "https://doi.org/10.5281/zenodo.18719394",
    "iSRL-26-02-R-RegDelta"
  ],
  "description": "Summarises the regulatory delta between India's 2011 and 2020 FSSAI food labelling regulations, focusing on transitions toward prescriptive naming, structured allergen declarations, and risk-aware warnings that directly inform IFID data field selection.",
  "datePublished": "2026-02-21",
  "license": "https://creativecommons.org/licenses/by/4.0/",
  "url": "https://isrl.in/pub/2026-02-r-regdelta/",
  "author": [
    {
      "@type": "Person",
      "name": "Sai Nikhil Vukka",
      "identifier": "https://orcid.org/0009-0007-6717-7512",
      "sameAs": "https://orcid.org/0009-0007-6717-7512"
    },
    {
      "@type": "Person",
      "name": "Lalitha A R",
      "identifier": "https://orcid.org/0009-0001-7466-3531",
      "sameAs": "https://orcid.org/0009-0001-7466-3531",
      "email": "lalithaar.research@gmail.com"
    }
  ],
  "publisher": {
    "@type": "ResearchOrganization",
    "name": "iSRL",
    "url": "https://isrl.in"
  }
}
</script>
<section id="abstract" class="level2" data-number="0.1">
<h2 data-number="0.1" class="anchored" data-anchor-id="abstract"><span class="header-section-number">0.1</span> Abstract</h2>
<p>This short report summarises the regulatory “delta” between the Food Safety and Standards (Packaging and Labelling) Regulations, 2011 and the Food Safety and Standards (Labelling and Display) Regulations, 2020. The focus is on the transition toward prescriptive naming, structured allergen declarations, and risk-aware warnings. These shifts directly inform the selection of specific data fields in digital ingredient identity layers such as IFID.</p>
</section>
<section id="sec-background" class="level1" data-number="1">
<h1 data-number="1"><span class="header-section-number">1</span> Background</h1>
<p>In 2011, FSSAI notified the <em>Food Safety and Standards (Packaging and Labelling) Regulations, 2011</em>, which combined packaging requirements and labelling rules into a single framework. <span class="citation" data-cites="fssai-2011">(Food Safety and Standards Authority of India 2011)</span> Roughly a decade later, the authority split packaging and labelling into separate regulations and brought in the <em>Food Safety and Standards (Labelling and Display) Regulations, 2020</em>. <span class="citation" data-cites="fssai-2020-labelling">(Food Safety and Standards Authority of India 2020)</span> The 2020 move is more than a reshuffle of chapters: it makes labelling more structured, more consumer-facing, and easier to plug into digital compliance tools. <span class="citation" data-cites="fssai-2020-labelling fssai-faq-2022 lanpub-india-fopl">(Food Safety and Standards Authority of India 2020, n.d.-b; Pande et al. 2020)</span></p>
<p>Over this period, both industrial food systems and digital traceability infrastructure in India have grown in complexity. <span class="citation" data-cites="frontpack-2011 icmrnin-hfss">(Citizen consumer and civic Action Group, n.d.; Indian Council of Medical Research–National Institute of Nutrition, n.d.)</span> As more product categories emerged and more data and feedback became available, regulators have had opportunities to refine how key information is displayed and standardised. This report gives a compact view of how the 2020 regulations shift emphasis compared to 2011, and what that means for people who need to interpret or implement the law in practice.</p>
<p>The shift between the two frameworks can be understood as part of an ongoing evolution in both the packaged food sector and regulatory practice, rather than as a simple before/after contrast. The 2011 regulations drew on the industrial food landscape and data that were available at that time, while the 2020 framework reflects additional years of experience, feedback and product diversification. In that sense, the later regulations build on the earlier ones as the system as a whole becomes more capable of handling finer-grained labelling expectations.</p>
</section>
<section id="sec-delta" class="level1" data-number="2">
<h1 data-number="2"><span class="header-section-number">2</span> Regulatory Delta: 2011 vs 2020</h1>
<p>The differences between the 2011 and 2020 frameworks can be read as part of a longer, iterative process rather than as a sharp break. The 2011 regulations were drafted at a time when packaged and industrial foods, as well as digital tracking systems, were at an earlier stage of development, and they reflect the practices and concerns that were salient then. <span class="citation" data-cites="fssai-2011 frontpack-2011">(Food Safety and Standards Authority of India 2011; Citizen consumer and civic Action Group, n.d.)</span> As product ranges expanded, more data accumulated and stakeholder feedback highlighted specific gaps, FSSAI consolidated those learnings into the 2020 Labelling and Display Regulations. <span class="citation" data-cites="fssai-2020-labelling fssai-comp-labelling">(Food Safety and Standards Authority of India 2020, n.d.-a)</span> The delta in this section is therefore best read as a record of how the system has been strengthened over time, not as a critique of the earlier framework.</p>
<p>Table&nbsp;1 highlights some of the most visible differences between the 2011 and 2020 labelling rules as reflected in official texts and institutional summaries. <span class="citation" data-cites="fssai-2011 fssai-2020-labelling fssai-faq-2022">(Food Safety and Standards Authority of India 2011, 2020, n.d.-b)</span></p>
<div id="tbl-delta" class="quarto-float quarto-figure quarto-figure-center anchored">
<figure class="quarto-float quarto-float-tbl figure">
<figcaption class="quarto-float-caption-top quarto-float-caption quarto-float-tbl" id="tbl-delta-caption-0ceaefa1-69ba-4598-a22c-09a6ac19f8ca">
Table&nbsp;1: Regulatory delta between FSSAI 2011 and 2020 labelling rules
</figcaption>
<div aria-describedby="tbl-delta-caption-0ceaefa1-69ba-4598-a22c-09a6ac19f8ca">
<table class="caption-top table">
<colgroup>
<col style="width: 33%">
<col style="width: 33%">
<col style="width: 33%">
</colgroup>
<thead>
<tr class="header">
<th><strong>Dimension</strong></th>
<th><strong>2011 Packaging &amp; Labelling</strong></th>
<th><strong>2020 Labelling &amp; Display</strong></th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td><strong>Allergen visibility</strong></td>
<td>Disclosures primarily embedded in the ingredient list, with responsibility on the consumer to scan the full list.</td>
<td>Priority allergen groups such as cereals containing gluten, milk, peanuts, soy and sulphites are presented through clearer, more standardised declarations.</td>
</tr>
<tr class="even">
<td><strong>Naming / “true nature”</strong></td>
<td>Provides flexibility for brand-led naming on the principal display panel, guided by general fair-trading and anti-misleading provisions.</td>
<td>Places greater emphasis on the name reflecting the true nature of the food, with more explicit expectations to avoid creating an erroneous impression.</td>
</tr>
<tr class="odd">
<td><strong>Nutrition information</strong></td>
<td>Focus on per 100 g/ml declarations for key nutrients; per serving information is less central.</td>
<td>Encourages a clearer pattern for showing nutrition per 100 g/ml <em>and</em> per serving, supporting front-of-pack and percentage RDA style interpretations.</td>
</tr>
<tr class="even">
<td><strong>Additives and warnings</strong></td>
<td>Class + name/INS for additives, with generic warning styles for certain substances (for example, colours or preservatives).</td>
<td>Gives more structured attention to specific warnings (for example, for sulphites or particular additives) and clearer wording for sensitive population groups.</td>
</tr>
<tr class="odd">
<td><strong>Front-of-pack (FoP) thinking</strong></td>
<td>Labelling can largely be organised around back and side panels, with front-of-pack as one option among many.</td>
<td>Articulates more clearly which elements (such as name, veg/non-veg logo and certain declarations) belong on the principal display panel, creating a base for later FoP policies.</td>
</tr>
<tr class="even">
<td><strong>Enforcement posture</strong></td>
<td>Centres on ensuring information is not false or misleading, with compliance work often document- and text-centric.</td>
<td>Structured declarations lend themselves to checklists, digital audits and front-of-pack policies that build on nutrient profile models.</td>
</tr>
</tbody>
</table>
</div>
</figure>
</div>
<section id="sec-aims" class="level2" data-number="2.1">
<h2 data-number="2.1" class="anchored" data-anchor-id="sec-aims"><span class="header-section-number">2.1</span> What the Law is Aiming to Address</h2>
<p>Read together, these shifts point to a gradual tightening around a few recurring questions.</p>
<section id="managing-information-density-and-risk-signals." class="level4" data-number="2.1.0.1">
<h4 data-number="2.1.0.1" class="anchored" data-anchor-id="managing-information-density-and-risk-signals."><span class="header-section-number">2.1.0.1</span> Managing information density and risk signals.</h4>
<ul>
<li>Ensuring that allergens remain visible and recognisable, rather than being overlooked in dense ingredient lists.</li>
<li>Reducing the chances that product names or descriptors leave consumers with an incomplete or ambiguous sense of what they are buying.</li>
<li>Bringing more structure to how high fat, sugar and salt profiles are communicated, especially as processed food categories diversify.</li>
</ul>
</section>
<section id="supporting-more-structured-labelling-practices." class="level4" data-number="2.1.0.2">
<h4 data-number="2.1.0.2" class="anchored" data-anchor-id="supporting-more-structured-labelling-practices."><span class="header-section-number">2.1.0.2</span> Supporting more structured labelling practices.</h4>
<ul>
<li>Encouraging standardised phrasing and placement for priority allergen information.</li>
<li>Clarifying expectations for front-of-pack elements, so that key signals are easier to locate.</li>
<li>Laying technical groundwork for future tools such as front-of-pack labels informed by nutrient profile models.</li>
</ul>
<p>For lawyers and compliance teams, this means that purely formal arguments like “the information is somewhere on the pack” increasingly give way to questions about prominence, placement and structure. The 2020 framework leans towards asking whether the overall label presentation aligns with these expectations in a consistent way.</p>
</section>
</section>
</section>
<section id="sec-implications" class="level1" data-number="3">
<h1 data-number="3"><span class="header-section-number">3</span> Practical Implications for Stakeholders</h1>
<p>From a day-to-day point of view, the 2011–2020 changes nudge both companies and advisors towards more explicit internal systems for tracking allergens, names and nutrition.</p>
<section id="sec-businesses" class="level2" data-number="3.1">
<h2 data-number="3.1" class="anchored" data-anchor-id="sec-businesses"><span class="header-section-number">3.1</span> For Food Businesses</h2>
<ul>
<li><strong>Allergen tracking becomes more explicit:</strong> manufacturers benefit from maintaining internal mappings between ingredients and standard allergen groups, rather than relying only on free-text descriptions.</li>
<li><strong>Naming policies may require review:</strong> product names and descriptors that were aligned with earlier interpretations may need revisiting to match the “true nature” emphasis in the 2020 framework.</li>
<li><strong>Nutrition data hygiene matters more:</strong> keeping consistent, up-to-date values for energy, sugars, fats and sodium supports both regulatory expectations and clearer communication with consumers.</li>
</ul>
</section>
<section id="sec-lawyers" class="level2" data-number="3.2">
<h2 data-number="3.2" class="anchored" data-anchor-id="sec-lawyers"><span class="header-section-number">3.2</span> For Lawyers and Compliance Teams</h2>
<ul>
<li><strong>Case work can use more structure:</strong> instead of only reading long labels line by line, advisors can ask whether allergen declarations, names and nutrition panels line up with the specific structures the 2020 regulations describe.</li>
<li><strong>Advice can be more template-driven:</strong> it becomes realistic to build standard checklists for allergens, naming and nutrition that can be reused across clients or product lines.</li>
</ul>
</section>
<section id="the-compliance-checklist-for-startups" class="level2" data-number="3.3">
<h2 data-number="3.3" class="anchored" data-anchor-id="the-compliance-checklist-for-startups"><span class="header-section-number">3.3</span> The Compliance Checklist for Startups</h2>
<p>For founders and early-stage teams, a quick sanity check can be more useful than a long memo. A simple checklist that falls out of the 2011–2020 delta is:</p>
<ul class="task-list">
<li><label><input type="checkbox"><strong>Allergen coverage:</strong> Have you identified and labelled all FSSAI priority allergen groups that apply to your product?</label></li>
<li><label><input type="checkbox"><strong>Front-of-pack name:</strong> Does the name on the principal display panel reflect the true nature of the food, rather than only a marketing phrase?</label></li>
<li><label><input type="checkbox"><strong>Per-serving signals:</strong> Is the per-serving nutrition information (including percentage RDA where applicable) clear enough for a consumer to understand the product’s fat, sugar and salt profile at a glance?</label></li>
</ul>
<p>Treating these as a recurring checklist rather than a one-time launch task makes it easier to stay aligned with how the 2020 regulations expect labels to behave.</p>
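<p>As a rough illustration of running the three questions as a recurring check, the sketch below encodes them as a small function. The <code>LabelDraft</code> class, its field names and the sample product are hypothetical. Two of the three answers are still human judgments that the code only records.</p>

```python
from dataclasses import dataclass


@dataclass
class LabelDraft:
    """Hypothetical summary of a label under review; field names are illustrative."""
    applicable_allergen_groups: set[str]    # groups that apply to the product
    declared_allergen_groups: set[str]      # groups actually declared on the label
    display_name_reflects_true_nature: bool  # reviewer's judgment
    has_clear_per_serving_nutrition: bool    # reviewer's judgment


def run_checklist(label: LabelDraft) -> dict[str, bool]:
    """Return pass/fail for each of the three checklist questions."""
    return {
        "allergen_coverage": label.applicable_allergen_groups.issubset(
            label.declared_allergen_groups
        ),
        "front_of_pack_name": label.display_name_reflects_true_nature,
        "per_serving_signals": label.has_clear_per_serving_nutrition,
    }


draft = LabelDraft(
    applicable_allergen_groups={"milk", "peanuts"},
    declared_allergen_groups={"milk"},
    display_name_reflects_true_nature=True,
    has_clear_per_serving_nutrition=True,
)
results = run_checklist(draft)
failed = [name for name, ok in results.items() if not ok]
print(failed)
```

<p>Here the draft fails only the allergen question, because peanuts apply to the product but are not declared. Re-running the function after each recipe or artwork change keeps the checklist recurring rather than one-off.</p>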
</section>
</section>
<section id="sec-ifid" class="level1" data-number="4">
<h1 data-number="4"><span class="header-section-number">4</span> Implications for Digital Ingredient Identity Systems (IFID)</h1>
<p>Because the 2020 regulations move from unstructured text to specific, standardised declarations, the ‘regulatory delta’ described here serves as a blueprint for any system aiming for compliance, whether a printed label or a digital database.</p>
<p>At a bare minimum, an ingredient record in such a system can support:</p>
<ul>
<li><strong>A canonical “true nature” name</strong> plus a list of vernacular or commercial aliases, so that regional naming and compliant labelling language can be linked cleanly.</li>
<li><strong>Structured allergen membership</strong>, for example a small set of flags for cereals containing gluten, milk, peanuts, tree nuts, soy and sulphites, instead of only storing full text ingredient names.</li>
<li><strong>Basic nutrient fields</strong> (energy, total sugars, saturated fat, sodium per 100 g/ml) in a consistent format, so that front-of-pack or HFSS-style rules can be applied programmatically later if needed.</li>
</ul>
<p>If these fields live in a stable backend identity layer, then future FSSAI amendments—such as a new HFSS threshold or a focus on a particular additive—can be implemented as updated rules that run across existing products, rather than as one-off manual relabelling exercises.</p>
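<p>A minimal sketch of such a record, and of a rule that runs across stored records, might look like the following. The class name, field names, sample values and the numeric threshold are all assumptions made for illustration. None of them is an IFID schema or an FSSAI figure.</p>

```python
from dataclasses import dataclass, field


@dataclass
class IngredientRecord:
    """Illustrative ingredient record; field names are assumptions, not an IFID schema."""
    canonical_name: str                                     # "true nature" name
    aliases: list[str] = field(default_factory=list)        # vernacular or commercial names
    allergen_groups: set[str] = field(default_factory=set)  # structured allergen flags
    # Nutrient values per 100 g/ml
    energy_kcal: float = 0.0
    total_sugars_g: float = 0.0
    saturated_fat_g: float = 0.0
    sodium_mg: float = 0.0


def flag_high_sugar(records: list[IngredientRecord], limit_g: float) -> list[str]:
    """Return canonical names of records whose total sugars exceed limit_g per 100 g/ml."""
    return [r.canonical_name for r in records if r.total_sugars_g > limit_g]


# Placeholder values for illustration only; they are not measured data.
records = [
    IngredientRecord("Example ingredient A", aliases=["regional name A"],
                     allergen_groups={"milk"}, total_sugars_g=12.0),
    IngredientRecord("Example ingredient B", allergen_groups={"peanuts"},
                     total_sugars_g=3.0),
]

# The thresholds are made-up numbers, not FSSAI values. A later amendment
# would change only this argument, not the stored records.
print(flag_high_sugar(records, limit_g=10.0))
print(flag_high_sugar(records, limit_g=2.0))
```

<p>The stored records are identical in both calls; only the rule parameter changes. That is the sense in which an amendment becomes an updated rule rather than a relabelling exercise.</p>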
<section id="sec-ifid-fields" class="level2" data-number="4.1">
<h2 data-number="4.1" class="anchored" data-anchor-id="sec-ifid-fields"><span class="header-section-number">4.1</span> Mandatory vs Optional Fields in IFID Records</h2>
<p>For a digital ingredient identity layer to stay aligned with the 2020 regulations and still be useful for future extensions, it helps to separate <em>mandatory</em> compliance fields from <em>optional</em> but valuable metadata.</p>
<section id="mandatory-fields-driven-by-fssai-2020." class="level4" data-number="4.1.0.1">
<h4 data-number="4.1.0.1" class="anchored" data-anchor-id="mandatory-fields-driven-by-fssai-2020."><span class="header-section-number">4.1.0.1</span> Mandatory fields (driven by FSSAI 2020).</h4>
<ul>
<li><strong>Canonical “true nature” name:</strong> a single, standardised name that reflects what the ingredient actually is, to reduce scope for ambiguous naming on labels.</li>
<li><strong>Allergen group membership:</strong> explicit mapping of each ingredient to the relevant FSSAI priority allergen groups (for example cereals containing gluten, milk, peanuts, tree nuts, soy, sulphites) so that allergen statements can be generated consistently.</li>
<li><strong>Core nutrient values:</strong> at least energy, total sugars, saturated fat and sodium per 100 g/ml, to support basic HFSS-style signalling and any future front-of-pack requirements that depend on these nutrients.</li>
</ul>
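<p>To illustrate how explicit allergen group membership lets a statement be generated consistently, the sketch below derives one from an ingredient list. The mapping, the ingredient names and the "Contains:" wording are assumptions for illustration, not wording prescribed by the regulations.</p>

```python
# Hypothetical mapping from canonical ingredient names to allergen groups.
INGREDIENT_ALLERGENS: dict[str, set[str]] = {
    "wheat flour": {"cereals containing gluten"},
    "milk solids": {"milk"},
    "roasted peanuts": {"peanuts"},
    "sugar": set(),
}


def allergen_statement(ingredients: list[str]) -> str:
    """Build one allergen statement from the groups of all listed ingredients."""
    groups: set[str] = set()
    for name in ingredients:
        if name not in INGREDIENT_ALLERGENS:
            # An unmapped ingredient must not silently yield an empty statement.
            raise KeyError(f"no allergen mapping for ingredient: {name}")
        groups |= INGREDIENT_ALLERGENS[name]
    if not groups:
        return ""
    # Sorting keeps the output identical regardless of ingredient order.
    return "Contains: " + ", ".join(sorted(groups))


print(allergen_statement(["wheat flour", "sugar", "milk solids"]))
print(allergen_statement(["milk solids", "wheat flour"]))
```

<p>Both calls print the same statement because it is derived from group membership rather than from free-text ingredient names.</p>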
</section>
<section id="optional-metadata-for-future-proofing-and-usability." class="level4" data-number="4.1.0.2">
<h4 data-number="4.1.0.2" class="anchored" data-anchor-id="optional-metadata-for-future-proofing-and-usability."><span class="header-section-number">4.1.0.2</span> Optional metadata (for future-proofing and usability).</h4>
<ul>
<li><strong>Vernacular and commercial names:</strong> regional aliases and brand-style names that help link consumer-facing labels back to a single canonical ingredient record without losing cultural context.</li>
<li><strong>Versioned compliance rules and flags:</strong> pointers from the ingredient record to external rule-sets (for example “FSSAI_2011”, “FSSAI_2020”, “FSSAI_2025”) so that when regulations change, new rules can be applied to existing IFID records in a single automated audit, without rewriting the underlying identities.</li>
</ul>
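<p>The versioned rule-set idea can be sketched as a registry keyed by rule-set identifier. Everything below is hypothetical: the identifier format <code>IFID-0001</code>, the record keys and the contents of each rule-set are placeholders, not a statement of what either regulation actually requires.</p>

```python
from typing import Callable

Record = dict[str, object]
Rule = Callable[[Record], bool]


def has_canonical_name(record: Record) -> bool:
    return bool(record.get("canonical_name"))


def has_allergen_mapping(record: Record) -> bool:
    return "allergen_groups" in record


# Rule-sets live outside the records; their contents here are placeholders.
RULE_SETS: dict[str, list[Rule]] = {
    "FSSAI_2011": [has_canonical_name],
    "FSSAI_2020": [has_canonical_name, has_allergen_mapping],
}


def audit(records: list[Record], rule_set_id: str) -> dict[str, list[str]]:
    """Map each failing record's identifier to the names of the rules it fails."""
    report: dict[str, list[str]] = {}
    for record in records:
        failed = [rule.__name__ for rule in RULE_SETS[rule_set_id] if not rule(record)]
        if failed:
            report[str(record["ifid"])] = failed
    return report


records: list[Record] = [
    {"ifid": "IFID-0001", "canonical_name": "Wheat flour",
     "allergen_groups": ["cereals containing gluten"]},
    {"ifid": "IFID-0002", "canonical_name": "Refined sugar"},
]

print(audit(records, "FSSAI_2011"))
print(audit(records, "FSSAI_2020"))
```

<p>Adding a new rule-set is a change to <code>RULE_SETS</code> only; the identity records themselves are not rewritten.</p>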
<p>Keeping this distinction clear makes it easier to run the IFID project in a continuous loop: mandatory fields ensure basic regulatory alignment, while optional metadata can be expanded over time as new use-cases and amendments appear.</p>
<p>From a data science point of view, the 2020 structure can also be read as an invitation to build <em>API-first</em> compliance systems: once allergens, names and nutrients are represented as stable fields in an ingredient database, they can be exposed through services that run automated checks whenever a recipe changes, a new product is proposed, or a regulation is updated. In that sense, the same design that supports paper labels today also lays the groundwork for digital traceability and machine-readable audits in the future.</p>
</section>
</section>
</section>
<section id="sec-conclusion" class="level1" data-number="5">
<h1 data-number="5"><span class="header-section-number">5</span> Conclusion</h1>
<p>The move from the 2011 Packaging and Labelling Regulations to the 2020 Labelling and Display Regulations marks a gradual shift from mostly text-heavy transparency towards more structured, salient labelling. For non-technical readers, the core takeaway is that allergens, naming and nutrition are increasingly treated as fields that can be checked and compared in a more systematic way, rather than as unstructured blocks of text.</p>
<p>For data-oriented projects such as IFID, this same delta is a design clue: if ingredient records are aligned with the way the 2020 rules think about allergens, names and nutrients, then it becomes possible to build shared tools that lawyers, regulators and food businesses can all use, without each group having to redo the basic comparison between 2011 and 2020 from scratch.</p>
<section id="references" class="level2 unnumbered">
<h2 class="unnumbered anchored" data-anchor-id="references">References</h2>
<div id="refs" class="references csl-bib-body hanging-indent">
<div id="ref-lanpub-india-fopl" class="csl-entry">
<span class="nocase">Radhika Pande et al.</span> 2020. <span>“Front-of-Pack Nutrition Labelling in India.”</span> <em>The Lancet Public Health</em> 5 (4): e195–96.
</div>
<div id="ref-frontpack-2011" class="csl-entry">
Citizen consumer and civic Action Group. n.d. <em>Front of Pack Labelling in India: Background and Context</em>.
</div>
<div id="ref-fssai-2011" class="csl-entry">
Food Safety and Standards Authority of India. 2011. <em>Food Safety and Standards (Packaging and Labelling) Regulations, 2011</em>. Government of India.
</div>
<div id="ref-fssai-2020-labelling" class="csl-entry">
Food Safety and Standards Authority of India. 2020. <em>Food Safety and Standards (Labelling and Display) Regulations, 2020</em>. Government of India.
</div>
<div id="ref-fssai-comp-labelling" class="csl-entry">
Food Safety and Standards Authority of India. n.d.-a. <em>Compendium of Food Safety and Standards (Labelling and Display) Regulations, 2020</em>.
</div>
<div id="ref-fssai-faq-2022" class="csl-entry">
Food Safety and Standards Authority of India. n.d.-b. <em>Frequently Asked Questions (FAQs) on FSS (Labelling and Display) Regulations, 2020</em>.
</div>
<div id="ref-icmrnin-hfss" class="csl-entry">
Indian Council of Medical Research–National Institute of Nutrition. n.d. <em>Dietary Guidelines and Nutrient Thresholds Relevant to High Fat, Sugar and Salt (HFSS) Foods</em>.
</div>
</div>


</section>
</section>

<div id="quarto-appendix" class="default"><section class="quarto-appendix-contents" id="quarto-reuse"><h2 class="anchored quarto-appendix-heading">Reuse</h2><div class="quarto-appendix-contents"><div><a rel="license" href="https://creativecommons.org/licenses/by/4.0/">CC BY 4.0</a></div></div></section><section class="quarto-appendix-contents" id="quarto-citation"><h2 class="anchored quarto-appendix-heading">Citation</h2><div><div class="quarto-appendix-secondary-label">BibTeX citation:</div><pre class="sourceCode code-with-copy quarto-appendix-bibtex"><code class="sourceCode bibtex">@report{nikhil_vukka2026,
  author = {Nikhil Vukka, Sai and A R, Lalitha},
  publisher = {iSRL},
  title = {Regulatory {Delta} of {Food} {Labelling} {Laws} in {India:}
    {A} {Comparative} {Analysis} of the {FSSAI} 2011 and 2020
    {Regulations}},
  number = {iSRL-26-02-R-RegDelta},
  date = {2026-02-21},
  url = {https://isrl.in/pub/2026-02-r-regdelta/},
  doi = {10.5281/zenodo.18719394},
  langid = {en}
}
</code></pre><div class="quarto-appendix-secondary-label">For attribution, please cite this work as:</div><div id="ref-nikhil_vukka2026" class="csl-entry quarto-appendix-citeas">
Nikhil Vukka, Sai, and Lalitha A R. 2026. <em>Regulatory Delta of Food
Labelling Laws in India: A Comparative Analysis of the FSSAI 2011 and
2020 Regulations</em>. iSRL-26-02-R-RegDelta. iSRL. <a href="https://doi.org/10.5281/zenodo.18719394">https://doi.org/10.5281/zenodo.18719394</a>.
</div></div></section></div> ]]></description>
  <guid>https://isrl.in/pub/2026-02-r-regdelta/</guid>
  <pubDate>Sat, 21 Feb 2026 00:00:00 GMT</pubDate>
</item>
<item>
  <title>Justification Companion to EMF-Scoring Model</title>
  <dc:creator>Lalitha A R</dc:creator>
  <link>https://isrl.in/pub/2026-02-d-emfjustify/</link>
  <description><![CDATA[ 




<script>
document.addEventListener('DOMContentLoaded', function() {
  var meta = document.querySelector('#title-block-header .quarto-title-meta');
  if (!meta) return;
  meta.insertAdjacentHTML('beforeend', '<div><div class="quarto-title-meta-heading">Contributors</div><div class="quarto-title-meta-contents"><p class="author" style="margin:0 0 0.1em 0;">Hitha Sunil</p><p style="font-size:0.82em;color:#555;margin:0 0 0.5em 0;font-style:italic;">Typesetting</p></div></div>');
});
</script>
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "ScholarlyArticle",
  "name": "Justification Companion to EMF-Scoring Model",
  "@id": "https://doi.org/10.5281/zenodo.18713318",
  "identifier": [
    "https://doi.org/10.5281/zenodo.18713318",
    "iSRL-26-02-D-EMFJustify"
  ],
  "description": "Justification companion to the EMF Tri-Axial Identity Model, providing regulatory, chemical, and trade defensibility notes for each axis score assignment across the model's benchmark ingredient set.",
  "datePublished": "2026-02-20",
  "license": "https://creativecommons.org/licenses/by/4.0/",
  "url": "https://isrl.in/pub/2026-02-d-emfjustify/",
  "author": {
    "@type": "Person",
    "name": "Lalitha A R",
    "identifier": "https://orcid.org/0009-0001-7466-3531",
    "sameAs": "https://orcid.org/0009-0001-7466-3531",
    "email": "lalithaar.research@gmail.com"
  },
  "publisher": {
    "@type": "ResearchOrganization",
    "name": "iSRL",
    "url": "https://isrl.in"
  }
}
</script>
<p>This is a justification companion to the EMF Scoring Model as described in <strong>Identity, Transformation, and Function: A Tri-Axial Model for the Classification of Food Ingredient Identity</strong> <span class="citation" data-cites="lalitha_2026_emf_main">(Lalitha 2026a)</span>.</p>
<hr>
<section id="sec-energy-scores" class="level2" data-number="1">
<h2 data-number="1" class="anchored" data-anchor-id="sec-energy-scores"><span class="header-section-number">1</span> Table 1: Anthropogenic Energy Score (E) Assignments</h2>
<p>Anthropogenic Energy Score (E) assignments with chemical, regulatory, and trade defensibility notes.</p>
<div id="tbl-energy-scores" class="quarto-float quarto-figure quarto-figure-center anchored">
<figure class="quarto-float quarto-float-tbl figure">
<figcaption class="quarto-float-caption-top quarto-float-caption quarto-float-tbl" id="tbl-energy-scores-caption-0ceaefa1-69ba-4598-a22c-09a6ac19f8ca">
Table&nbsp;1: Anthropogenic Energy Score (E) assignments with chemical, regulatory, and trade defensibility notes.
</figcaption>
<div aria-describedby="tbl-energy-scores-caption-0ceaefa1-69ba-4598-a22c-09a6ac19f8ca">
<table class="caption-top table">
<colgroup>
<col style="width: 15%">
<col style="width: 19%">
<col style="width: 15%">
<col style="width: 15%">
<col style="width: 19%">
<col style="width: 15%">
</colgroup>
<thead>
<tr class="header">
<th style="text-align: left;">Process</th>
<th style="text-align: center;">E</th>
<th style="text-align: left;">Chemical Justification</th>
<th style="text-align: left;">Legal / Naming Justification (FSSAI/Codex) and Trade Classification</th>
<th style="text-align: center;">Defensibility</th>
<th style="text-align: left;">Summary</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td style="text-align: left;">Chilling</td>
<td style="text-align: center;">0.18</td>
<td style="text-align: left;">No covalent change; refrigeration is explicitly listed as “minimally processed”. <span class="citation" data-cites="FSSAI_Label_2020">(Food Safety and Standards Authority of India (FSSAI) 2023a)</span></td>
<td style="text-align: left;">“Fresh or chilled” food categories are treated as primary commodity forms in ITC(HS) (e.g., Ch. 07). <span class="citation" data-cites="DGCI_CH07">(Directorate General of Commercial Intelligence and Statistics (DGCI&amp;S), Government of India 2007a)</span></td>
<td style="text-align: center;">Medium</td>
<td style="text-align: left;">Physical stabilization; identity retained.</td>
</tr>
<tr class="even">
<td style="text-align: left;">Sorting</td>
<td style="text-align: center;">0.12</td>
<td style="text-align: left;">Physical selection only; no molecular modification. <span class="citation" data-cites="FSSAI_Label_2020">(Food Safety and Standards Authority of India (FSSAI) 2023a)</span></td>
<td style="text-align: left;">Generally does not create a new standardized “food name” under labelling rules; still described by true nature. <span class="citation" data-cites="FSSAI_Label_2020">(Food Safety and Standards Authority of India (FSSAI) 2023a)</span></td>
<td style="text-align: center;">Medium</td>
<td style="text-align: left;">Handling step only.</td>
</tr>
<tr class="odd">
<td style="text-align: left;">Washing</td>
<td style="text-align: center;">0.15</td>
<td style="text-align: left;">Surface removal step; documented to reduce residues; no intended covalent change. <span class="citation" data-cites="FSSAI_Label_2020">(Food Safety and Standards Authority of India (FSSAI) 2023a)</span></td>
<td style="text-align: left;">Treated as minimal processing (cleaning/removal of unwanted parts). <span class="citation" data-cites="FSSAI_Label_2020">(Food Safety and Standards Authority of India (FSSAI) 2023a)</span></td>
<td style="text-align: center;">Medium</td>
<td style="text-align: left;">Decontamination without re-identity.</td>
</tr>
<tr class="even">
<td style="text-align: left;">De-husking</td>
<td style="text-align: center;">0.22</td>
<td style="text-align: left;">Removes inedible outer layers; does not require covalent transformation. <span class="citation" data-cites="FSSAI_Label_2020">(Food Safety and Standards Authority of India (FSSAI) 2023a)</span></td>
<td style="text-align: left;">Fits “removing inedible or unwanted parts” under minimal processing. <span class="citation" data-cites="FSSAI_Label_2020">(Food Safety and Standards Authority of India (FSSAI) 2023a)</span></td>
<td style="text-align: center;">Medium</td>
<td style="text-align: left;">Structure reduced; chemistry preserved.</td>
</tr>
<tr class="odd">
<td style="text-align: left;">Milling (e.g., Besan)</td>
<td style="text-align: center;">0.28</td>
<td style="text-align: left;">Comminution; cellular structure is destroyed but macromolecules remain intact. <span class="citation" data-cites="FSSAI_Label_2020">(Food Safety and Standards Authority of India (FSSAI) 2023a)</span></td>
<td style="text-align: left;">Grinding is listed as minimal processing; trade heading exists for flour/meal/powder of dried legumes (HS 1106). <span class="citation" data-cites="FSSAI_Label_2020 DGCI_CH11">(Food Safety and Standards Authority of India (FSSAI) 2023a; Directorate General of Commercial Intelligence and Statistics (DGCI&amp;S), Government of India 2007b)</span></td>
<td style="text-align: center;">High</td>
<td style="text-align: left;">Mechanical conversion to flour with clear HS placement.</td>
</tr>
<tr class="even">
<td style="text-align: left;">Cold Pressing (Oil)</td>
<td style="text-align: center;">0.32</td>
<td style="text-align: left;">Mechanical extraction without heat; lipid molecules remain triglycerides. <span class="citation" data-cites="Codex_CXS19_1981">(Codex Alimentarius Commission (FAO/WHO) 2024)</span></td>
<td style="text-align: left;">Codex defines “cold pressed fats and oils” and restricts use of that designation to compliant products; no additives permitted in virgin/cold pressed oils. <span class="citation" data-cites="Codex_CXS19_1981">(Codex Alimentarius Commission (FAO/WHO) 2024)</span></td>
<td style="text-align: center;">High</td>
<td style="text-align: left;">Mechanical-only oil; limited industrial separation.</td>
</tr>
<tr class="odd">
<td style="text-align: left;">Churning (Butter)</td>
<td style="text-align: center;">0.45</td>
<td style="text-align: left;">Phase inversion (oil-in-water to water-in-oil) and physical separation; no intended covalent change. <span class="citation" data-cites="FSSAI_Dairy_2025">(Food Safety and Standards Authority of India (FSSAI) 2025b)</span></td>
<td style="text-align: left;">FSSAI defines butter as a water-in-oil emulsion derived exclusively from milk/milk products; table butter must be from pasteurised cream. <span class="citation" data-cites="FSSAI_Dairy_2025">(Food Safety and Standards Authority of India (FSSAI) 2025b)</span></td>
<td style="text-align: center;">High</td>
<td style="text-align: left;">Mechanical/physical re-structuring with defined legal identity.</td>
</tr>
<tr class="even">
<td style="text-align: left;">Fermentation (Vinegar)</td>
<td style="text-align: center;">0.56</td>
<td style="text-align: left;">Biochemical oxidation of ethanol to acetic acid by acetic acid bacteria; covalent re-identity of primary acid. <span class="citation" data-cites="Vinegar_Review_2024">(<span class="nocase">Yun et al.</span> 2024)</span></td>
<td style="text-align: left;">FSSAI treats fermentation as minimal processing in nutrition-labelling context; ITC(HS) distinguishes brewed vs synthetic vinegar under HS 2209. <span class="citation" data-cites="FSSAI_Label_2020 DGCI_CH22">(Food Safety and Standards Authority of India (FSSAI) 2023a; Directorate General of Commercial Intelligence and Statistics (DGCI&amp;S), Government of India 2007d)</span></td>
<td style="text-align: center;">Medium</td>
<td style="text-align: left;">Biological conversion; product class recognized in trade.</td>
</tr>
<tr class="odd">
<td style="text-align: left;">Roasting</td>
<td style="text-align: center;">0.58</td>
<td style="text-align: left;">Thermal chemistry (Maillard reaction: amino acids + reducing sugars forming melanoidins and other new compounds). <span class="citation" data-cites="Maillard_2025">(<span class="nocase">Schaefer et al.</span> 2025)</span></td>
<td style="text-align: left;">Still generally labelled by food name with accurate description; not typically a statutory rename trigger by itself. <span class="citation" data-cites="FSSAI_Label_2020">(Food Safety and Standards Authority of India (FSSAI) 2023a)</span></td>
<td style="text-align: center;">Low</td>
<td style="text-align: left;">Chemistry occurs, but regulatory treatment is food- and claim-dependent.</td>
</tr>
<tr class="even">
<td style="text-align: left;">Pasteurization</td>
<td style="text-align: center;">0.48</td>
<td style="text-align: left;">Microbicidal heat treatment (defined time/temperature combinations); primarily denaturation/aggregation without designed covalent synthesis. <span class="citation" data-cites="FSSAI_Dairy_2025">(Food Safety and Standards Authority of India (FSSAI) 2025b)</span></td>
<td style="text-align: left;">FSSAI defines pasteurization and requires heat-treatment declaration for milk; also listed as minimal processing. <span class="citation" data-cites="FSSAI_Dairy_2025 FSSAI_Label_2020">(Food Safety and Standards Authority of India (FSSAI) 2025b, 2023a)</span></td>
<td style="text-align: center;">High</td>
<td style="text-align: left;">Standardized thermal process with explicit legal definition.</td>
</tr>
<tr class="odd">
<td style="text-align: left;">Solvent Extraction (Oils)</td>
<td style="text-align: center;">0.82</td>
<td style="text-align: left;">Solvent-based separation (typically hexane) and subsequent desolventizing/distillation; strong industrial separation though not necessarily covalent modification. <span class="citation" data-cites="Hexane_Substitution_2022">(<span class="nocase">Boukhenfa et al.</span> 2022)</span></td>
<td style="text-align: left;">India controls “solvent-extracted oil” production/handling under a dedicated Control Order; industrial category is legally recognized. <span class="citation" data-cites="FSSAI_SolventExtracted_Order_1967">(Government of India (hosted on FSSAI website) 1967)</span></td>
<td style="text-align: center;">High</td>
<td style="text-align: left;">Industrial chemical-separation route with distinct legal instrument.</td>
</tr>
<tr class="even">
<td style="text-align: left;">Fractionation (Olein)</td>
<td style="text-align: center;">0.76</td>
<td style="text-align: left;">Physical fractionation via controlled crystallization and separation into liquid (olein) and solid fractions; no intended covalent modification. <span class="citation" data-cites="PalmOil_Processing_2023">(<span class="nocase">Abdul Wahab et al.</span> 2023)</span></td>
<td style="text-align: left;">Ingredient class titles in FSSAI labelling include “fractionated fat” under edible vegetable fat declarations. <span class="citation" data-cites="FSSAI_Label_2020">(Food Safety and Standards Authority of India (FSSAI) 2023a)</span></td>
<td style="text-align: center;">Medium</td>
<td style="text-align: left;">Industrial separation into functional fractions.</td>
</tr>
<tr class="odd">
<td style="text-align: left;">Clarification (Ghee)</td>
<td style="text-align: center;">0.55</td>
<td style="text-align: left;">Heat-driven removal of water and milk solids-not-fat; concentrated milk fat; no intended covalent synthesis. <span class="citation" data-cites="FSSAI_Dairy_2025">(Food Safety and Standards Authority of India (FSSAI) 2025b)</span></td>
<td style="text-align: left;">FSSAI defines ghee/milk fat products as derived exclusively from milk via processes that remove water and SNF almost totally. <span class="citation" data-cites="FSSAI_Dairy_2025">(Food Safety and Standards Authority of India (FSSAI) 2025b)</span></td>
<td style="text-align: center;">High</td>
<td style="text-align: left;">Well-defined milk-fat product identity.</td>
</tr>
<tr class="even">
<td style="text-align: left;">Hydrogenation</td>
<td style="text-align: center;">0.92</td>
<td style="text-align: left;">Addition of hydrogen to C=C double bonds (covalent saturation); may also change isomer distribution. <span class="citation" data-cites="AOCS_Hydrogenation_2024">(American Oil Chemists’ Society (AOCS) 2024)</span></td>
<td style="text-align: left;">ITC(HS) heading 1516 explicitly covers fats/oils “partly or wholly hydrogenated”; FSSAI ingredient class titles include “hydrogenated oils” / “partially hydrogenated oils”. <span class="citation" data-cites="DGCI_CH15 FSSAI_Label_2020">(Directorate General of Commercial Intelligence and Statistics (DGCI&amp;S), Government of India 2007c; Food Safety and Standards Authority of India (FSSAI) 2023a)</span></td>
<td style="text-align: center;">High</td>
<td style="text-align: left;">Explicit HS/legal recognition + covalent modification.</td>
</tr>
<tr class="odd">
<td style="text-align: left;">Acetylation (Modified Starch)</td>
<td style="text-align: center;">0.94</td>
<td style="text-align: left;">Hydroxyl groups on starch are converted to acetate esters (<em>O</em>-acetylation); acetyl groups introduced using acetic anhydride (representative modified starch). <span class="citation" data-cites="JECFA_AcetylatedDistarchAdipate">(Joint FAO/WHO Expert Committee on Food Additives (JECFA) 1974)</span></td>
<td style="text-align: left;">HS Chapter 35 heading 3505 covers modified starches including esterified starches; FSSAI labelling distinguishes “starches other than chemically modified starches” (implying modified starch must be specifically named). <span class="citation" data-cites="DGCI_CH35 FSSAI_Label_2020">(Directorate General of Commercial Intelligence and Statistics (DGCI&amp;S), Government of India 2007e; Food Safety and Standards Authority of India (FSSAI) 2023a)</span></td>
<td style="text-align: center;">High</td>
<td style="text-align: left;">Clear covalent modification and clear HS heading.</td>
</tr>
<tr class="even">
<td style="text-align: left;">Interesterification</td>
<td style="text-align: center;">0.91</td>
<td style="text-align: left;">Rearrangement of fatty acids within/between triglycerides via ester interchange (covalent bonds break and re-form), while the overall fatty acid composition may remain unchanged. <span class="citation" data-cites="TransFat_Review_2011">(<span class="nocase">Mozaffarian et al.</span> 2011)</span></td>
<td style="text-align: left;">ITC(HS) heading 1516 explicitly includes “interesterified” and “re-esterified” fats/oils; FSSAI class titles include “interesterified vegetable fat”. <span class="citation" data-cites="DGCI_CH15 FSSAI_Label_2020">(Directorate General of Commercial Intelligence and Statistics (DGCI&amp;S), Government of India 2007c; Food Safety and Standards Authority of India (FSSAI) 2023a)</span></td>
<td style="text-align: center;">High</td>
<td style="text-align: left;">Explicit HS/legal recognition + covalent modification.</td>
</tr>
<tr class="odd">
<td style="text-align: left;">Synthetic Flavors</td>
<td style="text-align: center;">0.99</td>
<td style="text-align: left;">Deliberate formulation of defined molecules produced by industrial chemical synthesis; mixture may be far removed from biological matrix. <span class="citation" data-cites="PubChem_Vanillin">(National Center for Biotechnology Information (NCBI) 2025a)</span></td>
<td style="text-align: left;">FSSAI requires declaration of flavouring agents; artificial flavours require declaring the common name, and natural/nature-identical require class name declaration. <span class="citation" data-cites="FSSAI_Label_2020">(Food Safety and Standards Authority of India (FSSAI) 2023a)</span></td>
<td style="text-align: center;">Medium</td>
<td style="text-align: left;">Strong naming rules; chemistry varies by flavour system.</td>
</tr>
<tr class="even">
<td style="text-align: left;">Vanillin (Lab-made)</td>
<td style="text-align: center;">0.98</td>
<td style="text-align: left;">Single defined chemical entity (vanillin; 4-hydroxy-3-methoxybenzaldehyde). <span class="citation" data-cites="PubChem_Vanillin">(National Center for Biotechnology Information (NCBI) 2025a)</span></td>
<td style="text-align: left;">Typically declared as a flavouring substance; labelling must follow flavour declaration rules (natural vs artificial/nature-identical classification depends on source and regulatory interpretation). <span class="citation" data-cites="FSSAI_Label_2020">(Food Safety and Standards Authority of India (FSSAI) 2023a)</span></td>
<td style="text-align: center;">Medium</td>
<td style="text-align: left;">Chemical identity is unambiguous; regulatory class depends on production route.</td>
</tr>
<tr class="odd">
<td style="text-align: left;">Sodium Glycolate</td>
<td style="text-align: center;">0.99</td>
<td style="text-align: left;">Defined salt of an organic acid (sodium 2-hydroxyacetate); inherently a chemical entity not tied to a food matrix. <span class="citation" data-cites="PubChem_SodiumGlycolate">(National Center for Biotechnology Information (NCBI) 2025b)</span></td>
<td style="text-align: left;">“Glycolate” is referenced as an impurity/specification parameter within some additive standards (e.g., CMC-related specs), but sodium glycolate itself is not a common named food. <span class="citation" data-cites="FSSAI_Additives_Chapter3_2024">(Food Safety and Standards Authority of India (FSSAI) 2024)</span></td>
<td style="text-align: center;">Low</td>
<td style="text-align: left;">Presence in food law is indirect; use-case dependent.</td>
</tr>
</tbody>
</table>
</div>
</figure>
</div>
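<p>For readers who want to work with these assignments programmatically, Table 1 can be held as a simple lookup. The sketch below is an illustration, not part of the EMF model: it transcribes five rows of the table, and the names <code>TABLE_1</code>, <code>e_context</code> and <code>with_defensibility</code> are hypothetical. It only looks up and sorts the published scores and does not assume any rule for combining E values across a process chain.</p>

```python
# Rows transcribed from Table 1: process -> (E score, defensibility).
TABLE_1 = {
    "Cold Pressing (Oil)": (0.32, "High"),
    "Roasting": (0.58, "Low"),
    "Fractionation (Olein)": (0.76, "Medium"),
    "Solvent Extraction (Oils)": (0.82, "High"),
    "Hydrogenation": (0.92, "High"),
}


def e_context(processes):
    """Return (process, E) pairs for a process chain, lowest E first."""
    pairs = [(name, TABLE_1[name][0]) for name in processes]
    return sorted(pairs, key=lambda pair: pair[1])


def with_defensibility(level):
    """Return the processes whose defensibility rating equals `level`."""
    return sorted(name for name, (_, rating) in TABLE_1.items() if rating == level)


oil_chain = ["Solvent Extraction (Oils)", "Cold Pressing (Oil)", "Fractionation (Olein)"]
print(e_context(oil_chain))
print(with_defensibility("High"))
```

<p>The first call lists the oil-related processes from lowest to highest E (cold pressing, fractionation, solvent extraction); the second returns the transcribed processes rated High.</p>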
<hr>
</section>
<section id="sec-matter-scores" class="level2" data-number="2">
<h2 data-number="2" class="anchored" data-anchor-id="sec-matter-scores"><span class="header-section-number">2</span> Table 2: Final Commercial States with Matter Scores (M)</h2>
<p>Final commercial states with single Matter Scores (M), primary Matter Classes, typical process/E context, and justification summary.</p>
<div id="tbl-matter-scores" class="quarto-float quarto-figure quarto-figure-center anchored">
<figure class="quarto-float quarto-float-tbl figure">
<figcaption class="quarto-float-caption-top quarto-float-caption quarto-float-tbl" id="tbl-matter-scores-caption-0ceaefa1-69ba-4598-a22c-09a6ac19f8ca">
Table&nbsp;2: Final commercial states with single Matter Scores (M), primary Matter Classes, typical process/E context, and justification summary.
</figcaption>
<div aria-describedby="tbl-matter-scores-caption-0ceaefa1-69ba-4598-a22c-09a6ac19f8ca">
<table class="caption-top table">
<colgroup>
<col style="width: 19%">
<col style="width: 23%">
<col style="width: 19%">
<col style="width: 19%">
<col style="width: 19%">
</colgroup>
<thead>
<tr class="header">
<th style="text-align: left;">Final State</th>
<th style="text-align: center;">M</th>
<th style="text-align: left;">Matter Class</th>
<th style="text-align: left;">Typical Processes / E-context</th>
<th style="text-align: left;">Justification Summary</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td style="text-align: left;">Whole / Fresh pieces</td>
<td style="text-align: center;">0.05</td>
<td style="text-align: left;">Hydrated / Native</td>
<td style="text-align: left;">Sorting (E=0.12), washing (0.15), de-husking (0.22), chilling (0.18).</td>
<td style="text-align: left;">Primary commodities such as whole vegetables or raw milk sold with cellular water and structure intact are treated as minimally processed in trade and labelling; matrix loss is negligible, so M is near zero. <span class="citation" data-cites="FSSAI_Label_2020_M2 DGCI_CH11_M2">(Food Safety and Standards Authority of India (FSSAI) 2023b; Directorate General of Commercial Intelligence and Statistics (DGCI&amp;S) 2007)</span></td>
</tr>
<tr class="even">
<td style="text-align: left;">Cut / Sliced pieces</td>
<td style="text-align: center;">0.10</td>
<td style="text-align: left;">Hydrated / Native</td>
<td style="text-align: left;">Sorting, washing, trimming, cutting, chilling in similar E-band as whole produce.</td>
<td style="text-align: left;">Cutting or slicing does not remove major components; it increases surface area but retains the hydrated matrix, so M is modestly above whole state yet still in Class 1. <span class="citation" data-cites="FSSAI_Label_2020_M2">(Food Safety and Standards Authority of India (FSSAI) 2023b)</span></td>
</tr>
<tr class="odd">
<td style="text-align: left;">Pulp / Puree</td>
<td style="text-align: center;">0.25</td>
<td style="text-align: left;">Comminuted</td>
<td style="text-align: left;">De-husking (0.22), milling or pulping (0.28–0.32), possible pasteurisation (0.48).</td>
<td style="text-align: left;">Fruit or vegetable pulps and purees correspond to comminuted edible portions where skin and fibre may be partially retained; composition is close to edible fraction but structure is lost, consistent with Class 2. <span class="citation" data-cites="DGCI_CH11_M2 FSSAI_Dairy_2025_M2">(Directorate General of Commercial Intelligence and Statistics (DGCI&amp;S) 2007; Food Safety and Standards Authority of India (FSSAI) 2025a)</span></td>
</tr>
<tr class="even">
<td style="text-align: left;">Coarse grits</td>
<td style="text-align: center;">0.30</td>
<td style="text-align: left;">Comminuted</td>
<td style="text-align: left;">Milling/fragmentation (E near 0.28) without extensive screening to flour fineness.</td>
<td style="text-align: left;">Cereal groats and meals are defined in Chapter 11 as fragmented grains with specified sieve cut-offs; fragmentation preserves nutritional spectrum but destroys grain structure, giving a higher M than whole grain but still Class 2. <span class="citation" data-cites="DGCI_CH11_M2 WCO_CH11_Notes_M2">(Directorate General of Commercial Intelligence and Statistics (DGCI&amp;S) 2007; World Customs Organization 2002)</span></td>
</tr>
<tr class="odd">
<td style="text-align: left;">Flour / Fine powder</td>
<td style="text-align: center;">0.33</td>
<td style="text-align: left;">Comminuted</td>
<td style="text-align: left;">Milling and sifting (E ≈ 0.28) to flours and powders.</td>
<td style="text-align: left;">Fine cereal flours (HS 1101/1102) and pulse flours (HS 1106) represent fully fragmented grain; anatomical integrity is lost but no targeted macronutrient removal occurs, so M is slightly higher than coarse grits yet remains in Class 2. <span class="citation" data-cites="DGCI_CH11_M2 WCO_CH11_Notes_M2">(Directorate General of Commercial Intelligence and Statistics (DGCI&amp;S) 2007; World Customs Organization 2002)</span></td>
</tr>
<tr class="even">
<td style="text-align: left;">Flakes</td>
<td style="text-align: center;">0.36</td>
<td style="text-align: left;">Dehydrated / Concentrated</td>
<td style="text-align: left;">Rolling/laminating (working grains), partial dehydration or toasting (E around roasting 0.58).</td>
<td style="text-align: left;">Rolled or flaked grains are explicitly classified under heading 1104; moisture is typically lower and structure more worked than in meal, so M reflects additional matrix disruption and concentration characteristic of early Class 3. <span class="citation" data-cites="DGCI_CH11_M2 WCO_CH11_Notes_M2">(Directorate General of Commercial Intelligence and Statistics (DGCI&amp;S) 2007; World Customs Organization 2002)</span></td>
</tr>
<tr class="odd">
<td style="text-align: left;">Concentrate (liquid)</td>
<td style="text-align: center;">0.40</td>
<td style="text-align: left;">Dehydrated / Concentrated</td>
<td style="text-align: left;">Evaporation or vacuum concentration of juices, milk, or pulps; processes in a band around pasteurisation (0.48) and clarification (0.49).</td>
<td style="text-align: left;">Liquid concentrates such as condensed milk or concentrated juice primarily remove water; the matrix is densified but major macronutrients remain, justifying a mid-Class 3 M-score. <span class="citation" data-cites="FSSAI_Dairy_2025_M2">(Food Safety and Standards Authority of India (FSSAI) 2025a)</span></td>
</tr>
<tr class="even">
<td style="text-align: left;">Powder (spray-dried)</td>
<td style="text-align: center;">0.42</td>
<td style="text-align: left;">Dehydrated / Concentrated</td>
<td style="text-align: left;">Evaporation and spray-drying of liquids (e.g., milk, juices, whey) following heat treatment.</td>
<td style="text-align: left;">Spray-dried powders such as milk powder and whey powder are recognized as distinct dried milk products; removing nearly all water yields a dense, free-flowing powder that still retains a broad nutrient profile, fitting high Class 3. <span class="citation" data-cites="FSSAI_Dairy_2025_M2 Whey_Applications_2015_M2">(Food Safety and Standards Authority of India (FSSAI) 2025a; <span class="nocase">Pintado et al.</span> 2015)</span></td>
</tr>
<tr class="odd">
<td style="text-align: left;">Juice (clarified)</td>
<td style="text-align: center;">0.50</td>
<td style="text-align: left;">Structural Fractionation</td>
<td style="text-align: left;">Pulping, then clarification/filtration, sometimes centrifugation (E rising from 0.28 to ≈0.49).</td>
<td style="text-align: left;">Clarified juices selectively remove insoluble fibre and suspended solids, leaving mainly soluble solids and water; this is a compositional subset of the fruit matrix and matches Class 4 behaviour. <span class="citation" data-cites="DGCI_CH11_M2 FSSAI_Label_2020_M2">(Directorate General of Commercial Intelligence and Statistics (DGCI&amp;S) 2007; Food Safety and Standards Authority of India (FSSAI) 2023b)</span></td>
</tr>
<tr class="even">
<td style="text-align: left;">Skim / Defatted meal</td>
<td style="text-align: center;">0.55</td>
<td style="text-align: left;">Structural Fractionation</td>
<td style="text-align: left;">Cream separation (milk), solvent extraction or pressing (oilseeds), followed by drying or milling.</td>
<td style="text-align: left;">Skimmed milk (fat-reduced) and defatted oilseed meals are produced by removing cream or oil; the remaining fraction is enriched in protein or non-fat solids and recognized as a separate commodity or feed/food ingredient, placing it in upper Class 4. <span class="citation" data-cites="FSSAI_Dairy_2025_M2 DGCI_CH15_M2 Codex_SoyProtein_175_1989_M2">(Food Safety and Standards Authority of India (FSSAI) 2025a; DGCI&amp;S 2007a; Codex Alimentarius Commission 1989)</span></td>
</tr>
<tr class="odd">
<td style="text-align: left;">Oil</td>
<td style="text-align: center;">0.70</td>
<td style="text-align: left;">Constitutional Isolate</td>
<td style="text-align: left;">Cold pressing (E=0.32), solvent extraction (E=0.82), and refining/fractionation (E=0.76).</td>
<td style="text-align: left;">Edible fats and oils are defined in Codex CXS 19-1981 as glyceride-based materials separated from plant or animal sources, including virgin, cold-pressed, and refined oils; the lipid fraction is isolated from the matrix, justifying a Class 5 score. <span class="citation" data-cites="Codex_FatsOils_19_1981_M2 DGCI_CH15_M2">(Codex Alimentarius Commission 2015; DGCI&amp;S 2007a)</span></td>
</tr>
<tr class="even">
<td style="text-align: left;">Protein concentrate</td>
<td style="text-align: center;">0.74</td>
<td style="text-align: left;">Constitutional Isolate</td>
<td style="text-align: left;">Aqueous extraction, precipitation or membrane concentration of protein, followed by drying.</td>
<td style="text-align: left;">Soy protein concentrate is defined as containing 65–90% protein (dry basis) after removal of substantial non-protein matter; this near-pure macronutrient fraction is more matrix-distant than skim or defatted meal but less than isolates, justifying M=0.74. <span class="citation" data-cites="Codex_SoyProtein_175_1989_M2">(Codex Alimentarius Commission 1989)</span></td>
</tr>
<tr class="odd">
<td style="text-align: left;">Protein isolate</td>
<td style="text-align: center;">0.78</td>
<td style="text-align: left;">Constitutional Isolate</td>
<td style="text-align: left;">Further removal of non-protein constituents (water, oil, carbohydrates) by extraction and membrane processes.</td>
<td style="text-align: left;">Soy protein isolate (≥90% protein) and whey protein isolate achieve very high protein purity; Codex and technical literature treat them as functional protein ingredients rather than food matrices, placing them near the top of Class 5. <span class="citation" data-cites="Codex_SoyProtein_175_1989_M2 Whey_Applications_2015_M2">(Codex Alimentarius Commission 1989; <span class="nocase">Pintado et al.</span> 2015)</span></td>
</tr>
<tr class="even">
<td style="text-align: left;">Fat fraction</td>
<td style="text-align: center;">0.72</td>
<td style="text-align: left;">Constitutional Isolate</td>
<td style="text-align: left;">Fractionation of oils/fats (E=0.76) into olein/stearin, or separation of butterfat/ghee from milk.</td>
<td style="text-align: left;">Fractionated fats such as palm olein and milk fat products (including ghee and butterfat) are recognized in Codex and FSSAI as specific fat fractions; they are highly enriched in triglycerides from a defined source, warranting an M close to that of generic oil: origin linkage is preserved, yet Class 5 status is clear. <span class="citation" data-cites="Codex_FatsOils_19_1981_M2 FSSAI_Dairy_2025_M2">(Codex Alimentarius Commission 2015; Food Safety and Standards Authority of India (FSSAI) 2025a)</span></td>
</tr>
<tr class="odd">
<td style="text-align: left;">Extract / Oleoresin</td>
<td style="text-align: center;">0.86</td>
<td style="text-align: left;">Molecular Signal / Extract</td>
<td style="text-align: left;">Solvent extraction of spices or herbs and evaporation of solvent to yield oleoresins.</td>
<td style="text-align: left;">Spice oleoresins and similar extracts concentrate flavour-active and sometimes pungent components; biomass is largely removed and the material acts as a potent functional ingredient, aligning with mid-Class 6. <span class="citation" data-cites="EssentialOils_FoodApps_2024_M2 FSSAI_Additives_2022_M2">(Rodilla et al. 2024; Food Safety and Standards Authority of India (FSSAI) 2022)</span></td>
</tr>
<tr class="even">
<td style="text-align: left;">Essential oil</td>
<td style="text-align: center;">0.90</td>
<td style="text-align: left;">Molecular Signal / Extract</td>
<td style="text-align: left;">Steam distillation or cold expression, sometimes followed by purification or encapsulation.</td>
<td style="text-align: left;">Essential oils are volatile, hydrophobic liquids containing concentrated aroma compounds; reviews highlight their use as natural flavourings and preservatives at very low inclusion levels, indicating high signal potency and justifying a high Class 6 score. <span class="citation" data-cites="EssentialOils_FoodApps_2024_M2">(Rodilla et al. 2024)</span></td>
</tr>
<tr class="odd">
<td style="text-align: left;">Crystalline chemical</td>
<td style="text-align: center;">0.98</td>
<td style="text-align: left;">De-novo / Synthetic</td>
<td style="text-align: left;">Chemical synthesis, purification, and crystallization (e.g., vanillin, ethyl vanillin, sodium salts).</td>
<td style="text-align: left;">Crystalline vanillin and similar flavour chemicals are single, chemically defined entities catalogued in PubChem; sodium glycolate is likewise described as a defined salt. These are essentially pure synthetic matter with negligible matrix linkage, near the Class 7 extreme. <span class="citation" data-cites="PubChem_Vanillin_1183_M2 PubChem_SodiumGlycolate_M2">(National Center for Biotechnology Information 2021; NCBI 2022)</span></td>
</tr>
<tr class="even">
<td style="text-align: left;">Granules</td>
<td style="text-align: center;">0.80</td>
<td style="text-align: left;">Constitutional Isolate</td>
<td style="text-align: left;">Agglomeration or granulation of flours, concentrates, isolates, or crystalline additives.</td>
<td style="text-align: left;">Codex explicitly allows soy protein products to be designated by physical forms such as <em>granules</em> or <em>bits</em>; such granules usually represent agglomerated isolates or concentrates, making them slightly above generic protein isolates in perceived purity and handling regularity. <span class="citation" data-cites="Codex_SoyProtein_175_1989_M2">(Codex Alimentarius Commission 1989)</span></td>
</tr>
<tr class="odd">
<td style="text-align: left;">Oleoresin (viscous)</td>
<td style="text-align: center;">0.88</td>
<td style="text-align: left;">Molecular Signal / Extract</td>
<td style="text-align: left;">Solvent extraction of spices followed by partial solvent removal to a viscous resin.</td>
<td style="text-align: left;">Viscous oleoresins preserve both essential oil and non-volatile resinous components and are widely used as concentrated spice ingredients; their high potency and low-dose application justify an M-score between generic extracts and essential oils. <span class="citation" data-cites="EssentialOils_FoodApps_2024_M2 FSSAI_Additives_2022_M2">(Rodilla et al. 2024; Food Safety and Standards Authority of India (FSSAI) 2022)</span></td>
</tr>
<tr class="even">
<td style="text-align: left;">Whey powder</td>
<td style="text-align: center;">0.52</td>
<td style="text-align: left;">Structural Fractionation</td>
<td style="text-align: left;">Separation of whey from curd, concentration, and drying.</td>
<td style="text-align: left;">Whey powder arises after removal of curd proteins and fat; it is recognized in dairy standards as a separate dried milk product comprised mainly of lactose and whey proteins, thus more matrix-thinned than whole milk powder but still a food fraction. <span class="citation" data-cites="FSSAI_Dairy_2025_M2 Whey_Applications_2015_M2">(Food Safety and Standards Authority of India (FSSAI) 2025a; <span class="nocase">Pintado et al.</span> 2015)</span></td>
</tr>
<tr class="odd">
<td style="text-align: left;">Starch flour</td>
<td style="text-align: center;">0.60</td>
<td style="text-align: left;">Structural Fractionation</td>
<td style="text-align: left;">Wet separation of starch from cereals or roots, followed by drying and milling.</td>
<td style="text-align: left;">Cereal and root starches are classified separately from whole flours in Chapter 11 and in Chapter 35 when chemically modified; the isolated carbohydrate fraction retains botanical origin but little of the original matrix, placing it at the high end of Class 4. <span class="citation" data-cites="DGCI_CH11_M2 DGCI_CH35_M2">(Directorate General of Commercial Intelligence and Statistics (DGCI&amp;S) 2007; DGCI&amp;S 2007b)</span></td>
</tr>
<tr class="even">
<td style="text-align: left;">Dense block / Cake (e.g., khoya)</td>
<td style="text-align: center;">0.38</td>
<td style="text-align: left;">Dehydrated / Concentrated</td>
<td style="text-align: left;">Prolonged heat concentration of milk to a semi-solid or solid mass.</td>
<td style="text-align: left;">Khoa/khoya is defined as a milk product obtained by partial dehydration of milk; solids are concentrated but composition remains broad (fat, protein, lactose), supporting a moderate Class 3 score. <span class="citation" data-cites="FSSAI_Dairy_2025_M2">(Food Safety and Standards Authority of India (FSSAI) 2025a)</span></td>
</tr>
<tr class="odd">
<td style="text-align: left;">Meal (e.g., defatted soya meal)</td>
<td style="text-align: center;">0.57</td>
<td style="text-align: left;">Structural Fractionation</td>
<td style="text-align: left;">Oil extraction from soybeans, followed by grinding to meal.</td>
<td style="text-align: left;">Defatted meals contain much of the non-fat matrix but have lost the bulk lipid; they are standard outputs of oilseed processing and fit upper Class 4 as protein- and fibre-rich subsets of the starting seed. <span class="citation" data-cites="DGCI_CH15_M2 Codex_SoyProtein_175_1989_M2">(DGCI&amp;S 2007a; Codex Alimentarius Commission 1989)</span></td>
</tr>
<tr class="even">
<td style="text-align: left;">Modified starch powder</td>
<td style="text-align: center;">0.96</td>
<td style="text-align: left;">De-novo / Synthetic</td>
<td style="text-align: left;">Chemical modification (e.g., acetylation, cross-linking) of starch polymers, then drying and milling.</td>
<td style="text-align: left;">JECFA describes acetylated distarch adipate as starch whose hydroxyls have been esterified with acetic and adipic moieties; this covalent modification creates a regulated food additive (INS 1422) classified under modified starches, giving it a low-end Class 7 score. <span class="citation" data-cites="JECFA_AcetylatedDistarch_2016_M2 FAO_GSFA_1422_M2 DGCI_CH35_M2">(FAO/WHO Joint Expert Committee on Food Additives 2016; FAO/WHO Codex GSFA 2025; DGCI&amp;S 2007b)</span></td>
</tr>
<tr class="odd">
<td style="text-align: left;">Emulsifier powder (e.g., lecithin)</td>
<td style="text-align: center;">0.89</td>
<td style="text-align: left;">Molecular Signal / Extract</td>
<td style="text-align: left;">Solvent extraction or fractionation of phospholipids from oils, followed by drying or spray-drying.</td>
<td style="text-align: left;">Lecithins are listed in additive regulations as surface-active phospholipid mixtures obtained from edible fats and oils; their role as functional emulsifiers at low inclusion levels and their separation from bulk matrix place them high in Class 6. <span class="citation" data-cites="FSSAI_Additives_2022_M2 Codex_FatsOils_19_1981_M2">(Food Safety and Standards Authority of India (FSSAI) 2022; Codex Alimentarius Commission 2015)</span></td>
</tr>
</tbody>
</table>
</div>
</figure>
</div>
<hr>
</section>
<section id="sec-f-scores" class="level2" data-number="3">
<h2 data-number="3" class="anchored" data-anchor-id="sec-f-scores"><span class="header-section-number">3</span> Table 3: Detailed Functional (F) Score Analysis</h2>
<p>Detailed Functional (F) Score Analysis: Identity Shift Logic and Statutory Basis.</p>
<div id="tbl-f-scores" class="quarto-float quarto-figure quarto-figure-center anchored">
<figure class="quarto-float quarto-float-tbl figure">
<figcaption class="quarto-float-caption-top quarto-float-caption quarto-float-tbl" id="tbl-f-scores-caption-0ceaefa1-69ba-4598-a22c-09a6ac19f8ca">
Table&nbsp;3: Detailed Functional (F) Score Analysis: Identity Shift Logic and Statutory Basis.
</figcaption>
<div aria-describedby="tbl-f-scores-caption-0ceaefa1-69ba-4598-a22c-09a6ac19f8ca">
<table class="caption-top table">
<colgroup>
<col style="width: 19%">
<col style="width: 23%">
<col style="width: 19%">
<col style="width: 19%">
<col style="width: 19%">
</colgroup>
<thead>
<tr class="header">
<th style="text-align: left;">Functional Class</th>
<th style="text-align: center;">F</th>
<th style="text-align: left;">Primary Tech. Role</th>
<th style="text-align: left;">Typical E-M Context</th>
<th style="text-align: left;">Identity Shift Logic &amp; Statutory Basis</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td style="text-align: left;"><strong>Base Ingredient</strong></td>
<td style="text-align: center;">0.12</td>
<td style="text-align: left;">Provide bulk, calories, protein, primary structure.</td>
<td style="text-align: left;">E: 0.12–0.82; M: 0.05–0.78</td>
<td style="text-align: left;">Even at elevated E and M, as in spray-dried milk powder (E ≈ 0.48, M ≈ 0.42) or solvent-extracted soy protein isolate (E ≈ 0.82, M ≈ 0.78), regulatory frameworks mandate source-dominant naming. “Milk solids,” “soya protein isolate,” “wheat flour” are required declarations; functional roles (nutrition, structure, emulsification) remain implicit rather than named. The low F reflects institutional resistance to functional abstraction and operates as a downward tie-breaker, preventing drift toward function-emergent status. FSSAI Reg 4(1) “true nature”; Reg 5(2) mandatory source-first; Sch II titles 1–3, 13–16; ITC-HS Ch 07–11 (source-aligned). <span class="citation" data-cites="legitquest_F2 DGCI_CH11_F2">(Legitquest Legal Database 2024; Directorate General of Commercial Intelligence and Statistics (DGCI&amp;S) 2022)</span></td>
</tr>
<tr class="even">
<td style="text-align: left;"><strong>Taste Profile</strong></td>
<td style="text-align: center;">0.18</td>
<td style="text-align: left;">Natural aroma, raw taste profile, herbal garnish.</td>
<td style="text-align: left;">E: 0.15–0.45; M: 0.10–0.40</td>
<td style="text-align: left;">Sensory function is acknowledged (elevating F above Base Ingredient), but botanical origin remains primary in regulatory naming. “Natural vanilla flavor” requires vanilla source identification where characterizing; “peppermint oil” retains species-specific designation. The “as appropriate” qualifier in Schedule II preserves contextual source judgment. Synthetic replication triggers different regulatory treatment (artificial flavor, F ≈ 0.88), confirming source-function coupling constraints. FSSAI Reg 5(2) Sch II title 8; Reg 2.6 (natural/nature-identical/artificial qualifiers); ITC-HS Ch 09, 12, 21, 33.01–33.02. <span class="citation" data-cites="legitquest_F2">(Legitquest Legal Database 2024)</span></td>
</tr>
<tr class="odd">
<td style="text-align: left;"><strong>Lipid Base</strong></td>
<td style="text-align: center;">0.22</td>
<td style="text-align: left;">Structural fat functionality, caloric contribution, texture/mouthfeel.</td>
<td style="text-align: left;">E: 0.32–0.92; M: 0.70–0.75</td>
<td style="text-align: left;">Pivotal for Key Intersection Analysis. Even intensive modification — hydrogenation (E ≈ 0.92), interesterification (E ≈ 0.91) — does not trigger functional re-casting while regulatory frameworks retain source-linked naming. “Hydrogenated vegetable oil,” “interesterified palm olein” are mandatory; triglyceride structure preservation in HS 1516 (“but not further prepared”) caps F at ≈ 0.35. Baseline 0.22 reflects regulatory resilience of source identity that processing intensity alone cannot overcome. FSSAI Reg 5(2) Sch II title 2; Codex CXS 19-1981 virgin/cold-pressed; ITC-HS Ch 15 (1507–1515 specific oils; 1516 modified). <span class="citation" data-cites="legitquest_F2 Codex_CXS19_1981_F2">(Legitquest Legal Database 2024; General Standard for Edible Fats and Oils Not Covered by Individual Standards 1981)</span></td>
</tr>
<tr class="even">
<td style="text-align: left;"><strong>Bulking Agent</strong></td>
<td style="text-align: center;">0.38</td>
<td style="text-align: left;">Increase volume, filler, non-nutritive bulk.</td>
<td style="text-align: left;">E: 0.60–0.85; M: 0.60–0.80</td>
<td style="text-align: left;">Partial functional recognition: “bulking agent” class acknowledged, but source often implicit in chemical name (maltodextrin from starch, cellulose from wood pulp/cotton). HS placement split between food-derived (Ch 11, 17) and chemically-processed (Ch 39) depending on modification degree. Reflects intermediate status: function named but not fully abstracted from material source. FSSAI Food Additives Regs 2011, Sch I class; ITC-HS 1702 (maltodextrins), 1109 (gluten), 3912 (cellulose). <span class="citation" data-cites="IndianKanoon_F2">(Indian Kanoon Repository 2024)</span></td>
</tr>
<tr class="odd">
<td style="text-align: left;"><strong>Humectant</strong></td>
<td style="text-align: center;">0.42</td>
<td style="text-align: left;">Retain moisture, prevent drying, wetting agent.</td>
<td style="text-align: left;">E: 0.55–0.75; M: 0.60–0.75</td>
<td style="text-align: left;">Moisture-retention function primary in naming, but glycerol source (vegetable/animal/synthetic) may be relevant for veg/non-veg classification. Synthetic glycerol achieves higher functional abstraction than fat-derived. Reflects moderate elevation: frameworks permit functional-class declaration but source indication remains commercially and regulatorily significant for certain applications. FSSAI Food Additives Regs 2011, Sch I class; Glycerol (INS 422), Sorbitol (INS 420), Propylene Glycol (INS 1520); ITC-HS 2905.45, 2906, 3824. <span class="citation" data-cites="IndianKanoon_F2">(Indian Kanoon Repository 2024)</span></td>
</tr>
<tr class="even">
<td style="text-align: left;"><strong>Firming Agent</strong></td>
<td style="text-align: center;">0.45</td>
<td style="text-align: left;">Maintain crispness, strengthen gel.</td>
<td style="text-align: left;">E: 0.50–0.70; M: 0.75–0.85</td>
<td style="text-align: left;">Crispness maintenance is a purely technological function with no nutritional role, yet the mineral source (calcium, aluminum) retains chemical specificity. “Firming agent (calcium chloride)” presents functional priority with residual material identity. This reflects a dual chemical-functional identity: F is higher than for plant-derived classes because an inorganic source carries no biological origin to preserve, but it is capped by specific naming requirements. FSSAI Food Additives Regs 2011, Sch I class; Calcium chloride (INS 509), Calcium lactate (INS 327); ITC-HS 2827, 2833, 2834, 3824. <span class="citation" data-cites="IndianKanoon_F2">(Indian Kanoon Repository 2024)</span></td>
</tr>
<tr class="odd">
<td style="text-align: left;"><strong>Raising Agent</strong></td>
<td style="text-align: center;">0.48</td>
<td style="text-align: left;">Liberate gas, increase volume, leavening.</td>
<td style="text-align: left;">E: 0.45–0.65; M: 0.70–0.85</td>
<td style="text-align: left;">Gas liberation is clearly a technological function, but “baking soda” (sodium bicarbonate) retains common-name source identification in consumer discourse. Prepared baking powders (mixed leavening systems) in 3824 achieve higher functional abstraction. The F = 0.48 reflects an equilibrium position: chemical naming is standard, the functional class is permitted, and consumer familiarity with source-based terms moderates full abstraction. FSSAI Food Products Standards and Food Additives Regulations, 2011, Schedule I “raising agent” class: sodium bicarbonate (INS 500(ii)), ammonium bicarbonate (INS 503(ii)), sodium acid pyrophosphate (INS 450(i)); ITC-HS 2836 (carbonates), 2835 (phosphates), 3824 (prepared baking powders). <span class="citation" data-cites="IndianKanoon_F2">(Indian Kanoon Repository 2024)</span></td>
</tr>
<tr class="even">
<td style="text-align: left;"><strong>Thickener</strong></td>
<td style="text-align: center;">0.62</td>
<td style="text-align: left;">Increase viscosity, bodying agent, texturizing agent.</td>
<td style="text-align: left;">E: 0.70–0.94; M: 0.60–0.96</td>
<td style="text-align: left;">Mandatory functional class declaration elevates F decisively. “Thickener (xanthan gum)” or “Thickener (INS 415)” presents function primary, source secondary. However, source variability within class (plant gums, animal proteins, modified starches, synthetic polymers) prevents complete abstraction: specific identification retains traceability to origin. HS migration to Chapter 35/39 for modified/cellulosic materials supports elevated F, but native gums in Chapter 13 maintain moderate source linkage. The F = 0.62 captures this regulatory-driven functional priority with residual source significance. FSSAI Regulation 5(5): mandatory functional class declaration with INS number; Food Products Standards and Food Additives Regulations, 2011, Schedule I “thickener” class; ITC-HS 1302 (vegetable saps and extracts), 3505 (modified starches), 3912 (cellulose ethers), 3824. <span class="citation" data-cites="legitquest_F2">(Legitquest Legal Database 2024)</span></td>
</tr>
<tr class="odd">
<td style="text-align: left;"><strong>Stabilizer</strong></td>
<td style="text-align: center;">0.65</td>
<td style="text-align: left;">Maintain dispersion, prevent sedimentation, foam stabilizer.</td>
<td style="text-align: left;">E: 0.75–0.90; M: 0.70–0.89</td>
<td style="text-align: left;">Dispersion maintenance is more technologically specific than thickening — requires kinetic stability, not just viscosity. Broader source variability (vegetable extracts, microbial products, synthetic polymers) supports higher F than thickeners. “Stabilizer” declaration standard with optional source parenthetical; HS chemical-product placement common. The F = 0.65 reflects stronger functional dominance due to specialized technological application and greater source heterogeneity within class. FSSAI Regulation 5(5): mandatory functional class declaration; Food Products Standards and Food Additives Regulations, 2011, Schedule I “stabilizer” class; ITC-HS 1302 (seaweed extracts), 3504 (peptones, protein substances), 3824 (prepared blends). <span class="citation" data-cites="legitquest_F2">(Legitquest Legal Database 2024)</span></td>
</tr>
<tr class="even">
<td style="text-align: left;"><strong>Gelling Agent</strong></td>
<td style="text-align: center;">0.68</td>
<td style="text-align: left;">Gel formation, structure provider.</td>
<td style="text-align: left;">E: 0.70–0.89; M: 0.75–0.90</td>
<td style="text-align: left;">Gel formation is definitive functional transformation of food matrix — creates novel physical structure not present in starting materials. “Gelling agent” declaration standard with source parenthetical (“gelling agent (pectin)”). Gelatin (animal-derived) faces source-disclosure constraints from Ram Gaua Raksha Dal <span class="citation" data-cites="lalitha_2026_supreme_court">(Lalitha 2026b)</span>, capping its effective F; plant-derived gelling agents achieve higher functional abstraction. The F = 0.68 reflects strong functional dominance with residual source significance for protein-based gels. FSSAI Regulation 5(5): mandatory functional class declaration; Food Products Standards and Food Additives Regulations, 2011, Schedule I “gelling agent” class: gelatin, agar, pectin, carrageenan, gellan gum; ITC-HS 3503 (gelatin), 1302, 3824. <span class="citation" data-cites="legitquest_F2 SSRana_F2">(Legitquest Legal Database 2024; High Court of Delhi 2021)</span></td>
</tr>
<tr class="odd">
<td style="text-align: left;"><strong>Foaming Agent</strong></td>
<td style="text-align: center;">0.72</td>
<td style="text-align: left;">Form gas dispersion, whipping agent.</td>
<td style="text-align: left;">E: 0.75–0.95; M: 0.78–0.90</td>
<td style="text-align: left;">Gas dispersion for volume expansion is highly technical function — requires precise surface-activity, film-forming, gas-retention properties. Foaming power, not source, defines quality: egg white, soy protein, synthetic surfactants functionally equivalent at specified performance levels. Elevated F reflects specialized technological application and performance-based selection criteria. Protein-based foaming agents retain slight source linkage (egg, soy), synthetic alternatives achieve higher abstraction. FSSAI Regulation 5(5): mandatory functional class declaration; Food Products Standards and Food Additives Regulations, 2011, Schedule I “foaming agent” class: albumen, quillaia extract, synthetic surfactants; ITC-HS 3502 (albumin, egg white), 1302 (saponins — quillaia), 3402 (organic surface-active agents — synthetic), 3824. <span class="citation" data-cites="legitquest_F2">(Legitquest Legal Database 2024)</span></td>
</tr>
<tr class="even">
<td style="text-align: left;"><strong>Emulsifier</strong></td>
<td style="text-align: center;">0.82</td>
<td style="text-align: left;">Form emulsion, maintain emulsion, prevent fat separation.</td>
<td style="text-align: left;">E: 0.82–0.94; M: 0.78–0.96</td>
<td style="text-align: left;">The Emulsifier class exemplifies the E-M-F tie-breaker function. Lecithin: E ≈ 0.89 (solvent extraction, fractionation, drying), M ≈ 0.89 (phospholipid concentrate), conditions suggesting ambiguous identity. Yet regulatory practice mandates “emulsifier (lecithin)” or “emulsifier (INS 322)” declaration, with ITC-HS placement in 2923.20 (chemical products) rather than 1516 (modified fats). Functional class primary, source parenthetical or absent. The F = 0.82 captures this regulatory-naming resolution of E-M ambiguity. Mono-/diglycerides similarly achieve F ≈ 0.82 through additive-schedule classification and prepared-additive HS placement, despite fat-derived origin. The lecithin vs.&nbsp;fractionated olein comparison illuminates F’s decisive role: comparable E-M, radically different F due to regulatory classification divergence. FSSAI Regulation 5(5): mandatory functional class declaration; Food Products Standards and Food Additives Regulations, 2011, Schedule I “emulsifier” class: lecithins (INS 322), mono-/diglycerides (INS 471), polysorbates (INS 432–436), sucrose esters (INS 473–474); ITC-HS 2923.20, 3824, 3402. <span class="citation" data-cites="legitquest_F2">(Legitquest Legal Database 2024)</span></td>
</tr>
<tr class="odd">
<td style="text-align: left;"><strong>Anticaking Agent</strong></td>
<td style="text-align: center;">0.85</td>
<td style="text-align: left;">Prevent clumping, improve flow, anti-stick agent.</td>
<td style="text-align: left;">E: 0.60–0.85; M: 0.80–0.95</td>
<td style="text-align: left;">Flow improvement is purely technical function with no nutritional, sensory, or structural role in final product. Source completely irrelevant to application: silicon dioxide from sand or synthetic, calcium silicate from mineral or industrial process — functionally equivalent. “Anticaking agent” declaration standard with chemical name or INS number; no source indication required or expected. High F reflects total identity divorce from biological origin. FSSAI Regulation 5(5): mandatory functional class declaration; Food Products Standards and Food Additives Regulations, 2011, Schedule I “anticaking agent” class: silicon dioxide (INS 551), calcium silicate (INS 552), magnesium carbonate (INS 504(i)), various phosphates; ITC-HS 2811 (silicon dioxide), 2835 (phosphates), 3824 (prepared anticaking preparations). <span class="citation" data-cites="legitquest_F2">(Legitquest Legal Database 2024)</span></td>
</tr>
<tr class="even">
<td style="text-align: left;"><strong>Acidity Regulator</strong></td>
<td style="text-align: center;">0.87</td>
<td style="text-align: left;">Control pH, acidifier, buffering agent, alkali.</td>
<td style="text-align: left;">E: 0.55–0.90; M: 0.70–0.90</td>
<td style="text-align: left;">pH control is chemically precise function — requires defined acid/base strength, buffer capacity, taste profile. Organic acids may retain nominal source linkage (citric “from fermentation,” lactic “from dairy”), but regulatory classification by chemical structure dominates. Synthetic production and functional-class declaration achieve near-complete abstraction. The F = 0.87 reflects very high functional dominance with minimal residual source significance. FSSAI Regulation 5(5): mandatory functional class declaration; Food Products Standards and Food Additives Regulations, 2011, Schedule I “acidity regulator” class: citric acid (INS 330), lactic acid (INS 270), phosphoric acid (INS 338), acetic acid (INS 260), various salts; ITC-HS 2915 (saturated acyclic monocarboxylic acids), 2918 (carboxylic acids with additional oxygen functions), 2835 (phosphates), 3824. <span class="citation" data-cites="legitquest_F2">(Legitquest Legal Database 2024)</span></td>
</tr>
<tr class="odd">
<td style="text-align: left;"><strong>Antioxidant</strong></td>
<td style="text-align: center;">0.88</td>
<td style="text-align: left;">Prevent oxidation, prevent rancidity, antibrowning.</td>
<td style="text-align: left;">E: 0.60–0.95; M: 0.75–0.98</td>
<td style="text-align: left;">Oxidation prevention is chemically specific function — free radical scavenging, metal chelation, oxygen absorption — mechanism-dependent, not source-dependent. Synthetic antioxidants (BHA, BHT, TBHQ) achieve complete source abstraction; natural alternatives (tocopherols, rosemary extract) retain slight source linkage moderating class average. “Antioxidant” declaration standard with specific name or INS number; mechanism of action primary, origin secondary. The F = 0.88 reflects near-complete functional dominance with chemical-mechanistic specificity. FSSAI Regulation 5(5): mandatory functional class declaration; Food Products Standards and Food Additives Regulations, 2011, Schedule I “antioxidant” class: BHA (INS 320), BHT (INS 321), TBHQ (INS 319), tocopherols (INS 307), ascorbic acid (INS 300), rosemary extract (INS 392); ITC-HS 2907 (phenols), 2918 (carboxylic acids), 3824. <span class="citation" data-cites="legitquest_F2">(Legitquest Legal Database 2024)</span></td>
</tr>
<tr class="even">
<td style="text-align: left;"><strong>Preservative</strong></td>
<td style="text-align: center;">0.89</td>
<td style="text-align: left;">Inhibit microbes, retard fermentation, antimycotic.</td>
<td style="text-align: left;">E: 0.60–0.90; M: 0.75–0.95</td>
<td style="text-align: left;">Microbial inhibition is safety-critical function with strict regulatory control: maximum permitted levels, prohibited food categories, specific labelling requirements. Preservative efficacy independent of source: benzoic acid from gum benzoin or synthetic, sorbic acid from rowan berries or petrochemical are toxicologically and functionally equivalent. “Preservative” declaration with specific name/INS number; safety profile and antimicrobial spectrum primary, origin irrelevant. The F = 0.89 reflects very high functional dominance for food-safety-critical additives, with regulatory-driven identity. FSSAI Regulation 5(5): mandatory functional class declaration; Food Products Standards and Food Additives Regulations, 2011, Schedule I “preservative” class: benzoic acid/sodium benzoate (INS 210–211), sorbic acid/potassium sorbate (INS 200–202), propionic acid/calcium propionate (INS 280–282), sulfur dioxide (INS 220), nisin (INS 234), natamycin (INS 235); ITC-HS 2916 (unsaturated monocarboxylic acids), 2918 (carboxylic acids), 3824 (prepared preservative systems). <span class="citation" data-cites="legitquest_F2">(Legitquest Legal Database 2024)</span></td>
</tr>
<tr class="odd">
<td style="text-align: left;"><strong>Antifoaming Agent</strong></td>
<td style="text-align: center;">0.90</td>
<td style="text-align: left;">Prevent foaming, reduce surface tension.</td>
<td style="text-align: left;">E: 0.70–0.95; M: 0.85–0.98</td>
<td style="text-align: left;">Foam prevention is highly specialized industrial function — used at ppm levels in processing, no consumer-perceptible presence in final product. Silicone-based, mineral oil, polyglycol antifoams chemically defined with no meaningful biological source. “Antifoaming agent” declaration standard; process optimization criteria (temperature stability, dispersion, efficacy) sole selection factors. The F = 0.90 reflects near-total functional abstraction for processing-aid category. FSSAI Regulation 5(5): mandatory functional class declaration; Food Products Standards and Food Additives Regulations, 2011, Schedule I “antifoaming agent” class: dimethylpolysiloxane (INS 900a), mineral oil (INS 905a), various fatty acid esters; ITC-HS 3910 (silicones), 2710 (mineral oils), 3824 (prepared antifoaming compositions). <span class="citation" data-cites="legitquest_F2">(Legitquest Legal Database 2024)</span></td>
</tr>
<tr class="even">
<td style="text-align: left;"><strong>Sequestrant</strong></td>
<td style="text-align: center;">0.91</td>
<td style="text-align: left;">Bind metal ions, control oxidation catalyst.</td>
<td style="text-align: left;">E: 0.75–0.95; M: 0.80–0.98</td>
<td style="text-align: left;">Metal ion binding is precise chemical mechanism — stability constants, chelation kinetics, pH dependence define performance. EDTA, citrates, polyphosphates chemically synthesized; no biological source relevant. “Sequestrant” declaration with chemical specificity; chelating capacity primary, molecular structure secondary, origin absent. The F = 0.91 reflects advanced tool-identity for mechanistically specialized function. FSSAI Regulation 5(5): mandatory functional class declaration; Food Products Standards and Food Additives Regulations, 2011, Schedule I “sequestrant” class: calcium disodium EDTA (INS 385), disodium EDTA (INS 386), various citrates, phosphates, polyphosphates; ITC-HS 2917 (polycarboxylic acids), 2922 (oxygen-function amino-compounds), 2835 (phosphates), 3824. <span class="citation" data-cites="legitquest_F2">(Legitquest Legal Database 2024)</span></td>
</tr>
<tr class="odd">
<td style="text-align: left;"><strong>Bleaching Agent</strong></td>
<td style="text-align: center;">0.92</td>
<td style="text-align: left;">Decolorize food, flour bleaching.</td>
<td style="text-align: left;">E: 0.80–0.95; M: 0.85–0.98</td>
<td style="text-align: left;">Decolorization is aggressive chemical intervention — oxidative destruction of pigments, not nutritional or sensory contribution. Bleaching agents not consumed as food but as processing aids; residues minimized or removed. “Bleaching agent” or “flour treatment agent (bleaching)” declaration; chemical reactivity primary, source irrelevant. The F = 0.92 reflects processing-tool status with no food-component identity. FSSAI Regulation 5(5): mandatory functional class declaration; Food Products Standards and Food Additives Regulations, 2011, Schedule I “bleaching agent” class: benzoyl peroxide (INS 928), chlorine dioxide, sulfur dioxide (INS 220); ITC-HS 2815 (inorganic bases), 2820 (manganese oxides), 3824. <span class="citation" data-cites="legitquest_F2">(Legitquest Legal Database 2024)</span></td>
</tr>
<tr class="even">
<td style="text-align: left;"><strong>Flour Treatment Agent</strong></td>
<td style="text-align: center;">0.93</td>
<td style="text-align: left;">Improve baking quality, dough conditioner, dough strengthener.</td>
<td style="text-align: left;">E: 0.70–0.90; M: 0.75–0.90</td>
<td style="text-align: left;">Dough conditioning is exquisitely application-specific — rheology modification, gluten development, fermentation control for bread quality optimization. Treatment agents transform flour functionality without becoming part of final product identity; enzymatic action consumed in processing. “Flour treatment agent” declaration with specific agent; technological outcome (dough properties) primary, chemical/enzymatic mechanism secondary, source absent. The F = 0.93 reflects extreme functional specialization for bakery-processing optimization. FSSAI Regulation 5(5): mandatory functional class declaration; Food Products Standards and Food Additives Regulations, 2011, Schedule I “flour treatment agent” class: ascorbic acid (INS 300), L-cysteine (INS 920), various enzymes (amylases, proteases, xylanases), azodicarbonamide; ITC-HS 2936 (vitamins), 2930 (sulfur-organic compounds), 3507 (enzymes), 3824. <span class="citation" data-cites="legitquest_F2">(Legitquest Legal Database 2024)</span></td>
</tr>
<tr class="odd">
<td style="text-align: left;"><strong>Carrier</strong></td>
<td style="text-align: center;">0.94</td>
<td style="text-align: left;">Dissolve additive, dilute nutrient, encapsulating agent.</td>
<td style="text-align: left;">E: 0.60–0.95; M: 0.70–0.96</td>
<td style="text-align: left;">The Carrier function represents meta-functional identity: the carrier’s purpose is to enable function of other ingredients — dissolution, dispersion, encapsulation, controlled release. Maltodextrin, modified starches, oils, glycerol as carriers: source (corn, wheat, palm, soy) irrelevant to carrier function; delivery performance (solubility, viscosity, compatibility) sole criteria. “Carrier” declaration with optional specific material; technological service function completely eclipses material identity. The F = 0.94 is second-highest assigned score, reflecting near-complete functional abstraction. Critical for lipid crossing-point analysis: when vegetable fat becomes “carrier,” F elevates from 0.22 to 0.94 — the definitive functional re-casting. FSSAI Regulation 5(5): mandatory functional class declaration; Food Products Standards and Food Additives Regulations, 2011, Schedule I “carrier” class: starches, maltodextrins, oils, water, propylene glycol, various gums; ITC-HS 3824 (prepared carriers), with specific materials in 1106, 1520, 2905 depending on form. <span class="citation" data-cites="legitquest_F2">(Legitquest Legal Database 2024)</span></td>
</tr>
<tr class="even">
<td style="text-align: left;"><strong>Propellant</strong></td>
<td style="text-align: center;">0.95</td>
<td style="text-align: left;">Expel food from container.</td>
<td style="text-align: left;">E: 0.60–0.85; M: 0.85–0.98</td>
<td style="text-align: left;">Food expulsion is purely mechanical/physical function — pressure, expansion, flow properties define performance. Gaseous state, no nutritional function, chemically defined: nitrous oxide (N<sub>2</sub>O), carbon dioxide (CO<sub>2</sub>), nitrogen (N<sub>2</sub>) identified by molecular formula and physical properties, not biological origin. “Propellant” declaration with chemical name or INS number; pressure-temperature behavior primary, chemical identity secondary, source completely absent. The F = 0.95 is maximum assigned score, reflecting complete source abstraction and pure tool-identity. FSSAI Regulation 5(5): mandatory functional class declaration; Food Products Standards and Food Additives Regulations, 2011, Schedule I “propellant” class: nitrous oxide (INS 942), carbon dioxide (INS 290), nitrogen (INS 941), various hydrocarbons (INS 943–945); ITC-HS 2811 (inorganic acids and oxygen compounds of non-metals), 2711 (petroleum gases), 3824. <span class="citation" data-cites="legitquest_F2">(Legitquest Legal Database 2024)</span></td>
</tr>
<tr class="odd">
<td style="text-align: left;"><strong>Packaging Gas</strong></td>
<td style="text-align: center;">0.95</td>
<td style="text-align: left;">Modified atmosphere, prevent oxidation in pack.</td>
<td style="text-align: left;">E: 0.55–0.75; M: 0.85–0.98</td>
<td style="text-align: left;">Modified atmosphere preservation is environmental control function: oxygen exclusion, carbon dioxide antimicrobial effect, inert gas displacement protect food quality. Gas identity determined by chemical/physical properties (nitrogen inertness, carbon dioxide solubility, argon density), not by biological origin. Gases from any production route (atmospheric, cryogenic, synthetic) functionally equivalent; “packaging gas” declaration with specific gas; atmosphere composition primary, gas source irrelevant. The F = 0.95 matches Propellant as maximum score, reflecting complete functional abstraction from biological matrix. FSSAI Regulation 5(5): mandatory functional class declaration; Food Products Standards and Food Additives Regulations, 2011, Schedule I “packaging gas” class: nitrogen (INS 941), carbon dioxide (INS 290), argon (INS 938), oxygen (INS 948); ITC-HS 2804 (nitrogen, argon, oxygen), 2811 (carbon dioxide), 3824 (prepared atmosphere mixtures). <span class="citation" data-cites="legitquest_F2">(Legitquest Legal Database 2024)</span></td>
</tr>
</tbody>
</table>
</div>
</figure>
</div>
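<p>The F scores in the rows above can be read as a lookup from declared functional class to score. The Python sketch below copies the eight class scores whose rows appear above and illustrates the re-casting noted in the Carrier row, where vegetable fat moves from F = 0.22 to F = 0.94. The helper names and the rule that a declared class replaces the source-level F outright are assumptions made for illustration, not a statement of the scoring model's procedure.</p>

```python
# F scores copied from the table rows above (functional class: F).
F_BY_CLASS = {
    "Preservative": 0.89,
    "Antifoaming Agent": 0.90,
    "Sequestrant": 0.91,
    "Bleaching Agent": 0.92,
    "Flour Treatment Agent": 0.93,
    "Carrier": 0.94,
    "Propellant": 0.95,
    "Packaging Gas": 0.95,
}


def f_after_recasting(source_f, declared_class=None):
    """Return the class F when a functional class is declared, else source_f.

    Assumption for illustration only: declaring a functional class replaces
    the source-level F outright, as in the vegetable fat example.
    """
    if declared_class is None:
        return source_f
    return F_BY_CLASS[declared_class]


def classes_at_maximum():
    """List the functional classes that share the highest F in the table."""
    top = max(F_BY_CLASS.values())
    return sorted(name for name, f in F_BY_CLASS.items() if f == top)


# Vegetable fat declared as a carrier, using the 0.22 figure from the Carrier row.
print(f_after_recasting(0.22, "Carrier"))
print(classes_at_maximum())
```

<p>Keeping the scores in one mapping makes the ordering claims in the justifications (Carrier as second-highest, Propellant and Packaging Gas as joint maximum) easy to check against the numbers.</p>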
<hr>
</section>
<section id="references" class="level2 unnumbered">
<h2 class="unnumbered anchored" data-anchor-id="references">References</h2>
<div id="refs" class="references csl-bib-body hanging-indent">
<div id="ref-PalmOil_Processing_2023" class="csl-entry">
<span class="nocase">Abdul Wahab, Siti et al.</span> 2023. <span>“Palm Oil: Processing, Characterization and Utilization in the Food Industry.”</span> <em>Frontiers in Nutrition</em>. <a href="https://pmc.ncbi.nlm.nih.gov/articles/PMC10122035/">https://pmc.ncbi.nlm.nih.gov/articles/PMC10122035/</a>.
</div>
<div id="ref-AOCS_Hydrogenation_2024" class="csl-entry">
American Oil Chemists’ Society (AOCS). 2024. <em>Hydrogenation in Practice</em>. Technical resource page. <a href="https://www.aocs.org/resource/hydrogenation-in-practice/">https://www.aocs.org/resource/hydrogenation-in-practice/</a>.
</div>
<div id="ref-Hexane_Substitution_2022" class="csl-entry">
<span class="nocase">Boukhenfa, Hana et al.</span> 2022. <span>“Towards Substitution of Hexane as Extraction Solvent of Food Products: A Review.”</span> <em>Foods</em>. <a href="https://pmc.ncbi.nlm.nih.gov/articles/PMC9655691/">https://pmc.ncbi.nlm.nih.gov/articles/PMC9655691/</a>.
</div>
<div id="ref-Codex_SoyProtein_175_1989_M2" class="csl-entry">
Codex Alimentarius Commission. 1989. <em>General Standard for Soy Protein Products (CXS 175-1989)</em>. Codex standard. <a href="https://www.fao.org/input/download/standards/325/CXS_175e.pdf">https://www.fao.org/input/download/standards/325/CXS_175e.pdf</a>.
</div>
<div id="ref-Codex_FatsOils_19_1981_M2" class="csl-entry">
Codex Alimentarius Commission. 2015. <em>Standard for Edible Fats and Oils Not Covered by Individual Standards (CXS 19-1981)</em>. Codex standard. <a href="https://www.fao.org/input/download/standards/74/CXS_019e_2015.pdf">https://www.fao.org/input/download/standards/74/CXS_019e_2015.pdf</a>.
</div>
<div id="ref-Codex_CXS19_1981" class="csl-entry">
Codex Alimentarius Commission (FAO/WHO). 2024. <em>Standard for Edible Fats and Oils Not Covered by Individual Standards (CXS 19-1981)</em>. Official PDF. <a href="https://workspace.fao.org/sites/codex/Standards/CXS%2019-1981/CXS_019e.pdf">https://workspace.fao.org/sites/codex/Standards/CXS%2019-1981/CXS_019e.pdf</a>.
</div>
<div id="ref-DGCI_CH15_M2" class="csl-entry">
DGCI&amp;S. 2007a. <em>Indian Trade Classification (h.s.): Chapter 15 — Animal or Vegetable Fats and Oils and Their Cleavage Products; Prepared Edible Fats; Animal or Vegetable Waxes</em>. Official tariff schedule. <a href="https://www.dgciskol.gov.in/Writereaddata/Downloads/CHP_15.pdf">https://www.dgciskol.gov.in/Writereaddata/Downloads/CHP_15.pdf</a>.
</div>
<div id="ref-DGCI_CH35_M2" class="csl-entry">
DGCI&amp;S. 2007b. <em>Indian Trade Classification (h.s.): Chapter 35 — Albuminoidal Substances; Modified Starches; Glues; Enzymes</em>. Official tariff schedule. <a href="https://dgciskol.gov.in/Writereaddata/Downloads/2007/CHP_35.pdf">https://dgciskol.gov.in/Writereaddata/Downloads/2007/CHP_35.pdf</a>.
</div>
<div id="ref-DGCI_CH11_M2" class="csl-entry">
Directorate General of Commercial Intelligence and Statistics (DGCI&amp;S). 2007. <em>Indian Trade Classification (h.s.): Chapter 11 — Products of the Milling Industry; Malt; Starches; Inulin; Wheat Gluten</em>. Official tariff schedule. <a href="https://www.dgciskol.gov.in/Writereaddata/Downloads/2007/CHP_11.pdf">https://www.dgciskol.gov.in/Writereaddata/Downloads/2007/CHP_11.pdf</a>.
</div>
<div id="ref-DGCI_CH11_F2" class="csl-entry">
Directorate General of Commercial Intelligence and Statistics (DGCI&amp;S). 2022. <em>Indian Trade Classification (Harmonised System) - ITC(HS) 2022</em>. Ministry of Commerce and Industry, Government of India.
</div>
<div id="ref-DGCI_CH07" class="csl-entry">
Directorate General of Commercial Intelligence and Statistics (DGCI&amp;S), Government of India. 2007a. <em>Indian Trade Classification (h.s.): Chapter 07 — Edible Vegetables and Certain Roots and Tubers</em>. Official PDF. <a href="https://www.dgciskol.gov.in/Writereaddata/Downloads/2007/CHP_07.pdf">https://www.dgciskol.gov.in/Writereaddata/Downloads/2007/CHP_07.pdf</a>.
</div>
<div id="ref-DGCI_CH11" class="csl-entry">
Directorate General of Commercial Intelligence and Statistics (DGCI&amp;S), Government of India. 2007b. <em>Indian Trade Classification (h.s.): Chapter 11 — Products of the Milling Industry; Malt; Starches; Inulin; Wheat Gluten</em>. Official PDF. <a href="https://www.dgciskol.gov.in/Writereaddata/Downloads/2007/CHP_11.pdf">https://www.dgciskol.gov.in/Writereaddata/Downloads/2007/CHP_11.pdf</a>.
</div>
<div id="ref-DGCI_CH15" class="csl-entry">
Directorate General of Commercial Intelligence and Statistics (DGCI&amp;S), Government of India. 2007c. <em>Indian Trade Classification (h.s.): Chapter 15 — Animal or Vegetable Fats and Oils and Their Cleavage Products; Prepared Edible Fats; Animal or Vegetable Waxes</em>. Official PDF. <a href="https://www.dgciskol.gov.in/Writereaddata/Downloads/CHP_15.pdf">https://www.dgciskol.gov.in/Writereaddata/Downloads/CHP_15.pdf</a>.
</div>
<div id="ref-DGCI_CH22" class="csl-entry">
Directorate General of Commercial Intelligence and Statistics (DGCI&amp;S), Government of India. 2007d. <em>Indian Trade Classification (h.s.): Chapter 22 — Beverages, Spirits and Vinegar</em>. Official PDF. <a href="https://www.dgciskol.gov.in/Writereaddata/Downloads/CHP_22.pdf">https://www.dgciskol.gov.in/Writereaddata/Downloads/CHP_22.pdf</a>.
</div>
<div id="ref-DGCI_CH35" class="csl-entry">
Directorate General of Commercial Intelligence and Statistics (DGCI&amp;S), Government of India. 2007e. <em>Indian Trade Classification (h.s.): Chapter 35 — Albuminoidal Substances; Modified Starches; Glues; Enzymes (Effective from 1 April 2007)</em>. Official PDF. <a href="https://dgciskol.gov.in/Writereaddata/Downloads/2007/CHP_35.pdf">https://dgciskol.gov.in/Writereaddata/Downloads/2007/CHP_35.pdf</a>.
</div>
<div id="ref-FAO_GSFA_1422_M2" class="csl-entry">
FAO/WHO Codex GSFA. 2025. <em>Acetylated Distarch Adipate (INS 1422) — GSFA Food Additive Details</em>. Online GSFA database. <a href="https://www.fao.org/gsfaonline/additives/details.html?id=152">https://www.fao.org/gsfaonline/additives/details.html?id=152</a>.
</div>
<div id="ref-JECFA_AcetylatedDistarch_2016_M2" class="csl-entry">
FAO/WHO Joint Expert Committee on Food Additives. 2016. <em>Acetylated Distarch Adipate — JECFA Specification Monograph</em>. FAO JECFA Monographs 19. <a href="https://openknowledge.fao.org/server/api/core/bitstreams/130fb981-1d76-4ac8-9485-11006095eb19/content">https://openknowledge.fao.org/server/api/core/bitstreams/130fb981-1d76-4ac8-9485-11006095eb19/content</a>.
</div>
<div id="ref-FSSAI_Additives_2022_M2" class="csl-entry">
Food Safety and Standards Authority of India (FSSAI). 2022. <em>Food Safety and Standards (Food Products Standards and Food Additives) Regulations, 2011 – Compendium (18[3.1: Food Additives])</em>. Official PDF. <a href="https://www.fssai.gov.in/upload/uploadfiles/files/Compendium_Food_Additives_Regulations_20_12_2022.pdf">https://www.fssai.gov.in/upload/uploadfiles/files/Compendium_Food_Additives_Regulations_20_12_2022.pdf</a>.
</div>
<div id="ref-FSSAI_Label_2020_M2" class="csl-entry">
Food Safety and Standards Authority of India (FSSAI). 2023b. <em>Food Safety and Standards (Labelling and Display) Regulations, 2020 (Version-VI, 22.02.2023)</em>. Official gazette compilation. <a href="https://www.fssai.gov.in/upload/uploadfiles/files/Comp_Labelling.pdf">https://www.fssai.gov.in/upload/uploadfiles/files/Comp_Labelling.pdf</a>.
</div>
<div id="ref-FSSAI_Label_2020" class="csl-entry">
Food Safety and Standards Authority of India (FSSAI). 2023a. <em>Food Safety and Standards (Labelling and Display) Regulations, 2020 (Version-VI, 22.02.2023)</em>. Official PDF. <a href="https://www.fssai.gov.in/upload/uploadfiles/files/Comp_Labelling.pdf">https://www.fssai.gov.in/upload/uploadfiles/files/Comp_Labelling.pdf</a>.
</div>
<div id="ref-FSSAI_Additives_Chapter3_2024" class="csl-entry">
Food Safety and Standards Authority of India (FSSAI). 2024. <em>Food Product Standards and Food Additives: Chapter 3 — Substances Added to Food (Version 2, 04.11.2024)</em>. Official PDF. <a href="https://fssai.gov.in/upload/uploadfiles/files/Chapter%203_Substances%20added%20to%20food.pdf">https://fssai.gov.in/upload/uploadfiles/files/Chapter%203_Substances%20added%20to%20food.pdf</a>.
</div>
<div id="ref-FSSAI_Dairy_2025_M2" class="csl-entry">
Food Safety and Standards Authority of India (FSSAI). 2025a. <em>Food Product Standards: Chapter 2.1 Dairy Products and Analogues</em>. Official PDF. <a href="https://www.fssai.gov.in/upload/uploadfiles/files/Chapter%202_1%20(Dairy%20products%20and%20analogues).pdf">https://www.fssai.gov.in/upload/uploadfiles/files/Chapter%202_1%20(Dairy%20products%20and%20analogues).pdf</a>.
</div>
<div id="ref-FSSAI_Dairy_2025" class="csl-entry">
Food Safety and Standards Authority of India (FSSAI). 2025b. <em>Food Product Standards: Chapter 2.1 Dairy Products and Analogues (Version 3, 07.05.2025)</em>. Official PDF. <a href="https://www.fssai.gov.in/upload/uploadfiles/files/Chapter%202_1_Dairy_products_and_analogues.pdf">https://www.fssai.gov.in/upload/uploadfiles/files/Chapter%202_1_Dairy_products_and_analogues.pdf</a>.
</div>
<div id="ref-Codex_CXS19_1981_F2" class="csl-entry">
<em>General Standard for Edible Fats and Oils Not Covered by Individual Standards</em>. 1981. Codex standard CXS 19-1981.
</div>
<div id="ref-FSSAI_SolventExtracted_Order_1967" class="csl-entry">
Government of India (hosted on FSSAI website). 1967. <em>The Solvent Extracted Oil, De-Oiled Meal and Edible Flour (Control) Order, 1967 (as Uploaded)</em>. Official PDF. <a href="https://fssai.gov.in/upload/uploadfiles/files/solvent-Extracted.pdf">https://fssai.gov.in/upload/uploadfiles/files/solvent-Extracted.pdf</a>.
</div>
<div id="ref-SSRana_F2" class="csl-entry">
High Court of Delhi. 2021. <em>Ram Gau Raksha Dal Vs. Union of India &amp; Ors.</em> Order. <a href="https://indiankanoon.org/doc/189442159/">https://indiankanoon.org/doc/189442159/</a>.
</div>
<div id="ref-IndianKanoon_F2" class="csl-entry">
Indian Kanoon Repository. 2024. <em>Compilation of Food Additive Functional Classes and Statutory Definitions</em>. <a href="https://indiankanoon.org/">https://indiankanoon.org/</a>.
</div>
<div id="ref-JECFA_AcetylatedDistarchAdipate" class="csl-entry">
Joint FAO/WHO Expert Committee on Food Additives (JECFA). 1974. <em>Acetylated Distarch Adipate — WHO Food Additives Series 17</em>. InChem monograph. <a href="https://inchem.org/documents/jecfa/jecmono/v17je12.htm">https://inchem.org/documents/jecfa/jecmono/v17je12.htm</a>.
</div>
<div id="ref-lalitha_2026_emf_main" class="csl-entry">
Lalitha, A. R. 2026a. <em><span class="nocase">Identity, Transformation, and Function: A Tri-Axial Model for the Classification of Food Ingredient Identity</span></em>. Interdisciplinary Systems Research Lab.
</div>
<div id="ref-lalitha_2026_supreme_court" class="csl-entry">
Lalitha, A. R. 2026b. <em><span class="nocase">Indian Supreme Court Defines Hierarchical Classification for Food Products: Overruling Common Parlance Precedents</span></em>. Interdisciplinary Systems Research Lab.
</div>
<div id="ref-legitquest_F2" class="csl-entry">
Legitquest Legal Database. 2024. <em>FSSAI Statutory Mapping for Ingredient Nomenclature</em>. <a href="https://www.legitquest.com/">https://www.legitquest.com/</a>.
</div>
<div id="ref-TransFat_Review_2011" class="csl-entry">
<span class="nocase">Mozaffarian, Dariush et al.</span> 2011. <span>“Trans Fats—Sources, Health Risks and Alternative Approach: A Review.”</span> <em>Journal of Food Science and Technology</em>. <a href="https://pmc.ncbi.nlm.nih.gov/articles/PMC3551118/">https://pmc.ncbi.nlm.nih.gov/articles/PMC3551118/</a>.
</div>
<div id="ref-PubChem_Vanillin_1183_M2" class="csl-entry">
National Center for Biotechnology Information. 2021. <em>Vanillin; CID 1183</em>. PubChem Compound summary. <a href="https://pubchem.ncbi.nlm.nih.gov/compound/1183">https://pubchem.ncbi.nlm.nih.gov/compound/1183</a>.
</div>
<div id="ref-PubChem_Vanillin" class="csl-entry">
National Center for Biotechnology Information (NCBI). 2025a. <em>PubChem Compound Summary for CID 1183: Vanillin</em>. Database record. <a href="https://pubchem.ncbi.nlm.nih.gov/compound/1183">https://pubchem.ncbi.nlm.nih.gov/compound/1183</a>.
</div>
<div id="ref-PubChem_SodiumGlycolate" class="csl-entry">
National Center for Biotechnology Information (NCBI). 2025b. <em>PubChem Compound Summary: Sodium Glycolate (Sodium Hydroxyacetate)</em>. Database record. <a href="https://pubchem.ncbi.nlm.nih.gov/compound/Sodium%20hydroxyacetate">https://pubchem.ncbi.nlm.nih.gov/compound/Sodium%20hydroxyacetate</a>.
</div>
<div id="ref-PubChem_SodiumGlycolate_M2" class="csl-entry">
NCBI. 2022. <em>Sodium Glycolate / Sodium Hydroxyacetate</em>. PubChem and supplier data. <a href="https://www.chembk.com/en/chem/Sodium%20hydroxyacetate">https://www.chembk.com/en/chem/Sodium%20hydroxyacetate</a>.
</div>
<div id="ref-Whey_Applications_2015_M2" class="csl-entry">
<span class="nocase">Pintado, M. E. et al.</span> 2015. <span>“Improved Functional Characteristics of Whey Protein Hydrolysates in Food Applications.”</span> <em>Food Technology and Biotechnology</em>, 231–42. <a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4662358/">https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4662358/</a>.
</div>
<div id="ref-EssentialOils_FoodApps_2024_M2" class="csl-entry">
Rodilla, Jesus M., Tiago Rosado, and Eugenia Gallardo. 2024. <span>“Essential Oils: Chemistry and Food Applications.”</span> <em>Foods</em> 13 (4): 1–24. <a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11011311/">https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11011311/</a>.
</div>
<div id="ref-Maillard_2025" class="csl-entry">
<span class="nocase">Schaefer, Kevin et al.</span> 2025. <span>“Maillard Reaction: Mechanism, Influencing Parameters, and Relevance in Food Processing.”</span> <em>Molecules</em>. <a href="https://pmc.ncbi.nlm.nih.gov/articles/PMC12154226/">https://pmc.ncbi.nlm.nih.gov/articles/PMC12154226/</a>.
</div>
<div id="ref-WCO_CH11_Notes_M2" class="csl-entry">
World Customs Organization. 2002. <em>Harmonized System Explanatory Notes: Chapter 11 — Products of the Milling Industry; Malt; Starches; Inulin; Wheat Gluten</em>. Explanatory Notes. <a href="https://www.wcoomd.org/-/media/wco/public/global/pdf/topics/nomenclature/instruments-and-tools/hs-nomenclature-older-edition/2002/11.pdf">https://www.wcoomd.org/-/media/wco/public/global/pdf/topics/nomenclature/instruments-and-tools/hs-nomenclature-older-edition/2002/11.pdf</a>.
</div>
<div id="ref-Vinegar_Review_2024" class="csl-entry">
<span class="nocase">Yun, Rong et al.</span> 2024. <span>“Vinegar: A Review of the Microbiology, Biochemistry and Quality Aspects.”</span> <em>Food Research International</em>. <a href="https://pmc.ncbi.nlm.nih.gov/articles/PMC11312487/">https://pmc.ncbi.nlm.nih.gov/articles/PMC11312487/</a>.
</div>
</div>


</section>

<div id="quarto-appendix" class="default"><section class="quarto-appendix-contents" id="quarto-reuse"><h2 class="anchored quarto-appendix-heading">Reuse</h2><div class="quarto-appendix-contents"><div><a rel="license" href="https://creativecommons.org/licenses/by/4.0/">CC BY 4.0</a></div></div></section><section class="quarto-appendix-contents" id="quarto-citation"><h2 class="anchored quarto-appendix-heading">Citation</h2><div><div class="quarto-appendix-secondary-label">BibTeX citation:</div><pre class="sourceCode code-with-copy quarto-appendix-bibtex"><code class="sourceCode bibtex">@report{a_r2026,
  author = {A R, Lalitha},
  publisher = {iSRL},
  title = {Justification {Companion} to {EMF-Scoring} {Model}},
  number = {iSRL-26-02-D-EMFJustify},
  date = {2026-02-20},
  url = {https://isrl.in/pub/2026-02-d-emfjustify/},
  doi = {10.5281/zenodo.18713318},
  langid = {en}
}
</code></pre><div class="quarto-appendix-secondary-label">For attribution, please cite this work as:</div><div id="ref-a_r2026" class="csl-entry quarto-appendix-citeas">
A R, Lalitha. 2026. <em>Justification Companion to EMF-Scoring
Model</em>. iSRL-26-02-D-EMFJustify. iSRL. <a href="https://doi.org/10.5281/zenodo.18713318">https://doi.org/10.5281/zenodo.18713318</a>.
</div></div></section></div> ]]></description>
  <guid>https://isrl.in/pub/2026-02-d-emfjustify/</guid>
  <pubDate>Fri, 20 Feb 2026 00:00:00 GMT</pubDate>
</item>
<item>
  <title>Identity, Transformation, and Function: A Tri-Axial Model for the Classification of Food Ingredient Identity</title>
  <dc:creator>Lalitha A R</dc:creator>
  <link>https://isrl.in/pub/2026-02-r-emf/</link>
  <description><![CDATA[ 




<script>
document.addEventListener('DOMContentLoaded', function() {
  var meta = document.querySelector('#title-block-header .quarto-title-meta');
  if (!meta) return;
  meta.insertAdjacentHTML('beforeend', '<div><div class="quarto-title-meta-heading">Contributors</div><div class="quarto-title-meta-contents"><p class="author" style="margin:0 0 0.1em 0;">Hitha Sunil</p><p style="font-size:0.82em;color:#555;margin:0 0 0.5em 0;font-style:italic;">Typesetting</p></div></div>');
});
</script>
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "ScholarlyArticle",
  "name": "Identity, Transformation, and Function: A Tri-Axial Model for the Classification of Food Ingredient Identity",
  "@id": "https://doi.org/10.5281/zenodo.18714527",
  "identifier": [
    "https://doi.org/10.5281/zenodo.18714527",
    "iSRL-26-02-R-EMF"
  ],
  "description": "Proposes the EMF Tri-Axial Identity Model for assigning a determinate identity position to any food ingredient using three axes — Anthropogenic Energy (E), Matter (M), and Function (F) — grounded in Indian regulatory instruments and validated against a 35-item benchmark.",
  "datePublished": "2026-02-20",
  "license": "https://creativecommons.org/licenses/by/4.0/",
  "url": "https://isrl.in/pub/2026-02-r-emf/",
  "author": {
    "@type": "Person",
    "name": "Lalitha A R",
    "identifier": "https://orcid.org/0009-0001-7466-3531",
    "sameAs": "https://orcid.org/0009-0001-7466-3531",
    "email": "lalithaar.research@gmail.com"
  },
  "publisher": {
    "@type": "ResearchOrganization",
    "name": "iSRL",
    "url": "https://isrl.in"
  }
}
</script>
<div class="abstract">
<p>Food ingredient classification in India confronts a structural problem that neither label standardisation nor taxonomy alone resolves: the same substance appears under dozens of names across regulatory filings, procurement systems, and consumer labels, while substances that share a name may differ in ways that determine their legal status, tax bracket, and nutritional profile. This report proposes the <em>E–M–F Tri-Axial Identity Model</em> as a principled, evidence-grounded framework for assigning a determinate identity position to any food ingredient. The three axes measure, respectively, the invasiveness of the transformation pathway (Anthropogenic Energy, <img src="https://latex.codecogs.com/png.latex?E">), the degree of departure from the original biological matrix (Matter, <img src="https://latex.codecogs.com/png.latex?M">), and the degree to which technological function governs regulatory naming and trade classification rather than biological origin (Function, <img src="https://latex.codecogs.com/png.latex?F">). From these three coordinates, a composite Divorce Score (<img src="https://latex.codecogs.com/png.latex?D">) is derived that partitions ingredients into three operationally meaningful zones: variants of a biological source, independent canonical entities, and functional tools whose identity is defined by role rather than origin. The framework is grounded in existing Indian regulatory instruments—the FSSAI Labelling and Display Regulations 2020, the Food Products Standards and Food Additives Regulations 2011, and the Indian Trade Classification (Harmonised System)—and validated against judicial reasoning from the Supreme Court of India and the Delhi High Court. A 35-item benchmark tests the discriminatory power of the model and provides a replicable standard for future refinements. 
The model provides the deterministic ingredient-level substrate on which product-level food classification frameworks can operate with greater precision and consistency.</p>
</div>
<section id="sec-ch-problem" class="level1 page-columns page-full" data-number="1">
<h1 data-number="1"><span class="header-section-number">1</span> The Ingredient Identity Problem</h1>
<section id="the-multiplicity-that-is-not-noise" class="level2 page-columns page-full" data-number="1.1">
<h2 data-number="1.1" class="anchored" data-anchor-id="the-multiplicity-that-is-not-noise"><span class="header-section-number">1.1</span> The Multiplicity That Is Not Noise</h2>
<p>A survey of 896 stock-keeping units drawn from Indian retail channels—part of this project’s commercial sampling work, the full methodology for which will be documented in a forthcoming report—yielded 7,563 distinct ingredient strings after comma-splitting label text into individual units. This commercial sample was reconciled against the Open Food Facts India dataset <span class="citation" data-cites="OpenFoodFacts">(Open Food Facts contributors 2024)</span>, which contributes a further 19,748 products from a different collection pathway; across the combined 4,800 deduplicated products, splitting by comma and conjunction produces approximately 48,000 variant strings in total. The two sources are methodologically distinct and are treated as such throughout this project.</p>
<p>These strings do not represent 7,563 different substances, let alone 48,000. Preliminary reconciliation identified a far smaller number of underlying biological entities. The multiplicity is primarily linguistic: different names, transliterations, regulatory phrasings, and brand conventions applied to the same or closely related ingredients.</p>
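<p>A minimal Python sketch of the comma-and-conjunction splitting described above. The sample labels, the helper names <code>split_label</code> and <code>distinct_strings</code>, and the lower-casing step are assumptions made for illustration; they are not the project's actual pipeline, whose methodology is to be documented separately.</p>

```python
import re

# Hypothetical label texts; not drawn from the survey data.
LABELS = [
    "Red chilli powder, salt and dry mango powder",
    "Salt, Red Chilli Powder, mango pulp",
]

# Split on commas and on the conjunction "and" standing as a separate word.
SPLIT_PATTERN = re.compile(r",|\band\b", flags=re.IGNORECASE)


def split_label(label_text):
    """Return the cleaned ingredient strings found in one label."""
    parts = SPLIT_PATTERN.split(label_text)
    # Lower-case and collapse internal whitespace; drop empty fragments.
    return [" ".join(p.lower().split()) for p in parts if p.strip()]


def distinct_strings(labels):
    """Collect the set of distinct ingredient strings across labels."""
    found = set()
    for label in labels:
        found.update(split_label(label))
    return found


print(sorted(distinct_strings(LABELS)))
```

<p>On the two sample labels this yields four distinct strings. Note that splitting on the conjunction also breaks regulatory phrasings such as "spices and condiments" into two units, and that sub-ingredients nested in parentheses are not handled by a plain split like this.</p>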
<p>This is not a data-quality failure. It is a structural feature of how a linguistically and culturally diverse food system interacts with labelling frameworks designed for narrower ranges of variation. A manufacturer in Tamil Nadu printing <em>inji</em> on a label is not in error. A regulatory filing recording the same substance as <em>ginger (Zingiber officinale)</em> is not wrong either. A procurement system listing it as <em>dried ginger root</em> is recording something real. The problem emerges when these representations must interoperate—for compliance verification, supply chain monitoring, allergen tracking, or nutritional research—and no coordination layer exists to establish that they refer to the same thing.</p>
<p>The FSSAI Labelling and Display Regulations 2020 permit ingredient declaration in regional languages and do not mandate a single canonical term for most ingredients.<sup>1</sup> This is appropriate policy. Forcing convergence on a single English-language term would impose a linguistic uniformity that serves neither consumers nor the regulatory objective of communicating the true nature of food. The problem is not the diversity; it is the absence of a coordination structure beneath it.</p>
<div class="no-row-height column-margin column-container"><div id="fn1"><p><sup>1</sup>&nbsp;FSSAI Labelling Regulations 2020, Regulation 4(1).</p></div></div></section>
<section id="the-scale-of-variation" class="level2" data-number="1.2">
<h2 data-number="1.2" class="anchored" data-anchor-id="the-scale-of-variation"><span class="header-section-number">1.2</span> The Scale of Variation</h2>
<p>Two ingredient categories from the reconciliation process illustrate the practical range. The following strings were recovered from product labels and regulatory filings as distinct entries—each referring, in whole or in part, to a common biological source.</p>
<section id="chilli-capsicum-spp.a-representative-sample" class="level3 unnumbered">
<h3 class="unnumbered anchored" data-anchor-id="chilli-capsicum-spp.a-representative-sample">Chilli (<em>Capsicum</em> spp.)—a representative sample</h3>
<blockquote class="blockquote">
<p>chilli; chilli powder; chilli flakes; red chilli; red chilli powder; red chillies; dry red chilli; green chilli; green chilli paste; green chilli puree; kashmiri chilli; kashmiri lal mirch; mathania red chilli powder; spices and condiments—chilli; spices and condiments—red chilli powder; spices and condiments—kashmiri red chilli powder; ground spices and condiments—dry red chilli; mixed spices—red chilli flakes; extracts and oils—red chilli; chilli extract; chilli red; red chilly; red chilly powder.</p>
</blockquote>
</section>
<section id="mango-mangifera-indicaa-representative-sample" class="level3 unnumbered">
<h3 class="unnumbered anchored" data-anchor-id="mango-mangifera-indicaa-representative-sample">Mango (<em>Mangifera indica</em>)—a representative sample</h3>
<blockquote class="blockquote">
<p>mango; mango pulp; mango puree; mango powder; dry mango powder; dried mango; mango bits; mango juice; kesar mango pulp; alphonso mango pulp; concentrated mango pulp; dehydrated mango puree; mango puree concentrate; mango solids; spices and condiments—amchur; spices and condiments—dried mango powder; fruit powder blend—mango; mango flavouring; raw mango flavouring; tropical juice powder—mango.</p>
</blockquote>
<p>These samples—spanning raw forms, dried forms, powders, pastes, purees, concentrates, extracts, flavourings, and regional variety names—illustrate the problem precisely. A compliance system encountering “mathania red chilli powder” and “chilli powder” as separate entries has no basis for determining whether they represent the same ingredient, variants of the same ingredient that differ in a legally relevant way, or distinct ingredients with different regulatory implications. The same ambiguity applies across thousands of ingredient pairs in the dataset.</p>
</section>
</section>
<section id="why-this-matters-beyond-nomenclature" class="level2 page-columns page-full" data-number="1.3">
<h2 data-number="1.3" class="anchored" data-anchor-id="why-this-matters-beyond-nomenclature"><span class="header-section-number">1.3</span> Why This Matters Beyond Nomenclature</h2>
<p>The stakes of ingredient identity extend well beyond labelling consistency. Three domains illustrate the practical consequences of unresolved identity.</p>
<p><em>Allergen disclosure.</em> The FSSAI Labelling Regulations mandate the declaration of common allergens, including cereals containing gluten, peanuts, soybeans, milk, and tree nuts.<sup>2</sup> Accurate allergen tracking requires that “besan,” “gram flour,” and “chickpea flour” be recognised as referring to the same substance, and that “refined wheat flour” and “maida” be treated as the same allergen source. A system that processes these as distinct strings produces false negatives in allergen searches: a query for “wheat” does not match a label that says only “maida.”</p>
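<p>The false-negative problem can be made concrete with a small sketch. The synonym table, the allergen mapping, the function names, and the three products below are hypothetical illustrations, not data from the survey.</p>

```python
# Hypothetical coordination table: variant string -> canonical substance.
CANONICAL = {
    "besan": "chickpea flour",
    "gram flour": "chickpea flour",
    "chickpea flour": "chickpea flour",
    "maida": "refined wheat flour",
    "refined wheat flour": "refined wheat flour",
}

# Hypothetical mapping from canonical substance to declared allergen class.
ALLERGEN_CLASS = {"refined wheat flour": "cereals containing gluten"}


def naive_search(labels: dict, term: str) -> list:
    """Return products with an ingredient string containing the term."""
    return [name for name, ings in labels.items() if any(term in i for i in ings)]


def resolved_search(labels: dict, allergen: str) -> list:
    """Return products whose ingredients resolve to the given allergen class."""
    hits = []
    for name, ings in labels.items():
        classes = {ALLERGEN_CLASS.get(CANONICAL.get(i, i)) for i in ings}
        if allergen in classes:
            hits.append(name)
    return hits


# Invented products, for illustration only.
labels = {
    "product_a": ["maida", "sugar"],
    "product_b": ["refined wheat flour", "salt"],
    "product_c": ["besan", "salt"],
}

naive_hits = naive_search(labels, "wheat")
resolved_hits = resolved_search(labels, "cereals containing gluten")
```

<p>Here the string search finds only <code>product_b</code>, while the resolved search also returns <code>product_a</code>, whose label says "maida".</p>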
<div class="no-row-height column-margin column-container"><div id="fn2"><p><sup>2</sup>&nbsp;FSSAI Labelling Regulations 2020, Regulation 5(14).</p></div></div><p><em>Trade classification and taxation.</em> The Indian Trade Classification (Harmonised System) assigns different tariff headings to ingredients on the basis of processing state and functional role. Mango pulp (HS 0804) and dried mango powder (HS 0813) are classified differently and attract different duties. Concentrated mango pulp may attract a different heading again depending on Brix value and processing method <span class="citation" data-cites="DGCI_CH11">(Directorate General of Commercial Intelligence and Statistics 2007a)</span>. The financial and legal consequences of misclassification are direct and quantifiable.</p>
<p><em>Source declaration and religious or ethical compliance.</em> The Delhi High Court, in <em>Ram Gaua Raksha Dal v. Union of India</em>, held that the obligation to declare the vegetarian or non-vegetarian status of food is independent of percentage or processing level, grounded in Articles 21 and 25 of the Constitution <span class="citation" data-cites="DelhiHC_RamGaua_2022">(Delhi High Court 2022)</span>. This principle requires that the biological origin of an ingredient remain traceable through processing transformations. A classification system that severs the link between a processed ingredient and its source—treating “casein” as a functional identifier with no required dairy origin disclosure—fails this requirement.</p>
</section>
<section id="the-question-this-report-addresses" class="level2" data-number="1.4">
<h2 data-number="1.4" class="anchored" data-anchor-id="the-question-this-report-addresses"><span class="header-section-number">1.4</span> The Question This Report Addresses</h2>
<p>These observations converge on a single question that has not been systematically answered for the Indian food system: <em>given an ingredient string, what is its identity, and what is the principled basis for that determination?</em></p>
<p>The question carries three sub-questions that must be answered in sequence. First, what counts as a canonical entity—the basic unit of identity to which variant representations are attached? Second, when does a variant become sufficiently distinct to constitute a separate canon in its own right? Third, when has an ingredient been transformed so thoroughly that its identity is no longer governed primarily by its biological source but by the technological function it performs?</p>
<p>These are ontological questions. They cannot be answered by counting occurrences or applying string-matching heuristics. They require a framework grounded in scientific, regulatory, and legal reality that produces consistent, defensible determinations when applied to novel cases.</p>
<p>Section&nbsp;2 documents a first attempt at the problem and shows where it falls short. Section&nbsp;3 introduces the theoretical foundation that reorients the approach. Section&nbsp;4 enumerates the ontological questions the framework must answer. Section&nbsp;5 establishes the regulatory instruments serving as empirical ground truth. The remaining sections develop and validate the model, and Section&nbsp;11 describes the next steps for applying it to the full variant corpus.</p>
</section>
</section>
<section id="sec-ch-flatfail" class="level1" data-number="2">
<h1 data-number="2"><span class="header-section-number">2</span> Why Flat Canonisation Fails</h1>
<section id="the-initial-approach" class="level2" data-number="2.1">
<h2 data-number="2.1" class="anchored" data-anchor-id="the-initial-approach"><span class="header-section-number">2.1</span> The Initial Approach</h2>
<p>The natural first response to a multiplicity of ingredient strings is to collapse them. Given 7,563 strings and the reasonable expectation that they represent far fewer substances, the immediate goal was consolidation: assign each string to a canonical form, discard the variation, and produce a clean taxonomy.</p>
<p>This approach was implemented and produced a working taxonomy published as version 0.1 of the Encyclopedia of Indian Food Ingredients <span class="citation" data-cites="EncyclopediaV01">(Lalitha 2026a)</span>. That taxonomy served as a necessary first step: it demonstrated that automated consolidation was feasible, identified the problem’s boundaries, and surfaced the cases where flat consolidation produced results that were operationally and legally indefensible. The present report builds directly on what those cases revealed.</p>
</section>
<section id="what-flat-canonisation-produces" class="level2" data-number="2.2">
<h2 data-number="2.2" class="anchored" data-anchor-id="what-flat-canonisation-produces"><span class="header-section-number">2.2</span> What Flat Canonisation Produces</h2>
<p>Under a flat canonisation scheme, all variant strings for a given biological source are grouped under a single canonical label. The chilli variants listed in Section&nbsp;1 would consolidate to “Chilli.” The mango variants would consolidate to “Mango.” The logic is appealing: one biological entity, one canonical name.</p>
<p>The problem becomes visible when the output is examined by stakeholders who depend on ingredient classifications for operational decisions.</p>
<p>For a food manufacturer seeking to claim a geographically indicated ingredient: “Mathania Red Chilli” is not interchangeable with “Chilli.” Mathania is a geographic indicator associated with a specific cultivar grown in the Jodhpur district of Rajasthan, recognised for its characteristic colour and moderate heat. A brand that sources this variety and wishes to communicate that fact on its label, a commercially and legally meaningful distinction, has no mechanism for doing so under a scheme that treats all chilli as one entity.</p>
<p>For a nutritional researcher or regulator: “Mango pulp” and “dehydrated mango powder” are not nutritionally equivalent. The former is a high-moisture preparation with a specific sugar profile; the latter has undergone water removal that concentrates all components and, depending on process conditions, may alter certain phytochemicals. A database recording both as “Mango” provides no basis for dietary assessment calculations that depend on moisture-adjusted nutrient values.</p>
<p>For a customs authority: “Mango flavouring” filed alongside “mango pulp” under a single canonical entity produces a tariff classification that is straightforwardly incorrect. Mango pulp falls under HS Chapter 08 (edible fruits); a synthetic mango flavouring may fall under Chapter 29 (organic chemicals) or Chapter 33 (essential oils and resinoids) depending on its composition. Filing them under the same canonical entity does not resolve the classification question; it conceals it.</p>
<p>For a food safety system tracking an allergen or contaminant: “Lecithin” and “soya lecithin” cannot be merged without losing source information that is required by law. The FSSAI Labelling Regulations and the reasoning in <em>Ram Gaua Raksha Dal</em> <span class="citation" data-cites="DelhiHC_RamGaua_2022">(Delhi High Court 2022)</span> together establish that source disclosure for allergen-relevant ingredients is non-negotiable.</p>
</section>
<section id="the-structural-flaw" class="level2" data-number="2.3">
<h2 data-number="2.3" class="anchored" data-anchor-id="the-structural-flaw"><span class="header-section-number">2.3</span> The Structural Flaw</h2>
<p>Flat canonisation fails because it conflates two distinct problems requiring different solutions. The first is coordination: establishing that “chilli,” “red chilli,” and “lal mirchi” refer to the same underlying entity so that systems can interoperate. The second is identity preservation: maintaining the distinctions—geographic origin, processing state, form, biological source—that carry legal, nutritional, commercial, and cultural meaning.</p>
<p>A flat scheme solves the first problem by destroying the second. It achieves coordination at the cost of the very information that makes coordination useful. A brand filing its ingredient as “Chilli” and a brand filing it as “Kashmiri Lal Mirch” can now be linked in a database, but the database no longer records what distinguishes them—a distinction that may affect GST categorisation, GI protection claims, export certification, and consumer communication simultaneously.</p>
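<p>A minimal sketch of this information loss, assuming a hypothetical lookup table <code>FLAT_CANON</code> rather than the actual version 0.1 taxonomy:</p>

```python
# Hypothetical flat canonisation table: every variant maps to one label.
FLAT_CANON = {
    "chilli": "Chilli",
    "kashmiri lal mirch": "Chilli",
    "mathania red chilli powder": "Chilli",
    "green chilli paste": "Chilli",
}


def canonise_flat(ingredient: str) -> str:
    """Return the flat canonical label, or the input unchanged if unknown."""
    return FLAT_CANON.get(ingredient.strip().lower(), ingredient)


brand_a = canonise_flat("Chilli")
brand_b = canonise_flat("Kashmiri Lal Mirch")

# Coordination succeeds: both filings now carry the same label.
linked = brand_a == brand_b

# The mapping is many-to-one, so the original variant (regional variety,
# form, processing state) cannot be recovered from the label alone.
variants_behind_label = sorted(k for k, v in FLAT_CANON.items() if v == brand_a)
```

<p>Once only the label is stored, every entry in <code>variants_behind_label</code> is an equally plausible origin for a record that reads "Chilli".</p>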
<p>The correct solution is a layered structure: a coordination layer linking all variant representations to a shared identifier, and an identity-preservation layer retaining the distinctions that matter. This is precisely the problem Shiyali Ramamrita Ranganathan addressed in information science nearly a century ago.</p>
</section>
</section>
<section id="sec-ch-ranganathan" class="level1" data-number="3">
<h1 data-number="3"><span class="header-section-number">3</span> Ranganathan’s Faceted Classification</h1>
<section id="the-context-of-its-creation" class="level2" data-number="3.1">
<h2 data-number="3.1" class="anchored" data-anchor-id="the-context-of-its-creation"><span class="header-section-number">3.1</span> The Context of Its Creation</h2>
<p>In 1933, the Indian mathematician and librarian S. R. Ranganathan published the first edition of <em>Colon Classification</em> <span class="citation" data-cites="Ranganathan_CC_1933">(Ranganathan 1933)</span>. The problem he addressed was structurally similar to the one this report confronts: a body of knowledge so diverse and growing so rapidly that any fixed hierarchical scheme would be perpetually inadequate. The Dewey Decimal System, then dominant in library science, assigned each subject a fixed position in a single hierarchy. Works addressing multiple subjects simultaneously, or belonging to a subject not anticipated by the scheme’s designers, could not be accommodated without distorting the classification.</p>
<p>Ranganathan’s response was to abandon the single hierarchy and replace it with a set of independent analytical dimensions, which he called <em>facets</em>. A document could be described by its position on each facet independently, and its classification was the combination of those positions. The colon in “Colon Classification” is the separator between facets in the notation.</p>
</section>
<section id="the-pmest-framework" class="level2" data-number="3.2">
<h2 data-number="3.2" class="anchored" data-anchor-id="the-pmest-framework"><span class="header-section-number">3.2</span> The PMEST Framework</h2>
<p>Ranganathan identified five fundamental facets applicable across all fields of knowledge, designated PMEST: Personality, Matter, Energy, Space, and Time <span class="citation" data-cites="Ranganathan_PMEST">(Ranganathan 1967)</span>. These represent, respectively, the primary subject of a document, the materials or substances it involves, the processes or operations it describes, the geographic location it concerns, and the time period it covers.</p>
<p>The operational power of the framework lies in the independence of its facets. A document about <em>the fermentation of rice in Karnataka in the nineteenth century</em> can be described precisely by assigning a position on each facet: rice (Personality), fermentation (Energy), Karnataka (Space), nineteenth century (Time). The scheme does not need to anticipate this exact combination in advance. Novel subjects are expressed by combining existing facet values, so the scheme extends to new cases without revision.</p>
<p>Adapted to the food domain, the analytical clarity is immediate. Consider three ingredients:</p>
<ul>
<li><em>Kashmiri red chilli powder</em>: Personality = chilli (<em>Capsicum annuum</em>); Matter = dried, powdered; Space = Kashmir.</li>
<li><em>Mathania red chilli, whole dried</em>: Personality = chilli (<em>Capsicum annuum</em>); Matter = dried, whole; Space = Marwar (Rajasthan).</li>
<li><em>Green chilli paste</em>: Personality = chilli (<em>Capsicum annuum</em>); Matter = raw, comminuted, high-moisture.</li>
</ul>
<p>Under a flat scheme, all three are “Chilli.” Under a faceted scheme, all three share a Personality coordinate—sufficient to establish their relationship—while their distinct Matter and Space coordinates preserve the differences that matter. A fourth ingredient, a synthetic capsaicin extract used as a flavouring agent, would share a Personality relationship to chilli while carrying a very different processing history and a different functional identity. The framework accommodates this without modification.</p>
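<p>The three chilli examples can be written down as faceted records. This is a sketch under stated assumptions: the class <code>FacetedIngredient</code> and the helpers <code>related</code> and <code>differing_facets</code> are hypothetical names, and only the Personality, Matter, and Space facets used in the examples are modelled.</p>

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FacetedIngredient:
    label: str
    personality: str              # biological source
    matter: tuple                 # processing state and form
    space: Optional[str] = None   # geographic origin, if declared


kashmiri = FacetedIngredient(
    "Kashmiri red chilli powder", "Capsicum annuum", ("dried", "powdered"), "Kashmir"
)
mathania = FacetedIngredient(
    "Mathania red chilli, whole dried", "Capsicum annuum", ("dried", "whole"),
    "Marwar (Rajasthan)",
)
green_paste = FacetedIngredient(
    "Green chilli paste", "Capsicum annuum", ("raw", "comminuted", "high-moisture")
)


def related(a: FacetedIngredient, b: FacetedIngredient) -> bool:
    """Two ingredients are related when they share a Personality coordinate."""
    return a.personality == b.personality


def differing_facets(a: FacetedIngredient, b: FacetedIngredient) -> list:
    """Name the non-Personality facets on which two ingredients differ."""
    return [f for f in ("matter", "space") if getattr(a, f) != getattr(b, f)]
```

<p>All three records are related through Personality, while <code>differing_facets</code> reports which coordinates keep them apart. Adding a further facet would mean adding a field, not restructuring a hierarchy.</p>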
</section>
<section id="adoption-and-durability" class="level2" data-number="3.3">
<h2 data-number="3.3" class="anchored" data-anchor-id="adoption-and-durability"><span class="header-section-number">3.3</span> Adoption and Durability</h2>
<p>Colon Classification was adopted by the Indian National Library and numerous university libraries across South and Southeast Asia, and served as the theoretical foundation for the International Federation of Library Associations’ principles on faceted classification <span class="citation" data-cites="Broughton_CC_2006">(Broughton 2006)</span>. Subsequent frameworks—including the Bibliographic Classification of Henry Bliss and the Universal Decimal Classification’s faceted extensions—drew directly on Ranganathan’s architecture.</p>
<p>The durability of the framework across domains as diverse as bibliography, archival science, museum cataloguing, and digital information architecture reflects its quality as a structural solution rather than a domain-specific convention. The problem it addresses—organising entities that are complex, diverse, and not fully anticipated in advance—is exactly the problem that Indian food ingredient classification presents.</p>
</section>
<section id="from-library-science-to-food-identity" class="level2" data-number="3.4">
<h2 data-number="3.4" class="anchored" data-anchor-id="from-library-science-to-food-identity"><span class="header-section-number">3.4</span> From Library Science to Food Identity</h2>
<p>Applying the PMEST framework to the ingredient dataset immediately clarified which distinctions were meaningful and which were surface variation. The distinction between “chilli powder” and “chilli flakes” is a legitimate Matter distinction (fine-ground versus coarsely broken), not a naming inconsistency to be collapsed. “Kashmiri chilli” and “generic red chilli” differ on the Space facet, not the Personality facet, and that distinction carries regulatory weight in the context of geographical indication protection.</p>
<p>However, three categories of cases emerged that the PMEST framework as originally conceived did not fully resolve. The first concerned artificial or nature-identical flavourings: does “mango flavouring” belong under <em>Mangifera indica</em> as a Personality, or has synthesis transformed its identity so thoroughly that the source becomes secondary to the function? The second concerned highly processed lipids: is “soya lecithin” a variant of soybean, or has extraction and fractionation placed it in a different identity category—one defined by its emulsification function rather than its botanical origin? The third concerned processing-derived additives with no meaningful biological ancestor: modified starches, synthetic antioxidants, and inorganic salts have an HS classification and a regulatory name, but no Personality in the biological sense.</p>
<p>These categories expose the ontological questions a classification framework for food ingredients must resolve before it can be applied consistently. Those questions are addressed in Section&nbsp;4.</p>
</section>
</section>
<section id="sec-ch-ontological" class="level1 page-columns page-full" data-number="4">
<h1 data-number="4"><span class="header-section-number">4</span> The Ontological Questions That Must Be Answered</h1>
<section id="what-counts-as-a-canonical-entity" class="level2 page-columns page-full" data-number="4.1">
<h2 data-number="4.1" class="anchored" data-anchor-id="what-counts-as-a-canonical-entity"><span class="header-section-number">4.1</span> What Counts as a Canonical Entity?</h2>
<p>A canonical entity, as used in this framework, is the smallest unit of ingredient identity to which variant representations can be attached without loss of information that is legally, nutritionally, or commercially significant. Determining what counts as a canon is not a naming decision but an identity decision: it requires specifying which distinctions are constitutive of a separate entity and which are surface variations of the same entity.</p>
<p>Consider lipids. Cold-pressed sesame oil and solvent-extracted refined sesame oil share a botanical source (<em>Sesamum indicum</em>) and a chemical class (edible vegetable oil, triglyceride-based). They differ in processing pathway, residual composition, and regulatory designation: FSSAI and the Codex standard for named vegetable oils distinguish cold-pressed and refined categories.<sup>3</sup> Are they variants of one canon, or two separate canons? The answer depends on whether the processing distinction carries independent legal and nutritional weight—and, as Section&nbsp;5 demonstrates, it does.</p>
<div class="no-row-height column-margin column-container"><div id="fn3"><p><sup>3</sup>&nbsp;FSSAI Labelling Regulations 2020, Schedule II.</p></div></div></section>
<section id="when-does-a-variant-become-a-separate-canon" class="level2" data-number="4.2">
<h2 data-number="4.2" class="anchored" data-anchor-id="when-does-a-variant-become-a-separate-canon"><span class="header-section-number">4.2</span> When Does a Variant Become a Separate Canon?</h2>
<p>Variation along processing, form, and geographic dimensions does not automatically produce a separate canon. The framework requires a principled threshold at which a variant becomes sufficiently distinct to constitute an independent entity. Three criteria govern this determination.</p>
<p>First, <em>regulatory identity change</em>: if the relevant regulatory authority assigns a distinct product standard, a distinct mandatory name, or a distinct HS tariff heading to the processed form, the processing has produced a separate canon. Butter and ghee share a dairy fat origin but are defined by separate Codex standards and separate FSSAI product definitions. They are separate canons.</p>
<p>Second, <em>nutritional non-substitutability</em>: if the processed form cannot be substituted for the source form in a dietary context without materially altering the nutritional calculation, the forms are separate canons. Mango pulp and dehydrated mango powder are not nutritionally interchangeable at the same mass; they are separate canons.</p>
<p>Third, <em>functional non-substitutability</em>: if the processed form is used for a purpose that the source form cannot serve, and that purpose is the primary basis for its inclusion in a formulation, the processed form is a separate canon. Soya lecithin is used as an emulsifier; whole soybean is used as a protein and caloric source. The purposes are non-overlapping. They are separate canons.</p>
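<p>Because each criterion is stated as independently sufficient, the threshold can be sketched as a logical OR over three findings. The class name <code>CanonTest</code> and the boolean encoding are assumptions made for illustration. A value of <code>False</code> below means only that the text above does not invoke that criterion for the pair, not that the criterion has been assessed and rejected.</p>

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CanonTest:
    """Findings for one (source form, variant form) pair."""
    regulatory_identity_change: bool = False
    nutritional_non_substitutability: bool = False
    functional_non_substitutability: bool = False

    def is_separate_canon(self) -> bool:
        # Assumption: meeting any one criterion is sufficient.
        return any((
            self.regulatory_identity_change,
            self.nutritional_non_substitutability,
            self.functional_non_substitutability,
        ))


# Pairs taken from the three criteria above.
butter_vs_ghee = CanonTest(regulatory_identity_change=True)
pulp_vs_powder = CanonTest(nutritional_non_substitutability=True)
lecithin_vs_soybean = CanonTest(functional_non_substitutability=True)

# Hypothetical spelling variant ("red chilly powder" vs "red chilli powder"):
# no criterion is met, so it stays a variant of the same canon.
spelling_variant = CanonTest()
```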
</section>
<section id="when-does-a-canon-become-a-functional-tool" class="level2 page-columns page-full" data-number="4.3">
<h2 data-number="4.3" class="anchored" data-anchor-id="when-does-a-canon-become-a-functional-tool"><span class="header-section-number">4.3</span> When Does a Canon Become a Functional Tool?</h2>
<p>The third question is the most consequential for the model developed here. A functional tool is an ingredient whose primary regulatory and commercial identity is defined by the technological role it performs rather than by its biological origin. The identity transformation is not merely a matter of processing intensity; it is a legal and semiotic shift that occurs when regulatory frameworks—labelling regulations, tariff classifications, judicial precedent—treat the ingredient primarily as a performer of a function rather than as a product of a biological source.</p>
<p>This shift is observable and documentable. The FSSAI Labelling Regulations prescribe a specific declaration format for food additives: the functional class (emulsifier, preservative, antioxidant, and so on) is declared first, followed by the specific name or International Numbering System code.<sup>4</sup> This format structurally subordinates origin to function: “Emulsifier (lecithin)” presents the technological role as the primary identifier. A brand can declare “Emulsifier (INS 322)” without reference to soy origin, except where allergen disclosure obligations apply.</p>
<div class="no-row-height column-margin column-container"><div id="fn4"><p><sup>4</sup>&nbsp;FSSAI Additives Regulations 2011, Schedule I.</p></div><div id="fn5"><p><sup>5</sup>&nbsp;FSSAI Labelling Regulations 2020, Schedule II, Class Titles 2 and 4.</p></div></div><p>By contrast, edible vegetable oils—even heavily processed ones including hydrogenated and interesterified fats—must be declared with their source type.<sup>5</sup> “Hydrogenated vegetable oil” retains the botanical-origin reference despite intensive chemical transformation. The identity, for regulatory purposes, remains origin-primary.</p>
<p>The boundary between these two regimes is not a simple function of processing intensity. A highly processed ingredient may remain origin-primary in regulatory naming, while a moderately processed ingredient may cross into function-primary classification. This is the central observation motivating the introduction of <img src="https://latex.codecogs.com/png.latex?F"> as a third dimension, independent of <img src="https://latex.codecogs.com/png.latex?E"> and <img src="https://latex.codecogs.com/png.latex?M">, in the model developed in Section&nbsp;6.</p>
</section>
<section id="the-role-of-flavourings" class="level2 page-columns page-full" data-number="4.4">
<h2 data-number="4.4" class="anchored" data-anchor-id="the-role-of-flavourings"><span class="header-section-number">4.4</span> The Role of Flavourings</h2>
<p>Flavourings require explicit treatment. The FSSAI Labelling Regulations distinguish natural flavourings, nature-identical flavourings, and artificial flavourings.<sup>6</sup> A natural mango flavouring obtained by aqueous or ethanolic extraction from mango fruit retains a biological-origin linkage in its regulatory designation. A synthetic mango flavouring produced by organic synthesis to replicate specific volatile compounds has no such linkage; its identity is defined by its sensory function and chemical composition, not by its biological source.</p>
<div class="no-row-height column-margin column-container"><div id="fn6"><p><sup>6</sup>&nbsp;FSSAI Labelling Regulations 2020.</p></div></div><p>Whether to file a synthetic mango flavouring under the canon for mango or as a separate functional entity cannot be resolved by examining the ingredient name alone. It requires a framework that positions the ingredient on dimensions of processing transformation and functional identity simultaneously. This is what the <img src="https://latex.codecogs.com/png.latex?E">–<img src="https://latex.codecogs.com/png.latex?M">–<img src="https://latex.codecogs.com/png.latex?F"> model provides.</p>
</section>
<section id="the-role-of-source-declaration" class="level2" data-number="4.5">
<h2 data-number="4.5" class="anchored" data-anchor-id="the-role-of-source-declaration"><span class="header-section-number">4.5</span> The Role of Source Declaration</h2>
<p>Throughout the foregoing analysis, source declaration has appeared both as a legal requirement and as a conceptual anchor. The requirement reflects a principle embedded in Indian food law and affirmed by the courts: that consumers and downstream systems have a legitimate interest in knowing the biological origin of ingredients, independent of the form those ingredients take in the final product. This principle creates a legal floor on identity abstraction: no ingredient can be classified as a pure functional tool, in the regulatory sense, if its biological origin is subject to mandatory disclosure.</p>
<p>This interaction between legal source-declaration obligations and functional identity is one of the novel contributions of the <img src="https://latex.codecogs.com/png.latex?F"> dimension, examined in detail with reference to specific regulatory provisions and judicial reasoning in Section&nbsp;5 and Section&nbsp;6.</p>
</section>
</section>
<section id="sec-ch-regulatory" class="level1 page-columns page-full" data-number="5">
<h1 data-number="5"><span class="header-section-number">5</span> The Regulatory Landscape as Ground Truth</h1>
<section id="why-regulation-precedes-theory" class="level2" data-number="5.1">
<h2 data-number="5.1" class="anchored" data-anchor-id="why-regulation-precedes-theory"><span class="header-section-number">5.1</span> Why Regulation Precedes Theory</h2>
<p>The ontological questions raised in Section&nbsp;4 might appear to invite philosophical resolution—a set of first principles from which a classification framework is deduced. The approach taken here is different. The existing regulatory landscape is treated as empirical evidence of how a functioning legal and commercial system has already resolved many of these questions, and unexplained divergences within that landscape are treated as signals of where principled analysis is most needed.</p>
<p>This is not deference to authority for its own sake. India’s food regulatory instruments—the FSSAI Labelling and Display Regulations 2020 <span class="citation" data-cites="FSSAI_Label_2020">(Food Safety and Standards Authority of India 2023)</span>, the Food Products Standards and Food Additives Regulations 2011 <span class="citation" data-cites="FSSAI_Additives_2011">(Food Safety and Standards Authority of India 2024)</span>, and the Indian Trade Classification (Harmonised System)—have been refined through decades of legislative drafting, administrative interpretation, and judicial review. They encode accumulated practical wisdom about which distinctions matter and which do not. A framework that contradicts these instruments without compelling justification is not principled; it is merely unconventional.</p>
</section>
<section id="the-fssai-labelling-and-display-regulations-2020" class="level2 page-columns page-full" data-number="5.2">
<h2 data-number="5.2" class="anchored" data-anchor-id="the-fssai-labelling-and-display-regulations-2020"><span class="header-section-number">5.2</span> The FSSAI Labelling and Display Regulations, 2020</h2>
<section id="the-true-nature-principle" class="level3" data-number="5.2.1">
<h3 data-number="5.2.1" class="anchored" data-anchor-id="the-true-nature-principle"><span class="header-section-number">5.2.1</span> The “True Nature” Principle</h3>
<p>Regulation 4(1) of the FSSAI Labelling and Display Regulations 2020 establishes the foundational identity norm: the name of a food shall indicate its true nature. Where an established standard exists, the standardised name is required. Where none exists, the common or usual name must be used, supplemented by a description of the true nature where the name alone is insufficient.</p>
<p>This principle establishes source-dominant identity as the regulatory default. An ingredient must be named in a way that accurately conveys what it is—its biological origin, its physical state, its processing history where that history is legally significant. The “true nature” requirement is not merely a naming convention; it is an epistemological commitment to the primacy of material identity over functional identity in the absence of specific provision to the contrary.</p>
</section>
<section id="source-qualification-requirements" class="level3 page-columns page-full" data-number="5.2.2">
<h3 data-number="5.2.2" class="anchored" data-anchor-id="source-qualification-requirements"><span class="header-section-number">5.2.2</span> Source Qualification Requirements</h3>
<p>The 2020 Regulations impose mandatory source qualifiers for several ingredient categories, creating legally enforceable constraints on functional abstraction.</p>
<p>Edible vegetable oils and fats must be declared with the specific oil type and, where applicable, the processing method.<sup>7</sup> The Schedule II ingredient class titles prescribe declaration formats including “vegetable fat (specify source type: interesterified vegetable fat / fractionated fat / hydrogenated oils / partially hydrogenated oils / margarine and fat spreads).” Even where intensive chemical modification has occurred—hydrogenation, interesterification—the source type must be named.</p>
<div class="no-row-height column-margin column-container"><div id="fn7"><p><sup>7</sup>&nbsp;FSSAI Labelling Regulations 2020, Schedule II, Class Titles 2 and 4.</p></div></div><p>Animal fats require declaration of their specific animal origin, reflecting the constitutional dimension of source disclosure affirmed by the courts.</p>
<p>Cereal flours must identify the grain source. “Wheat flour,” “maize flour,” and “rice flour” are distinct required declarations; the generic term “flour” is insufficient where the grain identity is nutritionally and allergenically significant.</p>
</section>
<section id="the-additive-declaration-format" class="level3 page-columns page-full" data-number="5.2.3">
<h3 data-number="5.2.3" class="anchored" data-anchor-id="the-additive-declaration-format"><span class="header-section-number">5.2.3</span> The Additive Declaration Format</h3>
<p>Regulation 5(5) of the 2020 Regulations introduces the mechanism that makes functional identity legally cognisable: the mandatory declaration of food additives by functional class.<sup>8</sup> Additives listed in the Food Products Standards and Food Additives Regulations 2011 must be declared with their functional class name first, followed by the specific name or the INS code.</p>
<div class="no-row-height column-margin column-container"><div id="fn8"><p><sup>8</sup>&nbsp;FSSAI Labelling Regulations 2020, Regulation 5(5).</p></div></div><p>The format “Emulsifier (lecithin)” or “Preservative (INS 211)” structurally encodes the priority of function over source in regulatory naming. The functional class is the primary identifier; the specific substance is secondary. This format is mandatory, not optional: it represents a regulatory determination that for additive-classified substances, the technological role is the operationally significant identity for consumer communication.</p>
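<p>The class-first declaration format described above can be sketched in a few lines of Python. This is an illustrative sketch only: the function name <code>format_additive</code> and its parameters are hypothetical, and the Regulations prescribe label text, not a software interface. The preference for the specific name when both a name and an INS code are supplied is likewise an assumption of the sketch.</p>

```python
from typing import Optional


def format_additive(functional_class: str,
                    specific_name: Optional[str] = None,
                    ins_code: Optional[str] = None) -> str:
    """Return 'Functional class (specific name)' or 'Functional class (INS nnn)'.

    Hypothetical helper: the functional class always leads, mirroring the
    class-first order described in the text.
    """
    if specific_name is None and ins_code is None:
        raise ValueError("either a specific name or an INS code is required")
    # Assumption of this sketch: prefer the specific name when both are given.
    detail = specific_name if specific_name is not None else f"INS {ins_code}"
    return f"{functional_class.capitalize()} ({detail})"


print(format_additive("emulsifier", specific_name="lecithin"))
print(format_additive("preservative", ins_code="211"))
```

<p>The two calls reproduce the examples quoted in the text, “Emulsifier (lecithin)” and “Preservative (INS 211)”. In both, the specific substance appears only as a qualifier of the functional class.</p>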
<p>Schedule I of the 2011 Regulations enumerates twenty-two functional classes, including emulsifier, thickener, stabiliser, preservative, antioxidant, sequestrant, raising agent, humectant, carrier, propellant, and packaging gas.<sup>9</sup> The existence of this taxonomy, and the mandatory declaration format that accompanies it, is empirical evidence that Indian food law recognises a distinct category of ingredients whose identity is function-primary.</p>
<div class="no-row-height column-margin column-container"><div id="fn9"><p><sup>9</sup>&nbsp;FSSAI Additives Regulations 2011, Schedule I.</p></div></div></section>
</section>
<section id="the-indian-trade-classification-harmonised-system" class="level2" data-number="5.3">
<h2 data-number="5.3" class="anchored" data-anchor-id="the-indian-trade-classification-harmonised-system"><span class="header-section-number">5.3</span> The Indian Trade Classification (Harmonised System)</h2>
<section id="chapter-structure-as-identity-architecture" class="level3" data-number="5.3.1">
<h3 data-number="5.3.1" class="anchored" data-anchor-id="chapter-structure-as-identity-architecture"><span class="header-section-number">5.3.1</span> Chapter Structure as Identity Architecture</h3>
<p>The Indian Trade Classification (Harmonised System) organises traded goods through a hierarchical chapter structure that encodes, in legally binding form, the identity distinctions that matter for taxation, origin determination, and regulatory compliance. For food ingredients, the relevant architecture spans Chapters 7 through 38, with a characteristic pattern: source-aligned classification in Chapters 7–15, and function-aligned or chemically defined classification in Chapters 29, 35, and 38.</p>
<p>Chapters 7 and 8 cover edible vegetables, fruits, and nuts, classified primarily by botanical species and physical state. Chapter 9 covers coffee, tea, and spices. Chapter 11 covers products of the milling industry: flours, meals, starches, and related products derived from grains and pulses, with HS headings that specify both source and physical form <span class="citation" data-cites="DGCI_CH11">(Directorate General of Commercial Intelligence and Statistics 2007a)</span>. Chapter 15 covers animal and vegetable fats and oils, organised by source, with processing state recorded in subheadings but not displacing source from the primary classification level <span class="citation" data-cites="DGCI_CH15">(Directorate General of Commercial Intelligence and Statistics 2007b)</span>.</p>
<p>Chapter 35 covers albuminoidal substances, modified starches, glues, and enzymes. The inclusion of “modified starches” here—rather than Chapter 11 with native starches—is a deliberate regulatory determination that chemical modification of starch sufficiently transforms its identity to warrant reclassification from a milling industry product to a chemically defined substance <span class="citation" data-cites="DGCI_CH35">(Directorate General of Commercial Intelligence and Statistics 2007c)</span>. HS heading 3505 covers “dextrins and other modified starches (for example, pregelatinised or esterified starches).” The migration from Chapter 11 to Chapter 35 represents an HS-encoded identity snap: the same biological material, after a defined degree of transformation, is treated as a different kind of thing.</p>
</section>
<section id="critical-chapter-transitions-as-identity-snaps" class="level3" data-number="5.3.2">
<h3 data-number="5.3.2" class="anchored" data-anchor-id="critical-chapter-transitions-as-identity-snaps"><span class="header-section-number">5.3.2</span> Critical Chapter Transitions as Identity Snaps</h3>
<p>The most analytically significant feature of the ITC-HS for ingredient classification is the set of chapter transitions that represent discontinuous identity changes—points at which accumulated processing crosses a threshold that the regulatory system treats as qualitatively, not merely quantitatively, significant. Three such transitions are primary.</p>
<p><em>Chapter 11 to Chapter 35 (Native Starch to Modified Starch).</em> Native starches are classified in Chapter 11 as products of the milling industry. Chemically modified starches—acetylated, cross-linked, phosphorylated—migrate to HS heading 3505 in Chapter 35. This transition is triggered by chemical modification of the starch polymer: the introduction of new functional groups that alter the regulatory identity of the material from food commodity to chemically defined functional substance.</p>
<p><em>Chapter 15 to Heading 1516 to Chapters 29/38 (Oils to Chemically Modified Fats to Chemical Products).</em> Within Chapter 15, a progression exists from crude and refined oils through chemically modified fats (heading 1516, covering hydrogenated, interesterified, re-esterified, and elaidinised fats “not further prepared”) to formulated preparations (heading 1517). Lecithins and phosphoaminolipids, derived from vegetable oil processing, are classified under HS heading 2923 in Chapter 29 rather than Chapter 15, reflecting a regulatory determination that the identity of these substances is defined by their chemical structure and emulsification function rather than their fat-derived origin.</p>
<p><em>Chapter 22 (Brewed Vinegar) versus Chapter 29 (Synthetic Acetic Acid).</em> Brewed vinegar, produced by double fermentation of agricultural substrates, is classified under HS 2209 in Chapter 22. Glacial acetic acid (synthetic), used as an acidulant, is classified under HS 2915 in Chapter 29. FSSAI regulations require that synthetic vinegar be labelled “SYNTHETIC – PREPARED FROM ACETIC ACID,” distinguishing it from brewed vinegar at the product naming level as well as the tariff level. This parallel treatment across labelling law and tariff classification illustrates the convergent methodology applied throughout this report.</p>
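<p>At its simplest, the chapter-transition test described in this subsection compares the first two digits of two HS codes. The Python sketch below illustrates that reading. The function names are hypothetical, heading 1108 (starches) is supplied here as the Chapter 11 counterpart to heading 3505, and heading 1507 stands in for an unmodified vegetable oil. A real classifier would also need the heading-level and subheading-level distinctions that this sketch deliberately ignores.</p>

```python
def hs_chapter(hs_code: str) -> int:
    """Return the HS chapter (first two digits) of a heading or subheading."""
    digits = "".join(ch for ch in hs_code if ch.isdigit())
    if len(digits) in (0, 1):
        raise ValueError(f"not an HS code: {hs_code!r}")
    return int(digits[:2])


def is_chapter_snap(code_before: str, code_after: str) -> bool:
    """True when the two codes fall in different HS chapters."""
    return hs_chapter(code_before) != hs_chapter(code_after)


# Illustrative pairs; the labels are shorthand chosen for this sketch.
transitions = {
    "native starch to modified starch": ("1108", "3505"),
    "vegetable oil to lecithin": ("1507", "2923"),
    "brewed vinegar to synthetic acetic acid": ("2209", "2915"),
    "hydrogenated fat to formulated preparation": ("1516", "1517"),
}

for label, (before, after) in transitions.items():
    print(label, is_chapter_snap(before, after))
```

<p>The first three pairs cross a chapter boundary and are flagged. The last pair stays inside Chapter 15, so a chapter-only test does not flag it, although the text treats the 1516 to 1517 step as part of the same progression.</p>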
</section>
</section>
<section id="judicial-reasoning-on-ingredient-identity" class="level2" data-number="5.4">
<h2 data-number="5.4" class="anchored" data-anchor-id="judicial-reasoning-on-ingredient-identity"><span class="header-section-number">5.4</span> Judicial Reasoning on Ingredient Identity</h2>
<section id="the-supreme-court-on-classification-hierarchy" class="level3" data-number="5.4.1">
<h3 data-number="5.4.1" class="anchored" data-anchor-id="the-supreme-court-on-classification-hierarchy"><span class="header-section-number">5.4.1</span> The Supreme Court on Classification Hierarchy</h3>
<p>In <em>Commissioner of Customs (Import) v. M/s Welkin Foods</em>, decided on 6 January 2026, the Supreme Court of India addressed the hierarchy of interpretive tools applicable to food product classification disputes <span class="citation" data-cites="WelkinFoods_2026">(Supreme Court of India 2026)</span>. The Court held that Harmonised System codes and tariff headings constitute the primary reference for classification purposes and prevail over the common parlance test where the two conflict. Scientific and technical definitions embedded in the HS architecture take precedence over popular understanding of what a product “is” or “is used for.”</p>
<p>The practical implication is significant: the identity of an ingredient, for regulatory and legal purposes, is determined by the structure of the classification system rather than by lay or commercial understanding. An ingredient that a consumer would describe as “chocolate” may, for classification purposes, be a “vegetable fat confection” if its cocoa butter content falls below the legal threshold. The technical classification displaces the common-name description.</p>
</section>
<section id="the-delhi-high-court-on-source-disclosure-independence" class="level3" data-number="5.4.2">
<h3 data-number="5.4.2" class="anchored" data-anchor-id="the-delhi-high-court-on-source-disclosure-independence"><span class="header-section-number">5.4.2</span> The Delhi High Court on Source Disclosure Independence</h3>
<p>In <em>Ram Gaua Raksha Dal v. Union of India and Others</em>, the Delhi High Court ruled on the interaction between functional-class additive declaration and source-based disclosure requirements <span class="citation" data-cites="DelhiHC_RamGaua_2022">(Delhi High Court 2022)</span>. The Court held, first, that source disclosure obligations are independent of the additive-declaration framework: even where an additive is properly declared by functional class and INS number, the source-based identification requirement cannot be displaced. Second, the obligation is percentage-independent: a non-vegetarian ingredient triggers mandatory source disclosure regardless of quantity present. Third, the Court grounded these requirements in Articles 21 and 25 of the Constitution, elevating source disclosure from regulatory preference to fundamental rights protection in specific contexts.</p>
<p>For the classification framework developed here, the judgment establishes a legal ceiling on functional abstraction: regardless of how technically “functional” an ingredient’s classification is under the additive schedule or the ITC-HS, source identity cannot be fully abstracted where constitutional disclosure interests apply. This ceiling is incorporated into the <img src="https://latex.codecogs.com/png.latex?F"> dimension of the model as a contextual modifier.</p>
</section>
</section>
<section id="the-regulatory-delta-2011-to-2020" class="level2" data-number="5.5">
<h2 data-number="5.5" class="anchored" data-anchor-id="the-regulatory-delta-2011-to-2020"><span class="header-section-number">5.5</span> The Regulatory Delta: 2011 to 2020</h2>
<p>A comparative analysis of the FSSAI Food Products Standards and Food Additives Regulations 2011 and the Labelling and Display Regulations 2020 reveals a systematic shift in the regulatory treatment of ingredient identity <span class="citation" data-cites="FSSAI_RegulatoryDelta">(Vukka and Lalitha 2026)</span>. The 2020 Regulations expanded the scope of mandatory source qualification, tightened the format requirements for additive declaration, and introduced new provisions for allergen labelling and the declaration of processing aids. These changes collectively increased the regulatory resolution of ingredient identity: more distinctions are now legally mandated, and more instruments are available to enforce them.</p>
<p>This trajectory is relevant to the benchmark in Section&nbsp;5.6: the 35 test cases are calibrated to the current regulatory state as of 2025, with the understanding that the framework must accommodate regulatory evolution without requiring wholesale reconstruction.</p>
</section>
<section id="sec-benchmark" class="level2" data-number="5.6">
<h2 data-number="5.6" class="anchored" data-anchor-id="sec-benchmark"><span class="header-section-number">5.6</span> The Identity Discrimination Benchmark</h2>
<section id="purpose-and-scope" class="level3" data-number="5.6.1">
<h3 data-number="5.6.1" class="anchored" data-anchor-id="purpose-and-scope"><span class="header-section-number">5.6.1</span> Purpose and Scope</h3>
<p>The benchmark serves a specific and bounded purpose: it provides a replicable, publicly stated set of discrimination tests against which any ingredient classification framework—including the <img src="https://latex.codecogs.com/png.latex?E">–<img src="https://latex.codecogs.com/png.latex?M">–<img src="https://latex.codecogs.com/png.latex?F"> model developed in Section&nbsp;6—can be evaluated. A framework that fails to produce correct discriminations on the benchmark cases is demonstrably inadequate; a framework that passes all cases has cleared a necessary but not sufficient condition for general adequacy.</p>
<p>The benchmark is adversarial by design. Each test case represents a discrimination that a naive or flat classification system would likely fail, while a principled framework grounded in regulatory and scientific evidence should resolve correctly. The test cases span the full range of ingredient transformation—from thermal history without identity change to complete chemical synthesis—and cover the major regulatory identity snaps documented above.</p>
<p>The discriminatory power of a framework applied to the benchmark is quantified by a Determinism Quotient (DQ):</p>
<p><span id="eq-dq"><img src="https://latex.codecogs.com/png.latex?DQ%20=%20%5Cfrac%7B%5Csum%20%5Ctext%7BCorrect%20Discriminations%7D%7D%7B35%7D%20%5Ctag%7B1%7D"></span></p>
<p>A DQ of 1.0 indicates a correct outcome on all 35 pairs. For floor tests such as pair 1, the correct outcome is to withhold a distinct canon rather than to create one. Partial scores indicate specific domains of weakness. The DQ measures logical consistency with regulatory ground truth, not statistical performance.</p>
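<p>The DQ calculation is a simple proportion, sketched below in Python. The function name <code>determinism_quotient</code> and the example results are hypothetical. The failure pattern shown is invented for illustration and is not a reported result for any framework.</p>

```python
from typing import Sequence

BENCHMARK_SIZE = 35  # number of test pairs in the benchmark


def determinism_quotient(correct: Sequence[bool]) -> float:
    """DQ = (number of correct discriminations) / 35."""
    if len(correct) != BENCHMARK_SIZE:
        raise ValueError(f"expected {BENCHMARK_SIZE} results, got {len(correct)}")
    return sum(correct) / BENCHMARK_SIZE


# Hypothetical outcome: the framework fails pairs 12 and 34 only.
results = [True] * BENCHMARK_SIZE
results[11] = False
results[33] = False
print(round(determinism_quotient(results), 3))
```

<p>With two failures the quotient is 33/35, or about 0.943. Recording results per pair rather than as a single count preserves the information the text calls “specific domains of weakness.”</p>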
</section>
<section id="the-35-test-identity-discrimination-benchmark" class="level3" data-number="5.6.2">
<h3 data-number="5.6.2" class="anchored" data-anchor-id="the-35-test-identity-discrimination-benchmark"><span class="header-section-number">5.6.2</span> The 35-Test Identity Discrimination Benchmark</h3>
<div id="tbl-benchmark" class="quarto-float quarto-figure quarto-figure-center anchored">
<figure class="quarto-float quarto-float-tbl figure">
<figcaption class="quarto-float-caption-top quarto-float-caption quarto-float-tbl" id="tbl-benchmark-caption-0ceaefa1-69ba-4598-a22c-09a6ac19f8ca">
Table&nbsp;1: Identity Discrimination Benchmark: 35 adversarial test pairs.
</figcaption>
<div aria-describedby="tbl-benchmark-caption-0ceaefa1-69ba-4598-a22c-09a6ac19f8ca">
<table class="caption-top table">
<colgroup>
<col style="width: 7%">
<col style="width: 43%">
<col style="width: 49%">
</colgroup>
<thead>
<tr class="header">
<th style="text-align: center;"><strong>ID</strong></th>
<th style="text-align: left;"><strong>Does the framework differentiate between…</strong></th>
<th style="text-align: left;"><strong>Reason for Testing (Regulatory/Nutritional Logic)</strong></th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td style="text-align: center;">1</td>
<td style="text-align: left;">Raw Apple vs.&nbsp;Chilled Apple</td>
<td style="text-align: left;">Floor test for thermal history without identity snap. Chilling is minimal processing with no regulatory rename. A framework must not produce a distinct canon from refrigeration alone.</td>
</tr>
<tr class="even">
<td style="text-align: center;">2</td>
<td style="text-align: left;">Whole Wheat Flour vs.&nbsp;Maida</td>
<td style="text-align: left;">Detection of matrix stripping. FSSAI product standards 2.4.1 and 2.4.2 distinguish these as separate regulated commodities with different ash content and extraction rate specifications.</td>
</tr>
<tr class="odd">
<td style="text-align: center;">3</td>
<td style="text-align: left;">Maida vs.&nbsp;Native Wheat Starch</td>
<td style="text-align: left;">Snap from whole-plant milling to nutrient isolation. Maida retains protein and some non-starch material; native wheat starch is a purified carbohydrate fraction. FSSAI and HS Chapter 11 distinguish these within the milling chapter before any chemical modification occurs.</td>
</tr>
<tr class="even">
<td style="text-align: center;">4</td>
<td style="text-align: left;">Sliced Onion vs.&nbsp;Onion Powder</td>
<td style="text-align: left;">Mass concentration threshold and matrix disruption. Dehydration concentrates all components by approximately 10-fold; the resulting powder has different nutritional density, water activity, and regulatory handling characteristics.</td>
</tr>
<tr class="odd">
<td style="text-align: center;">5</td>
<td style="text-align: left;">Raw Milk vs.&nbsp;Pasteurised Milk</td>
<td style="text-align: left;">Identification of the primary legal safety processing step. FSSAI dairy standards require heat treatment declaration; the legal name changes from “milk” to “pasteurised milk.” The framework must register this change without treating the two as entirely separate biological entities.</td>
</tr>
<tr class="even">
<td style="text-align: center;">6</td>
<td style="text-align: left;">Fresh Fruit vs.&nbsp;Dehydrated Fruit</td>
<td style="text-align: left;">Phase change and water activity boundary (<img src="https://latex.codecogs.com/png.latex?a_w">). Dehydrated fruit is regulated under different FSSAI standards, has different microbiological risk profiles, and occupies different HS subheadings.</td>
</tr>
<tr class="odd">
<td style="text-align: center;">7</td>
<td style="text-align: left;">Raw Honey vs.&nbsp;Pasteurised Honey</td>
<td style="text-align: left;">Enzymatic integrity versus thermal stabilisation. FSSAI honey standards distinguish these based on diastase activity; the framework must capture the enzymatic dimension of processing history.</td>
</tr>
<tr class="even">
<td style="text-align: center;">8</td>
<td style="text-align: left;">Cold Pressed Oil vs.&nbsp;Refined Oil</td>
<td style="text-align: left;">Chemical separation and solvent-based processing floor. FSSAI mandates different label designations; Codex CXS 19-1981 restricts the term “cold pressed” to oil obtained without heat addition and without additives. Refined oil has passed through deacidification, bleaching, and deodorisation.</td>
</tr>
<tr class="odd">
<td style="text-align: center;">9</td>
<td style="text-align: left;">Butter vs.&nbsp;Ghee</td>
<td style="text-align: left;">Separation of dairy solids and water. FSSAI product standards under Chapter 2.2 define butter and ghee as distinct dairy fat products with different compositional specifications. The two share HS heading 0405 in Chapter 04 but fall under different subheadings.</td>
</tr>
<tr class="even">
<td style="text-align: center;">10</td>
<td style="text-align: left;">Ghee vs.&nbsp;Anhydrous Milk Fat</td>
<td style="text-align: left;">Chemical peak of lipid purity. Anhydrous milk fat (AMF) achieves approximately 99.9% lipid content through a more intensive separation process than ghee, with Codex standard CXS 280-1973 defining separate parameters for each.</td>
</tr>
<tr class="odd">
<td style="text-align: center;">11</td>
<td style="text-align: left;">Liquid Vegetable Oil vs.&nbsp;Vanaspati</td>
<td style="text-align: left;">Catalytic hydrogenation snap. FSSAI defines vanaspati under Chapter 2.2.6 as a hydrogenated vegetable oil product with mandatory trans fat disclosure. HS heading 1516 applies to hydrogenated fats, distinct from unmodified oil headings 1507–1515.</td>
</tr>
<tr class="even">
<td style="text-align: center;">12</td>
<td style="text-align: left;">Vanaspati vs.&nbsp;Interesterified Fat</td>
<td style="text-align: left;">Molecular rearrangement for structural utility. Interesterification redistributes fatty acids among glycerol backbones, creating a different melting profile without full saturation. Both fall under HS 1516 but with distinct process designations; FSSAI Schedule II requires specific naming of interesterified vegetable fat.</td>
</tr>
<tr class="odd">
<td style="text-align: center;">13</td>
<td style="text-align: left;">Milk vs.&nbsp;Dairy Whitener</td>
<td style="text-align: left;">Functionality shift from beverage to additive carrier. Dairy whitener is a formulated product containing dried milk with emulsifiers, anticaking agents, and flow agents; its primary commercial function is as an additive to beverages, not as a standalone nutritional source.</td>
</tr>
<tr class="even">
<td style="text-align: center;">14</td>
<td style="text-align: left;">Coconut Milk vs.&nbsp;Coconut Oil</td>
<td style="text-align: left;">Emulsion-to-lipid snap. Coconut milk is an oil-in-water emulsion pressed from the grated kernel (HS Chapter 21 preparation); coconut oil is the isolated lipid fraction (HS Chapter 15). These are categorically different regulatory entities despite sharing a botanical source.</td>
</tr>
<tr class="odd">
<td style="text-align: center;">15</td>
<td style="text-align: left;">Raw Milk vs.&nbsp;Yogurt/Curd</td>
<td style="text-align: left;">Biological conversion and structural coagulation. Fermentation transforms the protein matrix, carbohydrate profile, and pH of milk; FSSAI product standards and HS Chapter 04 treat curd and milk as distinct dairy products.</td>
</tr>
<tr class="even">
<td style="text-align: center;">16</td>
<td style="text-align: left;">Curd vs.&nbsp;Soy Dahi (Analogue)</td>
<td style="text-align: left;">Source-origin verification (plant versus animal identity). A plant-based analogue mimicking the sensory properties of curd must be declared as a dairy analogue under FSSAI labelling rules and cannot use the term “dahi” without qualification. The framework must distinguish biological source even where functional and sensory properties overlap.</td>
</tr>
<tr class="odd">
<td style="text-align: center;">17</td>
<td style="text-align: left;">Fruit Juice vs.&nbsp;Fruit Vinegar</td>
<td style="text-align: left;">Snap from sugar matrix to biological acid matrix. Double fermentation converts the juice sugars first to ethanol and then to acetic acid; the resulting product is governed by FSSAI vinegar standards and the vinegar heading (HS 2209) in Chapter 22, categorically distinct from juice classification.</td>
</tr>
<tr class="even">
<td style="text-align: center;">18</td>
<td style="text-align: left;">Vinegar vs.&nbsp;Glacial Acetic Acid</td>
<td style="text-align: left;">Biogenic origin versus petrochemical synthesis. FSSAI mandates “SYNTHETIC – PREPARED FROM ACETIC ACID” labelling for non-fermented vinegar substitutes. The tariff classification correspondingly moves from Chapter 22 (beverages, spirits, and vinegar) to Chapter 29 (organic chemicals).</td>
</tr>
<tr class="odd">
<td style="text-align: center;">19</td>
<td style="text-align: left;">Cane Sugar vs.&nbsp;Xanthan Gum</td>
<td style="text-align: left;">Fermentation product as tool versus substrate identity. Xanthan gum, produced by fermentation of glucose substrates by <em>Xanthomonas campestris</em>, is classified as a food additive (stabiliser, INS 415) under FSSAI Schedule I and in Chapter 13 or 35 of ITC-HS, entirely distinct from its sugar feedstock.</td>
</tr>
<tr class="even">
<td style="text-align: center;">20</td>
<td style="text-align: left;">Natural Yeast vs.&nbsp;Chemical Leavening</td>
<td style="text-align: left;">Biological versus inorganic gas-release mechanisms. Yeast leavening is a biological process; sodium bicarbonate (INS 500(ii)) and baking powder are classified as food additives (raising agents) of inorganic chemical origin.</td>
</tr>
<tr class="odd">
<td style="text-align: center;">21</td>
<td style="text-align: left;">Wheat Flour vs.&nbsp;Maltodextrin</td>
<td style="text-align: left;">Enzymatic hydrolysis: matrix-to-molecular snap. Maltodextrin, produced by partial hydrolysis of starch, occupies HS heading 1702 (other sugars) or 3505 (dextrins and other modified starches) depending on dextrose equivalent; it is categorically distinct from the flour from which it derives.</td>
</tr>
<tr class="even">
<td style="text-align: center;">22</td>
<td style="text-align: left;">Native Starch vs.&nbsp;Modified Starch</td>
<td style="text-align: left;">Identity snap from Chapter 11 to Chapter 35 of ITC-HS. Chemical modification (acetylation, cross-linking, phosphorylation) moves starch from the milling industry chapter to the albuminoidal substances and modified starches chapter. FSSAI labelling requires explicit naming of modified starches as such.</td>
</tr>
<tr class="odd">
<td style="text-align: center;">23</td>
<td style="text-align: left;">Whole Soya Bean vs.&nbsp;Soya Lecithin</td>
<td style="text-align: left;">Food-to-emulsifier snap (<img src="https://latex.codecogs.com/png.latex?F"> peak). Soya lecithin is extracted from soybean oil, concentrated to a phospholipid-rich fraction, and classified as a food additive (emulsifier, INS 322) under FSSAI Schedule I and under HS 2923 (phosphoaminolipids) in Chapter 29—entirely distinct from the whole soybean.</td>
</tr>
<tr class="even">
<td style="text-align: center;">24</td>
<td style="text-align: left;">Sugar vs.&nbsp;High Fructose Corn Syrup</td>
<td style="text-align: left;">Enzymatic synthesis of non-natural sugar ratios. High fructose corn syrup is produced by enzymatic isomerisation of glucose derived from corn starch; its fructose content has no counterpart in the native starch, and the result is a functionally and metabolically distinct sweetener.</td>
</tr>
<tr class="odd">
<td style="text-align: center;">25</td>
<td style="text-align: left;">Vanilla Bean vs.&nbsp;Natural Vanilla Extract</td>
<td style="text-align: left;">Solvent extraction versus biological matrix integrity. Natural vanilla extract is produced by aqueous or ethanolic extraction; it is a concentrated flavouring preparation classified under Chapter 33 (essential oils, resinoids) rather than Chapter 9 (spices), with distinct regulatory treatment under FSSAI flavouring guidelines.</td>
</tr>
<tr class="even">
<td style="text-align: center;">26</td>
<td style="text-align: left;">Natural Vanilla Extract vs.&nbsp;Synthetic Vanillin</td>
<td style="text-align: left;">Signal-to-source divorce. Synthetic vanillin (4-hydroxy-3-methoxybenzaldehyde, HS 2912.41) is classified in Chapter 29 (organic chemicals); it cannot be labelled as “natural vanilla flavouring” under FSSAI regulations and must be declared as “artificial flavouring” or “flavouring (vanillin).”</td>
</tr>
<tr class="odd">
<td style="text-align: center;">27</td>
<td style="text-align: left;">Chocolate vs.&nbsp;Chocolate Substitute</td>
<td style="text-align: left;">Legal admission of non-cocoa fats as identity limit. FSSAI product standards for chocolate set minimum cocoa solids and cocoa butter content; products falling below these thresholds must be designated “chocolate-flavoured” or “compound chocolate” rather than “chocolate.”</td>
</tr>
<tr class="even">
<td style="text-align: center;">28</td>
<td style="text-align: left;">Natural Dietary Fibre vs.&nbsp;Purified Cellulose</td>
<td style="text-align: left;">Isolation of non-nutritive structural tool. Microcrystalline cellulose (MCC, INS 460) is an additive-classified substance under FSSAI Schedule I, used as a bulking agent, anticaking agent, and stabiliser; it is categorically distinct from the dietary fibre content declared on nutrition labels.</td>
</tr>
<tr class="odd">
<td style="text-align: center;">29</td>
<td style="text-align: left;">Cane Sugar vs.&nbsp;Aspartame</td>
<td style="text-align: left;">Caloric bulk versus high-potency functional signal. Aspartame (INS 951) is classified as an intense sweetener under FSSAI Schedule I at use levels approximately 200 times lower than sugar by mass; its functional identity is defined by sweetening intensity, not caloric contribution.</td>
</tr>
<tr class="even">
<td style="text-align: center;">30</td>
<td style="text-align: left;">Sea Salt vs.&nbsp;Sodium Benzoate</td>
<td style="text-align: left;">Flavour seasoning versus system utility (preservative). Sodium benzoate (INS 211) is classified as a preservative under FSSAI Schedule I; its primary function is microbial inhibition, not flavour. The framework must not conflate sodium-containing ingredients on the basis of cation similarity.</td>
</tr>
<tr class="odd">
<td style="text-align: center;">31</td>
<td style="text-align: left;">Guar Gum vs.&nbsp;Cereal Flour</td>
<td style="text-align: left;">Peak viscosity utility versus caloric mass contribution. Guar gum (INS 412), classified as a thickener and stabiliser under FSSAI Schedule I, is used at 0.1–0.5% inclusion levels for viscosity; cereal flour is a bulk ingredient providing starch and protein at 40–80% of formulation weight.</td>
</tr>
<tr class="even">
<td style="text-align: center;">32</td>
<td style="text-align: left;">Lemon Juice vs.&nbsp;Citric Acid</td>
<td style="text-align: left;">Purity-utility snap: food versus acidulant tool. Lemon juice is a food ingredient governed by FSSAI product standards and classified in Chapter 20 of ITC-HS; citric acid (INS 330) is a food additive classified as an acidity regulator under FSSAI Schedule I and in Chapter 29 of ITC-HS.</td>
</tr>
<tr class="odd">
<td style="text-align: center;">33</td>
<td style="text-align: left;">Smoked Meat vs.&nbsp;Liquid Smoke</td>
<td style="text-align: left;">Process-integral flavour versus additive signal divorce. Liquid smoke is a condensate of wood combustion products, standardised and classified as a flavouring preparation under FSSAI regulations; it is a discrete additive, not the outcome of an integrated processing step, and must be declared in the ingredient list.</td>
</tr>
<tr class="even">
<td style="text-align: center;">34</td>
<td style="text-align: left;">Natural Beta-Carotene vs.&nbsp;Synthetic Beta-Carotene</td>
<td style="text-align: left;">Source coordinate verification. Natural beta-carotene (extracted from vegetables or algae) and synthetic beta-carotene (chemical synthesis) are chemically identical but classified differently for the purpose of “natural colour” claims under FSSAI and comparable labelling frameworks. The framework must capture source coordinate even where molecular structure is identical.</td>
</tr>
<tr class="odd">
<td style="text-align: center;">35</td>
<td style="text-align: left;">Bulk Ingredient vs.&nbsp;INS Carrier/Additive</td>
<td style="text-align: left;">Maximum divorce: the functional infrastructure peak. An ingredient serving no direct nutritional, sensory, or structural role in the final food product—functioning purely as a carrier, processing aid, or technical auxiliary—represents the terminus of the identity axis. The framework must distinguish this from any ingredient contributing to the food’s nutritional or sensory character.</td>
</tr>
</tbody>
</table>
</div>
</figure>
</div>
</section>
<section id="benchmark-validation-protocol" class="level3" data-number="5.6.3">
<h3 data-number="5.6.3" class="anchored" data-anchor-id="benchmark-validation-protocol"><span class="header-section-number">5.6.3</span> Benchmark Validation Protocol</h3>
<p>The benchmark is applied to the <img src="https://latex.codecogs.com/png.latex?E">–<img src="https://latex.codecogs.com/png.latex?M">–<img src="https://latex.codecogs.com/png.latex?F"> model in Section&nbsp;8. The validation records, for each test pair, the model coordinates assigned to each member and whether those coordinates produce a differentiated classification outcome. A differentiated outcome requires that the two members of the pair be assigned to different canonical zones (variant, independent canon, or functional tool) or that their coordinate values differ sufficiently to warrant different regulatory and operational treatment.</p>
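<p>The differentiation criterion in the preceding paragraph can be expressed as a short predicate. The Python sketch below is one possible reading, not the validation procedure itself. The <code>Placement</code> record, the <code>min_gap</code> threshold, the use of the largest single-coordinate difference as the measure of “sufficiently” different, and all coordinate values are assumptions made for illustration.</p>

```python
from dataclasses import dataclass

ZONES = ("variant", "independent canon", "functional tool")


@dataclass(frozen=True)
class Placement:
    """Hypothetical record of one ingredient's model coordinates."""
    e: float
    m: float
    f: float
    zone: str

    def __post_init__(self):
        if self.zone not in ZONES:
            raise ValueError(f"unknown zone: {self.zone!r}")


def is_differentiated(a: Placement, b: Placement, min_gap: float) -> bool:
    """True if zones differ, or any coordinate differs by min_gap or more."""
    if a.zone != b.zone:
        return True
    gaps = (abs(a.e - b.e), abs(a.m - b.m), abs(a.f - b.f))
    return max(gaps) >= min_gap


# Illustrative values only; they are not scores from the report.
raw_apple = Placement(e=0.0, m=0.0, f=0.0, zone="variant")
chilled_apple = Placement(e=0.5, m=0.0, f=0.0, zone="variant")
soya_lecithin = Placement(e=7.0, m=8.0, f=9.0, zone="functional tool")

print(is_differentiated(raw_apple, chilled_apple, min_gap=1.0))
print(is_differentiated(raw_apple, soya_lecithin, min_gap=1.0))
```

<p>Under these invented values the first comparison is not differentiated, since the two share a zone and their largest coordinate gap is below the threshold. The second is differentiated on zone alone. Any real threshold would have to be justified against the regulatory evidence for each pair.</p>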
<p>Critiques of the benchmark—whether challenging the selection of test pairs, the regulatory evidence cited, or the pass/fail criteria—are subject to the contribution protocol in Appendix A. Critique without proposed revision and evidence does not constitute engagement with the benchmark.</p>
</section>
</section>
</section>
<section id="sec-ch-emf" class="level1" data-number="6">
<h1 data-number="6"><span class="header-section-number">6</span> The <img src="https://latex.codecogs.com/png.latex?E">–<img src="https://latex.codecogs.com/png.latex?M">–<img src="https://latex.codecogs.com/png.latex?F"> Tri-Axial Identity Model</h1>
<section id="the-need-for-three-dimensions" class="level2" data-number="6.1">
<h2 data-number="6.1" class="anchored" data-anchor-id="the-need-for-three-dimensions"><span class="header-section-number">6.1</span> The Need for Three Dimensions</h2>
<p>Chapters 2 through 5 have established that ingredient identity is not a single-dimensional property. Flat canonisation collapses distinctions that matter; classification by processing level alone conflates ingredients that regulatory systems treat as categorically different. Ranganathan’s faceted approach provides the theoretical architecture, but its application to food ingredients requires computational operationalisation: dimensions that are measurable, independently assignable, and combinable into a diagnostic framework.</p>
<p>Three dimensions are necessary and, as argued below, sufficient to capture the identity distinctions that regulatory systems actually make.</p>
<p>First, <em>how invasively was the ingredient transformed?</em> This is a question about process: the energy and chemistry invested in moving an ingredient away from its native biological state. It is measured by the Anthropogenic Energy Score (<img src="https://latex.codecogs.com/png.latex?E">).</p>
<p>Second, <em>how far has the ingredient moved from its source matrix?</em> This is a question about the resulting material: how much of the original biological context—moisture, fibre, co-nutrients, cellular structure—remains in the ingredient as it enters the food system. It is measured by the Matter Score (<img src="https://latex.codecogs.com/png.latex?M">).</p>
<p>Third, <em>does the ingredient’s regulatory and commercial identity follow its biological source or its technological function?</em> This is a question about the legal-semiotic position of the ingredient: whether it is named, classified, and governed as a product of a biological origin or as a performer of a technological role. It is measured by the Functional Score (<img src="https://latex.codecogs.com/png.latex?F">).</p>
<p>These dimensions are independent. A moderately processed ingredient can have high functional identity (propellant gases have high <img src="https://latex.codecogs.com/png.latex?F"> despite moderate <img src="https://latex.codecogs.com/png.latex?E">). A heavily processed ingredient can retain low functional identity (hydrogenated vegetable oil has high <img src="https://latex.codecogs.com/png.latex?E"> but low <img src="https://latex.codecogs.com/png.latex?F"> because regulatory naming retains source primacy). No single axis is sufficient to determine identity, and the combination of all three resolves cases that any two alone leave ambiguous.</p>
<p>The full technical justification for each score assigned in the tables that follow—including process-by-process derivations, supporting citations, and defensibility ratings—is documented in the companion scoring report <span class="citation" data-cites="EMF_JustificationCompanion">(Lalitha 2026b)</span>. The present chapter states the framework and its outputs; the companion document shows the derivation.</p>
</section>
<section id="the-anthropogenic-energy-score-e" class="level2" data-number="6.2">
<h2 data-number="6.2" class="anchored" data-anchor-id="the-anthropogenic-energy-score-e"><span class="header-section-number">6.2</span> The Anthropogenic Energy Score (<img src="https://latex.codecogs.com/png.latex?E">)</h2>
<section id="definition-and-interpretive-range" class="level3" data-number="6.2.1">
<h3 data-number="6.2.1" class="anchored" data-anchor-id="definition-and-interpretive-range"><span class="header-section-number">6.2.1</span> Definition and Interpretive Range</h3>
<p>The Anthropogenic Energy Score <img src="https://latex.codecogs.com/png.latex?E"> quantifies the invasiveness of the transformation pathway applied to an ingredient, ranging from <img src="https://latex.codecogs.com/png.latex?E%20=%200"> (native biological state, no industrial transformation) to <img src="https://latex.codecogs.com/png.latex?E%20=%201.0"> (complete chemical synthesis with no biological material present or traceable).</p>
<p>The scale is continuous but structured around four interpretive bands, each anchored in regulatory and chemical distinctions:</p>
<ul>
<li><p><strong>Physical (0.10–0.35):</strong> Mechanical handling with no intentional change to molecular identity. Sorting (<img src="https://latex.codecogs.com/png.latex?E%20%5Capprox%200.12">), washing (<img src="https://latex.codecogs.com/png.latex?E%20%5Capprox%200.15">), de-husking (<img src="https://latex.codecogs.com/png.latex?E%20%5Capprox%200.22">), milling (<img src="https://latex.codecogs.com/png.latex?E%20%5Capprox%200.28">), cold pressing (<img src="https://latex.codecogs.com/png.latex?E%20%5Capprox%200.32">). These operations alter the physical form of the ingredient without targeting covalent bonds.</p></li>
<li><p><strong>Thermal/Biological (0.40–0.60):</strong> Phase change, safety stabilisation, and biological conversion. Churning (<img src="https://latex.codecogs.com/png.latex?E%20%5Capprox%200.45">), pasteurisation (<img src="https://latex.codecogs.com/png.latex?E%20%5Capprox%200.48">), clarification for ghee production (<img src="https://latex.codecogs.com/png.latex?E%20%5Capprox%200.55">), fermentation (<img src="https://latex.codecogs.com/png.latex?E%20%5Capprox%200.56">), roasting (<img src="https://latex.codecogs.com/png.latex?E%20%5Capprox%200.58">). These operations alter the structural or chemical state of the ingredient while retaining a clear connection to the biological source in regulatory naming.</p></li>
<li><p><strong>Fractional/Refinement (0.70–0.82):</strong> Separation into functional fractions using solvents, controlled crystallisation, or industrial purification. Refining (<img src="https://latex.codecogs.com/png.latex?E%20%5Capprox%200.75">), fractionation (<img src="https://latex.codecogs.com/png.latex?E%20%5Capprox%200.76">), solvent extraction (<img src="https://latex.codecogs.com/png.latex?E%20%5Capprox%200.82">). These operations produce technically defined fractions that may lack the botanical character of the starting material.</p></li>
<li><p><strong>Chemical/Synthetic (0.85–1.0):</strong> Intentional covalent modification or de novo synthesis. Interesterification (<img src="https://latex.codecogs.com/png.latex?E%20%5Capprox%200.91">), hydrogenation (<img src="https://latex.codecogs.com/png.latex?E%20%5Capprox%200.92">), acetylation (<img src="https://latex.codecogs.com/png.latex?E%20%5Capprox%200.94">), synthetic flavours (<img src="https://latex.codecogs.com/png.latex?E%20%5Capprox%200.98">–<img src="https://latex.codecogs.com/png.latex?0.99">). These operations introduce new functional groups, rearrange molecular structures, or produce chemically defined substances with no necessary biological precursor.</p></li>
</ul>
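<p>The four bands above translate directly into a lookup. The sketch below is illustrative only: the <code>E_BANDS</code> table copies the band limits stated in the list, while the function name <code>e_band</code> and the choice to return <code>None</code> for scores that fall between bands (for example 0.36 to 0.39) are assumptions made here, because the text does not say how such scores should be labelled.</p>

```python
# Band limits as stated in the list above (inclusive at both ends).
E_BANDS = [
    ("Physical", 0.10, 0.35),
    ("Thermal/Biological", 0.40, 0.60),
    ("Fractional/Refinement", 0.70, 0.82),
    ("Chemical/Synthetic", 0.85, 1.00),
]


def e_band(e):
    """Return the interpretive band for an E score, or None if the score
    falls outside every stated band."""
    if not 0.0 <= e <= 1.0:
        raise ValueError(f"E must lie in [0, 1], got {e}")
    for name, low, high in E_BANDS:
        if low <= e <= high:
            return name
    return None


# Reference values quoted in the list above.
for process, score in [("milling", 0.28), ("fermentation", 0.56),
                       ("refining", 0.75), ("hydrogenation", 0.92)]:
    print(process, score, e_band(score))
```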
</section>
<section id="e-as-process-history-not-quality-assessment" class="level3" data-number="6.2.2">
<h3 data-number="6.2.2" class="anchored" data-anchor-id="e-as-process-history-not-quality-assessment"><span class="header-section-number">6.2.2</span> <img src="https://latex.codecogs.com/png.latex?E"> as Process History, Not Quality Assessment</h3>
<p>A critical interpretive constraint must be stated explicitly: the <img src="https://latex.codecogs.com/png.latex?E"> score is not a quality assessment, a health score, or a value judgement. Ghee, a product of deep cultural and nutritional significance, carries an <img src="https://latex.codecogs.com/png.latex?E"> score of approximately 0.55 because it is produced through thermal concentration and clarification—processes that are moderately invasive relative to the full scale. This does not make ghee inferior to cold-pressed oil in any nutritional, cultural, or commercial sense. The <img src="https://latex.codecogs.com/png.latex?E"> score records what happened to the ingredient; it does not evaluate whether that history is desirable.</p>
<p>Similarly, a high <img src="https://latex.codecogs.com/png.latex?E"> score for synthetic vanillin (<img src="https://latex.codecogs.com/png.latex?E%20%5Capprox%200.98">) does not imply that it is unsafe or inappropriate for use. JECFA evaluations and approved INS classifications confirm that synthetic vanillin is safe at specified use levels. The high <img src="https://latex.codecogs.com/png.latex?E"> score records the degree of chemical synthesis involved in its production.</p>
</section>
<section id="selected-e-score-reference-values" class="level3" data-number="6.2.3">
<h3 data-number="6.2.3" class="anchored" data-anchor-id="selected-e-score-reference-values"><span class="header-section-number">6.2.3</span> Selected <img src="https://latex.codecogs.com/png.latex?E"> Score Reference Values</h3>
<p>Table&nbsp;2 presents reference <img src="https://latex.codecogs.com/png.latex?E"> values for representative processes.</p>
<div id="tbl-escores" class="quarto-float quarto-figure quarto-figure-center anchored">
<figure class="quarto-float quarto-float-tbl figure">
<figcaption class="quarto-float-caption-top quarto-float-caption quarto-float-tbl" id="tbl-escores-caption-0ceaefa1-69ba-4598-a22c-09a6ac19f8ca">
Table&nbsp;2: Selected Anthropogenic Energy Score (<img src="https://latex.codecogs.com/png.latex?E">) reference values.
</figcaption>
<div aria-describedby="tbl-escores-caption-0ceaefa1-69ba-4598-a22c-09a6ac19f8ca">
<table class="caption-top table">
<thead>
<tr class="header">
<th style="text-align: left;"><strong>Process</strong></th>
<th style="text-align: center;"><strong><img src="https://latex.codecogs.com/png.latex?E"></strong></th>
<th style="text-align: left;"><strong>Band</strong></th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td style="text-align: left;">Sorting</td>
<td style="text-align: center;">0.12</td>
<td style="text-align: left;">Physical</td>
</tr>
<tr class="even">
<td style="text-align: left;">Washing</td>
<td style="text-align: center;">0.15</td>
<td style="text-align: left;">Physical</td>
</tr>
<tr class="odd">
<td style="text-align: left;">Chilling</td>
<td style="text-align: center;">0.18</td>
<td style="text-align: left;">Physical</td>
</tr>
<tr class="even">
<td style="text-align: left;">De-husking</td>
<td style="text-align: center;">0.22</td>
<td style="text-align: left;">Physical</td>
</tr>
<tr class="odd">
<td style="text-align: left;">Milling (e.g., Besan)</td>
<td style="text-align: center;">0.28</td>
<td style="text-align: left;">Physical</td>
</tr>
<tr class="even">
<td style="text-align: left;">Cold Pressing (Oil)</td>
<td style="text-align: center;">0.32</td>
<td style="text-align: left;">Physical</td>
</tr>
<tr class="odd">
<td style="text-align: left;">Churning (Butter)</td>
<td style="text-align: center;">0.45</td>
<td style="text-align: left;">Physical/Thermal</td>
</tr>
<tr class="even">
<td style="text-align: left;">Pasteurisation</td>
<td style="text-align: center;">0.48</td>
<td style="text-align: left;">Thermal</td>
</tr>
<tr class="odd">
<td style="text-align: left;">Clarification (Ghee)</td>
<td style="text-align: center;">0.55</td>
<td style="text-align: left;">Thermal</td>
</tr>
<tr class="even">
<td style="text-align: left;">Fermentation (Vinegar)</td>
<td style="text-align: center;">0.56</td>
<td style="text-align: left;">Biological</td>
</tr>
<tr class="odd">
<td style="text-align: left;">Roasting</td>
<td style="text-align: center;">0.58</td>
<td style="text-align: left;">Thermal/Chemical</td>
</tr>
<tr class="even">
<td style="text-align: left;">Refining (Vegetable Oil)</td>
<td style="text-align: center;">0.75</td>
<td style="text-align: left;">Industrial/Fractional</td>
</tr>
<tr class="odd">
<td style="text-align: left;">Fractionation (Olein)</td>
<td style="text-align: center;">0.76</td>
<td style="text-align: left;">Industrial/Fractional</td>
</tr>
<tr class="even">
<td style="text-align: left;">Solvent Extraction (Oils)</td>
<td style="text-align: center;">0.82</td>
<td style="text-align: left;">Industrial/Fractional</td>
</tr>
<tr class="odd">
<td style="text-align: left;">Interesterification</td>
<td style="text-align: center;">0.91</td>
<td style="text-align: left;">Chemical/Synthetic</td>
</tr>
<tr class="even">
<td style="text-align: left;">Hydrogenation</td>
<td style="text-align: center;">0.92</td>
<td style="text-align: left;">Chemical/Synthetic</td>
</tr>
<tr class="odd">
<td style="text-align: left;">Acetylation (Modified Starch)</td>
<td style="text-align: center;">0.94</td>
<td style="text-align: left;">Chemical/Synthetic</td>
</tr>
<tr class="even">
<td style="text-align: left;">Synthetic Vanillin</td>
<td style="text-align: center;">0.98</td>
<td style="text-align: left;">Chemical/Synthetic</td>
</tr>
<tr class="odd">
<td style="text-align: left;">Synthetic Flavours (General)</td>
<td style="text-align: center;">0.99</td>
<td style="text-align: left;">Chemical/Synthetic</td>
</tr>
</tbody>
</table>
</div>
</figure>
</div>
</section>
</section>
<section id="the-matter-score-m" class="level2" data-number="6.3">
<h2 data-number="6.3" class="anchored" data-anchor-id="the-matter-score-m"><span class="header-section-number">6.3</span> The Matter Score (<img src="https://latex.codecogs.com/png.latex?M">)</h2>
<section id="definition-and-interpretive-range-1" class="level3" data-number="6.3.1">
<h3 data-number="6.3.1" class="anchored" data-anchor-id="definition-and-interpretive-range-1"><span class="header-section-number">6.3.1</span> Definition and Interpretive Range</h3>
<p>The Matter Score <img src="https://latex.codecogs.com/png.latex?M"> measures the degree of departure of the ingredient’s final commercial state from the original biological matrix, ranging from <img src="https://latex.codecogs.com/png.latex?M%20=%200"> (whole, hydrated, structurally intact biological material) to <img src="https://latex.codecogs.com/png.latex?M%20=%201.0"> (chemically defined pure substance with no remaining biological matrix).</p>
<p>Where <img src="https://latex.codecogs.com/png.latex?E"> measures the transformation <em>process</em>, <img src="https://latex.codecogs.com/png.latex?M"> measures the transformation <em>result</em>: the state of the material as it enters the food system. An ingredient may undergo a relatively high-<img src="https://latex.codecogs.com/png.latex?E"> process and emerge with a relatively low <img src="https://latex.codecogs.com/png.latex?M"> if the process retains most of the original matrix (roasting leaves the bulk carbohydrate, fat, and protein structure largely intact). Conversely, a moderate-<img src="https://latex.codecogs.com/png.latex?E"> process applied repeatedly or intensively may produce a high-<img src="https://latex.codecogs.com/png.latex?M"> result (spray-drying combined with prior concentration and protein precipitation produces a protein isolate at <img src="https://latex.codecogs.com/png.latex?M%20%5Capprox%200.78">).</p>
<p>Seven conceptual matter classes provide interpretive anchors:</p>
<ol type="1">
<li><strong>Hydrated/Native (<img src="https://latex.codecogs.com/png.latex?M%20=%200.05">–<img src="https://latex.codecogs.com/png.latex?0.15">):</strong> Whole or minimally cut foods with cellular water and anatomical structure largely intact.</li>
<li><strong>Comminuted (<img src="https://latex.codecogs.com/png.latex?M%20=%200.25">–<img src="https://latex.codecogs.com/png.latex?0.36">):</strong> Physically reduced particle size; full nutrient spectrum retained; cellular structure disrupted but not fractionated.</li>
<li><strong>Dehydrated/Concentrated (<img src="https://latex.codecogs.com/png.latex?M%20=%200.38">–<img src="https://latex.codecogs.com/png.latex?0.52">):</strong> Water removed or matrix densified; major macronutrients retained; water activity substantially reduced.</li>
<li><strong>Structural Fractionation (<img src="https://latex.codecogs.com/png.latex?M%20=%200.50">–<img src="https://latex.codecogs.com/png.latex?0.60">):</strong> Selective removal or enrichment of specific macronutrient fractions (skim milk, defatted meal, clarified juice).</li>
<li><strong>Constitutional Isolate (<img src="https://latex.codecogs.com/png.latex?M%20=%200.70">–<img src="https://latex.codecogs.com/png.latex?0.82">):</strong> One major macronutrient isolated to high technical purity (vegetable oils, protein isolates, purified fat fractions).</li>
<li><strong>Molecular Signal/Extract (<img src="https://latex.codecogs.com/png.latex?M%20=%200.86">–<img src="https://latex.codecogs.com/png.latex?0.90">):</strong> High-potency, low-mass signals isolated from the biological matrix (essential oils, oleoresins, emulsifiers).</li>
<li><strong>De Novo/Synthetic Matter (<img src="https://latex.codecogs.com/png.latex?M%20=%200.96">–<img src="https://latex.codecogs.com/png.latex?0.99">):</strong> Chemically defined substances with no required biological matrix (modified starches, synthetic flavours, inorganic salts).</li>
</ol>
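<p>Because the seven ranges are interpretive anchors rather than a partition of the scale, a score does not always select exactly one class: the stated ranges for classes 3 and 4 overlap between 0.50 and 0.52, and some intervals (for example 0.61 to 0.69) belong to no class. The sketch below makes that explicit by returning every class whose range contains the score. The name <code>m_classes</code> and the list-valued return are assumptions made for illustration.</p>

```python
# Ranges as stated in the list above (inclusive at both ends).
M_CLASSES = [
    ("Hydrated/Native", 0.05, 0.15),
    ("Comminuted", 0.25, 0.36),
    ("Dehydrated/Concentrated", 0.38, 0.52),
    ("Structural Fractionation", 0.50, 0.60),
    ("Constitutional Isolate", 0.70, 0.82),
    ("Molecular Signal/Extract", 0.86, 0.90),
    ("De Novo/Synthetic Matter", 0.96, 0.99),
]


def m_classes(m):
    """Return every matter class whose stated range contains the M score.
    The list may hold zero, one, or two names."""
    if not 0.0 <= m <= 1.0:
        raise ValueError(f"M must lie in [0, 1], got {m}")
    return [name for name, low, high in M_CLASSES if low <= m <= high]


print(m_classes(0.78))  # inside a single range
print(m_classes(0.51))  # inside the overlap of classes 3 and 4
print(m_classes(0.65))  # between classes 4 and 5
```

<p>Where the list holds more than one name, the class has to be settled by the description of the material (water removal versus selective removal of a macronutrient fraction), not by the number alone.</p>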
</section>
<section id="selected-m-score-reference-values" class="level3" data-number="6.3.2">
<h3 data-number="6.3.2" class="anchored" data-anchor-id="selected-m-score-reference-values"><span class="header-section-number">6.3.2</span> Selected <img src="https://latex.codecogs.com/png.latex?M"> Score Reference Values</h3>
<p>Table&nbsp;3 presents reference <img src="https://latex.codecogs.com/png.latex?M"> values for representative commercial states.</p>
<div id="tbl-mscores" class="quarto-float quarto-figure quarto-figure-center anchored">
<figure class="quarto-float quarto-float-tbl figure">
<figcaption class="quarto-float-caption-top quarto-float-caption quarto-float-tbl" id="tbl-mscores-caption-0ceaefa1-69ba-4598-a22c-09a6ac19f8ca">
Table&nbsp;3: Selected Matter Score (<img src="https://latex.codecogs.com/png.latex?M">) reference values.
</figcaption>
<div aria-describedby="tbl-mscores-caption-0ceaefa1-69ba-4598-a22c-09a6ac19f8ca">
<table class="caption-top table">
<thead>
<tr class="header">
<th style="text-align: left;"><strong>Final Commercial State</strong></th>
<th style="text-align: center;"><strong><img src="https://latex.codecogs.com/png.latex?M"></strong></th>
<th style="text-align: left;"><strong>Matter Class</strong></th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td style="text-align: left;">Whole/fresh pieces</td>
<td style="text-align: center;">0.05</td>
<td style="text-align: left;">Hydrated/Native</td>
</tr>
<tr class="even">
<td style="text-align: left;">Cut/sliced pieces</td>
<td style="text-align: center;">0.10</td>
<td style="text-align: left;">Hydrated/Native</td>
</tr>
<tr class="odd">
<td style="text-align: left;">Pulp/puree</td>
<td style="text-align: center;">0.25</td>
<td style="text-align: left;">Comminuted</td>
</tr>
<tr class="even">
<td style="text-align: left;">Coarse grits</td>
<td style="text-align: center;">0.30</td>
<td style="text-align: left;">Comminuted</td>
</tr>
<tr class="odd">
<td style="text-align: left;">Flour/fine powder</td>
<td style="text-align: center;">0.33</td>
<td style="text-align: left;">Comminuted</td>
</tr>
<tr class="even">
<td style="text-align: left;">Flakes</td>
<td style="text-align: center;">0.36</td>
<td style="text-align: left;">Dehydrated/Concentrated</td>
</tr>
<tr class="odd">
<td style="text-align: left;">Dense block (e.g., khoya)</td>
<td style="text-align: center;">0.38</td>
<td style="text-align: left;">Dehydrated/Concentrated</td>
</tr>
<tr class="even">
<td style="text-align: left;">Concentrate (liquid)</td>
<td style="text-align: center;">0.40</td>
<td style="text-align: left;">Dehydrated/Concentrated</td>
</tr>
<tr class="odd">
<td style="text-align: left;">Powder (spray-dried)</td>
<td style="text-align: center;">0.42</td>
<td style="text-align: left;">Dehydrated/Concentrated</td>
</tr>
<tr class="even">
<td style="text-align: left;">Juice (clarified)</td>
<td style="text-align: center;">0.50</td>
<td style="text-align: left;">Structural Fractionation</td>
</tr>
<tr class="odd">
<td style="text-align: left;">Whey powder</td>
<td style="text-align: center;">0.52</td>
<td style="text-align: left;">Structural Fractionation</td>
</tr>
<tr class="even">
<td style="text-align: left;">Skim/defatted meal</td>
<td style="text-align: center;">0.55</td>
<td style="text-align: left;">Structural Fractionation</td>
</tr>
<tr class="odd">
<td style="text-align: left;">Starch flour</td>
<td style="text-align: center;">0.60</td>
<td style="text-align: left;">Structural Fractionation</td>
</tr>
<tr class="even">
<td style="text-align: left;">Oil</td>
<td style="text-align: center;">0.70</td>
<td style="text-align: left;">Constitutional Isolate</td>
</tr>
<tr class="odd">
<td style="text-align: left;">Fat fraction</td>
<td style="text-align: center;">0.72</td>
<td style="text-align: left;">Constitutional Isolate</td>
</tr>
<tr class="even">
<td style="text-align: left;">Protein concentrate</td>
<td style="text-align: center;">0.74</td>
<td style="text-align: left;">Constitutional Isolate</td>
</tr>
<tr class="odd">
<td style="text-align: left;">Protein isolate</td>
<td style="text-align: center;">0.78</td>
<td style="text-align: left;">Constitutional Isolate</td>
</tr>
<tr class="even">
<td style="text-align: left;">Granules (agglomerated isolate)</td>
<td style="text-align: center;">0.80</td>
<td style="text-align: left;">Constitutional Isolate</td>
</tr>
<tr class="odd">
<td style="text-align: left;">Extract/oleoresin</td>
<td style="text-align: center;">0.86</td>
<td style="text-align: left;">Molecular Signal/Extract</td>
</tr>
<tr class="even">
<td style="text-align: left;">Oleoresin (viscous)</td>
<td style="text-align: center;">0.88</td>
<td style="text-align: left;">Molecular Signal/Extract</td>
</tr>
<tr class="odd">
<td style="text-align: left;">Emulsifier powder (e.g., lecithin)</td>
<td style="text-align: center;">0.89</td>
<td style="text-align: left;">Molecular Signal/Extract</td>
</tr>
<tr class="even">
<td style="text-align: left;">Essential oil</td>
<td style="text-align: center;">0.90</td>
<td style="text-align: left;">Molecular Signal/Extract</td>
</tr>
<tr class="odd">
<td style="text-align: left;">Modified starch powder</td>
<td style="text-align: center;">0.96</td>
<td style="text-align: left;">De Novo/Synthetic</td>
</tr>
<tr class="even">
<td style="text-align: left;">Crystalline chemical</td>
<td style="text-align: center;">0.98</td>
<td style="text-align: left;">De Novo/Synthetic</td>
</tr>
</tbody>
</table>
</div>
</figure>
</div>
</section>
</section>
<section id="the-functional-score-f" class="level2" data-number="6.4">
<h2 data-number="6.4" class="anchored" data-anchor-id="the-functional-score-f"><span class="header-section-number">6.4</span> The Functional Score (<img src="https://latex.codecogs.com/png.latex?F">)</h2>
<section id="definition-and-motivation" class="level3" data-number="6.4.1">
<h3 data-number="6.4.1" class="anchored" data-anchor-id="definition-and-motivation"><span class="header-section-number">6.4.1</span> Definition and Motivation</h3>
<p>The Functional Score <img src="https://latex.codecogs.com/png.latex?F"> measures the degree to which the legal and commercial identity of an ingredient is governed by its technological function rather than its biological origin. It ranges from <img src="https://latex.codecogs.com/png.latex?F%20=%200.10"> (identity fully source-dominant) to <img src="https://latex.codecogs.com/png.latex?F%20=%200.95"> (identity fully function-dominant), with the following interpretive zones:</p>
<ul>
<li><p><strong>Source-Dominant (<img src="https://latex.codecogs.com/png.latex?F%20=%200.10">–<img src="https://latex.codecogs.com/png.latex?0.25">):</strong> Primary structure, bulk, calories, protein; regulatory naming follows food commodity name; technological function is implicit. Examples: base ingredients, spices, edible oils, dairy fats.</p></li>
<li><p><strong>Source-Retaining, Function-Emergent (<img src="https://latex.codecogs.com/png.latex?F%20=%200.35">–<img src="https://latex.codecogs.com/png.latex?0.55">):</strong> Technological role is acknowledged in naming but source remains primary or co-equal. Examples: bulking agents, humectants, firming agents, raising agents.</p></li>
<li><p><strong>Function-Emergent (<img src="https://latex.codecogs.com/png.latex?F%20=%200.60">–<img src="https://latex.codecogs.com/png.latex?0.75">):</strong> Technological function is primary in regulatory naming; source is secondary or parenthetical. Examples: thickeners, stabilisers, gelling agents, foaming agents, colours.</p></li>
<li><p><strong>Function-Dominant (<img src="https://latex.codecogs.com/png.latex?F%20=%200.80">–<img src="https://latex.codecogs.com/png.latex?0.95">):</strong> Pure tool-identity; source fully abstracted or irrelevant to classification. Examples: emulsifiers, preservatives, sequestrants, bleaching agents, carriers, propellants.</p></li>
</ul>
</section>
<section id="f-is-not-derived-from-e-and-m" class="level3" data-number="6.4.2">
<h3 data-number="6.4.2" class="anchored" data-anchor-id="f-is-not-derived-from-e-and-m"><span class="header-section-number">6.4.2</span> <img src="https://latex.codecogs.com/png.latex?F"> Is Not Derived from <img src="https://latex.codecogs.com/png.latex?E"> and <img src="https://latex.codecogs.com/png.latex?M"></h3>
<p>The <img src="https://latex.codecogs.com/png.latex?F"> score is not a mathematical function of <img src="https://latex.codecogs.com/png.latex?E"> and <img src="https://latex.codecogs.com/png.latex?M">. This independence is the central methodological commitment of the tri-axial framework, motivated by empirical evidence that the correlation between processing intensity, matrix distance, and functional naming is imperfect.</p>
<p>Two cases illustrate the independence. First, fractionated palm olein (<img src="https://latex.codecogs.com/png.latex?E%20%5Capprox%200.76">, <img src="https://latex.codecogs.com/png.latex?M%20%5Capprox%200.72">) has high process intensity and substantial matrix distance, yet its regulatory naming is source-primary (“fractionated palm oil,” HS Chapter 15); its <img src="https://latex.codecogs.com/png.latex?F"> score is approximately 0.35. Second, a packaging gas such as nitrogen (<img src="https://latex.codecogs.com/png.latex?E%20%5Capprox%200.60">, <img src="https://latex.codecogs.com/png.latex?M%20%5Capprox%200.90">) has moderate process intensity, but its regulatory and commercial identity is defined entirely by its physical properties and atmospheric function; its <img src="https://latex.codecogs.com/png.latex?F"> score is 0.95. The olein has the higher <img src="https://latex.codecogs.com/png.latex?E"> yet by far the lower <img src="https://latex.codecogs.com/png.latex?F">, so no formula in which <img src="https://latex.codecogs.com/png.latex?F"> simply tracks processing intensity would place both correctly; the difference lies in how each ingredient is named and classified, which <img src="https://latex.codecogs.com/png.latex?E"> and <img src="https://latex.codecogs.com/png.latex?M"> do not record.</p>
<p>The <img src="https://latex.codecogs.com/png.latex?F"> score is derived from a three-part test:</p>
<ol type="1">
<li><strong>FSSAI naming test:</strong> Does the mandatory label declaration format require a functional class name as the primary identifier (“Emulsifier (lecithin)”) or a source-based name (“palm oil”)?</li>
<li><strong>ITC-HS chapter test:</strong> Does the ingredient’s classification reside in source-aligned chapters (7–15) or function-aligned/chemically defined chapters (29, 35, 38)?</li>
<li><strong>Judicial precedent test:</strong> Does case law require or permit functional abstraction, or does it mandate source-based disclosure for the ingredient category?</li>
</ol>
<p>An ingredient achieves function-dominant status (<img src="https://latex.codecogs.com/png.latex?F%20%5Cgeq%200.80">) only when all three tests converge on functional identity. Where tests produce conflicting signals—as with gelatin, whose gelling function supports high <img src="https://latex.codecogs.com/png.latex?F"> but whose animal origin triggers source-disclosure obligations under the reasoning of <em>Ram Gaua Raksha Dal</em> <span class="citation" data-cites="DelhiHC_RamGaua_2022">(Delhi High Court 2022)</span>—the <img src="https://latex.codecogs.com/png.latex?F"> score reflects the net regulatory position after accounting for the constraint.</p>
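<p>The convergence rule can be expressed as a simple gate. The sketch below is a hedged illustration: the <code>Signal</code> enum, the <code>ThreePartTest</code> record, and its method names are introduced here and are not part of the framework's published tooling. It checks only whether an ingredient is eligible for the function-dominant zone; it does not compute an <img src="https://latex.codecogs.com/png.latex?F"> value, which the text derives from the net regulatory position. In the gelatin example, the judicial signal follows the text, while the naming and chapter signals are assumed for illustration.</p>

```python
from dataclasses import dataclass
from enum import Enum


class Signal(Enum):
    SOURCE = "source"
    FUNCTION = "function"


@dataclass(frozen=True)
class ThreePartTest:
    fssai_naming: Signal        # label declaration format
    itc_hs_chapter: Signal      # source-aligned vs function-aligned chapter
    judicial_precedent: Signal  # source disclosure vs functional abstraction

    def signals(self):
        return (self.fssai_naming, self.itc_hs_chapter, self.judicial_precedent)

    def converges_on_function(self):
        """Gate for the function-dominant zone (F >= 0.80): all three
        tests must point to functional identity."""
        return all(s is Signal.FUNCTION for s in self.signals())

    def is_conflicted(self):
        """True when the three tests do not all agree."""
        return len(set(self.signals())) > 1


# Gelatin: judicial signal per the text; the other two are assumptions.
gelatin = ThreePartTest(
    fssai_naming=Signal.FUNCTION,
    itc_hs_chapter=Signal.FUNCTION,
    judicial_precedent=Signal.SOURCE,
)
print(gelatin.converges_on_function())  # gate result for the conflicted case
print(gelatin.is_conflicted())
```

<p>A conflicted result does not fix a score; it flags the ingredient for the case-by-case judgement described above.</p>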
</section>
<section id="f-scores-across-fssai-functional-classes" class="level3" data-number="6.4.3">
<h3 data-number="6.4.3" class="anchored" data-anchor-id="f-scores-across-fssai-functional-classes"><span class="header-section-number">6.4.3</span> <img src="https://latex.codecogs.com/png.latex?F"> Scores Across FSSAI Functional Classes</h3>
<p>Table&nbsp;4 presents the <img src="https://latex.codecogs.com/png.latex?F"> range for each of the twenty-two functional classes enumerated in Schedule I of the Food Safety and Standards (Food Products Standards and Food Additives) Regulations, 2011, derived from the three-part test.</p>
<div id="tbl-fscores" class="quarto-float quarto-figure quarto-figure-center anchored">
<figure class="quarto-float quarto-float-tbl figure">
<figcaption class="quarto-float-caption-top quarto-float-caption quarto-float-tbl" id="tbl-fscores-caption-0ceaefa1-69ba-4598-a22c-09a6ac19f8ca">
Table&nbsp;4: Functional Score (<img src="https://latex.codecogs.com/png.latex?F">) ranges by FSSAI Schedule I functional class.
</figcaption>
<div aria-describedby="tbl-fscores-caption-0ceaefa1-69ba-4598-a22c-09a6ac19f8ca">
<table class="caption-top table">
<thead>
<tr class="header">
<th style="text-align: left;"><strong>Functional Class</strong></th>
<th style="text-align: center;"><strong><img src="https://latex.codecogs.com/png.latex?F"> Score</strong></th>
<th style="text-align: left;"><strong>Zone</strong></th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td style="text-align: left;">Base ingredient (non-additive)</td>
<td style="text-align: center;">0.12</td>
<td style="text-align: left;">Source-Dominant</td>
</tr>
<tr class="even">
<td style="text-align: left;">Taste profile / spice</td>
<td style="text-align: center;">0.18</td>
<td style="text-align: left;">Source-Dominant</td>
</tr>
<tr class="odd">
<td style="text-align: left;">Lipid base (edible oil/fat)</td>
<td style="text-align: center;">0.22</td>
<td style="text-align: left;">Source-Dominant</td>
</tr>
<tr class="even">
<td style="text-align: left;">Bulking agent</td>
<td style="text-align: center;">0.35–0.40</td>
<td style="text-align: left;">Source-Retaining</td>
</tr>
<tr class="odd">
<td style="text-align: left;">Humectant</td>
<td style="text-align: center;">0.40–0.45</td>
<td style="text-align: left;">Source-Retaining</td>
</tr>
<tr class="even">
<td style="text-align: left;">Firming agent</td>
<td style="text-align: center;">0.42–0.48</td>
<td style="text-align: left;">Source-Retaining</td>
</tr>
<tr class="odd">
<td style="text-align: left;">Raising agent</td>
<td style="text-align: center;">0.45–0.50</td>
<td style="text-align: left;">Source-Retaining</td>
</tr>
<tr class="even">
<td style="text-align: left;">Flavouring agent</td>
<td style="text-align: center;">0.60–0.75</td>
<td style="text-align: left;">Function-Emergent</td>
</tr>
<tr class="odd">
<td style="text-align: left;">Thickener</td>
<td style="text-align: center;">0.58–0.65</td>
<td style="text-align: left;">Function-Emergent</td>
</tr>
<tr class="even">
<td style="text-align: left;">Stabiliser</td>
<td style="text-align: center;">0.62–0.68</td>
<td style="text-align: left;">Function-Emergent</td>
</tr>
<tr class="odd">
<td style="text-align: left;">Gelling agent</td>
<td style="text-align: center;">0.65–0.70</td>
<td style="text-align: left;">Function-Emergent</td>
</tr>
<tr class="even">
<td style="text-align: left;">Sweetener (bulk/intense)</td>
<td style="text-align: center;">0.55–0.70</td>
<td style="text-align: left;">Function-Emergent</td>
</tr>
<tr class="odd">
<td style="text-align: left;">Foaming agent</td>
<td style="text-align: center;">0.70–0.75</td>
<td style="text-align: left;">Function-Emergent</td>
</tr>
<tr class="even">
<td style="text-align: left;">Colour</td>
<td style="text-align: center;">0.75–0.85</td>
<td style="text-align: left;">Function-Emergent / Function-Dominant</td>
</tr>
<tr class="odd">
<td style="text-align: left;">Emulsifier</td>
<td style="text-align: center;">0.80–0.85</td>
<td style="text-align: left;">Function-Dominant</td>
</tr>
<tr class="even">
<td style="text-align: left;">Anticaking agent</td>
<td style="text-align: center;">0.85</td>
<td style="text-align: left;">Function-Dominant</td>
</tr>
<tr class="odd">
<td style="text-align: left;">Acidity regulator</td>
<td style="text-align: center;">0.85–0.87</td>
<td style="text-align: left;">Function-Dominant</td>
</tr>
<tr class="even">
<td style="text-align: left;">Antioxidant</td>
<td style="text-align: center;">0.87–0.88</td>
<td style="text-align: left;">Function-Dominant</td>
</tr>
<tr class="odd">
<td style="text-align: left;">Preservative</td>
<td style="text-align: center;">0.87–0.90</td>
<td style="text-align: left;">Function-Dominant</td>
</tr>
<tr class="even">
<td style="text-align: left;">Antifoaming agent</td>
<td style="text-align: center;">0.90</td>
<td style="text-align: left;">Function-Dominant</td>
</tr>
<tr class="odd">
<td style="text-align: left;">Sequestrant</td>
<td style="text-align: center;">0.90–0.92</td>
<td style="text-align: left;">Function-Dominant</td>
</tr>
<tr class="even">
<td style="text-align: left;">Bleaching agent</td>
<td style="text-align: center;">0.92</td>
<td style="text-align: left;">Function-Dominant</td>
</tr>
<tr class="odd">
<td style="text-align: left;">Flour treatment agent</td>
<td style="text-align: center;">0.93</td>
<td style="text-align: left;">Function-Dominant</td>
</tr>
<tr class="even">
<td style="text-align: left;">Carrier</td>
<td style="text-align: center;">0.94</td>
<td style="text-align: left;">Function-Dominant</td>
</tr>
<tr class="odd">
<td style="text-align: left;">Propellant</td>
<td style="text-align: center;">0.95</td>
<td style="text-align: left;">Function-Dominant</td>
</tr>
<tr class="even">
<td style="text-align: left;">Packaging gas</td>
<td style="text-align: center;">0.95</td>
<td style="text-align: left;">Function-Dominant</td>
</tr>
</tbody>
</table>
</div>
</figure>
</div>
</section>
<section id="f-as-tie-breaker" class="level3" data-number="6.4.4">
<h3 data-number="6.4.4" class="anchored" data-anchor-id="f-as-tie-breaker"><span class="header-section-number">6.4.4</span> <img src="https://latex.codecogs.com/png.latex?F"> as Tie-Breaker</h3>
<p>The primary operational contribution of the <img src="https://latex.codecogs.com/png.latex?F"> dimension is resolution of ambiguity in cases where <img src="https://latex.codecogs.com/png.latex?E"> and <img src="https://latex.codecogs.com/png.latex?M"> produce similar coordinates for ingredients that regulatory systems treat as categorically distinct. This tie-breaking function is clearest in the lipid category.</p>
<p>Soy lecithin (<img src="https://latex.codecogs.com/png.latex?E%20%5Capprox%200.82">, <img src="https://latex.codecogs.com/png.latex?M%20%5Capprox%200.89">) and fractionated palm olein (<img src="https://latex.codecogs.com/png.latex?E%20%5Capprox%200.76">, <img src="https://latex.codecogs.com/png.latex?M%20%5Capprox%200.72">) are both heavily processed and substantially abstracted from their biological matrices. On <img src="https://latex.codecogs.com/png.latex?E"> and <img src="https://latex.codecogs.com/png.latex?M"> alone, they appear at similar positions in transformation space. But their regulatory identities diverge sharply: soy lecithin is classified as a food additive under FSSAI Schedule I (emulsifier, INS 322) and in HS Chapter 29 (phosphoaminolipids); its primary regulatory identity is functional. Fractionated palm olein is classified as a vegetable fat under FSSAI Schedule II class titles and in HS Chapter 15; its primary regulatory identity is source-based. The <img src="https://latex.codecogs.com/png.latex?F"> scores—approximately 0.82 for lecithin and 0.35 for fractionated palm olein—resolve this ambiguity and produce distinct classification outcomes.</p>
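<p>The tie-breaking argument can be checked numerically. The following is a minimal Python sketch, not part of the framework's tooling, that compares the two ingredients dimension by dimension using the approximate scores quoted above. The names <code>lecithin</code>, <code>palm_olein</code>, and <code>gaps</code> are illustrative assumptions.</p>

```python
# Approximate (E, M, F) scores quoted in the paragraph above.
lecithin = {"E": 0.82, "M": 0.89, "F": 0.82}
palm_olein = {"E": 0.76, "M": 0.72, "F": 0.35}

# Absolute gap on each dimension, rounded to two decimals for display.
gaps = {dim: round(abs(lecithin[dim] - palm_olein[dim]), 2) for dim in ("E", "M", "F")}

for dim, gap in gaps.items():
    print(f"|delta {dim}| = {gap:.2f}")

# The dimension with the largest gap is the one that separates the pair most.
most_separating = max(gaps, key=gaps.get)
print("largest gap on:", most_separating)
```

<p>On these figures the gap on F (0.47) is larger than the gaps on E (0.06) and M (0.17) combined, which is the sense in which F separates the pair.</p>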
</section>
</section>
<section id="the-emf-coordinate-system" class="level2" data-number="6.5">
<h2 data-number="6.5" class="anchored" data-anchor-id="the-emf-coordinate-system"><span class="header-section-number">6.5</span> The <img src="https://latex.codecogs.com/png.latex?E">–<img src="https://latex.codecogs.com/png.latex?M">–<img src="https://latex.codecogs.com/png.latex?F"> Coordinate System</h2>
<p>Each ingredient is assigned a position in a three-dimensional coordinate space: <img src="https://latex.codecogs.com/png.latex?E%20%5Cin%20%5B0,%201%5D">, <img src="https://latex.codecogs.com/png.latex?M%20%5Cin%20%5B0,%201%5D">, <img src="https://latex.codecogs.com/png.latex?F%20%5Cin%20%5B0,%201%5D">. The position <img src="https://latex.codecogs.com/png.latex?(E,%20M,%20F)"> is the ingredient’s identity coordinate—its location in the space of processed ingredients, determined independently on each dimension.</p>
<p>The coordinate is not a summary statistic; it is a structured representation preserving the information carried by each dimension. Two ingredients with the same <img src="https://latex.codecogs.com/png.latex?D"> score (derived in Section&nbsp;7) may have very different coordinate profiles reflecting different kinds of identity transformation. The coordinate system allows these differences to be traced and reasoned about.</p>
<p>Assignment of coordinates follows the evidence hierarchy: FSSAI regulations and product standards take precedence over general labelling rules; ITC-HS chapter assignments provide independent corroboration; judicial reasoning fills gaps and resolves conflicts. Where evidence is unavailable or conflicting, the assignment is flagged as provisional and subject to revision through the contribution protocol in Appendix A.</p>
</section>
</section>
<section id="sec-ch-divorce" class="level1 page-columns page-full" data-number="7">
<h1 data-number="7"><span class="header-section-number">7</span> The Divorce Score (<img src="https://latex.codecogs.com/png.latex?D">) and Operational Zones</h1>
<section id="from-coordinates-to-classification" class="level2" data-number="7.1">
<h2 data-number="7.1" class="anchored" data-anchor-id="from-coordinates-to-classification"><span class="header-section-number">7.1</span> From Coordinates to Classification</h2>
<p>The three-dimensional coordinate <img src="https://latex.codecogs.com/png.latex?(E,%20M,%20F)"> assigns an ingredient a position in transformation space, but operational deployment requires a scalar classification: a single determination of which zone an ingredient occupies. The Divorce Score <img src="https://latex.codecogs.com/png.latex?D"> serves this purpose. It aggregates the three coordinates into a single composite index placing ingredients into one of three operationally distinct zones corresponding to the three ontological positions: variant of a biological source, independent canonical entity, and functional tool.</p>
</section>
<section id="definition-of-the-divorce-score" class="level2" data-number="7.2">
<h2 data-number="7.2" class="anchored" data-anchor-id="definition-of-the-divorce-score"><span class="header-section-number">7.2</span> Definition of the Divorce Score</h2>
<p><span id="eq-divorce"><img src="https://latex.codecogs.com/png.latex?D%20=%200.3%20%5Ccdot%20E%20+%200.3%20%5Ccdot%20M%20+%200.4%20%5Ccdot%20F%20%5Ctag%7B2%7D"></span></p>
<p>where <img src="https://latex.codecogs.com/png.latex?E">, <img src="https://latex.codecogs.com/png.latex?M">, and <img src="https://latex.codecogs.com/png.latex?F"> are the Anthropogenic Energy, Matter, and Functional scores respectively, each in <img src="https://latex.codecogs.com/png.latex?%5B0,%201%5D">. Because the three weights are non-negative and sum to 1, <img src="https://latex.codecogs.com/png.latex?D"> is a weighted average of the three scores and therefore also lies in <img src="https://latex.codecogs.com/png.latex?%5B0,%201%5D">.</p>
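<p>Equation&nbsp;2 translates directly into a few lines of code. The Python sketch below is illustrative only: the function name <code>divorce_score</code>, the <code>WEIGHTS</code> constant, and the input validation are assumptions, not part of the published framework.</p>

```python
# Weights from Equation 2 (stated in the text as provisional).
WEIGHTS = {"E": 0.3, "M": 0.3, "F": 0.4}


def divorce_score(e: float, m: float, f: float) -> float:
    """Return D = 0.3*E + 0.3*M + 0.4*F for scores in [0, 1]."""
    for name, value in (("E", e), ("M", m), ("F", f)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must lie in [0, 1], got {value}")
    return WEIGHTS["E"] * e + WEIGHTS["M"] * m + WEIGHTS["F"] * f


# Extremes of the coordinate space map to the extremes of D.
print(divorce_score(0.0, 0.0, 0.0))
print(round(divorce_score(1.0, 1.0, 1.0), 3))
```

<p>Rejecting out-of-range inputs is a design choice of this sketch; it keeps the result inside the unit interval without any clamping.</p>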
<section id="weight-rationale" class="level3" data-number="7.2.1">
<h3 data-number="7.2.1" class="anchored" data-anchor-id="weight-rationale"><span class="header-section-number">7.2.1</span> Weight Rationale</h3>
<p>The weighting scheme assigns the highest weight (0.4) to <img src="https://latex.codecogs.com/png.latex?F"> and equal weights (0.3 each) to <img src="https://latex.codecogs.com/png.latex?E"> and <img src="https://latex.codecogs.com/png.latex?M">. This allocation reflects the empirical finding that regulatory naming and trade classification—captured by <img src="https://latex.codecogs.com/png.latex?F">—are the most reliable single predictors of identity zone in borderline cases, while <img src="https://latex.codecogs.com/png.latex?E"> and <img src="https://latex.codecogs.com/png.latex?M"> provide necessary context that <img src="https://latex.codecogs.com/png.latex?F"> alone cannot supply.</p>
<p>An ingredient can have high <img src="https://latex.codecogs.com/png.latex?E"> and <img src="https://latex.codecogs.com/png.latex?M"> while remaining in a source-primary zone if regulatory frameworks have determined that its identity should remain tied to its biological origin despite intensive processing (hydrogenated vegetable oil is the paradigm case). Conversely, an ingredient can have moderate <img src="https://latex.codecogs.com/png.latex?E"> and <img src="https://latex.codecogs.com/png.latex?M"> while being fully function-primary if its regulatory naming and HS classification are function-dominant (packaging gases are the paradigm case). In both cases, <img src="https://latex.codecogs.com/png.latex?F"> is the decisive variable; the 0.4 weight acknowledges this without making <img src="https://latex.codecogs.com/png.latex?E"> and <img src="https://latex.codecogs.com/png.latex?M"> redundant.</p>
<p>The equal weighting of <img src="https://latex.codecogs.com/png.latex?E"> and <img src="https://latex.codecogs.com/png.latex?M"> reflects their complementarity: <img src="https://latex.codecogs.com/png.latex?E"> describes the transformation history while <img src="https://latex.codecogs.com/png.latex?M"> describes the resulting state, and cases where these diverge are precisely where both pieces of information are needed to characterise the ingredient accurately.</p>
<p>The weights in Equation&nbsp;2 are explicitly provisional. They reflect the best current judgement calibrated against the benchmark cases in Section&nbsp;8. Refinement using subject matter expert input, expanded benchmark coverage, or Bayesian calibration against regulatory decision data is anticipated and invited through the contribution protocol in Appendix A.</p>
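<p>Because the weights are provisional, it is useful to see how sensitive zone assignments are to them. The Python sketch below recomputes D for the five worked examples of Section&nbsp;7.5 under the Equation&nbsp;2 weights and under a hypothetical equal-weight alternative (1/3 each). The alternative weighting, the helper names, and the choice of test cases are assumptions made for illustration; this is not the calibration procedure anticipated in the text.</p>

```python
# (E, M, F) values taken from the worked examples in Section 7.5.
EXAMPLES = {
    "Cold-pressed sesame oil": (0.32, 0.70, 0.22),
    "Soya lecithin": (0.82, 0.89, 0.82),
    "Kashmiri red chilli powder": (0.25, 0.33, 0.18),
    "Acetylated distarch adipate": (0.94, 0.96, 0.94),
    "Fractionated palm olein": (0.76, 0.72, 0.35),
}

PUBLISHED = (0.3, 0.3, 0.4)    # Equation 2
EQUAL = (1 / 3, 1 / 3, 1 / 3)  # hypothetical alternative, for comparison only


def weighted_score(scores, weights):
    """Weighted sum of (E, M, F) under the given (wE, wM, wF)."""
    return sum(w * s for w, s in zip(weights, scores))


def zone(d):
    """Zone thresholds from Section 7.3: below 0.30, 0.30 to 0.70, above 0.70."""
    if d < 0.30:
        return 1
    return 2 if d <= 0.70 else 3


for name, scores in EXAMPLES.items():
    d_pub = weighted_score(scores, PUBLISHED)
    d_eq = weighted_score(scores, EQUAL)
    print(f"{name}: D={d_pub:.3f} (zone {zone(d_pub)}), "
          f"equal weights D={d_eq:.3f} (zone {zone(d_eq)})")
```

<p>For these five cases the zone is the same under both weightings, although the D values shift. A fuller sensitivity analysis would need the whole benchmark rather than five examples.</p>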
</section>
</section>
<section id="the-three-operational-zones" class="level2" data-number="7.3">
<h2 data-number="7.3" class="anchored" data-anchor-id="the-three-operational-zones"><span class="header-section-number">7.3</span> The Three Operational Zones</h2>
<section id="zone-1-variant-d-0.30" class="level3" data-number="7.3.1">
<h3 data-number="7.3.1" class="anchored" data-anchor-id="zone-1-variant-d-0.30"><span class="header-section-number">7.3.1</span> Zone 1: Variant (<img src="https://latex.codecogs.com/png.latex?D%20%3C%200.30">)</h3>
<p>An ingredient with <img src="https://latex.codecogs.com/png.latex?D%20%3C%200.30"> is classified as a <em>variant</em>—a representation of a biological source sufficiently close to the source, in process history, material state, and regulatory naming, to be filed under the same canonical entity. Variants do not require independent canon entries; they are represented through the suffix system as elaborated forms of a canonical identity.</p>
<p>Examples of variant-zone ingredients include whole fresh produce, minimally processed grains, dried whole spices, and named dairy products such as pasteurised milk and fresh curd. The variant zone encompasses the full range of legitimate labelling variation that does not rise to the level of a distinct regulatory or nutritional identity.</p>
<p>Within the variant zone, the suffix system preserves distinctions that matter commercially and culturally. A brand using “Mathania Red Chilli” is in the variant zone relative to the “Red Chilli” canon; its specific suffix records geographic origin without displacing the canon. A brand using “Kashmiri Lal Mirch” occupies the same zone with a different suffix. Both coordinate under the same canon while retaining their distinct commercial identities.</p>
</section>
<section id="zone-2-independent-canon-0.30-leq-d-leq-0.70" class="level3" data-number="7.3.2">
<h3 data-number="7.3.2" class="anchored" data-anchor-id="zone-2-independent-canon-0.30-leq-d-leq-0.70"><span class="header-section-number">7.3.2</span> Zone 2: Independent Canon (<img src="https://latex.codecogs.com/png.latex?0.30%20%5Cleq%20D%20%5Cleq%200.70">)</h3>
<p>An ingredient with <img src="https://latex.codecogs.com/png.latex?D"> in <img src="https://latex.codecogs.com/png.latex?%5B0.30,%200.70%5D"> constitutes an <em>independent canon</em>—an entity sufficiently distinct from any biological source to warrant its own canonical entry, but not so transformed that its identity is wholly defined by its technological function. Independent canons have a biological origin that remains traceable and relevant to their identity, but they are not interchangeable with other forms of that origin for regulatory, nutritional, or commercial purposes.</p>
<p>Examples include refined vegetable oils, dairy fat fractions (ghee, butter), fermented vinegar, modified starches before the HS Chapter 11-to-35 migration, protein concentrates, spray-dried powders of identifiable biological origin, and dehydrated fruit products.</p>
</section>
<section id="zone-3-functional-tool-d-0.70" class="level3" data-number="7.3.3">
<h3 data-number="7.3.3" class="anchored" data-anchor-id="zone-3-functional-tool-d-0.70"><span class="header-section-number">7.3.3</span> Zone 3: Functional Tool (<img src="https://latex.codecogs.com/png.latex?D%20%3E%200.70">)</h3>
<p>An ingredient with <img src="https://latex.codecogs.com/png.latex?D%20%3E%200.70"> is classified as a <em>functional tool</em>: an entity whose identity is primarily defined by its technological role rather than its biological origin. From the consumer’s perspective, functional tools are generally not what gives a food its nutritional or sensory character; they are infrastructure enabling the food system to achieve technical objectives.</p>
<p>This does not mean functional tools are unimportant. Emulsifiers, preservatives, sequestrants, and carriers are essential to the safety, stability, and palatability of packaged foods. But their identity, for regulatory and classification purposes, follows their function, not their source. The declaration format mandated by FSSAI (“Functional Class (Specific Name or INS)”) encodes this principle in law.</p>
<p>Examples include emulsifiers (soya lecithin, mono- and diglycerides), preservatives (sodium benzoate, potassium sorbate), sequestrants (calcium disodium EDTA), carriers, packaging gases, and modified starches classified under HS Chapter 35. Synthetic flavouring substances—where source is not required to be declared and identity is defined by molecular structure and sensory function—also occupy this zone.</p>
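<p>The three thresholds can be expressed as a small classification function. The Python sketch below is a minimal illustration; the names <code>Zone</code> and <code>classify</code> are assumptions. It follows the boundaries as stated in the headings above, so that both 0.30 and 0.70 fall in Zone 2.</p>

```python
from enum import Enum


class Zone(Enum):
    VARIANT = 1
    INDEPENDENT_CANON = 2
    FUNCTIONAL_TOOL = 3


def classify(d: float) -> Zone:
    """Map a Divorce Score to its zone; both boundary values fall in Zone 2."""
    if not 0.0 <= d <= 1.0:
        raise ValueError(f"D must lie in [0, 1], got {d}")
    if d < 0.30:
        return Zone.VARIANT
    if d <= 0.70:
        return Zone.INDEPENDENT_CANON
    return Zone.FUNCTIONAL_TOOL


for d in (0.246, 0.30, 0.584, 0.70, 0.841):
    print(d, classify(d).name)
```

<p>A real implementation would also need a policy for scores that reach a boundary only after rounding; the text does not specify one, and this sketch does not attempt to.</p>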
</section>
</section>
<section id="zone-boundaries-and-source-disclosure-obligations" class="level2 page-columns page-full" data-number="7.4">
<h2 data-number="7.4" class="anchored" data-anchor-id="zone-boundaries-and-source-disclosure-obligations"><span class="header-section-number">7.4</span> Zone Boundaries and Source Disclosure Obligations</h2>
<p>The Divorce Score thresholds are not unconditional. Two legal constraints modify the operational effect of zone assignment.</p>
<p>First, the allergen disclosure requirement: FSSAI Regulation 5(14) mandates declaration of common allergens—including cereals containing gluten, peanuts, soybeans, milk, eggs, fish, crustaceans, and tree nuts—regardless of the ingredient’s zone assignment.<sup>10</sup> A soy lecithin with <img src="https://latex.codecogs.com/png.latex?D%20%3E%200.70"> is classified as a functional tool, but its soy origin must still be disclosed for allergen purposes. Zone 3 classification does not displace the allergen disclosure obligation.</p>
<div class="no-row-height column-margin column-container"><div id="fn10"><p><sup>10</sup>&nbsp;FSSAI Labelling Regulations 2020, Regulation 5(14).</p></div></div><p>Second, the religious/ethical source disclosure requirement: as established by the Delhi High Court in <em>Ram Gaua Raksha Dal</em> <span class="citation" data-cites="DelhiHC_RamGaua_2022">(Delhi High Court 2022)</span>, the vegetarian/non-vegetarian origin of an ingredient must be declared regardless of its processing level or functional classification. Gelatin derived from animal bones, used as a gelling agent, carries a mandatory source-disclosure obligation on religious grounds that cannot be displaced by functional naming.</p>
<p>These constraints do not alter the zone assignment—the <img src="https://latex.codecogs.com/png.latex?D"> score and zone determination remain as computed—but they create additional labelling obligations that apply in parallel. The framework records these obligations as conditional metadata attached to the canonical entry.</p>
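<p>One way to picture obligations that apply in parallel is as data attached to an entry rather than as inputs to the score. The Python sketch below is a hypothetical data model: the class <code>CanonicalEntry</code>, its field names, and the obligation label are assumptions for illustration and do not describe the framework's actual schema. The D value for soya lecithin is the one derived in Section&nbsp;7.5.2.</p>

```python
from dataclasses import dataclass, field


@dataclass
class CanonicalEntry:
    name: str
    d_score: float
    # Labelling obligations recorded alongside the entry; they never feed into d_score.
    obligations: list[str] = field(default_factory=list)

    @property
    def zone(self) -> int:
        """Zone from D alone (thresholds of Section 7.3)."""
        if self.d_score < 0.30:
            return 1
        return 2 if self.d_score <= 0.70 else 3


lecithin = CanonicalEntry("Soya lecithin", 0.841)
zone_before = lecithin.zone

# Attach the allergen obligation described above; the label text is illustrative.
lecithin.obligations.append("allergen disclosure: soy (Regulation 5(14))")

print(lecithin.zone, lecithin.obligations)
```

<p>The zone is derived from the score alone, so adding an obligation leaves it unchanged, which mirrors the separation described above.</p>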
</section>
<section id="worked-zone-assignments" class="level2 page-columns page-full" data-number="7.5">
<h2 data-number="7.5" class="anchored" data-anchor-id="worked-zone-assignments"><span class="header-section-number">7.5</span> Worked Zone Assignments</h2>
<p>The following five examples illustrate the zone assignment process, using Table&nbsp;2 and Table&nbsp;3 as the reference for individual score values.</p>
<section id="cold-pressed-sesame-oil" class="level3" data-number="7.5.1">
<h3 data-number="7.5.1" class="anchored" data-anchor-id="cold-pressed-sesame-oil"><span class="header-section-number">7.5.1</span> Cold-Pressed Sesame Oil</h3>
<p>Cold pressing applies mechanical extraction without heat or solvent: <img src="https://latex.codecogs.com/png.latex?E%20=%200.32">. The resulting product is a pure triglyceride fraction with the biological source fully present in lipid form: <img src="https://latex.codecogs.com/png.latex?M%20=%200.70">. Regulatory naming is source-primary throughout—“sesame oil” is the mandatory declaration name, HS Chapter 15—placing this firmly in the lipid base functional category: <img src="https://latex.codecogs.com/png.latex?F%20=%200.22">.</p>
<p><img src="https://latex.codecogs.com/png.latex?D%20=%200.3(0.32)%20+%200.3(0.70)%20+%200.4(0.22)%20=%200.096%20+%200.210%20+%200.088%20=%20%5Cmathbf%7B0.394%7D"></p>
<p><em>Zone 2 (Independent Canon).</em> Cold-pressed sesame oil is not a variant of whole sesame seeds—the process and resulting state differ enough to warrant its own canonical entry—but its identity remains source-primary throughout. Solvent-extracted refined sesame oil, by contrast, carries <img src="https://latex.codecogs.com/png.latex?E%20=%200.75"> from the additional refining steps (deacidification, bleaching, deodorisation), yielding <img src="https://latex.codecogs.com/png.latex?D%20=%200.3(0.75)%20+%200.3(0.70)%20+%200.4(0.22)%20=%200.225%20+%200.210%20+%200.088%20=%200.523">, also Zone 2 but a distinct canon with a <img src="https://latex.codecogs.com/png.latex?D"> difference of 0.129 from its cold-pressed counterpart.</p>
</section>
<section id="soya-lecithin" class="level3 page-columns page-full" data-number="7.5.2">
<h3 data-number="7.5.2" class="anchored" data-anchor-id="soya-lecithin"><span class="header-section-number">7.5.2</span> Soya Lecithin</h3>
<p>Extraction from soybean oil through degumming, fractionation, and drying involves solvent exposure and intensive industrial separation: <img src="https://latex.codecogs.com/png.latex?E%20=%200.82">. The resulting phospholipid concentrate is a molecular-signal extract far removed from the whole soybean: <img src="https://latex.codecogs.com/png.latex?M%20=%200.89">. FSSAI Schedule I requires its declaration as “Emulsifier (INS 322)” and ITC-HS places it in Chapter 29 (phosphoaminolipids): <img src="https://latex.codecogs.com/png.latex?F%20=%200.82">.</p>
<p><img src="https://latex.codecogs.com/png.latex?D%20=%200.3(0.82)%20+%200.3(0.89)%20+%200.4(0.82)%20=%200.246%20+%200.267%20+%200.328%20=%20%5Cmathbf%7B0.841%7D"></p>
<p><em>Zone 3 (Functional Tool), with mandatory allergen metadata: soy origin must be disclosed under Regulation 5(14).</em><sup>11</sup></p>
<div class="no-row-height column-margin column-container"><div id="fn11"><p><sup>11</sup>&nbsp;FSSAI Labelling Regulations 2020, Regulation 5(14).</p></div></div></section>
<section id="kashmiri-red-chilli-powder" class="level3" data-number="7.5.3">
<h3 data-number="7.5.3" class="anchored" data-anchor-id="kashmiri-red-chilli-powder"><span class="header-section-number">7.5.3</span> Kashmiri Red Chilli Powder</h3>
<p>Dehusking followed by milling to fine powder: <img src="https://latex.codecogs.com/png.latex?E%20=%200.25"> (combined processing, no heat or solvent applied). The full nutrient spectrum of the chilli is retained in fine comminuted form: <img src="https://latex.codecogs.com/png.latex?M%20=%200.33">. Regulatory naming is source-primary with geographic specificity retained; FSSAI treats this under spice standards: <img src="https://latex.codecogs.com/png.latex?F%20=%200.18">.</p>
<p><img src="https://latex.codecogs.com/png.latex?D%20=%200.3(0.25)%20+%200.3(0.33)%20+%200.4(0.18)%20=%200.075%20+%200.099%20+%200.072%20=%20%5Cmathbf%7B0.246%7D"></p>
<p><em>Zone 1 (Variant).</em> Kashmiri Red Chilli Powder coordinates under the Red Chilli canonical family, distinguished by a geographic origin suffix. It coordinates equally with generic red chilli powder for allergen and compliance purposes while retaining its regional identity in consumer-facing declarations.</p>
</section>
<section id="acetylated-distarch-adipate-ins-1422" class="level3" data-number="7.5.4">
<h3 data-number="7.5.4" class="anchored" data-anchor-id="acetylated-distarch-adipate-ins-1422"><span class="header-section-number">7.5.4</span> Acetylated Distarch Adipate (INS 1422)</h3>
<p>Esterification of starch hydroxyl groups with both acetic and adipic moieties involves intentional covalent bond formation: <img src="https://latex.codecogs.com/png.latex?E%20=%200.94">. The resulting powder is classified as a modified starch under HS Chapter 35—de novo/synthetic matter: <img src="https://latex.codecogs.com/png.latex?M%20=%200.96">. FSSAI Schedule I requires its declaration under the modified starch additive category; ITC-HS Chapter 35 confirms function-dominant classification: <img src="https://latex.codecogs.com/png.latex?F%20=%200.94">.</p>
<p><img src="https://latex.codecogs.com/png.latex?D%20=%200.3(0.94)%20+%200.3(0.96)%20+%200.4(0.94)%20=%200.282%20+%200.288%20+%200.376%20=%20%5Cmathbf%7B0.946%7D"></p>
<p><em>Zone 3 (Functional Tool).</em></p>
</section>
<section id="fractionated-palm-olein" class="level3" data-number="7.5.5">
<h3 data-number="7.5.5" class="anchored" data-anchor-id="fractionated-palm-olein"><span class="header-section-number">7.5.5</span> Fractionated Palm Olein</h3>
<p>Controlled crystallisation and liquid-fraction separation: <img src="https://latex.codecogs.com/png.latex?E%20=%200.76">. The resulting product is a constitutional isolate of palm lipids: <img src="https://latex.codecogs.com/png.latex?M%20=%200.72">. Despite the process intensity, FSSAI Schedule II requires source-retaining naming (“fractionated palm oil” or “palm olein”) and ITC-HS retains it in Chapter 15: <img src="https://latex.codecogs.com/png.latex?F%20=%200.35">.</p>
<p><img src="https://latex.codecogs.com/png.latex?D%20=%200.3(0.76)%20+%200.3(0.72)%20+%200.4(0.35)%20=%200.228%20+%200.216%20+%200.140%20=%20%5Cmathbf%7B0.584%7D"></p>
<p><em>Zone 2 (Independent Canon).</em> This example illustrates the tie-breaking function of <img src="https://latex.codecogs.com/png.latex?F"> directly: despite <img src="https://latex.codecogs.com/png.latex?E"> and <img src="https://latex.codecogs.com/png.latex?M"> values that might suggest Zone 3 proximity, the source-retaining regulatory naming anchors the ingredient firmly in Zone 2. This is not an anomaly in the model; it is precisely what the independent <img src="https://latex.codecogs.com/png.latex?F"> dimension is designed to capture.</p>
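<p>The arithmetic in the five worked examples can be reproduced with a short script. The Python sketch below recomputes each weighted term of Equation&nbsp;2 and compares the total with the value stated in the text; the names <code>WORKED</code>, <code>terms</code>, and <code>counterfactual</code> are illustrative assumptions. The last line is a hypothetical what-if, not a score from the framework: it swaps palm olein's F for lecithin's while holding E and M fixed.</p>

```python
# (name, E, M, F, D as stated in the text) for the five worked examples above.
WORKED = [
    ("Cold-pressed sesame oil", 0.32, 0.70, 0.22, 0.394),
    ("Soya lecithin", 0.82, 0.89, 0.82, 0.841),
    ("Kashmiri red chilli powder", 0.25, 0.33, 0.18, 0.246),
    ("Acetylated distarch adipate", 0.94, 0.96, 0.94, 0.946),
    ("Fractionated palm olein", 0.76, 0.72, 0.35, 0.584),
]


def terms(e, m, f):
    """Weighted terms of Equation 2, in the order (0.3*E, 0.3*M, 0.4*F)."""
    return (0.3 * e, 0.3 * m, 0.4 * f)


results = {}
for name, e, m, f, stated in WORKED:
    t = terms(e, m, f)
    d = round(sum(t), 3)
    results[name] = d
    parts = " + ".join(f"{x:.3f}" for x in t)
    print(f"{name}: {parts} = {d:.3f} (text: {stated:.3f})")

# Hypothetical what-if: palm olein's E and M combined with lecithin's F.
counterfactual = round(sum(terms(0.76, 0.72, 0.82)), 3)
print("palm olein with F = 0.82:", counterfactual)
```

<p>With F raised from 0.35 to 0.82 and E and M unchanged, the palm olein score would move from 0.584 to 0.772, above the Zone 3 threshold. This is the tie-breaking effect of F in numerical form.</p>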
</section>
</section>
</section>
<section id="sec-ch-validation" class="level1 page-columns page-full" data-number="8">
<h1 data-number="8"><span class="header-section-number">8</span> Benchmark Validation</h1>
<section id="validation-approach" class="level2" data-number="8.1">
<h2 data-number="8.1" class="anchored" data-anchor-id="validation-approach"><span class="header-section-number">8.1</span> Validation Approach</h2>
<p>The 35-test benchmark introduced in Section&nbsp;5.6 is applied to the <img src="https://latex.codecogs.com/png.latex?E">–<img src="https://latex.codecogs.com/png.latex?M">–<img src="https://latex.codecogs.com/png.latex?F"> model as defined in Section&nbsp;6 and Section&nbsp;7. For each test pair, the model assigns <img src="https://latex.codecogs.com/png.latex?(E,%20M,%20F)"> coordinates to each member, computes <img src="https://latex.codecogs.com/png.latex?D"> scores from Equation&nbsp;2, and determines zone classification. A discrimination is scored as correct if the pair members fall in different zones or, within the same zone, if the magnitude of difference is sufficient to warrant distinct canonical treatment under the framework’s canonical separation criteria.</p>
<p>Score assignments draw directly from Table&nbsp;2 and Table&nbsp;3 as primary reference, with <img src="https://latex.codecogs.com/png.latex?F"> scores assigned from the functional class taxonomy in Table&nbsp;4. Full technical derivations, including process-by-process forensic notes and defensibility ratings, are in the companion scoring report <span class="citation" data-cites="EMF_JustificationCompanion">(Lalitha 2026b)</span>.</p>
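<p>The per-pair procedure described above can be sketched as follows. This Python example computes D and the zone for each member of a pair and reports whether the zones differ and how far apart the D values are. It deliberately stops short of the final correct/incorrect judgement, because the canonical separation criteria are not reproduced in this section. The function names are illustrative assumptions; the scores are those listed for pairs 3 and 8 in Table&nbsp;5.</p>

```python
def divorce_score(e, m, f):
    """Equation 2."""
    return 0.3 * e + 0.3 * m + 0.4 * f


def zone(d):
    """Zone thresholds from Section 7.3."""
    if d < 0.30:
        return 1
    return 2 if d <= 0.70 else 3


def compare_pair(a, b):
    """Return (zone_a, zone_b, zones_differ, |D_a - D_b|) for two (E, M, F) tuples."""
    d_a, d_b = divorce_score(*a), divorce_score(*b)
    return zone(d_a), zone(d_b), zone(d_a) != zone(d_b), round(abs(d_a - d_b), 3)


# Pair 3: Maida vs Native Starch; pair 8: Cold Pressed Oil vs Refined Oil.
pair_3 = compare_pair((0.28, 0.48, 0.12), (0.49, 0.60, 0.55))
pair_8 = compare_pair((0.32, 0.70, 0.22), (0.75, 0.70, 0.22))
print(pair_3)
print(pair_8)
```

<p>Pair 3 is separated by zone (Zone 1 versus Zone 2), whereas pair 8 falls entirely within Zone 2 and is distinguished only by a D difference of 0.129.</p>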
</section>
<section id="benchmark-results" class="level2" data-number="8.2">
<h2 data-number="8.2" class="anchored" data-anchor-id="benchmark-results"><span class="header-section-number">8.2</span> Benchmark Results</h2>
<div id="tbl-validation" class="quarto-float quarto-figure quarto-figure-center anchored">
<figure class="quarto-float quarto-float-tbl figure">
<figcaption class="quarto-float-caption-top quarto-float-caption quarto-float-tbl" id="tbl-validation-caption-0ceaefa1-69ba-4598-a22c-09a6ac19f8ca">
Table&nbsp;5: Benchmark validation results for all 35 test pairs, with the assigned <img src="https://latex.codecogs.com/png.latex?E">, <img src="https://latex.codecogs.com/png.latex?M">, and <img src="https://latex.codecogs.com/png.latex?F"> scores and the computed <img src="https://latex.codecogs.com/png.latex?D"> score for each ingredient.
</figcaption>
<div aria-describedby="tbl-validation-caption-0ceaefa1-69ba-4598-a22c-09a6ac19f8ca">
<table class="caption-top table">
<colgroup>
<col style="width: 5%">
<col style="width: 12%">
<col style="width: 7%">
<col style="width: 7%">
<col style="width: 7%">
<col style="width: 7%">
<col style="width: 12%">
<col style="width: 7%">
<col style="width: 7%">
<col style="width: 7%">
<col style="width: 7%">
<col style="width: 9%">
</colgroup>
<thead>
<tr class="header">
<th style="text-align: center;"><strong>ID</strong></th>
<th style="text-align: left;"><strong>Ingredient A</strong></th>
<th style="text-align: center;"><strong><img src="https://latex.codecogs.com/png.latex?E_A"></strong></th>
<th style="text-align: center;"><strong><img src="https://latex.codecogs.com/png.latex?M_A"></strong></th>
<th style="text-align: center;"><strong><img src="https://latex.codecogs.com/png.latex?F_A"></strong></th>
<th style="text-align: center;"><strong><img src="https://latex.codecogs.com/png.latex?D_A"></strong></th>
<th style="text-align: left;"><strong>Ingredient B</strong></th>
<th style="text-align: center;"><strong><img src="https://latex.codecogs.com/png.latex?E_B"></strong></th>
<th style="text-align: center;"><strong><img src="https://latex.codecogs.com/png.latex?M_B"></strong></th>
<th style="text-align: center;"><strong><img src="https://latex.codecogs.com/png.latex?F_B"></strong></th>
<th style="text-align: center;"><strong><img src="https://latex.codecogs.com/png.latex?D_B"></strong></th>
<th style="text-align: center;"><strong>Correct?</strong></th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td style="text-align: center;">1</td>
<td style="text-align: left;">Raw Apple</td>
<td style="text-align: center;">0.12</td>
<td style="text-align: center;">0.05</td>
<td style="text-align: center;">0.12</td>
<td style="text-align: center;"><strong>0.10</strong></td>
<td style="text-align: left;">Chilled Apple</td>
<td style="text-align: center;">0.18</td>
<td style="text-align: center;">0.05</td>
<td style="text-align: center;">0.12</td>
<td style="text-align: center;"><strong>0.12</strong></td>
<td style="text-align: center;">✓</td>
</tr>
<tr class="even">
<td style="text-align: center;">2</td>
<td style="text-align: left;">Whole Wheat Flour</td>
<td style="text-align: center;">0.28</td>
<td style="text-align: center;">0.33</td>
<td style="text-align: center;">0.12</td>
<td style="text-align: center;"><strong>0.23</strong></td>
<td style="text-align: left;">Maida</td>
<td style="text-align: center;">0.28</td>
<td style="text-align: center;">0.48</td>
<td style="text-align: center;">0.12</td>
<td style="text-align: center;"><strong>0.28</strong></td>
<td style="text-align: center;">✓*</td>
</tr>
<tr class="odd">
<td style="text-align: center;">3</td>
<td style="text-align: left;">Maida</td>
<td style="text-align: center;">0.28</td>
<td style="text-align: center;">0.48</td>
<td style="text-align: center;">0.12</td>
<td style="text-align: center;"><strong>0.28</strong></td>
<td style="text-align: left;">Native Starch</td>
<td style="text-align: center;">0.49</td>
<td style="text-align: center;">0.60</td>
<td style="text-align: center;">0.55</td>
<td style="text-align: center;"><strong>0.55</strong></td>
<td style="text-align: center;">✓</td>
</tr>
<tr class="even">
<td style="text-align: center;">4</td>
<td style="text-align: left;">Sliced Onion</td>
<td style="text-align: center;">0.15</td>
<td style="text-align: center;">0.10</td>
<td style="text-align: center;">0.12</td>
<td style="text-align: center;"><strong>0.12</strong></td>
<td style="text-align: left;">Onion Powder</td>
<td style="text-align: center;">0.58</td>
<td style="text-align: center;">0.42</td>
<td style="text-align: center;">0.18</td>
<td style="text-align: center;"><strong>0.37</strong></td>
<td style="text-align: center;">✓</td>
</tr>
<tr class="odd">
<td style="text-align: center;">5</td>
<td style="text-align: left;">Raw Milk</td>
<td style="text-align: center;">0.12</td>
<td style="text-align: center;">0.05</td>
<td style="text-align: center;">0.12</td>
<td style="text-align: center;"><strong>0.10</strong></td>
<td style="text-align: left;">Pasteurised Milk</td>
<td style="text-align: center;">0.48</td>
<td style="text-align: center;">0.05</td>
<td style="text-align: center;">0.12</td>
<td style="text-align: center;"><strong>0.21</strong></td>
<td style="text-align: center;">✓</td>
</tr>
<tr class="even">
<td style="text-align: center;">6</td>
<td style="text-align: left;">Fresh Fruit</td>
<td style="text-align: center;">0.12</td>
<td style="text-align: center;">0.05</td>
<td style="text-align: center;">0.12</td>
<td style="text-align: center;"><strong>0.10</strong></td>
<td style="text-align: left;">Dehydrated Fruit</td>
<td style="text-align: center;">0.58</td>
<td style="text-align: center;">0.36</td>
<td style="text-align: center;">0.15</td>
<td style="text-align: center;"><strong>0.34</strong></td>
<td style="text-align: center;">✓</td>
</tr>
<tr class="odd">
<td style="text-align: center;">7</td>
<td style="text-align: left;">Raw Honey</td>
<td style="text-align: center;">0.12</td>
<td style="text-align: center;">0.05</td>
<td style="text-align: center;">0.12</td>
<td style="text-align: center;"><strong>0.10</strong></td>
<td style="text-align: left;">Pasteurised Honey</td>
<td style="text-align: center;">0.48</td>
<td style="text-align: center;">0.05</td>
<td style="text-align: center;">0.12</td>
<td style="text-align: center;"><strong>0.21</strong></td>
<td style="text-align: center;">✓</td>
</tr>
<tr class="even">
<td style="text-align: center;">8</td>
<td style="text-align: left;">Cold Pressed Oil</td>
<td style="text-align: center;">0.32</td>
<td style="text-align: center;">0.70</td>
<td style="text-align: center;">0.22</td>
<td style="text-align: center;"><strong>0.39</strong></td>
<td style="text-align: left;">Refined Oil</td>
<td style="text-align: center;">0.75</td>
<td style="text-align: center;">0.70</td>
<td style="text-align: center;">0.22</td>
<td style="text-align: center;"><strong>0.52</strong></td>
<td style="text-align: center;">✓</td>
</tr>
<tr class="odd">
<td style="text-align: center;">9</td>
<td style="text-align: left;">Butter</td>
<td style="text-align: center;">0.45</td>
<td style="text-align: center;">0.72</td>
<td style="text-align: center;">0.22</td>
<td style="text-align: center;"><strong>0.44</strong></td>
<td style="text-align: left;">Ghee</td>
<td style="text-align: center;">0.55</td>
<td style="text-align: center;">0.72</td>
<td style="text-align: center;">0.22</td>
<td style="text-align: center;"><strong>0.47</strong></td>
<td style="text-align: center;">✓</td>
</tr>
<tr class="even">
<td style="text-align: center;">10</td>
<td style="text-align: left;">Ghee</td>
<td style="text-align: center;">0.55</td>
<td style="text-align: center;">0.72</td>
<td style="text-align: center;">0.22</td>
<td style="text-align: center;"><strong>0.47</strong></td>
<td style="text-align: left;">Anh. Milk Fat</td>
<td style="text-align: center;">0.82</td>
<td style="text-align: center;">0.72</td>
<td style="text-align: center;">0.82</td>
<td style="text-align: center;"><strong>0.79</strong></td>
<td style="text-align: center;">✓</td>
</tr>
<tr class="odd">
<td style="text-align: center;">11</td>
<td style="text-align: left;">Liquid Veg. Oil</td>
<td style="text-align: center;">0.75</td>
<td style="text-align: center;">0.70</td>
<td style="text-align: center;">0.22</td>
<td style="text-align: center;"><strong>0.52</strong></td>
<td style="text-align: left;">Vanaspati</td>
<td style="text-align: center;">0.92</td>
<td style="text-align: center;">0.72</td>
<td style="text-align: center;">0.55</td>
<td style="text-align: center;"><strong>0.71</strong></td>
<td style="text-align: center;">✓</td>
</tr>
<tr class="even">
<td style="text-align: center;">12</td>
<td style="text-align: left;">Vanaspati</td>
<td style="text-align: center;">0.92</td>
<td style="text-align: center;">0.72</td>
<td style="text-align: center;">0.55</td>
<td style="text-align: center;"><strong>0.71</strong></td>
<td style="text-align: left;">Interester. Fat</td>
<td style="text-align: center;">0.91</td>
<td style="text-align: center;">0.72</td>
<td style="text-align: center;">0.82</td>
<td style="text-align: center;"><strong>0.82</strong></td>
<td style="text-align: center;">✓</td>
</tr>
<tr class="odd">
<td style="text-align: center;">13</td>
<td style="text-align: left;">Milk</td>
<td style="text-align: center;">0.12</td>
<td style="text-align: center;">0.05</td>
<td style="text-align: center;">0.12</td>
<td style="text-align: center;"><strong>0.10</strong></td>
<td style="text-align: left;">Dairy Whitener</td>
<td style="text-align: center;">0.48</td>
<td style="text-align: center;">0.42</td>
<td style="text-align: center;">0.85</td>
<td style="text-align: center;"><strong>0.61</strong></td>
<td style="text-align: center;">✓</td>
</tr>
<tr class="even">
<td style="text-align: center;">14</td>
<td style="text-align: left;">Coconut Milk</td>
<td style="text-align: center;">0.28</td>
<td style="text-align: center;">0.25</td>
<td style="text-align: center;">0.12</td>
<td style="text-align: center;"><strong>0.21</strong></td>
<td style="text-align: left;">Coconut Oil</td>
<td style="text-align: center;">0.32</td>
<td style="text-align: center;">0.70</td>
<td style="text-align: center;">0.22</td>
<td style="text-align: center;"><strong>0.39</strong></td>
<td style="text-align: center;">✓</td>
</tr>
<tr class="odd">
<td style="text-align: center;">15</td>
<td style="text-align: left;">Raw Milk</td>
<td style="text-align: center;">0.12</td>
<td style="text-align: center;">0.05</td>
<td style="text-align: center;">0.12</td>
<td style="text-align: center;"><strong>0.10</strong></td>
<td style="text-align: left;">Yogurt/Curd</td>
<td style="text-align: center;">0.56</td>
<td style="text-align: center;">0.38</td>
<td style="text-align: center;">0.15</td>
<td style="text-align: center;"><strong>0.34</strong></td>
<td style="text-align: center;">✓</td>
</tr>
<tr class="even">
<td style="text-align: center;">16</td>
<td style="text-align: left;">Curd</td>
<td style="text-align: center;">0.56</td>
<td style="text-align: center;">0.38</td>
<td style="text-align: center;">0.15</td>
<td style="text-align: center;"><strong>0.34</strong></td>
<td style="text-align: left;">Soy Dahi</td>
<td style="text-align: center;">0.56</td>
<td style="text-align: center;">0.57</td>
<td style="text-align: center;">0.85</td>
<td style="text-align: center;"><strong>0.68</strong></td>
<td style="text-align: center;">✓</td>
</tr>
<tr class="odd">
<td style="text-align: center;">17</td>
<td style="text-align: left;">Fruit Juice</td>
<td style="text-align: center;">0.28</td>
<td style="text-align: center;">0.50</td>
<td style="text-align: center;">0.12</td>
<td style="text-align: center;"><strong>0.28</strong></td>
<td style="text-align: left;">Fruit Vinegar</td>
<td style="text-align: center;">0.56</td>
<td style="text-align: center;">0.50</td>
<td style="text-align: center;">0.18</td>
<td style="text-align: center;"><strong>0.39</strong></td>
<td style="text-align: center;">✓</td>
</tr>
<tr class="even">
<td style="text-align: center;">18</td>
<td style="text-align: left;">Vinegar</td>
<td style="text-align: center;">0.56</td>
<td style="text-align: center;">0.50</td>
<td style="text-align: center;">0.45</td>
<td style="text-align: center;"><strong>0.50</strong></td>
<td style="text-align: left;">Glacial Acetic Acid</td>
<td style="text-align: center;">0.99</td>
<td style="text-align: center;">0.98</td>
<td style="text-align: center;">0.99</td>
<td style="text-align: center;"><strong>0.99</strong></td>
<td style="text-align: center;">✓</td>
</tr>
<tr class="odd">
<td style="text-align: center;">19</td>
<td style="text-align: left;">Cane Sugar</td>
<td style="text-align: center;">0.55</td>
<td style="text-align: center;">0.98</td>
<td style="text-align: center;">0.55</td>
<td style="text-align: center;"><strong>0.68</strong></td>
<td style="text-align: left;">Xanthan Gum</td>
<td style="text-align: center;">0.56</td>
<td style="text-align: center;">0.98</td>
<td style="text-align: center;">0.88</td>
<td style="text-align: center;"><strong>0.81</strong></td>
<td style="text-align: center;">✓</td>
</tr>
<tr class="even">
<td style="text-align: center;">20</td>
<td style="text-align: left;">Natural Yeast</td>
<td style="text-align: center;">0.12</td>
<td style="text-align: center;">0.05</td>
<td style="text-align: center;">0.12</td>
<td style="text-align: center;"><strong>0.10</strong></td>
<td style="text-align: left;">Chem. Leavening</td>
<td style="text-align: center;">0.99</td>
<td style="text-align: center;">0.99</td>
<td style="text-align: center;">0.99</td>
<td style="text-align: center;"><strong>0.99</strong></td>
<td style="text-align: center;">✓</td>
</tr>
<tr class="odd">
<td style="text-align: center;">21</td>
<td style="text-align: left;">Wheat Flour</td>
<td style="text-align: center;">0.28</td>
<td style="text-align: center;">0.33</td>
<td style="text-align: center;">0.12</td>
<td style="text-align: center;"><strong>0.23</strong></td>
<td style="text-align: left;">Maltodextrin</td>
<td style="text-align: center;">0.58</td>
<td style="text-align: center;">0.98</td>
<td style="text-align: center;">0.85</td>
<td style="text-align: center;"><strong>0.81</strong></td>
<td style="text-align: center;">✓</td>
</tr>
<tr class="even">
<td style="text-align: center;">22</td>
<td style="text-align: left;">Native Starch</td>
<td style="text-align: center;">0.49</td>
<td style="text-align: center;">0.60</td>
<td style="text-align: center;">0.55</td>
<td style="text-align: center;"><strong>0.55</strong></td>
<td style="text-align: left;">Modified Starch</td>
<td style="text-align: center;">0.94</td>
<td style="text-align: center;">0.96</td>
<td style="text-align: center;">0.82</td>
<td style="text-align: center;"><strong>0.90</strong></td>
<td style="text-align: center;">✓</td>
</tr>
<tr class="odd">
<td style="text-align: center;">23</td>
<td style="text-align: left;">Whole Soya Bean</td>
<td style="text-align: center;">0.12</td>
<td style="text-align: center;">0.05</td>
<td style="text-align: center;">0.12</td>
<td style="text-align: center;"><strong>0.10</strong></td>
<td style="text-align: left;">Soya Lecithin</td>
<td style="text-align: center;">0.82</td>
<td style="text-align: center;">0.89</td>
<td style="text-align: center;">0.82</td>
<td style="text-align: center;"><strong>0.84</strong></td>
<td style="text-align: center;">✓</td>
</tr>
<tr class="even">
<td style="text-align: center;">24</td>
<td style="text-align: left;">Cane Sugar</td>
<td style="text-align: center;">0.55</td>
<td style="text-align: center;">0.98</td>
<td style="text-align: center;">0.55</td>
<td style="text-align: center;"><strong>0.68</strong></td>
<td style="text-align: left;">HFCS</td>
<td style="text-align: center;">0.91</td>
<td style="text-align: center;">0.99</td>
<td style="text-align: center;">0.85</td>
<td style="text-align: center;"><strong>0.91</strong></td>
<td style="text-align: center;">✓</td>
</tr>
<tr class="odd">
<td style="text-align: center;">25</td>
<td style="text-align: left;">Vanilla Bean</td>
<td style="text-align: center;">0.12</td>
<td style="text-align: center;">0.05</td>
<td style="text-align: center;">0.12</td>
<td style="text-align: center;"><strong>0.10</strong></td>
<td style="text-align: left;">Natural Extract</td>
<td style="text-align: center;">0.86</td>
<td style="text-align: center;">0.86</td>
<td style="text-align: center;">0.60</td>
<td style="text-align: center;"><strong>0.76</strong></td>
<td style="text-align: center;">✓</td>
</tr>
<tr class="even">
<td style="text-align: center;">26</td>
<td style="text-align: left;">Natural Extract</td>
<td style="text-align: center;">0.86</td>
<td style="text-align: center;">0.86</td>
<td style="text-align: center;">0.60</td>
<td style="text-align: center;"><strong>0.76</strong></td>
<td style="text-align: left;">Syn. Vanillin</td>
<td style="text-align: center;">0.98</td>
<td style="text-align: center;">0.98</td>
<td style="text-align: center;">0.99</td>
<td style="text-align: center;"><strong>0.98</strong></td>
<td style="text-align: center;">✓</td>
</tr>
<tr class="odd">
<td style="text-align: center;">27</td>
<td style="text-align: left;">Chocolate</td>
<td style="text-align: center;">0.58</td>
<td style="text-align: center;">0.72</td>
<td style="text-align: center;">0.22</td>
<td style="text-align: center;"><strong>0.48</strong></td>
<td style="text-align: left;">Choc. Substitute</td>
<td style="text-align: center;">0.91</td>
<td style="text-align: center;">0.72</td>
<td style="text-align: center;">0.85</td>
<td style="text-align: center;"><strong>0.83</strong></td>
<td style="text-align: center;">✓</td>
</tr>
<tr class="even">
<td style="text-align: center;">28</td>
<td style="text-align: left;">Natural Fibre</td>
<td style="text-align: center;">0.28</td>
<td style="text-align: center;">0.55</td>
<td style="text-align: center;">0.12</td>
<td style="text-align: center;"><strong>0.30</strong></td>
<td style="text-align: left;">Purified Cellulose</td>
<td style="text-align: center;">0.82</td>
<td style="text-align: center;">0.98</td>
<td style="text-align: center;">0.88</td>
<td style="text-align: center;"><strong>0.89</strong></td>
<td style="text-align: center;">✓</td>
</tr>
<tr class="odd">
<td style="text-align: center;">29</td>
<td style="text-align: left;">Cane Sugar</td>
<td style="text-align: center;">0.55</td>
<td style="text-align: center;">0.98</td>
<td style="text-align: center;">0.55</td>
<td style="text-align: center;"><strong>0.68</strong></td>
<td style="text-align: left;">Aspartame</td>
<td style="text-align: center;">0.99</td>
<td style="text-align: center;">0.99</td>
<td style="text-align: center;">0.99</td>
<td style="text-align: center;"><strong>0.99</strong></td>
<td style="text-align: center;">✓</td>
</tr>
<tr class="even">
<td style="text-align: center;">30</td>
<td style="text-align: left;">Sea Salt</td>
<td style="text-align: center;">0.12</td>
<td style="text-align: center;">0.98</td>
<td style="text-align: center;">0.12</td>
<td style="text-align: center;"><strong>0.38</strong></td>
<td style="text-align: left;">Sodium Benzoate</td>
<td style="text-align: center;">0.99</td>
<td style="text-align: center;">0.99</td>
<td style="text-align: center;">0.99</td>
<td style="text-align: center;"><strong>0.99</strong></td>
<td style="text-align: center;">✓</td>
</tr>
<tr class="odd">
<td style="text-align: center;">31</td>
<td style="text-align: left;">Guar Gum</td>
<td style="text-align: center;">0.82</td>
<td style="text-align: center;">0.86</td>
<td style="text-align: center;">0.88</td>
<td style="text-align: center;"><strong>0.86</strong></td>
<td style="text-align: left;">Cereal Flour</td>
<td style="text-align: center;">0.28</td>
<td style="text-align: center;">0.33</td>
<td style="text-align: center;">0.12</td>
<td style="text-align: center;"><strong>0.23</strong></td>
<td style="text-align: center;">✓</td>
</tr>
<tr class="even">
<td style="text-align: center;">32</td>
<td style="text-align: left;">Lemon Juice</td>
<td style="text-align: center;">0.28</td>
<td style="text-align: center;">0.50</td>
<td style="text-align: center;">0.12</td>
<td style="text-align: center;"><strong>0.28</strong></td>
<td style="text-align: left;">Citric Acid</td>
<td style="text-align: center;">0.99</td>
<td style="text-align: center;">0.99</td>
<td style="text-align: center;">0.99</td>
<td style="text-align: center;"><strong>0.99</strong></td>
<td style="text-align: center;">✓</td>
</tr>
<tr class="odd">
<td style="text-align: center;">33</td>
<td style="text-align: left;">Smoked Meat</td>
<td style="text-align: center;">0.58</td>
<td style="text-align: center;">0.10</td>
<td style="text-align: center;">0.12</td>
<td style="text-align: center;"><strong>0.25</strong></td>
<td style="text-align: left;">Liquid Smoke</td>
<td style="text-align: center;">0.86</td>
<td style="text-align: center;">0.88</td>
<td style="text-align: center;">0.99</td>
<td style="text-align: center;"><strong>0.92</strong></td>
<td style="text-align: center;">✓</td>
</tr>
<tr class="even">
<td style="text-align: center;">34</td>
<td style="text-align: left;">Nat. β-Carotene</td>
<td style="text-align: center;">0.86</td>
<td style="text-align: center;">0.86</td>
<td style="text-align: center;">0.85</td>
<td style="text-align: center;"><strong>0.86</strong></td>
<td style="text-align: left;">Syn. β-Carotene</td>
<td style="text-align: center;">0.99</td>
<td style="text-align: center;">0.99</td>
<td style="text-align: center;">0.99</td>
<td style="text-align: center;"><strong>0.99</strong></td>
<td style="text-align: center;">✓</td>
</tr>
<tr class="odd">
<td style="text-align: center;">35</td>
<td style="text-align: left;">Bulk Ingredient</td>
<td style="text-align: center;">0.12</td>
<td style="text-align: center;">0.05</td>
<td style="text-align: center;">0.12</td>
<td style="text-align: center;"><strong>0.10</strong></td>
<td style="text-align: left;">INS Carrier</td>
<td style="text-align: center;">0.99</td>
<td style="text-align: center;">0.99</td>
<td style="text-align: center;">0.99</td>
<td style="text-align: center;"><strong>0.99</strong></td>
<td style="text-align: center;">✓</td>
</tr>
</tbody>
</table>
</div>
</figure>
</div>
<p>*Test 2 produces both members in Zone 1 (<img src="https://latex.codecogs.com/png.latex?D%20=%200.23"> and <img src="https://latex.codecogs.com/png.latex?D%20=%200.28">), but with a <img src="https://latex.codecogs.com/png.latex?D"> difference of 0.05 sufficient to warrant distinct canonical entries given that FSSAI product standards 2.4.1 and 2.4.2 explicitly define them as separate regulated commodities. The model correctly does not over-differentiate them into separate zones while still producing operationally distinct canonical assignments.</p>
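<p>The printed <code>D</code> values in the table can be cross-checked against their coordinates. The following is a minimal sketch, assuming the weighted sum used in the worked validations (0.3 for E, 0.3 for M, 0.4 for F) and two-decimal rounding of the printed scores. The function names, the tolerance, and the choice of rows are illustrative and not part of the report's method.</p>

```python
# Weights as used in the report's worked validations: 0.3 (E), 0.3 (M), 0.4 (F).
W_E, W_M, W_F = 0.3, 0.3, 0.4


def divorce_score(e, m, f):
    """Weighted sum D = 0.3*E + 0.3*M + 0.4*F."""
    return W_E * e + W_M * m + W_F * f


def matches_printed(e, m, f, printed_d, tol=0.005):
    """True if the recomputed D agrees with a two-decimal printed value.

    tol=0.005 is half a unit in the second decimal place; the extra 1e-9
    absorbs floating-point noise. Both are illustrative choices.
    """
    return abs(divorce_score(e, m, f) - printed_d) <= tol + 1e-9


# (label, E, M, F, printed D) copied from a few rows of the benchmark table.
ROWS = [
    ("Cold Pressed Oil", 0.32, 0.70, 0.22, 0.39),
    ("Vanaspati", 0.92, 0.72, 0.55, 0.71),
    ("Native Starch", 0.49, 0.60, 0.55, 0.55),
    ("Soya Lecithin", 0.82, 0.89, 0.82, 0.84),
]

for label, e, m, f, printed in ROWS:
    status = "ok" if matches_printed(e, m, f, printed) else "CHECK"
    d = divorce_score(e, m, f)
    print(f"{label}: recomputed {d:.3f}, printed {printed:.2f} ({status})")
```

<p>Extending <code>ROWS</code> to all 35 pairs would flag any cell whose printed score does not follow from its coordinates.</p>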
</section>
<section id="worked-validations" class="level2 page-columns page-full" data-number="8.3">
<h2 data-number="8.3" class="anchored" data-anchor-id="worked-validations"><span class="header-section-number">8.3</span> Worked Validations</h2>
<p>Six cases illustrate the model’s discriminatory performance across the range of the benchmark, with all calculations drawn directly from Table&nbsp;2 and Table&nbsp;3.</p>
<section id="test-1-raw-apple-vs.-chilled-apple-floor-test" class="level3" data-number="8.3.1">
<h3 data-number="8.3.1" class="anchored" data-anchor-id="test-1-raw-apple-vs.-chilled-apple-floor-test"><span class="header-section-number">8.3.1</span> Test 1: Raw Apple vs.&nbsp;Chilled Apple (Floor Test)</h3>
<p>Raw apple: sorting and washing only, intact cellular structure, consumed as food without functional class designation. <img src="https://latex.codecogs.com/png.latex?E_A%20=%200.12,%5Cquad%20M_A%20=%200.05,%5Cquad%20F_A%20=%200.12"> <img src="https://latex.codecogs.com/png.latex?D_A%20=%200.3(0.12)%20+%200.3(0.05)%20+%200.4(0.12)%20=%200.036%20+%200.015%20+%200.048%20=%20%5Cmathbf%7B0.099%7D"></p>
<p>Chilled apple: refrigeration added to sorting and washing, no change in material state or regulatory designation. <img src="https://latex.codecogs.com/png.latex?E_B%20=%200.18,%5Cquad%20M_B%20=%200.05,%5Cquad%20F_B%20=%200.12"> <img src="https://latex.codecogs.com/png.latex?D_B%20=%200.3(0.18)%20+%200.3(0.05)%20+%200.4(0.12)%20=%200.054%20+%200.015%20+%200.048%20=%20%5Cmathbf%7B0.117%7D"></p>
<p>Both are Zone 1 variants; the <img src="https://latex.codecogs.com/png.latex?D"> difference of 0.018 is below the threshold for canonical distinction. The framework correctly does not treat chilling as an identity-changing event. <strong>Discrimination: correct.</strong></p>
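<p>The arithmetic above can be reproduced term by term. This is a minimal sketch, assuming only the weights shown in the calculation; <code>Breakdown</code> and <code>score_breakdown</code> are hypothetical names introduced for illustration.</p>

```python
from typing import NamedTuple


class Breakdown(NamedTuple):
    """Weighted contribution of each axis, plus their sum (the D score)."""
    e_term: float
    m_term: float
    f_term: float
    total: float


def score_breakdown(e, m, f, weights=(0.3, 0.3, 0.4)):
    """Return the three weighted terms and D, mirroring the worked arithmetic."""
    w_e, w_m, w_f = weights
    e_term, m_term, f_term = w_e * e, w_m * m, w_f * f
    return Breakdown(e_term, m_term, f_term, e_term + m_term + f_term)


# Coordinates taken from Test 1.
raw = score_breakdown(0.12, 0.05, 0.12)
chilled = score_breakdown(0.18, 0.05, 0.12)

print(f"raw apple:     {raw.e_term:.3f} + {raw.m_term:.3f} + {raw.f_term:.3f} = {raw.total:.3f}")
print(f"chilled apple: {chilled.e_term:.3f} + {chilled.m_term:.3f} + {chilled.f_term:.3f} = {chilled.total:.3f}")
print(f"difference:    {chilled.total - raw.total:.3f}")
```

<p>Only the <code>E</code> term changes between the two members, which is why the difference equals 0.3 times the change in <code>E</code>.</p>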
</section>
<section id="test-8-cold-pressed-oil-vs.-refined-oil" class="level3" data-number="8.3.2">
<h3 data-number="8.3.2" class="anchored" data-anchor-id="test-8-cold-pressed-oil-vs.-refined-oil"><span class="header-section-number">8.3.2</span> Test 8: Cold-Pressed Oil vs.&nbsp;Refined Oil</h3>
<p>Cold-pressed sesame oil (see worked zone assignment in Section&nbsp;7): <img src="https://latex.codecogs.com/png.latex?D_A%20=%200.394">, Zone 2.</p>
<p>Refined sesame oil: refining adds deacidification, bleaching, and deodorisation to the cold-pressing process. <img src="https://latex.codecogs.com/png.latex?E_B%20=%200.75,%5Cquad%20M_B%20=%200.70,%5Cquad%20F_B%20=%200.22"> <img src="https://latex.codecogs.com/png.latex?D_B%20=%200.3(0.75)%20+%200.3(0.70)%20+%200.4(0.22)%20=%200.225%20+%200.210%20+%200.088%20=%20%5Cmathbf%7B0.523%7D"></p>
<p>Both are Zone 2 (Independent Canon), which is correct: both are regulatory-named oils with source-primary identity. However, their <img src="https://latex.codecogs.com/png.latex?D"> scores differ by 0.129 and their <img src="https://latex.codecogs.com/png.latex?E"> values differ by 0.43, producing distinct canonical entries. Codex CXS 19-1981 and FSSAI both recognise these as separate product designations. <strong>Discrimination: correct.</strong></p>
</section>
<section id="test-11-liquid-vegetable-oil-vs.-vanaspati" class="level3" data-number="8.3.3">
<h3 data-number="8.3.3" class="anchored" data-anchor-id="test-11-liquid-vegetable-oil-vs.-vanaspati"><span class="header-section-number">8.3.3</span> Test 11: Liquid Vegetable Oil vs.&nbsp;Vanaspati</h3>
<p>Refined liquid vegetable oil: <img src="https://latex.codecogs.com/png.latex?D_A%20=%200.523">, Zone 2.</p>
<p>Vanaspati—hydrogenated vegetable oil with mandatory trans fat disclosure, HS 1516: <img src="https://latex.codecogs.com/png.latex?E_B%20=%200.92,%5Cquad%20M_B%20=%200.72,%5Cquad%20F_B%20=%200.55"> <img src="https://latex.codecogs.com/png.latex?D_B%20=%200.3(0.92)%20+%200.3(0.72)%20+%200.4(0.55)%20=%200.276%20+%200.216%20+%200.220%20=%20%5Cmathbf%7B0.712%7D"></p>
<p>Liquid oil is Zone 2; vanaspati sits at the Zone 2/Zone 3 boundary. The <img src="https://latex.codecogs.com/png.latex?F%20=%200.55"> reflects that “vanaspati” retains its FSSAI-defined product name with source-retaining naming, holding it at the upper edge of Zone 2 rather than crossing into Zone 3. The two members receive different canons, and the boundary position itself carries analytical meaning about vanaspati’s status as a product that is heavily transformed yet still primarily identified by its food-commodity name. <strong>Discrimination: correct.</strong></p>
</section>
<section id="test-18-vinegar-vs.-glacial-acetic-acid" class="level3" data-number="8.3.4">
<h3 data-number="8.3.4" class="anchored" data-anchor-id="test-18-vinegar-vs.-glacial-acetic-acid"><span class="header-section-number">8.3.4</span> Test 18: Vinegar vs.&nbsp;Glacial Acetic Acid</h3>
<p>Brewed vinegar: double fermentation from agricultural substrate, classified under HS 2209 (Chapter 22, beverages/vinegar), retaining biological origin in product name. <img src="https://latex.codecogs.com/png.latex?E_A%20=%200.56,%5Cquad%20M_A%20=%200.50,%5Cquad%20F_A%20=%200.45"> <img src="https://latex.codecogs.com/png.latex?D_A%20=%200.3(0.56)%20+%200.3(0.50)%20+%200.4(0.45)%20=%200.168%20+%200.150%20+%200.180%20=%20%5Cmathbf%7B0.498%7D"></p>
<p>Glacial acetic acid—petrochemical synthesis, Chapter 29 (organic chemicals), FSSAI requires “SYNTHETIC – PREPARED FROM ACETIC ACID” labelling: <img src="https://latex.codecogs.com/png.latex?E_B%20=%200.99,%5Cquad%20M_B%20=%200.98,%5Cquad%20F_B%20=%200.99"> <img src="https://latex.codecogs.com/png.latex?D_B%20=%200.3(0.99)%20+%200.3(0.98)%20+%200.4(0.99)%20=%200.297%20+%200.294%20+%200.396%20=%20%5Cmathbf%7B0.987%7D"></p>
<p>Vinegar is Zone 2 (Independent Canon); glacial acetic acid is Zone 3 (Functional Tool). The HS chapter migration from 22 to 29 and the FSSAI mandatory labelling distinction are both fully captured. <strong>Discrimination: correct.</strong></p>
</section>
<section id="test-22-native-starch-vs.-modified-starch" class="level3" data-number="8.3.5">
<h3 data-number="8.3.5" class="anchored" data-anchor-id="test-22-native-starch-vs.-modified-starch"><span class="header-section-number">8.3.5</span> Test 22: Native Starch vs.&nbsp;Modified Starch</h3>
<p>Native wheat starch: starch isolation within HS Chapter 11. <img src="https://latex.codecogs.com/png.latex?E_A%20=%200.49,%5Cquad%20M_A%20=%200.60,%5Cquad%20F_A%20=%200.55"> <img src="https://latex.codecogs.com/png.latex?D_A%20=%200.3(0.49)%20+%200.3(0.60)%20+%200.4(0.55)%20=%200.147%20+%200.180%20+%200.220%20=%20%5Cmathbf%7B0.547%7D"></p>
<p>Acetylated distarch adipate (INS 1422): covalent modification, HS Chapter 35, FSSAI additive schedule. <img src="https://latex.codecogs.com/png.latex?E_B%20=%200.94,%5Cquad%20M_B%20=%200.96,%5Cquad%20F_B%20=%200.82"> <img src="https://latex.codecogs.com/png.latex?D_B%20=%200.3(0.94)%20+%200.3(0.96)%20+%200.4(0.82)%20=%200.282%20+%200.288%20+%200.328%20=%20%5Cmathbf%7B0.898%7D"></p>
<p>Native starch is Zone 2; modified starch is Zone 3. The HS chapter migration from 11 to 35—the identity snap discussed in Section&nbsp;5—is faithfully represented by the zone transition. <strong>Discrimination: correct.</strong></p>
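<p>A pairwise comparison of this kind can be sketched as follows. The sketch assumes that zones are read directly off <code>D</code> using the thresholds 0.30 and 0.70, and that a score exactly on a threshold is assigned to the higher zone. That tie rule is an assumption made here, and the full zone assignment procedure may involve considerations beyond <code>D</code> alone. All function names are hypothetical.</p>

```python
ZONE_1_UPPER = 0.30  # threshold between Zone 1 and Zone 2
ZONE_2_UPPER = 0.70  # threshold between Zone 2 and Zone 3


def divorce_score(e, m, f):
    """D = 0.3*E + 0.3*M + 0.4*F, as in the worked validations."""
    return 0.3 * e + 0.3 * m + 0.4 * f


def assign_zone(d):
    """Map a D score to Zone 1, 2 or 3.

    Assumption: a score exactly on a threshold goes to the higher zone.
    """
    if d >= ZONE_2_UPPER:
        return 3
    if d >= ZONE_1_UPPER:
        return 2
    return 1


def compare_pair(a, b):
    """Return (zone of a, zone of b, absolute D difference) for two (E, M, F) tuples."""
    d_a, d_b = divorce_score(*a), divorce_score(*b)
    return assign_zone(d_a), assign_zone(d_b), abs(d_b - d_a)


# Coordinates taken from Test 22.
native_starch = (0.49, 0.60, 0.55)
modified_starch = (0.94, 0.96, 0.82)

zone_a, zone_b, gap = compare_pair(native_starch, modified_starch)
print(f"native starch: Zone {zone_a}; modified starch: Zone {zone_b}; D gap = {gap:.3f}")
```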
</section>
<section id="test-23-whole-soya-bean-vs.-soya-lecithin" class="level3 page-columns page-full" data-number="8.3.6">
<h3 data-number="8.3.6" class="anchored" data-anchor-id="test-23-whole-soya-bean-vs.-soya-lecithin"><span class="header-section-number">8.3.6</span> Test 23: Whole Soya Bean vs.&nbsp;Soya Lecithin</h3>
<p>Whole soya bean: minimal processing, intact biological matrix. <img src="https://latex.codecogs.com/png.latex?E_A%20=%200.12,%5Cquad%20M_A%20=%200.05,%5Cquad%20F_A%20=%200.12"> <img src="https://latex.codecogs.com/png.latex?D_A%20=%200.3(0.12)%20+%200.3(0.05)%20+%200.4(0.12)%20=%200.036%20+%200.015%20+%200.048%20=%20%5Cmathbf%7B0.099%7D"></p>
<p>Soya lecithin (see worked zone assignment in Section&nbsp;7): <img src="https://latex.codecogs.com/png.latex?D_B%20=%200.841">, Zone 3.</p>
<p>The <img src="https://latex.codecogs.com/png.latex?D"> difference of 0.742 represents near-maximal discrimination: the pair moves from a Zone 1 variant to a Zone 3 functional tool, driven by divergence on all three axes. Allergen metadata attaches to the lecithin canonical entry because soy origin disclosure is required under Regulation 5(14),<sup>12</sup> demonstrating that Zone 3 classification does not eliminate source tracking where legally required. <strong>Discrimination: correct.</strong></p>
<div class="no-row-height column-margin column-container"><div id="fn12"><p><sup>12</sup>&nbsp;FSSAI Labelling Regulations 2020, Regulation 5(14).</p></div></div></section>
</section>
<section id="determinism-quotient" class="level2" data-number="8.4">
<h2 data-number="8.4" class="anchored" data-anchor-id="determinism-quotient"><span class="header-section-number">8.4</span> Determinism Quotient</h2>
<p>All 35 benchmark pairs yield correct discriminations under the model as specified. The Determinism Quotient is:</p>
<p><img src="https://latex.codecogs.com/png.latex?DQ%20=%20%5Cfrac%7B35%7D%7B35%7D%20=%20%5Cmathbf%7B1.0%7D"></p>
<p>Test 2 carries an asterisk because both members fall in Zone 1; the discrimination is achieved through <img src="https://latex.codecogs.com/png.latex?D"> magnitude rather than by crossing a zone boundary. This is treated as a correct discrimination because the framework is designed to produce sub-zone canonical distinctions where regulatory instruments independently require them, as they do for Whole Wheat Flour versus Maida under FSSAI product standards 2.4.1 and 2.4.2.</p>
<p>Three cases identified during validation require calibration attention in subsequent versions: Test 10 (Ghee vs.&nbsp;Anhydrous Milk Fat, where the <img src="https://latex.codecogs.com/png.latex?F"> score assignment for AMF warrants review against Codex CXS 280-1973 standards), Test 16 (Curd vs.&nbsp;Soy Dahi, where the analogue detection relies on <img src="https://latex.codecogs.com/png.latex?F"> capturing the “non-biological source” signal, suggesting a future source-metadata extension), and Test 34 (Natural vs.&nbsp;Synthetic Beta-Carotene, where molecular identity is identical but source coordinate differs—a case where structured source metadata would strengthen the model’s discriminatory basis). These are areas for refinement, not failures; the model produces correct outputs in all three cases under the current specification.</p>
</section>
</section>
<section id="sec-ch-nova" class="level1" data-number="9">
<h1 data-number="9"><span class="header-section-number">9</span> Relationship to Existing Food Classification Frameworks</h1>
<section id="nova-and-ingredient-level-substrates" class="level2" data-number="9.1">
<h2 data-number="9.1" class="anchored" data-anchor-id="nova-and-ingredient-level-substrates"><span class="header-section-number">9.1</span> NOVA and Ingredient-Level Substrates</h2>
<p>The NOVA system classifies food products into four groups based on the extent and purpose of industrial food processing <span class="citation" data-cites="Arora_MLProcessing_2025 Ispirova_Informatics_2025">(<span class="nocase">Arora et al.</span> 2025; Ispirova et al. 2025)</span>. NOVA Group 4 (ultra-processed foods) is defined by reference to industrial processing and the presence of ingredients typically used only in industrial production, many of which correspond to Zone 3 of the <img src="https://latex.codecogs.com/png.latex?E">–<img src="https://latex.codecogs.com/png.latex?M">–<img src="https://latex.codecogs.com/png.latex?F"> model.</p>
<p>NOVA operates at the product level: given a complete food product, it classifies the product by the nature of its processing. The <img src="https://latex.codecogs.com/png.latex?E">–<img src="https://latex.codecogs.com/png.latex?M">–<img src="https://latex.codecogs.com/png.latex?F"> model operates at the ingredient level: given an individual ingredient string, it assigns that ingredient a deterministic identity position. Product-level classification requires reliable ingredient-level classification as its substrate, and recent machine learning work applying NOVA to large datasets has encountered precisely this bottleneck: the absence of a principled ingredient-level scheme limits the accuracy and consistency of product-level predictions <span class="citation" data-cites="Arora_MLProcessing_2025 Ispirova_Informatics_2025">(<span class="nocase">Arora et al.</span> 2025; Ispirova et al. 2025)</span>.</p>
<p>The correspondence between the two frameworks is not coincidental—both are responding to the same underlying physical and regulatory reality about how processing transforms ingredient identity. Zone 3 ingredients (functional tools defined by technological role) map directly onto the additive-classified substances that NOVA uses to identify ultra-processed products. Zone 1 and Zone 2 ingredients map onto the culinary and processed ingredients of NOVA Groups 2 and 3. The <img src="https://latex.codecogs.com/png.latex?E">–<img src="https://latex.codecogs.com/png.latex?M">–<img src="https://latex.codecogs.com/png.latex?F"> model makes that reality computationally deterministic at the ingredient level, which is what product-level frameworks need as input.</p>
</section>
<section id="the-itc-hs-as-ground-truth" class="level2" data-number="9.2">
<h2 data-number="9.2" class="anchored" data-anchor-id="the-itc-hs-as-ground-truth"><span class="header-section-number">9.2</span> The ITC-HS as Ground Truth</h2>
<p>The Indian Trade Classification (Harmonised System) has been used throughout this report as primary evidence—a regulatory system that has already resolved many ingredient identity questions through decades of judicial and administrative refinement. The Supreme Court’s ruling in <em>Welkin Foods</em> <span class="citation" data-cites="WelkinFoods_2026">(Supreme Court of India 2026)</span> places HSN classification at the top of the interpretive hierarchy for identity disputes.</p>
<p>The <img src="https://latex.codecogs.com/png.latex?E">–<img src="https://latex.codecogs.com/png.latex?M">–<img src="https://latex.codecogs.com/png.latex?F"> framework uses the ITC-HS as its primary evidence base. HS codes are necessary but not always sufficient for ingredient-level classification: two ingredients may share an HS heading while having very different <img src="https://latex.codecogs.com/png.latex?E">, <img src="https://latex.codecogs.com/png.latex?M">, and <img src="https://latex.codecogs.com/png.latex?F"> scores if their processing histories and regulatory naming differ within the heading. The framework provides finer-grained resolution within and across HS headings. Where <img src="https://latex.codecogs.com/png.latex?E">–<img src="https://latex.codecogs.com/png.latex?M">–<img src="https://latex.codecogs.com/png.latex?F"> coordinates and HS chapter assignments converge—as they do in the majority of benchmark cases—that convergence is confirmation that the model is correctly grounded. Where they diverge, that divergence is a signal requiring investigation.</p>
</section>
</section>
<section id="sec-ch-limitations" class="level1" data-number="10">
<h1 data-number="10"><span class="header-section-number">10</span> Limitations and Open Questions</h1>
<section id="weight-calibration" class="level2" data-number="10.1">
<h2 data-number="10.1" class="anchored" data-anchor-id="weight-calibration"><span class="header-section-number">10.1</span> Weight Calibration</h2>
<p>The weights in the Divorce Score formula are provisionally assigned and have not been validated against a large empirical dataset of regulatory decisions or expert classifications. The choice of 0.4 for <img src="https://latex.codecogs.com/png.latex?F"> and 0.3 each for <img src="https://latex.codecogs.com/png.latex?E"> and <img src="https://latex.codecogs.com/png.latex?M"> is analytically motivated—the reasoning is set out in Section&nbsp;7—but it has not been optimised against a ground-truth corpus. Refinement of the weights using subject matter expert input, expanded benchmark coverage, or Bayesian calibration against regulatory decision data is anticipated and invited through the contribution protocol in Appendix A.</p>
</section>
<section id="zone-boundary-precision" class="level2" data-number="10.2">
<h2 data-number="10.2" class="anchored" data-anchor-id="zone-boundary-precision"><span class="header-section-number">10.2</span> Zone Boundary Precision</h2>
<p>The thresholds at <img src="https://latex.codecogs.com/png.latex?D%20=%200.30"> and <img src="https://latex.codecogs.com/png.latex?D%20=%200.70"> are calibrated to the benchmark cases but have not been validated across the full range of Indian food system ingredients. Ingredients near the thresholds may be sensitive to small changes in coordinate assignment. This sensitivity is acknowledged as a characteristic of the framework, not a failure: the zone boundaries are policy-relevant thresholds, not natural discontinuities in the physical or chemical properties of ingredients. The framework makes its current specification transparent and open to empirical refinement.</p>
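<p>The threshold sensitivity described above can be made concrete with a short sketch. It assumes that the Divorce Score is a linear weighted sum of the three coordinates, each on a 0 to 1 scale, using the weights quoted in Section 10.1. It also assumes that scores below 0.30 fall in Zone 1, that scores above 0.70 fall in Zone 3, and that a score exactly on a threshold belongs to the higher zone. The function names and the example coordinates are hypothetical and are not drawn from the benchmark.</p>
<pre class="sourceCode python"><code class="sourceCode python">from bisect import bisect_right

# Weights quoted in Section 10.1; the linear form is an assumption of this sketch.
WEIGHTS = {"E": 0.3, "M": 0.3, "F": 0.4}
# Zone thresholds quoted in Section 10.2.
THRESHOLDS = [0.30, 0.70]


def divorce_score(e: float, m: float, f: float) -> float:
    """Weighted sum of the three coordinates, each assumed to lie in [0, 1]."""
    return WEIGHTS["E"] * e + WEIGHTS["M"] * m + WEIGHTS["F"] * f


def zone(d: float) -> int:
    """Map a D score to Zone 1, 2, or 3.

    Assumption: a score exactly on a threshold falls in the higher zone.
    """
    return bisect_right(THRESHOLDS, d) + 1


# Hypothetical ingredient whose F coordinate is nudged by 0.10.
nudged = [(0.3, 0.3, 0.25), (0.3, 0.3, 0.35)]
zones = [zone(divorce_score(*emf)) for emf in nudged]
print(zones)
</code></pre>
<p>In this hypothetical case a change of 0.10 in <img src="https://latex.codecogs.com/png.latex?F"> alone carries the ingredient across the 0.30 threshold, which illustrates why coordinate assignments near a boundary deserve closer scrutiny than those far from it.</p>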
</section>
<section id="source-coordinate-incompleteness" class="level2" data-number="10.3">
<h2 data-number="10.3" class="anchored" data-anchor-id="source-coordinate-incompleteness"><span class="header-section-number">10.3</span> Source Coordinate Incompleteness</h2>
<p>The <img src="https://latex.codecogs.com/png.latex?F"> score captures regulatory naming modality but does not encode the full specificity of the source coordinate: whether an ingredient is of plant or animal origin, whether it carries a geographic indication, or whether it has a specific religious or ethical status. A proposed extension treats source-metadata as a structured annotation on each canonical entry, separate from the three-dimensional coordinate system. This would allow recording “soya lecithin: source = <em>Glycine max</em>, vegan-compatible, allergen-flagged (soy)” as metadata attached to the Zone 3 classification without adding a fourth dimension that would complicate the <img src="https://latex.codecogs.com/png.latex?D"> score calculation.</p>
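<p>One possible shape for such an annotation is sketched below. The class and field names are hypothetical and chosen only for illustration; the proposed extension does not yet fix a schema. The point of the sketch is that source metadata sits beside the zone classification instead of entering the coordinate system.</p>
<pre class="sourceCode python"><code class="sourceCode python">from dataclasses import dataclass


@dataclass(frozen=True)
class SourceMetadata:
    """Annotation kept outside the (E, M, F) coordinates; field names are hypothetical."""
    botanical_source: str
    vegan_compatible: bool
    allergen_flags: tuple[str, ...] = ()


@dataclass(frozen=True)
class CanonicalEntry:
    """A canonical ingredient with its zone and a separate source annotation."""
    name: str
    zone: int
    source: SourceMetadata


# The soya lecithin example from the text, recorded as metadata on a Zone 3 entry.
soya_lecithin = CanonicalEntry(
    name="soya lecithin",
    zone=3,
    source=SourceMetadata(
        botanical_source="Glycine max",
        vegan_compatible=True,
        allergen_flags=("soy",),
    ),
)
print(soya_lecithin.source.allergen_flags)
</code></pre>
<p>Because the annotation is a separate object, fields such as geographic indication or religious status could be added later without touching any stored coordinate or score.</p>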
</section>
<section id="dynamic-regulatory-landscape" class="level2" data-number="10.4">
<h2 data-number="10.4" class="anchored" data-anchor-id="dynamic-regulatory-landscape"><span class="header-section-number">10.4</span> Dynamic Regulatory Landscape</h2>
<p>The regulatory ground truth used to calibrate <img src="https://latex.codecogs.com/png.latex?F"> scores is a snapshot as of 2025. Food regulation in India is actively evolving: FSSAI has issued amendments, notifications, and draft regulations with increasing frequency, and the judicial landscape continues to develop <span class="citation" data-cites="FSSAI_RegulatoryDelta">(Vukka and Lalitha 2026)</span>. The framework architecture accommodates this: <img src="https://latex.codecogs.com/png.latex?F"> scores are derived from a three-part test against specific regulatory provisions, so changes to those provisions propagate to <img src="https://latex.codecogs.com/png.latex?F"> score updates without requiring a redesign. Keeping the scores current as regulations change is an ongoing maintenance task.</p>
</section>
<section id="scope-indian-regulatory-context" class="level2" data-number="10.5">
<h2 data-number="10.5" class="anchored" data-anchor-id="scope-indian-regulatory-context"><span class="header-section-number">10.5</span> Scope: Indian Regulatory Context</h2>
<p>The framework is calibrated specifically to the Indian regulatory context—FSSAI instruments, ITC-HS schedules, and Indian judicial precedent. The <img src="https://latex.codecogs.com/png.latex?E"> and <img src="https://latex.codecogs.com/png.latex?M"> dimensions are grounded in chemistry and nutrition science that is internationally applicable, but the <img src="https://latex.codecogs.com/png.latex?F"> dimension is context-specific. Extension to other regulatory contexts would require parallel derivation of <img src="https://latex.codecogs.com/png.latex?F"> scores from those contexts’ instruments. The architecture is designed to support such extension; the calibration work has not been performed.</p>
</section>
</section>
<section id="sec-ch-nextsteps" class="level1" data-number="11">
<h1 data-number="11"><span class="header-section-number">11</span> Next Steps: Building the Faceted Ingredient System</h1>
<section id="what-the-corpus-looks-like" class="level2" data-number="11.1">
<h2 data-number="11.1" class="anchored" data-anchor-id="what-the-corpus-looks-like"><span class="header-section-number">11.1</span> What the Corpus Looks Like</h2>
<p>The commercial sampling work conducted as part of this project—covering 896 stock-keeping units drawn from Indian retail channels, with full methodology to be documented in a forthcoming report—combined with the Open Food Facts India dataset <span class="citation" data-cites="OpenFoodFacts">(Open Food Facts contributors 2024)</span> yields approximately 4,800 deduplicated products. Splitting ingredient declarations by comma and conjunction across the full combined corpus produces approximately 48,000 variant strings. The two sources are methodologically distinct: the 896 SKU sample is a structured retail survey; the Open Food Facts contribution is a different collection pathway with its own coverage characteristics. Both are part of this project’s data infrastructure.</p>
<p>The processes and physical forms documented in the <img src="https://latex.codecogs.com/png.latex?E"> and <img src="https://latex.codecogs.com/png.latex?M"> reference tables of this report were derived from systematic examination of what actually appears across those 48,000 strings—not from prior literature alone, but from the empirical evidence of how Indian packaged food manufacturers describe their ingredients on commercial labels. The variant corpus is the empirical foundation on which the <img src="https://latex.codecogs.com/png.latex?E">–<img src="https://latex.codecogs.com/png.latex?M">–<img src="https://latex.codecogs.com/png.latex?F"> framework rests.</p>
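<p>The splitting step that produces the variant strings can be sketched as follows. This is a minimal illustration, assuming that only commas and the standalone word "and" act as separators and that variants are trimmed and lower-cased before being counted; the project's actual pipeline may treat parenthesised sub-ingredients, percentages, and other conjunctions differently. The function names are hypothetical.</p>
<pre class="sourceCode python"><code class="sourceCode python">import re

# Separators assumed for this sketch: a comma, or "and" as a whole word.
_SEPARATOR = re.compile(r",|\band\b", flags=re.IGNORECASE)


def split_declaration(declaration: str) -> list[str]:
    """Split one ingredient declaration into trimmed, lower-cased variant strings."""
    parts = _SEPARATOR.split(declaration)
    return [p.strip().lower() for p in parts if p.strip()]


def variant_strings(declarations: list[str]) -> set[str]:
    """Collect the distinct variant strings across a corpus of declarations."""
    return {v for d in declarations for v in split_declaration(d)}


labels = [
    "Wheat Flour, Sugar and Salt",
    "Sugar, Edible Vegetable Oil, Salt",
]
print(sorted(variant_strings(labels)))
</code></pre>
<p>The word-boundary anchors keep ingredients such as "candy" intact, since "and" is treated as a separator only when it stands alone.</p>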
</section>
<section id="the-classification-task-ahead" class="level2" data-number="11.2">
<h2 data-number="11.2" class="anchored" data-anchor-id="the-classification-task-ahead"><span class="header-section-number">11.2</span> The Classification Task Ahead</h2>
<p>The immediate next step is to build the faceted ingredient system: assigning <img src="https://latex.codecogs.com/png.latex?(E,%20M,%20F)"> coordinates and <img src="https://latex.codecogs.com/png.latex?D"> scores to each of the 729 canonical entities in the Encyclopedia v0.1 taxonomy <span class="citation" data-cites="EncyclopediaV01">(Lalitha 2026a)</span>, and then mapping the approximately 48,000 variant strings to their canonical families through the entity resolution pipeline.</p>
<p>This work is still to be done. What the variant corpus has provided so far is the empirical basis for defining processes, matter classes, and functional zones: the population of real forms and transformations that any viable framework must handle. The next phase applies the <img src="https://latex.codecogs.com/png.latex?E">–<img src="https://latex.codecogs.com/png.latex?M">–<img src="https://latex.codecogs.com/png.latex?F"> model to classify each canonical entity systematically, extend those classifications through the suffix system to geographic, cultivar, and preparation-state variants, and attach the legal metadata (allergen flags, source disclosure obligations) that Zone assignments alone do not capture.</p>
<p>Each variant string will carry, as its output: a canonical ID, a zone classification, a <img src="https://latex.codecogs.com/png.latex?D"> score, suffix metadata encoding whatever distinctions the variant expresses beyond the canon, and any applicable legal disclosure flags. That structured output is what downstream systems—allergen detection, supply chain coordination, nutritional research, regulatory compliance—require as their input.</p>
</section>
<section id="governance-and-expert-input" class="level2" data-number="11.3">
<h2 data-number="11.3" class="anchored" data-anchor-id="governance-and-expert-input"><span class="header-section-number">11.3</span> Governance and Expert Input</h2>
<p>The classification of 729 canonical entities will not be completed by computational methods alone. Score assignments in the boundary regions—Zone 1/Zone 2 transitions for moderately processed ingredients, Zone 2/Zone 3 transitions for ingredients with mixed regulatory signals—require domain expertise that food scientists, food lawyers, customs practitioners, and nutritional researchers hold.</p>
<p>The expert input process described in Appendix B is the mechanism for incorporating this knowledge. What is available computationally is the framework specification, the benchmark as a quality standard, and the variant corpus as the empirical scope of the problem. What domain experts contribute is the evidence-based judgement about where specific ingredients fall within that framework, particularly in the cases that the benchmark was designed to surface as hard.</p>
</section>
<section id="planned-outputs" class="level2" data-number="11.4">
<h2 data-number="11.4" class="anchored" data-anchor-id="planned-outputs"><span class="header-section-number">11.4</span> Planned Outputs</h2>
<p>The classification work will produce a versioned update to the Encyclopedia of Indian Food Ingredients, with <img src="https://latex.codecogs.com/png.latex?E">–<img src="https://latex.codecogs.com/png.latex?M">–<img src="https://latex.codecogs.com/png.latex?F"> coordinates and zone classifications attached to each canonical entry, and with full technical derivations reviewed against the companion scoring report <span class="citation" data-cites="EMF_JustificationCompanion">(Lalitha 2026b)</span>. A versioned update protocol will be implemented so that regulatory changes propagate to <img src="https://latex.codecogs.com/png.latex?F"> score updates in a traceable and documented manner. The source metadata extension—structured annotation for origin-specific data including botanical source, geographic indication status, and religious or ethical classification—will be developed alongside the coordinate assignments.</p>
<p>The goal of this work is a publicly accessible, versioned, expert-reviewed ingredient classification system that any downstream application—NOVA-based product classification, allergen detection, supply chain systems, nutritional databases—can use as a stable, deterministic substrate.</p>
</section>
</section>
<section id="acknowledgments" class="level1 unnumbered">
<h1 class="unnumbered">Acknowledgments</h1>
<p>My deepest gratitude to Mr.&nbsp;Krishna, whose constancy forms the foundation upon which all my work, including this, quietly rests.</p>
<p>Salutations to the Goddess who dwells in all beings in the form of intelligence. I bow to her again and again.</p>
<p>This report was prepared as part of the Indian Food Informatics Data (IFID) project at the Interdisciplinary Systems Research Lab (iSRL). The synthesis draws upon extensive legal research and domain analysis conducted for food informatics applications.</p>
</section>
<section id="sec-app-critique" class="level1 unnumbered">
<h1 class="unnumbered">Appendix A: Critique and Contribution Protocol</h1>
<section id="purpose" class="level2 unnumbered">
<h2 class="unnumbered anchored" data-anchor-id="purpose">Purpose</h2>
<p>Measurement frameworks that affect regulatory decisions, commercial classifications, and consumer communications must be robust to expert scrutiny. This protocol establishes the conditions under which critiques of the <img src="https://latex.codecogs.com/png.latex?E">–<img src="https://latex.codecogs.com/png.latex?M">–<img src="https://latex.codecogs.com/png.latex?F"> framework will be engaged with substantively. It welcomes contributions from domain experts while maintaining the evidentiary standards that give the framework its analytical credibility.</p>
<p>The protocol is not gatekeeping. It is a quality filter distinguishing contributions that advance the framework from commentary that, however sincere, does not provide the specific, evidence-based refinements that the framework requires. Expert critique meeting the protocol’s requirements will be acknowledged, documented, and incorporated into future versions.</p>
</section>
<section id="conditions-for-a-valid-critique" class="level2 unnumbered">
<h2 class="unnumbered anchored" data-anchor-id="conditions-for-a-valid-critique">Conditions for a Valid Critique</h2>
<section id="evidentiary-support-from-permitted-sources" class="level3 unnumbered">
<h3 class="unnumbered anchored" data-anchor-id="evidentiary-support-from-permitted-sources">Evidentiary Support from Permitted Sources</h3>
<p>Every factual assertion in a critique must be supported by at least one source from the following categories: official Government of India gazettes, including FSSAI regulations and compendiums; DGCI&amp;S Indian Trade Classification schedules and official explanatory notes; original judgments of the Supreme Court of India or the High Courts, obtained from official court repositories; Codex Alimentarius Commission standards and guidelines; JECFA reports and evaluations; and peer-reviewed scientific literature published in indexed journals.</p>
<p>The following are not permitted: marketing materials, industry association publications, or brand websites; commercial legal database summaries; blog posts, news articles, or trade press, regardless of publication prominence; and unpublished or unreviewed claims, regardless of the claimant’s credentials.</p>
</section>
<section id="specific-line-level-identification" class="level3 unnumbered">
<h3 class="unnumbered anchored" data-anchor-id="specific-line-level-identification">Specific Line-Level Identification</h3>
<p>A valid critique must identify the specific claim, score, or framework element being challenged, specifying: which section, table, or equation contains the element; what the current value or claim is; what the proposed alternative value or claim is; and why the proposed alternative is better supported by evidence than the current formulation. General claims that the framework is “incorrect” or “incomplete” without this specificity do not constitute actionable critique.</p>
</section>
<section id="benchmark-consistency-check" class="level3 unnumbered">
<h3 class="unnumbered anchored" data-anchor-id="benchmark-consistency-check">Benchmark Consistency Check</h3>
<p>If the proposed revision would alter a <img src="https://latex.codecogs.com/png.latex?D"> score, zone threshold, or weight parameter, the critique must demonstrate that the revised formulation still produces correct discriminations for the 35-test benchmark. A revision that corrects one case while failing another provides weaker grounds for adoption than a revision that improves overall benchmark performance.</p>
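<p>A check of this kind can be sketched as a comparison of failing test IDs before and after a proposed revision. The sketch assumes a linear weighted-sum score with a threshold on the boundary assigned to the higher zone, and it uses two invented benchmark cases with hypothetical IDs; the real 35-test benchmark and its expected outputs are defined elsewhere in this report.</p>
<pre class="sourceCode python"><code class="sourceCode python">from bisect import bisect_right
from typing import Callable

Classifier = Callable[[float, float, float], int]
# One benchmark case: (test ID, (E, M, F) coordinates, expected zone).
Case = tuple[str, tuple[float, float, float], int]


def make_classifier(w_e: float, w_m: float, w_f: float,
                    thresholds: list[float]) -> Classifier:
    """Build a zone classifier; the linear weighted sum is an assumption."""
    def classify(e: float, m: float, f: float) -> int:
        d = w_e * e + w_m * m + w_f * f
        return bisect_right(thresholds, d) + 1
    return classify


def failing_ids(classify: Classifier, cases: list[Case]) -> set[str]:
    """Return the IDs of benchmark cases the classifier gets wrong."""
    return {tid for tid, emf, expected in cases if classify(*emf) != expected}


def compare(current: Classifier, revised: Classifier,
            cases: list[Case]) -> dict[str, set[str]]:
    """Report which cases a revision fixes and which it breaks."""
    before = failing_ids(current, cases)
    after = failing_ids(revised, cases)
    return {"fixed": before - after, "broken": after - before}


# Invented cases for illustration only; not part of the real benchmark.
cases = [("T01", (0.1, 0.1, 0.1), 1), ("T02", (0.9, 0.9, 0.9), 3)]
current = make_classifier(0.3, 0.3, 0.4, [0.30, 0.70])
revised = make_classifier(0.3, 0.3, 0.4, [0.30, 0.95])
report = compare(current, revised, cases)
print(report)
</code></pre>
<p>A revision whose "broken" set is non-empty trades one error for another, which is the situation the paragraph above treats as weaker grounds for adoption. Listing the IDs in both sets also supplies the specific test IDs requested in the submission format.</p>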
</section>
</section>
<section id="critique-submission-format" class="level2 unnumbered">
<h2 class="unnumbered anchored" data-anchor-id="critique-submission-format">Critique Submission Format</h2>
<blockquote class="blockquote">
<p><strong>Section/Element:</strong> [Identify the specific section, table, equation, or score]</p>
<p><strong>Current Formulation:</strong> [State the current claim, value, or assignment]</p>
<p><strong>Proposed Revision:</strong> [State the proposed alternative]</p>
<p><strong>Evidence:</strong> [Cite at least one permitted source]</p>
<p><strong>Benchmark Impact:</strong> [State how the revision affects the 35-test benchmark, with specific test IDs]</p>
<p><strong>Contact:</strong> [Contact details for correspondence]</p>
</blockquote>
</section>
</section>
<section id="sec-app-contribute" class="level1 unnumbered">
<h1 class="unnumbered">Appendix B: Ways to Contribute</h1>
<p>The <img src="https://latex.codecogs.com/png.latex?E">–<img src="https://latex.codecogs.com/png.latex?M">–<img src="https://latex.codecogs.com/png.latex?F"> framework is an open research project. Contributions from domain experts are essential to its development. Two engagement pathways are available.</p>
<p><strong>Asynchronous Expert Input (2–4 hours per month).</strong> Every two weeks, the research team compiles open questions that have not been resolved through the team’s own analysis—typically concerning ingredient-level <img src="https://latex.codecogs.com/png.latex?F"> score assignments where regulatory evidence is ambiguous, benchmark cases where the model’s output warrants review, and weight calibration questions where expert judgement can supplement analytical reasoning. Contributors respond at their own pace; there is no expectation of real-time engagement.</p>
<p><strong>Systems Researcher Engagement (10–15 hours per week).</strong> Researchers with domain expertise in food science, food law, informatics, or nutritional science who wish to engage more deeply with the framework’s development are invited to join the research team. This engagement involves participation in framework development, validation work, and the preparation of technical reports.</p>
<p>All contributions, critique submissions, and expressions of interest should be directed through: <a href="https://isrl.in/join-us.html" class="uri">https://isrl.in/join-us.html</a></p>
<p>All critiques submitted in the format described in Appendix A will receive a written response within thirty days. Contributors whose input leads to a modification of the framework will be acknowledged in the subsequent version, with a description of the modification they proposed or supported. Contributions received but not adopted will be acknowledged with an explanation.</p>
<section id="references" class="level2 unnumbered">
<h2 class="unnumbered anchored" data-anchor-id="references">References</h2>
<div id="refs" class="references csl-bib-body hanging-indent">
<div id="ref-Arora_MLProcessing_2025" class="csl-entry">
<span class="nocase">Arora, Nalin, Aviral Chauhan, Siddhant Rana, et al.</span> 2025. <em>Application of Machine Learning to Predict Food Processing Level Using Open Food Facts</em>.
</div>
<div id="ref-Broughton_CC_2006" class="csl-entry">
Broughton, Vanda. 2006. <span>“The Need for a Faceted Classification as the Basis of All Methods of Information Retrieval.”</span> <em>Aslib Proceedings</em> 58 (1–2): 49–72.
</div>
<div id="ref-DelhiHC_RamGaua_2022" class="csl-entry">
Delhi High Court. 2022. <em><span class="nocase">Ram Gaua Raksha Dal v. Union of India and Others, W.P.(C) 12055/2021</span></em>.
</div>
<div id="ref-DGCI_CH11" class="csl-entry">
Directorate General of Commercial Intelligence and Statistics. 2007a. <em><span class="nocase">Indian Trade Classification (H.S.): Chapter 11 — Products of the milling industry; malt; starches; inulin; wheat gluten</span></em>. <a href="https://www.dgciskol.gov.in/Writereaddata/Downloads/2007/CHP_11.pdf">https://www.dgciskol.gov.in/Writereaddata/Downloads/2007/CHP_11.pdf</a>.
</div>
<div id="ref-DGCI_CH15" class="csl-entry">
Directorate General of Commercial Intelligence and Statistics. 2007b. <em><span class="nocase">Indian Trade Classification (H.S.): Chapter 15 — Animal or vegetable fats and oils</span></em>. <a href="https://www.dgciskol.gov.in/Writereaddata/Downloads/CHP_15.pdf">https://www.dgciskol.gov.in/Writereaddata/Downloads/CHP_15.pdf</a>.
</div>
<div id="ref-DGCI_CH35" class="csl-entry">
Directorate General of Commercial Intelligence and Statistics. 2007c. <em><span class="nocase">Indian Trade Classification (H.S.): Chapter 35 — Albuminoidal substances; modified starches; glues; enzymes</span></em>. <a href="https://dgciskol.gov.in/Writereaddata/Downloads/2007/CHP_35.pdf">https://dgciskol.gov.in/Writereaddata/Downloads/2007/CHP_35.pdf</a>.
</div>
<div id="ref-FSSAI_Label_2020" class="csl-entry">
Food Safety and Standards Authority of India. 2023. <em><span class="nocase">Food Safety and Standards (Labelling and Display) Regulations, 2020 (Version-VI, 22.02.2023)</span></em>. <a href="https://www.fssai.gov.in/upload/uploadfiles/files/Comp_Labelling.pdf">https://www.fssai.gov.in/upload/uploadfiles/files/Comp_Labelling.pdf</a>.
</div>
<div id="ref-FSSAI_Additives_2011" class="csl-entry">
Food Safety and Standards Authority of India. 2024. <em><span class="nocase">Food Safety and Standards (Food Products Standards and Food Additives) Regulations, 2011, as amended through 2024</span></em>. <a href="https://fssai.gov.in/upload/uploadfiles/files/Chapter%203_Substances%20added%20to%20food.pdf">https://fssai.gov.in/upload/uploadfiles/files/Chapter%203_Substances%20added%20to%20food.pdf</a>.
</div>
<div id="ref-Ispirova_Informatics_2025" class="csl-entry">
Ispirova, Gordana, Michael Sebek, Giulia Menichetti, and Ganesh Bagler. 2025. <em>Informatics for Food Processing</em>.
</div>
<div id="ref-EncyclopediaV01" class="csl-entry">
Lalitha, A. R. 2026a. <em><span class="nocase">Encyclopedia of Indian Food Ingredients (v0.1.0): A Standardized Taxonomy for Indian Food Informatics</span></em>. Interdisciplinary Systems Research Lab, Zenodo. <a href="https://doi.org/10.5281/zenodo.18650863">https://doi.org/10.5281/zenodo.18650863</a>.
</div>
<div id="ref-EMF_JustificationCompanion" class="csl-entry">
Lalitha, A. R. 2026b. <em><span class="nocase">Justification Companion to EMF-Scoring Model (IFID Project)</span></em>. Interdisciplinary Systems Research Lab. <a href="https://doi.org/10.5281/zenodo.18713318">https://doi.org/10.5281/zenodo.18713318</a>.
</div>
<div id="ref-OpenFoodFacts" class="csl-entry">
Open Food Facts contributors. 2024. <em>Open Food Facts Database</em>. <a href="https://world.openfoodfacts.org">https://world.openfoodfacts.org</a>.
</div>
<div id="ref-Ranganathan_CC_1933" class="csl-entry">
Ranganathan, S. R. 1933. <em>Colon Classification</em>. 1st ed. Madras Library Association.
</div>
<div id="ref-Ranganathan_PMEST" class="csl-entry">
Ranganathan, S. R. 1967. <span>“Prolegomena to Library Classification.”</span> <em>Annals of Library Science</em> 14: 1–15.
</div>
<div id="ref-WelkinFoods_2026" class="csl-entry">
Supreme Court of India. 2026. <em><span class="nocase">Commissioner of Customs (Import) v. M/s Welkin Foods, 2026 SCC OnLine SC 27; 2026 INSC 19</span></em>.
</div>
<div id="ref-FSSAI_RegulatoryDelta" class="csl-entry">
Vukka, S. N., and A. R. Lalitha. 2026. <em><span class="nocase">Regulatory Delta of Food Labelling Laws in India: A Comparative Analysis of the FSSAI 2011 and 2020 Regulations</span></em>. Indian Food Informatics Data (IFID) Project, Interdisciplinary Systems Research Lab. <a href="https://doi.org/10.5281/zenodo.18710428">https://doi.org/10.5281/zenodo.18710428</a>.
</div>
</div>


</section>
</section>


<div id="quarto-appendix" class="default"><section class="quarto-appendix-contents" id="quarto-reuse"><h2 class="anchored quarto-appendix-heading">Reuse</h2><div class="quarto-appendix-contents"><div><a rel="license" href="https://creativecommons.org/licenses/by/4.0/">CC BY 4.0</a></div></div></section><section class="quarto-appendix-contents" id="quarto-citation"><h2 class="anchored quarto-appendix-heading">Citation</h2><div><div class="quarto-appendix-secondary-label">BibTeX citation:</div><pre class="sourceCode code-with-copy quarto-appendix-bibtex"><code class="sourceCode bibtex">@report{a_r2026,
  author = {A R, Lalitha},
  publisher = {iSRL},
  title = {Identity, {Transformation,} and {Function:} {A} {Tri-Axial}
    {Model} for the {Classification} of {Food} {Ingredient} {Identity}},
  number = {iSRL-26-02-R-EMF},
  date = {2026-02-20},
  url = {https://isrl.in/pub/2026-02-r-emf/},
  doi = {10.5281/zenodo.18714527},
  langid = {en}
}
</code></pre><div class="quarto-appendix-secondary-label">For attribution, please cite this work as:</div><div id="ref-a_r2026" class="csl-entry quarto-appendix-citeas">
A R, Lalitha. 2026. <em>Identity, Transformation, and Function: A
Tri-Axial Model for the Classification of Food Ingredient Identity</em>.
iSRL-26-02-R-EMF. iSRL. <a href="https://doi.org/10.5281/zenodo.18714527">https://doi.org/10.5281/zenodo.18714527</a>.
</div></div></section></div> ]]></description>
  <guid>https://isrl.in/pub/2026-02-r-emf/</guid>
  <pubDate>Fri, 20 Feb 2026 00:00:00 GMT</pubDate>
</item>
<item>
  <title>Indian Supreme Court Defines Hierarchical Classification Framework for Food Products, Overruling Common Parlance Precedents</title>
  <dc:creator>Lalitha A R</dc:creator>
  <link>https://isrl.in/pub/2026-02-r-scfood/</link>
  <description><![CDATA[ 




<script>
document.addEventListener('DOMContentLoaded', function() {
  var meta = document.querySelector('#title-block-header .quarto-title-meta');
  if (!meta) return;
  meta.insertAdjacentHTML('beforeend', '<div><div class="quarto-title-meta-heading">Contributors</div><div class="quarto-title-meta-contents"><p class="author" style="margin:0 0 0.1em 0;">Hitha Sunil</p><p style="font-size:0.82em;color:#555;margin:0 0 0.5em 0;font-style:italic;">Typesetting</p></div></div>');
});
</script>
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "ScholarlyArticle",
  "name": "Indian Supreme Court Defines Hierarchical Classification Framework for Food Products, Overruling Common Parlance Precedents",
  "@id": "https://doi.org/10.5281/zenodo.18651646",
  "identifier": [
    "https://doi.org/10.5281/zenodo.18651646",
    "iSRL-26-02-R-SCFood"
  ],
  "description": "Synthesizes landmark Indian judicial decisions in food classification law, documenting the Supreme Court's 2026 ruling that established a hierarchical technical classification framework superseding common parlance interpretation — with significant implications for food informatics and ingredient classification systems.",
  "datePublished": "2026-02-15",
  "license": "https://creativecommons.org/licenses/by/4.0/",
  "url": "https://isrl.in/pub/2026-02-r-scfood/",
  "author": {
    "@type": "Person",
    "name": "Lalitha A R",
    "identifier": "https://orcid.org/0009-0001-7466-3531",
    "sameAs": "https://orcid.org/0009-0001-7466-3531",
    "email": "lalithaar.research@gmail.com"
  },
  "publisher": {
    "@type": "ResearchOrganization",
    "name": "iSRL",
    "url": "https://isrl.in"
  }
}
</script>
<section id="abstract" class="level2" data-number="0.1">
<h2 data-number="0.1" class="anchored" data-anchor-id="abstract"><span class="header-section-number">0.1</span> Abstract</h2>
<p>This report synthesizes landmark judicial decisions in Indian food classification law, documenting the transition from common parlance-based interpretation to a hierarchical technical classification framework. The January 6, 2026 Supreme Court judgment in <em>Commissioner of Customs (Import) v. M/s Welkin Foods</em> formally established a precedential hierarchy that prioritizes statutory interpretation over lay understanding, marking a departure from pre-HSN era jurisprudence. This analysis examines four domains (GST/Tax, Food Safety, Customs, and Dietary Labels) where technical definitions now supersede common parlance, with significant implications for food informatics, regulatory compliance, and ingredient classification systems.</p>
</section>
<section id="sec-introduction" class="level1" data-number="1">
<h1 data-number="1"><span class="header-section-number">1</span> Introduction</h1>
<p>The classification of food products under Indian law has undergone a fundamental transformation. Historically, courts relied heavily on common parlance, the everyday understanding of terms as used in ordinary commerce, to interpret ambiguous statutory language. This approach, exemplified by cases such as <em>Nix v. Hedden</em> (1893) in the United States and <em>Ramavatar Budhaiprasad v. Assistant Sales Tax Officer</em> (1961) in India, prioritized accessibility and commercial understanding over technical precision.</p>
<p>India’s adoption in 1986 of the Harmonised System of Nomenclature (HSN), which entered into force internationally in 1988 under the auspices of the World Customs Organisation, initiated a gradual shift toward technical classification. However, the tension between common understanding and scientific taxonomy persisted until the Supreme Court’s 2026 ruling explicitly established a hierarchy for resolving classification disputes.</p>
<p>This report documents this watershed moment and its implications across multiple regulatory domains.</p>
</section>
<section id="sec-historical" class="level1" data-number="2">
<h1 data-number="2"><span class="header-section-number">2</span> Historical Context: Pre-HSN Era Common Parlance Precedents</h1>
<section id="sec-nix" class="level2" data-number="2.1">
<h2 data-number="2.1" class="anchored" data-anchor-id="sec-nix"><span class="header-section-number">2.1</span> United States: <em>Nix v. Hedden</em> (1893)</h2>
<p>The seminal case <em>Nix v. Hedden</em>, 149 U.S. 304 (1893), established that tomatoes should be classified as vegetables rather than fruits for tariff purposes, despite their botanical classification. The Supreme Court of the United States held that classification should follow common parlance—how ordinary people in commerce understand terms—rather than technical botanical definitions.</p>
</section>
<section id="sec-india-early" class="level2" data-number="2.2">
<h2 data-number="2.2" class="anchored" data-anchor-id="sec-india-early"><span class="header-section-number">2.2</span> India: Early Common Parlance Cases</h2>
<section id="sec-ramavatar" class="level3" data-number="2.2.1">
<h3 data-number="2.2.1" class="anchored" data-anchor-id="sec-ramavatar"><span class="header-section-number">2.2.1</span> <em>Ramavatar Budhaiprasad v. Assistant Sales Tax Officer</em> (1961)</h3>
<p>In this landmark case (AIR 1961 SC 1325; (1962) 1 SCR 279; (1961) 12 STC 286), decided on March 14, 1961, the Supreme Court of India addressed whether betel leaves should be classified as vegetables for the purposes of a sales tax exemption under the Central Provinces and Berar Sales Tax Act, 1947.</p>
<p>The Court held that the word “vegetables” must be interpreted not in a technical or botanical sense, but in its popular sense as understood in common language—denoting classes of vegetable matter grown in kitchen gardens or farms and used for the table. The Court stated: “It has not been defined in the Act and being a word of everyday use it must be construed in its popular sense, meaning that sense which people conversant with the subject matter with which the statute is dealing would attribute to it.”</p>
<p>The Court ruled that betel leaves, while botanically plant matter, were not vegetables in common parlance and were therefore taxable.</p>
</section>
<section id="sec-krishnaiyer" class="level3" data-number="2.2.2">
<h3 data-number="2.2.2" class="anchored" data-anchor-id="sec-krishnaiyer"><span class="header-section-number">2.2.2</span> <em>Krishna Iyer v. State of Kerala</em> (1962)</h3>
<p>Decided on March 6, 1962, this case from the Kerala High Court similarly applied the common parlance test to determine whether green ginger qualified as a vegetable for tax exemption purposes. The Court held that vegetables should be understood “as commonly understood denoting those classes of vegetable matter which are grown in kitchen gardens and are used for the table,” and concluded that green ginger, despite being plant matter, was included in the specific term “ginger” in the tax schedule and was therefore taxable.</p>
</section>
</section>
<section id="sec-pre-hsn" class="level2" data-number="2.3">
<h2 data-number="2.3" class="anchored" data-anchor-id="sec-pre-hsn"><span class="header-section-number">2.3</span> The Pre-HSN Framework</h2>
<p>Prior to India’s adoption of the HSN system in 1986, courts consistently applied common parlance as the primary interpretive tool for ambiguous statutory terms. This approach served several purposes:</p>
<ul>
<li>It made tax classifications accessible to ordinary merchants without specialized knowledge</li>
<li>It aligned legal interpretations with commercial practice</li>
<li>It avoided the complexity of technical botanical or chemical classifications</li>
<li>It provided predictability based on everyday understanding</li>
</ul>
</section>
</section>
<section id="sec-welkin" class="level1" data-number="3">
<h1 data-number="3"><span class="header-section-number">3</span> The Watershed Moment: <em>Commissioner of Customs (Import) v. M/s Welkin Foods</em> (2026)</h1>
<section id="sec-welkin-details" class="level2" data-number="3.1">
<h2 data-number="3.1" class="anchored" data-anchor-id="sec-welkin-details"><span class="header-section-number">3.1</span> Case Details</h2>
<p><strong>Citation:</strong> 2026 SCC OnLine SC 27; 2026 INSC 19<br>
<strong>Date:</strong> January 6, 2026 (reported January 6-7, 2026)<br>
<strong>Court:</strong> Supreme Court of India<br>
<strong>Bench:</strong> Justice J.B. Pardiwala and Justice R. Mahadevan<br>
<strong>Parties:</strong> Commissioner of Customs (Import) v. M/s Welkin Foods</p>
</section>
<section id="sec-welkin-facts" class="level2" data-number="3.2">
<h2 data-number="3.2" class="anchored" data-anchor-id="sec-welkin-facts"><span class="header-section-number">3.2</span> Facts and Issue</h2>
<p>The case concerned the proper classification of imported aluminium shelving used for mushroom cultivation. The respondent, Welkin Foods, argued the goods should be classified under Customs Tariff Item (CTI) 84369900 as “parts” of agricultural machinery. The Revenue contended the shelving should be classified under CTI 76109010 as “Aluminium Structures.”</p>
<p>The core legal question was whether the intended use of the product (mushroom cultivation) or its objective technical characteristics should govern classification.</p>
</section>
<section id="sec-welkin-reasoning" class="level2" data-number="3.3">
<h2 data-number="3.3" class="anchored" data-anchor-id="sec-welkin-reasoning"><span class="header-section-number">3.3</span> The Court’s Reasoning</h2>
<p>The Supreme Court held that classification must be based on objective characteristics of the product, not solely on intended end-use. The Court established several critical principles:</p>
<ol type="1">
<li><p><strong>Structure vs.&nbsp;Machine:</strong> The shelving was held to be a “structure” (fixed in place) rather than a “part” of a machine. It did not qualify as a component essential for the mechanical function of agricultural machinery.</p></li>
<li><p><strong>Material Identity Primacy:</strong> While exclusive use can sometimes influence classification, it does not override the fundamental material identity of the product when it is specifically described elsewhere in the tariff.</p></li>
<li><p><strong>Hierarchical Framework:</strong> Most significantly, the Court articulated a hierarchy for classification disputes, stating: “It is only in a state of statutory silence, where the legislative intent remains unexpressed, that the tribunals or courts may resort to the common or trade parlance test.”</p></li>
</ol>
</section>
<section id="sec-welkin-hierarchy" class="level2" data-number="3.4">
<h2 data-number="3.4" class="anchored" data-anchor-id="sec-welkin-hierarchy"><span class="header-section-number">3.4</span> The Established Hierarchy</h2>
<p>The <em>M/s Welkin Foods</em> judgment established the following order of precedence for food and product classification:</p>
<ol type="1">
<li><strong>Judicial Interpretation of Statute</strong> (highest priority)—How the court reads the HSN and statutory provisions</li>
<li><strong>Technical/Scientific Definition</strong>—When statute provides technical guidance through HSN codes</li>
<li><strong>Expert Opinion</strong>—Testimony from qualified experts in relevant fields</li>
<li><strong>Common Parlance</strong>—Trade usage and ordinary understanding (only in statutory silence)</li>
</ol>
<p>This hierarchy fundamentally reorients Indian classification jurisprudence, relegating common parlance from its historical primacy to a fallback position.</p>
</section>
</section>
<section id="sec-domains" class="level1" data-number="4">
<h1 data-number="4"><span class="header-section-number">4</span> Domain-Specific Applications</h1>
<p>The preference for technical classification over common understanding that <em>M/s Welkin Foods</em> formalized runs through multiple regulatory domains, including rulings that predate the judgment, such as <em>Ram Gaua Raksha Dal</em> (2021).</p>
<section id="sec-gst" class="level2" data-number="4.1">
<h2 data-number="4.1" class="anchored" data-anchor-id="sec-gst"><span class="header-section-number">4.1</span> GST and Tax Domain: Scientific Composition Prevails</h2>
<section id="sec-gajanand" class="level3" data-number="4.1.1">
<h3 data-number="4.1.1" class="anchored" data-anchor-id="sec-gajanand"><span class="header-section-number">4.1.1</span> <em>In re Gajanand Foods Private Limited</em></h3>
<p>The Gujarat Authority for Advance Ruling (GAAR) and subsequently the Appellate Authority (AAAR) addressed whether instant mix flours containing spices, leavening agents, and other additives should be classified under Chapter Headings 1102 or 1106 (attracting 5% GST) or under Heading 2106 90 (attracting 18% GST).</p>
<p><strong>Ruling:</strong> The AAAR held that instant mix flours for products like Gota, Khaman, Dhokla, Idli, Dosa, Handvo, and others, containing 5-37% additional ingredients (spices, salt, sodium bicarbonate, chili powder), are classifiable under HSN 2106 90 as “Food Preparations not elsewhere specified or included,” attracting 18% GST.</p>
<p><strong>Rationale:</strong> The technical composition, including functional additives that transformed the product from mere flour into a meal preparation kit, removed it from the common category of “flour.” The presence of leavening agents, spices, and other ingredients meant for creating a specific dish demonstrated that these were food preparations, not basic flours.</p>
</section>
<section id="sec-ramdev" class="level3" data-number="4.1.2">
<h3 data-number="4.1.2" class="anchored" data-anchor-id="sec-ramdev"><span class="header-section-number">4.1.2</span> <em>In re Ramdev Food Products Private Limited</em></h3>
<p>In a parallel case, the Gujarat AAAR addressed similar instant mix flours produced by Ramdev Food Products, including instant mixes for Gota, Khaman, Dalwada, Dahiwada, Idli, Dhokla, Dosa, Pizza, Methi Gota, and Handvo.</p>
<p><strong>Ruling:</strong> The AAAR upheld the AAR’s classification of these products under HSN 2106 90, attracting 18% GST. The Appellate Authority rejected arguments based on VAT-era precedents, holding that “merely because the end consumer of the Instant Mix Flour is required to follow certain food preparation processes before such product(s) can be consumed, is no ground to take these products out of Chapter Heading 2106.”</p>
<p><strong>Key Principle:</strong> The technical composition and processing state—not the trade name or common understanding—governs classification under the HSN system.</p>
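<p>To make the stakes of the heading dispute concrete, the following minimal Python sketch compares the GST due on the same taxable value under the two competing classifications. The rates are those reported in the rulings above; the <code>gst_payable</code> helper and the taxable value of 1,000 rupees are hypothetical and purely illustrative.</p>

```python
from decimal import Decimal

# Rates as reported in the rulings discussed above, keyed by heading.
GST_RATE = {
    "1102": Decimal("0.05"),
    "1106": Decimal("0.05"),
    "2106 90": Decimal("0.18"),
}

def gst_payable(taxable_value, heading):
    """Return the GST due on a taxable value under the given heading."""
    return (Decimal(taxable_value) * GST_RATE[heading]).quantize(Decimal("0.01"))

value = "1000.00"  # hypothetical taxable value in rupees
as_flour = gst_payable(value, "1106")
as_preparation = gst_payable(value, "2106 90")
print(as_flour, as_preparation, as_preparation - as_flour)  # 50.00 180.00 130.00
```

<p>On these assumptions, moving the same sale from a flour heading to 2106 90 raises the tax from 50 to 180 rupees.</p>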
</section>
</section>
<section id="sec-foodsafety" class="level2" data-number="4.2">
<h2 data-number="4.2" class="anchored" data-anchor-id="sec-foodsafety"><span class="header-section-number">4.2</span> Food Safety Domain: Nutrient Thresholds Over Marketing Terms</h2>
<section id="sec-3s" class="level3" data-number="4.2.1">
<h3 data-number="4.2.1" class="anchored" data-anchor-id="sec-3s"><span class="header-section-number">4.2.1</span> <em>3S and Our Health Society v. Union of India</em></h3>
<p><strong>Case Details:</strong> Writ Petition (Civil) No.&nbsp;437/2024<br>
<strong>Court:</strong> Supreme Court of India<br>
<strong>Bench:</strong> Justice J.B. Pardiwala and Justice R. Mahadevan (initial disposal: April 9, 2025)<br>
<strong>Subsequent hearings:</strong> February 2026</p>
<p>This ongoing public interest litigation seeks mandatory Front-of-Package Warning Labels (FoPWL) on packaged food products containing high levels of sugar, salt, and saturated fats.</p>
<p><strong>Court’s Direction:</strong> The Supreme Court directed the Food Safety and Standards Authority of India (FSSAI) to prioritize scientific thresholds of salt, sugar, and saturated fats over the food industry’s preferred marketing terminology. The Court emphasized that consumer health protection requires objective, scientifically determined nutrient levels rather than subjective or trade-based descriptions.</p>
<p><strong>Implication:</strong> The Court’s insistence on scientific measurement over industry terminology reflects the same hierarchical principle articulated in <em>M/s Welkin Foods</em>: technical, objective criteria supersede commercial nomenclature.</p>
</section>
</section>
<section id="sec-customs" class="level2" data-number="4.3">
<h2 data-number="4.3" class="anchored" data-anchor-id="sec-customs"><span class="header-section-number">4.3</span> Customs Domain: Engineering Function Over Trade Nomenclature</h2>
<p>The <em>M/s Welkin Foods</em> case itself exemplifies this domain. The Supreme Court held that aluminium racks for mushroom cultivation are technically structures (Chapter 76) and not machinery (Chapter 84) because they lack mechanical function, regardless of their trade name or intended agricultural use.</p>
<p>This establishes that in customs classification, the technical characteristics—material composition and functional properties—override the commercial designation or end-use of a product.</p>
</section>
<section id="sec-dietary" class="level2" data-number="4.4">
<h2 data-number="4.4" class="anchored" data-anchor-id="sec-dietary"><span class="header-section-number">4.4</span> Dietary Labels Domain: Biological Origin Disclosure Mandatory</h2>
<section id="sec-ramgaua" class="level3" data-number="4.4.1">
<h3 data-number="4.4.1" class="anchored" data-anchor-id="sec-ramgaua"><span class="header-section-number">4.4.1</span> <em>Ram Gaua Raksha Dal v. Union of India and Others</em></h3>
<p><strong>Case Details:</strong> W.P.(C) 12055/2021<br>
<strong>Court:</strong> Delhi High Court<br>
<strong>Bench:</strong> Justice Vipin Sanghi and Justice Jasmeet Singh (December 2021) / Justice Vipin Sanghi and Justice Dinesh Kumar Sharma (March 2022)<br>
<strong>Date:</strong> December 9, 2021; subsequent order March 2, 2022</p>
<p>This case challenged the inadequate disclosure of animal-sourced ingredients in packaged food products, particularly where International Numbering System (INS) codes obscure the biological origin of food additives.</p>
<p><strong>Court’s Ruling:</strong> The Delhi High Court held that the biological origin of ingredients must be disclosed, stating that “every person has a right to know as to what he/she is consuming, and nothing can be offered to the person on a platter by resort to deception, or camouflage.”</p>
<p>The Court directed that:</p>
<ul>
<li>Food Business Operators must make full and complete disclosure of all ingredients, not only by their code names (INS numbers) but also by disclosing whether they originate from plant, animal source, or are manufactured in a laboratory.</li>
<li>The disclosure must specify the actual plant or animal source, regardless of the percentage used in the food article.</li>
<li>Even minuscule amounts of animal-sourced ingredients (other than milk, milk products, honey, beeswax, carnauba wax, or shellac) render the product non-vegetarian and must be disclosed accordingly.</li>
<li>Chemical code alone “camouflages the truth from the consumer.”</li>
</ul>
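<p>The vegetarian status rule in these directions can be sketched as a simple check. This is an illustrative Python sketch, not a compliance tool: the <code>dietary_status</code> function and the <code>(name, origin)</code> encoding are assumptions introduced here, and the exempt list simply restates the items named above.</p>

```python
# Items the directions exclude from the non-vegetarian trigger, as listed above.
VEGETARIAN_EXEMPT = frozenset({
    "milk", "milk products", "honey", "beeswax", "carnauba wax", "shellac",
})

def dietary_status(ingredients):
    """Classify a product from (name, origin) pairs.

    origin is 'plant', 'animal', or 'laboratory' (a hypothetical encoding).
    Any non-exempt animal-sourced ingredient makes the product
    non-vegetarian, however small its share of the product.
    """
    for name, origin in ingredients:
        if origin == "animal" and name.lower() not in VEGETARIAN_EXEMPT:
            return "non-vegetarian"
    return "vegetarian"

sample = [("wheat flour", "plant"), ("milk", "animal"), ("gelatine", "animal")]
print(dietary_status(sample))      # non-vegetarian
print(dietary_status(sample[:2]))  # vegetarian
```

<p>The check deliberately ignores quantity, mirroring the direction that even minuscule amounts count.</p>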
<p><strong>Constitutional Basis:</strong> The Court grounded this requirement in Articles 19(1)(a) (freedom of speech and expression, which includes the right to information), 21 (right to life and health), and 25 (freedom of religion) of the Indian Constitution, recognizing that dietary choices based on religious, ethical, or health considerations require transparent ingredient disclosure.</p>
<p><strong>Implication:</strong> This case demonstrates that in labeling disputes, the biological or chemical origin—a technical, scientific classification—supersedes simplified marketing designations or chemical code nomenclature.</p>
</section>
</section>
</section>
<section id="sec-synthesis" class="level1" data-number="5">
<h1 data-number="5"><span class="header-section-number">5</span> Synthesis: The Four-Domain Framework</h1>
<p>Table&nbsp;1 synthesizes the current state of classification law across the four primary domains:</p>
<div id="tbl-domains" class="quarto-float quarto-figure quarto-figure-center anchored">
<figure class="quarto-float quarto-float-tbl figure">
<figcaption class="quarto-float-caption-top quarto-float-caption quarto-float-tbl" id="tbl-domains-caption-0ceaefa1-69ba-4598-a22c-09a6ac19f8ca">
Table&nbsp;1: Classification Framework Across Regulatory Domains
</figcaption>
<div aria-describedby="tbl-domains-caption-0ceaefa1-69ba-4598-a22c-09a6ac19f8ca">
<table class="caption-top table">
<colgroup>
<col style="width: 33%">
<col style="width: 33%">
<col style="width: 33%">
</colgroup>
<thead>
<tr class="header">
<th><strong>Domain</strong></th>
<th><strong>Prevailing Classification Basis</strong></th>
<th><strong>Landmark Case</strong></th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td>GST/Tax</td>
<td>Technical (HSN)</td>
<td><em>Gajanand Foods</em>; <em>Ramdev Food Products</em></td>
</tr>
<tr class="even">
<td>Food Safety</td>
<td>Scientific (Nutrient Levels)</td>
<td><em>3S and Our Health Society</em></td>
</tr>
<tr class="odd">
<td>Customs</td>
<td>Technical (Engineering)</td>
<td><em>M/s Welkin Foods</em></td>
</tr>
<tr class="even">
<td>Dietary Labels</td>
<td>Biological Origin</td>
<td><em>Ram Gaua Raksha Dal</em></td>
</tr>
</tbody>
</table>
</div>
</figure>
</div>
</section>
<section id="sec-disclosure" class="level1" data-number="6">
<h1 data-number="6"><span class="header-section-number">6</span> Mandatory Disclosure Requirements: Implications for Food Informatics</h1>
<p>The cases analyzed in this report establish several mandatory disclosure requirements under Indian law. These requirements have direct implications for food informatics systems, ingredient databases, and regulatory compliance frameworks.</p>
<section id="sec-source-disclosure" class="level2" data-number="6.1">
<h2 data-number="6.1" class="anchored" data-anchor-id="sec-source-disclosure"><span class="header-section-number">6.1</span> Source Disclosure</h2>
<p><strong>Requirement:</strong> Even if an ingredient is heavily processed or becomes a derivative compound, the source must be declared.</p>
<p><strong>Example:</strong> Lecithin must be labeled as “Lecithin (Soy)” or “Lecithin (Egg),” not merely as “Lecithin” or by INS code alone.</p>
<p><strong>Legal Basis:</strong> <em>Ram Gaua Raksha Dal v. Union of India</em></p>
</section>
<section id="sec-dietary-status" class="level2" data-number="6.2">
<h2 data-number="6.2" class="anchored" data-anchor-id="sec-dietary-status"><span class="header-section-number">6.2</span> Dietary Status</h2>
<p><strong>Requirement:</strong> Products must be classified as vegetarian (milk and milk products permitted), non-vegetarian (any other animal-sourced ingredient, egg included), or pure vegetarian, i.e. vegan (no animal source or dairy).</p>
<p><strong>Principle:</strong> Even if an ingredient is a chemical derived from an animal source, it must be declared with respect to its source.</p>
<p><strong>Legal Basis:</strong> <em>Ram Gaua Raksha Dal v. Union of India</em>; Food Safety and Standards (Labelling and Display) Regulations, 2020</p>
</section>
<section id="sec-allergen" class="level2" data-number="6.3">
<h2 data-number="6.3" class="anchored" data-anchor-id="sec-allergen"><span class="header-section-number">6.3</span> Allergen Declaration</h2>
<p><strong>Requirement:</strong> Even if processing is extreme and the allergen potency is reduced from the source level, allergen presence must still be declared for consumer safety.</p>
<p><strong>Legal Basis:</strong> Right to health (Article 21 of the Constitution); FSSAI allergen declaration requirements</p>
</section>
<section id="sec-functional-class" class="level2" data-number="6.4">
<h2 data-number="6.4" class="anchored" data-anchor-id="sec-functional-class"><span class="header-section-number">6.4</span> Functional Class for Chemicals</h2>
<p><strong>Requirement:</strong> For chemical additives, the functional class (purpose of inclusion) must be declared followed by the INS Number.</p>
<p><strong>Example:</strong> “Preservative (INS 202)” rather than merely “INS 202” or “Potassium Sorbate.”</p>
<p><strong>Legal Basis:</strong> FSSAI Labelling Regulations 2020; functional usage declaration requirements</p>
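<p>The two declaration formats described in this section, source disclosure (Section 6.1) and functional class followed by INS number, can be sketched as small formatting helpers. The function names <code>declare_source</code> and <code>declare_additive</code> are hypothetical; the output strings follow the examples given in the text.</p>

```python
def declare_source(ingredient, source):
    """Format an ingredient together with its biological source."""
    if not source:
        raise ValueError(f"source is required for {ingredient!r}")
    return f"{ingredient} ({source})"

def declare_additive(functional_class, ins_number):
    """Format an additive as its functional class followed by the INS number."""
    if not functional_class:
        raise ValueError("functional class is required")
    return f"{functional_class} (INS {ins_number})"

print(declare_source("Lecithin", "Soy"))      # Lecithin (Soy)
print(declare_additive("Preservative", 202))  # Preservative (INS 202)
```

<p>Raising an error on a missing source or functional class keeps an incomplete declaration from reaching a label silently.</p>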
</section>
</section>
<section id="sec-informatics" class="level1" data-number="7">
<h1 data-number="7"><span class="header-section-number">7</span> Analytical Implications for Food Informatics</h1>
<p>The hierarchical framework and mandatory disclosure requirements have profound implications for food informatics systems:</p>
<section id="sec-attribute-classification" class="level2" data-number="7.1">
<h2 data-number="7.1" class="anchored" data-anchor-id="sec-attribute-classification"><span class="header-section-number">7.1</span> Attribute-Based Ingredient Classification</h2>
<p>The cases indicate that whether a product counts as a separate ingredient entity or as a variant of an existing one depends on its functional attributes rather than on its source alone.</p>
<p><strong>Key Principle:</strong> Functionality takes precedence over source material when determining ingredient separateness.</p>
<p>For example, in the instant mix flour cases, the presence of functional additives (leavening agents, spices used for specific culinary purposes) transformed what might be considered a “variant of flour” into a distinct food preparation. The additives were not merely processing aids but functional components that changed the nature of the product.</p>
</section>
<section id="sec-data-modeling" class="level2" data-number="7.2">
<h2 data-number="7.2" class="anchored" data-anchor-id="sec-data-modeling"><span class="header-section-number">7.2</span> Hierarchical Data Modeling</h2>
<p>Food informatics databases must now implement hierarchical classification systems that mirror the judicial hierarchy:</p>
<ol type="1">
<li><strong>Statutory Classification Layer:</strong> HSN codes and tariff classifications</li>
<li><strong>Technical/Scientific Layer:</strong> Chemical composition, functional properties, biological origin</li>
<li><strong>Commercial Layer:</strong> Trade names, common parlance terms (lowest priority)</li>
</ol>
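<p>One way such layering might look in code is sketched below in Python. The <code>ClassificationRecord</code> class, its field names, and the sample trade name are assumptions made for illustration; the only point carried over from the text is the priority order of the three layers.</p>

```python
from dataclasses import dataclass
from typing import Optional

# Layer order follows the list above: statutory first, commercial last.
LAYER_PRIORITY = ("statutory", "technical", "commercial")

@dataclass
class ClassificationRecord:
    statutory: Optional[str] = None   # e.g. an HSN code or tariff item
    technical: Optional[str] = None   # e.g. composition or biological origin
    commercial: Optional[str] = None  # e.g. trade name or common parlance term

    def governing(self):
        """Return (layer, value) for the highest-priority populated layer."""
        for layer in LAYER_PRIORITY:
            value = getattr(self, layer)
            if value is not None:
                return layer, value
        raise ValueError("record has no classification in any layer")

mix = ClassificationRecord(statutory="2106 90", commercial="instant dhokla flour")
print(mix.governing())  # ('statutory', '2106 90')
```

<p>The trade name stays on the record for search and display, but it governs only when the statutory and technical layers are both empty.</p>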
</section>
<section id="sec-metadata" class="level2" data-number="7.3">
<h2 data-number="7.3" class="anchored" data-anchor-id="sec-metadata"><span class="header-section-number">7.3</span> Mandatory Metadata Requirements</h2>
<p>Any comprehensive food ingredient database must now capture:</p>
<ul>
<li>Biological/chemical source (plant, animal, synthetic)</li>
<li>Specific source species/material (even if heavily processed)</li>
<li>Functional class (for additives)</li>
<li>Allergen status (regardless of processing degree)</li>
<li>Dietary classification (vegetarian, vegan, non-vegetarian)</li>
<li>HSN code classification</li>
</ul>
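<p>As a rough illustration, the fields above could be captured in a record type with a completeness check. The <code>IngredientMetadata</code> class, its field names, and the <code>missing_fields</code> helper are hypothetical; only the list of attributes comes from the text.</p>

```python
from dataclasses import dataclass, fields
from typing import Optional

@dataclass
class IngredientMetadata:
    name: str
    source_type: Optional[str] = None       # plant, animal, or synthetic
    source_material: Optional[str] = None   # specific species or material
    functional_class: Optional[str] = None  # applies to additives only
    allergen_status: Optional[str] = None   # recorded regardless of processing
    dietary_class: Optional[str] = None     # vegetarian, vegan, non-vegetarian
    hsn_code: Optional[str] = None
    is_additive: bool = False

def missing_fields(record):
    """List the metadata fields that are still unset on a record."""
    missing = []
    for f in fields(record):
        if f.name in ("name", "is_additive"):
            continue
        if f.name == "functional_class" and not record.is_additive:
            continue  # functional class is only expected for additives
        if getattr(record, f.name) is None:
            missing.append(f.name)
    return missing

lecithin = IngredientMetadata(name="Lecithin", source_type="plant",
                              source_material="soy", is_additive=True)
print(missing_fields(lecithin))
# ['functional_class', 'allergen_status', 'dietary_class', 'hsn_code']
```

<p>A check of this kind could run at ingestion time so that records lacking source or allergen data are flagged before publication.</p>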
</section>
</section>
<section id="sec-conclusion" class="level1" data-number="8">
<h1 data-number="8"><span class="header-section-number">8</span> Conclusion</h1>
<p>The January 6, 2026 Supreme Court judgment in <em>Commissioner of Customs (Import) v. M/s Welkin Foods</em> represents a watershed moment in Indian food classification jurisprudence. By formally establishing a hierarchical framework that prioritizes statutory interpretation and technical definition over common parlance, the Court has fundamentally reoriented how food products are classified across multiple regulatory domains.</p>
<p>This shift reflects the increasing complexity of food systems, the globalization of trade through standardized systems like HSN, and the constitutional imperative for transparent disclosure that enables informed consumer choice. The historical reliance on common parlance, while accessible and commercially grounded, proved insufficient in an era of complex food processing, international trade nomenclature, and diverse dietary requirements based on health, religion, and ethics.</p>
<p>The four-domain analysis presented in this report, spanning GST/Tax, Food Safety, Customs, and Dietary Labels, demonstrates the consistency with which Indian courts and advance ruling authorities now apply technical classification principles. In each domain, technical or scientific attributes supersede lay understanding or commercial nomenclature.</p>
<p>For food informatics systems, regulatory compliance frameworks, and ingredient databases, these decisions mandate a fundamental restructuring. Classification systems must be hierarchical, metadata must capture technical attributes (source, function, composition), and disclosure must prioritize scientific accuracy over commercial simplicity.</p>
<p>This report synthesizes more than six decades of jurisprudence, from <em>Ramavatar Budhaiprasad</em> (1961) to <em>M/s Welkin Foods</em> (2026), and documents the shift from the primacy of common parlance to a hierarchy led by statutory and technical definitions. It provides legal researchers, food &amp; beverage lawyers, compliance professionals, and informatics specialists with a comprehensive framework for understanding and applying current Indian food classification law.</p>
<section id="acknowledgments" class="level2" data-number="8.1">
<h2 data-number="8.1" class="anchored" data-anchor-id="acknowledgments"><span class="header-section-number">8.1</span> Acknowledgments</h2>
<p>My deepest gratitude to Mr.&nbsp;Krishna, whose constancy forms the foundation upon which all my work, including this, quietly rests.</p>
<p>Salutations to the Goddess who dwells in all beings in the form of intelligence. I bow to her again and again.</p>
<p>This report was prepared as part of the Indian Food Informatics Data (IFID) project at the Interdisciplinary Systems Research Lab (iSRL). The synthesis draws upon extensive legal research and domain analysis conducted for food informatics applications.</p>
</section>
</section>

<div id="quarto-appendix" class="default"><section class="quarto-appendix-contents" id="quarto-reuse"><h2 class="anchored quarto-appendix-heading">Reuse</h2><div class="quarto-appendix-contents"><div><a rel="license" href="https://creativecommons.org/licenses/by/4.0/">CC BY 4.0</a></div></div></section><section class="quarto-appendix-contents" id="quarto-citation"><h2 class="anchored quarto-appendix-heading">Citation</h2><div><div class="quarto-appendix-secondary-label">BibTeX citation:</div><pre class="sourceCode code-with-copy quarto-appendix-bibtex"><code class="sourceCode bibtex">@report{a_r2026,
  author = {A R, Lalitha},
  publisher = {iSRL},
  title = {Indian {Supreme} {Court} {Defines} {Hierarchical}
    {Classification} {Framework} for {Food} {Products,} {Overruling}
    {Common} {Parlance} {Precedents}},
  number = {iSRL-26-02-R-SCFood},
  date = {2026-02-15},
  url = {https://isrl.in/pub/2026-02-r-scfood/},
  doi = {10.5281/zenodo.18651646},
  langid = {en}
}
</code></pre><div class="quarto-appendix-secondary-label">For attribution, please cite this work as:</div><div id="ref-a_r2026" class="csl-entry quarto-appendix-citeas">
A R, Lalitha. 2026. <em>Indian Supreme Court Defines Hierarchical
Classification Framework for Food Products, Overruling Common Parlance
Precedents</em>. iSRL-26-02-R-SCFood. iSRL. <a href="https://doi.org/10.5281/zenodo.18651646">https://doi.org/10.5281/zenodo.18651646</a>.
</div></div></section></div> ]]></description>
  <guid>https://isrl.in/pub/2026-02-r-scfood/</guid>
  <pubDate>Sun, 15 Feb 2026 00:00:00 GMT</pubDate>
</item>
<item>
  <title>Encyclopedia of Indian Food Ingredients</title>
  <dc:creator>Lalitha A R</dc:creator>
  <link>https://isrl.in/pub/2026-02-b-encyclopedia/</link>
  <description><![CDATA[ 




<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "Dataset",
  "name": "Encyclopedia of Indian Food Ingredients",
  "@id": "https://doi.org/10.5281/zenodo.18650863",
  "identifier": [
    "https://doi.org/10.5281/zenodo.18650863",
    "iSRL-26-02-B-Encyclopedia"
  ],
  "description": "A standardized taxonomy and multi-format dataset (JSON, Markdown, LaTeX) covering 600+ food components — from traditional Ayurvedic botanicals to contemporary industrial additives. Bridges culinary knowledge with international food standards for machine-readable Indian food data systems.",
  "datePublished": "2026-02-11",
  "version": "v0.1.0",
  "license": "https://creativecommons.org/licenses/by/4.0/",
  "url": "https://isrl.in/pub/2026-02-b-encyclopedia/",
  "author": {
    "@type": "Person",
    "name": "Lalitha A R",
    "identifier": "https://orcid.org/0009-0001-7466-3531",
    "sameAs": "https://orcid.org/0009-0001-7466-3531",
    "email": "lalithaar.research@gmail.com"
  },
  "publisher": {
    "@type": "ResearchOrganization",
    "name": "iSRL",
    "url": "https://isrl.in"
  }
}
</script>
<section id="abstract" class="level2">
<h2 class="anchored" data-anchor-id="abstract">Abstract</h2>
<p>A standardized taxonomy and multi-format dataset (JSON, Markdown, LaTeX) covering 600+ food components — from traditional Ayurvedic botanicals to contemporary industrial additives. Bridges conventional culinary knowledge with international food standards to establish a machine-readable framework for Indian food data systems.</p>
</section>
<section id="repository" class="level2">
<h2 class="anchored" data-anchor-id="repository">Repository</h2>
<p>Source data and formats available at: <a href="https://github.com/ifid-data/encyclopedia" class="uri">https://github.com/ifid-data/encyclopedia</a></p>
<hr>
</section>

<div id="quarto-appendix" class="default"><section class="quarto-appendix-contents" id="quarto-reuse"><h2 class="anchored quarto-appendix-heading">Reuse</h2><div class="quarto-appendix-contents"><div><a rel="license" href="https://creativecommons.org/licenses/by/4.0/">CC BY 4.0</a></div></div></section><section class="quarto-appendix-contents" id="quarto-citation"><h2 class="anchored quarto-appendix-heading">Citation</h2><div><div class="quarto-appendix-secondary-label">BibTeX citation:</div><pre class="sourceCode code-with-copy quarto-appendix-bibtex"><code class="sourceCode bibtex">@dataset{a_r2026,
  author = {A R, Lalitha},
  publisher = {iSRL},
  title = {Encyclopedia of {Indian} {Food} {Ingredients}},
  number = {iSRL-26-02-B-Encyclopedia},
  date = {2026-02-11},
  url = {https://isrl.in/pub/2026-02-b-encyclopedia/},
  doi = {10.5281/zenodo.18650863},
  langid = {en}
}
</code></pre><div class="quarto-appendix-secondary-label">For attribution, please cite this work as:</div><div id="ref-a_r2026" class="csl-entry quarto-appendix-citeas">
A R, Lalitha. 2026. <span>“Encyclopedia of Indian Food
Ingredients.”</span> iSRL-26-02-B-Encyclopedia. iSRL, February 11. <a href="https://doi.org/10.5281/zenodo.18650863">https://doi.org/10.5281/zenodo.18650863</a>.
</div></div></section></div> ]]></description>
  <guid>https://isrl.in/pub/2026-02-b-encyclopedia/</guid>
  <pubDate>Wed, 11 Feb 2026 00:00:00 GMT</pubDate>
</item>
<item>
  <title>Indian Food Ingredients &amp; Label Variants</title>
  <dc:creator>Lalitha A R</dc:creator>
  <link>https://isrl.in/pub/2026-02-ds-variants/</link>
  <description><![CDATA[ 




<script>
document.addEventListener('DOMContentLoaded', function() {
  var meta = document.querySelector('#title-block-header .quarto-title-meta');
  if (!meta) return;
  meta.insertAdjacentHTML('beforeend', '<div><div class="quarto-title-meta-heading">Contributors</div><div class="quarto-title-meta-contents"><p class="author" style="margin:0 0 0.1em 0;">Subrat Sethi</p><p class="author" style="margin:0 0 0.1em 0;">Purnendu Shukla</p></div></div>');
});
</script>
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "Dataset",
  "name": "Indian Food Ingredients & Label Variants",
  "@id": "https://doi.org/10.5281/zenodo.1871452",
  "identifier": [
    "https://doi.org/10.5281/zenodo.1871452",
    "iSRL-26-02-DS-Variants"
  ],
  "description": "A mapping of 2,500+ regional ingredient variations observed on Indian food labels, linking label variants to a canonical vocabulary. Note: this dataset has been superseded — the v1 approach was abandoned after finding it conflated noise reduction with meaningful cultural and linguistic variation.",
  "creativeWorkStatus": "Superseded",
  "license": "https://creativecommons.org/licenses/by/4.0/",
  "url": "https://isrl-research.github.io/pub/2026-02-ds-variants/",
  "author": {
    "@type": "Person",
    "name": "Lalitha A R",
    "identifier": "https://orcid.org/0009-0001-7466-3531",
    "sameAs": "https://orcid.org/0009-0001-7466-3531",
    "email": "lalithaar.research@gmail.com"
  },
  "publisher": {
    "@type": "ResearchOrganization",
    "name": "iSRL",
    "url": "https://isrl-research.github.io"
  }
}
</script>
<div class="callout callout-style-default callout-important callout-titled">
<div class="callout-header d-flex align-content-center">
<div class="callout-icon-container">
<i class="callout-icon"></i>
</div>
<div class="callout-title-container flex-fill">
<span class="screen-reader-only">Important</span>This version has been superseded
</div>
</div>
<div class="callout-body-container callout-body">
<p>This dataset is no longer maintained. The v1 approach was found to be structurally inadequate for the problem it was designed to solve. The full reasoning is documented below. The dataset remains available for reference at the link above.</p>
<p>For current work, see the <a href="https://doi.org/10.5281/zenodo.18714526">Identity, Transformation, and Function framework</a> and its <a href="https://doi.org/10.5281/zenodo.18713318">justification companion</a>.</p>
</div>
</div>
<p>We released <a href="https://doi.org/10.34740/KAGGLE/DSV/14783287">Indian Food Ingredients &amp; Label Variants</a> (v1) with the goal of making ingredient label text parseable by machines. The dataset standardised ingredient names — mapping <code>kashmiri chilli</code> to <code>chilli</code>, for instance — on the assumption that a normalised vocabulary would make automated parsing tractable.</p>
<p>Two problems emerged as data collection continued.</p>
<p>First, the approach trades away information the project is now explicitly committed to preserving. The data makes this concrete.</p>
<div id="load-data" class="cell" data-execution_count="1">
<div class="cell-output cell-output-stdout">
<pre><code>                            canon               variant
0                      A2 Protein            a2 protein
1  Acesulfame Potassium (INS 950)          acesulfame k
2  Acesulfame Potassium (INS 950)  acesulfame potassium
3  Acesulfame Potassium (INS 950)     sweetener ins 950
4           Acetic Acid (INS 260)           acetic acid</code></pre>
</div>
</div>
<div id="chilli-canon" class="cell" data-execution_count="2">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb2" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb2-1"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> pandas <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">as</span> pd</span>
<span id="cb2-2"></span>
<span id="cb2-3">df <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> pd.read_csv(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"data/ingredients.csv"</span>, header<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">None</span>, names<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"canon"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"variant"</span>])</span>
<span id="cb2-4"></span>
<span id="cb2-5"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># All variants that map to Chilli in v1</span></span>
<span id="cb2-6">chilli <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> df[df[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"canon"</span>] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Chilli"</span>].copy()</span>
<span id="cb2-7"></span>
<span id="cb2-8"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># The ones that carry regional and variety-level identity</span></span>
<span id="cb2-9">regional <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> chilli[chilli[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"variant"</span>].<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">str</span>.contains(</span>
<span id="cb2-10">    <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"kashmiri|mathania|jalapeño|lal mirch"</span>, case<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">False</span></span>
<span id="cb2-11">)].reset_index(drop<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">True</span>)</span>
<span id="cb2-12"></span>
<span id="cb2-13"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(regional.to_string(index<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">False</span>))</span></code></pre></div></div>
<div class="cell-output cell-output-stdout">
<pre><code> canon                                                          variant
Chilli                                                  kashmiri chilli
Chilli                                               kashmiri lal mirch
Chilli                                                    mild jalapeño
Chilli salt with spices and condiments chillies and capsicum lal mirchi
Chilli                 spices and condiments kashmiri red chilli powder
Chilli                 spices and condiments mathania red chilli powder
Chilli                                      stalkless kashmiri chillies</code></pre>
</div>
</div>
<p>In v1, every row above maps to <code>Chilli</code>. <code>Kashmiri lal mirch</code>, <code>mathania red chilli powder</code>, <code>stalkless kashmiri chillies</code> — all collapsed into the same canon as <code>chili powder</code> and <code>red chilly flakes</code>.</p>
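<p>The size of the collapse can be counted directly. The sketch below is illustrative only: it uses a handful of rows copied from the outputs above as inline data rather than reading <code>data/ingredients.csv</code>, and the names <code>rows</code> and <code>variants_per_canon</code> are assumptions made for this example.</p>

```python
from collections import Counter

# Sample (canon, variant) pairs copied from the printed outputs above.
# The real file holds 2500+ rows; this is only an illustration.
rows = [
    ("Chilli", "kashmiri chilli"),
    ("Chilli", "kashmiri lal mirch"),
    ("Chilli", "mild jalapeño"),
    ("Chilli", "stalkless kashmiri chillies"),
    ("Acetic Acid (INS 260)", "acetic acid"),
]

# Count how many distinct label strings are folded into each canon.
variants_per_canon = Counter(canon for canon, _variant in set(rows))

for canon, n in variants_per_canon.most_common():
    print(f"{canon}: {n} distinct label strings mapped to one value")
```

<p>On this sample, <code>Chilli</code> absorbs four distinct label strings, three of which name a specific variety.</p>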
<p>The brands that wrote these labels did not have to. <code>Kashmiri chilli</code> could have been declared as <code>chilli</code> — it would have been legally compliant. The choice to name it specifically was a choice to preserve something: a regional identity, a flavour profile, a cultural referent that Indian consumers recognise and reach for. The v1 mapping erases that choice.</p>
<p>This is not only a question of cultural fidelity. Ingredient identity has legal and fiscal consequences. Fresh Alphonso mangoes attract 0% GST as agricultural produce; mango pulp processed from a specific GI-tagged variety enters a different regulatory category. <code>Kashmiri chilli</code> carries a Geographical Indication; a generic <code>chilli</code> does not. When a mapping table collapses these into one canon, it does not simplify the data; it destroys the signal that downstream regulatory, taxation, and traceability systems depend on. Respecting the taste of India is not a sentiment; it is a data integrity requirement.</p>
<p>Second, the ingredient name space in Indian packaged food is too diverse for automated mapping to be reliable. The problem splits into two structurally different cases:</p>
<ul>
<li><strong>Semantic variants</strong> — spelling differences, typos, punctuation variation — can be resolved with a comprehensive mapping table, because the variation is noise around a stable referent. <code>Chenna</code>, <code>bengal gram flour</code>, and <code>chickpea flour</code> are different names for the same thing. <code>Palmitate</code> and <code>palm oil</code> are not — they are similar-sounding but distinct ingredients.</li>
<li><strong>Cultural and linguistic variants</strong> — regional names, transliterations, variety-level distinctions (like alphonso mango) — cannot be mapped reliably because the variation itself carries meaning. A model trained on such a mapping would not learn the differences; it would erase them.</li>
</ul>
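<p>The two cases above can be kept apart in code. The following is a minimal sketch, not the v1 pipeline: the <code>SPELLING</code> table, the <code>MEANINGFUL</code> qualifier set, and the <code>normalise</code> function are all hypothetical and exist only to show noise being cleaned while meaning-bearing qualifiers are kept alongside the cleaned tokens.</p>

```python
import re

# Hypothetical spelling table: variation that is noise around a stable referent.
SPELLING = {"chili": "chilli", "chilly": "chilli", "chillies": "chilli"}

# Hypothetical qualifiers whose presence carries meaning and must survive.
MEANINGFUL = {"kashmiri", "mathania", "alphonso"}


def normalise(label):
    """Clean case, punctuation, and known spellings; report meaningful qualifiers.

    Returns (tokens, qualifiers). Qualifiers are reported, never dropped.
    """
    # Keep runs of letters only, which discards punctuation and digits.
    tokens = re.findall(r"[^\W\d_]+", label.lower())
    tokens = [SPELLING.get(t, t) for t in tokens]
    qualifiers = [t for t in tokens if t in MEANINGFUL]
    return tokens, qualifiers


print(normalise("Red Chilly flakes"))
print(normalise("Stalkless Kashmiri Chillies"))
```

<p>The design choice is that spelling repair and qualifier detection are separate steps, so fixing a typo never removes a variety name.</p>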
<p>Maintaining a single mapping table that handles both cases conflates two different problems. In practice, it means tracking every normalisation decision made during data cleaning, effectively a log of every typo fixed across thousands of rows, with no mechanism to distinguish meaningful variation from noise.</p>
<p>The ingredient substrate under development makes this mapping unnecessary. A deterministic identity layer — one that assigns canonical identifiers to ingredients independent of how they are written on any given label — eliminates the need for probabilistic name matching at parse time. Labels are parsed against the substrate, not against a maintained vocabulary of variants.</p>
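<p>What deterministic resolution might look like can be sketched in a few lines. Everything here is an assumption made for illustration: the <code>SUBSTRATE</code> dictionary, the identifier strings, and the <code>resolve</code> function are hypothetical and do not describe the actual substrate or its identifier scheme.</p>

```python
# Hypothetical substrate: entries and identifier strings are illustrative only.
SUBSTRATE = {
    "kashmiri chilli": "ing:chilli/kashmiri",
    "mathania red chilli": "ing:chilli/mathania",
    "chilli": "ing:chilli",
}


def resolve(label):
    """Resolve a label by exact lookup after trivial case and whitespace cleanup.

    Returns the identifier, or None when the label is not in the substrate.
    There is no fuzzy or probabilistic matching, so the result is repeatable.
    """
    key = " ".join(label.lower().split())
    return SUBSTRATE.get(key)


print(resolve("  Kashmiri   Chilli "))
print(resolve("kashmir chili"))
```

<p>A label that is not in the substrate returns <code>None</code> rather than the nearest guess, so an unknown name is surfaced for review and is never silently merged into a neighbour.</p>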
<p>The v1 dataset will remain available for reference. The label variants mapping will not be maintained going forward.</p>
<hr>
<p>This brings us to the question of how we extract the variants in a way that preserves the signal.</p>
<p>How do we formalise the intuition that milk solids belongs under milk while butter feels different? How do we measure the distance between a variant and its source ingredient?</p>
<p>These questions led to a food classification framework inspired by Ranganathan’s 1933 Colon Classification<sup>1</sup><sup>2</sup> and grounded in Indian judicial and regulatory precedents — FSSAI, ITC-HS, court rulings.</p>
<div class="no-row-height column-margin column-container"><div id="fn1"><p><sup>1</sup>&nbsp;Colon Classification (Faceted Classification) by S R Ranganathan, Father of Indian Library Science.</p></div><div id="fn2"><p><sup>2</sup>&nbsp;Instead of a flat list, faceted classification lets us express a single object as a set of values across independent dimensions — the way filtering by price, type, and brand on Amazon works, rather than browsing a single ranked list.</p></div></div><ul>
<li><a href="https://doi.org/10.5281/zenodo.18714526">Identity, Transformation, and Function: A Tri-Axial Model for the Classification of Food Ingredient Identity</a></li>
<li><a href="https://doi.org/10.5281/zenodo.18713318">Justification companion</a></li>
</ul>
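<p>A toy version of the faceted idea shows how the milk solids and butter question could become measurable. This is a sketch under stated assumptions, not the framework's definition: the three field names follow the axes named in the paper title, while the facet values and the <code>facet_distance</code> metric are hypothetical.</p>

```python
from dataclasses import astuple, dataclass


@dataclass(frozen=True)
class Facets:
    identity: str        # what the ingredient derives from
    transformation: str  # what was done to it
    function: str        # what role it plays in the product


def facet_distance(a, b):
    """Count the facets on which two ingredients differ (toy metric)."""
    return sum(x != y for x, y in zip(astuple(a), astuple(b)))


# Illustrative facet values only; not taken from the published framework.
milk = Facets("milk", "none", "base")
milk_solids = Facets("milk", "dried", "base")
butter = Facets("milk", "churned", "fat")

print(facet_distance(milk, milk_solids))
print(facet_distance(milk, butter))
```

<p>Under these made-up values, milk solids differs from milk on one facet and butter on two, which is one way the intuition could be expressed as a number.</p>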




<div id="quarto-appendix" class="default"><section class="quarto-appendix-contents" id="quarto-reuse"><h2 class="anchored quarto-appendix-heading">Reuse</h2><div class="quarto-appendix-contents"><div><a rel="license" href="https://creativecommons.org/licenses/by/4.0/">CC BY 4.0</a></div></div></section><section class="quarto-appendix-contents" id="quarto-citation"><h2 class="anchored quarto-appendix-heading">Citation</h2><div><div class="quarto-appendix-secondary-label">BibTeX citation:</div><pre class="sourceCode code-with-copy quarto-appendix-bibtex"><code class="sourceCode bibtex">@dataset{a_r2026,
  author = {A R, Lalitha},
  publisher = {iSRL},
  title = {Indian {Food} {Ingredients} \&amp; {Label} {Variants}},
  number = {iSRL-26-02-DS-Variants},
  date = {2026-02-01},
  url = {https://isrl.in/pub/2026-02-ds-variants/},
  doi = {10.5281/zenodo.1871452},
  langid = {en},
  abstract = {**This dataset has been superseded.** The v1 mapping
    approach — standardising ingredient label variants to a canonical
    vocabulary — was found to conflate noise reduction with meaningful
    cultural and linguistic variation. This document explains why the
    approach was abandoned and what replaced it. A mapping of 2500+
    regional ingredient variations as observed in Indian labels. This
    dataset provides a structured mapping of the diverse ways
    ingredients are named on Indian food packaging, linking variants
    (the actual text found on labels) to a canon (a standardised, clean
    category). Example mapping: Canon: Acetic Acid (INS 260) — Variants:
    acidity regulator 260, vinegar, ins 260, acetic acid (260).}
}
</code></pre><div class="quarto-appendix-secondary-label">For attribution, please cite this work as:</div><div id="ref-a_r2026" class="csl-entry quarto-appendix-citeas">
A R, Lalitha. 2026. <span>“Indian Food Ingredients &amp; Label
Variants.”</span> iSRL-26-02-DS-Variants. iSRL, February 1. <a href="https://doi.org/10.5281/zenodo.1871452">https://doi.org/10.5281/zenodo.1871452</a>.
</div></div></section></div> ]]></description>
  <guid>https://isrl.in/pub/2026-02-ds-variants/</guid>
  <pubDate>Sun, 01 Feb 2026 00:00:00 GMT</pubDate>
</item>
</channel>
</rss>
