Recommendation

Estimating the entropy of neural data by saving them as a .png file

Haudur Freyja Olafsdottir, Mahesh Karnani and Fleur Zeldenrust based on reviews by Federico Stella and 2 anonymous reviewers

A recommendation of:

A quick and easy way to estimate entropy and mutual information for neuroscience

Mickael Zbili, Sylvain Rama (2020), bioRxiv, 2020.08.04.236174, ver. 3 peer-reviewed and recommended by PCI Circuit Neuroscience https://doi.org/10.1101/2020.08.04.236174

Abstract

A quick and easy way to estimate entropy and mutual information for neuroscience

Calculations of entropy of a signal or mutual information between two variables are valuable analytical tools in the field of neuroscience. They can be applied to all types of data, capture nonlinear interactions and are model independent. Yet the limited size and number of recordings one can collect in a series of experiments make their calculation highly prone to sampling bias. Mathematical methods to overcome this so-called “sampling disaster” exist, but require significant expertise and considerable time and computational cost. As such, there is a need for a simple, unbiased and computationally efficient tool for estimating the level of entropy and mutual information. In this paper, we propose that the application of entropy-encoding compression algorithms widely used in text and image compression fulfills these requirements. By simply saving the signal in PNG picture format and measuring the size of the file on the hard drive, we can estimate entropy changes through different conditions. Furthermore, with some simple modifications of the PNG file, we can also estimate the evolution of mutual information between a stimulus and the observed responses through different conditions. We first demonstrate the applicability of this method using white-noise-like signals. Then, while this method can be used in all kinds of experimental conditions, we provide examples of its application in patch-clamp recordings, detection of place cells and histological data. Although this method does not give an absolute value of entropy or mutual information, it is mathematically correct, and its simplicity and broad use make it a powerful tool for their estimation through experiments.

Entropy, Mutual Information, Electrophysiology, Histology, PNG, DEFLATE, Place Fields

Submission: posted 06 August 2020
Recommendation: posted 20 April 2021, validated 20 April 2021

Cite this recommendation as:
Olafsdottir, H., Karnani, M. and Zeldenrust, F. (2021) Estimating the entropy of neural data by saving them as a .png file. Peer Community in Neuroscience, 100001. https://doi.org/10.24072/pci.cneuro.100001

Recommendation

Entropy and mutual information are useful metrics for quantitative analyses of various signals across the sciences including neuroscience (Verdú, 2019). The information that a neuron transfers about a sensory stimulus is just one of many examples of this. However, estimating the entropy of neural data is often difficult due to limited sampling (Tovée et al., 1993; Treves and Panzeri, 1995). This manuscript overcomes this problem with a 'quick and dirty' trick: just save the corresponding plots as PNG files and measure the file sizes! The idea is that the size of the PNG file obtained by saving a particular set of data will reflect the amount of variability present in the data and will therefore provide an indirect estimation of the entropy content of the data.
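The idea described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the helper name `png_bytes`, the synthetic test signals, the choice of filter type 0 on every scanline and zlib compression level 9 are all assumptions made here. The PNG is built in memory with the standard library only, and its length in bytes stands in for the file size on disk.

```python
import random
import struct
import zlib


def png_bytes(rows):
    """Encode equal-length rows of 8-bit grey values (0-255) as an in-memory PNG."""
    height, width = len(rows), len(rows[0])

    def chunk(tag, data):
        # A PNG chunk is: 4-byte length, 4-byte type, data, CRC-32 of type + data.
        crc = zlib.crc32(tag + data) & 0xFFFFFFFF
        return struct.pack(">I", len(data)) + tag + data + struct.pack(">I", crc)

    # IHDR: width, height, bit depth 8, colour type 0 (greyscale),
    # compression method 0 (DEFLATE), filter method 0, no interlacing.
    ihdr = struct.pack(">IIBBBBB", width, height, 8, 0, 0, 0, 0)
    # Each scanline starts with a filter-type byte; 0 means "no filtering".
    raw = b"".join(b"\x00" + bytes(row) for row in rows)
    idat = zlib.compress(raw, 9)
    signature = b"\x89PNG\r\n\x1a\n"
    return signature + chunk(b"IHDR", ihdr) + chunk(b"IDAT", idat) + chunk(b"IEND", b"")


rng = random.Random(0)  # fixed seed so the example is repeatable
width, height = 256, 64

# Hypothetical test signals: a constant trace and noise of two different spreads.
flat = [[128] * width for _ in range(height)]
narrow = [[128 + rng.randint(-2, 2) for _ in range(width)] for _ in range(height)]
wide = [[rng.randint(0, 255) for _ in range(width)] for _ in range(height)]

sizes = {name: len(png_bytes(img))
         for name, img in [("flat", flat), ("narrow", narrow), ("wide", wide)]}
print(sizes)  # size in bytes grows with the variability of the signal
```

All three images have the same dimensions, so the differences in size come from the compressed payload alone. Only the ordering of the sizes is meaningful here, not their absolute values.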

The method the study employs is based on Shannon’s Source Coding Theorem - an approach used in the field of compressed sensing - which is still not widely used in neuroscience. The resulting algorithm is very straightforward, essentially consisting of just saving a figure of your data as a PNG file. It therefore provides a useful tool for a fast and computationally efficient evaluation of the information content of a signal, without having to resort to more math-heavy methods (as the computation is done “for free” by the PNG compression software). It also opens up the possibility of pursuing a similar strategy with image compression formats other than PNG. The main limitation is that the PNG conversion method presented here allows only a relative entropy estimation: the size of the file is not the absolute value of entropy, because the PNG algorithm also involves filtering for 2D images.
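The link between file size and entropy can be written out under simplifying assumptions. The following is a sketch, assuming the data are $N$ independent samples drawn from a discrete distribution $p(x_i)$; the symbols $N$, $\ell_N$, $S_{\mathrm{PNG}}$ and $c$ are introduced here for illustration and are not taken from the manuscript.

$$
H(X) = -\sum_i p(x_i)\,\log_2 p(x_i), \qquad \mathbb{E}[\ell_N] \;\ge\; N\,H(X),
$$

where $\ell_N$ is the length in bits of any lossless (uniquely decodable) encoding of the $N$ samples. If the file size $S_{\mathrm{PNG}}$, in bytes, is decomposed as

$$
8\,S_{\mathrm{PNG}} = c + \ell_N^{\mathrm{DEFLATE}},
$$

with $c$ the roughly fixed cost in bits of the header and chunk framing, then on average the file size lies above $N\,H(X)$ by an unknown margin, since DEFLATE is not an optimal code and the filtering step changes the byte stream that is compressed. Under this reading, differences in $S_{\mathrm{PNG}}$ between conditions with equal $N$ are informative, while the absolute value is not.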

The study comprehensively reviews the use of entropy estimation in circuit neuroscience, and then tests the PNG method against other math-heavy methods, which have also been made accessible elsewhere (Ince et al., 2010). The study demonstrates use of the method in several applications. First, the mutual information between stimulus and neural response in whole-cell and unit recordings is estimated. Second, the study applies the method to experimental situations with less experimental control - such as recordings of hippocampal place cells (O’Keefe & Dostrovsky, 1971) as animals freely explore an environment. The study shows the method can replicate previously established metrics in the field (e.g. Skaggs information, Skaggs et al., 1993). Importantly, it does this while making fewer assumptions about the data than traditional methods. Third, the study extends the use of the method to imaging data of neuronal morphology, such as charting the growth stage of neuronal cultures. However, the radial entropy of a dendritic tree seems at first more difficult to interpret than the common Sholl analysis of radial crossings of dendrite segments (Figure 6Ac of Zbili and Rama, 2021). As the authors note, a similar technique is used in paleobiology to discriminate pictures of biogenic rocks from abiogenic ones (Wagstaff and Corsetti, 2010). Perhaps neuronal subtypes could also be easily distinguished through PNG file size (Yuste et al., 2020). These examples are generally promising and creative applications. The authors used open source software and openly shared their code so anyone can give it a spin (https://github.com/Sylvain-Deposit/PNG-Entropy).

We were inspired by the wide applicability of the presented back-of-the-envelope technique, so we used it in a situation that the study had not tested: namely, the dissection of microcircuits via optogenetic tagging of target neurons. In this process, one is often confronted with the problem that not only the opsin-carrying cells will spike in response to light, but also other nearby neurons which are activated synaptically (via the opto-tagged cell). Separating these two types of responses is typically done using a latency or jitter analysis, which requires the experimenter to search subjectively for detection parameters. A rapid and objective technique is therefore preferable. The PNG rate difference method, applied to slice whole-cell recordings of opsin-tagged neurons, revealed higher mutual information metrics for direct optogenetic activation than for postsynaptic responses, showing that the method can easily be used to objectively segregate different spike triggers.
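The logic of this comparison can be illustrated with a simplified analogue. The sketch below is not the PNG rate difference computed by the authors' `BatchSaveAsPNG.py`; it is a hypothetical proxy, assuming that sweeps which repeat the same stimulus-locked pattern compress much better together than separately. The function names `deflate_size` and `shared_size`, the synthetic sweeps and all parameter values are assumptions made here. It calls zlib directly, since the image data inside a PNG file is stored as a zlib (DEFLATE) stream.

```python
import random
import zlib


def deflate_size(data: bytes) -> int:
    """Size in bytes of the zlib/DEFLATE stream for `data`."""
    return len(zlib.compress(data, 9))


def shared_size(trials):
    """Hypothetical proxy: bytes saved by compressing all sweeps together
    rather than one by one. Large when the sweeps repeat the same pattern."""
    separate = sum(deflate_size(bytes(t)) for t in trials)
    together = deflate_size(b"".join(bytes(t) for t in trials))
    return separate - together


rng = random.Random(1)  # fixed seed so the example is repeatable
n_trials, n_bins = 20, 300

# "Reliable" sweeps: each one follows the same template, with a few bins altered.
template = [rng.randint(0, 255) for _ in range(n_bins)]
reliable = []
for _trial in range(n_trials):
    sweep = list(template)
    for _change in range(5):
        sweep[rng.randrange(n_bins)] = rng.randint(0, 255)
    reliable.append(sweep)

# "Unreliable" sweeps: each one is drawn independently of the others.
unreliable = [[rng.randint(0, 255) for _ in range(n_bins)] for _ in range(n_trials)]

print(shared_size(reliable), shared_size(unreliable))
```

The synthetic byte sequences only stand in for digitised sweeps. With real recordings the result would depend on how the traces are binned and quantised, which this sketch does not address.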

Figure caption: Using a PNG entropy metric to distinguish between direct optogenetic responses and postsynaptic excitatory responses. Left, PNG rate difference calculated for whole-cell recordings of optogenetic activation in brain slices. About 20 consecutive 60 ms sweeps were analysed from each of 7 postsynaptic cells and 8 directly activated cells. Analysis was performed as in Fig. 4B of the preprint (https://doi.org/10.1101/2020.08.04.236174) using code from https://github.com/Sylvain-Deposit/PNG-Entropy/blob/master/BatchSaveAsPNG.py. Right, six example traces from a cell carrying channelrhodopsin (black, top) and a cell that was excited synaptically (gray, bottom).

References

Ince, R.A.A., Mazzoni, A., Petersen, R.S., and Panzeri, S. (2010). Open source tools for the information theoretic analysis of neural data. Front Neurosci 4. https://doi.org/10.3389/neuro.01.011.2010

O'Keefe, J., & Dostrovsky, J. (1971). The hippocampus as a spatial map. Preliminary evidence from unit activity in the freely-moving rat. Brain Research, 34(1), 171-175. https://doi.org/10.1016/0006-8993(71)90358-1

Skaggs, W. E., McNaughton, B. L., Gothard, K. M., and Markus, E. J. (1993). An information-theoretic approach to deciphering the hippocampal code. Adv. Neural Inform. Process Syst. 5, 1030-1037.

Tovée, M.J., Rolls, E.T., Treves, A., and Bellis, R.P. (1993). Information encoding and the responses of single neurons in the primate temporal visual cortex. J Neurophysiol 70, 640-654. https://doi.org/10.1152/jn.1993.70.2.640

Treves, A., and Panzeri, S. (1995). The Upward Bias in Measures of Information Derived from Limited Data Samples. Neural Computation 7, 399-407. https://doi.org/10.1162/neco.1995.7.2.399

Verdú, S. (2019). Empirical Estimation of Information Measures: A Literature Guide. Entropy (Basel) 21. https://doi.org/10.3390/e21080720

Wagstaff, K.L., and Corsetti, F.A. (2010). An evaluation of information-theoretic methods for detecting structural microbial biosignatures. Astrobiology 10, 363-379. https://doi.org/10.1089/ast.2008.0301

Yuste, R., Hawrylycz, M., Aalling, N., Aguilar-Valles, A., Arendt, D., Armañanzas, R., Ascoli, G.A., Bielza, C., Bokharaie, V., Bergmann, T.B., et al. (2020). A community-based transcriptomics classification and nomenclature of neocortical cell types. Nat Neurosci 23, 1456-1468. https://doi.org/10.1038/s41593-020-0685-8

Zbili, M., and Rama, S. (2021). A quick and easy way to estimate entropy and mutual information for neuroscience. BioRxiv 2020.08.04.236174. https://doi.org/10.1101/2020.08.04.236174

Conflict of interest:
The recommender in charge of the evaluation of the article and the reviewers declared that they have no conflict of interest (as defined in the code of conduct of PCI) with the authors or with the content of the article. The authors declared that they comply with the PCI rule of having no financial conflicts of interest in relation to the content of the article.

Reviews

Reviewed by Federico Stella, 16 Mar 2021

The authors answered my concerns and I think the paper does not need further additions.

https://doi.org/10.24072/pci.cneuro.100001.rev21

Reviewed by anonymous reviewer 1, 28 Feb 2021

The second submission version of the manuscript addresses most of the remarks that I made in the previous round. I also appreciate the additions included in the new version, namely the introduction of the smart PNG-rate concept, and the new possible application examples. However, some of the issues I pointed out about the original version have not been dealt with completely, and would still need minor revision.

The main remark is still about the definitions and terminology. The authors have clarified the presentation and use of the terms well, but I still find that there are a few more clarifications needed.

For one, the concept of entropy rate still pops up out of the blue in the presentation. It appears first on line 86, then on lines 141-142, and then in Section 2.2. It should preferably be defined formally before all of these. A somewhat formal definition is given on lines 247-248 with "The entropy rate R is defined as $R = \frac{1}{T}H$ with T being the sampling of the signal." This should indeed appear earlier, preferably with a bit more elaboration.

Moreover, the term 'sampling of the signal' for $T$ in the above quote is inconsistent with the earlier terminology. Also, I still hope for more clarification of the term 'size'. This also applies to its use on lines 84-86; i.e. what does "a signal of size S" exactly mean?
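The definition quoted above, $R = \frac{1}{T}H$, can be made concrete with a small sketch. It assumes one possible reading of $T$, namely the length in samples of the non-overlapping words over which $H$ is computed; that reading, the function names `entropy_bits` and `entropy_rate`, and the toy signal are assumptions made here, not definitions from the manuscript. The entropy is the plain plug-in estimate, which is the estimator that suffers from sampling bias on short recordings.

```python
import math
from collections import Counter


def entropy_bits(symbols):
    """Plug-in Shannon entropy (bits) of the empirical distribution of `symbols`."""
    counts = Counter(symbols)
    n = sum(counts.values())
    return sum((c / n) * math.log2(n / c) for c in counts.values())


def entropy_rate(signal, T):
    """R = H / T, with H the entropy of non-overlapping words of T samples.
    Reading T as a word length in samples is an assumption of this sketch."""
    words = [tuple(signal[i:i + T]) for i in range(0, len(signal) - T + 1, T)]
    return entropy_bits(words) / T


alternating = [0, 1] * 50
print(entropy_rate(alternating, 1))  # single samples: 0 and 1 are equally likely
print(entropy_rate(alternating, 2))  # pairs: only the word (0, 1) ever occurs
```

The same signal gives a rate of 1 bit per sample for $T = 1$ and 0 for $T = 2$, which shows why the meaning of $T$ needs to be fixed before entropy rates can be compared.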

In addition, I still have a few smaller remarks about the manuscript. The authors have dealt only partly with my remark about italicization of math symbols. For instance, around Equation 1, the $x_i$-terms are in italic only in the equation, but not in the body text. The symbols should be in italic everywhere, not only inside equations. This should also apply to the variables $T$, $v$, $R_S$, $R_N$, the probabilities $p(x)$, etc.

There still appear some language issues and typos, e.g. two in "In this study, the authors showed that a larger synapse drived the postsynaptic spiking in a greater manner, which increase the SIE." At least some of these mistakes should be possible to detect even with proof-reading software. Correcting them would make the presentation more convincing and readable.

https://doi.org/10.24072/pci.cneuro.100001.rev22

Evaluation round #1

DOI or URL of the preprint: 10.1101/2020.08.04.236174

Version of the preprint: 1

Author's Reply, 31 Jan 2021

Download author's reply https://doi.org/10.24072/pci.neuro.100001.ar1

Decision by Haudur Freyja Olafsdottir, Mahesh Karnani and Fleur Zeldenrust, posted 05 Nov 2020

Dear Sylvain and Mickael,

Thanks for submitting your preprint to PCI C Neuro, and apologies again for the delays - it took us some time to reach outside our usual neuroscientist pool in order to get feedback on the mathematics as well. The reviews of your manuscript "A quick and easy way to estimate entropy and mutual information for neuroscience" are below. The manuscript was evaluated by 2 reviewers and 2 of us (reviewers 3 and 4). Everyone was enthusiastic about the work, but felt there were some crucial edits to be made. Furthermore, the reviewers would like you to make explicit under what experimental conditions the method can be used, which could enhance the use of the presented method. If you could write a response to the reviews and upload a revised preprint, we would be happy to evaluate a revision for recommendation, this time with a much faster turnaround time. We hope this feedback helps in the preparation of a revised manuscript. Thank you for participating in the PCI Circuit Neuroscience initiative.

Best regards,
Freyja, Mahesh and Fleur

Reviewer 1:

The manuscript presents a novel method for entropy estimation based on leveraging optimal noiseless compression algorithms, already developed for image processing, and in particular the PNG format. The idea is that the size of the PNG file obtained by saving a particular set of data will reflect the amount of variability present in the data and will therefore provide an indirect estimation of the entropy content of the data. The method is based on Shannon’s Source Coding Theorem, therefore approaching the entropy estimation problem from the field of compressed sensing, something that, as the authors rightly state, has not yet found wide use, at least in the field of neuroscience. The resulting algorithm is extremely straightforward, essentially consisting of just the PNG saving step. Therefore it provides a useful tool for a fast and computationally efficient evaluation of the complexity of a signal, without having to resort to more math-heavy methods (as the math is done “for free” by the PNG compression software).

The main issue with this method is that in its present form it is essentially limited to private use by the experimenter, when needing an on-the-go assessment of the variation of entropy content in some preparation. In particular, it seems to me that it is especially suitable for well-controlled recordings with multiple trials of fixed length or for continuous imaging of cell cultures (and indeed, these are the examples presented in the manuscript). It would be problematic to apply the method in its present form either to experiments with a larger behavioral component or, on the other hand, to compare different recordings. This comes from the fact that the PNG conversion method presented here allows only for relative entropy estimation. The authors rightly note how the size of the file has no direct relationship to the value of entropy. And indeed all the examples presented in the manuscript deal with the estimation of entropy or information content in data of fixed size obtained from the same system. While this is no doubt a very handy approach to obtain quick measures of entropy for large datasets, it is hard for me to see how it could be generalized to allow for comparison across different experimental setups, or the same experimental setup in different conditions.

Taking the case of mutual information, consider as an example the mutual information between a place cell's activity and the position of the animal in space. The conditional entropy component of the mutual information requires considering different bins in space, for which variable amounts of data would be available, given the heterogeneities of the animal's behavior. Moreover, comparing two neurons recorded in two different sessions will also present the problem of different exploration times and amounts of data. I am not sure whether the authors have already considered cases of this type and whether it would be possible to include a normalization step in their method, so that, for example, PNG size is compared to an image of the same size obtained from a random process. As this would greatly extend the applicability of the PNG conversion, the paper would benefit from a discussion of possible solutions to this sort of situation.

Reviewer 2:

The article proposes quantifying the entropy and mutual information of neuroscientific data based on file size after compressing the data as a PNG file. The idea is intriguing and is presented and studied in the article fairly thoroughly. The paper has some merits, but there are multiple issues as well, which are listed below.

The authors correctly note the issue that file size does not correspond to absolute entropy values. Nevertheless, they state, already in the Abstract, that "we can reliably estimate entropy" with the method, which seems a bit like false advertising. Perhaps using "the level of entropy" in place of just "entropy" would be more accurate.

The paper contains some repetition, especially in Sections 2.2 and 3.1. However, even the parts that are explained twice are not explained clearly enough; in particular, the quadratic extrapolation method requires some clarification.

The terminology should be defined and stated more clearly and rigorously in the beginning; for instance, the authors use the terms "sampling", "bin" and "word" related to the variable T, but the meaning of these terms and of the variable nevertheless remains unclear.

The authors state the formula of Shannon entropy in Equation 1, but they do not discuss what the probability space is on which the entropy is considered in the context of the paper. It would be nice to have a discussion about this at least on a descriptive and intuitive level.

The authors use the terms entropy and entropy rate interchangeably, although they are two quite different concepts. Which one are the authors actually interested in computing? This especially causes confusion when interpreting Figure 2A.

Figure 2B has a couple of issues. First, the authors state that the plot of the PNG file size "follows the same curve than the entropy". However, although they appear similar, the values of the curves differ quite substantially at ~25% white pixels, where the entropy is around 50% of the maximum entropy but the file size is above 60% of the maximum. Secondly, the file size curve seems slightly asymmetric, which is surprising, as the file size should be the same regardless of whether one compresses an image containing x percent white pixels or x percent black pixels. Perhaps the file size curve should be averaged over more runs of the compression?

The authors should discuss the fact that, in addition to compression, the PNG algorithm also involves filtering for 2D images (see e.g. http://www.libpng.org/pub/png/book/chapter09.html), which affects the compression size of 2D images. The authors should note that this also affects the results gained with the 2D images. Currently the 2D images are simply transformed into 1D signals.

There are quite a few grammatical errors and typos throughout the text, so I would urge the authors to read through and edit the manuscript with care in this respect.

The authors talk about "white noise with amplitude x". Is this a common convention for the use of the term amplitude? Perhaps some other terminology would be more standard?

The mathematical notation has some issues. Most crucially, Equation 5 for the conditional entropy is simply incorrect. The conditional probability should be denoted with a vertical line "|", not a forward slash "/". The mathematical variables should always be italicized (e.g. on lines 41-42). The two (unnumbered) Equations between lines 161 and 166 are repeated unnecessarily in Equations 2 and 8. Also, the Equation below line 161 has the term T in all three limits.

Reviewer 3:

This preprint presents a PNG-file compression based method for estimating relative entropy of time series signals and 2-d images. The authors start with a very useful presentation of another entropy estimation method and a careful demonstration of the better performance of the PNG method. They also explore the limitations in Fig. 2. After this, they present the use of the PNG method for analysing neural data by obtaining a metric similar to mutual information for repetitive trials of electrophysiological data, and analysing dendritic complexity in micrographs of neurons. The method appears useful and simple to apply, though parts of the application should be clarified and it would be more useful to present the questions that can be addressed with the method rather than metrics that can be approximated. Only the last use case seems to be addressing a clear question, ‘what is the growth state of a neural culture?’. I have major and minor suggestions for improvement below.

Major:

1) Line 296 states that conditional entropy of signal X given signal Y can be interpreted as the noise entropy of X across trials. This seems incorrect and needs justification. Conditional entropy should express the entropy that remains in X given the knowledge of Y. It makes some intuitive sense that the noise entropy of X across trials will be affected by the driving signal Y, but to equate it with conditional entropy as defined in Eq. 5 seems a step too far. The main usefulness of the PNG method presented in the paper, estimating mutual information, relies on this interpretation. Perhaps an overly ambitious assumption like this explains the deviations between the middle and right panels of Fig. 3C.
If the authors could come up with a more convincing method of estimating mutual information with the png approach, that would enhance the usefulness of the paper. However, given that the png method only returns relative entropy of one signal at a time it appears not suited for estimating conditional entropies. Other electrophysiological signal entropy metrics may also be equally useful for the community, such as transfer entropy, and just the relative entropy of signals, but it may be helpful to demonstrate an example of what questions can be answered with these. 2) Entropy of images is presented as another key use of the method in figure 4. The estimation of culture growth stage (Fig4C) is useful. However, the usefulness of the other two presented instances is dubious. Fig4A shows a use in estimating layers in a Cajal drawing. It is unclear why one would need to do this, as the boundaries of layers 2-5 appear to be obscured in the curve, compared to just looking at the drawing. Fig4B shows a use in estimating a curve similar to Sholl analysis (but different enough to not serve the same purpose). As the Sholl analysis curve describes the number of neural branches as a function of radial distance from soma, it gives us an immediately useful metric which needs no interpretation. The png metric however seems more difficult to interpret. It means something about the uncertainty/complexity of dendritic patterns at a given radius. Can the authors explain more why they believe this is useful? Demonstrating an improved identification of cell types based on the png metric over the Sholl curve would make a very strong case for usefulness. As the authors pointed out, astrobiologists can use png metrics to help distinguish between biogenic and nonbiogenic rocks – could the technique offer an equally useful categorization assay for neuroscience? 
Minor: 1) Authors explain on lines 64-70 that the quadratic extrapolation method is not the only alternative for estimating signal entropy. Why was the quadratic extrapolation method singled out as the comparison for the png method? Would the other methods perform as well or better than the png method? 2) Could the authors mention what computer was used for the study? This is important because, e.g., line 424, it is mentioned that speed is an advantage of the png method because getting the same metric with quadratic extrapolation took 2h. This depends on the hardware. 3) Equation between lines 161 and 162 should be the same as eq.2 on line 222. Reviewer 4: Estimating entropy/mutual information is often a difficult and computationally expensive process, and the simple method the authors propose is attractive, because it offers a quick and easy way to compare the entropy between conditions. However, I think there is an issue, namely that the "file size does not correspond to absolute entropy values", possibly because "the PNG algorithm also involves filtering for 2D images, (see e.g. http://www.libpng.org/pub/png/book/chapter09.html ), which affects the compression size of 2D images". Since entropy and mutual information are almost always heavily dependent on the estimation method (and direct measurement is almost never possible), this is not a problem per se, but I think the authors should be extremely clear on when their method does or does not apply. The authors make strong claims in the abstract ("By simply saving the signal in PNG picture format and measuring the size of the file on the hard drive, we can reliably estimate entropy through different conditions") and although they mention the limitations themselves, both in the abstract and in the text, I think they have to be careful here. So I believe this context should be made clearer, and it should be made explicit when to use (and NOT use) the method. 
The authors claim in the discussion that "PNG files must be all of the same dimensions, of the same dynamic range and saved with the same software" -- maybe they could think of a few typical experimental conditions where this does or does not apply? Secondly, I also wonder whether the size of a PNG file correlates linearly with the entropy of non-white noise files. The authors do not show this explicitly, only for white nose files, but aren't any signals that we are interested in non-white noise? Finally, personally I would be quite curious where the differences in png file size come from, if it is not the entropy. Do the authors have an opinion on that (because that could help others to judge when or when not one can use the method)?

https://doi.org/10.24072/pci.neuro.100001.d1

Reviewed by anonymous reviewer 2, 26 Oct 2020

The manuscript presents a novel method for entropy estimation based on leveraging optimal noiseless compression algorithms, already developed for image processing, and in particular the PNG format. The idea is that the size of the PNG file obtained by saving a particular set of data will reflect the amount of variability present in the data and will therefore provide an indirect estimate of the entropy content of the data. The method is based on Shannon’s Source Coding Theorem, therefore approaching the entropy estimation problem from the field of compressed sensing, something that, as the authors rightly state, has not yet found wide use, at least in the field of neuroscience. The resulting algorithm is extremely straightforward, essentially consisting of just the PNG saving step. Therefore it provides a useful tool for a fast and computationally efficient evaluation of the complexity of a signal, without having to resort to more math-heavy methods (as the math is done “for free” by the PNG compression software).

The main issue with this method is that in its present form it is essentially limited to private use by the experimenter, when needing an on-the-go assessment of the variation of entropy content in some preparation. In particular, it seems to me that it is particularly suitable for well controlled recordings with multiple trials of fixed length or for continuous imaging of cell cultures (and indeed, these are the examples presented in the manuscript). It would be problematic to apply the method in its present form either to experiments with a larger behavioral component or, on the other hand, to compare different recordings.

This comes from the fact that the PNG conversion method presented here allows only for relative entropy estimation. The authors rightly note how the size of the file has no direct relationship to the value of entropy. And indeed all the examples presented in the manuscript deal with the estimation of entropy or information content in data of fixed size obtained from the same system. While this is no doubt a very handy approach to obtain quick measures of entropy for large datasets, it is hard for me to see how it could be generalized to allow for comparison across different experimental setups, or the same experimental setup in different conditions.

Taking the case of mutual information, consider as an example the mutual information between a place cell's activity and the position of the animal in space. The conditional entropy component of the mutual information requires considering different bins in space, for which variable amounts of data would be available, given the heterogeneities of the animal's behavior. Moreover, comparing two neurons recorded in two different sessions will also present the problem of different exploration times and amounts of data. I am not sure whether the authors have already considered cases of this type and whether it would be possible to include a normalization step in their method, so that, for example, PNG size is compared to an image of the same size obtained from a random process. As this would greatly extend the applicability of the PNG conversion, the paper would benefit from a discussion of possible solutions to this sort of situation.
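
As a rough illustration of the kind of normalization suggested above, the following Python sketch divides the compressed size of a signal by the mean compressed size of random data of the same length. It is only a sketch under stated assumptions: `zlib` (the DEFLATE compressor that PNG relies on) is used as a stand-in for an actual PNG writer, and the names `compressed_size`, `normalized_size`, `short_rec` and `long_rec` are hypothetical and do not come from the manuscript.

```python
import random
import zlib


def compressed_size(data: bytes) -> int:
    """Size in bytes of the DEFLATE-compressed data (stand-in for a PNG file size)."""
    return len(zlib.compress(data, 9))


def normalized_size(data: bytes, n_random: int = 20, seed: int = 0) -> float:
    """Compressed size divided by the mean compressed size of uniform random
    data of the same length (the 'random process' reference)."""
    rng = random.Random(seed)
    reference = [
        compressed_size(bytes(rng.getrandbits(8) for _ in range(len(data))))
        for _ in range(n_random)
    ]
    return compressed_size(data) * n_random / sum(reference)


rng = random.Random(1)
# Two hypothetical recordings of different lengths drawn from the same
# low-entropy source (each sample is 0 or 255 with equal probability).
short_rec = bytes(rng.choice((0, 255)) for _ in range(5_000))
long_rec = bytes(rng.choice((0, 255)) for _ in range(20_000))

# Raw sizes differ mainly because the recordings differ in length ...
print(compressed_size(short_rec), compressed_size(long_rec))
# ... whereas the normalized values are directly comparable.
print(normalized_size(short_rec), normalized_size(long_rec))
```

Whether such a ratio would remain meaningful for real recordings with unequal sampling of spatial bins is an open question that this toy example does not address.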

https://doi.org/10.24072/pci.neuro.100001.rev11

Reviewed by anonymous reviewer 1, 29 Sep 2020

The article proposes quantifying the entropy and mutual information of neuroscientific data based on file size after compressing the data as a PNG-file. The idea is intriguing and is presented and studied in the article fairly thoroughly. The paper has some merits, but there are multiple issues as well, which are listed below.

The authors correctly note the issue that file size does not correspond to absolute entropy values. Nevertheless, they state, already in the Abstract, that "we can reliably estimate entropy" with the method, which seems a bit like false advertising. Perhaps using "the level of entropy" in place of just "entropy" would be more accurate.
The paper contains some repetition, especially in Sections 2.2 and 3.1. However, even the parts that are explained twice are not explained clearly enough; in particular, the quadratic extrapolation method requires some clarification. The terminology should be defined and stated more clearly and rigorously in the beginning; for instance, the authors use the terms "sampling", "bin" and "word" related to the variable T, but the meaning of these terms and the variable remain nevertheless unclear.
The authors state the formula of Shannon entropy in Equation 1, but they do not discuss what is the probability space on which the entropy is considered in the context of the paper. It would be nice to have a discussion about this at least on a descriptive and intuitive level.
The authors use the terms entropy and entropy rate in an interchangeable manner, although they are two quite different concepts. Which one are the authors actually interested in computing? This especially causes confusion when interpreting Figure 2A.
Figure 2B has a couple of issues: First, the authors state that the plot of the PNG file size "follows the same curve than the entropy". However, although they appear similar, the values of the curves differ quite substantially at ~25% white pixels, where the entropy is around 50% of the maximum entropy but the file size is above 60% of the maximum. Secondly, the file size curve seems slightly asymmetric, which is surprising, as the file size should be the same regardless of whether compressing an image containing x percent white pixels or x percent black pixels. Perhaps the file size curve should be averaged over more runs of the compression?
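
The symmetry argument above can be checked numerically. The sketch below is a minimal illustration, not the authors' procedure: it assumes `zlib`'s DEFLATE as a stand-in for PNG compression, uses hypothetical function names, and averages the compressed size over several random black/white images for each fraction of white pixels.

```python
import math
import random
import zlib


def binary_entropy(p: float) -> float:
    """Shannon entropy (bits per pixel) of a pixel that is white with probability p."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


def mean_compressed_size(p_white: float, n_pixels: int = 10_000,
                         runs: int = 20, seed: int = 0) -> float:
    """Mean DEFLATE size (bytes) of random black/white images with white fraction p_white."""
    rng = random.Random(seed)
    total = 0
    for _ in range(runs):
        # One byte per pixel: 255 for white, 0 for black.
        pixels = bytes(255 if rng.random() < p_white else 0 for _ in range(n_pixels))
        total += len(zlib.compress(pixels, 9))
    return total / runs


for p in (0.10, 0.25, 0.75, 0.90):
    print(f"p={p:.2f}  H={binary_entropy(p):.3f} bits  "
          f"size={mean_compressed_size(p):.1f} bytes")
```

If the averaged sizes at p and 1 - p still differed noticeably with a given PNG writer, that would point to something other than sampling noise.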
The authors should discuss the fact that, in addition to compression, the PNG algorithm also involves filtering for 2D images, (see e.g. http://www.libpng.org/pub/png/book/chapter09.html ), which affects the compression size of 2D images. The authors should note that this also affects the results gained with the 2D-images. Currently the 2D images are simply transformed into 1D signals.
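
To make the effect of filtering concrete, the sketch below applies the PNG "Sub" filter (each byte replaced by its difference from the left neighbour, modulo 256) to a synthetic image before compression. This is an illustrative assumption-laden example: `zlib` stands in for a PNG writer, the per-row filter-type byte of a real PNG is omitted, and `make_smooth_image` is a hypothetical test image, not data from the manuscript.

```python
import random
import zlib


def sub_filter(row: bytes) -> bytes:
    """PNG 'Sub' filter for 8-bit grayscale: difference from the left neighbour, modulo 256."""
    return bytes((row[i] - (row[i - 1] if i > 0 else 0)) % 256 for i in range(len(row)))


def make_smooth_image(width: int = 200, height: int = 100, seed: int = 0) -> list:
    """Hypothetical test image: each row is a random walk with steps in {-2, ..., 2}."""
    rng = random.Random(seed)
    rows = []
    for _ in range(height):
        value = rng.randrange(256)
        row = []
        for _ in range(width):
            value = (value + rng.randint(-2, 2)) % 256
            row.append(value)
        rows.append(bytes(row))
    return rows


rows = make_smooth_image()
# Same pixel data, compressed without and with the Sub filter applied first.
unfiltered = zlib.compress(b"".join(rows), 9)
filtered = zlib.compress(b"".join(sub_filter(r) for r in rows), 9)
print(len(unfiltered), len(filtered))
```

The two sizes describe the same image, so the file size depends on the filter chosen by the software as well as on the data, which is one reason the same software and settings would need to be used throughout.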
There are quite a few grammatical errors and typos throughout the text, so I would urge the authors to read through and edit the manuscript with care in this respect.
The authors talk about "white noise with amplitude x". Is this a common convention for the use of the term amplitude? Perhaps some other terminology would be more standard?
The mathematical notation has some issues. Most crucially, Equation 5 for the conditional entropy is simply incorrect.
The conditional probability should be denoted with a vertical line "|", not a forward slash "/".
The mathematical variables should always be italicized (e.g. on lines 41-42).
The two (unnumbered) Equations between lines 161 and 166 are repeated in Equations 2 and 8 unnecessarily. Also, the Equation below line 161 has the term T in all three limits.
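
For reference, the standard textbook form of the conditional entropy with the vertical-bar notation, together with the corresponding mutual information, is given below. The symbols X, Y and p are generic and are not taken from the manuscript, whose Equation 5 is not reproduced here.

```latex
H(X \mid Y) = -\sum_{y} p(y) \sum_{x} p(x \mid y)\, \log_2 p(x \mid y),
\qquad
I(X;Y) = H(X) - H(X \mid Y)
```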

https://doi.org/10.24072/pci.neuro.100001.rev12