# NativQA: Multilingual Culturally-Aligned Natural Queries for LLMs

## Overview
The MultiNativQA dataset is a multilingual dataset of culturally-aligned natural queries for large language models (LLMs). It covers multiple languages, with a specific focus on regional and cultural nuances in question answering.

## Directory Structure
The dataset is organized into directories by language and region. Each directory contains one TSV file for each of the train, dev, and test sets.
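
As a rough sketch of how this layout might be traversed, the helper below collects the train, dev, and test files found in one language/region directory. The `_train.tsv`, `_dev.tsv`, and `_test.tsv` suffixes and the function name `find_split_files` are assumptions made for illustration (only a file name ending in `_test.tsv` appears in this README), so adjust the pattern to the actual file names.

```python
from pathlib import Path

# Split names taken from the description above (train, dev, test).
SPLITS = ("train", "dev", "test")


def find_split_files(directory):
    """Map each split name to a TSV file in `directory` ending in `_<split>.tsv`.

    Hypothetical helper: the suffix convention is an assumption,
    not a documented rule of the dataset.
    """
    directory = Path(directory)
    found = {}
    for split in SPLITS:
        matches = sorted(directory.glob(f"*_{split}.tsv"))
        if matches:
            # If several files match, keep the first one in sorted order.
            found[split] = matches[0]
    return found
```

Splits with no matching file are simply left out of the returned dictionary, so callers can check which splits are present before reading them.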


## File Format
Each TSV file in the dataset contains the following fields:

- `data_id`: Unique identifier for each data entry.
- `category`: Category of the question.
- `input_query`: The natural language query input.
- `question`: The question derived from the input query.
- `answer`: The corresponding answer.
- `question_type`: The type of question (e.g., factual, opinion).
- `answer_URLs`: URLs where the answer can be verified.
- `is_reliable`: Indicator of the reliability of the answer (1 for reliable, 0 for not reliable). Note that the example entry below uses the text label `very_reliable` instead.

**Example entry in TSV file:**

321245990ac29eab8e401e9623dc0fb1	tradition	What is the meaning of tablighi ijtema?	Where is the ijtema in Bangladesh?	Ans. The Bishwa Ijtema (Bengali: Global Congregation) is an annual Muslim gathering in Tongi, Bangladesh, on the banks of the River Turag, on the outskirts of Dhaka. It is one of the world's largest peaceful gatherings.	related_questions	https://unacademy.com/content/upsc/study-material/government-schemes/bishwa-ijtema/	very_reliable
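
To make the mapping between the example entry and the field list concrete, here is a minimal sketch that splits one tab-separated line into named fields. It assumes the values appear in the same order as the fields listed above; `FIELDS` and `parse_row` are hypothetical names, and the row in the demonstration is shortened, made-up data.

```python
# Field order as listed in the File Format section (assumed to match the column order).
FIELDS = [
    "data_id", "category", "input_query", "question",
    "answer", "question_type", "answer_URLs", "is_reliable",
]


def parse_row(line):
    """Split one tab-separated line into a dict keyed by field name."""
    values = line.rstrip("\r\n").split("\t")
    if len(values) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(values)}")
    return dict(zip(FIELDS, values))


# Shortened, made-up row used only to demonstrate the call.
row = parse_row(
    "id1\ttradition\tsome query\tsome question\tsome answer\t"
    "related_questions\thttps://example.org\tvery_reliable"
)
print(row["category"], row["is_reliable"])
```

Raising an error on a wrong field count is a deliberate choice here: a stray tab inside an answer would otherwise shift every later value into the wrong field without any warning.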



## Python Function to Read File

Here's a Python function to read the TSV files:

```python
import pandas as pd

def read_MultiNativQA_tsv(file_path):
    """
    Reads a NativQA TSV file and returns a pandas DataFrame.

    Parameters:
    file_path (str): The path to the TSV file.

    Returns:
    pd.DataFrame: DataFrame containing the dataset.
    """
    # The files are tab-separated, so pass the tab character as the delimiter.
    df = pd.read_csv(file_path, sep='\t')
    return df

# Example usage
file_path = 'path/to/NativQA_ar_msa_qa_test.tsv'
df = read_MultiNativQA_tsv(file_path)
print(df.head())
```
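
A common next step after loading is to keep only the entries whose answers are marked reliable. The sketch below works on plain dict rows, for example those produced by `df.to_dict("records")`. The set of accepted labels is an assumption: the field description gives 1 and 0, while the example entry carries `very_reliable`, so the labels are passed in as a parameter rather than hard-coded. `filter_reliable` is a hypothetical helper name.

```python
def filter_reliable(rows, accepted=("1", "very_reliable")):
    """Return the rows whose `is_reliable` value is one of `accepted`.

    Values are compared as stripped strings, so the integer 1 and the text "1" both match.
    """
    accepted = {str(label) for label in accepted}
    return [
        row for row in rows
        if str(row.get("is_reliable", "")).strip() in accepted
    ]


# Made-up rows for demonstration only.
rows = [
    {"data_id": "a", "is_reliable": "very_reliable"},
    {"data_id": "b", "is_reliable": 0},
    {"data_id": "c", "is_reliable": 1},
]
print([row["data_id"] for row in filter_reliable(rows)])  # ['a', 'c']
```

Rows without an `is_reliable` key are treated as not reliable and dropped.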

## License
This dataset is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License (CC BY-NC-SA 4.0).

## Contact
TBA

## Citation
TBA
