Provide functions for reference data #312

jan-janssen · 2024-07-29T17:57:10Z

Summary by CodeRabbit

New Features
- Introduced functions to retrieve detailed chemical information based on element symbols from Mendeleev's database and Wolfram Alpha.
- Added functionality to fetch experimental elastic properties of elements from Wikipedia.
- Enhanced the environment with additional dependencies for improved capabilities and functionality.

These enhancements provide users with a structured and efficient way to access extensive chemical data, benefiting educational and research purposes.

for more information, see https://pre-commit.ci

coderabbitai · 2024-07-29T17:57:19Z

Walkthrough

This update significantly enhances the capability to retrieve detailed chemical information from sources like Mendeleev, Wikipedia, and Wolfram Alpha. New modules streamline access to elemental properties, promoting better data accessibility for research and education. Additionally, the environment configurations have been optimized, with various dependencies added and others removed, ensuring a more streamlined and effective functionality.

Changes

File Path	Change Summary
`.ci_support/environment-old.yml`	Major updates to dependencies: several packages added (e.g., `dynaphopy`, `requests`), and others removed (e.g., `phonopy`, `seekpath`).
`.ci_support/environment.yml`	New dependencies added (e.g., `lxml`, `mendeleev`, `pandas`, `phonopy`, `requests`, `seekpath`), while previous outdated entries were removed.
`atomistics/referencedata/__init__.py`	Establishes a public API for chemical information retrieval, managing imports to control function access.
`atomistics/referencedata/mendeleev.py`	Introduces `get_chemical_information`, which retrieves extensive data about elements based on their symbols.
`atomistics/referencedata/wikipedia.py`	Implements `get_elastic_properties` to fetch elastic properties of elements from Wikipedia using their symbols.
`atomistics/referencedata/wolframalpha.py`	Adds `get_chemical_information`, for extracting various chemical properties from Wolfram Alpha, including physical and thermal characteristics.
`tests/test_referencedata.py`	New test suite validating retrieval functions for chemical data, ensuring robustness and correctness under specified conditions.

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant Mendeleev
    participant Wikipedia
    participant Wolfram

    User->>Mendeleev: Request chemical information
    Mendeleev-->>User: Return detailed properties

    User->>Wikipedia: Request elastic properties
    Wikipedia-->>User: Return elastic property values

    User->>Wolfram: Request chemical data
    Wolfram-->>User: Return comprehensive chemical properties

🐇 In fields of data where elements reside,
New functions appear, oh what a ride!
Mendeleev, Wolfram, and Wikipedia's might,
Share secrets of atoms, in day and in night!
With knowledge so vast, for all to embrace,
Let's leap into science, a beautiful chase! ✨

Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media?

Share

Tips

Chat

There are 3 ways to chat with CodeRabbit:

Review comments: Directly reply to a review comment made by CodeRabbit. Example:
- I pushed a fix in commit <commit_id>.
- Generate unit testing code for this file.
- Open a follow-up GitHub issue for this discussion.
Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
- @coderabbitai generate unit testing code for this file.
- @coderabbitai modularize this function.
PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
- @coderabbitai generate interesting stats about this repository and render them as a table.
- @coderabbitai show all the console.log statements in this repository.
- @coderabbitai read src/utils.ts and generate unit testing code.
- @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
- @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (invoked as PR comments)

@coderabbitai pause to pause the reviews on a PR.
@coderabbitai resume to resume the paused reviews.
@coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
@coderabbitai full review to do a full review from scratch and review all the files again.
@coderabbitai summary to regenerate the summary of the PR.
@coderabbitai resolve resolve all the CodeRabbit review comments.
@coderabbitai configuration to show the current CodeRabbit configuration for the repository.
@coderabbitai help to get help.

Additionally, you can add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.

CodeRabbit Configuration File (`.coderabbit.yaml`)

You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
Please see the configuration documentation for more information.
If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

Visit our Documentation for detailed information on how to use CodeRabbit.
Join our Discord Community to get help, request features, and share feedback.
Follow us on X/Twitter for updates and announcements.

coderabbitai

Actionable comments posted: 8

Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

Commits

Files that changed from the base of the PR and between d6cb030 and dc43179.

Files selected for processing (3)

atomistics/referencedata/mendeleevdb.py (1 hunks)
atomistics/referencedata/wiki.py (1 hunks)
atomistics/referencedata/wolfram.py (1 hunks)

Additional context used

Ruff

atomistics/referencedata/wolfram.py

83-83: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)

106-109: Combine if branches using logical or operator

Combine if branches

(SIM114)

Additional comments not posted (1)

atomistics/referencedata/wiki.py (1)

21-23: Handle potential URL changes.

The Wikipedia URL might change, breaking the function. Consider adding a fallback mechanism or updating the URL dynamically.

Is the URL expected to remain stable? If not, consider adding a fallback mechanism or a way to update the URL dynamically.

coderabbitai · 2024-07-29T18:01:58Z

atomistics/referencedata/wiki.py

+    for i, p in enumerate(property_lst):
+        df_tmp = df_lst[i]
+        property_dict[p] = float(
+            df_tmp[df_tmp.symbol == chemical_symbol].squeeze(axis=0).to_dict()["WEL[1]"]
+        )


Add error handling for missing chemical symbols.

The function assumes that the chemical symbol will always be found. Add error handling for cases where the symbol is not found.

- property_dict[p] = float( - df_tmp[df_tmp.symbol == chemical_symbol].squeeze(axis=0).to_dict()["WEL[1]"] - ) + try: + property_dict[p] = float( + df_tmp[df_tmp.symbol == chemical_symbol].squeeze(axis=0).to_dict()["WEL[1]"] + ) + except KeyError: + property_dict[p] = None

Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

for i, p in enumerate(property_lst):

df_tmp = df_lst[i]

property_dict[p] = float(

df_tmp[df_tmp.symbol == chemical_symbol].squeeze(axis=0).to_dict()["WEL[1]"]

)

for i, p in enumerate(property_lst):

df_tmp = df_lst[i]

try:

property_dict[p] = float(

df_tmp[df_tmp.symbol == chemical_symbol].squeeze(axis=0).to_dict()["WEL[1]"]

)

except KeyError:

property_dict[p] = None

coderabbitai · 2024-07-29T18:01:58Z

atomistics/referencedata/wiki.py

+def get_experimental_elastic_property_wikipedia(chemical_symbol: str) -> dict:
+    """
+    Looks up elastic properties for a given chemical symbol from the Wikipedia: https://en.wikipedia.org/wiki/Elastic_properties_of_the_elements_(data_page) sourced from webelements.com.
+
+    Args:
+        chemical_symbol (str): Chemical symbol of the element.
+        property (str): Name of the property to retrieve. Options: youngs_modulus, poissons_ratio, bulk_modulus, shear_modulus
+
+    Returns:
+        str: Property value (various types): Value of the property for the given element, if available.
+    """


Improve the function documentation.

The return type in the docstring should be dict instead of str to match the actual return type.

- str: Property value (various types): Value of the property for the given element, if available. + dict: Dictionary containing the property values for the given element, if available.

Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

def get_experimental_elastic_property_wikipedia(chemical_symbol: str) -> dict:

"""

Looks up elastic properties for a given chemical symbol from the Wikipedia: https://en.wikipedia.org/wiki/Elastic_properties_of_the_elements_(data_page) sourced from webelements.com.

Args:

chemical_symbol (str): Chemical symbol of the element.

property (str): Name of the property to retrieve. Options: youngs_modulus, poissons_ratio, bulk_modulus, shear_modulus

Returns:

str: Property value (various types): Value of the property for the given element, if available.

"""

def get_experimental_elastic_property_wikipedia(chemical_symbol: str) -> dict:

"""

Looks up elastic properties for a given chemical symbol from the Wikipedia: https://en.wikipedia.org/wiki/Elastic_properties_of_the_elements_(data_page) sourced from webelements.com.

Args:

chemical_symbol (str): Chemical symbol of the element.

property (str): Name of the property to retrieve. Options: youngs_modulus, poissons_ratio, bulk_modulus, shear_modulus

Returns:

dict: Dictionary containing the property values for the given element, if available.

"""

coderabbitai · 2024-07-29T18:01:58Z

atomistics/referencedata/mendeleevdb.py

+    df = fetch_table("elements")
+    return df[df.symbol == chemical_symbol].squeeze(axis=0).to_dict()


Consider caching the fetched data.

Fetching the table every time the function is called might be inefficient. Consider caching the data to improve performance.

from functools import lru_cache @lru_cache(maxsize=1) def get_chemical_information_from_mendeleev(chemical_symbol: str) -> dict: ...

coderabbitai · 2024-07-29T18:01:58Z

atomistics/referencedata/mendeleevdb.py

+def get_chemical_information_from_mendeleev(chemical_symbol: str) -> dict:
+    """
+    Get information of a given chemical element
+
+    Args:
+        chemical_symbol: Chemical Element like Au for Gold
+
+    Returns:
+        dict: Dictionary with the following keys


Add error handling for missing chemical symbols.

The function assumes that the chemical symbol will always be found. Add error handling for cases where the symbol is not found.

- return df[df.symbol == chemical_symbol].squeeze(axis=0).to_dict() + try: + return df[df.symbol == chemical_symbol].squeeze(axis=0).to_dict() + except KeyError: + return {}

Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

def get_chemical_information_from_mendeleev(chemical_symbol: str) -> dict:

"""

Get information of a given chemical element

Args:

chemical_symbol: Chemical Element like Au for Gold

Returns:

dict: Dictionary with the following keys

def get_chemical_information_from_mendeleev(chemical_symbol: str) -> dict:

"""

Get information of a given chemical element

Args:

chemical_symbol: Chemical Element like Au for Gold

Returns:

dict: Dictionary with the following keys

"""

try:

return df[df.symbol == chemical_symbol].squeeze(axis=0).to_dict()

except KeyError:

return {}

coderabbitai · 2024-07-29T18:01:58Z

atomistics/referencedata/wolfram.py

+    result = pandas.concat(
+        [
+            _collect(
+                url=v["url"],
+                column=k,
+                select_function=v["select_function"],
+                current_filter=v["current_filter"],
+            ).set_index("element")
+            for k, v in tqdm(data_dict.items())
+        ],
+        axis=1,


Add logging for download progress.

The function might benefit from logging the download progress to provide feedback to the user.

import logging logging.basicConfig(level=logging.INFO) def _wolframalpha_download(): logging.info("Starting download from Wolfram Alpha...") result = pandas.concat( [ _collect( url=v["url"], column=k, select_function=v["select_function"], current_filter=v["current_filter"], ).set_index("element") for k, v in tqdm(data_dict.items()) ], axis=1, sort=False, ) result["volume"] = result.apply( lambda x: _get_volume(lat_lst=x.latticeconstant, crystal=x.crystal), axis=1 ) data_path = os.path.join(os.path.dirname(__file__), "..", "data") os.makedirs(data_path, exist_ok=True) result.to_csv(os.path.join(data_path, "wolfram.csv")) logging.info("Download completed.")

coderabbitai · 2024-07-29T18:01:58Z

atomistics/referencedata/wolfram.py

+    filename = os.path.join(os.path.dirname(__file__), "..", "data", "wolfram.csv")
+    if not os.path.exists(filename):
+        _wolframalpha_download()
+    df = pandas.read_csv(filename)


Consider caching the fetched data.

Fetching the data every time the function is called might be inefficient. Consider caching the data to improve performance.

from functools import lru_cache @lru_cache(maxsize=1) def get_chemical_information_from_wolframalpha(chemical_element): ...

coderabbitai · 2024-07-29T18:01:58Z

atomistics/referencedata/wolfram.py

+def _wolframalpha_download():
+    result = pandas.concat(
+        [
+            _collect(
+                url=v["url"],
+                column=k,
+                select_function=v["select_function"],
+                current_filter=v["current_filter"],
+            ).set_index("element")
+            for k, v in tqdm(data_dict.items())
+        ],
+        axis=1,
+        sort=False,
+    )
+    result["volume"] = result.apply(
+        lambda x: _get_volume(lat_lst=x.latticeconstant, crystal=x.crystal), axis=1
+    )
+    data_path = os.path.join(os.path.dirname(__file__), "..", "data")
+    os.makedirs(data_path, exist_ok=True)
+    result.to_csv(os.path.join(data_path, "wolfram.csv"))
+


Add error handling for network-related errors.

The function does not handle network-related errors, which might cause it to fail silently.

- result = pandas.concat( + try: + result = pandas.concat( - result.to_csv(os.path.join(data_path, "wolfram.csv")) + result.to_csv(os.path.join(data_path, "wolfram.csv")) + except requests.RequestException as e: + print(f"Network error occurred: {e}")

Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

def _wolframalpha_download():

result = pandas.concat(

[

_collect(

url=v["url"],

column=k,

select_function=v["select_function"],

current_filter=v["current_filter"],

).set_index("element")

for k, v in tqdm(data_dict.items())

],

axis=1,

sort=False,

)

result["volume"] = result.apply(

lambda x: _get_volume(lat_lst=x.latticeconstant, crystal=x.crystal), axis=1

)

data_path = os.path.join(os.path.dirname(__file__), "..", "data")

os.makedirs(data_path, exist_ok=True)

result.to_csv(os.path.join(data_path, "wolfram.csv"))

def _wolframalpha_download():

try:

result = pandas.concat(

[

_collect(

url=v["url"],

column=k,

select_function=v["select_function"],

current_filter=v["current_filter"],

).set_index("element")

for k, v in tqdm(data_dict.items())

],

axis=1,

sort=False,

)

result["volume"] = result.apply(

lambda x: _get_volume(lat_lst=x.latticeconstant, crystal=x.crystal), axis=1

)

data_path = os.path.join(os.path.dirname(__file__), "..", "data")

os.makedirs(data_path, exist_ok=True)

result.to_csv(os.path.join(data_path, "wolfram.csv"))

except requests.RequestException as e:

print(f"Network error occurred: {e}")

coderabbitai · 2024-07-29T18:01:58Z

atomistics/referencedata/wolfram.py

+def get_chemical_information_from_wolframalpha(chemical_element):
+    """
+    Get information of a given chemical element
+    Args:
+        chemical_element: Chemical Element like Au for Gold
+    Returns:
+        dict: Dictionary with the following keys
+            element: chemical element
+            thermalcondictivity: thermal conductivity
+            atomicradius: calculated distance from nucleus of outermost electron
+            bulkmodulus: bulk modulus (incompressibility)
+            shearmodulus: shear modulus of solid
+            youngmodulus: Young's modulus of solid
+            poissonratio: Poisson ratio of solid
+            density: density at standard temperature and pressure
+            liquiddensity: liquid density at melting point
+            thermalexpansion: linear thermal expansion coefficient
+            meltingpoint: melting temperature in kelvin
+            vaporizationheat: latent heat for liquid-gas transition
+            specificheat: specific heat capacity
+            latticeconstant: crystal lattice constants
+            crystal: basic crystal lattice structure
+            volmolar: molar volume
+            mass: average atomic weight in atomic mass units
+            volume: Volume
+    """
+    filename = os.path.join(os.path.dirname(__file__), "..", "data", "wolfram.csv")
+    if not os.path.exists(filename):
+        _wolframalpha_download()
+    df = pandas.read_csv(filename)
+    return df[df.element == chemical_element].squeeze(axis=0).to_dict()


Add error handling for missing chemical elements.

The function assumes that the chemical element will always be found. Add error handling for cases where the element is not found.

- return df[df.element == chemical_element].squeeze(axis=0).to_dict() + try: + return df[df.element == chemical_element].squeeze(axis=0).to_dict() + except KeyError: + return {}

Committable suggestion was skipped due to low confidence.

for more information, see https://pre-commit.ci

…o reference_data # Conflicts: # atomistics/referencedata/__init__.py

for more information, see https://pre-commit.ci

coderabbitai

Actionable comments posted: 6

Outside diff range, codebase verification and nitpick comments (4)

tests/test_referencedata.py (3)
29-31: Consider adding more assertions.

The test currently checks only the atomic number. Adding more assertions can improve test coverage and ensure the function retrieves all expected data correctly.
self.assertEqual(al_data["atomic_number"], 13)
self.assertEqual(al_data["symbol"], "Al")
self.assertEqual(al_data["name"], "Aluminum")
37-39: Consider adding more assertions.

The test currently checks only the lattice constant. Adding more assertions can improve test coverage and ensure the function retrieves all expected data correctly.
self.assertEqual(al_data["latticeconstant"], "(404.95, 404.95, 404.95)")
self.assertEqual(al_data["atomic_number"], 13)
self.assertEqual(al_data["symbol"], "Al")
self.assertEqual(al_data["name"], "Aluminum")
44-53: Consider adding more assertions.

The test currently checks only the elastic properties. Adding more assertions can improve test coverage and ensure the function retrieves all expected data correctly.
self.assertEqual(al_data["bulk_modulus"], 76.0)
self.assertEqual(al_data["poissons_ratio"], 0.35)
self.assertEqual(al_data["shear_modulus"], 26.0)
self.assertEqual(al_data["youngs_modulus"], 70.0)
self.assertEqual(al_data["symbol"], "Al")
self.assertEqual(al_data["name"], "Aluminum")
pyproject.toml (1)
59-64: LGTM! Consider adding a comment explaining the purpose of the experimental section.

The addition of the experimental section is a good way to manage dependencies that are not yet fully integrated into the main project workflow. Adding a brief comment can help other developers understand its purpose.
[project.optional-dependencies]
# Dependencies that are still experimental and not fully integrated
experimental = [
    "lxml==5.2.2",
    "mendeleev==0.17.0",
    "pandas==2.2.2",
    "requests==2.32.3",
]

Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

Commits

Files that changed from the base of the PR and between dc43179 and a013161.

Files selected for processing (6)

.ci_support/environment-old.yml (1 hunks)
.ci_support/environment.yml (1 hunks)
atomistics/referencedata/init.py (1 hunks)
atomistics/referencedata/wolfram.py (1 hunks)
pyproject.toml (1 hunks)
tests/test_referencedata.py (1 hunks)

Additional context used

Ruff

tests/test_referencedata.py

17-17: lxml imported but unused; consider using importlib.util.find_spec to test for availability

(F401)

atomistics/referencedata/wolfram.py

82-82: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)

105-108: Combine if branches using logical or operator

Combine if branches

(SIM114)

Additional comments not posted (20)

.ci_support/environment.yml (6)

6-6: Dependency Addition: lxml

The addition of lxml =5.2.2 looks good. Ensure that this version is compatible with other dependencies in the environment.

7-7: Dependency Addition: mendeleev

The addition of mendeleev =0.17.0 looks good. Ensure that this version is compatible with other dependencies in the environment.

9-9: Dependency Addition: pandas

The addition of pandas =2.2.2 looks good. Ensure that this version is compatible with other dependencies in the environment.

10-10: Dependency Addition: phonopy

The addition of phonopy =2.26.6 looks good. Ensure that this version is compatible with other dependencies in the environment.

11-11: Dependency Addition: requests

The addition of requests =2.32.3 looks good. Ensure that this version is compatible with other dependencies in the environment.

13-13: Dependency Addition: seekpath

The addition of seekpath =2.1.0 looks good. Ensure that this version is compatible with other dependencies in the environment.

.ci_support/environment-old.yml (7)

5-5: Dependency Addition: dynaphopy

The addition of dynaphopy =1.17.5 looks good. Ensure that this version is compatible with other dependencies in the environment.

7-7: Dependency Addition: iprpy-data

The addition of iprpy-data =2023.07.25 looks good. Ensure that this version is compatible with other dependencies in the environment.

8-8: Dependency Addition: jinja2

The addition of jinja2 =2.11.3 looks good. Ensure that this version is compatible with other dependencies in the environment.

9-9: Dependency Addition: lammps

The addition of lammps =2022.06.23 looks good. Ensure that this version is compatible with other dependencies in the environment.

10-10: Dependency Addition: lxml

The addition of lxml =4.9.1 looks good. Ensure that this version is compatible with other dependencies in the environment.

11-11: Dependency Addition: mendeleev

The addition of mendeleev =0.10.0 looks good. Ensure that this version is compatible with other dependencies in the environment.

16-16: Dependency Addition: requests

The addition of requests =2.24.0 looks good. Ensure that this version is compatible with other dependencies in the environment.

atomistics/referencedata/__init__.py (3)

1-1: Import Statement: get_experimental_elastic_property_wikipedia

The import of get_experimental_elastic_property_wikipedia from atomistics.referencedata.wiki looks good. Ensure that this function exists in the specified module.

3-14: Import Statements and Error Handling: get_chemical_information_from_mendeleev and get_chemical_information_from_wolframalpha

The import of get_chemical_information_from_mendeleev and get_chemical_information_from_wolframalpha from atomistics.referencedata.mendeleevdb looks good. The try-except block effectively handles potential import errors. Ensure that these functions exist in the specified module.

9-17: all Declaration

The __all__ declaration correctly lists the imported functions. This ensures that only the specified functions are accessible when the module is imported.

atomistics/referencedata/wolfram.py (4)

17-21: LGTM!

The function correctly handles the conversion of density values from g/l to kg/l.

24-28: LGTM!

The function correctly handles the conversion of split string values to float.

31-36: LGTM!

The function correctly handles the conversion of lattice constant values from a split string.

39-43: LGTM!

The function correctly handles the conversion of scientific notation strings to float values.

coderabbitai · 2024-07-29T20:38:36Z

atomistics/referencedata/wolfram.py

+def _wolframalpha_download():
+    result = pandas.concat(
+        [
+            _collect(
+                url=v["url"],
+                column=k,
+                select_function=v["select_function"],
+                current_filter=v["current_filter"],
+            ).set_index("element")
+            for k, v in data_dict.items()
+        ],
+        axis=1,
+        sort=False,
+    )
+    result["volume"] = result.apply(
+        lambda x: _get_volume(lat_lst=x.latticeconstant, crystal=x.crystal), axis=1
+    )
+    data_path = os.path.join(os.path.dirname(__file__), "data")
+    os.makedirs(data_path, exist_ok=True)
+    result.to_csv(os.path.join(data_path, "wolfram.csv"))


Add error handling for network-related errors.

The function does not handle network-related errors, which might cause it to fail silently.

- result = pandas.concat( + try: + result = pandas.concat( - result.to_csv(os.path.join(data_path, "wolfram.csv")) + result.to_csv(os.path.join(data_path, "wolfram.csv")) + except requests.RequestException as e: + print(f"Network error occurred: {e}")

Committable suggestion was skipped due to low confidence.

coderabbitai · 2024-07-29T20:38:36Z

atomistics/referencedata/wolfram.py

+def _extract_entry(df, n):
+    for i in range(int(len(df.columns) / 2)):
+        if n in df[i].values:
+            v = df[i + 1].values[df[i].values.tolist().index(n)]
+            return v


Add error handling for missing values.

The function does not handle cases where the value is not found in the DataFrame, which might cause it to fail silently.

- v = df[i + 1].values[df[i].values.tolist().index(n)] + try: + v = df[i + 1].values[df[i].values.tolist().index(n)] + except ValueError: + print(f"Value {n} not found in DataFrame.")

Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

def _extract_entry(df, n):

for i in range(int(len(df.columns) / 2)):

if n in df[i].values:

v = df[i + 1].values[df[i].values.tolist().index(n)]

return v

def _extract_entry(df, n):

for i in range(int(len(df.columns) / 2)):

if n in df[i].values:

try:

v = df[i + 1].values[df[i].values.tolist().index(n)]

except ValueError:

print(f"Value {n} not found in DataFrame.")

return v

coderabbitai · 2024-07-29T20:38:36Z

atomistics/referencedata/wolfram.py

+def get_chemical_information_from_wolframalpha(chemical_element):
+    """
+    Get information of a given chemical element
+    Args:
+        chemical_element: Chemical Element like Au for Gold
+    Returns:
+        dict: Dictionary with the following keys
+            element: chemical element
+            thermalcondictivity: thermal conductivity
+            atomicradius: calculated distance from nucleus of outermost electron
+            bulkmodulus: bulk modulus (incompressibility)
+            shearmodulus: shear modulus of solid
+            youngmodulus: Young's modulus of solid
+            poissonratio: Poisson ratio of solid
+            density: density at standard temperature and pressure
+            liquiddensity: liquid density at melting point
+            thermalexpansion: linear thermal expansion coefficient
+            meltingpoint: melting temperature in kelvin
+            vaporizationheat: latent heat for liquid-gas transition
+            specificheat: specific heat capacity
+            latticeconstant: crystal lattice constants
+            crystal: basic crystal lattice structure
+            volmolar: molar volume
+            mass: average atomic weight in atomic mass units
+            volume: Volume
+    """
+    filename = os.path.join(os.path.dirname(__file__), "data", "wolfram.csv")
+    if not os.path.exists(filename):
+        _wolframalpha_download()
+    df = pandas.read_csv(filename)
+    return df[df.element == chemical_element].squeeze(axis=0).to_dict()


Add error handling for missing chemical elements.

The function assumes that the chemical element will always be found. Add error handling for cases where the element is not found.

- return df[df.element == chemical_element].squeeze(axis=0).to_dict() + try: + return df[df.element == chemical_element].squeeze(axis=0).to_dict() + except KeyError: + return {}

Committable suggestion was skipped due to low confidence.

coderabbitai · 2024-07-29T20:38:37Z

atomistics/referencedata/wolfram.py

+def _get_content_from_url(url):
+    content = pandas.read_html(requests.get(url).text)
+    if len(content[8]) > 1:
+        return content[8]
+    else:
+        return content[9]


Add error handling for network-related errors.

The function does not handle network-related errors, which might cause it to fail silently.

- content = pandas.read_html(requests.get(url).text) + try: + response = requests.get(url) + response.raise_for_status() + content = pandas.read_html(response.text) + except requests.RequestException as e: + print(f"Network error occurred: {e}")

Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

def _get_content_from_url(url):

content = pandas.read_html(requests.get(url).text)

if len(content[8]) > 1:

return content[8]

else:

return content[9]

def _get_content_from_url(url):

try:

response = requests.get(url)

response.raise_for_status()

content = pandas.read_html(response.text)

except requests.RequestException as e:

print(f"Network error occurred: {e}")

return None

if len(content[8]) > 1:

return content[8]

else:

return content[9]

coderabbitai · 2024-07-29T20:38:37Z

atomistics/referencedata/wolfram.py

+def _extract_lst(df, column, select_function, current_filter):
+    element_lst, property_lst = [], []
+    try:
+        ptable = fetch_table("elements")
+        for n, el in zip(ptable.name.values, ptable.symbol.values):
+            v = _extract_entry(df=df, n=n)
+            if current_filter(v):
+                element_lst.append(el)
+                property_lst.append(select_function(v))
+    except ValueError:
+        raise ValueError(column, el, v)


Improve error handling by providing more context.

The function raises a ValueError without providing context about the error. Use raise ... from None to distinguish it from errors in exception handling.

- except ValueError: - raise ValueError(column, el, v) + except ValueError as e: + raise ValueError(f"Error extracting {column} for element {el} with value {v}") from e

Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

def _extract_lst(df, column, select_function, current_filter):

element_lst, property_lst = [], []

try:

ptable = fetch_table("elements")

for n, el in zip(ptable.name.values, ptable.symbol.values):

v = _extract_entry(df=df, n=n)

if current_filter(v):

element_lst.append(el)

property_lst.append(select_function(v))

except ValueError:

raise ValueError(column, el, v)

def _extract_lst(df, column, select_function, current_filter):

element_lst, property_lst = [], []

try:

ptable = fetch_table("elements")

for n, el in zip(ptable.name.values, ptable.symbol.values):

v = _extract_entry(df=df, n=n)

if current_filter(v):

element_lst.append(el)

property_lst.append(select_function(v))

except ValueError as e:

raise ValueError(f"Error extracting {column} for element {el} with value {v}") from e

Tools

Ruff

82-82: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)

coderabbitai · 2024-07-29T20:38:37Z

atomistics/referencedata/wolfram.py

+def _get_volume(lat_lst, crystal):
+    if not isinstance(lat_lst, float) and len(lat_lst) == 3:
+        if crystal == "Face-centered Cubic":
+            return lat_lst[0] * lat_lst[1] * lat_lst[2] / 100 / 100 / 100
+        elif crystal == "Body-centered Cubic":
+            return lat_lst[0] * lat_lst[1] * lat_lst[2] / 100 / 100 / 100


Combine the if conditions.

The if conditions for Face-centered Cubic and Body-centered Cubic are redundant and can be combined.

- if crystal == "Face-centered Cubic": - return lat_lst[0] * lat_lst[1] * lat_lst[2] / 100 / 100 / 100 - elif crystal == "Body-centered Cubic": + if crystal in ["Face-centered Cubic", "Body-centered Cubic"]: return lat_lst[0] * lat_lst[1] * lat_lst[2] / 100 / 100 / 100

Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

def _get_volume(lat_lst, crystal):

if not isinstance(lat_lst, float) and len(lat_lst) == 3:

if crystal == "Face-centered Cubic":

return lat_lst[0] * lat_lst[1] * lat_lst[2] / 100 / 100 / 100

elif crystal == "Body-centered Cubic":

return lat_lst[0] * lat_lst[1] * lat_lst[2] / 100 / 100 / 100

def _get_volume(lat_lst, crystal):

if not isinstance(lat_lst, float) and len(lat_lst) == 3:

if crystal in ["Face-centered Cubic", "Body-centered Cubic"]:

return lat_lst[0] * lat_lst[1] * lat_lst[2] / 100 / 100 / 100

Tools

Ruff

105-108: Combine if branches using logical or operator

Combine if branches

(SIM114)

samwaseda · 2024-07-29T21:28:30Z

I need an example to see how you imagined the interaction with the database.

jan-janssen · 2024-07-30T05:21:38Z

I need an example to see how you imagined the interaction with the database.

That's what the unit tests show, right?

for more information, see https://pre-commit.ci

coderabbitai

Actionable comments posted: 6

Outside diff range, codebase verification and nitpick comments (1)

tests/test_referencedata.py (1)
1-3: Remove unnecessary blank lines.

There are unnecessary blank lines at the beginning of the file.
import unittest
from atomistics.referencedata import get_experimental_elastic_property_wikipedia

Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

Commits

Files that changed from the base of the PR and between a013161 and 7cfd7c9.

Files selected for processing (2)

atomistics/referencedata/wolfram.py (1 hunks)
tests/test_referencedata.py (1 hunks)

Additional context used

Ruff

tests/test_referencedata.py

17-17: lxml imported but unused; consider using importlib.util.find_spec to test for availability

(F401)

atomistics/referencedata/wolfram.py

201-201: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)

250-253: Combine if branches using logical or operator

Combine if branches

(SIM114)

Additional comments not posted (13)

tests/test_referencedata.py (3)

29-32: LGTM!

The test method is well-written and conditionally skipped if mendeleev is not available.

37-39: LGTM!

The test method is well-written and conditionally skipped if mendeleev is not available.

44-54: LGTM!

The test method is well-written and conditionally skipped if lxml is not available.
atomistics/referencedata/wolfram.py (10)
27-40: LGTM!

The function is straightforward and correctly handles different density units.

43-56: LGTM!

The function is straightforward and correctly handles both string and float inputs.

59-73: LGTM!

The function is straightforward and correctly handles the conversion.

76-93: LGTM!

The function is straightforward and correctly handles the conversion.

114-124: LGTM!

The function is straightforward and correctly handles the validation.

127-137: LGTM!

The function is straightforward and correctly handles the validation.

140-151: LGTM!

The function is straightforward and correctly handles the selection.

154-168: LGTM!

The function is straightforward and correctly handles the selection.

205-233: LGTM!

The function is straightforward and correctly handles the data collection and extraction.

96-111: Add error handling for missing values.

The function does not handle cases where the value is not found in the DataFrame, which might cause it to fail silently.
            return v
            try:
                return v
            except ValueError:
                print(f"Value {n} not found in DataFrame.")
Likely invalid or redundant comment.

coderabbitai · 2024-07-30T05:53:01Z

tests/test_referencedata.py

+try:
+    from atomistics.referencedata import (
+        get_chemical_information_from_mendeleev,
+        get_chemical_information_from_wolframalpha,
+    )
+
+    mendeleev_not_available = False
+except ImportError:
+    mendeleev_not_available = True
+


Consider using importlib.util.find_spec for conditional imports.

Instead of using try-except blocks for conditional imports, consider using importlib.util.find_spec to check for module availability.

try: from atomistics.referencedata import ( get_chemical_information_from_mendeleev, get_chemical_information_from_wolframalpha, ) mendeleev_not_available = False except ImportError: mendeleev_not_available = True import importlib.util mendeleev_not_available = importlib.util.find_spec("atomistics.referencedata") is None if not mendeleev_not_available: from atomistics.referencedata import ( get_chemical_information_from_mendeleev, get_chemical_information_from_wolframalpha, )

Committable suggestion was skipped due to low confidence.

coderabbitai · 2024-07-30T05:53:01Z

tests/test_referencedata.py

+try:
+    import lxml
+
+    lxml_not_available = False
+except ImportError:
+    lxml_not_available = True


Consider using importlib.util.find_spec for conditional imports.

Instead of using try-except blocks for conditional imports, consider using importlib.util.find_spec to check for module availability.

try: import lxml lxml_not_available = False except ImportError: lxml_not_available = True lxml_not_available = importlib.util.find_spec("lxml") is None if not lxml_not_available: import lxml

Committable suggestion was skipped due to low confidence.

Tools

Ruff

17-17: lxml imported but unused; consider using importlib.util.find_spec to test for availability

(F401)

coderabbitai · 2024-07-30T05:53:01Z

atomistics/referencedata/wolfram.py

+            "url": "https://periodictable.com/Properties/A/ThermalConductivity.an.html",
+            "select_function": _select_function_split,
+            "current_filter": _default_filter,
+        },
+        "atomicradius": {
+            "url": "https://periodictable.com/Properties/A/AtomicRadius.an.html",
+            "select_function": _select_function_split,
+            "current_filter": _default_filter,
+        },
+        "bulkmodulus": {
+            "url": "https://periodictable.com/Properties/A/BulkModulus.an.html",
+            "select_function": _select_function_split,
+            "current_filter": _default_filter,
+        },
+        "shearmodulus": {
+            "url": "https://periodictable.com/Properties/A/ShearModulus.an.html",
+            "select_function": _select_function_split,
+            "current_filter": _default_filter,
+        },
+        "youngmodulus": {
+            "url": "https://periodictable.com/Properties/A/YoungModulus.an.html",
+            "select_function": _select_function_split,
+            "current_filter": _default_filter,
+        },
+        "poissonratio": {
+            "url": "https://periodictable.com/Properties/A/PoissonRatio.an.html",
+            "select_function": _select_function_poisson,
+            "current_filter": _poisson_filter,
+        },
+        "density": {
+            "url": "https://periodictable.com/Properties/A/Density.an.html",
+            "select_function": _select_function_density,
+            "current_filter": _default_filter,
+        },
+        "liquiddensity": {
+            "url": "https://periodictable.com/Properties/A/LiquidDensity.an.html",
+            "select_function": _select_function_split,
+            "current_filter": _default_filter,
+        },
+        "thermalexpansion": {
+            "url": "https://periodictable.com/Properties/A/ThermalExpansion.an.html",
+            "select_function": _select_function_scientific,
+            "current_filter": _default_filter,
+        },
+        "meltingpoint": {
+            "url": "https://periodictable.com/Properties/A/AbsoluteMeltingPoint.an.html",
+            "select_function": _select_function_scientific,
+            "current_filter": _default_filter,
+        },
+        "vaporizationheat": {
+            "url": "https://periodictable.com/Properties/A/VaporizationHeat.an.html",
+            "select_function": _select_function_split,
+            "current_filter": _default_filter,
+        },
+        "specificheat": {
+            "url": "https://periodictable.com/Properties/A/SpecificHeat.an.html",
+            "select_function": _select_function_split,
+            "current_filter": _default_filter,
+        },
+        "latticeconstant": {
+            "url": "https://periodictable.com/Properties/A/LatticeConstants.an.html",
+            "select_function": _select_function_lattice,
+            "current_filter": _default_filter,
+        },
+        "crystal": {
+            "url": "https://periodictable.com/Properties/A/CrystalStructure.an.html",
+            "select_function": _select_function_poisson,
+            "current_filter": _default_filter,
+        },
+        "volmolar": {
+            "url": "https://periodictable.com/Properties/A/MolarVolume.an.html",
+            "select_function": _select_function_scientific,
+            "current_filter": _default_filter,
+        },
+        "mass": {
+            "url": "https://periodictable.com/Properties/A/AtomicMass.an.html",
+            "select_function": _select_function_mass,
+            "current_filter": _default_filter,
+        },
+    }
+    result = pandas.concat(
+        [
+            _collect(
+                url=v["url"],
+                column=k,
+                select_function=v["select_function"],
+                current_filter=v["current_filter"],
+            ).set_index("element")
+            for k, v in data_dict.items()
+        ],
+        axis=1,
+        sort=False,
+    )
+    result["volume"] = result.apply(
+        lambda x: _get_volume(lat_lst=x.latticeconstant, crystal=x.crystal), axis=1
+    )
+    data_path = os.path.join(os.path.dirname(__file__), "data")
+    os.makedirs(data_path, exist_ok=True)
+    result.to_csv(os.path.join(data_path, "wolfram.csv"))
+


Add logging for download progress and error handling for network-related errors.

The function might benefit from logging the download progress to provide feedback to the user and handling network-related errors.

import logging logging.basicConfig(level=logging.INFO) def _wolframalpha_download() -> None: """ Downloads data from Wolfram Alpha and saves it to a CSV file. This function downloads various properties data from Wolfram Alpha using the provided URLs in the `data_dict`. The downloaded data is then processed and saved to a CSV file named "wolfram.csv" in the "data" directory. Note: This function requires internet connectivity to download the data. """ data_dict = { "thermalcondictivity": { "url": "https://periodictable.com/Properties/A/ThermalConductivity.an.html", "select_function": _select_function_split, "current_filter": _default_filter, }, "atomicradius": { "url": "https://periodictable.com/Properties/A/AtomicRadius.an.html", "select_function": _select_function_split, "current_filter": _default_filter, }, "bulkmodulus": { "url": "https://periodictable.com/Properties/A/BulkModulus.an.html", "select_function": _select_function_split, "current_filter": _default_filter, }, "shearmodulus": { "url": "https://periodictable.com/Properties/A/ShearModulus.an.html", "select_function": _select_function_split, "current_filter": _default_filter, }, "youngmodulus": { "url": "https://periodictable.com/Properties/A/YoungModulus.an.html", "select_function": _select_function_split, "current_filter": _default_filter, }, "poissonratio": { "url": "https://periodictable.com/Properties/A/PoissonRatio.an.html", "select_function": _select_function_poisson, "current_filter": _poisson_filter, }, "density": { "url": "https://periodictable.com/Properties/A/Density.an.html", "select_function": _select_function_density, "current_filter": _default_filter, }, "liquiddensity": { "url": "https://periodictable.com/Properties/A/LiquidDensity.an.html", "select_function": _select_function_split, "current_filter": _default_filter, }, "thermalexpansion": { "url": "https://periodictable.com/Properties/A/ThermalExpansion.an.html", "select_function": _select_function_scientific, "current_filter": _default_filter, }, "meltingpoint": { "url": "https://periodictable.com/Properties/A/AbsoluteMeltingPoint.an.html", "select_function": _select_function_scientific, "current_filter": _default_filter, }, "vaporizationheat": { "url": "https://periodictable.com/Properties/A/VaporizationHeat.an.html", "select_function": _select_function_split, "current_filter": _default_filter, }, "specificheat": { "url": "https://periodictable.com/Properties/A/SpecificHeat.an.html", "select_function": _select_function_split, "current_filter": _default_filter, }, "latticeconstant": { "url": "https://periodictable.com/Properties/A/LatticeConstants.an.html", "select_function": _select_function_lattice, "current_filter": _default_filter, }, "crystal": { "url": "https://periodictable.com/Properties/A/CrystalStructure.an.html", "select_function": _select_function_poisson, "current_filter": _default_filter, }, "volmolar": {

coderabbitai · 2024-07-30T05:53:01Z

atomistics/referencedata/wolfram.py

+def _get_volume(
+    lat_lst: Union[float, Tuple[float, float, float]], crystal: str
+) -> Optional[float]:
+    """
+    Calculate the volume of a crystal unit cell.
+
+    Args:
+        lat_lst: Lattice constants of the crystal unit cell.
+        crystal: Basic crystal lattice structure.
+
+    Returns:
+        The volume of the crystal unit cell in cubic meters, or None if the input is invalid.
+    """
+    if not isinstance(lat_lst, float) and len(lat_lst) == 3:
+        if crystal == "Face-centered Cubic":
+            return lat_lst[0] * lat_lst[1] * lat_lst[2] / 100 / 100 / 100
+        elif crystal == "Body-centered Cubic":
+            return lat_lst[0] * lat_lst[1] * lat_lst[2] / 100 / 100 / 100
+        elif crystal == "Simple Hexagonal":
+            return (
+                lat_lst[0]
+                * lat_lst[1]
+                * lat_lst[2]
+                * 3
+                * np.sqrt(3)
+                / 2
+                / 100
+                / 100
+                / 100
+            )
+        else:
+            return None
+    else:
+        return None


Combine the if conditions.

The if conditions for "Face-centered Cubic" and "Body-centered Cubic" crystals are redundant and can be combined.

if crystal == "Face-centered Cubic": return lat_lst[0] * lat_lst[1] * lat_lst[2] / 100 / 100 / 100 elif crystal == "Body-centered Cubic": if crystal in ["Face-centered Cubic", "Body-centered Cubic"]: return lat_lst[0] * lat_lst[1] * lat_lst[2] / 100 / 100 / 100

Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

def _get_volume(

lat_lst: Union[float, Tuple[float, float, float]], crystal: str

) -> Optional[float]:

"""

Calculate the volume of a crystal unit cell.

Args:

lat_lst: Lattice constants of the crystal unit cell.

crystal: Basic crystal lattice structure.

Returns:

The volume of the crystal unit cell in cubic meters, or None if the input is invalid.

"""

if not isinstance(lat_lst, float) and len(lat_lst) == 3:

if crystal == "Face-centered Cubic":

return lat_lst[0] * lat_lst[1] * lat_lst[2] / 100 / 100 / 100

elif crystal == "Body-centered Cubic":

return lat_lst[0] * lat_lst[1] * lat_lst[2] / 100 / 100 / 100

elif crystal == "Simple Hexagonal":

return (

lat_lst[0]

* lat_lst[1]

* lat_lst[2]

* 3

* np.sqrt(3)

/ 2

/ 100

/ 100

/ 100

)

else:

return None

else:

return None

if not isinstance(lat_lst, float) and len(lat_lst) == 3:

if crystal in ["Face-centered Cubic", "Body-centered Cubic"]:

return lat_lst[0] * lat_lst[1] * lat_lst[2] / 100 / 100 / 100

elif crystal == "Simple Hexagonal":

return (

lat_lst[0]

* lat_lst[1]

* lat_lst[2]

* 3

* np.sqrt(3)

/ 2

/ 100

/ 100

/ 100

)

else:

return None

else:

return None

Tools

Ruff

250-253: Combine if branches using logical or operator

Combine if branches

(SIM114)

coderabbitai · 2024-07-30T05:53:01Z

atomistics/referencedata/wolfram.py

+def _get_content_from_url(url: str) -> pandas.DataFrame:
+    """
+    Retrieves content from the given URL and returns a pandas DataFrame.
+
+    Args:
+        url (str): The URL to retrieve content from.
+
+    Returns:
+        pandas.DataFrame: The content retrieved from the URL as a pandas DataFrame.
+    """
+    content = pandas.read_html(requests.get(url).text)
+    if len(content[8]) > 1:
+        return content[8]
+    else:
+        return content[9]
+


Add error handling for network-related errors.

The function does not handle network-related errors, which might cause it to fail silently.

content = pandas.read_html(requests.get(url).text) try: response = requests.get(url) response.raise_for_status() content = pandas.read_html(response.text) except requests.RequestException as e: print(f"Network error occurred: {e}")

Committable suggestion was skipped due to low confidence.

coderabbitai · 2024-07-30T05:53:02Z

atomistics/referencedata/wolfram.py

+def _extract_lst(
+    df: pandas.DataFrame,
+    column: str,
+    select_function: Callable,
+    current_filter: Callable,
+) -> Tuple[List[str], List[str]]:
+    """
+    Extracts element and property lists from a DataFrame based on the given column, select function, and current filter.
+
+    Args:
+        df (DataFrame): The DataFrame to extract data from.
+        column (str): The column name to extract data from.
+        select_function (Callable): A function that selects a property value from a DataFrame entry.
+        current_filter (Callable): A function that filters DataFrame entries based on a condition.
+
+    Returns:
+        Tuple[List[str], List[str]]: A tuple containing the element list and property list.
+
+    Raises:
+        ValueError: If an error occurs during the extraction process.
+    """
+    element_lst, property_lst = [], []
+    try:
+        ptable = fetch_table("elements")
+        for n, el in zip(ptable.name.values, ptable.symbol.values):
+            v = _extract_entry(df=df, n=n)
+            if current_filter(v):
+                element_lst.append(el)
+                property_lst.append(select_function(v))
+    except ValueError:
+        raise ValueError(column, el, v)
+    return element_lst, property_lst


Improve error handling by providing more context.

The function raises a ValueError without providing context about the error. Use raise ... from None to distinguish it from errors in exception handling.

raise ValueError(column, el, v) raise ValueError(f"Error extracting {column} for element {el} with value {v}") from e

Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

def _extract_lst(

df: pandas.DataFrame,

column: str,

select_function: Callable,

current_filter: Callable,

) -> Tuple[List[str], List[str]]:

"""

Extracts element and property lists from a DataFrame based on the given column, select function, and current filter.

Args:

df (DataFrame): The DataFrame to extract data from.

column (str): The column name to extract data from.

select_function (Callable): A function that selects a property value from a DataFrame entry.

current_filter (Callable): A function that filters DataFrame entries based on a condition.

Returns:

Tuple[List[str], List[str]]: A tuple containing the element list and property list.

Raises:

ValueError: If an error occurs during the extraction process.

"""

element_lst, property_lst = [], []

try:

ptable = fetch_table("elements")

for n, el in zip(ptable.name.values, ptable.symbol.values):

v = _extract_entry(df=df, n=n)

if current_filter(v):

element_lst.append(el)

property_lst.append(select_function(v))

except ValueError:

raise ValueError(column, el, v)

return element_lst, property_lst

def _extract_lst(

df: pandas.DataFrame,

column: str,

select_function: Callable,

current_filter: Callable,

) -> Tuple[List[str], List[str]]:

"""

Extracts element and property lists from a DataFrame based on the given column, select function, and current filter.

Args:

df (DataFrame): The DataFrame to extract data from.

column (str): The column name to extract data from.

select_function (Callable): A function that selects a property value from a DataFrame entry.

current_filter (Callable): A function that filters DataFrame entries based on a condition.

Returns:

Tuple[List[str], List[str]]: A tuple containing the element list and property list.

Raises:

ValueError: If an error occurs during the extraction process.

"""

element_lst, property_lst = [], []

try:

ptable = fetch_table("elements")

for n, el in zip(ptable.name.values, ptable.symbol.values):

v = _extract_entry(df=df, n=n)

if current_filter(v):

element_lst.append(el)

property_lst.append(select_function(v))

except ValueError as e:

raise ValueError(f"Error extracting {column} for element {el} with value {v}") from e

return element_lst, property_lst

Tools

Ruff

201-201: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)

…ference_data

for more information, see https://pre-commit.ci

Merge main

samwaseda

The names must be consistent: currently get_{property}_from_{source} for Wolfram and Mendeleev but get_{property}_{source} for Wikipedia. And I would also avoid putting specifiers, in this case experimental, in the function names. There's anyway no way to get theoretical elastic constants, but in general I would put experimental yes/no etc. in the function arguments.

samwaseda · 2024-07-30T06:31:18Z

Actually I would simply call these functions something like get_chemical_information or get_elastic_property, so that people can use it like

from referencedata import wikipedia


elast = wikipedia.get_elastic_property("Al")

because this way the hierarchy becomes clearer, it avoids having to import too many functions and the function names are shorter.

…y_wikipedia() to get_elastic_properties_from_wikipedia()

coderabbitai

Actionable comments posted: 0

Outside diff range, codebase verification and nitpick comments (2)

tests/test_referencedata.py (2)
5-14: Consider using importlib.util.find_spec for conditional imports.

Instead of using try-except blocks for conditional imports, consider using importlib.util.find_spec to check for module availability.
- try:
-     from atomistics.referencedata import (
-         get_chemical_information_from_mendeleev,
-         get_chemical_information_from_wolframalpha,
-     )
-     mendeleev_not_available = False
- except ImportError:
-     mendeleev_not_available = True
+ import importlib.util
+ mendeleev_not_available = importlib.util.find_spec("atomistics.referencedata") is None
+ if not mendeleev_not_available:
+     from atomistics.referencedata import (
+         get_chemical_information_from_mendeleev,
+         get_chemical_information_from_wolframalpha,
+     )
16-21: Remove unused import lxml.

The lxml module is imported but not used in the code. Consider using importlib.util.find_spec to check for module availability instead.
- try:
-     import lxml
-     lxml_not_available = False
- except ImportError:
-     lxml_not_available = True
+ import importlib.util
+ lxml_not_available = importlib.util.find_spec("lxml") is None
Tools

Ruff

17-17: lxml imported but unused; consider using importlib.util.find_spec to test for availability

(F401)

Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

Commits

Files that changed from the base of the PR and between 7cfd7c9 and fa0929b.

Files selected for processing (4)

atomistics/referencedata/init.py (1 hunks)
atomistics/referencedata/wiki.py (1 hunks)
atomistics/referencedata/wolfram.py (1 hunks)
tests/test_referencedata.py (1 hunks)

Files skipped from review as they are similar to previous changes (3)

atomistics/referencedata/init.py
atomistics/referencedata/wiki.py
atomistics/referencedata/wolfram.py

Additional context used

Ruff

tests/test_referencedata.py

17-17: lxml imported but unused; consider using importlib.util.find_spec to test for availability

(F401)

Additional comments not posted (6)

tests/test_referencedata.py (6)

24-28: LGTM!

The unittest.skipIf condition is correctly implemented to skip the test if mendeleev is not available.

29-32: LGTM!

The test method correctly checks if the atomic number of Aluminum is 13.

33-36: LGTM!

The unittest.skipIf condition is correctly implemented to skip the test if mendeleev is not available.

37-40: LGTM!

The test method correctly checks if the lattice constant of Aluminum is retrieved as expected.

41-43: LGTM!

The unittest.skipIf condition is correctly implemented to skip the test if lxml is not available.

44-54: LGTM!

The test method correctly checks if the elastic properties of Aluminum are retrieved as expected.

from atomistics.referencedata import get_elastic_properties_from_wikipedia as well as from atomistics.referencedata.wikipedia import get_elastic_properties

for more information, see https://pre-commit.ci

jan-janssen · 2024-07-30T06:40:30Z

@samwaseda Support both:

from atomistics.referencedata import get_elastic_properties_from_wikipedia

as well as

from atomistics.referencedata.wikipedia import get_elastic_properties

I like the first one, as it allows me to get a quick overview of which modules are available but I can also live with the second one in particular for properties which are available from multiple sources.

samwaseda

ok for the rest we have to see how things evolve. I don't think we'll suffer from backward incompatibility in the near future, so you can merge it

samwaseda · 2024-07-30T06:45:26Z

atomistics/referencedata/wikipedia.py

+        property (str): Name of the property to retrieve. Options: youngs_modulus, poissons_ratio, bulk_modulus, shear_modulus
+
+    Returns:
+        str: Property value (various types): Value of the property for the given element, if available.


Suggested change

str: Property value (various types): Value of the property for the given element, if available.

str: Property value (various types): Value of the property for the given element in GPa, if available.

coderabbitai

Actionable comments posted: 13

Outside diff range, codebase verification and nitpick comments (2)

atomistics/referencedata/mendeleev.py (1)
4-112: Specify the return type in the docstring.

The docstring should specify that the function returns a dictionary.
-        dict: Dictionary with the following keys
+        dict: A dictionary containing the following keys
atomistics/referencedata/wolframalpha.py (1)
384-414: Specify the return type in the docstring.

The docstring should specify that the function returns a dictionary.
-        dict: Dictionary with the following keys
+        dict: A dictionary containing the following keys

Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

Commits

Files that changed from the base of the PR and between fa0929b and fe3c0d7.

Files selected for processing (4)

atomistics/referencedata/init.py (1 hunks)
atomistics/referencedata/mendeleev.py (1 hunks)
atomistics/referencedata/wikipedia.py (1 hunks)
atomistics/referencedata/wolframalpha.py (1 hunks)

Files skipped from review as they are similar to previous changes (1)

atomistics/referencedata/init.py

Additional comments not posted (2)

atomistics/referencedata/wikipedia.py (1)
25-29: Verify table order assumption and handle potential changes.

The function assumes that the order of tables in the Wikipedia page will not change, which might not be a safe assumption. Consider adding a verification step to ensure the correct table is being used.
+    if len(df_lst) < len(property_lst):
+        raise ValueError("The number of tables on the Wikipedia page is less than expected.")
atomistics/referencedata/wolframalpha.py (1)
21-24: Verify table order assumption and handle potential changes.

The function assumes that the content will always be in the 8th or 9th table, which might not be a safe assumption. Consider adding a verification step to ensure the correct table is being used.
+    if len(content) < 9:
+        raise ValueError("The number of tables on the page is less than expected.")

coderabbitai · 2024-07-30T06:45:28Z

atomistics/referencedata/wikipedia.py

+    df_lst = pandas.read_html(
+        "https://en.wikipedia.org/wiki/Elastic_properties_of_the_elements_(data_page)"
+    )


Handle potential errors during data retrieval.

The function does not handle potential errors that may occur during the data retrieval process. Add error handling to manage cases where the URL might be inaccessible or the data format changes.

+ try: + df_lst = pandas.read_html( + "https://en.wikipedia.org/wiki/Elastic_properties_of_the_elements_(data_page)" + ) + except Exception as e: + raise ValueError(f"Error retrieving data from Wikipedia: {e}")

Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

df_lst = pandas.read_html(

"https://en.wikipedia.org/wiki/Elastic_properties_of_the_elements_(data_page)"

)

try:

df_lst = pandas.read_html(

"https://en.wikipedia.org/wiki/Elastic_properties_of_the_elements_(data_page)"

)

except Exception as e:

raise ValueError(f"Error retrieving data from Wikipedia: {e}")

coderabbitai · 2024-07-30T06:45:28Z

atomistics/referencedata/wikipedia.py

+def get_elastic_properties(chemical_symbol: str) -> dict:
+    """
+    Looks up elastic properties for a given chemical symbol from the Wikipedia: https://en.wikipedia.org/wiki/Elastic_properties_of_the_elements_(data_page) sourced from webelements.com.
+
+    Args:
+        chemical_symbol (str): Chemical symbol of the element.
+        property (str): Name of the property to retrieve. Options: youngs_modulus, poissons_ratio, bulk_modulus, shear_modulus
+
+    Returns:
+        str: Property value (various types): Value of the property for the given element, if available.
+    """


Fix the docstring to match the function signature.

The docstring mentions an argument property which is not present in the function signature. Remove the mention of the property argument from the docstring.

- property (str): Name of the property to retrieve. Options: youngs_modulus, poissons_ratio, bulk_modulus, shear_modulus

Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

def get_elastic_properties(chemical_symbol: str) -> dict:

"""

Looks up elastic properties for a given chemical symbol from the Wikipedia: https://en.wikipedia.org/wiki/Elastic_properties_of_the_elements_(data_page) sourced from webelements.com.

Args:

chemical_symbol (str): Chemical symbol of the element.

property (str): Name of the property to retrieve. Options: youngs_modulus, poissons_ratio, bulk_modulus, shear_modulus

Returns:

str: Property value (various types): Value of the property for the given element, if available.

"""

def get_elastic_properties(chemical_symbol: str) -> dict:

"""

Looks up elastic properties for a given chemical symbol from the Wikipedia: https://en.wikipedia.org/wiki/Elastic_properties_of_the_elements_(data_page) sourced from webelements.com.

Args:

chemical_symbol (str): Chemical symbol of the element.

Returns:

str: Property value (various types): Value of the property for the given element, if available.

"""

coderabbitai · 2024-07-30T06:45:28Z

atomistics/referencedata/mendeleev.py

+    df = fetch_table("elements")
+    return df[df.symbol == chemical_symbol].squeeze(axis=0).to_dict()


Handle cases where the chemical symbol is not found.

The function does not handle cases where the chemical symbol is not found in the table. Add error handling to manage this scenario.

+ element_data = df[df.symbol == chemical_symbol] + if element_data.empty: + raise ValueError(f"Chemical symbol '{chemical_symbol}' not found in the table.") + return element_data.squeeze(axis=0).to_dict()

Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

df = fetch_table("elements")

return df[df.symbol == chemical_symbol].squeeze(axis=0).to_dict()

df = fetch_table("elements")

element_data = df[df.symbol == chemical_symbol]

if element_data.empty:

raise ValueError(f"Chemical symbol '{chemical_symbol}' not found in the table.")

return element_data.squeeze(axis=0).to_dict()

coderabbitai · 2024-07-30T06:45:28Z

atomistics/referencedata/wolframalpha.py

+def _select_function_scientific(v: Union[str, float]) -> float:
+    """
+    Converts a scientific notation string to a float.
+
+    Args:
+        v (Union[str, float]): The value to be converted. If it is a string, it should be in scientific notation.
+
+    Returns:
+        float: The converted value.
+
+    Example:
+        >>> _select_function_scientific('1.23E-4')
+        0.000123
+    """
+    if isinstance(v, str):
+        return float(v.split()[0].replace("×10", "E"))
+    else:
+        return v


Handle unexpected input formats.

The function does not handle cases where the input string is not in the expected format. Add error handling to manage these scenarios.

+ try: + if isinstance(v, str): + return float(v.split()[0].replace("×10", "E")) + else: + return v + except (ValueError, IndexError) as e: + raise ValueError(f"Unexpected input format for scientific value '{v}': {e}")

Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

def _select_function_scientific(v: Union[str, float]) -> float:

"""

Converts a scientific notation string to a float.

Args:

v (Union[str, float]): The value to be converted. If it is a string, it should be in scientific notation.

Returns:

float: The converted value.

Example:

>>> _select_function_scientific('1.23E-4')

0.000123

"""

if isinstance(v, str):

return float(v.split()[0].replace("×10", "E"))

else:

return v

def _select_function_scientific(v: Union[str, float]) -> float:

"""

Converts a scientific notation string to a float.

Args:

v (Union[str, float]): The value to be converted. If it is a string, it should be in scientific notation.

Returns:

float: The converted value.

Example:

>>> _select_function_scientific('1.23E-4')

0.000123

"""

try:

if isinstance(v, str):

return float(v.split()[0].replace("×10", "E"))

else:

return v

except (ValueError, IndexError) as e:

raise ValueError(f"Unexpected input format for scientific value '{v}': {e}")

coderabbitai · 2024-07-30T06:45:28Z

atomistics/referencedata/wolframalpha.py

+def _select_function_lattice(v: str) -> tuple[float, float, float]:
+    """
+    Convert a string representation of a lattice vector to a tuple of floats.
+
+    Args:
+        v (str): The string representation of the lattice vector.
+
+    Returns:
+        tuple[float, float, float]: The lattice vector as a tuple of floats.
+    """
+    return (
+        float(v.split(", ")[0]),
+        float(v.split(", ")[1]),
+        float(v.split(", ")[2].split()[0]),
+    )


Handle unexpected input formats.

The function does not handle cases where the input string is not in the expected format. Add error handling to manage these scenarios.

+ try: + return ( + float(v.split(", ")[0]), + float(v.split(", ")[1]), + float(v.split(", ")[2].split()[0]), + ) + except (ValueError, IndexError) as e: + raise ValueError(f"Unexpected input format for lattice value '{v}': {e}")

Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

def _select_function_lattice(v: str) -> tuple[float, float, float]:

"""

Convert a string representation of a lattice vector to a tuple of floats.

Args:

v (str): The string representation of the lattice vector.

Returns:

tuple[float, float, float]: The lattice vector as a tuple of floats.

"""

return (

float(v.split(", ")[0]),

float(v.split(", ")[1]),

float(v.split(", ")[2].split()[0]),

)

def _select_function_lattice(v: str) -> tuple[float, float, float]:

"""

Convert a string representation of a lattice vector to a tuple of floats.

Args:

v (str): The string representation of the lattice vector.

Returns:

tuple[float, float, float]: The lattice vector as a tuple of floats.

"""

try:

return (

float(v.split(", ")[0]),

float(v.split(", ")[1]),

float(v.split(", ")[2].split()[0]),

)

except (ValueError, IndexError) as e:

raise ValueError(f"Unexpected input format for lattice value '{v}': {e}")

coderabbitai · 2024-07-30T06:45:28Z

atomistics/referencedata/wolframalpha.py

+def _select_function_split(v: Union[str, float]) -> float:
+    """
+    Splits a string and returns the first element as a float.
+
+    Args:
+        v (Union[str, float]): The input value to be processed.
+
+    Returns:
+        float: The first element of the input string as a float, or the input value if it's already a float.
+    """
+    if isinstance(v, str):
+        return float(v.split()[0])
+    else:
+        return v


Handle unexpected input formats.

The function does not handle cases where the input string is not in the expected format. Add error handling to manage these scenarios.

+ try: + if isinstance(v, str): + return float(v.split()[0]) + else: + return v + except (ValueError, IndexError) as e: + raise ValueError(f"Unexpected input format for value '{v}': {e}")

Committable suggestion was skipped due to low confidence.

coderabbitai · 2024-07-30T06:45:28Z

atomistics/referencedata/wolframalpha.py

+def _extract_entry(df: pandas.DataFrame, n: str) -> Optional[Union[str, float]]:
+    """
+    Extracts the value corresponding to the given element name from a DataFrame.
+
+    Args:
+        df (pandas.DataFrame): The DataFrame to extract data from.
+        n (str): The element name to extract data for.
+
+    Returns:
+        Optional[Union[str, float]]: The extracted value if found, None otherwise.
+    """
+    for i in range(int(len(df.columns) / 2)):
+        if n in df[i].values:
+            v = df[i + 1].values[df[i].values.tolist().index(n)]
+            return v
+    return None


Handle cases where the element name is not found.

The function does not handle cases where the element name is not found in the DataFrame. Add error handling to manage this scenario.

+ for i in range(int(len(df.columns) / 2)): + if n in df[i].values: + v = df[i + 1].values[df[i].values.tolist().index(n)] + return v + raise ValueError(f"Element name '{n}' not found in the DataFrame.")

Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

def _extract_entry(df: pandas.DataFrame, n: str) -> Optional[Union[str, float]]:

"""

Extracts the value corresponding to the given element name from a DataFrame.

Args:

df (pandas.DataFrame): The DataFrame to extract data from.

n (str): The element name to extract data for.

Returns:

Optional[Union[str, float]]: The extracted value if found, None otherwise.

"""

for i in range(int(len(df.columns) / 2)):

if n in df[i].values:

v = df[i + 1].values[df[i].values.tolist().index(n)]

return v

return None

def _extract_entry(df: pandas.DataFrame, n: str) -> Optional[Union[str, float]]:

"""

Extracts the value corresponding to the given element name from a DataFrame.

Args:

df (pandas.DataFrame): The DataFrame to extract data from.

n (str): The element name to extract data for.

Returns:

Optional[Union[str, float]]: The extracted value if found, None otherwise.

"""

for i in range(int(len(df.columns) / 2)):

if n in df[i].values:

v = df[i + 1].values[df[i].values.tolist().index(n)]

return v

raise ValueError(f"Element name '{n}' not found in the DataFrame.")

coderabbitai · 2024-07-30T06:45:29Z

atomistics/referencedata/wolframalpha.py

+def _collect(
+    url: str, column: str, select_function: callable, current_filter: callable
+) -> pandas.DataFrame:
+    """
+    Collect data from a given URL and extract specific properties.
+
+    Args:
+        url: The URL to fetch the data from.
+        column: The name of the column to extract.
+        select_function: The function to apply to the extracted data.
+        current_filter: The filter function to apply to the extracted data.
+
+    Returns:
+        A pandas DataFrame containing the extracted data.
+    """
+    return pandas.DataFrame(
+        {
+            n: d
+            for n, d in zip(
+                ["element", column],
+                _extract_lst(
+                    df=_get_content_from_url(url=url),
+                    column=column,
+                    select_function=select_function,
+                    current_filter=current_filter,
+                ),
+            )
+        }
+    )


Handle potential errors during data retrieval.

The function does not handle potential errors that may occur during the data retrieval process. Add error handling to manage cases where the URL might be inaccessible or the data format changes.

+ try: + data = _extract_lst( + df=_get_content_from_url(url=url), + column=column, + select_function=select_function, + current_filter=current_filter, + ) + except Exception as e: + raise ValueError(f"Error collecting data from URL '{url}': {e}")

Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

def _collect(

url: str, column: str, select_function: callable, current_filter: callable

) -> pandas.DataFrame:

"""

Collect data from a given URL and extract specific properties.

Args:

url: The URL to fetch the data from.

column: The name of the column to extract.

select_function: The function to apply to the extracted data.

current_filter: The filter function to apply to the extracted data.

Returns:

A pandas DataFrame containing the extracted data.

"""

return pandas.DataFrame(

{

n: d

for n, d in zip(

["element", column],

_extract_lst(

df=_get_content_from_url(url=url),

column=column,

select_function=select_function,

current_filter=current_filter,

),

)

}

)

def _collect(

url: str, column: str, select_function: callable, current_filter: callable

) -> pandas.DataFrame:

"""

Collect data from a given URL and extract specific properties.

Args:

url: The URL to fetch the data from.

column: The name of the column to extract.

select_function: The function to apply to the extracted data.

current_filter: The filter function to apply to the extracted data.

Returns:

A pandas DataFrame containing the extracted data.

"""

try:

data = _extract_lst(

df=_get_content_from_url(url=url),

column=column,

select_function=select_function,

current_filter=current_filter,

)

except Exception as e:

raise ValueError(f"Error collecting data from URL '{url}': {e}")

return pandas.DataFrame(

{

n: d

for n, d in zip(

["element", column],

data,

)

}

)

coderabbitai · 2024-07-30T06:45:29Z

atomistics/referencedata/wolframalpha.py

+def _extract_lst(
+    df: pandas.DataFrame,
+    column: str,
+    select_function: Callable,
+    current_filter: Callable,
+) -> Tuple[List[str], List[str]]:
+    """
+    Extracts element and property lists from a DataFrame based on the given column, select function, and current filter.
+
+    Args:
+        df (DataFrame): The DataFrame to extract data from.
+        column (str): The column name to extract data from.
+        select_function (Callable): A function that selects a property value from a DataFrame entry.
+        current_filter (Callable): A function that filters DataFrame entries based on a condition.
+
+    Returns:
+        Tuple[List[str], List[str]]: A tuple containing the element list and property list.
+
+    Raises:
+        ValueError: If an error occurs during the extraction process.
+    """
+    element_lst, property_lst = [], []
+    try:
+        ptable = fetch_table("elements")
+        for n, el in zip(ptable.name.values, ptable.symbol.values):
+            v = _extract_entry(df=df, n=n)
+            if current_filter(v):
+                element_lst.append(el)
+                property_lst.append(select_function(v))
+    except ValueError as e:
+        raise ValueError(
+            f"Error extracting {column} for element {el} with value {v}"
+        ) from e
+    return element_lst, property_lst


Handle cases where the DataFrame is empty or the column is not found.

The function does not handle cases where the DataFrame is empty or the column is not found. Add error handling to manage these scenarios.

+ if df.empty: + raise ValueError("The DataFrame is empty.") + if column not in df.columns: + raise ValueError(f"Column '{column}' not found in the DataFrame.")

Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

def _extract_lst(

df: pandas.DataFrame,

column: str,

select_function: Callable,

current_filter: Callable,

) -> Tuple[List[str], List[str]]:

"""

Extracts element and property lists from a DataFrame based on the given column, select function, and current filter.

Args:

df (DataFrame): The DataFrame to extract data from.

column (str): The column name to extract data from.

select_function (Callable): A function that selects a property value from a DataFrame entry.

current_filter (Callable): A function that filters DataFrame entries based on a condition.

Returns:

Tuple[List[str], List[str]]: A tuple containing the element list and property list.

Raises:

ValueError: If an error occurs during the extraction process.

"""

element_lst, property_lst = [], []

try:

ptable = fetch_table("elements")

for n, el in zip(ptable.name.values, ptable.symbol.values):

v = _extract_entry(df=df, n=n)

if current_filter(v):

element_lst.append(el)

property_lst.append(select_function(v))

except ValueError as e:

raise ValueError(

f"Error extracting {column} for element {el} with value {v}"

) from e

return element_lst, property_lst

def _extract_lst(

df: pandas.DataFrame,

column: str,

select_function: Callable,

current_filter: Callable,

) -> Tuple[List[str], List[str]]:

"""

Extracts element and property lists from a DataFrame based on the given column, select function, and current filter.

Args:

df (DataFrame): The DataFrame to extract data from.

column (str): The column name to extract data from.

select_function (Callable): A function that selects a property value from a DataFrame entry.

current_filter (Callable): A function that filters DataFrame entries based on a condition.

Returns:

Tuple[List[str], List[str]]: A tuple containing the element list and property list.

Raises:

ValueError: If an error occurs during the extraction process.

"""

if df.empty:

raise ValueError("The DataFrame is empty.")

if column not in df.columns:

raise ValueError(f"Column '{column}' not found in the DataFrame.")

element_lst, property_lst = [], []

try:

ptable = fetch_table("elements")

for n, el in zip(ptable.name.values, ptable.symbol.values):

v = _extract_entry(df=df, n=n)

if current_filter(v):

element_lst.append(el)

property_lst.append(select_function(v))

except ValueError as e:

raise ValueError(

f"Error extracting {column} for element {el} with value {v}"

) from e

return element_lst, property_lst

coderabbitai · 2024-07-30T06:45:29Z

atomistics/referencedata/wolframalpha.py

+        "thermalcondictivity": {
+            "url": "https://periodictable.com/Properties/A/ThermalConductivity.an.html",
+            "select_function": _select_function_split,
+            "current_filter": _default_filter,
+        },
+        "atomicradius": {
+            "url": "https://periodictable.com/Properties/A/AtomicRadius.an.html",
+            "select_function": _select_function_split,
+            "current_filter": _default_filter,
+        },
+        "bulkmodulus": {
+            "url": "https://periodictable.com/Properties/A/BulkModulus.an.html",
+            "select_function": _select_function_split,
+            "current_filter": _default_filter,
+        },
+        "shearmodulus": {
+            "url": "https://periodictable.com/Properties/A/ShearModulus.an.html",
+            "select_function": _select_function_split,
+            "current_filter": _default_filter,
+        },
+        "youngmodulus": {
+            "url": "https://periodictable.com/Properties/A/YoungModulus.an.html",
+            "select_function": _select_function_split,
+            "current_filter": _default_filter,
+        },
+        "poissonratio": {
+            "url": "https://periodictable.com/Properties/A/PoissonRatio.an.html",
+            "select_function": _select_function_poisson,
+            "current_filter": _poisson_filter,
+        },
+        "density": {
+            "url": "https://periodictable.com/Properties/A/Density.an.html",
+            "select_function": _select_function_density,
+            "current_filter": _default_filter,
+        },
+        "liquiddensity": {
+            "url": "https://periodictable.com/Properties/A/LiquidDensity.an.html",
+            "select_function": _select_function_split,
+            "current_filter": _default_filter,
+        },
+        "thermalexpansion": {
+            "url": "https://periodictable.com/Properties/A/ThermalExpansion.an.html",
+            "select_function": _select_function_scientific,
+            "current_filter": _default_filter,
+        },
+        "meltingpoint": {
+            "url": "https://periodictable.com/Properties/A/AbsoluteMeltingPoint.an.html",
+            "select_function": _select_function_scientific,
+            "current_filter": _default_filter,
+        },
+        "vaporizationheat": {
+            "url": "https://periodictable.com/Properties/A/VaporizationHeat.an.html",
+            "select_function": _select_function_split,
+            "current_filter": _default_filter,
+        },
+        "specificheat": {
+            "url": "https://periodictable.com/Properties/A/SpecificHeat.an.html",
+            "select_function": _select_function_split,
+            "current_filter": _default_filter,
+        },
+        "latticeconstant": {
+            "url": "https://periodictable.com/Properties/A/LatticeConstants.an.html",
+            "select_function": _select_function_lattice,
+            "current_filter": _default_filter,
+        },
+        "crystal": {
+            "url": "https://periodictable.com/Properties/A/CrystalStructure.an.html",
+            "select_function": _select_function_poisson,
+            "current_filter": _default_filter,
+        },
+        "volmolar": {
+            "url": "https://periodictable.com/Properties/A/MolarVolume.an.html",
+            "select_function": _select_function_scientific,
+            "current_filter": _default_filter,
+        },
+        "mass": {
+            "url": "https://periodictable.com/Properties/A/AtomicMass.an.html",
+            "select_function": _select_function_mass,
+            "current_filter": _default_filter,
+        },
+    }
+    result = pandas.concat(
+        [
+            _collect(
+                url=v["url"],
+                column=k,
+                select_function=v["select_function"],
+                current_filter=v["current_filter"],
+            ).set_index("element")
+            for k, v in data_dict.items()
+        ],
+        axis=1,
+        sort=False,
+    )
+    result["volume"] = result.apply(
+        lambda x: _get_volume(lat_lst=x.latticeconstant, crystal=x.crystal), axis=1
+    )
+    data_path = os.path.join(os.path.dirname(__file__), "data")
+    os.makedirs(data_path, exist_ok=True)
+    result.to_csv(os.path.join(data_path, "wolfram.csv"))


Handle potential errors during data retrieval and file writing.

The function does not handle potential errors that may occur during the data retrieval and file writing process. Add error handling to manage these scenarios.

+ try: + result = pandas.concat( + [ + _collect( + url=v["url"], + column=k, + select_function=v["select_function"], + current_filter=v["current_filter"], + ).set_index("element") + for k, v in data_dict.items() + ], + axis=1, + sort=False, + ) + result["volume"] = result.apply( + lambda x: _get_volume(lat_lst=x.latticeconstant, crystal=x.crystal), axis=1 + ) + data_path = os.path.join(os.path.dirname(__file__), "data") + os.makedirs(data_path, exist_ok=True) + result.to_csv(os.path.join(data_path, "wolfram.csv")) + except Exception as e: + raise ValueError(f"Error downloading data from Wolfram Alpha: {e}")

Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

def _wolframalpha_download() -> None:

"""

Downloads data from Wolfram Alpha and saves it to a CSV file.

This function downloads various properties data from Wolfram Alpha using the provided URLs in the `data_dict`.

The downloaded data is then processed and saved to a CSV file named "wolfram.csv" in the "data" directory.

Note: This function requires internet connectivity to download the data.

"""

data_dict = {

"thermalcondictivity": {

"url": "https://periodictable.com/Properties/A/ThermalConductivity.an.html",

"select_function": _select_function_split,

"current_filter": _default_filter,

},

"atomicradius": {

"url": "https://periodictable.com/Properties/A/AtomicRadius.an.html",

"select_function": _select_function_split,

"current_filter": _default_filter,

},

"bulkmodulus": {

"url": "https://periodictable.com/Properties/A/BulkModulus.an.html",

"select_function": _select_function_split,

"current_filter": _default_filter,

},

"shearmodulus": {

"url": "https://periodictable.com/Properties/A/ShearModulus.an.html",

"select_function": _select_function_split,

"current_filter": _default_filter,

},

"youngmodulus": {

"url": "https://periodictable.com/Properties/A/YoungModulus.an.html",

"select_function": _select_function_split,

"current_filter": _default_filter,

},

"poissonratio": {

"url": "https://periodictable.com/Properties/A/PoissonRatio.an.html",

"select_function": _select_function_poisson,

"current_filter": _poisson_filter,

},

"density": {

"url": "https://periodictable.com/Properties/A/Density.an.html",

"select_function": _select_function_density,

"current_filter": _default_filter,

},

"liquiddensity": {

"url": "https://periodictable.com/Properties/A/LiquidDensity.an.html",

"select_function": _select_function_split,

"current_filter": _default_filter,

},

"thermalexpansion": {

"url": "https://periodictable.com/Properties/A/ThermalExpansion.an.html",

"select_function": _select_function_scientific,

"current_filter": _default_filter,

},

"meltingpoint": {

"url": "https://periodictable.com/Properties/A/AbsoluteMeltingPoint.an.html",

"select_function": _select_function_scientific,

"current_filter": _default_filter,

},

"vaporizationheat": {

"url": "https://periodictable.com/Properties/A/VaporizationHeat.an.html",

"select_function": _select_function_split,

"current_filter": _default_filter,

},

"specificheat": {

"url": "https://periodictable.com/Properties/A/SpecificHeat.an.html",

"select_function": _select_function_split,

"current_filter": _default_filter,

},

"latticeconstant": {

"url": "https://periodictable.com/Properties/A/LatticeConstants.an.html",

"select_function": _select_function_lattice,

"current_filter": _default_filter,

},

"crystal": {

"url": "https://periodictable.com/Properties/A/CrystalStructure.an.html",

"select_function": _select_function_poisson,

"current_filter": _default_filter,

},

"volmolar": {

"url": "https://periodictable.com/Properties/A/MolarVolume.an.html",

"select_function": _select_function_scientific,

"current_filter": _default_filter,

},

"mass": {

"url": "https://periodictable.com/Properties/A/AtomicMass.an.html",

"select_function": _select_function_mass,

"current_filter": _default_filter,

},

}

result = pandas.concat(

[

_collect(

url=v["url"],

column=k,

select_function=v["select_function"],

current_filter=v["current_filter"],

).set_index("element")

for k, v in data_dict.items()

],

axis=1,

sort=False,

)

result["volume"] = result.apply(

lambda x: _get_volume(lat_lst=x.latticeconstant, crystal=x.crystal), axis=1

)

data_path = os.path.join(os.path.dirname(__file__), "data")

os.makedirs(data_path, exist_ok=True)

result.to_csv(os.path.join(data_path, "wolfram.csv"))

def _wolframalpha_download() -> None:

"""

Downloads data from Wolfram Alpha and saves it to a CSV file.

This function downloads various properties data from Wolfram Alpha using the provided URLs in the `data_dict`.

The downloaded data is then processed and saved to a CSV file named "wolfram.csv" in the "data" directory.

Note: This function requires internet connectivity to download the data.

"""

data_dict = {

"thermalcondictivity": {

"url": "https://periodictable.com/Properties/A/ThermalConductivity.an.html",

"select_function": _select_function_split,

"current_filter": _default_filter,

},

"atomicradius": {

"url": "https://periodictable.com/Properties/A/AtomicRadius.an.html",

"select_function": _select_function_split,

"current_filter": _default_filter,

},

"bulkmodulus": {

"url": "https://periodictable.com/Properties/A/BulkModulus.an.html",

"select_function": _select_function_split,

"current_filter": _default_filter,

},

"shearmodulus": {

"url": "https://periodictable.com/Properties/A/ShearModulus.an.html",

"select_function": _select_function_split,

"current_filter": _default_filter,

},

"youngmodulus": {

"url": "https://periodictable.com/Properties/A/YoungModulus.an.html",

"select_function": _select_function_split,

"current_filter": _default_filter,

},

"poissonratio": {

"url": "https://periodictable.com/Properties/A/PoissonRatio.an.html",

"select_function": _select_function_poisson,

"current_filter": _poisson_filter,

},

"density": {

"url": "https://periodictable.com/Properties/A/Density.an.html",

"select_function": _select_function_density,

"current_filter": _default_filter,

},

"liquiddensity": {

"url": "https://periodictable.com/Properties/A/LiquidDensity.an.html",

"select_function": _select_function_split,

"current_filter": _default_filter,

},

"thermalexpansion": {

"url": "https://periodictable.com/Properties/A/ThermalExpansion.an.html",

"select_function": _select_function_scientific,

"current_filter": _default_filter,

},

"meltingpoint": {

"url": "https://periodictable.com/Properties/A/AbsoluteMeltingPoint.an.html",

"select_function": _select_function_scientific,

"current_filter": _default_filter,

},

"vaporizationheat": {

"url": "https://periodictable.com/Properties/A/VaporizationHeat.an.html",

"select_function": _select_function_split,

"current_filter": _default_filter,

},

"specificheat": {

"url": "https://periodictable.com/Properties/A/SpecificHeat.an.html",

"select_function": _select_function_split,

"current_filter": _default_filter,

},

"latticeconstant": {

"url": "https://periodictable.com/Properties/A/LatticeConstants.an.html",

"select_function": _select_function_lattice,

"current_filter": _default_filter,

},

"crystal": {

"url": "https://periodictable.com/Properties/A/CrystalStructure.an.html",

"select_function": _select_function_poisson,

"current_filter": _default_filter,

},

"volmolar": {

"url": "https://periodictable.com/Properties/A/MolarVolume.an.html",

"select_function": _select_function_scientific,

"current_filter": _default_filter,

},

"mass": {

"url": "https://periodictable.com/Properties/A/AtomicMass.an.html",

"select_function": _select_function_mass,

"current_filter": _default_filter,

},

}

try:

result = pandas.concat(

[

_collect(

url=v["url"],

column=k,

select_function=v["select_function"],

current_filter=v["current_filter"],

).set_index("element")

for k, v in data_dict.items()

],

axis=1,

sort=False,

)

result["volume"] = result.apply(

lambda x: _get_volume(lat_lst=x.latticeconstant, crystal=x.crystal), axis=1

)

data_path = os.path.join(os.path.dirname(__file__), "data")

os.makedirs(data_path, exist_ok=True)

result.to_csv(os.path.join(data_path, "wolfram.csv"))

except Exception as e:

raise ValueError(f"Error downloading data from Wolfram Alpha: {e}")

…o reference_data

atomistics/referencedata/wikipedia.py

coderabbitai

Actionable comments posted: 0

Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

Commits

Files that changed from the base of the PR and between fe3c0d7 and 92cf77a.

Files selected for processing (1)

atomistics/referencedata/wikipedia.py (1 hunks)

Files skipped from review as they are similar to previous changes (1)

atomistics/referencedata/wikipedia.py

coderabbitai

Actionable comments posted: 0

Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

Commits

Files that changed from the base of the PR and between 92cf77a and c744330.

Files selected for processing (1)

.ci_support/environment-old.yml (1 hunks)

Files skipped from review as they are similar to previous changes (1)

.ci_support/environment-old.yml

jan-janssen and others added 2 commits July 29, 2024 19:56

Provide functions for reference data

a0d5c55

[pre-commit.ci] auto fixes from pre-commit.com hooks

dc43179

for more information, see https://pre-commit.ci

jan-janssen marked this pull request as draft July 29, 2024 17:57

coderabbitai bot reviewed Jul 29, 2024

View reviewed changes

jan-janssen and others added 8 commits July 29, 2024 21:51

Add a first set of unit tests

86a1645

[pre-commit.ci] auto fixes from pre-commit.com hooks

e141423

for more information, see https://pre-commit.ci

update dependencies

035d69d

Merge remote-tracking branch 'refs/remotes/origin/reference_data' int…

fe58857

…o reference_data # Conflicts: # atomistics/referencedata/__init__.py

fix import

9f01b57

[pre-commit.ci] auto fixes from pre-commit.com hooks

6de6fed

for more information, see https://pre-commit.ci

Add lxml

be84a5d

[pre-commit.ci] auto fixes from pre-commit.com hooks

a013161

for more information, see https://pre-commit.ci

jan-janssen requested a review from samwaseda July 29, 2024 20:32

jan-janssen marked this pull request as ready for review July 29, 2024 20:32

coderabbitai bot reviewed Jul 29, 2024

View reviewed changes

jan-janssen and others added 2 commits July 30, 2024 07:45

Add docstrings and type hints with co-pilot

8b339c6

[pre-commit.ci] auto fixes from pre-commit.com hooks

7cfd7c9

for more information, see https://pre-commit.ci

coderabbitai bot reviewed Jul 30, 2024

View reviewed changes

jan-janssen and others added 4 commits July 30, 2024 07:56

add coderabbit recommendations

d4d94a7

Merge branch 'reference_data' of github.com:pyiron/atomistics into re…

9830680

…ference_data

[pre-commit.ci] auto fixes from pre-commit.com hooks

d221132

for more information, see https://pre-commit.ci

Merge pull request #315 from pyiron/main

e820d3f

Merge main

samwaseda requested changes Jul 30, 2024

View reviewed changes

Following Sam's suggestion - renamed get_experimental_elastic_propert…

fa0929b

…y_wikipedia() to get_elastic_properties_from_wikipedia()

coderabbitai bot reviewed Jul 30, 2024

View reviewed changes

Support both:

dcb287a

from atomistics.referencedata import get_elastic_properties_from_wikipedia as well as from atomistics.referencedata.wikipedia import get_elastic_properties

[pre-commit.ci] auto fixes from pre-commit.com hooks

fe3c0d7

for more information, see https://pre-commit.ci

samwaseda approved these changes Jul 30, 2024

View reviewed changes

samwaseda reviewed Jul 30, 2024

View reviewed changes

coderabbitai bot reviewed Jul 30, 2024

View reviewed changes

jan-janssen added 2 commits July 30, 2024 08:50

Fix wikipedia docstring

031632d

Merge remote-tracking branch 'refs/remotes/origin/reference_data' int…

92cf77a

…o reference_data

samwaseda reviewed Jul 30, 2024

View reviewed changes

atomistics/referencedata/wikipedia.py Show resolved Hide resolved

coderabbitai bot reviewed Jul 30, 2024

View reviewed changes

jan-janssen added 2 commits July 30, 2024 09:11

Update environment-old.yml

d9d4a83

Update environment-old.yml

c744330

coderabbitai bot reviewed Jul 30, 2024

View reviewed changes

jan-janssen merged commit 3e1a594 into main Jul 30, 2024
22 checks passed

jan-janssen deleted the reference_data branch July 30, 2024 07:45

		df = fetch_table("elements")
		return df[df.symbol == chemical_symbol].squeeze(axis=0).to_dict()

	str: Property value (various types): Value of the property for the given element, if available.
	str: Property value (various types): Value of the property for the given element in GPa, if available.

-    df = fetch_table("elements")
-    return df[df.symbol == chemical_symbol].squeeze(axis=0).to_dict()
+    df = fetch_table("elements")
+    element_data = df[df.symbol == chemical_symbol]
+    if element_data.empty:
+        raise ValueError(f"Chemical symbol '{chemical_symbol}' not found in the table.")
+    return element_data.squeeze(axis=0).to_dict()

Provide functions for reference data #312

Provide functions for reference data #312

Conversation

jan-janssen commented Jul 29, 2024 • edited by coderabbitai bot Loading

Summary by CodeRabbit

coderabbitai bot commented Jul 29, 2024 • edited Loading

Walkthrough

Changes

Sequence Diagram(s)

Chat

CodeRabbit Commands (invoked as PR comments)

CodeRabbit Configuration File (.coderabbit.yaml)

Documentation and Community

coderabbitai bot left a comment

Choose a reason for hiding this comment

coderabbitai bot Jul 29, 2024

Choose a reason for hiding this comment

coderabbitai bot Jul 29, 2024

Choose a reason for hiding this comment

coderabbitai bot Jul 29, 2024

Choose a reason for hiding this comment

coderabbitai bot Jul 29, 2024

Choose a reason for hiding this comment

coderabbitai bot Jul 29, 2024

Choose a reason for hiding this comment

coderabbitai bot Jul 29, 2024

Choose a reason for hiding this comment

coderabbitai bot Jul 29, 2024

Choose a reason for hiding this comment

coderabbitai bot Jul 29, 2024

Choose a reason for hiding this comment

coderabbitai bot left a comment

Choose a reason for hiding this comment

coderabbitai bot Jul 29, 2024

Choose a reason for hiding this comment

coderabbitai bot Jul 29, 2024

Choose a reason for hiding this comment

coderabbitai bot Jul 29, 2024

Choose a reason for hiding this comment

coderabbitai bot Jul 29, 2024

Choose a reason for hiding this comment

coderabbitai bot Jul 29, 2024

Choose a reason for hiding this comment

coderabbitai bot Jul 29, 2024

Choose a reason for hiding this comment

samwaseda commented Jul 29, 2024

jan-janssen commented Jul 30, 2024

coderabbitai bot left a comment

Choose a reason for hiding this comment

coderabbitai bot Jul 30, 2024

Choose a reason for hiding this comment

coderabbitai bot Jul 30, 2024

Choose a reason for hiding this comment

coderabbitai bot Jul 30, 2024

Choose a reason for hiding this comment

coderabbitai bot Jul 30, 2024

Choose a reason for hiding this comment

coderabbitai bot Jul 30, 2024

Choose a reason for hiding this comment

coderabbitai bot Jul 30, 2024

Choose a reason for hiding this comment

samwaseda left a comment

Choose a reason for hiding this comment

samwaseda commented Jul 30, 2024

coderabbitai bot left a comment

Choose a reason for hiding this comment

jan-janssen commented Jul 30, 2024 • edited Loading

samwaseda left a comment

Choose a reason for hiding this comment

samwaseda Jul 30, 2024

Choose a reason for hiding this comment

coderabbitai bot left a comment

Choose a reason for hiding this comment

coderabbitai bot Jul 30, 2024

Choose a reason for hiding this comment

coderabbitai bot Jul 30, 2024

Choose a reason for hiding this comment

coderabbitai bot Jul 30, 2024

Choose a reason for hiding this comment

coderabbitai bot Jul 30, 2024

jan-janssen commented Jul 29, 2024 •

edited by coderabbitai bot

Loading

coderabbitai bot commented Jul 29, 2024 •

edited

Loading

CodeRabbit Configuration File (`.coderabbit.yaml`)

jan-janssen commented Jul 30, 2024 •

edited

Loading